dc.contributor.author | Odebode, Afees | |
dc.date.accessioned | 2017-09-01T18:13:54Z | |
dc.date.available | 2017-09-01T18:13:54Z | |
dc.date.issued | 2017-09-01T18:13:54Z | |
dc.identifier.uri | http://hdl.handle.net/10222/73283 | |
dc.description | Thesis submission | en_US |
dc.description.abstract | The availability of large temporal data sets, enabled by improved collection tools and
storage devices, has posed a new set of challenges in data mining, especially in the area of
clustering data into different groups according to their basic attributes. Existing
clustering algorithms, such as K-means, tend to suffer from slow processing speed. In
addition, most of them lack the ability to eliminate outliers and anomalies. In this thesis,
we present three fast clustering algorithms with noise removal capability: KD, KDS, and
KDSD.
Technically, the proposed algorithms make use of the features of three existing
data mining methods: K-means, DBSCAN, and K-Nearest Neighbors (KNN). K-means has
been an effective clustering algorithm. However, the clusters resulting from K-means are
likely to include many outliers. In addition, K-means does not scale well with cluster
size. In our research, to tackle the outlier problem, we propose KD, a novel clustering
algorithm with noise removal capability that is based on K-means and DBSCAN.
Essentially, DBSCAN is employed to remove the outliers in the clusters resulting from
K-means. To solve the scaling problem with K-means, we propose KDS, a fast
clustering algorithm that scales well. Finally, KDSD, a fast clustering algorithm with
noise removal capability, is proposed to achieve both excellent scalability and noise
removal ability.
The performance of the proposed algorithms is thoroughly investigated through
extensive experiments with a large power consumption data set. Our experimental results
indicate that, compared to K-means, KDS runs at a much faster rate. Specifically, it takes
K-means 7.56 seconds to cluster the whole data set under investigation, whereas it takes
KDS 0.363 seconds and 0.513 seconds in the case of 1% and 5% training samples,
respectively. In addition, although KDSD is not as fast as KDS due to the final anomaly
removal operation, it outperforms KD. In our experiments, it takes KD 268.62 seconds to
complete the clustering process, while it takes KDSD 237.836 seconds in the worst case. | en_US |
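The KD idea described in the abstract (run K-means first, then apply a DBSCAN-style density test inside each resulting cluster to discard outliers) can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis implementation: the function names, the parameters `k`, `eps`, and `min_pts`, the deterministic initialization, and the toy data are all illustrative assumptions.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means on tuples of floats.

    Deterministic initialization (first k points) keeps the sketch
    reproducible; real implementations use random or k-means++ seeding.
    """
    centers = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        centers = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return clusters

def density_filter(cluster, eps, min_pts):
    """DBSCAN-style core-point test: keep a point only if at least
    min_pts cluster members (itself included) lie within eps of it."""
    return [p for p in cluster
            if sum(1 for q in cluster if math.dist(p, q) <= eps) >= min_pts]

def kd_cluster(points, k, eps, min_pts):
    """KD idea: cluster with K-means, then drop low-density points."""
    return [density_filter(c, eps, min_pts) for c in kmeans(points, k)]

# Toy data: two dense groups of four points plus two isolated outliers.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
        (2.5, 9.0), (9.0, 2.5)]
clusters = kd_cluster(data, k=2, eps=0.5, min_pts=3)
total_kept = sum(len(c) for c in clusters)  # 8: both outliers discarded
```

The division of labor matches the abstract's description: K-means provides the coarse partition quickly, and the density test only has to examine points within each cluster rather than the whole data set.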
dc.language.iso | en | en_US |
dc.subject | Outliers | en_US |
dc.subject | Clustering | en_US |
dc.subject | Smart Meters | en_US |
dc.title | FAST CLUSTERING WITH NOISE REMOVAL FOR LARGE DATASETS | en_US |
dc.date.defence | 2017-08-11 | |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Malcolm Heywood | en_US |
dc.contributor.thesis-reader | Dr. Vlado Keselj | en_US |
dc.contributor.thesis-reader | Dr. Qigang Gao | en_US |
dc.contributor.thesis-supervisor | Dr. Srinivas Sampalli | en_US |
dc.contributor.thesis-supervisor | Dr. Qiang Ye | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.copyright-release | Not Applicable | en_US |