M.Tech / M.E / PhD Thesis | Computer Science & Engineering | India | Volume 4 Issue 2, February 2015
Efficient Way of Determining the Number of Clusters Using Hadoop Architecture
Siri H. P., Shashikala.B
Abstract: Data mining extracts information from a data set and transforms it into an understandable structure. Clustering plays an important role in many areas, such as exploratory data analysis, pattern recognition, computer vision, and information retrieval. The key idea is to view clustering as a supervised classification problem in which the true class labels are estimated. Determining the valid number of clusters is not easy. Several well-known methods, such as the Gap statistic, path-based clustering, and the Figure of Merit (FOM), have been used to find the correct number of clusters, but they do not solve the problem efficiently. This paper focuses on the Average Intracluster Distance (IC-av) index to validate the estimated number of arbitrarily shaped clusters. On Hadoop, the proposed technique is based on the local relations between patterns and their clustering labels, and it uses a Minimum Spanning Tree (MST) algorithm that exploits the multiplicity property of the MST to obtain accurate results efficiently.
Keywords: Minimum Spanning Tree (MST), Gap statistic, IC-av
Edition: Volume 4 Issue 2, February 2015
Pages: 633 - 638
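Illustrative sketch: the abstract names an MST-based average intracluster distance index but does not give its formula. The Python below is a minimal single-machine sketch under stated assumptions, not the paper's implementation: it assumes Euclidean points, builds the MST with Prim's algorithm, takes the distance between two points to be the largest edge on their MST path, and averages that distance over same-cluster pairs. The function names are hypothetical, and no Hadoop or MapReduce step is shown.

```python
import math
from itertools import combinations


def prim_mst(points):
    """Return MST edges (i, j, weight) of the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def minimax_distances(n, edges):
    """For every pair, the largest edge weight on their unique MST path."""
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]  # (node, node we came from, max edge so far)
        while stack:
            u, prev, longest = stack.pop()
            dist[s][u] = longest
            for v, w in adj[u]:
                if v != prev:
                    stack.append((v, u, max(longest, w)))
    return dist


def average_intracluster_distance(points, labels):
    """Assumed index: mean MST minimax distance over same-cluster pairs."""
    n = len(points)
    d = minimax_distances(n, prim_mst(points))
    pairs = [(i, j) for i, j in combinations(range(n), 2)
             if labels[i] == labels[j]]
    if not pairs:
        return 0.0
    return sum(d[i][j] for i, j in pairs) / len(pairs)


# Two well-separated groups on a line (made-up data for illustration).
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(average_intracluster_distance(points, [0, 0, 1, 1]))
print(average_intracluster_distance(points, [0, 0, 0, 0]))
```

Under these assumptions, a labeling that matches the two groups gives a smaller value than a labeling that merges them, because merged clusters include pairs whose MST path crosses the long bridging edge. Comparing the index across candidate numbers of clusters is one plausible way to use it.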