International Journal of Science and Research (IJSR)

Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



M.Tech / M.E / PhD Thesis | Computer Science & Engineering | India | Volume 5 Issue 5, May 2016 | Popularity: 7.1 / 10


     

Performance Evaluation of Cluster Based Algorithm used for Text Document Classification

Rohit S. Patil, Manish Bhardwaj


Abstract: In this paper we develop a complete methodology for document classification and clustering. We start by investigating how the choice of document features influences the performance of a document classifier, and then use our findings to develop a clustering method suitable for document collections. From our study of the effect of frequency transformation, term weighting, and dimensionality reduction through principal component analysis on the performance of a simple K-nearest-neighbors classifier, we conclude that (a) applying a logarithm or square-root transformation to the term frequencies reduces error rates, (b) term weighting of the transformed frequencies does not appear to help much, and (c) increasing the feature space dimension beyond 50 does not improve performance. We use these findings in the construction of a Gaussian Mixture Document Clustering (GMDC) algorithm. This algorithm models the data as a sample from a Gaussian mixture. The model is used to build clusters based on the likelihood of the data and to classify documents according to Bayes' rule. Finally, we will build our own classifier, which will be able to select the number of clusters present in the document collection automatically and to classify documents more efficiently than the two classifiers above.
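
Finding (a) of the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the toy corpus, the vocabulary, the function names transform and knn_predict, and the choice of Euclidean distance are all assumptions made for illustration. It applies a logarithm or square-root transformation to raw term counts and then classifies a query document by majority vote among its K nearest neighbors. Term weighting, principal component analysis, and the GMDC algorithm are not shown.

```python
import math
from collections import Counter

def transform(counts, kind="log"):
    """Dampen raw term frequencies: log(1 + tf), sqrt(tf), or no change."""
    if kind == "log":
        return [math.log1p(c) for c in counts]
    if kind == "sqrt":
        return [math.sqrt(c) for c in counts]
    return [float(c) for c in counts]  # "raw": leave the counts untouched

def knn_predict(train, labels, query, k=3, kind="log"):
    """Majority vote among the k training documents nearest to the query.

    Distance is Euclidean in the transformed space (an assumption; the
    abstract does not state which distance the paper uses).
    """
    q = transform(query, kind)
    ranked = sorted(
        (math.dist(transform(doc, kind), q), label)
        for doc, label in zip(train, labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus: term counts over the vocabulary
# ["goal", "match", "vote", "election"].
train = [[9, 4, 0, 0], [3, 7, 0, 1], [0, 1, 8, 5], [1, 0, 2, 9]]
labels = ["sport", "sport", "politics", "politics"]

query = [2, 1, 0, 0]
print(knn_predict(train, labels, query, k=3, kind="log"))
```

Switching kind between "raw", "log", and "sqrt" on a labeled held-out set is one simple way to compare error rates across the transformations the abstract discusses; a corpus this small cannot show the effect the paper reports.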


Keywords: clustering, classification, text mining, dimensionality reduction, Gaussian mixture


Edition: Volume 5 Issue 5, May 2016


Pages: 751 - 754


DOI: https://www.doi.org/10.21275/7051602



Rohit S. Patil, Manish Bhardwaj, "Performance Evaluation of Cluster Based Algorithm used for Text Document Classification", International Journal of Science and Research (IJSR), Volume 5 Issue 5, May 2016, pp. 751-754, https://www.ijsr.net/getabstract.php?paperid=7051602, DOI: https://www.doi.org/10.21275/7051602



Similar Articles


Student Project, Computer Science & Engineering, India, Volume 11 Issue 6, June 2022

Pages: 1875 - 1880

Microclustering with Outlier Detection for DADC

Aswathy Priya M.


Survey Paper, Computer Science & Engineering, India, Volume 11 Issue 7, July 2022

Pages: 1023 - 1029

A Survey and High-Level Design on Human Activity Recognition

Abhishikat Kumar Soni, Dhruv Agrawal, Md. Ahmed Ali, Dr. B. G. Prasad


Experimental Result Paper, Computer Science & Engineering, India, Volume 11 Issue 6, June 2022

Pages: 1038 - 1041

Classification of Glassdoor Pros and Cons into Pre-Defined Categories

Mahak, Aditya Raj Gupta, Deepti Buriya


Analysis Study Research Paper, Computer Science & Engineering, India, Volume 12 Issue 11, November 2023

Pages: 1840 - 1846

Analysis of Placement for Electronics and Communication Engineering Students using Multiple Clustering

Dr. Dola Sanjay S


Analysis Study Research Paper, Computer Science & Engineering, India, Volume 13 Issue 1, January 2024

Pages: 805 - 811

Predicting the Energy Efficiency in Wireless Sensor Networks using LSTM and Random Forest Method

Aruna Reddy H., Shivamurthy G., Rajanna M.
