International Journal of Science and Research (IJSR)

Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



Dissertation Chapters | Computer Science & Engineering | India | Volume 3 Issue 4, April 2014


     

Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity

Divya C.


Abstract: Nowadays the internet has become part of everyday life, and as a result web pages have become a key communication and information medium for many organizations. Web pages typically contain a large amount of information that is not part of their main content, e.g., banner ads, navigation bars, copyright and privacy notices, and advertisements unrelated to the main (relevant) content. In this paper, the system uses an HTML parser to construct a DOM (Document Object Model) tree, from which a Content Structure Tree (CST) is built that makes it easy to separate the main content blocks from the other blocks. The paper also introduces a method for ranking web pages based on the content similarity between the web documents and the user query. When a user searches with a keyword, many web pages are usually retrieved, and the user may not know which of them are most relevant. To address this, the pages are ranked using Cosine Similarity and Jaccard Similarity, both implemented together with a stop-word removal algorithm. Many experiments were conducted with both measures, and the results were compared to determine which works better. Cosine Similarity retrieved more relevant pages for the user than Jaccard Similarity.
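The abstract does not detail how the HTML parser output is turned into a Content Structure Tree. As a rough, hypothetical stand-in (not the paper's CST algorithm), the sketch below builds a small DOM-like tree with Python's standard `html.parser` and keeps only text blocks that do not sit under a "noise" subtree such as navigation, ads, or footers. The tag sets, the class/id hint words, and the helper names (`Node`, `DomBuilder`, `main_content_blocks`) are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed as open nodes.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Assumed noise indicators: tags and class/id words typical of non-main content.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form", "noscript"}
NOISE_HINTS = {"ad", "ads", "advert", "banner", "menu", "nav", "sidebar",
               "footer", "copyright", "privacy"}
# Assumed block-level elements whose text is reported as one content block.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "pre", "blockquote"}


class Node:
    def __init__(self, tag, attrs, parent):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # mix of Node and str, in document order

    def is_noise(self):
        """Heuristic: noisy tag name, or a class/id word hinting at ads/navigation."""
        if self.tag in NOISE_TAGS:
            return True
        hints = " ".join(v or "" for k, v in self.attrs.items() if k in ("class", "id"))
        return any(tok in NOISE_HINTS for tok in hints.lower().replace("-", " ").split())


class DomBuilder(HTMLParser):
    """Builds a simple tree from html.parser's start/end/data callbacks."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node):
    """Concatenate the text under node, skipping nested noise subtrees."""
    parts = [c if isinstance(c, str) else ("" if c.is_noise() else text_of(c))
             for c in node.children]
    return " ".join(p for p in parts if p)


def main_content_blocks(html):
    """Return the text of block elements that are not inside a noise subtree."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    blocks = []

    def walk(node):
        for child in node.children:
            if isinstance(child, str) or child.is_noise():
                continue
            if child.tag in BLOCK_TAGS:
                text = text_of(child)
                if text:
                    blocks.append(text)
            else:
                walk(child)

    walk(builder.root)
    return blocks
```

Filtering whole subtrees mirrors the abstract's idea of separating main content blocks from banner, navigation, and copyright blocks; a real CST would likely weigh block structure and text density rather than rely on fixed word lists.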
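The abstract ranks retrieved pages against the query with Cosine Similarity and Jaccard Similarity after stop-word removal, and lists TF-IDF among its keywords. The sketch below shows one common way to compute both scores; the stop-word list, the tokenizer, the smoothed IDF formula, and the helper name `rank_pages` are assumptions for illustration, not the paper's exact settings.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list (the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text, split it into alphanumeric tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def jaccard(a, b):
    """|A intersect B| / |A union B| over the sets of distinct terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_pages(query, pages, measure="cosine"):
    """Return (page_index, score) pairs sorted from most to least similar."""
    page_tokens = [tokenize(p) for p in pages]
    query_tokens = tokenize(query)
    if measure == "jaccard":
        scores = [jaccard(query_tokens, toks) for toks in page_tokens]
    else:
        # Assumed TF-IDF weighting: raw term count times a smoothed IDF,
        # ln((1 + N) / (1 + df)) + 1, so terms found in every page keep weight.
        n = len(pages)
        df = Counter(t for toks in page_tokens for t in set(toks))

        def tfidf(tokens):
            return {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                    for t, c in Counter(tokens).items()}

        q_vec = tfidf(query_tokens)
        scores = [cosine(q_vec, tfidf(toks)) for toks in page_tokens]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    pages = [
        "Cosine similarity ranks web pages for a query",
        "Banner ads and navigation bars are not main content",
        "Ranking web pages with cosine similarity and TF-IDF weights",
    ]
    query = "ranking web pages cosine similarity"
    print(rank_pages(query, pages))
    print(rank_pages(query, pages, measure="jaccard"))
```

Jaccard only asks whether a term is present, while the TF-IDF cosine score also weighs how often a term occurs and how rare it is across the pages. This is one plausible reason a cosine-based ranking can order pages more usefully, although the sketch does not reproduce the paper's experiments.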


Keywords: Content mining, DOM tree, CST tree, TF-IDF, Cosine Similarity


Edition: Volume 3 Issue 4, April 2014


Pages: 178 - 184



Divya C., "Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity", International Journal of Science and Research (IJSR), Volume 3 Issue 4, April 2014, pp. 178-184, https://www.ijsr.net/getabstract.php?paperid=20131317



Similar Articles

Student Project, Computer Science & Engineering, India, Volume 11 Issue 5, May 2022

Pages: 650 - 654

Automatic Text Summarization and Audio Generation

Tanooja K, Tejasri K, Akhilesh T, Prasanna Kavya M


Research Paper, Computer Science & Engineering, India, Volume 5 Issue 5, May 2016

Pages: 1964 - 1967

Improving Performance of Hindi-English based Cross Language Information Retrieval using Selective Documents Technique and Query Expansion

Aditi Agrawal, Dr. A. J. Agrawal


M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 7, July 2016

Pages: 1240 - 1244

Implementing K-Means Clustering Algorithm Using MapReduce Paradigm

Botcha Chandrasekhara Rao, Medara Rambabu


Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 981 - 984

Using SVM and Stopword removal method in Microblogging Classroom

Vidya Dhuttargaon, Amit R. Sarkar


Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 2630 - 2634

A Review on Identifying the Main Content From Web Pages

Madhura R. Kaddu, Dr. R. B. Kulkarni
