Review Papers | Computer Science & Engineering | India | Volume 4 Issue 12, December 2015

Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries

Vrutuja Pande, Pratap Singh

Abstract: In this paper we describe new adaptive crawling strategies to efficiently locate the entry points to hidden-Web sources, and we describe a new hypertext resource discovery system called a Focused Crawler. The fact that hidden-Web sources are very sparsely distributed makes the problem of locating them especially challenging. We deal with this problem by using the contents of pages to focus the crawl on a topic, by prioritizing promising links within the topic, and by also following links that may not lead to immediate benefit. We propose a new framework whereby crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, thus greatly reducing the amount of required manual setup and tuning. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. We designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, and a distiller that identifies hypertext nodes that are great access points to many relevant pages within a few links. Our experiments over real Web pages in a representative set of domains indicate that online learning leads to significant gains in harvest rates: the adaptive crawlers retrieve up to three times as many forms as crawlers that use a fixed focus strategy.
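
The adaptive prioritization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `TOY_WEB` graph, the keyword-based `is_relevant` stand-in for the classifier, and the `crawl` function are all hypothetical names chosen for illustration. The sketch assumes that link promise is learned from anchor-text words only, and it does not model the distiller or links with delayed benefit.

```python
from collections import defaultdict

# Hypothetical in-memory "web": url -> (page text, [(anchor text, target url), ...]).
TOY_WEB = {
    "seed": ("airfare portal", [("search flights", "form1"),
                                ("about us", "about"),
                                ("cheap flights", "deals")]),
    "deals": ("cheap flight deals", [("search flights now", "form2"),
                                     ("contact", "contact")]),
    "about": ("company history", [("careers", "jobs")]),
    "form1": ("flight search form", []),
    "form2": ("flight search form", []),
    "contact": ("contact us", []),
    "jobs": ("open positions", []),
}


def is_relevant(text):
    """Illustrative stand-in for the page classifier (assumption: keyword test)."""
    return "form" in text


def crawl(web, seed, budget):
    """Best-first crawl whose link scores are re-learned after every page visit."""
    word_weight = defaultdict(float)   # learned promise of each anchor-text word
    frontier = [(seed, ())]            # (url, anchor words of the link that led here)
    seen = {seed}
    order, found = [], []

    def score(words):
        return sum(word_weight[w] for w in words)

    while frontier and len(order) < budget:
        # Rescore the whole frontier with the current weights; ties keep insertion order.
        best = max(frontier, key=lambda item: score(item[1]))
        frontier.remove(best)
        url, words = best
        text, links = web[url]
        order.append(url)

        if is_relevant(text):
            found.append(url)
            # Online learning step: reward the anchor words that led to a relevant page.
            for w in words:
                word_weight[w] += 1.0

        for anchor, target in links:
            if target not in seen:
                seen.add(target)
                frontier.append((target, tuple(anchor.lower().split())))
    return order, found


order, found = crawl(TOY_WEB, "seed", budget=5)
harvest_rate = len(found) / len(order)   # relevant pages per page fetched
print(order, found, harvest_rate)
```

In this toy run, once the crawler sees that the link labelled "search flights" led to a form, it prefers the "cheap flights" and "search flights now" links over unrelated ones, which is the kind of focus adaptation the abstract attributes to online learning. A fixed-focus crawler would correspond to never updating `word_weight`.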

Keywords: Web resource discovery, Classification, Categorization, Web crawling strategies

Edition: Volume 4 Issue 12, December 2015

Pages: 2212 - 2215

DOI: https://www.doi.org/10.21275/NOV152532

Vrutuja Pande, Pratap Singh, "Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries", International Journal of Science and Research (IJSR), Volume 4 Issue 12, December 2015, pp. 2212-2215, https://www.ijsr.net/getabstract.php?paperid=NOV152532, DOI: https://www.doi.org/10.21275/NOV152532

Similar Articles

A Survey and High-Level Design on Human Activity Recognition

Abhishikat Kumar Soni, Dhruv Agrawal, Md. Ahmed Ali, Dr. B. G. Prasad

Experimental Result Paper, Computer Science & Engineering, India, Volume 11 Issue 6, June 2022

Pages: 1038 - 1041

Classification of Glassdoor Pros and Cons into Pre-Defined Categories

Mahak, Aditya Raj Gupta, Deepti Buriya

Masters Thesis, Computer Science & Engineering, India, Volume 11 Issue 7, July 2022

Pages: 1502 - 1505

Model of Decision Tree for Email Classification

Nallamothu Naveen Kumar

Review Papers, Computer Science & Engineering, India, Volume 13 Issue 8, August 2024

Pages: 1049 - 1053

Deep Learning - Based Diabetic Retinopathy Detection: A Survey on Deep Learning Architectures

Dr. Srinidhi G. A.

Research Paper, Computer Science & Engineering, India, Volume 13 Issue 10, October 2024

Pages: 1831 - 1836

Risk Assessment in Online Social Networks Through Client Activity Analysis using Machine Learning

Sanaboina Chandra Sekhar
