Abstract:
We implement effective Web crawling by generating an XML document of the crawled links and pages. The Web crawler is implemented here as a website downloader: it searches each web page for hyperlinks, which may appear in different formats, filters these hyperlinks, and arranges them into an XML document. Each link read from the seed page is then used as a new page. The XML document is next passed to the application, which traverses it from top to bottom and downloads the pages. In a large distributed system like the Web, users find resources by following hypertext links from one document to another. When the system is small and its resources share the same fundamental purpose, users can find resources of interest with relative ease. However, with the Web now encompassing millions of sites with many different purposes, navigation is difficult. WebCrawler, the Web’s first comprehensive full-text search engine, is a tool that assists users in their Web navigation by automating the task of link traversal, creating a searchable index of the Web, and fulfilling searchers’ queries from the index.
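The pipeline described above (extract hyperlinks from a seed page, filter them, arrange them in an XML document, then parse that document to obtain the pages to download) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the paper's actual implementation; the names `LinkExtractor`, `filter_links`, `build_xml`, and `parse_xml` are assumptions introduced here for clarity.

```python
# Sketch of the crawling pipeline from the abstract. Names and structure
# are illustrative assumptions, not taken from the paper's implementation.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import xml.etree.ElementTree as ET


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the seed page's URL.
                    self.links.append(urljoin(self.base_url, value))


def filter_links(links, allowed_schemes=("http", "https")):
    """Keep only http(s) links, dropping duplicates while preserving order."""
    seen, kept = set(), []
    for link in links:
        if urlparse(link).scheme in allowed_schemes and link not in seen:
            seen.add(link)
            kept.append(link)
    return kept


def build_xml(links):
    """Arrange the filtered hyperlinks as an XML document."""
    root = ET.Element("pages")
    for link in links:
        ET.SubElement(root, "page", url=link)
    return ET.tostring(root, encoding="unicode")


def parse_xml(xml_text):
    """Read the XML document top to bottom, returning the URLs to download."""
    return [page.get("url") for page in ET.fromstring(xml_text).iter("page")]


# Example run on a tiny seed page: the mailto: link is filtered out.
seed_html = '<a href="/about.html">About</a> <a href="mailto:x@y.z">Mail</a>'
extractor = LinkExtractor("http://example.com/")
extractor.feed(seed_html)
urls = parse_xml(build_xml(filter_links(extractor.links)))
print(urls)  # ['http://example.com/about.html']
```

A downloader would then fetch each URL from `urls` (for example with `urllib.request.urlopen`) and could feed each downloaded page back through `LinkExtractor` to continue the crawl.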