Crawler

A crawler (also called a spider or robot) is a program, directed by a crawl controller module, that "browses" the Web: beginning with a set of start pages, it recursively follows links, fetches the pages they point to, and compresses and stores those pages in a page repository. Pages and the links between them form a Web graph. The URLs extracted from each fetched page are passed to the crawl controller module, which decides where in the graph the crawler moves next. To save space, pages are represented in the index and other data structures by document identifiers.
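The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular crawler: the names crawl, LinkParser, fetch, max_pages and FAKE_WEB are hypothetical, the crawl controller is reduced to a simple first-in, first-out queue (a breadth-first visiting order), and pages come from an in-memory dictionary instead of the network so that the example runs offline.

```python
import zlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href values of the <a> tags in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_urls, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` is a caller-supplied function
    (an assumption of this sketch) returning HTML text, or None on failure."""
    doc_ids = {}        # URL -> small integer document identifier
    repository = {}     # document identifier -> zlib-compressed page
    graph = {}          # document identifier -> identifiers of linked pages
    frontier = deque()  # crawl controller state: URLs waiting to be fetched

    def doc_id(url):
        # A URL gets an identifier, and is queued, only the first time it is seen.
        if url not in doc_ids:
            doc_ids[url] = len(doc_ids)
            frontier.append(url)
        return doc_ids[url]

    for url in start_urls:
        doc_id(url)

    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        source = doc_ids[url]
        repository[source] = zlib.compress(html.encode("utf-8"))
        parser = LinkParser()
        parser.feed(html)
        targets = []
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            targets.append(doc_id(absolute))
        graph[source] = targets
    return doc_ids, repository, graph


# A made-up three-page "Web" used in place of real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b#top">B</a>',
    "http://example.test/b": "<p>no links</p>",
}

doc_ids, repository, graph = crawl(["http://example.test/"], FAKE_WEB.get)
print(graph)  # adjacency lists of the Web graph, keyed by document identifier
```

Storing the graph as lists of integer identifiers instead of URL strings is the space saving mentioned above. A real crawl controller would typically replace the plain queue with a prioritised one and would also handle politeness rules, retries and duplicate content, none of which this sketch attempts.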