S Raghavan, H Garcia-Molina - … of the International Conference on Very …, 2001 - Citeseer Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of web
pages reachable purely by following hypertext links, ignoring search forms and pages that require
authorization or prior registration. In particular, they ignore the tremendous amount of ... Cited by 434
H Kautz, B Selman, M Shah - AI magazine, 1997 - aaai.org The vast network of linked documents that make up the World Wide Web (WWW) is only one
manifestation of a larger and more profound phenomenon; namely, the social network that links
all people. In the 1960s, Stanley Milgram's (1967) pioneering work on the small-world ... Cited by 259
PG Ipeirotis, L Gravano - … of the 28th international conference on …, 2002 - portal.acm.org Many valuable text databases on the web have non-crawlable contents that are “hidden”
behind search interfaces. Metasearchers are helpful tools for searching over many such databases
at once through a unified query interface. A critical task for a metasearcher to process a ... Cited by 129
PG Ipeirotis, L Gravano, M Sahami - Proceedings of the 2001 ACM …, 2001 - portal.acm.org ABSTRACT The contents of many valuable web-accessible databases are only accessible through
search interfaces and are hence invisible to traditional web “crawlers.” Recent studies have
estimated the size of this “hidden web” to be 500 billion pages, while the size of the “ ... Cited by 128
L Gravano, PG Ipeirotis, M Sahami - ACM Transactions on …, 2003 - portal.acm.org The contents of many valuable Web-accessible databases are only available through search
interfaces and are hence invisible to traditional Web “crawlers.” Recently, commercial Web
sites have started to manually organize Web-accessible databases into Yahoo!-like ... Cited by 90
A Ntoulas, P Zerfos, J Cho - Proceedings of the 5th ACM/IEEE-CS …, 2005 - portal.acm.org ABSTRACT An ever-increasing amount of information on the Web today is available only through
search interfaces: the users have to type in a set of keywords in a search form in order to access
the pages from certain Web sites. These pages are often referred to as the Hidden Web or ... Cited by 73
L Barbosa, J Freire - Proc. of SBBD, 2004 - Citeseer Abstract In this paper, we study the problem of automating the retrieval of data hidden behind
simple search interfaces that accept keyword-based queries. Our goal is to automatically retrieve
all available results (or, as many as possible). We propose a new approach to siphon ... Cited by 50
L Barbosa, J Freire - Proceedings of WebDB, 2005 - Citeseer ABSTRACT Recently, there has been increased interest in the retrieval and integration of hidden Web data with a view to leverage high-quality information available in online databases. Although
previous works have addressed many aspects of the actual integration, including ... Cited by 42
J Palmieri Lage, AS da Silva, PB Golgher, AHF … - Data & Knowledge …, 2004 - Elsevier As the Web grows, more and more data has become available under dynamic forms of
publication, such as legacy databases accessed by an HTML form (the so called hidden Web). In situations such as this, integration of this data relies more and more on the fast ... Cited by 44
A Bergholz, B Chidlovskii - Proceedings of the …, 2003 - doi.ieeecomputersociety.org The Hidden Web, the part of the Web that remains unavailable for standard crawlers, has become
an important research topic during recent years. Its size is estimated to 400 to 500 times larger
than that of the Publicly Indexable Web (PIW). Furthermore, the information on the ... Cited by 34