[WEB4LIB] web characterization program - from CNI

Andrew Mutch amutch at waterford.lib.mi.us
Wed Oct 25 08:44:50 EDT 2000


GraceAnne,

I recently posted some related links [just before the OCLC release came out]
that cover the same kind of information, but focus on individual pages rather
than whole web sites. The purpose of my original post was to point out the
technical difficulty that Internet filtering companies face in trying to index
all of the "bad" sites on the Internet.  Here's a chunk of that post.  P.S. -
The Cyveillance site has a fun little "running" total of the size of the
Internet.

Andrew Mutch
Library Systems Technician
Waterford Township Public Library
Waterford, MI

-------------------------------------------------------------------
Size of the Internet [2.1 billion unique, publicly available pages exist
on the Internet]
http://cyveillance.com/newsroom/pressr/000710.asp
(note: this survey was done in July; the current estimate is 2.8 billion)

Number of Pages added to the Internet each day [7 million]
http://cyveillance.com/newsroom/pressr/000710.asp
(note: current estimates are closer to 10 million pages a day)
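The two figures above are consistent with each other. A quick back-of-the-envelope check, using the July baseline and daily growth rate from the post (the roughly 100-day gap between the July survey and this late-October message is my own assumption):

```python
# Sanity check: does the July baseline plus the daily growth rate
# reproduce the current ~2.8 billion estimate? Figures are from the
# post; the ~100-day elapsed time is an assumption.
july_estimate = 2_100_000_000   # pages, per the July 2000 Cyveillance survey
pages_per_day = 7_000_000       # pages added daily, per the July figure
days_elapsed = 100              # roughly July 10 to late October

current = july_estimate + pages_per_day * days_elapsed
print(f"Projected size: {current / 1e9:.1f} billion pages")
# → Projected size: 2.8 billion pages
```

So the 2.8 billion current estimate follows directly from the July numbers, even before factoring in the faster ~10 million pages/day growth rate.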

Size of Search Engine Indexes [Largest: Google -- directly indexing 560
million pages, with link data for 1 billion pages]
http://searchenginewatch.com/reports/sizes.html

Data Not Indexed by Search Engines [est. 500 billion pages of
information]
http://searchenginewatch.internet.com/sereport/00/08-deepweb.html




