[Web4lib] Web Authoring Statistics From 1 Billion Pages
Blake Carver
lists at lisnews.com
Wed Jan 25 15:38:47 EST 2006
Just saw this on Slashdot:
http://code.google.com/webstats/index.html
Google did an analysis of a sample of slightly over a billion documents,
extracting information about popular class names, elements, attributes, and
related metadata.
The parser looked only at documents whose HTTP headers included a
Content-Type header with a value that started with the nine characters
text/html.
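
For anyone curious what that filter amounts to, here is a rough Python sketch. It is not Google's code: the function name is_html_response and the plain dict of headers are my own assumptions, and I am guessing that the check is a simple case-sensitive prefix match on the header value.

```python
def is_html_response(headers):
    """Return True if the Content-Type value starts with 'text/html'.

    `headers` is assumed to be a dict mapping header names to values.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so normalize the name.
        if name.lower() == "content-type":
            # Prefix test: "text/html; charset=utf-8" passes as well.
            return value.startswith("text/html")
    # No Content-Type header at all: the document is skipped.
    return False


print(is_html_response({"Content-Type": "text/html; charset=utf-8"}))
print(is_html_response({"Content-Type": "application/xhtml+xml"}))
```

Note that a prefix test like this would leave out pages served as application/xhtml+xml, as well as anything sent without a Content-Type header.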
They covered things like Pages, Classes, Headers, Metadata and Editors.
--------------
Blake Carver
LISNews.org
Librarian & Information Science News
http://www.lisnews.org