[Web4lib] Web Authoring Statistics From 1 Billion Pages

Wed Jan 25 15:38:47 EST 2006

Just saw this on Slashdot:
http://code.google.com/webstats/index.html

Google did an analysis of a sample of slightly over a billion documents,
extracting information about popular class names, elements, attributes, and 
related metadata.
The parser looked only at documents whose HTTP headers included a
Content-Type header with a value that started with the nine characters
text/html.
They covered things like Pages, Classes, Headers, Metadata, and Editors.
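
For anyone curious what that Content-Type filter might look like in practice, here is a rough sketch in Python. This is only my illustration, not Google's code: the function name is_html_document and the dict-of-headers input are made up for the example. I'm assuming a strict, case-sensitive prefix match on the value, since the description only says it "started with the nine characters text/html". Header names are matched case-insensitively, as HTTP allows.

```python
HTML_PREFIX = "text/html"  # the nine characters mentioned in the report


def is_html_document(headers):
    """Return True if the headers carry a Content-Type starting with text/html.

    `headers` is a hypothetical dict mapping header names to values.
    """
    for name, value in headers.items():
        # HTTP header field names are case-insensitive.
        if name.lower() == "content-type":
            # Assumption: a plain prefix test, so parameters such as
            # "; charset=utf-8" after the media type are still accepted.
            return value.startswith(HTML_PREFIX)
    # No Content-Type header at all: the document would be skipped.
    return False


if __name__ == "__main__":
    print(is_html_document({"Content-Type": "text/html; charset=utf-8"}))
    print(is_html_document({"Content-Type": "application/xhtml+xml"}))
```

Note that a pure prefix test like this would skip pages served as application/xhtml+xml, and would accept any value that merely begins with text/html.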

--------------
Blake Carver
LISNews.org
Librarian & Information Science News
http://www.lisnews.org