[WEB4LIB] trivial question about size of largest web page
Tony Barry
me at Tony-Barry.emu.id.au
Wed Jul 11 23:42:36 EDT 2001
At 8:37 AM -0700 11/7/01, Gillian Wiseman wrote:
>I had a patron ask me a trivial question out of curiosity; what is the
>largest website (his definition is most number of pages) in existence?
What is a site?
---------------
Think of multihoming.
Is it all pages which can be retrieved from a given host name?
Or is it all pages that can be retrieved from a given IP address
and all the host names that point at it?
Think of server farms where multiple machines serve one IP address.
Is it one host in that farm or all of them?
Think of sites with a combined index where a central point serves up
disparate pages from multiple institutions, e.g.
http://www.pictureaustralia.org/about.html
Do you count the combined "site" as well as the
individual ones?
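To make the host name versus IP address point concrete, here is a rough sketch in Python. It assumes a made-up name-to-address table in place of real DNS lookups, and the host names, addresses and URLs are all hypothetical. The same four pages give a different "largest site" depending on which definition you pick.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical name-to-address table standing in for DNS.
# Every name and address here is invented for illustration.
HYPOTHETICAL_DNS = {
    "www.example.org": "192.0.2.10",
    "library.example.org": "192.0.2.10",  # second name on the same address
    "archive.example.net": "192.0.2.20",
}

# Hypothetical pages found by some crawl.
URLS = [
    "http://www.example.org/index.html",
    "http://www.example.org/about.html",
    "http://library.example.org/catalogue.html",
    "http://archive.example.net/index.html",
]

def group_pages(urls, key):
    """Group URLs into 'sites' using key(host_name) as the site identity."""
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname
        groups[key(host)].append(url)
    return dict(groups)

def largest(groups):
    """Return the (site, pages) pair with the most pages."""
    return max(groups.items(), key=lambda item: len(item[1]))

# Definition 1: a site is a host name.
by_host = group_pages(URLS, key=lambda host: host)
# Definition 2: a site is an IP address and every name that points at it.
by_ip = group_pages(URLS, key=lambda host: HYPOTHETICAL_DNS[host])

print("largest by host name:", largest(by_host))
print("largest by IP address:", largest(by_ip))
```

Neither grouping is the right one; the sketch only shows that the answer moves with the definition.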
What is size?
-------------
Numbers of what? What is the metric?
HTML pages mounted?
Files mounted including graphics? (In which case sites
with graphical navigational systems would count as larger
than those that used text.)
Storage capacity? Graphic, video and audio sites would
win out.
Bits or bytes? The same problem applies.
Dynamic elements
Server side includes can introduce changes to each page
and it may change each time it is displayed. Do you count
the base information or do you add the SSI elements as
separate files?
What about sites which are generated by a database - and
nearly all really big sites would be. Do you count
the database elements (in which case the Library of Congress
catalog is pretty big :-) or do you count the pages that
could be displayed - effectively infinite?
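A small sketch of the metric problem, again in Python. The three site inventories below are invented for illustration (file names, kinds and byte sizes are all assumptions), and it only covers the static metrics, not SSI or database-generated pages. Each metric crowns a different "largest" site.

```python
# Hypothetical inventories: each file is (name, kind, size in bytes).
# All figures are made up purely to illustrate the point.
SITES = {
    "text-heavy": [
        ("a.html", "html", 4000),
        ("b.html", "html", 5000),
        ("c.html", "html", 3000),
    ],
    "graphic-nav": [
        ("index.html", "html", 2000),
        ("nav1.gif", "image", 1500),
        ("nav2.gif", "image", 1500),
        ("nav3.gif", "image", 1500),
        ("nav4.gif", "image", 1500),
    ],
    "video": [
        ("index.html", "html", 1000),
        ("clip.mpg", "video", 5_000_000),
    ],
}

def html_pages(files):
    """Count only HTML pages."""
    return sum(1 for _, kind, _ in files if kind == "html")

def all_files(files):
    """Count every mounted file, graphics included."""
    return len(files)

def total_bytes(files):
    """Sum the storage used by all files."""
    return sum(size for _, _, size in files)

def largest_by(metric):
    """Name of the site that scores highest under the given metric."""
    return max(SITES, key=lambda name: metric(SITES[name]))

for metric in (html_pages, all_files, total_bytes):
    print(metric.__name__, "->", largest_by(metric))
```

With these made-up numbers the text site wins on HTML pages, the graphical site on file count, and the video site on storage.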
The question has no sensible answer because the words used have no
clear meaning in the context of the internet. It's a hangover from
thinking of information as held in an artifact. "What is the biggest
book?" is not all that clear either ;-]
I think it's best to think of the internet as a single entity.
Silk from a Sow's Ear: Extracting Usable Structures from the Web
http://www.acm.org/sigchi/chi96/proceedings/papers/Pirolli_2/pp2.html
Graph structure in the web http://www.almaden.ibm.com/cs/k53/www9.final/
You can still define or localize related areas of information and
often you might call this a site, which as we have seen above may not
be a whole host, or only one host, or even at one institution. It may deliver
its files through the mediation of many different kinds of software
both at the server and client end to generate a variety of pages as
viewed which did not exist until the browser called them up.
The sense of "size" is not given by the metrics above either but it
is dependent on the variety of user experiences available to the
reader/viewer/client.
It is the end user who determines the "size" and what a "site" might
be as much as, and perhaps more than, the content at the server end.
Tony
--
phone +61 2 6241 7659
mailto:me at Tony-Barry.emu.id.au
http://purl.oclc.org/NET/Tony.Barry