[WEB4LIB] trivial question about size of largest web page

Tony Barry me at Tony-Barry.emu.id.au
Wed Jul 11 23:42:36 EDT 2001


At 8:37 AM -0700 11/7/01, Gillian Wiseman wrote:
>I had a patron ask me a trivial question out of curiosity; what is the
>largest website (his definition is most number of pages) in existence?

What is a site?
---------------

Think of multihoming.

	Is it all pages which can be retrieved from a given host name?

or	all pages that can be retrieved from a given IP address
	and all the host names that point at it?

Think of server farms where multiple machines serve one IP address.

	Is it one host in that farm or all of them?

Think of sites with a combined index where a central point serves up 
disparate pages from multiple institutions eg 
http://www.pictureaustralia.org/about.html

	Do you count the combined "site" as well as the
	individual ones?
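
A rough sketch of how the answer shifts with the definition, in Python.
The URLs and the host-to-IP table below are invented for illustration
(a real count would need a crawl and DNS lookups), and the function
names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical crawl results: four pages on three host names.
PAGES = [
    "http://www.example.org/index.html",
    "http://www.example.org/about.html",
    "http://library.example.org/catalog.html",
    "http://www.example.net/index.html",
]

# Hypothetical DNS table: two host names share one IP address.
HOST_TO_IP = {
    "www.example.org": "192.0.2.10",
    "library.example.org": "192.0.2.10",
    "www.example.net": "192.0.2.20",
}


def pages_per_host(urls):
    """Definition 1: a site is everything under one host name."""
    return Counter(urlsplit(u).hostname for u in urls)


def pages_per_ip(urls, host_to_ip):
    """Definition 2: a site is everything served from one IP address."""
    return Counter(host_to_ip[urlsplit(u).hostname] for u in urls)


by_host = pages_per_host(PAGES)
by_ip = pages_per_ip(PAGES, HOST_TO_IP)

print("largest by host name:", by_host.most_common(1)[0])
print("largest by IP address:", by_ip.most_common(1)[0])
```

With this made-up data the "largest site" has two pages under the first
definition and three under the second, from the same set of files.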

What is size?
-------------

Numbers of what? What is the metric?

	HTML pages mounted?

	Files mounted including graphics? (In which case sites
	with graphical navigational systems would count as larger
	than those that used text.)

	Storage capacity? Graphic, video and audio sites would
	win out.

	Bits or bytes? The same problem applies
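
To show how the choice of metric changes the winner, here is a minimal
sketch in Python. The three site inventories, file names and sizes are
all invented, and the metric functions are hypothetical names chosen
for this example.

```python
# Hypothetical inventories: site name -> list of (file name, size in bytes).
SITES = {
    "text-site": [("a.html", 4000), ("b.html", 5000), ("c.html", 3000)],
    "graphic-site": [
        ("index.html", 6000),
        ("nav1.gif", 20000),
        ("nav2.gif", 20000),
        ("logo.gif", 15000),
    ],
    "video-site": [("index.html", 2000), ("clip.mpg", 50000000)],
}


def html_pages(files):
    """Count only HTML pages."""
    return sum(1 for name, _ in files if name.endswith((".html", ".htm")))


def all_files(files):
    """Count every file, graphics included."""
    return len(files)


def total_bytes(files):
    """Sum the storage used by all files."""
    return sum(size for _, size in files)


METRICS = {
    "html pages": html_pages,
    "all files": all_files,
    "bytes": total_bytes,
}

# For each metric, find the site that scores highest under it.
winners = {
    label: max(SITES, key=lambda site: metric(SITES[site]))
    for label, metric in METRICS.items()
}

for label, site in winners.items():
    print(f"largest by {label}: {site}")
```

Each metric picks a different "largest" site from the same inventories,
which is the difficulty with the question as asked.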

Dynamic elements

	Server side includes can introduce changes to each page
	and it may change each time it is displayed. Do you count
	the base information or do you add the SSI elements as
	separate files?

	What about sites which are generated by a database - and
	nearly all really big sites would be. Do you count
	the database elements (in which case the Library of Congress
	catalog is pretty big :-) or do you count the pages that
	could be displayed - effectively infinite?

The question has no sensible answer because the words used have no 
clear meaning in the context of the internet. It's a hangover from 
thinking of information as held in an artifact. What is the biggest 
book is not all that clear either ;-]

I think it's best to think of the internet as a single entity.

Silk from a Sow's Ear: Extracting Usable Structures from the Web 
http://www.acm.org/sigchi/chi96/proceedings/papers/Pirolli_2/pp2.html

Graph structure in the web http://www.almaden.ibm.com/cs/k53/www9.final/

You can still define or localize related areas of information and 
often you might call this a site, which as we have seen above may not 
be a whole host, or one, or even at one institution. It may deliver 
its files through the mediation of many different kinds of software 
both at the server and client end to generate a variety of pages as 
viewed which did not exist until the browser called them up.

The sense of "size" is not given by the metrics above either; it 
depends on the variety of user experiences available to the 
reader/viewer/client.

It is the end user who determines the "size" and what a "site" might 
be as much as, and perhaps more than, the content at the server end.

Tony

-- 
phone  +61 2 6241 7659
mailto:me at Tony-Barry.emu.id.au
http://purl.oclc.org/NET/Tony.Barry

