[Web4lib] Another Google question

Lars Aronsson lars at aronsson.se
Wed Jul 6 14:53:16 EDT 2005


Patricia F Anderson wrote:

> I teach a class on advanced Internet searching, and
> focus on the concept of "match the tool to the task".

Good for you, and your students.  The problem is that web 
searching isn't very advanced yet, and there is little point in it 
even trying to be so, because the web isn't very advanced yet.

Not too long ago I saw the movie "The Aviator" about Howard Hughes 
who founded Hughes Aircraft in 1932.  I think air flight then is 
comparable to where the Internet is today.  It has a 30-year 
history (ARPAnet 1969; Wright brothers 1903) and a century-long 
prehistory (library science; balloon flight). It is promising and 
has a lot of future in it, but it is just about to leave wood and 
cloth behind it for all-metal airplanes.  Here is a six page 
article from September 1931 on choosing the right kind of wood for 
building aircraft, http://runeberg.org/tektid/1931a/0495.html
That could then be called "advanced aircraft material testing", 
but a decade later wood was no longer an "aircraft material".

Coming back to the Internet, that article from 1931 is online 
because I scanned years 1871, 1872 and 1931-1934 of that magazine. 
But the 58 years in between are still missing. That is an "almost 
indefensible" gap. Not to mention the still copyrighted years 
1935-1994 that are missing, until the magazine itself appeared 
online in 1995.  So what do you get if you "find all web pages"?  
You get my scanned six years plus the magazine's own ten years 
online, out of the total 134 years that this magazine has existed. 
That is almost 12 percent.  Suppose that Google has indexed half 
of what's online, then Google will find 6 percent of what's been 
published in the magazine.  At best Google could achieve 12 
percent by tweaking its search engine.  By promoting scanning 
projects, Google could find all text from the first half of this 
magazine's publishing history, but the recent 70 years might have 
copyright problems.
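
The arithmetic above can be checked with a few lines of Python.  
This is only a rough sketch: the year counts are the ones given in 
this message, every year of the magazine is assumed to weigh the 
same, and the function name coverage and the indexed_share 
parameter are made up for illustration.

```python
# Year counts as stated in the message; equal weight per year is an assumption.
scanned_years = 2 + 4          # 1871, 1872 and 1931-1934
publisher_online_years = 10    # the magazine's own years online
total_years = 134              # total years the magazine has existed


def coverage(online_years, total, indexed_share=1.0):
    """Fraction of the publishing history that a search engine can reach."""
    return online_years / total * indexed_share


online = scanned_years + publisher_online_years
best_case = coverage(online, total_years)                         # everything online is indexed
half_indexed = coverage(online, total_years, indexed_share=0.5)   # the supposed half

print(f"online: {best_case:.1%}, found with half indexed: {half_indexed:.1%}")
```

Changing indexed_share shows how little a better search engine can 
gain compared to putting more years online.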

Suppose you can write a clever Google query for building aircraft 
out of materials other than metal.  Why would you go through 900 
web hits, when it is so obvious that most of the knowledge is not 
going to be available on the web at all?  There are so many stages 
of omission other than that of Google's hit list.  In fact, of all 
knowledge, most has never been published in print at all.  Which 
is why we need people to write blogs and contribute to Wikipedia 
and similar projects, to get more knowledge online.


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se

