[Web4lib] Data mining

Thu Feb 23 17:11:11 EST 2006

An issue came up twice yesterday and I was wondering what other people's
thoughts are on it. The first incident happened when we were notified by a 
vendor that they had denied access to one of our IPs because their "systems 
detected a systematic downloading" of their web content. We identified the 
user and learned it was a researcher who is downloading large collections 
of articles from various vendors on a very broad topic. He then runs servers
that use complex algorithms to search the full text of the articles and
determine which ones are pertinent to his research.

A few hours after this incident I saw a demo of a new product for science 
researchers called QUOSA. Within the software you connect to a database, do 
a search, and then select which articles you want to download. You can 
download as many articles as you want in the results list (as long as your 
institution has a subscription to the online journal). It will then 
download the articles and organize them. It also has a feature that allows 
you to search your local collections.

These two scenarios raised some ethical questions for me. Most (if not all)
journal vendors state that they do not allow data mining, but in both cases
the users are downloading these articles to get functionality that many of
the databases do not offer, i.e. the ability to index and search the full
text of an article, not just the abstract and citation information. Have
any of you come across this with any of your users? If so, how did you
handle it with the users and the vendors?

Michelle

Michelle Frisque
Head, Information Systems, Galter Health Sciences Library
Northwestern University, Chicago, IL
312-503-7074 voice / 312-503-1204 fax
mfrisque at northwestern.edu