[Web4lib] Data mining
Michelle Frisque
mfrisque at northwestern.edu
Thu Feb 23 17:11:11 EST 2006
An issue came up twice yesterday, and I was wondering what other people's
thoughts are on it. The first incident happened when we were notified by a
vendor that they had denied access to one of our IPs because their "systems
detected a systematic downloading" of their web content. We identified the
user and learned it was a researcher who is downloading large collections
of articles on a very broad topic from various vendors. He then runs them
through servers that use complex algorithms to search the full text of the
articles to determine which ones are pertinent to his research.
A few hours after this incident, I saw a demo of a new product for science
researchers called QUOSA. Within the software you connect to a database, do
a search, and then select which articles you want to download. You can
download as many articles as you want in the results list (as long as your
institution has a subscription to the online journal). It will then
download the articles and organize them. It also has a feature that allows
you to search your local collections.
These two scenarios raised some ethical questions for me. Most (if not all)
journal vendors state that they do not allow data mining. In both cases,
however, the users are downloading these articles to get functionality
that is not available within many of the databases, i.e., the ability to
index and search the actual full text of an article, not just the abstract
and citation information. Have any of you come across this with any of
your users? If so, how did you handle it with the users and the vendors?
Michelle
Michelle Frisque
Head, Information Systems, Galter Health Sciences Library
Northwestern University, Chicago, IL
312-503-7074 voice / 312-503-1204 fax
mfrisque at northwestern.edu