[WEB4LIB] Re: web searchable full text journal list database

Jian Liu jiliu at script.lib.indiana.edu
Thu Feb 25 13:13:25 EST 1999


Hello again. I'll try to finish it this time.

Data collection and update:

Most of our databases have title lists. Some of them are very
straightforward: you just need to download the list, grab the pieces you
need and put them in the flat ascii file according to the structure of the
file. Lists from OVID, PEAK, JSTOR, Project Muse, WNC, ACM, ACS, are
easily done. Later updates will not cause much trouble, since they are
relatively stable. For example, there has been a new update for JSTOR,
which I just applied using Links' built-in modification/addition
functions.

The hard pieces are Education Abstracts, ABI, Ebsco and Lexis-Nexis. But
once you figure out how to extract the data, it becomes fairly easy. For
example, take Education Abstracts with Full Text. You need to download the
list. (I use: lynx -source http://www.hwwilson.com/journals/aedi.HTM >file)
Then you need to analyze the data. Each record looks like this:

<tr>
<td WIDTH="35%" ALIGN="left" VALIGN="top"><small>Action in Teacher
Education</small></td>
<td WIDTH="15%" ALIGN="left" VALIGN="top"><small>0162-6620</small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small>01/1983</small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small> </small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small>01/1996</small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small> </small></td>
<td WIDTH="30%" ALIGN="left" VALIGN="top"><small> </small></td>
</tr>

It has a structure. So what you do is figure out a way to extract the
pieces you need from this record. For my purpose, I extract the title and
the Full Text Start/End Date. There are many ways to do this. I use a perl
script.
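As a sketch of the idea (in Python here rather than Perl; I am assuming, from the record above, that the first cell is the title, the second the ISSN, and the fifth the Full Text Start Date):

```python
import re

# One record from the Education Abstracts title list, as shown above.
record = """<tr>
<td WIDTH="35%" ALIGN="left" VALIGN="top"><small>Action in Teacher
Education</small></td>
<td WIDTH="15%" ALIGN="left" VALIGN="top"><small>0162-6620</small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small>01/1983</small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small> </small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small>01/1996</small></td>
<td WIDTH="5%" ALIGN="left" VALIGN="top"><small> </small></td>
<td WIDTH="30%" ALIGN="left" VALIGN="top"><small> </small></td>
</tr>"""

# Pull the contents of each <small>...</small> cell out of the record.
cells = re.findall(r"<small>(.*?)</small>", record, re.DOTALL)
# Collapse whitespace (titles may wrap across lines) and blank cells.
cells = [" ".join(c.split()) for c in cells]

title = cells[0]            # first cell: journal title
issn = cells[1]             # second cell: ISSN
fulltext_start = cells[4]   # fifth cell: full-text start date (assumed)
print("|".join([title, issn, fulltext_start]))
```

Run over the whole downloaded file, a loop like this turns each table row into one delimited line ready for the flat ASCII file.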

The same goes for ABI, with some more twists and turns and some hair pulling. I
am no programmer, but I can manage to get most of the work done, and if I
am stuck, I ask. There are quite a few more capable helping hands around.
I rely on them whenever the task becomes too much for me.

Ebsco is trickier. First, there are almost 80 individual files for
Academic Search FullTEXT Elite alone, and the code is not very clean either.
Take a look yourself and you'll see what I mean. But again, it has patterns
and once you figure out the patterns, extracting data becomes easy.
Again, I use lynx to grab the data:
lynx -source http://www.epnet.com/maglists/html/af_h1.htm >>file
lynx -source http://www.epnet.com/maglists/html/af_h2.htm >>file
...

Notice the >>, which allows you to download all 78 files and put
them in one big file (appending, instead of overwriting).
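Rather than typing out 78 lynx commands, the loop can be scripted. A sketch in Python (the URL pattern comes from the two lines above; the function name and output filename are just illustrative):

```python
from urllib.request import urlopen

# URL pattern for the Academic Search FullTEXT Elite title-list pages.
BASE = "http://www.epnet.com/maglists/html/af_h{}.htm"

# All 78 pages of the list.
urls = [BASE.format(n) for n in range(1, 79)]

def harvest(urls, outfile):
    """Fetch every page and append it to one big file --
    the equivalent of lynx -source ... >>file in a loop."""
    with open(outfile, "ab") as out:
        for url in urls:
            out.write(urlopen(url).read())

# harvest(urls, "ebsco_titles.html")   # uncomment to actually download
```

The "ab" (append) mode plays the role of >> in the shell version.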

Same with Lexis-Nexis. 

About Updating:
You have to rely on the vendors of the databases to keep the lists up to
date. Take Lexis-Nexis, for example: when I started the project, the list
at the Lexis-Nexis site had been updated on Feb 9, 1999. I grabbed that
list, processed it, and made it available around Feb 18. The list was
later updated on Feb 23, and I updated mine the same day. The updating took me
about an hour, and with more experience, I can probably combine several
steps and reduce the time needed to about 15 minutes. Other lists are
less frequently updated. ABI was last updated in Jan. Education Abstracts
posts a new date nearly every day, but for the past week there has been
no change at all in the fulltext list. My concern is what happens when
a vendor changes the database but does not update the list, posting a
separate change list instead. That will create a lot of headaches. I
don't know how to deal with this situation yet. Up to now, I am only
grabbing the complete list.

So with the current setup, if I do a monthly update, it will probably take
me about 3 to 5 hours. When I have time, I'll write out all the steps in
detail so that other people will be able to take over the project. For
now, I am the only person doing the whole thing.

There are other database vendors whose title lists are not readily
available. These I have to omit.

Future plans? A few. For example, adding titles freely available from
the internet; not the ones that publish one or two articles in an issue
and then disappear, but real ones, like D-Lib. Exploring MySQL so that the
data will sit in a real database, to increase the speed of searching.
Setting it up so that my colleagues can help me manage a section of it by
adding individual titles in their areas of study. Ideas? Suggestions? Comments?

Jian Liu
Indiana University Libraries
> 
> Dear all,
> 
> I have received quite a few inquiries since I posted the following message
> to this list. Most are interested in how it was done, others ask about
> management issues, such as updates. Based on what I read, this might be a
> topic of general interest. So I decided to post the reply here to reach a
> broader audience. If you have further questions, I'll be glad to answer
> them.
> 
> First of all, our library has been interested in something like this for
> quite some time now. We even had some meetings about how and who to do it.
> Everybody agrees that it has great use, not only to patrons, but also to 
> librarians, to Reference, to Interlibrary Loan, etc. But the stopper is,
> again, how and who.
> 
> Remember Peter Scott's post about his Electronic Journals Resource
> Directory about 2 or 3 months ago? I started playing with Links after
> that. Thank you, Peter. This program is best suited for maintaining an
> internet resources page. It allows for quick and easy addition/deletion,
> modification and verification of links. 
> 
> I first built our Internet Quick Reference with it, at:
> http://www.indiana.edu/~librcsd/internet/ and it has been maintained by a
> colleague of mine here since then. I began to realize that this program
> can do a lot of similar things. So I made some modifications to the code
> and came up with The Parliament of Australia: A Bibliography, at:
> http://www.indiana.edu/~librcsd/bib/australia_parliament/ and another
> colleague here took it over and has been maintaining/updating it. You can
> see big differences between the two, and the major difference is that it
> has no external links. From there to the current Locating Online Fulltext
> Journals and Newspapers at http://www.indiana.edu/~librcsd/fulltext/ was
> just a small step. 
> 
> More about Links: it is not a database program. It stores data in a flat
> ascii file, and it builds static html pages, as you can see from the
> browsing part of the page, which no longer needs the program after it is
> built. Only the searching part is dynamic, which is slow, as you have
> experienced. The number of records (over 11,000) is probably close to
> its upper limit now. 
> 
> The trickiest part of the project is data collection and updating. I'll
> come back to it next time.
> 
> Best,
> 
> Jian Liu
> Indiana University Libraries
> 
> > 
> > We just finished one very similar to your description, at:
> > http://www.indiana.edu/~librcsd/fulltext/
> > 
> > Jian
> > Indiana University Libraries
> > 
> > > 
> > > 
> > > In order to provide better access to our increasing collection of
> > > electronic journal resources (Academic Universe, SearchBank, Proquest,
> > > Project Muse, JStor, etc.) we are considering creating a web-searchable
> > > database of all the full text journals titles we have access to.  The
> > > database would include the titles and, if available, the dates of full
> > > text coverage. Patrons could perform a title search and, if we have
> > > e-access, get a display with the title, dates of full text coverage and
> > > a hot link to the vendor's database from which we access the journal. 
> > > 
> > > Although we catalog in our OPAC journals we access from smaller full
> > > text collections like JStor and Project Muse, it is not practical in
> > > many instances to have bib records in our OPAC for journals from larger
> > > collections. Issues of staff time and the stability of journal
> > > availability tend to prohibit such projects.
> > > 
> > > Our database would likely be built from the journal lists that the
> > > various vendors provide from their sites. The biggest problem I can
> > > foresee would be extracting the information we want from each of the
> > > vendors' individually formatted journal lists. Of course, maintaining the
> > > currency of the database will be a chore as well. I'm sure this isn't an
> > > original idea, but I'm wondering if anyone else has attempted this and
> > > what the outcome was? 
> > > 
> > > Thanks
> > > -- 
> > > David Vose
> > > Binghamton University Libraries
> > > (607) 777-4907
> > > 
> > 
> 
