SUMMARY: listserv archives to web

Jian Liu jiliu at nickel.ucs.indiana.edu
Mon Apr 29 18:06:47 EDT 1996


Hi all,

On March 18, 1996, I posted the following question to the list:

Could somebody recommend a FREE program that can convert listserv archives
to the web, so that the results can be browsed and searched? 

After several people replied, I posted a second question:

Does anyone know if there is a program that can convert the listserv archive
format (the mainframe type) to unix mbox format (RFC 822)? This is related
to the question I sent here earlier today about finding a program that can
provide web browsing/searching of listserv archives. Several of you have
suggested using hypermail. But according to the installation documentation
of hypermail, it supports unix mailbox format only. So I am thinking of
converting the listserv archive format (which separates messages with 73
equal signs) to this format first, and then trying hypermail.

I didn't receive responses to my second question.
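
For anyone facing the same conversion, here is a rough sketch of the idea
in Python. It is not the script I actually used, only an illustration under
some assumptions: that a separator is a line consisting of exactly 73 equal
signs, that each chunk between separators starts with ordinary mail headers,
and that a placeholder envelope date is acceptable when the Date header
cannot be parsed. The function names (split_messages, to_mbox) are made up
for this example.

```python
import email
import email.utils

SEPARATOR = "=" * 73  # assumed to be the entire separator line


def split_messages(text):
    """Yield the raw text of each message in a separator-delimited archive."""
    current = []
    for line in text.splitlines():
        if line.rstrip() == SEPARATOR:
            if any(l.strip() for l in current):
                yield "\n".join(current).strip("\n") + "\n"
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        yield "\n".join(current).strip("\n") + "\n"


def to_mbox(text):
    """Return the archive as one mbox-style string."""
    out = []
    for raw in split_messages(text):
        msg = email.message_from_string(raw)
        sender = email.utils.parseaddr(msg.get("From", ""))[1] or "MAILER-DAEMON"
        try:
            stamp = email.utils.parsedate_to_datetime(msg["Date"]).ctime()
        except (TypeError, ValueError):
            # Placeholder for a missing or unparsable Date header (assumption).
            stamp = "Thu Jan  1 00:00:00 1970"
        # Each mbox message starts with a "From sender date" envelope line.
        out.append(f"From {sender} {stamp}\n")
        for line in raw.splitlines():
            # Escape lines that would otherwise look like a new envelope line.
            if line.startswith("From "):
                line = ">" + line
            out.append(line + "\n")
        out.append("\n")  # blank line between messages
    return "".join(out)
```

Real archives will need more care than this (odd headers, long files read
in pieces), so treat it as a starting point only.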

The questions and answers are archived at:
http://www.lib.berkeley.edu/Web4Lib/archive/9603/0158.html
http://www.lib.berkeley.edu/Web4Lib/archive/9603/0166.html

http://www.lib.berkeley.edu/Web4Lib/archive/9603/0165.html
   <Thank you, Chris Adams (chris at sparkie.osl.state.or.us)>
http://www.lib.berkeley.edu/Web4Lib/archive/9603/0174.html
   <Thank you, John McKay (j.mckay at rave.ac.uk)>
http://www.lib.berkeley.edu/Web4Lib/archive/9603/0196.html
   <Thank you, Prentiss Riddle (riddle at is.rice.edu)>

Several people asked for a summary. I delayed it until now for
two reasons: I finished the project only last week, and I have been
waiting for the list owner to update his web page and make the
announcement to his list first.

Now's the time for the summary.

The list in question is VICTORIA. Web access to the archives is now
at: http://www.indiana.edu/~libref/victoria/

Put simply, for browsing I used hypermail; for searching I used Isearch.
(Links to both programs are available from the above site.)

Some technical details:

1. I tried MHonArc as well, but it dumped core whenever there was an
   anomalous mail header, specifically in the date format. I guess the
   Perl script I asked somebody to write for me was not sophisticated
   enough, for one thing, but you wouldn't believe all the variations
   in the mail headers. No wonder a good mailer is hard to find. :-)

2. More about the date line of the mail header. hypermail would just
   sit there, doing nothing, if there were some extremely bad values in
   the date line. I discovered this after extensive searching and hair
   pulling. To give you a taste: one email message had the year 1904, and
   hypermail was confused and didn't know what to do. There were about
   five or six messages with 1904 as the year. There was another one with
   the year 2010. What can you do? :-) Many of the messages have GMT+1000,
   GMT+5000, etc. to indicate the timezone, which would confuse MHonArc
   completely. Some dates are still not right.

3. I still don't understand what the difference is between thread and
   subject. :-)

4. Isearch is a very good search engine, but it doesn't seem to be able
   to handle a database as big as the VICTORIA archives. I tried to
   index the whole thing as one searchable database. It took Iindex more
   than 36 hours to index only 2/3 of it. Based on my calculation, if
   I had let it run to the finish, it would have taken at least another
   26 hours. I finally decided to stop indexing the complete thing
   and provided a separate search for each of the annual archives.

5. Using hypermail to process list archives is not a very economical
   approach: it splits a mailbox file into individual messages, thus
   generating hundreds of small files and doubling the hard drive space
   required.
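
In hindsight, screening the Date headers before running hypermail or
MHonArc would have saved a lot of the hair pulling described in items 1
and 2. Below is a minimal sketch of such a check in Python. It is not what
I ran; the function name date_problem and the year range passed to it are
hypothetical, and the range has to be chosen by whoever knows when the list
was active. It flags a missing or unparsable date, a numeric offset of more
than 14 hours (such as +5000), a zone name glued to an offset (such as
GMT+1000), and a year outside the expected range.

```python
import email.utils
import re


def date_problem(date_header, min_year, max_year):
    """Return a short description of what looks wrong, or None if plausible."""
    if not date_header or not date_header.strip():
        return "missing Date header"
    value = date_header.strip()

    # A trailing numeric offset such as +5000 claims 50 hours from GMT.
    offset = re.search(r"[+-](\d{2})\d{2}$", value)
    if offset and int(offset.group(1)) > 14:
        return "implausible offset"

    # Forms such as GMT+1000 glue a zone name onto a numeric offset.
    if re.search(r"[A-Za-z]+[+-]\d{4}$", value):
        return "nonstandard zone"

    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        return "unparsable date"

    year = parsed[0]
    if not (min_year <= year <= max_year):
        return f"implausible year {year}"
    return None
```

Messages that come back with anything other than None can be listed and
fixed by hand, or set aside, before the archive goes to the converter.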

If you have further questions, please let me know.

Jian
Indiana University Libraries

