[WEB4LIB:14750] Converting text to HTML

Paul F. Schaffner pfs at umich.edu
Fri Aug 7 16:10:55 EDT 1998


On Fri, 7 Aug 1998, Walter W. Giesbrecht wrote:

> What we need to know is: how can we convert this ASCII file into
> HTML and make all the embedded URLs active links? 

I'm not sure how much of the record you want to retain in HTML
form, or how you want to format it (anything's possible). But
if all you want to do is extract the URLs from the records and
convert them into active links for testing, any text editor that
supports regular expressions would do it in seconds. A simple Perl
script like this one would be even faster, regardless of platform
(I'm only capable of the simplest ones myself):

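#!\apps\Perl\bin\Perl.exe
# Print every URL that begins a line as an HTML link.
while (<>) {
    # Match from "http:" at the start of the line to the first whitespace.
    if (m,^(http:\S+),) {
        print "<p><a href=\"$1\">$1</a></p>\n";
    }
}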

You'd have to modify this depending on whether and where newlines 
have been inserted in the record, or how much more of the record
you want to retain. This example assumes that you wish to retain
only the URLs themselves and that all URLs begin on new lines, 
have no line breaks or spaces in them, and begin with http:. 

If the records are in multiple files, just concatenate them as 
you go, running the script like this:

perl extract-URLs.pl *.rec >> URL-list.html
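
If you would rather keep the whole record and simply make the URLs
inside it clickable, something along these lines might serve as a
starting point. It is only a sketch under some assumptions of mine:
that a URL runs from "http:" to the next whitespace (so trailing
punctuation would be swallowed into the link), that wrapping the
output in <pre> is an acceptable way to keep the record layout, and
that the helper name linkify and the file names below are made up
for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# linkify() is a hypothetical helper name. It takes one line of text and
# returns it with HTML-special characters escaped and URLs made into links.
sub linkify {
    my ($line) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

    # Escape special characters first so record text cannot break the markup.
    $line =~ s/([&<>"])/$entity{$1}/g;

    # Assumption: a URL starts with "http:" and ends at the next whitespace.
    $line =~ s,(http:\S+),<a href="$1">$1</a>,g;

    return $line;
}

# Only read input when file names are given on the command line.
if (@ARGV) {
    print "<pre>\n";
    while (my $line = <>) {
        print linkify($line);
    }
    print "</pre>\n";
}
```

It would be run much like the first script, for example:

perl linkify.pl *.rec > records.html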

But perhaps I've misunderstood the problem.
--------------------------------------------------------------------
Paul Schaffner | pfs at umich.edu | http://www-personal.umich.edu/~pfs/
SGML Production Coordinator, Middle English Compendium ('the e-MED')
301 Hatcher Library North, Univ. of Mich., Ann Arbor MI 48109-1205


