Summary of responses to query on Link/URL checking software for library catalogues

Allison Diaz-Forte findallison at yahoo.com
Thu Oct 17 09:45:14 EDT 2002


Dear Colleagues,

I forgot that there is an attachment constraint in effect, so I am pasting the summary of the responses into the body of this message. Thanks to all the respondents.

All the best,

Allison Diaz-Forte

 
AUTOMATED LINK/URL CHECKING
Summary of responses, and information gained after posting my query about LINK/URL MONITORING PROGRAMS to WEB4LIB

 

Special thanks to those who responded: 

Lisa Hanson O'Hara oharalh at cc.UManitoba.CA

Sherry Davids sdavids at nal.usda.gov 

 Michael Lindsey mlindsey at law.berkeley.edu, and also 

Ed Summers esummers at odu.edu for his very useful web page on the subject.





The Problem:  We have all had the experience of discovering that links often change or, if they don't, that the content located at their addresses changes.  How do libraries manage the first problem: monitoring the status of URLs in the 856 or other link fields in their catalogues?



Solutions: If you own the Voyager system (Endeavor Information Systems), then your system manages your links by running checks on the URLs. You can check links individually while in Cataloging mode, or run a program at a set time and date to get a report that tells you the results of the check.



For the rest of us, most respondents who have automated the task have divided the process into two phases, because there isn't one piece of software to handle the entire task.  Phase I involves extracting the URLs from the fields in which they are located (field 856 in MARC 21 format) and dumping them onto an HTML page.  You have to do this because most automated link checkers are designed to check links in HTML documents, not to trawl through catalogue records.  In the second phase, the link checking program is run across that HTML page.
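
For illustration only, here is a minimal sketch of what a Phase I extraction can look like, written in Perl with the MARC::Batch/MARC::Record modules from CPAN. This is not marcxgen or mungeurl; the file names are placeholders, and it assumes your system can export unblocked MARC records to a file and that each 856 field carries its URL in subfield $u.

#!/usr/bin/perl
# Sketch of Phase I: pull every 856 subfield $u out of a file of
# exported MARC records and write the URLs to a plain HTML page
# that a link checker can crawl.
use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new( 'USMARC', 'catalogue-export.mrc' );

open my $out, '>', 'links.html' or die "Cannot write links.html: $!";
print $out "<html><body>\n";

while ( my $record = $batch->next() ) {
    # a record may carry more than one 856 field
    for my $field ( $record->field('856') ) {
        my $url = $field->subfield('u');    # take the first $u in the field
        next unless defined $url;
        print $out qq{<a href="$url">$url</a><br>\n};
    }
}

print $out "</body></html>\n";
close $out;

The resulting links.html can then be handed to whichever Phase II checker you prefer (Xenu, Linklint, LinkScan, and so on).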



These are some of the programmes mentioned by Ed Summers (esummers at odu.edu) in his page at: http://www.viva.lib.va.us/viva/tech/cat/url-checking.html

First phase: Extracting links from your catalogue



Marcxgen (All Catalogs) Freeware: Marcxgen was created by Tom Tyler (University of Denver), and is the only general purpose utility for generating HTML from MARC records (that this author knows of). Marcxgen will take a group of unblocked MARC records and create an HTML page out of them. Most online systems will let you output MARC records, and very few systems still use blocked MARC records, since the blocked format was designed for use with older magnetic tape systems. You can download the zipped version, which will run in most Windows environments, from the URL below. It is available for free, and Tom Tyler (who is very helpful) can be reached at ttyler at odu.edu.

URL: http://www.du.edu/~ttyler/freeware/marcxgen.zip



Comments: Michael Lindsey, Systems Editor, Boalt Law Library (an Innovative Interfaces library), University of California, runs a combination of Tom Tyler's marcxgen and Xenu.  Marcxgen takes a list of MARC records and creates a single webpage of all the links therein. Then Xenu takes a webpage (the output of marcxgen) and checks all the links.

Xenu can export its results to a tab-delimited text file, which you can manipulate in your favorite spreadsheet. Xenu is also freeware.

URL:  http://home.snafu.de/tilman/xenulink.html



Problems: According to Michael Lindsey, this is not a perfect solution.  Some URLs in the webpage that marcxgen produces end with a period, e.g. "...page.html.", and these periods need to be removed, so the page has to be passed through a search-and-replace function in a text editor.  Also, some of their redirects fail in Xenu.  He is not sure why this happens, but it happens on so few links that it has not thus far warranted a thorough investigation.
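
As one possible way to do that search-and-replace in bulk rather than by hand, a short Perl one-liner can strip the stray trailing periods from the href attributes in the marcxgen output before the page goes to Xenu. The file names below are placeholders, and it assumes ordinary double-quoted href attributes:

perl -pe 's/(href="[^"]*?)\.+"/$1"/g' marcxgen-output.html > links-clean.html

The pattern only removes periods that sit immediately before the closing quote of an href, so periods elsewhere in a URL are left alone.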



Mungeurl (Innovative): Mike Corlee (University of Missouri) wrote this Perl script, which will transform a review list of 856 fields into HTML. As a Perl script, Mungeurl will run in a Unix environment, or on Windows and Mac machines if you have installed Perl (Perl for a variety of platforms can be downloaded for free from CPAN: Perl Ports). Mungeurl isn't currently available for download on the WWW, but Mike Corlee will send you a copy if you contact him at mulmcorle at showme.missouri.edu.

(I have not yet received confirmation that this script is still available.)

Second phase: Running HTML pages through link checking software

Commercial software

Ed Summers notes that the benefit of going with commercial products is that they tend to be easier to set up and install, and some of them generate very useful and attractive reports that can help in the correction of broken URLs.  He lists the following:



Linkbot (Win95/WinNT): Linkbot Pro 4.0, from Tetranet Software, is focused on fast, comprehensive link checking. The company claims it’s no problem to check hundreds of thousands of pages without breaking down. The product also parses HTML and JavaScript code, flagging dozens of varieties of errors. In keeping with the concentrated capabilities of the product, the main display is a collapsible Explorer-like list of URLs that identifies error conditions. Because Linkbot Pro runs as a Windows NT service, it’s easy to schedule regular site scans and reports. If you have one of those sites with hundreds of thousands of objects, the advanced filtering capabilities of Linkbot Pro can help make the task more manageable by limiting the scope. The single-user Linkbot Pro is $295. 

URL: http://www.watchfire.com/products/webqa.asp 

HTML Rename (Win95/WinNT/Mac): this program was initially developed to aid in the transfer of web sites between DOS/Windows, Macintosh, and Unix environments. Part of this package is a link checking utility. It is reasonably priced, and a shareware evaluation version is available. 

URL: http://www.xlanguage.com/products/rename.htm 

Link Alarm (Web Based): This web-based service will email you a detailed report on the broken links in a specified page at the cost of $20/year. Before subscribing you might want to check out some of the freely available web-based services below. 

URL: http://www.linkalarm.com



Link Scan: Along with link tests and missing file scans, LinkScan, from Electronic Software Publishing, checks name tags and references, which are the links within a single object that permit navigation within a file. As many of the other enterprise scale products do, LinkScan can be configured to access Web sites via a proxy server and to analyze dynamic pages, including those that use CGI, Active Server Pages, or database links. LinkScan has facilities for a single user to manage multiple sites, and for an individual site’s responsibilities to be divided among multiple administrators. Error reports can be e-mailed to the individuals associated with a particular site or site subset. LinkScan can report redirected links, which lets the site manager anticipate future broken links and handle them before they become a problem. LinkScan costs $750 per copy.

URL:  http://www.elsop.com



Comments: Sherry L. Davids, AGRICOLA Coordinator at the National Agricultural Library (VTLS, migrating to Voyager), uses LinkScan. A programmer in their systems division has written a program to extract all URLs in 856 fields in bib records and produce a report that includes, along with the URL, the call number of the item, the 035 field (NAL identifier) and the OCLC number.  This report is mounted on the Web, and LinkScan (link checking software) is run to determine how many of the URLs are broken or timed out.  LinkScan produces an online report.  Then, each link is manually checked to determine whether it is indeed broken/timed out or just slow.  For links that are not found, they look up the bib record by OCLC number and then search for the Web site using Google and other search engines.  If the site is found, they correct the bib record.  If the site is not found, they check again 2-3 times over a period of about a month.  If the site still is not found, they decatalog the bib record. Because they have approximately 8,000 bib records with URLs in their catalog, the process is extremely time-consuming, so at the moment they do this only about every 3-4 months.
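
For what it is worth, the re-checking step that NAL describes can be partly scripted as well. The sketch below (Perl, using the LWP::UserAgent module from CPAN) reads a tab-delimited report of OCLC number and URL, requests each URL with a generous timeout, and prints the HTTP status code, which helps separate genuinely dead links from merely slow ones. The report layout and file names are assumptions for illustration, not NAL's actual program.

#!/usr/bin/perl
# Re-check a list of reportedly broken links.  Input: one line per
# link, tab-delimited, with an OCLC number and a URL.
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 60 );   # generous timeout for slow sites

open my $report, '<', 'broken-links.txt' or die "Cannot open report: $!";
while ( my $line = <$report> ) {
    chomp $line;
    my ( $oclc, $url ) = split /\t/, $line;
    next unless defined $url and length $url;

    my $response = $ua->head($url);              # HEAD is lighter than GET ...
    $response = $ua->get($url) unless $response->is_success;   # ... but some servers refuse it

    printf "%-12s %-4s %s\n", $oclc, $response->code, $url;
}
close $report;

Anything that still returns an error after a few runs spread over several weeks is a good candidate for the manual search-engine check or decataloguing step described above.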



Spot (Sun Solaris): This full featured web site analysis tool will only run in the Sun Solaris environment, although it can analyze websites on any kind of server. An evaluation version is available.

URL: http://transcend.labs.bt.com/spot



 

Free software



Blueprint: focused on testing, verifying, and analyzing all the different types of links and files on your site.

URL: http://www.exit0.com/ez1/products/blueprint_tour.html 

CheckWeb is an HTML link analyzer. The program scans HTML pages and explores all the links for errors. When it is done, CheckWeb generates a log file with all the errors it has found.

URL: http://www.duby.fr.fm/ 

CyberSpyder Link Test: CyberSpyder Link Test will check all links on a site and report on any that no longer work. When the program is started, one or more key URLs are entered; CyberSpyder Link Test then works tirelessly until all the links have been tested and prepares a set of reports on any problems found.

URL: http://www.webattack.com/get/cyberspyder.shtml 

InfoLink:  InfoLink Link Checker is an easy-to-use 32-bit program that helps you verify your links. A comprehensive report is generated for one or multiple sites.

URL: http://www.biggbyte.com/infolink/

NetMechanic (Web-based): you can submit a URL to NetMechanic, which will check your site and email you a report (for free!).

URL: http://www.netmechanic.com 

WebLint Gateways (Web-based): this page lists several sites where you can submit a URL and receive a report on the broken links contained at that URL. These services use the WebLint software package listed below, and allow you to use this software without having to install it locally.

URL: http://html.about.com/library/bl_weblint.htm 

Linklint (Perl): this shareware program (regular users are encouraged to pay $10) is a highly configurable and powerful link checking program that will run on any platform that supports Perl. It has proven to be popular with libraries at the University of Pennsylvania and the University of Virginia.

URL: http://www.linklint.org/ 

MOMspider, the Multi-Owner Maintenance Spider (Perl/Unix): MOMspider is a "web roaming robot that specializes in the maintenance of distributed hypertext infostructures"...or HTML pages.

URL: http://ftp.ics.uci.edu/pub/websoft/MOMspider/ 

Weblint (Perl): Weblint is a multipurpose utility that will check the syntax and style of HTML page(s). One function it performs very well is link checking. 

URL: http://www.w3.org/Tools/weblint.html



Webxref (Perl): will check links in specified page(s), and doesn't require extensive configuration before running.

URL: http://128.138.129.27/bcn/development/webxref3.html



Xenu's Link Sleuth (TM) checks Web sites for broken links. Link verification is done on "normal" links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and Java applets. It displays a continuously updated list of URLs which you can sort by different criteria. A report can be produced at any time.

URL: http://home.snafu.de/tilman/xenulink.html

 



Useful Links on this topic:

Checking links in your online catalog

URL: http://www.viva.lib.va.us/viva/tech/cat/url-checking.html


Link checkers

URL: http://www.lib.uiowa.edu/hardin/md/linkcheckers.html



Link checking programmes

URL: http://www.webings.com/linkchecking.shtml



Web Developer Bots 

URL: http://www.botspot.com/search/s-webdev.htm

 



Note: All links in this text were checked manually on October 17, 2002







