[Web4lib] Announcement: Web Curator Tool

Ingrid Mason ingrid.mason at natlib.govt.nz
Tue Oct 31 21:33:50 EST 2006


Organisations building digital collections that include web material, whether to meet heritage or research requirements, may want to consider the web harvesting and curation made possible by the Web Curator Tool. An announcement of the tool's release follows.

Ingrid Mason


ANNOUNCEMENT

The National Library of New Zealand and The British Library are pleased to announce the release of the Web Curator Tool as an open-source project.  The tool, and its manuals, FAQs, mailing lists, source code, developer documentation, and other information, including a presentation, are available from http://webcurator.sf.net/.


About the Web Curator Tool

The Web Curator Tool is a tool for managing the selective web harvesting process. It is designed for non-technical users in libraries and other collecting institutions who need to capture web material for archival purposes. The tool's workflow encompasses the following tasks:

* Harvest Authorisation: seeking and recording permission to harvest web material, and to make it accessible to the general public.

* Selection and scoping: determining what material should be harvested, be it a web site, a web page, a partial web site, a group (or collection) of web sites, or any combination of these.

* Scheduling: determining when a harvest should occur, and when it should be repeated.

* Description: describing harvests with basic Dublin Core metadata and other specialized fields (or by providing a reference to an external catalogue).

* Harvesting: the Web Curator Tool will download the selected web material at the appointed time using the Internet Archive's Heritrix web crawler -- each installation can have multiple harvesters on different machines, each of which can perform several harvests simultaneously.

* Quality Review: tools are provided for making sure the harvest worked as expected, and correcting simple harvest errors.

* Endorsing and submitting: if the harvest was a success, it is endorsed and then submitted to an external digital archive.
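
For readers who find it easier to see the workflow as a data structure, the tasks above can be sketched roughly as an ordered set of stages that each harvest target moves through. This is only an illustration and is not taken from the Web Curator Tool source code: the names `Stage`, `Target`, `advance`, and `reject` are hypothetical, and the rule that a failed quality review returns a target to harvesting is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Workflow tasks in the order listed above (names are illustrative)."""
    AUTHORISATION = 1
    SELECTION = 2
    SCHEDULING = 3
    DESCRIPTION = 4
    HARVESTING = 5
    QUALITY_REVIEW = 6
    ENDORSED = 7
    SUBMITTED = 8


@dataclass
class Target:
    """Hypothetical record for one selected piece of web material."""
    seed_url: str
    stage: Stage = Stage.AUTHORISATION
    # Basic Dublin Core fields, e.g. {"title": ..., "creator": ...}
    dublin_core: dict = field(default_factory=dict)

    def advance(self) -> Stage:
        """Move to the next stage; refuse to go past SUBMITTED."""
        if self.stage is Stage.SUBMITTED:
            raise ValueError("already submitted to the archive")
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def reject(self) -> Stage:
        """Assumption: a failed quality review sends the target back to harvesting."""
        if self.stage is not Stage.QUALITY_REVIEW:
            raise ValueError("only a harvest under review can be rejected")
        self.stage = Stage.HARVESTING
        return self.stage
```

In this sketch a target would be created with a seed URL, advanced one stage at a time, and either endorsed and submitted after a successful review or sent back for another harvest.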


International Collaboration

The Web Curator Tool Project was a joint project undertaken by the National Library of New Zealand and the British Library, working with standards emerging under the auspices of the International Internet Preservation Consortium (IIPC).  The software was built by Sytec Resources Ltd. in New Zealand.

Each partner provided around half the funding and contributed personnel to the project team and the test teams. Other IIPC member organisations provided invaluable assistance along the way, particularly the National Library of Australia (procurement and design) and the Library of Congress (requirements and specifications).

The main goal of the project was to design and build a Web Curator Tool that meets the needs of the two partners, and that is modular and can be extended to meet the needs of IIPC members and other collecting organisations. This ambition was realised with the open-source release of the tool on 22 September 2006.

Going forward, we hope a wider community of IIPC members, national libraries, and other collecting organisations will use and benefit from the Web Curator Tool, and that they will make their own contributions to its future development and direction.


Technical details

The Web Curator Tool is designed to run in an enterprise setting, and would normally be installed by a system administrator (it is not a desktop application).  Users access the software through a standard web browser.

It requires Java (version 1.5), Apache Tomcat (version 5.5), a relational database, an external archive and/or access tool (such as WERA or Wayback), and (optionally) an LDAP directory for user authentication.

The software is tested on Solaris (version 9) and Red Hat Linux. It has been installed and run on Windows and on Debian Linux, and should work on any platform that supports Apache Tomcat. The Oracle and PostgreSQL databases are officially supported (by testing and installation scripts). MySQL has also been used, and any database that Hibernate supports (including Microsoft SQL Server and about 20 others) should be compatible.

The Web Curator Tool is Free Software, distributed under the terms of the Apache License (version 2.0). It incorporates parts or all of several other Free Software packages, including Acegi Security System, Apache Axis, Apache Commons Logging, Heritrix (version 1.8), Hibernate (database connectivity), Quartz (scheduling), Spring Application Framework, and Wayback (version 0.6).

Recent builds, the current source code, and developer documentation are available from http://webcurator.sf.net/.


More information

For more information, including a full description, Quick Start and System Administrator Guides, and updates on the project, please visit the Web Curator Tool website, join the mailing lists (available from the website), or email the project team, who will be pleased to take your comments and respond to your queries.

Website: http://webcurator.sf.net/ 
National Library of New Zealand team: wct at natlib.govt.nz 
British Library team: wct at bl.uk 


Ingrid Mason

Resource Analyst: Innovation Centre
Epublications Librarian: Alexander Turnbull Library

National Library of New Zealand
Te Puna Matauranga o Aotearoa
New Zealand = Aotearoa
ws: www.natlib.govt.nz 
em: ingrid.mason at natlib.govt.nz 


