ResearchIndex (CiteSeer): Autonomous Citation Indexing of Web Publication

Gerry Mckiernan GMCKIERN at gwgate.lib.iastate.edu
Sat Jun 19 15:49:44 EDT 1999


Posted on behalf of Steve Lawrence. Apologies for
cross-posting.
/Gerry McKiernan
Iowa State University
Ames IA 50011

>>> Steve Lawrence <lawrence at RESEARCH.NJ.NEC.COM> 06/13 9:49 PM >>>
ResearchIndex (formerly CiteSeer), a digital library of scientific
literature that automatically performs citation indexing, is available
at:

http://researchindex.com/ 

ResearchIndex aims to improve the dissemination and feedback of
scientific literature, and to provide improvements in functionality,
usability, availability, cost, comprehensiveness, efficiency, and
timeliness.

The ResearchIndex software is available without cost for
non-commercial use. The demonstration service indexes over 200,000
computer science articles (containing over 2 million citations).

Many digital libraries of scientific literature are available
(e.g. LANL e-Print archive, ACM DL, IEEE DL, UCSTRI, CORR, ML Papers,
NCSTRL, LTRS, HP Bib, CS Bibliographies, NZDL, etc.). These services
offer varying degrees of functionality, comprehensiveness, and
freshness.

Rather than creating just another digital library, ResearchIndex
provides algorithms, techniques, and software that can be used in
other digital libraries.

ResearchIndex indexes PostScript and PDF research articles and
provides:

- Autonomous Citation Indexing (ACI) - ResearchIndex uses ACI to
        autonomously create a citation index that can be used for
        literature search and evaluation. Compared to traditional
        citation indices, ACI provides improvements in cost,
        availability, comprehensiveness, efficiency, and
        timeliness.

- Information on all cited documents - ResearchIndex computes citation
        statistics and related documents for all articles cited in the
        database, not just the indexed articles.

- Reference linking - As with many online publishers, ResearchIndex allows
        browsing the database using citation links.

- Citation context - ResearchIndex can show the context of
        citations to a given paper, allowing a researcher to quickly
        and easily see what other researchers have to say about an
        article of interest (useful for literature search and
        evaluation).

- Awareness and tracking - ResearchIndex provides automatic
        notification of new citations to given papers, and new
        papers matching a user profile. Machine learning is used
        to automatically learn user profiles.

- Related documents - ResearchIndex locates related documents
        using citation and word frequency measures and displays an
        active and continuously updated bibliography for each
        document.

- Similar documents - ResearchIndex computes the percentage of
        matching sentences between documents, allowing, for
        example, the detection of minor revisions to a paper.

- Full-text indexing - ResearchIndex indexes the full text of
        entire articles and citations. Full Boolean, phrase, and
        proximity search are supported.

- Query-sensitive summaries - ResearchIndex provides the context
        of how query terms are used in articles, instead of a generic
        summary, improving the efficiency of search.

- Citation graph analysis - ResearchIndex analyzes the graph of
        citations, for example to identify authoritative and
        review-style articles.

- Page images - ResearchIndex allows quick and easy viewing of
        page images.

- Up-to-date - ResearchIndex is continuously updated 24 hours
        a day.

- Powerful search - For example, ResearchIndex allows author initials
        to be used to narrow a citation search.

- Autonomous location of articles - ResearchIndex uses search engines,
        crawling, and mailing list monitoring to efficiently locate
        papers on the Web. ResearchIndex can also be used on
        existing digital libraries.

- Source code available - The full source code of ResearchIndex is
        available without cost for non-commercial use.
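
As a rough illustration of the "Similar documents" and "Related
documents" items above, the following Python sketch computes the
percentage of sentences in one document that also appear in another,
and counts the citations two documents share. The function names, the
naive sentence splitter, the exact-match rule, and the plain overlap
count are all assumptions made for illustration; they are not taken
from the ResearchIndex source code, and the real measures may differ.

```python
import re


def split_sentences(text):
    """Naively split text into sentences and normalize case and spacing.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def matching_sentence_percentage(doc_a, doc_b):
    """Return the percentage of sentences in doc_a that also occur in doc_b."""
    sentences_a = split_sentences(doc_a)
    if not sentences_a:
        return 0.0
    sentences_b = set(split_sentences(doc_b))
    matches = sum(1 for s in sentences_a if s in sentences_b)
    return 100.0 * matches / len(sentences_a)


def shared_citation_count(cites_a, cites_b):
    """Count citations common to both documents (hypothetical identifiers)."""
    return len(set(cites_a) & set(cites_b))


if __name__ == "__main__":
    original = "We index papers. We parse citations. We rank results."
    revision = "We index papers. We parse all citations. We rank results."
    # Two of the three sentences are unchanged in the revision.
    print(matching_sentence_percentage(original, revision))
    print(shared_citation_count(["c1", "c2", "c3"], ["c2", "c3", "c4"]))
```

A high percentage of matching sentences would suggest a minor revision
of the same paper, while a large number of shared citations would
suggest that two different papers cover related work.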

A demonstration service is at: http://researchindex.com/ 
For more details or to obtain the software see
http://www.neci.nec.com/~lawrence/researchindex.html 
http://www.neci.nec.com/~lawrence/aci.html 

The following papers contain details of the system:

"Digital libraries and Autonomous Citation Indexing", IEEE Computer,
Volume 32, Number 6, 67-71, 1999.

"CiteSeer: An automatic citation indexing system", Digital Libraries,
June 1998 [shortlisted for best paper].

"CiteSeer: An autonomous Web agent for automatic retrieval and
identification of interesting publications", Autonomous Agents, May
1998.

"CiteSeer: Autonomous Citation Indexing and Literature Browsing Using
Citation Context", Technical Report, NEC Research, 1997.

We currently have only a small-capacity machine on our external
network for demonstration. The demonstration service indexes over
200,000 computer science articles.

Credits: We would like to thank Joshua Alspector, Jose Nelson Amaral,
Anders Ardo, Shumeet Baluja, Arunava Banerjee, Eric Baum, Robert
Cameron, Rich Caruana, Ingemar Cox, Scott Fahlman, Gary Flake, Bill
Gear, Paul Ginsparg, Eric Glover, Alan Gottlieb, Steve Hanson, Haym
Hirsh, Steve Hitchcock, Paul Kantor, Jon Kleinberg, Bob Krovetz,
Andrea LaPaugh, Michael Lesk, Andrew McCallum, Steve Minton, Tom
Mitchell, Michael Nelson, Craig Nevill-Manning, Andrew Ng, Max Ott,
Brian Pinkerton, Alexandrin Popescul, Ben Schafer, Bruce Schatz,
Terrence Sejnowski, Warren Smith, Dagobert Soergel, Amanda Spink,
Harold Stone, Valerie Tucci, Lyle Ungar, David Waltz, Ian Witten, and
Peter Yianilos for useful comments and suggestions.

--
Steve Lawrence - http://www.neci.nec.com/~lawrence/

