start working on your resumes...

Lou Rosenfeld lou at argus-inc.com
Thu Jan 30 19:18:31 EST 1997


On Thu, 30 Jan 1997, Nick Arnett wrote:

> At 12:52 PM 1/30/97 +0000, Cliff Urr wrote:
> 
> >With regard to your product, can you speak of an actual situation(s) 
> >where people are free of the boring, rote parts of categorizing and 
> >can really focus on the creative parts?
> 
> It is, in fact, early to talk about this, especially with regard to specific
> products that we may someday announce.  Prototypes have demonstrated well
> that subject-oriented classification of the style that an encyclopedia would
> use is highly automatable.  We've also had computer-human bake-offs for news
> "slugging" (tagging from a controlled vocabulary).  The computer's accuracy
> is virtually equal to the humans' and the mistakes are consistent (unlike
> the humans' mistakes).  Other areas work well, too. For example, if I want
> to have a category of "introductory documents," the system can easily learn
> the words that distinguish them, which are words such as "introductory,
> tutorial, FAQ, primer" and so forth.  (Note, our search engine looks at
> additional features of a document, not just word frequency.  I'm simplifying.)

(Hi Nick)

Although news and encyclopedia content are both very broad in subject
scope, I can see some reasons automated tools might aid in classifying
them: each has a homogeneous format (news articles, encyclopedia
articles), and each audience needs much the same things from that type
of information.  Still, I have an extremely difficult time believing
that these automated approaches work as well as humans do in such broad
and heterogeneous subject areas.  It just doesn't make any sense to me,
and I'd be very interested in how these studies measure performance, who
does the measurement, and who defines the metrics for success
(librarians? subject specialists? the system's creators?).

I do agree with Nick that such automated tools might reduce the time
expended by subject catalogers, but only as a first rough pass over a
large body of content.  If these tools can accurately handle some large
percentage of the classification process, that's wonderful, and it means
they're great tools to use *in combination with* human catalogers.
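
A minimal sketch of that division of labor, in Python.  Everything here
is an assumption made for illustration: the cue words, the `min_hits`
threshold, and the function names are hypothetical, and plain word
matching stands in for whatever features a real engine would use.  The
idea is only that documents the tool is unsure about go to a human
cataloger rather than being classified automatically.

```python
import re

# Hypothetical cue words per category; a real system would learn these
# from examples rather than have them hard-coded.
CUE_WORDS = {
    "introductory": {"introductory", "tutorial", "faq", "primer"},
    "news": {"reported", "yesterday", "officials", "announced"},
}


def first_pass(text, min_hits=2):
    """Return (category, hits), or (None, hits) when too few cue words match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, best_hits = None, 0
    for category, cues in CUE_WORDS.items():
        hits = len(words & cues)  # number of distinct cue words present
        if hits > best_hits:
            best, best_hits = category, hits
    if best_hits < min_hits:
        return None, best_hits  # not confident enough: leave for a human
    return best, best_hits


def triage(docs, min_hits=2):
    """Split documents into auto-classified ones and a human review queue."""
    auto, human = {}, []
    for doc_id, text in docs.items():
        category, _ = first_pass(text, min_hits)
        if category is None:
            human.append(doc_id)
        else:
            auto[doc_id] = category
    return auto, human


docs = {
    "a": "A tutorial and FAQ: an introductory primer",
    "b": "Meeting notes from the budget committee",
}
auto, human = triage(docs)
```

Raising `min_hits` sends more documents to the human queue; lowering it
lets the tool decide more on its own, at the cost of more mistakes.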
 
> We're not claiming that software will be able to eliminate *all* of the
> repetitive, uncreative categorization activities.  The examples are rather
> different from library categorization in the type of documents and
> categories, including the granularity of the categories.
> 
> I'm not aware of any significant commercial products that claim to do this
> kind of thing; I'm eager to hear of any.  We are aware of the research
> behind the "Science" article; the magazine certainly overstated what was
> learned (and the research understated what's possible with ordinary computers).
> 
> Nick
> 
> ---------------------------------------
> Verity Inc.
> Connecting People with Information
> 
> Product Manager, Categorization and Visualization
> 408-542-2164; home office 408-369-1233; fax 408-541-1600
> http://www.verity.com
> 
> 

Louis Rosenfeld                                             lou at argus-inc.com
Argus Associates, Inc.                                   http://argus-inc.com
109 Catherine Street                                   voice: +1.313.913.0010
Ann Arbor, MI  48104  USA                                fax: +1.313.213.8082

.......................Allied Studios address:  http://www.alliedstudios.com/
..................Argus Clearinghouse address:  http://www.clearinghouse.net/
................Internet Searcher's Handbook:  http://argus-inc.com/searcher/



