[WEB4LIB] Compaq.com - About SpeechBot
Stephen Sloan
sloan at unb.ca
Tue Dec 14 16:20:43 EST 1999
On Tue, 14 Dec 1999, Ernest Perez wrote:
> <http://speechbot.research.compaq.com/cgi-bin/query?help=about>
>
> Slick! Really slick! Free text search of audio text. It's a great beginning.
>
A very small beginning, I think. There are several problems.
Speech recognition tools have problems with personal names. I once used
such software to make a listing of a poetry collection donated to our
library. It was very useful as I could handle each volume and record
bibliographic information as I examined the volumes. Recording authors
was a problem as the program always hazards a guess. I wish I could
remember Naturally Speaking's stab at "Siegfried Sassoon". All I can
recall is that it was pretty hilarious. This database has similar
problems. I can't find any reference to John Olerud. He must have been
discussed in the sports show as he signed as a free agent recently. I
also tried searching "rude" and came up empty.
Aside from personal names, looking at the "transcripts" from these shows
illustrates how badly the recognition software can work. Some of the
material is just incomprehensible. Compaq acknowledges this in their
information files. They say that most important words are spoken more
than once and the software will get it right eventually. It would be
interesting to compare this approach to OCR. CIHM has scanned microfilm
and indexed the OCR'd results. Would a project work better if someone
read the text into speech recognition software? The CIHM project is at:
http://www.canadiana.org/
Here, the dirty OCR is hidden from the viewer. It's used for searching
only. The viewer sees page images or PDF files.
I'd also like to comment on Compaq's decision to devote 10% of their
efforts on this project to shows about the paranormal. I won't, however,
as this would probably cause an off-topic flame war that would resolve
nothing. Let's just say I was surprised to see 2 shows listed here. I
guess that in the backwaters of New Brunswick we don't get to listen to
such ... er ... programming ... much.
Stephen Sloan
Systems Librarian
UNB Libraries
sloan at unb.ca
(506) 453-4814