[Web4lib] Common queries and common misspellings

Joshua Ferraro jmf at liblime.com
Fri Feb 17 18:20:29 EST 2006


hi Karen,

I've been doing query logging on the Nelsonville Public Library's
Koha installation almost since the beginning, when the system was
installed over two years ago. I've used the data from those logs to
generate what I have defined as 'successful' queries (which really
just means that the phrase or term returned at least one hit).
Successful queries are added to the list of 'popular' queries,
which is used by the spellchecking feature.
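The message doesn't show how the successful queries are pulled out of the log. Here is a minimal Python sketch. It assumes the log rows are available as (phrase_or_term, resultcount) pairs and that normalization is just trimming and lowercasing. Both are assumptions, as is the helper name `successful_queries`.

```python
from collections import Counter

# Hypothetical log rows: (phrase_or_term, resultcount), two of the
# query_log columns listed in this message.
LogRow = tuple[str, int]


def successful_queries(rows: list[LogRow]) -> Counter:
    """Count how often each phrase was searched with at least one hit.

    Assumption: phrases are normalized by trimming and lowercasing;
    the real Koha normalization may differ.
    """
    counts: Counter = Counter()
    for phrase, resultcount in rows:
        if resultcount > 0:  # 'successful' = the query returned hits
            counts[phrase.strip().lower()] += 1
    return counts


if __name__ == "__main__":
    log = [
        ("Harry Potter", 42),
        ("harry potter ", 40),
        ("harry poter", 0),  # zero hits, so not added to the popular list
        ("Ohio history", 17),
    ]
    print(successful_queries(log).most_common(2))
```

Counting rather than just collecting the phrases keeps a popularity figure around. That figure could later be used to rank suggestions.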

The spellcheck database contains those 'successful' queries plus
a normalized list of every word, title, author, subject, and
MARC subfield in the database. When a search turns up 0 hits,
its soundex value is compared against the spellcheck database,
and any close matches are retrieved and presented to the user
as a 'did you mean' suggestion. You can try it out here:

http://search.athenscounty.lib.oh.us
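Here is a rough Python sketch of that zero-hit 'did you mean' lookup. It uses standard American Soundex and treats only identical codes as "close". The actual Koha code may score matches differently. The helper names `soundex`, `build_index`, and `did_you_mean` are hypothetical.

```python
from collections import defaultdict

# American Soundex digit for each consonant; vowels, y, h and w get none.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit counts again
    return code.ljust(4, "0")


def build_index(terms: list[str]) -> dict[str, list[str]]:
    """Group (already normalized) spellcheck terms by Soundex code."""
    index: dict[str, list[str]] = defaultdict(list)
    for term in terms:
        key = soundex(term)
        if key:  # skip terms with no letters, e.g. "1984"
            index[key].append(term)
    return index


def did_you_mean(query: str, index: dict[str, list[str]], limit: int = 5) -> list[str]:
    """Suggest spellcheck terms that share the zero-hit query's Soundex code."""
    key = soundex(query)
    if not key:
        return []
    return [t for t in index.get(key, []) if t != query.lower()][:limit]
```

As an example, indexing `smith`, `smyth`, and `schmidt` puts all three under S530. A zero-hit search for `smithe` would therefore suggest them.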

I've toyed with the idea of using this data for even more 
interesting purposes but never got around to it.

Here are some details on the type of information the log collects:

mysql> show columns from query_log;
+-----------------+---------------+
| Field           | Type          |
+-----------------+---------------+
| phrase_or_term  | varchar(40)   |
| resultcount     | int(40)       |
| orig_ip         | varchar(40)   |
| timestamp       | timestamp(14) |
+-----------------+---------------+

I put in the orig_ip column because I had a notion that queries
from inside the library might be different from those originating
from outside.
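One way to test that notion is sketched below in Python. It classifies each `orig_ip` as inside or outside the library and then compares zero-hit rates for the two groups. The in-library network (10.0.0.0/8 here) and the function names are hypothetical placeholders. Zero-hit rate is just one possible measure of "different".

```python
import ipaddress
from collections import defaultdict

# Hypothetical in-library address range; substitute the library's real networks.
LIBRARY_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def origin_of(ip: str) -> str:
    """Classify an orig_ip value as 'inside', 'outside', or 'unknown'."""
    try:
        addr = ipaddress.ip_address(ip.strip())
    except ValueError:
        return "unknown"  # malformed or empty log value
    return "inside" if any(addr in net for net in LIBRARY_NETWORKS) else "outside"


def zero_hit_rate_by_origin(rows: list[tuple[str, int, str]]) -> dict[str, float]:
    """Fraction of queries that returned no hits, per origin.

    rows are (phrase_or_term, resultcount, orig_ip) tuples.
    """
    totals: dict[str, int] = defaultdict(int)
    zeros: dict[str, int] = defaultdict(int)
    for _phrase, resultcount, orig_ip in rows:
        origin = origin_of(orig_ip)
        totals[origin] += 1
        if resultcount == 0:
            zeros[origin] += 1
    return {origin: zeros[origin] / totals[origin] for origin in totals}
```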

Currently there are about 350,000 queries logged.

Since the data belongs to the library, I'd have to consult with
them before turning it over. Let me know if you're interested.

Cheers,

--
Joshua Ferraro               VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf at liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS

