soundex

Stephen Sloan sloan at unb.ca
Thu Dec 2 10:12:48 EST 1999


A short time ago there was a message to the list asking about Soundex
technology.  I have just found a description of how it works, so I thought
I'd drop a line and fill in the details.

A soundex parser does the following (at least according to the
documentation I have):

1. strips out non-alphabetic characters
2. converts lowercase letters to uppercase
3. captures the first letter and sets it aside for later use
4. removes all occurrences of "W" and "H"
5. B F P V are given the value "1"
6. C G J K Q S X Z are given the value "2"
7. D T are given the value "3"
8. L is given the value "4"
9. M N are given the value "5"
10. R is given the value "6"
11. all vowels are removed
12. the first letter (set aside in step 3) is then concatenated
    with the first three numerals from the mapping process.
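The steps above translate fairly directly into code.  What follows is a
minimal Python sketch of a literal reading of the list, not the code of any
particular product.  The function name `soundex` is my own, and two details
are assumptions because the description doesn't cover them: step 1 keeps
only the ASCII letters A-Z, and Y is treated as a vowel in step 11.

```python
import re

# Digit codes from steps 5-10 of the list above.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

# Assumption: Y counts as a vowel, since the list never gives it a code.
VOWELS = set("AEIOUY")


def soundex(word: str) -> str:
    # Steps 1-2: keep only ASCII letters (an assumption), then uppercase them.
    letters = re.sub(r"[^A-Za-z]", "", word).upper()
    if not letters:
        return ""
    # Step 3: set the first letter aside; only the rest gets coded.
    first, rest = letters[0], letters[1:]
    # Step 4: drop every W and H.
    rest = rest.replace("W", "").replace("H", "")
    # Steps 5-10: replace coded consonants with digits; vowels stay for now.
    coded = "".join(CODES.get(ch, ch) for ch in rest)
    # Step 11: remove the vowels, leaving only digits.
    digits = "".join(ch for ch in coded if ch not in VOWELS)
    # Step 12: first letter plus at most three digits (no zero padding).
    return first + digits[:3]


print(soundex("great"), soundex("grate"))  # G63 G63
```

As written, the list does not collapse repeated adjacent digits or pad short
codes with zeros, so this sketch returns G63 for "great"; many published
Soundex variants would pad that to G630.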

A simple example:
"great" and "grate"

Through step 10, "great" becomes 6EA3.
Through step 10, "grate" becomes 6A3E.
With the vowels removed and the first letter added back, both terms have a
soundex value of G63, so a search on either word would match the other.
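To show why a search on one spelling would find the other, here is a
minimal sketch, under my own assumptions, of how an index could file words
by soundex key and look a query up by its key.  This is not how LiveLink or
any other engine actually works; `build_index` and `soundex_search` are
hypothetical names, and the encoder is a condensed literal reading of the
twelve steps (ASCII letters only, Y treated as a vowel, no zero padding).

```python
import re
from collections import defaultdict


def soundex(word: str) -> str:
    # Condensed version of the twelve steps (assumptions noted above).
    letters = re.sub(r"[^A-Za-z]", "", word).upper()
    if not letters:
        return ""
    # Steps 5-10 as one translation table: each letter maps to its digit.
    table = str.maketrans("BFPVCGJKQSXZDTLMNR", "111122222222334556")
    rest = letters[1:].replace("W", "").replace("H", "").translate(table)
    # Only vowels remain as letters, so keeping digits performs step 11.
    return letters[0] + "".join(ch for ch in rest if ch.isdigit())[:3]


def build_index(words):
    """Hypothetical helper: map each soundex key to the words sharing it."""
    index = defaultdict(list)
    for word in words:
        index[soundex(word)].append(word)
    return index


def soundex_search(index, query):
    """Hypothetical helper: return indexed words that sound like the query."""
    return list(index.get(soundex(query), []))


index = build_index(["great", "grate", "green", "garden"])
print(soundex_search(index, "great"))  # ['great', 'grate']
```

Because both spellings reduce to the same key, a lookup on either one
returns both, while "green" (G65) and "garden" (G635) fall under other keys.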

The only search engine I know of that does soundex searching is the
LiveLink product from the OpenText corporation.  That's not really an
option anymore for most of us: starting with version 8 of LiveLink,
OpenText has been pursuing the large-corporation intranet market, and the
product is priced well beyond what most libraries can afford.



Stephen Sloan
Systems Librarian
UNB Libraries
sloan at unb.ca
(506) 453-4814




More information about the Web4lib mailing list