A Better Phonetic Lookup
This applet uses the "Metaphone" phonetic code algorithm described by
Lawrence Philips in the December 1990 issue of Computer Language. Because
its rules take neighbouring letters into account, it generally produces
better matches than the Soundex algorithm. An input word is reduced to a
1 to 4 character code using relatively simple phonetic rules for typical
spoken English.
Type a word into the test word field and press return or click on the
Calculate button to see the resulting phonetic code.
To test phonetic lookup based on this code, choose one of the Word
Sources. The applet then reads a file of words from the server and stores
each word in a lookup class, keyed by its phonetic code, which is
calculated as the word is added. While a word source is loaded, every
word with the same phonetic code as the word typed in the test field is
displayed in the Matches text area.
The women's names, men's names, and place name files are from Grady Ward's
"Moby Words" collection, which he has placed in the public domain.
These files and many more are available at:
http://fortis.speech.su.oz.au/comp.speech/Section1/Lexical/moby.html
http://www.dcs.shef.ac.uk/research/ilash/Moby/
ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
The Metaphone Rules
Metaphone reduces the alphabet to 16 consonant sounds:
B X S K J T F H L M N P R 0 W Y
The 0 is a zero, not the letter O; it represents the 'th' sound.
Transformations
Metaphone uses the following transformation rules:
Doubled letters except "c" -> drop 2nd letter.
Vowels are only kept when they are the first letter.
B -> B   unless at the end of a word after "m" as in "dumb"
C -> X   (sh) if -cia- or -ch-
     S   if -ci-, -ce- or -cy-
     K   otherwise, including -sch-
D -> J   if in -dge-, -dgy- or -dgi-
     T   otherwise
F -> F
G ->     silent if in -gh- and not at end or before a vowel
         in -gn- or -gned- (also see dge etc. above)
     J   if before i or e or y if not double gg
     K   otherwise
H ->     silent if after vowel and no vowel follows
     H   otherwise
J -> J
K ->     silent if after "c"
     K   otherwise
L -> L
M -> M
N -> N
P -> F   if before "h"
     P   otherwise
Q -> K
R -> R
S -> X   (sh) if before "h" or in -sio- or -sia-
     S   otherwise
T -> X   (sh) if -tia- or -tio-
     0   (th) if before "h"
         silent if in -tch-
     T   otherwise
V -> F
W ->     silent if not followed by a vowel
     W   if followed by a vowel
X -> KS
Y ->     silent if not followed by a vowel
     Y   if followed by a vowel
Z -> S
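The transformation table can be sketched as a single left-to-right pass in Java. This is a minimal sketch under stated assumptions, not the PhoneticList source: the names `MetaphoneSketch` and `encode` are hypothetical, non-letters are discarded, the initial-letter exceptions are left out, "G before N" is read as covering both -gn- and -gned-, and an H following C, G, P, S or T is treated as silent on the assumption that those letters' rules already consume the digraph (the table does not say so explicitly).

```java
import java.util.Locale;

// Hypothetical sketch of the Metaphone transformation table (not the original
// PhoneticList source). Initial-letter exceptions are NOT applied here.
final class MetaphoneSketch {
    private static final String VOWELS = "AEIOU";

    private static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    // Character at index i, or '\0' when i falls outside the word.
    private static char at(String w, int i) {
        return (i >= 0 && i < w.length()) ? w.charAt(i) : '\0';
    }

    static String encode(String word) {
        return encode(word, 4); // the applet truncates codes at 4 characters
    }

    static String encode(String word, int maxLen) {
        // Upper-case and keep letters only so neighbour checks see a clean word.
        String w = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < w.length() && code.length() < maxLen; i++) {
            char c = w.charAt(i);
            char prev = at(w, i - 1), next = at(w, i + 1), next2 = at(w, i + 2);
            if (c == prev && c != 'C') continue; // doubled letters except C: drop 2nd
            switch (c) {
                case 'A': case 'E': case 'I': case 'O': case 'U':
                    if (i == 0) code.append(c); // vowels kept only as first letter
                    break;
                case 'B': // silent at the end of a word after M ("dumb")
                    if (!(i == w.length() - 1 && prev == 'M')) code.append('B');
                    break;
                case 'C':
                    if (next == 'I' && next2 == 'A') code.append('X');          // -cia-
                    else if (next == 'H') code.append(prev == 'S' ? 'K' : 'X'); // -sch- vs -ch-
                    else if ("IEY".indexOf(next) >= 0) code.append('S');        // -ci-, -ce-, -cy-
                    else code.append('K');
                    break;
                case 'D': // J in -dge-, -dgy-, -dgi-
                    code.append(next == 'G' && "EIY".indexOf(next2) >= 0 ? 'J' : 'T');
                    break;
                case 'G': {
                    boolean silentGh = next == 'H' && next2 != '\0' && !isVowel(next2);
                    boolean silentGn = next == 'N'; // covers -gn- and -gned-
                    boolean silentDge = prev == 'D' && "EIY".indexOf(next) >= 0;
                    if (!(silentGh || silentGn || silentDge)) {
                        code.append("EIY".indexOf(next) >= 0 ? 'J' : 'K');
                    }
                    break;
                }
                case 'H': // assumption: H after C, G, P, S, T belongs to a digraph
                    if ("CGPST".indexOf(prev) < 0 && !(isVowel(prev) && !isVowel(next))) {
                        code.append('H');
                    }
                    break;
                case 'K':
                    if (prev != 'C') code.append('K');
                    break;
                case 'P':
                    code.append(next == 'H' ? 'F' : 'P');
                    break;
                case 'Q':
                    code.append('K');
                    break;
                case 'S': // X before H or in -sio-, -sia-
                    code.append(next == 'H' || (next == 'I' && (next2 == 'O' || next2 == 'A')) ? 'X' : 'S');
                    break;
                case 'T':
                    if (next == 'I' && (next2 == 'A' || next2 == 'O')) code.append('X'); // -tia-, -tio-
                    else if (next == 'H') code.append('0');                            // th
                    else if (!(next == 'C' && next2 == 'H')) code.append('T');         // silent in -tch-
                    break;
                case 'V':
                    code.append('F');
                    break;
                case 'W': case 'Y': // kept only when a vowel follows
                    if (isVowel(next)) code.append(c);
                    break;
                case 'X':
                    code.append("KS");
                    break;
                case 'Z':
                    code.append('S');
                    break;
                default: // F, J, L, M, N and R map to themselves
                    code.append(c);
            }
        }
        // "KS" can overshoot maxLen by one character, so trim the result.
        return code.length() > maxLen ? code.substring(0, maxLen) : code.toString();
    }
}
```

For example, `MetaphoneSketch.encode("dumb")` gives `TM` and `encode("SCHOOL")` gives `SKL`. Neighbours are checked against the original word rather than a copy with doubled letters collapsed, so the first G of -gg- (as in "bigger") sees another G and becomes K instead of being read as a soft G before the vowel.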
Initial Letter Exceptions
Initial kn-, gn-, pn-, ae- or wr- -> drop first letter
Initial x- -> change to "s"
Initial wh- -> change to "w"
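The initial-letter exceptions can be handled as a separate normalization step that runs before the letter-by-letter rules. In this minimal sketch the names `InitialExceptions` and `apply` are hypothetical; the method upper-cases its input and rewrites only the start of the word, leaving everything else for the transformation table.

```java
import java.util.Locale;

// Hypothetical helper for the initial-letter exceptions (names are illustrative).
final class InitialExceptions {
    private static final String[] DROP_FIRST = {"KN", "GN", "PN", "AE", "WR"};

    static String apply(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        // kn-, gn-, pn-, ae-, wr-: drop the first letter.
        for (String prefix : DROP_FIRST) {
            if (w.startsWith(prefix)) return w.substring(1);
        }
        // x-: change to S.
        if (w.startsWith("X")) return "S" + w.substring(1);
        // wh-: change to W by dropping the H.
        if (w.startsWith("WH")) return "W" + w.substring(2);
        return w;
    }
}
```

For instance, `apply("knight")` yields `NIGHT`, `apply("Xavier")` yields `SAVIER` and `apply("whale")` yields `WALE`; a word with none of these prefixes comes back upper-cased but otherwise unchanged.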
The code is truncated at 4 characters in this example, but more could be used.
Lawrence Philips, "Hanging on the Metaphone", Computer Language, vol. 7, no. 12, December 1990, pp. 39-43.
A good source for further information is the Aspell project's Metaphone page at SourceForge:
http://aspell.sourceforge.net/metaphone/
Java PhoneticList Class
I have implemented the Metaphone code as part of a class called PhoneticList.
As the name indicates, this class tracks lists of objects by the Metaphone
code derived from a key string. Operation is similar to a Hashtable except
that any number of Objects can have the same code and an Object array is
returned by the lookup function. In the example applet, the Objects are
Strings but they could be anything.
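A structure like the one described, where many objects can share one code, can be sketched with a `HashMap` from code to `List`. This is not the PhoneticList source: the class name `PhoneticIndex`, its `put` and `lookup` methods, and the pluggable `Function<String, String>` encoder are assumptions, and `lookup` returns a `List` rather than an Object array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical Hashtable-like index in which any number of objects may share
// the same phonetic code (not the original PhoneticList class).
final class PhoneticIndex<T> {
    private final Map<String, List<T>> byCode = new HashMap<>();
    private final Function<String, String> encoder;

    PhoneticIndex(Function<String, String> encoder) {
        this.encoder = encoder; // e.g. a Metaphone function
    }

    // Files obj under the code of key; objects already filed under that code are kept.
    void put(String key, T obj) {
        byCode.computeIfAbsent(encoder.apply(key), k -> new ArrayList<>()).add(obj);
    }

    // Returns every object whose key has the same code as word, in insertion order.
    List<T> lookup(String word) {
        List<T> hits = byCode.get(encoder.apply(word));
        if (hits == null) return Collections.emptyList();
        return Collections.unmodifiableList(hits);
    }
}
```

In the applet the encoder would be the Metaphone function; for a quick check any function works. With `w -> w.toUpperCase().replaceAll("[AEIOU]", "")` as a stand-in encoder, after adding "Jon", "Jean" and "Mary", `lookup("Joan")` returns both "Jon" and "Jean". Keeping the encoder pluggable lets the same index be tried with Soundex or a longer Metaphone code without changing the lookup logic.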
Source code is available for free, but I would appreciate knowing how you plan to use it.
Please contact William Brogden.