What is Soundex and How Does Soundex Work?

Post by bimjim » Sat Feb 06, 2010 2:34 pm

http://www.genealogyintime.com/Genealog ... 49tTRAk.Vy

Soundex is a system for coding and indexing family names based on how the name sounds rather than how it is spelled. The code consists of the first letter of the family name, followed by three digits representing the first three phonetic sounds found in the name. Similar sounding family names usually share the same Soundex code. Soundex is used to index individuals for the US census and other purposes. Anyone wanting to use US census data to find their ancestors will need to become familiar with Soundex.

The advantage of the Soundex system is that it avoids most problems associated with misspellings or alternate spellings of family names. For example, Smith, Smyth and Smythe all have a Soundex code of S-530. Although this is a relatively simple example, alternate spellings of a family name are not always obvious. For example, Sherman, Sharman, Shurman, Shearman, Shireman, Scherman, Schurman and Sirman are all spelling variations of the same family name. With the Soundex system, they all have the Soundex code S-655.

The Soundex system is a useful tool in searching for ancestors because the misspelling of family names was a common occurrence in official records. As well, the spelling of family names can change over time. The Soundex system is particularly useful for people searching for ancestors in the United States. In 1930, the U.S. National Archives used the Soundex system to index family names for the 1880, 1900 and 1920 censuses, and part of the 1910 census (the results of the 1890 census were destroyed in a fire). Anyone searching for ancestors in these census records needs to know the Soundex code system.

Soundex is still in use today by the U.S. National Archives and Records Administration (NARA) to track people for census purposes. It is also used to check the spelling of names in large databases. For example, telephone companies use Soundex to assist telephone operators in locating the telephone number of a person based on an educated guess as to the spelling of the name.


Soundex Coding Rules

Robert Russell developed the Soundex system in 1918. The Russell Soundex Code, as it is known, consists of a letter and three numbers:

The letter is the first letter of the family name, with the following exceptions:
• For a multi-word family name, a hyphenated family name, or a name with an apostrophe (’), only the last part of the name is coded. For example, for a family name of Smith-Jones, the Smith would be ignored and it would code as Jones (J-520).
• For family names that begin with a prefix (common prefixes are L’, Le, La, D’, De, Di, Du, Dela, Con, Von or Van), the prefix is usually ignored. However, this rule was not universally applied to all the U.S. censuses and thus it is necessary to search for family names with and without prefixes. For example, Lebrun can have a Soundex code of L-165 or B-650. Note, however, that Mc, Mac and O’ are not considered prefixes and, therefore, must be included in the Soundex code calculation.
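Because the prefix rule was applied unevenly, it can help to list every spelling worth coding before looking anything up. The following is only a rough sketch in Python: the helper name names_to_code is hypothetical, the prefix list is the one given above, and the two-letter minimum for what remains after a prefix is my own assumption to avoid reducing a name like Lee to a single letter.

```python
# Prefixes named in the article, longest first so "Dela" wins over "De".
PREFIXES = ("dela", "con", "von", "van", "le", "la", "de", "di", "du", "l'", "d'")


def names_to_code(surname):
    """Return the spellings worth coding for one surname (hypothetical helper)."""
    # Treat a typographic apostrophe the same as a plain one.
    surname = surname.replace("\u2019", "'")
    # Keep only the last part of a multi-word or hyphenated name.
    last_part = surname.replace("-", " ").split()[-1]
    candidates = [last_part]
    lowered = last_part.lower()
    for prefix in PREFIXES:
        rest = last_part[len(prefix):]
        # Assumption: only strip a prefix if at least two letters remain.
        if lowered.startswith(prefix) and len(rest) >= 2:
            candidates.append(rest.capitalize())
            break
    return candidates


print(names_to_code("Smith-Jones"))
print(names_to_code("Lebrun"))
print(names_to_code("McDonald"))
```

Mc, Mac and O’ are not in the prefix list, so those names come back unchanged. Each spelling returned would then be coded separately.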
Numbers are assigned to the remaining letters in the family name reading from left to right based on the phonetic sounds of each letter, as follows:

Number / Letters Represented / Phonetic Description
1 / B, F, P, V / Labial sounds (require particular use of the lips).
2 / C, G, J, K, Q, S, X, Z / Guttural sounds (produced in the throat) and sibilant sounds (require a hissing noise).
3 / D, T / Dental sounds (formed with the tip of the tongue against the teeth).
4 / L / Long liquid sound (produced by a long contact of the tongue and mouth).
5 / M, N / Nasal sounds (produced partly through the nose).
6 / R / Short liquid sound (produced by a slight contact of the tongue and mouth).
Discard / H, W / Disregard the consonants H and W.
Discard / A, E, I, O, U, Y / Disregard vowels.


Zeroes are added to the end of short family names (if necessary) to produce three numbers. For example, the Soundex code for Bailey is B-400. Likewise, additional letters in a family name beyond the ones necessary to achieve the first three numbers are disregarded. For example, the Soundex code for Johnston is J-523; the last letter 'n' is disregarded.
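The table and the padding rule can be tried out with a few lines of Python. This is only a sketch of the basic rules: the names GROUPS, DIGIT and basic_code are my own, and it deliberately ignores the exceptions for repeated and same-numbered letters, so it will give wrong answers for names such as Lloyd or Jackson.

```python
# Digit groups from the table; H, W, Y and the vowels are simply absent.
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
DIGIT = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def basic_code(surname):
    """Code a surname with the basic rules only (no exceptions applied)."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    # Every letter after the first becomes a digit or is discarded.
    digits = [DIGIT[ch] for ch in letters[1:] if ch in DIGIT]
    # Pad short names with zeroes, then keep only the first three digits.
    return letters[0] + "-" + "".join(digits + ["0"] * 3)[:3]


print(basic_code("Bailey"))
print(basic_code("Johnston"))
```

Bailey has only one codable letter after the B, so two zeroes are added. Johnston has four, so the final N is dropped.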


Soundex Exceptions

The following exceptions apply to the numbering sequence:

• Names that have double letters (including a family name that begins with a double letter) are treated as one letter. For example, the Soundex code for Lloyd is L-300.

• Names that have letters side-by-side with the same Soundex number are treated as one letter. For example, the Soundex code for Jackson is J-250. This rule also applies when the first letter of the family name shares a Soundex number with the letter that follows it. For example, the Soundex code for Czar is C-600.

• When letters with the same Soundex number are separated by a vowel (A, E, I, O, U, Y), the letter to the right of the vowel is also coded. For example, the Soundex code for Carman is C-655, because the M and the N are both coded.

• When letters with the same Soundex number are separated by an “H” or “W”, the letter to the right of the “H” or “W” is not coded. For example, the Soundex code for Ashcroft is A-261. However, this rule was not consistently applied to all U.S. censuses from 1880 to 1920. Sometimes H and W were treated the same as a vowel, so it is necessary to consider both possibilities. Thus Ashcroft could also have a Soundex code of A-226.
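Putting the coding rules and the exceptions together, one possible way to compute a code is sketched below in Python. The function name soundex and the hw_as_vowel switch are my own choices, not part of any official tool. The switch reproduces the inconsistent census practice described above, where H and W were sometimes treated like vowels. Prefix and hyphen handling is left out.

```python
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
DIGIT = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(surname, hw_as_vowel=False):
    """Sketch of a Russell Soundex code, e.g. 'J-250' for Jackson."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("surname contains no letters A to Z")
    digits = []
    # The first letter's own number also suppresses a repeat right after it.
    previous = DIGIT.get(letters[0])
    for ch in letters[1:]:
        digit = DIGIT.get(ch)
        if digit is None:
            # Vowels and Y always break a run of equal numbers;
            # H and W do so only under the variant rule.
            if ch not in "HW" or hw_as_vowel:
                previous = None
            continue
        if digit != previous:
            digits.append(digit)
        previous = digit
    # Pad with zeroes and keep exactly three digits.
    return letters[0] + "-" + "".join(digits + ["0"] * 3)[:3]


for name in ("Lloyd", "Jackson", "Czar", "Carman", "Ashcroft"):
    print(name, soundex(name))
print("Ashcroft", soundex("Ashcroft", hw_as_vowel=True))
```

Since a census index may have used either treatment of H and W, it is worth running a name both ways and searching under each code that comes out.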


Limitations to Soundex Coding

There are several limitations to the Russell Soundex system:

• It was designed to work with English names and English phonetic sounds. There is no support for characters beyond the 26 letters (A to Z) found in the English language.

• Soundex is also based on English pronunciations. Continental European family names often code poorly. For example, many French family names have silent last letters that are not pronounced, but end up being coded using Soundex. Thus, Beaux has a Soundex code of B-200 even though the “x” is not pronounced.

• This problem can also occur with family names that sound exactly alike, but have different spellings because the name contains silent letters. For example, Michum (M-250) has a different Soundex code from Mitchum (M-325) because the silent “t” is coded in Mitchum.

• Names that sound alike, but start with a different letter, will have different Soundex codes. In particular, be aware of names that begin with either C or K. They are often a variation on spelling of the same name, but will have different Soundex codes. For example, Craft is C-613, but Kraft is K-613.

• Family names that are totally unrelated to each other can have the same Soundex code. This occurs most often with short names or names with a high percentage of vowels (i.e. names with fewer than three phonetic sounds, or a Soundex code that contains zeros). For example, Smith (S-530) shares the same Soundex code with such diverse family names as Saint, Sand, Schmid, Snead and Sunday.

• Soundex also has trouble discriminating long English names (i.e. names with more than three phonetic sounds) or names with non-English phonetic sounds. For example, the Russell Soundex coding system does not work well on Slavic or Yiddish family names. The Jewish Genealogical Society has developed a refined system that is more independent of ethnic considerations. This system was designed by Randy Daitch and Gary Mokotoff and is known as the Daitch-Mokotoff Soundex system. It is more complicated than the Russell Soundex system because it can handle more phonetic sounds and because it is based on a six-digit Soundex code. The U.S. National Archives and Records Administration, however, continues to use the Russell Soundex system.