Validating fuzzy logic values

At the bottom of the Double Metaphone page there are implementations of it for all kinds of programming languages. Python and MySQL implementation: https://github.com/AtomBoy/double-metaphone

Firstly, I would like to add that you should be very careful when using any form of phonetic or fuzzy matching algorithm, as this kind of logic is exactly that: fuzzy or, to put it more simply, potentially inaccurate.

This is especially true when it is used for matching company names.

Lawrence Philips, the author of the original Metaphone algorithm, later developed an improvement to it, which he called Double-Metaphone.

Double-Metaphone includes a much larger encoding rule set than its predecessor, handles a subset of non-Latin characters, and returns a primary and a secondary encoding to account for different pronunciations of a single word in English.
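To make the primary/secondary idea concrete, here is a minimal Python sketch of how two pairs of codes might be compared. It does not compute the encodings itself; it assumes some Double-Metaphone implementation has already produced them. The function name `match_strength` and the "strong"/"weak"/"none" labels are my own illustration, not part of any library. The Smith and Schmidt codes are the commonly cited example values.

```python
def match_strength(codes_a, codes_b):
    """Compare two (primary, secondary) Double-Metaphone code pairs.

    Hypothetical helper: returns "strong" when the primary codes agree,
    "weak" when any other combination of codes agrees, "none" otherwise.
    """
    primary_a, primary_b = codes_a[0], codes_b[0]
    if primary_a and primary_a == primary_b:
        return "strong"

    # Ignore empty codes so that two missing secondaries never "match".
    shared = {c for c in codes_a if c} & {c for c in codes_b if c}
    return "weak" if shared else "none"


# Commonly cited example: both names share the code "XMT".
smith = ("SM0", "XMT")
schmidt = ("XMT", "SMT")

print(match_strength(smith, schmidt))
print(match_strength(smith, smith))
```

A "weak" result is exactly the kind of match that still needs to be validated by a person or by a second check, since it only says that one possible pronunciation of each word coincides.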



Before we built Match2Lists.com, we used to spend an unhealthy amount of time validating fuzzy matches.

In Match2Lists we incorporated a powerful visualisation tool that lets us review non-exact matches. This proved to be a real game changer in terms of match validation, reducing our costs and enabling us to deliver results much more quickly.

Here's a link to the PHP discussion of the soundex functions in MySQL and PHP. I'd start from there, then expand into your other not-so-well-defined requirements.

It's more appropriate for measuring the difference between two known words, not for searching. It discusses a solution designed more to detect things like proofing errors (using "Levenshtien" for "Levenshtein") rather than spelling errors (where the user doesn't know how to spell, say, "Levenshtein" and types in "Levinstein").
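For readers who have not met Soundex before, here is a minimal Python sketch of the classic American Soundex rules (keep the first letter, map consonants to digits, collapse repeats, pad to four characters). It is an illustration only: the built-in soundex functions in MySQL and PHP may differ in details such as output length, so do not assume this produces identical codes to either of them.

```python
# Consonant groups used by classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Note how coarse the code is: two quite different names can collapse to the same four characters, which is why matches found this way still need validating.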

For example, someone writes:

I have found the following threads that seem similar to this question, but the poster has not approved an answer and I'm not sure if their use-case is applicable:

How to find best fuzzy match for a string in a large string database
Matching inexact company names in Java

For more advanced needs, I think you need to look at the Levenshtein distance (also called "edit distance") of two strings and work with a threshold.

This is the more complex (=slower) solution, but it allows for greater flexibility.
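As a rough sketch of what "Levenshtein distance with a threshold" can look like, here is a plain Python version using the standard dynamic-programming approach. The helper name `within_threshold` and the default `max_distance=2` are assumptions made for illustration; a sensible threshold depends on your data. It compares the query against every candidate, which is where the slowness comes from on large lists.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def within_threshold(query, candidates, max_distance=2):
    """Hypothetical helper: return (distance, candidate) pairs, closest first,
    for every candidate within max_distance edits of the query."""
    q = query.lower()
    scored = ((levenshtein(q, c.lower()), c) for c in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


print(within_threshold("Levinstein", ["Levenshtein", "Hamming"]))
```

Raising the threshold catches more misspellings but also lets in more false matches, so the results still need to be validated rather than trusted blindly.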
