Python: Using a Soundex feature with a Django search engine
I'm building a search engine for a Django/Python site. One requirement is a Soundex feature: if someone searches for "smith" or "johnson", the search should also return sound-alike names (homophones) such as "smyth" or "jonsen". The database is MySQL, FWIW.
What's the recommended approach? Right now I'm leaning towards Haystack + Whoosh to handle the Soundex feature.
Thanks in advance for your help.
MySQL has a SOUNDEX() function; see the MySQL documentation for details. The Soundex algorithm was developed to aid in searching Anglo-Saxon names in English, and it's not the best choice these days.
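To make the algorithm concrete, here's a minimal Python sketch of classic four-character American Soundex. It's an illustration, not MySQL's implementation: MySQL's SOUNDEX() output format may differ (for example in length), so keys from the two shouldn't be mixed. The name `soundex` is just a name chosen for this example.

```python
# Digit groups for classic American Soundex; letters not listed
# (A, E, I, O, U, Y, H, W) get no digit.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return a four-character American Soundex key, e.g. 'S530'."""
    # Keep only ASCII letters; Soundex is defined for A-Z.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    key = [first]
    # The first letter's digit counts when collapsing duplicates.
    prev = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code:
            # Adjacent letters (or ones separated only by H/W) with the
            # same digit are coded once.
            if code != prev:
                key.append(code)
            prev = code
        elif ch not in "HW":
            # A vowel (or Y) separates letters, so a repeated digit
            # after it is coded again.
            prev = ""
        # H and W leave `prev` untouched.
    # Pad with zeros and cut to four characters.
    return ("".join(key) + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))     # S530 S530
print(soundex("Johnson"), soundex("Jonsen"))  # J525 J525
```

Both example pairs collapse to the same key, which is exactly the behaviour the question asks for; it also shows why Soundex is coarse, since any name with the same first letter and consonant pattern lands in the same bucket.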
You're better off with either Metaphone or Double Metaphone.
In either case, people usually store the result. That makes it easy to index, and searching is pretty fast.
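Here's a minimal sketch of the store-the-key approach, using Python's built-in sqlite3 module as a stand-in for MySQL and a compact Soundex in place of Double Metaphone (both assumptions, to keep the example self-contained). The table and index names (`persons`, `idx_persons_phonetic`) and the helpers `build_db` and `search` are hypothetical. The key is computed in application code at write time, so a search becomes an indexed equality lookup.

```python
import sqlite3

# Compact American Soundex, standing in for Double Metaphone here.
_CODES = {c: d for group, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")) for c in group}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key, prev = [letters[0]], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        if code or ch not in "HW":
            prev = code  # vowels reset prev to ""; H and W leave it alone
    return ("".join(key) + "000")[:4]


def build_db(names):
    """Create an in-memory table with a stored, indexed phonetic key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons ("
        " id INTEGER PRIMARY KEY,"
        " last_name TEXT NOT NULL,"
        " last_name_phonetic TEXT NOT NULL)"
    )
    # The index on the stored key is what makes phonetic lookups fast.
    conn.execute("CREATE INDEX idx_persons_phonetic ON persons (last_name_phonetic)")
    # Compute the key in application code when the row is written.
    conn.executemany(
        "INSERT INTO persons (last_name, last_name_phonetic) VALUES (?, ?)",
        [(name, soundex(name)) for name in names],
    )
    return conn


def search(conn, query):
    """Return stored last names that sound like `query`."""
    rows = conn.execute(
        "SELECT last_name FROM persons WHERE last_name_phonetic = ? ORDER BY last_name",
        (soundex(query),),
    )
    return [row[0] for row in rows]


conn = build_db(["Smith", "Smyth", "Johnson", "Jonsen", "Jones"])
print(search(conn, "smith"))    # ['Smith', 'Smyth']
print(search(conn, "johnson"))  # ['Johnson', 'Jonsen']
```

In a Django app the same idea would typically live in the model's `save()` method, which keeps the key in sync as long as every write goes through the ORM. That caveat is the data integrity problem discussed next.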
That creates a data integrity problem, though. Ideally, I'd want something like this:
```sql
create table persons (
  ...
  last_name varchar(25) not null,
  last_name_phonetic varchar(6) not null, -- not sure about the length
  check (last_name_phonetic = double_metaphone(last_name))
  ...
);
```
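SQLite allows application-defined functions inside CHECK constraints, so the ideal schema can be sketched there with Python's sqlite3 module. Assumptions: a compact Soundex stands in for double_metaphone(), the SQL function name `phonetic` and the helpers `make_db` and `try_insert` are hypothetical, and `deterministic=True` needs Python 3.8+ with SQLite 3.8.3+. The `PRAGMA trusted_schema = ON` line guards against SQLite builds that block application-defined functions in schema objects.

```python
import sqlite3

# Compact American Soundex, standing in for double_metaphone().
_CODES = {c: d for group, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")) for c in group}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key, prev = [letters[0]], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        if code or ch not in "HW":
            prev = code  # vowels reset prev to ""; H and W leave it alone
    return ("".join(key) + "000")[:4]


def make_db():
    conn = sqlite3.connect(":memory:")
    # Some SQLite builds refuse app-defined functions in schema objects
    # unless the schema is trusted; older SQLite ignores unknown pragmas.
    conn.execute("PRAGMA trusted_schema = ON")
    # Register the Python function as SQL `phonetic()` (hypothetical name).
    conn.create_function("phonetic", 1, soundex, deterministic=True)
    conn.execute(
        "CREATE TABLE persons ("
        " last_name TEXT NOT NULL,"
        " last_name_phonetic TEXT NOT NULL,"
        " CHECK (last_name_phonetic = phonetic(last_name)))"
    )
    return conn


def try_insert(conn, last_name, phonetic_key):
    """Insert a row; return False if the CHECK constraint rejects it."""
    try:
        conn.execute("INSERT INTO persons VALUES (?, ?)", (last_name, phonetic_key))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(try_insert(conn, "Smith", "S530"))  # True: key matches
print(try_insert(conn, "Smyth", "X000"))  # False: key is out of sync
```

One trade-off of this design: every connection that writes to the table must register the same function first, or its inserts fail because SQLite can't find `phonetic()`.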
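But that requires the DBMS to have either an intrinsic double_metaphone() function or support for user-defined functions in CHECK() constraints. Older MySQL versions parse CHECK() constraints but don't enforce them at all; MySQL 8.0.16 and later do enforce them, but don't allow stored or user-defined functions inside them. Either way, you'd need to implement this in triggers if your application needs that kind of data integrity.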
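Since the example should run without a MySQL server, here's the trigger idea sketched with sqlite3: triggers recompute the key whenever a row is inserted or either column changes. As before, a compact Soundex stands in for Double Metaphone, and `phonetic`, the trigger names, and `make_db` are hypothetical. SQLite triggers can't assign to NEW the way a MySQL BEFORE trigger can, so this version uses AFTER triggers that update the row in place.

```python
import sqlite3

# Compact American Soundex, standing in for Double Metaphone.
_CODES = {c: d for group, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")) for c in group}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key, prev = [letters[0]], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        if code or ch not in "HW":
            prev = code  # vowels reset prev to ""; H and W leave it alone
    return ("".join(key) + "000")[:4]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA trusted_schema = ON")  # allow phonetic() in triggers
    conn.create_function("phonetic", 1, soundex, deterministic=True)
    conn.executescript(
        """
        CREATE TABLE persons (
            id INTEGER PRIMARY KEY,
            last_name TEXT NOT NULL,
            last_name_phonetic TEXT NOT NULL DEFAULT ''
        );
        -- Recompute the key for every new row, whatever the caller passed.
        CREATE TRIGGER persons_phonetic_ai AFTER INSERT ON persons
        BEGIN
            UPDATE persons SET last_name_phonetic = phonetic(NEW.last_name)
            WHERE id = NEW.id;
        END;
        -- Repair the key when either column changes. The WHEN clause skips
        -- rows that are already consistent, which also prevents a loop if
        -- recursive_triggers is ever enabled.
        CREATE TRIGGER persons_phonetic_au
        AFTER UPDATE OF last_name, last_name_phonetic ON persons
        WHEN NEW.last_name_phonetic IS NOT phonetic(NEW.last_name)
        BEGIN
            UPDATE persons SET last_name_phonetic = phonetic(NEW.last_name)
            WHERE id = NEW.id;
        END;
        """
    )
    return conn


conn = make_db()
conn.execute("INSERT INTO persons (last_name, last_name_phonetic) VALUES (?, ?)",
             ("Smyth", "WRONG"))
print(conn.execute("SELECT last_name_phonetic FROM persons").fetchone())  # ('S530',)
conn.execute("UPDATE persons SET last_name = ? WHERE id = 1", ("Jonsen",))
print(conn.execute("SELECT last_name_phonetic FROM persons").fetchone())  # ('J525',)
```

Unlike the CHECK version, this silently corrects bad keys instead of rejecting the write, which is usually friendlier for code that doesn't know the column exists.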
For what it's worth, PostgreSQL has a contrib module, fuzzystrmatch, that implements Soundex, Metaphone, Double Metaphone, and Levenshtein distance functions. If it were me, I'd build this in PostgreSQL rather than MySQL.
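Of those fuzzystrmatch functions, Levenshtein distance is the one that isn't a phonetic key: it counts the single-character insertions, deletions, and substitutions needed to turn one string into another. Here's a small Python sketch of the standard unit-cost dynamic-programming version, to show what a call like fuzzystrmatch's `levenshtein(text, text)` measures. It's an illustration, not PostgreSQL's code, and the function name is just for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


print(levenshtein("smith", "smyth"))     # 1
print(levenshtein("johnson", "jonsen"))  # 2
```

A distance threshold catches typos that a phonetic key misses, but it generally has to be computed row by row, so it can't use a plain index the way a stored phonetic key can.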