Log in

Previous Entry | Next Entry

It's pronounced 'fronk-en-steen'

Question for the techies. How useful is Levenshtein distance as an accuracy measure for eg. quiz answers? Or do you know of a more appropriate method for fuzzy search comparison? What I want is for it to give a good guess at rightness if the answer is spelt slightly wrong or has punctuation added, etc.



( 3 comments — Leave a comment )
16th Jul, 2013 19:47 (UTC)
Levenshtein distance will do really well for superflous (or missing) punctuation or minor typoes or spelling errors. The big problem it has is with synonyms or word reorderings. Similarly, optional words (eg.Q: "Who first formulated a theory of general relativity?" A: "Einstein" / "Albert Einstein").
16th Jul, 2013 21:20 (UTC)
Mm, good point, I'll need to try and avoid those. The stuff I'm envisaging at the moment is crpytograms and the like, where synonyms, optional words or reorderings won't count as correct; but I wouldn't want to frustrate people by failing typos.

Dan on FB has suggested using Damerau-Levenshtein, which adds transposition detection to regular Levenshtein, and hopefully won't be significantly more time-intensive for the sort of length string we're talking about.
17th Jul, 2013 06:20 (UTC)
Yup. Length of string shouldn't matter, in fact. I think you could probably check an input the size of War and Peace in a few milliseconds! (Because you can simply terminate your calculation every time the distance becomes too great to accept the answer - the O(MN) complexity for these kinds of algorithms only arises in messy cases... I think.)

( 3 comments — Leave a comment )