Previous Entry | Next Entry

It's pronounced 'fronk-en-steen'

Everett True
Question for the techies. How useful is Levenshtein distance as an accuracy measure for eg. quiz answers? Or do you know of a more appropriate method for fuzzy search comparison? What I want is for it to give a good guess at rightness if the answer is spelt slightly wrong or has punctuation added, etc.

Tags:

Comments

( 3 comments — Leave a comment )
bateleur
16th Jul, 2013 19:47 (UTC)
Levenshtein distance will do really well for superflous (or missing) punctuation or minor typoes or spelling errors. The big problem it has is with synonyms or word reorderings. Similarly, optional words (eg.Q: "Who first formulated a theory of general relativity?" A: "Einstein" / "Albert Einstein").
undyingking
16th Jul, 2013 21:20 (UTC)
Mm, good point, I'll need to try and avoid those. The stuff I'm envisaging at the moment is crpytograms and the like, where synonyms, optional words or reorderings won't count as correct; but I wouldn't want to frustrate people by failing typos.

Dan on FB has suggested using Damerau-Levenshtein, which adds transposition detection to regular Levenshtein, and hopefully won't be significantly more time-intensive for the sort of length string we're talking about.
bateleur
17th Jul, 2013 06:20 (UTC)
Yup. Length of string shouldn't matter, in fact. I think you could probably check an input the size of War and Peace in a few milliseconds! (Because you can simply terminate your calculation every time the distance becomes too great to accept the answer - the O(MN) complexity for these kinds of algorithms only arises in messy cases... I think.)

( 3 comments — Leave a comment )

Latest Month

February 2014
S M T W T F S
      1
2345678
9101112131415
16171819202122
232425262728 

Page Summary

Powered by LiveJournal.com
Designed by Lilia Ahner