Levenshtein dictionary is a sequence of all words arranged in order of similarity. The list begins with any supplied word, followed by the most similar remaining word as measured by Levenshtein distance. The process is repeated for each remaining word, resulting in a version of the dictionary that transforms from one word to the next, changing as few letters as possible with each step.
Long and unusual words like googolplex, syzygy and borborygmus appear towards the end while shorter words with common morphemes are easily grouped together earlier in the list.
The source words used to generate Levenshtein dictionary could be swapped with those for any language. For the sake of presentation, the one published here excludes abbreviations, prefixes, alternate spellings, words with dashes, words over 29 letters and words under three letters.
Levenshtein distance is a string metric introduced in 1965 by Vladimir Levenshtein. In simple terms, the Levenshtein distance between two strings is the minimum number of letters that must be removed, added or changed for them to match. For example, the words belief ➝ relief have a distance of 1 because they differ by one letter. Today, Levenshtein distance is commonly used for spell checking in word processors.
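The definition above can be sketched in a few lines of Python. This is a minimal illustration using the standard row-by-row dynamic programming approach, not the code used to build the published list; the function name `levenshtein` is an assumption made for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-letter removals, additions or changes turning a into b."""
    # previous[j] = distance between the letters of a seen so far and the first j letters of b
    previous = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        current = [i]  # turning i letters of a into an empty string takes i removals
        for j, letter_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                           # remove letter_a
                current[j - 1] + 1,                        # add letter_b
                previous[j - 1] + (letter_a != letter_b),  # change it, or keep it if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("belief", "relief"))   # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, which keeps memory use proportional to the length of the second word.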
Arranging all dictionary words in order requires calculating Levenshtein distance over 800 million times to pair each word with its most similar remaining match.
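The ordering step can be sketched as a greedy nearest-neighbour walk: start from one word, repeatedly pick the closest remaining word, and continue from there. The sketch below is an assumption about how such a pass might look, with hypothetical names (`order_by_similarity`) and a toy word list. For n words this approach makes n(n-1)/2 comparisons, so a figure of about 800 million would correspond to roughly 40,000 words; that word count is an inference, not a number given here.

```python
def levenshtein(a, b):
    """Minimum number of single-letter removals, additions or changes turning a into b."""
    previous = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        current = [i]
        for j, letter_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (letter_a != letter_b)))
        previous = current
    return previous[-1]


def order_by_similarity(words, start):
    """Chain the words so each one is followed by its most similar unused word."""
    remaining = [w for w in words if w != start]
    chain = [start]
    while remaining:
        last = chain[-1]
        # min() returns the first minimal item, so ties go to the earlier word in the list
        nearest = min(remaining, key=lambda w: levenshtein(last, w))
        remaining.remove(nearest)
        chain.append(nearest)
    return chain


# Toy word list, chosen only for illustration
words = ["adapt", "adopt", "adept", "syzygy", "adopts", "relief", "belief"]
print(order_by_similarity(words, "adapt"))

# Comparisons needed for a hypothetical 40,000-word list
print(40_000 * 39_999 // 2)  # 799980000
```

Because each word is consumed once, unusual words such as syzygy tend to be left over until few alternatives remain, which matches their position near the end of the list.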
Animating the result
Using a diff library, the difference between words can be isolated. With transitions applied, the letters morph through every English word. The dictionary and animation begin with the word adapt, chosen arbitrarily. Because each word is used only once, the resulting list would be different depending on its starting word. The list below has been generated in advance to avoid straining your processor.
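As a rough illustration of isolating the difference between two consecutive words, the sketch below uses Python's standard `difflib` module as a stand-in for diff-match-patch. The helper name `letter_changes` is hypothetical, and `difflib` does not guarantee a minimal set of edits, so treat this as an approximation of the idea rather than the animation's actual logic.

```python
from difflib import SequenceMatcher


def letter_changes(old, new):
    """List the operations that turn old into new, skipping unchanged runs of letters."""
    changes = []
    matcher = SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete' or 'insert'
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


print(letter_changes("adapt", "adopt"))   # [('replace', 'a', 'o')]
print(letter_changes("adopt", "adopts"))  # [('insert', '', 's')]
```

Only the letters named in these operations would need a transition; the unchanged letters can stay in place.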
Using pronunciation keys
Some words share similar spelling but have dissimilar pronunciation. The words dough and cough have similar spelling but are considerably different when spoken. Arranging the list of words using available IPA pronunciation keys rather than spelling generates the tongue-twisting series of words below.
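The same distance can be computed over pronunciation symbols instead of letters. In the sketch below, each word maps to a tuple of sound symbols; the transcriptions are hand-written approximations for illustration only and are not taken from the Collins data, and the dictionary name `ipa` is an assumption.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings of letters or tuples of sounds)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (item_a != item_b)))
        previous = current
    return previous[-1]


# Illustrative, hand-written transcriptions (assumed, one symbol per sound)
ipa = {
    "dough": ("d", "oʊ"),
    "cough": ("k", "ɔ", "f"),
    "doe": ("d", "oʊ"),
}

print(levenshtein("dough", "cough"))            # 1: spelling differs by one letter
print(levenshtein(ipa["dough"], ipa["cough"]))  # 3: every sound differs
print(levenshtein("dough", "doe"))              # 3: spelling differs by three letters
print(levenshtein(ipa["dough"], ipa["doe"]))    # 0: pronounced the same
```

Treating a diphthong as a single symbol is a design choice; splitting it into two symbols would change the distances.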
Finding the oddest words in the dictionary
Which words are least like the others?
This exploration reveals the odd ones out. By selecting words whose Levenshtein distance remains large even to their most similar match among all other words, we can determine those with the most unusual spelling.
Table: 8-letter words and the number of letters by which each differs from its most similar word.
The full list of 92 oddest words is available in the source.
jabberwocky, a nonsense word coined by Lewis Carroll in an 1871 poem, is among the oddest words. Using Levenshtein distance, it proves to be a measurably weird word.
bahuvrihi, a term used in linguistics, needs a majority of its letters (five) swapped or removed to become another dictionary word. It beat all other words of the same length, making it the oddest 9-letter word found.
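The oddness measure described in this section can be sketched as the distance from a word to its closest neighbour, with the highest score kept for each word length. The names `oddness` and `oddest_by_length` and the toy word list are assumptions for the example; a full dictionary would be needed to reproduce the results reported here.

```python
def levenshtein(a, b):
    """Minimum number of single-letter removals, additions or changes turning a into b."""
    previous = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        current = [i]
        for j, letter_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (letter_a != letter_b)))
        previous = current
    return previous[-1]


def oddness(word, words):
    """Distance from word to its most similar other word in the list."""
    return min(levenshtein(word, other) for other in words if other != word)


def oddest_by_length(words):
    """Map each word length to the (word, score) pair with the highest oddness."""
    best = {}
    for word in words:
        score = oddness(word, words)
        # Strict comparison keeps the first word seen when scores tie
        if len(word) not in best or score > best[len(word)][1]:
            best[len(word)] = (word, score)
    return best


# Toy word list, chosen only for illustration
words = ["belief", "relief", "relied", "syzygy", "adapt", "adopt"]
print(oddness("belief", words))  # 1
print(oddness("syzygy", words))  # 6
print(oddest_by_length(words))
```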
Animation and its source available on Codepen
Project Gutenberg English word list on infochimps.com
Google's diff-match-patch library
IPA keys parsed from Collins American English dictionary