levenshtein dictionary

an exploration and re-arrangement of dictionary words. 2014

Levenshtein dictionary is a sequence of all words arranged in order of similarity. The list begins with any supplied word followed by the most similar word as determined by its Levenshtein distance. The process is repeated for each remaining word resulting in a version of the dictionary that transforms from one word to the next — changing as few letters as possible with each step.
Long and unusual words like googolplex, syzygy and borborygmus appear towards the end while shorter words with common morphemes are easily grouped together earlier in the list.

The source words used to generate Levenshtein dictionary could be swapped with those for any language. For the sake of presentation, the one published here excludes abbreviations, prefixes, alternate spellings, words with dashes, words over 29 letters and words under three letters.

Levenshtein distance

Levenshtein distance is a string metric published in 1965 by Vladimir Levenshtein. In simple terms, the Levenshtein distance between two strings is the minimum number of letters that must be removed, added or changed for them to match. For example, the words belief ➝ relief have a distance of 1 because they differ by one letter. Today, Levenshtein distance is commonly used for spell checking in word processors.
Arranging all dictionary words in order requires calculating Levenshtein distance over 800 million times to pair each word with its most similar remaining match.
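The ordering described above can be sketched in a few lines. This is a minimal Python illustration, not the project's JavaScript source: the function names `levenshtein` and `greedy_chain` and the six-word sample list are assumptions made for the example, and ties between equally close words are broken here by list order, which the original may handle differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions or
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # remove a letter
                current[j - 1] + 1,            # add a letter
                previous[j - 1] + (ca != cb),  # change a letter (free if equal)
            ))
        previous = current
    return previous[-1]


def greedy_chain(words, start):
    """Begin with start, then repeatedly append the closest remaining word."""
    remaining = [w for w in words if w != start]
    chain = [start]
    while remaining:
        current = chain[-1]
        nearest = min(remaining, key=lambda w: levenshtein(current, w))
        remaining.remove(nearest)  # each word is used only once
        chain.append(nearest)
    return chain


# Hypothetical sample list, far smaller than a real dictionary.
sample = ["adapt", "adept", "adopt", "relief", "belief", "syzygy"]
print(levenshtein("belief", "relief"))
print(greedy_chain(sample, "adapt"))
```

A naive pass like this one makes n(n-1)/2 distance calls for n words. If the 800 million figure was counted the same way, it would correspond to a list of a little over 40,000 words.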

Animating the result

Using a diff library, the difference between words can be isolated. With transitions applied, the letters morph through every English word. The dictionary and animation begin with the word adapt, chosen arbitrarily. Because each word is used only once, the resulting list would be different depending on its starting word. The list below has been generated in advance to avoid straining your processor.
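To show what isolating the difference between two words looks like, here is a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for the diff library used by the project, so the output format is an assumption and not what the animation actually consumes. The helper name `letter_ops` is hypothetical.

```python
from difflib import SequenceMatcher


def letter_ops(old: str, new: str):
    """Return (tag, old_part, new_part) tuples describing how old becomes new.

    tag is one of 'equal', 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


# Only the parts that are not tagged 'equal' would need a transition.
for op in letter_ops("adapt", "adept"):
    print(op)
```

In a sketch like this, the letters tagged `equal` can stay in place while the others fade or slide, which is one way the morphing effect could be driven.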

 

Using pronunciation keys

Some words are spelled alike but pronounced quite differently: dough and cough differ by one letter yet sound nothing alike. Arranging the list of words using available IPA pronunciation keys rather than spelling generates the tongue-twisting series of words below.
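The contrast between spelling and pronunciation can be made concrete by measuring both distances. The Python sketch below assumes a tiny hand-written table of broad IPA transcriptions; these are illustrative and are not taken from the Collins data used by the project. The names `PRONUNCIATION` and `levenshtein` are likewise assumptions for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


# Hypothetical, simplified transcriptions written by hand for this example.
PRONUNCIATION = {
    "dough": "doʊ",
    "cough": "kɔf",
}

spelling_distance = levenshtein("dough", "cough")
sound_distance = levenshtein(PRONUNCIATION["dough"], PRONUNCIATION["cough"])
print(spelling_distance, sound_distance)
```

Here the spellings are one edit apart while the transcriptions share no symbol at all. Note that this compares Unicode code points, so a transcription that uses combining diacritics or length marks would count those as separate symbols.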

Finding the oddest words in the dictionary

Which words are least like the others?
This exploration picks out the odd ones out: words whose Levenshtein distance to even their most similar word is large. Those are the words with the most unusual spelling.
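One way to sketch this selection in Python: for every word, find its nearest neighbour and keep the words whose nearest neighbour is still far away. The threshold parameter `minimum_distance`, the function names and the sample list are assumptions for illustration; the exact cut-off used for the published list is not stated here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def nearest_neighbour(word, words):
    """Return (closest other word, its distance); ties go to list order."""
    others = [w for w in words if w != word]
    best = min(others, key=lambda w: levenshtein(word, w))
    return best, levenshtein(word, best)


def oddest(words, minimum_distance):
    """Words whose closest neighbour is at least minimum_distance away,
    sorted from most to least unusual."""
    found = []
    for word in words:
        neighbour, distance = nearest_neighbour(word, words)
        if distance >= minimum_distance:
            found.append((word, distance, neighbour))
    return sorted(found, key=lambda row: (-row[1], row[0]))


# Hypothetical sample list; a real run would use the full word list.
sample = ["belief", "relief", "adapt", "adept", "adopt", "syzygy"]
print(oddest(sample, minimum_distance=2))
```

Each result row mirrors the layout of the list that follows: the word, its distance, and its most similar word.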

Oddest words
Format: word (number of letters different from its most similar word) most similar word
8 letters
arpeggio (4) adagio
froufrou (4) crouton
rutabaga (4) abaca
toboggan (4) bagman
9 letters
bahuvrihi (5) anuria
10 letters
bradytelic (5) academic
budgerigar (5) bacteria
hullabaloo (5) bugaboo
kookaburra (5) chokeberry
psilocybin (5) epsilon
11 letters
abracadabra (5) abecedary
flabbergast (5) aberrant
jabberwocky (5) amberjack
kwashiorkor (5) glassworker
rathskeller (5) atelier
12 letters
acciaccatura (6) accelerator
katzenjammer (6) antechamber
13 letters
clishmaclaver (7) bushmaster

The full list of 92 oddest words is available in the source.

Observations

jabberwocky, a nonsense word coined by Lewis Carroll in an 1871 poem, is among the oddest words. Levenshtein distance shows it to be a measurably weird word.

bahuvrihi, a term used in linguistics, needs a majority of its letters (five of nine) swapped or removed to become another dictionary word. It beat all others of the same length, making it the oddest 9-letter word found.

Sources

Levenshtein dictionary JavaScript source code is available on GitHub
Animation and its source are available on CodePen

Project Gutenberg English word list on infochimps.com
Google's diff-match-patch library

IPA keys parsed from Collins American English dictionary
