Levenshtien distance
The Levenshtien distance or edit distance between two strings (not necessarily of equal length!) is the number of swaps, insertions, and deletions required to turn one string into another.
The following algorithm computes the Levenshtein distance, and also serves as it's specification.
:{
lev :: (Eq a) => [a] -> [a] -> Int
= length b
lev [] b = length a
lev a [] :as) (b:bs) | a == b = lev as bs
lev (a| otherwise = 1 + minimum [lev as (b:bs), lev (a:as) bs, lev as bs]
:}
Examples
Properties
If the two strings have the same length, then the Hamming distance serves as an upper bound on the Levenshtein distance.
The Levenshtein also satisfies the Triangle inequality. Another way of saying this is that Levenshtein distance forms a Locally Graded Category. In fact, Levenshtein distance is a canonical example of why we need to consider local gradition in Constructive Mathematics; in general, we cannot compute the levenshtein distance unless the underlying type of a string has Decidable Equality; however, the distance is still well defined even if we don't have decidable equality!
Algorithms
One natural approach to computing Levenshtein distances is via dynamic programming. This is essentially the same approach used in Levenshtein distance counts edits, but where we memoize the results in a table.
We can do this nicely in Haskell with a bit of knot-tying magic.
:{
import Data.Functor
-- We start with a type of a bounded grid equipped with a focus.
data Grid a = Grid { grid :: [[a]], focus :: (Int, Int), bounds :: (Int, Int) }
deriving (Show, Functor)
-- We can refocus the grid
refocus :: Grid a -> Int -> Int -> Grid a
= g { focus = (i, j) }
refocus g i j
-- and we can extract the element at the current focus.
extract :: Grid a -> a
Grid { grid = grid, focus = (i, j) }) = grid !! i !! j
extract (
-- Composing these together gives us access to the "subproblem" at (i, j)
gsubproblem :: Grid a -> Int -> Int -> a
= extract (refocus g i j)
gsubproblem g i j
-- 'Grid a' forms a comonad, where 'extract f g' applies f to 'g' refocused
-- at every position.
extend :: (Grid a -> b) -> Grid a -> Grid b
@Grid { bounds = (x, y) }) =
extend k (glet grid =
0..x - 1] <&> \i ->
[0..y - 1] <&> \j ->
[
k (refocus g i j)in g { grid = grid }
-- Take a fixpoint of a function that is paremeterised over an 'x' by 'y' grid,
-- where the focus is set to the coordinates '(x-1, y-1)'.
gfix :: Int -> Int -> (Grid a -> a) -> Grid a
=
gfix x y f let gfix = extend f g
= Grid { focus = (x-1, y-1), bounds = (x, y), grid = grid gfix }
g in g
-- A dynamic programming version of Levenshtein distance.
dlev :: (Eq a) => [a] -> [a] -> (Int -> Int -> Int) -> Int -> Int -> Int
0 = i
dlev as bs subproblem i 0 j = j
dlev as bs subproblem | as !! i == bs !! j = subproblem (i - 1) (j - 1)
dlev as bs subproblem i j | otherwise = 1 + minimum [subproblem (i - 1) j, subproblem i (j - 1), subproblem (i - 1) (j - 1)]
-- A bit of 'gfix' lets us tie the knot!
dist :: (Eq a) => [a] -> [a] -> Int
=
dist as bs $ gfix (length as) (length bs) $ \(g@(Grid {focus = (i, j)})) ->
extract
dlev as bs (gsubproblem g) i j:}
"kitten" "sitting" dist
2
Note that this is (essentially) the following algorithm, but where we simply count the number of non-ok edits at the end.
:{
import Data.Foldable
import Data.Function
data Edit a = Keep a | Subst a a | Insert a | Delete a
deriving (Show)
cost :: Edit a -> Int
Keep _) = 0
cost (= 1
cost _
edits :: (Eq a) => [a] -> [a] -> [Edit a]
= fmap Insert bs
edits [] bs = fmap Delete as
edits as [] :as) (b:bs) | a == b = Keep a : edits as bs
edits (a| otherwise = minimumBy (compare `on` (sum . fmap cost)) $ [Delete a : edits as (b:bs), Insert b : edits (a:as) bs, Subst a b : edits as bs]
:}
"kitten" "sitting" edits
This in turn does a search on the following edit tree for the shortest series of edits.
:{
data Tree a = Node [(a, Tree a)]
deriving (Show)
tree :: [a] -> Tree a
= foldr (\a as -> Node [(a, as)]) (Node [])
tree
editTree :: (Eq a) => [a] -> [a] -> Tree (Edit a)
= tree $ fmap Insert bs
editTree [] bs = tree $ fmap Delete as
editTree as [] :as) (b:bs) | a == b = Node [(Keep a, editTree as bs)]
editTree (a| otherwise = Node [(Delete a, editTree as (b:bs)), (Insert b, editTree (a:as) bs), (Subst a b, editTree as bs)]
:}
"ab" "b" editTree
We can test our edit producing code like so:
:{
edit :: [Edit a] -> [a] -> [a]
= as
edit [] as Keep _ : es) (a : as) = a : edit es as
edit (Subst _ b : es) (_ : as) = b : edit es as
edit (Insert a : es) as = a : edit es as
edit (Delete _ : es) (_ : as) = edit es as
edit (:}
Subst 'k' 's',Keep 'i',Keep 't',Keep 't',Subst 'e' 'i',Keep 'n',Insert 'g'] "kitten"
edit [Insert 'b', Delete 'a', Delete 'b'] "ab" edit [