Approximate pattern matching
Approximate pattern matching is a computer science problem where we try to find all occurances of a pattern string in a text within some "distance" , where we measure distance using some distance metric.
Some common distance metrics:
APM with Hamming distance
Consider the pattern in the word . The Hamming distance , so one substitution yields a match. Pattern matching with the Hamming distance can be done in time, where is the length of the string, the length of the pattern, and the size of the alphabet.
References
APM with Levenshtien distance
Couple of possible approaches here:
- Dynamic programming
- Bit parallel
- Automata
Dynamic programming is the easiest and most flexible, but slow: requires time.
:{
import Control.Monad
import Data.Function
import Data.Functor
-- | The dynamic programming pattern for APM and Levenshtien distance.
dmatch :: (Eq a) => [a] -> [a] -> (Int -> Int -> Int) -> Int -> Int -> Int
0 j = 0
dmatch needle haystack subproblem 0 = i
dmatch needle haystack subproblem i =
dmatch needle haystack subproblem i j if (needle !! (i - 1) == haystack !! (j - 1)) then
- 1) (j - 1)
subproblem (i else
1 + minimum [subproblem (i - 1) j, subproblem i (j - 1), subproblem (i - 1) (j - 1)]
-- | 'approxMatch dist needle haystack' will find all occurances of 'needle' in 'haystack' that are
-- at most 'dist' edit distance apart.
approxMatch :: (Eq a) => Int -> [a] -> [a] -> [(Int, [a])]
= do
approxMatch dist needle haystack let m = length needle
let n = length haystack
<- [m..n]
j let c = fix (dmatch needle haystack) m j
<= dist)
guard (c pure (c, drop (j - m) $ take j haystack)
:}
1 "annual" "annealingannual" approxMatch