Border array
A string is said to be a border of a string if it is both a prefix and a suffix of , and is strictly shorter than .
The border array of a string is an array of natural numbers, where is the length of the largest border of .
Algorithm
We can neatly compute the border array using a left scan.
:{
import Data.List
border :: forall a. (Eq a) => [a] -> [Int]
border xs = 0:(snd $ mapAccumL extend [0] (drop 1 xs))
where
-- Treating 'bs' as a stack.
extend :: [Int] -> a -> ([Int], Int)
extend (b:bs) x | x == xs !! b = ((b + 1):b:bs, b + 1)
| b > 0 = extend bs x
| otherwise = ([0], 0)
:}
putStr $ unlines $ fmap show $ zip (drop 1 $ inits "ababacaabcababa") (border "ababacaabcababa")
border "abcxabca"The idea behind this algorithm is that we maintain a stack of
border values seen up until a given point, and check if we can
extend the border on every iteration. If we cannot, we pop a value
from the stack, and see if that border can be extended, and so on.
This becomes relevant for strings like abaa; the border
a cannot be extended to a border aa, but
a is still a valid border.
abcxabca goes from a match of length 3 to a match of
length 1 aaaxaaaa stays constant
Note that we do not actually need to maintain the stack, as we can use the previously computed list of values as our stack.
Prefix arrays from border arrays
:{
import Data.List
lcp :: (Eq a) => [a] -> [a] -> [a]
lcp xs ys = map fst $ takeWhile (uncurry (==)) $ zip xs ys
prefixes :: (Eq a) => [a] -> [[a]]
prefixes xs = map (lcp xs) (init $ tails xs)
prefix :: (Eq a) => [a] -> [Int]
prefix = map length . prefixes
:}
putStr $ unlines $ fmap show $ zip4 (drop 1 $ inits "ababacaabcababa") (init $ tails "ababacaabcababa") (border "ababacaabcababa") (map length (prefixes "ababacaabcababa"))
map length (prefixes "ababacaabcababa")
(map length (prefixes "abcxabca"))Start from the end, and keep a running tally of the border length. I think this is vaguely the right idea, but the hard case is when the value doesn't decrease from the previous iteration. I think I need to use some sort of stack for this?
:{
bprefix :: forall a. (Eq a) => [a] -> [Int]
bprefix (hd:xs) = snd $ mapAccumR extend (0,0) $ zip (hd:xs) (border (hd:xs))
where
extend :: (Int, Int) -> (a, Int) -> ((Int, Int), Int)
extend (bprev, bmax) (x,b) | b == 0 = ((0, 0), 0)
-- | b <= bprev = ((b, bmax + (bprev - b) - 1), if x == hd then bmax + (bprev - b) - 1 else 0)
| b < bprev = ((b, bmax), if x == hd then bmax - b + 1 else 0)
| b == bprev = ((b,bprev), if x == hd then bprev else 0)
| b > bprev = ((b, b), if x == hd then bprev + 1 else 0)
:}
bprefix "ababacaabcababa"
bprefix "abcxabca"
border "aabaabaabbaaa"
prefix "aabaabaabbaaa"
There are really 4 cases here:
- If the next border value is 0, then we know that the corresponding longest prefix must be 0
- If the next border value is greater than the previous border value, then we track that as the longest possible prefix we can see
- If the next border value is 1 less than the previous border value, keep the same maximal border value.
- If the next border value is equal to the previous border value, decrease the max border by 1.
Need to cook up an example that has a border sequence like [1,2,2,2,3,4,5]