Border array

A string u 𝑢 u is said to be a border of a string w 𝑤 w if it is both a prefix and a suffix of w 𝑤 w , and is strictly shorter than w 𝑤 w .

The border array B ( u ) 𝐵 𝑢 B(u) of a string u 𝑢 u is an array of natural numbers, where B ( u ) [ i ] 𝐵 𝑢 delimited-[] 𝑖 B(u)[i] is the length of the largest border of u [ 1 . . i ] u[1..i] .

Algorithm

We can neatly compute the border array using a left scan.

:{
import Data.List

border :: forall a. (Eq a) => [a] -> [Int]
border xs = 0:(snd $ mapAccumL extend [0] (drop 1 xs))
  where
    -- Treating 'bs' as a stack.
    extend :: [Int] -> a -> ([Int], Int)
    extend (b:bs) x | x == xs !! b = ((b + 1):b:bs, b + 1)
                    | b > 0 = extend bs x
                    | otherwise = ([0], 0)

:}

putStr $ unlines $ fmap show $ zip (drop 1 $ inits "ababacaabcababa") (border "ababacaabcababa")
border "abcxabca"

The idea behind this algorithm is that we maintain a stack of border values seen up until a given point, and check if we can extend the border on every iteration. If we cannot, we pop a value from the stack, and see if that border can be extended, and so on. This becomes relevant for strings like abaa; the border a cannot be extended to a border aa, but a is still a valid border.

abcxabca goes from a match of length 3 to a match of length 1 aaaxaaaa stays constant

Note that we do not actually need to maintain the stack, as we can use the previously computed list of values as our stack.

Prefix arrays from border arrays

:{
import Data.List
lcp :: (Eq a) => [a] -> [a] -> [a]
lcp xs ys = map fst $ takeWhile (uncurry (==)) $ zip xs ys

prefixes :: (Eq a) => [a] -> [[a]]
prefixes xs = map (lcp xs) (init $ tails xs)

prefix :: (Eq a) => [a] -> [Int]
prefix = map length . prefixes
:}

putStr $ unlines $ fmap show $  zip4 (drop 1 $ inits "ababacaabcababa") (init $ tails "ababacaabcababa") (border "ababacaabcababa") (map length (prefixes "ababacaabcababa"))
map length (prefixes "ababacaabcababa")
(map length (prefixes "abcxabca"))

Start from the end, and keep a running tally of the border length. I think this is vaguely the right idea, but the hard case is when the value doesn't decrease from the previous iteration. I think I need to use some sort of stack for this?

:{
bprefix :: forall a. (Eq a) => [a] -> [Int]
bprefix (hd:xs) = snd $ mapAccumR extend (0,0) $ zip (hd:xs) (border (hd:xs))
  where
    extend :: (Int, Int) -> (a, Int) -> ((Int, Int), Int)
    extend (bprev, bmax) (x,b) | b == 0  = ((0, 0), 0)
                               -- | b <= bprev  = ((b, bmax + (bprev - b) - 1), if x == hd then bmax + (bprev - b) - 1 else 0)
                               | b < bprev   = ((b, bmax), if x == hd then bmax - b + 1 else 0)
                               | b == bprev  = ((b,bprev), if x == hd then bprev else 0)
                               | b > bprev   = ((b, b), if x == hd then bprev + 1 else 0) 
:}
bprefix "ababacaabcababa"
bprefix "abcxabca"
border "aabaabaabbaaa"
prefix "aabaabaabbaaa"

There are really 4 cases here:

Need to cook up an example that has a border sequence like [1,2,2,2,3,4,5]