Prefix array

A prefix array of a string w 𝑤 w is an array P 𝑃 P such that P [ i ] 𝑃 delimited-[] 𝑖 P[i] is the length of the longest prefix of w [ i w ] 𝑤 delimited-[] 𝑖 delimited-∣∣ 𝑤 w[i\ldots\mid w\mid] that is also a prefix of w 𝑤 w .

Algorithm

The following brute-force algorithm computes a prefix array of a string.

:{
import Data.List

-- | Length of the longest common prefix.
llcp :: (Eq a) => [a] -> [a] -> Int
llcp [] ys = 0
llcp xs [] = 0
llcp (x : xs) (y : ys) = if x == y then 1 + llcp xs ys else 0

naivePrefix :: (Eq a) => [a] -> [Int]
naivePrefix xs = map (llcp xs) (tails xs)
:}

naivePrefix "aabcaabxaaz"
naivePrefix "aabaabaabbaaa"
naivePrefix "anpanman"
naivePrefix "abacabacab"

This algorithm is O ( n 3 ) 𝑂 superscript 𝑛 3 O(n^{3}) ; llcp xs takes linear time, and tails produces a list of lists that scales quadratically with the size of xs.

The key insight comes when we observe the algebraic behaviour of llcp. Let u s , v s , w s 𝑢 𝑠 𝑣 𝑠 𝑤 𝑠 us,vs,ws be lists such that llcp ( u s , v s ) = m llcp 𝑢 𝑠 𝑣 𝑠 𝑚 \mathrm{llcp}(us,vs)=m , llcp ( v s , w s ) = n llcp 𝑣 𝑠 𝑤 𝑠 𝑛 \mathrm{llcp}(vs,ws)=n . We then have the following formula for llcp ( u s , w s ) llcp 𝑢 𝑠 𝑤 𝑠 \mathrm{llcp}(us,ws) .

llcp(us,ws)={min(m,n)if mnm+llcp(drop(m,us),drop(m,ws))if m=nllcp𝑢𝑠𝑤𝑠cases𝑚𝑛if 𝑚𝑛𝑚llcpdrop𝑚𝑢𝑠drop𝑚𝑤𝑠if 𝑚𝑛\mathrm{llcp}(us,ws)=\begin{cases}\min(m,n)&\text{if }m\neq n\\ m+\mathrm{llcp}(\mathrm{drop}(m,us),\mathrm{drop}(m,ws))&\text{if }m=n\\ \end{cases}
let (us, vs, ws) = ("aab", "aac", "abc")
llcp us vs
llcp vs ws
2
1