Computer Science 456/656
Automata and Formal Languages
Every context-free language (CFL) is in the class P-Time.
This is proved by giving a polynomial time algorithm for deciding
whether a given string is in a given CFL. The best known such algorithm
is called the CYK algorithm, and uses dynamic programming
to build a collection of derivations of substrings from grammar symbols.
Let G be a CFG.
Let L be the language generated by G.
For the sake of simplicity, I shall assume that G is in Chomsky normal
form. In actual fact, the CYK algorithm can be adapted to an arbitrary
context-free grammar, but the details are much more complex.
This complexity makes explaining it far more difficult, but adds nothing
to your understanding, because the CNF case gives you the essence of
the algorithm.
Let m be the number of
grammar symbols of G. We name this grammar symbols
x1, x2, ... xm. We can assume
that the start symbol is x1. We can also assume that
x1 ... xr are variables, and that
xr+1 ... xm are terminals.
Let w[p,d] be the substring of w of length d
starting from the pth symbol of w.
Let P[i,p,d] be the boolean array
where P[i,p,d] means that
xi derives w[p,d].
Note that the size of the array is O(n2), because the size
of the grammar is treated as a constant.
The steps of the CYK algorithm:
-
Initialize all P[i,p,d] to false.
-
For all i from r+1 to m, and for all p from 1 to n, if xi = w[p,1],
then assign P[i,p,1] to be true.
-
For each production
of the form xj -> xi, where j is between 1 and p
(i.e., xj is a variable) and i is between r+1 and m
(i.e., xi is a terminal), and for each p from 1 to n,
if P[i,p,1], then assign P[j,p,1] to be true.
-
Execute the following for all d, starting with d=2 and ending with d=n.
-
Execute the following for every production of the form
xj -> xjxk
for all p from 1 to n-d+1,
for q from p+1 to p+d-1,
if P[j,p,q-p] and P[k,q,d+p-q] then assign P[i,p,d] to true.
-
If P[1,1,n] then return Yes else return No.