CMPS 260
Some Decidable CFG/CFL Decision Problems

Emptiness Problem: Given CFG G, is L(G) = ∅?

The answer is yes iff the start symbol of G is not "fruitful". A symbol X is said to be fruitful if, for some terminal string z, X ⇒* z. Obviously, every symbol in the terminal alphabet of G is fruitful (because it is related to itself via the relation ⇒*). But there may be nonterminal symbols that are not. (An obvious example would be a nonterminal B whose only production was B → aB. We have that B ⇒* anB for every n≥0, but there is no string composed entirely of terminal symbols that can be generated from B.)

Here is an algorithm for identifying the fruitful nonterminal symbols of a grammar:

q := empty queue
Fruitful := ∅  //  set of nonterminals known to be fruitful
do for each production A --> z, where z includes only terminal symbols 
|  |  Fruitful := Fruitful ∪ {A}
|  |  q.enqueue(A)
|  fi
od

// At this point, all nonterminals that produce terminal strings in 
// one step (i.e., via the application of a single production) are in
// Fruitful and on the queue.  What follows is a loop to identify the
// nonterminals from which terminal strings are derivable by a
// sequence of two or more applications of productions.

do while !q.isEmpty()
|  B := q.dequeue();
|  do for each production A --> αBβ // in which B appears on RHS
|  |  if A ∉ Fruitful ∧ every nonterminal X in αβ satisfies X ∈ Fruitful
|  |  |  Fruitful := Fruitful ∪ {A}
|  |  |  q.enqueue(A)
|  |  fi
|  od
od
// At this point, Fruitful contains precisely the fruitful nonterminals

The logic of this algorithm is based upon the observation that a nonterminal is fruitful iff it has at least one production whose right-hand side is composed entirely of fruitful symbols.


Generates λ Problem: Given CFG G, is it the case that λ ∈ L(G)?

The answer is yes iff the start symbol of G is "nullable", meaning that it generates the empty string. The algorithm below identifies the nullable nonterminals in a CFG.

The logic of the algorithm is based upon the observation that a nonterminal is nullable iff it has at least one production whose right-hand side is composed entirely of nullable symbols. You will notice that it differs little from the algorithm (see above) that identifies the fruitful nonterminals.

q := empty queue
Nullables := ∅  //  set of nonterminals known to be nullable
do for each λ-production A --> λ
|  |  Nullables := Nullables ∪ {A}
|  |  q.enqueue(A)
|  fi
od

// At this point, all nonterminals that produce λ in one step
// (i.e., via the application of a single λ-production) are in
// Nullables and on the queue.  What follows is a loop to
// identify the nonterminals from which λ is derivable by a
// sequence of two or more applications of productions.

do while !q.isEmpty()
|  B := q.dequeue();
|  do for each production A --> αBβ // in which B appears on RHS
|  |  if A ∉ Nullables ∧ every symbol X in αβ satisfies X ∈ Nullables
|  |  |  Nullables := Nullables ∪ {A}
|  |  |  q.enqueue(A)
|  |  fi
|  od
od
// At this point, Nullables contains precisely the nullable nonterminals


The Infinite Language Problem: Given CFG G, is it the case that L(G) is infinite?

To simplify the answer to this question, it is convenient to assume that G has the following properties:

  1. Every nonterminal is "useful", meaning that it appears in at least one sentential form in at least one derivation of a terminal string from S. That is, A is useful if there is a derivation S ⇒* xAz ⇒+ xyz, where x, y, and z are strings of terminal symbols. (See web page "Removing Useless Symbols from a CFG". You will notice that step 1 is to identify the fruitful nonterminals, the algorithm for which appears above.)
  2. G has no λ-productions.
  3. G has no "unit" productions, which are those of the form A → B, in which the right-hand side is a single nonterminal symbol.

We can make these assumptions about G "without loss of generality" because for every CFL L (that does not contain λ as a member) there exists a grammar that generates it and that satisfies these restrictions. Indeed, given G, we can algorithmically produce G' that satisfies all these restrictions and, moreover, satisfies L(G') = L(G) - {λ}.

Now to answer the question...
Assuming that G is consistent with the restrictions enumerated above, L(G) is infinite iff G has some nonterminal that is self-embedding. A nonterminal A is self-embedding if A ⇒+ xAz for some x and z for which xz ≠ &lambda. Given the restrictions that we have placed upon G, it is easy to determine (via an algorithm) whether or not a given nonterminal is self-embedding. (If G were allowed to include λ-productions or unit productions, it would make the job a little harder.)

To demonstrate that L(G) is infinite if a self-embedding nonterminal symbol A exists, suppose that

Then we have

S ⇒+uAv ⇒+ uyv,
S ⇒+uAv ⇒+ uxAzv ⇒+ uxyzv,
S ⇒+uAv ⇒+ uxAzv ⇒+ uxxAzzv ⇒+ uxxyzzv,
S ⇒+uAv ⇒+ uxAzv ⇒+ uxxAzzv ⇒+ uxxxAzzzv ⇒+ uxxxyzzzv,
etc., etc.

That is for each n≥0 we have S ⇒+ uxnyznv, which are infinitely many different terminal strings, and thus L(G) is infinite.

As for the converse, we must demonstrate that the lack of a self-embedding nonterminal implies that L(G) is finite. If G has no self-embedding nonterminal, then the longest path (from the root to a leaf) in any derivation tree is bounded above by the number of distinct nonterminals in G. (Any path longer than that would contain two nodes labeled by the same nonterminal symbol, implying that that nonterminal symbol was self-embedding.)

It follows that every derivation tree of G has height less than or equal to n, where n is the number of nonterminal symbols in G's alphabet. But there are only finitely many such derivation trees. Hence, L(G) is finite.