Emptiness Problem: Given CFG G, is L(G) = ∅?
The answer is yes iff the start symbol of G is not "fruitful". A symbol X is said to be fruitful if, for some terminal string z, X ⇒* z. Obviously, every symbol in the terminal alphabet of G is fruitful (because it is related to itself via the relation ⇒*). But there may be nonterminal symbols that are not. (An obvious example would be a nonterminal B whose only production was B → aB. We have that B ⇒* anB for every n≥0, but there is no string composed entirely of terminal symbols that can be generated from B.)
Here is an algorithm for identifying the fruitful nonterminal symbols of a grammar:
|
The logic of this algorithm is based upon the observation that a nonterminal is fruitful iff it has at least one production whose right-hand side is composed entirely of fruitful symbols.
Generates λ Problem: Given CFG G, is it the case that λ ∈ L(G)?
The answer is yes iff the start symbol of G is "nullable", meaning that it generates the empty string. The algorithm below identifies the nullable nonterminals in a CFG.
The logic of the algorithm is based upon the observation that a nonterminal is nullable iff it has at least one production whose right-hand side is composed entirely of nullable symbols. You will notice that it differs little from the algorithm (see above) that identifies the fruitful nonterminals.
|
The Infinite Language Problem: Given CFG G, is it the case that L(G) is infinite?
To simplify the answer to this question, it is convenient to assume that G has the following properties:
We can make these assumptions about G "without loss of generality" because for every CFL L (that does not contain λ as a member) there exists a grammar that generates it and that satisfies these restrictions. Indeed, given G, we can algorithmically produce G' that satisfies all these restrictions and, moreover, satisfies L(G') = L(G) - {λ}.
Now to answer the question...
Assuming that G is consistent with the restrictions enumerated
above, L(G) is infinite iff G has some nonterminal that is
self-embedding. A nonterminal A is self-embedding
if A ⇒+ xAz for some x and z for which xz ≠ &lambda.
Given the restrictions that we have placed upon G, it is easy
to determine (via an algorithm) whether or not a given nonterminal is
self-embedding. (If G were allowed to include λ-productions or
unit productions, it would make the job a little harder.)
To demonstrate that L(G) is infinite if a self-embedding nonterminal symbol A exists, suppose that
Then we have
S ⇒+uAv ⇒+ uyv,
S ⇒+uAv ⇒+ uxAzv
⇒+ uxyzv,
S ⇒+uAv ⇒+ uxAzv
⇒+ uxxAzzv ⇒+ uxxyzzv,
S ⇒+uAv ⇒+ uxAzv
⇒+ uxxAzzv ⇒+ uxxxAzzzv
⇒+ uxxxyzzzv,
etc., etc.
That is for each n≥0 we have S ⇒+ uxnyznv, which are infinitely many different terminal strings, and thus L(G) is infinite.
As for the converse, we must demonstrate that the lack of a self-embedding nonterminal implies that L(G) is finite. If G has no self-embedding nonterminal, then the longest path (from the root to a leaf) in any derivation tree is bounded above by the number of distinct nonterminals in G. (Any path longer than that would contain two nodes labeled by the same nonterminal symbol, implying that that nonterminal symbol was self-embedding.)
It follows that every derivation tree of G has height less than or equal to n, where n is the number of nonterminal symbols in G's alphabet. But there are only finitely many such derivation trees. Hence, L(G) is finite.