CMPS 260 Spring 2026
Test #2 Sample Solutions

1. Consider the context-free grammar G1:

G1
SKQM   (1)
QaQb  |  λ   (2) (3)
KaK  |  a   (4) (5)
M λ  |  cMd  |  cMdd   (6) (7) (8)

(a) Give a precise description of L(G1) (preferably using set comprehension, as seen in Problem 3).
Solution: The following all describe L(G1) in slightly different ways:

{ am+nbmcjdk  |  n>0 ∧ j≤k≤2j }
{ a+ambmcjdk  |  j≤k≤2j }
{ ambncjdk  |  m>n ∧ j≤k≤2j }

A number of students, while correctly reporting that d occurs at least as many times as c, failed to recognize that d was constrained to occur no more than twice as many times as c.

(b) Demonstrate that G1 is ambiguous.
Solution: To demonstrate ambiguity, we present non-isomorphic derivation trees for the string accddd, which is the shortest string for which this is possible.

A number of students showed non-isomorphic derivation trees for longer strings. Such answers are correct but inferior to the one given here, which demonstrates ambiguity in the simplest possible way.

     S
    /|\
   / | \
  /  |  \
 /   |   \
K    Q    M
|    |   /|\
a    λ  c M d
         /|\\
        / | \\
       c  M  dd
          |
          λ
     S
    /|\
   / | \
  /  |  \
 /   |   \
K    Q    M
|    |   /|\\
a    λ  / | \\
       c  M  dd
         /|\
        c M d
          |
          λ

(c) Modify the grammar to make it unambiguous (while preserving the language that it generates).
Solution: The ambiguity of G1 stems from the productions of nonterminal M, which allow, at each step of a derivation, a single c and either one or two d's to be generated. To disambiguate the grammar, we modify it so that in any derivation the applications of productions that introduce a single d must precede those of productions that introduce a pair of d's. (This was an arbitrary choice; an alternative would be to change the grammar to force the productions that generate pairs of d's to be applied before those that generate single d's.)

SKQM   (1)
QaQb  |  λ   (2) (3)
KaK  |  a   (4) (5)
M cMd  |  N   (6) (7)
N λ  |  cNdd   (8) (9)

One subtlety is that production M → λ had to be removed or else there would be two different ways to generate the empty string from M (the other being M ⇒ N ⇒ λ).


2. Consider the context-free grammar G2:

G2
SaSbb  |  aKbb   (1) (2)
KcdKe  |  M   (3) (4)
MhM  |  g   (5) (6)

Give a precise description of L(G2) (preferably using set comprehension, as seen in Problem 3).

Solution: L(G2 = { ak(cd)jhigejb2k  |  k>0 }

Because the number of occurrences of h is independent of the number of occurrences of anything else, we could have written hi as h*.

Among the mistakes made by several students was to say cjdj rather than (cd)j. Another was to mess up the order in which the symbols occurred, for example thinking that the a's and b's lay next to each other and/or the e's preceded the h's.


3. Present a context-free grammar G3 that generates the language

L3 = { ambncjdiaaei  |  m > n  ∧  j ≥ 0  ∧  i > 1 }

Hint: Devise CFGs for L3,1 = {ambn  |  m>n}, L3,2 = {cj  |  j≥0}, and L3,3 = {diaaei  |  i>1}, and combine them into a single CFG.

Solution: Of course, L3 = L3,1 · L3,2 · L3,3.

G3
SKMN   (1)
K aKb  |  aK  |  λ   (2) (3) (4)
McM  |  λ   (5) (6)
N dNe  |  ddaaee   (6) (7)

From K can be generated the strings in L3,1.
From M can be generated the strings in L3,2.
From N can be generated the strings in L3,3

It folows that from KMN (and hence from S, using production (1) as a first step) can be generated the strings in L3 = L3,1 · L3,2 · L3,3.

A slightly different solution replaces (1) with S → KM and replaces (6) with M → N. This change makes it such that the strings derivable from M form the set L3,2 · L3,3.


4. Recall a decision problem that was discussed in class:

Given a CFG G, is L(G) infinite?

Recall that the property of a CFG that is necessary and sufficient for it to generate an infinite language is that —assuming all its nonterminal symbols are "useful"— at least one of its nonterminal symbols is "self-embedding".

A closely related decision problem is this:

Given a CFG G having a as a member of its terminal alphabet, is L(G) such that, for every string x ∈ L(G), there is a string y ∈ L(G) having more occurrences of a than x has? To ask it differently, is it the case that, for every n≥0, there is a string y ∈ L(G) having more than n occurrences of a?

(a) Describe a property of a CFG that is necessary and sufficient for the answer to this question to be "yes".

Solution: Assuming that G includes only useful symbols, there is no upper bound on the number of occurrences of a in members of L(G) iff G has some nonterminal A such that A ⇒+ xAy (for some terminal strings x and y), where at least one a occurs in xy.

From A ⇒+ xAy it follows that A ⇒* xnAyn for all n≥0. From A being useful it follows that S ⇒* uAv and A ⇒+ w (for some terminal strings u, v, and w). From all these it follows that, for every n ≥ 0

S ⇒* uAv ⇒* uxnAyn* uxnwyn

Because xy includes at least one occurrence of a, uxnwyn includes at least n occurrences of a.

(b) Sketch an algorithm for determining whether a given CFG has this property. (You can assume that every nonterminal is "useful" and that there are no λ-productions or unit productions in the grammar. See next problem for the definition of a unit production.)

Solution: Omitted for the moment.


5. Consider the context-free grammar G5:

G5
SaKb  |  c  |  dM   (1) (2) (3)
KM  |  dK   (4) (5)
MbMa  |  aS   (6) (7)

A unit production is one whose right-hand side is a single nonterminal symbol. This grammar includes such a production, namely K → M.

(a) Modify G5 so that the language it generates is preserved but so that there are no unit productions.

Solution: The only unit production in the given grammar is K → M. Using that as a first step, there are two possibilities for the first two steps in a multi-step derivation from K:

K ⇒ M ⇒ bMa   and   K ⇒ M ⇒ aS.

The same effects can be achieved (in only one step rather than two, and without the need for the unit production K → M) if we had the productions K → bMa and K → aS. So we add those two productions and remove production (4).

A few students mistakenly replaced production (1) with S → aMb, the idea being that where in the original grammar you had

S ⇒(1) aKb ⇒(4) aMb,

in the new grammar you could achieve the same thing in a single step: S ⇒(1') aMb. What this misses is that now there is no opportunity to make use of production (5) (because K is unreachable from S), repeated applications of which can generate any number of d's.

(b) Using the insight gained from doing this one example, describe a construction by which to remove unit productions from any CFG without changing the language that it generates.

Would your construction work on grammar G'5, which has all the productions of G5, plus
MQ (8)
QKc  |  b (9) (10)

Solution: The solution to (a) suggests this construction:

If A → B is a (unit) production, and the productions of B are

B → β1  |  β2  |  ···  |  βn,

then replace the unit production A → B by the productions

A → β1  |  β2  |  ···  |  βn,

This is a good start, but if any of the βi's is itself a nonterminal (i.e., B has a unit production of its own), then A would still have at least one unit production.

In the scenario presented in (b) (in which M → Q is introduced as production (8)), and following the procedure above, we would have ended up with K having the production K → Q, and thus we would not have been finished removing unit productions.

Therefore, a correct construction must apply the procedure described above iteratively, until all unit productions have been eliminated. In our example with productions (8), (9), and (10), we would replace K → M not only by K → bMa and K → aS (as indicated before) but also K → Kc | b, reflecting the facts that

K ⇒(4) M ⇒(8) Q ⇒(9) Kc
K ⇒(4) M ⇒(8) Q ⇒(10) b

A subtlety here is that we must not introduce a unit production whose right-hand side is the same as its left-hand side, as in A → A.


6. Let L be a regular language, and suppose that M were a DFA such that L = L(M). What modifications made to M would produce a DFA M' that accepted the language

MAX(L) = { x ∈ L  |  for all y≠λ, xy ∉ L }

In other words, MAX(L) contains precisely those members of L that are not proper prefixes of any members of L. In yet other words, these are the members of L that cannot be "extended" on the right (by concatenating some nonempty string there) to become a longer member of L.

Solution: The modifications of M by which to obtain M' satisfying L(M') = MAX(L(M)) is to make non-final any state of M from which there exists a non-empty walk leading to a final state.

Reasoning: x ∈ MAX(L(M)) iff x ∈ L(M) and there is no non-empty string y such that xy ∈ L(M). Let state p be such that q0x p. Then x is a member of MAX(L(M)) iff p is a final state but all its outgoing transitions go to states from which it is not possible to reach a final state. It follows that the only states that should be final in M' are the final states in M all of whose outgoing transitions go to dead states.

Under the assumption that M is minimal, there can be at most one state from which it is not possible to reach a final state, because that would be the unique "dead" state. Moreover, there can be at most one final state all of whose outgoing transitions go to that dead state. (If there were two such final states, they would be indistinguishable from each other and thus M would not be minimal.) It follows that, if M is minimal, the construction of M' from M goes like this:

If M has a final state all of whose outgoing transitions go to the dead state, M' has that as its lone final state. Otherwise, M' has no final states. In all other respects, M and M' are the same. (Of course, M' itself may be non-minimal. Indeed, in the case that M' has no final states, M' can be reduced to the one-state DFA whose only state is "dead".)