HW #5 CMPS 260 Spring 2025

CMPS 260 Spring 2025
HW #5: Context-free Grammars/Languages; q-grammars, CYK algorithm
Sample Solutions

1. Let G₁ be the following context-free grammar:

S ⟶ Sb | aHb (1) (2)

H ⟶ aHbb | Kb (3) (4)

K ⟶ cK | c (5) (6)

(a) Offer a precise description of L(G₁), the language generated by G₁.

(b) Provide some evidence for the claim that G₁ is unambiguous. To do so, choose a "generic" member of L(G₁), describe its derivation from S, and argue that there is no other way to derive it from S.

(c) Present an alternative CFG G'₁ that generates the same language as G₁ but whose only nonterminal symbols are S and K, with K having the same two productions as are in G₁. (In other words, modify the grammar so that S "does the work" of both S and H in the original grammar.) Demonstrate that your grammar is ambiguous.

Solution:
(a) {a^kc^rb^2k+m | k≥1, m≥0, r≥1}

(b) A few observations about G₁ are in order. First, it is linear (meaning that no production's right-hand side includes more than one nonterminal symbol). It follows that S ⟹^* α implies that α has at most one occurrence of a nonterminal. Keeping that in mind, it is clear that any application of (2) replaces the lone occurrence of S in a sentential form by H, making it impossible to later apply either (1) or (2). By similar reasoning, an application of (4) cannot later be followed by an application of either (3) or (4) (because H was replaced by K). Also, an application of (6) erases the lone nonterminal symbol in the sentential form, terminating the derivation.

From all this we can conclude that every derivation of a terminal string from S looks like this:

S ⟹^m < apply (1) (S ⟶ Sb) m times (m≥0) > Sb^m ⟹ < apply (2) (S ⟶ aHb) > aHbb^m ⟹^k-1 < apply (3) (H ⟶ aHbb) k-1 times (k>0) > aa^k-1Hb^2(k-1)bb^m ⟹ < apply (4) (H ⟶ Kb) > aa^k-1Kbb^2(k-1)bb^m ⟹^r-1 < apply (5) (K ⟶ cK) r-1 times (r>0) > aa^k-1c^r-1Kbb^2(k-1)bb^m ⟹ < apply (6) (K ⟶ c) > aa^k-1c^rbb^2(k-1)bb^m = < arithmetic > a^kc^rb^2k+m

Clearly, any two derivations that differ in their "choices" of m, k, or r will produce different strings. Hence, there is only one way to derive any string derivable from S, which is to say that G₁ is unambiguous.

Note that almost every student tried to show G₁'s non-ambiguity by demonstrating that one particular string (e.g., a³c²b⁸) had only one derivation. But that is not a valid argument, as you cannot prove that a property holds generally among a population of entities by demonstrating that one particular entity in that population has that property.

(c) The "new" (and ambiguous) grammar is

S ⟶ Sb | aSbb | aKbb (1) (2) (3)

K ⟶ cK | c (4) (5)

S /|\\ a S bb |\ S b

S |\ | \ S b /|\\ a S bb

It is ambiguous because in any derivation in which both productions (1) and (2) are applied, the order in which they are applied can be changed without changing the string ultimately produced, giving rise to distinct derivation trees having the same yield (as illustrated to the right).

2. Let G₂ be the following context-free grammar:

S ⟶ SH | λ (1) (2)

H ⟶ aHa | bHb | c (3) (4) (5)

Give a precise description of L(G₂), the language generated by G₂.

Solution: Let L be the language { wcw^R | w ∈ {a,b}* }. Then the language generated by the given grammar is L^* = L⁰ ∪ L¹ ∪ L² ∪ L³ ∪ ...

Or, to express it in another way, the generated language is

{ w₁cw₁^R · w₂cw₂^R · ... w_ncw_n^R | n ≥ 0, w_i ∈ {a,b}^* }

3. Present a context-free grammar G₃ that generates the language { a^j b^k c^k+1 b^2j | j≥0, k≥1 }

Solution:

S ⟶ aSbb | bHc (1) (2)

H ⟶ bHc | c (3) (4)

4. Consider the following Chomsky Normal Form grammar G₄:

S ⟶ HS | a | c (1) (2)

H ⟶ KH | b (3) (4)

K ⟶ KS | a (5) (6)

Use the CYK algorithm to determine whether the string acbbc is a member of L(G₄).

Specifically, fill in the cells of the matrix pictured below so that the cell in row i and column j contains V_i,j = { X ∈ {S,H,K} : X ⟹⁺ w_i,j }, where w_i,j is the substring of w (i.e., the string of interest) beginning with its i-th symbol and ending with its j-th symbol (inclusive). (Your answers are expected to include not only the answer to the question "Is w ∈ L(G)?" but also the correctly completed table.)

Recall that, for i satisfying 1≤i≤|w|, X ∈ V_i,i iff X → w_i,i is a production in the grammar. Meanwhile, for i and j satisfying 1≤i<j≤|w|, X ∈ V_i,j iff there exists k, where i≤k<j, and nonterminals Y and Z such that Y ∈ V_i,k, Z ∈ V_k+1,j, and X → YZ is a production in the grammar.

Solution:

1 2 3 4 5 +------+------+------+------+------+ | a | ac | acb | acbb | acbbc| | {S,K}| {K} | {H} | {} | {K,S}| 1 | | | | | | +------+------+------+------+------+ | c | cb | cbb | cbbc | | {S} | {} | {} | {} | 2 | | | | | +------+------+------+------+ | b | bb | bbc | | {H} | {} | {S} | 3 | | | | +------+------+------+ | b | bc | | {H} | {S} | 4 | | | +------+------+ | c | | {S} | 5 | | +------+

The presence of S in cell (1,5) means that S ⟹^* w_1,5 = w = acbbc, which is to verify that w ∈ L(G₄).

A number of students used the notation "{∅}" to denote the empty set, but that is wrong!! The empty set is denoted by either "∅" or "{}" (the latter being used in the table above). What "{∅}" denotes is the set containing as its lone member the empty set.

5. Let G be this q-grammar:

S ⟶ bDb | cE (1) (2)

D ⟶ cS | aEc (3) (4)

E ⟶ aSb | λ (5) (6)

(a) Show the grammar's parse table.

Solution:

a b c $

S - (1) (2) -

D (4) - (3) -

E (5) (6) (6) (6)

	a	b	c	$
S	-	(1)	(2)	-
D	(4)	-	(3)	-
E	(5)	(6)	(6)	(6)

Note that each of b, c, and $ is a member of FOLLOW(E), justifying the three appearances of (6) in the table.

(b) Show the "stack movie" illustrating the steps taken during a parse —guided by the parse table from (a)— of the string cabacbb.

Solution:

Stack Movie (parsing cabacbb) G

+--------+ +--------+ +-------+ +-------+ +------+ |cabacbb$| (2) |cabacbb$| pop |abacbb$| (5) |abacbb$| pop |bacbb$| (1) |S$ | ==> |cE $ | ==> |E$ | ==> |aSb$ | ==> |Sb$ | ==> +--------+ +--------+ +-------+ +-------+ +------+ +------+ +-----+ +------+ +-----+ +----+ 3 +-+ |bacbb$| pop |acbb$| (4) |acbb$ | pop |cbb$ | (6) |cbb$| pops |$| |bDbb$ | ==> |Dbb$ | ==> |aEcbb$| ==> |Ecbb$| ==> |cbb$| ==> |$| +------+ +-----+ +------+ +-----+ +----+ +-+
(1) S → bDb
(2) S → cE
(3) D → cS
(4) D → aEc
(5) E → aSb
(6) E → λ

Stack Movie (parsing cabacbb)	G
+--------+ +--------+ +-------+ +-------+ +------+ \|cabacbb$\| (2) \|cabacbb$\| pop \|abacbb$\| (5) \|abacbb$\| pop \|bacbb$\| (1) \|S$ \| ==> \|cE $ \| ==> \|E$ \| ==> \|aSb$ \| ==> \|Sb$ \| ==> +--------+ +--------+ +-------+ +-------+ +------+ +------+ +-----+ +------+ +-----+ +----+ 3 +-+ \|bacbb$\| pop \|acbb$\| (4) \|acbb$ \| pop \|cbb$ \| (6) \|cbb$\| pops \|$\| \|bDbb$ \| ==> \|Dbb$ \| ==> \|aEcbb$\| ==> \|Ecbb$\| ==> \|cbb$\| ==> \|$\| +------+ +-----+ +------+ +-----+ +----+ +-+	(1) S → bDb (2) S → cE (3) D → cS (4) D → aEc (5) E → aSb (6) E → λ

(c) Let G_c be the grammar resulting from augmenting G by adding production (7): S ⟶ λ. Show that G_c is not a q-grammar. Do so by showing a pair of "successful" stack movies, in one of which production (1) is applied and in the other of which production (7) is applied in the "same circumstance" (i.e., with S at the top of the stack and b being the next input symbol).

Solution: Consider these two stack movies:

Stack Movies (parsing cab and bacb) G_c

+----+ +----+ +---+ +----+ +---+ +--+ +-+ |cab$| (2) |cab$| pop |ab$| (5) |ab$ | pop |b$ | (7) |b$| pop |$| |S$ | ==> |cE$ | ==> |E$ | ==> |aSb$| ==> |Sb$| ==> |b$| ==> |$| +----+ +----+ +---+ +----+ +---+ +--+ +-+ +-----+ +-----+ +----+ +-----+ +----+ +---+ 2 +-+ |bacb$| (1) |bacb$| pop |acb$| (5) |acb$ | pop |cb$ | (6) |cb$| pops |$| |S$ | ==> |bDb$ | ==> |Db$ | ==> |aEcb$| ==> |Ecb$| ==> |cb$| ==> |$| +-----+ +-----+ +----+ +-----+ +----+ +---+ +-+
(1) S → bDb
(2) S → cE
(3) D → cS
(4) D → aEc
(5) E → aSb
(6) E → λ
(7) S → λ

Stack Movies (parsing cab and bacb)	G_c
+----+ +----+ +---+ +----+ +---+ +--+ +-+ \|cab$\| (2) \|cab$\| pop \|ab$\| (5) \|ab$ \| pop \|b$ \| (7) \|b$\| pop \|$\| \|S$ \| ==> \|cE$ \| ==> \|E$ \| ==> \|aSb$\| ==> \|Sb$\| ==> \|b$\| ==> \|$\| +----+ +----+ +---+ +----+ +---+ +--+ +-+ +-----+ +-----+ +----+ +-----+ +----+ +---+ 2 +-+ \|bacb$\| (1) \|bacb$\| pop \|acb$\| (5) \|acb$ \| pop \|cb$ \| (6) \|cb$\| pops \|$\| \|S$ \| ==> \|bDb$ \| ==> \|Db$ \| ==> \|aEcb$\| ==> \|Ecb$\| ==> \|cb$\| ==> \|$\| +-----+ +-----+ +----+ +-----+ +----+ +---+ +-+	(1) S → bDb (2) S → cE (3) D → cS (4) D → aEc (5) E → aSb (6) E → λ (7) S → λ

Each of the two stack movies arrived at a situation in which the next input symbol was b and S was at the top of the stack. In the parse of cab (see next-to-last step), production (7) was applied, but in the parse of bacb (see first step), production (1) was applied instead.

That such a pair of "conflicting" stack movies exists is due to the fact that, in G_c, S has a λ-production, S has a production whose RHS begins with b, and b ∈ FOLLOW(S).

(d) Let G_d be the grammar resulting from augmenting G by adding production (7): D → λ. Is G_d a q-grammar? If it is, explain how the parse table from (a) would be updated to account for this new production. If it is not, explain why not.

Solution: G_d is a q-grammar. Its parse table would be the same as that of G, except that (7) would occupy the (D,b) entry, because b ∈ FOLLOW(D). Indeed, FOLLOW(D) = {b}; the absence of both a and c from that set is vital, as otherwise D would violate the condition that, in a q-grammar, a nonterminal cannot have both a λ-production and a production whose RHS begins with a symbol in its FOLLOW set.

(e) (more difficult) Let G_e be the grammar resulting from replacing production (1) in G by the production (1') S → bDa. Show that G_e is not a q-grammar.

Solution: G_e is not a q-grammar, because E has a λ-production as well as a production (namely (5)) whose RHS begins with a symbol in its FOLLOW set. Specifically, a ∈ FOLLOW(E), as demonstrated by the derivation

S ⟹⁽¹⁾ bDa ⟹⁽³⁾ bcSa ⟹⁽²⁾ bccEa

S ⟶	bDb \| cE	(1) (2)
D ⟶	cS \| aEc	(3) (4)
E ⟶	aSb \| λ	(5) (6)

CMPS 260 Spring 2025 HW #5: Context-free Grammars/Languages; q-grammars, CYK algorithm Sample Solutions

CMPS 260 Spring 2025
HW #5: Context-free Grammars/Languages; q-grammars, CYK algorithm
Sample Solutions