1. Let G1 be the following context-free grammar:
S | ⟶ | Sb | aHb | (1) (2) |
H | ⟶ | aHbb | Kb | (3) (4) |
K | ⟶ | cK | c | (5) (6) |
(a) Offer a precise description of L(G1), the language generated by G1.
(b) Provide some evidence for the claim that G1 is unambiguous. To do so, choose a "generic" member of L(G1), describe its derivation from S, and argue that there is no other way to derive it from S.
(c) Present an alternative CFG G'1 that generates the same language as G1 but whose only nonterminal symbols are S and K, with K having the same two productions as are in G1. (In other words, modify the grammar so that S "does the work" of both S and H in the original grammar.) Demonstrate that your grammar is ambiguous.
Solution:
(a) {akcrb2k+m
| k≥1, m≥0, r≥1}
(b) A few observations about G1 are in order. First, it is linear (meaning that no production's right-hand side includes more than one nonterminal symbol). It follows that S ⟹* α implies that α has at most one occurrence of a nonterminal. Keeping that in mind, it is clear that any application of (2) replaces the lone occurrence of S in a sentential form by H, making it impossible to later apply either (1) or (2). By similar reasoning, an application of (4) cannot later be followed by an application of either (3) or (4) (because H was replaced by K). Also, an application of (6) erases the lone nonterminal symbol in the sentential form, terminating the derivation.
From all this we can conclude that every derivation of a terminal string from S looks like this:
S ⟹m < apply (1) (S ⟶ Sb) m times (m≥0) > Sbm ⟹ < apply (2) (S ⟶ aHb) > aHbbm ⟹k-1 < apply (3) (H ⟶ aHbb) k-1 times (k>0) > aak-1Hb2(k-1)bbm ⟹ < apply (4) (H ⟶ Kb) > aak-1Kbb2(k-1)bbm ⟹r-1 < apply (5) (K ⟶ cK) r-1 times (r>0) > aak-1cr-1Kbb2(k-1)bbm ⟹ < apply (6) (K ⟶ c) > aak-1crbb2(k-1)bbm = < arithmetic > akcrb2k+m |
Clearly, any two derivations that differ in their "choices" of m, k, or r will produce different strings. Hence, there is only one way to derive any string derivable from S, which is to say that G1 is unambiguous.
Note that almost every student tried to show G1's non-ambiguity by demonstrating that one particular string (e.g., a3c2b8) had only one derivation. But that is not a valid argument, as you cannot prove that a property holds generally among a population of entities by demonstrating that one particular entity in that population has that property.
(c) The "new" (and ambiguous) grammar is
S | ⟶ | Sb | aSbb | aKbb | (1) (2) (3) |
K | ⟶ | cK | c | (4) (5) |
S /|\\ a S bb |\ S b |
S |\ | \ S b /|\\ a S bb |
2. Let G2 be the following context-free grammar:
S | ⟶ | SH | λ | (1) (2) |
H | ⟶ | aHa | bHb | c | (3) (4) (5) |
Give a precise description of L(G2), the language generated by G2.
Solution: Let L be the language { wcwR | w ∈ {a,b}* }. Then the language generated by the given grammar is L* = L0 ∪ L1 ∪ L2 ∪ L3 ∪ ...
Or, to express it in another way, the generated language is
3. Present a context-free grammar G3 that generates the language { aj bk ck+1 b2j | j≥0, k≥1 }
Solution:
S | ⟶ | aSbb | bHc | (1) (2) |
H | ⟶ | bHc | c | (3) (4) |
4. Consider the following Chomsky Normal Form grammar G4:
S | ⟶ | HS | a | c | (1) (2) |
H | ⟶ | KH | b | (3) (4) |
K | ⟶ | KS | a | (5) (6) |
Use the CYK algorithm to determine whether the string acbbc is a member of L(G4).
Specifically, fill in the cells of the matrix pictured below so that the cell in row i and column j contains Vi,j = { X ∈ {S,H,K} : X ⟹+ wi,j }, where wi,j is the substring of w (i.e., the string of interest) beginning with its i-th symbol and ending with its j-th symbol (inclusive). (Your answers are expected to include not only the answer to the question "Is w ∈ L(G)?" but also the correctly completed table.)
Recall that, for i satisfying 1≤i≤|w|, X ∈ Vi,i iff X → wi,i is a production in the grammar. Meanwhile, for i and j satisfying 1≤i<j≤|w|, X ∈ Vi,j iff there exists k, where i≤k<j, and nonterminals Y and Z such that Y ∈ Vi,k, Z ∈ Vk+1,j, and X → YZ is a production in the grammar.
Solution:
1 2 3 4 5 +------+------+------+------+------+ | a | ac | acb | acbb | acbbc| | {S,K}| {K} | {H} | {} | {K,S}| 1 | | | | | | +------+------+------+------+------+ | c | cb | cbb | cbbc | | {S} | {} | {} | {} | 2 | | | | | +------+------+------+------+ | b | bb | bbc | | {H} | {} | {S} | 3 | | | | +------+------+------+ | b | bc | | {H} | {S} | 4 | | | +------+------+ | c | | {S} | 5 | | +------+ |
The presence of S in cell (1,5) means that S ⟹* w1,5 = w = acbbc, which is to verify that w ∈ L(G4).
A number of students used the notation "{∅}" to denote the empty set, but that is wrong!! The empty set is denoted by either "∅" or "{}" (the latter being used in the table above). What "{∅}" denotes is the set containing as its lone member the empty set.
5. Let G be this q-grammar:
S ⟶ | bDb | cE | (1) (2) |
D ⟶ | cS | aEc | (3) (4) |
E ⟶ | aSb | λ | (5) (6) |
(a) Show the grammar's parse table.
Solution:
a | b | c | $ | |
---|---|---|---|---|
S | - | (1) | (2) | - |
D | (4) | - | (3) | - |
E | (5) | (6) | (6) | (6) |
Note that each of b, c, and $ is a member of FOLLOW(E), justifying the three appearances of (6) in the table.
(b) Show the "stack movie" illustrating the steps taken during a parse —guided by the parse table from (a)— of the string cabacbb.
Solution:
Stack Movie (parsing cabacbb) | G |
---|---|
+--------+ +--------+ +-------+ +-------+ +------+ |cabacbb$| (2) |cabacbb$| pop |abacbb$| (5) |abacbb$| pop |bacbb$| (1) |S$ | ==> |cE $ | ==> |E$ | ==> |aSb$ | ==> |Sb$ | ==> +--------+ +--------+ +-------+ +-------+ +------+ +------+ +-----+ +------+ +-----+ +----+ 3 +-+ |bacbb$| pop |acbb$| (4) |acbb$ | pop |cbb$ | (6) |cbb$| pops |$| |bDbb$ | ==> |Dbb$ | ==> |aEcbb$| ==> |Ecbb$| ==> |cbb$| ==> |$| +------+ +-----+ +------+ +-----+ +----+ +-+ |
(1) S → bDb (2) S → cE (3) D → cS (4) D → aEc (5) E → aSb (6) E → λ |
(c) Let Gc be the grammar resulting from augmenting G by adding production (7): S ⟶ λ. Show that Gc is not a q-grammar. Do so by showing a pair of "successful" stack movies, in one of which production (1) is applied and in the other of which production (7) is applied in the "same circumstance" (i.e., with S at the top of the stack and b being the next input symbol).
Solution: Consider these two stack movies:
Stack Movies (parsing cab and bacb) | Gc |
---|---|
+----+ +----+ +---+ +----+ +---+ +--+ +-+ |cab$| (2) |cab$| pop |ab$| (5) |ab$ | pop |b$ | (7) |b$| pop |$| |S$ | ==> |cE$ | ==> |E$ | ==> |aSb$| ==> |Sb$| ==> |b$| ==> |$| +----+ +----+ +---+ +----+ +---+ +--+ +-+ +-----+ +-----+ +----+ +-----+ +----+ +---+ 2 +-+ |bacb$| (1) |bacb$| pop |acb$| (5) |acb$ | pop |cb$ | (6) |cb$| pops |$| |S$ | ==> |bDb$ | ==> |Db$ | ==> |aEcb$| ==> |Ecb$| ==> |cb$| ==> |$| +-----+ +-----+ +----+ +-----+ +----+ +---+ +-+ |
(1) S → bDb (2) S → cE (3) D → cS (4) D → aEc (5) E → aSb (6) E → λ (7) S → λ |
Each of the two stack movies arrived at a situation in which the next input symbol was b and S was at the top of the stack. In the parse of cab (see next-to-last step), production (7) was applied, but in the parse of bacb (see first step), production (1) was applied instead.
That such a pair of "conflicting" stack movies exists is due to the fact that, in Gc, S has a λ-production, S has a production whose RHS begins with b, and b ∈ FOLLOW(S).
(d) Let Gd be the grammar resulting from augmenting G by adding production (7): D → λ. Is Gd a q-grammar? If it is, explain how the parse table from (a) would be updated to account for this new production. If it is not, explain why not.
Solution: Gd is a q-grammar. Its parse table would be the same as that of G, except that (7) would occupy the (D,b) entry, because b ∈ FOLLOW(D). Indeed, FOLLOW(D) = {b}; the absence of both a and c from that set is vital, as otherwise D would violate the condition that, in a q-grammar, a nonterminal cannot have both a λ-production and a production whose RHS begins with a symbol in its FOLLOW set.
(e) (more difficult) Let Ge be the grammar resulting from replacing production (1) in G by the production (1') S → bDa. Show that Ge is not a q-grammar.
Solution: Ge is not a q-grammar, because E has a λ-production as well as a production (namely (5)) whose RHS begins with a symbol in its FOLLOW set. Specifically, a ∈ FOLLOW(E), as demonstrated by the derivation