Example 1:
{anbn | n ≥ 0}
Solution:
Discussion: Clearly, for any n≥0, we can produce anbn by applying (1) n times followed by an application of (2):
This demonstrates that every string of the desired form can be derived from the grammar's start symbol. To demonstrate that only strings of the desired form can be derived, we can use mathematical induction on the length of the derivation. Specifically, we make this claim:
Claim:
If S ⟹* α (i.e., α is a sentential form of the
grammar), then for some m≥0, either
α = ambm or
α = amSbm.
Proof: by mathematical induction on the length n of the derivation.
Base Case: n = 0. In this case, we have
α = S = a0Sb0, which is of
the required form (with m = 0).
Induction Step: Assume the claim to be true for derivations of length n, and suppose that α is derivable from S by a derivation of length n+1. The sentential form α' immediately preceding α in such a derivation is derivable in n steps from S and hence must be of the form amSbm for some m≥0. (α' cannot lack an occurrence of S or else α' ⟹ α cannot hold.) There are two possibilities for how α was derived from α': either via an application of production (1) or an application of production (2). In the former case, we have
Either way, α is of the correct form.
Example 2: {ambn | m≥n}
Solution:
Discussion: Another way to specify the language is { an+kbn | k≥0}. To generate an+kbn, apply production (1) n times and production (2) k times (in any order), and, finally, apply production (3). Such a derivation could look like this:
Note that, with this grammar, the n applications of (1) and k applications of (2) can be carried out in any order. Indeed, this is an ambiguous grammar, which means one having the property that, for at least one string in the language, there exist at least two non-isomorphic (i.e., non-identical) derivation trees for that string. The simplest example of this is with respect to the string aab, which has these two derivation trees:
S
/|\
a S b
/ \
a S
|
λ
|
S
/ \
a S
/|\
a S b
|
λ
|
A context-free language is said to be inherently ambiguous if every context free grammar that generates that language is ambiguous. Interestingly, such languages exist, the canonical example being {akbmcn | k=m ∨ m=n}
The language of this example is not inherently ambiguous, however, as we can easily massage the grammar so as to force all the "extra" a's to be generated after all the b's have been generated. A grammar that does this is
| S → aSb | M | (1) (2) | |
| M → aM | λ | (3) (4) |
Example 3: {ambn | n≤m≤2n}
Solution:
Another way to describe this language is {an+kbn | 0≤k≤n}. To generate an+kbn, it suffices to apply (1) n-k times and (2) k times, in any order, followed by an application of (3). Hence, this grammar is ambiguous.
Exercise: Devise an unambiguous grammar that generates this language.
Example 4: {(ab)ncn | n≥0}
Solution:
What this example illustrates is that the role played by a single symbol (e.g., such as a in the language {anbn} could instead be played by a string of symbols (e.g., ab here).
Example 5: {(a+b)ncn | n≥0}
Solution:
In this example, we incorporate regular expression notation (namely, (a+b)) into our set comprehension notation.
Example 6: {ambncndm}
Solution:
| S → aSd | M | (1) (2) | |
| M → bMc | λ | (3) (4) |
A variation of this language is {ambncn+m} (in which every occurrence of 'd' in a string is replaced by 'c'), which requires only a small adjustment to the grammar:
| S → aSc | M | (1) (2) | |
| M → bMc | λ | (3) (4) |
Example 7: {ambmcndn}
Solution:
| S → KM | (1) | |
| K → aKb | λ | (2) (3) | |
| M → cMd | λ | (4) (5) |
Here, the job of K is to produce strings in {ambm} and the job of M is to produce strings in {cndn}.
From this example, it is only a small step to recognize that context-free languages are closed under concatenation. That is, if L1 and L2 are context-free, so is L1 · L2 = {xy | x ∈ L1 ∧ y ∈ L2}
To demonstrate this, suppose that G1 and G2 are context-free grammars such that Li = L(Gi) (i=1,2). Further suppose that the nonterminal alphabets of the two grammars are disjoint, which we do "without loss of generality", with their start symbols being S1 and S2, respectively. Moreover, assume that neither grammar includes S as a nonterminal symbol. Then the grammar having S as its start symbol and that includes all the productions of both G1 and G2, plus the production S → S1S2, clearly generates L1 · L2.
Example 8: {ambm}*
Solution:
| S → MS | λ | (1) (2) | |
| M → aMb | λ | (3) (4) |
Example 9: {amb2ncd*bnam | m>0}
Solution:
| S → aSa | aKa | (1) (2) | |
| K → bbKb | M | (3) (4) | |
| M → c | Qd | (5) (6) |
Example 10: {ambncmdn}
Solution: There is no context-free grammar that generates this language! Indeed, where {anbn} is the canonical non-regular language, this may be the canonical non-context-free language.
The proof that this language is not context-free is complicated and is beyond the scope of this course. But what it illustrates is that, at their core, context-free languages are collections of strings that are "properly nested", like the parentheses, brackets, and braces in some kind of algebraic expression.
In the language of this example, you can think of a's as playing the role of left parentheses (i.e., '(') and of c's as playing their right parenthesis mates (i.e., ')'). Meanwhile, the b's play the role of left square brackets (i.e., '[') and d's their right-bracket mates (i.e., ']'). (Why must a's and c's be considered each other's mates, and similarly b's and d's? Because they are required to occur the same number of times.) But the strings of this language are not properly nested. Consider aabbbccddd, corresponding to (([[[))]]].
If you consider every other language among these examples, you can recognize pairs of symbols that act as each other's mates.