CMPS 260
Examples of Devising Context-free Grammars

Example 1: {aⁿbⁿ | n ≥ 0}
Solution:

S → aSb | λ (1) (2)

Discussion: Clearly, for any n≥0, we can produce aⁿbⁿ by applying (1) n times followed by an application of (2):

S ⟹⁽¹⁾ aSb ⟹⁽¹⁾ aaSbb ⟹⁽¹⁾ ··· ⟹⁽¹⁾ aⁿSbⁿ ⟹⁽²⁾ aⁿbⁿ
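This derivation is mechanical enough to replay in code. Below is a minimal Python sketch. The function name `derive_anbn` is a hypothetical helper of our own, and sentential forms are plain strings in which S is the only nonterminal.

```python
def derive_anbn(n: int) -> list[str]:
    """Return the sentential forms of S => ... => a^n b^n under
    S -> aSb (1) | λ (2), applying (1) n times and then (2) once."""
    form = "S"
    forms = [form]
    for _ in range(n):
        form = form.replace("S", "aSb", 1)  # production (1)
        forms.append(form)
    form = form.replace("S", "", 1)         # production (2): S -> λ
    forms.append(form)
    return forms


print(derive_anbn(3))
# ['S', 'aSb', 'aaSbb', 'aaaSbbb', 'aaabbb']
```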

This demonstrates that every string of the desired form can be derived from the grammar's start symbol. To demonstrate that only strings of the desired form can be derived, we can use mathematical induction on the length of the derivation. Specifically, we make this claim:

Claim: If S ⟹* α (i.e., α is a sentential form of the grammar), then for some m ≥ 0, either α = aᵐbᵐ or α = aᵐSbᵐ.
Proof: by mathematical induction on the length n of the derivation.
Base Case: n = 0. In this case, we have α = S = a⁰Sb⁰, which is of the required form (with m = 0).

Induction Step: Assume the claim to be true for derivations of length n, and suppose that α is derivable from S by a derivation of length n+1. The sentential form α' immediately preceding α in such a derivation is derivable from S in n steps. Hence, by the induction hypothesis, α' is of the form aᵐbᵐ or aᵐSbᵐ for some m ≥ 0, and it must be the latter. (α' cannot lack an occurrence of S, or else α' ⟹ α could not hold.) There are two possibilities for how α was derived from α': either via an application of production (1) or via an application of production (2). In the former case, we have

α' = aᵐSbᵐ ⟹⁽¹⁾ aᵐ⁺¹Sbᵐ⁺¹ = α

In the latter case, we have

α' = aᵐSbᵐ ⟹⁽²⁾ aᵐbᵐ = α

Either way, α is of the correct form.
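The induction proof covers derivations of every length. As a sanity check (not a substitute for the proof), we can enumerate every sentential form reachable within a bounded number of steps and confirm that each one has the claimed shape. In the sketch below, the bound of 10 steps and the names `sentential_forms` and `has_claimed_form` are arbitrary illustrative choices.

```python
import re
from collections import deque

PRODUCTIONS = ["aSb", ""]  # right-hand sides of S -> aSb (1) | λ (2)


def sentential_forms(max_steps: int) -> set[str]:
    """All sentential forms derivable from S in at most max_steps steps."""
    seen = {"S"}
    queue = deque([("S", 0)])
    while queue:
        form, steps = queue.popleft()
        if steps == max_steps or "S" not in form:
            continue
        i = form.index("S")  # forms contain at most one S
        for rhs in PRODUCTIONS:
            new_form = form[:i] + rhs + form[i + 1:]
            if new_form not in seen:
                seen.add(new_form)
                queue.append((new_form, steps + 1))
    return seen


def has_claimed_form(alpha: str) -> bool:
    """True iff alpha is a^m b^m or a^m S b^m for some m >= 0."""
    match = re.fullmatch(r"(a*)(S?)(b*)", alpha)
    return match is not None and len(match.group(1)) == len(match.group(3))


forms = sentential_forms(10)
assert all(has_claimed_form(f) for f in forms)
print(len(forms))  # 21: a^m S b^m for m <= 10 and a^m b^m for m <= 9
```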

Example 2: {aᵐbⁿ | m ≥ n}

Solution:

S → aSb | aS | λ (1) (2) (3)

Discussion: Another way to specify the language is {aⁿ⁺ᵏbⁿ | n, k ≥ 0}. To generate aⁿ⁺ᵏbⁿ, apply production (1) n times and production (2) k times (in any order), and finally apply production (3). Such a derivation could look like this:

S ⟹⁽¹⁾ aSb ⟹⁽¹⁾ aaSbb ⟹⁽¹⁾ ··· ⟹⁽¹⁾ aⁿSbⁿ ⟹⁽²⁾ aⁿ⁺¹Sbⁿ ⟹⁽²⁾ aⁿ⁺²Sbⁿ ⟹⁽²⁾ ··· ⟹⁽²⁾ aⁿ⁺ᵏSbⁿ ⟹⁽³⁾ aⁿ⁺ᵏbⁿ

Note that, with this grammar, the n applications of (1) and the k applications of (2) can be carried out in any order. Indeed, this grammar is ambiguous, meaning that for at least one string in the language there exist at least two non-isomorphic (i.e., structurally different) derivation trees. The simplest example is the string aab, which has these two derivation trees:

Tree 1 (apply (1), then (2), then (3)):

        S
      / | \
     a  S  b
       / \
      a   S
          |
          λ

Tree 2 (apply (2), then (1), then (3)):

        S
       / \
      a   S
        / | \
       a  S  b
          |
          λ
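Counting derivation trees by machine makes the ambiguity concrete. In the Python sketch below (the name `count_trees` is our own), count(i, j) is the number of derivation trees in which S yields the substring w[i:j]. Each production contributes one term, so any string whose count exceeds 1 witnesses the ambiguity.

```python
from functools import lru_cache


def count_trees(w: str) -> int:
    """Number of derivation trees for w under S -> aSb (1) | aS (2) | λ (3)."""
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        total = 0
        if i == j:                                           # (3) S -> λ
            total += 1
        if j - i >= 1 and w[i] == "a":                       # (2) S -> aS
            total += count(i + 1, j)
        if j - i >= 2 and w[i] == "a" and w[j - 1] == "b":   # (1) S -> aSb
            total += count(i + 1, j - 1)
        return total
    return count(0, len(w))


print(count_trees("aab"))  # 2
print(count_trees("ab"))   # 1
print(count_trees("aba"))  # 0 (not in the language)
```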

A context-free language is said to be inherently ambiguous if every context-free grammar that generates it is ambiguous. Interestingly, such languages exist, the canonical example being {aᵏbᵐcⁿ | k = m ∨ m = n}.

The language of this example is not inherently ambiguous, however, as we can easily massage the grammar so as to force all the "extra" a's to be generated after all the b's have been generated. A grammar that does this is

S → aSb | M (1) (2)

M → aM | λ (3) (4)

To derive aⁿ⁺ᵏbⁿ using this grammar, one must apply (1) n times, followed by an application of (2), followed by k applications of (3), followed by an application of (4). There is no choice!
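The same counting idea can test the claim that the revised grammar leaves no choice. The sketch below (hypothetical names `count_trees` and `in_language`) counts derivation trees under S → aSb | M, M → aM | λ and, for every string over {a, b} of length at most 6, compares the count with membership in {aᵐbⁿ | m ≥ n}. Passing this bounded check is evidence of unambiguity, not a proof.

```python
import re
from functools import lru_cache
from itertools import product


def count_trees(w: str) -> int:
    """Derivation trees for w under S -> aSb (1) | M (2), M -> aM (3) | λ (4)."""
    @lru_cache(maxsize=None)
    def S(i: int, j: int) -> int:
        total = M(i, j)                                   # (2) S -> M
        if j - i >= 2 and w[i] == "a" and w[j - 1] == "b":
            total += S(i + 1, j - 1)                      # (1) S -> aSb
        return total

    @lru_cache(maxsize=None)
    def M(i: int, j: int) -> int:
        total = 1 if i == j else 0                        # (4) M -> λ
        if j - i >= 1 and w[i] == "a":
            total += M(i + 1, j)                          # (3) M -> aM
        return total

    return S(0, len(w))


def in_language(w: str) -> bool:
    """True iff w = a^m b^n with m >= n."""
    match = re.fullmatch(r"(a*)(b*)", w)
    return match is not None and len(match.group(1)) >= len(match.group(2))


for length in range(7):
    for chars in product("ab", repeat=length):
        w = "".join(chars)
        assert count_trees(w) == (1 if in_language(w) else 0)
```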

Example 3: {aᵐbⁿ | n ≤ m ≤ 2n}

Solution:

S → aSb | aaSb | λ (1) (2) (3)

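Another way to describe this language is {aⁿ⁺ᵏbⁿ | 0 ≤ k ≤ n}. To generate aⁿ⁺ᵏbⁿ, it suffices to apply (1) n−k times and (2) k times, in any order, followed by an application of (3). Because the order is free, this grammar is ambiguous. For example, aaabb (n = 2, k = 1) can be derived by applying (1) and then (2), or (2) and then (1), before finishing with (3), and these two derivations have different trees.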
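The ambiguity can be quantified. A derivation of aⁿ⁺ᵏbⁿ is a sequence of n steps, exactly k of which use production (2), so we expect C(n, k) derivation trees. The sketch below (with a hypothetical `count_trees` helper) checks this for small n; Python's `math.comb` computes the binomial coefficient.

```python
from functools import lru_cache
from math import comb


def count_trees(w: str) -> int:
    """Derivation trees for w under S -> aSb (1) | aaSb (2) | λ (3)."""
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        total = 1 if i == j else 0                                # (3)
        if j - i >= 2 and w[i] == "a" and w[j - 1] == "b":
            total += count(i + 1, j - 1)                          # (1)
        if j - i >= 3 and w[i:i + 2] == "aa" and w[j - 1] == "b":
            total += count(i + 2, j - 1)                          # (2)
        return total
    return count(0, len(w))


# a^(n+k) b^n should have comb(n, k) trees: choose which k of the n steps use (2).
for n in range(6):
    for k in range(n + 1):
        assert count_trees("a" * (n + k) + "b" * n) == comb(n, k)

print(count_trees("aaabb"))  # 2
```

In particular, the count exceeds 1 exactly when 1 ≤ k ≤ n − 1.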

Exercise: Devise an unambiguous grammar that generates this language.

Example 4: {(ab)ⁿcⁿ | n≥0}

Solution:

S → abSc | λ (1) (2)

What this example illustrates is that the role played by a single symbol (such as a in the language {aⁿbⁿ}) could instead be played by a string of symbols (e.g., ab here).
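Because production (1) treats ab as a single unit, a recognizer that mirrors the grammar peels off a leading ab and a trailing c at each step. Here is a minimal sketch (the name `derives` is our own):

```python
def derives(w: str) -> bool:
    """True iff S =>* w under S -> abSc (1) | λ (2)."""
    if w == "":                                   # (2) S -> λ
        return True
    # (1) S -> abSc: the two-symbol unit "ab" in front, one c at the end.
    return w.startswith("ab") and w.endswith("c") and derives(w[2:-1])


print(derives("ababcc"))  # True
print(derives("aabbcc"))  # False
```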

Example 5: {(a+b)ⁿcⁿ | n≥0}

Solution:

S → aSc | bSc | λ (1) (2) (3)

In this example, we incorporate regular expression notation (namely, (a+b)) into our set comprehension notation.
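To check that the grammar matches the set-comprehension notation, the sketch below generates every string derivable with at most `max_n` applications of (1) or (2) and compares the result with the set {(a+b)ⁿcⁿ | n ≤ max_n} built directly from the definition. The names `generate` and `language`, and the bound of 4, are illustrative choices.

```python
from itertools import product


def generate(max_n: int) -> set[str]:
    """Terminal strings derivable from S -> aSc (1) | bSc (2) | λ (3)
    using at most max_n applications of (1) and (2)."""
    results = set()
    forms = {"S"}
    for _ in range(max_n + 1):
        results |= {f.replace("S", "") for f in forms}            # finish with (3)
        forms = {f.replace("S", rhs) for f in forms for rhs in ("aSc", "bSc")}
    return results


def language(max_n: int) -> set[str]:
    """{(a+b)^n c^n | 0 <= n <= max_n}, built straight from the definition."""
    return {"".join(u) + "c" * n
            for n in range(max_n + 1) for u in product("ab", repeat=n)}


assert generate(4) == language(4)
print(len(generate(4)))  # 1 + 2 + 4 + 8 + 16 = 31
```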

Example 6: {aᵐbⁿcⁿdᵐ}

Solution:

S → aSd | M (1) (2)

M → bMc | λ (3) (4)

A variation of this language is {aᵐbⁿcⁿ⁺ᵐ} (in which every occurrence of 'd' in a string is replaced by 'c'), which requires only a small adjustment to the grammar:

S → aSc | M (1) (2)

M → bMc | λ (3) (4)
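The two grammars differ only in the symbol that production (1) places on the right, and a single parameter can capture that difference. In the sketch below (the function `derive` and its `outer` parameter are hypothetical), outer="d" replays the grammar of Example 6 and outer="c" replays the variant. Each of the m applications of (1) matches an a with an outer symbol, and each of the n applications of (3) matches a b with a c.

```python
def derive(m: int, n: int, outer: str = "d") -> str:
    """Derive a^m b^n c^n <outer>^m using
    S -> aS<outer> (1) | M (2),  M -> bMc (3) | λ (4)."""
    form = "S"
    for _ in range(m):
        form = form.replace("S", "aS" + outer, 1)  # (1)
    form = form.replace("S", "M", 1)               # (2)
    for _ in range(n):
        form = form.replace("M", "bMc", 1)         # (3)
    return form.replace("M", "", 1)                # (4)


print(derive(2, 3))       # aabbbcccdd  (Example 6)
print(derive(2, 3, "c"))  # aabbbccccc  (the variant: a^2 b^3 c^5)
```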

Example 7: {aᵐbᵐcⁿdⁿ}

Solution:

S → KM (1)

K → aKb | λ (2) (3)

M → cMd | λ (4) (5)

Here, the job of K is to produce strings in {aᵐbᵐ} and the job of M is to produce strings in {cⁿdⁿ}.

From this example, it is only a small step to recognize that context-free languages are closed under concatenation. That is, if L₁ and L₂ are context-free, so is L₁ · L₂ = {xy | x ∈ L₁ ∧ y ∈ L₂}.

To demonstrate this, suppose that G₁ and G₂ are context-free grammars such that Lᵢ = L(Gᵢ) (i = 1, 2), with start symbols S₁ and S₂, respectively. Suppose further, without loss of generality (nonterminals can always be renamed), that the nonterminal alphabets of the two grammars are disjoint and that neither grammar includes S as a nonterminal symbol. Then the grammar with start symbol S whose productions are all those of G₁ and G₂, plus the production S → S₁S₂, clearly generates L₁ · L₂.
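The construction translates directly into code. The Python sketch below uses a representation of our own choosing: a grammar is a dict that maps each nonterminal to a list of right-hand sides, each a tuple of symbols, with () standing for λ. Any symbol that is not a key is a terminal. Prefixing every nonterminal with "1." or "2." is one way to carry out the "without loss of generality" renaming. A small bounded enumerator (`strings_up_to`, also hypothetical) then checks Example 7, written with S as the start symbol of both component grammars.

```python
def concat_grammar(g1, s1, g2, s2, start="S"):
    """Return a grammar for L(g1) · L(g2) whose start symbol is `start`."""
    def rename(g, tag):
        def ren(sym):
            return f"{tag}.{sym}" if sym in g else sym  # terminals are left alone
        return {ren(a): [tuple(ren(x) for x in rhs) for rhs in rhss]
                for a, rhss in g.items()}

    g = {**rename(g1, "1"), **rename(g2, "2")}  # nonterminals are now disjoint
    g[start] = [(f"1.{s1}", f"2.{s2}")]          # the new production S -> S1 S2
    return g


def strings_up_to(g, start, max_steps):
    """Terminal strings having a leftmost derivation of at most max_steps steps."""
    results, stack = set(), [((start,), 0)]
    while stack:
        form, steps = stack.pop()
        idx = next((i for i, x in enumerate(form) if x in g), None)
        if idx is None:                      # no nonterminals left
            results.add("".join(form))
        elif steps < max_steps:
            for rhs in g[form[idx]]:         # expand the leftmost nonterminal
                stack.append((form[:idx] + rhs + form[idx + 1:], steps + 1))
    return results


# Example 7, with both component grammars using S as their start symbol.
k_grammar = {"S": [("a", "S", "b"), ()]}   # {a^m b^m}
m_grammar = {"S": [("c", "S", "d"), ()]}   # {c^n d^n}
combined = concat_grammar(k_grammar, "S", m_grammar, "S")

# Deriving a^m b^m c^n d^n takes 1 + (m + 1) + (n + 1) steps, so 7 steps allow m + n <= 4.
expected = {"a" * m + "b" * m + "c" * n + "d" * n
            for m in range(5) for n in range(5) if m + n <= 4}
assert strings_up_to(combined, "S", 7) == expected
```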

Example 8: {aᵐbᵐ}*

Solution:

S → MS | λ (1) (2)

M → aMb | λ (3) (4)
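Since S → MS | λ strings together any number of M-blocks, each of the form aᵐbᵐ, membership in {aᵐbᵐ}* can be tested by splitting a string into maximal runs of equal symbols. Blocks with m = 0 contribute nothing, and each nonempty block shows up as an a-run followed by a b-run of the same length. The sketch below (hypothetical name `in_star_language`) checks this language description directly rather than simulating the grammar, using `itertools.groupby` to form the runs.

```python
from itertools import groupby


def in_star_language(w: str) -> bool:
    """True iff w is a concatenation of zero or more blocks a^m b^m."""
    runs = [(sym, len(list(group))) for sym, group in groupby(w)]
    if len(runs) % 2 != 0:          # every a-run needs a following b-run
        return False
    for (s1, n1), (s2, n2) in zip(runs[0::2], runs[1::2]):
        if (s1, s2) != ("a", "b") or n1 != n2:
            return False
    return True


for w in ["", "ab", "aabbab", "abba", "ba"]:
    print(repr(w), in_star_language(w))
```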

Example 9: {aᵐb²ⁿcd*bⁿaᵐ | m > 0}

Solution:

S → aSa | aKa (1) (2)

K → bbKb | M (3) (4)

M → c | Md (5) (6)
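One way to see how the three nonterminals divide the work is a recursive-descent recognizer with one function per nonterminal. In the sketch below (hypothetical names `parse_S`, `parse_K`, `parse_M`), each function decides whether its nonterminal derives the entire given string. S peels matching a's from both ends, K peels bb from the front and b from the back, and M accepts c followed by d's.

```python
def parse_S(w: str) -> bool:
    """S -> aSa (1) | aKa (2)."""
    if len(w) >= 2 and w[0] == "a" and w[-1] == "a":
        inner = w[1:-1]
        return parse_S(inner) or parse_K(inner)
    return False


def parse_K(w: str) -> bool:
    """K -> bbKb (3) | M (4)."""
    if len(w) >= 3 and w.startswith("bb") and w.endswith("b") and parse_K(w[2:-1]):
        return True
    return parse_M(w)


def parse_M(w: str) -> bool:
    """M -> c (5) | Md (6), so M derives c followed by any number of d's."""
    return w == "c" or (w.endswith("d") and parse_M(w[:-1]))


print(parse_S("abbcddba"))  # True: m = 1, n = 1, two d's
print(parse_S("abcba"))     # False: an odd number of leading b's
```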

Example 10: {a^mbⁿc^mdⁿ}

Solution: There is no context-free grammar that generates this language! Indeed, just as {aⁿbⁿ} is the canonical non-regular language, this may be the canonical non-context-free language.

The proof that this language is not context-free is complicated and is beyond the scope of this course. But what it illustrates is that, at their core, context-free languages are collections of strings that are "properly nested", like the parentheses, brackets, and braces in some kind of algebraic expression.

In the language of this example, you can think of the a's as playing the role of left parentheses (i.e., '(') and the c's as playing the role of their right-parenthesis mates (i.e., ')'). Meanwhile, the b's play the role of left square brackets (i.e., '[') and the d's that of their right-bracket mates (i.e., ']'). (Why must a's and c's be considered each other's mates, and similarly b's and d's? Because they are required to occur the same number of times.) But the strings of this language are not properly nested. Consider aabbbccddd, which corresponds to (([[[))]]].

If you consider each of the other languages among these examples, you can recognize pairs of symbols that act as each other's mates in a properly nested way.
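The mates idea can be tested mechanically. Scan the string left to right with a stack, pushing each opening symbol and popping on each closing symbol; the string is properly nested exactly when no pop finds the wrong partner and the stack ends empty. The sketch below (hypothetical name `properly_nested`, where the `mates` dict maps each opening symbol to its closing mate and other symbols are ignored) applies this to Example 10's pairing and, for contrast, to Example 6's.

```python
def properly_nested(w: str, mates: dict[str, str]) -> bool:
    """True iff w is properly nested when each key of `mates` opens and
    its value closes; symbols that are neither are ignored."""
    closers = set(mates.values())
    stack = []
    for ch in w:
        if ch in mates:
            stack.append(ch)
        elif ch in closers:
            if not stack or mates[stack.pop()] != ch:
                return False
    return not stack


# Example 10's pairing: a ~ '(', c ~ ')', b ~ '[', d ~ ']'
print(properly_nested("aabbbccddd", {"a": "c", "b": "d"}))  # False, like (([[[))]]]
# Example 6's pairing (a with d, b with c) on a string from that language
print(properly_nested("aabbbcccdd", {"a": "d", "b": "c"}))  # True
```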
