CMPS 260: Closure Properties of Regular (Finite-state) Languages

By a regular (aka finite-state) language we mean a language L (over some alphabet Σ) such that L = L(M) for some deterministic finite automaton (DFA) M.

Here we describe the (algorithmic) constructions on DFAs that demonstrate that regular languages are closed under complement, union, and intersection.

To make the constructions precise and concise, we make use of the algebraic style for describing DFAs. In that style, a DFA is described as a 5-tuple

M = (Q, Σ, δ, q₀, F)

where

Q is a finite set of states
Σ is an alphabet (a finite set of symbols)
δ: Q×Σ ⟶ Q is a transition function
q₀ ∈ Q is the initial state
F ⊆ Q is the set of accepting (aka final) states.
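As a minimal sketch in Python, the 5-tuple can be mirrored by a named tuple whose fields are the five components. The names DFA, is_well_formed and EVEN_A are hypothetical and chosen only for illustration. The sketch assumes δ is stored as a dict keyed by (state, symbol) pairs. The example machine, which accepts strings over {a, b} with an even number of a's, is also made up.

```python
from typing import NamedTuple

# Hypothetical representation of M = (Q, Σ, δ, q₀, F); δ is a dict
# mapping (state, symbol) pairs to states.
class DFA(NamedTuple):
    Q: frozenset
    Sigma: frozenset
    delta: dict
    q0: object
    F: frozenset

def is_well_formed(m: DFA) -> bool:
    """Check that δ is a total function Q×Σ → Q, q₀ ∈ Q, and F ⊆ Q."""
    domain = {(q, c) for q in m.Q for c in m.Sigma}
    return (set(m.delta) == domain                       # defined exactly on Q×Σ
            and all(r in m.Q for r in m.delta.values())  # every value lies in Q
            and m.q0 in m.Q
            and m.F <= m.Q)

# Example (made up): strings over {a, b} containing an even number of a's.
EVEN_A = DFA(
    Q=frozenset({"even", "odd"}),
    Sigma=frozenset({"a", "b"}),
    delta={("even", "a"): "odd", ("odd", "a"): "even",
           ("even", "b"): "even", ("odd", "b"): "odd"},
    q0="even",
    F=frozenset({"even"}),
)

print(is_well_formed(EVEN_A))  # True
```

A dict does not force δ to be total, so totality must be checked rather than assumed. The constructions below rely on δ being a total function, as the definition requires.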

Let p,q ∈ Q and a ∈ Σ. In terms of the state transition graph describing M, q = δ(p,a) means that the transition leaving state p labeled by symbol a goes to state q.

We can generalize the transition function δ: Q×Σ ⟶ Q, which is applied to ⟨state,symbol⟩ pairs and provides information about single transitions, to obtain δ^*: Q × Σ^* ⟶ Q, which is applied to ⟨state,string⟩ pairs and provides information about sequences of transitions. The definition is as follows:

δ^*(p,λ) = p

δ^*(p,xa) = δ(δ^*(p,x),a) (x∈Σ^*, a∈Σ)

Let p and q be states and x a string over Σ. In terms of the state transition graph describing M, q = δ^*(p,x) means that beginning at state p and following the sequence of transitions whose labels "spell out" x lands you in state q.
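The recursive definition translates almost word for word into code. This is a minimal sketch with hypothetical names delta_star and delta_star_iter. It assumes δ is a dict keyed by (state, symbol) and that strings are Python str values, with λ represented as the empty string "". The recursive version peels off the last symbol exactly as in δ^*(p,xa) = δ(δ^*(p,x),a). The loop computes the same function by following the transitions left to right.

```python
def delta_star(delta: dict, p, x: str):
    """δ^*(p, λ) = p and δ^*(p, ya) = δ(δ^*(p, y), a), read literally."""
    if x == "":               # x = λ
        return p
    y, a = x[:-1], x[-1]      # x = ya with a a single symbol
    return delta[(delta_star(delta, p, y), a)]

def delta_star_iter(delta: dict, p, x: str):
    """Same function: follow the transitions spelling out x, left to right."""
    for a in x:
        p = delta[(p, a)]
    return p

# Example δ (made up): parity of the number of a's read so far.
delta = {("even", "a"): "odd", ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}

print(delta_star(delta, "even", "abab"))     # even
print(delta_star_iter(delta, "even", "ab"))  # odd
```

The recursive version is limited by Python's recursion depth on very long strings. The iterative version avoids that limit and computes the same function.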

If the sequence of transitions spelling out x, beginning in the initial state q₀, ends in an accepting state, then x is accepted by M. That is, x is a member of the language accepted by M, written x ∈ L(M). Thus, L(M) can be defined like this:

Definition: L(M) = { x ∈ Σ^* | δ^*(q₀,x) ∈ F }
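The definition gives a direct membership test: compute δ^*(q₀,x) and check whether it lies in F. This is a minimal sketch with hypothetical helpers accepts and strings_up_to. It reuses the made-up even-number-of-a's machine. Because L(M) is usually infinite, the sketch lists only the members up to a chosen length.

```python
from itertools import product

def accepts(delta: dict, q0, F, x: str) -> bool:
    """x ∈ L(M) iff δ^*(q₀, x) ∈ F."""
    p = q0
    for a in x:               # compute δ^*(q₀, x) left to right
        p = delta[(p, a)]
    return p in F

def strings_up_to(sigma, n):
    """All strings over sigma of length 0..n, shortest first."""
    for k in range(n + 1):
        for t in product(sorted(sigma), repeat=k):
            yield "".join(t)

delta = {("even", "a"): "odd", ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}

# Members of L(M) of length at most 2 (strings with an even number of a's).
print([x for x in strings_up_to("ab", 2) if accepts(delta, "even", {"even"}, x)])
# ['', 'b', 'aa', 'bb']
```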

Theorem 1: If L ⊆ Σ^* is a finite-state language, then so is its complement, Σ^* − L.

Proof: Let M = (Q, Σ, δ, q₀, F) be a DFA such that L = L(M). Take M' = (Q, Σ, δ, q₀, Q−F). That is, M' is identical to M except that every state's status is flipped: accepting states become non-accepting, and vice versa. It remains to show that x is accepted by M' iff it is not accepted by M:

x ∈ L(M')
   =  < defn of L(M') >
δ^*(q₀,x) ∈ Q−F
   =  < p ∈ Q−F ≡ p ∉ F, for p ∈ Q >
δ^*(q₀,x) ∉ F
   =  < p ∉ S ≡ ¬(p ∈ S) >
¬(δ^*(q₀,x) ∈ F)
   =  < defn of L(M) >
¬(x ∈ L(M))
   =  < ¬(z ∈ S) ≡ z ∉ S >
x ∉ L(M)
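A small sketch can check the construction on the made-up even-number-of-a's machine. The hypothetical helpers are run, which computes δ^*, and complement_final, which builds Q−F. The sketch checks that M' accepts exactly the strings with an odd number of a's, for every string up to length 6. This is a sanity check under these assumptions, not a substitute for the proof.

```python
from itertools import product

def run(delta: dict, p, x: str):
    """δ^*(p, x): follow the transitions spelling out x, starting at p."""
    for a in x:
        p = delta[(p, a)]
    return p

def complement_final(Q, F) -> frozenset:
    """Accepting set of M' = (Q, Σ, δ, q₀, Q − F); Q, Σ, δ, q₀ are unchanged."""
    return frozenset(Q) - frozenset(F)

# Example M (made up): accepts strings over {a, b} with an even number of a's.
Q = {"even", "odd"}
delta = {("even", "a"): "odd", ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}
F = {"even"}
F_comp = complement_final(Q, F)

# M' should accept exactly the strings with an odd number of a's.
for k in range(7):
    for t in product("ab", repeat=k):
        x = "".join(t)
        assert (run(delta, "even", x) in F_comp) == (x.count("a") % 2 == 1)

print(sorted(F_comp))  # ['odd']
```

The construction depends on δ being total. Some diagrams omit transitions and treat a missing move as rejection. For such a diagram, flipping F alone would not complement the language, because a string that gets stuck would be rejected by both machines.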

Cartesian Product Construction

To show that regular languages are closed under both intersection and union, it will be useful first to introduce the Cartesian Product Construction on DFAs.

Let M₁ = (P, Σ, δ₁, p₀, F₁) and M₂ = (Q, Σ, δ₂, q₀, F₂) be DFAs over the same alphabet Σ. Using δ₁ and δ₂, we define δ: (P×Q) × Σ → P×Q like this:

δ([p,q], c) = [δ₁(p,c), δ₂(q,c)]
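In code, the product transition function is a single comprehension over P×Q×Σ. This is a minimal sketch: product_delta is a hypothetical name, a pair [p,q] is represented as the Python tuple (p, q), and both δ₁ and δ₂ are assumed to be total dicts over the same alphabet. The two example machines are made up. δ₁ tracks the parity of a's, and δ₂ records whether the last symbol read was b.

```python
def product_delta(P, Q, Sigma, delta1: dict, delta2: dict) -> dict:
    """δ([p,q], c) = [δ₁(p,c), δ₂(q,c)], with [p,q] stored as the tuple (p, q)."""
    return {((p, q), c): (delta1[(p, c)], delta2[(q, c)])
            for p in P for q in Q for c in Sigma}

# δ₁ (made up): parity of the number of a's.
delta1 = {("even", "a"): "odd", ("odd", "a"): "even",
          ("even", "b"): "even", ("odd", "b"): "odd"}
# δ₂ (made up): whether the last symbol read was b.
delta2 = {("no", "a"): "no", ("yes", "a"): "no",
          ("no", "b"): "yes", ("yes", "b"): "yes"}

delta = product_delta({"even", "odd"}, {"no", "yes"}, {"a", "b"}, delta1, delta2)
print(len(delta))                    # 8, i.e. |P×Q| · |Σ| = 4 · 2
print(delta[(("even", "no"), "a")])  # ('odd', 'no')
```

This builds transitions for all of P×Q. A common refinement constructs only the pairs reachable from [p₀,q₀], which yields a DFA accepting the same language.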

We generalize each of δ₁, δ₂, and δ to, respectively, δ₁^*, δ₂^*, and δ^* as was shown above.

Lemma: For all p ∈ P, q ∈ Q, and x ∈ Σ^*, δ^*([p,q], x) = [δ₁^*(p,x), δ₂^*(q,x)]

Proof: (by mathematical induction on |x|).

Base case: |x| = 0 (i.e., x = λ)
δ^*([p,q], x)
   =  < x = λ >
δ^*([p,q], λ)
   =  < defn of δ^* >
[p,q]
   =  < defn of δ_i^*, i=1,2 >
[δ₁^*(p,λ), δ₂^*(q,λ)]
   =  < x = λ >
[δ₁^*(p,x), δ₂^*(q,x)]
Inductive case: Let n≥0 be arbitrary and assume, as the induction hypothesis (IH), that the lemma's statement is true for all p ∈ P, q ∈ Q, and all strings of length n. Let x = ya, where |y| = n and a ∈ Σ.
δ^*([p,q], x)
   =  < x = ya >
δ^*([p,q], ya)
   =  < defn of δ^* >
δ(δ^*([p,q], y), a)
   =  < induction hypothesis >
δ([δ₁^*(p,y), δ₂^*(q,y)], a)
   =  < defn of δ >
[δ₁(δ₁^*(p,y),a), δ₂(δ₂^*(q,y),a)]
   =  < defn of δ_i^*, i=1,2 >
[δ₁^*(p,ya), δ₂^*(q,ya)]
   =  < x = ya >
[δ₁^*(p,x), δ₂^*(q,x)]
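The Lemma can also be spot-checked mechanically. The sketch below tests δ^*([p,q], x) = [δ₁^*(p,x), δ₂^*(q,x)] for every starting pair and every string of length at most 6 over {a, b}, using the same two made-up machines. The helper names are hypothetical. A finite check like this only illustrates the statement; the induction above is what proves it for all x.

```python
from itertools import product

def delta_star(delta: dict, p, x: str):
    """δ^*(p, x), computed left to right."""
    for a in x:
        p = delta[(p, a)]
    return p

def product_delta(P, Q, Sigma, delta1: dict, delta2: dict) -> dict:
    """δ([p,q], c) = [δ₁(p,c), δ₂(q,c)], with [p,q] stored as the tuple (p, q)."""
    return {((p, q), c): (delta1[(p, c)], delta2[(q, c)])
            for p in P for q in Q for c in Sigma}

# Made-up M₁ (parity of a's) and M₂ (was the last symbol b?).
P, Q, Sigma = {"even", "odd"}, {"no", "yes"}, "ab"
delta1 = {("even", "a"): "odd", ("odd", "a"): "even",
          ("even", "b"): "even", ("odd", "b"): "odd"}
delta2 = {("no", "a"): "no", ("yes", "a"): "no",
          ("no", "b"): "yes", ("yes", "b"): "yes"}
delta = product_delta(P, Q, Sigma, delta1, delta2)

# Check δ^*([p,q], x) = [δ₁^*(p,x), δ₂^*(q,x)] for all p, q and all |x| ≤ 6.
checked = 0
for p in P:
    for q in Q:
        for k in range(7):
            for t in product(Sigma, repeat=k):
                x = "".join(t)
                assert delta_star(delta, (p, q), x) == (
                    delta_star(delta1, p, x), delta_star(delta2, q, x))
                checked += 1

print(checked)  # 508 cases, all passing
```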

An alternative version of this proof, using notation depicting walks in graphs, is in the appendix.

Using this construction, we can show that the class of regular languages is closed under both intersection and union.

Theorem 2: If L₁ and L₂ are regular languages, then so is L₁ ∩ L₂.

Proof: Let M₁ = (P, Σ, δ₁, p₀, F₁) and M₂ = (Q, Σ, δ₂, q₀, F₂) be such that L₁ = L(M₁) and L₂ = L(M₂).

Take M = (P×Q, Σ, δ, [p₀,q₀], F), where δ is defined in terms of δ₁ and δ₂ as described above, and where F = F₁×F₂. Then L(M) = L₁ ∩ L₂. In other words, for all x ∈ Σ^*, x ∈ L(M) ≡ x ∈ L₁ ∩ L₂. Here is a demonstration of this fact:

x ∈ L(M)
   =  < defn of L(M) >
δ^*([p₀,q₀], x) ∈ F₁ × F₂
   =  < logic: (z = z' for some z' ∈ S) ≡ z ∈ S >
δ^*([p₀,q₀], x) = [p',q'] for some p' ∈ F₁ and q' ∈ F₂
   =  < Lemma above; equality of ordered pairs >
(δ₁^*(p₀,x) = p' for some p' ∈ F₁) ∧ (δ₂^*(q₀,x) = q' for some q' ∈ F₂)
   =  < logic: (z = z' for some z' ∈ S) ≡ z ∈ S >
δ₁^*(p₀,x) ∈ F₁ ∧ δ₂^*(q₀,x) ∈ F₂
   =  < defn of L(M₁) and L(M₂) >
x ∈ L(M₁) ∧ x ∈ L(M₂)
   =  < L_i = L(M_i) (i=1,2) >
x ∈ L₁ ∧ x ∈ L₂
   =  < defn of intersection >
x ∈ L₁ ∩ L₂
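Putting the pieces together gives M = (P×Q, Σ, δ, [p₀,q₀], F₁×F₂). This minimal sketch uses a hypothetical helper, intersection_dfa, and builds M for two made-up languages: L₁ contains strings with an even number of a's, and L₂ contains strings ending in b. It then compares M against direct descriptions of L₁ and L₂ on every string of length at most 6.

```python
from itertools import product

def delta_star(delta: dict, p, x: str):
    """δ^*(p, x), computed left to right."""
    for a in x:
        p = delta[(p, a)]
    return p

def intersection_dfa(P, Q, Sigma, delta1, p0, F1, delta2, q0, F2):
    """Return δ, [p₀,q₀] and F₁×F₂ of M = (P×Q, Σ, δ, [p₀,q₀], F₁×F₂)."""
    delta = {((p, q), c): (delta1[(p, c)], delta2[(q, c)])
             for p in P for q in Q for c in Sigma}
    F = {(r, s) for r in F1 for s in F2}
    return delta, (p0, q0), F

# Made-up M₁: even number of a's.  Made-up M₂: ends in b.
delta1 = {("even", "a"): "odd", ("odd", "a"): "even",
          ("even", "b"): "even", ("odd", "b"): "odd"}
delta2 = {("no", "a"): "no", ("yes", "a"): "no",
          ("no", "b"): "yes", ("yes", "b"): "yes"}
delta, start, F = intersection_dfa({"even", "odd"}, {"no", "yes"}, "ab",
                                   delta1, "even", {"even"},
                                   delta2, "no", {"yes"})

# Compare against direct descriptions of L₁ and L₂ on all strings with |x| ≤ 6.
for k in range(7):
    for t in product("ab", repeat=k):
        x = "".join(t)
        in_L1 = x.count("a") % 2 == 0
        in_L2 = x.endswith("b")
        assert (delta_star(delta, start, x) in F) == (in_L1 and in_L2)

print(sorted(F))  # [('even', 'yes')]
```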

The proof that the union of two regular languages is also regular is similar. The only difference is that the set of accepting states of M is (F₁×Q) ∪ (P×F₂); that is, F = { [r,s] | r ∈ F₁ ∨ s ∈ F₂ }.

The proof is left to the reader.
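For union, only the accepting set changes. This sketch reuses the made-up machines from the intersection example. It confirms that (F₁×Q) ∪ (P×F₂) equals { [r,s] | r ∈ F₁ ∨ s ∈ F₂ }, then checks the resulting DFA against "even number of a's or ends in b" on all strings up to length 6. It checks examples only and does not replace the proof left to the reader.

```python
from itertools import product

def delta_star(delta: dict, p, x: str):
    """δ^*(p, x), computed left to right."""
    for a in x:
        p = delta[(p, a)]
    return p

P, Q, Sigma = {"even", "odd"}, {"no", "yes"}, "ab"
delta1 = {("even", "a"): "odd", ("odd", "a"): "even",
          ("even", "b"): "even", ("odd", "b"): "odd"}
delta2 = {("no", "a"): "no", ("yes", "a"): "no",
          ("no", "b"): "yes", ("yes", "b"): "yes"}
F1, F2 = {"even"}, {"yes"}

# Same product transition function as for intersection.
delta = {((p, q), c): (delta1[(p, c)], delta2[(q, c)])
         for p in P for q in Q for c in Sigma}

# Only the accepting set changes: F = (F₁×Q) ∪ (P×F₂).
F = {(r, s) for r in F1 for s in Q} | {(r, s) for r in P for s in F2}
assert F == {(r, s) for r in P for s in Q if r in F1 or s in F2}

for k in range(7):
    for t in product(Sigma, repeat=k):
        x = "".join(t)
        expected = x.count("a") % 2 == 0 or x.endswith("b")
        assert (delta_star(delta, ("even", "no"), x) in F) == expected

print(sorted(F))  # [('even', 'no'), ('even', 'yes'), ('odd', 'yes')]
```

As with complement, this relies on δ₁ and δ₂ being total. With partial transition functions, the pair machine would get stuck whenever either component has no move, even if the other component would have gone on to accept.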

Appendix: Alternative Versions of Construction and Proofs

Here we will use the notation p →^c r to mean that (in some state transition graph understood from context) there is a transition from state p to state r on the symbol c. In algebraic notation, this corresponds to δ(p,c) = r.

Generalizing from symbols to strings of symbols, we use p ⟹^x r to mean that the sequence of transitions beginning at state p and whose labels "spell out" x ends in state r. In algebraic notation, this corresponds to δ^*(p,x) = r.

The construction: Using this notation, the Cartesian product construction on DFAs M₁ and M₂ can be described like this:

[p,q] →^c [r,s] iff p →^c r ∧ q →^c s

It is to be understood that p and r name states in M₁ while q and s name states in M₂.
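The arrow formulation maps naturally onto edge sets. This is a minimal sketch: the hypothetical helper product_edges takes each graph as a set of (source, symbol, target) triples. It pairs an edge of M₁ with an edge of M₂ exactly when their labels agree, which is the rule [p,q] →^c [r,s] iff p →^c r ∧ q →^c s. The edge sets are the same made-up machines used earlier.

```python
def product_edges(E1: set, E2: set) -> set:
    """[p,q] →^c [r,s] iff p →^c r ∧ q →^c s; edges are (source, symbol, target)."""
    return {((p, q), c, (r, s))
            for (p, c, r) in E1
            for (q, d, s) in E2
            if c == d}

# Made-up M₁ (parity of a's) and M₂ (was the last symbol b?) as edge sets.
E1 = {("even", "a", "odd"), ("odd", "a", "even"),
      ("even", "b", "even"), ("odd", "b", "odd")}
E2 = {("no", "a", "no"), ("yes", "a", "no"),
      ("no", "b", "yes"), ("yes", "b", "yes")}

E = product_edges(E1, E2)
print(len(E))                                       # 8
print((("even", "no"), "b", ("even", "yes")) in E)  # True
```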

The alternative version of the lemma that characterizes the result of the Cartesian Product Construction is this:

Lemma: For all p,r ∈ P, q,s ∈ Q, and all x ∈ Σ^*, [p,q] ⟹^x [r,s] iff p ⟹^x r ∧ q ⟹^x s

Proof:

Base case: |x| = 0 (i.e., x = λ)
[p,q] ⟹^x [r,s]
   =  < x = λ >
[p,q] ⟹^λ [r,s]
   =  < a path of length zero goes nowhere >
[p,q] = [r,s]
   =  < nature of ordered pairs >
p = r ∧ q = s
   =  < a path of length zero goes nowhere >
p ⟹^λ r ∧ q ⟹^λ s
   =  < x = λ >
p ⟹^x r ∧ q ⟹^x s
Inductive case: Let n≥0 be arbitrary and assume, as the induction hypothesis (IH), that the lemma's statement is true for all strings of length n, and let x = ya, where |y| = n and a ∈ Σ.
[p,q] ⟹^x [r,s]
   =  < x = ya >
[p,q] ⟹^ya [r,s]
   =  < nature of state transition graphs >
[p,q] ⟹^y [r',s'] →^a [r,s] for some r'∈P and s'∈Q
   =  < unabbreviate >
[p,q] ⟹^y [r',s'] ∧ [r',s'] →^a [r,s] for some r'∈P and s'∈Q
   =  < IH >
p ⟹^y r' ∧ q ⟹^y s' ∧ [r',s'] →^a [r,s] for some r'∈P and s'∈Q
   =  < by the construction of M >
p ⟹^y r' ∧ q ⟹^y s' ∧ r' →^a r ∧ s' →^a s for some r'∈P and s'∈Q
   =  < ∧ is commutative and associative >
(p ⟹^y r' ∧ r' →^a r) ∧ (q ⟹^y s' ∧ s' →^a s) for some r'∈P and s'∈Q
   =  < abbreviate >
(p ⟹^y r' →^a r) ∧ (q ⟹^y s' →^a s) for some r'∈P and s'∈Q
   =  < r' occurs only in the left conjunct and s' only in the right; nature of state transition graphs >
p ⟹^ya r ∧ q ⟹^ya s
   =  < x = ya >
p ⟹^x r ∧ q ⟹^x s
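The walk notation p ⟹^x r can be computed from an edge set by extending walks one symbol at a time. This sketch uses a hypothetical helper, ends, that returns { r | p ⟹^x r }; for a DFA that set is always a singleton. The sketch then checks the alternative Lemma on the made-up machines for every starting pair and every string up to length 5. As before, a finite check only illustrates the statement proved above.

```python
from itertools import product

def product_edges(E1: set, E2: set) -> set:
    """[p,q] →^c [r,s] iff p →^c r ∧ q →^c s; edges are (source, symbol, target)."""
    return {((p, q), c, (r, s)) for (p, c, r) in E1 for (q, d, s) in E2 if c == d}

def ends(E: set, p, x: str) -> set:
    """The set { r | p ⟹^x r }: endpoints of walks from p spelling out x."""
    current = {p}                      # x = λ: a walk of length zero stays at p
    for a in x:                        # extend every walk by one a-labelled edge
        current = {r for (u, c, r) in E if u in current and c == a}
    return current

E1 = {("even", "a", "odd"), ("odd", "a", "even"),
      ("even", "b", "even"), ("odd", "b", "odd")}
E2 = {("no", "a", "no"), ("yes", "a", "no"),
      ("no", "b", "yes"), ("yes", "b", "yes")}
E = product_edges(E1, E2)

# [p,q] ⟹^x [r,s] iff p ⟹^x r ∧ q ⟹^x s, for all p, q and all |x| ≤ 5.
checked = 0
for p in {"even", "odd"}:
    for q in {"no", "yes"}:
        for k in range(6):
            for t in product("ab", repeat=k):
                x = "".join(t)
                assert ends(E, (p, q), x) == {(r, s) for r in ends(E1, p, x)
                                              for s in ends(E2, q, x)}
                checked += 1

print(checked)  # 252 cases, all passing
```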

Here is an alternative version of the proof of Theorem 2.

Theorem 2: If L₁ and L₂ are regular languages, then so is L₁ ∩ L₂.

Proof: Let L₁ = L(M₁) and L₂ = L(M₂), where M₁ = (P, Σ, δ₁, p₀, F₁) and M₂ = (Q, Σ, δ₂, q₀, F₂).

Take M to be the DFA obtained by applying the Cartesian Product Construction to M₁ and M₂, with initial state [p₀,q₀] and accepting states F = F₁ × F₂, that is, F = { [r,s] | r ∈ F₁ ∧ s ∈ F₂ }. Then L(M) = L₁ ∩ L₂. Here is a demonstration:

x ∈ L(M)
   =  < defn of L(M) >
[p₀,q₀] ⟹^x [r,s], for some [r,s] ∈ F
   =  < F = F₁ × F₂ >
[p₀,q₀] ⟹^x [r,s], for some [r,s] ∈ F₁×F₂
   =  < meaning of [r,s] ∈ F₁ × F₂ >
[p₀,q₀] ⟹^x [r,s], for some r ∈ F₁ and s ∈ F₂
   =  < Lemma >
p₀ ⟹^x r ∧ q₀ ⟹^x s, for some r ∈ F₁ and s ∈ F₂
   =  < defn of L(M₁) and L(M₂) >
x ∈ L(M₁) ∧ x ∈ L(M₂)
   =  < defn of L_i >
x ∈ L₁ ∧ x ∈ L₂
   =  < meaning of ∩ >
x ∈ L₁ ∩ L₂