By a regular (aka finite-state) language we mean a language L (over some alphabet Σ) such that L = L(M) for some deterministic finite automaton (DFA) M.
Here we describe the (algorithmic) constructions on DFAs that demonstrate that regular languages are closed under complement, union, and intersection.
To make the constructions precise and concise, we make use of the algebraic style for describing DFAs. In that style, a DFA is described as a 5-tuple
where
Let p,q ∈ Q and a ∈ Σ. In terms of the state transition graph describing M, q = δ(p,a) means that the transition leaving state p labeled by symbol a goes to state q.
We can generalize the transition function δ: Q×Σ ⟶ Q (which is applied to ⟨state,symbol⟩ pairs and provides information about single transitions) to obtain δ*: Q × Σ* ⟶ Q, (which is applied to ⟨state,string⟩ pairs and provides information about sequences of transitions) like this:
|
Let p and q be states and x a string over Σ. In terms of the state transition graph describing M, q = δ*(p,x) means that beginning at state p and following the sequence of transitions whose labels "spell out" x lands you in state q.
If the sequence of transitions spelling out x beginning in the initial state, q0, ends in an accepting state, then x is accepted by M, which is to say that x is a member of the language accepted by M, which is denoted by x ∈ L(M). Thus, L(M) can be defined like this:
Definition: L(M) = { x ∈ Σ* | δ*(q0,x) ∈ F }
Theorem 1: If L ⊆ Σ* is a finite-state language, then so is its complement, Σ* − L.
Proof: Let M = (Q, Σ, δ, q0 F) be such that L = L(M). Take M' = (Q, Σ, δ, q0, Q−F). That is, M' is identical to M except that every state's status is flipped from accepting to non-accepting or vice versa. It remains to show that x is accepted by M' iff it is not accepted by M:
x ∈ L(M') = < defn of L(M') > δ*(q0,x) ∈ Q-F = < p ∈ Q-F ≡ p ∉ F > δ*(q0,x) ∉ F = < p ∉ S ≡ ¬(p ∈ S) > ¬(δ*(q0,x) ∈ F) = < defn of L(M) > ¬(x ∈ L(M)) = < ¬(z ∈ S) ≡ z ∉ S > x ∉ L(M) |
To show that regular languages are closed under both intersection and union, it will be useful first to introduce the Cartesian Product Construction on DFAs.
Let M1 = (P, Σ, δ1, p0, F1) and M2 = (Q, Σ, δ1, q0, F2). Using δ1 and δ2, we define δ: (P×Q) × Σ → P×Q like this:
We generalize each of δ1, δ2, and δ to, respectively, δ1*, δ2*, and δ* as was shown above.
Lemma: For all p ∈ P, q ∈ Q, and x ∈ Σ*, δ*([p,q], x) = [δ1*(p,x), δ2*(q,x)]
Proof: (by mathematical induction on |x|).
Base case: |x| = 0 (i.e., x = λ)
δ*([p,q], x) = < x = λ > δ*([p,q], λ) = < defn of δ* > [p,q] = < defn of δi*, i=1,2 > [δ1*(p,λ),δ2*(q,λ)] = < x = λ > [δ1*(p,x), δ2*(q,x)] |
Inductive case: Let n≥0 be arbitrary and assume,
as as induction hypothesis (IH), that the lemma's statement
is true for all strings of length n, and let x = ya,
where |y| = n and a ∈ Σ.
δ*([p,q], x) = < x = ya > δ*([p,q], ya) = < defn of δ* > δ(δ*([p,q], y), a) = < induction hypothesis > δ([δ1*(p,y), δ2*(q,y)], a) = < defn of δ > [δ1(δ1*(p,y),a), δ2(δ2*(q,y),a)] = < defn of δi*, i=1,2 > [δ1*(p,ya), δ2*(q,ya)] = < x = ya > [δ1*(p,x), δ2*(q,x)] |
An alternative version of this proof, using notation depicting walks in graphs, is in the appendix.
Using this construction, we can show that the class of regular languages is closed under both intersection and union.
Theorem 2: If L1 and L2 are regular languages, then so is L1 ∩ L2.
Proof: Let M1 = (P, Σ, δ1, p0, F1) and M2 = (Q, Σ, δ2, q0, F2) be such that L1 = L(M1) and L2 = L(M2).
Take M = (P×Q, Σ, δ, [p0,q0], F), where δ is defined in terms of δ1 and δ2 as described above, and where F = F1×F2. Then L(M) = L1 ∩ L2. In other words, for all x ∈ Σ*, x ∈ L(M) ≡ x ∈ L1 ∩ L2. Here is a demonstration of this fact:
x ∈ L(M) = < defn of L(M) > δ*([p0,q0], x) ∈ F1 × F2 = < logic: (z = z' for some z' ∈ S) ≡ z ∈ S > δ*([p0,q0], x) = [p',q'] for some p' ∈ F1 and q' ∈ F2 = < Lemma above > (δ1*(p0,x) = p' for some p' ∈ F1) ∧ (δ2*(q0,x) = q' for some q' ∈ F2) = < logic: (z = z' for some z' ∈ S) ≡ z ∈ S > δ1*(p0,x) ∈ F1 ∧ δ2*(q0,x) ∈ F2 = < defn of L(M1) and L(M2) x ∈ L(M1) ∧ x ∈ L(M2) = < Li = L(M1) (i=1,2) > x ∈ L1 ∧ x ∈ L2 = < defn of intersection > x ∈ L1 ∩ L2 |
The proof to show that the union of two regular languages is also regular is similar, with the only difference being that the set of final/accepting states in M would be (F1×Q) ∪ (P×F2). That is, F = { [r,s] | r ∈ F1 ∨ s ∈ F2 }
The proof is left to the reader.
Generalizing from symbols to strings of symbols, we use p ⟹x r to mean that the sequence of transitions beginning at state p and whose labels "spell out" x ends in state r. In algebraic notation, this corresponds to δ*(p,x) = r.
The construction: Using this notation, the Cartesian product construction on DFAs M1 and M2 can be described like this:
[p,q] →c [r,s] iff p →c r ∧ q →c s
It is to be understood that p and r name states in M1 while q and s name states in M2.
The alternative version of the lemma that characterizes the result of the Cartesian Product Construction is this:
Lemma: For all p,r ∈ P, q,s ∈ Q, and all x ∈ Σ*, [p,q] ⟹x [r,s] iff p ⟹x r ∧ q ⟹x s
Proof:
Base case: |x| = 0 (i.e., x = λ)
[p,q] ⟹x [r,s] = < x = λ > [p,q] ⟹λ [r,s] = < a path of length zero goes nowhere > [p,q] = [r,s] = < nature of ordered pairs > p = r ∧ q = s = < a path of length zero goes nowhere > p ⟹λ r ∧ q ⟹λ s = < x = λ > p ⟹x r ∧ q ⟹x s |
Inductive case:
Let n≥0 be arbitrary and assume, as the induction hypothesis (IH),
that the lemma's statement is true for all strings of length n, and
let x = ya, where |y| = n and a ∈ Σ.
[p,q] ⟹x [r,s] = < x = ya > [p,q] ⟹ya [r,s] = < nature of state transition graphs > [p,q] ⟹y [r',s'] ⟶a [r,s] for some r'∈P and s'∈Q = < unabbreviate > [p,q] ⟹y [r',s'] ∧ [r',s'] ⟶a [r,s] for some r'∈P and s'∈Q = < IH > p ⟹y r' ∧ q ⟹y s' ∧ [r',s'] ⟶a [r,s] for some r'∈P and s'∈Q = < by the construction of M > p ⟹y r' ∧ q ⟹y s' ∧ r' ⟶a r ∧ s' ⟶a s for some r'∈P and s'∈Q = < ∧ is commutative > (p ⟹y r' ∧ r' ⟶a r) ∧ (q ⟹y s' ∧ s' ⟶a s) for some r'∈P and s'∈Q = < abbreviate > (p ⟹y r' ⟶a r) ∧ (q ⟹y s' ⟶a s) for some r'∈P and s'∈Q = < nature of state transition graphs > p ⟹ya r ∧ q ⟹ya s = < x = ya > p ⟹x r ∧ q ⟹x s |
Here is an alternative version of the proof of Theorem 2.
Theorem 2: If L1 and L2 are regular languages, then so is L1 ∩ L2.
Proof: Let L1 = L(M1) and L2 = L(M2), where M1 = (P, Σ, δ1, p0, F1) and M2 = (Q, Σ, δ2, q0, F2).
Take M to be the DFA obtained by applying the Cartesian Product Construction to M1 and to M2, with initial state [p0, q0] and final states F = F1 × F2. That is, F = { [r,s] | r ∈ F1 ∧ s ∈ F2 } Then L(M) = L1 ∩ L2. Here is a demonstration:
x ∈ L(M) = < defn of L(M) > [p0,q0] ⟹x [r,s], for some [r,s] ∈ F = < F = F1 × F2 > [p0,q0] ⟹x [r,s], for some [r,s] ∈ F1×F2 = < meaning of [r,s] ∈ F1 × F2 > [p0,q0] ⟹x [r,s], for some r ∈ F1 and s ∈ F2 = < Lemma > p0 ⟹x r ∧ q0 ⟹x s, for some r ∈ F1 and s ∈ F2 = < defn of L(M1) and L(M1) x ∈ L(M1) ∧ x ∈ L(M2) = < defn of Li > x ∈ L1 ∧ x ∈ L2 = < meaning of ∩ > x ∈ L1 ∩ L2 |