We have shown that the following classes of languages are equivalent, and we call them regular:
Are all languages regular?
Let \(\#(a, b)\) denote the number of occurrences of \(a \in \Sigma^*\) as a substring in \(b \in \Sigma^*\).
Claim: the following language is not regular: \[
L_2 = \setbuild{w \in \binary^*}{\#(0, w) = \#(1, w)}
\] because deciding it requires counting the number of occurrences of symbols \(0\) and \(1\),
needing an unbounded number of states.
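Tempting claim: the following language is not regular: \[
L_3 = \setbuild{w \in \binary^*}{\#(01, w) = \#(10, w)}
\] because deciding it seems to require counting the number of occurrences of substrings \(01\) and \(10\),
needing an unbounded number of states.
This intuition is misleading: in any binary string the substrings \(01\) and \(10\) alternate, so their counts differ by at most one, and a nonempty \(w\) is in \(L_3\) exactly when its first and last symbols are equal. A DFA can check that, so \(L_3\) is in fact regular.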
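The counting intuition for \(L_3\) can be tested by brute force. The sketch below compares membership in \(L_3\) with the much simpler condition "\(w\) is empty or its first and last symbols agree" on all binary strings up to an arbitrarily chosen length bound. The helper names and the bound `MAX_LEN` are illustrative assumptions, and agreement on a finite range is evidence, not a proof.

```python
from itertools import product

def count_substring(a: str, b: str) -> int:
    """#(a, b): number of (possibly overlapping) occurrences of a in b."""
    return sum(1 for i in range(len(b) - len(a) + 1) if b[i:i + len(a)] == a)

def in_L3(w: str) -> bool:
    """Membership in L_3: as many occurrences of 01 as of 10."""
    return count_substring("01", w) == count_substring("10", w)

def first_equals_last(w: str) -> bool:
    """A condition that needs no counting at all."""
    return w == "" or w[0] == w[-1]

def binary_strings(max_len: int):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

MAX_LEN = 12  # arbitrary bound for the experiment
mismatches = [w for w in binary_strings(MAX_LEN)
              if in_L3(w) != first_equals_last(w)]
print(mismatches)  # an empty list means the two conditions agree up to MAX_LEN
```

If the two conditions agree, a DFA only has to remember the first symbol and the most recent one, which is consistent with \(L_3\) being regular even though \(L_2\) is not.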
Let’s show (for real this time) that the following language is not regular (DFA-decidable): \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, 00001111, \dots\} \]
This is the idea behind the Myhill-Nerode Theorem: Sections 7.2-7.5
First, some definitions:
Given a language \(L \subseteq \Sigma^*\),
the strings \(x, y \in \Sigma^*\) are called \(L\)-separable or \(L\)-distinguishable if there is a string \(z \in \Sigma^*\) such that:
\[ xz \in L \iff yz \notin L \hspace{15em} \]
Exactly one of \(xz\) or \(yz\) is in \(L\).
(\(xz\in L\ \) XOR \(\ yz\in L\))
This \(z\) is called a separating (or distinguishing) extension for \(x\) and \(y\).
If \(x\) and \(y\) are not \(L\)-separable,
then we say they are \(L\)-equivalent, denoted \(x \sim_L y\).
In other words, \(x \sim_L y\) means that for all \(z \in \Sigma^*\): \[ xz \in L \iff yz \in L \hspace{15em} \]
Both \(xz\) and \(yz\) are in \(L\), or both are not.
Note that \(\sim_L\) is an equivalence relation on \(\Sigma^*\), so it partitions \(\Sigma^*\) into equivalence classes of strings that are indistinguishable by \(L\).
\[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, \ldots\} \]
Do \(00\) and \(000\) have a separating extension with respect to \(L_1\)?
Do \(010\) and \(0010\) have a separating extension with respect to \(L_1\)?
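Questions like these can be explored with a bounded search for a separating extension. The sketch below is only a search up to an assumed length bound (`max_len`, a hypothetical parameter), so a `None` result shows no short separating extension exists, not that none exists at all. For \(010\) and \(0010\) the stronger conclusion does hold by inspection: every extension of either string contains a \(1\) followed by a \(0\), so none is in \(L_1\).

```python
from itertools import product

def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def find_separating_extension(in_L, x: str, y: str, max_len: int):
    """Return a shortest z (up to max_len) with exactly one of xz, yz in L, else None."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            z = "".join(symbols)
            if in_L(x + z) != in_L(y + z):
                return z
    return None

sep_a = find_separating_extension(in_L1, "00", "000", max_len=8)
sep_b = find_separating_extension(in_L1, "010", "0010", max_len=8)
print(repr(sep_a), repr(sep_b))
```

The first search finds \(z = 11\), since \(0011 \in L_1\) but \(00011 \notin L_1\); the second finds nothing within the bound.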
Theorem (Myhill-Nerode):
A language \(L\) is regular if and only if \(\sim_L\) defines a finite number of equivalence classes.
Furthermore, the number of equivalence classes equals the number of states in the
minimal DFA deciding \(L\).
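The connection between equivalence classes and DFA states can be illustrated numerically. The sketch below groups short strings by their behavior on all short extensions, which only approximates \(\sim_L\) from below because both length bounds are finite. The language "ends with \(01\)" is a hypothetical regular example chosen for illustration; it does not appear in these notes.

```python
from itertools import product

def binary_strings(max_len: int):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

def count_classes(in_L, max_prefix_len: int, max_ext_len: int) -> int:
    """Count groups of prefixes that no extension up to max_ext_len can tell apart."""
    extensions = list(binary_strings(max_ext_len))
    signatures = {
        tuple(in_L(x + z) for z in extensions)
        for x in binary_strings(max_prefix_len)
    }
    return len(signatures)

def ends_with_01(w: str) -> bool:
    """Hypothetical regular language used only for comparison."""
    return w.endswith("01")

def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

print(count_classes(ends_with_01, 6, 3))
print([count_classes(in_L1, m, m) for m in (2, 4, 6)])
```

For the regular example the count settles at 3, the number of states in its minimal DFA, while for \(L_1\) the count keeps growing as the bounds increase.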
To prove a language is nonregular we need only one direction of this theorem:
Corollary:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.
Equivalent corollary:
If for language \(L\) we can construct an infinite set \(S\) of pairwise \(L\)-inequivalent strings,
then \(L\) is nonregular.
\[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, \ldots\} \]
Claim: \(L_1\) is nonregular.
Proof:
\[ L_2 = \setbuild{w \in \binary^*}{\#(0, w) = \#(1, w)} \]
Claim: \(L_2\) is nonregular.
Proof:
The exact same proof applies for \(L_2\)!
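The proof bodies are not reproduced in these notes, so the following is an assumed reconstruction of one standard argument: take \(S = \setbuild{0^n}{n \in \mathbb{N}}\) and, for \(i \ne k\), use \(z = 1^i\) to separate \(0^i\) from \(0^k\). The sketch checks that choice for both \(L_1\) and \(L_2\) on a finite prefix of \(S\); the bound `N` is arbitrary.

```python
def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_L2(w: str) -> bool:
    """Membership in L_2: equal numbers of 0s and 1s."""
    return w.count("0") == w.count("1")

def pairwise_separated(in_L, n_strings: int) -> bool:
    """Check that z = 1^i separates 0^i from 0^k for all i != k below n_strings."""
    for i in range(n_strings):
        z = "1" * i
        for k in range(n_strings):
            if i == k:
                continue
            # Separation means: 0^i z is in L while 0^k z is not.
            if not (in_L("0" * i + z) and not in_L("0" * k + z)):
                return False
    return True

N = 12  # arbitrary bound; the argument itself works for every i != k
print(pairwise_separated(in_L1, N), pairwise_separated(in_L2, N))
```

The same extension works for both languages because \(0^k 1^i\) with \(k \ne i\) fails the shape test of \(L_1\) and the counting test of \(L_2\) simultaneously.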
Alternate proof using closure properties:
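The closure argument is not spelled out here. One standard version, assumed for this sketch, uses the fact that regular languages are closed under intersection: \(0^* 1^*\) is regular and \(L_2 \cap 0^* 1^* = L_1\), so if \(L_2\) were regular then \(L_1\) would be too. The code checks the set identity on all binary strings up to an arbitrary length bound.

```python
import re
from itertools import product

def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_L2(w: str) -> bool:
    """Membership in L_2: equal numbers of 0s and 1s."""
    return w.count("0") == w.count("1")

def in_zeros_then_ones(w: str) -> bool:
    """Membership in the regular language 0*1*."""
    return re.fullmatch(r"0*1*", w) is not None

MAX_LEN = 12  # arbitrary bound for the experiment
identity_holds = all(
    (in_L2(w) and in_zeros_then_ones(w)) == in_L1(w)
    for n in range(MAX_LEN + 1)
    for w in ("".join(s) for s in product("01", repeat=n))
)
print(identity_holds)
```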
Claim: the following language is not regular: \[ L_4 = \setbuild{w w}{w \in \binary^*} \]
Proof:
Let \(S = \setbuild{0^n 1}{n \in \mathbb{N}} = \{1, 01, 001, 0001, \ldots\}\).
Any pair \(0^i 1, 0^k 1 \in S\) with \(i \ne k\) is \(L_4\)-inequivalent \(\quad (0^i 1 \not \sim_{L_4} 0^k 1)\).
Separating extension? \(z = 0^i 1\)
\[ 0^i 1 z \fragment{= 0^i 1 0^i 1} \fragment{\in L_4 \quad \text{but} \quad 0^k 1 z =} \fragment{0^k 1 0^i 1} \fragment{\notin L_4\quad\text{since } i \ne k} \]
Therefore every element of \(S\) is in a different equivalence class.
Since \(|S| = \infty\), \(L_4\) is nonregular.
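As a sanity check on this proof, the sketch below verifies on a finite prefix of \(S\) that \(z = 0^i 1\) separates \(0^i 1\) from \(0^k 1\) whenever \(i \ne k\). The bound `N` and the helper names are illustrative assumptions.

```python
def in_L4(w: str) -> bool:
    """Membership in L_4 = { ww : w in {0,1}* }."""
    half, remainder = divmod(len(w), 2)
    return remainder == 0 and w[:half] == w[half:]

N = 12  # arbitrary bound on the exponents tested
S = ["0" * n + "1" for n in range(N)]

# For each pair i != k, the extension z = S[i] = 0^i 1 should put
# S[i] + z in L_4 while keeping S[k] + z out of L_4.
all_separated = all(
    in_L4(S[i] + S[i]) and not in_L4(S[k] + S[i])
    for i in range(N)
    for k in range(N)
    if i != k
)
print(all_separated)
```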
Claim: the following language is not regular: \[ L_5 = \setbuild{0^i 1^j}{i > j} \]
Proof:
Let \(S = \setbuild{0^n}{n \in \mathbb{N}} = \{\epsilon, 0, 00, 000, \ldots\}\).
Any pair \(0^i, 0^k \in S\) with \(i > k\) is \(L_5\)-inequivalent \(\quad (0^i \not \sim_{L_5} 0^k)\).
Separating extension? \(z = 1^k\)
\[ 0^i 1^k \in L_5 \quad \text{but} \quad 0^k 1^k \notin L_5 \]
Since \(|S| = \infty\), \(L_5\) is nonregular.
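The separating extension \(1^k\) implied by the displayed strings can be checked mechanically. This sketch tests every pair \(i > k\) below an arbitrary bound `N`; the regular expression is just one convenient way to parse strings of the form \(0^i 1^j\).

```python
import re

def in_L5(w: str) -> bool:
    """Membership in L_5 = { 0^i 1^j : i > j }."""
    match = re.fullmatch(r"(0*)(1*)", w)
    return match is not None and len(match.group(1)) > len(match.group(2))

N = 12  # arbitrary bound on the exponents tested

# For i > k, z = 1^k gives 0^i 1^k in L_5 (since i > k)
# but 0^k 1^k not in L_5 (since k > k is false).
all_separated = all(
    in_L5("0" * i + "1" * k) and not in_L5("0" * k + "1" * k)
    for i in range(N)
    for k in range(i)
)
print(all_separated)
```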
Claim: the following language is not regular: \[ L_6 = \setbuild{0^i 1^j 0^k}{i \cdot j = k} = \{\epsilon, 010, 00100, 01100, \ldots\} \]
Proof:
Let \(S = \setbuild{0^n}{n \in \mathbb{N}}\)
Consider any pair \(0^i, 0^k \in S\) with \(i \ne k\). Separating extension?
Since \(|S| = \infty\), \(L_6\) is nonregular.
Alternative choices for \(S\): \[ S = \setbuild{0^n 1^n}{n \in \mathbb{N}}, \quad \quad S = \setbuild{0^n 1}{n \in \mathbb{N}}, \quad \quad \cdots \]
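The notes leave the separating extension for \(L_6\) as a question. One candidate, offered here as an assumption and not as the intended answer, is \(z = 1 0^i\): then \(0^i 1 0^i \in L_6\) because \(i \cdot 1 = i\), while \(0^k 1 0^i \notin L_6\) because \(k \cdot 1 \ne i\). The sketch checks this on a finite range.

```python
import re

def in_L6(w: str) -> bool:
    """Membership in L_6 = { 0^i 1^j 0^k : i * j = k }."""
    match = re.fullmatch(r"(0*)(1*)(0*)", w)
    if match is None:
        return False
    # For all-zero strings the greedy parse gives j = 0 and k = 0,
    # which satisfies i * 0 = 0, so such strings are (correctly) accepted.
    i, j, k = (len(group) for group in match.groups())
    return i * j == k

N = 12  # arbitrary bound on the exponents tested

# Candidate extension for the pair (0^i, 0^k) with i != k: z = 1 0^i.
all_separated = all(
    in_L6("0" * i + "1" + "0" * i) and not in_L6("0" * k + "1" + "0" * i)
    for i in range(N)
    for k in range(N)
    if i != k
)
print(all_separated)
```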
Claim: the following language is not regular: \[ L_7 = \setbuild{1^{n^2}}{n \in \mathbb{N}} = \{\epsilon, 1, 1111, 111111111, \ldots\} \]
Proof:
Let \(S = L_7\).
Consider any pair \(1^{i^2}, 1^{k^2} \in S\) with \(i > k\).
Separating extension that works for all such pairs?
The gap between adjacent perfect squares is \((k + 1)^2 - k^2 = 2k + 1\), which strictly increases with \(k\). \[ 0,\; 1,\; 4,\; 9,\; 16,\; 25,\; 36,\; 49,\; 64,\; \ldots \]
Since \(i>k\), adding \(2k + 1\) to \(i^2\) is not enough to reach \((i + 1)^2\).
Adding \(2i + 1\) to \(k^2\) could reach \((k + c)^2\) for \(c > 1\).
Example: \(k = 0, i = 4, c = 3\)
Since \(|S| = \infty\), \(L_7\) is nonregular.
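The gap argument suggests \(z = 1^{2k + 1}\) as the extension: it lifts \(k^2\) to \((k + 1)^2\) but leaves \(i^2 + 2k + 1\) strictly between \(i^2\) and \((i + 1)^2\) when \(i > k\). The sketch below checks this on a finite range and also reproduces the counterexample \(k = 0, i = 4\) for the other candidate \(z = 1^{2i + 1}\). The bound `N` is arbitrary.

```python
import math

def in_L7(w: str) -> bool:
    """Membership in L_7 = { 1^(n^2) : n >= 0 }."""
    n = len(w)
    return set(w) <= {"1"} and math.isqrt(n) ** 2 == n

def separates(x: str, y: str, z: str) -> bool:
    """True when exactly one of xz, yz is in L_7."""
    return in_L7(x + z) != in_L7(y + z)

N = 30  # arbitrary bound on i and k

# z = 1^(2k+1) for every pair i > k.
good_choice_works = all(
    separates("1" * (i * i), "1" * (k * k), "1" * (2 * k + 1))
    for i in range(N)
    for k in range(i)
)

# z = 1^(2i+1) with k = 0, i = 4: lengths 0 + 9 = 9 and 16 + 9 = 25 are both squares.
bad_choice_works = separates("1" * 16, "", "1" * 9)

print(good_choice_works, bad_choice_works)
```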
Claim: the following language is not regular: \[ L_8 = \setbuild{w \in \binary^*}{w = \reverse{w} \; (w \text{ is a palindrome})} \]
Proof:
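The proof for palindromes is left blank here. One choice that fits the pattern of the earlier proofs, stated as an assumption, is \(S = \setbuild{0^n 1}{n \in \mathbb{N}}\) with \(z = 0^i\) separating \(0^i 1\) from \(0^k 1\): the string \(0^i 1 0^i\) is a palindrome, while \(0^k 1 0^i\) with \(k \ne i\) is not. The sketch checks this on a finite prefix of \(S\).

```python
def in_L8(w: str) -> bool:
    """Membership in L_8: w equals its own reverse."""
    return w == w[::-1]

N = 12  # arbitrary bound on the exponents tested
S = ["0" * n + "1" for n in range(N)]

# For each pair i != k, test the candidate extension z = 0^i.
all_separated = all(
    in_L8(S[i] + "0" * i) and not in_L8(S[k] + "0" * i)
    for i in range(N)
    for k in range(N)
    if i != k
)
print(all_separated)
```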
Corollary (one direction) of the Myhill-Nerode Theorem:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.
Proof: (Contrapositive: \(L\) regular \(\implies\) \(\sim_L\) defines a finite number of equivalence classes.)
Let’s show (another way) that the following language is not regular (DFA-decidable): \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, 00001111, \dots\} \]
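Suppose for contradiction that a DFA \(D\) with state set \(Q\), start state \(s\), and accepting states \(F\) decides \(L_1\), and consider the string \(0^n 1^n \in L_1\) for some \(n \geq |Q|\).
The DFA must accept this string with a computation sequence: \[ s = r_0 \fragment{\transition{0} r_1} \fragment{\transition{0} r_2} \fragment{\transition{0} \cdots \transition{0} r_{n}} \fragment{\transition{1} r_{n + 1}} \fragment{\transition{1} \cdots \transition{1} r_{2n} \in F} \qquad \qquad \]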
Since \(n + 1 > |Q|\) states appear in \((r_0, r_1, \ldots, r_n)\), at least one must repeat (Pigeonhole Principle): \[ r_0 \cdots \overbrace{\underbrace{r_i \transition{0} \cdots \transition{0} r_j}_{r_i = r_j}}^{\text{substring $y=0^{j-i}$ follows a cycle}} \cdots r_n \transition{1} \cdots \transition{1} r_{2n} \qquad\qquad \]
The nonempty sequence \(r_i \transition{0} \cdots \transition{0} r_{j}\) can be repeated \(k\) more times (overlapping \(r_i, r_j\)), yielding a new computation sequence that also reaches state \(r_{2n}\), i.e., accepts the string \(x=0^{n + (j - i) k} 1^n \text{ for any } k \in \mathbb{N}^+\).
But since \(n+(j-i)k \ne n\), we have \(x \notin L_1\), contradicting \(L(D) = L_1\) since \(D\) accepts \(x\).
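The pigeonhole step can be made concrete on a small machine. The DFA below is a hypothetical three-state example invented for illustration (it tracks the number of \(0\)s minus the number of \(1\)s modulo 3); it accepts every string in \(L_1\) and, as the argument predicts, also accepts a pumped string outside \(L_1\).

```python
def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def run(delta, start, w):
    """Return the full state sequence r_0, r_1, ..., r_|w| of the DFA on w."""
    states = [start]
    for symbol in w:
        states.append(delta[(states[-1], symbol)])
    return states

def first_repeat(states):
    """Return indices (i, j), i < j, of the first repeated state, or None."""
    seen = {}
    for index, q in enumerate(states):
        if q in seen:
            return seen[q], index
        seen[q] = index
    return None

# Hypothetical DFA: state = (#0s - #1s) mod 3, accepting when that is 0.
Q = [0, 1, 2]
delta = {}
for q in Q:
    delta[(q, "0")] = (q + 1) % 3
    delta[(q, "1")] = (q - 1) % 3
start, accepting = 0, {0}

n = len(Q)                                   # n + 1 > |Q| states on 0^n
i, j = first_repeat(run(delta, start, "0" * n))
k = 2                                        # number of extra trips around the cycle
pumped = "0" * (n + (j - i) * k) + "1" * n
accepted = run(delta, start, pumped)[-1] in accepting
print(i, j, pumped, accepted, in_L1(pumped))
```

Any DFA claimed to decide \(L_1\) can be substituted for `delta`; the cycle found on \(0^n\) always yields an accepted string with more \(0\)s than \(1\)s.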
This is the idea behind the Pumping Lemma: Optional sections 7.7-7.9
Pumping Lemma
If \(L\) is regular, then there is a pumping length \(p \in \mathbb{N}\) such that for all \(w \in L\) with \(|w| \geq p\), there is a decomposition into three substrings \(w = x y z\) where: \(x y^i z \in L\) for all \(i \geq 0\), \(|y| > 0\), and \(|x y| \leq p\).
To prove a language is nonregular using the pumping lemma, apply the contrapositive:
Show that for every \(p \in \mathbb{N}\) there is a string \(w \in L\) with \(|w| \ge p\) such that for every decomposition \(w = x y z\) with \(|y| > 0\) and \(|x y| \leq p\), we find \(x y^i z \notin L\) for some \(i \geq 0\). Then \(L\) cannot be regular.
Claim: the following language is not regular: \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} \]
Proof via the Pumping Lemma:
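The proof is not written out here. A common choice, assumed for this sketch, is \(w = 0^p 1^p\): since \(|x y| \leq p\), the substring \(y\) consists only of \(0\)s, so pumping changes the number of \(0\)s but not the number of \(1\)s. The code enumerates every allowed decomposition for small \(p\) and looks for an exponent \(i\) that pumps the string out of \(L_1\).

```python
def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def decompositions(w: str, p: int):
    """Yield every (x, y, z) with w = xyz, |y| > 0 and |xy| <= p."""
    for a in range(p):
        for b in range(a + 1, p + 1):
            yield w[:a], w[a:b], w[b:]

def pumping_exponent(in_L, x: str, y: str, z: str, max_i: int = 3):
    """Return the smallest i <= max_i with x y^i z not in L, or None."""
    for i in range(max_i + 1):
        if not in_L(x + y * i + z):
            return i
    return None

# For each small p (range chosen arbitrarily), every decomposition must fail to pump.
results = {
    p: all(
        pumping_exponent(in_L1, x, y, z) is not None
        for x, y, z in decompositions("0" * p + "1" * p, p)
    )
    for p in range(1, 9)
}
print(results)
```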
Claim: the following language is not regular: \[ L_4 = \setbuild{w w}{w \in \binary^*} \]
Proof via the Pumping Lemma:
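This proof is also left blank. One workable string, assumed here, is \(w = 0^p 1 0^p 1 \in L_4\): any \(y\) with \(|x y| \leq p\) lies inside the first block of \(0\)s, and changing the length of that block alone breaks the \(w w\) structure. The sketch checks every allowed decomposition for small \(p\).

```python
def in_L4(w: str) -> bool:
    """Membership in L_4 = { ww : w in {0,1}* }."""
    half, remainder = divmod(len(w), 2)
    return remainder == 0 and w[:half] == w[half:]

def decompositions(w: str, p: int):
    """Yield every (x, y, z) with w = xyz, |y| > 0 and |xy| <= p."""
    for a in range(p):
        for b in range(a + 1, p + 1):
            yield w[:a], w[a:b], w[b:]

def pumped_out(x: str, y: str, z: str) -> bool:
    """True when pumping y twice (i = 2) leaves L_4."""
    return not in_L4(x + y * 2 + z)

# Range of p chosen arbitrarily; w = 0^p 1 0^p 1 has length 2p + 2 >= p.
results = {
    p: all(
        pumped_out(x, y, z)
        for x, y, z in decompositions(("0" * p + "1") * 2, p)
    )
    for p in range(1, 9)
}
print(results)
```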