How many states does a DFA need to recognize a language?
Theorem (Myhill-Nerode):
A language \(L\) is regular if and only if \(\sim_L\) defines a finite number of equivalence classes.
Furthermore, the number of equivalence classes equals the number of states in the
minimal DFA deciding \(L\).
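The link between equivalence classes and DFA states can be explored with a small bounded experiment. The sketch below is only an approximation under stated assumptions: it groups all strings up to length `max_prefix` by their membership "signature" over all extensions up to length `max_ext`, so it gives a lower bound on the number of classes, not a proof. The helper names (`all_strings`, `count_classes_bounded`, `ends_in_01`) are illustrative and not part of the original material.

```python
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def count_classes_bounded(in_lang, alphabet="01", max_prefix=4, max_ext=3):
    """Count distinct membership signatures (a lower bound on the number of classes).

    Two strings with different signatures are separated by some extension z,
    so they lie in different equivalence classes.
    """
    extensions = list(all_strings(alphabet, max_ext))
    signatures = {
        tuple(in_lang(x + z) for z in extensions)
        for x in all_strings(alphabet, max_prefix)
    }
    return len(signatures)


def ends_in_01(w):
    """A regular example language: binary strings ending in 01."""
    return w.endswith("01")


print(count_classes_bounded(ends_in_01))  # 3
```

For this example the count is 3, which agrees with the three-state minimal DFA for "ends in 01" (no progress, just read a 0, just read 01).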
To prove a language is nonregular we need only one direction of this theorem:
Corollary:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.
Equivalent corollary:
If for language \(L\) we can construct a set \(S\) of pairwise \(L\)-inequivalent strings with \(|S| = \infty\),
then \(L\) is nonregular.
Given a language \(L \subseteq \Sigma^*\),
the strings \(x, y \in \Sigma^*\) are called \(L\)-separable or \(L\)-distinguishable
if there exists a string \(z \in \Sigma^*\) such that:
\[ xz \in L \iff yz \notin L \]
That is, exactly one of \(xz\) and \(yz\) is in \(L\).
This \(z\) is called a separating (or distinguishing) extension for \(x, y\).
If \(x\) and \(y\) are not \(L\)-separable,
then we say they are \(L\)-equivalent, denoted \(x \sim_L y\).
In other words, \(x \sim_L y\) means that for all \(z \in \Sigma^*\): \[ xz \in L \iff yz \in L \]
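The definition of a separating extension translates directly into a bounded search. The following is a minimal sketch, assuming the language is given as a Python membership predicate; `find_separating_extension` and `even_ones` are hypothetical names chosen for illustration. Returning `None` only means no extension was found up to `max_len`, which suggests but does not prove \(L\)-equivalence.

```python
from itertools import product


def find_separating_extension(in_lang, x, y, alphabet="01", max_len=6):
    """Return a shortest z (|z| <= max_len) with exactly one of xz, yz in the language.

    Returns None if no such z exists within the length bound.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            z = "".join(letters)
            if in_lang(x + z) != in_lang(y + z):
                return z
    return None


def even_ones(w):
    """Example language: binary strings with an even number of 1s."""
    return w.count("1") % 2 == 0


# "0" is in the language and "1" is not, so the empty extension separates them.
print(repr(find_separating_extension(even_ones, "0", "1")))
# "1" and "001" both contain an odd number of 1s; no extension tells them apart.
print(find_separating_extension(even_ones, "1", "001"))
```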
Claim: the following language is not regular: \[ L_4 = \setbuild{w w}{w \in \binary^*} \]
Proof:
Let \(S = \setbuild{0^n 1}{n \in \mathbb{N}} = \{1, 01, 001, 0001, \ldots\}\).
Any pair \(0^i 1, 0^k 1 \in S\) with \(i \ne k\) is \(L_4\)-inequivalent \(\quad (0^i 1 \not \sim_{L_4} 0^k 1)\).
Separating extension? \(z = 0^i 1\)
\[ 0^i 1 z \fragment{= 0^i 1 0^i 1} \fragment{\in L_4 \quad \text{but} \quad 0^k 1 z =} \fragment{0^k 1 0^i 1} \fragment{\notin L_4\quad\text{since } i \ne k} \]
Therefore every element of \(S\) is in a different equivalence class.
Since \(|S| = \infty\), \(L_4\) is nonregular.
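As a sanity check (not a substitute for the proof), the separating extension \(z = 0^i 1\) can be tested mechanically for small \(i\) and \(k\). The function names below are illustrative assumptions.

```python
def in_L4(s):
    """Membership test for L_4 = { ww : w in {0,1}* }."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def check_L4_separation(limit=8):
    """Check that z = 0^i 1 separates 0^i 1 from 0^k 1 for all i != k below limit."""
    for i in range(limit):
        for k in range(limit):
            if i == k:
                continue
            x = "0" * i + "1"
            y = "0" * k + "1"
            z = "0" * i + "1"
            # xz must be in L_4 and yz must not be.
            if not in_L4(x + z) or in_L4(y + z):
                return False
    return True


print(check_L4_separation())  # True
```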
Claim: the following language is not regular: \[ L_5 = \setbuild{0^i 1^j}{i > j} \]
Proof:
Let \(S = \fragment{\setbuild{0^n}{n \in \mathbb{N}} = \{\epsilon, 0, 00, 000, \ldots\}}\).
Any pair \(0^i, 0^k \in S\) with \(i > k\) is \(L_5\)-inequivalent \(\quad (0^i \not \sim_{L_5} 0^k)\).
Separating extension? \(z = 1^k\)
\[ 0^i z = 0^i 1^k \in L_5 \quad \text{(since } i > k\text{)} \quad \text{but} \quad 0^k z = 0^k 1^k \notin L_5 \]
Since \(|S| = \infty\), \(L_5\) is nonregular.
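The pair of memberships used in this proof, \(0^i 1^k \in L_5\) and \(0^k 1^k \notin L_5\), corresponds to the extension \(z = 1^k\). A small brute-force check over a finite range, with hypothetical helper names, might look like this:

```python
import re


def in_L5(s):
    """Membership test for L_5 = { 0^i 1^j : i > j }."""
    m = re.fullmatch(r"(0*)(1*)", s)
    return m is not None and len(m.group(1)) > len(m.group(2))


def check_L5_separation(limit=10):
    """Check that z = 1^k separates 0^i from 0^k for all k < i < limit."""
    for i in range(limit):
        for k in range(i):
            z = "1" * k
            # 0^i 1^k must be in L_5 and 0^k 1^k must not be.
            if not in_L5("0" * i + z) or in_L5("0" * k + z):
                return False
    return True


print(check_L5_separation())  # True
```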
Claim: the following language is not regular: \[ L_6 = \setbuild{0^i 1^j 0^k}{i \cdot j = k} = \{\epsilon, 010, 00100, 01100, \ldots\} \]
Proof:
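Let \(S = \setbuild{0^n}{n \in \mathbb{N}}\).
Consider any pair \(0^i, 0^k \in S\) with \(i \ne k\). Separating extension? \(z = 1 0^i\)
\[ 0^i z = 0^i 1 0^i \in L_6 \quad \text{but} \quad 0^k z = 0^k 1 0^i \notin L_6 \quad \text{since } k \cdot 1 \ne i \]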
Since \(|S| = \infty\), \(L_6\) is nonregular.
Alternative choices for \(S\): \[ S = \setbuild{0^n 1^n}{n \in \mathbb{N}}, \quad \quad S = \setbuild{0^n 1}{n \in \mathbb{N}}, \quad \quad \cdots \]
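One candidate separating extension for \(S = \setbuild{0^n}{n \in \mathbb{N}}\) is \(z = 1 0^i\); this choice is an assumption of the sketch below, and other extensions may work equally well. The code checks it over a small range and is meant as a sanity check only. Function names are illustrative.

```python
import re


def in_L6(s):
    """Membership test for L_6 = { 0^i 1^j 0^k : i * j = k }.

    When j = 0 the greedy match puts every 0 in the first block (k = 0),
    which is the only split that can satisfy i * 0 = k.
    """
    m = re.fullmatch(r"(0*)(1*)(0*)", s)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i * j == k


def check_L6_separation(limit=10):
    """Check that z = 1 0^i separates 0^i from 0^k for all i != k below limit."""
    for i in range(limit):
        for k in range(limit):
            if i == k:
                continue
            z = "1" + "0" * i
            # 0^i 1 0^i must be in L_6 and 0^k 1 0^i must not be.
            if not in_L6("0" * i + z) or in_L6("0" * k + z):
                return False
    return True


print(check_L6_separation())  # True
```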
Claim: the following language is not regular: \[ L_7 = \setbuild{1^{n^2}}{n \in \mathbb{N}} = \{\epsilon, 1, 1111, 111111111, \ldots\} \]
Proof:
Let \(S = L_7\).
Consider any pair \(1^{i^2}, 1^{k^2} \in S\) with \(i > k\).
Separating extension that works for all such pairs?
The gap between adjacent perfect squares is \((k + 1)^2 - k^2 = 2k + 1\), which strictly increases with \(k\). \[ 1,\; 4,\; 9,\; 16,\; 25,\; 36,\; 49,\; 64,\; \ldots \]
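Take \(z = 1^{2k + 1}\): then \(1^{k^2} z = 1^{(k + 1)^2} \in L_7\), but since \(i > k\), adding \(2k + 1\) to \(i^2\) is not enough to reach \((i + 1)^2\), so \(i^2 < i^2 + 2k + 1 < (i + 1)^2\) and \(1^{i^2} z \notin L_7\).
The other natural choice, \(z = 1^{2i + 1}\), can fail: adding \(2i + 1\) to \(k^2\) could reach \((k + c)^2\) for some \(c > 1\).
Example: \(k = 0, i = 4, c = 3\), where \(k^2 + 2i + 1 = 9 = 3^2\).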
Since \(|S| = \infty\), \(L_7\) is nonregular.
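The gap argument can be checked numerically. The sketch below assumes the extension \(z = 1^{2k + 1}\) (the gap above the smaller square) and contrasts it with \(1^{2i + 1}\) on the example \(k = 0, i = 4\). It is a finite check with illustrative function names, not a proof.

```python
from math import isqrt


def in_L7(s):
    """Membership test for L_7 = { 1^(n^2) : n in N }."""
    n = len(s)
    return set(s) <= {"1"} and isqrt(n) ** 2 == n


def separates(i, k, ext_len):
    """True if z = 1^ext_len puts exactly one of 1^(i^2) z and 1^(k^2) z in L_7."""
    z = "1" * ext_len
    return in_L7("1" * (i * i) + z) != in_L7("1" * (k * k) + z)


def check_gap_extension(limit=30):
    """Check that z = 1^(2k+1) separates 1^(i^2) from 1^(k^2) for all k < i < limit."""
    return all(separates(i, k, 2 * k + 1) for i in range(limit) for k in range(i))


print(check_gap_extension())        # True
# With 2i + 1 instead: 16 + 9 = 25 and 0 + 9 = 9 are both squares, so no separation.
print(separates(4, 0, 2 * 4 + 1))   # False
```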
Claim: the following language is not regular: \[ L_8 = \setbuild{w \in \binary^*}{w = \reverse{w} \; (w \text{ is a palindrome})} \]
Proof:
Corollary (one direction) of the Myhill-Nerode Theorem:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.
Proof: (Contrapositive: \(L\) regular \(\implies\) \(\sim_L\) defines a finite number of equivalence classes.)
Pumping Lemma
If \(L\) is regular, then there exists a pumping length \(p \in \mathbb{N}\) such that for all \(w \in L\) with \(|w| \geq p\), there exists a decomposition into three substrings \(w = x y z\) where \(|y| > 0\), \(|x y| \leq p\), and \(x y^i z \in L\) for all \(i \geq 0\).
To prove a language is nonregular using the pumping lemma, apply the contrapositive:
If for every \(p \in \mathbb{N}\) there exists a string \(w \in L\) with \(|w| \ge p\) such that every decomposition \(w = x y z\) with \(|y| > 0\) and \(|x y| \leq p\) has \(x y^i z \notin L\) for some \(i \geq 0\), then \(L\) is nonregular.
Claim: the following language is not regular: \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} \]
Proof via the Pumping Lemma:
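The shape of a pumping-lemma argument can be illustrated by brute force. The sketch below assumes the commonly used witness \(w = 0^p 1^p\) (a choice not stated in the text above) and enumerates every decomposition \(w = xyz\) with \(|y| > 0\) and \(|xy| \leq p\), keeping those for which \(x y^i z\) stays in the language for all \(i\) up to a small bound. It checks finitely many values of \(p\) and \(i\), so it illustrates the argument rather than proving it. All function names are hypothetical.

```python
import re


def in_L1(s):
    """Membership test for L_1 = { 0^n 1^n : n in N }."""
    m = re.fullmatch(r"(0*)(1*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


def surviving_decompositions(in_lang, w, p, max_i=3):
    """List decompositions (x, y, z) of w with |y| > 0 and |xy| <= p
    such that x y^i z is in the language for every i in 0..max_i."""
    survivors = []
    for a in range(p + 1):                            # a = |x|
        for b in range(a + 1, min(p, len(w)) + 1):    # b = |xy|, b > a so |y| > 0
            x, y, z = w[:a], w[a:b], w[b:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                survivors.append((x, y, z))
    return survivors


for p in range(1, 6):
    w = "0" * p + "1" * p
    print(p, surviving_decompositions(in_L1, w, p))   # empty list for each p


def in_zeros_then_ones(s):
    """A regular contrast language: 0*1*."""
    return re.fullmatch(r"0*1*", s) is not None


print(surviving_decompositions(in_zeros_then_ones, "0011", 1))
```

For \(L_1\) every decomposition is eliminated already at \(i = 0\), because \(y\) consists only of 0s and removing it leaves fewer 0s than 1s. For the regular language \(0^* 1^*\), the decomposition with \(y = 0\) survives.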
Claim: the following language is not regular: \[ L_4 = \setbuild{w w}{w \in \binary^*} \]
Proof via the Pumping Lemma: