We have shown that the following classes of languages are equivalent:
We call this class of languages the regular languages.
Are all languages regular?
Let \(\#(a, b)\) denote the number of occurrences of \(a \in \Sigma^*\) as a substring in \(b \in \Sigma^*\).
Claim: the following language is not regular: \[
L_2 = \setbuild{w \in \binary^*}{\#(0, w) = \#(1, w)}
\] intuitively because deciding it seems to require counting the number of occurrences of symbols \(0\) and \(1\),
and an unbounded count would need an unbounded number of states.
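Claim: the following language is not regular: \[
L_3 = \setbuild{w \in \binary^*}{\#(01, w) = \#(10, w)}
\] because deciding it seems to require counting the number of occurrences of substrings \(01\) and \(10\),
needing an unbounded number of states.
This intuition is misleading: in a binary string the occurrences of \(01\) and \(10\) alternate, so \(\#(01, w) = \#(10, w)\) exactly when \(w\) is empty or begins and ends with the same symbol. A DFA with a handful of states can check that, so \(L_3\) is in fact regular, and the informal counting argument is not a proof.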
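A brute-force check is a useful sanity test for counting intuitions like this one. The sketch below compares membership in \(L_3\) with a much simpler condition, "\(w\) is empty or its first and last symbols agree", on every binary string up to length 12. The helper names (`count_occurrences`, `in_L3`, `same_ends`) and the length bound are assumptions made for illustration.

```python
from itertools import product

def count_occurrences(a: str, b: str) -> int:
    """Count (possibly overlapping) occurrences of a as a substring of b."""
    return sum(1 for i in range(len(b) - len(a) + 1) if b[i:i + len(a)] == a)

def in_L3(w: str) -> bool:
    """Membership in L_3: as many occurrences of 01 as of 10."""
    return count_occurrences("01", w) == count_occurrences("10", w)

def same_ends(w: str) -> bool:
    """True if w is empty or starts and ends with the same symbol."""
    return w == "" or w[0] == w[-1]

def all_binary_strings(max_len: int):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

# Strings on which the two conditions disagree (none are expected).
mismatches = [w for w in all_binary_strings(12) if in_L3(w) != same_ends(w)]
print(mismatches)
```

An empty list means the two conditions agree on every string tested, which suggests that \(L_3\) can be decided by remembering only the first and the most recent symbol. This is evidence rather than a proof, but it shows that "it looks like counting" does not by itself establish nonregularity.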
Let’s show (for real this time) that the following language is not regular (that is, not DFA-decidable): \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, \ldots\} \]
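Suppose, for contradiction, that some DFA \(D\) with state set \(Q\), start state \(s\), and accepting states \(F\) decides \(L_1\), that is, \(L(D) = L_1\). Let \(n = |Q|\) and consider the string \(0^n 1^n \in L_1\).
\(D\) must accept this string with a computation sequence: \[ r_0 \fragment{\transition{0} r_1} \fragment{\transition{0} \cdots} \fragment{\transition{0} r_{n}} \fragment{\transition{1} r_{n + 1}} \fragment{\transition{1} \cdots \transition{1} r_{2n}} \] where \(r_0 = s\) and \(r_{2n} \in F\).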
Since \(n + 1 > |Q|\) states appear in \((r_0, r_1, \ldots, r_n)\), at least one must repeat (Pigeonhole Principle): \[ r_0 \cdots \overbrace{\underbrace{r_i \transition{0} \cdots \transition{0} r_j}_{r_i = r_j}}^{\text{substring $y=0^{j-i}$ follows a cycle}} \cdots r_n \transition{1} \cdots \transition{1} r_{2n} \qquad\qquad \]
The nonempty sequence \(r_i \transition{0} \cdots \transition{0} r_{j}\) can be repeated \(k\) more times (overlapping at \(r_i = r_j\)), yielding a new accepting computation sequence for the string \(x=0^{n + (j - i) k} 1^n\), for any \(k \in \N^+\).
But since \(j > i\) and \(k \ge 1\), we have \(n+(j-i)k \ne n\), so \(x \notin L_1\). This contradicts \(L(D) = L_1\), since \(D\) accepts \(x\).
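The pumping step works for any DFA, so it can be watched in action on a concrete one. The sketch below uses a hypothetical 3-state DFA deciding \(0^*1^*\) (it accepts every string of \(L_1\), and more); the transition table, the state names, and the helpers `run` and `pump_zeros` are assumptions made for illustration. It finds the first repeated state among \(r_0, \ldots, r_n\) and builds the pumped string \(x\).

```python
def run(delta, start, w):
    """Return the computation sequence r_0, r_1, ..., r_|w| of the DFA on w."""
    states = [start]
    for symbol in w:
        states.append(delta[(states[-1], symbol)])
    return states

def pump_zeros(delta, start, n, k):
    """On input 0^n 1^n, find i < j <= n with r_i == r_j and return (i, j, x),
    where x = 0^(n + (j - i) * k) 1^n repeats the cycle k more times."""
    states = run(delta, start, "0" * n + "1" * n)
    first_seen = {}
    for j, state in enumerate(states[: n + 1]):
        if state in first_seen:
            i = first_seen[state]
            return i, j, "0" * (n + (j - i) * k) + "1" * n
        first_seen[state] = j
    raise ValueError("no repeated state: n must be at least the number of states")

# Hypothetical DFA for 0*1*: state A reads 0s, state B reads 1s, "dead" rejects.
delta = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "dead", ("B", "1"): "B",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
start, accepting = "A", {"A", "B"}
n = 3  # n = |Q|

i, j, x = pump_zeros(delta, start, n, k=2)
original_final = run(delta, start, "0" * n + "1" * n)[-1]
pumped_final = run(delta, start, x)[-1]
print(i, j, x, original_final == pumped_final)
```

Because the pumped run ends in the same state as the original run, any DFA that accepts \(0^n 1^n\) with \(n \ge |Q|\) must also accept some string with more \(0\)s than \(1\)s, which is exactly why no DFA can decide \(L_1\).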
This is the idea behind the Pumping Lemma: (Optional sections 7.7-7.9)
Let’s show (a different way) that the following language is not regular (that is, not DFA-decidable): \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, \ldots\} \]
This is the idea behind the Myhill-Nerode Theorem: (Sections 7.2-7.5)
First, some definitions:
Given a language \(L \subseteq \Sigma^*\),
the strings \(x, y \in \Sigma^*\) are called \(L\)-separable or \(L\)-distinguishable
if there exists a string \(z \in \Sigma^*\) such that:
\[ xz \in L \iff yz \notin L \hspace{10em} \]
That is, exactly one of \(xz\) and \(yz\) is in \(L\).
This \(z\) is called a separating (or distinguishing) extension for \(x, y\).
If \(x\) and \(y\) are not \(L\)-separable,
then we say they are \(L\)-equivalent, denoted \(x \sim_L y\).
In other words, \(x \sim_L y\) means that for all \(z \in \Sigma^*\): \[ xz \in L \iff yz \in L \]
Note that \(\sim_L\) is an equivalence relation on \(\Sigma^*\), so it partitions \(\Sigma^*\) into equivalence classes of strings that are indistinguishable by \(L\).
\[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, \ldots\} \]
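The definitions above can be explored on \(L_1\) with a bounded search. The sketch below looks for a shortest separating extension \(z\) for two strings by trying every \(z\) up to a fixed length; the function names and the bound `max_len` are assumptions made for illustration. A result of `None` only means that no extension was found within the bound, not that the strings are \(L\)-equivalent.

```python
from itertools import product

def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * n + "1" * n

def separating_extension(in_L, x, y, max_len=6, alphabet="01"):
    """Return a shortest z with |z| <= max_len such that exactly one of
    xz, yz is in L, or None if no such z exists within the bound."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            z = "".join(symbols)
            if in_L(x + z) != in_L(y + z):
                return z
    return None

print(repr(separating_extension(in_L1, "0", "00")))
print(repr(separating_extension(in_L1, "01", "0011")))
```

For \(0\) and \(00\) the search finds an extension, so they are \(L_1\)-separable. For \(01\) and \(0011\) it finds none, which is consistent with \(01 \sim_{L_1} 0011\): each of them stays in \(L_1\) only when extended by the empty string.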
Theorem (Myhill-Nerode):
A language \(L\) is regular if and only if \(\sim_L\) defines a finite number of equivalence classes.
Furthermore, the number of equivalence classes equals the number of states in the
minimal DFA deciding \(L\).
To prove a language is nonregular we need only one direction of this theorem:
Corollary:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.
Equivalent corollary:
If, for a language \(L\), we can construct an infinite set \(S\) of pairwise \(L\)-inequivalent (that is, pairwise \(L\)-separable) strings,
then \(L\) is nonregular.
\[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} = \{\emptystring, 01, 0011, 000111, \ldots\} \]
Claim: \(L_1\) is nonregular.
Proof:
\[ L_2 = \setbuild{w \in \binary^*}{\#(0, w) = \#(1, w)} \]
Claim: \(L_2\) is nonregular.
Proof:
The exact same proof applies for \(L_2\)!
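That one argument can serve both languages is easy to check mechanically. The proof body is not reproduced here, so the sketch below assumes a standard choice: \(S = \{0^n \mid n \in \mathbb{N}\}\), with \(z = 1^i\) as the separating extension for \(0^i\) and \(0^j\) when \(i \ne j\). It verifies this for the first few members of \(S\) in both \(L_1\) and \(L_2\); the names and the bound `N` are illustrative.

```python
def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * n + "1" * n

def in_L2(w: str) -> bool:
    """Membership in L_2: equal numbers of 0s and 1s."""
    return w.count("0") == w.count("1")

def separates(in_L, x: str, y: str, z: str) -> bool:
    """True if exactly one of xz and yz is in L."""
    return in_L(x + z) != in_L(y + z)

N = 8  # check the first N strings of the assumed set S = { 0^n }
S = ["0" * i for i in range(N)]

all_separated = {
    name: all(
        separates(in_L, S[i], S[j], "1" * i)
        for i in range(N) for j in range(N) if i != j
    )
    for name, in_L in (("L1", in_L1), ("L2", in_L2))
}
print(all_separated)
```

In both languages \(0^i 1^i\) is a member while \(0^j 1^i\) is not when \(i \ne j\), so the same set \(S\) and the same extensions witness infinitely many equivalence classes for \(L_1\) and for \(L_2\).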
\[ L_2 = \setbuild{w \in \binary^*}{\#(0, w) = \#(1, w)} \]
Claim: \(L_2\) is nonregular.
Alternate proof by closure:
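The proof body is not shown here. One standard closure argument, assumed for illustration, observes that \(L_2 \cap L(0^*1^*) = L_1\): since regular languages are closed under intersection and \(0^*1^*\) is regular, \(L_2\) being regular would force \(L_1\) to be regular, contradicting the earlier claim. The sketch below checks the set identity on all binary strings up to length 12; it does not prove the closure property itself.

```python
import re
from itertools import product

ZEROS_THEN_ONES = re.compile(r"0*1*")

def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * n + "1" * n

def in_L2(w: str) -> bool:
    """Membership in L_2: equal numbers of 0s and 1s."""
    return w.count("0") == w.count("1")

def in_intersection(w: str) -> bool:
    """Membership in L_2 intersected with the regular language 0*1*."""
    return in_L2(w) and ZEROS_THEN_ONES.fullmatch(w) is not None

# Strings up to length 12 on which the intersection and L_1 disagree.
disagreements = [
    "".join(symbols)
    for n in range(13)
    for symbols in product("01", repeat=n)
    if in_intersection("".join(symbols)) != in_L1("".join(symbols))
]
print(disagreements)
```

An empty list is consistent with the identity \(L_2 \cap L(0^*1^*) = L_1\) that the closure argument relies on.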