
ECS 120 Theory of Computation
Nonregular Languages Pt. 2
Julian Panetta
University of California, Davis

Recall: Myhill-Nerode Theorem

How many states does a DFA need to recognize a language?

Theorem (Myhill-Nerode):
A language \(L\) is regular if and only if \(\sim_L\) defines a finite number of equivalence classes.

Furthermore, the number of equivalence classes equals the number of states in the
minimal DFA deciding \(L\).

To prove a language is nonregular we need only one direction of this theorem:

Corollary:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.

Equivalent corollary:
If for language \(L\) we can construct a set \(S\) of pairwise \(L\)-inequivalent strings with \(|S| = \infty\),
then \(L\) is nonregular.
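The link between equivalence classes and DFA states can be explored numerically. The sketch below is only an illustration under assumptions: the language (binary strings ending in 01) is a hypothetical example not taken from these slides, the helper names are made up, and a bounded search over short strings and short extensions can only give a lower bound on the number of classes.

```python
from itertools import product

def in_L(w: str) -> bool:
    """Hypothetical example language: binary strings ending in 01."""
    return w.endswith("01")

def strings_up_to(n: int):
    """Yield all binary strings of length at most n, shortest first."""
    for length in range(n + 1):
        for chars in product("01", repeat=length):
            yield "".join(chars)

def signature(x: str, max_ext: int) -> tuple:
    """Membership of x + z in L for every extension z with |z| <= max_ext."""
    return tuple(in_L(x + z) for z in strings_up_to(max_ext))

# Two strings with different signatures are L-separable.
# The bounds 4 and 3 are arbitrary choices for this finite experiment.
classes = {signature(x, 3) for x in strings_up_to(4)}
print(len(classes))
```

For this example language the search finds three distinct signatures, which matches the three states of the usual DFA that tracks whether the input ends in 01, ends in 0, or neither.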

Recall: \(L\)-equivalence

Given a language \(L \subseteq \Sigma^*\),
the strings \(x, y \in \Sigma^*\) are called \(L\)-separable or \(L\)-distinguishable
if there exists a string \(z \in \Sigma^*\) such that:

\[ xz \in L \iff yz \notin L \]

Exactly one of \(xz\) and \(yz\) is in \(L\).

This \(z\) is called a separating (or distinguishing) extension for \(x, y\).

If \(x\) and \(y\) are not \(L\)-separable,
then we say they are \(L\)-equivalent, denoted \(x \sim_L y\).

In other words, \(x \sim_L y\) means that for all \(z \in \Sigma^*\): \[ xz \in L \iff yz \in L \]

  • \(\sim_L\) is an equivalence relation on \(\Sigma^*\).
  • It partitions \(\Sigma^*\) into equivalence classes of strings that are indistinguishable by \(L\).
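The definition of a separating extension translates directly into a brute-force search. The following is a minimal sketch, assuming a membership test for \(L_4 = \{ww\}\) and a hypothetical helper `find_separating_extension` that tries binary extensions up to an arbitrary length bound. Returning `None` only means no short extension was found, not that the strings are \(L\)-equivalent.

```python
from itertools import product

def in_L4(w: str) -> bool:
    """Membership in L_4 = { ww : w in {0,1}* }."""
    half, rem = divmod(len(w), 2)
    return rem == 0 and w[:half] == w[half:]

def find_separating_extension(in_L, x: str, y: str, max_len: int = 6):
    """Return the first z (shortest first) with exactly one of xz, yz in L."""
    for length in range(max_len + 1):
        for chars in product("01", repeat=length):
            z = "".join(chars)
            if in_L(x + z) != in_L(y + z):  # exactly one of xz, yz is in L
                return z
    return None  # nothing found within the bound

z = find_separating_extension(in_L4, "01", "001")
print(repr(z))
```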

More Examples Applying the Myhill-Nerode Theorem

Claim: the following language is not regular: \[ L_4 = \setbuild{w w}{w \in \binary^*} \]

Proof:

  • Let \(S = \setbuild{0^n 1}{n \in \mathbb{N}} = \{1, 01, 001, 0001, \ldots\}\).

  • Any pair \(0^i 1, 0^k 1 \in S\) with \(i \ne k\) is \(L_4\)-inequivalent \(\quad (0^i 1 \not \sim_{L_4} 0^k 1)\).
    Separating extension? \(z = 0^i 1\)

    \[ 0^i 1 z \fragment{= 0^i 1 0^i 1} \fragment{\in L_4 \quad \text{but} \quad 0^k 1 z =} \fragment{0^k 1 0^i 1} \fragment{\notin L_4\quad\text{since } i \ne k} \]

  • Therefore every element of \(S\) is in a different equivalence class.

  • Since \(|S| = \infty\), \(L_4\) is nonregular.
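As a sanity check on the proof above, the proposed extension \(z = 0^i 1\) can be tested on the first few elements of \(S\). This is a sketch only: the bound `N` is arbitrary, and a finite check illustrates the argument without replacing it.

```python
def in_L4(w: str) -> bool:
    """Membership in L_4 = { ww : w in {0,1}* }."""
    half, rem = divmod(len(w), 2)
    return rem == 0 and w[:half] == w[half:]

N = 8  # arbitrary bound for this finite check
all_separated = True
for i in range(N):
    for k in range(N):
        if i == k:
            continue
        x, y = "0" * i + "1", "0" * k + "1"
        z = "0" * i + "1"  # the extension proposed in the proof
        # xz = 0^i 1 0^i 1 should be in L_4, yz = 0^k 1 0^i 1 should not be
        if not (in_L4(x + z) and not in_L4(y + z)):
            all_separated = False
print(all_separated)
```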

More Examples Applying the Myhill-Nerode Theorem

Claim: the following language is not regular: \[ L_5 = \setbuild{0^i 1^j}{i > j} \]

Proof:

  • Let \(S = \fragment{\setbuild{0^n}{n \in \mathbb{N}} = \{\epsilon, 0, 00, 000, \ldots\}}\).

  • Any pair \(0^i, 0^k \in S\) with \(i > k\) is \(L_5\)-inequivalent \(\quad (0^i \not \sim_{L_5} 0^k)\).
    Separating extension?

    • \(1^k\)
    • \(1^i\)
    • \(1^{k + 1}\)
    • \(1^{i - 1}\)
    • \(01^{k + 1}\)

    With \(z = 1^k\): \[ 0^i 1^k \in L_5 \quad \text{(since } i > k\text{)} \quad \text{but} \quad 0^k 1^k \notin L_5 \]

  • Since \(|S| = \infty\), \(L_5\) is nonregular.
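The five candidate extensions listed above can be compared mechanically. The sketch below assumes a regex-based membership test for \(L_5\) and checks each candidate on all pairs \(i > k\) below an arbitrary bound. A candidate marked `True` separated every tested pair; a candidate marked `False` failed on at least one.

```python
import re

def in_L5(w: str) -> bool:
    """Membership in L_5 = { 0^i 1^j : i > j }."""
    m = re.fullmatch(r"(0*)(1*)", w)
    return m is not None and len(m.group(1)) > len(m.group(2))

# The five candidate extensions from the slide, as functions of (i, k).
candidates = {
    "1^k": lambda i, k: "1" * k,
    "1^i": lambda i, k: "1" * i,
    "1^(k+1)": lambda i, k: "1" * (k + 1),
    "1^(i-1)": lambda i, k: "1" * (i - 1),
    "0 1^(k+1)": lambda i, k: "0" + "1" * (k + 1),
}

N = 8  # arbitrary bound for this finite check
works = {}
for name, make_z in candidates.items():
    works[name] = all(
        in_L5("0" * i + make_z(i, k)) != in_L5("0" * k + make_z(i, k))
        for i in range(N) for k in range(i)  # all pairs with i > k
    )
print(works)
```

In this check \(1^k\) passes, and so do \(1^{i-1}\) and \(01^{k+1}\). The candidate \(1^{k+1}\) fails whenever \(i = k + 1\), and \(1^i\) puts neither string in \(L_5\).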

More Examples Applying the Myhill-Nerode Theorem

Claim: the following language is not regular: \[ L_6 = \setbuild{0^i 1^j 0^k}{i \cdot j = k} = \{\epsilon, 010, 00100, 01100, \ldots\} \]

Proof:

  • Let \(S = \setbuild{0^n}{n \in \mathbb{N}}\).

  • Consider any pair \(0^i, 0^k \in S\) with \(i \ne k\). Separating extension?

    • \(0^i\)
    • \(10^i\)
    • \(10^k\)
    • \(110^{2i}\)
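    • \(110^{i + k}\)

    For example, \(z = 10^i\) works: \(0^i 1 0^i \in L_6\) but \(0^k 1 0^i \notin L_6\), since \(k \cdot 1 \ne i\).
  • Since \(|S| = \infty\), \(L_6\) is nonregular.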

  • Alternative choices for \(S\): \[ S = \setbuild{0^n 1^n}{n \in \mathbb{N}}, \quad \quad S = \setbuild{0^n 1}{n \in \mathbb{N}}, \quad \quad \cdots \]
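The candidate extensions for \(L_6\) can be tested the same way. This sketch assumes a regex-based membership test and an arbitrary bound `N`; the dictionary keys are informal labels for the five candidates listed above.

```python
import re

def in_L6(w: str) -> bool:
    """Membership in L_6 = { 0^i 1^j 0^k : i * j = k }."""
    m = re.fullmatch(r"(0*)(1*)(0*)", w)
    if m is None:
        return False
    # When there are no 1s, the greedy match puts every 0 in the first group,
    # which is the only split that can satisfy i * 0 = k.
    i, j, k = (len(g) for g in m.groups())
    return i * j == k

candidates = {
    "0^i": lambda i, k: "0" * i,
    "1 0^i": lambda i, k: "1" + "0" * i,
    "1 0^k": lambda i, k: "1" + "0" * k,
    "11 0^(2i)": lambda i, k: "11" + "0" * (2 * i),
    "11 0^(i+k)": lambda i, k: "11" + "0" * (i + k),
}

N = 6  # arbitrary bound for this finite check
works = {}
for name, make_z in candidates.items():
    works[name] = all(
        in_L6("0" * i + make_z(i, k)) != in_L6("0" * k + make_z(i, k))
        for i in range(N) for k in range(N) if i != k
    )
print(works)
```

Here \(0^i\) fails because both \(0^{2i}\) and \(0^{k+i}\) are in \(L_6\) (take \(j = 0\)), and \(110^{i+k}\) fails because neither extended string is in \(L_6\) when \(i \ne k\).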

More Examples Applying the Myhill-Nerode Theorem

Claim: the following language is not regular: \[ L_7 = \setbuild{1^{n^2}}{n \in \mathbb{N}} = \{\epsilon, 1, 1111, 111111111, \ldots\} \]

Proof:

  • Let \(S = L_7\).

  • Consider any pair \(1^{i^2}, 1^{k^2} \in S\) with \(i > k\).

  • Separating extension that works for all such pairs?

    • \(\emptystring\)
    • \(1\)
    • \(1^k\)
    • \(1^{(k + 1)^2 - k^2}\)
    • \(1^{(i + 1)^2 - i^2}\)

    The gap between adjacent perfect squares is \((k + 1)^2 - k^2 = 2k + 1\), which strictly increases with \(k\). \[ 1,\; 4,\; 9,\; 16,\; 25,\; 36,\; 49,\; 64,\; \ldots \]

    Since \(i>k\), adding \(2k + 1\) to \(i^2\) is not enough to reach \((i + 1)^2\).

    Adding \(2i + 1\) to \(k^2\) could reach \((k + c)^2\) for \(c > 1\).
    Example: \(k = 0, i = 4, c = 3\)

  • Since \(|S| = \infty\), \(L_7\) is nonregular.
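The reasoning about gaps between perfect squares can be checked on small cases. The sketch below assumes a membership test built on `math.isqrt` and tries each of the five candidates on all pairs \(i > k\) below an arbitrary bound; the bound is chosen large enough to include the counterexample \(k = 0, i = 4\) mentioned above.

```python
import math

def in_L7(w: str) -> bool:
    """Membership in L_7 = { 1^(n^2) : n in N }."""
    n = len(w)
    return set(w) <= {"1"} and math.isqrt(n) ** 2 == n

candidates = {
    "empty": lambda i, k: "",
    "1": lambda i, k: "1",
    "1^k": lambda i, k: "1" * k,
    "1^(2k+1)": lambda i, k: "1" * ((k + 1) ** 2 - k ** 2),
    "1^(2i+1)": lambda i, k: "1" * ((i + 1) ** 2 - i ** 2),
}

N = 8  # arbitrary bound; includes the pair i = 4, k = 0
works = {}
for name, make_z in candidates.items():
    works[name] = all(
        in_L7("1" * i ** 2 + make_z(i, k)) != in_L7("1" * k ** 2 + make_z(i, k))
        for i in range(N) for k in range(i)  # all pairs with i > k
    )
print(works)
```

Only \(1^{2k+1}\) separates every tested pair: it lifts \(k^2\) to \((k+1)^2\) while leaving \(i^2 + 2k + 1\) strictly between \(i^2\) and \((i+1)^2\).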

More Examples Applying the Myhill-Nerode Theorem

Claim: the following language is not regular: \[ L_8 = \setbuild{w \in \binary^*}{w = \reverse{w} \; (w \text{ is a palindrome})} \]

Proof:

  • Let \(S = \setbuild{0^n 1}{n \in \mathbb{N}}\).
  • Consider any pair \(0^i 1, 0^k 1 \in S\) with \(i \ne k\).
  • These have the separating extension \(z = 0^i\) (as well as \(z = 0^k\)).
  • Since \(|S| = \infty\), \(L_8\) is nonregular.
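Both extensions named in the proof can be checked on small cases. This is a brief sketch with an arbitrary bound `N`; the palindrome test simply compares a string with its reverse.

```python
def in_L8(w: str) -> bool:
    """Membership in L_8: w equals its own reverse."""
    return w == w[::-1]

N = 8  # arbitrary bound for this finite check
both_extensions_work = all(
    # z = 0^i makes x a palindrome but not y; z = 0^k does the opposite
    in_L8(x + "0" * i) and not in_L8(y + "0" * i)
    and in_L8(y + "0" * k) and not in_L8(x + "0" * k)
    for i in range(N) for k in range(N) if i != k
    for x, y in [("0" * i + "1", "0" * k + "1")]
)
print(both_extensions_work)
```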

Proof of the Myhill-Nerode Corollary

Corollary (one direction) of the Myhill-Nerode Theorem:
If a language \(L\) has an infinite number of equivalence classes with respect to \(\sim_L\),
then \(L\) is nonregular.

Proof: (Contrapositive: \(L\) regular \(\implies\) \(\sim_L\) defines a finite number of equivalence classes.)

  • Let \(L\) be regular, and let \(D = (Q, \Sigma, \delta, s, F)\) be a DFA deciding it.
  • Denote by \(\reachedState(x)\) the state reached by \(D\) after processing a string \(x \in \Sigma^*\).
    • \(\reachedState(x)\) is sometimes called the “extended transition function.”
    • It is defined recursively as:
      • Base case: \(\; \reachedState(\emptystring) = \fragment{s}\)
      • For all \(a \in \Sigma\) and \(x \in \Sigma^*\): \(\; \reachedState(x a) = \fragment{\delta(\reachedState(x), a)}\)
  • For any \(x, y \in \Sigma^*\) such that \(\reachedState(x) = \reachedState(y)\), we have \(x \sim_L y\): since \(D\) is deterministic, \(\reachedState(xz) = \reachedState(yz)\) for all \(z \in \Sigma^*\), so \(D\) accepts \(xz\) if and only if it accepts \(yz\).
  • Thus for each state \(q \in Q\), all strings \(x \in \Sigma^*\) with \(\reachedState(x) = q\) are in the same equivalence class.
  • This means the number of equivalence classes defined by \(\sim_L\) is at most \(|Q|\) (hence finite).
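The recursive definition of the extended transition function, and the key step of the proof, can be sketched in code. The DFA below (accepting strings with an even number of 0s) is a hypothetical example, not one from these slides, and the check covers only strings up to length 4.

```python
from itertools import product

# Hypothetical DFA D = (Q, Sigma, delta, s, F): even number of 0s.
Q = {"even", "odd"}
delta = {
    ("even", "0"): "odd", ("even", "1"): "even",
    ("odd", "0"): "even", ("odd", "1"): "odd",
}
s, F = "even", {"even"}

def reached_state(x: str) -> str:
    """Extended transition function: the state D is in after reading x."""
    q = s                  # base case: reached_state("") = s
    for a in x:
        q = delta[(q, a)]  # reached_state(xa) = delta(reached_state(x), a)
    return q

def accepts(x: str) -> bool:
    return reached_state(x) in F

strings = ["".join(c) for n in range(5) for c in product("01", repeat=n)]

# If x and y reach the same state, no tested extension z separates them.
same_state_implies_equivalent = all(
    accepts(x + z) == accepts(y + z)
    for x in strings for y in strings
    if reached_state(x) == reached_state(y)
    for z in strings
)
# So the strings fall into at most |Q| groups, one per reachable state.
groups = {reached_state(x) for x in strings}
print(same_state_implies_equivalent, len(groups) <= len(Q))
```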

Optional: The Pumping Lemma

  • Another common way to prove a language is nonregular is to apply the pumping lemma.
  • Underlying idea:
    • Any sufficiently long string \(w \in L\) will cause a DFA to revisit a state.
    • The substring read between the two visits can be deleted or repeated to obtain \(w' \in L\).

Pumping Lemma
If \(L\) is regular, then there exists a pumping length \(p \in \mathbb{N}\) such that for all \(w \in L\) with \(|w| \geq p\), there exists a decomposition into three substrings \(w = x y z\) where:

  1. \(x y^i z \in L\) for all \(i \geq 0\).
  2. \(|y| > 0\).
  3. \(|x y| \leq p\).
[Figure: pumping lemma visualization (data/images/nonregular/pumping_vis.svg)]

To prove a language is nonregular using the pumping lemma, apply the contrapositive:
If for every pumping length \(p \in \mathbb{N}\) there exists a string \(w \in L\) with \(|w| \ge p\) such that every decomposition \(w = x y z\) with \(|y| > 0\) and \(|x y| \leq p\) has \(x y^i z \notin L\) for some \(i \geq 0\), then \(L\) is nonregular.
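The idea behind the lemma (a long enough string forces a DFA to revisit a state) can be made concrete. The sketch below uses a hypothetical three-state DFA for binary strings ending in 01, takes \(p\) to be its number of states, and splits \(w\) at the first repeated state. The names `decompose` and `accepts` are illustrative.

```python
# Hypothetical DFA for binary strings ending in 01 (not from the slides).
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
s, F = "q0", {"q2"}
p = 3  # number of states, used here as the pumping length

def accepts(w: str) -> bool:
    q = s
    for a in w:
        q = delta[(q, a)]
    return q in F

def decompose(w: str):
    """Split w = xyz where y is the substring read between the first two
    visits to a repeated state. For |w| >= p this gives |y| > 0, |xy| <= p."""
    seen = {s: 0}  # state -> number of symbols read when first visited
    q = s
    for pos, a in enumerate(w, start=1):
        q = delta[(q, a)]
        if q in seen:
            start = seen[q]
            return w[:start], w[start:pos], w[pos:]
        seen[q] = pos
    raise ValueError("w is too short to force a repeated state")

x, y, z = decompose("0101")
print((x, y, z), [accepts(x + y * i + z) for i in range(4)])
```

Because the DFA returns to the same state after reading \(y\), deleting or repeating \(y\) does not change the state reached at the end, so every pumped string is still accepted.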

Optional: Applying The Pumping Lemma

Claim: the following language is not regular: \[ L_1 = \setbuild{0^n 1^n}{n \in \mathbb{N}} \]

Proof via the Pumping Lemma:

  • Given pumping length \(p\), consider the string \(w = 0^p 1^p \in L_1\) with \(|w| = 2p \geq p\).
  • Since the first \(p\) symbols of \(w\) are all 0s, every decomposition \(w = x y z\) with \(|y| > 0\) and \(|x y| \leq p\)
    must have \(y = 0^k\) for some \(k \geq 1\).
  • Pumping (deleting or repeating) \(y\) changes the number of 0s but not the number of 1s, so the string is no longer in the language: \[ x y^i z = 0^{p + (i - 1)k} 1^p \notin L_1 \quad \text{when } i \ne 1 \]
  • Therefore \(L_1\) is nonregular by the pumping lemma.
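The "for all decompositions" step can be checked exhaustively for a fixed small \(p\). This sketch picks \(p = 5\) as a stand-in value (the proof itself handles every \(p\)) and uses a hypothetical helper `decompositions` that enumerates the allowed splits.

```python
def in_L1(w: str) -> bool:
    """Membership in L_1 = { 0^n 1^n : n in N }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def decompositions(w: str, p: int):
    """Yield every split w = xyz with |y| > 0 and |xy| <= p."""
    for end in range(1, min(p, len(w)) + 1):  # end = |xy|
        for start in range(end):              # start = |x|
            yield w[:start], w[start:end], w[end:]

p = 5  # stand-in pumping length for this finite check
w = "0" * p + "1" * p
# Every allowed split can be pumped out of L_1 (i = 0 or i = 2 suffices).
every_split_fails = all(
    any(not in_L1(x + y * i + z) for i in (0, 2))
    for x, y, z in decompositions(w, p)
)
print(every_split_fails)
```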

Optional: Applying The Pumping Lemma

Claim: the following language is not regular: \[ L_4 = \setbuild{w w}{w \in \binary^*} \]

Proof via the Pumping Lemma:

  • Assume that \(L_4\) is regular, and let \(p\) be the pumping length guaranteed by the pumping lemma.
  • Consider the string \(w = 0^p 1 0^p 1 \in L_4\) with \(|w| > p\).
  • The pumping lemma guarantees we can decompose \(w\) into three substrings \(w = x y z\) where:
    1. \(x y^i z \in L_4\) for all \(i \geq 0\).
    2. \(|y| > 0\).
    3. \(|x y| \leq p\)
  • The last two conditions imply that \(y = 0^k\) for some \(k \geq 1\).
  • But the pumped string \(x y^i z\) is: \[ x y^i z = 0^{p + (i - 1)k} 1 0^p 1 \notin L_4 \quad \text{when } i \ne 1 \] contradicting the first condition. Therefore \(L_4\) is nonregular.
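The closed form for the pumped string can be compared against actual pumping for a small stand-in value of \(p\). This is a sketch only: \(p = 4\) and the pumping exponents \(0, 2, 3\) are arbitrary choices, and the loop enumerates every split with \(|y| > 0\) and \(|x y| \leq p\).

```python
def in_L4(w: str) -> bool:
    """Membership in L_4 = { ww : w in {0,1}* }."""
    half, rem = divmod(len(w), 2)
    return rem == 0 and w[:half] == w[half:]

p = 4  # stand-in pumping length for this finite check
w = "0" * p + "1" + "0" * p + "1"

formula_holds = True
never_in_L4 = True
for end in range(1, p + 1):       # end = |xy| <= p
    for start in range(end):      # start = |x|, so |y| = end - start > 0
        x, y, z = w[:start], w[start:end], w[end:]
        k = len(y)
        for i in (0, 2, 3):
            pumped = x + y * i + z
            # closed form from the proof: 0^(p + (i - 1)k) 1 0^p 1
            expected = "0" * (p + (i - 1) * k) + "1" + "0" * p + "1"
            formula_holds = formula_holds and pumped == expected
            never_in_L4 = never_in_L4 and not in_L4(pumped)
print(formula_holds, never_in_L4)
```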