
ECS 120 Theory of Computation
Equivalence of RRGs, NFAs, and regex’s
Julian Panetta
University of California, Davis

Last time: Simulating DFAs with RRGs

A right-regular grammar (RRG) is a CFG whose rules are all of the form: \[ X \to a Y \qquad \text{or} \qquad X \to \emptystring \]

Theorem: Every DFA-decidable language is RRG-decidable (and thus CFG-decidable).

Given any DFA \(D\), we can convert it into RRG \(G\) such that \(L(G) = L(D)\):
\(G\) generates string \(w\) iff \(D\) accepts \(w\).

Proof

  • Let \(D = (Q, \Sigma, \delta, s, F)\) be a DFA.
  • Construct grammar \(G = (\Gamma, \Sigma, S, \rho)\) such that:
    • \(\Gamma=Q\)
    • Start symbol \(S \in \Gamma\) corresponds to start state \(s \in Q\)
    • Production rules \(\rho\) are defined as follows:
      • For each state \(X \in Q\) and \(a \in \Sigma\), add the rule \(X \to a Y\), where \(Y = \delta(X, a)\)
      • For each accept state \(X \in F\), add the rule \(X \to \emptystring\)
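The construction is mechanical enough to write down directly. The Python sketch below assumes a hypothetical encoding that is not part of the lecture: the DFA's transition function is a dict `delta` keyed by `(state, symbol)`, and a rule is a pair `(X, rhs)` where `rhs` is `(a, Y)` for \(X \to aY\) and `()` for \(X \to \emptystring\). The function name `dfa_to_rrg` and the example DFA are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule encoding: (X, (a, Y)) means X -> aY; (X, ()) means X -> epsilon.
Rule = Tuple[str, Tuple[str, ...]]


def dfa_to_rrg(states: Set[str], sigma: Set[str],
               delta: Dict[Tuple[str, str], str],
               start: str, accept: Set[str]) -> Tuple[str, List[Rule]]:
    """Return (start symbol, rules) of an RRG G with L(G) = L(D)."""
    rules: List[Rule] = []
    for X in sorted(states):                           # nonterminals are exactly the DFA states
        for a in sorted(sigma):
            rules.append((X, (a, delta[(X, a)])))      # X -> a Y with Y = delta(X, a)
        if X in accept:
            rules.append((X, ()))                      # X -> epsilon for each accept state
    return start, rules


# Example (hypothetical): DFA over {0,1} accepting strings with an even number of 1s.
delta = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
start, rules = dfa_to_rrg({"E", "O"}, {"0", "1"}, delta, "E", {"E"})
for X, rhs in rules:
    print(X, "->", "".join(rhs) or "ε")
# E -> 0E, E -> 1O, E -> ε, O -> 0O, O -> 1E (one rule per line)
```

The grammar always has exactly \(|Q| \cdot |\Sigma| + |F|\) rules: one per DFA transition plus one per accept state.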

We now prove \(L(G) = L(D)\).

Proof

  • \(L(G) \subseteq L(D)\):
    • For any \(x \in L(G)\), there is a derivation \[ S \yields u_1 \yields u_2 \yields \ldots \yields u_n \yields x \]
    • Due to \(G\)’s special production rule structure,
      each \(u_i = x_i R_i\), where \(x_i \sqsubseteq x\) is a prefix of \(x\).
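    • By induction on \(i\), the nonterminal \(R_i\) is the state of \(D\) after reading \(x_i\): each step applies a rule \(R_{i-1} \to x[i]\, R_i\) (with \(R_0 = S\)), and such a rule exists only when \(R_i = \delta(R_{i-1}, x[i])\).
    • The last step must erase the only remaining nonterminal, so \(u_n = x R_n\) and the step uses \(R_n \to \emptystring\). Such rules exist only for accept states, so \(R_n \in F\).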
    • Thus the computation sequence \(S, R_1, R_2, \ldots, R_n\) accepts \(x\), showing \(L(G) \subseteq L(D)\).

Proof

  • \(L(D) \subseteq L(G)\):
    • For any \(x \in L(D)\) with \(n = |x|\), there is an accepting computation sequence \(S, R_1, R_2, \dots, R_n\) of \(D\) on \(x\), i.e., \(R_i = \delta(R_{i-1}, x[i])\) for each \(i\) (with \(R_0 = S\)) and \(R_n \in F\).
    • This can be translated into a derivation of \(x\) by applying the corresponding production rules in order: \[\begin{equation*} \fragment{ \begin{aligned} S &\to x[1] R_1 \\ R_1 &\to x[2] R_2 \\ R_2 &\to x[3] R_3 \\ &\cdots \\ R_{n-1} &\to x[n] R_n \\ R_{n} &\to \emptystring \\ \end{aligned} \quad \quad \quad \begin{aligned} S & \yields x[1] R_1 \\& \yields x[1] x[2] R_2 \\& \yields x[1] x[2] x[3] R_3 \\& \yields \ldots \\& \yields x[1] x[2] \cdots x[n] R_n \\& \yields x[1] x[2] \cdots x[n] = x \end{aligned} } \end{equation*}\]

Thus \(x \in L(G)\), showing \(L(D) \subseteq L(G)\).
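This direction is constructive: running \(D\) on \(x\) directly yields the derivation. A minimal sketch, using a hypothetical dict encoding of \(\delta\) keyed by `(state, symbol)`; the helper name `derivation_from_run` is illustrative, and it assumes single-character state names so that each sentential form \(x[1] \cdots x[i] R_i\) prints unambiguously.

```python
from typing import Dict, List, Set, Tuple


def derivation_from_run(delta: Dict[Tuple[str, str], str], start: str,
                        accept: Set[str], x: str) -> List[str]:
    """Sentential forms S => x[1]R_1 => ... => x[1..n]R_n => x built from D's run on x.

    Raises ValueError if D rejects x (then there is no rule R_n -> epsilon to finish).
    """
    forms = [start]
    state = start
    for i, a in enumerate(x):
        state = delta[(state, a)]            # R_{i+1} = delta(R_i, x[i+1])
        forms.append(x[: i + 1] + state)     # apply rule R_i -> x[i+1] R_{i+1}
    if state not in accept:
        raise ValueError(f"D rejects {x!r}: R_n = {state} is not an accept state")
    forms.append(x)                          # apply rule R_n -> epsilon
    return forms


# Hypothetical DFA: even number of 1s over {0,1}, start and accept state E.
delta = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
print(" => ".join(derivation_from_run(delta, "E", {"E"}, "101")))
# E => 1O => 10O => 101E => 101
```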

Simulating RRGs with NFAs

Theorem: Any RRG-decidable language is NFA-decidable (and thus DFA-decidable).

Proof

  • Let \(G = (\Gamma, \Sigma, S, \rho)\) be an RRG.
  • Construct NFA \(N = (Q, \Sigma, \Delta, s, F)\) as follows:
    • \(Q = \Gamma\)
    • \(s = S\)
    • \(F = \setbuild{X \in Q}{\fragment{(X, \emptystring) \in \rho}}\)
      (For each production rule of the form \(X \to \emptystring\), let state \(X\) be accepting)
    • For all \(X \in Q\) and \(a \in \Sigma\): \(\Delta(X, a) = \setbuild{Y \in Q}{\fragment{(X, a Y) \in \rho}}\)
      (For each production rule of the form \(X \to a Y\), add the transition \(X \stackrel{a}{\to} Y\))
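  • Each path labeled \(w\) from \(s\) to an accept state of \(N\) corresponds exactly to a derivation of \(w\) in \(G\): every transition \(X \stackrel{a}{\to} Y\) is one application of the rule \(X \to aY\), and ending in an accept state \(X\) corresponds to the final rule \(X \to \emptystring\).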
  • Therefore \(L(N) = L(G)\).

Corollary: RRGs, DFAs, and NFAs have equivalent computational (decision) power.

Note that this is almost the DFA-to-RRG construction in reverse; nondeterministic transitions like \(X \stackrel{0}{\to} Y\) and \(X \stackrel{0}{\to} Z\) come from rules like \(X \to 0Y \or 0Z\).
What if there are also rules like \(X \to Y\)? These correspond to \(\emptystring\)-transitions \(X \stackrel{\emptystring}{\to} Y\).
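A sketch of the RRG-to-NFA construction in Python, including the \(\emptystring\)-transitions produced by rules \(X \to Y\). The encoding is a hypothetical choice: `(X, ())` is \(X \to \emptystring\), `(X, (Y,))` is \(X \to Y\), and `(X, (a, Y))` is \(X \to aY\). The resulting NFA is simulated by tracking the set of reachable states, with `""` standing for \(\emptystring\).

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]          # hypothetical encoding, see lead-in
Delta = Dict[Tuple[str, str], Set[str]]     # (state, symbol) -> next states; "" = epsilon


def rrg_to_nfa(start: str, rules: List[Rule]) -> Tuple[Delta, str, Set[str]]:
    delta: Delta = {}
    accept: Set[str] = set()
    for X, rhs in rules:
        if rhs == ():                 # X -> epsilon: X becomes an accept state
            accept.add(X)
        elif len(rhs) == 1:           # X -> Y: epsilon-transition X -> Y
            delta.setdefault((X, ""), set()).add(rhs[0])
        else:                         # X -> aY: transition X --a--> Y
            a, Y = rhs
            delta.setdefault((X, a), set()).add(Y)
    return delta, start, accept


def eps_closure(delta: Delta, states: Set[str]) -> Set[str]:
    stack, seen = list(states), set(states)
    while stack:
        for r in delta.get((stack.pop(), ""), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def nfa_accepts(delta: Delta, start: str, accept: Set[str], w: str) -> bool:
    current = eps_closure(delta, {start})
    for a in w:
        current = eps_closure(delta, {r for q in current for r in delta.get((q, a), set())})
    return bool(current & accept)


# Hypothetical RRG: S -> 0S | 1S | 1A, A -> 0B | 1B, B -> epsilon
# (strings whose second-to-last symbol is 1; S -> 1S | 1A is nondeterministic).
rules = [("S", ("0", "S")), ("S", ("1", "S")), ("S", ("1", "A")),
         ("A", ("0", "B")), ("A", ("1", "B")), ("B", ())]
N = rrg_to_nfa("S", rules)
print([w for w in ["10", "0110", "01", "1"] if nfa_accepts(*N, w)])  # ['10', '0110']
```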

Left regular grammars

A left-regular grammar (LRG) is a CFG whose rules are all of the form: \[ X \to Y a \qquad \text{or} \qquad X \to \emptystring \]

  • Left-regular grammars are also equivalent in power to DFAs/NFAs.
  • We could prove this using similar arguments to the RRG case.
  • Or we can notice:
    • Consider converting an RRG \(G\) to an LRG \(G'\) by replacing each rule \(X \to aY\) with \(X \to Ya\).
    • What is the relationship between \(L(G)\) and \(L(G')\)? \(\quad L(G') = \reverse{L(G)} = \setbuild{\reverse{w}}{w \in L(G)}\)
    • NFA-decidable languages are closed under reversal! (exercise at home)
    • Thus \(L(G')\) is NFA-decidable if and only if \(L(G)\) is.
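The reversal relationship can be checked on short strings. The sketch below uses hypothetical helpers `to_lrg` and `generate`, with grammars encoded as a dict from each nonterminal to a list of right-hand-side tuples. It flips each rule \(X \to aY\) to \(X \to Ya\) and compares the generated strings up to a length bound; this tests the claim on one example and is not a proof.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: nonterminal -> list of right-hand sides (tuples of symbols);
# () is epsilon. Any symbol that is a key of the dict is a nonterminal.
Grammar = Dict[str, List[Tuple[str, ...]]]


def to_lrg(g: Grammar) -> Grammar:
    """Replace each rule X -> aY with X -> Ya; X -> epsilon rules are unchanged."""
    return {X: [(rhs[1], rhs[0]) if len(rhs) == 2 else rhs for rhs in rhss]
            for X, rhss in g.items()}


def generate(start: str, g: Grammar, max_len: int) -> Set[str]:
    """Terminal strings of length <= max_len derivable from start.

    Expands the leftmost nonterminal. Terminates for grammars like these, where every
    non-epsilon rule adds one terminal and epsilon rules erase the only nonterminal.
    """
    out: Set[str] = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        nts = [i for i, sym in enumerate(form) if sym in g]
        if not nts:
            out.add("".join(form))
            continue
        i = nts[0]
        for rhs in g[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if sum(sym not in g for sym in new) <= max_len:
                stack.append(new)
    return out


# Hypothetical RRG for 0*1+ :  S -> 0S | 1A,  A -> 1A | epsilon
G = {"S": [("0", "S"), ("1", "A")], "A": [("1", "A"), ()]}
G_rev = to_lrg(G)                      # S -> S0 | A1,  A -> A1 | epsilon
assert generate("S", G_rev, 4) == {w[::-1] for w in generate("S", G, 4)}
```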

Left and right regular grammars: Don’t mix the rules!

  • What language does the following grammar generate?

\[\begin{align*} A &\to 0B \\ B &\to A1 \\ A &\to \emptystring \end{align*}\]

\(\setbuild{0^n1^n}{n \in \N}\)… not regular!
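Enumerating short derivations makes the pattern visible: each round \(A \yields 0B \yields 0A1\) adds a 0 on the left and a 1 on the right, so the grammar keeps the two counts matched. A minimal sketch, using a hypothetical dict encoding of the grammar and a bounded leftmost-derivation search:

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]  # hypothetical encoding; () is epsilon

# The mixed grammar from the slide: A -> 0B (right-regular), B -> A1 (left-regular), A -> epsilon
MIXED: Grammar = {"A": [("0", "B"), ()], "B": [("A", "1")]}


def generate(start: str, g: Grammar, max_len: int) -> Set[str]:
    """Terminal strings of length <= max_len, found by expanding the leftmost nonterminal."""
    out: Set[str] = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        nts = [i for i, sym in enumerate(form) if sym in g]
        if not nts:
            out.add("".join(form))
            continue
        i = nts[0]
        for rhs in g[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if sum(sym not in g for sym in new) <= max_len:  # prune long forms
                stack.append(new)
    return out


print(sorted(generate("A", MIXED, 6), key=len))  # ['', '01', '0011', '000111']
```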

Equivalence of regex’s and NFAs

Previously:

  • NFAs and DFAs have equivalent computational power.
  • Regular grammars and DFAs have equivalent computational power.

Today:

  • Prove that NFAs and regex’s have equivalent computational power.
    Strategy: inductive constructions that incrementally build up the regex/NFA (bottom up from base cases).
  • Conclude that the following classes of languages are all the same, henceforth called regular: DFA-decidable = NFA-decidable = regex-decidable = RG-decidable (RG: regular grammar).

Implementing regex’s with NFAs

Theorem: Any regex-decidable language is NFA-decidable.

Proof

Recall the formal definition:

\(R\) is a regular expression deciding language \(L(R) \subseteq \Sigma^*\) if one of the following holds:

  1. (base) \(R = a\) for some \(a \in \Sigma\). Then \(L(R) = \{a\}\).
  2. (base) \(R = \emptystring\). Then \(L(R) = \{\emptystring\}\).
  3. (base) \(R = \emptyset\). Then \(L(R) = \{\}\).
  4. (inductive) \(R = (R_1) \cup (R_2)\) for regex’s \(R_1, R_2\). Then \(L(R) = L(R_1) \cup L(R_2)\).
  5. (inductive) \(R = (R_1)(R_2)\) for regex’s \(R_1, R_2\). Then \(L(R) = L(R_1) \circ L(R_2)\).
  6. (inductive) \(R = (R_1)^*\) for a regex \(R_1\). Then \(L(R) = L(R_1)^*\).

This inductive definition implies a “tree structure” that we can exploit to construct an NFA!

Inductive proof:

For any regular expression \(R\), we can construct an NFA \(N\) such that \(L(N) = L(R)\).

The base cases of the inductive definition are the base cases of the proof!

  1. If \(R = a\) for some \(a \in \Sigma\), then \(L(R) = \{a\}\) decided by the NFA
    data/images/regex_nfa/base_case_a.svg

  2. If \(R = \emptystring\), then \(L(R) = \{\emptystring\}\) decided by the NFA
    data/images/regex_nfa/base_case_e.svg

  3. If \(R = \emptyset\), then \(L(R) = \{\}\) decided by the NFA
    data/images/regex_nfa/base_case_empty.svg
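One way to encode these three base-case NFAs in Python, as a sketch: the `NFA` dataclass, the global counter `_fresh` that keeps state ids distinct, and the function names are all hypothetical, and the figures may draw the machines slightly differently. The `accepts` helper simulates an NFA with \(\emptystring\)-closures so it remains usable once \(\emptystring\)-transitions appear in the inductive cases.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

_fresh = itertools.count()  # hypothetical helper: hands out globally unique state ids


@dataclass
class NFA:
    start: int
    accept: Set[int]
    # (state, symbol) -> set of next states; the symbol "" labels an epsilon-transition
    delta: Dict[Tuple[int, str], Set[int]] = field(default_factory=dict)


def nfa_symbol(a: str) -> NFA:
    """R = a: start --a--> accept, deciding {a}."""
    s, t = next(_fresh), next(_fresh)
    return NFA(s, {t}, {(s, a): {t}})


def nfa_epsilon() -> NFA:
    """R = epsilon: one state that is both start and accepting, deciding {epsilon}."""
    s = next(_fresh)
    return NFA(s, {s})


def nfa_empty() -> NFA:
    """R = emptyset: one non-accepting start state and no transitions, deciding {}."""
    return NFA(next(_fresh), set())


def accepts(n: NFA, w: str) -> bool:
    """Subset simulation with epsilon-closures."""
    def close(states: Set[int]) -> Set[int]:
        stack, seen = list(states), set(states)
        while stack:
            for r in n.delta.get((stack.pop(), ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = close({n.start})
    for a in w:
        current = close({r for q in current for r in n.delta.get((q, a), set())})
    return bool(current & n.accept)
```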

The recursive cases of the definition are the inductive steps of the proof!

  1. If \(R = (R_1) \cup (R_2)\) for \(R_1, R_2\) regexes, then \(L(R) = L(R_1) \cup L(R_2)\).

    \(L(R_1)\) and \(L(R_2)\) are both NFA-decidable by the inductive hypothesis.
    By closure of NFA-decidability under union, so is \(L(R) = L(R_1) \cup L(R_2)\)!

  2. If \(R = (R_1)(R_2)\) for \(R_1, R_2\) regexes, then \(L(R) = L(R_1) \circ L(R_2)\).

    \(L(R_1)\) and \(L(R_2)\) are both NFA-decidable by the inductive hypothesis.
    By closure of NFA-decidability under concatenation, so is \(L(R) = L(R_1) \circ L(R_2)\)!

  3. If \(R = (R_1)^*\) for \(R_1\) a regex, then \(L(R) = L(R_1)^*\).

    \(L(R_1)\) is NFA-decidable by the inductive hypothesis.
    By closure of NFA-decidability under Kleene star, so is \(L(R) = L(R_1)^*\)!

This completes the proof!
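The proof relies on the closure constructions for union, concatenation, and Kleene star. The sketch below uses the standard textbook versions (as in Sipser), which may differ in detail from the ones presented earlier in the course; all names are hypothetical. Because every base case draws fresh state ids from one counter, operands built separately never share states, which the union and concatenation constructions assume. The usage line builds \((a \cup ab)^* b\), the regex from the example that follows, bottom-up from its base cases.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

_fresh = itertools.count()  # hypothetical helper: globally unique state ids


@dataclass
class NFA:
    start: int
    accept: Set[int]
    delta: Dict[Tuple[int, str], Set[int]] = field(default_factory=dict)  # "" = epsilon


def _copy_delta(*ns: NFA) -> Dict[Tuple[int, str], Set[int]]:
    """Union of the transition tables (fresh sets, so the inputs are not mutated)."""
    out: Dict[Tuple[int, str], Set[int]] = {}
    for n in ns:
        for key, targets in n.delta.items():
            out.setdefault(key, set()).update(targets)
    return out


def _eps(delta: Dict[Tuple[int, str], Set[int]], q: int, r: int) -> None:
    delta.setdefault((q, ""), set()).add(r)


def symbol(a: str) -> NFA:                        # base case R = a
    s, t = next(_fresh), next(_fresh)
    return NFA(s, {t}, {(s, a): {t}})


def union(n1: NFA, n2: NFA) -> NFA:               # R = (R1) ∪ (R2)
    """New start state with epsilon-transitions to both old start states."""
    s = next(_fresh)
    delta = _copy_delta(n1, n2)
    _eps(delta, s, n1.start)
    _eps(delta, s, n2.start)
    return NFA(s, n1.accept | n2.accept, delta)


def concat(n1: NFA, n2: NFA) -> NFA:              # R = (R1)(R2)
    """Epsilon-transitions from every accept state of n1 to the start of n2."""
    delta = _copy_delta(n1, n2)
    for q in n1.accept:
        _eps(delta, q, n2.start)
    return NFA(n1.start, set(n2.accept), delta)


def star(n1: NFA) -> NFA:                         # R = (R1)*
    """New accepting start state; old accept states loop back to the old start."""
    s = next(_fresh)
    delta = _copy_delta(n1)
    _eps(delta, s, n1.start)
    for q in n1.accept:
        _eps(delta, q, n1.start)
    return NFA(s, n1.accept | {s}, delta)


def accepts(n: NFA, w: str) -> bool:
    def close(states: Set[int]) -> Set[int]:
        stack, seen = list(states), set(states)
        while stack:
            for r in n.delta.get((stack.pop(), ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = close({n.start})
    for a in w:
        current = close({r for q in current for r in n.delta.get((q, a), set())})
    return bool(current & n.accept)


# (a ∪ ab)* b, built bottom-up; each call creates fresh states for its operands.
R = concat(star(union(symbol("a"), concat(symbol("a"), symbol("b")))), symbol("b"))
print([w for w in ["b", "ab", "abb", "aabab", "", "a", "bb"] if accepts(R, w)])
# ['b', 'ab', 'abb', 'aabab']
```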

Implementing regex’s with NFAs example: \((a \cup ab)^* b\)

  • data/images/regex_nfa/regex2nfa-example-1.svg \(\stackrel{\Large \nwarrow}{\large \leftarrow}\) base cases
  • data/images/regex_nfa/regex2nfa-example-2.svg
  • Note that this is equivalent to the NFA data/images/regex_nfa/regex2nfa-example-optimized-ab.svg

True or False: Collapsing two NFA states connected by an \(\emptystring\)-transition always preserves the language the NFA decides.

  • True
  • False
Answer: False. Counterexample: data/images/regex_nfa/nfa-collapse-eps-transition-inequivalent.svg
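A small counterexample of our own (not necessarily the one in the figure), checked by simulation. The NFA with \(p \stackrel{a}{\to} p\), \(p \stackrel{\emptystring}{\to} q\), \(q \stackrel{b}{\to} q\), and \(q\) accepting decides \(a^* b^*\). Merging \(p\) and \(q\) yields one accepting state carrying both self-loops, which decides \((a \cup b)^*\), so \(\string{ba}\) is accepted only after collapsing. The dict encoding and the `accepts` helper are hypothetical.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], Set[str]]  # (state, symbol) -> next states; "" = epsilon


def accepts(delta: Delta, start: str, accept: Set[str], w: str) -> bool:
    """Simulate an NFA on w, tracking the set of states reachable so far."""
    def close(states: Set[str]) -> Set[str]:
        stack, seen = list(states), set(states)
        while stack:
            for r in delta.get((stack.pop(), ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = close({start})
    for a in w:
        current = close({r for q in current for r in delta.get((q, a), set())})
    return bool(current & accept)


# Original NFA: p -a-> p, p -eps-> q, q -b-> q, q accepting; decides a*b*
original = {("p", "a"): {"p"}, ("p", ""): {"q"}, ("q", "b"): {"q"}}
# After (unsafely) merging p and q into a single state "pq"; decides (a ∪ b)*
collapsed = {("pq", "a"): {"pq"}, ("pq", "b"): {"pq"}}

print(accepts(original, "p", {"q"}, "ba"))     # False
print(accepts(collapsed, "pq", {"pq"}, "ba"))  # True
```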

Implementing regex’s with NFAs example: \((a \cup ab)^* b\)

  • data/images/regex_nfa/regex2nfa-example-1.svg \(\stackrel{\Large \nwarrow}{\large \leftarrow}\) base cases
  • data/images/regex_nfa/regex2nfa-example-2.svg Don’t optimize by collapsing two states joined by an \(\emptystring\)-transition: this is unsafe in general, even though it happens to work in this special case.
  • data/images/regex_nfa/regex2nfa-example-3.svg
  • data/images/regex_nfa/regex2nfa-example-4.svg
  • data/images/regex_nfa/regex2nfa-example-5.svg

Converting NFAs to regex’s

Theorem: Any NFA-decidable language is regex-decidable.

Starting point: “Expression Automata” (EA) (or “Generalized NFAs” in Sipser)

An EA is like an NFA, except that its transition arrows are labeled with regular expressions that match substrings of the input rather than individual symbols.

data/images/regex_nfa/expression_automaton.svg
  • \(\emptystring\)
  • \(\string{a}\)
  • \(\string{b}\)
  • \(\string{ba}\)
  • \(\string{bba}\)
  • \(\string{baa}\)
  • \(\string{baaba}\)
  • \(\string{baababa}\)
  • \(\string{bbaaababbba}\)
  • \(\string{bbaabab}\)
  • \(\string{bbaababab}\)
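To make the EA semantics concrete, here is a sketch of an acceptance check, assuming labels are written in Python `re` syntax (so `|` plays the role of \(\cup\)). A string is accepted if some path from the start state to an accept state splits it into consecutive pieces, each fully matching its edge's label. The search runs over configurations (state, position) with a visited set, so labels that match the empty string cannot cause infinite loops. The EA below is a made-up example, not the one in the figure, and `ea_accepts` is a hypothetical name.

```python
import re
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (from, regex label in Python `re` syntax, to)


def ea_accepts(edges: List[Edge], start: str, accept: Set[str], w: str) -> bool:
    """Search over configurations (state, position in w)."""
    compiled = [(q, re.compile(label), r) for q, label, r in edges]
    seen = {(start, 0)}
    stack = [(start, 0)]
    while stack:
        q, i = stack.pop()
        if i == len(w) and q in accept:
            return True
        for src, pattern, dst in compiled:
            if src != q:
                continue
            for j in range(i, len(w) + 1):
                # the edge can consume w[i:j] if its label fully matches that piece
                if pattern.fullmatch(w, i, j) and (dst, j) not in seen:
                    seen.add((dst, j))
                    stack.append((dst, j))
    return False


# Made-up EA deciding a b* (ba)+ : s --ab*--> q --(ba)+--> t, with t accepting
EDGES = [("s", "ab*", "q"), ("q", "(ba)+", "t")]
print([w for w in ["aba", "abbba", "ab", ""] if ea_accepts(EDGES, "s", {"t"}, w)])
# ['aba', 'abbba']
```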


Proof idea:

  • Any NFA \(N\) is already a valid EA.
  • Incrementally convert \(N\) into an equivalent EA \(E\) of the form:
    data/images/regex_nfa/trivial_expression_automaton.svg
    where \(R\) is a regular expression.

Then \(L(N) = L(E) = L(R)\)!

Converting NFAs to regex’s: First step

In our first step, we modify \(N\) so that:

  • The start state has no inbound transitions.
  • There is a single accept state with no outbound transitions.
(Figure: an example NFA with states \(q_0, q_1, q_2\), shown before and after adding the new start state \(s\) and the new accept state \(a\), connected by \(\emptystring\)-transitions.)

Given \(N = (Q, \Sigma, \Delta, q_0, F)\),
construct \(N' = (Q', \Sigma, \Delta', s, F')\):

  • \(Q' = Q \cup \{s, a\}\) where \(s, a \notin Q\)
  • \(F' = \{a\}\)
  • Assuming \(\Delta\) represents a set of transitions: \(\Delta' = \Delta \cup \{(s, \emptystring, q_0)\} \cup \setbuild{(q, \emptystring, a)}{q \in F}\)

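\(L(N') = L(N)\) because \(N\) has a computation sequence \(r_1, r_2, \ldots, r_n\) accepting \(w\) if and only if \(s, r_1, r_2, \ldots, r_n, a\) is a computation sequence of \(N'\) accepting \(w\). Every accepting computation of \(N'\) has this form, since the only transition out of \(s\) goes to \(q_0\) and \(a\) is entered only by \(\emptystring\)-transitions from states in \(F\).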
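The first step is easy to sketch in code. Here \(\Delta\) is a hypothetical Python set of `(from, symbol, to)` triples with `""` for \(\emptystring\). The new states are named `s` and `a` as on the slide (the state `a` is unrelated to any input symbol a), and the function asserts they are fresh. The function name and the example NFA are made up.

```python
from typing import Set, Tuple

Transition = Tuple[str, str, str]  # (from, symbol, to); symbol "" labels an epsilon-transition


def add_start_and_accept(Q: Set[str], delta: Set[Transition], q0: str, F: Set[str],
                         s: str = "s", a: str = "a"):
    """Build N' = (Q', Sigma, Delta', s, F') from N = (Q, Sigma, Delta, q0, F)."""
    assert s not in Q and a not in Q, "s and a must be new states"
    Q2 = Q | {s, a}
    delta2 = delta | {(s, "", q0)} | {(q, "", a) for q in F}  # s -> q0, every q in F -> a
    return Q2, delta2, s, {a}


# Hypothetical NFA: q0 -b-> q1, q1 -a-> q0, q1 -b-> q1, with q0 and q1 both accepting
Q2, delta2, start, F2 = add_start_and_accept(
    {"q0", "q1"}, {("q0", "b", "q1"), ("q1", "a", "q0"), ("q1", "b", "q1")}, "q0", {"q0", "q1"})
print(sorted(t for t in delta2 if t[1] == ""))
# [('q0', '', 'a'), ('q1', '', 'a'), ('s', '', 'q0')]
```

By construction, `s` has no inbound transitions and `a` has no outbound ones, which is exactly what the first step requires.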

Converting NFAs to regex’s: Incremental simplification

We then apply incremental simplifications that remove the states “in the box” (not \(s\) or \(a\)) one by one:

(Figure: the example EA, with states \(q_0, q_1, q_2\) between \(s\) and \(a\), at the start of incremental simplification.)
data/images/regex_nfa/generic_n_third_step.svg
data/images/regex_nfa/generic_n_fourth_step.svg

How does this work in general?

Converting NFAs to regex’s: Incremental simplification

  • The final simplification step operates on an EA that looks like:

    data/images/regex_nfa/generic_last_step.svg
  • We can rip state “i” out of the diagram but still accept strings whose computation sequences go through \(i\), by replacing the transition from \(s\) to \(a\) with one labeled \(W \cup X Y^* Z\):

    data/images/regex_nfa/generic_last_step_complete.svg
  • Both EAs accept exactly the strings \(w \in L(W)\) together with the strings \(x y_1 y_2 \dots y_k z\), for some \(k \in \N\), where \(x \in L(X)\), each \(y_i \in L(Y)\), and \(z \in L(Z)\).

    • In general, some of the transitions may not exist, e.g., if there is no \(s \stackrel{W}{\to} a\) transition, use \(X Y^* Z\); if no self-loop on \(i\), use \(W \cup XZ\); if neither, then use \(XZ\).
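The rip formula, including the cases with missing transitions, as a Python sketch. Labels are strings in Python `re` syntax, so `|` plays the role of \(\cup\); the helper names `g` and `rip_label` are hypothetical. Wrapping every piece in a non-capturing group `(?:...)` keeps precedence correct when labels are concatenated or starred.

```python
import re
from typing import Optional


def g(r: str) -> str:
    """Wrap a regex in a non-capturing group so it can be safely concatenated or starred."""
    return f"(?:{r})"


def rip_label(W: Optional[str], X: str, Y: Optional[str], Z: str) -> str:
    """New s->a label after removing state i (labels in Python `re` syntax).

    W: label of s->a (None if absent); X: s->i; Y: self-loop on i (None if absent); Z: i->a.
    """
    through_i = g(X) + (g(Y) + "*" if Y is not None else "") + g(Z)  # X Y* Z  or  X Z
    return through_i if W is None else g(g(W) + "|" + through_i)     # W ∪ (...)


full = rip_label("c", "a", "b", "a")          # c ∪ a b* a
print([w for w in ["c", "aa", "abba", "ab", "ca"] if re.fullmatch(full, w)])
# ['c', 'aa', 'abba']
```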

Converting NFAs to regex’s: Incremental simplification

When more than three states remain:

  • Select any state other than \(s\) or \(a\) and call it \(i\).
  • Iterate over pairs of states \(q\) and \(r\) with transitions \(q \stackrel{X}{\to} i\) and \(i \stackrel{Z}{\to} r\) (even if \(q = r\))
  • For each of these pairs, transform:
    data/images/regex_nfa/generic_intermediate_step.svg into: data/images/regex_nfa/generic_intermediate_step_complete.svg
  • Once every pair has been handled, delete \(i\) and its transitions; repeat until only 2 states (\(s\) and \(a\)) remain!
  • More precisely, iterate over pairs of transitions in/out of \(i\) (more on this in a bit).
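Putting the pieces together gives a sketch of the whole NFA-to-regex conversion by state elimination. Assumptions (not from the lecture): labels use Python `re` syntax; parallel transitions are merged into one EA edge labeled with their union; the new start and accept states get the hypothetical names `<s>` and `<a>`; and states are removed in sorted order, although any order yields an equivalent (if differently shaped) regex. The output is correct but far from minimal.

```python
import re
from typing import Dict, Iterable, Set, Tuple

EA = Dict[Tuple[str, str], str]  # (q, r) -> one regex label, in Python `re` syntax


def g(r: str) -> str:
    return f"(?:{r})"  # non-capturing group: safe to concatenate or star


def add_edge(ea: EA, q: str, r: str, label: str) -> None:
    """Add q -label-> r, merging with an existing q -> r edge by union."""
    ea[(q, r)] = g(g(ea[(q, r)]) + "|" + g(label)) if (q, r) in ea else label


def nfa_to_regex(states: Set[str], delta: Iterable[Tuple[str, str, str]],
                 q0: str, F: Set[str]) -> str:
    """State elimination. delta holds (q, symbol, r) triples; symbol "" is epsilon."""
    s, a = "<s>", "<a>"  # hypothetical fresh names for the new start/accept states
    ea: EA = {}
    for q, sym, r in delta:
        add_edge(ea, q, r, re.escape(sym))   # re.escape("") == "" matches epsilon
    add_edge(ea, s, q0, "")
    for f in F:
        add_edge(ea, f, a, "")
    for i in sorted(states):                 # rip out every state other than s and a
        loop = ea.pop((i, i), None)
        ins = [(q, X) for (q, r), X in ea.items() if r == i]
        outs = [(r, Z) for (q, r), Z in ea.items() if q == i]
        for q, _ in ins:
            del ea[(q, i)]
        for r, _ in outs:
            del ea[(i, r)]
        middle = g(loop) + "*" if loop is not None else ""
        for q, X in ins:                     # every pair q -X-> i -Z-> r, even q == r
            for r, Z in outs:
                add_edge(ea, q, r, g(X) + middle + g(Z))
    return ea.get((s, a), "(?!)")            # "(?!)" never matches: empty language


# Hypothetical NFA for strings over {a, b} ending in "ab"
R = nfa_to_regex({"q0", "q1", "q2"},
                 [("q0", "a", "q0"), ("q0", "b", "q0"), ("q0", "a", "q1"), ("q1", "b", "q2")],
                 "q0", {"q2"})
print([w for w in ["ab", "bab", "aab", "", "ba", "abb"] if re.fullmatch(R, w)])
# ['ab', 'bab', 'aab']
```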

Example removing state \(i\)

data/images/regex_nfa/ch6-ea-remove-state-before.svg

  • How to get from \(q\) to \(t\) going through \(i\)? \(0^*1\) (go from \(q\) to \(i\)) \((00)^*\) (loop on \(i\)) \(1^+\) (go from \(i\) to \(t\))
    • add new transition: \(q \stackrel{0^*1 (00)^* 1^+}{\to} t\)
  • How to get from \(q\) to \(r\) going through \(i\)? \(\fragment{0^*1} \fragment{(00)^*} \fragment{\emptystring} \fragment{= 0^*1 (00)^*}\)
    • replace existing \(q \stackrel{110}{\to} r\) transition with: \(q \stackrel{110\ \ \cup\ \ 0^*1(00)^*}{\to} r\)
  • How to get from \(t\) to \(r\) going through \(i\)? \(\fragment{01 (00)^* \emptystring = 01 (00)^*}\)
    • add new transition: \(t \stackrel{01(00)^*}{\to} r\)
  • How to get from \(t\) to \(t\) going through \(i\)? \(\fragment{01 (00)^* 1^+}\)
    • add new self-transition: \(t \stackrel{01(00)^*1^+}{\to} t\)
data/images/regex_nfa/ch6-ea-remove-state-after.svg

Other pairs of states, e.g., \((s, r)\) or \((q, q)\), are not listed because either the first state has no transition to \(i\) or the second has no transition from \(i\).

Process all pairs of transitions in/out of \(i\)

  • More generally, when removing a state \(i\), we must consider all pairs of transitions in/out of \(i\), along with the states they come from/go to.
  • We often draw two transitions with the same from/to states as a single arrow, for example, \(q \stackrel{0,1}{\to} i\)… Technically these are two different transitions: \(q \stackrel{0}{\to} i\) and \(q \stackrel{1}{\to} i\).
  • Alternatively (and more conveniently when transforming an NFA to a regex by hand), one could think of the two NFA transitions represented by \(q \stackrel{0,1}{\to} i\) as a single EA transition \(q \stackrel{0 \cup 1}{\to} i\).

Example (Sipser, Figure 1.69): regex’s get large quickly!

Theorem: (not proven in this class) There are NFAs with \(n\) states such that any equivalent regex has length at least \(2^{n-1}\)!

Summary

Let \(L\) be a language; the following are equivalent:

  • \(L\) is DFA-decidable.
  • \(L\) is NFA-decidable.
  • \(L\) is regex-decidable.
  • \(L\) is RG-decidable.

From now on, we simply use the term regular.