
ECS 120 Theory of Computation
Equivalence of Regular Expressions and NFAs
Julian Panetta
University of California, Davis

Equivalence of Regular Expressions and NFAs

  • NFAs and DFAs have equivalent computational power.
  • Regular Grammars and DFAs have equivalent computational power.
  • Today:
    • Prove that NFAs and Regular Expressions have equivalent computational power.
      Strategy: induction proofs that incrementally simplify the regex/NFA.
    • Conclude that the following classes of languages are actually all the same:
      • DFA-decidable
      • NFA-decidable
      • Regex-decidable
      • RG-decidable
    • Languages of this class are called regular languages.

Implementing Regexes with NFAs

Theorem: Any regex-decidable language is NFA-decidable.

Proof

Recall the formal definition:

R is a regular expression deciding language \(L(R) \subseteq \Sigma^*\) if one of the following holds:

  1. \(R = a\) for some \(a \in \Sigma\). Then \(L(R) = \{a\}\).
  2. \(R = \emptystring\). Then \(L(R) = \{\emptystring\}\).
  3. \(R = \emptyset\). Then \(L(R) = \{\}\).
  4. \(R = (R_1) \cup (R_2)\) for \(R_1, R_2\) regexes. Then \(L(R) = L(R_1) \cup L(R_2)\).
  5. \(R = (R_1)(R_2)\) for \(R_1, R_2\) regexes. Then \(L(R) = L(R_1) \circ L(R_2)\).
  6. \(R = (R_1)^*\) for \(R_1\) a regex. Then \(L(R) = L(R_1)^*\).

This inductive definition implies a “tree structure” that we can exploit to construct an NFA!
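To make the tree structure concrete, here is a minimal Python sketch. It is an illustration only: the class names (`Sym`, `Eps`, `Empty`, `Union`, `Concat`, `Star`) and the helper `language_up_to` are hypothetical and not part of the lecture. Each case of the definition becomes one node type, and a recursive function follows the six cases to list every string of length at most `n` in \(L(R)\).

```python
from dataclasses import dataclass

class Regex:
    """Base class for regex tree nodes (hypothetical names, for illustration)."""

@dataclass(frozen=True)
class Sym(Regex):       # case 1: R = a, a single symbol
    a: str

@dataclass(frozen=True)
class Eps(Regex):       # case 2: R = empty string
    pass

@dataclass(frozen=True)
class Empty(Regex):     # case 3: R = empty set
    pass

@dataclass(frozen=True)
class Union(Regex):     # case 4: R = (R1) ∪ (R2)
    r1: Regex
    r2: Regex

@dataclass(frozen=True)
class Concat(Regex):    # case 5: R = (R1)(R2)
    r1: Regex
    r2: Regex

@dataclass(frozen=True)
class Star(Regex):      # case 6: R = (R1)*
    r1: Regex

def language_up_to(r, n):
    """Return the set of strings of length <= n in L(r), by recursion on the tree."""
    if isinstance(r, Sym):
        return {r.a} if n >= 1 else set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Union):
        return language_up_to(r.r1, n) | language_up_to(r.r2, n)
    if isinstance(r, Concat):
        left, right = language_up_to(r.r1, n), language_up_to(r.r2, n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if isinstance(r, Star):
        base = language_up_to(r.r1, n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending blocks until no new short strings appear
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regex node: {r!r}")

# The regex (a ∪ ab)* b as a tree.
R = Concat(Star(Union(Sym("a"), Concat(Sym("a"), Sym("b")))), Sym("b"))
print(sorted(language_up_to(R, 3)))  # ['aab', 'ab', 'abb', 'b']
```

The recursion has exactly the shape of the definition: three leaf cases and three cases that recurse into subtrees. An inductive proof over regexes follows the same pattern.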

Inductive proof:

For any regular expression \(R\), we can construct an NFA \(N\) such that \(L(N) = L(R)\).

The base cases of the inductive definition are the base cases of the proof!

  1. If \(R = a\) for some \(a \in \Sigma\), then \(L(R) = \{a\}\) decided by the NFA
    data/images/regex_nfa/base_case_a.svg

  2. If \(R = \emptystring\), then \(L(R) = \{\emptystring\}\) decided by the NFA
    data/images/regex_nfa/base_case_e.svg

  3. If \(R = \emptyset\), then \(L(R) = \{\}\) decided by the NFA
    data/images/regex_nfa/base_case_empty.svg

The recursive cases of the definition are the inductive steps of the proof!

  1. If \(R = (R_1) \cup (R_2)\) for \(R_1, R_2\) regexes, then \(L(R) = L(R_1) \cup L(R_2)\).

    \(L(R_1)\) and \(L(R_2)\) are both NFA-decidable by the inductive hypothesis.
    By closure under union, so is \(L(R) = L(R_1) \cup L(R_2)\)!

  2. If \(R = (R_1)(R_2)\) for \(R_1, R_2\) regexes, then \(L(R) = L(R_1) \circ L(R_2)\).

    \(L(R_1)\) and \(L(R_2)\) are both NFA-decidable by the inductive hypothesis.
    By closure under concatenation, so is \(L(R) = L(R_1) \circ L(R_2)\)!

  3. If \(R = (R_1)^*\) for \(R_1\) a regex, then \(L(R) = L(R_1)^*\).

    \(L(R_1)\) is NFA-decidable by the inductive hypothesis.
    By closure under Kleene star, so is \(L(R) = L(R_1)^*\)!

This completes the proof!
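The proof is constructive, so it can be read as a recursive algorithm. The sketch below is one possible rendering in Python, under assumptions that are not from the lecture: regexes are nested tuples such as `("union", r1, r2)`, each NFA has a single accept state, and the closure steps use the standard \(\emptystring\)-transition gadgets (Thompson-style), which may differ from the constructions in the slides' figures. The names `build`, `eps_closure`, and `accepts` are hypothetical.

```python
import itertools

_fresh = itertools.count()  # source of fresh state names (integers)

def build(r):
    """Return (start, accept, edges) for the regex tree r.

    edges is a list of (source, label, target) triples; label "" means ε.
    """
    kind = r[0]
    if kind == "concat":  # run N1, then jump from its accept state into N2
        s1, f1, e1 = build(r[1])
        s2, f2, e2 = build(r[2])
        return s1, f2, e1 + e2 + [(f1, "", s2)]
    s, f = next(_fresh), next(_fresh)
    if kind == "sym":     # base case R = a
        return s, f, [(s, r[1], f)]
    if kind == "eps":     # base case R = empty string
        return s, f, [(s, "", f)]
    if kind == "empty":   # base case R = empty set: accept state unreachable
        return s, f, []
    if kind == "union":   # branch nondeterministically into N1 or N2
        s1, f1, e1 = build(r[1])
        s2, f2, e2 = build(r[2])
        return s, f, e1 + e2 + [(s, "", s1), (s, "", s2), (f1, "", f), (f2, "", f)]
    if kind == "star":    # skip N1 entirely, or loop through it repeatedly
        s1, f1, e1 = build(r[1])
        return s, f, e1 + [(s, "", s1), (s, "", f), (f1, "", s1), (f1, "", f)]
    raise ValueError(f"unknown regex node: {kind!r}")

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, t in edges:
            if p == q and label == "" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    """Simulate the NFA on w by tracking the set of reachable states."""
    start, accept, edges = nfa
    current = eps_closure({start}, edges)
    for c in w:
        moved = {t for p, label, t in edges if p in current and label == c}
        current = eps_closure(moved, edges)
    return accept in current

# (a ∪ ab)* b
R = ("concat",
     ("star", ("union", ("sym", "a"), ("concat", ("sym", "a"), ("sym", "b")))),
     ("sym", "b"))
N = build(R)
```

Each branch of `build` corresponds to one case of the proof: the three leaf branches are the base cases, and the three recursive branches rely on the sub-NFAs returned by the recursive calls, which plays the role of the inductive hypothesis.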

Implementing Regexes with NFAs: Example \(\string{(a \cup ab)^* b}\)

  • data/images/regex_nfa/regex2nfa-example-1.svg
  • data/images/regex_nfa/regex2nfa-example-2.svg
  • data/images/regex_nfa/regex2nfa-example-3.svg
  • data/images/regex_nfa/regex2nfa-example-4.svg
  • data/images/regex_nfa/regex2nfa-example-5.svg

Converting NFAs to Regexes

Theorem: Any NFA-decidable language is regex-decidable.

Starting point: “Expression Automata” (or “Generalized NFAs” in Sipser)

Label the transition arrows of an NFA with regular expressions that match substrings of the input rather than individual symbols.

data/images/regex_nfa/expression_automaton.svg
  • \(\emptystring\)
  • \(\string{a}\)
  • \(\string{b}\)
  • \(\string{ba}\)
  • \(\string{bba}\)
  • \(\string{baa}\)
  • \(\string{baaba}\)
  • \(\string{baababa}\)
  • \(\string{bbaaababbba}\)
  • \(\string{bbaabab}\)
  • \(\string{bbaababab}\)

Proof idea:

  • Any NFA \(N\) is already a valid expression automaton.
  • Incrementally convert \(N\) into a simpler equivalent expression automaton \(E\) of the form:
    data/images/regex_nfa/trivial_expression_automaton.svg
    where \(R\) is a regular expression.

Then \(L(N) = L(E) = L(R)\)!

Converting NFAs to Regexes: First Step

data/images/regex_nfa/trivial_expression_automaton.svg

In our first step, we modify \(N\) so that its start/accept states already have the desired form:

  • No transitions into the start state.
  • There is a single accept state with no outbound transitions.
[Figure: example NFA with states q0, q1, q2, and the modified NFA with a new start state s and a new accept state a attached by ε-transitions]

Given \(N = (Q, \Sigma, \Delta, q_0, F)\),
construct \(N' = (Q', \Sigma, \Delta', s, F')\):

  • \(Q' = Q \cup \{s, a\}\) where \(s, a \notin Q\)
  • \(F' = \{a\}\)
  • Assuming \(\Delta\) represents a set of transitions: \(\Delta' = \Delta \cup \{(s, \emptystring, q_0)\} \cup \setbuild{(q, \emptystring, a)}{q \in F}\)

\(L(N') = L(N)\) because:
There exists a computation sequence \(r_1, r_2, \ldots, r_n\) of \(N\) that accepts \(w\) if and only if \(s, r_1, r_2, \ldots, r_n, a\) is a computation sequence of \(N'\) that accepts \(w\).
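The construction of \(N'\) can be sketched in a few lines of Python. This is only an illustration: the function name `add_fresh_start_and_accept` is hypothetical, \(\Delta\) is assumed to be a set of `(source, label, target)` triples with `""` standing for \(\emptystring\), and the example NFA (including its choice of accepting state) is made up for the demonstration.

```python
def add_fresh_start_and_accept(states, delta, q0, accepting):
    """Build N' from N = (Q, Σ, Δ, q0, F); Σ is unchanged and omitted.

    delta is a set of (source, label, target) triples; label "" means ε.
    Returns (Q', Δ', s, F').
    """
    s, a = "s", "a"                       # fresh state names
    assert s not in states and a not in states
    new_states = set(states) | {s, a}     # Q' = Q ∪ {s, a}
    new_delta = (set(delta)
                 | {(s, "", q0)}                      # s --ε--> q0
                 | {(q, "", a) for q in accepting})   # q --ε--> a for q in F
    return new_states, new_delta, s, {a}  # F' = {a}

# Hypothetical example NFA; the accepting set {"q0"} is an assumption.
Q = {"q0", "q1", "q2"}
Delta = {("q0", "b", "q1"), ("q1", "b", "q0"), ("q1", "a", "q2"), ("q2", "", "q0")}
Q2, Delta2, start, F2 = add_fresh_start_and_accept(Q, Delta, "q0", {"q0"})
```

After the call, no transition enters `s` and none leaves `a`, which are the two properties the first step is meant to establish.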

Converting NFAs to Regexes: Incremental Simplification

We then apply incremental simplifications that remove the states “in the box” one by one:

[Figure: first steps of the example, showing the automaton with start state s, accept state a, and inner states q0, q1, q2]
data/images/regex_nfa/generic_n_third_step.svg
data/images/regex_nfa/generic_n_fourth_step.svg

How does this work in general?

Converting NFAs to Regexes: Incremental Simplification

  • The final simplification step operates on an EA that looks like:

    data/images/regex_nfa/generic_last_step.svg
  • We can rip state “i” out of the diagram but still accept strings whose computation sequences pass through it by changing the connection between \(s\) and \(a\):

    data/images/regex_nfa/generic_last_step_complete.svg
  • Both EAs accept exactly the strings of the form

    • \(w \in L(W)\), or
    • \(x y_1 y_2 \cdots y_k z\) for some \(k \in \N\), where \(x \in L(X)\), each \(y_j \in L(Y)\), and \(z \in L(Z)\).
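The description above amounts to the single regex \(W \cup X Y^* Z\) on the arrow from \(s\) to \(a\). The figures are not reproduced here, so this reading is an assumption: \(W\) labels \(s \to a\), \(X\) labels \(s \to i\), \(Y\) labels the self-loop on \(i\), and \(Z\) labels \(i \to a\). A small Python check using the standard `re` module, with made-up labels and a hypothetical helper `last_step_regex`:

```python
import re

def last_step_regex(W, X, Y, Z):
    """Combine the four labels into W ∪ X Y* Z, written in Python re syntax."""
    return f"(?:{W})|(?:{X})(?:{Y})*(?:{Z})"

# Made-up labels: W = e, X = a, Y = b ∪ c, Z = d
pattern = last_step_regex("e", "a", "b|c", "d")

def accepted(w):
    """True if the whole string w matches the combined regex."""
    return re.fullmatch(pattern, w) is not None
```

With these labels, `accepted("e")` takes the direct route, `accepted("ad")` passes through \(i\) with zero loop iterations, and `accepted("abcd")` loops twice, reading a different string from \(L(Y)\) each time.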

Converting NFAs to Regexes: Incremental Simplification

When more than three states remain:

  • Select any state other than \(s\) or \(a\) and call it \(i\).
  • Iterate over pairs of states \(q\) and \(r\) with transitions \(q \stackrel{a}{\to} i\) and \(i \stackrel{b}{\to} r\) (even if \(q = r\))
  • For each of these pairs, transform:
    data/images/regex_nfa/generic_intermediate_step.svg into: data/images/regex_nfa/generic_intermediate_step_complete.svg
  • Repeat until only 2 states remain!
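The whole elimination loop can be sketched as follows. This is a minimal illustration under several assumptions, none of which come from the slides: the expression automaton is a dictionary mapping `(source, target)` pairs to regex strings in Python `re` syntax, a missing pair means there is no transition, `""` stands for \(\emptystring\), and symbol labels contain no `re` metacharacters. The names `wrap`, `eliminate`, and `to_regex` are hypothetical, and the example automaton is made up.

```python
import re

def wrap(r):
    """Parenthesize a non-empty regex so it can be concatenated safely."""
    return f"(?:{r})" if r else ""

def eliminate(edges, i):
    """Remove state i, rerouting every path q -> i -> r to a direct edge q -> r."""
    loop = edges.get((i, i))
    star = f"(?:{loop})*" if loop else ""   # Y*; a missing or ε self-loop adds nothing
    ins = [(q, x) for (q, t), x in edges.items() if t == i and q != i]
    outs = [(r, z) for (p, r), z in edges.items() if p == i and r != i]
    result = {k: v for k, v in edges.items() if i not in k}  # drop edges touching i
    for q, x in ins:
        for r, z in outs:                   # includes the case q == r
            path = wrap(x) + star + wrap(z)  # X Y* Z
            old = result.get((q, r))
            result[(q, r)] = path if old is None else f"{old}|{path}"  # W ∪ X Y* Z
    return result

def to_regex(edges, inner_states, s="s", a="a"):
    """Eliminate the inner states one by one; None means the language is empty."""
    for i in inner_states:
        edges = eliminate(edges, i)
    return edges.get((s, a))

# Hypothetical automaton after the first step (fresh start s, fresh accept a).
E = {("s", "q0"): "", ("q0", "q1"): "b", ("q1", "q0"): "b",
     ("q1", "q2"): "a", ("q2", "q0"): "", ("q0", "a"): ""}
R = to_regex(E, ["q2", "q1", "q0"])
```

Any elimination order yields an equivalent regex, although the size of the result can differ a great deal from one order to another. Here the result describes \((bb \cup ba)^*\), so `re.fullmatch(R, "bbba")` succeeds and `re.fullmatch(R, "b")` does not.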

Example Conversion (Dave’s Figure 6.7)

data/images/regex_nfa/example_conversion.png

Example Conversion (Sipser, Figure 1.69)