ECS 120 Theory of Computation
Context-Free Grammars
Julian Panetta
University of California, Davis

Portion of CFG for Python language

compound_stmt: if_stmt | while_stmt | function_def | ...
if_stmt:
    | 'if' named_expression ':' block elif_stmt 
    | 'if' named_expression ':' block [else_block] 
elif_stmt:
    | 'elif' named_expression ':' block elif_stmt 
    | 'elif' named_expression ':' block [else_block] 
else_block:
    | 'else' ':' block 
block:
    | NEWLINE INDENT statements DEDENT 
    | simple_stmts
...

https://docs.python.org/3/reference/grammar.html

Example CFG

\[\begin{align*} A &\to 0 A 11 \\ A &\to B \\ B &\to\ ! \end{align*}\]

Which of the following strings are generated by this grammar?

  • \(\string{000!111}\)
  • \(\string{111!000}\)
  • \(\string{00!1111}\)
  • \(\string{0!11}\)
  • \(\string{0B11}\)
  • \(\string{!}\)
  • \(\string{000111}\)

The set of strings generated is \(\{0^n ! 1^{2n} \mid n \geq 0\}\).
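
As a sanity check on the quiz above, the two rules can be read directly as a recursive membership test. This is only an illustrative sketch; the function name `generated_by_A` is our own and not part of the course material.

```python
def generated_by_A(s: str) -> bool:
    """Return True if s is derivable from A, where A -> 0 A 11 | B and B -> !."""
    # A -> B -> ! : the only way to stop recursing.
    if s == "!":
        return True
    # A -> 0 A 11 : peel one leading 0 and two trailing 1s, then recurse on the middle.
    if len(s) >= 3 and s.startswith("0") and s.endswith("11"):
        return generated_by_A(s[1:-2])
    return False


for w in ["000!111", "111!000", "00!1111", "0!11", "0B11", "!", "000111"]:
    print(w, generated_by_A(w))
```

Note that `0B11` is rejected: it contains the variable \(B\), so it is an intermediate step of a derivation rather than a string of terminals.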

More Examples and Some Shorthand Notation

Let’s consider another CFG \(G\): \[\begin{align*} S &\to A B \\ A &\to 0 A \\ A &\to \emptystring \\ B &\to 1 B \\ B &\to \emptystring \end{align*}\]

We express this more concisely as: \[\begin{align*} S &\to A B \\ A &\to 0 A \or \emptystring \\ B &\to 1 B \or \emptystring \end{align*}\] where the \(\or\) symbol should be read as “or”.

Which of the following are generated by \(G\)?

  • \(\string{000111}\)
  • \(\string{111000}\)
  • \(\string{00111}\)
  • \(\string{001100}\)
  • \(\string{0}\)
  • \(\string{1}\)
  • \(\string{\epsilon}\)

The set of strings generated is matched by the regex \(0^* 1^*\).
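
The claim that \(G\) generates exactly the strings matching \(0^* 1^*\) can be spot-checked by brute force. The sketch below (helper names are our own) mirrors each rule of \(G\) as a recursive function and compares the result against Python's `re.fullmatch` on every binary string of length at most 6. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product


def derives_A(s: str) -> bool:
    # A -> 0 A | ε
    return s == "" or (s[0] == "0" and derives_A(s[1:]))


def derives_B(s: str) -> bool:
    # B -> 1 B | ε
    return s == "" or (s[0] == "1" and derives_B(s[1:]))


def derives_S(s: str) -> bool:
    # S -> A B : some split of s has its left part from A and its right part from B.
    return any(derives_A(s[:i]) and derives_B(s[i:]) for i in range(len(s) + 1))


all_short_strings = ["".join(t) for n in range(7) for t in product("01", repeat=n)]
agree = all(
    derives_S(w) == bool(re.fullmatch(r"0*1*", w)) for w in all_short_strings
)
print(agree)
```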

Parse tree for string 00111

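[Figure: parse tree. The root \(S\) has children \(A\) and \(B\). The \(A\) subtree applies \(A \to 0 A\) twice, producing the leaves 0, 0; the \(B\) subtree applies \(B \to 1 B\) three times, producing the leaves 1, 1, 1. The innermost \(A\) and \(B\) each derive \(\emptystring\).]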
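
For comparison, one rule-by-rule derivation that corresponds to this parse tree is written out below, where \(\yields\) means "apply one rule to one variable". It always expands the leftmost variable first, which is a choice made here for readability; other orders give the same tree. \[\begin{align*} S &\yields A B \\ &\yields 0 A B \\ &\yields 0 0 A B \\ &\yields 0 0 B \\ &\yields 0 0 1 B \\ &\yields 0 0 1 1 B \\ &\yields 0 0 1 1 1 B \\ &\yields 0 0 1 1 1 \end{align*}\]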

More Examples

Let’s generate all strings of properly nested parentheses.
For example, \(()\) and \((())()\) are valid, but \((()\) and \()(\) are not.

  • We can produce a valid string of parentheses by:
    • Starting from the empty string, which is trivially valid.
    • Surrounding an existing valid string with a matching pair of parentheses.
    • Concatenating two existing valid strings.
  • We can express this as a CFG: \[ S \to ( S ) \or S S \or \emptystring \]
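
The three rules translate almost word for word into a recursive test. The following is a minimal sketch (the name `balanced` is our own); it tries every rule and every split point, so it is far less efficient than a simple counter, but it follows the grammar exactly.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def balanced(s: str) -> bool:
    """Return True if s is derivable from S -> ( S ) | S S | ε."""
    if s == "":
        return True  # S -> ε
    if s[0] == "(" and s[-1] == ")" and balanced(s[1:-1]):
        return True  # S -> ( S )
    # S -> S S : try every split into two nonempty pieces.
    return any(balanced(s[:i]) and balanced(s[i:]) for i in range(1, len(s)))


for w in ["()", "(())()", "(()", ")(", "(()(()))"]:
    print(repr(w), balanced(w))
```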

Parse tree for string \((()(()))\)

[Figure: parse tree. The root applies \(S \to ( S )\); the inner \(S\) applies \(S \to S S\); its left child derives \(()\) and its right child derives \((())\), each using \(S \to ( S )\) with the innermost \(S\) deriving \(\emptystring\).]

Formal Definition of CFGs

A context-free grammar (CFG) is a 4-tuple \(G = (\Gamma, \Sigma, S, \rho)\) where:

  1. \(\Gamma\) is a finite set of variables (non-terminal symbols).
  2. \(\Sigma\) is a finite alphabet of terminals, disjoint from \(\Gamma\).
  3. \(S \in \Gamma\) is the start variable.
  4. \(\rho \subseteq \Gamma \times {(\Gamma \cup \Sigma)}^*\) is a finite set of rules.
  • By default, if no start variable is specified, the left-hand-side variable of the first rule is used as the start variable.
  • Variables are usually upper-case symbols but can be more readable names like \(\cfgvar{EXPR}\) and \(\cfgvar{TERM}\).
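
To make the 4-tuple concrete, here is one possible way to encode it as a data structure. The class name `CFG`, its field names, and the choice to store each rule as a pair (variable, tuple of symbols) are assumptions made for this sketch, not notation from the course. The constructor checks the conditions listed in the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset  # Γ
    terminals: frozenset  # Σ
    start: str            # S
    rules: tuple          # ρ, as pairs (A, w) where w is a tuple of symbols

    def __post_init__(self):
        if not self.variables.isdisjoint(self.terminals):
            raise ValueError("variables and terminals must be disjoint")
        if self.start not in self.variables:
            raise ValueError("start variable must be one of the variables")
        symbols = self.variables | self.terminals
        for head, body in self.rules:
            if head not in self.variables:
                raise ValueError(f"left-hand side {head!r} is not a variable")
            if any(sym not in symbols for sym in body):
                raise ValueError(f"rule body {body!r} uses an unknown symbol")


# The grammar S -> A B, A -> 0 A | ε, B -> 1 B | ε; the empty tuple stands for ε.
G = CFG(
    variables=frozenset({"S", "A", "B"}),
    terminals=frozenset({"0", "1"}),
    start="S",
    rules=(
        ("S", ("A", "B")),
        ("A", ("0", "A")),
        ("A", ()),
        ("B", ("1", "B")),
        ("B", ()),
    ),
)
```

Storing rule bodies as tuples rather than strings lets variables have multi-character names such as \(\cfgvar{EXPR}\).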

Formal Definition of Computation for CFGs

First some terminology:

  • If \(u, v, w \in {(\Gamma \cup \Sigma)}^*\) are strings of variables and terminals,
    and \(A \to w\) is a rule in the grammar,
    then we say that: \[ u A v \yields u w v \quad \quad (u A v \; \; \textbf{yields} \; u w v) \]
  • If there exists a sequence of strings \(u_1, u_2, \ldots, u_k\) for \(k \ge 0\) such that: \[ u \yields u_1 \yields u_2 \yields \ldots \yields u_k \yields v \] then we say that \(u\) derives \(v\), writing \(u \derives v\).

A grammar \(G\) with start variable \(S\) accepts or generates/produces \(w \in \Sigma^*\) if \(S \derives w\).

The language of grammar \(G\) is \(L(G) = \setbuild{w \in \Sigma^*}{S \derives w}\).

A language \(A\) is called context-free or CFG-decidable if there exists a CFG \(G\) s.t. \(L(G) = A\).
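
The "yields" and "derives" relations can be sketched in code as follows. This is an illustration under simplifying assumptions: every symbol is a single character, a rule is a pair (variable, replacement string) with `""` standing for \(\emptystring\), and the search gives up after `max_steps` rule applications (a hypothetical cutoff added so the search always terminates). The names `one_step`, `derives`, and `G_RULES` are our own.

```python
def one_step(u: str, rules) -> set:
    """Return every string v such that u yields v in a single rule application."""
    results = set()
    for i, sym in enumerate(u):
        for head, body in rules:
            if sym == head:
                # Replace the occurrence of the variable at position i with the rule body.
                results.add(u[:i] + body + u[i + 1:])
    return results


def derives(u: str, v: str, rules, max_steps: int = 10) -> bool:
    """Search for a chain u => ... => v using between 1 and max_steps steps."""
    frontier = {u}
    for _ in range(max_steps):
        frontier = {x for f in frontier for x in one_step(f, rules)}
        if v in frontier:
            return True
    return False


# S -> A B, A -> 0 A | ε, B -> 1 B | ε
G_RULES = [("S", "AB"), ("A", "0A"), ("A", ""), ("B", "1B"), ("B", "")]

print(sorted(one_step("AB", G_RULES)))
print(derives("S", "00111", G_RULES))
```

A `False` result only means that no derivation was found within the step limit, so this is not a general decision procedure for membership in \(L(G)\).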

Some Strategies for Designing CFGs

  • CFGs naturally support the operations of union, concatenation, and “calling” another CFG as a subroutine.
  • Let’s try to design a CFG for the language \(\setbuild{0^n 1^n}{n \ge 0} \cup \setbuild{1^k 0^k}{k \ge 0}\).
    • Start with \(S \to A \or B\), where \(A\) and \(B\) are the start variables of CFGs for the individual languages \(L_A = \setbuild{0^n 1^n}{n \ge 0}\) and \(L_B = \setbuild{1^n 0^n}{n \ge 0}\), respectively.
    • For \(L_A = \setbuild{0^n 1^n}{n \ge 0}\), how to define this language recursively?
      • base case: \(x \in L_A\) if \(x = \emptystring\)
      • recursive case: \(x \in L_A\) if \(x = 0 y 1\), where \(y \in L_A\) (note \(|y| < |x|\)).
    • This leads to the rules \(A \to 0 A 1 \or \emptystring\).
    • Similarly, a CFG for \(L_B\) is \(B \to 1 B 0 \or \emptystring\).
  • Putting it all together, we have: \[ \begin{array}{rcl} S &\to& A \or B \\ A &\to& 0 A 1 \or \emptystring \\ B &\to& 1 B 0 \or \emptystring \end{array} \]
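
The same "build each piece, then combine" strategy can be mimicked in code by enumerating the short strings of each sub-language and taking their union. A minimal sketch, with helper names `nest` and `language_S` chosen for this example:

```python
def nest(left: str, right: str, max_len: int) -> list:
    """Strings generated by X -> left X right | ε, up to length max_len."""
    out, x = [], ""  # start from the base case X -> ε
    while len(x) <= max_len:
        out.append(x)
        x = left + x + right  # apply the recursive rule once more
    return out


def language_S(max_len: int) -> list:
    """Strings generated by S -> A | B, up to length max_len."""
    in_A = nest("0", "1", max_len)  # A -> 0 A 1 | ε
    in_B = nest("1", "0", max_len)  # B -> 1 B 0 | ε
    # S -> A | B is a union; sort by length, then alphabetically.
    return sorted(set(in_A) | set(in_B), key=lambda x: (len(x), x))


print(language_S(4))  # ['', '01', '10', '0011', '1100']
```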

DFA-to-CFG Construction and RRGs

  • Any DFA can trivially be converted into a CFG!

  • Consider the DFA \(M = (Q, \Sigma, \delta, s, F)\)

    • Create a variable \(R_i\) for each state \(q_i \in Q\).
    • For each transition \(\delta(q_i, a) = q_j\) in the DFA, add the rule: \[ R_i \to a R_j \]
    • For each accept state \(q_i \in F\), add the rule: \[ R_i \to \emptystring \]
    • Make the variable corresponding to the start state \(s\) the start variable (this is \(R_0\) when \(s = q_0\)).
  • The resulting CFG decides (generates) exactly the same language as the DFA decides.

  • This is an example of a right-regular grammar (RRG):

    A right-regular grammar (RRG) is a CFG where each rule is of the form: \(X \to a Y\) or of the form \(X \to \emptystring\).
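
The construction above is mechanical enough to write as a short function. The sketch below assumes the DFA is given as a list of state names, a list of symbols, and a dictionary `delta` mapping (state, symbol) pairs to states; these representation choices, and the names `dfa_to_rrg` and `rrg_generates`, are ours. A rule is stored as a pair (variable, body), where the body is either `(a, Y)` for \(X \to a Y\) or `()` for \(X \to \emptystring\).

```python
def dfa_to_rrg(states, alphabet, delta, start, accept):
    """Build the start variable and rule list of the grammar for a DFA."""
    var = {q: "R_" + q for q in states}  # one variable per state
    rules = []
    for q in states:
        for a in alphabet:
            # delta(q, a) = r becomes R_q -> a R_r
            rules.append((var[q], (a, var[delta[q, a]])))
    for q in states:
        if q in accept:
            rules.append((var[q], ()))  # accept state: R_q -> ε
    return var[start], rules


def rrg_generates(start_var, rules, w):
    """Check whether a right-regular grammar generates w by following its rules."""
    current = {start_var}
    for a in w:
        current = {
            body[1] for head, body in rules
            if head in current and body and body[0] == a
        }
    # The derivation can finish only if some reachable variable has an ε-rule.
    return any((x, ()) in rules for x in current)


# Example DFA (our own): accepts binary strings with an even number of 1s.
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q1", ("q1", "1"): "q0"}
start_var, rules = dfa_to_rrg(["q0", "q1"], ["0", "1"], delta, "q0", {"q0"})
for head, body in rules:
    print(head, "->", " ".join(body) if body else "ε")
```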

Nondeterministic Finite Automata

  • Our third “declarative model of computation” is the Nondeterministic Finite Automaton.

    [Figure: state diagram with start state \(q_1\) and transitions \(q_1 \to q_1\) on 0,1; \(q_1 \to q_2\) on 1; \(q_2 \to q_3\) on 0,\(\emptystring\); \(q_3 \to q_4\) on 1; \(q_4 \to q_4\) on 0,1.]
    Example nondeterministic finite automaton (NFA)

What’s different here vs. our previous state diagrams?

  • NFAs are a generalization of DFAs.
    • Any DFA is already a valid NFA!
    • NFAs offer additional features that make them “easier to program” than DFAs
      • They can have multiple transitions for the same state and input symbol.
      • They can have \(\emptystring\)-transitions (that do not consume any input symbols).
      • They allow states to have no transitions for a given input symbol.
    • These features also make them less straightforward to simulate, which is why we call them “declarative”.
  • We’ll later prove that it’s possible to convert any NFA into a (usually larger) DFA.

Operation of an NFA

  • Intuitively, upon reading each symbol, an NFA follows all possible paths.
    • We can think of this as the machine “forking” into multiple copies to take each available transition.
    • For each copy, if there is no transition for the consumed symbol, that copy “dies”.
  • The NFA accepts the input string if any of its “copies” is in an accept state after reading the entire input string. Otherwise, it rejects.

[Figure: the example NFA state diagram, repeated from the previous slide.]

Example processing the string \(010110\):

[Figure: computation tree. States occupied by the live copies after each symbol is read:]

  • Start: \(q_1\)
  • Read 0: \(q_1\)
  • Read 1: \(q_1, q_2, q_3\)
  • Read 0: \(q_1, q_3\)
  • Read 1: \(q_1, q_2, q_3, q_4\)
  • Read 1: \(q_1, q_2, q_3, q_4\)
  • Read 0: \(q_1, q_3, q_4\) ← Accept!
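
The "forking" process can be simulated by tracking the set of states occupied by any live copy. The sketch below encodes the transitions of the example NFA and replays the input \(010110\). Two points are assumptions: \(q_4\) is taken to be the only accept state (the extracted diagram does not preserve which states are accepting), and the names `DELTA`, `eps_closure`, `trace`, and `accepts` are our own.

```python
EPS = ""  # label used here for ε-transitions

# Transitions of the example NFA; a missing key means "no transition" (the copy dies).
DELTA = {
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"},
    ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"},
    ("q4", "1"): {"q4"},
}
START = "q1"
ACCEPT = {"q4"}  # assumption: q4 is the only accept state


def eps_closure(states):
    """Add every state reachable through ε-transitions alone."""
    closed, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, EPS), set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return closed


def trace(w):
    """Return the set of live states before any input and after each symbol of w."""
    current = eps_closure({START})
    history = [current]
    for a in w:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, a), set())  # each copy forks along its transitions
        current = eps_closure(moved)
        history.append(current)
    return history


def accepts(w):
    """Accept if any copy is in an accept state after reading all of w."""
    return bool(trace(w)[-1] & ACCEPT)


for step in trace("010110"):
    print(sorted(step))
print(accepts("010110"))
```

Tracking a set of states instead of individual copies merges copies that sit in the same state, which keeps the simulation from growing exponentially.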