
ECS 120 Theory of Computation
Strings, Languages, and Finite Automata
Julian Panetta
University of California, Davis

Chapter 2: String Theory (the useful one)

  • Since everything on a computer is ultimately represented as a string of bits
    (e.g., 11010110), we need to establish terminology for working with strings.
  • Our theoretical models of computation will always operate on strings of symbols from a finite alphabet (not necessarily 0 and 1).
  • As needed, we will discuss how various discrete objects can be encoded as strings:
    • Integers
    • Floating-point numbers
    • Graphs and trees
    • Lists, stacks, and queues
    • Programs
  • This enables us to pose our study of computation as
    the study of algorithms processing finite strings.

Strings and Languages

  • An alphabet \(\Sigma\) is any nonempty finite set of symbols (usually single-character identifiers).
    \[\begin{align*} \Sigma_1 &= \{0, 1\} \\ \Sigma_2 &= \{\string{a, b, c, \ldots, z}\} \\ \Gamma &= \{\string{0, 1, x, y, z}\} \end{align*}\]

  • A string over alphabet \(\Sigma\) is a finite sequence of symbols from \(\Sigma\).
    \[\begin{alignat*}{2} w &= 010101 \quad &&\text{is a string over } \Sigma_1 \\ y &= \string{hello} \quad &&\text{is a string over } \Sigma_2 \\ z &= \string{x0yz} \quad &&\text{is a string over } \Gamma \hspace{20em} \end{alignat*}\]

    We write strings without parentheses or commas, whereas normally we’d write the sequence 010101 as a tuple \((0, 1, 0, 1, 0, 1).\)

  • The length of a string \(w\) is denoted \(|w|\).

  • The string of length 0 is called the empty string and is denoted \(\emptystring\).
    \(\emptystring\) is a string over every alphabet; it would even be a string over the empty set of symbols \(\emptyset\), if we allowed empty alphabets.

    \(\emptystring\) corresponds to "" in most programming languages.

  • We make no distinction between symbols and strings of length 1.
    For example, \(0\) is both a symbol and a string over \(\Sigma_1\).
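A minimal Python sketch of these definitions, assuming every symbol is a single character so that a Python `str` can stand in for a string over \(\Sigma\). The helper name `is_string_over` is hypothetical, introduced only for illustration.

```python
def is_string_over(w: str, sigma: set[str]) -> bool:
    """Return True if every symbol of w belongs to the alphabet sigma.

    Assumes single-character symbols, so w can be a plain Python str.
    """
    return all(symbol in sigma for symbol in w)

sigma1 = {"0", "1"}
gamma = {"0", "1", "x", "y", "z"}

print(is_string_over("010101", sigma1))     # True
print(is_string_over("x0yz", gamma))        # True
print(is_string_over("x0yz", sigma1))       # False: x, y, z are not in Sigma_1
print(is_string_over("", sigma1), len(""))  # True 0: the empty string is over any alphabet
```

The empty string passes the check for every alphabet because `all` of an empty collection is `True`, mirroring the fact that \(\emptystring\) contains no symbol that could fall outside \(\Sigma\).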

The emptiness inside

| Object | Notation | Python equivalent | Cardinality | Length |
|---|---|---|---|---|
| Empty set | \(\emptyset\) or \(\{\}\) | set() | 0 | Sets don’t have lengths. |
| Empty string | \(\emptystring\) | "" | Strings don’t have cardinalities. | 0 |
| A set containing the empty string | \(\{\emptystring\}\) | {""} | 1 | Sets don’t have lengths. |
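The table's Python column can be checked directly. In this sketch Python's `len` plays both roles: it gives the cardinality of a set and the length of a string. One caveat worth knowing: in Python, the literal `{}` is an empty dict, not an empty set, which is why the table uses `set()`.

```python
empty_set = set()        # the empty set
empty_string = ""        # the empty string
set_of_empty = {""}      # a set whose only element is the empty string

print(len(empty_set))    # 0: cardinality of the empty set
print(len(empty_string)) # 0: length of the empty string
print(len(set_of_empty)) # 1: it has exactly one element
print(empty_set == set_of_empty)  # False: {""} is not empty
print(type({}))          # <class 'dict'>: {} is a dict literal in Python
```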

String Operations

  • We denote indexing into a string using square brackets [] or subscripts.
    For example, if \(w = \string{hello}\), then \(w[1] = w_1 = \string{h}\) and \(w[5] = w_5 = \string{o}\).
    • If \(|w| = n\), then we can write \(w = w_1 w_2 \ldots w_n\) or \(w = w[1] w[2] \ldots w[n]\).
  • We denote a substring of \(w\) starting at position \(i\) and ending at position \(j\) as \(w[i..j]\).
    • For example, if \(w = \string{hello}\), then \(w[2..4] = \string{ell}\).
    • If \(|w| = n\) then \(w[n] = w[n..n]\) refers to the \(n^\text{th}\) symbol.
      (Again, we don’t distinguish between symbols and strings of length 1.)
    • Note that substrings differ from subsequences in that substrings are contiguous.
      (\(\string{el}\) is a substring of \(\string{hello}\), whereas \(\string{eo}\) is only a subsequence.)
  • The reverse of \(w\), denoted \(\reverse{w}\), is the string obtained by writing the symbols of \(w\) in reverse order. For example, if \(w = \string{hello}\), then \(\reverse{w} = \string{olleh}\).
  • A prefix \(x\) of \(w\), denoted \(x \prefix w\), is a substring of the form \(w[1..i]\) for some \(i\).
    It is a proper prefix (\(x \prefixproper w\)) if \(i < |w|\). \[ \string{he} \prefixproper \string{hello}, \quad \string{he} \prefix \string{hello}, \quad \string{hello} \prefix \string{hello} \]
  • A suffix \(y\) of \(w\) is a substring of the form \(w[j..|w|]\) for some \(j\). Equivalently, \(y\) is a suffix of \(w\) if and only if \(\reverse{y} \prefix \reverse{w}\).
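A small Python sketch of the operations above. Python indexes strings from 0 and its slices exclude the end position, so these hypothetical helpers (`sym`, `sub`, `reverse`, `is_prefix`, `is_proper_prefix`, `is_suffix`) shift indices to match the 1-based, inclusive \(w[i..j]\) notation used here.

```python
def sym(w: str, i: int) -> str:
    """w[i] in 1-based notation."""
    return w[i - 1]

def sub(w: str, i: int, j: int) -> str:
    """w[i..j]: the contiguous substring from position i through j, inclusive."""
    return w[i - 1 : j]

def reverse(w: str) -> str:
    """The symbols of w in reverse order."""
    return w[::-1]

def is_prefix(x: str, w: str) -> bool:
    """x is a prefix of w iff x == w[1..i] for some i."""
    return w.startswith(x)

def is_proper_prefix(x: str, w: str) -> bool:
    """A prefix that is strictly shorter than w."""
    return is_prefix(x, w) and len(x) < len(w)

def is_suffix(y: str, w: str) -> bool:
    """Uses the characterization: y is a suffix of w iff reverse(y) is a prefix of reverse(w)."""
    return is_prefix(reverse(y), reverse(w))

w = "hello"
print(sym(w, 1), sym(w, 5))  # h o
print(sub(w, 2, 4))          # ell
print(reverse(w))            # olleh
print(is_proper_prefix("he", w), is_prefix(w, w), is_proper_prefix(w, w))  # True True False
print(is_suffix("llo", w))   # True
```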

More String Operations

  • Strings \(x\) and \(y\) can be concatenated to form a new string of length \(|x| + |y|\), denoted \[ xy = x \circ y = x_1 x_2 \ldots x_{|x|} y_1 y_2 \ldots y_{|y|}. \] For example, if \(x = \string{hello}\) and \(y = \string{world}\), then \(xy = \string{helloworld}\).
  • We can concatenate a string with itself \(k \in \N\) times by writing \[ w^k = \underbrace{w w \ldots w}_{k \text{ times}}. \] For example, if \(w = \string{abb}\), then \(w^3 = \string{abbabbabb}\).
    • This can be defined formally using induction on \(k\): \[ w^k = \begin{cases} \emptystring & \text{if } k = 0 \\ w w^{k-1} & \text{if } k > 0 \end{cases} \]
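The inductive definition of \(w^k\) translates almost line for line into a recursive function. This is a sketch; the name `power` is hypothetical, and Python's built-in `w * k` already computes the same string.

```python
def power(w: str, k: int) -> str:
    """w^k, following the inductive definition: w^0 = empty string, w^k = w w^(k-1)."""
    if k < 0:
        raise ValueError("k must be a natural number")
    if k == 0:
        return ""                 # base case: the empty string
    return w + power(w, k - 1)    # inductive case: w concatenated with w^(k-1)

print(power("abb", 3))               # abbabbabb
print(power("abb", 0) == "")         # True
print(power("abb", 3) == "abb" * 3)  # True: agrees with Python's * on strings
print(len("hello" + "world") == len("hello") + len("world"))  # True: |xy| = |x| + |y|
```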

Sets of Strings

  • The set of all strings over an alphabet \(\Sigma\) is denoted \(\Sigma^*\). \[ \string{\{0, 1\}^*} = \string{\{\emptystring, 0, 1, 00, 01, 10, 11, 000, \ldots\}} \] The set of all strings of length \(n\) is denoted \(\Sigma^n = \setbuild{ x \in \Sigma^* }{|x| = n}\).

    • Similarly, \(\Sigma^{\le n} = \setbuild{ x \in \Sigma^* }{|x| \le n}\) and \(\Sigma^{< n} = \setbuild{ x \in \Sigma^* }{|x| < n}\).
  • Given a set of strings \(A\), we can order its elements by first comparing their lengths and then doing a “dictionary order” comparison.

    • Formally, \(x < y\) if and only if: \[ (|x| < |y|) \lor \bigg((|x| = |y|) \land \underbrace{\exists i \Big((x[1..i] = y[1..i]) \land (x[i+1] < y[i+1])\Big)}_{\text{wherever they first disagree, $x$ has the ``smaller'' symbol}}\bigg) \]
    • This assumes the alphabet has some “natural” ordering (e.g., \(\string{0 < 1}\), \(\string{a < b < \ldots < z}\)).
    • The resulting order is called shortlex or length-lexicographic order.
  • A set of strings is called a language.

    A set of strings is also called a decision problem!
    (A yes/no question about strings, defined by
    the set of strings for which the answer is “yes”.)

  • A set of languages is called a class.
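A Python sketch that enumerates \(\Sigma^n\) and \(\Sigma^{\le n}\) and transcribes the formal definition of the shortlex order. It assumes the alphabet is passed as a list in its natural order (for example `["0", "1"]`); the function names are hypothetical.

```python
from itertools import product

def strings_of_length(sigma: list[str], n: int) -> list[str]:
    """Sigma^n, listed in dictionary order.

    sigma must be given in its natural order; itertools.product then yields
    the tuples in lexicographic order with respect to that list.
    """
    return ["".join(t) for t in product(sigma, repeat=n)]

def strings_up_to(sigma: list[str], n: int) -> list[str]:
    """Sigma^{<= n} in shortlex order: all length-0 strings, then length 1, and so on."""
    return [x for k in range(n + 1) for x in strings_of_length(sigma, k)]

def shortlex_less(x: str, y: str, sigma: list[str]) -> bool:
    """Direct transcription of the formal definition of x < y."""
    rank = {symbol: r for r, symbol in enumerate(sigma)}
    if len(x) != len(y):
        return len(x) < len(y)    # shorter strings come first
    for a, b in zip(x, y):        # equal lengths: find the first disagreement
        if a != b:
            return rank[a] < rank[b]
    return False                  # x == y, so x < y does not hold

print(strings_up_to(["0", "1"], 2))           # ['', '0', '1', '00', '01', '10', '11']
print(shortlex_less("b", "ab", ["a", "b"]))   # True: the shorter string comes first
print(shortlex_less("ba", "ab", ["a", "b"]))  # False: first disagreement has b > a

# When the alphabet's order matches Python's character order (as for 0 < 1 and
# a < ... < z), sorting by the key (length, string) produces the same order.
words = ["ba", "b", "abc", "a", "ab"]
print(sorted(words, key=lambda s: (len(s), s)))  # ['a', 'b', 'ab', 'ba', 'abc']
```

Since \(\Sigma^*\) is infinite, the sketch can only list finite slices of it such as \(\Sigma^{\le n}\).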

Operations on Languages

  • If \(A\) and \(B\) are languages, then their concatenation is defined as \[ AB = A \circ B = \setbuild{xy}{x \in A \text{ and } y \in B}. \]
    • Example: \(A = \string{\{0, 11, 222\}}\), \(B = \string{\{000, 11, 2\}}\) \[ AB = \string{\{0000, 011, 02, 11000, 1111, 112, 222000, 22211, 2222\}}. \]
  • We can repeatedly concatenate a language with itself:
    • \(A^n = \underbrace{A A \ldots A}_{n \text{ times}}\)
    • \(A^{\le n} = A^0 \cup A^1 \cup \ldots \cup A^n = \bigcup_{i=0}^n A^i\)
    • \(A^* = \bigcup_{i=0}^\infty A^i\)

    Warning: when dealing with an ordinary set \(A\),
    \(A^2\) instead denotes the Cartesian product \(A \times A\)
    (all ordered pairs of elements from \(A\)).

  • Examples:
    • \(\{0, 44\}^2 = \{00, 044, 440, 4444\}\)
    • \(\{0, 1\}^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}\)
    • \(\{0, 44\}^0 = \{\emptystring\}\)
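A Python sketch of language concatenation and powers, representing a finite language as a Python `set` of `str`. The helper names are hypothetical. The power is computed inductively (\(A^0 = \{\emptystring\}\), \(A^n = A A^{n-1}\)), mirroring the definition of \(w^k\); \(A^*\) is usually infinite, so only \(A^{\le n}\) is computed.

```python
def concat(A: set[str], B: set[str]) -> set[str]:
    """AB = {xy : x in A and y in B}."""
    return {x + y for x in A for y in B}

def lang_power(A: set[str], n: int) -> set[str]:
    """A^n, built inductively: A^0 = {empty string}, A^n = A A^(n-1)."""
    result = {""}
    for _ in range(n):
        result = concat(A, result)
    return result

def lang_power_up_to(A: set[str], n: int) -> set[str]:
    """A^{<= n} = A^0 union A^1 union ... union A^n."""
    return set().union(*(lang_power(A, i) for i in range(n + 1)))

print(sorted(concat({"0", "11", "222"}, {"000", "11", "2"})))
print(sorted(lang_power({"0", "44"}, 2)))  # ['00', '044', '440', '4444']
print(lang_power({"0", "44"}, 0))          # {''}
print(len(lang_power({"0", "1"}, 3)))      # 8
```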

Cardinality of Languages

  • The cardinality \(|A|\) of a language \(A\) is the number of strings in \(A\).

  • What is the cardinality of \(\Sigma^k\)?

    • \(2^{|\Sigma|}\)
    • \(k |\Sigma|\)
    • \(|\Sigma|^k\)
    • Not enough information

  • The cardinality of \(\Sigma^k\) is \(|\Sigma|^k\): each of the \(k\) positions can independently hold any of the \(|\Sigma|\) symbols.

  • Example: let \(A = \string{\{1, 11\}}\) and \(k = 2\).
    • \(A \times A = \string{\{(1, 1), {\color{red}(1, 11), (11, 1)}, (11, 11)\}}\) has cardinality 4.
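    • \(A^2 = \string{\{11, 111, 1111\}}\) has cardinality 3, because the distinct pairs \(\string{(1, 11)}\) and \(\string{(11, 1)}\) both concatenate to \(\string{111}\).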

Similarly, in Python, ("1", "11") == "111" is False.
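A short Python check of these counts, using `itertools.product` both to build \(\Sigma^k\) and to form the Cartesian product \(A \times A\). The alphabet `["0", "1", "2"]` and `k = 4` are arbitrary choices for illustration.

```python
from itertools import product

sigma = ["0", "1", "2"]
k = 4
sigma_k = {"".join(t) for t in product(sigma, repeat=k)}
print(len(sigma_k) == len(sigma) ** k)  # True: 3^4 = 81

A = {"1", "11"}
pairs = set(product(A, repeat=2))       # A x A: ordered pairs (tuples)
A2 = {x + y for x, y in pairs}          # A^2: concatenations of those pairs
print(len(pairs), len(A2))              # 4 3
print(("1", "11") == ("11", "1"))       # False: different ordered pairs
print("1" + "11" == "11" + "1")         # True: both are the string "111"
```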

Finite Automata (Chapter 3)

Models of Computation

  • To answer our “what is a computer” question from last lecture, we will study several computational models of increasing power.
  • The simplest model is a finite state machine or finite automaton (FA).
    • FAs are a good model for computers with extremely limited memory.
    • Beyond theory, they are also remarkably useful for:
      • implementing advanced search features in text editors;
      • implementing part of a compiler’s front end (the lexer, or tokenizer);
      • formally defining network protocols (e.g., TCP);
      • designing simple embedded electronic systems;
      • structuring the code of a complicated user interface involving many states;
      • verifying software correctness using model checking;
      • and more…
  • Specifically, we will start with deterministic finite automata (DFAs), and show how they can solve a certain class of decision problems.

Deterministic Finite Automata

A DFA can be visually specified by a state diagram:

[Figure: state diagram of the example DFA (data/images/dfa/dfa_m1/diagram.svg)]

(A DFA can be seen as a special type of directed graph with labeled edges and nodes.)

  • This example has three states: \(q_1\), \(q_2\), and \(q_3\).
  • State \(q_1\) is the start state (denoted by an arrow pointing to it from nowhere).
  • State \(q_2\) is an accept state (denoted by a double circle).
  • The arrows between states are called transitions.

Operation of a DFA

  • A DFA processes an input string one symbol at a time.
  • After reading the last input symbol, it produces a Boolean output: “accept” or “reject” depending on whether it ends up in an accept state.

Example processing the string 1101:

  • Begin in the start state \(q_1\).
  • Read the first symbol, 1, and transition to state \(q_2\).
  • Read the second symbol, 1, and transition to state \(q_2\) (staying where we are).
  • Read the third symbol, 0, and transition to state \(q_3\).
  • Read the fourth symbol, 1, and transition to state \(q_2\).
  • Since the final state \(q_2\) is an accept state, the DFA accepts the string.
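A minimal Python sketch of this processing loop, with the transition function stored as a dictionary. The table is partly an assumption: the trace above only pins down four transitions (\(q_1\) on 1, \(q_2\) on 1, \(q_2\) on 0, \(q_3\) on 1). The two entries marked "assumed" (\(q_1\) on 0 goes to \(q_1\), \(q_3\) on 0 goes to \(q_2\)) are guesses modeled on the well-known three-state example in Sipser's textbook and should be checked against the diagram. The names `DELTA`, `START`, `ACCEPT`, and `run_dfa` are hypothetical.

```python
# Transition function delta, stored as a dict: (state, symbol) -> next state.
# ASSUMPTION: the two entries marked "assumed" are not confirmed by the trace.
DELTA = {
    ("q1", "0"): "q1",  # assumed
    ("q1", "1"): "q2",
    ("q2", "0"): "q3",
    ("q2", "1"): "q2",
    ("q3", "0"): "q2",  # assumed
    ("q3", "1"): "q2",
}
START = "q1"
ACCEPT = {"q2"}

def run_dfa(w: str) -> tuple[bool, list[str]]:
    """Process w one symbol at a time; return (accepted?, states visited)."""
    states = [START]
    for symbol in w:
        # A KeyError here means the symbol is not in the alphabet {0, 1}.
        states.append(DELTA[(states[-1], symbol)])
    return states[-1] in ACCEPT, states

accepted, states = run_dfa("1101")
print(states)    # ['q1', 'q2', 'q2', 'q3', 'q2']
print(accepted)  # True
```

The loop only ever consults `states[-1]`, reflecting that a DFA remembers nothing except its current state; the full list is kept only to reproduce the step-by-step trace.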

Operation of a DFA

[Figure: state diagram of the example DFA (data/images/dfa/dfa_m1/diagram.svg)]

Which string(s) below are accepted by this DFA?

  • \(\string{0000}\)
  • \(\string{10101}\)
  • \(\string{101010}\)
  • \(\string{0100100}\)