
ECS 120 Theory of Computation
Asymptotic analysis
Julian Panetta
University of California, Davis

Algorithms

  • Informally, an algorithm is a sequence of steps that can be followed to solve a problem.
  • Formally, we can define an algorithm as a procedure that can be implemented by a Turing machine.
  • However, after gaining some practice with Turing machines, we typically won’t specify the details of the machine.
  • Instead we can describe algorithms formally and precisely with a higher-level language (like Python).
  • We will argue that implementing a Python program on a Turing machine, while slower, won’t change its “complexity class.” (See HW3)

Object encoding for Turing machines

  • But programs can obviously operate on fancier objects than strings.
    How can a Turing machine do this too?
  • For example, how can a TM read a graph?
    [Figure: example graph with nodes 1–4 and edges (1,2), (2,3), (3,1), (1,4)]
    • Node and edge lists: \(\encoding{G} = \text{(binary encoding of)}\ ((1,2,3,4),\ ((1,2),(2,3),(3,1),(1,4)))\)

    • Adjacency matrix: \[ \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix} \quad \Longrightarrow \quad \encoding{G} = 0111101011001000 \]

      How can we determine the number of nodes?

      We use \(\encoding{O}\) to denote the encoding of an object \(O\) as a string in \(\binary^*\). We can encode multiple objects with \(\encoding{(O_1, O_2, \ldots, O_k)}\) or just \(\encoding{O_1, O_2, \ldots, O_k}\).
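
As a concrete illustration, here is a small Python sketch (ours, not part of the course materials; the helper names are assumptions) that builds the adjacency-matrix encoding of the example graph and recovers the number of nodes as \(\sqrt{|\encoding{G}|}\):

    # Sketch: adjacency-matrix encoding of the example graph (helper names are ours).
    nodes = [1, 2, 3, 4]
    edges = [(1, 2), (2, 3), (3, 1), (1, 4)]

    def encode_adjacency_matrix(nodes, edges) -> str:
        """Row-major adjacency matrix flattened into a bit string of length n^2."""
        n = len(nodes)
        adj = [[0] * n for _ in range(n)]
        for (u, v) in edges:          # undirected graph: mark both directions
            adj[u - 1][v - 1] = 1
            adj[v - 1][u - 1] = 1
        return "".join(str(bit) for row in adj for bit in row)

    def decode_num_nodes(encoding: str) -> int:
        """The matrix encoding has length n^2, so n is its square root."""
        n = round(len(encoding) ** 0.5)
        assert n * n == len(encoding)
        return n

    enc = encode_adjacency_matrix(nodes, edges)
    print(enc)                    # 0111101011001000, matching the matrix above
    print(decode_num_nodes(enc))  # 4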

Object encoding for Turing machines

  • The choice of encoding \(\encoding{\cdot}\) can matter a lot for practical reasons
    • What is the size of the adjacency matrix representation for a graph with \(n\) nodes? \(n^2\) bits.
    • What about the node+edge list representation for a graph with \(n\) nodes and \(m\) edges?
      Proportional to \(n + m\) entries (each taking \(O(\log n)\) bits). For sparse graphs \(m \ll n^2\); see the size comparison sketched after this list.
    • This sort of difference is a big deal, e.g., in high-performance computing.
  • But for the sorts of performance differences we study in this course, we will show any “reasonable” encoding is good enough (won’t change whether an algorithm is polynomial time or not).
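
To make the size difference concrete, here is a small sketch (ours; the formulas simply restate the counts above, ignoring list delimiters) comparing the two encoding sizes for a large sparse graph:

    import math

    def matrix_encoding_bits(n: int) -> int:
        """Adjacency matrix: exactly n^2 bits."""
        return n * n

    def edge_list_encoding_bits(n: int, m: int) -> int:
        """Node + edge lists: about n + 2m identifiers of ceil(log2 n) bits each,
        i.e., proportional to n + m."""
        bits_per_id = max(1, math.ceil(math.log2(n)))
        return (n + 2 * m) * bits_per_id

    # A sparse graph: a million nodes, about 3 edges per node.
    n, m = 1_000_000, 3_000_000
    print(matrix_encoding_bits(n))        # 1,000,000,000,000 bits
    print(edge_list_encoding_bits(n, m))  # 140,000,000 bits -- vastly smaller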

Measuring run time

3Sum Problem:
Given a set \(A\) of \(n\) integers (\(A \subset \Z\), \(|A| = n\)), determine whether there are three (not necessarily distinct) elements that sum to zero (i.e., \(a + b + c = 0\) for some \(a, b, c \in A\)).

  • Two competing algorithms:

    def three_sum_1(A: list[int]) -> bool:
        # Brute force: try every triple of indices i <= j <= k.
        for i in range(len(A)):
            for j in range(i, len(A)):
                for k in range(j, len(A)):
                    if A[i] + A[j] + A[k] == 0:
                        return True
        return False

    def three_sum_2(A: list[int]) -> bool:
        # Precompute all pairwise sums, then look up the negation of each element.
        two_sums = set()
        for i in range(len(A)):
            for j in range(i, len(A)):
                two_sums.add(A[i] + A[j])
        for i in range(len(A)):
            if -A[i] in two_sums:
                return True
        return False
  • How do we know which one is faster?

  • Simple idea: measure the “wall-clock time” it takes to run each on the same input.
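
One way to do this in Python is with time.perf_counter; the sketch below (our harness, not the course's 3sum.py, and the random input is an assumption) measures a single run:

    import random
    import time

    # Assumes three_sum_1 (and three_sum_2) are defined as above.
    A = random.sample(range(-1000, 1000), 50)   # a random input of 50 distinct integers

    start = time.perf_counter()
    result = three_sum_1(A)
    elapsed = time.perf_counter() - start
    print(f"Result: {result}")
    print(f"Elapsed time: {elapsed * 1000:.2f} ms")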

Wall-clock time experiments

$ python 3sum.py 1 50.txt
Result: True
Elapsed time: 0.35 ms
$ python 3sum.py 2 50.txt
Result: True
Elapsed time: 0.20 ms
$ python 3sum.py 1 50.txt
Result: True
Elapsed time: 0.13 ms
$ python 3sum.py 2 50.txt
Result: True
Elapsed time: 0.57 ms
  • One problem: this can be noisy and non-repeatable
    • Other processes running on the computer at the same time.
    • Dynamic clock frequencies (low power mode), cache state
    • We could work around these fluctuations by running the code many times and taking the average.

Averaged across 1000 runs, the same four experiments give: 0.22 ms, 0.26 ms, 0.21 ms, 0.27 ms.
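
Averaging over many runs is easy to automate with Python's standard timeit module; this sketch (ours; it assumes the two functions and an input list A from above) is one way to produce averages like those shown:

    import timeit

    runs = 1000
    for fn in (three_sum_1, three_sum_2):
        total = timeit.timeit(lambda: fn(A), number=runs)   # total seconds for `runs` calls
        print(f"{fn.__name__}: {total / runs * 1000:.2f} ms on average across {runs} runs")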


Wall-clock time experiments

  • Another problem: performance is strongly impacted by low-level implementation details
    • Different programming languages.
    • Different interpreter/compiler versions.
    • Compilation flags when building the code.
    • Different hardware capabilities (CPU vs GPU, etc.)
$ g++ 3sum.cc -o 3sum
$ ./3sum 1 50.txt
Result: True
1000 runs in 3.9 ms
$ ./3sum 2 50.txt
Result: True
1000 runs in 130 ms
$ g++ 3sum.cc -O3 -o 3sum
$ ./3sum 1 50.txt
Result: True
1000 runs in 0.48 ms
$ ./3sum 2 50.txt
Result: True
1000 runs in 41 ms

Compiler optimization made both algorithm implementations faster, but they are still the same algorithms.


The Turing machine as a standardized environment

  • To ensure we’re making a fair comparison of the algorithms themselves, we consider a “standard environment” when analyzing performance.
  • Specifically, we can use a Turing machine as a precisely defined environment.
  • “Time” = “number of steps taken by the Turing machine.”

Let \(M\) be a TM, and \(x \in \binary^*\).
Define \(\texttt{time}_M(x)\) to be the number of configurations that \(M\) visits on input \(x\).

\(\texttt{time}_M(x) = 1\) when \(M\) immediately halts on \(x\)
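
To make this definition concrete, here is a minimal sketch of a single-tape TM simulator that counts visited configurations (the transition-table format is our own assumption, not the course's):

    def run_tm(delta, start, accept, reject, x: str, blank: str = "_"):
        """Simulate a single-tape TM and return (accepted?, time_M(x))."""
        tape = dict(enumerate(x))          # sparse tape: position -> symbol
        state, head, steps = start, 0, 1   # the initial configuration counts, so steps = 1
        while state not in (accept, reject):
            symbol = tape.get(head, blank)
            state, write, move = delta[(state, symbol)]   # move is +1 (right) or -1 (left)
            tape[head] = write
            head = max(0, head + move)     # moving left off the tape's left end stays put
            steps += 1                     # each transition reaches one new configuration
        return state == accept, steps

    # A machine that starts in its accept state halts immediately: time_M(x) = 1.
    print(run_tm(delta={}, start="qA", accept="qA", reject="qR", x="0101"))   # (True, 1)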

Factoring out input dependence

  • “Difficult” inputs can take longer to run than “easy” inputs.
    In the following, inputs num_x.txt and num_y.txt both contain 500 numbers.
$ ./3sum 1 num_x.txt
Result: False
1000 runs in 7.61 seconds
$ ./3sum 2 num_x.txt
Result: False
1000 runs in 0.37 seconds
$ ./3sum 1 num_y.txt
Result: True
1000 runs in 7.5e-07 seconds
$ ./3sum 2 num_y.txt
Result: True
1000 runs in 0.37 seconds

What happened?!

num_x.txt: \(\{373, 351, 694, 389, 300, \cdots \}\)

num_y.txt: \(\{0, 351, 694, 389, 300, \cdots \}\)

The solution \(0+0+0=0\) (taking \(i = j = k\)) is found immediately by algorithm 1; algorithm 2 still builds its full set of pairwise sums before checking, so its time is unchanged.

Factoring out input dependence

  • How do we compare two algorithms \(A, B\) if \(\texttt{time}_A(x) < \texttt{time}_B(x)\) but
    \(\texttt{time}_A(y) > \texttt{time}_B(y)\) even when \(|x| = |y|\)?
  • We address this issue by considering only the size \(n\) of the input and not the contents.
  • We then perform a worst-case analysis:
    we determine the longest possible time \(A\) and \(B\) can run on an input of size \(n\).

If \(M\) is total, define the (worst-case) running time or time complexity of \(M\) to be the function \(t : \N \to \N^+\) such that \[ t(n) = \max_{x \in \binary^n} \texttt{time}_M(x) \]

Why must \(M\) be total? (If \(M\) ran forever on some input \(x\), then \(\texttt{time}_M(x)\), and hence the maximum, would be undefined.)

We call such a function \(t(n)\) a time bound.

Finally, we ask: “How quickly does the running time \(t(n)\) grow as \(n\) increases?”
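
For the 3Sum algorithms above, a worst-case input is easy to construct: if every number is positive, no triple sums to zero, so three_sum_1 can never return early. The sketch below (ours; it reuses the functions defined earlier) is one way to collect worst-case timings like the table that follows:

    import time

    def worst_case_input(n: int) -> list[int]:
        return list(range(1, n + 1))   # n distinct positive integers: no zero-sum triple exists

    for n in (50, 100, 500):
        A = worst_case_input(n)
        for fn in (three_sum_1, three_sum_2):   # defined earlier
            start = time.perf_counter()
            fn(A)
            elapsed = time.perf_counter() - start
            print(f"n={n:4d}  {fn.__name__}: {elapsed:.4f} s")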

Asymptotic analysis

  • Rather than worrying about specific values of \(n\) and \(t(n)\), we consider the growth rate of \(t(n)\) as \(n\) increases.
  • We’ll say an algorithm is faster if it has a smaller growth rate.
    • We have experimental evidence that three_sum_2 is faster than three_sum_1 in this sense:

      Input size    Algorithm 1 time (s)    Algorithm 2 time (s)
      50            0.0118399               0.0447408
      100           0.0749655               0.166695
      500           7.68075                 0.377361

      These times were collected for a “worst-case input”!

    • To formally define and compare the growths of time bound functions \(t(n)\), we use asymptotic analysis.

  • Simplifications:
    • Ignore constant factors (e.g., \(2n^2\) considered equivalent to \(3n^2\)).
    • Consider only the highest-order term (e.g., \(2 n^2 + n + 2\) is equivalent to \(n^2\)).
  • These not only ease analysis but also help us ignore low-level implementation details.

Asymptotic analysis

Given nondecreasing \(f, g : \N \to \R^+\), we write \(f = O(g)\) if there is \(c \in \N\) so that, for all \(n \in \N\), \[ f(n) \le c \cdot g(n). \] We call \(g\) an asymptotic upper bound for \(f\) (like saying \(f \le g\)).

  • This is “Big-O” notation: \(f = O(g)\) means \(f\) grows no faster than \(g\).
  • Important cases:
    • Polynomial bounds: \(f = O(n^c)\) for \(c > 0\) (“\(f\) is polynomially bounded”)
      • Examples: \(n^2, n^3, n^{2.2}, n^{1000}\)
      • Larger than polynomial: \(n^{\log(n)}\) (exponent is not constant; grows with \(n\))
    • Exponential bounds: \(f = O(2^{c n^\delta})\) for some \(c, \delta > 0\)
      • Examples: \(2^n, 2^{100n}, 2^{0.01 n}, (2^n)^2 = 2^{2n}, 2^{n^2}, 2^{\sqrt{n}}, e^{n^2}\ (= 2^{c n^2} \text{ for } c = \log_2 e)\)
      • Also: \(a^{n^\delta}\) for any \(a > 1\) and \(\delta > 0\)

It would be more technically accurate to write \(f \in O(g)\): \(O(g)\) is really a set of functions (all those bounded above by a constant multiple of \(g\)), and \(f\) is just one of its members, so \(f = O(g)\) is a standard abuse of notation.
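
As a quick worked example of the definition (adopting the convention that inputs have length \(n \ge 1\)), the simplification \(2n^2 + n + 2 = O(n^2)\) can be verified by exhibiting a constant: \[ 2n^2 + n + 2 \;\le\; 2n^2 + n^2 + 2n^2 \;=\; 5n^2 \quad \text{for all } n \ge 1, \] so \(c = 5\) works. Conversely, \(n^2 \ne O(n)\): for any fixed \(c\), \(n^2 > c \cdot n\) whenever \(n > c\).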

Asymptotic analysis

Given nondecreasing \(f, g : \N \to \R^+\), we write \(f = o(g)\) if \(\lim\limits_{n \to \infty} \frac{f(n)}{g(n)} = 0\) (like saying \(f < g\)).

  • This is “Little-O” notation: \(f = o(g)\) means \(f\) grows strictly slower than \(g\).
  • We can say algorithm \(A\) is (asymptotically) faster than algorithm \(B\) if \(t_A(n) = o(t_B(n))\).
  • Warning: \(f = O(g)\) together with \(g \ne O(f)\) does not imply \(f = o(g)\). Counterexample (pictured below):
[Figure: a pair \(f, g\) where \(f(n) = g(n)\) infinitely often, but \(g\) pulls arbitrarily far ahead of \(f\) in between]
  • \(f = O(g)\) since \(f(n) \le g(n)\) for all \(n\).
  • \(g \ne O(f)\) since, for any \(c > 0\), \(g(n) > c \cdot f(n)\) for infinitely many \(n\).
  • \(f \ne o(g)\) since \(\frac{f(n)}{g(n)} = 1\) for infinitely many \(n\).

Strategies for comparing growth rates

  • To show \(f = o(g)\) you can always apply the definition and prove: \(\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0\).
  • But usually there is an easier shortcut that you can apply.
    1. First, simplify by removing constants and lower-order terms (without changing the growth rate):
      • Examples: \[10 n^7 + 100 n^4 + n^2 + 10n = O(n^7), \quad\quad \quad\quad 2^n + n^{100} + 2^n = O(2^n) \]
      • This is helpful for the common case of an algorithm with multiple stages of different complexities.
    2. Second, you might notice that \(g(n)\) is of the form \(f(n) \cdot h(n)\) with an unbounded \(h(n)\).
      • Example: \(f(n) = n, \quad g(n) = n \log(n) = f(n) \cdot \underbrace{\log(n)}_{h(n)}\)
      • Since \(\lim\limits_{n\to\infty} \log(n) = \infty\), we conclude \(n = o(n \log(n))\).
    3. Applying a log or raising to a power less than \(1\) shrinks the growth rate.
      \(\log(n) = o(n), \quad \sqrt{n} = o(n), \quad \log(n^4) = o(n^4)\) (actually \(\log(n^4) = 4 \log(n) = O(\log(n))\))
    4. Try taking a log of both functions, since \(\log(f(n)) = o(\log(g(n)))\) implies \(f(n) = o(g(n))\) when \(g\) is unbounded. (These shortcuts are sanity-checked numerically in the sketch after this list.)
      • Example: compare \(n^n\) to \(2^{n^2}\): \(\log(n^n) = n \log n\), \(\log(2^{n^2}) = n^2\), and \(n \log n = o(n^2) \implies n^n = o(2^{n^2})\)
      • Warning: \(f(n) = o(g(n))\) does not imply \(\log(f(n)) = o(\log(g(n)))\) (Counterexample: \(2^n = o(2^{2n})\), but \(n \ne o(2n)\))
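
These shortcuts can be sanity-checked numerically; the sketch below (ours) simply watches the ratio \(f(n)/g(n)\) shrink for a few of the examples above, comparing the strategy-4 pair through their logarithms to avoid astronomically large values:

    import math

    def ratios(f, g, ns):
        """Evaluate f(n)/g(n) at increasing n; a trend toward 0 is consistent with f = o(g)."""
        return [f(n) / g(n) for n in ns]

    ns = [10, 100, 1000, 10_000]
    print(ratios(lambda n: n, lambda n: n * math.log(n), ns))        # n vs n log n
    print(ratios(lambda n: math.log(n), lambda n: n ** 0.5, ns))     # log n vs sqrt(n)
    # Strategy 4: compare n^n to 2^(n^2) via logs: log2(n^n) = n log2(n), log2(2^(n^2)) = n^2.
    print(ratios(lambda n: n * math.log2(n), lambda n: n ** 2, ns))  # n log n vs n^2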

Asymptotic analysis facts to remember

  • \(1 = o(\log \log n)\) (any unbounded function outgrows any constant)
  • \(\log \log n = o(\log n)\)
  • \(\log n = o(n^c)\) for any \(c > 0\)
  • \(\sqrt{n} = o(n)\)
  • \(n^c = o(n^k)\) if \(c < k \quad\) (\(\sqrt{n} = o(n)\) is special case \(c=0.5,k=1\))
  • \(n^c = o(2^{n^\delta})\) for any \(c > 0\) and any \(\delta > 0\)

Convenient notation for “reverse” relations

\[ \begin{aligned} f = O(g) &\iff (\exists c \in \N) (\forall n)\ f(n) \le c \cdot g(n) & f \text{ "$\le$" } g \\ f = o(g) &\iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 & f \text{ "$<$" } g \\ f = \Omega(g) &\iff g = O(f) & f \text{ "$\ge$" } g \\ f = \omega(g) &\iff g = o(f) & f \text{ "$>$" } g \\ f = \Theta(g) &\iff f = O(g) \text{ and } f = \Omega(g) & f \text{ "$=$" } g \end{aligned} \]

  • Very common to see authors write \(f = O(g)\) when they really mean \(f = \Theta(g)\).