Node and edge lists: \(\encoding{G} = \text{(binary encoding of)}\qquad (\fragment{(1,2,3,4)}, \fragment{((1,2),(2,3),(3,1),(1,4))})\)
Adjacency matrix: \[ \fragment{\begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix}} \fragment{\quad \Longrightarrow \quad \encoding{G} = } \fragment{0111101011001000} \]
How can we determine the number of nodes?
We use \(\encoding{O}\) to denote the encoding of an object \(O\) as a string in \(\binary^*\). We can encode multiple objects with \(\encoding{(O_1, O_2, \ldots, O_k)}\) or just \(\encoding{O_1, O_2, \ldots, O_k}\).
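As a concrete sketch, here is one way to compute the adjacency-matrix encoding above in Python (the helper name encode_graph and its interface are ours):

def encode_graph(num_nodes: int, edges: list[tuple[int, int]]) -> str:
    """Encode an undirected graph as the row-major bit string of its adjacency matrix."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:          # nodes are numbered 1..num_nodes
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1
    return "".join(str(bit) for row in matrix for bit in row)

# Reproduces the encoding shown above:
assert encode_graph(4, [(1, 2), (2, 3), (3, 1), (1, 4)]) == "0111101011001000"

This also suggests an answer to the question above: the adjacency-matrix encoding of an \(m\)-node graph has length \(m^2\), so the number of nodes is the square root of the encoding's length.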
3Sum Problem:
Given a set \(A\) of \(n\) integers (\(A \subset \Z\), \(|A| = n\)), determine whether three elements, not necessarily distinct, sum to zero (i.e., \(a + b + c = 0\) for some \(a, b, c \in A\)).
Two competing algorithms:
def three_sum_1(A: list[int]) -> bool:
    # Check every triple of indices i <= j <= k: Theta(n^3) candidate sums.
    for i in range(len(A)):
        for j in range(i, len(A)):
            for k in range(j, len(A)):
                if A[i] + A[j] + A[k] == 0:
                    return True
    return False

def three_sum_2(A: list[int]) -> bool:
    # Precompute all Theta(n^2) pairwise sums, then test whether -A[i] is one of them.
    two_sums = set()
    for i in range(len(A)):
        for j in range(i, len(A)):
            two_sums.add(A[i] + A[j])
    for i in range(len(A)):
        if -A[i] in two_sums:
            return True
    return False
How do we know which one is faster?
Simple idea: measure the “wall-clock time” it takes to run each on the same input.
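A minimal sketch of such a measurement, reusing three_sum_1 and three_sum_2 from above (the stand-in input is ours, since the contents of 50.txt are not shown):

import time

def measure(three_sum, A: list[int], runs: int = 1000) -> float:
    # Average wall-clock seconds per call across `runs` repetitions.
    start = time.perf_counter()
    for _ in range(runs):
        three_sum(A)
    return (time.perf_counter() - start) / runs

A = list(range(1, 51))  # 50 positive integers: no zero-sum triple exists
print(f"three_sum_1: {measure(three_sum_1, A) * 1000:.2f} ms per run")
print(f"three_sum_2: {measure(three_sum_2, A) * 1000:.2f} ms per run")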
$ python 3sum.py 1 50.txt
Result: True
Elapsed time: 0.35 ms
$ python 3sum.py 2 50.txt
Result: True
Elapsed time: 0.20 ms
$ python 3sum.py 1 50.txt
Result: True
Elapsed time: 0.13 ms
$ python 3sum.py 2 50.txt
Result: True
Elapsed time: 0.57 ms
Individual timings fluctuate from run to run, so we average. Across 1000 runs, algorithm 1 averages about 0.22 ms and 0.21 ms per run, while algorithm 2 averages about 0.26 ms and 0.27 ms. What if we implement the same two algorithms in C++ (3sum.cc)?
$ g++ 3sum.cc -o 3sum
$ ./3sum 1 50.txt
Result: True
1000 runs in 3.9 ms
$ ./3sum 2 50.txt
Result: True
1000 runs in 130 ms
$ g++ 3sum.cc -O3 -o 3sum
$ ./3sum 1 50.txt
Result: True
1000 runs in 0.48 ms
$ ./3sum 2 50.txt
Result: True
1000 runs in 41 ms
Compiler optimization made both algorithm implementations faster, but they are still the same algorithms.
Wall-clock time depends on the language, machine, and compiler, so we now define running time in a machine-independent way, using Turing machines. Let \(M\) be a TM, and \(x \in \binary^*\).
Define \(\texttt{time}_M(x)\) to be the number of configurations that \(M\) visits on input \(x\).
\(\texttt{time}_M(x) = 1\) when \(M\) immediately halts on \(x\)
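As an illustrative sketch, here is a one-tape TM simulator that counts configurations (the dictionary representation of the transition table is an assumption of ours, not the course's):

def tm_time(delta, start_state, x: str) -> int:
    # delta maps (state, symbol) -> (next state, symbol to write, head move in {-1, +1});
    # a missing entry means the machine halts. Blank cells read as "_".
    tape = dict(enumerate(x))
    state, head, steps = start_state, 0, 1  # count the initial configuration
    while (state, tape.get(head, "_")) in delta:
        state, symbol, move = delta[(state, tape.get(head, "_"))]
        tape[head] = symbol
        head += move
        steps += 1
    return steps

# A machine that only scans right to the first blank visits |x| + 1 configurations:
scan_right = {("q0", "0"): ("q0", "0", +1), ("q0", "1"): ("q0", "1", +1)}
assert tm_time(scan_right, "q0", "101") == 4

With an empty transition table the loop never runs and tm_time returns 1, matching the convention above.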
num_x.txt and num_y.txt both contain 500 numbers.
$ ./3sum 1 num_x.txt
Result: False
1000 runs in 7.61 seconds
$ ./3sum 2 num_x.txt
Result: False
1000 runs in 0.37 seconds
$ ./3sum 1 num_y.txt
Result: True
1000 runs in 7.5e-07 seconds
$ ./3sum 2 num_y.txt
Result: True
1000 runs in 0.37 seconds
What happened?!
num_x.txt: \(\{373, 351, 694, 389, 300, \cdots \}\)
num_y.txt: \(\{0, 351, 694, 389, 300, \cdots \}\)
The first number in num_y.txt is 0, so algorithm 1 immediately finds the solution \(0+0+0=0\) (taking \(i = j = k = 0\)) and returns; algorithm 2 still builds the full set of pairwise sums before its first lookup.
If \(M\) is total, define the (worst-case) running time or time complexity of \(M\) to be the function \(t : \mathbb{N} \to \mathbb{N}^+\) such that \[ t(n) = \max_{x \in \binary^n} \texttt{time}_M(x) \]
Why must \(M\) be total? If \(M\) ran forever on some input \(x\), \(\texttt{time}_M(x)\) would be undefined (infinite), and the maximum would not exist.
We call such a function \(t(n)\) a time bound.
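Continuing the earlier sketch (reusing tm_time and scan_right from above), the time bound can be computed directly by brute force over all \(2^n\) inputs, which is feasible only for tiny \(n\):

from itertools import product

def worst_case_time(delta, start_state, n: int) -> int:
    # t(n) = max of time_M(x) over all x in {0,1}^n
    return max(tm_time(delta, start_state, "".join(bits))
               for bits in product("01", repeat=n))

assert worst_case_time(scan_right, "q0", 3) == 4  # t(n) = n + 1 for scan_right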
Finally, we ask: “How quickly does the running time \(t(n)\) grow as \(n\) increases?”
We have experimental evidence that three_sum_2 is faster than three_sum_1 in this sense:
| Input size \(n\) | Algorithm 1: 1000 runs (s) | Algorithm 2: 1000 runs (s) |
|---|---|---|
| 50 | 0.0118399 | 0.0447408 |
| 100 | 0.0749655 | 0.166695 |
| 500 | 7.68075 | 0.377361 |
These times were collected for "worst-case inputs": sets containing no zero-sum triple, so neither algorithm can return early. From \(n = 50\) to \(n = 500\), algorithm 1's time grows by a factor of about 650, while algorithm 2's grows by less than a factor of 10.
To formally define and compare the growths of time bound functions \(t(n)\), we use asymptotic analysis.
Given nondecreasing \(f, g : \N \to \R^+\), we write \(f = O(g)\) if there is \(c \in \N\) so that, for all \(n \in \N\), \[ f(n) \le c \cdot g(n). \] We call \(g\) an asymptotic upper bound for \(f\). (Like saying \(f\) “\(\le\)” \(g\))
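For example, \(f(n) = 3n^2 + 10\) is \(O(g)\) for \(g(n) = n^2 + 1\), with \(c = 10\) as the witness: \[ 3n^2 + 10 \;\le\; 10n^2 + 10 \;=\; 10 \cdot (n^2 + 1) \quad \text{for all } n \in \N. \] (We take \(g(n) = n^2 + 1\) rather than \(n^2\) so that \(g\) maps into \(\R^+\).)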
It would be more technically accurate to write \(f \in O(g)\): \(O(g)\) is really a set of functions, of which \(f\) is one member; the \(=\) is a traditional abuse of notation.
Given nondecreasing \(f, g : \N \to \R^+\), we write \(f = o(g)\) if \(\lim\limits_{n \to \infty} \frac{f(n)}{g(n)} = 0\). (Like saying \(f\) “\(<\)” \(g\))
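For example, \[ \lim_{n \to \infty} \frac{n}{2^n} = 0 \;\Longrightarrow\; n = o(2^n), \qquad \lim_{n \to \infty} \frac{2n}{n} = 2 \neq 0 \;\Longrightarrow\; 2n \neq o(n), \] even though \(2n = O(n)\).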
\[ \begin{aligned} f = O(g) &\iff (\exists c \in \N) (\forall n)\ f(n) \le c \cdot g(n) & f \text{ "$\le$" } g \\ f = o(g) &\iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 & f \text{ "$<$" } g \\ f = \Omega(g) &\iff g = O(f) & f \text{ "$\ge$" } g \\ f = \omega(g) &\iff g = o(f) & f \text{ "$>$" } g \\ f = \Theta(g) &\iff f = O(g) \text{ and } f = \Omega(g) & f \text{ "$=$" } g \end{aligned} \]
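Returning to 3Sum: counting operations (and assuming, as a simplification, constant-time arithmetic and expected constant-time set operations), three_sum_1 examines all \(\Theta(n^3)\) triples on a worst-case input, while three_sum_2 computes \(\Theta(n^2)\) pairwise sums followed by \(n\) membership tests. Since \(n^2 = o(n^3)\), the time bound of three_sum_2 grows strictly more slowly, matching the measurements above.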