3Sum Problem:
Given a set \(A\) of \(n\) integers (\(A \subset \Z\), \(|A| = n\)), check whether there are three elements that sum to zero (i.e., \(a + b + c = 0\) for some \(a, b, c \in A\), not necessarily distinct).
Two competing algorithms:
def three_sum_1(A):
    for i in range(len(A)):
        for j in range(i, len(A)):
            for k in range(j, len(A)):
                if A[i] + A[j] + A[k] == 0:
                    return True
    return False
def three_sum_2(A):
    two_sums = set()
    for i in range(len(A)):
        for j in range(i, len(A)):
            two_sums.add(A[i] + A[j])
    for i in range(len(A)):
        if -A[i] in two_sums:
            return True
    return False
How do we know which one is faster?
Simple idea: measure the “wall-clock time” it takes to run each on the same input.
$ python 3sum.py 1 50.txt
Result: True
Elapsed time: 0.0003481250000000047
$ python 3sum.py 2 50.txt
Result: True
Elapsed time: 0.00020370899999999637
$ python 3sum.py 1 50.txt
Result: True
Elapsed time: 0.00013012499999999483
$ python 3sum.py 2 50.txt
Result: True
Elapsed time: 0.0005745000000000056
Single measurements are noisy, so we time 1000 runs of each instead:
1000 runs in 0.21506016700000002
1000 runs in 0.26039516700000004
1000 runs in 0.20773370800000002
1000 runs in 0.27380958299999997
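A minimal sketch of such a wall-clock measurement using `time.perf_counter` (the harness and the stand-in workload below are illustrative, not the actual contents of 3sum.py):

```python
import time

def time_runs(fn, arg, runs=1000):
    """Total wall-clock time of `runs` calls to fn(arg)."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(arg)
    return time.perf_counter() - start

# Stand-in workload; in 3sum.py this would be three_sum_1 or
# three_sum_2 applied to the numbers read from 50.txt.
elapsed = time_runs(sum, list(range(50)))
print(f"1000 runs in {elapsed}")
```

`time.perf_counter` is a monotonic clock intended exactly for measuring short intervals like this.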
$ g++ 3sum.cc -o 3sum
$ ./3sum 1 50.txt
Result: True
1000 runs in 0.00394083 seconds
$ ./3sum 2 50.txt
Result: True
1000 runs in 0.13204 seconds
$ g++ 3sum.cc -O3 -o 3sum
$ ./3sum 1 50.txt
Result: True
1000 runs in 0.000488875 seconds
$ ./3sum 2 50.txt
Result: True
1000 runs in 0.0416488 seconds
Let \(M\) be a TM, and \(x \in \binary^*\).
Define \(\texttt{time}_M(x)\) to be the number of configurations that \(M\) visits on input \(x\).
What is \(\texttt{time}_M(x)\) when \(M\) immediately halts on \(x\)?
num_x.txt and num_y.txt both contain 500 numbers.
$ ./3sum 1 num_x.txt
Result: False
1000 runs in 7.61071 seconds
$ ./3sum 2 num_x.txt
Result: False
1000 runs in 0.373348 seconds
$ ./3sum 1 num_y.txt
Result: True
1000 runs in 7.5e-07 seconds
$ ./3sum 2 num_y.txt
Result: True
1000 runs in 0.371461 seconds
What happened!?
num_x.txt: \(\{373, 351, 694, 389, 300, \cdots \}\)
num_y.txt: \(\{0, 351, 694, 389, 300, \cdots \}\)
Since num_y.txt contains \(0\), three_sum_1 finds \(0 + 0 + 0 = 0\) on its very first iteration and returns immediately; three_sum_2, by contrast, always builds the full set of pairwise sums before answering, so its running time barely changes.
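The early exit can be made concrete with an instrumented copy of three_sum_1 (the counter is our addition, not part of the lecture code):

```python
def three_sum_1_counted(A):
    """three_sum_1 with a counter for the number of triples checked."""
    checks = 0
    for i in range(len(A)):
        for j in range(i, len(A)):
            for k in range(j, len(A)):
                checks += 1
                if A[i] + A[j] + A[k] == 0:
                    return True, checks
    return False, checks

# With 0 first, the very first triple (i = j = k = 0) already sums to 0.
print(three_sum_1_counted([0, 351, 694, 389, 300]))   # → (True, 1)
```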
If \(M\) is total, define the (worst-case) running time or time complexity of \(M\) to be the function \(t : \mathbb{N} \to \mathbb{N}^+\) such that \[ t(n) = \max_{x \in \binary^n} \texttt{time}_M(x) \]
Why must \(M\) be total?
We call such a function \(t(n)\) a time bound.
Finally, we ask: “How quickly does the running time \(t(n)\) grow as \(n\) increases?”
We have experimental evidence that three_sum_2 is faster than three_sum_1 in this sense:
| Input Size | Algorithm 1 Time (s) | Algorithm 2 Time (s) |
|---|---|---|
| 50 | 0.0118399 | 0.0447408 |
| 100 | 0.0749655 | 0.166695 |
| 500 | 7.68075 | 0.377361 |
These times were collected for a “worst-case input”!
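One way to construct such a worst-case input (a sketch; the helper `worst_case_input` is our own name, not from the lecture): pick distinct positive integers, so that no triple can sum to zero and neither algorithm can exit early.

```python
import random

def worst_case_input(n):
    """n distinct positive integers: no triple sums to zero,
    so both algorithms must do their full amount of work."""
    return random.sample(range(1, 10**6), n)

A = worst_case_input(500)
# min(A) >= 1, so every triple sums to at least 3 and the answer is
# False; three_sum_1 cannot return early and inspects all the triples.
```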
To formally define and compare the growths of time bound functions \(t(n)\), we use asymptotic analysis.
Given nondecreasing \(f, g : \N \to \R^+\), we write \(f = O(g)\) if there exists \(c \in \N\) such that \[ f(n) \le c \cdot g(n) \quad \text{for all } n \] We call \(g\) an asymptotic upper bound for \(f\).
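For example, \(2n + 3 = O(n + 1)\): taking \(c = 3\), \[ 2n + 3 \le 3n + 3 = 3(n + 1) \quad \text{for all } n. \]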
Given nondecreasing \(f, g : \N \to \R^+\), we write \(f = o(g)\) if \[ \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \]
Proof?
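For example, \(n + 1 = o(n^2 + 1)\), since \[ \lim_{n \to \infty} \frac{n + 1}{n^2 + 1} = 0, \] whereas \(n^2 + 1 \ne o(n^2 + 1)\): there the ratio is constantly \(1\).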