2. Runtime Complexity & Analysis
CS331 - 2021 Spring
Runtime Analysis
- To understand the behavior of an algorithm we want to understand how the runtime of the algorithm is affected by the size of its input
- This is called runtime complexity
Empirical Runtime Analysis
Empirical measurements
- Empirically measure runtime:
- Run our algorithms with different input sizes
- Measure runtime
- Try to fit empirically observed runtime to a function
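- A minimal sketch of this approach (the helper name measure and the choice of the built-in sorted as the algorithm under test are illustrative, not from the lecture):
import random
import time

def measure(algorithm, make_input, sizes):
    # run the algorithm on inputs of increasing size and record the wall-clock time
    results = []
    for n in sizes:
        data = make_input(n)
        start = time.perf_counter()
        algorithm(data)
        results.append((n, time.perf_counter() - start))
    return results

# e.g., time sorted on random lists of growing size, then try to fit
# a function to the resulting (size, time) pairs
print(measure(sorted, lambda n: [random.random() for _ in range(n)], [10**3, 10**4, 10**5]))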
Problems with empirical measurements
- We only vary the input size
- runtime may also depend on how adversarial the input is
- determining how characteristics of the input other than size affect runtime can be hard
- We may observe random effects
- other process running on the machine kicks in
- OS decides to do some heavy task while we are measuring
- Our results are machine- / language- / environment-dependent
- Our results may not translate well to other environments
- It may be hard to determine the growth rate of our empirically measured results
- Multiple functions may fit our data with the same error
- Which one is the correct one?
Take-away
- We want to reason about:
- how bad can it get (worst-case runtime)
- expectation (average-case runtime)
- best result we may get (best-case runtime)
- We want to reason about runtime independent of the environment the algorithm is run in!
Problem Definition
- Given an algorithm we want to find a function $T(n)$ that computes the runtime of the algorithm for an input of size $n$
- $T(n)$ = number of instructions the algorithm executes to compute the output
- We want to do this for worst-case / average-case / best-case
- In this class we will mostly focus on worst-case analysis
Measuring Input Size
- Number of bits
- independent of encoding
- works for everything
- not always intuitive
- Number of elements in the input
- e.g., sort(array)
- input size is the number of elements to sort
- correspondence to number of bits
- assume that values are of fixed size
- Size of an input element
- e.g., factorial(n)
- input size is the magnitude of the input number
- correspondence to number of bits
- measure the number's size in bits
- e.g., gcd(m,n)
- find the greatest common divisor of m and n
- Option 1: $max(n,m)$ - the largest input
- Option 2: $max(n,c)$ - only vary one input and consider the others as constant
- get potentially different runtime behavior for different inputs (not for gcd though)
- here probably option 1
- example for option 2: find(str,doc) finds occurrences of string str in doc
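- For reference, a minimal sketch of gcd using Euclid's algorithm (an assumption; the lecture does not fix an implementation):
def gcd(m, n):
    # Euclid's algorithm: the number of iterations depends on the
    # magnitudes of m and n, i.e., on option 1's input size max(n,m)
    while n != 0:
        m, n = n, m % n
    return m

print(gcd(12, 18))  # 6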
How to reason about runtime independent of an environment?
def sum(l: list): # input size: n = len(l)
sum = 0 # cost: c1 #executions: 1
for x in l: # cost: c2 #executions: n
sum += x # cost: c3 #executions: n
print(sum) # cost: c4 #executions: 1
$$T(n) = c_1 \cdot 1 + c_2 \cdot n + c_3 \cdot n + c_4 \cdot 1$$
$$= n \cdot (c_2 + c_3) + c_1 + c_4$$
How to reason about runtime independent of an environment?
- The constant costs of executing a statement are environment-specific
- Let’s ignore them!
$$T(n) = c_1 \cdot 1 + c_2 \cdot n + c_3 \cdot n + c_4 \cdot 1$$ reduces to
$$T(n) = 2n + 2$$
How to reason about runtime independent of an environment?
- Asymptotically (when we continue to increase the input size) only the term with the greatest growth rate determines the runtime
n = 1 => T(n) = 2*1 + 2 = 4
n = 10 => T(n) = 2*10 + 2 = 22
n = 100 => T(n) = 2*100 + 2 = 202
- This behaves like $2*n$
- or $n$ if we ignore constants
Asymptotic Behavior of Functions
- Big-O notation
- a function $f(x): \mathbb{N} \to \mathbb{N}$ is in $O(g(x))$ for a function $g(x)$ if there exist constants $n_0$ and $c$ such that for all $x > n_0$ we have $f(x) < c \cdot g(x)$
- Intuitively, $f(x) = O(g(x))$ means that $g$ grows at least as fast as $f$
Asymptotic Runtime of Sum
def sum(l: list): # input size: n = len(l)
sum = 0 # cost: c1 #executions: 1
for x in l: # cost: c2 #executions: n
sum += x # cost: c3 #executions: n
print(sum) # cost: c4 #executions: 1
$$T(n) = O(n)$$
- Proof sketch: e.g., choose $g(n)=n$, $c=3$, and $n_0=4$
f(4) = 2*4+2 = 10 < 12 = 3*4 = c*g(4)
f(5) = 2*5+2 = 12 < 15 = 3*5 = c*g(5)
...
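- More generally, the inequality holds for every sufficiently large input:
$$2n + 2 < 3n \iff n > 2$$
- so any choice of $n_0 \geq 2$ works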
Note that!
- runtime is algorithm-specific not problem-specific
- Example: computing Fibonacci numbers:
$$F_0 = F_1 = 1$$
$$F_n = F_{n-1} + F_{n-2}$$
F2 = 1 + 1 = 2
F3 = 2 + 1 = 3
F4 = 3 + 2 = 5
F5 = 5 + 3 = 8
...
Dumb fibonacci(n) with $O(2^n)$ runtime
def fibonacci(n):
    if n == 0 or n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
#calls to compute F0 is 1 = F0
#calls to compute F1 is 1 = F1
#calls to compute F2 is 3 = #calls(F1) + #calls(F0) + 1
#calls to compute F3 is 5 = #calls(F2) + #calls(F1) + 1
#calls to compute F4 is 9 = #calls(F3) + #calls(F2) + 1
- same growth rate as Fibonacci sequence which is $O(2^n)$!
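- To see why, let $C(n)$ denote the number of calls needed to compute $F_n$ (the counts above); then
$$C(n) = C(n-1) + C(n-2) + 1 \leq 2 \cdot C(n-1) + 1$$
- which gives $C(n) = O(2^n)$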
Smart fibonacci(n) with $O(n)$ runtime
def fibonacci(n):
    f = [1, 1]  # F_0 and F_1
    for m in range(2, n+1):
        f.append(f[m-1] + f[m-2])  # compute F_m from the two previous values
    return f[n]
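- Quick usage check: the first few values match the sequence defined above
assert [fibonacci(n) for n in range(6)] == [1, 1, 2, 3, 5, 8]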
Example Analysis: Insertion Sort
The Sorting Problem
- Given a list of $n$ items, sort the list according to a total order $\leq$ for the elements
Input = [5, 3, 9, 1, 8]
Output = [1, 3, 5, 8, 9]
- Input size: $n$
- Sorting algorithms have been studied since the dawn of Computer Science and even before that!
Insertion Sort
- We will learn about several sorting algorithms in this course
- For the sake of runtime analysis let us consider Insertion Sort as an example of such algorithms
Insertion Sort
- Let’s break the problem into smaller, more manageable parts
- We can divide the problem of sorting a list of $n$ elements into two parts:
- sort the first $n-1$ elements
- insert the $n^{th}$ element in the right position in sort order
Insertion Sort - Example
- Input: [5, 3, 8, 1, 9]
- First $n-1$ elements sorted: [1, 3, 5, 9]
- Insert $n^{th}$ element 8 at the right position (before 9): [1, 3, 5, 8, 9]
- But how do we sort the first $n-1$ elements?
- Apply the rule recursively: we split the list of $n-1$ elements into a list of length $n-2$ and a final element
Insertion Sort
- We implement insertion sort using a counter that keeps track of the prefix of the list that is sorted. Once this counter reaches $n$ we are done (the full list is sorted):
def insertion_sort(lst):
    for i in range(1, len(lst)):
        # find position of lst[i] in the sorted prefix lst[0:i]
        ...
Insertion Sort
- How do we find the position of lst[i] in the sorted prefix lst[0:i]?
- Trickle down lst[i] by comparing it with its predecessor until the predecessor is smaller
def insertion_sort(lst):
    for i in range(1, len(lst)):  # invariant: the prefix lst[0:i] is sorted
        for j in range(i, 0, -1):  # trickle the ith element down to its position within the sorted prefix
            if lst[j] < lst[j-1]:
                lst[j], lst[j-1] = lst[j-1], lst[j]
            else:
                break  # found final position of element
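- Quick usage example (the function sorts the list in place):
lst = [5, 3, 8, 1, 9]
insertion_sort(lst)
print(lst)  # [1, 3, 5, 8, 9]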
Runtime analysis
- Let’s do worst case runtime!
def insertion_sort(lst):
for i in range(1, len(lst)): # n-1
for j in range(i, 0, -1): # sum(i)
if lst[j] < lst[j-1]: # sum(i)
lst[j], lst[j-1] = lst[j-1], lst[j] # < sum(i)
else: # < n
break # < n
$$T(n) \leq n-1 + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + n + n$$
Runtime analysis
$$T(n) \leq n-1 + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + n + n$$
$$= 3 \cdot \sum_{i=1}^{n} i + 3n -1$$
- Recall that $\sum_{i=1}^n i = \frac{n \cdot (n+1)}{2}$
$$T(n) \leq 3 \frac{n \cdot (n+1)}{2} + 3n -1$$
Runtime analysis
$$T(n) \leq \frac{3}{2} n^2 + \frac{9}{2} n - 1 = O(n^2)$$
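- As an empirical sanity check (a sketch, not part of the lecture code), we can instrument the algorithm to count comparisons on reverse-sorted input, the worst case; the counts track $\frac{n \cdot (n-1)}{2}$:
def insertion_sort_count(lst):
    # same algorithm as above, instrumented to count comparisons
    comparisons = 0
    for i in range(1, len(lst)):
        for j in range(i, 0, -1):
            comparisons += 1
            if lst[j] < lst[j-1]:
                lst[j], lst[j-1] = lst[j-1], lst[j]
            else:
                break
    return comparisons

for n in [10, 100, 1000]:
    print(n, insertion_sort_count(list(range(n, 0, -1))))  # prints n*(n-1)/2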
Example Analysis: Linear and Binary Search
Searching in a Sequence
- Input: list
l
of length n
and element e
to search for
- Output:
true
if e
is in l
and false
otherwise
Linear Search
- Go through the list sequentially and inspect every element. Stop once the end of the list is reached or the element has been found
def linear_search(lst, e):
for x in lst:
if x == e:
return True
return False
Runtime Analysis
- What is the worst case?
- => the element is not in the list
- In this case: we iterate through the whole list
- => $O(n)$
def linear_search(lst, e):
for x in lst:
if x == e:
return True
return False
Binary Search
- Input is assumed to be sorted
- Compare the middle element l[mid] of the list with e:
- if l[mid] == e then return True
- if l[mid] > e then recursively search in l[:mid]
- if l[mid] < e then recursively search in l[mid+1:]
- Example: searching for 8 in [1, 3, 5, 8, 9]: the middle element 5 is smaller than 8, so we continue in [8, 9]
Binary Search
def binary_search(lst, e):
    low = 0
    hi = len(lst) - 1  # hi is an inclusive upper bound
    mid = (low + hi) // 2
    while low <= hi and lst[mid] != e:  # check bounds before indexing
        if lst[mid] < e:
            low = mid + 1
        else:
            hi = mid - 1
        mid = (low + hi) // 2
    if low <= hi and lst[mid] == e:
        return True
    else:
        return False
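- Quick usage check:
print(binary_search([1, 3, 5, 8, 9], 8))  # True
print(binary_search([1, 3, 5, 8, 9], 4))  # False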
Runtime Analysis
- how often is the main loop executed?
- in each loop iteration we reduce the distance hi - low by a factor of ~2
- how long does it take us to get to low = hi in the worst case?
- => $O(\log n)$
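- To make this precise: after $k$ iterations the search range has length roughly $\frac{n}{2^k}$, and
$$\frac{n}{2^k} = 1 \iff k = \log_2 n$$
- so the loop body executes $O(\log n)$ times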
while low <= hi and lst[mid] != e:
    if lst[mid] < e:
        low = mid + 1
    else:
        hi = mid - 1
    mid = (low + hi) // 2