{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Searching, Sorting, and Timing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Agenda\n", "\n", "1. Timing\n", "2. Prelude: Timing list indexing\n", "3. Linear search\n", "4. Binary search\n", "5. Insertion sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Timing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import time\n", "time.time()  # seconds since the epoch, as a float" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Prelude: Timing list indexing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import timeit\n", "# total time for timeit's default of one million executions of stmt\n", "timeit.timeit(stmt='lst[0]',\n", "              setup='lst = [0] * 10**6')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "timeit.timeit(stmt='lst[10**6-1]',\n", "              setup='lst = [0] * 10**6')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "size = 10**3\n", "times = [0] * size\n", "lst = [0] * size\n", "# accumulate the time taken to index each position over 100 rounds\n", "for _ in range(100):\n", "    for i in range(size):\n", "        times[i] += timeit.timeit(stmt='lst[{}]'.format(i),\n", "                                  globals=globals(),\n", "                                  number=10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "times" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.plot(times, 'ro')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing a list element by index takes roughly the same amount of time regardless of its position; i.e., indexing is a *constant-time* operation.\n", "\n", "How? 
**A Python list uses an array as its underlying data storage mechanism.** To access an element in an array, the interpreter:\n", "\n", "1. Computes an *offset* into the array by multiplying the element's index by the size of an array entry (entries are uniformly sized, since they are merely *references* to the actual elements)\n", "2. Adds the offset to the *base address* of the array to obtain the address of the entry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Linear search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Task: to locate an element with a given value in a list (array)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def index(lst, x):\n", "    return None" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "lst = list(range(100))\n", "index(lst, 10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "index(lst, 99)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "index(lst, -1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def index(lst, x):\n", "    raise ValueError(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "index(lst, 10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "index(lst, -1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "try:\n", "    print('Value found at', index(lst, -1))\n", "except ValueError as e:\n", "    print('Value not found:', e)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import timeit\n", "times = []\n", "lst = 
list(range(1000))\n", "for x in lst:\n", "    times.append(timeit.timeit(stmt='index(lst, {})'.format(x),\n", "                               globals=globals(),\n", "                               number=100))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.plot(times, 'ro')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Binary search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Task: to locate an element with a given value in a list (array) whose contents are *sorted in ascending order*." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def index(lst, x):\n", "    # assume that lst is sorted!!!\n", "    return None" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "lst = list(range(1000))\n", "index(lst, 10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "index(lst, 999)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "index(lst, -1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for i in range(len(lst)):\n", "    assert i == index(lst, i)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import timeit\n", "times = []\n", "lst = list(range(1000))\n", "for x in lst:\n", "    times.append(timeit.timeit(stmt='index(lst, {})'.format(x),\n", "                               globals=globals(),\n", "                               number=1000))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.plot(times, 'ro')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], 
"source": [ "import timeit\n", "times = []\n", "# worst case: -1 is never in lst, so every search is unsuccessful\n", "for size in range(100, 10000, 100):\n", "    lst = list(range(size))\n", "    times.append(timeit.timeit(stmt='index(lst, -1)',\n", "                               globals=globals(),\n", "                               number=10000))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.plot(times, 'ro')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import timeit\n", "times = []\n", "# same worst case, with the list size doubling at each step\n", "for e in range(5, 20):\n", "    lst = list(range(2**e))\n", "    times.append(timeit.timeit(stmt='index(lst, -1)',\n", "                               globals=globals(),\n", "                               number=100000))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.plot(times, 'ro')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Insertion sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Task: to sort the values in a given list (array) in ascending order." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import random\n", "lst = list(range(1000))\n", "random.shuffle(lst)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(lst, 'ro')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def insertion_sort(lst):\n", "    pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "insertion_sort(lst)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(lst, 'ro')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import timeit\n", "import random\n", "times = []\n", "# setup runs once per timeit call, so each timed sort starts from a freshly shuffled list\n", "for size in range(100, 5000, 100):\n", "    lst = list(range(size))\n", "    times.append(timeit.timeit(stmt='insertion_sort(lst)',\n", "                               setup='random.shuffle(lst)',\n", "                               globals=globals(),\n", "                               number=1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(times, 'ro')\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 1 }