Lecture Notes for CS525

avg

init

1
2
3



count = 0;
sum = 0;

update(t)

1
2


count ++
sum += t.A

close

1

return sum / count

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


class SumAgg():

    def __init__(self):
        self.sum = 0
        self.count = 0

    def update(self,a):
        self.sum += a
        self.count += 1

    def close(self):
        return self.sum / self.count


agg = SumAgg()

for i in range(1,100):
    agg.update(i)

print(agg.close())

50.0

preaggregation

A	B
1	1
5	2
2	1
6	2
4	1
7	2

count
6

1

SELECT count(*) FROM r;

Thread 1

A	B
1	1
5	2
2	1

count
3

Thread 2

A	B
6	2
4	1
7	2

count
3

Thread 3

final count

Preaggregation for …
- count is sum of partial count
- sum is sum of partial sum
- min is min of partial min

Infinitely many plans

1
2
3


FROM R
FROM R NATURAL NATURAL JOIN (SELECT DISTINCT * FROM R)
FROM R NATURAL NATURAL JOIN (SELECT DISTINCT * FROM R) NATURAL JOIN (SELECT DISTINCT * FROM R)

1

WHERE R.A = S.B AND f(R.C,S.D)

Transactions & Schedules

1

UDPATE student SET gpa = gpa * 1.05 WHERE major = 'CS';

tid	name	major	gpa
t1	X	CS	3.5
t2	Y	CS	1.5
t3	Z	CS	2.3
t4	A	BIO	1.0
t5	B	BIO	4.0

Exam questions

how to solve schedules?
external sorting and hash join I/O + other operators

Non exam questions

postgres plans

I/O estimations

sorting

for sorting we need number of blocks of R B(R) and the number of blocks of memory M
B(R) is less then M
- read R then write R and use any in memory sorting algo. $2 \cdot B(R)$
run generation: read chunks of size $M$ into memory and sort them, then write them back
- number of I/O $2 \cdot B(R)$
- $\lceil \frac{B(R)}{M} \rceil$ generated runs
merging merge $M-1$ runs at a time: reduced number of runs by a factor of $M-1$ in each merge iteration
- $i = \lceil{\log_{M-1}(\lceil \frac{B(R)}{M} \rceil)\rceil$ merge phases
- $2 \cdot B(R)$
total I/O
- $(1 + i) \cdot 2 \cdot B(R)$

joins

input: B(R), B(S), M

block nested loop

hash join

if $B(R) < M$ or $B(S) < M$ load smaller table into memory, build hashtable, scan through the other
- WLOG R is smaller $B(R) + B(S)$
partition both tables into $M-1$ fragments -> the fragments are of size $\lceil \frac{B(R)}{M-1} \rceil$ and $\lceil \frac{B(S)}{M-1} \rceil$
- how much I/O $2 \cdot (B(R) + B(S))$
number of partition phases $i = \lceil \log_{M-1}(min(B(R),B(S))\rceil - 1$

$$(2 \cdot i + 1) \cdot (B(R) + B(S))$$

schedules

recoverability

T1 reads from T2 if T1 reads a data item A and T2 was the last one to write the data item A before the read
a schedule is recoverable if for every transaction Ti that reads from a transaction Tj the commit of Tj is before the commit of Ti

1
2
3
4
5


1: r2(A), -- 4
2: w1(A), -- 10
3: w2(A), -- 5
4: c2,
5: c1

Time	A
0	4
1	4
2	10
3	5
4	5
5	5

not recoverable

1
2
3
4
5


1: w1(A), -- 10
2: r2(A), -- 10
3: w2(A), -- 11
4: c2,
5: a1

Time	A	after T1 abort
0	4	4
1	10	4
2	10	4
3	11	5
4	11	5
5	11	5

recoverable

1
2
3
4
5


1: w1(A), -- 10
2: r2(A), -- 10
3: w2(A), -- 11
4: c1,
5: c2

cascadeless

T1 reads from T2 if T1 reads a data item A and T2 was the last one to write the data item A before the read
a schedule is cascadeless if for every transaction Ti that reads from a transaction Tj the commit of Tj is before the read of Ti
- we only read committed data
check for this wi(A) …. ci …. rj(A)

1
2
3
4
5


1: w1(A), -- 10
2: r2(A), -- 10
3: w2(A), -- 11
4: a1, -- have to abort T2 now
5: a2

1

w1(A), w2(A), w3(A)

1

r1(A), r2(A), r3(A)

strictness

a schedule is strict if every transaction is only reading and writing committed data
w1(A) …. c1 …. w2(A)

conflict-serializable

construct conflict graph
- conflicting operations
  - same data item
  - different transactions
  - at least one of them is a read
  - pi(X) … qj(X) means there is an edge from i to j

examples

1

r2(A),r2(E),w1(C),r1(E),r2(C),c2,w3(A),w3(C),c3,r1(D),c1

s #+LABEL: fig:cg.jpg

cg.jpg

cg.jpg.jpg

1

r2(E),r1(A),r2(C),r1(B),r1(C),r2(A),w2(C),r2(B),c2,r1(C),w1(E),w1(A),c1

2PL

examples

a schedule (here we need locks early

1

w1(A), w2(A), w1(B)

1

l1(A), w1(A), u1(A) -- T1 now in shrinking phase, l2(A), w2(A), l1(B) -- not possible T1 in shrinking phase, w1(B)

1

l1(A), l1(B), w1(A), u1(A) -- T1 now in shrinking phase, l2(A), w2(A), w1(B), u1(B) ...

a schedule (here we need locks late)

1

w1(C), w2(A), w1(A)

1

l1(A), l1(C), w1(C), u1(A), l2(A), w2(A),u2(A), l1(A) -- not allowed already in shrinking phase, w1(A)

1

l1(C), w1(C), l2(A), w2(A),u2(A), l1(A), w1(A)

new stuff

postgres plans (externally and internally) - 1
modern index structures (zone maps, Bw-trees, skip lists, LSM) - 4
distributed query / transaction processing - 2
in-memory databases - 6
sorting and joins in postgres (when to use what) - 3