Lecture Notes for CS525
avg
init
|
|
update(t)
|
|
close
|
|
|
|
50.0
preaggregation
A | B |
---|---|
1 | 1 |
5 | 2 |
2 | 1 |
6 | 2 |
4 | 1 |
7 | 2 |
count |
---|
6 |
|
|
Thread 1
A | B |
---|---|
1 | 1 |
5 | 2 |
2 | 1 |
count |
---|
3 |
Thread 2
A | B |
---|---|
6 | 2 |
4 | 1 |
7 | 2 |
count |
---|
3 |
Thread 3
final count |
---|
-
Preaggregation for …
count
issum
of partialcount
sum
issum
of partialsum
min
ismin
of partialmin
Infinitely many plans
|
|
|
|
Transactions & Schedules
|
|
tid | name | major | gpa |
---|---|---|---|
t1 | X | CS | 3.5 |
t2 | Y | CS | 1.5 |
t3 | Z | CS | 2.3 |
t4 | A | BIO | 1.0 |
t5 | B | BIO | 4.0 |
Exam questions
- how to solve schedules?
- external sorting and hash join I/O + other operators
Non exam questions
- postgres plans
I/O estimations
sorting
- for sorting we need number of blocks of R
B(R)
and the number of blocks of memoryM
-
B(R)
is less thenM
- read R then write R and use any in memory sorting algo. $2 \cdot B(R)$
-
run generation: read chunks of size $M$ into memory and sort them, then write them back
- number of I/O $2 \cdot B(R)$
- $\lceil \frac{B(R)}{M} \rceil$ generated runs
-
merging merge $M-1$ runs at a time: reduced number of runs by a factor of $M-1$ in each merge iteration
- $i = \lceil{\log_{M-1}(\lceil \frac{B(R)}{M} \rceil)\rceil$ merge phases
- $2 \cdot B(R)$
-
total I/O
- $(1 + i) \cdot 2 \cdot B(R)$
joins
- input:
B(R)
,B(S)
,M
block nested loop
hash join
-
if $B(R) < M$ or $B(S) < M$ load smaller table into memory, build hashtable, scan through the other
- WLOG R is smaller $B(R) + B(S)$
-
partition both tables into $M-1$ fragments -> the fragments are of size $\lceil \frac{B(R)}{M-1} \rceil$ and $\lceil \frac{B(S)}{M-1} \rceil$
- how much I/O $2 \cdot (B(R) + B(S))$
- number of partition phases $i = \lceil \log_{M-1}(min(B(R),B(S))\rceil - 1$
$$(2 \cdot i + 1) \cdot (B(R) + B(S))$$
schedules
recoverability
- T1 reads from T2 if T1 reads a data item A and T2 was the last one to write the data item A before the read
- a schedule is recoverable if for every transaction Ti that reads from a transaction Tj the commit of Tj is before the commit of Ti
|
|
Time | A |
---|---|
0 | 4 |
1 | 4 |
2 | 10 |
3 | 5 |
4 | 5 |
5 | 5 |
- not recoverable
|
|
Time | A | after T1 abort |
---|---|---|
0 | 4 | 4 |
1 | 10 | 4 |
2 | 10 | 4 |
3 | 11 | 5 |
4 | 11 | 5 |
5 | 11 | 5 |
- recoverable
|
|
cascadeless
- T1 reads from T2 if T1 reads a data item A and T2 was the last one to write the data item A before the read
-
a schedule is cascadeless if for every transaction Ti that reads from a transaction Tj the commit of Tj is before the read of Ti
- we only read committed data
- check for this wi(A) …. ci …. rj(A)
|
|
|
|
|
|
strictness
- a schedule is strict if every transaction is only reading and writing committed data
- w1(A) …. c1 …. w2(A)
conflict-serializable
-
construct conflict graph
-
conflicting operations
- same data item
- different transactions
- at least one of them is a read
- pi(X) … qj(X) means there is an edge from i to j
-
examples
|
|
|
|
2PL
examples
- a schedule (here we need locks early
|
|
|
|
|
|
- a schedule (here we need locks late)
|
|
|
|
|
|
new stuff
- postgres plans (externally and internally) - 1
- modern index structures (zone maps, Bw-trees, skip lists, LSM) - 4
- distributed query / transaction processing - 2
- in-memory databases - 6
- sorting and joins in postgres (when to use what) - 3