Lecture Notes for CS425

Overview

Overview
Lecture <2022-08-24 Wed>
- Relations
  - PKs and FKs
- JSON
Lecture <2022-08-31 Wed>
- Recap relational model
  - Schema vs. Instance (data)
  - Integrity Constraints
Lecture <2022-09-14 Wed> and <2022-09-19 Mon>
Lecture <2022-09-21 Wed>
Lecture <2022-09-26 Mon>
Lecture <2022-09-28 Wed>
- nested subqueries
- Window functions
Lecture <2022-10-03 Mon>
Lecture <2022-10-05 Wed>
- DDL
- Updates (DML)
  - transactions
  - advanced update statement
Lecture <2022-10-12 Wed>
Lecture <2022-10-17 Mon>
- Accessing databases from programming languages
  - Java
  - Python
Lecture <2022-10-19 Wed>
- Triggers
- ER-model
  - requirement analysis: music collection
Lecture <2022-10-24 Mon>
- ER-model
  - Entities and relationships
  - Examples
- hand-written notes
Lecture <2022-10-31 Mon>
Lecture <2022-11-07 Mon>
Lecture <2022-11-09 Wed>
- LEFT OUTER JOIN for not exists checks
- exam scaling
Lecture <2022-11-14 Mon>
Lecture <2022-11-21 Mon>
Lecture <2022-11-30 Wed>
- example of determining normal forms
  - 2NF
  - 3NF
  - BCNF
- attribute closures and canonical cover
  - decomposition into 3NF

Lecture <2022-08-24 Wed>

Relations

$$D_1 = \{ false, true \}$$
$$D_2 = \{ 1,2,3 \}$$

R = (A,B)

$$D_1 \times D_2$$

A	B
false	1
false	2
false	3
true	1
true	2
true	3

A	B
false	1
true	1
true	3

A	B
true	3
false	1
true	1

PKs and FKs

Instructor

ID	Name	SSN	salary
1	Peter Smith	11-22	4000
2	Peter Smith	22-33	5000
3	Alice Smith	55-55	4000

Superkeys all attributes are always a superkey {ID, Name, SSN, salary }
Superkeys: K1 = {ID}, K2 = {SSN}, K3 = {ID,SSN}, K4 = {ID,Name}
Candidate keys are K1, K2 (the only minimal superkeys)
- e.g., K3 = {ID,SSN} is not minimal, because superkey K1 = {ID} is a proper subset of K3

JSON

JSON data allows nesting of object ({}) and array ([]) collection types
- e.g., the value of the address field available for some customers is itself an object
JSON is semi-structured, e.g., only some customers have an address field

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27


{
    "customers": [
        {
        "name": "Peter",
        "age": 79
    },
        {
        "name": "Peter",
        "address": {
            "city": "Chicago",
            "zip": 60616,
            "street": "40 31st Street"
        }
    },
        {
        "name": "Peter",
        "age": 79
    }
    ],
    "products": [
        {
        "id": 15324,
        "price": 56,
        "name": "Lawnmover"
    }
    ]
}

Lecture <2022-08-31 Wed>

Recap relational model

structured data model
- compare to semi-structured (JSON) and unstructure (text document)

Schema vs. Instance (data)

schema

relation schema: R(A1: D1, ..., An: Dn), R(A1, ..., An)
- Student(Name,CWID,GPA)
database schema: D = {R1, ..., Rm}
- UniversityDB = { Student(Name,CWID,GPA), Course(Title, Nr, Major), ...

instance (data)

relation instance of schema R(A1: D1, ..., An: Dn):
- it is a set of tuples (rows) which is a subset D1 x ... x Dn
database instance of DB schema UniversityDB = { Student(Name:string,CWID:string,GPA:float), Course(Title:string, Nr:int, Major:string)}
- is a set of relation instance, one for each relation schema

Integrity Constraints

candidate keys / primary keys
- super key: set of attributes that uniquely identify each row in a table
- candidate key: minimal superkeys
- primary key: user-selected candidate used to identify rows
foreign keys
- "link" tuples from one table to another by referring to the primary key of the other table

foreign key violations

IName	Dname
A	BIO
B	BIO
C	BIO
D	CS
E	CS

DName	Building
BIO	the big building
CS	the small building

delete BIO
change departments of existing BIO instructors to something else and then delete
(1) change departments of existing BIO instructors to NULL
(2) delete instructors for BIO
(3) don't do it and throw ERROR
we will see later that relational database implement options (1) to (3) and allow the user to choose when they define a foreign key, which option should be used

Lecture <2022-09-14 Wed> and <2022-09-19 Mon>

Application Domain for Project

banking system

persons (roles)

clients / account holders
employee
- teller
- loan specialist
- manager

data

account holders
accounts
- account types
loans

processes

deposit / withdraw / transfer
- data: accounts
- persons: clients
manage accounts (create / close / change type?)
- data: accounts, account holders
- persons: clients, employee
loan management
- data: loans, accounts, clients
- persons: employee, clients
interest management
- data: accounts
- person: (automatic)
analytics
- data: accounts, loans, clients, account types
- person: managers

Postgres Overview

Postgres is a server-based database. That the postgres process runs in the background listing on a network port on the machine you are running it on. You can then connect to the server using a client application that understand the network protocol postgres is using. A Postgres server process manages a directory called the cluster which stores the content of your database on disk. When a postgres server is started, it is assigned a cluster directory to manage. Once you have a running Postgres server you can connect with a client application to send SQL commands to the server and receive results. Some important client applications are discussed below.

Cluster directory

Depending on how you installed Postgres, a cluster directory may have been created for you already. If not then use the initdb command as explained below to create one yourself. A cluster directory is used by Postgres to store your database files. The directory also contains important configuration files:

postgresql.conf - configuration settings
- listen_addresses = '*' - allow connections from anywhere
pg_hba.conf - controls access to the database
- each line is rule, first matching rule applies

Postgres binaries

`initdb` - creates clusters

1
2


mkdir cs425
initdb -D cs425

`pg_ctl` - start / stop servers

start server at cluster directory DIR writing logs to LOGFILE

1

pg_ctl -D DIR -l LOGFILE start

stop server at cluster directory DIR

1

pg_ctl -D DIR stop

`psql` - commandline client

connect to running server and run SQL commands
-p - network port
-h - host (IP address) where 127.0.0.1 is localhost
-U - postgres user to connect with
last one is the database to connect to (default DB created is postgres)

1

psql -h 127.0.0.1 -U lord_pretzel -p 5556 postgres

Client applications (`pgAdmin`)

Many client applications exist that can talk to a postgres server. For our purpose we want a GUI or CLI to iteratively run SQL commands and inspect their results. We already discussed the build-in psql CLI above. And explain how to use pgAdmin a widely used GUI client in the following.

`pgAdmin`

You can download pgAdmin from here: https://www.pgadmin.org/.

Setup

When you first start pgAdmin, it will ask you to set a master password. Do that first.
To add a new database connection for your Postgres server, right-click on the browser shown on the left-hand side and select Register->Server

give the server a name (any name would do)

then click on connection and enter the connection information
1. host: if you are running postgres locally this would be 127.0.0.1 (the local machine)
2. port: unless you are running the postgres server on a non-standard port this will be 5432
3. maintenance database: the first database to connect to (for most installation methods this would be postgres)
4. username: your database user name (for most installations this would be postgres)
5. password: the password of your database user
6. Save password: select yes to not have to enter the password every time you connect to the server

Inspecting the database schema

To inspect the schema of a database, select the server in the left-hand side browser and expand it. Then expand the schema public and then select tables.

Writing SQL

Rightclick on a database in the browser on the left and select QueryTool

Then you can enter queries and execute them with the "play" button.

Create schema and insert data

The following sql script generates the university example schema. The script is also available here: https://github.com/IITDBGroup/cs425/blob/master/university_schema_postgres.sql

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276


DROP TABLE IF EXISTS prereq CASCADE;
DROP TABLE IF EXISTS time_slot CASCADE;
DROP TABLE IF EXISTS advisor CASCADE;
DROP TABLE IF EXISTS takes CASCADE;
DROP TABLE IF EXISTS student CASCADE;
DROP TABLE IF EXISTS teaches CASCADE;
DROP TABLE IF EXISTS section CASCADE;
DROP TABLE IF EXISTS instructor CASCADE;
DROP TABLE IF EXISTS course CASCADE;
DROP TABLE IF EXISTS department CASCADE;
DROP TABLE IF EXISTS classroom CASCADE;

CREATE TABLE classroom
  (building		varchar(15),
  room_number	varchar(7),
  capacity		numeric(4,0),
  PRIMARY KEY (building, room_number)
  );

CREATE TABLE department
  (dept_name		varchar(20),
  building		varchar(15),
  budget		        numeric(12,2) CHECK (budget > 0),
  PRIMARY KEY (dept_name)
  );

CREATE TABLE course
  (course_id		varchar(8),
  title			varchar(50),
  dept_name		varchar(20),
  credits		numeric(2,0) CHECK (credits > 0),
  PRIMARY KEY (course_id),
  FOREIGN KEY (dept_name) REFERENCES department
  ON DELETE SET NULL
  );

CREATE TABLE instructor
  (ID			varchar(5),
  name			varchar(20) NOT NULL,
  dept_name		varchar(20),
  salary			numeric(8,2) CHECK (salary > 29000),
  PRIMARY KEY (ID),
  FOREIGN KEY (dept_name) REFERENCES department
  ON DELETE SET NULL
  );

CREATE TABLE section
  (course_id		varchar(8),
  sec_id			varchar(8),
  semester		varchar(6)  CHECK (semester in ('Fall', 'Winter', 'Spring', 'Summer')),
  year			numeric(4,0) CHECK (year > 1701 AND year < 2100),
  building		varchar(15),
  room_number		varchar(7),
  time_slot_id		varchar(4),
  PRIMARY KEY (course_id, sec_id, semester, year),
  FOREIGN KEY (course_id) REFERENCES course
  ON DELETE CASCADE,
  FOREIGN KEY (building, room_number) REFERENCES classroom
  ON DELETE SET NULL
  );

CREATE TABLE teaches
  (ID			varchar(5),
  course_id		varchar(8),
  sec_id			varchar(8),
  semester		varchar(6),
  year			numeric(4,0),
  PRIMARY KEY (ID, course_id, sec_id, semester, year),
  FOREIGN KEY (course_id,sec_id, semester, year) REFERENCES section
  ON DELETE CASCADE,
  FOREIGN KEY (ID) REFERENCES instructor
  ON DELETE CASCADE
  );

CREATE TABLE student
  (ID			varchar(5),
  name			varchar(20) NOT NULL,
  dept_name		varchar(20),
  tot_cred		numeric(3,0) CHECK (tot_cred >= 0),
  PRIMARY KEY (ID),
  FOREIGN KEY (dept_name) REFERENCES department
  ON DELETE SET NULL
  );

CREATE TABLE takes
  (ID			varchar(5),
  course_id		varchar(8),
  sec_id			varchar(8),
  semester		varchar(6),
  year			numeric(4,0),
  grade		        varchar(2),
  PRIMARY KEY (ID, course_id, sec_id, semester, year),
  FOREIGN KEY (course_id,sec_id, semester, year) REFERENCES section
  ON DELETE CASCADE,
  FOREIGN KEY (ID) REFERENCES student
  ON DELETE CASCADE
  );

CREATE TABLE advisor
  (s_ID			varchar(5),
  i_ID			varchar(5),
  PRIMARY KEY (s_ID),
  FOREIGN KEY (i_ID) REFERENCES instructor (ID)
  ON DELETE SET NULL,
  FOREIGN KEY (s_ID) REFERENCES student (ID)
  ON DELETE CASCADE
  );

CREATE TABLE time_slot
  (time_slot_id		varchar(4),
  day			varchar(1),
  start_hr		numeric(2) CHECK (start_hr >= 0 AND start_hr < 24),
  start_min		numeric(2) CHECK (start_min >= 0 AND start_min < 60),
  end_hr			numeric(2) CHECK (end_hr >= 0 AND end_hr < 24),
  end_min		numeric(2) CHECK (end_min >= 0 AND end_min < 60),
  PRIMARY KEY (time_slot_id, day, start_hr, start_min)
  );

CREATE TABLE prereq
  (course_id		varchar(8),
  prereq_id		varchar(8),
  PRIMARY KEY (course_id, prereq_id),
  FOREIGN KEY (course_id) REFERENCES course
  ON DELETE CASCADE,
  FOREIGN KEY (prereq_id) REFERENCES course
  );

DELETE FROM prereq;
DELETE FROM time_slot;
DELETE FROM advisor;
DELETE FROM takes;
DELETE FROM student;
DELETE FROM teaches;
DELETE FROM section;
DELETE FROM instructor;
DELETE FROM course;
DELETE FROM department;
DELETE FROM classroom;
INSERT INTO classroom VALUES ('Packard', '101', '500');
INSERT INTO classroom VALUES ('Painter', '514', '10');
INSERT INTO classroom VALUES ('Taylor', '3128', '70');
INSERT INTO classroom VALUES ('Watson', '100', '30');
INSERT INTO classroom VALUES ('Watson', '120', '50');
INSERT INTO department VALUES ('Biology', 'Watson', '90000');
INSERT INTO department VALUES ('Comp. Sci.', 'Taylor', '100000');
INSERT INTO department VALUES ('Elec. Eng.', 'Taylor', '85000');
INSERT INTO department VALUES ('Finance', 'Painter', '120000');
INSERT INTO department VALUES ('History', 'Painter', '50000');
INSERT INTO department VALUES ('Music', 'Packard', '80000');
INSERT INTO department VALUES ('Physics', 'Watson', '70000');
INSERT INTO course VALUES ('BIO-101', 'Intro. to Biology', 'Biology', '4');
INSERT INTO course VALUES ('BIO-301', 'Genetics', 'Biology', '4');
INSERT INTO course VALUES ('BIO-399', 'Computational Biology', 'Biology', '3');
INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', '4');
INSERT INTO course VALUES ('CS-190', 'Game Design', 'Comp. Sci.', '4');
INSERT INTO course VALUES ('CS-315', 'Robotics', 'Comp. Sci.', '3');
INSERT INTO course VALUES ('CS-319', 'Image Processing', 'Comp. Sci.', '3');
INSERT INTO course VALUES ('CS-347', 'Database System Concepts', 'Comp. Sci.', '3');
INSERT INTO course VALUES ('EE-181', 'Intro. to Digital Systems', 'Elec. Eng.', '3');
INSERT INTO course VALUES ('FIN-201', 'Investment Banking', 'Finance', '3');
INSERT INTO course VALUES ('HIS-351', 'World History', 'History', '3');
INSERT INTO course VALUES ('MU-199', 'Music Video Production', 'Music', '3');
INSERT INTO course VALUES ('PHY-101', 'Physical Principles', 'Physics', '4');
INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', '65000');
INSERT INTO instructor VALUES ('12121', 'Wu', 'Finance', '90000');
INSERT INTO instructor VALUES ('15151', 'Mozart', 'Music', '40000');
INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', '95000');
INSERT INTO instructor VALUES ('32343', 'El Said', 'History', '60000');
INSERT INTO instructor VALUES ('33456', 'Gold', 'Physics', '87000');
INSERT INTO instructor VALUES ('45565', 'Katz', 'Comp. Sci.', '75000');
INSERT INTO instructor VALUES ('58583', 'Califieri', 'History', '62000');
INSERT INTO instructor VALUES ('76543', 'Singh', 'Finance', '80000');
INSERT INTO instructor VALUES ('76766', 'Crick', 'Biology', '72000');
INSERT INTO instructor VALUES ('83821', 'Brandt', 'Comp. Sci.', '92000');
INSERT INTO instructor VALUES ('98345', 'Kim', 'Elec. Eng.', '80000');
INSERT INTO section VALUES ('BIO-101', '1', 'Summer', '2009', 'Painter', '514', 'B');
INSERT INTO section VALUES ('BIO-301', '1', 'Summer', '2010', 'Painter', '514', 'A');
INSERT INTO section VALUES ('CS-101', '1', 'Fall', '2009', 'Packard', '101', 'H');
INSERT INTO section VALUES ('CS-101', '1', 'Spring', '2010', 'Packard', '101', 'F');
INSERT INTO section VALUES ('CS-190', '1', 'Spring', '2009', 'Taylor', '3128', 'E');
INSERT INTO section VALUES ('CS-190', '2', 'Spring', '2009', 'Taylor', '3128', 'A');
INSERT INTO section VALUES ('CS-315', '1', 'Spring', '2010', 'Watson', '120', 'D');
INSERT INTO section VALUES ('CS-319', '1', 'Spring', '2010', 'Watson', '100', 'B');
INSERT INTO section VALUES ('CS-319', '2', 'Spring', '2010', 'Taylor', '3128', 'C');
INSERT INTO section VALUES ('CS-347', '1', 'Fall', '2009', 'Taylor', '3128', 'A');
INSERT INTO section VALUES ('EE-181', '1', 'Spring', '2009', 'Taylor', '3128', 'C');
INSERT INTO section VALUES ('FIN-201', '1', 'Spring', '2010', 'Packard', '101', 'B');
INSERT INTO section VALUES ('HIS-351', '1', 'Spring', '2010', 'Painter', '514', 'C');
INSERT INTO section VALUES ('MU-199', '1', 'Spring', '2010', 'Packard', '101', 'D');
INSERT INTO section VALUES ('PHY-101', '1', 'Fall', '2009', 'Watson', '100', 'A');
INSERT INTO teaches VALUES ('10101', 'CS-101', '1', 'Fall', '2009');
INSERT INTO teaches VALUES ('10101', 'CS-315', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('10101', 'CS-347', '1', 'Fall', '2009');
INSERT INTO teaches VALUES ('12121', 'FIN-201', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('15151', 'MU-199', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('22222', 'PHY-101', '1', 'Fall', '2009');
INSERT INTO teaches VALUES ('32343', 'HIS-351', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('45565', 'CS-101', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('45565', 'CS-319', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('76766', 'BIO-101', '1', 'Summer', '2009');
INSERT INTO teaches VALUES ('76766', 'BIO-301', '1', 'Summer', '2010');
INSERT INTO teaches VALUES ('83821', 'CS-190', '1', 'Spring', '2009');
INSERT INTO teaches VALUES ('83821', 'CS-190', '2', 'Spring', '2009');
INSERT INTO teaches VALUES ('83821', 'CS-319', '2', 'Spring', '2010');
INSERT INTO teaches VALUES ('98345', 'EE-181', '1', 'Spring', '2009');
INSERT INTO student VALUES ('00128', 'Zhang', 'Comp. Sci.', '102');
INSERT INTO student VALUES ('12345', 'Shankar', 'Comp. Sci.', '32');
INSERT INTO student VALUES ('19991', 'Brandt', 'History', '80');
INSERT INTO student VALUES ('23121', 'Chavez', 'Finance', '110');
INSERT INTO student VALUES ('44553', 'Peltier', 'Physics', '56');
INSERT INTO student VALUES ('45678', 'Levy', 'Physics', '46');
INSERT INTO student VALUES ('54321', 'Williams', 'Comp. Sci.', '54');
INSERT INTO student VALUES ('55739', 'Sanchez', 'Music', '38');
INSERT INTO student VALUES ('70557', 'Snow', 'Physics', '0');
INSERT INTO student VALUES ('76543', 'Brown', 'Comp. Sci.', '58');
INSERT INTO student VALUES ('76653', 'Aoi', 'Elec. Eng.', '60');
INSERT INTO student VALUES ('98765', 'Bourikas', 'Elec. Eng.', '98');
INSERT INTO student VALUES ('98988', 'Tanaka', 'Biology', '120');
INSERT INTO takes VALUES ('00128', 'CS-101', '1', 'Fall', '2009', 'A');
INSERT INTO takes VALUES ('00128', 'CS-347', '1', 'Fall', '2009', 'A-');
INSERT INTO takes VALUES ('12345', 'CS-101', '1', 'Fall', '2009', 'C');
INSERT INTO takes VALUES ('12345', 'CS-190', '2', 'Spring', '2009', 'A');
INSERT INTO takes VALUES ('12345', 'CS-315', '1', 'Spring', '2010', 'A');
INSERT INTO takes VALUES ('12345', 'CS-347', '1', 'Fall', '2009', 'A');
INSERT INTO takes VALUES ('19991', 'HIS-351', '1', 'Spring', '2010', 'B');
INSERT INTO takes VALUES ('23121', 'FIN-201', '1', 'Spring', '2010', 'C+');
INSERT INTO takes VALUES ('44553', 'PHY-101', '1', 'Fall', '2009', 'B-');
INSERT INTO takes VALUES ('45678', 'CS-101', '1', 'Fall', '2009', 'F');
INSERT INTO takes VALUES ('45678', 'CS-101', '1', 'Spring', '2010', 'B+');
INSERT INTO takes VALUES ('45678', 'CS-319', '1', 'Spring', '2010', 'B');
INSERT INTO takes VALUES ('54321', 'CS-101', '1', 'Fall', '2009', 'A-');
INSERT INTO takes VALUES ('54321', 'CS-190', '2', 'Spring', '2009', 'B+');
INSERT INTO takes VALUES ('55739', 'MU-199', '1', 'Spring', '2010', 'A-');
INSERT INTO takes VALUES ('76543', 'CS-101', '1', 'Fall', '2009', 'A');
INSERT INTO takes VALUES ('76543', 'CS-319', '2', 'Spring', '2010', 'A');
INSERT INTO takes VALUES ('76653', 'EE-181', '1', 'Spring', '2009', 'C');
INSERT INTO takes VALUES ('98765', 'CS-101', '1', 'Fall', '2009', 'C-');
INSERT INTO takes VALUES ('98765', 'CS-315', '1', 'Spring', '2010', 'B');
INSERT INTO takes VALUES ('98988', 'BIO-101', '1', 'Summer', '2009', 'A');
INSERT INTO takes VALUES ('98988', 'BIO-301', '1', 'Summer', '2010', null);
INSERT INTO advisor VALUES ('00128', '45565');
INSERT INTO advisor VALUES ('12345', '10101');
INSERT INTO advisor VALUES ('23121', '76543');
INSERT INTO advisor VALUES ('44553', '22222');
INSERT INTO advisor VALUES ('45678', '22222');
INSERT INTO advisor VALUES ('76543', '45565');
INSERT INTO advisor VALUES ('76653', '98345');
INSERT INTO advisor VALUES ('98765', '98345');
INSERT INTO advisor VALUES ('98988', '76766');
INSERT INTO time_slot VALUES ('A', 'M', '8', '0', '8', '50');
INSERT INTO time_slot VALUES ('A', 'W', '8', '0', '8', '50');
INSERT INTO time_slot VALUES ('A', 'F', '8', '0', '8', '50');
INSERT INTO time_slot VALUES ('B', 'M', '9', '0', '9', '50');
INSERT INTO time_slot VALUES ('B', 'W', '9', '0', '9', '50');
INSERT INTO time_slot VALUES ('B', 'F', '9', '0', '9', '50');
INSERT INTO time_slot VALUES ('C', 'M', '11', '0', '11', '50');
INSERT INTO time_slot VALUES ('C', 'W', '11', '0', '11', '50');
INSERT INTO time_slot VALUES ('C', 'F', '11', '0', '11', '50');
INSERT INTO time_slot VALUES ('D', 'M', '13', '0', '13', '50');
INSERT INTO time_slot VALUES ('D', 'W', '13', '0', '13', '50');
INSERT INTO time_slot VALUES ('D', 'F', '13', '0', '13', '50');
INSERT INTO time_slot VALUES ('E', 'T', '10', '30', '11', '45 ');
INSERT INTO time_slot VALUES ('E', 'R', '10', '30', '11', '45 ');
INSERT INTO time_slot VALUES ('F', 'T', '14', '30', '15', '45 ');
INSERT INTO time_slot VALUES ('F', 'R', '14', '30', '15', '45 ');
INSERT INTO time_slot VALUES ('G', 'M', '16', '0', '16', '50');
INSERT INTO time_slot VALUES ('G', 'W', '16', '0', '16', '50');
INSERT INTO time_slot VALUES ('G', 'F', '16', '0', '16', '50');
INSERT INTO time_slot VALUES ('H', 'W', '10', '0', '12', '30');
INSERT INTO prereq VALUES ('BIO-301', 'BIO-101');
INSERT INTO prereq VALUES ('BIO-399', 'BIO-101');
INSERT INTO prereq VALUES ('CS-190', 'CS-101');
INSERT INTO prereq VALUES ('CS-315', 'CS-101');
INSERT INTO prereq VALUES ('CS-319', 'CS-101');
INSERT INTO prereq VALUES ('CS-347', 'CS-101');
INSERT INTO prereq VALUES ('EE-181', 'PHY-101');

DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

More DDL examples

create a table for storing information about seminars

1
2
3
4
5
6
7
8


CREATE TABLE seminars (
  title VARCHAR(200), -- could use TEXT is postgres
  presenter VARCHAR(200),
  semdate TIMESTAMP,
  building		varchar(15), -- PK attributes from classroom for the FK
  room_number	varchar(7), -- PK attributes from classroom for the FK
  FOREIGN KEY (building, room_number) REFERENCES classroom -- seminars should be scheduled in existing rooms
  );

CREATE TABLE

DDL change table definitions

dropping a table

1

DROP TABLE seminars;

DROP TABLE

alter table

adding an attribute (new values will be NULL

1

ALTER TABLE seminars ADD organizing_dept varchar(20);

ALTER TABLE

1

SELECT * FROM seminars;

title

presenter

semdate

building

room_number

organizing_dept

add foreign key

1

ALTER TABLE seminars ADD FOREIGN KEY (organizing_dept) REFERENCES department;

ALTER TABLE

DML - inserts and delete

Insert

insert a single row into a table

1

INSERT INTO seminars VALUES ('Why databases are great', 'Bert', '2022-09-28 9:00', 'Packard', '101', 'Comp. Sci.');

INSERT 0 1

1

INSERT INTO seminars VALUES ('Why databases are stupid', 'Bert', '2022-09-29 10:00', 'Packard', '101', 'Comp. Sci.');

INSERT 0 1

1

SELECT * FROM seminars;

title	presenter	semdate	building	room_number	organizing_dept
Why databases are great	Bert	2022-09-28 09:00:00	Packard	101	Comp. Sci.
Why databases are stupid	Bert	2022-09-29 10:00:00	Packard	101	Comp. Sci.

Delete

delete all rows that fulfill WHERE condition from table

1

DELETE FROM seminars WHERE presenter = 'Bert';

DELETE 2

1

SELECT * FROM seminars;

title

presenter

semdate

building

room_number

organizing_dept

1
2
3


INSERT INTO seminars VALUES ('Why databases are great', 'Bert', '2022-09-28 9:00', 'Packard', '101', 'Comp. Sci.');
INSERT INTO seminars VALUES ('Why databases are stupid', 'Bert', '2022-09-29 10:00', 'Packard', '101', 'Comp. Sci.');
INSERT INTO seminars VALUES ('Why Mozart is great', 'Alice', '2022-10-30 11:00', 'Packard', '101', 'Music');

INSERT 0 1
INSERT 0 1
INSERT 0 1

1

SELECT * FROM seminars;

title	presenter	semdate	building	room_number	organizing_dept
Why databases are great	Bert	2022-09-28 09:00:00	Packard	101	Comp. Sci.
Why databases are stupid	Bert	2022-09-29 10:00:00	Packard	101	Comp. Sci.
Why Mozart is great	Alice	2022-10-30 11:00:00	Packard	101	Music

1

DELETE FROM seminars WHERE organizing_dept = 'Music' OR semdate > '2022-10-01 0:00';

DELETE 1

1

SELECT * FROM seminars;

title	presenter	semdate	building	room_number	organizing_dept
Why databases are great	Bert	2022-09-28 09:00:00	Packard	101	Comp. Sci.
Why databases are stupid	Bert	2022-09-29 10:00:00	Packard	101	Comp. Sci.

DML - queries

basic `SELECT-FROM-WHERE` blocks

examples

* means all attributes from tables in the FROM clause

1

SELECT * FROM student;

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
45678	Levy	Physics	46
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
70557	Snow	Physics	0
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

Names of students studying CS

1

SELECT name FROM student WHERE dept_name = 'Comp. Sci.';

name
Zhang
Shankar
Williams
Brown

alias in FROM clause: TABLENAME ALIAS or TABLENAME AS ALIAS
dot notation to specify which table an attribute belongs to TABLE.ATTR
in SELECT clause us AS to rename a result column ATTRIBUTE AS NEWNAME

1
2
3


SELECT name, d.dept_name AS major, building AS dept_building
FROM student s, department d
WHERE s.dept_name = d.dept_name AND (building = 'Taylor' OR building = 'Watson');

name	major	dept_building
Zhang	Comp. Sci.	Taylor
Shankar	Comp. Sci.	Taylor
Peltier	Physics	Watson
Levy	Physics	Watson
Williams	Comp. Sci.	Taylor
Snow	Physics	Watson
Brown	Comp. Sci.	Taylor
Aoi	Elec. Eng.	Taylor
Bourikas	Elec. Eng.	Taylor
Tanaka	Biology	Watson

relational algebra equivalence

1
2
3


SELECT A1, ..., An
FROM R1, ..., Rm
WHERE P

is equivalent to $$\pi_{A_1, \ldots, A_n}(\sigma_{P}(R_1 \times \ldots \times R_m)$$

casting

SQL databases automatically cast between datatypes where possible, e.g., in the condition mydatecolumn < '2002-01-01' we are comparing a DATE against a VARCHAR (string) value. Postgres would detect that it does not have a comparison operator < for these types and correctly decide to try to cast '2002-01-01' as a DATE.

You can also manually cast between types:

cast result of expression as datatype: expr::datatype
SQL standard CAST(expr AS datatype)

1
2
3


SELECT room_number::INT AS p,
       CAST(room_number AS INT) AS x
  FROM classroom;

p	x
101	101
514	514
3128	3128
100	100
120	120

keywords are case-insensitive

1

SeLeCT name FRom student;

name
Zhang
Shankar
Brandt
Chavez
Peltier
Levy
Williams
Sanchez
Snow
Brown
Aoi
Bourikas
Tanaka

joins

natural join

1
2


SELECT *
  FROM instructor NATURAL JOIN teaches;

id	name	dept_name	salary	course_id	sec_id	semester	year
10101	Srinivasan	Comp. Sci.	65000.00	CS-101	1	Fall	2009
10101	Srinivasan	Comp. Sci.	65000.00	CS-315	1	Spring	2010
10101	Srinivasan	Comp. Sci.	65000.00	CS-347	1	Fall	2009
12121	Wu	Finance	90000.00	FIN-201	1	Spring	2010
15151	Mozart	Music	40000.00	MU-199	1	Spring	2010
22222	Einstein	Physics	95000.00	PHY-101	1	Fall	2009
32343	El Said	History	60000.00	HIS-351	1	Spring	2010
45565	Katz	Comp. Sci.	75000.00	CS-101	1	Spring	2010
45565	Katz	Comp. Sci.	75000.00	CS-319	1	Spring	2010
76766	Crick	Biology	72000.00	BIO-101	1	Summer	2009
76766	Crick	Biology	72000.00	BIO-301	1	Summer	2010
83821	Brandt	Comp. Sci.	92000.00	CS-190	1	Spring	2009
83821	Brandt	Comp. Sci.	92000.00	CS-190	2	Spring	2009
83821	Brandt	Comp. Sci.	92000.00	CS-319	2	Spring	2010
98345	Kim	Elec. Eng.	80000.00	EE-181	1	Spring	2009

outer joins

LEFT, RIGHT, FULL

1

SELECT * FROM course NATURAL LEFT OUTER JOIN prereq;

course_id	title	dept_name	credits	prereq_id
BIO-301	Genetics	Biology	4	BIO-101
BIO-399	Computational Biology	Biology	3	BIO-101
CS-190	Game Design	Comp. Sci.	4	CS-101
CS-315	Robotics	Comp. Sci.	3	CS-101
CS-319	Image Processing	Comp. Sci.	3	CS-101
CS-347	Database System Concepts	Comp. Sci.	3	CS-101
EE-181	Intro. to Digital Systems	Elec. Eng.	3	PHY-101
MU-199	Music Video Production	Music	3
HIS-351	World History	History	3
FIN-201	Investment Banking	Finance	3
PHY-101	Physical Principles	Physics	4
BIO-101	Intro. to Biology	Biology	4
CS-101	Intro. to Computer Science	Comp. Sci.	4

specifying join conditions

NATURAL
ON cond, specify join condition cond (as in $\theta$-joins)
USING (A1, ... An) join on equality on all attributes A1, ..., An

1
2


SELECT i.name, t.course_id
FROM instructor i JOIN teaches t ON (i.ID = t.ID)

name	course_id
Srinivasan	CS-101
Srinivasan	CS-315
Srinivasan	CS-347
Wu	FIN-201
Mozart	MU-199
Einstein	PHY-101
El Said	HIS-351
Katz	CS-101
Katz	CS-319
Crick	BIO-101
Crick	BIO-301
Brandt	CS-190
Brandt	CS-190
Brandt	CS-319
Kim	EE-181

Lecture <2022-09-21 Wed>

SQL is bag semantics

bag semantics means we can have duplicates!

1
2
3


SELECT dept_name
FROM student
ORDER BY dept_name;

dept_name
Biology
Comp. Sci.
Comp. Sci.
Comp. Sci.
Comp. Sci.
Elec. Eng.
Elec. Eng.
Finance
History
Music
Physics
Physics
Physics

subqueries

FROM clause can contain queries (and not just tables)

1
2
3
4
5


SELECT d.*
FROM department d,
     (SELECT DISTINCT dept_name
        FROM student) dn
WHERE d.dept_name = dn.dept_name;

dept_name	building	budget
Biology	Watson	90000.00
Comp. Sci.	Taylor	100000.00
Elec. Eng.	Taylor	85000.00
Finance	Painter	120000.00
History	Painter	50000.00
Music	Packard	80000.00
Physics	Watson	70000.00

`WITH` clause

query qj can refer to query qi if i < j

1
2
3
4
5


WITH name1 AS q1,
     name2 AS q2,
     ...
     namem AS qm
SELECT ...

1
2
3
4
5
6


WITH dn AS (SELECT DISTINCT dept_name
        FROM student)
SELECT d.*
FROM department d,
     dn
WHERE d.dept_name = dn.dept_name;

dept_name	building	budget
Biology	Watson	90000.00
Comp. Sci.	Taylor	100000.00
Elec. Eng.	Taylor	85000.00
Finance	Painter	120000.00
History	Painter	50000.00
Music	Packard	80000.00
Physics	Watson	70000.00

`DISTINCT`

remove duplicates

1
2


SELECT DISTINCT dept_name
FROM student;

dept_name
Finance
History
Physics
Music
Comp. Sci.
Biology
Elec. Eng.

String operations

https://www.postgresql.org/docs/current/functions-string.html
string contention (||) and substrings (substring)

1
2


SELECT 'A' || ID || substring(name, 1, 1) AS aid
FROM Student;

aid
A00128Z
A12345S
A19991B
A23121C
A44553P
A45678L
A54321W
A55739S
A70557S
A76543B
A76653A
A98765B
A98988T

string pattern matching LIKE
- . matches any one character
- % matches any sequence of characters (including empty sequence)
return students whose name starts with Z

1
2
3


SELECT *
FROM student
WHERE name LIKE 'Z%';

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102

return students whose name contains 'a' or starts with 'Z'

1
2
3


SELECT *
FROM student
WHERE name LIKE '%a%' OR name LIKE 'Z%';

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

string length

1
2


SELECT name, character_length(name) AS namel
FROM student;

name	namel
Zhang	5
Shankar	7
Brandt	6
Chavez	6
Peltier	7
Levy	4
Williams	8
Sanchez	7
Snow	4
Brown	5
Aoi	3
Bourikas	8
Tanaka	6

Case distinction and coalesce

CASE

list of rules, first condition $c_i$ that matches, we return $e_i$

1
2
3
4
5
6


case
         when c1 then e1
         when c2 then e2
          …
          [else en]
end

1
2
3
4
5
6
7


SELECT name, tot_cred, dept_name AS major,
       CASE
           WHEN dept_name LIKE 'Comp%' AND tot_cred >= 40 THEN 'Ready'
           WHEN tot_cred >= 45 THEN 'Ready'
           ELSE 'Not ready'
       END || ' to graduate' AS is_ready
FROM student

name	tot_cred	major	is_ready
Zhang	102	Comp. Sci.	Ready to graduate
Shankar	32	Comp. Sci.	Not ready to graduate
Brandt	80	History	Ready to graduate
Chavez	110	Finance	Ready to graduate
Peltier	56	Physics	Ready to graduate
Levy	46	Physics	Ready to graduate
Williams	54	Comp. Sci.	Ready to graduate
Sanchez	38	Music	Not ready to graduate
Snow	0	Physics	Not ready to graduate
Brown	58	Comp. Sci.	Ready to graduate
Aoi	60	Elec. Eng.	Ready to graduate
Bourikas	98	Elec. Eng.	Ready to graduate
Tanaka	120	Biology	Ready to graduate

1
2
3
4
5
6


SELECT name, tot_cred, dept_name AS major
FROM student
WHERE CASE
          WHEN (dept_name LIKE 'Comp%' AND tot_cred >= 40) OR tot_cred >= 45 THEN true
          ELSE false
      END

name	tot_cred	major
Zhang	102	Comp. Sci.
Brandt	80	History
Chavez	110	Finance
Peltier	56	Physics
Levy	46	Physics
Williams	54	Comp. Sci.
Brown	58	Comp. Sci.
Aoi	60	Elec. Eng.
Bourikas	98	Elec. Eng.
Tanaka	120	Biology

1
2
3


SELECT name, tot_cred, dept_name AS major
FROM student
WHERE (dept_name LIKE 'Comp%' AND tot_cred >= 40) OR tot_cred >= 45

name	tot_cred	major
Zhang	102	Comp. Sci.
Brandt	80	History
Chavez	110	Finance
Peltier	56	Physics
Levy	46	Physics
Williams	54	Comp. Sci.
Brown	58	Comp. Sci.
Aoi	60	Elec. Eng.
Bourikas	98	Elec. Eng.
Tanaka	120	Biology

coalesce

return first non-null value COALESCE(e1,...,en) return first non-null value

1
2


SELECT name, COALESCE(tot_cred, 0)
FROM student;

insert student with NULL

1
2
3
4
5


DROP TABLE IF EXISTS otherstudents;
CREATE TABLE otherstudents (name VARCHAR(200), tot_cred NUMERIC(3,0));
DELETE FROM otherstudents;
INSERT INTO otherstudents VALUES ('Peter', NULL);
INSERT INTO otherstudents VALUES ('Alice', 100);

DROP TABLE
CREATE TABLE
DELETE 0
INSERT 0 1
INSERT 0 1

can be expressed with CASE

1
2
3
4
5
6


SELECT name,
       CASE
           WHEN tot_cred >= 0 THEN tot_cred
           ELSE 0
       END AS tot_cred
FROM otherstudents;

name	tot_cred
Peter	0
Alice	100

`IS NULL` and `IS NOT NULL`

check whether a value is NULL (or not)

1
2
3


SELECT *
  FROM otherstudents
 WHERE tot_cred IS NOT NULL;

name	tot_cred
Alice	100

Ordering results

ORDER BY clause orders results on a list of expressions
- ASC ascending order (smallest first)
- DESC descending order (largest first)

1
2
3


SELECT name, id AS cid, tot_cred, dept_name
FROM student
ORDER BY tot_cred DESC, cid ASC;

name	cid	tot_cred	dept_name
Tanaka	98988	120	Biology
Chavez	23121	110	Finance
Zhang	00128	102	Comp. Sci.
Bourikas	98765	98	Elec. Eng.
Brandt	19991	80	History
Aoi	76653	60	Elec. Eng.
Brown	76543	58	Comp. Sci.
Peltier	44553	56	Physics
Williams	54321	54	Comp. Sci.
Levy	45678	46	Physics
Sanchez	55739	38	Music
Shankar	12345	32	Comp. Sci.
Snow	70557	0	Physics

Set operations

three set operations
- UNION, INTERSECT, and EXCEPT (set difference)
- come in two flavors:
  - set version
  - bag version (by add ALL after the operation)
- applied to queries

1
2
3
4
5
6


Q1 UNION Q2
Q1 UNION ALL Q2
Q1 INTERSECT Q2
Q1 INTERSECT ALL Q2
Q1 EXCEPT Q2
Q1 EXCEPT ALL Q2

set union

1
2
3
4
5
6


SELECT * FROM (
(SELECT name FROM student)
UNION
(SELECT name FROM instructor)
) AS names
ORDER BY name;

name
Aoi
Bourikas
Brandt
Brown
Califieri
Chavez
Crick
Einstein
El Said
Gold
Katz
Kim
Levy
Mozart
Peltier
Sanchez
Shankar
Singh
Snow
Srinivasan
Tanaka
Williams
Wu
Zhang

bag union

if we have n duplicate of a row in the left input and m duplicates in the right input, then there will n + m duplicates of this row in the result of UNION ALL

1
2
3
4
5
6


SELECT * FROM (
(SELECT name FROM student)
UNION ALL
(SELECT name FROM instructor)
) AS names
ORDER BY name;

name
Aoi
Bourikas
Brandt
Brandt
Brown
Califieri
Chavez
Crick
Einstein
El Said
Gold
Katz
Kim
Levy
Mozart
Peltier
Sanchez
Shankar
Singh
Snow
Srinivasan
Tanaka
Williams
Wu
Zhang

1
2
3
4
5
6
7
8


SELECT * FROM (
(SELECT name FROM student)
UNION ALL
(SELECT name FROM instructor)
UNION ALL
(SELECT name FROM instructor)
) AS names
ORDER BY name;

name
Aoi
Bourikas
Brandt
Brandt
Brandt
Brown
Califieri
Califieri
Chavez
Crick
Crick
Einstein
Einstein
El Said
El Said
Gold
Gold
Katz
Katz
Kim
Kim
Levy
Mozart
Mozart
Peltier
Sanchez
Shankar
Singh
Singh
Snow
Srinivasan
Srinivasan
Tanaka
Williams
Wu
Wu
Zhang

set & bag intersection

set intersection

1
2
3
4
5
6


SELECT * FROM (
(SELECT name FROM student)
INTERSECT
(SELECT name FROM instructor)
) AS names
ORDER BY name;

name
Brandt

bag intersection
- if we have n duplicate of a row in the left input and m duplicates in the right input, then there will min(n,m) duplicates of this row in the result of UNION ALL

1
2
3
4
5
6


SELECT * FROM (
((SELECT name FROM student) UNION ALL (SELECT name FROM student) UNION ALL (SELECT name FROM student))
INTERSECT ALL
((SELECT name FROM instructor) UNION ALL (SELECT name FROM instructor))
) AS names
ORDER BY name;

name
Brandt
Brandt

set & bag difference

1
2
3
4
5
6


SELECT * FROM (
(SELECT name FROM student)
EXCEPT
(SELECT name FROM instructor)
) AS names
ORDER BY name;

name
Aoi
Bourikas
Brown
Chavez
Levy
Peltier
Sanchez
Shankar
Snow
Tanaka
Williams
Zhang

no Brandt
bag intersection
- if we have n duplicate of a row in the left input and m duplicates in the right input, then there will max(n - m, 0) duplicates of this row in the result of EXCEPT ALL

1
2
3
4
5
6


SELECT * FROM (
((SELECT name FROM student) UNION ALL (SELECT name FROM student) UNION ALL (SELECT name FROM student))
EXCEPT ALL
((SELECT name FROM instructor))
) AS names
ORDER BY name;

name
Aoi
Aoi
Aoi
Bourikas
Bourikas
Bourikas
Brandt
Brandt
Brown
Brown
Brown
Chavez
Chavez
Chavez
Levy
Levy
Levy
Peltier
Peltier
Peltier
Sanchez
Sanchez
Sanchez
Shankar
Shankar
Shankar
Snow
Snow
Snow
Tanaka
Tanaka
Tanaka
Williams
Williams
Williams
Zhang
Zhang
Zhang

Aggregation and Grouping

aggregation are part of the SELECT clause
- count, avg, min, max, sum

1
2


SELECT count(*) AS num_students, avg(tot_cred) AS avgcred
FROM student;

num_students	avgcred
13	65.6923076923076923

GROUP BY clause

1
2
3


SELECT dept_name AS major, count(*)::float / 1000.0 AS num_students, round(avg(tot_cred::float / 3.0)::numeric,2) AS avgcred
FROM student
GROUP BY dept_name;

major	num_students	avgcred
Finance	0.001	36.67
History	0.001	26.67
Physics	0.003	11.33
Music	0.001	12.67
Comp. Sci.	0.004	20.50
Biology	0.001	40.00
Elec. Eng.	0.002	26.33

if aggregation and / or group by used, only group-by expressions or aggregated columns can be used in the SELECT clause

1
2
3


SELECT name, dept_name AS major, count(*)::float / 1000.0 AS num_students, round(avg(tot_cred::float / 3.0)::numeric,2) AS avgcred
FROM student
GROUP BY dept_name;

ERROR:  column "student.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT name, dept_name AS major, count(*)::float / 1000.0 AS...

Lecture <2022-09-26 Mon>

Aggregation and `NULL` and empty inputs

empty tables

1
2


CREATE TABLE IF NOT EXISTS myemptytable(a int, b int);
TRUNCATE myemptytable;

CREATE TABLE

count over an empty table returns 0
sum and other aggregates return NULL

1
2


SELECT count(*), sum(A), sum(A) IS NULL
FROM myemptytable;

count	sum	?column?
0		t

some null values in tables

1
2


INSERT INTO myemptytable VALUES (NULL, 10);
INSERT INTO myemptytable VALUES (NULL, 20);

INSERT 0 1
INSERT 0 1

1

SELECT * FROM myemptytable;

a	b
	10
	20

1
2


SELECT count(*), sum(A), sum(A) IS NULL
FROM myemptytable;

count	sum	?column?
2		t

1
2


INSERT INTO myemptytable VALUES (5, 10);
INSERT INTO myemptytable VALUES (5, 20);

INSERT 0 1
INSERT 0 1

if there are some non-null values, all null values are ignored

1
2


SELECT count(*), sum(A), sum(A) IS NULL
FROM myemptytable;

count	sum	?column?
4	10	f

Aggregation with `HAVING`

filter departments with less than 3 students

1
2
3
4
5
6


SELECT nst, dept_name -- can use *
  FROM
    (SELECT count(*) AS nst, dept_name
       FROM student
      GROUP BY dept_name) AS sc
 WHERE nst < 3;

nst	dept_name
1	Finance
1	History
1	Music
1	Biology
2	Elec. Eng.

HAVING clause filters after aggregation, evaluated after GROUP BY, but before SELECT
- can only refer to group-by expression and aggregated results

1
2
3
4


SELECT count(*) AS nst, dept_name
  FROM student
 GROUP BY dept_name
HAVING count(*) < 3;

nst	dept_name
1	Finance
1	History
1	Music
1	Biology
2	Elec. Eng.

`LIMIT` and `OFFSET` and top-k queries

order students on tot_creds

1
2
3


SELECT *
FROM student
ORDER BY tot_cred DESC;

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120
23121	Chavez	Finance	110
00128	Zhang	Comp. Sci.	102
98765	Bourikas	Elec. Eng.	98
19991	Brandt	History	80
76653	Aoi	Elec. Eng.	60
76543	Brown	Comp. Sci.	58
44553	Peltier	Physics	56
54321	Williams	Comp. Sci.	54
45678	Levy	Physics	46
55739	Sanchez	Music	38
12345	Shankar	Comp. Sci.	32
70557	Snow	Physics	0

LIMIT n returns the first n row
- evaluated after ORDER BY
return 3 students with highest tot_cred

1
2
3
4


SELECT *
FROM student
ORDER BY tot_cred DESC
LIMIT 3;

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120
23121	Chavez	Finance	110
00128	Zhang	Comp. Sci.	102

return 3 students with lowest tot_cred

1
2
3
4


SELECT *
FROM student
ORDER BY tot_cred ASC
LIMIT 3;

id	name	dept_name	tot_cred
70557	Snow	Physics	0
12345	Shankar	Comp. Sci.	32
55739	Sanchez	Music	38

OFFSET n skips first n rows
- return 4th to 6th students with highest tot_cred

1
2
3
4


SELECT *
FROM student
ORDER BY tot_cred DESC
OFFSET 3 LIMIT 3;

id	name	dept_name	tot_cred
98765	Bourikas	Elec. Eng.	98
19991	Brandt	History	80
76653	Aoi	Elec. Eng.	60

Bag (multiset) relational algebra

Nested & correlated subqueries

scalar subqueries

students whose tot_cred are larger than the avg tot_cred

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


SELECT s.*
  FROM student s,
       (SELECT avg(tot_cred) AS atc
          FROM student) AS avgct
WHERE tot_cred > atc;

WITH avgct AS (SELECT avg(tot_cred) AS atc
          FROM student)
SELECT s.*
  FROM student s, acgct
WHERE tot_cred > atc;

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
19991	Brandt	History	80
23121	Chavez	Finance	110
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

scalar subqueries are queries that evaluate to a single row, and can be used like scalar values in, e.g., WHERE clause

1
2
3
4


SELECT *
FROM student
WHERE tot_cred > (SELECT avg(tot_cred)
                    FROM student);

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
19991	Brandt	History	80
23121	Chavez	Finance	110
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

what happens if scalar subquery returns more than one row?
- runtime error

1
2
3
4
5


SELECT *
FROM student
WHERE tot_cred > (SELECT avg(tot_cred)
                    FROM student
                  GROUP BY dept_name);

`EXISTS` subqueries

EXISTS q is a boolean operator that returns true, if q returns at least result, false otherwise

1
2
3


SELECT *
FROM student
WHERE EXISTS (SELECT * FROM student WHERE tot_cred > 100);

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
45678	Levy	Physics	46
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
70557	Snow	Physics	0
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

`IN` subqueries

return CS instructors that teach at least one course

1
2
3


SELECT *
FROM instructor i
WHERE dept_name = 'Comp. Sci.' AND (i.id IN (SELECT id FROM teaches));

id	name	dept_name	salary
10101	Srinivasan	Comp. Sci.	65000.00
12121	Wu	Finance	90000.00
15151	Mozart	Music	40000.00
22222	Einstein	Physics	95000.00
32343	El Said	History	60000.00
45565	Katz	Comp. Sci.	75000.00
76766	Crick	Biology	72000.00
83821	Brandt	Comp. Sci.	92000.00
98345	Kim	Elec. Eng.	80000.00

correlations

use attribute from the outer query inside the nested subquery (inner query)
- semantics: for each row from the outer query FROM clause substitute its values for the correlated attributes, then evaluate the resulting inner query.
find students where there is at least one student in the same department that has tot_cred > 100

1
2
3


SELECT *
FROM student s1
WHERE EXISTS (SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND s1.dept_name = s2.dept_name);

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
23121	Chavez	Finance	110
54321	Williams	Comp. Sci.	54
76543	Brown	Comp. Sci.	58
98988	Tanaka	Biology	120

Lecture <2022-09-28 Wed>

nested subqueries

example of evaluating nested subqueries with correlations

find students where there is at least one student in the same department that has tot_cred > 100

1
2
3


SELECT *
FROM student s1
WHERE EXISTS (SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND s1.dept_name = s2.dept_name);

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
23121	Chavez	Finance	110
54321	Williams	Comp. Sci.	54
76543	Brown	Comp. Sci.	58
98988	Tanaka	Biology	120

1

SELECT * FROM student ORDER BY dept_name;

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120
12345	Shankar	Comp. Sci.	32
54321	Williams	Comp. Sci.	54
00128	Zhang	Comp. Sci.	102
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
23121	Chavez	Finance	110
19991	Brandt	History	80
55739	Sanchez	Music	38
44553	Peltier	Physics	56
70557	Snow	Physics	0
45678	Levy	Physics	46

foreach student, replacet s1.dept_name with the student's dept_name and then evaluate the query and WHERE

id	name	dept_name	tot_cred
12345	Shankar	Comp. Sci.	32

1

SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND 'Comp. Sci.' = s2.dept_name;

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102

1

SELECT EXISTS (SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND 'Comp. Sci.' = s2.dept_name);

exists
t

other nesting constructs `ANY` and `ALL`

ANY and ALL can be used comparison operators
- ANY returns true if for at least one result of the nested query the comparison evaluates to true
- ALL returns true if for all results of the nested query the comparison evaluates to true

1
2
3


SELECT *
FROM student s1
WHERE tot_cred >= ALL (SELECT tot_cred FROM student s2)

id	name	dept_name	tot_cred
98988	Tanaka	Biology	120

1
2
3


SELECT *
FROM student s1
WHERE tot_cred > ANY (SELECT tot_cred FROM student s2)

id	name	dept_name	tot_cred
00128	Zhang	Comp. Sci.	102
12345	Shankar	Comp. Sci.	32
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
45678	Levy	Physics	46
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120

1
2
3
4
5
6
7


SELECT *
FROM student s1
WHERE tot_cred = ANY (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name AND s1.id <> s2.id);

SELECT *
FROM student s1
WHERE tot_cred IN (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name AND s1.id <> s2.id);

id

name

dept_name

tot_cred

universal quantification with nested subqueries

EXISTS is a boolean operator, so we can for example, negate it
students that are the only students in their department

1
2
3


SELECT *
FROM student s1
WHERE NOT EXISTS (SELECT * FROM student s2 WHERE s1.id <> s2.id AND s1.dept_name = s2.dept_name);

id	name	dept_name	tot_cred
19991	Brandt	History	80
23121	Chavez	Finance	110
55739	Sanchez	Music	38
98988	Tanaka	Biology	120

departments where all of the students have a tot_cred > 70
- first with ALL

1
2
3


SELECT DISTINCT dept_name
FROM student s1
WHERE 70 < ALL (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name);

dept_name
Biology
Finance
History

1
2
3


SELECT DISTINCT dept_name
FROM student s1
WHERE NOT EXISTS (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name AND s2.tot_cred <= 70);

dept_name
Finance
History
Biology

Give me the name of students that have taken all Biology classes
- for student s, FORALL x in Biology classes: EXISTS takes(s,x)
  - NOT EXISTS x in Biology classes: (NOT EXISTS takes(s,x))

1
2
3
4
5
6
7


WITH bioclasses AS (SELECT course_id FROM course WHERE dept_name = 'Biology')

SELECT DISTINCT id
FROM takes t1
WHERE NOT EXISTS (SELECT * FROM bioclasses c
                   WHERE NOT EXISTS (SELECT * FROM takes t2
                                      WHERE c.course_id = t2.course_id AND t1.id = t2.id))

id

Window functions

OVER()

1
2


SELECT name, avg(tot_cred) OVER () AS avgc
FROM student;

name	avgc
Zhang	65.6923076923076923
Shankar	65.6923076923076923
Brandt	65.6923076923076923
Chavez	65.6923076923076923
Peltier	65.6923076923076923
Levy	65.6923076923076923
Williams	65.6923076923076923
Sanchez	65.6923076923076923
Snow	65.6923076923076923
Brown	65.6923076923076923
Aoi	65.6923076923076923
Bourikas	65.6923076923076923
Tanaka	65.6923076923076923

PARTITION BY - basically group by

1
2


SELECT name, tot_cred, dept_name, avg(tot_cred) OVER (PARTITION BY dept_name) AS tcred
  FROM student;

name	tot_cred	dept_name	tcred
Tanaka	120	Biology	120.0000000000000000
Shankar	32	Comp. Sci.	61.5000000000000000
Williams	54	Comp. Sci.	61.5000000000000000
Zhang	102	Comp. Sci.	61.5000000000000000
Brown	58	Comp. Sci.	61.5000000000000000
Aoi	60	Elec. Eng.	79.0000000000000000
Bourikas	98	Elec. Eng.	79.0000000000000000
Chavez	110	Finance	110.0000000000000000
Brandt	80	History	80.0000000000000000
Sanchez	38	Music	38.0000000000000000
Peltier	56	Physics	34.0000000000000000
Snow	0	Physics	34.0000000000000000
Levy	46	Physics	34.0000000000000000

can also include / exclude rows based on ordering

1
2
3


SELECT name, tot_cred, dept_name, sum(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tcred
  FROM student
ORDER BY tot_cred;

name	tot_cred	dept_name	tcred
Snow	0	Physics	0
Shankar	32	Comp. Sci.	32
Sanchez	38	Music	70
Levy	46	Physics	116
Williams	54	Comp. Sci.	170
Peltier	56	Physics	226
Brown	58	Comp. Sci.	284
Aoi	60	Elec. Eng.	344
Brandt	80	History	424
Bourikas	98	Elec. Eng.	522
Zhang	102	Comp. Sci.	624
Chavez	110	Finance	734
Tanaka	120	Biology	854

1
2
3


SELECT name, tot_cred, dept_name, sum(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS tcred
  FROM student
ORDER BY tot_cred;

name	tot_cred	dept_name	tcred
Snow	0	Physics	854
Shankar	32	Comp. Sci.	854
Sanchez	38	Music	822
Levy	46	Physics	784
Williams	54	Comp. Sci.	738
Peltier	56	Physics	684
Brown	58	Comp. Sci.	628
Aoi	60	Elec. Eng.	570
Brandt	80	History	510
Bourikas	98	Elec. Eng.	430
Zhang	102	Comp. Sci.	332
Chavez	110	Finance	230
Tanaka	120	Biology	120

1
2
3


SELECT name, tot_cred, dept_name, sum(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS tcred
  FROM student
ORDER BY tot_cred;

name	tot_cred	dept_name	tcred
Snow	0	Physics	32
Shankar	32	Comp. Sci.	70
Sanchez	38	Music	116
Levy	46	Physics	138
Williams	54	Comp. Sci.	156
Peltier	56	Physics	168
Brown	58	Comp. Sci.	174
Aoi	60	Elec. Eng.	198
Brandt	80	History	238
Bourikas	98	Elec. Eng.	280
Zhang	102	Comp. Sci.	310
Chavez	110	Finance	332
Tanaka	120	Biology	230

specify how many rows to include
- ROWS BETWEEN x AND y
  - n PRECEDING
  - n FOLLOWING
  - UNLIMITED PRECEDING / FOLLOWING
  - CURRENT ROW

1
2


SELECT name, tot_cred, dept_name, min(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN 3 PRECEDING AND 2 PRECEDING) AS tcred
  FROM student;

name	tot_cred	dept_name	tcred
Snow	0	Physics
Shankar	32	Comp. Sci.
Sanchez	38	Music	0
Levy	46	Physics	0
Williams	54	Comp. Sci.	32
Peltier	56	Physics	38
Brown	58	Comp. Sci.	46
Aoi	60	Elec. Eng.	54
Brandt	80	History	56
Bourikas	98	Elec. Eng.	58
Zhang	102	Comp. Sci.	60
Chavez	110	Finance	80
Tanaka	120	Biology	98

Lecture <2022-10-03 Mon>

window functions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


CREATE TABLE sales (year INT, month INT, totsales NUMERIC(9,2));

INSERT INTO sales VALUES (2022, 1, 40.54);
INSERT INTO sales VALUES (2022, 2, 100.54);
INSERT INTO sales VALUES (2022, 3, 12.2 );
INSERT INTO sales VALUES (2022, 4, 66.66);
INSERT INTO sales VALUES (2022, 5, 8.888);
INSERT INTO sales VALUES (2022, 6, 10.00);
INSERT INTO sales VALUES (2022, 7, 100.0);
INSERT INTO sales VALUES (2022, 8, 54453.12);
INSERT INTO sales VALUES (2022, 9, 12.1);
INSERT INTO sales VALUES (2022, 10, 15.34);
INSERT INTO sales VALUES (2022, 11, 65.33);
INSERT INTO sales VALUES (2022, 12, 102.2);

CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

1

SELECT * FROM sales;

year	month	totsales
2022	1	40.54
2022	2	100.54
2022	3	12.20
2022	4	66.66
2022	5	8.89
2022	6	10.00
2022	7	100.00
2022	8	54453.12
2022	9	12.10
2022	10	15.34
2022	11	65.33
2022	12	102.20

running total of sales in 2022 over the month

1
2
3


SELECT year, month, totsales, sum(totsales) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales
WHERE year = 2022

year	month	totsales	running_total
2022	1	40.54	40.54
2022	2	100.54	141.08
2022	3	12.20	153.28
2022	4	66.66	219.94
2022	5	8.89	228.83
2022	6	10.00	238.83
2022	7	100.00	338.83
2022	8	54453.12	54791.95
2022	9	12.10	54804.05
2022	10	15.34	54819.39
2022	11	65.33	54884.72
2022	12	102.20	54986.92

sales for current month and the two previous months

1
2
3


SELECT year, month, totsales, sum(totsales) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS running_3_month_total
FROM sales
WHERE year = 2022

year	month	totsales	running_3_month_total
2022	1	40.54	40.54
2022	2	100.54	141.08
2022	3	12.20	153.28
2022	4	66.66	179.40
2022	5	8.89	87.75
2022	6	10.00	85.55
2022	7	100.00	118.89
2022	8	54453.12	54563.12
2022	9	12.10	54565.22
2022	10	15.34	54480.56
2022	11	65.33	92.77
2022	12	102.20	182.87

recursive queries (views)

flights (Boston, Chicago, 'United', 544)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


DROP TABLE IF EXISTS flights;
CREATE TABLE flights
  (origin TEXT,
  destination TEXT,
  airline TEXT,
  flightnr INT,
  PRIMARY KEY(airline, flightnr));

INSERT INTO flights VALUES ('ORD', 'SEA', 'UA', 455);
INSERT INTO flights VALUES ('SEA', 'LAX', 'UA', 560);
INSERT INTO flights VALUES ('SEA', 'ORD', 'UA', 566);
INSERT INTO flights VALUES ('SEA', 'BER', 'DE', 111);
INSERT INTO flights VALUES ('BER', 'PAR', 'LH', 666);

DROP TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

1

SELECT * FROM flights;

origin	destination	airline	flightnr
ORD	SEA	UA	455
SEA	LAX	UA	560
SEA	ORD	UA	566
SEA	BER	DE	111
BER	PAR	LH	666

is there a direct connection from ORD to BER

1
2
3


SELECT *
  FROM flights
WHERE origin = 'ORD' AND destination = 'BER';

origin

destination

airline

flightnr

is there a direct connection from ORD to BER with at most one stop

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


WITH nostop AS (SELECT 0 AS numstops, f.origin, f.destination
  FROM flights f),

  onestop AS (SELECT 1 AS numstops, f1.origin, f2.destination
       FROM flights f1, flights f2
      WHERE f1.destination = f2.origin),

  twostops AS (SELECT 2 AS numstops, f1.origin, f2.destination
       FROM onestop f1, flights f2
      WHERE f1.destination = f2.origin),

  threestops AS (SELECT 3 AS numstops, f1.origin, f2.destination
       FROM twostops f1, flights f2
      WHERE f1.destination = f2.origin)

SELECT numstops
FROM (SELECT * FROM nostop
      UNION
      SELECT * FROM onestop
      UNION
      SELECT * FROM twostops
      UNION
      SELECT * FROM threestops) f
WHERE f.origin = 'ORD' AND f.destination = 'BER';

numstops
1
3

way to express recursive computations in SQL
- fixpoint computation:
  - initialization (specifies initial content)
  - recursive step (compute next iteration content based on current content)

1
2
3
4
5
6
7


WITH RECURSIVE allconnections AS (
  SELECT origin, destination, 0 AS numstops FROM flights -- initialization
  UNION
  SELECT a.origin, f.destination, a.numstops + 1 FROM allconnections a, flights f WHERE a.destination = f.origin AND numstops < 10 -- recusive step
  )

SELECT * FROM allconnections WHERE origin = 'ORD' ORDER BY destination, numstops;

origin	destination	numstops
ORD	BER	1
ORD	BER	3
ORD	BER	5
ORD	BER	7
ORD	BER	9
ORD	LAX	1
ORD	LAX	3
ORD	LAX	5
ORD	LAX	7
ORD	LAX	9
ORD	ORD	1
ORD	ORD	3
ORD	ORD	5
ORD	ORD	7
ORD	ORD	9
ORD	PAR	2
ORD	PAR	4
ORD	PAR	6
ORD	PAR	8
ORD	PAR	10
ORD	SEA	0
ORD	SEA	2
ORD	SEA	4
ORD	SEA	6
ORD	SEA	8
ORD	SEA	10

this is a fixpoint computation
initialization, let $T_i$ is the state of the result at iteration $i$

\[ T_0 = Q_{init}(D) \]

recusive steps

\[ T_{i+1} = T_i \cup Q_{recursive}(T_i, D) \]

termination: $T_{i+1} = T_{i}$

explaining query plans

Lecture <2022-10-05 Wed>

DDL

views

how many students per department

1
2
3
4
5


DROP VIEW deptheadcount;
CREATE VIEW deptheadcount AS (
SELECT dept_name, count(*) AS headcnt
  FROM student
GROUP BY dept_name);

DROP VIEW
CREATE VIEW

1
2
3


SELECT dept_name, headcnt
FROM deptheadcount
WHERE headcnt < 3;

dept_name	headcnt
History	1
Music	1
Biology	1
Elec. Eng.	2
Finance	1

this is equivalent to

1
2
3
4
5


SELECT dept_name, headcnt
FROM (SELECT dept_name, count(*) AS headcnt
  FROM student
GROUP BY dept_name) deph
WHERE headcnt < 3;

dept_name	headcnt
History	1
Music	1
Biology	1
Elec. Eng.	2
Finance	1

materialized view store the view result

1
2
3
4
5
6


DROP VIEW deptheadcount;
CREATE MATERIALIZED VIEW deptheadcount AS (
SELECT dept_name, count(*) AS headcnt
  FROM student
GROUP BY dept_name);
;

DROP VIEW
SELECT 7

1
2
3


SELECT dept_name, headcnt
FROM deptheadcount
WHERE headcnt < 3;

dept_name	headcnt
History	1
Music	1
Biology	1
Elec. Eng.	2
Finance	1

when the base tables are updated, materialized views get "out of sync"
use REFRESH to make sure the view is up to date.

1

SELECT * FROM deptheadcount;

1

DELETE FROM student WHERE name = 'Shankar';

1

REFRESH MATERIALIZED VIEW deptheadcount;

1

SELECT * FROM deptheadcount;

dept_name	headcnt
History	1
Music	1
Physics	3
Biology	1
Elec. Eng.	2
Finance	1
Comp. Sci.	2

catalog (information schema)

catalog makes schema information query-able

1
2
3


SELECT table_schema, table_name, table_type
  FROM information_schema.tables
  ORDER BY table_schema, table_name;

table_schema	table_name	table_type
information_schema	_pg_foreign_data_wrappers	VIEW
information_schema	_pg_foreign_servers	VIEW
information_schema	_pg_foreign_table_columns	VIEW
information_schema	_pg_foreign_tables	VIEW
information_schema	_pg_user_mappings	VIEW
information_schema	administrable_role_authorizations	VIEW
information_schema	applicable_roles	VIEW
information_schema	attributes	VIEW
information_schema	character_sets	VIEW
information_schema	check_constraint_routine_usage	VIEW
information_schema	check_constraints	VIEW
information_schema	collation_character_set_applicability	VIEW
information_schema	collations	VIEW
information_schema	column_column_usage	VIEW
information_schema	column_domain_usage	VIEW
information_schema	column_options	VIEW
information_schema	column_privileges	VIEW
information_schema	column_udt_usage	VIEW
information_schema	columns	VIEW
information_schema	constraint_column_usage	VIEW
information_schema	constraint_table_usage	VIEW
information_schema	data_type_privileges	VIEW
information_schema	domain_constraints	VIEW
information_schema	domain_udt_usage	VIEW
information_schema	domains	VIEW
information_schema	element_types	VIEW
information_schema	enabled_roles	VIEW
information_schema	foreign_data_wrapper_options	VIEW
information_schema	foreign_data_wrappers	VIEW
information_schema	foreign_server_options	VIEW
information_schema	foreign_servers	VIEW
information_schema	foreign_table_options	VIEW
information_schema	foreign_tables	VIEW
information_schema	information_schema_catalog_name	VIEW
information_schema	key_column_usage	VIEW
information_schema	parameters	VIEW
information_schema	referential_constraints	VIEW
information_schema	role_column_grants	VIEW
information_schema	role_routine_grants	VIEW
information_schema	role_table_grants	VIEW
information_schema	role_udt_grants	VIEW
information_schema	role_usage_grants	VIEW
information_schema	routine_column_usage	VIEW
information_schema	routine_privileges	VIEW
information_schema	routine_routine_usage	VIEW
information_schema	routine_sequence_usage	VIEW
information_schema	routine_table_usage	VIEW
information_schema	routines	VIEW
information_schema	schemata	VIEW
information_schema	sequences	VIEW
information_schema	sql_features	BASE TABLE
information_schema	sql_implementation_info	BASE TABLE
information_schema	sql_parts	BASE TABLE
information_schema	sql_sizing	BASE TABLE
information_schema	table_constraints	VIEW
information_schema	table_privileges	VIEW
information_schema	tables	VIEW
information_schema	transforms	VIEW
information_schema	triggered_update_columns	VIEW
information_schema	triggers	VIEW
information_schema	udt_privileges	VIEW
information_schema	usage_privileges	VIEW
information_schema	user_defined_types	VIEW
information_schema	user_mapping_options	VIEW
information_schema	user_mappings	VIEW
information_schema	view_column_usage	VIEW
information_schema	view_routine_usage	VIEW
information_schema	view_table_usage	VIEW
information_schema	views	VIEW
pg_catalog	pg_aggregate	BASE TABLE
pg_catalog	pg_am	BASE TABLE
pg_catalog	pg_amop	BASE TABLE
pg_catalog	pg_amproc	BASE TABLE
pg_catalog	pg_attrdef	BASE TABLE
pg_catalog	pg_attribute	BASE TABLE
pg_catalog	pg_auth_members	BASE TABLE
pg_catalog	pg_authid	BASE TABLE
pg_catalog	pg_available_extension_versions	VIEW
pg_catalog	pg_available_extensions	VIEW
pg_catalog	pg_backend_memory_contexts	VIEW
pg_catalog	pg_cast	BASE TABLE
pg_catalog	pg_class	BASE TABLE
pg_catalog	pg_collation	BASE TABLE
pg_catalog	pg_config	VIEW
pg_catalog	pg_constraint	BASE TABLE
pg_catalog	pg_conversion	BASE TABLE
pg_catalog	pg_cursors	VIEW
pg_catalog	pg_database	BASE TABLE
pg_catalog	pg_db_role_setting	BASE TABLE
pg_catalog	pg_default_acl	BASE TABLE
pg_catalog	pg_depend	BASE TABLE
pg_catalog	pg_description	BASE TABLE
pg_catalog	pg_enum	BASE TABLE
pg_catalog	pg_event_trigger	BASE TABLE
pg_catalog	pg_extension	BASE TABLE
pg_catalog	pg_file_settings	VIEW
pg_catalog	pg_foreign_data_wrapper	BASE TABLE
pg_catalog	pg_foreign_server	BASE TABLE
pg_catalog	pg_foreign_table	BASE TABLE
pg_catalog	pg_group	VIEW
pg_catalog	pg_hba_file_rules	VIEW
pg_catalog	pg_index	BASE TABLE
pg_catalog	pg_indexes	VIEW
pg_catalog	pg_inherits	BASE TABLE
pg_catalog	pg_init_privs	BASE TABLE
pg_catalog	pg_language	BASE TABLE
pg_catalog	pg_largeobject	BASE TABLE
pg_catalog	pg_largeobject_metadata	BASE TABLE
pg_catalog	pg_locks	VIEW
pg_catalog	pg_matviews	VIEW
pg_catalog	pg_namespace	BASE TABLE
pg_catalog	pg_opclass	BASE TABLE
pg_catalog	pg_operator	BASE TABLE
pg_catalog	pg_opfamily	BASE TABLE
pg_catalog	pg_partitioned_table	BASE TABLE
pg_catalog	pg_policies	VIEW
pg_catalog	pg_policy	BASE TABLE
pg_catalog	pg_prepared_statements	VIEW
pg_catalog	pg_prepared_xacts	VIEW
pg_catalog	pg_proc	BASE TABLE
pg_catalog	pg_publication	BASE TABLE
pg_catalog	pg_publication_rel	BASE TABLE
pg_catalog	pg_publication_tables	VIEW
pg_catalog	pg_range	BASE TABLE
pg_catalog	pg_replication_origin	BASE TABLE
pg_catalog	pg_replication_origin_status	VIEW
pg_catalog	pg_replication_slots	VIEW
pg_catalog	pg_rewrite	BASE TABLE
pg_catalog	pg_roles	VIEW
pg_catalog	pg_rules	VIEW
pg_catalog	pg_seclabel	BASE TABLE
pg_catalog	pg_seclabels	VIEW
pg_catalog	pg_sequence	BASE TABLE
pg_catalog	pg_sequences	VIEW
pg_catalog	pg_settings	VIEW
pg_catalog	pg_shadow	VIEW
pg_catalog	pg_shdepend	BASE TABLE
pg_catalog	pg_shdescription	BASE TABLE
pg_catalog	pg_shmem_allocations	VIEW
pg_catalog	pg_shseclabel	BASE TABLE
pg_catalog	pg_stat_activity	VIEW
pg_catalog	pg_stat_all_indexes	VIEW
pg_catalog	pg_stat_all_tables	VIEW
pg_catalog	pg_stat_archiver	VIEW
pg_catalog	pg_stat_bgwriter	VIEW
pg_catalog	pg_stat_database	VIEW
pg_catalog	pg_stat_database_conflicts	VIEW
pg_catalog	pg_stat_gssapi	VIEW
pg_catalog	pg_stat_progress_analyze	VIEW
pg_catalog	pg_stat_progress_basebackup	VIEW
pg_catalog	pg_stat_progress_cluster	VIEW
pg_catalog	pg_stat_progress_copy	VIEW
pg_catalog	pg_stat_progress_create_index	VIEW
pg_catalog	pg_stat_progress_vacuum	VIEW
pg_catalog	pg_stat_replication	VIEW
pg_catalog	pg_stat_replication_slots	VIEW
pg_catalog	pg_stat_slru	VIEW
pg_catalog	pg_stat_ssl	VIEW
pg_catalog	pg_stat_subscription	VIEW
pg_catalog	pg_stat_sys_indexes	VIEW
pg_catalog	pg_stat_sys_tables	VIEW
pg_catalog	pg_stat_user_functions	VIEW
pg_catalog	pg_stat_user_indexes	VIEW
pg_catalog	pg_stat_user_tables	VIEW
pg_catalog	pg_stat_wal	VIEW
pg_catalog	pg_stat_wal_receiver	VIEW
pg_catalog	pg_stat_xact_all_tables	VIEW
pg_catalog	pg_stat_xact_sys_tables	VIEW
pg_catalog	pg_stat_xact_user_functions	VIEW
pg_catalog	pg_stat_xact_user_tables	VIEW
pg_catalog	pg_statio_all_indexes	VIEW
pg_catalog	pg_statio_all_sequences	VIEW
pg_catalog	pg_statio_all_tables	VIEW
pg_catalog	pg_statio_sys_indexes	VIEW
pg_catalog	pg_statio_sys_sequences	VIEW
pg_catalog	pg_statio_sys_tables	VIEW
pg_catalog	pg_statio_user_indexes	VIEW
pg_catalog	pg_statio_user_sequences	VIEW
pg_catalog	pg_statio_user_tables	VIEW
pg_catalog	pg_statistic	BASE TABLE
pg_catalog	pg_statistic_ext	BASE TABLE
pg_catalog	pg_statistic_ext_data	BASE TABLE
pg_catalog	pg_stats	VIEW
pg_catalog	pg_stats_ext	VIEW
pg_catalog	pg_stats_ext_exprs	VIEW
pg_catalog	pg_subscription	BASE TABLE
pg_catalog	pg_subscription_rel	BASE TABLE
pg_catalog	pg_tables	VIEW
pg_catalog	pg_tablespace	BASE TABLE
pg_catalog	pg_timezone_abbrevs	VIEW
pg_catalog	pg_timezone_names	VIEW
pg_catalog	pg_transform	BASE TABLE
pg_catalog	pg_trigger	BASE TABLE
pg_catalog	pg_ts_config	BASE TABLE
pg_catalog	pg_ts_config_map	BASE TABLE
pg_catalog	pg_ts_dict	BASE TABLE
pg_catalog	pg_ts_parser	BASE TABLE
pg_catalog	pg_ts_template	BASE TABLE
pg_catalog	pg_type	BASE TABLE
pg_catalog	pg_user	VIEW
pg_catalog	pg_user_mapping	BASE TABLE
pg_catalog	pg_user_mappings	VIEW
pg_catalog	pg_views	VIEW
public	advisor	BASE TABLE
public	classroom	BASE TABLE
public	course	BASE TABLE
public	department	BASE TABLE
public	flights	BASE TABLE
public	instructor	BASE TABLE
public	myemptytable	BASE TABLE
public	otherstudents	BASE TABLE
public	prereq	BASE TABLE
public	sales	BASE TABLE
public	section	BASE TABLE
public	seminars	BASE TABLE
public	student	BASE TABLE
public	takes	BASE TABLE
public	teaches	BASE TABLE
public	time_slot	BASE TABLE

1
2
3
4


SELECT table_name, column_name, ordinal_position, data_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name LIKE 's%'
ORDER BY table_name, ordinal_position;

table_name	column_name	ordinal_position	data_type
sales	year	1	integer
sales	month	2	integer
sales	totsales	3	numeric
section	course_id	1	character varying
section	sec_id	2	character varying
section	semester	3	character varying
section	year	4	numeric
section	building	5	character varying
section	room_number	6	character varying
section	time_slot_id	7	character varying
seminars	title	1	character varying
seminars	presenter	2	character varying
seminars	semdate	3	timestamp without time zone
seminars	building	4	character varying
seminars	room_number	5	character varying
seminars	organizing_dept	6	character varying
student	id	1	character varying
student	name	2	character varying
student	dept_name	3	character varying
student	tot_cred	4	numeric

More on constraints

foreign keys

for foreign keys we can specify what action to take if the referenced table is updated
- for deletions ON DELETE
- for update ON UPDATE
options
- SET NULL - set fk columns to NULL
- RESTRICT - disallow deletion / updates for tuples that referenced
- CASCADE
  - for deletion: delete referencing rows
  - for update: update the values of referencing rows

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


CREATE TABLE addr (
  id INT PRIMARY KEY,
  city TEXT,
  zip NUMERIC(5,0)
);

INSERT INTO addr VALUES (1, 'Chicago', 60616);
INSERT INTO addr VALUES (2, 'Chicago', 60612);
INSERT INTO addr VALUES (3, 'Chicago', 60645);
INSERT INTO addr VALUES (4, 'Evanston', 60555);

CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

persons that reference address

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


CREATE TABLE person (
  name TEXT,
  addrid INT REFERENCES addr
);

INSERT INTO person VALUES ('A', 1);
INSERT INTO person VALUES ('B', 1);
INSERT INTO person VALUES ('C', 2);
INSERT INTO person VALUES ('D', 4);
INSERT INTO person VALUES ('E', 4);
INSERT INTO person VALUES ('F', 4);

CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

1

DELETE FROM addr WHERE id = 1;

1
2
3
4


--ALTER TABLE person DROP CONSTRAINT person_addrid_fkey;
ALTER TABLE person ADD CONSTRAINT ourfk
  FOREIGN KEY (addrid) REFERENCES addr
              ON DELETE CASCADE ON UPDATE SET NULL;

ALTER TABLE

1

DELETE FROM addr WHERE id = 1;

DELETE 1

1

SELECT * FROM person;

name	addrid
C	2
D	4
E	4
F	4

1

UPDATE addr SET id = 5 WHERE id = 4;

UPDATE 1

1

SELECT * FROM person;

name	addrid
C	2
D
E
F

`CHECK` constraints

check boolean condition over the values of columns (from this table!)

1

SELECT * FROM addr WHERE NOT (zip::TEXT LIKE '60%');

id	city	zip
1	NY	15555

1

ALTER TABLE addr ADD CONSTRAINT local_check CHECK (zip::TEXT LIKE '60%');

ALTER TABLE

violates the constraint since 15555 does not start with 60

1

INSERT INTO addr VALUES (1, 'NY', 15555);

Updates (DML)

transactions

allow us to group SQL statements into atomic
- either all will be executed or none
- and current transactions cannot interact (later when we talk concurrency control)
transaction end if you run a COMMIT or ABORT
the database will abort transactions if a statement throws an error

1
2
3
4


BEGIN WORK;
INSERT INTO addr VALUES (6,'Chicago', 60754);
INSERT INTO addr VALUES (7,'NY', 15555);
COMMIT;

BEGIN
INSERT 0 1

1

SELECT * FROM addr;

id	city	zip
2	Chicago	60612
3	Chicago	60645
5	Evanston	60555

advanced update statement

insert query results

INSERT INTO ... SELECT

1
2
3
4
5


INSERT INTO student (SELECT 'AB' || substring(id,0,4),
                            name,
                            'History',
                            tot_cred
                       FROM student);

INSERT 0 11

1

SELECT * FROM student;

id	name	dept_name	tot_cred
19991	Brandt	History	80
23121	Chavez	Finance	110
44553	Peltier	Physics	56
45678	Levy	Physics	46
54321	Williams	Comp. Sci.	54
55739	Sanchez	Music	38
70557	Snow	Physics	0
76543	Brown	Comp. Sci.	58
76653	Aoi	Elec. Eng.	60
98765	Bourikas	Elec. Eng.	98
98988	Tanaka	Biology	120
AB199	Brandt	History	80
AB231	Chavez	History	110
AB445	Peltier	History	56
AB456	Levy	History	46
AB543	Williams	History	54
AB557	Sanchez	History	38
AB705	Snow	History	0
AB765	Brown	History	58
AB766	Aoi	History	60
AB987	Bourikas	History	98
AB989	Tanaka	History	120

update / delete with nested subqueries

can use nested subqueries in WHERE of delete or update

1
2
3


SELECT count(*), dept_name
   FROM student s2
  GROUP BY s2.dept_name;

count	dept_name
3	Physics
1	Biology
2	Elec. Eng.
1	Finance
2	Comp. Sci.
12	History
1	Music

1
2
3
4


DELETE FROM student s
 WHERE 3 > (SELECT count(*)
              FROM student s2
             WHERE s2.dept_name = s.dept_name);

DELETE 7

1

SELECT * FROM student;

id	name	dept_name	tot_cred
19991	Brandt	History	80
44553	Peltier	Physics	56
45678	Levy	Physics	46
70557	Snow	Physics	0
AB199	Brandt	History	80
AB231	Chavez	History	110
AB445	Peltier	History	56
AB456	Levy	History	46
AB543	Williams	History	54
AB557	Sanchez	History	38
AB705	Snow	History	0
AB765	Brown	History	58
AB766	Aoi	History	60
AB987	Bourikas	History	98
AB989	Tanaka	History	120

Lecture <2022-10-12 Wed>

exam topics review and questions

Access control

1
2
3


REVOKE ALL ON student FROM testuser;
DROP USER IF EXISTS testuser;
CREATE USER testuser WITH LOGIN PASSWORD 'test';

REVOKE
DROP ROLE
CREATE ROLE

1

GRANT SELECT ON student TO testuser;

GRANT

1

GRANT INSERT ON student TO testuser;

GRANT

1
2
3
4
5


CREATE VIEW studentstats AS (
  SELECT dept_name, count(*) AS numst, avg(tot_cred) AS avgcred
    FROM student
   GROUP BY dept_name
  );

CREATE VIEW

roles

1
2
3
4


CREATE ROLE hr;
GRANT SELECT on studentstats TO hr;
GRANT hr TO testuser;
REVOKE ALL ON student FROM testuser;

GRANT
GRANT ROLE
REVOKE

User defined types and functions

UDT = user-defined types

define new types
in postgres use C and SQL to register the new type and its functions: https://www.postgresql.org/docs/current/xtypes.html
- input: take string and return binary representation
- output: take binary and return string representation
- send/receive: serialize / deserialize for network transfer

UDF = user-defined functions and operators

can be written in PL like C, Python, …

Procedural extensions

https://www.postgresql.org/docs/current/plpgsql.html
in Postgres there is procedural extension for writing SQL + imperative code PL/PGSQL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


CREATE FUNCTION get_num_students() RETURNS int
AS $$
DECLARE
  cnt int;
BEGIN
    SELECT count(*) INTO cnt
        FROM student;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION

1

SELECT get_num_students();

get_num_students
16

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


CREATE FUNCTION get_num_students(dept TEXT) RETURNS int
AS $$
DECLARE
  cnt int;
BEGIN
    SELECT count(*) INTO cnt
      FROM student
      WHERE dept_name = get_num_students.dept;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION

1
2


SELECT name, dept_name, get_num_students(dept_name)
FROM student

name	dept_name	get_num_students
Brandt	History	13
Peltier	Physics	3
Levy	Physics	3
Snow	Physics	3
Brandt	History	13
Chavez	History	13
Peltier	History	13
Levy	History	13
Williams	History	13
Sanchez	History	13
Snow	History	13
Brown	History	13
Aoi	History	13
Bourikas	History	13
Tanaka	History	13
test	History	13

1
2
3
4
5
6
7
8
9


CREATE FUNCTION plus_one(i INT) RETURNS int
AS $$
DECLARE
  res INT;
BEGIN
  res := i + 1;
  RETURN res;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION

1

SELECT plus_one(1);

plus_one
2

control flow and iteration

conditional control flow

https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-CONDITIONALS

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


CREATE FUNCTION add_one_if_large_else_add_two(i INT) RETURNS int
AS $$
DECLARE
  res INT;
BEGIN
  IF (i > 100)
  THEN
    res := i + 1;
  ELSE
    res := i + 2;
  END IF;
  RETURN res;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION

1
2


SELECT add_one_if_large_else_add_two(3);
SELECT add_one_if_large_else_add_two(200);

add_one_if_large_else_add_two
5
add_one_if_large_else_add_two
201

looping

https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-CONTROL-STRUCTURES-LOOPS

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


CREATE OR REPLACE FUNCTION add_one_hun(i INT) RETURNS int
AS $$
DECLARE
  j INT;
  res INT;
BEGIN
  j := 0;
  res := i;
  WHILE (j < 100)
    LOOP
    res := res + 1;
    j := j + 1;
  END LOOP;
  RETURN res;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION

1

SELECT add_one_hun(100);

add_one_hun
200

looping through query results

https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


CREATE OR REPLACE FUNCTION another_cnt() RETURNS int
AS $$
DECLARE
  res INT;
  st RECORD;
BEGIN
  res := 0;
  FOR st IN (SELECT * FROM student)
  LOOP
    IF (st.dept_name = 'History')
      THEN
        res := res + 1;
    END IF;
  END LOOP;
  RETURN res;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION

1

SELECT another_cnt();

another_cnt
13

table returning functions

1

SELECT i FROM generate_series(1,10) AS f(i);

i
1
2
3
4
5
6
7
8
9
10

Lecture <2022-10-17 Mon>

Accessing databases from programming languages

Java

https://github.com/IITDBGroup/cs425/tree/master/programming_language_examples/Java

additional links

Python

https://github.com/IITDBGroup/cs425/tree/master/programming_language_examples/Python

Lecture <2022-10-19 Wed>

Triggers

triggers are special functions that are executed when a certain action is performed on a table
in Postgres you first have to define the trigger function and then you can create a trigger

postgres trigger creation syntax

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


CREATE [ OR REPLACE ] [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON table_name
    [ FROM referenced_table_name ]
    [ NOT DEFERRABLE | [ DEFERRABLE ] [ INITIALLY IMMEDIATE | INITIALLY DEFERRED ] ]
    [ REFERENCING { { OLD | NEW } TABLE [ AS ] transition_relation_name } [ ... ] ]
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    [ WHEN ( condition ) ]
    EXECUTE { FUNCTION | PROCEDURE } function_name ( arguments )

where event can be one of:

    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

trigger function

returns value of type trigger

1
2
3
4
5
6
7
8


CREATE FUNCTION trigger_function()
   RETURNS TRIGGER
   LANGUAGE PLPGSQL
AS $$
BEGIN
   -- trigger logic
END;
$$

trigger example: implement history tables with triggers

do not loose the history of the table (keep old version)

1
2
3
4
5
6
7


CREATE TABLE nohistory_time (a int PRIMARY KEY, b int, t_begin TIMESTAMP DEFAULT CURRENT_TIMESTAMP);

INSERT INTO nohistory_time (a,b) VALUES (1,1), (2,2), (3,3);

CREATE VIEW nohistory AS (SELECT a, b FROM nohistory_time);

CREATE TABLE history (a int, b int, t_begin TIMESTAMP, t_end TIMESTAMP);

CREATE TABLE
INSERT 0 3
CREATE VIEW
CREATE TABLE

1

SELECT * FROM nohistory_time;

a	b	t_begin
1	1	2022-10-19 15:31:29.584735
2	2	2022-10-19 15:31:29.584735
3	3	2022-10-19 15:31:29.584735

1

SELECT * FROM history;

a

b

t_begin

t_end

1
2
3
4
5
6
7
8
9


CREATE OR REPLACE FUNCTION keep_history_delete()
   RETURNS TRIGGER
   LANGUAGE PLPGSQL
AS $$
BEGIN
   INSERT INTO history VALUES (OLD.a, OLD.b, OLD.t_begin, CURRENT_TIMESTAMP);
   RETURN NEW;
END;
$$

CREATE FUNCTION

1
2
3
4


CREATE OR REPLACE TRIGGER keep_history_delete_trigger
  AFTER DELETE ON nohistory_time
  FOR EACH ROW
    EXECUTE FUNCTION keep_history_delete();

CREATE TRIGGER

1

DELETE FROM nohistory_time WHERE a = 1;

INSERT 0 1
DELETE 1

1

SELECT * FROM history;

a	b	t_begin	t_end
1	1	2022-10-19 15:35:20.330632	2022-10-19 15:35:20.339711

1

DELETE FROM nohistory_time WHERE a = 2;

DELETE 1

1

INSERT INTO nohistory_time (a,b) VALUES (4,2), (5,3);

INSERT 0 2

1

SELECT * FROM nohistory_time;

a	b	t_begin
3	3	2022-10-19 15:31:29.584735
4	2	2022-10-19 15:38:07.803667
5	3	2022-10-19 15:38:07.803667

1

DELETE FROM nohistory_time WHERE a <> b;

DELETE 2

1

SELECT * FROM history ORDER BY t_begin, t_end, a, b;

a	b	t_begin	t_end
2	2	2022-10-19 15:31:29.584735	2022-10-19 15:36:48.183113
1	1	2022-10-19 15:35:20.330632	2022-10-19 15:35:20.339711
4	2	2022-10-19 15:38:07.803667	2022-10-19 15:38:13.546615
5	3	2022-10-19 15:38:07.803667	2022-10-19 15:38:13.546615

get data as of a certain time

1

SELECT a, b FROM history WHERE '2022-10-19 15:35:20.339710'::TIMESTAMP BETWEEN t_begin AND t_end;

a	b
1	1
2	2

now updates

1
2
3
4
5
6
7
8
9


CREATE OR REPLACE FUNCTION keep_history_update()
   RETURNS TRIGGER
   LANGUAGE PLPGSQL
AS $$
BEGIN
   INSERT INTO history VALUES (OLD.a, OLD.b, OLD.t_begin, CURRENT_TIMESTAMP);
   RETURN NEW;
END;
$$

CREATE FUNCTION

1
2
3
4


CREATE OR REPLACE TRIGGER keep_history_update_trigger
  AFTER UPDATE ON nohistory_time
  FOR EACH ROW
    EXECUTE FUNCTION keep_history_update();

CREATE TRIGGER

1

INSERT INTO nohistory_time (a,b) VALUES (5,5), (6,6), (7,1), (8,2);

INSERT 0 4

1

SELECT * FROM nohistory_time;

a	b	t_begin
3	3	2022-10-19 15:31:29.584735
5	5	2022-10-19 15:42:57.311625
6	6	2022-10-19 15:42:57.311625
7	1	2022-10-19 15:44:53.411487
8	2	2022-10-19 15:44:53.411487

1

UPDATE nohistory_time SET b = a WHERE a <> b;

UPDATE 2

1

SELECT * FROM history;

a	b	t_begin	t_end
1	1	2022-10-19 15:35:20.330632	2022-10-19 15:35:20.339711
2	2	2022-10-19 15:31:29.584735	2022-10-19 15:36:48.183113
4	2	2022-10-19 15:38:07.803667	2022-10-19 15:38:13.546615
5	3	2022-10-19 15:38:07.803667	2022-10-19 15:38:13.546615
7	7	2022-10-19 15:42:57.311625	2022-10-19 15:44:53.392018
8	8	2022-10-19 15:42:57.311625	2022-10-19 15:44:53.392018
7	1	2022-10-19 15:44:53.411487	2022-10-19 15:45:03.793728
8	2	2022-10-19 15:44:53.411487	2022-10-19 15:45:03.793728

1
2
3
4
5
6


SELECT a, b
  FROM
    (SELECT * FROM history
      UNION ALL
     SELECT *, NULL AS t_end FROM nohistory_time) sub
 WHERE '2022-10-19 15:35:20.339710'::TIMESTAMP BETWEEN t_begin AND t_end OR (t_begin <= '2022-10-19 15:35:20.339710' AND t_end IS NULL);

a	b
1	1
2	2
3	3

ER-model

requirement analysis: music collection

genres, albums, artists
We store information about songs
- Songs have a title
- Songs are written by one or more artists
- Songs can be on albums as tracks and appear with a particular track number
- Songs can have a genre
We store information about artists.
- Artists have a name
- Artists have a date of birth and optionally a date of death
We want to store information about albums.
- Albums have a release date
- Albums have a number of tracks which are songs and have additionally a track number and length
- Albums have artists playing on the album
- Albums are released by some publisher (typically record company)
- Albums have titles

Lecture <2022-10-24 Mon>

ER-model

Entities and relationships

students taking courses example

../../static/imgs/studentcourse.jpg

Examples

Music collection

../../static/imgs/music.jpg

hand-written notes

https://www.evernote.com/shard/s131/sh/ad0ab9a5-1fee-4216-a799-72195428f74b/0b0199a598ac4f3c5520c488a5442e2d

Lecture <2022-10-31 Mon>

Lecture <2022-11-07 Mon>

Lecture <2022-11-09 Wed>

LEFT OUTER JOIN for not exists checks

1
2
3
4
5


SELECT emp
FROM servedby s LEFT OUTER JOIN (SELECT *, 1 AS indicator FROM mail) m ON (s.addr = m.to_addr
                                           AND s.startdate<=m.delivered_date
                                           AND ( s.enddate IS NULL OR s.enddate>=m.delivered_date))
WHERE m.indicator IS NULL

exam scaling

grade = 0.25 * 100/85 * midterm score + …

Lecture <2022-11-14 Mon>

Lecture <2022-11-21 Mon>

Lecture <2022-11-30 Wed>

example of determining normal forms

2NF

1NF (always assumed)
$$R = (A,B,C,D,E)$$
$$F = \{ A \rightarrow E, BC \rightarrow D, C \rightarrow E \}$$
2NF
- candidate keys $$\{\{ A,B,C \}\}$$
- $$ABC^{+} = ABCDE$$
- non-prime = $$\{D,E\}$$
- counterexample: $$A \rightarrow E$$ is a dependency from part of a candidate key to a non-prime attribute
this is not in 2NF
$$R = (A,B,C,D,E)$$
$$F = \{ A \rightarrow E, BC \rightarrow AD, C \rightarrow E \}$$
candidate keys $$\{\{ B,C \}\}$$
- non-prime = $$\{A,D,E\}$$
counterexample: $$C \rightarrow E$$ is a dependency from part of a candidate key to a non-prime attribute
this is in not 2NF
$$R = (A,B,C,D,E)$$
$$F = \{ A \rightarrow E, BC \rightarrow AD \}$$
candidate keys $$\{\{ B,C \}\}$$
- non-prime = $$\{A,D,E\}$$
counterexample: is a dependency from part of a candidate key to a non-prime attribute
this is in 2NF

3NF

$$R = (A,B,C,D,E)$$
$$F = \{ A \rightarrow E, BC \rightarrow AD \}$$
candidate keys $$\{\{ B,C \}\}$$
- non-prime = $$\{A,D,E\}$$
counterexample: is a dependency from part of a candidate key to a non-prime attribute
this is in 2NF
$$A \rightarrow E$$ is neither trivial, nor is A a superkey, nor is E part of a candidate key!
this is not in 3NF
just an exercise: $$BC \rightarrow AD$$ ok, BC is a superkey
$$BC \rightarrow B$$ - trivial and BC is a superkey $$BC^{+} = BCADE$$

BCNF

same as first two options for 3NF (exploit that)

attribute closures and canonical cover

$$F = \{ A \rightarrow BCD, B \rightarrow A, C \rightarrow D \}$$
$$F^{+} = \{ AB \rightarrow B, A \rightarrow D, \ldots \}$$
attribute closure $$A^{+} = ABCD$$
$$F^{C} = \{ A \rightarrow BCD, B \rightarrow A, C \rightarrow D \}$$
- find extraneous attributes in RHS, LHS
- is $$D$$ extraneous in $$A \rightarrow BCD$$
  - $$F' = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$
    - attribute closure $$A^{+}$$ = $$ABCD$$
$$F^{C} = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$
- is $$B$$ extraneous in $$A \rightarrow BC$$
- $$F' = \{ A \rightarrow C, B \rightarrow A, C \rightarrow D \}$$
  - attribute closure $$A^{+}$$ = $$ACD$$ (not extraneous)
- is $$C$$ extraneous in $$A \rightarrow BC$$?
- $$F' = \{ A \rightarrow B, B \rightarrow A, C \rightarrow D \}$$
  - attribute closure $$A^{+}$$ = $$AB$$ (not extraneous)
$$F^{C} = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$ is the final result

decomposition into 3NF

Let's use $$F^{C} = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$ as an example.

start with empty set of fragments $$\{\}$$
$$A \rightarrow BC$$ - $$R_1 = (A,B,C)$$
$$B \rightarrow A$$ - we already have $$R_1$$
$$C \rightarrow D$$ - $$R_2 = (C,D)$$
we already have $$R_1$$ containing candidate key $$A$$
no contained fragments
final decomposed result is $$\{R_1, R_2\}$$

origin	destination	numstops
ORD	BER	1
ORD	BER	3
ORD	BER	5
ORD	BER	7
ORD	BER	9
ORD	LAX	1
ORD	LAX	3
ORD	LAX	5
ORD	LAX	7
ORD	LAX	9
ORD	ORD	1
ORD	ORD	3
ORD	ORD	5
ORD	ORD	7
ORD	ORD	9
ORD	PAR	2
ORD	PAR	4
ORD	PAR	6
ORD	PAR	8
ORD	PAR	10
ORD	SEA	0
ORD	SEA	2
ORD	SEA	4
ORD	SEA	6
ORD	SEA	8
ORD	SEA	10

origin	destination	numstops
ORD	BER	1
ORD	BER	3
ORD	BER	5
ORD	BER	7
ORD	BER	9
ORD	LAX	1
ORD	LAX	3
ORD	LAX	5
ORD	LAX	7
ORD	LAX	9
ORD	ORD	1
ORD	ORD	3
ORD	ORD	5
ORD	ORD	7
ORD	ORD	9
ORD	PAR	2
ORD	PAR	4
ORD	PAR	6
ORD	PAR	8
ORD	PAR	10
ORD	SEA	0
ORD	SEA	2
ORD	SEA	4
ORD	SEA	6
ORD	SEA	8
ORD	SEA	10

Lecture Notes for CS425

Overview

Lecture <2022-08-24 Wed>

Relations

PKs and FKs

JSON

Lecture <2022-08-31 Wed>

Recap relational model

Schema vs. Instance (data)

schema

instance (data)

Integrity Constraints

foreign key violations

Lecture <2022-09-14 Wed> and <2022-09-19 Mon>

Application Domain for Project

banking system

persons (roles)

data

processes

Postgres Overview

Cluster directory

Postgres binaries

initdb - creates clusters

pg_ctl - start / stop servers

psql - commandline client

Client applications (pgAdmin)

pgAdmin

Setup

Inspecting the database schema

Writing SQL

Create schema and insert data

More DDL examples

DDL change table definitions

dropping a table

alter table

DML - inserts and delete

Insert

Delete

DML - queries

basic SELECT-FROM-WHERE blocks

examples

relational algebra equivalence

casting

keywords are case-insensitive

joins

natural join

outer joins

specifying join conditions

Lecture <2022-09-21 Wed>

SQL is bag semantics

subqueries

WITH clause

DISTINCT

String operations

Case distinction and coalesce

CASE

coalesce

IS NULL and IS NOT NULL

Ordering results

Set operations

set union

bag union

set & bag intersection

set & bag difference

Aggregation and Grouping

Lecture <2022-09-26 Mon>

Aggregation and NULL and empty inputs

empty tables

some null values in tables

Aggregation with HAVING

LIMIT and OFFSET and top-k queries

Bag (multiset) relational algebra

Nested & correlated subqueries

scalar subqueries

EXISTS subqueries

IN subqueries

correlations

Lecture <2022-09-28 Wed>

nested subqueries

example of evaluating nested subqueries with correlations

`initdb` - creates clusters

`pg_ctl` - start / stop servers

`psql` - commandline client

Client applications (`pgAdmin`)

`pgAdmin`

basic `SELECT-FROM-WHERE` blocks

`WITH` clause

`DISTINCT`

`IS NULL` and `IS NOT NULL`

Aggregation and `NULL` and empty inputs

Aggregation with `HAVING`

`LIMIT` and `OFFSET` and top-k queries

`EXISTS` subqueries

`IN` subqueries

other nesting constructs `ANY` and `ALL`

`CHECK` constraints

origin	destination	numstops
ORD	BER	1
ORD	BER	3
ORD	BER	5
ORD	BER	7
ORD	BER	9
ORD	LAX	1
ORD	LAX	3
ORD	LAX	5
ORD	LAX	7
ORD	LAX	9
ORD	ORD	1
ORD	ORD	3
ORD	ORD	5
ORD	ORD	7
ORD	ORD	9
ORD	PAR	2
ORD	PAR	4
ORD	PAR	6
ORD	PAR	8
ORD	PAR	10
ORD	SEA	0
ORD	SEA	2
ORD	SEA	4
ORD	SEA	6
ORD	SEA	8
ORD	SEA	10