# CS595 - Storage - Distributed Key-Value Stores (NoSQL Databases)

**Lecturer**: [Boris Glavic](http://www.cs.iit.edu/~glavic/)

**Semester**: Fall 2021
# 2. Distributed Storage

## Storage - Distributed Key-Value Stores (NoSQL Databases)
## Distributed NoSQL Databases

|             | Relational database    | NoSQL store                                             |
|-------------|------------------------|---------------------------------------------------------|
| Data model  | relational model       | key-value, documents, graphs                            |
| Consistency | serializability (ACID) | no transactions, typically eventually consistent (BASE) |
| Queries     | SQL                    | CRUD                                                    |
## Types of NoSQL Data Models

- **Key-value**
  - sets of key-value pairs
  - some systems support an ordered key domain
- **Wide column**
  - relational or nested relational
  - can be seen as an extension of key-value where we impose more structure on the values
- **Document**
  - semi-structured data like JSON or XML
- **Graph**
  - graph (nodes and edges)
  - common models are property graphs and RDF graphs
## Examples

- **Key-value**
  - Amazon Dynamo, Redis
- **Wide column**
  - Bigtable, Cassandra
- **Document**
  - MongoDB, Couchbase
- **Graph**
  - Neo4J, AllegroGraph
## Common Themes

- The NoSQL movement was born out of a need for **scalable** storage of structured data with high **availability** and often low **latency** requirements
- Often availability and low latency are achieved at the cost of consistency (and potentially expressiveness of queries)
- To ensure scalability, some systems restrict operations (e.g., no transactions) such that they can be executed without requiring cluster-wide communication
## BASE (Consistency)

- most distributed NoSQL databases sacrifice **consistency** for
  - low latency of operations
  - availability under node failures and network partitions
## BASE (Consistency)

- **BASE** = Basically Available, Soft state, Eventual consistency
  - **Basically Available**: the system is available even under failures like network partitions
  - **Soft state**: data may be changing even if no updates happen (caused by eventual consistency)
  - **Eventual consistency**: any update to the data will eventually be observed by all replicas
## Eventual consistency

- **Informally**: If no new updates are applied to a data item, then all accesses to that data item will eventually return the same value
- For systems that use replication and let clients read from any replica:
  - eventually all replicas converge on the same state
- More details later when talking about consistency, consensus and distributed transaction processing
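The convergence behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Replica` class, the integer timestamps, and the last-writer-wins conflict rule are hypothetical choices made for illustration; real systems use a variety of reconciliation strategies.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica storing (timestamp, value) per key (assumed: last writer wins)."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what this replica already has.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy step: apply every entry known to the other replica.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


replicas = [Replica(), Replica(), Replica()]
replicas[0].write("x", "old", timestamp=1)
replicas[1].write("x", "new", timestamp=2)

# Before synchronization, clients reading from different replicas disagree.
before = [r.read("x") for r in replicas]

# Once updates stop, exchanging state makes all replicas converge.
for a in replicas:
    for b in replicas:
        a.sync_from(b)
after = [r.read("x") for r in replicas]
```

In this run `before` contains three different answers, while `after` contains the same value for every replica.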
## CRUD Operations

- *C*: `create(key,value)` - associate `key` with `value`
- *R*: `read(key)` - return the `value` associated with `key`
- *U*: `update(key,value)` - associate an existing `key` with a new `value`
- *D*: `delete(key)` - delete `key`
## CRUD Operations

- **Main take-aways:**
  - simpler query model (compared to fully-fledged query languages)
  - no transactions that combine multiple operations into an atomic action
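The four CRUD operations can be sketched as a thin wrapper around an in-memory map. The class name `KeyValueStore` and the error behavior (raising `KeyError` when creating an existing key or updating a missing one) are assumptions made for this example, not the semantics of any particular system.

```python
class KeyValueStore:
    """Minimal single-node key-value store exposing only CRUD operations."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        # Assumed semantics: creating an existing key is an error.
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def read(self, key):
        # Raises KeyError if the key is absent.
        return self._data[key]

    def update(self, key, value):
        # Assumed semantics: only existing keys can be updated.
        if key not in self._data:
            raise KeyError(f"no such key: {key!r}")
        self._data[key] = value

    def delete(self, key):
        # Raises KeyError if the key is absent.
        del self._data[key]


store = KeyValueStore()
store.create(34, "Peter")
store.update(34, "Pete")
value_after_update = store.read(34)
store.delete(34)
```

Each call touches exactly one key, and there is no way to group several calls into one atomic action, which matches the take-aways above.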
## Storage - Distributed Key-Value Stores (NoSQL Databases)

### Key-value Stores
## Data Model

- Data is stored as sets of key-value pairs
- Typically the systems treat keys and values as uninterpreted sequences of bytes
  - applications can interpret them as they please
## Point vs Range Queries

- Consider an index storing key-value pairs `{(k,v)}`
- **point-query**: given a key `k`, return the associated value `v`
- **range-query**: given a range of keys `[k1,k2]`, return all key-value pairs `(k,v)` such that $k \in [k1,k2]$
## Example

- **input data**: `{ (34, Peter), (56, Bob), (46, Alice) }`
- **point query**:
  - `get(34)` -> `(34, Peter)`
- **range query**:
  - `range(30,50)` -> `{ (34, Peter), (46, Alice) }`
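The example above can be reproduced with a sorted in-memory index. This is a minimal sketch, assuming unique keys and an ordered key domain; the class name `OrderedIndex` is hypothetical, and binary search over a sorted list stands in for whatever index structure a real system would use.

```python
import bisect


class OrderedIndex:
    """Sorted list of (key, value) pairs supporting point and range queries."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs)                # ordered by key (keys assumed unique)
        self._keys = [k for k, _ in self._pairs]   # parallel list used for binary search

    def get(self, key):
        # Point query: binary search for the exact key.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pairs[i]
        return None

    def range(self, low, high):
        # Range query: all pairs with low <= key <= high.
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._pairs[start:end]


index = OrderedIndex([(34, "Peter"), (56, "Bob"), (46, "Alice")])
point_result = index.get(34)        # (34, 'Peter')
range_result = index.range(30, 50)  # [(34, 'Peter'), (46, 'Alice')]
```

A store that only hashes keys can answer `get` efficiently but not `range`, which is why an ordered key domain matters for range queries.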
## Storage - Distributed Key-Value Stores (NoSQL Databases)

### Document Stores
## Data Model

- semi-structured data, e.g., `JSON` (JavaScript Object Notation)
- types:
  - arrays `[]`
  - maps (objects) `{ field1: value, ...}`
  - primitive types

```json
[
  {
    "Name": "Peter",
    "Age": 15,
    "Addresses": [
      { "City": "Chicago", "Zip": 60616, "Street": "10 W 31st" },
      { "City": "Chicago", "Zip": 60614 }
    ]
  }
]
```
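## Query Model

- either only CRUD or a declarative query language
- one example is MQL, MongoDB's query language for JSON documents
- *example*: find persons living in Chicago

```mongo
// Dot notation matches persons with at least one address whose City is "Chicago".
// (Querying with { "Addresses": { "City": "Chicago" } } would instead require an
// array element that is exactly equal to that embedded document.)
db.persons.find( { "Addresses.City": "Chicago" } )
```

- for some more examples see: [here](https://github.com/IITDBGroup/CS595-repository/blob/master/mongodb.org)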
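To make the semantics of such a path-based filter concrete, the following sketch evaluates a dotted path over plain Python dictionaries and lists. The helpers `values_at` and `find` are hypothetical names, the second person in `persons` is made up for the example, and the code only approximates how a document store matches a path that passes through arrays.

```python
persons = [
    {
        "Name": "Peter",
        "Age": 15,
        "Addresses": [
            {"City": "Chicago", "Zip": 60616, "Street": "10 W 31st"},
            {"City": "Chicago", "Zip": 60614},
        ],
    },
    # Hypothetical second document, added so that the filter excludes something.
    {"Name": "Mary", "Addresses": [{"City": "Boston"}]},
]


def values_at(doc, path):
    """Collect all values reachable via a dotted path, descending into arrays."""
    current = [doc]
    for field_name in path.split("."):
        found = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and field_name in candidate:
                    found.append(candidate[field_name])
        current = found
    # Flatten arrays at the end so a document matches if any element matches.
    result = []
    for value in current:
        result.extend(value if isinstance(value, list) else [value])
    return result


def find(collection, path, value):
    """Return documents where some value at `path` equals `value`."""
    return [doc for doc in collection if value in values_at(doc, path)]


chicago_names = [doc["Name"] for doc in find(persons, "Addresses.City", "Chicago")]
```

Here `chicago_names` contains only Peter, because at least one of his addresses has the city Chicago.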
## Storage - Distributed Key-Value Stores (NoSQL Databases)

### Wide column Stores
## Data Model

- typically nested relational
  - requiring every relation to have a key
- this is the key-value model with more structure imposed on the values
## Query Model

- point queries and range queries
- possibly allowing queries over non-key columns (not supported by most key-value stores)
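One way to picture this model is a map from row keys to maps of named columns. The sketch below is purely illustrative: the class `WideColumnTable`, its method names, and the `city` column values are assumptions, and the non-key lookup is a full scan, whereas real systems may use secondary indexes.

```python
class WideColumnTable:
    """Rows addressed by key; each row is a map from column name to value."""

    def __init__(self):
        self._rows = {}  # row key -> {column name -> value}

    def put(self, row_key, column, value):
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Point query on the row key.
        return self._rows.get(row_key, {})

    def scan(self, low, high):
        # Range query on the row key, in key order.
        return [(k, self._rows[k]) for k in sorted(self._rows) if low <= k <= high]

    def where(self, column, value):
        # Query over a non-key column (here: a full scan over all rows).
        return [k for k in sorted(self._rows) if self._rows[k].get(column) == value]


table = WideColumnTable()
table.put(34, "name", "Peter")
table.put(34, "city", "Chicago")   # hypothetical column values
table.put(46, "name", "Alice")
table.put(46, "city", "Chicago")
table.put(56, "name", "Bob")       # rows need not have the same columns

keys_in_range = [k for k, _ in table.scan(30, 50)]
keys_in_chicago = table.where("city", "Chicago")
```

Because the store understands the structure of a row, it can answer `where("city", ...)`, which a store that treats values as opaque bytes cannot.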
## Storage - Distributed Key-Value Stores (NoSQL Databases)

### Graph Databases
## Data Models

- **Property graphs**:
  - directed graph of nodes and edges
  - both nodes and edges (relationships) have a type (often called a label)
  - nodes and edges can have properties (fields)
- **RDF (Resource Description Framework)**
  - Data is represented as triples: `(subject,predicate,object)`
  - uses URIs to represent entities (e.g., subjects)
  - Can also be interpreted as a graph where subjects and objects are nodes and predicates are edges connecting them
## Example - Property graphs

- On the next slide we see an example property graph
- For more examples see: [here](https://github.com/IITDBGroup/CS595-repository/blob/master/neo4j-cypher.org)
- Two types of nodes:
  - **Persons** with `name` and `age`
  - **Companies** with their `name` and `headquarters`
- Types of edges:
  - **married to**: person to person
  - **child of**: person to person
  - **reports to**: person to person
  - **works for**: person to company
*Figure: example property graph (listed here as nodes and edges).*

- **Nodes**:
  - `n1`: Person (`name: Peter`, `age: 35`)
  - `n2`: Person (`name: Alice`, `age: 36`)
  - `n3`: Person (`name: Bob`, `age: 55`)
  - `n4`: Person (`name: George`, `age: 25`)
  - `n5`: Company (`name: IBM`, `headquarters: California`)
- **Edges**:
  - `n1` - married to -> `n2`
  - `n1` - reports to -> `n3`
  - `n2` - child of -> `n3`
  - `n2` - works for -> `n5`
  - `n3` - reports to -> `n4`
  - `n3` - works for -> `n5`
  - `n4` - works for -> `n5`
## Example - RDF

- We revisit our person example, but with a slightly different instance
- **predicates**
  - all relationships from the previous example
  - `is-a`
  - `hasName`
*Figure: example RDF graph (listed here as triples).*

- `(P1, is-a, Person)`
- `(P1, hasName, Alice)`
- `(P1, child of, P2)`
- `(P1, married to, P3)`
- `(P2, is-a, Person)`
- `(P2, hasName, Bob)`
- `(P2, works for, IBM)`
- `(P3, is-a, Person)`
- `(P3, hasName, Peter)`
- `(P3, works for, IBM)`
- `(P3, reports to, n2)`
## Query Model

- Declarative graph query languages
- Examples
  - **Cypher**
    - for property graphs
    - e.g., supported by Neo4J
  - **SPARQL**
    - for RDF
    - supported by many systems
## Cypher

- query language for property graphs: [Neo4J Cypher Tutorial](https://neo4j.com/developer/cypher/)
- syntax inspired by SQL and SPARQL
- based on patterns that describe which subgraphs are of interest and bind nodes and edges to variables
- *example*: *return the name of persons working for IBM*

```cypher
// Properties are accessed with "."; a relationship type containing a space
// has to be quoted with backticks.
MATCH (p:Person)-[:`works for`]->(c:Company)
WHERE c.name = 'IBM'
RETURN p.name
```
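What such a pattern does can be sketched in plain Python over the example property graph (ages and headquarters omitted). This is only an illustration of pattern matching for a single `(a)-[:type]->(b)` pattern; the dictionary layout and the `match` helper are hypothetical and unrelated to how Neo4J evaluates Cypher.

```python
# Nodes of the example property graph: id -> label and properties.
nodes = {
    "n1": {"label": "Person", "name": "Peter"},
    "n2": {"label": "Person", "name": "Alice"},
    "n3": {"label": "Person", "name": "Bob"},
    "n4": {"label": "Person", "name": "George"},
    "n5": {"label": "Company", "name": "IBM"},
}

# Edges as (source id, edge type, target id).
edges = [
    ("n1", "married to", "n2"),
    ("n1", "reports to", "n3"),
    ("n2", "child of", "n3"),
    ("n2", "works for", "n5"),
    ("n3", "reports to", "n4"),
    ("n3", "works for", "n5"),
    ("n4", "works for", "n5"),
]


def match(source_label, edge_type, target_label):
    """Yield (source, target) node pairs matching (a:source)-[:type]->(b:target)."""
    for source, kind, target in edges:
        if (kind == edge_type
                and nodes[source]["label"] == source_label
                and nodes[target]["label"] == target_label):
            yield nodes[source], nodes[target]


# MATCH (p:Person)-[:`works for`]->(c:Company) WHERE c.name = 'IBM' RETURN p.name
ibm_employees = sorted(
    p["name"] for p, c in match("Person", "works for", "Company") if c["name"] == "IBM"
)
```

The pattern binds `p` and `c` to every matching pair of nodes, the filter corresponds to the `WHERE` clause, and the projection to `RETURN`.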
## SPARQL

- SPARQL is a query language for RDF data
- Its syntax bears some similarity to SQL
- Queries consist of graph patterns that are matched against an RDF graph

```sparql
# ":" is a placeholder prefix for the example vocabulary
PREFIX : <http://example.org/>

SELECT ?name
WHERE {
  ?p :hasName ?name .
  ?p :is-a :Person .
}
```
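Matching graph patterns against triples can be sketched as a small join over variable bindings. The code below is a toy evaluator under simplifying assumptions: terms are plain strings rather than URIs, variables start with `?`, and the helpers `match_pattern` and `query` are hypothetical names. The data is a subset of the triples from the RDF example.

```python
triples = [
    ("P1", "is-a", "Person"), ("P1", "hasName", "Alice"),
    ("P1", "child of", "P2"), ("P1", "married to", "P3"),
    ("P2", "is-a", "Person"), ("P2", "hasName", "Bob"), ("P2", "works for", "IBM"),
    ("P3", "is-a", "Person"), ("P3", "hasName", "Peter"), ("P3", "works for", "IBM"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must be bound consistently across all patterns.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# SELECT ?name WHERE { ?p hasName ?name . ?p is-a Person . }
person_names = sorted(
    b["?name"] for b in query([("?p", "hasName", "?name"), ("?p", "is-a", "Person")])
)
```

The shared variable `?p` is what joins the two patterns: only bindings in which the same resource has a name and is a `Person` survive.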
## Storage - Distributed Key-Value Stores (NoSQL Databases)

### Foundational Techniques and Algorithms
## Foundational Techniques and Concepts

- Systems differ widely in the performance characteristics, operations, and guarantees they provide
- Technically, NoSQL databases combine techniques from the systems, distributed computing, and database communities
- We will discuss (have discussed) several foundational techniques and algorithms:
## Foundational Techniques and Concepts

- **Storage organization**:
  - **Data placement**: horizontal (range and hash) and vertical partitioning
  - **LSM-trees**: An LSM-tree is a write-optimized index structure that is used by many key-value and wide column stores. Because writes to disk are sequential in LSM-trees, these data structures work well with distributed file systems like HDFS that only allow appending writes.
  - **Distributed Hash Tables (DHT)**: A distributed hash table is a distributed, fault-tolerant, and load-balancing implementation of a map data structure
  - **Consistent Hashing**: the technique used by many DHTs to distribute data without requiring complete reorganization when nodes leave or enter the system
  - **Overlay Networks**: an overlay network is a virtual network on top of a physical network. Overlay networks are used by DHTs for routing requests
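The key property of consistent hashing, that adding a node relocates only the keys taken over by the new node, can be demonstrated with a small hash ring. This is a sketch under assumptions: the class `HashRing`, the node names, the use of SHA-256, and 64 virtual nodes per physical node are illustrative choices, not the design of any particular DHT.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is placed at several positions to balance load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        # The owner is the first virtual node at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[i % len(self._ring)][1]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; no key is shuffled between the three original nodes, in contrast to a `hash(key) % number_of_nodes` placement, where most keys would change owner.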
## Foundational Techniques and Concepts

- **Fault tolerance**
  - **Replication**: replicate data and/or computation to avoid single points of failure
  - **Bulk-synchronous processing**: split computation into phases whose ends act as barriers for the computation
  - **Lineage-based fault tolerance**: log how chunks of data are produced from other chunks of data and re-compute chunks on failures
  - **Distributed snapshots**: snapshot state of operators and/or their outputs in a distributed data flow
## Foundational Techniques and Concepts

- **Consistency and Distributed State Management**
  - **Eventual Consistency**: a weaker form of consistency applied by many key-value stores
  - **Vector Clocks/Version Vectors**: a mechanism for reasoning about the causal ordering of events in a distributed system
  - **Consensus protocols**: keeping state in sync in a distributed system with *Paxos* or *Raft*
  - **2PC**: the two-phase commit protocol for distributed transaction processing
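As a small illustration of how vector clocks capture causal ordering, the sketch below represents a clock as a dictionary from node name to counter. The function names and the node names `A` and `B` are assumptions made for the example; missing entries are treated as zero.

```python
def increment(clock, node):
    """Return a copy of `clock` with the counter of `node` advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Entry-wise maximum, used when a node receives another node's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def leq(a, b):
    """True if every counter in `a` is <= the corresponding counter in `b`."""
    return all(count <= b.get(n, 0) for n, count in a.items())


def happened_before(a, b):
    """`a` causally precedes `b`: a <= b entry-wise and the clocks differ."""
    return leq(a, b) and not leq(b, a)


def concurrent(a, b):
    """Neither event causally precedes the other."""
    return not leq(a, b) and not leq(b, a)


# Node A and node B each perform a local event independently.
a1 = increment({}, "A")
b1 = increment({}, "B")

# Node B then receives A's clock with a message and records the receive event.
b2 = increment(merge(b1, a1), "B")
```

Here `a1` and `b1` are concurrent, while both `a1` and `b1` happened before `b2`, since `b2` has observed both of them.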