Chapter 2 of 4

Data Models and Storage

How data is stored and queried — from relational models to document stores, LSM-trees, and B-trees.

Key Insights

💡KEY INSIGHT

The relational model, document model, and graph model each have their strengths — the choice depends on your data's relationships.

💡KEY INSIGHT

LSM-trees favor write throughput because every write is a sequential append. B-trees favor read performance and update fixed-size pages in place. Most storage engines are built on one or the other.

💡KEY INSIGHT

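Schema-on-read (the structure is implicit and interpreted by the code that reads the data, as in most document stores) vs schema-on-write (the database enforces an explicit schema when data is written, as in relational databases) is a fundamental design decision: it determines where validation happens and how changes to the data format are handled.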

Notes

📘CONCEPT

Relational vs Document vs Graph

Relational (SQL): Best for many-to-many relationships, joins, and structured data with a known schema.

Document (MongoDB, etc.): Best for self-contained documents with few relationships; offers schema flexibility.

Graph (Neo4j, etc.): Best for highly interconnected data where relationships are as important as the data itself.
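
To make the contrast concrete, here is a minimal sketch that stores the same user profile both ways, using only Python's standard library. The table names, column names, and sample data are hypothetical and chosen purely for illustration; an in-memory SQLite database stands in for a relational system and a parsed JSON string stands in for a document store.

```python
import json
import sqlite3

# Relational form: two normalized tables linked by a foreign key.
# (Hypothetical schema, for illustration only.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE positions (
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'Ada');
    INSERT INTO positions VALUES (1, 'Engineer'), (1, 'Architect');
""")

# Reading the full profile requires a join across the two tables.
rows = conn.execute("""
    SELECT u.name, p.title
    FROM users u JOIN positions p ON p.user_id = u.id
    WHERE u.id = 1
    ORDER BY p.title
""").fetchall()
titles_from_join = [title for _name, title in rows]

# Document form: the same profile as one self-contained nested record.
document = json.loads(
    '{"id": 1, "name": "Ada",'
    ' "positions": [{"title": "Architect"}, {"title": "Engineer"}]}'
)
titles_from_document = [p["title"] for p in document["positions"]]

print(titles_from_join)
print(titles_from_document)
```

Both forms hold the same information. The document form fetches the whole profile in one read, while the relational form makes it easier to query or update positions independently of the users who hold them.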

📘CONCEPT

LSM-Trees vs B-Trees

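LSM-Trees (Log-Structured Merge-Trees): Writes go to an in-memory buffer (memtable). When the memtable fills up, it is flushed to disk as an immutable sorted file (SSTable), and SSTables are merged periodically by background compaction. Fast writes, sequential I/O. Used by Cassandra, RocksDB, LevelDB.

B-Trees: The standard index structure for relational databases. Data is stored in fixed-size pages that are updated in place. Good read performance: a lookup walks from the root to a leaf in O(log n) page reads.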
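
The LSM write path described above (memtable, flush to sorted runs, compaction) can be sketched in a few lines. This is a toy model under stated assumptions, not how Cassandra, RocksDB, or LevelDB are implemented: the class name `TinyLSM` and the `memtable_limit` parameter are hypothetical, the "SSTables" are in-memory sorted lists rather than files, and the write-ahead log, deletion markers, and Bloom filters are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: a memtable plus a list of immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.sstables = []  # oldest run first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the in-memory buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then the runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, (key,))  # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value for each key.
        merged = {}
        for run in self.sstables:  # oldest first, so newer values overwrite older ones
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable is full: flushed as the first run
db.put("a", 3)
db.put("c", 4)  # flushed as a second run that shadows the old value of "a"
runs_before_compaction = len(db.sstables)
value_of_a = db.get("a")
db.compact()
```

Note that `put` never modifies an existing run. The stale value of "a" lingers in the first run until compaction discards it, which is the trade-off that buys sequential writes.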

TIP

When to Use What

Use relational when you need joins, transactions, and data integrity. Use document stores when your data is naturally hierarchical (e.g., user profiles, product catalogs). Use graph databases when you need to traverse complex relationships (social networks, recommendation engines, fraud detection).
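
As a rough illustration of what "traversing relationships" means, the sketch below finds everyone within two hops of a user in a small follower graph using breadth-first search. The function name `within_hops` and the sample data are hypothetical, and a plain dictionary stands in for a graph database; a real graph database would express this as a query rather than hand-written traversal code.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return the nodes reachable from start in at most max_hops edges."""
    seen = {start}
    reached = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                queue.append((neighbor, depth + 1))
    return reached


# Hypothetical "follows" edges, stored as an adjacency list.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
}

one_hop = within_hops(follows, "alice", 1)
two_hops = within_hops(follows, "alice", 2)
```

In a relational schema each extra hop typically costs another self-join on the edge table, which is why variable-depth traversals are the usual argument for a graph model.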

📘CONCEPT

Column-Oriented Storage

In OLAP (analytics) workloads, you typically query a few columns across millions of rows. Column-oriented storage stores all values of a column together, enabling better compression and faster analytical queries. This is why data warehouses (BigQuery, Redshift, ClickHouse) use columnar formats.
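
A small sketch can show why the column layout helps. It pivots a few rows into one list per column, aggregates a single column without touching the others, and run-length encodes a repetitive column. The sample data and the helper name `run_length_encode` are hypothetical, and in-memory lists stand in for on-disk column files; real warehouses use more elaborate encodings.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "EU", "amount": 35.0},
    {"order_id": 3, "region": "US", "amount": 15.0},
    {"order_id": 4, "region": "US", "amount": 30.0},
]

# Column-oriented layout: one list per column, values in the same row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query reads only the column it needs.
total_amount = sum(columns["amount"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [(value, count) for value, count in encoded]


# Values within one column tend to repeat, so they compress well.
region_rle = run_length_encode(columns["region"])
```

Here the four-value region column collapses to two pairs. With millions of rows and only a handful of distinct regions, the saving grows accordingly, especially when the data is sorted by that column.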

Quotes

💬QUOTE
The limits of my language mean the limits of my world.

— Page 27

💬QUOTE
A document is not stored as a row in a table — it is stored as a self-contained JSON or binary blob.

— Page 31