Chapter 2 of 4
Data Models and Storage
How data is stored and queried — from relational models to document stores, LSM-trees, and B-trees.
Key Insights
The relational model, document model, and graph model each have their strengths — the choice depends on your data's relationships.
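LSM-trees are optimized for writes: every write is a sequential append, and sorted files are merged in the background. B-trees update fixed-size pages in place and typically give faster, more predictable reads. Most storage engines are built on one of the two.
Schema-on-read (structure is interpreted when the data is read) vs schema-on-write (structure is enforced when the data is written) is a fundamental design decision: it determines where validation happens and how costly it is to change the data format later.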
Notes
Relational vs Document vs Graph
Relational (SQL): Best for many-to-many relationships, joins, and structured data with a known schema.
Document (MongoDB, etc.): Best for self-contained documents with few relationships; offers schema flexibility.
Graph (Neo4j, etc.): Best for highly interconnected data where relationships are as important as the data itself.
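To make the relational vs document contrast concrete, here is a minimal sketch in plain Python, not tied to any real database. The user, the table names, and both helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical data: one user with two job positions.

# Document model: everything about the user lives in one nested record.
user_document = {
    "user_id": 1,
    "name": "Ada",
    "positions": [
        {"title": "Engineer", "org": "Acme"},
        {"title": "Analyst", "org": "Initech"},
    ],
}

# Relational model: the same facts split into two tables linked by user_id.
users = [{"user_id": 1, "name": "Ada"}]
positions = [
    {"user_id": 1, "title": "Engineer", "org": "Acme"},
    {"user_id": 1, "title": "Analyst", "org": "Initech"},
]


def titles_from_document(doc):
    """Read the nested list directly; no join is needed."""
    return [p["title"] for p in doc["positions"]]


def titles_from_tables(user_id, positions_table):
    """Emulate a join by filtering the positions table on the foreign key."""
    return [row["title"] for row in positions_table if row["user_id"] == user_id]
```

Both functions return the same titles. The document form gets there with a single lookup, while the relational form needs a join but lets other tables reference the same positions without duplicating them.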
LSM-Trees vs B-Trees
LSM-Trees (Log-Structured Merge): Write to an in-memory buffer (memtable), flush to sorted files (SSTables), compact periodically. Fast writes, sequential I/O. Used by Cassandra, RocksDB, LevelDB.
B-Trees: The standard for relational databases. Data stored in fixed-size pages. In-place updates. Good read performance with O(log n) lookups.
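The memtable, flush, and compact steps can be sketched as a toy in-memory structure. This is an illustration only, not how Cassandra, RocksDB, or LevelDB are implemented: the class name `TinyLSM` and the `memtable_limit` parameter are assumptions, the "SSTables" are Python lists rather than files, and deletes (tombstones) and the write-ahead log are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: a dict memtable plus a list of sorted, immutable runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the in-memory buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new sorted run (stand-in for an SSTable).
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the newest data first: memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for table in self.sstables:  # oldest to newest, so newer values win
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

A read may have to check several runs before it finds a key, which is the read-side cost that compaction reduces by merging runs and discarding overwritten values.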
When to Use What
Use relational when you need joins, transactions, and data integrity. Use document stores when your data is naturally hierarchical (e.g., user profiles, product catalogs). Use graph databases when you need to traverse complex relationships (social networks, recommendation engines, fraud detection).
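As an illustration of what "traversing relationships" means, here is a minimal breadth-first search over a hypothetical follow graph stored as an adjacency list. The names and the `reachable_within` helper are made up for this sketch; a graph database would express the same query in its own query language.

```python
from collections import deque

# Hypothetical follow graph as an adjacency list.
follows = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "dev": [],
    "eli": ["ana"],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first search returning {node: hop count} within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start node is not part of the result
    return distance
```

In a relational schema, each extra hop would usually mean another self-join on the follows table, which is why variable-depth traversals are often a better fit for a graph model.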
Column-Oriented Storage
In OLAP (analytics) workloads, you typically query a few columns across millions of rows. Column-oriented storage stores all values of a column together, enabling better compression and faster analytical queries. This is why data warehouses (BigQuery, Redshift, ClickHouse) use columnar formats.
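The difference between the two layouts can be sketched with plain Python lists. The order data and function names are hypothetical, and run-length encoding is used here as just one example of a compression scheme that benefits from storing a column's values together.

```python
# Hypothetical order data in a row-oriented layout: one record per row.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.0},
    {"order_id": 3, "region": "EU", "amount": 15.0},
]


def to_columns(row_records):
    """Pivot row records into one list per column (assumes identical keys)."""
    return {name: [r[name] for r in row_records] for name in row_records[0]}


columns = to_columns(rows)


def total_amount_row_store(row_records):
    # Touches every full record even though only one field is needed.
    return sum(r["amount"] for r in row_records)


def total_amount_column_store(cols):
    # Reads only the one column the query needs.
    return sum(cols["amount"])


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded
```

Both totals are identical; the columnar version simply never looks at `order_id` or `region`. Sorting a low-cardinality column such as `region` before encoding produces long runs, which is one reason columnar data tends to compress well.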
Quotes
“The limits of my language mean the limits of my world.”
— Page 27
“A document is not stored as a row in a table — it is stored as a self-contained JSON or binary blob.”
— Page 31