mmdb
Difference between memory and disk
- Faster access time.
- Volatile.
- Fixed cost per access to disk so they are block-oriented storage devices but memory is not.
- Sequential access to disk is critical but not to memory.
History
1984-1994
- IMS Fast Path(1976)
- Fine-grained record-level locking(?).
- Install record updates at commit time(?).
- MM-DBMS
- Use of pointers for direct recored accesses.
- New memory-optimized indexing methods(T-Trees).
- Optimized two-pahse locking.
- New recovery techniques.
1994-2007
- ClustRa
- Prototype of distributed main-memory architecture.
- Dali
- Allow applications direct shared-memory access to data records(not common yet in modern mmdb).
- Perform redo-only loggingg during recovery(A transient undo log in memory which discarded after commit).
- Fuzzy action-consistent checkpointing to reduce lock conetention on checkpointing worker.
- System K
- Process transactions sequentially on a cpu core.
- Logical logging(?).
- TimesTen
- An optimized checkpointing and logging scheme for main memory execution.
- High-performance mmdb solution or Oracle.
- P*Time
- L2-cache consciousness.
- Latch-free interal data structures.
- Optimized for durability and recovery by performing fine-grained parallel differential logging and recovery.
- Multi-level locking.
Trends
Embedded Key-Value stores
Provide a high-performance, memory-optimized data structure(hash or range-based access).
Examples of memory-optimized key-value store designs include:
- Masstree: Employs a cache-conscious index layout using a 'trie-of-B+-tree' and implements an optimistic concurrency-control scheme that does not block readers.
- Bw-tree: latch-free B+-tree in memory and implements a log-structured page store that writes sequentailly batck to stable storage.
- MICA: Optimizes the complete request-handling stack(from network request at the NIC through to data structure design) and use a lossy hash index for highly concurrent reads and writes.
Distributed Key-Value stores
Not just a cache, recent projects have build distributed main-meory key-value stores for primary data storage -- durability and high performance.
Examples: RAM-Cloud and FaRM.
Issues and Architectural Choices
Data Orgnization and Layout
- Avoid Page Indirection
- Avoid indirection calculation.
- Avoid page lathing.
- Partitioning
- Run transactions serially within a partition.
- Need to coordinate transactions that span multiple partitions.
- Multi-versioning
- Enable concurrency control schemes where readers never block writers.
- Allow for snapshot reads.
- Row/Columnar layout
- Row layout is more appropriate for OLTP workloads.
- Columnar layout is more appropriate for analytics workloads.
Indexing
- No page-based indirection
- Cache awareness: CSB+-Tree, Pb+-Tree.
- Multi-core parallelism
- Partitioned: single thread per partition.
- Shared: latch-free, Bw-tree, Mass-Tree.
Concurrency Control
- Multi-versioning and Optimisim
- Partitioning
- Apriori knowledge of wrte/read sets
Durability and Recovery
- Optimized for high throughput and low latency.
- Perform redo-only logging and Avoid logging index data to minimize log traffic.
- Copy-on-write for checkpointing because the checkpoint is larger in mmdb than in disk-based database.
- Parallelize recovery as much as possible to reduce recovery time.
Query Processing and Compilation
- The vast difference is in the lower layer of query processing stack.
- MMDB reduces runtime overhead by compiling queries directly to machine code.
- Disk-based databases use iterator model(Volcano-style processing).