LevelDB

LevelDB is a fast, embedded key-value store written by Sanjay Ghemawat and Jeff Dean at Google. It is a single-process library — no server, no network protocol — that maps arbitrary byte-string keys to arbitrary byte-string values, ordered by key. The on-disk layout is an LSM (Log-Structured Merge) tree, which makes writes cheap and range scans efficient. LevelDB is the technical ancestor of RocksDB (Facebook's fork) and the storage layer behind Bitcoin Core, Chrome's IndexedDB, and many other systems.


1. Overview & Design

LevelDB stores data as sorted (key, value) pairs. Keys and values are arbitrary byte strings — no schema, no types, no SQL. The library is embedded directly into your application process, so reads and writes are function calls (no IPC, no network round-trip).

Storage model — LSM tree:

Writes are appended to a write-ahead log (WAL) and inserted into an in-memory MemTable. When the MemTable reaches write_buffer_size, it is frozen and flushed to a sorted, immutable file (an SSTable) at level 0. Background compaction merges SSTables down through successively larger levels, so every level past L0 holds sorted, non-overlapping files.

Properties:

- Sorted keys: range and prefix scans are cheap and sequential.
- Fast writes: every write is a sequential append (WAL) plus an in-memory insert.
- Atomic batches and point-in-time snapshots, but no multi-key transactions.
- Single-process: only one process at a time may hold the database directory open.

2. Installation

macOS (Homebrew)

brew install leveldb

# Python binding
pip install plyvel

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y libleveldb-dev libsnappy-dev

# Python binding
pip install plyvel

Build from source

git clone --recurse-submodules https://github.com/google/leveldb.git
cd leveldb
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
sudo cmake --install .

Verify the install (Python)

import plyvel
db = plyvel.DB('/tmp/leveldb-smoketest', create_if_missing=True)
db.put(b'hello', b'world')
print(db.get(b'hello'))   # b'world'
db.close()

3. Python Quick Start (plyvel)

The standard Python binding is plyvel — a thin Cython wrapper around the C++ library. Keys and values are bytes, never str.

import json
import plyvel

# Open (or create) a database directory
db = plyvel.DB('/tmp/users.ldb', create_if_missing=True)

# --- Insert ---
def put_user(user_id: int, record: dict) -> None:
    key = f"user:{user_id}".encode()
    value = json.dumps(record).encode()
    db.put(key, value)

put_user(1001, {"first": "Alice",   "last": "Smith",   "city": "Seattle"})
put_user(1002, {"first": "Bob",     "last": "Johnson", "city": "Portland"})
put_user(1003, {"first": "Charlie", "last": "Davis",   "city": "San Francisco"})

# --- Retrieve a single key ---
raw = db.get(b'user:1002')
if raw is not None:
    print(json.loads(raw))   # {'first': 'Bob', 'last': 'Johnson', 'city': 'Portland'}

# --- Delete ---
db.delete(b'user:1003')

# --- Check missing keys ---
print(db.get(b'user:9999'))  # None

db.close()

Key-design tip. LevelDB stores keys in lexicographic byte order, so encode multi-part keys with sortable separators (e.g. b"user:0000001002" rather than b"user:1002") to keep numeric ranges contiguous.


4. C++ API

The original LevelDB API is C++. The shape is DB::Open → Put/Get/Delete → delete db.

#include <leveldb/db.h>
#include <cassert>
#include <iostream>
#include <string>

int main() {
    leveldb::DB* db = nullptr;
    leveldb::Options options;
    options.create_if_missing = true;

    leveldb::Status status = leveldb::DB::Open(options, "/tmp/users.ldb", &db);
    assert(status.ok());

    // Put
    db->Put(leveldb::WriteOptions(), "user:1001",
            R"({"first":"Alice","last":"Smith"})");

    // Get
    std::string value;
    status = db->Get(leveldb::ReadOptions(), "user:1001", &value);
    if (status.ok()) std::cout << value << std::endl;

    // Delete
    db->Delete(leveldb::WriteOptions(), "user:1001");

    delete db;
    return 0;
}

Compile with the system library:

g++ -std=c++17 leveldb_demo.cpp -o leveldb_demo -lleveldb -lpthread -lsnappy
./leveldb_demo

5. Iteration & Range Scans

Because keys are stored in sorted order, iterators are the canonical way to read multiple records. Range scans are O(log n) seek + O(k) sequential.

import plyvel

db = plyvel.DB('/tmp/users.ldb')

# Full forward scan
for key, value in db:
    print(key, value)

# Range scan: keys in [start, stop)
for key, value in db.iterator(start=b'user:1001', stop=b'user:2000'):
    print(key, value)

# Prefix scan
for key, value in db.iterator(prefix=b'user:'):
    print(key, value)

# Reverse scan
for key, value in db.iterator(reverse=True):
    print(key, value)

# Keys-only / values-only (skip the half you don't need)
for key in db.iterator(include_value=False):
    print(key)

Composite indexes via key prefixes. A common pattern is to use prefix-based "tables" inside one LevelDB: b"user:...", b"order:...", b"order_by_user:<user_id>:<order_id>". Prefix scans then act as lookups by secondary key.


6. Batch Writes & Atomicity

LevelDB has no multi-key transactions, but a WriteBatch commits a group of writes atomically — either all of them apply or none of them do.

import plyvel

db = plyvel.DB('/tmp/users.ldb')

with db.write_batch() as wb:
    wb.put(b'user:1001', b'{"first":"Alice"}')
    wb.put(b'user:1002', b'{"first":"Bob"}')
    wb.delete(b'user:1003')
    # The batch is committed atomically when the `with` block exits.
    # If an exception is raised, nothing is written.

# Synchronous (fsync) batch — slower but durable across power loss
with db.write_batch(sync=True) as wb:
    wb.put(b'config:version', b'42')

Sync vs. async writes. The default write does not fsync; data survives a process crash (the OS still holds the buffered WAL pages) but may be lost on power failure or an OS crash. Pass sync=True when correctness depends on the write surviving that.


7. Snapshots & Consistent Reads

A snapshot pins a logical view of the database at a point in time. Subsequent reads against that snapshot see the data exactly as it was, even as concurrent writes commit.

import plyvel

db = plyvel.DB('/tmp/users.ldb')

# Capture a snapshot
snap = db.snapshot()

# Concurrent writer modifies the database
db.put(b'user:1001', b'{"first":"Alice","city":"NYC"}')

# Reads against the snapshot still see the *old* value
print(snap.get(b'user:1001'))   # original value, not "NYC"
print(db.get(b'user:1001'))     # current value

snap.close()

Snapshots are cheap (a single sequence number) and are the standard tool for consistent multi-key reads, point-in-time iteration, and producing backups while writes continue.


8. Tuning & Options

The most-touched options for production deployments:

import plyvel

db = plyvel.DB(
    '/tmp/big.ldb',
    create_if_missing=True,
    write_buffer_size=128 * 1024 * 1024,   # 128 MiB MemTable
    max_open_files=2000,
    lru_cache_size=512 * 1024 * 1024,      # 512 MiB block cache
    block_size=8 * 1024,
    compression='snappy',
)

9. LevelDB vs. RocksDB

RocksDB is Facebook's hard fork of LevelDB. The on-disk format is largely compatible, but RocksDB adds substantially more functionality. Pick LevelDB for embedded simplicity; pick RocksDB for production scale.

Capability                                 LevelDB           RocksDB
Column families (logical namespaces)       No                Yes
Multi-threaded compaction                  Single-threaded   Multi-threaded
Bloom filters                              Manual            Built-in, per-CF
Backup & checkpoint API                    No                Yes
TTL on keys                                No                Yes
Transactions (optimistic / pessimistic)    No                Yes
Compression algorithms                     Snappy            Snappy, LZ4, Zstd, ZLIB
Tuning surface                             Small             Large
Code size                                  ~25k LOC          ~400k LOC

10. When to Use LevelDB

Good fits:

- Embedded storage inside a single process: local caches, application state, CLI and desktop tools.
- Write-heavy workloads over ordered keys that need range or prefix scans.
- Situations that favor a small, simple, auditable dependency.

Poor fits:

- Multiple processes or networked clients (the LOCK file allows one process only).
- Relational queries, joins, or SQL; LevelDB is a plain ordered key-value map.
- Large production server workloads, where RocksDB's multi-threaded compaction and richer feature set pay off.


Common Interview Questions:

What is the LSM-tree write path in LevelDB?

A write is appended to the WAL on disk, then inserted into the in-memory MemTable (a skiplist). When the MemTable hits write_buffer_size, it becomes immutable and is flushed to a Level-0 SSTable. Background compaction later merges L0 into L1, L1 into L2, etc., maintaining sorted, non-overlapping files at each level past L0.

Why are LevelDB writes fast but reads sometimes slow?

Writes only touch the MemTable + WAL — both are sequential. Reads, however, may have to consult the MemTable, an immutable MemTable being flushed, every L0 file (L0 files can overlap), and then at most one file per lower level; each step past the MemTables can mean a disk read. Bloom filters and the block cache mitigate this, but a write-heavy workload that hasn't compacted can show high read amplification.

How does LevelDB give you atomicity without transactions?

WriteBatch groups multiple put/delete operations into a single sequence-numbered commit. Either all writes in the batch become visible once the batch is committed to the WAL, or none do. There is no isolation across batches and no rollback — if you need cross-key MVCC, use RocksDB transactions or a database that supports them.

Why does LevelDB only allow one process to open a database?

The database directory holds an exclusive file lock (LOCK file). LevelDB has no client/server protocol or distributed coordination, so two processes writing simultaneously would corrupt the LSM invariants. For multi-process access, run a server in front of it (e.g. a custom HTTP wrapper) or use a database designed for that — FoundationDB, Redis, Postgres.

How would you implement a secondary index on top of LevelDB?

Encode the secondary key in the key prefix and store the primary key (or full record) as the value. For example, to index users by city: write b"user:<id>" for the canonical record and b"by_city:<city>:<id>" with an empty value for the index. A prefix scan on b"by_city:Seattle:" yields all matching IDs. Maintain index and primary entries inside one WriteBatch for atomicity.

When would you choose LevelDB over RocksDB?

When you want a small, audited, embedded library with a tiny tuning surface — e.g. embedding into an application, shipping a CLI tool, or as a teaching example. RocksDB is the better default for any production server-side workload because it has multi-threaded compaction, bloom filters per column family, transactions, backup/checkpoint, and a much richer set of compression and TTL options. LevelDB stays in the picture mainly because of code-size, simplicity, and the systems that already depend on it.

