tsdb: a DOTSV database runner
tsdb is a command-line database runner for DOTSV flat files, written in Rust (requires Rust 1.94+). It pairs a flat-file database format of my own design (DOTSV) with a compact time-based UUID scheme (base62 Format-Gu).
Source code
- tsdb
- Documentation: https://sotongdj.github.io/tsdb/
base62 Encoding
base62 uses exactly 62 alphanumeric characters (0-9, a-z, A-Z) and none of the characters that are unsafe in filenames or URLs, so encoded values are filename-safe, URL-safe, shell-safe, and human-typeable. Each character carries about 5.95 bits of information.
Format A–F
Six permutations of three character groups ([0-9], [a-z], [A-Z]) producing different lexicographic sort orders.
Format-G
Removes the visually ambiguous characters l and O, leaving 60 characters. A timestamp is encoded in 9 characters:
| Position | Segment | Encoding |
|---|---|---|
| 1 | Prefix | Literal G |
| 2 | Century | Format-G alphabet |
| 3–4 | Year | 2-digit zero-padded |
| 5 | Month | a–f for 1–6; A–F for 7–12 |
| 6 | Day | 0–9, a–k, A–J |
| 7 | Hour | 0, a–l, A–K |
| 8 | Minute | Positional in 60-char alphabet |
| 9 | Second | Positional in 60-char alphabet |
Example: 2026-03-10 04:00:45 → Gk26c9d0K
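The table above can be turned into a small encoder. The following is a minimal sketch, not tsdb's actual implementation: the function name encode_g and the constant names are hypothetical, the alphabet order (digits, then lowercase without l, then uppercase without O) is inferred from the example, and day 1 is assumed to map to 0 because the example encodes day 10 as 9. The Hour column is followed literally, including the letter l.

```rust
// Assumed Format-G alphabet: 0-9, a-z without 'l', A-Z without 'O' (60 chars).
const G_ALPHABET: &[u8; 60] = b"0123456789abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ";
// Per-field alphabets copied from the table.
const MONTH: &[u8; 12] = b"abcdefABCDEF"; // months 1-12
const DAY: &[u8; 31] = b"0123456789abcdefghijkABCDEFGHIJ"; // days 1-31 (assumed 1 -> '0')
const HOUR: &[u8; 24] = b"0abcdefghijklABCDEFGHIJK"; // hours 0-23

/// Encode a calendar timestamp as a 9-character Format-G string.
/// Returns None if any field is out of range for its alphabet.
fn encode_g(year: u32, month: u32, day: u32, hour: u32, min: u32, sec: u32) -> Option<String> {
    let mut out = String::with_capacity(9);
    out.push('G'); // position 1: literal prefix
    out.push(*G_ALPHABET.get((year / 100) as usize)? as char); // position 2: century
    out.push_str(&format!("{:02}", year % 100)); // positions 3-4: two-digit year
    out.push(*MONTH.get(month.checked_sub(1)? as usize)? as char); // position 5
    out.push(*DAY.get(day.checked_sub(1)? as usize)? as char); // position 6
    out.push(*HOUR.get(hour as usize)? as char); // position 7
    out.push(*G_ALPHABET.get(min as usize)? as char); // position 8
    out.push(*G_ALPHABET.get(sec as usize)? as char); // position 9
    Some(out)
}
```

Under these assumptions, encode_g(2026, 3, 10, 4, 0, 45) reproduces the example string Gk26c9d0K.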
Format-Gu (tbUUID)
12-character time-sortable UUID: {Class}{G-timestamp}{OrderNum}
- Class indicator: single uppercase letter for record type
- Order number: 2-char suffix (01–ZZ), up to 3,844 values per class per second
- Regex: [A-Z]G[0-9a-zA-Z]{10}
- Immutable once assigned
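As an illustration of the layout, the sketch below checks a string against the regex above using only the standard library and splits it into its three parts. The function name split_tbuuid is hypothetical and is not part of tsdb.

```rust
/// Validate a Format-Gu UUID against [A-Z]G[0-9a-zA-Z]{10} and split it into
/// (class, 9-char G-timestamp, 2-char order number).
fn split_tbuuid(id: &str) -> Option<(char, &str, &str)> {
    let b = id.as_bytes();
    let ok = b.len() == 12
        && b[0].is_ascii_uppercase() // class indicator
        && b[1] == b'G' // Format-G prefix
        && b[2..].iter().all(|c| c.is_ascii_alphanumeric());
    if !ok {
        return None;
    }
    // All 12 bytes are ASCII at this point, so byte slicing is safe.
    Some((b[0] as char, &id[1..10], &id[10..]))
}
```

For example, split_tbuuid("NGk26cHcv001") yields class N, timestamp Gk26cHcv0, and order number 01.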
DOTSV Format
DOTSV (Database Oriented Tab Separated Vehicle) is a flat-file database format that stores one record per line.
- File extensions: *.dotsv, *.dov
- Encoding: UTF-8
Record format
<12-char-Format-Gu-UUID>\t<key=value>\t<key=value>...\n
NGk26cHcv001 name=Alice city=Tokyo age=30
NGk26cHdn002 name=Bob city=Osaka
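A record line in this shape can be split without copying. The sketch below is only an illustration, assuming fields are separated by a literal tab as in the record format line; parse_record is a hypothetical name, and values are returned still escaped.

```rust
/// Split one DOTSV record line into its UUID and borrowed key/value pairs.
/// Returns None if the UUID is not 12 bytes long or a field lacks '='.
fn parse_record(line: &str) -> Option<(&str, Vec<(&str, &str)>)> {
    let mut fields = line.trim_end_matches('\n').split('\t');
    let uuid = fields.next().filter(|u| u.len() == 12)?;
    // Every remaining field must be key=value; the first '=' is the separator
    // because any '=' inside a key or value is escaped.
    let pairs = fields
        .map(|f| f.split_once('='))
        .collect::<Option<Vec<_>>>()?;
    Some((uuid, pairs))
}
```

All returned slices borrow from the input line, which is the same idea as borrowing directly from a memory-mapped file.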
Escaping
Only 4 bytes require escaping:
| Byte | Escaped | Reason |
|---|---|---|
| \n | \x0A | Record delimiter |
| \t | \x09 | Field delimiter |
| = | \x3D | Key-value separator |
| \ | \\ | Escape character |
All other Unicode (CJK, emoji) passes through unescaped as plain UTF-8.
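The escaping rules are small enough to sketch in full. The two functions below are an illustration under the assumption that the escaped forms are the literal four-character sequences shown in the table; the names escape and unescape are hypothetical, and rejecting unknown escape sequences is a choice made here, not documented tsdb behavior.

```rust
/// Escape the four reserved bytes; everything else passes through unchanged.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\n' => out.push_str("\\x0A"),
            '\t' => out.push_str("\\x09"),
            '=' => out.push_str("\\x3D"),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out
}

/// Reverse `escape`. Returns None on an unrecognized escape sequence.
fn unescape(s: &str) -> Option<String> {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('\\') {
        out.push_str(&rest[..pos]); // copy the run before the backslash
        let tail = &rest[pos + 1..];
        let (ch, used) = if tail.starts_with('\\') {
            ('\\', 1)
        } else if tail.starts_with("x0A") {
            ('\n', 3)
        } else if tail.starts_with("x09") {
            ('\t', 3)
        } else if tail.starts_with("x3D") {
            ('=', 3)
        } else {
            return None;
        };
        out.push(ch);
        rest = &tail[used..];
    }
    out.push_str(rest);
    Some(out)
}
```

Because the backslash itself is escaped, a value that already contains the text \x0A survives a round trip unchanged.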
Two-section file structure
<sorted section>
<pending section>
- Sorted section: records ordered lexicographically by UUID, enabling O(log n) lookup via mmap + binary search
- Pending section: write-ahead buffer, O(1) append
- Compaction: when the pending section exceeds a threshold, it is merged into the sorted section (tsdb target.dov --compact)
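The lookup in the sorted section can be sketched as a binary search over byte offsets that snaps each probe back to the start of its line. This is a sketch under assumptions, not tsdb's code: find_record is a hypothetical name, the input is assumed to be only the sorted section (for instance a slice of a memory-mapped file), and UUIDs are assumed to be ordered by plain byte-wise comparison.

```rust
use std::cmp::Ordering;

/// Look up `uuid` in a buffer of newline-terminated records sorted by their
/// leading 12-byte UUID. Returns the matching line without its '\n'.
fn find_record<'a>(sorted: &'a [u8], uuid: &[u8]) -> Option<&'a [u8]> {
    // Invariant: `lo` is always the start of a line; `hi` is a line start or EOF.
    let (mut lo, mut hi) = (0usize, sorted.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        // Snap back to the start of the line that contains `mid`.
        let start = sorted[lo..mid]
            .iter()
            .rposition(|&b| b == b'\n')
            .map_or(lo, |p| lo + p + 1);
        // Find the end of that line (exclusive of the newline).
        let end = sorted[start..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(sorted.len(), |p| start + p);
        let line = &sorted[start..end];
        let key = &line[..line.len().min(12)];
        match key.cmp(uuid) {
            Ordering::Equal => return Some(line),
            Ordering::Less => lo = (end + 1).min(sorted.len()),
            Ordering::Greater => hi = start,
        }
    }
    None
}
```

Each step discards at least half of the remaining window, and the returned slice borrows from the input buffer rather than copying the record.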
tsdb Operations
tsdb <target.dov> <action.txt>
tsdb <target.dov> --compact
Action files use the same grammar as the DOTSV pending section:
| Prefix | Name | Behavior |
|---|---|---|
| + | Append | Insert new record; error if UUID exists |
| - | Delete | Remove by UUID; error if UUID missing |
| ~ | Patch | Update specific KV pairs; error if UUID missing |
| ! | Upsert | Insert if absent, replace if present |
Example
+NGk26cHcv001 name=Alice city=Tokyo age=30
~NGk26cHcv001 city=Kyoto age=31
-NGk26cHdn002
!EGk26cICK001 name=Carol city=London
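The semantics of the four prefixes can be illustrated against an in-memory map. This is a simplified sketch, not how tsdb applies actions to a file: the names Db, Record, and apply_action are hypothetical, fields are assumed to be tab-separated, and values are stored still escaped.

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, String>;
type Db = BTreeMap<String, Record>; // keyed by UUID, kept in sorted order

/// Apply one action line (prefix + UUID + tab-separated key=value pairs).
fn apply_action(db: &mut Db, line: &str) -> Result<(), String> {
    let mut chars = line.chars();
    let op = chars.next().ok_or("empty action line")?;
    let mut fields = chars.as_str().split('\t');
    let uuid = fields.next().unwrap_or("").to_string();
    let kvs: Record = fields
        .filter_map(|f| f.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    match op {
        '+' => {
            // Append: the UUID must not exist yet.
            if db.contains_key(&uuid) {
                return Err(format!("UUID exists: {}", uuid));
            }
            db.insert(uuid, kvs);
        }
        '-' => {
            // Delete: the UUID must exist.
            db.remove(&uuid)
                .ok_or_else(|| format!("UUID missing: {}", uuid))?;
        }
        '~' => {
            // Patch: overwrite only the listed keys.
            let rec = db
                .get_mut(&uuid)
                .ok_or_else(|| format!("UUID missing: {}", uuid))?;
            rec.extend(kvs);
        }
        '!' => {
            // Upsert: insert or replace the whole record.
            db.insert(uuid, kvs);
        }
        other => return Err(format!("unknown prefix: {}", other)),
    }
    Ok(())
}
```

Running the example above through this sketch, after first appending the Bob record, leaves Alice with city=Kyoto and age=31, removes Bob, and adds Carol.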
Concurrency
Multiple tsdb instances coordinate via a companion .dov.lock file. Instances with non-overlapping UUID sets run concurrently. Stale entries (>30s) are evicted automatically.
Design rationale
| Goal | Mechanism |
|---|---|
| Fast query | Sorted UUIDs + mmap + binary search |
| Fast parse | memchr-SIMD accelerated tab-split |
| Low memory | Zero-copy borrows from mmap |
| Fast writes | O(1) append to pending section |
| Git-traceable | Sorted records = one-line diffs |
| Human-readable | Plain UTF-8, minimal escaping |