The MEF 3.0 on-disk format
A practical reference for the parts of MEF 3.0 that mef3io reads and writes
(time series + records; video is out of scope). Offsets below are what the
code actually uses (core/include/mef3io/types.hpp, core/src/headers.cpp)
and are cross-validated against meflib/pymef. All integers are
little-endian; type names follow meflib (si8 = int64, ui4 = uint32,
sf8 = float64, …). Strings live in fixed-size, null-padded UTF-8 fields.
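As a small illustration of the string convention, the sketch below reads and writes a fixed-size, null-padded UTF-8 field. The helper names are made up for this example, and keeping at least one terminating NUL is an assumption (a conservative one for C readers), not something this page states.

```python
def decode_fixed_string(field: bytes) -> str:
    """Return the text stored in a fixed-size, null-padded UTF-8 field."""
    # Everything from the first NUL byte onward is padding.
    return field.split(b"\x00", 1)[0].decode("utf-8")


def encode_fixed_string(text: str, size: int) -> bytes:
    """Encode text into a field of exactly `size` bytes, padded with NULs."""
    raw = text.encode("utf-8")
    # Assumption: always leave room for one terminating NUL byte.
    if len(raw) >= size:
        raise ValueError(f"{text!r} needs {len(raw)} bytes; field holds {size - 1}")
    return raw.ljust(size, b"\x00")
```

For example, `encode_fixed_string("uV", 128)` would fill a 128-byte units-description field, and `decode_fixed_string` recovers `"uV"` from it.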
Directory tree
A "file" in MEF is really a directory tree; the suffix of each directory says what it is:
session.mefd/                        # session
├── session.rdat / session.ridx      # optional session-level records
└── ch1.timd/                        # one time-series channel
    ├── ch1.rdat / ch1.ridx          # optional channel-level records
    ├── ch1-000000.segd/             # segment 0
    │   ├── ch1-000000.tmet          # metadata (fixed 16 384 B)
    │   ├── ch1-000000.tidx          # block index (1024 B UH + 56 B/block)
    │   └── ch1-000000.tdat          # compressed data (1024 B UH + RED blocks)
    └── ch1-000001.segd/             # segment 1, ...
A channel is split into segments (numbered directories); each segment is
a self-contained triple of metadata / index / data files. Appending more data
either extends the last segment's three files in place or starts a new
segment. Use Reader.segments(channel) to see which segment covers which
time and sample range, and Reader.toc(channel) for the per-block view.
Times
All times are uUTC: microseconds since the Unix epoch, si8. Two format
quirks matter:
- Times are stored negated. meflib marks "recording-time-offset applied"
by storing rto - t (a negative number). Readers recover the true time as
t >= 0 ? t : -t + rto, where rto (the recording time offset) lives in
metadata section 3. This applies to universal-header start/end times, block
start times, and record times alike. mef3io writes rto = 0.
- The empty value ("no entry") for a time is INT64_MIN.
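The two rules above can be sketched as a single decoding function. The name `decode_uutc` and the choice to return `None` for the empty value are illustrative assumptions, not part of mef3io's API.

```python
INT64_MIN = -(2**63)  # "no entry" sentinel for times


def decode_uutc(stored: int, rto: int = 0):
    """Turn a stored (possibly negated) time into true uUTC microseconds."""
    if stored == INT64_MIN:
        return None  # empty value: no time recorded
    # Non-negative values are already true times; negative ones were
    # written as rto - t, so t = -stored + rto.
    return stored if stored >= 0 else -stored + rto
```

With `rto = 300`, a true time of 1000 is stored as -700, and `decode_uutc(-700, 300)` gives back 1000.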
Universal header: 1024 B, prefixes every file
Every .tmet/.tidx/.tdat/.rdat/.ridx file starts with the same 1024-byte
header:
| Offset | Type | Field |
|---|---|---|
| 0 | ui4 | header CRC (over bytes 4..1023) |
| 4 | ui4 | body CRC (over bytes 1024..end of file) |
| 8 | char[5] | file type string (tmet, tidx, tdat, rdat, ridx) |
| 13 / 14 | ui1 | MEF version major / minor (3 / 0) |
| 15 | ui1 | byte order (1 = little-endian) |
| 16 / 24 | si8 | start / end time (negated uUTC, see above) |
| 32 | si8 | number of entries (blocks / index entries / records) |
| 40 | si8 | maximum entry size (bytes) |
| 48 | si4 | segment number |
| 52 / 308 / 564 | char[256] | channel name / session name / anonymized name |
| 820 / 836 / 852 | ui1[16] | level UUID / file UUID / provenance UUID |
| 868 / 884 | ui1[16] | level-1 / level-2 password validation fields |
| 900 / 960 | ui1[60] / ui1[64] | protected / discretionary regions |
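The table maps directly onto a packed little-endian `struct` layout. The following is a minimal sketch: the field names and the `parse_universal_header` helper are chosen for this example and are not mef3io's own. Start and end times are returned as stored, that is, still negated.

```python
import struct

# Little-endian, no alignment padding; field order follows the table above.
UNIVERSAL_HEADER = struct.Struct(
    "<II5sBBBqqqqi"   # CRCs, file type, version, byte order, times, counts, segment
    "256s256s256s"    # channel / session / anonymized name
    "16s16s16s"       # level / file / provenance UUID
    "16s16s"          # level-1 / level-2 password validation fields
    "60s64s"          # protected / discretionary regions
)
assert UNIVERSAL_HEADER.size == 1024

FIELDS = (
    "header_crc", "body_crc", "file_type", "version_major", "version_minor",
    "byte_order", "start_time", "end_time", "number_of_entries",
    "maximum_entry_size", "segment_number", "channel_name", "session_name",
    "anonymized_name", "level_uuid", "file_uuid", "provenance_uuid",
    "level_1_password_validation", "level_2_password_validation",
    "protected_region", "discretionary_region",
)
TEXT_FIELDS = ("file_type", "channel_name", "session_name", "anonymized_name")


def parse_universal_header(data: bytes) -> dict:
    """Unpack the first 1024 bytes of any MEF 3.0 file into a dict."""
    header = dict(zip(FIELDS, UNIVERSAL_HEADER.unpack_from(data, 0)))
    for key in TEXT_FIELDS:
        # Strings are null-padded UTF-8; drop the padding.
        header[key] = header[key].split(b"\x00", 1)[0].decode("utf-8")
    # All-zero validation fields mean the file is unencrypted.
    header["encrypted"] = any(header["level_1_password_validation"]) or any(
        header["level_2_password_validation"]
    )
    return header
```

The `assert` on the struct size is a cheap guard that the format string and the table stay in sync.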
CRCs use CRC-32 with the Koopman polynomial (0x741B8CD7, reflected, start value 0xFFFFFFFF). The password validation fields hold the two-level key material (see encryption_model.md); all-zero fields mean the file is unencrypted.
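To make those parameters concrete, here is a table-driven reflected CRC-32 built from the polynomial 0x741B8CD7. All names are made up for this sketch. The text does not say whether a final XOR is applied, so `xor_out` defaults to 0 as an assumption; check the result against a real file's header CRC before relying on it.

```python
def reflect32(value: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value:032b}"[::-1], 2)


def make_table(reflected_poly: int) -> list:
    """Build the 256-entry lookup table for a reflected CRC-32."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ reflected_poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


# Reflected form of the Koopman polynomial 0x741B8CD7.
KOOPMAN_REFLECTED = reflect32(0x741B8CD7)
KOOPMAN_TABLE = make_table(KOOPMAN_REFLECTED)


def crc32_reflected(data: bytes, table=KOOPMAN_TABLE,
                    init: int = 0xFFFFFFFF, xor_out: int = 0) -> int:
    """Reflected CRC-32 over `data`, starting from `init`."""
    crc = init
    for b in data:
        crc = (crc >> 8) ^ table[(crc ^ b) & 0xFF]
    # xor_out = 0 is an assumption for MEF; see the note above.
    return crc ^ xor_out
```

Under these assumptions the header CRC of a file held in `file_bytes` would be `crc32_reflected(file_bytes[4:1024])` and the body CRC `crc32_reflected(file_bytes[1024:])`.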
Metadata file (.tmet): exactly 16 384 B
[ 0..1023 ] universal header
[ 1024..2559 ] section 1 (1536 B) — never encrypted
[ 2560..13311] section 2 (10752 B) — encrypted with the LEVEL-1 key
[13312..16383] section 3 (3072 B) — encrypted with the LEVEL-2 key
Section 1 (offsets relative to the section):
| Offset | Type | Field |
|---|---|---|
| 0 | si1 | section-2 encryption level (+1 encrypted / −1 decrypted-on-disk) |
| 1 | si1 | section-3 encryption level (+2 / −2) |
A positive level means the section bytes are AES-128-ECB ciphertext; a
negative level means the same content is stored decrypted. Unencrypted files
carry −1/−2 (bytes 0xFF/0xFE — beware tools that read this byte as
unsigned).
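The signed-byte pitfall is easy to avoid with `struct`'s `b` format. The sketch below slices a `.tmet` into its sections and reads the two encryption levels; the constant and function names are illustrative only.

```python
import struct

TMET_BYTES = 16384
SECTION_1 = slice(1024, 2560)    # never encrypted
SECTION_2 = slice(2560, 13312)   # level-1 key
SECTION_3 = slice(13312, 16384)  # level-2 key


def read_encryption_levels(tmet: bytes) -> tuple:
    """Return (section-2 level, section-3 level) as signed integers."""
    if len(tmet) != TMET_BYTES:
        raise ValueError(f"expected {TMET_BYTES} bytes, got {len(tmet)}")
    # "b" is a signed 8-bit integer, so 0xFF reads as -1 and 0xFE as -2.
    return struct.unpack_from("<bb", tmet, SECTION_1.start)


def section_is_ciphertext(level: int) -> bool:
    """Positive level: AES ciphertext on disk; negative: stored decrypted."""
    return level > 0
```

Indexing the raw bytes instead (`tmet[1024]`) would yield 255 for an unencrypted file, which is exactly the unsigned misreading the note warns about.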
Section 2 (time-series flavor; the technical metadata):
| Offset | Type | Field |
|---|---|---|
| 0 / 2048 | char[2048] | channel / session description |
| 4096 | si8 | recording duration (µs, spans gaps) |
| 4104 | char[2048] | reference description |
| 6152 | si8 | acquisition channel number |
| 6160 | sf8 | sampling frequency (Hz) |
| 6168–6192 | sf8 ×4 | LFF / HFF / notch filter, AC line frequency |
| 6200 | sf8 | units conversion factor (physical units per count) |
| 6208 | char[128] | units description (e.g. uV) |
| 6336 / 6344 | sf8 | maximum / minimum native sample value |
| 6352 | si8 | start sample (channel-wide index of this segment's first sample) |
| 6360 | si8 | number of samples (stored samples; gaps are not counted) |
| 6368 | si8 | number of blocks |
| 6376 | si8 | maximum block bytes |
| 6384 / 6388 | ui4 | maximum block samples / maximum difference bytes |
| 6392 | si8 | block interval (µs) |
| 6400 | si8 | number of discontinuities |
| 6408–6424 | si8 ×3 | maximum contiguous blocks / block bytes / samples |
Note the two easy-to-confuse quantities: number_of_samples counts samples
physically stored (NaN gaps are skipped at write time), while the
start/end times and recording_duration span the gaps. A gridded read
(Reader.read) therefore usually returns more samples than
number_of_samples, with NaN filling the gaps.
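A few of the section-2 fields can be pulled out with fixed offsets, as in the sketch below. It assumes the section is stored decrypted (level −1) and uses names invented for this example; it is not how mef3io itself is structured.

```python
import struct

TMET_BYTES = 16384
SECTION_2_START = 2560  # file offset of section 2 inside a .tmet

# (name, offset within section 2, struct format)
SECTION_2_FIELDS = (
    ("recording_duration", 4096, "<q"),
    ("sampling_frequency", 6160, "<d"),
    ("units_conversion_factor", 6200, "<d"),
    ("start_sample", 6352, "<q"),
    ("number_of_samples", 6360, "<q"),
    ("number_of_blocks", 6368, "<q"),
    ("number_of_discontinuities", 6400, "<q"),
)
UNITS_OFFSET, UNITS_BYTES = 6208, 128


def parse_section_2(tmet: bytes) -> dict:
    """Read selected section-2 fields from a decrypted-on-disk .tmet."""
    if len(tmet) != TMET_BYTES:
        raise ValueError(f"expected {TMET_BYTES} bytes, got {len(tmet)}")
    info = {}
    for name, offset, fmt in SECTION_2_FIELDS:
        (info[name],) = struct.unpack_from(fmt, tmet, SECTION_2_START + offset)
    start = SECTION_2_START + UNITS_OFFSET
    units = tmet[start:start + UNITS_BYTES]
    info["units_description"] = units.split(b"\x00", 1)[0].decode("utf-8")
    return info
```

Comparing `recording_duration * sampling_frequency / 1e6` with `number_of_samples` gives a rough idea of how much of a segment is gap, since the duration spans gaps and the sample count does not.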
Section 3 (the sensitive, level-2 part):
| Offset | Type | Field |
|---|---|---|
| 0 | si8 | recording time offset (rto, used to de-negate times) |
| 8 / 16 | si8 | DST start / end time |
| 24 | si4 | GMT offset (seconds) |
| 28 / 156 | char[128] | subject name 1 / 2 |
| 284 | char[128] | subject ID |
| 412 | char[512] | recording location |
Block index (.tidx): 1024 B UH + one 56 B entry per RED block
| Offset | Type | Field |
|---|---|---|
| 0 | si8 | file offset of the block in the .tdat — file-relative, so the first block is at 1024, not 0 |
| 8 | si8 | block start time (negated uUTC) |
| 16 | si8 | start sample (channel-wide index) |
| 24 | ui4 | number of samples in the block |
| 28 | ui4 | block bytes (header + payload, padded) |
| 32 / 36 | si4 | maximum / minimum sample value in the block |
| 44 | ui1 | RED flags (bit 0 = discontinuity) |
The index is what makes windowed reads cheap: a reader can binary-search the
time range and fetch only the needed byte range from the .tdat.
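The lookup described above might look like the following sketch. It assumes index entries are sorted by start time and that blocks are contiguous in the `.tdat`; the function names are hypothetical and do not reflect mef3io's internals.

```python
import bisect
import struct

UH_BYTES = 1024
# offset, stored time, start sample, n samples, block bytes, max, min,
# 4 skipped bytes, RED flags, 11 skipped bytes: 56 bytes in total.
TIDX_ENTRY = struct.Struct("<qqqIIii4xB11x")
assert TIDX_ENTRY.size == 56


def parse_tidx(data: bytes, rto: int = 0) -> list:
    """Decode every index entry that follows the universal header."""
    entries = []
    for off, t, first, n, nbytes, vmax, vmin, flags in TIDX_ENTRY.iter_unpack(
        data[UH_BYTES:]
    ):
        entries.append({
            "file_offset": off,
            "start_time": t if t >= 0 else -t + rto,  # undo negation
            "start_sample": first,
            "number_of_samples": n,
            "block_bytes": nbytes,
            "maximum": vmax,
            "minimum": vmin,
            "discontinuity": bool(flags & 0x01),
        })
    return entries


def blocks_for_window(entries: list, t0: int, t1: int) -> list:
    """Blocks that may hold samples in the half-open window [t0, t1)."""
    starts = [e["start_time"] for e in entries]
    # The block starting at or before t0 may still cover t0.
    first = max(bisect.bisect_right(starts, t0) - 1, 0)
    last = bisect.bisect_left(starts, t1)
    return entries[first:last]


def byte_range(blocks: list):
    """(start, end) byte range in the .tdat, or None if nothing matches."""
    if not blocks:
        return None
    return blocks[0]["file_offset"], blocks[-1]["file_offset"] + blocks[-1]["block_bytes"]
```

A reader would then seek to the first value of `byte_range(...)` in the `.tdat` and read up to the second, decoding only those blocks.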
Data file (.tdat): 1024 B UH + concatenated RED blocks
Samples are stored as int32 counts, compressed per block with RED (Range Encoded Differences): difference coding followed by an adaptive range coder. Each block starts with a 304-byte header:
| Offset | Type | Field |
|---|---|---|
| 0 | ui4 | block CRC |
| 4 | ui1 | flags (bit 0 = discontinuity; bits 1/2 = level-1/2 encrypted) |
| 16 / 20 | sf4 | detrend slope / intercept (unused when written lossless) |
| 24 | sf4 | scale factor (1.0 when lossless) |
| 28 | ui4 | difference bytes |
| 32 | ui4 | number of samples |
| 36 | ui4 | block bytes |
| 40 | si8 | block start time (negated uUTC) |
| 48 | ui1[256] | symbol statistics table for the range coder |
The compressed payload follows at offset 304; blocks are padded with 0x7e
to an 8-byte boundary. A set discontinuity flag means "this block does not
continue seamlessly from the previous one" — that is how gaps (NaN runs in
the original signal) are represented; nothing is stored for the gap itself.
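For orientation, the block header can be unpacked without touching the range coder. This sketch only reads the 304-byte header and interprets the flag bits listed in the table; the names are illustrative, and the start time is returned as stored (still negated).

```python
import struct

# CRC, flags, 11 skipped bytes, slope, intercept, scale, difference bytes,
# number of samples, block bytes, stored start time, statistics table.
RED_HEADER = struct.Struct("<IB11xfffIIIq256s")
assert RED_HEADER.size == 304


def parse_red_header(block: bytes) -> dict:
    """Unpack the fixed header at the start of one RED block."""
    (crc, flags, slope, intercept, scale, diff_bytes,
     n_samples, block_bytes, stored_time, stats) = RED_HEADER.unpack_from(block, 0)
    return {
        "block_crc": crc,
        "discontinuity": bool(flags & 0x01),      # bit 0
        "level_1_encrypted": bool(flags & 0x02),  # bit 1
        "level_2_encrypted": bool(flags & 0x04),  # bit 2
        "detrend_slope": slope,
        "detrend_intercept": intercept,
        "scale_factor": scale,
        "difference_bytes": diff_bytes,
        "number_of_samples": n_samples,
        "block_bytes": block_bytes,
        "stored_start_time": stored_time,
        "statistics": stats,
    }


def padded_length(n: int) -> int:
    """Round a byte count up to the next 8-byte boundary."""
    return (n + 7) & ~7
```

A writer producing a header plus a 1-byte payload (305 bytes) would pad with 0x7e up to `padded_length(305)`, which is 312.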
RED blocks are written unencrypted even in encrypted sessions (meflib
default): the passwords protect metadata and records, and without section 2 a
reader has no fs/ufact/counts to interpret the samples with.
To recover physical units: value = stored_int32 * units_conversion_factor.
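A minimal sketch of that conversion, operating on already-decoded int32 counts. It also maps the RED NaN sentinel (INT32_MIN, listed under the sentinels below) to a floating-point NaN; whether a given file actually contains the sentinel is not guaranteed, and the function name is made up for this example.

```python
import math

RED_NAN = -(2**31)  # INT32_MIN marks a NaN sample


def to_physical(counts, units_conversion_factor: float) -> list:
    """Scale stored int32 counts to physical units, preserving NaN."""
    return [
        math.nan if c == RED_NAN else c * units_conversion_factor
        for c in counts
    ]
```

With a conversion factor of 0.5, the counts `[0, 10]` become `[0.0, 5.0]` in the channel's units.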
Records (.rdat + .ridx): annotations
.rdat holds the records; .ridx is a parallel index. Both start with the
universal header. Record header (24 B) in .rdat:
| Offset | Type | Field |
|---|---|---|
| 0 | ui4 | record CRC |
| 4 | char[4] | type (Note, EDFA, SyLg, Seiz, …) |
| 9 / 10 | ui1 | version major / minor |
| 11 | si1 | encryption level of the body |
| 12 | ui4 | body bytes |
| 16 | si8 | record time (negated uUTC) |
The body follows, padded with 0x7e to a 16-byte multiple (AES block size);
in encrypted sessions record bodies are level-2 encrypted. Body layouts:
Note = text; EDFA = si8 duration + text; Seiz = si8 earliest onset,
si8 latest offset, si8 duration. Each .ridx entry (24 B): type[4] @ 0,
version @ 5/6, encryption @ 7, file offset (file-relative) @ 8, time @ 16.
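Both 24-byte layouts can be expressed as `struct` formats, as sketched below. The names are illustrative, times are returned as stored (still negated), and decrypting or interpreting the record body is left out.

```python
import struct

UH_BYTES = 1024
# CRC, type[4], 1 skipped byte, version major/minor, encryption level,
# body bytes, stored record time.
RECORD_HEADER = struct.Struct("<I4sxBBbIq")
# type[4], 1 skipped byte, version major/minor, encryption level,
# file offset of the record in the .rdat, stored record time.
RIDX_ENTRY = struct.Struct("<4sxBBbqq")
assert RECORD_HEADER.size == 24 and RIDX_ENTRY.size == 24


def parse_record_header(rdat: bytes, offset: int) -> dict:
    """Unpack the record header at a file-relative offset in the .rdat."""
    crc, rtype, major, minor, level, body_bytes, stored_time = (
        RECORD_HEADER.unpack_from(rdat, offset)
    )
    return {
        "record_crc": crc,
        "type": rtype.decode("ascii"),
        "version": (major, minor),
        "encryption_level": level,
        "body_bytes": body_bytes,
        "stored_time": stored_time,
        "body_offset": offset + RECORD_HEADER.size,
    }


def parse_ridx(ridx: bytes) -> list:
    """Decode every index entry that follows the universal header."""
    return [
        {
            "type": rtype.decode("ascii"),
            "version": (major, minor),
            "encryption_level": level,
            "file_offset": file_offset,
            "stored_time": stored_time,
        }
        for rtype, major, minor, level, file_offset, stored_time
        in RIDX_ENTRY.iter_unpack(ridx[UH_BYTES:])
    ]
```

Because the index offsets are file-relative, each `file_offset` from `parse_ridx` can be passed straight to `parse_record_header`.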
Sentinels and other conventions
| Meaning | Value |
|---|---|
| "no entry" time (si8) | INT64_MIN |
| "no entry" si8 / si4 / ui4 | −1 / −1 / 0xFFFFFFFF |
| RED NaN sample | INT32_MIN |
| GMT offset "no entry" | −86401 |
| pad byte (blocks, record bodies) | 0x7e |
How this maps to the mef3io API
| On-disk concept | API |
|---|---|
| Section 2 + universal header | Reader.info(ch) (fs, ufact, times, counts) |
| Section 3 (subject metadata) | Reader.info(ch) subject_* fields; None without L2 access |
| Segment triples | Reader.segments(ch): time/sample range per segment |
| .tidx entries | Reader.toc(ch): per-block start time, extrema, discontinuity |
| RED blocks | decoded transparently by read / read_raw |
| Records | Reader.records(ch) / Writer.write_annotations(...) |
Related: encryption_model.md (how the two password levels derive keys and what they unlock), legacy_comparison.md (measured differences vs pymef/mef_tools), and design.md (mef3io internals).