The MEF 3.0 on-disk format¶

A practical reference for the parts of MEF 3.0 that mef3io reads and writes (time series + records; video is out of scope). Offsets below are what the code actually uses (core/include/mef3io/types.hpp, core/src/headers.cpp) and are cross-validated against meflib/pymef. All integers are little-endian; type names follow meflib (si8 = int64, ui4 = uint32, sf8 = float64, …). Strings live in fixed-size, null-padded UTF-8 fields.

Directory tree¶

A "file" in MEF is really a directory tree; the suffix of each directory says what it is:

session.mefd/                     # session
├── session.rdat / session.ridx   #   optional session-level records
└── ch1.timd/                     # one time-series channel
    ├── ch1.rdat / ch1.ridx       #   optional channel-level records
    ├── ch1-000000.segd/          # segment 0
    │   ├── ch1-000000.tmet       #   metadata     (fixed 16 384 B)
    │   ├── ch1-000000.tidx       #   block index  (1024 B UH + 56 B/block)
    │   └── ch1-000000.tdat       #   compressed data (1024 B UH + RED blocks)
    └── ch1-000001.segd/          # segment 1, ...

A channel is split into segments (numbered directories); each segment is a self-contained triple of metadata / index / data files. Appending more data either extends the last segment's three files in place or starts a new segment. Use Reader.segments(channel) to see which segment covers which time and sample range, and Reader.toc(channel) for the per-block view.

Times¶

All times are uUTC: microseconds since the Unix epoch, si8. Two format quirks matter:

Times are stored negated. meflib marks "recording-time-offset applied" by storing rto - t (a negative number). Readers recover the true time as t >= 0 ? t : -t + rto, where rto (the recording time offset) lives in metadata section 3. This applies to universal-header start/end times, block start times, and record times alike. mef3io writes rto = 0.
The empty value ("no entry") for a time is INT64_MIN.

Universal header — 1024 B, prefixes every file¶

Every .tmet/.tidx/.tdat/.rdat/.ridx file starts with the same 1024-byte header:

Offset	Type	Field
0	ui4	header CRC (over bytes 4..1023)
4	ui4	body CRC (over bytes 1024..end of file)
8	char[5]	file type string (`tmet`, `tidx`, `tdat`, `rdat`, `ridx`)
13 / 14	ui1	MEF version major / minor (3 / 0)
15	ui1	byte order (1 = little-endian)
16 / 24	si8	start / end time (negated uUTC, see above)
32	si8	number of entries (blocks / index entries / records)
40	si8	maximum entry size (bytes)
48	si4	segment number
52 / 308 / 564	char[256]	channel name / session name / anonymized name
820 / 836 / 852	ui1[16]	level UUID / file UUID / provenance UUID
868 / 884	ui1[16]	level-1 / level-2 password validation fields
900 / 960	ui1[60] / ui1[64]	protected / discretionary regions

CRCs use CRC-32 with the Koopman polynomial (0x741B8CD7, reflected, start value 0xFFFFFFFF). The password validation fields hold the two-level key material (see encryption_model.md); all-zero fields mean the file is unencrypted.

Metadata file (`.tmet`) — exactly 16 384 B¶

[    0..1023 ]  universal header
[ 1024..2559 ]  section 1 (1536 B)  — never encrypted
[ 2560..13311]  section 2 (10752 B) — encrypted with the LEVEL-1 key
[13312..16383]  section 3 (3072 B)  — encrypted with the LEVEL-2 key

Section 1 (offsets relative to the section):

Offset	Type	Field
0	si1	section-2 encryption level (+1 encrypted / −1 decrypted-on-disk)
1	si1	section-3 encryption level (+2 / −2)

A positive level means the section bytes are AES-128-ECB ciphertext; a negative level means the same content is stored decrypted. Unencrypted files carry −1/−2 (bytes 0xFF/0xFE — beware tools that read this byte as unsigned).

Section 2 (time-series flavor; the technical metadata):

Offset	Type	Field
0 / 2048	char[2048]	channel / session description
4096	si8	recording duration (µs, spans gaps)
4104	char[2048]	reference description
6152	si8	acquisition channel number
6160	sf8	sampling frequency (Hz)
6168–6192	sf8 ×4	LFF / HFF / notch filter, AC line frequency
6200	sf8	units conversion factor (physical units per count)
6208	char[128]	units description (e.g. `uV`)
6336 / 6344	sf8	maximum / minimum native sample value
6352	si8	start sample (channel-wide index of this segment's first sample)
6360	si8	number of samples (stored samples; gaps are not counted)
6368	si8	number of blocks
6376	si8	maximum block bytes
6384 / 6388	ui4	maximum block samples / maximum difference bytes
6392	si8	block interval (µs)
6400	si8	number of discontinuities
6408–6424	si8 ×3	maximum contiguous blocks / block bytes / samples

Note the two easy-to-confuse quantities: number_of_samples counts samples physically stored (NaN gaps are skipped at write time), while the start/end times and recording_duration span the gaps. A gridded read (Reader.read) therefore usually returns more samples than number_of_samples, with NaN filling the gaps.

Section 3 (the sensitive, level-2 part):

Offset	Type	Field
0	si8	recording time offset (`rto`, used to de-negate times)
8 / 16	si8	DST start / end time
24	si4	GMT offset (seconds)
28 / 156	char[128]	subject name 1 / 2
284	char[128]	subject ID
412	char[512]	recording location

Block index (`.tidx`) — 1024 B UH + one 56 B entry per RED block¶

Offset	Type	Field
0	si8	file offset of the block in the `.tdat` — file-relative, so the first block is at 1024, not 0
8	si8	block start time (negated uUTC)
16	si8	start sample (channel-wide index)
24	ui4	number of samples in the block
28	ui4	block bytes (header + payload, padded)
32 / 36	si4	maximum / minimum sample value in the block
44	ui1	RED flags (bit 0 = discontinuity)

The index is what makes windowed reads cheap: a reader can binary-search the time range and fetch only the needed byte range from the .tdat.

Data file (`.tdat`) — 1024 B UH + concatenated RED blocks¶

Samples are stored as int32 counts, compressed per block with RED (Range Encoded Differences): difference coding followed by an adaptive range coder. Each block starts with a 304-byte header:

Offset	Type	Field
0	ui4	block CRC
4	ui1	flags (bit 0 = discontinuity; bits 1/2 = level-1/2 encrypted)
16 / 20	sf4	detrend slope / intercept (unused when written lossless)
24	sf4	scale factor (1.0 when lossless)
28	ui4	difference bytes
32	ui4	number of samples
36	ui4	block bytes
40	si8	block start time (negated uUTC)
48	ui1[256]	symbol statistics table for the range coder

The compressed payload follows at offset 304; blocks are padded with 0x7e to an 8-byte boundary. A set discontinuity flag means "this block does not continue seamlessly from the previous one" — that is how gaps (NaN runs in the original signal) are represented; nothing is stored for the gap itself. RED blocks are written unencrypted even in encrypted sessions (meflib default): the passwords protect metadata and records, and without section 2 a reader has no fs/ufact/counts to interpret the samples with.

To recover physical units: value = stored_int32 * units_conversion_factor.

Records (`.rdat` + `.ridx`) — annotations¶

.rdat holds the records; .ridx is a parallel index. Both start with the universal header. Record header (24 B) in .rdat:

Offset	Type	Field
0	ui4	record CRC
4	char[4]	type (`Note`, `EDFA`, `SyLg`, `Seiz`, …)
9 / 10	ui1	version major / minor
11	si1	encryption level of the body
12	ui4	body bytes
16	si8	record time (negated uUTC)

The body follows, padded with 0x7e to a 16-byte multiple (AES block size); in encrypted sessions record bodies are level-2 encrypted. Body layouts: Note = text; EDFA = si8 duration + text; Seiz = si8 earliest onset, si8 latest offset, si8 duration. Each .ridx entry (24 B): type[4] @ 0, version @ 5/6, encryption @ 7, file offset (file-relative) @ 8, time @ 16.

Sentinels and other conventions¶

Meaning	Value
"no entry" time (si8)	`INT64_MIN`
"no entry" si8 / si4 / ui4	−1 / −1 / `0xFFFFFFFF`
RED NaN sample	`INT32_MIN`
GMT offset "no entry"	−86401
pad byte (blocks, record bodies)	`0x7e`

How this maps to the mef3io API¶

On-disk concept	API
Section 2 + universal header	`Reader.info(ch)` (fs, ufact, times, counts)
Section 3 (subject metadata)	`Reader.info(ch)` `subject_*` fields; `None` without L2 access
Segment triples	`Reader.segments(ch)` — time/sample range per segment
`.tidx` entries	`Reader.toc(ch)` — per-block start time, extrema, discontinuity
RED blocks	decoded transparently by `read` / `read_raw`
Records	`Reader.records(ch)` / `Writer.write_annotations(...)`

Related: encryption_model.md (how the two password levels derive keys and what they unlock), legacy_comparison.md (measured differences vs pymef/mef_tools), and design.md (mef3io internals).