v2.0 - published 18th September 2005
www.rclsoftware.org.uk/dendros
Contents:
Introduction
Conventions used in this document
Overview
Components
size_specifier
value
element
document
Sample document
Future versions
Previous versions
Notes
Useful links
The Dendros file format combines XML's simple, self-documenting, tree-like hierarchical structure with the ability to store type-specific binary data.
The main aim of Dendros is to encode binary data structures accurately and efficiently, with low processing overhead.
The name 'Dendros' is derived from the Greek word for 'tree'.
A data tree is an element which may contain other elements.
Individual elements may contain one or more values, each of which contains data items of a specific type (e.g. text, float, integer).
An example of a simple data tree.
When a data tree is serialised (written to disk or copied to the clipboard, etc.) the first sixteen bytes must be the header, followed by a single element which contains zero or more other elements or values.
Composition of a Dendros document:
| document | = | header + element |
| element | = | open_marker + name + zero_or_more(value) or zero_or_more(element) + close_marker |
| name | = | name_size + name_text |
| value | = | value_marker + value_size + value_data |
| name_size | = | size_specifier |
| value_size | = | size_specifier |
Size of fixed length components:
| header | sixteen bytes |
| open_marker | one byte |
| close_marker | one byte |
| value_marker | one byte |
Variable length components:
name_text
value_data
size_specifier
The name_size and value_size are encoded as follows:
| size (bytes) | encoding (bits) |
| 0000 0000 .. 0000 007F | 0xxxxxxx |
| 0000 0080 .. 0000 3FFF | 1xxxxxxx 0xxxxxxx |
| 0000 4000 .. 001F FFFF | 1xxxxxxx 1xxxxxxx 0xxxxxxx |
| 0020 0000 .. 0FFF FFFF | 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx |
| ... | ... |
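The encoding is similar to UTF-8 in that the top bit of each byte marks continuation: it is set on every byte except the last. Each x represents one bit of a big-endian unsigned integer of variable length, carried seven bits per byte.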
For example:
A size of 7F bytes is represented by the byte sequence 7F.
A size of 80 bytes is represented by the byte sequence 81 00.
A size of FF bytes is represented by the byte sequence 81 7F.
Leading bytes:
The size_specifier may be padded with so-called leading bytes: one or more bytes with the value 80 placed at the beginning of the byte sequence. They contribute only zero bits, so the encoded size is unchanged.
For example:
A size of 7F bytes can also be represented by the byte sequence 80 80 7F.
A size of 80 bytes can also be represented by the byte sequence 80 81 00.
A size of FF bytes can also be represented by the byte sequence 80 81 7F.
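The size_specifier rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `encode_size` and `decode_size` and the `pad` parameter are hypothetical and do not appear in the format definition.

```python
from typing import Tuple


def encode_size(size: int, pad: int = 0) -> bytes:
    """Encode a size as a size_specifier (seven bits per byte, big endian).

    `pad` is a hypothetical convenience: it prepends that many 80 leading bytes.
    """
    if size < 0:
        raise ValueError("size must not be negative")
    groups = [size & 0x7F]                    # final byte: top bit clear
    size >>= 7
    while size:
        groups.append(0x80 | (size & 0x7F))   # earlier bytes: top bit set
        size >>= 7
    return bytes([0x80] * pad) + bytes(reversed(groups))


def decode_size(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (size, position of the first byte after the size_specifier).

    Raises IndexError if the data ends before the final byte is seen.
    """
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if byte < 0x80:                       # top bit clear: last byte
            return size, pos


print(encode_size(0xFF).hex())                # 817f
print(decode_size(bytes.fromhex("80817F")))   # (255, 3)
```

Because leading 80 bytes only shift in zero bits, the decoder needs no special case for them.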
The values are encoded as follows:
value = value_marker + value_size + value_data
value_marker:
The top bit must always be set.
The other seven bits specify the data type of the data item(s) in value_data.
In v2.0, the following built-in types typically found in programming languages are supported:
| value_marker | data item size | data item type |
| 81 | one byte | 1-bit boolean (0 = false, 1 = true) |
| 82 | one byte | 8-bit unsigned integer |
| 83 | one byte | 8-bit signed integer (2s complement) |
| 84 | two bytes | 16-bit unsigned integer (little endian) |
| 85 | two bytes | 16-bit signed integer (little endian, 2s complement) |
| 86 | four bytes | 32-bit unsigned integer (little endian) |
| 87 | four bytes | 32-bit signed integer (little endian, 2s complement) |
| 88 | eight bytes | 64-bit unsigned integer (little endian) |
| 89 | eight bytes | 64-bit signed integer (little endian, 2s complement) |
| 8A | four bytes | Single precision floating point number (little endian, normalised IEEE 754) |
| 8B | eight bytes | Double precision floating point number (little endian, normalised IEEE 754) |
| 8C | two bytes | UTF-16LE encoded character (must not be zero) |
Other value_markers must not be used in v2.0 Dendros documents.
In future revisions of the format, further data types (e.g. 96-bit floating point numbers) may be supported.
When parsing a Dendros document with a higher minor version number (e.g. v2.1), elements with other value_markers must be skipped.
value_size:
Specifies the size of value_data in bytes.
value_data:
Contains the data items. The data items are stored sequentially, with no padding.
Examples of value encodings:
8B 08 00 00 00 00 00 00 14 40
An IEEE 64-bit double precision floating point number, representing the number five.
85 06 01 00 02 00 03 00
Three 16-bit signed integers, representing the numbers 1 to 3.
8C 08 B1 03 B2 03 B3 03 B4 03
A text string consisting of the four Greek letters 'alpha', 'beta', 'gamma', and 'delta'.
8C 00
An empty text string.
82 0A 11 22 33 44 55 66 77 88 99 AA
Ten bytes of raw data.
81 03 01 00 01
Three boolean values: true, false, true.
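The example encodings above can be checked with a short decoder. The following is a sketch only, using Python's `struct` module; the names `ITEM_FORMATS` and `decode_value` are hypothetical, and returning a `str` for type 8C and a `list` for all other types is a design choice of this sketch rather than something the format prescribes.

```python
import struct

# value_marker -> (struct format character, data item size in bytes),
# taken from the v2.0 type table. Multi-byte items are little endian.
ITEM_FORMATS = {
    0x81: ("?", 1), 0x82: ("B", 1), 0x83: ("b", 1),
    0x84: ("H", 2), 0x85: ("h", 2), 0x86: ("I", 4), 0x87: ("i", 4),
    0x88: ("Q", 8), 0x89: ("q", 8), 0x8A: ("f", 4), 0x8B: ("d", 8),
}


def decode_value(data: bytes, pos: int = 0):
    """Decode one value; return (items, position after the value)."""
    marker = data[pos]
    pos += 1
    # Read value_size (a size_specifier: seven bits per byte, big endian).
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if byte < 0x80:
            break
    payload = data[pos:pos + size]
    if len(payload) != size:
        raise ValueError("value_data is truncated")
    pos += size
    if marker == 0x8C:                        # UTF-16LE text
        return payload.decode("utf-16-le"), pos
    if marker not in ITEM_FORMATS:
        raise ValueError("value_marker %02X is not defined in v2.0" % marker)
    fmt, item_size = ITEM_FORMATS[marker]
    if size % item_size:
        raise ValueError("value_size is not a multiple of the item size")
    count = size // item_size
    return list(struct.unpack("<%d%s" % (count, fmt), payload)), pos


print(decode_value(bytes.fromhex("8B 08 00 00 00 00 00 00 14 40")))  # ([5.0], 10)
print(decode_value(bytes.fromhex("85 06 01 00 02 00 03 00")))        # ([1, 2, 3], 8)
```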
open_marker
7B
close_marker
7D
name
The name serves the same purpose in Dendros as the element name serves in XML.
The component name_text is an array of valid printable Unicode characters encoded in UTF-16LE. It must not contain any null terminators or control characters.
The component name_size specifies the size of name_text in bytes.
In future revisions of the format, special elements whose name_size is 00 may be introduced to implement namespaces and other constructs.
To accommodate the use of namespaces in future versions, Dendros v2.0 documents must not contain zero length names, and the colon character (3A) must not be used in name_text.
When parsing Dendros documents with higher minor version numbers (e.g. v2.1), elements with zero length names or whose name_text contains the colon character (3A) must be skipped.
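As an illustration of the name rules for a v2.0 writer, the sketch below encodes a name and wraps a body in the open_marker and close_marker. The function names are hypothetical, and the check for Unicode category Cc (control characters) is only an approximation of "valid printable Unicode characters".

```python
import unicodedata

OPEN_MARKER = b"\x7B"
CLOSE_MARKER = b"\x7D"


def encode_size(size: int) -> bytes:
    """Encode a size_specifier without leading bytes."""
    groups = [size & 0x7F]
    size >>= 7
    while size:
        groups.append(0x80 | (size & 0x7F))
        size >>= 7
    return bytes(reversed(groups))


def encode_name(text: str) -> bytes:
    """Return name_size + name_text, enforcing the v2.0 restrictions."""
    if not text:
        raise ValueError("v2.0 names must not be zero length")
    if ":" in text:
        raise ValueError("the colon character (3A) is reserved in v2.0")
    # Assumption: rejecting category Cc covers nulls and control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        raise ValueError("name_text must not contain control characters")
    name_text = text.encode("utf-16-le")
    return encode_size(len(name_text)) + name_text


def encode_element(name: str, body: bytes = b"") -> bytes:
    """Wrap already encoded values or child elements in an element."""
    return OPEN_MARKER + encode_name(name) + body + CLOSE_MARKER


print(encode_element("dim").hex(" "))   # 7b 06 64 00 69 00 6d 00 7d
```

Note that name_size counts bytes, not characters, so a three character name such as "dim" has a name_size of 06.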
header
CE BE CF 85 CE BB CE BF CE BD 02 00 0D 0A FF 0A
The first ten bytes of the header are the Greek characters xi, upsilon, lambda, omicron and nu in UTF-8 encoding, spelling the Ancient Greek word 'xylon'.
The next two bytes specify first the major and then the minor version of the format, currently '2.0'.
In the final four bytes, the sequences 0D 0A and 0A enable the detection of newline character alteration, while FF, a byte that never occurs in valid UTF-8, signals that this is not a UTF-8 file.
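A reader might check the header along the following lines. This is a sketch under stated assumptions: the names `MAGIC`, `GUARD`, `make_header` and `parse_header` are hypothetical, and how a reader should react to an unexpected major version is left open here.

```python
from typing import Tuple

MAGIC = "\u03be\u03c5\u03bb\u03bf\u03bd".encode("utf-8")   # xi upsilon lambda omicron nu
GUARD = bytes.fromhex("0D0AFF0A")


def make_header(major: int = 2, minor: int = 0) -> bytes:
    """Build the sixteen byte header for the given format version."""
    return MAGIC + bytes([major, minor]) + GUARD


def parse_header(data: bytes) -> Tuple[int, int]:
    """Validate the header and return (major, minor)."""
    if len(data) < 16:
        raise ValueError("too short to contain a Dendros header")
    if data[:10] != MAGIC:
        raise ValueError("not a Dendros document")
    if data[12:16] != GUARD:
        # Typical cause: a text-mode transfer rewrote 0D 0A or 0A.
        raise ValueError("header guard bytes were altered")
    return data[10], data[11]


print(make_header().hex(" "))
print(parse_header(make_header()))   # (2, 0)
```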
Document validation
The header must be followed by 7B, the open_marker of the root element.
The last byte of a document must be 7D, the close_marker of the root element.
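These two checks, together with a walk over the nested elements, can be sketched as follows. The function names and the returned tuple layout are hypothetical. The sketch accepts values and child elements mixed within one element, which is an assumption, and it does not interpret the version bytes or decode value_data.

```python
MAGIC = bytes.fromhex("CEBECF85CEBBCEBFCEBD")
GUARD = bytes.fromhex("0D0AFF0A")


def read_size(data: bytes, pos: int):
    """Decode a size_specifier; return (size, next position)."""
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if byte < 0x80:
            return size, pos


def read_element(data: bytes, pos: int):
    """Parse the element at pos; return ((name, children), next position).

    Child elements appear as (name, children) tuples and values as
    (value_marker, value_data) tuples.
    """
    if data[pos] != 0x7B:
        raise ValueError("expected open_marker 7B at offset %d" % pos)
    size, pos = read_size(data, pos + 1)
    name = data[pos:pos + size].decode("utf-16-le")
    pos += size
    children = []
    while True:
        byte = data[pos]
        if byte == 0x7D:                      # close_marker
            return (name, children), pos + 1
        if byte == 0x7B:                      # nested element
            child, pos = read_element(data, pos)
            children.append(child)
        elif byte & 0x80:                     # value_marker: top bit set
            size, start = read_size(data, pos + 1)
            children.append((byte, data[start:start + size]))
            pos = start + size
        else:
            raise ValueError("unexpected byte %02X at offset %d" % (byte, pos))


def read_document(data: bytes):
    """Validate the document framing and return the root element."""
    if len(data) < 18 or data[:10] != MAGIC or data[12:16] != GUARD:
        raise ValueError("missing or damaged header")
    if data[16] != 0x7B or data[-1] != 0x7D:
        raise ValueError("root element markers are missing")
    try:
        root, end = read_element(data, 16)
    except IndexError:
        raise ValueError("document is truncated")
    if end != len(data):
        raise ValueError("unexpected bytes after the root element")
    return root
```

Since 7B and 7D have the top bit clear and every value_marker has it set, a single byte is enough to decide what follows inside an element.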
Consider this simple XML text document:
<?xml version="1.0" ?>
<image>
  <dim>
    <w>2</w>
    <h>3</h>
  </dim>
  <data>
    11 11 11 12 12 12
    21 21 21 22 22 22
    31 31 31 32 32 32
  </data>
</image>
150 bytes
An equivalent binary Dendros document would look like this:
87 bytes
Here is the same binary document in a more readable layout:
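As a rough sketch of how such a document could be assembled in code, the following builds a Dendros counterpart of the XML sample. Several choices here are assumptions rather than facts from the text: w and h are stored as 16-bit unsigned integers (84), the pixel data is read as eighteen hexadecimal bytes stored as 8-bit unsigned integers (82), and the helper names are hypothetical. Under these assumptions the result is 87 bytes long, which agrees with the size quoted above, but the byte layout is not guaranteed to match the original listing.

```python
import struct

HEADER = bytes.fromhex("CEBECF85CEBBCEBFCEBD02000D0AFF0A")


def size_specifier(size: int) -> bytes:
    """Encode a size, seven bits per byte, big endian, no leading bytes."""
    groups = [size & 0x7F]
    size >>= 7
    while size:
        groups.append(0x80 | (size & 0x7F))
        size >>= 7
    return bytes(reversed(groups))


def value(marker: int, payload: bytes) -> bytes:
    """value = value_marker + value_size + value_data"""
    return bytes([marker]) + size_specifier(len(payload)) + payload


def element(name: str, *parts: bytes) -> bytes:
    """element = open_marker + name + contents + close_marker"""
    name_text = name.encode("utf-16-le")
    return (b"\x7B" + size_specifier(len(name_text)) + name_text
            + b"".join(parts) + b"\x7D")


# Assumption: the XML sample lists hexadecimal byte values.
pixels = bytes.fromhex("111111121212 212121222222 313131323232")

document = HEADER + element(
    "image",
    element(
        "dim",
        element("w", value(0x84, struct.pack("<H", 2))),   # assumed 16-bit
        element("h", value(0x84, struct.pack("<H", 3))),   # assumed 16-bit
    ),
    element("data", value(0x82, pixels)),
)

print(len(document))        # 87 under the assumptions above
print(document.hex(" "))
```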
Dendros v1.0 (16-MAR-2004)
Dendros v1.1 (11-JAN-2005)
The main differences between v1 and v2 are as follows: