v2.0 - published 18th September 2005

 

Dendros File Format v2.0

www.rclsoftware.org.uk/dendros


Contents:

Introduction
Conventions used in this document
Overview
Components
   size_specifier
   value
   element
   document
Sample document
Future versions
Previous versions
Notes
Useful links

 


Introduction

The Dendros file format combines XML's simple, self-documenting, tree-like hierarchical structure with the ability to store type-specific binary data.
The main aim of Dendros is that binary data structures can be encoded accurately and efficiently, with a low processing overhead.

The name 'Dendros' is derived from the Greek word for 'tree'.

 


Conventions used in this document

 


Overview

A data tree is an element which may contain other elements.
Individual elements may contain one or more values, each of which holds data items of a specific type (e.g. text, float, integer).

[Figure: Data Tree. An example of a simple data tree.]

 


Components

When a data tree is serialised (written to disk or copied to the clipboard, etc.) the first sixteen bytes must be the header, followed by a single element which contains zero or more other elements or values.

 

Composition of a Dendros document:

document   = header + element
element    = open_marker + name + (zero_or_more(value) or zero_or_more(element)) + close_marker
name       = name_size + name_text
value      = value_marker + value_size + value_data
name_size  = size_specifier
value_size = size_specifier

Size of fixed length components:

header sixteen bytes
open_marker one byte
close_marker one byte
value_marker one byte

Variable length components:
 name_text
 value_data
 size_specifier

 


size_specifier

The name_size and value_size are encoded as follows:

size (bytes)               encoding (bits)
0000 0000 .. 0000 007F     0xxxxxxx
0000 0080 .. 0000 3FFF     1xxxxxxx 0xxxxxxx
0000 4000 .. 001F FFFF     1xxxxxxx 1xxxxxxx 0xxxxxxx
0020 0000 .. 0FFF FFFF     1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
...                        ...

The encoding is similar to UTF-8; each x represents a 'bit' in a big-endian unsigned integer of variable length.

For example:
A size of 7F bytes is represented by the byte sequence 7F.
A size of 80 bytes is represented by the byte sequence 81 00.
A size of FF bytes is represented by the byte sequence 81 7F.


Leading bytes:

The size_specifier may be padded with so-called leading bytes: one or more bytes with the value 80 placed at the beginning of the byte sequence. Leading bytes do not change the encoded size.

For example:
A size of 7F bytes can also be represented by the byte sequence 80 80 7F.
A size of 80 bytes can also be represented by the byte sequence 80 81 00.
A size of FF bytes can also be represented by the byte sequence 80 81 7F.
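
The size_specifier rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function names encode_size and decode_size and the leading parameter are hypothetical and do not appear in the specification.

```python
from typing import Tuple


def encode_size(size: int, leading: int = 0) -> bytes:
    """Encode a non-negative size as a size_specifier.

    `leading` (a hypothetical parameter) is the number of optional
    80 padding bytes to put in front of the encoded size.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    groups = [size & 0x7F]                    # final byte: top bit clear
    size >>= 7
    while size:
        groups.append(0x80 | (size & 0x7F))   # earlier bytes: top bit set
        size >>= 7
    return bytes([0x80] * leading) + bytes(reversed(groups))


def decode_size(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a size_specifier that starts at data[pos].

    Returns (size, index of the first byte after the specifier).
    Leading 80 bytes contribute seven zero bits each, so they are
    absorbed without special handling.
    """
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if byte < 0x80:                       # top bit clear ends the sequence
            return size, pos
```

For instance, encode_size(0x80) yields the byte sequence 81 00, and decode_size applied to 80 81 00 recovers the size 80 after consuming three bytes.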


value

The values are encoded as follows:

value = value_marker + value_size + value_data


value_marker:
The top bit must always be set.

The other seven bits specify the data type of the data item(s) in value_data.

In v2.0, the following built-in types typically found in programming languages are supported:

value_marker   data item size   data item type
81             one byte         1-bit boolean (0 = false, 1 = true)
82             one byte         8-bit unsigned integer
83             one byte         8-bit signed integer (2s complement)
84             two bytes        16-bit unsigned integer (little endian)
85             two bytes        16-bit signed integer (little endian, 2s complement)
86             four bytes       32-bit unsigned integer (little endian)
87             four bytes       32-bit signed integer (little endian, 2s complement)
88             eight bytes      64-bit unsigned integer (little endian)
89             eight bytes      64-bit signed integer (little endian, 2s complement)
8A             four bytes       Single precision floating point number (little endian, normalised IEEE 754)
8B             eight bytes      Double precision floating point number (little endian, normalised IEEE 754)
8C             two bytes        UTF-16LE encoded character (must not be zero)

Other value_markers must not be used in v2.0 Dendros documents.


Note - future versions:

In future revisions of the format, further data types (e.g. 96-bit floating point numbers) may be supported.
When parsing a Dendros document with a higher minor version number (e.g. v2.1), elements with other value_markers must be skipped.


value_size:
Specifies the size of value_data in bytes.


value_data:
Contains the data items. The data items are stored sequentially, with no padding.


Examples of value encodings:

8B 08 00 00 00 00 00 00 14 40
An IEEE 64-bit double precision floating point number, representing the number five.

85 06 01 00 02 00 03 00
Three 16-bit signed integers, representing the numbers 1 to 3.

8C 08 B1 03 B2 03 B3 03 B4 03
A text string consisting of the four Greek letters 'alpha', 'beta', 'gamma', and 'delta'.

8C 00
An empty text string.

82 0A 11 22 33 44 55 66 77 88 99 AA
Ten bytes of raw data.

81 03 01 00 01
Three boolean values: true, false, true.
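
The value encodings above could be decoded along the following lines. This is a sketch under stated assumptions: the name decode_value and the ITEM_CODES table are hypothetical, values with marker 8C are returned as a Python string, and all other values are returned as a list of data items.

```python
import struct
from typing import Tuple

# value_marker -> struct format code for a single data item.
# These codes are combined with "<" (little endian, no padding) below.
ITEM_CODES = {
    0x81: "?", 0x82: "B", 0x83: "b", 0x84: "H", 0x85: "h", 0x86: "I",
    0x87: "i", 0x88: "Q", 0x89: "q", 0x8A: "f", 0x8B: "d",
}


def decode_value(data: bytes, pos: int = 0) -> Tuple[object, int]:
    """Decode one value starting at data[pos].

    Returns (decoded items, index of the first byte after the value).
    """
    marker = data[pos]
    pos += 1
    # value_size is a size_specifier: seven bits per byte, and a clear
    # top bit marks the final byte.
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if byte < 0x80:
            break
    payload = data[pos:pos + size]
    if len(payload) != size:
        raise ValueError("value_data is truncated")
    if marker == 0x8C:                        # UTF-16LE text
        return payload.decode("utf-16-le"), pos + size
    if marker not in ITEM_CODES:
        raise ValueError("value_marker %02X is not defined in v2.0" % marker)
    code = ITEM_CODES[marker]
    count, remainder = divmod(size, struct.calcsize("<" + code))
    if remainder:
        raise ValueError("value_size is not a multiple of the item size")
    return list(struct.unpack("<%d%s" % (count, code), payload)), pos + size
```

Applied to the byte sequence 85 06 01 00 02 00 03 00, this sketch returns the list [1, 2, 3] together with the position 8.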

 


element


open_marker

7B


close_marker

7D


name

The name serves the same purpose in Dendros as the element name serves in XML.

The component name_text is an array of valid printable Unicode characters encoded in UTF-16LE and must not contain any null terminators or control characters.
The component name_size specifies the size of name_text in bytes.


Note - future versions:

In future revisions of the format, special elements whose name_size is 00 may be introduced to implement namespaces and other constructs. To accommodate the use of namespaces in future versions, Dendros v2.0 documents must not contain zero length names, and the colon character (3A) must not be used in name_text.
When parsing Dendros documents with higher minor version numbers (e.g. v2.1), elements with zero length names or whose name_text contains the colon character (3A) must be skipped.
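
As an illustration of the name rules above, a writer for v2.0 documents might build the name component roughly as follows. The function names are hypothetical, and the use of Python's str.isprintable() to approximate "valid printable Unicode characters" is an assumption of this sketch, not something the specification prescribes.

```python
def encode_size(size: int) -> bytes:
    """Encode a size as a size_specifier without leading bytes."""
    groups = [size & 0x7F]                    # final byte: top bit clear
    size >>= 7
    while size:
        groups.append(0x80 | (size & 0x7F))   # earlier bytes: top bit set
        size >>= 7
    return bytes(reversed(groups))


def encode_name(name: str) -> bytes:
    """Return name_size + name_text for a v2.0 element name."""
    if not name:
        raise ValueError("zero length names are reserved for future versions")
    if ":" in name:
        raise ValueError("the colon character is reserved for future versions")
    if not name.isprintable():                # assumption: approximates the rule
        raise ValueError("name contains a control or non-printable character")
    name_text = name.encode("utf-16-le")
    return encode_size(len(name_text)) + name_text
```

With this sketch, encode_name("image") produces 0A 69 00 6D 00 61 00 67 00 65 00, which matches the name bytes of the root element in the sample document.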

 


document

header
CE BE CF 85 CE BB CE BF CE BD 02 00 0D 0A FF 0A

The first ten bytes of the header are the Greek characters xi, upsilon, lambda, omicron and nu in UTF-8 encoding, spelling the Ancient Greek word 'xylon'.
The next two bytes specify first the major and then the minor version of the format, currently '2.0'.
In the final four bytes, 0D 0A and 0A enable the detection of newline character alteration, while FF signals that this is not a UTF-8 file.

 

Document validation

The header must be followed by 7B, the open_marker of the root element.
The last byte of a document must be 7D, the close_marker of the root element.
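
The header layout and the two validation rules can be checked with a few comparisons. The sketch below is illustrative only; the names MAGIC, TRAILER and check_document are hypothetical, and the minimum length of 18 bytes is a loose lower bound chosen for this example.

```python
from typing import Tuple

MAGIC = bytes.fromhex("CE BE CF 85 CE BB CE BF CE BD")   # 'xylon' in UTF-8
TRAILER = bytes.fromhex("0D 0A FF 0A")                   # final four header bytes


def check_document(data: bytes) -> Tuple[int, int]:
    """Run the basic checks and return (major, minor) version."""
    if len(data) < 18:
        raise ValueError("too short to hold a header and a root element")
    if data[:10] != MAGIC:
        raise ValueError("not a Dendros document")
    major, minor = data[10], data[11]
    if data[12:16] != TRAILER:
        raise ValueError("header damaged, possibly by newline conversion")
    if data[16] != 0x7B:
        raise ValueError("header is not followed by open_marker")
    if data[-1] != 0x7D:
        raise ValueError("document does not end with close_marker")
    return major, minor
```

If a transfer tool rewrites 0D 0A as 0A, the header shrinks by one byte and the TRAILER comparison fails, which is the kind of alteration the final four bytes are meant to expose.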

 


Sample document

Consider this simple XML text document:

<?xml version="1.0" ?>
<image>
<dim>
<w>2</w>
<h>3</h>
</dim>
<data>
11 11 11 12 12 12
21 21 21 22 22 22
31 31 31 32 32 32
</data>
</image>

150 bytes

 

An equivalent binary Dendros document would look like this:

Dendros v2.0 example document:

CE BE CF 85 CE BB CE BF CE BD 02 00 0D 0A FF 0A
7B 0A 69 00 6D 00 61 00 67 00 65 00 7B 06 64 00
69 00 6D 00 7B 02 77 00 84 02 02 00 7D 7B 02 68
00 84 02 03 00 7D 7D 7B 08 64 00 61 00 74 00 61
00 82 12 11 11 11 12 12 12 21 21 21 22 22 22 31
31 31 32 32 32 7D 7D

87 bytes
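
To show how the components fit together, here is a small recursive reader applied to the sample bytes above. It is a sketch, not a complete parser: the names read_size and read_element are hypothetical, the tree is returned as nested (name, children) tuples purely for illustration, and error handling for malformed input is left out.

```python
import struct

OPEN, CLOSE = 0x7B, 0x7D
ITEM_CODES = {0x81: "?", 0x82: "B", 0x83: "b", 0x84: "H", 0x85: "h", 0x86: "I",
              0x87: "i", 0x88: "Q", 0x89: "q", 0x8A: "f", 0x8B: "d"}

SAMPLE = bytes.fromhex(
    "CE BE CF 85 CE BB CE BF CE BD 02 00 0D 0A FF 0A "
    "7B 0A 69 00 6D 00 61 00 67 00 65 00 "
    "7B 06 64 00 69 00 6D 00 "
    "7B 02 77 00 84 02 02 00 7D "
    "7B 02 68 00 84 02 03 00 7D "
    "7D "
    "7B 08 64 00 61 00 74 00 61 00 "
    "82 12 11 11 11 12 12 12 21 21 21 22 22 22 31 31 31 32 32 32 "
    "7D 7D")


def read_size(data, pos):
    """Decode a size_specifier; return (size, next position)."""
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if byte < 0x80:
            return size, pos


def read_element(data, pos):
    """Parse the element at data[pos]; return ((name, children), next position)."""
    if data[pos] != OPEN:
        raise ValueError("expected open_marker")
    size, pos = read_size(data, pos + 1)
    name = data[pos:pos + size].decode("utf-16-le")
    pos += size
    children = []
    while data[pos] != CLOSE:
        if data[pos] == OPEN:                 # nested element
            child, pos = read_element(data, pos)
        else:                                 # value
            marker = data[pos]
            size, pos = read_size(data, pos + 1)
            payload = data[pos:pos + size]
            pos += size
            if marker == 0x8C:                # UTF-16LE text
                child = payload.decode("utf-16-le")
            else:
                code = ITEM_CODES[marker]
                count = size // struct.calcsize("<" + code)
                child = list(struct.unpack("<%d%s" % (count, code), payload))
        children.append(child)
    return (name, children), pos + 1          # step over close_marker


root, end = read_element(SAMPLE, 16)          # the root element follows the header
```

Here root is the tuple for the 'image' element, holding a 'dim' element (with 'w' and 'h') and a 'data' element, and end equals the document length of 87 bytes.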

 

Here is the same binary document in a more readable layout:

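Dendros v2.0 example document - formatted:

CE BE CF 85 CE BB CE BF CE BD 02 00 0D 0A FF 0A      header
7B 0A 69 00 6D 00 61 00 67 00 65 00                  open element 'image'
   7B 06 64 00 69 00 6D 00                           open element 'dim'
      7B 02 77 00                                    open element 'w'
         84 02 02 00                                 one 16-bit unsigned integer: 2
      7D                                             close element 'w'
      7B 02 68 00                                    open element 'h'
         84 02 03 00                                 one 16-bit unsigned integer: 3
      7D                                             close element 'h'
   7D                                                close element 'dim'
   7B 08 64 00 61 00 74 00 61 00                     open element 'data'
      82 12 11 11 11 12 12 12 21 21 21               eighteen 8-bit unsigned integers
            22 22 22 31 31 31 32 32 32
   7D                                                close element 'data'
7D                                                   close element 'image'

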

Future versions

Parsers for v2.0 may encounter the extended constructs (described under element and value) in documents with a major version of 02 and a minor version which is greater than 00.
In such cases the document can be parsed as normal and the extended constructs must be skipped.
However, parsers for v2.0 must refuse to parse documents with a major version number which is greater than 02.
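
The version rules above amount to a small decision at the start of parsing. A minimal sketch follows, assuming the version bytes sit at offsets 10 and 11 of the header as described under document; the function name is hypothetical, and rejecting major versions below 02 is an assumption of this sketch, since this section does not say how a v2.0 parser should treat them.

```python
def extended_constructs_allowed(header: bytes) -> bool:
    """Decide how a v2.0 parser should treat constructs it does not know.

    Returns False for v2.0 documents, where such constructs are errors,
    and True for v2.x documents with x > 0, where they must be skipped.
    """
    major, minor = header[10], header[11]
    if major > 2:
        raise ValueError("major version %d is too new for a v2.0 parser" % major)
    if major < 2:
        # Assumption: older major versions are out of scope for this sketch.
        raise ValueError("major version %d is not handled by this sketch" % major)
    return minor > 0
```

A parser could call this once after validating the header and pass the result down to the code that handles unknown value_markers, zero length names and names containing a colon.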

 


Previous versions

Dendros v1.0 (16-MAR-2004)
Dendros v1.1 (11-JAN-2005)

The main differences between v1 and v2 are as follows:

 


Notes

  1. After the name_text there must follow one of these components: value_marker, open_marker, or close_marker.
  2. A value must never be preceded by close_marker and must never be followed by open_marker.
  3. Dendros can be used to store whole files, such as a jpeg encoded image, by placing the raw byte contents of each file into a single value component (use value_marker 82).
  4. The number of data items in a given value_data component is determined by dividing value_size by the data type size.
  5. MS Windows GUIDs should be stored as text, using the general format: {FB6AFC0F-EA86-4d9e-A545-6BC98FE780D5}
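
Notes 3 to 5 can be illustrated with a few helper functions. This is a sketch with hypothetical names (raw_value, text_value, item_count); the ITEM_SIZES table simply restates the data item sizes listed under value.

```python
# value_marker -> data item size in bytes, as listed in the value section.
ITEM_SIZES = {0x81: 1, 0x82: 1, 0x83: 1, 0x84: 2, 0x85: 2, 0x86: 4,
              0x87: 4, 0x88: 8, 0x89: 8, 0x8A: 4, 0x8B: 8, 0x8C: 2}


def encode_size(size: int) -> bytes:
    """Encode a size as a size_specifier without leading bytes."""
    groups = [size & 0x7F]                    # final byte: top bit clear
    size >>= 7
    while size:
        groups.append(0x80 | (size & 0x7F))   # earlier bytes: top bit set
        size >>= 7
    return bytes(reversed(groups))


def raw_value(payload: bytes) -> bytes:
    """Note 3: wrap raw bytes (e.g. a whole file) in a single 82 value."""
    return bytes([0x82]) + encode_size(len(payload)) + payload


def text_value(text: str) -> bytes:
    """Note 5: store text, such as a GUID string, as an 8C value."""
    data = text.encode("utf-16-le")
    return bytes([0x8C]) + encode_size(len(data)) + data


def item_count(marker: int, value_size: int) -> int:
    """Note 4: number of data items = value_size / data type size."""
    count, remainder = divmod(value_size, ITEM_SIZES[marker])
    if remainder:
        raise ValueError("value_size is not a multiple of the item size")
    return count
```

For example, raw_value applied to the ten bytes 11 22 .. AA reproduces the 'ten bytes of raw data' example from the value section, and item_count(0x85, 6) gives 3.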

 


Useful links

XML
IEEE Floats
UTF-16

 


 
