RELOAD

From OHRRPGCE-Wiki
Jump to: navigation, search

The current lump format is, in many cases, a kludge. RELOAD is a proposed format to replace existing data lumps with marked up data, enabling backwards compatibility without the need for a data upgrade procedure, and allowing semantic access to data.

Several formats were considered, and I (Mike C.) decided that none were satisfactory for this purpose. So, I came up with the idea of RELOAD.

Really Efficient List Of Arbitrary Data (an alternative acronym was VLOAD, or Variable List Of Arbitrary Data)

The format is that of a hierarchal tree. The file format is essentially a root node that contains everything in the file. Children can be nested to an arbitrary degree, similar to other formats such as XML or JSON, without some of the overhead of a text-based language.

To be sure, there is overhead, but the benefit of being able to ask for, say, the "text box's text" instead of "the data at offset 32" makes this solution superior to a flat data file.

Contents

[edit] RELOAD format

A RELOAD file consists of three parts: the header, the body, and the string table.

[edit] File Header

About Formal Specs

Data Meaning
BYTE * 4 Magic word "RELD"
BYTE Version number (currently 1)
INT Header size (altogether, should be 13)
INT String table position (from beginning of file)

[edit] Body

The body is composed of a single element, known as the "root" element. The root contains all the data in the file, as children. Theoretically, the root could be a string element, with a simple payload, but in practise, it will always be a "container" element.

[edit] An Element

Data Meaning
INT Total size of content, not including this INT
VLI Tag name, stored in the string table
BYTE Type of element. See below for a list.
??? Data. This data is wholly dependent on the Type INT. See below for more details.
VLI Number of children
??? each child element, back to back (If number of children > 0)

Children are whole elements said to be "contained" by this one, and have the same structure (just, nested)

NOTE: The tag name should not start with the '@' character. This is reserved for "attributes" in documents converted from XML.

There are no specifications for what is a valid tag name; however the empty string is definitely valid. Currently, all RELOAD documents use tag names matched by [_a-z]*

The data chunk can be any of the following types.

[edit] Element Type 0 (null)

A null element has no data. Think of it as a flag.

[edit] Element Type 1 (8-bit integer)

Data Meaning
BYTE The integer to be stored.

[edit] Element Type 2 (16-bit integer)

Data Meaning
SHORT The integer to be stored.

[edit] Element Type 3 (32-bit integer)

Data Meaning
INT The integer to be stored.

[edit] Element Type 4 (64-bit integer)

Data Meaning
INT * 2 The integer to be stored.

[edit] Element Type 5 (double precision float)

Data Meaning
DOUBLE The float to be stored.

[edit] Element Type 6 (string, inline)

This can be used to store a string, or a blob of data (possibly containing NUL bytes). It doesn't make much of a difference to RELOAD.

Data Meaning
VLI Size of string
BYTE * size The string data

[edit] String Table

Element names are stored in a string table, to streamline the data itself. The table is, generally, appended after the body. Index zero of the string table is always the empty string, however it is not written to file. The first string in the on-disk table is string 1. The table has this format:

Data Meaning
VLI number of strings written to file (== size of the string table - 1)
??? string data, for table index 1 and above

The string data consists of a number of records with this format:

Data Meaning
VLI Size of string
BYTE * size string data

[edit] VLI (Variable Length Integer)

VLI (variable length integer) is a special format for storing integers which, obviously, can vary in length. By breaking the integer to be stored into smaller-than-byte-sized chunks, and using the remaining bits as control flags, we can effectively save bytes for integers that are typically small, but have potential to be quite large.

The first byte in a stored VLI value looks like this:

 abxxxxxx

The low 6 bits are the low 6 bits of the integer. 'a' is a flag, indicating whether there are more bytes. 0 means this is the last byte. 'b' is a flag indicating whether the VLI is negative. 0 means that it is positive.

Subsequent bytes are similar, but don't have negative flags, so they look like this:

 axxxxxxx

'a', again, is the continuation flag, while 'x' are the next-higher seven bits.

Example:

Let's read this value:

 0x83 0x01

The first byte looks like this:

 10000011

The first bit is set, so there's another byte. The low 6 bits are the number 3. Let's remember that.

The next byte looks like:

00000001

The first bit is clear, so obviously this is the last byte. So, if we shift the rest of the bits up by 6 (to make room for the first 6 bits we read), we get 64. Adding in those other bits, we get our final total, 67.

NOTE: The current implementation of this algorith is limited to an unencoded 64-bits. This is a platform limitation (the complier does not support any integer data-types greater than 64-bit).

Plus, the overhead of the continuation flag has already chewed up 10 bits at this point, so it would be more efficient to use another encoding for these hypothetical really-big integers.

[edit] To be done

A TODO list for the RELOAD related code, in order of priority (row-major order)

  • Implement attributes in xml2reload
    • attributes are ordinary nodes with names that start with "@"
    • enumerate attributes when parsing an element? (for real this time)
    • need to include attributes when going back to XML
    • what about type hints intended for RELOAD? Needed?
  • Finish API
    • need content getting functions (GetString, GetInteger, etc)
    • need functions for moving branches around (or, do we?)
    • need navigation functions (GetChildByName, GetChildrenByName (see below))
  • Implement NodeList
    • which is like "search results", in that it stores a list of nodes not directly related to each other
    • will be returned from functions like GetChildrenByName and the RPath functions.
  • Implement RPath (RELOAD query syntax)
    • it will bear a surprising resemblance to XPath ("/root/mynode[@id=1]/foo")
    • the first "/" is the root. if omitted, navigate relative to here.
    • selectors ("[@id=1]") match based on the contents of child nodes.
      • since attributes already start with "@", then we can treat everything as ordinary nodes
    • other important features: ".." - up a level, "*" - everything
  • Import RELOAD documents to RPGs/Plotscripting interface
    • one should be able to store as many RELOAD documents as they want, and ideally, they would be combined into one single document internally.
    • RELOAD document(), RELOAD get child by name(), RELOAD get string(), RELOAD get integer(), etc etc.
      • Plotscripting should have as much access as the native API
    • allow documents to be saved in the save file? (will be easier, when the save file format is converted over)

[edit] RPath Syntax

Note: Work has not quite started on this. This is merely a proposal/plan

All RPath searches are performed on what is referred to as the "Context node". The Context node has two purposes:

  1. The document specified by the Context node is the one that will be searched, and
  2. If the RPath query is relative, the search will be performed with the Context node as the root (i.e., all returned nodes will be children of the Context node). If the query is absolute, the Context node is not used for this purpose.

The Context node cannot be null, as a Document must be specified.

[edit] Relative path

node1

not done

This query will select all nodes under the Context node named "node1" without any other constraints.

not done

node1/node2

This query will select all nodes under the Context node named "node2" which directly descend from a node named "node1"

not done

node1//node2

This query will select all nodes under the Context node named "node2" which descend, directly or indirectly, from a node named "node1"

not done

node1[foo=1]

This query will select all nodes under the Context node named "node1" which have a child named "foo" which has a value equivalent to 1.

not done

node1[bar=2]/node2[foo=1]

This query will select all nodes under the Context node named "node2", which meet the following conditions:

  • They descend from a node named "node1"
  • The "node1" which they descend from must also have another child named "bar", which has a value equivalent to 2
  • They also have a child of their own named "foo", which has a value equivalent to 1.

That's enough for now

Personal tools