Henceforth VM
From OHRRPGCE-Wiki
The Henceforth Virtual Machine (HVM, Henceforth VM) is used to dynamically interpret Henceforth (HF, Hamsterforth) scripts. The OHRRPGCE FMF does not use Hamsterspeak's traditional HSX/HSZ files; rather, it opts for a stack-based alternative. Henceforth files are stored with the .HF extension.
Acknowledgements
A big thanks to both James and Ralph for their well-founded suggestions regarding the HVM, and to Mike for clarifying countless issues. If you start reading into the minute details of the HVM, you'll find that a whole lot of it is lean, clean, and sensible. This is almost entirely due to James & Ralph's suggestions & experience, which I truly appreciate.
Introduction to Henceforth
The Henceforth language is a simple language with three alternative storage formats.
Format T is a text-based format of HF. It is considered a source or development-level format. All examples given in HF tutorials will, by default, be in Format T. For example, to explain addition, I might give you the following snippet:
4 5 add
...or, equivalently, with one token per line:
4
5
add
...and tell you that this pushes 4, then 5, and then uses the add primitive.
Format B is the bytecode equivalent of Format T. One might consider it the "compiled" format, but this distinction is misleading. Unlike other stack-based bytecodes (like Java's), Format B and Format T are exactly the same in HF. There is no "compilation" involved in the creation of a Format B file; it's just a binary version of Format T and, hence, the format computers prefer to read. It consists of WORDs of data (one word = 2 bytes, big-endian), where each word is (usually) one command.
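As a rough illustration of the "2-byte, big-endian WORD" layout, here is a minimal Python sketch that splits raw bytes into 16-bit words and back. The function names are made up for this example and are not part of the HVM.

```python
import struct

def bytes_to_words(data):
    """Split raw Format B bytes into a list of 16-bit big-endian words."""
    if len(data) % 2 != 0:
        raise ValueError("Format B data must be a whole number of 2-byte words")
    count = len(data) // 2
    # ">" selects big-endian; "H" is an unsigned 16-bit integer.
    return list(struct.unpack(f">{count}H", data))

def words_to_bytes(words):
    """Inverse of bytes_to_words: pack 16-bit words as big-endian bytes."""
    return struct.pack(f">{len(words)}H", *words)
```

For instance, `bytes_to_words(b"\x00\x04\x00\x05")` yields the two words 4 and 5.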
Format HF is a compressed binary version of Format B. Whereas Format B is used internally by the HVM, Format HF is only used to store HF scripts, or to transfer them over a network. It is therefore very compact, variable-width, and almost impossible to deal with manually. One should probably always deal with scripts in either Format T or Format B. Lumps ending in .HF are encoded in Format HF.
Format HF is stored with the extension .HF. Format T is stored with the extension .HFT. Format B data is only ever stored internally; however, in the event that phones' I/O starts outperforming their processors, we might reasonably store it in .HFB files.
Where the HVM fits into the game loop
Let's pretend that on a certain map in your game, a certain NPC is set up to "run script 4" when the user walks up to it and presses "Accept". The following diagram shows exactly how this happens.
A: Reaction
Upon pressing "Accept", the game engine checks if this NPC has a script ID assigned to its "accept" behavior. The classic OHR performs this same task. If one exists, that script is instructed to "initialize".
B: Load Script
To initialize a script, the game engine checks if that script is already in the script cache. If so, nothing happens at this stage. Otherwise, the engine locates "4.HF" and expands it into Format B. This format is then stored in the cache, as a large array of WORDs. (Actually, an array of INTs is used, with each INT storing two WORDs, due to a kink in the Java Virtual Machine, but this fact is hidden from the rest of the HVM.) Note that the script cache stores recently-used scripts; it does not contain every script in Format B, and it will often delete one script to make room for another, causing the first to be reloaded from Format HF the next time it is called.
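The caching behavior described above might be sketched as follows. This is only an illustration in Python: the least-recently-used eviction policy, the `capacity` parameter, the `loader` callback, and the "high word first" packing order are all assumptions, since the text only says that recently-used scripts are kept and that two WORDs share one INT.

```python
from collections import OrderedDict

def pack_words(words):
    """Pack pairs of 16-bit words into 32-bit ints (high word first, assumed)."""
    padded = list(words) + [0] * (len(words) % 2)  # pad odd-length scripts
    return [(padded[i] << 16) | padded[i + 1] for i in range(0, len(padded), 2)]

def get_word(packed, index):
    """Read the 16-bit word at a word index, hiding the INT packing."""
    value = packed[index // 2]
    return (value >> 16) & 0xFFFF if index % 2 == 0 else value & 0xFFFF

class ScriptCache:
    """Keeps recently used scripts in expanded (Format B) form."""

    def __init__(self, capacity, loader):
        self.capacity = capacity      # assumed limit; must be at least 1
        self.loader = loader          # hypothetical: script_id -> list of words
        self.scripts = OrderedDict()  # script_id -> packed INT array

    def get(self, script_id):
        if script_id in self.scripts:
            self.scripts.move_to_end(script_id)  # mark as most recently used
            return self.scripts[script_id]
        if len(self.scripts) >= self.capacity:
            self.scripts.popitem(last=False)     # evict least recently used
        packed = pack_words(self.loader(script_id))
        self.scripts[script_id] = packed
        return packed
```

Callers only ever go through `get_word`, which is one way to keep the INT packing hidden from the rest of the interpreter.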
C: Initialization
Any Format B source requires some additional runtime data structures. The second step in initialization allocates and sets these. First, a Program Counter (PC) is set to zero. This stores the offset to the current instruction. Next, a "script ID" is set, which refers back to the Format B source (multiple copies of the same script only load the binary once.)
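A minimal sketch of this per-instance state, in Python, could look like the following. The class and function names are invented for illustration; the only details taken from the text are the Program Counter starting at zero and the script ID pointing back at a binary that is loaded once.

```python
from dataclasses import dataclass

@dataclass
class ScriptInstance:
    script_id: int  # refers back to the shared Format B source
    pc: int = 0     # Program Counter: offset of the current instruction

def initialize(script_id, binaries, load):
    """Create a new instance, loading the binary only if it is not yet present.

    `binaries` (dict of script_id -> Format B words) and `load` are
    hypothetical stand-ins for the script cache and the .HF expander.
    """
    if script_id not in binaries:
        binaries[script_id] = load(script_id)
    return ScriptInstance(script_id)
```

Two instances of script 4 would then each carry their own `pc` while sharing the single entry `binaries[4]`.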
D: Side Note
At this point, it is worth being aware of two global data structures. The "Stack" is the place where all stack operations on ALL scripts take place. Stack-based programmers will know that you shouldn't leave things lying around on the stack, so the "Heap" (variables list) can store variables by name. For example, let's say you have a Hamsterspeak script to move NPCs 5 and 6 three steps to the right:
walk NPC (5,east,3)
walk NPC (6,east,3)
This might be cross-compiled into the following Hamsterforth:
5        #Push 5
E        #Push the constant "E" (east)
3        #Push 3
NPC_W    #Call "NPC walk", and KEEP the top 3 elements on the stack
rot      #Switch 5 and 3 on the stack
incr     #Add one to 5
rot      #Switch 3 and 6 on the stack
NPC_W!   #Call "NPC walk", and CONSUME the top 3 elements on the stack
This is perfectly fine, but if you want to wait for NPC 5 before sending NPC 6:
walk NPC (5,east,3)
wait for NPC (5)
walk NPC (6,east,3)
...you may have a problem. Currently, the OHR is not threaded, but future versions may be, and some other script might use the stack while you are "wait"ing. So, the heap is used to store variables:
5 E 3 NPC_W   #Call "NPC walk", and keep arguments
!@step        #Pop "3" and store it in "step" (! means to create it, too)
!@dir         #Pop "E" and store it in "dir"
dup           #Duplicate 5 (push it again)
!@id          #Pop "5" and store it in "id"
WAIT_N!       #Call "Wait for NPC", and CONSUME the top element (5)
#Other threads may execute, if threading is enabled.
id@!          #Push the value of "id" (! means to delete ID after pushing it)
incr          #Increment 5 to 6
dir@!         #Push the value of "dir"
step@!        #Push the value of "step"
NPC_W!        #Call "NPC walk", and consume arguments
Henceforth manages all of this for you, but it is important to understand what's going on behind the scenes. Using the stack is preferred over using variables, and allows some easy optimizations. If, in our example, NPCs 5 and 6 are at (3,4) and (4,4), and both move at the same speed, then you can simply move NPC 6 3 squares left, and the wait is no longer needed:
set NPC position (6,1,4)
walk NPC (5,east,3)
walk NPC (6,east,3)
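To make the stack/heap interplay from this section concrete, here is a toy Python interpreter for just the handful of tokens used in the "wait for NPC" example. It is a sketch under assumptions, not the HVM: the token handling is inferred from the comments in that example, API calls are merely logged instead of executed, and the `api_arity` table and the value used for the constant `E` are invented for illustration.

```python
PROGRAM = """
5 E 3 NPC_W   #Call "NPC walk", and keep arguments
!@step
!@dir
dup
!@id
WAIT_N!       #Call "Wait for NPC", and consume the argument
id@!
incr
dir@!
step@!
NPC_W!
"""

def run(source, api_arity, constants):
    """Run a tiny Format T subset; returns (stack, heap, logged API calls)."""
    stack, heap, calls = [], {}, []
    for line in source.splitlines():
        code = line.split("#", 1)[0]          # strip the trailing comment
        for token in code.split():
            if token.isdigit():
                stack.append(int(token))
            elif token in constants:
                stack.append(constants[token])
            elif token == "dup":
                stack.append(stack[-1])
            elif token == "incr":
                stack.append(stack.pop() + 1)
            elif token.startswith("!@"):      # pop and store (creating it)
                heap[token[2:]] = stack.pop()
            elif token.endswith("@!"):        # push value, then delete it
                stack.append(heap.pop(token[:-2]))
            elif token.rstrip("!") in api_arity:
                name = token.rstrip("!")
                n = api_arity[name]           # assumed to be at least 1
                calls.append((name, tuple(stack[-n:])))
                if token.endswith("!"):       # "!" suffix consumes the args
                    del stack[-n:]
            else:
                raise ValueError(f"unknown token: {token}")
    return stack, heap, calls

stack, heap, calls = run(PROGRAM, {"NPC_W": 3, "WAIT_N": 1}, {"E": "east"})
```

After running, both `stack` and `heap` are empty, and `calls` records an NPC walk for NPC 5, a wait for NPC 5, and an NPC walk for NPC 6, in that order.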
E: Continuation
At this point, the game engine continues with its tasks for that cycle (e.g., message boxes, vehicles). This (including user input) constitutes the "Update" phase of the game loop. Following this is the "Paint" phase, which shows one frame's worth of updates to the user. After painting, the tick count is incremented, and the loop repeats. The first thing that happens then is the "scripts" phase, during which scripts are actually executed. The loop repeats indefinitely.
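The ordering of the phases described above can be summarized in a short Python sketch. The function and parameter names are assumptions; the real loop runs indefinitely, whereas this version stops after a fixed number of ticks so that it terminates.

```python
def run_game_loop(run_scripts, update, paint, ticks_to_run):
    """Run the scripts, update, and paint phases in order, once per tick."""
    tick = 0
    while tick < ticks_to_run:   # the real loop repeats indefinitely
        run_scripts(tick)        # "scripts" phase: scripts actually execute
        update(tick)             # "update" phase: input, message boxes, vehicles
        paint(tick)              # "paint" phase: show one frame to the user
        tick += 1                # tick count is incremented after painting
    return tick
```

A script initialized during one cycle's update phase therefore first gets a chance to run in the scripts phase at the start of the next cycle.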
Some Technical Considerations
When a script is triggered by any means, it suspends the currently-executing script; see "The Stack" in "How does plotscripting work?". This makes sense; if your script calls another script, one expects the new script to take over. It also means that if you step on several map tiles which each have (long) scripts, the last maptile's script will finish first. (Threading will probably change this.)
But wait, there's more! Several small but important details of script initialization include:
- If you trigger the same script twice (e.g., by stepping on two copies of an NPC one after the other) the script loader will detect that you are trying to load a script with the same id as the currently executing script. If the "Permit double-triggering of scripts" general bitset is on (or if the script is triggered by another script) then this is allowed, and a second instance of the script is initialized.
- When a script instance is first loaded, its "delay" counter is set to 2 ticks. This is not the same thing as instructing the script to "wait", because it remains the active script while delaying. If another script pre-empts this one, the delay counter does not decrement until the new script finishes. (When the new script finishes, the old one immediately resumes what it was doing, be it running or decrementing the "delay" counter.)
- To prepare for parallel scripts, the active script is pushed to the top of the "waiting" stack, and is manipulated by a reference ID. This way, all scripts are kept in one place, and GAME_FMF can just scan through this list once each cycle, updating all scripts until they enter a waiting state.
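The rules in the list above can be modeled with a small Python sketch. Several things here are assumptions made for illustration: the class names, the idea that a script does one unit of work (`steps_left`) per tick, and the choice to refuse a double trigger when neither the bitset nor a calling script permits it.

```python
INITIAL_DELAY = 2  # ticks a freshly loaded instance delays before running

class Script:
    def __init__(self, script_id, steps):
        self.script_id = script_id
        self.delay = INITIAL_DELAY
        self.steps_left = steps  # hypothetical stand-in for remaining work

class Scheduler:
    def __init__(self, permit_double_trigger=False):
        self.permit_double_trigger = permit_double_trigger
        self.stack = []  # the top of the stack is the active script

    def trigger(self, script_id, steps, from_script=False):
        """Start a script, suspending the active one. Returns False if refused."""
        active = self.stack[-1] if self.stack else None
        same_id = active is not None and active.script_id == script_id
        if same_id and not (self.permit_double_trigger or from_script):
            return False
        self.stack.append(Script(script_id, steps))
        return True

    def tick(self):
        """Advance only the active script; suspended scripts are frozen."""
        if not self.stack:
            return None
        active = self.stack[-1]
        if active.delay > 0:
            active.delay -= 1
            return (active.script_id, "delay")
        active.steps_left -= 1
        if active.steps_left <= 0:
            self.stack.pop()  # finished: the previous script resumes next tick
        return (active.script_id, "run")
```

Because `tick` only touches the top of the stack, a pre-empted script's delay counter stays where it was until the newer script finishes.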
Format B Detailed
Format T is simple; it's just the language. Format HF is something you should never deal with (it'll probably just be compressed using a general-purpose zip algorithm... I'm looking at you 7-zip.) Format B is the interesting one. It's designed to be quick to parse, and relatively conservative on space. A single word in Format B is parsed in one pass. That is to say, if you read "00" and determine this is a "single-width" command, you no longer ever need to use those first two bits again to understand the remainder of the command.
The First 2 Bits
The first 2 bits of each word explain how the remainder of that word is allocated.
00 is the simplest and fastest to parse.
01 implies that this word specifies how many words follow, either directly ("there are x words after") or indirectly ("I am a 2-word integer").
10 is worst to parse; this "variable-width" type requires that parsing continue until a special "magic number" is reached. Although this allows strings to be of arbitrary length (which is better than imposing an arbitrary limit) this complicates all but the simplest interpreters.
11 currently has no use, and will cause a parse error.
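A hedged Python sketch of this first parsing step follows. It assumes that "the first 2 bits" are the two most significant bits of the 16-bit word, and the class labels are simply borrowed from the section names; neither detail is confirmed beyond what the text says.

```python
WIDTH_CLASSES = {
    0b00: "single-width",
    0b01: "fixed-width",
    0b10: "variable-width",
}

def width_class(word):
    """Classify a 16-bit Format B word by its top two bits."""
    tag = (word >> 14) & 0b11  # assumes the first 2 bits are the high bits
    if tag not in WIDTH_CLASSES:
        raise ValueError(f"parse error: prefix {tag:02b} has no use")
    return WIDTH_CLASSES[tag]
```

Once `width_class` has returned, the remaining 14 bits can be handed to a class-specific decoder without looking at the prefix again.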
Single-Width: The Next 4 Bits
Single-width bytecodes are relatively simple; they use 4 bits for a control code, and 0, 8, or 10 bits for data or control.
0000 implies the commonly-used "short integer". The right-most remaining 8 bits store the actual value of the integer, from 0 to 255. Negative numbers will probably use the 2 left-most remaining bits, but that hasn't been decided yet.
0001 implies the "end define" control, which the parser searches for whenever it encounters a "begin define". The 10 bits following have no meaning.
0010 implies a Hamsterspeak API call. The remaining 10 bits are used for the ID of the API call, as listed here.
0011 implies a Henceforth primitive command, like "pop" or "dup". The remaining 10 bits identify the command; see the table below for the specifics.
0100 implies a user-defined function call. All functions in .HSX are compiled to separate files based on their ID (e.g., 1.HF, 2.HF); the remaining 10 bits of this bytecode specify that ID. If the user has more than 1023 functions, they will be compiled at runtime (in USER.HF) and stored in the dictionary; this may induce a drain on memory.
**** (i.e., "anything else") incurs an error from the parser.
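The single-width layout (2 prefix bits, 4 control bits, 10 data bits) might be decoded along these lines. This Python sketch assumes the fields are laid out from the most significant bit downward, and the kind names are labels invented for this example. Since the sign handling for short integers is undecided, only the low 8 bits are read.

```python
SINGLE_WIDTH_KINDS = {
    0b0000: "short_int",
    0b0001: "end_define",
    0b0010: "api_call",
    0b0011: "primitive",
    0b0100: "user_call",
}

def decode_single_width(word):
    """Decode a single-width word into a (kind, value) pair."""
    if (word >> 14) & 0b11 != 0b00:
        raise ValueError("not a single-width word")
    control = (word >> 10) & 0b1111   # the next 4 bits
    if control not in SINGLE_WIDTH_KINDS:
        raise ValueError(f"parse error: unknown control code {control:04b}")
    kind = SINGLE_WIDTH_KINDS[control]
    data = word & 0x3FF               # the remaining 10 bits
    if kind == "short_int":
        return kind, data & 0xFF      # right-most 8 bits hold the value
    if kind == "end_define":
        return kind, None             # the 10 data bits have no meaning
    return kind, data                 # an ID for the API, primitive, or function
```

Under these assumptions, the word `0x0004` decodes to the short integer 4, and `0x0C05` decodes to primitive ID 5.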
Fixed-Width: The Next 6 Bits
There's only one fixed-width bytecode: 000000 implies a single-word integer value follows this word. The format is not fully defined; probably, the integer will be from 0 to 65535, and the first word will contain some information about the sign.
It should go without saying that anything besides 000000 crashes the system.
Variable-Width: The Next 6 Bits
Variable-width bytecodes are interesting. They are specified by the 6 bits following the first 2. In all cases, the remaining byte of this word is reserved; variable-width bytecodes are so large anyways that this byte is reserved for flexibility.
000000 implies an ASCII-style string, although the font in use by the OHR may wildly affect which character is associated with each letter.
000001 implies a function or variable definition. Currently, both are treated as the same thing internally.
000010 implies the calling of a user-defined function. Although a user function can be called by ID, "internal" functions must be called by name.
000011 implies a Unicode string. Don't use these just yet; they're nowhere near ready for deployment.
Primitive Command IDs
| Primitive ID | Primitive | Stack Effect Diagram |
| 0 | dup | ( 1 2 3 -- 1 2 3 3 ) |
| 1 | swap | ( 1 2 3 -- 1 3 2 ) |
| 2 | drop | ( 1 2 3 -- 1 2 ) |
| 3 | over | ( 1 2 3 -- 1 2 3 2 ) |
| 4 | rot | ( 1 2 3 4 -- 1 4 3 2 ) |
| 5 | + | ( 1 2 3 -- 1 5 ) |
| 6 | - | ( 1 3 2 -- 1 1 ) |
| 7 | * | ( 1 2 3 -- 1 6 ) |
| 8 | / | ( 1 7 3 -- 1 2 ) |
Note: Stack-effect diagrams are oriented with the top of the stack on the right.
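The stack effects in the table can be checked against a small Python sketch. Note that `rot` here follows the table (it swaps the top element with the third from the top), and that `/` is assumed to be integer division, as the `1 7 3 -- 1 2` row suggests; behavior for negative operands is not specified in the text. The function names are invented for illustration.

```python
import operator

def _dup(s):  s.append(s[-1])
def _swap(s): s[-1], s[-2] = s[-2], s[-1]
def _drop(s): s.pop()
def _over(s): s.append(s[-2])
def _rot(s):  s[-1], s[-3] = s[-3], s[-1]  # swap top with third from top

def _binary(op):
    """Build a primitive that pops b, then a, and pushes op(a, b)."""
    def apply(s):
        b = s.pop()
        a = s.pop()
        s.append(op(a, b))
    return apply

PRIMITIVES = {
    0: _dup, 1: _swap, 2: _drop, 3: _over, 4: _rot,
    5: _binary(operator.add),
    6: _binary(operator.sub),
    7: _binary(operator.mul),
    8: _binary(operator.floordiv),  # assumed integer division
}

def apply_primitive(primitive_id, stack):
    """Apply the primitive with the given ID to the stack (top is the last item)."""
    PRIMITIVES[primitive_id](stack)
    return stack
```

For example, `apply_primitive(4, [1, 2, 3, 4])` gives `[1, 4, 3, 2]`, matching the `rot` row.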





