Writing a loader in Malbolge

Malbolge, for those not familiar with it, is a language designed to be difficult (or perhaps impossible - it's not been proved to be Turing complete) to program in. See the link above for the gory details.

Why write a loader?

One of the many difficulties in Malbolge programming is that only legal instructions can be initially loaded into memory, and all of these are small (less than 128). This is fine for instructions, but makes it very hard to write big programs, since there is no way to enter branch target addresses, or data register values, or data locations, that are larger than this. This essentially limits all directly entered programs to less than 255 bytes. (The numbers 129 through 255 can be entered directly due to a bug in the interpreter.)

So the first thing you need to write is a loader, that feeds a (carefully crafted) byte string to a small bootstrap program, and thus allows the user to start with any memory location set to any value. It's not at all obvious such a program can be written (it's certainly much more ambitious than any existing Malbolge programming).

The basic idea for the loader

If we can repeatably set two locations to completely arbitrary values, we can build a loader.  Call the locations ADDR (for address) and DATA (for data).  Then the following code, repeated often enough, will allow us to set any memory location to any value.

   ROT  DATA    ; this loads the D value  into A.  Of course it's rotated, but since we assuming D can be set to any value, this can be pre-compensated  away.
   LOAD ADDR  ; this loads the D register.  Of course D will be incremented before we use it, but once again this can be compensated for
   OP                   ;  This does the actual mod.  May need to be repeated several times, with different values, to get the desired result

How to set a single location:

We can read a byte from the input, and use it to do an OP.  But this can only change the lowest 5 trits, since when we do an input the upper part of the word is always set to 0.  We can also do a rotate.  Unfortunately, many values cannot be reached by any combination of OPs of input values and ROTATES.  This is because the upper 0s force all 5 of these trits to 0 or 1. If a value of 2 is needed here it is lost.

So next we introduce the 'idler' (named for the small gear between 2 big gears).  Now the sequence is

  1.     INPUT
  2.     OP          ; with the idler
  3.     OP          ; the desired value.

We will set the idler initially to 22222xxxxx, where x does not matter as long as it is known.  Because of the properties of OP, the 2s in the upper trits of the idler will stay 2s, and the lower trits can be set to anything we want.  And for the second OP, the upper trits are always 2, which does not lose any information.  (the 1s and 2s in the target are swapped).  A test program shows that we can reach any value with a combination of ops of the form 22222xxxxx and ROTATES.  In fact this can be done with just one operation combining the OP and the ROTATE.  There is one complication, the first INPUT/OP (steps 1 and 2) combination may need to be done twice before doing the second OP (step 3) , since there are some values of the idler that cannot be reached in one step from the previous idler.

The basic plan of the loader:

So now the basic plan of the loader is clear.  We read a byte, and use it to do a computed branch to one of these operations.
  1. Do an input and OP into the idler
  2. Op the idler into ADDR
  3. Op the idler into DATA
  4. OP the location indicated by ADDR with the data at DATA
  5. Branch to the start of the newly loaded code.
This needs a 5 way computed branch.  The first observation is that you don't really need step 5.  After all, if you can change any location, you can change some branch target within the loader.  So you simply set all the locations you want, then zap one of the branch addresses within the loader to go there on the next pass through.

Building a computed branch.

This is not straightforward (What did you expect?).  The basic problem is that all branch addresses are in memory, and the only to save a value in memory is through an OP, and the 0s in the top of the input values cause the value to radically change.  So one again, we use an idler, in this case of the form 11111xxxxx.  We read an input byte, OP it into the idler, then branch to the idler.  Once again the basic problem is that you cannot reach all values of 11111xxxxx from all other values.  This means you cannot chose your next state arbitrarily.  However, there are subsets of the values where abtritrary next state selection is possible.  We will endeavor to use one of  these.

You can also compute a D value, then do a LOAD, then a branch.  I'm not yet sure which is better

Combining steps (2), (3), and (4):

This is a really cool optimization, using the weirdest 'features' of Malbolge.  Imagine a loop built basically like this: (actually there are branches between each step, for two reasons - the instructions need to be located at certain memory locations for proper effect, and the branches are used to 'undo' some of the code self-modification.  However they are omitted here for clarity).
Since the D is auto incremented, operations I1 and I2 will work on different memory locations.  First, make sure the input and the OP always work on every cycle, by putting them at locations where they are a 2-cycle and fixing them after each use (this can be done by branching to the instruction afterwords, as explained on the main page). Now pick both I1 and I2 to be part of an instruction cycle of the same length.  2 of the 9-cycles look very good.  A ROTATE loaded at location 71 or 83 (mod 94) does only ROTATES, OPS, and NOPs.  Now by starting the one at location 83 as a NOP, the two can be put out of phase.  So now, if we execute this loop 9 times, we get the following behavior, where N means NOP:
Pass through the loop:
71 mod 94
83 mod 94
N (loadable)

So if we execute this 9 times, we get exactly what we want.  Both locations get one OP followed by a ROTATE.  The OPs are at least 2 locations apart, so we can change the idle to the right value before the OP is done.  If we call this routine COMBINED, then we only need a 2 way branch.
There is one possible fly in the ointment.  Say we need to change OLD-ADDR->ADDR and OLD-DATA->DATA.  Using this technique this must be possible using exactly the same number of OP-ROTATE pairs for both changes.  Is this always possible?  For example, you could imagine a scenerio where the change OLD-ADDR->ADDR always required an odd number of OP-ROTATE pairs, but the change OLD-DATA->DATA requires an even number, perhaps due to some parity consideration in the OP.  Then this technique would fail.

Fortunately this fear can be shown to be groundless.  Instead of seaching for a sequence that transforms OLD_DATA->DATA directly, first transform OLD-DATA to 0, then 0 to DATA.  Then do the same for OLD_ADDR->0->ADDR.  Now pick the shorter of the two sequences.  The 0 step can be extended as many cycles as needed, since 0000000000 OPed with 2222222222 -> 0000000000 again.  Hence we can always pad the shorter sequence to be exactly the same length as the longer one.  A side benefit of this technique is that if we wish to calculate the transforms needed by table lookup, then the going through 0 requires 2 tables of size O(N), instead of the direct technique which requires a table of size O(N^2), where N is the numbe of values a Malbolge memory location can assume (59049).   The downside of this approach is that the sequences are longer than is strictly necessary, but if efficiency is your main concern perhaps Malbolge is not the right programming language to use.

Discovering this is what convinced me it is indeed possible to write a generic loader within the confines of the small code space accessible through the loader bug.  I'm now completely convinced this can be done.

Creating unloadable constants:

Lots of the constants you would like, as well as useful instructiions such as immutable nops at all locations, cannot be loaded directly. (This is part of the language spec, but a bug in the interpreter lets a somewhat wider variety of constants be loaded, but by no means all.) However, these can be created systematically.  At the very beginning of your code, create the location 2000000000 by ROTATING 0000000002, which can be loaded.  Then branch to it, and at this location we place a long chunk of straight line code.  Since there is a small value in D, this code will increment D through your main code one step per instruction.  Most of these instructions will be NOPs, but a few can be OPs or ROTATEs to set up the unloadable constants.  Note that once you make a pass through all your code, you could (if you need to)  reload the D register and make another pass.