Introduction to Malbolge

Malbolge, for those not familiar with it, is a language designed to be difficult (or perhaps impossible - until recently, there was not even an informal argument showing Turing completeness) to program in. For example, the effect of any instruction depends on where it is located in memory (mod 94, of course), all instructions are self-modifying (according to a permutation table) and both the code and data pointers are incremented after every instruction, making it hard to re-use any code or data. There is no way to initialize memory except to one of the 8 instruction characters, there is no LOAD or STORE operator, and the only available memory operators (both of them) work in trinary and are designed to be opaque. The only control flow construct is an unconditional computed jump, which is also nearly worthless since there is no way (or certainly no obvious way) to set memory to anything except the 8 instruction characters.

Believe it or not, Counter 02101012220   people (counted in trinary, of course) have so far have expressed an interest in programming in Malbolge! Note that if your first impression is that counting visitors in trinary is rather inconvenient, then you are missing the point of Malbolge altogether.

Originally, information about Malbolge was published at the site below, though this site is now dead, according to the author: http://www.mines.edu/students/b/bolmstea/malbolge/
Fortunately (or maybe not) this information has been preserved.  A copy of the original site was archived at
http://web.archive.org/web/20000815230017/http:/www.mines.edu/students/b/bolmstea/malbolge/
The language specification copied from the original site:  Malbolge language specification
The reference interpreter copied from the original site: Malbolge Interpreter
Note:  Where the spec and the interpreter differ (for example, the spec calls '<' an INPUT instruction and '/' an OUPUT, but the interpreter does the opposite), in this work the interpreter is held to be correct.

Although the language had been out since 1998, for many years the most complex known programs was 'hello, world', available in several versions.
http://www.acooke.org/andrew/writing/malbolge.html
http://www.antwon.com/index.php?p=234
http://www.wikipedia.org/wiki/Malbolge_programming_language
http://www2.latech.edu/~acm/helloworld/malbolge.html

In 2004, based on the analysis below, I wrote a program that copied input to output, though it did not terminate properly on end of input.

There was a claim that the '99 bottles of beer' program had been written in Malbolge.  ( The site was (now dead): http://99-bottles-of-beer.ls-la.net/m.html)  The implication is that the program was doing looping, testing and printing.  However, closer examination shows that the programmer was just doing a printf("") of the desired result using straight line code.  Conceptually this is exactly the same as the 'hello world' example above.

This difficult task of writing a general program in Malbolge was completed for real in 2005 by  Hisashi Iizawa , Toshiki Sakabe, Masahiko Sakai , Keiichirou Kusakari, and Naoki Nishida.  Their paper "Programming Method in Obfuscated Language Malbolge" (in Japanese) can be found at http://www.sakabe.i.is.nagoya-u.ac.jp/~nishida/DB/pdf/iizawa05ss2005-22.pdf.  The resulting source code for '99 bottles of beer' can be found at: http://www.99-bottles-of-beer.net/language-malbolge-995.html .  Though some of the theory developed here was used, reducing this to practice was an amazing feat of programming prowess.

Malbolge as a cryptosystem

The correct way to think about Malbolge, I'm convinced, is as a cryptographer and not a programmer. Think of it as a complex code and/or algorithm that transforms input to output. Then study it to see if you can take advantage of its weaknesses to forge a message that produced the output you want.

 Looked at as a cryptosystem, it has several weaknesses:

Some permutation cycles are short

First, the self modifying instructions do not form one large permutation. (If they did, then any instruction executed enough times would always turn into a 'HALT' at some point). So we can find instructions, which when executed, turn into other instructions, and then back. For example, an OP instruction, when located at location 20 (mod 94), will become a LOAD instruction, then a NOP, then another NOP, then back to a OP instruction, and so on. Cycles are length 2, 9, 4, 5, 6, and 68, and the instructions executed by each cycle depend on the starting position mod 94. In general the short cycles are more useful than the long ones, but the long cycle at 2 mod 94 is very good, as it cycles between input, output, and load D register instructions.  A list of all the cycles can be found here.

Jump instructions do not self modify

The next weakness is a biggie - any JUMP instruction is not self modifying! This happens because the order is:
  1. Instruction at C is executed
  2. Instruction at C is scrambled by the permutation table
  3. C is incremented
But the branch instruction changes C between steps (1) and (2). Thus the branch address is one less than the intended target, and neither the branch instruction itself nor the target is modified (the work before the target IS modified, but that's not so bad.  In fact we will use this to great advantage later). This is very helpful since it's much easier to cope with changing instructions than changing control flow. Also, once an instruction permutes into a branch instruction, it will not change any further.

Initializing values

The next weakness is in the program reader. It is clear from the text description that the intent is allow only valid instructions to be written into memory, and the rest of memory will be filled by the OP loop. This in general prevents the (ab)user from starting with memory set to any useful values. However, close inspection of the code reveals that non-printing characters (0-31) and (128-255) are written directly to memory without checking (except newline, tab, and a few other whitespace characters (those selected by isspace()), which are ignored). One could argue this is simply a bug in the interpreter, but taking advantage of a bug in the interpreter seems very much in character (so to speak). This is very helpful, allowing the programmer to ensure that the right branch target address is in the right spot in memory, for example.

Using these weaknesses, I've succeeded in writing a Malbolge program that copies its input to its output. Since some of it is
non-printing, here it is uu-encoded:

begin 666 copy.mb
M1"="04 _/CT\.SHY.#<V-30S,C$P+RXM+"LJ*2@G)B4D(R(A?GU\>WIY>'=V
M=71S<G%P;VYM;&MJ:6AG9F5D8V)A8%]>75Q;6EE85U955%-245!/3DU,2TI)
M2$=&141#0D% /SX]/#LZ.3@W-C4T,S(Q,"\N+2PK*BDH)R8E)",B(7Y]?'MZ
M>7AW=G5T<W)Q<&]N;6QK:FEH9V9E9&-B86!?7EU<6UI96%=655134E%03TY-
M3$M*24A'1D5$0R9?O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]Y+V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O>2]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
DO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;T*

end

Here are some more observations, not taken advantage of yet:

Getting aroung the effect of self modifying instructions

  All the various instructions appear in loops of length 2, although only when located at certain memory locations. This means that you can write a routine that does the right thing every other time, and nothing the alternate times.  However, since the instructions will only do this when located at the proper spots, you need to branch after each one.  For example, suppose you wanted to do
  1. Input
  2. OP
  3. rotate
In that order.   You can do this at any location, but only once.  If you try to execute it again,  the instructions will have self modified in various ways, and the next attempt will do something un-intended.  However, if you do it this way:
  1. Input (at location 53, mod 94)
  2. Branch to 82
  3. OP (at location 82, mod 94)
  4. branch to 59
  5. rotate (at location  59, mod 94)
  6. branch
Then each instruction is located where it is part of a 2-cycle.  So the first time it will do what is desired, the next time it will do nothing at all, then the next time each will do the correct thing again.. In fact you can do even better - you can execute the routine, then branch to each of the branches (this can work since the branch target is not coded with the branch, it's in data segment.  Thus the same branch can jump to multiple different locations.)  So why branch to a branch instead of going to the final location directly??  Because the side effect of a branch is to permute the instruction just before the branch target!  Thus executing the code, then chaining through the branches, restores the code (if 2-cycle) to the original state.

Many modifications are possible:  pre-munging, munge as you go, or post-fixing as described above.  Each is useful in some circumstances.

Building immutable NOPs

Only some locations have NOPs which can be loaded initially, and will always remain NOPs.  Such cycles exist for all locations (mod 94), but most never go through any official instruction and hence cannot be loaded directly.  However, all can be loaded by loading a constant from 129-255, then doing a single rotate (used as a divide by three.)  This gives numbers in the range of 43-85 (Note that the numbers must be evenly divisible by three since the LST will be rotated into the most significant trit).  Each location has at least one NOP loop that contains one such number.

Note that in the case of NOPs , as in branches, we do not need to worry about the cycle length, though for different reasons (branches don't change, and NOPs change but we don't care as long as they change into other NOPs).

Some observations on the OP operator:

OP is defined as:

| A trit:
________|_0__1__2_
0 | 1 0 0
*D 1 | 1 0 2
trit 2 | 2 2 1

If the memory (*D) is all ones, then the result is just the A register with 1s and 0s swapped.

If the A trit is all 2s, then the result is the memory with 1 and 2 swapped.

All the values that are easy to come by (instructions or input) will have 0s in their upper trits.  Thus after any OP they will have all 1s in these position.

You can set a memory location to a known value as follows:  First OP a location with itself.  (Any ROTATE or OP instruction will set the A register and memory to the same value). After resetting the D register, then OPing with itself, a location will contain only 0s and 1s.  Then if you OP with an A of 0, you'll get all ones.   If you OP with an A of all 1s,  you'll set A and memory to 0.

Loads and Stores:

You can synthesize a load from 10 rotates (which restores the original).  Alternatively, you can fill A with all 2s, then OP the location (which swaps 1s and 2s in the memory location. Then repeat this process, which swaps the memory back and loads A with the newly corrected value..  You can synthesize a 'store' by OPing twice into locations fulled with all '1's.  (If the memory trit is 1, then the OP bit gets written with 0 and 1 reversed and 2 kept the same.  If you do this twice you get the original back.)

Doing arithmetic in Malbolge


I suspect the best way to do arithmetic is by table lookup.  Even this is difficult (you need a table filled with computed values followed by an equally sized array of branch targets.  Then you load the value with a ROTATE or OP  instruction followed by a tables worth of immutable NOPs, followed by a LOAD and BRANCH.  Of course this scrambles your table entry, which must then be undone, and so on.).  This still looks easier than synthesizing arithmetic out of OPs and ROTATEs, at least to me.

Code Density

Most of these techniques require considerable code to perform simple operations.  In general this is no problem, and does not affect theoretical Turing completeness.  (If your primary concern is code density, perhaps Malbolge is not such a good choice of languages....) It is a problem in practice since only the bottom 256 locations can be easily addressed by any program you can input - any higher addresses must be synthesized.  This in itself takes lots of code.

A general strategy for writing larger Malbolge programs, and proving practical Turing completeness.

If the code can be made to fit, then a combinations of OPs, rotates, and computed branches should allow a bootstrap program that reads an arbitrary byte string into memory, which would help get around the 8 allowed character restriction, and allows use of more of the address space. Then it might be possible to write a BrainF***->Malbolge compiler, and so on....  This shows that Molbolge meets the practical definition of Turing Completeness - it can compute any problem that fits within its memory.  However, Malbolge can never meet the formal definition of Turing completeness, which requires access to an unlimited amount of memory.

Proving formal Turing completeness.

However, a very slight addition to Malbolge makes it truly and formally Turing complete.  There is nothing in the Malbolge spec that states that the INPUT and OUTPUT operations cannot refer to the same data stream, which then can be considered a tape with a 257 symbol alphabet.  (257 since INPUT can return the values 0..255, plus the special value 2222222222 upon end of file.  We'll assume this really means upon any attempt to read an undefined byte.)  INPUT can read the symbol on the tape and move the head one byte to the right, exactly what it currently does.  OUTPUT can write the symbol (A mod 256) to the current position and move the head one to the right, also exactly what it currently does.   Now for the change:  an OUTPUT with A=2222222222 (normally an obscure way to write the byte 168) moves the tape head one position to the left (in UNIX terms, it backs up the file location by one byte).  Call this variant Malboge-T, with the T standing for Turing completeness.

As far as I can determine, Malbolge-T would give results identical to classic Malbolge with all existing Malbolge programs*. (As if compatibility between Malbolge varients was a big practical problem).  However, since Malbolge-T has access to a potentially unlimited external memory, it at least has the possibility of being Turing complete. In fact it's not hard to show, using the table lookup techniques utilized in the BrainF***->Malbolge compiler, that a simple state machine with completely arbitrary state transitions can be implemented.  And if you can implement a relatively small arbitrary state machine (5 states times 5 symbols is sufficient),  and combine it with the bi-directional tape, then you can implement a Universal Turing Machine, and hence show true Turing completeness.

*except for the copy program above, which would now backup forever after EOF on the input, instead of spewing an infinite number of bytes with a value of 168.

It could be worse

Malbolge, although obviously difficult, could be worse.  Here are some suggestions for making it even tougher:
Happy programming,
Lou Scheffer


From Ryan Kusnery's weird languages page:

The day that someone writes, in Malbolge, a program that simply copies its input to it's output, is the day my hair spontaneously turns green. It's the day that elephants are purple and camels fly, and a cow can fit through a needle's eye.

I hereby release all my work on Malbolge, in any and all forms, into the public domain.
This page last modified 17 Apr 2015.
Back to LScheffer home page