Introduction to Malbolge
Malbolge,
for those not familiar with it, is a language designed to be difficult
(or perhaps impossible - until recently, there was not even an informal
argument showing Turing completeness) to
program in. For example, the effect of any instruction depends on
where it is located in memory (mod 94, of course), all instructions
are self-modifying (according to a permutation table) and both the
code and data pointers are incremented after every instruction, making
it hard to re-use any code or data. There is no way to initialize
memory except to one of the 8 instruction characters, there is no LOAD
or STORE operator, and the only available memory operators (both of
them) work in trinary and are designed to be opaque. The only control
flow construct is an unconditional computed jump, which is also nearly
worthless since there is no way (or certainly no obvious way) to set
memory to anything except the 8 instruction characters.
Believe it or not,
Counter
021010011
people (counted in trinary, of course)
have so far have expressed an interest in programming in Malbolge!
Note that if your first impression is that counting visitors in trinary
is rather inconvenient, then you are
missing the point of
Malbolge altogether.
Originally, information about Malbolge was published at the site below,
though this
site is now dead, according to the author:
http://www.mines.edu/students/b/bolmstea/malbolge/
Fortunately (or maybe not) this information has been preserved. A
copy of the original site was archived at
http://web.archive.org/web/20000815230017/http:/www.mines.edu/students/b/bolmstea/malbolge/
The language specification copied from the original site: Malbolge language
specification
The reference interpreter copied from the original site: Malbolge
Interpreter.
Note: Where the spec and the interpreter differ (for example, the
spec calls '<' an INPUT instruction and '/' an OUPUT, but the
interpreter does the opposite), in this work the interpreter is held to
be correct.
Although the language has been out for many years, the most complex
known program (until now) is 'hello, world', available in several
versions.
http://www.acooke.org/andrew/writing/malbolge.html
http://www.antwon.com/index.php?p=234
http://www.wikipedia.org/wiki/Malbolge_programming_language
http://www2.latech.edu/~acm/helloworld/malbolge.html
There is a claim that the '99 bottles of beer' program has been
written in Malbolge. ( See http://99-bottles-of-beer.ls-la.net/m.html)
The implication is that the program was doing looping, testing
and printing. However, closer examination shows that the
programmer was just doing a printf("") of the desired result using
straight line code. Conceptually this is exactly the same as the
'hello world' example above.
Malbolge as a cryptosystem
The correct way to think about Malbolge,
I'm convinced, is as a cryptographer and not a programmer. Think of it
as a complex code and/or algorithm that transforms input to output.
Then study it to see if you can take advantage of its weaknesses to
forge a message that produced the output you want.
Looked at as a cryptosystem, it has several weaknesses:
Some permutation cycles are short
First, the self modifying instructions do not form one large
permutation. (If they did, then any instruction executed enough
times would always turn into a 'HALT' at some point). So we can find
instructions, which when executed, turn into other instructions, and
then back. For example, an OP instruction, when located at location
20 (mod 94), will become a LOAD instruction, then a NOP, then another
NOP, then back to a OP instruction, and so on. Cycles are length 2,
9, 4, 5, 6, and 68, and the instructions executed by each cycle depend
on the starting position mod 94. In general the short cycles are more
useful than the long ones, but the long cycle at 2 mod 94 is very good,
as it cycles between input, output, and load D register
instructions. A list of all the cycles can be found here.
Jump instructions do not self modify
The next weakness is a biggie - any JUMP instruction is not self
modifying! This happens because the order is:
- Instruction at C is executed
- Instruction at C is scrambled by the permutation table
- C is incremented
But the branch instruction changes C between steps (1) and (2). Thus
the branch address is one less than the intended target, and neither
the branch instruction itself nor the target is modified (the work
before the target IS modified, but that's not so bad. In fact
we will use this to great advantage later). This is very helpful
since it's much easier to cope with changing instructions than
changing control flow. Also, once an instruction permutes into a
branch instruction, it will not change any further.
Initializing values
The next weakness is in the program reader. It is clear from the text
description that the intent is allow only valid
instructions to be written into memory, and the rest of memory will
be filled by the OP loop. This in general prevents the (ab)user from
starting with memory set to any useful values. However, close
inspection of the code reveals that non-printing characters (0-31) and
(128-255) are written directly to memory without checking (except
newline, tab, and a few other whitespace characters (those selected by
isspace()), which are ignored). One could argue this is simply a bug
in the interpreter, but taking advantage of a bug in the interpreter
seems very much in character (so to speak). This is very helpful,
allowing the programmer to ensure that the right branch target address
is in the right spot in memory, for example.
Using these weaknesses, I've succeeded in writing a Malbolge
program that copies its input to its output. Since some of it is
non-printing, here it is uu-encoded:
begin 666 copy.mb
M1"="04 _/CT\.SHY.#<V-30S,C$P+RXM+"LJ*2@G)B4D(R(A?GU\>WIY>'=V
M=71S<G%P;VYM;&MJ:6AG9F5D8V)A8%]>75Q;6EE85U955%-245!/3DU,2TI)
M2$=&141#0D% /SX]/#LZ.3@W-C4T,S(Q,"\N+2PK*BDH)R8E)",B(7Y]?'MZ
M>7AW=G5T<W)Q<&]N;6QK:FEH9V9E9&-B86!?7EU<6UI96%=655134E%03TY-
M3$M*24A'1D5$0R9?O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]Y+V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
MO;V]O>2]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]
DO;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;V]O;T*
end
Here are some more observations, not taken advantage of yet:
Getting aroung the effect of self modifying
instructions
All the various instructions appear in loops of length 2, although only
when located at certain memory locations. This means that you can
write a routine that does the right thing every other time, and
nothing the alternate times. However, since the instructions
will only do this when located at the proper spots, you need to
branch after each one. For example, suppose you wanted to do
- Input
- OP
- rotate
In that order. You can do this at any location, but only
once. If you try to execture it again, the instructions
will
have self modified in various ways, and the next attempt will do
something un-intended. However, if you do it this way:
- Input (at location 53, mod 94)
- Branch to 82
- OP (at location 82, mod 94)
- branch to 59
- rotate (at location 59, mod 94)
- branch
Then each instruction is located where it is part of a 2-cycle.
So the first time it will do what is desired, the next time it will do
nothing at all, then the next time each will do the correct thing
again.. In fact you can do even better - you can execute the routine,
then branch to each of the branches (this can work since the branch
target is not coded with the branch, it's in data segment. Thus
the same branch can jump to multiple different locations.) So why
branch to a branch instead of going to the final location
directly?? Because the side effect of a branch is to permute the
instruction just before the branch target! Thus executing the
code, then chaining through the branches, restores the code (if
2-cycle) to the original state.
Many modifications are possible: pre-munging, munge as you go,
or post-fixing as described above. Each is useful in some
circumstances.
Building immutable NOPs
Only some locations have NOPs which can be loaded initially, and will
always remain NOPs. Such cycles exist
for all locations (mod 94), but most never go through any official
instruction and hence cannot be loaded directly. However, all
can be loaded by loading a constant from 129-255, then doing a single
rotate (used as a divide by three.) This gives numbers in the
range of 43-85 (Note that the numbers must be evenly divisible by three
since the LST will be rotated into the most significant trit).
Each location has at least one NOP loop that contains one such
number.
Note that in the case of NOPs , as in branches, we do not need to
worry about the cycle length, though for different reasons (branches
don't change, and NOPs change but we don't care as long as they change
into other NOPs).
Some observations on the OP operator:
OP is defined as:
| A trit:
________|_0__1__2_
0 | 1 0 0
*D 1 | 1 0 2
trit 2 | 2 2 1
If the memory (*D) is all ones, then the result is just the A register
with 1s and 0s swapped.
If the A trit is all 2s, then the result is the memory with 1 and 2
swapped.
All the values that are easy to come by (instructions or input) will
have 0s in their upper trits. Thus after any OP they will have
all 1s in these position.
You can set a memory location to a known value as follows: First
OP a location with itself. (Any ROTATE or OP instruction will
set the A register and memory to the same value). After resetting the D
register, then OPing with itself, a location will contain only 0s and
1s. Then if you OP with an A of 0, you'll get all ones. If
you OP with an A of all 1s, you'll set A and memory to 0.
Loads and Stores:
You can synthesize a load from 10 rotates (which restores the
original). Alternatively, you can fill A with all 2s, then OP the
location (which swaps 1s and 2s in the memory location. Then repeat
this process, which swaps the memory back and loads A with the newly
corrected value.. You can synthesize a 'store' by OPing twice
into
locations fulled with all '1's. (If the memory trit is 1, then
the OP bit gets written with 0 and 1 reversed and 2 kept the
same. If you do this twice you get the original back.)
Doing arithmetic in Malbolge
I suspect the best way to do arithmetic is by table lookup. Even
this is difficult (you need a table filled with computed values
followed by an equally sized array of branch targets. Then you
load the value with a ROTATE or OP instruction followed by a
tables worth of immutable NOPs, followed by a LOAD and BRANCH. Of
course this scrambles your table entry, which must then be undone, and
so on.). This still looks easier than synthesizing arithmetic out
of OPs and ROTATEs, at least to me.
Code Density
Most of these techniques require considerable code to perform simple
operations. In general this is no problem, and does not affect
theoretical Turing completeness. (If your primary concern is
code density, perhaps Malbolge is not such a good choice of
languages....) It is a problem in practice since only the bottom 256
locations can be easily addressed by any program you can input - any
higher addresses must be synthesized. This in itself takes lots
of code.
A general strategy for writing larger Malbolge programs, and
proving practical Turing completeness.
If the code can be made to fit, then a combinations of OPs, rotates,
and computed branches should allow a bootstrap
program that reads an
arbitrary byte string into memory, which would help get around the 8
allowed character restriction, and allows use of more of the address
space. Then it might be possible to write a
BrainF***->Malbolge
compiler, and so on.... This shows that Molbolge meets the
practical definition of Turing Completeness - it can compute any
problem that fits within its memory. However, Malbolge can never
meet the formal
definition of Turing completeness, which requires access to an
unlimited amount of memory.
Proving formal Turing completeness.
However, a very slight addition to Malbolge makes it truly and formally
Turing complete. There is nothing in the Malbolge spec that
states that the INPUT and OUTPUT operations cannot refer to the same
data stream, which then can be considered a tape with a 257 symbol
alphabet. (257 since INPUT can return the values 0..255, plus the
special value 2222222222 upon end of file. We'll assume this
really means upon any attempt to read an undefined byte.) INPUT
can read the symbol on the tape and move the head one byte to the
right, exactly what it currently does. OUTPUT can write the
symbol (A mod 256) to the current position and move the head one to the
right, also exactly what it currently does. Now for the
change: an OUTPUT with A=2222222222 (normally an obscure way to
write the byte 168) moves the tape head one position to the left (in
UNIX terms, it backs up the file location by one byte). Call this
variant Malboge-T, with the T standing for Turing completeness.
As far as I can determine, Malbolge-T would give results
identical to classic Malbolge with all existing Malbolge programs*. (As
if compatibility between
Malbolge varients was a big practical problem). However, since
Malbolge-T has access to a potentially unlimited external memory, it at
least has the possibility of being Turing complete. In fact it's not
hard to show, using the table lookup techniques utilized in the BrainF***->Malbolge
compiler, that a simple state machine with completely arbitrary
state transitions can be implemented. And if you can implement a
relatively small arbitrary
state machine (5 states times 5 symbols is
sufficient), and combine it with the bi-directional tape, then
you can implement a
Universal Turing Machine, and hence show true Turing completeness.
*except for the copy program above, which would now backup forever
after EOF on the input, instead of spewing an infinite number of bytes
with a value of 168.
It could be worse
Malbolge, although obviously difficult, could be worse. Here are
some suggestions for making it even tougher:
- Redo the instruction permutation table to remove all short cycles.
- In particular, if every possible cycle for every location
contains at least one non-NOP instruction, then you could not even
construct a NOP that you could rely on.
- Remove the oversight in the reference interpreter that lets the
user load non-ascii values directly.
- You could make the OP even less useful by modifying it so that as
few rows and columns as possible contain all three digit values. This
makes it hard to set specific values that contain all three trits.
Alternatively, make OP so that as many rows and columns as
possible contain all three trit values. This makes it very hard
to set a memory address to anything unless you know the prior value.
- Modify instructions as they are fetched, not when they are done.
Then branches too would self modify.
Happy programming,
Lou Scheffer
From Ryan Kusnery's weird languages page:
The day that someone writes, in Malbolge,
a program that simply copies its input to it's output, is the day my
hair spontaneously turns green. It's the day that elephants are purple
and camels fly, and a cow can fit through a needle's eye.
I hereby release all my work on Malbolge, in any and all forms, into the public domain.
This page last modified 11 Dec 2007.
Back to LScheffer home page