University of Nevada Las Vegas
Howard R. Hughes
College of Engineering
School of Computer Science
My Home Page

CSC 456/656 Binary Encoding of a Turing Machine
Revised October 30, 2005
In class, I frequently write <M> and say "the binary encoding of M."
An encoding of TM (or, for that matter, an encoding of any object)
Is a string that fully describes that TM, in the sense that
a machine that emulates M need only read that string to know everything
it needs to know about M.
There are many encoding schemes for Turing machines, but they all must
have the following properties:
-
Given a TM M, there is some encoding <M> of M using that scheme. (There
may be more than one choice of <M> for the same M, though.)
-
There is a machine that can emulate any Turing machine by reading its
encoding. We call this machine the "emulator."
-
If x and y are both encodings of Turing machines, then x cannot be
a proper prefix of y. (This is to ensure that the machine that
emulates TMs by reading their encodings will not become confused if
the encoding is followed by another string.)
We will restrict our universe of Turing machines to those whose
input alphabet Sigma is the binary alphabet {0,1}. We still allow
the tape alphabet Gamma to include other symbols. A TM with any
input alphabet is equivalent to one with the binary input alphabet,
provided we provide conversion of the input string to binary.
A Humanly Readable Encoding Scheme
Our encoding alphabet will consist of commas, digits, letters, an end of line
symbol, and an end of file symbol.
Each line of the humanly readable encoding contains the encoding of one
transition.
Each state has a unique name, which can be any string of digits and letters.
The start state has the name "0". The accept state has the name "accept"
and the reject state has the name "reject".
Each tape symbol has a unique name, which can be any string of digits and
letters. The name of the blank is the empty string. The name of the
binary symbol "0" is "0". The name of the binary symbol "1" is "1".
The name of the command to move right is "R".
The name of the command to move right is "L".
The name of the command to stay is "S".
The encoding of a transition is
the name of the a state, followed by a comma,
followed by the name of a tape symbol, followed by a comma,
followed by the name of a tape symbol, followed by a comma,
followed by the name of a move command, followed by a comma,
followed by the name of a state, followed by the end of line symbol.
The humanly readable encoding of M consists of a number of lines,
one for each transition of M, followed by the end of file symbol.
Here is a humanly readable encoding of the Turing machine
shown in Figure 3.8, on page 144 of the second edition of
Introduction to the Theory of Computation, by Michael Sipser.
Although this encoding lists the transitions in lexical order, that
is unnecessary. If I permute the rows, I simply get a different
encoding for the same machine.
A Binary Encoding Scheme
Finally, <M> is the encoding of the humanly readable encoding
into binary, using the Ascii code.
Note that <M> always ends with the Ascii encoding of the end of file
symbol, and that a machine reading <M> will not entounter this
symbol before eaching the end of <M>.
How the Emulator Works
It is simpler to describe the emulator E as a machine with one "regular"
tape and two "special" tapes, the "code tape" and the "configuration tape."
The input string of the emulator is
<M>w,
where w is an arbitrary binary string. (Note that w could contain
any number of instances of the Ascii code for end of file.)
U first loads <M> onto its code tape.
It then loads #0#w onto its configuration tape.
E then emulates the steps of M, one by one. For each emulated step,
the current state is encoded after the first octotroph on the configuration
tape, while the current symbol is encoded after the second octotroph
on the configuration tape. E searches for the first octotroph, then
searches the code tape for a transition which begins with the
name of the current state, then a comma, then the name of the current symbol.
If it does not find a match, E hangs. If it does find a match,
E uses the rest of the transition line to alter the configuration tape
accordingly. If the new state is a halt state, E halts.
The Halting Problem and the Universal Turing Machine
Let Lhalt = {<M>w | w is accepted by M}.
This is the same language as ATM as defined
on page 174 of the second edition of
Introduction to the Theory of Computation, by Michael Sipser, and
is also known as the halting problem.
By the Church Turing Thesis, we can replace the emulator E defined
above by a Turing machine, U, which is then called a
universal
Turing machine.