ASCII table and history

Or, why does Ctrl+i insert a Tab in my terminal?

Introduction

To understand why Control+i inserts a Tab in your terminal you need to understand ASCII, and to understand ASCII you need to know a bit about its history and the world it was developed in. Please bear with me (or just go to the table).

Teleprinters

Teleprinters evolved from the telegraph. Connect a printer and keyboard to a telegraph and you’ve got a teleprinter. Early versions were called “printing telegraphs”.

Most teleprinters communicated using the ITA2 protocol. For the most part this would just encode the alphabet, but there are a few control codes: WRU (“Who R U”) would cause the receiving teleprinter to send back its identification, BEL would ring a bell, and it had the familiar CR (Carriage Return) and LF (Line Feed).

This is all early 20th century stuff. There are no electronic computers; it’s all mechanical working with punched tape. ITA2 (and codes like it) were mechanically efficient; common letters such as “e” and “t” required only a single hole to be punched.

These 5-bit codes could only encode 32 characters, which is not even enough for just English. The solution was to add the FIGS and LTRS codes, which would switch between “figures” and “letters” mode. “FIGS R W” would produce “42”. This worked, but typo’ing a FIGS or LTRS (or losing one in line noise) would result in gibberish. Not ideal.
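The FIGS/LTRS mechanism is essentially a tiny state machine. Here is a minimal Python sketch of it; it uses symbolic token names rather than the real 5-bit codes, only covers the figure meanings of the top letter row, and the `decode` helper is made up for illustration:

```python
# Figure-shift meanings of the top letter row in ITA2 (a subset only).
FIGURES = dict(zip("QWERTYUIOP", "1234567890"))

def decode(tokens):
    """Decode symbolic tokens while tracking the letters/figures shift state."""
    figs = False  # a receiver starts out in letters mode in this sketch
    out = []
    for tok in tokens:
        if tok == "FIGS":
            figs = True
        elif tok == "LTRS":
            figs = False
        elif figs:
            # "?" marks figures this sketch does not know about.
            out.append(FIGURES.get(tok, "?"))
        else:
            out.append(tok)
    return "".join(out)

print(decode(["FIGS", "R", "W"]))  # 42
print(decode(["R", "W"]))          # RW (the FIGS code got lost)
```

The second call shows the failure mode: drop a single FIGS and everything after it is read in the wrong mode.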

Terminals

In the 1950s teleprinters started to get connected to computers, rather than other teleprinters. ITA2 was designed for mechanical machines and was awkward to use with computers. ASCII was designed specifically for computer use and first published in 1963. Teleprinters used with computers were called terminals (as in “end of a connection”, like “train terminal”). Teleprinters were also called “TeleTYpewriter”, or TTY for short, and you can still find names like /dev/tty or /bin/stty on modern systems.

People really programmed computers using teleprinters. Here’s a video of a teleprinter in action, and here’s a somewhat cheesy (but interesting and cute) video which explains how they were used to program a PDP 11/10.

A terminal would connect to a computer with a serial port (RS-232), which simply transfers bytes back and forth. A terminal is more akin to a monitor with a keyboard than a computer on its own. A modern monitor connected with HDMI is told “draw this pixel in this colour”; in the 1960s the computer merely said “here are a bunch of characters”.

If you’re wondering what a “shell” is: a shell is a program to interact with your computer. It provides a command line, runs programs, and displays the result. The terminal just displays characters. It’s the difference between a TV and a DVD player.

Teleprinters needed some way to communicate events such as “stop sending me data” or “end of transmission”. This is what control characters are for. The exact meaning of control characters has varied greatly over the years (which is why extensive termcap databases are required). ASCII is more than just a character set; it’s a way to communicate between a terminal and a computer.

Escape sequences are an additional method of communication: a sequence of characters starting with the ESC control character (0x1b). For example F1 is <Esc>OP and the left arrow is <Esc>[D. Computers can give instructions to terminals, too: <Esc>[2C moves the cursor 2 positions forward and <Esc>[4m underlines all subsequent text. This is also how the Alt key works: Alt+a is <Esc>a.
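You can try the computer-to-terminal direction yourself. The following is a small Python sketch that builds two of the sequences mentioned above; the helper names are my own, the `<Esc>[0m` “reset attributes” sequence is an addition not mentioned above, and it assumes your terminal understands VT100-style sequences (most emulators do):

```python
ESC = "\x1b"  # the ESC control character, 0x1b

def cursor_forward(n):
    """Sequence asking the terminal to move the cursor n positions forward."""
    return f"{ESC}[{n}C"

def underline(text):
    """Wrap text in 'underline on' (4m) and 'reset attributes' (0m)."""
    return f"{ESC}[4m{text}{ESC}[0m"

# The terminal, not Python, interprets these bytes when they are printed.
print("a" + cursor_forward(2) + "b")
print(underline("underlined") + " plain")
```

Python just writes ordinary characters to its output; all the “magic” happens in the terminal that receives them.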

Modern systems and ASCII properties

All of this matters because modern terminals operate on the same principles as those of the 1960s. If you’re opening three xterm or iTerm2 windows then you’re emulating three terminals connecting to a “mainframe”.

If you look at the ASCII table below then there are some interesting properties: in the 1st column you can see how the left two bits are always set to zero, and that the other 5 bits count to 31 (32 characters in total; it starts at 0). The 2nd column repeats this pattern but with the 6th bit set to 1 (remember, read binary numbers from right-to-left, so that’s 6th from the right). The 3rd column repeats this pattern again with the 7th bit set, and the final column has both bits set.
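Put differently, the top two bits of a character select the column and the low five bits select the row. A quick Python sketch to check this (the `position` helper is made up for illustration, and columns are numbered from 0 inside it):

```python
def position(char):
    """Return (column, row) of an ASCII character in the four-column table."""
    code = ord(char)
    if code > 0x7f:
        raise ValueError("not an ASCII character")
    # Top two bits pick the column, low five bits pick the row.
    return code >> 5, code & 0b11111

for ch in ["\t", ")", "I", "i"]:
    col, row = position(ch)
    print(f"{ord(ch):07b}  column {col + 1}, row {row}")
```

All four characters end up in row 9; they differ only in the two column bits.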

The interesting part here is that the letters A-Z and some punctuation map directly to the control characters in the 1st column. All that’s needed is removing one bit, and that’s exactly what the Control key did: clear the 7th bit. Lowercase and uppercase letters align in the 3rd and 4th columns, and this is what the Shift key did: clear the 6th bit.

Pressing Control+i (lowercase) would mean sending “)”, which is not very useful. So most terminals interpret this as Control+I (uppercase), which sends HT. DEL is last so that all of its bits are set to 1. This is how you “deleted” a character on punched tape: punch all the holes!
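The bit-clearing described above is easy to verify. Here is a minimal Python sketch; the function and constant names are my own, and `shift` only models letters (it says nothing about what Shift does to digits or punctuation on a real keyboard):

```python
CTRL_BIT = 0b1000000   # 7th bit from the right (0x40)
SHIFT_BIT = 0b0100000  # 6th bit from the right (0x20)

def control(char):
    """Clear the 7th bit, as the Control key is described as doing."""
    return chr(ord(char) & ~CTRL_BIT)

def shift(char):
    """Clear the 6th bit, turning a lowercase letter into uppercase."""
    return chr(ord(char) & ~SHIFT_BIT)

print(repr(control("I")))         # '\t' (HT)
print(repr(control("i")))         # ')'
print(repr(control(shift("i"))))  # '\t'
print(repr(control("[")))         # '\x1b' (ESC)
print(f"{0x7f:07b}")              # 1111111 (DEL: all holes punched)
```

This is also why Control+[ works as Escape: it is the same bit trick applied to punctuation in the 3rd column.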

This is kind of neat and well designed, but for us it means:

The world has not completely stood still and there have been improvements since the 1960s, but terminals are still fundamentally ASCII-based text interfaces, and programs running inside a terminal – like a shell or Vim – still have very limited facilities for modern key events. Non-terminal programs don’t have these problems as they’re not restricted to a 1960s text interface.

Note: for brevity’s sake many aspects have been omitted in the above: ITA2 was derived from Murray code, the 1967 ASCII spec changed many aspects (1963 ASCII only had uppercase), there were other encodings (e.g. EBCDIC), graphical terminals such as the Tektronix 4014 (which xterm can emulate), ioctls, etc.

References and further reading: An annotated history of some character codes; 7-bit character sets; Control characters in ASCII and Unicode; The TTY demystified.

Stock Exchange printing telegraph, 1907
Image 1, a printing telegraph produced in 1907. The alphabetically sorted piano keys are a great example of how the first generation of an innovation tends to resemble whatever already exists, and that it takes a few more iterations to really get the most out of it. This style of piano keyboard was introduced in the 1840s, and while the keyboard as we know it today was introduced in the 1870s, it took a while for it to replace all piano-style keyboards; this is probably among the last models that were made.
Teletype model 33 ASR, 1963
Image 2, the Teletype model 33 ASR, introduced in 1963. This is one of the first ASCII teleprinters. Note the machinery on the left; you could feed this with a punched tape to automatically type a program for you, similar to how you would now load a program from a disk. The Teletype model 33 was massively popular, and the brand name Teletype became synonymous with terminal.
Ken Thompson working on the PDP-11
Image 3, Ken Thompson working on the PDP-11 using a Teletype (model 33?). What always struck me about this image is the atrocious ergonomics of … everything. The keyboard, the chair, everything about the posture: it’s all terrible. Inventing Unix almost seems easy compared to dealing with that!
DEC VT100
Image 4, DEC VT100, a kind of terminal that a terminal emulator such as xterm emulates. It has a visual display and supports the essential escape sequences still in use today. These were known as “visual terminals”, referring to the visual screen with characters, as opposed to printing them out.

The table

Dec  Hex   Binary    Char    Dec  Hex   Binary    Char    Dec  Hex   Binary    Char    Dec  Hex   Binary    Char
0    0x00  00 00000  NUL     32   0x20  01 00000  SPACE   64   0x40  10 00000  @       96   0x60  11 00000  `
1    0x01  00 00001  SOH     33   0x21  01 00001  !       65   0x41  10 00001  A       97   0x61  11 00001  a
2    0x02  00 00010  STX     34   0x22  01 00010  "       66   0x42  10 00010  B       98   0x62  11 00010  b
3    0x03  00 00011  ETX     35   0x23  01 00011  #       67   0x43  10 00011  C       99   0x63  11 00011  c
4    0x04  00 00100  EOT     36   0x24  01 00100  $       68   0x44  10 00100  D       100  0x64  11 00100  d
5    0x05  00 00101  ENQ     37   0x25  01 00101  %       69   0x45  10 00101  E       101  0x65  11 00101  e
6    0x06  00 00110  ACK     38   0x26  01 00110  &       70   0x46  10 00110  F       102  0x66  11 00110  f
7    0x07  00 00111  BEL     39   0x27  01 00111  '       71   0x47  10 00111  G       103  0x67  11 00111  g
8    0x08  00 01000  BS      40   0x28  01 01000  (       72   0x48  10 01000  H       104  0x68  11 01000  h
9    0x09  00 01001  HT      41   0x29  01 01001  )       73   0x49  10 01001  I       105  0x69  11 01001  i
10   0x0a  00 01010  LF      42   0x2a  01 01010  *       74   0x4a  10 01010  J       106  0x6a  11 01010  j
11   0x0b  00 01011  VT      43   0x2b  01 01011  +       75   0x4b  10 01011  K       107  0x6b  11 01011  k
12   0x0c  00 01100  FF      44   0x2c  01 01100  ,       76   0x4c  10 01100  L       108  0x6c  11 01100  l
13   0x0d  00 01101  CR      45   0x2d  01 01101  -       77   0x4d  10 01101  M       109  0x6d  11 01101  m
14   0x0e  00 01110  SO      46   0x2e  01 01110  .       78   0x4e  10 01110  N       110  0x6e  11 01110  n
15   0x0f  00 01111  SI      47   0x2f  01 01111  /       79   0x4f  10 01111  O       111  0x6f  11 01111  o
16   0x10  00 10000  DLE     48   0x30  01 10000  0       80   0x50  10 10000  P       112  0x70  11 10000  p
17   0x11  00 10001  DC1     49   0x31  01 10001  1       81   0x51  10 10001  Q       113  0x71  11 10001  q
18   0x12  00 10010  DC2     50   0x32  01 10010  2       82   0x52  10 10010  R       114  0x72  11 10010  r
19   0x13  00 10011  DC3     51   0x33  01 10011  3       83   0x53  10 10011  S       115  0x73  11 10011  s
20   0x14  00 10100  DC4     52   0x34  01 10100  4       84   0x54  10 10100  T       116  0x74  11 10100  t
21   0x15  00 10101  NAK     53   0x35  01 10101  5       85   0x55  10 10101  U       117  0x75  11 10101  u
22   0x16  00 10110  SYN     54   0x36  01 10110  6       86   0x56  10 10110  V       118  0x76  11 10110  v
23   0x17  00 10111  ETB     55   0x37  01 10111  7       87   0x57  10 10111  W       119  0x77  11 10111  w
24   0x18  00 11000  CAN     56   0x38  01 11000  8       88   0x58  10 11000  X       120  0x78  11 11000  x
25   0x19  00 11001  EM      57   0x39  01 11001  9       89   0x59  10 11001  Y       121  0x79  11 11001  y
26   0x1a  00 11010  SUB     58   0x3a  01 11010  :       90   0x5a  10 11010  Z       122  0x7a  11 11010  z
27   0x1b  00 11011  ESC     59   0x3b  01 11011  ;       91   0x5b  10 11011  [       123  0x7b  11 11011  {
28   0x1c  00 11100  FS      60   0x3c  01 11100  <       92   0x5c  10 11100  \       124  0x7c  11 11100  |
29   0x1d  00 11101  GS      61   0x3d  01 11101  =       93   0x5d  10 11101  ]       125  0x7d  11 11101  }
30   0x1e  00 11110  RS      62   0x3e  01 11110  >       94   0x5e  10 11110  ^       126  0x7e  11 11110  ~
31   0x1f  00 11111  US      63   0x3f  01 11111  ?       95   0x5f  10 11111  _       127  0x7f  11 11111  DEL

The binary representation has the most significant bit first (“big endian”).
ASCII is 7-bit. Because encodings such as CP437, ISO-8859-1, CP-1252, and others are often called “extended ASCII”, some people are under the misapprehension that ASCII is 8-bit (1 byte).
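A quick way to see the difference is to check whether the 8th bit is ever used. This is a small Python sketch; the `fits_in_7_bits` helper is made up for illustration, and `latin-1` is Python’s name for ISO-8859-1:

```python
def fits_in_7_bits(data):
    """True if every byte has its 8th bit clear, i.e. is in the range 0-127."""
    return all(byte < 0x80 for byte in data)

# Plain ASCII text, including a control character, stays below 0x80.
print(fits_in_7_bits("Tab\t".encode("ascii")))     # True

# "é" is 0xe9 in ISO-8859-1, which needs the 8th bit.
print(fits_in_7_bits("\u00e9".encode("latin-1")))  # False
```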