ASCII table and history

Or, why does Ctrl+i insert a Tab in my terminal?

Introduction

To understand why Control+i inserts a tab in your terminal you need to understand ASCII, and to understand ASCII you need to know a bit about its history and the world it was developed in. Please bear with me (or just go to the table).

Teleprinters

Teleprinters evolved from the telegraph. Connect a printer and keyboard to a telegraph and you’ve got a teleprinter. Early versions were called “printing telegraphs”.

Teleprinters communicated using ITA2. For the most part this was just a standard to encode the alphabet, but there are a few control codes: WRU (“Who R U”) would cause the receiving teletype to send back its identification, BEL would ring a bell, and the familiar CR (Carriage Return) and LF (Line Feed).

This is all early 20th century stuff. There are no computers; it’s all mechanical, working with punched tape. ITA2 (and codes like it) were mechanically efficient; common letters such as “e” and “t” required only a single hole to be punched.

These 5-bit codes could only encode 32 characters, which is not enough. The solution was to add the FIGS and LTRS codes, which would switch between “figures” and “letters” mode. “FIGS R W” would produce “42”. This worked, but typo’ing a FIGS or LTRS, or losing one in line noise, would result in gibberish. Not ideal.
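The shift mechanism can be sketched in a few lines of Python. This is a simplified model, not real ITA2: it works on symbolic tokens rather than 5-bit codes, the names `FIGURES` and `decode` are made up for the illustration, only the digits on the top letter row are modelled, and starting in letters mode is an assumption.

```python
# Simplified, token-level model of ITA2 shifting (real ITA2 uses 5-bit codes).
# Digits share their codes with the top letter row: Q=1, W=2, ... P=0.
FIGURES = dict(zip("QWERTYUIOP", "1234567890"))

def decode(tokens):
    """Decode a list of tokens while tracking the LTRS/FIGS shift state."""
    figures_mode = False              # assumption: the receiver starts in letters mode
    out = []
    for tok in tokens:
        if tok == "FIGS":
            figures_mode = True
        elif tok == "LTRS":
            figures_mode = False
        elif figures_mode:
            # Only digits are modelled; anything else passes through unchanged.
            out.append(FIGURES.get(tok, tok))
        else:
            out.append(tok)
    return "".join(out)

print(decode(["FIGS", "R", "W"]))     # 42
print(decode(["R", "W"]))             # RW (the FIGS code was lost on the line)
```

The second call shows the failure mode: the same two codes arrive, but without the FIGS code in front of them the receiver prints letters instead of digits.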

Terminals

Teleprinters were also used to connect to computers, rather than other teleprinters. ITA2 was designed for mechanical machines and was awkward for this purpose, so ASCII was designed specifically for computer use and published in 1963. Teleprinters used with computers were called terminals (as in “end of a connection”, like “train terminal”).

It’s perhaps hard to imagine, but people really programmed computers using Teletypes. Here’s a video of a teleprinter in action, and here’s a cheesy (but interesting and cute) video which explains how they were used to program a PDP 11/10.

The Teletype model 33 was massively popular, and the brand name Teletype became synonymous with terminal. The abbreviated form of Teletype is TTY. Yes, like /dev/tty or /bin/stty on your modern Linux or macOS system.

A terminal would connect to a computer using a serial port (RS-232) which simply transferred characters. The best way to see a terminal is as a monitor with an integrated keyboard, rather than a computer on its own. A modern monitor connected with HDMI is told “draw this pixel in this colour”. In the 1960s the computer merely said “draw this character”.

Teleprinters and terminals connected with only a serial port sending characters, so they needed some way to communicate events such as “stop sending me data” or “end of transmission”. This is what control characters are for. The exact meaning of control characters has varied greatly over the years (which is why extensive termcap databases are required). ASCII is more than just a character set; it’s a way to communicate between a terminal and a computer.

An additional method of communication, which came along with visual terminals like the ADM-3A and VT100, is sending escape sequences. An escape sequence is a series of characters starting with the ESC control character (0x1b) which together have a special meaning. For example F1 is “<Esc>OP”, the left arrow is “<Esc>[D”, “<Esc>[2C” moves the cursor 2 positions forward, “<Esc>[4m” underlines all subsequent text, etc.
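As a small illustration, the sequences can be built as ordinary strings in Python. The variable names are made up for this sketch, and the reset sequence “<Esc>[0m” is not mentioned above; it is added here only so the underline does not leak into the rest of the terminal session.

```python
ESC = "\x1b"                    # the ESC control character, 27 / 0x1b in the table

underline_on = ESC + "[4m"      # underline all subsequent text
forward_two = ESC + "[2C"       # move the cursor 2 positions forward
reset = ESC + "[0m"             # addition for this sketch: turn attributes off again

line = underline_on + "underlined" + reset + forward_two + "plain"
print(line)                     # a VT100-compatible terminal interprets the sequences
print(repr(line))               # the raw characters that were actually sent
```

The second `print` makes the point of this section visible: what travels to the terminal is just a stream of characters, and the terminal decides what the ones after ESC mean.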

Modern systems and ASCII properties

All of this matters because modern terminals operate on the same fundamentals as those of the 60s. If you’re opening three windows with xterm or iTerm2 then you’re emulating three terminals connecting to a “mainframe”.

If you look at the ASCII table below then there are some interesting properties: in the 1st column you can see how the left two bits are always set to zero, and that the rest count to 31 (32 characters in total; it starts at 0). The 2nd column then repeats this pattern but with the 6th bit set to 1 (remember, read binary numbers from right-to-left, so that’s 6th from the right). The 3rd column repeats this pattern again with the 7th bit set, and the final column has both bits set.

The interesting part here is that the letters A-Z, as well as some punctuation, map directly to the control characters in the first column. All that’s needed is removing one bit, and that’s exactly what the Control key did: clear the 7th bit. Lowercase and uppercase letters align in the 3rd and 4th columns, and this is what the Shift key did: clear the 6th bit.

Pressing Control+a (lowercase) would mean sending !, which is not very useful. So most terminals interpret this as Control+A (uppercase), which sends SOH.
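The column layout and the Control and Shift behaviour described above can be checked with a few lines of Python. This is a sketch of the bit arithmetic only, not of how any particular terminal is implemented; the names `CTRL_BIT`, `CASE_BIT`, `split_code`, `control` and `shift` are made up here, and the mask values are read off the Binary column of the table.

```python
CTRL_BIT = 0x40    # "10 00000" in the table's Binary notation
CASE_BIT = 0x20    # "01 00000"

def split_code(code):
    """Split a 7-bit code into (column, row); both are counted from 0 here."""
    column = code >> 5       # the two leftmost bits
    row = code & 0b11111     # the five rightmost bits
    return column, row

def control(ch):
    """Code sent for Control+ch, with letters treated as uppercase."""
    return ord(ch.upper()) & ~CTRL_BIT

def shift(ch):
    """Character sent for Shift plus a lowercase letter."""
    return chr(ord(ch) & ~CASE_BIT)

# HT, ")", "I" and "i" share a row and differ only in the two leftmost bits.
for ch in ("\t", ")", "I", "i"):
    print(format(ord(ch), "07b"), split_code(ord(ch)))

print(hex(control("i")))            # 0x9: HT, the code the Tab key sends
print(hex(control("a")))            # 0x1: SOH
print(chr(ord("a") & ~CTRL_BIT))    # !: lowercase a with only that bit cleared
print(shift("a"))                   # A
```

Control+i and Tab both come out as 0x09, so once the byte is on the wire there is nothing left to tell them apart.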

DEL is last because all its bits are set to 1. This is how you “deleted” a character in punch tapes: punch all the holes!
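In bit terms, punching extra holes can only turn 0s into 1s, which is a bitwise OR. A minimal check in Python (the function name `overpunch` is made up for this sketch):

```python
DEL = 0x7f

def overpunch(code):
    """Punch every remaining hole in a 7-bit tape row that holds `code`."""
    return code | 0b1111111

# Whatever was on the tape before, the result is always DEL.
assert all(overpunch(code) == DEL for code in range(128))
print(format(DEL, "07b"))   # 1111111
```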

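This is kind of neat and well designed, but it does have some effects, even for modern terminals. The one from the title is the clearest example: Control+i sends 0x09, the same code the Tab key sends (HT), so a program running in the terminal cannot tell the two apart.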

The world has not completely stood still, and there have been some improvements since the 1960s, but terminals are still fundamentally ASCII-based text interfaces, and programs running inside a terminal – like a shell or Vim – still have very limited facilities for modern key events. Non-terminal programs don’t have these problems as they’re not restricted to a 1960s text interface.

Note: for brevity’s sake many details have been omitted in the above: ITA2 was derived from Murray code, the 1967 ASCII spec changed many aspects (1963 ASCII only had uppercase), there were other encodings (e.g. EBCDIC), graphical terminals such as the Tektronix 4014 (which xterm can emulate), ioctls, etc.

Stock Exchange printing telegraph, 1907
Image 1, a printing telegraph produced in 1907. The alphabetically sorted piano-style keys are a great example of how the first generation of a new innovation tends to resemble whatever already exists, and that it takes a few more innovations to really get the most out of it (this style of piano keyboard was introduced in the 1840s, and while the keyboard as we know it today was introduced in the 1870s, it took a while for it to replace all piano-style keyboards; this is probably among the last models that were made).
Teletype model 33 ASR, 1963
Image 2, the Teletype model 33 ASR, introduced in 1963. This is one of the first ASCII teleprinters. Note the machinery on the left; you could feed this with a punched tape to automatically type a program for you, similar to how you would now load a program from a disk.
Ken Thompson working on the PDP-11
Image 3, Ken Thompson working on the PDP-11 using a Teletype. What always struck me about this image is the atrocious ergonomics of … everything. The keyboard, the chair, everything about the posture: it’s all terrible. Inventing Unix almost seems easy compared to dealing with that!
DEC VT-100
Image 4, DEC VT100, a kind of terminal that a terminal emulator such as xterm emulates. It has a visual display and supports the essential escape sequences.

The table

Dec  Hex   Binary    Char     Dec  Hex   Binary    Char     Dec  Hex   Binary    Char     Dec  Hex   Binary    Char
0    0x0   00 00000  NUL      32   0x20  01 00000  SPACE    64   0x40  10 00000  @        96   0x60  11 00000  `
1    0x1   00 00001  SOH      33   0x21  01 00001  !        65   0x41  10 00001  A        97   0x61  11 00001  a
2    0x2   00 00010  STX      34   0x22  01 00010  "        66   0x42  10 00010  B        98   0x62  11 00010  b
3    0x3   00 00011  ETX      35   0x23  01 00011  #        67   0x43  10 00011  C        99   0x63  11 00011  c
4    0x4   00 00100  EOT      36   0x24  01 00100  $        68   0x44  10 00100  D        100  0x64  11 00100  d
5    0x5   00 00101  ENQ      37   0x25  01 00101  %        69   0x45  10 00101  E        101  0x65  11 00101  e
6    0x6   00 00110  ACK      38   0x26  01 00110  &        70   0x46  10 00110  F        102  0x66  11 00110  f
7    0x7   00 00111  BEL      39   0x27  01 00111  '        71   0x47  10 00111  G        103  0x67  11 00111  g
8    0x8   00 01000  BS       40   0x28  01 01000  (        72   0x48  10 01000  H        104  0x68  11 01000  h
9    0x9   00 01001  HT       41   0x29  01 01001  )        73   0x49  10 01001  I        105  0x69  11 01001  i
10   0xa   00 01010  LF       42   0x2a  01 01010  *        74   0x4a  10 01010  J        106  0x6a  11 01010  j
11   0xb   00 01011  VT       43   0x2b  01 01011  +        75   0x4b  10 01011  K        107  0x6b  11 01011  k
12   0xc   00 01100  FF       44   0x2c  01 01100  ,        76   0x4c  10 01100  L        108  0x6c  11 01100  l
13   0xd   00 01101  CR       45   0x2d  01 01101  -        77   0x4d  10 01101  M        109  0x6d  11 01101  m
14   0xe   00 01110  SO       46   0x2e  01 01110  .        78   0x4e  10 01110  N        110  0x6e  11 01110  n
15   0xf   00 01111  SI       47   0x2f  01 01111  /        79   0x4f  10 01111  O        111  0x6f  11 01111  o
16   0x10  00 10000  DLE      48   0x30  01 10000  0        80   0x50  10 10000  P        112  0x70  11 10000  p
17   0x11  00 10001  DC1      49   0x31  01 10001  1        81   0x51  10 10001  Q        113  0x71  11 10001  q
18   0x12  00 10010  DC2      50   0x32  01 10010  2        82   0x52  10 10010  R        114  0x72  11 10010  r
19   0x13  00 10011  DC3      51   0x33  01 10011  3        83   0x53  10 10011  S        115  0x73  11 10011  s
20   0x14  00 10100  DC4      52   0x34  01 10100  4        84   0x54  10 10100  T        116  0x74  11 10100  t
21   0x15  00 10101  NAK      53   0x35  01 10101  5        85   0x55  10 10101  U        117  0x75  11 10101  u
22   0x16  00 10110  SYN      54   0x36  01 10110  6        86   0x56  10 10110  V        118  0x76  11 10110  v
23   0x17  00 10111  ETB      55   0x37  01 10111  7        87   0x57  10 10111  W        119  0x77  11 10111  w
24   0x18  00 11000  CAN      56   0x38  01 11000  8        88   0x58  10 11000  X        120  0x78  11 11000  x
25   0x19  00 11001  EM       57   0x39  01 11001  9        89   0x59  10 11001  Y        121  0x79  11 11001  y
26   0x1a  00 11010  SUB      58   0x3a  01 11010  :        90   0x5a  10 11010  Z        122  0x7a  11 11010  z
27   0x1b  00 11011  ESC      59   0x3b  01 11011  ;        91   0x5b  10 11011  [        123  0x7b  11 11011  {
28   0x1c  00 11100  FS       60   0x3c  01 11100  <        92   0x5c  10 11100  \        124  0x7c  11 11100  |
29   0x1d  00 11101  GS       61   0x3d  01 11101  =        93   0x5d  10 11101  ]        125  0x7d  11 11101  }
30   0x1e  00 11110  RS       62   0x3e  01 11110  >        94   0x5e  10 11110  ^        126  0x7e  11 11110  ~
31   0x1f  00 11111  US       63   0x3f  01 11111  ?        95   0x5f  10 11111  _        127  0x7f  11 11111  DEL

The binary representation has the most significant bit first ("big endian").
ASCII is 7-bit; because many have called encodings such as CP437, ISO-8859-1, CP-1252, and others “extended ASCII” some are under the misapprehension that ASCII is 8-bit (1 byte).
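The difference is easy to demonstrate in Python. In this sketch the helper name `is_ascii` is made up, and the byte 0xE9 is an arbitrary example of a value with the eighth bit set; Python's built-in `latin-1` and `cp437` codecs stand in for ISO-8859-1 and CP437.

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. the top bit of each byte is clear."""
    return all(b < 0x80 for b in data)

print(is_ascii(b"Ctrl+i"))        # True

byte = bytes([0xE9])              # top bit set, so this is outside ASCII
print(is_ascii(byte))             # False

# The "extended" encodings disagree about what this byte means.
latin1 = byte.decode("latin-1")
cp437 = byte.decode("cp437")
print(latin1 == cp437)            # False

# ASCII itself has no character for it at all.
try:
    byte.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)
```

Everything below 0x80 decodes the same way in all of these encodings; only the upper half differs, which is why the “extended ASCII” label is misleading.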