Programming languages are categorized into a few different levels, which are described in the following sections.
The lowest-level programming languages are the machine languages. A machine language is a set of binary instruction codes that direct the activities of a Central Processing Unit (CPU). Everything a computer does is ultimately the result of running machine language instructions. No matter what language you use to program, the computer is always running a sequence of machine instructions in order to execute your program. The run time of any program is determined by how long that sequence is and, to some extent, by which instructions it contains. For example, multiply and divide instructions typically take longer than add and subtract instructions.
Each instruction consists of an operation code (opcode for short) and possibly some operands. For example, an add instruction contains a binary opcode that causes the CPU to initiate a sequence of operations to add two numbers, and usually two or three operands that specify where to get the terms to be added, and where to store the result.
The CPU reads instructions from memory, and the bits in the instruction trigger "switches" in the CPU, causing it to execute the instruction. For example, the following is a machine code add instruction for the MIPS microprocessor:
00000000010000110001100000100000
This instruction would cause the processor to add the contents of registers 2 and 3, and store the result back into register 3. The meaning of each bit in this instruction is depicted in Table 14.1, “Example MIPS Instruction”.
Table 14.1. Example MIPS Instruction
opcode | source1 | source2 | destination | unused | opcode continued |
---|---|---|---|---|---|
000000 | 00010 | 00011 | 00011 | 00000 | 100000 |
add | register 2 | register 3 | register 3 | - | add |
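The field layout in Table 14.1 can be checked with a few shifts and masks. The following Python sketch is only an illustration; the function name `decode_fields` and the dictionary keys are our own choices, taken from the column headings of the table rather than from any MIPS tool.

```python
# Split the example 32-bit MIPS instruction into the fields of Table 14.1.
# Field widths follow the table: 6, 5, 5, 5, 5, and 6 bits.

def decode_fields(word):
    """Return the six bit fields of a 32-bit instruction word."""
    return {
        "opcode": (word >> 26) & 0x3F,        # bits 31-26
        "source1": (word >> 21) & 0x1F,       # bits 25-21
        "source2": (word >> 16) & 0x1F,       # bits 20-16
        "destination": (word >> 11) & 0x1F,   # bits 15-11
        "unused": (word >> 6) & 0x1F,         # bits 10-6
        "opcode_continued": word & 0x3F,      # bits 5-0
    }

# The add instruction shown above, written as a binary string
word = int("00000000010000110001100000100000", 2)
fields = decode_fields(word)

for name, value in fields.items():
    print(f"{name:17} {value}")
```

For the example word, the sketch reports source registers 2 and 3, destination register 3, and the value 32 (binary 100000) in the final field, in agreement with the table.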
A CPU architecture is defined by its instruction set. For example, the Intel x86 family of architectures has a specific set of instructions, with variations that have evolved over time (8086, 8088, 80286, 80386, 80486, Pentium, Xeon, Core Duo, AMD Athlon, etc.). The x86 architectures have a completely different instruction set from the MIPS architecture, the ARM architecture, the RISC-V architecture, and so on.
The fact that machine language is specific to one architecture presents an obvious problem: Machine language is not portable. Machine language programs written for one architecture have to be completely rewritten in order to run on a different architecture.
In addition to the lack of portability, machine language programs tend to be very long, since the machine instructions are primitive. Most machine instructions can only perform a single, simple operation such as adding two numbers. Evaluating even a simple polynomial can take a sequence of a dozen or more instructions once the loads and stores of its terms are counted.
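To give a feel for how primitive individual instructions are, the Python sketch below evaluates the polynomial 3x² + 2x + 1 using only steps that each perform one simple operation. It is an analogy, not real machine code: the polynomial is an arbitrary example, and the names `r1` through `r4` are hypothetical stand-ins for CPU registers.

```python
# Evaluate 3x^2 + 2x + 1 one simple operation at a time, roughly the way
# a sequence of machine instructions would. r1..r4 stand in for registers.

def poly_step_by_step(x):
    r1 = x * x      # multiply: x^2
    r2 = 3 * r1     # multiply: 3x^2
    r3 = 2 * x      # multiply: 2x
    r4 = r2 + r3    # add: 3x^2 + 2x
    r4 = r4 + 1     # add: 3x^2 + 2x + 1
    return r4

def poly_one_line(x):
    # The same computation written as a single algebraic expression
    return 3 * x**2 + 2 * x + 1

print(poly_step_by_step(4), poly_one_line(4))
```

Even in this simplified form, five separate arithmetic steps are needed. A real machine language program would also need instructions to load the value of x and the constants into registers and to store the result, so the actual sequence would be longer.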
To program in machine language, one would have to memorize or look up the binary codes for opcodes and operands in order to read or write each instruction. This process is far too tedious and error-prone to allow for productive (or enjoyable) programming.
One of the first things early programmers did to make the job easier was to create a mnemonic, or symbolic, form of machine language that is easier for people to read. For example, instead of writing
00000000010000110001100000100000
a programmer could write
add $3, $2, $3
which is obviously much more intuitive.
Assembly language also makes it possible for the programmer to use named variables instead of numeric memory addresses, along with many other convenient features. However, the CPU can't understand this mnemonic form, so it has to be translated, or assembled, into machine language before the computer can run it. Hence, it was given the name assembly language.
While assembly language is much easier to read and write than machine language, it still suffers from the same two major problems: it is specific to one architecture, so programs are not portable, and each instruction still performs only a single, simple operation, so programs remain very long.
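The translation from mnemonic form to binary is mechanical, which is why it can be automated. As a minimal sketch, assuming only the add instruction and numeric register names as in the example above, the hypothetical function `assemble_add` below packs the three register numbers into the bit fields of Table 14.1. A real assembler handles a full instruction set, labels, named variables, and much more.

```python
import re

# Toy one-instruction "assembler": accepts only "add $dest, $src1, $src2"
# with numeric register names. Everything else is rejected.
ADD_PATTERN = re.compile(r"add\s+\$(\d+)\s*,\s*\$(\d+)\s*,\s*\$(\d+)$")

def assemble_add(line):
    """Translate one add instruction into a 32-character binary string."""
    match = ADD_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported instruction: {line!r}")
    dest, src1, src2 = (int(group) for group in match.groups())
    if max(dest, src1, src2) > 31:
        raise ValueError("register numbers must fit in 5 bits (0-31)")
    # The opcode (000000) and the unused field (00000) contribute only
    # zero bits, so just the registers and the final 100000 are set.
    word = (src1 << 21) | (src2 << 16) | (dest << 11) | 0b100000
    return format(word, "032b")

print(assemble_add("add $3, $2, $3"))
```

Running the sketch on `add $3, $2, $3` reproduces the 32-bit machine instruction shown earlier, 00000000010000110001100000100000.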
Early programmers yearned for the ability to write a mathematical expression or an English-like statement, and have the equivalent machine code generated automatically. In the 1950s, a team at IBM led by John Backus set out to do just that, and their efforts produced the first widely used high-level language, FORTRAN, which is short for FORmula TRANslator. The program that performed the translation from algebraic expressions and other convenient constructs to machine language was named a compiler.
FORTRAN made it much easier to write programs, since we could now write a one-line algebraic expression and let the compiler convert it to the long sequence of machine instructions. We could write a single print statement instead of the hundreds of assembly language instructions needed to perform common tasks like converting a number to the sequence of characters that are sent to our terminal.
In addition to making our programs much shorter and easier to understand, FORTRAN paved the way for another major benefit: portability. We could now write programs in FORTRAN, and by modifying the compiler to output machine code for different CPU architectures, we could run the same program on any type of computer.
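The idea of retargeting a compiler can be illustrated with a toy translator that accepts one kind of statement, `dest = left + right`, and emits assembly text for either of two architectures. This is only a sketch under our own assumptions: the function `compile_add`, the variable names, and the register assignments are all hypothetical, and a real compiler is vastly more elaborate.

```python
# Toy illustration of retargeting: one source statement, two back ends.
# The register assignments below are arbitrary choices for this sketch.
REGISTERS = {
    "mips": {"a": "$t0", "b": "$t1", "c": "$t2"},
    "x86": {"a": "eax", "b": "ebx", "c": "ecx"},
}

def compile_add(dest, left, right, target):
    """Emit assembly text for `dest = left + right` on the given target."""
    if target not in REGISTERS:
        raise ValueError(f"unknown target: {target}")
    regs = REGISTERS[target]
    d, l, r = regs[dest], regs[left], regs[right]
    if target == "mips":
        # MIPS add names a destination and two source registers
        return [f"add {d}, {l}, {r}"]
    # x86 add takes two operands and overwrites the first one
    if d == r:
        # dest already holds the right term; addition commutes
        return [f"add {d}, {l}"]
    return [f"mov {d}, {l}", f"add {d}, {r}"]

for target in ("mips", "x86"):
    print(target, compile_add("c", "a", "b", target))
```

The source statement never changes; only the back end selected by `target` does. That separation is what allows the same high-level program to run on different types of computers.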
What is machine language?
What is assembly language?
What are two disadvantages of machine language and assembly language, compared to high-level languages?
What are two advantages of high-level languages over machine language and assembly language?