Many operating systems that came before Unix treated each input or output device differently. Each time a new device became available, programs would have to be modified in order to access it. This is intuitive, since the devices all look different and perform different functions.
The Unix designers realized that this is actually unnecessary and a waste of programming effort, so they employed the concept of device independence. Unix device independence works by treating virtually every input and output device exactly like an ordinary file. All input and output, whether to/from a file on a disk, a keyboard, a mouse, a scanner, or a printer, is simply a stream of bytes to be input or output by a program.
Most I/O devices are actually accessible as a device file under /dev. For example, the primary CD-ROM might appear as /dev/cd0, and the main disk as /dev/ad0.
Data are often recovered from corrupted file systems or accidentally deleted files by reading the raw disk partition as a file using standard Unix commands such as grep!
shell-prompt: grep string /dev/ad0s1f
To see the raw input from a mouse as it is being moved, one could use the following command:
shell-prompt: hexdump /dev/mouse
cat /dev/mouse would also work, but the binary data stream would appear as garbage on the terminal screen.
Some years ago while mentoring my son's robotics team, as part of a side project, I reverse-engineered a USB game pad so I could control a Lego robot via Bluetooth from a laptop. Thanks to device-independence, no special software was needed to figure out the game pad's communication protocol.
After plugging the game pad into my FreeBSD laptop, the dmesg command shows the name of the new device file created under /dev:
ugen1.2: <vendor 0x046d product 0xc216> at usbus1
uhid0 on uhub3
uhid0: <vendor 0x046d product 0xc216, class 0/0, rev 1.10/3.00, addr 2> on usbus1
One can then view the input from the game pad using hexdump. It was easy to see that moving the right joystick up resulted in lower numbers in the 3rd and 7th columns, while moving down increased the values. Center position sends a value around 8000 (hexadecimal), fully up is around 0, fully down is ffff. Analogous results were seen for the other joystick and left or right motion, as well as the various buttons. It was then relatively easy to write a small program to read the joystick position from the game pad and send commands over Bluetooth to the robot, adjusting motor speeds accordingly. Sending commands over Bluetooth is also done with the same functions as writing to a file.
FreeBSD manatee.acadix bacon ~ 410: hexdump /dev/uhid0
0000000 807f 7d80 0008 fc04 807f 7b80 0008 fc04
0000010 807f 7780 0008 fc04 807f 6780 0008 fc04
0000020 807f 5080 0008 fc04 807f 3080 0008 fc04
0000030 807f 0d80 0008 fc04 807f 0080 0008 fc04
0000060 807f 005e 0008 fc04 807f 005d 0008 fc04
0000070 807f 0060 0008 fc04 807f 0063 0008 fc04
0000080 807f 006c 0008 fc04 807f 0075 0008 fc04
0000090 807f 0476 0008 fc04 807f 1978 0008 fc04
00000a0 807f 4078 0008 fc04 807f 8c7f 0008 fc04
00000b0 807f 807f 0008 fc04 807f 7f7f 0008 fc04
00000c0 807f 827f 0008 fc04 807f 847f 0008 fc04
00000d0 807f 897f 0008 fc04 807f 967f 0008 fc04
00000e0 807f a77f 0008 fc04 807f be80 0008 fc04
00000f0 807f d980 0008 fc04 807f f780 0008 fc04
0000100 807f ff80 0008 fc04 807f ff83 0008 fc04
0000110 807f ff8f 0008 fc04 807f ff93 0008 fc04
It's interesting to note that the hexdump command first appeared in 4.3BSD, years before USB debuted and more than a decade before USB game pads existed. I could just as easily have used the od (octal dump) command, which was part of the original AT&T Unix in the early 1970s. The developers could not possibly have imagined that these programs would one day be used this way. They were intended for examining binary files and perhaps input from devices of the time, but because of device independence, they never need to be altered to work with new devices connected to a Unix system. The ability to use software without modification on devices invented decades later is the mark of intelligent software engineering.
Since I/O devices and files are so interchangeable, Unix shells provide a facility called redirection to easily interchange them for any command without the command even knowing it.
Redirection depends on the notion of a file stream. You can think of a file stream as a hose connecting a program to a particular file or device. Redirection simply disconnects the hose from the default file or device and connects it to another one chosen by the shell user.
Every Unix process has three standard streams that are open from the moment the process is born. The standard streams are normally connected to the terminal, as shown in Table 1.9, “Standard Streams”.
Table 1.9. Standard Streams
|Stream||Purpose||Default Connection|
|Standard Input||User input||Terminal keyboard|
|Standard Output||Normal output||Terminal screen|
|Standard Error||Errors and warnings||Terminal screen|
Redirection in the shell allows any or all of the three standard streams to be disconnected from the terminal and connected to a file or other I/O device. It uses operators within the commands to indicate which stream(s) to redirect and where. The basic redirection operators provided by common shells are shown in Table 1.10, “Redirection Operators”.
Table 1.10. Redirection Operators
|Operator||Shell Family||Redirects|
|>||All||Standard Output (overwrite)|
|>>||All||Standard Output (append)|
|2>||Bourne-based||Standard Error (overwrite)|
|2>>||Bourne-based||Standard Error (append)|
|>&||C shell-based||Standard Output and Standard Error (overwrite)|
|>>&||C shell-based||Standard Output and Standard Error (append)|
Using output redirection (>, 2>, or >&) in a command will normally overwrite (clobber) the file that you're redirecting to, even if the command itself fails.
Be very careful not to use output redirection accidentally. This most commonly occurs when a careless user meant to use input redirection, but pressed the wrong key.
The moment you press Enter after typing a command containing "> filename", filename will be erased! Remember that the shell performs redirection, not the command, so filename is clobbered before the command even begins running.
If the noclobber option is set for the shell, output redirection to a file that already exists will result in an error. The noclobber option can be overridden by appending a ! to the redirection operator in C shell derivatives or a | in Bourne shell derivatives. For example, >! can be used to force overwriting a file in csh or tcsh, and >| can be used in sh, ksh, or bash.
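The behavior can be sketched in Bourne-family syntax, where set -C is the POSIX spelling of the noclobber option (the file name demo-out.txt is arbitrary):

```shell
set -C                                      # enable noclobber
echo first > demo-out.txt                   # succeeds: file does not exist yet
( echo second > demo-out.txt ) 2>/dev/null \
    || echo "refused to clobber demo-out.txt"
echo third >| demo-out.txt                  # >| overrides noclobber
cat demo-out.txt                            # now contains: third
rm demo-out.txt
```

In csh or tcsh, the equivalent override would be `echo third >! demo-out.txt`.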
shell-prompt: ls > listing.txt          # Overwrite with listing of .
shell-prompt: ls /etc >> listing.txt    # Append listing of /etc
Note that redirection is performed by the shell, not the program. In the examples above, the ls command sends its output to the standard output. It is unaware that the standard output has been redirected to a file. Put another way, > listing.txt is not an argument to the ls command. The redirection is handled by the shell, and ls runs as if it had simply been typed as:

shell-prompt: ls
More often than not, we want to redirect both normal output and error messages to the same place. This is why C shell and its derivatives use a combined operator that redirects both at once. The same effect can be achieved with Bourne-shell derivatives using another operator that redirects one stream to another stream. In particular, we redirect the standard output (stream 1) to a file (or device) and at the same time redirect the standard error (stream 2) to stream 1.
shell-prompt: find / -name '*.c' > list.txt 2>&1
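The order of the redirections matters here: 2>&1 copies wherever stream 1 points at that moment, so it must come after the file redirection. A sketch contrasting the two orders (both.txt and only-stdout.txt are arbitrary file names):

```shell
# 2>&1 after the file redirection: stream 1 already points at the file
# when stream 2 copies it, so both streams go to both.txt.
ls /nonexistent > both.txt 2>&1

# 2>&1 before the file redirection: stream 2 copies stream 1 while it
# still points at the terminal, so errors stay on the screen and only
# normal output goes to only-stdout.txt.
ls /nonexistent 2>&1 > only-stdout.txt
```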
If a program takes input from the standard input, we can redirect input from a file as follows:
shell-prompt: command < input-file
For example, consider the bc command, an arbitrary-precision calculator which inputs numerical expressions from the standard input and writes the results to the standard output:
shell-prompt: bc
3.14159265359 * 4.2 ^ 2 + sqrt(30)
60.89491440932
quit
In the example above, the user entered "3.14159265359 * 4.2 ^ 2 + sqrt(30)" and "quit" and the bc program output "60.89491440932". We can place the input shown above in a file using any text editor, such as nano or vi, or by any other means:
shell-prompt: cat > bc-input.txt
3.14159265359 * 4.2 ^ 2 + sqrt(30)
quit
(Type Ctrl+d to signal the end of input to the cat command)
shell-prompt: more bc-input.txt
3.14159265359 * 4.2 ^ 2 + sqrt(30)
quit
Now that we have the input in a file, we can feed it to the bc command using input redirection instead of retyping it on the keyboard:
shell-prompt: bc < bc-input.txt 60.29203070318
Although it may seem a little confusing and circular, the standard streams themselves are represented as device files on Unix systems. This allows us to redirect one stream to another without modifying a program, by appending output to one of the device files /dev/stdout or /dev/stderr. For example, if a program sends output to the standard output and we want to send it instead to the standard error, we could do the following:
printf "Oops!" >> /dev/stderr
If we would like to discard output sent to the standard output or standard error, we can redirect it to /dev/null. For example, to see only error messages (standard error) from myprog, we could do the following:
./myprog > /dev/null
To see only normal output and not error messages, assuming Bourne shell:
./myprog 2> /dev/null
The device /dev/zero is a readable file that produces a stream of zero bytes.
The device /dev/random is a readable file that produces a stream of random integers in binary format.
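Both devices can be read like ordinary files. A brief sketch (the file name zeros.bin and the byte counts are arbitrary; od is used since it is universally available):

```shell
# Create a 1024-byte file of zero bytes by reading /dev/zero.
dd if=/dev/zero of=zeros.bin bs=512 count=2

# View 16 random bytes from /dev/random in hexadecimal.
head -c 16 /dev/random | od -x

rm zeros.bin
```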
Quite often, we may want to use the output of one program as input to another. Such a thing could be done using redirection, as shown below:
shell-prompt: sort names.txt > sorted-names.txt
shell-prompt: uniq < sorted-names.txt > unique-names.txt
The same task can be accomplished in one command using a pipe. A pipe redirects one of the standard streams, just as redirection does, but to another process instead of to a file or device. In other words, we can use a pipe to send the standard output and/or standard error of one process directly to the standard input of another process.
Example 1.4. Simple Pipe
The command below uses a pipe to redirect the standard output of the sort command directly to the standard input of the uniq command.
shell-prompt: sort names.txt | uniq > uniq-names.txt
Since a pipe runs multiple commands in the same shell, it is necessary to understand the concept of foreground and background processes, which are covered in detail in Section 1.19, “Process Control”.
Multiple processes can output to a terminal at the same time, although the results would obviously be chaos in most cases.
Only one process can receive input from the keyboard, however.
The foreground process running under a given shell process is defined as the process that receives the input from the standard input device (usually the keyboard). This is the only difference between a foreground process and a background process.
When running a pipeline command, the last process in the pipeline is the foreground process. All others run in the background, i.e. do not use the standard input device inherited from the shell process. Hence, when we run:
shell-prompt: find /etc | more
It is the more command that receives input from the keyboard. The more command has its standard input redirected from the standard output of find, and the standard input of the find command is effectively disabled.
The more command is somewhat special: Since its standard input is redirected from the pipe, it opens another stream to connect to the keyboard so that the user can interact with it, pressing the space bar for another screen, etc.
For piping stderr, the notation is similar to that used for redirection:
Table 1.11. Pipe Operators
|Operator||Shell Family||Redirects|
||||All||Standard Output to Standard Input|
||&||C shell family||Standard Output and Standard Error to Standard Input|
|2|||Bourne shell family||Standard Error to Standard Input|
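In Bourne-family shells that lack a dedicated operator, the same effect of piping only the standard error can be sketched by swapping the streams explicitly (the path names here are arbitrary):

```shell
# Point stream 2 at the pipe (by copying stream 1, which is the pipe at
# this moment), then discard stream 1.  Only error messages reach wc.
ls /etc /nonexistent-file 2>&1 >/dev/null | wc -l
```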
The entire chain of commands connected by pipes is known as a pipeline.
This is such a common practice that Unix has defined the term filter to apply to programs that can be used in this way. A filter is any command that can receive input from the standard input and send output to the standard output. Many Unix commands are designed to accept file names as arguments, but also to use the standard input and/or standard output if no filename arguments are provided.
Example 1.5. Filters
The more command is commonly used as a filter. It can read a file whose name is provided as an argument, but will use the standard input if no argument is provided. Hence, the following two commands have the same effect:
shell-prompt: more names.txt
shell-prompt: more < names.txt
The only difference between these two commands is that in the first, the more command receives names.txt as a command line argument, opens the file itself (creating a new file stream), and reads from the new stream (not the standard input stream). In the second instance, the shell opens the file and connects the standard input stream of the more command to it.
Using the filtering capability of more, we can paginate the output of any command:
shell-prompt: ls | more
shell-prompt: find . -name '*.c' | more
shell-prompt: sort names.txt | more
We can string any number of commands together using pipes. For example, the following pipeline sorts the names in names.txt, removes duplicates, filters out all names not beginning with 'B', and shows the first 100 results one page at a time.
shell-prompt: sort names.txt | uniq | grep '^B' | head -n 100 | more
One more useful tool worth mentioning is the tee command. tee is a simple program that reads from its standard input and writes to both the standard output and to one or more files whose names are provided on the command line. This allows you to view the output of a program on the screen and redirect it to a file at the same time.
shell-prompt: ls | tee listing.txt
Recall that Bourne-shell derivatives do not have combined operators for redirecting the standard output and standard error at the same time. Instead, we redirect the standard output to a file or device, and redirect the standard error to the standard output using 2>&1. We can use the same technique with a pipe, but there is one more condition: for technical reasons, the 2>&1 must come before the pipe.
shell-prompt: ls | tee listing.txt 2>&1    # Won't work
shell-prompt: ls 2>&1 | tee listing.txt    # Will work
The yes command produces a stream of y's followed by newlines. It is meant to be piped into a program that prompts for y's or n's in response to yes/no questions, so that the program will receive a yes answer to all of its prompts and run without user input.
yes | ./myprog
In cases where the response isn't always "yes" we can feed a program any sequence of responses using redirection or pipes. Be sure to add a newline (\n) after each response to simulate pressing the Enter key:
./myprog < responses.txt
printf "y\nn\ny\n" | ./myprog
Users who don't understand Unix and processes very well often fall into bad habits that can potentially be very costly. There are far too many such habits to cover here (One could write a separate 1,000-page volume called "Favorite Bad Habits of Unix Users").
As a less painful alternative, we'll explore one common habit in detail and try to help you understand how to assess your methods so you can then check others for potential problems. Our feature habit of the day is the use of the cat command at the head of a pipeline:
shell-prompt: cat names.txt | sort | uniq > outfile
So what's the alternative, what's wrong with using cat this way, what's the big deal, why do people do it, and how do we know it's a problem?
Most commands used downstream of cat in situations like this (e.g. sort, grep, more, etc.) are capable of reading a file directly if given the filename as an argument:
shell-prompt: sort names.txt | uniq > outfile
Even if they don't take a filename argument, we can always use simple redirection instead of a pipe:
shell-prompt: sort < names.txt | uniq > outfile
Using cat this way just adds overhead in exchange for no benefit. Pipes are helpful when you have to perform multiple processing steps in sequence. However, the cat command doesn't do any processing at all. It just reads the file and copies it to the first pipe.
In doing so, the data make several unnecessary trips: cat reads the file from disk into its own memory buffer, writes it to the pipe, and sort then reads it from the pipe into its own buffer before doing any actual processing.

This is like pouring a drink into a glass, then moving it to a second glass using an eye dropper, then pouring it into a third glass and finally a fourth glass before actually drinking it.

It's much simpler and less wasteful for the sort command to read directly from the file.
Using a pipe this way also prevents the downstream command from optimizing disk access. A program such as sort might use a larger input buffer size to reduce the number of disk reads. Reading fewer, larger blocks from disk can prevent the latency of each disk operation from adding up, thereby reducing run time. This is not possible when reading from a pipe, which is a fixed-size memory buffer.
What's the big deal?
Usually, this is no real problem at all. Wasting a few seconds or minutes on your laptop won't harm anyone. However, sometimes mistakes like this one are incorporated into HPC cluster jobs using hundreds of cores for weeks at a time. In that case, it could increase run time by several days, delaying the work of other users as well as your own. Not to mention, the wasted electricity could cost the organization hundreds of dollars.
By far the most common response I get when asking people about this sort of thing is: "[Shrug] I copied this from an example on the web. Didn't really think about it."
Occasionally, someone might think that this speeds up processing by splitting the task into two processes, hence utilizing multiple cores: one running cat to handle the disk input and another dedicated to sort or whatever command is downstream. This only helps if the commands use enough CPU time to benefit from more than one core, or if it saves you from having to write an intermediate file. Neither of these factors is in play in this situation.
One might also think it helps by overlapping disk input and CPU processing, i.e. cat can read the next block of data while sort is processing the current one. This may have worked a long time ago using slow disks and unsophisticated operating systems, but it only backfires on modern hardware running modern Unix systems.
In reality, this strategy only increases the amount of CPU time used, and almost always increases run time.
Detecting performance issues is pretty easy. The most common tool is the time command.
shell-prompt: time fgrep GGTAGGTGAGGGGCGCCTCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCA test.vcf > /dev/null
2.539u 6.348s 0:09.86 89.9% 92+173k 35519+0io 0pf+0w
We have to be careful when using time with a pipeline, however. Depending on the shell and the time command used (some shells have an internal implementation), it may not work as expected. We can ensure proper function by wrapping the pipeline in a separate shell process, which is then timed:
shell-prompt: time sh -c "cat test.vcf | fgrep GGTAGGTGAGGGGCGCCTCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCA > /dev/null"
2.873u 17.008s 0:13.68 145.2% 33+155k 33317+0io 0pf+0w
Table 1.12, “Run times of pipes with cat” compares the run times (wall time) and CPU time of the direct fgrep and piped fgrep shown above on three different operating systems.
All runs were performed on an otherwise idle system. Several trials were run to ensure reliable results. Times from the first read of test.vcf were discarded, since subsequent runs benefit from disk buffering (file contents still in memory from the previous read). The wall time varied significantly on the CentOS system, with the piped command running in less wall time for a small fraction of the trials. The times shown in the table are typical. Times for FreeBSD and MacOS were fairly consistent.
Note that there is large variability between platforms, which should not be taken too seriously. These tests were not run on identical hardware, so they tell us nothing conclusive about relative operating system performance.
We can also collect other data using tools such as top to monitor CPU and memory use and iostat to monitor disk activity. These commands are covered in more detail in Section 1.14.15, “top” and Section 1.14.16, “iostat”.
Table 1.12. Run times of pipes with cat
|System specs||Pipe wall (s)||No pipe wall (s)||Pipe CPU (s)||No pipe CPU (s)|
|CentOS 7 i7 2.8GHz||33.43||29.50||13.59||8.45|
|FreeBSD Phenom 3.2GHz||13.01||8.90||18.76||8.43|
|MacBook i5 2.7GHz||81.09||81.35||84.02||81.20|
Commands placed between parentheses are executed in a new child shell process rather than the shell process that received the commands as input.
This can be useful if you want a command to run in a different directory or with altered environment variables, without affecting the current shell process.
shell-prompt: (cd /etc; ls)
Since the commands above are executed in a new shell process, the shell process that printed "shell prompt: " will not have its current working directory changed. This command has the same net effect as the following:
shell-prompt: pushd /etc
shell-prompt: ls
shell-prompt: popd
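The same isolation applies to variables: an assignment made inside the parentheses vanishes with the child shell process. A brief sketch (MYVAR is an arbitrary name):

```shell
MYVAR=outer
(MYVAR=inner; echo "inside: $MYVAR")    # the child shell sees: inner
echo "outside: $MYVAR"                  # the parent shell still sees: outer
```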
Practice problems:

1. Compile prog.c under bash using gcc, saving error messages to errors.txt and normal screen output to …
2. Compile prog.c under tcsh using gcc, saving both error messages and normal screen output to …
3. Compile prog.c under tcsh using gcc, saving both error messages and normal screen output to …, using input.txt as the standard input instead of the keyboard.
4. Compile prog.c under tcsh using gcc, saving both error messages and normal screen output to output.txt and sending them to the screen at the same time.