Unix commands fall into one of two categories:
Internal commands are part of the shell.
No new process is created when you execute an internal command. The shell simply carries out the execution of internal commands by itself.
External commands are programs separate from the shell. The command name of an external command is actually the name of an executable file, i.e. a file containing the program or script. For example, when you run the ls command, you are executing the program contained in the file /bin/ls.
When you run an external command, the shell locates the program file, loads the program into memory, and creates a new (child) process to execute the program. The shell then normally waits for the child process to end before prompting you for the next command.
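One way to observe the child process is to compare process IDs. This is a minimal sketch, assuming a POSIX-compatible shell, where the special parameter $$ expands to the process ID of the shell itself. Here sh is run as an ordinary external command.

```shell
# Process ID of the current shell
parent_pid=$$

# sh is an external program, so the shell creates a child process to run it.
# The single quotes ensure that $$ is expanded by the child, not the parent.
child_pid=$(sh -c 'echo $$')

echo "Parent shell PID:  $parent_pid"
echo "Child process PID: $child_pid"
```

The two numbers differ, since the child is a separate process from the shell that launched it.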
Commands are implemented internally only when it is necessary or when there is a substantial benefit. If all commands were part of the shell, the shell would be enormous and require too much memory.
One command that must be internal is the cd command, which changes the CWD of the shell process. The cd command cannot be implemented as an external command, since the CWD is a property of the process, as described in the section called “Current Working Directory”.
We can prove this using Proof by Contradiction. If the cd command were external, it would run as a child process of the shell. Hence, running cd would create a child process, which would inherit CWD from the shell process, alter its copy of CWD, and then terminate. The CWD of the parent, the shell process, would be unaffected.
Expecting an external command to change your CWD for you would be akin to asking one of your children to take a shower for you. Neither is capable of effecting the desired change. Likewise, any command that alters the state of the shell process must be implemented as an internal command.
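The argument above can be checked with a small experiment. This sketch assumes a POSIX-compatible shell and uses sh -c to run cd inside a child process. The variable names are chosen only for illustration.

```shell
# Record the CWD of the parent shell
start_dir=$(pwd)

# Run cd in a child process. The child changes its own CWD and reports it.
child_dir=$(sh -c 'cd / && pwd')

# Back in the parent shell, the CWD has not changed
end_dir=$(pwd)

echo "Child process CWD: $child_dir"
echo "Parent shell CWD:  $end_dir"
```

The child reports / as its CWD, while the parent shell remains wherever it started.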
Most commands are external, i.e. programs separate from the shell. As a result, they behave the same way regardless of which shell we use to run them.
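One way to check which category a command falls into is the shell's own command -v, which is specified by POSIX. This is a minimal sketch; the path reported for ls varies between systems, and an interactive shell with an alias for ls may report the alias instead.

```shell
# For an internal command, command -v prints just the command name
cd_kind=$(command -v cd)

# For an external command, it prints the full path name of the executable file
ls_path=$(command -v ls)

echo "cd: $cd_kind"
echo "ls: $ls_path"
```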
The executable files containing external commands are kept in certain directories, most of which are called bin (short for "binary", since most executable files are binary files containing machine code). The most essential commands required for the Unix system to function are kept in /bin and /usr/bin.
The location of optional add-on commands varies, but a typical location is /usr/local/bin. Debian and Red Hat Linux mix add-on commands with core system commands in /usr/bin. BSD systems keep them in separate directories such as /usr/local/bin or /usr/pkg/bin.
Example 3.14. Practice Break
shell-prompt: which cd
shell-prompt: which cp
shell-prompt: which exit
shell-prompt: which ls
shell-prompt: which pwd
/bin and /usr/bin.
In the dark ages before Unix, when programmers wanted to look up a command or function, they actually had to get out of their chairs and walk somewhere to pick up a typically ring-bound printed manual to flip through. This resembled physical activity, which most computer scientists find terrifying.
The Unix designers saw the injustice of this situation and set out to rectify it. They imagined a Utopian world where they could sit in the same chair for ten hours straight without ever taking their eyes off the monitor or their fingers off the keyboard, happily subsisting on coffee and potato chips.
If there is one trait that best defines an engineer it is the ability to concentrate on one subject to the complete exclusion of everything else in the environment. This sometimes causes engineers to be pronounced dead prematurely. Some funeral homes in high-tech areas have started checking resumes before processing the bodies. Anybody with a degree in electrical engineering or experience in computer programming is propped up in the lounge for a few days just to see if he or she snaps out of it.
-- The Engineer Identification Test (Anonymous)
And so, online documentation was born. On Unix systems, all common Unix commands are documented in detail on the Unix system itself, and the documentation is accessible via the command line (you do not need a GUI to view it, which is important when using a dumb terminal to access a remote system). Whenever you want to know more about a particular Unix command, you can find out by typing man command-name. For example, to learn all about the ls command, type:
shell-prompt: man ls
The man command covers virtually every common command, as well as other topics. It even covers itself:
shell-prompt: man man
The man command displays a nicely formatted document known as a man page. It uses a file viewing program called more, which can be used to browse through text files very quickly. Table 3.6, “Common hot keys in more” shows the most common keystrokes used to navigate a man page. For complete information on navigation, run:
shell-prompt: man more
Table 3.6. Common hot keys in more
Key | Action |
---|---|
h | Show key commands |
Space bar | Forward one page |
Enter/Return | Forward one line |
b | Back one page |
/ | Search |
Man pages include a number of standard sections, such as SYNOPSIS, DESCRIPTION, and SEE ALSO. The SEE ALSO section helps you identify other commands that might be of use.
Man pages do not always make good tutorials. Sometimes they contain too much detail, and they are often not well-written for novice users. If you're learning a new command for the first time, you might want to consult a Unix book or the web. However, the man pages will provide the most detailed and complete reference information on most commands.
The apropos command is used to search the man page headings for a given topic. It is equivalent to man -k. For example, to find out what man pages exist regarding sine functions, we might try the following:
shell-prompt: apropos sine
acos, acosf, acosl(3) - arc cosine functions
acosh, acoshf, acoshl(3) - inverse hyperbolic cosine functions
asin, asinf, asinl(3) - arc sine functions
asinh, asinhf, asinhl(3) - inverse hyperbolic sine functions
cos, cosf, cosl(3) - cosine functions
cosh, coshf, coshl(3) - hyperbolic cosine functions
cospi, cospif, cospil(3) - half-cycle cosine functions
Role::Tiny(3) - Roles: a nouvelle cuisine portion size slice of Moose
sin, sinf, sinl, sincosl(3) - sine functions
sincos, sincosf, sincosl(3) - sine and cosine functions
sinh, sinhf, sinhl(3) - hyperbolic sine function
sinpi, sinpif, sinpil(3) - half-cycle sine functions
or
shell-prompt: man -k sine
The whatis command is similar to apropos in that it lists short descriptions of commands. However, whatis only lists those commands with the search string in their name or short description, whereas apropos attempts to list everything related to the string.
shell-prompt: whatis sin
sin, sinf, sinl, sincosl(3) - sine functions
The info command is an alternative to man that uses a non-graphical hypertext system instead of flat files. This allows the user to navigate extensive documentation more efficiently. The info command has a fairly steep learning curve, but it is very powerful, and is often the best option for documentation on a given topic. Some open source software ships documentation in info format and provides a man page (converted from the info files) that actually has less information in it.
shell-prompt: info gcc
Example 3.15. Practice Break
Most Unix commands have short names which are abbreviations or acronyms for what they do (pwd = print working directory, cd = change directory, ls = list, ...). Unix was originally designed for people with good memories and poor typing skills. Some of the most commonly used Unix commands are described below.
This section uses the same notation conventions as the Unix man pages:
ls lists files in CWD or a specified file or directory.
shell-prompt: ls [path ...]
shell-prompt: ls            # List CWD
shell-prompt: ls /etc       # List /etc directory
mkdir creates one or more directories.
shell-prompt: mkdir [-p] path name [path name ...]
The -p flag indicates that mkdir should attempt to create any parent directories in the path that don't already exist. If not used, mkdir will fail unless all but the last component of the path already exist.
shell-prompt: ls
shell-prompt: mkdir Temp
shell-prompt: ls                    # Should see Temp now
shell-prompt: mkdir Temp2/C/MPI     # Should fail
shell-prompt: mkdir -p Temp2/C/MPI
shell-prompt: ls Temp2
cp copies one or more files.
shell-prompt: cp source-file destination-file
shell-prompt: cp source-file [source-file ...] destination-directory
If there is only one source filename, then destination can be either a filename or a directory.
shell-prompt: cd
shell-prompt: touch file        # Create file if it doesn't exist
shell-prompt: cp file file.bak  # Make a backup copy
shell-prompt: ls                # Should see file and file.bak
If there are multiple source files, then destination must be a directory. If destination is a filename, and the file exists, it will be overwritten.
shell-prompt: cp /etc/hosts* hosts  # Should fail
shell-prompt: cp /etc/hosts* Temp   # Should work if directory Temp exists
shell-prompt: ls Temp
mv moves or renames files or directories.
shell-prompt: mv source destination
shell-prompt: mv source [source ...] destination-directory
shell-prompt: mv file.bak file.bk
shell-prompt: ls
If multiple sources are given, destination must be a directory.
shell-prompt: mv file file.bk file2 # Should fail
shell-prompt: mv file file.bk Temp  # Should work if directory Temp exists
shell-prompt: ls
shell-prompt: ls Temp
rm removes one or more files.
shell-prompt: rm file [file ...]
shell-prompt: cd Temp
shell-prompt: ls
shell-prompt: rm hosts*
shell-prompt: ls
shell-prompt: rm file*
shell-prompt: ls
If there are multiple hard links to a file, removing one of them only removes the link, and remaining links are still valid.
srm (secure rm) removes files securely, erasing the file content and directory entry so that the file cannot be recovered. Use this to remove files that contain sensitive data. This is not a standard Unix command, but a free program that can be easily installed on most systems via a package manager.
df shows the free disk space on all currently mounted partitions.
shell-prompt: df
ln links files or directories.
shell-prompt: ln source-file destination-file
shell-prompt: ln -s source destination
The ln command creates another path name for the same file. Both names refer to the same file, so changes made through one name (e.g. using nano) appear in the other.
Each file in a typical Unix file system is described by a structure called an inode. The inode contains metadata, i.e. information about the file other than its content, such as the file's ownership, permissions, last modification time, and the locations of the disk blocks (chunks of disk space) containing the file's content.
Without -s, a standard directory entry, known as a hard link, is created. A hard link is a directory entry that points directly to the inode of the file. In fact, such a directory entry contains little more than the file's name and the location of the inode. Every file must have at least one hard link to it. For this reason, removing a file is also known as "unlinking".
shell-prompt: touch file
shell-prompt: ln file file.hardlink
shell-prompt: ls -l
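The following sketch illustrates that two hard links share one inode, and that removing one name leaves the other valid. It assumes that mktemp -d is available for creating a scratch directory (true on most modern systems, but not guaranteed everywhere). The file names are made up for the illustration.

```shell
# Work in a scratch directory so that no existing files are touched
work_dir=$(mktemp -d)

echo "first line" > "$work_dir/notes.txt"
ln "$work_dir/notes.txt" "$work_dir/notes.hardlink"

# ls -i prints the inode number before each name; both names show the same one
ls -i "$work_dir/notes.txt" "$work_dir/notes.hardlink"

# A change made through one name is visible through the other
echo "second line" >> "$work_dir/notes.hardlink"
line_count=$(( $(wc -l < "$work_dir/notes.txt") ))  # arithmetic strips padding

# Removing one name only unlinks it; the content survives under the other name
rm "$work_dir/notes.txt"
remaining=$(cat "$work_dir/notes.hardlink")
echo "$remaining"

# Clean up the scratch directory
rm -r "$work_dir"
```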
To create a second hard link, the source cannot be a directory, and the source and destination path names must be in the same file system. There is no harm in trying to create a hard link. If it fails, you can do a soft link instead.
shell-prompt: ln /etc .     # Should fail
shell-prompt: ln -s /etc .
shell-prompt: ls
shell-prompt: ls etc        # List the link
shell-prompt: ls etc/       # List contents of the directory
File systems under Windows appear as different drive letters, such as C: or D:. Under Unix, each file system is mounted to a specific directory. The main file system is mounted to / and the rest are mounted to subdirectories. The df command will list file systems and their mount points within the directory tree. For example, in the df output below, / and /data are separate file systems.
The disk ada0 is divided into three partitions. Partition 2, called ada0p2, contains a file system which is mounted on /. Partitions 0 and 1 are used by the operating system for other purposes. The second disk, ada1, has a file system on partition 0, which is mounted on /data.
shell-prompt: df
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2    447G    266G    146G    64%    /
/dev/ada1p0    978G    172G    729G    20%    /data
Everything under /data and only things under /data are on ada1p0. Hence, we cannot create a hard link to /data/joe/Research/notes.txt in /home/joe, which is on ada0p2.
# This will fail.
# You cannot run this command, since the partitions are hypothetical.
# You can try linking something from a different filesystem based on
# your own "df" output if you like.
shell-prompt: ln /data/joe/Research/notes.txt ~joe
With -s, a symbolic link, or soft link, is created. A symbolic link is not a standard directory entry, but a pointer to another path name. It is a directory entry that points to another directory entry rather than the inode of the file. Symbolic links do not have to be in the same file system as the source.
# This will work
shell-prompt: ln -s /data/joe/Research/notes.txt ~joe
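Because a symbolic link points to a path name rather than an inode, it can outlive its target. The sketch below shows this, assuming mktemp -d is available and using made-up file names. The -L test checks whether a path is a symbolic link, and -e checks whether the path it resolves to exists.

```shell
# Work in a scratch directory so that no existing files are touched
work_dir=$(mktemp -d)

echo "data" > "$work_dir/target.txt"
ln -s "$work_dir/target.txt" "$work_dir/link.txt"

# Reading through the link reads the target file
via_link=$(cat "$work_dir/link.txt")
echo "Read through link: $via_link"

# Remove the target. The link itself remains, but now points at nothing.
rm "$work_dir/target.txt"

if [ -L "$work_dir/link.txt" ] && [ ! -e "$work_dir/link.txt" ]; then
    link_state="dangling"
else
    link_state="valid"
fi
echo "Link is $link_state"

# Clean up the scratch directory
rm -r "$work_dir"
```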
rmdir removes one or more empty directories.
shell-prompt: rmdir directory [directory ...]
rmdir will fail if a directory is not completely empty. You may also need to check for hidden files using ls -a directory. To remove a directory and everything under it, use rm -r directory.
shell-prompt: cd
shell-prompt: rmdir Temp2       # Should fail
shell-prompt: rmdir Temp2/C
shell-prompt: rmdir Temp2/C/MPI
shell-prompt: rm -r Temp2
shell-prompt: rmdir Temp        # Should fail
shell-prompt: rm -r Temp
shell-prompt: ls
du reports the disk usage of a directory and everything under it.
shell-prompt: du [-s] [-h] path
The -s (summary) flag suppresses output about each file in the subtree, so that only the total disk usage of the directory is shown. The -h flag asks for human-readable output with gigabytes followed by a G, megabytes by an M, etc.
shell-prompt: du -sh /etc
The du command does not add up file content sizes. It adds up the disk space used by each file. In an uncompressed file system, space used is rounded up to a multiple of the block size (commonly 4096 bytes). In a compressed file system, space used is a multiple of blocks used after compression, which can be significantly smaller than the file content. This is often the case with the ZFS file system, which is standard on FreeBSD and Solaris-based systems such as OpenIndiana. "Fluffy" text files that compress easily, such as genomic data, may require only a small fraction of their content size in disk space on ZFS. This makes ZFS a great choice for housing genomic data.
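The difference between content size and disk usage can be seen with a one-byte file. This is a sketch, assuming mktemp -d is available. The number du reports depends on the file system: with 4096-byte blocks and no compression, it would typically be 4 KB, but some file systems store very small files differently.

```shell
# Work in a scratch directory so that no existing files are touched
work_dir=$(mktemp -d)

# Create a file containing exactly one byte
printf 'x' > "$work_dir/tiny.txt"

# Content size in bytes (arithmetic strips any padding from wc output)
content_bytes=$(( $(wc -c < "$work_dir/tiny.txt") ))

# Disk space used, in kilobytes (-k), as reported by du
disk_kb=$(du -k "$work_dir/tiny.txt" | cut -f1)

echo "Content size: $content_bytes byte(s)"
echo "Disk usage:   $disk_kb KB"

# Clean up the scratch directory
rm -r "$work_dir"
```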
As mentioned previously, internal commands are part of the shell, and serve to control the shell itself. Below are some of the most common internal commands.
cd changes the current working directory of the shell process.
shell-prompt: cd [directory]
pushd changes CWD and saves the old CWD on a stack so that we can easily return.
shell-prompt: pushd directory
Users often encounter the need to temporarily go to another directory, run a few commands, and then come back to the current directory.
The pushd command is a very useful alternative to cd that helps in this situation. It performs the same operation as cd, but it records the starting CWD by adding it to the top of a stack of CWDs. You can then return to where the last pushd command was invoked using popd. This saves you from having to retype the path name of the directory to which you want to return. This is like leaving a trail of bread crumbs in the woods to retrace your path back home, except the pushd stack will not get eaten by birds and squirrels, and you won't end up in a witch's soup pot.
Example 3.16. Practice Break
Try the following sequence of commands:
shell-prompt: pwd           # Check starting point
shell-prompt: pushd /etc
shell-prompt: more hosts
shell-prompt: pushd /home
shell-prompt: ls
shell-prompt: popd          # Back to /etc
shell-prompt: pwd
shell-prompt: more hosts
shell-prompt: popd          # Back to starting point
shell-prompt: pwd
exit terminates the shell process.
shell-prompt: exit
This is the most reliable way to exit a shell. In some situations you could also type logout or simply press Ctrl+d, which sends an EOT character (end of transmission, ASCII/ISO character 4) to the shell.
cat echoes the contents of one or more text files.
shell-prompt: cat file [file ...]
shell-prompt: cat /etc/hosts
The vis and cat -v commands display invisible characters in a visible way. For example, carriage return characters present in Windows files are normally not shown by most Unix commands. The vis and cat -v commands will show them as '^M' (representing Control+M, which is what you would type to produce this character).
shell-prompt: cat sample.txt
This line contains a carriage return.
shell-prompt: vis sample.txt
This line contains a carriage return.\^M
shell-prompt: cat -v sample.txt
This line contains a carriage return.^M
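If you do not have a Windows text file at hand, the effect can be reproduced without one. This minimal sketch uses printf to generate a line ending in a carriage return followed by a newline, and pipes it through cat -v.

```shell
# \r is the carriage return and \n is the newline, as in Windows line endings
line=$(printf 'This line contains a carriage return.\r\n' | cat -v)

# The carriage return is now visible as ^M at the end of the line
echo "$line"
```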
head shows the top N lines of one or more text files.
shell-prompt: head -n N file [file ...]
If the flag -n followed by an integer number N is given, the top N lines are shown instead of the default of 10.
shell-prompt: head -n 5 /etc/hosts
The head command can also be useful for generating small test inputs. Suppose you're developing a new program or script that processes genomic sequence files in FASTA format. Real FASTA files can contain millions of sequences and take a great deal of time to process. For testing new code, we don't need much data, and we want the test to complete in a few seconds rather than hours. We can use head to extract a small number of sequences from a large FASTA file for quick testing. Since FASTA files have alternating header and sequence lines, we must always choose a multiple of 2 lines. We use the output redirection operator (>) to send the head output to a file instead of the terminal screen. Redirection is covered in the section called “Redirection and Pipes”.
# You cannot run this command unless you have a file called
# really-big.fasta in the CWD
shell-prompt: head -n 1000 really-big.fasta > small-test.fasta
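The same idea can be tried on a toy file. This sketch assumes mktemp -d is available and builds a hypothetical three-sequence FASTA file in which every sequence occupies exactly one line, as described above.

```shell
# Work in a scratch directory so that no existing files are touched
work_dir=$(mktemp -d)

# Build a toy FASTA file: three sequences, one header line and one
# sequence line each
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$work_dir/toy.fasta"

# Extract the first two sequences: 2 sequences x 2 lines = 4 lines
head -n 4 "$work_dir/toy.fasta" > "$work_dir/small-test.fasta"

# Count header lines (those beginning with '>') to confirm
seq_count=$(grep -c '^>' "$work_dir/small-test.fasta")
echo "Sequences in test file: $seq_count"

# Clean up the scratch directory
rm -r "$work_dir"
```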
tail shows the bottom N lines of one or more text files.
shell-prompt: tail -n N file [file ...]
The tail command is especially useful for viewing the end of a large file that would be cumbersome to view with more.
If the flag -n followed by an integer number N is given, the bottom N lines are shown instead of the default of 10.
shell-prompt: tail -n 5 /etc/hosts
The diff command shows the differences between two text files. This is most useful for comparing two versions of the same file to see what has changed. Also see cdiff, a specialized version of diff, for comparing C source code.
The -u flag asks for unified diff output, which shows the removed text (text in the first file but not the second) preceded by a '-', the added text (text in the second file but not the first) preceded by a '+', and some unchanged lines for context. Most people find this easier to read than the default output format.
shell-prompt: printf "1\n2\n3\n" > input1.txt
shell-prompt: printf "2\n3\n4\n" > input2.txt
shell-prompt: diff input1.txt input2.txt
shell-prompt: diff -u input1.txt input2.txt
shell-prompt: rm input1.txt input2.txt
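The sketch below captures the unified diff of the same two inputs and counts the removed and added lines. It assumes mktemp -d is available. Note that diff exits with a nonzero status when the files differ, which matters in scripts that stop on errors.

```shell
# Work in a scratch directory so that no existing files are touched
work_dir=$(mktemp -d)
printf "1\n2\n3\n" > "$work_dir/input1.txt"
printf "2\n3\n4\n" > "$work_dir/input2.txt"

# diff exits with status 1 when the files differ, hence "|| true"
diff_out=$(diff -u "$work_dir/input1.txt" "$work_dir/input2.txt" || true)
echo "$diff_out"

# Removed lines begin with '-' and added lines with '+'.
# The two header lines begin with '---' and '+++', so exclude them.
removed=$(echo "$diff_out" | grep -c '^-[^-]')
added=$(echo "$diff_out" | grep -c '^+[^+]')
echo "Removed: $removed, added: $added"

# Clean up the scratch directory
rm -r "$work_dir"
```

Here the line "1" appears only in the first file and the line "4" only in the second, so one line is reported as removed and one as added, with "2" and "3" shown as context.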
There are more text editors available for Unix systems than any one person is aware of. Some are terminal-based, some are graphical, and some have both types of interfaces.
All Unix systems support running graphical programs from remote locations, but many graphical programs require a fast connection (100 megabits/sec or more) to function comfortably.
Knowing how to use a terminal-based text editor is therefore a very good idea, so that you're prepared to work on a remote Unix system over a slow connection if necessary. Some of the more common terminal-based editors are described below.
vi (visual editor) is the standard text editor for all Unix systems. Most users either love or hate the vi interface, but it's a good editor to know since it is available on every Unix system.
nano is an extremely simple text editor that is ideal for beginners. It is a rewrite of the pico editor, which is known to have many bugs and security issues. Neither editor is standard on Unix systems, but both are free and easy to install. These editors entail little or no learning curve, but are not sophisticated enough for extensive programming or scripting.
emacs (Edit MACroS) is a more sophisticated editor used by many programmers. It is known for being hard to learn, but very powerful. It is not standard on most Unix systems, but is free and easy to install.
ape is a menu-driven, user-friendly IDE (integrated development environment), i.e. programmer's editor. It has an interface similar to PC and Mac programs, but works on a standard Unix terminal. It is not standard on most Unix systems, but is free and easy to install. ape has a small learning curve, and advanced features to make programming much faster.
Eclipse is a popular open-source graphical IDE written in Java, with support for many languages. It is sluggish over a slow connection, so it may not work well on remote systems over ssh.
hostname prints the network name of the machine.
shell-prompt: hostname
This is often useful when you are working on multiple Unix machines at the same time (e.g. via ssh), and forget which window applies to each machine.
passwd changes your password. It asks for your old password once, and the new one twice (to ensure that you don't accidentally set your password to something you don't know because your finger slipped). Unlike many graphical password programs, passwd does not echo anything for each character typed. Even allowing someone to see the length of your password is a bad idea from a security standpoint.
# This may not work on systems using an authentication service
# rather than local passwords
shell-prompt: passwd
The passwd command is generally only used for setting local passwords on the Unix machine itself. Many Unix systems are configured to authenticate users via a remote service such as Lightweight Directory Access Protocol (LDAP) or Active Directory (AD). Changing LDAP or AD passwords may require using a web portal to the LDAP or AD server instead of the passwd command.
clear clears your terminal screen (assuming the TERM environment variable is properly set).
shell-prompt: clear
reset resets your terminal to its default state. This is useful when your terminal has been corrupted by bad output, such as when attempting to view a binary file with cat.
Terminals are controlled by magic sequences, sequences of invisible control characters sent from the host computer to the terminal amid the normal output. Magic sequences move the cursor, change the color, change the international character set, etc. Binary files contain random data that sometimes by chance contain magic sequences that could alter the mode of your terminal. If this happens, running reset will usually correct the problem. If not, you will need to log out and log back in.
shell-prompt: reset
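To see what a magic sequence looks like, we can generate one and make it visible with cat -v. This sketch assumes an ANSI/VT100-compatible terminal, on which the ESC character (octal 033) followed by [1m switches to bold text and ESC followed by [0m restores normal text.

```shell
# Piping through cat -v makes each invisible ESC character show up as ^[
visible=$(printf '\033[1mbold\033[0m\n' | cat -v)
echo "$visible"
```

Without cat -v, the terminal would interpret the sequences rather than display them, and only the word "bold" would appear, in bold type.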
Table 3.7, “Unix Commands” provides a quick reference for looking up common Unix commands. For details on any of these commands, run man command (or info command on some systems).
Table 3.7. Unix Commands
Synopsis | Description |
---|---|
ls [file|directory] | List file(s) |
cp source-file destination-file | Copy a file |
cp source-file [source-file ...] directory | Copy multiple files to a directory |
mv source-file destination-file | Rename a file |
mv source-file [source-file ...] directory | Move multiple files to a directory |
ln source-file destination-file | Create another name for the same file. (source and destination must be in the same file system) |
ln -s source destination | Create a symbolic link to a file or directory |
rm file [file ...] | Remove one or more files |
rm -r directory | Recursively remove a directory and all of its contents |
mkdir directory | Create a directory |
rmdir directory | Remove a directory (the directory must be empty) |
od/hexdump | Show the contents of a file in octal/hexadecimal |
sort | Sort text files based on flexible criteria |
uniq | Echo files, eliminating adjacent duplicate lines. |
diff | Show differences between text files. |
cmp | Detect differences between binary files. |
cdiff | Show differences between C programs. |
date | Show the current date and time. |
cal | Print a calendar for any month of any year. |
printenv | Print environment variables. |
What types of commands have to be internal to the shell? Give one example and explain why it must be internal.
How can you find a list of the basic Unix commands available on your system?
How can you find out whether the grep command is internal or external, and where it is located?
What kind of suffering did computer users have to endure in order to read documentation before the Unix renaissance? How did Unix put an end to such suffering?
Show a Unix command that helps us learn about all the command-line flags available for the tail command.
Show a Unix command that copies the file /tmp/sample.txt to the CWD.
Show a Unix command that copies all files in /tmp whose names begin with "sample" and end with ".txt" to the CWD.
Show a Unix command that moves all the files in the CWD whose names end with ".py" to a subdirectory of the CWD called "Python".
Show a Unix command that creates another file name in the CWD called test-input.txt for the existing file ./Data/input.txt.
What is a hard link?
What is a symbolic link?
What do we get when we remove the path name to which a symbolic link points?
What limitations do hard links have that soft links do not have?
How do we create a new directory /home/joe/Data/Project1 if the Data directory does not exist and the CWD is /home/joe?
How do we remove the directory ./Data if it is empty? If it is not empty?
Show a Unix command that tells us how much disk space is available in each file system.
Show a Unix command that tells us how much space is used by the directory ./Data.
Show a sequence of Unix commands that change CWD to /tmp, then to /etc and then return to the original CWD.
How do we exit the shell?
Show a Unix command that tells us if there are carriage returns in graph.py.
Show a Unix command that displays the first 20 lines of output.txt.
Show a Unix command that displays the last 20 lines of output.txt.
Show a Unix command that displays what has changed between analysis.c.old and analysis.c.
Which text editor is available on all Unix systems?
Show a Unix command that tells us the name of the machine running our shell.
Show a Unix command that logs into the remote server unixdev1.ceas.uwm.edu as user joe in order to run commands on it.
Show a Unix command to change our local password.
How do we change our password for a Unix system that relies on LDAP or AD?
Show a Unix command that clears the terminal display.
Show a Unix command to reset the terminal mode to default settings.