Unix commands fall into one of two categories:
Internal commands are part of the shell.
No new process is created when you execute an internal command. The shell simply carries out the execution of internal commands by itself.
External commands are programs separate from the shell. The command name of an external command is actually the name of an executable file, i.e. a file containing the program or script. For example, when you run the ls command, you are executing the program contained in the file /bin/ls.
When you run an external command, the shell locates the program file, loads the program into memory, and creates a new (child) process to execute the program. The shell then normally waits for the child process to end before prompting you for the next command.
Commands are implemented internally only when it is necessary or when there is a substantial benefit. If all commands were part of the shell, the shell would be enormous and require too much memory.
One command that must be internal is the cd command, which changes the CWD of the shell process. The cd command cannot be implemented as an external command, since the CWD is a property of the process, as described in the section called “Current Working Directory”.
We can prove this using Proof by Contradiction. If the cd command were external, it would run as a child process of the shell. Hence, running cd would create a child process, which would inherit CWD from the shell process, alter its copy of CWD, and then terminate. The CWD of the parent, the shell process, would be unaffected.
Expecting an external command to change your CWD for you would be akin to asking one of your children to take a shower for you. Neither is capable of effecting the desired change. Likewise, any command that alters the state of the shell process must be implemented as an internal command.
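The argument above can be checked with a short experiment. The following is a minimal sketch, assuming a POSIX-compatible sh is installed; the variable names are arbitrary. It runs cd inside a child shell process and then shows that the CWD of the parent is unaffected.

```shell
# Record the CWD of the parent shell
start_dir=$(pwd)

# Start a child shell, change its CWD to /, and report it.
# Only the child's copy of the CWD is altered.
child_dir=$(sh -c 'cd / && pwd')

echo "Child CWD:  $child_dir"
echo "Parent CWD: $(pwd)"    # Still the starting directory
```

The child process reports / as its CWD, while the parent remains wherever it started, which is exactly the behavior an external cd command would have.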
Most commands are external, i.e. programs separate from the shell. As a result, they behave the same way regardless of which shell we use to run them.
The executable files containing external commands are kept in certain directories, most of which are called bin (short for "binary", since most executable files are binary files containing machine code). The most essential commands required for the Unix system to function are kept in /bin and /usr/bin.
The location of optional add-on commands varies, but a typical location is /usr/local/bin. Debian and Redhat Linux mix add-on commands with core system commands in /usr/bin. BSD systems keep them in separate directories such as /usr/local/bin or /usr/pkg/bin.
shell-prompt: which cd
shell-prompt: which cp
shell-prompt: which exit
shell-prompt: which ls
shell-prompt: which pwd
/bin and /usr/bin.
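In Bourne-family shells, which is usually itself an external program, so it may not report internal commands. As a complement, the sketch below uses command -v, which is built into POSIX-compatible shells and reports how the shell itself would interpret a name. The variable names are arbitrary, and the path printed for ls varies between systems.

```shell
# For an internal command, command -v prints just the name
cd_kind=$(command -v cd)
echo "cd -> $cd_kind"

# For an external command, it prints the path of the executable file
ls_path=$(command -v ls)
echo "ls -> $ls_path"
```

A result that is a full path name indicates an external command, while a bare name indicates that the shell handles the command itself.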
In the dark ages before Unix, when programmers wanted to look up a command or function, they actually had to get out of their chairs and walk somewhere to pick up a typically ring-bound printed manual to flip through.
The Unix designers saw the injustice of this situation and set out to rectify it. They imagined a Utopian world where they could sit in the same chair for ten hours straight without ever taking their eyes off the monitor or their fingers off the keyboard, happily subsisting on coffee and potato chips.
If there is one trait that best defines an engineer it is the ability to concentrate on one subject to the complete exclusion of everything else in the environment. This sometimes causes engineers to be pronounced dead prematurely. Some funeral homes in high-tech areas have started checking resumes before processing the bodies. Anybody with a degree in electrical engineering or experience in computer programming is propped up in the lounge for a few days just to see if he or she snaps out of it.
-- The Engineer Identification Test (Anonymous)
And so, online documentation was born. On Unix systems, all common Unix commands are documented in detail on the Unix system itself, and the documentation is accessible via the command line (you do not need a GUI to view it, which is important when using a dumb terminal to access a remote system). Whenever you want to know more about a particular Unix command, you can find out by typing man command-name. For example, to learn all about the ls command, type:
shell-prompt: man ls
The man command covers virtually every common command, as well as other topics. It even covers itself:
shell-prompt: man man
The man command displays a nicely formatted document known as a man page. It uses a file viewing program called more, which can be used to browse through text files very quickly. Table 1.6, “Common hot keys in more” shows the most common keystrokes used to navigate a man page. For complete information on navigation, run:
shell-prompt: man more
Table 1.6. Common hot keys in more
Key | Action |
---|---|
h | Show key commands |
Space bar | Forward one page |
Enter/Return | Forward one line |
b | Back one page |
/ | Search |
Man pages include a number of standard sections, such as SYNOPSIS, DESCRIPTION, and SEE ALSO, which helps you identify other commands that might be of use.
Man pages do not always make good tutorials. Sometimes they contain too much detail, and they are often not well-written for novice users. If you're learning a new command for the first time, you might want to consult a Unix book or the web. The man pages will provide the most detailed and complete reference information on most commands, however.
The apropos command is used to search the man page headings for a given topic. It is equivalent to man -k. For example, to find out what man pages exist regarding Fortran, we might try the following:
shell-prompt: apropos fortran
or
shell-prompt: man -k fortran
The whatis command is similar to apropos in that it lists short descriptions of commands. However, whatis only lists commands whose name matches the search string, whereas apropos also searches the short descriptions and so attempts to list everything related to the string.
The info command is an alternative to man that uses a non-graphical hypertext system instead of flat files. This allows the user to navigate extensive documentation more efficiently. The info command has a fairly steep learning curve, but it is very powerful, and is often the best option for documentation on a given topic. Some open source software ships documentation in info format and provides a man page (converted from the info files) that actually has less information in it.
shell-prompt: info gcc
Most Unix commands have short names which are abbreviations or acronyms for what they do. ( pwd = print working directory, cd = change directory, ls = list, ... ) Unix was originally designed for people with good memories and poor typing skills. Some of the most commonly used Unix commands are described below.
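This section uses the same notation conventions as the Unix man pages: arguments enclosed in square brackets ([ ]) are optional, and an ellipsis (...) means that the preceding argument may be repeated.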
cp copies one or more files.
shell-prompt: cp source-file destination-file
shell-prompt: cp source-file [source-file ...] destination-directory
If there is only one source filename, then destination can be either a filename or a directory. If there are multiple source files, then destination must be a directory. If destination is a filename, and the file exists, it will be overwritten.
shell-prompt: cp file file.bak      # Make a backup copy
shell-prompt: cp file file.bak ~    # Copy files to home directory
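The two forms of cp can be tried safely in a scratch directory. This is a minimal sketch, assuming the widely available mktemp command; the directory name Backup and the variable work are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"

echo "original data" > file
cp file file.bak           # One source: destination is a filename
mkdir Backup
cp file file.bak Backup    # Multiple sources: destination must be a directory

ls Backup                  # Lists the two copies
```

Note that the first cp would silently overwrite file.bak if it already existed.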
ls lists files in CWD or a specified file or directory.
shell-prompt: ls [path ...]
shell-prompt: ls         # List CWD
shell-prompt: ls /etc    # List /etc directory
mv moves or renames files or directories.
shell-prompt: mv source destination
shell-prompt: mv source [source ...] destination-directory
If multiple sources are given, destination must be a directory.
shell-prompt: mv prog1.c Programs
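The sketch below exercises both forms of mv, along with renaming, in a scratch directory. It assumes the widely available mktemp command, and all file and directory names other than prog1.c and Programs are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"
touch prog1.c plot.py stats.py
mkdir Programs Python

mv prog1.c Programs           # Move one file into a directory
mv plot.py stats.py Python    # Multiple sources: destination must be a directory
mv Python Scripts             # Scripts does not exist, so this renames Python

ls Programs Scripts
```

Whether mv moves or renames depends on the destination: if it is an existing directory, the sources are moved into it, and otherwise the single source is given the new name.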
ln links files or directories.
shell-prompt: ln source-file destination-file
shell-prompt: ln -s source destination
The ln command creates another path name for the same file. Both names refer to the same file, and changes made through one appear in the other.
Without -s, a standard directory entry, known as a hard link, is created. A hard link is a directory entry that points to the first block of data in the file. Every file must have at least one hard link to it. If only one path name exists for a file, it is a hard link. For this reason, removing a file is also known as "unlinking". To create a second hard link, the source and destination path names must be in the same file system.
File systems under Windows appear as different drive letters, such as C: or D:. Under Unix, all file systems are merged into a single directory tree under /. The df command will list file systems and their location within the directory tree. There is no harm in trying to create a hard link. If it fails, you can create a soft link instead.
With -s, a symbolic link, or soft link, is created. A symbolic link is not a standard directory entry, but a pointer to another path name. It is a directory entry that points to another directory entry rather than the content of the file. Only symbolic links can be used for directories, and symbolic links do not have to be in the same file system as the source.
shell-prompt: ln -s /etc/hosts ~ # Make a convenient link to hosts
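The difference between the two kinds of links becomes clear when the original path name is removed. The following is a minimal sketch, assuming the widely available mktemp command; the file names are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"

echo "version 1" > data.txt
ln data.txt hard.txt          # Second hard link to the same file
ln -s data.txt soft.txt       # Symbolic link pointing to the name data.txt

echo "version 2" > data.txt   # Change the file through one name
cat hard.txt                  # Prints "version 2"
cat soft.txt                  # Prints "version 2"

rm data.txt                   # Remove (unlink) the original name
cat hard.txt                  # Still prints "version 2"
cat soft.txt || echo "soft.txt is now a dangling link"
```

The hard link keeps the file content alive after data.txt is removed, whereas the symbolic link is left pointing to a path name that no longer exists.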
rm removes one or more files.
shell-prompt: rm file [file ...]
shell-prompt: rm temp.txt core a.out
If there are multiple hard links to a file, removing one of them only removes the link, and remaining links are still valid.
srm (secure rm) removes files securely, erasing the file content and directory entry so that the file cannot be recovered. Use this to remove files that contain sensitive data. This is not a standard Unix command, but a free program that can be easily installed on most systems via a package manager.
mkdir creates one or more directories.
shell-prompt: mkdir [-p] path-name [path-name ...]
The -p flag indicates that mkdir should attempt to create any parent directories in the path that don't already exist. If not used, mkdir will fail unless all but the last component of the path already exist.
shell-prompt: mkdir Programs
shell-prompt: mkdir -p Programs/C/MPI
rmdir removes one or more empty directories.
shell-prompt: rmdir directory [directory ...]
rmdir will fail if a directory is not completely empty. You may also need to check for hidden files using ls -a directory. To remove a directory and everything under it, use rm -r directory.
shell-prompt: rmdir Programs/C/MPI
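The behavior of mkdir, rmdir, and rm -r described above can be observed in a scratch directory. This is a minimal sketch, assuming the widely available mktemp command; the file name .hidden is made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"

# Without -p, mkdir fails because Programs/C does not exist yet
mkdir Programs/C/MPI || echo "mkdir failed: parent directories are missing"
mkdir -p Programs/C/MPI       # Creates Programs, Programs/C, and Programs/C/MPI

touch Programs/C/MPI/.hidden  # A hidden file makes the directory non-empty
rmdir Programs/C/MPI || echo "rmdir failed: directory is not empty"
ls -a Programs/C/MPI          # Reveals .hidden

rm -r Programs                # Removes the directory and everything under it
```

Since rm -r removes an entire subtree without asking, it is worth double-checking the path name before pressing Enter.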
find locates files within a subtree using a wide variety of possible criteria.
shell-prompt: find start-directory criteria [action]
find is a very powerful and complex command that can be used to not only find files, but run commands on the files matching the search criteria.
The find command can process globbing patterns like the shell, but note that we need to prevent the shell from processing them before running find by enclosing them in quotes.
# Find all core files (names end with "core")
shell-prompt: find . -name '*core'
# Remove cores
shell-prompt: find . -name '*core' -exec rm '{}' \;
# Remove multiple cores with each rm command (much faster)
shell-prompt: find . -name '*core' -exec rm '{}' +
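Because -exec rm is destructive, it is prudent to run find without an action first to see what matches. The sketch below does so in a scratch directory, assuming the widely available mktemp command; the directory and file names are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"
mkdir -p run1 run2
touch core run1/core run2/prog.core run2/results.txt

find . -name '*core'                    # List the matches first
find . -name '*core' -exec rm '{}' +    # Then remove them, many per rm process
find . -type f                          # Prints ./run2/results.txt
```

The pattern is quoted so that find, rather than the shell, expands it. Unquoted, the shell would replace *core with the matching names in the CWD before find ever ran.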
df shows the free disk space on all currently mounted partitions.
shell-prompt: df
du reports the disk usage of a directory and everything under it.
shell-prompt: du [-s] [-h] path
The -s (summary) flag suppresses output about each file in the subtree, so that only the total disk usage of the directory is shown. The -h flag asks for human-readable output with gigabytes followed by a G, megabytes by an M, etc.
shell-prompt: du -sh Qemu
6.8G    Qemu/
As mentioned previously, internal commands are part of the shell, and serve to control the shell itself. Below are some of the most common internal commands.
cd changes the current working directory of the shell process. It is described in more detail in the section called “Current Working Directory”.
shell-prompt: cd [directory]
pushd changes CWD and saves the old CWD on a stack so that we can easily return.
shell-prompt: pushd directory
Users often encounter the need to temporarily go to another directory, run a few commands, and then come back to the current directory.
The pushd command is a very useful alternative to cd that helps in this situation. It performs the same operation as cd, but it records the starting CWD by adding it to the top of a stack of CWDs. You can then return to where the last pushd command was invoked using popd. This saves you from having to retype the path name of the directory to which you want to return. This is like leaving a trail of bread crumbs in the woods to retrace your path back home, except the pushd stack will not get eaten by birds and squirrels, and you won't end up in a witch's soup pot.
Try the following sequence of commands:
shell-prompt: pwd            # Check starting point
shell-prompt: pushd /etc
shell-prompt: more hosts
shell-prompt: pushd /home
shell-prompt: ls
shell-prompt: popd           # Back to /etc
shell-prompt: pwd
shell-prompt: more hosts
shell-prompt: popd           # Back to starting point
shell-prompt: pwd
exit terminates the shell process.
shell-prompt: exit
This is the most reliable way to exit a shell. In some situations you could also type logout or simply press Ctrl+d, which sends an EOT character (end of transmission, ASCII/ISO character 4) to the shell.
cat echoes the contents of one or more text files.
shell-prompt: cat file [file ...]
shell-prompt: cat /etc/hosts
The vis and cat -v commands display invisible characters in a visible way. For example, carriage return characters present in Windows files are normally not shown by most Unix commands. The vis and cat -v commands will show them as '^M' (representing Control+M, which is what you would type to produce this character).
shell-prompt: cat sample.txt
This line contains a carriage return.
shell-prompt: vis sample.txt
This line contains a carriage return.\^M
shell-prompt: cat -v sample.txt
This line contains a carriage return.^M
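To experiment without a Windows file at hand, a test file containing a carriage return can be generated with printf. This is a minimal sketch, assuming the widely available mktemp command and a cat that supports -v (as the GNU and BSD versions do); the file name unix.txt is made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"

# printf interprets \r as a carriage return and \n as a newline
printf 'This line contains a carriage return.\r\n' > sample.txt
printf 'This line does not.\n' > unix.txt

cat -v sample.txt    # The carriage return appears as ^M
cat -v unix.txt      # No ^M here
```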
head shows the top N lines of one or more text files.
shell-prompt: head [-n N] file [file ...]
If the -n flag is given, followed by an integer N, the top N lines are shown instead of the default of 10.
shell-prompt: head -n 5 prog1.c
The head command can also be useful for generating small test inputs. Suppose you're developing a new program or script that processes genomic sequence files in FASTA format. Real FASTA files can contain millions of sequences and take a great deal of time to process. For testing new code, we don't need much data, and we want the test to complete in a few seconds rather than hours. We can use head to extract a small number of sequences from a large FASTA file for quick testing. Since FASTA files have alternating header and sequence lines, we must always choose a multiple of 2 lines. We use the output redirection operator (>) to send the head output to a file instead of the terminal screen. Redirection is covered in Section 1.13, “Redirection and Pipes”.
shell-prompt: head -n 1000 really-big.fasta > small-test.fasta
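The following sketch illustrates the idea on a synthetic file, since a real FASTA file may not be at hand. It assumes the widely available mktemp command and the strictly alternating header/sequence layout described above; the sequence names and contents are made up.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"

# Build a stand-in for a large FASTA file: 500 two-line records
i=1
while [ "$i" -le 500 ]; do
    printf '>seq%d\nACGTACGTAC\n' "$i"
    i=$((i + 1))
done > really-big.fasta

# Keep the first 10 records: 10 headers + 10 sequences = 20 lines
head -n 20 really-big.fasta > small-test.fasta

grep -c '^>' small-test.fasta    # Counts header lines: prints 10
```

Counting the header lines afterward is a quick sanity check that no record was cut in half.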
tail shows the bottom N lines of one or more text files.
shell-prompt: tail [-n N] file [file ...]
The tail command is especially useful for viewing the end of a large file that would be cumbersome to view with more.
If the -n flag is given, followed by an integer N, the bottom N lines are shown instead of the default of 10.
shell-prompt: tail -n 5 output.txt
The diff command shows the differences between two text files. This is most useful for comparing two versions of the same file to see what has changed. Also see cdiff, a specialized version of diff, for comparing C source code.
The -u flag asks for unified diff output, which shows the removed text (text in the first file but not the second) preceded by a '-', the added text (text in the second file but not the first) preceded by a '+', and some unchanged lines for context. Most people find this easier to read than the default output format.
shell-prompt: diff -u input1.txt input2.txt
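The unified format is easiest to understand by comparing two tiny files. This is a minimal sketch, assuming the widely available mktemp command; the file contents and the name changes.diff are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched
work=$(mktemp -d)
cd "$work"

printf 'alpha\nbeta\ngamma\n' > input1.txt
printf 'alpha\nBETA\ngamma\n' > input2.txt

# diff exits with status 1 when the files differ, which is not an error here
diff -u input1.txt input2.txt > changes.diff || true
cat changes.diff
```

The output contains the line -beta (present only in the first file) and the line +BETA (present only in the second), surrounded by the unchanged lines alpha and gamma as context.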
There are more text editors available for Unix systems than any one person is aware of. Some are terminal-based, some are graphical, and some have both types of interfaces.
All Unix systems support running graphical programs from remote locations, but many graphical programs require a fast connection (100 megabits/sec or more) to function comfortably.
Knowing how to use a terminal-based text editor is therefore a very good idea, so that you're prepared to work on a remote Unix system over a slow connection if necessary. Some of the more common terminal-based editors are described below.
vi (visual editor) is the standard text editor for all Unix systems. Most users either love or hate the vi interface, but it's a good editor to know since it is available on every Unix system.
nano is an extremely simple text editor that is ideal for beginners. It is a rewrite of the pico editor, which is known to have many bugs and security issues. Neither editor is standard on Unix systems, but both are free and easy to install. These editors entail little or no learning curve, but are not sophisticated enough for extensive programming or scripting.
emacs (Editor MACroS) is a more sophisticated editor used by many programmers. It is known for being hard to learn, but very powerful. It is not standard on most Unix systems, but is free and easy to install.
ape is a menu-driven, user-friendly IDE (integrated development environment), i.e. programmer's editor. It has an interface similar to PC and Mac programs, but works on a standard Unix terminal. It is not standard on most Unix systems, but is free and easy to install. ape has a small learning curve, and advanced features to make programming much faster.
Eclipse is a popular open-source graphical IDE written in Java, with support for many languages. It is sluggish over a slow connection, so it may not work well on remote systems over ssh.
hostname prints the network name of the machine.
shell-prompt: hostname
This is often useful when you are working on multiple Unix machines at the same time (e.g. via ssh), and have forgotten which window belongs to which machine.
ssh is used to remotely log into another machine on the network and start a shell.
shell-prompt: ssh [name@]hostname
shell-prompt: ssh joe@unixdev1.ceas.uwm.edu
Network commands for transferring files are discussed in Section 1.15, “File Transfer”.
passwd changes your password. It asks for your old password once, and the new one twice (to ensure that you don't accidentally set your password to something you don't know because your finger slipped). Unlike many graphical password programs, passwd does not echo anything for each character typed. Even allowing someone to see the length of your password is a bad idea from a security standpoint.
shell-prompt: passwd
The passwd command is generally only used for setting local passwords on the Unix machine itself. Many Unix systems are configured to authenticate users via a remote service such as Lightweight Directory Access Protocol (LDAP) or Active Directory (AD). Changing LDAP or AD passwords may require using a web portal to the LDAP or AD server instead of the passwd command.
clear clears your terminal screen (assuming the TERM environment variable is properly set).
shell-prompt: clear
reset resets your terminal to its default state. This is useful when your terminal has been corrupted by bad output, such as when attempting to view a binary file with cat.
Terminals are controlled by magic sequences, sequences of invisible control characters sent from the host computer to the terminal amid the normal output. Magic sequences move the cursor, change the color, change the international character set, etc. Binary files contain random data that sometimes, by chance, contains magic sequences that could alter the mode of your terminal. If this happens, running reset will usually correct the problem. If not, you will need to log out and log back in.
shell-prompt: reset
Table 1.7, “Unix Commands” provides a quick reference for looking up common Unix commands. For details on any of these commands, run man command (or info command on some systems).
Table 1.7. Unix Commands
Synopsis | Description |
---|---|
ls [file|directory] | List file(s) |
cp source-file destination-file | Copy a file |
cp source-file [source-file ...] directory | Copy multiple files to a directory |
mv source-file destination-file | Rename a file |
mv source-file [source-file ...] directory | Move multiple files to a directory |
ln source-file destination-file | Create another name for the same file. (source and destination must be in the same file system) |
ln -s source destination | Create a symbolic link to a file or directory |
rm file [file ...] | Remove one or more files |
rm -r directory | Recursively remove a directory and all of its contents |
srm file [file ...] | Securely erase and remove one or more files |
mkdir directory | Create a directory |
rmdir directory | Remove a directory (the directory must be empty) |
find start-directory criteria | Find files/directories based on flexible criteria |
make | Rebuild a file based on one or more other files |
od/hexdump | Show the contents of a file in octal/hexadecimal |
awk | Process tabular data from a text file |
sed | Stream editor. Echo files, making changes to contents. |
sort | Sort text files based on flexible criteria |
uniq | Echo files, eliminating adjacent duplicate lines. |
diff | Show differences between text files. |
cmp | Detect differences between binary files. |
cdiff | Show differences between C programs. |
cut | Extract substrings from text. |
m4 | Process text files containing m4 mark-up. |
chfn | Change finger info (personal identity). |
chsh | Change login shell. |
su | Substitute user. |
cc/clang/gcc/icc | Compile C programs. |
f77/f90/gfortran/ifort | Compile Fortran programs. |
ar | Create static object libraries. |
indent | Beautify C programs. |
astyle | Beautify C, C++, C#, and Java programs. |
tar | Pack a directory tree into a single file. |
gzip | Compress files. |
gunzip | Uncompress gzipped files. |
bzip2 | Compress files better (and slower). |
bunzip2 | Uncompress bzipped files. |
zcat/zmore/zgrep/bzcat/bzmore/bzgrep | Process compressed files. |
exec command | Replace shell process with command. |
date | Show the current date and time. |
cal | Print a calendar for any month of any year. |
bc | Unlimited precision calculator. |
printenv | Print environment variables. |
Make sure you are using the latest version of this document.
Carefully read one section of this document and casually read other material (such as corresponding sections in a textbook, if one exists) if needed.
Try to answer the questions from that section. If you do not remember the answer, review the section to find it.
Write the answer in your own words. Do not copy and paste. Verbalizing answers in your own words helps your memory and understanding. Copying does not, and demonstrates a lack of interest in learning.
Check the answer key to make sure your answer is correct and complete.
DO NOT LOOK AT THE ANSWER KEY BEFORE ANSWERING QUESTIONS TO THE VERY BEST OF YOUR ABILITY. In doing so, you would only cheat yourself out of an opportunity to learn and prepare for the quizzes and exams.
Important notes:
Show all your work. This will improve your understanding and ensure full credit for the homework.
The practice problems are designed to make you think about the topic, starting from basic concepts and progressing through real problem solving.
Try to verify your own results. In the working world, no one will be checking your work. It will be entirely up to you to ensure that it is done right the first time.
Start as early as possible to get your mind chewing on the questions, and do a little at a time. Using this approach, many answers will come to you seemingly without effort, while you're showering, walking the dog, etc.
What types of commands have to be internal to the shell? Give one example and explain why it must be internal.
How can you find a list of the basic Unix commands available on your system?
How can you find out whether the grep command is internal or external, and where it is located?
What kind of suffering did computer users have to endure in order to read documentation before the Unix renaissance? How did Unix put an end to such suffering?
How can we learn about all the command-line flags available for the tail command?
How can we copy the file /tmp/sample.txt to the CWD?
How can we copy all files whose names begin with "sample" and end with ".txt" to the CWD?
How can we move all the files whose names end with ".py" to a subdirectory of the CWD called "Python"?
How can we create another filename "./test-input.txt" for the file "./Data/input.txt"?
What is a hard link?
What is a symbolic link?
What do we get when we remove the path name to which a symbolic link points?
How do we create a new directory /home/joe/Data/Project1 if the Data directory does not exist and the CWD is /home/joe?
How do we remove the directory ./Data if it is empty? If it is not empty?
How can we find out how much disk space is available in each file system?
How can we find out how much space is used by the Data directory?
How can we change CWD to /tmp, then to /etc and then return to the original CWD?
How do we exit the shell?
How can we see if there are carriage returns in graph.py?
How can we see the first 20 lines of output.txt?
How can we see the last 20 lines of output.txt?
How can we see what has changed between analysis.c.old and analysis.c?
Which text editor is available on all Unix systems?
How can we find out the name of the machine running our shell?
How can user joe log into the remote server unixdev1.ceas.uwm.edu to run commands on it?
How do we change our local password on a Unix system?
How do we change our password for a Unix system that relies on LDAP or AD?
How do we clear the terminal display?
How do we reset the terminal mode to defaults?