1.10. Unix Commands and the Shell

Before You Begin

You should have a basic understanding of Unix processes, files, and directories. These topics are covered in Section 1.8, “Processes” and Section 1.9, “The Unix File System”.

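Unix commands fall into one of two categories: internal commands, which are part of the shell itself, and external commands, which are programs separate from the shell.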

1.10.1. Internal Commands

Commands are implemented internally only when it is necessary or when there is a substantial benefit. If all commands were part of the shell, the shell would be enormous and require too much memory.

One command that must be internal is the cd command, which changes the CWD of the shell process. The cd command cannot be implemented as an external command, since the CWD is a property of the process, as described in the section called “Current Working Directory”.

We can prove this using proof by contradiction. If the cd command were external, it would run as a child process of the shell. Hence, running cd would create a child process, which would inherit the CWD from the shell process, alter its own copy of the CWD, and then terminate. The CWD of the parent, the shell process, would be unaffected.

Expecting an external command to change your CWD for you would be akin to asking one of your children to take a shower for you. Neither is capable of effecting the desired change. Likewise, any command that alters the state of the shell process must be implemented as an internal command.
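
The argument above can be checked with a short experiment. This is only a sketch: it assumes a POSIX-style shell with sh available, and the variable names are made up for illustration. The child shell started by sh -c changes its own CWD, while the CWD of the parent shell stays the same.

```shell
# Record the parent shell's CWD.
before=$(pwd)

# Start a child shell process that changes ITS copy of the CWD and reports it.
child=$(sh -c 'cd / && pwd')

# Check the parent shell's CWD again after the child has terminated.
after=$(pwd)

echo "parent before: $before"
echo "child:         $child"
echo "parent after:  $after"
```

The first and last lines of output should show the same directory, since the child process could only alter its own copy of the CWD.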

1.10.2. External Commands

Most commands are external, i.e. programs separate from the shell. As a result, they behave the same way regardless of which shell we use to run them.

The executable files containing external commands are kept in certain directories, most of which are called bin (short for "binary", since most executable files are binary files containing machine code). The most essential commands required for the Unix system to function are kept in /bin and /usr/bin. The location of optional add-on commands varies, but a typical location is /usr/local/bin. Debian and Red Hat Linux mix add-on commands with core system commands in /usr/bin. BSD systems keep them in separate directories such as /usr/local/bin or /usr/pkg/bin.
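
As a rough illustration of the difference, the standard command -v utility reports how the shell would run a given command name. The sketch below assumes a POSIX-style shell running a script (so no aliases are defined): an external command is reported as a full path name, while an internal command is reported by name only.

```shell
# Ask the shell how it would run each command name.
for cmd in cd ls; do
    location=$(command -v "$cmd")
    case $location in
        /*) echo "$cmd is external, found at $location" ;;
        *)  echo "$cmd is not a separate program (reported as: $location)" ;;
    esac
done
```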

Practice Break

  1. Use which under C shell family shells to find out whether the following commands are internal or external. Use type under Bourne family shells (bash, ksh, dash, zsh). You can use either command under either shell, but will get better results if you follow the advice above. (Try both and see what happens.)
    shell-prompt: which cd
    shell-prompt: which cp
    shell-prompt: which exit
    shell-prompt: which ls
    shell-prompt: which pwd
                        
  2. Use ls to find out what commands are located in /bin and /usr/bin.

1.10.3. Getting Help

In the dark ages before Unix, when programmers wanted to look up a command or function, they actually had to get out of their chairs and walk somewhere to pick up a typically ring-bound printed manual to flip through.

The Unix designers saw the injustice of this situation and set out to rectify it. They imagined a Utopian world where they could sit in the same chair for ten hours straight without ever taking their eyes off the monitor or their fingers off the keyboard, happily subsisting on coffee and potato chips.

Aside

If there is one trait that best defines an engineer it is the ability to concentrate on one subject to the complete exclusion of everything else in the environment. This sometimes causes engineers to be pronounced dead prematurely. Some funeral homes in high-tech areas have started checking resumes before processing the bodies. Anybody with a degree in electrical engineering or experience in computer programming is propped up in the lounge for a few days just to see if he or she snaps out of it.

-- The Engineer Identification Test (Anonymous)

And so, online documentation was born. On Unix systems, all common Unix commands are documented in detail on the Unix system itself, and the documentation is accessible via the command line (you do not need a GUI to view it, which is important when using a dumb terminal to access a remote system). Whenever you want to know more about a particular Unix command, you can find out by typing man command-name. For example, to learn all about the ls command, type:

shell-prompt: man ls
            

The man command covers virtually every common command, as well as other topics. It even covers itself:

shell-prompt: man man
            

The man command displays a nicely formatted document known as a man page. It uses a file viewing program called more, which can be used to browse through text files very quickly. Table 1.6, “Common hot keys in more” shows the most common keystrokes used to navigate a man page. For complete information on navigation, run:

shell-prompt: man more
            

Table 1.6. Common hot keys in more

Key             Action
h               Show key commands
Space bar       Forward one page
Enter/Return    Forward one line
b               Back one page
/               Search

Man pages include a number of standard sections, such as SYNOPSIS, DESCRIPTION, and SEE ALSO, which helps you identify other commands that might be of use.

Man pages do not always make good tutorials. Sometimes they contain too much detail, and they are often not well written for novice users. If you're learning a new command for the first time, you might want to consult a Unix book or the web. The man pages will provide the most detailed and complete reference information on most commands, however.

The apropos command is used to search the man page headings for a given topic. It is equivalent to man -k. For example, to find out what man pages exist regarding Fortran, we might try the following:

shell-prompt: apropos fortran
            

or

shell-prompt: man -k fortran
            

The whatis command is similar to apropos in that it lists short descriptions of commands. However, on most systems whatis only lists man pages whose names match the search string as a whole word, whereas apropos also searches the short descriptions and attempts to list everything related to the string.

The info command is an alternative to man that uses a non-graphical hypertext system instead of flat files. This allows the user to navigate extensive documentation more efficiently. The info command has a fairly steep learning curve, but it is very powerful, and is often the best option for documentation on a given topic. Some open source software ships documentation in info format and provides a man page (converted from the info files) that actually has less information in it.

shell-prompt: info gcc
            

Practice Break

  1. Find out how to display a '/' after each directory name and a '*' after each executable file when running ls.
  2. Use apropos to find out what Unix commands to use with bzip files.

1.10.4. A Basic Set of Unix Commands

Most Unix commands have short names which are abbreviations or acronyms for what they do (pwd = print working directory, cd = change directory, ls = list, ...). Unix was originally designed for people with good memories and poor typing skills. Some of the most commonly used Unix commands are described below.

Note

This section is meant to serve as a quick reference, and to inform new readers about which commands they should learn. There is much more to know about these commands than we can cover here. For full details about any of the commands described here, consult the man pages, info pages, or the web.

This section uses the same notation conventions as the Unix man pages:

  • Optional arguments are shown inside [].
  • The 'or' symbol (|) between two items means one or the other.
  • An ellipsis (...) means optionally more of the same.
  • "file" means a filename is required and a directory name is not allowed. "directory" means a directory name is required, and a filename is not allowed. "path" means either a filename or directory name is acceptable.

File and Directory Management

cp copies one or more files.

shell-prompt: cp source-file destination-file
shell-prompt: cp source-file [source-file ...] destination-directory
                

If there is only one source filename, then destination can be either a filename or a directory. If there are multiple source files, then destination must be a directory. If destination is a filename, and the file exists, it will be overwritten.

shell-prompt: cp file file.bak     # Make a backup copy
shell-prompt: cp file file.bak ~   # Copy files to home directory
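
The rules above can be tried safely in a scratch directory. The following is only a sketch: it assumes the widely available (but not strictly standard) mktemp -d command, and the names Backup and file are made up for illustration. Note that the second cp silently overwrites the existing file.bak.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

echo "version 1" > file
cp file file.bak            # One source, destination is a new filename

echo "version 2" > file
cp file file.bak            # Destination exists: it is overwritten without warning
cat file.bak

mkdir Backup
cp file file.bak Backup     # Multiple sources: destination must be a directory
ls Backup
```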
                

ls lists files in CWD or a specified file or directory.

shell-prompt: ls [path ...]
                
shell-prompt: ls           # List CWD
shell-prompt: ls /etc      # List /etc directory
                

mv moves or renames files or directories.

shell-prompt: mv source destination
shell-prompt: mv source [source ...] destination-directory
                

If multiple sources are given, destination must be a directory.

shell-prompt: mv prog1.c Programs
                

ln links files or directories.

shell-prompt: ln source-file destination-file
shell-prompt: ln -s source destination
                

The ln command creates another path name for the same file. Both names refer to the same file, and changes made through one appear in the other.

Without -s, a standard directory entry, known as a hard link, is created. A hard link is a directory entry that points to the first block of data in the file. Every file must have at least one hard link to it. If only one path name exists for a file, it is a hard link. For this reason, removing a file is also known as "unlinking".

To create a second hard link, the source and destination path names must be in the same file system. File systems under Windows appear as different drive letters, such as C: or D:. Under Unix, all file systems are merged into a single directory tree under /. The df command lists file systems and their locations within the directory tree. There is no harm in trying to create a hard link. If it fails, you can create a soft link instead.

With -s, a symbolic link, or soft link, is created. A symbolic link is not a standard directory entry, but a pointer to another path name. It is a directory entry that points to another directory entry rather than the content of the file. Only symbolic links can be used for directories, and symbolic links do not have to be in the same file system as the source.

shell-prompt: ln -s /etc/hosts ~    # Make a convenient link to hosts
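
To see that all the names really refer to the same file, we can experiment in a scratch directory. This is a minimal sketch, assuming mktemp -d is available; the filenames data.txt, hard.txt, and soft.txt are hypothetical.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

echo "first line" > data.txt
ln data.txt hard.txt        # Second hard link to the same file
ln -s data.txt soft.txt     # Symbolic link pointing to the path name data.txt

# Append through the hard link, then read through the other two names.
echo "second line" >> hard.txt
cat data.txt
cat soft.txt

# In the long listing, soft.txt is shown with an arrow to its target.
ls -l
```

Both cat commands should print the same two lines, since the change made through hard.txt is visible through every name for the file.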
                

rm removes one or more files.

shell-prompt: rm file [file ...]
                
shell-prompt: rm temp.txt core a.out
                

Caution

Removing files with rm is not like dragging them to the trash. Once files are removed by rm, they cannot be recovered.

If there are multiple hard links to a file, removing one of them only removes the link, and remaining links are still valid.

Caution

Removing the path name to which a symbolic link points will render the symbolic link invalid. It will become a dangling link.
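
The different effects of rm on hard links and symbolic links can be observed with a small experiment. This sketch assumes mktemp -d is available and uses made-up filenames. The test [ -e name ] follows symbolic links, so it fails for a dangling link, while [ -L name ] only checks that the link itself exists.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

echo "data" > original.txt
ln original.txt hard.txt       # Hard link
ln -s original.txt soft.txt    # Symbolic link

rm original.txt                # Remove one of the two hard links

cat hard.txt                   # The remaining hard link is still valid

if [ -e soft.txt ]; then
    echo "soft.txt still leads to a file"
else
    echo "soft.txt is a dangling link"
fi
```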

srm (secure rm) removes files securely, erasing the file content and directory entry so that the file cannot be recovered. Use this to remove files that contain sensitive data. This is not a standard Unix command, but a free program that can be easily installed on most systems via a package manager.

mkdir creates one or more directories.

shell-prompt: mkdir [-p] directory [directory ...]
                

The -p flag indicates that mkdir should attempt to create any parent directories in the path that don't already exist. If not used, mkdir will fail unless all but the last component of the path already exist.

shell-prompt: mkdir Programs
shell-prompt: mkdir -p Programs/C/MPI
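
The effect of -p is easiest to see by trying the command both ways. The following sketch assumes mktemp -d is available and works in a scratch directory, so the Programs directory it creates is not in your real home directory.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

# Without -p this fails, because Programs and Programs/C do not exist yet.
if mkdir Programs/C/MPI 2>/dev/null; then
    echo "created without -p"
else
    echo "mkdir failed: parent directories are missing"
fi

# With -p, the missing parent directories are created along the way.
mkdir -p Programs/C/MPI

# Running it again is harmless: -p does not complain about existing directories.
mkdir -p Programs/C/MPI
ls -R Programs
```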
                

rmdir removes one or more empty directories.

shell-prompt: rmdir directory [directory ...]
                

rmdir will fail if a directory is not completely empty. You may also need to check for hidden files using ls -a directory. To remove a directory and everything under it, use rm -r directory.

shell-prompt: rmdir Programs/C/MPI
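
A hidden file is a common reason for rmdir to fail on a directory that looks empty. The sketch below assumes mktemp -d is available; the directory name Scratch and the file .hidden are made up for illustration.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

mkdir Scratch
touch Scratch/.hidden

ls Scratch        # Prints nothing, so the directory looks empty
ls -a Scratch     # Reveals .hidden (along with . and ..)

# rmdir refuses to remove a directory that still contains a file.
rmdir Scratch 2>/dev/null || echo "rmdir failed: Scratch is not empty"

# rm -r removes the directory and everything under it.
rm -r Scratch
```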
                

find locates files within a subtree using a wide variety of possible criteria.

shell-prompt: find start-directory criteria [action]
                

find is a very powerful and complex command that can be used to not only find files, but run commands on the files matching the search criteria.

The find command can process globbing patterns like the shell does, but we must enclose them in quotes to prevent the shell from expanding them before find runs.

# Find all core files (names end with "core")
shell-prompt: find . -name '*core'

# Remove cores
shell-prompt: find . -name '*core' -exec rm '{}' \;

# Remove multiple cores with each rm command (much faster)
shell-prompt: find . -name '*core' -exec rm '{}' +
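
To see why the quotes matter, consider the following sketch, which assumes mktemp -d is available and uses a small made-up directory tree. When the pattern is not quoted and a file named core exists in the CWD, the shell replaces *core with core before find ever sees it, so a.out.core is missed.

```shell
# Build a small scratch tree containing three "core" files and one other file.
dir=$(mktemp -d)
cd "$dir"
mkdir -p a/b
touch core a/a.out.core a/b/core notes.txt

# Quoted: find receives the pattern *core and matches all three files.
quoted=$(find . -name '*core' | wc -l)

# Unquoted: the shell expands *core to "core" first, so only exact matches are found.
unquoted=$(find . -name *core | wc -l)

echo "quoted pattern matched:  " $quoted
echo "unquoted pattern matched:" $unquoted

# Remove all of the core files with as few rm commands as possible.
find . -name '*core' -exec rm '{}' +
find . -type f
```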
                

df shows the free disk space on all currently mounted partitions.

shell-prompt: df
                

du reports the disk usage of a directory and everything under it.

shell-prompt: du [-s] [-h] path
                

The -s (summary) flag suppresses output about each file in the subtree, so that only the total disk usage of the directory is shown. The -h flag asks for human-readable output with gigabytes followed by a G, megabytes by an M, etc.

shell-prompt: du -sh Qemu
6.8G    Qemu/
                

Shell Internal Commands

As mentioned previously, internal commands are part of the shell, and serve to control the shell itself. Below are some of the most common internal commands.

cd changes the current working directory of the shell process. It is described in more detail in the section called “Current Working Directory”.

shell-prompt: cd [directory]
                

pushd changes CWD and saves the old CWD on a stack so that we can easily return.

shell-prompt: pushd directory
                

Users often encounter the need to temporarily go to another directory, run a few commands, and then come back to the current directory.

The pushd command is a very useful alternative to cd that helps in this situation. It performs the same operation as cd, but it records the starting CWD by adding it to the top of a stack of CWDs. You can then return to where the last pushd command was invoked using popd. This saves you from having to retype the path name of the directory to which you want to return. This is like leaving a trail of bread crumbs in the woods to retrace your path back home, except the pushd stack will not get eaten by birds and squirrels, and you won't end up in a witch's soup pot.

Practice Break

Try the following sequence of commands:

shell-prompt: pwd          # Check starting point
shell-prompt: pushd /etc
shell-prompt: more hosts
shell-prompt: pushd /home
shell-prompt: ls
shell-prompt: popd         # Back to /etc
shell-prompt: pwd
shell-prompt: more hosts
shell-prompt: popd         # Back to starting point
shell-prompt: pwd
                    

exit terminates the shell process.

shell-prompt: exit
                

This is the most reliable way to exit a shell. In some situations you could also type logout or simply press Ctrl+d, which sends an EOT character (end of transmission, ASCII/ISO character 4) to the shell.

Simple Text File Processing

cat echoes the contents of one or more text files.

shell-prompt: cat file [file ...]
                
shell-prompt: cat /etc/hosts
                

The vis and cat -v commands display invisible characters in a visible way. For example, carriage return characters present in Windows files are normally not shown by most Unix commands. The cat -v command will show them as '^M' (representing Control+M, which is what you would type to produce this character), and vis will show them as '\^M'.

shell-prompt: cat sample.txt
This line contains a carriage return.
shell-prompt: vis sample.txt
This line contains a carriage return.\^M
shell-prompt: cat -v sample.txt 
This line contains a carriage return.^M
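
If you do not have a Windows text file handy, you can create one for experimenting. This sketch assumes mktemp -d is available and that your cat supports the common (but not strictly standard) -v flag; printf writes a carriage return for each \r in its format string.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

# Write one line ending in carriage return + newline, as Windows editors do.
printf 'This line contains a carriage return.\r\n' > sample.txt

# Capture what cat -v shows, then display it.
visible=$(cat -v sample.txt)
echo "$visible"
```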
                

head shows the top N lines of one or more text files.

shell-prompt: head [-n N] file [file ...]
                

If the -n flag is given, followed by an integer N, the top N lines are shown instead of the default of 10.

shell-prompt: head -n 5 prog1.c
                

The head command can also be useful for generating small test inputs. Suppose you're developing a new program or script that processes genomic sequence files in FASTA format. Real FASTA files can contain millions of sequences and take a great deal of time to process. For testing new code, we don't need much data, and we want the test to complete in a few seconds rather than hours. We can use head to extract a small number of sequences from a large FASTA file for quick testing. Since FASTA files have alternating header and sequence lines, we must always choose a multiple of 2 lines. We use the output redirection operator (>) to send the head output to a file instead of the terminal screen. Redirection is covered in Section 1.13, “Redirection and Pipes”.

shell-prompt: head -n 1000 really-big.fasta > small-test.fasta
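
The same idea can be tried on a tiny made-up file. This is only a sketch: it assumes mktemp -d is available, the file big.fasta is generated here with fake sequences, and each record occupies exactly two lines, as described above. It also uses grep -c to count the header lines, which begin with '>'.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

# Generate a fake FASTA file with 10 two-line records.
i=1
while [ "$i" -le 10 ]; do
    printf '>seq%d\nACGTACGT\n' "$i"
    i=$((i + 1))
done > big.fasta

# Extract the first 3 records, i.e. 3 * 2 = 6 lines.
head -n 6 big.fasta > small-test.fasta

# Count the header lines to confirm how many sequences were extracted.
grep -c '^>' small-test.fasta
```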
                

tail shows the bottom N lines of one or more text files.

shell-prompt: tail [-n N] file [file ...]
                

tail is especially useful for viewing the end of a large file that would be cumbersome to view with more.

If the -n flag is given, followed by an integer N, the bottom N lines are shown instead of the default of 10.

shell-prompt: tail -n 5 output.txt
                

The diff command shows the differences between two text files. This is most useful for comparing two versions of the same file to see what has changed. Also see cdiff, a specialized version of diff, for comparing C source code.

The -u flag asks for unified diff output, which shows the removed text (text in the first file but not the second) preceded by a '-', the added text (text in the second file but not the first) preceded by a '+', and some unchanged lines for context. Most people find this easier to read than the default output format.

shell-prompt: diff -u input1.txt input2.txt
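
A small experiment shows what the unified format looks like. This sketch assumes mktemp -d is available and uses two made-up three-line files that differ only in the second line. Note that diff returns a non-zero exit status when the files differ, which is not an error.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir"

printf 'alpha\nbeta\ngamma\n'  > input1.txt
printf 'alpha\ndelta\ngamma\n' > input2.txt

# Capture the unified diff. "|| true" absorbs the non-zero exit status.
out=$(diff -u input1.txt input2.txt || true)
echo "$out"
```

The output should contain the line -beta (removed), the line +delta (added), and the unchanged lines alpha and gamma as context.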
                

Text Editors

There are more text editors available for Unix systems than any one person is aware of. Some are terminal-based, some are graphical, and some have both types of interfaces.

All Unix systems support running graphical programs from remote locations, but many graphical programs require a fast connection (100 megabits/sec or more) to function comfortably.

Knowing how to use a terminal-based text editor is therefore a very good idea, so that you're prepared to work on a remote Unix system over a slow connection if necessary. Some of the more common terminal-based editors are described below.

vi (visual editor) is the standard text editor for all Unix systems. Most users either love or hate the vi interface, but it's a good editor to know since it is available on every Unix system.

nano is an extremely simple text editor that is ideal for beginners. It is a rewrite of the pico editor, which is known to have many bugs and security issues. Neither editor is standard on Unix systems, but both are free and easy to install. These editors entail little or no learning curve, but are not sophisticated enough for extensive programming or scripting.

emacs (Editor MACroS) is a more sophisticated editor used by many programmers. It is known for being hard to learn, but very powerful. It is not standard on most Unix systems, but is free and easy to install.

ape is a menu-driven, user-friendly IDE (integrated development environment), i.e. programmer's editor. It has an interface similar to PC and Mac programs, but works on a standard Unix terminal. It is not standard on most Unix systems, but is free and easy to install. ape has a small learning curve, and advanced features to make programming much faster.

Eclipse is a popular open-source graphical IDE written in Java, with support for many languages. It is sluggish over a slow connection, so it may not work well on remote systems over ssh.

Networking

hostname prints the network name of the machine.

shell-prompt: hostname
                

This is often useful when you are working on multiple Unix machines at the same time (e.g. via ssh), and forget which window applies to each machine.

ssh is used to remotely log into another machine on the network and start a shell.

shell-prompt: ssh [name@]hostname
                
shell-prompt: ssh joe@unixdev1.ceas.uwm.edu
                

Network commands for transferring files are discussed in Section 1.15, “File Transfer”.

Identity and Access Management

passwd changes your password. It asks for your old password once, and the new one twice (to ensure that you don't accidentally set your password to something you don't know because your finger slipped). Unlike many graphical password programs, passwd does not echo anything for each character typed. Even allowing someone to see the length of your password is a bad idea from a security standpoint.

shell-prompt: passwd
                

The passwd command is generally only used for setting local passwords on the Unix machine itself. Many Unix systems are configured to authenticate users via a remote service such as Lightweight Directory Access Protocol (LDAP) or Active Directory (AD). Changing LDAP or AD passwords may require using a web portal to the LDAP or AD server instead of the passwd command.

Terminal Control

clear clears your terminal screen (assuming the TERM environment variable is properly set).

shell-prompt: clear
                

reset resets your terminal to its default state. This is useful when your terminal has been corrupted by bad output, such as when attempting to view a binary file with cat.

Terminals are controlled by magic sequences (more commonly called escape sequences), which are sequences of invisible control characters sent from the host computer to the terminal amid the normal output. Magic sequences move the cursor, change the color, change the international character set, etc. Binary files contain arbitrary data that sometimes, by chance, includes magic sequences that could alter the mode of your terminal. If this happens, running reset will usually correct the problem. If not, you will need to log out and log back in.

shell-prompt: reset
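
To get a feel for the magic sequences described above, we can send one to the terminal deliberately. This is only a sketch, and it assumes a terminal that understands the common ANSI/VT100-style sequences, which is an assumption about your terminal rather than something guaranteed by Unix. Here, the escape character (octal 033) followed by [1m turns on bold text and [0m restores the normal attributes.

```shell
# Print a word in bold, then switch back to normal text.
printf 'normal \033[1mbold\033[0m normal again\n'

# The "bold on" sequence is just 4 bytes mixed into the output: ESC, [, 1, m.
bold_on_bytes=$(printf '\033[1m' | wc -c)
echo "bytes in the bold-on sequence:" $bold_on_bytes
```

If the first printf had omitted the final [0m sequence, the terminal would have stayed in bold mode, which is a mild version of what happens when binary data is sent to the screen.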
                

Table 1.7, “Unix Commands” provides a quick reference for looking up common Unix commands. For details on any of these commands, run man command (or info command on some systems).

Table 1.7. Unix Commands

Synopsis                                    Description
ls [file|directory]                         List file(s)
cp source-file destination-file             Copy a file
cp source-file [source-file ...] directory  Copy multiple files to a directory
mv source-file destination-file             Rename a file
mv source-file [source-file ...] directory  Move multiple files to a directory
ln source-file destination-file             Create another name for the same file. (source and destination must be in the same file system)
ln -s source destination                    Create a symbolic link to a file or directory
rm file [file ...]                          Remove one or more files
rm -r directory                             Recursively remove a directory and all of its contents
srm file [file ...]                         Securely erase and remove one or more files
mkdir directory                             Create a directory
rmdir directory                             Remove a directory (the directory must be empty)
find start-directory criteria               Find files/directories based on flexible criteria
make                                        Rebuild a file based on one or more other files
od/hexdump                                  Show the contents of a file in octal/hexadecimal
awk                                         Process tabular data from a text file
sed                                         Stream editor. Echo files, making changes to contents.
sort                                        Sort text files based on flexible criteria
uniq                                        Echo files, eliminating adjacent duplicate lines.
diff                                        Show differences between text files.
cmp                                         Detect differences between binary files.
cdiff                                       Show differences between C programs.
cut                                         Extract substrings from text.
m4                                          Process text files containing m4 mark-up.
chfn                                        Change finger info (personal identity).
chsh                                        Change login shell.
su                                          Substitute user.
cc/clang/gcc/icc                            Compile C programs.
f77/f90/gfortran/ifort                      Compile Fortran programs.
ar                                          Create static object libraries.
indent                                      Beautify C programs.
astyle                                      Beautify C, C++, C#, and Java programs.
tar                                         Pack a directory tree into a single file.
gzip                                        Compress files.
gunzip                                      Uncompress gzipped files.
bzip2                                       Compress files better (and slower).
bunzip2                                     Uncompress bzipped files.
zcat/zmore/zgrep/bzcat/bzmore/bzgrep        Process compressed files.
exec command                                Replace shell process with command.
date                                        Show the current date and time.
cal                                         Print a calendar for any month of any year.
bc                                          Unlimited precision calculator.
printenv                                    Print environment variables.

1.10.5. Practice

Instructions

  1. Make sure you are using the latest version of this document.

  2. Carefully read one section of this document and casually read other material (such as corresponding sections in a textbook, if one exists) if needed.

  3. Try to answer the questions from that section. If you do not remember the answer, review the section to find it.

  4. Write the answer in your own words. Do not copy and paste. Verbalizing answers in your own words helps your memory and understanding. Copying does not, and demonstrates a lack of interest in learning.

  5. Check the answer key to make sure your answer is correct and complete.

    DO NOT LOOK AT THE ANSWER KEY BEFORE ANSWERING QUESTIONS TO THE VERY BEST OF YOUR ABILITY. In doing so, you would only cheat yourself out of an opportunity to learn and prepare for the quizzes and exams.

Important notes:

  • Show all your work. This will improve your understanding and ensure full credit for the homework.

  • The practice problems are designed to make you think about the topic, starting from basic concepts and progressing through real problem solving.

  • Try to verify your own results. In the working world, no one will be checking your work. It will be entirely up to you to ensure that it is done right the first time.

  • Start as early as possible to get your mind chewing on the questions, and do a little at a time. Using this approach, many answers will come to you seemingly without effort, while you're showering, walking the dog, etc.

  1. What types of commands have to be internal to the shell? Give one example and explain why it must be internal.

  2. How can you find a list of the basic Unix commands available on your system?

  3. How can you find out whether the grep command is internal or external, and where it is located?

  4. What kind of suffering did computer users have to endure in order to read documentation before the Unix renaissance? How did Unix put an end to such suffering?

  5. How can we learn about all the command-line flags available for the tail command?

  6. How can we copy the file /tmp/sample.txt to the CWD?

  7. How can we copy all files whose names begin with "sample" and end with ".txt" to the CWD?

  8. How can we move all the files whose names end with ".py" to a subdirectory of the CWD called "Python"?

  9. How can we create another filename "./test-input.txt" for the file "./Data/input.txt"?

  10. What is a hard link?

  11. What is a symbolic link?

  12. What do we get when we remove the path name to which a symbolic link points?

  13. How do we create a new directory /home/joe/Data/Project1 if the Data directory does not exist and the CWD is /home/joe?

  14. How do we remove the directory ./Data if it is empty? If it is not empty?

  15. How can we find out how much disk space is available in each file system?

  16. How can we find out how much space is used by the Data directory?

  17. How can we change CWD to /tmp, then to /etc and then return to the original CWD?

  18. How do we exit the shell?

  19. How can we see if there are carriage returns in graph.py?

  20. How can we see the first 20 lines of output.txt?

  21. How can we see the last 20 lines of output.txt?

  22. How can we see what has changed between analysis.c.old and analysis.c?

  23. Which text editor is available on all Unix systems?

  24. How can we find out the name of the machine running our shell?

  25. How can user joe log into the remote server unixdev1.ceas.uwm.edu to run commands on it?

  26. How do we change our local password on a Unix system?

  27. How do we change our password for a Unix system that relies on LDAP or AD?

  28. How do we clear the terminal display?

  29. How do we reset the terminal mode to defaults?