1.11. Unix Commands and the Shell

Before You Begin

You should have a basic understanding of Unix processes, files, and directories. These topics are covered in Section 1.9, “Processes” and Section 1.10, “The Unix File system”.

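Unix commands fall into one of two categories: internal commands, which are part of the shell itself, and external commands, which are separate executable files.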

1.11.1. Internal Commands

Commands are implemented internally only when it is necessary, or when there is a substantial benefit. If all commands were part of the shell, the shell would be enormous and would require too much memory.

An example is the cd command, which changes the CWD of the shell process. The cd command cannot be implemented as an external command, since the CWD is a property of the process.

We can prove this using proof by contradiction. Assuming the cd command is external, it would run as a child process of the shell. Hence, running cd would create a child process, which would alter its own CWD, and then terminate. Altering the CWD of a child process does not affect the CWD of the parent process. Remember that every process in a Unix system has its own independent CWD.

Expecting an external command to change your CWD for you would be akin to asking one of your children to take a shower for you. Neither is capable of effecting the desired change.

Likewise, any command that alters the state of the shell process must be implemented as an internal command.
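The argument above can be checked directly. The following is a minimal sketch, assuming a Bourne-family shell: it runs cd inside a child sh process and then confirms that the CWD of the parent shell has not moved. The variable names before and after are only illustrative.

```shell
# Record the CWD of this shell process.
before=$(pwd)

# Run cd in a child process: sh starts, changes its own CWD, prints it, and exits.
sh -c 'cd / && pwd'

# The CWD of the parent shell is unchanged.
after=$(pwd)
echo "before: $before"
echo "after:  $after"
```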

A command might also be implemented internally simply because it's trivial to do so, and it saves the overhead of loading and running an external command. When the work done by a command is very simple, it might take more resources to load an external program than it does to actually run it. In these cases, it makes more sense to implement it as part of the shell.

1.11.2. External Commands

The executable files containing external commands are kept in certain directories, most of which are called bin (short for binary, since most executable files are binary files). The most essential commands that are common to most Unix systems are kept in /bin and /usr/bin. The location of optional add-on commands varies, but a typical location is /usr/local/bin.

The list of directories that are searched when looking for external commands is kept in an environment variable called PATH. The environment is discussed in more detail in Section 1.16, “Environment Variables”.
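To make the search concrete, here is a rough sketch of what a shell does when it looks for an external command. It assumes a Bourne-family shell and that ls is installed somewhere in PATH; the variable names cmd, candidate, and found are hypothetical, and a real shell performs this search internally rather than with a loop like this.

```shell
# Print each directory in PATH on its own line.
echo "$PATH" | tr ':' '\n'

# Check the PATH directories in order for an executable file named ls.
cmd=ls
found=""
old_ifs=$IFS
IFS=:                        # split $PATH at the colons
for dir in $PATH; do
    candidate="${dir:-.}/$cmd"   # an empty PATH entry means the CWD
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
        found=$candidate
        break                # the first match wins
    fi
done
IFS=$old_ifs
echo "$cmd is $found"
```

Because the first match wins, the order of the directories in PATH determines which program runs when two directories contain commands with the same name.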

Practice Break

  1. Use which under C shell family shells (csh and tcsh) to find out whether the following commands are internal or external. Use type under Bourne family shells (bash, ksh, dash, zsh). You can use either command under either shell, but will get better results if you follow the advice above. (Try both and see what happens.)
    shell-prompt: which cd
    shell-prompt: which cp
    shell-prompt: which exit
    shell-prompt: which ls
    shell-prompt: which pwd
    		    
  2. Find out where your external commands are stored by running echo $PATH.
  3. Use ls to find out what commands are located in /bin and /usr/bin.

1.11.3. Getting Help

In the olden days before Unix, when programmers wanted to look up a command or function, they would have to get out of their chair and walk somewhere to pick up a typically ring-bound manual to flip through.

The Unix designers saw this as a waste of time. They thought, wouldn't it be nice if we could sit in the same chair for ten hours straight without ever taking our eyes off the monitor or our hands off the keyboard?

And so, online documentation was born. On Unix systems, all common Unix commands are documented in detail on the Unix system itself, and the documentation is accessible via the command line (you do not need a GUI to view it). Whenever you want to know more about a particular Unix command, you can find out by typing man command-name. For example, to learn all about the ls command, type:

shell-prompt: man ls
	    

The man command covers virtually every common command, as well as other topics. It even covers itself:

shell-prompt: man man
	    

The man command displays a nicely formatted document known as a man page. It uses a file viewing program called more, which can be used to browse through text files very quickly. Table 1.6, “Common hot keys in more” shows the most common keystrokes used to navigate a man page. For complete information on navigation, run:

shell-prompt: man more
	    

Table 1.6. Common hot keys in more

Key            Action
h              Show key commands
Space bar      Forward one page
Enter/Return   Forward one line
b              Back one page
/              Search

Man pages include a number of standard sections, such as SYNOPSIS, DESCRIPTION, and SEE ALSO. The SEE ALSO section helps you identify other commands that might be of use.

Man pages do not always make good tutorials. Sometimes they contain too much detail, and they are often not written with novice users in mind. If you're learning a new command for the first time, you might want to consult a Unix book or the web. The man pages will provide the most detailed and complete reference information on most commands, however.

The apropos command is used to search the man page headings for a given topic. It is equivalent to man -k. For example, to find out what man pages exist regarding Fortran, we might try the following:

shell-prompt: apropos fortran
	    

or

shell-prompt: man -k fortran
	    

The whatis command is similar to apropos in that it lists short descriptions of commands. However, whatis only lists those commands whose name matches the search string, whereas apropos searches both the names and the short descriptions and so lists everything it can find related to the string.

The info command is an alternative to man that uses a non-graphical hypertext system instead of flat files. This allows the user to navigate extensive documentation more efficiently. The info command has a fairly steep learning curve, but it is very powerful, and is often the best option for documentation on a given topic. Some open source software ships documentation in info format and provides a man page (converted from the info files) that actually has less information in it.

shell-prompt: info gcc
	    

Practice Break

  1. Find out how to display a '/' after each directory name and a '*' after each executable file when running ls.
  2. Use apropos to find out what Unix commands to use with bzip files.

1.11.4. A Basic Set of Unix Commands

Most Unix commands have short names that are abbreviations or acronyms for what they do (pwd = print working directory, cd = change directory, ls = list, ...). Unix was originally designed for people with good memories and poor typing skills.

Some of the most commonly used Unix commands are described below.

Note

This section is meant to serve as a quick reference, and to inform new readers about which commands they should learn. There is much more to know about these commands than we can cover here. For full details about any of the commands described here, consult the man pages, info pages, or the web.

This section uses the same notation conventions as the Unix man pages:

  • Optional arguments are shown inside [].
  • The pipe symbol (|) between two items means one or the other.
  • An ellipsis (...) means optionally more of the same.
  • "file" means a filename is required and a directory name is not allowed. "directory" means a directory name is required, and a filename is not allowed. "path" means either a filename or directory name is acceptable.

File and Directory Management

cp copies one or more files.

shell-prompt: cp source-file destination-file
shell-prompt: cp source-file [source-file ...] destination-directory
		

If there is only one source filename, then destination can be either a filename or a directory. If there are multiple source files, then destination must be a directory. If destination is a filename, and the file exists, it will be overwritten.

shell-prompt: cp file file.bak     # Make a backup copy
shell-prompt: cp file file.bak ~   # Copy files to home directory
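The two forms of cp can be tried safely in a scratch directory. This is a minimal sketch, assuming the mktemp command is available (it is on most modern Unix systems, although it is not covered in this section); the file and directory names are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch"

echo "version 1" > notes.txt
echo "old backup" > notes.bak
mkdir Backup

# Destination is an existing file: its contents are overwritten.
cp notes.txt notes.bak
cat notes.bak

# Multiple sources: the destination must be a directory.
cp notes.txt notes.bak Backup
ls Backup
```

Note that cp gave no warning before replacing the old contents of notes.bak.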
		

ls lists files in CWD or a specified file or directory.

shell-prompt: ls [path ...]
		
shell-prompt: ls           # List CWD
shell-prompt: ls /etc      # List /etc directory
		

mv moves or renames files or directories.

shell-prompt: mv source destination
shell-prompt: mv source [source ...] destination-directory
		

If multiple sources are given, destination must be a directory.

shell-prompt: mv prog1.c Programs
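Whether mv renames or moves depends on whether the destination is an existing directory. The following sketch illustrates both cases in a scratch directory; it assumes mktemp is available, and the file names are hypothetical.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch prog1.c prog2.c
mkdir Programs

# Destination is not an existing directory: the file is renamed.
mv prog1.c hello.c

# Destination is an existing directory: the files are moved into it.
mv hello.c prog2.c Programs
ls Programs
```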
		

ln links files or directories.

shell-prompt: ln source-file destination-file
shell-prompt: ln -s source destination
		

The ln command creates another path name for the same file. Both names refer to the same file, and changes made through one appear in the other. Without -s, a standard directory entry, known as a hard link, is created. In this case, source and destination must be on the same partition. (The df command will list partitions and their locations within the directory tree.) With -s, a symbolic link is created. A symbolic link is not a standard directory entry, but a pointer to the source path name. Only symbolic links can be used for directories, and symbolic links do not have to be on the same partition as the source.

shell-prompt: ln -s /etc/motd ~    # Make a convenient link to motd
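The difference between the two kinds of links is easier to see with a small experiment. This sketch assumes mktemp is available and uses made-up file names. It also uses the -l and -i flags of ls, which are not described in this section: -l marks a symbolic link with "->", and -i prints each file's inode number, which is the same for a file and its hard links.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "original text" > data.txt

# Hard link: a second directory entry for the same file.
ln data.txt hard.txt

# Symbolic link: a pointer to the path name data.txt.
ln -s data.txt soft.txt

# A change made through one name is visible through the others.
echo "appended line" >> hard.txt
cat data.txt
cat soft.txt

# Show inode numbers and link targets.
ls -li
```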
		

rm removes one or more files.

shell-prompt: rm file [file ...]
		
shell-prompt: rm temp.txt core a.out
		

Caution

Removing files with rm is not like dragging them to the trash. Once files are removed by rm, they cannot be recovered.

srm (secure rm) removes files securely, erasing the file content and directory entry so that the file cannot be recovered. Use this to remove files that contain sensitive data. This is not a standard Unix command, but a free program that can be easily installed on most systems.

mkdir creates one or more directories.

shell-prompt: mkdir [-p] path-name [path-name ...]
		

The -p flag indicates that mkdir should attempt to create any parent directories in the path that don't already exist. If not used, mkdir will fail unless all but the last component of the path exist.

shell-prompt: mkdir Programs
shell-prompt: mkdir -p Programs/C/MPI
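The effect of -p can be seen by attempting the same command with and without it. This is a small sketch in a scratch directory, assuming mktemp is available.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Without -p this fails, because the parent Programs/C does not exist yet.
mkdir Programs/C/MPI || echo "mkdir without -p failed, as expected"

# With -p, the missing parents Programs and Programs/C are created as well.
mkdir -p Programs/C/MPI
ls -R Programs
```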
		

rmdir removes one or more empty directories.

shell-prompt: rmdir directory [directory ...]
		

rmdir will fail if a directory is not completely empty. You may also need to check for hidden files using ls -a directory.

shell-prompt: rmdir Programs/C/MPI
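A directory that looks empty may still contain hidden files. The sketch below, which assumes mktemp is available and uses hypothetical names, shows rmdir failing for that reason and then succeeding once the hidden file is removed.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir Results
touch Results/.hidden

# ls shows nothing, but ls -a reveals the hidden file.
ls Results
ls -a Results

# rmdir refuses to remove the directory while the hidden file remains.
rmdir Results || echo "rmdir failed: directory not empty"

# Remove the hidden file, then rmdir succeeds.
rm Results/.hidden
rmdir Results
```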
		

find locates files within a subtree using a wide variety of possible criteria.

shell-prompt: find start-directory criteria [action]
		

find is a very powerful and complex command that can be used to not only find files, but run commands on the files matching the search criteria.

# List cores
shell-prompt: find . -name core

# Remove cores
shell-prompt: find . -name core -exec rm '{}' \;

# Remove multiple cores with each rm command (much faster)
shell-prompt: find . -name core -exec rm '{}' +
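Criteria can also be combined, and the action need not be rm. The following sketch builds a small, made-up source tree in a scratch directory (assuming mktemp is available) and uses two standard find criteria, -type and -name, together.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p src/lib
touch src/main.c src/lib/util.c src/lib/util.h src/README

# Regular files (-type f) whose names end in .c.
# The pattern is quoted so the shell passes it to find unexpanded.
find . -type f -name '*.c'

# Run wc -l on the matches, passing many files to each wc invocation.
find . -type f -name '*.c' -exec wc -l '{}' +

# Count the matches.
c_count=$(find . -type f -name '*.c' | wc -l)
echo "C source files: $c_count"
```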
		

df shows the free disk space on all currently mounted partitions.

shell-prompt: df
		

du reports the disk usage of a directory and everything under it.

shell-prompt: du [-s] [-h] path
		

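The -s (summary) flag suppresses output about each file in the subtree, so that only the total disk usage of the directory is shown. The -h (human-readable) flag prints sizes with unit suffixes such as K, M, and G instead of raw block counts.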

shell-prompt: du -sh Programs 
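To see what -s suppresses, compare the output with and without it. This is a minimal sketch on a tiny, made-up directory tree, assuming mktemp is available; the sizes reported will vary from one file system to another.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/Programs/C"
echo "int main(void) { return 0; }" > "$scratch/Programs/C/prog1.c"

# Without -s, du prints one line per directory in the subtree.
du "$scratch/Programs"

# With -s, only the total for the named directory is printed.
du -s "$scratch/Programs"
```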
		

Shell Internal Commands

As mentioned previously, internal commands are part of the shell, and serve to control the shell itself. Below are some of the most common internal commands.

cd changes the current working directory of the shell process. It is described in more detail in the section called “Current Working Directory”.

shell-prompt: cd [directory]
		

The pwd command prints the CWD of the shell process. It is described in detail in the section called “Current Working Directory”. You can use pwd like a Unix file system GPS, to get your bearings when you're lost.

pushd changes CWD and saves the old CWD on a stack so that we can easily return.

shell-prompt: pushd directory
		

Users often encounter the need to temporarily go to another directory, run a few commands, and then come back to the current directory.

The pushd command is a very useful alternative to cd that helps in this situation. It performs the same operation as cd, but it records the starting CWD by adding it to the top of a stack of CWDs. You can then return to where the last pushd command was invoked using popd. This saves you from having to retype the path name of the directory you want to return to. Not all shells support pushd and popd, but the ones you are likely to use for a login session do.

Practice Break

Try the following sequence of commands:

shell-prompt: pwd          # Check starting point
shell-prompt: pushd /etc
shell-prompt: more motd
shell-prompt: pushd /home
shell-prompt: ls
shell-prompt: popd         # Back to /etc
shell-prompt: pwd
shell-prompt: more motd
shell-prompt: popd         # Back to starting point
shell-prompt: pwd
		    

exit terminates the shell process.

shell-prompt: exit
		

This is the most reliable way to exit a shell. In some situations you could also type logout or simply press Ctrl+d, but these alternatives will not work for every shell process.

Text File Processing

cat echoes the contents of one or more text files.

shell-prompt: cat file [file ...]
		
shell-prompt: cat /etc/motd
		

The vis and cat -v commands display invisible characters in a visible way. For example, carriage return characters present in DOS/Windows files are normally not shown by most Unix commands, but will appear as '^M' (representing Control+M, which is what you would type to produce this character).
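As a quick illustration, the sketch below creates a small file with DOS/Windows line endings and views it both ways. It assumes mktemp is available; the file name dos.txt is made up.

```shell
scratch=$(mktemp -d)

# Create a two-line file whose lines end in carriage return + newline.
printf 'first line\r\nsecond line\r\n' > "$scratch/dos.txt"

# Plain cat sends the carriage returns to the terminal, where they are invisible.
cat "$scratch/dos.txt"

# cat -v shows each carriage return as ^M.
cat -v "$scratch/dos.txt"
```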

more shows the contents of one or more text files interactively.

shell-prompt: more file [file ...]
		
shell-prompt: more prog1.c
		

head shows the top N lines of one or more text files.

shell-prompt: head [-n N] file [file ...]
		

If the -n flag followed by an integer N is given, the top N lines are shown instead of the default of 10.

shell-prompt: head -n 5 prog1.c
		

tail shows the bottom N lines of one or more text files.

shell-prompt: tail [-n N] file [file ...]
		

tail is especially useful for viewing the end of a large file that would be cumbersome to view with more.

If the -n flag followed by an integer N is given, the bottom N lines are shown instead of the default of 10.

shell-prompt: tail -n 5 output.txt
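The behavior of head and tail is easy to verify on a file whose lines are numbered. This sketch generates such a file in a scratch directory (assuming mktemp is available) and shows both ends of it.

```shell
scratch=$(mktemp -d)

# Build a 20-line sample file: "line 1" through "line 20".
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$scratch/output.txt"

head -n 3 "$scratch/output.txt"       # line 1 through line 3
tail -n 3 "$scratch/output.txt"       # line 18 through line 20
head "$scratch/output.txt" | wc -l    # without -n, 10 lines are shown
```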
		

grep shows lines in one or more text files that match a given regular expression.

shell-prompt: grep regular-expression file [file ...]
		

The regular expression is most often a simple string, but can represent patterns as described by man re_format (on Linux systems, see man 7 regex instead).

Show all lines containing the string "printf" in prog1.c.

shell-prompt: grep printf prog1.c
		

Show all lines containing variable names in prog1.c. (Variable names begin with a letter or underscore and may contain letters, underscores, or digits after that.)

shell-prompt: grep '[a-zA-Z_][a-zA-Z0-9_]*' prog1.c
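To experiment with patterns, it helps to have a small file whose contents are known. The sketch below writes a short, made-up prog1.c into a scratch directory (assuming mktemp is available) and searches it with a simple string and with a regular expression that uses ^ to anchor the match at the beginning of the line.

```shell
scratch=$(mktemp -d)
cat > "$scratch/prog1.c" << 'EOF'
#include <stdio.h>

int main(void)
{
    int count = 42;

    printf("%d\n", count);
    return 0;
}
EOF

# Simple string: lines containing "printf".
grep printf "$scratch/prog1.c"

# Regular expression: lines that begin with optional spaces followed by "int".
grep '^ *int' "$scratch/prog1.c"
```

The regular expression is enclosed in single quotes so that the shell does not interpret special characters such as * before grep sees them.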
		

The diff command shows the differences between two text files. This is most useful for comparing two versions of the same file to see what has changed. Also see cdiff, a specialized version of diff, for comparing C source code.

shell-prompt: diff -u input1.txt input2.txt
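Reading diff output takes a little practice. In the unified format produced by -u, lines beginning with - appear only in the first file and lines beginning with + appear only in the second. The following sketch compares two tiny, made-up files in a scratch directory, assuming mktemp is available.

```shell
scratch=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$scratch/input1.txt"
printf 'alpha\nBETA\ngamma\n' > "$scratch/input2.txt"

# diff exits with status 1 when the files differ, which is not an error here.
# The "|| true" keeps a script from stopping if it is set to exit on failure.
diff -u "$scratch/input1.txt" "$scratch/input2.txt" || true
```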
		

Text Editors

There are more text editors available for Unix systems than any one person is aware of. Some are terminal-based, some are graphical, and some have both types of interfaces.

All Unix systems support running graphical programs from remote locations, but most graphical programs require a fast connection (10 megabits/sec or more) to function tolerably.

Knowing how to use a terminal-based text editor is therefore a very good idea, so that you're prepared to work over a slow connection if necessary. Some of the more common terminal-based editors are described below.

vi (visual editor) is the standard text editor for all Unix systems. Most users either love or hate the vi interface, but it's a good editor to know since it is standard on every Unix system.

nano is an extremely simple text editor that is ideal for beginners. It is a rewrite of the pico editor, which is known to have many bugs and security issues. Neither editor is standard on Unix systems, but both are free and easy to install. These editors have little or no learning curve, but are not sophisticated enough for extensive programming or scripting.

emacs (Editor MACroS) is a more sophisticated editor used by many programmers. It is known for being hard to learn, but very powerful. It is not standard on most Unix systems, but is free and easy to install.

ape is a menu-driven, user-friendly IDE (integrated development environment), i.e. programmer's editor. It has an interface similar to PC and Mac programs, but works on a standard Unix terminal. It is not standard on most Unix systems, but is free and easy to install. ape has a small learning curve, and advanced features to make programming much faster.

Networking

hostname prints the network name of the machine.

shell-prompt: hostname
		

This is often useful when you are working on multiple Unix machines at the same time (e.g. via ssh), and forget which window applies to each machine.

ssh is used to remotely log into another machine on the network and start a shell.

shell-prompt: ssh [name@]hostname
		
shell-prompt: ssh joe@login.peregrine.hpc.uwm.edu
		

sftp is used to remotely log into another machine on the network and transfer files to or from it.

shell-prompt: sftp [name@]host
		
shell-prompt: sftp joe@data.peregrine.hpc.uwm.edu
		

rsync is used to synchronize two directories either on the same machine or on different machines.

shell-prompt: rsync [flags] [[name@]host:]path [[name@]host:]path
		

rsync compares the contents of the source and destination directories and transfers only the differences. Hence, it can save an enormous amount of time when you make small changes to a large project and need to synchronize another copy of the project.

shell-prompt: rsync -av Project joe@data.peregrine.hpc.uwm.edu:
		

Identity and Access Management

passwd changes your password. It asks for your old password once, and the new one twice (to ensure that you don't accidentally set your password to something you don't know because your finger slipped). Unlike many graphical password programs, passwd does not echo anything for each character typed. (Even showing the length of your password is a bad idea from a security standpoint.)

shell-prompt: passwd
		

Terminal Control

clear clears your terminal screen (assuming the TERM variable is properly set).

shell-prompt: clear
		

reset resets your terminal to an initial state. This is useful when your terminal has been corrupted by bad output, such as when attempting to view a binary file.

Terminals are controlled by magic sequences: sequences of invisible control characters sent from the host computer to the terminal amid the normal output. Magic sequences move the cursor, change the color, change the international character set, etc. Binary files contain arbitrary data that sometimes, by chance, includes magic sequences that could alter the mode of your terminal. If this happens, running reset will usually correct the problem. If not, you will need to log out and log back in.

shell-prompt: reset
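To see a magic sequence in action, you can send one to the terminal deliberately. The sketch below assumes a terminal that understands ANSI/VT100-style sequences, which most current terminal emulators do, although the text above does not name a specific terminal type. In that scheme, the escape character (octal 033) followed by [7m turns on reverse video, and escape followed by [0m restores normal text.

```shell
# Print two words in reverse video, then return to normal text.
printf '\033[7mreverse video\033[0m normal text\n'

# The escape characters are invisible on the terminal,
# but cat -v reveals each one as ^[.
printf '\033[7mreverse video\033[0m normal text\n' | cat -v
```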
		

1.11.5. Self-test

  1. What is an internal command?
  2. What is an external command?
  3. What kinds of commands must be implemented as internal commands?
  4. How can you quickly view detailed information on a Unix command?
  5. How can you identify Unix commands related to a topic? (Describe two methods.)
  6. Show the simplest Unix command to accomplish each of the following in order:
    1. List the files in /usr/local/share.
    2. Make your home directory the CWD.
    3. Copy the file /etc/hosts to your home directory.
    4. Rename the file ~/hosts to ~/hosts.bak.
    5. Create a subdirectory called ~/Temp.
    6. Make ~/Temp the CWD.
    7. Copy the file ~/hosts.bak to ~/Temp.
    8. Create a hard link to the file ~/Temp/hosts.bak called ~/hosts.bak.temp.
    9. Create a link to the directory /usr/local/share in your home directory.
    10. Make your home directory the CWD.
    11. Remove the entire subtree in ~/Temp and the files ~/hosts.bak and ~/hosts.bak.temp.
    12. Show how much space is used by the directory /etc.
  7. Show the simplest Unix command to accomplish each of the following:
    1. Change the current working directory of your shell process to /etc, remembering the previous current working directory on the directory stack.
    2. Return to the previous current working directory on the directory stack.
    3. Terminate the shell.
  8. Show the simplest Unix command to accomplish each of the following:
    1. Show the contents of the text file /etc/motd a page at a time.
    2. Show the first 5 lines of /etc/motd.
    3. Show the last 8 lines of /etc/motd.
    4. Show lines in /etc/group and /etc/passwd containing your username.
    5. Edit the text file ./prog1.c.
  9. Show the simplest Unix command to accomplish each of the following:
    1. Show the network name (host name) of the computer running the shell.
    2. Remotely log into "login.peregrine.hpc.uwm.edu" and start a shell.
    3. Remotely log into "data.peregrine.hpc.uwm.edu" for the purpose of transferring files.
    4. Synchronize the folder ~/Programs/Prog1 on login.peregrine.hpc.uwm.edu to ./Prog1, transferring only the differences.
    5. Clear the terminal screen.
    6. Restore functionality to a terminal window that's in a funk.

