Unix Commands and the Shell


Before You Begin

You should have a basic understanding of Unix processes, files, and directories. These topics are covered in the section called “Processes” and the section called “The Unix File System”.

Unix commands fall into one of two categories:

Internal commands, such as cd, are part of the shell. All shells have a cd command and share many other internal commands, but different shells also have some internal commands that are unique to that shell.
No new process is created when you execute an internal command. The shell simply carries out the execution of internal commands by itself.
Internal commands are documented in the shell's man page. I.e., to learn the details about the cd command for the shell we are using, we would run man sh, man tcsh, man bash, etc., rather than man cd.
External commands, such as ls, are independent programs separate from the shell. The command name of an external command is actually the name of an executable file, i.e. a file containing the program or script. For example, when you run the ls command, you are executing the program contained in the file /bin/ls.
When you run an external command, the shell locates the program file, loads the program into memory, and creates a new (child) process to execute the program. The shell then normally waits for the child process to end before prompting you for the next command.
External commands are documented by their own man pages. I.e., to learn about the ls command, we would run man ls, regardless of which shell we are using.
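The two categories can be told apart from the shell itself. The following is a minimal sketch, assuming it is run as a script by a POSIX-compatible shell, so that no aliases interfere. It uses the standard command -v utility, which prints just the name for a command built into the shell and a full path name for an external program. The variable names are only illustrative, and the path reported for ls varies between systems.

```shell
#!/bin/sh
# command -v reports how the shell would resolve a command name.
cd_location=$(command -v cd)    # Internal: prints just the name
ls_location=$(command -v ls)    # External: prints the path of the executable file

echo "cd resolves to: $cd_location"
echo "ls resolves to: $ls_location"
```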

Internal Commands

Commands are implemented internally only when it is necessary or when there is a substantial benefit. If all commands were part of the shell, the shell would be enormous and require too much memory.

One command that must be internal is the cd command, which changes the CWD of the shell process. The cd command cannot be implemented as an external command, since the CWD is a property of the process, as described in the section called “Current Working Directory”.

We can prove this using Proof by Contradiction. If the cd command were external, it would run as a child process of the shell. Hence, running cd would create a child process, which would inherit CWD from the shell process, alter its copy of CWD, and then terminate. The CWD of the parent, the shell process, would be unaffected.

Expecting an external command to change your CWD for you would be akin to asking one of your children to take a shower for you. Neither is capable of effecting the desired change. Likewise, any command that alters the state of the shell process must be implemented as an internal command.
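The argument above can be checked with a small experiment. This is only a sketch: it starts a second shell with sh -c to play the role of the child process, and the variable names are made up for illustration. The child changes its own CWD to /, but the CWD of the parent shell is the same before and after.

```shell
#!/bin/sh
# Record the CWD of this (parent) shell process.
start_dir=$(pwd)

# Run cd inside a child shell process. Only the child's CWD changes.
child_dir=$(sh -c 'cd / && pwd')

# The child has terminated. Check the parent's CWD again.
end_dir=$(pwd)

echo "CWD of child after cd:   $child_dir"
echo "CWD of parent before:    $start_dir"
echo "CWD of parent afterward: $end_dir"
```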

External Commands

Most commands are external, i.e. programs separate from the shell. As a result, they behave the same way regardless of which shell we use to run them.

The executable files containing external commands are kept in certain directories, most of which are called bin (short for "binary", since most executable files are binary files containing machine code). The most essential commands required for the Unix system to function are kept in /bin and /usr/bin. The location of optional add-on commands varies, but a typical location is /usr/local/bin. Debian and Red Hat Linux mix add-on commands with core system commands in /usr/bin. BSD systems keep them in separate directories such as /usr/local/bin or /usr/pkg/bin.

Example 3.14. Practice Break

Use which under C shell family shells to find out whether the following commands are internal or external. Use type under Bourne family shells (bash, ksh, dash, zsh). You can use either command under either shell, but will get better results if you follow the advice above. (Try both and see what happens.)
```
shell-prompt: which cd
shell-prompt: which cp
shell-prompt: which exit
shell-prompt: which ls
shell-prompt: which pwd
```
Use ls to find out what commands are located in /bin and /usr/bin.

Getting Help

In the dark ages before Unix, when programmers wanted to look up a command or function, they actually had to get out of their chairs and walk somewhere to pick up a typically ring-bound printed manual to flip through. This resembled physical activity, which most computer scientists find terrifying.

The Unix designers saw the injustice of this situation and set out to rectify it. They imagined a Utopian world where they could sit in the same chair for ten hours straight without ever taking their eyes off the monitor or their fingers off the keyboard, happily subsisting on coffee and potato chips.

Aside

If there is one trait that best defines an engineer it is the ability to concentrate on one subject to the complete exclusion of everything else in the environment. This sometimes causes engineers to be pronounced dead prematurely. Some funeral homes in high-tech areas have started checking resumes before processing the bodies. Anybody with a degree in electrical engineering or experience in computer programming is propped up in the lounge for a few days just to see if he or she snaps out of it.

-- The Engineer Identification Test (Anonymous)

And so, online documentation was born. On Unix systems, all common Unix commands are documented in detail on the Unix system itself, and the documentation is accessible via the command line (you do not need a GUI to view it, which is important when using a dumb terminal to access a remote system). Whenever you want to know more about a particular Unix command, you can find out by typing man command-name. For example, to learn all about the ls command, type:

shell-prompt: man ls

The man command covers virtually every common command, as well as other topics. It even covers itself:

shell-prompt: man man

The man command displays a nicely formatted document known as a man page. It uses a file viewing program called more, which can be used to browse through text files very quickly. Table 3.6, “Common hot keys in more” shows the most common keystrokes used to navigate a man page. For complete information on navigation, run:

shell-prompt: man more

Table 3.6. Common hot keys in more

Key	Action
h	Show key commands
Space bar	Forward one page
Enter/Return	Forward one line
b	Back one page
/	Search

Man pages include a number of standard sections, such as SYNOPSIS, DESCRIPTION, and SEE ALSO, which helps you identify other commands that might be of use.

Man pages do not always make good tutorials. Sometimes they contain too much detail, and they are often not well-written for novice users. If you're learning a new command for the first time, you might want to consult a Unix book or the web. The man pages will provide the most detailed and complete reference information on most commands, however.

The apropos command is used to search the man page headings for a given topic. It is equivalent to man -k. For example, to find out what man pages exist regarding sine functions, we might try the following:

shell-prompt: apropos sine
acos, acosf, acosl(3) - arc cosine functions
acosh, acoshf, acoshl(3) - inverse hyperbolic cosine functions
asin, asinf, asinl(3) - arc sine functions
asinh, asinhf, asinhl(3) - inverse hyperbolic sine functions
cos, cosf, cosl(3) - cosine functions
cosh, coshf, coshl(3) - hyperbolic cosine functions
cospi, cospif, cospil(3) - half-cycle cosine functions
Role::Tiny(3) - Roles: a nouvelle cuisine portion size slice of Moose
sin, sinf, sinl, sincosl(3) - sine functions
sincos, sincosf, sincosl(3) - sine and cosine functions
sinh, sinhf, sinhl(3) - hyperbolic sine function
sinpi, sinpif, sinpil(3) - half-cycle sine functions

shell-prompt: man -k sine

The whatis command is similar to apropos in that it lists short descriptions of commands. However, whatis only lists those commands with the search string in their name or short description, whereas apropos attempts to list everything related to the string.

shell-prompt: whatis sin
sin, sinf, sinl, sincosl(3) - sine functions

The info command is an alternative to man that uses a non-graphical hypertext system instead of flat files. This allows the user to navigate extensive documentation more efficiently. The info command has a fairly steep learning curve, but it is very powerful, and is often the best option for documentation on a given topic. Some open source software ships documentation in info format and provides a man page (converted from the info files) that actually has less information in it.

shell-prompt: info gcc

Example 3.15. Practice Break

Find out how to display a '/' after each directory name and a '*' after each executable file when running ls.
Use apropos to find out what Unix commands to use with bzip files.

Some Useful Unix Commands

Most Unix commands have short names which are abbreviations or acronyms for what they do (pwd = print working directory, cd = change directory, ls = list, ...). Unix was originally designed for people with good memories and poor typing skills. Some of the most commonly used Unix commands are described below.

Note

This section is meant to serve as a quick reference, and to inform new readers about which commands they should learn. There is much more to know about these commands than we can cover here. For full details about any of the commands described here, consult the man pages, info pages, or the web.

This section uses the same notation conventions as the Unix man pages:

Optional arguments are shown inside [].
The 'or' symbol (|) between two items means one or the other.
An ellipsis (...) means optionally more of the same.
"file" means a filename is required and a directory name is not allowed. "directory" means a directory name is required, and a filename is not allowed. "path" means either a filename or directory name is acceptable.

File and Directory Management

Note

Run these commands in the exact order presented. Some depend on successful completion of previous commands.

ls lists files in CWD or a specified file or directory.

shell-prompt: ls [path ...]

shell-prompt: ls           # List CWD
shell-prompt: ls /etc      # List /etc directory

mkdir creates one or more directories.

shell-prompt: mkdir [-p] path name [path name ...]

The -p flag indicates that mkdir should attempt to create any parent directories in the path that don't already exist. If not used, mkdir will fail unless all but the last component of the path already exist.

shell-prompt: ls
shell-prompt: mkdir Temp
shell-prompt: ls                        # Should see Temp now
shell-prompt: mkdir Temp2/C/MPI      # Should fail
shell-prompt: mkdir -p Temp2/C/MPI
shell-prompt: ls Temp2

cp copies one or more files.

shell-prompt: cp source-file destination-file
shell-prompt: cp source-file [source-file ...] destination-directory

If there is only one source filename, then destination can be either a filename or a directory.

shell-prompt: cd
shell-prompt: touch file            # Create file if it doesn't exist
shell-prompt: cp file file.bak      # Make a backup copy
shell-prompt: ls                    # Should see file and file.bak

If there are multiple source files, then destination must be a directory. If destination is a filename, and the file exists, it will be overwritten.

shell-prompt: cp /etc/hosts* hosts  # Should fail
shell-prompt: cp /etc/hosts* Temp   # Should work if directory Temp exists
shell-prompt: ls Temp

mv moves or renames files or directories.

shell-prompt: mv source destination
shell-prompt: mv source [source ...] destination-directory

shell-prompt: mv file.bak file.bk
shell-prompt: ls

If multiple sources are given, destination must be a directory.

shell-prompt: mv file file.bk file2     # Should fail
shell-prompt: mv file file.bk Temp      # Should work if directory Temp exists
shell-prompt: ls
shell-prompt: ls Temp

rm removes one or more files.

shell-prompt: rm file [file ...]

shell-prompt: cd Temp
shell-prompt: ls
shell-prompt: rm hosts*
shell-prompt: ls
shell-prompt: rm file*
shell-prompt: ls

Caution

Removing files with rm is not like dragging them to the trash. Once files are removed by rm, they cannot be recovered.

If there are multiple hard links to a file, removing one of them only removes the link, and remaining links are still valid.

Caution

Removing the path name to which a symbolic link points will render the symbolic link invalid. It will become a dangling link.

srm (secure rm) removes files securely, erasing the file content and directory entry so that the file cannot be recovered. Use this to remove files that contain sensitive data. This is not a standard Unix command, but a free program that can be easily installed on most systems via a package manager.

df shows the free disk space on all currently mounted partitions.

shell-prompt: df

ln links files or directories.

shell-prompt: ln source-file destination-file
shell-prompt: ln -s source destination

The ln command creates another path name for the same file. Both names refer to the same file, so changes made through one name (e.g. using nano) appear in the other.

Each file in a typical Unix file system is described by a structure called an inode. The inode contains metadata, i.e. information about the file other than its content, such as the file's ownership, permissions, last modification time, and the locations of the disk blocks (chunks of disk space) containing the file's content.

Without -s, a standard directory entry, known as a hard link, is created. A hard link is a directory entry that points directly to the inode of the file. In fact, such a directory entry contains little more than the file's name and the location of the inode. Every file must have at least one hard link to it. For this reason, removing a file is also known as "unlinking".

shell-prompt: touch file
shell-prompt: ln file file.hardlink
shell-prompt: ls -l
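The relationship between hard links and inodes can be observed directly. The sketch below is illustrative only: it works in a scratch directory created with mktemp -d (widely available, though not part of POSIX), and the file and variable names are invented for the example. It uses ls -i, which prints the inode number in front of each name, and then removes one of the two names to show that the file content remains reachable through the other.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)

echo "hello" > "$workdir/original.txt"
ln "$workdir/original.txt" "$workdir/second-name.txt"   # Second hard link

# ls -i prints the inode number before the name. Both names share one inode.
inode1=$(ls -i "$workdir/original.txt" | awk '{ print $1 }')
inode2=$(ls -i "$workdir/second-name.txt" | awk '{ print $1 }')
echo "inode numbers: $inode1 $inode2"

# Removing one name only unlinks that name. The content is still there.
rm "$workdir/original.txt"
remaining=$(cat "$workdir/second-name.txt")
echo "content via the remaining link: $remaining"

rm -r "$workdir"    # Clean up
```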

To create a second hard link, the source cannot be a directory, and the source and destination path names must be in the same file system. There is no harm in trying to create a hard link. If it fails, you can do a soft link instead.

shell-prompt: ln /etc .         # Should fail
shell-prompt: ln -s /etc .
shell-prompt: ls
shell-prompt: ls etc            # List the link
shell-prompt: ls etc/           # List contents of the directory

File systems under Windows appear as different drive letters, such as C: or D:. Under Unix, each file system is mounted to a specific directory. The main file system is mounted to / and the rest are mounted to subdirectories. The df command will list file systems and their mount points within the directory tree. For example, in the df output below, / and /data are separate file systems. The disk ada0 is divided into three partitions. Partition 2, called ada0p2, contains a file system which is mounted on /. Partitions 0 and 1 are used by the operating system for other purposes. The second disk, ada1, has a file system on partition 0, which is mounted on /data.

shell-prompt: df
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/ada0p2        447G    266G    146G    64%    /
/dev/ada1p0        978G    172G    729G    20%    /data

Everything under /data and only things under /data are on ada1p0. Hence, we cannot create a hard link to /data/joe/Research/notes.txt in /home/joe, which is on ada0p2.

# This will fail.
# You cannot run this command, since the partitions are hypothetical
# You can try linking something from a different filesystem based on
# your own "df" output if you like.
shell-prompt: ln /data/joe/Research/notes.txt ~joe
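Before attempting a hard link, it can help to check whether two path names are in the same file system. The following is a rough sketch rather than a definitive test. It assumes that df accepts a path name argument and the -P (portable output) flag, as POSIX specifies, and that mount points contain no spaces. The function name mount_point_of is made up for this example, and /tmp is used only because it exists on most systems.

```shell
#!/bin/sh
# Print the mount point of the file system that contains the given path.
# df -P prints one header line followed by one line per file system.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

cwd_mount=$(mount_point_of .)
tmp_mount=$(mount_point_of /tmp)

if [ "$cwd_mount" = "$tmp_mount" ]; then
    echo "CWD and /tmp are in the same file system ($cwd_mount)"
else
    echo "CWD ($cwd_mount) and /tmp ($tmp_mount) are in different file systems"
fi
```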

With -s, a symbolic link, or soft link, is created. A symbolic link is not a standard directory entry, but a pointer to another path name. It is a directory entry that points to another directory entry rather than the inode of the file. Symbolic links do not have to be in the same file system as the source.

# This will work
shell-prompt: ln -s /data/joe/Research/notes.txt ~joe
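Because a symbolic link only stores a path name, it stops working when that path name is removed. The sketch below illustrates this in a scratch directory created with mktemp -d. The file and variable names are invented for the example. It relies on two standard tests: [ -L path ] is true if path is itself a symbolic link, and [ -e path ] follows the link and is true only if the target exists.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)

echo "data" > "$workdir/target.txt"
ln -s "$workdir/target.txt" "$workdir/link.txt"

# While the target exists, reading the link reads the target.
before=$(cat "$workdir/link.txt")
echo "content via the link: $before"

# Remove the path name to which the link points.
rm "$workdir/target.txt"

# The link itself is still there (-L), but it leads nowhere (! -e).
if [ -L "$workdir/link.txt" ] && [ ! -e "$workdir/link.txt" ]; then
    dangling=yes
else
    dangling=no
fi
echo "dangling link: $dangling"

rm -r "$workdir"    # Clean up
```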

rmdir removes one or more empty directories.

shell-prompt: rmdir directory [directory ...]

rmdir will fail if a directory is not completely empty. You may also need to check for hidden files using ls -a directory. To remove a directory and everything under it, use rm -r directory.

shell-prompt: cd
shell-prompt: rmdir Temp2               # Should fail
shell-prompt: rmdir Temp2/C             # Should also fail: C still contains MPI
shell-prompt: rmdir Temp2/C/MPI
shell-prompt: rm -r Temp2
shell-prompt: rmdir Temp                # Should fail
shell-prompt: rm -r Temp
shell-prompt: ls

du reports the disk usage of a directory and everything under it.

shell-prompt: du [-s] [-h] path

The -s (summary) flag suppresses output about each file in the subtree, so that only the total disk usage of the directory is shown. The -h flag asks for human-readable output with gigabytes followed by a G, megabytes by an M, etc.

shell-prompt: du -sh /etc

Note

The du command does not add up file content sizes. It adds up the disk space used by each file. In an uncompressed file system, space used is rounded up to a multiple of the block size (commonly 4096 bytes). In a compressed file system, space used is a multiple of blocks used after compression, which can be significantly smaller than the file content. This is often the case with the ZFS file system, which is standard on FreeBSD and Solaris-based systems such as OpenIndiana. "Fluffy" text files that compress easily, such as genomic data, may require only a small fraction of their content size in disk space on ZFS. This makes ZFS a great choice for housing genomic data.
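The difference between content size and disk usage is easy to see with a very small file. The following sketch is illustrative only: it creates a one-byte file in a scratch directory made with mktemp -d, counts its content with wc -c, and asks du -k for its disk usage in kilobytes. The names are invented for the example, and the du figure depends on the file system, so no particular value should be expected.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)

# Create a file whose content is exactly one byte.
printf 'x' > "$workdir/tiny.txt"

# Content size in bytes (tr strips the padding some wc versions add).
content_bytes=$(wc -c < "$workdir/tiny.txt" | tr -d ' ')

# Disk space used, in kilobytes. This varies with the file system.
disk_kbytes=$(du -k "$workdir/tiny.txt" | awk '{ print $1 }')

echo "content size: $content_bytes byte(s)"
echo "disk usage:   $disk_kbytes kilobyte(s)"

rm -r "$workdir"    # Clean up
```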

Shell Internal Commands

As mentioned previously, internal commands are part of the shell, and serve to control the shell itself. Below are some of the most common internal commands.

cd changes the current working directory of the shell process.

shell-prompt: cd [directory]

pushd changes CWD and saves the old CWD on a stack so that we can easily return.

shell-prompt: pushd directory

Users often encounter the need to temporarily go to another directory, run a few commands, and then come back to the current directory.

The pushd command is a very useful alternative to cd that helps in this situation. It performs the same operation as cd, but it records the starting CWD by adding it to the top of a stack of CWDs. You can then return to where the last pushd command was invoked using popd. This saves you from having to retype the path name of the directory to which you want to return. This is like leaving a trail of bread crumbs in the woods to retrace your path back home, except the pushd stack will not get eaten by birds and squirrels, and you won't end up in a witch's soup pot.

Example 3.16. Practice Break

Try the following sequence of commands:

shell-prompt: pwd          # Check starting point
shell-prompt: pushd /etc
shell-prompt: more hosts
shell-prompt: pushd /home
shell-prompt: ls
shell-prompt: popd         # Back to /etc
shell-prompt: pwd
shell-prompt: more hosts
shell-prompt: popd         # Back to starting point
shell-prompt: pwd

exit terminates the shell process.

shell-prompt: exit

This is the most reliable way to exit a shell. In some situations you could also type logout or simply press Ctrl+d, which sends an EOT character (end of transmission, ASCII/ISO character 4) to the shell.

Simple Text File Processing

cat echoes the contents of one or more text files.

shell-prompt: cat file [file ...]

shell-prompt: cat /etc/hosts

The vis and cat -v commands display invisible characters in a visible way. For example, carriage return characters present in Windows files are normally not shown by most Unix commands. The vis and cat -v commands will show them as '^M' (representing Control+M, which is what you would type to produce this character).

shell-prompt: cat sample.txt
This line contains a carriage return.
shell-prompt: vis sample.txt
This line contains a carriage return.\^M
shell-prompt: cat -v sample.txt 
This line contains a carriage return.^M

head shows the top N lines of one or more text files.

shell-prompt: head [-n N] file [file ...]

If the flag -n followed by an integer number N is given, the top N lines are shown instead of the default of 10.

shell-prompt: head -n 5 /etc/hosts

The head command can also be useful for generating small test inputs. Suppose you're developing a new program or script that processes genomic sequence files in FASTA format. Real FASTA files can contain millions of sequences and take a great deal of time to process. For testing new code, we don't need much data, and we want the test to complete in a few seconds rather than hours. We can use head to extract a small number of sequences from a large FASTA file for quick testing. Since FASTA files have alternating header and sequence lines, we must always choose a multiple of 2 lines. We use the output redirection operator (>) to send the head output to a file instead of the terminal screen. Redirection is covered in the section called “Redirection and Pipes”.

# You cannot run this command unless you have a file called
# really-big.fasta in the CWD
shell-prompt: head -n 1000 really-big.fasta > small-test.fasta
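For readers without a FASTA file at hand, the same idea can be tried on generated data. This is a minimal sketch under stated assumptions: the file demo.fasta, its made-up sequences, and the variable names exist only for this example, and each sequence occupies exactly one header line and one sequence line, as described above. It extracts the first 4 lines and then counts header lines (those beginning with '>') to confirm that 2 complete sequences were kept.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
fasta="$workdir/demo.fasta"

# Generate 10 fake sequences, each with a header line and a sequence line.
i=1
while [ "$i" -le 10 ]; do
    printf '>seq%d\nACGTACGT\n' "$i"
    i=$((i + 1))
done > "$fasta"

# Keep the first 2 sequences, i.e. the first 4 lines.
head -n 4 "$fasta" > "$workdir/small-test.fasta"

# Count header lines to see how many sequences the test file contains.
seq_count=$(grep -c '^>' "$workdir/small-test.fasta")
echo "sequences in small-test.fasta: $seq_count"

rm -r "$workdir"    # Clean up
```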

tail shows the bottom N lines of one or more text files.

shell-prompt: tail [-n N] file [file ...]

The tail command is especially useful for viewing the end of a large file that would be cumbersome to view with more.

If the flag -n followed by an integer number N is given, the bottom N lines are shown instead of the default of 10.

shell-prompt: tail -n 5 /etc/hosts

The diff command shows the differences between two text files. This is most useful for comparing two versions of the same file to see what has changed. Also see cdiff, a specialized version of diff, for comparing C source code.

The -u flag asks for unified diff output, which shows the removed text (text in the first file but not the second) preceded by a '-', the added text (text in the second file but not the first) preceded by a '+', and some unchanged lines for context. Most people find this easier to read than the default output format.

shell-prompt: printf "1\n2\n3\n" > input1.txt
shell-prompt: printf "2\n3\n4\n" > input2.txt
shell-prompt: diff input1.txt input2.txt
shell-prompt: diff -u input1.txt input2.txt
shell-prompt: rm input1.txt input2.txt

Text Editors

There are more text editors available for Unix systems than any one person is aware of. Some are terminal-based, some are graphical, and some have both types of interfaces.

All Unix systems support running graphical programs from remote locations, but many graphical programs require a fast connection (100 megabits/sec or more) to function comfortably.

Knowing how to use a terminal-based text editor is therefore a very good idea, so that you're prepared to work on a remote Unix system over a slow connection if necessary. Some of the more common terminal-based editors are described below.

vi (visual editor) is the standard text editor for all Unix systems. Most users either love or hate the vi interface, but it's a good editor to know since it is available on every Unix system.

nano is an extremely simple text editor that is ideal for beginners. It is a rewrite of the pico editor, which is known to have many bugs and security issues. Neither editor is standard on Unix systems, but both are free and easy to install. These editors entail little or no learning curve, but are not sophisticated enough for extensive programming or scripting.

emacs (Edit MACroS) is a more sophisticated editor used by many programmers. It is known for being hard to learn, but very powerful. It is not standard on most Unix systems, but is free and easy to install.

ape is a menu-driven, user-friendly IDE (integrated development environment), i.e. programmer's editor. It has an interface similar to PC and Mac programs, but works on a standard Unix terminal. It is not standard on most Unix systems, but is free and easy to install. ape has a small learning curve, and advanced features to make programming much faster.

Eclipse is a popular open-source graphical IDE written in Java, with support for many languages. It is sluggish over a slow connection, so it may not work well on remote systems over ssh.

Networking

hostname prints the network name of the machine.

shell-prompt: hostname

This is often useful when you are working on multiple Unix machines at the same time (e.g. via ssh), and forget which window applies to which machine.

Identity and Access Management

passwd changes your password. It asks for your old password once, and the new one twice (to ensure that you don't accidentally set your password to something you don't know because your finger slipped). Unlike many graphical password programs, passwd does not echo anything for each character typed. Even allowing someone to see the length of your password is a bad idea from a security standpoint.

# This may not work on systems using an authentication service
# rather than local passwords
shell-prompt: passwd

The passwd command is generally only used for setting local passwords on the Unix machine itself. Many Unix systems are configured to authenticate users via a remote service such as Lightweight Directory Access Protocol (LDAP) or Active Directory (AD). Changing LDAP or AD passwords may require using a web portal to the LDAP or AD server instead of the passwd command.

Terminal Control

clear clears your terminal screen (assuming the TERM environment variable is properly set).

shell-prompt: clear

reset resets your terminal to its default state. This is useful when your terminal has been corrupted by bad output, such as when attempting to view a binary file with cat.

Terminals are controlled by magic sequences, sequences of invisible control characters sent from the host computer to the terminal amid the normal output. Magic sequences move the cursor, change the color, change the international character set, etc. Binary files contain arbitrary data that may, by chance, include magic sequences that alter the mode of your terminal. If this happens, running reset will usually correct the problem. If not, you will need to log out and log back in.

shell-prompt: reset
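To make the idea of a magic sequence concrete, the sketch below sends one deliberately. It assumes a terminal that understands the widely used ANSI/VT100-style sequences, in which the escape character (ASCII 27) followed by [1m turns on bold text and [0m turns all text attributes off. Other terminal types may use different sequences, and the variable names are only illustrative. The cat -v command described earlier is used to show the otherwise invisible escape character as ^[.

```shell
#!/bin/sh
# ESC (ASCII 27, octal 033) begins most terminal control sequences.
esc=$(printf '\033')
bold_on="${esc}[1m"      # Switch to bold text
attrs_off="${esc}[0m"    # Turn all text attributes off

# The terminal interprets the sequences instead of displaying them.
printf '%s\n' "${bold_on}This appears bold on many terminals${attrs_off}"

# cat -v makes the invisible ESC character visible as ^[
visible=$(printf '%s' "$bold_on" | cat -v)
echo "The sequence that turns on bold is: $visible"
```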

Table 3.7, “Unix Commands” provides a quick reference for looking up common Unix commands. For details on any of these commands, run man command (or info command on some systems).

Table 3.7. Unix Commands

Synopsis	Description
ls [file|directory]	List file(s)
cp source-file destination-file	Copy a file
cp source-file [source-file ...] directory	Copy multiple files to a directory
mv source-file destination-file	Rename a file
mv source-file [source-file ...] directory	Move multiple files to a directory
ln source-file destination-file	Create another name for the same file. (source and destination must be in the same file system)
ln -s source destination	Create a symbolic link to a file or directory
rm file [file ...]	Remove one or more files
rm -r directory	Recursively remove a directory and all of its contents
mkdir directory	Create a directory
rmdir directory	Remove a directory (the directory must be empty)
od/hexdump	Show the contents of a file in octal/hexadecimal
sort	Sort text files based on flexible criteria
uniq	Echo files, eliminating adjacent duplicate lines.
diff	Show differences between text files.
cmp	Detect differences between binary files.
cdiff	Show differences between C programs.
date	Show the current date and time.
cal	Print a calendar for any month of any year.
printenv	Print environment variables.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.

What types of commands have to be internal to the shell? Give one example and explain why it must be internal.
How can you find a list of the basic Unix commands available on your system?
How can you find out whether the grep command is internal or external, and where it is located?
What kind of suffering did computer users have to endure in order to read documentation before the Unix renaissance? How did Unix put an end to such suffering?
Show a Unix command that helps us learn about all the command-line flags available for the tail command.
Show a Unix command that copies the file /tmp/sample.txt to the CWD.
Show a Unix command that copies all files in /tmp whose names begin with "sample" and end with ".txt" to the CWD.
Show a Unix command that moves all the files in the CWD whose names end with ".py" to a subdirectory of the CWD called "Python".
Show a Unix command that creates another file name in the CWD called test-input.txt for the existing file ./Data/input.txt.
What is a hard link?
What is a symbolic link?
What do we get when we remove the path name to which a symbolic link points?
What limitations do hard links have that soft links do not have?
How do we create a new directory /home/joe/Data/Project1 if the Data directory does not exist and the CWD is /home/joe?
How do we remove the directory ./Data if it is empty? If it is not empty?
Show a Unix command that tells us how much disk space is available in each file system.
Show a Unix command that tells us how much space is used by the directory ./Data.
Show a sequence of Unix commands that change CWD to /tmp, then to /etc and then return to the original CWD.
How do we exit the shell?
Show a Unix command that tells us if there are carriage returns in graph.py.
Show a Unix command that displays the first 20 lines of output.txt.
Show a Unix command that displays the last 20 lines of output.txt.
Show a Unix command that displays what has changed between analysis.c.old and analysis.c.
Which text editor is available on all Unix systems?
Show a Unix command that tells us the name of the machine running our shell.
Show a Unix command that connects to the remote server unixdev1.ceas.uwm.edu as user joe in order to run commands on it.
Show a Unix command to change our local password.
How do we change our password for a Unix system that relies on LDAP or AD?
Show a Unix command that clears the terminal display.
Show a Unix command to reset the terminal mode to default settings.