Chapter 4. Unix Shell Scripting

Table of Contents

What is a Shell Script?
    Practice
Why Write Shell Scripts?
    Efficiency and Accuracy
    Documentation
    Why Unix Shell Scripts?
    Practice
Which Shell?
    Common Shells
    Practice
Writing and Running Shell Scripts
    Practice
Sourcing Scripts
    Practice
Shell Start-up Scripts
    Practice
String Constants and Terminal Output
    Practice
Shell and Environment Variables
    Assignment Statements
    Variable References
    Using Variables for Code Quality
    Output Capture
    Practice
Hard and Soft Quotes
    Practice
User Input
    Practice
Conditional Execution
    Command Exit Status
    If-then-else Commands
    Shell Conditional Operators
    Case and Switch Commands
    Practice
Loops
    For and Foreach
    While Loops
    Practice
Generalizing Your Code
    Hard-coding: Failure to Generalize
    Generalizing with User Input
    Generalizing with Command-line Arguments
    Practice
Pitfalls and Best Practices
    Practice
Script Debugging
    Practice
Functions, Child Scripts, and Aliases
    Bourne Shell Functions
    C Shell Separate Scripts
    Aliases
    Practice
Here Documents
    Practice
Scripting an Analysis Pipeline
    What's an Analysis Pipeline?
    Where do Pipelines Come From?
    Implementing Your Own Pipeline
    Example Genomics Pipelines
    Practice
Solutions to Practice Breaks

Before You Begin

Before reading this chapter, you should be familiar with basic Unix concepts (Chapter 3, Using Unix) and the Unix shell (the section called “Command Line Interfaces (CLIs): Unix Shells”).

What is a Shell Script?

A shell script is essentially a file containing a sequence of Unix commands. A script is a type of program, but is distinguished from other programs in that it represents programming at a higher level. C programs are made up of C statements and calls to subprograms. Shell scripts are made up of shell commands and calls to C programs and other programs. In other words, entire programs serve as the subprograms in a shell script. A script is a way of automating the execution of multiple separate programs in sequence.
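
As a minimal sketch of this idea, the hypothetical script below runs a few standard Unix programs in sequence. The file names fruit.txt and sorted.txt and the sample data are made up for illustration; each line is a command that could just as well be typed at the shell prompt.

```shell
#!/bin/sh

# Work in a scratch directory so that no existing files are touched
cd "$(mktemp -d)" || exit 1

# Create a small sample file (contents are made up for this example)
printf 'cherry\napple\nbanana\n' > fruit.txt

# Run the sort program and save its output in a new file
sort fruit.txt > sorted.txt

# Run the wc program to report how many lines were sorted
wc -l sorted.txt
```

The script itself does no sorting or counting. It only runs printf, sort, and wc in order, which is the sense in which whole programs act as its subprograms.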

All Unix shells share a feature that lets us avoid retyping the same commands again and again: they don't care where their input comes from. It is often said that the Unix shell reads commands from the keyboard and executes them. This is not quite true. The shell reads commands from any input source and executes them. The keyboard is just one of many sources of commands that the shell can use. Ordinary files are also very commonly used as shell input.

Note

About the only difference between a shell process reading commands from the keyboard and one reading commands from a file is that the process reading from a file does not print a shell prompt. Otherwise, they do not behave any differently. The commands we put in a script are exactly the same as the commands we would run interactively.

Recall from Chapter 3, Using Unix, that Unix systems employ device independence, which means that a keyboard is the same thing as a file from the perspective of a Unix program. Any program that reads from a keyboard can also read the same input from a file or any other input device.
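
The following sketch illustrates the point using the Bourne shell, sh. The file name commands.txt and the commands in it are hypothetical. The same commands are fed to a shell process from a redirected file, from a file named on the command line, and from a pipe.

```shell
# Work in a scratch directory so that no existing files are touched
cd "$(mktemp -d)" || exit 1

# Store two ordinary commands in an ordinary file
printf 'echo first command\necho second command\n' > commands.txt

# The shell reads the commands from the file through standard input
sh < commands.txt

# The shell reads the same commands from a file named as an argument
sh commands.txt

# The shell reads a command from a pipe
echo 'echo command from a pipe' | sh
```

In each case the shell process executes whatever commands arrive, just as it would if they were typed at the keyboard.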

The Unix command-line structure was designed to be convenient both for interactive use and for programming in scripts. In fact, a Unix command looks a lot like a subprogram call. The difference is just minor syntax. A subprogram call in C encloses the arguments in parentheses and separates them with commas:

function_name(arg1,arg2,arg3);
        

A Unix command is basically the same, except that it separates the arguments with spaces instead of commas and does not use parentheses:

command_name arg1 arg2 arg3
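
As a small illustration, the sketch below defines a hypothetical helper named count_args (shell functions are covered later in this chapter) and then invokes it using exactly the command syntax shown above. The name count_args is an assumption made for this example and is not a standard Unix command.

```shell
# count_args is a hypothetical helper that reports how many
# arguments it received ($# expands to the argument count)
count_args() {
    echo "$# arguments"
}

# Invoke it like any other command: name first, then the
# arguments separated by spaces
count_args arg1 arg2 arg3
```

The shell splits the command line on spaces, so count_args receives three separate arguments, much as a C function would receive three comma-separated arguments.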
        

It is important to understand the difference between a "script" and a "real program", and which languages are appropriate for each. Scripts tend to be small, usually a few lines to a few hundred lines, and do not do any significant computation of their own. Instead, scripts run other programs to do most of the computational work. The job of the script is simply to automate and document the process of running programs.
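
A sketch of this division of labor, assuming a made-up input file ids.txt: the script below finds the most frequent line in the file, but all of the actual work is done by the standard programs sort, uniq, and head. The script only connects them and records how they were run.

```shell
#!/bin/sh

# Work in a scratch directory so that no existing files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical input data, one identifier per line
printf 'gene1\ngene2\ngene1\ngene3\ngene1\n' > ids.txt

# sort groups identical lines together, uniq -c counts each group,
# sort -rn orders the counts from largest to smallest, and
# head -n 1 keeps only the most frequent entry
sort ids.txt | uniq -c | sort -rn | head -n 1 > most_common.txt

# Display the result
cat most_common.txt
```

The script performs no counting or sorting of its own, so the speed of the shell matters very little here. Nearly all of the run time is spent inside the programs it runs.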

As a result, scripting languages do not need to be fast and are generally interpreted rather than compiled. Recall that interpreted programs can run orders of magnitude slower than equivalent compiled programs when they perform heavy computation themselves. Programs written in general-purpose languages such as C, C++, or Java may be quite large and may implement complex computational algorithms. Hence, they need to be fast and are usually written in compiled languages.

If you plan to use exclusively pre-existing programs such as Unix commands and/or add-on application software, and need only automate the execution of these programs, then you need to write a script and should choose a good scripting language. If you plan to implement your own algorithm(s) that may require a lot of computation, then you need to write a program and should select an appropriate compiled programming language.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.

  1. What is a shell script?

  2. Compare and contrast Unix commands with subprogram calls in a C or Java program.

  3. What is the difference between a script and a "real" program?

  4. Is a scripting language a good choice for performing matrix multiplication? Why or why not?