Why Write Shell Scripts?

Efficiency and Accuracy

Any experienced computer user knows that we often end up running essentially the same sequence of commands many times over. Retyping that sequence again and again wastes time and is highly error-prone.

Hence, if we're going to run the same sequence of commands more than once, we don't need to retype the sequence each time. The shell can read commands from many sources, and the keyboard is about the worst possible choice in this situation. We can put the sequence of commands into a text file once and tell the shell to read the commands from the file as many times as we want. This is much easier than typing them repeatedly, and it eliminates the need to remember the details of the commands.
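As a minimal sketch of this idea, the commands below save a short command sequence in a file and then have the shell read it twice. The file names count-lines.sh and sample.txt and the sample data are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: store a command sequence in a file once, then let the shell
# read it as many times as we like. All file names here are hypothetical.

workdir=$(mktemp -d)

# Some sample data to process.
printf 'alpha\nbeta\ngamma\n' > "$workdir/sample.txt"

# Write the command sequence to a script file one time.
# The quoted 'EOF' keeps $1 from being expanded while the file is written.
cat > "$workdir/count-lines.sh" << 'EOF'
#!/bin/sh
# Print the number of lines in the file named by the first argument.
wc -l < "$1" | tr -d ' '
EOF

# Tell the shell to read the commands from the file, twice.
first_run=$(sh "$workdir/count-lines.sh" "$workdir/sample.txt")
second_run=$(sh "$workdir/count-lines.sh" "$workdir/sample.txt")

echo "first run: $first_run, second run: $second_run"

rm -rf "$workdir"
```

Both runs execute exactly the same commands, with nothing retyped and nothing to remember.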

Rule of Thumb

If you're not completely certain that you will never need to do it again, script it.

In theory, Unix commands could also be piped in from another program or read from any other device attached to a Unix system, although in practice, they usually come from the keyboard or a script file.
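To illustrate commands arriving from another program rather than the keyboard or a file, the sketch below uses printf as a stand-in for a program that generates commands and pipes its output into sh. The two echo commands are placeholders, not part of any real workflow.

```shell
#!/bin/sh
# Sketch: the shell reads its commands from a pipe.
# printf plays the role of "another program" that emits commands;
# sh, given no script argument, reads them from standard input.

piped_output=$(printf 'echo "step one"\necho "step two"\n' | sh)

echo "$piped_output"
```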

Documentation

There is another very good reason for writing shell scripts in addition to saving us a lot of redundant typing. A shell script is the ultimate documentation of the work we have done on a computer, ensuring repeatability of the analysis, which is one of the cornerstones of science. By writing a shell script, we record the exact sequence of commands needed to reproduce results, in perfect detail. Hence, the script serves a dual purpose of automating and documenting our processes.

Developing a script has a ratchet effect on your knowledge. Once you add a command to a script, you will never again have to figure out how to do the same thing. Clear documentation of our workflow is also important for justifying research funding and for reproducing results months or years later.

Rule of Thumb

Scientists and Unix users should never find themselves trying to remember how they did something. Script it the first time and you will never be in this situation. Unix makes it easy to automate our analyses, so nobody needs to waste time struggling to reproduce results.

Imagine that we instead decided to run our sequence of commands manually and document what we did in a word processor. First, we'd be typing everything twice: once at the shell prompt and again into the document. We would have to record the exact command, with all flags and data arguments, in the document to ensure that we can reproduce the results. It is also very inconvenient for people to read a document and type in the commands contained in it. Why not just give them a script that they can easily run?

The process of typing the same commands each time would be painful enough, but to document it in detail while we do it would be distracting. We'd also have to remember to update the document every time we type a command differently. This is hard to do when we're trying to focus on getting results.

Writing a shell script allows us to stay focused on perfecting the process. Once the script is finished and working perfectly, we have already documented the process perfectly. We can and should add comments to the script to make it more readable, but even without comments, the script itself preserves the process in detail.

Many experienced users will never run a data processing command from the keyboard. Instead, they only put commands into a script. They run, tweak, and re-run the script until it's working perfectly.

An important part of documenting code is making the code self-documenting. When writing shell scripts, using long options in commands, such as zip --preserve-case instead of zip -C, makes the script much easier to read. While -C is less typing and may be preferable when running zip interactively many times, we only have to type --preserve-case once when writing the script, so the laziness of using -C doesn't pay here. It just makes us waste time later looking up its meaning, whereas the meaning of the long option may be obvious.
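The difference between short and long options can be seen in a small sketch. It uses grep rather than zip, and assumes a grep that accepts GNU-style long options (GNU grep does; other implementations may not). The sample log lines are made up for illustration.

```shell
#!/bin/sh
# Sketch: the same command written with short and with long options.
# Assumes GNU-style long options are available in grep.

data=$(mktemp)
printf 'Error: disk full\nok\nERROR: timeout\n' > "$data"

# Terse form: quick to type at the prompt, cryptic when read months later.
short_form=$(grep -ic 'error' "$data")

# Self-documenting form: typed only once, since it lives in the script.
long_form=$(grep --ignore-case --count 'error' "$data")

echo "short: $short_form, long: $long_form"

rm -f "$data"
```

Both commands count the lines that match regardless of case, but only the second says so on its face.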

If you use an integrated development environment, such as APE, testing the script is a simple matter of pressing F5. We do not have to exit the editor (as we would when using nano) and we don't lose our place in the script.

Why Unix Shell Scripts?

There are many scripting languages to choose from, including those used on Unix systems, like Bourne shell, C shell, Perl, Python, etc., as well as some languages confined to other platforms like Visual Basic (Microsoft Windows only) and AppleScript (Apple only).

Note that the Unix-based scripting languages can be used on any platform, including Microsoft Windows (with Cygwin, for example) and Apple's macOS (formerly Mac OS X), which is Unix-compatible by design. Once you learn to write Unix shell scripts, you're prepared to do scripting on any computer, without having to learn another language. There is little reason not to use Unix shell scripts in place of proprietary scripting languages.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.

  1. Describe three reasons for writing shell scripts instead of running commands from the keyboard.

  2. What feature of Unix makes scripting so easy to implement and use? Explain.

  3. When should we run commands at the shell prompt and when should we put them in a script?

  4. What are three advantages of a script over a document explaining the commands to run?

  5. What type of flag arguments should we use in scripts? Why?

  6. What is the advantage of Unix scripting languages over others such as Visual Basic or AppleScript? What is the disadvantage?