As in other programming languages, our scripts often need to run the same command or commands repeatedly. Very often, we need to run the same sequence of commands on a group of files. In simple cases, we can provide all of the files as arguments to a single invocation of a command, or use xargs to pass them:
    shell-prompt: analyze input*.txt
    shell-prompt: find . -name 'input*.txt' | xargs analyze
We can achieve the same effect, as well as handle more complex situations involving multiple commands, using a loop in a shell script.
Unix shells offer a type of loop that takes an enumerated list of string values, rather than counting through a sequence of numbers or looping while some input condition is true. This makes shell scripts very convenient for working with sets of files or arbitrary sets of values. This type of loop is well suited for use with globbing (file name patterns using wild cards, as discussed in the section called “Globbing (File Specifications)”):
    #!/bin/sh -e
    # Process input-1.txt, input-2.txt, etc.
    for file in input-*.txt
    do
        ./myprog $file
    done

    #!/bin/csh -ef
    # Process input-1.txt, input-2.txt, etc.
    foreach file (input-*.txt)
        ./myprog $file
    end
Code controlled by a loop should be consistently indented as shown above. How much indentation is used is a matter of personal taste, but four spaces is typical.
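As a minimal sketch of the "multiple commands" case mentioned earlier, the script below runs two commands on each file matched by a globbing pattern. The scratch directory, the sample file contents, and the choice of sort and wc are assumptions made only for illustration; they are not part of the original example.

```shell
#!/bin/sh -e

# Work in a scratch directory so the sketch does not touch real data
dir=$(mktemp -d)
cd "$dir"

# Create two small sample files (hypothetical contents)
printf "pear\napple\n" > input-1.txt
printf "kiwi\nfig\ncherry\n" > input-2.txt

# Run two commands on each file matched by the pattern
for file in input-*.txt
do
    # First command: save a sorted copy next to the original
    sort "$file" > "$file.sorted"
    # Second command: count the lines and report them
    lines=$(wc -l < "$file")
    printf "%s: %d lines\n" "$file" $lines
done
```

Quoting "$file" keeps the loop working if a file name contains white space. The scratch directory is left in place so the sorted copies can be inspected afterward.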
These loops are not limited to using file names, however. We can use them to iterate through any list of string values:
    #!/bin/sh -e
    for fish in flounder gobie hammerhead manta moray sculpin
    do
        printf "%s\n" $fish
    done

    #!/bin/sh -e
    for c in 1 2 3 4 5 6 7 8 9 10
    do
        printf "%d\n" $c
    done
To iterate through a list of integers too long to type out, we can utilize the seq command, which takes a starting value, optionally an increment value, and an ending value, and prints the sequence to the standard output. We can use shell output capture (the section called “Output Capture”) to represent the output of the seq command as a string in the script:
    #!/bin/sh -e
    # Count from 0 to 1000 in increments of 5
    for c in $(seq 0 5 1000); do
        printf "%d\n" $c
    done

    #!/bin/csh -ef
    foreach c (`seq 0 5 1000`)
        printf "%s\n" $c
    end
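To make the argument forms of seq concrete, here is a small sketch that captures a few short sequences and prints each on one line. The variable names are hypothetical, and the sketch assumes a seq command is installed (it is common, but not required by POSIX).

```shell
#!/bin/sh -e

# Two arguments: starting value and ending value
two=$(seq 4 6)
# Three arguments: starting value, increment, ending value
three=$(seq 0 5 20)
# A negative increment counts downward
down=$(seq 10 -2 4)

# Unquoted, each captured sequence expands to separate words,
# which is what lets a for loop iterate over it
printf "%s " $two; printf "\n"
printf "%s " $three; printf "\n"
printf "%s " $down; printf "\n"
```

The three lines printed contain the values 4 5 6, then 0 5 10 15 20, then 10 8 6 4.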
The seq command can even be used to embed integer values in a non-integer list:
    #!/bin/sh -e
    # Process all human chromosomes
    for chromosome in $(seq 1 22) X Y; do
        printf "chr%s\n" $chromosome
    done
Type in and run the fish example above.
Example 4.3. Multiple File Downloads
Often we need to download many large files from another site. This process would be tedious to do manually: Start a download, wait for it to finish, start another... There may be special tools provided by the website, but often they are poorly maintained or difficult to use. In many cases, we may be able to automate the download using a simple script and a mainstream transfer tool such as curl, rsync, or wget.
The model scripts below demonstrate how to download a set of files using curl. The local file names will be the same as those on the remote site, and if the transfer is interrupted for any reason, we can simply run the script again to resume the download where it left off.
Depending on the tools available on your local machine and the remote server, you may need to substitute another file transfer program for curl.
    #!/bin/sh -e
    # Download genome data from the ACME genome project
    site=http://server.with.my.files/directory/with/my/files
    for file in frog1 frog2 frog3 toad1 toad2 toad3; do
        printf "Fetching $site/$file.fasta.gz...\n"
        # Use filename from remote site and try to resume interrupted
        # transfers if a partial download already exists
        curl --continue-at - --remote-name $site/$file.fasta.gz
    done

    #!/bin/csh -ef
    # Download genome data from the ACME genome project
    set site=http://server.with.my.files/directory/with/my/files
    foreach file (frog1 frog2 frog3 toad1 toad2 toad3)
        printf "Fetching $site/$file.fasta.gz...\n"
        # Use filename from remote site and try to resume interrupted
        # transfers if a partial download already exists
        curl --continue-at - --remote-name $site/$file.fasta.gz
    end
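Before starting a long transfer, it can be useful to preview the URLs a download loop would request. The sketch below is one possible way to do that: it reuses the placeholder site and file names from the example and only prints the URLs, without contacting any server. The urls variable is a hypothetical name introduced for this illustration.

```shell
#!/bin/sh -e

# Placeholder site from the example above; not a real server
site=http://server.with.my.files/directory/with/my/files

# Collect the URLs the download loop would request, without
# contacting the server
urls=''
for file in frog1 frog2 frog3 toad1 toad2 toad3; do
    urls="$urls $site/$file.fasta.gz"
done

# Print one URL per line for inspection
printf "%s\n" $urls
```

If the list looks right, the printf inside a real download loop can be followed by the curl command shown earlier.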
A for or foreach loop is only convenient for iterating through a fixed set of values or a sequence generated by a program such as seq. Sometimes we may need to terminate a loop based on inputs that are unknown when the loop begins, or values computed over the course of the loop.
The while loop is a more general loop that iterates as long as some condition is true. It uses the same types of expressions as an if command. The while loop can be used to iterate through long integer sequences, as we might do with seq and a for/foreach loop:
    #!/bin/sh -e
    c=1
    while [ $c -le 100 ]
    do
        printf "%d\n" $c
        c=$(($c + 1))   # (( )) encloses an integer expression
    done
Note again that the [ above is an external command, as discussed in the section called “Command Exit Status”, so we must use white space to separate the arguments.
    #!/bin/csh -ef
    set c = 1
    while ( $c <= 100 )
        printf "%d\n" $c
        @ c = $c + 1    # @ is like set, but indicates an integer expression
    end
Type in and run the script above.
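A while loop is also the natural choice when the stopping point depends on a value computed inside the loop. The following sketch, with hypothetical variable names and an arbitrary limit of 1000, doubles a value until it exceeds the limit and counts how many iterations that takes.

```shell
#!/bin/sh -e

# Double a value until it exceeds a limit, counting the iterations.
# The number of iterations is not known before the loop starts.
limit=1000
value=1
steps=0
while [ $value -le $limit ]
do
    value=$(($value * 2))
    steps=$(($steps + 1))
done
printf "%d doublings to exceed %d (reached %d)\n" $steps $limit $value
```

With these starting values the loop runs 10 times and stops when the value reaches 1024.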
While loops can also be used to iterate until an input condition is met:
    #!/bin/sh -e
    continue=''
    while [ 0"$continue" != 0'y' ] && [ 0"$continue" != 0'n' ]; do
        printf "Would you like to continue? (y/n) "
        read continue
    done

    #!/bin/csh -ef
    set continue=''
    while ( ("$continue" != 'y') && ("$continue" != 'n') )
        printf "Continue? (y/n) "
        set continue="$<"
    end
Type in and run the script above.
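The same kind of loop can be tried without typing at the keyboard by feeding it canned responses. The sketch below wraps the loop in a hypothetical helper function, ask_yes_no, and supplies input from a here-document. It also stops if the input ends before a valid answer arrives, which is an added safeguard and not part of the scripts above.

```shell
#!/bin/sh -e

# Hypothetical helper: read lines from standard input until one of
# them is y or n, and leave that value in the variable 'answer'.
# Returns 1 if the input ends before a valid answer is seen.
ask_yes_no()
{
    answer=''
    while [ 0"$answer" != 0'y' ] && [ 0"$answer" != 0'n' ]; do
        printf "Would you like to continue? (y/n) "
        read answer || return 1
    done
    return 0
}

# Feed three canned responses; the loop rejects the first two
ask_yes_no << EOF
maybe
yes
n
EOF
printf "\nAnswer: %s\n" "$answer"
```

Here the loop prompts three times, since "maybe" and "yes" are neither y nor n, and then ends with the answer n.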
We may even want a loop to iterate forever. This is often useful when using a computer to collect data at regular intervals. It is up to the user to terminate the process using Ctrl+c or kill.
    #!/bin/sh -e
    # 'true' is an external command that always returns an exit status of 0
    while true; do
        sample-data     # Read instrument
        sleep 10        # Pause for 10 seconds without using any CPU time
    done

    #!/bin/csh -ef
    while ( 1 )
        sample-data     # Read instrument
        sleep 10        # Pause for 10 seconds without using any CPU time
    end
Describe three ways to run the same program using multiple input files.
Write a shell script that prints the square of every number from 1 to 10 using for/foreach and seq.
Write a shell script that prints the square of every number from 1 to 100 using while.
Write a shell script that sorts each file in the CWD whose name begins with "input" and ends in ".txt", and saves the output to original-filename.sorted. You may assume that there are not too many input files for a simple globbing pattern. The script then merges all the sorted text into a single file called combined.txt.sorted, with duplicate lines removed. Hint: The sort command can also merge presorted files. Check the man page for the necessary flag.
    shell-prompt: ./sort.sh
    input1.txt:
    Starbuck
    Adama
    input1.txt.sorted:
    Adama
    Starbuck
    input2.txt:
    Tigh
    Apollo
    input2.txt.sorted:
    Apollo
    Tigh
    Combined:
    Adama
    Apollo
    Starbuck
    Tigh