Variables are essential to any programming language, and scripting languages are no exception. Variables are useful for user input, control structures, and for giving descriptive names to commonly used constants, such as numbers and long path names.
Recall from the section called “Environment Variables” that every Unix process has a set of string variables called the environment, which is handed down from the parent process in order to communicate important information. For example, the TERM variable, which identifies the type of terminal a user is using, is used by programs such as top, vi, nano, and more, that need to manipulate the terminal screen (move the cursor, highlight characters, etc.). The TERM environment variable is usually set by the shell process so that all of the shell's child processes (those running vi, nano, etc.) will inherit the variable.
Unix shells also keep another set of variables called shell variables that are not part of the environment. These variables are used only for the shell's purposes and are not inherited by child processes. The shell variables are structured exactly the same way as the environment variables, each having a name and a value, which is a character string.
There are some special shell variables such as "prompt" and "PS1" (which control the appearance of the shell prompt in C shell and Bourne shell, respectively). Most shell variables, however, are defined by the user for use in scripts, just like variables in any other programming language.
Environment and shell variable names must begin with a letter or an underscore (_), which is optionally followed by more letters, underscores, or digits. The regular expression defining the naming rules for environment variables is '[A-Za-z_][A-Za-z0-9_]*'. Environment variable names traditionally use upper case for all letters.
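The naming rule can be checked mechanically. The following is a minimal sketch, assuming a POSIX grep that supports -E and -q. The function name is_valid_name and the candidate names are hypothetical, chosen only for illustration. The pattern is anchored with ^ and $ so that the entire name must match.

```sh
#!/bin/sh

# Hypothetical helper: succeeds if $1 is a legal variable name.
# The pattern is the one given in the text, anchored with ^ and $
# so that the whole string must match.
is_valid_name()
{
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'
}

for candidate in output_file _tmp TERM 2fast my-var; do
    if is_valid_name "$candidate"; then
        printf '%s: valid\n' "$candidate"
    else
        printf '%s: invalid\n' "$candidate"
    fi
done
```

The first three candidates should be reported as valid. 2fast begins with a digit and my-var contains a hyphen, so both should be reported as invalid.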
In all Bourne shell derivatives, a shell variable is created or modified using the same simple syntax, with no white space around the '=':
varname=value
bash-4.2$ name = Fred
bash: name: command not found
bash-4.2$ name=Fred
bash-4.2$ printf "$name\n"
Fred
When assigning a string that contains white space, it must be enclosed in quotes or all white space characters must be escaped:
#!/bin/sh -e
name=Joe Sixpack        # Error
name="Joe Sixpack"      # OK
name=Joe\ Sixpack       # OK
C shell and T shell use the internal set command for assigning variables. Since a variable assignment is a command in C shell, we can have white space around the '=' if we wish:
#!/bin/csh -ef
set name = "Joe Sixpack"
In many languages such as C, Fortran, or Java, we must define variables before we can use (reference) them:
int c;
double x;
c = 5;
x = 1.4;
Unix shell variables need not be defined before they are assigned a value. Defining variables is unnecessary, since there is only one data type in shell scripts. All shell variables are character strings. There are no integers, Booleans, enumerated types, or floating point variables, although there are some facilities for interpreting shell variables as integers, assuming they contain only digits.
In Bourne shell, we can perform basic integer arithmetic by enclosing an expression in $(( )):
c=$(($c + 1))
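A typical use of this syntax is a loop counter. The following is a minimal sketch, assuming a POSIX Bourne shell. The while loop and the [ ] test are not covered in this section and are used here only to give the counter something to do. The variable names c and total are chosen for illustration.

```sh
#!/bin/sh

# Count from 1 to 5, accumulating a running total.
# c and total are ordinary shell variables (character strings) that
# happen to contain only digits.
c=1
total=0
while [ $c -le 5 ]; do
    total=$(($total + $c))
    c=$(($c + 1))
done
printf 'c = %s, total = %s\n' $c $total
```

When the loop ends, c is 6 and total is 15 (1 + 2 + 3 + 4 + 5).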
In C shell, we use the @ command, which is a special form of set that enables basic arithmetic:
@ c = $c + 1
Most shells are not capable of handling real numbers. Only integers are supported, mainly for the sake of loop counters and a few other purposes. If you must manipulate real numbers in a shell script, you could accomplish it by piping an expression through bc, the Unix arbitrary-precision calculator:
printf "243.9 * $variable\n" | bc -l
Such facilities are very inefficient compared to other languages, however, partly because shell languages are interpreted, not compiled, and partly because they must convert each string to a number, perform arithmetic, and convert the results back to a string. Shell scripts are meant to automate sequences of Unix commands and other programs, not perform numerical computations.
In Bourne shell family shells, environment variables are set by first setting a shell variable of the same name and then exporting it to the environment:
TERM=xterm
export TERM
Modern Bourne shell derivatives such as bash (Bourne Again Shell) can do this in one command:
export TERM=xterm
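The difference between a shell variable and an exported one can be seen by starting a child process and letting it print what it inherited. The following is a minimal sketch, assuming that sh is in the command search path and that neither variable is already present in the environment. The names local_only and exported_var are hypothetical.

```sh
#!/bin/sh

# Hypothetical variable names, chosen only for this illustration
local_only="shell variable"        # Not exported
exported_var="environment variable"
export exported_var

# Single quotes keep this shell from expanding the references, so the
# child sh process expands them itself using its own environment.
sh -c 'printf "local_only=[%s]\n" "$local_only"'
sh -c 'printf "exported_var=[%s]\n" "$exported_var"'
```

The child should print empty brackets for local_only, since that variable was never placed in the environment, and the assigned value for exported_var.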
C shell derivatives use the setenv command to set environment variables:
setenv TERM xterm
C shell variables are not linked to environment variables, except for some special variables, like path, which is automatically exported to the environment variable PATH each time it is updated.
To reference a shell variable or an environment variable in a shell script, we must precede its name with a '$'. The '$' tells the shell that the following text is to be interpreted as a variable name rather than a string constant. The variable reference is then expanded, i.e. replaced by the value of the variable. This occurs anywhere in a command except inside a string bounded by single quotes or following an escape character (\), as explained in the section called “String Constants and Terminal Output”. These rules are basically the same for all Unix shells.
#!/bin/sh -e
name="Joe Sixpack"
printf "Hello, name!\n"     # Not a variable reference
printf "Hello, $name!\n"    # References variable "name"
printf 'Hello, $name!\n'    # Not a variable reference
printf "Hello, \$name!\n"   # Not a variable reference
Output:
Hello, name!
Hello, Joe Sixpack!
Hello, $name!
Hello, $name!
Type in and run the following scripts:
#!/bin/sh -e
first_name="Bob"
last_name="Newhart"
printf "%s %s is a superhero.\n" $first_name $last_name
CSH version:
#!/bin/csh -ef
set first_name = "Bob"
set last_name = "Newhart"
printf "%s %s is a superhero.\n" $first_name $last_name
If both a shell variable and an environment variable with the same name exist, a normal variable reference will expand the shell variable.
In Bourne shell derivatives, a shell variable and environment variable of the same name always have the same value, since exporting is the only way to set an environment variable. Hence, it doesn't really matter which one we reference.
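This behavior can be observed directly. The following is a minimal sketch, assuming that sh is in the command search path. The variable name greeting is hypothetical. Once the variable has been exported, a later plain assignment changes the value that child processes inherit as well.

```sh
#!/bin/sh

# Hypothetical variable name, chosen only for this illustration
greeting="hello"
export greeting
sh -c 'printf "Child sees: %s\n" "$greeting"'

# A plain assignment, with no second export, also changes the
# environment variable inherited by later child processes.
greeting="goodbye"
sh -c 'printf "Child sees: %s\n" "$greeting"'
```

The first child should report hello and the second goodbye.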
In C shell derivatives, a shell variable and environment variable of the same name can have different values. If you want to reference the environment variable rather than the shell variable, you can use the printenv command:
Darwin heron bacon ~ 319: set name=Sue
Darwin heron bacon ~ 320: setenv name Bob
Darwin heron bacon ~ 321: echo $name
Sue
Darwin heron bacon ~ 322: printenv name
Bob
There are some special C shell variables that are automatically linked to environment counterparts. For example, the shell variable path is always the same as the environment variable PATH. The C shell man page is the ultimate source for a list of these variables.
If a variable reference is immediately followed by a character that could be part of a variable name, we could have a problem:
#!/bin/sh -e
name="Joe Sixpack"
printf "Hello to all the $names of the world!\n"
Instead of printing "Hello to all the Joe Sixpacks of the world!", the printf will not do what we intended, because the shell looks for a variable called "names", which does not exist. In Bourne shell derivatives, non-existent variables are treated as empty strings by default, so this script will print "Hello to all the  of the world!". C shell will print an error message stating that the variable "names" does not exist.
We can correct this by delimiting the variable name in curly braces:
#!/bin/sh -e
name="Joe Sixpack"
printf "Hello to all the ${name}s of the world!\n"
This syntax works for all shells. Some shell programmers might insist that all variable references should use {}. My philosophy is that if something is not necessary or at least helpful, then typing it is a waste of time and added clutter.
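A case where braces are necessary arises often when building file names, since an underscore is a legal character in a variable name. The following is a minimal sketch, assuming a Bourne shell derivative with default settings and no existing variable called run_results. The variable name run is hypothetical.

```sh
#!/bin/sh

# Hypothetical variable name, chosen only for this illustration
run="Run2"

# Without braces, the shell looks for a variable called "run_results",
# which does not exist, so only ".txt" is printed.
printf "$run_results.txt\n"

# With braces, the reference ends at the closing brace.
printf "${run}_results.txt\n"
```

The first printf should print only .txt, while the second prints Run2_results.txt. The period ends the variable name in both cases, since it cannot be part of a name.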
Another very good use for shell variables is in eliminating redundant string constants from a script. Suppose we have a path name referenced multiple times in a script:
#!/bin/sh -e
output_value=`myprog`
printf "$output_value\n" >> Run2/Output/results.txt
more Run2/Output/results.txt
cp Run2/Output/results.txt latest-results.txt
If for any reason the relative path Run2/Output/results.txt should change, then you'll have to search through the script and make sure that all instances are updated. This is a tedious and error-prone process, which can be avoided by using a variable:
#!/bin/sh -e
output_file="Run2/Output/results.txt"
output_value=`myprog`
printf "$output_value\n" >> $output_file
more $output_file
cp $output_file latest-results.txt
In the second version of the script, if the path name of results.txt changes, then only one change must be made to the script. Avoiding redundancy is one of the primary goals of any good programmer.
In a more general programming language such as C or Fortran, this role would be served by a constant, not a variable. However, shells do not support constants, so we use a variable for this.
In most shells, a variable can be marked read-only in an assignment to prevent accidental subsequent changes. Bourne family shells use the readonly command for this, while C shell family shells use set -r.
#!/bin/sh -e
readonly output_file="Run2/Output/results.txt"
output_value=`myprog`
printf "$output_value\n" >> $output_file
more $output_file
cp $output_file latest-results.txt
#!/bin/csh -ef
set -r output_file = "Run2/Output/results.txt"
set output_value=`myprog`
printf "$output_value\n" >> $output_file
more $output_file
cp $output_file latest-results.txt
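To see what the read-only marking does in a Bourne shell derivative, we can attempt to change the variable and check whether the attempt succeeds. The following is a minimal sketch, not a definitive test. The attempt is made inside a subshell (a command enclosed in parentheses), on the assumption that the shell may abort a script that assigns to a read-only variable. The replacement value other.txt is hypothetical.

```sh
#!/bin/sh

readonly output_file="Run2/Output/results.txt"

# Attempt the change inside a subshell, since many Bourne shell
# derivatives abort a script that assigns to a read-only variable.
# The shell's error message is discarded via 2> /dev/null.
if ( output_file="other.txt" ) 2> /dev/null; then
    printf "The assignment succeeded.\n"
else
    printf "The assignment was rejected.\n"
fi
printf "output_file is still $output_file\n"
```

The script should report that the assignment was rejected and that output_file still contains the original path.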
Output from a command can be captured and used as a string in the shell environment by enclosing the command in back-quotes (``). In Bourne-compatible shells, we can also use $() in place of back-quotes.
#!/bin/sh -e
# Using output capture in a command. The double quotes keep the
# output of date, which contains white space, together as one argument.
printf "Today is %s.\n" "`date`"
printf "Today is %s.\n" "$(date)"
# Using a variable. If using the output more than once, this will
# avoid running the command multiple times.
today=`date`
printf "Today is %s.\n" "$today"
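Captured output often contains white space, as the output of date does, so it matters whether the reference is enclosed in double quotes. The following is a minimal sketch, assuming a POSIX Bourne shell with the default field separators. The variable name words and the stand-in command are hypothetical.

```sh
#!/bin/sh

# A stand-in command whose output contains white space
words=$(printf "one two three")

# Unquoted: the shell splits the value into three arguments, and
# printf reuses the format once for each of them.
printf "[%s]\n" $words

# Quoted: the value is passed as a single argument.
printf "[%s]\n" "$words"
```

The first printf should print three bracketed lines, one per word, while the second prints a single line containing all three words.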
What is the convention for naming C shell variables to help distinguish them from environment variables?
Can we use any name we want for a shell variable in our script?
What rules do shell and environment variable names need to follow?
Show how to assign the value "Alfred E. Neumann" to the shell variable full_name in Bourne shell and in C shell.
How do we declare a shell variable? Explain.
What is the relationship between a given shell variable and an environment variable with the same name? Explain.
Are all C shell variables independent of environment variables? Use an example to clarify.
Show the output of the following script: (Try to figure it out first, and then check by typing in and running the script).
#!/bin/sh -e
name="Wile E. Coyote"
printf "$name\n"
printf "name\n"
printf '$name\n'
What is the output of the following script? What do you think is the intended output and how can we make it happen?
#!/bin/sh -e
file_size=200
printf "File size is $file_sizeMB.\n"
What is the danger in the following script? Alter the script to eliminate the risk.
#!/bin/sh -e
printf "The first 20 lines of file.txt are:\n"
head -n 20 file.txt
printf "The last 20 lines of file.txt are:\n"
tail -n 20 file.txt
Write a shell script that prints the following, using a single printf command and using wc -l to find the number of lines in the file. Exact white space is not important.
input1.txt contains 3258 lines.