Shell and Environment Variables

Variables are essential to any programming language, and scripting languages are no exception. Variables are useful for user input, control structures, and for giving descriptive names to commonly used constants, such as numbers and long path names.

Recall from the section called “Environment Variables” that every Unix process has a set of string variables called the environment, which is handed down from the parent process in order to communicate important information. For example, the TERM variable, which identifies the type of terminal in use, is read by programs such as top, vi, nano, and more, which need to manipulate the terminal screen (move the cursor, highlight characters, etc.). TERM is usually present in the shell's environment, so all of the shell's child processes (those running vi, nano, etc.) inherit it.

Unix shells also keep another set of variables called shell variables that are not part of the environment. These variables are used only for the shell's own purposes and are not inherited by child processes. Shell variables are structured exactly the same way as environment variables: each has a name and a value, which is a character string.

There are some special shell variables such as "prompt" and "PS1" (which control the appearance of the shell prompt in C shell and Bourne shell, respectively). Most shell variables, however, are defined by the user for use in scripts, just like variables in any other programming language.

Environment and shell variable names must begin with a letter or an underscore (_), optionally followed by more letters, underscores, or digits. The regular expression defining these naming rules is '[A-Za-z_][A-Za-z0-9_]*'. Environment variable names traditionally use upper case for all letters.
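As a sketch of the naming rule, the hypothetical function is_valid_name below checks a candidate name using shell pattern matching in a case statement, which expresses the same rule as the regular expression above. The function name and the sample names are made up for illustration.

```shell
#!/bin/sh -e

# is_valid_name (hypothetical helper): exit status 0 if $1 is a legal
# variable name, i.e. it matches [A-Za-z_][A-Za-z0-9_]*
is_valid_name()
{
    case "$1" in
        "")              return 1 ;;   # the empty string is not a name
        [0-9]*)          return 1 ;;   # must not begin with a digit
        *[!A-Za-z0-9_]*) return 1 ;;   # only letters, digits, underscores
        *)               return 0 ;;
    esac
}

for candidate in PATH output_file _tmp2 2nd_try full-name; do
    if is_valid_name "$candidate"; then
        printf "%-12s valid\n" "$candidate"
    else
        printf "%-12s invalid\n" "$candidate"
    fi
done
```

The last two names are rejected: 2nd_try begins with a digit and full-name contains a hyphen.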

Assignment Statements

In all Bourne Shell derivatives, a shell variable is created or modified using the same simple syntax:

varname=value
            

Caution

There can be no space around the '='. If there were, the shell would think that 'varname' is a command, and '=' and 'value' are arguments. A variable assignment is distinct from a command in the Bourne shell family.
bash-4.2$ name = Fred
bash: name: command not found
bash-4.2$ name=Fred
bash-4.2$ printf "$name\n"
Fred
            

When assigning a string that contains white space, it must be enclosed in quotes or all white space characters must be escaped:

#!/bin/sh -e

name=Joe Sixpack    # Error: tries to run a command named Sixpack
name="Joe Sixpack"  # OK
name=Joe\ Sixpack   # OK
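A quick way to convince yourself that the two correct forms store the same string is to compare them. The sketch below also shows why the unquoted form fails: the shell reads name=Joe as an assignment that applies only to the environment of the command that follows, and then tries to run the next word as that command. Here sh -c stands in for the nonexistent command Sixpack so the effect is visible; the names name1 and name2 are arbitrary.

```shell
#!/bin/sh -e

name1="Joe Sixpack"     # quoted
name2=Joe\ Sixpack      # space escaped with a backslash

if [ "$name1" = "$name2" ]; then
    printf "Both forms store: %s\n" "$name1"
fi

# "name=Joe Sixpack" means: run the command Sixpack with name=Joe in its
# environment.  Substituting a command that exists shows this:
name=Joe sh -c 'printf "The command saw name=%s\n" "$name"'
```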
            

C shell and T shell use the internal set command for assigning variables. Since a variable assignment is a command in C shell, we can have white space around the '=' if we wish:

#!/bin/csh -ef

set name = "Joe Sixpack"
            

Caution

Note that Bourne family shells also have a set command, but it has a completely different meaning, so take care not to get confused. The Bourne set command is used to enable or disable shell command-line options, not variables.
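To see how easily the two set commands can be confused, the sketch below carries a C shell habit into a Bourne shell script. In a Bourne family shell, set treats arguments that are not options as new values for the positional parameters $1, $2, and so on (normally the script's command-line arguments), so no variable called name is created. The last lines show the intended use, turning an option such as -x (trace each command) on and off.

```shell
#!/bin/sh -e

# C shell syntax used by mistake in a Bourne shell script
set name = "Joe Sixpack"

# No variable called "name" was created...
printf "name is '%s'\n" "$name"

# ...instead the three words replaced the positional parameters
printf "\$1 is '%s', \$2 is '%s', \$3 is '%s'\n" "$1" "$2" "$3"

# The intended use of the Bourne set command: shell options
set -x              # print each command before running it
printf "tracing is on\n"
set +x              # turn tracing off again
```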

In many languages such as C, Fortran, or Java, we must define variables before we can use (reference) them:

    int     c;
    double  x;
    
    c = 5;
    x = 1.4;
            

Unix shell variables need not be defined before they are assigned a value. Defining variables is unnecessary, since there is only one data type in shell scripts. All shell variables are character strings. There are no integers, Booleans, enumerated types, or floating point variables, although there are some facilities for interpreting shell variables as integers, assuming they contain only digits.

In POSIX-compatible Bourne shell derivatives, such as bash, ksh, and modern versions of sh, we can perform basic integer arithmetic by enclosing an expression in $(( )):

c=$(($c + 1))
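Because every variable is a string, the same value can be joined as text or evaluated as a number, depending on how it is used. The sketch below contrasts the two and then uses $(( )) for a typical job, a loop counter. The variable names are arbitrary.

```shell
#!/bin/sh -e

c=5
joined=$c$c             # string concatenation: "55"
sum=$(($c + $c))        # arithmetic expansion: "10"
c=$(($c + 1))           # increment: c is now "6"
printf "joined=%s sum=%s c=%s\n" "$joined" "$sum" "$c"

# A loop counter, one of the main uses of shell integer arithmetic
count=1
while [ $count -le 3 ]; do
    printf "pass %d\n" $count
    count=$(($count + 1))
done
```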
            

In C shell, we use the @ command, which is a special form of set that enables basic arithmetic:

@ c = $c + 1
            

Most shells are not capable of handling real numbers. Only integers are supported, mainly for the sake of loop counters and a few other purposes. If you must manipulate real numbers in a shell script, you could accomplish it by piping an expression through bc, the Unix arbitrary-precision calculator:

printf "243.9 * $variable\n" | bc -l
            

Such facilities are very inefficient compared to other languages, however, partly because shell languages are interpreted, not compiled, and partly because they must convert each string to a number, perform arithmetic, and convert the results back to a string. Shell scripts are meant to automate sequences of Unix commands and other programs, not perform numerical computations.

In Bourne family shells, environment variables are set by first setting a shell variable of the same name and then exporting it to the environment:

TERM=xterm
export TERM
            

Modern Bourne shell derivatives such as bash (Bourne Again Shell) can do this in one command:

export TERM=xterm
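Exporting is what makes a variable visible to child processes. In the sketch below, a second shell started with sh -c plays the role of a child process such as vi. It sees the exported TERM but not the unexported shell variable greeting (a made-up name).

```shell
#!/bin/sh -e

greeting="hello"        # shell variable only
export TERM=xterm       # shell variable, also exported to the environment

# The child process inherits the environment, but not plain shell variables
sh -c 'printf "child sees TERM=%s greeting=%s\n" "$TERM" "$greeting"'
```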
            

Note

Exporting a shell variable permanently tags it as exported. Any future changes to the variable's value will automatically be copied to the environment. This type of linkage between two objects is very rare in programming languages: Usually, modifying one object has no effect on any other.
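The linkage described in this note can be checked directly: after a single export, an ordinary reassignment is enough for later child processes to see the new value.

```shell
#!/bin/sh -e

TERM=xterm
export TERM                     # TERM is now tagged for export

TERM=vt100                      # plain assignment, no second export
sh -c 'printf "child sees TERM=%s\n" "$TERM"'
```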

C shell derivatives use the setenv command to set environment variables:

setenv TERM xterm
            

Caution

Note that unlike the 'set' command, setenv requires white space, not an '=', between the variable name and the value.

Note

Since the C shell allows us to create environment variables separate from shell variables, C shell variables traditionally use all lower-case letters, while environment variables use all upper-case. This makes it easier to read scripts that access both shell and environment variables.

C shell variables are not linked to environment variables, except for some special variables, like path, which is automatically exported to the environment variable PATH each time it is updated.

Variable References

To reference a shell variable or an environment variable in a shell script, we must precede its name with a '$'. The '$' tells the shell that the following text is to be interpreted as a variable name rather than a string constant. The variable reference is then expanded, i.e. replaced by the value of the variable. This occurs anywhere in a command except inside a string bounded by single quotes or following an escape character (\), as explained in the section called “String Constants and Terminal Output”. These rules are basically the same for all Unix shells.

#!/bin/sh -e

name="Joe Sixpack"
printf "Hello, name!\n"     # Not a variable reference
printf "Hello, $name!\n"    # References variable "name"
printf 'Hello, $name!\n'    # Not a variable reference
printf "Hello, \$name!\n"   # Not a variable reference
            

Output:

Hello, name!
Hello, Joe Sixpack!
Hello, $name!
Hello, $name!
            

Practice Break

Type in and run the following scripts:

#!/bin/sh -e

first_name="Bob"
last_name="Newhart"
printf "%s %s is a superhero.\n" $first_name $last_name
                

CSH version:

#!/bin/csh -ef

set first_name = "Bob"
set last_name = "Newhart"
printf "%s %s is a superhero.\n" $first_name $last_name
                

Note

If both a shell variable and an environment variable with the same name exist, a normal variable reference will expand the shell variable.

In Bourne shell derivatives, a shell variable and environment variable of the same name always have the same value, since exporting is the only way to set an environment variable. Hence, it doesn't really matter which one we reference.

In C shell derivatives, a shell variable and environment variable of the same name can have different values. If you want to reference the environment variable rather than the shell variable, you can use the printenv command:

Darwin heron bacon ~ 319: set name=Sue
Darwin heron bacon ~ 320: setenv name Bob
Darwin heron bacon ~ 321: echo $name
Sue
Darwin heron bacon ~ 322: printenv name
Bob
            

There are some special C shell variables that are automatically linked to environment counterparts. For example, the shell variable path is always the same as the environment variable PATH. The C shell man page is the ultimate source for a list of these variables.

If a variable reference is immediately followed by a character that could be part of a variable name, we could have a problem:

#!/bin/sh -e

name="Joe Sixpack"
printf "Hello to all the $names of the world!\n"
            

Instead of printing "Hello to all the Joe Sixpacks of the world", the printf will not produce the intended output, because the shell sees a reference to a variable called "names", which does not exist. In Bourne Shell derivatives, non-existent variables are treated as empty strings, so this script will print "Hello to all the  of the world!" (with two spaces where the name should be). C shell will print an error message stating that the variable "names" does not exist.

We can correct this by delimiting the variable name in curly braces:

#!/bin/sh -e

name="Joe Sixpack"
printf "Hello to all the ${name}s of the world!\n"
            

This syntax works for all shells. Some shell programmers might insist that all variable references should use {}. My philosophy is that if something is not necessary or at least helpful, then typing it is a waste of time and added clutter.
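Because a reference such as $names silently expands to an empty string in Bourne family shells, mistakes like the one above can be hard to spot. One optional safeguard is the -u shell option (set -u), which makes a reference to an unset variable an error. In the sketch below, the parentheses run the checked command in a subshell so that the error does not end the rest of the script.

```shell
#!/bin/sh -e

name="Joe Sixpack"

# By default, the reference to the unset variable "names" expands to ""
printf "Hello to all the $names of the world!\n"

# With -u, the same reference is an error.  Running it in a subshell ( )
# confines the failure so this script can report it and continue.
if ( set -u; printf "Hello to all the $names of the world!\n" ) 2>/dev/null; then
    printf "No error detected\n"
else
    printf "set -u caught the reference to the unset variable names\n"
fi
```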

Using Variables for Code Quality

Another very good use for shell variables is in eliminating redundant string constants from a script. Suppose we have a path name referenced multiple times in a script:

#!/bin/sh -e

output_value=`myprog`

printf "$output_value\n" >> Run2/Output/results.txt
more Run2/Output/results.txt
cp Run2/Output/results.txt latest-results.txt
            

If for any reason the relative path Run2/Output/results.txt should change, then you'll have to search through the script and make sure that all instances are updated. This is a tedious and error-prone process, which can be avoided by using a variable:

#!/bin/sh -e

output_file="Run2/Output/results.txt"

output_value=`myprog`
printf "$output_value\n" >> $output_file
more $output_file
cp $output_file latest-results.txt
            

In the second version of the script, if the path name of results.txt changes, then only one change must be made to the script. Avoiding redundancy is one of the primary goals of any good programmer.

In a more general programming language such as C or Fortran, this role would be served by a constant, not a variable. However, shells do not support constants, so we use a variable for this.

In most shells, a variable can be marked read-only in an assignment to prevent accidental subsequent changes. Bourne family shells use the readonly command for this, while T shell uses set -r (older C shells may not support it).

#!/bin/sh -e

readonly output_file="Run2/Output/results.txt"

output_value=`myprog`
printf "$output_value\n" >> $output_file
more $output_file
cp $output_file latest-results.txt
            
CSH version:

#!/bin/csh -ef

set -r output_file = "Run2/Output/results.txt"

set output_value=`myprog`
printf "$output_value\n" >> $output_file
more $output_file
cp $output_file latest-results.txt
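The protection that readonly provides can be demonstrated by trying to reassign the variable. Since assigning to a read-only variable is an error that may terminate a Bourne shell script, the sketch below makes the attempt in a subshell and reports the result. The replacement file name is made up.

```shell
#!/bin/sh -e

readonly output_file="Run2/Output/results.txt"

# Try to change the read-only variable inside a subshell ( ) so that the
# error cannot end this script.
if ( output_file="elsewhere.txt" ) 2>/dev/null; then
    printf "output_file was changed\n"
else
    printf "output_file is still %s\n" "$output_file"
fi
```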
            
Output Capture

Output from a command can be captured and used as a string in another command or in an assignment by enclosing the command in back-quotes (``). In Bourne-compatible shells, we can also use $() in place of back-quotes.

#!/bin/sh -e

# Using output capture in a command.  The double quotes keep the
# multi-word output of date together as a single argument to printf.
printf "Today is %s.\n" "`date`"
printf "Today is %s.\n" "$(date)"

# Using a variable.  If using the output more than once, this will
# avoid running the command multiple times.
today=`date`
printf "Today is %s.\n" "$today"
            
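Two details of output capture are worth seeing in action. First, $() nests without the extra backslashes that nested back-quotes require, which is one reason many script writers prefer it in Bourne family shells. Second, the shell removes trailing newlines from captured output. The variable names below are arbitrary.

```shell
#!/bin/sh -e

# Nested capture: the inner $(pwd) runs first, and its output becomes
# the argument of basename.  With back-quotes the inner pair would have
# to be escaped: `basename \`pwd\``
dir_name=$(basename "$(pwd)")
printf "Current directory name: %s\n" "$dir_name"

# Trailing newlines are removed from captured output
captured=$(printf "one\ntwo\n\n\n")
printf "[%s]\n" "$captured"
```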
Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions”, before doing the practice problems below.
  1. What is the convention for naming C shell variables to help distinguish them from environment variables?

  2. Can we use any name we want for a shell variable in our script?

  3. What rules do shell and environment variable names need to follow?

  4. Show how to assign the value "Alfred E. Neumann" to the shell variable full_name in Bourne shell and in C shell.

  5. How do we declare a shell variable? Explain.

  6. What is the relationship between a given shell variable and an environment variable with the same name? Explain.

  7. Are all C shell variables independent of environment variables? Use an example to clarify.

  8. Show the output of the following script. (Try to figure it out first, and then check by typing in and running the script.)

    #!/bin/sh -e
    
    name="Wile E. Coyote"
    
    printf "$name\n"
    printf "name\n"
    printf '$name\n'
            
  9. What is the output of the following script? What do you think is the intended output and how can we make it happen?

    #!/bin/sh -e
    
    file_size=200
    
    printf "File size is $file_sizeMB.\n"
            
  10. What is the danger in the following script? Alter the script to eliminate the risk.

    #!/bin/sh -e
    
    printf "The first 20 lines of file.txt are:\n"
    head -n 20 file.txt
    printf "The last 20 lines of file.txt are:\n"
    tail -n 20 file.txt
            
  11. Write a shell script that prints the following, using a single printf command and using wc -l to find the number of lines in the file. Exact white space is not important.

    input1.txt contains 3258 lines.