1.10. The Unix File system

1.10.1. Unix Files

A Unix file is simply a sequence of bytes (8-bit values) stored on a disk and given a unique name. The bytes in a file may be printable characters such as letters, digits, and punctuation symbols; invisible control characters (which cause a printer or terminal to perform actions such as backspacing or scrolling); or other non-character data.

Text vs Binary Files

Files are often classified as either text or binary files. All of the bytes in a text file are interpreted as characters by the programs that read or write the file, while binary files may contain both character and non-character data.

Note that the Unix standard makes no distinction between one file and another based on their contents. Only individual programs care what is inside a file.
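
As a rough illustration of the idea that a file is nothing more than a sequence of bytes, the sketch below writes four bytes to a scratch file and prints their numeric values with the standard od command. The scratch file created by mktemp and the variable names are assumptions made only for this example.

```shell
# Create a scratch file holding the characters H, i, ! and a line feed.
tmpfile=$(mktemp)
printf 'Hi!\n' > "$tmpfile"

# od -An -tu1 prints each byte as an unsigned decimal number, with no
# address column. Leaving the inner $(...) unquoted lets the shell
# collapse od's padding into single spaces.
bytes=$(echo $(od -An -tu1 "$tmpfile"))
echo "$bytes"

rm -f "$tmpfile"
```

This prints 72 105 33 10: three printable characters followed by character 10, the line feed.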

Practice Break

Try the following commands:

shell-prompt: cat .profile
		    

What do you see? The .profile file is a text file, and cat is used here to echo it to the screen.

Now try the following:

shell-prompt: cat /bin/ls
		    

What do you see? The file /bin/ls is not a text file. It contains binary program code, not characters. The cat command assumes that the file is a text file and sends each byte to the terminal as if it were a character. Binary files show up as a lot of garbage, and may even knock your terminal out of whack. If this happens, run the reset command to restore your terminal to its original state. In the rare case that reset does not fix the terminal, you can try running an editor such as vi, which will attempt to reset the terminal when starting or exiting, or simply log out and log back in using a fresh terminal window.
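
A gentler way to peek inside a binary file is to dump its bytes with od instead of sending them straight to the terminal. The sketch below is one possible approach: it looks up the location of ls with command -v rather than assuming /bin/ls, and shows only the first few lines of the dump.

```shell
# Locate the ls program; on many systems this is /bin/ls.
ls_path=$(command -v ls)

# od -c shows printable bytes as characters and everything else as
# escapes or octal codes, so no raw control bytes reach the terminal.
dump=$(od -c "$ls_path" | head -n 3)
echo "$dump"
```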

Unix vs. Windows Text Files

While it is the programs that interpret the contents of a file, there are some conventions regarding text file format that all Unix programs follow, so that they can all manipulate the same files. Unfortunately, DOS and Windows programs follow different conventions. Unix programs assume that text files terminate each line with a control character known as a line feed (also known as a newline), which is character 10 in the standard character sets. DOS and Windows programs tend to use both a carriage return (character 13) and a line feed.

To compound the problem, many Unix editors and tools also run on Windows (under Cygwin, for example). As a result, text files may end up with a mixture of line-terminations after being edited on both Windows and Unix.

Some programs are smart enough to properly handle either line termination convention. However, many others will misbehave if they encounter the "wrong" type of line termination.
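
To make the difference concrete, the following sketch writes the same two lines of text using each convention and compares the results. The file names unix.txt and dos.txt are hypothetical, and mktemp -d is assumed to be available, as it is on most current systems.

```shell
tmpdir=$(mktemp -d)

# The same two lines of text, written with each line-ending convention.
printf 'one\ntwo\n'     > "$tmpdir/unix.txt"   # line feed only
printf 'one\r\ntwo\r\n' > "$tmpdir/dos.txt"    # carriage return + line feed

# The DOS/Windows version carries one extra byte per line.
# $(( ... )) strips the padding that some wc implementations add.
unix_size=$(( $(wc -c < "$tmpdir/unix.txt") ))
dos_size=$(( $(wc -c < "$tmpdir/dos.txt") ))
echo "unix.txt: $unix_size bytes, dos.txt: $dos_size bytes"

# od -c makes the carriage returns visible as \r.
od -c "$tmpdir/dos.txt"

rm -r "$tmpdir"
```

The Unix file is 8 bytes long and the DOS/Windows file is 10, one extra carriage return for each of the two lines.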

The dos2unix and unix2dos commands can be used to clean up files that have been transferred between Unix and DOS/Windows. These programs convert text files between the DOS/Windows and Unix standards. If you've edited a text file on a non-Unix system, and are now using it on a Unix system, you can clean it up by running:

shell-prompt: dos2unix filename
		

The dos2unix and unix2dos commands are not standard with most Unix systems, but they are free programs that can easily be added.
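
Where dos2unix is not installed, the standard tr command can serve as a stand-in for the DOS-to-Unix direction. This is only a sketch and not a full replacement for dos2unix: it deletes every carriage return in the file, not just those that precede a line feed. The file names are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmpdir/dos.txt"

# Delete all carriage returns (character 13), writing to a new file.
# Do not redirect the output onto the input file itself: the shell
# would truncate it before tr had a chance to read it.
tr -d '\r' < "$tmpdir/dos.txt" > "$tmpdir/unix.txt"

# Confirm that only line feeds remain.
converted_size=$(( $(wc -c < "$tmpdir/unix.txt") ))
echo "$converted_size bytes after conversion"
od -c "$tmpdir/unix.txt"

rm -r "$tmpdir"
```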

Caution

Note that dos2unix and unix2dos should only be used on text files. They should never be used on binary files, since the contents of a binary file are not meant to be interpreted as characters such as line feeds and carriage returns.

1.10.2. File system Organization

Basic Concepts

A Unix file system contains files and directories. A file is like a document, and a directory is like a folder that contains documents and/or other directories. The terms "directory" and "folder" are interchangeable, but "directory" is the standard term used in Unix.

Note

Unix file systems use case-sensitive file and directory names. I.e., Temp is not the same as temp, and both can coexist in the same directory.

Mac OS X is the only major Unix system that violates this convention. The standard OS X file system (called HFS+) is case-preserving, but not case-sensitive. This means that if you call a file Temp, it will remember that the T is capital, but the file can also be referred to as temp, tEmp, etc. Only one of these names can exist in a given directory at any one time.
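
If you are unsure how a particular file system treats case, a quick experiment along the following lines will tell you. This is a sketch that assumes the scratch directory created by mktemp -d lives on the file system you care about.

```shell
tmpdir=$(mktemp -d)

# Try to create two files whose names differ only in case.
touch "$tmpdir/Temp" "$tmpdir/temp"

# Count the directory entries that resulted.
count=$(( $(ls "$tmpdir" | wc -l) ))
echo "$count"

rm -r "$tmpdir"
```

On a case-sensitive file system this prints 2. On a file system that is case-preserving but not case-sensitive, both names refer to the same file and it prints 1.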

A Unix file system can be visualized as a tree, with each file and directory contained within another directory.

Figure 1.1, “Sample of a Unix File system” shows a small portion of a typical Unix file system. On a real Unix system, there are usually thousands of files and directories. Directories are shown in green and files are in yellow.

Figure 1.1. Sample of a Unix File system


The one directory that is not contained within any other is known as the root directory, whose name under Unix is /. There is exactly one root directory on every Unix system. Windows systems, on the other hand, have a root directory for each disk partition such as C: and D:.

The Cygwin compatibility layer works around the separate drive letters of Windows by unifying them under a common parent directory called /cygdrive. Hence, for Unix commands run under Cygwin, /cygdrive/c is equivalent to c:, /cygdrive/d is equivalent to d:, and so on. This allows Cygwin users to traverse multiple Windows drive letters with a single command starting in /cygdrive.

Unix file system trees are fairly standardized, but most have some variation. For instance, all Unix systems have a /bin and a /usr/bin, but not all of them have /home or /usr/local.

The root directory is the parent of /bin and /home and an ancestor of all other files and directories.

The /bin and /home directories are subdirectories, or children of /. Likewise, /home/joe and /home/sue are subdirectories of /home, and grandchildren of /.

All of the files in and under /home comprise a subtree of /home.

The children of a directory, all of their children, and so on, are known as descendants of the directory. All files and directories on a Unix system, except /, are descendants of /.
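
The find command offers a convenient way to see a directory together with all of its descendants. The sketch below builds a tiny tree in a scratch directory that stands in for /; the directory and file names are made up for the example.

```shell
root=$(mktemp -d)    # stands in for / in this example

# Build a tiny tree: bin, home/joe (holding one file), and home/sue.
mkdir -p "$root/bin" "$root/home/joe" "$root/home/sue"
touch "$root/home/joe/notes.txt"

# find prints the starting directory followed by all of its descendants,
# that is, the whole subtree rooted at home.
subtree=$(cd "$root" && find home | LC_ALL=C sort)
echo "$subtree"

rm -r "$root"
```

The output lists home, home/joe, home/joe/notes.txt, and home/sue. The bin directory does not appear because it is a sibling of home, not a descendant.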

Each user has a home directory, which can be arbitrarily assigned, but is generally a child of /home on many Unix systems. The home directory can be referred to as ~ or ~username in modern Unix shells.

In the example above, /home/joe is the home directory for user joe, and /home/sue is the home directory for user sue. This is the conventional location for home directories on BSD and Linux systems, which are two specific types of Unix. On a Mac OS X system, which is another brand of Unix, Joe's home directory would be /Users/joe instead of /home/joe. Most of the files owned by ordinary users are either in their home directory or one of its descendants.

Absolute Path Names

The absolute path name, also known as full path name, of a file or directory denotes the complete path from / (the root directory) to the file or directory of interest. For example, the absolute path name of Sue's .cshrc file is /home/sue/.cshrc, and the absolute path name of the ape command is /usr/local/bin/ape.

Note

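An absolute path name always begins with a '/' or a '~'. A leading '~' is shorthand that the shell expands to the absolute path name of a home directory before running the command.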

Practice Break

Try the following commands:

shell-prompt: ls
shell-prompt: ls /etc
shell-prompt: cat /etc/motd
		    

Current Working Directory

Every Unix process has an attribute called the current working directory, or CWD. This is the directory that the process is currently "in". When you first log into a Unix system, the shell process's CWD is set to your home directory.

The pwd command prints the CWD of the shell process. The cd command changes the CWD of the shell process. Running cd with no arguments sets the CWD to your home directory.

Practice Break

Try the following commands:

shell-prompt: pwd
shell-prompt: cd /
shell-prompt: pwd
shell-prompt: cd
shell-prompt: pwd
		    

Some commands, such as ls, use the CWD as a default if you don't provide a directory name on the command line. For example, if the CWD is /home/joe, then the following commands are the same:

shell-prompt: ls
shell-prompt: ls /home/joe
		

Relative Path Names

Whereas an absolute path name denotes the path from / to a file or directory, the relative path name denotes the path from the CWD to a file or directory.

Any path name that does not begin with a '/' or '~' is interpreted as a relative path name. The absolute path name is then derived by appending the relative path name to the CWD. For example, if the CWD is /etc, then the relative path name motd refers to the absolute path name /etc/motd.

absolute path name = CWD + "/" + relative path name
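
The rule above can be sketched as a small shell function. The name to_absolute is hypothetical, and the function handles only a leading '/', because the shell has already expanded any leading '~' before the function sees its argument.

```shell
# to_absolute PATH: print PATH as an absolute path name.
to_absolute() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;                 # already absolute
        *)  printf '%s/%s\n' "${PWD%/}" "$1" ;;   # CWD + "/" + relative
    esac
}

cd /etc
to_absolute motd          # prints /etc/motd
to_absolute /bin/ls       # prints /bin/ls
```

The ${PWD%/} expansion removes a trailing slash, which matters only when the CWD is / itself, so that motd becomes /motd rather than //motd.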

Note

Relative path names are handled at the lowest level of the operating system, by the Unix kernel. This means that they can be used anywhere: in shell commands, in C or Fortran programs, etc.

When you run a program from the shell, the new process inherits the CWD from the shell. Hence, you can use relative path names as arguments in any Unix command, and they will use the CWD inherited from the shell. For example, the two cat commands below have the same effect.

shell-prompt: cd /etc          # Set shell's CWD to /etc
shell-prompt: cat motd         # Inherits CWD from shell
shell-prompt: cat /etc/motd
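
The inheritance works in one direction only. The following sketch, which could be saved in a script and run with sh, shows a child process picking up a CWD from its parent while the parent's own CWD stays unchanged. The variable names are assumptions for the example.

```shell
start_dir=$PWD

# A child process starts out with the CWD of the process that launched it.
child_cwd=$(cd / && sh -c 'pwd')
echo "child ran in: $child_cwd"

# The cd above happened inside $(...), which runs in a subshell (a
# separate process), so the CWD of this shell did not change.
echo "shell is still in: $PWD"
```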
		

Wasting Time

The cd command is one of the most overused Unix commands. Many people use it where it is completely unnecessary and actually results in significantly more typing than needed. Don't use cd where you could have used the directory with another command. For example, consider the following sequence of commands:

shell-prompt: cd /etc
shell-prompt: more hosts
shell-prompt: cd
		    

The same effect could have been achieved much more easily using the following single command:

shell-prompt: more /etc/hosts
		    

Note

In almost all cases, absolute path names and relative path names are interchangeable. You can use either type of path name as a command line argument, or within a program.

Practice Break

Try the following commands:

shell-prompt: cd
shell-prompt: pwd
shell-prompt: cd /etc
shell-prompt: pwd
shell-prompt: cat motd
shell-prompt: cat /etc/motd
shell-prompt: cd
shell-prompt: pwd
shell-prompt: cat motd
		    

Why does the last command result in an error?

Avoid Absolute Path Names

The relative path name is potentially much shorter than the absolute path name. Using relative path names also provides more flexibility.

Suppose you have a project contained in the directory /Users/joe/Thesis on your Mac workstation.

Now suppose you want to work on the same project on a cluster, where there is no /Users directory and you have to store it in /share1/joe/Thesis.

The absolute path name of every file and directory will be different on the cluster than it is on your Mac. This can cause major problems if you were using absolute path names in your scripts, programs, and makefiles. Statements like the following will have to be changed in order to run the program on a different computer.

infile = fopen("/Users/joe/Thesis/Inputs/input1.txt", "r");
		
sort /Users/joe/Thesis/Inputs/names.txt
		

No program should ever have to be altered just to run it on a different computer.

While the absolute path names change when you move the Thesis directory, the path names relative to the Thesis directory remain the same. For this reason, absolute path names should be avoided unless absolutely necessary.

The statements below will work on any computer as long as the program or script is running with Thesis as the current working directory. It does not matter where the Thesis directory is located, so long as the Inputs directory is its child.

infile = fopen("Inputs/input1.txt", "r");
		
sort Inputs/names.txt
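
To see this in action, the sketch below creates two copies of a miniature Thesis project under different parent directories, standing in for the Mac workstation and the cluster, and runs the identical relative command in each. Everything is built inside a scratch directory, and the contents of names.txt are invented for the example.

```shell
tmpdir=$(mktemp -d)

# Two copies of the same project under different parent directories.
for parent in "$tmpdir/Users/joe" "$tmpdir/share1/joe"; do
    mkdir -p "$parent/Thesis/Inputs"
    printf 'sue\njoe\n' > "$parent/Thesis/Inputs/names.txt"
done

# The identical relative command works in both places, as long as the
# CWD is the Thesis directory. Each $(...) runs in a subshell, so the
# cd inside it does not affect this shell.
mac_out=$(cd "$tmpdir/Users/joe/Thesis" && sort Inputs/names.txt)
cluster_out=$(cd "$tmpdir/share1/joe/Thesis" && sort Inputs/names.txt)
echo "$mac_out"

rm -r "$tmpdir"
```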
		

Special Directory Names

In addition to absolute path names and relative path names, there are a few special symbols for directories that are commonly referenced:

Table 1.5. Special Directory Symbols

Symbol    Refers to
.         The current working directory
..        The parent of the current working directory
~         Your home directory
~user     user's home directory
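
The symbols in the table can be checked with a short experiment such as the one below. The project/src directories are hypothetical and are created inside a scratch directory.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/project/src"
cd "$tmpdir/project/src"

here=$(cd . && pwd)        # . is the CWD itself
parent=$(cd .. && pwd)     # .. is the parent of the CWD
echo "$here"
echo "$parent"

# ~ is expanded by the shell to your home directory.
echo ~

cd /
rm -r "$tmpdir"
```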

Practice Break

Try the following commands and see what they do:

shell-prompt: cd
shell-prompt: pwd
shell-prompt: ls
shell-prompt: ls .
shell-prompt: cp /etc/motd .
shell-prompt: cat motd
shell-prompt: cat ./motd
shell-prompt: ls ~
shell-prompt: ls /bin
shell-prompt: ls ..
shell-prompt: cd ..
shell-prompt: ls
shell-prompt: ls ~
shell-prompt: cd
		    

1.10.3. Ownership and Permissions

Overview

Every file and directory on a Unix system has inherent access control features based on a simple scheme:

  • Every file and directory has an individual owner and group owner.

  • There are 3 types of permissions which are controlled separately from each other:

    • Read
    • Write (create or modify)
    • Execute (e.g. run a file if it's a program)
  • Read, write, and execute permissions can be granted or denied separately for each of the following:
    • The individual owner
    • The group owner
    • All users on the system (a hypothetical group known as "world")

Execute permissions on a file mean that the file can be executed as a script or a program by typing its name. It does not mean that the file actually contains a script or a program: It is up to the owner of the file to set the execute permissions appropriately for each file.

Execute permission on a directory (also called search permission) means that users in that category can cd into it and access the files within it by name. Read permission on a directory only allows users to list the names it contains. Users need execute permission on a directory both to open a file inside it and to make it the current working directory of their processes.

Unix systems implement this scheme using 9 on/off switches (bits) associated with each file: 3 permissions for each of the 3 user categories.

Viewing Permissions

If you do a long listing of a file or directory, you will see the ownership and permissions:

shell-prompt: ls -l
drwx------   2 joe    users      512 Aug  7 07:52 Desktop/
drwxr-x---  39 joe    users     1536 Aug  9 22:21 Documents/
drwxr-xr-x   2 joe    users      512 Aug  9 22:25 Downloads/
-rw-r--r--   1 joe    users    82118 Aug  2 09:47 bootcamp.pdf
		

The leftmost column shows the type of object and the permissions for each user category.

A '-' in the leftmost character means a regular file, 'd' means a directory, 'l' means a symbolic link, etc. Running man ls will reveal all the codes.

The next three characters are, in order, read, write and execute permissions for the owner.

The next three after that are permissions for members of the owning group.

The next three are permissions for world.

A '-' in a permission bit column means that the permission is denied for that user or set of users and an 'r', 'w', or 'x' means that it is enabled.

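The second column is the number of links to the file or directory. The next two columns show the individual and group ownership of the file or directory. The other columns show the size, the date and time it was last modified, and the name. In addition to the 'd' in the first column, directory names are followed by a '/' in this listing. The ls command adds the '/' only when run with the -F flag, which some systems enable by default through a shell alias.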

You can see above that Joe's Desktop directory is readable, writable, and executable for Joe, and completely inaccessible to everyone else.

Joe's Documents directory is readable, writable and executable for Joe, and readable and executable for members of the group "users". Users not in the group "users" cannot access the Documents directory at all.

Joe's Downloads directory is readable and executable to anyone who can log into the system.

The file bootcamp.pdf is readable by the world, but only writable by Joe. It is not executable by anyone, which makes sense because a PDF file is not a program.
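
The sketch below recreates two of the entries from the listing above in a scratch directory and extracts just the type and permission characters. It uses the chmod command, which is described later in this section, to put the permissions in a known state; the variable names are assumptions for the example.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/bootcamp.pdf"
mkdir "$tmpdir/Documents"
chmod 644 "$tmpdir/bootcamp.pdf"
chmod 750 "$tmpdir/Documents"

# The first 10 characters of a long listing are the type character
# followed by the 9 permission characters.
file_mode=$(ls -l "$tmpdir/bootcamp.pdf" | cut -c1-10)
dir_mode=$(ls -ld "$tmpdir/Documents" | cut -c1-10)   # -d lists the directory itself
echo "$file_mode"   # -rw-r--r--
echo "$dir_mode"    # drwxr-x---

rm -r "$tmpdir"
```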

Setting Permissions

Users cannot change individual ownership on a file, since this would allow them to subvert disk quotas by placing their files under someone else's name. Only the superuser can change the individual ownership of a file or directory.

Users can change the group ownership of a file to any group that they belong to using the chgrp command:

shell-prompt: chgrp group path [path ...]
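
As a minimal sketch of chgrp in use, the example below sets the group of a scratch file to your own primary group, which is always a group you belong to. It relies on numeric group IDs so that it does not depend on any particular group name; results.txt is a hypothetical file name.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/results.txt"

# id -G lists the numeric IDs of every group you belong to;
# id -g prints just your primary group.
id -G
my_gid=$(id -g)

# chgrp accepts either a group name or a numeric group ID.
chgrp "$my_gid" "$tmpdir/results.txt"

# ls -n is like ls -l but shows numeric IDs; field 4 is the group.
file_gid=$(ls -n "$tmpdir/results.txt" | awk '{print $4}')
echo "$file_gid"

rm -r "$tmpdir"
```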
		

All sharing of files on Unix systems is done by controlling group ownership and file permissions.

File permissions are changed using the chmod command:

shell-prompt: chmod permission-specification path [path ...]
		

The permission specification has a symbolic form, and a raw form, which is an octal number.

The basic symbolic form consists of any of the three user categories 'u' (user/owner), 'g' (group), and 'o' (other/world) followed by a '+' (enable) or '-' (disable), and finally one of the three permissions 'r', 'w', or 'x'.

Add read and execute permissions for group and world on the Documents directory:

shell-prompt: chmod go+rx Documents
		

Disable all permissions for world on the Documents directory and enable read for group:

shell-prompt: chmod o-rwx,g+r Documents
		

Disable write permission for everyone, including the owner, on bootcamp.pdf. (This can be used to reduce the chances of accidentally deleting an important file, since rm asks for confirmation before removing a write-protected file.)

shell-prompt: chmod ugo-w bootcamp.pdf
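
The effect of each of the three commands above can be verified with ls, as in the sketch below. It works on copies in a scratch directory and first puts them in a known state using the '=' operator, a part of the symbolic form not covered above that sets permissions exactly instead of adding or removing them.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/Documents"
touch "$tmpdir/bootcamp.pdf"

# Start from known settings so the results do not depend on your umask.
chmod u=rwx,go= "$tmpdir/Documents"       # rwx------
chmod u=rw,go=r "$tmpdir/bootcamp.pdf"    # rw-r--r--

chmod go+rx "$tmpdir/Documents"
after_add=$(ls -ld "$tmpdir/Documents" | cut -c1-10)       # drwxr-xr-x

chmod o-rwx,g+r "$tmpdir/Documents"
after_remove=$(ls -ld "$tmpdir/Documents" | cut -c1-10)    # drwxr-x---

chmod ugo-w "$tmpdir/bootcamp.pdf"
read_only=$(ls -l "$tmpdir/bootcamp.pdf" | cut -c1-10)     # -r--r--r--

echo "$after_add $after_remove $read_only"
rm -rf "$tmpdir"
```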
		

Run man chmod for additional information.

The raw form for permissions uses a 3-digit octal number to represent the 9 permission bits. This is a quick and convenient method for computer nerds who can do octal/binary conversions in their head.

shell-prompt: chmod 644 bootcamp.pdf   # 644 = 110100100 = rw-r--r--
shell-prompt: chmod 750 Documents      # 750 = 111101000 = rwxr-x---
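
If you would rather not do the conversion in your head, each octal digit can be translated to its three permission characters separately. The function below is a sketch of that idea; the name octal_to_symbolic is hypothetical and is not a standard Unix command.

```shell
# octal_to_symbolic MODE: print the rwx string for a 3-digit octal mode.
octal_to_symbolic() {
    result=""
    # Split the argument into separate digits, e.g. 644 -> 6 4 4.
    for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
        case "$digit" in
            0) bits="---" ;;
            1) bits="--x" ;;
            2) bits="-w-" ;;
            3) bits="-wx" ;;
            4) bits="r--" ;;
            5) bits="r-x" ;;
            6) bits="rw-" ;;
            7) bits="rwx" ;;
            *) echo "not an octal digit: $digit" >&2; return 1 ;;
        esac
        result="$result$bits"
    done
    printf '%s\n' "$result"
}

octal_to_symbolic 644   # prints rw-r--r--
octal_to_symbolic 750   # prints rwxr-x---
```

Each digit is the sum of 4 (read), 2 (write), and 1 (execute), so 6 = 4 + 2 means read and write, and 5 = 4 + 1 means read and execute.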
		

Caution

NEVER make any file or directory world-writable. Doing so allows any other user to modify it, which is a serious security risk. A malicious user could use this to install a Trojan Horse program under your name, for example.
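
One way to check for files that have accidentally been left world-writable is the -perm test of the find command. The sketch below demonstrates it on two scratch files with hypothetical names, one of which is deliberately made world-writable for the demonstration.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/safe.txt" "$tmpdir/risky.txt"
chmod 644 "$tmpdir/safe.txt"
chmod 666 "$tmpdir/risky.txt"    # world-writable, for demonstration only

# -perm -002 matches files that have at least the world-write bit set.
found=$(cd "$tmpdir" && find . -type f -perm -002)
echo "$found"                    # ./risky.txt

# Repair anything that turns up by removing world-write permission.
chmod o-w "$tmpdir/risky.txt"
remaining=$(cd "$tmpdir" && find . -type f -perm -002)

rm -r "$tmpdir"
```

To audit your own files, a command such as find ~ -type f -perm -002 could be run from any directory.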

Practice Break

Try the following commands, and try to predict the output of each ls before you run it.

shell-prompt: touch testfile
shell-prompt: ls -l
shell-prompt: chmod go-rwx testfile
shell-prompt: ls -l
shell-prompt: chmod o+rw testfile
shell-prompt: ls -l
shell-prompt: chmod g+rwx testfile
shell-prompt: ls -l
shell-prompt: rm testfile
		    

Now create testfile again with touch and set permissions on it so that it is readable, writable, and executable by you, only readable by the group, and inaccessible to everyone else. Remove it with rm when you are done.

1.10.4. Self-test

  1. What is a Unix file?
  2. Explain the difference between a text file and a binary file.
  3. Do Unix operating systems distinguish between text files and binary files? Explain.
  4. Does the Unix standard include a convention on the format of text files?
  5. Are Unix text files the same as Windows (and DOS) text files? Explain.
  6. How can text files be converted between Windows and Unix conventional formats?
  7. What is a directory? Does it go by any other names?
  8. How are files and directories organized in a Unix file system?
  9. What are some of the conventional directories in the Unix file system organization?
  10. What is the root directory?
  11. What is a parent directory?
  12. What is a sibling directory?
  13. What is a child directory?
  14. What is a subdirectory?
  15. What is a descendant directory?
  16. What is an ancestor directory?
  17. What is a home directory?
  18. What is a subtree?
  19. What is a full or absolute path name?
  20. What is the current working directory? What is a current working directory a property of?
  21. What is a relative path name? How can you convert a relative path name to an absolute path name?
  22. Why should absolute path names be avoided?
  23. How can you determine the current working directory of your shell process?
  24. How can you change the current working directory of your shell process to each of the following?
    1. Your home directory.
    2. /etc
    3. The directory Program1, which is a subdirectory of Programs, which is a subdirectory of the current working directory.
  25. Show the simplest Unix command to view each of the following files:
    1. /etc/hosts
    2. A file called .cshrc in the current working directory.
    3. A file called .cshrc in your home directory, regardless of what the current working directory is.
    4. A file called .cshrc in the home directory of a user with user name "bacon".
    5. A file called readme.txt in the parent directory of the current working directory.
  26. How can you change the group ownership of the file .cshrc in your home directory to the group "smithlab"?
  27. How can you change the individual ownership of the file .cshrc in your home directory to your friend with user name Bob? Explain.
  28. How can you change the permissions on the file .cshrc in your home directory so that only you can modify it, members of the group can read and execute it but not modify it, and anyone else can read it but not modify or execute it?
  29. How can you see the ownership and permissions on all the files in /etc? In the current working directory?