A Unix file is simply a sequence of bytes (8-bit values) stored on a disk and given a unique name. The bytes in a file may be printable characters (such as letters, digits, and punctuation symbols), invisible control characters (which cause a printer or terminal to perform actions such as backspacing or scrolling), or other non-character data.
Files are often classified as either text or binary files. All of the bytes in a text file are interpreted as characters by the programs that read or write the file, while binary files may contain both character and non-character data.
Note that the Unix standard makes no distinction between one file and another based on their contents. Only individual programs care what is inside a file.
Try the following commands:
shell-prompt: cat .profile
What do you see? The .profile file is a text file, and cat is used here to echo it to the screen.
Now try the following:
shell-prompt: cat /bin/ls
What do you see? The file /bin/ls is not a text file. It contains binary program code, not characters.
The cat command assumes that the file is a text file and displays each character on the terminal. Binary files show up as a lot of garbage, and may even knock your terminal out of whack. If this happens, run the reset command to restore your terminal to its original state.
In the rare case that reset does not fix the terminal, you can try running an editor such as vi, which will attempt to reset the terminal when starting or exiting, or simply log out and log back in using a fresh terminal window.
While it is the programs that interpret the contents of a file, there are some conventions regarding text file format that all Unix programs follow, so that they can all manipulate the same files. Unfortunately, DOS and Windows programs follow different conventions. Unix programs assume that text files terminate each line with a control character known as a line feed (also known as a newline), which is character 10 in the standard character sets. DOS and Windows programs tend to use both a carriage return (character 13) and a line feed.
To compound the problem, many Unix editors and tools also run on Windows (under Cygwin, for example). As a result, text files may end up with a mixture of line-terminations after being edited on both Windows and Unix.
Some programs are smart enough to properly handle either line termination convention. However, many others will misbehave if they encounter the "wrong" type of line termination.
The dos2unix and unix2dos commands can be used to clean up files that have been transferred between Unix and DOS/Windows. These programs convert text files between the DOS/Windows and Unix standards. If you've edited a text file on a non-Unix system, and are now using it on a Unix system, you can clean it up by running:
shell-prompt: dos2unix filename
The dos2unix and unix2dos commands are not standard with most Unix systems, but they are free programs that can easily be added.
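If dos2unix is not installed, the standard tr command can do the same job in a pinch. The following is a minimal sketch, assuming a hypothetical file named dosfile.txt that is created in a scratch directory purely for illustration. It displays the carriage return (character 13) in front of each line feed (character 10), then strips it:

```shell
# Work in a scratch directory so no real files are touched (illustrative only)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small text file with DOS/Windows line endings (CR + LF)
printf 'one\r\ntwo\r\n' > dosfile.txt

# Show the byte values in decimal: each line ends in 13 10
od -An -t u1 dosfile.txt

# Delete every carriage return to get Unix line endings (LF only)
tr -d '\r' < dosfile.txt > unixfile.txt

# Count the bytes before and after (tr removes padding that some wc versions add)
dos_size=$(wc -c < dosfile.txt | tr -d ' ')
unix_size=$(wc -c < unixfile.txt | tr -d ' ')
echo "dosfile.txt: $dos_size bytes, unixfile.txt: $unix_size bytes"
```

Note that tr -d '\r' removes every carriage return in the file, not only those at the ends of lines, which is normally what is wanted for plain text.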
A Unix file system contains files and directories. A file is like a document, and a directory is like a folder that contains documents and/or other directories. The terms "directory" and "folder" are interchangeable, but "directory" is the standard term used in Unix.
Unix file systems use case-sensitive file and directory names. That is, Temp is not the same as temp, and both can coexist in the same directory.
Mac OS X is the only major Unix system that violates this convention. The standard OS X file system (called HFS) is case-preserving, but not case-sensitive. This means that if you call a file Temp, it will remember that the T is capital, but the file can also be referred to using other capitalizations, such as temp. Only one of these names can exist in a given directory at any one time.
A Unix file system can be visualized as a tree, with each file and directory contained within another directory.
Figure 1.1, “Sample of a Unix File system” shows a small portion of a typical Unix file system. On a real Unix system, there are usually thousands of files and directories. Directories are shown in green and files are in yellow.
The one directory that is not contained within any other is known as the root directory, whose name under Unix is /. There is exactly one root directory on every Unix system. Windows systems, on the other hand, have a root directory for each disk partition such as C: and D:.
The Cygwin compatibility layer works around the separate drive letters of Windows by unifying them under a common parent directory called /cygdrive. Hence, for Unix commands run under Cygwin, /cygdrive/c is equivalent to c:, /cygdrive/d is equivalent to d:, and so on. This allows Cygwin users to traverse multiple Windows drive letters with a single command starting in /cygdrive.
Unix file system trees are fairly standardized, but most have some variation. For instance, all Unix systems have a /bin and a /usr/bin, but not all of them have /home or /usr/local.
The root directory is the parent of /bin and /home and an ancestor of all other files and directories.
The /bin and /home directories are subdirectories, or children of /. Likewise, /home/joe and /home/sue are subdirectories of /home, and grandchildren of /.
All of the files in and under /home comprise a subtree of /home.
The children of a directory, all of their children, and so on, are known as descendants of the directory. All files and directories on a Unix system, except /, are descendants of /.
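The difference between children and descendants can be seen with standard commands. The sketch below builds a small throwaway tree (the file name notes.txt and the scratch location are made up for illustration) and then lists first the children and then all descendants of its home directory:

```shell
# Build a small sample tree in a scratch directory (names are illustrative)
tree=$(mktemp -d)
mkdir -p "$tree/home/joe" "$tree/home/sue"
touch "$tree/home/joe/notes.txt"

cd "$tree" || exit 1

# Children of home: only the entries directly inside it
children=$(ls home)
echo "Children of home:"
echo "$children"

# Descendants of home: children, their children, and so on
# (! -path home leaves out the starting directory itself)
descendants=$(find home ! -path home | sort)
echo "Descendants of home:"
echo "$descendants"
```

Here joe and sue are children of home, while notes.txt is a descendant of home but not one of its children.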
Each user has a home directory, which can be arbitrarily assigned, but is generally a child of /home on many Unix systems. The home directory can be referred to as ~username in modern Unix shells.
In the example above, /home/joe is the home directory for user joe, and /home/sue is the home directory for user sue. This is the conventional location for home directories on BSD and Linux systems, which are two specific types of Unix. On a Mac OS X system, which is another brand of Unix, Joe's home directory would be /Users/joe instead of /home/joe. Most of the files owned by ordinary users are either in their home directory or one of its descendants.
The absolute path name, also known as full path name, of a file or directory denotes the complete path from / (the root directory) to the file or directory of interest. For example, the absolute path name of Sue's .cshrc file is /home/sue/.cshrc, and the absolute path name of the ape command is /usr/local/bin/ape.
Try the following commands:
shell-prompt: ls
shell-prompt: ls /etc
shell-prompt: cat /etc/motd
Every Unix process has an attribute called the current working directory, or CWD. This is the directory that the process is currently "in". When you first log into a Unix system, the shell process's CWD is set to your home directory.
The pwd command prints the CWD of the shell process. The cd command changes the CWD of the shell process. Running cd with no arguments sets the CWD to your home directory.
Try the following commands:
shell-prompt: pwd
shell-prompt: cd /
shell-prompt: pwd
shell-prompt: cd
shell-prompt: pwd
Some commands, such as ls, use the CWD as a default if you don't provide a directory name on the command line. For example, if the CWD is /home/joe, then the following commands are the same:
shell-prompt: ls
shell-prompt: ls /home/joe
Whereas an absolute path name denotes the path from / to a file or directory, the relative path name denotes the path from the CWD to a file or directory.
Any path name that does not begin with a '/' or '~' is interpreted as a relative path name. The absolute path name is then derived by appending the relative path name to the CWD. For example, if the CWD is /etc, then the relative path name motd refers to the absolute path name /etc/motd.
When you run a program from the shell, the new process inherits the CWD from the shell. Hence, you can use relative path names as arguments in any Unix command, and they will use the CWD inherited from the shell. For example, the two cat commands below have the same effect.
shell-prompt: cd /etc        # Set shell's CWD to /etc
shell-prompt: cat motd       # Inherits CWD from shell
shell-prompt: cat /etc/motd
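The same inheritance can be observed without relying on the contents of /etc. In this sketch, a scratch directory and a file named greeting.txt (both hypothetical) stand in for /etc and motd. The command substitution runs cd in a subshell, so the CWD of the original shell is left alone:

```shell
# Scratch directory and file standing in for /etc and motd (illustrative)
demo=$(mktemp -d)
echo "hello from the demo directory" > "$demo/greeting.txt"

# cd inside a subshell; cat is started by that subshell and inherits its CWD,
# so the relative path name greeting.txt resolves inside $demo
relative_output=$(cd "$demo" && cat greeting.txt)

# The absolute path name works regardless of the CWD
absolute_output=$(cat "$demo/greeting.txt")

echo "$relative_output"
echo "$absolute_output"
```

Both echo commands print the same line, because the two path names refer to the same file.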
The cd command is one of the most overused Unix commands. Many people use it where it is completely unnecessary and actually results in significantly more typing than needed. Don't use cd where you could have used the directory with another command. For example, consider the following sequence of commands:
shell-prompt: cd /etc
shell-prompt: more hosts
shell-prompt: cd
The same effect could have been achieved much more easily using the following single command:
shell-prompt: more /etc/hosts
Try the following commands:
shell-prompt: cd
shell-prompt: pwd
shell-prompt: cd /etc
shell-prompt: pwd
shell-prompt: cat motd
shell-prompt: cat /etc/motd
shell-prompt: cd
shell-prompt: pwd
shell-prompt: cat motd
Why does the last command result in an error?
The relative path name is potentially much shorter than the absolute path name. Using relative path names also provides more flexibility.
Suppose you have a project contained in the directory /Users/joe/Thesis on your Mac workstation.
Now suppose you want to work on the same project on a cluster, where there is no /Users directory and you have to store it in /share1/joe/Thesis.
The absolute path name of every file and directory will be different on the cluster than it is on your Mac. This can cause major problems if you were using absolute path names in your scripts, programs, and makefiles. Statements like the following will have to be changed in order to run the program on a different computer.
infile = fopen("/Users/joe/Thesis/Inputs/input1.txt", "r");
No program should ever have to be altered just to run it on a different computer.
While the absolute path names change when you move the Thesis directory, the path names relative to the Thesis directory remain the same. For this reason, absolute path names should be avoided unless absolutely necessary.
The statement below will work on any computer as long as the program or script is running with Thesis as the current working directory. It does not matter where the Thesis directory is located, so long as the Inputs directory is its child.
infile = fopen("Inputs/input1.txt", "r");
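The same idea applies to shell scripts. The sketch below is only an illustration: it creates two copies of a made-up Thesis directory in different scratch locations, standing in for the Mac and the cluster, and runs the identical command with a relative path name in each:

```shell
# Two scratch locations standing in for /Users/joe and /share1/joe (illustrative)
mac=$(mktemp -d)
cluster=$(mktemp -d)

# Create the same project layout in both places
for base in "$mac" "$cluster"; do
    mkdir -p "$base/Thesis/Inputs"
    echo "sample data" > "$base/Thesis/Inputs/input1.txt"
done

# The same relative path name works in both, as long as Thesis is the CWD
on_mac=$(cd "$mac/Thesis" && cat Inputs/input1.txt)
on_cluster=$(cd "$cluster/Thesis" && cat Inputs/input1.txt)

echo "Mac:     $on_mac"
echo "Cluster: $on_cluster"
```

Nothing in the cat command had to change between the two locations. Only the cd that selects the Thesis directory differs.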
In addition to absolute path names and relative path names, there are a few special symbols for directories that are commonly referenced:
Table 1.5. Special Directory Symbols
| Symbol | Refers to |
|--------|-----------|
| . | The current working directory |
| .. | The parent of the current working directory |
| ~ | Your home directory |
| ~user | user's home directory |
Try the following commands and see what they do:
shell-prompt: cd
shell-prompt: pwd
shell-prompt: ls
shell-prompt: ls .
shell-prompt: cp /etc/motd .
shell-prompt: cat motd
shell-prompt: cat ./motd
shell-prompt: ls ~
shell-prompt: ls /bin
shell-prompt: ls ..
shell-prompt: cd ..
shell-prompt: ls
shell-prompt: ls ~
shell-prompt: cd
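The following sketch shows how these symbols resolve, using a scratch directory tree (the names parent and child are made up for illustration). pwd -P is used so that the printed paths are comparable even if the scratch directory is reached through a symbolic link:

```shell
# Scratch tree for the demonstration (names are illustrative)
top=$(mktemp -d)
mkdir -p "$top/parent/child"
cd "$top/parent/child" || exit 1

here=$(pwd -P)               # the CWD itself
dot=$(cd . && pwd -P)        # . refers to the same directory
dotdot=$(cd .. && pwd -P)    # .. refers to its parent
tilde=~                      # the shell replaces ~ with your home directory

echo "CWD:   $here"
echo ".  ->  $dot"
echo ".. ->  $dotdot"
echo "~  ->  $tilde"
```

The first two lines printed are identical, and the third is the same path with the final child component removed.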
Every file and directory on a Unix system has inherent access control features based on a simple scheme:
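Every file and directory has an individual owner and group owner.
Permissions are set separately for three categories of users: the owner, members of the owning group, and everyone else (world).
There are 3 types of permissions, which are controlled separately from each other: read, write, and execute.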
Execute permissions on a file mean that the file can be executed as a script or a program by typing its name. It does not mean that the file actually contains a script or a program: It is up to the owner of the file to set the execute permissions appropriately for each file.
Execute permissions on a directory mean that users in the category can cd into it and access the files and directories within it. Read permissions on a directory only allow users to list the names it contains. Without execute permissions, they cannot open those files or make the directory the current working directory of their processes.
Unix systems provide this access using 9 on/off switches (bits) associated with each file.
If you do a long listing of a file or directory, you will see the ownership and permissions:
shell-prompt: ls -l
drwx------   2 joe  users    512 Aug  7 07:52 Desktop/
drwxr-x---  39 joe  users   1536 Aug  9 22:21 Documents/
drwxr-xr-x   2 joe  users    512 Aug  9 22:25 Downloads/
-rw-r--r--   1 joe  users  82118 Aug  2 09:47 bootcamp.pdf
The leftmost column shows the type of object and the permissions for each user category.
A '-' in the leftmost character means a regular file, 'd' means a directory, 'l' means a link, etc. Running man ls will reveal all the codes.
The next three characters are, in order, read, write and execute permissions for the owner.
The next three after that are permissions for members of the owning group.
The next three are permissions for world.
A '-' in a permission bit column means that the permission is denied for that user or set of users and an 'r', 'w', or 'x' means that it is enabled.
The next two columns show the individual and group ownership of the file or directory. The other columns show the size, the date and time it was last modified, and name. In addition to the 'd' in the first column, directory names are followed by a '/'.
You can see above that Joe's Desktop directory is readable, writable, and executable for Joe, and completely inaccessible to everyone else.
Joe's Documents directory is readable, writable and executable for Joe, and readable and executable for members of the group "users". Users not in the group "users" cannot access the Documents directory at all.
Joe's Downloads directory is readable and executable to anyone who can log into the system.
The file bootcamp.pdf is readable by the world, but only writable by Joe. It is not executable by anyone, which makes sense because a PDF file is not a program.
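The permission column can also be taken apart mechanically. The following sketch, which is only an illustration and not something ls does itself, splits the mode string shown for the Documents directory in the listing above into its four fields using cut (the variable names are made up):

```shell
# Mode string for the Documents directory, copied from the listing above
mode='drwxr-x---'

kind=$(printf '%s\n' "$mode" | cut -c1)      # d = directory, - = regular file
owner=$(printf '%s\n' "$mode" | cut -c2-4)   # permissions for the owner
group=$(printf '%s\n' "$mode" | cut -c5-7)   # permissions for the owning group
world=$(printf '%s\n' "$mode" | cut -c8-10)  # permissions for everyone else

echo "type=$kind owner=$owner group=$group world=$world"
```

This prints type=d owner=rwx group=r-x world=---, matching the description of the Documents directory.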
Users cannot change individual ownership on a file, since this would allow them to subvert disk quotas by placing their files under someone else's name. Only the superuser can change the individual ownership of a file or directory.
Users can change the group ownership of a file to any group that they belong to using the chgrp command:
shell-prompt: chgrp group path [path ...]
All sharing of files on Unix systems is done by controlling group ownership and file permissions.
File permissions are changed using the chmod command:
shell-prompt: chmod permission-specification path [path ...]
The permission specification has a symbolic form, and a raw form, which is an octal number.
The basic symbolic form consists of any of the three user categories 'u' (user/owner), 'g' (group), and 'o' (other/world) followed by a '+' (enable) or '-' (disable), and finally one of the three permissions 'r', 'w', or 'x'.
Add read and execute permissions for group and world on the Documents directory:
shell-prompt: chmod go+rx Documents
Disable all permissions for world on the Documents directory and enable read for group:
shell-prompt: chmod o-rwx,g+r Documents
Disable write permission for everyone, including the owner, on bootcamp.pdf. (This can be used to reduce the chances of accidentally deleting an important file.)
shell-prompt: chmod ugo-w bootcamp.pdf
Run man chmod for additional information.
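The effect of each symbolic specification can be checked with ls -l. This sketch uses a scratch file named demo.txt (a made-up name) and a small helper function, perms, defined here for illustration. It first sets a known starting point with the '=' operator (described in man chmod, it sets permissions exactly rather than adding or removing them), since the initial permissions of a new file depend on your umask:

```shell
# Scratch file for the demonstration (name is illustrative)
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch demo.txt

# Helper: print just the 9 permission characters of a file
perms() {
    ls -l "$1" | cut -c2-10
}

chmod u=rw,go=r demo.txt      # known starting point: rw-r--r--
start=$(perms demo.txt)

chmod go-r demo.txt           # remove read for group and world
after_remove=$(perms demo.txt)

chmod g+rw demo.txt           # add read and write for group
after_add=$(perms demo.txt)

echo "$start -> $after_remove -> $after_add"
```

On a typical Unix file system this prints rw-r--r-- -> rw------- -> rw-rw----.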
The raw form for permissions uses a 3-digit octal number to represent the 9 permission bits. This is a quick and convenient method for computer nerds who can do octal/binary conversions in their head.
shell-prompt: chmod 644 bootcamp.pdf   # 644 = 110100100 = rw-r--r--
shell-prompt: chmod 750 Documents      # 750 = 111101000 = rwxr-x---
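Each octal digit is simply the sum of the values of the bits that are turned on: 4 for read, 2 for write, and 1 for execute. As a worked example, this sketch computes the digits for rwxr-x--- with shell arithmetic. The variable names are made up for illustration:

```shell
# Value of each permission bit within one octal digit
r=4
w=2
x=1

owner=$((r + w + x))   # rwx = 4 + 2 + 1 = 7
group=$((r + x))       # r-x = 4 + 1     = 5
world=0                # --- = no bits   = 0

octal="$owner$group$world"
echo "rwxr-x--- = $octal"
```

This prints rwxr-x--- = 750, the same number used for the Documents directory.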
Try the following commands, and try to predict the output of each ls before you run it.
shell-prompt: touch testfile
shell-prompt: ls -l
shell-prompt: chmod go-rwx testfile
shell-prompt: ls -l
shell-prompt: chmod o+rw testfile
shell-prompt: ls -l
shell-prompt: chmod g+rwx testfile
shell-prompt: ls -l
shell-prompt: rm testfile
Now create testfile again and set its permissions so that it is readable, writable, and executable by you, only readable by the group, and inaccessible to everyone else.
Program1, which is a subdirectory of Programs, which is a subdirectory of the current working directory.
readme.txt in the parent directory of the current working directory.
.cshrc in your home directory to the group "smithlab"?
.cshrc in your home directory to your friend with user name Bob? Explain.
.cshrc in your home directory so that only you can modify it, members of the group can read and execute it but not modify it, and anyone else can read it but not modify or execute it?