1.2. What is Unix?

1.2.1. Aw, man... I Have to Learn Another System?

Well, yeah, but it's the last time, I promise. As you'll see in the sections that follow, once you've learned to use Unix, you'll be able to use your new skills on virtually any computer. Over time you'll get better and better at it, and never have to start over from scratch again.

With rare exceptions, if you plan to do computational research, you have two choices:

  • Learn to use Unix.
  • Rely on the charity of others.

Most scientific software runs only on Unix and very little of it will ever have a graphical or other user interface that allows you to run it without knowing Unix.

The vast majority of high performance computing (HPC) clusters run Unix. You will need basic Unix skills to utilize HPC and HPC clusters generally do not offer a graphical interface. Some HPC administrators attempt to provide for people intent on avoiding Unix, but the results are severely limiting at best.

There have been many attempts to provide access to scientific software via web interfaces, but most of them are abandoned after a short time. People create them with good intentions, but without realizing that they will need to pour effort into maintenance for many years to come. Writing software is like adopting a puppy: It's fun and rewarding, but you need to be committed for the long term.

In order to be independent in your research computing, you must know how to use Unix in the traditional way. This is the reality of research computing. It's much easier to adapt yourself to reality than to adapt reality to yourself. This chapter will help you become proficient enough to survive and even flourish on your own.

Unix began as the trade name of an operating system developed at AT&T Bell Labs around 1970. It quickly became the model on which most subsequent operating systems have been based. Eventually, "Unix" came into common use to refer to any operating system mimicking the original Unix, much like "Band-Aid" is now used to refer to any adhesive bandage purchased in a drug store.

Over time, formal standards were developed to promote compatibility between the various Unix-like operating systems, and eventually, Unix ceased to be the name of a single product. Today, the name Unix officially refers to a set of standards to which most operating systems conform.

Look around the room and you will see many standards that make our lives easier (wall outlets, keyboards, USB ports, light bulb sockets, etc.). All of these standards make it possible to buy interchangeable devices from competing companies. This competition forces the companies to offer better value. They need to offer a lower price and/or better quality than their competition in order to stay in business.

The Unix standards serve the same purpose as all standards: to foster collaboration, give the consumer freedom of choice, reduce unnecessary learning time, and annoy developers who would rather ignore what everyone else is doing and reinvent the wheel at their employer's expense to gratify their own egos. They allow us to become operating system agnostic nomads, readily switching from one Unix system to another as our needs or situations dictate.

In a nutshell, Unix is every operating system you're likely to use except Microsoft Windows. Table 1.1, “Partial List of Unix Operating Systems” provides links to many Unix-compatible operating systems. This is not a comprehensive list. Many more Unix-like systems can be found by searching the web.


Note

Apple's Mac OS X has many proprietary extensions, including Apple's own user interface, but is almost fully Unix-compatible and can be used much like any other Unix system by simply choosing not to use the Apple extensions. Its core is largely based on BSD code, much of it from FreeBSD, combined with the Mach kernel.

Note

When you develop programs for any Unix-compatible operating system, those programs can be easily used by people running any other Unix-compatible system. Most Unix programs can even be used on a Microsoft Windows system with the aid of a compatibility layer such as Cygwin (Section 1.4.1, “Cygwin: Try This First”).

Once you've learned to use one Unix system, you're ready to use any of them. Hence, Unix is the last system you'll ever need to learn!

Unix systems run on everything from your cell phone to the world's largest supercomputers. Unix is the basis for Apple's iOS, the Android mobile OS, embedded systems such as networking equipment and robotics controllers, most PC operating systems, and many large mainframe systems. Many Unix systems are completely free (as in free beer) and can run tens of thousands of high quality free software packages. As an extreme example, NetBSD runs on dozens of different CPU architectures, including some hobbyist systems such as Commodore Amigas, 68k-based Macs, etc.

It's a good idea to regularly use more than one Unix system. This will make you aware of how much they all have in common and what the subtle differences are.

1.2.2. Operating System or Religion?

Aside

Keep the company of those who seek the truth, and run from those who have found it.

-- Vaclav Havel

The more confident someone is in their views, the less they probably know about the subject. As we gain life experience and wisdom, we become less certain about everything and more comfortable with that uncertainty. What looks like confidence is usually a symptom of ignorance of our own ignorance, generally fueled by ego.

If you discuss operating systems at length with most people, you will discover, as the ancient philosopher Socrates did while discussing many topics with "experts", that their views are not based on broad knowledge and objective comparison. Before taking advice from anyone, it's a good idea to find out how much they really know and what role emotion and ego play in their preferences. This process of questioning has become known as a "Socratic examination". Note, however, that if you embarrass the wrong people, it may get you executed, as it did Socrates.

The whole point of the Unix standard, like any other standard, is freedom of choice. However, you won't have any trouble finding evangelists for a particular brand of Unix-compatible operating system on whom this point is lost. "Discussions" about the merits of various Unix implementations often involve arrogant pontification and emotional outbursts, possibly involving some cussing.

If you step back and ask yourself what kind of person gets emotionally attached to a piece of software, you'll realize whose advice you should value and whose you should not. Rational people will keep an open mind and calmly discuss the objective measures of an OS, such as performance, reliability, security, ease of maintenance, specific capabilities, etc. They will also back up their opinions with facts rather than try to bully you into validating their views.

If someone tells you that a particular operating system "isn't worth using", "is way behind the times", or "sucks wads", rather than asking you what you need and objectively discussing alternatives, this is someone whose advice you can safely ignore. They are not interested in helping you. They need you to validate their opinions, because those opinions are not supported by facts.

Aside

We're all capable of rational thought, but sometimes we only use it to rationalize what we want to believe, despite obvious evidence to the contrary.

"I don't understand why some people wash their bath towels. When I get out of the shower, I'm the cleanest object in my house. In theory, those towels should be getting cleaner every time they touch me. By the way, are towels supposed to bend?"

-- Wally (Dilbert)

Evangelists are easy to spot. They will instantly assess your needs without asking you a single question and proceed to explain (often aggressively) why you should be using their favorite operating system or programming language. They invariably have limited or no experience with other alternatives. This is easy to expose with a few simple questions. "How many years of experience do you have with it?" The answer is usually close to 0. "What are the specific advantages and disadvantages?" The response to this will usually be stuttering, silence, or double-talk. Ask them to clarify further and it won't take long to expose their ignorance.

Ultimately, the system that most easily runs your programs to your satisfaction is the best one for you. That could turn out to be BSD, Cygwin, Linux, Mac OS X, OpenIndiana, or any other. Someone who knows what they're doing and truly wants to help you will always begin by asking questions in order to better understand your needs. "What program(s) do you need to run?", "Do they require any special hardware?", "Do you need to run any commercial software, or just open source?", etc. They will then consider multiple alternatives and inform you about the capabilities of each one that might match your needs.

There is another reason besides ego that people often choose inappropriate solutions to a problem: the desire to use what they know instead of being open to learning a better approach.

Aside

When all you have is a hammer, everything looks like a nail.

I regularly experiment with various Unix variants to evaluate their ease of use, reliability, and resource requirements. This is easy to do using virtual machines (see Chapter 7, Running Multiple Operating Systems). My personal preferences for running Unix software (for now; these could change in the distant future) are listed below. All of these systems are somewhat interchangeable with each other and with the many other Unix-based systems available, so deviating from these recommendations will generally not lead to catastrophe.

More details on choosing a Unix platform are provided in Chapter 4, Platform Selection.

  • Servers running mostly open source software: FreeBSD.

    FreeBSD is extremely fast, reliable, and secure. It is known as a "set-and-forget" operating system, since it requires very little attention after initial installation and configuration. Software management is very easy with FreeBSD ports, which offers over 30,000 software packages (not counting different builds of the same software). The ports system lets you install generic binary packages or, just as easily, build from source with custom options or optimizations for your specific CPU. With the Linux compatibility module, FreeBSD can directly run most closed-source Linux programs with no performance penalty, at the cost of a little added effort and resources.

    FreeBSD with the Lumina desktop environment

  • Servers running mainly commercial applications or CUDA GPU software: Enterprise Linux (AlmaLinux, CentOS, RHEL, Rocky Linux, Scientific Linux, SUSE).

    These systems are designed for better reliability, security, and long-term binary compatibility than bleeding-edge Linux systems. They are the only platforms besides MS Windows and Mac OS X supported by many commercial software vendors. While you may be able to get some commercial engineering software running on Ubuntu or Mint, it is often difficult and the company will not provide support. Packages in the native Yum repository of enterprise Linux are generally outdated, but more recent open source software can be installed using a separate add-on package manager such as pkgsrc.

  • An average Joe who wants to browse the web, use a word processor, etc.: Debian, GhostBSD, Ubuntu, or similar open source Unix system with graphical installer and management tools, or Macintosh.

    These systems make it easy to install software packages and system updates, with minimal risk of breakage that Joe would not know how to fix.

    Debian Linux

  • Someone who uses mostly Windows-based software, but needs a basic Unix environment for software development or connecting to other Unix systems: A Windows PC with Cygwin.

    Cygwin is free, entirely open source, and very easy to install in about 10 minutes on most Windows systems. It has some performance bottlenecks, fewer packages than a real Unix system running on the same machine, and a few other limitations, but it's more than adequate for the needs of many typical users. See Section 1.4.1, “Cygwin: Try This First” for details.

1.2.3. The Unix Standard API

Programmer time is expensive. Writing a program twice costs twice as much. Unix standards solve this problem.

Unix systems adhere to an application program interface (API) standard, which means that programs written for one Unix-based system can be run on any other with little or no modification, even on completely different hardware. For example, programs written for an Intel/AMD-based Linux system will also run on an ARM-based Mac, or on FreeBSD on an ARM, Power, or RISC-V processor.

An API defines a set of functions (subprograms) used to request services from the operating system, such as opening a file, allocating memory, running another program, etc. These functions are the same on all Unix systems, but some of them are different on Windows and other non-standard systems. For example, to open a file in a C program on any Unix system, one would typically use the fopen() function:

            FILE *fopen(const char *filename, const char *mode);
            

Microsoft compilers support fopen() as well, but also provide another function for the same purpose that only works on Windows:

            errno_t fopen_s(FILE** pFile, const char *filename, const char *mode);
            

Note

Microsoft claims that fopen_s() is more secure, which is debatable. Note, however, that even if this is true, the existing fopen() function itself could have been made more secure rather than creating a separate, non-portable function that does the same thing. Non-standard functions like fopen_s() mainly benefit the vendor by making it harder to port software to a competing platform.

Here are a few other standard Unix functions that can be used in programs written in C and most other compiled languages. These functions can be used on any Unix system, regardless of the type of hardware running it. Some of these may also work in Windows, but for others, Windows uses a completely different function to achieve the same goal.

            chdir()         // Change current working directory
            execl()         // Load and run another program
            mkdir()         // Create a directory
            unlink()        // Remove a file
            sleep()         // Pause execution of the process
            DisplayWidth()  // Get the width of the graphical screen (X11)
            

Because the Unix API is platform-independent, it is also possible to compile and run most Unix programs on Windows with the aid of a compatibility layer, software that bridges the difference between two platforms. (See Section 1.4.1, “Cygwin: Try This First” for details.) It is not generally possible to compile and run Windows software on Unix, however, because Windows has many features specific to PC hardware.

Since programs written for Unix can be run on almost any computer, including Windows computers, they will never have to be rewritten in order to run somewhere else. Programs written for non-Unix platforms will only run on that platform, and will have to be rewritten (at least partially) in order to run on any other system. This leads to an enormous waste of man-hours that could have gone into creating something new. They may also become obsolete as the proprietary systems for which they were written evolve. For example, most programs written for MS DOS and Windows 3.x are no longer in use today, while programs written for Unix around that same time will still work on modern Unix systems.

1.2.4. Shake Out the Bugs

Another advantage of programming on standardized platforms is the ability to easily do more thorough testing. Compiling and running a program on multiple operating systems and with multiple compilers will almost always expose bugs that you were unaware of while running it on the original development system. The same bug will have different effects on different operating systems, with different compilers or interpreters, or with different compile options (e.g. with and without optimization).

For example, an errant array subscript or pointer might cause corruption in a non-critical memory location in some environments, while causing the program to crash in others.

A program may seem to be fine when you compile it with Clang and run it on your Mac, but may not compile, or may crash when compiled with GCC on a Linux machine.

Finding bugs now may save you from the stressful situation of tracking them down under time pressure later, with an approaching grant deadline. A bug that was invisible on your Mac for the test cases you've used could also show up on your Mac later, when you run the program with different inputs.

Developing for the Unix API makes it easy to test on various operating systems and with different compilers. There are many free BSD- and Linux-based systems, as well as free compilers such as Clang and GCC. Most of them can be run in a virtual machine (Chapter 7, Running Multiple Operating Systems), so you don't even need another computer for the sake of program testing. Take advantage of this easy opportunity to stay ahead of program bugs, so they don't lead to missed deadlines down the road.

1.2.5. The Unix Standard UI

The Unix standards not only make programs portable, they make our knowledge as users portable as well. All Unix systems support the same basic set of commands, which conform to standards so that they behave the same way everywhere. So, if you learn to use FreeBSD, most of that knowledge will directly apply to Linux, Mac OS X, Solaris, etc.

Another part of the original Unix design philosophy was to do everything in the simplest way possible. As you learn Unix, you will likely find some of its features befuddling at first. However, upon closer examination, you will often come to appreciate the elegance of the Unix solution to a difficult problem. If you're observant enough, you'll learn to apply this Zen-like simplicity to your own work, and maybe even your everyday life.

You will also gradually recognize a great deal of consistency between various Unix commands and functions. For example, many Unix commands support a -v (verbose) flag to indicate more verbose output, as well as a -q (quiet) flag to indicate no unnecessary output. Over time, you will develop an intuitive feel for Unix commands, become adept at correctly guessing how things work, and feel almost God-like at times.

Unix documentation also follows a few standard formats, which users quickly get used to, making it easier to learn new things about commands on any Unix system.

In a nutshell, the time and effort you spend learning any Unix system will make it easy to use any other in the future. You need only learn Unix once, and you'll be proficient with many different implementations such as FreeBSD, Linux, and Mac OS X.

1.2.6. Fast, Stable and Secure

Since Unix systems compete directly with each other to win and retain users running the same programs, developers are highly motivated to optimize objective measures of the system such as performance, stability, and security.

Most Unix systems operate near the maximum speed of the hardware on which they run. Unix systems typically respond faster than other systems on the same hardware and run intensive programs in less time. Many Unix systems require far fewer resources than non-Unix systems, leaving more disk and memory for use by your programs.

Unix systems tend to be very reliable and may run for months or even years without being rebooted. I managed a particular FreeBSD HPC cluster for eight years. Except for some problems in the first few months that were traced to a Dell firmware bug, none of the servers in this cluster ever crashed.

Unlike Windows, software installations almost never require a reboot, and even most security updates can be applied without rebooting. Reboots are typically only needed following a kernel update.

Stability is critical for research computing, where computational models may run for weeks or months. Users of non-Unix operating systems often have to choose between killing a process that has been running for weeks and neglecting critical security updates that require a reboot.

Very few viruses or other malware programs exist for Unix systems. This is in part due to the inherently better security of Unix systems and in part due to a strong tradition in the Unix community of discouraging users from engaging in risky practices such as running programs under an administrator account and installing software from pop-ups on the web.

1.2.7. Sharing Resources

Your mom probably told you that it's nice to share, but did you know it's also more efficient?

One of the major problems for researchers in computational science is managing their own computers. Most researchers aren't very good at installing operating systems, managing software, applying security updates, etc., nor do they want to be. Unfortunately, they often have to do these things in order to conduct computational research. Computers managed by a tag-team of researchers usually end up full of junk software, out-of-date, full of security issues, and infected with malware.

Since Unix is designed from the ground up to be accessed remotely, Unix creates an opportunity to serve researchers' needs far more cost-effectively than individual computers for each researcher. A single Unix system running on a modern PC can support dozens or even hundreds of users at the same time, depending on how demanding their software is.

1.2.8. Practice

Instructions

  1. Make sure you are using the latest version of this document.

  2. Carefully read one section of this document and casually read other material (such as corresponding sections in a textbook, if one exists) if needed.

  3. Try to answer the questions from that section. If you do not remember the answer, review the section to find it.

  4. Write the answer in your own words. Do not copy and paste. Verbalizing answers in your own words helps your memory and understanding. Copying does not, and demonstrates a lack of interest in learning.

  5. Check the answer key to make sure your answer is correct and complete.

    DO NOT LOOK AT THE ANSWER KEY BEFORE ANSWERING QUESTIONS TO THE VERY BEST OF YOUR ABILITY. In doing so, you would only cheat yourself out of an opportunity to learn and prepare for the quizzes and exams.

Important notes:

  • Show all your work. This will improve your understanding and ensure full credit for the homework.

  • The practice problems are designed to make you think about the topic, starting from basic concepts and progressing through real problem solving.

  • Try to verify your own results. In the working world, no one will be checking your work. It will be entirely up to you to ensure that it is done right the first time.

  • Start as early as possible to get your mind chewing on the questions, and do a little at a time. Using this approach, many answers will come to you seemingly without effort, while you're showering, walking the dog, etc.

  1. After learning Unix, on what operating systems will you be able to use your new skills?

  2. What is the major design goal of the Unix standards?

  3. What is the alternative to learning Unix for computational scientists? Why?

  4. Why does most scientific software lack a convenient graphical or web interface?

  5. Is Unix an operating system? Why or why not?

  6. What is the advantage of open standards?

  7. How many different Unix-compatible operating systems exist? What does this mean for Unix users?

  8. Which mainstream operating systems are Unix-compatible and which are not?

  9. What types of computer hardware run Unix?

  10. How much does Unix cost?

  11. Which Unix operating system is the best one?

  12. How should we go about choosing a Unix system? What if we make the wrong choice?

  13. How do we spot evangelists who are likely to give us irrational advice?

  14. What is an API?

  15. What is the advantage of the Unix API over the APIs of non-Unix operating systems? What problem does it solve?

  16. Can software written for Unix be run on Windows? How?

  17. How does the Unix API help us proactively eliminate software bugs?

  18. What is a UI? What are three advantages of the Unix UI over the UIs of non-Unix operating systems?

  19. Why are Unix-compatible operating systems faster, more stable, and more secure than many non-Unix platforms?

  20. How do the inherent remote access capabilities of Unix help researchers?