5. Self-Study Instructions

Note

This guide is updated frequently. Printing is recommended only for those who own stock in a paper company.

This guide is organized as a tutorial for users with little or no experience using the Unix command line or parallel computing resources.

If your institution does not offer a course following this text, you might consider registering for an independent study that requires turning in the practice problems at the end of each section. This will provide motivation to master the material rather than just read it and move on. You and the course supervisor should review the guide and select the appropriate chapters and sections for your study at the beginning of the semester.

5.1. Unix Self-Study Instructions

To begin learning the Unix environment, readers should do the following:

  1. Get access to a Unix system if you don't already have one. You will need this to practice running Unix commands and writing basic scripts. Apple Macintosh, BSD, and Linux systems are all Unix-compatible. If you are running Windows, you can quickly and easily add a Unix environment to it by installing Cygwin, following the instructions in the section called “Cygwin: Try This First”.
  2. Thoroughly read Chapter 3, Using Unix up to and including the section called “The Unix File System”. The remaining sections can be covered later after gaining some hands-on experience. Do the self-test at the end of each section.
  3. Thoroughly read the section called “What is a Shell Script?” through the section called “Sourcing Scripts”. Do the self-test at the end of each section.

5.2. Parallel Computing Self-Study Instructions

To begin learning the basics of parallel computing, readers should do the following:

  1. Thoroughly read Chapter 6, Parallel Computing and Chapter 7, Job Scheduling.
  2. Read Chapter 9, Job Scheduling with SLURM, but don't expect to understand it perfectly. Just familiarize yourself with the material so it will be easier to master during your first meeting with a facilitator.
  3. If you plan to use the HTCondor pool, skim over Chapter 10, Job Scheduling with HTCondor.

5.3. Instructor's Guide

A typical 3-credit semester course with 2.5 hours per week of lecture time should easily cover all of Part I, “Research Computing”, introduce parallel computing concepts and the SLURM scheduler, and possibly touch on Part III, “High Performance Programming”.

Knowledge of various computational methods is key to helping researchers find the most elegant solution to their research problems and avoid the "solutions looking for problems" mentality, which often leads to overly complex approaches. Students must be taught to focus first on the problem, explore all potential solutions, and choose the simplest among them, rather than the sexiest. Many will gravitate toward machine learning, GPUs, parallel programming, etc. in order to impress people, when a much simpler solution would have worked.

A solid base in the Unix command line and shell scripting is important, as many researchers waste time or fail outright for lack of knowledge in these areas. I've seen many cases where badly written scripts slow down an analysis by an order of magnitude or more. Also problematic are non-portable programs and scripts that work on Ubuntu but not RHEL, or on most Linux systems but not Mac, or vice versa. The Unix and scripting chapters emphasize portability and provide guidance on how to avoid non-portable "-isms".
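To make the portability point concrete, here is a minimal sketch of a few habits that tend to keep a script working across Linux, BSD, and Mac systems. It is only an illustration, not an excerpt from the scripting chapters, and the function names (count_lines, have_command, describe_system) are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Illustrative sketch of portable (POSIX sh) idioms.
# All function names here are hypothetical, chosen for this example.

# Print the number of lines in a file with no surrounding whitespace.
# BSD/macOS wc pads its output with spaces while GNU wc does not,
# so strip the spaces rather than depend on either format.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Succeed if the named command is available.
# "command -v" is specified by POSIX; "which" is not, and its
# behavior differs from one system to another.
have_command() {
    command -v "$1" > /dev/null 2>&1
}

# Report the operating system name.
# printf behaves consistently across shells, whereas options such
# as "echo -e" and "echo -n" do not.
describe_system() {
    printf 'Running on %s\n' "$(uname -s)"
}
```

After sourcing or pasting these definitions, a call such as count_lines /etc/passwd should print a bare number on any of the systems mentioned above. The script uses /bin/sh and avoids shell-specific extensions such as [[ ]] and arrays, which exist in some shells but not in others.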

A quick introduction to high performance programming is highly valuable, since most incoming students (even computer science students) will not know the difference between compiled and interpreted languages, or that software can be made to run significantly faster with an understanding of memory hierarchy and other hardware specifics. Just raising awareness of what is possible with the right software development choices will help them avoid wasting time going sideways.

5.4. Digging Deeper

The later sections of high performance programming and parallel computing, along with the parallel programming chapter, are best tackled after becoming comfortable with basic HPC/HTC usage, or in separate courses.

OK, enough talk. Let's get you guys edumacated...