This guide is organized as a tutorial for users with little or no experience using the Unix command line or parallel computing resources.
If your institution does not offer a course following this text, you might consider registering for an independent study that requires turning in the practice problems at the end of each section. This will provide motivation to master the material rather than just read it and move on. You and the course supervisor should review the guide and select the appropriate chapters and sections for your study at the beginning of the semester.
To begin learning the Unix environment, readers should do the following:
To begin learning the basics of parallel computing, readers should do the following:
A typical 3-credit semester course with 2.5 hours of lecture time per week should be able to easily cover all of Part I, “Research Computing”, introduce parallel computing concepts and the SLURM scheduler, and possibly touch on Part III, “High Performance Programming”.
Knowledge of various computational methods is key to helping researchers find the most elegant solution to their research problems and avoid the "solutions looking for problems" mentality, which often leads to overly complex approaches. Students must be taught to focus first on the problem, explore all potential solutions, and choose the simplest among them, rather than the sexiest. Many will gravitate toward machine learning, GPUs, parallel programming, etc. in order to impress people, when a much simpler solution would have worked.
A solid base in the Unix command line and shell scripting is important, as most researchers waste time or fail outright due to lack of knowledge in these areas. I've seen many cases where badly written scripts slow down an analysis by an order of magnitude or more. Also problematic are non-portable programs and scripts that work on Ubuntu but not RHEL, or on most Linux systems but not Mac, or vice versa. The Unix and scripting chapters emphasize portability and provide guidance on how to avoid non-portable "-isms".
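To make the portability point concrete, here is a minimal sketch of two shell functions written using only POSIX sh features, so they should behave the same under bash, dash, and the default shells on macOS and the BSDs. The function names `count_lines` and `is_fasta` are hypothetical, chosen only for illustration, and this is not taken from the guide's own chapters.

```sh
#!/bin/sh
# Portability sketch: POSIX sh only, no bash-specific features.
# The function names below are hypothetical examples.

# Print the number of lines in the file named by $1.
count_lines() {
    # Reading from stdin keeps the file name out of wc's output.
    # tr removes the leading padding that some wc implementations add.
    wc -l < "$1" | tr -d ' '
}

# Succeed if the file name in $1 ends in .fa or .fasta.
is_fasta() {
    # case pattern matching is POSIX; bash's [[ ... ]] is not.
    case "$1" in
        *.fa|*.fasta) return 0 ;;
        *) return 1 ;;
    esac
}
```

The design choice is to stick to constructs every POSIX shell provides (`case`, `[ ]`, and plain pipelines) rather than bash extensions such as `[[ ]]` or `==` inside `[ ]`, which can fail when `/bin/sh` is not bash.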
A quick introduction to high performance programming is highly valuable, since most incoming students (even computer science students) will not know the difference between compiled and interpreted languages, or that software can be made to run significantly faster with an understanding of memory hierarchy and other hardware specifics. Just raising awareness of what is possible with the right software development choices will help them avoid wasting time going sideways.
The later sections of high performance programming and parallel computing, along with the parallel programming chapter, are best tackled after becoming comfortable with basic HPC/HTC usage, or in separate courses.
OK, enough talk. Let's get you guys edumacated...