Before reading this chapter, you should be familiar with basic Unix concepts (Chapter 3, Using Unix), the Unix shell (the section called “Command Line Interfaces (CLIs): Unix Shells”), redirection (the section called “Redirection and Pipes”), and shell scripting (Chapter 4, Unix Shell Scripting). You should also have some experience with computer programming.
High Throughput Computing (HTC) is the simplest and generally most economical form of distributed parallel computing.
In high throughput computing, processes are dispatched to run independently on multiple computers at the same time. The processes are typically serial programs, not parallel programs, so HTC can be thought of as parallel computing without parallel programming.
HTC is sometimes referred to as embarrassingly parallel computing, since it is so easy to implement.
Usually, the same program runs on all computers with different data or inputs, but the definition of HTC is not limited to this scenario. If you are running multiple independent processes simultaneously on different computers, you're using HTC.
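The following is a minimal sketch of the "same program, different inputs" pattern. It runs on a single machine, where separate operating-system processes stand in for separate computers. The serial program itself (`SERIAL_PROGRAM`, which sums the squares of the integers below its argument) and the helper `run_independent` are hypothetical placeholders. On a real HTC system, a scheduler would dispatch each run to a different computer instead of launching it locally with `subprocess`.

```python
import subprocess
import sys

# Hypothetical serial program: reads one integer argument n and prints
# the sum of i*i for 0 <= i < n. It knows nothing about other runs.
SERIAL_PROGRAM = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


def run_independent(inputs):
    """Run the serial program once per input, all at the same time.

    Each run is a separate OS process with its own input; the processes
    never communicate with each other. Only their outputs are collected.
    """
    # Start every process before waiting on any, so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", SERIAL_PROGRAM, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in inputs
    ]
    # Wait for each process and parse the single number it printed.
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print(run_independent([4, 5, 1000]))
```

Note that the program being run is purely serial. All of the parallelism comes from starting several copies at once, which is the defining feature of HTC.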
HTC has several advantages:
It's the easiest type of parallelism to implement. No parallel programming is necessary. It's just a matter of running a serial program in multiple places at the same time.
It does not require a high-speed network or any other special hardware. HTC often utilizes lab computers across a college campus, or even home computers around the world.
It scales almost without limit. Because the processes are independent of each other, there is no communication overhead between them, and you can run as many processes as you have cores to run them on, with nearly perfect speedup. That is, if you run 1000 processes at once on 1000 cores, the computations will finish almost 1000 times faster than if you ran them one at a time.
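The speedup claim can be made concrete with a simple model. This sketch makes the following assumptions:

* There are N jobs, and every job takes the same time t.
* P cores are available.
* Dispatching jobs and collecting results cost nothing.

Under these assumptions, the jobs run in ceil(N/P) "waves." The serial time is N*t and the parallel time is ceil(N/P)*t, so the speedup is N / ceil(N/P). The function name `ideal_speedup` is only illustrative.

```python
import math


def ideal_speedup(num_jobs: int, num_cores: int) -> float:
    """Ideal speedup for num_jobs equal-length, independent jobs on num_cores cores.

    Assumes each job takes one time unit and there is no dispatch or
    communication overhead, so the only limit is the number of cores.
    """
    serial_time = num_jobs                               # one job after another
    parallel_time = math.ceil(num_jobs / num_cores)      # number of waves
    return serial_time / parallel_time


if __name__ == "__main__":
    print(ideal_speedup(1000, 1000))  # every job runs in the same wave
    print(ideal_speedup(1000, 300))   # 300 cores need 4 waves for 1000 jobs
```

For example, `ideal_speedup(1000, 1000)` gives 1000.0, which matches the figure in the text. `ideal_speedup(1000, 300)` gives 250.0, because 300 cores need four waves to finish 1000 jobs. In practice, uneven job lengths and scheduling overhead make the real speedup somewhat lower than this ideal.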