High Performance Software on Intel Pentium Pro Processors or Micro-Ops to TeraFLOPS

We give a technical discussion of the Intel Pentium Pro processor and of the optimization strategies used to achieve high performance on scientific applications. We demonstrate these optimizations through a characterization of double-precision general matrix multiplication (DGEMM). We also present insight into, and a performance model of, our work toward the world's first TeraFLOP MP LINPACK run, achieved on the Intel ASCI Option Red Supercomputer, which is built on Pentium Pro processor technology. This work matters because commodity parts are increasingly used in supercomputing.
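For readers unfamiliar with DGEMM, the computation is C := alpha*A*B + beta*C. The minimal sketch below is a generic illustration of cache blocking, one common optimization applied to it. It is not the authors' tuned Pentium Pro kernel. The function name `dgemm_blocked` and the block size `NB` are hypothetical. The sketch assumes column-major storage with leading dimensions, following the Reference BLAS convention, and handles only the no-transpose case.

```c
#include <stddef.h>

/* Hypothetical block size; a real kernel would tune this to the cache sizes. */
#define NB 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C := alpha*A*B + beta*C, column-major, no transposes (illustrative sketch).
   A is m x k (leading dim lda), B is k x n (ldb), C is m x n (ldc). */
void dgemm_blocked(size_t m, size_t n, size_t k, double alpha,
                   const double *A, size_t lda,
                   const double *B, size_t ldb,
                   double beta, double *C, size_t ldc)
{
    /* Apply beta once; beta == 0 overwrites C, as the Reference BLAS does. */
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            C[i + j * ldc] = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];

    /* Loop over NB x NB tiles so each tile of A, B, and C is reused while cached. */
    for (size_t jj = 0; jj < n; jj += NB)
        for (size_t pp = 0; pp < k; pp += NB)
            for (size_t ii = 0; ii < m; ii += NB) {
                size_t jmax = min_sz(jj + NB, n);
                size_t pmax = min_sz(pp + NB, k);
                size_t imax = min_sz(ii + NB, m);
                for (size_t j = jj; j < jmax; j++)
                    for (size_t p = pp; p < pmax; p++) {
                        double b = alpha * B[p + j * ldb];
                        /* Unit-stride inner loop down a column of A and C. */
                        for (size_t i = ii; i < imax; i++)
                            C[i + j * ldc] += A[i + p * lda] * b;
                    }
            }
}
```

The j-p-i ordering keeps the innermost loop at unit stride in column-major storage. A processor-specific kernel would go further, with register blocking and instruction scheduling for the target pipeline.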