Teaching parallel computing concepts with a desktop computer
Abstract
Parallel computing is currently used in many engineering problems. However, because of limitations in curriculum design, it is not always possible to offer students specific formal teaching in this topic. Furthermore, parallel machines are still too expensive for many institutions. The latest microprocessors, such as Intel's Pentium III and IV, embody single instruction multiple-data (SIMD) type parallel features, which makes them a viable solution for introducing parallel computing concepts to students. Final year projects have been initiated utilizing SSE (streaming SIMD extensions) features and it has been observed that students can easily learn parallel programming concepts after going through some programming exercises. They can now experiment with parallel algorithms on their own PCs at home.
Keywords electrical engineering; parallel computing; SIMD paradigm
Parallel programming is a viable method for solving computationally intensive problems in various fields. In electrical engineering, for instance, solving power systems network equations is an area where parallel algorithms are being developed and applied.1 A popular approach to implementing parallel algorithms is to employ a cluster or a network of PCs. With the advances made in computer hardware and software, it is now quite a simple matter to configure a computer network and program it to solve problems cooperatively. The parallel wavelet transform2 and the software simulation demonstrated by Sena et al.3 are interesting applications that have been implemented on a computer cluster. The common programming paradigm for this type of parallel algorithm is either MPMD (multiple program multiple data) or SPMD (single program multiple data). Communication is based on message passing between processors. Very often, standard message-passing interfaces, such as MPI4 or PVM,5 are used for this purpose. As parallel computers, either in the form of a network of PCs or dedicated machines, become common, the skills to utilize them fully will become very valuable.
Although parallel computing is a very useful technique, it is often excluded from the traditional engineering curriculum, which therefore hinders the deployment of parallel programs in industry. In the Electrical Engineering Department at Hong Kong Polytechnic University, there are only two computing subjects being taught during the three years of the undergraduate degree program. In their first year, students learn a programming language: currently, the C language is being taught. In the second year, students study a subject to acquire computer hardware and assembly language programming skills. There are, however, some elective subjects including software engineering, computer networks, and industrial computer applications.
On the other hand, in the School of Electrical and Electronic Engineering at Singapore Polytechnic, there are specialized diploma courses in computers and networking, as well as mainstream courses such as electronics and communication engineering. During their studies students are taught the basic theory of computing and given a great amount of practical knowledge. However, parallel computing is not included in the curriculum.
As discussed,6 it is desirable to provide students with parallel programming skills early in their studies so that these skills can be applied to other fundamental and advanced subjects. However, for an engineering programme it is not possible to implement such a well-defined structure. Considering the significance of parallel computing and its applications in engineering, it is desirable to introduce the concept through some other means. In our case, the final year project is the most suitable method.
As discussed earlier, the current trend in parallel computing is to employ a cluster of workstations and the SPMD programming paradigm. Availability, maintenance and network traffic concerns deterred us from using this solution to train undergraduates. Fortunately, in many modern microprocessors, including the Intel Pentium series, SIMD type parallelism is supported so we can develop a parallel program with a PC even at home. Most importantly, students can learn the concept of parallel computing and implement programs to solve real engineering problems.
In the following section, we introduce the SIMD feature embedded in the Intel microprocessors and the programming model based on it. In later sections, we present our attempts to foster parallel computing concepts at Hong Kong Polytechnic University, where we passed a real engineering problem to the students and asked them to solve it using the Pentium processor's parallel computing features. We also present the work done by another project group at Singapore Polytechnic, where students worked with the same concept and implemented an image processing application. This is then followed by a discussion of the execution of student projects. The final section discusses the merits of this self-learning exercise.
SIMD parallelism
The traditional classification of parallel algorithms was introduced by Flynn and is based on parallelism in instructions and data. SIMD (single instruction multiple data) parallelism was widely applied in bit-serial massively parallel machines, including the MPP7 and the Connection Machine.8 As the name implies, a massively parallel machine consists of many processors and they perform the same operation simultaneously on large quantities of data. Because of advances made in microprocessor fabrication techniques, SIMD is now a common feature included in many recent microprocessors (see, for example, the AMD processor,9 the Sun SPARC processor10 and the PowerPC processor11). Early SIMD machines processed a single bit at a time in parallel. However, recent SIMD features embedded in processors allow in-parallel processing of multiple-bit data structures, ranging from integer to double-precision floating-point numbers. With such flexibility, it is possible to implement algorithms manipulating various data types on these systems. We use Intel microprocessors in our experiments as these processors are commonly used in PCs. Consequently we concentrate on the SIMD feature embedded in Intel processors. On the other hand, AMD microprocessors are also a popular choice in the desktop computer market and the 3DNow!9 technology embedded in the AMD processors supports the SIMD mechanism. We can, therefore, also implement the parallel programs with AMD processors.
The SIMD feature that is included in the Intel Pentium processors is called SSE12-14 and is currently available in the Pentium III and IV classes of microprocessors. This can be regarded as a second generation SIMD feature, its predecessor being the MMX feature.13 The major difference between MMX and SSE is the data structure that they can process in parallel. MMX registers can operate only on integers, whereas SSE can manipulate both integer and floating-point data types. In many engineering problems, floating-point arithmetic is used, so SSE is a natural choice.
Streaming SIMD extension (SSE)
SSE registers are 128 bits wide and they can store packed values as characters, integers and floating-point numbers. There are eight SSE registers and they can be directly addressed using their register names.13-14 Therefore, utilizing these registers in any program becomes a straightforward process with suitable tools. In the case of integers, eight 16-bit integers can be packed into a single 128-bit register and processed in parallel. Similarly, four 32-bit floating-point values can also be fitted into a 128-bit register and processed in parallel, as shown in Fig. 1. For the Pentium IV microprocessors, the SSE feature is further extended to support the parallel processing of two 64-bit double-precision floating-point values.
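The packing arithmetic described above can be checked with a small sketch in C. The helper name lanes_per_register and the constant SSE_REGISTER_BYTES are hypothetical names introduced here for illustration only; they are not part of any Intel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one SSE register: 128 bits = 16 bytes. */
enum { SSE_REGISTER_BYTES = 16 };

/* Hypothetical helper: number of packed elements of a given size
 * (in bytes) that fit into one 128-bit SSE register. */
size_t lanes_per_register(size_t elem_bytes)
{
    /* The element size must divide the register width exactly. */
    assert(elem_bytes > 0 && SSE_REGISTER_BYTES % elem_bytes == 0);
    return SSE_REGISTER_BYTES / elem_bytes;
}
```

Calling lanes_per_register with element sizes of 2, 4 and 8 bytes gives the eight 16-bit integers, four 32-bit floating-point values and two 64-bit double-precision values mentioned in the text.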
The first step in applying the SSE feature is packing data into the 128-bit SSE register. When two vectors of four floating-point values are loaded into two SSE registers, as shown in Fig. 1, SIMD operations, such as add, multiply, etc. can be applied to the two vectors in one single operation step, so that theoretically a four times speed-up is possible.
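As an illustration of this load, operate and store sequence, the following is a minimal sketch in C using the compiler intrinsics declared in xmmintrin.h. It is not the code used in the student projects. The function name add_arrays and the HAVE_SSE macro are assumptions made for this example, and a scalar fallback is included so that the sketch still compiles when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0      /* no SSE: only the scalar loop below is compiled */
#endif

/* Hypothetical example function: out[i] = a[i] + b[i] for 0 <= i < n. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (a && b && out));
#if HAVE_SSE
    /* Process four packed single-precision values per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* pack a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);            /* pack b[i..i+3] */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* four adds at once */
    }
#endif
    /* Scalar loop: handles the remainder, or everything without SSE. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The unaligned load and store intrinsics are used so that the arrays need no special alignment. The four times figure is an upper bound on the arithmetic alone, because the time spent loading and storing the packed data also counts towards the total.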