We are currently supporting two Compiler Suites on the Linux platform:

* The '''Intel Compiler Suite''' is located on the Compute Canada CVMFS software stack. The compilers '''ifort''' and '''icc''' are in the default path when you log in.
** The default version for the Intel compilers is 2016.4.
** Other versions (including newer ones) are available. Use the "module avail intel" command for a list.
* The CVMFS stack also includes the '''Gnu C/C++ and Fortran Compilers''', called '''gcc''', '''g++''' and '''gfortran''', respectively.
** The default version for the Gnu compilers is 5.4.0.
** Other versions (including newer ones) are available. Use the "module avail gcc" command for a list.

== Setup ==

* Neither the Intel suite nor the Gnu compilers require any setup to use the default version.
* Non-default versions of the compilers can be listed through the "module avail intel" and "module avail gcc" commands, respectively.
* To set up a non-default version of a compiler, use the command "module load intel/version" or "module load gcc/version", respectively, where "version" stands for the specific version. For instance:
<pre>
module load intel/2017.5
</pre>
The last command replaces the default 2016.4 version of the Intel compilers with the newer 2017.5 version. It also re-loads all dependent modules and reports on this.
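To verify which version is active after such a switch, you can query the module system and the compiler itself. A minimal check, using the example version from above ("module list" is a standard command of the module system):
<pre>
module list   # show all currently loaded modules
ifort -V      # print the version of the Intel Fortran compiler
</pre>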
|}
|ifort
|colspan="2"|icc
|non-default: module load intel/version
|-
|Gnu
|gcc
|g++
|non-default: module load gcc/version
|}

Using the compilers and the linker in the above manner requires the proper setting of the PATH environment variable, i.e. prior set-up.
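As an illustration, a plain compile-and-link sequence with the default Intel Fortran compiler might look as follows (the file names are placeholders for this example):
<pre>
ifort -c myprog.f90            # compile the source into the object file myprog.o
ifort -o myprog.exe myprog.o   # link the object file into an executable
</pre>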

=== Options / flags ===

There are hundreds of compiler flags, but most of them are only rarely needed. A few of the more frequently used ones are:
* '''-On''' optimizes your code. "n" is a number from 0 to 3 with increasing severity of alterations made to the code, but also increasing potential gain. Up to -O3 is generally rather safe to use. But you should, of course, always check results against an un-optimized version: they might differ.
* '''-g''' produces code that can be debugged. -g and -On are not necessarily mutually exclusive, but optimization may make debugging difficult, because it alters the relationship between source code and executable. This is a good flag to have in the development stage of a program, but it is usually dropped later.
* '''-V''' (or '''-v''') prints the version of the compiler.
* '''-l'''name is used to bind in a library called libname.a (static) or libname.so (dynamic). This flag is used at the linking stage only.
* '''-L'''dirname is used in conjunction with -lname and lets the linker know where to look for libraries. "dirname" is a directory name such as /usr/local/lib.
* '''-R'''dirname is used to tell the program where to find dynamic libraries at runtime. A combined example is given below.
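As an illustration of the linking flags, the following sketch compiles with optimization and links against a hypothetical library '''libmylib.so''' installed under /usr/local/lib (the program and library names are made up for this example):
<pre>
ifort -O3 -c myprog.f90                                 # compile with optimization
ifort -o myprog.exe myprog.o -L/usr/local/lib -lmylib   # link against libmylib
</pre>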

There are many more flags. They are documented in the man pages (e.g. "man ifort" for the Intel Fortran compiler), as well as in the documentation for the compiler. Some compiler flags are only useful for parallel programs and will be discussed later.
|}
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |

== Documentation ==

The best way to get a quick list of compiler options is to use the "'''man pages'''". Just type "man compiler", where "compiler" stands for the name of the compiler you want to use, to get a long explanation of all the relevant options. This is not very user-friendly, but it is great for a quick look-up.
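For instance, for the two compiler families discussed above:
<pre>
man ifort   # options for the Intel Fortran compiler
man gcc     # options for the Gnu C compiler
</pre>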

* [https://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/fortran/win/main/main_cover_title.htm Documentation for the Intel Fortran Compiler can be found here].
* [https://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/ Documentation for the Intel C/C++ Compiler can be found here].
* [https://gcc.gnu.org/onlinedocs/ All Gnu compilers are documented here.]

== Help ==

* If you have questions that you can't resolve by checking the documentation, [mailto:cac.help@queensu.ca send email to cac.help@queensu.ca].
* If you want to start a larger project that involves making code executable on parallel machines, you may want to get in touch even before you start, and we can point you in the right direction.
| + | |
− | | + | |
− | We do not recommend to have both versions set up simultaneously.
| + | |
− | | + | |
− | == Compiling MPI code ==
| + | |
− | | + | |
− | The compilation of MPI programs requires a few compiler options to direct the compiler to the location of header files and libraries. Since these switches are always the same, they have been collected in a macro to avoid unnecessary typing. The macro is has an mpi prefix before the normal compiler name. The commands are '''mpiifort''' for the Intel Fortran compiler, '''mpiicc''' for the gnu C compilers, respectively. For instance, if a serial C program is compiled by
| + | |
− | | + | |
− | <pre>gcc -O3 -c test.c</pre>
| + | |
− | | + | |
− | the corresponding parallel (MPI) program is compiled (using gnu compiler) by
| + | |
− | | + | |
− | <pre>mpicc -xO3 -c test_mpi.c</pre>
| + | |
− | | + | |
− | In the linking stage, the usage of '''mpi*''' macros also includes the proper specification of the MPI libraries. For example, the above MPI program should be linked with something like this:
| + | |
− | | + | |
− | <pre>mpicc -o test_mpi.exe test_mpi.o</pre>
| + | |
− | | + | |
− | Compiling and linking may also be combined by omitting the ''-c'' option and including the naming option (''-o'') in the compilation line.
| + | |
− | | + | |
− | Here are the corresponding MPI macros for the 6 commonly used compilers on our systems:
| + | |
− | | + | |
− | {| class="wikitable sortable" border="1" cellpadding="2" cellspacing="0"
| + | |
− | |'''Language'''
| + | |
− | |'''Intel'''
| + | |
− | |'''gnu'''
| + | |
− | |-
| + | |
− | |''Fortran''
| + | |
− | | mpiifort
| + | |
− | | mpif77, mpif90, mpifort
| + | |
− | |-
| + | |
− | |''C''
| + | |
− | | mpiicc
| + | |
− | | mpicc
| + | |
− | |-
| + | |
− | |''C++''
| + | |
− | | mpiicc, mpiicpc
| + | |
− | | mpicxx
| + | |
− | |}
| + | |
− | | + | |
− | == Running MPI programs ==
| + | |
− | | + | |
− | To run MPI programs, a special Runtime Environment is required. This includes commands for the control of multi-process jobs.
| + | |
− | | + | |
− | '''mpirun''' is used to start a multi-process run of a program. This required to run MPI programs. The most commonly used command line option is '''-np''' to specify the number of processes to be started. For instance, the following line will start the program ''test_mpi.exe'' with 9 processes:
| + | |
− | | + | |
− | <pre>mpirun -np 9 test_mpi.exe</pre>
| + | |
− | | + | |
− | The mpirun command offers additional options that are sometimes useful or required. Most tend to interfere with the scheduling of jobs in a multi-user environment such as ours and should be used with caution. Please consult the man pages for details.
| + | |
− | | + | |
− | Note that the usage of [[HowTo:Scheduler|a scheduler]] is mandatory for production jobs on our system. This option is therefore used frequently. For a details about Gridengine and jobs submission on our machines and clusters, [[HowTo:Scheduler|go here]].
| + | |
− | |}
| + | |
− | | + | |
− | {| style="border-spacing: 8px;"
| + | |
− | | valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#f7f7f7; border-radius:7px" |
| + | |
− | | + | |
− | == More Information ==
| + | |
− | | + | |
− | As already pointed out, this FAQ is not an introduction to MPI programming. The standard reference text on MPI is:
| + | |
− | | + | |
− | Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra: <br>
| + | |
− | [http://www.amazon.com/MPI-Complete-Reference-2--set/dp/0262692163/ref=sr_1_1?s=books&ie=UTF8&qid=1409163940&sr=1-1&keywords=MPI+-+The+complete+reference MPI - The Complete Reference (2nd edition)], The MIT Press, Cambridge, Massachusetts, 2000;<br>
| + | |
− | 2 volumes, ISBN 0-262-69215-5 and 0-262-69213-3
| + | |
− | | + | |
− | This text specifies all MPI routines and concepts, and includes a large number of examples. Most people will find it sufficient for all their needs.
| + | |
− | | + | |
− | [http://www.mhpcc.edu/training/workshop/mpi/MAIN.html A quite good online tutorial for MPI programming] can be found at the Maui HPCC site.
| + | |
− | | + | |
− | There is also an [http://www.mpi-forum.org/ official MPI webpage] which contains the standards documents for MPI and gives access to the MPI Forum.
| + | |
− | | + | |
− | We are conducting [[Training:Workshops|Workshops on a regular basis]], some devoted to MPI programming. They are announced on [http://caca.queensu.ca our web site]. We might see you there sometime soon.
| + | |
− | | + | |
− | == Some Tools ==
| + | |
− | | + | |
− | Standard debugging and profiling tools such as Sun Studio are designed for serial or multi-threaded programs. They do not handle multi-process runs very well.
| + | |
− | | + | |
− | Quite often, the best way to check the performance of an MPI program is timing it by insertion of suitable routines. MPI supplies a "wall-clock" routine called ''MPI_WTIME()'', that lets you determine how much actual time was spent in a specific segment of your code. An other method is calling the subroutines ''ETIME'' and ''DTIME'', which can give you information about the actual CPU time used. However, it is advisable to carefully read the documentation before using them with MPI programs. In this case, refer to the [http://docs.oracle.com/cd/E19205-01/819-5259/ Sun Studio 12: Fortran Library Reference].
| + | |
− | | + | |
− | We also provide a package called the [[Software:HWT|HPCVL Working Template (HWT)]], which was created by Gang Liu. The HWT provides 3 main functionalities:
| + | |
− | | + | |
− | * '''Maintenance of multiple versions''' of the same code from a single source file. This is very useful, if your MPI code is based on a serial code that you want to convert.
| + | |
− | * '''Automatic Relative Debugging''' which allows you to use pre-existing code (for example the serial version of your program) as a reference when checking the correctness of your MPI code.
| + | |
− | * '''Simple Timing''' which is needed to determine bottlenecks for parallelization, to optimize code, and to check its scaling properties.
| + | |
− | | + | |
− | The HWT is based on libraries and script files. It is easy to use and portable (written largely in Fortran). Fortran, C, C++, and any mixture thereof are supported, as well as MPI and OpenMP for parallelism. [http://hpcvl.org/sites/default/files/hpcvl%20HWTmanual_1.pdf Documentation of the HWT is available]. The package is installed on our clusters in /opt/hwt.
| + | |
− | | + | |
− | == Help ==
| + | |
− | [mailto:cac.help@queensu.ca Send email to cac.help@queensu.ca]. We have scientific programmers on staff who will probably be able to help you out. Of course, we can't do the coding for you but we do our best to get your code ready for parallel machines and clusters. | + | |
|}
This is an introduction to the Fortran, C, and C++ compilers used on our clusters and servers. It is meant to give the user a basic idea about the usage of the compilers and about code optimization options.