Difference between revisions of "HowTo:matlab"

From CAC Wiki
(Features)
(Location of the program and setup)
Line 25: Line 25:
 
== Location of the program and setup ==
 
== Location of the program and setup ==
  
The program resides in '''/opt/gamess''' and is called '''gamess.01.x'''. You also find some test examples in the program directory, which are useful to get an idea of the input format for the program. You are '''not''' allowed to copy the executable or any part of the distribution onto your local machine. However you can easily obtain the program yourself. See the [http://www.msg.ameslab.gov/gamess/download.html GAMESS source code distribution page].
+
The present version of Matlab is R2014a for Linux. The programs in the Matlab package are located in the directory /opt/matlab. Matlab processes should only be run in batch mode through our Gridengine scheduler.
  
Unlike other programs, no special setup is needed to run Gamess. All environment variables etc. are set with an execution script that will be described in the next section. However, it is a good idea to put the directory with the Gamess program into the path, i.e. set the PATH environment variable. This is best done through the usepackage utility, simply by typing <pre>use gamess </pre>
+
Note that many of the Matlab Toolboxes are installed on our clusters. Users who have a pre-existing Matlab license can submit and run '''serial''' Matlab jobs on our cluster nodes. '''We do not hold a so-called DCS (Distributed-Compute Server) license''' and can therefore not run parallel MATLAB jobs.
 +
 
 +
It is [http://www.hpcvl.org/sites/default/files/hpvcl-matlab-statement.pdf required that you sign a statement] if you want to use MATLAB. We will confirm this statement, and you will then be made a member of a Unix group "matlab", which enables you to access the software. Contact us if you are in doubt of whether you will be able to run Matlab on our system.
  
 
== Scratch files ==
 
== Scratch files ==

Revision as of 17:50, 30 May 2016

MATLAB

This is a short help file on using the high-level technical computing language "Matlab" on our machines.

Important: The Centre for Advanced Computing does not currently have a stand-alone Matlab license. This means that users who wish to use MATLAB on our systems have to provide a valid license. This may require a connection to an external license server, or the purchase of a local license file.

The software is only made available to persons who belong to a specific Unix group. See details below.

Features

From the Matlab web page: "MATLAB® is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. [...] You can use MATLAB in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions, available separately) extend the MATLAB environment to solve particular classes of problems in these application areas."

Here is a list of features, also from the webpage:

  • High-level language for technical computing
  • Development environment for managing code, files, and data
  • Interactive tools for iterative exploration, design, and problem solving
  • Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
  • 2-D and 3-D graphics functions for visualizing data
  • Tools for building custom graphical user interfaces
  • Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
  • The full Matlab program and most Toolboxes are installed on our systems, but the license for their usage is supplied by the user.

Location of the program and setup

The present version of Matlab is R2014a for Linux. The programs in the Matlab package are located in the directory /opt/matlab. Matlab processes should only be run in batch mode through our Grid Engine scheduler.

Note that many of the Matlab Toolboxes are installed on our clusters. Users who have a pre-existing Matlab license can submit and run serial Matlab jobs on our cluster nodes. We do not hold a so-called DCS (Distributed-Compute Server) license and therefore cannot run parallel MATLAB jobs.
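A serial Matlab batch job could be set up roughly as follows. This is only a minimal sketch, not a tested site recipe: the helper name make_matlab_job, the script name "myscript", the output file names, and the launcher path /opt/matlab/bin/matlab are assumptions (the text above only states that the package is located in /opt/matlab). The options -nodisplay, -nosplash, -singleCompThread and -r are standard Matlab startup options on Linux.

```shell
#!/bin/bash
# Hypothetical helper: print a Grid Engine script for a serial Matlab run.
# $1 = name of the Matlab script without the .m extension (assumed to be
#      in the directory from which the job is submitted)
make_matlab_job() {
    local script_name="$1"
    cat <<EOF
#!/bin/bash
#\$ -S /bin/bash
#\$ -cwd
#\$ -V
#\$ -o ${script_name}.out
#\$ -e ${script_name}.err
# Assumed location of the Matlab launcher below /opt/matlab.
/opt/matlab/bin/matlab -nodisplay -nosplash -singleCompThread -r "${script_name}; exit"
EOF
}

# Example: write the job script to a file, then submit it with qsub.
# make_matlab_job myscript > matlab_job.sh
# qsub matlab_job.sh
```

The -singleCompThread option keeps Matlab on a single computational thread, which fits the restriction to serial jobs. Note that if the Matlab script stops with an error, the trailing "exit" is never reached, so the script itself should handle errors.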

You are required to sign a statement if you want to use MATLAB. We will confirm this statement, and you will then be made a member of the Unix group "matlab", which enables you to access the software. Contact us if you are unsure whether you will be able to run Matlab on our system.
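Once the statement has been confirmed, membership in the "matlab" group can be checked from a login shell. The following is a small sketch; the helper name in_group is made up for illustration, and it relies only on the standard "id -nG" command, which lists the group names of the current user.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the group named in $1 appears in the
# space-separated group list $2 (defaults to the current user's groups).
in_group() {
    local wanted="$1"
    local group_list="${2:-$(id -nG)}"
    local g
    for g in $group_list; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# Example: report whether the current account is in the "matlab" group.
if in_group matlab; then
    echo "matlab group: yes"
else
    echo "matlab group: no"
fi
```

Group changes usually take effect only in a new login session, so a fresh login may be needed before the check succeeds.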

Scratch files

When you run Gamess, the scratch space directory is set to /scratch/$USER, and all temporary files and intermediate output are placed in that directory. The program also requires a local temporary directory called scr directly below your home directory, i.e. $HOME/scr. Make sure that you move files that you want to keep from there before running Gamess again with the same case_name. The second run requires that some temporary files be removed first, and it will fail if they are still present in the scratch directory. For instance, if your username is "hpc9876", you will need "/scratch/hpc9876" and "/home/hpc9876/scr". Note that the former is created automatically when you obtain an account, but the latter has to be created by you explicitly.
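The directory handling described above can be sketched in a few lines of shell. The helper names ensure_scr_dir and stash_old_files are hypothetical, and the sketch assumes that leftover files of a run are named after the case (case_name.*); check the actual file names in your scratch directory before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: create the "scr" directory below a home directory.
# $1 = home directory (defaults to $HOME)
ensure_scr_dir() {
    local home_dir="${1:-$HOME}"
    mkdir -p "$home_dir/scr"
}

# Hypothetical helper: move leftover files of an earlier run out of a
# scratch directory so that a re-run with the same case name can start.
# Assumes the leftover files are named <case_name>.<something>.
# $1 = case name, $2 = scratch directory, $3 = directory for kept files
stash_old_files() {
    local case_name="$1" scratch_dir="$2" keep_dir="$3"
    local f
    mkdir -p "$keep_dir"
    for f in "$scratch_dir/$case_name".*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        mv "$f" "$keep_dir/"
    done
}

# Example for case "mycase" (commented out so nothing is moved by accident):
# ensure_scr_dir
# stash_old_files mycase "/scratch/$USER" "$HOME/mycase_old"
# stash_old_files mycase "$HOME/scr" "$HOME/mycase_old"
```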

Running Gamess from a command line

To run Gamess, a script file rungms is used. This file resides in the same directory as the gamess.01.x executable. Assuming that this directory is in your path, all you need to do is type

rungms case_name 01 n_procs

where case_name is the name of the input file (the file extension is assumed to be .inp and must not be specified), and n_procs stands for the number of processes to be used in a parallel Gamess run. If n_procs=1, a serial run will be performed.
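Because the extension must be left off, a small check before calling rungms can catch the two most common slips. This is an illustrative sketch only; check_case is a made-up helper name and is not part of the Gamess distribution.

```shell
#!/bin/bash
# Hypothetical helper: check a case name before passing it to rungms.
# Fails if the name carries the .inp extension or if <case_name>.inp is
# missing from the directory given as $2 (defaults to the current one).
check_case() {
    local case_name="$1" dir="${2:-.}"
    case "$case_name" in
        *.inp)
            echo "give the case name without .inp" >&2
            return 1 ;;
    esac
    if [ ! -f "$dir/$case_name.inp" ]; then
        echo "input file $dir/$case_name.inp not found" >&2
        return 1
    fi
    return 0
}

# Example: serial run of "mycase" only if the input file is in place.
# check_case mycase && rungms mycase 01 1
```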

Note: It is absolutely essential to have a good idea about the size and complexity of your calculations before you start a Gamess job. Many of the methods have terrible scaling properties, i.e. the computational cost grows very quickly with the number of electrons, degrees of freedom, or number of basis functions used. We suggest you start with a small basis set and a cheap method, and then slowly increase those parameters.

Like most programs, Gamess requires an input (.inp) file that describes the system (usually a molecule) for which the calculation will be performed, specifies the level of calculation (e.g. CISD), and provides other necessary information (starting orbitals, basis sets, required properties, etc.). The format of the input is considerably more demanding than the one required for Gaussian (another widely used electronic-structure program), and much less information is hidden in defaults. This makes Gamess a very flexible program, but increases the risk of doing something wrong. Careful study of example input files and the documentation is required to run Gamess successfully. This is particularly true for CI or CAS-SCF calculations.

Once an input file is prepared, you have to decide whether to run Gamess in serial or in parallel mode. Gamess supports the use of multiple processors. However, the scaling (i.e. the efficiency of parallel processing) varies with the type of calculation and the system. We suggest you perform a small test calculation of the same type as your production calculation (e.g. with a minimal basis set), and rerun it several times with a varying number of processors. Compare the timings and use the maximum number of processors that yields acceptable scaling for your production calculation.
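One common way to compare such timings is the parallel efficiency T(1) / (n * T(n)), where T(n) is the wall time with n processes. The sketch below computes it with awk; the helper name parallel_efficiency and the timings in the example are made up for illustration, and what counts as "acceptable" is left to your judgment.

```shell
#!/bin/bash
# Hypothetical helper: parallel efficiency of a test run.
# $1 = wall time with 1 process, $2 = wall time with n processes, $3 = n
# Prints T(1) / (n * T(n)) with two decimals; 1.00 means ideal scaling.
parallel_efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", t1 / (n * tn) }'
}

# Example with made-up timings in seconds (not measurements):
parallel_efficiency 100 30 4
parallel_efficiency 100 25 8
```

With these invented numbers the efficiency drops as processes are added, which is the kind of trend to look for when choosing the processor count.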

Submitting (parallel) Gamess jobs

Gamess has to be run via the Grid Engine, which is a load-balancing program that submits batch jobs to low-load processors on the cluster. To learn more about this program, click here. A Gamess job is submitted to the Grid Engine in the form of an execution script. A reasonable execution script for Gamess looks like this:

#$ -S /bin/bash
#$ -q abaqus.q
#$ -l qname=abaqus.q
#$ -cwd
#$ -V
#$ -m be
#$ -M {your email address}
#$ -o {standard output file}
#$ -e {standard error file}
#$ -pe shm.pe {number of processors}
rungms {case name} 01 $NSLOTS

In this template, just replace all entries enclosed in curly brackets with the proper values (do not retain the brackets). The lines starting with "#$ -o" and "#$ -e" define the standard output and standard error files, respectively.

The lines containing "abaqus.q" are there to force execution on the Linux (SW) cluster. Although we have a running version of the code on the legacy Solaris platform (M9000) server, we recommend retaining these lines and executing under Linux, because performance is likely to be better. In some special cases (for instance, very large memory requirements), it may be better to execute under Solaris. In that case you should remove the two lines containing "abaqus.q".

Note that all lines starting with "#$" are directives for the Grid Engine, and will be interpreted when the script is submitted to that program. The "#$ -V" and "#$ -cwd" lines instruct the executing shell of the script to inherit the environment of the calling shell (for instance the path), and to set the starting directory to the current working directory, respectively.

You also need to specify the name of the input file just like in an interactive run. The input file is supposed to have "file extension" .inp and reside in the same directory as the Grid Engine script. The extension should not be specified, i.e. if the case name is "mycase", the program will read in from a file called "mycase.inp".

The number of processes is specified in the "#$ -pe" line, which instructs the Grid Engine to allocate the proper number of CPUs for your run. You do not have to specify it separately in the rungms command line, because Grid Engine sets the environment variable $NSLOTS properly.

Assuming your Grid Engine script is called "gamess.sh", it is submitted to the Grid Engine by typing

qsub gamess.sh

No further specification of the output is necessary, since this is done inside the script and handled by the Grid Engine.
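Filling in the template by hand is easy to get wrong, so it can help to generate the script. The following sketch prints the template from this section with the placeholders filled in. The helper name make_gamess_job and the output file names (case name plus .out and .err) are assumptions for illustration; the directives themselves are taken unchanged from the template.

```shell
#!/bin/bash
# Hypothetical helper: fill in the Grid Engine template for a Gamess job.
# $1 = case name (without .inp), $2 = email address, $3 = number of processes
make_gamess_job() {
    local case_name="$1" email="$2" n_procs="$3"
    cat <<EOF
#\$ -S /bin/bash
#\$ -q abaqus.q
#\$ -l qname=abaqus.q
#\$ -cwd
#\$ -V
#\$ -m be
#\$ -M ${email}
#\$ -o ${case_name}.out
#\$ -e ${case_name}.err
#\$ -pe shm.pe ${n_procs}
rungms ${case_name} 01 \$NSLOTS
EOF
}

# Example: write the script for case "mycase" on 4 processes and submit it.
# make_gamess_job mycase user@example.com 4 > gamess.sh
# qsub gamess.sh
```

The backslash before each "$" inside the here-document keeps the shell from expanding it, so that "#$" and "$NSLOTS" appear literally in the generated script.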

Licensing

Gamess is a licensed program, although it is distributed freely. The license held by the Centre for Advanced Computing is limited to our computers at our main site. That means that any of our users can use the program on our machines (but nowhere else), whether they are located at Queen's or not. You are not allowed to copy the executable or any part of the distribution onto your local machine. However, you can easily obtain the program yourself. See the GAMESS source code distribution page. Gamess is a very portable program, and will run on IBM PCs (Windows), on a Mac, a variety of Unix platforms (including Linux), and your cellphone (just kidding).

Before you can access the Gamess executables and run the program, you have to read the license agreement that exists between the Gamess distributors and HPCVL. You also have to sign a statement that you have done so, and return it to us (fax to (613) 533-2015 or scan/email to cac.admin@queensu.ca).

Where can I get more detailed information?

Gamess is not a simple program to run. It requires careful study of the input format, and a certain degree of knowledge about the "nuts and bolts" of computational quantum chemistry.