HowTo:matlab

From CAC Wiki
Revision as of 17:54, 30 May 2016 by Hasch (Talk | contribs) (Scratch files)

MATLAB

This is a short help file on using the high-level technical computing language MATLAB on our machines.

Important: The Centre for Advanced Computing does not currently have a stand-alone Matlab license. Users who wish to use MATLAB on our systems therefore have to provide a valid license themselves. This may require a connection to an external license server, or the purchase of a local license file.

The software is only made available to persons who belong to a specific Unix group. See details below.

Features

From the Matlab web page: "MATLAB® is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. [...] You can use MATLAB in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions, available separately) extend the MATLAB environment to solve particular classes of problems in these application areas."

Here is a list of features, also from the webpage:

  • High-level language for technical computing
  • Development environment for managing code, files, and data
  • Interactive tools for iterative exploration, design, and problem solving
  • Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
  • 2-D and 3-D graphics functions for visualizing data
  • Tools for building custom graphical user interfaces
  • Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel

The full Matlab program and most Toolboxes are installed on our systems, but the license for their usage is supplied by the user.

Location of the program and setup

The present version of Matlab is R2014a for Linux. The programs in the Matlab package are located in the directory /opt/matlab. Matlab processes should only be run in batch mode through our Grid Engine scheduler.

Note that many of the Matlab Toolboxes are installed on our clusters. Users who have a pre-existing Matlab license can submit and run serial Matlab jobs on our cluster nodes. We do not hold a DCS (Distributed Computing Server) license and therefore cannot run parallel MATLAB jobs.

You are required to sign a statement if you want to use MATLAB. Once we have confirmed this statement, you will be made a member of the Unix group "matlab", which enables you to access the software. Contact us if you are unsure whether you will be able to run Matlab on our system.

To set the proper environment variables and include the directories with the binaries in your PATH, you should type:

use matlab

To avoid having to do this before every Matlab session, you may also include this command in your setup file, e.g. .bash_profile. Finally, you may have to point the environment variable LM_LICENSE_FILE to the proper license file:

export LM_LICENSE_FILE=lic_file

Here lic_file stands for the full path and file name of your license file. You need to make sure that the license file cannot be accessed by anyone but you. This command may also be placed into a setup file to avoid retyping.
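The two steps above (restricting access to the license file and exporting LM_LICENSE_FILE) could be combined in a small shell function, for example in .bash_profile. This is only a sketch: the function name set_matlab_license and the example path are hypothetical and not part of our installation.

```shell
# Hypothetical helper (name is an assumption): restrict a license file
# to its owner and export its path for the license manager.
set_matlab_license() {
    lic_file="$1"
    if [ ! -f "$lic_file" ]; then
        echo "license file not found: $lic_file" >&2
        return 1
    fi
    chmod 600 "$lic_file" || return 1   # owner read/write only
    export LM_LICENSE_FILE="$lic_file"
}

# Example call (placeholder path; replace with the full path of your own file):
# set_matlab_license "$HOME/matlab/license.lic"
```

The function refuses to export a path that does not exist, so a typo in the file name shows up at login rather than when a job starts.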

Since the Matlab licenses are supplied by the users and are therefore specific to them, it is possible that a special set-up is required and that the simple "use" command will not work. In this case, please get in touch and we will modify the set-up to work with your specific case.

Running Gamess from a command line

To run Gamess, a script file rungms is used. This file resides in the same directory as the gamess.00.x executable. Assuming that the directory containing the script and the executable is in your PATH, all you need to do is type

rungms case_name 01 n_procs

where case_name is the name of the input file (file extension is assumed to be .inp and must not be specified), and n_procs stands for the number of processes to be used in a parallel Gamess run. If n_procs=1 a serial run will be performed.

Note: It is absolutely essential to have a good idea about the size and complexity of your calculations before you start a Gamess job. Many of the methods have terrible scaling properties, i.e. the computational cost grows very quickly with the number of electrons, degrees of freedom, or number of basis functions used. We suggest you start with a small basis set and a cheap method, and then slowly increase those parameters.

Like most programs, Gamess requires an input (.inp) file that describes the system (usually a molecule) for which the calculation will be performed, specifies the level of calculation (e.g., CISD), and provides other necessary information (starting orbitals, basis sets, required properties, etc.). The format of the input is considerably more demanding than the one required for Gaussian (another widely used electronic-structure program), and much less information is hidden in defaults. This makes Gamess a very flexible program, but increases the risk of doing something wrong. Careful study of example input files and the documentation is required to run Gamess successfully. This is particularly true for CI or CAS-SCF calculations.

Once an input file is prepared, you will have to decide whether to run Gamess in serial or in parallel mode. Gamess supports the use of multiple processors. However, the scaling (i.e., the efficiency of parallel processing) varies with the type of calculation and the system. We suggest you perform a small test calculation of the same type as your production calculation (e.g., with a minimal basis set), and rerun it several times with a varying number of processors. Compare the timings and use the maximum number of processors that yields acceptable scaling for your production calculation.
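One way to compare such timings is to compute the speedup and the parallel efficiency relative to the serial run. The sketch below assumes you have noted the wall-clock times yourself. The function name parallel_efficiency and the numbers in the example are made up for illustration, and what counts as "acceptable" efficiency is left to you.

```shell
# Hypothetical helper: print speedup and parallel efficiency.
# Usage: parallel_efficiency <serial_seconds> <n_procs> <parallel_seconds>
parallel_efficiency() {
    awk -v t1="$1" -v n="$2" -v tn="$3" 'BEGIN {
        speedup = t1 / tn                 # how much faster than the serial run
        printf "procs=%d speedup=%.2f efficiency=%.2f\n", n, speedup, speedup / n
    }'
}

# Example with made-up timings: 1200 s serial, 400 s on 4 processes.
parallel_efficiency 1200 4 400
# prints: procs=4 speedup=3.00 efficiency=0.75
```

An efficiency close to 1 means the additional processes are well used. If it drops sharply from one test to the next, the smaller processor count is probably the better choice for production.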

Submitting (parallel) Gamess jobs

Gamess has to be run via the Grid Engine, which is a load-balancing program that submits batch jobs to low-load processors on the cluster. A Gamess job is submitted to the Grid Engine in the form of an execution script. A reasonable execution script for Gamess looks like this:

#$ -S /bin/bash
#$ -q abaqus.q
#$ -l qname=abaqus.q
#$ -cwd
#$ -V
#$ -m be
#$ -M {your email address}
#$ -o {standard output file}
#$ -e {standard error file}
#$ -pe shm.pe {number of processors}
rungms {case name} 01 $NSLOTS

In this template, just replace all entries enclosed in curly brackets by the proper values (do not retain the brackets). The lines starting with "#$ -o" and "#$ -e" define the standard output and standard error files, respectively.

The lines containing "abaqus.q" are there to force execution on the Linux (SW) cluster. Although we have a running version of the code on the legacy Solaris platform (M9000 server), we recommend retaining these lines and executing under Linux, where performance is likely to be better. In some special cases (for instance, very large memory requirements), it may be better to execute under Solaris. In that case you should remove the two lines that contain "abaqus.q".

Note that all lines starting with "#$" are directives for the Grid Engine, and will be interpreted when the script is submitted to that program. The "#$ -V" and "#$ -cwd" directives instruct the shell that executes the script to inherit the environment of the calling shell (for instance the path), and set the starting directory to the current working directory, respectively.

You also need to specify the name of the input file just like in an interactive run. The input file must have the file extension .inp and reside in the same directory as the Grid Engine script. The extension should not be specified, i.e. if the case name is "mycase", the program will read from a file called "mycase.inp".
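If you start from the name of an input file rather than from the case name, the extension can be stripped with standard shell parameter expansion. The helper name case_name_of below is hypothetical and only serves as an illustration.

```shell
# Hypothetical helper: derive the Gamess case name from an input file name
# by dropping any leading directory and a trailing ".inp".
case_name_of() {
    name=$(basename "$1")
    printf '%s\n' "${name%.inp}"
}

case_name_of mycase.inp
# prints: mycase
```

A name without the .inp extension is passed through unchanged, so the helper can be applied regardless of how the file name was typed.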

The number of processes is specified in the "#$ -pe" line, which instructs the Grid Engine to allocate the proper number of CPUs for your run. You do not have to specify it separately in the rungms command line, because Grid Engine sets the environment variable $NSLOTS properly.

Assuming your Grid Engine script is called "gamess.sh", it is submitted to the Grid Engine by typing

qsub gamess.sh

No further specification of the output is necessary, since this is done inside the script and handled by the Grid Engine.
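When many cases have to be submitted, the template shown earlier can be filled in by a small shell function instead of by hand. The following is a sketch under assumptions: the function name write_gamess_script is hypothetical, and the output and error file names (case name plus .out and .err) are an arbitrary choice, not a site convention.

```shell
# Hypothetical helper: write a Grid Engine script for one Gamess case
# to standard output, following the template from this page.
# Usage: write_gamess_script <case_name> <email> <n_procs> > gamess.sh
write_gamess_script() {
    case_name="$1"
    email="$2"
    n_procs="$3"
    # "\$" keeps a literal dollar sign; the other variables are expanded now.
    cat <<EOF
#\$ -S /bin/bash
#\$ -q abaqus.q
#\$ -l qname=abaqus.q
#\$ -cwd
#\$ -V
#\$ -m be
#\$ -M $email
#\$ -o $case_name.out
#\$ -e $case_name.err
#\$ -pe shm.pe $n_procs
rungms $case_name 01 \$NSLOTS
EOF
}

# Example (placeholder values), followed by the usual submission:
# write_gamess_script mycase user@example.com 4 > gamess.sh
# qsub gamess.sh
```

The $NSLOTS variable is deliberately left unexpanded in the generated script, since the Grid Engine sets it when the job runs.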

Licensing

Gamess is a licensed program although it is distributed freely. The license held by the Centre for Advanced Computing is limited to our computers at our main site. That means that any of our users can use the program on our machines (but nowhere else), whether they are located at Queen's or not. You are not allowed to copy the executable or any part of the distribution onto your local machine. However, you can easily obtain the program yourself. See the GAMESS source code distribution page. Gamess is a very portable program, and will run on PCs (Windows), on a Mac, a variety of Unix platforms (including Linux), and your cellphone (just kidding).

Before you can access the Gamess executables and run the program, you have to read the license agreement that exists between the Gamess distributors and HPCVL. You also have to sign a statement that you have done so, and return it to us (fax to (613) 533-2015 or scan/email to cac.admin@queensu.ca).

Where can I get more detailed information?

Gamess is not a simple program to run. It requires careful study of the input format, and a certain degree of knowledge about the "nuts and bolts" of computational quantum chemistry.