Difference between revisions of "HowTo:gaussian"

From CAC Wiki

Revision as of 16:07, 29 April 2016

Gaussian is arguably the most commonly used computational quantum-chemistry program. It offers a wide range of features in the field of computational chemistry, ranging from atomic and molecular structure to thermochemical computations. A list of these features can be found here.

Features

Gaussian performs electronic-structure calculations and standard quantum-chemical calculations. Among the methods available are simple molecular mechanics (such as the Amber force field), semi-empirical methods (such as CNDO), Hartree-Fock (restricted and unrestricted), MPn (Møller-Plesset perturbation theory of order n=2,3,4), CI (Configuration-Interaction), CC (Coupled-Cluster), Multi-configurational SCF (such as CAS-SCF) and various DFT (Density-Functional Theory) methods. Specific to Gaussian are high-accuracy energy methods (G2, CBS). It can be used to obtain electronic properties, molecular geometries, vibrational frequencies, orbitals, reaction profiles, and much more. Check out the capabilities here.

Location of the program and setup

The program resides in /opt/gaussian. Multiple versions and revisions of the program are located in different sub-directories. The name of the root executable is g09.

The source code of Gaussian is not publicly accessible since Gaussian is a licensed product. However, Gaussian grants the permission to alter the code under certain conditions. If you want to do so, contact us to learn more. You are not allowed to copy the executable or any part of the distribution onto your local machine.

At present, we are using the usepackage system to set up Gaussian. This means that typing

 use g09 

automatically adds all required settings to your shell setup.

Scratch files

One of the settings is the environment variable GAUSS_SCRDIR which is required to redirect the temporary files that Gaussian uses to the proper scratch space, presently

export GAUSS_SCRDIR=/scratch/hpcXXXX

where hpcXXXX stands for your username. If for some reason Gaussian does not terminate normally (e.g. a job gets cancelled), it leaves behind large scratch files which you may have to delete manually. To check if such files exist, type

ls -lt /scratch/hpcXXXX

Once you have determined that the scratch files are no longer needed (because the program that used them is not running any more), you can delete them by typing

rm /scratch/hpcXXXX/Gau-*

Cleaning up the scratch space is the user's responsibility. If it is not done regularly, it can cause jobs to terminate, and much work to be lost.
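The clean-up steps above can be sketched as a pair of shell functions. This is a minimal illustration, not a supplied tool: the function names list_gau_scratch and clean_gau_scratch are hypothetical, and the check for a running job only looks for a process named g09 owned by you on the current host, so it will not notice jobs running on other nodes.

```shell
#!/bin/bash
# Hypothetical helpers; only the Gau-* pattern and the scratch path come from this page.

# List leftover Gaussian scratch files (Gau-*) at the top level of a directory.
list_gau_scratch() {
    local dir="$1"
    find "$dir" -maxdepth 1 -name 'Gau-*' -type f 2>/dev/null | sort
}

# Delete those files, but refuse if one of your g09 processes is running here.
clean_gau_scratch() {
    local dir="$1"
    if pgrep -u "$(id -un)" -x g09 >/dev/null 2>&1; then
        echo "g09 is still running; not deleting anything in $dir" >&2
        return 1
    fi
    list_gau_scratch "$dir" | while IFS= read -r f; do
        rm -f -- "$f"
    done
}
```

A typical call would be clean_gau_scratch "$GAUSS_SCRDIR", run after listing the files with list_gau_scratch and confirming that none of them belongs to a job that is still active.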

Running Gaussian from a command line

To run Gaussian on our systems, you have to belong to the user group g98 (it is called that for historical reasons, but it applies to all versions of Gaussian). To be included in this group, you need to read our license agreement and sign a statement. Once you are a member, you can access the executables.

A computation is performed by preparing an input file and piping it to the standard input of the program g09. Standard output should be captured in a log file. We suggest you use the extension .g09 for input files and .log for results.

Interactively, the command line to invoke Gaussian is thus:

 g09 < test.g09 > test.log

This will only work if you are a member of the g98 group and have set the environment correctly. Note that the interactive execution of Gaussian is only meant for test runs.
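The two preconditions just mentioned (membership in g98 and a correctly set environment) can be checked before a test run. The sketch below assumes that "set the environment correctly" at least means g09 is found on your PATH; the function names has_group and gaussian_preflight are hypothetical.

```shell
#!/bin/bash
# Return 0 if the group name in $1 appears in the space-separated list in $2.
has_group() {
    local wanted="$1" g
    for g in $2; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# Check g98 membership and the presence of g09 on the PATH.
gaussian_preflight() {
    has_group g98 "$(id -Gn)" || { echo "not a member of group g98" >&2; return 1; }
    command -v g09 >/dev/null 2>&1 || { echo "g09 not found; try 'use g09'" >&2; return 1; }
    echo "ready to run g09"
}
```

Calling gaussian_preflight prints a short diagnostic and returns a non-zero status if either condition fails, so it can be combined with the run itself, e.g. gaussian_preflight && g09 < test.g09 > test.log.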

Gaussian input files are explained in the "User's Reference". It is impossible to give an outline here. Sample files can be found in

/opt/gaussian/g09/bsd/logs

Note: It is absolutely essential to have a good idea about the size and complexity of your calculations before you start a Gaussian job. Many of the methods mentioned above have terrible scaling properties, i.e. the computational cost grows very quickly with the number of electrons, degrees of freedom, or number of basis functions used. We suggest you start with a small basis set and a cheap method, and then slowly increase those parameters.

Submitting (parallel) Gaussian jobs

If you want to run Gaussian on several processors on our machines, you have to include a line in your input file:

%nproc=8

where we assume that you want to use 8 processors (cores, threads).
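To show where the %nproc line sits in an input file, here is a small sketch that writes a complete test input from the shell. The molecule (water), the method (HF/STO-3G), the title line, and the function name write_sample_input are illustrative choices made for this example, not recommendations; the file name sample.g09 matches the one used in the submission script in this section.

```shell
#!/bin/bash
# Write a minimal Gaussian input file.
#   $1 = output file name, $2 = number of processors for the %nproc line
write_sample_input() {
    local file="$1" nproc="$2"
    # Layout: Link 0 line (%nproc), route line (#), blank line, title, blank line,
    # charge and multiplicity, Cartesian coordinates in Angstrom, closing blank line.
    cat > "$file" <<EOF
%nproc=${nproc}
# HF/STO-3G

Water single-point energy (illustrative test job)

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF
}

write_sample_input sample.g09 8
```

Keeping the processor count as a parameter makes it easier to use the same number in the input file and in the job script.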

It is mandatory to submit a Gaussian job script through our scheduling software (see our Scheduler Help File for details).

Here is a "bare bones" sample of a Gaussian submission script:

#! /bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -V
#$ -M hpcXXXX@localhost
#$ -m be
#$ -o STD.out
#$ -e STD.err
#$ -pe gaussian.pe 8
. /opt/gaussian/setup.sh
g09 < sample.g09 > sample.log
  • In this script, an email address for notifications is specified in the #$ -M line. We suggest using "hpcXXXX@localhost" (hpcXXXX stands for your username). Place a file .forward containing your actual email address into your home directory.
  • The -o and -e lines are used to tell the system where to write "standard output" and "standard error", i.e. the screen output.
  • The #$ -pe gaussian.pe 8 line specifies the number of processors the scheduler will allocate. It is crucial to choose the same number as specified in the %nproc= line of the input file.
  • The line . /opt/gaussian/setup.sh does an internal setup for the scheduler and must be included.

Note that we are using a special parallel environment "gaussian.pe" for Gaussian submissions. This will schedule all Gaussian jobs to a dedicated node. The line that sources "setup.sh" redirects I/O from/to scratch files to a fast local disk. This greatly increases Gaussian performance in some cases and automatically removes scratch files when they are no longer needed.

Important: Please do not use any PE other than gaussian.pe for Gaussian job submissions and make sure you include the "setup.sh" line.
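Because the processor count must be identical in the input file and in the job script, it can be worth checking both before submitting. The following is a minimal sketch under the assumption that the input file contains a line of the form %nproc=N and the script a line of the form #$ -pe gaussian.pe N, as in the example above; the function names are hypothetical.

```shell
#!/bin/bash
# Print the processor count from the first "%nproc=" line of a Gaussian input file.
input_nproc() {
    sed -n 's/^%[Nn][Pp][Rr][Oo][Cc][A-Za-z]*=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print the slot count from the first "#$ -pe gaussian.pe N" line of a job script.
script_slots() {
    sed -n 's/^#\$ *-pe  *gaussian\.pe  *\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Compare the two numbers; $1 = input file, $2 = job script.
check_nproc_match() {
    local in_n sc_n
    in_n=$(input_nproc "$1")
    sc_n=$(script_slots "$2")
    if [ -n "$in_n" ] && [ "$in_n" = "$sc_n" ]; then
        echo "OK: both request $in_n processors"
    else
        echo "MISMATCH: input has '${in_n}', script has '${sc_n}'" >&2
        return 1
    fi
}
```

For example, check_nproc_match sample.g09 g09.sh returns a non-zero status if the numbers differ or if either line is missing, so it can guard a submission in the form check_nproc_match sample.g09 g09.sh && qsub g09.sh.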

The script (let's call it g09.sh) is submitted by the qsub command:

qsub g09.sh

This must be done from the working directory, i.e. the directory that contains the input file and is supposed to contain the output. Also, we assume you are properly set up for Gaussian usage before submitting the script, for instance by typing "use g09".

There is an easier way to do this: we supply a small Perl script called GaussSubmit that can be called directly. It asks a few basic questions, such as the name of the job to be submitted and the number of processes to be used. Simply type

GaussSubmit

and answer the questions. The script expects a Gaussian input file with "file extension" .g09 to be present and will do everything else automatically. This is meant for simple Gaussian job submissions. More complex job submissions are better done manually.

How is Gaussian licensed?

Gaussian is a licensed program. The license held by HPCVL is limited to the HPCVL-operated computers at our main site. That means that any user of HPCVL can use the program on our machines (but nowhere else), whether they are located at Queen's or not.

To be included in the Gaussian user group named "g98", HPCVL requires users of Gaussian to sign a statement confirming that they have been informed about the terms of the license. Please fax the completed statement to (613) 533-2015 or scan and email it to admin@hpcvl.org.

Where can I get more detailed information?

This is the most important question treated here. To learn the basics about Gaussian input and output, refer to the Gaussian 09 User's Reference. For templates, and to get many examples, check out /opt/gaussian/g09/bsd/examples.

There is a Gaussian web page with lots of information. Gaussian also operates a help line for licensed users. Send email to help@gaussian.com, but do not expect a quick answer, as they get a lot of requests.

For hardcore computational chemists, there are the "Gaussian 09 Programmer's Reference" and the IOPs Reference, which are useful if you want to tinker with default settings and internal parameters, or even want to write some subroutines of your own. These can be purchased from Gaussian, Inc.

You can also send email to help@hpcvl.org or contact one of our user support staff directly.