Difference between revisions of "HowTo:nwchem"

(Gaussian)
(Submitting (parallel) NWChem jobs)
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
= NWChem =
 
= NWChem =
  
This is a short "Frequently Asked Questions" file on using the parallel electronic-structure code "NWChem" on systems at the Centre for Advanced Computing. This software uses MPI as a message-passing system and is (in principle) able to run on an arbitrary number of processors. Its ability to perform a very broad spectrum of molecular-structure calculations, ranging from CI to ab-initio molecular dynamics, makes it an interesting alternative to the standard electronic structure code Gaussian.  
+
This is a short help file on using the parallel electronic-structure code "NWChem" on systems at the Centre for Advanced Computing. This software uses MPI as a message-passing system and is (in principle) able to run on an arbitrary number of processors. Its ability to perform a very broad spectrum of molecular-structure calculations, ranging from CI to ab-initio molecular dynamics, makes it an interesting alternative to the standard electronic structure code Gaussian.  
 
{|  style="border-spacing: 8px;"
 
{|  style="border-spacing: 8px;"
 
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |  
 
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |  
 
== Features ==
 
== Features ==
  
NWChem is an electronic-structure code that is suitable to perform complex calculations on molecular structure. It was specifically designed to perform well on high-performance parallel computers. The installation on the SunFire cluster of HPCVL employs the MPI message passing package for parallel execution.
+
NWChem is an electronic-structure code that is suitable to perform complex calculations on molecular structure. It was specifically designed to perform well on high-performance parallel computers. The installation on our machines employs the MPI message passing package for parallel execution.
  
 
NWChem allows, among others, the following calculations to be performed:
 
NWChem allows, among others, the following calculations to be performed:
Line 29: Line 29:
 
<pre>use nwchem</pre>
 
<pre>use nwchem</pre>
 
which will set you up automatically.
 
which will set you up automatically.
 +
|}
 +
{|  style="border-spacing: 8px;"
 +
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#f7f7f7; border-radius:7px" |
  
== Scratch files ==
+
== Running NWChem from a command line==
  
One of the settings is the environment variable '''GAUSS_SCRDIR''' which is required to redirect the temporary files that Gaussian uses to the proper scratch space, presently
+
Like other electronic-structure programs, NWChem is run by supplying an input file that defines the system on which to perform a calculation (usually a molecule, or a group of molecules), and the method to use (i.e., the level of calculation, such as "Hartree-Fock", the basis set, and other details of the computation).
  
<pre>export GAUSS_SCRDIR=/scratch/hpcXXXX</pre>
+
The variety of possible calculations is great, and so is the complexity of systems. It is impossible for us here to explain the format that a NWChem input file needs to have. This is explained in the [http://www.nwchem-sw.org/index.php/Release62:NWChem_Documentation User's Manual] which is available online.
  
where hpcXXXX stands for your username. If for some reason Gaussian does not terminate normally (e.g. a job gets cancelled), it leaves behind large '''scratch files''' which you may have to delete manually. To check if such files exist, type
+
Here, we provide a simple sample input file which should be given the file extension .nw.
  
<pre>ls -lt /scratch/hpcXXXX</pre>
+
<pre>
 +
start h2o
  
Once you have determined that the scratch files are no longer needed (because the program that used them is not running any more), you can delete them by typing
+
title "H2O, cc-pVDZ basis, SCF optimized geometry"
  
<pre>rm /scratch/hpcxxxx/Gau-*</pre>
+
geometry units au
 +
H      0.0000000000  1.4140780900  -1.1031626600
 +
H      0.0000000000  -1.4140780900  -1.1031626600
 +
O      0.0000000000  0.0000000000  -0.0080100000
 +
end
  
Cleaning up the scratch space is the user's responsibility. If it is not done regularly, it can cause jobs to terminate, and much work to be lost.
+
basis
|}
+
H library cc-pVDZ
{|  style="border-spacing: 8px;"
+
O library cc-pVDZ
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#f7f7f7; border-radius:7px" |
+
end
== Running Gaussian from a command line==
+
  
To run Gaussian on our systems, you have to belong to a '''user group g98''' (it's called that for historical reason, but it applies to all versions of Gaussian). You need to read our license agreement and [http://www.hpcvl.org/sites/default/files/gaussian-statement.pdf signed a statement] to be included in this user group. Once you are, you can access the executables.
+
scf
 +
  thresh 1.0e-8
 +
end
  
A computation is performed by preparing an input file and pipe it to standard input of the program '''g09'''. Standard output should be caught in a log-file. We suggest you use the '''extensions''' ''.g09 for input'' files and ''.log for results''.
+
task scf
 +
</pre>
  
Interactively, the command line to invoke Gaussian is thus:
+
This extension may be omitted when calling the program. NWChem creates typically a whole array of output files that are documented in the User's Manual. A general log is displayed on the console, and may be saved in a file by redirecting the standard output:
  
<pre> g09 < test.g09 >test.gout </pre>
+
<pre>nwchem sample > sample.log</pre>
  
This will only work if you are a member of the g98 group and have set the environment correctly. Note that the '''interactive execution of Gaussian is only meant for test runs'''.
+
where we assume that your input file is called sample.nw and you want to save the log to a file sample.log.
  
Gaussian input files are explained in the "User's Reference". It is impossible to give an outline here. Sample files can be found in  
+
Note that this is just the basic syntax of the program call. In practise you will use a parallel environment to execute the program (see next section). In fact, executing NWChem by just typing the above line will run it in serial mode.
  
<pre>/opt/gaussian/g09/bsd/logs</pre>
+
== Parallel Runs ==
  
'''Note:''' It is absolutely essential to have a good idea about the size and complexity of your calculations before you start a Gaussian job. Many of the methods mentioned above have terrible scaling properties, i.e. the computational cost grows very quickly with the number of electrons, degrees of freedom, or number of basis functions used. We suggest you start with a small basis set and a cheap method, and then slowly increase those parameters.
+
NWChem is inherently parallelized and designed to scale well on a multi-processor machine or a cluster. The underlying parallel system is MPI (Message Passing Interface) which is a commonly available standard that runs on many platforms. Consult our [[HowTo:mpi|MPI help file]] and follow some of the links in there if you want to have more information about MPI. Even if you want to use only one processor for your NWChem run (which sometimes is the best solution, particularly for smaller computations), you have to submit the program to a parallel environment. On our clusters, the relevant command is '''mpirun''':
 +
<pre>mpirun -np 8 nwchem sample > sample.log</pre>
 +
This will run your sample.nw input file on eight processors. Note that you are only allowed to run NWChem this way for small test systems! For any production jobs, you have to submit the task to the scheduler (see next section.).
  
== Submitting (parallel) Gaussian jobs ==
+
== Submitting (parallel) NWChem jobs ==
  
If you want to run Gaussian on several processors on our machines, you have to include a line in your input file:
+
NWChem jobs are to be submitted on the SW (Linux) systems via the GridEngine, which is a load-balancing software. To obtain details, read our [[FAQ:SGE|GridEngine FAQ]] . For an NWChem batch job, this means that rather than issuing the command in the previous section directly, you wrap it into a GridEngine batch script.
 
+
<pre>%nproc=8</pre>
+
 
+
where we assume that you want to use 8 processors (cores, threads).
+
 
+
'''It is mandatory to submit a Gaussian job script through our scheduling software''' (see our [[HowTo:Scheduler|Scheduler Help File]] for details).  
+
 
+
Here is a "bare bones" sample of a Gaussian submission script:
+
  
 +
Here is an example for such a batch script:
 
<pre>
 
<pre>
#! /bin/bash
 
 
#$ -S /bin/bash
 
#$ -S /bin/bash
#$ -q abaqus.q
+
#$ -o sample.out
#$ -l qname=abaqus.q
+
#$ -e STD.err
#$ -cwd
+
#$ -V
+
 
#$ -M hpcXXXX@localhost
 
#$ -M hpcXXXX@localhost
 
#$ -m be
 
#$ -m be
#$ -o STD.out
+
#$ -V
#$ -e STD.err
+
#$ -cwd
#$ -pe shm.pe 8
+
#$ -pe dist.pe 4
g09 < sample.g09 > sample.log
+
mpirun -np $NSLOTS nwchem sample
</pre>
+
</pre>  
 +
This script needs to be altered by explicitly replacing the entries that differ in your case. We suggest you use it as a template for all your NWChem runs. For details, consult our [[HowTo:Scheduler|Scheduler Help File]].
 +
 
 +
Note that there is no need in this script to redirect the standard output via the > operator. Instead, you define where the output goes to the GridEngine via the "#$ -o" command. In our case, we send it to a file called sample.out.
  
* The first 6 lines of the script make sure the right shell is used, the program executes on the correct cluster, and all necessary setup is done.
+
Email notification is set up through the "#$ -M" line. In the above example you need to replace XXXX by the actual 4 digits in your username, and place a file ".forward" with your email address into your home directory.
* An email address for notifications is specified in the '''#$ -M''' line. We suggest "hpcXXXX@localhost" (hpcXXXX stands for the username). Place a file '''.forward''' containing your actual email address into your home directory
+
* The '''-o''' and '''-e''' lines are used to tell the system where to write "standard output" and "standard error", i.e. the screen output.
+
* The '''#$ -pe gaussian.pe 8''' line specifies the number of processors the scheduler will allocate (8 in this example). It is crucial to choose the same number as specified in the '''%nrocs=''' line of the input file.  
+
  
The script (let's call it g09.sh) is submitted by the qsub command:
+
In the example we are executing with 4 processes. To choose a different number, alter the "#$ -pe" line in the script. For this example script to work you need to have set up the calling shell through the "use nwchem" command because the above script inherits all the environment settings (due to the "#$ -V" option).
  
<pre>qsub g09.sh</pre>
+
The script is submitted to the GridEngine by typing
  
This must be done from the working directory, i.e. the directory that contains the input file and is supposed to contain the output. Also make sure that you have set up gaussian ('''use g09''') before you submit a job.
+
<pre>qsub batch_file_name</pre>
 +
 
 +
The advantage to submit jobs via a load balancing software is that the software will automatically find the resources required and put the job onto a set of processors that have a low load. This will help executing the job faster. Production jobs on our cluster '''must be submitted using GridEngine''' from a login node (sflogin0 or swlogin1), and executed under GE's control on the Linux production nodes without any need for you to log in.
 
|}
 
|}
 
{|  style="border-spacing: 8px;"
 
{|  style="border-spacing: 8px;"
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |  
+
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
== Licensing ==
+
  
Gaussian is a licensed program. The license held by the Centre for Advanced Computing is limited to our computers at our main site. That means that any of our users can use the program on our machines (but nowhere else), whether they are located at Queen's or not.
+
== Licensing ==
 +
NWChem is obtainable free of charge from the Pacific Northwest National Laboratory. [http://www.nwchem-sw.org/index.php/Download To obtain your own copy, go here]. NWChem is ditributed under an [http://opensource.org/licenses/ecl2.php Open Source Educational Community License]. Like with other software, HPCVL requires users who want to use NWChem, to read this agreement, and sign [http://www.hpcvl.org/sites/default/files/hpcvl%20nwchem_statement.pdf a statement] that they have done so and will abide by its terms. You can fax the signed statement to (613) 533-2015 or scan/email it to [mailto:cac.admin@queensu.ca cac.admin@queensu.ca]. You will then be included in a Unix user group that has access to the NWChem executables.
  
We require users of Gaussian to [http://www.hpcvl.org/sites/default/files/gaussian-statement.pdf sign a statement] in which they state that they are informed about the [http://www.hpcvl.org/sites/default/files/g09-licence.pdf terms of the license] to be included in the Gaussian user group named "g98". Please fax the completed statement to (613) 533-2015 or scan/email to [mailto:cac.admin@queensu.ca cac.admin@queensu.ca].
+
== More Information ==
  
== Where can I get more detailed information ? ==
+
NWChem is a very complex software package, and requires practice to be used efficiently. We cannot explain it use in any detail here.
  
* To learn the basics about Gaussian input and output, refer to the [http://www.gaussian.com/g_tech/g_ur/g09help.htm Gaussian 09 User's Reference].
+
* Complete documentation for the program is available in the form of the [http://www.nwchem-sw.org/index.php/Release62:NWChem_Documentation User's Manual], which is an absolute must-have if you want to use this program.  
* For templates, and to get many examples, check out /opt/gaussian/g09/bsd/examples.
+
* Check out the [http://www.nwchem-sw.org/index.php/Main_Page official NWChem website]. They feature a very useful FAQ and even a tutorial.
* The [http://www.gaussian.com Gaussian web page] contains a lot of information.  
+
* There is an active support community for NWChem which can be [http://www.nwchem-sw.org/index.php/Special:AWCforum accessed through their webpage].
* For hardcore computational chemists, there is the [http://www.gaussian.com/g_tech/g09iop.htm Gaussian IOPs Reference], useful if you want to tinker with default settings and internal parameters.  
+
* If you are experiencing trouble running a batch command script, [[FAQ:SGE|read our FAQ on that subject]]
* These [http://www.gaussian.com/g_prod/books.htm can be purchased from Gaussian Inc.].
+
 
* '''Send [mailto:cac.help@queensu.ca|email to cac.help@queensu.ca]'''. We're happy to help.
 
* '''Send [mailto:cac.help@queensu.ca|email to cac.help@queensu.ca]'''. We're happy to help.
 
|}
 
|}

Latest revision as of 17:47, 16 May 2017

NWChem

This is a short help file on using the parallel electronic-structure code "NWChem" on systems at the Centre for Advanced Computing. This software uses MPI as a message-passing system and is (in principle) able to run on an arbitrary number of processors. Its ability to perform a very broad spectrum of molecular-structure calculations, ranging from CI to ab-initio molecular dynamics, makes it an interesting alternative to the standard electronic structure code Gaussian.

Features

NWChem is an electronic-structure code suited to complex molecular-structure calculations. It was designed specifically to perform well on high-performance parallel computers. The installation on our machines uses MPI message passing for parallel execution.

NWChem allows, among others, the following calculations to be performed:

  • Hartree-Fock (e.g. RHF, UHF, ROHF etc.)
  • DFT including spin-orbit DFT, with many exchange and correlation functionals.
  • Complete Active Space SCF (CAS-SCF)
  • Coupled-Cluster (CCSD, CCSD+T, etc.)
  • limited CI (e.g., CISD) with perturbative corrections
  • MP2 (2nd-order Møller-Plesset Perturbation Theory)
  • In general: single-point calculations, geometry optimizations, vibrational analysis.
  • Static one-electron properties, densities, electrostatic potentials.
  • ONIOM model for multi-level calculations on larger systems
  • Relativistic corrections (Douglas-Kroll, Dyall-Dirac, spin-orbit)
  • Ab-initio molecular dynamics (Car-Parrinello)
  • Extended (solid-state) systems DFT
  • Classical force-fields (Molecular Mechanics: AMBER, CHARMM, etc.)

For a more complete list, see the official NWChem homepage and click on "capabilities".

Location of the program and setup

The NWChem program is located in the directory /opt/nwchem/bin. To access it, you have to use the usepackage command

use nwchem

which will set you up automatically.
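To confirm that the setup took effect in your current shell, a quick check along the following lines may help. This is only a sketch: the helper name find_program is made up for illustration, and it assumes that "use nwchem" makes the nwchem executable reachable through your PATH.

```shell
#!/bin/bash
# Hypothetical helper (not part of NWChem or usepackage):
# print the resolved path of a program, or return non-zero if it is not on PATH.
find_program() {
    command -v "$1" 2>/dev/null
}

# Assumption: after "use nwchem", the nwchem executable is on PATH.
if nwchem_path=$(find_program nwchem); then
    echo "nwchem found at: $nwchem_path"
else
    echo "nwchem not found; run 'use nwchem' first" >&2
fi
```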

Running NWChem from a command line

Like other electronic-structure programs, NWChem is run by supplying an input file that defines the system on which to perform a calculation (usually a molecule, or a group of molecules), and the method to use (i.e., the level of calculation, such as "Hartree-Fock", the basis set, and other details of the computation).

The variety of possible calculations is great, and so is the complexity of the systems that can be treated. We cannot explain the format of an NWChem input file here; it is described in the User's Manual, which is available online.

Here, we provide a simple sample input file which should be given the file extension .nw.

start h2o

title "H2O, cc-pVDZ basis, SCF optimized geometry"

geometry units au
H       0.0000000000   1.4140780900  -1.1031626600
H       0.0000000000  -1.4140780900  -1.1031626600
O       0.0000000000   0.0000000000  -0.0080100000
end

basis
H library cc-pVDZ
O library cc-pVDZ
end

scf
   thresh 1.0e-8
end

task scf

The .nw extension may be omitted when calling the program. NWChem typically creates a whole array of output files, which are documented in the User's Manual. A general log is written to the console and may be saved to a file by redirecting standard output:

nwchem sample > sample.log

where we assume that your input file is called sample.nw and you want to save the log to a file sample.log.
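Once the log has been saved, the final energy can be pulled out of it with standard text tools. The sketch below assumes that the SCF module reports its result on a line containing the text "Total SCF energy", with the value as the last field; check your own sample.log to confirm the wording before relying on it. The function name scf_energy is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the last "Total SCF energy" value in a log file.
# Assumption: the log contains a line such as
#   "Total SCF energy =   <value>"
# where the energy is the final whitespace-separated field.
scf_energy() {
    awk '/Total SCF energy/ { value = $NF } END { if (value != "") print value }' "$1"
}

# Example use, after a run has produced sample.log:
#   nwchem sample > sample.log
#   scf_energy sample.log
```

If the log contains no such line, the function prints nothing, which makes it easy to detect a run that did not finish.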

Note that this is just the basic syntax of the program call. In practice you will use a parallel environment to execute the program (see next section); executing NWChem by just typing the above line will run it in serial mode.

Parallel Runs

NWChem is inherently parallelized and designed to scale well on a multi-processor machine or a cluster. The underlying parallel system is MPI (Message Passing Interface) which is a commonly available standard that runs on many platforms. Consult our MPI help file and follow some of the links in there if you want to have more information about MPI. Even if you want to use only one processor for your NWChem run (which sometimes is the best solution, particularly for smaller computations), you have to submit the program to a parallel environment. On our clusters, the relevant command is mpirun:

mpirun -np 8 nwchem sample > sample.log

This will run your sample.nw input file on eight processors. Note that you are only allowed to run NWChem this way for small test systems. For any production job, you have to submit the task to the scheduler (see next section).
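For test runs with varying process counts, it can be convenient to assemble the mpirun line in one place and check the arguments first. The following is a minimal sketch; build_nwchem_command is a hypothetical helper, not something provided by NWChem or MPI, and it only prints the command rather than executing it.

```shell
#!/bin/bash
# Hypothetical helper: validate the arguments and print the mpirun command
# line for an NWChem test run. Nothing is executed here.
#   $1 = number of processes (positive integer)
#   $2 = input file name, with or without the .nw extension
build_nwchem_command() {
    local nproc=$1 input=$2
    case $nproc in
        ''|*[!0-9]*|0)
            echo "error: process count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ ! -f "$input.nw" ] && [ ! -f "$input" ]; then
        echo "error: input file '$input' (or '$input.nw') not found" >&2
        return 1
    fi
    echo "mpirun -np $nproc nwchem $input"
}

# Example use (prints the command; append "> sample.log" when running it):
#   build_nwchem_command 8 sample
```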

Submitting (parallel) NWChem jobs

NWChem jobs are to be submitted on the SW (Linux) systems via GridEngine, a load-balancing scheduler. For details, read our GridEngine FAQ. For an NWChem batch job, this means that rather than issuing the command from the previous section directly, you wrap it in a GridEngine batch script.

Here is an example of such a batch script:

#$ -S /bin/bash
#$ -o sample.out
#$ -e STD.err
#$ -M hpcXXXX@localhost
#$ -m be
#$ -V
#$ -cwd
#$ -pe dist.pe 4
mpirun -np $NSLOTS nwchem sample

This script needs to be altered by explicitly replacing the entries that differ in your case. We suggest you use it as a template for all your NWChem runs. For details, consult our Scheduler Help File.

Note that there is no need in this script to redirect the standard output via the > operator. Instead, you tell GridEngine where the output goes via the "#$ -o" line. In our case, it is sent to a file called sample.out.

Email notification is set up through the "#$ -M" line. In the above example you need to replace XXXX by the actual 4 digits in your username, and place a file ".forward" with your email address into your home directory.

In the example we are executing with 4 processes. To choose a different number, alter the "#$ -pe" line in the script. For this example script to work you need to have set up the calling shell through the "use nwchem" command because the above script inherits all the environment settings (due to the "#$ -V" option).
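If you run many jobs that differ only in the input name and the number of processes, the template above can be generated rather than edited by hand. The sketch below is one possible way to do this; make_batch_script is a hypothetical helper, and the directives it writes are copied from the example script, so adjust them if your setup differs.

```shell
#!/bin/bash
# Hypothetical helper: print a GridEngine batch script for an NWChem run.
#   $1 = input name without the .nw extension
#   $2 = number of slots to request from the dist.pe parallel environment
# Single quotes keep "$NSLOTS" and "#$" literal in the generated script.
make_batch_script() {
    local input=$1 slots=$2
    printf '#$ -S /bin/bash\n'
    printf '#$ -o %s.out\n' "$input"
    printf '#$ -e STD.err\n'
    printf '#$ -M hpcXXXX@localhost\n'   # replace XXXX as described above
    printf '#$ -m be\n'
    printf '#$ -V\n'
    printf '#$ -cwd\n'
    printf '#$ -pe dist.pe %s\n' "$slots"
    printf 'mpirun -np $NSLOTS nwchem %s\n' "$input"
}

# Example use: write a script for sample.nw on 4 processes.
#   make_batch_script sample 4 > sample.sh
```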

The script is submitted to the GridEngine by typing

qsub batch_file_name

The advantage of submitting jobs via load-balancing software is that it automatically finds the required resources and places the job on a set of processors with a low load, which helps the job run faster. Production jobs on our cluster must be submitted using GridEngine from a login node (sflogin0 or swlogin1); they are then executed under GridEngine's control on the Linux production nodes without any need for you to log in.
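After submission you will usually want to keep track of the job. The sketch below extracts the job number from the confirmation message that qsub prints, assuming the common GridEngine wording 'Your job NNN ("name") has been submitted'; if your installation words the message differently, the pattern needs adjusting. The function name job_id_from_qsub is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read qsub's confirmation message on standard input
# and print the numeric job ID.
# Assumption: the message has the form
#   Your job 12345 ("batch_file_name") has been submitted
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Example use on a login node:
#   job_id=$(qsub batch_file_name | job_id_from_qsub)
#   qstat -j "$job_id"     # show details of the job
#   qdel "$job_id"         # cancel the job if necessary
```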

Licensing

NWChem is obtainable free of charge from the Pacific Northwest National Laboratory. To obtain your own copy, go here. NWChem is distributed under an Open Source Educational Community License. As with other software, HPCVL requires users who want to use NWChem to read this agreement and sign a statement that they have done so and will abide by its terms. You can fax the signed statement to (613) 533-2015 or scan/email it to cac.admin@queensu.ca. You will then be included in a Unix user group that has access to the NWChem executables.

More Information

NWChem is a very complex software package and requires practice to be used efficiently. We cannot explain its use in any detail here.