HowTo:cpmd

From CAC Wiki

Latest revision as of 17:48, 16 May 2017

Car-Parrinello Molecular Dynamics (CPMD)

This is an introduction to using the Ab Initio Molecular Dynamics code "CPMD" on our clusters. It is meant to help you get started and to point you to more detailed information; it does not replace a careful study of the manual.

Features

The CPMD code is a parallelized plane-wave/pseudopotential implementation of Density Functional Theory, designed in particular for ab-initio Molecular Dynamics simulation as described by Car and Parrinello (R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)). It is distributed free of charge to non-profit organizations, runs on many different computer architectures, and is well parallelized.

CPMD performs many Quantum-Chemical and Molecular-Dynamics calculations, including:

  • Wavefunction optimization: direct minimization and diagonalization
  • Geometry optimization: local optimization and simulated annealing
  • Molecular dynamics: NVE, NVT, NPT ensembles.
  • Path integral MD, free-energy path-sampling methods
  • Response functions and many electronic structure properties
  • Time-dependent DFT (excitations, molecular dynamics in excited states)
  • LDA, LSD and many popular gradient correction schemes
  • Isolated systems and systems with periodic boundary conditions; k-points
  • Hybrid quantum mechanical / molecular mechanics calculations (QM/MM)
  • Coarse-grained non-Markovian meta-dynamics
  • Norm-conserving or ultrasoft pseudopotentials

For a complete list of the capabilities of CPMD, consult the CPMD online manual (note: it describes a newer version than the one installed here), or check the extensive database of related publications.

Location and Setup

The program resides in /opt/cpmd and is called cpmd.x. This directory also contains some test examples, which are useful for getting an idea of the program's input format. You are not allowed to copy the executable or any part of the distribution to your local machine. However, you can easily obtain the program yourself from the CPMD download page; note that you will need a valid password to download the code.

Unlike many other programs, CPMD needs no special setup. However, it is a good idea to add the directory containing the CPMD program to your path, which can be done with "usepackage":

use cpmd
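If the "use" command is not available in your shell, the same effect can be achieved by extending PATH by hand. The following is a minimal sketch, assuming the executables sit directly in /opt/cpmd (the CPMD_DIR variable is our own and can be overridden if they live in a subdirectory); it also reports which of the executable names mentioned on this page can be found.

```shell
#!/bin/bash
# Minimal sketch: make the CPMD executables reachable without "use cpmd".
# Assumption: the binaries live directly in /opt/cpmd; set CPMD_DIR
# beforehand if they sit in a subdirectory on your system.
CPMD_DIR=${CPMD_DIR:-/opt/cpmd}

# Only extend PATH if the serial binary is not already found.
if ! command -v cpmd_serial.x >/dev/null 2>&1; then
    export PATH="$CPMD_DIR:$PATH"
fi

# Report which of the executable names used on this page are visible.
for exe in cpmd.x cpmd_serial.x cpmd_ct8.x; do
    if command -v "$exe" >/dev/null 2>&1; then
        echo "found:   $(command -v "$exe")"
    else
        echo "missing: $exe"
    fi
done
```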

Running from a command line

Before you can access the CPMD executables and run the program, you have to read our license agreement. You also have to sign a statement that you have done so and return it to us (see the Licensing section below for more information).

To run CPMD, you need to specify the executable, an input file, and (optionally) an output file. Assuming that the directory containing the CPMD executables is in your path, all you need to do is type

cpmd_serial.x input_name >output_name

where input_name is the name of the input file (the recommended file extension is .inp). If you omit the redirection to an output file output_name, the output is sent to the terminal screen.
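A small wrapper can derive the output name from the input name and avoid overwriting the results of an earlier run. This is only a sketch: water.inp is a hypothetical file name, and the .out extension for output files is our own convention, not a CPMD requirement.

```shell
#!/bin/bash
# Sketch: short serial test run, output name derived from the input name.
# "water.inp" is a hypothetical input file; substitute your own.
input=water.inp
output="${input%.inp}.out"          # water.inp -> water.out

if [ ! -f "$input" ]; then
    echo "input file $input not found" >&2
elif [ -e "$output" ]; then
    echo "refusing to overwrite existing $output" >&2
else
    cpmd_serial.x "$input" > "$output"
fi
```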

Parallel Runs

The above command line is for the serial version of the program. For larger runs, we recommend the parallel version, which is started with the command line:

mpirun -np 8 cpmd_ct8.x input_name >output_name

if 8 processes are to be used in the parallel run. Because CPMD uses the Message Passing Interface (MPI) for parallelism, the program has to be started through an MPI runtime environment, which is the reason for the mpirun command. In this case, we are using OpenMPI. Using -np 1 amounts to a serial run.

Like most programs, CPMD requires an input (.inp) file that describes the system for which the calculation will be performed, specifies the level of calculation, and provides other necessary information. The input format is described in detail in the CPMD documentation and is beyond the scope of this page.

In addition to the input file, you may need other auxiliary files, which can be obtained from the CPMD directory. In most cases, you will have to provide pseudopotential files, which usually have the file extension .psp. A collection of these can be found in the directories below /opt/cpmd/3.13.
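To see which pseudopotential files are available and copy one next to your input, something along these lines should work. It is a sketch only: C_EXAMPLE.psp is a hypothetical file name, and you should pick a file from the listing that matches the elements and functional in your own input.

```shell
#!/bin/bash
# Sketch: list pseudopotential files under the CPMD tree and copy one
# into the current directory, where the job will run.
PSP_ROOT=/opt/cpmd/3.13

# Show the first few .psp files found, sorted by path.
find "$PSP_ROOT" -name '*.psp' 2>/dev/null | sort | head -n 20

# Hypothetical file name; replace it with one from the list above.
psp=C_EXAMPLE.psp
src=$(find "$PSP_ROOT" -name "$psp" 2>/dev/null | head -n 1)

if [ -n "$src" ]; then
    cp "$src" .
    echo "copied $src"
else
    echo "$psp not found below $PSP_ROOT" >&2
fi
```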

Once all input is prepared, you have to decide how many processes to use. This involves a trade-off between the availability of CPUs on our systems and the efficiency of additional processes, i.e. scaling. We suggest that you perform test calculations of the same type as your production calculation, running them several times with a varying number of processors. Comparing the timings lets you determine the maximum number of processors that still yields acceptable scaling for your production calculation.
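The scaling test suggested above can be scripted. The following is a minimal sketch, not a benchmarking tool: test.inp and the list of process counts are placeholders, the list must start at 1 because the first run serves as the serial reference, and the timings have one-second resolution. Parallel efficiency is computed as 100 * T1 / (N * TN). Since production work must go through Grid Engine, anything beyond a short test should be run this way from inside a batch script.

```shell
#!/bin/bash
# Minimal scaling-test sketch: same input, increasing process counts.
# Placeholders: test.inp and the counts below. The list must start at 1,
# because the first run is used as the serial reference time T1.
input=test.inp
counts="1 2 4 8"

# Parallel efficiency in percent: 100 * T1 / (N * TN).
efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" \
        'BEGIN { printf "%.0f\n", 100 * t1 / (n * tn) }'
}

t1=""
for np in $counts; do
    if [ ! -f "$input" ]; then
        echo "input file $input not found" >&2
        break
    fi
    start=$(date +%s)
    mpirun -np "$np" cpmd_ct8.x "$input" > "scaling_np${np}.out"
    elapsed=$(( $(date +%s) - start ))
    [ "$elapsed" -gt 0 ] || elapsed=1   # 1 s resolution; avoid division by zero
    [ -n "$t1" ] || t1=$elapsed         # first run is the reference
    echo "np=$np time=${elapsed}s efficiency=$(efficiency "$t1" "$elapsed" "$np")%"
done
```

For example, a run that takes 100 s on one process and 25 s on four has an efficiency of 100%; at 30 s on four processes it drops to about 83%.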

Batch Jobs

CPMD, like all production software, has to be run through our scheduler, which submits batch jobs to low-load processors on the cluster (see HowTo:Scheduler to learn more). A CPMD job must be submitted to the Grid Engine. The calculation is set up by editing the execution script. Here is a sample script:

#!/bin/bash
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -pe shm.pe 8
#$ -m be
#$ -M hpcXXXX@localhost
#$ -o STD.out
#$ -e STD.err
mpirun -np $NSLOTS cpmd.x input_name > output_name

Replace the placeholder entries (such as the email address in the "#$ -M" line and the input and output file names) with your own values. All lines starting with "#$" are directives for the Grid Engine scheduler and are interpreted when the script is submitted. "#$ -S /bin/bash" selects the shell that executes the script. "#$ -V" makes the job inherit the environment of the submitting shell (for instance the path), and "#$ -cwd" sets the starting directory to the current working directory. "#$ -m be" asks Grid Engine to send email when the job begins and when it ends, to the address given in the "#$ -M" line. The lines starting with "#$ -o" and "#$ -e" define the standard output and standard error files, respectively; since the job runs in batch, no terminal is available for these.

The last line is almost the same as in an interactive run. The input and pseudopotential files are expected to be in the same directory as this script. The number of processes is specified in the "#$ -pe" line, which instructs the Grid Engine to allocate the proper number of CPUs for your run. You do not have to repeat it in the mpirun command line, because Grid Engine sets the environment variable $NSLOTS accordingly.
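To try a different number of processes without editing the script, the same option can be given on the qsub command line; as we understand Grid Engine, command-line options take precedence over the matching "#$" directives embedded in the script. A hedged sketch, with the script name cpmd.sh and the slot count 16 as placeholders. Because the mpirun line uses $NSLOTS, nothing else in the script has to change.

```shell
#!/bin/bash
# Sketch: submit cpmd.sh with 16 slots instead of the 8 set by "#$ -pe".
# Command-line options to qsub override the embedded "#$" directives.
script=cpmd.sh      # placeholder: your batch script
slots=16            # placeholder: number of processes to request

if command -v qsub >/dev/null 2>&1 && [ -f "$script" ]; then
    qsub -pe shm.pe "$slots" "$script"
else
    echo "skipped: would run qsub -pe shm.pe $slots $script"
fi
```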

Assuming the script is called cpmd.sh, it is submitted by typing

qsub cpmd.sh

No further specification of the output is necessary, since this is done inside the script and handled by Grid Engine.
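After submission, you can follow the job with qstat. The lines below are a sketch meant for your login shell (where the use command is defined), again assuming the script is called cpmd.sh. They load CPMD first so that the "#$ -V" directive passes the CPMD path on to the job, submit with qsub -terse (which prints only the job ID), and then list your jobs. The guards simply skip steps when a command is not available.

```shell
# Sketch: submit cpmd.sh from a login shell and check on it afterwards.
# Load CPMD first: "#$ -V" passes this shell's environment to the job.
if type use >/dev/null 2>&1; then
    use cpmd
fi

jobid=""
if command -v qsub >/dev/null 2>&1 && [ -f cpmd.sh ]; then
    jobid=$(qsub -terse cpmd.sh)     # -terse prints only the job ID
    echo "submitted job $jobid"
    qstat -u "$USER"                 # your pending (qw) and running (r) jobs
else
    echo "qsub or cpmd.sh not available; nothing submitted"
fi
```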

Licensing

CPMD is distributed by the CPMD Consortium and jointly owned by IBM and the Max-Planck Institute for Solid-State Research in Stuttgart. Non-commercial institutions and individuals can obtain a free copy of the program. The CPMD Consortium requires that you register and that you do not redistribute the code. CPMD is a very portable program, and will run on many platforms.

However, as with all licensed software on our systems, we require users to read the license agreement. If you want to use CPMD, you have to sign a statement and return it to us by fax to (613) 533-2015 or by scan/email to cac.admin@queensu.ca. You will then be included in the Unix group "cpmd" and given access to the program.

Help

CPMD is a rather sophisticated program. It requires careful study of the input format, as well as a certain degree of knowledge about the "nuts and bolts" of computational quantum chemistry and molecular dynamics.