Latest revision as of 18:51, 16 July 2020

PyRx

This is a quick introduction to the usage of the screening software PyRx that is installed on our clusters. It is meant as an initial pointer to more detailed information. It also explains a few specific details about local usage.

What is PyRx?

PyRx is virtual screening software for computational drug discovery that can be used to screen libraries of compounds against potential drug targets. It provides a GUI that builds on a large body of established open-source software, such as:

  • AutoDock 4 and AutoDock Vina as docking software.
  • AutoDockTools, used to generate input files.
  • Python as a programming/scripting language.
  • wxPython for cross-platform GUI.
  • The Visualization ToolKit (VTK) by Kitware, Inc.
  • Enthought Tool Suite, including Traits, for application building blocks.
  • Opal Toolkit for running AutoDock remotely using web services.
  • Open Babel for importing SDF files, removing salts and energy minimization.
  • matplotlib for 2D plotting.

Version, Location and Access

Two versions of the program are installed, 0.9.8 and a somewhat modified 0.9.4, both as 64-bit builds for Linux. The relevant executables are in /global/software/PyRx/0.9.8 and /global/software/PyRx/0.9.4, respectively. Documentation can be found at the main PyRx site.

Running PyRx

Setup

PyRx can only be run on the CAC login nodes. There, the setup is simple; just type:

module load PyRx/098

This adds the proper directory to your PATH, and you are ready to go.
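To confirm that the module did its job, you can check whether the PyRx launcher is now on your PATH. The sketch below is a generic POSIX shell helper; the function name require_cmd is hypothetical, and it assumes the launcher is called PyRx, as in the next section.

```shell
# Hypothetical helper: report where a command lives on PATH, or fail.
require_cmd() {
    # "command -v" prints the resolved path and returns non-zero if not found
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1 found at $path"
    else
        echo "$1 not found on PATH; did you run 'module load PyRx/098'?" >&2
        return 1
    fi
}

# Typical use on a login node, after loading the module:
#   module load PyRx/098
#   require_cmd PyRx
```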

Interactive runs

Issuing the command

PyRx

opens the GUI. All operations are performed from within that interface. At a minimum, you must specify a macromolecule and at least one compound that you want to "dock". Molecules can be supplied in several formats, such as pdb, pdbq, cif and mol2. To bring them in, use either

File -> Load Molecule

or

File -> Import ...

from the menu.
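Before importing a large set of ligands, it can help to check how many files you have in each of the formats mentioned above. This is a minimal sketch, assuming your molecules sit in one directory and use the lower-case extensions .pdb, .pdbq, .cif and .mol2; the function name count_formats is hypothetical.

```shell
# Hypothetical helper: count molecule files per format in a directory.
# Assumes lower-case extensions; files with other extensions are ignored.
count_formats() {
    dir=${1:-.}
    for ext in pdb pdbq cif mol2; do
        n=0
        for f in "$dir"/*."$ext"; do
            # If nothing matches, the glob stays literal, so check existence.
            if [ -e "$f" ]; then
                n=$((n + 1))
            fi
        done
        echo "$ext $n"
    done
}

# Typical use (the directory name is a placeholder):
#   count_formats ~/ligands
```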

The actual analysis is performed using various tabs in the GUI. As an example, here are the steps for the "Vina Wizard", which uses AutoDock Vina for the analysis:

Vina Wizard -> Start Here -> (select /global/software/pyrx/0.9.4/bin/vina) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward
(check results in bottom window)

There is, of course, a lot more to it, but the authors of the software claim that it is intuitive enough that you can figure things out as you go. Your mileage may vary.

Production runs (cluster mode)

NOTE: This section is obsolete; it needs a complete overhaul.

If you are screening hundreds (or even thousands) of molecules with PyRx, the time required may be too long for interactive use. PyRx offers a basic interface to a scheduler, but its default settings are too generic to work with our systems.

For the Vina Wizard, we have provided a work-around that lets you work through a large number of runs in parallel on the machines of the SW cluster. Before you try this, read through our Slurm wiki page to learn how jobs are submitted to our production clusters.

The procedure for this starts off the same as for the interactive approach:

Vina Wizard -> Start Here -> (select Cluster(Portable Batch System)) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward

However, because the "Cluster" setting was selected, the program does not actually run any docking software; instead, it generates a large number of scripts in the directory

~/.PyRx_workspace/Macromolecules/MACRO

where "MACRO" stands for the name of the macromolecule you are using, and "~" is short for your home directory. To run the actual analysis on our cluster, go into that directory and execute the Perl script we have provided for this purpose:

cd ~/.PyRx_workspace/Macromolecules/MACRO
PyRxVinaArray.pl
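If you run this step often, a small wrapper can catch the most common mistake, a misspelled macromolecule name, before the script is started in the wrong place. This is a sketch only: the function name vina_workdir is hypothetical, and it assumes the workspace layout ~/.PyRx_workspace/Macromolecules/MACRO described above.

```shell
# Hypothetical helper: print the PyRx workspace directory for a
# macromolecule, or fail if PyRx has not created it (e.g. a typo in MACRO).
vina_workdir() {
    macro=$1
    if [ -z "$macro" ]; then
        echo "usage: vina_workdir MACRO" >&2
        return 2
    fi
    dir="$HOME/.PyRx_workspace/Macromolecules/$macro"
    if [ ! -d "$dir" ]; then
        echo "no such workspace directory: $dir" >&2
        return 1
    fi
    echo "$dir"
}

# Typical use (MACRO is a placeholder); the script only runs if the
# directory was found and the cd succeeded:
#   dir=$(vina_workdir MACRO) && cd "$dir" && PyRxVinaArray.pl
```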

This creates two new sub-directories, "jobs" and "logs", copies the scripts mentioned earlier, then produces a job for our scheduler, Grid Engine, and submits it. Using the qstat command, you should then see something like:

$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:59:29 abaqus.q@sw0013.hpcvl.org          8 64
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:48:59 abaqus.q@sw0020.hpcvl.org          8 60
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:29 abaqus.q@sw0044.hpcvl.org          8 62
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:59 abaqus.q@sw0047.hpcvl.org          8 63
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:05:59 abaqus.q@sw0048.hpcvl.org          8 65
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:09:29 abaqus.q@sw0054.hpcvl.org          8 66
 952371 0.50734 runVina.sh hpcXXXX      qw    11/17/2015 09:30:03                                    8 67-511:1

As you can see, 6 Vina tasks are running simultaneously, with 8 processors each, for a total of 48.
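The arithmetic in that sentence can be reproduced from the listing itself. The sketch below, an awk filter wrapped in a shell function with the hypothetical name summarize_qstat, assumes the Grid Engine column layout shown above (state in column 5, slots and task IDs in the last two columns) and that every line belongs to an array job. It counts running tasks, the cores they occupy, and the pending tasks encoded in a range such as 67-511:1 (first-last:step).

```shell
# Hypothetical helper: summarize array-job lines from qstat output.
# Assumes the column layout of the listing above; non-array jobs may be
# counted incorrectly.
summarize_qstat() {
    awk '
        # Skip the header line and the dashed separator.
        NR <= 2 { next }
        $5 == "r" { running++; cores += $(NF - 1) }
        $5 == "qw" {
            # Task IDs are either a single ID or a range "first-last:step".
            n = split($NF, a, "[-:]")
            if (n == 1) {
                pending++
            } else {
                step = (n >= 3) ? a[3] : 1
                pending += int((a[2] - a[1]) / step) + 1
            }
        }
        END { printf "running=%d cores=%d pending=%d\n", running, cores, pending }
    '
}

# Typical use on a login node:
#   qstat | summarize_qstat
```

For the listing above this reports 6 running tasks on 48 cores, plus 445 pending tasks (IDs 67 through 511).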

Once qstat no longer shows any output, the analyses are finished and you can go back to the PyRx GUI:

-> Forward
(check results in bottom window)
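Rather than re-running qstat by hand, you can let the shell wait until the listing is empty. This is a minimal sketch: wait_for_empty is a hypothetical name, the polling interval is your choice, and the status command is passed as arguments so it can be swapped or tested; on our systems it would be qstat.

```shell
# Hypothetical helper: re-run a status command until it prints nothing.
# Usage: wait_for_empty SECONDS COMMAND [ARGS...]
wait_for_empty() {
    interval=$1
    shift
    while :; do
        # Stop with an error if the status command itself fails,
        # so a scheduler outage is not mistaken for "all done".
        out=$("$@") || return 1
        [ -z "$out" ] && return 0
        sleep "$interval"
    done
}

# Typical use on a login node, checking once a minute:
#   wait_for_empty 60 qstat && echo "All Vina tasks finished."
```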

Note that this works only for analyses with Vina. If you want to do something similar with a different analysis (for instance AutoDock 4), please get in touch with us; we can probably come up with a solution.

Further Help