PyRx

This is a quick introduction to the usage of the screening software PyRx that is installed on our clusters. It is meant as an initial pointer to more detailed information. It also explains a few specific details about local usage.

What is PyRx?

PyRx is virtual screening software for computational drug discovery that can be used to screen libraries of compounds against potential drug targets. It is a GUI that builds on a large body of established open-source software, such as:

  • AutoDock 4 and AutoDock Vina, used as the docking engines.
  • AutoDockTools, used to generate input files.
  • Python as a programming/scripting language.
  • wxPython for cross-platform GUI.
  • The Visualization ToolKit (VTK) by Kitware, Inc.
  • Enthought Tool Suite, including Traits, for application building blocks.
  • Opal Toolkit for running AutoDock remotely using web services.
  • Open Babel for importing SDF files, removing salts and energy minimization.
  • matplotlib for 2D plotting.

Version, Location and Access

The installed versions of the program are 0.9.8 and 0.9.4 (somewhat modified); both are 64-bit builds for the Linux platform. The relevant executables are under /global/software/PyRx/0.9.8 and /global/software/PyRx/0.9.4. Documentation can be found at the main PyRx site (http://pyrx.sourceforge.net/).
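
If you want to see what is actually available under these directories, you can simply list them. This is only an exploratory sketch; the exact layout below the version directories (for example a bin sub-directory holding the vina binary) is an assumption based on the paths quoted elsewhere on this page:

ls /global/software/PyRx/0.9.8
ls /global/software/PyRx/0.9.4/bin    # location of the vina binary is assumed, check what is actually there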

Running PyRx

Setup

You can run PyRx only on the CAC login nodes. From there, the setup for PyRx is very simple. It is only necessary to type:

module load PyRx/098

This will add the proper directory to your PATH, and off you go.
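
After loading the module, you can quickly confirm that the PyRx executable is on your PATH. This is just a sanity check; the exact location reported depends on the version you loaded:

which PyRx    # expected to point somewhere under /global/software/PyRx/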

Interactive runs

Issuing the command

PyRx

will pop up the GUI. All operations are performed from within that interface. At a minimum, you will have to specify a macromolecule and at least one compound that you want to "dock". These molecules can be specified in several formats such as pdb, pdbq, cif, or mol2. You can import or load molecules from the

File -> Load Molecule

or the

File -> Import ...

tab.
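
Note that PyRx is a graphical application running on a remote login node, so you will typically need X11 forwarding (or another remote-display mechanism such as VNC) for the window to appear on your own screen. A minimal sketch, assuming plain SSH with X forwarding; the host name below is only a placeholder, use the login node you normally connect to:

ssh -X username@login.cac.queensu.ca    # placeholder host name; -Y may be needed with some SSH clients
module load PyRx/098
PyRx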

The actual analysis is performed using the various tabs of the GUI. As an example, we outline the steps for the "Vina Wizard", which runs a program called "AutoDock Vina" for the analysis:

Vina Wizard -> Start Here -> (select /global/software/pyrx/0.9.4/bin/vina) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward
(check results in bottom window)

There is, of course, a lot more to it, but the authors of the software claim that it is intuitive enough that you can figure things out as you go. Your mileage may vary.
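
If you ever want to run a single docking calculation without the GUI, the same Vina binary that the wizard selects can also be called directly on the command line. The following is only a hedged sketch: the receptor and ligand file names and the search-box numbers are placeholders, and you would normally take these values from the pdbqt files and the Search Space that PyRx has already prepared:

/global/software/pyrx/0.9.4/bin/vina \
    --receptor receptor.pdbqt --ligand ligand.pdbqt \
    --center_x 10 --center_y 12 --center_z -5 \
    --size_x 20 --size_y 20 --size_z 20 \
    --out docked.pdbqt
# all file names and box coordinates above are placeholders for illustration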

Production runs (cluster mode)

NOTE: This section is obsolete and needs a complete overhaul

If you are screening hundreds (or even thousands) of molecules with PyRx, the time required may be too much for interactive use. PyRx offers a basic interface to a scheduler, but the default settings are too generic to work with our systems.

For the Vina Wizard, we have provided a work-around that allows you to work through a large number of runs using the machines on the SW cluster in parallel. Before you try this, read through our Slurm wiki page (https://cac.queensu.ca/wiki/index.php/SLURM) to learn how jobs are submitted to our production clusters.

The procedure for this starts off the same as for the interactive approach:

Vina Wizard -> Start Here -> (select Cluster(Portable Batch System)) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward

However, because the "Cluster" setting was selected in this case, the program does not actually run any docking software, but instead generates a large number of scripts in a directory

~/.PyRx_workspace/Macromolecules/MACRO

where "MACRO" stands for the name of the macromolecule you are using, and "~" is short for the name of your home directory. To run the actual analysis on our cluster, you now need to go into that directory and execute a "perl" script that we have provided for this purpose:

cd ~/.PyRx_workspace/Macromolecules/MACRO
PyRxVinaArray.pl

This will generate two new sub-directories, "jobs" and "logs", and copy in the scripts mentioned earlier; it then produces a job for our "Grid Engine" scheduler and submits it. Using the qstat command, you should then see something like:

$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:59:29 abaqus.q@sw0013.hpcvl.org          8 64
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:48:59 abaqus.q@sw0020.hpcvl.org          8 60
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:29 abaqus.q@sw0044.hpcvl.org          8 62
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:59 abaqus.q@sw0047.hpcvl.org          8 63
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:05:59 abaqus.q@sw0048.hpcvl.org          8 65
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:09:29 abaqus.q@sw0054.hpcvl.org          8 66
 952371 0.50734 runVina.sh hpcXXXX      qw    11/17/2015 09:30:03                                    8 67-511:1

As you can see, it's working on 6 "Vina" jobs simultaneously, with 8 processors each for a total of 48.

Once the "qstat" command does not show anything anymore, the analyses are finished, and you can go back to your PyRX GUI:

-> Forward
(check results in bottom window)

Note that this works only for analyses with Vina. If you want to do something similar with a different analysis (for instance AutoDock 4), please get in touch with us. We can probably come up with a solution.
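
The workflow above was written for the Grid Engine scheduler; as the note at the top of this section says, it is obsolete, and jobs on our production clusters are now submitted through Slurm (see the Slurm wiki page linked above). On a Slurm system, the natural replacement is a job array that walks through the generated run scripts. The sketch below is only a hypothetical starting point: the name runVina.sh and the task range are taken from the Grid Engine example above, but how the generated scripts expect to receive a task index is not documented here, so inspect the "jobs" directory and adapt before submitting.

#!/bin/bash
#SBATCH --job-name=runVina
#SBATCH --array=1-511            # one task per generated docking script (range is an assumption)
#SBATCH --cpus-per-task=8        # matches the 8 slots per task in the Grid Engine example
#SBATCH --time=01:00:00          # adjust to the expected length of one docking run
#SBATCH --output=logs/vina_%A_%a.out

# Hypothetical invocation: assumes the generated script accepts the array index
# as its argument; check the "jobs" directory to see how it is actually called.
cd ~/.PyRx_workspace/Macromolecules/MACRO
bash jobs/runVina.sh ${SLURM_ARRAY_TASK_ID}

If you save this as, say, vina_array.sh, it would be submitted with "sbatch vina_array.sh".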

Further Help
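
  • The PyRx Homepage (http://pyrx.sourceforge.net). Note that there is no manual for the software.
  • Basic questions are answered in a FAQ file: http://pyrx.sourceforge.net/faq
  • Scripps provides a forum that may help you out: http://mgl.scripps.edu/forum/viewforum.php?f=25
  • Some videos demonstrate how the software is used: http://pyrx.sourceforge.net/videos
  • If you run into issues you can't resolve, please send email to cac.help@queensu.ca.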