HowTo:pyrx

PyRx

This is a quick introduction to the usage of the screening software PyRx that is installed on our clusters. It is meant as an initial pointer to more detailed information. It also explains a few specific details about local usage.

What is PyRx?

PyRx is virtual screening software for computational drug discovery that can be used to screen libraries of compounds against potential drug targets. It is a GUI built on a large body of established open-source software, including:

  • AutoDock 4 and AutoDock Vina as docking software.
  • AutoDockTools for generating input files.
  • Python as a programming/scripting language.
  • wxPython for cross-platform GUI.
  • The Visualization ToolKit (VTK) by Kitware, Inc.
  • Enthought Tool Suite, including Traits, for application building blocks.
  • Opal Toolkit for running AutoDock remotely using web services.
  • Open Babel for importing SDF files, removing salts and energy minimization.
  • matplotlib for 2D plotting.

Version, Location and Access

Two versions of the program are installed, 0.9.8 and 0.9.4 (somewhat modified), both as 64-bit Linux builds. The relevant executables are in /global/software/PyRx/0.9.8 and /global/software/PyRx/0.9.4, respectively. Documentation can be found at the main PyRx site.

Running PyRx

Setup

You can run PyRx only on the CAC login nodes. From there, the setup for PyRx is very simple. You only need to type:

module load PyRx/098

This adds the proper directory to your PATH, and you are ready to go.
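
A quick way to confirm that the module took effect is to check whether the PyRx launcher can be found on your PATH. The following is a minimal Python sketch using only the standard library. The helper name on_path is made up for this illustration; the executable name PyRx is the command used to start the GUI.

```python
import shutil


def on_path(command="PyRx"):
    """Return the full path of `command` if it is on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = on_path()
    if location is None:
        # Hypothetical hint; the module name is the one given above.
        print("PyRx not found; did you run 'module load PyRx/098'?")
    else:
        print("PyRx found at", location)
```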

Interactive runs

Issuing the command

PyRx

will pop up the GUI. All operations are performed from within that interface. At a minimum, you will have to specify a macromolecule and at least one compound that you want to "dock". These molecules can be specified in several formats such as pdb, pdbq, cif, mol2. You can Import or Load molecules from the

File -> Load Molecule

or the

File -> Import ...

menu entry.

The actual analysis is performed using various tabs in the GUI. As an example, we outline the steps for the "Vina Wizard", which runs the docking program "AutoDock Vina" for the analysis:

Vina Wizard -> Start Here -> (select /global/software/pyrx/0.9.4/bin/vina) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward
(check results in bottom window)
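
For orientation, the "Search Space" values set in the wizard correspond to the box center and box size options of the AutoDock Vina command line. The sketch below assembles such a command in Python without running it. How the wizard calls Vina internally is not documented here, so treat this as an illustration of the standalone Vina options only. The function name build_vina_command, the file names, and the box values are placeholders invented for this example.

```python
import shlex


def build_vina_command(vina, receptor, ligand, center, size, out):
    """Assemble an AutoDock Vina command line as a list of arguments.

    center and size are (x, y, z) tuples in Angstrom, playing the role
    of the "Search Space" values adjusted in the wizard.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        vina,
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--out", out,
    ]


if __name__ == "__main__":
    # File names and box values are placeholders, not real data.
    cmd = build_vina_command(
        vina="/global/software/pyrx/0.9.4/bin/vina",
        receptor="macro.pdbqt",
        ligand="ligand.pdbqt",
        center=(0.0, 0.0, 0.0),
        size=(25.0, 25.0, 25.0),
        out="ligand_out.pdbqt",
    )
    # Print the command in a shell-safe form instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning the command as a list keeps each argument separate, so it could later be passed to subprocess.run without shell quoting problems.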

There's of course a lot more to it. But the authors of the software claim that it is intuitive enough that you can figure anything out while doing it. Your mileage may vary.

Production runs (cluster mode)

NOTE: This section is obsolete and needs a complete overhaul.

If you are screening hundreds (or even thousands) of molecules using PyRx, the time required may be too long for interactive usage. PyRx offers a basic interface to a scheduler, but the default settings are too non-specific to work with our systems.

For the Vina Wizard, we have provided a workaround that lets you process a large number of runs in parallel on the machines of the SW cluster. Before you try this, please read through our Slurm wiki page to learn how jobs are submitted to our production clusters.

The procedure for this starts off the same as for the interactive approach:

Vina Wizard -> Start Here -> (select Cluster(Portable Batch System)) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward

In this case, however, the "Cluster" setting was selected. As a result, the program does not run any docking software itself but instead generates a large number of scripts in the directory

~/.PyRx_workspace/Macromolecules/MACRO

where "MACRO" stands for the name of the macromolecule you are using, and "~" is short for your home directory. To run the actual analysis on our cluster, go into that directory and execute the Perl script that we have provided for this purpose:

cd ~/.PyRx_workspace/Macromolecules/MACRO
PyRxVinaArray.pl

This creates two new sub-directories, "jobs" and "logs", copies the scripts mentioned earlier, then produces a job for our scheduler, Grid Engine, and submits it. Using the qstat command, you should then see something like:

$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:59:29 abaqus.q@sw0013.hpcvl.org          8 64
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:48:59 abaqus.q@sw0020.hpcvl.org          8 60
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:29 abaqus.q@sw0044.hpcvl.org          8 62
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:59 abaqus.q@sw0047.hpcvl.org          8 63
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:05:59 abaqus.q@sw0048.hpcvl.org          8 65
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:09:29 abaqus.q@sw0054.hpcvl.org          8 66
 952371 0.50734 runVina.sh hpcXXXX      qw    11/17/2015 09:30:03                                    8 67-511:1

As you can see, it's working on 6 "Vina" jobs simultaneously, with 8 processors each for a total of 48.
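
The count above can be reproduced by parsing the qstat listing. The following Python sketch tallies running tasks and the slots they occupy. It assumes array-job output laid out like the sample shown here, with the state in the fifth column and a ja-task-ID column at the end; the function name summarize_qstat is invented for this example.

```python
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:59:29 abaqus.q@sw0013.hpcvl.org          8 64
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:48:59 abaqus.q@sw0020.hpcvl.org          8 60
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:29 abaqus.q@sw0044.hpcvl.org          8 62
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 12:58:59 abaqus.q@sw0047.hpcvl.org          8 63
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:05:59 abaqus.q@sw0048.hpcvl.org          8 65
 952371 0.50734 runVina.sh hpcXXXX      r     11/17/2015 13:09:29 abaqus.q@sw0054.hpcvl.org          8 66
 952371 0.50734 runVina.sh hpcXXXX      qw    11/17/2015 09:30:03                                    8 67-511:1
"""


def summarize_qstat(text):
    """Return (running_tasks, slots_in_use) for array-job qstat output."""
    running = 0
    slots = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        # Assumption: the state is the fifth field and, because a
        # ja-task-ID column is present, slots is the second-to-last field.
        if fields[4] == "r":
            running += 1
            slots += int(fields[-2])
    return running, slots


if __name__ == "__main__":
    running, slots = summarize_qstat(SAMPLE)
    print(running, "running tasks using", slots, "slots")
```

For the sample listing this gives 6 running tasks and 48 slots; the pending "qw" entry is not counted.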

Once the "qstat" command no longer shows any jobs, the analyses are finished, and you can go back to your PyRx GUI:

-> Forward
(check results in bottom window)
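
Waiting for the qstat listing to become empty can be automated with a small polling loop. The Python sketch below is one possible way to do this, not part of the provided tooling. It assumes that qstat, called without arguments, lists only your own jobs and that you have no other jobs in the queue. The names qstat_output and wait_until_empty are invented for this example.

```python
import subprocess
import time


def qstat_output():
    """Run qstat and return its standard output as text."""
    result = subprocess.run(["qstat"], capture_output=True, text=True)
    return result.stdout


def wait_until_empty(poll=qstat_output, interval=60, sleep=time.sleep):
    """Block until poll() returns only whitespace.

    Returns the number of times poll() was called. The poll and sleep
    arguments can be replaced, which makes the loop easy to try out
    without a scheduler.
    """
    polls = 1
    while poll().strip():
        sleep(interval)
        polls += 1
    return polls
```

Calling wait_until_empty() on a login node would then return once no jobs are listed, at which point you can continue in the GUI.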

Note that this works only for the analysis with Vina. If you want to do something similar with a different analysis (for instance AutoDock 4), please get in touch with us. We can probably come up with a solution.

Further Help