HowTo:pyrx

From CAC Wiki
Revision as of 14:50, 30 May 2016 by Hasch (Talk | contribs) (Version, Location and Access)

PyRx

This is a quick introduction to the usage of the screening software PyRx that is installed on the HPCVL clusters. It is meant as an initial pointer to more detailed information. It also explains a few specific details about local usage.

What is PyRx?

PyRx is virtual screening software for computational drug discovery that can be used to screen libraries of compounds against potential drug targets. It is a GUI built on a large body of established open source software such as:

  • AutoDock 4 and AutoDock Vina as docking software.
  • AutoDockTools for generating input files.
  • Python as a programming/scripting language.
  • wxPython for cross-platform GUI.
  • The Visualization ToolKit (VTK) by Kitware, Inc.
  • Enthought Tool Suite, including Traits, for application building blocks.
  • Opal Toolkit for running AutoDock remotely using web services.
  • Open Babel for importing SDF files, removing salts and energy minimization.
  • matplotlib for 2D plotting.

Version, Location and Access

The binary executable is in /opt/PyRx on the SW (Linux) cluster. The present version of the program is 0.9.4 (somewhat modified), and it is available on the Linux platform in its 64-bit version. All the relevant executables are in /opt/PyRx/jeff/0.9.4. Documentation can be found at the main PyRx site.

Running PyRx

Setup

You can run PyRx only on the swlogin1 Linux login node (it won't run on Solaris). From there, the setup for PyRx is very simple. It is only necessary to type:

use PyRx

This will enter the proper directory into your PATH and off you go.
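
To confirm that the setup worked, you can check whether the executable is now found on your PATH. The following is a minimal sketch; the helper name on_path is made up for illustration and is not part of PyRx or of the "use" command.

```shell
#!/bin/bash
# Hypothetical helper: report whether a command can be found on the PATH.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}
```

After "use PyRx", calling "on_path PyRx" should print the location of the executable. If it reports that PyRx was not found, the setup step did not take effect in the current shell.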

Interactive runs

Issuing the command

PyRx

will pop up the GUI. All operations are performed from within that interface. At a minimum, you will have to specify a macromolecule and at least one compound that you want to "dock". These molecules can be specified in several formats such as pdb, pdbq, cif, or mol2. You can import or load molecules from the

File -> Load Molecule

or the

File -> Import ...

menu entry.

The actual analysis is performed using various tabs on the GUI. As an example, we outline the steps using the "Vina Wizard", which runs a program called "AutoDock Vina" for the analysis:

Vina Wizard -> Start Here -> (select /opt/PyRx/0.9.2/bin/vina) -> Start
(highlight Ligands and Macromolecule(s)) -> Forward
(adjust values for Search Space) -> Forward
(check results in bottom window)

There's of course a lot more to it. But the authors of the software claim that it is intuitive enough that you can figure anything out while doing it. Your mileage may vary.
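
The "Search Space" adjusted in the wizard is a box given by a center and a size along each axis. AutoDock Vina on the command line accepts the same information in a plain "key = value" configuration file, which can be handy for keeping a record of the box you used. The following is a sketch only: the helper name write_vina_config is hypothetical, and all file names and numbers passed to it are placeholders, not values taken from this installation.

```shell
#!/bin/bash
# Hypothetical helper: write an AutoDock Vina configuration file.
#   $1      output file
#   $2, $3  receptor and ligand files (placeholders)
#   $4-$6   center of the search box (x y z)
#   $7-$9   size of the search box (x y z), in Angstrom
write_vina_config() {
    # printf repeats its format for each key/value pair.
    printf '%s = %s\n' \
        receptor "$2" ligand "$3" \
        center_x "$4" center_y "$5" center_z "$6" \
        size_x "$7" size_y "$8" size_z "$9" > "$1"
}
```

For example, "write_vina_config conf.txt receptor.pdbqt ligand.pdbqt 0 0 0 20 20 20" writes an eight-line file that could then be passed to Vina as "vina --config conf.txt --out result.pdbqt".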

Batch runs

Fluent can be run in batch mode. Since you likely have access to Fluent on your local machines, most interactive work can be done elsewhere, whereas the computationally intensive runs can be executed on a parallel system such as ours.

For this, you have to set up a batch command file that consists of a sequence of commands that are issued to Fluent. To get an idea of what such a batch command file looks like, you can produce a journal file during an interactive session, and edit it later to eliminate unnecessary commands. Note that this needs to be done using the command line inside Fluent, not the menu buttons of the GUI. In fact, it is best to generate journal files in sessions that have been started with the -g option, i.e. that do not use the GUI at all.

The "Text User Interface" that has to be used for writing batch files is documented in the Fluent documentation. Here is an example of a simple batch file that reads in a "case", initializes the flow, and runs the number of iterations given to /solve/iterate (a single one as written here; replace the 1 with e.g. 200 for a longer run). At the end a "data file" is written and Fluent exits.

rc fan.cas
/solve/initialize/initialize-flow
/solve/iterate 1
/file/write-data fan_1
exit
yes

Let's call this file "example.flin". Note that every command has to be included in the batch command file, including the answer "yes" to the question whether you really want to exit the program without saving the case file. Once you have produced a working command file, you can test it by calling

fluent 3ddp -g -i example.flin

We have assumed you are running a three-dimensional solver in double precision. You will have to alter this entry when the case is different. Make sure that the output file for the data (in this case, "fan_1.dat") does not exist before you start the job, otherwise the system will ask whether you want to overwrite it, and the answer is not in your command file.
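
One way to avoid the overwrite question is to move an existing data file out of the way before each test run. The following is a minimal sketch; the helper name move_aside and the ".old" suffix are choices made for this example only.

```shell
#!/bin/bash
# Hypothetical helper: rename an existing output file so that the batch
# run is not stopped by an overwrite prompt. Does nothing if the file
# does not exist.
move_aside() {
    if [ -e "$1" ]; then
        mv "$1" "$1.old"
        echo "moved $1 to $1.old"
    fi
}
```

Calling "move_aside fan_1.dat" right before "fluent 3ddp -g -i example.flin" keeps the previous result as "fan_1.dat.old" instead of deleting it. Note that a second call replaces an earlier ".old" copy.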

Once everything works you could submit this job into the background (using bash) by typing

fluent 3ddp -g -i example.flin > example.flout 2>&1 &

This would redirect standard output and standard error to example.flout. The point is that Fluent is run non-interactively this way, i.e. we can use the same technique to submit a production job to the scheduler, as shown in the next section.

Production runs

To submit a production job on our clusters, you must use the Grid Engine scheduler. To obtain details, read our Grid Engine help file. Production jobs that are run without the scheduler will be terminated by the system administrator.

For a Fluent production job, this means that rather than issuing the above batch command directly, you wrap it into a Grid Engine script that looks somewhat like this:

#!/bin/bash
#$ -S /bin/bash
#$ -q abaqus.q
#$ -l qname=abaqus.q
#$ -V
#$ -cwd
#$ -pe shm.pe 12
#$ -m be
#$ -M hpcXXXX@localhost
#$ -o STD.out
#$ -e STD.err
rm -f fan_1.dat
. /opt/fluent/ansys-16.1/setup_64bit.sh
fluent 3ddp -t$NSLOTS -g -i example.flin

Here we are running the above example batch file "example.flin" using 12 processors on a parallel machine. The output and any error messages from the system are redirected to files called "STD.out" and "STD.err", respectively. The "#$ -q" and "#$ -l" entries force execution on the Linux cluster (important!). Email notification is handled by the "#$ -m" and "#$ -M" lines. Replace "hpcXXXX" with your actual username and make sure that a file called ".forward" that contains your actual email address is in your home directory. This practice makes it impossible for other users to see your email address.

Many Fluent jobs that you run on our machines are likely to be quite large. To utilize the parallel structure of our machines, Fluent offers several options to execute the solver in a parallel environment, i.e. on several CPUs simultaneously. The default option for such runs is MPI, i.e. it uses the Message Passing Interface for inter-process communication.

To take advantage of the parallel capabilities of Fluent, you have to call the program with additional command line options that specify the details of your parallel run:

  • -tn where n is the number of processors requested, e.g. if you want to run with 12 processors, you would use the option -t12
  • -g specifies that the GUI should be suppressed. This is required for batch jobs.

Parallel jobs of longer runtime should only be run in batch using the Grid Engine. The number of processors "12" specified in our example script appears only once, after

#$ -pe shm.pe

which is where you let the Grid Engine know how many processors to allocate to run the program. The internal environment variable NSLOTS will automatically be set to this value and can then be used in the fluent command line.
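
NSLOTS is only set inside a Grid Engine job. If you want to try the same command line outside the scheduler, a fallback value is needed. The following is a minimal sketch; the helper name fluent_proc_option and the fallback of one process are assumptions made for this example, not something Grid Engine or Fluent provides.

```shell
#!/bin/bash
# Hypothetical helper: build the Fluent "-t" option from NSLOTS.
# Inside a Grid Engine job NSLOTS holds the number of allocated slots.
# Outside a job (NSLOTS unset or empty) this falls back to 1.
fluent_proc_option() {
    echo "-t${NSLOTS:-1}"
}
```

With this, "fluent 3ddp $(fluent_proc_option) -g -i example.flin" expands to "-t12" inside the example job and to "-t1" in a quick test on the command line.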

It is also necessary to source a setup file called setup_64bit.sh. This will set various environment variables and enable the Fluent program to properly interact with Grid Engine. If you are interested, take a look. The file is readable.

All processes are allocated within a single node. This is to make communication more efficient and to avoid problems with the control by Grid Engine. The effect of this is that, while still using MPI, Fluent employs a so-called shared-memory layer for communication. The disadvantage is that the size of the job is restricted by the number of cores on a node. Once the script has been adapted (let's call it "fluent.sh"), it can be submitted to the Grid Engine by

qsub fluent.sh

from the login node. Note that the job will be listed as a parallel job by the Grid Engine's "qstat" and "qmon" tools. Note also that submission of a parallel job in this way is only profitable for large systems that use many CPU cycles, since the overhead for assigning processes, preparing nodes, and communication between them is considerable.

There is an easier way to do this: We supply a small Perl script called AnsysSubmit that can be called directly and asks a few basic questions, such as the name of the job to be submitted and the number of processes to be used in the job. Simply type

AnsysSubmit

and answer the questions. The script expects a Fluent input file with "file extension" .flin to be present and will do everything else automatically. This is meant for simple Fluent job submissions. More complex job submissions are better done manually.

Further Help

Fluent is a complex software package, and requires some practice to be used efficiently. In this FAQ we cannot explain its use in any detail.

The documentation for Fluent can be accessed from inside the program GUI by clicking on the "Help" button on the upper right. This is in HTML format. The PDF version of the docs can be found in

/opt/fluent/ansys-16.0/v140/commonfiles/help/en-us/pdf

Fluent documentation is subject to the same license terms as the software itself, i.e. you have to be signed up as a Fluent user in order to access it.

If you are experiencing trouble running a batch command script, check carefully whether the sequence of commands is exactly in sync with the program. This might mean typing them in interactively as a test. If you have problems with the Grid Engine, read our FAQ on that subject, and maybe consult the manual for that software, which is accessible as a PDF file. HPCVL also provides user support in the case of technical problems: just send email to cac.help@queensu.ca.