UploadingFiles:Frontenac

From CAC Wiki

Uploading / Downloading Files

For data transfers, we provide the login nodes and a dedicated transfer node:

Login nodes          login.cac.queensu.ca       smaller transfers (<1TB)
Data transfer node   transfer.cac.queensu.ca    larger transfers  (>1TB)
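
The choice between the two hosts can be scripted. The following is only a rough sketch, not an official tool: the function name pick_transfer_host is made up for this example, 1 TB is assumed to mean 10^12 bytes, and a transfer of exactly 1 TB (which the table does not cover) is sent to the transfer node.

```shell
# Print the host suggested by the table above for a transfer of the
# given size in bytes.
# Assumptions: 1 TB = 10^12 bytes; exactly 1 TB goes to the transfer node.
pick_transfer_host() {
    size_bytes=$1
    one_tb=1000000000000
    if [ "$size_bytes" -lt "$one_tb" ]; then
        echo "login.cac.queensu.ca"
    else
        echo "transfer.cac.queensu.ca"
    fi
}

pick_transfer_host 250000000000     # 250 GB
pick_transfer_host 2000000000000    # 2 TB
```

With GNU du, the size of a directory in bytes can be obtained with "du -sb DIRECTORY" and passed to the function.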

Using scp

Possibly the simplest way to upload or download files to or from our system is "scp" (secure copy). The syntax for a file transfer is:

 scp -r SOURCE TARGET 

SOURCE is the full path of the file or directory you want to transfer. TARGET is the full path of the file or directory you want to copy to. Both have the format

 username@address.of.system:/full/directory/path/filename 

If you are on the source system and want to upload to a remote system, you can omit the username and address of the source, including the colon. If you are in the directory that contains the source file (or directory), you can also omit the source path. The same applies to the target when you download. Here is an example of an upload from the current directory on Frontenac to the home directory on Graham. A directory named "workshop_nov14" is being transferred:

hasch@caclogin04$ scp -r workshop_nov14 hschmide@graham.computecanada.ca:
Warning: Permanently added the ECDSA host key for IP address '199.241.166.4' to the list of known hosts.
hschmide@graham.computecanada.ca's password:

Note that the -r option stands for "recursive" and is necessary when a full directory with all contents is being transferred. If only a single file is transferred, -r may be omitted.
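
A download works the same way with the roles reversed. The following sketch downloads a remote directory into the current local directory. "username" and the remote path are placeholders, the helper name download_dir is made up for illustration, and the command is only printed so that the example can be tried without a connection.

```shell
# Placeholder account on the login node; replace with your own.
REMOTE="username@login.cac.queensu.ca"

# Print the scp command that copies a remote directory into the current
# local directory ("."). Remove "echo" to actually run the transfer.
download_dir() {
    echo scp -r "$REMOTE:$1" .
}

download_dir /home/username/results
```

Here the local target is just ".", because username and address can be omitted on the side where you are logged in.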

The details for this command can be found in the official man pages.

Using sftp

For interactive file transfers, sftp offers an alternative to scp. Its main advantage is that if you are planning several separate transfers, the password has to be entered only once, at the beginning of the session. The syntax for starting a session is:

 sftp SYSTEM 

SYSTEM is of the format

 username@address.of.system 

Once you are logged into the system, you can use the "get" command to download from the system and the "put" command to upload to it. The -r option enables recursive uploads and downloads of a directory; it can be omitted for single files. SOURCE and TARGET are specified in that order and should include the full path. If a path is omitted, defaults such as the home directory or the present working directory are used. Here is an example of a directory upload from Frontenac to Graham:

hasch@caclogin04$ sftp hschmide@graham.computecanada.ca
hschmide@graham.computecanada.ca's password:
Connected to graham.computecanada.ca.
sftp> put -r omp-test
Uploading omp-test/ to /home/hschmide/omp-test
Entering omp-test/
omp-test/a.out                                                                             100%  814KB   4.3MB/s   00:00
omp-test/test.optrpt                                                                       100% 2096    91.7KB/s   00:00
omp-test/test.in                                                                           100%   11     0.5KB/s   00:00
omp-test/test.f90                                                                          100%  311    13.8KB/s   00:00
sftp> quit
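
The same commands can also be collected in a file and passed to sftp with its -b (batch) option. The sketch below prepares a download with "get". It assumes that non-interactive authentication (for example SSH keys) is set up, since batch mode cannot prompt for a password. "username" is a placeholder, and the sftp command is only printed.

```shell
# Write the sftp commands to a temporary batch file.
batch_file=$(mktemp)
cat > "$batch_file" <<'EOF'
get -r omp-test
quit
EOF

# Show what would be run; remove "echo" to start the transfer.
# Assumption: key-based login works for this (placeholder) account.
echo sftp -b "$batch_file" username@graham.computecanada.ca
```

The "get -r omp-test" line downloads the remote directory omp-test into the local working directory, mirroring the "put -r" upload above.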

The details for the command can be found in the official man pages.

Using a Secure File Transfer client

If you are working on your own machine (Windows, Mac, Linux desktop), we now recommend using WinSCP to transfer files to and from the cluster. Previously we recommended FileZilla, but it now installs malware. You can get WinSCP from this link. Once you have installed WinSCP and opened the WinSCP client, use the following instructions to connect.

In the Login panel, click New Site

  • File protocol: SFTP
  • Hostname: login.cac.queensu.ca
  • User: (your username)
  • Password: (Leave Blank, you'll be prompted)
  • Port: 22

Once connected, you should see your files on the cluster on the right-hand side and the files from your computer on the left. To transfer files between your computer and the cluster, drag and drop the files from one side to the other (or to and from your desktop).

Using Globus through a command-line interface

Globus provides a means to transfer large amounts of data in a batch framework, i.e. without "standing by" while the transfer is ongoing. Because this requires setting up an individual "endpoint", we don't recommend this method if only small amounts of data need to be transferred. However, if you are planning to move large amounts (in the TB range), Globus is a reliable and convenient method.

If you decide to go this route, follow the steps below.

Installing Globus Command-Line Interface (CLI)

We recommend doing the following installs in a separate directory.

$ mkdir globus
$ cd globus

The Globus CLI needs to be installed individually by each user. This is simple using the Python "pip" tool:

$ module load python
$ pip install --upgrade --user globus-cli
Collecting globus-cli [...response from pip installer...]

In addition, the "Globus Connect Personal CLI" needs to be installed. We add the directory it is in to the path:

$ wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
[...download response from wget...]
2018-11-12 09:57:00 (24.3 MB/s) - ‘globusconnectpersonal-latest.tgz’ saved [14501379/14501379]
$ tar xzf globusconnectpersonal-latest.tgz
$ cd globusconnectpersonal-2.3.6/
$ export PATH="$(pwd):$PATH"
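
Before continuing, it can help to confirm that both executables are actually found on the path. The helper below is a minimal sketch; the name check_in_path is made up for this example. Note that "pip install --user" typically places the globus executable in ~/.local/bin, which may need to be added to the path as well.

```shell
# Report whether each named command can be found on the current PATH.
# Returns non-zero if at least one of them is missing.
check_in_path() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

check_in_path globus globusconnectpersonal || echo "adjust PATH before continuing"
```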

Login to Globus

Once the CLI is installed, it can be used to log in to your Globus account. You need a Globus ID, which you can create yourself or (more likely) obtain through Compute Canada. Authentication is done through a browser. The globus login command provides a link to a Globus page, which you copy and paste into a browser. On that page you will be asked to provide your Globus ID and authorize some access. Finally, you will be given an authorization code, which you copy and paste back into the login session:

hpc1005@caclogin03$ globus login --no-local-server
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?[...etc...]
------------------------------------

Enter the resulting Authorization Code here: qLdfgbsbhdfugisbsusidfgsdbu

You have successfully logged in to the Globus CLI!

You can check your primary identity with
  globus whoami

For information on which of your identities are in session use
  globus session show

Logout of the Globus CLI with
  globus logout

"globus --help" provides a list of available commands that are used from the Globus CLI to initiate transfer sessions etc.

Creating, connecting, and verifying a personal endpoint

Globus works on the basis of "endpoints" between which any file transfer takes place. We need to create such an endpoint, then connect and verify it. First the creation. Make sure you are logged into Globus when you do this:

$ globus endpoint create --personal test-endpoint
Message:     Endpoint created successfully
Endpoint ID: cb8eed54-e72e-1e28-8aca-0a1edd5c824a
Setup Key:   b2224504-e78d-4a87-b8e5-679164e0877f

The Endpoint ID is used to initiate any transfer from the present system. The setup key is needed to connect the endpoint and verify it using the "globusconnectpersonal" command (make sure the directories for both the Globus CLI and the Globus Connect Personal CLI are in the path):

hpc1005@caclogin03$ globusconnectpersonal -setup b2224504-e78d-4a87-b8e5-679164e0877f
Configuration directory: $HOME .globusonline/lta
Contacting relay.globusonline.org:2223
Done!
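
Since the Endpoint ID and the Setup Key are needed again in later steps, it can be convenient to keep them in shell variables. The sketch below extracts both from the output of the endpoint creation. It works on a copy of the sample output shown above, so it assumes the output has exactly that layout.

```shell
# Sample output copied from the "globus endpoint create" example above.
create_output='Message:     Endpoint created successfully
Endpoint ID: cb8eed54-e72e-1e28-8aca-0a1edd5c824a
Setup Key:   b2224504-e78d-4a87-b8e5-679164e0877f'

# Split each line at the colon (plus any following spaces) and keep the value.
endpoint_id=$(printf '%s\n' "$create_output" | awk -F': *' '/^Endpoint ID:/ { print $2 }')
setup_key=$(printf '%s\n' "$create_output" | awk -F': *' '/^Setup Key:/ { print $2 }')

echo "endpoint: $endpoint_id"
echo "key:      $setup_key"
```

In practice, the text in create_output would come from running the command itself, for example with create_output=$(globus endpoint create --personal test-endpoint).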

At this point, your new endpoint should appear in the list of endpoints that you can generate with the "globus endpoint search" command:

$ globus endpoint search --filter-scope my-endpoints
ID                                   | Owner                     | Display Name
------------------------------------ | ------------------------- | -------------------
6345e4d2-5aab-1ab8-9565-0426a3d44368 | hschmide@computecanada.ca | Hartmut's PC at CAC
cb8eed54-e72e-1e28-8aca-0a1edd5c824a | hschmide@computecanada.ca | test-endpoint

The second entry is the endpoint we just created.

Find and verify the remote endpoint

An endpoint search can be used to find the system you want to transfer to (or from). We use the Compute Canada system "Cedar" as an example:

hpc1005@caclogin03$ globus endpoint search cedar
ID                                   | Owner                      | Display Name
------------------------------------ | -------------------------- | -----------------------------------
c99fd40c-5545-11e7-beb6-22000b9a448b | computecanada@globusid.org | computecanada#cedar-dtn
a962d108-7b4b-11e8-9446-0a6d4e044368 | computecanada@globusid.org | computecanada#cedar-mial
[...more lines...]

The first entry (the one with -dtn) is a data transfer node, so that is the one we want. To be allowed to transfer to that node, you need to authenticate to it.

hpc1005@caclogin03$ globus endpoint activate --no-browser --web c99fd40c-5545-11e7-beb6-22000b9a448b
Autoactivation succeeded with message: Endpoint activated successfully using cached credential

In this case, the credentials are already available to Globus because of earlier usage. If you are doing this for the first time, you will be given a "Web activation url" that you can copy and paste into a browser to authenticate. If the endpoint has already been activated and is still usable, you are shown the expiry date of the activation.
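
Picking the data transfer node out of the endpoint search output can also be scripted. The following sketch filters a copy of the sample search output for the entry whose display name ends in "-dtn". It assumes the table layout shown earlier, with columns separated by "|".

```shell
# Sample output copied from the "globus endpoint search cedar" example.
search_output='ID                                   | Owner                      | Display Name
------------------------------------ | -------------------------- | -----------------------------------
c99fd40c-5545-11e7-beb6-22000b9a448b | computecanada@globusid.org | computecanada#cedar-dtn
a962d108-7b4b-11e8-9446-0a6d4e044368 | computecanada@globusid.org | computecanada#cedar-mial'

# Columns are separated by "|" with optional spaces around it;
# print the ID (column 1) of rows whose display name (column 3) ends in -dtn.
dtn_id=$(printf '%s\n' "$search_output" | awk -F' *[|] *' '$3 ~ /-dtn$/ { print $1 }')

echo "$dtn_id"
```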

Starting Globus Connect

Finally, "globusconnectpersonal" can be started in the background. Again, make sure the executable is in your path.

$ nohup globusconnectpersonal -start &
[1] 116748

The shell prints the job number (in brackets) and the process ID. It is a good idea to note the process ID down.
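
One way to note the process ID down is to write it to a file right after starting the process. In the sketch below, "sleep 30" is only a stand-in for "globusconnectpersonal -start" so that the example runs anywhere, and the temporary file is an arbitrary choice.

```shell
# File that will hold the process ID (location is arbitrary).
pid_file=$(mktemp)

# Stand-in for: nohup globusconnectpersonal -start &
nohup sleep 30 >/dev/null 2>&1 &
echo "$!" > "$pid_file"     # $! holds the PID of the last background job

# Later: check whether the recorded process is still running.
# "kill -0" sends no signal, it only tests that the process exists.
if kill -0 "$(cat "$pid_file")" 2>/dev/null; then
    status="running"
else
    status="not running"
fi
echo "$status"

# Clean up the stand-in process used in this sketch.
kill "$(cat "$pid_file")"
```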

Initiating a file transfer

File transfer itself is now done with the globus transfer command:

$ globus transfer --encrypt cb8eed54-e72e-1e28-8aca-0a1edd5c824a:wfn.tar c99fd40c-5545-11e7-beb6-22000b9a448b:wfn.tar
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 0d2a128c-e695-11e8-8c9a-0a1d4c5c824a

The first argument is the endpoint ID and file name of the source; the second argument is the same for the target of the transfer. The progress of the transfer can be monitored from the Globus portal.
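
To make the structure of the two arguments explicit, the sketch below assembles them from the endpoint IDs and the file name used above and prints the resulting commands instead of running them. The "globus task show" line assumes that this subcommand is available in your version of the Globus CLI for checking a task from the command line.

```shell
# Endpoint IDs and file name taken from the example above.
src_endpoint="cb8eed54-e72e-1e28-8aca-0a1edd5c824a"
dst_endpoint="c99fd40c-5545-11e7-beb6-22000b9a448b"
file="wfn.tar"

# Each argument has the form ENDPOINT_ID:PATH.
src="${src_endpoint}:${file}"
dst="${dst_endpoint}:${file}"

# Printed rather than executed; remove "echo" to submit the transfer.
echo globus transfer --encrypt "$src" "$dst"

# Assumption: "globus task show" reports the state of a task by its ID.
task_id="0d2a128c-e695-11e8-8c9a-0a1d4c5c824a"
echo globus task show "$task_id"
```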

Shutting down a globus process

The "globusconnectpersonal" process that was started in the background before the transfer can be shut down by bringing it into the foreground and stopping it with Ctrl-C:

$ fg
nohup globusconnectpersonal -start
^C
$
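
The "fg" command only works in the shell session that started the process. From any other session, the process can be stopped through its process ID instead. The sketch below again uses "sleep 30" as a stand-in, and it assumes that the real process exits when it receives the default termination signal (SIGTERM) sent by "kill".

```shell
# Stand-in for the background "globusconnectpersonal -start" process.
sleep 30 &
gcp_pid=$!

# Stop the process by its PID (sends SIGTERM) and wait until it is gone.
kill "$gcp_pid"
wait "$gcp_pid" 2>/dev/null || true

# Confirm that the process no longer exists.
if kill -0 "$gcp_pid" 2>/dev/null; then
    status="still running"
else
    status="stopped"
fi
echo "$status"
```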

The Globus portals & Help

Compute Canada operates a Globus Portal which can be used to create a Globus account (if you don't already have one), or to initiate file transfers using a GUI. For the latter to work the personal endpoint has to be set up as described above, and the "globusconnectpersonal" process has to be running.

Alternatively, Globus offers a similar portal that can be accessed with the same credentials.

Extensive documentation about Globus is available at https://docs.computecanada.ca/wiki/Globus.

If you need assistance with using Globus on our systems, please send an email to cac.help@queensu.ca; we can guide you through the process.

Aspera/ascp

Some sites, such as the NCBI and EGA genomics archives, offer an Aspera server for data uploads and downloads. You can use the client software ascp, which offers parallel transfers and restarts. To use ascp with a site that runs an Aspera server, load the ascp module:

module load ascp/4.2.5

For details of using ascp, see http://download.asperasoft.com/download/docs/scp_client/2.5/aspera-client-unix.html