UploadingFiles:Frontenac
Contents
- 1 Uploading / Downloading Files
- 1.1 Using scp
- 1.2 Using sftp
- 1.3 Using a Secure File Transfer client
- 1.4 Using Globus through a command-line interface
- 1.4.1 Installing Globus Command-Line Interface (CLI)
- 1.4.2 Login to Globus
- 1.4.3 Creating, connecting, and verifying a personal endpoint
- 1.4.4 Find and verify the remote endpoint
- 1.4.5 Starting Globus Connect
- 1.4.6 Initiating a file transfer
- 1.4.7 Shutting down a globus process
- 1.4.8 The Globus portals & Help
- 1.4.9 Aspera/ascp
Uploading / Downloading Files
For data transfers, we provide the login nodes and a dedicated transfer node:
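Node | Address | Intended use
------------------ | ----------------------- | ------------------------
Login nodes | login.cac.queensu.ca | smaller transfers (<1TB)
Data transfer node | transfer.cac.queensu.ca | larger transfers (>1TB)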
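The size guidance above can be turned into a small helper. The following is a minimal sketch, not a tool provided on the cluster: the function name `pick_transfer_host` is hypothetical, and it assumes that 1 TB means 10^12 bytes and that a transfer of exactly that size goes to the transfer node (the guidance above does not say which node to use at the boundary).

```shell
#!/bin/sh
# Hypothetical helper: choose a host based on the size of the transfer.
# Assumption: 1 TB = 10^12 bytes; sizes at or above that use the transfer node.
pick_transfer_host() {
    size_bytes=$1
    one_tb=1000000000000
    if [ "$size_bytes" -lt "$one_tb" ]; then
        echo "login.cac.queensu.ca"
    else
        echo "transfer.cac.queensu.ca"
    fi
}

# Example: a 250 GB data set goes through the login nodes.
pick_transfer_host 250000000000
```

With GNU du, the size of a directory could be passed in as `pick_transfer_host "$(du -sb mydir | cut -f1)"`, where `mydir` is a placeholder for your own data directory.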
Using scp
Possibly the simplest way to upload/download files to/from our system is "scp" (secure copy). The syntax for a file transfer is:
scp -r SOURCE TARGET
SOURCE is the full path of the file or directory you want to transfer. TARGET is the full path of the file or directory you want to copy to. Both of these are of the format
username@address.of.system:/full/directory/path/filename
If you are on the source system and want to upload to a remote system, you can omit the username and address of the source, including the colon. If you are in the directory that contains the source file (or directory), you can also omit the path of the source. The same applies to the target when you download. Here is an example of an upload from the current directory on Frontenac to the home directory on Graham. A directory named "workshop_nov14" is being transferred:
hasch@caclogin04$ scp -r workshop_nov14 hschmide@graham.computecanada.ca:
Warning: Permanently added the ECDSA host key for IP address '199.241.166.4' to the list of known hosts.
hschmide@graham.computecanada.ca's password:
Note that the -r option stands for "recursive" and is necessary when a full directory with all contents is being transferred. If only a single file is transferred, -r may be omitted.
The details for this command can be found in the official man pages.
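A download is the mirror image of the upload example: the remote side becomes the SOURCE and the local directory becomes the TARGET. The sketch below only prints the command so that it can be checked before it is run; the helper name `build_scp_download` and the account name `yourname` are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an scp command that downloads a remote
# file or directory into the current local directory.
build_scp_download() {
    user=$1
    host=$2
    remote_path=$3
    # -r is only required for directories, but it also works for single files.
    printf 'scp -r %s@%s:%s .\n' "$user" "$host" "$remote_path"
}

# "yourname" and "workshop_nov14" stand in for your own account and data.
build_scp_download yourname login.cac.queensu.ca workshop_nov14
```

Because no leading slash is given, the remote path is interpreted relative to the home directory on the remote system.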
Using sftp
For interactive file transfers, sftp offers an alternative to scp. Its main advantage is that if you are planning multiple separate transfers, the password has to be entered only once, at the beginning of the session. The syntax for opening a session is:
sftp SYSTEM
SYSTEM is of the format
username@address.of.system
Once you are logged into the system, you can use the "get" command to download from the system, and the "put" command to upload to it. The -r option enables recursive uploads/downloads of a directory; it can be omitted for single files. SOURCE and TARGET are specified in that order, and should include the full path. If the path is omitted, defaults such as the home directory or the present working directory are used. Here is an example of a directory upload from Frontenac to Graham:
hasch@caclogin04$ sftp hschmide@graham.computecanada.ca
hschmide@graham.computecanada.ca's password:
Connected to graham.computecanada.ca.
sftp> put -r omp-test
Uploading omp-test/ to /home/hschmide/omp-test
Entering omp-test/
omp-test/a.out 100% 814KB 4.3MB/s 00:00
omp-test/test.optrpt 100% 2096 91.7KB/s 00:00
omp-test/test.in 100% 11 0.5KB/s 00:00
omp-test/test.f90 100% 311 13.8KB/s 00:00
sftp> quit
The details for the command can be found in the official man pages.
Using a Secure File Transfer client
If you are working on your own machine (Windows, Mac, Linux desktop), we now recommend using WinSCP to transfer files to and from the cluster. Previously we had recommended FileZilla, but it now installs malware. You can get WinSCP from this link. Once you've installed WinSCP and opened the WinSCP client, use the following instructions to connect.
In the Login panel, click New Site
- File protocol: SFTP
- Hostname: login.cac.queensu.ca
- User: (your username)
- Password: (Leave Blank, you'll be prompted)
- Port: 22
Once connected, you should see your files on the cluster on the right-hand side, and the files from your computer on the left. To transfer files between your computer and the cluster, drag and drop the files from one side to the other (or to and from your desktop).
Using Globus through a command-line interface
Globus provides a means to transfer large amounts of data in a batch framework, i.e. without "standing by" while the transfer is ongoing. Since this requires setting up an individual "endpoint", we don't recommend this method if only small amounts of data need to be transferred. However, if you are planning to move large amounts (in the TB range), then Globus is a reliable and convenient method. See the CLI Documentation.
If you decide to go this route, follow the steps below.
Installing Globus Command-Line Interface (CLI)
We recommend doing the following installs in a separate directory.
$ mkdir globus
$ cd globus
The Globus CLI needs to be installed individually by the user. This is very simple using the python "pip" tool:
$ module load python
$ pip install --upgrade --user globus-cli
Collecting globus-cli
[...response from pip installer...]
In addition, the "Globus Connect Personal CLI" needs to be installed too. We're adding the directory it's in to the path.
$ wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
[...download response from wget...]
2018-11-12 09:57:00 (24.3 MB/s) - ‘globusconnectpersonal-latest.tgz’ saved [14501379/14501379]
$ tar xzf globusconnectpersonal-latest.tgz
$ cd globusconnectpersonal-2.3.6/
$ export PATH=`pwd`:$PATH
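An export of this kind only lasts for the current shell session, and it prepends the directory again each time it is repeated. As a minimal sketch (the function name `add_to_path` is hypothetical), the directory can be added only when it is not already on the path:

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # not present, put it in front
    esac
    export PATH
}

# Run from inside the unpacked globusconnectpersonal directory.
add_to_path "$(pwd)"
```

Calling the function a second time with the same directory leaves PATH unchanged, so it is safe to repeat in every new session.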
Login to Globus
Once the CLI is installed, it can be used to log in to your Globus account. You need a Globus ID, which you can create yourself or (more likely) obtain through Compute Canada. Authentication is done through a browser. The globus login command will provide a link to a Globus page, which you copy and paste into a browser. On that page you will be required to provide your Globus ID and authorize some access. Eventually you will be given an authorization code, which you can copy and paste back into the login session:
hpc1005@caclogin03$ globus login --no-local-server
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?[...etc...]
------------------------------------
Enter the resulting Authorization Code here: qLdfgbsbhdfugisbsusidfgsdbu
You have successfully logged in to the Globus CLI!
You can check your primary identity with
globus whoami
For information on which of your identities are in session use
globus session show
Logout of the Globus CLI with
globus logout
"globus --help" provides a list of available commands that are used from the Globus CLI to initiate transfer sessions etc.
Creating, connecting, and verifying a personal endpoint
Globus works on the basis of "endpoints" between which any file transfer takes place. We need to create such an endpoint, then connect and verify it. First the creation. Make sure you are logged into Globus when you do this:
$ globus endpoint create --personal test-endpoint
Message: Endpoint created successfully
Endpoint ID: cb8eed54-e72e-1e28-8aca-0a1edd5c824a
Setup Key: b2224504-e78d-4a87-b8e5-679164e0877f
The Endpoint ID is used to initiate any transfer from the present system. The setup key is necessary to connect the endpoint and verify it using the "globusconnectpersonal" command (make sure the directories for both the Globus CLI and Globus Connect Personal are in the path).
hpc1005@caclogin03$ globusconnectpersonal -setup b2224504-e78d-4a87-b8e5-679164e0877f
Configuration directory: $HOME/.globusonline/lta
Contacting relay.globusonline.org:2223
Done!
At this point, your new endpoint should appear in the list of endpoints that you can generate with the "globus endpoint search" command:
$ globus endpoint search --filter-scope my-endpoints
ID                                   | Owner                     | Display Name
------------------------------------ | ------------------------- | -------------------
6345e4d2-5aab-1ab8-9565-0426a3d44368 | hschmide@computecanada.ca | Hartmut's PC at CAC
cb8eed54-e72e-1e28-8aca-0a1edd5c824a | hschmide@computecanada.ca | test-endpoint
The second entry is the endpoint we just created.
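When transfers are scripted, the endpoint ID has to be picked out of this listing. The following sketch parses the default text output with awk; the function name `endpoint_id_by_name` is hypothetical, and it assumes the three-column `ID | Owner | Display Name` layout seen in the search output.

```shell
#!/bin/sh
# Hypothetical helper: read an endpoint listing on stdin and print the ID of
# the row whose Display Name matches the first argument exactly.
endpoint_id_by_name() {
    awk -F'|' -v name="$1" '{
        id = $1; label = $3
        gsub(/^[ \t]+|[ \t]+$/, "", id)      # trim padding around the ID
        gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim padding around the name
        if (label == name) print id
    }'
}

# Sample rows copied from the endpoint search output.
endpoint_id_by_name "test-endpoint" <<'EOF'
ID                                   | Owner                     | Display Name
------------------------------------ | ------------------------- | -------------------
6345e4d2-5aab-1ab8-9565-0426a3d44368 | hschmide@computecanada.ca | Hartmut's PC at CAC
cb8eed54-e72e-1e28-8aca-0a1edd5c824a | hschmide@computecanada.ca | test-endpoint
EOF
```

In a live session the listing would be piped in directly, e.g. `globus endpoint search --filter-scope my-endpoints | endpoint_id_by_name test-endpoint`. Display names that themselves contain a `|` character would not be handled by this simple parser.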
Find and verify the remote endpoint
An endpoint search can be used to find the system you want to transfer to (or from). We use the Compute Canada system "Cedar" as an example:
hpc1005@caclogin03$ globus endpoint search cedar
ID                                   | Owner                      | Display Name
------------------------------------ | -------------------------- | -----------------------------------
c99fd40c-5545-11e7-beb6-22000b9a448b | computecanada@globusid.org | computecanada#cedar-dtn
a962d108-7b4b-11e8-9446-0a6d4e044368 | computecanada@globusid.org | computecanada#cedar-mial
[...more lines...]
The first line (the one with -dtn) is a data transfer node, so that is what we are going for. To be allowed to transfer to that node, you need to authenticate to it.
hpc1005@caclogin03$ globus endpoint activate --no-browser --web c99fd40c-5545-11e7-beb6-22000b9a448b
Autoactivation succeeded with message: Endpoint activated successfully using cached credential
In this case, the credentials are already available to Globus because of earlier usage. If you're doing this for the first time, you will be provided with a "Web activation url" that you can copy and paste into a browser to authenticate. If the endpoint has already been activated and is still usable, you will be shown an expiry date for the activation.
Starting Globus Connect
Finally, "globusconnectpersonal" can be started in the background. Again, be sure to have the executable in your path.
$ nohup globusconnectpersonal -start &
[1] 116748
You're given a process number. It is a good idea to note it down.
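Rather than copying the process number by hand, the shell's `$!` variable can record it. The following is a minimal sketch, assuming a hypothetical helper name `start_in_background` and a PID file location of your choosing. Unlike a plain nohup call, it discards the command's output instead of writing it to nohup.out.

```shell
#!/bin/sh
# Hypothetical helper: start a command with nohup in the background and
# save its process ID to a file so it can be found again later.
start_in_background() {
    pidfile=$1
    shift
    nohup "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# Stand-in command for demonstration; on the cluster this could be
#   start_in_background "$HOME/gcp.pid" globusconnectpersonal -start
start_in_background "${TMPDIR:-/tmp}/demo.pid" sleep 5
cat "${TMPDIR:-/tmp}/demo.pid"
```

The saved number can later be read back with `cat` to check on the process with `ps -p`.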
Initiating a file transfer
File transfer itself is now done with the globus transfer command:
$ globus transfer --encrypt cb8eed54-e72e-1e28-8aca-0a1edd5c824a:wfn.tar c99fd40c-5545-11e7-beb6-22000b9a448b:wfn.tar
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 0d2a128c-e695-11e8-8c9a-0a1d4c5c824a
The first argument is the endpoint id and file name of the source, the second argument likewise for the target of the transfer. The progress of the transfer can be monitored from the Globus portal.
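Progress can also be checked from the command line. The sketch below pulls the Task ID out of the `globus transfer` output so that it can be handed to `globus task show`; the helper name `extract_task_id` is hypothetical, and the parsing assumes the `Task ID:` line format printed by the transfer command in this example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Task ID:" line read from stdin.
extract_task_id() {
    sed -n 's/^Task ID:[[:space:]]*//p'
}

# Sample output copied from the transfer example.
task_id=$(extract_task_id <<'EOF'
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 0d2a128c-e695-11e8-8c9a-0a1d4c5c824a
EOF
)
echo "$task_id"

# With a live Globus session, the ID could then be used as:
#   globus task show "$task_id"
```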
Shutting down a globus process
The "globusconnectpersonal" process that was started in the background before the transfer can be shut down by bringing it into the foreground and stopping it with Ctrl-C:
$ fg
nohup globusconnectpersonal -start
^C
$
The Globus portals & Help
Compute Canada operates a Globus Portal which can be used to create a Globus account (if you don't already have one), or to initiate file transfers using a GUI. For the latter to work the personal endpoint has to be set up as described above, and the "globusconnectpersonal" process has to be running.
Alternatively, Globus offers a similar portal that can be accessed with the same credentials.
Extensive documentation about Globus is available at https://docs.computecanada.ca/wiki/Globus.
If you need assistance with using Globus on our systems, please send an email to cac.help@queensu.ca; we can guide you through the process.
Aspera/ascp
Some sites, such as the NCBI and EGA genomics archives, offer an Aspera server for data uploads/downloads. You can use the client software ascp, which offers parallel transfers and restarts. To use ascp with a site that runs an Aspera server, load the ascp module:
module load ascp/4.2.5
For details of using ascp, see http://download.asperasoft.com/download/docs/scp_client/2.5/aspera-client-unix.html