Difference between revisions of "Filesystems:Frontenac"

From CAC Wiki
 
(26 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
 +
'''Important Note: Due to the transition to cost-recovery service many of the details of file system organization on the Frontenac GPFS file system are subject to change. Please refer back to these pages occasionally to stay abreast of these changes.'''
  
 
==Overview== <!--T:1-->
 
==Overview== <!--T:1-->
  
 
<!--T:2-->
 
<!--T:2-->
Compute Canada provides a wide range of storage options to cover the needs of our very diverse users. These storage solutions range from high-speed temporary local storage to different kinds of long-term storage, so you can choose the storage medium that best corresponds to your needs and usage patterns. In most cases the [https://en.wikipedia.org/wiki/File_system filesystems] on Compute Canada systems are a ''shared'' resource and for this reason should be used responsibly - unwise behaviour can negatively affect dozens or hundreds of other users. These filesystems are also designed to store a limited number of very large files, typically binary rather than text files, i.e. they are not directly human-readable. You should therefore avoid storing thousands of small files, where small means less than a few megabytes, particularly in the same directory. A better approach is to use commands like [[Archiving and compressing files|<tt>tar</tt>]] or <tt>zip</tt> to convert a directory containing many small files into a single very large archive file.  
+
The Frontenac cluster uses a shared [https://www.ibm.com/support/knowledgecenter/en/SSFKCN/gpfs_welcome.html GPFS filesystem] for all file storage.
 
+
User files are located under <code>/global/home</code> of '''500GB''' quota, shared project space under <code>/global/project</code>,  
 +
and network scratch space under <code>/global/scratch</code> of '''5TB''' quota.
 +
In addition to the network storage, compute nodes have up to '''1.5TB''' of local hard disk for fast access to local scratch space by jobs using
 +
the location specified by the <code>$TMPDISK</code> environment variable. All files in the local scratch space are assumed to be deleted automatically
 +
when corresponding jobs finish.
 
<!--T:3-->
 
<!--T:3-->
It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide an indefinite archiving service so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system or between two Compute Canada systems should generally be done using [[Globus]].  
+
Note that it is the user's responsibility to manage the age of their data: these file systems do not provide archiving.
 +
If data are no longer needed, they need to be moved off the system. If you need assistance with this, please contact us.
 +
This is especially important as we are charging for storage in the Terabyte range. At present, ''nearline'' data (see explanation below) are free, but ''project'' data (see below) are subject to charges on an annual basis. Detail about our cost structure can be found at [[Frontenac:Fees|our Fees Information Page]].
  
<!--T:4-->
+
== Storage Areas == <!--T:5-->
Note that Compute Canada storage systems are not for personal use and should only be used to store research data.
+
  
<!--T:17-->
+
Unlike your personal computer, our system has several storage spaces or file systems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal file systems available, and the intended use of each one along with its characteristics. Storage options are distinguished by the available hardware, access mode and write system. Typically, most Compute Canada systems offer the following storage types:
When your account is created on Cedar and Graham, your home directory will not be entirely empty. It will contain references to your scratch and project spaces through the mechanism of a [https://en.wikipedia.org/wiki/Symbolic_link symbolic link], a kind of shortcut that allows easy access to these other filesystems from your home directory. While your home and scratch spaces are unique to you as an individual user, the project space is a shared by a research group. This group may consist of those individuals with a Compute Canada account sponsored by a particular faculty member or members of a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ RAC allocation]. A given individual may thus have access to several different project spaces, associated with one or more faculty members, with symbolic links to these different project spaces in the directory projects of your home. Every account has a default project and this default project is what the symbolic link project in your home directory points to. For users with a single active sponsored role is the default project of your sponsor while users with more than one active sponsored role will have a default project that corresponds to the default project of the faculty member with the most sponsored accounts.
+
 
+
<!--T:16-->
+
All users can check the available disk space and the current disk utilization for the ''project'', ''home'' and ''scratch'' file systems with the command line utility '''''diskusage_report''''', available on both '''Cedar''' and '''Graham'''. To use this utility, log into Cedar or Graham using SSH, at the command prompt type diskusage_report, and press the Enter key. Following is a typical output of this utility:
+
<pre>
+
# diskusage_report
+
                  Description                Space          # of files
+
                Home (username)        280 kB/47 GB              25/500k
+
              Scratch (username)        4096 B/18 TB              1/1000k
+
      Project (def-username-ab)      4096 B/9536 GB              2/5000k
+
          Project (def-username)      4096 B/9536 GB              2/5000k
+
</pre>
+
 
+
== Storage Types == <!--T:5-->
+
Unlike your personal computer, a Compute Canada system will typically have several storage spaces or filesystems and you should ensure that you are using the right space for the right task. In this section we will discuss the principal filesystems available on most Compute Canada systems and the intended use of each one along with its characteristics. Storage options are distinguished by the available hardware, access mode and write system. Typically, most Compute Canada systems offer the following storage types:
+
  
 
<!--T:6-->
 
<!--T:6-->
;Network Filesystem (NFS)
+
;Global Parallel File System (GPFS)
: This type of storage is generally equally visible on both login and compute nodes. This is the appropriate place to put small but important files that are regularly used: source code, programs, job scripts and parameter files. This type of storage offers performance comparable to a conventional hard disk.
+
: This file system is visible on both login and compute nodes. Combining multiple disk arrays and fast servers, it offers excellent performance for large files and large input/output operations. Two types of storage are distinguished on such systems: long term storage and temporary storage (scratch). Performance is subject to variations caused by other users.
;Parallel Filesystem (Lustre, GPFS)
+
: This type of storage is generally equally visible on both login and compute nodes. Combining multiple disk arrays and fast servers, it offers excellent performance for large files and large input/output operations. Often two types of storage are distinguished on such systems: long term storage and temporary storage (scratch). Performance is subject to variations caused by other users.
+
 
;Local Filesystem
 
;Local Filesystem
: This type of storage consists of a local hard drive attached to each compute node. Its advantage is that its performance is high because it is very rarely shared --- typically, only one user will access a local drive at a time. However, you must copy your files back to another storage medium like the scratch space or project space before your job ends because everything will be cleaned after each job.
+
: This is a local hard drive attached to each of the nodes. Its advantage is high performance (because it is rarely shared). Its disadvantage is that local files must be copied back to a global area to be visible on other nodes such as the login (workup) node. Typically, local disk is regularly "cleaned", i.e. data kept there are considered transitory.
 
;RAM (memory) Filesystem
 
;RAM (memory) Filesystem
: This is a filesystem that exists within a compute node's RAM, so its use reduces available memory for computations. Such filesystems are very fast for small files and particularly faster than other systems when file access is random. A RAM disk is always cleaned at the end of a job.
+
: This is a filesystem that exists within a node's RAM, so it reduces the available memory. This makes it very fast but low-capacity. A RAM disk must be cleaned at the end of a job.
  
 
<!--T:7-->
 
<!--T:7-->
Line 50: Line 40:
 
! scope="col" width="120px" | Longevity
 
! scope="col" width="120px" | Longevity
 
|-
 
|-
|Network Filesystem (NFS)
+
|GPFS (/global/home, /global/project ...)
|style="background-color:#00c000;"|All nodes
+
|style="background-color:#ff0000;"|Poor
+
|style="background-color:#ff0000;"|High
+
|style="background-color:#00c000;"|Long term
+
|-
+
|Long-Term Parallel Filesystem
+
 
|style="background-color:#00c000;"|All nodes
 
|style="background-color:#00c000;"|All nodes
 
|style="background-color:#ffff00;"|Fair
 
|style="background-color:#ffff00;"|Fair
Line 62: Line 46:
 
|style="background-color:#00c000;"|Long term
 
|style="background-color:#00c000;"|Long term
 
|-
 
|-
|Short-Term Parallel Filesystem
+
|GPFS (/global/scratch)
 
|style="background-color:#00c000;"|All nodes
 
|style="background-color:#00c000;"|All nodes
 
|style="background-color:#ffff00;"|Fair
 
|style="background-color:#ffff00;"|Fair
Line 68: Line 52:
 
|style="background-color:#ffff00;"|Short term (periodically cleaned)
 
|style="background-color:#ffff00;"|Short term (periodically cleaned)
 
|-
 
|-
|Local Filesystem
+
|Local Filesystem (TMPDIR)
 
|style="background-color:#ff0000;"|Local to the node
 
|style="background-color:#ff0000;"|Local to the node
 
|style="background-color:#ffff00;"|Fair
 
|style="background-color:#ffff00;"|Fair
Line 74: Line 58:
 
|style="background-color:#ff0000;"|Very short term
 
|style="background-color:#ff0000;"|Very short term
 
|-
 
|-
|Memory (RAM) Filesystem
+
|Memory (RAM) FS
 
|style="background-color:#ff0000;"|Local to the node
 
|style="background-color:#ff0000;"|Local to the node
 
|style="background-color:#00c000;"|Good
 
|style="background-color:#00c000;"|Good
Line 80: Line 64:
 
|style="background-color:#ff0000;"|Very short term, cleaned after every job
 
|style="background-color:#ff0000;"|Very short term, cleaned after every job
 
|}
 
|}
'''Throughput''' describes the efficiency of the file system for large operations, such as those involving a megabyte or more per read or write.
+
'''Throughput''' describes the efficiency of the file system for large operations. Sometimes also called "bandwidth" in the context of FS-IO.
  
<!--T:15-->
+
'''Latency''' describes the efficiency of the file system for small operations. Low latency is good.
'''Latency''' describes the efficiency of the file system for multiple small operations. Low latency is good; however, if one has a choice between a small number of large operations and a large number of small ones, it is almost always better to use a small number of large operations.
+
  
== Best practices == <!--T:9-->
+
== Quotas == <!--T:10-->
* Only use text format for files that are smaller than a few megabytes.
+
* As far as possible, use local storage for temporary files. It is best to use the temporary directory created by the [[Running jobs|job scheduler]] for this, named <code>$SLURM_TMPDIR</code>.
+
* If your program must search within a file, it is fastest to do it by first reading it completely before searching, or to use a RAM disk.
+
* Regularly clean up your data in the scratch and project spaces, because those filesystems are used for huge data collections.
+
* If you no longer use certain files but they must be retained, [[Archiving and compressing files|archive and compress]] them, and if possible copy them elsewhere.
+
* If your needs are not well served by the available storage options please contact us by sending an e-mail to [mailto:support@computecanada.ca Compute Canada support].
+
 
+
==Filesystem Quotas and Policies== <!--T:10-->
+
  
 
<!--T:11-->
 
<!--T:11-->
In order to ensure that there is adequate space for all Compute Canada users, there are a variety of quotas and policy restrictions concerning back-ups and automatic purging of certain filesystems.
+
On our cluster, each user has access to the /global/home and /global/scratch spaces by default and each group has access to project space in /global/project. These areas are subject to disk quotas.
On a cluster, each user has access to the home and scratch spaces by default and each group has access to 1 TB of project space by default. The nearline space has a default quota of 5 TB per group which is made available upon request by writing to [mailto:support@computecanada.ca Compute Canada support]. The nearline filesystem is made up of medium to low performance storage in very high capacity. This filesystem should be used for storage of data that is infrequently accessed that needs to be kept for long periods of time. Both the nearline and project spaces may have their group-based quotas increased allocated through the annual RAC (resource allocation) process. You can see your current usage of the current quota for various filesystems on Cedar and Graham using the command <tt>diskusage_report</tt>.
+
 
+
 
<!--T:12-->
 
<!--T:12-->
 
{| class="wikitable" style="font-size: 95%; text-align: center;"
 
{| class="wikitable" style="font-size: 95%; text-align: center;"
 
|+Filesystem Characteristics  
 
|+Filesystem Characteristics  
! Filesystem
+
! Area
! Default Quota
+
! Quota
! Lustre-based?
+
! Backup ?
! Backed up?
+
! Purge ?
! Purged?
+
! Default ?
! Available by Default?
+
! On Nodes?
! Mounted on Compute Nodes?
+
 
|-
 
|-
|Home Space
+
|/global/home
|50 GB and 500K files per user
+
| 500 GB
|Yes for Cedar, No for Graham (NFS)
+
| Yes
|Yes
+
| No
|No
+
| Yes
|Yes
+
| Yes
|Yes
+
 
|-
 
|-
|Scratch Space
+
| /global/scratch
|20 TB (100 TB on Graham) and 1M files per user<ref>Scratch space on Cedar can be increased to 100 TB per user upon request to [mailto:support@computecanada.ca Compute Canada support].</ref>
+
| 5 TB
|Yes
+
| No
|No
+
| Yes
|Yes, all files older than 60 days are subject to purging.
+
| Yes
|Yes
+
| Yes
|Yes
+
 
|-
 
|-
|Project Space
+
| /global/project
|1 TB and 5M files per group<ref>Project space can be increased to 10 TB per group upon request to [mailto:support@computecanada.ca Compute Canada support] and requests by different members of the same group will be summed together up to the ceiling of 10 TB.</ref>
+
|
|Yes
+
| Yes
|Yes
+
| No
|No
+
| Yes
|Yes
+
| Yes
|Yes
+
 
|-
+
|Nearline Space
+
|5 TB per group
+
|No
+
|No
+
|No
+
|No
+
|No
+
 
|}
 
|}
The backup policy on the home and project space is nightly backups which are retained for 30 days, while deleted files are retained for a further 60 days. If you wish to recover a previous version of a file or directory, you should write to [mailto:support@computecanada.ca Compute Canada support] with the full path for the file(s) and desired version (by date). To copy data from the nearline storage to the project, home or scratch space, you should also write to [mailto:support@computecanada.ca Compute Canada support].
 
<references />
 
 
== See also == <!--T:13-->
 
 
<!--T:14-->
 
* [[Project layout]]
 
* [[Sharing data]]
 
* [[National Data Cyberinfrastructure]]
 
* [[Tuning Lustre]]
 
* [[Archiving and compressing files]]
 
</translate>
 
 
 
The Frontenac cluster uses a shared [https://www.ibm.com/support/knowledgecenter/en/SSFKCN/gpfs_welcome.html GPFS filesystem] for all file storage.
 
User files are located under <code>/global/home</code> of 3TB quota, shared project space under <code>/global/project</code>,
 
and network scratch space under <code>/global/scratch</code> of 5TB quota.
 
  
In addition to the network storage, each compute node has a 1.5TB local hard disk for fast access to local scratch space by jobs using the location specified by the <code>$TMPDISK</code> environment variable. All files in the local scratch space are assumed to be deleted automatically when corresponding jobs finish.
+
== Some Tips == <!--T:9-->
 +
* Avoid text format files for large data.
 +
* Use local storage for temporary files. The scheduler provides this (<code>$TMPDIR</code> or <code>$TMPDISK</code>), which is created when your job starts on a compute node, e.g. <code>TMPDISK=/lscratch/slurm-job-6363084</code>.
 +
* Searches should be done in memory rather than on disk.
 +
* Regularly clean up data, especially in scratch.
 +
* Unused files that have to be kept should be moved off-system.

Latest revision as of 13:38, 27 September 2023

Important Note: Due to the transition to a cost-recovery service, many details of file system organization on the Frontenac GPFS file system are subject to change. Please refer back to these pages occasionally to stay abreast of these changes.

Overview

The Frontenac cluster uses a shared GPFS filesystem for all file storage. User files are located under /global/home (500GB quota), shared project space under /global/project, and network scratch space under /global/scratch (5TB quota). In addition to the network storage, compute nodes have up to 1.5TB of local hard disk that jobs can use as fast local scratch space, at the location specified by the $TMPDISK environment variable. Assume that all files in the local scratch space are deleted automatically when the corresponding job finishes.

Note that it is the user's responsibility to manage the age of their data: these file systems do not provide archiving. Data that are no longer needed must be moved off the system. If you need assistance with this, please contact us. This is especially important because we charge for storage in the terabyte range. At present, nearline data are free, but project data (see below) are subject to charges on an annual basis. Details about our cost structure can be found at our Fees Information Page.

Storage Areas

Unlike your personal computer, our system has several storage spaces or file systems, and you should ensure that you are using the right space for the right task. This section discusses the principal file systems available and the intended use of each one, along with its characteristics. Storage options are distinguished by the available hardware, access mode and write system. Our system offers the following storage types:

Global Parallel File System (GPFS)
This file system is visible on both login and compute nodes. Combining multiple disk arrays and fast servers, it offers excellent performance for large files and large input/output operations. Two types of storage are distinguished on such systems: long term storage and temporary storage (scratch). Performance is subject to variations caused by other users.
Local Filesystem
This is a local hard drive attached to each of the nodes. Its advantage is high performance (because it is rarely shared). Its disadvantage is that local files must be copied back to a global area to be visible on other nodes such as the login (workup) node. Typically, local disk is regularly "cleaned", i.e. data kept there are considered transitory.
RAM (memory) Filesystem
This is a filesystem that exists within a node's RAM, so it reduces the available memory. This makes it very fast but low-capacity. A RAM disk must be cleaned at the end of a job.
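
The copy-back requirement for the local filesystem can be sketched in Python. This is a minimal illustration under stated assumptions, not a site-provided tool: the helper name run_in_local_scratch and the work callable are hypothetical, and falling back to the system temporary directory when $TMPDISK is unset is an assumption made so the sketch also runs outside a job.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_in_local_scratch(work, result_dir):
    """Run work(scratch_dir) in node-local scratch, then copy its output files to result_dir.

    work is a hypothetical callable that writes files into the directory it receives.
    Returns the names of the files that were copied.
    """
    # $TMPDISK is the variable named on this page; the fallback is an assumption.
    base = os.environ.get("TMPDISK")
    if not base or not os.path.isdir(base):
        base = tempfile.gettempdir()
    scratch = Path(tempfile.mkdtemp(prefix="job-", dir=base))
    try:
        work(scratch)
        # Copy results to a globally visible area before the local files disappear.
        result_dir = Path(result_dir)
        result_dir.mkdir(parents=True, exist_ok=True)
        copied = []
        for item in sorted(scratch.iterdir()):
            if item.is_file():
                shutil.copy2(item, result_dir / item.name)
                copied.append(item.name)
        return copied
    finally:
        # Remove the scratch directory explicitly instead of relying on cleanup.
        shutil.rmtree(scratch, ignore_errors=True)
```

In a real job, result_dir would typically point somewhere under /global/scratch or /global/project so that the results are visible from the login node.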

The following table summarizes the properties of these storage types.

Description of storage type

Type | Accessibility | Throughput | Latency | Longevity
GPFS (/global/home, /global/project ...) | All nodes | Fair | High | Long term
GPFS (/global/scratch) | All nodes | Fair | High | Short term (periodically cleaned)
Local Filesystem (TMPDIR) | Local to the node | Fair | Medium | Very short term
Memory (RAM) FS | Local to the node | Good | Very low | Very short term, cleaned after every job

Throughput describes the efficiency of the file system for large operations. In the context of file system I/O, it is sometimes also called "bandwidth".

Latency describes the efficiency of the file system for small operations. Low latency is good.
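
Because latency is paid on every operation, a program that issues many small writes to a high-latency file system spends most of its time waiting. The following Python sketch shows one way to batch small records into fewer, larger writes. The function name write_batched and the default batch size are hypothetical choices for illustration, not tuned values for this system.

```python
import os
import tempfile


def write_batched(records, path, batch_bytes=1024 * 1024):
    """Write byte records to path, flushing in batches of roughly batch_bytes.

    Returns the number of write calls issued, to show how batching reduces them.
    """
    calls = 0
    pending = []
    pending_size = 0
    # buffering=0 turns off Python's own buffer, so each write() call
    # is passed to the operating system as a single request.
    with open(path, "wb", buffering=0) as handle:
        for record in records:
            pending.append(record)
            pending_size += len(record)
            if pending_size >= batch_bytes:
                handle.write(b"".join(pending))
                calls += 1
                pending, pending_size = [], 0
        if pending:
            # Flush whatever is left after the last full batch.
            handle.write(b"".join(pending))
            calls += 1
    return calls


if __name__ == "__main__":
    lines = [b"x" * 99 + b"\n" for _ in range(10_000)]  # 10,000 records of 100 bytes
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.bin")
        print(write_batched(lines, target, batch_bytes=1))        # one write per record
        print(write_batched(lines, target, batch_bytes=100_000))  # ten large writes
```

Both calls produce the same file; only the number of operations sent to the file system differs.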

Quotas

On our cluster, each user has access to the /global/home and /global/scratch spaces by default and each group has access to project space in /global/project. These areas are subject to disk quotas.

Filesystem Characteristics

Area | Quota | Backed up? | Purged? | Available by default? | Mounted on compute nodes?
/global/home | 500 GB | Yes | No | Yes | Yes
/global/scratch | 5 TB | No | Yes | Yes | Yes
/global/project | (none listed) | Yes | No | Yes | Yes
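
As a rough way to see how close a directory tree is to one of these quotas, the sizes of its files can be summed and compared with the limit. The Python sketch below is an illustration only: the function names are hypothetical, interpreting GB and TB as decimal units is an assumption, and the figures reported by the file system's own quota accounting remain authoritative.

```python
import os

# Quotas taken from the table above, in bytes (decimal units are an assumption).
QUOTA_BYTES = {
    "/global/home": 500 * 10**9,
    "/global/scratch": 5 * 10**12,
}


def directory_usage_bytes(root):
    """Sum the apparent sizes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # Skip symbolic links so linked data is not counted here.
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


def quota_fraction(used_bytes, quota_bytes):
    """Return usage as a fraction of the quota (1.0 means the quota is reached)."""
    return used_bytes / quota_bytes
```

Walking a large tree issues many small metadata operations, so a scan like this is best run sparingly on a shared file system.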

Some Tips

  • Avoid text format files for large data.
  • Use local storage for temporary files. The scheduler provides this ($TMPDIR or $TMPDISK), which is created when your job starts on a compute node, e.g. TMPDISK=/lscratch/slurm-job-6363084.
  • Searches should be done in memory rather than on disk.
  • Regularly clean up data, especially in scratch.
  • Unused files that have to be kept should be moved off-system.
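
The tip about searching in memory can be illustrated with a short Python sketch: the file is read once in a single large operation, and all searches then run against the in-memory copy. The function name count_matches_in_memory is hypothetical, and the approach assumes the file fits comfortably in the memory available to the job.

```python
def count_matches_in_memory(path, patterns):
    """Read path once, then count non-overlapping occurrences of each byte pattern."""
    with open(path, "rb") as handle:
        data = handle.read()  # one large sequential read instead of repeated seeks
    return {pattern: data.count(pattern) for pattern in patterns}
```

For example, count_matches_in_memory("results.log", [b"ERROR", b"WARNING"]) would return a dictionary with one count per pattern; the file name here is a placeholder.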