Data Management

The cluster has several storage volumes, and you need to be familiar with them in order to manage your files.

Summary

  1. /home/<username> - each cluster user has a home directory on this volume, with space limited to under 200GB.
  2. /data2/<labname> - a ZFS pool for each lab's shared data (initial size limit of 10TB).
  3. /data2/<username> - a 1TB ZFS pool for individual user data.
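
To check how much space is free on each volume and how much you are using, the standard UNIX tools df and du work on all of them. A quick sketch (tarrlab is used as an example lab; substitute your own lab and username):

    # Free and used space on the volumes themselves
    df -h /home /data2/tarrlab

    # Size of everything under your own directories
    du -sh /home/<username> /data2/<username>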

Details of the various volumes

/home is a 4.6TB volume. This is where your home directory is located (/home/<username>). This volume is incrementally backed up nightly to Carnegie Mellon's School of Computer Science tape backup system. The size of this volume limits what can be stored on it, and if it ever becomes 100% full it could bring the entire cluster to a standstill. Therefore, we have enabled quotas on this volume, meaning each user can use at most a predetermined amount of space. Please keep only smaller files here (e.g. configuration files, notes and documents, software development projects). We expect the quota to be in the range of 100-200GB per user.
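
If the quotas are enforced with the standard Linux quota tools (an assumption; the exact mechanism on this cluster may differ), you can check your limit and current usage as shown below; du works in any case:

    # Report your quota and current usage in human-readable sizes
    # (assumes the standard 'quota' utility is installed and quotas are enabled)
    quota -s

    # Total size of everything under your home directory
    du -sh /home/<username>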

For data files and logs, which tend to be considerably larger, we have a large RAID 6 volume. The file system used for this space is ZFS, a relatively new file system with features and benefits not found in more traditional UNIX file systems. This volume will NOT be backed up by the CMU SCS tape backup system (it is too large for that). The volume has been configured to be reliable, but we will also have a method in place to protect the data in the unlikely event of a complete RAID failure: a second RAID volume combined with the snapshot feature of ZFS.

A snapshot is a read-only copy of a file system or volume. Snapshots can be created instantly and initially consume no additional disk space within the pool. However, as data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data, thus preventing that space from being freed. This gives users some protection against the accidental deletion of files.

ZFS file systems are built on top of virtual storage pools called zpools. We plan to create a zpool for each lab. In addition, we will create a UNIX group for each lab and add users to the groups of the labs whose zpools they need access to. Once the zpools are created, they will be automounted under /data2 (e.g. /data2/tarrlab, /data2/coaxlab, etc.). Within its zpool, a lab can organize its data however it likes. If files need to be recovered, we can do so using the snapshots for that zpool.
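
As a sketch of how snapshot-based recovery typically works on ZFS (the dataset name data2/tarrlab, the snapshot name daily-2015-06-11, and the file name results.txt are hypothetical; in practice an administrator may need to perform the restore for you):

    # List the snapshots that exist for a lab's zpool
    zfs list -t snapshot -r data2/tarrlab

    # Snapshots are also exposed read-only under the hidden .zfs directory,
    # so an accidentally deleted file can simply be copied back out
    ls /data2/tarrlab/.zfs/snapshot/
    cp /data2/tarrlab/.zfs/snapshot/daily-2015-06-11/results.txt /data2/tarrlab/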

Automounting

Automount: As mentioned above, we automount the zpools as well as other mount points on the cluster. With automounting, the system mounts a volume automatically in response to access by users or their programs. After a period during which the mount point is not accessed, the system unmounts the volume again.

For instance, if you do a listing (UNIX command 'ls') of /data2 while the zpool mounted there, /data2/tarrlab, has not been accessed for a while, it will NOT show up in the listing. But if you list it directly with 'ls /data2/tarrlab', you will see its contents: accessing the path directly causes the zpool to be mounted automatically, and it will then also appear in the output of 'ls /data2'.
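
A short illustration of that behavior (the output is representative, not exact; tarrlab is again the example pool):

    # Before the tarrlab pool has been accessed, it may not appear here
    ls /data2

    # Accessing the path directly triggers the automounter to mount it
    ls /data2/tarrlab

    # The pool now shows up in the parent directory listing as well
    ls /data2    # output now includes: tarrlab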

-- David Pane - 2015-06-12
