Data Management

The cluster has several storage volumes, and you need to be familiar with them in order to manage your files.

/export is a 4.6TB volume. This is where your home directory is located (/home/<username>). This volume is incrementally backed up, nightly, to Carnegie Mellon's School of Computer Science tape backup system. The size of this volume limits what can be stored on it. In addition, if the volume becomes 100% full, it could cause the complete cluster to come to a standstill. Therefore, we have enabled quotas on this volume, which means that each user will only be able to use, at maximum, a predetermined amount of space. Please limit this volume to smaller files (e.g. configuration files, notes and documents, software development projects). (I believe the quota will be in the range of 100-200GB.)
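To see how close you are to the limit, you can check your usage with standard Unix tools. A minimal sketch, assuming quotas are reported through the usual quota mechanism (the exact limits above are still to be determined):

    # show your current usage and limits, in human-readable sizes
    quota -s

    # measure how much space your home directory actually consumes
    du -sh ~

    # check overall free space on the volume
    df -h /export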

For data files and logs, which tend to be considerably larger files, we have a large RAID 6 volume. The file system used for this space is ZFS, a relatively new file system that contains features and benefits not found in more traditional Unix file systems. This volume will NOT be backed up by the CMU SCS tape backup system (it is too large for this). The volume has been configured to be reliable, but we will also have a method in place to protect the data in the unlikely event of a complete RAID failure: a second RAID volume plus the snapshot feature included in the ZFS file system.

A snapshot is a read-only copy of a file system or volume. Snapshots can be created instantly, and they initially consume no additional disk space within the pool. However, as data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data, thus preventing that disk space from being freed. This will give users some protection against the accidental deletion of files.

ZFS file systems are built on top of virtual storage pools called zpools. We plan to create a zpool for each lab. In addition, we will create a UNIX group for each lab and put users in whichever groups they need in order to access that lab's zpool. Once the zpools are created, they'll be automounted under /data2 (e.g. /data2/tarrlab, /data2/coaxlab, etc.). Within the zpool, labs can organize their data however they would like. If files need to be recovered, we can do so using the snapshots for that zpool; a sketch of how this works follows.
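As an illustration of snapshot-based recovery, here is a minimal sketch. The dataset name data2/tarrlab matches the example mount points above, but the snapshot name, schedule, and file paths are assumptions for illustration only:

    # (admin) take a snapshot of a lab's dataset; the name after @ is arbitrary
    zfs snapshot data2/tarrlab@2015-06-12

    # list the snapshots that exist for that dataset
    zfs list -t snapshot -r data2/tarrlab

    # snapshots are read-only and browsable through the hidden .zfs directory,
    # so an accidentally deleted file (hypothetical path shown) can be copied back
    cp /data2/tarrlab/.zfs/snapshot/2015-06-12/results/run1.log /data2/tarrlab/results/

    # check which UNIX groups you belong to, and therefore which lab zpools you can access
    groups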

-- David Pane - 2015-06-12

