Archiving data

Paul Kerry, version: 31 Oct 2003


Disks available

Although the hard disks in the ultracam computer systems are large, they are not infinite.
You must regularly monitor disk usage (especially on ucam2, which can create 6GB/hr) by using the df -h command. If a disk is more than 90% full, I/O is impaired. If, for example, /data on ucam2 reaches 100% while data is being written, the system will crash. You should move, delete or compress files before this situation arises.
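
The check described above can be scripted. The following is only a minimal sketch, assuming a POSIX shell and awk; it reads the output of df -P (the portable format, used here instead of df -h because its columns are easier to parse). The function name over_threshold is illustrative and is not part of the ultracam software.

```shell
# Print "mountpoint use%" for every filesystem that is fuller than a limit.
# Reads `df -P` style output on stdin, so it works with output from any machine.
# Assumption: mount points contain no spaces.
over_threshold() (
    limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {                  # skip the header line
            use = $5
            sub(/%/, "", use)     # "93%" -> "93"
            if (use + 0 > limit) print $6, $5
        }'
)
```

For example, df -P | over_threshold 90 lists only the filesystems that are more than 90% full, and prints nothing when all disks are below the limit.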

ultracam disks

(information taken from df -h)

Filesystem   Size   Mounted on
/dev/sda2    66G    /
/dev/hdb1    187G   /archive
/dev/sda1    23M    /boot
/dev/sdb1    67G    /ultracam_data
/dev/hda1    56G    /win

The /ultracam_data disk is for copying data from ucam2 so backups can be created.

The /archive disk is for long term storage of data files. You should read the further instructions on the use of /archive.

ucam2 disks

(information taken from df -h)

Filesystem   Size   Mounted on
/dev/hda2    74G    /
/dev/hda1    99M    /boot
/dev/hdb1    20G    /ultracam
/dev/hdb2    94G    /data
/dev/sda1    69G    /data1
/dev/sdb1    69G    /data2
/dev/sdc1    69G    /data3
/dev/sdd1    69G    /data4

The /ultracam disk contains the control software.

The /data disk is where the instrument writes data (the ultracam software writes to /ultracam/data, so do not remove the data symbolic link in /ultracam).
Ultracam raw data is created as two files, a data file (.dat) and an XML header file (.xml). Both files are saved with the prefix "run", followed by a number from 001 to 999.
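
Because every run consists of a .dat and a .xml file, it can be worth checking that neither half is missing before data is copied or deleted. A minimal sketch, assuming a POSIX shell; the function name missing_partners is illustrative only.

```shell
# Report run files in a directory whose partner (.dat or .xml) is absent.
# Prints one line per missing file; prints nothing if every pair is complete.
missing_partners() (
    dir=${1:-.}
    for f in "$dir"/run[0-9][0-9][0-9].dat "$dir"/run[0-9][0-9][0-9].xml; do
        [ -e "$f" ] || continue          # an unmatched pattern stays literal; skip it
        case $f in
            *.dat) partner=${f%.dat}.xml ;;
            *.xml) partner=${f%.xml}.dat ;;
        esac
        [ -e "$partner" ] || echo "missing: ${partner##*/}"
    done
)
```

Running missing_partners /data would list any incomplete runs on the data disk.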

/data1, /data2, /data3 and /data4 make up the SCSI data array (this is not a RAID system). These four disks are for saving recent data from the /data disk. The filesystem in use on these disks is reiserfs (the other disks are ext3 filesystems).

Copying data from ucam2 to ultracam

At the end of the night you should

on ucam2

on ultracam

Copying the files will take several hours as the network speed is 12MB/sec.
DO NOT USE scp -r AS THIS COULD CRASH THE COMPUTER BY OVERLOADING THE FILESYSTEM BUFFERS.
You can automate copying the files by using a script that is provided as a template.
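
The following is not the provided template, only a minimal sketch of the same idea: copying the run files one at a time rather than recursively. It assumes a POSIX shell. The function name copy_one_by_one and the COPY variable are illustrative; COPY defaults to scp and can be set to cp to try the loop locally.

```shell
# Copy run files one at a time instead of using a recursive copy.
# COPY is the copy command (default scp); the destination is passed
# through unchanged, so it may be a local path or host:path.
copy_one_by_one() (
    src=$1
    dest=$2
    copy=${COPY:-scp}
    for f in "$src"/run*.dat "$src"/run*.xml; do
        [ -f "$f" ] || continue          # skip patterns that matched nothing
        "$copy" -p "$f" "$dest"/ || { echo "failed: $f" >&2; exit 1; }
    done
)
```

A call might look like copy_one_by_one /data ultracam:/ultracam_data/2003_10_31, where the dated destination directory is a hypothetical example and must already exist.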

General notes about writing data files

It is likely that you will need to use multiple dvd-r disks or tapes to write an entire observing run.
If you are observing on multiple nights, you will probably end up with files of the same name, e.g. run001.dat and run001.xml from 1st Jan and run001.dat and run001.xml from 2nd Jan. Be careful that you do not copy files or create links with the same name onto your media. It is a good idea to create separate subdirectories, using the date of observation as the directory name.
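
One way to keep nights apart is sketched below, assuming a POSIX shell. It moves a night's run files into a subdirectory named after the date and never overwrites an existing file. The function name file_by_date and the date format in the example are illustrative choices.

```shell
# Move one night's run files into a subdirectory named after the observing date.
# A file that already exists in the target directory is left where it is.
file_by_date() (
    src=$1      # directory holding run*.dat and run*.xml
    night=$2    # directory name for the night, e.g. 2003_01_01
    mkdir -p "$src/$night" || exit 1
    for f in "$src"/run*.dat "$src"/run*.xml; do
        [ -f "$f" ] || continue
        target=$src/$night/${f##*/}
        if [ -e "$target" ]; then
            echo "skipped (exists): $target" >&2
        else
            mv "$f" "$target"
        fi
    done
)
```

After file_by_date /ultracam_data 2003_01_01, the files from 1st Jan sit in their own directory and a later run001.dat from 2nd Jan cannot be confused with them.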

Tape Drive

The tape drive within ultracam is a Sony SDT-10000 DDS4 20GB (native) device. This also supports all lower DDS versions.
If the status light is on and then flashes off every 4 seconds, then the drive needs cleaning. There is a cleaning cartridge within the computing box file. Simply insert the cleaning tape and wait a few moments - the lights will flash while cleaning is in progress and then the tape will eject automatically. All the lights should now be off. Please write the date on the label so there is a record of how many times the current tape has been used.

The tape drive names are

Tapes are written in the following style

When using multiple tar files on a tape, a marker is automatically inserted when you start the next write.

Useful tape drive commands (man mt for more information)

Writing files to tape

tar is an easy way of writing tapes.
At the time of writing this guide, the version is the redhat updated rpm, tar-1.13.25-4.7.1.i386.rpm

Other useful tar options can be found via man tar or info tar, but selected ones are

GNU tar creates relative archives by default: it strips the leading "/" from file names as they are stored. This means that you can extract a tarfile into any location. Other tar programs usually create absolute archives, which require root access on a machine to recreate the complete path from "/".
It is a good idea to write on the tape which method you used to write the tape in the first instance - in the future you may not remember how you did it and you will no doubt be pestering your system manager!
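
The effect of relative member names can be seen with a small experiment. This is a minimal sketch, assuming a POSIX shell and a tar that supports -C (GNU tar does); the function names are illustrative. The output may be an ordinary file or a tape device.

```shell
# Archive a directory so that member names are stored relative to its parent.
# -C changes directory before files are added, so no absolute path is stored.
archive_relative() (
    parent=$1     # directory that contains the data directory
    name=$2       # data directory to archive, relative to parent
    out=$3        # tar file (or tape device) to write
    tar -cf "$out" -C "$parent" "$name"
)

# List the member names; none should begin with "/".
list_members() {
    tar -tf "$1"
}
```

An archive written this way can be unpacked anywhere with tar -xf followed by -C and the target directory, without needing root access.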

Verifying tapes
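
One possible way to verify an archive is GNU tar's compare mode (-d), which checks each member against the file of the same name on disk and exits with a non-zero status if anything differs. This is a minimal sketch that assumes GNU tar and an archive written with relative names; the function name verify_archive is illustrative.

```shell
# Compare an archive against the files it was made from (GNU tar only).
# The comparison runs from the directory the relative member names start from.
verify_archive() (
    archive=$1    # absolute path to a tar file, or a tape device
    origin=$2     # directory the archive was created relative to
    cd "$origin" || exit 1
    tar -df "$archive"
)
```

When the archive is on tape, the tape must be positioned at the start of that archive before the comparison is run.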

Extracting tapes

DVD-R Drive

The DVD writer is an IDE interfaced Sony DRU-500AX which supports cd-rom, cd-r, cd-rw, dvd-rom, dvd-r, dvd-rw, dvd+r and dvd+rw.
At the time of writing, Linux software writes dvd-r and dvd-rw formats which are from the DVD Forum.
The other formats are available from Windows.

The Yamaha SCSI cd-r drive is now disconnected and is retained as a spare drive in case of equipment failure.

4700000000 bytes of data will fit onto current 4.7GB DVD-R disks. Note that 4.7GB counted in binary units (4.7 x 1024^3) would be 5046586572 bytes, which is more than a disk holds; this can be confusing if you are using commands like ls -h, which report sizes in binary units. When creating iso images, filesystem information is also written onto the disk, which takes some capacity.
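
Working in plain bytes avoids the confusion between the two meanings of "GB". The sketch below adds up file sizes and compares the total with 4700000000 bytes. It assumes a shell with 64-bit integer arithmetic; the function names and the RESERVE variable (bytes set aside for filesystem information, default 0) are illustrative, and the right reserve for a real iso image is not specified here.

```shell
# Capacity of a 4.7GB DVD-R in bytes (decimal gigabytes).
DVD_BYTES=4700000000

# Sum the sizes in bytes of the files given as arguments.
total_bytes() (
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f") || exit 1
        total=$((total + size))
    done
    echo "$total"
)

# Succeed if the files fit on one disk, leaving RESERVE bytes spare.
fits_on_dvd() (
    reserve=${RESERVE:-0}
    [ "$(total_bytes "$@")" -le $((DVD_BYTES - reserve)) ]
)
```

The binary figure quoted above can be reproduced with echo $((47 * 1024 * 1024 * 1024 / 10)), which prints 5046586572.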

Creating a cd-r

Writing dvds

All data obtained in a night should be written uncompressed to dvd. This is the master copy and is kept in Sheffield. It takes approximately 30 minutes to write a full dvd.

Observers can make their own dvds containing just the data they require.

Make sure that you verify the dvd.

Symbolic links can be used to specify files instead of copying large amounts of data around the system.

For an example of symbolic links, look at the files in /home/star/dvd/29_10_03 on ultracam. Remember to create a link to the AutoLogger file so you know what the data files are!
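
A staging directory of links can be built with a short function. This is a minimal sketch, assuming a POSIX shell; the function name stage_links is illustrative. It refuses to create a link whose name is already present, which guards against two nights' run001.dat ending up in the same place.

```shell
# Populate a staging directory with symbolic links to the given files,
# so that nothing has to be copied. Stops if a name is already staged.
stage_links() (
    stage=$1
    shift
    mkdir -p "$stage" || exit 1
    for f in "$@"; do
        link=$stage/${f##*/}
        if [ -e "$link" ] || [ -L "$link" ]; then
            echo "name already staged: ${f##*/}" >&2
            exit 1
        fi
        # Link to an absolute path so the link resolves from any directory.
        case $f in
            /*) ln -s "$f" "$link" ;;
            *)  ln -s "$PWD/$f" "$link" ;;
        esac
    done
)
```

Whatever tool writes the disk must be told to follow the links rather than store them as links; if the image is built with mkisofs, its -f option does this.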

You can run full archive backups while the system is being used by observers if you use the nice command, examples of which are
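
A minimal sketch of the idea, assuming a POSIX shell: a wrapper that runs any command at the lowest scheduling priority. The function name run_low_priority is illustrative. Note that nice lowers CPU priority only; it does not limit disk or network traffic.

```shell
# Run a command at the lowest scheduling priority so that interactive
# use of the machine by observers is affected as little as possible.
run_low_priority() {
    nice -n 19 "$@"
}
```

Any backup command can be placed after the wrapper, for example run_low_priority tar -cf backup.tar 29_10_03, where the archive name is hypothetical.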

Walkthrough

in an xterm window

in another xterm window

Verify the dvd

The /archive disk

The 187GB /archive disk should be used when /ultracam_data (or /home) is getting full.
Because /archive is an IDE disk, it is slower than the SCSI disks used elsewhere in the ultracam system. It is possible to overload the filesystem buffers if you try to copy too much data at the same time. You should therefore move files individually or use a script provided as a template.
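
This is not the provided template, only a minimal sketch of moving files individually, assuming a POSIX shell. Each file is copied, compared byte for byte with the original, and only then removed from the source, so an interrupted or failed copy never loses data. The function name move_one_by_one is illustrative.

```shell
# Move files to a destination directory one at a time.
# The source is deleted only after the copy has been verified with cmp.
move_one_by_one() (
    dest=$1
    shift
    for f in "$@"; do
        target=$dest/${f##*/}
        if [ -e "$target" ]; then
            echo "exists: $target" >&2
            exit 1
        fi
        cp -p "$f" "$target" || exit 1
        cmp -s "$f" "$target" || { echo "mismatch: $f" >&2; exit 1; }
        rm "$f"
    done
)
```

A call such as move_one_by_one /archive/29_10_03 /ultracam_data/29_10_03/run* would move one night's files; both directory names are hypothetical and the destination must already exist.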

General information

The data reduction pc is called ultracam.
It utilises an AMD Athlon(TM) XP1900+ (1611MHz) processor system with 1.5GB system memory.
When in Sheffield, it can be accessed via ssh as ultracam.shef.ac.uk
ultracam utilises two network cards - one for external networking which has (in Sheffield) ip address 143.167.4.106 and the other for internal/private networking which has ip address 192.168.1.1
The operating system is Redhat Linux 7.3 with all the latest rpm updates.

The data acquisition pc is called ucam2.
It utilises a dual Pentium III (Coppermine) 1000MHz processor system with 1.0GB system memory.
This runs over a private network controlled by ultracam and can only be accessed from the private network system. The ip address is 192.168.1.2
The operating system is Redhat Linux 9 with a custom built kernel which includes rtai-24.1.11 and bigphysarea-2.4.4

A computer configured to use dhcp plugged into the ultracam network switch will be assigned an ip address of the form 192.168.1.[3-8] automatically.
Networking to the outside world is enabled through ultracam; however, it is a good idea to keep networking activity from non-essential systems to a minimum.
