Synchronizing Open Files in AFS:
The Problem
You have a long-running process
on Machine A (e.g., an HPCC compute machine that you can't log into) that is
continually appending data to a file in AFS. You'd like to look at the data
on Machine B (e.g., an HPCC front-end machine, Linux node, or SGI/Sun
workstation in the cluster) while the program is running (to check that things
are running OK, or just out of curiosity). Unfortunately, though you can open
the file in AFS on Machine B, it doesn't have any data in it.
This "problem" is
caused by the AFS cache manager. AFS's caching system allows programs running
on a host to do most AFS file I/O to and from a cache located in the
memory or on the disk of that host. This greatly speeds operation compared to
non-cached network file systems like NFS, especially for files used
read/write when the network is slow or congested. Unfortunately, the AFS
cache manager daemons don't store a file to the server until it is closed
or fsync'd. That means data written to a file by a program can be
"seen" only by other programs on the same machine until the file is
closed or fsync'd. Programs on other machines "see" only the data that was in
the file when it was last closed or created (hence new files appear empty).
The Solution -- fsync()
If you have access to your
source code, the easiest solution is to periodically close() or fsync() the
file so that the AFS cache manager will flush the file to the server.
Unfortunately, many users don't have access to their source code, or aren't
familiar with the low-level Unix file routines.
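For those who can edit their source, a minimal sketch in C of the
"periodically fsync()" approach is shown here. The function name
append_and_sync is hypothetical and only for illustration. It assumes the
program writes through stdio (fopen/fputs), in which case the data must first
be moved out of the stdio buffer with fflush() before fsync() can hand it to
the server.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() declarations */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one record to an open stream and push it
 * to stable storage. Returns 0 on success, -1 on any failure.
 */
int append_and_sync(FILE *fp, const char *record)
{
    assert(fp != NULL && record != NULL);

    if (fputs(record, fp) == EOF)
        return -1;

    /* Step 1: move the data from the stdio buffer into the kernel
     * (on AFS, into the local cache manager). */
    if (fflush(fp) != 0)
        return -1;

    /* Step 2: ask the kernel to write the file out. On AFS this is the
     * call that, per the discussion above, stores the data on the server. */
    if (fsync(fileno(fp)) != 0)
        return -1;

    return 0;
}
```

Calling this after every record may be slow, since each call waits for the
data to be written out. Syncing once every few records, or once per
iteration of an outer loop, is usually enough for monitoring purposes.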
The fsync Program
The fsync program is compiled
and has been installed as part of the SGE (batch queueing) module. fsync
synchronizes the given filename back to the AFS server every 60 seconds. If
you want to synchronize an output file from the SGE batch system back to the
AFS server every 60 seconds (so that you can see the output as explained
above), start the command below in the background from your batch
script:
fsync /path/to/filea &
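To make the idea concrete, here is a rough sketch in C of what a helper of
this kind might do. It is not the source of the installed fsync program. The
function names, the use of O_WRONLY, and the "rounds" parameter are all
assumptions made for illustration. It also assumes that calling fsync() on a
separately opened descriptor causes the local AFS cache manager to store
data written by other processes on the same machine.

```c
#define _POSIX_C_SOURCE 200809L  /* for fsync() declaration */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an existing file, fsync() it once, close it.
 * Returns 0 on success, -1 on any failure (e.g. the file does not exist).
 */
int sync_file_once(const char *path)
{
    assert(path != NULL);

    /* Open for writing without creating or truncating the file. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: sync the file, then sleep interval_sec seconds,
 * repeating "rounds" times. A negative "rounds" means repeat until a
 * sync fails. Returns 0 after the last round, -1 on failure.
 */
int sync_file_periodically(const char *path, unsigned interval_sec, int rounds)
{
    int done = 0;
    while (rounds < 0 || done < rounds) {
        if (sync_file_once(path) != 0)
            return -1;
        sleep(interval_sec);
        if (rounds >= 0)
            done++;
    }
    return 0;
}
```

A small main() that calls sync_file_periodically(argv[1], 60, -1) would give
behavior similar to the command above: one sync of the named file every 60
seconds for as long as the file can be opened.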
You can also use the $SGE_STDOUT_PATH
variable in your script if you wish. $SGE_STDOUT_PATH is the pathname of
the file to which the standard output stream of the job is redirected. A sample
script using this is given below.
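#!/bin/csh
# Request an IRIX 6 host and four slots in the "smp" parallel environment.
#$ -l arch=irix6
#$ -pe smp 4
# Send mail to this address if the job aborts (a) or when it ends (e).
#$ -M rich@nd.edu
#$ -m ae
# Flush the job's standard output file to the AFS server in the background.
fsync $SGE_STDOUT_PATH &
# Run the actual job (here, a Gaussian 98 calculation).
g98 < testDFT.com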