Fortran Scratch Files

Fortran lets you open a "scratch" file with OPEN(STATUS='SCRATCH',...) with the restriction that FILE="filename" is not allowed on the OPEN(3f). What actually occurs is very system-dependent.

So what exactly is the difference between a scratch file and a named external file? For example, what are the potential advantages of opening a scratch file versus doing a normal named OPEN(3f) followed by closing the file later with STATUS='DELETE'? What are the disadvantages?

There are many extensions and differences between programming environments regarding scratch files. What is really standards-conforming? Why are there so many extensions?

What will you do differently if you know exactly how scratch files are implemented?

If nothing else, if you use scratch files, reading this should convince you to read your compiler documentation and make sure you understand how scratch files are treated in your PE (Programming Environment).

What is a scratch file?

A scratch file is a special type of external file. It is an unnamed temporary file intended to exist only while being used by a single program execution. (It does not need to be named, as the intent is that it will not be permanently saved to disk.)

In all other respects, a scratch file behaves like other external files. So the only unique things required of a scratch file are

a scratch file cannot be given a pathname on an OPEN(3f) with FILE='some_filename'
when closed explicitly or at normal program termination a scratch file is required to be deleted. That implies you cannot do
```
   CLOSE(LUN,STATUS='KEEP',...) ! not allowed
```
to retain a scratch file using standard Fortran.
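As a minimal sketch of the two rules above (no FILE= on the OPEN(3f), and deletion on CLOSE(3f)), the following program writes a few values to a scratch file, rewinds, and reads them back. The program name and the data are made up for illustration; where the data actually lives while the unit is open is up to your compiler.

```fortran
program scratch_roundtrip
implicit none
integer :: lun, ios, i, total, value
   ! no FILE= is given; the processor decides where (and whether) a file is created
   open(newunit=lun, status='scratch', form='formatted', action='readwrite', iostat=ios)
   if (ios /= 0) error stop 'could not open scratch file'
   do i = 1, 5
      write(lun, '(i0)') i*i          ! one value per record
   enddo
   rewind(lun)                        ! go back to the first record
   total = 0
   do
      read(lun, *, iostat=ios) value
      if (ios /= 0) exit              ! end of file (or an error) ends the loop
      total = total + value
   enddo
   write(*, '(a,i0)') 'sum of squares read back = ', total
   ! STATUS='DELETE' is the default for a scratch file; STATUS='KEEP' is not allowed
   close(lun)
end program scratch_roundtrip
```

Apart from the OPEN(3f) itself, nothing in the program differs from using a named file.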

This simple definition gives some freedom in implementation that does not apply to typical external files. A scratch file easily might not even be a disk file -- it might be transparently implemented in memory or in a database, for example. That being said scratch files are commonly implemented as system files too.

Note the Fortran standard does not say other files have to be regular system files either, but that is generally expected outside of highly specialized environments.

Controlling the location of a scratch file

So one thing special about a scratch file is the programmer is not permitted to specify the pathname in the OPEN(3f) of the file. This means the user can be provided other ways external to Fortran (or as an extension) to specify where the scratch data resides.

Consequently, many Fortran implementations use the tempnam(3c) system routine or an equivalent procedure to initially name the scratch file when opening it at the system level.

Among other things this generally allows the directory name where the file is to be created to be controlled external to the program.

Typically a capability to specify where scratch files are created is provided via the first defined environment variable from the set TMPDIR, TMP, and TEMP (almost always in that order) without requiring access to the program code. TMPDIR is the preferred variable name to use. If no scratch directory is specified GNU/Linux and Unix systems usually place the file in /tmp/; although some systems default to the current directory.
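To see which of these variables would win on your system, a small sketch like the following reports the first of TMPDIR, TMP, and TEMP that is set to a non-empty value. It only mimics the search order described above; whether your Fortran runtime actually consults these variables, and in this order, is an assumption you need to check against your compiler documentation.

```fortran
program show_scratch_dir
implicit none
character(len=6), parameter :: names(3) = [character(len=6) :: 'TMPDIR', 'TMP', 'TEMP']
character(len=4096) :: value
integer :: i, length, stat
logical :: found
   found = .false.
   do i = 1, size(names)
      call get_environment_variable(trim(names(i)), value, length=length, status=stat)
      ! status 0 means the variable exists and fits in VALUE;
      ! length 0 means it is set but empty, which is treated here as unset
      if (stat == 0 .and. length > 0) then
         write(*, '(3a)') trim(names(i)), '=', value(:length)
         found = .true.
         exit
      endif
   enddo
   if (.not. found) then
      write(*, '(a)') 'none of TMPDIR, TMP, TEMP is set'
   endif
end program show_scratch_dir
```

Running it as, for example, `TMPDIR=/some/dir ./a.out` shows the value a runtime that honors TMPDIR would pick up.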

Being able to control where the scratch files are generated is very important

If external control of where scratch files go is provided, here are some reasons to use it:

A user can make sure scratch files are generated in a filesystem with sufficient space, preventing filesystems from filling.
Performance can be optimized by selecting such resources as a local disk, a memory-resident filesystem, or a high-speed parallel file server (e.g. a Lustre server), as appropriate.
You can spread out concurrent program executions onto different resources, preventing a fileserver from being overloaded.
Security can be improved by placing files in private directories. Not every implementation will provide unlinked files, which are inherently more secure than visible files; so assume the file will be a conventional file and make sure your file permissions for newly created files are as limited as possible. On GNU/Linux and Unix machines remember it is not only your umask(1), but kernel and OS defaults and fileserver options that can affect file permissions.
Other processes can be protected by using quota-limited file space so any unexpected file system usage does not crash other programs.
If scratch files are implemented as named files, it will be a lot easier to clean up scratch files when a program fails if you can place all the scratch files in a scratch directory.

Having unique names generated by the system avoids a common problem

Since the program cannot specify the file pathname a Fortran implementation is freed to use methods to make sure the scratch pathnames are unique, which is another aspect of such procedures as tempnam(3c).

If the Fortran implementation automatically gives scratch files unique names, pathname collisions caused by multiple programs accessing the same filename are eliminated. Then no one has to worry about scratch file pathnames being duplicated and colliding, even when multiple commands are running.

If your compiler does not have an option to automatically generate unique pathnames for scratch files it should.

Automatic cleanup for terminated processes

Since there is no requirement for a scratch file to be accessible by other processes or to exist even after abnormal program termination the implementation can also unlink(3c) the files as soon as they are created.

If a system supports unlinking a file, then a pathname becomes unavailable to normal file system commands (usually immediately after it is created) -- e.g. it cannot be seen by the (GNU/Linux and Unix) ls(1) command and cannot be opened by any other normal process.

On most systems, unlinking the file has the advantage that the file will disappear when the program terminates even if the process terminates abnormally.

Disadvantages

So let's say your compiler takes advantage of the special attributes of a scratch file to make them go away even after abnormal termination, have their location controlled by external means such as the TMPDIR environment variable, and automatically be created with unique names that do not collide with other files. There are still drawbacks to keep in mind:

Remember these features are all implementation-dependent. Those behaviors are allowed by the standard, but not required. On the other hand, if your compiler does not do these things, "OPEN(STATUS='SCRATCH'...)" is really no different than opening a non-scratch file and closing it at program termination with "STATUS='DELETE'".
If a file is unlinked so that other system processes can no longer open it, you cannot easily access the data for debugging.
If a file is unlinked so it goes away when the process does, it is harder for anyone else, such as a system administrator, to tell who is using and possibly filling up a file system. A file system can be full but look like it has no files in it because unlinked files are not visible to other system commands. Depending on your system you may find commands like pfiles(1), lsof(1), "netstat -vatupn" or "find /proc/$PID/fd -ls" on most GNU/Linux and Unix file systems will let you find what files a process has open and where they reside even if they are unlinked.
Some implementations may generate a "unique" filename that is only unique to a particular compute node and in addition may not unlink the files. That means if scratch files are going into the same directory on a shared file system such as an NFS file server from different compute nodes the generated names may not be unique and could collide with scratch files being generated on other compute nodes.

Extensions

Sadly, some compilers just name scratch files something like fort.LUN in the current directory, nullifying almost all the potentially automatic advantages of opening a scratch file. That is still standard-conforming. So if you build with multiple compilers make sure your application instructions let users know where scratch files will be generated and how they will be named for each environment.

On the other hand, some compilers like the IBM compiler default to acting as described above but have extensions to allow for naming scratch files with an environment variable when desired (XLFSCRATCH_unit and the runtime option "scratch_vars"). These extensions are particularly handy for debugging and allocating I/O resources, as you have a pathname to a regular file you can control and see with system commands when the need arises, but get the best features of a scratch file by default when you do not.

There are many other common extensions. Some PEs allow you to name the scratch file when opened. This raises a lot of questions about what happens if the file exists already, while with a properly implemented scratch file the system should be making sure a scratch file does not collide with other files for you. So if named scratch files are allowed and exist should the OPEN(3f) abort the program, return an error, or erase the file if you have permission to do so?

Some extensions let you do an INQUIRE(3f) and get the filename being used as a scratch file, which can be handy. Technically

       inquire(unit=lun2,named=ifnamed)

should return ifnamed as false, which means you cannot query the pathname, so this is not portable.

Some PEs allow you to close a scratch file with STATUS='KEEP' and to close the file with a non-standard NAME= or FILE= option on CLOSE(3f). This is very non-standard and non-portable.

Extensions and methods for controlling residence of scratch files

Even if an environment variable is allowed to specify a scratch directory, that does not solve the problem of wanting scratch files from the same program execution to go to different devices for space or performance reasons. As mentioned, some compilers allow multiple environment variables to be used with a LUN as part of the variable name to allow for this (SCRATCH_10 would say where LUN 10 goes, SCRATCH_11 where LUN 11 goes, ...).

Other solutions are to change the value of TMPDIR before the OPEN(3f) using (currently non-standard) procedures to change an environment variable, or to create scratch directories (to allow for easy clean-up when programs do not terminate properly) and use named files instead of scratch files. If that is an issue, you might want to reserve a range of LUNs specifically for being closed with STATUS='DELETE' and create a MYSTOP() subroutine that is always called instead of STOP, or try some other creative method. As the performance and size of disks have increased rapidly, this is less of a problem than it used to be for all but the most I/O-intensive applications.

Extensions and methods for non-standard automatic deletion of named files

I have actually not had a problem with opening named files and then calling the C routine unlink(3c) to make sure they go away at program termination on many Unix systems, but that is certainly not standard. Many compilers have an extension on the OPEN(3f), usually called DISP='delete', that lets you say at the time the files are opened that they should be deleted when they are closed, instead of having to remove them with a CLOSE(STATUS='DELETE',...). All of this is non-standard, of course.
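For comparison with those extensions, the portable route is a named file that is explicitly closed with STATUS='DELETE', which is standard Fortran. The sketch below assumes a made-up filename in the current directory. Because the name is fixed, two copies of the program running in the same directory would collide, and the file is left behind if the program dies before reaching the CLOSE(3f).

```fortran
program named_delete
implicit none
character(len=*), parameter :: fname = 'demo_named_scratch.tmp'  ! hypothetical name
integer :: lun, ios
logical :: exists
   open(newunit=lun, file=fname, status='replace', action='readwrite', iostat=ios)
   if (ios /= 0) error stop 'cannot create the temporary file'
   write(lun, '(a)') 'temporary data'
   inquire(file=fname, exist=exists)
   write(*, *) 'exists while open:  ', exists
   ! standard-conforming deletion; nothing here is automatic on abnormal termination
   close(lun, status='delete')
   inquire(file=fname, exist=exists)
   write(*, *) 'exists after close: ', exists
   if (exists) error stop 'file was not deleted'
end program named_delete
```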

Are unnamed files explicitly scratch?

On all systems I know of, if you do not specify a filename on an OPEN(3f)

    OPEN(UNIT=7)

a regular system file is created, often named "fort.NNN" where NNN is the unit number (LUN). The filename is system-dependent. The standard says

  "If the filename is omitted and the unit is not connected to a file,
  the STATUS= specifier shall be specified with a value of SCRATCH;
  in this case, the connection is made to a processor-dependent file."

That sounds like it means that if you do an OPEN(3f) without a filename on a unit that is not preconnected, the file will be a scratch file and be removed at program termination. I have not seen a compiler do that yet. Regular files like "ftn.NNN" or "fort.NNN" are usually created either in the current directory or where the environment variable TMPDIR points to.
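A quick way to find out what your own compiler does with an OPEN(3f) that has neither FILE= nor STATUS= is to ask with INQUIRE(3f). This is only a sketch, and it deliberately relies on behavior the quoted standard text does not promise, so a compiler may reject the OPEN(3f) or report something different. The file is deleted explicitly at the end so the test leaves nothing behind.

```fortran
program unnamed_open
implicit none
character(len=4096) :: filename
integer :: ios
logical :: ifnamed
   open(unit=7, iostat=ios)            ! no FILE=, no STATUS=
   if (ios /= 0) error stop 'OPEN without FILE= was rejected'
   inquire(unit=7, named=ifnamed)
   if (ifnamed) then
      ! NAME= is only defined when the file has a name, so query it separately
      inquire(unit=7, name=filename)
      write(*, '(2a)') 'unit 7 is connected to ', trim(filename)
   else
      write(*, '(a)') 'unit 7 is connected to an unnamed file'
   endif
   close(7, status='delete')           ! clean up whatever was created
end program unnamed_open
```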

Skeleton program for exploring scratch files

Here is a skeleton program that gives some ideas on how to see possibly significant environment variables, test if your system lets you use INQUIRE(3f) to get a scratch file name, and pause while a scratch file is open so you can examine your system and see if you can see scratch files and what their location and permissions are:

 
 program demo_scratch
 implicit none
 logical             :: ifnamed
 character           :: paws
 character(len=4096) :: filename,buffer
 integer             :: ios,lun,lun2

 write(*,*)'try to see where scratch files are open'
 write(*,*)'This is very implementation-dependent'
 write(*,*)'See your compiler documentation!'
 write(*,*)

 write(*,*)'Likely suspects if an environment variable'
 write(*,*)'is used to determine what directory a scratch'
 write(*,*)'file is written in (probably in this order):'
 write(*,*)

 buffer=' '
 call get_environment_variable('TMPDIR',buffer)
 write(*,*)'TMPDIR=',trim(buffer)
 call get_environment_variable('TMP',buffer)
 write(*,*)'TMP=',trim(buffer)
 call get_environment_variable('TEMP',buffer)
 write(*,*)'TEMP=',trim(buffer)
 write(*,*)'The strings(1) command can sometimes give you a fair idea'
 write(*,*)'of what environment variables your program is aware of'
 write(*,*)'(very system-dependent on what strings(1) will show if it is available).'
 write(*,*)

 ! see if you can query the pathname of a scratch file
 open(newunit=lun2,status='scratch')
 inquire(unit=lun2,named=ifnamed)
 if(ifnamed)then
    inquire(unit=lun2,name=filename)
    write(*,*)'If you can see the pathname of a scratch file'
    write(*,*)'the standard does not require it ...'
    write(*,*)'filename=',trim(filename)
 endif

 ! NOTE: NOT standard, but some compilers let you specify a name 
 !       to create a named scratch file...
 !!open(newunit=lun,status='scratch',file='where_I_want')

 ! assuming you cannot see a scratch file name pause your program
 ! so you can examine your system to learn how to find where
 ! applications are putting scratch files and how to detect them
 open(newunit=lun,status='scratch')
 write(*,*)lun,'unit opened as scratch'
 inquire(unit=lun,named=ifnamed)
 if(ifnamed)then
    inquire(unit=lun,name=filename)
    write(*,*)'filename=',trim(filename)
 else
    write(*,*)
    write(*,*)'file not named, so while the program is paused' 
    write(*,*)'look at what files it has open (lsof, pfiles, ...)'
    write(*,'("Enter return to end program...")',advance='no')
    read(*,'(a)',iostat=ios)paws
 endif
 write(*,*)
 call mystop()
 end program demo_scratch
 subroutine mystop()
 implicit none
 integer :: i,ios
 logical ifnamed, ifopen, ifexist
 character(len=4096) :: filename, msg
 do i=-1000,1000 ! unfortunately, you cannot get a list of open units
    filename=''
    ! ifexist, ifopen, and ifnamed always become defined unless an error condition occurs.
    inquire(i,opened=ifopen,iostat=ios,named=ifnamed,exist=ifexist,iomsg=msg)
    if(ios.ne.0)then
       write(*,'(a)')repeat('=',80)
       write(*,*)'error occurred on query of LUN ',i
       write(*,*)trim(msg)
       cycle
    endif
    if(.not.ifopen)then
       cycle
    endif
    write(*,'(a)')repeat('=',80)
    if(ifnamed)then
       inquire(unit=i,name=filename)
       write(*,*)'closing unit',i,' filename ',trim(filename)
    else
       write(*,*)'closing unnamed unit ',i
       if(ifexist)then
          write(*,*)'this is probably an unlinked scratch file (exists but not named)'
       endif
    endif
    if(ifexist)then
       write(*,*)'exists'
       close(i,iostat=ios,iomsg=msg)
       if(ios.ne.0)then
          write(*,*)trim(msg)
       endif
    endif
 enddo
 stop
 end subroutine mystop

Footnotes

Big scratch files should be unformatted

By definition scratch files are files the program interacts with as opposed to a person or other system commands. Therefore a large scratch file should almost always be a binary file to improve performance (versus a formatted text file). For simplicity that is not the case in some of the examples in this article.
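As a rough sketch of that advice, the following program pushes an array through an unformatted stream scratch file and checks that it comes back bit-for-bit identical, with no text conversion in either direction. The array size and names are arbitrary choices for illustration, and any performance gain depends on your system.

```fortran
program unformatted_scratch
use, intrinsic :: iso_fortran_env, only : real64
implicit none
integer, parameter :: n = 100000
real(real64), allocatable :: a(:), b(:)
integer :: lun, i
   a = [(real(i, real64), i = 1, n)]
   allocate(b(n))
   ! binary scratch file: no formatting cost and no loss of precision
   open(newunit=lun, status='scratch', form='unformatted', access='stream', &
      & action='readwrite')
   write(lun) a
   read(lun, pos=1) b                  ! reposition to the first byte and read back
   close(lun)
   if (any(a /= b)) error stop 'round trip through scratch file failed'
   write(*, '(a,i0,a)') 'round trip of ', n, ' values is exact'
end program unformatted_scratch
```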

LUNs can be negative values even if you cannot specify a negative UNIT= value

If cleaning up files like "fort.LUN", or using the LUN to build filenames yourself, remember that the LUN returned by "OPEN(NEWUNIT=LUN,...)" is a negative number. So on a compiler that accepts OPEN(NEWUNIT=LUN) without a filename, the result may very well be a file such as "fort.-10".
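A small sketch makes the point visible: it opens a scratch unit with NEWUNIT=, prints the value it was given, and shows the name you would get by naively pasting that value into a "fort.LUN" pattern. The actual number varies by compiler and run, so none is shown here.

```fortran
program newunit_sign
implicit none
character(len=32) :: fname
integer :: lun
   open(newunit=lun, status='scratch')
   write(fname, '("fort.",i0)') lun    ! naive filename built from the LUN
   write(*, '(a,i0)') 'NEWUNIT returned ', lun
   write(*, '(2a)') 'a name built from it would be ', trim(fname)
   if (lun >= 0) error stop 'unexpected non-negative NEWUNIT value'
   close(lun)
end program newunit_sign
```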

Make sure scratch files are documented

A user of a program may have no idea the program is using scratch files, especially if they are not implemented as normal files in the current directory of the process. So if scratch files are not unlinked make sure they are being cleaned up and not left to become filesystem clutter and make sure the user understands the system demands made by the scratch files.

That is, since scratch file names are by definition system-dependent, it is hard to have automated system clean-up utilities or wrappers generically clean them up when programs abnormally terminate if scratch files are not unlinked files. That is why it can be handy to make a subdirectory for all your scratch files that is easily identified as holding scratch files, such as "/tmp/scratch.123/".

Vendor-specific examples

Here are some vendor-specific statements about scratch files. These are just examples and may not be current. See the vendor documentation for authoritative information ...

Intel Fortran

 Scratch files go into a temporary directory and are visible while they
are open. Scratch files are deleted when the unit is closed or when the
program terminates normally, whichever occurs first.

To specify the path for scratch files, you can use one of the following
environment variables:

   On Windows* OS: FORT_TMPDIR, TMP, or TEMP, searched in that order

   On Linux* OS and OS X*: FORT_TMPDIR or TMPDIR, searched in that order

If no environment variable is defined, the default is the current directory.

GNU Fortran (gfortran)

When opening a file with STATUS='SCRATCH', GNU Fortran tries to create the file in one of the potential directories by testing each directory in the order below.

The environment variable TMPDIR, if it exists.
On the MinGW target, the directory returned by the GetTempPath function. Alternatively, on the Cygwin target, the TMP and TEMP environment variables, if they exist, in that order.
The P_tmpdir macro if it is defined, otherwise the directory /tmp.

Oracle Fortran 77

STATUS=sta

The STATUS=sta clause is optional. sta is a character expression. Possible values are: 'OLD', 'NEW', 'UNKNOWN', or 'SCRATCH'.
'OLD'-- The file already exists (nonexistence is an error). For example: STATUS='OLD'.
'NEW' -- The file doesn't exist (existence is an error). If 'FILE=name' is not specified, then a file named 'fort.n' is opened, where n is the specified logical unit.
'UNKNOWN' -- Existence is unknown. This is the default.
'SCRATCH' -- For a file opened with STATUS='SCRATCH', a temporary file with a name of the form tmp.FAAAxnnnnn is opened. Any other STATUS specifier without an associated file name results in opening a file named 'fort.n', where n is the specified logical unit number. By default, a scratch file is deleted when closed or during normal termination. If the program aborts, then the file may not be deleted. To prevent deletion, CLOSE with STATUS='KEEP'.
The FORTRAN 77 Standard prohibits opening a named file as scratch: if OPEN has a FILE=name option, then it cannot have a STATUS='SCRATCH' option. This FORTRAN extends the standard by allowing opening named files as scratch. Such files are normally deleted when closed or at normal termination.
TMPDIR: FORTRAN programs normally put scratch files in the current working directory. If the TMPDIR environment variable is set to a writable directory, then the program puts scratch files there.

Thoughts

Preliminary thoughts on how I wish scratch files could be defined ...

Standard Fortran does not currently have direct access to the POSIX functions like the C tempnam(3c) and unlink(3c) procedures, so if you use multiple programming environments it takes a lot of work to make sure you are creating a unique filename and that you clean up scratch files even when programs end abnormally. The Fortran standard distances itself from almost any specific underlying system requirements, which some think is a good idea and others do not. But in the case of creating scratch files it would be nice if something like

      character(len=:),allocatable :: fn
      OPEN(unit|newunit=lun, newfile=fn [,newdir=dir] [,DISP='UNLINK'|'DELETE'|'KEEP'] )

was supported, where FN would be a returned unique pathname to a nonexistent file, NEWDIR would be an optional directory name, and if NEWDIR is not specified the environment variable TMPDIR would be looked for as a default value, and so on much like tempnam(3c). The DISP option would allow you to unlink(3c) the file as soon as it is created if supported, but to delete by default at program termination. Allowing DELETE or KEEP would let you set the default for program termination or a close unless a close occurs with an explicit STATUS='DELETE|KEEP'. If the OPEN(3f) specified DISP='UNLINK' you would not be able to specify CLOSE(STATUS='KEEP',...).

If the environment variable FORTRAN_SCRATCH_DIR is defined it would override TMPDIR, so TMPDIR does not have to be changed when desired (as it is a POSIX default that can (intentionally) affect many other applications). If the environment variable FORTRAN_SCRATCH_UNIT_nnn is defined, it would override UNLINK on the status and treat it as DELETE, but would open a scratch file opened with LUN "nnn" with the specified name, deleting the file if it exists at the time it is opened.

You would also want to add DISP= to INQUIRE(3f).

Another interesting option would be to support a DISP='NULL' option as well, for writes like debug statements or optional files you sometimes do not want to generate. This would allow the program to skip the I/O statements efficiently, and relieve the programmer from having to make all the I/O statements conditional. This would also mean that function calls in the I/O statements should not be allowed to have side-effects, or it could get very confusing. To really be useful, you would need an option to have multiple units point to the same file as an alias (not as a separate file, or issues with flushing get involved). That way you could have writes to the same log file or pre-assigned files that could be turned on and off in groups. Perhaps:

       ! IMAGINARY CODE, NOT REAL FORTRAN CODE
       integer :: DEBUG
       logical :: debugmode
       debugmode=.true.
       open(unit=10)
       if(.not.debugmode)then
          open(unit=10,newunit=debug,disp='null')
       else
          open(unit=10,newunit=DEBUG)
       endif
       write(10,*) 'I ALWAYS WANT TO WRITE THIS'
       write(DEBUG,*) 'WRITE THIS IF DEBUGMODE.EQ..TRUE.'

Definitions

PE:

PE stands for Programming Environment. This includes the constraints imposed by your compiler, loader, operating system and hardware.

urbanjost 20171118


Revised on Sat, Nov 18, 2017 6:16:18 PM by JSU