Fortran lets you open a "scratch" file with OPEN(STATUS='SCRATCH',...), with the restriction that FILE="filename" is not allowed on the OPEN(3f). What actually occurs is very system-dependent.
So what exactly is the difference between a scratch file and a named external file? For example, what are the potential advantages of opening a scratch file versus doing a normal named OPEN(3f) followed by closing the file later with STATUS='DELETE'? What are the disadvantages?
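The named-file alternative in that question can be sketched as follows. This is only an illustration: the filename demo_named_tmp.txt is a made-up placeholder, and the program assumes the current directory is writable.

```fortran
program named_then_delete
   implicit none
   ! hypothetical filename, chosen only for this sketch
   character(len=*), parameter :: fname = 'demo_named_tmp.txt'
   integer :: lun, ios
   logical :: ifexist

   ! a normal named OPEN; the program is responsible for picking the name
   open(newunit=lun, file=fname, status='replace', action='readwrite', iostat=ios)
   if (ios /= 0) error stop 'could not open named file'
   write(lun, '(a)') 'temporary data'

   inquire(file=fname, exist=ifexist)
   write(*, '(a,l1)') 'exists while open: ', ifexist

   ! the file is removed here, but only if the program gets this far
   close(lun, status='delete')

   inquire(file=fname, exist=ifexist)
   write(*, '(a,l1)') 'exists after CLOSE(STATUS=''DELETE''): ', ifexist
end program named_then_delete
```

If the program aborts before the CLOSE, the named file is typically left behind, which is one of the differences the rest of this article looks at.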
There are many extensions and differences between programming environments regarding scratch files. What is really standards-conforming? Why are there so many extensions?
What will you do differently if you know exactly how scratch files are implemented?
If nothing else, if you use scratch files, reading this should convince you that you need to read your compiler documentation and make sure you understand how scratch files are treated in your PE (Programming Environment).
A scratch file is a special type of external file. It is an unnamed temporary file intended to exist only while being used by a single program execution (it does not need to be named, as the intent is that it will not be permanently saved to disk).
In all other respects, a scratch file behaves like other external files. So the only unique things required of a scratch file are
a scratch file cannot be given a name on the OPEN(3f) with FILE='some_filename'
when closed explicitly or at normal program termination a scratch file is required to be deleted. That implies you cannot do
CLOSE(LUN,STATUS='KEEP',...) ! not allowed
to retain a scratch file using standard Fortran.
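A minimal sketch of standard scratch-file usage, assuming nothing beyond the rules just listed: no FILE= on the OPEN, and the file disappears at CLOSE. What INQUIRE reports for NAMED= is implementation-dependent, so the program just prints it.

```fortran
program scratch_roundtrip
   implicit none
   integer :: lun, ios, i, total
   logical :: ifnamed

   ! no FILE= specifier: STATUS='SCRATCH' requests an unnamed temporary file
   open(newunit=lun, status='scratch', form='formatted', action='readwrite', iostat=ios)
   if (ios /= 0) error stop 'could not open scratch file'

   ! use it like any other sequential formatted file
   do i = 1, 5
      write(lun, '(i0)') i*i
   enddo
   rewind(lun)

   total = 0
   do
      read(lun, *, iostat=ios) i
      if (ios /= 0) exit
      total = total + i
   enddo

   inquire(unit=lun, named=ifnamed)   ! result depends on the implementation
   write(*, '(a,i0)') 'sum of squares read back = ', total
   write(*, '(a,l1)') 'scratch file reports a name: ', ifnamed

   close(lun)   ! the scratch file is deleted here; STATUS='KEEP' is not allowed
end program scratch_roundtrip
```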
This simple definition gives some freedom in implementation that does not apply to typical external files. A scratch file might not even be a disk file -- it might be transparently implemented in memory or in a database, for example. That being said, scratch files are commonly implemented as system files too.
Note the Fortran standard does not say other files have to be regular system files either, but that is generally expected outside of highly specialized environments.
So one thing special about a scratch file is the programmer is not permitted to specify the pathname in the OPEN(3f) of the file. This means the user can be provided other ways external to Fortran (or as an extension) to specify where the scratch data resides.
As a result, many Fortran implementations use the tempnam(3c) system routine or an equivalent procedure to initially name the scratch file when opening it at the system level.
Among other things, this generally allows the directory where the file is to be created to be controlled external to the program.
Typically a capability to specify where scratch files are created is provided via the first defined environment variable from the set TMPDIR, TMP, and TEMP (almost always in that order) without requiring access to the program code. TMPDIR is the preferred variable name to use. If no scratch directory is specified, GNU/Linux and Unix systems usually place the file in /tmp/, although some systems default to the current directory.
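To see which of those variables would win on a given system, a small sketch like the following can help. It only reports the first non-empty variable in the order TMPDIR, TMP, TEMP; whether your compiler's runtime actually honors that order is an assumption you need to check against its documentation.

```fortran
program find_scratch_dir
   implicit none
   ! candidate variables in the commonly used search order
   character(len=6), parameter :: names(3) = [character(len=6) :: 'TMPDIR', 'TMP', 'TEMP']
   character(len=4096) :: value
   integer :: i, length, istat

   do i = 1, size(names)
      call get_environment_variable(trim(names(i)), value, length, istat)
      ! istat == 0 means the variable exists and fit in VALUE
      if (istat == 0 .and. length > 0) then
         write(*, '(a)') 'first defined: '//trim(names(i))//'='//value(:length)
         stop
      endif
   enddo
   write(*, '(a)') 'none of TMPDIR, TMP, TEMP is set; the default location is implementation-dependent'
end program find_scratch_dir
```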
If external control of where scratch files go is provided here are some reasons to do so:
Since the program cannot specify the file pathname a Fortran implementation is freed to use methods to make sure the scratch pathnames are unique, which is another aspect of such procedures as tempnam(3c).
If the Fortran implementation automatically gives scratch files unique names, pathname collisions caused by multiple programs accessing the same filename are eliminated. Then no one has to worry about scratch file pathnames being duplicated and colliding even when multiple commands are running.
If your compiler does not have an option to automatically generate unique pathnames for scratch files, it should.
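When the runtime does not do this for you, one fallback is to probe for an unused name yourself. The sketch below is an illustration under assumptions, not a replacement for tempnam(3c): the name pattern scratch_sketch.N is invented, "/" is assumed to be a valid path separator, the current directory is used if TMPDIR is unset, and whether STATUS='NEW' is atomic (particularly on shared file systems) depends on the implementation.

```fortran
program unique_name_sketch
   implicit none
   character(len=4096) :: dir, fname
   integer :: lun, ios, n, length, istat

   call get_environment_variable('TMPDIR', dir, length, istat)
   if (istat /= 0 .or. length == 0) dir = '.'   ! assumed fallback directory

   ios = -1
   do n = 1, 1000
      ! hypothetical naming pattern: <dir>/scratch_sketch.<n>
      write(fname, '(a,"/scratch_sketch.",i0)') trim(dir), n
      ! STATUS='NEW' makes the OPEN fail if the file already exists,
      ! so the OPEN itself serves as the existence test
      open(newunit=lun, file=trim(fname), status='new', action='readwrite', iostat=ios)
      if (ios == 0) exit
   enddo
   if (ios /= 0) error stop 'no unused name found'

   write(*, '(a)') 'using '//trim(fname)
   write(lun, '(a)') 'temporary data'

   close(lun, status='delete')   ! mimic scratch-file clean-up
end program unique_name_sketch
```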
Since there is no requirement for a scratch file to be accessible by other processes or to exist after abnormal program termination, the implementation can also unlink(3c) the files as soon as they are created.
If a system supports unlinking a file, then a pathname becomes unavailable to normal file system commands (usually immediately after it is created) -- e.g. it cannot be seen by the (GNU/Linux and Unix) ls(1) command and cannot be opened by any other normal process.
On most systems, unlinking the file has the advantage that the file will disappear when the program terminates even if the process terminates abnormally.
So let's say your compiler takes advantage of the special attributes of a scratch file to make them go away even after abnormal termination, have their location controlled by external means such as the TMPDIR environment variable, and automatically be created with unique names that do not collide with other files.
Remember these features are all implementation-dependent. Those behaviors are allowed by the standard, but not required. On the other hand, if your compiler does not do these things, "OPEN(STATUS='SCRATCH'...)" is really no different than opening a non-scratch file and closing it at program termination with "STATUS='DELETE'".
If a file is unlinked so that other system processes can no longer open it, you cannot easily access the data for debugging.
If a file is unlinked so it goes away when the process does, it is harder for anyone else, such as a system administrator, to tell who is using and possibly filling up a file system. A file system can be full but look like it has no files in it because unlinked files are not visible to other system commands. Depending on your system you may find commands like pfiles(1), lsof(1), "netstat -vatupn" or "find /proc/$PID/fd -ls" on most GNU/Linux and Unix file systems will let you find what files a process has open and where they reside even if they are unlinked.
Some implementations may generate a "unique" filename that is only unique to a particular compute node and in addition may not unlink the files. That means if scratch files from different compute nodes are going into the same directory on a shared file system, such as an NFS file server, the generated names may not be unique and could collide with scratch files being generated on other compute nodes.
Sadly, some compilers just name scratch files something like fort.LUN in the current directory, nullifying almost all the potentially automatic advantages of opening a scratch file. That is still standard-conforming. So if you build with multiple compilers make sure your application instructions let users know where scratch files will be generated and how they will be named for each environment.
On the other hand, some compilers like the IBM compiler default to acting as described above but have extensions to allow for naming scratch files with an environment variable when desired (XLFSCRATCH_unit and the runtime option "scratch_vars"). These extensions are particularly handy for debugging and allocating I/O resources, as you have a pathname to a regular file you can control and see with system commands when the need arises, but get the best features of a scratch file by default when you do not.
There are many other common extensions. Some PEs allow you to name the scratch file when opened. This raises a lot of questions about what happens if the file exists already, while with a properly implemented scratch file the system should be making sure a scratch file does not collide with other files for you. So if named scratch files are allowed and the named file already exists, should the OPEN(3f) abort the program, return an error, or erase the file if you have permission to do so?
Some extensions let you do an INQUIRE(3f) and get the filename being used as a scratch file, which can be handy. Technically, inquire(unit=lun2,named=ifnamed) should return ifnamed as false, which means you cannot query the pathname, so this is not portable.
Some PEs allow you to close a scratch file with STATUS='KEEP' and to close the file with a non-standard NAME= or FILE= option on CLOSE(3f). This is very non-standard and non-portable.
Even if an environment variable is allowed to specify a scratch directory, that does not solve the problem of wanting scratch files from the same program execution to go to different devices for space or performance reasons. As mentioned, some compilers allow multiple environment variables to be used with a LUN as part of the variable name to allow for this (SCRATCH_10 would say where LUN 10 goes, SCRATCH_11 where LUN 11 goes, ...). Other solutions are to change the value of TMPDIR before the OPEN(3f) using (currently non-standard) procedures to change an environment variable, or to create scratch directories (to allow for easy clean-up when programs do not terminate properly) and use named files instead of scratch files. You might want to reserve a range of LUNs specifically for being closed with STATUS='DELETE' and create a MYSTOP() subroutine that is always called instead of STOP, or try some other creative methods if that is an issue. As the performance and size of disks have increased rapidly, this is less of a problem than it used to be for all but the most I/O-intensive applications.
I have actually not had a problem with opening files with names and then calling the C routine unlink(3c) to make sure they go away at program termination on many Unix systems, but that is certainly not standard. Many compilers have an extension on the OPEN(3f), usually called DISP='delete', that lets you say at the time the files are opened that they should be deleted when they are closed instead of having to remove them with a call to CLOSE(STATUS='DELETE',...). All of this is non-standard, of course.
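The open-then-unlink approach can be sketched with ISO_C_BINDING. This assumes a POSIX system where the C library provides unlink(3c) and where an unlinked file stays usable through its open descriptor; the filename unlink_sketch.tmp and the wrapper name c_unlink are made up for the example. How the Fortran runtime behaves after the file is unlinked is not covered by the standard, so treat this as an experiment rather than a guaranteed technique.

```fortran
program unlink_sketch
   use, intrinsic :: iso_c_binding, only : c_int, c_char, c_null_char
   implicit none
   interface
      ! POSIX prototype: int unlink(const char *pathname);
      function c_unlink(path) bind(C, name='unlink') result(rc)
         import :: c_int, c_char
         character(kind=c_char), intent(in) :: path(*)
         integer(c_int) :: rc
      end function c_unlink
   end interface
   ! hypothetical filename used only for this sketch
   character(kind=c_char, len=*), parameter :: fname = 'unlink_sketch.tmp'
   character(len=80) :: line
   integer :: lun
   logical :: ifexist

   open(newunit=lun, file=fname, status='replace', action='readwrite')

   ! remove the directory entry; the open unit keeps the data alive
   if (c_unlink(fname//c_null_char) /= 0) error stop 'unlink failed'

   inquire(file=fname, exist=ifexist)
   write(*, '(a,l1)') 'pathname still visible: ', ifexist

   ! the unit is still usable even though the name is gone
   write(lun, '(a)') 'still usable'
   rewind(lun)
   read(lun, '(a)') line
   write(*, '(a)') trim(line)

   close(lun)   ! nothing is left to delete; the space is released here
end program unlink_sketch
```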
On all systems I know of, if you do not specify a filename on an OPEN(3f), for example
`OPEN(UNIT=7)`
a regular system file is created, often named "fort.NNN" where NNN is the unit number (LUN). The filename is system-dependent. The standard says
"If the filename is omitted and the unit is not connected to a file,
the STATUS= specifier shall be specified with a value of SCRATCH;
in this case, the connection is made to a processor-dependent file."
That sounds like it means that if you do an OPEN(3f) without a filename on a file that is not preconnected, it will be a scratch file and be removed at program termination. I have not seen a compiler do that yet. Regular files like "ftn.NNN" or "fort.NNN" are usually created either in the current directory or where the environment variable TMPDIR points to.
Here is a skeleton program that gives some ideas on how to see possibly significant environment variables, test if your system lets you use INQUIRE(3f) to get a scratch file name, and pause while a scratch file is open so you can examine your system and see if you can see scratch files and what their location and permissions are:
program demo_scratch
implicit none
logical :: ifnamed
character :: paws
character(len=4096) :: filename,buffer
integer :: ios,lun,lun2
   write(*,*)'try to see where scratch files are open'
   write(*,*)'This is very implementation-dependent'
   write(*,*)'See your compiler documentation!'
   write(*,*)
   write(*,*)'Likely suspects if an environment variable'
   write(*,*)'is used to determine what directory a scratch'
   write(*,*)'file is written in (probably in this order):'
   write(*,*)
   buffer=' '
   call get_environment_variable('TMPDIR',buffer)
   write(*,*)'TMPDIR=',trim(buffer)
   call get_environment_variable('TMP',buffer)
   write(*,*)'TMP=',trim(buffer)
   call get_environment_variable('TEMP',buffer)
   write(*,*)'TEMP=',trim(buffer)
   write(*,*)'The strings(1) command can sometimes give you a fair idea'
   write(*,*)'of what environment variables your program is aware of'
   write(*,*)'(very system-dependent on what strings(1) will show if it is available).'
   write(*,*)
   ! see if you can query the pathname of a scratch file
   ! (NEWUNIT= requires either FILE= or STATUS='SCRATCH')
   open(newunit=lun2,status='scratch')
   inquire(unit=lun2,named=ifnamed)
   if(ifnamed)then
      inquire(unit=lun2,name=filename)
      write(*,*)'If you can see the pathname of a scratch file'
      write(*,*)'the standard does not require it ...'
      write(*,*)'filename=',trim(filename)
   endif
   ! NOTE: NOT standard, but some compilers let you specify a name
   ! to create a named scratch file...
   !!open(newunit=lun,status='scratch',file='where_I_want')
   ! assuming you cannot see a scratch file name pause your program
   ! so you can examine your system to learn how to find where
   ! applications are putting scratch files and how to detect them
   open(newunit=lun,status='scratch')
   write(*,*)lun,'unit opened as scratch'
   inquire(unit=lun,named=ifnamed)
   if(ifnamed)then
      inquire(unit=lun,name=filename)
      write(*,*)'filename=',trim(filename)
   else
      write(*,*)
      write(*,*)'file not named, so while the program is paused'
      write(*,*)'look at what files it has open (lsof, pfiles, ...)'
      write(*,'("Enter return to end program...")',advance='no')
      read(*,'(a)',iostat=ios)paws
   endif
   write(*,*)
   call mystop()
end program demo_scratch
subroutine mystop()
implicit none
integer :: i,ios
logical :: ifnamed, ifopen, ifexist
character(len=4096) :: filename, msg
   do i=-1000,1000 ! unfortunately, you cannot get a list of open units
      filename=''
      ! ifexist, ifopen, and ifnamed always become defined unless an error condition occurs.
      inquire(i,opened=ifopen,iostat=ios,named=ifnamed,exist=ifexist,iomsg=msg)
      if(ios.ne.0)then
         write(*,'(a)')repeat('=',80)
         write(*,*)'error occurred on query of LUN ',i
         write(*,*)trim(msg)
         cycle
      endif
      if(.not.ifopen)then
         cycle
      endif
      write(*,'(a)')repeat('=',80)
      if(ifnamed)then
         inquire(unit=i,name=filename)
         write(*,*)'closing unit',i,' filename ',trim(filename)
      else
         write(*,*)'closing unnamed unit ',i
         if(ifexist)then
            write(*,*)'this is probably an unlinked scratch file (exists but not named)'
         endif
      endif
      if(ifexist)then
         write(*,*)'exists'
         close(i,iostat=ios,iomsg=msg)
         if(ios.ne.0)then
            write(*,*)trim(msg)
         endif
      endif
   enddo
   stop
end subroutine mystop
By definition scratch files are files the program interacts with as opposed to a person or other system commands. Therefore a large scratch file should almost always be a binary file to improve performance (versus a formatted text file). For simplicity that is not the case in some of the examples in this article.
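As a sketch of the unformatted alternative, the following writes an array to a scratch file opened with FORM='UNFORMATTED' and ACCESS='STREAM' and reads it back. The array size is arbitrary, and no performance claim is being measured here; the point is only that no text conversion takes place, so the values come back bit-for-bit.

```fortran
program scratch_unformatted
   implicit none
   integer, parameter :: n = 1000
   real :: a(n), b(n)
   integer :: lun, i

   a = [(sqrt(real(i)), i = 1, n)]

   ! unformatted stream I/O: raw values, no conversion to text
   open(newunit=lun, status='scratch', form='unformatted', access='stream', &
        action='readwrite')
   write(lun) a
   read(lun, pos=1) b      ! POS=1 repositions to the start of the stream
   close(lun)

   write(*, '(a,l1)') 'round trip exact: ', all(a == b)
end program scratch_unformatted
```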
If cleaning up files like "fort.LUN", or using the LUN to build filenames yourself, remember the LUN is often negative when using "OPEN(NEWUNIT=LUN,...)". So something like OPEN(NEWUNIT=LUN) may very well create a file such as "fort.-10".
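A short sketch of building such a name from a NEWUNIT value; the "fort." prefix is just the common convention mentioned above, and the actual unit number you get is chosen by the runtime. The I0 edit descriptor keeps the minus sign and adds no padding, so clean-up scripts that match only digits after "fort." would miss these names.

```fortran
program lun_in_name
   implicit none
   character(len=32) :: fname
   integer :: lun

   open(newunit=lun, status='scratch')

   ! I0 prints the integer with its sign and without leading blanks
   write(fname, '("fort.",i0)') lun
   write(*, '(a)') 'a fort.LUN style name would be: '//trim(fname)
   if (lun < 0) write(*, '(a)') 'note the minus sign in the name'

   close(lun)
end program lun_in_name
```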
A user of a program may have no idea the program is using scratch files, especially if they are not implemented as normal files in the current directory of the process. So if scratch files are not unlinked make sure they are being cleaned up and not left to become filesystem clutter and make sure the user understands the system demands made by the scratch files.
That is, since scratch file names are by definition system-dependent, it is hard to have automated system clean-up utilities or wrappers generically clean them up when programs abnormally terminate if scratch files are not unlinked files. That is why it can be handy to make a subdirectory for all your scratch files that is easily identified as scratch files, such as "/tmp/scratch.123/".
Here are some vendor-specific statements about scratch files. These are just examples and may not be current. See the vendor documentation for authoritative information ...
Scratch files go into a temporary directory and are visible while they are open. Scratch files are deleted when the unit is closed or when the program terminates normally, whichever occurs first. To specify the path for scratch files, you can use one of the following environment variables:
On Windows* OS: FORT_TMPDIR, TMP, or TEMP, searched in that order
On Linux* OS and OS X*: FORT_TMPDIR or TMPDIR, searched in that order
If no environment variable is defined, the default is the current directory.
When opening a file with STATUS='SCRATCH', GNU Fortran tries to create the file in one of the potential directories by testing each directory in the order below.
Preliminary thoughts on how I wished scratch files could be defined ...
Standard Fortran does not currently have direct access to the POSIX functions like the C tempnam(3c) and unlink(3c) procedures, so if you use multiple programming environments it takes a lot of work to make sure you are creating a unique filename and that you clean up scratch files even when programs end abnormally. The Fortran standard distances itself from almost any specific underlying system requirements, which some think is a good idea and others do not. But in the case of creating scratch files it would be nice if something like
character(len=:),allocatable :: fn
OPEN(unit|newunit=lun, newfile=fn [,newdir=dir] [,DISP='UNLINK'|'DELETE'|'KEEP'] )
was supported, where FN would be a returned unique pathname to a nonexistent file and NEWDIR would be an optional directory name. If NEWDIR is not specified, the environment variable TMPDIR would be looked for as a default value, and so on, much like tempnam(3c). The DISP option would allow you to unlink(3c) the file as soon as it is created if supported, but to delete by default at program termination. Allowing DELETE or KEEP would let you set the default for program termination or a close unless a close occurs with an explicit STATUS='DELETE|KEEP'. If the OPEN(3f) specified DISP='UNLINK' you would not be able to specify CLOSE(STATUS='KEEP',...).
If the environment variable FORTRAN_SCRATCH_DIR is defined it would override TMPDIR so TMPDIR does not have to be changed when desired (as it is a POSIX default that can (intentionally) affect many other applications). If the environment variable FORTRAN_SCRATCH_UNIT_nnn is defined, it would override UNLINK on the status and treat it as DELETE, but would open a scratch file opened with LUN "nnn" with the specified name, deleting the file if it exists at the time it is opened.
You would also want to add DISP= to INQUIRE(3f).
Another interesting option would be to support a DISP='NULL' option as well, for writes like debug statements or optional files you sometimes do not want to generate. This would allow the program to skip the I/O statements efficiently and relieve the programmer from having to make all the I/O statements conditional. This would also mean that function calls in the I/O statements should not be allowed to have side-effects, or it could get very confusing. To really be useful, you would need an option to have multiple units point to the same file as an alias (not as a separate file, or issues with flushing get involved). That way you could have writes to the same log file or pre-assigned files that could be turned on and off in groups. Perhaps:
! IMAGINARY CODE, NOT REAL FORTRAN CODE
integer :: DEBUG
logical :: debugmode
debugmode=.true.
open(unit=10)
if(debugmode)then
   ! DEBUG becomes an alias for unit 10
   open(unit=10,newunit=DEBUG)
else
   ! writes to DEBUG are silently discarded
   open(unit=10,newunit=DEBUG,disp='null')
endif
write(10,*) 'I ALWAYS WANT TO WRITE THIS'
write(DEBUG,*) 'WRITE THIS IF DEBUGMODE.EQ..TRUE.'
PE:
PE stands for Programming Environment. This includes the constraints imposed by your compiler, loader, operating system and hardware.
urbanjost 20171118