File layer
The file layer Python module gsd.fl allows direct low-level access to read and write gsd files of any schema. The hoomd reader (gsd.hoomd) provides higher-level access to hoomd schema files; see HOOMD.
View the page source to find unformatted example code that can be easily copied.
Open a gsd file
In [1]: f = gsd.fl.open(name="file.gsd",
...: mode='wb',
...: application="My application",
...: schema="My Schema",
...: schema_version=[1,0])
...:
In [2]: f.close()
Warning
Opening a gsd file in a 'w' mode ('wb' or 'wb+') overwrites any existing file with the given name.
Write data
In [3]: f = gsd.fl.open(name="file.gsd",
...: mode='wb',
...: application="My application",
...: schema="My Schema",
...: schema_version=[1,0]);
...:
In [4]: f.write_chunk(name='chunk1', data=numpy.array([1,2,3,4], dtype=numpy.float32))
In [5]: f.write_chunk(name='chunk2', data=numpy.array([[5,6],[7,8]], dtype=numpy.float32))
In [6]: f.end_frame()
In [7]: f.write_chunk(name='chunk1', data=numpy.array([9,10,11,12], dtype=numpy.float32))
In [8]: f.write_chunk(name='chunk2', data=numpy.array([[13,14],[15,16]], dtype=numpy.float32))
In [9]: f.end_frame()
In [10]: f.close()
Call gsd.fl.open() to access gsd files on disk. Add any number of named data chunks to each frame in the file with gsd.fl.GSDFile.write_chunk(). The data must be a 1 or 2 dimensional numpy array of a simple numeric type (or a data type that will automatically convert when passed to numpy.array(data)). Call gsd.fl.GSDFile.end_frame() to end the frame and start the next one.
Note
While supported, implicit conversion to numpy arrays creates a second copy of the data in memory and adds conversion overhead.
Warning
Make sure to call end_frame() before closing the file, or the last frame may be lost.
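The write pattern above can be wrapped in a small helper so that the end_frame() call is never forgotten. This is only a minimal sketch: the name write_frames and the list-of-dictionaries layout are hypothetical and not part of gsd, and f is assumed to be a gsd.fl.GSDFile opened in a writable mode.

```python
def write_frames(f, frames):
    """Write each {chunk name: data} dictionary as one frame of an open file.

    Hypothetical helper, not part of gsd. `f` is assumed to be a writable
    gsd.fl.GSDFile, or any object with the same write_chunk() and
    end_frame() methods. Returns the number of frames written.
    """
    count = 0
    for chunks in frames:
        for name, data in chunks.items():
            f.write_chunk(name=name, data=data)
        # Commit the frame. Chunks written after the last end_frame()
        # call may be lost when the file is closed.
        f.end_frame()
        count += 1
    return count
```

Passing two dictionaries that each hold 'chunk1' and 'chunk2' arrays would write the same two frames as the session above.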
Read data
In [11]: f = gsd.fl.open(name="file.gsd",
....: mode='rb',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [12]: f.read_chunk(frame=0, name='chunk1')
Out[12]: array([1., 2., 3., 4.], dtype=float32)
In [13]: f.read_chunk(frame=1, name='chunk2')
Out[13]:
array([[13., 14.],
       [15., 16.]], dtype=float32)
In [14]: f.close()
gsd.fl.GSDFile.read_chunk() reads the named chunk at the given frame index in the file and returns it as a numpy array.
Test if a chunk exists
In [15]: f = gsd.fl.open(name="file.gsd",
....: mode='rb',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [16]: f.chunk_exists(frame=0, name='chunk1')
Out[16]: True
In [17]: f.chunk_exists(frame=1, name='chunk2')
Out[17]: True
In [18]: f.chunk_exists(frame=2, name='chunk1')
Out[18]: False
In [19]: f.close()
gsd.fl.GSDFile.chunk_exists() tests whether a chunk with the given name exists in the file at the given frame.
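Because chunks are optional in every frame, it can be handy to find out which frames actually contain a given chunk. The sketch below is one way to do that using only chunk_exists() and the nframes attribute. The name frames_with_chunk is hypothetical (it is not part of gsd), and f is assumed to be a gsd.fl.GSDFile opened for reading.

```python
def frames_with_chunk(f, name):
    """Return the indices of all frames that contain the chunk `name`.

    Hypothetical helper, not part of gsd. `f` is assumed to be a
    gsd.fl.GSDFile opened for reading, or any object that provides
    `nframes` and `chunk_exists()` in the same way.
    """
    return [frame
            for frame in range(f.nframes)
            if f.chunk_exists(frame=frame, name=name)]
```

An empty list means the chunk was never written, so a caller can fall back to a default value instead of handling an error from read_chunk().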
Read-only access
In [20]: f = gsd.fl.open(name="file.gsd",
....: mode='rb',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [21]: if f.chunk_exists(frame=0, name='chunk1'):
....:     data = f.read_chunk(frame=0, name='chunk1')
....:
In [22]: data
Out[22]: array([1., 2., 3., 4.], dtype=float32)
# Fails because the file is open read only
In [23]: f.write_chunk(name='error', data=numpy.array([1]))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-c9aabea2641a> in <module>
----> 1 f.write_chunk(name='error', data=numpy.array([1]))
fl.pyx in gsd.fl.GSDFile.write_chunk()
RuntimeError: GSD file is opened read only: file.gsd
In [24]: f.close()
Files opened in read-only ('rb') mode can be read from, but not written to. The read-only mode is tuned for high-performance reads with minimal memory impact and can easily handle files with tens of millions of data chunks.
Access file metadata
In [25]: f = gsd.fl.open(name="file.gsd",
....: mode='rb',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [26]: f.name
Out[26]: 'file.gsd'
In [27]: f.mode
Out[27]: 'rb'
In [28]: f.gsd_version
Out[28]: (1, 0)
In [29]: f.application
Out[29]: 'My application'
In [30]: f.schema
Out[30]: 'My Schema'
In [31]: f.schema_version
Out[31]: (1, 0)
In [32]: f.nframes
Out[32]: 2
In [33]: f.close()
Open a file in read/write mode
In [34]: f = gsd.fl.open(name="file.gsd",
....: mode='wb+',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [35]: f.write_chunk(name='double', data=numpy.array([1,2,3,4], dtype=numpy.float64));
In [36]: f.end_frame()
In [37]: f.nframes
Out[37]: 1
In [38]: f.read_chunk(frame=0, name='double')
Out[38]: array([1., 2., 3., 4.])
Files in read/write mode ('wb+' or 'rb+') are inefficient. Only use this mode if you must read and write to the same file, and only if you are working with relatively small files with fewer than a million data chunks. Prefer append mode for writing and read-only mode for reading.
Write a file in append mode
In [39]: f = gsd.fl.open(name="file.gsd",
....: mode='ab',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [40]: f.write_chunk(name='int', data=numpy.array([10,20], dtype=numpy.int16));
In [41]: f.end_frame()
In [42]: f.nframes
Out[42]: 2
# Reads fail in append mode
In [43]: f.read_chunk(frame=2, name='double')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-43-cab5b10fd02b> in <module>
----> 1 f.read_chunk(frame=2, name='double')
fl.pyx in gsd.fl.GSDFile.read_chunk()
KeyError: 'frame 2 / chunk double not found in: file.gsd'
In [44]: f.close()
Append mode is extremely frugal with memory. It only caches data chunks for the frame about to be committed and clears the cache on a call to gsd.fl.GSDFile.end_frame(). This is especially useful on supercomputers, where memory per node is limited but you may still want to generate gsd files with millions of data chunks.
Use as a context manager
In [45]: with gsd.fl.open(name="file.gsd",
....: mode='rb',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0]) as f:
....:     data = f.read_chunk(frame=0, name='double');
....:
In [46]: data
Out[46]: array([1., 2., 3., 4.])
gsd.fl.GSDFile works as a context manager for guaranteed file closure and cleanup when exceptions occur.
Store string chunks
In [47]: f = gsd.fl.open(name="file.gsd",
....: mode='wb+',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [48]: f.mode
Out[48]: 'wb+'
In [49]: s = "This is a string"
In [50]: b = numpy.array([s], dtype=numpy.dtype((bytes, len(s)+1)))
In [51]: b = b.view(dtype=numpy.int8)
In [52]: b
Out[52]:
array([ 84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114,
105, 110, 103, 0], dtype=int8)
In [53]: f.write_chunk(name='string', data=b)
In [54]: f.end_frame()
In [55]: r = f.read_chunk(frame=0, name='string')
In [56]: r
Out[56]:
array([ 84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114,
105, 110, 103, 0], dtype=int8)
In [57]: r = r.view(dtype=numpy.dtype((bytes, r.shape[0])));
In [58]: r[0].decode('UTF-8')
Out[58]: 'This is a string'
In [59]: f.close()
To store a string in a gsd file, convert it to a numpy array of bytes and store that data in the file. Decode the byte sequence to get back a string.
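The conversion in both directions can be collected into two small helpers. The sketch below is an alternative formulation, not the method used in the session above: it goes through numpy.frombuffer() and ndarray.tobytes() instead of fixed-width bytes dtype views, which also lets it handle non-ASCII text encoded as UTF-8. The names encode_string and decode_string are hypothetical and not part of gsd.

```python
import numpy


def encode_string(s):
    """Return `s` as a null-terminated int8 array of its UTF-8 bytes.

    Hypothetical helper, not part of gsd. The result is a 1-dimensional
    numeric array, suitable as `data` for write_chunk().
    """
    raw = s.encode('utf-8') + b'\x00'
    return numpy.frombuffer(raw, dtype=numpy.int8)


def decode_string(chunk):
    """Invert encode_string(): strip trailing null bytes, decode as UTF-8.

    `chunk` is assumed to be an int8 array such as the one returned by
    read_chunk() for a chunk written with encode_string().
    """
    return chunk.tobytes().rstrip(b'\x00').decode('utf-8')
```

For the ASCII string used above, encode_string("This is a string") yields the same 17 int8 values as the session, including the trailing 0.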
Truncate
In [60]: f = gsd.fl.open(name="file.gsd",
....: mode='ab',
....: application="My application",
....: schema="My Schema",
....: schema_version=[1,0])
....:
In [61]: f.nframes
Out[61]: 1
In [62]: f.schema, f.schema_version, f.application
Out[62]: ('My Schema', (1, 0), 'My application')
In [63]: f.truncate()
In [64]: f.nframes
Out[64]: 0
In [65]: f.schema, f.schema_version, f.application
Out[65]: ('My Schema', (1, 0), 'My application')
Truncating a gsd file removes all data chunks from it, but retains the same schema, schema version, and application name. The file is not closed during this process. This is useful when writing restart files on a Lustre file system, where file open operations need to be kept to a minimum.
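The restart-file use case might look like the following sketch, which rewrites a single frame in a file that stays open between calls. The name write_restart and the dictionary layout are hypothetical and not part of gsd; f is assumed to be a gsd.fl.GSDFile opened once in a writable mode, such as the 'ab' mode used above.

```python
def write_restart(f, chunks):
    """Replace the contents of an open gsd file with one new frame.

    Hypothetical helper, not part of gsd. `f` is assumed to be a writable
    gsd.fl.GSDFile that is kept open between calls, so no file open
    operation is needed for each restart write. `chunks` maps chunk
    names to data.
    """
    # Remove all existing data chunks. The schema, schema version, and
    # application name are retained, and the file stays open.
    f.truncate()
    for name, data in chunks.items():
        f.write_chunk(name=name, data=data)
    # Commit the single restart frame.
    f.end_frame()
```

Calling write_restart() periodically keeps only the most recent state in the file.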