File layer

The file layer Python module gsd.fl provides direct, low-level access to read and write gsd files of any schema. The hoomd reader (gsd.hoomd) provides higher-level access to hoomd schema files; see HOOMD.

View the page source to find unformatted example code that can be easily copied.

Open a gsd file

In [1]: f = gsd.fl.open(name="file.gsd",
   ...:                 mode='wb',
   ...:                 application="My application",
   ...:                 schema="My Schema",
   ...:                 schema_version=[1,0])
   ...: 

In [2]: f.close()

Warning

Opening a gsd file with a ‘w’ mode overwrites any existing file with the given name. An ‘x’ mode creates the file exclusively and raises an error if a file with that name already exists.

Write data

In [3]: f = gsd.fl.open(name="file.gsd",
   ...:                 mode='wb',
   ...:                 application="My application",
   ...:                 schema="My Schema",
   ...:                 schema_version=[1,0]);
   ...: 

In [4]: f.write_chunk(name='chunk1', data=numpy.array([1,2,3,4], dtype=numpy.float32))

In [5]: f.write_chunk(name='chunk2', data=numpy.array([[5,6],[7,8]], dtype=numpy.float32))

In [6]: f.end_frame()

In [7]: f.write_chunk(name='chunk1', data=numpy.array([9,10,11,12], dtype=numpy.float32))

In [8]: f.write_chunk(name='chunk2', data=numpy.array([[13,14],[15,16]], dtype=numpy.float32))

In [9]: f.end_frame()

In [10]: f.close()

Call gsd.fl.open() to access gsd files on disk. Add any number of named data chunks to each frame in the file with gsd.fl.GSDFile.write_chunk(). The data must be a 1 or 2 dimensional numpy array of a simple numeric type (or a data type that converts automatically when passed to numpy.array(data)). Call gsd.fl.GSDFile.end_frame() to end the frame and start the next one.

Note

While supported, implicit conversion to numpy arrays creates a second copy of the data in memory and adds conversion overhead.
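
One way to sidestep the implicit conversion is to build the array explicitly before the write. The following is a minimal sketch: as_chunk_array is a hypothetical helper name, not part of gsd, and the set of accepted dtype kinds (signed integer, unsigned integer, float) is an assumption about what "simple numeric type" covers.

```python
import numpy


def as_chunk_array(data, dtype=numpy.float32):
    """Return data as a contiguous 1D or 2D numeric numpy array.

    Hypothetical helper, not part of gsd.
    """
    # No copy is made when data is already a contiguous array of this dtype.
    array = numpy.ascontiguousarray(data, dtype=dtype)
    # Assumption: "simple numeric type" means integer, unsigned, or float.
    if array.dtype.kind not in "iuf":
        raise TypeError(f"unsupported dtype for a chunk: {array.dtype}")
    if array.ndim not in (1, 2):
        raise ValueError(f"chunks must be 1D or 2D, got {array.ndim}D")
    return array
```

The result can then be passed as the data argument of write_chunk(), so any conversion cost is paid once, at a point you control.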

Warning

Make sure to call end_frame() before closing the file, or the last frame is lost.
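
A small wrapper can make it harder to forget the final end_frame(). The sketch below assumes f is a file already opened for writing with gsd.fl.open(); write_frames is a hypothetical helper name, not part of gsd, and it uses only the write_chunk() and end_frame() calls shown above.

```python
def write_frames(f, frames):
    """Write each mapping of chunk name -> array as one frame.

    f is an open, writable gsd file; frames is an iterable of dicts.
    Hypothetical helper, not part of gsd. Returns the number of frames written.
    """
    count = 0
    for chunks in frames:
        for name, data in chunks.items():
            f.write_chunk(name=name, data=data)
        # Commit the frame right after its last chunk so it is never lost.
        f.end_frame()
        count += 1
    return count
```

For the two frames written in the session above, frames would be a list of two dicts, each with the keys 'chunk1' and 'chunk2'.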

Read data

In [11]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='rb',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [12]: f.read_chunk(frame=0, name='chunk1')
Out[12]: array([1., 2., 3., 4.], dtype=float32)

In [13]: f.read_chunk(frame=1, name='chunk2')
Out[13]: 
array([[13., 14.],
       [15., 16.]], dtype=float32)

In [14]: f.close()

gsd.fl.GSDFile.read_chunk() reads the named chunk at the given frame index in the file and returns it as a numpy array.

Test if a chunk exists

In [15]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='rb',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [16]: f.chunk_exists(frame=0, name='chunk1')
Out[16]: True

In [17]: f.chunk_exists(frame=1, name='chunk2')
Out[17]: True

In [18]: f.chunk_exists(frame=2, name='chunk1')
Out[18]: False

In [19]: f.close()

gsd.fl.GSDFile.chunk_exists() tests whether a chunk with the given name exists in the file at the given frame.
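
Combining chunk_exists() with nframes and read_chunk() gives a simple way to collect a chunk across a whole file, skipping frames where it was not written. This is a sketch only: read_chunk_series is a hypothetical helper name, not part of gsd, and f is assumed to be a file opened for reading as above.

```python
def read_chunk_series(f, name):
    """Return a list of (frame, data) pairs for frames that contain the chunk.

    f is an open, readable gsd file. Hypothetical helper, not part of gsd.
    """
    series = []
    for frame in range(f.nframes):
        # Skip frames that lack the chunk instead of letting the read fail.
        if f.chunk_exists(frame=frame, name=name):
            series.append((frame, f.read_chunk(frame=frame, name=name)))
    return series
```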

Read-only access

In [20]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='rb',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [21]: if f.chunk_exists(frame=0, name='chunk1'):
   ....:     data = f.read_chunk(frame=0, name='chunk1')
   ....: 

In [22]: data
Out[22]: array([1., 2., 3., 4.], dtype=float32)

# Fails because the file is open read only
In [23]: f.write_chunk(name='error', data=numpy.array([1]))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-c9aabea2641a> in <module>()
----> 1 f.write_chunk(name='error', data=numpy.array([1]))

fl.pyx in gsd.fl.GSDFile.write_chunk()

RuntimeError: GSD file is opened read only: file.gsd

In [24]: f.close()

Files opened in read-only ('rb') mode can be read from, but not written to. Read-only mode is tuned for high-performance reads with minimal memory impact and can easily handle files with tens of millions of data chunks.

Access file metadata

In [25]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='rb',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [26]: f.name
Out[26]: 'file.gsd'

In [27]: f.mode
Out[27]: 'rb'

In [28]: f.gsd_version
Out[28]: (1, 0)

In [29]: f.application
Out[29]: 'My application'

In [30]: f.schema
Out[30]: 'My Schema'

In [31]: f.schema_version
Out[31]: (1, 0)

In [32]: f.nframes
Out[32]: 2

In [33]: f.close()
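
The attributes shown above can be used to confirm that a file is what the reader expects before any chunks are read. The sketch below is one possible check, not part of gsd: check_schema is a hypothetical helper name, and comparing only the major version (the first element of schema_version) is an assumption about what counts as compatible.

```python
def check_schema(f, schema, major_version):
    """Raise ValueError unless f has the expected schema and major version.

    f is an open gsd file. Hypothetical helper, not part of gsd.
    """
    if f.schema != schema:
        raise ValueError(
            f"{f.name}: expected schema {schema!r}, found {f.schema!r}")
    # Assumption: files sharing a major schema version are compatible.
    if f.schema_version[0] != major_version:
        raise ValueError(
            f"{f.name}: expected schema version {major_version}.x, "
            f"found {f.schema_version}")
```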

Open a file in read/write mode

In [34]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='wb+',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [35]: f.write_chunk(name='double', data=numpy.array([1,2,3,4], dtype=numpy.float64));

In [36]: f.end_frame()

In [37]: f.nframes
Out[37]: 1

In [38]: f.read_chunk(frame=0, name='double')
Out[38]: array([1., 2., 3., 4.])

Files in read/write mode ('wb+' or 'rb+') are inefficient. Only use this mode if you must read and write to the same file, and only if you are working with relatively small files with fewer than a million data chunks. Prefer append mode for writing and read-only mode for reading.

Write a file in append mode

In [39]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='ab',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [40]: f.write_chunk(name='int', data=numpy.array([10,20], dtype=numpy.int16));

In [41]: f.end_frame()

In [42]: f.nframes
Out[42]: 2

# Reads fail in append mode
In [43]: f.read_chunk(frame=2, name='double')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-43-cab5b10fd02b> in <module>()
----> 1 f.read_chunk(frame=2, name='double')

fl.pyx in gsd.fl.GSDFile.read_chunk()

KeyError: 'frame 2 / chunk double not found in: file.gsd'

In [44]: f.close()

Append mode is extremely frugal with memory. It only caches data chunks for the frame about to be committed and clears the cache on a call to gsd.fl.GSDFile.end_frame(). This is especially useful on supercomputers where memory per node is limited but you still want to generate gsd files with millions of data chunks.

Use as a context manager

In [45]: with gsd.fl.open(name="file.gsd",
   ....:                 mode='rb',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0]) as f:
   ....:     data = f.read_chunk(frame=0, name='double');
   ....: 

In [46]: data
Out[46]: array([1., 2., 3., 4.])

gsd.fl.GSDFile works as a context manager for guaranteed file closure and cleanup when exceptions occur.

Store string chunks

In [47]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='wb+',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [48]: f.mode
Out[48]: 'wb+'

In [49]: s = "This is a string"

In [50]: b = numpy.array([s], dtype=numpy.dtype((bytes, len(s)+1)))

In [51]: b = b.view(dtype=numpy.int8)

In [52]: b
Out[52]: 
array([ 84, 104, 105, 115,  32, 105, 115,  32,  97,  32, 115, 116, 114,
       105, 110, 103,   0], dtype=int8)

In [53]: f.write_chunk(name='string', data=b)

In [54]: f.end_frame()

In [55]: r = f.read_chunk(frame=0, name='string')

In [56]: r
Out[56]: 
array([ 84, 104, 105, 115,  32, 105, 115,  32,  97,  32, 115, 116, 114,
       105, 110, 103,   0], dtype=int8)

In [57]: r = r.view(dtype=numpy.dtype((bytes, r.shape[0])));

In [58]: r[0].decode('UTF-8')
Out[58]: 'This is a string'

In [59]: f.close()

To store a string in a gsd file, convert it to a numpy array of bytes and store that data in the file. Decode the byte sequence to get back a string.
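
The conversion in both directions can be wrapped in a pair of small helpers. This sketch is an alternative to the dtype view used in the session above: it encodes explicitly as UTF-8, so non-ASCII text also round-trips. The names str_to_chunk and chunk_to_str are hypothetical and not part of gsd.

```python
import numpy


def str_to_chunk(s):
    """Encode a string as a null-terminated int8 array (hypothetical helper)."""
    raw = s.encode("utf-8") + b"\0"
    # frombuffer yields a read-only view of the bytes; copy to own the data.
    return numpy.frombuffer(raw, dtype=numpy.int8).copy()


def chunk_to_str(chunk):
    """Decode an int8 array produced by str_to_chunk back into a string."""
    return chunk.tobytes().rstrip(b"\0").decode("utf-8")
```

The array returned by str_to_chunk() is 1 dimensional with dtype int8, the same form as the array b written in the session above.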

Truncate

In [60]: f = gsd.fl.open(name="file.gsd",
   ....:                 mode='ab',
   ....:                 application="My application",
   ....:                 schema="My Schema",
   ....:                 schema_version=[1,0])
   ....: 

In [61]: f.nframes
Out[61]: 1

In [62]: f.schema, f.schema_version, f.application
Out[62]: ('My Schema', (1, 0), 'My application')

In [63]: f.truncate()

In [64]: f.nframes
Out[64]: 0

In [65]: f.schema, f.schema_version, f.application
Out[65]: ('My Schema', (1, 0), 'My application')

Truncating a gsd file removes all data chunks from it, but retains the same schema, schema version, and application name. The file is not closed during this process. This is useful when writing restart files on a Lustre file system, where file open operations need to be kept to a minimum.
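
A restart writer built on this idea might look like the sketch below, which reuses one open file handle and replaces its contents on every call. write_restart is a hypothetical helper name, not part of gsd; f is assumed to be a file opened for writing as in the session above.

```python
def write_restart(f, chunks):
    """Replace the contents of an open gsd file with a single new frame.

    f is an open, writable gsd file; chunks maps chunk names to arrays.
    Hypothetical helper, not part of gsd.
    """
    # Drop all existing chunks without closing and reopening the file.
    f.truncate()
    for name, data in chunks.items():
        f.write_chunk(name=name, data=data)
    # Commit the frame so the restart data is not lost.
    f.end_frame()
```

Calling write_restart() repeatedly on the same handle keeps exactly one frame in the file and never triggers another file open.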