Memory Mapped Files in Rust
How to handle memory mapped files in Rust using the memmap crate
In my re-implementation of the Gaia Sky level-of-detail (LOD) catalog generation in Rust I have been able to roughly halve the processing time, and, even though I do not have concrete numbers yet, everything points towards a drastic decrease in memory usage as well. In this project, I need to read a metric ton of gzipped CSV Gaia catalog files, and parse and process them into a functional in-memory catalog with Cartesian positions, velocity vectors, RGB colors, etc. Then I need to use them to generate an octree that represents the LOD structure, and finally write another metric ton of binary files back to disk.
Using memory mapped files helps a lot in avoiding copies and speeding up the reading and writing operations; that’s something I tried out in the Java version and have now re-implemented in Rust. Here’s the thing though: working with memory mapped files in Java is super straightforward. In Rust? Not so much. And the lack of available documentation and examples does not help. I was actually unable to find any working snippets with all the parts I needed, so I’m documenting it in this post in case someone else is in the same situation I was.
To that end, we will use the memmap crate.
Reading memory mapped text files
In my case, since I only need to read text files line by line, reading is the easy part. My input files may or may not be gzipped, so my Read objects need to be wrapped in a Box, since their concrete type (and therefore their size) is not known at compile time. Other than that, we need to create a memory mapped buffer and pass it on to the actual reader creation.
The snippet below shows how to read a text file by memory mapping it (memory map creation highlighted).
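As a rough illustration of the reading side, here is a minimal sketch that uses only the standard library. It assumes the mapped file is already available as a byte slice: with the memmap crate, a map created with `unsafe { Mmap::map(&file)? }` dereferences to `[u8]`, so `&mmap[..]` could be passed in. The helper names (`is_gzip`, `boxed_reader`, `read_lines`) are made up for this example, and detecting gzip by its magic bytes is an assumption, not necessarily what the original code does. The actual decompression is left out; something like `flate2::read::GzDecoder::new(bytes)` could be boxed in its place.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper: gzip streams start with the magic bytes 0x1f 0x8b.
fn is_gzip(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0x1f, 0x8b])
}

/// Wrap the mapped bytes in a boxed reader. The Box is needed because the
/// concrete reader type differs between the plain and the gzipped case.
/// Assumption: only the plain case is implemented here; a gzip decoder
/// (for example from the flate2 crate) would be boxed when is_gzip(bytes).
fn boxed_reader<'a>(bytes: &'a [u8]) -> Box<dyn Read + 'a> {
    // `&[u8]` implements `Read`, so no copy of the mapped data is made here.
    Box::new(bytes)
}

/// Read all lines from the mapped bytes.
fn read_lines(bytes: &[u8]) -> std::io::Result<Vec<String>> {
    BufReader::new(boxed_reader(bytes)).lines().collect()
}
```

Because the reader borrows the slice, the map has to outlive the reader, which the lifetime on `boxed_reader` makes explicit.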
Writing memory mapped binary files
Once the generation of the octree (octree node) has finished, I need to dump the contents of each octant to a file so that they can later be loaded and used by Gaia Sky. These files contain the information of all the stars in the octant, and the more compact they are, the faster the loading and streaming to VRAM will be when Gaia Sky is running.
The file format used is a binary format, described here, and below is an overview of its contents, in order.
- 1 integer (32-bit) – token number -1
- 1 integer (32-bit) – version number (2 in this case)
- 1 integer (32-bit) – number of stars in the file
- For each star:
  - 3 double-precision floats (64-bit * 3) – X, Y, Z Cartesian coordinates in internal units
  - 3 single-precision floats (32-bit * 3) – Vx, Vy, Vz, the Cartesian velocity vector in internal units per year
  - 3 single-precision floats (32-bit * 3) – mualpha, mudelta, radvel, the proper motions and radial velocity
  - 4 single-precision floats (32-bit * 4) – appmag, absmag, color, size, the magnitudes, color (encoded), and size (a derived quantity, for rendering)
  - 1 integer (32-bit) – HIP number (if any, otherwise negative)
  - 1 long integer (64-bit) – Gaia SourceID
  - 1 integer (32-bit) – namelen, the length of the name
  - namelen * char (16-bit * namelen) – characters of the star name, each encoded as a UTF-16 code unit
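The layout above fixes the size of everything except the name, so the total file size can be computed up front. The following is a small sketch of that arithmetic; the constant and function names are invented for illustration, and it assumes namelen counts UTF-16 code units.

```rust
/// Header: token, version, and star count, each a 32-bit integer.
const HEADER_BYTES: usize = 3 * 4;

/// Fixed part of one star record:
/// 3 f64 + 3 f32 + 3 f32 + 4 f32 + i32 (HIP) + i64 (SourceID) + i32 (namelen).
const STAR_FIXED_BYTES: usize = 3 * 8 + 3 * 4 + 3 * 4 + 4 * 4 + 4 + 8 + 4;

/// Bytes needed for one star: fixed part plus 2 bytes per UTF-16 code unit.
fn star_bytes(name: &str) -> usize {
    STAR_FIXED_BYTES + 2 * name.encode_utf16().count()
}

/// Bytes needed for a whole file holding stars with the given names.
fn file_bytes(names: &[&str]) -> usize {
    HEADER_BYTES + names.iter().map(|n| star_bytes(n)).sum::<usize>()
}
```

Each star therefore takes 80 bytes plus two bytes per character of its name, on top of a 12-byte header.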
Writing to a memory mapped file in Rust is almost the same as writing to a byte buffer. You need to know the exact size of the file beforehand, and then fill the buffer with the right bytes at the right positions. As you can see below, that’s exactly what I’m doing. I first compute the final size of the file, and only then do I create the mapped buffer and fill it up, making sure that each element is in the right position.
Most of the code below pertains to my particular binary format, but it beautifully exemplifies how to fill up the buffer with different data types and variable numbers of them.
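For reference, here is a condensed, standard-library-only sketch of the fill-the-buffer step. It writes into a plain `&mut [u8]`; with the memmap crate, a `MmapMut` dereferences to a mutable byte slice, so the same function could be handed `&mut mmap[..]` once the file has been sized with `File::set_len` and mapped. The `Star` struct, the `put` helper, and `write_stars` are hypothetical names, and big-endian byte order is an assumption made here because that is what Java buffers read by default.

```rust
/// Hypothetical in-memory star record mirroring the file layout.
struct Star {
    pos: [f64; 3],  // X, Y, Z
    vel: [f32; 3],  // Vx, Vy, Vz
    pm: [f32; 3],   // mualpha, mudelta, radvel
    mags: [f32; 4], // appmag, absmag, color, size
    hip: i32,
    source_id: i64,
    name: String,
}

/// Copy `bytes` into `buf` at `*pos` and advance the position.
fn put(buf: &mut [u8], pos: &mut usize, bytes: &[u8]) {
    let end = *pos + bytes.len();
    buf[*pos..end].copy_from_slice(bytes);
    *pos = end;
}

/// Fill `buf` with the header and all stars; returns the bytes written.
/// Assumption: big-endian encoding. Panics if `buf` is too small.
fn write_stars(buf: &mut [u8], stars: &[Star]) -> usize {
    let mut pos = 0usize;
    // Header: token -1, version 2, number of stars.
    put(buf, &mut pos, &(-1i32).to_be_bytes());
    put(buf, &mut pos, &2i32.to_be_bytes());
    put(buf, &mut pos, &(stars.len() as i32).to_be_bytes());
    for s in stars {
        for v in s.pos.iter() {
            put(buf, &mut pos, &v.to_be_bytes());
        }
        for v in s.vel.iter().chain(s.pm.iter()).chain(s.mags.iter()) {
            put(buf, &mut pos, &v.to_be_bytes());
        }
        put(buf, &mut pos, &s.hip.to_be_bytes());
        put(buf, &mut pos, &s.source_id.to_be_bytes());
        // Name: length in UTF-16 code units, then the code units themselves.
        let units: Vec<u16> = s.name.encode_utf16().collect();
        put(buf, &mut pos, &(units.len() as i32).to_be_bytes());
        for u in &units {
            put(buf, &mut pos, &u.to_be_bytes());
        }
    }
    pos
}
```

Keeping the write position in a single counter makes it easy to check at the end that the number of bytes written matches the size computed beforehand.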
That is all. The repository that contains this code is here: gaiasky-catgen. It constitutes my very first foray into Rust, so a lot of the code may not be fully idiomatic (or idiomatic at all), and I’m sure it’s not the fastest either. However, it works well and performs much better than the Java counterpart, both in speed and in memory usage.
In this post we have seen how to deal with memory mapped files in Rust to both read and write data faster, avoiding memory copies.