Hammerspace

Hash-like interface to persistent, concurrent, off-heap storage
By Jon Tai

What is Hammerspace?

Hammerspace ... is a fan-envisioned extradimensional, instantly accessible storage area in fiction, which is used to explain how animated, comic, and game characters can produce objects out of thin air.

This gem provides persistent, concurrently-accessible off-heap storage of strings with a familiar hash-like interface. It is optimized for bulk writes and random reads.

Motivation

Applications often use data that never changes or changes very infrequently. In many cases, some latency is acceptable when accessing this data. For example, a user's profile may be loaded from a web service, a database, or an external shared cache like memcache. In other cases, the application is far more sensitive to latency. For example, translations may be looked up many times, and incurring even a ~2 ms delay on each access to an external cache would be prohibitively slow.

To work around the performance issue, this type of data is often loaded into the application at startup. Unfortunately, this means the data is stored on the heap, where the garbage collector must scan over the objects on every run (at least in the case of Ruby MRI). Further, on application servers that use multiple processes, each process holds its own copy of the data, which is an inefficient use of memory.

Hammerspace solves these problems by moving the data off the heap onto disk. Leveraging libraries and data structures optimized for bulk writes and random reads allows an acceptable level of performance to be maintained. Because the data is persistent, it does not need to be reloaded from an external cache or service on application startup unless the data has changed.

Unfortunately, these low-level libraries don't always support concurrent writers. Hammerspace adds concurrency control to allow multiple processes to update and read from a single shared copy of the data safely.

Finally, Hammerspace's interface is designed to mimic Ruby's Hash, which makes integrating with existing applications simple and straightforward.

Support for a different low-level library can be added by implementing a new backend that wraps it. (Currently, only Sparkey is supported.) Backends only need to implement a small set of methods ([], []=, close, delete, each, uid), but can override the default implementation of other methods if the underlying library supports more efficient implementations.
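
To illustrate the backend idea, here is a minimal sketch of how a small set of required methods can carry a larger Hash-like interface. It is not Hammerspace's actual code: HashLikeDefaults and ToyBackend are hypothetical names, the data lives in an ordinary in-memory Hash rather than off-heap, and the meaning of uid (assumed here to be a token that changes whenever the data changes) is a guess, since the text above does not define it.

```ruby
# Hypothetical sketch; none of these names come from the Hammerspace source.

# Default Hash-like methods, all derived from the methods a backend supplies.
module HashLikeDefaults
  include Enumerable # builds map, select, count, ... on top of #each

  def keys
    map { |key, _value| key }
  end

  def values
    map { |_key, value| value }
  end

  # Generic but slow: walks every entry via #each.
  def size
    count
  end

  def key?(key)
    !self[key].nil?
  end
end

# A toy backend implementing only the required methods:
# [], []=, close, delete, each, uid.
class ToyBackend
  include HashLikeDefaults

  def initialize
    @data = {}
    @version = 0 # assumption: bumped on every write so uid changes
    @closed = false
  end

  def [](key)
    @data[key.to_s]
  end

  # Keys and values are coerced to strings, since only strings are stored.
  def []=(key, value)
    @version += 1
    @data[key.to_s] = value.to_s
  end

  def delete(key)
    @version += 1
    @data.delete(key.to_s)
  end

  def each
    return enum_for(:each) unless block_given?

    @data.each { |key, value| yield [key, value] }
    self
  end

  def close
    @closed = true
  end

  # Assumed semantics: identifies the current version of the data.
  def uid
    @version.to_s
  end

  # Optional override: this backend can answer without iterating.
  def size
    @data.size
  end
end

store = ToyBackend.new
store["cartoons"] = "mallets"
store["games"] = "inventory"
puts store["cartoons"]  # prints "mallets"
puts store.keys.inspect # prints ["cartoons", "games"]
store.close
```

In this sketch, keys, values, and key? come for free from the shared module, while size shows the kind of override a backend might provide when the underlying library can answer more cheaply than a full iteration.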