Minimizing Save Games with Structural Sharing

Talin
Machine Words
Published in
5 min readJan 31, 2023

--

One of the problems with huge open-world games is that there’s a massive amount of data involved. Moreover, the nature of open world games is that much of the world is mutable — there is an enormous amount of state that can be changed by the player, and that state must be stored when the player’s progress is saved. This can make saving and loading of games very slow, and consume substantial memory (which isn’t a huge problem for desktops, but may be an issue with consoles).

However, the size of saved games can be substantially reduced with a technique known as structural sharing.

Structural sharing simply means that elements of a data structure — such as nodes of a binary tree — are reused rather than copied. This technique is popular in libraries such as Immutable.js and Immer which provide support for immutable data structures.

When you look at a sequence of saved game files for an open-world game, one thing that becomes apparent is that a lot of the data is the same from one save to the next. The reason for this is fairly obvious: most player interactions are local, and the player’s character can only be in once place at a time.

Moreover, open world games generally don’t try to simulate the entire world at once. At any given moment, the only parts of the game world that are changing are those parts which are near to where the player’s avatar is.

This means that if you don’t visit a particular region of the map, the data in that region won’t change. There are some exceptions to this — sometimes a quest will have far-reaching effects — but those exceptions can be treated as a special case.

Typically the way an open-world game engine is organized is that the world is broken up into areas or “tracts”. Each tract is loaded as a separate data chunk and cached in memory. Tracts are loaded when the player visits that area, and are unloaded when the player hasn’t visited them in a while.

(Some readers might think that, because they can look far into the distance and see terrain, that the entire world must be loaded all at once. However, what they are actually seeing in the distance is not the actual terrain, but a reduced detail version which is many times smaller in size).

We can take advantage of this caching scheme to speed up our game saves. Just like the world, the saved game is also broken up into chunks, one for each area of the world. Each chunk is stored as a separate file or database record, depending on what kind of storage mechanism the game uses. The important part is that chunks can be written independently.

Each saved game chunk has an identifier which consists of two parts: the id of the region and a version number. There is also a “root index chunk” which contains the ids of all the other chunks. More specifically, the root index is a map whose keys are area ids, and whose values are “area id + version” strings.

Each time you save the game, a new root index chunk is created. In fact, a “saved game” is simply a root index chunk plus a user-friendly name (the name of the saved game) and a timestamp.

Two root index chunks are kept in memory at all times: The first, which we will call “prev” represents the most recently loaded game (which will be empty if we are starting a new game). The second, which we will call “next” represents the current state of the world, and which will be saved as the next saved game.

When an area of the map is evicted from the game’s area cache, we check to see if anything changed — which we can do by comparing it with the version in “prev”. If nothing changed, then we don’t need to do anything. Otherwise, if there are changes, then we save a new chunk (bumping the version number) and store its chunk id into “next”. We don’t, however, save “next” itself just yet — it will get stored when the next manual save or autosave happens.

This means two things: first, that the game will often be writing chunks in the background as the user is traveling around, even if they aren’t actually saving a game. Second, it means that there will sometimes be “orphan” chunks, meaning a chunk that is not pointed to by any root index. This can happen if the player quits the game without saving — any chunks that were written out since the previous save will be orphaned.

Orphan chunks can also occur if the player decides to delete some old saves — deleting a saved game only deletes the root index, not the chunks it points to, because there could be other root index chunks that also point to those chunks. A standard mark-and-sweep garbage collection pass can be used to delete the orphans, if so desired.

When a game is saved, three things happen: first, we flush any unsaved data in the area cache, using the same algorithm as for an eviction; secondly, we write the current “next” chunk to disk; and finally, the “next” chunk becomes the new “prev”, and a copy of the old “next” becomes the new “next”.

What about when a game is loaded? Several things have to happen: first, we load in the root chunk for that saved game which becomes the new “prev”. We clear the area cache completely, and then start re-loading areas based on the player’s new location. As areas are loaded, we also load the saved game chunks for those areas, using the versions specified in “prev”.

What about non-regional data, such as character skills and inventory? These can be treated just like areas — there will be a chunk specifically for the state of the player character. Since this data is constantly changing, each saved game will likely have a different version of this chunk, and there won’t be much sharing. (You could also break up the player state into multiple chunks, putting data that doesn’t change often into its own chunk.)

What about data that spans the entire world, such as weather or time of day? Again, we can have a special “global” chunk that stores data that is not confined to a given area on the map.

A sample implementation of the algorithm is here.

Conclusion

Using structural sharing, we can dramatically reduce both the data size of saved game files, and the amount of time needed to save and load saved games.

See Also

--

--

Talin
Machine Words

I’m not a mad scientist. I’m a mad natural philosopher.