
Adventures in Godot Minimalism

This is a post that I intended to write yesterday, but I got distracted with providing context and the post got too long. So I'm going to talk about Godot today!

For some (quick) context, I work on a randomizer mod for a game called Lingo. Lingo was made in Godot 3.5, which makes it pretty easy for me to mod it. I can write scripts that just get arbitrarily executed within the game's context, and Godot has some nice features for modding including allowing me to inject my own code by subclassing the base game's scripts and then replacing them live with mine. It's pretty wild how it works. Here I am straight up disabling a feature on one of the game's object types:

extends "res://scripts/panelEnd.gd"


func handle_correct():
    # We don't call the base method because we want to suppress the original
    # behaviour.
    pass

I'm kind of abusing the game's custom map system to do this, but even without a built-in hook like that, Godot has ways to allow players to execute arbitrary code in the context of any game.

Now, one of the interesting things about working on this mod has been distribution. Lingo has Steam Workshop integration, where you can upload custom maps (which my mod technically is) and then subscribe to them in order to make them available right in your game. However, you can only upload a single file as your custom map... or, at least, that's what I thought for a very long time. This eventually turned out to not be true, but for now let's assume we have the restriction of only distributing a single map file.

The solution I came up with back when I started releasing this mod was pretty neat, in my opinion. Godot's scene files are powerful. One notable thing about them is that you can embed resources into them, which includes script files. My mod consists of one scene file (which is a dialog where you enter your randomizer settings), and then a bunch of script files (many of which are injected into memory to replace base game scripts). So, what I did was craft my mod such that I had one top-level script that was directly instantiated as a node in the scene. This script would load all of the other scripts in the _ready method. Then, I wrote a C++ program that embeds all of my scripts into the scene file as sub-resources, adds Resource type exported variables to the top-level script for each other script, and changes all of the loads to point at the exported variables instead, which are themselves set to the scene's sub-resources.

As an example, here's a scene file that has the above script embedded:

[sub_resource id=2 type="GDScript"]
script/source = "extends \"res://scripts/panelEnd.gd\"


func handle_correct():
    # We don't call the base method because we want to suppress the original
    # behaviour.
    pass
"

[sub_resource id=1 type="GDScript"]
script/source = "extends Spatial

export(Resource) var COMPACTED_panelEnd


func _ready():
    # We can instantiate a panelEnd if we want.
    var panelEnd = COMPACTED_panelEnd.new()
    # The real mod would replace the game's panelEnd with ours.
    installScriptExtension(COMPACTED_panelEnd)
"

[node name="settings_screen" type="Spatial"]
script = SubResource( 1 )
COMPACTED_panelEnd = SubResource( 2 )

And voila! One single packaged file that I can distribute to the players.
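The embedding step can be sketched roughly as follows. This is not the actual tool's code, just a minimal illustration: the function names `escapeForTscn` and `makeScriptSubResource` are made up, and it assumes that backslashes and double quotes are the only characters that need escaping inside a tscn string.

```cpp
#include <string>

// Escape a script so it can sit inside a double-quoted tscn string.
// Assumption: only backslashes and double quotes need escaping here.
std::string escapeForTscn(const std::string& source) {
    std::string out;
    out.reserve(source.size());
    for (char c : source) {
        if (c == '\\' || c == '"') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Build the [sub_resource] block for one embedded script (hypothetical helper).
std::string makeScriptSubResource(int id, const std::string& source) {
    return "[sub_resource id=" + std::to_string(id) + " type=\"GDScript\"]\n"
           "script/source = \"" + escapeForTscn(source) + "\"\n";
}
```

The top-level script and the node section would be emitted the same way, with one exported variable per embedded script pointing at the matching sub-resource id.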

This has worked fine for the last 11 months or so, but recently I bumped into a new use case that my embedding tool wouldn't be able to handle. As I discussed in yesterday's post, I want to start generating new puzzles in my randomizer. Doing so requires distributing a datafile containing word relationships that the randomizer can pick from at randomization time. The problem is that this datafile isn't small. It's not exactly huge either; it's around 5 megabytes, but the rest of the randomizer is something like 60 kilobytes, so the mod would grow by nearly two orders of magnitude.

There are a couple of dimensions to this problem. How do we include the datafile in the mod, and what do we do about the size? The first thing I tried is actually something I did when implementing The Afterword (the randomized level within Lingo). I have a C++ program that generates the datafile, and I made it output a GDScript file that statically constructs a constant array containing all of the necessary data. Here's an example from one of the files in The Afterword:

extends Node

var puzzles = [
    "clamp,palm",
    "suede,use",
    "shotgun,host",
    # ...
    "recently,center"
]

The randomizer mod's datafile was much more complex than that. Rather than an array of strings, it was an array of arrays of strings and ints and dictionaries containing more ints. This is a very easy way to include the datafile in my mod, and it's completely compatible with my embedding script, but it is not a performant way to get the data into memory. When I say that the datafile is about 5 megabytes, I mean that this GDScript file is 5 megabytes. Godot has to first parse and interpret the source code, and then actually construct the object, which is going to involve many messy memory allocations. It's not great.
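For illustration, a generator that emits the simpler Afterword-style file could look something like this. It is only a sketch: `makePuzzleScript` is a made-up name, and it assumes the entries never contain quotes or backslashes, so no escaping is done.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Emit a GDScript file that declares the puzzles as an array literal.
// Assumption: entries contain no quotes or backslashes, so nothing is escaped.
std::string makePuzzleScript(const std::vector<std::string>& puzzles) {
    std::string out = "extends Node\n\nvar puzzles = [\n";
    for (std::size_t i = 0; i < puzzles.size(); ++i) {
        out += "    \"" + puzzles[i] + "\"";
        // Every entry except the last is followed by a comma.
        out += (i + 1 < puzzles.size()) ? ",\n" : "\n";
    }
    out += "]\n";
    return out;
}
```

Every byte of that output is source code that Godot has to parse before any data exists in memory, which is where the loading cost comes from.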

One of the concerns I had while designing this was that not everyone who plays Lingo in Archipelago is going to want to use the experimental panel generation mode, because, well, it's experimental. Or because they just want to play with the base game's puzzles. And I don't want this experimental feature to impact regular users of the mod. The increased file size is not too big a problem, because it's still only a few megabytes, but on my computer it adds several seconds of loading time to the mod, which could be much longer on slower computers.

Luckily, there's another way to store a complex object in a file. Godot has built-in serialization for its Variant objects (which underpin both arrays and dictionaries). The serialization format is documented on their website, so I was able to modify my generator program to output the datafile as a binary Variant rather than as a textual GDScript file. I tested deserialization in my dev workspace (so, without embedding everything into one file), and it turns out that this is significantly (about 5x) faster than parsing, interpreting, and executing a script.

There were a couple of problems with this, though.

  1. The binary Variant is actually about twice the size on disk of the GDScript file. This is likely because all integers had to be padded out to 4 bytes, which multiplies the byte size of single-digit numbers by four.
  2. How am I going to embed this binary file into the scene alongside the scripts?
  3. The third issue combines the previous two: if we embed a 10 megabyte binary file into the scene, then time has to be spent reading those 10 megabytes into memory regardless of whether the player wants to use panel generation. Reading 10 megabytes into memory might not seem like a big deal, but we're dealing with Godot 3.5 here. The memory pools it uses for client data are split into 64 KB chunks, which means messing around with 160 pools, and taking those resources away from the rest of the game.
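To make the size issue in point 1 concrete, here is a rough sketch of how an integer and a string might be written in the binary Variant format. It is based on my reading of the Godot 3 binary serialization documentation and is not the actual generator: the helper names are invented, and the type IDs (2 for integers, 4 for strings) and little-endian byte order are assumptions to check against those docs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value in little-endian byte order.
void putU32(Bytes& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
}

// Type IDs as I understand them from the Godot 3 docs (assumption).
constexpr std::uint32_t kTypeInt = 2;
constexpr std::uint32_t kTypeString = 4;

// Integer: a 4-byte type header followed by a 4-byte value.
void putInt(Bytes& out, std::int32_t value) {
    putU32(out, kTypeInt);
    putU32(out, static_cast<std::uint32_t>(value));
}

// String: type header, byte length, UTF-8 bytes, zero-padded to 4 bytes.
// The padding check assumes the buffer was 4-byte aligned beforehand.
void putString(Bytes& out, const std::string& text) {
    putU32(out, kTypeString);
    putU32(out, static_cast<std::uint32_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
    while (out.size() % 4 != 0) {
        out.push_back(0);
    }
}
```

Under these assumptions, a number that took one character in the GDScript file takes eight bytes once the type header is counted, which fits with the binary file coming out larger than the text one.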

Let's handle problems 1 and 3 first. What do you do when you have a file that's too large? Compress it! Godot has built-in support for a couple of different compression formats. I chose zstd because it was able to get the datafile down to 1.4 megabytes. The idea would be to have the generator program zstd-compress the binary Variant, embed that into the scene, then decompress and deserialize it only if the player needs puzzle generation. This would have less of an impact on players who aren't using puzzle generation, because there's only 1.4 extra megabytes that need to be read into memory instead of 10. And decompression turned out to be pretty fast; in my testing, it increased deserialization time from an average of 270ms to 310ms.

Of course, there was no way we'd be able to pull this off without a fight. It turns out that Godot does something weird when reading a compressed file. There's a special header it looks for that isn't there in a regular gzip/deflate/zstd compressed file, and if the header isn't there, it fails to read the file. Here's what the real file format looks like:

Offset   Size   Type     Description
0        4      String   The magic header "GCPF"
4        4      Int      Compression mode (in this case, 2, indicating zstd)
8        4      Int      Block size, which is 1 + the uncompressed file size
12       4      Int      Uncompressed file size
16       4      Int      Compressed file size
20       ?      Bytes    Compressed file data
20 + ?   4      String   The magic header "GCPF" again

This didn't end up being too difficult to work around; we just needed to write the special header at generation time. But it was odd, to say the least.
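Writing that framing at generation time might look roughly like this. The sketch follows the table above field by field. `wrapGcpf` and `putU32` are made-up names, the integers are assumed to be little-endian, and the input is assumed to be data that has already been zstd-compressed elsewhere.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value in little-endian byte order (assumed byte order).
void putU32(Bytes& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
}

// Wrap already-compressed zstd data in the framing from the table above.
Bytes wrapGcpf(const Bytes& compressed, std::uint32_t uncompressedSize) {
    const Bytes magic = {'G', 'C', 'P', 'F'};
    Bytes out(magic);
    putU32(out, 2);                     // compression mode: zstd
    putU32(out, uncompressedSize + 1);  // block size
    putU32(out, uncompressedSize);      // uncompressed file size
    putU32(out, static_cast<std::uint32_t>(compressed.size()));
    out.insert(out.end(), compressed.begin(), compressed.end());
    out.insert(out.end(), magic.begin(), magic.end());  // trailing magic
    return out;
}
```

Because the block size is larger than the whole file, everything lands in a single block, so only one compressed size has to be written.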

Now for problem 2: embedding a binary file into the scene. As I showed earlier, the way I embedded scripts into the scene was by creating GDScript type sub-resources in the scene, setting their source property to the source code of the script, and then assigning that sub-resource to an exported variable on the scene's top-level script node. Embedding a binary file actually ended up being simpler than this. You add another exported variable to the top-level script, this time of type PoolByteArray, and then you assign it a value in the node, like so:

[sub_resource id=1 type="GDScript"]
script/source = "extends Spatial

export(PoolByteArray) var VARIANT_generated_puzzles
"

[node name="settings_screen" type="Spatial"]
script = SubResource( 1 )
VARIANT_generated_puzzles = PoolByteArray(40,181,47,253,...)

This is, to be frank, pretty silly. It inflates the size of the binary file 2-4 times, because every byte has to be represented by 1-3 digit characters plus a comma. Luckily, this is easily remedied by the final step of the packaging process. Godot has two scene file formats: tscn, which is human-readable text (for the most part), and scn, which is a binary format. My embedding program outputs a tscn, and then I open it in the Godot editor and save it as an scn. This step already did a good job of reducing the file size, but the effect is much more noticeable now, because it shrinks that textual byte buffer back down to its proper 1.4 megabyte size. And voila! The compressed file is now available to the top-level script as a byte array.
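Producing that textual byte array is simple enough to sketch. The function name `makePoolByteArrayLiteral` is hypothetical, and the output uses the compact comma-separated form shown in the scene snippet above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Format raw bytes as a PoolByteArray(...) literal for a tscn file.
std::string makePoolByteArrayLiteral(const std::vector<std::uint8_t>& data) {
    std::string out = "PoolByteArray(";
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i > 0) {
            out += ',';
        }
        // Each byte becomes one to three decimal digits.
        out += std::to_string(static_cast<unsigned>(data[i]));
    }
    out += ')';
    return out;
}
```

The four bytes 40, 181, 47, 253 turn into 13 characters between the parentheses, which is the kind of inflation that saving as scn undoes.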

Seems like we're done, right? No, we were pretty due for another roadblock. This time, it's the fact that the compression and serialization APIs we were using before were tied to the File class, which we can't use here because we have a buffer that's already in memory. Compression didn't end up being a big deal, because PoolByteArray has a decompress method too. However -- remember the magic GCPF header thing from like five minutes ago? Turns out that that only applies to compressed files. The in-memory compression API does not need it and will not work if it's there. We still need to add it to the generated file so that the dev workspace can read the file, but the embedding program will have to strip that part of the file out.
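Stripping the framing back off in the embedding program could be sketched like this. It assumes the single-block layout from the table earlier (a 20-byte header and a 4-byte trailing magic) and little-endian integers. `stripGcpf` and `readU32` are invented names, not the real tool's functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Read a little-endian 32-bit value (assumed byte order).
std::uint32_t readU32(const Bytes& data, std::size_t offset) {
    return static_cast<std::uint32_t>(data[offset]) |
           (static_cast<std::uint32_t>(data[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(data[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(data[offset + 3]) << 24);
}

// Return only the compressed payload from a single-block GCPF file.
Bytes stripGcpf(const Bytes& framed) {
    const std::size_t headerSize = 20;  // magic + four 32-bit fields
    const std::size_t footerSize = 4;   // trailing magic
    if (framed.size() < headerSize + footerSize || framed[0] != 'G' ||
        framed[1] != 'C' || framed[2] != 'P' || framed[3] != 'F') {
        throw std::runtime_error("missing GCPF header");
    }
    const std::size_t compressedSize = readU32(framed, 16);
    if (compressedSize != framed.size() - headerSize - footerSize) {
        throw std::runtime_error("unexpected compressed size");
    }
    return Bytes(framed.begin() + headerSize, framed.end() - footerSize);
}
```

The uncompressed size at offset 12 is also worth keeping, since the in-memory decompress call needs to be told how large the output buffer should be.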

Deserialization was a bit weirder. There's one other class with access to the deserialization API, and that's StreamPeer. It's supposed to be used for communication over a network, but we can be a little sneaky and make use of it here. There's a subclass called StreamPeerBuffer, and we can create one of these and set the internal buffer to be our decompressed file. It makes me feel like I'm abusing the system, but hey, it works. Finally.

Our top-level script now looks like this:

extends Spatial

export(PoolByteArray) var VARIANT_generated_puzzles

func doGeneration():
    # The embedding program would hardcode the decompressed file size in this
    # function call.
    var generated_puzzles = getVariantFromBuffer(
        VARIANT_generated_puzzles, 1000
    )
    # Do something with the loaded variant.

func getVariantFromBuffer(buffer: PoolByteArray, size: int):
    # StreamPeerBuffer lets us call get_var() on a buffer that's already in
    # memory.
    var stream_peer = StreamPeerBuffer.new()
    stream_peer.data_array = buffer.decompress(size, File.COMPRESSION_ZSTD)
    return stream_peer.get_var()

We finally did it! We have a packaged mod containing data for puzzle generation! And the mod is only a megabyte and a bit larger than it was before, so it hopefully shouldn't significantly impact players who aren't using puzzle generation!

...

So, there was obviously going to be a twist at the end of this post. I mean, I said it earlier: I thought Lingo only let you upload a single file to Steam Workshop for custom map distribution, but Chris Souvey, who implemented the Steam Workshop support, told me the other day that this just wasn't true. There's a hidden option for adding extra files, and the only complication is that you have to be able to programmatically find the directory that your files were downloaded to, which I can do using the Godot Steam API since I know the Workshop ID of my mod.

This, uh, this changes things. If I can just distribute the datafile alongside the mod instead of embedded inside of it, I no longer need the infrastructure for binary embedding. I don't have to use PoolByteArray or StreamPeerBuffer; I can just use File for both decompression and deserialization (although this does amusingly mean that the GCPF header comes back into play). I no longer have to worry about the loading time impact for players who aren't using panel generation, because the file isn't loaded into memory at all unless it needs to be.

The adventure here was not for naught, though. Using a binary Variant instead of a GDScript file to store the datafile, and compressing it for distribution, are great wins that reduce both distribution size and loading time. I don't have to feel as bad about increasing the amount of data in the datafile because the negative impact scales more slowly now. Plus, it was a fun puzzle trying to figure out solutions to each of these problems! And they're documented here, in case anyone really truly does need to embed a binary file into a Godot 3.5 scene.

That's pretty much all I wanted to talk about. It's fun to blog about the weird coding rabbit-holes I've fallen into. I'm excited to keep working on Lingo puzzle generation, and hopefully I'll be able to release something soon! And thusly I bid ye bon voyage on all of your future adventures in playing Lingo or using Godot.

Hatkirby on