Adventures in Word Puzzles
There's a game called Lingo that I am both an avid player and modder of. It's like a cross between The Witness, Antichamber, and word puzzles. You're trapped in a maze of non-Euclidean hallways filled with word puzzles, and you have to figure out the rules to those puzzles yourself. There's also way more base game content than you'd expect. You should play it! There's my spiel.
In fact, if you have any interest in playing this game, you may want to give at least the first level a good try before reading on in this post, because towards the end I spoil some of the puzzle mechanics.
Anyway, I said that I both play and mod this game. I've mentioned in a previous post that I'm involved in a randomizer project called Archipelago. It's an open source framework for randomizing multiple games together, and I maintain the randomizer for Lingo. It's been a lot of fun to work on over the past couple of years, and I've continued adding features to it long after release. One of the features that people have continually asked for since the very beginning is new puzzle generation.
My randomizer doesn't actually create new word puzzles for the game; the randomization is more about shuffling your route and progression through the labyrinthine world. There's an option to shuffle all of the base game puzzles around, but here's the thing: there are two kinds of Lingo fans, those who really like Antichamber and those who really like word puzzles. While I've only played a little bit of Antichamber, I most definitely fall into the former camp, seeing as what first interested me in the game was how hostile the environment was to the player (although it's been softened quite a bit since then as the game has gotten more popular). In fact, this is a big part of why I made the randomizer: by that point I'd completely memorized the game's map, and I wanted to be able to feel lost in it again. The creator of the game, Brenton Wildes, has stated that he thought he'd attract more of the Antichamber crowd, but what ended up happening was that the game got popular with the word puzzle crowd, which is why most of the fanmade content for the game has centered around swathes of puzzles and new mechanics rather than a confusing environment.
I am not good at the word puzzles (lol). I rather like the puzzles written by Brenton, but for the most part I haven't been able to enjoy the fanmade content because it's too difficult for me. But I do have experience programmatically generating content using word semantics. Back during the golden age of Twitter, I ran something like 30 Twitter bots, many of which used a library I wrote called verbly to access a database of natural language data (synonyms, pronunciations, meronyms, images, and so on) for things like rhyming jokes or ridiculous comparisons.
Back when I first got into Lingo, I used this library again to write a Discord bot that generates Lingo puzzles and allows people to race to solve them. There's even a leaderboard! People liked this bot a lot, so Brenton eventually asked me to use my puzzle generating skills to help him create a timed secret level in the game that contained randomly generated puzzles, as an homage to The Witness's Challenge. This was a really fun project and I loved getting to contribute to one of my favorite games. If you play the game and find this level, let me know your time!
Anyway. One thing that I learned from both of those projects is that there's more to a word puzzle than just the mechanic and the answer. It's the same problem I experienced with the handwritten puzzles in the fanmade maps I mentioned earlier: it is hard to write a good puzzle question. My library of natural language relations helps me find a question and answer that are connected via some relationship, but you also need to ask yourself whether the player can reasonably infer the answer from looking at the question.
One rather infamous example of this from the Discord bot was a question that asked for an antonym of the word "ground". There are several reasonable guesses one could make, such as "sky", "water", "minced", or "detach", because "ground" has multiple meanings. The correct answer? "figure".
Huh?
It turns out that WordNet has a very obscure definition for each of the words "ground" and "figure" that allows this relationship to exist. The issue is that these definitions are not common knowledge, and neither Merriam-Webster nor Oxford mentions them. So how was the player supposed to figure this out? And more importantly, how was the bot supposed to know how difficult the puzzle was?
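If you want to poke at this yourself, it only takes a few lines. The sketch below uses NLTK's interface to WordNet rather than verbly, and the exact senses it prints will depend on your WordNet version:

```python
# Enumerate every sense of "ground" and any antonyms WordNet lists
# for it. Requires: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

for synset in wn.synsets("ground"):
    for lemma in synset.lemmas():
        for antonym in lemma.antonyms():
            print(f"{synset.name()}: {synset.definition()}")
            print(f"  antonym: {antonym.name()}")
```

Somewhere in that output, among the senses nobody has ever heard of, should be the one that pairs "ground" with "figure".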
This is far from the only example. Pretty much any puzzle involving word meaning can fall prey to this, though some more so than others. Synonyms and antonyms are at least easy to look up; puzzles involving meronymy and hypernymy are frequently impossible to guess. Puzzles involving the spelling or pronunciation of a word are often easier, although it's easy to fall into the trap of posting a puzzle asking "what is a word that rhymes with X?" or "what is a (long) word that contains the (short) subword X?" and ending up with players unable to answer it.
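A generator could in principle guard against this by counting how many words plausibly satisfy a hint before posting it. This is a hypothetical sketch rather than anything the bot actually does, and it assumes a Unix word list at /usr/share/dict/words:

```python
# Hypothetical fairness gate: a hint is only reasonable if its
# candidate pool, within a vocabulary the player plausibly knows,
# is neither empty nor enormous.
def hint_fairness(matches_hint, vocabulary, max_candidates=25):
    candidates = [word for word in vocabulary if matches_hint(word)]
    if not candidates:
        return "reject: the answer must rely on obscure data"
    if len(candidates) > max_candidates:
        return f"reject: {len(candidates)} plausible answers"
    return f"ok: {len(candidates)} candidates"

# The "(long) word containing a (short) subword" trap: against a
# real dictionary, a hint like this matches hundreds of words.
with open("/usr/share/dict/words") as f:
    vocabulary = [line.strip().lower() for line in f]
print(hint_fairness(lambda w: "cat" in w and len(w) > 5, vocabulary))
```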
So how does the Discord bot handle this? The answer is that it doesn't, really. I manually disabled some puzzle types, but the bot still regularly posts puzzles that are too difficult, and players just ignore them. But then, how does my secret Lingo level handle this? It wouldn't be a very good gameplay experience if the timed secret level regularly gave you practically impossible puzzles and you just had to keep resetting until you got a good one. Well: I curated the puzzles by hand. I implemented The Afterword by generating hundreds of puzzles of specific types using a modified version of my bot, manually accepting or rejecting each one, and then embedding them into the game so that it could pick puzzles to show the player on the fly. The curation sometimes took hours, even after I wrote a script that let me just press y or n to accept or reject each puzzle. I still missed ones that were too difficult. For one of the puzzle types (hypernymy), I had to rewrite most of the puzzles by hand because the bot just wasn't good at that type. And people still complain that the level is too difficult (although, to some degree, I think we did want to make a level with a higher difficulty than the rest of the game, as long as it wasn't unreasonable).
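A triage loop like that is only a few lines, for what it's worth. Here's a minimal sketch of the idea, not the actual script I used; it assumes one candidate puzzle per line of a text file:

```python
# Minimal sketch of a press-y-or-n curation loop: show each candidate
# puzzle, keep the ones that get a "y". Assumes candidates.txt holds
# one generated puzzle per line.
accepted = []
with open("candidates.txt") as f:
    for line in f:
        puzzle = line.strip()
        if input(f"{puzzle}  keep? [y/n] ").lower().startswith("y"):
            accepted.append(puzzle)

with open("curated.txt", "w") as out:
    out.write("\n".join(accepted) + "\n")
```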
This brings me to the third randomized Lingo project I've worked on: the Archipelago randomizer. Thanks to these two other projects, I've been asked for a long time to add randomly generated puzzles to the randomizer, and it's something I do want to do, but it's taken me a long time to implement it because I'm very conscious of this issue inherent in generating random word puzzles. An Archipelago seed is often a long experience, which can take a few hours if played solo or synchronously with co-players, or weeks if played asynchronously. Having to reset an Afterword run because you encountered a puzzle that was too difficult is one thing, but getting stuck in an Archipelago seed for the same reason would be a much worse experience, and it's something I want to do my best to avoid.
My specific aim is to randomize the entirety of Level 1 of Lingo, since that is the only level that Archipelago supports. Level 1 contains a myriad of areas and puzzle types, and each has to be considered in its own way. Some puzzles, as I've alluded to before, are pretty simple to generate: anagrams, for instance, need little filtering, because even with a long anagram you always have enough information to figure out the solution. But many puzzles in the level are much harder to handle, which has caused me to take several extended breaks from the project.
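Anagrams really are the easy case: group a word list by sorted-letter signature, and every group with two or more members gives you valid hint/answer pairs. A quick sketch, assuming a wordlist.txt with one word per line:

```python
# Words sharing a sorted-letter signature are anagrams of one
# another, so any pair within a group can serve as hint and answer.
from collections import defaultdict

groups = defaultdict(list)
with open("wordlist.txt") as f:
    for line in f:
        word = line.strip().lower()
        groups["".join(sorted(word))].append(word)

# Every group with 2+ members yields puzzles like THING -> NIGHT.
for words in groups.values():
    if len(words) >= 2:
        hint, answer = words[0], words[1]
        print(f"anagram of {hint.upper()} -> {answer.upper()}")
```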
I recently came to the decision that it would be a good idea to release a sort of "experimental panel generation" mode for the randomizer that only randomizes certain areas and carries a disclaimer that puzzle quality may vary and content may change. This would allow me to try things out with less risk, and would also make the project seem less bleak, because I would not be beholden to solving the entirety of "What is there to do about LL1?" before releasing any of it. And this has been kind of exciting for me, because it means I get to show off some of the random puzzle designs I'm proud of figuring out.
I'll spoil one of the designs, if I may. Consider this puzzle:
The mechanic behind this puzzle is "what is a word that rhymes with CUISINES?" There are a lot of potential answers to that question alone, but there's another hint here that helps out a lot. In this area of the base game, the color of the roof above a puzzle indicates the color of the solution. The question then becomes "what is a blue thing that rhymes with CUISINES?", and we can deduce that the answer is JEANS. There's another puzzle behind it that asks for an anagram of THING, and the answer is NIGHT. Even though the anagram doesn't really need an additional hint, the roof above that puzzle (not visible in this screenshot) is black.
This demonstrates the overarching solution to this issue with puzzle generation: multiple hints. A puzzle with only one hint is constrained to whatever relationship exists between the hint and the answer, even if it's obscure. Adding a second hint further constrains the solution space, often significantly. This is best demonstrated in another area of the base game that consists entirely of puzzles with two textual hints each, using different mechanics; I've been able to randomize it almost in its entirety, because two hints are usually enough for the player to deduce the answer.
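In generator terms, a second hint is just a set intersection: build the candidate set for each hint independently, and only emit the puzzle if the intersection is small enough. A sketch with toy stand-in data rather than my actual tables:

```python
# Second hinting as set intersection: each hint alone is ambiguous,
# but together they pin down a single answer. Toy data only.
rhymes_with_cuisines = {"jeans", "genes", "scenes", "beans",
                        "queens", "machines", "sardines"}
blue_things = {"sky", "ocean", "jeans", "blueberry", "sapphire"}

candidates = rhymes_with_cuisines & blue_things
if len(candidates) == 1:
    print("fair puzzle, answer:", candidates.pop())  # -> jeans
else:
    print("still ambiguous, reroll:", candidates)
```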
(There are only two puzzles in that room that I haven't been able to randomize yet, and that's because they both involve a word relationship that I have no data for.)
Lingo has a lot of tricks that help the player deduce the solution to a puzzle, beyond how the text on the panel is crafted. Many puzzles are hinted at by nearby puzzles that share a theme or combine into a greater solution. There's also a set of words the game is fond of, repeated across multiple puzzles, which means you can sometimes figure out a puzzle just because the solution word is salient from having been seen earlier on. This in itself is a form of "second hinting", as I've suddenly chosen to call it, and it's likely that I'll end up using it in some form in my randomizer.
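If I do use it, the mechanical version might look something like weighting the generator's answer pool toward words the seed has already shown the player. This is purely hypothetical; nothing like it is implemented yet:

```python
# Hypothetical salience bias: prefer answers the player has already
# seen this seed, so the repetition itself acts as a second hint.
import random

def pick_answer(candidates, seen_words, salience_weight=4):
    weights = [salience_weight if word in seen_words else 1
               for word in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

seen = {"night", "jeans"}
print(pick_answer(["night", "ocean", "stars"], seen))
```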
While my project is still a work in progress, thinking through this has helped me gain an even greater appreciation for Lingo. It shows how different Lingo is from a generic list of word puzzles. One question that's often asked about Witness-likes such as Taiji and Lingo is why the creator decided to put their puzzles into an environment rather than just presenting them to the player on a webpage. Both games have their own answers to this; in Lingo's case, it's not just that the non-Euclidean world adds a second dimension to the game, or that it enables puzzle mechanics based on the environment. The world also benefits the non-environmental puzzles: the regular word connection puzzles exist in relation to one another and collectively educate the player.
Lingo is a masterpiece of puzzle design, to quote YouTuber Icely Puzzles. If you're interested, he has a video that goes into how the game teaches you mechanics and how different puzzles relate to one another. Warning: the video spoils almost the entirety of the first level of the game, so I'd recommend holding off until you've beaten it.
Now, I've long believed that randomizers are fundamentally unable to create as cohesive and rich an experience as handcrafted games (there's a rant about Lenna's Inception that I need to get out at some point). A good example of this is how hard it was for me to generate randomized Taiji puzzles. Even though Taiji's puzzles are mechanical like The Witness's, they are much less inherently constrained, so generated ones feel very different from handcrafted ones, and are usually more difficult. Artificial constraints would have to be designed and added on a per-area or per-puzzle basis to improve this.
I think Lingo is in interesting territory in this regard. What we've learned is that Lingo puzzles are more constrained than they appear, and that this can be studied. Second hinting does a good job of making the puzzles more accessible, and it's a mechanical thing that can be designed and then implemented. If I can figure out good ways to add it to different areas of the game, I think it will be possible to randomly generate puzzles that are closer to what the base game offers. They'll never be as good as the handwritten ones, but it's still cool to have a path forward in this project.
Now. As is common on my blog, this post has really gotten away from me. I originally wanted to blog about an interesting problem I encountered in Godot with developing the mod for Lingo, but I started giving a lot of backstory about Lingo and Archipelago and it developed into this whole analysis of puzzle design. I think I'm going to end it here, and pick the coding problem back up in another post. If you haven't played Lingo yet and you made it this far into the post and you're still interested in playing the game, then I have one thing to say:
GOOD LUCK!