Wordle is Fun (for programmers and data scientists)
So yes, obviously Wordle is a fun game to play, but I'm actually referring to the multitude of interesting programming and analytical efforts that have come out of the game. Many of these were written up at The New Stack back in January, but I thought I'd sum up other efforts (and my own) here a little bit, starting with the efforts I think are least common.
- Solvers / bots
- Determining the most popular Wordle openers based on tweeted / shared scores.
- “Solving” from tweeted / shared scores.
Solvers / bots
I found this very cool bot leaderboard via this excellent writeup on the state of Wordle solving. The best average is 3.4212 guesses, with SALET as the first guess. The algorithms are precomputed, and you can watch the solver in action (or use it yourself) here.
My bot is not state of the art, and is intentionally somewhat hobbled by using its own home-rolled dictionary rather than the official Wordle answer list from the source code. It was fun to write, but it only does a one-step lookahead, picking the guess that on average leaves the fewest remaining candidate words.
The original 3Blue1Brown video discusses this approach in the context of information theory.
Determining the most popular Wordle openers based on tweeted / shared scores.
Much has been posted about the best wordle openers, but very little on the most popular, other than this YouGov survey of UK wordle players.
I think I'm the only person who has attempted to figure out the most popular Wordle openers from tweets. I'm using the aforementioned Kaggle dataset, though I'm also collecting tweets when I run TwitterWordle, so I have my own sample. However, I think it's easier to use publicly available data, for reproducibility.
I used a Ridge regression to handle the very large amount of collinearity among the possible guesses, since all I start with is a pattern like ⬜⬜🟨⬜🟨, which hundreds of different opening words could have produced.
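The post doesn't show the regression setup, but here is one plausible sketch of the idea, on simulated data rather than the real tweet sample: each row of the design matrix is a (day, first-row pattern) observation, each column is a candidate opener, and an entry is 1 when that opener would have produced that pattern against that day's answer. The coefficients then estimate each opener's popularity, with Ridge shrinkage taming the collinear columns.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import Ridge

def feedback(guess: str, answer: str) -> str:
    """Wordle feedback pattern: G=green, Y=yellow, .=gray."""
    result, remaining = ["."] * 5, Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and remaining[g] > 0:
            result[i], remaining[g] = "Y", remaining[g] - 1
    return "".join(result)

# Toy, simulated inputs -- NOT the real data.
answers = ["crane", "shake", "mount"]            # the days' solutions
openers = ["adieu", "slate", "crane", "irate"]   # candidate first guesses
true_share = {"adieu": 500, "slate": 300, "crane": 150, "irate": 50}

# "Observed" data: per day, how many tweets showed each first-row pattern.
rows, y = [], []
for ans in answers:
    counts = Counter()
    for op, n in true_share.items():
        counts[feedback(op, ans)] += n
    for pat, n in counts.items():
        rows.append((ans, pat))
        y.append(n)

# Design matrix: 1 if opener j would produce this row's pattern on this day.
# Openers producing identical patterns give collinear columns, which is
# what breaks plain least squares and motivates the Ridge penalty.
X = np.array([[1.0 if feedback(op, ans) == pat else 0.0 for op in openers]
              for ans, pat in rows])
model = Ridge(alpha=1.0, fit_intercept=False)
model.fit(X, np.array(y))
# model.coef_ now estimates each opener's (shrunken) tweet count.
```

In this toy run, "crane" and "irate" happen to produce identical patterns against "shake", so their columns partially overlap, exactly the kind of collinearity the regularization is there to handle.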
crane is in the Top 3, rocketing out of obscurity after the original 3Blue1Brown video and the associated coverage.
“Solving” from tweeted / shared scores.
I was inspired to do this by Ben Hamner's Kaggle notebook, and his dataset is super useful for a couple of reasons, which I'll get to later.
The idea is to analyze everyone’s shared Twitter scores (e.g. the ⬛🟨🟩 patterns) and figure out the answer just from that. This actually got a lot harder post-NYT acquisition because they skipped a few words such that Wordle 241 had two solutions.
This problem has continued as some determined folks keep playing cached/saved versions of the original game. That has no doubt contributed to Ben's non-100% success rate: the notebook has now failed on 231, 236, 249, 254, and 258.
My implementation, which you can read about in this notebook (and whose spoiler-free predictions you can see on Twitter), is, I'm happy to say, still 100% accurate. I limit the search space to the ~2,000 known Wordle answers. I believe I'd have just one failure (247) if it had to search the larger ~12,000-word dictionary, though with the parameters tweaked to account for the NYT word changes, I think it can solve that one too.
I think the trick to making this work is scoring the words, rather than trying to use certain patterns to completely eliminate/filter out possible solutions.
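As a rough sketch of that scoring idea (my own simplification, not the actual notebook code): instead of demanding that every tweeted pattern be consistent with a candidate answer, rank each candidate by the fraction of tweeted pattern rows that *some* guess could actually have produced against it, and take the highest-scoring word. Soft scoring tolerates the noise from people playing cached versions of the game, where a hard filter would wrongly eliminate the true answer.

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Wordle feedback pattern: G=green, Y=yellow, .=gray."""
    result, remaining = ["."] * 5, Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and remaining[g] > 0:
            result[i], remaining[g] = "Y", remaining[g] - 1
    return "".join(result)

def score_candidates(tweeted_rows, candidates, guess_pool):
    """Score each candidate answer by the fraction of tweeted pattern rows
    that some guess in `guess_pool` could have produced against it."""
    counts = Counter(tweeted_rows)
    total = sum(counts.values())
    scores = {}
    for answer in candidates:
        possible = {feedback(g, answer) for g in guess_pool}
        scores[answer] = sum(n for pat, n in counts.items()
                             if pat in possible) / total
    return scores
```

With the candidate list restricted to the ~2,000 known answers, a handful of impossible rows from stale-cache players lowers a candidate's score slightly instead of knocking it out entirely.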