Slot machine strategy
In the game Blue Prince, every run has you fill a mansion with different rooms to wander through. One of these rooms is the Casino, containing several slot machines and a roulette table.
One of the slot machines is broken until you figure out how to fix it, which requires sacrificing a valuable item.
The difference between the standard slot machines and the locked slot machine is that the locked slot machine allows you to respin individual reels five times instead of three times.
There are a lot of satisfying puzzles in Blue Prince, as well as random draws that involve setting up synergistic benefits. I figured this would also be true for the slot machine that requires you to sacrifice the valuable item. But after playing it many times I couldn’t see a clear strategy for the slot machine that would beat it.
Because the difference was in the number of respins, I figured there must be some combinations of symbols on the reels whose expected value increases dramatically when you can respin more times, giving higher expected yields. I thought it might have something to do with the probabilities of getting snake and net combinations.
People online have been collecting statistics from the slot machine and have estimated the probabilities for the different symbols on the reels:
Blank: 28%
Coin: 30%
Coin Stack: 10%
Snake: 10%
Net: 4%
x2: 9%
Clover: 1%
Crown: 8%
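As a minimal sketch of how these estimates can drive a simulation, the snippet below samples reels from the listed probabilities. The function names are hypothetical, and the default of four reels is an assumption inferred from the four-crown jackpot; payouts are deliberately left out because they are not listed here.

```python
import random

# Estimated symbol probabilities in percent, copied from the list above.
SYMBOL_WEIGHTS = {
    "Blank": 28,
    "Coin": 30,
    "Coin Stack": 10,
    "Snake": 10,
    "Net": 4,
    "x2": 9,
    "Clover": 1,
    "Crown": 8,
}


def spin_reel(rng):
    """Draw one symbol for a single reel according to the estimated weights."""
    symbols = list(SYMBOL_WEIGHTS)
    weights = list(SYMBOL_WEIGHTS.values())
    return rng.choices(symbols, weights=weights, k=1)[0]


def spin(rng, n_reels=4):
    """Spin all reels independently (four reels is an assumption)."""
    return [spin_reel(rng) for _ in range(n_reels)]


def respin(reels, index, rng):
    """Return a copy of the reels with only the reel at `index` spun again."""
    new_reels = list(reels)
    new_reels[index] = spin_reel(rng)
    return new_reels


rng = random.Random(0)  # seeded so a run can be repeated
reels = spin(rng)
print(reels)
print(respin(reels, 0, rng))
```

Keeping the weights as integer percentages makes it easy to check that they add up to 100.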
With the probabilities and the rules I could implement simulations of playing the game to test different strategies.
I thought I could use Q-learning to identify non-obvious strategies buried deep in the game design. After some iterations on different Q-learning setups, the agent wasn’t able to find any particularly revolutionary strategies.
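For readers unfamiliar with Q-learning, here is a generic sketch of the tabular update rule and an epsilon-greedy action choice. It is not the code used for this post: the state and action labels, the learning rate alpha, and the discount gamma are all illustrative assumptions.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s', a')."""
    # A terminal state (for example after cashing out) has no next actions.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: the state is a tuple of reel symbols plus respins left.
q = defaultdict(float)
state = (("Crown", "Crown", "Crown", "Blank"), 3)
q_update(q, state, "cash_out", reward=10.0, next_state=None, next_actions=[])
print(q[(state, "cash_out")])
```

With eight symbols per reel and a handful of respins, the state space is small enough for a plain lookup table like this.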
To get some points of comparison, I also created a JavaScript version of the game that I could play multiple times to collect data on the success of strategies I had intuited from playing the game many, many times.
I collected empirical data for 25 runs, each played until bankruptcy. These empirical data can be compared with having the Q-learning agent play the game for 25 runs, then plotting the total balance over the turns of playing the slot machine.
The Q-learning agent had learned some strategies, but it seems I took riskier bets that worked out. On average, however, the performance is about equal.
I wanted to see how this compared to a naive strategy that never uses the respins, and instead always cashes out after the initial spin.
This performed much better than the Q-learning agent or my strategies!
Generally, the expected gain from a respin does not cover its cost for the majority of symbol combinations on the reels. There are a few combinations where it does, however. For example, if you have three crowns, respinning the remaining reel at a cost of 1 gold has an expected value of 0.08 * 100 = 8 gold. I created a manual strategy that cashes out unless there is a respin available with a positive expected value, in which case it performs that respin instead.
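The three-crown example can be written out as a small calculation. This is a sketch under simplifying assumptions: the 100 gold jackpot and 1 gold respin cost are taken from the example above, the function name is hypothetical, and any payout the current reels would give when cashing out is ignored.

```python
P_CROWN = 0.08    # estimated chance of a crown on one reel
JACKPOT = 100     # gold, taken from the 0.08 * 100 example above
RESPIN_COST = 1   # gold per respin


def respin_net_ev(p_hit, payout_gain, cost):
    """Expected net gain of one respin: chance of a hit times the gain, minus the cost."""
    return p_hit * payout_gain - cost


net = respin_net_ev(P_CROWN, JACKPOT, RESPIN_COST)
print(f"Net expected gain of the respin: {net:.2f} gold")

# A respin is only worth taking when its net expected gain is positive.
print("respin" if net > 0 else "cash out")
```

Under these assumptions the respin is worth about 7 gold net, which is why the three-crown case is one of the few where respinning pays off.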
The performance of the optimal expected value strategy wasn’t much better than the cash-out strategy.
So why would you sacrifice the valuable item in order to use a slot machine with more respins, if using respins almost always ends up losing you gold?
There is an achievement in the game for collecting the jackpot from the slot machine (getting four crowns). If you pull three crowns and you have three respins, the chance of getting the jackpot is 1 - 0.92^3 = 22.13%, but with five respins it is 1 - 0.92^5 = 34.09%. That is a large difference if you really want to get this achievement; you end up having to play far fewer games.
The locked slot machine ends up not being about gains in gold, but about ‘completing’ the game.
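The two percentages above come from the chance that at least one of the respins lands a crown. A short sketch, assuming every available respin is spent on the one remaining reel and using the estimated 8% crown probability (the function name is hypothetical):

```python
def jackpot_probability(p_crown, respins):
    """Chance that at least one of `respins` independent respins shows a crown."""
    return 1 - (1 - p_crown) ** respins


for respins in (3, 5):
    p = jackpot_probability(0.08, respins)
    # 1 / p is the mean of a geometric distribution: the average number of
    # three-crown situations needed before one of them turns into a jackpot.
    print(f"{respins} respins: {p:.2%} chance, about {1 / p:.1f} attempts on average")
```

By this estimate you would need roughly 4.5 three-crown situations on a standard machine versus roughly 2.9 on the locked one before hitting the jackpot.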
Code and notebook for this post are available at https://github.com/vals/Blog/tree/master/250705-slot-machine-strategy