
Compressing Chess Moves Even Further, To 3.7 Bits Per Move

It might be fun to apply a learned compression approach (i.e., using neural networks to compress data) and see if it can go even lower... which incidentally is what my recent work has been on. :) My guess is that it might learn a little eval function to help predict moves, too. To be practical, the network would have to be pretty light, and runnable on the CPU, though.

Chess games are essentially videos of 8x8 2D images with 7 possible values (R N B Q K P empty), though with a transition probability distribution that is pretty focused on reasonable candidate moves. And there's also additional redundancy in the sense that many games start with similar openings, which could be further exploited by sorting games in a canonical order. (Though just augmenting the candidate move prediction with an opening book is probably good enough if that isn't desirable for whatever reason.)

But perhaps that's too much of a CNN/VAE viewpoint. Perhaps some transformer-based method applied to "embedded" feature vectors might work better. NLP is all about predicting the next token given a sequence of previously decoded tokens, after all. Or something like OctAttention, which is effectively just probability modelling for a manually encoded data representation.
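To make the connection between move prediction and bit count concrete: an ideal entropy coder spends about -log2(p) bits on a move that the predictor assigned probability p, so a sharper model directly shrinks the file. A minimal stdlib sketch, with made-up probabilities (nothing here is from an actual model):

```python
import math

def ideal_bits(move_probs):
    """Average bits per move if each played move, assigned probability p
    by the predictor, is entropy-coded at its ideal cost of -log2(p) bits."""
    return sum(-math.log2(p) for p in move_probs) / len(move_probs)

# Hypothetical probabilities two predictors assigned to the moves actually played.
weak_model   = [0.05, 0.10, 0.08, 0.05, 0.12]   # vague predictions
strong_model = [0.60, 0.45, 0.70, 0.30, 0.55]   # sharper predictions

print(ideal_bits(weak_model))    # ~3.7 bits/move
print(ideal_bits(strong_model))  # ~1.0 bits/move
```

This is why a learned "little eval function" could push below hand-tuned schemes: the cost is bounded by how well the model predicts, not by any fixed code table.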
What is wrong with the full 2D diagram that you want to compress it that much? What did it ever do to you?
@sicariusnoctis said in #3:
> a learned compression approach

Huh, that's exactly the idea I came here with.

You mean: instead of coming up with all the compression tricks by hand (which is essentially remembering common patterns in moves[1]), it would be wise to make this process automatic. Let a neural network or something spot the patterns and make use of them.

It can be done blindly, without knowing the position, or the position can be taken into account to provide additional context. And we can either put the effort of holding the board in mind on the neural network, or provide it with some external interface telling it what the position is, so as not to waste the network's limited capacity on reconstructing the position from all the previous moves.

And since the database is presumably read far more often than it is written, we can make encoding slower than decoding. I believe there is some way to involve a neural network in encoding but not in decoding—decoding would be something straightforward—which would improve decoding speed, perhaps very much so if the network is big and slow.

And consecutive moves can be coded jointly, to save the fractional bits otherwise wasted on each move.

[1] For example, that no black pawn ever moves to the eighth or seventh rank, or that pawns almost always promote to a queen.
I'll make the humorous suggestion: games can be compressed into game IDs, which can be looked up using the Lichess API.
Could barcodes be more useful than FENs or EPD strings? @marcusbuffett
www.gtin.info/barcode-101/

What about two coordinates, the before and after of a move? A square is either occupied (1) or not occupied (0), so the occupancy of the whole board takes at most 64 bits. If some pawns or pieces are gone, the compressed size could drop quickly to fewer bits. But maybe it can be even less than that. A starting position has 16 groups of 4 bits.

Ply 0 (Starting position)
--------------- Black Side
1111 1111
1111 1111
0000 0000
0000 0000
--------------- Center line
0000 0000
0000 0000
1111 1111
1111 1111
--------------- White Side

Ply 1 (1. e4)
--------------- Black Side
1111 1111
1111 1111
0000 0000
0000 0000
--------------- Center line
0000 1000 (After)
0000 0000
1111 0111 (Before)
1111 1111
--------------- White Side

Ply 2 (1.e4 d5)
--------------- Black Side
1111 1111
1110 1111 (Before)
0000 0000
0001 0000 (After)
--------------- Center line
0000 1000
0000 0000
1111 0111
1111 1111
--------------- White Side

Ply 3 (2. exd5)
--------------- Black Side
1111 1111
1110 1111
0000 0000
0001 0000 (After)
--------------- Center line
0000 0000 (Before)
0000 0000
1111 0111
1111 1111
--------------- White Side
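The diagrams above can be treated as 64-bit occupancy bitboards, and XOR-ing two consecutive plies isolates exactly the squares a move touched. A stdlib sketch of that idea (the helper names and square-naming convention are mine, not anything standard):

```python
def occupancy(rows):
    """Pack eight 8-character rows of '1'/'0' (rank 8 down to rank 1,
    files a-h left to right) into a single 64-bit occupancy integer."""
    return int("".join(rows), 2)

def changed_squares(before, after):
    """XOR two occupancy bitboards and name the squares that differ."""
    diff = before ^ after
    names = []
    for i in range(64):
        if diff & (1 << (63 - i)):       # first character of the string is a8
            names.append("abcdefgh"[i % 8] + str(8 - i // 8))
    return names

start = occupancy(["11111111", "11111111", "00000000", "00000000",
                   "00000000", "00000000", "11111111", "11111111"])
after_e4 = occupancy(["11111111", "11111111", "00000000", "00000000",
                      "00001000", "00000000", "11110111", "11111111"])
print(changed_squares(start, after_e4))  # ['e4', 'e2']
```

Note the diff alone is ambiguous about direction and piece type (and castling or captures touch a different number of squares), so a real decoder would still need the position to interpret it.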

Maybe chess games could be encoded more efficiently in binary than in decimal.

www.researchgate.net/publication/330703394_An_Efficient_Text_Database_Compression_Technique_using_6_Bit_Character_Encoding_by_Table_Look_Up

The binary "1111" in decimal value is "15". The number "15" is smaller in bits than the binary "1111." There are 16 different combinations of ones and zeros in a group of 4 digits. So a table of 16 (ones and zeros) would play out a chessboard.

www.sciencedirect.com/topics/computer-science/binary-to-decimal-conversion

Maybe a 2-bit binary system could be used instead of 4. It all depends on how we split up the chessboard: rather than 16 four-bit sectors, the board would be split into 32 two-bit sectors. Then we attach the 2-bit or 4-bit groups like some sort of DNA chain.
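Whatever the group size, the arithmetic comes out the same: 16 four-bit groups (or 32 two-bit groups) is 64 bits, which packs into exactly 8 bytes per position. Worth noting that a group's decimal label "15" is just a name for the same 4 bits, so the lookup table doesn't shrink anything by itself. A quick stdlib sketch of the packing:

```python
def pack_nibbles(groups):
    """Pack 16 four-bit groups (given as strings like '1111') into 8 bytes."""
    value = int("".join(groups), 2)
    return value.to_bytes(8, "big")

# The starting-position occupancy from the diagrams: 4 full groups,
# 8 empty groups, 4 full groups.
start = ["1111"] * 4 + ["0000"] * 8 + ["1111"] * 4
packed = pack_nibbles(start)
print(len(packed), packed.hex())  # 8 ffff00000000ffff
```

So raw occupancy is a flat 8 bytes per ply; the gains the thread is after have to come from predicting the *changes* between plies, not from relabelling the bits.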

A FEN or EPD string is the DNA of chess positions.
How about compressing the FEN ... rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Since a FEN has 8 parts, maybe it's possible to compress it into 3 bits.
But then how would we compress the "w KQkq - 0 1"?
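The "w KQkq - 0 1" tail actually compresses very well with plain bit-packing: side to move is 1 bit, castling rights are 4 bits, and the en-passant file fits in 4 bits (with one value reserved for "none"). A hedged stdlib sketch (the layout and helper name are my own invention; the move counters would need a few more bits on top):

```python
def pack_fen_tail(side, castling, en_passant):
    """Pack the side-to-move, castling, and en-passant FEN fields into 9 bits:
    1 bit for side, 4 bits for castling rights (KQkq), 4 bits for the
    en-passant file, with 15 meaning 'no en-passant square'."""
    bits = 0 if side == "w" else 1
    for right in "KQkq":
        bits = (bits << 1) | (right in castling)
    ep = 15 if en_passant == "-" else "abcdefgh".index(en_passant[0])
    return (bits << 4) | ep

print(pack_fen_tail("w", "KQkq", "-"))   # 255: side=0, all castling bits set, ep=none
print(pack_fen_tail("b", "-", "e6"))     # 260: side=1, no castling, ep file e
```

The halfmove clock (0-100ish) and the fullmove number each fit in a byte or a small variable-length integer, so the whole tail is 3-4 bytes at worst, versus 12 ASCII characters.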

If you "7z Fastest" a chess game and convert it into binary, would it not be reduced even more?
fiehnlab.ucdavis.edu/staff/kind/collector/benchmark/7zip-benchmark
gist.github.com/ianchursky

codegolf.stackexchange.com/questions/19397/smallest-chess-board-compression
> One of the most optimal forms of compression is actually to evaluate the position, use an engine like Stockfish, and then store the index of the move.

Computational cost aside, how would that work in practice, given that typical engine output is non-deterministic?
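One way around the non-determinism: the encoder and decoder don't need an engine at all, only *some* ordering of the legal moves they can both reproduce exactly — for instance, sorting the moves' UCI strings. A stdlib sketch with a hardcoded legal-move list (in practice you would generate the list with a move generator such as python-chess; an engine's ranking would only reorder the same list to make common moves get smaller indices):

```python
import math

def move_index(legal_moves, played):
    """Both sides sort the legal moves identically, so the stored index is
    reproducible without any engine."""
    ordering = sorted(legal_moves)           # deterministic canonical order
    return ordering.index(played), math.ceil(math.log2(len(ordering)))

# The 20 legal moves in the starting position, as UCI strings.
start_moves = ["a2a3", "a2a4", "b2b3", "b2b4", "c2c3", "c2c4", "d2d3", "d2d4",
               "e2e3", "e2e4", "f2f3", "f2f4", "g2g3", "g2g4", "h2h3", "h2h4",
               "b1a3", "b1c3", "g1f3", "g1h3"]

index, bits = move_index(start_moves, "e2e4")
print(index, bits)   # 11 5 — e2e4 is 12th in sorted order; 5 bits cover 20 moves
```

Using an engine only helps if its ranking is bit-for-bit reproducible (same binary, same settings, fixed depth, no multithreading), which is exactly the practical worry raised above.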
There is a mistake in your article. You said "3.7 Bits Per Move". You must have meant 3.7 bytes per move, or something similar.
3.7 bits (or any fractional value of bits) is impossible because a bit is a binary value: 1 or 0, represented by a switch in a computer being either 'on' or 'off', like a light switch in a house. A switch cannot be 'seven-tenths on'. You can only have a whole number of bits.

If I am completely wrong, which is very very likely, please explain what is actually going on because I am confused.