Having caught up with the recent episodes, I was laughing at the similarity with the TV show Silicon Valley :) Let me explain why I made the shrynk library.
It began when the server storing my cryptocurrency data started running out of disk space, even though the CSV files were stored compressed.
Trying to come up with a quick solution, I figured I should just switch to a more effective compression algorithm (the data was stored using gzip). But how could I quickly figure out which one would be better?
Fast forward: I made the package shrynk for compression using machine learning! It chooses (and applies) the compression format for your CSV files, pandas DataFrames, and JSON, and it works for files in general.
On benchmark data, its mixed strategy, chosen by machine learning, uses 30% less disk space overall than the best single compression algorithm (meaning: picking any one compression algorithm to compress everything would do worse).
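To make the "which would be better?" question concrete, here is a minimal sketch of the brute-force baseline: compress the same bytes with several algorithms and keep the smallest result. It assumes only Python's standard library; the candidate set and the names `CODECS`, `compressed_sizes`, and `best_codec` are made up for illustration and are not shrynk's API or its actual list of formats. The point of shrynk is to predict the winner instead of trying every option like this does.

```python
import bz2
import gzip
import lzma
import zlib

# Hypothetical candidate set for illustration; not shrynk's real list of formats.
CODECS = {
    "gzip": gzip.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
    "zlib": zlib.compress,
}


def compressed_sizes(data: bytes) -> dict:
    """Compress `data` with every candidate and return the size in bytes per codec."""
    return {name: len(compress(data)) for name, compress in CODECS.items()}


def best_codec(data: bytes) -> str:
    """Return the name of the codec that produced the smallest output."""
    sizes = compressed_sizes(data)
    return min(sizes, key=sizes.get)


if __name__ == "__main__":
    # Synthetic CSV-like sample; replace with the bytes of a real file.
    rows = [b"%d,%d.50,%d\n" % (1575500000 + i, 7000 + i % 13, i % 7) for i in range(5000)]
    sample = b"timestamp,price,volume\n" + b"".join(rows)

    print("original size:", len(sample))
    for name, size in sorted(compressed_sizes(sample).items(), key=lambda kv: kv[1]):
        print(name, size)
    print("smallest:", best_codec(sample))
```

Trying every codec on every file gets slow quickly, and the winner differs from file to file, which is why a single fixed choice loses to a per-file choice.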
You can try a demo or upload your own file at
https://shrynk.ai
Read the whole story of why and how the approach works on my blog:
https://vks.ai/2019-12-05-shrynk-using-machine-learning-to-learn-how-to-compress
Code:
https://github.com/kootenpv/shrynk