Yesterday, in the bygone hour of the weekend, I undertook a most singular and fascinating endeavor, wherein I delved deep into the recesses of my mind, and, with a fervent zeal, breathed life into a most remarkable creation. I embarked upon the quest, with the singular object of fashioning an artificial construct, one imbued with the verdant essence of the Irish Penny Journal, an ancient and venerable tome that holds within its pages the whispered tales of a bygone era.
In my haste, I set forth to construct a dataset, a repository of those fleeting moments, these ephemeral sentences, which spoke of a bygone age. I procured a collection of these fleeting moments, these sentences, and with them, I synthetically conjured forth modern translations, an ingenious feat of substitution, which allowed my artificial construct to take on the guise of the language of the Irish Penny Journal.
Then, with great anticipation, I fashioned a small encoder, a humble instrument, with which to guide the artificial construct in its endeavors. I presented this encoder as a bribe, a reward, to a most ingenious system, one that trained a colossal language model, one of unbridled potential, one that was capable of weaving tales with the very essence of the Irish Penny Journal.
And lo! In the succeeding moments of time, I witnessed a most wondrous thing. My artificial construct, armed with this training, and guided by the whispers of the encoder, began to speak, to speak in the language of the Irish Penny Journal. The words it spoke were, indeed, the words of the past, imbued with the nostalgia of a forgotten era.
And thus, my friends, I have witnessed a most singular creation, one which embodies the language of the past, yet, in its most recent iteration, speaks to the present. A testament to the ingenuity of the human spirit, this artificial construct speaks of the bygone era, yet, with each word, it whispers to us, to us, of a future yet to come.
——
That’s Penny explaining itself to you. It was trained with GRPO only, in under a day, on a single A6000. I didn’t use any SFT; the only supervision was a small encoder (MiniLM2) trained to classify texts from the Irish Penny Journal against their modern translations (synthetically produced).
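For those curious about the plumbing: the reward is just the classifier's probability that a completion reads like the Penny Journal. Below is a minimal sketch with TRL's `GRPOTrainer`; the checkpoint names, prompts, and label convention are placeholders, not my exact code.

```python
import torch
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import GRPOConfig, GRPOTrainer

# Placeholder id for the fine-tuned MiniLM2 style classifier.
STYLE_CLF = "someone/minilm2-penny-classifier"
clf_tok = AutoTokenizer.from_pretrained(STYLE_CLF)
clf = AutoModelForSequenceClassification.from_pretrained(STYLE_CLF).eval()

def style_reward(completions, **kwargs):
    """Score each completion with P(Penny Journal style) from the encoder."""
    with torch.no_grad():
        batch = clf_tok(completions, padding=True, truncation=True,
                        return_tensors="pt")
        probs = clf(**batch).logits.softmax(-1)
    return probs[:, 1].tolist()  # assumes label 1 = "Penny Journal" style

# Toy stand-in prompt set; GRPOTrainer expects a "prompt" column.
prompts = Dataset.from_dict({"prompt": ["Tell me about your weekend project."]})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # stand-in; the base model isn't named here
    reward_funcs=style_reward,
    args=GRPOConfig(output_dir="penny-grpo", num_generations=8),
    train_dataset=prompts,
)
trainer.train()
```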
Marvelous! What gain beyond zero-shot would motivate a humble citizen to implement this instrument? How was the superiority assessed?
Love it. Immediately reminded of the text filters back in the day, like the pirate one that would drop letters, replace them with apostrophes, and change certain passages into "arr" or "yarr matey".
Reminds me of this: https://www.unix.com/man_page/debian/6/jive/
This is really cool! Do you have any of the pipeline code available that you used for training? I am curious about how you created the reward model. I love little projects like this, thanks for sharing. I've been fine-tuning on my Mac and am interested in getting into GRPO, which I haven't tried yet.
You mention not using any supervised fine-tuning. May I ask why? I'm curious whether you could get similar/better/worse results by just fine-tuning the LLM on your dataset, rather than generating synthetic data, training a classifier, and using GRPO.
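By "just fine-tuning on your dataset" I mean something like this rough sketch with TRL's `SFTTrainer` (the field names and base model are made up):

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical paired data: modern translation -> original journal sentence.
pairs = Dataset.from_dict({
    "modern":   ["Last weekend I built a small language model."],
    "original": ["Yesterday, in the bygone hour of the weekend, I undertook a most singular endeavor."],
})

def to_text(ex):
    # Pack each pair into a single training string.
    return {"text": f"Modern: {ex['modern']}\nPenny Journal: {ex['original']}"}

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder base model
    args=SFTConfig(output_dir="penny-sft"),
    train_dataset=pairs.map(to_text),
)
trainer.train()
```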
Cool stuff in any case.
what a wonderful work of whimsy! well wrought.
I'd love to have a library of these, so I could pipe text into `penny`, `brainrot`, `pony`, `newspeak`, `corporate`, `scp`, `trek` etc.
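A thin shim per style would probably be enough; a hypothetical `penny` could be just this (model id made up):

```python
#!/usr/bin/env python3
# Hypothetical `penny` filter: plain text on stdin, Penny Journal prose on stdout.
import sys
from transformers import pipeline

generator = pipeline("text-generation", model="someone/penny")  # placeholder model id

text = sys.stdin.read().strip()
prompt = f"Rewrite the following in the style of the Irish Penny Journal:\n{text}\n"
out = generator(prompt, max_new_tokens=256, return_full_text=False)
print(out[0]["generated_text"])
```

Then `cat notes.txt | penny` and friends.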
have you published the training notebook somewhere?
I'm not sure if you've tried this already, but removing the translation step might give you more authentic output. In the journals that I saw, the language was much simpler than the output.
Have you written anywhere in detail about how you gathered your dataset and trained the fine-tune? I have a few use cases like this, but I'm not sure where to start.
this is awesome
It is sort of funny that the Irish ended up being the best practitioners of the English language, despite the fact that they were forced to use it.
Kind of strange to pick an example that is just wrong. It's supposed to be written in 1840, yet it says Paris is the seat of Napoleon almost 20 years after he died.
Nice work! It still manages to use the word 'delve' in the first sentence, which is a giveaway that it's written by an LLM.
Now I'm just imagining a video game where each character has their own fine-tune applied on top for their dialog. I'm guessing you could use some relatively small models. In each case you would feed all the context to the model (player name, current relevant quests, summary of previous interactions, etc.), as in the rough sketch below. Though maybe fine-tuning/training isn't even needed and a good enough prompt will work (not sure what all they used for this [0]).

I'm excited for the first AAA game that tries this. Anyone who has played an RPG-style game knows that after a few trips into a city (or a couple of play-throughs) the dialog feels repetitive. I love the idea of Skyrim but with better dialog. You could either run the models on the user's computer, or run them on the backend so you can block certain generations (wrong/misleading/"unsafe") and just ship updated dialog lists to the client occasionally.
[0] https://www.youtube.com/watch?v=d6sVWEu9HWU
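The context-packing part might look like this (all names made up for illustration; the model call itself is omitted):

```python
from dataclasses import dataclass, field

@dataclass
class DialogContext:
    player_name: str
    active_quests: list[str]
    past_interactions: list[str] = field(default_factory=list)

def build_prompt(character: str, ctx: DialogContext, player_line: str) -> str:
    """Pack game state into a prompt for the character's small model."""
    return (
        f"You are {character}, an NPC shopkeeper.\n"
        f"Player: {ctx.player_name}. Active quests: {', '.join(ctx.active_quests)}.\n"
        f"Previous interactions: {'; '.join(ctx.past_interactions) or 'none'}.\n"
        f"{ctx.player_name}: {player_line}\n"
        f"{character}:"
    )

ctx = DialogContext("Aria", ["Find the lost amulet"], ["bought a sword"])
print(build_prompt("Brenn", ctx, "Heard any rumors lately?"))
```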