This article presents a way to make structured generation with LLMs much faster than standard generation, but what I find most interesting is how, towards the end, it highlights the problems that tokenization introduces.