I’ve been working on some projects with Greek-language data and have run into some interesting challenges with RAG and LLMs. With English data, it's relatively straightforward to get a decent prototype working in a short time. In other languages, it has proven trickier.
I’m curious to hear from others working with non-English languages. What challenges have you faced? Some areas of interest:
- Models that are more willing to switch languages
- Availability and quality of language-specific retrieval corpora
- Differences in tokenization and embedding quality
- Handling multilingual queries and responses
- Any workarounds or best practices you’ve discovered
Would love to hear both success stories and pain points.
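To make the tokenization point a bit more concrete, here is a minimal sketch of one contributing factor: Greek letters take two bytes each in UTF-8, so a byte-level tokenizer starts from a longer byte sequence for Greek than for English text of similar length. The helper name `utf8_bytes_per_char` and the sample sentences are my own illustrations, not taken from any library, and this measures raw bytes only. Actual token counts depend on the specific tokenizer's vocabulary.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text` (hypothetical helper)."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample sentences (assumptions, not benchmark data).
english = "Good morning, how are you?"
greek = "Καλημέρα, τι κάνεις;"

print(f"English: {utf8_bytes_per_char(english):.2f} bytes/char")
print(f"Greek:   {utf8_bytes_per_char(greek):.2f} bytes/char")
```

Running the same comparison with the tokenizer of whichever model you actually use (tokens per word rather than bytes per character) would give a more direct picture of the cost for your language.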