Hacker News

by fforfloon 4/4/2025, 7:11 AM with 0 comments

I’ve been working on some projects with Greek-language data and have run into some interesting challenges with RAG and LLMs. With English-language data, it's relatively straightforward to get a decent prototype working in a short time. In other languages, it has proven trickier.

I’m curious to hear from others working with non-English languages: what challenges have you faced? Some areas of interest:
- Models that are more willing to switch languages
- Availability and quality of language-specific retrieval corpora
- Differences in tokenization and embedding quality
- Handling multilingual queries and responses
- Any workarounds or best practices you’ve discovered

Would love to hear both success stories and pain points.

Ask HN: What Are Your Experiences with RAG and LLMs in Non-English Languages?