Ask HN: How to manage docs for LLM RAG app?

by Jianghong94 on 8/8/2024, 2:47 AM with 4 comments

I'm building an LLM RAG QA bot for my company, a financial institution. Right now I know the 'basic' building blocks, e.g. prompt engineering, RAG, vector DBs, eval, etc. Funnily enough, the first challenge I've run into is curating and managing all the different types of docs, e.g.:

* email chains
* Teams recording transcripts
* Confluence pages
* PDF manuscripts

These are ever-evolving and may be kept current through periodic delta updates, manual syncs, additions/removals, etc. I'm trying to figure out whether there's a way to manage these docs/texts properly. Basically, I think I need a system that stores these files and their metadata and provides a web UI for people to manage them. These blobs of text would then go through frameworks like LangChain/LlamaIndex and be cleaned and chunked into a vector DB, so that different chunking strategies can be A/B tested while other people maintain the ever-growing doc collection.
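To make that idea concrete, here is a minimal sketch of such a store, not a recommendation of any particular product. It assumes a local SQLite file, a content hash to detect delta updates, and two toy chunkers that can be swapped for A/B comparison. The table layout and the names `open_store`, `upsert`, `remove`, `CHUNKERS` and `chunk_all` are all hypothetical; a real setup would hand the chunks to LangChain/LlamaIndex and a vector DB rather than return them.

```python
import hashlib
import sqlite3
from typing import Callable, Dict, List

# Hypothetical schema: one row per source document, keyed by a caller-chosen ID.
# `source` is a free-form label such as 'email', 'teams', 'confluence' or 'pdf'.
SCHEMA = """
CREATE TABLE IF NOT EXISTS docs (
    doc_id  TEXT PRIMARY KEY,
    source  TEXT NOT NULL,
    content TEXT NOT NULL,
    sha256  TEXT NOT NULL
)
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the document store; defaults to an in-memory DB."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def upsert(conn: sqlite3.Connection, doc_id: str, source: str, content: str) -> str:
    """Insert or update a document; return 'added', 'updated' or 'unchanged'."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT sha256 FROM docs WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return "unchanged"  # same hash: nothing to re-chunk or re-embed
    conn.execute(
        "INSERT OR REPLACE INTO docs (doc_id, source, content, sha256) "
        "VALUES (?, ?, ?, ?)",
        (doc_id, source, content, digest),
    )
    conn.commit()
    return "added" if row is None else "updated"

def remove(conn: sqlite3.Connection, doc_id: str) -> bool:
    """Delete a document; return True if a row was actually removed."""
    cur = conn.execute("DELETE FROM docs WHERE doc_id = ?", (doc_id,))
    conn.commit()
    return cur.rowcount > 0

def fixed_size(text: str, size: int = 200) -> List[str]:
    """Toy strategy A: split into consecutive windows of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def by_paragraph(text: str) -> List[str]:
    """Toy strategy B: split on blank lines, dropping empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Registry of chunking strategies that can be compared against each other.
CHUNKERS: Dict[str, Callable[[str], List[str]]] = {
    "fixed_size": fixed_size,
    "by_paragraph": by_paragraph,
}

def chunk_all(conn: sqlite3.Connection, strategy: str) -> Dict[str, List[str]]:
    """Chunk every stored document with the named strategy."""
    chunker = CHUNKERS[strategy]
    rows = conn.execute("SELECT doc_id, content FROM docs").fetchall()
    return {doc_id: chunker(content) for doc_id, content in rows}
```

With something like this, a periodic sync job only has to re-embed documents for which `upsert` returns 'added' or 'updated', and each chunking strategy can be indexed separately for evaluation.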

Any suggestions are welcome. I've tried some all-in-one frameworks, but so far my experience has been lackluster. Also, due to compliance constraints my company cannot use cloud-based solutions, so it has to be either open source and deployed locally, or developed in-house.

by omidh on 8/8/2024, 6:04 AM

Have you tried Dify? I found it a good starting point.

https://dify.ai/