I want to build an offline tutor/assistant specifically for 3 high school subjects. It has to be a tiny but useful model because it will run locally on a mobile phone, i.e. completely offline. For each of the 3 subjects, I have the syllabus/curriculum, the textbooks, practice questions, and plenty of old exam papers with answers. I want to train the model so that it is tailored to this academic level. Kids should be able to get their questions explained using the knowledge in the books and within the scope of the syllabus. If possible, they should also be able to practice exam questions when they ask for it. The model could either fetch questions on a topic from the past papers and practice questions, or generate questions similar to those. I would like it to do more, but these are the requirements for the MVP. I am fairly new to this, so I would like to hear opinions on the best approach. What model should I use? How should I train it? Should I use RAG, or a purely generative model? Is there an in-between that could work better? What challenges am I likely to face, and what are the potential workarounds? Any other advice you think is useful is most welcome.
I think Sal Khan's TED talk is the best place to start for building the foundation of an AI-powered educational experience:
https://www.ted.com/talks/sal_khan_how_ai_could_save_not_des...
I would drop the mobile phone constraint; the models will not be good enough. Use foundation model APIs + RAG + prompt engineering + "moderation" to do some of the things Khan shows (like ensuring the system doesn't just hand out the answers). The real vision here is not having all students go through the same syllabus and problems, but making individual, interest-based education a feasible reality.
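To make the RAG + prompt engineering idea concrete, here is a rough sketch of the retrieval and prompt-assembly half. Everything in it is an assumption for illustration: the sample passages, the function names (`retrieve`, `build_prompt`), and the prompt wording are made up, and it uses plain bag-of-words cosine similarity where a real system would more likely use embeddings. It does not call any model; it only builds the text you would send to one.

```python
import math
import re
from collections import Counter

# Hypothetical sample passages; in practice these would be chunks of the textbooks.
PASSAGES = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Newton's second law states that force equals mass times acceleration.",
    "A quadratic equation has the form ax^2 + bx + c = 0 and up to two real roots.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages whose word counts are most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]

def build_prompt(question, passages):
    """Assemble a tutoring prompt: retrieved excerpts plus a 'hints only' rule."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return (
        "You are a tutor. Use only the syllabus excerpts below.\n"
        "Guide the student with hints; do not state the final answer outright.\n"
        f"Excerpts:\n{context}\n"
        f"Student question: {question}\n"
    )

if __name__ == "__main__":
    print(build_prompt("What does Newton's second law say about force?", PASSAGES))
```

The same `retrieve` step could be pointed at a list of past exam questions instead of textbook chunks to cover the "fetch questions on a topic" requirement.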
Get something simple working, then iterate. Problems will come to light, and you will have better context for asking the right questions to work through them.