Hacker News

by justanotheratom on 5/8/2025, 9:00 PM with 1 comment

by justanotheratom on 5/8/2025, 9:03 PM

my question for anyone who knows:

Between SFT, DPO, and RFT:
- When should we use which?
- Can we mix and match? e.g., first SFT, then DPO.

OpenAI: support for Reinforcement Fine-tuning available to verified orgs