Hacker News

by magdykson 11/16/2023, 3:59 PM with 3 comments

by magdykson 11/16/2023, 3:59 PM

A great framework for serving many fine-tuned LLMs in production by quickly swapping adapters for the same base model (e.g., Llama-2-70b).

by abhaymon 11/16/2023, 5:18 PM

Whoa, this looks pretty cool. One question though: is there increased latency when you have multiple adapters on a single base model?

LoRAX: Open-Source Serving for 100s of Fine-Tuned LLMs in Production