A great framework for serving many fine-tuned LLMs in production by quickly swapping adapters on the same base model (e.g., Llama-2-70b)
Whoa, this looks pretty cool. One question though: does latency increase when you have multiple adapters on a single base model?