Very exciting.

2023-5-14 arXiv roundup: FrugalGPT, Inverse CLIP scaling, Embedding all the modalities, on "FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance":

Fourth, they finetune a smaller model based on a bigger model’s outputs and query the smaller model instead. An extra benefit of this approach is that the finetuned model can often use a shorter prompt, since it’s more specialized.
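To make the finetuning step concrete, here is a minimal sketch of how the training data for the smaller model might be collected. It assumes a hypothetical `teacher` callable standing in for the bigger model and a hypothetical `teacher_prefix` holding its long prompt; neither name comes from the paper. The point it illustrates is that the bigger model sees the long prompt, while the stored training example keeps only the short query.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_set(
    queries: Iterable[str],
    teacher: Callable[[str], str],   # hypothetical wrapper around the bigger model
    teacher_prefix: str,             # hypothetical long prompt the bigger model needs
) -> List[Dict[str, str]]:
    """Label each query with the bigger model's output, keeping only the short prompt."""
    records = []
    for query in queries:
        # The bigger model is queried with the full, long prompt.
        label = teacher(teacher_prefix + query)
        # The smaller model is later trained on the bare query only.
        records.append({"prompt": query, "completion": label})
    return records


# Toy stand-in for a real LLM call (assumption for illustration only).
def toy_teacher(prompt: str) -> str:
    return prompt.splitlines()[-1].upper()


records = build_distillation_set(
    ["hello", "world"], toy_teacher, "Long few-shot instructions...\n"
)
# One JSON object per line, a common layout for finetuning data.
jsonl = "\n".join(json.dumps(r) for r in records)
```

The actual finetuning run is left out, since it depends on whichever training stack is in use.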

And lastly, they cascade LLMs, querying a cheap model first and proceeding to progressively more expensive ones until some model has high confidence in its answer.
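The cascade idea can be sketched roughly as follows. This is only an illustration under stated assumptions: each model is a hypothetical callable that returns an answer together with a confidence score in [0, 1], and the `threshold` value is arbitrary. How the confidence is actually estimated is not covered here.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: query -> (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(
    query: str,
    models: List[Tuple[str, Model]],  # ordered from cheapest to most expensive
    threshold: float = 0.9,           # assumed confidence cutoff
) -> Tuple[str, str]:
    """Return (model_name, answer) from the first model that is confident enough."""
    if not models:
        raise ValueError("need at least one model")
    name, answer = "", ""
    for name, model in models:
        answer, confidence = model(query)
        if confidence >= threshold:
            break  # confident enough, so skip the more expensive models
    # If no model was confident, this is the most expensive model's answer.
    return name, answer


# Toy stand-ins for real LLM calls (assumptions for illustration only).
def cheap_model(query: str) -> Tuple[str, float]:
    return ("maybe", 0.4)


def expensive_model(query: str) -> Tuple[str, float]:
    return ("42", 0.95)


used, answer = cascade(
    "What is 6 * 7?",
    [("cheap", cheap_model), ("expensive", expensive_model)],
)
```

With these toy models, the cheap one is not confident enough, so the query falls through to the expensive one. Lowering `threshold` would let the cheap answer through and save the second call.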

By combining these techniques, they can often reach the same accuracy as a single powerful LLM at greatly reduced cost.