Posts

0 commentsยท0 reblogs

We need a model that is efficient and affordable to run.

Developers can run inference on Llama 3.1 405B on their own infra at roughly 50% the cost of using closed models like GPT-4o, for both user-facing and offline inference tasks.