To help organizations scale their AI usage without over-extending their budgets, we've added two new ways to reduce costs on consistent and asynchronous workloads:
- Discounted usage on committed throughput: Customers with a sustained level of tokens per minute (TPM) usage on GPT-4 or GPT-4 Turbo can request access to provisioned throughput to get discounts ranging from 10–50% based on the size of the commitment.
- Reduced costs on asynchronous workloads: Customers can use our new Batch API to run non-urgent workloads asynchronously. Batch API requests are priced at 50% off shared prices, offer much higher rate limits, and return results within 24 hours. This is ideal for use cases like model evaluation, offline classification, summarization, and synthetic data generation.
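As a rough sketch of how a batch job is assembled, the snippet below builds the JSONL input file that the Batch API expects, where each line is one self-contained request. The model name and prompts here are placeholders, and the exact request schema should be checked against the API documentation:

```python
import json

def build_batch_lines(prompts, model="gpt-4-turbo"):
    """Build JSONL lines for a Batch API input file.

    Each line is an independent request with a unique custom_id,
    so results returned asynchronously can be matched back to inputs.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"task-{i}",       # used to pair results with inputs
            "method": "POST",
            "url": "/v1/chat/completions",  # endpoint the batch will call
            "body": {
                "model": model,  # placeholder model name
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines

# Write the file that would then be uploaded for batch processing.
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(build_batch_lines(["Summarize: ...", "Classify: ..."])))
```

The resulting file is uploaded and referenced when creating the batch; since results arrive within the 24-hour window rather than immediately, this pattern fits the offline use cases listed above.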
We plan to keep adding new features focused on enterprise-grade security, administrative controls, and cost management. For more information on these launches, visit our API documentation or get in touch with our team to discuss custom solutions for your enterprise.