A common question we receive is why InstaSD API endpoints don’t incur costs when idle, yet remain ready for on-demand processing without noticeable startup times. Here’s how our system achieves this behind the scenes.
How Our API Endpoints Work
Standalone Workflow Build
When you deploy an API endpoint, we build a standalone version of the ComfyUI workflow you’ve provided.
This workflow is optimized and packaged for efficient loading onto our GPU servers.
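To make the build step concrete, here’s a minimal sketch of what such an artifact might bundle. It assumes a workflow exported from ComfyUI in API format; the build_artifact function and artifact fields are illustrative, not our internal tooling:

```python
import json

# Illustrative sketch of a "standalone build" (not InstaSD's internals):
# the workflow graph exported from ComfyUI in API format, plus the model
# files it references, packaged as one self-contained artifact.

def build_artifact(workflow: dict) -> dict:
    # Collect the checkpoints the graph references so the artifact can be
    # loaded onto any GPU server without extra lookups at request time.
    models = sorted(
        node["inputs"]["ckpt_name"]
        for node in workflow.values()
        if node.get("class_type") == "CheckpointLoaderSimple"
    )
    return {"workflow": workflow, "models": models}

# Tiny stand-in for a ComfyUI API-format export ("Save (API Format)").
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "seed": 42}},
}

artifact = build_artifact(workflow)
print(json.dumps(artifact["models"]))   # ["sd_xl_base_1.0.safetensors"]
```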
Serverless Node Architecture
The API endpoints use a serverless architecture, meaning they are not tied to a single, continuously running GPU.
Instead, the workflow is stored and ready to be loaded onto any available GPU in one of our global data centers when a request is received.
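Conceptually, dispatch looks something like the sketch below. The GpuWorker pool and dispatch function are simplified illustrations of the idea, not our actual scheduler:

```python
import random
from dataclasses import dataclass

# Simplified picture of serverless dispatch: the packaged workflow lives in
# storage, and any idle GPU in any region can be handed a request, so no
# single machine has to stay running between requests.

@dataclass
class GpuWorker:
    region: str
    busy: bool = False

POOL = [GpuWorker("us-east"), GpuWorker("eu-west"), GpuWorker("ap-south")]

def dispatch(request_inputs: dict) -> str:
    idle = [w for w in POOL if not w.busy]
    if not idle:
        raise RuntimeError("no idle GPU; queue the request or scale up")
    worker = random.choice(idle)       # any available GPU will do
    worker.busy = True
    try:
        # The worker pulls the prebuilt artifact, runs it, returns output.
        return f"processed {request_inputs} on {worker.region}"
    finally:
        worker.busy = False            # GPU released; billing stops

print(dispatch({"prompt": "a red fox"}))
```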
On-Demand GPU Allocation
When an API request is made, the system dynamically allocates an available GPU, loads your packaged workflow onto it, processes the inputs, and returns the results to you.
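From the caller’s perspective, this typically means submitting a job and polling for the result. The endpoint paths and field names below are placeholders for illustration, not our documented API:

```python
import time
import requests

# Hypothetical client-side flow: submit inputs, then poll until a GPU has
# been allocated and the job finishes. URL and JSON shapes are made up.

BASE = "https://api.example.com/v1"   # placeholder URL

def run_workflow(inputs: dict) -> dict:
    job = requests.post(f"{BASE}/jobs", json={"inputs": inputs}).json()
    while True:
        status = requests.get(f"{BASE}/jobs/{job['id']}").json()
        if status["state"] in ("succeeded", "failed"):
            return status
        time.sleep(1)   # job waits in queue while a GPU is allocated

result = run_workflow({"prompt": "a watercolor landscape"})
print(result["state"])
```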
Efficient Resource Utilization
Because GPUs are only in use while a request is actively being processed, you incur no charges during idle periods.
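A back-of-the-envelope comparison shows why this matters. The rates and volumes below are made-up placeholders, purely to illustrate the billing model:

```python
# Dedicated GPUs bill 24/7; a serverless endpoint bills only for the
# seconds a request is actually being processed. All numbers are
# illustrative assumptions, not real prices.

RATE_PER_SECOND = 0.0005          # assumed GPU price, $/s
requests_per_day = 200
seconds_per_request = 30

serverless_daily = requests_per_day * seconds_per_request * RATE_PER_SECOND
dedicated_daily = 24 * 60 * 60 * RATE_PER_SECOND

print(f"serverless: ${serverless_daily:.2f}/day")   # $3.00
print(f"dedicated:  ${dedicated_daily:.2f}/day")    # $43.20
```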
Why This Matters
This design ensures that:
You’re only charged for what you use. GPUs are not allocated during idle time, so you don’t pay for inactivity.
On-demand processing is fast and efficient. The system minimizes delays by warming up instances as request frequency increases (sketched below, after this list).
Scalability is seamless. The serverless architecture allows us to dynamically scale API requests across multiple data centers to handle demand.
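As a rough illustration of the warm-up behavior mentioned above, here’s a simplified scaling policy. The formula and headroom choice are assumptions for the sketch, not our production autoscaler:

```python
# Keep roughly enough warm workers to cover recent request throughput,
# so new requests usually skip a cold start. Purely illustrative policy.

def warm_target(recent_requests_per_min: float,
                avg_seconds_per_request: float = 30.0) -> int:
    # Average number of requests in flight at once, plus one of headroom.
    concurrent = recent_requests_per_min * avg_seconds_per_request / 60.0
    return max(1, round(concurrent + 1))

for rpm in (2, 10, 60):
    print(f"{rpm:>3} req/min -> keep {warm_target(rpm)} warm workers")
```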
By leveraging serverless nodes and on-demand GPU allocation, InstaSD provides an efficient, cost-effective, and scalable solution for deploying your AI workflows as APIs.