Vault API Rate Limits

API rate limits are a common way to maintain a high quality of service: they prevent servers from becoming overloaded and the web service itself from becoming unusable. Web services are a fundamental resource for any developer application or integration, and we enforce rate limits to ensure that each application or integration gets its fair share of this resource. Learn more about API rate limits in Vault Help.

What Are Rate Limits?

Rate limits constrain the number of API calls a user can make during a given time period. Once you reach your API quota, the server will throttle, or delay, your API requests until the next window.

Calls to /api/{version}/auth are calculated separately. After reaching the Auth API Burst Limit, any further requests will fail until the next window.

How Does Vault Calculate Limits?

Vault enforces multiple types of rate limits:

Burst Limit is the number of API calls that your Vault can receive within a fixed 5-minute period. When you reach the burst limit, the server delays responses for the remainder of the burst-limit period. To determine the length of delay for a throttled response, check the X-VaultAPI-ResponseDelay response header or the API Usage Logs.
Auth API Burst Limit is the number of calls that your Vault can make to /api/{version}/auth in a one (1) minute period. This limit is tracked by the username and vaultDNS parameters and does not apply to SAML/SSO or OAuth authentication. When you reach 50% of the burst limit, the server delays responses for the remainder of the burst-limit period. When you reach the full Auth API Burst Limit, any additional auth requests fail until the next window. To determine the burst limit for your Vault or the length of delay for a throttled response, check the response headers or the API Usage Logs.
Job Status API Rate Limit applies to calls that your Vault makes to /api/{version}/services/jobs/{job_id} and allows one (1) call every 10 seconds. When you exceed this limit, Vault returns the API_LIMIT_EXCEEDED error.

For example, a Vault might allow 2,000 API requests within a 5-minute window. Between 4:00 and 4:03, your Vault receives 2,000 requests. Starting with request 2,001 at 4:04, the server delays its responses to all requests until the next window begins at 4:05.
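The example above can be traced with a small simulation. This is only a toy model of a fixed-window counter, written to illustrate the arithmetic; it is not Vault's actual implementation, and the class name, the 2,000 limit, and the use of seconds after 4:00 as timestamps are all assumptions made for the sketch.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # fixed 5-minute burst window


@dataclass
class FixedWindowCounter:
    """Toy model of a fixed-window burst limit (illustrative only)."""

    limit: int
    window_start: int = 0
    count: int = 0

    def record(self, now: int) -> bool:
        """Count a request arriving at `now` (seconds); return True if it would be throttled."""
        current_window = now - (now % WINDOW_SECONDS)
        if current_window != self.window_start:
            # A new window has begun, so the counter starts over.
            self.window_start = current_window
            self.count = 0
        self.count += 1
        return self.count > self.limit


counter = FixedWindowCounter(limit=2000)

# 2,000 requests spread between 4:00 and 4:03 (0 to 179 seconds after 4:00).
early = [counter.record(now=i * 180 // 2000) for i in range(2000)]

throttled_early = any(early)                # none of the first 2,000 is throttled
throttled_at_404 = counter.record(now=240)  # request 2,001 at 4:04
throttled_at_405 = counter.record(now=300)  # first request of the next window
```

In this model, request 2,001 is the first one flagged, and the flag clears as soon as the 4:05 window opens.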

API Rate Limit Headers

Vault API returns rate limit headers to help you monitor how many API calls you have remaining and whether your responses are being throttled.

X-VaultAPI-BurstLimit: Indicates the maximum number of calls allowed in a 5-minute burst window. For example, 2000. (Included in v19.2+)
X-VaultAPI-BurstLimitRemaining: Indicates the number of API calls remaining for the current 5-minute burst window. For example, 1945. (Included in v14.0+)
X-VaultAPI-ResponseDelay: Indicates the delay, in milliseconds, of a throttled response. Only included for delayed responses. For example, 500ms. See How Does Vault Calculate Limits? (Included in v14.0+)
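As a rough sketch, a client might collect these three headers into one object after each response. The function and class names below are hypothetical, and the parser tolerates an optional "ms" suffix on the delay value only because the exact format of that value is an assumption here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    burst_limit: Optional[int]        # X-VaultAPI-BurstLimit
    burst_remaining: Optional[int]    # X-VaultAPI-BurstLimitRemaining
    response_delay_ms: Optional[int]  # X-VaultAPI-ResponseDelay (throttled responses only)


def _parse_int(value: Optional[str]) -> Optional[int]:
    """Parse an integer header value, ignoring a trailing 'ms' if present."""
    if value is None:
        return None
    text = value.strip()
    if text.lower().endswith("ms"):
        text = text[:-2].strip()
    try:
        return int(text)
    except ValueError:
        return None


def read_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract the rate limit headers from a response header mapping."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        burst_limit=_parse_int(lowered.get("x-vaultapi-burstlimit")),
        burst_remaining=_parse_int(lowered.get("x-vaultapi-burstlimitremaining")),
        response_delay_ms=_parse_int(lowered.get("x-vaultapi-responsedelay")),
    )


# Sample values taken from the header descriptions above.
normal = read_rate_limit_headers(
    {"X-VaultAPI-BurstLimit": "2000", "X-VaultAPI-BurstLimitRemaining": "1945"}
)
throttled = read_rate_limit_headers({"x-vaultapi-responsedelay": "500ms"})
```

A missing header is reported as None, so a response without X-VaultAPI-ResponseDelay can be read as "not throttled".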

These headers do not apply to the Job Status endpoint, which returns the API_LIMIT_EXCEEDED error when it is requested more than once in 10 seconds.
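One way to stay within the Job Status limit is to enforce a minimum wait between polls on the client side. The sketch below assumes a caller-supplied `fetch` function that performs the actual request to /api/{version}/services/jobs/{job_id} and an `is_done` predicate that recognizes a finished job; both are hypothetical, as are the status strings in the demo, which stand in for whatever your job returns.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

MIN_POLL_INTERVAL_SECONDS = 10.0  # one Job Status call every 10 seconds


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = MIN_POLL_INTERVAL_SECONDS,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until `is_done` accepts the result, waiting `interval` seconds between calls."""
    if interval < MIN_POLL_INTERVAL_SECONDS:
        raise ValueError("polling faster than once every 10 seconds would exceed the limit")
    for attempt in range(max_polls):
        if attempt > 0:
            # Wait the full interval after the previous call returned.
            sleep(interval)
        result = fetch()
        if is_done(result):
            return result
    raise TimeoutError(f"job did not finish after {max_polls} polls")


# Demo with a fake job and a recording sleep function, so nothing actually waits.
fake_statuses = iter(["pending", "pending", "finished"])  # placeholder values
waits: List[float] = []
final = poll_until(
    fetch=lambda: next(fake_statuses),
    is_done=lambda status: status == "finished",
    sleep=waits.append,
)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies.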

Authentication API Rate Limit Headers

As of v20.1, every response from /api/{version}/auth includes two rate limit headers: one showing the total limit allowed for your Vault and one showing how many /api/{version}/auth calls you have remaining. These calls also count towards your burst and daily limits.

X-VaultAPI-BurstLimitRemaining: Indicates the number of API calls remaining for the current 1-minute burst window. For example, 19.
X-VaultAPI-BurstLimit: Indicates the maximum number of calls allowed in a 1-minute burst window. For example, 20.
X-VaultAPI-ResponseDelay: Indicates the length of delay for a throttled response in milliseconds. For example, 2000.
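Combined with the 50% rule described under How Does Vault Calculate Limits?, these headers let a client estimate where it stands in the current auth window. The following is a minimal sketch under two assumptions that the text above does not confirm: that the 50% threshold is measured as calls already used, and that a remaining count of 0 means the next auth call will fail. The function name and the state labels are hypothetical.

```python
from typing import Mapping


def auth_window_state(headers: Mapping[str, str]) -> str:
    """Classify the current auth window as 'ok', 'delayed', or 'exhausted'."""
    # Expects the exact header names shown above; many HTTP clients
    # expose a case-insensitive mapping, which also works here.
    limit = int(headers["X-VaultAPI-BurstLimit"])
    remaining = int(headers["X-VaultAPI-BurstLimitRemaining"])
    used = limit - remaining
    if remaining <= 0:
        return "exhausted"  # assumption: further auth calls fail until the next window
    if used * 2 >= limit:
        return "delayed"    # assumption: 50% of the limit has been used
    return "ok"


fresh = auth_window_state(
    {"X-VaultAPI-BurstLimit": "20", "X-VaultAPI-BurstLimitRemaining": "19"}
)
halfway = auth_window_state(
    {"X-VaultAPI-BurstLimit": "20", "X-VaultAPI-BurstLimitRemaining": "10"}
)
spent = auth_window_state(
    {"X-VaultAPI-BurstLimit": "20", "X-VaultAPI-BurstLimitRemaining": "0"}
)
```

A client could use the "delayed" state as a signal to stop issuing new auth calls and reuse an existing session instead.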

Developing with Rate Limits

Here are some best practices for reducing the number of API requests:

Avoid unnecessary auth calls. Reuse your session ID instead of authenticating again for each request. A session ID with a timeout of 20 minutes expires only if it is not used within 20 minutes after the last request finishes executing.
Cache configuration data. Configuration data does not change often. Retrieve it once and store it locally in a database like SQLite or serialized in a file.
Optimize your code to eliminate unnecessary API requests. Are you retrieving data that isn’t being used in your application? Are you updating data that hasn’t changed? Eliminate these calls.
Regulate the API request rate. If you are frequently approaching or reaching the API rate limit, consider implementing a throttling process in your application to distribute your requests more uniformly over time. For example, monitor your request rate using the response headers described above, and throttle requests when your rate reaches a certain threshold.
Use the bulk/batch APIs. You can drastically reduce your API request count if you take advantage of the many bulk APIs available. For example, a single API request can create 500 object records using the Create Multiple Object Records API. Find opportunities to replace single API calls with their bulk counterparts. Note that bulk APIs are not currently available on all resources.
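Two of the practices above, regulating the request rate and batching records, can be sketched as small helpers. Everything here is illustrative: the function names, the 20% threshold, and the `seconds_left` argument are assumptions. In particular, the headers described earlier do not report when the current window ends, so `seconds_left` has to be the caller's own estimate.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(records: Sequence[T], size: int = 500) -> Iterator[List[T]]:
    """Split records into batches, for example 500 per bulk create request."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def pacing_delay(
    limit: int, remaining: int, seconds_left: float, threshold: float = 0.2
) -> float:
    """Return how many seconds to wait before sending the next request.

    `limit` and `remaining` come from X-VaultAPI-BurstLimit and
    X-VaultAPI-BurstLimitRemaining; `seconds_left` is the caller's
    estimate of the time left in the current burst window.
    """
    seconds_left = max(seconds_left, 0.0)
    if remaining > threshold * limit:
        return 0.0           # plenty of budget left, no pacing needed
    if remaining <= 0:
        return seconds_left  # budget spent, wait for the next window
    # Spread the remaining calls evenly over the rest of the window.
    return seconds_left / remaining


batch_sizes = [len(batch) for batch in chunked(list(range(1200)))]
no_wait = pacing_delay(limit=2000, remaining=1945, seconds_left=240.0)
paced_wait = pacing_delay(limit=2000, remaining=200, seconds_left=100.0)
full_wait = pacing_delay(limit=2000, remaining=0, seconds_left=60.0)
```

With these helpers, 1,200 records become three bulk requests instead of 1,200 single calls, and the client slows down only once fewer than 20% of its calls remain.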