Direct Data API Overview

Direct Data API is a new class of API that provides high-speed read-only data access to Vault. Direct Data API is a reliable, easy-to-use, timely, and consistent API for extracting Vault data.

It is designed for organizations that wish to replicate large amounts of Vault data to an external database, data warehouse, or data lake. Common use cases include:

  • Analytics: You can run analytics and business intelligence tools on the extracted data residing in the data warehouse.
  • Integration Hub: You can collect data from Vault and other systems in one place and perform data analysis in the external system.
  • Artificial Intelligence: With the rise of AI and large language models (LLMs), you can choose to train your models with Vault data to meet custom needs.

Direct Data API is not designed for real-time application integration.

What Does Direct Data API Provide?

Direct Data API provides the following file types:

  • Full: Every 24 hours, Direct Data API generates a Full file that contains a complete set of data for a specific Vault, from the time the Vault was created to the current date. You can use this file for the initial data load.
  • Incremental: Every 15 minutes, Direct Data API generates an Incremental file that contains the data that changed in the Vault during that 15-minute interval. After your initial data load, you can use this file to capture data changes and additions.
  • Log: Every 24 hours, Direct Data API generates a Log file that contains audit log data for a single day. You can use this file to track detailed changes for object records and documents, review system configuration changes, and monitor login behaviors.

Learn more about each Direct Data file type in Direct Data File Structure or in our video walkthroughs.

Direct Data API extracts the following data from your Vault:

  • Vault Objects: Includes both custom and standard objects from your Vault.
  • Documents: Includes all document metadata, document types, document relationships, document fields, and references for downloading source document, rendition, and text files. This includes archived documents but excludes annotation metadata, document roles, and the source document, rendition, and text files themselves.
  • Picklists: Includes all picklists, excluding picklists that are not referenced by objects or documents in the extract.
  • Workflows: Includes all workflow instances, items, user tasks, task items, and legacy workflow information. This includes active and inactive workflows for both objects and documents but excludes participant group details.
  • Audit Logs: Includes one extract for each log type: System, Document, Object, and Login.

Direct Data API is not configurable, and all of the above data is always made available. You can use or ignore the data in the files.

There are several benefits of using Direct Data API to extract data from your Vault:

A Direct Data file is produced in a fixed, well-defined, and easy-to-understand format. This simplifies integrations, as the user doesn’t need to know the data model of the components or make multiple calls to different endpoints to build the dataset.

However, Direct Data API does include the data model for all the objects and documents in a single metadata.csv file so that tables can be created in the external system based on the data provided.
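As a rough illustration of creating tables from metadata.csv, the sketch below groups metadata rows by extract and emits one CREATE TABLE statement per extract. The column names (extract, column_name, type), the sample rows, and the type mapping are assumptions made for this example; check them against the actual file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real metadata.csv may use different
# column names and carry additional columns.
SAMPLE_METADATA = """extract,column_name,type
product__v,id,String
product__v,name__v,String
product__v,created_date__v,DateTime
"""

# Assumed mapping from metadata types to SQL types in the target warehouse.
SQL_TYPES = {
    "String": "VARCHAR",
    "DateTime": "TIMESTAMP",
    "Number": "NUMERIC",
    "Boolean": "BOOLEAN",
}


def build_create_statements(metadata_text):
    """Return {table_name: CREATE TABLE statement} built from metadata rows."""
    columns = defaultdict(list)
    for row in csv.DictReader(io.StringIO(metadata_text)):
        # Fall back to VARCHAR for any type the mapping does not cover.
        sql_type = SQL_TYPES.get(row["type"], "VARCHAR")
        columns[row["extract"]].append(f'"{row["column_name"]}" {sql_type}')
    return {
        table: f'CREATE TABLE "{table}" ({", ".join(cols)});'
        for table, cols in columns.items()
    }


if __name__ == "__main__":
    for statement in build_create_statements(SAMPLE_METADATA).values():
        print(statement)
```

Generating the table definitions from the metadata, rather than hard-coding them, lets the external schema follow changes to the Vault data model.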

Direct Data API continuously collects and stages the data in the background and publishes it as a single file for the interval specified. This is significantly faster than extracting the data via traditional APIs, which may require multiple calls based on the number of records being extracted.

Files are always published at fixed times and at a regular cadence. Direct Data API provides Incremental files every 15 minutes, tracking changes in a 15-minute interval, which makes it possible to update the data warehouse on a more timely basis.
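The 15-minute cadence can be reasoned about with simple interval arithmetic. The sketch below assumes that intervals are aligned to quarter-hour boundaries in UTC, which the text above does not state; it finds the interval containing a given change and counts the intervals in a day.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(minutes=15)


def incremental_window(ts):
    """Return (start, stop) of the 15-minute interval that contains ts.

    Assumes quarter-hour aligned boundaries in UTC (an assumption for
    this sketch, not a documented guarantee).
    """
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
    return start, start + INTERVAL


def intervals_per_day():
    """Number of 15-minute intervals in a 24-hour period."""
    return timedelta(hours=24) // INTERVAL


if __name__ == "__main__":
    change_time = datetime(2024, 1, 1, 10, 37, 12, tzinfo=timezone.utc)
    print(incremental_window(change_time))
    print(intervals_per_day())  # 96
```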

Direct Data API provides a transactionally consistent view of data across Vault at a given time, called stop_time.
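Because each file reflects the Vault as of its stop_time, a loader can order files by that value: start from the most recent Full file, then apply every Incremental file with a later stop_time in ascending order. The following is a minimal sketch of that ordering; the DirectDataFile record, its field names, and the file names are hypothetical and only stand in for however the file listing is represented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DirectDataFile:
    """Hypothetical record describing one available Direct Data file."""
    name: str
    kind: str            # "full" or "incremental"
    stop_time: datetime  # point in time the file's data is consistent with


def plan_load(files):
    """Return the files to apply, in order: latest Full, then later Incrementals."""
    fulls = [f for f in files if f.kind == "full"]
    if not fulls:
        raise ValueError("no Full file available for the initial load")
    base = max(fulls, key=lambda f: f.stop_time)
    # Only Incrementals that end after the Full file's stop_time add new changes.
    increments = sorted(
        (f for f in files if f.kind == "incremental" and f.stop_time > base.stop_time),
        key=lambda f: f.stop_time,
    )
    return [base] + increments


if __name__ == "__main__":
    def utc(day, hour, minute=0):
        return datetime(2024, 1, day, hour, minute, tzinfo=timezone.utc)

    available = [
        DirectDataFile("inc_c", "incremental", utc(2, 0, 30)),
        DirectDataFile("full_b", "full", utc(2, 0)),
        DirectDataFile("inc_a", "incremental", utc(1, 23, 45)),
        DirectDataFile("inc_b", "incremental", utc(2, 0, 15)),
        DirectDataFile("full_a", "full", utc(1, 0)),
    ]
    print([f.name for f in plan_load(available)])
```

This ordering assumes that Incremental intervals line up with the Full file's stop_time, so no change falls between the two.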