Open-Source Accelerators
Our Vault Developer Support team has built sample accelerators which you can use as-is with your Vault or as a starting point for building a custom integration. If you are building your own accelerator, refer to the best practices for building scalable accelerators below.
Each open-source accelerator performs the following fundamental processes:
- Downloads zipped Direct Data files and uploads to object storage
- Extracts data from Direct Data files
- Optionally converts CSV files to Parquet
- Loads data into the target system
- Optionally extracts document source content and text from Vault and uploads to object storage
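The extraction step above can be sketched with the Python standard library alone. This is a hypothetical example, assuming a Direct Data extract is a gzipped tarball containing CSV files; the real file layout and naming are defined by the Direct Data API, and a production accelerator would download the archive from Vault rather than build one locally:

```python
import csv
import io
import tarfile
from pathlib import Path
from tempfile import TemporaryDirectory

def extract_direct_data_file(archive_path: Path, dest_dir: Path) -> list[Path]:
    """Unpack a gzipped Direct Data extract and return the CSV files inside."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(dest_dir.rglob("*.csv"))

with TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Build a stand-in archive; a real accelerator would fetch this from Vault
    # and upload it to object storage before extracting.
    archive = tmp / "extract.tar.gz"
    payload = "id,name\n1,Study A\n2,Study B\n".encode()
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo("objects/study.csv")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    csv_files = extract_direct_data_file(archive, tmp / "out")
    with open(csv_files[0], newline="") as f:
        records = list(csv.DictReader(f))
    print([r["name"] for r in records])  # ['Study A', 'Study B']
```

From here, each extracted CSV can be converted to Parquet or loaded directly into the target system.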
The open-source accelerators listed below load data from Vault into an object storage system, from which it is then loaded into the target data system. Additionally, if your organization relies on data visualization, each target system can connect to Power BI.
- Vault → AWS S3 → Snowflake
- Vault → AWS S3 → Databricks
- Vault → AWS S3 → Redshift
- Vault → Azure Blob Storage → Azure SQL Database
- Vault → Azure Blob Storage → Microsoft Fabric Warehouse
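Each pipeline above moves extracts independently, so several extracts can be ingested concurrently. A minimal sketch using `concurrent.futures`, with invented extract names and a placeholder load function standing in for the real download-unpack-load work:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical extract keys; a real accelerator would list these from
# object storage (e.g. an S3 prefix or an Azure Blob container).
extracts = [f"direct-data/extract-{n}.tar.gz" for n in range(4)]

def load_extract(key: str) -> str:
    # Placeholder for the real work: download, unpack, and load into the target.
    return f"loaded {key}"

# One worker per extract; each worker would use its own session to the target.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(load_extract, extracts))

print(results)
```

Because `pool.map` preserves input order, results can still be reconciled against the original extract list after a concurrent run.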
You can access the source code for these accelerators from our GitHub repository.
Best Practices for Building a Scalable Accelerator
Building a scalable accelerator that leverages the Direct Data API requires careful consideration of several key factors. Key strategies for maximizing throughput and minimizing processing time include:
- Parallel Loading: Optimize data ingestion by loading multiple Direct Data extracts simultaneously using separate sessions. This approach leverages parallel processing capabilities to significantly reduce overall load times.
- Memory Allocation: Ensure sufficient memory allocation for COPY operations and other data processing tasks. Adequate memory prevents bottlenecks and ensures smooth data transfer.
- Incremental Loads with Staging Tables: For incremental updates, use staging tables to manage changes efficiently. Separate "Delete" and "Update" tables within the staging area allow for optimized processing; apply deletes before updates. Consider using temporary tables for complex data transformations before final insertion into the target database.
- Performance Monitoring: Continuously monitor load times and other key performance metrics. Tracking these metrics helps you identify bottlenecks and confirm that load performance meets the expectations of your target system.