Extract Naming & Attributes

Extracts contain the data for Vault components. Log Direct Data files only contain audit log extracts. Full and Incremental files may contain extracts for the following components:

Documents
Document relationships
Objects
Picklists
Workflows

Direct Data API names each extract's CSV file according to its extract_name. In addition, if a user deletes object records or document versions, Direct Data API tracks the deletions in a separate file, named by appending _deletes.csv to the extract name. The CSV files include a column referencing the record ID of related objects (which can be identified using the metadata.csv). In all extracts, standard columns appear first, while the remaining columns are ordered alphabetically. While this order is predictable, schema changes in your Vault may alter column positioning within extracts. The columns and content available in each extract vary by component and are described in the sections below.
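Because column positions can shift when the Vault schema changes, a loader is safer reading columns by header name than by index. The following is a minimal Python sketch of that idea, not an official client. It assumes comma-delimited CSV text with a header row; the helper names (deletes_file_name, read_rows_by_name) and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List


def deletes_file_name(extract_name: str) -> str:
    """Build the companion deletes file name by appending _deletes.csv to the extract name."""
    return f"{extract_name}_deletes.csv"


def read_rows_by_name(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into dicts keyed by header, so column position never matters."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Two hypothetical snapshots of the same extract; a new column shifts name__v to the right.
before = "id,modified_date__v,name__v\nV001,2023-12-06T17:57:05.000Z,Study 1\n"
after = (
    "id,modified_date__v,masking__v,name__v\n"
    "V001,2023-12-06T17:57:05.000Z,open_label__v,Study 1\n"
)

name_before = read_rows_by_name(before)[0]["name__v"]
name_after = read_rows_by_name(after)[0]["name__v"]
```

Both lookups return the same value even though name__v moved from the third to the fourth position.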

Document Extract

Document version data is available in the document_version__sys.csv file. Deleted document versions are tracked in a separate document_version__sys_deletes.csv file.

All document extracts have a set of standard fields in addition to all the defined document fields in Vault. Extracts also include all queryable document fields.

The following standard columns are available in the document version extract:

Column Name	Description
`id`	The document version ID, in the format `{doc_id}_{major_version_number}_{minor_version_number}`. For example, `101_0_1` represents version 0.1 of document ID 101. This value is the same as `version_id`.
`modified_date__v`	The date the document version was last modified.
`doc_id`	The document `id` field value.
`version_id`	The document version ID, in the format `{doc_id}_{major_version_number}_{minor_version_number}`. For example, `101_0_1` represents version 0.1 of document ID 101. This value is the same as `id`.
`major_version_number`	The major version of the document.
`minor_version_number`	The minor version of the document.
`type`	The document type.
`subtype`	The document subtype.
`classification`	The document classification.
`source_file`	The Vault API request to download the source file using the Download Document Version File endpoint.
`rendition_file`	The Vault API request to download the rendition file using the Download Document Version Rendition File endpoint.
`text_file`	The Vault API request to export the plain text of the source file using the Retrieve Document Version Text endpoint.
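The version ID format in the table above can be split back into its parts. This is a minimal Python sketch that assumes all three parts are integers, as in the `101_0_1` example; the DocVersion type and the function name are hypothetical.

```python
from typing import NamedTuple


class DocVersion(NamedTuple):
    doc_id: int
    major: int
    minor: int


def parse_version_id(version_id: str) -> DocVersion:
    """Split a {doc_id}_{major}_{minor} string into its three integer parts."""
    doc_id, major, minor = version_id.split("_")
    return DocVersion(int(doc_id), int(major), int(minor))


parsed = parse_version_id("101_0_1")
version_label = f"{parsed.major}.{parsed.minor}"  # human-readable version, here "0.1"
```

A value that does not contain exactly three underscore-separated parts raises ValueError, which surfaces malformed IDs early.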

Accessing Source Content

Direct Data API includes document metadata in the document_version__sys extract. This file includes additional attributes such as source_file, rendition_file, and text_file, which have generated URLs to download the content for that particular version of a document.

If your organization needs to make the source content for all documents available for further processing or data mining, use the Export Document Versions endpoint to export documents to your Vault’s file staging server in bulk. This endpoint allows up to 10,000 document versions per request.
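Given the 10,000 document versions per request limit, a longer list of versions needs to be split into batches before calling the endpoint. The sketch below shows only the batching step in Python. The helper name and the sample IDs are hypothetical, and the request itself is deliberately left out because its payload is not described here.

```python
from typing import Iterator, List, Sequence

MAX_VERSIONS_PER_REQUEST = 10_000  # limit stated for the Export Document Versions endpoint


def batches(items: Sequence[str], size: int = MAX_VERSIONS_PER_REQUEST) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical list of 25,000 version IDs taken from document_version__sys.csv.
version_ids: List[str] = [f"{doc_id}_0_1" for doc_id in range(25_000)]
batch_sizes = [len(batch) for batch in batches(version_ids)]
```

Each batch can then be submitted as its own export request.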

Document Relationships Extract

Document relationship data is available in the document_relationship__sys.csv file. If there are deleted document relationships, they are tracked in a separate document_relationship__sys_deletes.csv.

The following standard columns are available in the document relationships extract:

Column Name	Description
`id`	The document relationship ID.
`modified_date__v`	The date the document relationship was last modified.
`modified_by__v`	The ID of the user who last modified the document relationship.
`source_doc_id__v`	The ID of the source document on which the relationship originates.
`source_version_id`	The version ID of the source document, in the format `{source_doc_id__v}_{source_major_version__v}_{source_minor_version__v}`. For example, `101_0_1` represents version 0.1 of document ID 101.
`source_major_version__v`	The major version of the source document. If the document relationship is not version-specific, this value is empty.
`source_minor_version__v`	The minor version of the source document. If the document relationship is not version-specific, this value is empty.
`target_doc_id__v`	The ID of the target document to which the relationship points.
`target_version_id`	The version ID of the target document, in the format `{target_doc_id__v}_{target_major_version__v}_{target_minor_version__v}`. For example, `101_0_1` represents version 0.1 of document ID 101.
`target_major_version__v`	The major version of the target document. If the document relationship is not version-specific, this value is empty.
`target_minor_version__v`	The minor version of the target document. If the document relationship is not version-specific, this value is empty.
`relationship_type__v`	The type of relationship between the source and target document.
`source_vault_id__v`	The ID of the source Vault for a Crosslink document relationship.
`created_date__v`	The date the document relationship was created.
`created_by__v`	The ID of the user who created the document relationship.
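Since the major and minor version columns are empty when a relationship is not version-specific, a consumer can detect that case before trying to join on a version. This is a minimal Python sketch; the function name and the sample rows are hypothetical, and it assumes empty values arrive as empty strings in a comma-delimited file.

```python
import csv
import io


def is_version_specific(row: dict, side: str) -> bool:
    """True when both version columns for `side` ('source' or 'target') are non-empty."""
    major = row.get(f"{side}_major_version__v", "")
    minor = row.get(f"{side}_minor_version__v", "")
    return major != "" and minor != ""


# Hypothetical rows: the first binds specific versions, the second does not.
SAMPLE = (
    "id,source_doc_id__v,source_major_version__v,source_minor_version__v,"
    "target_doc_id__v,target_major_version__v,target_minor_version__v\n"
    "1,101,0,1,202,1,0\n"
    "2,101,,,202,,\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
flags = [(is_version_specific(r, "source"), is_version_specific(r, "target")) for r in rows]
```

Comparing against the empty string rather than testing truthiness of a number keeps a version value of `0` from being mistaken for a missing value.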

Vault Objects Extract

Each object has its own extract file. Extracts are named according to their object name. For example, the extract CSV file for the Activity object is named activity__v.csv. If there are deleted records for an object, they are tracked in a separate {objectname}_deletes.csv.

Both custom and standard objects from your Vault are included. All objects visible on the Admin > Configuration page of your Vault are available for extraction.

All object extracts have a set of standard fields in addition to all of the defined fields included on the object, including inactive fields. The following standard columns are available in Vault object extracts:

Column Name	Description
`id`	The object record ID.
`modified_date__v`	The date the object record was last modified.
`name__v`	The name of the object record.
`status__v`	The status of the object record.
`created_by__v`	The ID of the user who created the object record.
`created_date__v`	The date the object record was created.
`modified_by__v`	The ID of the user who last modified the object record.
`global_id__sys`	The global ID of the object record.
`link__sys`	The object record ID across all Vaults where the record exists.
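An object extract and its deletes file can be applied together to a local copy of the object's records. The following Python sketch keeps records in a dictionary keyed by `id`. It rests on two assumptions that are not confirmed here: that the deletes file carries an `id` column, and that deletes are applied after the rows from the main extract. The function and sample data are hypothetical.

```python
import csv
import io
from typing import Dict, Optional

Table = Dict[str, Dict[str, str]]


def apply_extract(table: Table, extract_csv: str, deletes_csv: Optional[str] = None) -> None:
    """Upsert rows from an object extract, then remove IDs listed in its deletes file."""
    for row in csv.DictReader(io.StringIO(extract_csv)):
        table[row["id"]] = row
    if deletes_csv is not None:
        for row in csv.DictReader(io.StringIO(deletes_csv)):
            # Assumption: the deletes file identifies records by an `id` column.
            table.pop(row["id"], None)


activity: Table = {}
# First load: two records.
apply_extract(activity, "id,name__v\nA1,First\nA2,Second\n")
# Later load: A2 is modified and A1 appears in the deletes file.
apply_extract(activity, "id,name__v\nA2,Second (edited)\n", "id\nA1\n")
```

Using `pop` with a default means a delete for a record the local copy never saw is ignored rather than raising an error.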

Picklist Extract

All picklist data is available in the picklist__sys.csv file. Picklist values cannot be deleted, only inactivated. When a picklist value name is changed, the separate picklist__sys_deletes.csv file tracks the original value, while the picklist__sys.csv file shows the new value. Learn more about picklist deletion in Vault Help.

The picklist extract does not include picklists that are not referenced by any object or document.

The following standard columns are available in picklist extracts:

Column Name	Description
`modified_date__v`	The date the picklist was last modified.
`object`	The name of the object on which the picklist is defined.
`object_field`	The name of the object picklist field.
`picklist_value_name`	The picklist value name.
`picklist_value_label`	The picklist value label.
`status__v`	The status of the picklist value.

Picklist References in Object & Document Extracts

Object or document fields that reference picklist values are classified with a type of Picklist or MultiPicklist in the metadata.csv file.

The picklist extract allows you to retrieve the picklist value labels corresponding to the picklist value names referenced in other extracts. Picklist extracts should be handled in the following ways:

Join or denormalize the data using a three-part key (object, object_field, picklist_value_name). The extract and extract field metadata provide the object and object_field values, respectively.
Process incremental changes to picklists, including updates and deletes, based on the three-part key.

Below is an example using the masking__v picklist field on the study__v object from a Safety Vault:

id	modified_date__v	masking__v	name__v	organization__v	study_name__v
V17000000001001	2023-12-06T17:57:05.000Z	open_label__v	Study 1	V0Z000000000201	Study for Evaluation of Study Product

The corresponding entry in the metadata.csv for the picklist field would be:

extract	extract_label	column_name	column_label	type	length	related_extract
Object.study__v	Study	masking__v	Masking	Picklist	46	Picklist.picklist__sys
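Picklist fields can be found programmatically by filtering metadata.csv on the type column. This is a minimal Python sketch using the row above plus one hypothetical non-picklist row. It assumes a comma-delimited file and assumes that the object name is the part of the extract value after the first dot; the function name is hypothetical.

```python
import csv
import io
from typing import List, Tuple

METADATA_CSV = (
    "extract,extract_label,column_name,column_label,type,length,related_extract\n"
    "Object.study__v,Study,masking__v,Masking,Picklist,46,Picklist.picklist__sys\n"
    "Object.study__v,Study,name__v,Name,String,128,\n"  # hypothetical non-picklist row
)


def picklist_columns(metadata_csv: str) -> List[Tuple[str, str]]:
    """Return (object, object_field) pairs for every Picklist or MultiPicklist column."""
    result = []
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        if row["type"] in ("Picklist", "MultiPicklist"):
            # Assumption: "Object.study__v" maps to the object name "study__v".
            object_name = row["extract"].split(".", 1)[-1]
            result.append((object_name, row["column_name"]))
    return result


columns = picklist_columns(METADATA_CSV)
```

The resulting pairs supply the first two parts of the three-part picklist key.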

Below is an example of the picklist__sys.csv row for the open_label__v value of the masking__v picklist field on the study__v object:

modified_date__v	object	object_field	picklist_value_name	picklist_value_label	status__v
2023-12-14T00:06:28.867Z	study__v	masking__v	open_label__v	Open Label	active__v
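The three-part key lookup can be sketched in Python using the example rows above. The sketch assumes comma-delimited files, and the function and variable names are hypothetical. It handles single-value Picklist fields only, since the way MultiPicklist values are delimited is not described here.

```python
import csv
import io
from typing import Dict, Tuple

PICKLIST_CSV = (
    "modified_date__v,object,object_field,picklist_value_name,picklist_value_label,status__v\n"
    "2023-12-14T00:06:28.867Z,study__v,masking__v,open_label__v,Open Label,active__v\n"
)
STUDY_CSV = (
    "id,modified_date__v,masking__v,name__v,organization__v,study_name__v\n"
    "V17000000001001,2023-12-06T17:57:05.000Z,open_label__v,Study 1,"
    "V0Z000000000201,Study for Evaluation of Study Product\n"
)

Key = Tuple[str, str, str]


def build_label_index(picklist_csv: str) -> Dict[Key, str]:
    """Index labels by (object, object_field, picklist_value_name)."""
    return {
        (r["object"], r["object_field"], r["picklist_value_name"]): r["picklist_value_label"]
        for r in csv.DictReader(io.StringIO(picklist_csv))
    }


labels = build_label_index(PICKLIST_CSV)
study = next(csv.DictReader(io.StringIO(STUDY_CSV)))
# Look up the label for this record's masking__v value.
masking_label = labels.get(("study__v", "masking__v", study["masking__v"]))
```

Using `get` returns None for a value name that has no matching picklist row, so the loader can decide how to treat unknown values.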

Workflow Extracts

Workflow data including workflow instances, items, user tasks, and task items is available in the following extracts:

workflow__sys.csv: Provides workflow-level information about each workflow instance.
workflow_item__sys.csv: Provides item-level information about each document or object record associated with a workflow.
workflow_task__sys.csv: Provides task-level information about each workflow task associated with a workflow.
workflow_task_item__sys.csv: Provides item-level information about each workflow task associated with a workflow.

All workflow data in extracts includes active and inactive workflows for both objects and documents. A Direct Data file may include additional extracts for legacy workflows.

Workflow Extract

The workflow__sys.csv extract provides workflow-level information about each workflow instance, including the workflow ID, workflow label, owner, type, and relevant dates.

Workflow Item Extract

The workflow_item__sys.csv provides item-level information about each document or object record associated with a workflow, including the workflow instance ID, item type, and IDs of the related Vault object record or document. If a workflow includes a document, the Document Version ID (doc_version_id) column in the CSV file references the document version it's related to. If the workflow instance is not related to a specific document version, this column displays the latest version ID of that document. If the extract_type is incremental_directdata, the Incremental file captures new document versions associated with the workflow.

The metadata CSV file assigns the workflow item extract a type of String.

Workflow Items Extract and Object Relationships

The workflow item extract (workflow_item__sys.csv) provides information about the document version or Vault object record that relates to the item. There may be instances where the referenced document version or Vault object record does not have a corresponding extract in the Direct Data file. For example, when you retrieve an Incremental file, its extracts only contain data updated within the specified 15-minute interval. Therefore, if the document version or object record was not modified within this interval, it will not have its own extract.

As a best practice, your external data warehouse should allow for a polymorphic relationship between the workflow item extract and each of the tables representing object extracts.
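One way to model this polymorphic reference is a resolver that picks the target table per item and tolerates a missing target row. The Python sketch below is illustrative only. Apart from doc_version_id, the workflow item column names it uses (object_name, object_record_id) are hypothetical placeholders, as are the in-memory tables.

```python
from typing import Dict, Optional

Tables = Dict[str, Dict[str, dict]]


def resolve_item(item: dict, tables: Tables) -> Optional[dict]:
    """Return the referenced row, or None if it is absent from the loaded extracts."""
    if item.get("doc_version_id"):
        table_name, key = "document_version__sys", item["doc_version_id"]
    else:
        # Hypothetical column names for the object side of the reference.
        table_name, key = item["object_name"], item["object_record_id"]
    return tables.get(table_name, {}).get(key)


# Hypothetical state after loading one Incremental file: the study record was
# not modified in the interval, so its row is not present.
tables: Tables = {
    "document_version__sys": {"101_0_1": {"id": "101_0_1"}},
    "study__v": {},
}
doc_item = {"doc_version_id": "101_0_1"}
obj_item = {"doc_version_id": "", "object_name": "study__v", "object_record_id": "V17000000001001"}

resolved_doc = resolve_item(doc_item, tables)
resolved_obj = resolve_item(obj_item, tables)
```

A None result signals that the row should be looked up in previously loaded data rather than treated as an error.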

Workflow Task Extract

The workflow_task__sys.csv provides task-level information about each user task associated with a workflow. This extract includes information such as the workflow ID, task label, task owner, task instructions, and relevant dates. This extract excludes participant group details.

Workflow Task Item Extract

The workflow_task_item__sys.csv provides item-level information about each user task associated with a workflow, such as the workflow task item ID, any captured verdicts, and the type of task item.

Legacy Workflow Extracts

Direct Data extracts may contain data about your Vault's active and inactive legacy workflows.

Active Workflows

Active legacy workflow information is available in the following extracts:

active_legacy_workflow__sys.csv: Provides workflow-level information about active legacy workflows and workflow items.
active_legacy_workflow_task__sys.csv: Provides task-level information about active legacy workflow tasks and task items.

Inactive Workflows

Incremental files do not include inactive legacy workflow data; however, this information is accessible from a Full file. The extract for inactive legacy workflows only includes data from the previous day. To retrieve all inactive legacy workflow data, we recommend exporting a report via the Vault UI.

Inactive legacy workflow information is available in the following extracts:

inactive_legacy_workflow__sys.csv: Provides workflow-level information about inactive legacy workflows and workflow items.
inactive_legacy_workflow_task__sys.csv: Provides task-level information about inactive legacy workflow tasks and task items.

Log Extracts

Log files contain extracts that capture audit log data for a single day. This data is not included in Full or Incremental files. Direct Data API publishes each Log file once a day at 01:00 UTC, and the file remains available to download for two (2) days after it is published. Log data, including detailed changes for object records and documents, system configuration changes, and login information, is captured within the following extracts:

object_audit_trail.csv: Provides all changes to object records. Each event includes information such as the timestamp, user's login name, affected item, and description. Vault only captures changes to object records when the Audit data changes in this object setting is enabled on that object.
document_audit_trail.csv: Provides document-related events, including views, send as link actions, task completions, and modifications to document fields. Each event includes information such as the timestamp, user’s login name, affected item, and description.
system_audit_trail.csv: Provides Vault-level configuration and settings changes. Each event includes information such as the timestamp, the user who performed the change, the affected item, and the event description.
login_audit_trail.csv: Provides user authentication events, including users’ logins, failed login attempts, and password changes. Each event includes information such as the timestamp, user’s login name, originating IP address, type of event, user’s browser, user’s platform, and Vault ID.

To see which standard columns are included in each Log file extract, refer to metadata_full.csv under the root folder. The Log folder only contains the extracts whose audit trails have been updated for the day. For example, if no users have logged in today, the Log folder will not include the login_audit_trail.csv file. Learn more about audit logs in Vault Help.
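Because any of the four audit trail extracts may be absent on a given day, a loader should check for each file before reading it. This is a minimal Python sketch that assumes the extracts sit directly inside an already unpacked Log folder. The function name is hypothetical, and the demo uses a temporary directory with a placeholder file.

```python
import tempfile
from pathlib import Path
from typing import Dict

LOG_EXTRACTS = (
    "object_audit_trail.csv",
    "document_audit_trail.csv",
    "system_audit_trail.csv",
    "login_audit_trail.csv",
)


def present_log_extracts(log_dir: Path) -> Dict[str, bool]:
    """Report which of the known audit trail extracts exist in the Log folder."""
    return {name: (log_dir / name).is_file() for name in LOG_EXTRACTS}


# Demo: a Log folder for a day with only system configuration changes.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "system_audit_trail.csv").write_text("placeholder\n")
    found = present_log_extracts(log_dir)
```

Extracts reported as absent can be skipped for that day without failing the load.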