Extract Naming & Attributes
Extracts contain the data for Vault components. Log Direct Data files only contain audit log extracts. Full and Incremental files may contain extracts for the following components:
Direct Data API names each extract's CSV file according to its extract_name. In addition, if a user deletes object records or document versions, Direct Data API stores the deleted items in a separate file, named by appending _deletes.csv to the extract name. The CSV files include a column referencing the record ID of related objects, which can be identified using metadata.csv. In all extracts, standard columns appear first, and the remaining columns are ordered alphabetically. Although this order is predictable, schema changes in your Vault may alter column positions within extracts. The columns and content available in each extract vary by component and are described in the sections below.
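Because schema changes can shift column positions, a loader is safer reading columns by header name than by index. The following is a minimal sketch, not an official client: `read_extract` and `deletes_file_name` are hypothetical helper names, and the sample rows are made up for illustration.

```python
import csv
import io


def read_extract(csv_text: str) -> list[dict[str, str]]:
    """Parse extract CSV text into dicts keyed by header name, not position."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def deletes_file_name(extract_name: str) -> str:
    """Build the deletes file name by appending _deletes.csv to the extract name."""
    return f"{extract_name}_deletes.csv"


# Two hypothetical versions of the same extract: a schema change added
# masking__v, which shifted the position of name__v.
before = "id,modified_date__v,name__v\nV001,2023-12-06T17:57:05.000Z,Study 1\n"
after = (
    "id,modified_date__v,masking__v,name__v\n"
    "V001,2023-12-06T17:57:05.000Z,open_label__v,Study 1\n"
)

for text in (before, after):
    row = read_extract(text)[0]
    print(row["id"], row["name__v"])  # same lookup works for both layouts

print(deletes_file_name("document_version__sys"))
```

Keying on header names means a new or reordered column does not silently corrupt downstream fields.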
Document Extract
Document version data is available in the document_version__sys.csv file. Deleted document versions are tracked in a separate file.
All document extracts have a set of standard fields in addition to all the defined document fields in Vault. Extracts also include all queryable document fields.
The following standard columns are available in the document version extract:
| Column Name | Description |
|---|---|
| id | The document version ID, in the format {doc_id}_{major_version_number}_{minor_version_number}. For example, 101_0_1 represents version 0.1 of document ID 101. This value is the same as version_id. |
| modified_date__v | The date the document version was last modified. |
| doc_id | The document id field value. |
| version_id | The document version ID, in the format {doc_id}_{major_version_number}_{minor_version_number}. For example, 101_0_1 represents version 0.1 of document ID 101. This value is the same as id. |
| major_version_number | The major version of the document. |
| minor_version_number | The minor version of the document. |
| type | The document type. |
| subtype | The document subtype. |
| classification | The document classification. |
| source_file | The Vault API request to download the source file using the Download Document Version File endpoint. |
| rendition_file | The Vault API request to download the rendition file using the Download Document Version Rendition File endpoint. |
| text_file | The Vault API request to export the plain text of the source file using the Retrieve Document Version Text endpoint. |
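The version ID format described in the table can be split back into its parts. This is a small sketch that assumes all three components are integers, as in the 101_0_1 example; `DocVersion` and `parse_version_id` are hypothetical names.

```python
from typing import NamedTuple


class DocVersion(NamedTuple):
    doc_id: int
    major: int
    minor: int


def parse_version_id(version_id: str) -> DocVersion:
    """Split {doc_id}_{major_version_number}_{minor_version_number}."""
    doc_id, major, minor = version_id.split("_")
    return DocVersion(int(doc_id), int(major), int(minor))


v = parse_version_id("101_0_1")
print(f"document {v.doc_id}, version {v.major}.{v.minor}")  # document 101, version 0.1
```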
Accessing Source Content
Direct Data API includes document metadata in the document_version__sys extract. This file includes additional attributes such as source_file, rendition_file, and text_file, which have generated URLs to download the content for that particular version of a document.
If your organization needs to make the source content for all documents available for further processing or data mining, use the Export Document Versions endpoint to export documents to your Vault’s file staging server in bulk. This endpoint allows up to 10,000 document versions per request.
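When the number of document versions exceeds the per-request limit, the IDs have to be split into batches before calling the export endpoint. The sketch below shows only the batching step; it does not call Vault API, and `batches` and the generated IDs are hypothetical.

```python
from typing import Iterator, Sequence

MAX_VERSIONS_PER_REQUEST = 10_000  # per-request limit stated above


def batches(
    version_ids: Sequence[str], size: int = MAX_VERSIONS_PER_REQUEST
) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` version IDs."""
    for start in range(0, len(version_ids), size):
        yield version_ids[start:start + size]


# 25,000 made-up version IDs in the {doc_id}_{major}_{minor} format.
ids = [f"{doc_id}_0_1" for doc_id in range(1, 25_001)]
sizes = [len(batch) for batch in batches(ids)]
print(sizes)  # [10000, 10000, 5000]
```

Each yielded batch would then be sent as one export request.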
Document Relationships Extract
Document relationship data is available in the document_relationship__sys.csv file. If there are deleted document relationships, they are tracked in a separate document_relationship__sys_deletes.csv.
The following standard columns are available in the document relationships extract:
| Column Name | Description |
|---|---|
| id | The document relationship ID. |
| modified_date__v | The date the document relationship was last modified. |
| modified_by__v | The ID of the user who last modified the document relationship. |
| source_doc_id__v | The ID of the source document on which the relationship originates. |
| source_version_id | The version ID of the source document, in the format {source_doc_id__v}_{source_major_version__v}_{source_minor_version__v}. For example, 101_0_1 represents version 0.1 of document ID 101. |
| source_major_version__v | The major version of the source document. If the document relationship is not version-specific, this value is empty. |
| source_minor_version__v | The minor version of the source document. If the document relationship is not version-specific, this value is empty. |
| target_doc_id__v | The ID of the target document to which the relationship points. |
| target_version_id | The version ID of the target document, in the format {target_doc_id__v}_{target_major_version__v}_{target_minor_version__v}. For example, 101_0_1 represents version 0.1 of document ID 101. |
| target_major_version__v | The major version of the target document. If the document relationship is not version-specific, this value is empty. |
| target_minor_version__v | The minor version of the target document. If the document relationship is not version-specific, this value is empty. |
| relationship_type__v | The type of relationship between the source and target document. |
| source_vault_id__v | The ID of the source Vault for a Crosslink document relationship. |
| created_date__v | The date the document relationship was created. |
| created_by__v | The ID of the user who created the document relationship. |
Vault Objects Extract
Each object has its own extract file. Extracts are named according to their object name. For example, the extract CSV file for the Activity object is named activity__v.csv. If there are deleted records for an object, they are tracked in a separate {objectname}_deletes.csv.
Both custom and standard objects from your Vault are included. All objects visible on the Admin > Configuration page of your Vault are available for extraction.
All object extracts have a set of standard fields in addition to all of the defined fields included on the object, including inactive fields. The following standard columns are available in Vault object extracts:
| Column Name | Description |
|---|---|
| id | The object record ID. |
| modified_date__v | The date the object record was last modified. |
| name__v | The name of the object record. |
| status__v | The status of the object record. |
| created_by__v | The ID of the user who created the object record. |
| created_date__v | The date the object record was created. |
| modified_by__v | The ID of the user who last modified the object record. |
| global_id__sys | The global ID of the object record. |
| link__sys | The object record ID across all Vaults where the record exists. |
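A downstream table for an object is typically kept current by upserting rows from the object extract and removing rows listed in its deletes file. The sketch below works on an in-memory dict keyed by id. It assumes the deletes file carries an id column, which is not confirmed here; `upsert`, `apply_deletes`, and the sample record IDs are hypothetical.

```python
import csv
import io

Records = dict[str, dict[str, str]]


def upsert(records: Records, extract_csv: str) -> None:
    """Insert or overwrite rows from an object extract, keyed by id."""
    for row in csv.DictReader(io.StringIO(extract_csv)):
        records[row["id"]] = row


def apply_deletes(records: Records, deletes_csv: str) -> None:
    """Drop every record listed in the deletes extract (assumes an id column)."""
    for row in csv.DictReader(io.StringIO(deletes_csv)):
        records.pop(row["id"], None)


# Made-up contents for activity__v.csv and activity__v_deletes.csv.
activity_csv = (
    "id,modified_date__v,name__v\n"
    "A001,2023-12-06T17:57:05.000Z,Visit 1\n"
    "A002,2023-12-06T18:10:00.000Z,Visit 2\n"
)
activity_deletes_csv = "id\nA001\n"

activity: Records = {}
upsert(activity, activity_csv)
apply_deletes(activity, activity_deletes_csv)
print(sorted(activity))  # ['A002']
```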
Picklist Extract
All picklist data is available in the picklist__sys.csv file. Picklist values cannot be deleted, only inactivated. When a picklist value name is changed, the separate picklist__sys_deletes.csv file tracks the original value, while the picklist__sys.csv file shows the new value. Learn more about picklist deletion in Vault Help.
The picklist extract does not include picklists that are not referenced by any objects or documents.
The following standard columns are available in picklist extracts:
| Column Name | Description |
|---|---|
| modified_date__v | The date the picklist was last modified. |
| object | The name of the object on which the picklist is defined. |
| object_field | The name of the object picklist field. |
| picklist_value_name | The picklist value name. |
| picklist_value_label | The picklist value label. |
| status__v | The status of the picklist value. |
Picklist References in Object & Document Extracts
Object or document fields that reference picklist values are classified with a type of Picklist or MultiPicklist in the metadata.csv file.
The picklist extract allows you to retrieve the picklist value labels corresponding to the picklist value names referenced in other extracts. Picklist extracts should be handled in the following ways:
- Join or denormalize the data using a three-part key (object, object_field, picklist_value_name). The extract and extract field metadata provide the object and object_field values, respectively.
- Process incremental changes to picklists, including updates and deletes, based on the three-part key.
Below is an example using the masking__v picklist field on the study__v object from a Safety Vault:
| id | modified_date__v | masking__v | name__v | organization__v | study_name__v |
|---|---|---|---|---|---|
| V17000000001001 | 2023-12-06T17:57:05.000Z | open_label__v | Study 1 | V0Z000000000201 | Study for Evaluation of Study Product |
The corresponding entry in the metadata.csv for the picklist field would be:
| extract | extract_label | column_name | column_label | type | length | related_extract |
|---|---|---|---|---|---|---|
| Object.study__v | Study | masking__v | Masking | Picklist | 46 | Picklist.picklist__sys |
Below is an example of the picklist__sys.csv row for the open_label__v value of the masking__v picklist field on the study__v object:
| modified_date__v | object | object_field | picklist_value_name | picklist_value_label | status__v |
|---|---|---|---|---|---|
| 2023-12-14T00:06:28.867Z | study__v | masking__v | open_label__v | Open Label | active__v |
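The three example rows above can be joined with the three-part key as follows. This is a sketch under stated assumptions: the object name is taken to be the part of the metadata extract value after "Object.", the `<column>_label` output column is a hypothetical naming choice, and only single-valued Picklist fields are handled because the encoding of MultiPicklist values is not described here.

```python
study_rows = [
    {"id": "V17000000001001", "masking__v": "open_label__v", "name__v": "Study 1"},
]
metadata_rows = [
    {"extract": "Object.study__v", "column_name": "masking__v", "type": "Picklist"},
]
picklist_rows = [
    {
        "object": "study__v",
        "object_field": "masking__v",
        "picklist_value_name": "open_label__v",
        "picklist_value_label": "Open Label",
    },
]

# Index labels by the three-part key (object, object_field, picklist_value_name).
labels = {
    (p["object"], p["object_field"], p["picklist_value_name"]): p["picklist_value_label"]
    for p in picklist_rows
}


def picklist_columns(metadata: list[dict[str, str]], extract: str) -> list[str]:
    """Return the columns of an extract that metadata.csv types as Picklist."""
    return [
        m["column_name"]
        for m in metadata
        if m["extract"] == extract and m["type"] == "Picklist"
    ]


def add_labels(rows, extract, metadata, labels):
    """Copy each row and add a <column>_label entry for every picklist column."""
    object_name = extract.split(".", 1)[1]  # assumption: "Object.study__v" -> "study__v"
    columns = picklist_columns(metadata, extract)
    enriched = []
    for row in rows:
        out = dict(row)
        for col in columns:
            out[f"{col}_label"] = labels.get((object_name, col, row[col]), "")
        enriched.append(out)
    return enriched


result = add_labels(study_rows, "Object.study__v", metadata_rows, labels)
print(result[0]["masking__v"], "->", result[0]["masking__v_label"])
```

The same key can drive incremental processing: an updated or deleted picklist row replaces or removes the matching entry in `labels`.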
Workflow Extracts
Workflow data including workflow instances, items, user tasks, and task items is available in the following extracts:
- workflow__sys.csv: Provides workflow-level information about each workflow instance.
- workflow_item__sys.csv: Provides item-level information about each document or object record associated with a workflow.
- workflow_task__sys.csv: Provides task-level information about each workflow task associated with a workflow.
- workflow_task_item__sys.csv: Provides item-level information about each workflow task associated with a workflow.
All workflow data in extracts includes active and inactive workflows for both objects and documents. A Direct Data file may include additional extracts for legacy workflows.
Workflow Extract
The workflow__sys.csv extract provides workflow-level information about each workflow instance, including the workflow ID, workflow label, owner, type, and relevant dates.
Workflow Item Extract
The workflow_item__sys.csv provides item-level information about each document or object record associated with a workflow, including the workflow instance ID, item type, and IDs of the related Vault object record or document. If a workflow includes a document, the Document Version ID (doc_version_id) column in the CSV file references the document version it's related to. If the workflow instance is not related to a specific document version, this column displays the latest version ID of that document. If the extract_type is incremental_directdata, the Incremental file captures new document versions associated with the workflow.
The metadata CSV file assigns the workflow item extract a type of String.
Workflow Items Extract and Object Relationships
The workflow item extract (workflow_item__sys.csv) provides information about the document version or Vault object record that relates to the item. There may be instances where the referenced document version or Vault object record does not have a corresponding extract in the Direct Data file. For example, when you retrieve an Incremental file, its extracts only contain data updated within the specified 15-minute interval. Therefore, if the document version or object record was not modified within this interval, it will not have its own extract.
As best practice, your external data warehouse should allow for a polymorphic relationship between the workflow item extract and each of the tables representing object extracts.
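One way to model such a polymorphic relationship is to store, for each workflow item, both the name of the target table and the ID of the row in that table, and to tolerate targets that are not loaded. The sketch below is illustrative only: `WorkflowItem`, `target_table`, `target_id`, and the sample IDs are hypothetical and are not column names from the extract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowItem:
    item_id: str
    target_table: str  # hypothetical: which warehouse table the item points at
    target_id: str     # hypothetical: id of the row in that table


Tables = dict[str, dict[str, dict[str, str]]]


def resolve(item: WorkflowItem, tables: Tables) -> Optional[dict[str, str]]:
    """Return the referenced row, or None if it is not present in the loaded data."""
    return tables.get(item.target_table, {}).get(item.target_id)


# Warehouse tables loaded so far, keyed by table name and then by row id.
tables: Tables = {
    "document_version__sys": {"101_0_1": {"id": "101_0_1", "doc_id": "101"}},
    "study__v": {},
}

doc_item = WorkflowItem("WI1", "document_version__sys", "101_0_1")
obj_item = WorkflowItem("WI2", "study__v", "V17000000001001")

print(resolve(doc_item, tables))  # the document version row
print(resolve(obj_item, tables))  # None: the record was not in this file
```

Returning None rather than raising lets a loader accept workflow items whose targets arrive in a different file.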
Workflow Task Extract
The workflow_task__sys.csv provides task-level information about each user task associated with a workflow. This extract includes information such as the workflow ID, task label, task owner, task instructions, and relevant dates. This extract excludes participant group details.
Workflow Task Item Extract
The workflow_task_item__sys.csv provides item-level information about each user task associated with a workflow, such as the workflow task item ID, any captured verdicts, and the type of task item.
Legacy Workflow Extracts
Direct Data extracts may contain data about your Vault's active and inactive legacy workflows.
Active Workflows
Active legacy workflow information is available in the following extracts:
- active_legacy_workflow__sys.csv: Provides workflow-level information about active legacy workflows and workflow items.
- active_legacy_workflow_task__sys.csv: Provides task-level information about active legacy workflow tasks and task items.
Inactive Workflows
Incremental files do not include inactive legacy workflow data; however, this information is accessible from a Full file. The extract for inactive legacy workflows only includes data from the previous day. To retrieve all inactive legacy workflow data, we recommend exporting a report.
Inactive legacy workflow information is available in the following extracts:
- inactive_legacy_workflow__sys.csv: Provides workflow-level information about inactive legacy workflows and workflow items.
- inactive_legacy_workflow_task__sys.csv: Provides task-level information about inactive legacy workflow tasks and task items.
Log Extracts
Log files contain extracts that capture audit log data for a single day. These extracts are not included in Full or Incremental files. Direct Data API publishes each Log file once a day at 01:00 UTC, and the file remains available to download for two (2) days after publishing. Log data, including detailed changes for object records and documents, system configuration changes, and login information, is captured within the following extracts:
- object_audit_trail.csv: Provides all changes to object records. Each event includes information such as the timestamp, user's login name, affected item, and description. Vault only captures changes to object records when the Audit data changes in this object setting is enabled on that object.
- document_audit_trail.csv: Provides document-related events, including views, send as link actions, task completions, and modifications to document fields. Each event includes information such as the timestamp, user's login name, affected item, and description.
- system_audit_trail.csv: Provides Vault-level configuration and settings changes. Each event includes information such as the timestamp, the user who performed the change, the affected item, and the event description.
- login_audit_trail.csv: Provides user authentication events, including users' logins, failed login attempts, and password changes. Each event includes information such as the timestamp, user's login name, originating IP address, type of event, user's browser, user's platform, and Vault ID.
To see which standard columns are included in each Log file extract, refer to metadata_full.csv under the root folder. The Log folder only contains the extracts whose audit trails have been updated for the day. For example, if no users have logged in today, the Log folder will not include the login_audit_trail.csv file. Learn more about audit logs in Vault Help.
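Because a Log folder may omit any of the four audit trail extracts, a loader should check which files exist rather than assume all are present. The sketch below assumes the Log file has already been downloaded and unpacked so that the CSV files sit directly in a local folder; `present_log_extracts` is a hypothetical helper, and the temporary directory stands in for that folder.

```python
import tempfile
from pathlib import Path

LOG_EXTRACTS = (
    "object_audit_trail.csv",
    "document_audit_trail.csv",
    "system_audit_trail.csv",
    "login_audit_trail.csv",
)


def present_log_extracts(log_dir: Path) -> list[str]:
    """Return the audit trail extracts that exist in the given Log folder."""
    return [name for name in LOG_EXTRACTS if (log_dir / name).is_file()]


# Simulate a day with object and system changes but no document events or logins.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "object_audit_trail.csv").write_text("")
    (log_dir / "system_audit_trail.csv").write_text("")
    found = present_log_extracts(log_dir)

print(found)  # ['object_audit_trail.csv', 'system_audit_trail.csv']
```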