Extract Naming & Attributes

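Extracts contain the data for Vault components. Log Direct Data files only contain audit log extracts. Full and Incremental files may contain extracts for the components described in the sections below: documents (versions and relationships), Vault objects, picklists, and workflows.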

Direct Data API names each extract's CSV file after its extract_name. When a user deletes object records or document versions, Direct Data API records the deletions in a separate file, named by appending _deletes.csv to the extract name. The CSV files include a column referencing the record ID of related objects, which you can identify using metadata.csv. In all extracts, standard columns appear first and the remaining columns are ordered alphabetically. Although this order is predictable, schema changes in your Vault may alter column positions within extracts. The columns and content available in each extract vary by component and are described in the sections below.
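Because column positions can shift after a schema change, it is usually safer to read extract columns by name rather than by index. The following is a minimal sketch, assuming the extracts are comma-delimited CSV files with a header row; the helper names (deletes_file_name, read_rows) and the sample rows, including the description__c column, are illustrative and not part of Direct Data API.

```python
import csv
import io


def deletes_file_name(extract_file_name: str) -> str:
    """Return the deletes file name for an extract CSV, e.g. x.csv -> x_deletes.csv."""
    stem = extract_file_name.removesuffix(".csv")
    return f"{stem}_deletes.csv"


def read_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse extract CSV text into dicts keyed by column name, not position."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Two hypothetical versions of the same extract. A new column shifts name__v
# to a different position in the second version.
before = "id,modified_date__v,name__v\nV001,2023-12-06T17:57:05.000Z,Study 1\n"
after = (
    "id,modified_date__v,description__c,name__v\n"
    "V001,2023-12-06T17:57:05.000Z,,Study 1\n"
)

for text in (before, after):
    row = read_rows(text)[0]
    # Lookups by name keep working regardless of column order.
    print(row["id"], row["name__v"])

print(deletes_file_name("document_relationship__sys.csv"))
```

Reading by name means a newly added field changes nothing for existing loaders, whereas positional indexing would silently pick up the wrong column.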

Document version data is available in the document_version__sys.csv file. Deleted document versions are tracked in a separate document_version__sys_deletes.csv file.

All document extracts have a set of standard fields in addition to all the defined document fields in Vault. Extracts also include all queryable document fields.

The following standard columns are available in the document version extract:

Column Name | Description
id | The document version ID, in the format {doc_id}_{major_version_number}_{minor_version_number}. For example, 101_0_1 represents version 0.1 of document ID 101. This value is the same as version_id.
modified_date__v | The date the document version was last modified.
doc_id | The document id field value.
version_id | The document version ID, in the format {doc_id}_{major_version_number}_{minor_version_number}. For example, 101_0_1 represents version 0.1 of document ID 101. This value is the same as id.
major_version_number | The major version of the document.
minor_version_number | The minor version of the document.
type | The document type.
subtype | The document subtype.
classification | The document classification.
source_file | The Vault API request to download the source file using the Download Document Version File endpoint.
rendition_file | The Vault API request to download the rendition file using the Download Document Version Rendition File endpoint.
text_file | The Vault API request to export the plain text of the source file using the Retrieve Document Version Text endpoint.
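The version ID format described above can be split back into its parts when a warehouse needs separate columns. This is a minimal sketch, assuming document IDs and version numbers are integers as in the 101_0_1 example; DocVersion and parse_version_id are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class DocVersion(NamedTuple):
    doc_id: int
    major: int
    minor: int


def parse_version_id(version_id: str) -> DocVersion:
    """Split '{doc_id}_{major}_{minor}' into its three integer parts."""
    # rsplit from the right so only the last two underscores are treated
    # as separators.
    doc_id, major, minor = version_id.rsplit("_", 2)
    return DocVersion(int(doc_id), int(major), int(minor))


version = parse_version_id("101_0_1")
print(version.doc_id)                      # 101
print(f"{version.major}.{version.minor}")  # 0.1
```

A malformed value raises ValueError, which is generally preferable to loading a partially parsed ID.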

Direct Data API includes document metadata in the document_version__sys extract. This file includes additional attributes such as source_file, rendition_file, and text_file, which have generated URLs to download the content for that particular version of a document.

If your organization needs to make the source content for all documents available for further processing or data mining, use the Export Document Versions endpoint to export documents to your Vault’s file staging server in bulk. This endpoint allows up to 10,000 document versions per request.
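Given the limit of 10,000 document versions per request, a larger set of versions has to be split into batches before calling the endpoint. The sketch below shows only the batching step; the batches helper and the sample IDs are illustrative, and the request itself is intentionally left out because its shape is not described here.

```python
from typing import Iterator, Sequence

# Limit stated above for the Export Document Versions endpoint.
MAX_VERSIONS_PER_REQUEST = 10_000


def batches(
    version_ids: Sequence[str], size: int = MAX_VERSIONS_PER_REQUEST
) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` version IDs."""
    for start in range(0, len(version_ids), size):
        yield version_ids[start:start + size]


# 25,000 hypothetical version IDs in the {doc_id}_{major}_{minor} format.
ids = [f"{n}_0_1" for n in range(1, 25_001)]
print([len(batch) for batch in batches(ids)])  # [10000, 10000, 5000]
```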

Document relationship data is available in the document_relationship__sys.csv file. If there are deleted document relationships, they are tracked in a separate document_relationship__sys_deletes.csv.

The following standard columns are available in the document relationships extract:

Column Name | Description
id | The document relationship ID.
modified_date__v | The date the document relationship was last modified.
modified_by__v | The ID of the user who last modified the document relationship.
source_doc_id__v | The ID of the source document on which the relationship originates.
source_version_id | The version ID of the source document, in the format {source_doc_id__v}_{source_major_version__v}_{source_minor_version__v}. For example, 101_0_1 represents version 0.1 of document ID 101.
source_major_version__v | The major version of the source document. If the document relationship is not version-specific, this value is empty.
source_minor_version__v | The minor version of the source document. If the document relationship is not version-specific, this value is empty.
target_doc_id__v | The ID of the target document to which the relationship points.
target_version_id | The version ID of the target document, in the format {target_doc_id__v}_{target_major_version__v}_{target_minor_version__v}. For example, 101_0_1 represents version 0.1 of document ID 101.
target_major_version__v | The major version of the target document. If the document relationship is not version-specific, this value is empty.
target_minor_version__v | The minor version of the target document. If the document relationship is not version-specific, this value is empty.
relationship_type__v | The type of relationship between the source and target document.
source_vault_id__v | The ID of the source Vault for a Crosslink document relationship.
created_date__v | The date the document relationship was created.
created_by__v | The ID of the user who created the document relationship.
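Since the major and minor version columns are empty when a relationship is not version-specific, a loader can use them to tell the two cases apart. The following is a rough sketch, assuming empty values arrive as empty strings when the CSV is parsed; is_version_specific and the sample rows are hypothetical.

```python
def is_version_specific(row: dict[str, str], side: str) -> bool:
    """Return True when both version columns for `side` are populated.

    `side` is either "source" or "target".
    """
    major = row.get(f"{side}_major_version__v", "")
    minor = row.get(f"{side}_minor_version__v", "")
    return major != "" and minor != ""


# Hypothetical rows, reduced to the columns relevant here.
bound = {
    "id": "1",
    "source_doc_id__v": "101",
    "source_major_version__v": "0",
    "source_minor_version__v": "1",
}
unbound = {
    "id": "2",
    "source_doc_id__v": "101",
    "source_major_version__v": "",
    "source_minor_version__v": "",
}

print(is_version_specific(bound, "source"))    # True
print(is_version_specific(unbound, "source"))  # False
```

Note that "0" is a valid major version (as in version 0.1), which is why the check compares against the empty string rather than testing truthiness of a parsed number.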

Each object has its own extract file. Extracts are named according to their object name. For example, the extract CSV file for the Activity object is named activity__v.csv. If there are deleted records for an object, they are tracked in a separate {objectname}_deletes.csv.
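One way to consume an object extract together with its deletes file is to upsert the records first and then remove the deleted IDs. This is a minimal sketch under two assumptions that are not stated above: that the deletes file carries an id column, and that deletes should be applied after upserts. The apply_extract helper and the sample data are illustrative only.

```python
import csv
import io


def apply_extract(
    table: dict[str, dict[str, str]], upserts_csv: str, deletes_csv: str = ""
) -> None:
    """Upsert rows keyed by id, then drop any IDs listed in the deletes CSV."""
    for row in csv.DictReader(io.StringIO(upserts_csv)):
        table[row["id"]] = row
    if deletes_csv:
        for row in csv.DictReader(io.StringIO(deletes_csv)):
            # pop with a default tolerates IDs that were never loaded.
            table.pop(row["id"], None)


# Hypothetical contents of activity__v.csv and activity__v_deletes.csv.
activity_csv = "id,name__v\nA1,Visit 1\nA2,Visit 2\n"
activity_deletes_csv = "id\nA1\nA9\n"

activities: dict[str, dict[str, str]] = {}
apply_extract(activities, activity_csv, activity_deletes_csv)
print(sorted(activities))  # ['A2']
```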

Both custom and standard objects from your Vault are included. All objects visible on the Admin > Configuration page of your Vault are available for extraction.

All object extracts have a set of standard fields in addition to all of the defined fields included on the object, including inactive fields. The following standard columns are available in Vault object extracts:

Column Name | Description
id | The object record ID.
modified_date__v | The date the object record was last modified.
name__v | The name of the object record.
status__v | The status of the object record.
created_by__v | The ID of the user who created the object record.
created_date__v | The date the object record was created.
modified_by__v | The ID of the user who last modified the object record.
global_id__sys | The global ID of the object record.
link__sys | The object record ID across all Vaults where the record exists.

All picklist data is available in the picklist__sys.csv file. Picklist values cannot be deleted, only inactivated. When a picklist value name is changed, the separate picklist__sys_deletes.csv file tracks the original value, while the picklist__sys.csv file shows the new value. Learn more about picklist deletion in Vault Help.

The picklist extract does not include picklists that are not referenced by any objects or documents.

The following standard columns are available in picklist extracts:

Column Name | Description
modified_date__v | The date the picklist was last modified.
object | The name of the object on which the picklist is defined.
object_field | The name of the object picklist field.
picklist_value_name | The picklist value name.
picklist_value_label | The picklist value label.
status__v | The status of the picklist value.

Picklist References in Object & Document Extracts

Object or document fields that reference picklist values are classified with a type of Picklist or MultiPicklist in the metadata.csv file.

The picklist extract allows you to retrieve the picklist value labels corresponding to the picklist value names referenced in other extracts. Picklist extracts should be handled in the following ways:

  • Join or denormalize the data using a three-part key (object, object_field, picklist_value_name). The extract and extract field metadata provide the object and object_field values, respectively.
  • Process incremental changes to picklists, including updates and deletes, based on the three-part key.

Below is an example using the masking__v picklist field on the study__v object from a Safety Vault:

id | modified_date__v | masking__v | name__v | organization__v | study_name__v
V17000000001001 | 2023-12-06T17:57:05.000Z | open_label__v | Study 1 | V0Z000000000201 | Study for Evaluation of Study Product

The corresponding entry in the metadata.csv for the picklist field would be:

extract | extract_label | column_name | column_label | type | length | related_extract
Object.study__v | Study | masking__v | Masking | Picklist | 46 | Picklist.picklist__sys

Below is an example of the picklist__sys.csv row for the open_label__v value of the masking__v picklist field on the study__v object:

modified_date__v | object | object_field | picklist_value_name | picklist_value_label | status__v
2023-12-14T00:06:28.867Z | study__v | masking__v | open_label__v | Open Label | active__v
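The three-part key join can be sketched as a dictionary lookup. The example below reuses the study__v and masking__v values from the rows above; the label_for helper is a hypothetical name, and the rows are shown as already-parsed dictionaries rather than read from the CSV files.

```python
from typing import Optional

# Row from picklist__sys.csv, as shown above.
picklist_rows = [
    {
        "object": "study__v",
        "object_field": "masking__v",
        "picklist_value_name": "open_label__v",
        "picklist_value_label": "Open Label",
        "status__v": "active__v",
    },
]

# Index labels by the three-part key (object, object_field, picklist_value_name).
labels = {
    (r["object"], r["object_field"], r["picklist_value_name"]): r["picklist_value_label"]
    for r in picklist_rows
}


def label_for(obj: str, field: str, value_name: str) -> Optional[str]:
    """Return the picklist label for a value name, or None if it is unknown."""
    return labels.get((obj, field, value_name))


# Row from the study__v object extract, reduced to the relevant columns.
study_row = {"id": "V17000000001001", "masking__v": "open_label__v", "name__v": "Study 1"}

print(label_for("study__v", "masking__v", study_row["masking__v"]))  # Open Label
```

Keying on all three parts matters because the same value name can appear under different fields or objects with a different label.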

Workflow data including workflow instances, items, user tasks, and task items is available in the following extracts:

  • workflow__sys.csv: Provides workflow-level information about each workflow instance.
  • workflow_item__sys.csv: Provides item-level information about each document or object record associated with a workflow.
  • workflow_task__sys.csv: Provides task-level information about each workflow task associated with a workflow.
  • workflow_task_item__sys.csv: Provides item-level information about each workflow task associated with a workflow.

All workflow data in extracts includes active and inactive workflows for both objects and documents. A Direct Data file may include additional extracts for legacy workflows.

The workflow__sys.csv extract provides workflow-level information about each workflow instance, including the workflow ID, workflow label, owner, type, and relevant dates.

The workflow_item__sys.csv provides item-level information about each document or object record associated with a workflow, including the workflow instance ID, item type, and IDs of the related Vault object record or document. If a workflow includes a document, the Document Version ID (doc_version_id) column in the CSV file references the document version it's related to. If the workflow instance is not related to a specific document version, this column displays the latest version ID of that document. If the extract_type is incremental_directdata, the Incremental file captures new document versions associated with the workflow.

The metadata CSV file assigns the workflow item extract a type of String.

Workflow Items Extract and Object Relationships

The workflow item extract (workflow_item__sys.csv) provides information about the document version or Vault object record that relates to the item. There may be instances where the referenced document version or Vault object record does not have a corresponding extract in the Direct Data file. For example, when you retrieve an Incremental file, its extracts only contain data updated within the specified 15-minute interval. Therefore, if the document version or object record was not modified within this interval, it will not have its own extract.

As best practice, your external data warehouse should allow for a polymorphic relationship between the workflow item extract and each of the tables representing object extracts.
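A polymorphic relationship of this kind can be sketched as a lookup that first selects the target table and then the record, while tolerating targets that are absent from the current file. The column names target_table and target_id below are hypothetical stand-ins, not the actual workflow_item__sys.csv columns; check metadata.csv for the real names.

```python
from typing import Optional

# Warehouse tables keyed by object name, each keyed by record ID.
# Only study__v has been loaded in this hypothetical state.
tables: dict[str, dict[str, dict[str, str]]] = {
    "study__v": {"V17000000001001": {"id": "V17000000001001", "name__v": "Study 1"}},
}


def resolve_item(item: dict[str, str]) -> Optional[dict[str, str]]:
    """Look up the record a workflow item points to, or None if not loaded."""
    table = tables.get(item["target_table"], {})
    return table.get(item["target_id"])


# Hypothetical workflow items: one resolvable, one whose target was not
# modified in the interval and so is missing from the loaded data.
found = resolve_item({"target_table": "study__v", "target_id": "V17000000001001"})
missing = resolve_item({"target_table": "product__v", "target_id": "00P000000000101"})

print(found["name__v"] if found else None)  # Study 1
print(missing)                              # None
```

Returning None instead of raising reflects the situation described above, where an Incremental file may reference a record that has no extract in the same file.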

The workflow_task__sys.csv provides task-level information about each user task associated with a workflow. This extract includes information such as the workflow ID, task label, task owner, task instructions, and relevant dates. This extract excludes participant group details.

The workflow_task_item__sys.csv provides item-level information about each user task associated with a workflow, such as the workflow task item ID, any captured verdicts, and the type of task item.

Direct Data extracts may contain data about your Vault's active and inactive legacy workflows.

Active legacy workflow information is available in the following extracts:

  • active_legacy_workflow__sys.csv: Provides workflow-level information about active legacy workflows and workflow items.
  • active_legacy_workflow_task__sys.csv: Provides task-level information about active legacy workflow tasks and task items.

Incremental files do not include inactive legacy workflow data; this information is accessible from a Full file. The extract for inactive legacy workflows only includes data from the previous day. To retrieve all inactive legacy workflow data, we recommend exporting a report via the Vault UI.

Inactive legacy workflow information is available in the following extracts:

  • inactive_legacy_workflow__sys.csv: Provides workflow-level information about inactive legacy workflows and workflow items.
  • inactive_legacy_workflow_task__sys.csv: Provides task-level information about inactive legacy workflow tasks and task items.

Log files contain extracts that capture audit log data for a single day, which are not included in Full or Incremental files. Direct Data API publishes each Log file once a day at 01:00 UTC and this file is available to download for two (2) days following its publishing. Log data, including detailed changes for object records and documents, system configuration changes, and login information, is captured within the following extracts:

  • object_audit_trail.csv: Provides all changes to object records. Each event includes information such as the timestamp, user's login name, affected item, and description. Vault only captures changes to object records when the Audit data changes in this object setting is enabled on that object.
  • document_audit_trail.csv: Provides document-related events, including views, send as link actions, task completions, and modifications to document fields. Each event includes information such as the timestamp, user’s login name, affected item, and description.
  • system_audit_trail.csv: Provides Vault-level configuration and settings changes. Each event includes information such as the timestamp, the user who performed the change, the affected item, and the event description.
  • login_audit_trail.csv: Provides user authentication events, including users’ logins, failed login attempts, and password changes. Each event includes information such as the timestamp, user’s login name, originating IP address, type of event, user’s browser, user’s platform, and Vault ID.

To see which standard columns are included in each Log file extract, refer to metadata_full.csv under the root folder. The Log folder only contains the extracts whose audit trails have been updated for the day. For example, if no users have logged in today, the Log folder will not include the login_audit_trail.csv file. Learn more about audit logs in Vault Help.
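Because an audit trail extract is omitted when it has no updates for the day, a loader should treat a missing file as "no events" rather than as an error. The sketch below assumes the Log extracts sit directly inside a single local folder after download; split_log_extracts is a hypothetical helper, and the temporary directory only simulates such a folder.

```python
import tempfile
from pathlib import Path

# The four audit trail extracts listed above.
LOG_EXTRACTS = (
    "object_audit_trail.csv",
    "document_audit_trail.csv",
    "system_audit_trail.csv",
    "login_audit_trail.csv",
)


def split_log_extracts(log_dir: Path) -> tuple[list[str], list[str]]:
    """Return (present, absent) extract file names for a Log folder."""
    present = [name for name in LOG_EXTRACTS if (log_dir / name).is_file()]
    absent = [name for name in LOG_EXTRACTS if name not in present]
    return present, absent


# Simulate a Log folder for a day on which only object records changed.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "object_audit_trail.csv").write_text("placeholder\n")
    present, absent = split_log_extracts(log_dir)

print(present)  # ['object_audit_trail.csv']
print(absent)
```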