Microsoft OneLake - CData Sync Documentation

Supported File Formats

When CData Sync writes data to Microsoft OneLake, you can choose the file format for the exported data. The following file formats are supported for the destination:

CSV—Plain text comma-separated values.
Avro—A row-based binary format that supports schema evolution.
(Default) Parquet—A columnar storage format that is optimized for analytics.

Add the Microsoft OneLake Connector

Authenticate to Microsoft OneLake

After you add the connector, you need to set the required properties.

Connection Name: Enter a connection name of your choice.
File Format: Select the file format that you want to use: CSV (default), Avro, or Parquet.
Azure Storage Account: Enter the name of your Azure storage account.
URI: Enter the path of the file system and folder that contains your files (for example, onelake://Workspace/Test.LakeHouse/Files/CustomFolder).
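The URI is easier to get right once you see how it breaks into parts. The following Python sketch is a hypothetical helper, not part of CData Sync. It splits a URI shaped like the example above into its workspace, item, and folder path, and it assumes every URI follows that example's onelake://<Workspace>/<Item>/<folder path> pattern.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class OneLakeLocation:
    workspace: str  # e.g. "Workspace"
    item: str       # e.g. "Test.LakeHouse"
    path: str       # e.g. "Files/CustomFolder"


def parse_onelake_uri(uri: str) -> OneLakeLocation:
    """Split a onelake:// URI into workspace, item, and folder path.

    Hypothetical helper; assumes the shape of the documented example:
    onelake://<Workspace>/<Item>/<folder path>
    """
    parsed = urlparse(uri)
    if parsed.scheme != "onelake":
        raise ValueError(f"expected a onelake:// URI, got {uri!r}")
    # netloc holds the workspace; the path holds the item and folder segments.
    segments = [s for s in parsed.path.split("/") if s]
    if not parsed.netloc or not segments:
        raise ValueError(f"URI must name a workspace and an item: {uri!r}")
    return OneLakeLocation(
        workspace=parsed.netloc,
        item=segments[0],
        path="/".join(segments[1:]),
    )
```

For the documented example, parse_onelake_uri("onelake://Workspace/Test.LakeHouse/Files/CustomFolder") yields workspace "Workspace", item "Test.LakeHouse", and path "Files/CustomFolder".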

CData Sync supports several ways to authenticate to Microsoft OneLake. Select your authentication method below to go to the section that contains the authentication details.

Azure Active Directory (default)
Azure Managed Service Identity
Azure Service Principal
Azure Service Principal Certificate

Azure Active Directory

To connect with an Azure Active Directory (AD) user account, specify the following properties:

Auth Scheme: Select AzureAD.
Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.

Azure Managed Service Identity

Complete Your Connection

To complete your connection:

Specify the following properties:
For the CSV file format:
- FMT: Enter the format that you want to use to parse all text files. The default format is CsvDelimited.
- Aggregate Files: Specify whether to aggregate all files in the URI directory that share the same schema into a single table named AggregatedFiles. The default option is False.
- Include Column Headers: Specify whether to read column headers from the first line of each specified file. The default option is True.
For the Avro and Parquet file formats:
- Data Model: Select the data model that you want to use to parse documents for your format and to generate the database metadata. The default data model is Document.
- Aggregate Files: Specify whether to aggregate all files in the URI directory that share the same schema into a single table named AggregatedFiles. The default option is False.
Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)
Click Connect to Microsoft OneLake to connect to your account.
Click Create & Test to create your connection.
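The Aggregate Files and Include Column Headers options for CSV can be pictured with the following Python sketch. It is a hypothetical illustration of the idea, not how CData Sync implements it. It assumes Include Column Headers is True, treats two CSV files as having the same schema when their header rows match exactly, and collects the rows of matching files together, much as AggregatedFiles combines them into one table.

```python
import csv
import io
from collections import defaultdict


def aggregate_by_schema(files: dict[str, str]) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Group CSV files whose header rows match and concatenate their data rows.

    Hypothetical illustration; assumes the first line of each file is its header
    (Include Column Headers = True).
    """
    groups: dict[tuple[str, ...], list[dict[str, str]]] = defaultdict(list)
    for name in sorted(files):  # process files in a deterministic order
        reader = csv.DictReader(io.StringIO(files[name]))
        header = tuple(reader.fieldnames or ())  # the file's "schema"
        groups[header].extend(reader)            # append this file's data rows
    return dict(groups)


files = {
    "a.csv": "id,name\n1,Alice\n",
    "b.csv": "id,name\n2,Bob\n",
    "c.csv": "sku,qty\nX1,5\n",
}
groups = aggregate_by_schema(files)
```

Here a.csv and b.csv share a header, so their rows land in one group; c.csv forms its own group. This page does not say how Sync treats files whose schema differs, so the handling of c.csv is only an assumption of the sketch.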

Load Folder Job Behavior for Parquet and Avro Files

CData Sync optimizes Load Folder jobs that use Microsoft OneLake sources with Parquet or Avro files to avoid redundant downloads. During a Load Folder job run, Sync downloads each file once and reuses it for both schema detection and data processing. This reduces overall data transfer and improves performance for large datasets and for folders that contain many files. Temporary files are stored in a job-specific directory:

{ApplicationDatabase}/connections/<SourceConnection>/temp/<JobId>/

In this path:

{ApplicationDatabase} specifies the root directory of the application database.
<SourceConnection> specifies the name or identifier of the source connection.
<JobId> specifies the system-generated job identifier.

Temporary files are removed automatically after the job completes, regardless of success or failure.
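The download-once pattern and the job-specific temporary directory can be sketched in Python as follows. The function names and the download, detect_schema, and process callbacks are hypothetical stand-ins, not CData Sync APIs. The sketch only shows building the {ApplicationDatabase}/connections/<SourceConnection>/temp/<JobId>/ path, reusing each local copy for both schema detection and data processing, and removing the directory in a finally block so that cleanup happens whether the job succeeds or fails.

```python
import shutil
from pathlib import Path
from typing import Callable


def job_temp_dir(app_db: Path, source_connection: str, job_id: str) -> Path:
    """Build {ApplicationDatabase}/connections/<SourceConnection>/temp/<JobId>/."""
    return app_db / "connections" / source_connection / "temp" / job_id


def run_load_folder_job(
    app_db: Path,
    source_connection: str,
    job_id: str,
    file_names: list[str],
    download: Callable[[str, Path], None],
    detect_schema: Callable[[Path], object],
    process: Callable[[Path, object], None],
) -> None:
    """Hypothetical sketch: download each file once and reuse the local copy."""
    temp_dir = job_temp_dir(app_db, source_connection, job_id)
    temp_dir.mkdir(parents=True, exist_ok=True)
    try:
        for name in file_names:
            local_copy = temp_dir / name
            download(name, local_copy)          # the only transfer for this file
            schema = detect_schema(local_copy)  # first use of the local copy
            process(local_copy, schema)         # reused; no second download
    finally:
        # Runs on success and on failure, mirroring the documented cleanup.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

The try/finally is what guarantees cleanup on failure: if processing raises an exception, shutil.rmtree still runs before the exception propagates.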

Ensure that sufficient disk space is available for the downloaded files during job execution.
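Before a large Load Folder job, you can estimate whether the volume that holds the application database has room for the downloaded files. The following Python sketch is a hypothetical pre-flight check, not a Sync feature. It assumes you already know the sizes of the source files and applies an arbitrary 10 percent safety margin. Point it at a directory that already exists, such as the application database root, because the job-specific temp directory may not exist before the job runs.

```python
import shutil
from pathlib import Path


def has_room_for(target_dir: Path, file_sizes: list[int], headroom: float = 0.10) -> bool:
    """Return True if target_dir's volume can hold the files plus a safety margin.

    Hypothetical pre-flight check; headroom is an arbitrary margin, not a Sync setting.
    """
    needed_bytes = sum(file_sizes) * (1 + headroom)
    free_bytes = shutil.disk_usage(str(target_dir)).free
    return free_bytes >= needed_bytes
```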
