
Supported File Formats

When writing data to Azure Blob Storage, you can choose the file format of the exported data. The Azure Blob Storage destination supports the following file formats:
  • (Default) Delta Parquet—A format that adds a Delta Lake storage layer on top of the Parquet file format to support delta processing. With delta processing, only new or modified files are written or read in the runs that follow the initial job run, which can reduce job times and resource use. Limitations:
    • Naming restrictions: Table and column names cannot include special characters or reserved SQL and Delta Lake keywords. Examples of special characters include spaces, commas, semicolons, braces, parentheses, equal signs, and the newline (\n) and tab (\t) characters.
    • Primary keys: Primary key constraints are not supported. The source primary keys are used for incremental replication.
    • Data types: Unlike traditional databases, Delta Lake does not support column-size definitions (for example, VARCHAR(100)). It supports only a fixed set of data types and allows type widening when necessary.
    • Schema changes: The ALTER TABLE command supports only adding new columns. Changing the data type of an existing column (for example, from INT to VARCHAR) is not supported.
    • Delete operations: In standard jobs, both hard and soft deletions are supported. In CDC and enhanced CDC jobs, only soft deletions are supported.
  • Apache Iceberg—A high-performance table format that supports atomicity, consistency, isolation, and durability (ACID) transactions and schema evolution.
  • CSV—Plain-text comma-separated values.
  • Avro—A row-based binary format that supports schema evolution.
  • Parquet—A columnar storage format that is optimized for analytics.
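The following Python sketch illustrates the idea behind delta processing as described above: compare the current file listing with the state recorded after the previous run, and pick out only the new or modified files. It is a conceptual model, not the product's implementation. The `FileInfo` type, the use of last-modified timestamps to detect changes, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical model of one file in a listing."""
    path: str
    last_modified: float  # for example, a POSIX timestamp


def select_changed_files(current, previous_state):
    """Return the files in `current` that are new or modified since the last run.

    `previous_state` maps path -> last_modified as recorded after the previous
    run. On the initial run it is empty, so every file is selected.
    """
    changed = []
    for info in current:
        seen = previous_state.get(info.path)
        # New file (never seen) or a newer modification time than last recorded.
        if seen is None or info.last_modified > seen:
            changed.append(info)
    return changed


def record_state(current):
    """Build the state to compare against on the next run."""
    return {info.path: info.last_modified for info in current}


if __name__ == "__main__":
    first_run = [FileInfo("a.parquet", 100.0), FileInfo("b.parquet", 100.0)]
    print([f.path for f in select_changed_files(first_run, {})])
    state = record_state(first_run)

    second_run = [
        FileInfo("a.parquet", 100.0),  # unchanged
        FileInfo("b.parquet", 150.0),  # modified
        FileInfo("c.parquet", 160.0),  # new
    ]
    print([f.path for f in select_changed_files(second_run, state)])
```

The first call selects both files because no earlier state exists; the second call selects only `b.parquet` and `c.parquet`, skipping the unchanged `a.parquet`.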

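Before a job runs, it can help to check table definitions against the Delta Parquet limitations listed above. The sketch below is hypothetical: it rejects the special characters named in the naming restrictions, checks an illustrative (not authoritative) subset of reserved keywords, and treats any schema change other than adding a column as unsupported. It does not model the type widening that Delta Lake allows, so it may flag some changes that would actually succeed.

```python
# Characters named in the naming restrictions above: space, comma, semicolon,
# braces, parentheses, equal sign, newline, and tab. The real rule may reject more.
FORBIDDEN_CHARS = set(" ,;{}()=\n\t")

# Illustrative subset only. The full list of reserved SQL and Delta Lake
# keywords is an assumption here, not taken from the product documentation.
RESERVED_KEYWORDS = {"SELECT", "FROM", "WHERE", "TABLE", "GROUP", "ORDER"}


def name_problems(name):
    """Return a list of reasons why `name` breaks the naming restrictions."""
    problems = []
    bad = sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
    if bad:
        problems.append(f"contains forbidden characters: {bad!r}")
    if name.upper() in RESERVED_KEYWORDS:
        problems.append("is a reserved keyword")
    return problems


def schema_change_problems(old_schema, new_schema):
    """Return reasons why moving from `old_schema` to `new_schema` is unsupported.

    Schemas map column name -> type name. Adding columns is allowed; removing a
    column or changing its type is reported. Type widening is not modeled.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"column {column!r} was removed")
        elif new_schema[column] != old_type:
            problems.append(
                f"column {column!r} changed type from {old_type} to {new_schema[column]}"
            )
    return problems
```

For example, `name_problems("order id")` reports the space, and `schema_change_problems({"id": "INT"}, {"id": "INT", "email": "STRING"})` returns an empty list because the only change is an added column.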
Authenticate to Azure Blob Storage

After you add the connector, set the following required properties:
  • File Format: Select the file format that you want to use: Delta Parquet (default), Apache Iceberg, CSV, Avro, or Parquet.
  • URI: Enter the path of your container and the name of the blob (for example, azureblob://MyContainer/MyBlob).
  • Azure Storage Account: Enter the name of the Azure storage account to use.
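The URI example above puts the container name directly after `azureblob://`, followed by the blob name. Assuming that layout, and assuming that everything after the first `/` belongs to the blob name (including any folder prefixes), a small hypothetical helper like the following can sanity-check a URI value before you enter it:

```python
PREFIX = "azureblob://"


def parse_azureblob_uri(uri):
    """Split a URI such as azureblob://MyContainer/MyBlob into (container, blob).

    Assumes the first path segment is the container and everything after the
    first "/" is the blob name (which may contain folder prefixes).
    """
    if not uri.startswith(PREFIX):
        raise ValueError(f"expected a URI starting with {PREFIX!r}: {uri!r}")
    container, _, blob = uri[len(PREFIX):].partition("/")
    if not container:
        raise ValueError(f"no container name in {uri!r}")
    return container, blob


if __name__ == "__main__":
    print(parse_azureblob_uri("azureblob://MyContainer/MyBlob"))
```

Run on the documented example, the helper returns `("MyContainer", "MyBlob")`; a URI with no blob part returns an empty blob name.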
Several methods are available for authenticating to Azure Blob Storage, as described in the following sections. Note: All of the authentication methods listed below are available for every file format except Delta Parquet, which supports only the Access Key method.

Azure Active Directory

To connect with an Azure Active Directory (AD) user account, specify the following properties:
  • Auth Scheme: Select AzureAD.
  • Use Lake Formation: Select whether you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. By default, the Enable checkbox is not selected.

Azure Managed Service Identity

Azure Service Principal

Azure Service Principal Certificate

Azure Access Key

Azure Shared Access Signature

Complete Your Connection

To complete your connection:
  1. Specify the following properties.
    For the Delta Parquet and CSV file formats:
    • FMT: Enter the format that you want to use to parse all text files. The default format is CsvDelimited.
    • Aggregate Files: Select whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. By default, the Enable checkbox is not selected.
    • Include Column Headers: Select whether you want to read column headers from the first line of each specified file. By default, the Enable checkbox is selected.
    For the Avro and Parquet file formats:
    • Data Model: Select the data model to use when parsing documents in the selected format and generating the database metadata. The default data model is Document.
    • Aggregate Files: Select whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. By default, the Enable checkbox is not selected.
  2. Define advanced connection settings on the Advanced tab. In most cases, you do not need to change these settings.
  3. If you authenticate with AzureAD, click Connect to Azure Blob Storage to connect to your Azure Blob Storage account.
  4. Click Create & Test to create your connection.
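The Aggregate Files and Include Column Headers options described above can be pictured with the sketch below. It assumes that a CSV file's schema is simply its list of column names, read from the first line when headers are included, and it groups files with identical schemas so that their rows form one table, keeping one group per distinct schema. The function name and the positional column names used when headers are excluded are hypothetical; the product's actual schema detection and naming may differ.

```python
import csv
import io


def aggregate_files(files, include_headers=True):
    """Group CSV files that share the same schema and concatenate their rows.

    files: dict mapping file name -> CSV text.
    Returns a dict mapping a schema (tuple of column names) -> list of row dicts.
    """
    tables = {}
    for name in sorted(files):  # deterministic order by file name
        rows = list(csv.reader(io.StringIO(files[name])))
        if not rows:
            continue  # skip empty files
        if include_headers:
            # The first line supplies the column names.
            header, data = tuple(rows[0]), rows[1:]
        else:
            # Hypothetical positional names; the product's own naming may differ.
            header, data = tuple(f"col{i + 1}" for i in range(len(rows[0]))), rows
        tables.setdefault(header, []).extend(dict(zip(header, row)) for row in data)
    return tables
```

With two files that both start with `id,amount` and a third that starts with `id,text`, the result holds two groups: one containing the rows of the first two files, and one containing the rows of the third.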