Supported File Formats
When data is written to Amazon S3, you can choose the file format for the exported data. The following file formats are supported for the source:
- CSV (default): Plain-text, comma-separated values.
- Avro: A row-based binary format that supports schema evolution.
- Parquet: A columnar storage format that is optimized for analytics.
Add the Amazon S3 Connector
Authenticate to Amazon S3
After you add the connector, set the required properties:
- File Format: Select the file format that you want to use: Delta Parquet, CSV (default), Avro, or Parquet.
  The Delta Parquet and Apache Iceberg file formats are not supported for source connectors. Selecting either of these formats generates an error.
- URI: Enter the path of your bucket and folder (for example, s3://BucketName/FolderName).
- AWS Root Keys (default)
- AWS EC2 Roles
- AWS IAM Roles
- Active Directory Federation Services
- Okta
- PingFederate
- AWS Temporary Credentials
- AWS Credentials File
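The URI format shown above (s3://BucketName/FolderName) can be checked before it is entered. The following is a minimal Python sketch, not part of the connector; the helper name split_s3_uri is hypothetical. It only shows how such a URI breaks down into a bucket name and a folder path.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/folder URI into (bucket, folder)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(
            f"Expected a URI of the form s3://BucketName/FolderName, got {uri!r}"
        )
    # The parsed path keeps its leading slash; strip it so that the
    # folder is expressed relative to the bucket.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, folder = split_s3_uri("s3://BucketName/FolderName")
print(bucket, folder)
```

A value that omits the s3:// prefix or the bucket name raises a ValueError in this sketch, which is a convenient way to catch a malformed path early.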
AWS Root Keys
- (Optional) MFA Serial Number: Enter the serial number for your multifactor authentication (MFA) device, if you are using such a device.
- (Optional) MFA Token: Enter the temporary token that is available from your MFA device.
- Temporary Token Duration: Enter the duration, in seconds, that you want for your temporary credentials. The default duration is 3600.
AWS EC2 Roles
AWS IAM Roles
- (Optional) MFA Serial Number: Enter the serial number for your multifactor authentication (MFA) device, if you are using such a device.
- (Optional) MFA Token: Enter the temporary token that is available from your MFA device.
- Temporary Token Duration: Enter the duration, in seconds, that you want for your temporary credentials. The default duration is 3600.
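To make the Temporary Token Duration setting concrete: a duration of 3600 seconds means the temporary credentials stop being valid one hour after they are issued. The sketch below is illustrative arithmetic only; the function name credentials_expiry is hypothetical and is not part of the connector.

```python
from datetime import datetime, timedelta, timezone

# Default taken from the property description above.
DEFAULT_DURATION_SECONDS = 3600


def credentials_expiry(
    issued_at: datetime, duration_seconds: int = DEFAULT_DURATION_SECONDS
) -> datetime:
    """Return the moment at which temporary credentials expire."""
    if duration_seconds <= 0:
        raise ValueError("The duration must be a positive number of seconds")
    return issued_at + timedelta(seconds=duration_seconds)


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(credentials_expiry(issued))
```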
Active Directory Federation Services
To connect with single sign-on (SSO) via Active Directory Federation Services (ADFS), specify the following properties:
- Auth Scheme: Select ADFS.
- User: Enter the username that you use to authenticate to your ADFS account.
- Password: Enter the password that you use to authenticate to your ADFS account.
- SSO Login URL: Enter the login URL that is used by your SSO provider.
- Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.
- (Optional) SSO Properties: Enter a semicolon-separated list of the single sign-on (SSO) properties that you want to use (for example, SSOProperty1=Value1;SSOProperty2=Value2;…).
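The semicolon-separated Name=Value layout of the SSO Properties value can be assembled or checked programmatically. The following Python sketch assumes only the layout shown in the example above; the function names are hypothetical, and the property names SSOProperty1 and SSOProperty2 are the placeholders from that example, not real settings.

```python
def parse_sso_properties(value: str) -> dict[str, str]:
    """Parse 'Name1=Value1;Name2=Value2' into a dictionary."""
    properties: dict[str, str] = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing or doubled semicolon
        name, separator, setting = pair.partition("=")
        if not separator or not name.strip():
            raise ValueError(f"Expected Name=Value, got {pair!r}")
        properties[name.strip()] = setting.strip()
    return properties


def format_sso_properties(properties: dict[str, str]) -> str:
    """Join a dictionary back into the semicolon-separated form."""
    return ";".join(f"{name}={setting}" for name, setting in properties.items())


parsed = parse_sso_properties("SSOProperty1=Value1;SSOProperty2=Value2")
print(parsed)
print(format_sso_properties(parsed))
```

This sketch does not handle values that themselves contain a semicolon. Whether and how such values can be escaped is not covered here.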
Okta
To connect with single sign-on (SSO) via Okta, specify the following properties:
- Auth Scheme: Select Okta.
- User: Enter the username that you use to authenticate to your Okta account.
- Password: Enter the password that you use to authenticate to your Okta account.
- SSO Login URL: Enter the login URL that is used by your SSO provider.
- Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.
- (Optional) SSO Properties: Enter a semicolon-separated list of the single sign-on (SSO) properties that you want to use (for example, SSOProperty1=Value1;SSOProperty2=Value2;...).
PingFederate
AWS Temporary Credentials
To connect with AWS temporary credentials, specify the following properties:
- Auth Scheme: Select AwsTempCredentials.
- AWS Access Key: Enter the access key that is associated with your Amazon Web Services (AWS) account. This value is accessible from your AWS security credentials page.
- AWS Secret Key: Enter the secret key that is associated with your AWS account. This value is accessible from your AWS security credentials page.
- AWS Session Token: Enter your AWS session token. This token is provided with your temporary credentials. For more information, see AWS Identity and Access Management: User Guide.
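Temporary credentials are only usable as a complete set of access key, secret key, and session token. The sketch below groups the properties listed above and rejects an incomplete set before anything is entered. It is an illustration only: the class name AwsTempCredentialSettings is hypothetical, the dictionary keys simply mirror the labels used on this page, and the values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AwsTempCredentialSettings:
    access_key: str
    secret_key: str
    session_token: str

    def to_properties(self) -> dict[str, str]:
        """Return the settings keyed by the labels used on this page."""
        values = {
            "Auth Scheme": "AwsTempCredentials",
            "AWS Access Key": self.access_key,
            "AWS Secret Key": self.secret_key,
            "AWS Session Token": self.session_token,
        }
        # All three credential parts are required together.
        missing = [label for label, value in values.items() if not value.strip()]
        if missing:
            raise ValueError("Missing required properties: " + ", ".join(missing))
        return values


settings = AwsTempCredentialSettings(
    access_key="EXAMPLE_ACCESS_KEY",        # placeholder value
    secret_key="EXAMPLE_SECRET_KEY",        # placeholder value
    session_token="EXAMPLE_SESSION_TOKEN",  # placeholder value
)
print(settings.to_properties())
```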
AWS Credentials File
Azure Active Directory
To connect with an Azure Active Directory (AD) user account, specify the following properties:
- Auth Scheme: Select AzureAD.
- Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.
- OAuth Client Id: Enter the client ID that you were assigned when you registered your application with an OAuth authorization server.
- OAuth Client Secret: Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.
Complete Your Connection
To complete your connection:
- Specify the following properties:
For all file formats:
- (Optional) Storage Base: Enter the URL of your cloud-storage service provider.
- FMT: Enter the format that you want to use to parse all text files. The default format is CsvDelimited.
- Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.
- Include Column Headers: Specify whether you want to obtain column headers from the first lines of the specified files. The default option is True.
- Data Model: Select the data model that you want to use to parse documents for your format and to generate the database metadata. The default data model is Document.
- Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.
- Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)
- Click Create & Test to create your connection.
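The effect of the Aggregate Files and Include Column Headers options described above can be pictured with a small Python sketch. This is not the connector's implementation. It assumes that "same schema" means an identical header row, and it uses in-memory strings with hypothetical file names in place of files in the URI directory.

```python
import csv
import io
from collections import defaultdict


def aggregate_files(files: dict[str, str]) -> dict[tuple[str, ...], list[list[str]]]:
    """Group CSV documents by header row and concatenate their data rows."""
    tables: dict[tuple[str, ...], list[list[str]]] = defaultdict(list)
    for name in sorted(files):  # process files in a stable order
        rows = list(csv.reader(io.StringIO(files[name])))
        if not rows:
            continue  # skip empty files
        # With column headers enabled, the first row names the columns.
        header, data = tuple(rows[0]), rows[1:]
        tables[header].extend(data)
    return dict(tables)


files = {
    "jan.csv": "id,amount\n1,10\n2,20\n",
    "feb.csv": "id,amount\n3,30\n",
    "notes.csv": "title,body\nhello,world\n",
}
for header, rows in aggregate_files(files).items():
    print(header, rows)
```

In this sketch, jan.csv and feb.csv share a header and end up in one combined table, while notes.csv has a different header and stays separate.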