Azure Data Lake Storage Connection
This documentation is based on version 21.0.8257 of the connector.
The connector leverages the Azure Data Lake Storage API to enable bidirectional access to Azure Data Lake Storage.
The Jitterbit Connector for Azure Data Lake Storage is designed for navigating Azure Data Lake Storage metadata only. A variety of stored procedures* relevant to Azure Data Lake Storage data are supported as well.
This metadata typically includes details about stored objects, such as file and folder names, and excludes the actual content of the discoverable files.
If access to both the file metadata and the actual file content is needed, then the Jitterbit Connector for Azure Data Lake Storage must be used in tandem with the associated file streaming driver(s) for the filetypes stored in Azure Data Lake Storage.
The following file streaming drivers are available:
See the relevant file streaming driver's documentation for a configuration guide for connecting to files stored in Azure Data Lake Storage.
To connect to a Gen 1 DataLakeStorage account, you should first set the following properties:
Gen 1 supports the following authentication methods: Azure Active Directory OAuth (AzureAD) and Managed Service Identity (AzureMSI).
Azure AD is a connection type that goes through OAuth. Set your AuthScheme to AzureAD and see Using OAuth Authentication for an authentication guide.
Azure Service Principal is a connection type that goes through OAuth. Set your AuthScheme to AzureServicePrincipal and see Using Azure Service Principal Authentication for an authentication guide.
If you are running Azure Data Lake Storage on an Azure VM, you can leverage Managed Service Identity (MSI) credentials to connect:
The MSI credentials will then be automatically obtained for authentication.
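As an illustration only, a minimal property set for this scenario might look like the following; the account name is a placeholder and the other values are assumptions based on the properties documented later in this section:
Schema=ADLSGen1;Account=myadlsaccount;AuthScheme=AzureMSI;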
To connect to a Gen 2 DataLakeStorage account, you should first set the following properties:
Gen 2 supports the following authentication methods: using an AccessKey, using a Shared Access Signature, Azure Active Directory OAuth (AzureAD), and Managed Service Identity (AzureMSI).
To connect using an access key, set the AccessKey property and the AuthScheme to AccessKey.
You can obtain an access key for the ADLS Gen2 storage account using the Azure portal:
To connect using a Shared Access Signature, set the SharedAccessSignature property to a valid signature for the resource you want to connect to and the AuthScheme to SAS. The SharedAccessSignature may be generated with a tool such as Azure Storage Explorer.
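For illustration, a Shared Access Signature connection could be sketched as follows; the account, file system, and signature values are placeholders you must replace with your own:
Schema=ADLSGen2;Account=myaccount;FileSystem=myfilesystem;SharedAccessSignature=<your SAS token>;AuthScheme=SAS;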
Azure AD is a connection type that goes through OAuth. Set your AuthScheme to AzureAD and see Using OAuth Authentication for an authentication guide.
Azure Service Principal is a connection type that goes through OAuth. Set your AuthScheme to AzureServicePrincipal and see Using Azure Service Principal Authentication for an authentication guide.
If you are running Azure Data Lake Storage on an Azure VM, you can leverage Managed Service Identity (MSI) credentials to connect:
The MSI credentials will then be automatically obtained for authentication.
OAuth requires the authenticating user to interact with Azure Data Lake Storage using the browser. The connector facilitates this in various ways as described below.
Instead of connecting with the connector's embedded credentials, you can register an app with Custom Credentials to obtain the OAuthClientId and OAuthClientSecret.
See Creating a Custom OAuth App for a procedure.
You can connect without setting any connection properties for your user credentials.
When you connect, the connector opens the OAuth endpoint in your default browser. Log in and grant permissions to the application. The connector then completes the OAuth process.
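A minimal sketch of such a browser-based connection to a Gen 2 account is shown below; the account and file system names are placeholders, and setting InitiateOAuth=GETANDREFRESH is an assumption intended to let the connector manage the token exchange automatically:
Schema=ADLSGen2;Account=myaccount;FileSystem=myfilesystem;AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;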
There are two types of app authentication available: using a client secret and using a certificate. You can use either of them, depending on the configured app authentication.
Follow the steps below to authenticate with the credentials for a custom OAuth app. See Creating a Custom OAuth App.
You are ready to connect after setting one of the connection property groups below, depending on the authentication type.
To create Azure Data Lake Storage data sources on headless servers or other machines on which the connector cannot open a browser, you need to authenticate from another machine. Authentication is a two-step process.
Create a Custom OAuth App
See Creating a Custom OAuth App for a procedure. You can then follow the procedures below to authenticate and connect to data.
Obtain a Verifier Code
On the headless machine, set one of the following property groups, depending on the authentication type:
You can then follow the steps below to authenticate from another machine and obtain the OAuthVerifier connection property.
On the headless machine, set one of the following connection property groups, depending on the authentication type, to obtain the OAuth authentication values:
Connect to Data
After the OAuth settings file is generated, set the following properties to connect to data:
Transfer OAuth Settings
Follow the steps below to install the connector on another machine, authenticate, and then transfer the resulting OAuth values.
On a second machine, install the connector and connect with one of the following property groups, depending on the authentication type:
Test the connection to authenticate. The resulting authentication values are written, encrypted, to the path specified by OAuthSettingsLocation. Once you have successfully tested the connection, copy the OAuth settings file to your headless machine. On the headless machine, set the following connection properties to connect to data:
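As a hedged sketch (the file path, account, and file system are placeholders, and InitiateOAuth=REFRESH is assumed to be the appropriate mode for reusing a stored token), the headless-machine properties might resemble:
InitiateOAuth=REFRESH;OAuthClientId=<your client id>;OAuthClientSecret=<your client secret>;OAuthSettingsLocation=C:\path\to\OAuthSettings.txt;Schema=ADLSGen2;Account=myaccount;FileSystem=myfilesystem;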
Creating a custom OAuth app is optional as the connector is already registered with Azure Data Lake Storage and you can connect with its embedded credentials.
You might want to create a custom OAuth app to change the information displayed when users log into the Azure Data Lake Storage OAuth endpoint to grant permissions to the connector. Follow the steps below to create a custom OAuth app and obtain the connection properties in a specific OAuth authentication flow.
Follow the steps below to obtain the OAuth values for your app, the OAuthClientId and OAuthClientSecret.
Admin consent refers to when the admin of an Azure Active Directory tenant grants permissions to an application that requires an admin to consent to its use case. The embedded app within the Jitterbit Connector for Azure Data Lake Storage contains no permissions that require admin consent. Therefore, this information applies only to custom applications.
When creating a new OAuth app in the Azure Portal, you must specify which permissions the app will require. Some permissions may be marked "Admin Consent Required". For example, all Groups permissions require admin consent. If your app requires admin consent, there are a couple of ways to grant it.
The easiest way to grant admin consent is to have an admin log into portal.azure.com and navigate to the app you have created in App Registrations. Under API Permissions, there is a button for Grant Consent. You can consent here for your app to have permissions on the tenant it was created under. If your organization has multiple tenants or the app needs to be granted permissions for other tenants outside your organization, the GetAdminConsentURL stored procedure* may be used to generate the Admin Authorization URL. Unlike GetOAuthAuthorizationURL, no important information is returned from this endpoint. If the admin grants access, it simply returns a boolean indicating that permissions were granted. Once an admin grants consent, authentication may be performed as normal.
Client credentials refers to a flow in OAuth where no direct user authentication takes place. Instead, credentials are created for the app itself. All tasks taken by the app are done without a default user context. This makes the authentication flow slightly different from the standard flow.
All permissions related to the client OAuth flow require admin consent. This means the app embedded with the Jitterbit Connector for Azure Data Lake Storage cannot be used in the client OAuth flow. You must create your own OAuth app in order to use client credentials. See Creating a Custom OAuth App for more details.
In your App Registration in portal.azure.com, navigate to API Permissions and select the Microsoft Graph permissions. There are two distinct sets of permissions: Delegated and Application permissions. The permissions used during client credential authentication are under Application Permissions. Select the applicable permissions you require for your integration. You are ready to connect after setting one of the connection property groups below, depending on the authentication type. Authentication as an Azure Service Principal is handled via the OAuth client credentials flow and does not involve direct user authentication. Instead, credentials are created for the app itself. All tasks taken by the app are done without a default user context, but based on the assigned roles. The application's access to resources is controlled through the permissions of the assigned roles.
You will need to register an OAuth app to obtain the OAuth property values before connecting to the Azure Data Lake Storage data source. See the Custom Credentials guide for how to set the OAuth properties.
See Creating a Custom OAuth App for a procedure.
Creating a custom OAuth app and a service principal that can access the necessary resources is required when authenticating using an Azure Service Principal.
Follow the steps below to create a custom OAuth app and obtain the connection properties for the Azure Service Principal authentication.
Follow the steps below to obtain the OAuth values for your app.
Follow the steps below to authenticate with the credentials for a custom OAuth app. See Creating a Custom OAuth App.
There are two types of app authentication available: using a client secret and using a certificate. You can use either of them, depending on the configured app authentication.
You are ready to connect after setting one of the connection property groups below, depending on the authentication type.
You can use the following properties to gain more control over the data returned from Azure Data Lake Storage:
This section details a selection of advanced features of the Azure Data Lake Storage connector.
The connector allows you to define virtual tables, called user defined views, whose contents are decided by a pre-configured query. These views are useful when you cannot directly control queries being issued to the drivers. See User Defined Views for an overview of creating and configuring custom views.
Use SSL Configuration to adjust how the connector handles TLS/SSL certificate negotiations. You can choose from various certificate formats; see the SSLServerCert property under "Connection String Options" for more information.
To configure the connector using Private Agent proxy settings, select the Use Proxy Settings checkbox on the connection configuration screen.
The Jitterbit Connector for Azure Data Lake Storage allows you to define a virtual table whose contents are decided by a pre-configured query. These are called User Defined Views, which are useful in situations where you cannot directly control the query being issued to the driver, e.g. when using the driver from Jitterbit. The User Defined Views can be used to define predicates that are always applied. If you specify additional predicates in the query to the view, they are combined with the query already defined as part of the view.
User Defined Views are defined in a JSON-formatted configuration file called UserDefinedViews.json. The connector automatically detects the views specified in this file.
You can also have multiple view definitions and control them using the UserDefinedViews connection property. When you use this property, only the specified views are seen by the connector.
This User Defined View configuration file is formatted as follows. For example:

{
  "MyView": {
    "query": "SELECT * FROM Resources WHERE MyColumn = 'value'"
  },
  "MyView2": {
    "query": "SELECT * FROM MyTable WHERE Id IN (1,2,3)"
  }
}

Use the UserDefinedViews connection property to specify the location of your JSON configuration file. For example:
"UserDefinedViews", "C:\Users\yourusername\Desktop\tmp\UserDefinedViews.json"
For example, consider a user defined view RCustomers defined by the following query:

SELECT * FROM Customers WHERE City = 'Raleigh';

An example of a query to the driver:

SELECT * FROM UserViews.RCustomers WHERE Status = 'Active';

Resulting in the effective query to the source:

SELECT * FROM Customers WHERE City = 'Raleigh' AND Status = 'Active';

That is a very simple example of a query to a User Defined View that is effectively a combination of the view query and the view definition. It is possible to compose these queries in much more complex patterns. All SQL operations are allowed in both queries and are combined when appropriate.
By default, the connector attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store.
To specify another certificate, see the SSLServerCert property for the available formats to do so.
This section shows the available API objects and provides more information on executing SQL to Azure Data Lake Storage APIs.
Views describes the available views. Views are statically defined to model Resources and Permissions.
NOTE: Stored procedures are not currently supported. See the above note for details.
Stored Procedures are function-like interfaces to Azure Data Lake Storage. Stored procedures* allow you to execute operations on Azure Data Lake Storage, including downloading and uploading files and managing directories.
The connector offloads as much of the SELECT statement processing as possible to Azure Data Lake Storage and then processes the rest of the query in memory. API limitations and requirements are also documented in this section.
See SupportEnhancedSQL for more information on how the connector circumvents API limitations with in-memory client-side processing.
Views are composed of columns and pseudo columns. Views are similar to tables in the way that data is represented; however, views do not support updates. Entities that are represented as views are typically read-only entities. Often, a stored procedure* is available to update the data if such functionality is applicable to the data source.
Queries can be executed against a view as if it were a normal table, and the data that comes back is similar in that regard. To find out more about tables and stored procedures, please navigate to their corresponding entries in this help document.
Name | Description |
Permissions | Lists the permissions of the files/file specified in the path. |
Resources | Lists the contents of the supplied path. |
Lists the permissions of the files/file specified in the path.
This will return a list of permissions of all the files and directories in your system. All filters are executed client-side within the connector.
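For example, a query along these lines (the file path is a placeholder, and the FullPath filter is evaluated client-side as noted above) returns the owner, group, and other read flags for a single file:
SELECT FullPath, OwnerRead, GroupRead, OthersRead FROM Permissions WHERE FullPath = '/FirstLevelDir/file1.txt'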
Name | Type | Description |
FullPath [KEY] | String | The full path of the file. |
OwnerRead | Boolean | Whether the owner of this file has read access. |
OwnerWrite | Boolean | Whether the owner of this file has write access. |
OwnerExecute | Boolean | Whether the owner of this file has execute access. |
GroupRead | Boolean | Whether the group this file belongs to has read access. |
GroupWrite | Boolean | Whether the group this file belongs to has write access. |
GroupExecute | Boolean | Whether the group this file belongs to has execute access. |
OthersRead | Boolean | Whether everyone else has read access. |
OthersWrite | Boolean | Whether everyone else has write access. |
OthersExecute | Boolean | Whether everyone else has execute access. |
Lists the contents of the supplied path.
This will return a list of all the files and directories in your system. By default, all subfolders are recursively scanned to list their children. You can configure the depth of subfolders you want to be recursively scanned with the DirectoryRetrievalDepth property. All filters are executed client-side within the connector.
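As an example, the following query (a sketch with a client-side filter) returns only the files, along with their size and last modification time:
SELECT FullPath, Length, ModificationTime FROM Resources WHERE Type = 'FILE'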
Name | Type | Description |
PathSuffix [KEY] | String | Name of the file/folder. |
FullPath | String | Full path to the file/folder. |
Owner | String | The owner identifier. |
Group | String | The name of a group. |
Length | Long | The size of the file. |
Permission | String | Permissions set to the file/folder. |
Replication | Integer | The number of replications of a file. |
BlockSize | Long | The block size of a file. |
ModificationTime | Datetime | The timestamp when the file/folder was modified for the last time. |
AccessTime | Datetime | The access time of a file/directory. |
Type | String | The type of the resource: FILE or FOLDER. |
NOTE: Stored procedures are not currently supported. See the above note for details.
Stored procedures* are available to complement the data available from the Data Model. It may be necessary to update data available from a view using a stored procedure* because the data does not provide for direct, table-like, two-way updates. In these situations, the retrieval of the data is done using the appropriate view or table, while the update is done by calling a stored procedure. Stored procedures* take a list of parameters and return back a dataset that contains the collection of tuples that constitute the response.
Name | Description |
AppendToFile | Create and Write to a File. |
Concat | Concatenate a group of files to another file. |
DeleteFile | Delete a file or a directory. |
DownloadFile | Open and read a file. |
GetAdminConsentURL | Gets the admin consent URL that must be opened separately by an admin of a given domain to grant access to your application. Only needed when using custom OAuth credentials. |
GetContentSummary | Get the content summary of a file/folder. |
GetOAuthAccessToken | Gets an authentication token from Azure DataLakeStorage. |
GetOAuthAuthorizationURL | Gets an authorization URL from the data source. The authorization URL can be used to generate a verifier required to obtain the OAuth token. |
ListStatus | Lists the contents of the supplied path. |
MakeDirectory | Create a directory in the specified path. |
RefreshOAuthAccessToken | Refreshes the OAuth access token used for authentication. |
RenameFile | Rename a file or a directory. |
SetOwner | Set owner and group of a path. |
SetPermission | Set permission of a path. |
UploadFile | Create and Write to a File. |
Create and Write to a File.
Name | Type | Required | Description |
Path | String | False | The absolute path of the file to which content will be appended. |
Content | String | False | The content which will be appended to the specified file. Has lower priority than FilePath. |
FilePath | String | False | The path of the file whose content will be appended to the specified file. Has higher priority than Content. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
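A hypothetical call that appends inline content to an existing file (the path and content are placeholders) might look like this:
EXEC AppendToFile Path = '/FirstLevelDir/log.txt', Content = 'new log entry'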
Concatenate a group of files to another file.
Name | Type | Required | Description |
Path | String | False | The path that will be concatenated with the other paths/sources. |
Sources | String | True | A comma separated list of paths/sources. These will be joined to the Path input. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
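An illustrative call, with placeholder paths, that joins two source files onto a target file:
EXEC Concat Path = '/FirstLevelDir/combined.txt', Sources = '/FirstLevelDir/part1.txt,/FirstLevelDir/part2.txt'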
Delete a file or a directory.
Name | Type | Required | Description |
Path | String | False | The path (file or folder) which will be deleted. |
Recursive | Boolean | False | If the path to be deleted is a folder, whether all children should be deleted as well. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
Open and read a file.
Name | Type | Required | Description |
Path | String | False | The path of the file which will be opened. |
Offset | Integer | False | The offset from which the reading will start. |
Length | Integer | False | The amount of data to read from the file. |
BufferSize | Integer | False | The internal size of the buffer used for reading the file. |
WriteToFile | String | False | The local location of the file where the output will be written to. If not set, the output will be displayed. |
Name | Type | Description |
Output | String | The content of the file, returned when WriteToFile is not set. |
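For example, a call such as the following (both paths are placeholders) reads a remote file and writes it to a local file instead of returning the content inline:
EXEC DownloadFile Path = '/FirstLevelDir/report.csv', WriteToFile = 'C:/temp/report.csv'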
Gets the admin consent URL that must be opened separately by an admin of a given domain to grant access to your application. Only needed when using custom OAuth credentials.
Name | Type | Required | Description |
CallbackUrl | String | False | The URL the user will be redirected to after authorizing your application. This value must match the Reply URL in the Azure AD app settings. |
State | String | False | The same value for state that you sent when you requested the authorization code. |
Name | Type | Description |
URL | String | The authorization URL, entered into a Web browser to obtain the verifier token and authorize your app. |
Get the content summary of a file/folder.
Name | Type | Required | Description |
Path | String | False | The absolute path of the file/folder whose content summary will be returned. |
Name | Type | Description |
DirectoryCount | Int | The number of directories in this folder. |
FileCount | Int | The number of files in this folder. |
Length | Long | The length of the folder/file. |
Quota | Integer | The quota of the folder/file. |
SpaceConsumed | Integer | The amount of space consumed by this folder/file. |
SpaceQuota | Integer | The space quota of the folder/file. |
Gets an authentication token from Azure DataLakeStorage.
Name | Type | Required | Description |
AuthMode | String | False | The type of authentication mode to use. Select App for getting authentication tokens via a desktop app. Select Web for getting authentication tokens via a Web app.
The allowed values are APP, WEB. The default value is APP. |
Verifier | String | False | A verifier returned by the service that must be input to return the access token. Needed only when using the Web auth mode. Obtained by navigating to the URL returned in GetOAuthAuthorizationUrl. |
CallbackUrl | String | False | The URL the user will be redirected to after authorizing your application. |
Prompt | String | False | Defaults to 'select_account', which prompts the user to select an account while authenticating. Set to 'None' for no prompt, 'login' to force the user to enter their credentials, or 'consent' to trigger the OAuth consent dialog after the user signs in, asking the user to grant permissions to the app. |
Name | Type | Description |
OAuthAccessToken | String | The OAuth token. |
OAuthRefreshToken | String | The OAuth refresh token. |
ExpiresIn | String | The remaining lifetime for the access token in seconds. |
Gets an authorization URL from the data source. The authorization URL can be used to generate a verifier required to obtain the OAuth token.
Name | Type | Required | Description |
CallbackURL | String | False | The URL the user will be redirected to after authorizing your application. |
State | String | False | This field indicates any state that may be useful to your application upon receipt of the response. Your application receives the same value it sent, as this parameter makes a round trip to the authorization server and back. Uses include redirecting the user to the correct resource in your site, using nonces, and mitigating cross-site request forgery. |
Prompt | String | False | Defaults to 'select_account', which prompts the user to select an account while authenticating. Set to 'None' for no prompt, 'login' to force the user to enter their credentials, or 'consent' to trigger the OAuth consent dialog after the user signs in, asking the user to grant permissions to the app. |
Name | Type | Description |
URL | String | The authorization URL that will need to be opened for the user to authorize your app. |
Lists the contents of the supplied path.
Name | Type | Required | Description |
Path | String | False |
Name | Type | Description |
PathSuffix | String | |
Owner | String | |
Group | String | |
Length | Long | |
Permission | String | |
Replication | Integer | |
BlockSize | Long | |
ModificationTime | Datetime | |
AccessTime | Datetime | |
Type | String |
Create a directory in the specified path.
Name | Type | Required | Description |
Path | String | False | The path of the new directory which will be created. |
Permission | String | False | The permission of the new directory. If no permissions are specified, the newly created directory has 755 permissions by default. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
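A sketch of a call that creates a directory with explicit permissions (the path and permission value are placeholders):
EXEC MakeDirectory Path = '/FirstLevelDir/NewDir', Permission = '755'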
Refreshes the OAuth access token used for authentication.
Name | Type | Required | Description |
OAuthRefreshToken | String | False | The refresh token returned from the original authorization code exchange. |
Name | Type | Description |
OAuthAccessToken | String | The authentication token returned. This can be used in subsequent calls to other operations for this particular service. |
OAuthRefreshToken | String | A token that may be used to obtain a new access token. |
ExpiresIn | String | The remaining lifetime on the access token. |
Rename a file or a directory.
Name | Type | Required | Description |
Path | String | False | The path which will be renamed. |
Destination | String | True | The new path for the renamed file/folder. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
Set owner and group of a path.
Name | Type | Required | Description |
Path | String | False | The path whose owner/group will be changed. |
Owner | String | False | The new owner. |
Group | String | False | The new group. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
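An illustrative call with placeholder values that changes only the owner; the Group parameter can be supplied in the same way:
EXEC SetOwner Path = '/FirstLevelDir/file1.txt', Owner = 'newOwner'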
Set permission of a path.
Name | Type | Required | Description |
Path | String | False | The path whose permissions will be changed. |
Permission | String | True | Unix permissions in an octal (base-8) notation. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
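For instance, the following call (placeholder path) grants read/write/execute access to the owner and read/execute access to the group and others using octal notation:
EXEC SetPermission Path = '/FirstLevelDir/file1.txt', Permission = '755'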
Create and Write to a File.
Name | Type | Required | Description |
Path | String | False | The absolute path of the file which will be created. |
Overwrite | Boolean | False | If set to true, the file will be overwritten. |
Permission | String | False | The permissions which will be set for the created file. |
Content | String | False | The content which will be written to the newly created file. Has lower priority than FilePath. |
FilePath | String | False | The path of the file whose content will be written to the newly created file. Has higher priority than Content. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
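A hypothetical call that creates a remote file from a local source file (both paths are placeholders):
EXEC UploadFile Path = '/FirstLevelDir/upload.txt', FilePath = 'C:/temp/upload.txt', Overwrite = true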
This section shows the available API objects and provides more information on executing SQL to Azure Data Lake Storage APIs.
Views describes the available views. Views are statically defined to model Resources.
NOTE: Stored procedures are not currently supported. See the above note for details.
Stored Procedures are function-like interfaces to Azure Data Lake Storage. Stored procedures* allow you to execute operations on Azure Data Lake Storage, including downloading and uploading files and managing directories.
The connector offloads as much of the SELECT statement processing as possible to Azure Data Lake Storage and then processes the rest of the query in memory. API limitations and requirements are also documented in this section.
See SupportEnhancedSQL for more information on how the connector circumvents API limitations with in-memory client-side processing.
Views are composed of columns and pseudo columns. Views are similar to tables in the way that data is represented; however, views do not support updates. Entities that are represented as views are typically read-only entities. Often, a stored procedure* is available to update the data if such functionality is applicable to the data source.
Queries can be executed against a view as if it were a normal table, and the data that comes back is similar in that regard. To find out more about tables and stored procedures, please navigate to their corresponding entries in this help document.
Name | Description |
Resources | Lists the contents of the supplied path. |
Lists the contents of the supplied path.
This will return a list of all the files and directories in your system. By default, all the files and folders of the first level will be retrieved. You can configure the connector to read all files and folders recursively by setting the IncludeSubDirectories property to true. All filters are executed client-side within the connector.
A simple query such as SELECT * FROM Resources will have different results with different combinations of the IncludeSubDirectories and Directory connection properties. You can alternatively specify the Directory column in the WHERE clause conditions to list the files in a specific directory.
For example, the below query will list the files located in the first level of the 'SecondLevelDir1' directory.
SELECT * FROM Resources WHERE Directory='FirstLevelDir/SecondLevelDir1'
You can also make use of the IN operator, in order to list the files located in the first levels of multiple specified directories.
SELECT * FROM Resources WHERE Directory IN ('FirstLevelDir', 'FirstLevelDir/SecondLevelDir1', 'FirstLevelDir/SecondLevelDir2')
Note: The connector will use the Azure Data Lake Storage API to process WHERE clause conditions built with the Directory column and the '=' operator. The rest of the filter is executed client-side within the connector.
Name | Type | Description |
Name [KEY] | String | The path of the file or folder. |
Directory | String | The directory path of the file or folder. |
IsDirectory | String | Determines if the resource is a folder or a file. |
ContentLength | Long | Determines the size of the file in bytes. |
LastModified | Timestamp | The timestamp when the file was modified for the last time. |
Owner | String | The name of the owner. |
Permissions | String | The permissions set to the file. |
ETag | String | Unique identifier of the file or folder. |
Pseudo column fields are used in the WHERE clause of SELECT statements and offer a more granular control over the tuples that are returned from the data source.
Name | Type | Description |
Recursive | Boolean | Set this to true to retrieve all sub folders and files.
The default value is false. |
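For example, the following query (a sketch; the exact literal form of the pseudo column value may vary) lists the full tree beneath the configured directory:
SELECT * FROM Resources WHERE Recursive = true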
NOTE: Stored procedures are not currently supported. See the above note for details.
Stored procedures* are available to complement the data available from the Data Model. It may be necessary to update data available from a view using a stored procedure* because the data does not provide for direct, table-like, two-way updates. In these situations, the retrieval of the data is done using the appropriate view or table, while the update is done by calling a stored procedure. Stored procedures* take a list of parameters and return back a dataset that contains the collection of tuples that constitute the response.
Name | Description |
CopyFile | Copy a file from a source to a destination. |
CreateFile | Create a file. Path is required for a successful operation. |
CreateFolder | Create a folder. Path is required for a successful operation. |
DeleteObject | Delete a file or a folder. Path is required for a successful operation. |
DownloadFile | Download a file. Path and DownloadPath are required for a successful operation. |
GetOAuthAccessToken | Gets the OAuth access token from Azure Data Lake Storage. |
GetOAuthAuthorizationURL | Gets the Azure Data Lake Storage authorization URL. Access the URL returned in the output in a Web browser. This requests the access token that can be used as part of the connection string to Azure Data Lake Storage. |
RefreshOAuthAccessToken | Refreshes the OAuth access token used for authentication with Azure Data Lake Storage. |
RenameObject | Rename a file or a directory. Path and RenameTo are required for a successful operation. |
UploadFile | Upload a File. Path and FilePath are required for a successful operation. |
Copy a file from a source to a destination.
To copy the file named Guidelines.txt, located in the root of the container, to the folder Sales with the new name 'Guidelines renamed.txt', use the stored procedure* like this:
EXEC CopyFile SourcePath=[Guidelines.txt], DestinationPath=[Sales/Guidelines renamed.txt]
Name | Type | Description |
SourcePath | String | Required. The path of the file which will be copied. |
DestinationPath | String | Required. The destination path where the file will be copied. |
Name | Type | Description |
Success | String | Determines if the operation was successful or not. |
Create a file. Path is required for a successful operation.
Name | Type | Description |
Path | String | The path of the file which will be created. |
Name | Type | Description |
Success | String | Determines if the operation was successful or not. |
Create a folder. Path is required for a successful operation.
Name | Type | Description |
Path | String | The path of the folder which will be created. |
Name | Type | Description |
Success | String | Determines if the operation was successful or not. |
Delete a file or a folder. Path is required for a successful operation.
EXEC DeleteObject Path=[directory1/file1.txt]
EXEC DeleteObject Path=[directory1], Recursive=true
EXEC DeleteObject Path=[directory1], DeleteType=[FILES]
If you execute:
EXEC DeleteObject Path=[directory1], DeleteType=[FILES], Recursive=true
then it would have the same effect as:
EXEC DeleteObject Path=[directory1], Recursive=true
but it is also slower, so it is not recommended.
Name | Type | Description |
Path | String | The path of the file or folder to be deleted. |
Recursive | Boolean | Set this to true to delete a folder with all of it content.
The default value is false. |
DeleteType | String | Set this to FILESANDFOLDERS to delete the file or folder specified in the Path parameter. Set this to FILES to delete only the files inside the folder specified in the Path parameter.
The allowed values are FILES, FILESANDFOLDERS. The default value is FILESANDFOLDERS. |
Name | Type | Description |
Success | String | Determines if the operation was successful or not. |
Download a file. Path and DownloadPath are required for a successful operation.
Name | Type | Description |
Path | String | The path of the file which will be downloaded. |
DownloadPath | String | The path where the file will be downloaded. |
Name | Type | Description |
Success | String | Determines if the operation was successful or not. |
Gets the OAuth access token from Azure Data Lake Storage.
Name | Type | Description |
AuthMode | String | The type of authentication mode to use. The allowed values are APP, WEB. |
Verifier | String | The verifier token returned by Azure Data Lake Storage after using the URL obtained with GetOAuthAuthorizationURL. Required only for the Web AuthMode. |
CallbackUrl | String | The URL the user will be redirected to after authorizing your application. |
State | String | Any value that you wish to be sent with the callback. |
Prompt | String | Defaults to 'select_account', which prompts the user to select an account while authenticating. Set to 'None' for no prompt, 'login' to force the user to enter their credentials, or 'consent' to trigger the OAuth consent dialog after the user signs in, asking the user to grant permissions to the app. |
Name | Type | Description |
OAuthAccessToken | String | The authentication token returned from Azure Data Lake Storage. |
OAuthRefreshToken | String | A token that may be used to obtain a new access token. |
ExpiresIn | String | The remaining lifetime for the access token in seconds. |
Gets the Azure Data Lake Storage authorization URL. Access the URL returned in the output in a Web browser. This requests the access token that can be used as part of the connection string to Azure Data Lake Storage.
Name | Type | Description |
CallbackUrl | String | The URL the user will be redirected to after authorizing your application. |
State | String | Any value that you wish to be sent with the callback. |
Prompt | String | Defaults to 'select_account', which prompts the user to select an account while authenticating. Set to 'None' for no prompt, 'login' to force the user to enter their credentials, or 'consent' to trigger the OAuth consent dialog after the user signs in, asking the user to grant permissions to the app. |
Name | Type | Description |
URL | String | The URL to be entered into a Web browser to obtain the verifier token and authorize the data provider with. |
Refreshes the OAuth access token used for authentication with Azure Data Lake Storage.
Name | Type | Description |
OAuthRefreshToken | String | The old token to be refreshed. |
Name | Type | Description |
OAuthAccessToken | String | The authentication token returned from Azure Data Lake Storage. |
ExpiresIn | String | The remaining lifetime on the access token. |
Rename a file or a directory. Path and RenameTo are required for a successful operation.
Name | Type | Description |
Path | String | The path which will be renamed. |
RenameTo | String | The new name of the file/folder. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
Upload a File. Path and FilePath are required for a successful operation.
To upload a single file, specify a source and destination file as parameter values. For example:
EXEC UploadFile Path='destination/path/name.txt', FilePath='source/path/name.txt'
To upload multiple files, specify a source and destination folder to upload all the files inside that folder. Use the MaxThreads property to control how many files will be uploaded in parallel at a time. For example:
EXEC UploadFile Path='destination/path', FilePath='source/path'
Increase the Timeout value if you are uploading large files.
Name | Type | Description |
Path | String | Set this to the destination path where the file will be uploaded. |
FilePath | String | Set this to the path of the file which will be uploaded. Example: C:/Users/User/Desktop/SampleUploadtest.txt. |
Name | Type | Description |
Success | Boolean | Whether the operation completed successfully or not. |
You can query the system tables described in this section to access schema information, information on data source functionality, and batch operation statistics.
The following tables return database metadata for Azure Data Lake Storage:
The following tables return information about how to connect to and query the data source:
The following table returns query statistics for data modification queries:
Lists the available databases.
The following query retrieves all databases determined by the connection string:
SELECT * FROM sys_catalogs
Name | Type | Description |
CatalogName | String | The database name. |
Lists the available schemas.
The following query retrieves all available schemas:
SELECT * FROM sys_schemas
Name | Type | Description |
CatalogName | String | The database name. |
SchemaName | String | The schema name. |
Lists the available tables.
The following query retrieves the available tables and views:
SELECT * FROM sys_tables
Name | Type | Description |
CatalogName | String | The database containing the table or view. |
SchemaName | String | The schema containing the table or view. |
TableName | String | The name of the table or view. |
TableType | String | The table type (table or view). |
Description | String | A description of the table or view. |
IsUpdateable | Boolean | Whether the table can be updated. |
Describes the columns of the available tables and views.
The following query returns the columns and data types for the Resources table:
SELECT ColumnName, DataTypeName FROM sys_tablecolumns WHERE TableName='Resources'
Name | Type | Description |
CatalogName | String | The name of the database containing the table or view. |
SchemaName | String | The schema containing the table or view. |
TableName | String | The name of the table or view containing the column. |
ColumnName | String | The column name. |
DataTypeName | String | The data type name. |
DataType | Int32 | An integer indicating the data type. This value is determined at run time based on the environment. |
Length | Int32 | The storage size of the column. |
DisplaySize | Int32 | The designated column's normal maximum width in characters. |
NumericPrecision | Int32 | The maximum number of digits in numeric data. The column length in characters for character and date-time data. |
NumericScale | Int32 | The column scale or number of digits to the right of the decimal point. |
IsNullable | Boolean | Whether the column can contain null. |
Description | String | A brief description of the column. |
Ordinal | Int32 | The sequence number of the column. |
IsAutoIncrement | String | Whether the column value is assigned in fixed increments. |
IsGeneratedColumn | String | Whether the column is generated. |
IsHidden | Boolean | Whether the column is hidden. |
IsArray | Boolean | Whether the column is an array. |
Lists the available stored procedures.
The following query retrieves the available stored procedures:
SELECT * FROM sys_procedures
Name | Type | Description |
CatalogName | String | The database containing the stored procedure. |
SchemaName | String | The schema containing the stored procedure. |
ProcedureName | String | The name of the stored procedure. |
Description | String | A description of the stored procedure. |
ProcedureType | String | The type of the procedure, such as PROCEDURE or FUNCTION. |
Describes stored procedure* parameters.
The following query returns information about all of the input parameters for the DownloadFile stored procedure:
SELECT * FROM sys_procedureparameters WHERE ProcedureName='DownloadFile' AND (Direction=1 OR Direction=2)
Name | Type | Description |
CatalogName | String | The name of the database containing the stored procedure. |
SchemaName | String | The name of the schema containing the stored procedure. |
ProcedureName | String | The name of the stored procedure* containing the parameter. |
ColumnName | String | The name of the stored procedure* parameter. |
Direction | Int32 | An integer corresponding to the type of the parameter: input (1), input/output (2), or output (4). Input/output type parameters can be both input and output parameters. |
DataTypeName | String | The name of the data type. |
DataType | Int32 | An integer indicating the data type. This value is determined at run time based on the environment. |
Length | Int32 | The number of characters allowed for character data. The number of digits allowed for numeric data. |
NumericPrecision | Int32 | The maximum precision for numeric data. The column length in characters for character and date-time data. |
NumericScale | Int32 | The number of digits to the right of the decimal point in numeric data. |
IsNullable | Boolean | Whether the parameter can contain null. |
IsRequired | Boolean | Whether the parameter is required for execution of the procedure. |
IsArray | Boolean | Whether the parameter is an array. |
Description | String | The description of the parameter. |
Ordinal | Int32 | The index of the parameter. |
Describes the primary and foreign keys. The following query retrieves the primary key for the Resources table:
SELECT * FROM sys_keycolumns WHERE IsKey='True' AND TableName='Resources'
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
IsKey | Boolean | Whether the column is a primary key in the table referenced in the TableName field. |
IsForeignKey | Boolean | Whether the column is a foreign key referenced in the TableName field. |
PrimaryKeyName | String | The name of the primary key. |
ForeignKeyName | String | The name of the foreign key. |
ReferencedCatalogName | String | The database containing the primary key. |
ReferencedSchemaName | String | The schema containing the primary key. |
ReferencedTableName | String | The table containing the primary key. |
ReferencedColumnName | String | The column name of the primary key. |
Describes the foreign keys. The following query retrieves all foreign keys which refer to other tables:
SELECT * FROM sys_foreignkeys WHERE ForeignKeyType = 'FOREIGNKEY_TYPE_IMPORT'
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
PrimaryKeyName | String | The name of the primary key. |
ForeignKeyName | String | The name of the foreign key. |
ReferencedCatalogName | String | The database containing the primary key. |
ReferencedSchemaName | String | The schema containing the primary key. |
ReferencedTableName | String | The table containing the primary key. |
ReferencedColumnName | String | The column name of the primary key. |
ForeignKeyType | String | Designates whether the foreign key is an import (points to other tables) or export (referenced from other tables) key. |
Describes the available indexes. By filtering on indexes, you can write more selective queries with faster query response times.
The following query retrieves all indexes that are not primary keys:
SELECT * FROM sys_indexes WHERE IsPrimary='false'
Name | Type | Description |
CatalogName | String | The name of the database containing the index. |
SchemaName | String | The name of the schema containing the index. |
TableName | String | The name of the table containing the index. |
IndexName | String | The index name. |
ColumnName | String | The name of the column associated with the index. |
IsUnique | Boolean | True if the index is unique. False otherwise. |
IsPrimary | Boolean | True if the index is a primary key. False otherwise. |
Type | Int16 | An integer value corresponding to the index type: statistic (0), clustered (1), hashed (2), or other (3). |
SortOrder | String | The sort order: A for ascending or D for descending. |
OrdinalPosition | Int16 | The sequence number of the column in the index. |
Returns information on the available connection properties and those set in the connection string.
When querying this table, the config connection string should be used:
jdbc:cdata:adls:config:
This connection string enables you to query this table without a valid connection.
The following query retrieves all connection properties that have been set in the connection string or set through a default value:
SELECT * FROM sys_connection_props WHERE Value <> ''
Name | Type | Description |
Name | String | The name of the connection property. |
ShortDescription | String | A brief description. |
Type | String | The data type of the connection property. |
Default | String | The default value if one is not explicitly set. |
Values | String | A comma-separated list of possible values. A validation error is thrown if another value is specified. |
Value | String | The value you set or a preconfigured default. |
Required | Boolean | Whether the property is required to connect. |
Category | String | The category of the connection property. |
IsSessionProperty | String | Whether the property is a session property, used to save information about the current connection. |
Sensitivity | String | The sensitivity level of the property. This informs whether the property is obfuscated in logging and authentication forms. |
PropertyName | String | A camel-cased truncated form of the connection property name. |
Ordinal | Int32 | The index of the parameter. |
CatOrdinal | Int32 | The index of the parameter category. |
Hierarchy | String | Shows the dependent properties that need to be set alongside this one. |
Visible | Boolean | Informs whether the property is visible in the connection UI. |
ETC | String | Various miscellaneous information about the property. |
Describes the SELECT query processing that the connector can offload to the data source.
When working with data sources that do not support SQL-92, you can query the sys_sqlinfo view to determine the query capabilities of the underlying APIs, expressed in SQL syntax. The connector offloads as much of the SELECT statement processing as possible to the server and then processes the rest of the query in memory.
Below is an example data set of SQL capabilities. The following result set indicates the SELECT functionality that the connector can offload to the data source or process client side. Your data source may support additional SQL syntax. Some aspects of SELECT functionality are returned in a comma-separated list if supported; otherwise, the column contains NO.
Name | Description | Possible Values |
AGGREGATE_FUNCTIONS | Supported aggregation functions. | AVG, COUNT, MAX, MIN, SUM, DISTINCT |
COUNT | Whether the COUNT function is supported. | YES, NO |
IDENTIFIER_QUOTE_OPEN_CHAR | The opening character used to escape an identifier. | [ |
IDENTIFIER_QUOTE_CLOSE_CHAR | The closing character used to escape an identifier. | ] |
SUPPORTED_OPERATORS | A list of supported SQL operators. | =, >, <, >=, <=, <>, !=, LIKE, NOT LIKE, IN, NOT IN, IS NULL, IS NOT NULL, AND, OR |
GROUP_BY | Whether GROUP BY is supported, and, if so, the degree of support. | NO, NO_RELATION, EQUALS_SELECT, SQL_GB_COLLATE |
STRING_FUNCTIONS | Supported string functions. | LENGTH, CHAR, LOCATE, REPLACE, SUBSTRING, RTRIM, LTRIM, RIGHT, LEFT, UCASE, SPACE, SOUNDEX, LCASE, CONCAT, ASCII, REPEAT, OCTET, BIT, POSITION, INSERT, TRIM, UPPER, REGEXP, LOWER, DIFFERENCE, CHARACTER, SUBSTR, STR, REVERSE, PLAN, UUIDTOSTR, TRANSLATE, TRAILING, TO, STUFF, STRTOUUID, STRING, SPLIT, SORTKEY, SIMILAR, REPLICATE, PATINDEX, LPAD, LEN, LEADING, KEY, INSTR, INSERTSTR, HTML, GRAPHICAL, CONVERT, COLLATION, CHARINDEX, BYTE |
NUMERIC_FUNCTIONS | Supported numeric functions. | ABS, ACOS, ASIN, ATAN, ATAN2, CEILING, COS, COT, EXP, FLOOR, LOG, MOD, SIGN, SIN, SQRT, TAN, PI, RAND, DEGREES, LOG10, POWER, RADIANS, ROUND, TRUNCATE |
TIMEDATE_FUNCTIONS | Supported date/time functions. | NOW, CURDATE, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, MONTH, QUARTER, WEEK, YEAR, CURTIME, HOUR, MINUTE, SECOND, TIMESTAMPADD, TIMESTAMPDIFF, DAYNAME, MONTHNAME, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT |
REPLICATION_SKIP_TABLES | Indicates tables skipped during replication. | |
REPLICATION_TIMECHECK_COLUMNS | A string array containing a list of columns which will be checked (in the given order) for use as a modified column during replication. |
IDENTIFIER_PATTERN | String value indicating what string is valid for an identifier. | |
SUPPORT_TRANSACTION | Indicates if the provider supports transactions such as commit and rollback. | YES, NO |
DIALECT | Indicates the SQL dialect to use. | |
KEY_PROPERTIES | Indicates the properties which identify the uniform database. | |
SUPPORTS_MULTIPLE_SCHEMAS | Indicates if multiple schemas may exist for the provider. | YES, NO |
SUPPORTS_MULTIPLE_CATALOGS | Indicates if multiple catalogs may exist for the provider. | YES, NO |
DATASYNCVERSION | The Data Sync version needed to access this driver. | Standard, Starter, Professional, Enterprise |
DATASYNCCATEGORY | The Data Sync category of this driver. | Source, Destination, Cloud Destination |
SUPPORTSENHANCEDSQL | Whether enhanced SQL functionality beyond what is offered by the API is supported. | TRUE, FALSE |
SUPPORTS_BATCH_OPERATIONS | Whether batch operations are supported. | YES, NO |
SQL_CAP | All supported SQL capabilities for this driver. | SELECT, INSERT, DELETE, UPDATE, TRANSACTIONS, ORDERBY, OAUTH, ASSIGNEDID, LIMIT, LIKE, BULKINSERT, COUNT, BULKDELETE, BULKUPDATE, GROUPBY, HAVING, AGGS, OFFSET, REPLICATE, COUNTDISTINCT, JOINS, DROP, CREATE, DISTINCT, INNERJOINS, SUBQUERIES, ALTER, MULTIPLESCHEMAS, GROUPBYNORELATION, OUTERJOINS, UNIONALL, UNION, UPSERT, GETDELETED, CROSSJOINS, GROUPBYCOLLATE, MULTIPLECATS, FULLOUTERJOIN, MERGE, JSONEXTRACT, BULKUPSERT, SUM, SUBQUERIESFULL, MIN, MAX, JOINSFULL, XMLEXTRACT, AVG, MULTISTATEMENTS, FOREIGNKEYS, CASE, LEFTJOINS, COMMAJOINS, WITH, LITERALS, RENAME, NESTEDTABLES, EXECUTE, BATCH, BASIC, INDEX |
PREFERRED_CACHE_OPTIONS | A string value that specifies the preferred cacheOptions. |
ENABLE_EF_ADVANCED_QUERY | Indicates if the driver directly supports advanced queries coming from Entity Framework. If not, queries will be handled client side. | YES, NO |
PSEUDO_COLUMNS | A string array indicating the available pseudo columns. | |
MERGE_ALWAYS | If the value is TRUE, Merge Mode is forcibly executed in Data Sync. | TRUE, FALSE |
REPLICATION_MIN_DATE_QUERY | A select query to return the replicate start datetime. | |
REPLICATION_MIN_FUNCTION | Allows a provider to specify the formula name to use for executing a server side min. | |
REPLICATION_START_DATE | Allows a provider to specify a replicate startdate. | |
REPLICATION_MAX_DATE_QUERY | A select query to return the replicate end datetime. | |
REPLICATION_MAX_FUNCTION | Allows a provider to specify the formula name to use for executing a server side max. | |
IGNORE_INTERVALS_ON_INITIAL_REPLICATE | A list of tables which will skip dividing the replicate into chunks on the initial replicate. | |
CHECKCACHE_USE_PARENTID | Indicates whether the CheckCache statement should be done against the parent key column. | TRUE, FALSE |
CREATE_SCHEMA_PROCEDURES | Indicates stored procedures* that can be used for generating schema files. |
The following query retrieves the operators that can be used in the WHERE clause:
SELECT * FROM sys_sqlinfo WHERE Name = 'SUPPORTED_OPERATORS'
Note that individual tables may have different limitations or requirements on the WHERE clause; refer to the Data Model section for more information.
Name | Type | Description |
NAME | String | A component of SQL syntax, or a capability that can be processed on the server. |
VALUE | String | Detail on the supported SQL or SQL syntax. |
Returns information about attempted modifications.
The following query retrieves the Ids of the modified rows in a batch operation:
SELECT * FROM sys_identity
Name | Type | Description |
Id | String | The database-generated ID returned from a data modification operation. |
Batch | String | An identifier for the batch. 1 for a single operation. |
Operation | String | The result of the operation in the batch: INSERTED, UPDATED, or DELETED. |
Message | String | SUCCESS or an error message if the update in the batch failed. |
The advanced configuration properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure. Click the links for further details.
Property | Description |
AuthScheme | The type of authentication to use when connecting to Azure Data Lake Storage. |
Schema | The schema to use: ADLSGen1 or ADLSGen2. |
Account | This property specifies the name of the Azure Data Lake storage account. |
AccessKey | Your Azure DataLakeStorage Gen 2 storage account access key. |
FileSystem | This property specifies the name of the FileSystem which will be used in a Gen 2 storage account. For example, the name of your Azure Blob container. |
SharedAccessSignature | A shared access key signature that may be used for authentication. |
Property | Description |
Directory | This property specifies the root path of Azure Data Lake Storage to list files and folders. |
DirectoryRetrievalDepth | Limits the depth of subfolders recursively scanned when using the ADLSGen1 schema. |
IncludeSubDirectories | Determines whether subdirectory paths should be listed in the Resources view when using the ADLSGen2 schema. |
Property | Description |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant will be used. |
AzureEnvironment | The Azure environment to use when establishing a connection to Azure Data Lake Storage Gen1. |
Property | Description |
InitiateOAuth | Set this property to initiate the process to obtain or refresh the OAuth access token when you connect. |
OAuthClientId | The client ID assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
OAuthAccessToken | The access token for connecting using OAuth. |
CallbackURL | The OAuth callback URL to return to when authenticating. This value must match the callback URL you specify in your app settings. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthVerifier | The verifier code returned from the OAuth authorization URL. |
OAuthRefreshToken | The OAuth refresh token for the corresponding OAuth access token. |
OAuthExpiresIn | The lifetime in seconds of the OAuth AccessToken. |
OAuthTokenTimestamp | The Unix epoch timestamp in milliseconds when the current Access Token was created. |
Property | Description |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
Property | Description |
Location | A path to the directory that contains the schema files defining tables, views, and stored procedures. |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA, SchemaB, SchemaC. |
Tables | This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA, TableB, TableC. |
Views | Restricts the views reported to a subset of the available views. For example, Views=ViewA, ViewB, ViewC. |
Property | Description |
ChunkSize | The size of chunks (in MB) to use when uploading large files. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time. |
MaxThreads | Specifies the number of concurrent requests. |
Other | These hidden properties are used only in specific use cases. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns in the table. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
This section provides a complete list of authentication properties you can configure.
Property | Description |
AuthScheme | The type of authentication to use when connecting to Azure Data Lake Storage. |
Schema | The schema to use: ADLSGen1 or ADLSGen2. |
Account | This property specifies the name of the Azure Data Lake storage account. |
AccessKey | Your Azure DataLakeStorage Gen 2 storage account access key. |
FileSystem | This property specifies the name of the FileSystem to use in a Gen 2 storage account; for example, the name of your Azure Blob container. |
SharedAccessSignature | A shared access key signature that may be used for authentication. |
The type of authentication to use when connecting to Azure Data Lake Storage.
string
"Auto"
The schema to use: ADLSGen1 or ADLSGen2.
string
"ADLSGen2"
ADLSGen1 consumes the Azure Data Lake Storage Gen1 API which makes use of the WebHDFS REST API Specifications. ADLSGen2 consumes a newer version of the API, namely Data Lake Storage Gen2.
This property specifies the name of the Azure Data Lake storage account.
string
""
This property specifies the name of the Azure Data Lake storage account.
Your Azure DataLakeStorage Gen 2 storage account access key.
string
""
Your Azure DataLakeStorage Gen 2 storage account access key. Use this only for Gen 2 authentication. You can retrieve it from your storage account in the Azure portal.
This property specifies the name of the FileSystem to use in a Gen 2 storage account; for example, the name of your Azure Blob container.
string
""
This property specifies the name of the FileSystem to use in a Gen 2 storage account; for example, the name of your Azure Blob container.
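As an illustration, a minimal Gen 2 connection using an access key might combine these properties as sketched below. The account, file system, and key values are placeholders, and the AuthScheme value assumes access-key authentication is configured as described in the authentication guidance for this connector:
AuthScheme=AccessKey;Schema=ADLSGen2;Account=myadlsaccount;FileSystem=myfilesystem;AccessKey=myaccesskey;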
This section provides a complete list of connection properties you can configure.
Property | Description |
Directory | This property specifies the root path of Azure Data Lake Storage to list files and folders. |
DirectoryRetrievalDepth | Limits the subfolders recursively scanned in the ADLSGen1 schema. |
IncludeSubDirectories | Specifies whether subdirectory paths are listed in the Resources view in the ADLSGen2 schema. |
This property specifies the root path of Azure Data Lake Storage to list files and folders.
string
""
This property specifies the root path of Azure Data Lake Storage to list files and folders.
Limits the subfolders recursively scanned in the ADLSGen1 schema.
int
-1
DirectoryRetrievalDepth specifies how many subfolders will be recursively scanned before stopping. -1 specifies that all subfolders are scanned. 0 specifies that only the current folder is scanned for items.
Specifies whether subdirectory paths are listed in the Resources view in the ADLSGen2 schema.
bool
false
Specifies whether subdirectory paths are listed in the Resources view in the ADLSGen2 schema.
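For example, these data properties might be combined with the authentication properties as in the sketches below; the account names, paths, and depth are illustrative placeholders only:
Schema=ADLSGen1;Account=myadlsaccount;Directory=/clusters/data;DirectoryRetrievalDepth=2;
Schema=ADLSGen2;Account=myadlsaccount;FileSystem=myfilesystem;Directory=/raw;IncludeSubDirectories=true;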
This section provides a complete list of Azure authentication properties you can configure.
Property | Description |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant is used. |
AzureEnvironment | The Azure environment to use when establishing a connection to Azure Data Lake Storage Gen1. |
The Microsoft Online tenant being used to access data. If not specified, your default tenant is used.
string
""
The Microsoft Online tenant being used to access data. For instance, contoso.onmicrosoft.com. Alternatively, specify the tenant Id. This value is the directory ID in the Azure Portal > Azure Active Directory > Properties.
Typically it is not necessary to specify the tenant; Microsoft can determine it automatically when OAuthGrantType is set to CODE (the default). However, this may fail if the user belongs to multiple tenants. For instance, if an admin of domain A invites a user of domain B to be a guest user, that user then belongs to both tenants. Although connections generally work without it, it is good practice to specify the tenant.
AzureTenant is required when OAuthGrantType is set to CLIENT. With client credentials there is no user context; the credentials are taken from the context of the app itself. While Microsoft still allows client credentials to be obtained without specifying a tenant, doing so has a much lower probability of selecting the tenant you want to work with. For this reason, AzureTenant must be explicitly set for all client credentials connections to ensure the credentials apply to the domain you intend to connect to.
The Azure Environment to use when establishing a connection to Azure Data Lake Storage Gen1.
string
"GLOBAL"
In most cases, leaving the environment set to GLOBAL will work. However, if your Azure account has been added to a different environment, use AzureEnvironment to specify which environment to connect to.
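For instance, a client credentials connection that pins the tenant explicitly might include the properties sketched below, combined with the OAuth properties described in the next section; the tenant shown is the illustrative value used above:
OAuthGrantType=CLIENT;AzureTenant=contoso.onmicrosoft.com;AzureEnvironment=GLOBAL;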
This section provides a complete list of OAuth properties you can configure.
Property | Description |
InitiateOAuth | Set this property to initiate the process to obtain or refresh the OAuth access token when you connect. |
OAuthClientId | The client ID assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
OAuthAccessToken | The access token for connecting using OAuth. |
CallbackURL | The OAuth callback URL to return to when authenticating. This value must match the callback URL you specify in your app settings. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthVerifier | The verifier code returned from the OAuth authorization URL. |
OAuthRefreshToken | The OAuth refresh token for the corresponding OAuth access token. |
OAuthExpiresIn | The lifetime in seconds of the OAuth AccessToken. |
OAuthTokenTimestamp | The Unix epoch timestamp in milliseconds when the current Access Token was created. |
Set this property to initiate the process to obtain or refresh the OAuth access token when you connect.
string
"OFF"
The following options are available: OFF, GETANDREFRESH, and REFRESH. See the notes under OAuthVerifier below for how GETANDREFRESH and REFRESH are typically used.
The client ID assigned when you register your application with an OAuth authorization server.
string
""
As part of registering an OAuth application, you will receive the OAuthClientId value, sometimes also called a consumer key, and a client secret, the OAuthClientSecret.
The client secret assigned when you register your application with an OAuth authorization server.
string
""
As part of registering an OAuth application, you will receive the OAuthClientId, also called a consumer key. You will also receive a client secret, also called a consumer secret. Set the client secret in the OAuthClientSecret property.
The access token for connecting using OAuth.
string
""
The OAuthAccessToken property is used to connect using OAuth. The OAuthAccessToken is retrieved from the OAuth server as part of the authentication process. It has a server-dependent timeout and can be reused between requests.
The access token is used in place of your user name and password. The access token protects your credentials by keeping them on the server.
The OAuth callback URL to return to when authenticating. This value must match the callback URL you specify in your app settings.
string
""
During the authentication process, the OAuth authorization server redirects the user to this URL. This value must match the callback URL you specify in your app settings.
The grant type for the OAuth flow.
string
"CLIENT"
The grant type for the OAuth flow. The following options are available: CLIENT, CODE.
The verifier code returned from the OAuth authorization URL.
string
""
The verifier code returned from the OAuth authorization URL. This can be used on systems where a browser cannot be launched such as headless systems.
See the OAuth authentication guide for the steps to obtain the OAuthVerifier value.
Set OAuthSettingsLocation along with OAuthVerifier. When you connect, the connector exchanges the OAuthVerifier for the OAuth authentication tokens and saves them, encrypted, to the specified file. Set InitiateOAuth to GETANDREFRESH to automate the exchange.
Once the OAuth settings file has been generated, you can remove OAuthVerifier from the connection properties and connect with OAuthSettingsLocation set. To automatically refresh the OAuth token values, set OAuthSettingsLocation and additionally set InitiateOAuth to REFRESH.
The OAuth refresh token for the corresponding OAuth access token.
string
""
The OAuthRefreshToken property is used to refresh the OAuthAccessToken when using OAuth authentication.
The lifetime in seconds of the OAuth AccessToken.
string
""
Pair with OAuthTokenTimestamp to determine when the AccessToken will expire.
The Unix epoch timestamp in milliseconds when the current Access Token was created.
string
""
Pair with OAuthExpiresIn to determine when the AccessToken will expire.
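A sketch of an interactive OAuth connection that lets the connector obtain and refresh tokens automatically is shown below. The client ID, client secret, and callback URL are placeholders for the values from your own app registration, and the AzureAD scheme assumes the OAuth setup described in the authentication guidance:
AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;OAuthClientId=myclientid;OAuthClientSecret=myclientsecret;CallbackURL=https://localhost:33333/;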
This section provides a complete list of SSL properties you can configure.
Property | Description |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
The certificate to be accepted from the server when connecting using TLS/SSL.
string
""
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Certificates are validated as trusted by the machine based on the System's trust store. The trust store used is the 'javax.net.ssl.trustStore' value specified for the system. If no value is specified for this property, Java's default trust store is used (for example, JAVA_HOME\lib\security\cacerts).
Use '*' to accept all certificates. Note that this is not recommended due to security concerns.
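For example, to pin the connection to a specific server certificate, you could set the property to any of the forms above, such as a local file path or the sample SHA1 thumbprint from the table; both values below are illustrative:
SSLServerCert=C:\cert.cer;
SSLServerCert=34a929226ae0819f2ec14b4a3d904f801cbb150d;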
This section provides a complete list of schema properties you can configure.
Property | Description |
Location | A path to the directory that contains the schema files defining tables, views, and stored procedures. |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA, SchemaB, SchemaC. |
Tables | This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA, TableB, TableC. |
Views | Restricts the views reported to a subset of the available views. For example, Views=ViewA, ViewB, ViewC. |
A path to the directory that contains the schema files defining tables, views, and stored procedures.
string
"%APPDATA%\\ADLS Data Provider\Schema"
The path to a directory which contains the schema files for the connector (.rsd files for tables and views, .rsb files for stored procedures). The folder location can be a relative path from the location of the executable. The Location property is only needed if you want to customize definitions (for example, change a column name, ignore a column, and so on) or extend the data model with new tables, views, or stored procedures.
Note: Given that this connector supports multiple schemas, the structure for Azure Data Lake Storage custom schema files is as follows:
If left unspecified, the default location is "%APPDATA%\ADLS Data Provider\Schema" with %APPDATA% being set to the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Mac | ~/Library/Application Support |
Linux | ~/.config |
This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
string
""
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.
This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.
string
""
Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the connector.
This property can also be used as an alternative to automatically listing tables if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the tables you want in a comma-separated list. Each table should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the table in this property, as in the last example here, to avoid ambiguity between tables that exist in multiple catalogs or schemas.
Restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC.
string
""
Listing the views from some databases can be expensive. Providing a list of views in the connection string improves the performance of the connector.
This property can also be used as an alternative to automatically listing views if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the views you want in a comma-separated list. Each view should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the view in this property, as in the last example here, to avoid ambiguity between views that exist in multiple catalogs or schemas.
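For example, a connection string that exposes only a handful of tables and views might include the sketch below; the identifiers are the illustrative names used above:
Tables=TableA,[TableB/WithSlash];Views=ViewA,ViewB;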
This section provides a complete list of miscellaneous properties you can configure.
Property | Description |
ChunkSize | The size of chunks (in MB) to use when uploading large files. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time. |
MaxThreads | Specifies the number of concurrent requests. |
Other | These hidden properties are used only in specific use cases. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns in the table. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
The size of chunks (in MB) to use when uploading large files.
int
64
The size of chunks (in MB) to use when uploading large files.
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.
int
-1
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.
Specifies the number of concurrent requests.
string
"5"
This property allows you to issue multiple requests simultaneously, thereby improving performance.
These hidden properties are used only in specific use cases.
string
""
The properties listed below are available for specific use cases. Normal driver use cases and functionality should not require these properties.
Specify multiple properties in a semicolon-separated list.
Property | Description |
DefaultColumnSize | Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000. |
ConvertDateTimeToGMT | Determines whether to convert date-time values to GMT, instead of the local time of the machine. |
RecordToFile=filename | Records the underlying socket data transfer to the specified file. |
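For example, to widen the default string length and convert date-time values to GMT, these hidden properties could be combined as sketched below; the length shown is illustrative, and the exact quoting may vary with the tool used to build the connection string:
Other="DefaultColumnSize=4000;ConvertDateTimeToGMT=true";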
This property indicates whether or not to include pseudo columns as columns in the table.
string
""
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
The value in seconds until the timeout error is thrown, canceling the operation.
int
60
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the connector throws an exception.
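As a final illustration, the miscellaneous properties can be combined in a connection string such as the sketch below; all of the values shown are illustrative and should be tuned for your workload:
ChunkSize=64;MaxRows=100;MaxThreads=10;Timeout=120;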