The hadoop-azure module provides support for the Azure Data Lake Storage Gen2 storage layer through the “abfs” connector.
To make it part of Apache Hadoop’s default classpath, make sure that `HADOOP_OPTIONAL_TOOLS` in `hadoop-env.sh` has `hadoop-azure` in the list.
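For example, `hadoop-env.sh` on every node could include:

```bash
# HADOOP_OPTIONAL_TOOLS is a comma-separated list of optional tool modules
# to add to the classpath; include hadoop-azure to enable the abfs connector.
export HADOOP_OPTIONAL_TOOLS=hadoop-azure
```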
The abfs client has a fully consistent view of the store, with complete Create, Read, Update, and Delete consistency for both data and metadata. (Compare and contrast with S3, which offers only Create consistency; S3Guard adds CRUD consistency to the metadata, but not to the underlying data.)
Any configuration can be specified generally (or as the default when accessing all accounts) or can be tied to a specific account. For example, an OAuth identity can be configured for use regardless of which account is accessed with the property `fs.azure.account.oauth2.client.id`, or you can configure an identity to be used only for a specific storage account with `fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net`.
Note that it doesn’t make sense to do this with some properties, like shared keys that are inherently account-specific.
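As a minimal sketch, assuming a hypothetical storage account named `myaccount` and placeholder client ids, the two forms could look like this in `core-site.xml`:

```xml
<configuration>
  <!-- Default OAuth client id, used for any account without a more specific setting. -->
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>00000000-0000-0000-0000-000000000000</value>
  </property>
  <!-- Account-specific client id: overrides the default for "myaccount" only. -->
  <property>
    <name>fs.azure.account.oauth2.client.id.myaccount.dfs.core.windows.net</name>
    <value>11111111-1111-1111-1111-111111111111</value>
  </property>
</configuration>
```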
See the relevant section in *Testing Azure*.