Immuta HDFS Plugin Installation
Audience: System Administrators
Content Summary: The Immuta HDFS plugin installation consists of two main components:
- Immuta INode Attribute Provider
- Immuta Hadoop Filesystem
This guide illustrates the installation of these components on a Hadoop cluster.
Installation Prerequisites
Before proceeding with installation, an Immuta system API key must be generated so that the NameNode can communicate securely with the Immuta Web Service. To generate one, run the following command. You do not need to store this key anywhere else, but it must be written to the configuration files for Hadoop and for all instances of the Immuta Web Service.
The following command reads random bytes from /dev/urandom and takes the first 30 alphanumeric characters:
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 30 | head -n 1
The generated key will be referred to as HDFS_SYSTEM_TOKEN throughout this guide.
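For convenience, the token can be captured directly in a shell variable when it is generated; the variable name below simply mirrors the HDFS_SYSTEM_TOKEN placeholder used in this guide:

```shell
# Generate a 30-character alphanumeric token and capture it in a variable.
HDFS_SYSTEM_TOKEN="$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 30 | head -n 1)"

# Sanity-check the token before writing it into any configuration files.
echo "${HDFS_SYSTEM_TOKEN}" | grep -Eq '^[a-zA-Z0-9]{30}$' && echo "token OK"
```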
Installation
The two main components that will need to be installed on the Hadoop cluster are an Immuta INode Attribute Provider and the Immuta Hadoop Filesystem.
Installation consists of placing the jar files on the Hadoop classpath, which can be accomplished in a number of ways. For the purposes of this document, it is assumed that the jar files will be copied to a directory on each host and that the HADOOP_CLASSPATH variable will be updated to include these jars. Updating the HADOOP_CLASSPATH variable is covered under the Configuration section.
Setup
Each node in the cluster will need the Immuta jar files added to its Hadoop classpath. It is entirely possible to install the jars to an existing directory. Jars can also be installed to a new directory, such as /opt/immuta/hadoop.
mkdir -p /opt/immuta/hadoop
If a new directory is created, the hadoop user will need read access to the directory and the files it contains. The classpath will also need to be updated by setting HADOOP_CLASSPATH in ${HADOOP_CONF_DIR}/hadoop-env.sh.
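As a sketch, copying the jars into the new directory and granting read access might look like the following; the source directory /tmp/immuta-dist is an assumption (use wherever you unpacked the Immuta distribution), and the jar names match the version used in the classpath example later in this guide:

```shell
# Create the installation directory used throughout this guide.
mkdir -p /opt/immuta/hadoop

# Copy the Immuta jars into place; /tmp/immuta-dist is a placeholder
# for wherever the distribution was unpacked.
cp /tmp/immuta-dist/immuta-hadoop-filesystem-2.8.3.jar /opt/immuta/hadoop/
cp /tmp/immuta-dist/immuta-inode-attribute-provider-2.8.3.jar /opt/immuta/hadoop/

# Ensure the hadoop user can read the directory and the jars.
chmod 755 /opt/immuta/hadoop
chmod 644 /opt/immuta/hadoop/*.jar
```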
Immuta INode Attribute Provider (all NameNodes)
The Immuta INode Attribute Provider will be installed on all NameNodes. Place the Immuta INode Attribute Provider jar in the installation directory, in this case /opt/immuta/hadoop/, on each NameNode.
Immuta Hadoop Filesystem (all nodes)
The Immuta Hadoop Filesystem jar needs to be installed on all nodes in the Hadoop cluster. To install the jar, place it in the installation directory, /opt/immuta/hadoop/.
Configuration
A few steps are required to configure the Hadoop cluster to enable the Immuta INode Attribute Provider and Hadoop filesystem access. Once these changes are persisted to the Hadoop configuration, the cluster services must be restarted.
Setting up the Classpath
The classpath can be updated by setting HADOOP_CLASSPATH in ${HADOOP_CONF_DIR}/hadoop-env.sh. For example:
HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:/opt/immuta/hadoop/immuta-hadoop-filesystem-2.8.3.jar:/opt/immuta/hadoop/immuta-inode-attribute-provider-2.8.3.jar"
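To sanity-check the change, you can source hadoop-env.sh in a shell and confirm the jars appear in the expanded variable; this sketch assumes the jar paths used in the example above:

```shell
# After editing ${HADOOP_CONF_DIR}/hadoop-env.sh, verify the Immuta jars
# are present in the expanded classpath variable.
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
echo "${HADOOP_CLASSPATH}" | tr ':' '\n' | grep 'immuta'
```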
Shared Configuration
The following configuration items should be set for both the NameNode and DataNode processes. They are used by both the Immuta FileSystem and the Immuta NameNode plugin:
immuta.base.url: The base URL of the Immuta API.
- Example: https://immuta.hostname
immuta.spark.partition.generator: This configuration item MUST be set to "secure" in order for the ImmutaContext to communicate with the Partition Service rather than attempting to generate partitions itself.
- Example: secure
immuta.secure.partition.generator.hostname: Connection information for the Partition Service. This should point to the node you configured the Immuta Partition Service to run on; if it is running on every node, you can use localhost.
- Example: localhost
immuta.partition.tokens.ttl.seconds: The TTL for tokens generated by the Partition Service.
- Example: 3600
immuta.yarn.api.host.port: This option must be set to the host/port that the YARN resource manager service is running on.
- Example: http://master:8088
immuta.credentials.dir: This directory must contain a directory named for each user's username. Each user directory must be owned by that user and readable only by that user. An Immuta credential file will be written there that is readable only by the owning user.
- Example: /user
fs.immuta.impl: The class used for the immuta file system protocol.
- Example: com.immuta.hadoop.ImmutaFileSystem
hadoop.proxyuser.<immuta service principal>.hosts: The hosts from which the Immuta service principal is allowed to proxy other users.
- Example: *
hadoop.proxyuser.<immuta service principal>.users: The configuration that allows the Immuta service principal to proxy end-users.
- Example: *
hadoop.proxyuser.<immuta service principal>.groups: The configuration that allows the Immuta service principal to proxy user groups.
- Example: *
Make sure that user directories underneath immuta.credentials.dir are readable only by the owner of the directory. If a user's directory doesn't exist and Immuta creates it, the permissions will be set to 700.
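By way of illustration, preparing a credentials directory by hand for a hypothetical user alice (with immuta.credentials.dir set to /user, as in this guide's examples) might look like:

```shell
# "alice" is a hypothetical end user; the path follows
# immuta.credentials.dir = /user from this guide.
mkdir -p /user/alice
chown alice:alice /user/alice
chmod 700 /user/alice   # readable only by the owning user
```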
<property>
<name>immuta.base.url</name>
<value>https://immuta.hostname</value>
<final>true</final>
</property>
<property>
<name>immuta.spark.partition.generator</name>
<value>secure</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.hostname</name>
<value>localhost</value>
<final>true</final>
</property>
<property>
<name>immuta.yarn.api.host.port</name>
<value>http://master:8088</value>
<final>true</final>
</property>
<property>
<name>immuta.credentials.dir</name>
<value>/user</value>
<final>true</final>
</property>
<property>
<name>fs.immuta.impl</name>
<value>com.immuta.hadoop.ImmutaFileSystem</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.immuta.hosts</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.immuta.users</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.immuta.groups</name>
<value>*</value>
<final>true</final>
</property>
NOTE: We recommend that all Immuta configuration values be marked final.
NameNode-only Configuration
The following settings should be written to the configuration on the NameNode only. Setting these values on DataNodes has security implications, so be sure that they are set in the NameNode-only section of your Hadoop configuration tool.
dfs.namenode.inode.attributes.provider.class: Configures Hadoop to use the Immuta INode Attribute Provider.
- Example: com.immuta.hadoop.ImmutaInodeAttributeProvider
immuta.permission.fallback.class: This class will be used as a fallback authorization/permission checker if Immuta is not protecting the target directory. It will also be used if fallback is explicitly enabled. If the deployment also requires Sentry, this should be set to org.apache.sentry.hdfs.SentryINodeAttributesProvider.
- Example: org.apache.hadoop.hdfs.server.namenode.DefaultINodeAttributesProvider
immuta.permission.allow.fallback: Set to true if a user's access should be determined by the permission fallback class even if they are explicitly denied access by Immuta. This is a dangerous setting in that a user may be forbidden from seeing data through Immuta but still see the data in HDFS.
- Example: false
immuta.system.api.key: This must be set to the value of the hdfsSystemToken configuration item in Immuta. This API key is used to create user API keys in Immuta, so it is important that it can be trusted and cannot be accessed by users. It must be set when using the Immuta FileSystem. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
- Example: mYIUy6REcWrnW1mtVjZpuZiyyRFVj3
immuta.permission.users.to.ignore: Comma-separated list of HDFS user accounts which will bypass the Immuta authorization provider. The final listed user, immuta, should be replaced with the principal being used as the Immuta system user. This should match the principal in the username configuration mentioned below under Immuta Web Service configuration.
- Example: hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta
<property>
<name>dfs.namenode.inode.attributes.provider.class</name>
<value>com.immuta.hadoop.ImmutaInodeAttributeProvider</value>
<final>true</final>
</property>
<property>
<name>immuta.permission.fallback.class</name>
<value>org.apache.hadoop.hdfs.server.namenode.DefaultINodeAttributesProvider</value>
<final>true</final>
</property>
<property>
<name>immuta.permission.allow.fallback</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>immuta.system.api.key</name>
<value>mYIUy6REcWrnW1mtVjZpuZiyyRFVj3</value>
<final>true</final>
</property>
<property>
<name>immuta.permission.users.to.ignore</name>
<value>hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta</value>
<final>true</final>
</property>
Note: We recommend that all Immuta configuration values be marked final.
Enabling TLS for the Immuta Partition Service
You can enable TLS on the Immuta Partition Service by configuring it to use a keystore in JKS format. These settings should be set in the HDFS configuration file core-site.xml.
immuta.secure.partition.generator.keystore: The path to the Immuta Partition Service keystore.
- Example: /etc/immuta/keystore.jks
immuta.secure.partition.generator.keystore.password: The password for the Immuta Partition Service keystore. Note that this will be a publicly available piece of information, so file permissions should be used to make sure that only the user running the service can read the keystore file.
- Example: secure_password
immuta.secure.partition.generator.keymanager.password: The KeyManager password for the Immuta Partition Service keystore. The same note about public availability applies. This is not always necessary.
- Example: secure_password
As noted above, the keystore password must currently be set in core-site.xml, which is publicly accessible. We recommend using file permissions to secure the keystore from improper access.
chown immuta:immuta /etc/immuta/keystore.jks
chmod 600 /etc/immuta/keystore.jks
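If you do not already have a JKS keystore, one way to create a self-signed one is with the JDK's keytool. The alias, distinguished name, and validity below are illustrative assumptions, and secure_password matches the example configuration in this section; for production use you would typically import a certificate signed by your CA instead.

```shell
# Create a self-signed JKS keystore for the Partition Service.
# Alias, CN, and validity are placeholders -- adjust for your deployment.
keytool -genkeypair -alias immuta -keyalg RSA -keysize 2048 -validity 365 \
  -keystore /etc/immuta/keystore.jks -storetype JKS \
  -storepass secure_password -keypass secure_password \
  -dname "CN=immuta.hostname"

# Restrict access so only the service user can read it.
chown immuta:immuta /etc/immuta/keystore.jks
chmod 600 /etc/immuta/keystore.jks
```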
Example configuration:
<property>
<name>immuta.secure.partition.generator.keystore</name>
<value>/etc/immuta/keystore.jks</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.keystore.password</name>
<value>secure_password</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.keymanager.password</name>
<value>secure_password</value>
<final>true</final>
</property>
Note: We recommend that all Immuta configuration values be marked final
.
Impala Configuration
You must give the service principal that the Immuta Web Service is configured to use permission to delegate in Impala.
To accomplish this, add the Immuta Web Service principal to authorized_proxy_user_config in the Impala daemon command line arguments:
-authorized_proxy_user_config=<immuta web service principal>=*
Note: If the authorized_proxy_user_config parameter is already present for other services, append the Immuta configuration value to the end:
-authorized_proxy_user_config=hue=*;<immuta web service principal>=*
Spark Configuration
In spark-conf/spark-defaults.conf, configure:
Note: Enabling Immuta's Spark Access Pattern in spark-defaults.conf will cause Spark-based tools such as Hive on Spark to not function properly. Skip this step if you are using such tools.
spark.broadcast.factory=org.apache.spark.broadcast.ImmutaSerializableBroadcastFactory
spark.executor.extraClassPath=/opt/immuta/immuta-hadoop-filesystem-2.8.3.jar:/opt/immuta/immuta-spark-context-2.8.3.jar
spark.driver.extraClassPath=/opt/immuta/immuta-hadoop-filesystem-2.8.3.jar:/opt/immuta/immuta-spark-context-2.8.3.jar
spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=/user/immuta/allowedCallingClasses.json
spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=/user/immuta/allowedCallingClasses.json
spark.hadoop.fs.hdfs.impl=com.immuta.hadoop.ImmutaSparkTokenFileSystem
In spark-conf/spark-env.sh, configure:
export PYTHONPATH=/opt/immuta/python/ImmutaContext.py
export PYTHONSTARTUP=/opt/immuta/python/initialize.py
Immuta Web Service configuration
The Immuta Web Service needs to be updated to support the HDFS plugin. Update /etc/immuta/config.yml on the Web Service nodes with the following values.
hdfsSystemToken: The token used by the NameNode plugin to authenticate with the Immuta REST API. This must equal the value set in immuta.system.api.key. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
- Example: mYIUy6REcWrnW1mtVjZpuZiyyRFVj3
kerberos
ticketRefreshInterval: Time in milliseconds to wait between kinit executions. This should be lower than the ticket refresh interval required by the Kerberos server.
- Example: 43200000
username: User principal used for kinit.
- Example: immuta
keyTabPath: The path to the keytab file on disk to be used for kinit.
- Example: /etc/immuta/immuta.keytab
krbConfigPath: The path to the krb5 configuration file on disk.
- Example: /etc/krb5.conf
krbBinPath: The path to the Kerberos installation binary directory.
- Example: /usr/bin/
client
kerberosRealm: The default realm to use for Kerberos authentication.
- Example: YOURCOMPANY.COM
Example configuration:
...
client:
kerberosRealm: YOURCOMPANY.COM
plugins:
...
hdfsHandler:
...
hdfsSystemToken: mYIUy6REcWrnW1mtVjZpuZiyyRFVj3
kerberos:
ticketRefreshInterval: 43200000
username: immuta
keyTabPath: /etc/immuta/immuta.keytab
krbConfigPath: /etc/krb5.conf
krbBinPath: /usr/bin/
...
You must also be sure that the /etc/krb5.conf configuration on the Immuta Web Service nodes is accurate. The Web Service must be restarted after making these changes.