Immuta CDH Integration Installation
Audience: System Administrators
Content Summary: The Immuta CDH integration installation consists of the following components:
- Immuta NameNode plugin
- Immuta Hadoop Filesystem plugin
- Immuta Spark 1.6 Partition Service (DEPRECATED)
- Immuta Spark 2 Partition Service
This page outlines the installation steps required to successfully deploy these components on your CDH cluster.
Prerequisites
Follow the Immuta CDH Integration Prerequisites to prepare for installation.
Installation
Begin installation by transferring the Immuta .parcel file and its associated .parcel.sha file to your Cloudera Manager node and placing them in /opt/cloudera/parcel-repo. Once copied, ensure that both the owner and the group of the files are set to cloudera-scm:
chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
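Cloudera Manager uses the .parcel.sha file to validate the parcel. As an optional pre-check before copying, the following sketch compares the parcel's SHA-1 digest with the contents of its .parcel.sha file. It assumes the .sha file holds a single SHA-1 hex digest (the usual format for Cloudera parcel checksums) and that GNU coreutils sha1sum is available; the verify_parcel function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a parcel's SHA-1 digest with its .parcel.sha file.
# Assumption: the .sha file holds one SHA-1 hex digest (its first field is used).
set -euo pipefail

verify_parcel() {
  local parcel="$1"
  local sha_file="${parcel}.sha"
  local expected actual

  # Expected digest: first field of the .sha file, lowercased.
  expected="$(awk '{ print tolower($1); exit }' "$sha_file")"
  # Actual digest of the parcel, as printed by GNU sha1sum.
  actual="$(sha1sum "$parcel" | awk '{ print $1 }')"

  if [[ -z "$expected" || "$expected" != "$actual" ]]; then
    echo "checksum mismatch for ${parcel}" >&2
    return 1
  fi
  echo "checksum OK for ${parcel}"
}

# Example (file name is a placeholder):
#   verify_parcel <parcel-file>.parcel && cp <parcel-file>.parcel <parcel-file>.parcel.sha /opt/cloudera/parcel-repo/
```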
Next, transfer the Immuta CSD (.jar file) to /opt/cloudera/csd, and ensure that its owner and group are also set to cloudera-scm:
chown -R cloudera-scm:cloudera-scm /opt/cloudera/csd
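After running the chown commands, you can confirm that nothing in either directory is still owned by another user or group. This is a sketch assuming GNU find; the find_wrong_owner function name is hypothetical, and it prints nothing when ownership is correct.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list entries under a directory whose owner or group
# differs from the expected values. Prints nothing when ownership is correct.
set -euo pipefail

find_wrong_owner() {
  local dir="$1" user="${2:-cloudera-scm}" group="${3:-cloudera-scm}"
  # -user/-group accept names or numeric IDs; the \( A -o B \) expression
  # matches an entry if either its owner or its group is wrong.
  find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Example (run on the Cloudera Manager node):
#   find_wrong_owner /opt/cloudera/parcel-repo
#   find_wrong_owner /opt/cloudera/csd
```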
Restart the Cloudera Manager server so that it picks up the CSD. On hosts that use systemd, run:
systemctl restart cloudera-scm-server
On older hosts that use SysV init scripts, run instead:
service cloudera-scm-server restart
Follow Cloudera's instructions for distributing and activating the IMMUTA parcel.
Once the parcel has been successfully activated, you can add the IMMUTA service:
- From Cloudera Manager, select Add Service.
- Choose Immuta.
- Click Continue.
- Select the nodes to install the services on. Your options are:
  - Choose all nodes, for maximum redundancy.
  - Choose a single node.
  - Choose a few nodes and set up a load balancer in front of the instances to distribute load. Contact Immuta support for more details.
- Proceed to the end of the workflow.
Configuring HDFS
After adding the Immuta service to your CDH cluster, complete the configuration described in this section.
If your cluster is configured with Kerberos, note that the default configuration expects to run Immuta services using the immuta principal.
If you need to use a different Kerberos principal, see Running as a Non-Default User for detailed instructions on how to configure it. After completing those steps, you may need to manually run the Create Immuta User Home Directory command from the Actions menu of the Immuta service.
For more details on Immuta's HDFS configuration, please see Hadoop Cluster Configuration for Immuta.
NameNode-Only Configuration
Warning
The following settings should be written only to the NameNode configuration. Setting these values on DataNodes has security implications, so be sure that they are set in the NameNode-only section of your Hadoop configuration tool. For example:
Under the HDFS service of Cloudera Manager, Configuration tab, search for key:
NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>dfs.namenode.authorization.provider.class</name>
<value>com.immuta.hadoop.ImmutaAuthorizationProvider</value>
<final>true</final>
</property>
<property>
<name>immuta.permission.fallback.class</name>
<value>org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider</value>
<final>true</final>
</property>
<property>
<name>immuta.permission.allow.fallback</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>immuta.system.api.key</name>
<value>0ec28d3f-a8a2-4960-b653-d7ccfe4803b3</value>
<final>true</final>
</property>
<property>
<name>immuta.permission.users.to.ignore</name>
<value>hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta</value>
<final>true</final>
</property>
Mark Immuta values final
We recommend that all Immuta configuration values be marked final.
Detailed Explanation:
dfs.namenode.authorization.provider.class
- Configures Hadoop to use the Immuta Authorization Provider.
- Default: com.immuta.hadoop.ImmutaAuthorizationProvider
immuta.permission.fallback.class
- This class is used as a fallback authorization/permission checker if Immuta is not protecting the target directory. It is also used if fallback is explicitly enabled. If the deployment also requires Sentry, set this to org.apache.sentry.hdfs.SentryAuthorizationProvider.
- Default: org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider
immuta.permission.allow.fallback
- Set to true if a user's access should be determined by the permission fallback class even when Immuta explicitly denies that user access. WARNING! Setting this to true is DANGEROUS, because a user who is forbidden from seeing data through Immuta may still be able to see that data in HDFS.
- Default: false
immuta.system.api.key
- This must be set to the value of the hdfsSystemToken configuration item in Immuta. This API key is used to create user API keys in Immuta, so it is important that it can be trusted and cannot be accessed by users. It must be set when using the Immuta FileSystem. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
- Example: 0ec28d3f-a8a2-4960-b653-d7ccfe4803b3
immuta.permission.users.to.ignore
- Comma-separated list of HDFS user accounts that bypass the Immuta authorization provider. If the principal used as the Immuta system user is anything other than "immuta", append that user to this list. It should match the principal in the username configuration described below under Immuta Web Service configuration.
- Default: hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta
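When the Immuta system principal is not immuta, it has to be appended to immuta.permission.users.to.ignore. The sketch below performs that append only when the name is not already a whole entry in the list, so running it twice does not duplicate the user. The add_ignored_user function name and the svc_immuta principal in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a user to a comma-separated ignore list
# unless it is already present as a whole entry.
set -euo pipefail

add_ignored_user() {
  local list="$1" user="$2"
  # Wrap both sides in commas so that, e.g., "hive" does not match "hive2".
  if [[ ",${list}," == *",${user},"* ]]; then
    printf '%s\n' "$list"
  else
    # ${list:+...} avoids a leading comma when the list is empty.
    printf '%s\n' "${list:+${list},}${user}"
  fi
}

# Example: add_ignored_user "hdfs,immuta" svc_immuta   # prints hdfs,immuta,svc_immuta
```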
Shared Configuration
The following configuration items should be configured for both the NameNode processes and the DataNode processes. These configurations are used both by the Immuta FileSystem and the Immuta NameNode plugin. For example:
Under the HDFS service of Cloudera Manager, Configuration tab, search for key:
Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.base.url</name>
<value>https://immuta.hostname</value>
<final>true</final>
</property>
<property>
<name>immuta.spark.partition.generator.user</name>
<value>immuta</value>
<final>true</final>
</property>
<property>
<name>immuta.credentials.dir</name>
<value>/user</value>
<final>true</final>
</property>
<property>
<name>immuta.visibility.cache.timeout.seconds</name>
<value>600</value>
<final>true</final>
</property>
<property>
<name>fs.immuta.impl</name>
<value>com.immuta.hadoop.ImmutaFileSystem</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.immuta.hosts</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.immuta.users</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.immuta.groups</name>
<value>*</value>
<final>true</final>
</property>
Mark Immuta values final
We recommend that all Immuta configuration values be marked final.
Detailed Explanation:
immuta.base.url
- Specifies the base URL of the Immuta API.
- Example: https://immuta.hostname
immuta.spark.partition.generator.user
- Specifies the system user and/or Kerberos principal that the Immuta Partition Service runs as. By default, configuration and other support files are placed in this user's HDFS home directory.
- Default: immuta
immuta.credentials.dir
- This directory must contain a directory named after each user's username. Each user directory must be owned by that user and readable only by that user. Immuta writes a credential file there that is readable only by the owning user.
- Default: /user
immuta.visibility.cache.timeout.seconds
- Specifies how long, in seconds, the user's visibility report is cached in the Spark context.
- Default: 600
fs.immuta.impl
- Specifies the class used for the immuta file system protocol.
- Default: com.immuta.hadoop.ImmutaFileSystem
hadoop.proxyuser.<IMMUTA_SERVICE_PRINCIPAL>.hosts
- Specifies the hosts from which the Immuta service principal is allowed to impersonate users (either a comma-separated list or *).
- Default: *
hadoop.proxyuser.<IMMUTA_SERVICE_PRINCIPAL>.users
- Specifies the end users the Immuta service principal is allowed to impersonate (either a comma-separated list or *).
- Default: *
hadoop.proxyuser.<IMMUTA_SERVICE_PRINCIPAL>.groups
- Specifies the user groups whose members the Immuta service principal is allowed to impersonate (either a comma-separated list or *).
- Default: *
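If the Immuta service principal is not immuta, the three hadoop.proxyuser.* property names must use that principal's name instead. The following sketch prints the three property blocks for a given name, using the same "*" values and final flag as the snippet above. The proxyuser_properties function name and the svc_immuta principal are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hadoop.proxyuser.<principal>.* properties
# for core-site.xml, with the same "*" values used in the snippet above.
set -euo pipefail

proxyuser_properties() {
  local principal="$1" value="${2:-*}"
  local suffix
  for suffix in hosts users groups; do
    cat <<EOF
<property>
<name>hadoop.proxyuser.${principal}.${suffix}</name>
<value>${value}</value>
<final>true</final>
</property>
EOF
  done
}

# Example: proxyuser_properties svc_immuta
```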
Make sure that user directories underneath immuta.credentials.dir are readable only by the owner of the directory. If a user's directory does not exist, Immuta creates it and sets its permissions to 700.
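One way to spot user directories that do not meet this requirement is to filter the output of hdfs dfs -ls on the credentials directory for entries whose group or other read bit is set. This sketch assumes the default /user location, the standard hdfs dfs -ls column layout (permission string first, path last), and paths without spaces; the find_readable_user_dirs function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print every
# directory whose permission string grants read access to group or other.
set -euo pipefail

find_readable_user_dirs() {
  # Permission string layout: d rwx rwx rwx, so group read is character 5
  # and other read is character 8. The "Found N items" header does not
  # start with "d" and is skipped.
  awk '$1 ~ /^d/ && (substr($1, 5, 1) == "r" || substr($1, 8, 1) == "r") { print $NF }'
}

# Example: hdfs dfs -ls /user | find_readable_user_dirs
```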
Enabling TLS for the Immuta Partition Service
You can enable TLS on the Immuta Partition Service by configuring it to use a keystore in JKS format.
Server-side TLS Configuration
These settings need to be set up for both the Spark 2 and Spark 1.6 (DEPRECATED) Partition Servers.
Under the Immuta service of Cloudera Manager, Configuration tab, search for key:
Immuta Spark 2 Partition Server Advanced Configuration Snippet (Safety Valve) for session/generator.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.secure.partition.generator.keystore</name>
<value>/etc/immuta/keystore.jks</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.keystore.password</name>
<value>secure_password</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.keymanager.password</name>
<value>secure_password</value>
<final>true</final>
</property>
Under the Immuta service of Cloudera Manager, Configuration tab, search for key:
Immuta Spark 1.6 Partition Server Advanced Configuration Snippet (Safety Valve) for context/generator.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.secure.partition.generator.keystore</name>
<value>/etc/immuta/keystore.jks</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.keystore.password</name>
<value>secure_password</value>
<final>true</final>
</property>
<property>
<name>immuta.secure.partition.generator.keymanager.password</name>
<value>secure_password</value>
<final>true</final>
</property>
Mark Immuta values final
We recommend that all Immuta configuration values be marked final.
Detailed Explanation:
immuta.secure.partition.generator.keystore
- Specifies the path to the Immuta Partition Service keystore.
- Example: /etc/immuta/keystore.jks
immuta.secure.partition.generator.keystore.password
- Specifies the password for the Immuta Partition Service keystore. Treat this password as publicly available information and use file permissions to make sure that only the user running the service can read the keystore file.
- Example: secure_password
immuta.secure.partition.generator.keymanager.password
- Specifies the KeyManager password for the Immuta Partition Service keystore. As with the keystore password, treat it as publicly available information and use file permissions to make sure that only the user running the service can read the keystore file. This setting is not always necessary.
- Example: secure_password
We recommend using file permissions to secure the keystore from improper access:
chown immuta:immuta /etc/immuta/keystore.jks
chmod 600 /etc/immuta/keystore.jks
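To confirm the result, you can compare the keystore's owner, group, and mode against the expected values. This is a sketch assuming GNU coreutils stat; the check_keystore_perms function name is hypothetical, and its defaults mirror the chown and chmod commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a file has the expected owner, group,
# and octal mode (defaults mirror the chown/chmod commands above).
set -euo pipefail

check_keystore_perms() {
  local file="$1" owner="${2:-immuta}" group="${3:-immuta}" mode="${4:-600}"
  local actual
  # GNU stat: %U = owner name, %G = group name, %a = access rights in octal.
  actual="$(stat -c '%U:%G %a' "$file")"
  if [[ "$actual" != "${owner}:${group} ${mode}" ]]; then
    echo "unexpected ownership/mode for ${file}: ${actual}" >&2
    return 1
  fi
}

# Example: check_keystore_perms /etc/immuta/keystore.jks && echo "keystore OK"
```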
Client-side TLS Configuration
You must also set the following property in the client configuration for each Spark version:
For Spark 2, under the Immuta service of Cloudera Manager, Configuration tab, search for key:
Immuta Client Advanced Configuration Snippet (Safety Valve) for immuta-conf/session/generator.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.secure.partition.generator.keystore</name>
<value>true</value>
<final>true</final>
</property>
For Spark 1.6 (DEPRECATED), under the Immuta service of Cloudera Manager, Configuration tab, search for key:
Immuta Client Advanced Configuration Snippet (Safety Valve) for immuta-conf/context/generator.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.secure.partition.generator.keystore</name>
<value>true</value>
<final>true</final>
</property>
Mark Immuta values final
We recommend that all Immuta configuration values be marked final.
Detailed Explanation:
immuta.secure.partition.generator.keystore
- Set to true to enable TLS.
- Default: true
Impala Configuration
You must grant the service principal that the Immuta Web Service is configured to use permission to delegate (act as a proxy user) in Impala. To do this, add the Immuta Web Service principal to authorized_proxy_user_config in the Impala daemon command-line arguments.
Under the Impala service of Cloudera Manager, Configuration tab, search for key:
Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)
and add/set the value(s) similar to:
-authorized_proxy_user_config=<IMMUTA_SERVICE_PRINCIPAL>=*
Note
If the authorized_proxy_user_config parameter is already present for other services, append the Immuta configuration value to the end, separating entries with a semicolon:
-authorized_proxy_user_config=hue=*;<IMMUTA_SERVICE_PRINCIPAL>=*
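The append step can be sketched as a small function that adds "<principal>=*" with a semicolon separator and leaves the value unchanged if the principal already has an entry. The append_proxy_user function name and the svc_immuta principal are hypothetical; the result still needs the -authorized_proxy_user_config= prefix when pasted into the safety valve.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "<principal>=*" to an authorized_proxy_user_config
# value, using ";" between entries, unless the principal already has an entry.
set -euo pipefail

append_proxy_user() {
  local current="$1" principal="$2"
  # Wrap in ";" so only an entry that starts with "<principal>=" counts.
  if [[ ";${current};" == *";${principal}="* ]]; then
    printf '%s\n' "$current"
  else
    # ${current:+...} avoids a leading ";" when there is no existing value.
    printf '%s\n' "${current:+${current};}${principal}=*"
  fi
}

# Example: append_proxy_user 'hue=*' svc_immuta   # prints hue=*;svc_immuta=*
```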
Spark 2 Configuration
No additional configuration is required.
Note: Immuta will work with any Spark 2 version you may have already installed on your cluster.
Spark 1.6 Configuration (DEPRECATED)
Deprecated
Spark 1.6 support is deprecated as of Immuta v2.7.0 and is slated for removal in Immuta v2.8.0.
Caution
Enabling Immuta's Spark Access Pattern in spark-defaults.conf will prevent Spark-based tools, such as Hive on Spark, from functioning properly. Skip this step if you are using such tools.
Under the Spark service of Cloudera Manager, Configuration tab, search for key:
Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf
and add/set the value(s) similar to:
spark.broadcast.factory=org.apache.spark.broadcast.ImmutaSerializableBroadcastFactory
spark.executor.extraClassPath=file:///etc/immuta/conf.cloudera.immuta_partition_service/context/generator.xml:/opt/cloudera/parcels/IMMUTA/lib/immuta-hadoop-filesystem.jar:/opt/cloudera/parcels/IMMUTA/lib/immuta-spark-context.jar
spark.driver.extraClassPath=file:///etc/immuta/conf.cloudera.immuta_partition_service/context/generator.xml:/opt/cloudera/parcels/IMMUTA/lib/immuta-hadoop-filesystem.jar:/opt/cloudera/parcels/IMMUTA/lib/immuta-spark-context.jar
spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///etc/immuta/conf/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///etc/immuta/conf/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
spark.hadoop.fs.hdfs.impl=org.apache.hadoop.hdfs.ImmutaSparkTokenFileSystem
Under the Spark service of Cloudera Manager, Configuration tab, search for key:
Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh
and add/set the value(s) similar to:
export PYTHONPATH=/opt/cloudera/parcels/IMMUTA/python/context
export PYTHONSTARTUP=/opt/cloudera/parcels/IMMUTA/python/context/initialize-context.py
Immuta Partition Service configuration
The Immuta Partition Service requires the same system API key that is configured for the Immuta NameNode plugin.
Be sure that the value of immuta.system.api.key is consistent across your configuration.
For Spark 2, under the IMMUTA service of Cloudera Manager, Configuration section, search for key:
Immuta Spark 2 Partition Server Advanced Configuration Snippet (Safety Valve) for session/generator.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.system.api.key</name>
<value>0ec28d3f-a8a2-4960-b653-d7ccfe4803b3</value>
<final>true</final>
</property>
For Spark 1.6 (DEPRECATED), under the IMMUTA service of Cloudera Manager, Configuration section, search for key:
Immuta Spark 1.6 Partition Server Advanced Configuration Snippet (Safety Valve) for context/generator.xml
and, using "View as XML", add/set the value(s) similar to:
<property>
<name>immuta.system.api.key</name>
<value>0ec28d3f-a8a2-4960-b653-d7ccfe4803b3</value>
<final>true</final>
</property>
Mark Immuta values final
We recommend that all Immuta configuration values be marked final.
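The same immuta.system.api.key value now appears in several safety-valve snippets. If you save copies of those snippets to local files, a sketch like the following can flag a missing or mismatched key. It assumes the one-element-per-line layout used throughout this page and is not a general XML parser; the get_property and check_api_key_consistent function names, and the file names in the example, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read a property value from a Hadoop-style XML snippet
# and check that immuta.system.api.key matches across several files.
# Assumption: one XML element per line, as in the snippets on this page.
set -euo pipefail

get_property() {
  local file="$1" name="$2"
  awk -v name="$name" '
    /<name>/ { line = $0; gsub(/.*<name>|<\/name>.*/, "", line); found = (line == name); next }
    found && /<value>/ { line = $0; gsub(/.*<value>|<\/value>.*/, "", line); print line; exit }
  ' "$file"
}

check_api_key_consistent() {
  local first="" value f
  for f in "$@"; do
    value="$(get_property "$f" immuta.system.api.key)"
    if [[ -z "$value" ]]; then
      echo "immuta.system.api.key not found in ${f}" >&2
      return 1
    fi
    if [[ -z "$first" ]]; then
      first="$value"
    elif [[ "$value" != "$first" ]]; then
      echo "immuta.system.api.key in ${f} differs from the first file" >&2
      return 1
    fi
  done
  echo "immuta.system.api.key is consistent across $# files"
}

# Example: check_api_key_consistent namenode-hdfs-site.xml spark2-session-generator.xml
```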
Immuta Web Service configuration
The Immuta Web Service needs to be configured to support the HDFS plugin. You can set this configuration using the Immuta Configuration UI.
Configuring the Web Service through the Application Settings page of the Web UI is generally sufficient. If an Immuta representative recommends it, you can instead use a YAML snippet like the following example as an alternative to the Immuta Configuration UI.
client:
  kerberosRealm: YOURCOMPANY.COM
plugins:
  hdfsHandler:
    hdfsSystemToken: 0ec28d3f-a8a2-4960-b653-d7ccfe4803b3
kerberos:
  ticketRefreshInterval: 43200000
  username: immuta
  keyTabPath: /etc/immuta/immuta.keytab
  krbConfigPath: /etc/krb5.conf
  krbBinPath: /usr/bin/
Detailed Explanation:
client.kerberosRealm
- Specifies the default realm to use for Kerberos authentication.
- Example: YOURCOMPANY.COM
plugins.hdfsHandler.hdfsSystemToken
- Token used by the NameNode plugin to authenticate with the Immuta REST API. This must equal the value set in immuta.system.api.key. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
- Example: 0ec28d3f-a8a2-4960-b653-d7ccfe4803b3
kerberos.ticketRefreshInterval
- Time in milliseconds to wait between kinit executions. This should be lower than the ticket lifetime enforced by the Kerberos server.
- Default: 43200000 (12 hours)
kerberos.username
- User principal used for kinit.
- Default: immuta
kerberos.keyTabPath
- The path to the keytab file on disk to be used for kinit.
- Default: /etc/immuta/immuta.keytab
kerberos.krbConfigPath
- The path to the krb5 configuration file on disk.
- Default: /etc/krb5.conf
kerberos.krbBinPath
- The path to the directory containing the Kerberos binaries.
- Default: /usr/bin/
Additionally, you must upload a keytab for the immuta user as well as a krb5.conf configuration file to the Immuta Web Service. This can also be done via the Immuta Configuration UI.
Native Workspace Configuration
If you want users to be able to create derived data sources and/or native Hive or Impala tables within Immuta's native project workspaces, you will need to grant a Sentry admin role to the immuta user. This requires adding the immuta user to Admin Groups and Allowed Connecting Users under Sentry's configuration in Cloudera Manager.
You should also create a new Sentry role for immuta with all privileges granted. Run the following SQL in beeline or impala-shell as either the immuta user or any user with Sentry admin privileges:
CREATE ROLE immuta;
GRANT ALL ON SERVER <server name> TO ROLE immuta WITH GRANT OPTION;
GRANT ROLE immuta TO GROUP immuta;
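To avoid editing the <server name> placeholder by hand, you can generate the statements with the server name filled in and run the resulting file with beeline -f or impala-shell -f. This is a sketch: server1 in the example is only a common default for the Sentry server name (check the value configured on your cluster), and the sentry_role_sql function name, the optional role/group parameters, and the immuta_role.sql file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Sentry role statements above with the
# server name (and, optionally, the role and group names) substituted.
set -euo pipefail

sentry_role_sql() {
  local server="$1" role="${2:-immuta}" group="${3:-immuta}"
  cat <<EOF
CREATE ROLE ${role};
GRANT ALL ON SERVER ${server} TO ROLE ${role} WITH GRANT OPTION;
GRANT ROLE ${role} TO GROUP ${group};
EOF
}

# Example:
#   sentry_role_sql server1 > immuta_role.sql
#   impala-shell -f immuta_role.sql        # or: beeline -u "<jdbc-url>" -f immuta_role.sql
```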
You will also need to enable the ImmutaGroupsMapping service in Hive and/or Impala's configuration to allow Immuta to manage Sentry permissions for Immuta users. For instructions on how to do this, please see Enabling ImmutaGroupsMapping.