The following instructions describe how to install the Hue tarball on a multi-node cluster. You also need to install Hadoop and its satellite components (Oozie, Hive, etc.) and update some Hadoop configuration files before running Hue.
Hue consists of a web service that runs on a special node in your cluster. Choose one node where you want to run Hue. This guide refers to that node as the Hue Server. For optimal performance, this should be one of the nodes within your cluster, though it can be a remote node as long as there are no overly restrictive firewalls. For small clusters of fewer than 10 nodes, you can use your existing master node as the Hue Server.
You can download the Hue tarball here: https://github.com/cloudera/hue/releases
Hue employs some Python modules that use native code and requires that certain development libraries be installed on your system. To install from the tarball, you'll need these library development packages and tools installed on your system:
sudo apt-get install git ant gcc g++ libffi-dev libkrb5-dev libmysqlclient-dev libsasl2-dev libsasl2-modules-gssapi-mit libsqlite3-dev libssl-dev libxml2-dev libxslt-dev make maven libldap2-dev python-dev python-setuptools libgmp3-dev nodejs npm
Install Oracle JDK
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
Maven is also required (install the apache-maven package or a maven3 tarball).
Installing Python 2.7:
Check your OS Version:
cat /etc/redhat-release
Make sure "/etc/redhat-release" contains "CentOS 6.8 or 6.9" version. These instructions are tested on CentOS 6.8 and 6.9 versions only. It may or may not work on previous CentOS 6 series OS.
yum install -y centos-release-SCL
yum install -y scl-utils
yum install -y python27
Check your OS Version
cat /etc/redhat-release
Make sure /etc/redhat-release reports RedHat 6.8 or 6.9. These instructions have been tested on RedHat 6.8 and 6.9 only; they may or may not work on earlier RedHat 6 releases.
wget http://mirror.infra.cloudera.com/centos/6/extras/x86_64/Packages/centos-release-scl-rh-2-3.el6.centos.noarch.rpm
rpm -ivh centos-release-scl-rh-2-3.el6.centos.noarch.rpm
yum install -y scl-utils
yum install -y python27
Check your OS Version
cat /etc/redhat-release
Make sure /etc/redhat-release reports Oracle 6.8 or 6.9. These instructions have been tested on Oracle 6.8 and 6.9 only; they may or may not work on earlier Oracle 6 releases.
wget -O /etc/yum.repos.d/public-yum-ol6.repo http://yum.oracle.com/public-yum-ol6.repo
In /etc/yum.repos.d/public-yum-ol6.repo, set the value of the enabled parameter for the software_collections repository to 1:
[ol6_software_collections]
name=Software Collection Library release 3.0 packages for Oracle Linux 6 (x86_64)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/SoftwareCollections/x86_64/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
For more details, refer to https://docs.oracle.com/cd/E37670_01/E59096/html/section_e3v_nbl_cr.html
yum install -y scl-utils
yum install -y python27
Install dependencies via Homebrew:
brew install mysql maven gmp openssl libffi && brew cask install java8
Install Xcode command line tools:
sudo xcode-select --install
Fix openssl errors (required for macOS 10.11+):
export LDFLAGS=-L/usr/local/opt/openssl/lib && export CPPFLAGS=-I/usr/local/opt/openssl/include
Download both instantclient-basic and instantclient-sdk of the same version (11.2.0.4.0 in this example) and, in your ~/.bash_profile, add
export ORACLE_HOME=/usr/local/share/oracle
export VERSION=11.2.0.4.0
export ARCH=x86_64
export DYLD_LIBRARY_PATH=$ORACLE_HOME
export LD_LIBRARY_PATH=$ORACLE_HOME
and then
source ~/.bash_profile
sudo mkdir -p $ORACLE_HOME
sudo chmod 775 $ORACLE_HOME
Then unzip the contents of both downloaded zip files into the newly created $ORACLE_HOME so that the 'sdk' folder is at the same level as the other files, and then
ln -s libclntsh.dylib.11.1 libclntsh.dylib
ln -s libocci.dylib.11.1 libocci.dylib
and finally
cd sdk
unzip ottclasses.zip
Configure $PREFIX with the path where you want to install Hue by running:
PREFIX=/usr/share make install
cd /usr/share/hue
You can install Hue anywhere on your system, and run Hue as a non-root user. It is a good practice to create a new user for Hue and either install Hue in that user's home directory, or in a directory within /usr/share.
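For illustration, here is a minimal sketch of that practice; the user name and install prefix are assumptions, not requirements:
# Create a dedicated 'hue' user (any non-root account works)
sudo useradd -m hue
# Install Hue under /usr/share and hand the tree to that user
sudo PREFIX=/usr/share make install
sudo chown -R hue:hue /usr/share/hue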
Alternatively, to build Hue into a container image, the Hue Docker Guide is available.
After your cluster is running with the plugins enabled, you can start Hue on your Hue Server by running:
build/env/bin/supervisor
This will start several subprocesses, corresponding to the different Hue components. Your Hue installation is now running.
The source of truth sits in the main hue.ini. It consists of several ini sections. Lines need to be uncommented to be active.
Hue uses Hadoop impersonation to communicate properly with certain services. This is described in the following Service Configuration.
The goal of the Editor is to open up data to more users by making self-service querying easy and productive.
It is available in Editor or Notebook mode and focuses on SQL. Dialects can be added to the main [notebook]
section like this:
[notebook]
[[interpreters]]
[[[hive]]]
# The name of the snippet.
name=Hive
# The backend connection to use to communicate with the server.
interface=hiveserver2
[[[mysqlalche]]]
name = MySQL alchemy
interface=sqlalchemy
options='{"url": "mysql://root:root@localhost:3306/hue"}'
Downloads and exports have limited scalability; they can be limited in the number of rows or bytes transferred using the following options, respectively, in your hue.ini:
[beeswax]
# A limit to the number of rows that can be downloaded from a query before it is truncated.
# A value of -1 means there will be no limit.
download_row_limit=-1
# A limit to the number of bytes that can be downloaded from a query before it is truncated.
# A value of -1 means there will be no limit.
download_bytes_limit=-1
In addition, it is possible to disable the download and export feature in the editor, dashboard, as well as in the file browser with the following option in your hue.ini:
[desktop]
# Global setting to allow or disable end user downloads in all Hue.
# e.g. Query result in Editors and Dashboards, file in File Browser...
enable_download=false
The download feature in the file browser can be disabled separately with the following options in your hue.ini:
[filebrowser]
show_download_button=false
[impala]
# Host of the Impala Server (one of the Impalad)
## server_host=localhost
# Port of the Impala Server
## server_port=21050
LDAP or PAM pass-through authentication with Hive or Impala and impersonation.
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
## hive_server_host=localhost
# Port where HiveServer2 Thrift server runs on.
## hive_server_port=10000
Tez
Requires support for sending multiple queries when using Tez (instead of a maximum of just one at a time). You can turn it on with this setting:
[beeswax]
max_number_of_sessions=10
Recommended way:
[[[mysql]]]
name = MySQL Alchemy
interface=sqlalchemy
## https://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine
## https://docs.sqlalchemy.org/en/latest/dialects/mysql.html
options='{"url": "mysql://root:root@localhost:3306/hue"}'
Alternative:
[[[mysqljdbc]]]
name=MySql JDBC
interface=jdbc
## Specific options for connecting to the server.
## The JDBC connectors, e.g. mysql.jar, need to be in the CLASSPATH environment variable.
## If 'user' and 'password' are omitted, they will be prompted in the UI.
options='{"url": "jdbc:mysql://localhost:3306/hue", "driver": "com.mysql.jdbc.Driver", "user": "root", "password": "root"}'
## options='{"url": "jdbc:mysql://localhost:3306/hue", "driver": "com.mysql.jdbc.Driver"}'
Direct interface:
[[[presto]]]
name=Presto SQL
interface=presto
## Specific options for connecting to the Presto server.
## The JDBC driver presto-jdbc.jar need to be in the CLASSPATH environment variable.
## If 'user' and 'password' are omitted, they will be prompted in the UI.
options='{"url": "jdbc:presto://localhost:8080/catalog/schema", "driver": "io.prestosql.jdbc.PrestoDriver", "user": "root", "password": "root"}'
The Presto JDBC client driver is maintained by the Presto Team and can be downloaded here: https://prestosql.io/docs/current/installation/jdbc.html
[[[presto]]]
name=Presto JDBC
interface=jdbc
options='{"url": "jdbc:presto://localhost:8080/", "driver": "io.prestosql.jdbc.PrestoDriver"}'
First, in your hue.ini file, you will need to add the relevant database connection information under the librdbms section:
[librdbms]
[[databases]]
[[[postgresql]]]
nice_name=PostgreSQL
name=music
engine=postgresql_psycopg2
port=5432
user=hue
password=hue
options={}
Secondly, we need to add a new interpreter to the notebook app. This will allow the new database type to be registered as a snippet-type in the Notebook app. For query editors that use a Django-compatible database, the name in the brackets should match the database configuration name in the librdbms section (e.g. – postgresql). The interface will be set to rdbms. This tells Hue to use the librdbms driver and corresponding connection information to connect to the database. For example, with the above postgresql connection configuration in the librdbms section, we can add a PostgreSQL interpreter with the following notebook configuration:
[notebook]
[[interpreters]]
[[[postgresql]]]
name=PostgreSQL
interface=rdbms
Same as Presto.
[[[teradata]]]
name=Teradata JDBC
interface=jdbc
options='{"url": "jdbc:teradata://sqoop-teradata-1400.sjc.cloudera.com/sqoop", "driver": "com.teradata.jdbc.TeraDriver", "user": "sqoop", "password": "sqoop"}'
[[[db2]]]
name=DB2 JDBC
interface=jdbc
options='{"url": "jdbc:db2://db2.vpc.cloudera.com:50000/SQOOP", "driver": "com.ibm.db2.jcc.DB2Driver", "user": "DB2INST1", "password": "cloudera"}'
Via Apache Livy (recommended):
[[[sparksql]]]
name=SparkSql
interface=livy
...
[spark]
# The Livy Server URL.
livy_server_url=http://localhost:8998
Via native HiveServer2 API:
[[[sparksql]]]
name=SparkSql
interface=hiveserver2
[[[kafkasql]]]
name=Kafka SQL
interface=kafka
Microsoft’s SQL Server JDBC drivers can be downloaded from the official site: Microsoft JDBC Driver
[[[sqlserver]]]
name=SQLServer JDBC
interface=jdbc
options='{"url": "jdbc:microsoft:sqlserver://localhost:1433", "driver": "com.microsoft.jdbc.sqlserver.SQLServerDriver", "user": "admin": "password": "pass"}'
Vertica’s JDBC client drivers can be downloaded here: Vertica JDBC Client Drivers. Be sure to download the driver for the right version and OS.
[[[vertica]]]
name=Vertica JDBC
interface=jdbc
options='{"url": "jdbc:vertica://localhost:5433/example", "driver": "com.vertica.jdbc.Driver", "user": "admin", "password": "pass"}'
The Phoenix JDBC client driver is bundled with the Phoenix binary and source release artifacts, which can be downloaded here: Apache Phoenix Downloads. Be sure to use the Phoenix client driver that is compatible with your Phoenix server version.
[[[phoenix]]]
name=Phoenix JDBC
interface=jdbc
options='{"url": "jdbc:phoenix:localhost:2181/hbase", "driver": "org.apache.phoenix.jdbc.PhoenixDriver", "user": "", "password": ""}'
Note: Currently, the Phoenix JDBC connector for Hue only supports read-only operations (SELECT and EXPLAIN statements).
The Drill JDBC driver can be used.
[[[drill]]]
name=Drill JDBC
interface=jdbc
## Specific options for connecting to the server.
## The JDBC connectors, e.g. mysql.jar, need to be in the CLASSPATH environment variable.
## If 'user' and 'password' are omitted, they will be prompted in the UI.
options='{"url": "<drill-jdbc-url>", "driver": "org.apache.drill.jdbc.Driver", "user": "admin", "password": "admin"}'</code>
[[[solr]]]
name = Solr SQL
interface=solr
## Name of the collection handler
# options='{"collection": "default"}'
[[[kylin]]]
name=kylin JDBC
interface=jdbc
options='{"url": "jdbc:kylin://172.17.0.2:7070/learn_kylin", "driver": "org.apache.kylin.jdbc.Driver", "user": "ADMIN", "password": "KYLIN"}'
[[[clickhouse]]]
name=ClickHouse
interface=jdbc
## Specific options for connecting to the ClickHouse server.
## The JDBC driver clickhouse-jdbc.jar and its related jars need to be in the CLASSPATH environment variable.
options='{"url": "jdbc:clickhouse://localhost:8123", "driver": "ru.yandex.clickhouse.ClickHouseDriver", "user": "readonly", "password": ""}'
SQL Alchemy is a robust connector that supports many SQL dialects.
[[[mysql]]]
name = MySQL Alchemy
interface=sqlalchemy
options='{"url": "mysql://root:root@localhost:3306/hue"}'
Those rely on the [dbms] lib and dedicated Python libs.
Note, SQL Alchemy should be preferred.
Hue’s query editor can easily be configured to work with any database backend that Django supports, including PostgreSQL, MySQL, Oracle and SQLite. Some of you may note that these are the same backends supported by Hue’s DBQuery app and in fact, adding a new query editor for these databases starts with the same configuration step.
First, in your hue.ini file, you will need to add the relevant database connection information under the librdbms section:
[librdbms]
[[databases]]
[[[postgresql]]]
nice_name=PostgreSQL
name=music
engine=postgresql_psycopg2
port=5432
user=hue
password=hue
options={}
Secondly, we need to add a new interpreter to the notebook app. This will allow the new database type to be registered as a snippet-type in the Notebook app. For query editors that use a Django-compatible database, the name in the brackets should match the database configuration name in the librdbms section (e.g. – postgresql). The interface will be set to rdbms. This tells Hue to use the librdbms driver and corresponding connection information to connect to the database. For example, with the above postgresql connection configuration in the librdbms section, we can add a PostgreSQL interpreter with the following notebook configuration:
[notebook]
[[interpreters]]
[[[postgresql]]]
name=PostgreSQL
interface=rdbms
After updating the configuration and restarting Hue, we can access the new PostgreSQL interpreter in the Notebook app:
Use the query editor with any JDBC database.
Note, SQL Alchemy should be preferred.
The “rdbms” interface works great for MySQL, PostgreSQL, SQLite, and Oracle, but for other JDBC-compatible databases Hue now finally supports a “jdbc” interface to integrate such databases with the new query editor!
Integrating an external JDBC database involves a 3-step process:
Download the compatible client driver JAR file for your specific OS and database. Usually you can find the driver files from the official database vendor site; for example, the MySQL JDBC connector for Mac OSX can be found here: https://dev.mysql.com/downloads/connector/j/. (NOTE: In the case of MySQL, the JDBC driver is platform independent, but some drivers are specific to certain OSes and versions so be sure to verify compatibility.)
Add the path to the driver JAR file to your Java CLASSPATH. Here, we set the CLASSPATH environment variable in our .bash_profile
script.
# MySQL
export MYSQL_HOME=/Users/hue/Dev/mysql
export CLASSPATH=$MYSQL_HOME/mysql-connector-java-5.1.38-bin.jar:$CLASSPATH
Add a new interpreter to the notebook app and supply the “name”, set “interface” to jdbc, and set “options” to a JSON object that contains the JDBC connection information. For example, we can connect a local MySQL database named “hue” running on localhost and port 3306 via JDBC with the following configuration:
[notebook]
[[interpreters]]
[[[mysql]]]
name=MySQL JDBC
interface=jdbc
options='{"url": "jdbc:mysql://localhost:3306/hue", "driver": "com.mysql.jdbc.Driver", "user": "root", "password": ""}'
Technically, the JDBC interface connects to the database via a Java proxy powered by Py4j. The proxy is started automatically if any interpreter is using it.
[notebook]
## Main flag to override the automatic starting of the DBProxy server.
enable_dbproxy_server=true
Tip: Testing JDBC Configurations Before adding your interpreter’s JDBC configurations to hue.ini, verify that the JDBC driver and connection settings work in a SQL client like SQuirrel SQL.
Tip: Prompt for JDBC authentication You can leave out the username and password in the JDBC options, and Hue will instead prompt the user for a username and password. This allows administrators to provide access to JDBC sources without granting all Hue users the same access.
[[[pyspark]]]
name=PySpark
interface=livy
[[[sparksql]]]
name=SparkSql
interface=livy
[[[spark]]]
name=Scala
interface=livy
[[[r]]]
name=R
interface=livy
...
[spark]
# The Livy Server URL.
livy_server_url=http://localhost:8998
[[[pig]]]
name=Pig
interface=oozie
Hue supports one HDFS cluster. That cluster should be defined under the [[[default]]] sub-section; an example follows the option descriptions below.
fs_defaultfs::
This is the equivalent of `fs.defaultFS` (aka `fs.default.name`) in the Hadoop configuration.
webhdfs_url::
You can also set this to be the HttpFS url. The default value is the HTTP
port on the NameNode.
hadoop_conf_dir::
This is the configuration directory of the HDFS, typically
`/etc/hadoop/conf`.
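For illustration only, a minimal HDFS section of hue.ini might look like the following; the host and ports are placeholders matching common defaults:
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      fs_defaultfs=hdfs://localhost:8020
      webhdfs_url=http://localhost:50070/webhdfs/v1
      hadoop_conf_dir=/etc/hadoop/conf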
Hue's filebrowser can now allow users to explore, manage, and upload data in an S3 account, in addition to HDFS.
Read more about it in the S3 User Documentation.
In order to add an S3 account to Hue, you'll need to configure Hue with valid S3 credentials, including the access key ID and secret access key: AWSCredentials.
These keys can be securely stored in a script that outputs the actual access key and secret key to stdout, to be read by Hue (this is similar to how Hue reads password scripts). In order to use script files, add the following section to your hue.ini configuration file:
[aws]
[[aws_accounts]]
[[[default]]]
access_key_id_script=/path/to/access_key_script
secret_access_key_script= /path/to/secret_key_script
allow_environment_credentials=false
region=us-east-1
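As a sketch (the path and placeholder value are assumptions), such a script simply prints the credential to stdout; make it executable and readable only by the user running Hue:
#!/bin/bash
# /path/to/access_key_script: echoes the actual access key ID for Hue to read
echo "your-access-key-id"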
Alternatively (but not recommended for production or secure environments), you can set the access_key_id and secret_access_key values to the plain-text values of your keys:
[aws]
[[aws_accounts]]
[[[default]]]
access_key_id=s3accesskeyid
secret_access_key=s3secretaccesskey
allow_environment_credentials=false
region=us-east-1
The region should be set to the AWS region corresponding to the S3 account. By default, this region will be set to 'us-east-1'.
Using Ceph: new endpoints have been added in HUE-5420.
Hue's file browser can now allow users to explore, manage, and upload data in an ADLS, in addition to HDFS and S3.
Read more about it in the ADLS User Documentation.
In order to add an ADLS account to Hue, you'll need to configure Hue with valid ADLS credentials, including the client ID, client secret and tenant ID. These keys can be securely stored in a script that outputs the actual access key and secret key to stdout, to be read by Hue (this is similar to how Hue reads password scripts). In order to use script files, add the following section to your hue.ini configuration file:
[adls]
[[azure_accounts]]
[[[default]]]
client_id_script=/path/to/client_id_script.sh
client_secret_script=/path/to/client_secret_script.sh
tenant_id_script=/path/to/tenant_id_script.sh
[[adls_clusters]]
[[[default]]]
fs_defaultfs=adl://<account_name>.azuredatalakestore.net
webhdfs_url=https://<account_name>.azuredatalakestore.net
Alternatively (but not recommended for production or secure environments), you can set the client_secret value in plain-text:
[adls]
[[azure_account]]
[[[default]]]
client_id=adlsclientid
client_secret=adlsclientsecret
tenant_id=adlstenantid
[[adls_clusters]]
[[[default]]]
fs_defaultfs=adl://<account_name>.azuredatalakestore.net
webhdfs_url=https://<account_name>.azuredatalakestore.net
Hue supports one or two Yarn clusters (two for HA). These clusters should be defined under the [[[default]]] and [[[ha]]] sub-sections; an example follows the option descriptions below.
resourcemanager_host:
The host running the ResourceManager.
resourcemanager_port:
The port for the ResourceManager REST service.
logical_name:
NameNode logical name.
submit_to:
To enable the section, set to True.
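For illustration only, a minimal Yarn section might look like this; the host and port are placeholders matching common defaults:
[hadoop]
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=localhost
      resourcemanager_port=8032
      submit_to=True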
In the [liboozie] section of the configuration file, you should specify:
oozie_url:
The URL of the Oozie service. It is the same as the `OOZIE_URL`
environment variable for Oozie.
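For example (the host is a placeholder; 11000 is the usual Oozie default port):
[liboozie]
  oozie_url=http://localhost:11000/oozie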
In the [search] section of the configuration file, you should specify:
solr_url:
The URL of the Solr service.
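For example (the host is a placeholder; 8983 is the usual Solr default port):
[search]
  solr_url=http://localhost:8983/solr/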
In the [hbase] section of the configuration file, you should specify:
hbase_clusters:
Comma-separated list of HBase Thrift servers for clusters in the format of "(name|host:port)".
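For example, with a single Thrift server on the usual port 9090 (the cluster name and host are placeholders):
[hbase]
  hbase_clusters=(Cluster|localhost:9090)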
Hue's Hive SQL Editor application helps you use Hive to query your data. It depends on a HiveServer2 running in the cluster. Please read this section to ensure a proper integration.
Your Hive data is stored in HDFS, normally under /user/hive/warehouse (or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml). Make sure this location exists and is writable by the users whom you expect to be creating tables. /tmp (on the local file system) must be world-writable (1777), as Hive makes extensive use of it.
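As a rough sketch of those checks (the warehouse path is the default and may differ per your hive-site.xml):
# Create the warehouse directory if it is missing
hadoop fs -mkdir -p /user/hive/warehouse
# One permissive way to make it writable by table creators
hadoop fs -chmod 1777 /user/hive/warehouse
# Hive also needs the local /tmp to be world-writable with the sticky bit
chmod 1777 /tmp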
HiveServer2 and Impala support High Availability through a “load balancer”. One caveat is that, because Hue's underlying Thrift libraries reuse TCP connections in a pool, a single user session may not keep the same Impala or Hive TCP connection. If a TCP connection is balanced away from the previously selected HiveServer2 or Impalad instance, the user session and its queries can be lost and trigger the “Results have expired” or “Invalid session Id” errors.
To prevent sessions from being lost, you need to configure the load balancer with the “source” algorithm to ensure each Hue instance sends all traffic to a single HiveServer2/Impalad instance. Yes, this is not true load balancing, but a configuration for failover High Availability. HiveServer2 or Impala coordinators already distribute the work across the cluster, so this is not an issue.
To enable an optimal load distribution that works for everybody, you can create multiple profiles in our load balancer, per port for Hue clients and non-Hue clients like Hive or Impala. You can configure non-Hue clients to distribute loads with “roundrobin” or “leastconn” and configure Hue clients with “source” (source IP Persistence) on dedicated ports, for example, 10015 for Hive beeline commands, 10016 for Hue, 21051 for Hue-Impala interactions while 25003 for Impala shell.
You can configure the HaProxy to have two different ports associated with different load balancing algorithms. Here is a sample configuration (haproxy.cfg) for Hive and Impala HA on a secure cluster.
frontend hiveserver2_front
bind *:10015 ssl crt /path/to/cert_key.pem
mode tcp
option tcplog
default_backend hiveserver2
backend hiveserver2
balance roundrobin
mode tcp
server hs2_1 host-2.com:10000 ssl ca-file /path/to/truststore.pem check
server hs2_2 host-3.com:10000 ssl ca-file /path/to/truststore.pem check
server hs2_3 host-1.com:10000 ssl ca-file /path/to/truststore.pem check
frontend hivejdbc_front
bind *:10016 ssl crt /path/to/cert_key.pem
mode tcp
option tcplog
stick match src
stick-table type ip size 200k expire 30m
default_backend hivejdbc
backend hivejdbc
balance source
mode tcp
server hs2_1 host-2.com:10000 ssl ca-file /path/to/truststore.pem check
server hs2_2 host-3.com:10000 ssl ca-file /path/to/truststore.pem check
server hs2_3 host-1.com:10000 ssl ca-file /path/to/truststore.pem check
And here is an example for impala HA configuration on a secure cluster.
frontend impala_front
bind *:25003 ssl crt /path/to/cert_key.pem
mode tcp
option tcplog
default_backend impala
backend impala
balance leastconn
mode tcp
server impalad1 host-3.com:21000 ssl ca-file /path/to/truststore.pem check
server impalad2 host-2.com:21000 ssl ca-file /path/to/truststore.pem check
server impalad3 host-4.com:21000 ssl ca-file /path/to/truststore.pem check
frontend impalajdbc_front
bind *:21051 ssl crt /path/to/cert_key.pem
mode tcp
option tcplog
stick match src
stick-table type ip size 200k expire 30m
default_backend impalajdbc
backend impalajdbc
balance source
mode tcp
server impalad1 host-3.com:21050 ssl ca-file /path/to/truststore.pem check
server impalad2 host-2.com:21050 ssl ca-file /path/to/truststore.pem check
server impalad3 host-4.com:21050 ssl ca-file /path/to/truststore.pem check
Note: “check” is required at the end of each line to ensure HaProxy can detect any unreachable Impalad/HiveServer2 server, so HA failover can be successful. Without the TCP check, you may hit the “TSocket reads 0 byte” error when the Impalad/HiveServer2 server that Hue tries to connect to is down.
After editing the /etc/haproxy/haproxy.cfg file, run the following commands to restart the HaProxy service and check that the service restarts successfully.
service haproxy restart
service haproxy status
We also need to add the following blocks to hue.ini.
[impala]
server_port=21051
[beeswax]
hive_server_port=10016
Read more about it in the How to optimally configure your Analytic Database for High Availability with Hue and other SQL clients post.
You need to enable WebHdfs or run an HttpFS server. To turn on WebHDFS, add this to your hdfs-site.xml and restart your HDFS cluster. Depending on your setup, your hdfs-site.xml might be in /etc/hadoop/conf.
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
You also need to add this to core-site.xml.
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
If you place your Hue Server outside the Hadoop cluster, you can run an HttpFS server to provide Hue access to HDFS. The HttpFS service requires only one port to be opened to the cluster.
Also add this in httpfs-site.xml, which might be in /etc/hadoop-httpfs/conf.
<property>
<name>httpfs.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>httpfs.proxyuser.hue.groups</name>
<value>*</value>
</property>
Hue submits MapReduce jobs to Oozie as the logged-in user. You need to configure Oozie to accept the hue user as a proxyuser. Specify this in your oozie-site.xml (even in a non-secure cluster), and restart Oozie:
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
<value>*</value>
</property>
Hue currently requires that the machines within your cluster can connect to each other freely over TCP. The machines outside your cluster must be able to open TCP port 8888 on the Hue Server (or the configured Hue web HTTP port) to interact with the system.
A recommended setup consists of:
Performing a GET on /desktop/debug/is_alive will return a 200 response if Hue is running.
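For example, with curl (assuming Hue listens on the default localhost:8888):
curl -s -o /dev/null -w "%{http_code}" http://localhost:8888/desktop/debug/is_alive
A printed 200 means the server is alive.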
Hue is often run with:
The task server is currently a work in progress to move all the blocking or resource-intensive operations out of the API server. Follow HUE-8738 (https://issues.cloudera.org/browse/HUE-8738) for more information on when the first usable tasks will be released.
Until then, here is how to try the task server service.
Make sure you have RabbitMQ installed and running.
sudo apt-get install rabbitmq-server -y
In hue.ini, tell the API server that the Task Server is available:
[desktop]
[[task_server]]
enabled=true
Starting the Task server:
./build/env/bin/hue runcelery worker --concurrency=1
Starting the Task server in a Cloudera Manager environment:
./build/env/bin/hue runcelery worker --concurrency=1 --cm-managed
Running a test task:
./build/env/bin/hue shell
from desktop.celery import debug_task
debug_task.delay()
debug_task.delay().get() # Works if result backend is setup and task_server is true in the hue.ini
Starting the Task Scheduler server:
./build/env/bin/celery -A core beat -l info
or when Django Celery Beat is enabled:
./build/env/bin/celery -A core beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
A Web proxy lets you centralize all the access to a certain URL and prettify the address (e.g. ec2-54-247-321-151.compute-1.amazonaws.com --> demo.gethue.com).
Here is one way to do it with Apache.
The Quick Start wizard allows you to perform the following Hue setup operations by clicking the tab of each step or sequentially by clicking Next in each screen:
Displays a list of the installed Hue applications and their configuration. The location of the folder containing the Hue configuration files is shown at the top of the page. Hue configuration settings are in the hue.ini configuration file.
Click the tabs under Configuration Sections and Variables to see the settings configured for each application. For information on configuring these settings, see Hue Configuration in the Hue installation manual.
Hue ships with a default configuration that will work for pseudo-distributed clusters. If you are running on a real cluster, you must make a few changes to the hue.ini configuration file (/etc/hue/hue.ini when installed from the package version, or desktop/conf/pseudo-distributed.ini when in development mode).
The following sections describe the key configuration options you must make to configure Hue.
This command outlines the various sections and options in the configuration, and provides help and information on the default values.
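Assuming a tarball install rooted at /usr/share/hue, that command is:
/usr/share/hue/build/env/bin/hue config_help | less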
These configuration variables are under the [desktop] section in the hue.ini configuration file.
Hue uses the CherryPy web server. You can use the following options to change the IP address and port that the web server listens on. The default setting is port 8888 on all configured IP addresses.
# Webserver listens on this address and port
http_host=0.0.0.0
http_port=8888
Gunicorn support is planned to come in via HUE-8739.
For security, you should also specify the secret key that is used for secure hashing in the session store. Enter a long series of random characters (30 to 60 characters is recommended).
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
NOTE: If you don't specify a secret key, your session cookies will not be secure. Hue will run but it will also display error messages telling you to set the secret key.
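One illustrative way to generate such a key (any method that produces 30 to 60 random characters works):
python -c "import random, string; print(''.join(random.choice(string.ascii_letters + string.digits) for _ in range(60)))"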
In the Hue ini configuration file, in the [desktop] section, you can enter the names of the app to hide:
[desktop]
# Comma separated list of apps to not load at server startup.
app_blacklist=beeswax,impala,security,filebrowser,jobbrowser,rdbms,jobsub,pig,hbase,sqoop,zookeeper,metastore,spark,oozie,indexer
By default, the first user who logs in to Hue can choose any username and password and becomes an administrator automatically. This user can create other user and administrator accounts. User information is stored in the Django database in the Django backend.
The authentication system is pluggable. For more information, see the SDK Documentation.
List of some of the possible authentications:
For example, to enable Hue to first attempt LDAP directory lookup before falling back to the database-backed user model, we can update the hue.ini configuration file or Hue safety valve in Cloudera Manager with a list containing first the LdapBackend followed by either the ModelBackend or custom AllowFirstUserDjangoBackend (permits first login and relies on user model for all subsequent authentication):
[desktop]
[[auth]]
backend=desktop.auth.backend.LdapBackend,desktop.auth.backend.AllowFirstUserDjangoBackend
This tells Hue to first check against the configured LDAP directory service, and if the username is not found in the directory, then attempt to authenticate the user with the Django user manager.
Programmatically
When a Hue administrator loses their password, a more programmatic approach is required to secure the administrator again. Hue comes with a wrapper around the python interpreter called the “shell” command. It loads all the libraries required to work with Hue at a programmatic level. To start the Hue shell, type the following command from the Hue installation root.
Then:
cd /usr/lib/hue (or /opt/cloudera/parcels/CDH-XXXXX/share/hue if using parcels and CM)
build/env/bin/hue shell
The following is a small script, that can be executed within the Hue shell, to change the password for a user named “example”:
from django.contrib.auth.models import User
user = User.objects.get(username='example')
user.set_password('some password')
user.save()
The script can also be invoked in the shell by using input redirection (assuming the script is in a file named script.py):
build/env/bin/hue shell < script.py
How to make a certain user a Hue admin
build/env/bin/hue shell
Then set these properties to true:
from django.contrib.auth.models import User
a = User.objects.get(username='hdfs')
a.is_staff = True
a.is_superuser = True
a.set_password('my_secret')
a.save()
Via a command
Go to the Hue machine, then into the Hue home directory, and type one of the following:
To change the password of the currently logged in Unix user:
build/env/bin/hue changepassword
If you don't remember the admin username, create a new Hue admin (you will then also be able to login and could change the password of another user in Hue):
build/env/bin/hue createsuperuser
The properties we need to tweak are leaflet_tile_layer and leaflet_tile_layer_attribution, which can be configured in the hue.ini file:
[desktop]
leaflet_tile_layer=https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}
leaflet_tile_layer_attribution='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community'
We explained how to run Hue with NGINX serving the static files or under Apache. If you use another proxy, you might need to set these options:
[desktop]
# Enable X-Forwarded-Host header if the load balancer requires it.
use_x_forwarded_host=false
# Support for HTTPS termination at the load-balancer level with SECURE_PROXY_SSL_HEADER.
secure_proxy_ssl_header=false
You can configure Hue to serve over HTTPS.
Configure Hue to use your private key by adding the following
options to the hue.ini
configuration file:
ssl_certificate=/path/to/certificate
ssl_private_key=/path/to/key
Ideally, you would have an appropriate key signed by a Certificate Authority.
If you're just testing, you can create a self-signed key using the openssl
command that may be installed on your system:
Create a key:
openssl genrsa 1024 > host.key
Create a self-signed certificate:
openssl req -new -x509 -nodes -sha1 -key host.key > host.cert
Note: The security vulnerability SWEET32, also called “Birthday attacks against TLS ciphers with 64bit block size”, is assigned CVE-2016-2183. It arises because legacy block ciphers with a 64-bit block size are vulnerable to a practical collision attack when used in CBC mode.
DES/3DES are the only ciphers with a 64-bit block size. One way to configure Hue not to use them:
[desktop]
ssl_cipher_list=DEFAULT:!DES:!3DES
When fetching a bigger result set from Hive/Impala, or bigger files such as images from HBase, you may need to increase the buffer size of the SASL lib used for Thrift SASL communication.
[desktop]
# This property specifies the maximum size of the receive buffer in bytes in thrift sasl communication,
# default value is 2097152 (2 MB), which equals to (2 * 1024 * 1024)
sasl_max_buffer=2097152
In the [useradmin] section of the configuration file, you can optionally specify the following:
default_user_group:: The name of a default group that is suggested when creating a user manually. If the LdapBackend or PamBackend are configured for doing user authentication, new users will automatically be members of the default group.
You can add a custom banner to the Hue Web UI by applying HTML directly to the property, banner_top_html. For example:
banner_top_html=<H4>My company's custom Hue Web UI banner</H4>
You can customize a splash screen on the login page by applying HTML directly to the property, login_splash_html. For example:
[desktop]
[[custom]]
login_splash_html=WARNING: You are required to have authorization before you proceed.
There is also the possibility to change the logo for further personalization.
[desktop]
[[custom]]
# SVG code to replace the default Hue logo in the top bar and sign in screen
# e.g. <image xlink:href="/static/desktop/art/hue-logo-mini-white.png" x="0" y="0" height="40" width="160" />
logo_svg=
You can go crazy and write there any SVG code you want. Please keep in mind your SVG should be designed to fit in a 160×40 pixels space. To have the same ‘hearts logo’ you can see above, you can type this code:
[desktop]
[[custom]]
logo_svg='<g><path stroke="null" id="svg_1" d="m44.41215,11.43463c-4.05017,-10.71473 -17.19753,-5.90773 -18.41353,-0.5567c-1.672,-5.70253 -14.497,-9.95663 -18.411,0.5643c-4.35797,11.71793 16.891,22.23443 18.41163,23.95773c1.5181,-1.36927 22.7696,-12.43803 18.4129,-23.96533z" fill="#ffffff"/> <path stroke="null" id="svg_2" d="m98.41246,10.43463c-4.05016,-10.71473 -17.19753,-5.90773 -18.41353,-0.5567c-1.672,-5.70253 -14.497,-9.95663 -18.411,0.5643c-4.35796,11.71793 16.891,22.23443 18.41164,23.95773c1.5181,-1.36927 22.76959,-12.43803 18.41289,-23.96533z" fill="#FF5A79"/> <path stroke="null" id="svg_3" d="m154.41215,11.43463c-4.05016,-10.71473 -17.19753,-5.90773 -18.41353,-0.5567c-1.672,-5.70253 -14.497,-9.95663 -18.411,0.5643c-4.35796,11.71793 16.891,22.23443 18.41164,23.95773c1.5181,-1.36927 22.76959,-12.43803 18.41289,-23.96533z" fill="#ffffff"/> </g>'
Read more about it in Hue with a custom logo post.
This article details how to store passwords in a script launched from the OS rather than having clear-text passwords in the hue*.ini files.
Some passwords go in Hue ini configuration file making them easily visible to Hue admin user or by users of cluster management software. You can use the password_script feature to prevent passwords from being visible.
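As a sketch (the script path is an assumption), point the relevant option, for example the database password under [desktop] [[database]], at an executable script that prints the password to stdout:
[desktop]
[[database]]
password_script=/path/to/my_password_script.sh
where the script can be as simple as:
#!/bin/bash
echo "secretpassword"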
Hue now offers a new property, idle_session_timeout, that can be configured in the hue.ini file:
[desktop]
[[auth]]
idle_session_timeout=600
When idle_session_timeout is set, users will automatically be logged out after N (e.g. 600) seconds of inactivity and be prompted to login again:
Read more about Auditing User Administration Operations with Hue and Cloudera Navigator.
Now that you've installed and started Hue, you can feel free to skip ahead to the section on using Hue.
Hue can detect certain invalid configurations.
To view the configuration of a running Hue instance, navigate to http://myserver:8888/hue/dump_config, also accessible through the About application.
Displays the Hue Server log and allows you to download the log to your local system in a zip file.
Read more on the Threads and Metrics pages blog post
The Threads page can be very helpful for debugging purposes. It includes a daemonic thread and the thread objects serving concurrent requests. The host name, thread name identifier and current stack frame of each are displayed. Those are useful when Hue “hangs”, sometimes because a request is too CPU intensive. There is also a REST API to get the dump of threads, using 'desktop/debug/threads'.
Read more on the Threads and Metrics pages blog post
Hue uses the PyFormance Python library to collect the metrics. These metrics are represented as gauges, counters, meters (rate of events over time), and histograms (statistical distribution of values). A REST API endpoint, '/desktop/metrics/', is also exposed to get a dump of all the metrics as JSON.
The below metrics of most concern to us are displayed on the page:
One of the most useful ones are the percentiles of response time of requests and the count of active users. Admins can either filter a particular property in all the metrics or select a particular metric for all properties.
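For example, both the threads and metrics dumps can be fetched with curl (assuming the default host and port):
curl http://localhost:8888/desktop/debug/threads
curl http://localhost:8888/desktop/metrics/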
The User Admin application lets a superuser add, delete, and manage Hue users and groups, and configure group permissions. Superusers can add users and groups individually, or import them from an LDAP directory. Group permissions define the Hue applications visible to group members when they log into Hue and the application features available to them.
Click the User Admin icon in the top right navigation bar under your username.
LDAP or PAM pass-through authentication with Hive or Impala and impersonation.
The User Admin application provides two levels of user privileges: superusers and users.
Users can change their name, e-mail address, and password and log in to Hue and run Hue applications, subject to the permissions provided by the Hue groups to which they belong.
Field | Description
--- | ---
Username | A user name that contains only letters, numbers, and underscores; blank spaces are not allowed and the name cannot begin with a number. The user name is used to log into Hue and in file permissions and job submissions. This is a required field.
Password and Password confirmation | A password for the user. This is a required field.
Create home directory | Indicate whether to create a directory named /user/username in HDFS. For non-superusers, the user and group of the directory are username. For superusers, the user and group are username and supergroup.
First name and Last name | The user's first and last name.
E-mail address | The user's e-mail address. The e-mail address is used by the Job Designer and Beeswax applications to send users an e-mail message after certain actions have occurred. The Job Designer sends an e-mail message after a job has completed. Beeswax sends a message after a query has completed. If an e-mail address is not specified, the application will not attempt to email the user.
Groups | The groups to which the user belongs. By default, a user is assigned to the **default** group, which allows access to all applications. See [Permissions](#permissions).
Active | Indicate that the user is enabled and allowed to log in. Default: checked.
Superuser status | Assign superuser privileges to the user.
Note:
Importing users from an LDAP directory does not import any password information. You must add passwords manually in order for a user to log in.
To add a user from an external LDAP directory:
Field | Description
--- | ---
Username | The user name.
Distinguished name | Indicate that Hue should use a full distinguished name for the user. This imports the user's first and last name, username, and email, but does not store the user password.
Create home directory | Indicate that Hue should create a home directory for the user in HDFS.
Click Add/sync user.
If the user already exists in the User Admin, the user information in User Admin is synced with what is currently in the LDAP directory.
You can sync the Hue user database with the current state of the LDAP directory using the Sync LDAP users/groups function. This updates the user and group information for the already imported users and groups. It does not import any new users or groups.
Superusers can add and delete groups, configure group permissions, and assign users to group memberships.
You can add groups, and delete the groups you've added. You can also import groups from an LDAP directory.
Field | Description
--- | ---
Name | The name of the group. Group names can only be letters, numbers, and underscores; blank spaces are not allowed.
Members | The users in the group. Check user names or check Select all.
Permissions | The applications the users in the group can access. Check application names or check Select all.
[desktop]
[[ldap]]
login_groups=ldap_grp1,ldap_grp2,ldap_grp3
Field | Description
--- | ---
Name | The name of the group.
Distinguished name | Indicate that Hue should use a full distinguished name for the group.
Import new members | Indicate that Hue should import the members of the group.
Import new members from all subgroups | Indicate that Hue should import the members of the subgroups.
Create home directories | Indicate that Hue should create home directories in HDFS for the imported members.
Permissions for Hue applications are granted to groups, with users gaining permissions based on their group membership. Group permissions define the Hue applications visible to group members when they log into Hue and the application features available to them.
A script called supervisor
manages all Hue processes. The supervisor is a
watchdog process -- its only purpose is to spawn and monitor other processes.
A standard Hue installation starts and monitors the following processes:
runcpserver - a web server based on CherryPy that provides the core web functionality of Hue

If you have installed other applications into your Hue instance, you may see other daemons running under the supervisor as well.
You can see the supervised processes running in the output of ps -f -u hue:
UID PID PPID C STIME TTY TIME CMD
hue 8685 8679 0 Aug05 ? 00:01:39 /usr/share/hue/build/env/bin/python /usr/share/hue/build/env/bin/desktop runcpserver
Note that the supervisor automatically restarts these processes if they fail for any reason. If the processes fail repeatedly within a short time, the supervisor itself shuts down.
The Hue logs are found in /var/log/hue, or in a logs directory under your Hue installation root. Inside the log directory you can find:
- An access.log file, which contains a log for all requests against the Hue web server.
- A supervisor.log file, which contains log information for the supervisor process.
- A supervisor.out file, which contains the stdout and stderr for the supervisor process.
- A .log file for each supervised process described above, which contains the logs for that process.
- A .out file for each supervised process described above, which contains the stdout and stderr for that process.

If users on your cluster have problems running Hue, you can often find error messages in these log files. If you are unable to start Hue from the init script, the supervisor.log file can often contain clues.
In addition to logging INFO level messages to the logs directory, the Hue web server keeps a small buffer of log messages at all levels in memory. You can view these logs by visiting http://myserver:8888/hue/logs. The DEBUG level messages shown can sometimes be helpful in troubleshooting issues.
To troubleshoot why Hue is slow or consuming high memory, an admin can enable instrumentation by setting the instrumentation flag to true:
[desktop]
instrumentation=true
If django_debug_mode is enabled, instrumentation is automatically enabled. This flag appends the response time and the total peak memory used since Hue started for every logged request.
[17/Apr/2018 15:18:43 -0700] access INFO 127.0.0.1 admin - "POST /jobbrowser/jobs/ HTTP/1.1" `returned in 97ms (mem: 135mb)`
[23/Apr/2018 10:59:01 -0700] INFO 127.0.0.1 admin - "POST /jobbrowser/jobs/ HTTP/1.1" returned in 88ms
Hue requires a SQL database to store small amounts of data, including user account information as well as history of job submissions and Hive queries. By default, Hue is configured to use the embedded database SQLite for this purpose, and should require no configuration or management by the administrator. However, MySQL is the recommended database to use. This section contains instructions for configuring Hue to access MySQL and other databases.
The default SQLite database used by Hue is located in /usr/share/hue/desktop/desktop.db. You can inspect this database from the command line using the sqlite3 program or by typing `/usr/share/hue/build/env/bin/hue dbshell`. For example:
sqlite3 /usr/share/hue/desktop/desktop.db
SQLite version 3.6.22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select username from auth_user;
admin
test
sample
sqlite>
It is strongly recommended that you avoid making any modifications to the database directly using SQLite, though this trick can be useful for management or troubleshooting.
If you use the default SQLite database, then copy the desktop.db
file to
another node for backup. It is recommended that you back it up on a regular
schedule, and also that you back it up before any upgrade to a new version of
Hue.
Although SQLite is the default database type, some advanced users may prefer to have Hue access an alternate database type. Note that if you elect to configure Hue to use an external database, upgrades may require more manual steps in the future.
The following instructions are for MySQL, though you can also configure Hue to work with other common databases such as PostgreSQL and Oracle.
To configure Hue to store data in MySQL:
Create a new database in MySQL and grant privileges to a Hue user to manage this database.
mysql> create database hue;
Query OK, 1 row affected (0.01 sec)
mysql> grant all on hue.* to 'hue'@'localhost' identified by 'secretpassword';
Query OK, 0 rows affected (0.00 sec)
Shut down Hue if it is running.
To migrate your existing data to MySQL, use the following command to dump the existing database data to a text file. Note that using the ".json" extension is required.
/usr/share/hue/build/env/bin/hue dumpdata > $some-temporary-file.json
Open the hue.ini file in a text editor. Directly below the [[database]] line, add the following options (and modify accordingly for your MySQL setup):
host=localhost
port=3306
engine=mysql
user=hue
password=secretpassword
name=hue
As the Hue user, configure Hue to load the existing data and create the necessary database tables:
/usr/share/hue/build/env/bin/hue syncdb --noinput
mysql -uhue -psecretpassword -e "DELETE FROM hue.django_content_type;"
/usr/share/hue/build/env/bin/hue loaddata $temporary-file-containing-dumped-data.json
Your system is now configured and you can start the Hue server as normal.
After installation, you can use Hue by navigating to http://myserver:8888/
.
The user guide will help users go through the various installed applications.
The two latest LTS versions of each browser:
Q: I moved my Hue installation from one directory to another and now Hue no longer functions correctly.
A: Due to the use of absolute paths by some Python packages, you must run a series of commands if you move your Hue installation. In the new location, run:
rm app.reg
rm -r build
make apps
Q: Why does "make install" compile other pieces of software?
A: In order to ensure that Hue is stable on a variety of distributions and architectures, it installs a Python virtual environment which includes its dependencies. This ensures that the software can depend on specific versions of various Python libraries and you don't have to be concerned about missing software components.
Your feedback is welcome. The best way to send feedback is to join the mailing list (https://groups.google.com/a/cloudera.org/group/hue-user) and send e-mail to [email protected].
If you find that something doesn't work, it'll often be helpful to include logs from your server. Please include the logs as a zip (or cut and paste the ones that look relevant) and send those with your bug reports.