Hue consists of 4 apps in a single-page interface that allows users to perform data analyses without losing context. The goal is to promote self-service and stay as simple as Excel, so that 80% of users can find, explore and query data and become more data driven.
1. Find or import your data
Use the left metadata assist to browse your existing data without losing your editor. The top search will look through your saved queries and matching tables, columns and databases. Objects can be tagged for quick retrieval or to assign a more “humane” name. If the data does not exist yet, just drag & drop it to trigger the Create Table wizard and import it in just two steps.
2. Query your data
Once you have found your data, the Editor's autocomplete is extremely powerful: it supports 90-100% of the language syntax and highlights any syntax or logical error. The right assistant provides quick previews of the datasets, shows which columns or JOINs are popular and recommends how to write optimized queries. After querying, refine your results before exporting them to S3/HDFS/ADLS or downloading them as CSV/Excel.
4 applications
Each app of Hue can be extended to support your own languages or apps as detailed in the SDK.
The layout simplifies the interface and is now a single-page app, which makes things snappier and unifies the apps.
From top to bottom we have:
Learn more about the Hue 4 user interface in detail.
The new search bar is always accessible at the top of the screen. It offers a document search, and a metadata search too if Hue is configured to access a metadata server like Cloudera Navigator.
Embedded Search & Tagging
Have you ever struggled to remember table names related to your project? Does it take much too long to find those columns or views? Hue now lets you easily search for any table, view, or column across all databases in the cluster. With the ability to search across tens of thousands of tables, you're able to quickly find the tables that are relevant for your needs for faster data discovery.
In addition, you can also now tag objects with names to better categorize them and group them to different projects. These tags are searchable, expediting the exploration process through easier, more intuitive discovery.
Through an integration with Cloudera Navigator, existing tags and indexed objects show up automatically in Hue, any additional tags you add appear back in Cloudera Navigator, and the familiar Cloudera Navigator search syntax is supported.
A top search bar now appears. The autocomplete offers a list of facets and prefills the top values. Pressing enter lists the available objects, which can be opened and explored further in the sample popup, the assist or directly into the table browser app.
Granular Search
By default, only tables and views are returned. To search for columns, partitions or databases, use the 'type:' filter.
Example of searches:
b
of the table web_logs
in the database default
Learn more on Search and Tagging.
Data where you need it when you need it
You can now find your Hue documents, HDFS and S3 files and more in the left assist panel. Right-clicking an item shows a list of actions, and you can drag and drop a file to get its path in your editor, and more.
This assistant content depends on the context of the application selected and will display the current tables or available UDFs.
This popup offers a quick way to see a sample of the data and other statistics on databases, tables, and columns. You can open the popup from the SQL Assist or with a right-click on any SQL object (table, column, function…). In this release, it also opens faster and caches the data.
As with Google Docs, queries, workflows and other documents can be saved and shared with other users.
Sharing happens on the main page or via the top right menu of the application. Users and groups with Read or Write permissions can be selected.
Via the Home page, saved documents can be exported for backups or transferring to another Hue.
The language is automatically detected from the Browser or OS. English, Spanish, French, German, Korean, Japanese and Chinese are supported.
The language can be manually set by a user in the "My Profile" page. Please go to My Profile > Step2 Profile and Groups > Language Preference and choose the language you want.
The goal of Hue's Editor is to make data querying easy and productive.
It focuses on SQL but also supports job submissions. It comes with an intelligent autocomplete, search & tagging of data and query assistance.
The custom SQL Editor page also describes the configuration steps. Any editor can be starred next to its name so that it becomes the default editor and the landing page when logging in.
First, in your hue.ini file, you will need to add the relevant database connection information under the librdbms section:
[librdbms]
[[databases]]
[[[postgresql]]]
nice_name=PostgreSQL
name=music
engine=postgresql_psycopg2
port=5432
user=hue
password=hue
options={}
Secondly, we need to add a new interpreter to the notebook app. This will allow the new database type to be registered as a snippet-type in the Notebook app. For query editors that use a Django-compatible database, the name in the brackets should match the database configuration name in the librdbms section (e.g. postgresql). The interface will be set to rdbms. This tells Hue to use the librdbms driver and corresponding connection information to connect to the database. For example, with the above postgresql connection configuration in the librdbms section, we can add a PostgreSQL interpreter with the following notebook configuration:
[notebook]
[[interpreters]]
[[[postgresql]]]
name=PostgreSQL
interface=rdbms
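The naming rule above (an rdbms interpreter must share its name with a librdbms database entry) can be checked mechanically. The following is only a minimal sketch: the small nested-section parser, the function names and the extra `mysql` interpreter in the sample are illustrative assumptions and are not part of Hue.

```python
import re

# Matches hue.ini-style headers such as [a], [[b]] or [[[c]]].
SECTION = re.compile(r"^(\[+)([^\[\]]+)(\]+)$")


def parse_nested_ini(text):
    """Parse nested sections into {(path, of, sections): {key: value}}."""
    sections = {}
    path = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SECTION.match(line)
        if match:
            depth = len(match.group(1))
            # A header at depth N replaces everything from depth N downwards.
            path = path[:depth - 1] + [match.group(2).strip()]
            sections.setdefault(tuple(path), {})
            continue
        key, _, value = line.partition("=")
        sections.setdefault(tuple(path), {})[key.strip()] = value.strip()
    return sections


def unmatched_rdbms_interpreters(text):
    """Names of rdbms interpreters with no [librdbms] database of the same name."""
    sections = parse_nested_ini(text)
    databases = {p[2] for p in sections
                 if len(p) == 3 and p[:2] == ("librdbms", "databases")}
    return [p[2] for p, options in sections.items()
            if len(p) == 3 and p[:2] == ("notebook", "interpreters")
            and options.get("interface") == "rdbms" and p[2] not in databases]


# The [[[mysql]]] interpreter is a hypothetical entry added to show a mismatch.
SAMPLE = """
[librdbms]
[[databases]]
[[[postgresql]]]
nice_name=PostgreSQL
name=music
engine=postgresql_psycopg2
[notebook]
[[interpreters]]
[[[postgresql]]]
name=PostgreSQL
interface=rdbms
[[[mysql]]]
name=MySQL
interface=rdbms
"""

print(unmatched_rdbms_interpreters(SAMPLE))
```

Here the `postgresql` interpreter resolves to its database entry, while the hypothetical `mysql` one is reported because nothing under librdbms carries that name.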
Note: To run a query, you must be logged in to Hue as a user that also has a Unix user account on the remote server.
If there are multiple statements in the query (separated by semi-colons), click Next in the Multi-statement query pane to execute the remaining statements.
When you have multiple statements, it's enough to put the cursor in the statement you want to execute. The active statement is indicated with a blue gutter marking.
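To make the idea of an "active statement" concrete, here is a rough sketch of how a cursor position could be mapped to one of several semicolon-separated statements. It is an illustration only, not the Editor's actual logic: the function names are made up, and escaped quotes and SQL comments are deliberately ignored.

```python
def split_statements(sql):
    """Return (start, end, text) per non-empty statement; end is exclusive."""
    statements = []
    start = 0
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(sql):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
        elif ch == ";":
            # A semicolon outside quotes closes the current statement.
            if sql[start:i].strip():
                statements.append((start, i + 1, sql[start:i].strip()))
            start = i + 1
    if sql[start:].strip():
        statements.append((start, len(sql), sql[start:].strip()))
    return statements


def statement_at(sql, cursor):
    """Return the statement containing the cursor offset, or None."""
    for start, end, text in split_statements(sql):
        # A cursor sitting right after a semicolon still counts as that statement.
        if start <= cursor <= end:
            return text
    return None


query = "SELECT 1; SELECT ';' AS s; SELECT 3"
print(statement_at(query, 12))
```

The semicolon inside the quoted string does not split the second statement, which is the main reason a plain `sql.split(";")` is not enough.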
Note: Use CTRL/Cmd + ENTER to execute queries.
Note: Under the logs panel, you can view any MapReduce or Impala jobs that the query generated.
To get started, press the export icon, the last element at the bottom of the action bar to the top left of the results. There are several ways you can export the results of a query.
Two of them offer great scalability:
1. Export to an empty folder on your cluster's file system. This exports the results using multiple files. In the export icon, choose Export and then All.
2. Export to a table. You can choose an already existing table or a new one. In the export icon, choose Export and then Table.
Two of them offer limited scalability:
1. Export to a file on your cluster's file systems. This exports the results to a single file. In the export icon, choose Export and then First XXX.
2. Download to your computer as a CSV or XLS. This exports the results to a single file in comma-separated values or Microsoft Office Excel format. In the export icon, choose Download as CSV or Download as XLS.
The pane at the top of the Editor lets you specify the following options:
DATABASE | The database containing the table definitions. |
SETTINGS | Override the Hive and Hadoop default settings. To configure a new setting: |
FILE RESOURCES | Make locally accessible files available on the Hadoop cluster at query execution time. Hive uses the Hadoop Distributed Cache to distribute the added files to all machines in the cluster at query execution time. |
USER-DEFINED FUNCTIONS | Specify user-defined functions. Click Add to configure a new setting. Specify the function name in the Name field, and specify the class name for Classname. You *must* specify a JAR file for the user-defined functions in FILE RESOURCES. To include a user-defined function in a query, add a $ (dollar sign) before the function name in the query. For example, if MyTable is a user-defined function name in the query, you would type: SELECT $MyTable |
PARAMETERIZATION | Indicate that a dialog box should display to enter parameter values when a query containing the string $parametername is executed. Enabled by default. |
To make your SQL editing experience better we've created a new autocompleter for Hue 3.11. The old one had some limitations and was only aware of parts of the statement being edited. The new autocompleter knows all the ins and outs of the Hive and Impala SQL dialects and will suggest keywords, functions, columns, tables, databases, etc. based on the structure of the statement and the position of the cursor.
The result is improved completion throughout. We now have completion for more than just SELECT statements: it will help you with the other DDL and DML statements too, such as INSERT, CREATE, ALTER and DROP.
Smart column suggestions
If multiple tables appear in the FROM clause, including derived and joined tables, it will merge the columns from all the tables and add the proper prefixes where needed. It also knows about your aliases, lateral views and complex types and will include those. It will now automatically backtick any reserved words or exotic column names where needed to prevent any mistakes.
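As a rough illustration of the behaviour described above, the sketch below backticks reserved or unusual column names and prefixes a column with its table alias only when the name is ambiguous. This is one plausible reading of "where needed", not Hue's implementation; the reserved-word set is a tiny illustrative subset and names that themselves contain backticks are not handled.

```python
import re

# Illustrative subset only; real dialects reserve many more words.
RESERVED = {"select", "from", "where", "group", "order", "table", "date"}
PLAIN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name):
    """Backtick a name if it is reserved or not a plain identifier."""
    if name.lower() in RESERVED or not PLAIN.match(name):
        return "`" + name + "`"
    return name


def column_suggestions(tables):
    """tables maps alias -> column names; returns merged suggestions."""
    counts = {}
    for columns in tables.values():
        for column in columns:
            counts[column] = counts.get(column, 0) + 1
    suggestions = []
    for alias, columns in tables.items():
        for column in columns:
            quoted = quote_identifier(column)
            # Prefix only the columns that exist in more than one table.
            suggestions.append(f"{alias}.{quoted}" if counts[column] > 1 else quoted)
    return suggestions


print(column_suggestions({"a": ["id", "name"], "b": ["id", "order"]}))
```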
Smart keyword completion
The new autocompleter suggests keywords based on where the cursor is positioned in the statement. Where possible it will even suggest more than one word at a time, as in the case of IF NOT EXISTS; no one likes to type too much, right? In the parts where order matters but the keywords are optional, for instance after FROM tbl, it will list the keyword suggestions in the order they are expected, with the first expected one on top. So after FROM tbl the WHERE keyword is listed above GROUP BY etc.
UDFs
The improved autocompleter will now suggest functions. For each function suggestion, an additional panel is added in the autocomplete dropdown showing the documentation and the signature of the function. The autocompleter knows about the expected types for the arguments and will only suggest the columns or functions that match the argument at the cursor position in the argument list.
Sub-queries, correlated or not
When editing subqueries it will only make suggestions within the scope of the subquery. For correlated subqueries the outside tables are also taken into account.
All about quality
We've fine-tuned the live autocompletion for a better experience and we've introduced some options under the editor settings where you can turn off live autocompletion or disable the autocompleter altogether (if you're adventurous). To access these settings open the editor and focus on the code area, press CTRL + , (or on Mac CMD + ,) and the settings will appear.
The autocompleter talks to the backend to get data for tables, databases, etc. By default it will time out after 5 seconds, but once the data has been fetched it is cached for the next time around. The timeout can be adjusted in the Hue server configuration.
We've got an extensive test suite, but not every possible statement is covered. If the autocompleter can't interpret a statement it will be silent and no drop-down will appear. If you encounter a case where you think it should suggest something but doesn't, or if it gives incorrect suggestions, then please let us know.
Learn more about it in Autocompleter for Hive and Impala.
Variables are used to easily configure parameters in a query. They can be of two types:
Single Valued
select * from web_logs where country_code = "${country_code}"
The variable can have a default value.
select * from web_logs where country_code = "${country_code=US}"
Multi Valued
select * from web_logs where country_code = "${country_code=CA, FR, US}"
In addition, the displayed text for multi valued variables can be changed.
select * from web_logs where country_code = "${country_code=CA(Canada), FR(France), US(United States)}"
For values that are not textual, omit the quotes.
select * from boolean_table where boolean_column = ${boolean_column}
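To show how the variable forms above fit together, here is a small sketch that extracts the placeholders and fills them in. It is an illustration under assumptions rather than Hue's own parser: the function names are invented, and treating the first listed option as the default for multi valued variables is a guess.

```python
import re

# ${name}, ${name=default} or ${name=a, b(Label), c}
VARIABLE = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")
# An option is a value with an optional "(display text)" suffix.
OPTION = re.compile(r"^(.*?)(?:\((.*)\))?$")


def parse_variables(sql):
    """Return {name: [(value, label), ...]} for every placeholder."""
    found = {}
    for name, spec in VARIABLE.findall(sql):
        options = []
        for item in (part.strip() for part in spec.split(",")):
            if not item:
                continue
            value, label = OPTION.match(item).groups()
            value = value.strip()
            options.append((value, label or value))
        found[name] = options
    return found


def substitute(sql, values=None):
    """Replace placeholders with supplied values, else the first option."""
    values = values or {}
    options = parse_variables(sql)

    def replace(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        if not options[name]:
            raise KeyError(f"no value supplied for variable {name!r}")
        return options[name][0][0]  # assumption: first option is the default

    return VARIABLE.sub(replace, sql)


print(substitute('select * from web_logs where country_code = "${country_code=US}"'))
```

Note that the quotes stay in the query text, so textual values keep their quotes and non-textual ones, such as the boolean example, are substituted bare.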
A little red underline will display the incorrect syntax so that the query can be fixed before submitting. A right click offers suggestions.
Read more about the Query Assistant with Navigator Optimizer Integration.
These visualizations are convenient for plotting chronological data or when subsets of rows have the same attribute: they will be stacked together.
Read more about extending charts.
The autocompleter will suggest popular tables, columns, filters, joins, group by, order by etc. based on metadata from Navigator Optimizer. A new “Popular” tab has been added to the autocomplete result dropdown which will be shown when there are popular suggestions available.
Risk and suggestions
While editing, Hue will run your queries through Navigator Optimizer in the background to identify potential risks that could affect the performance of your query. If a risk is identified, an exclamation mark is shown above the query editor and suggestions on how to improve it are displayed in the lower part of the right assistant panel.
Turns a list of semi-colon separated queries into an interactive presentation. It is great for doing demos or basic reporting.
Use the query editor with any database.
With Solr 5+, query collections like we would query a regular Hive or Impala table.
As Solr SQL is pretty recent, there are some caveats, notably Solr lacks support for:
which prevents a SQL UX experience comparable to that of other standard databases (but we track it in HUE-3686).
Presto is a high performance, distributed SQL query engine for big data.
Apache Kylin is an open-source online analytical processing (OLAP) engine. See how to configure the Kylin Query Editor.
Extend with SQL Alchemy, JDBC or build your own connectors.
The Editor application enables you to create and submit jobs to the cluster. You can include variables with your jobs to enable you and other users to enter values for the variables when they run your job.
All job design settings except Name and Description support the use of variables of the form $variable_name. When you run the job, a dialog box will appear to enable you to specify the values of the variables.
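The `$variable_name` form happens to match the placeholder syntax of Python's `string.Template`, so the prompt-then-fill behaviour can be sketched with it. This is only an analogy: Hue does not necessarily work this way, and the setting names and paths below are hypothetical.

```python
from string import Template


def job_variables(settings):
    """List the distinct $variables used across the given settings."""
    names = []
    for text in settings.values():
        for match in Template.pattern.finditer(text):
            name = match.group("named") or match.group("braced")
            if name and name not in names:
                names.append(name)
    return names


def fill_job(settings, values):
    """Substitute values into every setting; raises KeyError if one is missing."""
    return {key: Template(text).substitute(values) for key, text in settings.items()}


# Hypothetical settings; Name and Description are left out because they
# do not support variables.
settings = {
    "jar_path": "/user/$user/app.jar",
    "args": "$input_dir $output_dir",
}

print(job_variables(settings))
print(fill_job(settings, {"user": "alice", "input_dir": "/in", "output_dir": "/out"}))
```

The list returned by `job_variables` corresponds to the fields the dialog box would ask for before the job runs.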
Name | Identifies the job and its collection of properties and parameters. |
Description | A description of the job. The description is displayed in the dialog box that appears if you specify variables for the job. |
Advanced | Advanced settings:
|
Prepare | Specifies paths to create or delete before starting the workflow job. |
Params | Parameters to pass to a script or command. The parameters are expressed using the JSP 2.0 Specification (JSP.2.3) Expression Language, allowing variables, functions, and complex expressions as parameters. |
Job Properties | Job properties. To set a property value, click Add Property.
|
Files | Files to pass to the job. Equivalent to the Hadoop -files option. |
Archives | Archives to pass to the job. Equivalent to the Hadoop -archives option. |
A MapReduce job design consists of MapReduce functions written in Java. You can create a MapReduce job design from existing mapper and reducer classes without having to write a main Java class. You must specify the mapper and reducer classes as well as other MapReduce properties in the Job Properties setting.
Jar path | The fully-qualified path to a JAR file containing the classes that implement the Mapper and Reducer functions. |
A Java job design consists of a main class written in Java.
Jar path | The fully-qualified path to a JAR file containing the main class. |
Main class | The main class to invoke the program. |
Args | The arguments to pass to the main class. |
Java opts | The options to pass to the JVM. |
A Pig job design consists of a Pig script.
Script name | Script name or path to the Pig script. |
A Sqoop job design consists of a Sqoop command.
Command | The Sqoop command. |
A Shell job design consists of a shell command.
Command | The shell command. |
Capture output | Indicate whether to capture the output of the command. |
A DistCp job design consists of a DistCp command.
This is a quick way to submit any Jar or Python jar/script to a cluster via the Scheduler or Editor.
How to run Spark jobs with Spark on YARN? This often requires trial and error in order to make it work.
Hue leverages Apache Oozie to submit the jobs. It focuses on the yarn-client mode, as Oozie is already running the spark-submit command in a MapReduce2 task in the cluster. You can read more about the Spark modes here.
Here is how to get started successfully. And how to use the Spark Action.
Hue relies on Livy for the interactive Scala, Python and R snippets.
Livy was initially developed in the Hue project, but it gained a lot of traction and was moved to its own project on livy.io. Here is a tutorial on how to use a notebook to perform some Bike Data analysis.
Read more about it:
Make sure that the Notebook and interpreters are set in the hue.ini, and Livy is up and running:
[spark]
# Host address of the Livy Server.
livy_server_host=localhost
[notebook]
## Show the notebook menu or not
show_notebooks=true
[[interpreters]]
# Define the name and how to connect and execute the language.
[[[hive]]]
# The name of the snippet.
name=Hive
# The backend connection to use to communicate with the server.
interface=hiveserver2
[[[spark]]]
name=Scala
interface=livy
[[[pyspark]]]
name=PySpark
interface=livy
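One way to confirm that Livy is "up and running" is to ask its REST API for the list of sessions. The sketch below assumes Livy's default port 8998 and the `GET /sessions` endpoint; the helper names are made up for illustration, and your deployment may use a different port or require authentication.

```python
import json
from urllib.request import urlopen


def livy_sessions_url(host, port=8998):
    """Build the URL of the Livy sessions endpoint (8998 is the assumed default port)."""
    return f"http://{host}:{port}/sessions"


def session_states(payload):
    """Map session id -> state from a decoded /sessions response."""
    return {s["id"]: s.get("state", "unknown") for s in payload.get("sessions", [])}


def fetch_session_states(host, port=8998, timeout=5):
    """Query a live Livy server; raises URLError if it is not reachable."""
    with urlopen(livy_sessions_url(host, port), timeout=timeout) as response:
        return session_states(json.load(response))


# Offline example with a hand-written payload shaped like Livy's response.
print(session_states({"from": 0, "total": 1, "sessions": [{"id": 0, "state": "idle"}]}))
```

Calling `fetch_session_states("localhost")` against the `livy_server_host` configured above should return without error if the server is reachable, even when no session exists yet.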
The goal of the importer is to allow ad hoc queries on data not yet in the cluster and thereby expedite self-service analytics.
If you want to import your own data instead of installing the sample
tables, open the importer from the left menu or from the little +
in the left assist.
If you've ever struggled with creating new SQL tables from files, you'll be happy to learn that this is now much easier. The wizard has been revamped to two simple steps and also offers more formats. Now users just need to:
And that's it!
To learn more, watch the video on Data Import Wizard.
Although you can create tables by executing the appropriate Hive HQL DDL query commands, it is easier to create a table using the create table wizard.
From a File
If you've ever struggled with creating new SQL tables from files, you'll be happy to learn that this is now much easier. With the latest Hue release, you can now create these in an ad hoc way and thereby expedite self-service analytics. The wizard has been revamped to two simple steps and also offers more formats. Now users just need to:
Files can be dragged & dropped, selected from HDFS or S3 (if configured), and their formats are automatically detected. The wizard also assists when performing advanced functionalities like table partitioning, Kudu tables, and nested types.
Manually
In the past, indexing data into Solr to then explore it with a Dynamic Dashboard has been quite difficult. The task involved writing a Solr schema and a Morphlines file, then submitting a job to YARN to do the indexing. Getting this correct for nontrivial imports could often take a few days of work. Now with Hue's new feature you can start your YARN indexing job in minutes. This tutorial offers a step-by-step guide on how to do it.
Read more about ingesting data from traditional databases.
Dashboards are an interactive way to explore your data quickly and easily. No programming is required and the analysis is done by drag & drops and clicks.
Read more about Dashboards.
Simply drag & drop widgets that are interconnected together. This is great for exploring new datasets or monitoring without having to type.
Any CSV file can be dragged & dropped and ingested into an index in a few clicks via the Data Import Wizard. The indexed data is immediately queryable and its facets/dimensions will be very fast to explore.
The Collection browser was polished in the last releases and provides more information on the columns. The left metadata assist of Hue 4 makes it handy to list them and peek at their content via the sample popup.
The search box supports live prefix filtering of field data and comes with a Solr syntax autocomplete in order to make the querying intuitive and quick. Any field can be inspected for its top values or statistics. This analysis happens very fast as the data is indexed.
The top search bar offers a full autocomplete on all the values of the index.
The “More like This” feature lets you select the fields you would like to use to find similar records. This is a great way to find similar issues, customers, people and so on with regard to a list of attributes.
This is work in progress but dashboards will soon offer a classic reporting option.
Read more about extending connectors.
Hue's Browsers power your Data Catalog. They let you easily search, glance at and perform actions on data or jobs in Cloud or on-premise clusters.
The Table Browser enables you to manage the databases, tables, and partitions of the metastore shared by Hive and Impala. You can use Metastore Manager to perform the following operations:
Tables
The File Browser application lets you browse and manipulate files and directories in the Hadoop Distributed File System (HDFS), S3 or ADLS. With File Browser, you can:
Hue can be set up to read and write to a configured S3 account, and users get autocomplete capabilities and can directly query from and save data to S3 without any intermediate moving/copying to HDFS.
Create Hive Tables Directly From S3
Hue's Metastore Import Data Wizard can create external Hive tables directly from data directories in S3. This allows S3 data to be queried via SQL from Hive or Impala, without moving or copying the data into HDFS or the Hive Warehouse.
To create an external Hive table from S3, navigate to the Metastore app, select the desired database and then click the “Create a new table from a file” icon in the upper right.
Enter the table name and optional description, and in the “Input File or Directory” filepicker, select the S3A filesystem and navigate to the parent directory containing the desired data files and click the “Select this folder” button. The “Load Data” dropdown should automatically select the “Create External Table” option which indicates that this table will directly reference an external data directory.
Choose your input files' delimiter and column definition options and finally click “Create Table” when you're ready to create the Hive table. Once created, you should see the newly created table details in the Metastore.
Save Query Results to S3
Now that we have created external Hive tables from our S3 data, we can jump into either the Hive or Impala editor and start querying the data directly from S3 seamlessly. These queries can join tables and objects that are backed either by S3, HDFS, or both. Query results can then easily be saved back to S3.
S3 Configuration
Learn more about it on the ADLS integration post.
Users get autocomplete capabilities and more:
Exploring ADLS in Hue's file browser
Once Hue is successfully configured to connect to ADLS, we can view all accessible folders within the account by clicking on the ADLS root. From here, we can view the existing keys (both directories and files) and create, rename, move, copy, or delete existing directories and files. Additionally, we can directly upload files to ADLS.
Create Hive Tables Directly From ADLS
Hue's table browser import wizard can create external Hive tables directly from files in ADLS. This allows ADLS data to be queried via SQL from Hive or Impala, without moving or copying the data into HDFS or the Hive Warehouse. To create an external Hive table from ADLS, navigate to the table browser, select the desired database and then click the plus icon in the upper right. Select a file using the file picker and browse to a file on ADLS.
Save Query Results to ADLS
Now that we have created external Hive tables from our ADLS data, we can jump into either the Hive or Impala editor and start querying the data directly from ADLS seamlessly. These queries can join tables and objects that are backed either by ADLS, HDFS, or both. Query results can then easily be saved back to ADLS.
ADLS Configuration
You can use File Browser to view the input and output files of your MapReduce jobs. Typically, you can save your output files in /tmp or in your home directory if your system administrator set one up for you. You must have the proper permissions to manipulate other users' files.
To change to your home directory, click Home in the path field at the top of the File Browser window.
Note:
The Home button is disabled if you do not have a home directory. Ask a Hue administrator to create a home directory for you.
You can upload text and binary files to the HDFS.
You can download text and binary files from the HDFS.
You can extract zip archives to the HDFS. The archive is extracted to a directory named archivename.
File Browser supports the HDFS trash folder (home directory/.Trash) to contain files and directories before they are permanently deleted. Files in the folder have the full path of the deleted files (in order to be able to restore them if needed) and checkpoints. The length of time a file or directory stays in the trash depends on HDFS properties.
In the File Browser window, click .
Note:
Only the Hadoop superuser can change a file's or directory's owner, group, or permissions. The user who starts Hadoop is the Hadoop superuser. The Hadoop superuser account is not necessarily the same as a Hue superuser account. If you create a Hue user (in User Admin) with the same user name and password as the Hadoop superuser, then that Hue user can change a file's or directory's owner, group, or permissions.
Owner or Group
Click Submit to make the changes.
Permissions
You can view and edit files as text or binary.
View
Edit
Sentry roles and privileges can directly be edited in the Security interface.
Solr privileges can be edited directly via the interface.
For listing collections, query and creating collection:
Admin=*->action=*
Collection=*->action=*
Schema=*->action=*
Config=*->action=*
The Job Browser application lets you examine multiple types of jobs running in the cluster. Job Browser presents the job and tasks in layers. The top layer is a list of jobs, and you can link to a list of that job's tasks. You can then view a task's attempts and the properties of each attempt, such as state, start and end time, and output size. To troubleshoot failed jobs, you can also view the logs of each attempt.
If there are jobs running, then the Job Browser list appears.
Note: At any level you can view the log for an object by clicking the icon in the Logs column.
To view job information for an individual job:
There are three ways to access the new browser:
1. Best: Click on the query ID after executing a SQL query in the editor. This will open the mini job browser overlay at the current query. Having the query execution information side by side with the SQL editor is especially helpful to understand the performance characteristics of your queries.
2. Open the mini job browser overlay and navigate to the queries tab.
3. Open the job browser and navigate to the queries tab.
Query capabilities
Read more about it on Browsing Impala Query Execution within the SQL Editor.
List submitted workflows, schedules and bundles.
List Livy sessions and submitted statements.
The application lets you build workflows and then schedule them to run automatically at regular intervals. A monitoring interface shows the progress and logs, and allows actions like pausing or stopping jobs.
The Oozie Editor/Dashboard application allows you to define Oozie workflow, coordinator, and bundle applications, run workflow, coordinator, and bundle jobs, and view the status of jobs. For information about Oozie, see Oozie Documentation.
A workflow application is a collection of actions arranged in a directed acyclic graph (DAG). It includes two types of nodes:
A coordinator application allows you to define and execute recurrent and interdependent workflow jobs. The coordinator application defines the conditions under which the execution of workflows can occur.
A bundle application allows you to batch a set of coordinator applications.
In the Workflow Editor you can easily perform operations on Oozie action and control nodes.
The Workflow Editor supports dragging and dropping action nodes. As you move the action over other actions and forks, highlights indicate active areas. If there are actions in the workflow, the active areas are the actions themselves and the areas above and below the actions. If you drop an action on an existing action, a fork and join is added to the workflow.
Copy an action by clicking the Copy button.
The action is opened in the Edit Node screen.
Edit the action properties and click Done. The action is added to the end of the workflow.
Delete an action by clicking the button.
Note: workflow.xml files and their job.properties can also be selected and executed directly via the File Browser.
In Coordinator Manager you create Oozie coordinator applications and submit them for execution.
In the Coordinator Editor you specify coordinator properties and the datasets on which the workflow scheduled by the coordinator will operate by stepping through screens in a wizard. You can also advance to particular steps and revisit steps by clicking the Step "tabs" above the screens. The following instructions walk you through the wizard.
A bundle consists of a collection of schedules.
In the Bundle Editor, you specify properties by stepping through screens in a wizard. You can also advance to particular steps and revisit steps by clicking the Step "tabs" above the screens. The following instructions walk you through the wizard.
These modules are not active enough to be officially maintained in core Hue, but they are still functional and should fit your needs. Any contribution is welcome!
Check the SDK guide or contact the community about how to build your own custom app.
We'll take a look at the new HBase Browser App.
Prerequisites before using the app:
1. Have HBase and Thrift Service 1 initiated (Thrift can be configured)
2. Configure your list of HBase Clusters in hue.ini to point to your Thrift IP/Port
Note: With just a few changes in the Python API, the HBase browser could be compatible with Apache Kudu.
The smartview is the view that you land on when you first enter a table. On the left hand side are the row keys, and hovering over a row reveals a list of controls on the right. Click a row to select it, and once selected you can perform batch operations, sort columns, or do any number of standard database operations. To explore a row, simply scroll to the right. As you scroll, the row will continue to lazily load cells until the end.
To initially populate the table, you can insert a new row or bulk upload CSV/TSV/etc. type data into your table.
On the right hand side of a row is a '+' sign that lets you insert columns into your row.
To edit a cell, simply click to edit inline.
If you need more control or data about your cell, click “Full Editor” to edit.
In the full editor, you can view cell history or upload binary data to the cell. Binary data of certain MIME Types are detected, meaning you can view and edit images, PDFs, JSON, XML, and other types directly in your browser!
Hovering over a cell also reveals some more controls (such as the delete button or the timestamp). Click the title to select a few and do batch operations:
If you need some sample data to get started and explore, check out this tutorial on how to create an HBase table.
The "Smart Searchbar" is a sophisticated tool that helps you zero-in on your data. The smart search supports a number of operations. The most basic ones include finding and scanning row keys. Here I am selecting two row keys with:
domain.100, domain.200
Submitting this query gives me the two rows I was looking for. If I want to fetch rows after one of these, I have to do a scan. This is as easy as writing a '+' followed by the number of rows you want to fetch.
domain.100, domain.200 +5
This fetches domain.100 and domain.200 followed by the next 5 rows. If you're ever confused about your results, you can look just below the query bar, and you can also click in to edit your query.
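To make the row-key syntax concrete, here is a minimal Python sketch that splits such a query into row keys and scan lengths. It is not Hue's actual parser: the function name `parse_row_query` is hypothetical, and it assumes that a `+N` suffix applies to the row key it directly follows.

```python
import re


def parse_row_query(query):
    """Split 'key1, key2 +5' into (row_key, scan_length) pairs.

    A scan_length of 0 means a plain lookup of that row key.
    Assumption: '+N' belongs to the key immediately before it.
    """
    terms = []
    for part in query.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty terms such as a trailing comma
        match = re.fullmatch(r"(.+?)\s*\+(\d+)", part)
        if match:
            terms.append((match.group(1), int(match.group(2))))
        else:
            terms.append((part, 0))
    return terms


print(parse_row_query("domain.100, domain.200 +5"))
```

A row key that itself ends in `+` and digits would be misread by this sketch; a real parser would need an escaping rule for that case.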
The Smart Search also supports column filtering. On any row, I can specify the specific columns or families I want to retrieve. With:
domain.100[column_family:]
I can select a bare family, or mix columns from different families like so:
domain.100[family1:, family2:, family3:column_a]
Doing this will restrict my results from one row key to the columns I specified. If you want to restrict column families only, the same effect can be achieved with the filters on the right. Just click to toggle a filter.
Finally, let's try some more complex column filters. I can query for bare columns:
domain.100[column_a]
This will multiply my query over all column families. I can also do prefixes and scans:
domain.100[family: prefix* +3]
This will fetch me all columns in 'family:' whose names start with 'prefix', limited to 3 results. Finally, I can filter on range:
domain.100[family: column1 to column100]
This will fetch me all columns in 'family:' that are lexicographically >= column1 but <= column100. The first column ('column1') must be a valid column, but the second can just be any string for comparison.
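Lexicographic ordering can be surprising when column names end in numbers. The short Python sketch below only simulates the comparison on a made-up list of column names (it does not talk to HBase, and `columns_in_range` is a hypothetical helper); for ASCII names, Python string comparison orders them the same way as a byte-wise comparison.

```python
def columns_in_range(columns, start, stop):
    """Keep the columns that sort between start and stop (inclusive) as strings."""
    return [c for c in sorted(columns) if start <= c <= stop]


# Made-up column qualifiers for illustration only.
columns = ["column1", "column10", "column100", "column2", "column20"]

print(columns_in_range(columns, "column1", "column100"))
# prints ['column1', 'column10', 'column100']
```

Note that 'column2' is excluded: compared character by character, it sorts after 'column100'. Zero-padding the numeric part (for example 'column002') makes lexicographic and numeric order agree.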
The Smart Search also supports prefix filtering on rows. To select a prefixed row, simply type the row key followed by a star *. The prefix should be highlighted like any other searchbar keyword. A prefix scan is performed exactly like a regular scan, but with a prefixed row.
domain.10* +10
Finally, as a new feature, you can also take full advantage of the HBase filtering language by typing your filter string between curly braces. HBase Browser autocompletes your filters for you so you don't have to look them up every time. You can apply filters to rows or scans.
domain.1000 {ColumnPrefixFilter('100-') AND ColumnCountGetFilter(3)}
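If you generate such queries from a script, a small helper can keep the quoting and braces consistent. The Python sketch below is only an illustration; `filter_call` and `smart_search` are hypothetical names and not part of Hue or HBase, and the helper does not validate filter names or escape quotes inside arguments.

```python
def filter_call(name, *args):
    """Render one filter, wrapping string arguments in single quotes."""
    rendered = ", ".join(f"'{a}'" if isinstance(a, str) else str(a) for a in args)
    return f"{name}({rendered})"


def smart_search(row_key, *filters, operator="AND"):
    """Combine a row key with filters joined by one operator, inside curly braces."""
    body = f" {operator} ".join(filters)
    return f"{row_key} {{{body}}}"


query = smart_search(
    "domain.1000",
    filter_call("ColumnPrefixFilter", "100-"),
    filter_call("ColumnCountGetFilter", 3),
)
print(query)
# prints domain.1000 {ColumnPrefixFilter('100-') AND ColumnCountGetFilter(3)}
```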
This doc only covers a few basic features of the Smart Search. You can take advantage of the full querying language by referring to the help menu when using the app. These include column prefix, bare columns, column range, etc. Remember that if you ever need help with the searchbar, you can use the help menu that pops up while typing, which will suggest next steps to complete your query.
Import data from relational databases to an HDFS file or a Hive table using Apache Sqoop 1. It lets you bring large amounts of data into the cluster in just a few clicks via an interactive UI. This Sqoop connector was added to the existing import data wizard of Hue.
In the past, importing data using the Sqoop command line interface could be a cumbersome and inefficient process. The task expected users to have a good knowledge of Sqoop. For example, they would need to put together a series of required parameters with a specific syntax, which made errors easy to make. Oftentimes, getting those right could take a few hours of work. Now with Hue's new feature you can submit your Sqoop job in minutes. The imports run on YARN and are scheduled by Oozie. This tutorial offers a step-by-step guide on how to do it.
Learn more about it on the Importing data from traditional databases into HDFS/Hive in just a few clicks post.
The Sqoop UI enables transferring data from a relational database to Hadoop and vice versa. The UI uses Apache Sqoop to do this. See the Sqoop Documentation for more details on Sqoop.
Each item in the job list has a status indicating the last time the job was run. The progress of the job should update dynamically. There's a progress bar at the bottom of each item on the job list as well.
NOTE: If this does not work, it's likely because a job is using that connection. Make sure no jobs are using the connection that will be deleted.
The text field in the top-left corner of the Sqoop Jobs page enables fast filtering of Sqoop jobs by name.
The main two features are:
ZooKeeper Browser requires the ZooKeeper REST service to be running. Here is how to set it up:
First get and build ZooKeeper:
git clone https://github.com/apache/zookeeper
cd zookeeper
ant
Buildfile: /home/hue/Development/zookeeper/build.xml
init:
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/classes
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/package/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/test/lib
...
And start the REST service:
cd src/contrib/rest
nohup ant run&
If ZooKeeper and the REST service are not on the same machine as Hue, go update the Hue settings and specify the correct hostnames and ports:
[zookeeper]
  [[clusters]]
    [[[default]]]
      # Zookeeper ensemble. Comma separated list of Host/Port.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      ## host_ports=localhost:2181
      # The URL of the REST contrib service
      ## rest_url=http://localhost:9998
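A quick way to sanity-check a `host_ports` value before putting it in the settings is to split it the way the comment above describes. The Python sketch below is an illustration only; `parse_host_ports` is a hypothetical helper, not a Hue function, and it assumes every entry has the form host:port.

```python
def parse_host_ports(value):
    """Split 'host1:2181,host2:2182' into a list of (host, port) tuples."""
    pairs = []
    for item in value.split(","):
        # rpartition splits on the last colon, so the port is always the final piece.
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


print(parse_host_ports("localhost:2181,localhost:2182,localhost:2183"))
# prints [('localhost', 2181), ('localhost', 2182), ('localhost', 2183)]
```

An entry without a port raises a `ValueError` here, which is usually what you want when validating a configuration value.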
A basic read-only version was done in HUE-951.