org.apache.hadoop.hbase.coprocessor (Apache HBase 2.1.0-cdh6.3.2 API)

Interface Summary
Interface	Description
BulkLoadObserver	Coprocessors implement this interface to observe and mediate bulk load operations.
CoprocessorHost.ObserverGetter<C,O>	Implementations define a function to get an observer of type `O` from a coprocessor of type `C`.
CoprocessorService	Deprecated Since 2.0.
EndpointObserver	Coprocessors implement this interface to observe and mediate endpoint invocations on a region.
HasMasterServices	Deprecated Since 2.0.0 to be removed in 3.0.0.
HasRegionServerServices	Deprecated Since 2.0.0 to be removed in 3.0.0.
MasterCoprocessor
MasterCoprocessorEnvironment
MasterObserver	Defines coprocessor hooks for interacting with operations on the `HMaster` process.
ObserverContext<E extends CoprocessorEnvironment>	Carries the execution state for a given invocation of an Observer coprocessor (`RegionObserver`, `MasterObserver`, or `WALObserver`) method.
RegionCoprocessor
RegionCoprocessorEnvironment
RegionObserver	Coprocessors implement this interface to observe and mediate client actions on the region.
RegionServerCoprocessor
RegionServerCoprocessorEnvironment
RegionServerObserver	Defines coprocessor hooks for interacting with operations on the `HRegionServer` process.
SingletonCoprocessorService	Deprecated Since 2.0.
WALCoprocessor	WALCoprocessors do not support loading services using `Coprocessor.getServices()`.
WALCoprocessorEnvironment
WALObserver	Provides a way for coprocessors to observe, rewrite, or skip WALEdits as they are being written to the WAL.

Class Summary
Class	Description
AggregateImplementation<T,S,P extends com.google.protobuf.Message,Q extends com.google.protobuf.Message,R extends com.google.protobuf.Message>	A concrete AggregateProtocol implementation.
BaseEnvironment<C extends Coprocessor>	Encapsulation of the environment of each coprocessor.
BaseRowProcessorEndpoint<S extends com.google.protobuf.Message,T extends com.google.protobuf.Message>	This class demonstrates how to implement atomic read-modify-writes using `Region.processRowsWithLocks(org.apache.hadoop.hbase.regionserver.RowProcessor<?, ?>)` and Coprocessor endpoints.
ColumnInterpreter<T,S,P extends com.google.protobuf.Message,Q extends com.google.protobuf.Message,R extends com.google.protobuf.Message>	Defines how the value of a specific column is interpreted, and provides utility methods such as compare, add, and multiply for those values.
CoprocessorHost<C extends Coprocessor,E extends CoprocessorEnvironment<C>>	Provides the common setup framework and runtime services for coprocessor invocation from HBase services.
CoprocessorServiceBackwardCompatiblity	Deprecated
CoprocessorServiceBackwardCompatiblity.MasterCoprocessorService
CoprocessorServiceBackwardCompatiblity.RegionCoprocessorService
CoprocessorServiceBackwardCompatiblity.RegionServerCoprocessorService
Export	Export an HBase table.
Export.Response
MetaTableMetrics	A coprocessor that collects metrics from the meta table.
MetricsCoprocessor	Utility class for tracking metrics for various types of coprocessors.
MultiRowMutationEndpoint	This class demonstrates how to implement atomic multi row transactions using `HRegion.mutateRowsWithLocks(Collection, Collection, long, long)` and Coprocessor endpoints.
ObserverContextImpl<E extends CoprocessorEnvironment>	This is the only implementation of `ObserverContext`, which serves as the interface for third-party Coprocessor developers.

Enum Summary
Enum	Description
RegionObserver.MutationType	Mutation type for postMutationBeforeWAL hook

Exception Summary
Exception	Description
CoprocessorException	Thrown if a coprocessor encounters any exception.

Annotation Types Summary
Annotation Type	Description
CoreCoprocessor	Marker annotation that denotes Coprocessors that are core to HBase.

Package org.apache.hadoop.hbase.coprocessor Description

Overview
Coprocessor
RegionObserver
Endpoint
Coprocessor loading

Overview

Coprocessors are code that runs in-process on each region server. Regions contain references to the coprocessor implementation classes associated with them. Coprocessor classes can be loaded either from local jars on the region server's classpath or via the HDFS classloader.

Several types of coprocessors are provided to give enough flexibility for a range of use cases. Currently there are:

Coprocessor: provides region lifecycle management hooks, e.g., region open/close/split/flush/compact operations.
RegionObserver: provides hooks for monitoring table operations from the client side, such as table get/put/scan/delete.
Endpoint: provides on-demand triggers for arbitrary functions executed at a region. One use case is column aggregation at the region server.

Coprocessor

A coprocessor is required to implement the Coprocessor interface so that the coprocessor framework can manage it internally.

Another design goal of this interface is to provide simple features for making coprocessors useful, while exposing no more internal state or control actions of the region server than necessary and not exposing them directly.

Over the lifecycle of a region, the methods of this interface are invoked when the corresponding events happen. The master transitions regions through the following states:

unassigned -> pendingOpen -> open -> pendingClose -> closed.

Coprocessors have the opportunity to intercept and handle events in the pendingOpen, open, and pendingClose states.
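The state sequence above can be modeled as a small Java enum. This is a simplified illustration, not an HBase class: the names RegionState, next() and coprocessorsCanIntercept() are hypothetical, and the sketch assumes the strictly linear sequence described above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the region states listed above (not an HBase class).
enum RegionState {
  UNASSIGNED, PENDING_OPEN, OPEN, PENDING_CLOSE, CLOSED;

  // States in which coprocessors may intercept and handle events.
  private static final Set<RegionState> INTERCEPTABLE =
      EnumSet.of(PENDING_OPEN, OPEN, PENDING_CLOSE);

  // Returns the following state in the linear sequence; CLOSED is terminal.
  RegionState next() {
    if (this == CLOSED) {
      throw new IllegalStateException("CLOSED is a terminal state");
    }
    return values()[ordinal() + 1];
  }

  boolean coprocessorsCanIntercept() {
    return INTERCEPTABLE.contains(this);
  }
}

class RegionLifecycleDemo {
  public static void main(String[] args) {
    // Walk the sequence from UNASSIGNED to CLOSED and report each state.
    RegionState s = RegionState.UNASSIGNED;
    while (true) {
      System.out.println(s + " intercept=" + s.coprocessorsCanIntercept());
      if (s == RegionState.CLOSED) {
        break;
      }
      s = s.next();
    }
  }
}
```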

PendingOpen

The region server is opening a region to bring it online. Coprocessors can piggyback or fail this process.

preOpen, postOpen: Called before and after the region is reported as online to the master.

Open

The region is open on the region server and is processing both client requests (get, put, scan, etc.) and administrative actions (flush, compact, split, etc.). Coprocessors can piggyback administrative actions via:

preFlush, postFlush: Called before and after the memstore is flushed into a new store file.
preCompact, postCompact: Called before and after compaction.
preSplit, postSplit: Called before and after the region is split.

PendingClose

The region server is closing the region. This can happen as part of normal operations, or when the region server is aborting due to fatal conditions such as OOME, health check failure, or fatal filesystem problems. Coprocessors can piggyback on this event. If the server is aborting, an indication to this effect is passed as an argument.

preClose and postClose: Called before and after the region is reported as closed to the master.
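The pre/post pattern shared by these lifecycle hooks can be sketched in plain Java. This is a simplified model built on hypothetical LifecycleHooks and HookedRegion types: the real RegionObserver methods take an ObserverContext and additional arguments, and the region server drives them. The sketch shows two properties described above: a pre hook can fail the operation by throwing, and the close hooks receive a flag indicating whether the server is aborting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified hook interface; not HBase's RegionObserver.
interface LifecycleHooks {
  default void preFlush() throws IOException {}
  default void postFlush() {}
  default void preClose(boolean abortRequested) throws IOException {}
  default void postClose(boolean abortRequested) {}
}

// Hypothetical driver showing where pre and post hooks run around an action.
class HookedRegion {
  private final LifecycleHooks hooks;
  final List<String> log = new ArrayList<>();

  HookedRegion(LifecycleHooks hooks) {
    this.hooks = hooks;
  }

  void flush() throws IOException {
    hooks.preFlush();   // throwing here fails the flush before any work is done
    log.add("flush");   // stands in for writing the memstore to a store file
    hooks.postFlush();  // runs only after the work has completed
  }

  void close(boolean abortRequested) throws IOException {
    hooks.preClose(abortRequested);
    log.add(abortRequested ? "close(abort)" : "close");
    hooks.postClose(abortRequested);
  }
}
```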

RegionObserver

If the coprocessor implements the RegionObserver interface it can observe and mediate client actions on the region:

preGet, postGet: Called before and after a client makes a Get request.
preExists, postExists: Called before and after the client tests for existence using a Get.
prePut, postPut: Called before and after the client stores a value.
preDelete, postDelete: Called before and after the client deletes a value.
preScannerOpen, postScannerOpen: Called before and after the client opens a new scanner.
preScannerNext, postScannerNext: Called before and after the client asks for the next row on a scanner.
preScannerClose, postScannerClose: Called before and after the client closes a scanner.
preCheckAndPut, postCheckAndPut: Called before and after the client calls checkAndPut().
preCheckAndDelete, postCheckAndDelete: Called before and after the client calls checkAndDelete().

Here is an example of what a simple RegionObserver might look like. It shows how to implement access control for HBase: the coprocessor checks user information for a given client request (Get, Put, Delete, Scan) by injecting code at certain RegionObserver preXXX hooks. If the user is not allowed to access the resource, a CoprocessorException is thrown, and the client request is denied when the client receives this exception.

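package org.apache.hadoop.hbase.coprocessor;

import java.io.IOException;
import java.util.List;
import java.util.Optional;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.security.User;

// Sample access-control coprocessor. It implements RegionObserver
// and intercepts preXXX() methods to check user privilege for the given
// table and column family.
public class AccessControlCoprocessor implements RegionCoprocessor, RegionObserver {

  // Expose this instance as the region's observer.
  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  @Override
  public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> c,
      Get get, List<Cell> result) throws IOException {
    // check permissions of the calling user
    if (!isAccessAllowed(c.getCaller(), get)) {
      throw new CoprocessorException("User is not allowed to access.");
    }
  }

  // Hypothetical placeholder: replace with a real permission lookup.
  private boolean isAccessAllowed(Optional<User> caller, Get get) {
    return caller.isPresent();
  }

  // override prePut(), preDelete(), etc.
}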

Endpoint

Coprocessor and RegionObserver provide hooks for injecting user code that runs at each region. The user code is triggered by existing HTable and HBaseAdmin operations at specific hook points.

Coprocessor Endpoints allow you to define your own dynamic RPC protocol to communicate between clients and region servers, i.e., you can create new methods with custom request parameters and return types. RPC methods exposed by coprocessor Endpoints are triggered by calling client-side dynamic RPC functions such as HTable.coprocessorService(...).

To implement an Endpoint, you need to:

Define a protocol buffer Service and supporting Message types for the RPC methods. See the protocol buffer guide for more details on defining services.
Generate the Service and Message code using the protoc compiler.
Implement the generated Service interface and override the getServices() method in the relevant Coprocessor to return a reference to the Endpoint's protocol buffer Service instance.

For a more detailed discussion of how to implement a coprocessor Endpoint, along with some sample code, see the org.apache.hadoop.hbase.client.coprocessor package documentation.

Coprocessor loading

A customized coprocessor can be loaded in two ways: through configuration, or through the TableDescriptor of a newly created table.

(Currently there is no on-demand coprocessor loading mechanism for regions that are already open.)

Load from configuration

Whenever a region is opened, it reads coprocessor class names from the hbase.coprocessor.region.classes property of the Configuration. The coprocessor framework automatically loads the configured classes as default coprocessors. The classes must already be on the classpath.

  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.coprocessor.AccessControlCoprocessor, org.apache.hadoop.hbase.coprocessor.ColumnAggregationProtocol</value>
    <description>A comma-separated list of Coprocessors that are loaded by
    default. For any overridden coprocessor method from RegionObserver or
    Coprocessor, these classes' implementations will be called
    in order. After implementing your own
    Coprocessor, just put it on HBase's classpath and add the fully
    qualified class name here.
    </description>
  </property>

The first configured coprocessor is assigned Coprocessor.PRIORITY_SYSTEM as its priority, and each subsequent coprocessor's priority is incremented by one. Coprocessors are executed in ascending order of priority.
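The priority rule for configured coprocessors can be illustrated with a small helper that assigns priorities to a comma-separated class list. This is a sketch, not HBase's own loader: ConfiguredPriorities and assign() are hypothetical names, skipping blank and repeated entries is a choice made only in this sketch, and PRIORITY_SYSTEM is assumed to mirror the Coprocessor.PRIORITY_SYSTEM constant of HBase 2.x (Integer.MAX_VALUE / 4).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the "start at SYSTEM, add one per class" rule.
class ConfiguredPriorities {
  // Assumed to match Coprocessor.PRIORITY_SYSTEM in HBase 2.x.
  static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;

  // Maps each class name in a comma-separated property value to its priority,
  // keeping configuration order. Blank entries and repeats are skipped here.
  static Map<String, Integer> assign(String classesProperty) {
    Map<String, Integer> result = new LinkedHashMap<>();
    int priority = PRIORITY_SYSTEM;
    for (String raw : classesProperty.split(",")) {
      String name = raw.trim();
      if (name.isEmpty() || result.containsKey(name)) {
        continue;
      }
      result.put(name, priority++);
    }
    return result;
  }

  public static void main(String[] args) {
    String value = "org.apache.hadoop.hbase.coprocessor.AccessControlCoprocessor, "
        + "org.apache.hadoop.hbase.coprocessor.ColumnAggregationProtocol";
    assign(value).forEach((name, p) -> System.out.println(p + " " + name));
  }
}
```

With the property value from the XML example, AccessControlCoprocessor receives PRIORITY_SYSTEM and ColumnAggregationProtocol receives PRIORITY_SYSTEM + 1, so the former runs first.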

Load from table attribute

Coprocessor classes can also be configured as a table attribute. The attribute key must start with "Coprocessor", and the value must have the form <path>|<class>|<priority>, so that the framework can recognize and load it.

'COPROCESSOR$1' => 'hdfs://localhost:8020/hbase/coprocessors/test.jar|Test|1000'
'COPROCESSOR$2' => '/hbase/coprocessors/test2.jar|AnotherTest|1001'

<path> must point to a jar, which can be on any filesystem supported by the Hadoop FileSystem class.

<class> is the coprocessor implementation class. A jar can contain more than one coprocessor implementation, but only one can be specified in each table attribute.

<priority> is an integer. Coprocessors are executed in ascending order of priority. Coprocessors can optionally abort actions, so typically one would want to put authoritative CPs (perhaps security policy implementations) ahead of observers.

  // Assumes fs (a Hadoop FileSystem), classFullName (the coprocessor's fully
  // qualified class name) and connection (an open HBase Connection) exist.
  Path path = new Path(fs.getUri() + Path.SEPARATOR +
    "TestClassloading.jar");

  // create a table (name chosen for illustration) that references the jar
  TableDescriptor htd = TableDescriptorBuilder
                        .newBuilder(TableName.valueOf("TestClassloading"))
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("test"))
                        .setValue(Bytes.toBytes("Coprocessor$1"),
                          Bytes.toBytes(path.toString() +
                            "|" + classFullName +
                            "|" + Coprocessor.PRIORITY_USER))
                        .build();
  try (Admin admin = connection.getAdmin()) {
    admin.createTable(htd);
  }

Chain of RegionObservers

As described above, multiple coprocessors can be loaded at one region at the same time. In the case of RegionObserver, more than one RegionObserver can register for the same hook point, e.g., preGet(). When a region reaches the hook point, the framework invokes each registered RegionObserver in the order of its assigned priority.
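A priority-ordered chain of observers can be modeled in a few lines of plain Java. This is a simplified illustration, not HBase's CoprocessorHost: GetObserver, ObserverChain, and the string trace are hypothetical. Observers are kept sorted by ascending priority value, so the lowest value runs first, and an observer that throws stops the rest of the chain, which is why authoritative checks belong at lower priority values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical single-hook observer; the real RegionObserver has many hooks.
@FunctionalInterface
interface GetObserver {
  // Called before a Get; throwing a RuntimeException aborts the chain.
  void preGet(String row, List<String> trace);
}

// Hypothetical host that invokes registered observers in priority order.
class ObserverChain {
  private static final class Entry {
    final int priority;
    final GetObserver observer;

    Entry(int priority, GetObserver observer) {
      this.priority = priority;
      this.observer = observer;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  void register(int priority, GetObserver observer) {
    entries.add(new Entry(priority, observer));
    // Ascending priority; List.sort is stable, so ties keep registration order.
    entries.sort(Comparator.comparingInt(e -> e.priority));
  }

  // Runs every observer's preGet in order and returns what they recorded.
  List<String> runPreGet(String row) {
    List<String> trace = new ArrayList<>();
    for (Entry e : entries) {
      e.observer.preGet(row, trace);
    }
    return trace;
  }
}
```

Registering a user observer at 1001 and then a security observer at 10 still yields the trace [security:r1, user:r1] for row "r1", because invocation order depends only on priority.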
