
Hadoop and HBase: Commonly Used APIs

2013-10-23 


1. org.apache.hadoop.hbase Class HColumnDescriptor

An HColumnDescriptor contains information about a column family, such as the number of versions, compression settings, etc. It is used as input when creating a table or adding a column. Once set, the parameters that specify a column cannot be changed without deleting the column and recreating it. If there is data stored in the column, it will be deleted when the column is deleted.

public void setMaxVersions(int maxVersions)
Specifies the maximum number of versions of the data to keep. The default is 3.

2. org.apache.hadoop.hbase.filter Class FilterList

Implementation of Filter that represents an ordered List of Filters which will be evaluated with a specified boolean operator: FilterList.Operator.MUST_PASS_ALL (AND) or FilterList.Operator.MUST_PASS_ONE (OR). Since you can use Filter Lists as children of Filter Lists, you can create a hierarchy of filters to be evaluated. Defaults to FilterList.Operator.MUST_PASS_ALL.

TODO: Fix creation of Configuration on serialization and deserialization.
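The nesting of filter lists described above can be illustrated with a small JDK-only sketch. This is a toy model of the MUST_PASS_ALL / MUST_PASS_ONE semantics, not the HBase classes themselves: the class FilterListSketch, the helper filterList, and the sample rules on row keys are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of FilterList evaluation; NOT the HBase API. */
final class FilterListSketch {

    /** Mirrors the two operators named in the text. */
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Combines the child predicates, in order, under the given operator. */
    @SafeVarargs
    static <T> Predicate<T> filterList(Operator op, Predicate<T>... children) {
        List<Predicate<T>> ordered = Arrays.asList(children);
        return value -> op == Operator.MUST_PASS_ALL
                ? ordered.stream().allMatch(child -> child.test(value))   // AND
                : ordered.stream().anyMatch(child -> child.test(value));  // OR
    }

    /** Hypothetical rule: (starts with "row" AND at most 8 chars) OR equals "admin". */
    static Predicate<String> example() {
        Predicate<String> rowPrefix = key -> key.startsWith("row");
        Predicate<String> shortKey = key -> key.length() <= 8;
        Predicate<String> isAdmin = key -> key.equals("admin");

        // A list used as a child of another list forms a two-level hierarchy.
        Predicate<String> inner = filterList(Operator.MUST_PASS_ALL, rowPrefix, shortKey);
        return filterList(Operator.MUST_PASS_ONE, inner, isAdmin);
    }

    public static void main(String[] args) {
        Predicate<String> filter = example();
        for (String key : new String[] {"row-17", "admin", "row-0000001", "user-1"}) {
            System.out.println(key + " passes: " + filter.test(key));
        }
    }
}
```

With the real classes, the same shape would presumably be built by adding one FilterList to another; the sketch only shows how the boolean operators compose.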

3. org.apache.hadoop.hbase.filter Class SingleColumnValueFilter

This filter is used to filter cells based on value. It takes a CompareFilter.CompareOp operator (equal, greater, not equal, etc.), and either a byte[] value or a WritableByteArrayComparable.

If we have a byte[] value then we just do a lexicographic compare. For example, if the passed value is 'b', the cell has 'a', and the compare operator is LESS, then we will filter out this cell (return true). If this is not sufficient (e.g., you want to deserialize a long and then compare it to a fixed long value), then you can pass in your own comparator instead.
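To see why a plain lexicographic compare may not be sufficient for numbers, here is a minimal JDK-only sketch. It assumes unsigned byte-by-byte ordering and 8-byte big-endian encoding of longs; the class and method names are hypothetical and nothing here calls HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Contrasts byte-wise ordering with numeric ordering; NOT the HBase API. */
final class LexicographicCompareSketch {

    /** Unsigned byte-by-byte comparison (requires JDK 9 or later). */
    static int compareBytes(byte[] left, byte[] right) {
        return Arrays.compareUnsigned(left, right);
    }

    /** Encodes a long as 8 big-endian bytes (ByteBuffer's default order). */
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** What a custom comparator could do: decode first, then compare numerically. */
    static int compareAsLongs(byte[] left, byte[] right) {
        return Long.compare(ByteBuffer.wrap(left).getLong(), ByteBuffer.wrap(right).getLong());
    }

    public static void main(String[] args) {
        byte[] ten = "10".getBytes(StandardCharsets.UTF_8);
        byte[] nine = "9".getBytes(StandardCharsets.UTF_8);
        // '1' (0x31) sorts before '9' (0x39), so the text "10" orders before "9".
        System.out.println("\"10\" before \"9\" as bytes: " + (compareBytes(ten, nine) < 0));

        byte[] minusOne = encodeLong(-1L);
        byte[] one = encodeLong(1L);
        // -1 is encoded as eight 0xFF bytes, which order after the bytes of 1.
        System.out.println("-1 after 1 as bytes: " + (compareBytes(minusOne, one) > 0));
        System.out.println("-1 before 1 as longs: " + (compareAsLongs(minusOne, one) < 0));
    }
}
```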

You must also specify a family and qualifier. Only the value of this column will be tested. When using this filter on a Scan with specified inputs, the column to be tested should also be added as input (otherwise the filter will regard the column as missing).

To prevent the entire row from being emitted if the column is not found on a row, use setFilterIfMissing(boolean). Otherwise, if the column is found, the entire row will be emitted only if the value passes. If the value fails, the row will be filtered out.
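The row-level decision described above can be written out as a small decision function. This is a sketch under assumptions, not HBase code: a row is modeled as a Map from column name to value, the value test is an arbitrary Predicate, and emitRow is a hypothetical helper.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of the "filter if missing" decision; NOT the HBase API. */
final class MissingColumnSketch {

    /**
     * Returns true if the whole row would be emitted.
     * A row without the tested column is emitted unless filterIfMissing is set;
     * a row with the column is emitted only if its value passes the test.
     */
    static boolean emitRow(Map<String, String> row, String column,
                           Predicate<String> valuePasses, boolean filterIfMissing) {
        if (!row.containsKey(column)) {
            return !filterIfMissing;
        }
        return valuePasses.test(row.get(column));
    }

    public static void main(String[] args) {
        Predicate<String> isActive = value -> value.equals("active");
        Map<String, String> withColumn = Map.of("cf:status", "active");
        Map<String, String> wrongValue = Map.of("cf:status", "deleted");
        Map<String, String> noColumn = Map.of("cf:name", "alice");

        System.out.println(emitRow(withColumn, "cf:status", isActive, false));
        System.out.println(emitRow(wrongValue, "cf:status", isActive, false));
        System.out.println(emitRow(noColumn, "cf:status", isActive, false));
        System.out.println(emitRow(noColumn, "cf:status", isActive, true));
    }
}
```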

To test the values of previous versions (timestamps) as well, call setLatestVersionOnly(boolean) with false. The default is true, meaning that only the latest version's value is tested and all previous versions are ignored.

To filter based on the value of all scanned columns, use ValueFilter.

4. org.apache.hadoop.hbase.filter Class SingleColumnValueExcludeFilter

A Filter that checks a single column value, but does not emit the tested column. This will enable a performance boost over SingleColumnValueFilter if the tested column value is not actually needed as input (besides for the filtering itself).

5. org.apache.hadoop.mapreduce.lib.jobcontrol Class ControlledJob (manages job execution)

This class encapsulates a MapReduce job and its dependencies. It monitors the states of the depending jobs and updates the state of this job. A job starts in the WAITING state. If it does not have any depending jobs, or all of the depending jobs are in the SUCCESS state, then the job state will become READY. If any depending jobs fail, the job will fail too. When in the READY state, the job can be submitted to Hadoop for execution, with the state changing to RUNNING. From the RUNNING state, the job can get into the SUCCESS or FAILED state, depending on the status of the job execution.

6. org.apache.hadoop.mapreduce.lib.jobcontrol Class JobControl (manages job execution)

This class encapsulates a set of MapReduce jobs and their dependencies. It tracks the states of the jobs by placing them into different tables according to their states. This class provides APIs for the client app to add a job to the group and to get the jobs in the group in different states. When a job is added, an ID unique to the group is assigned to the job. This class has a thread that submits jobs when they become ready, monitors the states of the running jobs, and updates the states of jobs based on the state changes of their depending jobs. The class provides APIs for suspending and resuming the thread, and for stopping the thread.
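The state transitions described for ControlledJob can be sketched as a small state machine. This is a JDK-only toy model that follows the wording above, not the Hadoop classes: JobStateSketch, Job, checkState, submit, and finish are hypothetical names, and no cluster or control thread is involved.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the job states described in the text; NOT the Hadoop API. */
final class JobStateSketch {

    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    static final class Job {
        final String name;
        final List<Job> dependsOn;
        State state = State.WAITING;   // every job starts in WAITING

        Job(String name, Job... dependsOn) {
            this.name = name;
            this.dependsOn = Arrays.asList(dependsOn);
        }

        /** Re-evaluates a WAITING job against the states of its dependencies. */
        State checkState() {
            if (state != State.WAITING) {
                return state;
            }
            boolean allSucceeded = true;
            for (Job dependency : dependsOn) {
                if (dependency.state == State.FAILED) {
                    state = State.FAILED;   // a failed dependency fails this job too
                    return state;
                }
                if (dependency.state != State.SUCCESS) {
                    allSucceeded = false;
                }
            }
            if (allSucceeded) {
                state = State.READY;        // also covers "no dependencies"
            }
            return state;
        }

        /** READY -> RUNNING; stands in for submitting the job. */
        void submit() {
            if (state == State.READY) {
                state = State.RUNNING;
            }
        }

        /** RUNNING -> SUCCESS or FAILED, depending on the outcome. */
        void finish(boolean succeeded) {
            if (state == State.RUNNING) {
                state = succeeded ? State.SUCCESS : State.FAILED;
            }
        }
    }

    public static void main(String[] args) {
        Job extract = new Job("extract");
        Job transform = new Job("transform", extract);

        System.out.println(extract.name + ": " + extract.checkState());
        System.out.println(transform.name + ": " + transform.checkState());

        extract.submit();
        extract.finish(true);
        System.out.println(transform.name + ": " + transform.checkState());
    }
}
```

In this model the caller drives the transitions by hand; per the description above, JobControl does the equivalent polling and submission from its own thread.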
