
{Big Data} HBase

Published: 2018-01-30 16:26:29

1. What is HBase

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase you can build a large-scale structured storage cluster on inexpensive PC servers.

HBase's goal is to store and process large data sets; more concretely, to handle tables made up of millions of rows and columns using nothing more than commodity hardware.

HBase is an open-source implementation of Google Bigtable, though there are many differences. For example: Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce; Bigtable relies on Chubby as its coordination service, and HBase uses ZooKeeper for the same purpose.


The Hadoop-ecosystem layer diagram (not reproduced here) places HBase at the structured storage layer: Hadoop HDFS provides HBase with highly reliable low-level storage, Hadoop MapReduce provides high-performance computation over it, and ZooKeeper supplies stable coordination and a failover mechanism.

In addition, Pig and Hive provide high-level language support for HBase, which makes statistical processing of HBase data very simple, and Sqoop offers convenient RDBMS import, making migration of traditional database data into HBase straightforward.

2. Comparison with traditional databases

1. Problems traditional databases run into:

1) When the data volume gets very large, it can no longer be stored

2) There is no good backup mechanism

3) Performance degrades once the data reaches a certain size, and very large data sets are essentially unsupportable

2. HBase's advantages:

1) Linear scaling: as data volume grows, capacity is added by adding nodes

2) Data is stored on HDFS, so the backup mechanism is sound

3) Data is located through ZooKeeper coordination, so access is fast

3. Roles in an HBase cluster

1. One or more master nodes: HMaster

2. Multiple slave nodes: HRegionServer
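These two roles run as separate JVM processes, so they can be spotted with jps. A minimal sketch, using hypothetical sample output for illustration (on a real node you would capture `jps` itself):

```shell
#!/bin/sh
# Sample jps output for illustration only -- on a real node:
#   jps_output=$(jps)
jps_output="2101 HMaster
2345 Jps
2480 HRegionServer
1890 QuorumPeerMain"

# Check that the expected HBase roles appear in the process list.
for role in HMaster HRegionServer; do
  if printf '%s\n' "$jps_output" | grep -q "$role"; then
    echo "$role: running"
  else
    echo "$role: NOT running"
  fi
done
```

On a master-only node you would expect just HMaster; on the slaves, just HRegionServer.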


Installing and deploying HBase. Install HBase on machines that already run Hadoop and ZooKeeper; this course uses three machines, hdp08, hdp09, and hdp10, as the example.

1. Install the JDK, Hadoop, and ZooKeeper on hdp08, hdp09, and hdp10, and set the environment variables.

2. Download a stable HBase release from the Apache site (http://hbase.apache.org/). This course uses hbase-1.1.12-bin.tar.gz as the example; upload hbase-1.1.12-bin.tar.gz to hdp08.

3. Extract hbase-1.1.12-bin.tar.gz into /home/hadoop/apps:

[hadoop@hdp08 ~]$ tar zxvf hbase-1.1.12-bin.tar.gz -C apps

4. Rename the extracted directory to hbase:

[hadoop@hdp08 apps]$ mv hbase-1.1.12 hbase

5. Set the environment variables:

[root@hdp08 apps]# vi /etc/profile

HADOOP_HOME=/home/hadoop/apps/hadoop-2.8.1

JAVA_HOME=/opt/jdk1.8.0_121

ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper

HBASE_HOME=/home/hadoop/apps/hbase

PATH=$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

export HBASE_HOME ZOOKEEPER_HOME HADOOP_HOME JAVA_HOME PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

[root@hdp08 apps]# source /etc/profile
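Because PATH is searched left to right, placing $HBASE_HOME/bin first ensures the `hbase` command resolves to this installation. The settings above can be exercised in a plain shell (the paths are the ones used in this course; adjust them to your machines):

```shell
#!/bin/sh
# Paths from the walkthrough above; adjust to your installation.
HADOOP_HOME=/home/hadoop/apps/hadoop-2.8.1
JAVA_HOME=/opt/jdk1.8.0_121
ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper
HBASE_HOME=/home/hadoop/apps/hbase
PATH=$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HBASE_HOME ZOOKEEPER_HOME HADOOP_HOME JAVA_HOME PATH

# HBase's bin directory should now be the first PATH entry.
first_entry=${PATH%%:*}
echo "first PATH entry: $first_entry"
```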

6. Check the HBase version:

[hadoop@hdp08 apps]$ hbase version


7. Edit $HBASE_HOME/conf/hbase-env.sh:

[hadoop@hdp08 conf]$ vi hbase-env.sh

#

#/**

# * Copyright 2007 The Apache Software Foundation

# *

# * Licensed to the Apache Software Foundation (ASF) under one

# * or more contributor license agreements.  See the NOTICE file

# * distributed with this work for additional information

# * regarding copyright ownership.  The ASF licenses this file

# * to you under the Apache License, Version 2.0 (the

# * "License"); you may not use this file except in compliance

# * with the License.  You may obtain a copy of the License at

# *

# *     http://www.apache.org/licenses/LICENSE-2.0

# *

# * Unless required by applicable law or agreed to in writing, software

# * distributed under the License is distributed on an "AS IS" BASIS,

# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# * See the License for the specific language governing permissions and

# * limitations under the License.

# */


# Set environment variables here.


# This script sets variables multiple times over the course of starting an hbase process,

# so try to keep things idempotent unless you want to take an even deeper look

# into the startup scripts (bin/hbase, etc.)


# The java implementation to use.  Java 1.6 required.

export JAVA_HOME=/opt/jdk1.8.0_121/


# Extra Java CLASSPATH elements.  Optional. The commented-out line below is wrong for this setup; it can be replaced with the export that follows.

#export HBASE_CLASSPATH=/home/hadoop/hbase/conf

export JAVA_CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# The maximum amount of heap to use, in MB. Default is 1000.

# export HBASE_HEAPSIZE=1000


# Extra Java runtime options.

# Below are what we set by default.  May only work with SUN JVM.

# For more on why as well as other possible settings,

# see http://wiki.apache.org/hadoop/PerformanceTuning

export HBASE_OPTS="-XX:+UseConcMarkSweepGC"


# Uncomment below to enable java garbage collection logging for the server-side processes

# this enables basic gc logging for the server processes to the .out file

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS"


# this enables gc logging using automatic GC log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+. Either use this set of options or the one above

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M $HBASE_GC_OPTS"


# Uncomment below to enable java garbage collection logging for the client processes in the .out file.

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS"


# Uncomment below (along with above GC logging) to put GC information in its own logfile (will set HBASE_GC_OPTS).

# This applies to both the server and client GC options above

# export HBASE_USE_GC_LOGFILE=true



# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.

# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="

# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.



# Uncomment and adjust to enable JMX exporting

# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.

# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

#

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"


# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.

# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers


# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.

# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters


# Extra ssh options.  Empty by default.

# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"


# Where log files are stored.  $HBASE_HOME/logs by default.

# export HBASE_LOG_DIR=${HBASE_HOME}/logs


# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"


# A string representing this instance of hbase. $USER by default.

# export HBASE_IDENT_STRING=$USER


# The scheduling priority for daemon processes.  See 'man nice'.

# export HBASE_NICENESS=10


# The directory where pid files are stored. /tmp by default.

# export HBASE_PID_DIR=/var/hadoop/pids


# Seconds to sleep between slave commands.  Unset by default.  This

# can be useful in large clusters, where, e.g., slave rsyncs can

# otherwise arrive faster than the master can service them.

# export HBASE_SLAVE_SLEEP=0.1


# Tell HBase whether it should manage it's own instance of Zookeeper or not.

export HBASE_MANAGES_ZK=false


8. Edit hbase-site.xml:

[hadoop@hdp08 conf]$ vi hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdp08:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdp08:2181,hdp09:2181,hdp10:2181</value>
  </property>
</configuration>

Note: the value of hbase.zookeeper.quorum should list the machines that run ZooKeeper.
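As a quick sanity check, the same configuration can be regenerated from the shell and the quorum value verified to contain every ZooKeeper host (the hostnames are this course's example machines):

```shell
#!/bin/sh
# Recreate the hbase-site.xml from the step above in a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdp08:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdp08:2181,hdp09:2181,hdp10:2181</value>
  </property>
</configuration>
EOF

# Pull out the quorum value and confirm every ZooKeeper host is listed.
quorum=$(sed -n 's|.*<value>\(.*2181.*\)</value>.*|\1|p' "$conf")
for host in hdp08 hdp09 hdp10; do
  echo "$quorum" | grep -q "$host:2181" && echo "$host: in quorum"
done
rm -f "$conf"
```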


9. Edit regionservers:

[hadoop@hdp08 conf]$ vi regionservers

hdp09

hdp10


10. Send hdp08's hbase directory to hdp09 and hdp10:

[hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp09:/home/hadoop/apps

[hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp10:/home/hadoop/apps
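On larger clusters the per-host scp commands generalize to a loop. The sketch below is a dry run that only prints each command, so the host list can be checked before anything is copied:

```shell
#!/bin/sh
# Region-server hosts from this example cluster; extend as needed.
nodes="hdp09 hdp10"

# Dry run: print each scp command instead of executing it.
# Remove the echo-only behavior once the list looks right.
for node in $nodes; do
  cmd="scp -r hbase hadoop@$node:/home/hadoop/apps"
  echo "$cmd"
done
```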

11. Start HBase:

[hadoop@hdp08 apps]$ start-hbase.sh

To start the daemons on the local machine individually:

[hadoop@hdp08 bin]$ hbase-daemon.sh start master

[hadoop@hdp08 bin]$ hbase-daemon.sh start regionserver

12. Verify the startup

On each node, use jps to check which daemons are running.
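As a sketch, the expected daemon per node in this example cluster can be encoded in a small helper (this assumes the role layout described above: HMaster on hdp08, HRegionServer on hdp09 and hdp10; every node also runs ZooKeeper's QuorumPeerMain, started separately):

```shell
#!/bin/sh
# Map each example-cluster node to the HBase daemon jps should show there.
expected_for() {
  case "$1" in
    hdp08)       echo "HMaster" ;;
    hdp09|hdp10) echo "HRegionServer" ;;
    *)           echo "unknown node" ;;
  esac
}

for node in hdp08 hdp09 hdp10; do
  echo "$node should show: $(expected_for "$node")"
done
```

On each machine, compare the real `jps` output against this expectation.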


13. View the cluster status page:

http://192.168.195.138:16010/


14. Configure multiple HMasters

Create a new file named backup-masters under $HBASE_HOME/conf/ and add the hostname of each node to be used as a backup master. For example:

[hadoop@hdp08 conf]$ vi backup-masters

hdp09

Restart HBase after adding the file; a backup master then stands by and takes over if the active HMaster fails.


