HBase如何实现故障排除与修复

发布时间：2021-12-08 14:28:37 作者：小新
来源：亿速云阅读：228

这篇文章主要介绍HBase如何实现故障排除与修复，文中介绍的非常详细，具有一定的参考价值，感兴趣的小伙伴们一定要看完！

一般准则

总是先从主服务器的日志开始。通常情况下，他总是一行一行的重复信息。如果不是这样，说明有问题，可以Google或是用search-hadoop.com来搜索遇到的异常。

错误很少仅仅单独出现在HBase中，通常是某一个地方出了问题，引起各处大量异常和调用栈跟踪信息。遇到这样的错误，最好的办法是往上查日志，找到最初的异常。例如区域服务器会在退出的时候打印一些度量信息。Grep这个转储应该可以找到最初的异常信息。

区域服务器的自杀是很“正常”的。当一些事情发生错误的，他们就会自杀。如果ulimit和xcievers没有修改，HDFS将无法运转正常，在HBase看来，HDFS死掉了。假想一下，你的MySQL突然无法访问它的文件系统，他会怎么做。同样的事情会发生在HBase和HDFS上。还有一个造成区域服务器切腹自杀的常见的原因是，他们执行了一个长时间的GC操作，这个时间超过了ZooKeeper的会话时长。

Logs

重要日志的位置( <user>是启动服务的用户，<hostname> 是机器的名字)

NameNode: $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log
DataNode: $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log
JobTracker: $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log
TaskTracker: $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log
HMaster: $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log
RegionServer: $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log
ZooKeeper: TODO

日志级别

启用 RPC级别日志

Enabling the RPC-level logging on a RegionServer can often given insight on timings at the server. Once enabled, the amount of log spewed is voluminous. It is not recommended that you leave this logging on for more than short bursts of time. To enable RPC-level logging, browse to the RegionServer UI and click on Log Level. Set the log level to DEBUG for the package org.apache.hadoop.ipc (Thats right, for hadoop.ipc, NOT, hbase.ipc). Then tail the RegionServers log. Analyze.

To disable, set the logging level back to INFO level.

JVM 垃圾收集日志

HBase is memory intensive, and using the default GC you can see long pauses in all threads including the Juliet Pause aka "GC of Death". To help debug this or confirm this is happening GC logging can be turned on in the Java virtual machine.

To enable, in hbase-env.sh add:

export HBASE_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/hbase/logs/gc-hbase.log"

Adjust the log directory to wherever you log. Note: The GC log does NOT roll automatically, so you'll have to keep an eye on it so it doesn't fill up the disk.

At this point you should see logs like so:

64898.952: [GC [1 CMS-initial-mark: 2811538K(3055704K)] 2812179K(3061272K), 0.0007360 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
64898.953: [CMS-concurrent-mark-start]
64898.971: [GC 64898.971: [ParNew: 5567K->576K(5568K), 0.0101110 secs] 2817105K->2812715K(3061272K), 0.0102200 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]

In this section, the first line indicates a 0.0007360 second pause for the CMS to initially mark. This pauses the entire VM, all threads for that period of time.

The third line indicates a "minor GC", which pauses the VM for 0.0101110 seconds - aka 10 milliseconds. It has reduced the "ParNew" from about 5.5m to 576k. Later on in this cycle we see:

64901.445: [CMS-concurrent-mark: 1.542/2.492 secs] [Times: user=10.49 sys=0.33, real=2.49 secs] 
64901.445: [CMS-concurrent-preclean-start]
64901.453: [GC 64901.453: [ParNew: 5505K->573K(5568K), 0.0062440 secs] 2868746K->2864292K(3061272K), 0.0063360 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 
64901.476: [GC 64901.476: [ParNew: 5563K->575K(5568K), 0.0072510 secs] 2869283K->2864837K(3061272K), 0.0073320 secs] [Times: user=0.05 sys=0.01, real=0.01 secs] 
64901.500: [GC 64901.500: [ParNew: 5517K->573K(5568K), 0.0120390 secs] 2869780K->2865267K(3061272K), 0.0121150 secs] [Times: user=0.09 sys=0.00, real=0.01 secs] 
64901.529: [GC 64901.529: [ParNew: 5507K->569K(5568K), 0.0086240 secs] 2870200K->2865742K(3061272K), 0.0087180 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 
64901.554: [GC 64901.555: [ParNew: 5516K->575K(5568K), 0.0107130 secs] 2870689K->2866291K(3061272K), 0.0107820 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
64901.578: [CMS-concurrent-preclean: 0.070/0.133 secs] [Times: user=0.48 sys=0.01, real=0.14 secs] 
64901.578: [CMS-concurrent-abortable-preclean-start]
64901.584: [GC 64901.584: [ParNew: 5504K->571K(5568K), 0.0087270 secs] 2871220K->2866830K(3061272K), 0.0088220 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 
64901.609: [GC 64901.609: [ParNew: 5512K->569K(5568K), 0.0063370 secs] 2871771K->2867322K(3061272K), 0.0064230 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
64901.615: [CMS-concurrent-abortable-preclean: 0.007/0.037 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 
64901.616: [GC[YG occupancy: 645 K (5568 K)]64901.616: [Rescan (parallel) , 0.0020210 secs]64901.618: [weak refs processing, 0.0027950 secs] [1 CMS-remark: 2866753K(3055704K)] 2867399K(3061272K), 0.0049380 secs] [Times: user=0.00 sys=0.01, real=0.01 secs] 
64901.621: [CMS-concurrent-sweep-start]

The first line indicates that the CMS concurrent mark (finding garbage) has taken 2.4 seconds. But this is a concurrent 2.4 seconds, Java has not been paused at any point in time.

There are a few more minor GCs, then there is a pause at the 2nd last line:

64901.616: [GC[YG occupancy: 645 K (5568 K)]64901.616: [Rescan (parallel) , 0.0020210 secs]64901.618: [weak refs processing, 0.0027950 secs] [1 CMS-remark: 2866753K(3055704K)] 2867399K(3061272K), 0.0049380 secs] [Times: user=0.00 sys=0.01, real=0.01 secs]

The pause here is 0.0049380 seconds (aka 4.9 milliseconds) to 'remark' the heap.

At this point the sweep starts, and you can watch the heap size go down:

64901.637: [GC 64901.637: [ParNew: 5501K->569K(5568K), 0.0097350 secs] 2871958K->2867441K(3061272K), 0.0098370 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 
...  lines removed ...
64904.936: [GC 64904.936: [ParNew: 5532K->568K(5568K), 0.0070720 secs] 1365024K->1360689K(3061272K), 0.0071930 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 
64904.953: [CMS-concurrent-sweep: 2.030/3.332 secs] [Times: user=9.57 sys=0.26, real=3.33 secs]

At this point, the CMS sweep took 3.332 seconds, and heap went from about ~ 2.8 GB to 1.3 GB (approximate).

The key points here is to keep all these pauses low. CMS pauses are always low, but if your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit as high at 400ms.

This can be due to the size of the ParNew, which should be relatively small. If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if its too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.

Add this to HBASE_OPTS:

export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m <cms options from above> <gc logging options from above>"

jstack

jstack 是一个最重要(除了看Log)的java工具，可以看到具体的Java进程的在做什么。可以先用Jps看到进程的Id,然后就可以用jstack。他会按线程的创建顺序显示线程的列表，还有这个线程在做什么。

ScannerTimeoutException 或 UnknownScannerException

当从客户端到RegionServer的RPC请求超时。例如如果Scan.setCacheing的值设置为500，RPC请求就要去获取500行的数据，每500次.next()操作获取一次。因为数据是以大块的形式传到客户端的，就可能造成超时。将这个 serCacheing的值调小是一个解决办法，但是这个值要是设的太小就会影响性能。

安全客户端不能连接

([由 GSSException 引起: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) There can be several causes that produce this symptom.

First, check that you have a valid Kerberos ticket. One is required in order to set up communication with a secure Apache HBase cluster. Examine the ticket currently in the credential cache, if any, by running the klist command line utility. If no ticket is listed, you must obtain a ticket by running the kinit command with either a keytab specified, or by interactively entering a password for the desired principal.

Then, consult the Java Security Guide troubleshooting section. The most common problem addressed there is resolved by setting javax.security.auth.useSubjectCredsOnly system property value to false.

Because of a change in the format in which MIT Kerberos writes its credentials cache, there is a bug in the Oracle JDK 6 Update 26 and earlier that causes Java to be unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1 or higher. If you have this problematic combination of components in your environment, to work around this problem, first log in with kinit and then immediately refresh the credential cache with kinit -R. The refresh will rewrite the credential cache without the problematic formatting.

Finally, depending on your Kerberos configuration, you may need to install the Java Cryptography Extension, or JCE. Insure the JCE jars are on the classpath on both server and client systems.

You may also need to download the unlimited strength JCE policy files. Uncompress and extract the downloaded file, and install the policy jars into <java-home>/lib/security.

HBase hbck

1.重新修复hbase meta表（根据hdfs上的regioninfo文件，生成meta表）
hbase hbck -fixMeta

2.重新将hbase meta表分给regionserver（根据meta表，将meta表上的region分给regionservere）
hbase hbck -fixAssignments

当出现漏洞

hbase hbck -fixHdfsHoles （新建一个region文件夹）
hbase hbck -fixMeta （根据regioninfo生成meta表）
hbase hbck -fixAssignments （分配region到regionserver上）

region重复问题

查看meta中的region

scan 'hbase:meta' , {LIMIT=>10,FILTER=>"PrefixFilter('INDEX_11')"}

在数据迁移的时候碰到两个重复的region
b0c8f08ffd7a96219f748ef14d7ad4f8，73ab00eaa7bab7bc83f440549b9749a3

删除两个重复的region

delete 'hbase:meta','INDEX_11,4380_2431,1429757926776.b0c8f08ffd7a96219f748ef14d7ad4f8.','info:regioninfo'  

delete 'hbase:meta','INDEX_11,5479_0041431700000000040100004815E9,1429757926776.73ab00eaa7bab7bc83f440549b9749a3.','info:regioninfo'

删除两个重复的hdfs

/hbase/data/default/INDEX_11/b0c8f08ffd7a96219f748ef14d7ad4f8
/hbase/data/default/INDEX_11/73ab00eaa7bab7bc83f440549b9749a3

对应的重启regionserver（只是为了刷新hmaster上汇报的RIS的状态）

肯定会丢数据，把没有上线的重复region上的数据丢失

新版本的 hbck

（1）缺失hbase.version文件
加上选项 -fixVersionFile 解决
（2）如果一个region即不在META表中，又不在hdfs上面，但是在regionserver的online region集合中
加上选项 -fixAssignments 解决
（3）如果一个region在META表中，并且在regionserver的online region集合中，但是在hdfs上面没有
加上选项 -fixAssignments -fixMeta 解决，（ -fixAssignments告诉regionserver close region），（ -fixMeta删除META表中region的记录（4）如果一个region在META表中没有记录，没有被regionserver服务，但是在hdfs上面有
加上选项 -fixMeta -fixAssignments 解决，（ -fixAssignments 用于assign region），（ -fixMeta用于在META表中添加region的记录）
（5）如果一个region在META表中没有记录，在hdfs上面有，被regionserver服务了
加上选项 -fixMeta 解决，在META表中添加这个region的记录，先undeploy region，后assign （6）如果一个region在META表中有记录，但是在hdfs上面没有，并且没有被regionserver服务
加上选项 -fixMeta 解决，删除META表中的记录
（7）如果一个region在META表中有记录，在hdfs上面也有，table不是disabled的，但是这个region没有被服务
加上选项 -fixAssignments 解决，assign这个region
（8）如果一个region在META表中有记录，在hdfs上面也有，table是disabled的，但是这个region被某个regionserver服务了
加上选项 -fixAssignments 解决，undeploy这个region
（9）如果一个region在META表中有记录，在hdfs上面也有，table不是disabled的，但是这个region被多个regionserver服务了加上选项 -fixAssignments 解决，通知所有regionserver close region，然后assign region

（10）如果一个region在META表中，在hdfs上面也有，也应该被服务，但是META表中记录的regionserver和实际所在的regionserver不相符
加上选项 -fixAssignments 解决
（11）region holes
需要加上 -fixHdfsHoles ，创建一个新的空region，填补空洞，但是不assign 这个 region，也不在META表中添加这个region的相关信息
（12）region在hdfs上面没有.regioninfo文件
-fixHdfsOrphans 解决
（13）region overlaps
需要加上 -fixHdfsOverlaps

说明：
（1）修复region holes时，-fixHdfsHoles 选项只是创建了一个新的空region，填补上了这个区间，还需要加上-fixAssignments -fixMeta 来解决问题，（ -fixAssignments 用于assign region），（ -fixMeta用于在META表中添加region的记录），所以有了组合拳 -repairHoles 修复region holes，相当于-fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans
（2） -fixAssignments，用于修复region没有assign、不应该assign、assign了多次的问题

（3）-fixMeta，如果hdfs上面没有，那么从META表中删除相应的记录，如果hdfs上面有，在META表中添加上相应的记录信息
（4）-repair 打开所有的修复选项，相当于-fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps

新版本的hbck从（1）hdfs目录（2）META（3）RegionServer这三处获得region的Table和Region的相关信息，根据这些信息判断并repair

以上是“HBase如何实现故障排除与修复”这篇文章的所有内容，感谢各位的阅读！希望分享的内容对大家有帮助，更多相关知识，欢迎关注亿速云行业资讯频道！