Hadoop and Related Components: Issue Log
Hadoop
Issue log
1、Gap in transactions. Expected to be able to read up until at least txid 104138 but unable to find any edit logs containing txid 104138
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 104138 but unable to find any edit logs containing txid 104138

Attempted fix: run the NameNode in recovery mode.
[hadoop@hadoop01 bin]$ ./hadoop namenode -recover
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
...
22/05/05 10:53:29 INFO common.Storage: Lock on /opt/module/hadoop/data/dfs/name/in_use.lock acquired by nodename 42299@hadoop01
22/05/05 10:53:30 INFO namenode.FileJournalManager: Recovering unfinalized segments in /opt/module/hadoop/data/dfs/name/current
22/05/05 10:53:30 INFO namenode.FileJournalManager: Finalizing edits file /opt/module/hadoop/data/dfs/name/current/edits_inprogress_0000000000000104072 -> /opt/module/hadoop/data/dfs/name/current/edits_0000000000000104072-0000000000000104073
22/05/05 10:53:30 ERROR namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for (journal JournalAndStream(mgr=FileJournalManager(root=/opt/module/hadoop/data/dfs/name), stream=null))
java.lang.IllegalStateException: Can't finalize edits file /opt/module/hadoop/data/dfs/name/current/edits_inprogress_0000000000000104072 since finalized file already exists
at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.finalizeLogSegment(FileJournalManager.java:135)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.recoverUnfinalizedSegments(FileJournalManager.java:397)
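...
The finalized segment edits_0000000000000104072-0000000000000104073 already exists, so remove the leftover in-progress file and restart HDFS: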
[hadoop@hadoop01 bin]$ rm -rf /opt/module/hadoop/data/dfs/name/current/edits_inprogress_0000000000000104072
[hadoop@hadoop01 bin]$ start-dfs.sh
...
[hadoop@hadoop01 bin]$ jps
40839 DataNode
44265 NameNode
44844 Jps
41150 SecondaryNameNode
[hadoop@hadoop01 bin]$
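The clash reported by the recover run can be spotted before attempting recovery. Below is a minimal sketch, assuming the standard file naming in the NameNode's current directory (edits_<first>-<last> for finalized segments, edits_inprogress_<first> for the open one). It lists gaps between finalized segments and in-progress files that start at the same txid as an already finalized segment. It only looks at file names and does not validate edit log contents. The function names are hypothetical, not part of Hadoop.

```python
import os
import re

# Finalized segments: edits_<firstTxId>-<lastTxId>; open segment: edits_inprogress_<firstTxId>.
# Txids are zero-padded to 19 digits. Other files (fsimage_*, seen_txid, VERSION) are ignored.
FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def classify(names):
    """Split file names into sorted finalized (first, last) ranges and in-progress start txids."""
    finalized, in_progress = [], []
    for name in names:
        m = FINALIZED.match(name)
        if m:
            finalized.append((int(m.group(1)), int(m.group(2))))
            continue
        m = IN_PROGRESS.match(name)
        if m:
            in_progress.append(int(m.group(1)))
    return sorted(finalized), sorted(in_progress)


def find_problems(finalized, in_progress):
    """List txid gaps between finalized segments and in-progress segments that clash with one."""
    problems = []
    for (_, last), (first, _) in zip(finalized, finalized[1:]):
        if first > last + 1:
            problems.append(f"gap: txids {last + 1}..{first - 1} missing")
    finalized_starts = {first for first, _ in finalized}
    for start in in_progress:
        if start in finalized_starts:
            problems.append(f"edits_inprogress_{start:019d} clashes with a finalized segment")
    return problems


def scan(current_dir):
    """Check a NameNode storage 'current' directory, e.g. .../dfs/name/current."""
    return find_problems(*classify(os.listdir(current_dir)))
```

Usage on the directory from this item: scan("/opt/module/hadoop/data/dfs/name/current"). With the files shown in the recover output, it reports that edits_inprogress_0000000000000104072 clashes with a finalized segment.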
2、 Datanode denied communication with namenode because hostname cannot be resolved

Add to hdfs-site.xml:
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
https://www.cnblogs.com/suanec/p/7061485.html
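When a DataNode registers, the NameNode tries to resolve the DataNode's IP address back to a hostname. With dfs.namenode.datanode.registration.ip-hostname-check at its default of true, a failed lookup produces the error above, and setting it to false skips the check. You can confirm that the lookup really fails before changing the config. The following is a minimal sketch to run on the NameNode host. The IP passed in is whatever address the DataNode registers from, and the helper is plain Python, not a Hadoop API.

```python
import ipaddress
import socket


def reverse_lookup(ip):
    """Return the hostname that `ip` reverse-resolves to, or None if the lookup fails."""
    ipaddress.ip_address(ip)  # raise ValueError early for a malformed address
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror / socket.gaierror: no PTR record or /etc/hosts entry
        return None
    return hostname
```

If reverse_lookup returns None for the DataNode's address, the alternative to disabling the check is to add a DNS or /etc/hosts entry for that IP.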
3、namenode Directory xxx is in an inconsistent state
2022-05-30 04:26:48,772 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /root/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
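The exception says the storage directory (here /root/hadoop/tmp/dfs/name) does not exist or is not accessible. Below is a minimal pre-flight check of those conditions. It assumes the check runs as the same OS user as the NameNode, because a directory under /root, for instance, is typically not accessible to a non-root hadoop user. The function name is hypothetical.

```python
import os


def storage_dir_problem(path):
    """Return why `path` cannot serve as a NameNode storage directory, or None if it looks usable."""
    if not os.path.exists(path):
        return f"{path} does not exist"
    if not os.path.isdir(path):
        return f"{path} is not a directory"
    # The NameNode has to list, create and lock files inside the directory.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        return f"{path} is not readable, writable and searchable by the current user"
    return None
```

Usage: storage_dir_problem("/root/hadoop/tmp/dfs/name").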
4、datanode.DataNode: Problem connecting to server: hdfs-namenode-service.default.svc.cluster.local:9000
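This log line means the DataNode cannot reach the NameNode RPC address. A plain TCP connect from inside the DataNode pod helps separate a name-resolution failure from a port that is not reachable. Below is a minimal sketch, with the host and port taken from the log line. The helper name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a TCP connection to host:port; return None on success or a short error description."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:  # covers DNS failures (gaierror), refused connections and timeouts
        return f"{type(exc).__name__}: {exc}"
```

Usage: can_connect("hdfs-namenode-service.default.svc.cluster.local", 9000). A gaierror points at service or DNS naming. A refused connection or timeout points at the NameNode not listening on that address and port.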

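5、Formatting the NameNode
hadoop namenode -format
(Deprecated form; the current equivalent is hdfs namenode -format. Formatting creates a new, empty namespace and discards the existing metadata.)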
6、 dfs.name.dir | dfs.data.dir | hadoop.tmp.dir
### dfs.name.dir
### Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
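# This parameter determines the directory where HDFS metadata (the NameNode's name table) is stored.
# If several directories are listed, each of them holds a full copy of the metadata.
# Since Hadoop 2.x the key is dfs.namenode.name.dir; dfs.name.dir is still accepted as a deprecated alias.
For example: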
<property>
<name>dfs.name.dir</name>
<value>/pvdata/hadoopdata/name/,/opt/hadoopdata/name/</value>
</property>
### dfs.data.dir
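### Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
# This parameter determines where HDFS stores its block data.
# Listing directories on different partitions spreads HDFS storage across those partitions.
# Since Hadoop 2.x the key is dfs.datanode.data.dir; dfs.data.dir is still accepted as a deprecated alias.
For example: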
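<property>
<name>dfs.data.dir</name>
<!-- hypothetical mount points of /dev/sda3 and /dev/sda1: list directories on mounted filesystems, not the device nodes themselves -->
<value>/mnt/sda3/hadoopdata/,/mnt/sda1/hadoopdata/</value>
</property>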
https://blog.csdn.net/weixin_38847462/article/details/77879459
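The directories a NameNode or DataNode actually uses depend on which key is set and on hadoop.tmp.dir. In Hadoop 2.x, dfs.namenode.name.dir and dfs.datanode.data.dir default to file://${hadoop.tmp.dir}/dfs/name and file://${hadoop.tmp.dir}/dfs/data. Below is a minimal sketch that resolves the effective list from an hdfs-site.xml string. It makes three simplifying assumptions: it prefers the new key when both are present, it only substitutes ${hadoop.tmp.dir}, and the caller passes in the hadoop.tmp.dir value. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Current key first, deprecated alias second.
KEYS = {
    "name": ("dfs.namenode.name.dir", "dfs.name.dir"),
    "data": ("dfs.datanode.data.dir", "dfs.data.dir"),
}
# Defaults from hdfs-default.xml (Hadoop 2.x).
DEFAULTS = {
    "name": "file://${hadoop.tmp.dir}/dfs/name",
    "data": "file://${hadoop.tmp.dir}/dfs/data",
}


def read_properties(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name, value = prop.findtext("name"), prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def storage_dirs(props, kind, tmp_dir):
    """Return the directories configured for kind 'name' or 'data', falling back to the default."""
    for key in KEYS[kind]:
        if key in props:
            raw = props[key]
            break
    else:
        raw = DEFAULTS[kind]
    raw = raw.replace("${hadoop.tmp.dir}", tmp_dir)
    return [d.strip() for d in raw.split(",") if d.strip()]
```

With hadoop.tmp.dir set to /root/hadoop/tmp and no explicit name directory, this resolves to the file:///root/hadoop/tmp/dfs/name layout seen in the error of item 3.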
7、Datanode denied communication with namenode because hostname cannot be resolved: same issue as item 2; set dfs.namenode.datanode.registration.ip-hostname-check to false in hdfs-site.xml.
8、8030


9、Configuring compression in Hadoop (core-site.xml)
https://blog.csdn.net/zjh_746140129/article/details/79888298
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
LZO: https://blog.csdn.net/qq_41489540/article/details/109239716
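In the value above, only the org.apache.hadoop.io.compress.* codecs ship with Hadoop. The com.hadoop.compression.lzo.* classes come from the separately distributed hadoop-lzo library, whose jar and native libraries have to be installed on every node (see the LZO link). Below is a minimal sketch that lists the configured codecs and flags the ones outside the built-in package. It assumes the multi-line, comma-separated value format used above. The "not built in" test is just a package-prefix heuristic, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Package of the codecs bundled with Hadoop; other classes must be installed separately.
BUILT_IN_PREFIX = "org.apache.hadoop.io.compress."


def codecs_from_config(xml_text):
    """Return (codec class, needs_external_jar) pairs listed in io.compression.codecs."""
    for prop in ET.fromstring(xml_text).iter("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # The value may span several lines: split on commas and strip whitespace.
            classes = [c.strip() for c in value.split(",") if c.strip()]
            return [(c, not c.startswith(BUILT_IN_PREFIX)) for c in classes]
    return []
```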
10、YARN resources and configuration parameters
https://blog.csdn.net/weixin_44758876/article/details/122924263
k8s: Hadoop DataNode fails to start: failed to acquire lock on in_use.lock | failed to add storage directory /hadoop/dfs/data
Setup: GlusterFS + k8s + Hadoop DataNode
Fix: delete in_use.lock from the storage volume.
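Before deleting in_use.lock, it helps to see who wrote it. The log in item 1 shows a lock "acquired by nodename 42299@hadoop01", so the owner is recorded as pid@hostname. The sketch below assumes that string is stored as plain text in the lock file. If the process it names is no longer running, the lock is stale. Both helper names are hypothetical.

```python
import os


def read_lock_owner(storage_dir):
    """Return the owner string stored in storage_dir/in_use.lock, or None if there is no lock file."""
    lock = os.path.join(storage_dir, "in_use.lock")
    if not os.path.isfile(lock):
        return None
    with open(lock, encoding="utf-8", errors="replace") as fh:
        return fh.read().strip()


def parse_owner(owner):
    """Split a 'pid@hostname' owner string into (pid, hostname); (None, owner) for other shapes."""
    pid, sep, host = owner.partition("@")
    if sep and pid.isdigit():
        return int(pid), host
    return None, owner
```

Usage: parse_owner(read_lock_owner("/hadoop/dfs/data")) after checking that the result is not None.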
11、hbase The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so
2022-06-16 11:59:56,456 ERROR [Thread-13] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1044)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:383)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:545)
at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1325)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:871)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2109)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:566)
at java.lang.Thread.run(Thread.java:748)
2022-06-16 11:59:56,464 ERROR [Thread-13] master.HMaster: ***** ABORTING master hbase-master-service,16000,1655351992968: Unhandled exception. Starting shutdown. *****
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
...
Fix: add the following to hbase-site.xml:
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
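The error message offers two ways out: point hbase.wal.dir at a filesystem that supports hsync, or relax the check. The property above takes the second route. It tells HBase not to enforce the hflush/hsync stream capability, at the cost of the crash safety the check is meant to guarantee. If hbase-site.xml is generated by scripts (for example into a k8s ConfigMap), the sketch below sets or adds a property with Python's standard ElementTree. ElementTree drops XML comments and the xml-stylesheet line when it re-serializes the file. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return xml_text with <property> `name` set to `value`, adding the property if it is missing."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no existing property with this name
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Usage: set_property(text, "hbase.unsafe.stream.capability.enforce", "false").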
12、Phoenix: issues connecting to HBase
Make sure the Phoenix version is compatible with the HBase version; otherwise you may run into errors such as:
https://blog.csdn.net/hblicy/article/details/106406906
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl cannot be cast to com.google.protobuf.Service
Fix: add only phoenix-server-hbase-2.1-5.1.1.jar to hbase/lib.
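Phoenix 5.1 publishes a separate server jar for each HBase minor line, and the jar name encodes both versions: phoenix-server-hbase-2.1-5.1.1.jar is Phoenix 5.1.1 built for HBase 2.1. The sketch below picks the single server jar that matches the installed HBase version. It assumes that naming pattern and ignores client jars and any other files. The helper names are hypothetical.

```python
import re

# Assumed Phoenix 5.1 naming: phoenix-server-hbase-<hbase major.minor>-<phoenix version>.jar
SERVER_JAR = re.compile(r"^phoenix-server-hbase-(\d+\.\d+)-(\d+(?:\.\d+)*)\.jar$")


def jar_matches_hbase(jar_name, hbase_version):
    """True if the server jar was built for the major.minor line of `hbase_version` (e.g. '2.1.10')."""
    m = SERVER_JAR.match(jar_name)
    if not m:
        raise ValueError(f"unrecognised Phoenix server jar name: {jar_name}")
    hbase_line = ".".join(hbase_version.split(".")[:2])
    return m.group(1) == hbase_line


def pick_server_jar(jar_names, hbase_version):
    """Return the one server jar matching the HBase version; raise if there is not exactly one."""
    matches = [j for j in jar_names if SERVER_JAR.match(j) and jar_matches_hbase(j, hbase_version)]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one matching server jar, found {matches}")
    return matches[0]
```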

HBase
Recover lease on dfs file hdfs://hdfs-namenode-service:9000/hbase/MasterProcWALs/pv2-00000000000000000269.log | Failed to recover lease attempt=0 on file=hdfs://
Reference: https://www.136.la/jingpin/show-52637.html

Try deleting the log file under /hbase/MasterProcWALs:
root@hdfs-datanode-2:/# hdfs dfs -rmr /hbase/MasterProcWALs/pv2-00000000000000000269.log
rmr: DEPRECATED: Please use 'rm -r' instead.
22/06/22 10:32:24 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /hbase/MasterProcWALs/pv2-00000000000000000269.log
Kafka
Executing consumer group command failed due to The consumer group command timed out while waiting for group to initialize
root@kafka-0:/# kafka-consumer-groups --describe --group aa --bootstrap-server kafka-headless:9092
Error: Executing consumer group command failed due to The consumer group command timed out while waiting for group to initialize:
…
[2022-06-22 06:34:15,829] ERROR [KafkaApi-0] Number of alive brokers '2' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
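The broker log explains the timeout. The consumer offsets topic is created with offsets.topic.replication.factor replicas, so while only 2 of 3 brokers are alive the group coordinator cannot initialize. Either bring the missing broker up or lower the factor to at most the number of brokers. The setting only matters when the offsets topic is first created. The sketch below implements the check. It assumes a simple key=value server.properties (no ":" separators or line continuations) and falls back to Kafka's default of 3 when the key is absent. The helper names are hypothetical.

```python
def read_properties(text):
    """Parse simple key=value .properties text ('#' and '!' start comment lines) into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def offsets_topic_ok(props, alive_brokers):
    """True if enough brokers are alive to create the offsets topic with the configured replicas."""
    replication_factor = int(props.get("offsets.topic.replication.factor", "3"))  # Kafka default: 3
    return alive_brokers >= replication_factor
```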
Phoenix
Phoenix: ZooKeeper connections are not released
https://blog.csdn.net/wangweidong_hb/article/details/62235094
2022-06-22 15:21:45 [INFO ] [org.apache.zookeeper.ZooKeeper:438] - Initiating client connection, connectString=kafka-zookeeper-headless:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$575/722151900@7abeb208
2022-06-22 15:21:46 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:975] - Opening socket connection to server kafka-zookeeper-1.kafka-zookeeper-headless.default.svc.cluster.local/10.244.4.151:2181. Will not attempt to authenticate using SASL (unknown error)
2022-06-22 15:21:46 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:852] - Socket connection established to kafka-zookeeper-1.kafka-zookeeper-headless.default.svc.cluster.local/10.244.4.151:2181, initiating session
2022-06-22 15:21:46 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:1235] - Session establishment complete on server kafka-zookeeper-1.kafka-zookeeper-headless.default.svc.cluster.local/10.244.4.151:2181, sessionid = 0x2003ead4a8b000b, negotiated timeout = 40000
2022-06-22 15:21:46 [INFO ] [org.apache.phoenix.query.ConnectionQueryServicesImpl:432] - HConnection established. Stacktrace for informational purposes: hconnection-0x69336f1a java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.phoenix.util.LogUtil.getCallerStackTrace(LogUtil.java:55)
org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:432)
org.apache.phoenix.query.ConnectionQueryServicesImpl.access$400(ConnectionQueryServicesImpl.java:272)
org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2556)
org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2532)
org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2532)
org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:150)
org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
java.sql.DriverManager.getConnection(DriverManager.java:664)
java.sql.DriverManager.getConnection(DriverManager.java:247)
com.gitee.freakchicken.dbapi.util.JdbcUtil.getConnection(JdbcUtil.java:27)
com.gitee.freakchicken.dbapi.controller.DataSourceController.connect(DataSourceController.java:70)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
javax.servlet.http.HttpServlet.service(HttpServlet.java:660)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:200)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:834)
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
java.lang.Thread.run(Thread.java:748)
2022-06-22 15:22:17 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:1096] - Client session timed out, have not heard from server in 28834ms for sessionid 0x2003ead4a8b000b, closing socket connection and attempting reconnect
2022-06-22 15:22:18 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:975] - Opening socket connection to server kafka-zookeeper-0.kafka-zookeeper-headless.default.svc.cluster.local/10.244.4.150:2181. Will not attempt to authenticate using SASL (unknown error)
2022-06-22 15:22:18 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:852] - Socket connection established to kafka-zookeeper-0.kafka-zookeeper-headless.default.svc.cluster.local/10.244.4.150:2181, initiating session
2022-06-22 15:22:18 [INFO ] [org.apache.zookeeper.ClientCnxn$SendThread:1235] - Session establishment complete on server kafka-zookeeper-0.kafka-zookeeper-headless.default.svc.cluster.local/10.244.4.150:2181, sessionid = 0x2003ead4a8b000b, negotiated timeout = 40000
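Setting used in hbase-site.xml (hbase.zookeeper.property.* settings reach ZooKeeper only when HBase manages the ensemble; for an external ensemble such as kafka-zookeeper above, maxClientCnxns has to be set in that ensemble's zoo.cfg):
<property>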
<name>hbase.zookeeper.property.maxClientCnxns</name>
<value>300</value>
<description>Property from ZooKeeper's config zoo.cfg.
Limit on number of concurrent connections (at the socket level) that a
single client, identified by IP address, may make to a single member of
the ZooKeeper ensemble. Set high to avoid zk connection issues running
standalone and pseudo-distributed.
</description>
</property>
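To check whether a client host is accumulating ZooKeeper sessions (for example, an application that opens Phoenix connections without closing them), you can read the per-IP count from ZooKeeper's cons four-letter command. On ZooKeeper 3.5 and later, cons must be whitelisted via 4lw.commands.whitelist. The sketch below assumes the usual cons line shape " /ip:port[...](...)". The helper names are hypothetical.

```python
import re
import socket
from collections import Counter

# Assumed shape of a `cons` line: " /10.244.4.20:51612[1](queued=0,recved=5,sent=5,...)"
CONN_LINE = re.compile(r"^\s*/([0-9a-fA-F.:]+):\d+\[")


def count_by_ip(cons_output):
    """Count open ZooKeeper client connections per client IP from `cons` output."""
    counts = Counter()
    for line in cons_output.splitlines():
        m = CONN_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def fetch_cons(host, port=2181, timeout=5.0):
    """Send the `cons` four-letter command and return the server's full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"cons")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Usage: count_by_ip(fetch_cons("kafka-zookeeper-headless")).most_common(5) shows the client IPs holding the most connections on that server.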
DataXceiver error processing WRITE_BLOCK operation
https://www.fwqwd.com/12026.html
22/07/06 16:47:03 ERROR datanode.DataNode: hdfs-datanode-0.hdfs-datanode-service.default.svc.cluster.local:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.244.5.26:52214 ds
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:202)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:503)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:903)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:805)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:748)

Configuration options
Configuration reference: https://hadoop.apache.org/docs/r2.7.4/
1) hadoop
2) yarn
1.1 hadoop.tmp.dir


1.2 yarn.nodemanager.local-dirs

k8s hadoop
ERROR common.Storage: Failed to acquire lock on /hadoop/dfs/name/in_use.lock
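Hadoop cluster deployed on k8s with GlusterFS as the storage backend (the machines hosting the GlusterFS nodes had been shut down for a while); problems hit when trying to restart HDFS: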
# kubectl get po | grep hdfs
hdfs-datanode-0 1/1 Running 0 26s
hdfs-datanode-1 1/1 Running 0 23s
hdfs-datanode-2 1/1 Running 0 21s
hdfs-namenode-0 0/1 CrashLoopBackOff 0 30s
# kubectl describe po hdfs-namenode-0
...
22/09/01 10:56:01 ERROR common.Storage: Failed to acquire lock on /hadoop/dfs/name/in_use.lock. If this storage directory is mounted via NFS, ensure that the appropriate nfs lock services are running.
java.io.IOException: Read-only file system
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
...
22/09/01 10:56:01 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Read-only file system
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
...
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
22/09/01 10:56:01 INFO util.ExitUtil: Exiting with status 1
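The NameNode fails because it cannot write in_use.lock. The log reports "Read-only file system", so the volume backing /hadoop/dfs/name is mounted read-only, possibly as a consequence of the GlusterFS nodes having been down. The sketch below confirms this from a pod or host that mounts the same volume, first from the mount flags and then with a real write attempt. The probe file name and helper names are arbitrary.

```python
import os


def is_read_only(path):
    """True if the filesystem containing `path` is mounted read-only (ST_RDONLY mount flag)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def probe_write(path):
    """Create and delete a small file in `path`; return None on success or the OSError text."""
    probe = os.path.join(path, ".write_probe")
    try:
        with open(probe, "w") as fh:
            fh.write("ok")
        os.remove(probe)
    except OSError as exc:
        return str(exc)
    return None
```

Usage: is_read_only("/hadoop/dfs/name") and probe_write("/hadoop/dfs/name"). If both report a read-only filesystem, the volume has to be made writable again before the NameNode can start.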