0. Core error (full stack trace at the end)

Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://yfashmp02.yfco.yanfengco.com:8983/solr/dw_table
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:676)

Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)

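Background:
The job is submitted with spark-submit and uses Spark to write to SolrCloud on CDH, based on the open-source GitHub project spark-solr (https://github.com/lucidworks/spark-solr). It fails with the error above.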

Component versions:

  • spark: 2.4.0+cdh6.1.1
  • solr: 7.4.0+cdh6.1.1

1. Interpreting and investigating the error

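The message is clear enough: the client timed out while waiting for a response from the Solr server.

Querying Solr on the server showed that the target collection holds on the order of hundreds of millions of documents (the job writes a daily increment of roughly 500,000 documents).
Given the error, my first guess was that the collection is so large that some component in the write path times out while Spark is writing. Searching online turned up only vague advice to increase Solr's session timeout and connect timeout parameters, which did not help much.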


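Reading the logs myself, I noticed that in every failed run the error was logged exactly 2 minutes (120 s) after the last normal log line above it. An interval that precise strongly suggests a network timeout in one of the components.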
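The gap check described above can be automated. The following is a minimal sketch, assuming the driver log uses Spark's default log4j timestamp prefix (`yy/MM/dd HH:mm:ss`). The function name `find_gaps`, the 100-second threshold, and the sample log lines are all made up for illustration and do not come from the original job.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix, e.g. "21/03/15 10:02:05 ERROR ..." (Spark's default log4j pattern).
TS_PATTERN = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
TS_FORMAT = "%y/%m/%d %H:%M:%S"


def find_gaps(lines, min_gap=timedelta(seconds=100)):
    """Return (previous_line, next_line, gap) for consecutive timestamped
    lines that are at least min_gap apart."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        match = TS_PATTERN.match(line)
        if not match:
            continue  # stack-trace lines carry no timestamp, so skip them
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        if prev_ts is not None and ts - prev_ts >= min_gap:
            gaps.append((prev_line, line, ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps


# Hypothetical sample lines, only to exercise the function.
sample_log = [
    "21/03/15 10:00:00 INFO Demo: step 1 finished",
    "21/03/15 10:00:05 INFO Demo: last normal message",
    "21/03/15 10:02:05 ERROR Demo: request failed",
    "\tat com.example.Demo.run(Demo.scala:1)",
]

for before, after, gap in find_gaps(sample_log):
    print(f"{gap.total_seconds():.0f}s of silence before: {after}")
```

A gap that keeps landing on the same round number across runs (here 120 s) is a hint to look for a configured timeout with that value rather than for a genuinely slow operation.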

With 120 s as the timeout threshold, I checked the timeout-related settings of Solr, ZooKeeper, and Spark, and the trail led to:

spark.core.connection.ack.wait.timeout

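According to the Spark documentation, this parameter falls back to spark.network.timeout, whose default is 120s. To keep the change as minimally invasive as possible, I decided to change only spark.core.connection.ack.wait.timeout.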
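To make the fallback concrete, here is a small model of it in plain Python. This is a sketch of the documented behaviour, not Spark's actual code: `effective_ack_timeout` and `to_seconds` are hypothetical helpers, only a few time suffixes are handled, and a value without a suffix is assumed to mean seconds.

```python
import re

# Subset of time suffixes handled by this sketch (Spark accepts more).
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600}


def to_seconds(value, default_unit="s"):
    """Convert a time string such as '120s', '5min' or '300' to seconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value.strip().lower())
    if not match:
        raise ValueError(f"unparseable time value: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit  # assumption: no suffix means seconds
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return int(number) * _UNIT_SECONDS[unit]


def effective_ack_timeout(conf):
    """Resolve the ack wait timeout: explicit value first, then
    spark.network.timeout, then the documented 120s default."""
    network = conf.get("spark.network.timeout", "120s")
    return to_seconds(conf.get("spark.core.connection.ack.wait.timeout", network))


print(effective_ack_timeout({}))
print(effective_ack_timeout({"spark.network.timeout": "200s"}))
print(effective_ack_timeout({"spark.core.connection.ack.wait.timeout": "300"}))
```

Setting only the more specific key leaves every other setting that inherits from spark.network.timeout at its default, which is what "minimally invasive" means here.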

2. Solution

Once the relevant parameter is known, the fix is simple: add or adjust it in the spark-submit script. Here I set it to 300.

--conf spark.core.connection.ack.wait.timeout=300

After this change, the job did indeed succeed!
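For readers who launch spark-submit from a wrapper script, the sketch below shows where the `--conf` flag sits in a complete command line. It only assembles and prints the command. `build_spark_submit` is a hypothetical helper and the jar path is a placeholder; the main class is the entry point that appears in the stack trace. `shlex.join` requires Python 3.8 or later.

```python
import shlex


def build_spark_submit(main_class, app_jar, conf, app_args=()):
    """Assemble a spark-submit argument list with one --conf per property."""
    cmd = ["spark-submit", "--class", main_class]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)    # options must come before the application jar
    cmd += list(app_args)  # anything after the jar is passed to the application
    return cmd


cmd = build_spark_submit(
    main_class="com.yfas.bdw.Job",  # entry point seen in the stack trace
    app_jar="/path/to/app.jar",     # placeholder path, not from the original post
    conf={"spark.core.connection.ack.wait.timeout": "300"},
)
print(shlex.join(cmd))
```

Keeping the properties in a dictionary makes it easy to raise or remove a single timeout later without editing a long command string by hand.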

3. Full stack trace

Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://yfashmp02.yfco.yanfengco.com:8983/solr/dw_bdp_tt_d_inventory_amount_available_by_date
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:676)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368)
at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1143)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:906)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:838)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:504)
at com.yfas.bdw.service.compute.BaseCompute$class.exeSql(BaseCompute.scala:95)
at com.yfas.bdw.service.compute.ScheduleCompute.exeSql(ScheduleCompute.scala:9)
at com.yfas.bdw.service.compute.ScheduleCompute$$anonfun$compute$1$$anonfun$apply$1.apply(ScheduleCompute.scala:42)
at com.yfas.bdw.service.compute.ScheduleCompute$$anonfun$compute$1$$anonfun$apply$1.apply(ScheduleCompute.scala:38)
at scala.collection.immutable.List.foreach(List.scala:392)
at com.yfas.bdw.service.compute.ScheduleCompute$$anonfun$compute$1.apply(ScheduleCompute.scala:38)
at com.yfas.bdw.service.compute.ScheduleCompute$$anonfun$compute$1.apply(ScheduleCompute.scala:28)
at scala.collection.immutable.List.foreach(List.scala:392)
at com.yfas.bdw.service.compute.ScheduleCompute.compute(ScheduleCompute.scala:28)
at com.yfas.bdw.trigger.AppTrigger$.run(AppTrigger.scala:24)
at com.yfas.bdw.Job$.delayedEndpoint$com$yfas$bdw$Job$1(Job.scala:14)
at com.yfas.bdw.Job$delayedInit$body.apply(Job.scala:5)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.yfas.bdw.Job$.main(Job.scala:5)
at com.yfas.bdw.Job.main(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:564)
… 42 more
