Apache httpd + Tomcat ProxyPass: notes on handling an application outage
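Our system is deployed on Apache HTTP Server + Tomcat. Recently users kept having trouble logging in. At first it was just a few minutes each day during which login was impossible, and since the vendor was handling it I did not pay much attention. Over the last few days, however, it started happening frequently, so I had no choice but to look into it myself. These are my notes on the process.

Everything runs on a single server. The application is deployed on three Tomcat servers with jvmRoute set to t1, t2 and t3. They use the AJP/1.3 protocol on ports 9013, 9023 and 9033; the Tomcat HTTP ports are 8013, 8023 and 8033, and maxThreads is set to 1000. httpd uses ProxyPass to map requests onto the three instances with load balancing. The database is Oracle 12g.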
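For readers unfamiliar with this kind of setup, a ProxyPass balancer over three AJP backends might look roughly like the fragment below. This is only a sketch: the post does not show the real configuration, so the balancer name `appcluster`, the `/` mapping, the loopback address and the `stickysession` setting are all assumptions. Only the ports and route names come from the description above. It relies on mod_proxy, mod_proxy_ajp and mod_proxy_balancer being loaded.

```apache
# Hypothetical httpd.conf fragment; not the author's actual configuration.
<Proxy balancer://appcluster>
    # One member per Tomcat instance; route must match that instance's jvmRoute.
    BalancerMember ajp://127.0.0.1:9013 route=t1
    BalancerMember ajp://127.0.0.1:9023 route=t2
    BalancerMember ajp://127.0.0.1:9033 route=t3
</Proxy>
# Send all requests to the balancer and keep each session on one backend.
ProxyPass / balancer://appcluster/ stickysession=JSESSIONID
```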
I started with

    ps -ef | grep httpd | wc -l

to check the number of httpd processes. During normal operation it stays below 100, so nothing alarming there.
Next, a tally of TCP connections by state:

    netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

which gave:

    TIME_WAIT 410
    FIN_WAIT2 2
    ESTABLISHED 1585

It looked as if the connections had been used up.
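The tally above counts states across all connections, so it does not show which port the established connections belong to. A variation that groups ESTABLISHED connections by local port could look like the sketch below. It assumes the Linux net-tools `netstat -n` column layout (local address in field 4, state in the last field); the function name `tally_by_port` is made up for this example.

```shell
#!/bin/sh
# Tally ESTABLISHED TCP connections per local port, busiest port first.
# Reads `netstat -n` style lines on stdin; assumes the net-tools layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
tally_by_port() {
    awk '/^tcp/ && $NF == "ESTABLISHED" {
             n = split($4, parts, ":")   # port is the last colon-separated field
             count[parts[n]]++
         }
         END { for (p in count) print p, count[p] }' | sort -k2,2nr
}

# Usage: netstat -n | tally_by_port
```

Run without `-p`, since that option appends a PID/program column and the state would no longer be the last field.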
Then the Apache error_log:
[Tue Sep 04 18:39:55 2012] [error] (104)Connection reset by peer: ajp_ilink_receive() can't receive header
[Tue Sep 04 18:39:55 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Tue Sep 04 18:39:55 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from (null) ($IP)
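So the proxy was not getting a response. I searched Baidu without finding anything useful, and Google is barely usable from the company network, so a long search led nowhere.
It was getting late, already past 19:00, and I was anxious to get home for dinner.
I decided to look at the ports:

    netstat -apn | grep "*:80"

It was after hours, the count was in the single digits, and hardly anyone was using the system.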
Then the AJP ports 9013, 9023 and 9033:

    netstat -apn | grep 9013
    netstat -apn | grep 9023
    netstat -apn | grep 9033

To my surprise, every one of them was full of connections.
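A plain `grep 9013` also matches longer port numbers such as 19013, or a PID that happens to contain those digits. A stricter per-port count could be sketched as follows. It assumes `netstat -n` output without `-p`, so that the state is the last field; the function names `count_port` and `report_ajp` are hypothetical.

```shell
#!/bin/sh
# Count ESTABLISHED connections whose local or remote endpoint uses exactly
# the given port.  $1 = port, $2 = text captured from `netstat -n`.
count_port() {
    printf '%s\n' "$2" | awk -v port="$1" '
        /^tcp/ && $NF == "ESTABLISHED" {
            nl = split($4, l, ":"); nf = split($5, f, ":")
            if (l[nl] == port || f[nf] == port) c++
        }
        END { print c + 0 }'
}

# Print one "port count" line for each AJP port from the setup in this post.
report_ajp() {
    snapshot=$1
    for port in 9013 9023 9033; do
        echo "$port $(count_port "$port" "$snapshot")"
    done
}

# Usage: report_ajp "$(netstat -n)"
```

Taking one snapshot and reusing it for all three ports keeps the counts consistent with each other.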
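Clients still could not get through via Apache, yet connecting directly to Tomcat on ports 8013, 8023 and 8033 worked.
At this point it was clear that the problem lay in the connection between Apache and Tomcat, but I had no idea what caused it. To save time I simply restarted Tomcat, and peace returned.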
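At home that evening I googled some more and found [2], Apache bug 38227, which appears to have been fixed as of httpd 2.2.2.
The next morning at the office I checked our version: 2.2.22, so the fix should already be in, yet the problem was still there. According to the Apache log everything had been normal since the restart. The connection count showed few users, and of the three jvmRoutes only the one on port 9033 was in use.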
I opened the logs and started watching.
Apache httpd:

    tail -f error_log

Apache Tomcat:

    tail -f localhost.2012-09-05.log
After a while the problem showed up again, and httpd logged:
[Wed Sep 05 09:42:14 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Wed Sep 05 09:42:14 2012] [error] (70007)The timeout specified has expired: proxy: read response failed from 172.16.5.68:18033 (172.16.5.68)
[Wed Sep 05 09:42:28 2012] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
Tracing it into Tomcat:
2012-9-5 9:47:06 org.apache.catalina.core.StandardWrapperValve invoke
严重: Servlet.service() for servlet service threw exception
com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@67cec874 -- timeout at awaitAvailable()
    at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1317)
    at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:557)
    at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:477)
    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:525)
    at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
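A Baidu search turned up [3], and my heart sank. The Tomcat request threads were timing out while waiting for a connection from the c3p0 pool, and it appears that a database deadlock was holding those connections. Over time Tomcat could no longer respond to AJP requests, which brought the system down.
I adjusted the data source configuration and will watch how it behaves.
If that does not help, the last resort is to restart the Tomcat servers in rotation on a schedule.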
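For that last-resort option, a rotation script might look something like the sketch below. It restarts one instance at a time so the other two keep serving, and waits for each instance's HTTP port to answer before moving on. The install layout under `TOMCAT_ROOT`, the 10-second pause, the health check against `/` and all function names are assumptions for illustration; only the instance names and HTTP ports come from this post.

```shell
#!/bin/sh
# Rolling restart sketch: one Tomcat instance at a time.
# Paths and the health-check URL are hypothetical.
TOMCAT_ROOT=${TOMCAT_ROOT:-/opt/tomcat}   # assumed layout: $TOMCAT_ROOT/t1, t2, t3
DRY_RUN=${DRY_RUN:-0}

run() {
    # In dry-run mode only print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

wait_for_http() {
    # Poll the HTTP port until it answers, giving up after 30 attempts.
    port=$1; tries=0
    while [ "$tries" -lt 30 ]; do
        curl -s -o /dev/null "http://127.0.0.1:$port/" && return 0
        tries=$((tries + 1)); sleep 2
    done
    return 1
}

rolling_restart() {
    # instance:http-port pairs taken from the setup described in this post
    for pair in t1:8013 t2:8023 t3:8033; do
        name=${pair%%:*}; port=${pair##*:}
        run "$TOMCAT_ROOT/$name/bin/catalina.sh" stop
        run sleep 10
        run "$TOMCAT_ROOT/$name/bin/catalina.sh" start
        if [ "$DRY_RUN" != "1" ]; then
            wait_for_http "$port" || { echo "$name did not come back" >&2; return 1; }
        fi
    done
}

# Usage (for example from cron during a quiet period): rolling_restart
```

Setting `DRY_RUN=1` before calling `rolling_restart` prints the planned commands without touching any instance, which is a cheap way to check the paths first.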
References:
[1] http://www.2cto.com/os/201205/130110.html
[2] https://issues.apache.org/bugzilla/show_bug.cgi?id=38227
[3] http://aijuans.iteye.com/blog/1478466