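A strange problem with LVS + Keepalived + Nginx
Our project's server architecture recently needed an upgrade. For high availability, we decided to use keepalived for active/standby failover between two LVS Servers, with LVS acting as the load balancer for both the database and the front-end Nginx.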
My environment:
VIP 10.8.12.200
PostgreSql RealServer1 10.8.12.208
PostgreSql RealServer2 10.8.12.209
Tomcat 1 10.8.12.203
Tomcat 2 10.8.12.204
LVS Server1 & Nginx RealServer1 10.8.12.201
LVS Server2 & Nginx RealServer2 10.8.12.202
gateway 10.8.12.254
Each of these servers has a single network card and runs Ubuntu 11.04 Server.
All of them are virtual machines created with vmware. Because the number of servers available in production is limited, the LVS Server and the Nginx RealServer are installed on the same machine. ipvsadm and keepalived are installed on 10.8.12.201 (LVS Server1 & Nginx RealServer1) and 10.8.12.202 (LVS Server2 & Nginx RealServer2).
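I prepared two schemes:
Scheme 1:
The front end uses Nginx as a reverse proxy that also separates static from dynamic content and load balances to the back-end tomcat cluster and web servers. The back end uses LVS as the load balancer for the PostgreSql Servers. keepalived provides active/standby failover.
The configuration file on the keepalived master is as follows: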
global_defs {
    router_id Nginx_Id_1
}

vrrp_script Monitor_Nginx {
    script "/usr/local/keepalived/etc/keepalived/scripts/monitor_nginx.sh"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 33
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # VIP
    virtual_ipaddress {
        10.8.12.200
    }
    track_script {
        Monitor_Nginx
    }
}

virtual_server 10.8.12.200 5432 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 10.8.12.208 5432 {
        weight 1
        TCP_CHECK {
            connect_port 5432
            connect_timeout 10
        }
    }
    real_server 10.8.12.209 5432 {
        weight 1
        TCP_CHECK {
            connect_port 5432
            connect_timeout 10
        }
    }
}
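The vrrp_script block above uses a positive weight, which keepalived adds to the instance priority only while the check script exits with status 0. The following is a rough sketch of that arithmetic, not keepalived's own code. The function name effective_priority is made up for illustration, and the backup priority of 100 is an assumption, since the backup configuration is not shown here.

```shell
#!/bin/bash
# Sketch of how a vrrp_script with a positive weight affects VRRP priority:
# the weight is added only while the check script exits 0.
# Arguments: base priority, script weight, exit status of the check script.
effective_priority() {
    local base=$1 weight=$2 exit_status=$3
    if [ "$exit_status" -eq 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# Master (priority 101) whose nginx check fails, versus an assumed backup
# (priority 100) whose nginx check succeeds.
master=$(effective_priority 101 2 1)
backup=$(effective_priority 100 2 0)
echo "master=$master backup=$backup"
```

Under these assumptions the backup ends up with the higher priority (102 against 101), which is what lets it take over the VIP when nginx dies on the master.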
The keepalived backup configuration file is omitted here.
The IPVS table on the LVS Server:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.8.12.200:5432 rr
  -> 10.8.12.208:5432 Route 1 0 0
  -> 10.8.12.209:5432 Route 1 0 0
The lvs script on the PostgreSql RealServers is as follows:
#!/bin/bash
# Description: RealServer start/stop script for LVS in DR mode
# VIP must match the virtual IP configured in keepalived
VIP=10.8.12.200
LVS_TYPE=DR
. /lib/lsb/init-functions

case "$1" in
start)
    echo "start LVS of REALServer"
    # Bind the VIP to the loopback alias so the host accepts packets for it
    /sbin/ifconfig lo:0 $VIP broadcast $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev lo:0
    # Keep the RealServer from answering or announcing ARP for the VIP
    echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
    ;;
stop)
    route del -host $VIP dev lo:0
    /sbin/ifconfig lo:0 down
    echo "close LVS Directorserver"
    # Restore the default ARP behaviour
    echo "0" > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "0" > /proc/sys/net/ipv4/conf/lo/arp_announce
    echo "0" > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo "0" > /proc/sys/net/ipv4/conf/all/arp_announce
    ;;
*)
    echo "Usage $0 {start|stop}"
    exit 1
    ;;
esac
exit 0
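After running the script with start, it can be worth confirming that the four ARP parameters really hold the values it writes. A minimal sketch of such a check follows. The function name arp_settings_ok and its optional directory argument are assumptions made for this example and are not part of the original setup.

```shell
#!/bin/bash
# Return 0 if arp_ignore=1 and arp_announce=2 for both "lo" and "all",
# i.e. the values the RealServer script writes on start.
# The optional argument overrides the directory that is inspected; it
# defaults to the real kernel path.
arp_settings_ok() {
    local conf_dir=${1:-/proc/sys/net/ipv4/conf}
    local iface
    for iface in lo all; do
        [ "$(cat "$conf_dir/$iface/arp_ignore" 2>/dev/null)" = "1" ] || return 1
        [ "$(cat "$conf_dir/$iface/arp_announce" 2>/dev/null)" = "2" ] || return 1
    done
    return 0
}

if arp_settings_ok; then
    echo "ARP settings for LVS-DR are in place"
else
    echo "ARP settings for LVS-DR are NOT in place"
fi
```

The check only reads the parameters, so it does not need root and can be run at any time on a RealServer.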
The route table on the DB RealServers:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.8.12.200 * 255.255.255.255 UH 0 0 0 lo
10.8.12.0 * 255.255.255.0 U 0 0 0 eth0
default 10.8.12.254 0.0.0.0 UG 100 0 0 eth0
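This scheme was tested and works without problems.
Scheme 2:
The front end uses LVS as the load balancer for Nginx. Nginx in turn acts as a reverse proxy, separates static from dynamic content, and load balances to the back-end tomcat cluster and web servers. keepalived provides active/standby failover for LVS. The back end uses LVS as the load balancer for the PostgreSql Servers. The LVS Server for PostgreSql and the LVS Server for Nginx are one and the same; only the ports differ. The LVS Server and the Nginx RealServer share a machine, while the PostgreSql RealServers are two separate machines.
Compared with scheme 1, scheme 2 only adds an LVS load balancer layer in front of Nginx; the role of Nginx itself does not change.
The keepalived master configuration file is as follows: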
global_defs {
    router_id Nginx_Id_1
}

vrrp_script Monitor_Nginx {
    script "/usr/local/keepalived/etc/keepalived/scripts/monitor_nginx.sh"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 33
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # VIP
    virtual_ipaddress {
        10.8.12.200
    }
    track_script {
        Monitor_Nginx
    }
}

virtual_server 10.8.12.200 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 60
    protocol TCP
    real_server 10.8.12.201 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 10
        }
    }
    real_server 10.8.12.202 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 10
        }
    }
}

virtual_server 10.8.12.200 5432 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 10.8.12.208 5432 {
        weight 1
        TCP_CHECK {
            connect_port 5432
            connect_timeout 10
        }
    }
    real_server 10.8.12.209 5432 {
        weight 1
        TCP_CHECK {
            connect_port 5432
            connect_timeout 10
        }
    }
}
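As you can see, the only difference from scheme 1 is the additional LVS configuration for port 80 on 10.8.12.200.
The IPVS table on the LVS Server now looks like this: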
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.8.12.200:80 rr persistent 60
-> 10.8.12.201:80 Route 1 0 0
-> 10.8.12.202:80 Route 1 0 0
TCP 10.8.12.200:5432 rr
-> 10.8.12.208:5432 Route 1 0 0
-> 10.8.12.209:5432 Route 1 0 0
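Since the LVS Server and the Nginx RealServer share a machine in this scheme, it can help to see which virtual services list the director's own address as a real server. The sketch below filters ipvsadm -Ln style output for that. The function name services_pointing_at is made up for this example, and the sample text is copied from the table above rather than taken from a live system.

```shell
#!/bin/bash
# Print every virtual service whose real-server list contains the given IP.
# Expects `ipvsadm -Ln` style output on stdin.
services_pointing_at() {
    awk -v ip="$1" '
        # A line starting with the protocol names the current virtual service.
        $1 == "TCP" || $1 == "UDP" { service = $2; next }
        # Real-server lines start with "->"; compare only the address part.
        $1 == "->" {
            split($2, addr, ":")
            if (addr[1] == ip) print service
        }
    '
}

# Sample input based on the table above; on a live director one would pipe
# the output of `ipvsadm -Ln` into the function instead.
sample='TCP 10.8.12.200:80 rr persistent 60
  -> 10.8.12.201:80 Route 1 0 0
  -> 10.8.12.202:80 Route 1 0 0
TCP 10.8.12.200:5432 rr
  -> 10.8.12.208:5432 Route 1 0 0
  -> 10.8.12.209:5432 Route 1 0 0'

printf '%s\n' "$sample" | services_pointing_at 10.8.12.201
```

For 10.8.12.201 this prints 10.8.12.200:80, the only virtual service that forwards back to the director itself.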
Because the LVS Server is also an Nginx RealServer node, I also created the following lvs script on 10.8.12.201 (LVS Server1 & Nginx RealServer1) and 10.8.12.202 (LVS Server2 & Nginx RealServer2):
#!/bin/bash
# Description: RealServer start/stop script for LVS in DR mode
# VIP must match the virtual IP configured in keepalived
VIP=10.8.12.200
LVS_TYPE=DR
. /lib/lsb/init-functions

case "$1" in
start)
    echo "start LVS of REALServer"
    # Bind the VIP to the loopback alias so the host accepts packets for it
    /sbin/ifconfig lo:0 $VIP broadcast $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev lo:0
    # Keep the RealServer from answering or announcing ARP for the VIP
    echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
    ;;
stop)
    route del -host $VIP dev lo:0
    /sbin/ifconfig lo:0 down
    echo "close LVS Directorserver"
    # Restore the default ARP behaviour
    echo "0" > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "0" > /proc/sys/net/ipv4/conf/lo/arp_announce
    echo "0" > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo "0" > /proc/sys/net/ipv4/conf/all/arp_announce
    ;;
*)
    echo "Usage $0 {start|stop}"
    exit 1
    ;;
esac
exit 0
The route table on the Nginx RealServers:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.8.12.200 * 255.255.255.255 UH 0 0 0 lo
10.8.12.0 * 255.255.255.0 U 0 0 0 eth0
default 10.8.12.254 0.0.0.0 UG 100 0 0 eth0
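The configuration of the DB RealServer nodes is the same as in scheme 1 and is omitted here.
The symptoms are as follows:
Right after the LVS servers start, accessing 10.8.12.200 works fine.
After the servers have been running for a while (really only a few minutes, with no page requests in between), accessing 10.8.12.200 again fails with a 502, and repeated refreshing does not help. The IPVS table on the LVS Server is unchanged and the route table also looks normal. I then tried accessing nginx on 10.8.12.201 directly, which worked, and accessing 10.8.12.203:8080 (the back-end tomcat) directly, which also worked. In other words, the LVS Server was still load balancing the DB RealServers correctly at this point; only load balancing to the Nginx RealServers was broken. After stopping the keepalived master, the backup took over as expected, and 10.8.12.200 was reachable again. When the keepalived master was started again, it reclaimed the VIP, but 10.8.12.200 was still unreachable. After running ipvsadm -C, access worked again (a chance discovery; I really cannot figure out why).
Thoughts:
Tutorials online say that the LVS Server and a RealServer node can share the same machine without any problem. Here, however, only the Nginx RealServer that shares the machine is unreachable, while the DB RealServers are fine. I really do not know where the problem lies.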
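The problem is solved. It was a keepalived configuration issue: nginx itself already does the load balancing for port 80, so the LVS configuration for port 80 was redundant, and deleting it fixed the problem.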
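The fix amounts to deleting the virtual_server block for port 80 from the keepalived configuration. As a rough sketch of doing that mechanically rather than by hand, the function below drops one virtual_server block by counting braces. strip_virtual_server is a hypothetical helper name, not a keepalived tool, and the sample configuration is shortened for illustration.

```shell
#!/bin/bash
# Remove one "virtual_server <vip> <port> { ... }" block from a keepalived
# configuration read on stdin and write everything else to stdout.
strip_virtual_server() {
    awk -v vip="$1" -v port="$2" '
        # Start skipping at the matching virtual_server line.
        !skip && $1 == "virtual_server" && $2 == vip && $3 == port {
            skip = 1; depth = 0; opened = 0
        }
        # While skipping, track brace depth and stop after the closing brace.
        skip {
            n_open = gsub(/[{]/, "{")
            n_close = gsub(/[}]/, "}")
            if (n_open > 0) opened = 1
            depth += n_open - n_close
            if (opened && depth == 0) skip = 0
            next
        }
        { print }
    '
}

# Shortened sample configuration, for illustration only.
sample='virtual_server 10.8.12.200 80 {
    lb_kind DR
    real_server 10.8.12.201 80 {
        weight 1
    }
}
virtual_server 10.8.12.200 5432 {
    lb_kind DR
    real_server 10.8.12.208 5432 {
        weight 1
    }
}'

printf '%s\n' "$sample" | strip_virtual_server 10.8.12.200 80
```

Only the port 5432 block remains in the output. The result would be written to a new file and reviewed before replacing the real configuration, after which keepalived has to be restarted or reloaded for the change to take effect.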