Running pacemaker on Tungsten Fabric (TF)

The ARP handling logic in Tungsten Fabric also supports the gratuitous ARP used by HA clusters and similar setups.
https://github.com/Juniper/contrail-controller/wiki/Contrail-VRouter-ARP-Processing

As a sample, I ran pacemaker on centos7 pods on k8s and checked how it behaves.

Note: for the pacemaker configuration itself, see:
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/ch-startup-haaa

As preparation, first create two systemd-capable centos pods (centos1 and centos2 below) using yaml like the following.

apiVersion: v1
kind: Pod
metadata:
  name: centos1
  labels:
    name: centos1
spec:
  containers:
  - name: centos1
    image: centos/systemd
    securityContext:
      privileged: true
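
Since the two pods differ only in name, the manifests can be generated from one template. The following is a minimal sketch, assuming the same pod spec as above; the `OUTDIR` variable and the file names are made up for this example, and applying the files is left as a manual step.

```shell
#!/bin/sh
# Write one manifest per pod, using the same spec as the yaml above.
# OUTDIR is a hypothetical variable used only in this sketch.
outdir="${OUTDIR:-$(mktemp -d)}"

for name in centos1 centos2; do
  cat > "$outdir/$name.yaml" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $name
  labels:
    name: $name
spec:
  containers:
  - name: $name
    image: centos/systemd
    securityContext:
      privileged: true
EOF
done

# Nothing is applied automatically; the next step is printed instead.
echo "manifests written to $outdir"
echo "next: kubectl apply -f $outdir"
```

Generating the files rather than applying them directly keeps the sketch safe to run on a machine that has no cluster access.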

After that, run the following commands in order to configure the HA cluster.

yum install pacemaker pcs which
passwd hacluster
systemctl start pcsd.service
vi /etc/hosts
 (add the IPs of centos1 and centos2 to /etc/hosts)
pcs cluster auth centos1 centos2
 (enter the password set earlier for the hacluster user)
pcs cluster setup --start --name cluster1 centos1 centos2
pcs property set stonith-enabled=false
pcs cluster enable --all
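
The /etc/hosts step in the list above can also be scripted instead of edited by hand. This is a rough sketch under assumptions: `add_host_entry` and `HOSTS_FILE` are names invented here, 10.47.255.251 is the centos1 address that appears later in this post, and the centos2 address is a placeholder because this post does not show it.

```shell
#!/bin/sh
# Append "ip name" to a hosts file unless the name is already listed.
add_host_entry() {
  file=$1 ip=$2 name=$3
  # Look for the name as a whole field on a non-comment line.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# HOSTS_FILE defaults to a scratch file so the sketch can be tried safely;
# set HOSTS_FILE=/etc/hosts inside each pod for real use.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
add_host_entry "$HOSTS_FILE" 10.47.255.251 centos1
add_host_entry "$HOSTS_FILE" 10.47.255.252 centos2   # hypothetical address
```

Because existing names are skipped, the script can be run more than once on the same pod without duplicating entries.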

Note: for yum to be able to fetch packages, SNAT must be checked under Advanced Option for default:k8s-default-pod-network.
https://github.com/Juniper/contrail-specs/blob/master/distributed-snat.md

Next, define a resource on the cluster to set up the VIP.

pcs resource create VirtualIP IPaddr2 ip=10.47.255.11 cidr_netmask=24

If the cluster is configured correctly, the status check produces output like the following.

[root@centos1 /]# pcs status
Cluster name: cluster1
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: centos2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu May  3 07:42:56 2018
Last change: Thu May  3 07:41:51 2018 by root via cibadmin on centos2

2 nodes configured
1 resource configured

Online: [ centos1 centos2 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started centos1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/disabled
[root@centos1 /]#

Also confirm that the VIP has been assigned on centos1, where the resource is running.

[root@centos1 /]# ip -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
9: eth0    inet 10.47.255.251/12 scope global eth0\       valid_lft forever preferred_lft forever
9: eth0    inet 10.47.255.11/24 scope global eth0\       valid_lft forever preferred_lft forever
9: eth0    inet6 fe80::80a6:33ff:fedb:b209/64 scope link \       valid_lft forever preferred_lft forever
[root@centos1 /]#
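
The same check can be scripted, which is handy when comparing both nodes after a switchover. A small sketch follows, assuming `ip -o a` output in the format shown above; `has_addr` is a helper name made up for this example.

```shell
#!/bin/sh
# Succeeds if the given IPv4 address appears in `ip -o a` output read from stdin.
has_addr() {
  awk -v ip="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == ip) found = 1 } END { exit !found }'
}

# Typical use on a cluster node:
#   ip -o a | has_addr 10.47.255.11 && echo "VIP is on this node"
```

The prefix length is stripped before comparing, so the check does not depend on the cidr_netmask given to IPaddr2.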

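In this state, I started a cirros instance in the same subnet and pinged the VIP. With the configuration so far, however, TF has no setting that permits the VIP, so the ping gets no reply.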

/ # ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
15: eth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue \    link/ether 02:fa:bb:e6:12:4e brd ff:ff:ff:ff:ff:ff
15: eth0    inet 10.47.255.250/12 scope global eth0\       valid_lft forever preferred_lft forever
15: eth0    inet6 fe80::58f0:7aff:fe95:ee89/64 scope link \       valid_lft forever preferred_lft forever
/ # 
/ # 
/ # 
/ # ping 10.47.255.11
PING 10.47.255.11 (10.47.255.11): 56 data bytes
^C
--- 10.47.255.11 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
/ #

To allow the VMI of centos1 to use the VIP (10.47.255.11 here), it has to be permitted in the web UI under Configure > Networking > Ports, using 'Advanced Option > Allowed Address Pair'.

Note: set the same allowed address pair on the port of centos2 as well.
Once this is configured, the ping goes through as shown below.

/ # ping 10.47.255.11
PING 10.47.255.11 (10.47.255.11): 56 data bytes
64 bytes from 10.47.255.11: seq=0 ttl=63 time=1.031 ms
64 bytes from 10.47.255.11: seq=1 ttl=63 time=0.492 ms
^C
--- 10.47.255.11 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.492/0.761/1.031 ms
/ #

Also move the HA cluster resource with the following command and confirm that the ping still goes through after the resource has moved to centos2.

[root@centos1 ~]# pcs resource move VirtualIP centos2
[root@centos1 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: centos2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu May  3 08:07:42 2018
Last change: Thu May  3 08:07:36 2018 by root via crm_resource on centos1

2 nodes configured
1 resource configured

Online: [ centos1 centos2 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started centos2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/disabled
[root@centos1 ~]#

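After this, I tried switching the resource over while pinging the VIP from cirros. With the default k8s virtual network used above (default:k8s-default-pod-network), the switchover took about 10 seconds for some reason. The cause is unknown, but it may be related to the several k8s policies configured on k8s-default-pod-network and to its wide /12 subnet.
For this reason, I created a separate virtual network with the following settings and repeated the same switchover there.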

Name: vn1
Subnet: 10.0.1.0/24

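In this case, pinging the VIP in the new subnet (10.0.1.11), each switchover lost a single ping, that is, about 1 second of packet loss.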

/ # ping 10.0.1.11
PING 10.0.1.11 (10.0.1.11): 56 data bytes
64 bytes from 10.0.1.11: seq=0 ttl=64 time=0.446 ms
64 bytes from 10.0.1.11: seq=1 ttl=64 time=0.054 ms
64 bytes from 10.0.1.11: seq=2 ttl=64 time=0.082 ms
64 bytes from 10.0.1.11: seq=3 ttl=64 time=0.058 ms  : switchover centos1 -> centos2 performed here
64 bytes from 10.0.1.11: seq=5 ttl=64 time=1.342 ms
64 bytes from 10.0.1.11: seq=6 ttl=64 time=0.482 ms
64 bytes from 10.0.1.11: seq=7 ttl=64 time=0.661 ms
64 bytes from 10.0.1.11: seq=8 ttl=64 time=0.574 ms
64 bytes from 10.0.1.11: seq=9 ttl=64 time=0.597 ms
64 bytes from 10.0.1.11: seq=10 ttl=64 time=0.538 ms
64 bytes from 10.0.1.11: seq=11 ttl=64 time=0.571 ms
64 bytes from 10.0.1.11: seq=12 ttl=64 time=0.532 ms  : switchover centos2 -> centos1 performed here
64 bytes from 10.0.1.11: seq=14 ttl=64 time=0.337 ms
64 bytes from 10.0.1.11: seq=15 ttl=64 time=0.051 ms
64 bytes from 10.0.1.11: seq=16 ttl=64 time=0.075 ms
64 bytes from 10.0.1.11: seq=17 ttl=64 time=0.074 ms
^C
--- 10.0.1.11 ping statistics ---
18 packets transmitted, 16 packets received, 11% packet loss
round-trip min/avg/max = 0.051/0.404/1.342 ms
/ #
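
To read off the loss per switchover without scanning the log by eye, the gaps in the sequence numbers can be listed. This is a minimal sketch, assuming the `seq=N` reply format printed by the cirros (busybox) ping above; `missing_seqs` is a name invented here, and replies lost after the last received one are not detected.

```shell
#!/bin/sh
# Print the sequence numbers that never got a reply, given ping output on stdin.
missing_seqs() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^seq=[0-9]+$/) {
          n = substr($i, 5) + 0          # numeric part after "seq="
          seen[n] = 1
          if (!got || n < min) min = n
          if (!got || n > max) max = n
          got = 1
        }
    }
    END {
      if (!got) exit
      for (s = min; s <= max; s++)
        if (!(s in seen)) printf "%d\n", s
    }'
}

# Typical use: save the ping output to a file, then run
#   missing_seqs < ping.log
```

Applied to the log above, the gaps are seq 4 and seq 13, one lost reply for each direction of the switchover.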

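Judging from these results, it seems fair to say that HA clusters currently built in a similar way could also be moved onto a virtual network, although some adaptation to each environment will be needed.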