Stretching a legacy data center between different sites can overcome some software limitations, but it costs complexity and performance. A service should be distributed by (application) design, not by the network layer.
The following is a classic and simple data center:
A firewall cluster (active-standby) manages access between the dmz and intranet zones, implementing a 3-tier application (presentation placed in the dmz, data placed in the intranet). Load balancers in both the dmz and intranet zones can load-balance services between server pools.
The customer requires services to be delivered from a two-site active-active infrastructure. Because the customer's applications need L2 adjacency, the only solution is to stretch the LANs, spanning the data center across the two sites. We don't want a metropolitan STP, so each site must be L3-separated and some form of encapsulation is needed (L2 over L3 and vice versa).
OTV Topology and configuration
Cisco OTV is the protocol recommended by Cisco for data center interconnect (DCI) using the Cisco Nexus 7k series. It's not complex at all, but a working example can help a lot. OTV introduces an additional layer in the ISO/OSI stack: it encapsulates L2 frames and forwards them to the remote site(s). Moreover, each OTV Edge performs some optimizations to better manage the traffic (ARP caching, filters, …).
Using OTV, the previous diagram evolves into the following L2 topology:
Each OTV site must be configured with a unique ID:
!CSRA1,CSRA2
otv site-identifier 0000.0000.0001
!CSRB1,CSRB2
otv site-identifier 0000.0000.0002
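Each OTV Edge also needs a site bridge-domain, used to detect the other OTV Edge belonging to the same site. A minimal sketch follows; reusing bridge-domain 101 is an assumption, consistent with the Site Bridge-Domain value reported later by show otv detail:

! all OTV Edges (assumed value, matching "Site Bridge-Domain : 101"
! in the show otv detail output below)
otv site bridge-domain 101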
Each OTV router needs the following interfaces:
- Internal interfaces: one interface for each L2 domain that must be “stretched”. An Internal interface can support multiple VLANs, but because our topology is running within a simulator, one interface for each L2 domain is needed:
- Gi1: dedicated to the DMZPriv network (behind the Load Balancer).
- Gi2: dedicated to the WAN network (external Firewall network).
- Gi3: dedicated to the DMZPub network (in front of the Load Balancer).
- Gi4: dedicated to the IntraPriv network (behind the Load Balancer).
- Gi5: dedicated to the IntraPub network (in front of the Load Balancer).
- Gi7: dedicated to the FWHA network (HA Firewall network).
- Join interface: one interface used to reach the other OTV Edge devices. In our example Gi6 is configured as the Join interface (a minimal sketch of this interface follows after this list).
- Overlay: one logical (virtual) interface used to encapsulate L2 traffic in OTV and send it to the remote OTV Edge devices using the associated Join interface. Each Overlay interface can specify only one Join interface.
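The Join interface itself is just a routed underlay interface. A minimal sketch for CSRA1 follows; only the 10.0.1.1 address is confirmed by the configurations and show outputs below, while the /24 mask and the description are assumptions:

! CSRA1 - Join interface (sketch: the /24 mask is an assumption,
! only the 10.0.1.1 address appears later in the show otv detail output)
interface GigabitEthernet6
 description Join interface towards the WAN
 ip address 10.0.1.1 255.255.255.0
 no shutdown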
In our example all OTV Edges have the same Internal interface configuration:
interface GigabitEthernet1
 description DMZPriv
 service instance 1 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 101 symmetric
  bridge-domain 101
!
interface GigabitEthernet2
 description WAN
 service instance 2 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 102 symmetric
  bridge-domain 102
!
interface GigabitEthernet3
 description DMZPub
 service instance 3 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 103 symmetric
  bridge-domain 103
!
interface GigabitEthernet4
 description IntraPriv
 service instance 4 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 104 symmetric
  bridge-domain 104
!
interface GigabitEthernet5
 description IntraPub
 service instance 5 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 105 symmetric
  bridge-domain 105
!
interface GigabitEthernet7
 description FWHA
 service instance 7 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 107 symmetric
  bridge-domain 107
Basically, each Internal interface bridges its L2 domain into a tagged VLAN.
The Overlay interface is mapped to the Join interface and is configured to encapsulate traffic from and to Internal interfaces:
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet6
 service instance 1 ethernet
  encapsulation dot1q 101
  bridge-domain 101
 !
 service instance 2 ethernet
  encapsulation dot1q 102
  bridge-domain 102
 !
 service instance 3 ethernet
  encapsulation dot1q 103
  bridge-domain 103
 !
 service instance 4 ethernet
  encapsulation dot1q 104
  bridge-domain 104
 !
 service instance 5 ethernet
  encapsulation dot1q 105
  bridge-domain 105
 !
 service instance 7 ethernet
  encapsulation dot1q 107
  bridge-domain 107
OTV can be configured using unicast or multicast. If unicast is used (simpler and fine for a two-site scenario), one or more adjacency servers must be configured.
CSRA1 is an adjacency server and also joins the remote one (CSRB1):
!CSRA1
interface Overlay1
 otv use-adjacency-server 10.0.2.1 unicast-only
 otv adjacency-server unicast-only
CSRA2 joins both adjacency servers:
!CSRA2
interface Overlay1
 otv use-adjacency-server 10.0.1.1 10.0.2.1 unicast-only
CSRB1 is an adjacency server and also joins the remote one (CSRA1):
!CSRB1
interface Overlay1
 otv use-adjacency-server 10.0.1.1 unicast-only
 otv adjacency-server unicast-only
CSRB2 joins both adjacency servers:
!CSRB2
interface Overlay1
 otv use-adjacency-server 10.0.1.1 10.0.2.1 unicast-only
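The Join interfaces of the two sites sit in different subnets (10.0.1.x on site A, 10.0.2.x on site B, judging from the adjacency-server addresses), so they must reach each other through the WAN router. A minimal sketch for the site A edges follows; the 10.0.1.254 next-hop is purely hypothetical, and any working underlay routing (static or dynamic) would do:

! CSRA1, CSRA2 - assumed static route towards the site B join subnet
! (the 10.0.1.254 WAN next-hop is hypothetical)
ip route 10.0.2.0 255.255.255.0 10.0.1.254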
Once OTV is configured, each node can check its own status:
CSRA1#show otv detail
Overlay Interface Overlay1
 VPN name                 : None
 VPN ID                   : 1
 State                    : UP
 Fwd-capable              : Yes
 Fwd-ready                : Yes
 AED-Server               : Yes
 Backup AED-Server        : No
 AED Capable              : Yes
 Join interface(s)        : GigabitEthernet6
 Join IPv4 address        : 10.0.1.1
 Tunnel interface(s)      : Tunnel0
 Encapsulation format     : GRE/IPv4
 Site Bridge-Domain       : 101
 Capability               : Unicast-only
 Is Adjacency Server      : Yes
 Adj Server Configured    : Yes
 Prim/Sec Adj Svr(s)      : 10.0.2.1
 OTV instance(s)          : 0
 FHRP Filtering Enabled   : Yes
 ARP Suppression Enabled  : Yes
 ARP Cache Timeout        : 600 seconds
The AED (Authoritative Edge Device) is the elected OTV Edge responsible for a given L2 domain. In this case CSRA1 is the AED server and CSRA2 is the backup one.
CSRA1 has ordinal 1, CSRA2 has ordinal 0:
CSRA2#show otv site

Site Adjacency Information (Site Bridge-Domain: 101)

Overlay1 Site-Local Adjacencies (Count: 2)

  Hostname  System ID       Last Change  Ordinal  AED Enabled  Status
* CSRA2     001E.E6D4.7500  00:54:30     0        site         overlay
  CSRA1     001E.F61D.6600  00:53:31     1        site         overlay
Ordinal 0 is AED for the even VLANs, ordinal 1 for the odd ones:
CSRA2#show otv vlan
Key:  SI - Service Instance, NA - Non AED, NFC - Not Forward Capable.

Overlay 1 VLAN Configuration Information
 Inst VLAN BD   Auth ED  State        Site If(s)
 0    101  101  CSRA1    inactive(NA) Gi1:SI1
 0    102  102  *CSRA2   active       Gi2:SI2
 0    103  103  CSRA1    inactive(NA) Gi3:SI3
 0    104  104  *CSRA2   active       Gi4:SI4
 0    105  105  CSRA1    inactive(NA) Gi5:SI5
 0    107  107  CSRA1    inactive(NA) Gi7:SI7
 Total VLAN(s): 6
If AppServer1 (10.2.11.101) pings AppServer2 (10.2.11.102), ICMP packets are encapsulated by CSRA2, which is AED for VLAN 104, and forwarded via the WAN router to CSRB2, which is AED for VLAN 104 on the remote site. A local ARP cache table is stored on all routers:
CSRA2#show otv arp-nd-cache
Overlay1 ARP/ND L3->L2 Address Mapping Cache
BD     MAC            Layer-3 Address  Age (HH:MM:SS) Local/Remote
104    5000.0011.0000 10.2.11.102      00:00:19       Remote
By default, if a local AED dies, OTV will take up to 30 seconds to converge.
In the example topology, both Firewall and Load Balancer clusters are configured in active-standby mode. Suppose all A nodes are active, and all B nodes are standby.
In the above diagram, a connection from the Internet flows through:
- the active external router announcing public networks via BGP;
- the active firewall;
- the active load-balancer placed in the dmz;
- one of the web servers behind the load-balanced front end service. If the selected server is in the remote site, the request must be encapsulated in OTV and forwarded via CSRA1 (AED for the DMZPriv LAN), the WAN and CSRB1.
Now the serving web server must call a load-balanced application service, passing through:
- the active load-balancer, which is also the gateway in the DMZPriv network;
- the active firewall;
- the active load-balancer placed in the intranet;
- one of the back end servers behind the load-balanced back end service. If the selected server is in the remote site, the request must be encapsulated in OTV and forwarded via CSRA2 (AED for the IntraPriv LAN), the WAN and CSRB2.
Now check what happens in the underlay domain, paying attention to the dashed lines:
We can assume that half of the requests will flow to the other site and back, doubling latency. This back-and-forth traffic is also called a “traffic trombone” because of the physical movement required to play the instrument.
Switching to active-active clusters usually doesn't help much (in this specific scenario) because each node usually serves a specific range of MAC/IP addresses and, moreover, the load-balancer cannot know whether a specific server is local or not. If the load-balancer terminates each connection, the “traffic trombone” happens anyway, as already discussed above.
If the load-balancers do not terminate connections, a different scenario arises:
Because OTV can filter HSRP/VRRP hellos between sites, each cluster is an active default gateway in its own site. A connection from the Internet flows through:
- one of the active external routers, assume the one placed in site A;
- the cluster firewall placed in site A;
- the dmz load-balancer placed in site A;
- one of the web servers behind the load-balanced front end service. Assume the worst case scenario: the selected server is located in site B. Traffic must now be encapsulated via OTV and delivered to WebServer2.
But this is one-way traffic; the return path is different. From WebServer2, traffic flows to:
- the dmz load-balancer placed in site B;
- the cluster firewall placed in site B;
- the external router placed in site B.
Then the traffic should arrive at the Internet client. But because the path is asymmetric, neither firewall cluster sees both requests and responses: during the TCP three-way handshake, the firewall in site A sees only the client's SYN, while the firewall in site B sees only the server's SYN-ACK. Because firewalls usually inspect traffic statefully, asymmetric connections will be dropped.
Each web server must also call a back end server, and if requests are balanced to the remote site, asymmetric routing occurs and connections will be dropped.
This is the worst-case scenario, of course, but it can impact up to 75% of total connections.
Some improvements can also be implemented in this scenario; we'll see how in another post.