Stretching a legacy data center with Cisco OTV

Abstract

Stretching a legacy data center across different sites can overcome some software limitations, but it comes at a cost in complexity and performance. A service should be distributed by (application) design, not implemented at the network layer.

The following diagram shows a classic and simple data center:

L3 topology for a legacy data center

A firewall cluster (active-standby) manages access between the external, DMZ and intranet zones, implementing a 3-tier application (presentation placed in the DMZ, application and data placed in the intranet). Both the DMZ and intranet zones can load-balance services between server pools.

The customer requires services to be delivered from a two-site active-active infrastructure. Because the customer’s applications need L2 adjacency, the only solution is to stretch the LANs, spanning the data center across the two sites. We don’t want a metropolitan STP, so each site must be L3-separated and some form of encapsulation is needed (L2 frames carried over L3 and decapsulated on the other side).

OTV Topology and configuration

Cisco OTV (Overlay Transport Virtualization) is the protocol recommended by Cisco for data center interconnect (DCI); it is available on the Nexus 7000 series and on IOS-XE platforms such as the ASR 1000 and the CSR 1000v used in this lab. It’s not complex at all, but a working example helps a lot. OTV introduces an additional layer in the ISO/OSI stack: it encapsulates L2 frames and forwards them to the remote site(s). Moreover, each OTV Edge performs some optimizations to better manage the traffic (ARP caching, FHRP filtering, …).

Using OTV, the previous diagram evolves into the following L2 topology:

L2 topology with Cisco OTV

Each OTV site must be configured with a unique ID:

!CSRA1,CSRA2
otv site-identifier 0000.0000.0001
!CSRB1,CSRB2
otv site-identifier 0000.0000.0002
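
Each pair of edges also shares a site bridge-domain, used by the two local OTV Edges to detect each other on the inside network. The show otv output later in this post reports Site Bridge-Domain 101, so the global configuration presumably also contains something like the following (in this lab the site bridge-domain coincides with the DMZPriv bridge-domain):

!CSRA1,CSRA2,CSRB1,CSRB2
otv site bridge-domain 101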

Each OTV router needs the following interfaces:

  • Internal interfaces: one interface for each L2 domain that must be “stretched”. An Internal interface can carry multiple VLANs, but because our topology runs within a simulator, one interface for each L2 domain is used:
    • Gi1: dedicated to the DMZPriv network (behind the Load Balancer).
    • Gi2: dedicated to the WAN network (external Firewall network).
    • Gi3: dedicated to the DMZPub network (in front of the Load Balancer).
    • Gi4: dedicated to the IntraPriv network (behind the Load Balancer).
    • Gi5: dedicated to the IntraPub network (in front of the Load Balancer).
    • Gi7: dedicated to the FWHA network (HA Firewall network).
  • Join interface: one interface used to reach the other OTV Edge devices. In our example Gi6 is configured as the Join interface (a minimal sketch of its configuration follows this list).
  • Overlay: one logical (virtual) interface used to encapsulate L2 traffic in OTV and send it to the remote OTV Edges through the associated Join interface. Each Overlay interface can reference only one Join interface.
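
The Join interface itself is just a routed interface with an IP address reachable from the remote site. A minimal sketch for CSRA1 follows; only the join IPv4 address (10.0.1.1) comes from the show output later in this post, while the mask and the increased MTU (to accommodate the roughly 42 bytes of OTV/GRE overhead) are assumptions:

!CSRA1 (sketch)
interface GigabitEthernet6
 description Join
 mtu 1542
 ip address 10.0.1.1 255.255.255.0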

Cisco OTV topology

In our example all OTV Edges have the same Internal interface configuration:

interface GigabitEthernet1
 description DMZPriv
 service instance 1 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 101 symmetric
  bridge-domain 101
!
interface GigabitEthernet2
 description WAN
 service instance 2 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 102 symmetric
  bridge-domain 102
!
interface GigabitEthernet3
 description DMZPub
 service instance 3 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 103 symmetric
  bridge-domain 103
!
interface GigabitEthernet4
 description IntraPriv
 service instance 4 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 104 symmetric
  bridge-domain 104
!
interface GigabitEthernet5
 description IntraPub
 service instance 5 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 105 symmetric
  bridge-domain 105
!
interface GigabitEthernet7
 description FWHA
 service instance 7 ethernet
  encapsulation untagged
  rewrite ingress tag push dot1q 107 symmetric
  bridge-domain 107

Basically, each Internal interface bridges its L2 domain into a tagged VLAN (bridge-domain).
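
To double-check that each service instance landed in the expected bridge-domain, the bridge-domain itself can be inspected; only the command is shown here, since the exact output format depends on the IOS-XE release:

CSRA2#show bridge-domain 104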

The Overlay interface is mapped to the Join interface and is configured to encapsulate the traffic from and to the Internal interfaces:

interface Overlay1
 no ip address
 otv join-interface GigabitEthernet6
 service instance 1 ethernet
  encapsulation dot1q 101
  bridge-domain 101
 !
 service instance 2 ethernet
  encapsulation dot1q 102
  bridge-domain 102
 !
 service instance 3 ethernet
  encapsulation dot1q 103
  bridge-domain 103
 !
 service instance 4 ethernet
  encapsulation dot1q 104
  bridge-domain 104
 !
 service instance 5 ethernet
  encapsulation dot1q 105
  bridge-domain 105
 !
 service instance 7 ethernet
  encapsulation dot1q 107
  bridge-domain 107

OTV can be configured using either unicast or multicast transport. If unicast is used (simpler, and a good fit for a two-site scenario), one or more adjacency servers must be configured.
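
In this lab only the unicast variant is used. For reference, the multicast variant on IOS-XE would look roughly like the sketch below; the control and data group addresses are placeholders, the transit network must support multicast (PIM/SSM), and the Join interface must run IGMPv3. This is an untested sketch, not part of the working configuration shown in this post:

!Sketch only: multicast control plane instead of adjacency servers
interface GigabitEthernet6
 ip igmp version 3
!
interface Overlay1
 otv control-group 239.1.1.1
 otv data-group 232.0.1.0/28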

CSRA1 is an adjacency server and also joins the remote one (CSRB1):

!CSRA1
interface Overlay1
 otv use-adjacency-server 10.0.2.1 unicast-only
 otv adjacency-server unicast-only

CSRA2 joins both adjacency servers:

!CSRA2 
interface Overlay1
 otv use-adjacency-server 10.0.1.1 10.0.2.1 unicast-only

CSRB1 is an adjacency server and also joins the remote one (CSRA1):

!CSRB1
interface Overlay1
 otv use-adjacency-server 10.0.1.1 unicast-only
 otv adjacency-server unicast-only

CSRB2 joins both adjacency servers:

!CSRB2
interface Overlay1
 otv use-adjacency-server 10.0.1.1 10.0.2.1 unicast-only

Once OTV is configured, each node can check its own status:

CSRA1#show otv detail
Overlay Interface Overlay1
 VPN name                 : None
 VPN ID                   : 1
 State                    : UP
 Fwd-capable              : Yes
 Fwd-ready                : Yes
 AED-Server               : Yes
 Backup AED-Server        : No
 AED Capable              : Yes
 Join interface(s)        : GigabitEthernet6
 Join IPv4 address        : 10.0.1.1
 Tunnel interface(s)      : Tunnel0
 Encapsulation format     : GRE/IPv4
 Site Bridge-Domain       : 101
 Capability               : Unicast-only
 Is Adjacency Server      : Yes
 Adj Server Configured    : Yes
 Prim/Sec Adj Svr(s)      : 10.0.2.1
 OTV instance(s)          : 0
 FHRP Filtering Enabled   : Yes
 ARP Suppression Enabled  : Yes
 ARP Cache Timeout        : 600 seconds
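
The adjacencies built through the adjacency servers can also be listed; only the command is shown here, and it should report the other OTV Edges participating in the overlay:

CSRA1#show otv adjacency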

The AED (Authoritative Edge Device) is the OTV Edge elected as responsible for forwarding a given L2 domain across the overlay. In this case CSRA1 is the AED-Server and CSRA2 is the backup one; CSRA1 has ordinal 1, CSRA2 has ordinal 0:

CSRA2#show otv site
Site Adjacency Information (Site Bridge-Domain: 101)

Overlay1 Site-Local Adjacencies (Count: 2)

  Hostname       System ID      Last Change Ordinal    AED Enabled Status
* CSRA2          001E.E6D4.7500 00:54:30    0          site       overlay
  CSRA1          001E.F61D.6600 00:53:31    1          site       overlay

The edge with ordinal 0 is AED for the even VLANs, the one with ordinal 1 for the odd ones:

CSRA2#show otv vlan
Key:  SI - Service Instance, NA - Non AED, NFC - Not Forward Capable.

Overlay 1 VLAN Configuration Information
 Inst VLAN BD   Auth ED              State                Site If(s)
 0    101  101   CSRA1               inactive(NA)        Gi1:SI1
 0    102  102  *CSRA2               active              Gi2:SI2
 0    103  103   CSRA1               inactive(NA)        Gi3:SI3
 0    104  104  *CSRA2               active              Gi4:SI4
 0    105  105   CSRA1               inactive(NA)        Gi5:SI5
 0    107  107   CSRA1               inactive(NA)        Gi7:SI7
 Total VLAN(s): 6

If AppServer1 (10.2.11.101) pings AppServer2 (10.2.11.102), the ICMP packets are encapsulated by CSRA2, which is the AED for VLAN 104, and forwarded through the WAN router to CSRB2, which is the AED for VLAN 104 on the remote site. A local ARP cache is stored on each OTV Edge:

CSRA2#show otv arp-nd-cache
Overlay1 ARP/ND L3->L2 Address Mapping Cache
BD     MAC            Layer-3 Address  Age (HH:MM:SS) Local/Remote
104    5000.0011.0000 10.2.11.102      00:00:19       Remote
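
Besides the ARP cache, the overlay MAC reachability can be inspected as well; the following command (output omitted) shows whether a MAC address is reached through a local Internal interface or through a remote OTV Edge:

CSRA2#show otv route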

By default, if a local AED dies, OTV will take up to 30 seconds to converge.

Asymmetric path

In the example topology, both Firewall and Load Balancer clusters are configured in active-standby mode. Suppose all A nodes are active, and all B nodes are standby.

L3 traffic flows

In the above diagram, a connection from the Internet flows through:

  • the active external router, announcing the public networks via BGP;
  • the active firewall;
  • the active load-balancer placed in the DMZ zone;
  • one of the web servers behind the load-balanced front-end service. If the selected server is in the remote site, the request must be encapsulated in OTV and forwarded via CSRA1 (AED for the DMZPriv LAN), the WAN and CSRB1.

Now the serving web server must call a load-balanced application service, passing through:

  • the active load-balancer, which is also the gateway in the DMZ zone;
  • the active firewall;
  • the active load-balancer placed in the intranet zone;
  • one of the back-end servers behind the load-balanced back-end service. If the selected server is in the remote site, the request must be encapsulated in OTV and forwarded via CSRA2 (AED for the IntraPriv LAN), the WAN and CSRB2.

Now check what happens in the stretched L2 domain and pay attention to the dashed lines:

L2 traffic flows with Cisco OTV

We can assume that half of the requests will flow to the other site and back, doubling the latency. This back-and-forth traffic is also called a “traffic trombone”, because of the physical movement required to play the instrument.

Switching to active-active clusters usually doesn’t help much (in this specific scenario), because each node usually serves a specific range of MAC/IP addresses and, moreover, the load-balancer cannot know whether a specific server is local or not. If the load-balancer terminates each connection, the “traffic trombone” still happens, as already discussed above.

If the load-balancers do not terminate connections, a different scenario arises:

L2 traffic flows with Cisco OTV and multiple clusters
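
FHRP filtering is enabled on these OTV Edges, as the show otv detail output earlier reported; a quick way to confirm it on any node (using standard IOS-XE output filtering):

CSRA1#show otv detail | include FHRP
 FHRP Filtering Enabled   : Yes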

Because OTV filters the HSRP/VRRP hellos between sites, each site has its own active default gateway. A connection from the Internet flows through:

  • one of the active external routers, say BGPA;
  • the firewall cluster placed in site A;
  • the DMZ load-balancer placed in site A;
  • one of the web servers behind the load-balanced front-end service. Assume the worst-case scenario: the selected server is located in site B. The traffic must now be encapsulated in OTV and delivered to WebServer2.

But this is only one direction of the traffic; the return path is different. From WebServer2 the traffic flows through:

  • the DMZ load-balancer placed in site B;
  • the firewall cluster placed in site B;
  • the external router placed in site B.

Then the traffic should reach the Internet client. But because the path is asymmetric, neither firewall cluster sees both the requests and the responses: during the TCP three-way handshake, FW-cluster-A sees the SYN and ACK packets, while FW-cluster-B sees the SYN+ACK packet. Because firewalls usually inspect traffic statefully, asymmetric connections will be dropped.

Each web server must also call a back-end server, and if the request is balanced to the remote site, asymmetric routing occurs there as well and the connection is dropped.

This is the worst-case scenario, of course, but it impacts up to 75% of the total connections: with two sites, each of the two load-balancing decisions (front end and back end) picks a remote server about half of the time, so only about 25% of connections stay symmetric at both tiers.

Some improvements can be implemented in this scenario; we’ll see how in another post.

Posted on 09 Nov 2016.