Link Aggregation Hints

Aggregated Links - Caveat Router

This document attempts to summarise the available information about bugs and other issues you might run into when configuring link aggregation.

This document focuses mostly on Gigabit Ethernet link aggregation. So far, our list of issues with 10 Gigabit Ethernet link aggregation remains empty.

The AMS-IX NOC welcomes any additional information that you may wish to share with the community.

Foundry

  • We configured our BigIron JetCore-based edge switches this way:
    !
    trunk server ethernet slot/port to slot/port+1
    !
  • The first GigE port must be odd-numbered, and the other port must directly follow the first one. The same goes for any additional pairs of ports in an aggregated link.
  • On BigIron 15000 switches you cannot build trunks with ports on blade 8, or spanning ports on both sides of slot 8.
  • The load-balancing algorithm used in IronCore-based switches is not geared at all towards an exchange point situation, which makes an additional port hardly worth the effort.
  • BigIron RX switches only know "server" type trunks.
  • The load-balancing algorithm used by MLX and XMR is described on Foundry's website.
  • We configure our NetIron MLX (10GbE) switches this way, adding ports if necessary in similar fashion:
    !
    lag "memberportaggregate" static id n
    ports ethernet slot/port to slot/otherport ethernet otherslot/port
    primary-port slot/port
    deploy
    port-name "memberportaggregate #1" slot/port
    ...
    !
  • There are no limits to port placements for aggregated links on the RX and MLX platforms.
  • BigIron RX has a limit of 8 ports per aggregated link. NetIron MLX/XMR raise this to 16 in software 3.5.0 and to 32 in 3.8.0, if configured with "system-max trunk-num 64" or fewer.
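
The JetCore port-pairing rule above (odd-numbered first port, second port directly following it) is easy to get wrong when planning a trunk. The following is a minimal Python sketch of a pre-check, not a Foundry tool. The function name check_jetcore_trunk and the (slot, port) tuple layout are hypothetical, and it assumes both ports of a pair sit on the same slot, as in the "slot/port to slot/port+1" example.

```python
def check_jetcore_trunk(ports):
    """Return a list of problems for a planned JetCore trunk.

    `ports` is a list of (slot, port) tuples in configuration order,
    e.g. [(1, 1), (1, 2)]. Hypothetical helper, for illustration only.
    """
    problems = []
    if len(ports) % 2 != 0:
        problems.append("ports must come in pairs")
    # Walk the list two ports at a time and check each pair.
    for i in range(0, len(ports) - 1, 2):
        (slot_a, port_a), (slot_b, port_b) = ports[i], ports[i + 1]
        if port_a % 2 == 0:
            problems.append(
                f"{slot_a}/{port_a}: first port of a pair must be odd-numbered"
            )
        # Assumption: "directly follows" means same slot, next port number.
        if slot_b != slot_a or port_b != port_a + 1:
            problems.append(
                f"{slot_b}/{port_b}: must directly follow {slot_a}/{port_a}"
            )
    return problems
```

An empty list means the plan satisfies the pairing rule. The sketch does not cover the BigIron 15000 slot 8 restriction or the per-platform port limits.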

 

Cisco Products

  • Configure the port-channel as "on" or, should you want LACP, as "active". Please do not configure any form of "negotiate" or "desirable", as AMS-IX switches do not support PAgP. All normal restrictions on allowed traffic still apply; please see allowed traffic for details.
  • Some modules do not support more than 1 Gbps of traffic under certain conditions across an aggregated link.
  • Load-balancing over four ports may result in an unequal distribution due to CSCsg80948.
  • Here is an example static configuration for IOS:    
    interface GigabitEthernet1/1
    description AMS-IX Link 1
    no ip address
    no ip redirects
    no ip proxy-arp
    no keepalive
    no cdp enable
    channel-group 1 mode on
    !
    interface GigabitEthernet1/2
    description AMS-IX Link 2
    no ip address
    no ip redirects
    no ip proxy-arp
    no keepalive
    no cdp enable
    channel-group 1 mode on
    !
    interface Port-channel1
    description AMS-IX aggregated link
    ip address 195.69.14x.y 255.255.252.0
    no ip redirects
    no ip proxy-arp
    no keepalive
    !
  • Here are examples of LACP configurations:    
    Cisco IOS 65xx/76xx:

    interface GigabitEthernet1/1
    description AMS-IX Link 1
    channel-group 10 mode active
    ! (12.2(18)SXF2 or (12.2(33)SRC) upwards)
    lacp rate fast
    !
    interface GigabitEthernet1/2
    description AMS-IX Link 2
    channel-group 10 mode active
    !
    interface Port-channel10
    description AMS-IX aggregated link
    no switchport
    ip address 195.69.14x.y 255.255.252.0
    !


    Cisco IOS-XR:

    interface Bundle-Ether 10
    description AMS-IX aggregated link
    ipv4 address 195.69.14x.y 255.255.252.0
    !
    interface GigabitEthernet 1/0/0/0
    description AMS-IX Link 1
    bundle-id 10 mode active
    ! (3.2 upwards)
    lacp period short
    !
    interface GigabitEthernet 1/0/1/0
    description AMS-IX Link 2
    bundle-id 10 mode active
    !
    (don't forget to commit)

    Cisco NX-OS:


    feature lacp
    !
    interface ethernet 2/1
    description AMS-IX Link 1
    channel-group 10 mode active
    lacp rate fast
    !
    interface ethernet 2/2
    description AMS-IX Link 2
    channel-group 10 mode active
    !
    interface port-channel 10
    description AMS-IX aggregated link
    ip address 195.69.14x.y 255.255.252.0
    !  
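
Since a PAgP mode on a member port will keep the bundle from coming up against AMS-IX, it can help to scan a saved configuration before deploying it. Below is a minimal Python sketch of such a scan. The function name find_pagp_modes is hypothetical. It only looks at IOS and NX-OS style "channel-group N mode X" lines (not IOS-XR "bundle-id"), and it treats "auto" and "desirable" as the PAgP keywords.

```python
import re

# PAgP negotiation keywords for "channel-group ... mode" in IOS.
PAGP_MODES = {"auto", "desirable"}

# Matches lines such as " channel-group 10 mode active".
CHANNEL_GROUP = re.compile(
    r"^\s*channel-group\s+(\d+)\s+mode\s+(\S+)", re.MULTILINE
)


def find_pagp_modes(config_text):
    """Return (group, mode) pairs that use a PAgP mode.

    Hypothetical helper: an empty list means every channel-group line
    found uses a non-PAgP mode such as "on" or "active".
    """
    return [
        (int(group), mode)
        for group, mode in CHANNEL_GROUP.findall(config_text)
        if mode in PAGP_MODES
    ]
```

For example, a configuration containing "channel-group 1 mode desirable" yields [(1, "desirable")], while the static and LACP examples above yield an empty list.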

Cisco GSR

  • Do not set a static MAC address on the Port-channel interface. This causes CEF inconsistencies and other assorted failures.
  • Link aggregation and IPv6 do not seem to play well together. Cisco advises against trying this.
  • Some changes (likely including reloading a linecard that contains the first port in the bundle) will result in a different MAC address being chosen for the aggregated link. Port security on the AMS-IX switches will then keep your ports dysfunctional, and you will have to contact the AMS-IX NOC to fix this.
  • Some restrictions apply to what features are supported on link bundles (e.g. sampled NetFlow only on ISE/Engine4+; no uRPF).
  • Not all line cards support link bundling, and if traffic towards AMS-IX comes in on such an interface you will experience suboptimal load-balancing.
  • Support for link bundling on Engine 5 linecards will come in 12.0(33)S.
  • Cisco Engineering have a special train called "Phase 3" (lb-eft-ph3) that is purported to also provide functionality such as MAC address accounting for Port-Channel interfaces. This seems to have been integrated into 12.0(32)S, but IPv6 does not seem to be supported yet. 
  • Below follows a list of Cisco Bug IDs (ddts) related to link aggregation that you need to consider when choosing an appropriate IOS image.
    • CSCee27396 present in 12.0(26)S1; fixed in 12.0(26)S3, 12.0(27)S2, 12.0(28)S1, 12.0(30)S
      Symptoms: Over 90% CPU usage by CEF Scanner on all linecards and %TFIB-7-SCANSABORTED errors occur when configuring a link bundle.
      Also, the router sends traffic to MAC addresses taken from its ARP table seemingly at random, instead of to the appropriate next-hop's MAC address.
    • CSCef12828 present in post-CSCee27396; fixed in 12.0(26)S4, 12.0(27)S3, 12.0(28)S1, 12.0(30)S
      Symptoms: When traffic passes through a router, the router blocks traffic for certain prefixes behind a port-channel link.
    • CSCdz33664 present in 12.0(25)S3, 12.0(26)S1, 12.0(27)S2, 12.0(28)S; fixed in 12.0(25)S4
      Symptoms: An HSRP state change on any Engine2 interface causes a microcode bundle flap on all other Engine2 linecards, preventing load balancing from working because vanilla microcode gets loaded.
    • CSCee81071 present in 12.0(26)S3, 12.0(27)S2, 12.0(29)S
      Symptoms: Router sends Ethernet frames with a source MAC address of beef.f00d.beef and destination MAC address f00d.beef.f00d (which is the pattern scribbled in unallocated memory in GSR linecards), with what looks to be a legitimate payload of transit traffic.
      This is one of the symptoms of CSCee27396.
    • CSCeb38014 present in 12.0(26)S5; fixed in 12.0(26)S5, 12.0(27)S
      Symptoms: The BGP Router process flushes the BGP tables for each peer when you change one neighbor's description. This pegs the GRP CPU at 99% for quite a while.
    • CSCeg31951 present in 12.0(31)S; fixed in 12.0(31)S2 (CSCei53226)
      IOS (at least in the PRP code) places each individual public peer in its own update-group if remove-private-as is configured on a peer. Needless to say, this scales badly for a router connected to an Internet exchange. (Try "show ip bgp replication".)
  • A collection of hearsay follows for recent IOS images for the GSR/PRP regarding link aggregation. AMS-IX does not run any GSRs. Please take this information with appropriately-sized grains of salt.
    • 12.0(24)S2 is not advisable (not many specifics known but they include CSCef89562 and CSCee33045)
    • 12.0(24)S6 boots but load-balancing is completely off
    • 12.0(25)S* until S3 have CSCdz33664
    • 12.0(26)S* until S4 have CSCef89562, where Engine4+ linecards can have continuously flapping interfaces, but is also somewhat required for Quadra linecards
    • 12.0(26)S3 has CSCee27396 integrated but not CSCef12828, which leads to traffic blackholing
    • 12.0(27)S* until S3 have CSCef89562 as well
    • 12.0(27)S1 has a problem where it sends traffic to random destinations
    • 12.0(27)S2 has CSCee27396 integrated but not CSCef12828
    • 12.0(27)S4 reportedly works reasonably well on PRP2s
    • 12.0(28)S1 has problems with Engine2 linecards (CSCef78098) and Engine4+ (CSCef89562)
    • 12.0(28)S2 reportedly works better but still sometimes emits beef.f00d.beef frames on normal ports with only an IPv6 address configured
    • 12.0(30)S has only been observed to exhibit CSCef12828-like symptoms in conjunction with broken hardware, and also to still sometimes emit frames from MAC beef.f00d.beef.
    • Routers occasionally still send out frames with beef.f00d.beef as MAC source address on interfaces with an IPv6 but no IPv4 address configured, even on regular links.
    • Due to the massive amount of feature requests there will be both a 12.0(32)S and a new 12.0(32)SY train.
  • You can check for incorrect next-hops by attaching to the linecard and executing "show controllers rewrite" and "show adjacency internal" and comparing the two rewrite strings for a certain peer's IPv4 address (suffix the commands with " | begin 195.69.14a.b"). The first six bytes of the returned long hex string should be the peer's MAC address, and equal for all three occurrences. 
  • An example configuration follows:
    !
    interface Port-channel1
    description AMS-IX Aggregated Link
    ip address 195.69.14x.y 255.255.252.0
    no ip redirects
    no ip directed-broadcast
    no ip proxy-arp
    channel-group minimum active 1
    no channel-group bandwidth control-propagation
    hold-queue 150 in
    !
    interface GigabitEthernet1/2/1
    no keepalive
    no negotiation auto
    channel-group 1
    no cdp enable
    !
    interface GigabitEthernet1/2/2
    no keepalive
    no negotiation auto
    channel-group 1
    no cdp enable
    !
  • Specifying a hold-queue value is optional, but setting it to the number of ports in the aggregated link multiplied by 75 is advised.
  • "show interfaces Port-channel 1" will display keepalives enabled even though they are not; also, the BIA (burnt-in address, shown as 0000.0000.0000) can be ignored.
  • Please contact the AMS-IX NOC if you disable autonegotiation on Gigabit Ethernet ports as we may have to explicitly configure our switch for this.
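
Two of the points above reduce to simple checks: the advised hold-queue value (ports multiplied by 75) and the comparison of the first six bytes of each rewrite string against the peer's MAC address. The following Python sketch illustrates both. All function names are hypothetical, and it assumes the rewrite string has been copied out of the command output as one contiguous run of hex digits.

```python
def suggested_hold_queue(port_count):
    """Advised hold-queue: number of ports in the bundle times 75."""
    return port_count * 75


def rewrite_mac(rewrite_hex):
    """Return the first six bytes of a rewrite string as a dotted MAC.

    Assumes `rewrite_hex` is a contiguous hex string without separators.
    """
    digits = rewrite_hex.strip().lower()[:12]
    if len(digits) < 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("need at least six bytes of hex digits")
    # Format as aaaa.bbbb.cccc, the notation used in IOS output.
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def rewrites_agree(rewrite_strings, peer_mac):
    """True if every rewrite string starts with the peer's MAC address."""
    return all(rewrite_mac(s) == peer_mac.lower() for s in rewrite_strings)
```

With the two-port example configuration above, suggested_hold_queue(2) gives 150, which matches the "hold-queue 150 in" line. A False result from rewrites_agree points at the kind of incorrect next-hop described earlier.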

 

Juniper M-Series

We have encountered no issues with aggregated links and JunOS (M40, M160, T320).

  • JunOS releases prior to 6.0 required VLAN tagging on aggregated interfaces. This limitation has since been removed.
  • An example configuration follows:
    ---
    [edit]
    niels@junix# show chassis
    aggregated-devices {
    ethernet {
    device-count 1;
    }
    }
    ---
    [edit]
    niels@junix# show interfaces ge-2/1/0
    gigether-options {
    802.3ad ae0;
    }

    [edit]
    niels@junix# show interfaces ge-3/1/0
    gigether-options {
    802.3ad ae0;
    }
    ---
    [edit]
    niels@junix# show interfaces ae0
    description "AMS-IX";
    unit 0 {
    family inet {
    filter {
    input AMSIX-in;
    output AMSIX-out;
    }
    address 195.69.14x.y/22;
    }
    family inet6 {
    address 2001:07F8:1::A50a:bcde:1/64;
    }
    }
    ---
  • Additionally and optionally you can configure more granular load balancing:
    ---
    routing-options {
    autonomous-system abcde;
    forwarding-table {
    export [ load-balance ];
    }
    }
    policy-options {
    policy-statement load-balance {
    then {
    load-balance per-packet;
    }
    }
    }
    forwarding-options {
    hash-key {
    family inet {
    layer-3;
    layer-4;
    }
    }
    }
    ---
  • In case that is not granular enough, you can modify the hash-key algorithm with some undocumented options in JunOS 7.x and up:
    ---
    hash-key {
    family inet {
    layer-3 {
    destination-address;
    protocol;
    source-address;
    }
    layer-4 {
    destination-port;
    source-port;
    type-of-service;
    }
    }
    }
    ---
  • Also, you can set minimum-links on the aggregated interface to a value that brings the bundle down when the remaining links can no longer carry the traffic you plan to send. For example, on a 2-port aggregated link carrying a sustained 1.2 Gbps, drop the bundle as soon as only one link remains:
    ---
    aggregated-ether-options {
    minimum-links 2;
    link-speed 1g;
    }
    ---
  • In a situation with load-balancing over multiple IP interfaces (not AMS-IX), the final statement will make traceroute more confusing to novices as packets may seem to "bounce" between interfaces by also including TCP/UDP port numbers and ICMP checksums in the algorithm.
  • On an IP1 load-balance per-packet really means per-packet; on an IP2 it actually works per flow, which is preferable.
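
The minimum-links reasoning above is plain arithmetic: the bundle should go down once the surviving links can no longer carry the sustained traffic. A minimal Python sketch follows. The function name minimum_links is hypothetical, and it assumes all member links run at the same speed and that traffic spreads evenly across them, which real hashing will not guarantee.

```python
import math


def minimum_links(sustained_gbps, link_speed_gbps=1.0):
    """Smallest number of links that can carry the sustained load.

    Assumes equal-speed members and a perfectly even distribution,
    so treat the result as a lower bound rather than a recommendation.
    """
    return max(1, math.ceil(sustained_gbps / link_speed_gbps))
```

For the example in the text, 1.2 Gbps sustained over 1 Gbps members gives 2, matching "minimum-links 2".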

 

Linux

  • Enable bonding driver support in the kernel (CONFIG_BONDING=m)
  • Edit /etc/modules to load the bonding driver on boot:
    bonding miimon=100
    The miimon parameter specifies the frequency for link-monitoring, measured in ms.
  • Install the ifenslave package (apt-get install ifenslave).
    This package provides the /sbin/ifenslave tool, which is used to attach physical interfaces to the bonding interface.
  • Add the bonding interface to /etc/network/interfaces:
    # AMS-IX side
    auto bond0
    iface bond0 inet static
    address 195.69.14x.y
    netmask 255.255.252.0
    post-up /sbin/ifenslave bond0 eth0 eth1
    The above example creates a bonding interface with two physical interfaces.
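
Once the bond is up, the bonding driver reports its state in /proc/net/bonding/bond0. The following Python sketch parses that text to show the link state of each enslaved interface. The function name parse_bonding_status is hypothetical, and the sketch assumes the usual "Slave Interface:" and "MII Status:" field labels, which may differ between kernel versions.

```python
def parse_bonding_status(text):
    """Return {interface: mii_status} for each slave in the bond.

    `text` is the content of /proc/net/bonding/<bond>. The bond's own
    "MII Status" line (before the first slave) is ignored.
    """
    slaves = {}
    current = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            slaves[current] = value
    return slaves
```

To use it, read the file and pass its content, for instance parse_bonding_status(open("/proc/net/bonding/bond0").read()). With the configuration above, both eth0 and eth1 should appear with status "up".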

For more information see the file Documentation/networking/bonding.txt in the kernel source tree.

 

Acknowledgments

The AMS-IX NOC would like to thank Erik Bos (XS4ALL); Edward Henigin (Giganews); Aaron Weintraub (Cogent Communications); Bas Haakman (Multikabel); Bart Peirens (Belgacom); Pierfrancesco Caci (Telecom Italia Sparkle); Tom Scholl (SBC); Richard A Steenbergen (nLayer); Scott Madley (Level 3 Communications); Jon Nistor (Rogers/TorIX); Martin Pels (Support Net) and Paolo Moroni (SWISSCOM) for their input.

We also apologise for the egregious pun in the title of this document.