:. AMS-IX .: Amsterdam Internet Exchange
Linux Configuration Hints

10. Linux Configuration Hints

We are not aware of any major issues with Linux boxes used as routers, and they seem to be pretty rare on the Exchange. Having said that, there are a few parameters that can (and usually should) be tuned:

  1. ARP filtering & source routing

  2. ARP cache timeout

  3. Reverse Path (RP) filter

For more information on tuning your Linux system for routing, see the Linux Advanced Routing & Traffic Control HOWTO.

10.1. ARP Filtering and Source Routing

The Linux approach to IP addresses is that they belong to the system, not any single interface. As a result, Linux hosts have a default behaviour that is different from most other systems: interfaces semi-promiscuously answer for all IP addresses of all other interfaces. Example:

In this example, host tuxco is a Linux box with a peering connection on eth0 (192.168.1.1/24) and a backbone link on eth1 (10.0.0.1/24).

When host kannix (192.168.1.2) sends an ARP query for 10.0.0.1 it will get a reply from tuxco's eth0 interface!

In other words, a Linux host will answer ARP queries arriving on any interface if the queried address is configured on any of its interfaces. The idea behind this is that an IP address belongs to the system, not just a single interface. Although this may work well for server or desktop systems, it is not desirable behaviour in a router. One reason is that it is a limited form of proxy-arp, which is forbidden on the AMS-IX peering LAN. Another reason is that two separate routers could potentially answer ARP queries for the same RFC1918 address.

10.1.1. Fixing ARP

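The ARP behaviour can be fixed by setting arp_ignore and arp_announce on the peering (WAN) interface, eth0 in the example above. With arp_ignore set to 1, the kernel replies only when the queried address is configured on the interface that received the query. With arp_announce set to 1, the kernel tries to avoid using source addresses in its own ARP requests that are not in the target's subnet on that interface: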

tuxco# sysctl -w net/ipv4/conf/eth0/arp_ignore=1
tuxco# sysctl -w net/ipv4/conf/eth0/arp_announce=1
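
To confirm that the two settings took effect, the values can be read back from procfs. The following is a minimal sketch, not part of the original guide: the function name show_arp_settings and the SYSCTL_ROOT override are hypothetical and exist only to keep the example testable, and eth0 is the peering interface from the example above.

```shell
#!/bin/sh
# Print the current arp_ignore / arp_announce values for one interface.
# SYSCTL_ROOT is a hypothetical override (default: /proc/sys) so the
# sketch can be pointed at a scratch directory for testing.
show_arp_settings() (
    ifname=$1
    root=${SYSCTL_ROOT:-/proc/sys}
    for key in arp_ignore arp_announce; do
        file="$root/net/ipv4/conf/$ifname/$key"
        if [ -r "$file" ]; then
            printf '%s=%s\n' "$key" "$(cat "$file")"
        else
            # Interface does not exist (yet) or procfs is not mounted.
            printf '%s=unavailable\n' "$key"
        fi
    done
)

show_arp_settings eth0
```

Both values should read 1 on the peering interface once the sysctl commands above have been run.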

10.2. IPv4 ARP Cache Timeout

The ARP cache timeout on Linux-based routers should be changed from the default, especially if you have a large number of peers. This parameter can be tuned by setting the appropriate procfs variable through the sysctl interface. The Linux arp(7) manual says:

[ … ]

SYSCTLS

ARP supports a sysctl interface to configure parameters on a global or per-interface basis. The sysctls can be accessed by reading or writing the /proc/sys/net/ipv4/neigh/*/* files or with the sysctl(2) interface. Each interface in the system has its own directory in /proc/sys/net/ipv4/neigh/. The setting in the ‘default’ directory is used for all newly created devices. Unless otherwise specified time related sysctls are specified in seconds.

[ … ]

base_reachable_time

Once a neighbour has been found, the entry is considered to be valid for at least a random value between base_reachable_time/2 and 3*base_reachable_time/2. An entry's validity will be extended if it receives positive feedback from higher level protocols. Defaults to 30 seconds.

This means that Linux systems keep ARP entries in their cache for some time between 15 and 45 seconds (and yes, the average works out to 30 seconds). This is quite short. In fact, it is shorter than the typical BGP KEEPALIVE interval and may thus result in excessive ARP traffic.

We suggest a timeout of at least two hours for ARP entries on your AMS-IX interface. Because the shortest possible lifetime is base_reachable_time/2, you have to set base_reachable_time to 2 × 2 hours = 4 hours (14400 seconds).

tuxco1# sysctl net.ipv4.neigh.ifname.base_reachable_time
net.ipv4.neigh.ifname.base_reachable_time = 30

The above command tells you that the ARP cache timeout is 30 seconds on average. To change it so that it falls between 2 and 6 hours, use the following command:

tuxco1# sysctl -w net.ipv4.neigh.ifname.base_reachable_time=14400
net.ipv4.neigh.ifname.base_reachable_time = 14400

Here ifname is the name of the interface that connects to AMS-IX. You can also use “default” here, but that may have undesired side-effects for your other interfaces.
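
The arithmetic behind the 14400 value can be sketched in shell. This is an illustration only; the function names base_for_min_lifetime and reachable_range are made up for this example, and the range formula is the one quoted from arp(7) above.

```shell
#!/bin/sh
# Given the minimum lifetime you want (in seconds), derive the
# base_reachable_time that guarantees it: the minimum is base/2.
base_for_min_lifetime() {
    echo $(( $1 * 2 ))
}

# Given a base_reachable_time, print the resulting lifetime range
# "base/2 3*base/2" in seconds.
reachable_range() {
    echo "$(( $1 / 2 )) $(( 3 * $1 / 2 ))"
}

base=$(base_for_min_lifetime 7200)                 # 2 hours minimum
echo "base_reachable_time=$base"                   # base_reachable_time=14400
echo "range (seconds): $(reachable_range "$base")" # range (seconds): 7200 21600
echo "default range:   $(reachable_range 30)"      # default range:   15 45
```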

10.3. IPv6 Neighbor Cache Timeout

As with the IPv4 ARP cache, Linux systems by default use a rather short lifetime for entries in the IPv6 neighbor cache. The lifetime is controlled in the same way as for IPv4 ARP:

tuxco1# sysctl net.ipv6.neigh.ifname.base_reachable_time
net.ipv6.neigh.ifname.base_reachable_time = 30

tuxco1# sysctl -w net.ipv6.neigh.ifname.base_reachable_time=14400
net.ipv6.neigh.ifname.base_reachable_time = 14400
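
Since the IPv4 and IPv6 settings differ only in the address family, both commands can be produced by one small helper. The sketch below only prints the commands (a dry run) so they can be reviewed before being run as root; the function name neigh_timeout_cmds is hypothetical and eth0 stands in for your AMS-IX interface.

```shell
#!/bin/sh
# Print the sysctl commands that set base_reachable_time for both
# address families on one interface. Nothing is changed on the system.
# $1 = interface name, $2 = value in seconds (defaults to 14400).
neigh_timeout_cmds() (
    ifname=$1
    seconds=${2:-14400}
    for family in ipv4 ipv6; do
        echo "sysctl -w net.$family.neigh.$ifname.base_reachable_time=$seconds"
    done
)

neigh_timeout_cmds eth0
```

Once the output looks right, it could be piped to a root shell, for example `neigh_timeout_cmds eth0 | sh`.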

10.5. Running the “sysctl” Commands at Boot

The various system parameters discussed above can be set at boot time by adding them to a file such as /etc/sysctl.conf. The exact name, location and very existence of this file typically depend on the Linux distribution in use, but both Debian and Red Hat/Fedora use /etc/sysctl.conf:

# file: /etc/sysctl.conf
# These settings should be duplicated for all interfaces that are
# on a peering LAN.

### Typical stuff you really want on a router

# Fix the "promiscuous ARP" thing...
net/ipv4/conf/ifname/arp_ignore=1
net/ipv4/conf/ifname/arp_announce=1

# Turn off RP filtering to allow asymmetric routing:
net/ipv4/conf/ifname/rp_filter=0

# Multiple (non-aggregated) interfaces on the same peering LAN.
# READ THE MANUAL FIRST!
#net/ipv4/conf/ifname/arp_filter=1

### Keep the AMS-IX ARP Police happy. :-)

net/ipv4/neigh/ifname/base_reachable_time=14400
net/ipv6/neigh/ifname/base_reachable_time=14400
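
Because these settings have to be duplicated for every interface on a peering LAN, the fragment above could be generated rather than typed by hand. The following is a minimal sketch under that assumption; the function name peering_sysctl_conf and the interface names eth0 and eth2 are placeholders, and the commented-out arp_filter line is deliberately left out.

```shell
#!/bin/sh
# Emit the per-interface sysctl.conf entries from the example above
# for each interface name given as an argument.
peering_sysctl_conf() (
    for ifname in "$@"; do
        printf '# --- %s ---\n' "$ifname"
        printf 'net/ipv4/conf/%s/arp_ignore=1\n' "$ifname"
        printf 'net/ipv4/conf/%s/arp_announce=1\n' "$ifname"
        printf 'net/ipv4/conf/%s/rp_filter=0\n' "$ifname"
        printf 'net/ipv4/neigh/%s/base_reachable_time=14400\n' "$ifname"
        printf 'net/ipv6/neigh/%s/base_reachable_time=14400\n' "$ifname"
        printf '\n'
    done
)

# Print the entries for two peering interfaces; review the output
# before appending it to /etc/sysctl.conf.
peering_sysctl_conf eth0 eth2
```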

Caution: Modules must be loaded before sysctl is executed

On Debian systems, kernel modules for some network interfaces (e.g. 10GE cards) are not loaded before the init process executes the script that runs the sysctl commands. In those cases, it is necessary to force the module to be loaded earlier. The same goes for the IPv6 settings; the ipv6 module is usually not loaded until the network interfaces are brought up, which is typically after the sysctl variables are set by the procps.sh script.

(On Red Hat/Fedora systems no action needs to be taken; the /etc/init.d/network script automatically (re-)sets the sysctl variables before and after bringing up the interfaces.)

There are a few ways around this:

  1. Re-run the sysctl directives after the interfaces are brought up (and the appropriate modules are loaded). This method is probably the only option available to you if your system does no autoloading of modules.

    On Debian-based systems, this can be done by creating a symbolic link in /etc/rc2.d to re-run procps.sh after the network is brought up:

    root@tuxco# ln -s ../init.d/procps.sh /etc/rc2.d/S20procps.sh

  2. Pre-load the appropriate modules before the sysctl settings are applied.

    On Debian-based systems, the necessary modules can be pre-loaded by listing the appropriate modules in /etc/modules. The module-init-tools script (or modutils on older systems) will load the modules before the sysctl.conf entries are executed:

    # file: /etc/modules
    # load the kernel module for "mycard".
    mycard
    # load the ipv6 stack
    ipv6

    (As a curiosity, on Red Hat/Fedora systems this would be accomplished by creating one or more executable scripts in /etc/sysconfig/modules with names ending in .modules. The scripts should be proper shell scripts executing the appropriate commands to load and initialise the modules).

  3. Modify /etc/modprobe.conf (or the appropriate file in /etc/modprobe.d) and use the install directive to execute the relevant sysctl directives after loading the module. Although this is possible, we recommend against it, as it is far easier and clearer to use one of the alternative methods above.
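
Whichever method is used, it is worth checking after a reboot that the settings really were applied. The sketch below compares each entry in a sysctl.conf-style file with the live value under /proc/sys. It is an illustration, not a tool shipped with any distribution: the function name check_sysctl_conf and the SYSCTL_ROOT override are hypothetical, and it assumes keys in the slash notation used in the example file above, with no whitespace around the "=" sign.

```shell
#!/bin/sh
# Compare "key=value" entries in a sysctl.conf-style file against the
# live values in procfs. Returns 0 if everything matches, 1 otherwise.
# SYSCTL_ROOT is a hypothetical override (default: /proc/sys) for testing.
check_sysctl_conf() (
    conf=$1
    root=${SYSCTL_ROOT:-/proc/sys}
    status=0
    while IFS='=' read -r key want; do
        case $key in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        file="$root/$key"
        if [ ! -r "$file" ]; then
            # Typically the module (NIC driver, ipv6) was not loaded yet.
            echo "MISSING  $key"
            status=1
        elif [ "$(cat "$file")" != "$want" ]; then
            echo "MISMATCH $key: want $want, have $(cat "$file")"
            status=1
        fi
    done < "$conf"
    exit "$status"
)
```

A typical invocation would be `check_sysctl_conf /etc/sysctl.conf`. A MISSING line for an ipv6 or 10GE interface entry suggests that the corresponding module was loaded only after the sysctl settings were applied.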