RE: [load balancing] problem with ports flapping (alteon/cisco/nokia)

From: Scott J. D'Aquila (sdaquilaIZZATiagr.net)
Date: Tue Jul 27 2004 - 13:59:31 EDT

    Peter;

                Thanks for your response. I do have submac enabled for all
    my real servers; I ran into that oversight at the outset of the
    network design over a year ago. Any other thoughts?

    /s

     

    -----Original Message-----
    From: owner-lb-lIZZATvegan.net [mailto:owner-lb-lIZZATvegan.net] On Behalf Of
    Peter Degrassi
    Sent: Tuesday, July 27, 2004 10:08 AM
    To: 'lb-lIZZATvegan.net'
    Subject: RE: [load balancing] problem with ports flapping
    (alteon/cisco/nokia)

     

    Hi Scott,

    You are seeing the FW MAC from switch port 2/5 because by default the
    AceDirector does not substitute the MAC address when the frame is L4
    switched. Enable submac globally (/cfg/slb/adv/submac) or at the real
    servers (/cfg/slb/real 1/submac).

    Hard to say if that's causing the problem without more info but in a DSR
    setup submac is required so the L2 device isn't confused.

    Peter
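
Peter's point about MAC substitution can be illustrated with a toy model of a learning L2 switch. This is only a sketch under stated assumptions, not a description of how the Alteon or the Cisco switch is implemented: the `L2Switch` class, the `forward_via_lb` helper, and the load balancer MAC `LB_MAC` are hypothetical names invented for illustration. The port numbers and the firewall MAC come from the trap quoted later in this thread.

```python
class L2Switch:
    """Toy learning switch: remembers the last port each source MAC was seen on."""

    def __init__(self):
        self.cam = {}    # source MAC -> port it was last learned on
        self.flaps = []  # (mac, old_port, new_port) for every move

    def receive(self, port, src_mac):
        previous = self.cam.get(src_mac)
        if previous is not None and previous != port:
            # Same source MAC seen on a different port: the entry "flaps".
            self.flaps.append((src_mac, previous, port))
        self.cam[src_mac] = port


FW_MAC = "00:a0:8e:1a:d3:52"  # firewall's real MAC, as reported in the trap
LB_MAC = "00:00:00:00:00:01"  # hypothetical load balancer MAC (assumption)


def forward_via_lb(src_mac, submac):
    """Source MAC of a frame after the load balancer L4-switches it.

    Assumption: with submac the balancer substitutes its own MAC,
    without it the original source MAC is left untouched.
    """
    return LB_MAC if submac else src_mac


def simulate(submac, rounds=3):
    switch = L2Switch()
    for _ in range(rounds):
        # Frame from the firewall enters the switch on the uplink port.
        switch.receive("2/1", FW_MAC)
        # The balancer sends the L4-switched frame back into the switch.
        switch.receive("2/5", forward_via_lb(FW_MAC, submac))
    return switch.flaps


if __name__ == "__main__":
    print("flaps without submac:", len(simulate(submac=False)))
    print("flaps with submac:   ", len(simulate(submac=True)))
```

In this model the firewall MAC moves between ports 2/1 and 2/5 on every frame when substitution is off, and never moves when it is on. Whether that matches the behavior in the network described here is exactly what remains open in this thread.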

    -----Original Message-----
    From: Scott J. D'Aquila [mailto:sdaquilaIZZATiagr.net]
    Sent: Monday, July 26, 2004 5:10 PM
    To: lb-lIZZATvegan.net
    Subject: [load balancing] problem with ports flapping
    (alteon/cisco/nokia)

     

    All;
            I am having a great deal of difficulty tracking down a problem
    with my network and am wondering if any of you can help shed some light
    on it. My network is set up similarly to
    http://networking.oreilly.com/images/networking/bourke_1100_image3.gif,
    using 'route path load balancing'. I have 2 Nokia firewalls running
    Check Point, 2 Cisco 2948G L2 switches, and 2 Alteon AceDirectors. We're
    using VRRP as a failover mechanism for the load balancers and the
    firewalls. We're using scripts to health check the real servers on the
    Alteons. When my site has problems and all of the real servers are
    down, all access to the network goes down. It seems to me that they have
    all lost access to the default gateway. The problem must somehow involve
    the load balancer, since it takes all of the servers failing to
    trigger it. It's also quite frustrating: when all the servers are down,
    I really need to get into that network and fix whatever is going on in
    there. But it gets weird...

    The only thing I can find that is cause for alarm is the following:

    Jul 26 13:21:04 xxx snmptrapd[24832]: 192.168.20.242: Enterprise
    Specific Trap (1) Uptime: 494 days, 21:10:19.44,
    SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.2.0 = STRING: "SYS",
    SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.3.0 = INTEGER: 5,
    SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.4.0 = STRING: "SYS",
    SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.5.0 = STRING: "2004 Jul 26
    12:14:51 Eastern -05:00 %SYS-4-P2_WARN: 1/Host 00:a0:8e:1a:d3:52 is
    flapping between port 2/5 and port 2/1",
    SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.6.0 = INTEGER: -19185352
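
For anyone collecting these traps in bulk, the flapping host and the two ports can be pulled out of the message text with a short script. This is a minimal sketch that assumes the message wording is exactly as in the trap above; the function name `parse_flap` and the regular expression are illustrative and not part of any Cisco or Net-SNMP tooling.

```python
import re

# Matches the "Host <mac> is flapping between port X/Y and port X/Y" text
# as it appears in the trap above (wording assumed to be stable).
FLAP_RE = re.compile(
    r"Host (?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}) is\s+"
    r"flapping between port (?P<port_a>\d+/\d+) and port (?P<port_b>\d+/\d+)"
)


def parse_flap(text):
    """Return (mac, port_a, port_b) from a flapping message, or None."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(text.split())
    match = FLAP_RE.search(flat)
    if match is None:
        return None
    return match.group("mac").lower(), match.group("port_a"), match.group("port_b")


if __name__ == "__main__":
    sample = (
        '12:14:51 Eastern -05:00 %SYS-4-P2_WARN: 1/Host 00:a0:8e:1a:d3:52 is\n'
        'flapping between port 2/5 and port 2/1"'
    )
    print(parse_flap(sample))
```

Counting the results per MAC and port pair over time would show whether the flapping lines up with the periods when all real servers are down.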

            This is from my snmptrap receiver on that network, coming from
    the Cisco switch. Port 2/1 is the uplink to the firewall, where the
    primary firewall is plugged in. Port 2/5 is the Alteon load balancer.
    The MAC address indicated is the firewall's. However, it is the real MAC,
    not the 'VRRP MAC' which the network is using as its gateway (since
    we're using the route path method, the web servers use the load balancer
    as the default gateway, but the other hosts just use the firewall VRRP
    address, which uses the virtual MAC). But ALL hosts in the subnet are
    unavailable, not just the ones using the load balancer as the gateway.
            Since I've built this with a route path topology, there are no
    other devices plugged into the load balancer whatsoever, so there is
    no reason for the MAC address of the firewall to turn up on that port
    due to a physical connection, but it would seem that it does anyway and
    screws everything up... Has anyone ever seen this type of behavior? I am
    sorta dumbstruck by this. Why would this happen in the first place, and
    how do I fix it? Is it even a problem if this isn't the MAC
    address of the gateway? Maybe it's something else?
            I've thought of creating a static CAM entry on the Cisco switch,
    but I don't see how this can help, since the MAC address we are actually
    concerned with is the VRRP MAC of the firewall, and considering it has
    to be floating by definition, I can't hard code that...

    As always, any help appreciated, TIA

    -scott

    ____________________
    The Load Balancing Mailing List
    Unsubscribe: mailto:majordomoIZZATvegan.net?body=unsubscribe%20lb-l
    Archive: http://vegan.net/lb/archive
    LBDigest: http://lbdigest.com
    MRTG with SLB: http://vegan.net/MRTG
    Hosted by: http://www.tokkisystems.com




    This archive was generated by hypermail 2.1.4 : Tue Jul 27 2004 - 14:08:24 EDT