Re: [load balancing] F5 LTM hitting SSL TPS license limit

From: Kenneth Salchow <k.salchow [izzat] f5.com>
Date: Tue Feb 24 2009 - 14:05:44 EST

Eric, please see below:

 

KJ (Ken) Salchow, Jr. | Manager, Technical Marketing

 

From: lb-l-bounces@vegan.net [mailto:lb-l-bounces@vegan.net] On Behalf Of
Rosenberry, Eric
Sent: Monday, February 23, 2009 3:16 PM
To: Load Balancing Mailing List
Subject: Re: [load balancing] F5 LTM hitting SSL TPS license limit

 

Kenneth-

 

Thank you very much for the detailed technical response. This is much more
information than I have been able to get out of F5 support during the
support case. They would not tell me over what time period the F5 was
enforcing rate limits. (I would like to note, however, that my support
engineer was excellent.)

 

[[KJSJ]] I'll make sure that is known; thanks for saying it. Those guys/gals
rarely get the respect they deserve.

 

A few comments:

1. This behavior makes more sense when explained in this manner. My main
frustration was with the lack of public documentation. If there were a
Google-searchable knowledge base article (or at least one on the private
support site) I might never have needed to open a support case in the first
place. There are at least two knowledge base articles I have read that
explain the rate limit as being based on a 1 second window, which is
apparently false.

 

 [[KJSJ]] I am attempting to correct this by ensuring that your case gets
commented with this information and am trying to get an askF5 article
created and published to do the same. In addition, based on your comment, I
will endeavor to find and correct any erroneous information published
elsewhere and see if I can't get that fixed too. My apologies for the
confusion and conflicting information.

 

2. Does it count against your TPS limit as soon as a SYN is received? This
is concerning to me as it could represent an easy DoS attack with a simple
SYN flood. I would prefer that a connection count against the license only
once the three-way handshake has completed (or even not until the SSL
negotiation has happened). If I am understanding current functionality
correctly, a simple SYN could tie up an SSL license.

 

[[KJSJ]] That's certainly what the prior response suggests, doesn't it? I
have requested clarification (and was waiting for the response but decided I
didn't want to wait to get this response out).

 

3. Customers need a clear way, from the web interface, command line, and
SNMP, to check the high watermark to see how close to the limit they are
getting. I am sure there are a lot of people out there (myself included) who
log in and look at that SSL TPS graph all the time and rest assured that
they are nowhere near the "limit".

 

[[KJSJ]] As always, I am passing this information along. However, it is in
your best interest to continue to push for requests like this via the normal
support channel you have already gone through. Any kind of request for
engineering / feature request / etc. that comes from me via the web, a
conference or whatever is certainly taken seriously and listed in our
feature planning tools, but as a customer with existing product, a request
that comes directly from you has substantially more weight. All the other
stuff is just market hearsay.

 

I am not necessarily convinced that rate limiting on 10ms intervals is
better from a DoS perspective, but I don't have any evidence to prove
otherwise. I could argue that both ways. Right now an attacker simply must
send 20 SYN packets within 10ms to "overflow" the limit on a 2000 TPS
license (granted, the client's TCP stack should re-send, and if the DoS is
not sustained they will get through). With a one second rolling window the
attacker would need to send 2000 SYNs each second to fill up all the
license slots. I am, however, certain that limiting on 10ms windows vs. a
full second causes more people to run into the limit, which promotes license
sales. ;-)
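
The arithmetic in the paragraph above can be checked with a minimal sketch
(Python, not F5 code). It assumes the licensed limit is simply divided
evenly across fixed windows; `window_budget` is a hypothetical helper name
used only for illustration.

```python
def window_budget(tps_limit: int, window_ms: int) -> int:
    """Connections allowed per window if the licensed limit is
    spread evenly over one second (an assumption, not F5's code)."""
    return tps_limit * window_ms // 1000


# 2000 TPS license enforced on 10 ms windows
print(window_budget(2000, 10))    # 20
# Same license enforced on a full one-second window
print(window_budget(2000, 1000))  # 2000
```

Under that assumption, the number of SYNs needed to use up a window scales
directly with the window length, which is the trade-off being argued here.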

 

I do understand and fully agree with the point about the current behavior
helping to smooth out traffic spikes. The issue for most sites, though, is
that they can't ever let themselves run into the license limit, as they
can't afford to potentially impact customers. There really is not a way (as
far as I know) to tell how "hard" you have run into the limit. Are you
dropping 5 SYN packets a second, or are you dropping thousands? (This might
be a good SNMP counter object: total dropped SYNs due to license
restrictions.)

 

Also, as a point of clarification: do multiple HTTP GET requests within a
single TCP/SSL session count against the SSL TPS limit, or are they only
counted as a single connection? (I do know that resumed SSL sessions that
span multiple TCP connections are treated as separate SSL setups
license-wise.)

 

[[KJSJ]] I believe that Hamish responded correctly to that: it is the
connection only. I'll respond more to Hamish's note about what constitutes
a 'TPS'.

 

Thanks again, I really do appreciate your participation in this forum.

 

[[KJSJ]] It's no problem. I've been hanging around this forum for a decade
now, originally when I was a customer myself. I try to walk a very fine
line: I don't try to make blatant 'sales' pitches and I don't try to replace
our very competent (much more so than I) support staff. But when things like
this come up and there is an opportunity to clarify our technology and have
an honest discussion about the way we've implemented features/functions,
that's a conversation I think we should always be open to.

 

-Eric

 

  _____

From: lb-l-bounces@vegan.net [mailto:lb-l-bounces@vegan.net] On Behalf Of
Kenneth Salchow
Sent: Monday, February 23, 2009 10:09 AM
To: Load Balancing Mailing List
Subject: Re: [load balancing] F5 LTM hitting SSL TPS license limit

 

Eric,

 

First - I LOVE your title; that's awesome.

 

Ok, now to business. After seeing your message, I wanted to find out what
Product Development had to say about it and here is their direct, unedited
response:

 

[quote]

The code provides for 10ms accuracy (5ms of backlogged time, and 5ms of
limiting) for a TPS limit, where a TPS is defined, essentially, as a SYN (or
other connection startup) message on a clientssl virtual. The rate is
calculated on the expectation that over any given 10ms window, the rate of
new connections should not exceed the rate that would, if it were continued
for the rest of the second, exceed the TPS limit. By only using a 10ms
window, we allow for continued connections at the maximum allowable TPS
limit for the entire second, thus saturating the entire limit while
servicing connections at a constant rate. So naturally, if you try to drop
1000 connections on a box with a 1000 TPS /simultaneously/, the TPS limit
will fire because it expects that rate to continue /for the rest of the
second/.

 

An alternative method would be to accept all 1000 connections, and deny all
others for the rest of the second.

        * One reason not to take this approach is that it will block ALL new
connections for the rest of the time period. This increases the
effectiveness of DoS attacks and decreases the general reliability of the box
itself under load.

        * Additionally, SYN's are expected to be sent three times by
most/all well behaving TCP stacks. This provides us with an automatic
smoothing behavior - we ignore the SYN's during spikes, and the client will
automatically try again over macro-time, hopefully smoothing out the spike.
This is why, in a real world test, the current behavior would not actually
result in dropped or timed-out connections for users, despite momentary
spikes.

 

So, in summary, the higher resolution limit enforcement provides several
measurable benefits over allowing a full TPS limit's worth of connections at
any given point. The first major advantage is to allow new connections to
continue to be serviced at a constant predictable rate over time. The
second is increased DoS and spike resistance. Plus, given normal connection
handshake behaviors by clients, there is a decreased likelihood that
connections will be completely dropped.

[/quote]
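
The windowed enforcement described in the quoted response can be sketched
roughly as follows. This is only an illustration under stated assumptions,
not F5's implementation: it uses fixed, aligned 10 ms windows and does not
model the "5ms of backlogged time, and 5ms of limiting" split, and the
function name `simulate` is hypothetical.

```python
from collections import defaultdict


def simulate(arrivals_ms, tps_limit, window_ms=10):
    """Return (accepted, dropped) for connection-start times given in ms.

    Assumption: each fixed window admits at most
    tps_limit * window_ms / 1000 new connections; the rest are dropped.
    """
    budget = tps_limit * window_ms // 1000
    seen = defaultdict(int)  # window index -> connections admitted so far
    accepted = dropped = 0
    for t in arrivals_ms:
        window = int(t // window_ms)
        if seen[window] < budget:
            seen[window] += 1
            accepted += 1
        else:
            dropped += 1
    return accepted, dropped


burst = [0.0] * 1000                      # 1000 connections at the same instant
steady = [float(i) for i in range(1000)]  # one connection per millisecond

print(simulate(burst, 1000))   # (10, 990)
print(simulate(steady, 1000))  # (1000, 0)
```

With a 1000 TPS limit, the simultaneous burst is mostly refused in its first
window, while the same 1000 connections spread evenly over the second are all
admitted. Client SYN retransmission, which the quote relies on for smoothing,
is not simulated here.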

 

I hope that makes more sense and helps you understand why our developers
made the decision they did. Consequently, I also don't see this behavior
changing in any way as it is entirely by design - good, bad or otherwise. The
positive thing is that with your posting and this reply, at least we've made
it much easier for other users to understand why they might see that
behavior. For that, BTW, thank you.

 

Now - Hamish also made a comment concerning our SSL license limit and how we
handle it when that limit is exceeded. It is true that once we exceed the
SSL license on a BIG-IP LTM device, we no longer accept SSL connections. It
is ALSO true that in the case of compression, we still process the
connections, but we no longer apply compression. There is a significant
difference between these activities that makes this distinction necessary.
When dealing with compression, it's not that we simply 'fall back' on
software; we just quit doing compression beyond the license limit. In this
case, it has no effect on whether traffic continues to flow or not, just
whether it is compressed. The act of compressing/not-compressing is not
relevant to the actual connection itself.

 

With SSL, however, the limit *is* connections. We cannot simply choose
*not* to provide SSL services and somehow have the connection still exist
and be processed; they are mutually exclusive. Hamish suggests that we could
'fall back' to software, but that is making the assumption that the license
limit is for 'hardware SSL', which it isn't; it is for SSL TPS for the whole
box.

 

So, in this regard, comparing Compression licensing with SSL licensing simply
doesn't work. However, that being said, both your (Eric) perspective and
Hamish's perspective are being heard and used as important criteria
when making future decisions about how we handle these processes. Having
been around F5 for almost a decade and having been a customer before that, I
know firsthand that this is the way F5 products have continued to evolve
year over year: by taking the input and ideas of the people who use them
every day.

 

KJ (Ken) Salchow, Jr. | Manager, Technical Marketing

 

From: lb-l-bounces@vegan.net [mailto:lb-l-bounces@vegan.net] On Behalf Of
Rosenberry, Eric
Sent: Thursday, February 19, 2009 4:06 PM
To: Load Balancing Mailing List
Subject: [load balancing] F5 LTM hitting SSL TPS license limit

 

FYI-

 

I ran into this issue with my F5 LTM's recently and I wanted to share it
with the rest of the world in such a form that future Google searches would
pick it up since I could not find any information on it myself.

 

If you are getting a message similar to this in your LTM logs and you don't
think you are anywhere near your licensed SSL TPS limit, you might be
running into this issue:

tmm tmm[1253]: 01260008:3: SSL transaction (TPS) rate limit reached
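
For anyone who wants to check how often this message appears, here is a
minimal sketch that counts matching lines. The pattern is built only from
the log line quoted above; the function name `count_tps_limit_hits` is
hypothetical, and no log file path is assumed since none is given in this
thread.

```python
import re

# Message ID and text taken from the log line quoted above.
TPS_LIMIT_RE = re.compile(
    r"01260008:\d+: SSL transaction \(TPS\) rate limit reached"
)


def count_tps_limit_hits(lines):
    """Count log lines that report the SSL TPS rate limit being reached."""
    return sum(1 for line in lines if TPS_LIMIT_RE.search(line))


sample = [
    "tmm tmm[1253]: 01260008:3: SSL transaction (TPS) rate limit reached",
    "some unrelated log line",
]
print(count_tps_limit_hits(sample))  # 1
```

Any iterable of text lines works as input, for example an open log file.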

 

After many hours of working with F5 support and analyzing packet dumps to
prove the issue to F5, it would appear that the F5 tracks your SSL TPS
license limit on some sub-second interval. For example, if you have a
2000 TPS license it seems to chop this up into smaller windows (say 1/10th
of a second for this example) and enforce a limit of 200 connections on each
1/10th of a second bucket.

 

I ran into this issue while running some load tests against our system at a
SSL TPS rate that should have been less than the license limit, however, due
to the burstiness of the load test it hit F5's artificial license rate
limit.
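
A rough way to see this effect in connection timestamps (for example, SYN
times pulled from a packet dump) is to count connections per sub-second
bucket and scale the busiest bucket up to a per-second rate. The sketch
below is illustrative only: the bucket sizes and the synthetic load pattern
are assumptions, and `peak_rate` is a hypothetical helper name.

```python
from collections import Counter


def peak_rate(syn_times_ms, bucket_ms=10):
    """Return (peak count in any bucket, that count scaled to per-second).

    syn_times_ms: integer connection-start times in milliseconds.
    """
    counts = Counter(t // bucket_ms for t in syn_times_ms)
    peak = max(counts.values(), default=0)
    return peak, peak * 1000 // bucket_ms


# Synthetic load test: 1500 connections in one second,
# sent as bursts of 150 every 100 ms.
times = [burst * 100 for burst in range(10) for _ in range(150)]

print(peak_rate(times, bucket_ms=1000))  # (1500, 1500)
print(peak_rate(times, bucket_ms=10))    # (150, 15000)
```

Measured over a full second this load is 1500 TPS, below a 2000 TPS license,
but measured over 10 ms buckets the bursts look like 15000 TPS.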

 

I know this issue exists in code version "BIG-IP 9.4.1 Build 29.0". I
cannot speak for any other versions.

 

F5 does not have a code fix for this issue at the moment. If you need
to prove to F5 support that this issue exists you can have them refer to
case C492020 where it is documented.

 

-Eric

_______________________________________________________________
Eric Rosenberry
Sr. Network Engineer | Chief Bit Plumber
Direct +1. 503.943.6763 | Mobile: +1.503.348.3625 | Fax: +1.503.224.1581

 

 
iovation
111 SW Fifth Avenue
Suite 3200
Portland, OR 97204
www.iovation.com <http://www.iovation.com/>
 


Received on Tue Feb 24 14:06:10 2009
