Over the last year I've been involved in a number of investigations attempting to find bottlenecks in systems consisting of clients, web service hosts, and databases (usually containing application logic in addition to data). The details of each system's implementation is not especially important to this discussion because what I want to do here is just relate one of my recent experiences regarding measurement of response times. You might like to check if you get similar behaviour.
Firstly, let's just describe the typical situation I find myself looking it. It's very generic, I'm sure the same thing will apply to you. I usually have some client systems accessing a service layer hosted in IIS6 talking to a database server (usually with significant embedded application logic).
The problem (or my lesson in this case) is how to interpret the numbers you get from the IIS log files on the web service hosts. These files give you HTTP request duration and the arrival time of the request. Now what happens when you naively plot a distribution of the the duration (ie request execution time)? You might expect a nice symmetrical peak centered on some value, or you might get something such as the following.
I found that I was consistently getting this sort of shape across different types of requests. This isn't really all that unexpected. Each URI in the log file corresponds to a web service against which a number of web methods may be called. The web methods may have significantly different response times so the graph of the service call is really just a summation of all the individual web method calls. And in fact we've now implemented duration timing on each individual web method call, and in fact we get a much simplified distribution centred round one peak.
Anyhow, while I was looking into the double peak I decided to look at the arrival rate and compare that to the duration of the call. Theoretically you should get a Poisson distribution for random, uncorrelated events and on those occasions when there were many simultaneous arrivals you'd also expect the response time to slow down (although whether this was linear/non-linear is another question).
So, I looked for a period of time during which we have fairly constant activity and chose 2 hours in the middle of the day.
You can see that there's a nice distribution with the expected shape centred on 4 arrivals/second. Of course the IIS log files only record data when a request actually arrives. Looking at the bar graph you'd therefore naively expect about 300 one second intervals over the two hour period during which no requests arrived.
And this is where I got a surprising result.
If you get anything like this, have seen it before, or have a bit more knowledge of what is going on I'd love to hear about it.