Monday, November 30, 2015

Sharpen up your legacy app(s) performance with a bit of F#

First off, thanks go to Sergey for organising another F# Advent Calendar event. Normally, I'd just watch and learn, but this year I thought I'd try and contribute back something of my own experiences from using F# in a corporate/enterprise environment.

The problem I've faced over the last six months has been trying to get a better understanding of the performance characteristics of the systems where I work. This isn't really a technical post - although I do have some links to sample code in my GitHub repository, it's mostly a background perspective on what I've found helpful and why - and it needs to be read from the perspective of someone who doesn't code for a job. I'm mostly pushing out PowerPoint decks and Word docs, but when the opportunity allows I love the chance to get into a bit of the data and make something of it.

To get a bit more specific in this case, my co-workers and I need to get a good handle on the performance characteristics of a large legacy technology system that's getting replaced; it's at the heart of a customer-facing system with upwards of a million customers, so performance is very important. The profiling is essential for defining non-functional requirements (terrible term - qualities of service is so much better, but the industry can't seem to get away from it), and it's those NFRs which are so often misrepresented and misunderstood, yet become crucial to successful delivery.

So what do we do, and why has F# been helpful? There are probably more steps, but for me it comes down to three things.
  1. We have to baseline existing performance which means lots of analysis.
  2. Then we set our target requirements based on the baseline and maybe introduce some tougher targets, because why not, it’s a new and hopefully better thing you’re developing/configuring/deploying.
  3. Then we need to verify through testing under load.
F# and the broader .NET application environment have been incredibly useful for the data analysis side of this exercise, and they're looking very promising for the testing.

So why is that?

Well, let's look at profiling. What that means is collecting lots of data; it comes from log files and databases, and it has shape - it can all be described as being of some type - and F# is really, really good at dealing with data that has some form of shape, some constraints or rules.

I've previously used Perl, Python, R, Excel and even VBA; they're all popular choices, but the ability to type the data starts to become really useful when trying to get to grips with the edge cases that trip you up later, or when you start moving into larger volumes of data. The dynamic typing you get with Perl, Python and R is great for getting started, but you so often keep hitting edge cases as you process larger volumes of data, and it can be really hard to figure out why because everything only happens at runtime. The strong upfront typing in F# (and other statically typed languages) is very, very handy in these circumstances. So that's reason number one for me: static, inferred data typing which tells you what's going on right up front.

OK, so how about another reason: data manipulation. Let's say you get the data into memory and you're working on it. The REPL, in combination with the inferred typing, the mix of functional and imperative data structures (arrays, seqs, Deedle data frames etc.) and functional approaches like maps, makes data transformation a whiz. Microsoft's SQL Server is pretty good - I/we use it a lot - and while it's a truly wonderful database, Transact-SQL is just not the best approach for many parsing or analysis problems; you rapidly end up playing tricks like constructing XML just to wrangle strings. Nope, F# rocks for interactive data transformation; it's definitely reason number two.
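As a tiny, made-up illustration of the kind of interactive reshaping I mean (the data and shape here are invented, not from the real system) - counting requests per minute with a Map:

open System

// A handful of (timestamp, duration in ms) pairs, as they might look after parsing.
let samples =
    [ DateTime(2015, 11, 30, 9, 0, 12), 132.0
      DateTime(2015, 11, 30, 9, 0, 48), 310.0
      DateTime(2015, 11, 30, 9, 1, 3), 95.0
      DateTime(2015, 11, 30, 9, 1, 40), 2200.0 ]

// Requests per minute, built up interactively in the REPL.
let perMinuteCounts =
    samples
    |> Seq.groupBy (fun (t, _) -> DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0))
    |> Seq.map (fun (minute, xs) -> minute, Seq.length xs)
    |> Map.ofSeq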

The third big reason for me is the broader .NET ecosystem. It just works well together. Plugging libraries in isn't hard. Things tend to work, and the last few years have seen the packages on NuGet expand dramatically. It's a wonderful thing to pull up the Accord libraries, or Deedle, or the many type providers, or FsLab and just get it all working straight away.

The last reason has to be brought up. This is an awesome community. People help and they're positive. That's a helluva plus when you're just trying to learn.

Ok, so those are the reasons why I find F# works so well for me, but let's go into a little more detail and walk through a typical workflow for this performance profiling.

Step 1 is reading log data. Most often it's going to be with File.ReadAllLines, or SQL data via one of the SQL type providers. Most enterprise environments will also have a variety of monitoring systems to query - we happen to have Microsoft's System Center Operations Manager installed with a data warehouse that contains multiple years' worth of performance metrics. The SQL type providers make reading this easy. Here's a question - has anyone considered a type provider for IntelliTrace files? I can't seem to find any information on how to query these outside of Visual Studio but would love to know.
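For the SQL side, a rough sketch of the sort of query I mean, using the SqlDataConnection type provider - the connection string and the table/column names below are placeholders rather than the real data warehouse schema (the provider generates its types from the live database at edit time):

#r "System.Data.dll"
#r "System.Data.Linq.dll"
#r "FSharp.Data.TypeProviders.dll"

open Microsoft.FSharp.Data.TypeProviders

type ScomDw = SqlDataConnection<ConnectionString = @"Data Source=someServer;Initial Catalog=OperationsManagerDW;Integrated Security=True">

let db = ScomDw.GetDataContext()

// PerfSamples, SampleTime and SampleValue are placeholder names for whatever the warehouse exposes.
let lastWeek =
    query {
        for p in db.PerfSamples do
        where (p.SampleTime > System.DateTime.Today.AddDays(-7.0))
        select (p.SampleTime, p.SampleValue)
    }
    |> Seq.toArray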

I like regular expressions and active patterns for parsing text data – they help tease out edge cases, but if there's a simple structure then basic string operations like split can be fast and simple. In practice I haven't had much luck with the CSV type provider – too many oddities in the files I seem to come up against - far too many of them effectively spread a record across multiple lines, and I very rarely come across simple CSV data. In the example code I've got a typical example of a multi-line file that needs to be pre-parsed before handing off to the ReadCsv method in Deedle.
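A stripped-down sketch of the regex active-pattern approach (the log format here is invented purely for the example):

open System
open System.Text.RegularExpressions

// Matches lines like "2015-11-30 09:00:12 GetQuote 132" - a made-up format.
let (|LogLine|_|) line =
    let m = Regex.Match(line, @"^(\S+ \S+) (\w+) (\d+)$")
    if m.Success then
        Some (DateTime.Parse(m.Groups.[1].Value),
              m.Groups.[2].Value,
              float m.Groups.[3].Value)
    else None

let parsed =
    [| "2015-11-30 09:00:12 GetQuote 132"
       "some line that doesn't match" |]
    |> Array.choose (function
        | LogLine (timestamp, operation, durationMs) -> Some (timestamp, operation, durationMs)
        | _ -> None)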

So the data gets into local memory, and that's good enough for most of the problems we face - in fact you can make a pretty good argument that big RAM is beating big data. Representing it in memory is usually a case of an Array or a Deedle data frame. Data frames are very handy but there's a bit of a learning curve due to the increased flexibility. (Fortunately I see a new book has just been published - I need to read it!)

The next step is usually a series of transformations. Using Array.mapi means that if something goes wrong I can get a line number. Array.Parallel.mapi is nice when an individual task can be particularly slow, but when running interactively it often doesn't provide a simple performance boost. Using the Some/None option types makes missing values explicit, which turns out to be incredibly useful – missing values are everywhere in real-world data. It makes you explicitly handle the missing-data case and then progressively unwrap, along the lines of the sketch below.
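A minimal sketch of that pattern, with a made-up parse step and a hypothetical log path standing in for the real thing:

open System
open System.IO

// Stand-in for whatever field extraction the real log needs.
let parseDuration (line: string) : float option =
    match Double.TryParse line with
    | true, v -> Some v
    | _ -> None

let durations =
    File.ReadAllLines @"C:\temp\app.log"              // hypothetical path
    |> Array.mapi (fun i line ->
        match parseDuration line with
        | Some v -> Some v
        | None -> printfn "Could not parse line %d: %s" i line; None)
    |> Array.choose id                                // unwrap, dropping the misses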

Timing a series of piped-forward transformations is useful for finding where the lengthy steps are, and it's easily done using a custom operator. (By the way, I'd love to know if there's a way to reflect back something readable about the line of code being executed in the pipeline.)
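Here's one way of doing the timing - a labelled variant of the forward pipe that wraps each step in a Stopwatch. Just a sketch; the stage names are whatever you choose to pass in:

open System.Diagnostics

// x |>! ("name", f) runs f x and prints how long it took.
let (|>!) x (label, f) =
    let sw = Stopwatch.StartNew()
    let result = f x
    printfn "%s took %dms" label sw.ElapsedMilliseconds
    result

let processed =
    [| 1.0 .. 1000000.0 |]
    |>! ("square", Array.map (fun v -> v * v))
    |>! ("filter", Array.filter (fun v -> v > 100.0))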

So, let’s say we’ve transformed the data and we now have something like durations at certain timestamps. What now?

Time to use FsLab and Accord.NET. Graphing distributions of response times or numbers of requests per time interval provides a great way of understanding the system you're dealing with. The shapes can give some insight into the underlying processes that generate the distributions. Classic examples are random arrival events resulting in a Poisson distribution, measurement errors resulting in a normal (Gaussian) distribution, or failure events resulting in a Weibull distribution. The Accord.NET author, César de Souza, has produced a very useful tool and accompanying article for getting a handle on different types of distributions, with documentation to understand the underlying nature of each distribution and sample code for interactively working with them. In practice, with real system data you're going to find lots of skewing and probably multiple peaks due to competing processes - use your eyes to figure out what's happening.
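By way of a sketch (not the repo script itself), here's the flavour of fitting a couple of candidate Accord.NET distributions to some invented response times and comparing log-likelihoods by hand - it assumes the Accord.Statistics package is referenced (it comes along with FsLab or via NuGet):

open Accord.Statistics.Distributions.Univariate

// Invented response times in milliseconds.
let responseTimes = [| 120.0; 95.0; 210.0; 480.0; 150.0; 88.0; 132.0; 610.0 |]

let logLikelihood (d: UnivariateContinuousDistribution) (xs: float[]) =
    xs |> Array.sumBy (fun x -> d.LogProbabilityDensityFunction x)

let candidates =
    [ "Normal", (NormalDistribution() :> UnivariateContinuousDistribution)
      "Weibull", (WeibullDistribution(1.0, 1.0) :> UnivariateContinuousDistribution) ]

for (name, dist) in candidates do
    dist.Fit(responseTimes)      // maximum-likelihood fit, in place
    printfn "%-8s log-likelihood = %.1f" name (logLikelihood dist responseTimes)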

I used FsLab for the visualisation; it incorporates both FSharp.Charting and XPlot.GoogleCharts, and they both work well.
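The sort of quick visual check I mean - a crude empirical histogram via a column chart, with an arbitrary bin width (this assumes FSharp.Charting is referenced, which FsLab pulls in for you):

open FSharp.Charting

let durations = [ 120.0; 95.0; 210.0; 480.0; 150.0; 88.0; 132.0; 610.0 ]
let binWidth = 100.0

durations
|> Seq.countBy (fun d -> floor (d / binWidth) * binWidth)
|> Seq.sortBy fst
|> Seq.map (fun (lower, count) -> sprintf "%.0f-%.0f" lower (lower + binWidth), count)
|> Chart.Column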

The Accord Framework is very broad - I've used it here for modelling, fitting and sampling distributions but it goes far further - machine learning, image processing etc.

With a good understanding of the baseline performance characteristics you're in a position to set service level expectations for response times – basically, the QoS requirements - and with requirements in place you'll want to test against them. So you'll either sample from real historical data or from a fitted distribution, which lets you push into the long tails. That's important because unless you take a really long historical sample you'll probably never get to test the extreme values; they occur only rarely, but they're the ones that might bring your application to a grinding halt.
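A rough sketch of that sampling idea, drawing synthetic durations from a fitted distribution by pushing uniform random numbers through its quantile function (the Weibull parameters below are invented, not fitted to anything real):

open Accord.Statistics.Distributions.Univariate

// Pretend this came out of the fitting step earlier.
let fitted = WeibullDistribution(1.3, 150.0)

let rng = System.Random(42)
let syntheticDurations =
    Array.init 10000 (fun _ -> fitted.InverseDistributionFunction(rng.NextDouble()))

// Occasionally you'll draw the nasty tail values a short historical sample never shows you.
printfn "Worst sampled duration: %.0fms" (Array.max syntheticDurations)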

Calling into libraries, legacy systems etc to execute your tests is a piece of cake with F#. If it's a web service endpoint the WSDL type provider makes calling a service easy. The Async MailboxProcessor is a very simple way to run up parallel workers for stress testing.
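A minimal sketch of the MailboxProcessor side - callService below is a stand-in for whatever the real call is (a WSDL proxy method, an HTTP request, a library call):

open System.Diagnostics

// Stand-in for the real service call.
let callService (payload: string) = async {
    do! Async.Sleep 50
    return payload.Length }

// Each worker pulls messages off its own queue and times the call.
let startWorker id =
    MailboxProcessor<string>.Start(fun inbox -> async {
        while true do
            let! payload = inbox.Receive()
            let sw = Stopwatch.StartNew()
            let! _ = callService payload
            printfn "worker %d: %dms" id sw.ElapsedMilliseconds })

let workers = [| for i in 1 .. 8 -> startWorker i |]

// Spray some test messages round-robin across the workers.
for i in 1 .. 100 do
    workers.[i % workers.Length].Post(sprintf "request %d" i)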

I've put a sample fsx script into GitHub which shows a typical scenario that I might go through in the office. It starts with the retrieval and parsing of data using Deedle and the use of Accord.NET to find the best fit across common distribution types. There are a number of transformations involved and some sample questions, approaches to answering them and graphing tasks. The data in this case doesn't have enough input variables to be useful for a machine learning demo but it is very typical of what gets written from an older enterprise application log. (In retrospect it would have been interesting to include measures of the batch sizes and relative complexity of the processing steps. Then we could have considered how the output processing times relate to multiple input variables.)

The last big part of this overall process is presenting the data back and incorporating it into documents and presentations. For me, Microsoft Office still rules the roost and the entry point for data is Excel. My default way (for example) to get data into it now is using Deedle's Frame.SaveCsv. This works well as part of a file processing pipeline:

File data -> import process -> transformation and analysis -> representation as an array of records -> Frame.ofRecords -> SaveCsv -> Excel and Word/PowerPoint drudgery.
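The tail end of that pipeline looks roughly like this (record type and path invented for the example):

open System
open Deedle

type Sample = { Timestamp: DateTime; DurationMs: float }

let results =
    [| { Timestamp = DateTime(2015, 11, 30, 9, 0, 0); DurationMs = 132.0 }
       { Timestamp = DateTime(2015, 11, 30, 9, 0, 5); DurationMs = 241.0 } |]

let frame = Frame.ofRecords results
frame.SaveCsv(@"C:\temp\durations.csv")   // then open it in Excel and get on with the drudgery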

So there you have it. An F# Advent posting simply because it deserves it. A very useful programming language, community and technology toolkit that's made many a problem go away this past year.

Bring on 2016.

Thursday, July 11, 2013

Azure Table Storage F# most basic example

In case anyone else is as confused as I was while trying an experiment with Windows Azure Table Storage: searching the net just keeps turning up old documentation, and with the API changing it had me going nowhere for quite a while. Basic run-through here:
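A minimal sketch of the basics, assuming the Microsoft.WindowsAzure.Storage client library of this era (2.x) - the connection string, table name and entity type are all placeholders:

#r @"Microsoft.WindowsAzure.Storage.dll"

open Microsoft.WindowsAzure.Storage
open Microsoft.WindowsAzure.Storage.Table

// Table entities need a parameterless constructor plus settable PartitionKey/RowKey.
type Measurement() =
    inherit TableEntity()
    member val Value = 0.0 with get, set

let account = CloudStorageAccount.Parse("UseDevelopmentStorage=true")
let client = account.CreateCloudTableClient()
let table = client.GetTableReference("measurements")
table.CreateIfNotExists() |> ignore

// Insert a row...
let row = Measurement(PartitionKey = "server01", RowKey = "2013-07-11T00:00:00Z", Value = 42.0)
table.Execute(TableOperation.Insert(row)) |> ignore

// ...and read it back.
let result = table.Execute(TableOperation.Retrieve<Measurement>("server01", "2013-07-11T00:00:00Z"))
printfn "%f" (result.Result :?> Measurement).Value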


Thursday, October 20, 2011

IronPython and Numpy/Scipy

Should I ever forget this in the future...

If you're trying to install numpy/scipy on IronPython and like me you're behind a corporate firewall, and like me you can't get proxying to work, and like me you can't figure ironpkg out, then don't give up. Download the eggs locally and use ironegg, e.g.: ironegg nose-1.0.0-1.zip

Monday, May 23, 2011

Launching the Black Pearl!

It’s not quite finished – I need to give it a final rub down and coat of epoxy, and put the deck lines on, but with great weather this weekend I just had to launch it.

Couple of building notes:

  • I used thick minicell foam for the bulkheads. Expensive stuff in NZ but I got a slab of it while on a work trip to the US. I figure it makes for light bulkheads that will allow the body of the boat to flex.
  • The foot pegs are plastic – screwed into tee nuts embedded in a strip on each side. Not sure how adjustable these are going to be – with the small cockpit I can’t seem to reach in to change the settings!

IMG_1346

  • I used Maroske’s internal tube fittings but the longest of these just wouldn’t let me extract the PVC tube… no problem, just slice em in half! They’re very easy to re-glue.

IMG_1345

  • I doubled the high density fibreglass over 2/3rds of the length for the hull and deck interiors with an extra overlapping strip just behind the cockpit. Feels very robust while getting into/out of the cockpit.

What was it like to use?

  • Much lighter build than the Night Heron – on the NH I used 200g cloth doubled over most of the interior. The smaller surface area of the BP plus the lighter building materials (mostly Paulownia strips) means it’s about 14kgs. This makes it much easier to carry and a bit more responsive to my body movement on the water.
  • Very tippy! But secondary stability seems good – I just need to get used to the experience. Very different from anything I’ve been in so far and I’ll take my time to practice rolls before I get into rougher water.
  • Very responsive to my movements and to the water.
  • Lots of fun –which is what I wanted.
  • I was slightly concerned about the ocean cockpit but in practice it’s easy for me to get into and out of so no problem there.
  • Legs are quite straight. Again was a bit worried I’d be too uncomfortable, but actually for me – it’s fine.

So frankly, I’m utterly delighted!

IMG_1364

Friday, April 22, 2011

Cockpit coaming step 1: Wood strips

I’m building the cockpit coaming by first building a wood strip riser. I'll fibreglass it into place then put foam around the outside and use that as a form to create the coaming lip – at least that’s the plan.

First step is building the riser. To do this I followed Nick Schade’s instructional video and it seemed a pretty quick and easy exercise.

So far, the riser is in place, a fillet is on and when it dries I’ll wet sand and follow up with 2 layers of fibreglass.

It’s an ocean cockpit so fairly short and only 38cm interior width.

Current pic (plastic to protect the interior):

IMG_1327

Saturday, April 16, 2011

Black Pearl Update–Deck Fibreglassed

The deck’s been glued, sanded and fibreglassed – at least on the outside.

Mostly Paulownia, but 3 strips on each side of Cedar as I was running a bit low on the lighter wood. I’ve decided to leave the deck natural colour so that I can forever see how it was built. The cockpit’s been cut out but I’m leaving the hatches till after the fibreglass sets up. It seemed a little flimsy to be cutting too many holes in the deck.

Not many staple holes – it was easy to glue most strips in place without staples.

I kept the surface to an even height and used the odd bit of chopped up credit card under a couple of thinner strips.

Quick pic of the cockpit:

IMG_1324

Tuesday, March 15, 2011

Wood Density

My fascination with the weight of the kayak I’m building is growing with my surprise at just how light the hull is. I thought I’d do a quick check on a couple of samples of Paulownia and Cedar.

For the wood I’ve bought it works out that the Cedar is about 0.40g/cm3 (about 25 lbs/ft3), and the Paulownia is about 0.27g/cm3 (about 17 lbs/ft3). So the Paulownia is about 2/3rds the density of the Cedar!

Now that has me thinking – I’m mostly building the deck in Cedar – the extra wood weight plus the coaming/deck plates etc – hmmm, will it be tippy without me in it? I reckon from looking at it that the deck is about 2/3rds the size of the hull, so it might well be balanced at about the same weight top half and bottom half!

Monday, March 14, 2011

Black Pearl Hull Completed

Some quick notes on this.

Firstly, must remember to take care during strip gluing to get the glue all the way through – I didn’t and I found that after fibreglassing the outside and removing the forms… the bottom of the hull started to bend out in a couple of places. Remedy… add epoxy on the inside and glue the strips up properly and weigh down with a few clay bricks (sitting on plastic of course).

Secondly, I tried to reinforce the interior chines with a triangular strip of wood, about 5mm high and 10mm across the base. What a waste of time! A quick calculation after the fact shows that using wood came in at about 100 grams, but I’ve just checked the density of thickened epoxy – I’m getting about 10 to 20% lighter than straight epoxy – and if I’d just used a thickened epoxy bead it would’ve weighed about 200g. The extra 100g would have been worth not having to cut triangular strips!

And finally, I hate fibreglassing the interior – much more difficult than the outside. Fortunately, relatively easy to fix up any ugly bits a day or two later – in my case 3 patches that needed sanding back and redoing along the chines.

Also, I’ve added up the costs and just out of interest it’s looking like about $NZD1100 total for this one – that’s plans, fibreglass, epoxy, tints/dyes/stains/deck ports and loads of sandpaper. If I’d made some smarter choices along the way I certainly could’ve brought the cost down below $1000.

I’ll do a more thorough break down at the end.

And the weight! Before I forget: the hull, with one layer of 175g high density fibreglass on the inside, one on the outside and an extra strip on the keel, comes to about 5kgs on my scales. Incredible really! The Paulownia wood is very light and the high density fibreglass certainly takes much less epoxy than the 200g normal fibreglass I used on the last kayak.

Monday, February 21, 2011

Multiple Monitoring Performance Data Collections

Ah yes – that code from the last couple of posts has a bug…

The retrieval of data from SCOM has proven very useful in the last few days, but it seems that most commonly I’m getting multiple monitoring performance data collections - so the code I wrote in the last couple of posts on how to retrieve that data using F# and Sho needs a little adjusting to concatenate the collections into one sequence. yield! works great in F#, e.g.:

let perfData (start:DateTime) (finish:DateTime) : seq<DateTime * float> =
    seq {
        for mpdsItem in mpds do
            yield! (mpdsItem.GetValues(start, finish)
                    |> Seq.map (fun mpdv -> (mpdv.TimeAdded, mpdv.SampleValue))
                    |> Seq.filter (fun (d,v) -> v.HasValue)
                    |> Seq.map (fun (d,v) -> (d, box v))
                    |> Seq.map (fun (d,v) -> (d, unbox v)))
    }




Not sure about python but it must be similar.

Monday, February 14, 2011

SCOM data into F#

Having just discovered how to get SCOM data from Microsoft Sho - a dynamic language analysis environment - I thought I’d try it from F#. Pretty much the same, but you need to think about what to do with the graphing, and you need to handle the Nullable<float> values coming back from SCOM.

I used FSChart to do the plotting, and a sequence filter plus boxing/unboxing to handle the Nullable data.

The histogram function wouldn’t be hard to make but while stumbling across fschart I also stumbled across this.

So my contribution was to do this:

#r @"System.Windows.Forms.DataVisualization.dll"
#r @"C:\extras\FSChart10\FSChart\bin\debug\FSChart.dll"

open FSChart

open System.IO
open System.Drawing
open System.Windows.Forms
open System.Windows.Forms.DataVisualization.Charting

#r "c:\Program Files\System Center Operations Manager 2007\SDK Binaries\Microsoft.EnterpriseManagement.OperationsManager.dll"
open Microsoft.EnterpriseManagement.Monitoring

let mg = Microsoft.EnterpriseManagement.ManagementGroup("someManagementServer")
let mpdc = MonitoringPerformanceDataCriteria("ObjectName = 'ASP.NET' and CounterName like 'Request Execution%' and MonitoringObjectPath like 'someMonitoringObjectPathFilter%'")
let mpds = mg.GetMonitoringPerformanceData(mpdc)

open System

let (perfData:seq<DateTime * float>) =
    mpds.[0].GetValues(DateTime.Today.AddDays(-1.), DateTime.Today)
    |> Seq.map (fun mpdv -> (mpdv.TimeAdded, mpdv.SampleValue))
    |> Seq.filter (fun (d,v) -> v.HasValue)
    |> Seq.map (fun (d,v) -> (d, box v))
    |> Seq.map (fun (d,v) -> (d, unbox v))


hist 0.0 200. 50 (perfData |> Seq.map (fun (d,v) -> v) )


The result looks like this.


image


So the question is… what was the more efficient approach? Well, when I did the Sho/Python code I actually had some C# code open in the background to help me through the classes - without that it would have been really hard. The F# approach made me scratch my head a few times wondering how to deal with charting and nullables, but all the way through (much like the C#) I had the advantage of the richer type information to help me figure out what to do.

SCOM and Microsoft Sho–A better histogram example…

The last blog post had a naff histogram, as a few exceptionally long queries squashed the remainder of the data into the first bin, and I also wasn’t checking for None - so here’s a better, filtered example.

ShoLoadAssembly("c:\Program Files\System Center Operations Manager 2007\SDK Binaries\Microsoft.EnterpriseManagement.OperationsManager.dll")
ShoLoadAssembly("c:\Program Files\System Center Operations Manager 2007\SDK Binaries\Microsoft.EnterpriseManagement.OperationsManager.dll")

from Microsoft.EnterpriseManagement.Monitoring import *
from System import *

mg = Microsoft.EnterpriseManagement.ManagementGroup("someManagementServer")

mpdc = MonitoringPerformanceDataCriteria("ObjectName = 'ASP.NET' and CounterName like 'Request Execution%' and MonitoringObjectPath like 'someMonitoringPathFilter%'")

mpds = mg.GetMonitoringPerformanceData(mpdc)
mpds.Count

hist([mpdv.SampleValue for mpdv in mpds[0].GetValues(DateTime.Today.AddDays(-1), DateTime.Today) if ((mpdv.SampleValue < 200) and (mpdv.SampleValue is not None))])



 



image

Friday, February 11, 2011

Retrieving Performance Data from System Center Operations Manager

System Center Operations Manager’s agent model means it can collect performance metrics from across a diverse technology environment. You use Rules to define the data collection and typically you use the Console to view the data in graphical format.

But the Console by itself doesn’t allow any numerical analysis. If you want to do that you need to get the data out. You can copy to the clipboard within the Console Action menu, but that’s very manual. A better approach is obviously programmatic access, and this is where things get interesting.

For some reason there’s little information on the net on how you get data out of SCOM. What is there usually refers to querying the SCOM database directly. This doesn’t seem right to me. I’d far prefer an API that stood a chance of staying intact in future versions. So here’s what I’ve done.

Firstly, use the PowerShell integration to SCOM to get the data. (This is easiest if you set up the PowerShell ISE to connect to SCOM.) You can then use the get-performancecounter and get-performancecountervalue cmdlets to retrieve data, for example:

$starttime = [datetime]::today.adddays(-20)
$endtime = [datetime]::today.adddays(1)

get-performancecounter |
? {$_.ObjectName -eq 'Web Service' -and
$_.CounterName -eq 'Connection Attempts/sec' -and
$_.InstanceName -eq '_Total' -and
$_.MonitoringObjectPath -like 'usuallyAServerNameFilter*' } |
get-performancecountervalue -starttime $starttime -endtime $endtime |
select SampleValue, TimeSampled |
export-csv "c:\temp\ConnectionAttemptsPerSec.csv"


You can also use the .NET framework and languages to get the data – the documentation isn’t great but this example works (I’ve reduced the font size to try and make it fit in this blog template):


/// <summary>
/// Gather performance data
/// </summary>
using System;
using System.Collections.ObjectModel;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Common;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Monitoring;

namespace MySamples
{
    class Program
    {
        static void Main(string[] args)
        {
            ManagementGroup mg = new ManagementGroup("someManagementServer");

            MonitoringPerformanceDataCriteria mpdc =
                new MonitoringPerformanceDataCriteria(
                    @"ObjectName = 'ASP.NET' and CounterName like 'Request Execution%' and MonitoringObjectPath like 'usuallyAServerNameFilter%'");
            ReadOnlyCollection<MonitoringPerformanceData> mpds =
                mg.GetMonitoringPerformanceData(mpdc);

            if (mpds.Count > 0) {
                ReadOnlyCollection<MonitoringPerformanceDataValue> mpdvs =
                    mpds[0].GetValues(DateTime.Today.AddDays(-1), DateTime.Today);

                foreach (MonitoringPerformanceDataValue mpdv in mpdvs) {
                    Console.WriteLine("TimeSampled = " + mpdv.TimeSampled +
                        ", SampleValue = " + mpdv.SampleValue);
                }
            }
            Console.ReadLine();
        }
    }
}


Now I typically take the data out to F# and/or Excel and graph/model it to help figure my way through a problem. However, Microsoft have just released a nicely packaged analysis tool in Microsoft Sho – basically IronPython wrapped up with some plotting/graphing/math libraries – much like many of the other Python packages out there.


I thought this would be a fun way to interactively deal with performance data collection and analysis, so I tried the following as an experiment (put in the management server and monitoring object filters appropriate for your environment – I just used ASP.NET request execution times from a test environment, as in the programs above). The hist command takes a list, or I guess anything it can enumerate over and get numbers from, and returns a histogram dialog and plot.


ShoLoadAssembly("c:\Program Files\System Center Operations Manager 2007\SDK Binaries\Microsoft.EnterpriseManagement.OperationsManager.dll")
ShoLoadAssembly("c:\Program Files\System Center Operations Manager 2007\SDK Binaries\Microsoft.EnterpriseManagement.OperationsManager.dll")

import Microsoft.EnterpriseManagement
import Microsoft.EnterpriseManagement.Configuration
from Microsoft.EnterpriseManagement.Monitoring import *
from System import *

mg = Microsoft.EnterpriseManagement.ManagementGroup("aManagementServer")

mpdc = MonitoringPerformanceDataCriteria("ObjectName = 'ASP.NET' and CounterName like 'Request Execution%' and MonitoringObjectPath like 'someMonitoringObjectFilter%'")

mpds = mg.GetMonitoringPerformanceData(mpdc)
mpds.Count

hist([mpdv.SampleValue for mpdv in mpds[0].GetValues(DateTime.Today.AddDays(-1), DateTime.Today)]
)



And here’s the output:


image


Cool huh!?

Wednesday, February 09, 2011

Engineering tricks with WolframAlpha

I’ve found Google great for simple problems like this:

20 pounds per cubic foot in kilograms per cubic meter= 320.36926 kilograms per cubic meter

But WolframAlpha also lets you combine unit conversions with calculations:

(2 * 175 grams) per square meter * (2*3 meters squared) + ((20 pounds per cubic foot in kilograms per cubic meter) * 3 square meters * 5 millimeters)

Or just click on this link: http://www.wolframalpha.com/input/?i=(2+*+175+grams)+per+square+meter+*+(2*3+meters+squared)+%2B+((20+pounds+per+cubic+foot+in+kilograms+per+cubic+meter)+*+3+square+meters+*+5+millimeters)

What does the above tell me? It’s the theoretical weight of the kayak I’m currently building – about 7kg. In reality, when hand-laying fibreglass you end up with a much higher weight of epoxy than the one-to-one glass-to-epoxy ratio that’s possible under vacuum application, which is what I’ve assumed above.
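Unpacking that expression (my reading of the terms, so treat the labels as assumptions): 2 * 175 grams per square metre of glass plus epoxy, over 2 * 3 square metres of surface, comes to about 2.1 kg; the wood itself (20 lb/ft3, or about 320 kg/m3, over 3 square metres at 5 mm thick) comes to about 4.8 kg; together that's roughly 6.9 kg, hence the "about 7kg".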

Hull fibreglassing

It’s taken a while to get here, but the hull is now fibreglassed.

  • After struggling with the Resene colorwood stain affecting epoxy seal coats I found that a wash with isopropyl alcohol made for a nice clean bonding surface.
  • I rediscovered how to apply epoxy; everyone seems to have their own method, but for me it’s plastic cards (old ATM cards/loyalty cards etc are ideal) and short foam (usually yellow) rollers.
  • A hot air gun does absolute wonders at getting rid of air bubbles/foam.
  • I used high density 170g/sq m fibreglass that’s often used for model making. That equates to 5 ounce high density fibreglass which should be pretty strong (easy way to do the conversion… just use google – type “175 grams per square meter in ounces per square yard” in the search box).
  • Wet sanding within 48 hours is more effective at smoothing the epoxy finish than waiting for it to set really hard.

second hull fill coat