Monday, May 28, 2007

SQL Server 2005 Data Mining Classification Matrix

Finally figured it out. It's so easy, and yet so poorly described in the documentation: the Classification Matrix tab in SQL Server 2005 data mining shows you how accurate your model's predictions are. Figuring out how to read it had me confused for quite some time. If you're as daft as me, then perhaps the following might help.

The easiest way to discover how to read the classification matrix correctly is to generate a table from your testing data set (you do split your original data into training and test data sets, don't you...!?) containing the ID, the actual outcome you're trying to predict, and the predicted value. You can do this in the Mining Model Prediction tab. It's easier there because the option to save the results to a table in your database is accessible from the little disk icon in the top left corner. You choose the ID, the actual value and the predicted value in the bottom half of the screen. The first time I looked at this it was a bit of a mystery how to drive that half of the screen, but I'm guessing you've mastered that part.

Load the table saved from your mining query into SQL Server's query window and count two things: the number of entries that are correctly predicted, and the number of actual values for each state of the actual attribute. You'll then see how they map to the classification matrix. Basically, the succinct description at the top of the matrix is correct: columns correspond to actuals, rows to predicted values. The bit they leave out, which would've been useful to me, is that summing a column vertically gives you the total number of cases with that actual value, and summing across a row gives you the total number of cases predicted to have that value.
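If it helps to see the counting spelled out, here is a rough sketch in Python. It is not what the mining tools do internally, just the same tally done by hand. The table name `predictions`, its columns, and the seven rows are all made up for illustration, and an in-memory SQLite database stands in for SQL Server so the example runs on its own; a similar GROUP BY should work against your saved table.

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows standing in for the table saved from the prediction query:
# (case ID, actual outcome, predicted outcome). The values are invented.
ROWS = [
    (1, "Yes", "Yes"),
    (2, "Yes", "No"),
    (3, "No", "No"),
    (4, "No", "No"),
    (5, "Yes", "Yes"),
    (6, "No", "Yes"),
    (7, "Yes", "No"),
]


def classification_matrix(rows):
    """Return {predicted: {actual: count}}: rows are predicted, columns are actual."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE predictions (id INTEGER, actual TEXT, predicted TEXT)")
    conn.executemany("INSERT INTO predictions VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT predicted, actual, COUNT(*) FROM predictions "
        "GROUP BY predicted, actual"
    )
    matrix = defaultdict(dict)
    for predicted, actual, count in cursor:
        matrix[predicted][actual] = count
    conn.close()
    return {p: dict(cols) for p, cols in matrix.items()}


def column_total(matrix, actual):
    """Sum one column vertically: all cases whose actual value is `actual`."""
    return sum(cols.get(actual, 0) for cols in matrix.values())


def row_total(matrix, predicted):
    """Sum one row: all cases the model predicted as `predicted`."""
    return sum(matrix.get(predicted, {}).values())


def correct_count(matrix):
    """Sum the diagonal: cases where the predicted value equals the actual value."""
    return sum(cols.get(p, 0) for p, cols in matrix.items())


matrix = classification_matrix(ROWS)
for state in ("Yes", "No"):
    print(state, "actual total:", column_total(matrix, state),
          "predicted total:", row_total(matrix, state))
print("correctly predicted:", correct_count(matrix))
```

With these invented rows there are four actual "Yes" cases but only three predicted "Yes" cases, which is exactly the column-versus-row difference that is easy to misread in the matrix.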
