
Now let's compare the predicted outcomes with the actual outcomes in the logistic regression model. One way to do this is through a confusion matrix, which is a technique for classification algorithms like this one that lets us summarize the model's performance. A confusion matrix should be used on algorithms with roughly an equal number of outcomes in each class, and on algorithms with only two classes of outcomes, like our binomial logistic regression model.
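
As a quick way to check that first condition, here is a minimal sketch assuming the actual outcomes are stored as zeros and ones in column A of the testing sheet, rows 2 through 101 (the column and row range are assumptions for illustration, not the course's actual workbook layout). Two COUNTIF formulas show how balanced the classes are:

    =COUNTIF(A2:A101, 1)    (rows where the actual outcome is 1)
    =COUNTIF(A2:A101, 0)    (rows where the actual outcome is 0)

If the two counts are roughly equal, the confusion matrix will be a reasonable summary of the model's performance.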

The confusion matrix takes the actual and predicted outcomes and puts them in a table with four boxes: true positives, true negatives, false negatives, and false positives. Notice the confusion matrix uses numeric outcomes instead of probabilities, so we want to use the predicted outcomes rounded to the nearest integer, either zero or one, so they match the actual outcomes in the data. It's also important to note that we're going to build the confusion matrix on our testing data instead of the training data we used to set up the model. This is because we're interested in validating the model and measuring its performance rather than developing it; we want to make sure we're not using the confusion matrix as a way to overfit our data.
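
As a minimal sketch of that rounding step, assume the predicted probabilities from the regression sit in column B of the testing sheet and the rounded predictions go in column C (again, these column positions are assumptions for illustration). A single formula copied down the column does the conversion:

    =ROUND(B2, 0)    (turns the predicted probability in B2 into a 0 or 1 that can be compared against the actual outcome in A2)

Copying this down for every row of the testing data gives one rounded prediction per actual outcome.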

If we predicted the outcome to be true, one in this case, and the actual outcome is also one, this would be an example of a true positive. When we predicted that the outcome would be false and the actual outcome is false, this is called a true negative. Conversely, if the predictions don't match the actuals, we get what we call false positives and false negatives: a false positive predicts a positive outcome when we actually have a negative outcome, while a false negative predicts a negative outcome when we actually see a positive outcome. In Excel, I've set up a new sheet with confusion at the end of the sheet name to focus on the calculations for the confusion matrix.
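
As a sketch of how the four boxes can then be filled in, still assuming actual outcomes in A2:A101 and rounded predictions in C2:C101 (ranges chosen only for illustration), each cell of the matrix is a COUNTIFS over the testing data:

    =COUNTIFS($A$2:$A$101, 1, $C$2:$C$101, 1)    (true positives: actual 1, predicted 1)
    =COUNTIFS($A$2:$A$101, 0, $C$2:$C$101, 0)    (true negatives: actual 0, predicted 0)
    =COUNTIFS($A$2:$A$101, 0, $C$2:$C$101, 1)    (false positives: actual 0, predicted 1)
    =COUNTIFS($A$2:$A$101, 1, $C$2:$C$101, 0)    (false negatives: actual 1, predicted 0)

The four counts should add up to the number of rows in the testing data, which is a quick check that the ranges line up.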
