Confusion Matrix Visualizer

Paste predictions (actual, predicted) or enter a matrix directly. The tool renders a color-coded heatmap and calculates accuracy, per-class precision, recall, and F1 score, plus macro and weighted averages.

Matrix (comma-separated rows)

Class Names (comma-separated)

One name per class, matching the number of rows/columns in the matrix. If omitted, classes are auto-labeled Class 0, Class 1…

Confusion Matrix Heatmap

(Heatmap rendered in the tool; rows are actual classes, columns are predicted classes.)

                Predicted: spam   Predicted: ham
Actual: spam    55                5
Actual: ham     8                 32

Legend: Correct (diagonal), Misclassified (off-diagonal)

Overall Accuracy

87.00% (87 correct out of 100 total)

Per-Class Metrics

Class         Precision  Recall  F1 Score  Support
spam          0.873      0.917   0.894     60
ham           0.865      0.800   0.831     40
Macro avg     0.869      0.858   0.863
Weighted avg  0.870      0.870   0.869

F1 > 0.8: Good
F1 > 0.6: Moderate
F1 ≤ 0.6: Needs improvement

How a Confusion Matrix Works

A confusion matrix places actual classes on the rows and predicted classes on the columns. Each cell (i, j) counts how many samples of true class i were predicted as class j. Diagonal cells represent correct predictions. Off-diagonal cells reveal the specific pattern of errors — which classes the model is confusing for which.

For a binary classifier with classes Positive and Negative, the four cells have classic names: True Positive (top-left), False Negative (top-right), False Positive (bottom-left), True Negative (bottom-right). All classification metrics — accuracy, precision, recall, F1, specificity — are derived from these four counts.

Metric     Formula              Interpretation
Accuracy   (TP + TN) / Total    Overall fraction of correct predictions
Precision  TP / (TP + FP)       Of all predicted positives, how many are truly positive
Recall     TP / (TP + FN)       Of all actual positives, how many did the model find
F1 Score   2 × P × R / (P + R)  Harmonic mean of precision and recall
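These formulas can be checked with a few lines of Python. The counts below come from the binary spam/ham example above, treating spam as the positive class (TP = 55, FN = 5, FP = 8, TN = 32):

```python
# Verify the metric formulas against the spam/ham example matrix,
# with spam as the positive class.
TP, FN, FP, TN = 55, 5, 8, 32

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → accuracy=0.87 precision=0.873 recall=0.917 f1=0.894
```

The results match the Per-Class Metrics table for the spam row and the 87.00% overall accuracy.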

Generating a Confusion Matrix with scikit-learn

If you have model predictions as Python arrays, use scikit-learn to generate the matrix and export it as CSV for this tool.

From sklearn predictions

from sklearn.metrics import confusion_matrix, classification_report
import pandas as pd

y_true = ["cat", "dog", "cat", "bird", "dog"]
y_pred = ["cat", "cat", "cat", "bird", "dog"]

classes = sorted(set(y_true) | set(y_pred))
cm = confusion_matrix(y_true, y_pred, labels=classes)

# Print matrix CSV (paste into Matrix Input tab)
for row in cm:
    print(",".join(map(str, row)))

# Print class names (paste into Class Names field)
print(",".join(classes))

# Or get predictions CSV (paste into Predictions Input tab)
df = pd.DataFrame({"actual": y_true, "predicted": y_pred})
print(df.to_csv(index=False))

When to Use Precision vs Recall vs F1

The right metric depends on the cost structure of your problem. In most real-world ML deployments, false positives and false negatives have different consequences, so choose the metric whose failure mode you can least afford.

Optimize Precision

False positives are expensive

Examples: Spam filter, fraud detection, ad targeting

A false alarm wastes resources or harms user trust.

Optimize Recall

False negatives are dangerous

Examples: Cancer screening, defect detection, safety alerts

Missing a true positive can have severe consequences.

Optimize F1

Need a single balanced metric

Examples: NLP tasks, leaderboard comparisons, benchmarking

Use macro F1 for class-imbalanced datasets.

Frequently Asked Questions

What is a confusion matrix?

A confusion matrix is a table that summarizes the performance of a classification model. Each row represents the actual (true) class and each column represents the predicted class. The diagonal cells contain correct predictions (true positives for each class); off-diagonal cells represent errors (false positives and false negatives). By reading the matrix you can immediately see not just how many predictions were wrong, but which classes are being confused with which.

What is the difference between precision, recall, and F1 score?

Precision answers 'of all items the model labeled as class X, how many actually were X?' It drops when the model produces false positives. Recall answers 'of all actual class X items, how many did the model correctly find?' It drops when the model produces false negatives. F1 score is the harmonic mean of precision and recall, balancing both. Use precision when false positives are costly (e.g., spam detection). Use recall when false negatives are costly (e.g., cancer screening). Use F1 when you need a single balanced metric.

What is the difference between macro and weighted average?

Macro average computes the metric independently for each class and then takes an unweighted mean — every class contributes equally regardless of how many samples it has. This highlights underperformance on minority classes. Weighted average weights each class's metric by the number of true instances of that class (its support), so majority classes dominate the average. Use macro average to diagnose class imbalance issues; use weighted average when you care more about overall prediction volume.
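To see the difference concretely, here is a small sketch using scikit-learn's f1_score; the labels are made up for illustration. A degenerate model that always predicts the majority class looks acceptable under the weighted average but is exposed by the macro average:

```python
from sklearn.metrics import f1_score

# Hypothetical imbalanced dataset: 8 samples of class "a", 2 of class "b".
y_true = ["a"] * 8 + ["b"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["a"] * 10

# Class "a": P = 0.8, R = 1.0, F1 ≈ 0.889. Class "b" is never predicted, so F1 = 0.
print(f1_score(y_true, y_pred, average="macro"))     # (0.889 + 0) / 2 ≈ 0.444
print(f1_score(y_true, y_pred, average="weighted"))  # 0.8 × 0.889 + 0.2 × 0 ≈ 0.711
```

(scikit-learn may warn that precision is undefined for the never-predicted class; it counts that class's F1 as 0.)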

How do I use my model's predictions with this tool?

Switch to the Predictions Input tab and paste two columns: actual,predicted — one pair per line. You can include an optional header row (actual,predicted) which will be skipped automatically. Class names are discovered from the data and the confusion matrix is built programmatically. Alternatively, if you already have a computed matrix (e.g., from sklearn.metrics.confusion_matrix), switch to Matrix Input and paste the N×N comma-separated values along with class names.
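The tally the Predictions Input tab performs can be sketched in a few lines of Python. This is an assumed reimplementation for illustration, not the tool's actual source:

```python
# Parse "actual,predicted" lines, discover class names, and tally the matrix.
lines = """actual,predicted
cat,cat
dog,cat
cat,cat
bird,bird
dog,dog""".splitlines()

pairs = [line.split(",") for line in lines[1:]]  # skip the header row
classes = sorted({label for pair in pairs for label in pair})
index = {name: i for i, name in enumerate(classes)}

matrix = [[0] * len(classes) for _ in classes]
for actual, predicted in pairs:
    matrix[index[actual]][index[predicted]] += 1  # rows = actual, cols = predicted

for row in matrix:
    print(",".join(map(str, row)))
```

The printed rows are exactly the format the Matrix Input tab expects: one comma-separated row per line.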

What is a good F1 score for a classification model?

This depends heavily on the task and class balance. For balanced binary classification, F1 above 0.85 is generally considered strong. For multi-class problems with imbalanced classes, F1 above 0.75 (macro average) is often acceptable. Medical or safety-critical applications demand F1 near or above 0.95. For context: random guessing on a balanced binary problem gives F1 around 0.5. This tool color-codes F1: green for above 0.8, yellow for 0.6–0.8, and red for below 0.6.