ML Model Performance

A/B testing, evaluation metrics, and error analysis

Model A

sentiment-classifier-v3.2

Version 3.2.0 • Deployed 2024-01-15

Accuracy

92.4%

AUC

0.956

Latency (p95)

28ms

Cost/1K

$0.002

Traffic Split

80%

Model B

sentiment-classifier-v4.0

Version 4.0.0 • Deployed 2024-02-20

Accuracy

93.1%

AUC

0.961

Latency (p95)

32ms

Cost/1K

$0.003

Traffic Split

20%
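
The 80/20 traffic split above could be implemented in many ways, and the dashboard does not say which one is used. A minimal sketch, assuming a deterministic hash-based assignment keyed on some stable request or user identifier (the `assign_model` and `observed_share` helpers and the `user-N` keys are hypothetical):

```python
import hashlib

# Split taken from the dashboard: 80% to Model A, 20% to Model B.
# The routing mechanism itself is an assumption, not something the source states.
SPLIT = [("sentiment-classifier-v3.2", 80), ("sentiment-classifier-v4.0", 20)]


def assign_model(request_key: str, split=SPLIT) -> str:
    """Map a stable key (e.g. a user ID) to a model by hashing it into 100 buckets."""
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer in 0..99
    threshold = 0
    for model, percent in split:
        threshold += percent
        if bucket < threshold:
            return model
    raise ValueError("split percentages must sum to 100")


def observed_share(n: int = 10_000) -> float:
    """Fraction of n synthetic keys that land on the first model in SPLIT."""
    hits = sum(assign_model(f"user-{i}") == SPLIT[0][0] for i in range(n))
    return hits / n


if __name__ == "__main__":
    print(assign_model("user-42"))
    print(f"Share routed to Model A: {observed_share():.1%}")
```

Hashing a stable key keeps each user on the same model across requests, which avoids mixing both models' predictions within one user's session.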

Metric      Model A   Model B
Accuracy    92.4%     93.1%
Precision   91.8%     92.7%
Recall      91.2%     91.9%
F1-Score    91.5%     92.3%
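
As a sanity check, the F1-Score figures can be recomputed from the precision and recall reported above, since F1 is the harmonic mean of the two. The sketch below assumes the first figure in each pair belongs to Model A and the second to Model B, as the accuracy values on the model cards suggest; `f1_score` here is a local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as read from the dashboard.
# The Model A / Model B attribution is an assumption.
reported = {
    "Model A": (0.918, 0.912, 0.915),
    "Model B": (0.927, 0.919, 0.923),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: computed F1 = {f1:.1%}, reported = {f1_reported:.1%}")
# Model A: computed F1 = 91.5%, reported = 91.5%
# Model B: computed F1 = 92.3%, reported = 92.3%
```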

Class-level Performance

Class      Precision   Recall   F1-Score   Support
Positive   93.2%       92.5%    92.8%      19,800
Negative   91.8%       90.1%    90.9%      15,730
Neutral    90.9%       91.3%    91.1%      14,200
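
Per-class scores are often summarized as a macro average (each class counts equally) or a support-weighted average (each class counts in proportion to its examples). The sketch below shows both for the per-class F1 values above. The source does not say which model this table describes or how the headline F1-Score is aggregated, so the results are illustrative only; `macro_f1` and `weighted_f1` are hypothetical helper names.

```python
# Per-class (F1, support) as read from the class-level table.
rows = {
    "Positive": (0.928, 19_800),
    "Negative": (0.909, 15_730),
    "Neutral": (0.911, 14_200),
}


def macro_f1(rows: dict) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_f1(rows: dict) -> float:
    """Mean of the per-class F1 scores weighted by class support."""
    total_support = sum(support for _, support in rows.values())
    return sum(f1 * support for f1, support in rows.values()) / total_support


if __name__ == "__main__":
    print(f"Total support: {sum(s for _, s in rows.values()):,}")
    print(f"Macro F1:      {macro_f1(rows):.1%}")
    print(f"Weighted F1:   {weighted_f1(rows):.1%}")
```

Here the two averages differ only slightly because the three classes are of similar size; with a heavily imbalanced class distribution they can diverge much more.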

Latency Percentiles

Percentile     Model A   Model B
p50 (Median)   12ms      15ms
p95            28ms      32ms
p99            45ms      51ms

Cost Analysis

                                         Model A   Model B
Cost per 1K Inferences                   $0.002    $0.003
Estimated Monthly Cost (1M inferences)   $2.00     $3.00
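
The monthly estimates follow directly from the per-1K rates: 1M inferences is 1,000 batches of 1K. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding on currency values (`monthly_cost` is a hypothetical helper, and the 1M volume is the figure quoted above):

```python
from decimal import Decimal

# Cost per 1K inferences as shown on the dashboard.
COST_PER_1K = {
    "Model A": Decimal("0.002"),
    "Model B": Decimal("0.003"),
}


def monthly_cost(cost_per_1k: Decimal, inferences: int) -> Decimal:
    """Total cost for a given number of inferences at a per-1K rate."""
    return cost_per_1k * Decimal(inferences) / Decimal(1000)


if __name__ == "__main__":
    for name, rate in COST_PER_1K.items():
        print(f"{name}: ${monthly_cost(rate, 1_000_000):.2f}")
    # Model A: $2.00
    # Model B: $3.00
```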