MoodBench

Personal Project

2025


Vertical

AI/Machine Learning

Geography

Global

Media Type(s)

CLI Tool

Tags

Python, LoRA, PEFT, NLP, Benchmarking, Open Source
A multi-LLM sentiment-analysis benchmark: fine-tunes, evaluates, and compares 17 small language models (4M to 410M parameters) on consumer hardware using LoRA.

Overview

MoodBench is an automated benchmarking framework that fine-tunes, evaluates, and compares small language models for sentiment analysis. It uses Parameter-Efficient Fine-Tuning (PEFT) with LoRA so that meaningful experiments can run on a single Apple Silicon laptop or modest GPU, closing the gap between "I'd like to evaluate this for my use case" and "I have the budget to spin up an H100."
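To make the parameter-efficiency argument concrete, here is a minimal sketch of the LoRA idea in plain NumPy (not the project's actual training code, which presumably uses a library such as Hugging Face PEFT): a frozen weight matrix W is augmented with a trainable low-rank update (alpha / r) · B·A, so the trainable parameter count drops from d_out·d_in to r·(d_in + d_out). All dimensions and hyperparameters below are illustrative.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass with a LoRA adapter: the frozen pretrained weight W
    is augmented by the low-rank update (alpha / r) * B @ A, where
    A is (r, d_in) and B is (d_out, r). Only A and B are trained."""
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

d_in, d_out, r = 768, 768, 8                 # illustrative sizes
W = np.random.randn(d_out, d_in) * 0.02      # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.02          # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection (zero init)

full_params = d_out * d_in                   # full fine-tuning
lora_params = r * (d_in + d_out)             # LoRA fine-tuning
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")
```

Note the standard zero initialization of B: at the start of training the adapter contributes nothing, so the adapted model exactly reproduces the base model, and only ~2% as many parameters (in this configuration) receive gradients, which is what lets 17 fine-tuning runs fit on a laptop.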

A secondary capability calculates an approximate Net Promoter Score (NPS) over review corpora as a proof-of-concept for downstream business signals layered on top of sentiment classification.
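One plausible way to derive such a proxy score, sketched below: treat model-predicted "positive" reviews as promoters and "negative" ones as detractors, then apply the usual NPS formula (percent promoters minus percent detractors). True NPS comes from a 0-10 survey question, so this mapping is an approximation; the function and label names here are illustrative, not MoodBench's actual implementation.

```python
def approximate_nps(sentiments):
    """Approximate NPS from sentiment labels: 'positive' reviews count
    as promoters, 'negative' as detractors, and 'neutral' toward the
    total but neither bucket. Returns a score in [-100, 100]."""
    n = len(sentiments)
    if n == 0:
        raise ValueError("empty review corpus")
    promoters = sum(s == "positive" for s in sentiments)
    detractors = sum(s == "negative" for s in sentiments)
    return 100.0 * (promoters - detractors) / n

reviews = ["positive", "positive", "neutral", "negative", "positive"]
print(approximate_nps(reviews))  # 3 promoters, 1 detractor of 5 -> 40.0
```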

What's interesting about it

  • 17 models, one harness. Coverage spans ultra-tiny (BERT-tiny at 4M params) to medium research-grade (Gemma-2-2B, Pythia-410m), letting you make like-for-like comparisons across architecture families and parameter budgets.
  • Memory-bounded. All configurations are designed to fit under 6 GB on Apple Silicon, making the entire suite reproducible without cloud spend.
  • Thorough metrics. Beyond accuracy and F1: balanced accuracy, latency percentiles, throughput, memory, statistical-significance testing, and robustness checks.
  • Two interfaces. A CLI (`uv run moodbench …`) for CI/CD and reproducible runs, and a Gradio web UI for interactive exploration.
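Two of the metrics above are worth unpacking, since they matter most on imbalanced review data: balanced accuracy is the mean of per-class recall, so a majority-class guesser cannot score well, and latency percentiles (e.g. p95) capture tail behavior that a mean hides. A small stdlib-only sketch of both, with illustrative names (not the project's API):

```python
import statistics
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall. On a skewed sentiment test set this
    penalizes a model that just predicts the majority class."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return statistics.mean(hits[c] / totals[c] for c in totals)

def latency_percentile(samples_ms, q):
    """Nearest-rank percentile of latency samples, e.g. q=0.95 for p95."""
    s = sorted(samples_ms)
    return s[min(len(s) - 1, int(q * len(s)))]

# A model that always says "pos" gets 75% plain accuracy here,
# but only 0.5 balanced accuracy (recall 1.0 on pos, 0.0 on neg).
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "pos", "pos"]
print(balanced_accuracy(y_true, y_pred))  # -> 0.5
```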
