MoodBench
Personal Project
2025

Vertical
AI/Machine Learning
Geography
Global
Media Type(s)
CLI Tool
Tags
Python, LoRA, PEFT, NLP, Benchmarking, Open Source
Credits
Creator & Developer
Repository
Overview
MoodBench is an automated benchmarking framework that fine-tunes, evaluates, and compares small language models for sentiment analysis. It uses Parameter-Efficient Fine-Tuning (PEFT) with LoRA so that meaningful experiments can run on a single Apple Silicon laptop or modest GPU, closing the gap between "I'd like to evaluate this for my use case" and "I have the budget to spin up an H100."
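To see why LoRA keeps experiments laptop-sized, consider the parameter arithmetic: instead of updating a full d_out × d_in weight matrix W, LoRA trains two low-rank factors B (d_out × r) and A (r × d_in) and applies W + (alpha/r)·BA, so trainable parameters drop from d_out·d_in to r·(d_out + d_in). A minimal sketch of that accounting (the layer sizes and rank below are illustrative, not MoodBench's actual configuration):

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning one full linear layer."""
    return d_out * d_in

def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters updated when training a rank-r LoRA adapter instead."""
    return rank * (d_out + d_in)

# Illustrative example: one 2048x2048 projection with a rank-8 adapter.
full = full_finetune_params(2048, 2048)      # 4,194,304 params
lora = lora_trainable_params(2048, 2048, 8)  # 32,768 params
print(f"full: {full:,}  lora: {lora:,}  reduction: {full / lora:.0f}x")
```

At rank 8 the adapter trains roughly 0.8% of the layer's parameters, which is what lets optimizer state and gradients fit in a few gigabytes of memory.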
A secondary capability calculates an approximate Net Promoter Score (NPS) over review corpora as a proof-of-concept for downstream business signals layered on top of sentiment classification.
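NPS is conventionally the percentage of promoters minus the percentage of detractors. One simple way to approximate it from three-way sentiment labels (a hypothetical mapping for illustration; the source does not specify MoodBench's exact scheme) is to treat positive reviews as promoters, negative as detractors, and neutral as passives:

```python
def approx_nps(labels: list[str]) -> int:
    """Approximate NPS from sentiment labels.

    Hypothetical mapping: positive -> promoter, negative -> detractor,
    neutral -> passive. NPS = 100 * (%promoters - %detractors).
    """
    n = len(labels)
    promoters = labels.count("positive") / n
    detractors = labels.count("negative") / n
    return round(100 * (promoters - detractors))

reviews = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2
print(approx_nps(reviews))  # -> 40
```

The result ranges from -100 (all detractors) to +100 (all promoters), mirroring the familiar survey-based score.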
What's interesting about it
- 17 models, one harness. Coverage spans ultra-tiny (BERT-tiny at 4M params) to medium research-grade (Gemma-2-2B, Pythia-410m), letting you make like-for-like comparisons across architecture families and parameter budgets.
- Memory-bounded. All configurations are designed to fit under 6 GB on Apple Silicon, making the entire suite reproducible without cloud spend.
- Thorough metrics. Beyond accuracy and F1: balanced accuracy, latency percentiles, throughput, memory, statistical-significance testing, and robustness checks.
- Two interfaces. A CLI (uv run moodbench …) for CI/CD and reproducible runs, and a Gradio web UI for interactive exploration.
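Of the metrics listed above, balanced accuracy is worth a quick illustration: it is the mean of per-class recall, so each class counts equally regardless of how skewed the dataset is. A minimal pure-Python sketch (not MoodBench's implementation):

```python
from collections import defaultdict

def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Mean per-class recall: every class contributes equally,
    however many examples it has."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Skewed corpus: 8 positive reviews, 2 negative.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 10            # a degenerate model that always says "pos"
print(balanced_accuracy(y_true, y_pred))  # -> 0.5, though plain accuracy is 0.8
```

On imbalanced sentiment corpora this exposes always-majority-class models that plain accuracy rewards, which is why reporting both matters in a benchmark.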
References
- Repository: github.com/andrewmarconi/MoodBench