MoodBench

Personal Project

2025


Vertical

AI/Machine Learning

Geography

Global

Media Type(s)

CLI Tool

Tags

Python, LoRA, PEFT, NLP, Benchmarking, Open Source
A multi-LLM sentiment-analysis benchmark: fine-tunes, evaluates, and compares 17 small language models (4M to 410M parameters) on consumer hardware using LoRA.

Overview

MoodBench is an automated benchmarking framework that fine-tunes, evaluates, and compares small language models for sentiment analysis. It uses Parameter-Efficient Fine-Tuning (PEFT) with LoRA so that meaningful experiments can run on a single Apple Silicon laptop or modest GPU, closing the gap between "I'd like to evaluate this for my use case" and "I have the budget to spin up an H100."
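To make the parameter-efficiency argument concrete, here is a minimal sketch of the LoRA idea in plain NumPy (not the project's actual training code, which presumably uses a library such as Hugging Face PEFT): a frozen weight matrix W is augmented with a trainable low-rank update (alpha / r) · B·A, so the trainable parameter count drops from d_out·d_in to r·(d_in + d_out). All dimensions and hyperparameters below are illustrative.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass with a LoRA adapter: the frozen pretrained weight W
    is augmented by the low-rank update (alpha / r) * B @ A, where
    A is (r, d_in) and B is (d_out, r). Only A and B are trained."""
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

d_in, d_out, r = 768, 768, 8                 # illustrative sizes
W = np.random.randn(d_out, d_in) * 0.02      # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.02          # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection (zero init)

full_params = d_out * d_in                   # full fine-tuning
lora_params = r * (d_in + d_out)             # LoRA fine-tuning
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")
```

Note the standard zero initialization of B: at the start of training the adapter contributes nothing, so the adapted model exactly reproduces the base model, and only ~2% as many parameters (in this configuration) receive gradients, which is what lets 17 fine-tuning runs fit on a laptop.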

A secondary capability calculates an approximate Net Promoter Score (NPS) over review corpora as a proof-of-concept for downstream business signals layered on top of sentiment classification.
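One plausible way to derive such a proxy score, sketched below: treat model-predicted "positive" reviews as promoters and "negative" ones as detractors, then apply the usual NPS formula (percent promoters minus percent detractors). True NPS comes from a 0-10 survey question, so this mapping is an approximation; the function and label names here are illustrative, not MoodBench's actual implementation.

```python
def approximate_nps(sentiments):
    """Approximate NPS from sentiment labels: 'positive' reviews count
    as promoters, 'negative' as detractors, and 'neutral' toward the
    total but neither bucket. Returns a score in [-100, 100]."""
    n = len(sentiments)
    if n == 0:
        raise ValueError("empty review corpus")
    promoters = sum(s == "positive" for s in sentiments)
    detractors = sum(s == "negative" for s in sentiments)
    return 100.0 * (promoters - detractors) / n

reviews = ["positive", "positive", "neutral", "negative", "positive"]
print(approximate_nps(reviews))  # 3 promoters, 1 detractor of 5 -> 40.0
```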

What's interesting about it

  • 17 models, one harness. Coverage spans ultra-tiny (BERT-tiny at 4M params) to medium research-grade (Gemma-2-2B, Pythia-410m), letting you make like-for-like comparisons across architecture families and parameter budgets.
  • Memory-bounded. All configurations are designed to fit under 6 GB on Apple Silicon, making the entire suite reproducible without cloud spend.
  • Thorough metrics. Beyond accuracy and F1: balanced accuracy, latency percentiles, throughput, memory, statistical-significance testing, and robustness checks.
  • Two interfaces. A CLI (`uv run moodbench …`) for CI/CD and reproducible runs, and a Gradio web UI for interactive exploration.
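Two of the metrics above are worth unpacking, since they matter most on imbalanced review data: balanced accuracy is the mean of per-class recall, so a majority-class guesser cannot score well, and latency percentiles (e.g. p95) capture tail behavior that a mean hides. A small stdlib-only sketch of both, with illustrative names (not the project's API):

```python
import statistics
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall. On a skewed sentiment test set this
    penalizes a model that just predicts the majority class."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return statistics.mean(hits[c] / totals[c] for c in totals)

def latency_percentile(samples_ms, q):
    """Nearest-rank percentile of latency samples, e.g. q=0.95 for p95."""
    s = sorted(samples_ms)
    return s[min(len(s) - 1, int(q * len(s)))]

# A model that always says "pos" gets 75% plain accuracy here,
# but only 0.5 balanced accuracy (recall 1.0 on pos, 0.0 on neg).
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "pos", "pos"]
print(balanced_accuracy(y_true, y_pred))  # -> 0.5
```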
