Michael J. Zellinger

Cost-Saving LLM Cascades with Early Abstention
w/ Matt Thomson
posted on arXiv, 2025

In risk-sensitive domains, abstaining from a query is preferable to making a mistake. Here, we consider LLM cascades with abstention and investigate the benefits of allowing smaller models at the beginning of the cascade to abstain directly.
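
To make the idea concrete, here is a minimal sketch of a cascade in which any stage may abstain early. It is an illustration under stated assumptions, not the method from the paper: the `Stage` class, the two thresholds per stage, the decision rule, and the toy models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumption: a model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Stage:
    model: Model
    accept_above: float   # return the answer if confidence >= this value
    abstain_below: float  # abstain on the query if confidence < this value


def run_cascade(query: str, stages: List[Stage]) -> Optional[str]:
    """Return an answer, or None if the cascade abstains."""
    for i, stage in enumerate(stages):
        answer, confidence = stage.model(query)
        if confidence >= stage.accept_above:
            return answer
        is_last = i == len(stages) - 1
        if is_last or confidence < stage.abstain_below:
            # Early abstention: skip the remaining (more expensive) models.
            return None
    return None


# Hypothetical toy models standing in for a small and a large LLM.
def small_model(query: str) -> Tuple[str, float]:
    if query == "easy":
        return ("small-answer", 0.9)
    if query == "hopeless":
        return ("small-guess", 0.1)
    return ("small-guess", 0.5)


def large_model(query: str) -> Tuple[str, float]:
    return ("large-answer", 0.8)


stages = [
    Stage(small_model, accept_above=0.8, abstain_below=0.2),
    Stage(large_model, accept_above=0.7, abstain_below=0.0),
]
```

In this sketch, an "easy" query is answered by the small model, a "hopeless" query is rejected by the small model without ever calling the large one, and any other query is deferred to the large model.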

Rational Tuning of LLM Cascades via Probabilistic Modeling
w/ Matt Thomson
posted on arXiv, 2025

Tuning the confidence thresholds of LLM cascades is often a trial-and-error process. We introduce a Markovian copula model to capture interactions between the performance of different LLMs and derive a continuous optimization-based algorithm for more efficient threshold tuning.

Natural Language-Based Synthetic Data Generation for Cluster Analysis
w/ Peter Bühlmann
to appear in Journal of Classification, 2025

Cluster analysis relies on synthetic data benchmarks, but manually designing evaluation scenarios such as "seven oblong clusters in 3D with some overlap" is labor-intensive. We present repliclust, a natural language-based synthetic data generator that turns verbal descriptions of evaluation scenarios into concrete data sets.

datapick - All-In-One Platform for Data Labeling and LLM Finetuning

I am fascinated by human-in-the-loop approaches for customizing language models. To explore this area, I developed datapick, a web application for interactively labeling data and easily deploying finetuned LLMs.

I built the JavaScript (NextJS) frontend; the Python (FastAPI) backend; OAuth authentication with Auth0; database integrations with MongoDB and AWS; a Kubernetes-managed GPU cluster for running containerized finetuning and inference jobs; and a Python API for using datapick workflows within any Python codebase.

faangcheck - Instant Resume Feedback

OpenAI's launch of GPT-4 enabled many use cases for AI that were impractical before. For example, faangcheck provides instant resume feedback, rating a candidate's suitability for different roles across FAANG on a scale from 0 to 100. Users receive a personalized and shareable evaluation page, akin to a baseball card.

I built faangcheck's JavaScript (NextJS) frontend and its database integrations with MongoDB and AWS, as well as part of the Python (FastAPI) backend. Aman Bhargava created the initial version of the backend. The site plateaued at nearly 800 users before we shut it down.

Research / Projects