Michael J. Zellinger
I'm a PhD candidate at Caltech, where my research focuses on making AI systems more reliable by quantifying the uncertainty of large language models.
Outside of academia, I've collaborated closely with Sid Dastidar, Head of Risk Modeling at Oaktree Capital Management, on several AI-driven investment projects.
In addition, I supported label expansion for Novartis' best-selling drug Cosentyx by generating insights into its causal mechanisms for a new indication.
LinkedIn / X / GitHub
Research / Projects
I'm broadly interested in uncertainty quantification, systems of AI models,
and applications of AI.
I love building software that manifests the magic of AI in practical use cases.
Cost-Saving LLM Cascades with Early Abstention
w/ Matt Thomson
posted on arXiv, 2025
In risk-sensitive domains, abstaining from a query is preferable to making a mistake. Here, we consider LLM
cascades with abstention and investigate the benefits of allowing smaller models at the beginning of the
cascade to abstain directly.
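As a rough illustration of the idea, here is a minimal sketch of a cascade in which any stage may answer, defer to the next model, or abstain outright. It is not the method from the paper: the `Stage` class, the `run_cascade` function, and the two per-stage thresholds (`accept_at`, `abstain_below`) are hypothetical names chosen for this example, and each model is assumed to return an answer together with a confidence score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Assumption: each model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Stage:
    """One hypothetical cascade stage with two illustrative thresholds."""
    model: Model
    accept_at: float      # answer directly if confidence >= accept_at
    abstain_below: float  # abstain early if confidence < abstain_below


def run_cascade(query: str, stages: Sequence[Stage]) -> Optional[str]:
    """Return an answer, or None to signal abstention."""
    for i, stage in enumerate(stages):
        answer, confidence = stage.model(query)
        if confidence >= stage.accept_at:
            return answer  # confident enough: stop here
        is_last = i == len(stages) - 1
        if is_last or confidence < stage.abstain_below:
            # Early abstention: skip the larger, costlier models entirely.
            return None
        # Otherwise defer the query to the next (larger) model.
    return None


# Toy stand-ins for a small and a large model.
small = Stage(lambda q: ("small answer", 0.5), accept_at=0.8, abstain_below=0.2)
large = Stage(lambda q: ("large answer", 0.9), accept_at=0.7, abstain_below=0.0)
result = run_cascade("example query", [small, large])
```

In this toy run the small model is neither confident enough to answer nor unsure enough to abstain, so the query is deferred and the large model answers. Lowering the small model's confidence below `abstain_below` would end the cascade without ever calling the large model, which is where the cost saving comes from.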
Rational Tuning of LLM Cascades via Probabilistic Modeling
w/ Matt Thomson
posted on arXiv, 2025
Tuning the confidence thresholds of LLM cascades is often a trial-and-error process. We introduce a
Markovian copula model to capture interactions between the performance of different LLMs and derive a
continuous optimization-based algorithm for more efficient threshold tuning.
Natural Language-Based Synthetic Data Generation for Cluster Analysis
w/ Peter Bühlmann
to appear in Journal of Classification, 2025
Cluster analysis relies on synthetic data benchmarks, but manually designing evaluation scenarios
such as "seven oblong clusters in 3D with some overlap" is labor-intensive. We present repliclust,
a natural language-based synthetic data generator that turns verbal descriptions of evaluation
scenarios into concrete data sets.
datapick - All-In-One Platform for Data Labeling and LLM Finetuning
I am fascinated by human-in-the-loop approaches for customizing language models. To explore this area, I developed datapick, a web application for interactively labeling data and easily deploying finetuned LLMs.
I built the JavaScript (NextJS) frontend; the Python (FastAPI) backend; OAuth authentication with Auth0; database integrations with MongoDB and AWS; a Kubernetes-managed GPU cluster for running containerized finetuning and inference jobs; and a Python API for using datapick workflows within any Python codebase.
faangcheck - Instant Resume Feedback
OpenAI's launch of GPT-4 enabled many use cases for AI that were impractical before. For example, faangcheck provides instant resume feedback, rating a candidate's suitability for different roles across FAANG on a scale from 0 to 100. Users receive a personalized and shareable evaluation page, akin to a baseball card.
I built faangcheck's JavaScript (NextJS) frontend and its database integrations with MongoDB and AWS, as well as part of the Python (FastAPI) backend. Aman Bhargava created the initial version of the backend. The site plateaued at nearly 800 users before we shut it down.