Looni Lab

Carnegie Mellon University · Privacy, AI for Science, and the Societal Implications of ML

Members

Visiting & Collaborators

Research Themes

Contextual Integrity, Privacy & Differential Privacy for Language

We study privacy for language models through the lens of contextual integrity, which focuses on information-flow norms rather than fixed redaction rules. A privacy violation happens when information crosses a contextual boundary in a way that breaches social expectations; whether a given flow does so depends on the recipient, the purpose, and the downstream consequences. We build benchmarks such as ConfAIde and CIMemories that test whether models respect these norms, and we find that violations compound over long, multi-turn interactions. On the differential privacy side, we work on methods that add formal guarantees to text while preserving rare phrasings and individual style, by operating over semantic or parse-tree representations instead of raw tokens.
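As a toy illustration of the contextual-integrity framing (not how ConfAIde or CIMemories are implemented), a flow can be modeled as a tuple of information type, source context, recipient, and purpose, and checked against a set of accepted norms. The `Flow` class, the `ALLOWED_NORMS` entries, and the function names below are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One information flow; all fields are illustrative labels."""
    info_type: str
    source_context: str
    recipient: str
    purpose: str


# Hypothetical norms: flows that match social expectations for their context.
ALLOWED_NORMS = {
    ("health_condition", "medical_visit", "doctor", "treatment"),
    ("health_condition", "medical_visit", "pharmacist", "dispensing"),
}


def violates_norms(flow, norms=ALLOWED_NORMS):
    """Return True when the flow matches no accepted norm."""
    key = (flow.info_type, flow.source_context, flow.recipient, flow.purpose)
    return key not in norms


def count_violations(flows, norms=ALLOWED_NORMS):
    """Count violating flows across a (possibly multi-turn) interaction."""
    return sum(violates_norms(f, norms) for f in flows)
```

The same piece of information is acceptable in one flow and a violation in another, which is the contrast with a fixed redaction rule that would treat `health_condition` identically everywhere.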

Open Problems: Multi-agent and multi-person settings with conflicting norms. Long-horizon interactions where individually benign disclosures aggregate into violations. User-level DP for conversational data that keeps personalization, and DP pattern extraction that lets researchers study sensitive interaction data without raw access. [Write-up] · [Technical report]

Memorization & Membership Inference

We treat memorization as a window into learning dynamics: what gets encoded, when during training, and how it relates to the pretraining distribution. We find that regurgitation tracks n-gram frequency and that extractable verbatim recall requires repetition, so apparent one-shot memorization is usually reconstruction of frequent or templated patterns. Over half of memorized content comes from general language-modeling ability rather than sequence-specific weights, which is part of why unlearning is hard to achieve without hurting overall quality. We also show that ordinary downstream finetuning can reactivate verbatim recall of copyrighted text that earlier alignment had suppressed (Alignment Whack-a-Mole).
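To make "regurgitation tracks n-gram frequency" concrete, here is a minimal sketch that scores a piece of text by how often its n-grams appear in a reference corpus. It assumes whitespace tokenization and an in-memory corpus, and the function names are hypothetical; it illustrates the kind of frequency statistic involved, not the measurement used in our papers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_ngram_counts(corpus_docs, n):
    """Count n-gram occurrences over a list of documents (whitespace tokens)."""
    counts = Counter()
    for doc in corpus_docs:
        counts.update(ngrams(doc.split(), n))
    return counts


def mean_ngram_frequency(text, counts, n):
    """Average corpus count of the text's n-grams; 0.0 if the text is too short."""
    grams = ngrams(text.split(), n)
    if not grams:
        return 0.0
    return sum(counts[g] for g in grams) / len(grams)
```

Text built from frequent or templated patterns scores high on a statistic like this even if the exact sequence never appeared verbatim, which is the distinction between reconstruction and sequence-specific recall.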

Open Problems: Predicting which sequences will be memorized before training finishes. Understanding the memorization, capacity, and competence triad, since models are most absorbent around 10 to 20 percent into training. Connecting memorization to contamination detection and unlearning, and explaining why suppressed recall resurfaces after finetuning.

AI for Science (Chemistry, Drug Discovery & RL for Reasoning)

Building on work with the FAIR Chemistry group at Meta, we study whether LLM agents can do end-to-end small-molecule drug design: reasoning over targets, proposing structures, and optimizing candidates over many steps. On our SMDD-Bench, even the strongest frontier model solves only about 40 percent of tasks, and agents often reward-hack the oracles (gaming the structure predictor or brute-forcing ADMET calls) rather than showing molecular intuition.

A central direction is reinforcement learning for science: learning good representations of scientific structure and building RL on top of them. We find that RL-trained models traverse hierarchical knowledge better than SFT or distilled models, and that training on synthetic graph-traversal tasks transfers to unrelated retrieval benchmarks. We are interested in many forms of RL here, including agentic and test-time RL for discovery, as well as agentic verification workflows such as RefGrader for grading math proofs.

Open Problems: Agentic and test-time RL for scientific discovery. Reusable scientific representations that RL can build on. Synthesis-aware design that plans routes, not just structures. 3D pocket reasoning and interaction-point prediction. Oracles and rewards that resist gaming while staying faithful to real chemistry.

AI & Mental Health

People increasingly bring their hardest moments to AI systems, which makes the privacy and safety of mental-health AI a priority for us. With support from an OpenAI Mental Health Research Grant (co-PI with Adam Perer, CMU), we study how safety and mental-health systems handle sensitive disclosures. We find that safety classifiers leak the most at the decision boundary, where crisis and mental-health queries tend to sit, so the inputs we most want to protect are the easiest to infer. We also want to study usage and escalation patterns from sensitive logs without exposing raw conversations.

Open Problems: Privacy-preserving study of sensitive interaction data such as crisis lines and mental-health chatbots. Safety classifiers that do not leak membership at the boundary. Evaluating escalation and intervention quality without compromising confidentiality. Aligning mental-health AI with clinical norms and contextual-integrity expectations.

Value Diversity & Pluralistic Alignment

Most alignment optimizes toward a single response, the mean of annotator preferences, which erases minority viewpoints and stylistic diversity. We focus on the distributional side: how model outputs relate to the full spectrum of human variation. For example, a model can state that a coin is fifty-fifty yet simulate ten tosses as eight heads; models learn facts about distributions without learning to materialize them. This connects to pretraining frequency and cuts across alignment, copyright, and personalization.
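To put numbers on the coin example: under a fair coin, eight heads in ten tosses is an unlikely outcome, so a model that tends to produce it is not sampling from the distribution it can describe. The short calculation below uses the standard binomial formula; the function names are ours, chosen for illustration.

```python
from math import comb


def prob_exact_heads(k, n=10, p=0.5):
    """Binomial probability of exactly k heads in n tosses with head probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_heads(k, n=10, p=0.5):
    """Probability of k or more heads in n tosses."""
    return sum(prob_exact_heads(j, n, p) for j in range(k, n + 1))


print(prob_exact_heads(8))     # 45/1024, about 0.044
print(prob_at_least_heads(8))  # 56/1024, about 0.055
print(prob_exact_heads(5))     # 252/1024, about 0.246
```

A single eight-heads run is not impossible for a fair coin, so the meaningful test is over many simulated runs: the empirical distribution of head counts should concentrate around five, not eight.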

Open Problems: Training models to materialize distributional diversity rather than memorize distribution statistics. Probabilistic preference modeling that represents the spectrum rather than point estimates. Measuring diversity that matters for downstream capabilities versus noise. Data scarcity for minority viewpoints and rare preferences.

Sponsors