Niloofar Mireshghallah

I'm a Member of Technical Staff at humans&.

Beginning Fall 2026, I will join Carnegie Mellon University's Engineering & Public Policy (EPP) Department and Language Technologies Institute (LTI) as an Assistant Professor, and will be a core member of CyLab.

My research interests are privacy (particularly contextual integrity and information flow norms), natural language processing, AI for science, LLM reasoning, and the societal implications of ML. I explore the interplay between data, its influence on models, and the expectations of the people who regulate and use these models. My work has been recognized by the NCWIT Collegiate Award and the Rising Star in Adversarial ML Award.

Recruiting & collaborations: If you are interested in working with me, please fill out this brief form.

✦ Explanation about my name: I used to publish under Fatemeh, which is my legal name, but I now go by Niloofar, which means lily (the flower) in Farsi!

✦ My academic job-market materials (Fall 2024): Research statement · Teaching statement · DEI statement · CV · Job-talk slides

✦ Previously: I was a Research Scientist in Meta AI's FAIR Alignment group (May–Nov 2025), working on LLM privacy, security, and learning about chemistry! Before that, I was a postdoctoral scholar at the University of Washington, advised by Yejin Choi and Yulia Tsvetkov. I received my PhD from UC San Diego, advised by Taylor Berg-Kirkpatrick; during that time I was also a part-time researcher/intern at Microsoft Research, working with the Privacy in AI, Algorithms, and Semantic Machines teams on differential privacy, model compression, and data synthesis.

News Highlights

✍️

New! Blog post: "From Black and White to Gray: Redefining Privacy for Language" — why privacy is contextual, why scaling won't fix it, and the research behind ConfAIde and CIMemories.

🚀

New! I joined humans& as a Member of Technical Staff! Read about why I joined and what this means for CMU.

✍️

Check out my Writing/Blog section — latest post: "Surviving (and Thriving on) the Academic Job Market" — tips on interviews, health, and habits that kept me sane during the job search.

🎙️

Appeared on The Information Bottleneck podcast (Jan 2026) with Ravid Shwartz-Ziv & Allen Roush: we discussed the future of generative AI, how it's reshaping creative work and accelerating scientific discovery, and the ethical frontier of AI.

🗺️

Gave a talk at the FAR AI San Diego Alignment Workshop at NeurIPS (Dec 2025): "What Does It Mean for Agentic AI to Preserve Privacy?"

📰

Featured in Science News Explores (Nov 2025): "5 things to remember when talking to a chatbot" — on AI privacy risks and how chatbots handle personal information.

📄

Our write-up "Privacy Is Not Just Memorization" with Tianshi Li is now available! Featured in Help Net Security (Oct 2025).

🗺️

Gave a keynote at CAMLIS 2025 (Oct 2025): "What Does It Mean for Agentic AI to Preserve Privacy? Mapping the New Data Sinks and Leaks" — Video, slides, and reading list

📰

Quoted in the Washington Post (Aug 2025) on AI hype, evaluation metrics, and how people judge AI capabilities.

🗺️

Gave a keynote at the L2M2 (Large Language Model Memorization) workshop at ACL (Aug 2025): "Emergent Misalignment Through the Lens of Non-verbatim Memorization"

🗺️

Gave a keynote at the LLMSec workshop at ACL (Aug 2025): "What does it mean for an AI agent to preserve privacy?" — slides

🎙️

Appeared on the Jay Shah Podcast (Feb 2025): "Differential Privacy, Creativity & Future of AI Research in the LLM Era"

🗺️

Gave an invited keynote at the NeurIPS 2024 Red Teaming GenAI workshop (Dec 2024): "A False Sense of Privacy: Semantic Leakage and Non-literal Copying in LLMs" — slides and recording (jump to 04:50:00).

🎙️

Appeared on the Thesis Review podcast with Sean Welleck, where I talked about my work on Auditing and Mitigating Safety Risks in Large Language Models.

📝

I wrote a blog post on "Should I do a postdoc?" based on my experience. Check out the blog post and the video with Sasha Rush!

Research Themes

Contextual Integrity and Privacy for LLMs

Privacy cannot be reduced to a checklist of redaction rules. After working on differential privacy for text—where formal guarantees often fail to capture what people actually care about—I turned to contextual integrity as a framework for reasoning about information flow norms. Privacy violations aren't about data exposure per se, but about information crossing contextual boundaries in ways that breach social expectations.

This framing is inherently outcome-driven and extends beyond any single person or moment: would sharing this medical record increase someone's insurance premium? Would exposure tracking during COVID save lives or enable surveillance? Would aggregating individually benign data points across time reveal something the user never intended to disclose? These are questions about downstream consequences for different people across different time horizons—you cannot determine appropriateness without reasoning about them.

I think privacy for LLMs is fundamentally about figuring out data composition, decomposition, and abstraction. What pieces of information combine to reveal something sensitive? What level of granularity is appropriate for a given recipient and purpose? This requires theory of mind—understanding what different parties know, expect, and would be harmed by.

In 2023, we released ConfAIde, the first benchmark testing whether LLMs respect contextual integrity norms (ICLR 2024 Spotlight). More recently, CIMemories (ICLR 2026) extends this to persistent memory systems, showing that violations compound over long-horizon interactions—jumping from 0.1% on single tasks to 25.1% with repeated sampling and aggregation.
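
As an idealized back-of-the-envelope illustration of why violations compound (a toy independence model, not the CIMemories evaluation protocol): if each sampled response leaks an attribute with a small probability p, the chance that at least one of n aggregated samples leaks it is 1 - (1 - p)^n, which grows quickly with the horizon.

```python
# Idealized illustration (assumption): treat each sampled response as an
# independent Bernoulli trial with a small per-sample violation rate p.
# The probability that at least one of n aggregated samples violates a
# contextual-integrity norm is 1 - (1 - p)**n.

def aggregate_violation_rate(p: float, n: int) -> float:
    """Probability of at least one violation across n independent samples."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 0.001  # a 0.1% per-sample violation rate, as a toy starting point
    for n in (1, 10, 100, 300):
        print(f"n={n:>3}  P(>=1 violation) = {aggregate_violation_rate(p, n):.3f}")
```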

Open Problems: Multi-agent and multi-person scenarios where information flows between parties with conflicting norms. Multi-turn interactions where context accumulates. Long-horizon settings where individually benign disclosures aggregate into violations. Optimizing for the good of all parties, not just the user. [Read the full write-up] · [Technical report]

Differential Privacy for Language

Standard DP applied to text smooths distributional tails—destroying minority patterns, rare phrasings, and individual style. My work addresses this through structured latent modeling: operating in parse tree or semantic representation space rather than token space allows noise injection that preserves distributional structure while providing formal guarantees.
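
For intuition only, here is a minimal sketch of the generic idea of adding calibrated noise in a representation space rather than to tokens. It uses the standard Gaussian mechanism on a clipped embedding vector; the parse-tree and semantic-space machinery described above is not reproduced here, and the function name and parameters are illustrative assumptions.

```python
# Minimal sketch (assumption): apply the standard Gaussian mechanism to a
# clipped latent representation of a text, rather than perturbing tokens.
# This is a generic illustration of "noise in representation space", not the
# structured parse-tree / semantic-space method referenced above.
import math
import numpy as np

def gaussian_mechanism_latent(z: np.ndarray, epsilon: float, delta: float,
                              clip_norm: float = 1.0,
                              rng: np.random.Generator | None = None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    # Clip to bound the L2 sensitivity of a single record at `clip_norm`.
    norm = np.linalg.norm(z)
    z_clipped = z * min(1.0, clip_norm / (norm + 1e-12))
    # Standard Gaussian-mechanism calibration (valid for epsilon <= 1).
    sigma = clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return z_clipped + rng.normal(0.0, sigma, size=z.shape)

# Toy usage: privatize a 768-dimensional "sentence embedding".
z_private = gaussian_mechanism_latent(np.random.randn(768), epsilon=1.0, delta=1e-5)
```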

Open Problems: Enabling privacy-preserving research on sensitive interaction data (e.g., mental health chatbot logs) through DP pattern extraction and synthetic generation—allowing safety researchers to study temporal usage patterns, escalation trajectories, and intervention opportunities without accessing raw conversations. Achieving user-level DP for conversational data without destroying personalization.

Memorization & Membership Inference

I view memorization as more than a privacy and copyright issue: it is a window into learning dynamics. What gets encoded, when during training it gets encoded, how it interacts with the pretraining data distribution, and what it tells us about generalization and contamination.

The most important observation I have come to is that model regurgitation is mainly a function of n-gram frequency, and there is no such thing as one-shot memorization. What appears to be single-shot recall typically involves templated text, textual variants, or compositions of frequent patterns that the model reconstructs without true memorization. Non-trivial repetition is required for extractable verbatim recall. Further, over 50% of memorized content can be attributed to general language-modeling capabilities rather than sequence-specific weights—which explains why unlearning often fails without degrading overall model quality.
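
As a hedged illustration of the kind of signal involved (not the actual analysis behind the numbers above), one can calibrate a memorization score against a reference model: if the target model assigns a candidate sequence much higher per-token likelihood than a reference model of similar general competence, the gap is more plausibly sequence-specific. The inputs below are hypothetical per-token log probabilities.

```python
# Illustrative sketch (assumption): a reference-calibrated score that asks how
# much more likely a candidate sequence is under the target model than under a
# reference model with similar general-domain competence. Large positive gaps
# suggest sequence-specific memorization rather than generic LM ability.
# `target_logprobs` / `reference_logprobs` are hypothetical per-token log
# probabilities obtained by scoring the same text with each model.
from statistics import mean

def calibrated_memorization_score(target_logprobs: list[float],
                                  reference_logprobs: list[float]) -> float:
    assert len(target_logprobs) == len(reference_logprobs)
    return mean(t - r for t, r in zip(target_logprobs, reference_logprobs))

def looks_memorized(target_logprobs, reference_logprobs, threshold=1.0) -> bool:
    # The threshold is arbitrary here; in practice it is tuned on held-out data.
    return calibrated_memorization_score(target_logprobs, reference_logprobs) > threshold
```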

Open Problems: Predicting which specific sequences will be memorized before training completes. Understanding the memorization-capacity-competence triad: models are most "absorbent" around 10-20% into training despite having more unused capacity earlier, suggesting interplay between remaining capacity, linguistic competence, and data distribution dynamics. Connecting memorization patterns to contamination detection and unlearning.

LLM Reasoning & RL Workflows

My work examines what RL actually changes in model behavior versus SFT and distillation—whether improvements are shallow (format, style) or deep (genuine reasoning, knowledge access), and how they transfer across tasks.

We observe that RL'd models access hierarchical parametric knowledge more effectively than SFT models, while distilled models perform worst despite surface-level improvements. Distillation seems to capture the format but lose the ability to correctly traverse knowledge structures. Training on synthetic graph-traversal tasks improves unrelated retrieval benchmarks—the base model has the knowledge (structured prompting can surface it), but RL pushes the model toward that navigation strategy by default.

A key question is whether RL is teaching depth of composition or breadth of search. One hypothesis: RL incentivizes testing more hypotheses (visible as backtracking in math/code, or as enumeration and recitation in fuzzier tasks). The "chaining" behavior—trying different paths—seems to help most when there's an asymmetry between generation and verification capabilities.
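
A toy sketch of the generation/verification asymmetry mentioned above, under the assumption that a reasonably reliable verifier exists: sampling several hypotheses and keeping one that passes verification is how breadth of search can translate into accuracy. `generate` and `verify` are hypothetical stand-ins, not part of any specific system.

```python
# Toy sketch (assumption): if verification is cheaper or more reliable than
# generation, sampling several hypotheses and keeping one that passes the
# verifier converts breadth of search into accuracy. `generate` and `verify`
# are hypothetical stand-ins for a model sampler and a checker (e.g., unit
# tests or an answer grader).
from typing import Callable, Optional

def sample_and_verify(generate: Callable[[], str],
                      verify: Callable[[str], bool],
                      n_samples: int) -> Optional[str]:
    """Return the first of n_samples hypotheses that the verifier accepts."""
    for _ in range(n_samples):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # No hypothesis survived verification.

# Hypothetical usage with stub callables:
answer = sample_and_verify(lambda: "x = 42", lambda c: c.endswith("42"), n_samples=8)
```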

Open Problems: Does transfer depend on pretraining data overlap, or on structural similarity of the RL task? What is the relationship between base model diversity/coverage and downstream RL success—can diversity metrics predict which models will benefit from RL? How do distillation recipes (logit vs. hard labels, on-policy vs. off-policy) affect preservation of navigation capabilities? Can curriculum learning based on loss or uncertainty signals improve sample efficiency?

Value Diversity & Pluralistic Alignment

Most alignment pipelines optimize toward a single "correct" response—the mean of annotator preferences. But collapsing to the mean erases minority viewpoints, stylistic diversity, and legitimate disagreement. My work focuses on the distributional side: how model outputs relate to the full spectrum of human variation, not its center.

A simple illustration: if you ask a model the probability of a coin landing heads, it knows it's fifty-fifty. But ask most models to simulate ten coin tosses, and you'll get heads eight times. Models learn facts about distributions without learning to materialize them. This connects directly to pretraining data frequency—models collapse toward high-frequency patterns, which is also why they lack creativity and reproduce training data. The distributional question cuts across alignment, copyright, and personalization.
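
A minimal sketch, assuming a hypothetical `sample_model_flips` helper that prompts a model and parses its output, of how one might quantify this collapse: compare the empirical heads rate of model-simulated flips to the fair Bernoulli(0.5) the model claims to know.

```python
# Minimal sketch (assumption): quantify distributional collapse by comparing
# the empirical heads rate of model-"simulated" coin flips with the ideal
# Bernoulli(0.5). `sample_model_flips` is a hypothetical helper that prompts a
# model to simulate flips and parses its output into "H"/"T" strings.
from collections import Counter

def total_variation_from_fair(flips: list[str]) -> float:
    counts = Counter(flips)
    p_heads = counts.get("H", 0) / len(flips)
    # Total variation distance between Bernoulli(p_heads) and Bernoulli(0.5).
    return abs(p_heads - 0.5)

# Hypothetical usage:
# flips = sample_model_flips(model, n_flips=10, n_trials=100)
flips = ["H"] * 8 + ["T"] * 2  # the "eight heads out of ten" pattern above
print(total_variation_from_fair(flips))  # 0.3
```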

Open Problems: Training models to materialize distributional diversity rather than just memorizing distribution statistics. Probabilistic preference modeling that represents the spectrum rather than point estimates. Measuring diversity that matters for downstream capabilities (RL exploration, creativity) versus noise. Data scarcity for minority viewpoints and rare preferences.

Selected Publications

For the full list, please refer to my Google Scholar page.

Invited Talks

  • FAR AI San Diego Alignment Workshop at NeurIPS

    Workshop Keynote, Dec. 2025

    What Does It Mean for Agentic AI to Preserve Privacy?

  • CAMLIS 2025

    Keynote, Oct. 2025

    What Does It Mean for Agentic AI to Preserve Privacy? Mapping the New Data Sinks and Leaks

    Video

  • Cornell Tech Digital Life Seminar

    Seminar, Oct. 2025

    Contextual Privacy in LLMs: Benchmarking and Mitigating Inference-Time Risks

    Slides

  • First Workshop on LLM Security (LLMSec) at ACL 2025

    Keynote, Aug. 2025

    What Does It Mean for Agentic AI to Preserve Privacy?

    Slides

  • First Workshop on Large Language Model Memorization (L2M2) at ACL 2025

    Keynote, Aug. 2025

    Emergent Misalignment Through the Lens of Semantic Memorization

    Slides

  • Workshop on Collaborative and Federated Agentic Workflows (CFAgentic) at ICML 2025

    Invited Talk, July 2025

    What Does It Mean for Agentic AI to Preserve Privacy?

  • Fifth Workshop on Trustworthy Natural Language Processing @NAACL 2025 (TrustNLP)

    Invited Talk, May 2025

  • Stanford University (NLP Seminar)

    NLP Seminar, Jan. 2025

    Privacy, Copyright and Data Integrity: The Cascading Implications of Generative AI

    Slides | Reading List

  • University of California, Los Angeles

    Guest lecture for CS 269 - Computational Ethics, LLMs and the Future of NLP, Jan. 2025

    Privacy, Copyright and Data Integrity: The Cascading Implications of Generative AI

    Slides

  • NeurIPS Conference (Red Teaming GenAI workshop)

    Red Teaming GenAI workshop, Dec. 2024

    A False Sense of Privacy: Semantic Leakage and Non-literal Copying in LLMs

    Slides | Recording (jump to 04:50:00)

  • NeurIPS Conference (PrivacyML Tutorial)

    Panelist, Dec. 2024

    PrivacyML: Meaningful Privacy-Preserving Machine Learning tutorial

    Recording (jump to 01:52:00)

  • Johns Hopkins University

    CS Department Seminar, Dec. 2024

    Privacy, Copyright and Data Integrity: The Cascading Implications of Generative AI

    Slides

  • Future of Privacy Forum

    Panelist, Nov. 2024

    Technologist Roundtable for Policymakers: Key Issues in Privacy and AI

  • University of Utah

    Guest lecture for the School of Computing CS 6340/5340 NLP course, Nov. 2024

    Can LLMs Keep a Secret?

    Slides | Recording

  • UMass Amherst

    NLP Seminar, Oct. 2024

    Membership Inference Attacks and Contextual Integrity for Language

    Slides

  • Northeastern University

    Khoury College of Computer Sciences Security Seminar, Oct. 2024

    Membership Inference Attacks and Contextual Integrity for Language

    Slides

  • Stanford Research Institute (SRI) International

    Computational Cybersecurity in Compromised Environments (C3E) workshop, Sep. 2024

    Can LLMs keep a secret? Testing privacy implications of Language Models via Contextual Integrity

    Slides

  • LinkedIn Research

    Privacy Tech Talk, Sep. 2024

    Can LLMs keep a secret? Testing privacy implications of Language Models via Contextual Integrity

  • National Academies (NASEM)

    Forum on Cyber Resilience, Aug. 2024

    Oversharing with LLMs is underrated: the curious case of personal disclosures in human-LLM conversations

    Slides

  • ML Collective

    DLCT reading group, Aug. 2024

    Privacy in LLMs: Understanding what data is imprinted in LMs and how it might surface!

    Slides | Recording

  • Carnegie Mellon University

    Invited Talk, Jun. 2024

    Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

    Slides

  • Generative AI and Law workshop, Washington DC

    Invited Talk, Apr. 2024

    What is differential privacy? And what is it not?

    Slides

  • Meta AI Research

    Invited Talk, Apr. 2024

    Membership Inference Attacks and Contextual Integrity for Language

  • Georgia Institute of Technology

    Guest lecture for the School of Interactive Computing, Apr. 2024

    Safety in LLMs: Privacy and Memorization

  • University of Washington

    Guest lecture for CSE 484 and 582 courses on Computer Security and Ethics in AI, Apr. 2024

    Safety in LLMs: Privacy and Memorization

  • Carnegie Mellon University

    Guest lecture for LTI 11-830 course on Computational Ethics in NLP, Mar. 2024

    Safety in LLMs: Privacy and Memorization

  • Simons Collaboration

    TOC4Fairness Seminar, Mar. 2024

    Membership Inference Attacks and Contextual Integrity for Language

    Slides | Recording

  • University of California, Santa Barbara

    NLP Seminar Invited Talk, Mar. 2024

    Can LLMs Keep a Secret? Testing Privacy Implications of LLMs

    Slides

  • University of California, Los Angeles

    NLP Seminar Invited Talk, Mar. 2024

    Can LLMs Keep a Secret? Testing Privacy Implications of LLMs

    Slides

  • University of Texas at Austin

    Guest lecture for LIN 393 course on Social Applications and Impact of NLP, Feb. 2024

    Can LLMs Keep a Secret? Testing Privacy Implications of LLMs

    Slides

  • Google Brain

    Google Tech Talk, Feb. 2024

    Can LLMs Keep a Secret? Testing Privacy Implications of LLMs

    Slides | Recording

  • University of Washington

    Allen School Colloquium, Jan. 2024

    Can LLMs Keep a Secret? Testing Privacy Implications of LLMs

    Slides | Recording

  • University of Washington

    eScience Institute Seminars, Nov. 2023

Privacy Auditing and Protection in Large Language Models

    Slides

  • CISPA Helmholtz Center for Information Security

    Invited Talk, Sep. 2023

    What does privacy-preserving NLP entail?

  • Max Planck Institute for Software Systems

    Next 10 in AI Series, Sep. 2023

    Auditing and Mitigating Safety Risks in LLMs

    Slides

  • Mila / McGill University

    Invited Talk, May 2023

    Privacy Auditing and Protection in Large Language Models

  • EACL 2023

    Tutorial co-instructor, May 2023

    Private NLP: Federated Learning and Privacy Regularization

    Slides | Recording

  • LLM Interfaces Workshop and Hackathon

    Invited Talk, Apr. 2023

    Learning-free Controllable Text Generation

    Slides | Recording

  • University of Washington

    Invited Talk, Apr. 2023

    Auditing and Mitigating Safety Risks in Large Language Models

    Slides

  • NDSS Conference

    Keynote talk for EthiCS workshop, Feb. 2023

    How much can we trust large language models?

  • Google

    Federated Learning Seminar, Feb. 2023

    Privacy Auditing and Protection in Large Language Models

    Slides

  • University of Texas at Austin

    Invited Talk, Oct. 2022

    How much can we trust large language models?

    Slides

  • Johns Hopkins University

    Guest lecture for CS 601.670 course on Artificial Agents, Sep. 2022

    Mix and Match: Learning-free Controllable Text Generation

    Slides

  • KDD Conference

    Adversarial ML workshop, Aug. 2022

    How much can we trust large language models?

    Slides | Recording

  • Microsoft Research Cambridge

    Invited Talk, Mar. 2022

    What Does it Mean for a Language Model to Preserve Privacy?

    Slides

  • University of Maine

    Guest lecture for COS435/535 course on Information Privacy Engineering, Dec. 2021

    Improving Attribute Privacy and Fairness for Natural Language Processing

    Slides

  • National University of Singapore

    Invited Talk, Nov. 2021

    Style Pooling: Automatic Text Style Obfuscation for Fairness

    Slides

  • Big Science for Large Language Models

    Invited Panelist, Oct. 2021

    Privacy-Preserving Natural Language Processing

    Recording

  • Research Society MIT Manipal

    Cognizance Event Invited Talk, Jul. 2021

    Privacy and Interpretability of DNN Inference

    Slides | Recording

  • Alan Turing Institute

    Privacy and Security in ML Seminars, Jun. 2021

    Low-overhead Techniques for Privacy and Fairness of DNNs

    Slides | Recording

  • Split Learning Workshop

    Invited Talk, Mar. 2021

    Shredder: Learning Noise Distributions to Protect Inference Privacy

    Slides | Recording

  • University of Massachusetts Amherst

    Machine Learning and Friends Lunch, Oct. 2020

    Privacy and Fairness in DNN Inference

  • OpenMined Privacy Conference

    Invited Talk, Sep. 2020

    Privacy-Preserving Natural Language Processing

    Slides | Recording

  • Microsoft Research AI

    Breakthroughs Workshop, Sep. 2020

    Private Text Generation through Regularization

Awards and Honors

🏆

Tinker Academic Research Compute Grant, 2025

🏆

Modal Academic Research Compute Grant, 2025

🏆

Momental Foundation Mistletoe Research Fellowship (MRF) Finalist, 2023

🌟

Rising Star in Adversarial Machine Learning (AdvML) Award Winner, 2022. AdvML Workshop

🌟

Rising Stars in EECS, 2022. Event Page

🎓

UCSD CSE Excellence in Leadership and Service Award Winner, 2022

🌟

FAccT Doctoral Consortium, 2022. FAccT 2022

👩‍💻

Qualcomm Innovation Fellowship Finalist, 2021. Fellowship Page

👩‍💻

NCWIT (National Center for Women & IT) Collegiate Award Winner, 2020. NCWIT Awards

🎓

National University Entrance Exam in Math, 2014. Ranked 249th of 223,000

🎓

National University Entrance Exam in Foreign Languages, 2014. Ranked 57th of 119,000

🎓

National Organization for Exceptional Talents (NODET), 2008. Admitted, ~2% Acceptance Rate

Recent Co-organized Workshops & Service

[For the full list, check my CV]

Memorization and Trustworthy Foundation Models Workshop @ICML 2025 (Co-organizer)

Area Chair for COLM 2025 & 2026

Workshop on Technical AI Governance (TAIG) @ICML 2025 (Panelist)

Workshop on Collaborative and Federated Agentic Workflows (CFAgentic) @ICML 2025 (Panelist)

Privacy Session Chair at SAGAI Workshop @IEEE S&P 2025

Industry Research Experience

  • Microsoft Semantic Machines

    Fall 2022-Fall 2023 (Part-time), Summer 2022 (Intern)

    Mentors: Richard Shin, Yu Su, Tatsunori Hashimoto, Jason Eisner

  • Microsoft Research, Algorithms Group, Redmond Lab

    Winter 2022 (Intern)

    Mentors: Sergey Yekhanin, Arturs Backurs

  • Microsoft Research, Language, Learning and Privacy Group, Redmond Lab

    Summer 2021 (Intern), Summer 2020 (Intern)

    Mentors: Dimitrios Dimitriadis, Robert Sim

  • Western Digital Co. Research and Development

    Summer 2019 (Intern)

    Mentor: Anand Kulkarni

Diversity, Inclusion & Mentorship

🔹

Mentor for Women in Machine Learning (WiML) Workshop at NeurIPS 2025

🔹

Panelist at CMU School of Computer Science Panel: Navigating the Academic Job Market (2025)

🔹

Mentor on the 'How to broadcast your research to a wider audience?' panel at the ACL Mentorship Program, 2025

🔹

Mentor for the mentorship program at the WiML event at NeurIPS 2024

🔹

D&I chair at NAACL 2025

🔹

Widening NLP (WiNLP) co-chair

🔹

Socio-cultural D&I chair at NAACL 2022

🔹

Mentor for the Graduate Women in Computing (GradWIC) at UCSD

🔹

Mentor for the UC San Diego Women Organization for Research Mentoring (WORM) in STEM

🔹

Co-leader of the "Feminist Perspectives for Machine Learning & Computer Vision" break-out session at the Women in Machine Learning (WiML) 2020 Un-Workshop, held at ICML 2020

🔹

Mentor for the USENIX Security 2020 Undergraduate Mentorship Program

🔹

Volunteer at the Women in Machine Learning (WiML) 2019 Workshop, held at NeurIPS 2019

🔹

Invited Speaker at the Women in Machine Learning and Data Science (WiMLDS) NeurIPS 2019 Meetup

🔹

Mentor for the UCSD CSE Early Research Scholars Program (CSE-ERSP) in 2018

Professional Services

[Outdated, for an updated version check my CV]

Reviewer for ICLR 2022

Reviewer for NeurIPS 2021

Reviewer for ICML 2021

Shadow PC member for IEEE Security and Privacy Conference Winter 2021

Artifact Evaluation Program Committee Member for USENIX Security 2021

Reviewer for ICLR 2021 Conference

Program Committee member for the LatinX in AI Research Workshop at ICML 2020 (LXAI)

Reviewer for the 2020 Workshop on Human Interpretability in Machine Learning (WHI) at ICML 2020

Program Committee member for the MLArchSys workshop at ISCA 2020

Security & Privacy Committee Member and Session Chair for Grace Hopper Celebration (GHC) 2020

Reviewer for ICML 2020 Conference

Artifact Evaluation Program Committee Member for ASPLOS 2020

Reviewer for IEEE TC Journal

Reviewer for ACM TACO Journal

Books I Like!

📚

Range: Why Generalists Triumph in a Specialized World by D. Epstein

📚

Messy: The Power of Disorder to Transform Our Lives by T. Harford

📚

Small Is Beautiful: Economics As If People Mattered by E. F. Schumacher

📚

Quarter-life by Satya Doyle Byock

📚

The Body Keeps the Score by Bessel van der Kolk

📚

36 Views of Mount Fuji by Cathy Davidson

📚

Indistractable by Nir Eyal

📚

Sapiens: A Brief History of Humankind by Yuval Noah Harari

📚

The Martian by Andy Weir

📚

The Solitaire Mystery by Jostein Gaarder

📚

The Orange Girl by Jostein Gaarder

📚

Life is Short: A Letter to St Augustine by Jostein Gaarder

📚

The Alchemist by Paulo Coelho