Allegro Lab
Thomas Lord Department of Computer Science, University of Southern California.
This page is under development.
The AI, Language, Learning, Generalization, and Robustness (ALLeGRo) Lab studies natural language processing and machine learning, with a focus on building reliable NLP systems for a wide range of scenarios. We aim for a deeper understanding of how NLP systems work, when they fail, and how they can be improved.
Here are the research questions we have been working on recently:
- How can we scientifically understand large language models? Our scientific understanding of LLMs lags far behind our ability to engineer them. To bridge this gap, our recent work has studied in-context learning from both a data-centric and a mechanistic perspective; we have also investigated the predictability of different LLM capabilities.
- How should we benchmark modern NLP systems? I have long advocated for benchmarking the robustness and uncertainty of NLP systems. Our recent work has benchmarked generalization to long-tail examples and calibration of LLMs. We have also shown that benchmarking under distribution shift can reveal advantages of neurosymbolic approaches.
- How can smaller open-source models compete with closed-source LLMs? Continued scientific progress relies on access to strong open-source models. Our recent work has improved smaller models by training them to generate reasoning chains.
- How can advances in NLP inform other disciplines? Developments in NLP promise to have broad impacts across disparate areas of study. We have collaborated with legal experts to operationalize underspecified requirements in the EU's Digital Services Act in a manner that is both legally justified and technically feasible. I am also interested in collaborating with experts in other disciplines who want to use NLP for their own research; for example, I have built assisted curation tools for biomedical researchers.
news
- Sep 03, 2024: Welcome to the new Allegro Lab website.
selected publications
- AIES: Operationalizing content moderation "accuracy" in the Digital Services Act, 2024
- ACL Findings: Proving membership in LLM pretraining data via data watermarks, 2024
- NAACL
- EMNLP: Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering (GitHub), 2023
- EMNLP Findings
- EACL Findings
- EMNLP Findings: Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems (GitHub), 2022
- ACL
- NAACL
- EMNLP: When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models, 2024
- NeurIPS: Pre-trained Large Language Models Use Fourier Features to Compute Addition, 2024
- arXiv: Language Models can Infer Action Semantics for Classical Planners from Environment Feedback, 2024
- NeurIPS: Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models, 2024