ICLR 2023

Poster

Tue 7:30

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small
Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt

Workshop

Physics-Inspired Interpretability of Machine Learning Models
Maximilian Niroomand · David Wales

Poster

Characterizing the Influence of Graph Elements
Zizhang Chen · Peizhao Li · Hongfu Liu · Pengyu Hong

Poster

Re-calibrating Feature Attributions for Model Interpretation
Peiyu Yang · Naveed Akhtar · Zeyi Wen · Mubarak Shah · Ajmal Mian

Poster

Wed 2:30

Concept Gradient: Concept-based Interpretation Without Linear Assumption
Andrew Bai · Chih-Kuan Yeh · Neil Lin · Pradeep K Ravikumar · Cho-Jui Hsieh

Poster

Wed 7:30

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Swarnadeep Saha · Shiyue Zhang · Peter Hase · Mohit Bansal

Poster

Wed 2:30

Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns · Haotian Ye · Dan Klein · Jacob Steinhardt

Poster

Mon 2:30

PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs
James Oldfield · Christos Tzelepis · Yannis Panagakis · Mihalis Nicolaou · Ioannis Patras

Poster

Mon 7:30

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
Antonia Creswell · Murray Shanahan · Irina Higgins

Oral

Mon 7:10

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
Antonia Creswell · Murray Shanahan · Irina Higgins

Poster

Tue 2:30

Interpretations of Domain Adaptations via Layer Variational Analysis
Huan-Hsin Tseng · Hsin-Yi Lin · Kuo-Hsuan Hung · Yu Tsao

Poster

CoRTX: Contrastive Framework for Real-time Explanation
Yu-Neng Chuang · Guanchu Wang · Fan Yang · Quan Zhou · Pushkar Tripathi · Xuanting Cai · Xia Hu
