60 Results
Poster
|
Tue 7:30 |
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt |
|
Workshop
|
Physics-Inspired Interpretability of Machine Learning Models Maximilian Niroomand · David Wales |
||
Poster
|
Characterizing the Influence of Graph Elements Zizhang Chen · Peizhao Li · Hongfu Liu · Pengyu Hong |
||
Poster
|
Re-calibrating Feature Attributions for Model Interpretation Peiyu Yang · Naveed Akhtar · Zeyi Wen · Mubarak Shah · Ajmal Mian |
||
Poster
|
Wed 2:30 |
Concept Gradient: Concept-based Interpretation Without Linear Assumption Andrew Bai · Chih-Kuan Yeh · Neil Lin · Pradeep K Ravikumar · Cho-Jui Hsieh |
|
Poster
|
Wed 7:30 |
Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees Swarnadeep Saha · Shiyue Zhang · Peter Hase · Mohit Bansal |
|
Poster
|
Wed 2:30 |
Discovering Latent Knowledge in Language Models Without Supervision Collin Burns · Haotian Ye · Dan Klein · Jacob Steinhardt |
|
Poster
|
Mon 2:30 |
PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs James Oldfield · Christos Tzelepis · Yannis Panagakis · Mihalis Nicolaou · Ioannis Patras |
|
Poster
|
Mon 7:30 |
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning Antonia Creswell · Murray Shanahan · Irina Higgins |
|
Oral
|
Mon 7:10 |
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning Antonia Creswell · Murray Shanahan · Irina Higgins |
|
Poster
|
Tue 2:30 |
Interpretations of Domain Adaptations via Layer Variational Analysis Huan-Hsin Tseng · Hsin-Yi Lin · Kuo-Hsuan Hung · Yu Tsao |
|
Poster
|
CoRTX: Contrastive Framework for Real-time Explanation Yu-Neng Chuang · Guanchu Wang · Fan Yang · Quan Zhou · Pushkar Tripathi · Xuanting Cai · Xia Hu |