

(7 events)
Oral
Fri Apr 24 11:15 AM -- 11:25 AM (PDT) @ Amphitheater
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Akshat Ramachandran ⋅ Marina Neseem ⋅ Charbel Sakr ⋅ Rangharajan Venkatesan ⋅ Brucek Khailany ⋅ Tushar Krishna
Oral
Fri Apr 24 11:27 AM -- 11:37 AM (PDT) @ Amphitheater
MrRoPE: Mixed-radix Rotary Position Embedding
Qingyuan Tian ⋅ Wenhong Zhu ⋅ Xiaoran Liu ⋅ Xiaofeng Wang ⋅ Rui Wang
Oral
Fri Apr 24 11:39 AM -- 11:49 AM (PDT) @ Amphitheater
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Ang Lv ⋅ Jin Ma ⋅ Yiyuan Ma ⋅ Siyuan Qiao
Oral
Fri Apr 24 11:51 AM -- 12:01 PM (PDT) @ Amphitheater
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Federico Danieli ⋅ Pau Rodriguez ⋅ Miguel Sarabia ⋅ Xavier Suau ⋅ Luca Zappella
Oral
Fri Apr 24 12:03 PM -- 12:13 PM (PDT) @ Amphitheater
Mamba-3: Improved Sequence Modeling using State Space Principles
Aakash Sunil Lahoti ⋅ Kevin Li ⋅ Berlin Chen ⋅ Caitlin Wang ⋅ Aviv Bick ⋅ Zico Kolter ⋅ Tri Dao ⋅ Albert Gu
Oral
Fri Apr 24 12:15 PM -- 12:25 PM (PDT) @ Amphitheater
Energy-Based Transformers are Scalable Learners and Thinkers
Alexi Gladstone ⋅ Ganesh Nanduru ⋅ Md Mofijul Islam ⋅ Peixuan Han ⋅ Hyeonjeong Ha ⋅ Aman Chadha ⋅ Yilun Du ⋅ Heng Ji ⋅ Jundong Li ⋅ Tariq Iqbal
Oral
Fri Apr 24 12:27 PM -- 12:37 PM (PDT) @ Amphitheater
Transformers are Inherently Succinct
Pascal Bergsträßer ⋅ Ryan Cotterell ⋅ Anthony W. Lin