Oral
Thu Apr 23 11:15 AM -- 11:25 AM (PDT)
High-dimensional Analysis of Synthetic Data Selection
[OpenReview]
Oral
Thu Apr 23 11:27 AM -- 11:37 AM (PDT)
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
[OpenReview]
Oral
Thu Apr 23 11:39 AM -- 11:49 AM (PDT)
Sequences of Logits Reveal the Low Rank Structure of Language Models
[OpenReview]
Oral
Thu Apr 23 11:51 AM -- 12:01 PM (PDT)
Intrinsic Entropy of Context Length Scaling in LLMs
[Slides] [OpenReview]
Oral
Thu Apr 23 12:03 PM -- 12:13 PM (PDT)
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
[OpenReview]
Oral
Thu Apr 23 12:15 PM -- 12:25 PM (PDT)
The Coverage Principle: How Pre-Training Enables Post-Training
[OpenReview]
Oral
Thu Apr 23 12:27 PM -- 12:37 PM (PDT)
Quantitative Bounds for Length Generalization in Transformers
[OpenReview]