This year's ICLR DEI chairs initiated a Call for Tiny Papers, announcing an alternative publishing format to encourage wider and more diverse engagement from researchers. This new, additional track at ICLR 2023 received 219 submissions; after 3 months of reviewing, meta-reviewing, and decision making, a selection of papers (14 virtual and in-person oral presentations, 13 in-person poster presentations) was invited to present, and the whole initiative is celebrated on the Tiny Papers Showcase Day.
Fri 12:00 a.m. - 12:15 a.m.
Opening Remarks (Presentation)
Rosanne Liu · Thomas F Burns · Krystal Maughan
Fri 12:15 a.m. - 12:30 a.m.
Secure communication model for quantum federated learning: A proof of concept (Virtual oral)
We design a model of Post-Quantum Cryptography (PQC) Quantum Federated Learning (QFL). We develop a proof of concept with dynamic server selection and study convergence and security conditions.
Dev Gurung
Fri 12:30 a.m. - 12:45 a.m.
The Point to Which Soft Actor-Critic Converges (Virtual oral)
Soft actor-critic is a successful successor to soft Q-learning. Although both live under the maximum entropy framework, their relationship is still unclear. In this paper, we prove that in the limit they converge to the same solution. This is appealing since it recasts an arduous optimization as an easier one. The same justification can also be applied to other regularizers such as the KL divergence.
Jianfei Ma
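As background for the convergence claim above, here is the maximum-entropy objective shared by soft Q-learning and soft actor-critic, together with the soft-optimal policy it induces (written for discrete actions). These are standard textbook formulas, not equations taken from the paper.

J(\pi) = \sum_t \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\big[\, r(s_t,a_t) + \alpha\,\mathcal{H}(\pi(\cdot\mid s_t)) \,\big],
\qquad
\pi^*(a\mid s) = \exp\!\Big(\tfrac{1}{\alpha}\big(Q^*_{\mathrm{soft}}(s,a) - V^*_{\mathrm{soft}}(s)\big)\Big),
\qquad
V^*_{\mathrm{soft}}(s) = \alpha \log \sum_a \exp\!\Big(\tfrac{1}{\alpha} Q^*_{\mathrm{soft}}(s,a)\Big).

Soft Q-learning approaches this fixed point through the soft Bellman backup, while soft actor-critic alternates soft policy evaluation and improvement; the paper's result concerns their agreement in the limit.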
Fri 12:45 a.m. - 1:00 a.m.
Large Sparse Kernels for Federated Learning (Virtual oral)
Existing approaches to address non-iid data in federated learning are often tailored to specific types of heterogeneity and may lack generalizability to all scenarios. In this paper, we present empirical evidence supporting the claim that employing large sparse convolution kernels can lead to enhanced robustness against distribution shifts in the context of federated learning for various non-iid problems, including imbalanced data volumes, different feature spaces, and label distributions. Our experimental results demonstrate that the substitution of convolutional kernels with large sparse kernels can yield substantial improvements in the ability to resist non-iid problems across multiple methods.
feilong zhang
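A minimal sketch of the kind of substitution the abstract describes: a convolution with a large receptive field whose kernel is mostly zeroed by a fixed sparsity mask. The module name, the 9x9 kernel size, and the 70% sparsity level are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeSparseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=9, sparsity=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        # Fixed binary mask: roughly `sparsity` of the kernel entries are zeroed.
        mask = (torch.rand_like(self.conv.weight) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Re-apply the mask on every forward pass so the zeroed positions stay zero.
        return F.conv2d(x, self.conv.weight * self.mask, self.conv.bias,
                        padding=self.conv.padding)

y = LargeSparseConv2d(3, 16)(torch.randn(1, 3, 32, 32))  # drop-in for a small dense conv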
Fri 1:00 a.m. - 1:15 a.m.
Pay Attention to Multi-Channel for Improving Graph Neural Networks (Virtual oral)
We propose Multi-channel Graph Attention (MGAT) to efficiently handle channel-specific representations encoded by convolutional kernels, enhancing the incorporation of attention with graph convolutional network (GCN)-based architectures. Our experiments demonstrate the effectiveness of integrating our proposed MGAT with various spatial-temporal GCN models for improving prediction performance.
Chung-Yi Lin
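A heavily simplified sketch of per-channel attention over GCN node features, included only to make the idea of channel-specific attention concrete; the layer name, pooling choice, and shapes are assumptions, and the paper's MGAT module may differ substantially.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        # One attention logit per channel, computed from channel-wise mean activations.
        self.score = nn.Linear(num_channels, num_channels)

    def forward(self, h):
        # h: (num_nodes, num_channels) node features produced by a GCN layer.
        pooled = h.mean(dim=0, keepdim=True)                 # (1, num_channels)
        weights = torch.softmax(self.score(pooled), dim=-1)  # attention over channels
        return h * weights                                   # reweight each channel

out = ChannelAttention(64)(torch.randn(100, 64))  # 100 nodes, 64 channels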
Fri 1:15 a.m. - 1:30 a.m.
SoftEDA: Rethinking Rule-Based Data Augmentation with Soft Labels (Virtual oral)
Rule-based text data augmentation is widely used for NLP tasks due to its simplicity. However, this method can potentially damage the original meaning of the text, ultimately hurting the performance of the model. To overcome this limitation, we propose a straightforward technique for applying soft labels to augmented data. We conducted experiments across seven different classification tasks and empirically demonstrated the effectiveness of our proposed approach. We have publicly released our source code for reproducibility.
Juhwan Choi
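A minimal sketch of pairing rule-based augmentation with soft labels, as the abstract proposes. The placeholder augmentation (a random token swap) and the smoothing value eps are illustrative assumptions, not the paper's exact recipe.

import random

def random_swap(tokens):
    # Placeholder rule-based augmentation: swap two randomly chosen tokens.
    tokens = list(tokens)
    if len(tokens) > 1:
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def soft_label(true_class, num_classes, eps=0.1):
    # Augmented examples receive a softened target instead of a hard one-hot label.
    label = [eps / (num_classes - 1)] * num_classes
    label[true_class] = 1.0 - eps
    return label

augmented = random_swap("the movie was surprisingly good".split())
target = soft_label(true_class=1, num_classes=2)  # e.g. [0.1, 0.9]

Training then proceeds with the usual cross-entropy loss, with the soft target standing in for the one-hot label on augmented examples.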
Fri 1:30 a.m. - 2:00 a.m.
Break
Fri 2:00 a.m. - 2:15 a.m.
Compound Tokens: Channel Fusion for Vision-Language Representation Learning (Oral)
We present an effective method for fusing visual-and-language representations for several question answering tasks including visual question answering and visual entailment. In contrast to prior works that concatenate unimodal representations or use only cross-attention, we compose multimodal representations via channel fusion. By fusing on the channels, the model is able to more effectively align the tokens compared to standard methods. These multimodal representations, which we call compound tokens, are generated with cross-attention transformer layers. We demonstrate the effectiveness of compound tokens using an encoder-decoder vision-language model trained end-to-end in the open-vocabulary setting. Compound Tokens achieve highly competitive performance across a range of question answering tasks including GQA, VQA2.0, and SNLI-VE.
Maxwell Aladago
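A minimal sketch of channel fusion as described above: align the two modalities with cross-attention, then concatenate along the channel (feature) dimension rather than the token dimension. The shapes and the single attention layer are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

d, n_img, n_txt = 256, 49, 16
img = torch.randn(1, n_img, d)   # image tokens
txt = torch.randn(1, n_txt, d)   # text tokens

cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)
# Text queries attend to image tokens, yielding one image-grounded vector per text token.
img_for_txt, _ = cross_attn(query=txt, key=img, value=img)

# Channel fusion: each compound token carries both modalities in its channels.
compound_tokens = torch.cat([txt, img_for_txt], dim=-1)  # (1, n_txt, 2 * d)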
Fri 2:15 a.m. - 2:30 a.m.
SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data (Oral)
Training sophisticated machine learning (ML) models requires large datasets that are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies realistic synthetic dataset generation from ordinary differential equation-based models with the direct analysis and inclusion of these datasets in ML pipelines. SimbaML conveniently enables investigating transfer learning from synthetic to real-world data, data augmentation, identifying needs for data collection, and benchmarking physics-informed ML approaches. SimbaML is available from https://anonymous.4open.science/r/Simba_ML-4884.
Lukas Drews
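An illustrative sketch of the workflow the abstract describes: simulate an ODE-based mechanistic model, add observation noise to obtain synthetic training data, and feed it to an ML model. It uses generic scipy/sklearn calls and a toy logistic-growth ODE, not SimbaML's actual API.

import numpy as np
from scipy.integrate import solve_ivp
from sklearn.ensemble import RandomForestRegressor

def logistic_growth(t, y, r=0.5, k=100.0):
    # Simple mechanistic model: population growth with carrying capacity k.
    return r * y * (1.0 - y / k)

t_eval = np.linspace(0, 20, 200)
sol = solve_ivp(logistic_growth, (0, 20), y0=[1.0], t_eval=t_eval)

rng = np.random.default_rng(0)
synthetic_obs = sol.y[0] + rng.normal(scale=2.0, size=sol.y[0].shape)  # noisy observations

# Use the synthetic data in an ordinary ML pipeline (here, predicting population from time).
model = RandomForestRegressor().fit(t_eval.reshape(-1, 1), synthetic_obs)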
Fri 2:30 a.m. - 2:45 a.m.
Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models (Oral)
Massive language models with billions of parameters have significant compute expenses and thus can benefit from pruning. Pruning techniques for massive models are typically iterative and require extensive weight retraining after pruning. SparseGPT, a recently introduced one-shot technique for pruning such models, enables pruning without retraining. We improve upon SparseGPT by fine-tuning during pruning with minimal training steps, and we perform experiments against magnitude pruning and find that our iteratively fine-tuned SparseGPT models significantly outperform their magnitude pruning counterparts at high sparsity.
Aaquib Syed · Phillip Guo
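A hedged sketch of the general prune-then-tune loop the abstract builds on: prune weights, then run a small number of fine-tuning steps before pruning further. It uses plain magnitude pruning for illustration; it is not SparseGPT, and the sparsity schedule, optimizer, and step count are assumptions.

import torch

def magnitude_prune_(weight, sparsity):
    # Zero the smallest-magnitude entries so that a `sparsity` fraction is removed.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.mul_((weight.abs() > threshold).float())

def prune_and_tune(model, data_loader, loss_fn, sparsities=(0.5, 0.7, 0.9), steps=50):
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    for sparsity in sparsities:                     # gradually increase sparsity
        for p in model.parameters():
            if p.dim() > 1:                         # prune weight matrices, not biases
                magnitude_prune_(p.data, sparsity)
        for _, (x, y) in zip(range(steps), data_loader):  # minimal fine-tuning
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()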
Fri 2:45 a.m. - 3:00 a.m.
Optimizing MPJPE promotes miscalibration in multi-hypothesis human pose lifting (Virtual oral)
Due to depth ambiguities and occlusions, lifting 2D poses to 3D is a highly ill-posed problem. Well-calibrated distributions of possible poses can make these ambiguities explicit and preserve the resulting uncertainty for downstream tasks, thus providing the necessary trustworthiness in safety-critical domains. This study shows that multiple hypothesis pose estimation methods produce miscalibrated distributions. We identify that the miscalibration can be attributed to the optimization of the mean per joint position error (MPJPE). In a series of simulations, we show that optimizing MPJPE promotes miscalibration.
Paweł Pierzchlewicz
Fri 3:00 a.m. - 3:15 a.m.
Theta sequences as eligibility traces: A biological solution to credit assignment (Virtual oral)
Credit assignment problems, for example policy evaluation in RL, often require bootstrapping prediction errors through preceding states or maintaining temporally extended memory traces; solutions which are unfavourable or implausible for biological networks of neurons. We propose theta sequences -- chains of neural activity during theta oscillations in the hippocampus, thought to represent rapid playthroughs of awake behaviour -- as a solution. By analysing and simulating a model of theta sequences, we show they compress behaviour such that existing but short O(10) ms neuronal memory traces are effectively extended, allowing for bootstrap-free credit assignment without long memory traces, equivalent to the use of eligibility traces in TD(λ).
Tom George
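For reference alongside the abstract's claim of equivalence, here is the standard tabular TD(λ) update with eligibility traces; this is the textbook algorithm, not the paper's hippocampal model.

import numpy as np

def td_lambda_episode(transitions, num_states, alpha=0.1, gamma=0.99, lam=0.9):
    # transitions: iterable of (state, reward, next_state) tuples from one episode.
    V = np.zeros(num_states)
    e = np.zeros(num_states)                   # eligibility trace per state
    for s, r, s_next in transitions:
        delta = r + gamma * V[s_next] - V[s]   # TD error
        e *= gamma * lam                       # decay all traces
        e[s] += 1.0                            # bump the trace of the visited state
        V += alpha * delta * e                 # credit recent states via their traces
    return V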
Fri 3:15 a.m. - 4:15 a.m.
Lunch and Reflections
Grab lunch and join a casual reflection session in the room to talk about your experience in this brand-new, experimental track.
Fri 4:15 a.m. - 5:15 a.m.
Flash orals
In-person poster presenters are invited to summarise their work in a 3-minute presentation and will have time for a maximum of 2 questions from the audience during the flash session. The in-person poster session (without SlidesLive or virtual component) will follow.
Fri 5:15 a.m. - 6:15 a.m.
Poster session
Fri 6:15 a.m. - 6:45 a.m.
Break
Fri 6:45 a.m. - 7:00 a.m.
Tiny Attention: A Simple yet Effective Method for Learning Contextual Word Embeddings (Virtual oral)
Contextual Word Embedding (CWE) obtained via the Attention Mechanism in Transformer (AMT) models is one of the key drivers of the current revolution in Natural Language Processing. Previous techniques for learning CWEs are not only inferior to AMT but are also largely subpar to the simple bag-of-words baseline. Though there have been many variants of the Transformer model, the attention mechanism itself remains unchanged and is largely opaque. We introduce a new method for learning CWEs that uses a simple and transparent attention mechanism. Our method is derived from SVD-based Syntagmatic Word Embeddings, which capture word associations. We test our model on the Word-in-Context dataset and show that it outperforms the simple but tough-to-beat baseline by a substantial margin.
Renjith P Ravindran
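A hedged sketch of a simple, transparent attention over static word vectors, where each word's contextual embedding is a similarity-weighted average of the other words in the sentence. The random static vectors here are a stand-in for the paper's SVD-based Syntagmatic Word Embeddings, and the weighting scheme is an assumption.

import numpy as np

def contextualize(word_vectors):
    # word_vectors: (sentence_length, dim) array of static embeddings.
    sims = word_vectors @ word_vectors.T                              # pairwise similarities
    sims -= sims.max(axis=1, keepdims=True)                           # numerical stability
    weights = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)  # softmax per word
    return weights @ word_vectors                                     # contextual embeddings

sentence = np.random.randn(5, 50)      # five words, 50-dimensional static vectors
contextual = contextualize(sentence)   # (5, 50) context-aware vectors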
Fri 7:00 a.m. - 7:15 a.m.
Decomposing Causality and Fairness (Virtual oral)
It is often informative to decompose key quantities of interest into smaller components in order to develop a better understanding of the key quantity. In this paper, we focus on causality and fairness, where bias attribution can be particularly useful. We show how quantities can be broken down based on independence or conditional independence criteria, and how such a decomposition can be used as a diagnostic tool.
Peter Hill
Fri 7:15 a.m. - 7:30 a.m.
Geodesic Mode Connectivity (Virtual oral)
Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.
Charlie Tan
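For reference, the standard information-geometric definitions behind the abstract's use of "geodesic": the Fisher information metric on a family of parameterized distributions and the path energy a geodesic minimizes. These are textbook formulas, not the paper's specific approximation algorithm.

G_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \partial_{\theta_i} \log p_\theta(x)\, \partial_{\theta_j} \log p_\theta(x) \right],
\qquad
E(\gamma) = \int_0^1 \dot{\gamma}(t)^{\top} G\big(\gamma(t)\big)\, \dot{\gamma}(t)\, \mathrm{d}t .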
Fri 7:30 a.m. - 7:45 a.m.
MetaXLR - Mixed Language Meta Representation Transformation for Low-resource Cross-lingual Learning based on Multi-Armed Bandit (Virtual oral)
Transfer learning for extremely low-resource languages is a challenging task, as there are no large-scale monolingual corpora for pre-training and insufficient annotated data for fine-tuning. We follow the work of Xia et al. (2021), which suggests using meta learning for transfer learning from a single source language to an extremely low-resource one. We propose an enhanced approach that uses multiple source languages chosen in a data-driven manner. In addition, we introduce a sample selection strategy that chooses among the languages during training with a multi-armed bandit algorithm. With both of these improvements we achieve state-of-the-art results on the NER task for extremely low-resource languages with the same amount of data, yielding representations that generalize better. Moreover, because the method can draw on multiple languages, the framework can use much larger amounts of data, while still outperforming the former MetaXL method when given the same amount of data.
Liat Bezalel · Eyal Orgad
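A hedged sketch of selecting source languages with a multi-armed bandit, in the spirit of the sampling strategy the abstract mentions. The EXP3-style update, the reward definition, and the hyper-parameters are illustrative assumptions rather than the paper's exact algorithm.

import numpy as np

class LanguageBandit:
    def __init__(self, languages, gamma=0.1):
        self.languages = languages
        self.gamma = gamma
        self.weights = np.ones(len(languages))

    def probs(self):
        w = self.weights / self.weights.sum()
        # Mix in uniform exploration so every language keeps being sampled occasionally.
        return (1 - self.gamma) * w + self.gamma / len(self.languages)

    def sample(self):
        return int(np.random.choice(len(self.languages), p=self.probs()))

    def update(self, arm, reward):
        # Importance-weighted reward estimate, as in EXP3, assuming reward in [0, 1].
        estimate = reward / self.probs()[arm]
        self.weights[arm] *= np.exp(self.gamma * estimate / len(self.languages))

bandit = LanguageBandit(["sw", "am", "yo", "ha"])
arm = bandit.sample()          # pick the source language for the next training batch
bandit.update(arm, reward=0.3) # e.g. reward = improvement on a meta-validation metric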
Fri 7:45 a.m. - 8:00 a.m.
Closing Remarks (dinner to follow)
Rosanne Liu · Thomas F Burns · Krystal Maughan