Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Orchid: Flexible and Data-Adaptive Convolution for Sequence Modeling
Mahdi Karami · Ali Ghodsi
In the rapidly evolving landscape of deep learning, the quest for models that balance expressivity with computational efficiency has never been more critical. Orchid is designed to address the quadratic computational complexity of attention without sacrificing the model's ability to capture long-range dependencies. At the core of Orchid lie data-adaptive convolution layers, which conditionally adjust their kernels based on the input using a conditioning neural network. This approach enables the model to remain scalable and efficient for long sequence lengths. The adaptivity of the convolution kernels, combined with gating operations, yields a highly expressive architecture. We rigorously evaluate Orchid across multiple domains, including language modeling and image classification, to showcase its generality and performance. Our experiments demonstrate that Orchid not only consistently outperforms traditional attention-based architectures in most scenarios but also extends the feasible sequence length beyond the constraints of dense attention layers. This achievement marks a significant milestone in the pursuit of more efficient and scalable deep learning models for sequence modeling.
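To make the core idea concrete, here is a minimal NumPy sketch of an input-conditioned (data-adaptive) long convolution with gating. The conditioning network, shapes, and parameter names (`W_cond`, `gate_w`) are illustrative assumptions, not Orchid's actual parameterization; the FFT step shows how a global convolution runs in O(L log L) rather than the O(L^2) of dense attention.

```python
import numpy as np

def data_adaptive_conv(x, W_cond, gate_w):
    """Toy data-adaptive convolution: the kernel is generated from the input.

    x:      (seq_len, d) input sequence
    W_cond: (d, seq_len) weights of a toy conditioning network mapping
            pooled input features to a length-seq_len kernel (assumed shape)
    gate_w: (d,) per-channel gating weights (assumed)
    """
    seq_len, d = x.shape
    # Conditioning network: derive the convolution kernel from the input itself
    pooled = x.mean(axis=0)                       # (d,) pooled summary of x
    kernel = np.tanh(pooled @ W_cond)             # (seq_len,) input-dependent kernel
    # Global convolution via FFT: O(L log L) instead of quadratic cost
    Xf = np.fft.rfft(x, axis=0)                   # spectrum of the sequence
    Kf = np.fft.rfft(kernel)[:, None]             # spectrum of the kernel
    y = np.fft.irfft(Xf * Kf, n=seq_len, axis=0)  # circular convolution
    # Gating: multiplicative interaction adds expressivity
    gate = 1.0 / (1.0 + np.exp(-(x * gate_w)))    # elementwise sigmoid gate
    return gate * y

rng = np.random.default_rng(0)
L, d = 16, 4
x = rng.standard_normal((L, d))
W_cond = rng.standard_normal((d, L)) * 0.1
gate_w = rng.standard_normal(d)
out = data_adaptive_conv(x, W_cond, gate_w)
print(out.shape)  # (16, 4): same shape as the input sequence
```

Because the kernel is a function of the input, the same layer can realize different mixing patterns on different sequences, which is the source of the added expressivity relative to a fixed convolution.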