MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Huu Nguyen ⋅ Victor May ⋅ Harsh Raj ⋅ Marianna Nezhurina ⋅ Yishan Wang ⋅ Yanqi Luo ⋅ Minh Chien Vu ⋅ Taishi Nakamura ⋅ Ken Tsui ⋅ Van Nguyen ⋅ David Salinas ⋅ Aleksandra Krasnodębska ⋅ Christoph Schuhmann ⋅ Mats L. Richter ⋅ Xuan-Son Vu ⋅ Jenia Jitsev

Abstract