Training large language models has become a defining pursuit in modern machine learning—one that is almost entirely led by industry, fueled by massive computational resources and guided by scaling laws that reward ever-larger models and datasets. For academic researchers, participating in this space can feel out of reach. The barriers—limited compute, infrastructure, and access to proprietary data—are real and growing. Still, I believe academia has an essential role to play. Even with constraints, there are important scientific questions and meaningful opportunities that academic research is uniquely positioned to tackle. By engaging with the training process itself, we can deepen our understanding of language models and develop novel and efficient approaches that complement large-scale efforts. In this talk, I’ll share my lab’s research efforts over the past two years in both pre-training and post-training of language models under an academic budget. Our work has aimed to better understand training dynamics, innovate within limitations, and release artifacts that benefit the broader research community. I’ll also highlight three areas where academic researchers can make significant contributions: (1) developing small but capable models, (2) understanding and improving training data, and (3) advancing post-training methods on top of open-weight models. My hope is to encourage broader engagement with LM training in academia, and to foster new forms of collaboration between academic and industry research.
AI Multi-Agent Systems in Enterprise: Bridging Research and Real-World Applications
This social aims to explore the journey of AI multi-agent systems from academic research to deployment in enterprise settings. Discussions will focus on the challenges and successes encountered during this transition, including integration strategies, scalability, and the impact on business operations.
ML for Digital Twins
The emerging field of digital twins, virtual replicas of physical systems, represents a significant frontier for machine learning research with broad applications across industries. This social will bring together researchers interested in the unique challenges of leveraging ML to create, improve, and deploy digital twins.
This will be an interactive discussion focusing on key questions broadly related to the fidelity, scientific accuracy and limitations of ML for digital twins.
The session will feature brief introductions from participants working in this area, followed by open discussion and potential collaboration opportunities. We welcome researchers from diverse ML backgrounds including reinforcement learning, generative modeling, time-series forecasting, and domain experts from industries leveraging digital twins.
Join us to explore this rapidly evolving intersection of ML theory and practical applications transforming industries including manufacturing, healthcare, and climate science.
ML Safety Social
As AI systems become increasingly capable and widely deployed, ensuring their safety and reliability is more important than ever. Researchers in the ML Safety community are working on various challenges, including interpretability, adversarial robustness, and alignment, which have become more complex with advances in multi-modal and agentic systems. This rapidly evolving field spans industry labs and academic groups, united by the need to address emerging risks.
We want to host a semi-structured meet-up for researchers who are currently working on or interested in safety-related topics to foster discussion and collaboration. We expect at least 150 people to attend. We previously hosted similar events at NeurIPS, ICML, and ICLR in 2023 and 2024, which were very well attended (150-300 people).
The event will open with a 30-minute panel discussion on the state of ML safety research, followed by a brief Q&A session. The rest of the event will consist of informal discussion and mingling among attendees. We will provide drinks and snacks.
ML in Software Engineering
Discuss ongoing work, upcoming trends, challenges, and job opportunities related to applications of ML in software engineering tools and processes.
Mentorship Hour
MENTORS: Furong Huang, Tatsunori Hashimoto, Erin Grant
Part of the ICLR experience is meeting people and talking with them about their research interests and experiences. To facilitate these conversations, we are thrilled to announce the third iteration of Mentoring Chats at ICLR (previously called Office Hours). Mentoring Chats will be 45-minute round-table sessions, held during lunch (12:30-1:15 pm and 1:15-2:00 pm) in the Topaz Concourse every day of the main conference (April 24-26). A bell will ring approximately 22 minutes in, prompting participants to switch tables or switch topics while staying at the same table. Following ICLR 2024, we have a list of topics and questions that you may wish to ask mentors. We hope to see you there!
Research agenda
- Where should I start if I want to do research in ML? What kind of mathematical/programming skills are required for ML research?
- What are good courses to take? How should I use different modes of learning, such as classroom courses, video lectures, and reading a book?
- How to keep track of all the research literature? How to balance breadth vs depth?
- What are some broader goals of academic machine learning research in the era of LLMs?
- How can one set themselves apart in this crowded research space?
- What is ethical research?
- How to decide on a research area? How to decide on a research project?
- How to adapt my research according to the current trends/community interests?
- How to cope with the pressure of publishing while working on riskier/harder projects? Should I be worried about other groups scooping my research and how to deal with such situations?
- Should I establish myself as an expert in one area/technique or explore a breadth of topics? Should I master a technique and apply it to different problems, or should I master a subfield by finding all useful techniques (hammer vs nails)?
ML+X: Multidisciplinary research
- What are good strategies for starting an interdisciplinary project?
- When working across disciplines, should I have one of them as my “home” community or try to be equally visible in both?
- What are the most efficient ways to help establish my ML+X area as a more active area? Should I organize workshops, teach tutorials, ...?
- How to deal with different incentive structures in interdisciplinary collaborations (e.g., journals vs conferences)?
Advisor and collaborators
- Should I follow my advisor’s agenda or define my own?
- What are the pros and cons of being co-advised?
- When is it appropriate to change advisors and how to go about it?
- How to navigate conflicts with an advisor?
- How to get a good balance between collaborating with other researchers while also distinguishing my own research? Will too much collaboration hurt my job prospects?
- What to look for in a collaborator?
- How do I convey the level of commitment I am willing to have in a project without it being awkward? How to say no to collaborations?
- How to navigate different conventions with respect to author ordering? Alphabetical vs contribution-based ordering? Should my advisor always be a coauthor because they are funding me?
- What do I do if my collaborator is not responsive?
Communicating research and networking
- How to find mentors and allies beyond my advisor?
- What is the best way to communicate my research? Blogs, videos, presentations?
- How to write a good research statement? How to apply for fellowships?
- Should I present my work in poster sessions and workshops? Should I be scared of getting scooped? What are the pros of presenting my work early?
Beyond your institution: Internships and research visits
- Should I do a research internship on a topic different from my dissertation?
- Does it make sense to do a software engineering/development internship if it is not research-related?
- When is a good time to look for internships? Should I apply online or email people?
- Should I do research visits to other universities? Does it make sense to go to semester-long programs as a junior student?
- How to get the most out of my internship? What should be the main goal of doing an internship?
Planning after grad school: academia vs industry
- What should I consider when planning for the next step? How should I decide whether to go to academia or industry?
- How to select a postdoc advisor?
- Should I apply to different departments than my core department? How can I prepare for that, and how early?
- Is it ok to quit your PhD? How can I plan my next steps if so?
Work ethics, open research discussion, personal challenges
- How to balance work and life? How much work is too much work?
- How to take care of mental and physical health?
- How to learn about the ethical implications around the topics of my research?
- How to foster inclusion in research and teaching?
Improving LLM Benchmarks: Making AI Work for Real-World Needs
To make AI models truly useful in real-world settings, we need better ways to measure their performance. This talk will focus on how we can improve benchmarks, ensuring LLMs are tested in ways that reflect actual business challenges.
Jonathan will discuss how using real user feedback and industry-specific examples can create more meaningful tests for AI models. We’ll explore ways to measure AI performance on practical tasks that require applying a model’s conceptual understanding, complementing the many existing benchmarks that evaluate that understanding directly.
By designing evaluation methods that reflect real-world use, we can help bridge the gap between research and business, making AI more effective and reliable in everyday applications.
About the Speaker:
Jonathan Siddharth
Founder and Chief Executive Officer, Turing
Jonathan Siddharth is the Founder and CEO of Turing, one of the world's fastest-growing AI companies accelerating the advancement and deployment of powerful AI systems. Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM and frontier knowledge; and leveraging that expertise to build real-world AI systems that solve mission-critical priorities for Fortune 500 companies and government institutions.
Siddharth is a rare blend of AI scientist and serial tech entrepreneur, with a track record of building transformative AI systems and scaling successful ventures. He helped pioneer natural language search at Powerset, which was acquired by Microsoft, and went on to architect large-scale AI platforms at Rover, a content discovery engine he co-founded and led as CEO that was later acquired by Revcontent, and at Turing, where he continues to lead cutting-edge innovation.
Beyond his work at Turing, Siddharth has served on the board of Quora, the global knowledge-sharing platform, and is an active investor and advisor to StartX, Stanford’s premier startup accelerator, where he supports the next generation of founders.
He earned his master’s degree in computer science from Stanford University, graduating with distinction in research for his work applying machine learning to web search.
Leveraging Multimodal LLMs for Shopify’s Global Catalogue
As marketing channels rapidly evolve, Shopify’s Global Catalogue initiative aims to improve product discoverability by consolidating millions of products from diverse shops into a single unified system, enabling integration with next-generation platforms such as AI agents and virtual realities. This expo talk will present the core components of this initiative, focusing on the integration of multimodal LLMs to enrich product metadata. We’ll explore the processes of data curation, model fine-tuning, experimentation, evaluation, and feedback loops, showcasing our approach to building and continuously improving these models. We’ll also cover how we leveraged open-source tools to scale and deploy these models, serving real-time predictions for around 40 million LLM calls, or about 16 billion tokens, daily. Finally, we’ll highlight how these enriched data representations are currently advancing conversational commerce, enhancing search functionalities, and improving personalization.
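Those throughput figures imply a useful back-of-the-envelope number; both inputs below come from the abstract, while the per-call average is derived rather than stated:

```python
# Scale quoted in the abstract; the average tokens per call is derived.
calls_per_day = 40_000_000
tokens_per_day = 16_000_000_000
print(tokens_per_day / calls_per_day)  # 400.0 tokens per LLM call on average
```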
Kvax: Fast and easy-to-use Flash Attention implementation for JAX
Kvax is a custom FlashAttention implementation for JAX, optimised for long-context training with efficient document mask computation and context parallelism. This talk explores the key ideas behind its implementation, focusing on how those two optimisations are achieved.
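To make the document-mask idea concrete, here is a minimal reference sketch in plain JAX. It is illustrative only and assumes nothing about Kvax’s actual API (the function names are hypothetical); a real FlashAttention kernel fuses and tiles this computation instead of materialising the full score matrix:

```python
# Illustrative only: naive block-diagonal "document mask" attention.
# Kvax's contribution is computing this pattern efficiently inside a fused
# FlashAttention kernel; this sketch just shows the masking semantics.
import jax
import jax.numpy as jnp

def document_mask(doc_ids):
    # doc_ids: (seq_len,) integer document id per token in a packed sequence.
    # A token may attend only to tokens from the same document.
    return doc_ids[:, None] == doc_ids[None, :]

def masked_attention(q, k, v, mask):
    # q, k, v: (seq_len, head_dim); mask: (seq_len, seq_len) boolean.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    scores = jnp.where(mask, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v

kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (8, 16))
k = jax.random.normal(kk, (8, 16))
v = jax.random.normal(kv, (8, 16))
doc_ids = jnp.array([0, 0, 0, 0, 0, 1, 1, 1])  # two packed documents
out = masked_attention(q, k, v, document_mask(doc_ids))
```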
verl: Flexible and Efficient Infrastructures for Post-training LLMs
Recent advances in reinforcement learning significantly boost the reasoning capabilities of LLMs. Models such as OpenAI o3, Claude 3.7, and DeepSeek r1 demonstrate impressive performance on STEM and coding tasks. Yet training such models requires complex infrastructure. In this talk, we present verl (https://github.com/volcengine/verl), a comprehensive framework that uses the HybridFlow programming abstraction to achieve both the flexibility to implement various algorithms and high performance. Through this talk, the audience will gain (i) a basic understanding of various RL algorithms, including PPO and GRPO; (ii) best practices for training state-of-the-art open-source language models and vision-language models, such as the Qwen series, using verl; and (iii) best practices for implementing tool calling and multi-turn rollout.
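As a flavour of what GRPO changes relative to PPO, here is a hedged sketch of its group-relative advantage computation (function and variable names are mine, not verl’s API): instead of relying on a learned value critic, each response’s reward is normalised against the other responses sampled for the same prompt:

```python
# Hedged sketch of GRPO's group-relative advantage (not verl's API):
# sample a group of responses per prompt, score them, and normalise each
# reward against its own group's mean and standard deviation.
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # rewards: (num_prompts, group_size) scalar reward per sampled response.
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# One prompt, four sampled responses scored by a verifier (1 = correct).
print(grpo_advantages(np.array([[1.0, 0.0, 0.0, 1.0]])))  # ~[[ 1. -1. -1.  1.]]
```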
Town Hall
An open discussion led by the organizing committee on topics related to ICLR, such as the review process, policy, and venue.
Mentorship Hour
MENTORS: Amy Zhang, Junxian He, David Abel, Huazhe Xu
Part of the ICLR experience is meeting people and talking with them about their research interests and experiences. To facilitate these conversations, we are thrilled to announce the third iteration of Mentoring Chats at ICLR (previously called Office Hours). Mentoring Chats will be 45-minute round-table sessions, held during lunch (12:30-1:15 pm and 1:15-2:00 pm) in the Topaz Concourse every day of the main conference (April 24-26). A bell will ring approximately 22 minutes in, prompting participants to switch tables or switch topics while staying at the same table. Following ICLR 2024, we have a list of topics and questions that you may wish to ask mentors. We hope to see you there!
Research agenda
- Where should I start if I want to do research in ML? What kind of mathematical/programming skills are required for ML research?
- What are good courses to take? How should I use different modes of learning, such as classroom courses, video lectures, and reading a book?
- How to keep track of all the research literature? How to balance breadth vs depth?
- What are some broader goals of academic machine learning research in the era of LLMs?
- How can one set themselves apart in this crowded research space?
- What is ethical research?
- How to decide on a research area? How to decide on a research project?
- How to adapt my research according to the current trends/community interests?
- How to cope with the pressure of publishing while working on riskier/harder projects? Should I be worried about other groups scooping my research and how to deal with such situations?
- Should I establish myself as an expert in one area/technique or explore a breadth of topics? Should I master a technique and apply it to different problems, or should I master a subfield by finding all useful techniques (hammer vs nails)?
ML+X: Multidisciplinary research
- What are good strategies for starting an interdisciplinary project?
- When working across disciplines, should I have one of them as my “home” community or try to be equally visible in both?
- What are the most efficient ways to help establish my ML+X area as a more active area? Should I organize workshops, teach tutorials, ...?
- How to deal with different incentive structures in interdisciplinary collaborations (e.g., journals vs conferences)?
Advisor and collaborators
- Should I follow my advisor’s agenda or define my own?
- What are the pros and cons of being co-advised?
- When is it appropriate to change advisors and how to go about it?
- How to navigate conflicts with an advisor?
- How to get a good balance between collaborating with other researchers while also distinguishing my own research? Will too much collaboration hurt my job prospects?
- What to look for in a collaborator?
- How do I convey the level of commitment I am willing to have in a project without it being awkward? How to say no to collaborations?
- How to navigate different conventions with respect to author ordering? Alphabetical vs contribution-based ordering? Should my advisor always be a coauthor because they are funding me?
- What do I do if my collaborator is not responsive?
Communicating research and networking
- How to find mentors and allies beyond my advisor?
- What is the best way to communicate my research? Blogs, videos, presentations?
- How to write a good research statement? How to apply for fellowships?
- Should I present my work in poster sessions and workshops? Should I be scared of getting scooped? What are the pros of presenting my work early?
Beyond your institution: Internships and research visits
- Should I do a research internship on a topic different from my dissertation?
- Does it make sense to do a software engineering/development internship if it is not research-related?
- When is a good time to look for internships? Should I apply online or email people?
- Should I do research visits to other universities? Does it make sense to go to semester-long programs as a junior student?
- How to get the most out of my internship? What should be the main goal of doing an internship?
Planning after grad school: academia vs industry
- What should I consider when planning for the next step? How should I decide whether to go to academia or industry?
- How to select a postdoc advisor?
- Should I apply to different departments than my core department? How can I prepare for that, and how early?
- Is it ok to quit your PhD? How can I plan my next steps if so?
Work ethics, open research discussion, personal challenges
- How to balance work and life? How much work is too much work?
- How to take care of mental and physical health?
- How to learn about the ethical implications around the topics of my research?
- How to foster inclusion in research and teaching?
Open-Endedness, World Models, and the Automation of Innovation
The pursuit of Artificial Superintelligence (ASI) requires a shift from narrow objective optimization towards embracing Open-Endedness—a research paradigm, pioneered in AI by Stanley, Lehman and Clune, that is focused on systems that generate endless sequences of novel but learnable artifacts. In this talk, I will present our work on large-scale foundation world models that can generate a wide variety of diverse environments that can in turn be used to train more general and robust agents. Furthermore, I will argue that the connection between Open-Endedness and Foundation Models points towards automating innovation itself. This convergence is already yielding practical results, enabling self-referential self-improvement loops for automated prompt engineering, automated red-teaming, and AI debate in Large Language Models, and it hints at a future where AI drives its own discoveries.
AI Co-scientist Discussion
Join us at the AI Co-scientist Discussion social at ICLR 2025! This gathering brings together researchers and practitioners interested in collaboratively building AI agents capable of scientific discovery. Our focus will be on sharing insights, discussing practical approaches, and thoughtfully addressing ethical considerations to responsibly advance AI as co-researchers. Connect with peers passionate about responsible AI development, exchange ideas, and explore new collaborations. We warmly invite anyone committed to shaping the future of AI-assisted scientific research through careful reflection and innovative thinking.
LLM Agents 360°: A Holistic View on Frameworks, Systems, and Simulations
LLM agents are transforming automation, research, and real-world applications. With their increasing adoption, understanding the full landscape - from foundational frameworks to deployment trade-offs - is more critical than ever. OpenAI has just released the Agents SDK, and MCP from Anthropic is also available. This social event at ICLR 2025 will provide a comprehensive view of the evolution of LLM agents, exploring when and where they deliver the most value, their strengths and limitations, and the critical factors in building reliable, scalable systems. It will also cover the future of AI agents, including protocols, simulations, and emerging trends. The event’s agenda includes four short expert talks and a fireside chat with AI leaders from OpenAI, Meta, LangChain, and other leading AI companies working on AI agents. Attendees will gain valuable insights into different frameworks, system architectures, and simulation approaches, helping them make informed decisions about using LLM agents in their own work. They will also have the opportunity to exchange ideas with top researchers and practitioners, explore collaborative opportunities, and network with others interested in AI agents.
Queer in AI Social
This is a meetup for queer researchers and practitioners working in AI. We have hosted many such meetups over the years at conferences such as ICLR, ICML, NeurIPS, NAACL, IROS, etc. Participants have found them a valuable source of community in an environment that, while generally well-intentioned, can feel alienating to those who do not match the overwhelming norm in aspects of personal identity.
Queer in AI’s mission is to raise awareness of queer issues in AI/ML, foster a community of queer researchers and celebrate the work of queer scientists. We use “queer” as an umbrella term for people with diverse non-normative sexual orientations, romantic orientations, and/or genders, corresponding to acronyms like LGBTQIA2S+. We also explicitly include those questioning their identities. Queer in AI’s demographic survey reveals that most queer scientists in our community do not feel completely welcome in conferences or other work environments, with the main reasons being a lack of queer community and role models. While there has been progress on these issues in recent years, issues remain, particularly for those who are transgender/non-binary and/or BIPOC. One of many steps towards improving that situation is to provide queer-focused spaces in work contexts such as this social.