SPONSOR EXPO | May 7th


Welcome To The ICLR Sponsor Expo!

Expo Schedule

May 7th

TALKS & PANELS  

Beyond Chain-of-Thought: Towards Autonomous Knowledge Management in Alibaba Cloud Tongyi Agentic Systems

Wed 23 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

This talk focuses on Alibaba Cloud's latest research progress on retrieval-augmented systems in large language model agents, exploring core technical pathways for knowledge storage, comprehension, reasoning, and planning, while proposing external information enhancement strategies to expand cognitive boundaries. For complex multi-modal search scenarios, it systematically elaborates an agent-based task planning framework and its autonomous decision-making capabilities. Furthermore, by analyzing practical implementations of knowledge enhancement technologies in AI Search applications such as Q&A and cross-modal retrieval, the talk provides technical frameworks and actionable insights for building self-evolving agent systems that transcend traditional paradigms.

Join Virtual Talk & Panel Visit Alibaba Cloud Booth

Agent research in the real world

Wed 23 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

RAG is fundamental to real-world agents, but accuracy limitations persist. This talk presents Google Cloud's research on improving RAG for practical applications. We'll discuss findings from real-world deployments and share our approaches to intelligent knowledge integration, end-to-end tuning, and other techniques achieving state-of-the-art performance, demonstrating how we're making RAG more reliable for real-world scenarios.
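To make the RAG pattern concrete, here is a deliberately minimal, hypothetical sketch of the retrieve-then-prompt loop (a toy word-overlap retriever, not Google Cloud's system; the function names are illustrative):

```python
# Toy RAG retrieval step: score documents against a query by word overlap,
# then prepend the best match as context for the language model prompt.
# Real systems use dense embeddings and rerankers; this only shows the shape.

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the augmented prompt handed to the LLM."""
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"
```

The accuracy limitations the talk mentions typically live in the `retrieve` step: if the wrong document is selected, no amount of generation quality recovers the answer.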

Join Virtual Talk & Panel Visit Google Research Booth

Ant-InclusionAI: A Fully Open-Sourced Project for LLMs from RL Reasoning to Agents

Wed 23 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

InclusionAI is a project at Ant Group aiming to develop a fully open-sourced AI ecosystem, from algorithms and models to training infrastructure and data. This talk will highlight two particular projects in InclusionAI: AReaL and AWorld. AReaL is an open-sourced RL training system for large reasoning models; we will discuss its design details and our training experiences using AReaL. AWorld (Agent World) is a comprehensive framework that simplifies the building, evaluation, and deployment of general multi-agent assistance systems. We'll demonstrate its capabilities and explore how AI agents collaborate to solve real-world tasks.

Join Virtual Talk & Panel Visit Ant Research Booth

AutoGluon 1.2: Advancing AutoML with Foundation Models and LLM Agents

Thu 24 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

Automated Machine Learning (AutoML) continues to revolutionize how machine learning models are developed, making the process accessible to practitioners with varying levels of expertise. In this workshop, we present the latest advancements in AutoGluon 1.2, an open-source AutoML toolkit developed by Amazon, which empowers users to achieve state-of-the-art performance across diverse machine learning tasks with minimal coding effort. We will emphasize how foundation models can streamline and enhance AutoML performance. Specifically, we will discuss our TabPFN-Mix and Chronos foundation model families for tabular and time series data, respectively. In addition, we will introduce the real-world problems that AutoGluon can help you solve within three lines of code and the fundamental techniques adopted in the toolkit. Rather than diving deep into the mechanisms underlying each individual ML model, we emphasize how you can take advantage of a diverse collection of models to build an automated ML pipeline.

Join Virtual Talk & Panel Visit AMAZON Booth

Human Attention is NOT all you need

Thu 24 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

While attention does seem to be all you need to attain high-performing models spanning many modalities, human attention is not all you need to prepare datasets at scale. What you need are scalable and flexible data workflows that automate the process. Established solutions for, e.g., computer vision, audio, and text — including the latest advancements in foundation model capabilities — open up new possibilities to transform AI data workflows. These range from automating high-volume manual actions like cropping, transcription, and audio-video pairing to more complex reasoning tasks such as video insight extraction and content evaluation. By chaining multiple models together, teams can build custom data engines to create novel, high-quality datasets at scale. On a fixed 100-hour human labor budget, we showcase how a high level of automation, combined with a constrained budget of human attention spent wisely, consistently outperforms traditional methods for building datasets.

Join Virtual Talk & Panel Visit Encord Booth

Bridging Specialized ML Research and Systematic Investing: Transforming the Research Pipeline

Thu 24 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

Recent breakthroughs in machine learning offer powerful tools that can significantly transform systematic investment research. Within our quantitative research and development team, we leverage specialized ML techniques—including large language models (LLMs), retrieval-augmented generation (RAG), agent-based systems, variational autoencoders (VAEs), graph neural networks (GNNs), and multimodal signal processing—to curate large-scale datasets, automate feature extraction, construct robust trading signals, and systematically generate innovative investment hypotheses.

This talk will identify key opportunities where advanced ML methods, featured in ICLR 2025 papers, can substantially enhance systematic investment pipelines. Additionally, we propose ambitious directions for future research, such as building sophisticated ecosystems of interacting ML agents, creating a compelling landscape for ML researchers interested in translating research innovation into impactful real-world investment strategies.

Join Virtual Talk & Panel Visit Abu Dhabi Investment Authority Booth

EUREKA: Evaluating and Understanding Large Foundation Models

Thu 24 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

Rigorous evaluation of large foundation models is critical for assessing the state of the art, informing improvements, and guiding scientific advances in AI. It’s also crucial for app developers using these models. However, practical challenges include benchmark saturation, lack of transparency, difficulties in measuring generative tasks, and numerous capabilities needed for comprehensive model comparison. We also need a deeper understanding of model failures and whether they are consistent over time.

Moreover, with models advancing in reasoning capabilities, a robust evaluation framework is necessary. This session introduces Eureka as a reusable and open framework for standardizing evaluations beyond single-score reporting. We’ll also present Eureka-Bench, which offers benchmarks for challenging and fundamental capabilities in language and vision, including reasoning skills (math, science, hard algorithmic and planning problems). Non-saturated benchmarks help identify meaningful differences between models.

We’ll present insights from analyzing 12 state-of-the-art models, uncovering granular weaknesses and guiding targeted improvements. We’ll also highlight findings from our recent paper on inference-time scaling, which examines reasoning performance and compute tradeoffs. We present an empirical study of inference-time scaling methods for improving reasoning in LLMs across diverse, complex tasks, analyzing their effectiveness, cost-efficiency, and limitations.

Eureka, available as open-source, fosters transparent and reproducible evaluations and has gained significant industry interest, including in prominent press releases.

Useful links:

Blog: https://aka.ms/eureka-ml-insights-blog
Technical report on Eureka: https://aka.ms/eureka-ml-insights-report
Paper on Inference Time Scaling: https://arxiv.org/abs/2504.00294v1
Github repository: https://github.com/microsoft/eureka-ml-insights
Website: https://microsoft.github.io/eureka-ml-insights

Join Virtual Talk & Panel Visit Microsoft Booth

Leveraging Multimodal LLMs for Shopify’s Global Catalogue

Fri 25 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

As marketing channels rapidly evolve, Shopify's Global Catalogue initiative aims to improve product discoverability by consolidating millions of products from diverse shops into a single unified system, enabling integration with next-generation platforms such as AI agents and virtual realities. This expo talk will present the core components of this initiative, focusing on the integration of multimodal LLMs to enrich product metadata. We'll explore the processes of data curation, model fine-tuning, experimentation, evaluation, and feedback loops, showcasing our approach to building and continuously improving these models. We'll also cover how we leveraged open-source tools to scale and deploy these models, making real-time predictions for around 40 million LLM calls, or about 16 billion tokens, daily. Finally, we'll highlight how these enriched data representations are currently advancing conversational commerce, enhancing search functionalities, and improving personalization.

Join Virtual Talk & Panel Visit Shopify Booth

Kvax: Fast and easy-to-use Flash Attention implementation for JAX

Fri 25 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

Kvax is a custom FlashAttention implementation for JAX, optimised for long-context training with efficient document mask computation and context parallelism. This talk explores the key ideas behind its implementation, focusing on document mask performance optimisations and context parallelism.
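To illustrate the idea behind efficient document-mask computation (this is a hypothetical pure-Python sketch of the general block-classification technique used by block-based attention kernels, not Kvax's actual JAX implementation or API):

```python
# In packed long-context training, many documents share one sequence and each
# token attends only to earlier tokens of its own document. Block-based
# attention kernels classify each (query block, key block) tile up front so
# that fully-masked tiles are skipped entirely and fully-visible tiles run
# without per-element mask checks; only "partial" tiles pay for the full mask.

def classify_blocks(doc_ids: list[int], block_size: int) -> dict:
    """Map each (q_block, k_block) pair to 'skip', 'full', or 'partial'.

    doc_ids[i] is the document index of token i in the packed sequence.
    """
    n = len(doc_ids)
    num_blocks = (n + block_size - 1) // block_size
    out = {}
    for qb in range(num_blocks):
        q_lo, q_hi = qb * block_size, min((qb + 1) * block_size, n)
        for kb in range(num_blocks):
            k_lo, k_hi = kb * block_size, min((kb + 1) * block_size, n)
            visible = [
                doc_ids[q] == doc_ids[k] and k <= q  # same doc, causal
                for q in range(q_lo, q_hi)
                for k in range(k_lo, k_hi)
            ]
            if not any(visible):
                out[(qb, kb)] = "skip"     # tile never computed
            elif all(visible):
                out[(qb, kb)] = "full"     # no masking inside the kernel
            else:
                out[(qb, kb)] = "partial"  # apply the mask element-wise
    return out
```

The payoff grows with context length: with many short documents packed together, most tiles are 'skip', so the kernel touches only a small fraction of the attention matrix.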

Join Virtual Talk & Panel Visit Nebius Booth

verl: Flexible and Efficient Infrastructures for Post-training LLMs

Fri 25 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

Recent advances in reinforcement learning have significantly boosted the reasoning capabilities of LLMs. Models such as OpenAI o3, Claude 3.7, and DeepSeek R1 demonstrate impressive performance on STEM and coding tasks. Yet training such models requires complex infrastructure. In this talk, we present verl (https://github.com/volcengine/verl), a comprehensive framework that uses the HybridFlow programming abstraction to achieve both the flexibility to implement various algorithms and high performance. Through this talk, audiences will gain i) a basic understanding of various RL algorithms, including PPO and GRPO; ii) best practices for training state-of-the-art open-source language models and vision-language models, such as the Qwen series, using verl; and iii) best practices for implementing tool calling and multi-turn rollout.
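As a taste of one algorithm covered here, GRPO replaces PPO's learned value function with a group-relative baseline: several responses are sampled per prompt, and each response's reward is normalized against the group's mean and standard deviation. A minimal sketch of that advantage computation (illustrative only, not verl's code):

```python
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: z-score each response's reward within
    its group, so no separate critic network is needed."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in group_rewards]
```

Because the baseline is the group mean, the advantages in each group sum to zero: responses better than their siblings get positive weight, worse ones negative, regardless of the absolute reward scale.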

Join Virtual Talk & Panel Visit ByteDance Booth

Improving LLM Benchmarks: Making AI Work for Real-World Needs

Fri 25 Apr 10 p.m. - 11 p.m. PDT
Expo Talk Panel

To make AI models truly useful in real-world settings, we need better ways to measure their performance. This talk will focus on how we can improve benchmarks, ensuring LLMs are tested in ways that reflect actual business challenges.

Jonathan will discuss how using real user feedback and industry-specific examples can create more meaningful tests for AI models. We’ll explore ways to measure AI performance based on practical tasks that require applying the model’s conceptual understanding. This will complement the many existing benchmarks that focus on evaluating AI models across a range of conceptual understanding tasks.

By designing evaluation methods that reflect real-world use, we can help bridge the gap between research and business, making AI more effective and reliable in everyday applications.

About the Speaker:

Jonathan Siddharth
Founder and Chief Executive Officer, Turing

Jonathan Siddharth is the Founder and CEO of Turing, one of the world's fastest-growing AI companies accelerating the advancement and deployment of powerful AI systems. Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM and frontier knowledge; and leveraging that expertise to build real-world AI systems that solve mission-critical priorities for Fortune 500 companies and government institutions.

Siddharth is a rare blend of AI scientist and serial tech entrepreneur, with a track record of building transformative AI systems and scaling successful ventures. He helped pioneer natural language search at Powerset, which was acquired by Microsoft, and went on to architect large-scale AI platforms at Rover — a content discovery engine he co-founded and led as CEO, which was acquired by Revcontent — and at Turing, where he continues to lead cutting-edge innovation.

Beyond his work at Turing, Siddharth has served on the board of Quora, the global knowledge-sharing platform, and is an active investor and advisor to StartX, Stanford’s premier startup accelerator, where he supports the next generation of founders.

He earned his master’s degree in computer science from Stanford University, graduating with distinction in research for his work applying machine learning to web search.

Join Virtual Talk & Panel Visit Turing Booth