The Challenges of Human-Centered AI and Robotics: What We Want, Need, and Are Getting From Human-Machine Interaction
Language-based AI is now ubiquitous, and user expectations for intelligent machines are scaling along with it: we expect machines to understand us, predict our needs and wants, do what we enjoy and prefer, and adapt as we change our moods and minds, learn, grow, and age. Physical AI, in the form of robotics, is the next major AI challenge, and it is not yet ready to leap into our daily lives. While massive investment is focused on the functional behavior of humanoid robots (perceiving the world, moving around, and manipulating objects), human-robot interaction (HRI) is relegated to an afterthought. It is assumed that once a robot can move around and do things, it will be useful and wanted, yet over 25 years of HRI research tells us otherwise. While the need for human-centered services continues to grow, research and development in this area remains minimal. This talk will discuss how bringing together robotics, AI, and machine learning for long-term user modeling, real-time multimodal behavioral signal processing, and affective computing is enabling machines to understand users, interact with them, and adapt to their specific and ever-changing needs. We will survey methods and challenges of working with sparse, noisy, heterogeneous, multimodal personal interaction data and of creating expressive agent and robot behavior for understanding, coaching, motivating, and supporting a wide variety of user populations across the age span (infants, children, adults, elderly), ability span (typically developing, autism, anxiety, stroke, dementia), contexts (schools, therapy centers, homes), and deployment durations (from weeks to six months) through socially assistive robotics. We will discuss the challenges of understanding what we humans want from interactions with machines vs. what we need vs. what we are getting, and how those distinctions are shaping the future of not just AI and ML but society at large.
Amazon Expo: AGI Lab, Boosting RL Solvability, presented by Satyaki Chakraborty from Amazon AGI
We are excited to present Amazon Nova, a groundbreaking portfolio of AI offerings that delivers frontier intelligence and industry-leading price performance. Nova is built on advanced AI technologies originally developed for Amazon's internal applications, such as Alexa+, Amazon Ads, and AWS Marketplace, and is now available to AWS customers. Amazon Nova includes Nova models, fast and cost-effective foundation models for text and multimodal needs; Nova Forge, a new service to build your own frontier models; and Nova Act, a new service to build agents that automate browser-based UI workflows, powered by a custom Nova 2 Lite model. These models and services have built-in controls for the safe and responsible use of AI, delivering robust protections, content filters, and policy-aligned behaviors to meet compliance requirements. During our demo, a product engineer and a researcher will showcase the product and the science behind it. We will demonstrate how Nova models can be customized to deliver responses that reflect industry expertise, powering interactive chat interfaces, Retrieval-Augmented Generation (RAG) systems, agentic applications, video analysis, and UI workflow automation solutions. We will also highlight the multimodal capabilities of Nova, which accept text, image, or video inputs to generate text output, and the creative content generation models, which accept text and image inputs to generate image or video output. Amazon Nova has been adopted by tens of thousands of customers across industries, delivering measurable impact through cost savings and gains in productivity, automation, and quality in real-world deployments. We believe that Nova will be a valuable addition to the ICLR conference and look forward to sharing our insights and experiences with you.
In this talk, we present our experience scaling hybrid linear attention architectures to the trillion-parameter scale through two models from the Ling Team: Ling-2.5-1T and Ring-2.5-1T. These models integrate linear attention with selected softmax attention layers to support efficient long-context training while preserving strong reasoning and representation capability. We share key algorithm–system co-design insights that make trillion-scale hybrid attention practical, including stability techniques for large-scale linear attention training and efficient distributed training for ultra-long sequences.
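As a rough illustration of the hybrid pattern described above (a minimal sketch, not the Ling Team's implementation), the snippet below interleaves linear-attention layers with an occasional softmax-attention layer; the layer ratio, the ELU-based feature map, and all module names are illustrative assumptions.

```python
# Minimal sketch of a hybrid attention stack: mostly linear-attention
# layers, with softmax attention every `softmax_every` layers.
# Illustrative only; layer ratio, feature map, and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """Non-causal linear attention with an ELU+1 feature map (O(n) in sequence length)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1          # positive feature map
        kv = torch.einsum("bnd,bne->bde", k, v)    # sum over sequence positions
        z = 1.0 / (q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)
        return self.out(torch.einsum("bnd,bde->bne", q, kv) * z)

class SoftmaxAttention(nn.Module):
    """Standard multi-head softmax attention (O(n^2), used sparingly)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        return self.attn(x, x, x, need_weights=False)[0]

class HybridBlock(nn.Module):
    def __init__(self, dim, use_softmax):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = SoftmaxAttention(dim) if use_softmax else LinearAttention(dim)

    def forward(self, x):
        return x + self.mix(self.norm(x))          # pre-norm residual

def build_hybrid_stack(dim=512, depth=12, softmax_every=4):
    # e.g. with depth=12 and softmax_every=4, layers 3, 7, 11 use softmax
    # attention and the rest are linear.
    return nn.Sequential(*[
        HybridBlock(dim, use_softmax=((i + 1) % softmax_every == 0))
        for i in range(depth)
    ])

model = build_hybrid_stack()
y = model(torch.randn(2, 128, 512))                # (batch, seq, dim) -> same shape
```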
Claiming Your True Market Value as an AI Researcher in Industry -- Negotiation Workshop & Fireside Chat
As AI reshapes industries, compensation is changing faster than researchers can track. New labs, startups, and top companies are competing for talent with vastly different pay structures, currencies, and cultural norms. Yet most researchers are never formally taught how to understand their worth or navigate these systems. The result is an uneven landscape where brilliant minds often make life-changing decisions without the information they need.
The session begins with a concise, data-driven talk on current AI compensation and negotiation trends, grounded in real stories and case studies. From there, a fireside chat and open Q&A invite candid, experience-driven insights from researchers who have navigated these conversations firsthand.
Key Takeaways for attendees:
* How to evaluate compensation (salary, equity, bonuses) across industry roles such as Research/Applied/Data Scientist, Research/Machine Learning/Software Engineer, and more
* How to compare global opportunities and account for regional differences in pay structures
* How to identify leverage points and negotiate effectively at different career stages and levels
* How to respond to pushback and recognize red vs. green flags in job offers
* How to negotiate deadlines and avoid having an offer rescinded
* How to advocate for yourself even without competing offers, and despite your fears
From IQ to AQ & EQ: Reimagining Real Estate with Agentic AI
Real estate is one of the most financially significant and emotionally complex decisions people make — yet digital experiences have historically optimized for information retrieval (i.e., home search) rather than intelligent and personalized guidance. In this talk, I’ll share how we are evolving from high-IQ systems that answer questions to agentic systems that demonstrate AQ (Agentic Quotient) and EQ (Emotional Intelligence). By combining deep, panoptic understanding of homes and users with advanced reasoning, planning, and tool use, we are building AI copilots that don’t just respond passively — they guide, anticipate, and act proactively. I’ll discuss the key technical challenges, and how we operationalize agentic AI in production, architect for trust in high-stakes decisions, and design experiences that transform the home journey from fragmented search into confident, personalized progression.
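For readers unfamiliar with the term, "agentic" here refers to the general plan-act-with-tools loop; the minimal sketch below illustrates that generic pattern only. Every tool, data value, and the `call_model` function are hypothetical stand-ins, not this product's actual architecture.

```python
# Minimal sketch of an agentic tool-use loop for a home-search copilot.
# All tool names, data, and `call_model` are hypothetical stand-ins,
# not any product's real API.
import json

def search_listings(criteria: str) -> str:
    return json.dumps([{"id": 101, "price": 450_000, "beds": 3}])   # stub data

def estimate_monthly_cost(listing_id: int) -> str:
    return json.dumps({"listing_id": listing_id, "monthly": 2_900})  # stub data

TOOLS = {"search_listings": search_listings,
         "estimate_monthly_cost": estimate_monthly_cost}

def call_model(history: list[dict]) -> dict:
    """Stand-in for an LLM call that returns either a tool request or a final answer."""
    if len(history) == 1:                       # first turn: plan a tool call
        return {"tool": "search_listings", "args": {"criteria": "3br under 500k"}}
    return {"answer": "Found a 3-bed home at $450k, roughly $2,900/month."}

def run_agent(user_goal: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):
        decision = call_model(history)
        if "answer" in decision:                # model decided it is done
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append({"role": "tool", "name": decision["tool"], "content": result})
    return "Stopped after reaching the step limit."

print(run_agent("Find me a 3-bedroom home under $500k and estimate costs"))
```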
ML in Smell Tech Discussion Booth
This discussion booth explores the emerging intersection of machine learning and smell technology, also known as digital olfaction. The space is designed for researchers, students, builders, and curious attendees interested in how AI can be used to detect, classify, generate, and interpret scent-related data. Topics may include electronic noses, olfactory sensing, multimodal AI, applications in healthcare, food quality, environmental monitoring, fragrance, and human-computer interaction. The goal is to create an open and interdisciplinary conversation around both the technical challenges and the creative opportunities in this fast-growing field. Whether you work directly on ML models for chemical signal analysis or are simply interested in the future of smell interfaces and sensory intelligence, this booth offers a place to exchange ideas, share projects, discuss collaborations, and connect with others working at the frontier of AI and olfaction.
Building Physical AI at Scale: Data, Infrastructure, and Evaluation for the Real World
Physical AI — robots, autonomous vehicles, and embodied agents — is approaching a genuine inflection point. Foundation models for real-world interaction are becoming viable, hardware costs are dropping, and developer interest is surging. Yet most teams building in this space are still stitching together their development stack from incompatible pieces, and it is slowing them down. The core bottlenecks are well understood but rarely addressed together. Real-world robotic behavior cannot be learned from synthetic data alone — collecting, annotating, and validating diverse physical-world data at scale is a full operational discipline. Training multimodal vision-language-action models demands infrastructure purpose-built for the task. And evaluating whether a model actually works in the physical world requires benchmarking approaches that go far beyond standard leaderboards. This social will bring together researchers and practitioners to examine all three problems in parallel. Short talks from speakers with hands-on experience in physical AI development will cover the state of real-world data pipelines, what purpose-built infrastructure for physical AI actually looks like, and how the community is approaching evaluation for embodied systems. An open discussion will follow, focused on where the biggest unsolved problems lie and how the research community can contribute.
openJiuwen is an open-source agent framework and platform aimed at helping users build precise, simple, efficient, and production-ready AI agents. This presentation will cover two topics related to openJiuwen. First, we will introduce the key features and design philosophy of openJiuwen, including agent self-evolution, context compression and offloading, and the agent controller. Second, we will introduce an agent application developed on top of openJiuwen: JiuwenClaw, an AI assistant that supports self-evolving capabilities and intelligent task management.
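As a hedged illustration of the context compression and offloading idea mentioned above (a generic sketch, not openJiuwen's actual API), the snippet below summarizes older conversation turns in place and offloads their full text to an external store; `summarize` and the store are stand-ins.

```python
# Generic sketch of agent context compression with offloading: once the
# history exceeds a budget, older turns are replaced by a summary stub and
# their full text is moved to an external store. Illustrative only; this is
# not openJiuwen's actual API, and `summarize` stands in for a model call.
OFFLOAD_STORE: dict[str, list[dict]] = {}      # stand-in for disk / DB storage

def summarize(turns: list[dict]) -> str:
    """Stand-in for an LLM summarization call."""
    return f"[summary of {len(turns)} earlier turns]"

def compress_context(history: list[dict], budget: int, keep_recent: int = 4) -> list[dict]:
    if len(history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    key = f"segment-{len(OFFLOAD_STORE)}"
    OFFLOAD_STORE[key] = old                   # offload full text for later retrieval
    stub = {"role": "system",
            "content": summarize(old) + f" (full text offloaded under '{key}')"}
    return [stub] + recent

# Usage: the history grows turn by turn and is compressed when over budget.
history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
history = compress_context(history, budget=6)
print(len(history), history[0]["content"])     # 5 turns: 1 summary stub + 4 recent
```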
Google: Towards 3D Foundational Robotics Models
Transformers have completely changed the landscape of robotics, becoming the main computational backbones of generalist robotics models capable of solving a variety of tasks across several embodiments and environmental conditions. However, designing foundational robotics architectures that can create efficient explicit or implicit 3D models of the surrounding environment and leverage them for optimal control is still a work in progress and an exciting research direction. In this EXPO session, we will discuss progress made on this subject over the last few years, bring together panelists who are experts in the field to brainstorm new methods (e.g., particle-based world models for robotics), and discuss the impact of this research on robotics a few years from now.
Invited Talk - Max Welling
Step away from the booths and join us for a relaxed afternoon tea - a chance to connect with fellow builders, researchers, and teams over coffee, tea, and great conversation. We’ll also be hosting informal office hours, where you can chat directly with our technical and research teams, ask questions, share ideas, and get hands-on guidance.
Nomadic Video AI Social
Join us for an evening hosted by a team building video AI systems for real-world deployment, bringing together researchers, engineers, and builders working at the frontier of video AI. This is a relaxed, conversation-first gathering connecting people across robotics, self-driving, and embodied AI.
We’ll focus on a new shift in how video is used in real-world AI: moving beyond static labeling toward systems that can search, reason over, and validate events directly from raw footage. Discussions will explore approaches that combine vision-language models, motion understanding, and multi-stage validation to turn large-scale video into something you can actually act on.
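To make the multi-stage idea concrete, here is a minimal, hedged sketch of such a pipeline: a vision-language model proposes candidate events, a motion check filters them, and a second-pass validator confirms. Every function, name, and threshold is a hypothetical stand-in, not any team's actual system.

```python
# Generic sketch of a multi-stage video event pipeline: a vision-language
# model proposes candidate events, a motion check filters them, and a
# second-pass validator confirms. All functions here are hypothetical
# stand-ins for model calls and signal-processing steps.
from dataclasses import dataclass

@dataclass
class Candidate:
    clip_id: str
    start_s: float
    end_s: float
    label: str
    score: float

def vlm_propose(clip_id: str, query: str) -> list[Candidate]:
    """Stand-in for a VLM that scores clip spans against a text query."""
    return [Candidate(clip_id, 4.0, 9.5, query, 0.82)]

def motion_consistent(c: Candidate) -> bool:
    """Stand-in motion check, e.g. optical-flow magnitude over the span."""
    return (c.end_s - c.start_s) > 1.0          # toy heuristic

def validate(c: Candidate, threshold: float = 0.8) -> bool:
    """Stand-in second pass, e.g. a stronger model re-scores the span."""
    return c.score >= threshold

def search_events(clip_ids: list[str], query: str) -> list[Candidate]:
    proposals = [c for cid in clip_ids for c in vlm_propose(cid, query)]
    return [c for c in proposals if motion_consistent(c) and validate(c)]

print(search_events(["dashcam_001", "dashcam_002"], "pedestrian crossing"))
```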
No formal talks, just high-signal conversations with people actively building and deploying these systems.
The goal is to go beyond surface-level ideas and get into what it actually takes to make Physical AI work reliably in the real world. If you are working on autonomy, VLMs, or production AI systems, this is a chance to exchange ideas and shape what comes next.