
May 4, 2023, 5 a.m.

Physical systems, concepts, and principles are increasingly being used to devise novel and robust machine learning architectures. We illustrate this point with examples from two ML domains: sequence modeling and graph representation learning. In both cases, we demonstrate how physical concepts such as oscillators and multi-scale dynamics can lead to ML architectures that not only mitigate problems that plague these learning tasks but also provide competitive performance.
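The oscillator idea can be sketched as a recurrent update that discretizes a driven, damped oscillator ODE, in the spirit of coupled-oscillator RNNs. This is an illustrative sketch, not the talk's exact model: the weights are random stand-ins for trained parameters, and all dimensions and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, dt, gamma, eps = 3, 8, 0.1, 1.0, 0.1

# Randomly initialized weights (stand-ins for trained parameters).
W = rng.normal(0, 0.3, (d_h, d_h))
Wv = rng.normal(0, 0.3, (d_h, d_h))
V = rng.normal(0, 0.3, (d_h, d_in))
b = np.zeros(d_h)

def cornn_step(y, z, u):
    """One step of a coupled-oscillator recurrence, discretizing
    y'' = tanh(W y + Wv y' + V u + b) - gamma*y - eps*y'
    with hidden state y and velocity z = y'."""
    z = z + dt * (np.tanh(W @ y + Wv @ z + V @ u + b) - gamma * y - eps * z)
    y = y + dt * z
    return y, z

# Drive the oscillator network with a simple periodic input sequence.
y, z = np.zeros(d_h), np.zeros(d_h)
for t in range(200):
    u = np.array([np.sin(0.1 * t), np.cos(0.1 * t), 1.0])
    y, z = cornn_step(y, z, u)
```

The damping term `eps * z` and restoring force `gamma * y` keep the hidden states bounded even over long sequences, which is one reason oscillatory dynamics help mitigate exploding states in sequence models.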

May 4, 2023, 4:30 a.m.

There are numerous efforts on technical development and translation of AI/ML in the healthcare domain. In addition to classic challenges such as small datasets, limited annotations, and imbalanced classes, how to gain and enhance trust from the users and practitioners of medical AI/ML is an emerging topic and key to successful applications of AI to patient care. In this talk, the speaker will elaborate on the important pillars in developing trustworthy medical AI tools and on how to marry medical intelligence and AI to enhance trust from clinicians, and will showcase a range of applications of AI/ML in medical imaging.

May 5, 2023, 6:55 a.m.

May 5, 2023, 7:35 a.m.

May 5, 2023, 10:50 a.m.

May 5, 2023, 11:35 a.m.

May 5, 2023, 7:20 a.m.

Multimodal modelling has seen great interest in recent years with fantastic results and applicability over a wide range of tasks. A particular feature of such applicability has been the development of conditional generation, and the chaining of such conditional models to generate cross-modally. This however has meant that the question of representations, and what being cross-modal entails, has been eschewed in favour of high generative quality---something that leaves things as black-boxes from the perspective of human inspection and interpretability. In this talk, I will touch upon some recent and ongoing work in our lab towards learning unsupervised models that capture structured representations, which can be constrained across modalities to address questions of interpretability through multimodal grounding.

May 5, 2023, 5:45 a.m.

In this talk, Prof. Solar-Lezama will describe how the combination of deep learning and symbolic reasoning can help improve on the capabilities of purely neural systems. The talk will also describe some open problems around how to make this combination even more capable.

April 30, 2023, 11:30 p.m.

Sofia Crespo shares about her artistic practice and journey using generative systems, especially neural networks, as a means to explore speculative lifeforms, and how technology can bring us closer to the natural world.

Sofia Crespo

Sofia Crespo is an artist with a deep interest in biology-inspired technologies. One of her main focuses is the way organic life uses artificial mechanisms to simulate itself and evolve, implying that technologies are a biased product of the organic life that created them rather than a completely separate object. Crespo looks at the similarities between techniques of AI image formation and the ways that humans express themselves creatively and cognitively recognize their world.

Her work brings into question the potential of AI in artistic practice and its ability to reshape our understandings of creativity. She is also deeply interested in how the role of the artist is changing through work with machine learning techniques. She is the co-founder of Entangled Others Studio.

May 1, 2023, 11:30 p.m.

For reliable machine learning, overcoming the distribution shift is one of the most important challenges. In this talk, I will first give an overview of the classical importance weighting approach to distribution shift adaptation, which consists of an importance estimation step and an importance-weighted training step. Then, I will present a more recent approach that simultaneously estimates the importance weight and trains a predictor. Finally, I will discuss a more challenging scenario of continuous distribution shifts, where the data distributions change continuously over time.
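As a toy illustration of the classical importance-weighting step described above (not the talk's joint estimation method), the sketch below estimates a test-distribution quantity from training samples alone. Here the train and test densities are assumed to be known Gaussians so the density ratio has a closed form; in practice the ratio itself must be estimated, which is exactly the importance estimation step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: training inputs ~ N(0, 1), test inputs ~ N(0.5, 1).
x_train = rng.normal(0.0, 1.0, 20000)

# Importance weight w(x) = p_test(x) / p_train(x). For these two unit-variance
# Gaussians the ratio simplifies to exp(0.5*x - 0.125).
w = np.exp(0.5 * x_train - 0.125)

# Estimate E_test[x] from training samples alone.
unweighted = x_train.mean()                       # biased under shift, ~0.0
weighted = np.sum(w * x_train) / np.sum(w)        # self-normalized IS, ~0.5
```

The same reweighting applies to a training loss: minimizing the `w`-weighted empirical risk on training data approximates minimizing the risk under the test distribution.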

Masashi Sugiyama

Masashi Sugiyama is Director of the RIKEN Center for Advanced Intelligence Project and Professor of Complexity Science and Engineering at the University of Tokyo. His research interests include the theory, algorithms, and applications of machine learning. He has written several books on machine learning, including Density Ratio Estimation in Machine Learning (Cambridge, 2012). He served as program co-chair and general co-chair of the NIPS conference in 2015 and 2016, respectively, and received the Japan Academy Medal in 2017.

May 2, 2023, 11:30 p.m.

Recent large language models (LLMs) have enabled significant advancements for open-domain dialogue systems due to their ability to generate coherent natural language responses to any user request. Their ability to memorize and perform compositional reasoning has enabled accurate execution of dialogue-related tasks, such as language understanding and response generation. However, these models suffer from limitations such as hallucination, undesired capturing of biases, difficulty in generalizing to specific policies, and lack of interpretability. To tackle these issues, the natural language processing community has proposed methods such as injecting knowledge into language models during training or inference, retrieving related knowledge using multi-step inference and APIs/tools, and so on. In this talk, I plan to provide an overview of our work and other work that aims to address these challenges.

Dilek Hakkani-Tur

Dilek Hakkani-Tür is a senior principal scientist at Amazon Alexa AI focusing on enabling natural dialogues with machines. Prior to joining Amazon, she led the dialogue research group at Google (2016-2018) and was a principal researcher at Microsoft Research (2010-2016), the International Computer Science Institute (ICSI, 2006-2010), and AT&T Labs-Research (2001-2005). She received her BSc degree from Middle East Technical University in 1994, and MSc and PhD degrees from the Department of Computer Engineering at Bilkent University in 1996 and 2000, respectively. Her research interests include conversational AI, natural language and speech processing, spoken dialogue systems, and machine learning for language processing. She holds over 80 granted patents and has co-authored more than 300 papers in natural language and speech processing. She received several best paper awards for publications she co-authored on conversational systems, including her earlier work on active learning for dialogue systems, from the IEEE Signal Processing Society, ISCA, and EURASIP. She served as an associate editor for IEEE Transactions on Audio, Speech and Language Processing (2005-2008), member of the IEEE Speech and Language Technical Committee (2009-2014), area editor for speech and language processing for Elsevier's Digital Signal Processing Journal and IEEE Signal Processing Letters (2011-2013), and served on the ISCA Advisory Council (2015-2019). She also served as the Editor-in-Chief of the IEEE/ACM Transactions on Audio, Speech and Language Processing (2019-2021) and as an IEEE Distinguished Industry Speaker (2021), and is a fellow of the IEEE (2014) and ISCA (2014).

Invited talk: Ce Zhang

May 5, 2023, 6:30 a.m.

Ce Zhang

May 4, 2023, 1:35 a.m.

Statistical physics has studied exactly solvable models of neural networks for more than four decades. In this talk, we will put this line of work in the perspective of recent empirical observations stemming from deep learning. We will describe several types of phase transition that appear in the limit of large sizes as a function of the amount of data. Discontinuous phase transitions are linked to adjacent algorithmic hardness. This so-called hard phase influences the behaviour of gradient-descent-like algorithms. We show a case where the hardness is mitigated by overparametrization, proposing that the benefits of overparametrization may be linked to the use of a certain type of algorithm. We then discuss the overconfidence of overparametrized neural networks and evaluate methods to mitigate it and calibrate the uncertainty.

Lenka Zdeborova

May 3, 2023, 4:30 a.m.

The success of deep learning has hinged on learned functions dramatically outperforming hand-designed functions for many tasks. However, we still train models using hand designed optimizers acting on hand designed loss functions. I will argue that these hand designed components are typically mismatched to the desired behavior, and that we can expect meta-learned optimizers to perform much better. I will discuss the challenges and pathologies that make meta-training learned optimizers difficult. These include: chaotic and high variance meta-loss landscapes; extreme computational costs for meta-training; lack of comprehensive meta-training datasets; challenges designing learned optimizers with the right inductive biases; challenges interpreting the method of action of learned optimizers. I will share solutions to some of these challenges. I will show experimental results where learned optimizers outperform hand-designed optimizers in many contexts, and I will discuss novel capabilities that are enabled by meta-training learned optimizers.
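As a toy illustration of meta-training (far simpler than the learned optimizers discussed in the talk), one can meta-learn even a single scalar, the learning rate, by optimizing the loss reached after an unrolled inner training loop. Everything here is an assumption for illustration: a quadratic inner task, finite differences standing in for the meta-gradient.

```python
import numpy as np

def inner_loss(theta):
    """Toy inner task: a quadratic bowl, gradient is simply theta."""
    return 0.5 * np.sum(theta ** 2)

def unrolled_loss(log_lr, steps=20):
    """Meta-loss: inner-task loss after `steps` of SGD with rate exp(log_lr)."""
    theta = np.ones(5)
    lr = np.exp(log_lr)
    for _ in range(steps):
        theta = theta - lr * theta  # SGD step on the quadratic
    return inner_loss(theta)

# Meta-train log_lr by finite-difference gradient descent on the meta-loss.
log_lr, fd_eps, meta_lr = np.log(0.01), 1e-4, 0.5
for _ in range(200):
    g = (unrolled_loss(log_lr + fd_eps) - unrolled_loss(log_lr - fd_eps)) / (2 * fd_eps)
    log_lr -= meta_lr * g
```

Even in this tiny setting the pathologies mentioned above appear: for learning rates past the stability threshold the unrolled loss explodes, making the meta-loss landscape badly conditioned, which is one reason meta-training full learned optimizers is hard.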

Jascha Sohl-Dickstein

I am a principal scientist in Google DeepMind, where I lead a research team with interests spanning machine learning, physics, and neuroscience. I'm most (in)famous for inventing diffusion models. My recent work has focused on theory of overparameterized neural networks, meta-training of learned optimizers, and understanding the capabilities of large language models. Before working at Google I was a visiting scholar in Surya Ganguli's lab at Stanford University, and an academic resident at Khan Academy.  I earned my PhD in 2012 in the Redwood Center for Theoretical Neuroscience at UC Berkeley, in Bruno Olshausen's lab. Prior to my PhD, I worked on Mars.

May 1, 2023, 4:30 a.m.

With a growing trend of employing machine learning (ML) models to assist decision making, it is vital to inspect both the models and their corresponding data for potential systematic deviations in order to achieve trustworthy ML applications. Such inspected data may be used in training or testing, or generated by the models themselves. Understanding systematic deviations is particularly crucial in resource-limited and/or error-sensitive domains, such as healthcare. In this talk, I reflect on our recent work, which has utilized automated identification and characterization of systematic deviations for various tasks in healthcare, including data quality understanding, temporal drift, heterogeneous intervention effects analysis, and new class detection. Moreover, AI-driven scientific discovery is increasingly being facilitated using generative models, and I will share how our data-centric and multi-level evaluation framework helps to quantify the capabilities of generative models in both domain-agnostic and interpretable ways, using materials science as a use case. Beyond the analysis of curated datasets that are often utilized to train ML models, similar data-centric analysis should also be applied to traditional data sources, such as textbooks. To this end, I will conclude by presenting recent collaborative work on automated representation analysis in dermatology academic materials.

Girmaw Abebe Tadesse

Girmaw is a Principal Research Scientist and Manager at the Microsoft AI for Good Research Lab, which aims to develop AI solutions for critical problems across sectors including agriculture, healthcare, and biodiversity. Prior to that, he was a Staff Research Scientist at IBM Research Africa working on detecting and characterizing systematic deviations in data and machine learning models. At IBM Research, Girmaw led multiple projects in trustworthy AI, including evaluation of generative models, representation analysis in academic materials, and data-driven insight extraction from public health surveys, with active collaborations with external institutions such as the Bill & Melinda Gates Foundation, Stanford University, Oxford University, and Harvard University. Previously, Girmaw worked as a Postdoctoral Researcher at the University of Oxford, where he primarily developed deep learning techniques to assist diagnosis of multiple diseases, in collaboration with clinicians and hospitals in China and Vietnam. Girmaw completed his PhD at Queen Mary University of London, under the Erasmus Mundus Double Doctorate Program in Interactive and Cognitive Environments, with a focus on computer vision and machine learning algorithms for human activity recognition using wearable cameras. He has interned/worked in various research groups across Europe, including UPC-BarcelonaTech (Spain), KU Leuven (Belgium), and INESC-ID (Portugal). Girmaw is an Executive Member of the IEEE Kenya Section, and he currently serves as a reviewer and program committee member for multiple top-tier AI-focused journals and conferences.

May 4, 2023, 6 a.m.

Bio: Yasaman Bahri is a Research Scientist at Google Brain with research interests in the foundations of deep learning and the intersection of machine learning with the physical sciences. Prior to joining Google Brain, she completed her Ph.D. in Physics at UC Berkeley. She is a past recipient of the Rising Stars Award in EECS.

Invited talk: Martha White

May 5, 2023, 12:55 a.m.

May 4, 2023, 12:15 a.m.

The message-passing paradigm has been the “battle horse” of deep learning on graphs for several years, making graph neural networks a big success in a wide range of applications, from particle physics to protein design. From a theoretical viewpoint, it established the link to the Weisfeiler-Lehman hierarchy, allowing us to analyse the expressive power of GNNs. We argue that the very “node-and-edge”-centric mindset of current graph deep learning schemes may hinder future progress in the field. As an alternative, we propose physics-inspired “continuous” learning models that open up a new trove of tools from the fields of differential geometry, algebraic topology, and differential equations, so far largely unexplored in graph ML.
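A minimal sketch of the "continuous" viewpoint: node features evolving under graph heat diffusion dX/dt = -LX, integrated with explicit Euler steps. This hand-designed diffusion is only a stand-in for the learned dynamics such models parameterize; the graph and step size are illustrative assumptions.

```python
import numpy as np

# Toy graph: a 5-node path; adjacency A and graph Laplacian L = D - A.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Continuous model dX/dt = -L X, discretized with explicit Euler steps.
# Each step is equivalent to one round of (linear) neighbor averaging,
# i.e. a degenerate message-passing layer.
X = np.array([[1.0], [0.0], [0.0], [0.0], [0.0]])  # one-hot node feature
dt = 0.1
for _ in range(100):
    X = X - dt * (L @ X)
```

Because the rows of L sum to zero, diffusion conserves the total feature mass while smoothing it toward the uniform state, and replacing `-L @ X` with a learned vector field recovers the continuous GNNs the abstract alludes to.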

Michael Bronstein

Michael Bronstein is a professor at Imperial College London, where he holds the Chair in Machine Learning and Pattern Recognition, and is Head of Graph Learning Research at Twitter. He also heads ML research in Project CETI, a TED Audacious Prize-winning collaboration aimed at understanding the communication of sperm whales. Michael received his PhD from the Technion in 2007. He has held visiting appointments at Stanford, MIT, and Harvard, and has also been affiliated with the Institute for Advanced Study at TUM (as a Rudolf Diesel Fellow, 2017-2019) and Harvard (as a Radcliffe fellow, 2017-2018). Michael is the recipient of the Royal Society Wolfson Research Merit Award, the Royal Academy of Engineering Silver Medal, five ERC grants, two Google Faculty Research Awards, and two Amazon AWS ML Research Awards. He is a Member of the Academia Europaea, a Fellow of IEEE, IAPR, BCS, and ELLIS, an ACM Distinguished Speaker, and a World Economic Forum Young Scientist. In addition to his academic career, Michael is a serial entrepreneur and founder of multiple startup companies, including Novafora, Invision (acquired by Intel in 2012), Videocites, and Fabula AI (acquired by Twitter in 2019). He has previously served as Principal Engineer at Intel Perceptual Computing and was one of the key developers of the Intel RealSense technology.

May 4, 2023, 2:30 a.m.

Simulation is important for countless applications in science and engineering, and there has been increasing interest in using machine learning for efficiency in prediction and optimization. In the first part of the talk, I will describe our work on training learned models for efficient turbulence simulation. Turbulent fluid dynamics are chaotic and therefore hard to predict, and classical simulators typically require expertise to produce and take a long time to run. We found that learned CNN-based simulators can learn to efficiently capture diverse types of turbulent dynamics at low resolutions, and that they capture the dynamics of a high-resolution classical solver more accurately than a classical solver run at the same low resolution. We also provide recommendations for producing stable rollouts in learned models, and improving generalization to out-of-distribution states. In the second part of the talk, I will discuss work using learned simulators for inverse design. In this work, we combine Graph Neural Network (GNN) learned simulators [Sanchez-Gonzalez et al 2020, Pfaff et al 2021] with gradient-based optimization in order to optimize designs in a variety of complex physics tasks. These include challenges designing objects in 2D and 3D to direct fluids in complex ways, as well as optimizing the shape of an airfoil. We find that the learned model can support design optimization across hundreds of timesteps, and that the learned models can in some cases permit designs that lead to dynamics apparently quite different from the training data.

May 4, 2023, 7:15 a.m.

The potential of artificial intelligence (AI) in biology is immense, yet its success is contingent on interfacing effectively with wet-lab experimentation and remaining grounded in the system, structure, and physics of biology. In this talk, I will discuss how we have developed biophysically grounded AI algorithms for biomolecular design. I will share recent work in creating a diffusion-based generative model that designs protein structures by mirroring the biophysics of the native protein folding process. This work provides an example of how bridging AI with fundamental biophysics can accelerate design and discovery in biology, opening the door for sustained feedback and integration between the computational and biological sciences.

Ava Soleimany

May 5, 2023, 7:20 a.m.

Tim Althoff

Tim Althoff is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research advances computational methods that leverage large-scale behavioral data to extract actionable insights about our lives, health and happiness through combining techniques from data science, social network analysis, and natural language processing.

Tim holds Ph.D. and M.S. degrees from the Computer Science Department at Stanford University, where he worked with Jure Leskovec. Prior to his PhD, Tim obtained M.S. and B.S. degrees from the University of Kaiserslautern, Germany. He has received several fellowships and awards including the SAP Stanford Graduate Fellowship, Fulbright scholarship, German Academic Exchange Service scholarship, the German National Merit Foundation scholarship, a Best Paper Award by the International Medical Informatics Association, the WWW 2021 Best Paper Award, two ICWSM 2021 Best Paper Awards, and the SIGKDD Dissertation Award 2019. Tim's research has been covered internationally by news outlets including BBC, CNN, The Economist, The Wall Street Journal, and The New York Times.

May 5, 2023, 6:45 a.m.

Katherine Heller

May 5, 2023, 5:50 a.m.

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. However, due to their size, these datasets are necessarily uncurated. This opens the possibility of a "poisoning attack" that would allow an adversary to modify the behavior of a model. With our attack, we could have poisoned the training dataset of anyone who has used LAION-400M (or other popular datasets) in the last six months. The attack is trivial: we bought expired domains corresponding to URLs in popular image datasets. This gave us control over 0.01% of each of these datasets. In this talk I discuss how the attack works, the consequences of this attack, and potential defenses. More broadly, we hope machine learning researchers will study other simple but practical attacks on the machine learning pipeline.

May 4, 2023, 8 a.m.

Andrew Ferguson

Invited talk

May 4, 2023, 11 a.m.

Rafael Gomez-Bombarelli

Invited talk: Aakanksha Chowdhery

May 5, 2023, 4:30 a.m.

Aakanksha Chowdhery

May 4, 2023, 5:30 a.m.

Samuel Rutunda

I am the CTO at Digital Umuganda, an NLP company for African languages located in Rwanda. Digital Umuganda is building language technology for African languages to ensure everyone has access to information and services in their local language.

I am interested in the impact of technology on people, particularly within the African context, and in how artificial intelligence will drive the fourth industrial revolution.

May 4, 2023, 5:15 a.m.

Despite recent successes, deep learning systems are still limited by their lack of generalization. I'll present an approach to addressing this limitation which combines probabilistic, model-based learning, symbolic learning and deep learning. My work centers around probabilistic programming which is a powerful abstraction layer that separates Bayesian modeling and inference. In the first part of the talk, I’ll describe “inference compilation”, an approach to amortized inference in universal probabilistic programs. In the second part of the talk, I’ll introduce a family of wake-sleep algorithms for learning model parameters. Finally, I’ll introduce a neurosymbolic generative model called “drawing out of distribution”, or DooD, which allows for out of distribution generalization for drawings.

Tuan Anh Le

May 4, 2023, 7:30 a.m.

Humans display a remarkable capacity for discovering useful abstractions to make sense of and interact with the world. In particular, many of these abstractions are portable across behavioral domains, manifesting in what people see, do, and talk about. For example, people can visually decompose objects into parts; these parts can be rearranged to create new objects; the procedures for doing so can be encoded in language. What principles explain why some abstractions are favored by humans more than others, and what would it take for machines to emulate human-like learning of such “bridging” abstractions? In the first part of this talk, I’ll discuss a line of work investigating how people learn to communicate about shared procedural abstractions during collaborative physical assembly, which we formalize by combining a model of linguistic convention formation with a mechanism for inferring recurrent subroutines within the motor programs used to build various objects. In the second part, I’ll share new insights gained from extending this approach to understand why the kinds of abstractions that people learn and use varies between contexts. I will close by suggesting that embracing the study of such multimodal, naturalistic behaviors in humans at scale may shed light on the mechanisms needed to support fast, flexible learning and generalization in machines.

May 4, 2023, 8:15 a.m.

Many expect that AI will go from powering chatbots to providing mental health services. That it will go from advertisement to deciding who is given bail. The expectation is that AI will solve society’s problems by simply being more intelligent than we are. Implicit in this bullish perspective is the assumption that AI will naturally learn to reason from data: that it can form trains of thought that make sense, similar to how a mental health professional or judge might reason about a case, or more formally, how a mathematician might prove a theorem. This talk will investigate the question whether this behavior can be learned from data, and how we can design the next generation of AI techniques that can achieve such capabilities, focusing on constrained language generation, neuro-symbolic learning and tractable deep generative models.

Guy Van den Broeck

May 4, 2023, 7:10 a.m.

Training modern neural networks is time-consuming, expensive, and energy-intensive. As neural network training costs double every few months, it is difficult for researchers and businesses without immense budgets to keep up, especially as hardware improvements stagnate. In this talk, I will describe my favored approach for managing this challenge: changing the workload itself - the training algorithm. Unlike most workloads in computer science, machine learning is approximate, and we need not worry about changing the underlying algorithm so long as we properly account for the consequences. I will discuss how we have put this approach into practice at MosaicML, including the dozens of algorithmic changes we have studied (which are freely available open source), the science behind how these changes interact with each other (the composition problem), and how we evaluate whether these changes have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the time since we began this work, we have reduced the training times of standard computer vision models by 5-7x and standard language models by 2-3x, and we're just scratching the surface. I will close with a number of open research questions we have encountered that merit the attention of the research community. This is the collective work of a dozen empirical deep learning researchers at MosaicML, and I'm simply the messenger.

Bio: Jonathan Frankle is Chief Scientist at MosaicML, where he leads the company's research team toward the goal of developing more efficient algorithms for training neural networks. In his PhD at MIT, he empirically studied deep learning with Prof. Michael Carbin, specifically the properties of sparse networks that allow them to train effectively (his "Lottery Ticket Hypothesis" - ICLR 2019 Best Paper). In addition to his technical work, he is actively involved in policymaking around challenges related to machine learning. He will be joining the computer science faculty at Harvard in the fall of 2023. He earned his BSE and MSE in computer science at Princeton and has previously spent time at Google Brain, Facebook AI Research, and Microsoft as an intern and Georgetown Law as an Adjunct Professor of Law.

Invited Talk: Bo Li

May 5, 2023, 8:20 a.m.

May 5, 2023, 10:20 a.m.

May 5, 2023, 12:30 p.m.

May 5, 2023, 1 p.m.

May 5, 2023, 12:10 a.m.

Multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this talk is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining three key principles of modality heterogeneity, connections, and interactions that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.

May 5, 2023, 4:30 a.m.

Multimodal perception feature learning has great potential to unlock problems in video understanding, augmented reality, and embodied AI. I will present some of our recent work in learning with audio-visual (AV) and visual-language (VL) modalities. First, we explore how audio’s spatial signals can augment visual understanding of 3D environments. This includes ideas for self-supervised feature learning from echoes and AV floorplan reconstruction. Next, building on these spatial AV and scene acoustics ideas, we introduce new ways to enhance the audio stream – making it possible to transport a sound to a new physical environment observed in a photo, or to dereverberate speech so it is intelligible for machine and human ears alike. Throughout this line of work, we leverage our open-source SoundSpaces platform, which provides state-of-the-art rendering of highly realistic audio in real-world scanned environments, and thereby facilitates self-supervised AV learning. Finally, we propose a hierarchical video-language (VL) embedding that simultaneously learns to account for both the “what” (step-by-step activity) and the “why” (intention of the actor) in egocentric video.

Kristen Grauman

Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin. Her research in computer vision and machine learning focuses on visual recognition. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an Alfred P. Sloan Research Fellow and Microsoft Research New Faculty Fellow, a recipient of NSF CAREER and ONR Young Investigator awards, the 2013 PAMI Young Researcher Award, the 2013 IJCAI Computers and Thought Award, a 2013 Presidential Early Career Award for Scientists and Engineers (PECASE), and the Helmholtz Prize computer vision test of time award in 2017. Together with her collaborators, her research has been recognized with paper awards at CVPR 2008, ICCV 2011, ACCV 2016, and CHI 2017. She currently serves as an Associate Editor in Chief for the Transactions on Pattern Analysis and Machine Intelligence (PAMI) and an Editorial Board Member for the International Journal of Computer Vision (IJCV), and she served/serves as a Program Chair for CVPR 2015 and NIPS 2018.

May 5, 2023, 6:45 a.m.

Large models have had an ‘explosion’ moment recently, achieving state of the art results across various benchmarks and tasks. Here we discuss how they can be adapted to novel vision and audio inputs for multimodal tasks, either by influencing model design, or as frozen components in multimodal architectures. We focus on multimodal video captioning tasks such as ASR and automatic AD for movies, and cover some recently accepted papers at CVPR 2023.

May 5, 2023, 6:20 a.m.



May 5, 2023, 7:20 a.m.


May 5, 2023, 7:55 a.m.


May 5, 2023, 10:50 a.m.


Russ Salakhutdinov

May 5, 2023, 11:25 a.m.


Pradeep Natarajan

May 5, 2023, 1 p.m.


May 5, 2023, 1:35 p.m.


Eric P Xing

May 5, 2023, 6:30 a.m.

Large Language Models are rapidly emerging as the foundational technology to address numerous software engineering pain points. From code generation to bug fixing to migration and maintenance, they hold the potential to aid every part of the application development lifecycle. However, with great opportunities come great product responsibilities. LLMs need to be adapted to maximize the quality of every application, grounded in the user's context, and able to generate code in a way that respects and uplifts the open-source developments upon which they build. I will discuss some of the practical challenges and approaches to real-life LLM adaptation and Code AI product development, using data science as a motivating application.

May 5, 2023, 7:30 a.m.

In this presentation, we will share several accomplishments of the BigCode project, a community effort working on the responsible development of LLMs for code generation through open-science and open-governance. These include:

  • A new 15B parameter LLM for code
  • The Stack, a 6.4 TB corpus of permissively licensed source code with an opt-out mechanism
  • Novel insights into LLM scaling laws, suggesting we haven't reached the limit of training smaller LLMs for longer

Harm de Vries

Leandro von Werra

May 5, 2023, 11:30 a.m.

Powered by recent advances in code-generating models, AI assistants like Github Copilot promise to change the face of programming forever. But what is this new face of programming? And how can we help programmers use these assistants more effectively?

In the first part of the talk, I will present the first grounded theory study of how programmers interact with Copilot, based on observing 20 participants with varying levels of experience. Our main finding is that interactions with programming assistants are bimodal, with programmers using Copilot either in acceleration mode or exploration mode.

Based on the observations of this first study, we designed a new interaction model, dubbed Live Exploration of AI-generated Programs (LEAP), with the goal to better support programmers in exploration mode. The main idea of LEAP is to use Live Programming, a continuous display of a program’s runtime values, to help the user understand and validate AI code suggestions. In the second part of the talk, I will discuss LEAP and our user study, which shows that Live Programming lowers the cost of validating AI suggestions, thereby reducing both under- and over-reliance on the AI assistant.

Nadia Polikarpova

Nadia Polikarpova is an assistant professor at UC San Diego, and a member of the Programming Systems group. She received her Ph.D. in Computer Science from ETH Zurich in 2014, and then spent a couple years as a postdoctoral researcher at MIT. Nadia's research interests are in program synthesis, program verification, functional programming, and developer tools. She is a 2020 Sloan Fellow, and a recipient of the 2020 NSF Career Award and the 2020 Intel Rising Stars Award.

May 5, 2023, 10:45 a.m.

Danny Tarlow

Invited Talk: AI in Healthcare

May 4, 2023, 2 a.m.

Chris Fourie

Invited talk: Yani Ioannou

May 5, 2023, 1:15 a.m.

Invited talk: Pavlo Molchanov

May 5, 2023, 6:50 a.m.

Pavlo Molchanov

Invited talk: Jeff Dean

May 5, 2023, 7:10 a.m.

May 5, 2023, 12:35 a.m.

Invited Talk: AI, History and Equity

May 2, 2023, 4:30 a.m.

Large datasets are increasingly used to train AI models for addressing social problems, including problems in health. The societal impact of biased AI models has been widely discussed. However, the role of historical policies and injustices in shaping available data and outcomes is sometimes missing from the conversation. Evaluating data and algorithms through a historical lens could be critical for social change.

Elaine Nsoesie

Elaine Nsoesie is an Associate Professor in the Department of Global Health at the Boston University School of Public Health. She also leads the Racial Data Tracker project at the Boston University Center for Antiracist Research. She is a Data Science Faculty Fellow and was a Founding Faculty of the Boston University Faculty of Computing and Data Sciences. She currently co-leads the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) Program at the National Institutes of Health through the Intergovernmental Personnel Act (IPA) Mobility Program.

Her research is primarily focused on the use of data and technology to advance health equity. She has published extensively in peer-reviewed literature about opportunities and challenges involved in the use of data from social media, search engines, mobile phones, and other digital technologies for public health surveillance.

Her work approaches health equity from multiple angles, including increasing representation of communities typically underrepresented in data science through programs like Data Science Africa and AIM-AHEAD; addressing bias in health data and algorithms; and using data and policy to advance racial equity. She has collaborated with local departments of health in the U.S. to improve disease surveillance systems and with international organizations like UNICEF and UNDP, and she served as a Data & Innovation Fellow in the Directorate of Science, Technology, and Innovation (DSTI), The President’s Office, Sierra Leone.

Nsoesie was born and raised in Cameroon.

Nsoesie completed her PhD in Computational Epidemiology through the Genetics, Bioinformatics and Computational Biology program at Virginia Tech, writing her dissertation, Sensitivity Analysis and Forecasting in Network Epidemiology Models, at the Network Dynamics and Simulation Science Lab at the Virginia Tech Biocomplexity Institute. After postdoctoral associate positions at Harvard Medical School and Boston Children’s Hospital, she joined the faculty of the Institute for Health Metrics and Evaluation (IHME) at the University of Washington.

May 4, 2023, 12:50 a.m.

Jared Kaplan

May 4, 2023, 12:15 a.m.

Abstract: In recent years, there has been a surge in the number of trained models and datasets shared online. In this talk, we will investigate methods that allow us to leverage this trend. First, we will show that ensembles that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. We show these models specialize in subdomains of the data, leading to higher ensemble performance: with just two models (each with 76.5% ImageNet accuracy), we can create ensembles reaching 83.4% (a +7% boost). Second, we will discuss a method to make use of auxiliary tasks using an algorithm called ATTITTUD. This approach allows fine-grained resolution of conflicts between the gradient of the auxiliary task and that of the primary task. We will show that this approach produces significant improvements on benchmark tasks such as CheXpert.
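As a toy illustration of why uncorrelated errors help (all models and numbers below are synthetic stand-ins, not results from the talk), averaging the class probabilities of two models that fail on different examples can outperform either model alone:

```python
import numpy as np

def predict(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

labels = np.array([0, 1, 2, 0])
# Model A is confidently right everywhere except the third example...
logits_a = np.array([[4., 0., 0.], [0., 4., 0.], [3., 0., 1.], [4., 0., 0.]])
# ...while model B errs only on the second: their errors are uncorrelated.
logits_b = np.array([[4., 0., 0.], [2., 1., 0.], [0., 0., 4.], [4., 0., 0.]])

# Probability-averaging ensemble: each model's confident, correct prediction
# outweighs the other's less confident mistake on that example.
p_ens = 0.5 * (predict(logits_a) + predict(logits_b))
acc_a = (predict(logits_a).argmax(1) == labels).mean()    # 0.75
acc_b = (predict(logits_b).argmax(1) == labels).mean()    # 0.75
acc_ens = (p_ens.argmax(1) == labels).mean()              # 1.0
```

The same averaging gives no benefit when the two models make the same mistakes, which is why the talk's emphasis on divergent training methodology matters.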

Bio: Yann N. Dauphin is a machine learning researcher at Google Research working on understanding the fundamentals of deep learning algorithms and leveraging that understanding in various applications. He has published seminal work on understanding the loss surface of neural nets. Prior to joining Google in 2019, he was a researcher at Facebook AI Research from 2015 to 2018, where his work led to award-winning scientific publications and helped improve automatic translation on the platform. He completed his PhD at U. of Montreal under the supervision of Prof. Yoshua Bengio. During this time, he and his team won international machine learning competitions such as the Unsupervised Transfer Learning Challenge in 2013.

Yann Dauphin

Invited talk: Sekou Lionel Remy

May 5, 2023, 12:30 a.m.

Invited talk: Rumi Chunara

May 5, 2023, 4:30 a.m.

Rumi Chunara

Invited talk: Ewan Cameron

May 5, 2023, 2 a.m.

Ewan Cameron

Dr Cameron is a statistician and epidemiologist with over a decade of experience in the development and application of Bayesian inference and machine learning algorithms for knowledge discovery. In his role as Director of Malaria Risk Stratification at the Malaria Atlas Project he has worked extensively with collaborators at the World Health Organisation and local malaria control programs, translating model-based outputs to actionable decisions around the choice and targeting of interventions. As of February 2023 Dr Cameron is a Stan Perron Foundation Fellow leading a program of research focussed on combining geospatial “digital twin” technologies with mechanistic modelling of infectious disease transmission to design effective and equitable strategies for reducing COVID disease burden in WA children.

Invited talk: Deepti Gurdasani

May 5, 2023, 1 a.m.

Invited talk: Lorin Crawford

May 5, 2023, 6 a.m.

Lorin Crawford

I am a Principal Researcher at Microsoft Research New England. I also maintain a faculty position in the School of Public Health as an Associate Professor of Biostatistics, with an affiliation in the Center for Computational Molecular Biology at Brown University. The central aim of my research program is to build machine learning algorithms and statistical tools that aid in understanding how nonlinear interactions between genetic features affect the architecture of complex traits and contribute to disease etiology. An overarching theme of the research done in the Crawford Lab is to take modern computational approaches and develop theory that enables their interpretations to be related back to classical genomic principles. Some of my most recent work has landed me a place on the Forbes 30 Under 30 list and recognition as a member of The Root 100 Most Influential African Americans in 2019. I have also been fortunate enough to be awarded an Alfred P. Sloan Research Fellowship and a David & Lucile Packard Foundation Fellowship for Science and Engineering.

Prior to joining both MSR and Brown, I received my PhD from the Department of Statistical Science at Duke University, where I was co-advised by Sayan Mukherjee and Kris C. Wood. As a Duke Dean’s Graduate Fellow and NSF Graduate Research Fellow, I completed my PhD dissertation, entitled "Bayesian Kernel Models for Statistical Genetics and Cancer Genomics." I also received my Bachelor of Science degree in Mathematics from Clark Atlanta University.

May 4, 2023, 8 a.m.

It has been observed that the performance of deep neural networks often empirically follows a power law as simple scaling variables, such as the amount of training data and the number of model parameters, are changed. We would like to understand the origins of these empirical observations. We take a physicist’s approach to investigating this question through the pillars of exactly solvable models, perturbation theory, and empirically motivated assumptions about natural data. By starting from a simple, controlled theoretical setting, testing our predictions against experiments, and extrapolating to more realistic settings, we can propose a natural classification of scaling regimes driven by different underlying mechanisms.
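As a minimal sketch of the kind of empirical power law referred to above (synthetic data; the exponent and prefactor are illustrative assumptions, not results from the talk), a scaling law L(N) = a · N^(−α) becomes a straight line in log-log space, so the exponent can be read off as a slope:

```python
import numpy as np

# Synthetic "test loss" measured at several scales N (data size or parameters),
# generated from an assumed power law L(N) = a * N**(-alpha).
N = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
alpha_true, a_true = 0.25, 10.0
loss = a_true * N ** (-alpha_true)

# A degree-1 fit of log(loss) against log(N) recovers the scaling exponent
# (the slope) and the prefactor (exp of the intercept).
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha_hat = -slope          # estimated scaling exponent, ~0.25
a_hat = np.exp(intercept)   # estimated prefactor, ~10.0
```

In practice one fits this to measured losses with noise, and the interesting question the talk addresses is which mechanism sets the exponent in each scaling regime.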

May 5, 2023, 5:15 a.m.

Tingting Zhu

Invited talk: Boris Kozinsky

May 4, 2023, 6:10 a.m.

Boris Kozinsky

May 5, 2023, 5:15 a.m.

Statistical learning (and its theory) traditionally relies on training and test data being generated by the same process, an assumption that rarely holds in practice. Data-generating conditions might change over time, or agents might (strategically or adversarially) respond to a published predictor, aiming for a specific outcome for their manipulated instance. Developing methods for adversarial robustness has received a lot of attention in recent years, and both practical tools and theoretical guarantees have been developed. In this talk, I will focus on the learning-theoretic treatment of these scenarios and survey how different modeling assumptions can lead to drastically different conclusions. I will argue that for robustness we should aim for minimal assumptions on how an adversary might act, and I will present recent results on a variety of relaxations of learning with standard adversarial (or strategic) robustness.

May 4, 2023, 11:50 a.m.

Shyue Ping Ong

May 4, 2023, 2 a.m.

While deep learning has achieved excellent performance in a wide variety of tasks, its black-box nature means it is still a long way from being widely used in safety-critical domains such as healthcare. For example, it suffers from poor explainability and is vulnerable to attack at both training and test time. Moreover, existing work focuses mainly on local explanations and lacks the global knowledge needed to show class-wise explanations across the whole training procedure. In this talk, I will introduce our effort on visualizing, in the input space, a global explanation for every class learned during training. Our solution finds a representation set that demonstrates the learned knowledge for each class, enabling analysis of the model's knowledge at different stages of training. We also show that the generated explanations can lend insights into diagnosing model failures, such as revealing triggers in a backdoored model.

May 4, 2023, 2:31 a.m.

During the past decade, deep learning has achieved great success in healthcare. However, most existing methods aim at model performance in terms of higher accuracy and lack information reflecting the reliability of their predictions. Such predictions cannot be trusted for diagnosis and can even be disastrous in safety-critical clinical applications. How to build a reliable and robust healthcare system has become a focal topic in both academia and industry. In this talk, I will introduce our recent works on trustworthy AI in healthcare. I will also discuss some open challenges for trustworthy learning.

May 4, 2023, 6:45 a.m.

Federated learning (FL) is a trending framework to enable multi-institutional collaboration in machine learning without sharing raw data. This presentation will discuss our ongoing progress in designing FL algorithms that embrace the data heterogeneity properties for distributed medical data analysis in the FL setting. First, I will present our work on theoretically understanding FL training convergence and generalization using a neural tangent kernel, called FL-NTK. Then, I will present our algorithms for tackling data heterogeneity (on features and labels) and device heterogeneity, motivated by our previous theoretical foundation. Lastly, I will also show the promising results of applying our FL algorithms in healthcare applications.

May 4, 2023, 1:15 a.m.

Machine learning at scale has led to impressive results, including text-based image generation, reasoning with natural language, and code synthesis, to name but a few. ML at scale is also successfully applied to a broad range of problems in engineering and the sciences. These recent developments make some of us question the utility of incorporating prior knowledge in the form of symbolic (discrete) structures and algorithms. Are computing and data at scale all we need?

We will make an argument that discrete (symbolic) structures and algorithms in machine learning models are advantageous, and even required, in numerous application domains such as biology, materials science, and physics. Biomedical entities and their structural properties, for example, can be represented as graphs and require inductive biases equivariant to certain group operations. My lab's research is concerned with the development of machine learning methods that combine discrete structures with continuous equivariant representations. We also address the problem of learning and leveraging structure from data where it is missing, combining discrete algorithms and probabilistic models with gradient-based learning. We will show that discrete structures and algorithms appear in numerous places, such as ML-based PDE solvers, and that modeling them explicitly is indeed beneficial. In particular, machine learning models that aim to exhibit some form of explanatory property have to rely on symbolic representations. The talk will also cover some biomedical and physics-related applications.

Mathias Niepert

May 4, 2023, 6:35 a.m.

In this talk, I will present several empirical studies on understanding and analyzing pre-training of language models. I will start with BERT’s pre-training/fine-tuning paradigm, and discuss how pre-training objectives will influence downstream performance. Then, I will move on to the scaling of autoregressive large language models. Through analyzing intermediate training checkpoints, we present several interesting findings on token-level perplexity, sentence-level generation and their correlation with in-context learning on downstream tasks. I hope these findings can encourage more theoretical understanding and improved pre-training in the future.

Bio: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Her recent research focuses on training, adapting and understanding large language models, and developing scalable and efficient NLP systems for question answering, information extraction and conversational agents. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research. She received her Ph.D. from Stanford University (2018) and B.E. from Tsinghua University (2012), both in Computer Science. Her research was recognized by a Sloan Fellowship, an NSF CAREER award, a Samsung AI Researcher of the Year award, outstanding paper awards from ACL and EMNLP, and multiple industry faculty awards.

Danqi Chen

Assistant professor of Computer Science at Princeton University; Natural Language Processing and Machine Learning

May 4, 2023, 5:01 a.m.

Medical imaging plays a vital role in diagnosing and treating various health conditions, but it also raises significant privacy concerns, as sensitive personal information can be contained within these images. Differential privacy, a privacy-preserving artificial intelligence technique, offers a solution to these challenges and enables the secure analysis of medical images while protecting patient privacy.

In this talk, we will focus on the potential of differential privacy in medical imaging. We will explore its various applications, including disease detection, diagnosis, and treatment planning, and discuss its ethical implications. We will also examine the technical aspects of differential privacy, including its implementation in machine learning algorithms, such as deep learning, and its limitations and challenges.
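As a hedged sketch of the basic building block behind such implementations (illustrative only; the function name and numbers are ours, not the speakers'), the Laplace mechanism releases a numeric query under ε-differential privacy by adding noise scaled to the query's sensitivity:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return an epsilon-differentially-private estimate of a numeric query:
    the true value plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
# A counting query (e.g., number of scans showing a finding) has sensitivity 1:
# adding or removing one patient changes the count by at most 1.
private_count = laplace_mechanism(true_value=120, sensitivity=1, epsilon=1.0, rng=rng)
```

Smaller ε means more noise and stronger privacy; this is exactly the privacy-utility trade-off the abstract refers to, and the same idea applied to gradients underlies differentially private deep learning.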

Furthermore, we will highlight some of our ongoing research and development efforts in this area, including recent advancements in differentially private deep learning for medical imaging. We will discuss the trade-offs between privacy and utility in these applications and provide insights on how to achieve a balance between the two.

Attendees will gain a deeper understanding of the potential and challenges of differential privacy in medical imaging and its implications for healthcare.

May 4, 2023, 2:50 a.m.

Kathleen Siminyu