Humans display a remarkable capacity for discovering useful abstractions to make sense of and interact with the world. In particular, many of these abstractions are portable across behavioral domains, manifesting in what people see, do, and talk about. For example, people can visually decompose objects into parts; these parts can be rearranged to create new objects; and the procedures for doing so can be encoded in language. What principles explain why some abstractions are favored by humans more than others, and what would it take for machines to emulate human-like learning of such “bridging” abstractions? In the first part of this talk, I’ll discuss a line of work investigating how people learn to communicate about shared procedural abstractions during collaborative physical assembly, which we formalize by combining a model of linguistic convention formation with a mechanism for inferring recurrent subroutines within the motor programs used to build various objects. In the second part, I’ll share new insights gained from extending this approach to understand why the kinds of abstractions that people learn and use vary between contexts. I will close by suggesting that embracing the study of such multimodal, naturalistic behaviors in humans at scale may shed light on the mechanisms needed to support fast, flexible learning and generalization in machines.