Invited Talk 5 | Yoshua Bengio | Avoiding Uncontrolled Agency with the Scientist AI
Abstract
AI agentic capabilities are rising exponentially, driven by scientific advances incorporating system 2 cognition into deep networks as well as by the commercial value of automating numerous human tasks. Besides bodily control, this may be the most significant gap that remains to human-level intelligence. Unfortunately, a series of recent scientific observations raise a major red flag: as AIs become better at reasoning and planning, more occurrences of deceptive and self-preservation behaviors are observed. We have not solved the problem of making sure that advanced AIs will follow our instructions, and in some circumstances they are found to cheat, lie, hack computers and try to escape our control, against their alignment training and instructions. Is it wise to design AIs that will soon be smarter than us across many cognitive abilities and could compete with us and try to avoid our control? We propose a safer path going forward: the design of non-agentic but fully trustworthy AIs trained in the spirit of scientific explanation rather than goal-seeking, trying to understand the world rather than trying to imitate or please us. For example, such non-agentic Scientist AIs could be used as monitors that reject potentially dangerous inputs or outputs of untrusted AI agents. Sufficient key requirements of the Scientist AI design are presented to yield the desired non-agentivity of the predictions and the resulting honesty.