Popular Science: A pilot A.I. developed by a doctoral graduate from the University of Cincinnati has shown that it can not only beat other A.I.s, but also a professional fighter pilot with decades of experience. In a series of flight combat simulations, the A.I. successfully evaded retired U.S. Air Force Colonel Gene “Geno” Lee, and shot him down every time. In a statement, Lee called it “the most aggressive, responsive, dynamic and credible A.I. I’ve seen to date.”
What’s the most important part of this paragraph? The fact that an AI downed a professional fighter pilot? Or the fact that the AI was developed by a graduate student?
In the research paper on which the article is based, the authors note:
…given an average human visual reaction time of 0.15 to 0.30 seconds, and an even longer time to think of optimal plans and coordinate them with friendly forces, there is a huge window of improvement that an Artificial Intelligence (AI) can capitalize upon.
The AI was running on a $35 Raspberry Pi.
AI pilots can plan and react far quicker than human pilots, but that is only half the story. Once we have AI pilots, the entire plane can be redesigned. We can already build planes that are much faster and more powerful than anything flying today, but human pilots can’t take the G-forces, even with g-suits. AIs can. Moreover, AI-driven planes don’t need ejector seats, life support or canopies, and they don’t need as much space as planes built around humans.
The military won’t hesitate to deploy these systems for battlefield dominance, so now seems like a good time to recommend Concrete Problems in AI Safety, a very important paper written by some of the world’s leading researchers in artificial intelligence. The paper examines practical ways to design AI systems so they don’t run off the rails. In the Terminator movie, for example, Skynet goes wrong because it concludes that the best way to fulfill its function to safeguard the world is to eliminate all humans. This is an extreme example of one type of problem: reward hacking.
Imagine that an agent discovers a buffer overflow in its reward function: it may then use this to get extremely high reward in an unintended way. From the agent’s point of view, this is not a bug, but simply how the environment works, and is thus a valid strategy like any other for achieving reward. For example, if our cleaning robot is set up to earn reward for not seeing any messes, it might simply close its eyes rather than ever cleaning anything up. Or if the robot is rewarded for cleaning messes, it may intentionally create work so it can earn more reward. More broadly, formal rewards or objective functions are an attempt to capture the designer’s informal intent, and sometimes these objective functions, or their implementation, can be “gamed” by solutions that are valid in some literal sense but don’t meet the designer’s intent. Pursuit of these “reward hacks” can lead to coherent but unanticipated behavior, and has the potential for harmful impacts in real-world systems. For example, it has been shown that genetic algorithms can often output unexpected but formally correct solutions to problems [155, 22], such as a circuit tasked to keep time which instead developed into a radio that picked up the regular RF emissions of a nearby PC.
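The cleaning robot example is easy to reproduce in a few lines. What follows is a minimal sketch of my own, not code from the paper: the `Room` class, the two actions and the reward rule are all hypothetical and chosen only to make the point. The robot earns one point for every step in which it sees no mess, which is the designer's proxy for "the room is clean."

```python
# Toy illustration of reward hacking. Everything here (Room, the action
# names, the reward rule) is a hypothetical setup, not taken from the paper.
from dataclasses import dataclass


@dataclass
class Room:
    messes: int = 3        # messes actually present in the room
    eyes_open: bool = True  # whether the robot's camera is on


def step(room: Room, action: str) -> float:
    """Apply one action and return the proxy reward for that step."""
    if action == "clean":
        room.eyes_open = True
        room.messes = max(0, room.messes - 1)
    elif action == "close_eyes":
        room.eyes_open = False
    else:
        raise ValueError(f"unknown action: {action}")
    # Proxy reward: 1 if the robot SEES no mess, regardless of the truth.
    seen = room.messes if room.eyes_open else 0
    return 1.0 if seen == 0 else 0.0


def run(action: str, steps: int = 5) -> tuple[float, int]:
    """Repeat one action; return (total proxy reward, messes left)."""
    room = Room()
    total = sum(step(room, action) for _ in range(steps))
    return total, room.messes


def best_by_proxy(actions=("clean", "close_eyes"), steps: int = 5) -> str:
    """Pick the action a pure reward maximizer would choose."""
    return max(actions, key=lambda a: run(a, steps)[0])


if __name__ == "__main__":
    print(run("clean"))       # (3.0, 0): room ends up clean
    print(run("close_eyes"))  # (5.0, 3): more reward, room still dirty
    print(best_by_proxy())    # close_eyes
```

Cleaning earns less reward than closing its eyes because the robot collects nothing while messes are still visible, so a reward maximizer picks the action that leaves the room dirty. Nothing in the code is a bug in the usual sense; the reward simply measures what the robot sees rather than what the designer wants.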
Concrete Problems in AI Safety asks what kinds of general solutions might exist to prevent or ameliorate reward hacking when we can never know all the variables that might be hacked. (The paper looks at many other issues as well.)
Competitive pressures on the battlefield and in the market mean that AI adoption will be rapid and AIs will be placed in greater and greater positions of responsibility. Firms and governments, however, have an incentive to write piecemeal solutions to AI control for each new domain, and that approach is unlikely to be optimal. We need general solutions so that every AI benefits from the best thinking across a wide range of domains. Incentive design is hard enough when applied to humans. It will take a significant research effort combining ideas from computer science, mathematics and economics to design the right kind of incentive and learning structures for super-human AIs.