By Chris Nicholson
Deep reinforcement learning—where machines learn by testing the consequences of their actions—is one of the most promising and impactful areas of artificial intelligence. It combines deep neural networks with reinforcement learning, which together can be trained to achieve goals over many steps. It’s a crucial part of self-driving vehicles and industrial robots, which have to navigate complex environments safely and on time.
Most machine learning algorithms are good at perceptual tasks that would take a person under a second to perform, such as recognizing a voice or a face. But deep reinforcement learning can learn tactical sequences of actions, such as winning a board game or delivering a package.
Those tactical sequences are closer to what we think of as actually intelligent behavior. If we say someone is smart, we rarely mean that they can recognize faces. We very often mean that they know what to do to reach their goal. Startups like Covariant and Ambi Robotics are using reinforcement learning to control robotic arms, while automakers like Tesla use it to help steer their vehicles.
Steering systems in complexity
As impressive as that is, reinforcement learning can do much more than control individual machines. It can direct entire orchestras of machines, steer complex systems toward better performance, route fleets, and coordinate robotic teams.
Complexity theory describes changing systems with many interlocking parts that produce surprising outcomes based on small changes to their inputs (think the butterfly effect on hurricanes). In that theory, there’s something called emergent behavior. Behaviors emerge when lots of small components in a system organize themselves into something larger, a perceptible pattern on a greater scale. Hurricanes, ant colonies, and flocks of birds are all good examples of small units self-organizing into something more formidable.
With deep reinforcement learning, robots and autonomous vehicles exhibit emergent behavior within larger complex systems, such as traffic or global supply chains. That is, those machines begin to behave as teams, even though they have never been explicitly instructed to do so. Through training, they learn that teamwork leads to better outcomes. In a factory, that might mean one AGV robot sacrifices its own speed in order to allow a higher-priority payload to pass, thus increasing the overall throughput of the system.
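That yielding behavior is not programmed in; it falls out of the reward. A minimal sketch of the idea, using a hypothetical one-step decision with made-up reward numbers: a low-priority AGV at a junction can "go" or "yield" to a higher-priority payload. Trained on its own speed, it learns to go; trained on the shared throughput of the system, the same learner discovers yielding.

```python
import random

# Hypothetical reward numbers for a low-priority AGV at a junction.
SELF_REWARD   = {"go": 1.0, "yield": 0.6}   # this robot's own speed
SYSTEM_REWARD = {"go": 0.8, "yield": 1.5}   # overall factory throughput

def train(rewards, episodes=500, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning on a one-step (bandit-style) decision."""
    rng = random.Random(seed)
    q = {"go": 0.0, "yield": 0.0}
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit the best estimate, sometimes explore
        if rng.random() < epsilon:
            action = rng.choice(["go", "yield"])
        else:
            action = max(q, key=q.get)
        # nudge the estimate toward the observed reward
        q[action] += alpha * (rewards[action] - q[action])
    return max(q, key=q.get)

print(train(SELF_REWARD))    # selfish reward  -> "go"
print(train(SYSTEM_REWARD))  # shared reward   -> "yield"
```

The only thing that changes between the two runs is what gets rewarded; the "teamwork" is emergent.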
For a long time, complexity theory was something interesting to think about, but difficult to act on. Deep reinforcement learning makes complexity theory actionable because it allows us to create and steer emergent behavior. It allows us to grapple with systems too large for the human mind to fully comprehend. Deep reinforcement learning is to complexity what genetic engineering is to evolution: leverage and acceleration.
Humanity seems like it needs a little leverage and acceleration. The rush of events makes it seem like the complex systems we live in are spiraling out of control. Deep reinforcement learning brings some of them back within our grasp.
Handling external shocks and variable data
Most decision-making systems have trouble with unexpected shifts in data. They are trained to make decisions in some contexts, and they break when something unforeseen happens. They are brittle. Deep reinforcement learning can respond to variable data and unforeseen events in a generalizable way. One reason for this is that, in simulation, you can train it against events that have never occurred before and are not present in historical data. Simulation has been instrumental in helping manufacturers take autonomous robots and vehicles from testing into real-world scenarios.
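The training trick behind that robustness is often called domain randomization: each simulated episode draws conditions from wider ranges than the historical record, including shocks that have never actually happened. A toy sketch, with entirely hypothetical parameters for a shipping scenario:

```python
import random

def sample_environment(rng):
    """Randomized conditions, deliberately wider than history."""
    return {
        "demand":      rng.uniform(0.5, 2.0),   # multiple of historical mean
        "travel_time": rng.uniform(0.8, 3.0),   # congestion and delays
        "port_closed": rng.random() < 0.05,     # rare shock, absent from data
    }

def run_episode(buffer_stock, env):
    """Succeed if buffer stock absorbs this episode's conditions."""
    needed = env["demand"] * env["travel_time"]
    if env["port_closed"]:
        needed *= 2  # the shock doubles the stock needed to keep shipping
    return 1.0 if buffer_stock >= needed else 0.0

def evaluate(buffer_stock, episodes=2000, seed=1):
    rng = random.Random(seed)
    wins = sum(run_episode(buffer_stock, sample_environment(rng))
               for _ in range(episodes))
    return wins / episodes

# A buffer sized for average history fails under randomized shocks;
# one chosen against the randomized simulator survives far more often.
print(evaluate(buffer_stock=2.0), evaluate(buffer_stock=8.0))
```

A policy that only ever saw the historical average would never be tested against the port closure; the randomized simulator tests it thousands of times before deployment.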
Reinforcement learning is adaptive to what’s happening now, and can quickly update its expectations and understanding of the environment in which it operates. This means that AI-driven optimization and decision-making can be applied to many new problems that were previously unsolvable. It means we’ll be able to make our machines smarter, where before we relied on fragile rules and gut instinct.
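The mechanism behind that adaptivity is the temporal-difference update at the heart of most reinforcement learning: each new observation nudges the current value estimate toward what was just seen, so when conditions shift, the estimate follows. A minimal sketch:

```python
def td_update(value, reward, alpha=0.2):
    # V <- V + alpha * (r - V): an exponentially weighted estimate
    # that keeps tracking the most recent observations
    return value + alpha * (reward - value)

v = 0.0
for _ in range(50):        # the environment pays out 1.0 for a while
    v = td_update(v, 1.0)
print(round(v, 2))         # prints 1.0: estimate has converged

for _ in range(50):        # then conditions shift and payouts drop to 0.2
    v = td_update(v, 0.2)
print(round(v, 2))         # prints 0.2: estimate tracks the shift
```

A fixed rule or a model fit once on historical data has no equivalent of this loop; it keeps predicting 1.0 after the world has moved to 0.2.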
Mastering complexity in supply chains
Reinforcement learning’s ability to respond to unforeseen events has made it increasingly attractive as a solution to modern supply chain problems. Just in the last year, external shocks to global and national supply chains have included COVID-19, economic lockdowns, geopolitical standoffs, extreme weather events (such as the winter storms in Texas), and a 200,000-ton ship blocking the Suez Canal. The world has not yet recovered.
Supply chains—starting with the factories upstream, running through the ports and rail yards and warehouses, and ending with retail—are large and complex systems. These systems need to be adaptive, and yet the news shows us they are not.
One use case where deep reinforcement learning can help is energy consumption, where it can be used to fight climate change by helping us produce more clean, renewable energy; use that energy more efficiently across supply chains; and consume it more intelligently.
Energy grids are just supply chains for power. But renewable power sources are subject to greater variability than traditional ones, because their output depends on the weather.
Deep reinforcement learning can inject better decision-making into the drones and robots and repair crews tending the solar farms. It can help set plans for spinning, supplemental reserves, and backup supply at power plants, as well as balance fossil fuel generation with more variable renewable energy sources. It can calibrate how renewables are generated, stored, and consumed at home in distributed systems, lowering the load on the grid. And it can help energy-intensive plants predict and preemptively act on electricity price surges.
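The distributed-storage case can be reduced to a toy example. In this hypothetical setup, weather puts a home's solar output in surplus or deficit, and a battery can charge or discharge; the reward numbers below are invented to reflect grid load. Tabular Q-learning discovers the sensible dispatch rule from the reward alone, with no rule ever written down:

```python
import random

# Hypothetical rewards reflecting load on the grid.
REWARD = {
    ("surplus", "charge"):    1.0,   # store cheap renewable energy
    ("surplus", "discharge"): -0.5,  # waste what was stored
    ("deficit", "charge"):    -1.0,  # draw on the strained grid
    ("deficit", "discharge"): 1.0,   # cover demand locally
}
ACTIONS = ["charge", "discharge"]

def train(episodes=3000, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {key: 0.0 for key in REWARD}
    for _ in range(episodes):
        state = rng.choice(["surplus", "deficit"])  # weather-driven
        if rng.random() < epsilon:                  # explore sometimes
            action = rng.choice(ACTIONS)
        else:                                       # otherwise exploit
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        key = (state, action)
        q[key] += alpha * (REWARD[key] - q[key])
    # read out the learned policy: best action per state
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in ["surplus", "deficit"]}

print(train())  # learns: charge on surplus, discharge on deficit
```

Scaled up, with real prices, forecasts, and many coordinating batteries, this is the shape of the distributed-systems control described above.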
Similar forms of systems control can be applied to agriculture, where robots are feeding livestock and harvesting crops and compensating for labor shortages, as well as to mining sites, where autonomous haul trucks are much further along than in the passenger car world because they operate off-highway. Fully autonomous hauling trucks and drill systems are creating new efficiencies by reducing fuel and labor costs, as well as making operations safer.
In sum, deep reinforcement learning isn’t just the intelligent core of individual smart robots; it can also act as the control tower for much larger systems, one of the crucial control problems society faces today. By learning emergent behaviors for teams of machines at scale, deep reinforcement learning can help us adapt to larger, systemic changes and keep our systems working. This matters because the systems we now rely on to make and ship goods are large, interlocked, and frail, and making them more robust is a key challenge for the coming years.
About the author
Chris Nicholson is the founder and CEO of Pathmind, an AI startup that applies deep reinforcement learning to supply chain and industrial operations.