Scaling Horizontally in Reinforcement Learning

The ability to learn on its own, through experimentation with the world rather than from human-labeled training sets, is a major strength of Reinforcement Learning. Combined with deep learning, this strategy has excelled at problems such as learning to play Backgammon and Go. Learning these games requires millions of training episodes, all of which the agent can simulate and plan through using approximate models.

However, the temporal nature of Reinforcement Learning can limit its ability to scale to real-world problems. A non-stationary problem such as irrigation control for crops seems like a perfect application of Reinforcement Learning, and in many ways it is. The problem is that, unlike training on Backgammon, you simply can’t run millions of simulations of the game of “growing plants.” You have to wait on nature to learn from your experience and from the consequences of applying more or less water.

One way to address this would be better models. Observing the health of the plant at a finer granularity, one that changes by the minute or even the second, could provide enough signal to learn from. But those kinds of sensors may not exist before the prevalence of a second tool, which I’d call horizontal reinforcement learning (I’m sure someone has coined a better term). It is somewhat similar to multiple agents collaborating to solve a problem, but subtly different: here the agents are iid (independent and identically distributed) and pool their experience to learn faster. Imagine a crop sectioned into one million tiles, each with its own irrigation system. In one night, you could learn from one million different experiments rather than just one. In such a distributed system, the temporal-difference algorithms all hold up when experience is pooled, and the computational challenge is that of any distributed system; computer science has relatively mature solutions for latency and concurrency.
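To make the idea concrete, here is a minimal sketch (my own illustration, not a real irrigation system) of experience pooling: many iid agents, the crop tiles, each take one step per night in identical but independent environments, and every transition feeds the same shared TD(0) value table. The soil states, dynamics, and rewards below are invented for illustration.

```python
import random

ALPHA, GAMMA = 0.02, 0.9     # learning rate and discount factor
N_TILES, N_NIGHTS = 1000, 100

# Toy soil dynamics: one unit of moisture evaporates each night,
# watering adds 0, 1, or 2 units, and the level is clamped to [0, 2].
LEVELS = ["dry", "moist", "wet"]              # indices 0, 1, 2
REWARD = {"dry": -1.0, "moist": 1.0, "wet": -0.5}

def step(state, water):
    """Hypothetical plant environment: returns (next_state, reward)."""
    level = LEVELS.index(state)
    nxt = LEVELS[min(2, max(0, level + water - 1))]
    return nxt, REWARD[nxt]

V = {s: 0.0 for s in LEVELS}                  # shared value estimates

random.seed(0)
states = [random.choice(LEVELS) for _ in range(N_TILES)]
for night in range(N_NIGHTS):
    for i, s in enumerate(states):
        water = random.choice([0, 1, 2])      # random exploratory policy
        s2, r = step(s, water)
        # The ordinary TD(0) update; because the tiles are iid, pooling
        # their transitions changes nothing about the algorithm itself --
        # each night simply contributes N_TILES samples instead of one.
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        states[i] = s2

print({s: round(v, 2) for s, v in V.items()})
```

After enough nights the pooled estimates reflect that, under this toy dynamics, a dry tile is the worst place to be (only heavy watering rescues it), which a single tile would take a hundred times longer to discover.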