Scaling Horizontally in Reinforcement Learning

The ability to learn on its own, through experimentation with the world rather than through human-labeled training sets, is a major strength of Reinforcement Learning. Combined with deep learning, this is a fantastic strategy for problems such as learning to play Backgammon and Go. Learning these games requires millions of training episodes, all of which can be simulated and planned by the agent using approximate models.

However, the temporal nature of Reinforcement Learning can be a limiting factor in its ability to scale to real-world problems. Learning non-stationary problems such as irrigation control on crops seems like a perfect application of Reinforcement Learning. And in many ways it is. The problem is, unlike training Backgammon, you simply can’t run millions of simulations of the game of “growing plants.” You have to wait on nature to learn from your experience and the consequences of applying more or less water.

One way to solve this would be through better models. Observing the health of the plant at a finer level, changing by the minute or even second, could provide enough signal to help learn. But those types of sensors may not exist before the prevalence of a second tool: what I’d call horizontal reinforcement learning (I’m sure there’s a much better term that someone has coined). It is somewhat similar to multiple agents collaborating together to solve a problem, but subtly different. They are iid (independent and identically distributed) agents that pool their experience together to learn faster. Imagine a crop that is sectioned off into 1 million different tiles, each with its own irrigation system. In one night, you could learn from 1 million different experiments rather than just one. In such distributed systems, the temporal difference algorithms all hold up when pooling experience, and the computational challenge is that of any distributed system, whose problems (latency and concurrency) have relatively strong solutions in computer science.
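To make the pooling idea concrete, here is a minimal sketch of tabular TD(0) learning from many iid tiles at once. Everything in it is hypothetical: the soil-moisture states, the irrigation actions, and the reward standing in for plant health are made up for illustration, and the point is only that transitions gathered from every tile can feed the same value estimates.

```python
# Minimal sketch: "horizontal" experience pooling with tabular TD(0).
# States, actions, and rewards below are hypothetical placeholders.
import random
from collections import defaultdict

ALPHA = 0.1              # step size
GAMMA = 0.9              # discount factor
V = defaultdict(float)   # shared value estimates, pooled across all tiles

STATES = ["dry", "ok", "wet"]

def simulate_tile_day(state):
    """Stand-in for one tile's overnight experience: returns (reward, next_state)."""
    action = random.choice(["more_water", "less_water"])
    reward = random.random()                # placeholder for observed plant health
    next_state = random.choice(STATES)
    return reward, next_state

def pooled_td0_update(transitions):
    """Apply TD(0) to every transition gathered from the iid tiles."""
    for state, reward, next_state in transitions:
        td_error = reward + GAMMA * V[next_state] - V[state]
        V[state] += ALPHA * td_error

# One night of learning: many tiles each contribute one transition.
transitions = []
for _ in range(1_000_000):
    state = random.choice(STATES)
    reward, next_state = simulate_tile_day(state)
    transitions.append((state, reward, next_state))

pooled_td0_update(transitions)
print(dict(V))
```

The only distributed-systems work here would be collecting the transitions from each tile; the TD update itself is unchanged whether the experience comes from one agent or a million.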

 

“Goals” in Reinforcement Learning

I want to create a mind that, through its own learning, makes real-time decisions. These decisions would likely be optimized to accomplish a goal, which can be represented by maximizing reward. So in this sense, goals help shape behavior and decision making.

Constraining this intelligence around goals, however, seems limiting. Imagine instead an intelligent mind that was able to take an input stream and project all possible futures based on its actions. An ability to make these predictions would produce the ultimate mind.

However, projecting an unconstrained future in this way is impractical, at least for now, for two reasons. The first is performance; the second is learning. For performance, without goals the agent is essentially performing a random walk. It’s akin to a water skier with no goal, performing random actions: without a goal, the skier will fall almost immediately. Learning without a goal is equally problematic. With a goal, an agent is able to try different actions and assess how successful they were toward an end goal. It’s a pruning mechanism for deciding which temporally extended actions to keep learning about. A water skier learning to slalom ski is able to keep trying different levels of aggression in their edge change. If the goal, however, was to learn how to trick ski, the actions tried would be very different.
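Here is a toy sketch of that pruning effect, phrased as a simple epsilon-greedy bandit. The “actions” (levels of edge-change aggression) and the reward function are invented for illustration; the point is that a goal, encoded as reward, lets the agent rank actions and concentrate on the promising ones, whereas without it every action looks the same and there is nothing to prune.

```python
# Toy sketch: goal-driven pruning of actions via an epsilon-greedy bandit.
# Actions and the reward function are hypothetical stand-ins.
import random

actions = [0.1, 0.3, 0.5, 0.7, 0.9]      # hypothetical edge-change aggression levels
estimates = {a: 0.0 for a in actions}     # running value of each action toward the goal
counts = {a: 0 for a in actions}
EPSILON = 0.1

def reward_for(aggression):
    """Stand-in goal: slalom-style success peaks at moderate aggression."""
    return 1.0 - abs(aggression - 0.6) + random.gauss(0, 0.05)

for step in range(10_000):
    if random.random() < EPSILON:
        a = random.choice(actions)               # keep exploring
    else:
        a = max(actions, key=estimates.get)      # exploit what the goal signal says
    r = reward_for(a)
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]   # incremental average

print(max(estimates, key=estimates.get))   # the aggression level worth refining
```

Swap in a reward function that scores trick-ski moves instead, and the same mechanism prunes toward a very different set of actions.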

Markov Decision Process in Reinforcement Learning

I’ve been reading Reinforcement Learning: An Introduction by Sutton and Barto, taking notes on each chapter as I go.

Chapter 3 – Finite Markov Decision Processes took more time than I’d like to admit to understand. I’ll blame it on the larger-than-usual amount of statistics and math, combined with the 10+ years away from university.

Nonetheless, I took notes as I slogged my way to a somewhat better understanding. Hopefully someone finds the notes attached below useful.

 

cmput-609-chapter-3

Perception as Prediction

What is perception? What does the world believe perception is? How does supervised learning model perception? Presumably through labeled data? What does Reinforcement Learning suggest perception is? State and action combinations? Where are these two fields right? Where are they wrong? Where do they agree? Is perception the same as prediction? Is perception just an opportunity for action? Perhaps an opportunity for interaction (action implies I’m acting on an object, when perhaps the object could act on me)? Furthermore, maybe I only perceive things that do in fact offer opportunities for interaction. A near-infinite number of things pass by in day-to-day life without notice.

These questions warrant consideration given the current state of artificial intelligence (and my understanding of it!).

 

Paleo … again

Since competing at the Canadian Ultimate Championships in the middle of August, my diet has been … pretty terrible. This is a fairly rare occurrence for me … I’m usually a stickler for eating healthy. Chocolate, trail mix, granola … I have a love-hate relationship with you!

So tomorrow … I’m trying paleo again. In the past I’ve felt great – mentally and physically – while on it.