Startup ideas are products of environment

I used to fancy myself an “idea guy” … ideas of the startup variety, that is. I felt like I could, at any point, rattle off four or five startup ideas … and not all of them would be laughed at by an investor. Somehow I believed there was something innate in this ability. However, since coming back for grad school at the University of Alberta, my perspective on this “ability” has changed.

It’s been over a year since I started my research in reinforcement learning. In that time, I can’t say I’ve had a single startup idea (not quite true, but close). You’d think it was because I didn’t give it any thought … but that’s not entirely true. There have been moments where I’ve tried to think of ideas, and each time I’ve grasped at straws. Maybe the lack of ideas stems from the fact that I approach the search armed with my reinforcement learning hammer, seeking a nail. This approach (trying to find a problem your technical expertise can solve) could be argued to be an anti-pattern for coming up with good ideas. Furthermore, it seems that reinforcement learning is in the infancy of being applied to real world problems, and still struggles to find traction because of the lack of data (or, more precisely, the time it takes to acquire real life data). I blogged about this challenge before in a post called “Scaling horizontally in reinforcement learning.”

But I think the root of it might be environmental. My time at the water cooler is spent talking about how to optimize algorithms, or how to leverage GVFs to form predictive state representations. It’s not spent talking about creating an app for that, as it was when I spent my days in the Bay Area startup scene. For the record … I’m not complaining about this change.

The effect seems intuitive. Obviously someone immersed in an environment where everyone and their dog is talking about startup ideas is going to have a few ideas of their own. But until I was immersed in the academic environment, immediately after spending several years working at startups in SOMA, I didn’t realize how much of an effect the environment has on the type of problems a person attempts to solve.


Perhaps time really does move faster as you age …

When I was a kid, summer holidays (those two magical months of July and August) seemed like an eternity. Now, however, as a 37-year-old, it seems like just yesterday that I was launching fireworks to celebrate the Labor Day weekend (we really did. And it was amazing!). That was over two months ago. I’m sure most “adults” can relate to this feeling of time moving faster as you get older. But perhaps there’s a reasonable explanation for the effect.

Time is conventionally thought of as continuous. I’m sure quantum physicists far more intelligent than I have postulated a discrete time domain, but for now, I perceive time as continuous. Recently, however, I have been implementing reinforcement learning algorithms in robotic domains, where time is discretized.

In such domains, the robot agent “wakes up” at a certain frequency to compute. Each time the agent “wakes up,” it must choose an action, take that action, observe the environment, and finally learn.

[Figure: the agent–environment interaction loop (RL.png)]

In the robot environments I have worked with, I, the designer, have defined the cycle frequency – how often the agent “wakes up” to observe its most recent environment and take an action. To such an agent, it is easy to imagine that the only definition of “elapsed time” is the number of learning cycles it has processed. It has no concept of what happened between these learning cycles, let alone how long they took.
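The wake-up cycle above can be sketched in a few lines of code. This is a toy illustration, not the code from my experiments – the class, the stubbed-out policy, and the `hz` parameter are all hypothetical:

```python
class Agent:
    """Toy agent that lives in discrete time: one cycle = one unit of experience."""

    def __init__(self):
        self.cycles = 0  # the agent's only notion of elapsed time

    def step(self, observation):
        action = self.choose_action(observation)  # choose an action
        self.learn(observation, action)           # learn from the transition
        self.cycles += 1                          # another tick of subjective time
        return action

    def choose_action(self, observation):
        return 0  # placeholder policy

    def learn(self, observation, action):
        pass  # placeholder update


def run(agent, hz, seconds):
    """Wake the agent up `hz` times per second for `seconds` of wall-clock time."""
    for _ in range(int(hz * seconds)):
        agent.step(observation=None)
        # on a real robot, we would wait here for the next sensor frame


agent = Agent()
run(agent, hz=50, seconds=2)
print(agent.cycles)  # 100 – the agent's entire experience of those 2 seconds
```

Whatever happens between calls to `step` is simply invisible to the agent; its world advances only when it wakes up.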

It is natural to believe that a young child has a much more active brain, processing at a higher frequency than an older senior. Imagine that a young child “learns” 1000 times per second, while a senior learns only 100 times per second. To the child, a year represents 10x more learning cycles, and so quite literally feels 10 times as long. A similar intuition applies to the perception of elapsed time (or lack thereof) when waking from a night’s sleep, or from a comatose state.
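The arithmetic behind that 10x claim is trivial, but worth making explicit. If subjective duration is just the count of learning cycles, then for a fixed stretch of wall-clock time (the 1000 Hz and 100 Hz figures are the made-up numbers from above, not measurements):

```python
def subjective_length(wallclock_seconds, cycles_per_second):
    """Subjective duration measured purely in learning cycles."""
    return wallclock_seconds * cycles_per_second

year = 365 * 24 * 3600          # one year of wall-clock time, in seconds
child = subjective_length(year, 1000)   # hypothetical child: 1000 cycles/s
senior = subjective_length(year, 100)   # hypothetical senior: 100 cycles/s

print(child / senior)  # 10.0 – the same year, felt ten times over
```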

I am making huge generalizations when comparing the human brain to this simple agent/environment framework. I have barely a basic understanding of neurology, and I suspect the brain doesn’t just operate on a single discrete observation set at a fixed frequency, so the comparison to the simple RL environment is somewhat naive. However, at some level, one could imagine that the computational frequency of the human brain slows with age. If that is the case, and if you believe it is the only metric we have to perceive the passage of time, then it seems only natural that time does indeed speed up as we age.

Dynamic Horde of General Value Functions

Just finished documenting my work on creating an architecture for a dynamic horde of general value functions.

General value functions (GVFs) have proven to be effective in answering predictive questions about the future. However, simply answering a single predictive question has limited utility. Others have demonstrated further utility by using these GVFs to dynamically compose more abstract questions, or to optimize control. In other words, to feed the prediction back into the system. But these demonstrations have relied on a static set of GVFs, handcrafted by a human designer …

https://github.com/dquail/RLGenerateAndTest

Dynamic Horde

I’m just finishing up a research project on an architecture for a dynamic set of general value functions. I’ll share the documentation and source on http://github.com/dquail as per usual. But in the meantime, I wanted to share the abstract.

General value functions (GVFs) have proven to be effective in answering predictive questions about the future. However, simply answering a single predictive question has limited utility. Others have demonstrated further utility by using these GVFs to dynamically compose more abstract questions (Ring 2017), or to optimize control (Modayil & Sutton 2014). In other words, to feed the prediction back into the system. But these demonstrations have relied on a static set of GVFs, handcrafted by a human designer.

In this paper, we look to extend the Horde architecture (Sutton et al. 2011) not only to feed the GVFs back into the system, but to do so dynamically. In doing so, we explore ways to control the lifecycle of GVFs contained in a Horde – namely, to create, test, cull, and recreate GVFs in an attempt to maximize some objective.
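The create/test/cull/recreate lifecycle can be sketched as a simple generate-and-test loop. To be clear, this is a hypothetical illustration, not the architecture from the paper: the `GVF` stub, the random-error “update,” and the `keep_fraction` parameter are all invented for this example.

```python
import random


class GVF:
    """Stub GVF: tracks a running error as a crude measure of usefulness."""

    def __init__(self, gamma):
        self.gamma = gamma          # question parameter, chosen at creation
        self.avg_error = 1.0

    def update(self):
        # stand-in for a real TD update; here the error just drifts randomly
        self.avg_error = 0.9 * self.avg_error + 0.1 * random.random()


class DynamicHorde:
    """Generate-and-test lifecycle over a population of GVF demons."""

    def __init__(self, size):
        # create: start with a population of randomly parameterized questions
        self.demons = [GVF(gamma=random.random()) for _ in range(size)]

    def step(self):
        for d in self.demons:
            d.update()

    def cull_and_regenerate(self, keep_fraction=0.8):
        # test: rank demons by prediction error
        self.demons.sort(key=lambda d: d.avg_error)
        keep = int(len(self.demons) * keep_fraction)
        n_new = len(self.demons) - keep
        # cull the worst, recreate fresh candidates in their place
        self.demons = self.demons[:keep] + [
            GVF(gamma=random.random()) for _ in range(n_new)
        ]


horde = DynamicHorde(size=100)
for t in range(1000):
    horde.step()
    if t % 100 == 99:
        horde.cull_and_regenerate()
print(len(horde.demons))  # 100 – the population size stays fixed
```

The interesting design questions – what objective to rank by, and how new questions are composed from old ones – are exactly what the paper explores; the loop structure above is only the skeleton.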

Real-time reinforcement learning examples

Over the last four months, I’ve completed several different projects using reinforcement learning on a Dynamixel servo providing real-time continuous sensorimotor data to the learning algorithms. In particular, I’ve experimented with creating, running, and measuring thousands of GVF demons making predictions in parallel, with policy gradient actor–critic methods, and with Pavlovian control.

I’ve attempted to document what I learned as best as I could. All code and experimental writeups can be found on my github page at:

www.github.com/dquail/RobotPerception

Taking a break from my “smart” phone

I think we’re underestimating the importance of continued serendipitous daydreaming. Creative thoughts and problem-solving insights have often come to me while lying in bed, walking to the coffee shop, or sitting at a red light. But so many of these moments are being interrupted by the crack cocaine that is my iPhone. I wish I had the willpower to resist; but short of that, I’m going to experiment with giving it up and going back to a dumb old flip phone. I did this about a year ago and lasted a month – until I took a water ski trip into Sacramento. The last-second economist in me couldn’t take it.

Wish me luck!

How would you feel?

  • If you were beaten up. And three years later told by the bully that they ought to have taken your lunch money while they were at it … and that maybe next time they will.
  • If your rich neighbor erected a giant fence to keep you out of their yard. And told you that you must pay for it.
  • If someone a few blocks away said you couldn’t visit their block. Even though you have nowhere else to go. And that you may have been living there lawfully in the past. Or have family members currently living there.
  • If you were told you wouldn’t be prioritized because of your religion.

http://www.cnn.com/2017/01/29/politics/donald-trump-reality-check-first-week/