Markov Decision Process in Reinforcement Learning

I’ve been reading “Reinforcement Learning: An Introduction” by Sutton and Barto, and taking notes on each chapter as I go.

Chapter 3 – Markov Decision Process took more time than I’d like to admit to understand. I’ll blame it on the larger-than-usual amount of statistics/math combined with the 10+ years away from university.

Nonetheless, I took notes as I slogged my way to a bit of a better understanding. Hopefully someone finds the notes attached below useful.

 

cmput-609-chapter-3

Perception as Prediction

What is perception? What does the world believe perception is? How does supervised learning model perception? Presumably through labeled data? What does Reinforcement Learning suggest perception is? State and action combinations? Where are these two fields right? Where are they wrong? What do they have in common? Is perception the same as prediction? Is perception just opportunity for action? Perhaps an opportunity for interaction (action implies I’m acting on an object, when perhaps the object could act on me). Furthermore, maybe I only perceive things that do in fact pose opportunities for interaction. A near-infinite number of things pass by unnoticed in day-to-day life.

These questions warrant consideration given the current state of artificial intelligence (and my understanding of it!).

 

Paleo … again

Since competing at the Canadian Ultimate Championships in the middle of August, my diet has been … pretty terrible. This is a fairly rare occurrence for me … I’m usually a stickler for eating healthy. Chocolate, trail mix, granola … I have a love-hate relationship with you!

So tomorrow … I’m trying paleo again. In the past I’ve felt great – mentally and physically – while on it.

“Blogging” more

I obviously use “blogging” loosely here.

Since starting grad school a few weeks ago, I’ve allowed myself to believe more in crystallizing my thoughts and putting pen to paper. Not surprisingly, I’ve noticed thoughts of my own, or concepts I’ve read about, finally coming together when I try to describe them to others.

So I’m going to try to write more. And use this blog as a landing spot. For the most part, I won’t be spending as much time / thought / editorial intelligence on these. Consider the posts somewhere between a Facebook status update and a well-thought-out blog post. Most of them will be related to AI. And I’d guess a few will be about family, fitness, sport, and other random ideas.

 

Back to school

“Back to school, back to school, to prove to dad that I’m not a fool. I’ve got my lunch packed up, my boots tied tight, I hope I don’t get into a fight.

Well … here goes nothing”

After 13 mostly wonderful years “in industry,” I’m hopping in my own DeLorean and will be studying amongst the kids like it’s 1999, as I try to dust off my math and computer science skills at the University of Alberta in pursuit of my M.Sc. in Computer Science.

I believe it was Daniel Pink who first wrote about the motivation trifecta and the keys to motivation being 3 things – Autonomy, Purpose, and Mastery. This has stuck with me since I read it several years ago. It resonates perfectly. Perhaps because of my involvement in entrepreneurship.  I’m not sure there’s another vocation that can match the autonomy and purpose entrepreneurship provides. But even more likely is that it represents so well what entrepreneurship completely lacks. Mastery. Founders in startups by definition wear multiple hats. You write code. You manage. You raise funds. You talk to investors. You do your own finances. You shop for office space. You interview every employee. You write your own marketing material. You run customer support. That’s amazing. And completely rewarding. But the price you pay for this diversity is becoming good at a lot. But amazing at nothing. Furthermore, there’s no room for intellectual curiosity unless it can be justified by a business case. Time at SRI and later Samsung as an entrepreneur in residence left that intellectual itch even stronger.

For anyone fascinated by reinforcement learning (an area of machine learning / artificial intelligence motivated largely by behavioral psychology), there is quite literally no better place in the world to get training than the University of Alberta. Rich Sutton literally wrote the textbook. He’s there. Jonathan Schaeffer solved checkers. He’s there. DeepMind garnered a lot of press for AlphaGo’s victory over the world’s best Go player. David Silver, DeepMind’s Chief Scientist, did his PhD there. Many of the collaborators still reside there. This doesn’t even mention the other professors I’ve heard good things about, but haven’t yet met.

As someone who’s been amazed by machine learning and AI, but has only been able to dabble, the fact that all of this, and an opportunity to become “a master”, is in my backyard of Edmonton still amazes me.

I’m sure University won’t be as fun without the Yukaflux parties! … but I’m incredibly excited for a bit of a diversion from a world of board rooms, brogrammers, apps, and financial models.

Moving on from Samsung Accelerator

It was about a year ago that I started working within the Samsung Accelerator in San Francisco. Depending on the day, I’d refer to myself as either an Entrepreneur in Residence, CEO of Distilled Labs, or Director at Samsung Accelerator. These are the games you allow yourself to play when your title is an afterthought. Regardless of title, the goal was – not surprisingly – to build a product that solved a market’s need. We were afforded the autonomy to essentially build the product we believed in, how we wanted to do it, and with whom we wanted to do it. As a builder/inventor/entrepreneur/<insert your title here> you can’t ask for much better, so I sincerely thank those at Samsung for the chance.

However, after a year, it felt like time to come home. A weekly commute south to San Francisco would likely seem dreamy to many who head north to Fort McMurray … but while in some ways you feel like you have two homes, in many more ways you feel like you have none.

This of course doesn’t even mention the biggest motivation.

You can’t put a price on those feels.

Don’t track Mixpanel events for debug iOS builds

While developing a new app, I didn’t want to pollute my analytics data in Mixpanel with development / test data. So I wanted a clean way to separate the test data from the release data. “Not” tracking anything when in development mode was an option, but I kinda wanted to be testing this as well … making sure I was always tracking the right things. At the same time, I also wanted to be able to blow away the metrics from development really easily. You hit Mixpanel’s 500K free events surprisingly quickly.

So a drop dead simple way to do this was to create two different Mixpanel projects – one for release, and the other for debugging.

Then in your iOS AppDelegate.m you can simply use the appropriate key based on which build configuration you’re in.

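// AppDelegate.m
#import "AppDelegate.h"
#import <Mixpanel/Mixpanel.h> // adjust to match how the Mixpanel SDK is added to your project

// DEBUG is defined by Xcode's default Debug build configuration,
// so debug builds report to the debug project and everything else to release.
#ifdef DEBUG
#define MIXPANEL_TOKEN @"YOUR_DEBUG_KEY"
#else
#define MIXPANEL_TOKEN @"YOUR_RELEASE_KEY"
#endif

@implementation AppDelegate

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    // Other stuff
    // Initialize the shared Mixpanel instance with the token for this build.
    [Mixpanel sharedInstanceWithToken:MIXPANEL_TOKEN];

    return YES;
}

@end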

Now that I’ve done that, my development analytics data is nicely partitioned in one place (so I know things are working), and it’s easy to delete.