I spent some time today mingling with CTOs of iNovia portfolio companies. It was fun getting back into the startup world a bit, especially given the obvious AI slant to the meeting. One subject I discussed at length with several of them was the apparent lack of startups leveraging Reinforcement Learning.
I’m assuming this perception is true, though I admit I have no data beyond the personal and anecdotal: based on the press clippings I read and the founders I know firsthand, startups leveraging deep learning significantly outnumber those using RL. So I believe there is something to this, which raised the natural question: why?
I’ve blogged about this before, but the discussions today went a bit beyond that. No one, myself obviously included, claimed to have the answer. That said, I believe there are several reasons for the apparent lack of RL startups, most of them revolving around the availability of data (or the lack thereof).
Data in RL is temporal. Unless you’re simulating Atari games, it is difficult to come by, especially because of its often real-world, action-based nature. You can’t just use Amazon Mechanical Turk to label 500,000 images of cats to train your network; you have to generate a real stream of experience, and that’s much harder to do.
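To make that contrast concrete, here’s a minimal sketch in Python. Gymnasium’s CartPole and the random policy are just illustrative stand-ins for whatever real system you’d actually interact with; the point is that each RL data point only exists after the agent acts and waits for the consequence.

```python
import gymnasium as gym  # assuming the Gymnasium package is installed

# Supervised data can be labeled offline, e.g. via Mechanical Turk:
labeled_examples = [("cat_0001.jpg", "cat"), ("dog_0042.jpg", "dog")]

# RL data has to be *generated* by interacting with an environment,
# one timestep at a time. CartPole stands in for a real system here.
env = gym.make("CartPole-v1")
obs, info = env.reset()

experience = []
for _ in range(200):
    action = env.action_space.sample()  # placeholder policy: act at random
    next_obs, reward, terminated, truncated, info = env.step(action)
    # One transition of experience: (state, action, reward, next state, done)
    experience.append((obs, action, reward, next_obs, terminated))
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()
```

In a simulator that loop runs in milliseconds; in the real world each step can take a day, a season, or a dollar.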
More specific challenges around acquiring that data might be:
- The cost of experimentation – Exploration is key to generating reinforcement learning data. When simulating games of Go, one can experiment and make a move that is thought to be poor; the worst that can happen is that your agent’s self-esteem takes a hit because it loses another game. The stakes are low. But a real-time stock-trading agent incurs a significant, real cost every time it experiments. Supervised classification approaches don’t pay this exploration cost (see the sketch after this list).
- Delayed value – Any value derived from a reinforcement learning solution doesn’t materialize until after the agent has been trained. It’s hard to convince an organization to adopt your product if they have to wait a month for it to learn before it provides value. Because simulators rarely exist for these domains, the agents must learn on the fly, so they don’t provide immediate value when deployed.
- Temporal nature of data – I’ve blogged about this before, but reinforcement learning data is temporal in nature, often rooted in real time. Take agriculture: it takes a full calendar year to learn how a crop performs, versus a fraction of a second to simulate a game of Go. The same holds, to a degree, for acquiring data from a manufacturing plant. We’re at the early stages of RL adoption, so this data simply isn’t there yet, and starting to acquire it is difficult for the reason above: why collect the data in the first place if there’s no immediate payoff?
- Infancy of the technology – I don’t buy this one, but it was mentioned that RL is a more novel approach than supervised learning. RL, supervised, unsupervised: they’ve all been around a long time. Computational power and the availability of data are what gave rise to supervised learning. I’d conclude by saying that for RL we now have the computational power, but we still lack the data.
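To put a rough number on the exploration-cost point from the first bullet, here’s a toy epsilon-greedy sketch. The two actions and their payoffs are entirely hypothetical; the tally just shows the expected value an agent gives up by deliberately trying things it believes are worse.

```python
import random

# Hypothetical two-action problem: action 0 pays 1.0 on average, action 1 pays 0.2.
true_means = [1.0, 0.2]
estimates, counts = [0.0, 0.0], [0, 0]
epsilon, exploration_cost = 0.1, 0.0

for t in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(2)  # explore: try something on purpose
    else:
        action = max(range(2), key=lambda a: estimates[a])  # exploit current belief
    reward = random.gauss(true_means[action], 0.1)
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
    # Tally the gap between the best action and the one actually taken.
    exploration_cost += max(true_means) - true_means[action]

print(f"expected value given up to exploration: {exploration_cost:.1f}")
```

In a simulated game of Go that running total is free compute; on a live trading desk it’s a real bill, which is exactly why the exploration cost stings outside of simulation.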