Do humans learn fully online? I’ll define “online” as when learning only leverages the data incoming from the current observation. It learns only from this new data. In such a world, batch processing and experience replay are not “online.” Is that how humans work?
I suspect not. I think there’s a balance between online learning, and planning with humans. Furthermore, I thank that we balance between the two depending on the situation. When a human is faced with an intense situation – for example a professional hockey player in the middle of their shift – all of the data processing is in the foreground. They take the immediate observation, and act upon it. There is no background planning taking place. In contrast, and at the other extreme is when the athlete is back home daydreaming just before they fall asleep (or while they are actually sleeping). At this time, the human is not doing much, if any foreground processing. The observations they are receiving are minimal – given they’re laying in a dark room. And their action decisions are even more minimal. But in such a state of dreaming, their mind is free to plan (with the model of the world that has been learned).
This brings me to an ideal agent. I think an ideal agent will maximize its computational power. There is no idle time for such an agent. It learns as much as it can at every moment. There are no sleep(s)! Therefore, such an agent, if in a time where immediate action is required (like that of an athlete), this computational power must dedicate itself completely to foreground processing. But later, the same agent, when in a safe place, has the opportunity to plan using the model of the world it has learned. I usually hate suggesting that humans are the the default inspiration for designing the best agent (ie. just accepting that neural nets are the defacto best function approximater because humans implement them) … but in this case, I think we have it right.