I’m just finishing up a research project on an architecture for a dynamic set of general value functions. I’ll share the documentation and source on http://github.com/dquail as usual, but in the meantime I wanted to share the abstract.
General value functions (GVFs) have proven effective at answering predictive questions about the future. However, answering a single predictive question in isolation has limited utility. Others have demonstrated further utility by using GVFs to dynamically compose more abstract questions (Ring 2017) or to optimize control (Modayil & Sutton 2014); in other words, by feeding the predictions back into the system. But these demonstrations have relied on a static set of GVFs, handcrafted by a human designer.
In this paper, we look to extend the Horde architecture (Sutton et al. 2011) to not only feed the GVFs back into the system, but to do so dynamically. In doing so, we explore ways to control the lifecycle of GVFs contained in a Horde – mainly to create, test, cull, and recreate GVFs, in an attempt to maximize some objective.
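To make the create, test, cull, and recreate cycle a bit more concrete, here is a rough sketch of what such a loop could look like. It is not the code from the project. Everything in it is an assumption for illustration: the `GVF` class is a toy linear TD(0) predictor, the observations are random bits, the "objective" is simply a low running TD error, and names like `create_gvf`, `run_lifecycle`, and `cull_every` are made up.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GVF:
    """Hypothetical toy GVF: a linear TD(0) predictor of one observation bit."""
    cumulant_index: int          # which observation bit acts as the cumulant
    gamma: float                 # discount, i.e. the timescale of the question
    weights: list = field(default_factory=list)
    error: float = 0.0           # running average of |TD error| (the "test" score)

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, x_next, alpha=0.1):
        # Standard TD(0) update with linear function approximation.
        cumulant = x_next[self.cumulant_index]
        delta = cumulant + self.gamma * self.predict(x_next) - self.predict(x)
        self.weights = [w + alpha * delta * xi for w, xi in zip(self.weights, x)]
        self.error = 0.99 * self.error + 0.01 * abs(delta)


def create_gvf(n_features, rng):
    # "Create": pick a random question (cumulant and timescale). Assumed scheme.
    return GVF(cumulant_index=rng.randrange(n_features),
               gamma=rng.choice([0.0, 0.5, 0.9]),
               weights=[0.0] * n_features)


def run_lifecycle(steps=2000, horde_size=8, n_features=4, cull_every=500, seed=0):
    rng = random.Random(seed)
    horde = [create_gvf(n_features, rng) for _ in range(horde_size)]
    x = [rng.randint(0, 1) for _ in range(n_features)]
    for t in range(1, steps + 1):
        x_next = [rng.randint(0, 1) for _ in range(n_features)]
        for gvf in horde:                      # "test": every GVF learns online
            gvf.update(x, x_next)
        x = x_next
        if t % cull_every == 0:
            horde.sort(key=lambda g: g.error)  # lowest running error first
            keep = horde[: horde_size // 2]    # "cull" the worse half
            fresh = [create_gvf(n_features, rng)
                     for _ in range(horde_size - len(keep))]
            horde = keep + fresh               # "recreate" replacements
    return horde


if __name__ == "__main__":
    for gvf in run_lifecycle():
        print(gvf.cumulant_index, gvf.gamma, round(gvf.error, 3))
```

Ranking by running TD error is only one possible objective. In a real system the score would more likely reflect how useful a prediction is once it is fed back in, which is the harder and more interesting part.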