Several of my experiments in reinforcement learning have involved generating predictive state representations (where part, or some, of the features are predictive in nature). These learned predictive features look at the current observation, make a prediction about some future cumulant, and incorporate that prediction into the current state representation. In the Cycle world setting, I’ve demonstrated how this can be useful. In a way, predictive state representations look at an aspect of the future data stream and add this information to the current state representation.
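As a rough sketch of what I mean (a minimal, hypothetical setup — not the exact construction from my experiments), a linear TD(0) learner can track the discounted sum of some cumulant, and that prediction gets appended to the raw observation:

```python
# Hypothetical sketch: a linear TD(0) predictor of a discounted future
# cumulant, whose output is appended to the observation vector.

def td0_update(w, x, cumulant, x_next, gamma=0.9, alpha=0.1):
    """One TD(0) step toward predicting the discounted sum of the cumulant."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = cumulant + gamma * v_next - v  # TD error
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]

def make_state(obs_vec, prediction):
    """The predictive state representation: raw observation + prediction."""
    return obs_vec + [prediction]
```

With a constant cumulant of 1, a single always-on feature, and gamma = 0.9, the prediction converges to 10, and that value simply rides along in the state vector.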
In contrast, another way to construct a state representation would be from a historical perspective: the agent encapsulates previous observations and incorporates them into the current state representation.
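Two simple ways to build such historical features (the names and decay values here are my own, purely for illustration): a fixed window of the last k observations, or an exponentially decaying trace:

```python
from collections import deque

class WindowFeatures:
    """Historical feature: the last k observations (zero-padded at the start)."""
    def __init__(self, k):
        self.window = deque([0.0] * k, maxlen=k)

    def update(self, obs):
        self.window.append(obs)
        return list(self.window)  # oldest first

class TraceFeature:
    """Historical feature: an exponentially decaying trace of observations."""
    def __init__(self, decay=0.8):
        self.decay = decay
        self.trace = 0.0

    def update(self, obs):
        self.trace = self.decay * self.trace + obs
        return self.trace
```

The window keeps the past verbatim but grows the representation with k; the trace compresses the whole past into one number at the cost of losing exact timing.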
Given these two perspectives, it got me thinking: is there REALLY a place for predictive state representations? What is that place? Can they just be replaced by historical ones? What follows raises more questions than answers … but it’s my way of fleshing out some thoughts on this topic.
First off, with respect to predictive features, I’m going to define “useful” as being able to provide information that allows the agent to identify its state. A useful feature would allow the agent to generalize across similar states and, likewise, to differentiate from unlike states.
For both predictive and historical features, I think they’re really “only” useful when the observation stream is sparse in information. In other words, it lacks information at each timestep that would allow the agent to uniquely identify the current state. The tabular grid world, where the current state is uniquely identified by the observation, is the opposite of such an information-lacking stream. In information-sparse / partially observable settings, by contrast, historical or predictive features can fill the gaps where the current observation is lacking information. If informative data exists in the past, then historical features become useful. Likewise, if informative data exists in the future data stream, then predictive features become more useful.
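To make the aliasing concrete, here’s a toy example (my assumed setup, loosely in the spirit of Cycle world): six states arranged in a cycle, with an observation of 1 only in state 0. The raw observation aliases five of the six states, while either a historical count (steps since the last 1) or a predictive count (steps until the next 1 — written here as an oracle, though in practice it would be learned) identifies each state uniquely:

```python
N = 6  # number of states in the cycle (my assumption)

def obs(s):
    """Observation is 1 only in state 0 -- an information-sparse stream."""
    return 1 if s == 0 else 0

def features_along_cycle(n_steps):
    """Walk the cycle, tracking a historical and a predictive feature."""
    s, since = 0, 0
    rows = []
    for _ in range(n_steps):
        o = obs(s)
        since = 0 if o == 1 else since + 1   # historical: steps since last 1
        until = (N - s) % N                  # predictive: steps until next 1
        rows.append((s, o, since, until))
        s = (s + 1) % N
    return rows
```

Over a full pass of the cycle, the raw observation takes only two distinct values, while the observation paired with either feature takes six — one per state.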
The B MDP, where there is unique information at the very beginning of each episode that directly informs the agent which action to take in the future, is somewhat of a special case where all of the useful information in the stream exists only in the past (the first state). Therefore, predictive features are of no use here. But in continuing (real-life) tasks, I don’t think this would ever be the case. Informative data “should”? always be observable in the future, if the agent takes the correct steps. So if that’s the case (as long as the agent can navigate back to these states), then predictive features can do whatever the historical ones can. It’s only when the agent can’t get back that historical features are required. But those cases seem somewhat contrived.
In either case though … maybe this is stating the obvious … historical and predictive features seem to share something in common: they are useful when the current observation is starving for more informative data. Historical features help when there is informative data in the rear-view mirror; predictive features help when there’s informative data ahead, such as in the following MDP.
So which to use (predictive or historical) really depends on what the data stream looks like … and where you are within it.