Methods for Exploiting Past Experience Data in Reinforcement Learning
Recent advances in autonomy technology have led to the widespread emergence of autonomous systems in domains such as domestic robots, self-driving vehicles, and financial management agents. The technology developed in this invention, Fitted Q-Iteration with Complex Returns (CFQI), enables an autonomous system to learn how to perform a complex task from past experience through reinforcement learning (RL), that is, learning from feedback without human intervention. Successful application of this technology will not only make it easier and faster to build general-purpose autonomous systems, but also enable an autonomous system to continuously improve its performance and adapt to new and dynamic environments.
Compared with competing products that do not use our technology, products based on it offer the following advantages: A) less dependence on human intervention in teaching a system how to perform a task (e.g., anticipating the operational environment and hard-coding rules for operation and learning), and therefore lower human cost and faster system development; B) the ability to adapt to new, uncertain, and dynamic environments without human intervention (e.g., without a human expert specifically instructing the system to change its course of action under a new condition); and C) a continual learning capability: the longer a system is deployed, the more experience it gathers and the better it performs.
Compared with alternative RL processes, in particular existing FQI methods, our technology offers the following advantages: i) better sample efficiency than FQI, achieving the same level of policy performance with significantly fewer samples; ii) better computational efficiency than FQI, achieving the same level of policy performance with significantly less computation time; and iii) better effectiveness than TFQI (an earlier extension of FQI), being much more reliable in producing improved policy performance over FQI given the same set of samples.
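For readers unfamiliar with the baseline, the following is a minimal sketch of standard Fitted Q-Iteration (FQI) learning from a fixed batch of past experience. It is not the CFQI method: this summary does not specify how the complex returns are constructed, so only the ordinary one-step return target is shown. The toy chain environment, the function names (chain_step, collect_batch, fitted_q_iteration), and the per-pair averaging used as the regression step are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def chain_step(s, a, n_states=5):
    """Hypothetical toy task: action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    done = s_next == n_states - 1
    return (1.0 if done else 0.0), s_next, done


def collect_batch(n_episodes=200, seed=0, n_states=5):
    """Gather past experience with a uniformly random policy."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n_episodes):
        s = rng.randrange(n_states - 1)
        for _ in range(20):
            a = rng.randrange(2)
            r, s_next, done = chain_step(s, a, n_states)
            batch.append((s, a, r, s_next, done))
            if done:
                break
            s = s_next
    return batch


def fitted_q_iteration(batch, n_actions, gamma=0.9, n_iters=50):
    """Baseline FQI on a fixed batch of (s, a, r, s_next, done) tuples.

    The fit step is a per-(state, action) mean of the targets, i.e.
    least-squares regression on one-hot features. This is an assumption
    chosen to keep the sketch dependency-free; any regressor could be used.
    """
    q = defaultdict(float)  # Q_0 = 0 everywhere
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next, done in batch:
            # One-step return target: r + gamma * max_a' Q(s', a')
            bootstrap = 0.0 if done else max(
                q[(s_next, b)] for b in range(n_actions)
            )
            sums[(s, a)] += r + gamma * bootstrap
            counts[(s, a)] += 1
        # Regression step: refit Q to the freshly computed targets.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


batch = collect_batch()
q = fitted_q_iteration(batch, n_actions=2)
# Greedy policy extracted from the learned Q-function.
policy = {s: max(range(2), key=lambda a: q[(s, a)]) for s in range(4)}
```

In this sketch the agent never interacts with the environment during learning; it reuses the same stored transitions in every iteration, which is the sense in which FQI-style methods exploit past experience data.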
Binghamton University RB488