Methods for Diverse Exploration in Reinforcement Learning
Recent advances in autonomy technology have promoted the widespread emergence of autonomous systems in various domains such as domestic robots, self-driving vehicles, and financial management agents. The technology developed in this invention, Diverse Experience Learning (DEL), enables an autonomous system to learn how to perform a complex task from past experiences through reinforcement learning – learning from feedback without human intervention. Successful application of this technology will not only make it easier and faster to build general-purpose autonomous systems, but also enable an autonomous system to continuously improve its performance and adapt to new and dynamic environments.
Compared with competing products that do not use our technology, products based on it offer the following advantages:
- Less dependence on human intervention (e.g., anticipating the operational environment and hard-coding rules for operation and learning) when teaching a system how to perform a task, and therefore lower human cost and faster system development.
- Ability to adapt to new, uncertain, and dynamic environments without human intervention (e.g., without being specifically instructed by a human expert to change its course of action under a new condition).
- Continuous learning capability: the longer a system is deployed, the more experience it gathers and the better it performs.
Compared with alternative reinforcement learning (RL) processes, in particular existing exploration methods, our technology offers the following advantages:
- Better sample efficiency: achieving the same level of policy performance with significantly fewer experience samples.
- Faster policy improvement: achieving a better policy with the same number of experience samples.
- More effective and safer operation: policy performance is maintained above a safe baseline level.
Binghamton University RB514