I’m fortunate to be around the Robot Intelligence through Perception Lab at TTIC.

  1. Active Advantage-Aligned Online Reinforcement Learning with Offline Data [arXiv] We propose a priority-based data sampling policy that improves on the uniform sampling of RLPD by incorporating the onlineness of the transitions and their estimated advantages.