Time-varying preference bandits for robot behavior personalization

HIGHLIGHTS

  • What: The framework aims to maximize cumulative reward over time, thereby finding an optimal option without explicit knowledge of the rewards. The authors propose a novel preference-based learning method, called discounted preference bandits (DPB), to address time-varying preferences. They compare the performance of DPB with methods that use other query-selection criteria, such as batch active learning, information gain, and maximum regret. The experiments evaluate the practical applicability of the DPB algorithm to time-varying user preferences driven by environmental changes.
  • Who . . .

     
