Time-varying preference bandits for robot behavior personalization

HIGHLIGHTS

What: The framework aims to maximize cumulative reward over time, and thereby find an optimal option without explicit knowledge of the rewards. The authors propose a novel preference-based learning method, called discounted preference bandits (DPB), to address time-varying preferences. They compare the performance of DPB with methods that use different criteria for query selection, such as batch active learning, information gain, and maximum regret. The experiments evaluate the practical applicability of the DPB algorithm to time-varying user preferences driven by environmental changes.
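
To make the role of discounting concrete, here is a minimal Python sketch. It is not the authors' DPB algorithm, whose details are not given in this summary. It only illustrates the general idea of down-weighting old pairwise feedback so that an estimate can follow a preference that changes over time. The class name, the geometric discount factor `gamma`, and the smoothed win-ratio estimate are all assumptions made for illustration.

```python
class DiscountedPreferenceCounts:
    """Hypothetical helper: pairwise win counts with geometric discounting."""

    def __init__(self, n_options, gamma=0.9):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.gamma = gamma  # assumed discount factor; 1.0 means no forgetting
        self.wins = [[0.0] * n_options for _ in range(n_options)]

    def update(self, winner, loser):
        # Decay every stored count, then record the newest comparison.
        for row in self.wins:
            for j in range(len(row)):
                row[j] *= self.gamma
        self.wins[winner][loser] += 1.0

    def preference(self, i, j):
        # Smoothed estimate of P(option i is preferred over option j).
        w_ij = self.wins[i][j]
        w_ji = self.wins[j][i]
        return (w_ij + 1.0) / (w_ij + w_ji + 2.0)


def simulate(gamma):
    # The user prefers option 0 for 50 rounds, then switches to option 1.
    est = DiscountedPreferenceCounts(n_options=2, gamma=gamma)
    for _ in range(50):
        est.update(winner=0, loser=1)
    for _ in range(10):
        est.update(winner=1, loser=0)
    return est.preference(1, 0)


discounted = simulate(gamma=0.8)    # forgets old feedback
undiscounted = simulate(gamma=1.0)  # plain cumulative counts
print(f"discounted: {discounted:.3f}, undiscounted: {undiscounted:.3f}")
```

In this toy run the discounted estimate already favors option 1 ten rounds after the switch, while the undiscounted estimate still favors option 0 because the 50 earlier comparisons dominate. A smaller `gamma` adapts faster but makes the estimate noisier when preferences are stable.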