HIGHLIGHTS
- What: The framework aims to maximize cumulative reward over time, thereby finding an optimal option without explicit knowledge of the rewards. The authors propose a novel preference-based learning method, discounted preference bandits (DPB), to address time-varying preferences. They compare DPB against other methods that use different query-selection criteria, such as batch active learning, information gain, and maximum regret. The experiments evaluate the practical applicability of the DPB algorithm when user preferences vary over time because of environmental changes.
- Who . . .
