Trustworthy Recommender Systems via Bayesian Bandits

Vivek Saravanan, Eric Song, Hien Bui, Xiqiang Liu

Mentor: Yuhua Zhu

Introduction

Recommender systems are now ubiquitous, but they vary widely in performance, accuracy, and the variety of items they surface. When a product has a good track record and its quality is known beforehand, making a recommendation is relatively easy: ideally, a system maximizes recommendations of ‘good’ products and minimizes recommendations of ‘bad’ ones. A common difficulty arises, however, when the system must decide what to do with a new product about which it has little or no information. This challenge is known as the ‘cold start’ problem.

One of the most important requirements for a recommender system is maintaining consumer trust. Many recommenders handle the cold start problem poorly and lose that trust by recommending too often, "spamming" consumers with new products that turn out not to be good. This excessive exposure to sub-optimal products reduces the system's credibility and "dilutes" the value of the recommendations overall, including the good ones. This problem can be addressed using Multi-Armed Bandits (MABs): environments in which a limited budget of actions must be allocated among multiple choices that yield different rewards. Policies developed for MABs aim to use that limited budget efficiently to discover and then exploit the highest-yielding choice. By exploring these policies, we aim to cast the recommender system dilemma as a MAB environment and use a policy that recommends accurately while maintaining consumer trust through carefully limited recommendation frequency.
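To make the framing concrete, the following is a minimal sketch of how a recommendation setting can be modeled as a Bernoulli multi-armed bandit: each arm stands for a product, pulling an arm corresponds to recommending that product, and the reward is 1 if the recommendation is well received. The class name, the specific success probabilities, and the idea of marking one arm as the "new" product are illustrative assumptions, not part of our actual experimental setup.

```python
import numpy as np

class BernoulliBanditEnv:
    """Toy recommendation environment: each arm is a product, and pulling an
    arm (recommending it) yields reward 1 with an unknown success probability."""

    def __init__(self, success_probs, seed=0):
        self.success_probs = np.asarray(success_probs)  # hidden "quality" of each product
        self.rng = np.random.default_rng(seed)

    @property
    def n_arms(self):
        return len(self.success_probs)

    def pull(self, arm):
        # Bernoulli reward: 1 if the recommendation was well received, else 0.
        return int(self.rng.random() < self.success_probs[arm])

# Hypothetical example: three established products plus one new, uncertain
# product (the last arm), which plays the role of the cold-start item.
env = BernoulliBanditEnv([0.6, 0.5, 0.7, 0.4])
```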


Motivation & Research Question

Multi-armed bandit (MAB) algorithms have been applied to recommender systems in many previous works. These algorithms have been rigorously studied; we draw on the algorithms and the intuition behind them from Lattimore and Szepesvári (2020), which discusses the strengths and weaknesses of a range of bandit algorithms, including the Upper Confidence Bound (UCB), Thompson Sampling, and the Bayesian optimal policy.
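As one example of the algorithms referenced above, the sketch below shows Beta-Bernoulli Thompson Sampling run on the toy environment from the earlier sketch (the `env` and `BernoulliBanditEnv` names are carried over from that illustrative example; the horizon and prior are arbitrary choices, not the settings used in our experiments).

```python
import numpy as np

def thompson_sampling(env, horizon=10_000, seed=0):
    """Beta-Bernoulli Thompson Sampling: keep a Beta(alpha, beta) posterior over
    each arm's success probability, sample once from every posterior, and
    recommend the arm with the largest sample."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(env.n_arms)  # posterior successes + 1 (uniform prior)
    beta = np.ones(env.n_arms)   # posterior failures + 1
    total_reward = 0
    for _ in range(horizon):
        samples = rng.beta(alpha, beta)   # one posterior draw per arm
        arm = int(np.argmax(samples))     # recommend the most promising product
        reward = env.pull(arm)
        alpha[arm] += reward              # update posterior on success
        beta[arm] += 1 - reward           # update posterior on failure
        total_reward += reward
    return total_reward
```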

In this project, our goal is to determine whether a constrained, bandit-based recommender system can match or exceed the performance of standard asymptotically optimal bandit algorithms while limiting how often new, uncertain products are recommended, thereby preserving consumer trust.

Our Solution

We propose a constrained bandit-based recommender system centered around the Bayesian Optimal Policy that outperforms popular asymptotically optimal bandit algorithms while retaining the trust of the consumer.
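To illustrate the kind of constraint we have in mind (and only as an illustration: this is not our Bayesian Optimal Policy, which requires a separate dynamic-programming treatment), the sketch below adds a recommendation-frequency cap to the Thompson Sampling routine above. Arms flagged as "new" may occupy at most a fixed fraction of recommendations so far; beyond that budget, the policy falls back to the best established arm. The `new_arms` and `max_new_frac` parameters are hypothetical names chosen for this sketch.

```python
import numpy as np

def constrained_thompson(env, horizon=10_000, new_arms=(3,), max_new_frac=0.1, seed=0):
    """Thompson Sampling with a trust constraint: cold-start arms in `new_arms`
    may take up at most `max_new_frac` of recommendations made so far."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(env.n_arms)
    beta = np.ones(env.n_arms)
    new_pull_count = 0
    total_reward = 0
    for t in range(1, horizon + 1):
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        # If the sampled arm is new and recommending it would exceed the
        # budget, fall back to the established arm with the best posterior mean.
        if arm in new_arms and (new_pull_count + 1) / t > max_new_frac:
            established = [a for a in range(env.n_arms) if a not in new_arms]
            means = alpha[established] / (alpha[established] + beta[established])
            arm = established[int(np.argmax(means))]
        if arm in new_arms:
            new_pull_count += 1
        reward = env.pull(arm)
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

The point of the cap is the trade-off described in the introduction: the policy still explores the new product, but its exposure to consumers is deliberately rate-limited so that a poor cold-start item cannot dilute the overall quality of the recommendations.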