#19: Popularity Bias in Recommender Systems with Himan Abdollahpouri

Note: This transcript has been generated automatically using OpenAI's Whisper and may contain inaccuracies or errors. We recommend listening to the audio for a better understanding of the content. Please feel free to reach out if you spot any corrections that need to be made. Thank you for your understanding.

I realized quite early that a lot of other problems, including multi-stakeholder and fairness in recommendation that people are exploring, there is one common pattern that I realized that concept is kind of contributing to all of those problems, which is popularity bias.
Popularity bias in recommendation is basically about the concept where some products, some items, some movies, some songs have been listened to or purchased or received basically any kind of interaction way, way more than the others.
Why should we care?
Like if something is popular, then it's popular for a reason.
And that's true, but we need to be careful not to amplify.
For each user, we want to make sure that the list that is going to be recommended ultimately to this user contains more or less the same ratio of very popular, kind of popular and non-popular.
Right?
So that was the very basically simple way of making sure that what's being recommended to each user is kind of consistent with their tendency towards different popularity levels.
We need to ensure that we give a fair chance to different artists to at least reach their potential audience.
We are aligning our algorithms to basically fairly recommend different creators and different artists to different people.
The ideal situation would be that, for each creator, we find the most relevant users that would appreciate that creator.
This is basically a two-sided matching: like, okay, who should get what type of recommendations.
Hello and welcome to this new session of RECSPERTS, Recommender Systems Experts.
For today's episode, I have invited Himan Abdollahpouri.
I'm very happy that he followed my invitation and joined for today's session, which will be all about popularity bias in recommender systems.
We will also be talking about multi-objective recommender systems, multi-stakeholder recommender systems, calibrated recommendations as well.
And of course about Himan's work at Spotify.
Himan Abdollahpouri is a research scientist at Spotify Research in New York City.
In his former role, he was a postdoctoral fellow at the Spiegel Research Center at Northwestern University.
In 2020, he obtained his PhD in information and computer science at the University of Colorado Boulder, where he was supervised by Professor Robin Burke and, to no surprise, the topic of his PhD was "Popularity Bias in Recommendation: A Multi-stakeholder Perspective".
In his former roles, he was also working for Pandora and he served the RecSys community also as a co-organizer of the workshop on multi-objective recommender systems in 2022 and 2021.
Himan has published lots of papers at conferences like RecSys, UMAP and WWW.
So hello and welcome to the show, Himan Abdollahpouri.
Hello, hello, hello.
Thank you so much for having me.
I'm very happy to be here and I have been listening to some of the previous episodes of your podcast, so I'm really excited to be here and hopefully we will have an interesting discussion that could be useful for other people as well.
Yeah, I definitely believe that we are going to have this discussion.
I mean, we have already in preparation for this episode talked quite a lot about your papers, about recent publications, the work you are doing at Spotify, and also all these notions that are connected to popularity bias, a really multifaceted topic.
And yeah, I'm really excited to have you as the expert on popularity bias for this episode.
But maybe before going into that topic, can you introduce yourself to our listeners?
Definitely.
Yeah, so hello again.
I'm Himan Abdollahpouri.
I am originally from Iran and I came to the US basically for my PhD, started at DePaul University in Chicago working with Robin Burke, Professor Robin Burke.
And then basically I transferred with him.
He went to University of Colorado Boulder.
So we went together and I basically obtained my PhD from there.
My research since actually my master's has been on recommender systems.
And that's actually also a funny story, how I started recommender systems for my basically master dissertation in Iran.
Initially I was, I picked genetic sequence analysis as my dissertation.
And I spent a month just like reading through different terminologies, all these biology terms, etc.
And I wasn't a big fan.
And then my master's, MSc basically, supervisor came, we had a meeting and he said, oh, there is this area of research where you predict what people like.
And I was like, oh, that's interesting.
Tell me more.
And basically he showed me some papers and I got really interested in that area.
And that's how actually I started basically.
I started reading papers and while reading papers, that's where I also saw some names they kept repeating as authors.
Okay, so these are the main people then.
And Robin Burke was one of them.
Robin Burke, Bamshad Mobasher, Joe Konstan.
And that's also how I ended up actually doing a PhD, because I wanted to work with one of these people, and Robin Burke was actually one of the people that I really wanted to work with.
My PhD research was also on recommender systems, more specifically multi-stakeholder recommendation.
And I did a postdoc also, as you mentioned, at Northwestern with Professor Ed Malthouse, who is also really a great person and recently has been also very active at RecSys.
And now I'm a research scientist at Spotify.
Obviously Spotify is one of the biggest names when it comes to personalization and recommendation systems.
So I'm very happy to work somewhere that it's very relevant to what I have been doing.
And I mean, based in New York City.
And yeah, so I think that should be good.
Yeah, really, really nice.
And already a couple of questions that are popping up in my mind.
And I really love that I finally managed to get someone from Spotify onto this podcast, which I guess has kind of many meanings.
I mean, you folks are very much into podcasting, having started with that podcast bet in 2018 or 2019, somewhere around then.
I mean, it's also actually, if I'm correct, I don't know the exact figures, I guess, the product that most people mentioned when it comes to one of my questions that I asked my guests by the end of each episode as to what is the product they have in mind when thinking about good personalization.
And then a lot of times we hear basically Spotify or more specifically the Discover Weekly Playlist.
So you folks are really doing a great job there.
And I hope and I would also be happy to have more of you on the show because it's actually not only your work, it's also the work of many others who are contributing to this great product. So yeah, definitely, definitely great things there in the domain of music recommendations in so many different parts.
So when it comes to playlists, podcasts and with all these challenges that one has.
So evaluation, dynamics, taste and what else. You have actually mentioned, going back to your master's thesis, that there was a supervisor and you weren't actually happy with the first month doing genetic sequence analysis.
And he basically proposed that RecSys-like topic, predicting what people like, to you.
Do you actually remember what was the first paper that you read that was about recommender systems?
Yes, I remember actually.
Yeah, that was I think in 2000, if I'm not mistaken, maybe 2011 or 12, it was a paper.
It's called a survey on collaborative filtering.
I think it's by... I know his last name, I think it's Khoshgoftaar.
So it's a survey. It's a very simple survey.
But for the time, it was quite enough to get into the field.
So that was my very first paper.
Okay, okay.
And then you basically decided so you got hooked by the topic and you got to explore the names of the field and you basically decided whom you want to be supervised by and follow and learn more about in this field and decided to go into that PhD program.
But you mentioned at first that originally, or mainly, it was rather about multi-stakeholder recommendations.
And so how has popularity bias as the title of your PhD thesis evolved from there?
Was this something that was also set up from the very beginning or something that evolved over time while exploring multi stakeholder access?
That's a great question.
So, yeah, basically during the first two years of my PhD, you know, I was exploring some promising and interesting areas that are underexplored.
And the area of multi stakeholder was something that both me and my advisor, Robin Burke, we realized it's quite under explored because a lot of recommendations they focus on, OK, what's the best for the user?
But a lot of times, you know, there are other things, other stakeholders that needs to be taken into account.
And that was the initial basically starting point.
But the kind of experimentation in this area, especially in academia, is not very straightforward because there are not many data sets that contain all of these different preferences of different stakeholders.
Or, you know, there are a lot of other things that needs to be available to do really experimental research in this area.
And simultaneously, I realized quite early that a lot of other problems, including multi stakeholder and fairness in recommendation that people are exploring, there is one common pattern that I realized that concept is kind of contributing to all of those problems, which is popularity bias.
And I realized it's like the kind of usual suspect.
Like whenever I was reading a paper on unfairness or, you know, on other aspects of bias, I realized the main cause often is actually popularity bias.
So that's what made me really interested in this area, to see what popularity bias actually is and how it's causing some kind of other unfairness or other types of biases to different stakeholders.
So that's how it started, actually.
I see.
Okay, so basically, you drew the lines and saw the connections, and they all somehow rooted back to the cause of all evil: popularity bias.
But yeah, I guess we will definitely explore that. Evil would be a bit too harsh, because in some of your works, you have also shown that users who have a higher appetite for popular content should also be served accordingly.
But this is something that as many things you could also personalize or calibrate accordingly.
Exactly.
Yeah, I guess this also tightly connects to one of the key aspects in MSRS that you have already alluded to, which is actually the availability of proper data sets to do research.
I mean, there are obvious reasons why this data isn't there in the wild because it sometimes contains really business critical information that companies, of course, want to keep proprietary, something that you definitely, I would assume, have access to as an employee of such a company, where you could basically do research and multi stakeholder recommender systems by having access to data, which you could basically assess the impact on different stakeholders of such a system and also answer questions accordingly.
Because sometimes if you look for public data sets, then you might have to resort to simulations or something like that.
And then it's, of course, questionable how much your final conclusions might apply really to systems in the wild, right?
Exactly.
Yeah, I mean, that's definitely one of the things that, you know, the problems that you are solving and the access to the tools and the data that needs to be available for those type of research is often it's more straightforward when you work in industry because, you know, that's like everything is there, like the data, you know, everything is available.
So you are more flexible with trying different ideas that like sometimes you might have a very interesting idea, but for in order to work, you might need some type of data either about, you know, and by data, I mean, let's say, you know, for example, user preferences or or for example, artist preferences.
So sometimes, you know, those are not available publicly, right?
So it's I think, yeah, definitely it's it makes it much more straightforward to try different ideas, especially in this area, like multi objective, multi stakeholder to to do interesting work.
Mm hmm.
Oh, I see.
I see.
Yeah.
And I definitely also want to talk about what is kind of your daily work at Spotify?
What are the problems that you are solving there?
But before we go into this, I want to get a bit back in basically your research history, which I guess still has a heavy impact on what you are doing nowadays and basically also shaped your current industry research, which is popularity bias.
I assume that popularity bias, to many of the listeners of RECSPERTS, is actually something well known, or at least known.
Some of them are dealing with it.
Some of them are actually solving it.
Some of them might also have implemented some ideas from your papers.
But nevertheless, let's start from the very beginning.
So please introduce us to the world of popularity bias in recommender systems.
So what is it and why is it important?
Sure.
Yeah.
So basically, if I want to really simply say, let's say, when we say popularity bias, like that, it has two terms, popularity and bias, right?
So it's quite self reflective.
So the issue is in many not just recommender system, actually, just in machine learning, when something is more common, then the machine learning algorithms learn that and it tries to basically generalize to that more common thing, because that's a much safer choice.
So popularity bias in recommendation is basically about the concept where some products, some items, some movies, some songs, they have been listened to, or they have been purchased, or received basically any kind of interaction way, way more than the others.
And this is something that we see in many domains, it's literally almost in many domains, there are some stuff that are just more popular, right?
Sometimes not even for a good reason, they are just more popular for some reason.
So that's basically the popularity issue itself, right?
So this bias could come externally; for example, even before recommendation algorithms even existed, there were movies that were more popular, right?
There are some foods that are more popular.
So that's just the phenomenon that exists in the world that some stuff are just more popular.
It's the same in movies and music, for example, here.
So let's let's take music as an example.
So externally, some artists are just more popular, and we cannot do anything about it, because this is just how the world externally operates.
Like everybody knows Beyonce, everybody knows, you know, like, for example, Ed Sheeran, right?
But not many people might know a local artist in Iran, right?
So that's external, basically popularity bias, right?
Because they have listened to those people, for example, on the TV, saw them in advertising, because of course, very popular artists get sponsored or featured much more often in advertising, because it's familiar.
Yeah, so exactly.
So so all of that externally contribute to someone become more famous and become more popular.
And with recommendation algorithms, that's where it starts.
Let's actually go back to machine learning algorithm.
Let's say we have a classification algorithm that tries to classify, let's say a very typical example, whether an image is a dog or it's a cat, for example.
And imagine from 100 images, 99 of these images are images of a dog, and one image is a cat, right?
So a very simple algorithm can literally give you very good accuracy just by always saying it's a dog, right?
And then you can write a very great LinkedIn post about this and say, hey, I found this algorithm that gets 99% of accuracy.
Exactly.
Exactly.
So that means the data was biased towards dog images.
So the popularity was actually with the dog images, right?
So back to the recommendation algorithms, let's say, there are certain, basically, movies or music that have been streamed by people more, right?
Maybe they even search for it, not even by recommendation algorithm.
They just search for those movies or songs more.
So that phenomenon exists anyway.
And the issue is, because of this phenomenon, the algorithms also pick up that bias.
And if you're not careful, that bias might get worse because imagine there is a popularity bias.
And imagine the algorithm picks the most popular, let's say movies or artists for you.
So because of that, because of that recommendation, you might stream them, right?
So when you stream them, that stream will be added as another stream to that artist.
So later on, the interactions of that artist or that movie will become even more so because of course, you are going to introduce the newly streamed interactions, new interactions in your retraining procedure, and then it kind of perpetuates.
Exactly. So that's where the issue starts.
And that's what people asked; initially, when I was working on popularity bias, I was getting this question sometimes: why should we care?
Like if something is popular, then it's popular for a reason.
And that's true.
But we need to be careful not to amplify this bias; that is kind of the key to it.
Because if you start out with recommender models in your overall recommender system, then you typically think about different approaches and combine them.
And for example, you might have a trending recommender, a popularity recommender, a recommender that is somehow context aware, a collaborative filtering one, and so on and so forth, fueling all your stuff that you might have on display. And yeah, one of the most, let's say, intuitive ones sometimes is really the popular movies, popular playlists, whatever recommender. So one of the questions that comes to my mind in regards to this is actually, should we even use a popularity based recommender when we actually identify popularity as a problem?
Well, that's a great question. So the answer is not simply yes or no.
There are cases that you just start your business or the user is a new user, right?
So you don't have much information about that user. And it's also risky just to recommend random stuff to this person in order to avoid popularity bias. So as a start, it's not a bad idea; you want to make sure that you show the best of the best in terms of the majority, like, okay, something is popular, so the majority of people have kind of approved this content, right? So basically, that gives you some idea for a new user when they join: okay, what should I recommend?
So that seems like the most intuitive thing to basically do, like you recommend top 10.
And that's the starting point, right? They play something, and then they might search for something else. So over time, you learn the preferences of that person, and then you give them the most personalized recommendation later on. Yeah, I guess this is the interesting differentiation of many terms that are sometimes also used interchangeably, unfortunately, which is somehow relevance, popularity, personalization. I mean, popularity-based recommendations are typically not personalized, this is why they are popular. But popular recommendations can indeed be very relevant. But what we want to have are personalized recommendations that are relevant. However, relevance sometimes also might be too narrow to look at. And then we are entering all that discussion about beyond accuracy. And are we solving for the right problems? So relevance might not necessarily be what we want to optimize for. Because in the end, I mean, we do have many stakeholders, and we do want to optimize for their overall satisfaction to get basically a healthy marketplace. I'm actually thinking about that paper that was presented, I guess, at CIKM 2018 by Rishabh Mehrotra and others, "Towards a Fair Marketplace", also by Spotify, where they actually looked into relevance, satisfaction and fairness.
And also what I found very interesting there is that even a model that doesn't optimize for relevance, but for some interpolation of relevance and fairness, adapted personally to users' sensitivity, can reach higher levels of satisfaction than purely relevance-driven recommendations.
So something definitely where we see, okay, optimizing for relevance only might be too narrow or focused. But of course, it can make sense when you want to get things started.
To start with popularity recommendations, as you said, when you don't have any clue about your users yet, or maybe you haven't also collected something as part of the onboarding that you could already use to do some first personalized steps. However, and as you said, you want to do this basically not for too long, but only up to the point where you have collected enough evidence from the user to, for example, do collaborative filtering. However, this doesn't protect us from popularity bias. And I guess in some of your papers, you have studied the effect that popularity bias and its amplification has in certain collaborative filtering algorithms. Can you maybe walk us a bit through these dynamics and to which degree different algorithms within the realm of collaborative filtering are sensitive to popularity bias? Sure. Yeah. So basically, if we go back to, you know, like user preferences, right? And why they like popular, some people, they just like popular stuff. You know, some people, they might be more interested in niche stuff. And again, I've seen this in many domains, like recently, you know, it's been a couple of years that I have been quite into the world of, like, scents and colognes. And that's there too, like, the majority of people, they like more popular designers, but there are people that like very niche scents, right? So in music, the same in movies, I've seen this in quite many areas. So how can we give personalized recommendations to people that match the type of interest that they have in terms of popularity, right? So with the most-popular recommender, the issue is, it assumes everybody likes those popular recommendations. And that's a very simplistic and maybe naive way of thinking, because they are popular, okay, for a reason. But then you need to also ensure that you are capturing other people's taste that might not be well defined within popular items, right? Let's say I'm very much interested in international movies, like I am interested in some old Japanese movies or like movies from Poland and France. So a popular-movies recommender might not work well for me, because it might recommend me Avengers, I don't know, Barbie or those kinds of stuff, right? Star Wars.
Star Wars, exactly. Yeah, and I think I forgot to mention the famous Iranian movie maker, Abbas Kiarostami, so he's also one of my all-time favorites. So in order for a platform to make me and other people like me happy, and we might not be the majority of users, sure, but we are still a part of the platform, the algorithm needs to find out that, okay, these people are not into this super popular stuff. So how can we model their preferences so that they get what they want?
And that's how we want to operate what's what we call a good personalization. A personalization is that gives people what they want. You might be very much interested in super popular and trendy stuff. So the algorithm should help you to basically just get what you want. But we have a spectrum of people with different interests in different ratios of popularity. So yeah, basically, that's how. And as you mentioned, in one of the papers, that I think it's been a couple of years already, that you know, investigated some algorithms like matrix factorization, collaborative filtering, like user based collaborative filtering item based, they show different degrees of popularity bias, but basically all of them they showed quite bias in terms of the popularity. The only algorithm that didn't show much bias was random algorithm. So yeah, so basically, if you don't want any bias, I think you need to just kind of recommend random stuff. And again, it's also how you measure bias, do you measure bias in terms of how much you amplify it? Or do you measure bias in terms of how much bias you see in the recommendation in terms of whether something is being recommended more than the others? Because that could be still similar to how it was in the history of interactions. So if you consider being similar as still bias, then a lot of algorithms are biased, right? But often it's much better if you see it, are they amplified? Yeah, then it's a different story. Yeah, yeah. So that means that the overall goal would not be to eliminate bias, but to get bias to the right level and not let it say get out of control. So I guess you have related to that aspect that there are, of course, users are different. I mean, this is the reason why we do all of this, because we want to personalize to different tastes, but also to different contexts and so on. Thinking about that individual appetite for popularity, or that propensity of popularity. So how do we account for this in the recommendations that we serve to the users? So yeah, basically once you know users' tendency towards popularity, right? You could basically treat these as different groups of people, right? So one group is very interested in popularity, one group is not interested, and etc. And one basically very simple approach would be, okay, so then let's make sure that we de-bias the recommendations for group B and C, that they are not interested in popularity, right? We do these de-biasing algorithms on these groups specifically, or we basically even made, you could use a different algorithm for each group of people. That's more like these different cohorts of users. Ideally, yeah, ideally it's much more difficult to serve users with niche taste than it is for users with mainstream taste. Because with mainstream taste, you don't probably need to do much in terms of changing the algorithm, because the algorithm is already doing what it's supposed to do. But yeah, I mean, it becomes much more complex when you want to really design an algorithm that works perfectly for everyone, because even the niche has different meaning, like, okay, the niche doesn't mean you mean like movies in your own country.
So then, okay, maybe one feature that you could use for your model is actually that, like, the country of your residence, okay, maybe that could capture some of your taste, you know.
So yeah, it quite needs a lot of investigation and a lot of different types of features to be used to be able to capture all these different cohorts of users in terms of what they want to listen to or to watch. Okay, I see. I guess one of the more simplistic approaches that you took in one of your papers where you said, okay, let's look at the broad spectrum of users. And I guess in that example, you took the MovieLens dataset, and you first determined the popularity of the items.
And then you basically checked for each user to which degree those popular items were prevalent in their user history. And by that, we're able to also sort the user by their propensity towards popular items from, let's say, those blockbuster users over to the diverse users, and then I guess the final group or the niche users. And I guess you did some equally sized user groups so that you had 33% within each group. So is this a first approximation to tailor the effect popular recommendations have on different user groups? And how could you go from there to really make it personalized? Yeah, so that paper was basically very basic and simple approach to somehow mitigate popularity bias. As you mentioned, like we could even categorize items into, let's say, three categories, extremely popular, kind of popular, and non popular. And then for each user, we define, okay, what percentage of their listening history is super popular and kind of popular and non popular.
And then let's say we have an algorithm that just does pure personalization, it doesn't do any debias or anything. So that algorithm, we, we ask that algorithm to recommend us a large list, which is larger than what we want to show to the user, because we want to extract something from that large list. Okay, that large list contains way more items, but with good relevance, basically.
And then we leverage this kind of a reranking step that for each user, we want to make sure that the list that is going to be recommended ultimately to this user contains more or less the same ratio of very popular, kind of popular and non popular, right? So that was the very basically simple way of making sure that what's being recommended to each user is kind of consistent with their tendency towards different popularity levels, right? In practice, and that's what basically makes industry a bit more challenging. In practice, something like this might not work in real applications, because let's actually have movie as an example, again, like, you know, like Netflix, you know, has different shelves. So, okay, is the final list? Is it a shelf? Or is it a set of shelf? Is it a homepage? So if the shelf is 10 movies, like three of them will be very popular and two kind of and, and then non popular, or it's better that we have a couple of shelves, one shelf is very popular, one shelf is not popular, one shelf is least popular. So it becomes much more detailed in practice. But in theory, basically, the goal is to recommend a list of items. And that's what we we call in theory, but in practice, it's a bit more complicated.
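To make the re-ranking step described above a bit more concrete, here is a minimal Python sketch of the general idea: bucket items into three popularity levels, measure a user's historical ratios, and greedily fill the final list from a relevance-sorted candidate list so the ratios roughly match. The function names, the greedy rule and the three buckets are illustrative assumptions, not the exact method from the paper or anything used in production.

```python
from collections import Counter

def history_ratios(user_history, item_bucket):
    """Share of a user's past interactions per popularity bucket ('head', 'mid', 'tail')."""
    counts = Counter(item_bucket[i] for i in user_history)
    total = sum(counts.values())
    return {b: counts.get(b, 0) / total for b in ("head", "mid", "tail")}

def calibrated_rerank(candidates, item_bucket, target, k=10):
    """Pick k items from a relevance-sorted candidate list so that the bucket shares
    of the final list roughly match the user's historical shares."""
    selected, picked = [], Counter()
    remaining = list(candidates)
    while len(selected) < k and remaining:
        # bucket currently furthest below its target share
        deficit = {b: target.get(b, 0) - picked[b] / max(len(selected), 1)
                   for b in ("head", "mid", "tail")}
        needed = max(deficit, key=deficit.get)
        # most relevant remaining item from that bucket, else fall back to the top item
        pick = next((i for i in remaining if item_bucket[i] == needed), remaining[0])
        selected.append(pick)
        picked[item_bucket[pick]] += 1
        remaining.remove(pick)
    return selected

# Toy usage: candidates are assumed to be sorted by relevance already
item_bucket = {"a": "head", "b": "head", "c": "mid", "d": "tail", "e": "mid", "f": "tail"}
target = history_ratios(["a", "c", "d", "c", "a", "d"], item_bucket)   # one third per bucket
print(calibrated_rerank(["a", "b", "c", "d", "e", "f"], item_bucket, target, k=3))
```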
I see. And I mean, with that, we are already also hovering the topic of bias mitigation strategy.
So you have mentioned in that very paper, you employed a re ranking step that does typically a way of post processing the results you get from the recommender model, in the sense that you tweak for an additional goal, because your recommender model, as you said, might be a personalized model, and thereby it might be optimized for some ranking metric. And then afterwards, you want to optimize for some metric of fairness as somehow the inverse of popularity bias, or as a kind of approximation to which degree you mitigate the popularity bias, which is actually not the only way of how you could mitigate it. So re ranking as a post processing step, you can also move that bias mitigation procedure into the algorithm itself. How does this work? And which approaches do exist there? Yeah, no, that's definitely true. Like, as we mentioned, basically, in the beginning of the podcast, the popularity bias amplification comes from the popularity bias in data, right, because the data is skewed towards having more interactions for few items. So one way that, in theory, could work, but in practice, it's not much used is basically filtering the data in a way that you are de biasing your data. And by that, for example, you might want to under sample some of these very popular interactions such that in in your sampled data, the interactions are not very biased towards few items, you have less bias in the data. So somehow of a reweighting strategy?
Well, it's not really reweighting, you are not considering those interactions in your sample data, like, let's say, let's say one item has just as an example, has 1000 interactions. And the let's say the other item that have five or five or 10 interactions, you remove the interactions of that 1000 interaction for that popular item and maybe pick five or six interactions for that, you know, basically, you are trying to make a more balanced data set that the algorithm wouldn't amplify. But that means, you know, you're losing information because you are you need to remove all of some of the interactions that could actually help even with the personalization itself. So in theory, that's one approach, I have not used that approach much myself. And I am not personally aware of where like, that could be applicable or where it's being used in practice, there might be so I'm not really aware myself. So then we have two other approaches. The other one, as you pointed out is de biasing the model itself. And that also depends what model you're using, like, you know, are using kind of collaborative filtering or using matrix factorization or using learning method. So in any of these algorithms, there is some part of the algorithm that if you play with that part, you could somehow try to mitigate popularity bias. So for example, let's say your model is a user based collaborative filtering, right? And user based collaborative filtering, you're trying to find similar neighbors to a user and use their interests basically to find your interest. So where popularity bias can creep in in this algorithm. So popular items are rated by more people, right? So that means the neighbor that you are picking for a specific user, you know, if the size of let's say if the size of your neighborhood is 10 versus 100 versus 1000, that impacts how much popular items can creep in into that neighborhood that you are trying to use for recommendations. So the neighborhood size is what you could play with. And might you might be able to get some different results in terms of popularity, let's say so that's the best for collaborative filtering. And then let's say you you have a deep learning method, you know, in deep learning method, often you use like some kind of binary rating, like either someone has liked something or someone has not liked something, it's often not rating based, it's more like a binary interaction. And what's happening in practice, in many at least deep learning method that I have been dealing with, and I have been working with is that in order to model to learn user preferences, it also needs some negative examples, like all the things that user has seen, these are positive examples. But what are the negative examples, right? So then, you know, there are some algorithms that they do, like in batch negative sampling. And that's where you also can play with this process of in batch negative sampling, make sure that popular items or non popular item, they get different weights into being used as negative samples for the process of in batch negative sampling, you know, you might do basically matrix factorization or other type of methods that you want to compute the loss function. So you could also there can give a different weight to the type of error that you make for predicting the ratings of non popular items versus popular items. By that you can play with the weights of that and hopefully the model can learn to be kind of do well for non popular items, right? 
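As a rough illustration of the two in-processing levers just mentioned, popularity-aware negative sampling and popularity-dependent loss weights, a sketch might look like the following. The exponents, names and toy counts are assumptions chosen for illustration, not a specific production recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def negative_sampling_probs(item_counts, alpha=0.75):
    """Draw negatives proportionally to popularity**alpha: alpha < 1 flattens the
    distribution so head items are not overwhelmingly dominant among negatives,
    while alpha > 1 would push even harder against popular items."""
    p = item_counts.astype(float) ** alpha
    return p / p.sum()

def tail_loss_weights(item_counts, beta=0.5):
    """Per-item loss weights that grow as popularity shrinks (weight ~ 1 / count**beta),
    so errors on long-tail items count more during training."""
    w = 1.0 / (item_counts.astype(float) ** beta)
    return w / w.mean()               # keep the average weight at 1

item_counts = np.array([1000, 300, 50, 10, 5])     # toy interaction counts per item
negatives = rng.choice(len(item_counts), size=4, p=negative_sampling_probs(item_counts))
weights = tail_loss_weights(item_counts)            # e.g. multiplied into a pointwise loss
```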
So these are all the different classes of what we call, like, in-processing, like it's happening within the model. And again, in practice, the majority of the solutions I have seen belong to the third category, which is post-processing algorithms. Especially in industrial applications, recommendations often follow some multi-stage process; the first stage is usually a lightweight algorithm that generates a candidate set, because, let's say, we have tens of millions of options, and you want to extract a set of candidates. So the first stage is candidate generation. And usually it's a lightweight algorithm, not very complex. So maybe some kind of a nearest neighbor or something like that, approximate nearest neighbors. And once you have an initial candidate set, let's say, for each user, you have, let's say, 1000 items. Hopefully, they are also still relevant. And that's where the re-ranking comes in, like, okay, now from 1000 items, you want to extract 10 items for the user that are very relevant. But they cover, specifically in terms of popularity that we are talking about, they cover not just popular, but also a little bit less popular items within that 10. And that's how the re-ranking algorithms basically work. Again, this is assuming the candidate set, the 1000 items, has some less popular items, because if there's no less popular item in that set, then there is nothing that the re-ranking can do. Okay, I understand. Great coverage of the different points where one could intervene. So as you said, starting with the data as a pre-processing step, going right into the algorithm and adapting the model as in-processing, or once you have the list of recommendations, and then correspondingly a larger list of recommendations that also gives you the freedom to optimize for additional goals, you then do post-processing, for example via re-ranking the list to adjust for that secondary or additional criterion. With regards to the second step, so the in-processing, what about the role of regularizers there? Because you have talked about, for example, some smarter way of performing negative sampling, showing the right negative samples to the algorithm, for example. But there was also some work by you, Robin Burke, and Bamshad Mobasher that was published some years ago at RecSys in Como in 2017, with the name "Controlling Popularity Bias in Learning-to-Rank Recommendation", where you have actually presented such a regularizer.
Um, can you touch on the topic of how well properly designed regularizers might help us?
Is this way that is somehow promising? Or is it just way easier, way more effective and efficient to just take the re ranking approach? Because I sometimes have the feeling it's very easy to do the re ranking compared to maybe the other stuff, which is maybe less controllable. However, with the re ranking approach, what I sometimes see as one of the bigger challenges, not to definitely name it a problem, but a challenge is how many items do I need to retrieve to have enough freedom to pick the right from as part of my re ranking steps. So for example, if I want to come up with a balanced list of 10 recommended items, is it better to first pull 50 100 300 500?
How many do I need to pull? And I mean, the more I pull, the higher the chance I can achieve my final goals, but also the higher the load is on my system. And if I do this for millions of consumers within a short time window, this can present a heavy load on my system. So this is why I also sometimes think, hmm, is this really realistically a good or efficient way, to do re ranking all the time, even though it might be easier? Or would it be smarter to go for something that directly optimizes as part of an overall optimization criterion that combines various objectives? Yeah, so back to the paper that you mentioned. So basically, that approach, which uses a regularizer, is what I meant with giving different weights to the error that the method makes for different groups of items. So let's say we added this regularizer that penalizes the error that the algorithm makes for less popular items more, right. And it did perform well compared to the algorithm that does not mitigate popularity bias, surprise, surprise. I mean, this basically happens in a lot of papers, you know, like we kind of propose something and we only compare it to the base algorithm that doesn't do anything to mitigate the bias, and then, oh, we improved it. Yeah. So I think that was one of my very first papers. And I'm also guilty of that. I think at the time we didn't compare with another algorithm that also improves popularity bias. But I mean, maybe at the time the bar for RecSys was not as high, so it got accepted. But yes, so regularization was and is one of the approaches, but it's very tricky in general, like it gets into a lot of math, and I don't have much deep understanding of what's happening within. And I don't know, actually, if anyone has really explored this to that depth. Like when you incorporate other objectives into your recommendation within the learning step, it's not very intuitive how the model is learning the user preferences, like how it draws the boundary; for people that are more familiar with, let's say, classification, how the model is drawing that line to separate positive from negative, that becomes very complex, and it's not intuitive how it's happening. And in practice, and actually, there was a dissertation, if I'm not mistaken, I think his name is Jacek, I forgot his last name. He was a PhD student with Neil Hurley. I think in his dissertation, actually, he specifically explored that area of how re ranking algorithms perform compared to changing the model for improving diversity. But in practice, the re ranking is performing really well, actually. In general, you're concerned about how many items to extract, you know, that's one thing. That's a one-time thing to try, like, you know, you might first try 100, and then maybe 1000. And then that's set: now I realize, from now on, I need to extract 2000 items. And that's good enough. Okay, so like a hyperparameter you tweak.
Exactly. And it's more intuitive. Like when you do re ranking, intuitively, it's very easy to understand what's happening. Like, okay, from a list of 10, I want five of them to be non popular and five to be popular; you could really ensure that this is happening. Right. With in-processing, it's much more complex to really ensure what the output should look like. Okay, makes sense. And I mean, yes, okay, this basically answers the question of how to find the right number of, let's say, secondary candidates you want to pull. However, what about the efficiency aspects? So if this grows too large, and actually also needs to be done in real time, and not in a batch processing way. So what about these concerns? Or what is your opinion on this? Yeah, so often, the type of algorithms that I have been working on are usually batch recommendations, you know, they might generate lists of recommendations for people once a day, or maybe twice a day. And they can be stored in some databases or some tables, you know, so that when the user comes to the platform, those can be basically shown to the user. Definitely, having more complex algorithms is heavier when you want to do interactive recommendations. But that's the same for in-processing as well. Like if you use an in-processing algorithm, and you still want to do a lot of regularization, and okay, what's the weight of the regularizer? You know, that's also very complex, if you want to have it basically on demand, like the moment that the user clicks on something, the recommendation should be generated in real time. So that's basically heavy, regardless of whether it is in-processing or post-processing. So that's still there. So for that particular reason, I don't see a preference over which group of algorithms to be used for popularity bias reduction.
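Before moving on: to give a flavor of the regularization idea discussed a moment ago, here is a schematic sketch of a loss with a popularity penalty. It only illustrates the general pattern, an accuracy term plus a term that discourages systematically higher scores for head items, and is not the exact formulation of the RecSys 2017 paper.

```python
import numpy as np

def regularized_loss(pred, target, item_is_tail, lam=0.1):
    """Accuracy term plus a penalty that grows when predicted scores correlate with
    'head-ness', i.e. when popular items are systematically scored higher.
    pred, target: 1-D arrays of predicted and observed scores for a batch of items.
    item_is_tail: 1 for long-tail items, 0 for head items (hypothetical labeling)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    head = 1.0 - np.asarray(item_is_tail, dtype=float)
    base = np.mean((pred - target) ** 2)        # stand-in for the base rating/ranking loss
    if pred.std() == 0 or head.std() == 0:      # correlation undefined for constant inputs
        penalty = 0.0
    else:
        penalty = abs(np.corrcoef(pred, head)[0, 1])
    return base + lam * penalty
```

The hyperparameter `lam` plays the role discussed above: it trades accuracy against how strongly the concentration on popular items is penalized.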
With regards to the bias mitigation strategies, we somehow implicitly assume that we have collaborative filtering at our disposal or mainly focusing on collaborative filtering there.
However, I mean, this is not the only thing we do to solve for personalization. I mean, we also do take into account content features of users profiles that we might have learned.
And you have already mentioned quite a good example from your own experience. I mean, you're coming from Iran, and you have alluded to that famous Iranian movie maker, who is very popular there, but maybe people in Western countries haven't heard about or just a few of them.
So in that sense, if you would, let's say, subscribe as a new user to Netflix or some other video streaming platform, and if as part of their onboarding, they would ask you about the country you grew up in and then take the popular information about certain categories of that country they have at their disposal. Could popularity and combination with content help to solve popularity bias or to mitigate popularity bias?
Yeah, that's a great question. In general, I think this is definitely something that is being helpful. Like, in theory, when we build recommendations, we always think the only thing is available is the user content interaction. But in practice, when you build large scale real world recommendations, you have more features about even the content of the item. Okay, what's the genre? What's let's say for podcasts, what's the transcript, what's the topic is about for the user, as you mentioned, you have the country, you have the language, you know, all of that is really helping the model not to concentrate on popular content, right? Because the model usually concentrate on popular items, because it doesn't have much information, it only has interaction history. And because the interaction history is concentrated on popular items, so we shouldn't really blame the model, the model also learns to concentrate on popular items. So the more information you give to the model, in terms of the content, the information about the user information about the context information about the item, then you're helping the model to not really concentrate too much on popular items for people who have no interest in popular items.
So definitely adding more content is also a very good way of dealing with that.
So one could say that adding more content and using more richer item context and user representations could also be considered as a way to perform bias mitigation. So I mean, there's one way if you want to stay within the realm of collaborative filtering, then you need to do regularization or re-ranking or whatever. But if you are even, let's say, able to move outside of the realm and maybe move from a collaborative filtering model to a hybrid model, then this could also already not only, for example, improve accuracy of recommendations, but also deliver popularity mitigation. That's exactly the case. Yeah, I like the way that you put it. Yeah, I think that's definitely true. Which is actually bringing us a bit to wrap up that whole topic of popularity bias to the topic of quantifying popularity. And by that, I do not necessarily mean what are the thresholds. I mean, where in some papers, we have that 20%, 60%, 20% into head, medium and tail items, or some equally sized bins with regards to the users.
What I rather mean is what are the right metrics to quantify the effect of debiasing on items and users. And I'm saying this explicitly because you were actually bringing that up in a paper that was about the user centered evaluation of popularity bias, where you actually wrote that most of the research up to that point has mainly focused on the effects of debiasing on items and accordingly, and this is something that we will talk to in a minute, also the providers of items, so fairness and all the downstream related topics. But you were bringing up that there is an effect, of course, on the users. So let's perform some user centered evaluation. Can you briefly give us a short overview of potential metrics and maybe their pros and cons? Maybe that won't be too brief, but maybe just the main aspects that you that you would see there, maybe also some advice if you have? Yeah, definitely. Yeah, that's a great question. And I think that in general, evaluation to me is probably the most important part of any research, because you could work a lot on having the best algorithm, but how to evaluate whether this is the best algorithm is very important. Like what, what, what are you measuring? So for popularity bias, like intuitively, there are some metrics that come to mind when you want to improve popularity bias. So you want to measure, okay, how your algorithm has decreased the over concentration on popular items, right? So one way that people have looked at, and it's very simple, you have the average popularity of every item in your training data, you know, these items popularity is, I don't know, it's 90%. It means it's being listened to by 90% of people. This one is like 1%. So you have all that numbers. And you could measure that average in the list of recommendation that you give as well. Okay. So on average, what's the average popularity of the thing that I have recommended? If it's decreasing, compared to another algorithm, it means you have recommended less popular items, right? So that's one thing. But again, average, and not just for this application, in general, average has this issue of some tiny, like some few outliers might impact the whole average, right? So you might have done actually very well in lowering popularity bias, let's say 10 items you have recommended, nine of them are actually very non popular. And then one of them is extremely popular, right?
So that just impacts the average. Yeah. So that's why, not just for this example, in many machine learning models, you need multiple metrics to make sure the metric is not messing with you, basically. So for that, let's say to capture that problem that, okay, maybe there is something too popular and the majority are actually non popular, you could also look at the ratio of non popular versus popular. So if you look at that one, the previous example would look good, because nine of them were non popular, and only one was popular. So nine out of 10 is actually good; nine out of 10 are not popular. So these are basically metrics for items, right? And then distribution wise, there are metrics that initially were being used in economics, for example the Gini index, right? So it measures the inequality, and it has historically been used mainly for wealth inequality. That means in a country or in the world, maybe few people own the majority of wealth, right? And for that, they measure the Gini index of countries. So a Gini index close to one means there is an extreme inequality. And a Gini index closer to zero is better, like in terms of, yeah, concentrated versus uniform. Exactly. So that means when you measure the Gini index of recommendations, by that you want to make sure, okay, let's say you have 100 items, and each of them, they are recommended, like, one is recommended 100 times, one is recommended five times. So you have all these numbers of times of being recommended. So you measure the Gini index of that; if it's close to one, that means some few items, and the emphasis should be on the few, that's what makes the Gini index go high, some few items are recommended way, way more frequently. And that makes the Gini index go high. So that's another metric that you could look at. So this is from the items side, and as you mentioned, from the provider side it's kind of similar, because literally, the provider, either one or multiple items can be seen as a provider. So the provider can either own one item or multiple items. So the same type of metric can be applied to providers as well.
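To make the item-side metrics just described concrete, here is a small sketch; the function names and the notion of a fixed long-tail set are mine, chosen for illustration.

```python
import numpy as np
from collections import Counter

def average_rec_popularity(rec_lists, item_popularity):
    """Mean popularity (e.g. fraction of users who interacted with an item) over all
    recommended slots, averaged across users' recommendation lists."""
    return float(np.mean([np.mean([item_popularity[i] for i in recs]) for recs in rec_lists]))

def tail_ratio(rec_lists, tail_items):
    """Fraction of recommended slots occupied by long-tail items."""
    slots = [i for recs in rec_lists for i in recs]
    return sum(i in tail_items for i in slots) / len(slots)

def gini_index(rec_lists, n_items):
    """Gini index over how often each catalog item is recommended:
    0 = every item recommended equally often, 1 = all exposure on a single item."""
    counts = Counter(i for recs in rec_lists for i in recs)
    x = np.sort(np.array([counts.get(i, 0) for i in range(n_items)], dtype=float))
    n, cum = len(x), np.cumsum(x)
    return float((n + 1 - 2 * cum.sum() / cum[-1]) / n)
```

The same functions can be applied per provider by first mapping each recommended item to its provider.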
Like what makes the user also interesting here from the user-based perspective, as you mentioned in the paper, is, okay, so now how are we doing in capturing users' tendency towards popularity?
Right? And the way we measured it, and it's just one way, but you might be able to do it differently, is that we know the user's tendency towards popularity. We know their average, the average popularity of their listened or watched movies or music, or like whatever item they are consuming. We know the average popularity. And we also know the ratio, if we group items based on popularity, let's say, super popular, kind of popular, non-popular, we have that ratio as well for each user. So for example, your ratio might be 70% popular, 20%, kind of popular, 10% non-popular, right? In order to measure how we have mitigated the bias from your perspective, from user's perspective, we measure this deviation, distance between the recommendation list I gave it to you in terms of the popularity and your own historical popularity distribution. If it's far away, that means I have not done well. But if it's close, it means it's good. I guess what you are referring to is that sense of calibration.
Exactly.
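A compact sketch of that user-level deviation, assuming the same three popularity buckets as before and using the Jensen-Shannon divergence as the distance (consistent with the calibration metric that comes up next); the helper name is mine.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1]) between a user's
    historical popularity-bucket distribution p and that of their recommendations q.
    0 means perfectly calibrated; larger values mean the list drifts away from the user's taste."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g. history is 70% head / 20% mid / 10% tail, recommendations are 90% / 10% / 0%
deviation = js_divergence([0.7, 0.2, 0.1], [0.9, 0.1, 0.0])
```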
One could see that in a couple of your works. So it seems like you're a great fan of that idea of calibration. I guess many people know the paper by Harald Steck from Netflix called "Calibrated Recommendations". And you have used a similar notion, just using, I guess, the Jensen-Shannon divergence in your paper, where you came up with calibrated popularity, if I'm correct there, for the user-centered evaluation. And then I guess also in your most recent paper, you again basically stuck to a notion of calibration, improving on the paper by Harald Steck in a certain sense. So it would be great also to have a short talk about this. But since we are already also touching a bit on your work at Spotify, can you tell us where your expertise in popularity bias is currently becoming effective in your work at Spotify and where Spotify is really concerned about popularity bias and also maybe how you solve it?
Sure. Yeah. So like many other domains that I mentioned, music streaming is also, there are super stars, super popular artists. And even in podcasts, you know, there are popular creators.
So the popularity bias, it's there, right? And we need to ensure that we give a fair chance to different artists to at least reach their potential audience. And that's where the work on popularity bias mitigation can come into play. And I mean, we discussed the different methods and you know, different teams, they might basically approach this problem differently. But the main concept is the same, you are trying to reduce the over concentration on super popular content that is basically more than what they should have gotten, you know. So that's where it's becoming quite useful, the popularity bias mitigation. And it's kind of mitigating popularity bias in practice is also not just to basically mitigate the bias just as itself. It's also actually, in practice, it's actually even more important, because you need to give a fair chance to some artist or some creators or, you know, in general, that are new. Right. So if you if you only rely on a machine learning algorithm that trains on historical data, those new items, they don't have much interaction.
And the algorithm might think, okay, these are not good items because there is not much interaction.
But that's not true. Like these are new. So you need to give them enough chance.
Because those new items at some point, they might become the next popular stuff, right? So that needs to be also handled, like by doing some exploration, you know, kind of thing, where you want to make sure that each item, each song, each podcast, everything, they get a fair chance of first being exposed to a potential audience, and to learn how much potential this has in order to actually recommend it to more people. Yeah, so that's definitely something that we take very seriously, we actually have a dedicated team on, like, transparency and algorithmic fairness, that specifically monitors the recommendation algorithm or any other algorithms to make sure it's actually in line with fair practices and everything. So yeah, definitely, it is one of the very important things that we take very seriously. Okay, I see. Which is, I guess, a good hand over to get a bit broader. Because as part of the introduction, I've already said, popularity bias is basically our elephant in the room. But it's just one of many objectives that we want to solve for. And when talking about many objectives in the realm of recommender systems, then we are talking about multi-objective recommender systems. And you have, as I said, been one of the organizers of that multi-objective recommender systems workshop, that has taken place in 2021 and 2022, I guess as well. Can you give us an overview about multi-objective recommender systems and which other objectives might play an important role and what MORS is actually all about? Definitely. Yeah, before I get into that, so these multi-objective workshops that happened were kind of the continuation of the multi-stakeholder workshops that happened in 2019 and 2017. So for people that want a reference, they are kind of relevant, just maybe with different names. So the idea of multi-objective recommendation, as I think the term is also very self-reflective, is that there are multiple criteria, multiple objectives that you care about when you generate a list of recommendations for users. And kind of intuitively, well, the main objective is the relevance of the recommendations to the user, because that's the main purpose of having a good recommendation system. But there might be, you know, some other objectives that you want to ensure that at least you take care of them. And sometimes it might be quite impossible to achieve all of them, but taking care of them and thinking about them systematically matters, because there are situations where you could achieve the user relevance without any change, like you could achieve the same user relevance that you would have achieved even without multi-objective. But there are some rooms, some possibilities, that you could achieve some other objective at the same time. And that's the ideal situation, basically. The other objectives could be anything; it could be, let's say, helping less known creators or less known artists; in e-commerce, it could be, let's say, products that might go out of stock soon, it could be products that might, if you don't sell them on time, they might go bad, you know, like, in a kind of online grocery, you know, there are objectives that are important to meet, right? Not just about using a simple algorithm and picking the most relevant for the user, but these objectives are very important.
So that's where the main idea comes in: how to incorporate all these objectives into one solution so that, at the end, the recommendations generated for the user more or less take all these different objectives into account. So maybe staying with that online grocery or retail example. I mean, you are basically taking the business stance there: the user perspective might rather touch your top line, you want to grow your user base, you want loyal users that retain, to basically earn and extend your revenue with them. But on the other side, of course, you want to decrease costs. And, I guess, wasting groceries that go bad because you haven't sold them in time is actually an implicit cost, because you have purchased them from your supplier and you are not selling them to your customers. And in the end, it's not only the cost of those goods, but also the cost of, yeah, disposing of them properly. And then you additionally have the implicit cost of not having sold them. I actually like that example, because it shows how powerful recommender systems can be and that they go far beyond this relevance-only perspective; they can be used as a vehicle to optimize for even more goals that one wouldn't think about in the first place. Exactly. Yeah, that's exactly the case. In addition to all the reasons you mentioned, even the food waste itself is a big problem, right? In an ideal world, food shouldn't be wasted if there were customers who could have purchased and consumed it. So that's one important objective. And actually, there is a paper by my postdoc mentor and his PhD student, I think it was at last year's MORS workshop, about exactly this topic, a multi-stakeholder algorithm for perishable products. I see. They designed a multi-objective algorithm that can help with that.
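As a toy illustration of folding several objectives into one ranking score, here is a minimal sketch for the online-grocery case. The weights, field names, and the expiry-urgency signal are hypothetical; real systems tune such trade-offs far more carefully and often learn them rather than hand-picking them.

```python
# Toy example: one score that blends relevance with a second objective
# (urgency to sell near-expiry groceries). Weights and fields are hypothetical.

WEIGHTS = {"relevance": 1.0, "expiry_urgency": 0.3}

def multi_objective_score(item, weights=WEIGHTS):
    # expiry_urgency in [0, 1]; 1.0 means the product goes bad very soon.
    return (weights["relevance"] * item["relevance"]
            + weights["expiry_urgency"] * item["expiry_urgency"])

items = [
    {"name": "yogurt_expiring_tomorrow", "relevance": 0.6, "expiry_urgency": 0.9},
    {"name": "long_life_pasta",          "relevance": 0.7, "expiry_urgency": 0.0},
]
ranked = sorted(items, key=multi_objective_score, reverse=True)
print([it["name"] for it in ranked])  # the near-expiry yogurt now ranks first
```

Other objectives, like the health constraints that come up next, could in principle be added as further weighted terms in the same way.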
But also, the objectives for this particular example could include recommending to you based on your health status. And I don't know if this platform exists in reality, but let's say there is an online grocery that is aware of your diet, aware of your health status, whether you have diabetes or some blood pressure problems. And the platform also knows, for each item, how much sugar it has, how much gluten, how much salt or sodium, things that could lead to consequences for someone with high blood pressure or something like that.
You know, these are other objectives, and it's not purely about relevance, because you might have diabetes and you might love Coke, right? You might love juice, but is it good for you? So these are exactly the type of problems that could be interesting. Actually, when I first found my classic example, I was very happy with it.
And whenever I talk about multi-stakeholder recommendation, I mention it: it's basically the Uber Eats problem.
I don't know if Uber Eats operates in Europe as well. But yeah, we do have it in Germany, for example. Yeah. So basically, the idea is it's a multi-sided platform. There are users who search for restaurants, for food. There are restaurants who cook and serve the food. There are delivery people who deliver the food from the restaurants to people's homes.
And you know, there is Uber, the platform that wants to orchestrate and organize all of this.
Definitely, it's a multi-objective, multi-stakeholder problem here, because you could just recommend the most relevant restaurants to different people, and by that, you might recommend certain restaurants to a lot of users.
Okay. But then those restaurants will get a lot of requests that they cannot handle.
And that also leads to user dissatisfaction, because the delivery time will be much higher.
And all of these are very important factors; some of the classic recommendation algorithms that we have seen in academia might not capture this type of dynamics.
Yeah, that definitely makes sense. In the end, you still care about the satisfaction of your users, but it works more indirectly here, because first you need to somehow model what's happening for a restaurant in order to model the implications for the final user of, let's say, overloading it with orders, which can be hard to measure, hard to detect, hard to model in multi-sided platforms. So, since Spotify is also a multi-sided platform: when preparing for this episode, I was somehow always coming across that back and forth between multi-objective and multi-stakeholder recommender systems. So how can we actually distinguish these two terms? Is it that multi-objective originally started with only taking the consumer's perspective and then extended into multi-stakeholder, or how can we distinguish these two topics from each other?
Or is it even worth distinguishing them? So what's your take on this?
Definitely. Yeah, I mean, I do have a take, and it might not be the same take that other people have. So this is my personal opinion. So multi-objective is the bigger umbrella, right?
Because having multiple stakeholders can be considered multi-objective: okay, in multi-stakeholder the objectives come from different stakeholders, but in an abstract way it is still a multi-objective problem, right? So to me, multi-objective is a big umbrella that initially, as far as I am aware, only looked at different objectives around the user's preferences, like diversity, novelty, the long tail. So those who receive the recommendations?
Exactly. Yeah. And then multi-stakeholder is basically multi-objective recommendation where you care about the objectives of other stakeholders as well, right? And that creates some interesting dynamics because of the fairness between different stakeholders. If the objectives all come from the same user, then the idea of fairness might not be as critical, because they come from me as a user. But if they come from multiple stakeholders, okay, whose objective should be taken more seriously than the others, or should be prioritized? So some fairness concerns come into play with multi-stakeholder recommendation. This is how I see the difference, basically. And there are some economic aspects for the stakeholders: over time, what's going to happen to each stakeholder, the rich get richer, the poor get poorer. There are different dynamics that come from the fact that the objectives come from different stakeholders. So that's what makes it different.
Okay, okay, I see. So if we put these terms of multi-objective and multi-stakeholder into the perspective of Spotify and what you do there in your daily work, can you share some experience of what your day-to-day work looks like? What are the problems that you are working on exactly? I mean, we talked about debiasing, but also, for example, how do you solve for the trade-off between the objectives of multiple stakeholders, especially those of the users, so the listeners, versus, or maybe not versus, those of the artists? Yeah, so again, there is definitely very interesting existing work in the literature. We do have a culture of keeping up with the state of the art and what is happening to solve different problems.
And we are following those different approaches to mitigate the biases that we mentioned, initially around popularity and other fairness issues. What is specifically important for a multi-sided platform like music streaming, let's say Spotify, is that there are two sides. We have listeners who come to the platform, to the app, to have a good time, to listen to their favorite artists, or even discover new artists, or listen to podcasts, etc. And on the other side, all of these providers, those artists, those creators, they are hoping that what they are creating is being discovered by people, is being listened to by people. So as a platform, we have a mission, we have a responsibility, to make sure that we are connecting the right audience to the right creator. And for the problem that we talked about, popularity bias, all the solutions exist here as well: how to fix that kind of bias to make sure a less known creator or less known artist still gets the type of audience they were hoping for. So definitely, this idea of a multi-sided platform is something that is very much visible at Spotify. And yeah, as I mentioned before, we don't just run a blind algorithm and leave it unmonitored. We definitely look at the output of the recommendations: how different artists are being recommended, to what extent, in what percentages, across different popularity tiers, different countries, local artists, international artists. So we are monitoring different aspects of the recommendations to ensure that we are aligning our algorithms to fairly recommend different creators and different artists to different people. And from the listener side also, as I mentioned, it's important that they are getting what they should get, right? So we don't want to expose people to super non-relevant recommendations just because that would help someone on the creator side. The ideal situation would be that for each creator, we find the most relevant users who would appreciate that creator. This is basically a two-sided matching, like, okay, each side should get the type of recommendations and exposure it should get.
Okay, I see. I see. And I mean, artist fairness is not the only concern there, but also other attributes and their proper calibration with regards to the user's history play a role.
And I'm actually alluding to one of the more recent papers that you have worked on, for which there's also a blog post on the Spotify R&D research blog. It was called "Users' interests are multi-faceted, recommendation models should be too". Can you walk us a bit through the ideas and techniques that you propose there and what problems they solve in the context of Spotify? Yeah, so that particular paper was basically on the topic of calibration. And as you mentioned, the first well-known, and emphasis on well-known, paper on this was the 2018 Steck calibrated recommendations paper. But the idea of calibration actually appeared earlier; I could also list that paper, though I forget the exact title. I think it was Professor Dietmar Jannach who shared that paper with me. I had seen it before but had forgotten about it; the idea of calibration first appeared in that paper, I think in 2011 or 2010.
I guess he mentioned it as part of his keynote for the MORS workshop, right? Oh, okay. So we will definitely find it, because I was also a bit astonished when he pulled up that paper and said, here, in 2011 there were already some first remarks on this, and I was like, oh, I wasn't aware of that. Yeah, so the paper that we published was at the WSDM conference. It was on calibration, and it was more about calibrating based on genre, right? So, let's say for movies, you know, I don't know, drama, romance. And that's what we mean here by multi-faceted. The facets can be different things; here it's about genre. But in general, it could be like, okay, you have a certain interest in American music, but you also have a certain interest in Persian music. So how do you capture these different aspects? And in the work, we proposed a new algorithm, it's a very novel algorithm, based on the max-flow problem. It's a graph-based approach that tries to solve the calibration problem by actually solving a minimum-cost flow problem. And min-cost flow is a very classic computer science problem where you basically want to find the solution that achieves the minimum cost. This is the right moment to pull up my operations research book from university. Yeah, so here it is.
Oh, nice. Nice. Who are the authors? It's Nickel, Stein, and Waldmann, so actually three professors from my university, because KIT in Karlsruhe in Germany is quite well known for operations research. So yeah, always a good reference when coming across these things. And I think it's also important to give a very big credit to Tony Jabara. He's actually our VP of machine learning, and the basic idea came from him. Okay. Yeah, it's not that common for a VP to be that research-oriented and that technical, but he's really, really amazing.
So we worked together on that algorithm. And yeah, as I mentioned, it works by minimizing a cost, and how that cost is defined is described in the paper in detail, what exactly the cost is that we're trying to minimize. So we define the cost in a way that relates to calibration, and basically we defined it such that if you minimize it, you are achieving calibrated recommendations. I think it's much better for the audience to read the paper in detail, because it's much easier to follow there. Okay, so we will definitely put it in the show notes as well. Which means that in that paper, miscalibration is treated as the cost, and by minimizing that cost you ensure calibrated recommendations. So, for example, showing recommendations that have a distribution of music genres that somehow fits the one you have learned from my history. Exactly. At a high level, exactly. We defined it as miscalibration, but we had to define miscalibration in a way that is computable within this minimum-cost flow problem, because if it's too complex, it's not possible to solve the problem. So it should be defined in a way that the algorithm can actually minimize, right? Yeah. So those details can be found in the paper. Cool. Sounds great.
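For readers who want a feel for calibration before opening the paper, here is a minimal sketch of greedy calibrated re-ranking in the spirit of Steck's 2018 paper mentioned above. It is not the minimum-cost flow algorithm from the Spotify paper; the genre labels, the lambda trade-off, and the smoothing constant are illustrative assumptions.

```python
# Minimal sketch of greedy calibrated re-ranking (in the spirit of Steck 2018),
# NOT the min-cost-flow formulation from the paper discussed in the episode.
import math

def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) over genres; q is smoothed with eps to avoid division by zero."""
    return sum(p[g] * math.log(p[g] / max(q.get(g, 0.0), eps)) for g in p if p[g] > 0)

def genre_distribution(items, item_genres):
    """Genre distribution of an item list (each item splits its weight evenly)."""
    counts = {}
    for item in items:
        for g in item_genres[item]:
            counts[g] = counts.get(g, 0.0) + 1.0 / len(item_genres[item])
    total = sum(counts.values()) or 1.0
    return {g: c / total for g, c in counts.items()}

def calibrated_rerank(candidates, item_genres, history_dist, lam=0.7, k=10):
    """Greedily pick k items, trading off relevance (candidates: item -> score)
    against miscalibration, the KL divergence to the user's historical genre mix."""
    selected = []
    while len(selected) < min(k, len(candidates)):
        best_item, best_score = None, -float("inf")
        for item, rel in candidates.items():
            if item in selected:
                continue
            q = genre_distribution(selected + [item], item_genres)
            score = lam * rel - (1 - lam) * kl_divergence(history_dist, q)
            if score > best_score:
                best_item, best_score = item, score
        selected.append(best_item)
    return selected

# Hypothetical usage: a user whose history is 70% rock and 30% jazz.
history = {"rock": 0.7, "jazz": 0.3}
cands = {"rock_a": 0.9, "rock_b": 0.85, "jazz_a": 0.8}
genres = {"rock_a": ["rock"], "rock_b": ["rock"], "jazz_a": ["jazz"]}
print(calibrated_rerank(cands, genres, history, lam=0.7, k=2))
# -> ['rock_a', 'jazz_a']: calibration pulls in jazz despite rock_b's higher relevance
```

The greedy version illustrates the objective; the min-cost flow formulation discussed in the episode solves a related objective more globally.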
And that brings me to another question, which is: given your strong position in the RecSys space, what do you see, for yourself or for music streaming in general, as the main problems and challenges when it comes to personalization today? And of course, in the future? Yeah, that's a great question. I guess the main problem, basically trying to find the most relevant recommendations, is still not really solved. It's not that common to have a recommender that gives the most relevant recommendations and keeps the user happy in the long term, right?
So that long-term satisfaction is something super, super interesting: what kind of recommendations or what kind of approaches should we take to ensure that you are having a good time with those recommendations, not just now, but also tomorrow, and also next week, so that you are consistently having a good time on the platform. So let's say, as an example, I might show you a recommendation that you play right now and you stream it, right?
So by the basic definition, that was a success, because you streamed it until the end. Yeah.
But it gives you the impression that, okay, maybe I shouldn't stream anything else from this genre or from this artist. So how can we help with that way of measuring, right? That's something I'm personally very interested in exploring. Sorry for jumping in, but that actually brings me to a problem that I encounter myself in my recent usage of Spotify, because I really went deep down that podcast rabbit hole. So I listen to many podcasts. I usually listen to a morning news podcast almost every weekday when I go out for a run, and I used to listen to a quite long economics podcast where each episode appears on a weekly basis. But sometimes I have the impression of two things. The first is that I'm getting a bit stuck in that podcast realm. Podcasts are maybe 60 to 70% of my overall usage time, I recently looked into my GDPR request data and analyzed it myself, and I'm getting a bit too many podcast recommendations recently, where I feel it might be a bit miscalibrated, actually. And the second thing, and you alluded to it, is the signal and how you interpret the signal. I'm not sure how you do it with podcasts, but I've come across that reference where you say, okay, for songs, if you listen to them for at least 30 seconds, we take it as a positive signal. I assume that you already do some more advanced stuff there or distinguish even more signals, whatever.
But sometimes I'm afraid that when there is some automatic continuation of podcast episodes, and I don't feel that the next one in line is the right one, it just takes me longer to skip it. And I'm really hesitant, or not hesitant, but I'm really afraid that this has already been counted as a positive signal because I only skipped after two minutes or something like that. So these are just the two things that crossed my mind when you talked about it: I still like to listen to music; just because I listen more to podcasts doesn't mean that I don't want to listen to music anymore. And sometimes, when I get to the home page, it's really hard, or you really need to scroll quite a long time, to find my Discover Weekly again, or any other playlist that I would like to listen to. Yeah, definitely. That's something that we are constantly working on. I mean, it's kind of a hard problem, as you said, because the user signal is not easy to read. Even if there were just one user, let's say, not 500 million users, just one user, it's not easy to understand what you want if we don't talk to each other, right? If you just show me some signs, I need to interpret what you're telling me by what you're doing, right? And often my guess might not be correct. So now extend that to 500 million users, and one model, not 500 million models, one model, is trying to learn all these user preferences. That's why I said even the main problem of recommender systems is not really solved: how can one model give the best recommendations to everybody? How can it learn these user signals? What does a skip mean? What do two skips in a row mean? What does skipping after streaming mean? All of these are different types of signals that could have different meanings for different people. Right? Yeah, it's a very complex problem in general, even if you can be good enough for the majority of people. In general, the problem of recommender systems is one of the hardest problems in machine learning. When you do image classification, you have a bunch of images, they are not changing, they're just there, and you can improve your algorithm and get good at predicting that image. With user behavior, people are constantly moving and changing, they change their preferences, they take certain actions they don't really mean. So it's a very complex problem in general, but different teams are constantly working on how to do the best for different users. So, and what do you think is, I mean, on the one hand you could say, when will that problem ever be solved? Or, of course, it's about the interpretation of signals and so on. But what are some of the things where you would say, okay, this is actually something that we do to tackle a certain problem? What would that be? So one intuitive approach and solution would be to balance the implicit, basically your assumption about what the user is telling you, and explicitly letting people tell you what they want. Right? And how to balance that is also quite challenging, because you can't really ask people every time, because then it would just be search, it wouldn't be a recommendation algorithm, it wouldn't help you discover.
Yeah, right. If I ask you every time, what do you want, then I'm just a search engine, I'm not really doing what I'm supposed to do. But how to balance that: maybe there are some dark spots where the algorithm has not understood you well, and there is some ambiguity and uncertainty about your preferences. Maybe that's when the algorithm and the platform can ask you, okay, what do you want, I'm not getting it, what is it? And then there's this trade-off, this balance with user interaction, with the user explicitly telling you. And I think with the development of large language models and generative AI, that process can become much more feasible: if different platforms incorporate this technology into their recommender systems, it becomes much easier for the user to basically tell you what they want. Yeah, I see. Very interesting thoughts, and much inspiration for others who listen to this. Himan, I have actually mentioned that question before, as well as the majority of answers to that question. However, this time you of course can't answer with Spotify, that would be just too easy. When you think about personalization in the world and across different systems, what is actually the personalized product that sparks your joy the most? Or what are you thinking about in terms of products that you use where you really think, okay, these folks are really doing a great job in terms of personalization? Yeah, so basically, first of all, it's not fair that you didn't let me pick Spotify. We have been talking so much about fairness, and now we need to provide an example of unfairness.
Exactly. So yeah, I mean, I definitely like it, and not just because I work at Spotify, but because I think it's one of the best examples of how personalization can help. Sometimes you can overuse personalization in a domain where you don't even need it; it's just a fancy thing, so people might try it. But there are domains where it makes perfect sense. Yeah, if I can't pick Spotify, then I would have to say, I think, Netflix. As I told you, I'm a movie fan, so I like movies. The only thing is they don't have that many movies, so the catalog is a bit limited, especially for the type of movies I am interested in, where I'm not with the majority of people. So maybe that's okay. So Netflix is definitely one of my favorite examples in terms of recommender systems. Cool. Okay. Thinking about additional upcoming guests for RECSPERTS, are there some people you have in mind that you would recommend me interviewing for additional episodes? Yes. Yeah, when I was listening to the previous episodes, that's how I was preparing myself, like, okay, who should I suggest? And I think, since I have listened to almost every other episode, I know who has already been on.
So maybe I shouldn't pick someone who has already been on. I think one of the people that I have personally worked with, and who I definitely think has a lot of interesting things to discuss, is Mehdi Elahi. He's an associate professor at the University of Bergen.
So that would be one of my first picks. And I don't know if you have thought about it, but Bamshad Mobasher is also someone that I really, really appreciate, and my PhD advisor, Robin Burke. These two I always see together, because they have always worked together, although now they're not at the same place anymore. So Robin Burke, my PhD advisor, and Bamshad Mobasher. So I gave you three choices, diverse choices. Yeah, and calibrated to your taste. Great, perfect. So I will add them to the list and make sure to reach out in the future. Yeah, I really enjoyed our discussion, and many of the things that you elaborated on also nicely pose some additional research questions that can spark people's creativity when listening to this episode, and that goes back to what we said about solving problems by proposing something that sparks change or is able to tell you what to do differently. In that sense, I hope that this episode also provides some creative sparks for all listeners. And I do want to thank you very, very much for taking part in this. It was really a pleasure. Thank you so much. I had a really good time talking with you. And I think, again, this is a great podcast, and I hope more people will listen to it. I've listened to almost every episode so far, and seriously, it's a really, really informative and very nice way of getting into this area, even for someone who might just want to start getting into recommender systems. I feel like you have covered quite different areas of research and different people with different expertise. So I think it's a really, really interesting podcast, and I'm very happy and proud to be part of it.
And hopefully, the audience will find these episodes also useful for what they are looking for.
Cool. Thanks for that. I will definitely make sure that we get even more people on board and continue that effort. So with these words, thank you again. And Himan, have a wonderful day.
Thank you so much. And you also have a wonderful, I guess, afternoon.
I will be going out for a run. See you. Bye. Thank you. Bye.
Thank you so much for listening to this episode of RECSPERTS, recommender systems experts, the podcast that brings you the experts in recommender systems.
If you enjoy this podcast, please subscribe to it on your favorite podcast player, and please share it with anybody you think might benefit from it. If you have questions, a recommendation for an interesting expert you want to have on my show, or any other suggestions, drop me a message on Twitter or send me an email to Marcel at RECSPERTS.com. Thank you again for listening and sharing.
And make sure not to miss the next episode, because people who listen to this also listen to the next episode. Goodbye.
Bye.
