What Is a Good Recommendation Algorithm

2012-11-25

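In 2009, Greg Linden published an article on the Communications of the ACM blog titled "What is a Good Recommendation Algorithm?". The article questions how meaningful rating-prediction recommendation algorithms are; Greg argues that TopN recommendation is more valuable and meaningful.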

Original article: http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext

Netflix is offering one million dollars for a better recommendation engine. Better recommendations clearly are worth a lot.

But what are better recommendations? What do we mean by better?

In the Netflix Prize, the meaning of better is quite specific. It is the root mean squared error (RMSE) between the actual ratings Netflix customers gave the movies and the predictions of the algorithm.
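
To make the metric concrete: RMSE is the square root of the average squared difference between predicted and actual ratings. The following is a minimal sketch in Python; the function name `rmse` and the sample ratings are illustrative assumptions, not Netflix data or the contest's scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on a 1-5 star scale (made up for illustration).
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]

print(round(rmse(predicted, actual), 4))  # 0.6124
```

Because the errors are squared before averaging, the single one-star miss contributes more to the result than the two half-star misses combined.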

Let's say we build a recommender that wins the contest. We reduce the error between our predictions and what people actually will rate by 10% over what Netflix used to be able to do. Is that good?

Depending on what we want, it might be very good. If what we want to do is show people how much they might like a movie, it would be good to be as accurate as possible on every possible movie.

However, this might not be what we want. Even in a feature that shows people how much they might like any particular movie, people care a lot more about misses at the extremes. For example, it could be much worse to say that you will be lukewarm (a prediction of 3 1/2 stars) on a movie you love (an actual of 4 1/2 stars) than to say you will be slightly less lukewarm (a prediction of 2 1/2 stars) on a movie you are lukewarm about (an actual of 3 1/2 stars).
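
The two misses in this example are each off by exactly one star, so a squared-error metric cannot tell them apart even though a user would. A small sketch of that arithmetic, using the star values from the example (the variable names are illustrative):

```python
# (description, predicted stars, actual stars), taken from the example above.
misses = [
    ("loved movie predicted as lukewarm", 3.5, 4.5),
    ("lukewarm movie predicted as slightly less lukewarm", 2.5, 3.5),
]

# Squared error is the per-rating term that RMSE averages.
squared_errors = [(predicted - actual) ** 2 for _, predicted, actual in misses]

for (description, _, _), error in zip(misses, squared_errors):
    print(f"{description}: squared error = {error}")
```

Both lines report a squared error of 1.0, so RMSE treats the two mistakes as equally bad.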

Moreover, what we often want is not to make a prediction for any movie, but find the best movies. In TopN recommendations, a recommender is trying to pick the best 10 or so items for someone. It does not matter if you cannot predict what people will hate or shades of lukewarm. The only thing that matters is picking 10 items someone will love.

A recommender that does a good job predicting across all movies might not do the best job predicting the TopN movies. RMSE equally penalizes errors on movies you do not care about seeing as it does errors on great movies, but perhaps what we really care about is minimizing the error when predicting great movies.
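
A toy comparison can illustrate the gap between the two goals. In the sketch below, all ratings, the model names `model_x` and `model_y`, and the helper functions are hypothetical, constructed only to show that a lower RMSE need not mean a better TopN list.

```python
import math


def rmse(predicted, actual):
    """RMSE over the items in `actual`; both dicts map item -> stars."""
    total = sum((predicted[item] - actual[item]) ** 2 for item in actual)
    return math.sqrt(total / len(actual))


def top_n(scores, n):
    """The n items with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def top_n_hits(predicted, actual, n):
    """How many of the predicted top n items are in the true top n."""
    return len(top_n(predicted, n) & top_n(actual, n))


# Hypothetical true ratings for one user (made up for illustration).
actual = {"A": 5.0, "B": 4.5, "C": 3.0, "D": 2.0, "E": 1.0, "F": 1.5}

# model_x is exact on the weak movies but blurs the order at the top.
model_x = {"A": 4.0, "B": 3.8, "C": 4.2, "D": 2.0, "E": 1.0, "F": 1.5}
# model_y overrates the weak movies but is exact on the best ones.
model_y = {"A": 5.0, "B": 4.5, "C": 3.0, "D": 3.5, "E": 2.5, "F": 3.0}

for name, model in [("model_x", model_x), ("model_y", model_y)]:
    print(name, round(rmse(model, actual), 3), top_n_hits(model, actual, 2))
```

Here `model_x` has the lower RMSE, yet only one of its top two picks is among the user's true top two, while `model_y` has the higher RMSE and gets both right.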

There are parallels here with web search. Web search engines primarily care about precision (relevant results in the top 10 or top 3). They only care about recall when someone would notice something they need missing from the results they are likely to see. Search engines do not care about errors scoring arbitrary documents, just their ability to find the top N documents.
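
For readers less familiar with the search metrics mentioned here, the following is a minimal sketch of precision and recall measured at a cutoff k. The function names, document ids, and relevance judgments are illustrative assumptions, not taken from any particular search engine.

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top k results that are relevant.

    Divides by k even if fewer than k results were returned,
    which is one common convention.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked_results[:k] if doc in relevant) / k


def recall_at_k(ranked_results, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        raise ValueError("need at least one relevant document")
    return sum(1 for doc in ranked_results[:k] if doc in relevant) / len(relevant)


# Hypothetical ranked results and relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}

print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
```

Neither function looks at the scores of documents below the cutoff, which mirrors the point that only the top of the ranking matters to the user.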

Aggravating matters further, in both recommender systems and web search, people's perception of quality is easily influenced by factors other than the items shown. People hate slow websites and perceive slowly appearing results to be worse than fast appearing results. Differences in the information provided about each item (especially missing data or misspellings) can influence perceived quality. Presentation issues, even color of the links, can change how people focus their attention and which recommendations they see. People trust recommendations more when the engine can explain why it made them. People like recommendations that update immediately when new information is available. Diversity is valued; near duplicates disliked. New items attract attention, but people tend to judge unfamiliar or unrecognized recommendations harshly.

In the end, what we want is happy, satisfied users. Will a recommendation engine that minimizes RMSE make people happy?

>> bornhe's blog
