recommendation engine - Performance Issue with Item-Based Recommender in Mahout -

- February 15, 2010

I am trying to use the item-based recommender in Mahout. The data contains 2.5M user-item interactions, without preference values. There are around 100 items and 100K users. It takes around 10 s to produce a recommendation, whereas the same data takes less than a second when I use the user-based recommender.

    ItemSimilarity sim = new TanimotoCoefficientSimilarity(dm);
    CandidateItemsStrategy cis = new SamplingCandidateItemsStrategy(10, 10, 10, dm.getNumUsers(), dm.getNumItems());
    MostSimilarItemsCandidateItemsStrategy mis = new SamplingCandidateItemsStrategy(10, 10, 10, dm.getNumUsers(), dm.getNumItems());
    Recommender ur = new GenericBooleanPrefItemBasedRecommender(dm, sim, cis, mis);

I read one of @Sean's answers, which suggests using the above parameters for SamplingCandidateItemsStrategy. I am not sure what it does.

Edit: 2.5M is the total number of user-item associations. There are 100K users, and the total number of items is 100.

Among many reasons, the main reason for choosing an item-based recommender is this: if the number of items is relatively low compared to the number of users, the performance advantage is significant. It goes the other way around too. If the number of users is relatively low compared to the number of items, choosing user-based recommendation will result in a performance advantage.

From your question I could not tell the number of items in your dataset, or the number of users. At one point you mention 2.5M, and then 100K? In any case, if user-based recommendation is faster for you, you should choose that approach.

The exception is if the item-item similarities are more fixed (not expected to change radically or frequently), because then they are better candidates for precomputation. You can do the precomputation once and then use the precomputed similarities between items.
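To make the precomputation idea concrete, here is a minimal sketch in plain Java. It does not use Mahout at all: the class `PrecomputedItemSimilarity` and its methods are hypothetical names invented for illustration, and each item is assumed to be represented as a `BitSet` of the user indexes that interacted with it. With about 100 items there are only 100 * 99 / 2 = 4950 item pairs, so the full matrix is small and can be rebuilt whenever the data changes enough to matter.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper, not part of Mahout: precomputes item-item Tanimoto similarities. */
final class PrecomputedItemSimilarity {

    /** Tanimoto coefficient |A and B| / |A or B| over the sets of users seen with each item. */
    static double tanimoto(BitSet usersOfA, BitSet usersOfB) {
        BitSet both = (BitSet) usersOfA.clone();
        both.and(usersOfB);
        int intersection = both.cardinality();
        int union = usersOfA.cardinality() + usersOfB.cardinality() - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    /** Computes the symmetric similarity matrix once; later lookups are plain array reads. */
    static double[][] precompute(List<BitSet> usersPerItem) {
        int n = usersPerItem.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            sim[i][i] = 1.0; // an item is treated as fully similar to itself
            for (int j = i + 1; j < n; j++) {
                double s = tanimoto(usersPerItem.get(i), usersPerItem.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }

    /** Convenience for building toy data: a BitSet with the given user indexes set. */
    static BitSet bits(int... userIndexes) {
        BitSet b = new BitSet();
        for (int u : userIndexes) {
            b.set(u);
        }
        return b;
    }

    public static void main(String[] args) {
        // Toy data: 3 items, users identified by small integer indexes.
        List<BitSet> usersPerItem = new ArrayList<>();
        usersPerItem.add(bits(0, 1, 2)); // item 0
        usersPerItem.add(bits(1, 2, 3)); // item 1
        usersPerItem.add(bits(4));       // item 2
        double[][] sim = precompute(usersPerItem);
        System.out.println(sim[0][1]); // 2 shared users out of 4 distinct users
        System.out.println(sim[0][2]); // no shared users
    }
}
```

Inside Mahout itself, the precomputed values would have to be handed to the recommender through an ItemSimilarity implementation; the GenericItemSimilarity class appears intended for that, but check the API of your Mahout version before relying on it.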

Also, since you don't have preference values, and if you still want to use item-based similarity, you can think of enriching the similarity function with a pure item-item similarity based on the characteristics of the items. (This is just an idea.)
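One possible reading of that idea, sketched in plain Java without any Mahout classes: compute a co-occurrence similarity from the user sets, compute a second similarity from item characteristics, and take a weighted average. Everything here is an assumption made for illustration, including the class name `BlendedItemSimilarity`, the use of tag sets as the "characteristics", and the linear blend with a `contentWeight` parameter; the answer above does not prescribe any particular formula.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch, not part of Mahout: blends co-occurrence and content similarity. */
final class BlendedItemSimilarity {

    /** Set overlap |A and B| / |A or B|; returns 0.0 when both sets are empty. */
    static <T> double overlap(Set<T> a, Set<T> b) {
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    /** Weighted average of the two similarities; contentWeight must lie in [0, 1]. */
    static double blend(double coOccurrenceSim, double contentSim, double contentWeight) {
        if (contentWeight < 0.0 || contentWeight > 1.0) {
            throw new IllegalArgumentException("contentWeight must be between 0 and 1");
        }
        return (1.0 - contentWeight) * coOccurrenceSim + contentWeight * contentSim;
    }

    public static void main(String[] args) {
        // Users who interacted with each item (boolean data, no preference values).
        Set<Long> usersOfA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> usersOfB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        // Item characteristics, here assumed to be simple tag sets.
        Set<String> tagsOfA = new HashSet<>(Arrays.asList("camera", "digital"));
        Set<String> tagsOfB = new HashSet<>(Arrays.asList("camera", "digital"));

        double coOccurrence = overlap(usersOfA, usersOfB); // 2 shared of 4 distinct users
        double content = overlap(tagsOfA, tagsOfB);        // identical tag sets
        System.out.println(blend(coOccurrence, content, 0.25));
    }
}
```

A linear blend is only the simplest choice. The weight would need tuning against held-out data, and the content part could be any similarity that suits the item attributes you actually have.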
