Wikipedia

Discounted cumulative gain

Discounted cumulative gain (DCG) is a measure of ranking quality for a given query; Normalized DCG (nDCG or NDCG) is a measure of ranking quality independent of the particular query. In information retrieval, they are often used to measure effectiveness of search engine algorithms and related applications. Using a graded relevance scale of documents in a search-engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks.[1] The normalized version corrects for the fact that different queries may have different numbers of relevant results.

Overview

Two assumptions are made in using DCG and its related measures.

  1. Highly relevant documents are more useful when appearing earlier in a search engine result list (i.e., when they have higher ranks).
  2. Highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than non-relevant documents.

Cumulative Gain

DCG is a refinement of a simpler measure, Cumulative Gain (CG).[2] Cumulative Gain is the sum of the graded relevance values of all results in a search result list. CG does not take into account the rank (position) of a result in the result list. The CG at a particular rank position p is defined as:

\mathrm{CG}_p = \sum_{i=1}^{p} rel_i

where rel_i is the graded relevance of the result at position i.

The value computed with the CG function is unaffected by changes in the ordering of search results. That is, moving a highly relevant document d_i above a higher-ranked, less relevant document d_j does not change the computed value for CG (assuming i, j ≤ p). Based on the two assumptions made above about the usefulness of search results, (N)DCG is usually preferred over CG. Cumulative Gain is sometimes called Graded Precision.
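The definition above can be sketched in a few lines of Python (the `cumulative_gain` helper name is illustrative, not from the cited sources):

```python
def cumulative_gain(relevances, p):
    """Cumulative Gain at rank p: the sum of graded relevance of the top-p results.

    Note that CG ignores ordering entirely -- any permutation of the
    same relevance values yields the same CG.
    """
    return sum(relevances[:p])


# Relevance grades for a six-result list (the example used later in the article).
scores = [3, 2, 3, 0, 1, 2]
print(cumulative_gain(scores, 6))  # 11
```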

Discounted Cumulative Gain

The premise of DCG is that highly relevant documents appearing lower in a search result list should be penalized: the graded relevance value is reduced logarithmically in proportion to the position of the result.

The usual formula of DCG accumulated at a particular rank position p is defined as:[1]

\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)} = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2(i+1)}

Previously there was no theoretically sound justification for using a logarithmic reduction factor[3] other than the fact that it produces a smooth reduction. But Wang et al. (2013)[2] gave a theoretical guarantee for using the logarithmic reduction factor in Normalized DCG (NDCG). The authors show that for every pair of substantially different ranking functions, NDCG can decide which one is better in a consistent manner.

An alternative formulation of DCG[4] places stronger emphasis on retrieving relevant documents:

\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}

The latter formula is commonly used in industry including major web search companies[5] and data science competition platforms such as Kaggle.[6]

These two formulations of DCG are the same when the relevance values of documents are binary,[3]: 320  that is, when rel_i ∈ {0, 1}.

Note that Croft et al. (2010) and Burges et al. (2005) present the second formulation of DCG with a log of base e, while both versions of DCG above use a log of base 2. When computing NDCG with the first formulation of DCG, the base of the log does not matter, but the base of the log does affect the value of NDCG for the second formulation. In either case, the base of the log affects the value of DCG itself.
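Both formulations can be sketched as follows (the `dcg` and `dcg_exp` helper names are illustrative; base-2 logs are used, matching the formulas above):

```python
import math

def dcg(rels, p):
    """Standard DCG: rel_i / log2(i + 1), summed over ranks i = 1..p."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))

def dcg_exp(rels, p):
    """Alternative DCG: (2^rel_i - 1) / log2(i + 1), emphasizing high grades."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:p], start=1))

# With binary relevance the two formulations coincide,
# since 2^1 - 1 = 1 and 2^0 - 1 = 0.
print(dcg([1, 0, 1], 3), dcg_exp([1, 0, 1], 3))
```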

Normalized DCG

Search result lists vary in length depending on the query. Comparing a search engine's performance from one query to the next cannot be consistently achieved using DCG alone, so the cumulative gain at each position for a chosen value of p should be normalized across queries. This is done by sorting all relevant documents in the corpus by their relative relevance, producing the maximum possible DCG through position p, also called Ideal DCG (IDCG) through that position. For a query, the normalized discounted cumulative gain, or nDCG, is computed as:

\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p},

where IDCG is the ideal discounted cumulative gain,

\mathrm{IDCG}_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i+1)}

and REL_p represents the list of relevant documents (ordered by their relevance) in the corpus up to position p.

The nDCG values for all queries can be averaged to obtain a measure of the average performance of a search engine's ranking algorithm. Note that for a perfect ranking algorithm, DCG_p will be the same as IDCG_p, producing an nDCG of 1.0. All nDCG calculations are then relative values on the interval 0.0 to 1.0 and so are cross-query comparable.

The main difficulty encountered in using nDCG is the unavailability of an ideal ordering of results when only partial relevance feedback is available.
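A minimal sketch of nDCG, assuming the full set of known relevance judgments is available to build the ideal ordering (helper names are illustrative):

```python
import math

def dcg(rels, p):
    # Standard DCG formulation with base-2 logs.
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:p], start=1))

def ndcg(ranked_rels, all_known_rels, p):
    """nDCG_p = DCG_p / IDCG_p, where IDCG_p is the DCG of the ideally
    (descending) sorted relevance judgments, cut at position p."""
    idcg = dcg(sorted(all_known_rels, reverse=True), p)
    return dcg(ranked_rels, p) / idcg if idcg > 0 else 0.0
```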

Example

Presented with a list of documents in response to a search query, an experiment participant is asked to judge the relevance of each document to the query. Each document is judged on a scale of 0–3, with 0 meaning not relevant, 3 meaning highly relevant, and 1 and 2 meaning "somewhere in between". For the documents ordered by the ranking algorithm as

D_1, D_2, D_3, D_4, D_5, D_6

the user provides the following relevance scores:

3, 2, 3, 0, 1, 2

That is: document 1 has a relevance of 3, document 2 has a relevance of 2, etc. The Cumulative Gain of this search result listing is:

\mathrm{CG}_6 = \sum_{i=1}^{6} rel_i = 3 + 2 + 3 + 0 + 1 + 2 = 11

Changing the order of any two documents does not affect the CG measure. If D_3 and D_4 are switched, the CG remains the same, 11. DCG is used to emphasize highly relevant documents appearing early in the result list. Using the logarithmic scale for reduction, the DCG for each result in order is:


       
i   rel_i   log_2(i+1)   rel_i / log_2(i+1)
1     3       1            3
2     2       1.585        1.262
3     3       2            1.5
4     0       2.322        0
5     1       2.585        0.387
6     2       2.807        0.712

So the DCG_6 of this ranking is:

\mathrm{DCG}_6 = \sum_{i=1}^{6} \frac{rel_i}{\log_2(i+1)} = 3 + 1.262 + 1.5 + 0 + 0.387 + 0.712 = 6.861

Now a switch of D_3 and D_4 results in a reduced DCG because a less relevant document is placed higher in the ranking; that is, a more relevant document is discounted more by being placed in a lower rank.

The performance of this query cannot be compared to that of another query in this form, since the other query may have more results, resulting in a larger overall DCG that is not necessarily better. In order to compare, the DCG values must be normalized.

To normalize DCG values, an ideal ordering for the given query is needed. For this example, that ordering would be the monotonically decreasing sort of all known relevance judgments. In addition to the six from this experiment, suppose we also know there is a document D_7 with relevance grade 3 to the same query and a document D_8 with relevance grade 2 to that query. Then the ideal ordering is:

3, 3, 3, 2, 2, 2, 1, 0

The ideal ranking is cut again to length 6 to match the depth of analysis of the ranking:

3, 3, 3, 2, 2, 2

The DCG of this ideal ordering, or IDCG (Ideal DCG), is computed to rank 6:

\mathrm{IDCG}_6 = 3 + 1.893 + 1.5 + 0.861 + 0.774 + 0.712 = 8.740

And so the nDCG for this query is given as:

\mathrm{nDCG}_6 = \frac{\mathrm{DCG}_6}{\mathrm{IDCG}_6} = \frac{6.861}{8.740} = 0.785
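The numbers in this example can be checked with a short script (illustrative code, not from the cited sources):

```python
import math

def dcg(rels, p):
    # Standard DCG formulation with base-2 logs.
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:p], start=1))

ranked = [3, 2, 3, 0, 1, 2]                        # judged relevance in ranked order
ideal = sorted(ranked + [3, 2], reverse=True)[:6]  # add D_7 (grade 3) and D_8 (grade 2), cut to rank 6

dcg6 = dcg(ranked, 6)
idcg6 = dcg(ideal, 6)
print(f"DCG_6  = {dcg6:.3f}")          # DCG_6  = 6.861
print(f"IDCG_6 = {idcg6:.3f}")         # IDCG_6 = 8.740
print(f"nDCG_6 = {dcg6 / idcg6:.3f}")  # nDCG_6 = 0.785
```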

Limitations

  1. The normalized DCG metric does not penalize bad documents in the result. For example, if a query returns two result lists with scores 1,1,1 and 1,1,1,0 respectively, both would be considered equally good even though the latter contains a bad document. For the ranking judgments Excellent, Fair, Bad one might use the numerical scores 1, 0, -1 instead of 2, 1, 0. This would lower the score if bad results are returned, prioritizing the precision of the results over recall. Note that this approach can result in an overall negative score, which would shift the lower bound of the score from 0 to a negative value.
  2. Normalized DCG does not penalize missing documents in the result. For example, if a query returns two result lists with scores 1,1,1 and 1,1,1,1,1 respectively, both would be considered equally good, assuming ideal DCG is computed to rank 3 for the former and rank 5 for the latter. One way to address this limitation is to enforce a fixed size for the result set and use minimum scores for the missing documents. In the previous example, we would use the scores 1,1,1,0,0 and 1,1,1,1,1 and quote nDCG as nDCG@5.
  3. Normalized DCG may not be suitable for measuring the performance of queries that often have several equally good results. This is especially true when the metric is limited to only the first few results, as is done in practice. For example, for queries such as "restaurants", nDCG@1 would account for only the first result, so if one result set contains only 1 restaurant from the nearby area while the other contains 5, both would end up with the same score even though the latter is more comprehensive.
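The fixed-size workaround described in limitation 2 can be sketched as follows (the `ndcg_at_k` helper and its `pad` parameter are assumptions for illustration):

```python
import math

def dcg(rels, p):
    # Standard DCG formulation with base-2 logs.
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:p], start=1))

def ndcg_at_k(rels, ideal_rels, k, pad=0):
    """nDCG@k with short result lists padded to length k using a minimum
    score, so missing documents count against the ranking."""
    padded = list(rels) + [pad] * max(0, k - len(rels))
    idcg = dcg(sorted(ideal_rels, reverse=True), k)
    return dcg(padded, k) / idcg if idcg > 0 else 0.0
```

With padding, the three-result list from the example above scores strictly lower at k = 5 than the complete five-result list, instead of tying with it.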

See also

  Evaluation measures (information retrieval)
  Learning to rank

References

  1. ^ a b Kalervo Järvelin, Jaana Kekäläinen: Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4), 422–446 (2002)
  2. ^ a b Yining Wang, Liwei Wang, Yuanzhi Li, Di He, Wei Chen, Tie-Yan Liu. 2013. A Theoretical Analysis of Normalized Discounted Cumulative Gain (NDCG) Ranking Measures. In Proceedings of the 26th Annual Conference on Learning Theory (COLT 2013).
  3. ^ a b B. Croft; D. Metzler; T. Strohman (2010). Search Engines: Information Retrieval in Practice. Addison Wesley.
  4. ^ Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (ICML '05). ACM, New York, NY, USA, 89–96. doi:10.1145/1102351.1102363
  5. ^ "Introduction to Information Retrieval - Evaluation" (PDF). Stanford University. 21 April 2013. Retrieved 23 March 2014.
  6. ^ "Normalized Discounted Cumulative Gain". Archived from the original on 23 March 2014. Retrieved 23 March 2014.
