fbpx
Wikipedia

Lossy Count Algorithm

The lossy count algorithm is an algorithm to identify elements in a data stream whose frequency exceeds a user-given threshold. The algorithm works by dividing the data stream into buckets for frequent items, but fill as many buckets as possible in main memory one time. The frequency computed by this algorithm is not always accurate, but has an error threshold that can be specified by the user. The run time and space required by the algorithm is inversely proportional to the specified error threshold; hence the larger the error, the smaller the footprint.

The algorithm was created by computer scientists Rajeev Motwani and Gurmeet Singh Manku. It finds applications in computations where data takes the form of a continuous data stream instead of a finite data set, such as network traffic measurements, web server logs, and clickstreams.

Algorithm Edit

The general algorithm is as follows[1]

  • Step 1: Divide the incoming data stream into buckets of width  , where   is mentioned by user as the error bound (along with minimum support threshold =  ).
  • Step 2: Increment the frequency count of each item according to the new bucket values. After each bucket, decrement all counters by 1.
  • Step 3: Repeat – Update counters and after each bucket, decrement all counters by 1.

References Edit

  1. ^ Han, Jiawei. (2006). Data mining : concepts and techniques. Kamber, Micheline. (2nd ed.). Amsterdam: Elsevier. ISBN 978-0-08-047558-5. OCLC 143252170.
  • Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams". VLDB '02 Proceedings of the 28th International Conference on Very Large Data Bases: 346–357.

lossy, count, algorithm, lossy, count, algorithm, algorithm, identify, elements, data, stream, whose, frequency, exceeds, user, given, threshold, algorithm, works, dividing, data, stream, into, buckets, frequent, items, fill, many, buckets, possible, main, mem. The lossy count algorithm is an algorithm to identify elements in a data stream whose frequency exceeds a user given threshold The algorithm works by dividing the data stream into buckets for frequent items but fill as many buckets as possible in main memory one time The frequency computed by this algorithm is not always accurate but has an error threshold that can be specified by the user The run time and space required by the algorithm is inversely proportional to the specified error threshold hence the larger the error the smaller the footprint The algorithm was created by computer scientists Rajeev Motwani and Gurmeet Singh Manku It finds applications in computations where data takes the form of a continuous data stream instead of a finite data set such as network traffic measurements web server logs and clickstreams Algorithm EditThe general algorithm is as follows 1 Step 1 Divide the incoming data stream into buckets of width w 1 ϵ displaystyle w 1 epsilon where ϵ displaystyle epsilon is mentioned by user as the error bound along with minimum support threshold s displaystyle sigma Step 2 Increment the frequency count of each item according to the new bucket values After each bucket decrement all counters by 1 Step 3 Repeat Update counters and after each bucket decrement all counters by 1 References Edit Han Jiawei 2006 Data mining concepts and techniques Kamber Micheline 2nd ed Amsterdam Elsevier ISBN 978 0 08 047558 5 OCLC 143252170 Motwani R Manku G S 2002 Approximate frequency counts over data streams VLDB 02 Proceedings of the 28th International Conference on Very Large Data Bases 346 357 This computer science article is a stub You can help Wikipedia by expanding it vte Retrieved from https en wikipedia org w index php title Lossy Count Algorithm amp oldid 1142564885, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.