fbpx
Wikipedia

Delta encoding

Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required (e.g., in revision control software).

The differences are recorded in discrete files called "deltas" or "diffs". In situations where differences are small – for example, the change of a few words in a large document or the change of a few records in a large table – delta encoding greatly reduces data redundancy. Collections of unique deltas are substantially more space-efficient than their non-encoded equivalents.

From a logical point of view the difference between two data values is the information required to obtain one value from the other – see relative entropy. The difference between identical values (under some equivalence) is often called 0 or the neutral element.

Simple example Edit

Perhaps the simplest example is storing values of bytes as differences (deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, we would store 2, 2, 2, 3, −2. This reduces the variance (range) of the values when neighbor samples are correlated, enabling a lower bit usage for the same data. IFF 8SVX sound format applies this encoding to raw sound data before applying compression to it. Not even all 8-bit sound samples compress better when delta encoded, and the usability of delta encoding is even smaller for 16-bit and better samples. Therefore, compression algorithms often choose to delta encode only when the compression is better than without. However, in video compression, delta frames can considerably reduce frame size and are used in virtually every video compression codec.

Definition Edit

A delta can be defined in 2 ways, symmetric delta and directed delta. A symmetric delta can be expressed as

 

where   and   represent two versions.

A directed delta, also called a change, is a sequence of (elementary) change operations which, when applied to one version  , yields another version   (note the correspondence to transaction logs in databases). In computer implementations, they typically take the form of a language with two commands: copy data from v1 and write literal data.

Variants Edit

A variation of delta encoding which encodes differences between the prefixes or suffixes of strings is called incremental encoding. It is particularly effective for sorted lists with small differences between strings, such as a list of words from a dictionary.

Implementation issues Edit

The nature of the data to be encoded influences the effectiveness of a particular compression algorithm.

Delta encoding performs best when data has small or constant variation; for an unsorted data set, there may be little to no compression possible with this method.

In delta encoded transmission over a network where only a single copy of the file is available at each end of the communication channel, special error control codes are used to detect which parts of the file have changed since its previous version. For example, rsync uses a rolling checksum algorithm based on Mark Adler's adler-32 checksum.

Sample C code Edit

The following C code performs a simple form of delta encoding and decoding on a sequence of characters:

void delta_encode(unsigned char *buffer, int length) {  unsigned char last = 0;  for (int i = 0; i < length; i++)  {  unsigned char current = buffer[i];  buffer[i] = current - last;  last = current;  } } void delta_decode(unsigned char *buffer, int length) {  unsigned char last = 0;  for (int i = 0; i < length; i++)  {  unsigned char delta = buffer[i];  buffer[i] = delta + last;  last = buffer[i];  } } 

Examples Edit

Delta encoding in HTTP Edit

Another instance of use of delta encoding is RFC 3229, "Delta encoding in HTTP", which proposes that HTTP servers should be able to send updated Web pages in the form of differences between versions (deltas), which should decrease Internet traffic, as most pages change slowly over time, rather than being completely rewritten repeatedly:

This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1.

Many HTTP (Hypertext Transport Protocol) requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry. Research has shown that such modifying updates are frequent, and that the modifications are typically much smaller than the actual entity. In such cases, HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes, rather than the entire new instance of the resource.

[...] We believe that it might be possible to support rsync using the "instance manipulation" framework described later in this document, but this has not been worked out in any detail.

The suggested rsync-based framework was implemented in the rproxy system as a pair of HTTP proxies.[1] Like the basic vcdiff-based implementation, both systems are rarely used.

Delta copying Edit

Delta copying is a fast way of copying a file that is partially changed, when a previous version is present on the destination location. With delta copying, only the changed part of a file is copied. It is usually used in backup or file copying software, often to save bandwidth when copying between computers over a private network or the internet. One notable open-source example is rsync.[2][3][4]

Online backup Edit

Many of the online backup services adopt this methodology, often known simply as deltas, in order to give their users previous versions of the same file from previous backups. This reduces associated costs, not only in the amount of data that has to be stored as differing versions (as the whole of each changed version of a file has to be offered for users to access), but also those costs in the uploading (and sometimes the downloading) of each file that has been updated (by just the smaller delta having to be used, rather than the whole file).

Delta updates Edit

For large software packages, there is usually little data changed between versions. Many vendors choose to use delta transfers to save time and bandwidth.

Diff Edit

Diff is a file comparison program, which is mainly used for text files. By default, it generates symmetric deltas that are reversible. Two formats used for software patches, context and unified, provides additional context lines that allow for tolerating shifts in line number.

Git Edit

The Git source code control system employs delta compression in an auxiliary "git repack" operation. Objects in the repository that have not yet been delta-compressed ("loose objects") are compared against a heuristically chosen subset of all other objects, and the common data and differences are concatenated into a "pack file" which is then compressed using conventional methods. In common use cases, where source or data files are changed incrementally between commits, this can result in significant space savings. The repack operation is typically performed as part of the "git gc" process, which is triggered automatically when the numbers of loose objects or pack files exceed configured thresholds.

The format is documented in the pack-format page of the Git documentation. It implements a directed delta.[5]

VCDIFF Edit

One general format for directed delta encoding is VCDIFF, described in RFC 3284. Free software implementations include Xdelta and open-vcdiff.

GDIFF Edit

Generic Diff Format (GDIFF) is another directed delta encoding format. It was submitted to W3C in 1997.[6] In many cases, VCDIFF has better compression rate than GDIFF.

bsdiff Edit

Bsdiff is a binary diff program using suffix sorting. For executables that contain many changes in pointer addresses, it performs better than VCDIFF-type "copy and literal" encodings. The intent is to find a way to generate a small diff without needing to parse assembly code (as in Google's Courgette). Bsdiff achieves this by allowing "copy" matches with errors, which are then corrected using an extra "add" array of bytewise differences. Since this array is mostly either zero or repeated values for offset changes, it takes up little space after compression.[7]

Bsdiff is useful for delta updates. Google uses bsdiff in Chromium and Android. The deltarpm feature of the RPM Package Manager is based on a heavily modified bsdiff that can use a hash table for matching.[8] FreeBSD also uses bsdiff for updates.[9]

Since the 4.3 release of bsdiff in 2005, various improvements or fixes have been produced for it. Google maintains multiple versions of the code for each of its products.[10] FreeBSD takes many of Google's compatible changes, mainly a vulnerability fix and a switch to the faster divsufsort suffix-sorting routine.[11] Debian has a series of performance tweaks to the program.[12]

ddelta is a rewrite of bsdiff proposed for use in Debian's delta updates. Among other efficiency improvements, it uses a sliding window to reduce memory and CPU cost.[13]

See also Edit

References Edit

  1. ^ "rproxy: introduction". rproxy.samba.org.
  2. ^ . Archived from the original on 2016-03-13. Retrieved 2016-04-29.
  3. ^ "Bvckup 2 | Forum | How delta copying works".
  4. ^ http://www.eggheadcafe.com/software/aspnet/33678264/delta-copying.aspx[permanent dead link]
  5. ^ "Git - pack-format Documentation". Git documentation. Retrieved 13 January 2020.
  6. ^ "Generic Diff Format Specification". www.w3.org.
  7. ^ Colin Percival, Naive differences of executable code, http://www.daemonology.net/bsdiff/, 2003.
  8. ^ "rpmdelta/delta.c". rpm-software-management. 3 July 2019. Retrieved 13 January 2020.
  9. ^ Anonymous (May 2016). "NON-CRYPTANALYTIC ATTACKS AGAINST FREEBSD UPDATE COMPONENTS". GitHub Gist.
  10. ^ "xtraeme/bsdiff-chromium: README.chromium". GitHub. 2012.; "courgette/third_party/bsdiff/README.chromium - chromium/src". Git at Google.; "android/platform/external/bsdiff/". Git at Google.
  11. ^ "History for freebsd/usr.bin/bsdiff". GitHub.
  12. ^ "Package: bsdiff". Debian Patch Tracker.
  13. ^ Klode, Julian. "julian-klode/ddelta". GitHub. Retrieved 13 January 2020.

External links Edit

  • RFC 3229 – Delta Encoding in HTTP

delta, encoding, confused, with, elias, delta, coding, delta, modulation, storing, transmitting, data, form, differences, deltas, between, sequential, data, rather, than, complete, files, more, generally, this, known, data, differencing, sometimes, called, del. Not to be confused with Elias delta coding or delta modulation Delta encoding is a way of storing or transmitting data in the form of differences deltas between sequential data rather than complete files more generally this is known as data differencing Delta encoding is sometimes called delta compression particularly where archival histories of changes are required e g in revision control software The differences are recorded in discrete files called deltas or diffs In situations where differences are small for example the change of a few words in a large document or the change of a few records in a large table delta encoding greatly reduces data redundancy Collections of unique deltas are substantially more space efficient than their non encoded equivalents From a logical point of view the difference between two data values is the information required to obtain one value from the other see relative entropy The difference between identical values under some equivalence is often called 0 or the neutral element Contents 1 Simple example 2 Definition 2 1 Variants 3 Implementation issues 4 Sample C code 5 Examples 5 1 Delta encoding in HTTP 5 2 Delta copying 5 2 1 Online backup 5 2 2 Delta updates 5 3 Diff 5 4 Git 5 5 VCDIFF 5 6 GDIFF 5 7 bsdiff 6 See also 7 References 8 External linksSimple example EditPerhaps the simplest example is storing values of bytes as differences deltas between sequential values rather than the values themselves So instead of 2 4 6 9 7 we would store 2 2 2 3 2 This reduces the variance range of the values when neighbor samples are correlated enabling a lower bit usage for the same data IFF 8SVX sound format applies this encoding to raw sound data before applying compression to it Not even all 8 bit sound samples compress better when delta encoded and the usability of delta encoding is even smaller for 16 bit and better samples Therefore compression algorithms often choose to delta encode only when the compression is better than without However in video compression delta frames can considerably reduce frame size and are used in virtually every video compression codec Definition EditA delta can be defined in 2 ways symmetric delta and directed delta A symmetric delta can be expressed as D v 1 v 2 v 1 v 2 v 2 v 1 displaystyle Delta v 1 v 2 v 1 setminus v 2 cup v 2 setminus v 1 nbsp where v 1 displaystyle v 1 nbsp and v 2 displaystyle v 2 nbsp represent two versions A directed delta also called a change is a sequence of elementary change operations which when applied to one version v 1 displaystyle v 1 nbsp yields another version v 2 displaystyle v 2 nbsp note the correspondence to transaction logs in databases In computer implementations they typically take the form of a language with two commands copy data from v1 and write literal data Variants Edit A variation of delta encoding which encodes differences between the prefixes or suffixes of strings is called incremental encoding It is particularly effective for sorted lists with small differences between strings such as a list of words from a dictionary Implementation issues EditThe nature of the data to be encoded influences the effectiveness of a particular compression algorithm Delta encoding performs best when data has small or constant variation for an unsorted data set there may be little to no compression possible with this method In delta encoded transmission over a network where only a single copy of the file is available at each end of the communication channel special error control codes are used to detect which parts of the file have changed since its previous version For example rsync uses a rolling checksum algorithm based on Mark Adler s adler 32 checksum Sample C code EditThe following C code performs a simple form of delta encoding and decoding on a sequence of characters void delta encode unsigned char buffer int length unsigned char last 0 for int i 0 i lt length i unsigned char current buffer i buffer i current last last current void delta decode unsigned char buffer int length unsigned char last 0 for int i 0 i lt length i unsigned char delta buffer i buffer i delta last last buffer i Examples EditDelta encoding in HTTP Edit Another instance of use of delta encoding is RFC 3229 Delta encoding in HTTP which proposes that HTTP servers should be able to send updated Web pages in the form of differences between versions deltas which should decrease Internet traffic as most pages change slowly over time rather than being completely rewritten repeatedly This document describes how delta encoding can be supported as a compatible extension to HTTP 1 1 Many HTTP Hypertext Transport Protocol requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry Research has shown that such modifying updates are frequent and that the modifications are typically much smaller than the actual entity In such cases HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes rather than the entire new instance of the resource We believe that it might be possible to support rsync using the instance manipulation framework described later in this document but this has not been worked out in any detail The suggested rsync based framework was implemented in the rproxy system as a pair of HTTP proxies 1 Like the basic vcdiff based implementation both systems are rarely used Delta copying Edit Delta copying is a fast way of copying a file that is partially changed when a previous version is present on the destination location With delta copying only the changed part of a file is copied It is usually used in backup or file copying software often to save bandwidth when copying between computers over a private network or the internet One notable open source example is rsync 2 3 4 Online backup Edit Main article Online backup services Many of the online backup services adopt this methodology often known simply as deltas in order to give their users previous versions of the same file from previous backups This reduces associated costs not only in the amount of data that has to be stored as differing versions as the whole of each changed version of a file has to be offered for users to access but also those costs in the uploading and sometimes the downloading of each file that has been updated by just the smaller delta having to be used rather than the whole file Delta updates Edit Main article delta update For large software packages there is usually little data changed between versions Many vendors choose to use delta transfers to save time and bandwidth Diff Edit Main article Diff Diff is a file comparison program which is mainly used for text files By default it generates symmetric deltas that are reversible Two formats used for software patches context and unified provides additional context lines that allow for tolerating shifts in line number Git Edit Main article Git software The Git source code control system employs delta compression in an auxiliary git repack operation Objects in the repository that have not yet been delta compressed loose objects are compared against a heuristically chosen subset of all other objects and the common data and differences are concatenated into a pack file which is then compressed using conventional methods In common use cases where source or data files are changed incrementally between commits this can result in significant space savings The repack operation is typically performed as part of the git gc process which is triggered automatically when the numbers of loose objects or pack files exceed configured thresholds The format is documented in the pack format page of the Git documentation It implements a directed delta 5 VCDIFF Edit Main article VCDIFF One general format for directed delta encoding is VCDIFF described in RFC 3284 Free software implementations include Xdelta and open vcdiff GDIFF Edit Generic Diff Format GDIFF is another directed delta encoding format It was submitted to W3C in 1997 6 In many cases VCDIFF has better compression rate than GDIFF bsdiff Edit Bsdiff is a binary diff program using suffix sorting For executables that contain many changes in pointer addresses it performs better than VCDIFF type copy and literal encodings The intent is to find a way to generate a small diff without needing to parse assembly code as in Google s Courgette Bsdiff achieves this by allowing copy matches with errors which are then corrected using an extra add array of bytewise differences Since this array is mostly either zero or repeated values for offset changes it takes up little space after compression 7 Bsdiff is useful for delta updates Google uses bsdiff in Chromium and Android The deltarpm feature of the RPM Package Manager is based on a heavily modified bsdiff that can use a hash table for matching 8 FreeBSD also uses bsdiff for updates 9 Since the 4 3 release of bsdiff in 2005 various improvements or fixes have been produced for it Google maintains multiple versions of the code for each of its products 10 FreeBSD takes many of Google s compatible changes mainly a vulnerability fix and a switch to the faster divsufsort suffix sorting routine 11 Debian has a series of performance tweaks to the program 12 ddelta is a rewrite of bsdiff proposed for use in Debian s delta updates Among other efficiency improvements it uses a sliding window to reduce memory and CPU cost 13 See also EditData differencing Method for compressing changes over time Interleaved deltas method used by SCCS to store all revisions of a filePages displaying wikidata descriptions as a fallback Source Code Control System version control system designed to track changes in the source codePages displaying wikidata descriptions as a fallback String to string correction problem Xdelta open source delta encoderReferences Edit rproxy introduction rproxy samba org Feature request Delta copying 2BrightSparks Archived from the original on 2016 03 13 Retrieved 2016 04 29 Bvckup 2 Forum How delta copying works http www eggheadcafe com software aspnet 33678264 delta copying aspx permanent dead link Git pack format Documentation Git documentation Retrieved 13 January 2020 Generic Diff Format Specification www w3 org Colin Percival Naive differences of executable code http www daemonology net bsdiff 2003 rpmdelta delta c rpm software management 3 July 2019 Retrieved 13 January 2020 Anonymous May 2016 NON CRYPTANALYTIC ATTACKS AGAINST FREEBSD UPDATE COMPONENTS GitHub Gist xtraeme bsdiff chromium README chromium GitHub 2012 courgette third party bsdiff README chromium chromium src Git at Google android platform external bsdiff Git at Google History for freebsd usr bin bsdiff GitHub Package bsdiff Debian Patch Tracker Klode Julian julian klode ddelta GitHub Retrieved 13 January 2020 External links EditRFC 3229 Delta Encoding in HTTP Retrieved from https en wikipedia org w index php title Delta encoding amp oldid 1169762644, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.