Wikipedia

File comparison

In computing, file comparison is the calculation and display of the differences and similarities between data objects, typically text files such as source code.

The KDE diff tool Kompare

The methods, implementations, and results are typically called a diff,[1] after the Unix diff utility. The output may be presented in a graphical user interface or used as part of larger tasks in networks, file systems, or revision control.

Some widely used file comparison programs are diff, cmp, FileMerge, WinMerge, Beyond Compare, and File Compare.

Many text editors and word processors perform file comparison to highlight the changes to a file or document.

Method types

Most file comparison tools find the longest common subsequence between two files. Any data not in the longest common subsequence is presented as an insertion, a deletion, or a change.
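As an illustration of the longest-common-subsequence approach, the following is a minimal sketch using the textbook dynamic-programming table. It is not the implementation used by any particular tool; the function name `lcs_diff` and the `" "`, `"-"`, `"+"` markers are choices made for this example.

```python
def lcs_diff(a, b):
    """Return a list of (marker, item) pairs describing how to turn a into b.

    marker is " " for an item in the longest common subsequence,
    "-" for an item only in a (deletion), "+" for an item only in b (insertion).
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, emitting common items and edits.
    ops = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", item) for item in a[i:])
    ops.extend(("+", item) for item in b[j:])
    return ops


old_lines = ["a", "b", "c"]
new_lines = ["a", "c", "d"]
result = lcs_diff(old_lines, new_lines)
# result == [(" ", "a"), ("-", "b"), (" ", "c"), ("+", "d")]
```

The table costs time and memory proportional to the product of the two lengths, so this sketch is only suitable for small inputs.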

In 1978, Paul Heckel published an algorithm that identifies most moved blocks of text.[2] This algorithm is used in the IBM History Flow tool.[3] Some other file comparison programs also detect block moves.
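The sketch below shows one way moved blocks can be detected. It is a simplified illustration in the spirit of Heckel's technique (lines that occur exactly once in each file act as anchors, and matches are then extended to neighbouring lines), not a faithful reproduction of the published algorithm. The function name `match_lines` is hypothetical.

```python
from collections import Counter


def match_lines(old, new):
    """Map each line of `new` to the index of its match in `old`, or None."""
    old_count, new_count = Counter(old), Counter(new)
    old_index = {line: i for i, line in enumerate(old)}
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    def link(i, j):
        old_to_new[i] = j
        new_to_old[j] = i

    # Anchor step: a line that is unique in both files must be the same line.
    for j, line in enumerate(new):
        if new_count[line] == 1 and old_count[line] == 1:
            link(old_index[line], j)

    # Extend matches forward to identical, still unmatched neighbours.
    for j in range(len(new) - 1):
        i = new_to_old[j]
        if (i is not None and i + 1 < len(old)
                and new_to_old[j + 1] is None and old_to_new[i + 1] is None
                and new[j + 1] == old[i + 1]):
            link(i + 1, j + 1)

    # Extend matches backward in the same way.
    for j in range(len(new) - 1, 0, -1):
        i = new_to_old[j]
        if (i is not None and i > 0
                and new_to_old[j - 1] is None and old_to_new[i - 1] is None
                and new[j - 1] == old[i - 1]):
            link(i - 1, j - 1)

    return new_to_old


old = ["A", "x", "B", "x"]
new = ["B", "x", "A", "x"]
mapping = match_lines(old, new)
# mapping == [2, 3, 0, 1]: the block "B, x" moved ahead of the block "A, x"
```

A run of consecutive old indices that appears out of order in the mapping corresponds to a moved block. Lines left as `None` are insertions, and unmatched old lines are deletions.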

Some specialized file comparison tools find the longest increasing subsequence between two files.[4] The rsync protocol uses a rolling hash function to compare two files on two remote computers with low communication overhead.
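To show why a rolling hash keeps the cost low, here is a minimal sketch of a two-part rolling checksum modelled on the weak checksum described for rsync. The modulus, the function names, and the local byte comparison that confirms a hit are assumptions of this example; the real protocol confirms candidates with a separate strong hash sent over the network.

```python
MOD = 1 << 16  # assumed modulus for both checksum components


def weak_checksum(block):
    """Compute the (a, b) checksum pair of a bytes block from scratch."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def find_block(data, block):
    """Return every offset in data where block occurs."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    hits = []
    for start in range(len(data) - n + 1):
        # The cheap checksum filters candidates; the byte comparison confirms.
        if (a, b) == target and data[start:start + n] == block:
            hits.append(start)
        if start + n < len(data):
            a, b = roll(a, b, data[start], data[start + n], n)
    return hits


offsets = find_block(b"hello world, hello!", b"hello")
# offsets == [0, 13]
```

Because each slide of the window updates the checksum with a few arithmetic operations instead of rehashing the whole block, every byte offset of a file can be tested against a known block cheaply.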

File comparison in word processors is typically at the word level, while comparison in most programming tools is at the line level. Byte or character-level comparison is useful in some specialized applications.
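The effect of granularity can be seen with Python's standard `difflib` module. The sketch below compares the same two texts once line by line and once word by word. The helper name `changed_units` and the sample texts are made up for this illustration.

```python
import difflib


def changed_units(old, new, split):
    """Return the non-equal opcodes after splitting both texts with `split`."""
    a, b = split(old), split(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "the quick brown fox\njumps over the dog\n"
new = "the quick red fox\njumps over the dog\n"

by_line = changed_units(old, new, str.splitlines)
# by_line == [("replace", ["the quick brown fox"], ["the quick red fox"])]

by_word = changed_units(old, new, str.split)
# by_word == [("replace", ["brown"], ["red"])]
```

The line-level result flags the whole first line as changed, while the word-level result isolates the single replaced word, which is why prose-oriented tools tend to favour the finer unit.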

Display

File comparison output is displayed in two main ways: showing the two files side by side, or showing a single file with markup that indicates the changes from one file to the other. In either case, and particularly in side-by-side views, code folding or text folding may be used to hide the unchanged portions of the file so that only the changed portions are shown.
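The single-file style with hidden unchanged regions can be illustrated with `difflib.unified_diff` from the Python standard library, whose `n` parameter sets how many unchanged context lines surround each change. The file labels `old` and `new` and the sample content are placeholders for this sketch.

```python
import difflib

old_lines = [f"line{i}" for i in range(1, 11)]
new_lines = list(old_lines)
new_lines[4] = "LINE5"  # change only the fifth line

# n=1 keeps one line of context on each side and omits everything else.
diff_lines = list(
    difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new", n=1, lineterm=""
    )
)

for line in diff_lines:
    print(line)
```

Only the changed line and its immediate neighbours appear in the output. The remaining unchanged lines are omitted, which is the textual counterpart of folding in a graphical viewer.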

Reasoning

Comparison tools are used for various reasons. For binary files, a byte-level comparison is usually the most appropriate. For text files or computer programs, a side-by-side visual comparison is usually best.[5] It lets the user decide which file to retain, whether to merge the files into one that contains all the differences,[6] or whether to keep both as they are for later reference under some form of version control.
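A byte-level comparison can be as simple as locating the first position at which two byte sequences diverge. The following minimal sketch does that. The function name `first_difference` and the zero-based offset convention are choices made for this example and do not mirror any specific tool's output.

```python
def first_difference(a: bytes, b: bytes):
    """Return the zero-based offset of the first differing byte, or None if equal.

    If one input is a prefix of the other, the offset is the shorter length.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


same = first_difference(b"\x00\x01\x02", b"\x00\x01\x02")    # None
changed = first_difference(b"\x00\x01\x02", b"\x00\xff\x02")  # 1
longer = first_difference(b"abc", b"abcd")                   # 3
```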

File comparison is an important, and arguably integral, part of file synchronization and backup. Data corruption is a central concern in backup: it typically occurs without warning and goes unnoticed until it is too late to recover the missing parts. Often a corrupted file is discovered only when it is next opened or used. Short of that, a comparison tool is needed to detect that a difference has arisen. File synchronization and backup programs therefore need to include file comparison if they are to be useful and trusted.[7]
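One common way to check a backup copy against its original without viewing either file is to compare cryptographic digests. The sketch below uses SHA-256 from Python's `hashlib`. The function names and the chunk size are assumptions of this example, and a real backup program may use a different verification scheme.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_path, backup_path):
    """Report whether the backup has the same content as the original."""
    return file_digest(original_path) == file_digest(backup_path)
```

Calling `backup_matches("report.txt", "backup/report.txt")` (hypothetical paths) returns `False` as soon as a single byte differs between the two files, which flags possible corruption without anyone having to open the file. If the digest of the original is stored at backup time, a later check can also be made when the original is no longer available.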

Historical uses

Before file comparison software, machines existed to compare magnetic tapes or punched cards. The IBM 519 Card Reproducer could determine whether decks of punched cards were equivalent. In 1957, John Van Gardner developed a system that compared the checksums of loaded sections of Fortran programs to debug compilation problems on the IBM 704.[8]

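See also

  - Comparison of file comparison tools
  - Computer-assisted reviewing
  - Data differencing
  - Delta encoding
  - Document comparison
  - Edit distance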

References

  1. ^ "diff", The Jargon File.
  2. ^ Heckel, Paul (1978), "A Technique for Isolating Differences Between Files" (PDF), Communications of the ACM, 21 (4): 264–268, doi:10.1145/359460.359467, S2CID 207683976, retrieved 2011-12-04
  3. ^ Viégas, Fernanda B.; Wattenberg, Martin; Dave, Kushal (2004), Studying Cooperation and Conflict between Authors with history flow Visualizations (PDF), vol. 6, Vienna: CHI, pp. 575–582, retrieved 2011-12-01
  4. ^ Liwei Ren; Jinsheng Gu; Luosheng Peng (18 April 2006). "Algorithms for block-level code alignment of software binary files". Google Patents. USPTO. Retrieved 10 May 2019.
  5. ^ MacKenzie, David; Eggert, Paul; Stallman, Richard (2003). Comparing and Merging Files with GNU Diff and Patch. Network Theory. ISBN 978-0-9541617-5-0.
  6. ^ "File comparison software: vc-dwim and vc-chlog". www.gnu.org. Retrieved 2023-04-16.
  7. ^ "SystemRescue - System Rescue Homepage". www.system-rescue.org. Retrieved 2023-04-16.
  8. ^ John Van Gardner. "Fortran And The Genesis Of Project Intercept" (PDF). Retrieved 2011-12-06.

External links

  - File Comparison at Curlie
