
7z

7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 23.01.[2]

7z file format
Filename extension: .7z
Internet media type: application/x-7z-compressed
Uniform Type Identifier (UTI): org.7-zip.7-zip-archive
Magic number: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C
Size limitation: 2^64 bytes (roughly 18 exabytes)
Developed by: Igor Pavlov[1]
Initial release: 1999[2]
Type of format: Data compression
Open format?: Yes (GNU Lesser General Public License / Public domain)
Website: 7-zip.org
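
The magic number listed above can be used to recognise a 7z archive without relying on the file extension. The following is a minimal Python sketch; the names SEVENZ_MAGIC, looks_like_7z and file_looks_like_7z are illustrative and are not part of 7-Zip or the LZMA SDK. It checks only the six signature bytes and does not validate the rest of the header.

```python
from pathlib import Path

# Signature bytes from the format summary: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C
SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"


def looks_like_7z(data: bytes) -> bool:
    """Return True if the buffer starts with the 7z signature."""
    return data[:len(SEVENZ_MAGIC)] == SEVENZ_MAGIC


def file_looks_like_7z(path: str) -> bool:
    """Read only the first six bytes of a file and compare them."""
    with Path(path).open("rb") as handle:
        return looks_like_7z(handle.read(len(SEVENZ_MAGIC)))
```

A matching signature only suggests that a file is a 7z archive; a truncated or corrupted archive would pass the same check.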

The official, informal 7z file format specification has been distributed with 7-Zip's source code since 2015. The specification is a plain text file in the 'doc' sub-directory of the source code distribution.[3] Third parties have also attempted to write more concrete documentation based on the released code.[4]

Features and enhancements

The 7z format provides the following main features:

  • Open, modular architecture that allows any compression, conversion, or encryption method to be stacked.
  • High compression ratios (depending on the compression method used).
  • AES-256 encryption.
  • Zip 2.0 (legacy) encryption.
  • Large file support (up to approximately 16 exbibytes, or 2^64 bytes).
  • Unicode file names.
  • Support for solid compression, where multiple files of like type are compressed within a single stream, in order to exploit the combined redundancy inherent in similar files.
  • Compression and encryption of archive headers.
  • Support for multi-part archives, e.g. xxx.7z.001, xxx.7z.002, ... (in 7-Zip, the context menu items Split File... and Combine Files... create them and re-assemble an archive from its component files).
  • Support for custom codec plugin DLLs.
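
The size limit is quoted both in exabytes (decimal) and in exbibytes (binary). The short calculation below, a sketch with illustrative variable names, shows that the two figures describe the same 2^64-byte limit.

```python
LIMIT_BYTES = 2 ** 64

EXBIBYTE = 2 ** 60   # binary unit (EiB)
EXABYTE = 10 ** 18   # decimal unit (EB)

limit_in_eib = LIMIT_BYTES / EXBIBYTE
limit_in_eb = LIMIT_BYTES / EXABYTE

# prints "18446744073709551616 bytes = 16 EiB = 18.45 EB"
print(f"{LIMIT_BYTES} bytes = {limit_in_eib:.0f} EiB = {limit_in_eb:.2f} EB")
```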

The format's open architecture allows additional future compression methods to be added to the standard.

Compression methods

The following compression methods are currently defined:

  • LZMA – A variation of the LZ77 algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by entropy coding using a Markov chain-based range coder and binary trees.
  • LZMA2 – A modified version of LZMA providing better multithreading support and less expansion of incompressible data.[5]
  • Bzip2 – The standard Burrows–Wheeler transform algorithm. Bzip2 applies two reversible transformations, the BWT followed by move-to-front, and then Huffman coding for symbol reduction (the actual compression element).
  • PPMd – Dmitry Shkarin's 2002 PPMdH, with small changes. PPMdH implements PPMII (prediction by partial matching with information inheritance) and cPPMII (complicated PPMII); PPMII is an improved version of the 1984 PPM (prediction by partial matching) compression algorithm.
  • DEFLATE – Standard algorithm based on 32 kB LZ77 and Huffman coding. Deflate is found in several file formats including ZIP, gzip, PNG and PDF. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the de facto standard zlib version in compression size, but at the expense of CPU usage.

A suite of recompression tools called AdvanceCOMP contains a copy of the DEFLATE encoder from the 7-Zip implementation; these utilities can often be used to further compress the size of existing gzip, ZIP, PNG, or MNG files.
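
Three of the methods listed above have counterparts in the Python standard library, which makes a rough size comparison easy to try. The sketch below does not use 7-Zip or the 7z container: it assumes that the lzma, bz2 and zlib modules are acceptable stand-ins for the LZMA2, Bzip2 and DEFLATE coders, and PPMd is omitted because the standard library has no equivalent. The sample data is artificial, so the resulting sizes say little about real files.

```python
import bz2
import lzma
import zlib

# Highly repetitive sample input; real-world ratios vary widely with the data.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000

# Raw LZMA2 stream without the .xz container, configured from preset 6.
lzma2_filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

compressed = {
    "LZMA2": lzma.compress(sample, format=lzma.FORMAT_RAW, filters=lzma2_filters),
    "Bzip2": bz2.compress(sample, compresslevel=9),
    # zlib wraps the DEFLATE stream in a small header and checksum.
    "DEFLATE": zlib.compress(sample, 9),
}

for name, blob in compressed.items():
    print(f"{name:8s} {len(sample)} -> {len(blob)} bytes")

# Raw streams carry no filter information, so it must be supplied again.
restored = lzma.decompress(
    compressed["LZMA2"], format=lzma.FORMAT_RAW, filters=lzma2_filters
)
assert restored == sample
```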

Pre-processing filters

The LZMA SDK includes the BCJ and BCJ2 preprocessors, which allow later stages to achieve greater compression. For x86, ARM, PowerPC (PPC), IA-64 Itanium, and ARM Thumb processors, jump targets are "normalized"[5] before compression by converting relative positions into absolute values. For x86, this means that near jumps, near calls and near conditional jumps (but not short jumps or short conditional jumps) are converted from the machine-language "jump 1655 bytes backwards" style of notation to a normalized "jump to address 5554" style. All jumps to 5554, perhaps a common subroutine, are thus encoded identically, making them more compressible.

  • BCJ – Converter for 32-bit x86 executables. Normalizes the target addresses of near jumps and calls from relative distances to absolute destinations.
  • BCJ2 – Pre-processor for 32-bit x86 executables. BCJ2 is an improvement on BCJ that processes additional x86 jump/call instructions. Near jump, near call and conditional near jump targets are split out and compressed separately in another stream.
  • Delta encoding – A delta filter, a basic preprocessor for multimedia data.
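
The address normalization described above can be illustrated with a simplified Python sketch. It is not the BCJ filter from the LZMA SDK: the function name bcj_x86_sketch is hypothetical, and the code treats every 0xE8 or 0xE9 byte as a near call or near jump opcode, without the additional checks a production filter would need to avoid rewriting ordinary data.

```python
import struct

CALL_REL32 = 0xE8  # x86 near call with a 32-bit relative displacement
JMP_REL32 = 0xE9   # x86 near jump with a 32-bit relative displacement


def bcj_x86_sketch(code: bytes, encode: bool = True) -> bytes:
    """Rewrite rel32 operands of E8/E9 as absolute targets (or back again)."""
    out = bytearray(code)
    pos = 0
    while pos + 5 <= len(out):
        if out[pos] in (CALL_REL32, JMP_REL32):
            (value,) = struct.unpack_from("<I", out, pos + 1)
            next_instruction = pos + 5  # displacement is relative to this offset
            if encode:
                value = (value + next_instruction) & 0xFFFFFFFF
            else:
                value = (value - next_instruction) & 0xFFFFFFFF
            struct.pack_into("<I", out, pos + 1, value)
            pos += 5  # skip the operand so decoding scans the same positions
        else:
            pos += 1
    return bytes(out)


# Two calls to the same target (offset 0x100) from different positions.
code = (
    b"\xE8" + struct.pack("<I", 0x100 - 5)     # call at offset 0
    + b"\x90" * 11                              # padding (NOPs)
    + b"\xE8" + struct.pack("<I", 0x100 - 21)  # call at offset 16
)
encoded = bcj_x86_sketch(code, encode=True)
decoded = bcj_x86_sketch(encoded, encode=False)
```

Before filtering, the two call operands differ (0xFB and 0xEB); afterwards both hold the absolute target 0x100, which gives the later compression stage a repeated byte sequence to match.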

Similar executable pre-processing technology is included in other software; the RAR compressor features displacement compression for 32-bit x86 executables and IA-64 executables, and the UPX runtime executable file compressor includes support for working with 16-bit values within DOS binary files.

Encryption

The 7z format supports encryption with the AES algorithm using a 256-bit key. The key is generated from a user-supplied passphrase by an algorithm based on the SHA-256 hash function. SHA-256 is executed 2^19 (524,288) times,[6] which causes a noticeable delay on slow PCs before compression or extraction starts. This technique is called key stretching and is used to make a brute-force search for the passphrase more difficult. Current GPU-based and custom hardware attacks limit the effectiveness of this particular method of key stretching,[7] so it is still important to choose a strong password. The 7z format also provides the option to encrypt the filenames in a 7z archive.
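
Key stretching of this kind can be sketched in Python with the standard hashlib module. The function below is illustrative only: the name stretch_key, the UTF-16LE passphrase encoding, the optional salt and the 64-bit round counter are assumptions made for the example and have not been checked against the 7-Zip source code. Only the hash function and the 2^19 round count come from the text above.

```python
import hashlib


def stretch_key(passphrase: str, salt: bytes = b"", cycles_power: int = 19) -> bytes:
    """Derive a 256-bit key by feeding SHA-256 2**cycles_power rounds of input."""
    password_bytes = passphrase.encode("utf-16-le")  # assumed encoding
    digest = hashlib.sha256()
    for counter in range(1 << cycles_power):
        digest.update(salt)
        digest.update(password_bytes)
        digest.update(counter.to_bytes(8, "little"))  # assumed 64-bit round counter
    return digest.digest()


# A small round count keeps this demonstration fast; the default is 2**19.
demo_key = stretch_key("correct horse battery staple", cycles_power=4)
```

Each extra unit of cycles_power doubles the work for a legitimate user and for an attacker alike, which is why a strong passphrase matters more than the round count.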

Limitations

The 7z format does not store filesystem permissions (such as UNIX owner/group permissions or NTFS ACLs), and hence can be inappropriate for backup or archival purposes. A workaround on UNIX-like systems is to convert the data to a tar bitstream before compressing it with 7z. However, GNU tar (common in many UNIX environments) can also compress with the LZMA2 algorithm ("xz") natively, without the use of 7z, using the "-J" switch. The resulting file extension is ".tar.xz" or ".txz", not ".tar.7z". This method of compression has been adopted by many distributions for packaging, such as Arch, Debian (deb), Fedora (rpm) and Slackware. (The older "lzma" format is less efficient.)[8] On the other hand, tar does not record the filesystem encoding, so filenames in a tar archive can become unreadable when the archive is extracted on a different computer.
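
The tar-based workaround can be tried from Python, whose standard tarfile module writes xz-compressed tar streams. The sketch below builds a one-file ".tar.xz" archive in memory and reads it back; the member name, owner and group are hypothetical values chosen for the example.

```python
import io
import tarfile

payload = b"#!/bin/sh\necho hello\n"

# Describe one archive member with explicit UNIX metadata.
member = tarfile.TarInfo(name="bin/hello.sh")
member.size = len(payload)
member.mode = 0o750
member.uname = "alice"   # hypothetical owner name
member.gname = "staff"   # hypothetical group name

# "w:xz" writes a tar stream compressed with xz (LZMA2), like `tar -J`.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:xz") as archive:
    archive.addfile(member, io.BytesIO(payload))

# Read the archive back and confirm the metadata survived.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:xz") as archive:
    restored = archive.getmember("bin/hello.sh")
    print(oct(restored.mode), restored.uname, restored.gname)
```

The permission bits and the owner and group names are carried by the tar layer, not by the compression layer, so the same metadata would survive if the tar stream were placed inside a 7z archive.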

The 7z format does not allow extraction of some "broken files". For example, if one has only the first segment of a multi-part 7z archive, 7z cannot extract the files stored at the start of the archive; it must wait until all segments are available. The 7z format also lacks recovery records, making it vulnerable to data degradation unless it is used in conjunction with external solutions, such as parchives, or within filesystems with robust error correction. By way of comparison, zip files also lack a recovery feature, while the rar format has one.

See also

  • Comparison of archive formats
  • List of archive formats
  • Open file format

References

  1. ^ "A Few Questions for Igor Pavlov". Dr. Dobb's Data Compression Newsletter. 30 April 2003. Retrieved 26 December 2009.
  2. ^ a b History of 7-zip changes
  3. ^ LZMA SDK, "DOC" directory, 7zFormat.txt
  4. ^ ".7z format specification — py7zr – 7-zip archive library". py7zr.readthedocs.io.
  5. ^ a b Collin, Lasse. "lzma_.lzma". liblzma bindings. Archived from the original on 8 February 2010. Retrieved 3 January 2010. Compared to LZMA1, LZMA2 adds support for LZMA_SYNC_FLUSH, uncompressed chunks (smaller expansion when trying to compress uncompressible data), possibility to change lc/lp/pb in the middle of encoding, and some other internal improvements.
  6. ^ 7-zip source code
  7. ^ Percival, Colin. "Stronger Key Derivation via Sequential Memory-Hard Functions" (scrypt). Presented at BSDCan'09, May 2009.
  8. ^ "GNU tar 1.34: 8.1 Using Less Space through Compression".

Further reading

  • Salomon, David (2007). Data compression: the complete reference. Springer. p. 241. ISBN 978-1-84628-602-5.

External links

  • Official website
  • 7z on SourceForge
