
MPEG-1

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively)[2] without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.[3][4]

Moving Picture Experts Group Phase 1 (MPEG-1)
Filename extension: .dat, .mpg, .mpeg, .mp1, .mp2, .mp3, .m1v, .m1a, .m2a, .mpa, .mpv
Internet media type: audio/mpeg, video/mpeg
Developed by: MPEG (part of ISO/IEC JTC 1)
Initial release: 6 December 1991; 31 years ago (1991-12-06)[1]
Latest release: ISO/IEC TR 11172-5:1998, October 1998; 24 years ago (1998-10)
Type of format: audio, video, container
Extended from: JPEG, H.261
Extended to: MPEG-2
Standard: ISO/IEC 11172
Open format: Yes
Free format: Yes

Today, MPEG-1 has become the most widely compatible lossy audio/video format in the world, and is used in a large number of products and technologies. Perhaps the best-known part of the MPEG-1 standard is the first version of the MP3 audio format it introduced.

The MPEG-1 standard is published as ISO/IEC 11172 – Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s.

The standard consists of the following five Parts:[5][6][7][8][9]

  1. Systems (storage and synchronization of video, audio, and other data together)
  2. Video (compressed video content)
  3. Audio (compressed audio content)
  4. Conformance testing (testing the correctness of implementations of the standard)
  5. Reference software (example software showing how to encode and decode according to the standard)

History

The predecessor of MPEG-1 for video coding was the H.261 standard produced by the CCITT (now known as the ITU-T). The basic architecture established in H.261 was the motion-compensated DCT hybrid video coding structure.[10][11] It uses macroblocks of size 16×16 with block-based motion estimation in the encoder and motion compensation using encoder-selected motion vectors in the decoder, with residual difference coding using a discrete cosine transform (DCT) of size 8×8, scalar quantization, and variable-length codes (like Huffman codes) for entropy coding.[12] H.261 was the first practical video coding standard, and all of its described design elements were also used in MPEG-1.[13]

Modeled on the successful collaborative approach and the compression technologies developed by the Joint Photographic Experts Group and CCITT's Experts Group on Telephony (creators of the JPEG image compression standard and the H.261 standard for video conferencing respectively), the Moving Picture Experts Group (MPEG) working group was established in January 1988, by the initiative of Hiroshi Yasuda (Nippon Telegraph and Telephone) and Leonardo Chiariglione (CSELT).[14] MPEG was formed to address the need for standard video and audio formats, and to build on H.261 to get better quality through the use of somewhat more complex encoding methods (e.g., supporting higher precision for motion vectors).[3][15][16]

Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at data rates of 1.5 Mbit/s. This specific bitrate was chosen for transmission over T-1/E-1 lines and as the approximate data rate of audio CDs.[17] The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated in the process.[18]

After 20 meetings of the full group in various cities around the world, and 4½ years of development and testing, the final standard (for parts 1–3) was approved in early November 1992 and published a few months later.[19] The reported completion date of the MPEG-1 standard varies greatly: a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced.[3] The draft standard was publicly available for purchase.[20] The standard was finished with the 6 November 1992 meeting.[21] The Berkeley Plateau Multimedia Research Group developed an MPEG-1 decoder in November 1992.[22]

In July 1990, before the first draft of the MPEG-1 standard had even been written, work began on a second standard, MPEG-2,[23] intended to extend MPEG-1 technology to provide full broadcast-quality video (as per CCIR 601) at high bitrates (3–15 Mbit/s) and support for interlaced video.[24] Due in part to the similarity between the two codecs, the MPEG-2 standard includes full backwards compatibility with MPEG-1 video, so any MPEG-2 decoder can play MPEG-1 videos.[25]

Notably, the MPEG-1 standard very strictly defines the bitstream, and decoder function, but does not define how MPEG-1 encoding is to be performed, although a reference implementation is provided in ISO/IEC-11172-5.[2] This means that MPEG-1 coding efficiency can drastically vary depending on the encoder used, and generally means that newer encoders perform significantly better than their predecessors.[26] The first three parts (Systems, Video and Audio) of ISO/IEC 11172 were published in August 1993.[27]

MPEG-1 Parts[9][28]

Part    Number              First public release date (first edition)  Latest correction  Title
Part 1  ISO/IEC 11172-1     1993                                        1999[29]           Systems
Part 2  ISO/IEC 11172-2     1993                                        2006[30]           Video
Part 3  ISO/IEC 11172-3     1993                                        1996[31]           Audio
Part 4  ISO/IEC 11172-4     1995                                        2007[32]           Compliance testing
Part 5  ISO/IEC TR 11172-5  1998                                        2007[33]           Software simulation

Patents

Due to its age, MPEG-1 is no longer covered by any essential patents and can thus be used without obtaining a licence or paying any fees.[34][35][36][37][38] The ISO patent database lists one patent for ISO 11172, US 4,472,747, which expired in 2003.[39] The near-complete draft of the MPEG-1 standard was publicly available as ISO CD 11172[20] by December 6, 1991.[1] Neither the July 2008 Kuro5hin article "Patent Status of MPEG-1, H.261 and MPEG-2",[40] nor an August 2008 thread on the gstreamer-devel[41] mailing list were able to list a single unexpired MPEG-1 Video and MPEG-1 Audio Layer I/II patent. A May 2009 discussion on the whatwg mailing list mentioned US 5,214,678 patent as possibly covering MPEG-1 Audio Layer II.[42] Filed in 1990 and published in 1993, this patent is now expired.[43]

A full MPEG-1 decoder and encoder, with "Layer III audio", could not be implemented royalty free since there were companies that required patent fees for implementations of MPEG-1 Audio Layer III, as discussed in the MP3 article. All patents in the world connected to MP3 expired 30 December 2017, which makes this format totally free for use.[citation needed] On 23 April 2017, Fraunhofer IIS stopped charging for Technicolor's MP3 licensing program for certain MP3 related patents and software.[44]

Former patent holders

The following corporations filed declarations with ISO saying they held patents for the MPEG-1 Video (ISO/IEC-11172-2) format, although all such patents have since expired.[45]

Applications

  • Most popular software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
  • The popularity of MP3 audio has established a massive installed base of hardware that can play back MPEG-1 Audio (all three layers).
  • "Virtually all digital audio devices" can play back MPEG-1 Audio.[46] Many millions have been sold to date.
  • Before MPEG-2 became widespread, many digital satellite/cable TV services used MPEG-1 exclusively.[16][26]
  • The widespread popularity of MPEG-2 with broadcasters means MPEG-1 is playable by most digital cable and satellite set-top boxes, and digital disc and tape players, due to backwards compatibility.
  • MPEG-1 was used for full-screen video on Green Book CD-i, and on Video CD (VCD).
  • The Super Video CD standard, based on VCD, uses MPEG-1 audio exclusively, as well as MPEG-2 video.
  • The DVD-Video format uses MPEG-2 video primarily, but MPEG-1 support is explicitly defined in the standard.
  • The DVD-Video standard originally required MPEG-1 Audio Layer II for PAL countries, but was changed to allow AC-3/Dolby Digital-only discs. MPEG-1 Audio Layer II is still allowed on DVDs, although newer extensions to the format, like MPEG Multichannel, are rarely supported.
  • Most DVD players also support Video CD and MP3 CD playback, which use MPEG-1.
  • The international Digital Video Broadcasting (DVB) standard primarily uses MPEG-1 Audio Layer II, and MPEG-2 video.
  • The international Digital Audio Broadcasting (DAB) standard uses MPEG-1 Audio Layer II exclusively, due to its especially high quality, modest decoder performance requirements, and tolerance of errors.
  • The Digital Compact Cassette uses PASC (Precision Adaptive Sub-band Coding) to encode its audio. PASC is an early version of MPEG-1 Audio Layer I with a fixed bit rate of 384 kilobits per second.

Part 1: Systems

Part 1 of the MPEG-1 standard covers systems, and is defined in ISO/IEC-11172-1.

MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data into a standard bitstream, and to maintain synchronization between the different contents. This file format is specifically designed for storage on media, and transmission over communication channels, that are considered relatively reliable. Only limited error protection is defined by the standard, and small errors in the bitstream may cause noticeable defects.

This structure was later named an MPEG program stream: "The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure."[47] This terminology is more popular and more precise (it differentiates the structure from an MPEG transport stream), and it will be used here.

Elementary streams, packets, and clock references

  • Elementary Streams (ES) are the raw bitstreams of MPEG-1 audio and video encoded data (output from an encoder). These files can be distributed on their own, such as is the case with MP3 files.
  • Packetized Elementary Streams (PES) are elementary streams divided into independent packets of variable length, with a cyclic redundancy check (CRC) checksum added to each packet for error detection.
  • System Clock Reference (SCR) is a timing value stored in a 33-bit header of each PES, at a frequency/precision of 90 kHz, with an extra 9-bit extension that stores additional timing data with a precision of 27 MHz.[48][49] These are inserted by the encoder, derived from the system time clock (STC). Simultaneously encoded audio and video streams will not have identical SCR values, however, due to buffering, encoding, jitter, and other delay.
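
The relationship between the two SCR fields described above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and combining the fields as base × 300 + extension (so that the extension counts 27 MHz ticks within one 90 kHz tick) is an assumption borrowed from the usual program stream convention, not something stated in the text above.

```python
SCR_BASE_HZ = 90_000        # resolution of the 33-bit field
SCR_EXT_HZ = 27_000_000     # resolution of the 9-bit extension
EXT_PER_BASE = SCR_EXT_HZ // SCR_BASE_HZ  # 300 extension ticks per base tick


def scr_to_seconds(base: int, ext: int = 0) -> float:
    """Convert an SCR (base, extension) pair to seconds.

    Assumption: the extension counts 0..299 within one base tick.
    """
    if not 0 <= base < 2 ** 33:
        raise ValueError("base must fit in 33 bits")
    if not 0 <= ext < EXT_PER_BASE:
        raise ValueError("extension must be in 0..299")
    return (base * EXT_PER_BASE + ext) / SCR_EXT_HZ
```

Under these assumptions a base value of 90,000 corresponds to one second, and the 33-bit field covers roughly 26.5 hours before it wraps.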

Program streams

Program Streams (PS) are concerned with combining multiple packetized elementary streams (usually just one audio and video PES) into a single stream, ensuring simultaneous delivery, and maintaining synchronization. The PS structure is known as a multiplex, or a container format.

Presentation time stamps (PTS) exist in PS to correct the inevitable disparity between audio and video SCR values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values.[48] PTS determines when to display a portion of an MPEG program, and is also used by the decoder to determine when data can be discarded from the buffer.[50] Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded.

PTS handling can be problematic. Decoders must accept multiple program streams that have been concatenated (joined sequentially). This causes PTS values in the middle of the video to reset to zero, which then begin incrementing again. Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder.
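
One possible way for a player to cope with such resets is to rebase the timestamps onto a single increasing timeline. The sketch below is a simplified illustration, not a description of any particular decoder: `make_monotonic` and `frame_duration` are hypothetical names, and the input is assumed to be PTS values already in presentation order.

```python
def make_monotonic(pts_values, frame_duration):
    """Rebase PTS values so they keep increasing across a reset.

    pts_values: PTS in 90 kHz ticks, in presentation order (assumption).
    frame_duration: ticks to leave between the last frame before a
    reset and the first frame after it (assumption).
    """
    rebased = []
    offset = 0
    previous = None
    for pts in pts_values:
        if previous is not None and pts + offset < previous:
            # The timestamp jumped backwards: treat this as the start of
            # a concatenated stream and shift everything that follows.
            offset = previous + frame_duration - pts
        previous = pts + offset
        rebased.append(previous)
    return rebased
```

For example, `make_monotonic([0, 3000, 6000, 0, 3000], 3000)` yields `[0, 3000, 6000, 9000, 12000]`.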

Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out-of-order (re-ordered frames). DTS is quite similar to PTS, but instead of just handling sequential frames, it contains the proper time-stamps to tell the decoder when to decode and display the next B-frame (types of frames explained below), ahead of its anchor (P- or I-) frame. Without B-frames in the video, PTS and DTS values are identical.[51]
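
The re-ordering that makes DTS necessary can be illustrated with a short sketch. It assumes a simple closed sequence in which every B-frame depends on the nearest anchor before and after it; the function name is hypothetical.

```python
def display_to_decode_order(frame_types):
    """Reorder frames from display order to decode order.

    frame_types: string or list of "I", "P", "B" in display order.
    Returns (display_index, frame_type) tuples in decode order.
    """
    decode_order = []
    pending_b = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            # A B-frame must wait for the anchor that follows it.
            pending_b.append((index, frame_type))
        else:
            decode_order.append((index, frame_type))
            decode_order.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no later anchor in this sequence.
    decode_order.extend(pending_b)
    return decode_order
```

For the display sequence `IBBPBBP`, the decode order becomes I0, P3, B1, B2, P6, B4, B5. The display index plays the role of PTS and the position in the returned list plays the role of DTS. For a sequence without B-frames the two orders are the same.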

Multiplexing

To generate the PS, the multiplexer will interleave the (two or more) packetized elementary streams. This is done so the packets of the simultaneous streams can be transferred over the same channel and are guaranteed to both arrive at the decoder at precisely the same time. This is a case of time-division multiplexing.

Determining how much data from each stream should be in each interleaved segment (the size of the interleave) is complicated, yet an important requirement. Improper interleaving will result in buffer underflows or overflows, as the receiver gets more of one stream than it can store (e.g. audio), before it gets enough data to decode the other simultaneous stream (e.g. video). The MPEG Video Buffering Verifier (VBV) assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size.[52] This offers feedback to the multiplexer and the encoder, so that they can change the multiplex size or adjust bitrates as needed for compliance.

Part 2: Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC-11172-2. The design was heavily influenced by H.261.

MPEG-1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream. It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive. It also exploits temporal (over time) and spatial (across a picture) redundancy common in video to achieve better data compression than would be possible otherwise. (See: Video compression)

Color space

 
Figure: Example of 4:2:0 subsampling. The two overlapping center circles represent chroma blue and chroma red (color) pixels, while the 4 outside circles represent the luma (brightness).

Before encoding video to MPEG-1, the color-space is transformed to Y′CbCr (Y′=Luma, Cb=Chroma Blue, Cr=Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase) and even further separated into red and blue components.

The chroma is also subsampled to 4:2:0, meaning it is reduced to half resolution vertically and half resolution horizontally, i.e., to just one quarter the number of samples used for the luma component of the video.[2] This use of higher resolution for some color components is similar in concept to the Bayer pattern filter that is commonly used for the image capturing sensor in digital color cameras. Because the human eye is much more sensitive to small changes in brightness (the Y component) than in color (the Cr and Cb components), chroma subsampling is a very effective way to reduce the amount of video data that needs to be compressed. However, on videos with fine detail (high spatial complexity) this can manifest as chroma aliasing artifacts. Compared to other digital compression artifacts, this issue seems to very rarely be a source of annoyance. Because of the subsampling, Y′CbCr 4:2:0 video is ordinarily stored using even dimensions (divisible by 2 horizontally and vertically).
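
As a rough illustration of the sample-count reduction, the sketch below subsamples one chroma plane by averaging each 2×2 group. Averaging is an assumption made for simplicity; real encoders may use other filters and sample positions. Both function names are hypothetical.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups."""
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("4:2:0 subsampling needs even dimensions")
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def samples_per_frame_420(width, height):
    """Return (luma, cb, cr) sample counts for one 4:2:0 frame."""
    chroma = (width // 2) * (height // 2)
    return width * height, chroma, chroma
```

For a 352×240 frame this gives 84,480 luma samples and 21,120 samples in each chroma plane, one quarter of the luma count.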

Y′CbCr color is often informally called YUV to simplify the notation, although that term more properly applies to a somewhat different color format. Similarly, the terms luminance and chrominance are often used instead of the (more accurate) terms luma and chroma.

Resolution/bitrate

MPEG-1 supports resolutions up to 4095×4095 (12 bits), and bit rates up to 100 Mbit/s.[16]

MPEG-1 videos are most commonly seen using Source Input Format (SIF) resolution: 352×240, 352×288, or 320×240. These relatively low resolutions, combined with a bitrate less than 1.5 Mbit/s, make up what is known as a constrained parameters bitstream (CPB), later renamed the "Low Level" (LL) profile in MPEG-2. These are the minimum video specifications any decoder should be able to handle to be considered MPEG-1 compliant. They were selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time.[3][16]
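
To put these resolutions in perspective, the uncompressed data rate of SIF video can be estimated with a few lines of arithmetic. The sketch assumes 8 bits per sample and 4:2:0 chroma; the frame rates of 30 and 25 frames per second used in the usage note are round-number assumptions, and the function name is hypothetical.

```python
def raw_bitrate_420(width, height, frames_per_second, bits_per_sample=8):
    """Uncompressed bit rate (bits per second) of 4:2:0 video."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # Cb plane plus Cr plane
    return (luma + chroma) * bits_per_sample * frames_per_second
```

Under these assumptions, both `raw_bitrate_420(352, 240, 30)` and `raw_bitrate_420(352, 288, 25)` give 30,412,800 bit/s, about 30 Mbit/s of raw video that must be squeezed into the constrained bitrate.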

Frame/picture/block types

MPEG-1 has several frame/picture types that serve different purposes. The most important, yet simplest, is the I-frame.

I-frames

"I-frame" is an abbreviation for "Intra-frame", so-called because they can be decoded independently of any other frames. They may also be known as I-pictures, or keyframes due to their somewhat similar function to the key frames used in animation. I-frames can be considered effectively identical to baseline JPEG images.[16]

High-speed seeking through an MPEG-1 video is only possible to the nearest I-frame. When cutting a video it is not possible to start playback of a segment of video before the first I-frame in the segment (at least not without computationally intensive re-encoding). For this reason, I-frame-only MPEG videos are used in editing applications.

I-frame only compression is very fast, but produces very large file sizes: a factor of 3× (or more) larger than normally encoded MPEG-1 video, depending on how temporally complex a specific video is.[3] I-frame only MPEG-1 video is very similar to MJPEG video. So much so that very high-speed and theoretically lossless (in reality, there are rounding errors) conversion can be made from one format to the other, provided a couple of restrictions (color space and quantization matrix) are followed in the creation of the bitstream.[53]

The length between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15–18, i.e. 1 I-frame for every 14–17 non-I-frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit.[16]
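
A fixed GOP layout can be written out as a string of frame types. The sketch below is illustrative only: `anchor_spacing` (the distance between consecutive I- or P-frames) is a hypothetical parameter, and real encoders may choose the layout dynamically as noted above.

```python
def gop_pattern(gop_size, anchor_spacing):
    """Return one GOP in display order, e.g. 'IBBPBB...'."""
    if gop_size < 1 or anchor_spacing < 1:
        raise ValueError("gop_size and anchor_spacing must be positive")
    return "".join(
        "I" if i == 0 else ("P" if i % anchor_spacing == 0 else "B")
        for i in range(gop_size)
    )
```

With a GOP size of 15 and an anchor every third frame, `gop_pattern(15, 3)` returns `IBBPBBPBBPBBPBB`: one I-frame followed by 14 non-I-frames.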

Limits are placed on the maximum number of frames between I-frames due to decoding complexity, decoder buffer size, recovery time after data errors, seeking ability, and accumulation of IDCT errors in low-precision implementations most common in hardware decoders (See: IEEE-1180).

P-frames

"P-frame" is an abbreviation for "Predicted-frame". They may also be called forward-predicted frames or inter-frames (B-frames are also inter-frames).

P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. P-frames store only the difference in image from the frame (either an I-frame or P-frame) immediately preceding it (this reference frame is also called the anchor frame).

The difference between a P-frame and its anchor frame is calculated using motion vectors on each macroblock of the frame (see below). Such motion vector data will be embedded in the P-frame for use by the decoder.

A P-frame can contain any number of intra-coded blocks, in addition to any forward-predicted blocks.[54]

If a video drastically changes from one frame to the next (such as a cut), it is more efficient to encode it as an I-frame.

B-frames

"B-frame" stands for "bidirectional-frame" or "bipredictive frame". They may also be known as backwards-predicted frames or B-pictures. B-frames are quite similar to P-frames, except they can make predictions using both the previous and future frames (i.e. two anchor frames).

It is therefore necessary for the player to first decode the next I- or P- anchor frame sequentially after the B-frame, before the B-frame can be decoded and displayed. This means decoding B-frames requires larger data buffers and causes an increased delay on both decoding and during encoding. This also necessitates the decoding time stamps (DTS) feature in the container/system stream (see above). As a result, B-frames have long been the subject of much controversy: they are often avoided in videos and are sometimes not fully supported by hardware decoders.

No other frames are predicted from a B-frame. Because of this, a very low bitrate B-frame can be inserted, where needed, to help control the bitrate. If this was done with a P-frame, future P-frames would be predicted from it and would lower the quality of the entire sequence. However, similarly, the future P-frame must still encode all the changes between it and the previous I- or P- anchor frame. B-frames can also be beneficial in videos where the background behind an object is being revealed over several frames, or in fading transitions, such as scene changes.[3][16]

A B-frame can contain any number of intra-coded blocks and forward-predicted blocks, in addition to backwards-predicted, or bidirectionally predicted blocks.[16][54]

D-frames

MPEG-1 has a unique frame type not found in later video standards. "D-frames" or DC-pictures are independently coded images (intra-frames) that have been encoded using DC transform coefficients only (AC coefficients are removed when encoding D-frames—see DCT below) and hence are very low quality. D-frames are never referenced by I-, P- or B- frames. D-frames are only used for fast previews of video, for instance when seeking through a video at high speed.[3]

Given moderately higher-performance decoding equipment, fast preview can be accomplished by decoding I-frames instead of D-frames. This provides higher quality previews, since I-frames contain AC coefficients as well as DC coefficients. If the encoder can assume that rapid I-frame decoding capability is available in decoders, it can save bits by not sending D-frames (thus improving compression of the video content). For this reason, D-frames are seldom actually used in MPEG-1 video encoding, and the D-frame feature has not been included in any later video coding standards.

Macroblocks

MPEG-1 operates on video in a series of 8×8 blocks for quantization. However, to reduce the bit rate needed for motion vectors and because chroma (color) is subsampled by a factor of 4, each pair of (red and blue) chroma blocks corresponds to 4 different luma blocks. This set of 6 blocks, with a resolution of 16×16, is processed together and called a macroblock.

A macroblock is the smallest independent unit of (color) video. Motion vectors (see below) operate solely at the macroblock level.

If the height or width of the video are not exact multiples of 16, full rows and full columns of macroblocks must still be encoded and decoded to fill out the picture (though the extra decoded pixels are not displayed).
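
The padding rule amounts to rounding each dimension up to the next multiple of 16. A minimal sketch, with a hypothetical function name:

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows, coded_width, coded_height)."""
    columns = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return columns, rows, columns * mb_size, rows * mb_size
```

A 352×288 picture needs 22 × 18 = 396 macroblocks and no padding, whereas a 350×250 picture would be coded as 352×256, with the extra pixels decoded but not displayed.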

Motion vectors

To decrease the amount of temporal redundancy in a video, only blocks that change are updated (up to the maximum GOP size). This is known as conditional replenishment. However, this is not very effective by itself. Movement of the objects and/or the camera may result in large portions of the frame needing to be updated, even though only the position of the previously encoded objects has changed. Through motion estimation, the encoder can compensate for this movement and remove a large amount of redundant information.

The encoder compares the current frame with adjacent parts of the video from the anchor frame (previous I- or P- frame) in a diamond pattern, up to a (encoder-specific) predefined radius limit from the area of the current macroblock. If a match is found, only the direction and distance (i.e. the vector of the motion) from the previous video area to the current macroblock need to be encoded into the inter-frame (P- or B- frame). The reverse of this process, performed by the decoder to reconstruct the picture, is called motion compensation.

A predicted macroblock rarely matches the current picture perfectly, however. The difference between the estimated matching area and the real frame/macroblock is called the prediction error. The larger the amount of prediction error, the more data must be additionally encoded in the frame. For efficient video compression, it is very important that the encoder is capable of effectively and precisely performing motion estimation.
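
Block matching can be sketched as follows. Note that this example uses an exhaustive search over a square window rather than the diamond pattern mentioned above, and it measures the mismatch with the sum of absolute differences (SAD); both are simplifications chosen for clarity, and all names are hypothetical. Frames are plain lists of rows of integer sample values.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def best_motion_vector(cur, ref, cx, cy, size=16, radius=7):
    """Find the (dx, dy) offset into ref that best predicts the block
    at (cx, cy) in cur, searching integer offsets within radius."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The returned cost is a measure of the prediction error that remains for the chosen vector: a cost of zero means the reference area matches the current block exactly, so only the vector needs to be coded.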

Motion vectors record the distance between two areas on screen based on the number of pixels (also called pels). MPEG-1 video uses a motion vector (MV) precision of one half of one pixel, or half-pel. The finer the precision of the MVs, the more accurate the match is likely to be, and the more efficient the compression. There are trade-offs to higher precision, however. Finer MV precision requires more data to represent each MV, as larger numbers must be stored in the frame for every single MV. It also increases coding complexity, as increasing levels of interpolation on the macroblock are required for both the encoder and decoder. Finally, higher-precision MVs bring diminishing returns (minimal gains). Half-pel precision was chosen as the ideal trade-off for that point in time. (See: qpel)
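
Half-pel precision means the decoder must be able to produce sample values that lie between two stored pixels. A minimal sketch using bilinear averaging is shown below; the rounding convention (adding half before the integer division) and the function name are assumptions for illustration, and coordinates are assumed to be non-negative and inside the frame.

```python
def sample_half_pel(ref, x2, y2):
    """Read a sample at a position given in half-pel units.

    (x2, y2) = (2, 4) is the stored pixel at x=1, y=2;
    (x2, y2) = (3, 4) lies halfway between x=1 and x=2.
    """
    x, y = x2 // 2, y2 // 2
    hx, hy = x2 % 2, y2 % 2      # 1 when the position is between pixels
    a = ref[y][x]
    b = ref[y][x + hx]
    c = ref[y + hy][x]
    d = ref[y + hy][x + hx]
    return (a + b + c + d + 2) // 4
```

At a full-pel position all four terms are the same pixel, so the stored value is returned unchanged. At a half-pel position the result is the rounded average of the two or four neighboring pixels.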

Because neighboring macroblocks are likely to have very similar motion vectors, this redundant information can be compressed quite effectively by being stored DPCM-encoded. Only the (smaller) amount of difference between the MVs for each macroblock needs to be stored in the final bitstream.
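
DPCM itself is simple to sketch: each value is replaced by its difference from the previous one. The example below works on a list of integers (for instance, one component of successive motion vectors). Starting the predictor at 0 is an assumption; the function names are hypothetical.

```python
def dpcm_encode(values, start=0):
    """Replace each value with its difference from the previous value."""
    deltas = []
    previous = start
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def dpcm_decode(deltas, start=0):
    """Rebuild the original values by accumulating the differences."""
    values = []
    previous = start
    for delta in deltas:
        previous += delta
        values.append(previous)
    return values
```

For the sequence `[4, 5, 5, 6, 4]` the encoder emits `[4, 1, 0, 1, -2]`. The small differences cluster around zero, which is what allows the later entropy coding stage to represent them with short codes.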

P-frames have one motion vector per macroblock, relative to the previous anchor frame. B-frames, however, can use two motion vectors; one from the previous anchor frame, and one from the future anchor frame.[54]

Partial macroblocks, and black borders/bars encoded into the video that do not fall exactly on a macroblock boundary, cause havoc with motion prediction. The block padding/border information prevents the macroblock from closely matching with any other area of the video, and so, significantly larger prediction error information must be encoded for every one of the several dozen partial macroblocks along the screen border. DCT encoding and quantization (see below) also aren't nearly as effective when there is large/sharp picture contrast in a block.

An even more serious problem exists with macroblocks that contain significant, random, edge noise, where the picture transitions to (typically) black. All the above problems also apply to edge noise. In addition, the added randomness is simply impossible to compress significantly. All of these effects will lower the quality (or increase the bitrate) of the video substantially.

DCT

Each 8×8 block is encoded by first applying a forward discrete cosine transform (FDCT) and then a quantization process. The FDCT process (by itself) is theoretically lossless, and can be reversed by applying an Inverse DCT (IDCT) to reproduce the original values (in the absence of any quantization and rounding errors). In reality, there are some (sometimes large) rounding errors introduced both by quantization in the encoder (as described in the next section) and by IDCT approximation error in the decoder. The minimum allowed accuracy of a decoder IDCT approximation is defined by ISO/IEC 23002-1. (Prior to 2006, it was specified by IEEE 1180-1990.)

The FDCT process converts the 8×8 block of uncompressed pixel values (brightness or color difference values) into an 8×8 indexed array of frequency coefficient values. One of these is the (statistically high in variance) "DC coefficient", which represents the average value of the entire 8×8 block. The other 63 coefficients are the statistically smaller "AC coefficients", which have positive or negative values each representing sinusoidal deviations from the flat block value represented by the DC coefficient.
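
A direct, unoptimized floating-point implementation of the 8×8 transform pair is sketched below for illustration. It uses the common orthonormal DCT-II scaling, which is an assumption here. With that scaling the DC coefficient equals 8 times the block average rather than the average itself. Real codecs use fast integer approximations instead of this quadruple loop.

```python
import math

N = 8


def _scale(k):
    """Normalization factor for frequency index k."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _basis(position, frequency):
    return math.cos((2 * position + 1) * frequency * math.pi / (2 * N))


def fdct_8x8(block):
    """Forward 2-D DCT of an 8x8 block; result[v][u], DC at [0][0]."""
    return [
        [
            0.25 * _scale(u) * _scale(v) * sum(
                block[y][x] * _basis(x, u) * _basis(y, v)
                for y in range(N) for x in range(N)
            )
            for u in range(N)
        ]
        for v in range(N)
    ]


def idct_8x8(coeffs):
    """Inverse 2-D DCT; reverses fdct_8x8 up to floating-point error."""
    return [
        [
            0.25 * sum(
                _scale(u) * _scale(v) * coeffs[v][u]
                * _basis(x, u) * _basis(y, v)
                for v in range(N) for u in range(N)
            )
            for x in range(N)
        ]
        for y in range(N)
    ]
```

A completely flat block of value 100 transforms to a DC coefficient of 800 with all 63 AC coefficients at (numerically) zero, which illustrates how the transform concentrates the signal into few values.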

[Figure: example of an encoded 8×8 FDCT block]

Since the DC coefficient value is statistically correlated from one block to the next, it is compressed using DPCM encoding. Only the (smaller) amount of difference between each DC value and the value of the DC coefficient in the block to its left needs to be represented in the final bitstream.

Additionally, the frequency conversion performed by applying the DCT provides a statistical decorrelation function to efficiently concentrate the signal into fewer high-amplitude values prior to applying quantization (see below).

Quantization

Quantization is, essentially, the process of reducing the accuracy of a signal, by dividing it by some larger step size and rounding to an integer value (i.e. finding the nearest multiple, and discarding the remainder).

The frame-level quantizer is a number from 0 to 31 (although encoders will usually omit/disable some of the extreme values) which determines how much information will be removed from a given frame. The frame-level quantizer is typically either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user.

A "quantization matrix" is a string of 64 numbers (ranging from 0 to 255) which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image.

[Figure: example quantization matrix]

Quantization is performed by taking each of the 64 frequency values of the DCT block, dividing it by the frame-level quantizer, then dividing it by the corresponding value in the quantization matrix. Finally, the result is rounded down. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high frequency information is less visually important, and so high frequencies are much more strongly quantized (drastically reduced). MPEG-1 actually uses two separate quantization matrices, one for intra-blocks (I-blocks) and one for inter-blocks (P- and B-blocks), so quantization of different block types can be done independently, and so, more effectively.[3]
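
The division described above can be sketched as follows. This follows the simplified description in the text only: the exact scaling constants and rounding rules of the standard are not reproduced, "rounded down" is interpreted as truncation toward zero (an assumption), and the quantizer is assumed to be non-zero. The function names and the matrix values in the usage note are made up for illustration.

```python
def quantize(coeffs, matrix, quantizer):
    """Divide each coefficient by quantizer * matrix entry, truncating."""
    if quantizer <= 0:
        raise ValueError("quantizer must be positive")
    return [
        [int(c / (quantizer * w)) for c, w in zip(coeff_row, weight_row)]
        for coeff_row, weight_row in zip(coeffs, matrix)
    ]


def dequantize(levels, matrix, quantizer):
    """Approximate inverse used by a decoder: multiply back."""
    return [
        [level * quantizer * w for level, w in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, matrix)
    ]
```

With a quantizer of 2, the coefficients `[[800, -35], [12, 3]]` and weights `[[8, 16], [16, 32]]` quantize to `[[50, -1], [0, 0]]` and reconstruct as `[[800, -32], [0, 0]]`. The two small high-frequency values are lost entirely, which is the lossy part of the process.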

This quantization process usually reduces a significant number of the AC coefficients to zero (known as sparse data), which can then be more efficiently compressed by entropy coding (lossless compression) in the next step.

[Figure: example quantized DCT block]

Quantization eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. This is also the primary source of most MPEG-1 video compression artifacts, like blockiness, color banding, noise, ringing, discoloration, et al. This happens when video is encoded with an insufficient bitrate, and the encoder is therefore forced to use high frame-level quantizers (strong quantization) through much of the video.

Entropy coding

Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed upon decoding, to produce exactly the same (original) values. Since these lossless data compression steps don't add noise to, or otherwise change, the contents (unlike quantization), they are sometimes referred to as noiseless coding.[46] Since lossless compression aims to remove as much redundancy as possible, it is known as entropy coding in the field of information theory.

The coefficients of quantized DCT blocks tend to zero towards the bottom-right. Maximum compression can be achieved by a zig-zag scanning of the DCT block starting from the top left and using run-length encoding techniques.

The DC coefficients and motion vectors are DPCM-encoded.

Run-length encoding (RLE) is a simple method of compressing repetition. A sequential string of characters, no matter how long, can be replaced with a few bytes, noting the value that repeats, and how many times. For example, if someone were to say "five nines", you would know they mean the number: 99999.

RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero (called sparse data), and can be represented with just a couple of bytes. This is stored in a special 2-dimensional Huffman table that codes the run-length and the run-ending character.
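
The zig-zag scan and the run-length step can be sketched together. The code below produces (run of zeros, non-zero value) pairs for the AC coefficients and leaves the DC coefficient separate. The function names are hypothetical, and the trailing zeros are simply dropped on the assumption that an end-of-block marker would signal them.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zig-zag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run downwards, even diagonals run upwards.
        return diagonal, row if diagonal % 2 else -row
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_level_pairs(block):
    """Return (dc, pairs) where pairs are (zero_run, value) for AC terms."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs = []
    run = 0
    for value in scan[1:]:       # skip the DC coefficient
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return scan[0], pairs        # zeros after the last pair are implied
```

A block whose only non-zero entries are 50 at the top left, -1 to its right, and 3 two rows below the top left yields a DC of 50 and the pairs `[(0, -1), (1, 3)]`: the remaining 60 zeros cost nothing beyond the end-of-block signal.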

Huffman coding is a very popular and relatively simple method of entropy coding, and is used in MPEG-1 video to reduce the data size. The data is analyzed to find strings that repeat often. Those strings are then put into a special table, with the most frequently repeating data assigned the shortest code. This keeps the data as small as possible with this form of compression.[46] Once the table is constructed, those strings in the data are replaced with their (much smaller) codes, which reference the appropriate entry in the table. The decoder simply reverses this process to produce the original data.
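
The principle of giving frequent symbols the shortest codes can be sketched with a generic Huffman table builder. This is a textbook construction written for illustration: MPEG-1 video relies on code tables fixed by the standard rather than tables built from each stream, and all names below are hypothetical.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a {symbol: bit string} table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique tie-breaker, partial table for that subtree).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so the first match is correct
            out.append(inverse[current])
            current = ""
    return out


data = list("aaaabbc")
table = huffman_table(data)
bits = encode(data, table)
print(table, len(bits))  # 'a' gets a 1-bit code; 10 bits in total
```

Decoding with the same table returns the original symbols, which is what makes this step lossless.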

This is the final step in the video encoding process, so the result of Huffman coding is known as the MPEG-1 video "bitstream."

GOP configurations for specific applications

I-frames store complete frame information within the frame and are therefore suited for random access. P-frames provide compression using motion vectors relative to the previous frame (I or P). B-frames provide maximum compression but require the previous as well as the next frame for computation, so processing B-frames requires more buffering on the decoder side. The configuration of the Group of Pictures (GOP) should be selected based on these factors:

  • I-frame-only sequences give the least compression, but are useful for random access, fast-forward/fast-reverse (FF/FR) and editability.
  • I- and P-frame sequences give moderate compression but add a certain degree of random access and FF/FR functionality.
  • I-, P- and B-frame sequences give very high compression but also increase the coding/decoding delay significantly. Such configurations are therefore not suited for video-telephony or video-conferencing applications.

The typical data rate of an I-frame is 1 bit per pixel while that of a P-frame is 0.1 bit per pixel and for a B-frame, 0.015 bit per pixel.[55]
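
Using these typical per-frame figures, the average cost of a GOP pattern can be estimated with simple arithmetic. The three patterns below are illustrative choices, not configurations prescribed by the standard.

```python
# Typical bits per pixel for each frame type, as quoted above.
BITS_PER_PIXEL = {"I": 1.0, "P": 0.1, "B": 0.015}


def average_bits_per_pixel(gop):
    """Average bits per pixel over one GOP given as a string such as 'IBBP'."""
    return sum(BITS_PER_PIXEL[frame] for frame in gop) / len(gop)


for pattern in ("I", "IPPP", "IBBPBBPBBPBB"):
    print(pattern, round(average_bits_per_pixel(pattern), 4))
```

Under these assumptions, the I-frame-only pattern costs 1 bit per pixel, the I/P pattern about 0.33, and the twelve-frame I/P/B pattern about 0.12.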

Part 3: Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC-11172-3.

MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that it deduces that the human ear can't hear, either because they are in frequencies where the ear has limited sensitivity, or are masked by other (typically louder) sounds.[56]

Channel encoding:

  • Mono
  • Joint Stereo – intensity encoded
  • Joint Stereo – M/S encoded (Layer III only)
  • Stereo
  • Dual (two uncorrelated mono channels)

Sampling rates: 32000, 44100, and 48000 Hz

Bitrates:

  • Layer I: 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416 and 448 kbit/s[57]
  • Layer II: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
  • Layer III: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s

MPEG-1 Audio is divided into 3 layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates than the previous.[16] The layers are semi backwards compatible as higher layers reuse technologies implemented by the lower layers. A "Full" Layer II decoder can also play Layer I audio, but not Layer III audio, although not all higher level players are "full".[56]

Layer I

MPEG-1 Audio Layer I is a simplified version of MPEG-1 Audio Layer II.[18] Layer I uses a smaller 384-sample frame size for very low delay, and finer resolution.[26] This is advantageous for applications like teleconferencing, studio editing, etc. It has lower complexity than Layer II to facilitate real-time encoding on the hardware available c. 1990.[46]

Layer I saw limited adoption in its time, and most notably was used on Philips' defunct Digital Compact Cassette at a bitrate of 384 kbit/s.[2] With the substantial performance improvements in digital processing since its introduction, Layer I quickly became unnecessary and obsolete.

Layer I audio files typically use the extension ".mp1" or sometimes ".m1a".

Layer II

MPEG-1 Audio Layer II (the first version of MP2, often informally called MUSICAM)[56] is a lossy audio format designed to provide high quality at about 192 kbit/s for stereo sound. Decoding MP2 audio is computationally simple relative to MP3, AAC, etc.

History/MUSICAM

MPEG-1 Audio Layer II was derived from the MUSICAM (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing) audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT/CNET)[16][18][58] as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.[56]

Technical details

MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphase filter bank for time-frequency mapping, with overlapping ranges (i.e. polyphase) to prevent aliasing.[59] The psychoacoustic model is based on the principles of auditory masking, simultaneous masking effects, and the absolute threshold of hearing (ATH). The size of a Layer II frame is fixed at 1152 samples (coefficients).
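
The time span covered by one frame follows directly from the frame size and the sampling rate. The short calculation below compares the 1152-sample Layer II frame with the 384-sample Layer I frame; the function name is hypothetical, and the result is the duration of the audio in a frame, not the total codec delay.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Duration of the audio carried by one frame, in milliseconds."""
    return samples_per_frame / sample_rate_hz * 1000


for rate in (32000, 44100, 48000):
    layer1 = frame_duration_ms(384, rate)
    layer2 = frame_duration_ms(1152, rate)
    print(f"{rate} Hz: Layer I {layer1:.2f} ms, Layer II {layer2:.2f} ms")
```

At 48000 Hz a Layer II frame spans 24 ms and a Layer I frame 8 ms, a threefold difference at every sampling rate.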

Time domain refers to how analysis and quantization are performed on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to frequency domain encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. This also offers higher performance on complex, random and transient impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

The 32 sub-band filter bank returns 32 amplitude coefficients, one for each equal-sized frequency band/segment of the audio, which is about 700 Hz wide (depending on the audio's sampling frequency). The encoder then utilizes the psychoacoustic model to determine which sub-bands contain audio information that is less important, and so, where quantization will be inaudible, or at least much less noticeable.[46]
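
The "about 700 Hz" figure can be checked with a short calculation, assuming the 32 equal-width sub-bands together cover the range from 0 Hz up to half the sampling rate (the Nyquist frequency).

```python
SUB_BANDS = 32


def sub_band_width_hz(sample_rate_hz):
    """Width of each of the 32 equal sub-bands covering 0 Hz to the Nyquist frequency."""
    return (sample_rate_hz / 2) / SUB_BANDS


for rate in (32000, 44100, 48000):
    print(rate, sub_band_width_hz(rate))  # 500.0, 689.0625 and 750.0 Hz
```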

 
Example FFT analysis on an audio wave sample.

The psychoacoustic model is applied using a 1024-point fast Fourier transform (FFT). Of the 1152 samples per frame, 64 samples at the top and bottom of the frequency range are ignored for this analysis. They are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the masking threshold, and how much quantization noise each can contain without being perceived. Any sounds below the absolute threshold of hearing (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly.[56][59]

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components.[58]

Less significant sub-bands are reduced in accuracy by quantization. This basically involves compressing the frequency range (amplitude of the coefficient), i.e. raising the noise floor, and then computing an amplification factor for the decoder to use to re-expand each sub-band to the proper frequency range.[60][61]
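
The combination of coarse quantization and an amplification (scale) factor can be sketched generically. This is not the Layer II quantizer itself, which uses tables of scale factors and quantizer classes defined by the standard; the peak-based scale factor, the function names and the sample values are all assumptions made for illustration.

```python
def encode_subband(samples, bits):
    """Normalise samples by their peak, then quantise to signed integers of `bits` bits."""
    scale = max(abs(s) for s in samples) or 1.0  # amplification factor sent to the decoder
    levels = 2 ** (bits - 1) - 1                 # largest code magnitude, e.g. 7 for 4 bits
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def decode_subband(scale, codes, bits):
    """Re-expand the integer codes to the original amplitude range."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


samples = [0.50, -0.25, 0.125, -0.40]  # hypothetical sub-band samples
for bits in (4, 8):
    scale, codes = encode_subband(samples, bits)
    decoded = decode_subband(scale, codes, bits)
    worst = max(abs(a - b) for a, b in zip(samples, decoded))
    print(bits, codes, worst)
```

Fewer bits per sample mean larger rounding errors, which is the raised noise floor described above; the bit allocation decides which sub-bands can tolerate it.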

Layer II can also optionally use intensity stereo coding, a form of joint stereo. This means that the frequencies above 6 kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.[46][58] This perceptual trick is known as "stereo irrelevancy". This can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used with higher bitrates as it does not provide very high quality (transparent) audio.[46][59][62][63]
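
The idea of keeping one combined signal plus per-channel intensity information can be sketched as follows. This is a simplified illustration rather than the Layer II bitstream procedure: it treats one block of high-frequency samples, uses an energy ratio as the intensity value, and all names and numbers are hypothetical.

```python
def energy(samples):
    """Root of the summed squares, used here as a simple loudness measure."""
    return sum(s * s for s in samples) ** 0.5


def intensity_encode(left, right):
    """Down-mix to one channel and keep one gain per original channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    reference = energy(mono) or 1.0
    return mono, energy(left) / reference, energy(right) / reference


def intensity_decode(mono, gain_left, gain_right):
    """Play the single channel on both sides, scaled by the stored gains."""
    return [m * gain_left for m in mono], [m * gain_right for m in mono]


# Hypothetical block where the right channel is a quieter copy of the left.
left = [0.8, -0.4, 0.6]
right = [0.2, -0.1, 0.15]

mono, g_left, g_right = intensity_encode(left, right)
out_left, out_right = intensity_decode(mono, g_left, g_right)
print(g_left, g_right)
print(out_left, out_right)
```

When the two channels differ only in level, as here, the reconstruction is essentially exact. Differences in waveform or phase between the channels are lost, which is why the technique is restricted to high frequencies and lower bitrates.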

Quality

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbit/s for 16-bit 44.1 kHz CD audio using the earliest reference implementation (more recent encoders should presumably perform even better).[2][58][59][64] That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8.[65][66] Achieving much higher compression is simply not possible without discarding some perceptible information.
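
The compression ratio quoted above can be reproduced from the parameters of CD audio (two channels of 16-bit samples at 44.1 kHz). The function name is hypothetical.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bitrate = pcm_bitrate(44100, 16, 2)  # 1411200 bit/s
mp2_bitrate = 256_000                   # 256 kbit/s

print(cd_bitrate / mp2_bitrate)         # about 5.5, i.e. roughly 1:6
```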

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanet, symphonic orchestra, male and female voices and particularly complex and high energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause.[26] More recent testing has shown that MPEG Multichannel (based on MP2), despite being compromised by an inferior matrixed mode (for the sake of backwards compatibility),[2][59] rates just slightly lower than much more recent audio codecs, such as Dolby Digital (AC-3) and Advanced Audio Coding (AAC) (mostly within the margin of error, and substantially superior in some cases, such as audience applause).[67][68] This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate.[69] The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".

Layer III

MPEG-1 Audio Layer III (the first version of MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.

History/ASPEC

 
ASPEC 91 in the Deutsches Museum Bonn, with encoder (below) and decoder

MPEG-1 Audio Layer III was derived from the Adaptive Spectral Perceptual Entropy Coding (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II model (frame size, filter bank, FFT, etc.), to become Layer III.[18]

ASPEC was itself based on Multiple adaptive Spectral audio Coding (MSC) by E. F. Schroeder, Optimum Coding in the Frequency domain (OCF) the doctoral thesis by Karlheinz Brandenburg at the University of Erlangen-Nuremberg, Perceptual Transform Coding (PXFM) by J. D. Johnston at AT&T Bell Labs, and Transform coding of audio signals by Y. Mahieux and J. Petit at Institut für Rundfunktechnik (IRT/CNET).[70]

Technical details

MP3 is a frequency-domain audio transform encoder. Even though it utilizes some of the lower layer functions, MP3 is quite different from MP2.

MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152 sample size output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.[59]

MP3 does not benefit from the 32 sub-band polyphase filter bank, instead just using an 18-point MDCT on each output to split the data into 576 frequency components (32 × 18), and processing it in the frequency domain.[58] This extra granularity allows MP3 to have a much finer psychoacoustic model, and to apply appropriate quantization to each band more carefully, providing much better low-bitrate performance.

Frequency-domain processing imposes some limitations as well, causing a temporal resolution that is 12 or 36 times worse than that of Layer II. This causes quantization artifacts, due to transient sounds like percussive events and other high-frequency events that spread over a larger window. This results in audible smearing and pre-echo.[59] MP3 uses pre-echo detection routines, and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages, in an attempt to reduce this effect. It is also able to switch from the normal 36-sample quantization window to three short 12-sample windows, to reduce the temporal (time) length of quantization artifacts.[59] And yet in choosing a fairly small window size to make MP3's temporal response adequate to avoid the most serious artifacts, MP3 becomes much less efficient in frequency domain compression of stationary, tonal components.

Being forced to use a hybrid time domain (filter bank)/frequency domain (MDCT) model to fit in with Layer II simply wastes processing time and compromises quality by introducing aliasing artifacts. MP3 has an aliasing cancellation stage specifically to mask this problem, but it instead produces frequency domain energy which must be encoded in the audio. This is pushed to the top of the frequency range, where most people have limited hearing, in hopes the distortion it causes will be less audible.

Layer II's 1024 point FFT doesn't entirely cover all samples, and would omit several entire MP3 sub-bands, where quantization factors must be determined. MP3 instead uses two passes of FFT analysis for spectral estimation, to calculate the global and individual masking thresholds. This allows it to cover all 1152 samples. Of the two, it utilizes the global masking threshold level from the more critical pass, with the most difficult audio.

In addition to Layer II's intensity encoded joint stereo, MP3 can use middle/side (mid/side, m/s, MS, matrixed) joint stereo. With mid/side stereo, certain frequency ranges of both channels are merged into a single (middle, mid, L+R) mono channel, while the sound difference between the left and right channels is stored as a separate (side, L-R) channel. Unlike intensity stereo, this process does not discard any audio information. When combined with quantization, however, it can exaggerate artifacts.

If the difference between the left and right channels is small, the side channel will be small, which will offer as much as a 50% bitrate savings, and associated quality improvement. If the difference between left and right is large, standard (discrete, left/right) stereo encoding may be preferred, as mid/side joint stereo will not provide any benefits. An MP3 encoder can switch between m/s stereo and full stereo on a frame-by-frame basis.[58][63][71]
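
The mid/side transform and its reversibility can be sketched as follows. The division by two is one possible normalisation chosen for this illustration and is not necessarily the scaling used by MP3; the sample values are made up.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (sum) and side (difference) channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical, nearly identical channels: the side channel stays small.
left = [100, -50, 30]
right = [98, -52, 31]

mid, side = ms_encode(left, right)
print(mid, side)             # [99.0, -51.0, 30.5] [1.0, 1.0, -0.5]
print(ms_decode(mid, side))  # ([100.0, -50.0, 30.0], [98.0, -52.0, 31.0])
```

Before quantization the transform loses nothing. The saving comes from the side channel having small values that need few bits when the two channels are similar.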

Unlike Layers I and II, MP3 uses variable-length Huffman coding (after perceptual coding) to further reduce the bitrate, without any further quality loss.[56][59]

Quality

MP3's more fine-grained and selective quantization does prove notably superior to MP2 at lower bitrates. It is able to provide nearly equivalent audio quality to Layer II at an approximately 15% lower bitrate.[68][69] 128 kbit/s is considered the "sweet spot" for MP3, meaning it provides generally acceptable quality stereo sound on most music, and there are diminishing quality improvements from increasing the bitrate further. MP3 is also regarded as exhibiting artifacts that are less annoying than those of Layer II when both are used at bitrates too low to provide faithful reproduction.

Layer III audio files use the extension ".mp3".

MPEG-2 audio extensions

The MPEG-2 standard includes several extensions to MPEG-1 Audio.[59] These are known as MPEG-2 BC – backwards compatible with MPEG-1 Audio.[72][73][74][75] MPEG-2 Audio is defined in ISO/IEC 13818-3.

The extensions add lower sampling rates (16000, 22050 and 24000 Hz), which are exactly half of those originally defined for MPEG-1 Audio. They were introduced to maintain higher quality sound when encoding audio at lower bitrates.[25] Even lower bitrates were also introduced, because tests showed that MPEG-1 Audio could provide higher quality than any existing (c. 1994) very low bitrate (i.e. speech) audio codecs.[76]

Part 4: Conformance testing

Part 4 of the MPEG-1 standard covers conformance testing, and is defined in ISO/IEC-11172-4.

Conformance: Procedures for testing conformance.

Provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder.[16][23]

Part 5: Reference software

Part 5 of the MPEG-1 standard includes reference software, and is defined in ISO/IEC TR 11172-5.

Simulation: Reference software.

C reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing.[16][23]

This includes the ISO Dist10 audio encoder code, which LAME and TooLAME were originally based upon.

File extension

.mpg is one of a number of file extensions for MPEG-1 or MPEG-2 audio and video compression. MPEG-1 Part 2 video is rare nowadays, and this extension typically refers to an MPEG program stream (defined in MPEG-1 and MPEG-2) or MPEG transport stream (defined in MPEG-2). Other suffixes such as .m2ts also exist specifying the precise container, in this case MPEG-2 TS, but this has little relevance to MPEG-1 media.

.mp3 is the most common extension for files containing MP3 audio (typically MPEG-1 Audio, sometimes MPEG-2 Audio). An MP3 file is typically an uncontained stream of raw audio; the conventional way to tag MP3 files is by writing data to "garbage" segments of each frame, which preserve the media information but are discarded by the player. This is similar in many respects to how raw .AAC files are tagged (but this is less supported nowadays, e.g. iTunes).

Note that although the .mpg extension could apply, it is not normally used for raw AAC or for AAC in MPEG-2 Part 7 containers. The .aac extension normally denotes these audio files.

See also

Implementations
  • Libavcodec includes MPEG-1/2 video/audio encoders and decoders
  • Mjpegtools MPEG-1/2 video/audio encoders
  • TooLAME A high quality MPEG-1 Audio Layer II encoder.
  • LAME A high quality MP3 audio encoder.
  • Musepack A format originally based on MPEG-1 Audio Layer II, but now incompatible.

References

  1. ^ a b Patel K, Smith BC, Rowe LA (1993-09-01). "Performance of a software MPEG video decoder". Proceedings of the First ACM International Conference on Multimedia. ACM Multimedia. New York City: Association for Computing Machinery: 75–82. doi:10.1145/166266.166274. ISBN 978-0-89791-596-0. S2CID 3773268. Reference 3 in the paper is to Committee Draft of Standard ISO/IEC 11172, December 6, 1991.
  2. ^ a b c d e f Adler, Mark; Popp, Harald; Hjerde, Morten (November 9, 1996), MPEG-FAQ: multimedia compression [1/9], faqs.org, archived from the original on January 4, 2017, retrieved 2016-11-11
  3. ^ a b c d e f g h Le Gall, Didier (April 1991), MPEG: a video compression standard for multimedia applications (PDF), Communications of the ACM, archived (PDF) from the original on 2017-01-27, retrieved 2016-11-11
  4. ^ Chiariglione, Leonardo (October 21, 1989), , ISO/IEC, archived from the original on August 5, 2010, retrieved 2008-04-09
  5. ^ ISO/IEC JTC 1/SC 29 (2009-10-30). . Archived from the original on 2013-12-31. Retrieved 2009-11-10.
  6. ^ ISO. "ISO/IEC 11172-1:1993 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 1: Systems". Archived from the original on 2016-11-12. Retrieved 2016-11-11.
  7. ^ MPEG. . chiariglione.org. Archived from the original on 2008-07-08. Retrieved 2009-10-31.
  8. ^ MPEG. . chiariglione.org. Archived from the original on 2010-02-21. Retrieved 2009-10-31.
  9. ^ a b MPEG. . chiariglione.org. Archived from the original on 2010-04-20. Retrieved 2009-10-31.
  10. ^ Lea, William (1994). . House of Commons Library. Archived from the original on 20 September 2019. Retrieved 20 September 2019.
  11. ^ "History of Video Compression". ITU-T. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). July 2002. pp. 11, 24–9, 33, 40–1, 53–6. Retrieved 3 November 2019.
  12. ^ Ghanbari, Mohammed (2003). Standard Codecs: Image Compression to Advanced Video Coding. Institution of Engineering and Technology. pp. 1–2. ISBN 9780852967102.
  13. ^ "The History of Video File Formats Infographic". RealNetworks. 22 April 2012. Retrieved 5 August 2019.
  14. ^ Hans Geog Musmann, (PDF), archived from the original (PDF) on 2012-01-17, retrieved 2011-07-26
  15. ^ a b c d e f g h i j k l Fogg, Chad (April 2, 1996), , University of California, Berkeley, archived from the original on 2008-06-16, retrieved 2016-11-11
  16. ^ Chiariglione, Leonardo (March 2001), , Linux Journal, archived from the original on 2011-07-25, retrieved 2008-04-09
  17. ^ a b c d Chiariglione, Leonardo; Le Gall, Didier; Musmann, Hans-Georg; Simon, Allen (September 1990), , ISO/IEC, archived from the original on 2010-02-14, retrieved 2008-04-09
  18. ^ , ISO/IEC, archived from the original on 2010-02-10, retrieved 2008-04-09
  19. ^ a b . Archived from the original on 2009-07-23. Retrieved 2008-10-12. Q. Well, then how do I get the documents, like the MPEG I draft? A. MPEG is a draft ISO standard. It's [sic] exact name is ISO CD 11172. [...] You may order it from your national standards body (e.g. ANSI in the USA) or buy it from companies like OMNICOM [...]
  20. ^ (Press release). ISO/IEC JTC1/SC29/WG11. 6 November 1992. Archived from the original on 12 August 2010. Retrieved 7 May 2018.
  21. ^ . Archived from the original on 2008-10-06. Retrieved 2008-07-13. . Archived from the original on 2008-06-12. Retrieved 2008-07-13. A Continuous Media Player, Lawrence A. Rowe and Brian C. Smith, Proc. 3rd Int. Workshop on Network and OS Support for Digital Audio and Video, San Diego CA (November 1992)[dead link]
  22. ^ a b c , ISO/IEC, archived from the original on 2008-07-08, retrieved 2008-04-03
  23. ^ Chiariglione, Leonardo (November 6, 1992), , ISO/IEC, archived from the original on 12 August 2010, retrieved 2008-04-09
  24. ^ a b c Wallace, Greg (April 2, 1993), , ISO/IEC, archived from the original on August 6, 2010, retrieved 2008-04-09
  25. ^ a b c d Popp, Harald; Hjerde, Morten (November 9, 1996), MPEG-FAQ: multimedia compression [2/9], faqs.org, archived from the original on January 4, 2017, retrieved 2016-11-11
  26. ^ . 26 July 2010. Archived from the original on 26 July 2010. Retrieved 7 May 2018.
  27. ^ ISO/IEC JTC 1/SC 29 (2010-07-17). . Archived from the original on 2013-12-31. Retrieved 2010-07-18.
  28. ^ ISO. "ISO/IEC 11172-1:1993 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 1: Systems". Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  29. ^ ISO. "ISO/IEC 11172-2:1993 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 2: Video". Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  30. ^ ISO. "ISO/IEC 11172-3:1993 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 3: Audio". Archived from the original on 2017-05-15. Retrieved 2016-11-11.
  31. ^ ISO. "ISO/IEC 11172-4:1995 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 4: Compliance testing". Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  32. ^ ISO. "ISO/IEC TR 11172-5:1998 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 5: Software simulation". Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  33. ^ Ozer, Jan (October 12, 2001), Choosing the Optimal Video Resolution: The MPEG-2 Player Market, extremetech.com, archived from the original on June 7, 2011, retrieved 2016-11-11
  34. ^ Comparison between MPEG 1 & 2, archived from the original on 2012-02-10, retrieved 2016-11-11
  35. ^ , Pure Motion Ltd., 2003, archived from the original on 2005-12-14, retrieved 2008-04-09
  36. ^ Dave Singer (2007-11-09). "homework] summary of the video (and audio) codec discussion". Archived from the original on December 21, 2016. Retrieved November 11, 2016.
  37. ^ "MPEG-1 Video Coding (H.261)". Library of Congress, Digital Preservation. October 21, 2014. Archived from the original on January 11, 2017. Retrieved 2016-11-11.
  38. ^ "ISO Standards and Patents". Archived from the original on 2016-11-15. Retrieved 2016-11-11. Search for 11172
  39. ^ . archive.ph. Archived from the original on 2008-09-16. Retrieved 2023-01-21.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  40. ^ "[gst-devel] Can a MPEG-1 with Audio Layers 1&2 plugin be in plugins-good (patentwise)?". SourceForge.net. 2008-08-23. Archived from the original on 2014-02-02. Retrieved 2016-11-11.
  41. ^ . lists.whatwg.org. Archived from the original on 19 July 2011. Retrieved 11 January 2022.
  42. ^ http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=5214678 Archived 2012-07-13 at archive.today "Digital transmission system using subband coding of a digital signal" Filed: May 31, 1990, Granted May 25, 1993, Expires May 31, 2010?
  43. ^ "mp3". Fraunhofer Institute for Integrated Circuits IIS. Archived from the original on 22 March 2018. Retrieved 7 May 2018.
  44. ^ "ISO Standards and Patents". ISO. Retrieved 10 July 2019.
  45. ^ a b c d e f g Grill, B.; Quackenbush, S. (October 2005), , ISO/IEC, archived from the original on 2010-04-30
  46. ^ Chiariglione, Leonardo, MPEG-1 Systems, ISO/IEC, archived from the original on 2016-11-12, retrieved 2016-11-11
  47. ^ a b Pack Header, archived from the original on 2016-10-27, retrieved 2016-11-11
  48. ^ Fimoff, Mark; Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, archived from the original on November 12, 2016, retrieved 2016-11-11
  49. ^ Fimoff, Mark; Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, archived from the original on November 5, 2016, retrieved 2016-11-11
  50. ^ Fimoff, Mark; Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, archived from the original on November 5, 2016, retrieved 2016-11-11
  51. ^ Fimoff, Mark; Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, archived from the original on November 12, 2016, retrieved 2016-11-11
  52. ^ Acharya, Soam; Smith, Brian (1998), Compressed Domain Transcoding of MPEG, Cornell University, IEEE Computer Society, IEEE International Conference on Multimedia Computing and Systems, p. 3, archived from the original on 2011-02-23, retrieved 2016-11-11 – (Requires clever reading: says quantization matrices differ, but those are just defaults, and selectable)(registration required)
  53. ^ a b c Wee, Susie J.; Vasudev, Bhaskaran; Liu, Sam (March 13, 1997), , Hewlett-Packard, CiteSeerX 10.1.1.24.633, archived from the original on 2007-08-17, retrieved 2016-11-11
  54. ^ . Archived from the original on 2009-05-03. Retrieved 2009-05-03.
  55. ^ a b c d e f Thom, D.; Purnhagen, H. (October 1998), , ISO/IEC, archived from the original on 2010-02-18, retrieved 2016-11-11
  56. ^ , archived from the original on 2015-02-08, retrieved 2016-11-11
  57. ^ a b c d e f Church, Steve, , NAB Engineering Handbook, Telos Systems, archived from the original on 2001-05-08, retrieved 2008-04-09
  58. ^ a b c d e f g h i j Pan, Davis (Summer 1995), (PDF), IEEE MultiMedia Journal, p. 8, archived from the original (PDF) on 2004-09-19, retrieved 2008-04-09
  59. ^ Smith, Brian (1996), A Survey of Compressed Domain Processing Techniques, Cornell University, p. 7, archived from the original on 2011-02-23, retrieved 2008-04-09(registration required)
  60. ^ Cheng, Mike, Psychoacoustic Models in TwoLAME, twolame.org, archived from the original on 2016-10-22, retrieved 2016-11-11
  61. ^ Grill, B.; Quackenbush, S. (October 2005), , archived from the original on 2008-04-27, retrieved 2016-11-11
  62. ^ a b Herre, Jurgen (October 5, 2004), (PDF), International Conference on Digital Audio Effects, p. 2, archived from the original (PDF) on April 5, 2006, retrieved 2008-04-17
  63. ^ C.Grewin, and T.Ryden, Subjective Assessments on Low Bit-rate Audio Codecs, Proceedings of the 10th International AES Conference, pp 91 - 102, London 1991
  64. ^ J. Johnston, Estimation of Perceptual Entropy Using Noise Masking Criteria, in Proc. ICASSP-88, pp. 2524-2527, May 1988.
  65. ^ J. Johnston, Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE Journal on Select Areas in Communications, vol. 6, no. 2, pp. 314-323, Feb. 1988.
  66. ^ Wustenhagen et al., Subjective Listening Test of Multi-channel Audio Codecs, AES 105th Convention Paper 4813, San Francisco 1998
  67. ^ a b B/MAE Project Group (September 2007), (PDF), European Broadcasting Union, archived from the original (PDF) on 2008-10-30, retrieved 2008-04-09
  68. ^ a b Meares, David; Watanabe, Kaoru; Scheirer, Eric (February 1998), (PDF), ISO/IEC, p. 18, archived from the original (PDF) on April 14, 2008, retrieved 2016-11-11
  69. ^ Painter, Ted; Spanias, Andreas (April 2000), (PDF), Proceedings of the IEEE, archived from the original (PDF) on September 16, 2006, retrieved 2016-11-11
  70. ^ Amorim, Roberto (September 19, 2006), GPSYCHO - Mid/Side Stereo, LAME, archived from the original on December 16, 2016, retrieved 2016-11-11
  71. ^ ISO (October 1998). . ISO. Archived from the original on 2010-02-18. Retrieved 2016-11-11.
  72. ^ D. Thom, H. Purnhagen, and the MPEG Audio Subgroup (October 1998). "MPEG Audio FAQ Version 9 - MPEG Audio". Archived from the original on 2011-08-07. Retrieved 2016-11-11.
  73. ^ MPEG.ORG. . Archived from the original on 2007-08-31. Retrieved 2009-10-28.
  74. ^ ISO (2006-01-15), ISO/IEC 13818-7, Fourth edition, Part 7 – Advanced Audio Coding (AAC) (PDF), archived (PDF) from the original on 2009-03-06, retrieved 2016-11-11
  75. ^ Chiariglione, Leonardo (November 11, 1994), , ISO/IEC, archived from the original on August 8, 2010, retrieved 2008-04-09

External links

  • Official Web Page of the Moving Picture Experts Group (MPEG) a working group of ISO/IEC
  • MPEG Industry Forum Organization
  • Source Code to Implement MPEG-1

the MPEG 1 standard began in May 1988 Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation The codecs were extensively tested for computational complexity and subjective human perceived quality at data rates of 1 5 Mbit s This specific bitrate was chosen for transmission over T 1 E 1 lines and as the approximate data rate of audio CDs 17 The codecs that excelled in this testing were utilized as the basis for the standard and refined further with additional features and other improvements being incorporated in the process 18 After 20 meetings of the full group in various cities around the world and 4 years of development and testing the final standard for parts 1 3 was approved in early November 1992 and published a few months later 19 The reported completion date of the MPEG 1 standard varies greatly a largely complete draft standard was produced in September 1990 and from that point on only minor changes were introduced 3 The draft standard was publicly available for purchase 20 The standard was finished with the 6 November 1992 meeting 21 The Berkeley Plateau Multimedia Research Group developed an MPEG 1 decoder in November 1992 22 In July 1990 before the first draft of the MPEG 1 standard had even been written work began on a second standard MPEG 2 23 intended to extend MPEG 1 technology to provide full broadcast quality video as per CCIR 601 at high bitrates 3 15 Mbit s and support for interlaced video 24 Due in part to the similarity between the two codecs the MPEG 2 standard includes full backwards compatibility with MPEG 1 video so any MPEG 2 decoder can play MPEG 1 videos 25 Notably the MPEG 1 standard very strictly defines the bitstream and decoder function but does not define how MPEG 1 encoding is to be performed although a reference implementation is provided in ISO IEC 11172 5 2 This means that MPEG 1 coding efficiency can drastically vary depending on the encoder used and generally 
means that newer encoders perform significantly better than their predecessors 26 The first three parts Systems Video and Audio of ISO IEC 11172 were published in August 1993 27 MPEG 1 Parts 9 28 Part Number First publicrelease date first edition latestcorrection Title DescriptionPart 1 ISO IEC 11172 1 1993 1999 29 SystemsPart 2 ISO IEC 11172 2 1993 2006 30 VideoPart 3 ISO IEC 11172 3 1993 1996 31 AudioPart 4 ISO IEC 11172 4 1995 2007 32 Compliance testingPart 5 ISO IEC TR 11172 5 1998 2007 33 Software simulationPatents EditDue to its age MPEG 1 is no longer covered by any essential patents and can thus be used without obtaining a licence or paying any fees 34 35 36 37 38 The ISO patent database lists one patent for ISO 11172 US 4 472 747 which expired in 2003 39 The near complete draft of the MPEG 1 standard was publicly available as ISO CD 11172 20 by December 6 1991 1 Neither the July 2008 Kuro5hin article Patent Status of MPEG 1 H 261 and MPEG 2 40 nor an August 2008 thread on the gstreamer devel 41 mailing list were able to list a single unexpired MPEG 1 Video and MPEG 1 Audio Layer I II patent A May 2009 discussion on the whatwg mailing list mentioned US 5 214 678 patent as possibly covering MPEG 1 Audio Layer II 42 Filed in 1990 and published in 1993 this patent is now expired 43 A full MPEG 1 decoder and encoder with Layer III audio could not be implemented royalty free since there were companies that required patent fees for implementations of MPEG 1 Audio Layer III as discussed in the MP3 article All patents in the world connected to MP3 expired 30 December 2017 which makes this format totally free for use citation needed On 23 April 2017 Fraunhofer IIS stopped charging for Technicolor s MP3 licensing program for certain MP3 related patents and software 44 Former patent holders Edit The following corporations filed declarations with ISO saying they held patents for the MPEG 1 Video ISO IEC 11172 2 format although all such patents have since expired 45 BBC 
Daimler Benz AG Fujitsu IBM Matsushita Electric Industrial Co Ltd Mitsubishi Electric NEC NHK Philips Pioneer Corporation Qualcomm Ricoh Sony Texas Instruments Thomson Multimedia Toppan Printing Toshiba Victor Company of JapanApplications EditMost popular software for video playback includes MPEG 1 decoding in addition to any other supported formats The popularity of MP3 audio has established a massive installed base of hardware that can play back MPEG 1 Audio all three layers Virtually all digital audio devices can play back MPEG 1 Audio 46 Many millions have been sold to date Before MPEG 2 became widespread many digital satellite cable TV services used MPEG 1 exclusively 16 26 The widespread popularity of MPEG 2 with broadcasters means MPEG 1 is playable by most digital cable and satellite set top boxes and digital disc and tape players due to backwards compatibility MPEG 1 was used for full screen video on Green Book CD i and on Video CD VCD The Super Video CD standard based on VCD uses MPEG 1 audio exclusively as well as MPEG 2 video The DVD Video format uses MPEG 2 video primarily but MPEG 1 support is explicitly defined in the standard The DVD Video standard originally required MPEG 1 Audio Layer II for PAL countries but was changed to allow AC 3 Dolby Digital only discs MPEG 1 Audio Layer II is still allowed on DVDs although newer extensions to the format like MPEG Multichannel are rarely supported Most DVD players also support Video CD and MP3 CD playback which use MPEG 1 The international Digital Video Broadcasting DVB standard primarily uses MPEG 1 Audio Layer II and MPEG 2 video The international Digital Audio Broadcasting DAB standard uses MPEG 1 Audio Layer II exclusively due to its especially high quality modest decoder performance requirements and tolerance of errors The Digital Compact Cassette uses PASC Precision Adaptive Sub band Coding to encode its audio PASC is an early version of MPEG 1 Audio Layer I with a fixed bit rate of 384 kilobits per 
second Part 1 Systems EditPart 1 of the MPEG 1 standard covers systems and is defined in ISO IEC 11172 1 MPEG 1 Systems specifies the logical layout and methods used to store the encoded audio video and other data into a standard bitstream and to maintain synchronization between the different contents This file format is specifically designed for storage on media and transmission over communication channels that are considered relatively reliable Only limited error protection is defined by the standard and small errors in the bitstream may cause noticeable defects This structure was later named an MPEG program stream The MPEG 1 Systems design is essentially identical to the MPEG 2 Program Stream structure 47 This terminology is more popular precise differentiates it from an MPEG transport stream and will be used here Elementary streams packets and clock references Edit Elementary Streams ES are the raw bitstreams of MPEG 1 audio and video encoded data output from an encoder These files can be distributed on their own such as is the case with MP3 files Packetized Elementary Streams PES are elementary streams packetized into packets of variable lengths i e divided ES into independent chunks where cyclic redundancy check CRC checksum was added to each packet for error detection System Clock Reference SCR is a timing value stored in a 33 bit header of each PES at a frequency precision of 90 kHz with an extra 9 bit extension that stores additional timing data with a precision of 27 MHz 48 49 These are inserted by the encoder derived from the system time clock STC Simultaneously encoded audio and video streams will not have identical SCR values however due to buffering encoding jitter and other delay Program streams Edit Further information MPEG program stream Program Streams PS are concerned with combining multiple packetized elementary streams usually just one audio and video PES into a single stream ensuring simultaneous delivery and maintaining synchronization The PS 
structure is known as a multiplex or a container format Presentation time stamps PTS exist in PS to correct the inevitable disparity between audio and video SCR values time base correction 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values 48 PTS determines when to display a portion of an MPEG program and is also used by the decoder to determine when data can be discarded from the buffer 50 Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded PTS handling can be problematic Decoders must accept multiple program streams that have been concatenated joined sequentially This causes PTS values in the middle of the video to reset to zero which then begin incrementing again Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder Decoding Time Stamps DTS additionally are required because of B frames With B frames in the video stream adjacent frames have to be encoded and decoded out of order re ordered frames DTS is quite similar to PTS but instead of just handling sequential frames it contains the proper time stamps to tell the decoder when to decode and display the next B frame types of frames explained below ahead of its anchor P or I frame Without B frames in the video PTS and DTS values are identical 51 Multiplexing Edit To generate the PS the multiplexer will interleave the two or more packetized elementary streams This is done so the packets of the simultaneous streams can be transferred over the same channel and are guaranteed to both arrive at the decoder at precisely the same time This is a case of time division multiplexing Determining how much data from each stream should be in each interleaved segment the size of the interleave is complicated yet an important requirement Improper interleaving will result in buffer underflows or overflows as the receiver gets more of one stream than it can 
store e g audio before it gets enough data to decode the other simultaneous stream e g video The MPEG Video Buffering Verifier VBV assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size 52 This offers feedback to the multiplexer and the encoder so that they can change the multiplex size or adjust bitrates as needed for compliance Part 2 Video EditPart 2 of the MPEG 1 standard covers video and is defined in ISO IEC 11172 2 The design was heavily influenced by H 261 MPEG 1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive It also exploits temporal over time and spatial across a picture redundancy common in video to achieve better data compression than would be possible otherwise See Video compression Color space Edit Example of 4 2 0 subsampling The two overlapping center circles represent chroma blue and chroma red color pixels while the 4 outside circles represent the luma brightness Before encoding video to MPEG 1 the color space is transformed to Y CbCr Y Luma Cb Chroma Blue Cr Chroma Red Luma brightness resolution is stored separately from chroma color hue phase and even further separated into red and blue components The chroma is also subsampled to 4 2 0 meaning it is reduced to half resolution vertically and half resolution horizontally i e to just one quarter the number of samples used for the luma component of the video 2 This use of higher resolution for some color components is similar in concept to the Bayer pattern filter that is commonly used for the image capturing sensor in digital color cameras Because the human eye is much more sensitive to small changes in brightness the Y component than in color the Cr and Cb components chroma subsampling is a very effective way to reduce 
the amount of video data that needs to be compressed However on videos with fine detail high spatial complexity this can manifest as chroma aliasing artifacts Compared to other digital compression artifacts this issue seems to very rarely be a source of annoyance Because of the subsampling Y CbCr 4 2 0 video is ordinarily stored using even dimensions divisible by 2 horizontally and vertically Y CbCr color is often informally called YUV to simplify the notation although that term more properly applies to a somewhat different color format Similarly the terms luminance and chrominance are often used instead of the more accurate terms luma and chroma Resolution bitrate Edit MPEG 1 supports resolutions up to 4095 4095 12 bits and bit rates up to 100 Mbit s 16 MPEG 1 videos are most commonly seen using Source Input Format SIF resolution 352 240 352 288 or 320 240 These relatively low resolutions combined with a bitrate less than 1 5 Mbit s make up what is known as a constrained parameters bitstream CPB later renamed the Low Level LL profile in MPEG 2 This is the minimum video specifications any decoder should be able to handle to be considered MPEG 1 compliant This was selected to provide a good balance between quality and performance allowing the use of reasonably inexpensive hardware of the time 3 16 Frame picture block types Edit MPEG 1 has several frame picture types that serve different purposes The most important yet simplest is I frame I frames Edit I frame is an abbreviation for Intra frame so called because they can be decoded independently of any other frames They may also be known as I pictures or keyframes due to their somewhat similar function to the key frames used in animation I frames can be considered effectively identical to baseline JPEG images 16 High speed seeking through an MPEG 1 video is only possible to the nearest I frame When cutting a video it is not possible to start playback of a segment of video before the first I frame in the segment at 
least not without computationally intensive re encoding For this reason I frame only MPEG videos are used in editing applications I frame only compression is very fast but produces very large file sizes a factor of 3 or more larger than normally encoded MPEG 1 video depending on how temporally complex a specific video is 3 I frame only MPEG 1 video is very similar to MJPEG video So much so that very high speed and theoretically lossless in reality there are rounding errors conversion can be made from one format to the other provided a couple of restrictions color space and quantization matrix are followed in the creation of the bitstream 53 The length between I frames is known as the group of pictures GOP size MPEG 1 most commonly uses a GOP size of 15 18 i e 1 I frame for every 14 17 non I frames some combination of P and B frames With more intelligent encoders GOP size is dynamically chosen up to some pre selected maximum limit 16 Limits are placed on the maximum number of frames between I frames due to decoding complexing decoder buffer size recovery time after data errors seeking ability and accumulation of IDCT errors in low precision implementations most common in hardware decoders See IEEE 1180 P frames Edit P frame is an abbreviation for Predicted frame They may also be called forward predicted frames or inter frames B frames are also inter frames P frames exist to improve compression by exploiting the temporal over time redundancy in a video P frames store only the difference in image from the frame either an I frame or P frame immediately preceding it this reference frame is also called the anchor frame The difference between a P frame and its anchor frame is calculated using motion vectors on each macroblock of the frame see below Such motion vector data will be embedded in the P frame for use by the decoder A P frame can contain any number of intra coded blocks in addition to any forward predicted blocks 54 If a video drastically changes from one frame 
to the next such as a cut it is more efficient to encode it as an I frame B frames Edit B frame stands for bidirectional frame or bipredictive frame They may also be known as backwards predicted frames or B pictures B frames are quite similar to P frames except they can make predictions using both the previous and future frames i e two anchor frames It is therefore necessary for the player to first decode the next I or P anchor frame sequentially after the B frame before the B frame can be decoded and displayed This means decoding B frames requires larger data buffers and causes an increased delay on both decoding and during encoding This also necessitates the decoding time stamps DTS feature in the container system stream see above As such B frames have long been subject of much controversy they are often avoided in videos and are sometimes not fully supported by hardware decoders No other frames are predicted from a B frame Because of this a very low bitrate B frame can be inserted where needed to help control the bitrate If this was done with a P frame future P frames would be predicted from it and would lower the quality of the entire sequence However similarly the future P frame must still encode all the changes between it and the previous I or P anchor frame B frames can also be beneficial in videos where the background behind an object is being revealed over several frames or in fading transitions such as scene changes 3 16 A B frame can contain any number of intra coded blocks and forward predicted blocks in addition to backwards predicted or bidirectionally predicted blocks 16 54 D frames Edit MPEG 1 has a unique frame type not found in later video standards D frames or DC pictures are independently coded images intra frames that have been encoded using DC transform coefficients only AC coefficients are removed when encoding D frames see DCT below and hence are very low quality D frames are never referenced by I P or B frames D frames are only used for 
fast previews of video for instance when seeking through a video at high speed 3 Given moderately higher performance decoding equipment fast preview can be accomplished by decoding I frames instead of D frames This provides higher quality previews since I frames contain AC coefficients as well as DC coefficients If the encoder can assume that rapid I frame decoding capability is available in decoders it can save bits by not sending D frames thus improving compression of the video content For this reason D frames are seldom actually used in MPEG 1 video encoding and the D frame feature has not been included in any later video coding standards Macroblocks Edit Main article Macroblock MPEG 1 operates on video in a series of 8 8 blocks for quantization However to reduce the bit rate needed for motion vectors and because chroma color is subsampled by a factor of 4 each pair of red and blue chroma blocks corresponds to 4 different luma blocks This set of 6 blocks with a resolution of 16 16 is processed together and called a macroblock A macroblock is the smallest independent unit of color video Motion vectors see below operate solely at the macroblock level If the height or width of the video are not exact multiples of 16 full rows and full columns of macroblocks must still be encoded and decoded to fill out the picture though the extra decoded pixels are not displayed Motion vectors Edit To decrease the amount of temporal redundancy in a video only blocks that change are updated up to the maximum GOP size This is known as conditional replenishment However this is not very effective by itself Movement of the objects and or the camera may result in large portions of the frame needing to be updated even though only the position of the previously encoded objects has changed Through motion estimation the encoder can compensate for this movement and remove a large amount of redundant information The encoder compares the current frame with adjacent parts of the video from the 
anchor frame previous I or P frame in a diamond pattern up to a encoder specific predefined radius limit from the area of the current macroblock If a match is found only the direction and distance i e the vector of the motion from the previous video area to the current macroblock need to be encoded into the inter frame P or B frame The reverse of this process performed by the decoder to reconstruct the picture is called motion compensation A predicted macroblock rarely matches the current picture perfectly however The differences between the estimated matching area and the real frame macroblock is called the prediction error The larger the amount of prediction error the more data must be additionally encoded in the frame For efficient video compression it is very important that the encoder is capable of effectively and precisely performing motion estimation Motion vectors record the distance between two areas on screen based on the number of pixels also called pels MPEG 1 video uses a motion vector MV precision of one half of one pixel or half pel The finer the precision of the MVs the more accurate the match is likely to be and the more efficient the compression There are trade offs to higher precision however Finer MV precision results in using a larger amount of data to represent the MV as larger numbers must be stored in the frame for every single MV increased coding complexity as increasing levels of interpolation on the macroblock are required for both the encoder and decoder and diminishing returns minimal gains with higher precision MVs Half pel precision was chosen as the ideal trade off for that point in time See qpel Because neighboring macroblocks are likely to have very similar motion vectors this redundant information can be compressed quite effectively by being stored DPCM encoded Only the smaller amount of difference between the MVs for each macroblock needs to be stored in the final bitstream P frames have one motion vector per macroblock relative 
to the previous anchor frame B frames however can use two motion vectors one from the previous anchor frame and one from the future anchor frame 54 Partial macroblocks and black borders bars encoded into the video that do not fall exactly on a macroblock boundary cause havoc with motion prediction The block padding border information prevents the macroblock from closely matching with any other area of the video and so significantly larger prediction error information must be encoded for every one of the several dozen partial macroblocks along the screen border DCT encoding and quantization see below also isn t nearly as effective when there is large sharp picture contrast in a block An even more serious problem exists with macroblocks that contain significant random edge noise where the picture transitions to typically black All the above problems also apply to edge noise In addition the added randomness is simply impossible to compress significantly All of these effects will lower the quality or increase the bitrate of the video substantially DCT Edit Each 8 8 block is encoded by first applying a forward discrete cosine transform FDCT and then a quantization process The FDCT process by itself is theoretically lossless and can be reversed by applying an Inverse DCT IDCT to reproduce the original values in the absence of any quantization and rounding errors In reality there are some sometimes large rounding errors introduced both by quantization in the encoder as described in the next section and by IDCT approximation error in the decoder The minimum allowed accuracy of a decoder IDCT approximation is defined by ISO IEC 23002 1 Prior to 2006 it was specified by IEEE 1180 1990 The FDCT process converts the 8 8 block of uncompressed pixel values brightness or color difference values into an 8 8 indexed array of frequency coefficient values One of these is the statistically high in variance DC coefficient which represents the average value of the entire 8 8 block The 
other 63 coefficients are the statistically smaller AC coefficients which have positive or negative values each representing sinusoidal deviations from the flat block value represented by the DC coefficient An example of an encoded 8 8 FDCT block 415 30 61 27 56 20 2 0 4 22 61 10 13 7 9 5 47 7 77 25 29 10 5 6 49 12 34 15 10 6 2 2 12 7 13 4 2 2 3 3 8 3 2 6 2 1 4 2 1 0 0 2 1 3 4 1 0 0 1 4 1 0 1 2 displaystyle begin bmatrix 415 amp 30 amp 61 amp 27 amp 56 amp 20 amp 2 amp 0 4 amp 22 amp 61 amp 10 amp 13 amp 7 amp 9 amp 5 47 amp 7 amp 77 amp 25 amp 29 amp 10 amp 5 amp 6 49 amp 12 amp 34 amp 15 amp 10 amp 6 amp 2 amp 2 12 amp 7 amp 13 amp 4 amp 2 amp 2 amp 3 amp 3 8 amp 3 amp 2 amp 6 amp 2 amp 1 amp 4 amp 2 1 amp 0 amp 0 amp 2 amp 1 amp 3 amp 4 amp 1 0 amp 0 amp 1 amp 4 amp 1 amp 0 amp 1 amp 2 end bmatrix Since the DC coefficient value is statistically correlated from one block to the next it is compressed using DPCM encoding Only the smaller amount of difference between each DC value and the value of the DC coefficient in the block to its left needs to be represented in the final bitstream Additionally the frequency conversion performed by applying the DCT provides a statistical decorrelation function to efficiently concentrate the signal into fewer high amplitude values prior to applying quantization see below Quantization Edit Quantization is essentially the process of reducing the accuracy of a signal by dividing it by some larger step size and rounding to an integer value i e finding the nearest multiple and discarding the remainder The frame level quantizer is a number from 0 to 31 although encoders will usually omit disable some of the extreme values which determines how much information will be removed from a given frame The frame level quantizer is typically either dynamically selected by the encoder to maintain a certain user specified bitrate or much less commonly directly specified by the user A quantization matrix is a string of 64 numbers ranging from 0 to 
255 which tells the encoder how relatively important or unimportant each piece of visual information is Each number in the matrix corresponds to a certain frequency component of the video image An example quantization matrix 16 11 10 16 24 40 51 61 12 12 14 19 26 58 60 55 14 13 16 24 40 57 69 56 14 17 22 29 51 87 80 62 18 22 37 56 68 109 103 77 24 35 55 64 81 104 113 92 49 64 78 87 103 121 120 101 72 92 95 98 112 100 103 99 displaystyle begin bmatrix 16 amp 11 amp 10 amp 16 amp 24 amp 40 amp 51 amp 61 12 amp 12 amp 14 amp 19 amp 26 amp 58 amp 60 amp 55 14 amp 13 amp 16 amp 24 amp 40 amp 57 amp 69 amp 56 14 amp 17 amp 22 amp 29 amp 51 amp 87 amp 80 amp 62 18 amp 22 amp 37 amp 56 amp 68 amp 109 amp 103 amp 77 24 amp 35 amp 55 amp 64 amp 81 amp 104 amp 113 amp 92 49 amp 64 amp 78 amp 87 amp 103 amp 121 amp 120 amp 101 72 amp 92 amp 95 amp 98 amp 112 amp 100 amp 103 amp 99 end bmatrix Quantization is performed by taking each of the 64 frequency values of the DCT block dividing them by the frame level quantizer then dividing them by their corresponding values in the quantization matrix Finally the result is rounded down This significantly reduces or completely eliminates the information in some frequency components of the picture Typically high frequency information is less visually important and so high frequencies are much more strongly quantized drastically reduced MPEG 1 actually uses two separate quantization matrices one for intra blocks I blocks and one for inter block P and B blocks so quantization of different block types can be done independently and so more effectively 3 This quantization process usually reduces a significant number of the AC coefficients to zero known as sparse data which can then be more efficiently compressed by entropy coding lossless compression in the next step An example quantized DCT block 26 3 6 2 2 1 0 0 0 2 4 1 1 0 0 0 3 1 5 1 1 0 0 0 4 1 2 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 displaystyle begin 
bmatrix 26 amp 3 amp 6 amp 2 amp 2 amp 1 amp 0 amp 0 0 amp 2 amp 4 amp 1 amp 1 amp 0 amp 0 amp 0 3 amp 1 amp 5 amp 1 amp 1 amp 0 amp 0 amp 0 4 amp 1 amp 2 amp 1 amp 0 amp 0 amp 0 amp 0 1 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 amp 0 end bmatrix Quantization eliminates a large amount of data and is the main lossy processing step in MPEG 1 video encoding This is also the primary source of most MPEG 1 video compression artifacts like blockiness color banding noise ringing discoloration et al This happens when video is encoded with an insufficient bitrate and the encoder is therefore forced to use high frame level quantizers strong quantization through much of the video Entropy coding Edit Several steps in the encoding of MPEG 1 video are lossless meaning they will be reversed upon decoding to produce exactly the same original values Since these lossless data compression steps don t add noise into or otherwise change the contents unlike quantization it is sometimes referred to as noiseless coding 46 Since lossless compression aims to remove as much redundancy as possible it is known as entropy coding in the field of information theory The coefficients of quantized DCT blocks tend to zero towards the bottom right Maximum compression can be achieved by a zig zag scanning of the DCT block starting from the top left and using Run length encoding techniques The DC coefficients and motion vectors are DPCM encoded Run length encoding RLE is a simple method of compressing repetition A sequential string of characters no matter how long can be replaced with a few bytes noting the value that repeats and how many times For example if someone were to say five nines you would know they mean the number 99999 RLE is particularly effective after quantization as a significant number of the AC coefficients are now zero called sparse data and can be represented 
with just a couple of bytes This is stored in a special 2 dimensional Huffman table that codes the run length and the run ending character Huffman Coding is a very popular and relatively simple method of entropy coding and used in MPEG 1 video to reduce the data size The data is analyzed to find strings that repeat often Those strings are then put into a special table with the most frequently repeating data assigned the shortest code This keeps the data as small as possible with this form of compression 46 Once the table is constructed those strings in the data are replaced with their much smaller codes which reference the appropriate entry in the table The decoder simply reverses this process to produce the original data This is the final step in the video encoding process so the result of Huffman coding is known as the MPEG 1 video bitstream GOP configurations for specific applications Edit I frames store complete frame info within the frame and are therefore suited for random access P frames provide compression using motion vectors relative to the previous frame I or P B frames provide maximum compression but require the previous as well as next frame for computation Therefore processing of B frames requires more buffer on the decoded side A configuration of the Group of Pictures GOP should be selected based on these factors I frame only sequences give least compression but are useful for random access FF FR and editability I and P frame sequences give moderate compression but add a certain degree of random access FF FR functionality I P and B frame sequences give very high compression but also increase the coding decoding delay significantly Such configurations are therefore not suited for video telephony or video conferencing applications The typical data rate of an I frame is 1 bit per pixel while that of a P frame is 0 1 bit per pixel and for a B frame 0 015 bit per pixel 55 Part 3 Audio EditPart 3 of the MPEG 1 standard covers audio and is defined in ISO 
IEC 11172-3.

MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that it deduces the human ear cannot hear, either because they are in frequencies where the ear has limited sensitivity or because they are masked by other (typically louder) sounds.[56]

Channel encoding: mono; joint stereo (intensity encoded); joint stereo (M/S encoded, Layer III only); stereo; dual (two uncorrelated mono channels).
Sampling rates: 32000, 44100, and 48000 Hz.
Bitrates for Layer I: 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416 and 448 kbit/s.[57]
Bitrates for Layer II: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s.
Bitrates for Layer III: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s.

MPEG-1 Audio is divided into 3 layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates than the previous one.[16] The layers are semi backwards compatible, as higher layers reuse technologies implemented by the lower layers. A "full" Layer II decoder can also play Layer I audio, but not Layer III audio, although not all higher-level players are "full".[56]

Layer I

Main article: MPEG-1 Audio Layer I

MPEG-1 Audio Layer I is a simplified version of MPEG-1 Audio Layer II.[18] Layer I uses a smaller 384-sample frame size for very low delay and finer resolution.[26] This is advantageous for applications such as teleconferencing and studio editing. It has lower complexity than Layer II to facilitate real-time encoding on the hardware available c. 1990.[46]

Layer I saw limited adoption in its time, and most notably was used on Philips' defunct Digital Compact Cassette at a bitrate of 384 kbit/s.[2] With the substantial performance improvements in digital processing since its introduction, Layer I quickly became unnecessary and obsolete.

Layer I audio files typically use the extension ".mp1" or sometimes ".m1a".

Layer II

Main article: MPEG-1 Audio Layer II

MPEG-1 Audio Layer II, the first
version of MP2, often informally called MUSICAM,[56] is a lossy audio format designed to provide high quality at about 192 kbit/s for stereo sound. Decoding MP2 audio is computationally simple relative to MP3, AAC, etc.

History/MUSICAM

MPEG-1 Audio Layer II was derived from the MUSICAM (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing) audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT/CNET)[16][18][58] as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.[56]

Technical details

MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphased filter bank for time-frequency mapping, with overlapping ranges (i.e. polyphased) to prevent aliasing.[59] The psychoacoustic model is based on the principles of auditory masking, simultaneous masking effects, and the absolute threshold of hearing (ATH). The size of a Layer II frame is fixed at 1152 samples (coefficients).

Time domain refers to how analysis and quantization are performed on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to frequency-domain encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. This also offers higher performance on complex, random and transient impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

The 32
sub-band filter bank returns 32 amplitude coefficients, one for each equal-sized frequency band/segment of the audio, which is about 700 Hz wide depending on the audio's sampling frequency (for example, 22050 Hz / 32 ≈ 689 Hz at a 44.1 kHz sampling rate). The encoder then utilizes the psychoacoustic model to determine which sub-bands contain audio information that is less important, and so where quantization will be inaudible, or at least much less noticeable.[46]

[Figure: Example FFT analysis on an audio wave sample.]

The psychoacoustic model is applied using a 1024-point fast Fourier transform (FFT). Of the 1152 samples per frame, 64 samples at the top and bottom of the frequency range are ignored for this analysis. They are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the masking threshold, and how much quantization noise each can contain without being perceived. Any sounds below the absolute threshold of hearing (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly.[56][59]

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar-frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components.[58]

Less significant sub-bands are reduced in accuracy by quantization. This basically involves compressing the frequency range (amplitude of the coefficient), i.e. raising the noise floor, and then computing an amplification factor for the decoder to use to re-expand each sub-band to the proper frequency range.[60][61]

Layer II can also optionally use intensity stereo coding, a form of joint stereo. This means that the frequencies above 6 kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback,
the single channel is played through the left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.[46][58] This perceptual trick is known as "stereo irrelevancy". It can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used at higher bitrates, as it does not provide very high quality (transparent) audio.[46][59][62][63]

Quality

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbit/s for 16-bit 44.1 kHz CD audio using the earliest reference implementation (more recent encoders should presumably perform even better).[2][58][59][64] That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8.[65][66] Achieving much higher compression is simply not possible without discarding some perceptible information.

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause.[26] More recent testing has shown that MPEG Multichannel (based on MP2), despite being compromised by an inferior matrixed mode (for the sake of backwards compatibility),[2][59] rates just slightly lower than much more recent audio codecs, such as Dolby Digital (AC-3) and Advanced Audio Coding (AAC) (mostly within the margin of error, and substantially superior in some cases, such as audience applause).[67][68] This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate.[69] The reason for this disparity with
both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".

Layer III

Main article: MPEG-1 Audio Layer III

MPEG-1 Audio Layer III (the first version of MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.

History/ASPEC

[Figure: ASPEC 91 in the Deutsches Museum Bonn, with encoder (below) and decoder.]

MPEG-1 Audio Layer III was derived from the Adaptive Spectral Perceptual Entropy Coding (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II model (frame size, filter bank, FFT, etc.) to become Layer III.[18]

ASPEC was itself based on Multiple adaptive Spectral audio Coding (MSC) by E. F. Schroeder, Optimum Coding in the Frequency domain (OCF), the doctoral thesis by Karlheinz Brandenburg at the University of Erlangen-Nuremberg, Perceptual Transform Coding (PXFM) by J. D. Johnston at AT&T Bell Labs, and Transform coding of audio signals by Y. Mahieux and J. Petit at Institut für Rundfunktechnik (IRT/CNET).[70]

Technical details

MP3 is a frequency-domain audio transform encoder. Even though it utilizes some of the lower layer functions, MP3 is quite different from MP2.

MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152-sample output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.[59]

MP3 does not benefit from the 32
sub-band polyphased filter bank, instead just using an 18-point MDCT transformation on each output to split the data into 576 frequency components (32 sub-bands × 18), and processing it in the frequency domain.[58] This extra granularity allows MP3 to have a much finer psychoacoustic model, and to apply appropriate quantization to each band more carefully, providing much better low-bitrate performance.

Frequency-domain processing imposes some limitations as well, causing a temporal resolution 12 or 36 times worse than that of Layer II. This causes quantization artifacts, as transient sounds like percussive events and other high-frequency events spread over a larger window. This results in audible smearing and pre-echo.[59] MP3 uses pre-echo detection routines and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages, in an attempt to reduce this effect. It is also able to switch between the normal 36-sample quantization window and 3 short 12-sample windows, to reduce the temporal (time) length of quantization artifacts.[59] And yet, in choosing a fairly small window size to make MP3's temporal response adequate to avoid the most serious artifacts, MP3 becomes much less efficient in frequency-domain compression of stationary, tonal components.

Being forced to use a hybrid time-domain (filter bank) / frequency-domain (MDCT) model to fit in with Layer II simply wastes processing time and compromises quality by introducing aliasing artifacts. MP3 has an aliasing cancellation stage specifically to mask this problem, but it instead produces frequency-domain energy which must be encoded in the audio. This is pushed to the top of the frequency range, where most people have limited hearing, in hopes that the distortion it causes will be less audible.

Layer II's 1024-point FFT does not entirely cover all samples, and would omit several entire MP3 sub-bands, where quantization factors must be determined. MP3 instead uses two passes of FFT analysis for spectral
estimation, to calculate the global and individual masking thresholds. This allows it to cover all 1152 samples. Of the two, it utilizes the global masking threshold level from the more critical pass, with the most difficult audio.

In addition to Layer II's intensity-encoded joint stereo, MP3 can use middle/side (mid/side, m/s, MS, matrixed) joint stereo. With mid/side stereo, certain frequency ranges of both channels are merged into a single (middle, mid, L+R) mono channel, while the sound difference between the left and right channels is stored as a separate (side, L−R) channel. Unlike intensity stereo, this process does not discard any audio information. When combined with quantization, however, it can exaggerate artifacts.

If the difference between the left and right channels is small, the side channel will be small, which will offer as much as a 50% bitrate savings, and an associated quality improvement. If the difference between left and right is large, standard (discrete, left/right) stereo encoding may be preferred, as mid/side joint stereo will not provide any benefits. An MP3 encoder can switch between m/s stereo and full stereo on a frame-by-frame basis.[58][63][71]

Unlike Layers I and II, MP3 uses variable-length Huffman coding (after perceptual coding) to further reduce the bitrate, without any further quality loss.[56][59]

Quality

MP3's more fine-grained and selective quantization does prove notably superior to MP2 at lower bitrates. It is able to provide nearly equivalent audio quality to Layer II at an approximately 15% lower bitrate.[68][69] 128 kbit/s is considered the "sweet spot" for MP3, meaning it provides generally acceptable quality stereo sound on most music, and there are diminishing quality improvements from increasing the bitrate further. MP3 is also regarded as exhibiting artifacts that are less annoying than those of Layer II, when both are used at bitrates that are too low to possibly provide faithful reproduction.

Layer III audio files use the extension ".mp3".

MPEG-2 audio extensions

The MPEG-2 standard
includes several extensions to MPEG-1 Audio.[59] These are known as MPEG-2 BC, backwards compatible with MPEG-1 Audio.[72][73][74][75] MPEG-2 Audio is defined in ISO/IEC 13818-3.

MPEG Multichannel: backward-compatible 5.1-channel surround sound.[25]
Sampling rates: 16000, 22050, and 24000 Hz.
Bitrates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s.

These sampling rates are exactly half of those originally defined for MPEG-1 Audio. They were introduced to maintain higher-quality sound when encoding audio at lower bitrates.[25] The even lower bitrates were introduced because tests showed that MPEG-1 Audio could provide higher quality than any existing (c. 1994) very low bitrate (i.e. speech) audio codecs.[76]

Part 4: Conformance testing

Part 4 of the MPEG-1 standard covers conformance testing, and is defined in ISO/IEC 11172-4.

Conformance: procedures for testing conformance.

It provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder.[16][23]

Part 5: Reference software

Part 5 of the MPEG-1 standard includes reference software, and is defined in ISO/IEC TR 11172-5.

Simulation: reference software.

It consists of C reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing.[16][23]

This includes the ISO Dist10 audio encoder code, which LAME and TooLAME were originally based upon.

File extension

.mpg is one of a number of file extensions for MPEG-1 or MPEG-2 audio and video compression. MPEG-1 Part 2 video is rare nowadays, and this extension typically refers to an MPEG program stream (defined in MPEG-1 and MPEG-2) or an MPEG transport stream (defined in MPEG-2). Other suffixes such as .m2ts also exist, specifying the precise container, in this case MPEG-2 TS, but this has little relevance to MPEG-1 media.

.mp3 is the most common extension for files containing MP3 audio (typically MPEG-1 Audio, sometimes MPEG-2 Audio). An MP3 file is typically an uncontained stream of raw
audio; the conventional way to tag MP3 files is by writing data to "garbage" segments of each frame, which preserve the media information but are discarded by the player. This is similar in many respects to how raw .AAC files are tagged (but this is less supported nowadays, e.g. iTunes).

Note that although it would apply, .mpg does not normally append raw AAC or AAC in MPEG-2 Part 7 containers. The .aac extension normally denotes these audio files.

See also

MPEG: the Moving Picture Experts Group, developers of the MPEG-1 standard
MP3: additional, less technical details about MPEG-1 Audio Layer III
MPEG Multichannel: backwards-compatible 5.1-channel surround sound extension to MPEG-1 Audio Layer II
MPEG-2: the direct successor to the MPEG-1 standard
ISO/IEC JTC 1/SC 29

Implementations

Libavcodec: includes MPEG-1/2 video/audio encoders and decoders
Mjpegtools: MPEG-1/2 video/audio encoders
TooLAME: a high-quality MPEG-1 Audio Layer II encoder
LAME: a high-quality MP3 audio encoder
Musepack: a format originally based on MPEG-1 Audio Layer II, but now incompatible

References

  1. Patel K, Smith BC, Rowe LA (1993-09-01). Performance of a software MPEG video decoder. Proceedings of the First ACM International Conference on Multimedia (ACM Multimedia). New York City: Association for Computing Machinery. pp. 75–82. doi:10.1145/166266.166274. ISBN 978-0-89791-596-0. S2CID 3773268. Reference 3 in the paper is to Committee Draft of Standard ISO/IEC 11172, December 6, 1991.
  2. Adler, Mark; Popp, Harald; Hjerde, Morten (November 9, 1996). MPEG-FAQ: multimedia compression [1/9]. faqs.org. Archived from the original on January 4, 2017. Retrieved 2016-11-11.
  3. Le Gall, Didier (April 1991). MPEG: a video compression standard for multimedia applications (PDF). Communications of the ACM. Archived (PDF) from the original on 2017-01-27. Retrieved 2016-11-11.
  4. Chiariglione, Leonardo (October 21, 1989). Kurihama 89 press release. ISO/IEC. Archived from the original on August 5, 2010. Retrieved 2008-04-09.
  5. ISO/IEC JTC 1/SC 29 (2009-10-30). Programme of
Work, Allocated to SC 29/WG 11: MPEG-1 (Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s). Archived from the original on 2013-12-31. Retrieved 2009-11-10.
  6. ISO. ISO/IEC 11172-1:1993 - Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 1: Systems. Archived from the original on 2016-11-12. Retrieved 2016-11-11.
  7. MPEG. About MPEG - Achievements. chiariglione.org. Archived from the original on 2008-07-08. Retrieved 2009-10-31.
  8. MPEG. Terms of Reference. chiariglione.org. Archived from the original on 2010-02-21. Retrieved 2009-10-31.
  9. MPEG. MPEG standards - Full list of standards developed or under development. chiariglione.org. Archived from the original on 2010-04-20. Retrieved 2009-10-31.
  10. Lea, William (1994). Video on demand: Research Paper 94/68. House of Commons Library. Archived from the original on 20 September 2019. Retrieved 20 September 2019.
  11. History of Video Compression. ITU-T. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). July 2002. pp. 11, 24–9, 33, 40–1, 53–6. Retrieved 3 November 2019.
  12. Ghanbari, Mohammed (2003). Standard Codecs: Image Compression to Advanced Video Coding. Institution of Engineering and Technology. pp. 1–2. ISBN 9780852967102.
  13. The History of Video File Formats Infographic. RealNetworks. 22 April 2012. Retrieved 5 August 2019.
  14. Hans Georg Musmann. Genesis of the MP3 Audio Coding Standard (PDF). Archived from the original (PDF) on 2012-01-17. Retrieved 2011-07-26.
  15. Fogg, Chad (April 2, 1996). MPEG-2 FAQ. University of California, Berkeley. Archived from the original on August 29, 2000. Retrieved 2008-04-09.
  16. Fogg, Chad (April 2, 1996). MPEG-2 FAQ (archived website). University of California, Berkeley. Archived from the original on 2008-06-16. Retrieved 2016-11-11.
  17. Chiariglione, Leonardo (March 2001). Open source in MPEG. Linux Journal. Archived from the original on 2011-07-25. Retrieved 2008-04-09.
  18. Chiariglione, Leonardo; Le Gall, Didier; Musmann,
Hans-Georg; Simon, Allen (September 1990). Press Release - Status report of ISO MPEG. ISO/IEC. Archived from the original on 2010-02-14. Retrieved 2008-04-09.
  19. Meetings. ISO/IEC. Archived from the original on 2010-02-10. Retrieved 2008-04-09.
  20. The MPEG-FAQ, Version 3.1. Archived from the original on 2009-07-23. Retrieved 2008-10-12. "Q. Well, then how do I get the documents, like the MPEG I draft? A. MPEG is a draft ISO standard. It's [sic] exact name is ISO CD 11172. You may order it from your national standards body (e.g. ANSI in the USA) or buy it from companies like OMNICOM."
  21. MPEG Press Release (Press release). ISO/IEC JTC1/SC29/WG11. 6 November 1992. Archived from the original on 12 August 2010. Retrieved 7 May 2018.
  22. Abstract Page 101. Archived from the original on 2008-10-06. Retrieved 2008-07-13. BMRC. Archived from the original on 2008-06-12. Retrieved 2008-07-13. A Continuous Media Player, Lawrence A. Rowe and Brian C. Smith, Proc. 3rd Int. Workshop on Network and OS Support for Digital Audio and Video, San Diego CA (November 1992). [dead link]
  23. Achievements. ISO/IEC. Archived from the original on 2008-07-08. Retrieved 2008-04-03.
  24. Chiariglione, Leonardo (November 6, 1992). MPEG Press Release, London, 6 November 1992. ISO/IEC. Archived from the original on 12 August 2010. Retrieved 2008-04-09.
  25. Wallace, Greg (April 2, 1993). Press Release. ISO/IEC. Archived from the original on August 6, 2010. Retrieved 2008-04-09.
  26. Popp, Harald; Hjerde, Morten (November 9, 1996). MPEG-FAQ: multimedia compression [2/9]. faqs.org. Archived from the original on January 4, 2017. Retrieved 2016-11-11.
  27. INTERNATIONAL ORGANISATION FOR STANDARDISATION / ORGANISATION INTERNATIONALE DE NORMALISATION ISO. 26 July 2010. Archived from the original on 26 July 2010. Retrieved 7 May 2018.
  28. ISO/IEC JTC 1/SC 29 (2010-07-17). MPEG-1 (Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s). Archived from the original on 2013-12-31. Retrieved 2010-07-18.
  29. ISO. ISO/IEC 11172-1:1993 - Information technology - Coding of moving pictures and associated
audio for digital storage media at up to about 1.5 Mbit/s - Part 1: Systems. Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  30. ISO. ISO/IEC 11172-2:1993 - Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video. Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  31. ISO. ISO/IEC 11172-3:1993 - Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3: Audio. Archived from the original on 2017-05-15. Retrieved 2016-11-11.
  32. ISO. ISO/IEC 11172-4:1995 - Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 4: Compliance testing. Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  33. ISO. ISO/IEC TR 11172-5:1998 - Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 5: Software simulation. Archived from the original on 2017-08-30. Retrieved 2016-11-11.
  34. Ozer, Jan (October 12, 2001). Choosing the Optimal Video Resolution: The MPEG-2 Player Market. extremetech.com. Archived from the original on June 7, 2011. Retrieved 2016-11-11.
  35. Comparison between MPEG 1 & 2. Archived from the original on 2012-02-10. Retrieved 2016-11-11.
  36. MPEG 1 And 2 Compared. Pure Motion Ltd. 2003. Archived from the original on 2005-12-14. Retrieved 2008-04-09.
  37. Dave Singer (2007-11-09). [homework] summary of the video (and audio) codec discussion. Archived from the original on December 21, 2016. Retrieved November 11, 2016.
  38. MPEG-1 Video Coding (H.261). Library of Congress, Digital Preservation. October 21, 2014. Archived from the original on January 11, 2017. Retrieved 2016-11-11.
  39. ISO Standards and Patents. Archived from the original on 2016-11-15. Retrieved 2016-11-11.
  40. Search for 11172. archive.ph. Archived from the original on 2008-09-16. Retrieved 2023-01-21.
  41. [gst-devel] Can a MPEG-1 with Audio Layers 1&2 plugin be in plugins-good (patentwise)?. SourceForge.net. 2008-08-23. Archived from the original on 2014-02-02. Retrieved 2016-11-11.
  42. [whatwg] MPEG-1 subset proposal for HTML5 video codec. lists.whatwg.org. Archived from the original on 19 July 2011. Retrieved 11 January 2022.
  43. http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=5214678. Archived 2012-07-13 at archive.today. Digital transmission system using subband coding of a digital signal. Filed: May 31, 1990. Granted: May 25, 1993. Expires: May 31, 2010.
  44. mp3. Fraunhofer Institute for Integrated Circuits IIS. Archived from the original on 22 March 2018. Retrieved 7 May 2018.
  45. ISO Standards and Patents. ISO. Retrieved 10 July 2019.
  46. Grill, B.; Quackenbush, S. (October 2005). MPEG-1 Audio. ISO/IEC. Archived from the original on 2010-04-30.
  47. Chiariglione, Leonardo. MPEG-1 Systems. ISO/IEC. Archived from the original on 2016-11-12. Retrieved 2016-11-11.
  48. Pack Header. Archived from the original on 2016-10-27. Retrieved 2016-11-11.
  49. Fimoff, Mark; Bretl, Wayne E. (December 1, 1999). MPEG2 Tutorial. Archived from the original on November 12, 2016. Retrieved 2016-11-11.
  50. Fimoff, Mark; Bretl, Wayne E. (December 1, 1999). MPEG2 Tutorial. Archived from the original on November 5, 2016. Retrieved 2016-11-11.
  51. Fimoff, Mark; Bretl, Wayne E. (December 1, 1999). MPEG2 Tutorial. Archived from the original on November 5, 2016. Retrieved 2016-11-11.
  52. Fimoff, Mark; Bretl, Wayne E. (December 1, 1999). MPEG2 Tutorial. Archived from the original on November 12, 2016. Retrieved 2016-11-11.
  53. Acharya, Soam; Smith, Brian (1998). Compressed Domain Transcoding of MPEG. Cornell University, IEEE Computer Society, IEEE International Conference on Multimedia Computing and Systems. p. 3. Archived from the original on 2011-02-23. Retrieved 2016-11-11. (Requires clever reading: says quantization matrices differ, but those are just defaults, and selectable.) (registration required)
  54. Wee, Susie J.; Vasudev, Bhaskaran; Liu, Sam (March 13, 1997). Transcoding MPEG Video Streams in the Compressed
Domain. Hewlett-Packard. CiteSeerX 10.1.1.24.633. Archived from the original on 2007-08-17. Retrieved 2016-11-11.
  55. BMRC. Archived from the original on 2009-05-03. Retrieved 2009-05-03.
  56. Thom, D.; Purnhagen, H. (October 1998). MPEG Audio FAQ Version 9. ISO/IEC. Archived from the original on 2010-02-18. Retrieved 2016-11-11.
  57. MPEG Audio Frame Header. Archived from the original on 2015-02-08. Retrieved 2016-11-11.
  58. Church, Steve. Perceptual Coding and MPEG Compression. NAB Engineering Handbook, Telos Systems. Archived from the original on 2001-05-08. Retrieved 2008-04-09.
  59. Pan, Davis (Summer 1995). A Tutorial on MPEG/Audio Compression (PDF). IEEE MultiMedia Journal. p. 8. Archived from the original (PDF) on 2004-09-19. Retrieved 2008-04-09.
  60. Smith, Brian (1996). A Survey of Compressed Domain Processing Techniques. Cornell University. p. 7. Archived from the original on 2011-02-23. Retrieved 2008-04-09. (registration required)
  61. Cheng, Mike. Psychoacoustic Models in TwoLAME. twolame.org. Archived from the original on 2016-10-22. Retrieved 2016-11-11.
  62. Grill, B.; Quackenbush, S. (October 2005). MPEG-1 Audio. Archived from the original on 2008-04-27. Retrieved 2016-11-11.
  63. Herre, Jürgen (October 5, 2004). From Joint Stereo to Spatial Audio Coding (PDF). International Conference on Digital Audio Effects. p. 2. Archived from the original (PDF) on April 5, 2006. Retrieved 2008-04-17.
  64. C. Grewin and T. Ryden. Subjective Assessments on Low Bit-rate Audio Codecs. Proceedings of the 10th International AES Conference, pp. 91–102, London, 1991.
  65. J. Johnston. Estimation of Perceptual Entropy Using Noise Masking Criteria. In Proc. ICASSP-88, pp. 2524–2527, May 1988.
  66. J. Johnston. Transform Coding of Audio Signals Using Perceptual Noise Criteria. IEEE Journal on Select Areas in Communications, vol. 6, no. 2, pp. 314–323, Feb. 1988.
  67. Wustenhagen et al. Subjective Listening Test of Multi-channel Audio Codecs. AES 105th Convention Paper 4813, San Francisco, 1998.
  68. B/MAE Project Group (September 2007). EBU evaluations of multichannel audio codecs (PDF). European Broadcasting
Union. Archived from the original (PDF) on 2008-10-30. Retrieved 2008-04-09.
  69. Meares, David; Watanabe, Kaoru; Scheirer, Eric (February 1998). Report on the MPEG-2 AAC Stereo Verification Tests (PDF). ISO/IEC. p. 18. Archived from the original (PDF) on April 14, 2008. Retrieved 2016-11-11.
  70. Painter, Ted; Spanias, Andreas (April 2000). Perceptual Coding of Digital Audio (Proceedings of the IEEE, vol. 88, no. 4) (PDF). Proceedings of the IEEE. Archived from the original (PDF) on September 16, 2006. Retrieved 2016-11-11.
  71. Amorim, Roberto (September 19, 2006). GPSYCHO - Mid/Side Stereo. LAME. Archived from the original on December 16, 2016. Retrieved 2016-11-11.
  72. ISO (October 1998). MPEG Audio FAQ Version 9 - MPEG-1 and MPEG-2 BC. ISO. Archived from the original on 2010-02-18. Retrieved 2016-11-11.
  73. D. Thom, H. Purnhagen, and the MPEG Audio Subgroup (October 1998). MPEG Audio FAQ Version 9 - MPEG Audio. Archived from the original on 2011-08-07. Retrieved 2016-11-11.
  74. MPEG.ORG. AAC. Archived from the original on 2007-08-31. Retrieved 2009-10-28.
  75. ISO (2006-01-15). ISO/IEC 13818-7, Fourth edition, Part 7 - Advanced Audio Coding (AAC) (PDF). Archived (PDF) from the original on 2009-03-06. Retrieved 2016-11-11.
  76. Chiariglione, Leonardo (November 11, 1994). Press Release. ISO/IEC. Archived from the original on August 8, 2010. Retrieved 2008-04-09.

External links

Official Web Page of the Moving Picture Experts Group (MPEG), a working group of ISO/IEC
MPEG Industry Forum Organization
Source Code to Implement MPEG-1
A simple, concise explanation from Berkeley Multimedia Research Center

Retrieved from "https://en.wikipedia.org/w/index.php?title=MPEG-1&oldid=1138250562"
