Base64

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

Like all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web,[1] where one of its uses is embedding image files or other binary assets inside textual assets such as HTML and CSS files.[2]

Base64 is also widely used for sending e-mail attachments. This is required because SMTP – in its original form – was designed to transport 7-bit ASCII characters only. This encoding causes an overhead of 33–37% (33% by the encoding itself; up to 4% more by the inserted line breaks).

Design

Each Base64 digit can take on 64 different values, encoding 6 bits of data. Which characters are chosen to represent the 64 values varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean.[3] For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.

The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS, for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh), and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.[4][5][6][3]

Base64 table from RFC 4648

This is the Base64 alphabet defined in RFC 4648 §4. See also Variants summary (below).

Index Binary Char Index Binary Char Index Binary Char Index Binary Char
0 000000 A 16 010000 Q 32 100000 g 48 110000 w
1 000001 B 17 010001 R 33 100001 h 49 110001 x
2 000010 C 18 010010 S 34 100010 i 50 110010 y
3 000011 D 19 010011 T 35 100011 j 51 110011 z
4 000100 E 20 010100 U 36 100100 k 52 110100 0
5 000101 F 21 010101 V 37 100101 l 53 110101 1
6 000110 G 22 010110 W 38 100110 m 54 110110 2
7 000111 H 23 010111 X 39 100111 n 55 110111 3
8 001000 I 24 011000 Y 40 101000 o 56 111000 4
9 001001 J 25 011001 Z 41 101001 p 57 111001 5
10 001010 K 26 011010 a 42 101010 q 58 111010 6
11 001011 L 27 011011 b 43 101011 r 59 111011 7
12 001100 M 28 011100 c 44 101100 s 60 111100 8
13 001101 N 29 011101 d 45 101101 t 61 111101 9
14 001110 O 30 011110 e 46 101110 u 62 111110 +
15 001111 P 31 011111 f 47 101111 v 63 111111 /
Padding =
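
The table above follows a regular pattern, so it can be regenerated rather than typed out. The following is a minimal Python sketch; the names `ALPHABET` and `table_row` are illustrative and do not come from RFC 4648.

```python
import string

# RFC 4648 §4 alphabet: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def table_row(index: int) -> str:
    """Format one table entry as 'index binary char'."""
    return f"{index} {index:06b} {ALPHABET[index]}"

print(table_row(0))   # 0 000000 A
print(table_row(26))  # 26 011010 a
print(table_row(63))  # 63 111111 /
```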

Examples

The example below uses ASCII text for simplicity, but this is not a typical use case, as ASCII text can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.

Here is a well-known idiom from distributed computing:

Many hands make light work.

When the quote (without trailing whitespace) is encoded into Base64, it is represented as a byte sequence of 8-bit-padded ASCII characters encoded in MIME's Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):

TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu

In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.

As this example illustrates, Base64 encoding converts three octets into four encoded characters.

Source text (ASCII): M a n
Source octets: 77 (0x4d) 97 (0x61) 110 (0x6e)
Bits (grouped as sextets): 010011 010110 000101 101110
Base64 sextets: 19 22 5 46
Base64 characters: T W F u
Encoded octets: 84 (0x54) 87 (0x57) 70 (0x46) 117 (0x75)

= padding characters might be added to make the last encoded block contain four Base64 characters.
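
The three-octets-to-four-characters step can be sketched in a few lines of Python. This is only an illustration of the grouping described above, not a complete encoder: the helper name `encode_group` is hypothetical, and it handles full three-octet groups only.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three octets as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three octets")
    bits = int.from_bytes(three_bytes, "big")  # join the octets into 24 bits
    # Read the 24 bits as four 6-bit numbers, most significant first.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)

print(encode_group(b"Man"))  # TWFu
```

Applying the helper to each consecutive three-octet slice of the 27-octet quote reproduces the encoded string shown earlier.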

Hexadecimal to octal transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs (23 26 05 56), and each pair can be converted to decimal to yield 19 22 05 46. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are TWFu.
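
The hexadecimal-to-octal route can be checked with Python's built-in number formatting. This is a small sketch of the arithmetic in the paragraph above; the variable names are illustrative.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

value = 0x4D616E                      # the 24 bits of "Man"
octal = format(value, "08o")          # eight octal digits, zero-padded
pairs = [octal[i:i + 2] for i in range(0, 8, 2)]
indices = [int(pair, 8) for pair in pairs]   # each octal pair is one sextet
chars = "".join(ALPHABET[i] for i in indices)

print(octal)    # 23260556
print(indices)  # [19, 22, 5, 46]
print(chars)    # TWFu
```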

If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits are captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block are zero and are discarded on decoding (along with the succeeding = padding character):

Source text (ASCII): M a
Source octets: 77 (0x4d) 97 (0x61)
Bits (grouped as sextets): 010011 010110 000100
Base64 sextets: 19 22 4 Padding
Base64 characters: T W E =
Encoded octets: 84 (0x54) 87 (0x57) 69 (0x45) 61 (0x3D)

If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits are captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block are zero and are discarded on decoding (along with the succeeding two = padding characters):

Source text (ASCII): M
Source octets: 77 (0x4d)
Bits (grouped as sextets): 010011 010000
Base64 sextets: 19 16 Padding Padding
Base64 characters: T Q = =
Encoded octets: 84 (0x54) 81 (0x51) 61 (0x3D) 61 (0x3D)
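
The two short-group cases can be sketched as follows. The helper `encode_tail` is a hypothetical name used for illustration; it zero-fills the low bits of the last digit and appends = characters exactly as described above.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_tail(tail: bytes) -> str:
    """Encode a final group of one or two octets, with '=' padding."""
    if len(tail) not in (1, 2):
        raise ValueError("a tail group holds one or two octets")
    n_bits = 8 * len(tail)
    n_digits = -(-n_bits // 6)            # ceiling: 2 digits for 8 bits, 3 for 16
    # Shift left so the unused low bits of the last digit are zero.
    bits = int.from_bytes(tail, "big") << (6 * n_digits - n_bits)
    digits = [
        ALPHABET[(bits >> (6 * (n_digits - 1 - i))) & 0x3F]
        for i in range(n_digits)
    ]
    return "".join(digits) + "=" * (4 - n_digits)

print(encode_tail(b"Ma"))  # TWE=
print(encode_tail(b"M"))   # TQ==
```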

Output padding

Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = 4 × 6 = 24 bits) represents three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input. (This is different from A, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:

Input text | Input length | Output text | Output length | Padding
light work. | 11 | bGlnaHQgd29yay4= | 16 | 1
light work | 10 | bGlnaHQgd29yaw== | 16 | 2
light wor | 9 | bGlnaHQgd29y | 12 | 0
light wo | 8 | bGlnaHQgd28= | 12 | 1
light w | 7 | bGlnaHQgdw== | 12 | 2
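
The rows of this table can be reproduced with Python's standard `base64` module. In the sketch below, `padding_row` is an illustrative helper, and the padding count is computed from the input length alone.

```python
import base64

def padding_row(text: str) -> tuple:
    """Return (input, input length, output, output length, padding count)."""
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    padding = (3 - len(text) % 3) % 3   # octets missing from the last group
    return (text, len(text), encoded, len(encoded), padding)

for text in ["light work.", "light work", "light wor", "light wo", "light w"]:
    print(padding_row(text))
```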

The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded Padding Length Decoded
bGlnaHQgdw== == 1 light w
bGlnaHQgd28= = 2 light wo
bGlnaHQgd29y None 3 light wor

Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a = is encountered. For example, when `bGlnaHQgdw==` is decoded, we convert each character (except the trailing occurrences of =) into their corresponding 6-bit representation, and then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w for a bit string of length 12, but since we remove 2 bits for each = (for a total of 4 bits), the dw== ends up producing 8 bits (1 byte) when decoded.
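
The "discard 2 bits per =" reading translates almost directly into code. The following Python sketch works on a string of '0' and '1' characters for clarity rather than speed; `decode_discarding` is an illustrative name.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_discarding(encoded: str) -> bytes:
    """Decode by concatenating 6-bit values, then dropping 2 bits per '='."""
    body = encoded.rstrip("=")
    n_pad = len(encoded) - len(body)
    bits = "".join(f"{ALPHABET.index(ch):06b}" for ch in body)
    if n_pad:
        bits = bits[:-2 * n_pad]          # 2 trailing bits per padding character
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(decode_discarding("bGlnaHQgdw=="))  # b'light w'
```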

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: The first character contributes 6 bits, and the second character contributes its first 2 bits. For example:

Length Encoded Length Decoded
2 bGlnaHQgdw 1 light w
3 bGlnaHQgd28 2 light wo
4 bGlnaHQgd29y 3 light wor
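
One way to decode unpadded input with a decoder that expects padding is to restore the = characters from the length, as in this Python sketch. The wrapper name `decode_unpadded` is hypothetical; `base64.b64decode` is the standard-library decoder.

```python
import base64

def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(encoded) % 4 == 1:
        # One leftover character carries only 6 bits, less than a byte.
        raise ValueError("a single leftover Base64 character cannot encode a byte")
    padding = "=" * (-len(encoded) % 4)   # 0, 1, or 2 characters
    return base64.b64decode(encoded + padding)

print(decode_unpadded("bGlnaHQgdw"))   # b'light w'
print(decode_unpadded("bGlnaHQgd28"))  # b'light wo'
```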

Implementations and history

Variants summary table

Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.

Encoding | 62nd char | 63rd char | Pad char | Separate encoding of lines (separators; length; checksum) | Decoding non-encoding characters
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated) | + | / | = mandatory | CR+LF; 64, or lower for the last line; No | No
RFC 2045: Base64 transfer encoding for MIME | + | / | = mandatory | CR+LF; At most 76; No | Discarded
RFC 2152: Base64 for UTF-7 | + | / | No | No | No
RFC 3501: Base64 encoding for IMAP mailbox names | + | , | No | No | No
RFC 4648 §4: base64 (standard)[a] | + | / | = optional | No | No
RFC 4648 §5: base64url (URL- and filename-safe standard)[a] | - | _ | = optional | No | No
RFC 4880: Radix-64 for OpenPGP | + | / | = mandatory | CR+LF; At most 76; Radix-64 encoded 24-bit CRC | No
Other variations: see Applications not compatible with RFC 4648 Base64 (below)
  [a] This variant is intended to provide common features where implementations have no need to specialize them. In particular, separate line encodings and restrictions were not considered when previous standards were co-opted for use elsewhere. Thus, the features indicated here may be overridden.

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[7]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the + and / symbols. The = symbol is also used as a padding suffix.[4] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.

The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
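
The buffer-based procedure above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, and "\n" stands in for the platform-specific line delimiter.

```python
PEM_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def pem_printable_encode(data: bytes) -> str:
    """Encode via a 24-bit buffer, six bits at a time, with '=' padding."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # First byte goes in the most significant 8 bits; missing bytes stay zero.
        buffer = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        chars = [PEM_ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        n_pad = 3 - len(chunk)
        # Keep only the characters that carry data, then append the padding.
        out.append("".join(chars[:4 - n_pad]) + "=" * n_pad)
    return "".join(out)

def wrap_64(encoded: str) -> str:
    """Split into lines of exactly 64 characters; the last may be shorter."""
    return "\n".join(encoded[i:i + 64] for i in range(0, len(encoded), 64))

print(wrap_64(pem_printable_encode(b"Many hands make light work.")))
```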

MIME

The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[5] MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the = symbol for output padding in the same way, as described in RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (for example, CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.
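
Both MIME rules can be observed with Python's standard library. In this sketch, `base64.encodebytes` inserts a line break after every 76 output characters (it uses "\n" rather than CR+LF), and `base64.b64decode` with its default `validate=False` discards characters outside the alphabet. The wrapper names are illustrative.

```python
import base64

def mime_encode(data: bytes) -> bytes:
    """Base64-encode with a line break after every 76 output characters."""
    return base64.encodebytes(data)

def lenient_decode(text: bytes) -> bytes:
    """Decode, discarding any character outside the Base64 alphabet."""
    return base64.b64decode(text, validate=False)

payload = bytes(range(256))
wrapped = mime_encode(payload)
crlf_wrapped = wrapped.replace(b"\n", b"\r\n")   # simulate CR+LF line endings

print(max(len(line) for line in wrapped.splitlines()))  # 76
print(lenient_decode(crlf_wrapped) == payload)          # True
```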

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4/3 × 78/76), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:

bytes = (string_length(encoded_string) − 814) / 1.37 
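
As a quick check of the arithmetic, the sketch below computes the 4/3 × 78/76 ratio and applies the approximation formula. The 814-byte header allowance is the rough figure quoted above, not a constant defined by MIME, and the function name is illustrative.

```python
# 4 output characters per 3 input bytes, and 78 bytes on the wire
# (76 characters plus CR+LF) per 76 characters of encoded text.
MIME_RATIO = (4 / 3) * (78 / 76)

HEADER_BYTES = 814   # rough header allowance taken from the text above

def estimate_decoded_size(encoded_length: int) -> float:
    """Approximate the original size in bytes from the encoded message size."""
    return (encoded_length - HEADER_BYTES) / 1.37

print(round(MIME_RATIO, 3))                 # 1.368
print(round(estimate_decoded_size(2184)))   # 1000
```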

UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[8][9]

The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits leaving up to three unused bits in the last Base64 digit.
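
The modified Base64 step can be sketched in Python by encoding text as big-endian UTF-16, applying ordinary Base64, and stripping the padding. This covers only the Base64 portion of UTF-7, not the rest of the format, and `modified_base64` is a hypothetical helper name.

```python
import base64

def modified_base64(text: str) -> str:
    """Base64 of the UTF-16 (big-endian) code units, without '=' padding."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")

print(modified_base64("\u263a"))  # Jjo
```

Here U+263A occupies 16 bits, so three Base64 digits are needed and the final digit has two unused bits.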

OpenPGP

OpenPGP, described in RFC 4880, defines Radix-64 encoding, also known as "ASCII armor". Radix-64 is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; it is then encoded with the same Base64 algorithm, prefixed by the "=" symbol as a separator, and appended to the encoded output data.[10]
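
The checksum step can be sketched as below. The CRC-24 initial value and polynomial are those given in RFC 4880; the Python function names are illustrative, and the sketch produces only the checksum line, not a complete armored message.

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data: bytes) -> int:
    """Compute the 24-bit CRC over the raw (unencoded) input data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:       # bit 24 set: reduce by the polynomial
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor_checksum_line(data: bytes) -> str:
    """Return '=' followed by the Base64 encoding of the 3-byte checksum."""
    checksum = crc24(data).to_bytes(3, "big")
    return "=" + base64.b64encode(checksum).decode("ascii")

print(armor_checksum_line(b"Many hands make light work."))
```

Because the checksum is exactly three bytes, its Base64 form is always four characters and never needs padding of its own.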

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.

Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.[6]
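
A strict decoder in this spirit can be approximated in Python with the `validate=True` flag of `base64.b64decode`, which raises an error on characters outside the alphabet and on incorrect padding. The wrapper `strict_decode` below is a hypothetical helper, and the sketch does not claim full conformance with RFC 3548.

```python
import base64
import binascii

def strict_decode(encoded: str) -> bytes:
    """Decode, rejecting non-alphabet characters and missing padding."""
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"rejected Base64 input: {exc}") from exc

print(strict_decode("TWFu"))  # b'Man'
```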

RFC 4648

This RFC obsoletes RFC 3548 and focuses on Base64/32/16:

This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, the use of different encoding alphabets, and canonical encodings.

URL applications

Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in a URL requires encoding the '+', '/' and '=' characters as percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.

For this reason, modified Base64 variants for URLs exist (such as base64url in RFC 4648), in which the '+' and '/' characters of standard Base64 are replaced by '-' and '_' respectively. URL encoders and decoders are then no longer necessary, and the length of the encoded value is unaffected, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. YouTube is a popular site that uses such a variant.[11] Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries[which?] will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.[citation needed]
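
The difference between the standard and URL-safe alphabets shows up with any input that produces digits 62 and 63. The Python sketch below uses two arbitrary bytes chosen for that purpose; the variable names are illustrative.

```python
import base64
from urllib.parse import quote

raw = b"\xfb\xff"   # arbitrary bytes that map to digits 62 and 63

standard = base64.b64encode(raw).decode("ascii")
percent_encoded = quote(standard, safe="")       # what a URL would need
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
unpadded = url_safe.rstrip("=")                  # some variants omit padding

print(standard)         # +/8=
print(percent_encoded)  # %2B%2F8%3D
print(url_safe)         # -_8=
print(unpadded)         # -_8
```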

HTML

The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[12] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.

Other applications

Base64 can be used in a variety of contexts:

  • Base64 can be used to transmit and store text that might otherwise cause delimiter collision
  • Base64 is used to encode character strings in LDAP Data Interchange Format files
  • Base64 is often used to embed binary data in an XML file, using a syntax similar to <data encoding="base64">…</data> e.g. favicons in Firefox's exported bookmarks.html.
  • Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
  • The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
  • Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.[13]
  • Base64 can be used to store/transmit relatively small amounts of binary data via a computer's text clipboard functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64 encoded text strings, which can be easily copied and pasted into users' wallet software.
  • Binary data that must be quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
  • QR codes which contain binary data will sometimes store it encoded in Base64 rather than simply storing the raw binary data, as there is a stronger guarantee that all QR code readers will accurately decode text, as well as the fact that some devices will more readily save text from a QR code than potentially malicious binary data.
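
As an illustration of the data URI use mentioned in the list above, the following Python sketch builds a Base64 data: URI from raw bytes. The helper name `to_data_uri` is hypothetical, and the payload is just the six-byte GIF signature, not a complete image.

```python
import base64

def to_data_uri(payload: bytes, media_type: str) -> str:
    """Build a data: URI that carries the payload as Base64 text."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

print(to_data_uri(b"GIF89a", "image/gif"))  # data:image/gif;base64,R0lGODlh
```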

Applications not compatible with RFC 4648 Base64

Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see Variants summary table above).

  • The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (" " (space)) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.[citation needed]
  • BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like '7', 'O', 'g' and 'o'. Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
  • A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization.
  • Several other applications use alphabets similar to the common variations, but in a different order:
    • Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz". Padding is not used.
    • The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".[14]
    • bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".[15]
    • Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
    • 6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.[16]
    • Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".[17]
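
The "add 32" rule described for Uuencoding in the list above can be sketched as follows. This Python example shows only the character mapping for one three-octet group and ignores the per-line length prefix used in real uuencoded files; `uu_encode_group` is an illustrative name.

```python
def uu_encode_group(three_bytes: bytes) -> str:
    """Map three octets to four characters by adding 32 to each 6-bit value."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three octets")
    bits = int.from_bytes(three_bytes, "big")
    # No lookup table is needed: the character code is the sextet plus 32.
    return "".join(chr(32 + ((bits >> shift) & 0x3F)) for shift in (18, 12, 6, 0))

print(uu_encode_group(b"Cat"))  # 0V%T
```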

One issue with the RFC 4648 alphabet is that, when a sorted list of ASCII-encoded strings is Base64-transformed and sorted again, the order of elements changes. This is because the padding character and the characters in the substitution alphabet are not ordered by ASCII character value (as the following sample table shows). Alphabets like (unpadded) B64 address this.

ASCII Base64 Base64, no padding B64
light w bGlnaHQgdw== bGlnaHQgdw P4ZbO5EURk
light wo bGlnaHQgd28= bGlnaHQgd28 P4ZbO5EURqw
light wor bGlnaHQgd29y bGlnaHQgd29y P4ZbO5EURqxm
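
The ordering effect can be demonstrated in Python. The sketch below derives the unpadded B64 form by translating standard Base64 output into the crypt alphabet; this shortcut is an assumption made for illustration, and the helper names and the two extra sample strings ("c " and "cP") are not from the table.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
TO_B64 = str.maketrans(STANDARD, B64)

def standard_base64(text: str) -> str:
    return base64.b64encode(text.encode("ascii")).decode("ascii")

def b64_unpadded(text: str) -> str:
    """Same bits as standard Base64, but ASCII-ordered alphabet and no padding."""
    return standard_base64(text).rstrip("=").translate(TO_B64)

words = sorted(["c ", "cP", "light w", "light wo", "light wor"])
by_standard = sorted(words, key=standard_base64)
by_b64 = sorted(words, key=b64_unpadded)

print(by_standard == words)  # False: the order changes
print(by_b64 == words)       # True: the order is preserved
```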

References

  1. "Base64 encoding and decoding – Web APIs". MDN Web Docs.
  2. "When to base64 encode images (and when not to)". 28 August 2011.
  3. The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
  4. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
  5. Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
  6. The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
  7. Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
  8. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
  9. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
  10. OpenPGP Message Format. IETF. November 2007. doi:10.17487/RFC4880. RFC 4880. Retrieved March 18, 2010.
  11. "Here's Why YouTube Will Practically Never Run Out of Unique Video IDs". www.mentalfloss.com. 23 March 2016. Retrieved 27 December 2021.
  12. "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2018. Introduced by changeset 5814, 2021-02-01.
  13. "Edit fiddle". jsfiddle.net.
  14. "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
  15. Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". Retrieved 2018-05-18.
  16. "6PACK a "real time" PC to TNC protocol". Retrieved 2013-05-19.
  17. "Shell Arithmetic". Bash Reference Manual. Retrieved 8 April 2020. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.

base64, computer, programming, group, binary, text, encoding, schemes, that, represent, binary, data, more, specifically, sequence, bytes, sequences, bits, that, represented, four, digits, common, binary, text, encoding, schemes, designed, carry, data, stored,. In computer programming Base64 is a group of binary to text encoding schemes that represent binary data more specifically a sequence of 8 bit bytes in sequences of 24 bits that can be represented by four 6 bit Base64 digits Common to all binary to text encoding schemes Base64 is designed to carry data stored in binary formats across channels that only reliably support text content Base64 is particularly prevalent on the World Wide Web 1 where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files 2 Base64 is also widely used for sending e mail attachments This is required because SMTP in its original form was designed to transport 7 bit ASCII characters only This encoding causes an overhead of 33 37 33 by the encoding itself up to 4 more by the inserted line breaks Contents 1 Design 2 Base64 table from RFC 4648 3 Examples 3 1 Output padding 3 2 Decoding Base64 with padding 3 3 Decoding Base64 without padding 4 Implementations and history 4 1 Variants summary table 4 2 Privacy enhanced mail 4 3 MIME 4 4 UTF 7 4 5 OpenPGP 4 6 RFC 3548 4 7 RFC 4648 4 8 URL applications 4 9 HTML 4 10 Other applications 4 11 Applications not compatible with RFC 4648 Base64 5 See also 6 ReferencesDesign EditEach Base64 digit can take on 64 different values encoding 6 bits of data Which characters are chosen to represent the 64 values varies between implementations The general strategy is to choose 64 characters that are common to most encodings and that are also printable This combination leaves the data unlikely to be modified in transit through information systems such as email that were traditionally not 8 bit clean 3 For example MIME s Base64 implementation 
uses A Z a z and 0 9 for the first 62 values Other variations share this property but differ in the symbols chosen for the last two values an example is UTF 7 The earliest instances of this type of encoding were created for dial up communication between systems running the same OS for example uuencode for UNIX and BinHex for the TRS 80 later adapted for the Macintosh and could therefore make more assumptions about what characters were safe to use For instance uuencode uses uppercase letters digits and many punctuation characters but no lowercase 4 5 6 3 Base64 table from RFC 4648 EditThis is the Base64 alphabet defined in RFC 4648 4 See also Variants summary below Index Binary Char Index Binary Char Index Binary Char Index Binary Char0 000000 A 16 010000 Q 32 100000 g 48 110000 w1 000001 B 17 010001 R 33 100001 h 49 110001 x2 000010 C 18 010010 S 34 100010 i 50 110010 y3 000011 D 19 010011 T 35 100011 j 51 110011 z4 000100 E 20 010100 U 36 100100 k 52 110100 05 000101 F 21 010101 V 37 100101 l 53 110101 16 000110 G 22 010110 W 38 100110 m 54 110110 27 000111 H 23 010111 X 39 100111 n 55 110111 38 001000 I 24 011000 Y 40 101000 o 56 111000 49 001001 J 25 011001 Z 41 101001 p 57 111001 510 001010 K 26 011010 a 42 101010 q 58 111010 611 001011 L 27 011011 b 43 101011 r 59 111011 712 001100 M 28 011100 c 44 101100 s 60 111100 813 001101 N 29 011101 d 45 101101 t 61 111101 914 001110 O 30 011110 e 46 101110 u 62 111110 15 001111 P 31 011111 f 47 101111 v 63 111111 Padding Examples EditThe example below uses ASCII text for simplicity but this is not a typical use case as it can already be safely transferred across all systems that can handle Base64 The more typical use is to encode binary data such as an image the resulting Base64 data will only contain 64 different ASCII characters all of which can reliably be transferred across systems that may corrupt the raw source bytes Here is a well known idiom from distributed computing Many hands make light work When the quote 
without trailing whitespace is encoded into Base64 it is represented as a byte sequence of 8 bit padded ASCII characters encoded in MIME s Base64 scheme as follows newlines and white spaces may be present anywhere but are to be ignored on decoding TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu In the above quote the encoded value of Man is TWFu Encoded in ASCII the characters M a and n are stored as the byte values 77 97 and 110 which are the 8 bit binary values 01001101 01100001 and 01101110 These three values are joined together into a 24 bit string producing 010011010110000101101110 Groups of 6 bits 6 bits have a maximum of 26 64 different binary values are converted into individual numbers from start to end in this case there are four numbers in a 24 bit string which are then converted into their corresponding Base64 character values As this example illustrates Base64 encoding converts three octets into four encoded characters Source Text ASCII M a nOctets 77 0x4d 97 0x61 110 0x6e Bits 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 0Base64encoded Sextets 19 22 5 46Character T W F uOctets 84 0x54 87 0x57 70 0x46 117 0x75 padding characters might be added to make the last encoded block contain four Base64 characters Hexadecimal to octal transformation is useful to convert between binary and Base64 Such conversion is available for both advanced calculators and programming languages For example the hexadecimal representation of the 24 bits above is 4D616E The octal representation is 23260556 Those 8 octal digits can be split into pairs 23 26 05 56 and each pair can be converted to decimal to yield 19 22 05 46 Using those four decimal numbers as indices for the Base64 alphabet the corresponding ASCII characters are TWFu If there are only two significant input octets e g Ma or when the last input group contains only two octets all 16 bits will be captured in the first three Base64 digits 18 bits the two least significant bits of the last content bearing 6 bit block will turn 
out to be zero and discarded on decoding along with the succeeding padding character Source Text ASCII M aOctets 77 0x4d 97 0x61 Bits 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 0 Base64encoded Sextets 19 22 4 PaddingCharacter T W E Octets 84 0x54 87 0x57 69 0x45 61 0x3D If there is only one significant input octet e g M or when the last input group contains only one octet all 8 bits will be captured in the first two Base64 digits 12 bits the four least significant bits of the last content bearing 6 bit block will turn out to be zero and discarded on decoding along with the succeeding two padding characters Source Text ASCII MOctets 77 0x4d Bits 0 1 0 0 1 1 0 1 0 0 0 0 Base64encoded Sextets 19 16 Padding PaddingCharacter T Q Octets 84 0x54 81 0x51 61 0x3D 61 0x3D Output padding Edit Because Base64 is a six bit encoding and because the decoded values are divided into 8 bit octets every four characters of Base64 encoded text 4 sextets 4 6 24 bits represents three octets of unencoded text or data 3 octets 3 8 24 bits This means that when the length of the unencoded input is not a multiple of three the encoded output must have padding added so that its length is a multiple of four The padding character is which indicates that no further bits are needed to fully encode the input This is different from A which means that the remaining bits are all zeros The example below illustrates how truncating the input of the above quote changes the output padding Input Output PaddingText Length Text Lengthlight wor k 11 bGlnaHQgd29y ay4 16 1light wor k 10 bGlnaHQgd29y aw 16 2light wor 9 bGlnaHQgd29y 12 0light wo 8 bGlnaHQgd28 12 1light w 7 bGlnaHQgdw 12 2The padding character is not essential for decoding since the number of missing bytes can be inferred from the length of the encoded text In some implementations the padding character is mandatory while for others it is not used An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated 
Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded | Padding | Decoded length of last group | Decoded
bGlnaHQgdw== | == | 1 | light w
bGlnaHQgd28= | = | 2 | light wo
bGlnaHQgd29y | None | 3 | light wor

Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a = is encountered. For example, when "bGlnaHQgdw==" is decoded, we convert each character (except the trailing occurrences of =) into its corresponding 6-bit representation, and then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d and another 6 bits from the w, for a bit string of length 12; but since we remove 2 bits for each = (for a total of 4 bits), the dw== ends up producing 8 bits (1 byte) when decoded.

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: the first character contributes 6 bits, and the second character contributes its first 2 bits. For example:

Encoded length of last group | Encoded | Decoded length of last group | Decoded
2 | bGlnaHQgdw | 1 | light w
3 | bGlnaHQgd28 | 2 | light wo
4 | bGlnaHQgd29y | 3 | light wor

Implementations and history

Variants summary table

Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding, which may be mandatory in some
protocols or removed in others. The table below summarizes these known variants and provides links to the subsections below.

Encoding | 62nd char | 63rd char | Pad | Line separators | Line length | Line checksum | Decoding non-encoding characters
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated) | + | / | = mandatory | CR+LF | 64, or lower for the last line | No | No
RFC 2045: Base64 transfer encoding for MIME | + | / | = mandatory | CR+LF | At most 76 | No | Discarded
RFC 2152: Base64 for UTF-7 | + | / | No | No | | | No
RFC 3501: Base64 encoding for IMAP mailbox names | + | , | No | No | | | No
RFC 4648 §4: base64 (standard)[a] | + | / | = optional | No | | | No
RFC 4648 §5: base64url (URL- and filename-safe standard)[a] | - | _ | = optional | No | | | No
RFC 4880: Radix-64 for OpenPGP | + | / | = mandatory | CR+LF | At most 76 | Radix-64 encoded 24-bit CRC | No
Other variations: see "Applications not compatible with RFC 4648 Base64" below.

[a] It is important to note that this variant is intended to provide common features where they are not desired to be specialized by implementations, ensuring robust engineering. This is particularly in light of separate line encodings and restrictions, which have not been considered when previous standards have been co-opted for use elsewhere. Thus, the features indicated here may be overridden.

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[7]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A-Z, a-z), the numerals (0-9), and the + and / symbols. The = symbol is also used as a padding suffix.[4] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding,
the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.

The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two octets of the 24-bit buffer are padded zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.

MIME

Main article: MIME

The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[5] MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the = symbol for output padding in the same way, as described in RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64
encoding characters (for example, CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4/3 × 78/76), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:

bytes = (string_length(encoded_string) - 814) / 1.37

UTF-7

Main article: UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[8][9]

The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the = padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the = character is reserved in that context as the escape character for quoted-printable encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits, leaving up to three unused bits in the last Base64 digit.

OpenPGP

Further information: Pretty Good Privacy § OpenPGP

OpenPGP, described in RFC 4880, describes Radix-64 encoding, also known as "ASCII armor". Radix-64 is identical to the Base64 encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the = symbol as the separator, appended to the encoded output data.[10]

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045
specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.

Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.[6]

RFC 4648

This RFC obsoletes RFC 3548 and focuses on Base64/32/16:

This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, the use of different encoding alphabets, and canonical encodings.

URL applications

Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in a URL requires encoding of the +, / and = characters into special percent-encoded hexadecimal sequences (+ becomes %2B, / becomes %2F and = becomes %3D), which makes the string unnecessarily longer.

For this reason, modified Base64 for URL variants exist (such as base64url in RFC 4648), where the + and / characters of standard Base64 are respectively replaced by - and _, so that using URL encoders/decoders is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. A popular
site to make use of such is YouTube.[11] Some variants allow or require omitting the padding = signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode = to ".", potentially exposing applications to relative path attacks when a folder name is encoded from user data.[citation needed]

HTML

The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[12] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.

Other applications

Base64 can be used in a variety of contexts:

Base64 can be used to transmit and store text that might otherwise cause delimiter collision.
Base64 is used to encode character strings in LDAP Data Interchange Format files.
Base64 is often used to embed binary data in an XML file, using a syntax similar to <data encoding="base64">…</data>, e.g. favicons in Firefox's exported bookmarks.html.
Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.[13]
Base64 can be used to store or transmit relatively small amounts of binary data via a computer's text clipboard functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64-encoded text strings, which can be easily copied and pasted into users' wallet software.
Binary data that must be
quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
QR codes which contain binary data will sometimes store it encoded in Base64 rather than simply storing the raw binary data, as there is a stronger guarantee that all QR code readers will accurately decode text, as well as the fact that some devices will more readily save text from a QR code than potentially malicious binary data.

Applications not compatible with RFC 4648 Base64

Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see Variants summary table above).

The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (" ", space) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.[citation needed]
BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like "7", "O", "g" and "o". Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8 § Self-synchronization.

Several other applications use alphabets similar to the common variations, but in a different order:

Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64.
crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz". Padding is not used.
The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".[14]
bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".[15]
Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.[16]
Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".[17]

One issue with the RFC 4648 alphabet is that, when a sorted list of ASCII-encoded strings is Base64-transformed and sorted again, the order of elements changes. This is because the padding character and the characters in the substitution alphabet are not ordered by ASCII character value, as sorting each column of the following sample table shows. Alphabets like (unpadded) B64 address this.

ASCII | Base64 | Base64, no padding | B64
light w | bGlnaHQgdw== | bGlnaHQgdw | P4ZbO5EURk
light wo | bGlnaHQgd28= | bGlnaHQgd28 | P4ZbO5EURqw
light wor | bGlnaHQgd29y | bGlnaHQgd29y | P4ZbO5EURqxm

See also

8BITMIME
Ascii85 (also called Base85)
Base16
Base32
Base36
Base62
Binary-to-text encoding for a comparison of various encoding algorithms
Binary number
URL

References

1. "Base64 encoding and decoding - Web APIs". MDN Web Docs.
2. "When to base64 encode images (and when not to)". 28 August 2011.
3. The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
4. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
5. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
6. The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
7. Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
8. UTF-7: A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
9. UTF-7: A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
10. OpenPGP Message Format. IETF. November 2007. doi:10.17487/RFC4880. RFC 4880. Retrieved March 18, 2010.
11. "Here's Why YouTube Will Practically Never Run Out of Unique Video IDs". www.mentalfloss.com. 23 March 2016. Retrieved 27 December 2021.
12. "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2018. Introduced by changeset 5814, 2021-02-01.
13. "Edit fiddle". jsfiddle.net.
14. "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
15. Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". Retrieved 2018-05-18.
16. "6PACK a "real time" PC to TNC protocol". Retrieved 2013-05-19.
17. "Shell Arithmetic". Bash Reference Manual. Retrieved 8 April 2020. "Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base."

Retrieved from "https://en.wikipedia.org/w/index.php?title=Base64&oldid=1131302124"
