Network Working Group                                       S. Josefsson
Request for Comments: 4648                                           SJD
Obsoletes: 3548                                             October 2006
Category: Standards Track
        

The Base16, Base32, and Base64 Data Encodings

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document describes the commonly used base 64, base 32, and base 16 encoding schemes. It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.

Table of Contents

   1. Introduction
   2. Conventions Used in This Document
   3. Implementation Discrepancies
      3.1. Line Feeds in Encoded Data
      3.2. Padding of Encoded Data
      3.3. Interpretation of Non-Alphabet Characters in Encoded Data
      3.4. Choosing the Alphabet
      3.5. Canonical Encoding
   4. Base 64 Encoding
   5. Base 64 Encoding with URL and Filename Safe Alphabet
   6. Base 32 Encoding
   7. Base 32 Encoding with Extended Hex Alphabet
   8. Base 16 Encoding
   9. Illustrations and Examples
   10. Test Vectors
   11. ISO C99 Implementation of Base64
   12. Security Considerations
   13. Changes Since RFC 3548
   14. Acknowledgements
   15. Copying Conditions
   16. References
      16.1. Normative References
      16.2. Informative References
        
        
1. Introduction

Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII [1] data. Base encoding can also be used in new applications that do not have legacy restrictions, simply because it makes it possible to manipulate objects with text editors.

In the past, different applications have had different requirements and thus sometimes implemented base encodings in slightly different ways. Today, protocol specifications sometimes use base encodings in general, and "base64" in particular, without a precise description or reference. Multipurpose Internet Mail Extensions (MIME) [4] is often used as a reference for base64 without considering the consequences for line-wrapping or non-alphabet characters. The purpose of this specification is to establish common alphabet and encoding considerations. This will hopefully reduce ambiguity in other documents, leading to better interoperability.

2. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [2].

3. Implementation Discrepancies

Here we discuss the discrepancies between base encoding implementations in the past and, where appropriate, mandate a specific recommended behavior for the future.

3.1. Line Feeds in Encoded Data

MIME [4] is often used as a reference for base 64 encoding. However, MIME does not define "base 64" per se, but rather a "base 64 Content-Transfer-Encoding" for use within MIME. As such, MIME enforces a limit on line length of base 64-encoded data to 76 characters. MIME inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating that it is "virtually identical"; however, PEM uses a line length of 64 characters. The MIME and PEM limits are both due to limits within SMTP.

Implementations MUST NOT add line feeds to base-encoded data unless the specification referring to this document explicitly directs base encoders to add line feeds after a specific number of characters.
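
The difference between the two behaviours can be sketched in Python. The use of Python and of its standard base64 module is an assumption of this illustration, not something this document prescribes. In that module, b64encode emits no line feeds, while encodebytes produces MIME-style output with a newline after at most 76 characters.

```python
import base64

data = bytes(range(100))  # 100 arbitrary octets

# Default behaviour described above: no line feeds in the output.
plain = base64.b64encode(data)

# MIME-style output: a newline after at most 76 characters.
# Appropriate only when the referring specification asks for it.
mime = base64.encodebytes(data)

print(b"\n" in plain)
print([len(line) for line in mime.splitlines()])
```

Removing the newlines from the wrapped form gives back the unwrapped form, so the two differ only in line structure.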

3.2. Padding of Encoded Data

In some circumstances, the use of padding ("=") in base-encoded data is not required or used. In the general case, when assumptions about the size of transported data cannot be made, padding is required to yield correct decoded data.

Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.

The base64 and base32 alphabets use padding, as described below in sections 4 and 6, but the base16 alphabet does not need it; see section 8.
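
As a rough illustration for base 64 only, the following Python sketch shows how a specification that omits padding might strip it and how a receiver could restore it before decoding. The helper names strip_padding and restore_padding are hypothetical and exist only for this example.

```python
import base64

def strip_padding(encoded: bytes) -> bytes:
    """Drop trailing "=" (only if the referring specification allows it)."""
    return encoded.rstrip(b"=")

def restore_padding(encoded: bytes) -> bytes:
    """Re-add "=" so that the length is a multiple of 4 (base 64 only)."""
    return encoded + b"=" * (-len(encoded) % 4)

padded = base64.b64encode(b"fo")
unpadded = strip_padding(padded)
print(padded, unpadded, base64.b64decode(restore_padding(unpadded)))
```

Restoring padding this way works only because the length of the unpadded string is known; that is the implicit size assumption mentioned above.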

3.3. Interpretation of Non-Alphabet Characters in Encoded Data

Base encodings use a specific, reduced alphabet to encode binary data. Non-alphabet characters could exist within base-encoded data, caused by data corruption or by design. Non-alphabet characters may be exploited as a "covert channel", where non-protocol data can be sent for nefarious purposes. Non-alphabet characters might also be sent in order to exploit implementation errors leading to, e.g., buffer overflow attacks.

Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may instead state, as MIME does, that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any adjacent carriage return/line feed (CRLF) characters constitute "non-alphabet characters" and are ignored. Furthermore, such specifications MAY ignore the pad character, "=", treating it as non-alphabet data, if it is present before the end of the encoded data. If more than the allowed number of pad characters is found at the end of the string (e.g., a base 64 string terminated with "==="), the excess pad characters MAY also be ignored.
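
The strict and the lenient interpretation can be contrasted with a short Python sketch. It assumes the standard base64 module, whose b64decode takes a validate flag; the wrapper names decode_strict, decode_lenient, and is_rejected are hypothetical.

```python
import base64
import binascii

def decode_strict(encoded: bytes) -> bytes:
    """Reject input containing any character outside the base 64 alphabet."""
    return base64.b64decode(encoded, validate=True)

def decode_lenient(encoded: bytes) -> bytes:
    """MIME-like behaviour: silently discard non-alphabet characters."""
    return base64.b64decode(encoded, validate=False)

def is_rejected(encoded: bytes) -> bool:
    """True if the strict decoder refuses the input."""
    try:
        decode_strict(encoded)
    except binascii.Error:
        return True
    return False

wrapped = b"Zm9v\r\nYmFy"      # "foobar" with an embedded CRLF
print(is_rejected(wrapped))    # strict decoder refuses the CRLF
print(decode_lenient(wrapped)) # lenient decoder skips it
```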

3.4. Choosing the Alphabet

Different applications have different requirements on the characters in the alphabet. Here are a few requirements that determine which alphabet should be used:

o Handled by humans. The characters "0" and "O" are easily confused, as are "1", "l", and "I". In the base32 alphabet below, where 0 (zero) and 1 (one) are not present, a decoder may interpret 0 as O, and 1 as I or L depending on case. (However, by default it should not; see previous section.)

o Encoded into structures that mandate other requirements. For base 16 and base 32, this determines the use of upper- or lowercase alphabets. For base 64, the non-alphanumeric characters (in particular, "/") may be problematic in file names and URLs.

o Used as identifiers. Certain characters, notably "+" and "/" in the base 64 alphabet, are treated as word-breaks by legacy text search/index tools.
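
For the human-handling case, one possible shape of an opt-in lenient base 32 decoder is sketched below in Python. It relies on the map01 argument of base64.b32decode in the standard library; the wrapper decode_base32 and its forgive_digits flag are hypothetical names for this illustration.

```python
import base64
import binascii

def decode_base32(text: bytes, forgive_digits: bool = False) -> bytes:
    """Decode base 32; optionally read 0 as O and 1 as I."""
    return base64.b32decode(text, map01=b"I" if forgive_digits else None)

typed = b"MZXW6YTB01======"   # "0" typed for "O", "1" typed for "I"

try:
    decode_base32(typed)      # default: digits 0 and 1 are rejected
except binascii.Error as exc:
    print("rejected:", exc)

print(decode_base32(typed, forgive_digits=True))
```

The lenient mode is off by default in this sketch, matching the guidance that a decoder should not apply such substitutions unless told to.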

There is no universally accepted alphabet that fulfills all the requirements. For an example of a highly specialized variant, see IMAP [8]. In this document, we document and name some currently used alphabets.

3.5. Canonical Encoding

The padding step in base 64 and base 32 encoding can, if improperly implemented, lead to non-significant alterations of the encoded data. For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used. The remaining four bits of that symbol are pad bits and MUST be set to zero by conforming encoders, as described in the discussion of padding below. If this property does not hold, there is no canonical representation of base-encoded data, and multiple base-encoded strings can be decoded to the same binary data. If this property (and others discussed in this document) holds, a canonical encoding is guaranteed.

In some environments, the alteration is critical and therefore decoders MAY choose to reject an encoding if the pad bits have not been set to zero. The specification referring to this document may mandate a specific behaviour.
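
One way a decoder might detect non-zero pad bits is to decode and then re-encode, comparing the result with the input. The Python sketch below takes that approach; it is an illustration rather than a required method, and the function name is_canonical_base64 is hypothetical.

```python
import base64
import binascii

def is_canonical_base64(encoded: bytes) -> bool:
    """True if `encoded` is exactly what a conforming encoder would emit."""
    try:
        decoded = base64.b64decode(encoded, validate=True)
    except binascii.Error:
        return False
    # Re-encoding always yields zero pad bits, so any difference means
    # the input carried non-zero pad bits or malformed padding.
    return base64.b64encode(decoded) == encoded

print(is_canonical_base64(b"Zg=="))  # True: "g" is 100000, pad bits 0000
print(is_canonical_base64(b"Zh=="))  # False: "h" is 100001, pad bits 0001
```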

4. Base 64 Encoding

The following description of base 64 is derived from [3], [4], [5], and [6]. This encoding may be referred to as "base64".

The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human readable.

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string.

Table 1: The Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the '=' character. Since all base 64 input is an integral number of octets, only the following cases can arise:

(1) The final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.
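
The grouping and the three padding cases above can be sketched as a small Python encoder. This is a minimal illustration written for this section, not the reference implementation; the names ALPHABET and base64_encode are hypothetical.

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def base64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Right-pad the final group with zero bits to a full 24 bits.
        group = int.from_bytes(chunk + bytes(3 - len(chunk)), "big")
        # Split into four 6-bit indices, most significant first.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 octet -> 2 chars + "==", 2 octets -> 3 chars + "=", 3 -> 4 chars.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(base64_encode(b"f"), base64_encode(b"fo"), base64_encode(b"foo"))
```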

5. Base 64 Encoding with URL and Filename Safe Alphabet

The Base 64 encoding with a URL and filename safe alphabet has been used in [12].

An alternative alphabet has been suggested that would use "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead. The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.

The pad character "=" is typically percent-encoded when used in a URI [9], but if the data length is known implicitly, this can be avoided by skipping the padding; see section 3.2.

This encoding may be referred to as "base64url". This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64". Unless clarified otherwise, "base64" refers to the base 64 in the previous section.

This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

Table 2: The "URL and Filename safe" Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 - (minus)
        12 M            29 d            46 u            63 _ (underline)
        13 N            30 e            47 v
        14 O            31 f            48 w
        15 P            32 g            49 x
        16 Q            33 h            50 y         (pad) =
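
To see that the two alphabets differ only at values 62 and 63, consider the Python sketch below. It assumes the standard base64 module, and the three input octets were chosen for this example so that every 6-bit group is 62 or 63.

```python
import base64

data = bytes([0xFB, 0xFF, 0xBF])   # 6-bit groups: 62, 63, 62, 63

standard = base64.b64encode(data)          # uses "+" and "/"
url_safe = base64.urlsafe_b64encode(data)  # uses "-" and "_"

# Translating "+/" to "-_" turns one encoding into the other.
converted = standard.translate(bytes.maketrans(b"+/", b"-_"))
print(standard, url_safe, converted == url_safe)
```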

6. Base 32 Encoding

The following description of base 32 is derived from [11] (with corrections). This encoding may be referred to as "base32".

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not be human readable.

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the base 32 alphabet. When a bit stream is encoded via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3, below, are selected from US-ASCII digits and uppercase letters.

Table 3: The Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A             9 J            18 S            27 3
         1 B            10 K            19 T            28 4
         2 C            11 L            20 U            29 5
         3 D            12 M            21 V            30 6
         4 E            13 N            22 W            31 7
         5 F            14 O            23 X
         6 G            15 P            24 Y         (pad) =
         7 H            16 Q            25 Z
         8 I            17 R            26 2

Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

(1) The final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters.

(4) The final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters.

(5) The final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.
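
The five cases above follow from one rule: the number of characters that carry input bits is the number of input bits divided by 5, rounded up. A minimal Python sketch under that reading is given below; the names ALPHABET and base32_encode are hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def base32_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        # Right-pad the final group with zero bits to a full 40 bits.
        group = int.from_bytes(chunk + bytes(5 - len(chunk)), "big")
        # Split into eight 5-bit indices, most significant first.
        chars = [ALPHABET[(group >> shift) & 0x1F]
                 for shift in range(35, -1, -5)]
        # Characters that carry input bits: ceil(8 * octets / 5).
        keep = (8 * len(chunk) + 4) // 5
        out.append("".join(chars[:keep]) + "=" * (8 - keep))
    return "".join(out)

print(base32_encode(b"f"), base32_encode(b"foob"), base32_encode(b"fooba"))
```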

7. Base 32 Encoding with Extended Hex Alphabet

The following description of base 32 is derived from [7]. This encoding may be referred to as "base32hex". This encoding should not be regarded as the same as the "base32" encoding and should not be referred to as only "base32". This encoding is used by, e.g., NextSECure3 (NSEC3) [10].

One property of this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

This encoding is identical to the previous one, except for the alphabet. The new alphabet is found in Table 4.

Table 4: The "Extended Hex" Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             9 9            18 I            27 R
         1 1            10 A            19 J            28 S
         2 2            11 B            20 K            29 T
         3 3            12 C            21 L            30 U
         4 4            13 D            22 M            31 V
         5 5            14 E            23 N
         6 6            15 F            24 O         (pad) =
         7 7            16 G            25 P
         8 8            17 H            26 Q
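
The sort-order property can be checked with a short Python sketch. To avoid depending on a particular library version, it derives base32hex from the standard library's base 32 encoder by character translation; the names BASE32, BASE32HEX, and base32hex_encode are hypothetical, and the comparison covers equal-length inputs only.

```python
import base64

BASE32 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX = b"0123456789ABCDEFGHIJKLMNOPQRSTUV"

def base32hex_encode(data: bytes) -> bytes:
    """Encode as base32, then swap each character for its base32hex peer."""
    table = bytes.maketrans(BASE32, BASE32HEX)
    return base64.b32encode(data).translate(table)

inputs = sorted([b"\x00", b"\x7f", b"\xff"])
std = [base64.b32encode(x) for x in inputs]
hexed = [base32hex_encode(x) for x in inputs]

print(std == sorted(std))      # False: digits sort before letters in ASCII
print(hexed == sorted(hexed))  # True: alphabet order matches ASCII order
```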

8. Base 16 Encoding

The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard case-insensitive hex encoding and may be referred to as "base16" or "hex".

A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, an 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single character in the base 16 alphabet.

Each 4-bit group is used as an index into an array of 16 printable characters. The character referenced by the index is placed in the output string.

Table 5: The Base 16 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             4 4             8 8            12 C
         1 1             5 5             9 9            13 D
         2 2             6 6            10 A            14 E
         3 3             7 7            11 B            15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.
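
Because every octet maps to exactly two characters, a base 16 codec is short. The Python sketch below is one possible rendering; the names ALPHABET, base16_encode, and base16_decode are hypothetical, and the decoder in this sketch accepts only the uppercase alphabet of Table 5.

```python
ALPHABET = "0123456789ABCDEF"

def base16_encode(data: bytes) -> str:
    # Each octet yields two characters: high nibble first, then low nibble.
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)

def base16_decode(text: str) -> bytes:
    if len(text) % 2:
        raise ValueError("odd number of characters")
    # str.index raises ValueError for characters outside the alphabet.
    values = [ALPHABET.index(c) for c in text]
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))

print(base16_encode(b"foobar"), base16_decode("666F6F626172"))
```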

9. Illustrations and Examples

To translate between binary and a base encoding, the input is stored in a structure, and the output is extracted. The case for base 64 is displayed in the following figure, borrowed from [5].

            +--first octet--+-second octet--+--third octet--+
            |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
            +-----------+---+-------+-------+---+-----------+
            |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
            +--1.index--+--2.index--+--3.index--+--4.index--+
        

The case for base 32 is shown in the following figure, borrowed from [7]. Each successive character in a base-32 value represents 5 successive bits of the underlying octet sequence. Thus, each group of 8 characters represents a sequence of 5 octets (40 bits).

                        1          2          3
             01234567 89012345 67890123 45678901 23456789
            +--------+--------+--------+--------+--------+
            |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
            +--------+--------+--------+--------+--------+
                                                    <===> 8th character
                                              <====> 7th character
                                         <===> 6th character
                                   <====> 5th character
                             <====> 4th character
                        <===> 3rd character
                  <====> 2nd character
             <===> 1st character
        

The following example of Base64 data is from [5], with corrections.

      Input data:  0x14fb9c03d97e
      Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
      8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
      6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
      Decimal: 5      15     46     28       0      61     37     62
      Output:  F      P      u      c        A      9      l      +

      Input data:  0x14fb9c03d9
      Hex:     1   4    f   b    9   c     | 0   3    d   9
      8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                      pad with 00
      6-bit:   000101 001111 101110 011100 | 000000 111101 100100
      Decimal: 5      15     46     28       0      61     36
                                                         pad with =
      Output:  F      P      u      c        A      9      k      =

      Input data:  0x14fb9c03
      Hex:     1   4    f   b    9   c     | 0   3
      8-bit:   00010100 11111011 10011100  | 00000011
                                             pad with 0000
      6-bit:   000101 001111 101110 011100 | 000000 110000
      Decimal: 5      15     46     28       0      48
                                                  pad with =      =
      Output:  F      P      u      c        A      w      =      =
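
The worked examples can be reproduced with the following Python sketch, which prints the decimal 6-bit groups and the encoded output for each input. It assumes the standard base64 module; the helper name six_bit_groups is hypothetical.

```python
import base64

def six_bit_groups(octets: bytes) -> list:
    """Return the 6-bit group values of the zero-padded bit string."""
    bits = "".join(f"{b:08b}" for b in octets)
    bits += "0" * (-len(bits) % 6)   # pad on the right with zero bits
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

# The three inputs from the examples, written as hex strings.
for hex_input in ("14fb9c03d97e", "14fb9c03d9", "14fb9c03"):
    octets = bytes.fromhex(hex_input)
    encoded = base64.b64encode(octets).decode("ascii")
    print(hex_input, six_bit_groups(octets), encoded)
```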
        
10. Test Vectors

   BASE64("") = ""
   BASE64("f") = "Zg=="
   BASE64("fo") = "Zm8="
   BASE64("foo") = "Zm9v"
   BASE64("foob") = "Zm9vYg=="
   BASE64("fooba") = "Zm9vYmE="
   BASE64("foobar") = "Zm9vYmFy"

   BASE32("") = ""
   BASE32("f") = "MY======"
   BASE32("fo") = "MZXQ===="
   BASE32("foo") = "MZXW6==="
   BASE32("foob") = "MZXW6YQ="
   BASE32("fooba") = "MZXW6YTB"
   BASE32("foobar") = "MZXW6YTBOI======"

   BASE32-HEX("") = ""
   BASE32-HEX("f") = "CO======"
   BASE32-HEX("fo") = "CPNG===="
   BASE32-HEX("foo") = "CPNMU==="
   BASE32-HEX("foob") = "CPNMUOG="
   BASE32-HEX("fooba") = "CPNMUOJ1"
   BASE32-HEX("foobar") = "CPNMUOJ1E8======"

   BASE16("") = ""
   BASE16("f") = "66"
   BASE16("fo") = "666F"
   BASE16("foo") = "666F6F"
   BASE16("foob") = "666F6F62"
   BASE16("fooba") = "666F6F6261"
   BASE16("foobar") = "666F6F626172"
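
An implementation can be checked against these vectors mechanically. The Python sketch below does so for the base 64, base 32, and base 16 vectors using the standard base64 module (an assumption of this example). The base32hex vectors are left out to keep the sketch short, and the names VECTORS and failures are hypothetical.

```python
import base64

# plain text -> (base64, base32, base16), copied from the vectors above
VECTORS = {
    b"": (b"", b"", b""),
    b"f": (b"Zg==", b"MY======", b"66"),
    b"fo": (b"Zm8=", b"MZXQ====", b"666F"),
    b"foo": (b"Zm9v", b"MZXW6===", b"666F6F"),
    b"foob": (b"Zm9vYg==", b"MZXW6YQ=", b"666F6F62"),
    b"fooba": (b"Zm9vYmE=", b"MZXW6YTB", b"666F6F6261"),
    b"foobar": (b"Zm9vYmFy", b"MZXW6YTBOI======", b"666F6F626172"),
}

def failures() -> list:
    """Return the inputs for which any encoder or decoder disagrees."""
    bad = []
    for plain, (b64, b32, b16) in VECTORS.items():
        ok = (base64.b64encode(plain) == b64
              and base64.b64decode(b64) == plain
              and base64.b32encode(plain) == b32
              and base64.b32decode(b32) == plain
              and base64.b16encode(plain) == b16
              and base64.b16decode(b16) == plain)
        if not ok:
            bad.append(plain)
    return bad

print(failures())  # an empty list means every vector matched
```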

11. ISO C99 Implementation of Base64

An ISO C99 implementation of Base64 encoding and decoding that is believed to follow all recommendations in this RFC is available from:

      http://josefsson.org/base-encoding/
        

This code is not normative.

The code could not be included in this RFC for procedural reasons (RFC 3978 section 5.4).

12. Security Considerations

When base encoding and decoding is implemented, care should be taken not to introduce vulnerabilities to buffer overflow attacks, or other attacks on the implementation. A decoder should not break on invalid input including, e.g., embedded NUL characters (ASCII 0).
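
One way to keep a decoder from misbehaving on invalid input is to reject it explicitly and report the failure to the caller. The sketch below assumes Python's standard-library base64 module and its validate flag. The function name decode_base64_strict is hypothetical, and returning None on failure is an illustrative design choice, not a requirement of this document.

```python
import base64
import binascii
from typing import Optional


def decode_base64_strict(text: bytes) -> Optional[bytes]:
    """Return the decoded bytes, or None if the input is not valid Base64."""
    try:
        # validate=True makes the library reject characters outside the
        # alphabet (including NUL) instead of silently discarding them.
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None
```

With this helper, b"Zm9vYmFy" decodes to b"foobar", while the same text with an embedded NUL octet yields None and raises no unhandled exception.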

If non-alphabet characters are ignored, instead of causing rejection of the entire encoding (as recommended), a covert channel that can be used to "leak" information is made possible. The ignored characters could also be used for other nefarious purposes, such as to avoid a string equality comparison or to trigger implementation bugs. The implications of ignoring non-alphabet characters should be understood in applications that do not follow the recommended practice. Similarly, when the base 16 and base 32 alphabets are handled case insensitively, alteration of case can be used to leak information or make string equality comparisons fail.
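
The effect on string comparison can be seen with a lenient decoder. The sketch below assumes a Python 3 standard library in which base64.b64decode defaults to validate=False and in which the base 32 and base 16 decoders take a casefold flag. The variable names are illustrative.

```python
import base64

canonical = b"Zm9vYmFy"      # BASE64("foobar")
smuggled = b"Zm9v!Ym\nFy"    # same data with two non-alphabet characters added

# The default (validate=False) discards characters outside the alphabet,
# so two different strings decode to the same octets.
lenient_a = base64.b64decode(canonical)
lenient_b = base64.b64decode(smuggled)

# With casefold=True the base 32 and base 16 decoders accept lower case,
# so differently cased strings also decode to the same octets.
upper32 = base64.b32decode(b"MZXW6YTB")
lower32 = base64.b32decode(b"mzxw6ytb", casefold=True)
mixed16 = base64.b16decode(b"666f6F", casefold=True)
```

Because canonical and smuggled are unequal as strings but equal once decoded, a comparison performed on the encoded form can be bypassed. Comparing decoded octets, or rejecting non-canonical input, avoids this.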

When padding is used, there are some non-significant bits that warrant security concerns, as they may be abused to leak information or used to bypass string equality comparisons or to trigger implementation problems.
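
As an illustration, "Zg==" is the encoding of "f", and "Zh==" differs from it only in the unused low-order bits of the second character. A decoder that ignores those bits treats both as the same data. The sketch below shows one possible check, assuming Python's standard-library base64 module: it re-encodes the decoded octets and compares the result with the input. The function name is_canonical_base64 is hypothetical.

```python
import base64
import binascii


def is_canonical_base64(encoded: bytes) -> bool:
    """True if encoded is exactly what an encoder would emit for its data."""
    try:
        decoded = base64.b64decode(encoded, validate=True)
    except binascii.Error:
        return False
    # Re-encoding always produces zero pad bits, so any difference means
    # the input carried non-zero pad bits or was otherwise non-canonical.
    return base64.b64encode(decoded) == encoded
```

The round trip costs one extra encode per input, which is usually acceptable for short values such as tokens and identifiers.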

Base encoding visually hides otherwise easily recognized information, such as passwords, but does not provide any computational confidentiality. This has been known to cause security incidents when, e.g., a user reports details of a network protocol exchange (perhaps to illustrate some other problem) and accidentally reveals the password because she is unaware that the base encoding does not protect the password.
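
The point can be made concrete in a few lines. The sketch below assumes Python's standard-library base64 module. The credential string is invented purely for illustration.

```python
import base64

# Hypothetical credential, chosen only for illustration.
captured = base64.b64encode(b"alice:open sesame")

# Anyone who sees the encoded form can reverse it; no key is involved.
recovered = base64.b64decode(captured)
```

Here recovered equals the original credential, so an encoded password in a pasted protocol trace should be treated as disclosed.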

Base encoding adds no entropy to the plaintext, but it does increase the amount of plaintext available and provide a signature for cryptanalysis in the form of a characteristic probability distribution.
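
Both effects are easy to observe. The sketch below assumes Python's standard-library base64, os, and collections modules. The sample size of 3000 octets is arbitrary, and the variable names are illustrative.

```python
import base64
import os
from collections import Counter

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

raw = os.urandom(3000)             # 3000 random octets
encoded = base64.b64encode(raw)    # 3000 is a multiple of 3, so no padding

# Every 3 input octets become 4 output characters: more text, same information.
expansion = len(encoded) / len(raw)

# The output draws on at most 64 of the 256 possible octet values, which is
# a distribution that is easy to recognise statistically.
histogram = Counter(encoded)
distinct_symbols = len(histogram)
```

Uniformly random input would be expected to use close to 256 distinct octet values, whereas its Base64 form never uses more than 64.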

13. Changes Since RFC 3548

Added the "base32 extended hex alphabet", needed to preserve sort order of encoded data.

Referenced IMAP for the special Base64 encoding used there.

Fixed the example copied from RFC 2440.

Added security consideration about providing a signature for cryptanalysis.

Added test vectors.

Fixed typos.

14. Acknowledgements

Several people offered comments and/or suggestions, including John E. Hadstate, Tony Hansen, Gordon Mohr, John Myers, Chris Newman, and Andrew Sieber. Text used in this document is based on earlier RFCs describing specific uses of various base encodings. The author acknowledges the RSA Laboratories for supporting the work that led to this document.

This revised version is based in part on comments and/or suggestions made by Roy Arends, Eric Blake, Brian E Carpenter, Elwyn Davies, Bill Fenner, Sam Hartman, Ted Hardie, Per Hygum, Jelte Jansen, Clement Kent, Tero Kivinen, Paul Kwiatkowski, and Ben Laurie.

15. Copying Conditions

Copyright (c) 2000-2006 Simon Josefsson

Regarding the abstract and sections 1, 3, 8, 10, 12, 13, and 14 of this document, that were written by Simon Josefsson ("the author", for the remainder of this section), the author makes no guarantees and is not responsible for any damage resulting from its use. The author grants irrevocable permission to anyone to use, modify, and distribute it in any way that does not diminish the rights of anyone else to use, modify, and distribute it, provided that redistributed derivative works do not contain misleading author or version information and do not falsely purport to be IETF RFC documents. Derivative works need not be licensed under similar terms.

16. References

16.1. Normative References

[1] Cerf, V., "ASCII format for network interchange", RFC 20, October 1969.

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

16.2. Informative References

[3] Linn, J., "Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures", RFC 1421, February 1993.

[4] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[5] Callas, J., Donnerhacke, L., Finney, H., and R. Thayer, "OpenPGP Message Format", RFC 2440, November 1998.

[6] Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "DNS Security Introduction and Requirements", RFC 4033, March 2005.

[7] Klyne, G. and L. Masinter, "Identifying Composite Media Features", RFC 2938, September 2000.

[8] Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1", RFC 3501, March 2003.

[9] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

[10] Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNSSEC Hash Authenticated Denial of Existence", Work in Progress, June 2006.

[11] Myers, J., "SASL GSSAPI mechanisms", Work in Progress, May 2000.

[12] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", http://zgp.org/pipermail/p2p-hackers/2001-September/000315.html, September 2001.

Author's Address

   Simon Josefsson
   SJD
   EMail: simon@josefsson.org

Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).
