Network Working Group                                          K. Tamaru
Request for Comments: 2237                         Microsoft Corporation
Category: Informational                                    November 1997
        
Network Working Group                                          K. Tamaru
Request for Comments: 2237                         Microsoft Corporation
Category: Informational                                    November 1997
        

Japanese Character Encoding for Internet Messages

互联网信息的日文字符编码

Status of this Memo

本备忘录的状况

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

本备忘录为互联网社区提供信息。它没有规定任何类型的互联网标准。本备忘录的分发不受限制。

Copyright Notice

版权公告

Copyright (C) The Internet Society (1997). All Rights Reserved.

版权所有(C)互联网协会(1997年)。版权所有。

1. Abstract
1. 摘要

This memo defines an encoding scheme for the Japanese Characters, describes "ISO-2022-JP-1", which is used in electronic mail [RFC-822], and network news [RFC 1036]. Also this memo provides a listing of the Japanese Character Set that can be used in this encoding scheme.

本备忘录定义了日文字符的编码方案,描述了电子邮件[RFC-822]和网络新闻[RFC 1036]中使用的“ISO-2022-JP-1”。此外,本备忘录还提供了可用于此编码方案的日语字符集列表。

2. Requirements Notation
2. 需求符号

This document uses terms that appear in capital letters to indicate particular requirements of this specification. Those terms are "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY". The meaning of each term are found in [RFC-2119]

本文件使用大写字母表示本规范的特殊要求。这些术语是“必须”、“应该”、“不得”、“不应该”和“可以”。各术语的含义见[RFC-2119]

3. Introduction
3. 介绍

RFC 1468 defines the way Japanese Characters are encoded, likewise what this memo defines. It defines the use of JIS X 0208 as the double-byte character set in ISO-2022-JP text.

RFC 1468定义了日文字符的编码方式,与本备忘录的定义相同。它将JIS X 0208定义为ISO-2022-JP文本中的双字节字符集。

Today, many operating systems support proprietary extended Japanese characters or JIS X 0212, This includes the Unicode character set, which does not conform to JIS X 0201 nor JIS X 0208. Therefore, this limits the ability to communicate and correspond precise information because of the limited availability of Kanji characters. Fortunately JIS (Japanese Industry Standard) defines JIS X 0212 as "code of the

今天,许多操作系统支持专有的扩展日语字符或JIS X 0212,这包括Unicode字符集,它不符合JIS X 0201或JIS X 0208。因此,由于汉字的可用性有限,这限制了交流和对应精确信息的能力。幸运的是,JIS(日本工业标准)将JIS X 0212定义为“产品代码”

supplementary Japanese graphic character set for information interchange". Most Japanese characters which are used in regular electronic mail in most cases can be accommodated in JIS X 0201, JIS X 0208 and JIS X 0212.

信息交换用补充日文图形字符集”。在大多数情况下,普通电子邮件中使用的大多数日文字符可在JIS X 0201、JIS X 0208和JIS X 0212中使用。

Also it is recognized that there is a tendency to use Unicode, however, Unicode is not yet widely used and there is a certain limitation with old electronic mail system. Furthermore, the purpose of this comment is to add the capability of writing out JIS X 0212.

人们还认识到,有使用Unicode的趋势,但是,Unicode尚未得到广泛应用,旧的电子邮件系统存在一定的局限性。此外,本评论的目的是增加编写JIS X 0212的能力。

This comment does not describe any representation of iso-2022-jp-1 version information in addition to JIS X 0212 support.

除JIS X 0212支持外,本评论不描述iso-2022-jp-1版本信息的任何表示。

4. Description
4. 描述

In "ISO-2022-JP-1" text, the initial character code of the message is in ASCII. The "double-byte-seq"(see "Format Syntax" section) (ESC "$" "B" / ESC "$" "@" / ESC "$" "(" "D") is the only designator that indicates that the following character is double-byte, and it is valid until another escape sequence appears. It is very discouraged to use (ESC "$" "@") for double byte character encoding, new implementation SHOULD use only (ESC "$" "B") for double byte encoding instead.

在“ISO-2022-JP-1”文本中,消息的初始字符代码为ASCII。“双字节序列”(请参阅“格式语法”部分)(ESC“$”B“/ESC“$”@“/ESC“$”(“D”)是唯一指示以下字符为双字节的指示符,并且在出现另一个转义序列之前它是有效的。不鼓励使用(ESC“$”@)对于双字节字符编码,新的实现应该只使用(ESC“$”B”)进行双字节编码。

The end of "ISO-2022-JP-1" text MUST be in ASCII. Also it is strongly recommended to back up to the ASCII at the end of each line rather than JIS X 0201-Roman if there is any none ASCII character in middle of a line.

“ISO-2022-JP-1”文本的结尾必须为ASCII。此外,如果行中间没有任何ASCII字符,强烈建议备份到每行末尾的ASCII字符,而不是JIS X 0201罗马字符。

Since "ISO-2022-JP-1" is designed to add the capability of writing out JIS X 0212, if the message does not contain none of JIS X 0212 characters. "ISO-2022-JP" text MUST BE used.

由于“ISO-2022-JP-1”的设计目的是在消息不包含任何JIS X 0212字符的情况下,增加写出JIS X 0212的能力。必须使用“ISO-2022-JP”文本。

JIS X 0201-Roman is not identical to the ASCII with two different characters.

JIS X 0201 Roman与ASCII不相同,具有两个不同的字符。

The following list are the escape sequences and character sets that can be used in "ISO-2022-JP-1" text. The registered number in the ISO 2375 Register which allow double-byte ideographic scripts to be encoded within ISO/IEC 2022 code structure is indicated as reg# below.

以下列表是可在“ISO-2022-JP-1”文本中使用的转义序列和字符集。ISO 2375寄存器中允许在ISO/IEC 2022代码结构中编码双字节表意文字的注册号如下所示。

   reg# character set     ESC sequence                  designated to
   6    ASCII             ESC 2/8 4/2                   ESC ( B    G0
   42   JIS X 0208-1978   ESC 2/4 4/0                   ESC $ @    G0
   87   JIS X 0208-1983   ESC 2/4 4/2                   ESC $ B    G0
   14   JIS X 0201-Roman  ESC 2/8 4/10                  ESC ( J    G0
   159  JIS X 0212-1990   ESC 2/4 2/8 4/4               ESC $ ( D  G0
        
   reg# character set     ESC sequence                  designated to
   6    ASCII             ESC 2/8 4/2                   ESC ( B    G0
   42   JIS X 0208-1978   ESC 2/4 4/0                   ESC $ @    G0
   87   JIS X 0208-1983   ESC 2/4 4/2                   ESC $ B    G0
   14   JIS X 0201-Roman  ESC 2/8 4/10                  ESC ( J    G0
   159  JIS X 0212-1990   ESC 2/4 2/8 4/4               ESC $ ( D  G0
        

Other restrictions are given in the Formal Syntax below.

下面的正式语法给出了其他限制。

5. Formal Syntax
5. 形式语法

The notational conventions used here are identical to those used in STD 11, RFC 822 [RFC822].

此处使用的符号约定与STD 11、RFC 822[RFC822]中使用的符号约定相同。

The * (asterisk) convention is as follows: l*m something meaning at least l and at most m something, with l and m taking default values of 0 and infinity, respectively.

*(星号)约定如下:l*m某物表示至少l和最多m某物,l和m分别取默认值0和无穷大。

   iso-2022-jp-1-text  = *( line CRLF ) [line]
        
   iso-2022-jp-1-text  = *( line CRLF ) [line]
        
   line                = (*single-byte-char *segment
                        single-byte-seq *single-byte-char) /
                        *single-byte-char
        
   line                = (*single-byte-char *segment
                        single-byte-seq *single-byte-char) /
                        *single-byte-char
        

segment = single-byte-segment / double-byte-segment

段=单字节段/双字节段

   single-byte-segment = single-byte-seq *single-byte-char
   double-byte-segment = double-byte-seq *(one-of-94 one-of-94)
        
   single-byte-segment = single-byte-seq *single-byte-char
   double-byte-segment = double-byte-seq *(one-of-94 one-of-94)
        
   reset-seq           = ESC "(" ( "B" / "J" )
   single-byte-seq     = ESC "(" ( "B" / "J" )
   double-byte-seq     = (ESC "$" ( "@" / "B" )) /
                              (ESC "$" "(" "D" )
        
   reset-seq           = ESC "(" ( "B" / "J" )
   single-byte-seq     = ESC "(" ( "B" / "J" )
   double-byte-seq     = (ESC "$" ( "@" / "B" )) /
                              (ESC "$" "(" "D" )
        
   CRLF             = CR LF;( Octal, Decimal.)
   ESC              = <ISO 2022 ESC, escape>;( 33,27.)
   SI               = <ISO 2022 SI, shift-in>;( 17,15.)
   SO               = <ISO 2022 SO, shift-out>;( 16,14.)
   CR               = <ASCII CR, carriage return>;( 15,13.)
   LF               = <ASCII LF, linefeed>;( 12,10.)
   one-of-94        = <any one of 94 values>;(41-176,33.-126.)
   one-of-96        = <any one of 96 values>;(40-177,32.-127.)
   7BIT             = <any 7-bit value>;(0-177,0.-127.)
   single-byte-char = <any 7BIT, including bare CR & bare LF,
                        but NOT including CRLF, and not including
                        ESC, SI, SO>
        
   CRLF             = CR LF;( Octal, Decimal.)
   ESC              = <ISO 2022 ESC, escape>;( 33,27.)
   SI               = <ISO 2022 SI, shift-in>;( 17,15.)
   SO               = <ISO 2022 SO, shift-out>;( 16,14.)
   CR               = <ASCII CR, carriage return>;( 15,13.)
   LF               = <ASCII LF, linefeed>;( 12,10.)
   one-of-94        = <any one of 94 values>;(41-176,33.-126.)
   one-of-96        = <any one of 96 values>;(40-177,32.-127.)
   7BIT             = <any 7-bit value>;(0-177,0.-127.)
   single-byte-char = <any 7BIT, including bare CR & bare LF,
                        but NOT including CRLF, and not including
                        ESC, SI, SO>
        
6. Security Considerations
6. 安全考虑

This memo raises no known security issues.

这份备忘录没有提出任何已知的安全问题。

7. MIME Considerations
7. MIME注意事项

The name to be used for the Japanese encoding scheme in content is "ISO-2022-JP-1". When this name is used in the MIME message form, it would be:

内容中用于日语编码方案的名称为“ISO-2022-JP-1”。在MIME消息表单中使用此名称时,它将是:

              Content-Type: text/plain; charset=iso-2022-jp-1
        
              Content-Type: text/plain; charset=iso-2022-jp-1
        

Since the "ISO-2022-JP-1" is 7bit encoding, it will be unnecessary to encode in another format by specifying the "Content-Transfer-Encoding" header. Also applying Based64 or Quoted-Printable encoding MAY cause today's software to fail to decode the message.

由于“ISO-2022-JP-1”是7比特编码,因此无需通过指定“内容传输编码”头以另一种格式编码。此外,应用Based64或引用的可打印编码可能会导致今天的软件无法解码消息。

"ISO-2022-JP-1" can be used in MIME headers. Also "ISO-2022-JP-1" text can be used with Base64 or Quoted-Printable encoding.

“ISO-2022-JP-1”可用于MIME头。此外,“ISO-2022-JP-1”文本可与Base64或引用的可打印编码一起使用。

8. Additional Information
8. 补充资料

As long as mail systems are capable of writing out Unicode, it is recommended to also write out Unicode text in addition to "ISO-2022-JP-1" text. Also writing out "ISO-2022-JP" text in addition to "ISO-2022-JP-1" is strongly encouraged for backward compatibility reasons.

只要邮件系统能够写出Unicode,建议除了“ISO-2022-JP-1”文本外,还写出Unicode文本。此外,出于向后兼容性的原因,强烈鼓励在“ISO-2022-JP-1”之外写出“ISO-2022-JP”文本。

Some mail systems write out 8bits characters in 'parameter' and 'value' defined in [RFC 822] and [RFC 1521]. All 8bit characters MUST NOT be used in those fields. The implementation of future mail systems SHOULD support those only for interoperability reasons.

一些邮件系统在[RFC 822]和[RFC 1521]中定义的“参数”和“值”中写入8位字符。不得在这些字段中使用所有8位字符。未来邮件系统的实现应仅出于互操作性原因支持这些系统。

9. References
9. 工具书类

[ISO2022] International Organization for Standardization (ISO), "Information processing -- ISO 7-bit and 8-bit coded character sets -- Code extension techniques", International Standard, Ref. No. ISO 2022-1986 (E).

[ISO2022]国际标准化组织(ISO),“信息处理——ISO 7位和8位编码字符集——代码扩展技术”,国际标准,参考号ISO 2022-1986(E)。

[ISOREG] International Organization for Standardization (ISO), "International Register of Coded Character Sets To Be Used With Escape Sequences".

[ISOREG]国际标准化组织(ISO),“转义序列使用的编码字符集国际注册”。

[RFC-822] Crocker, D., "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, August 1982.

[RFC-822]Crocker,D.,“ARPA互联网文本信息格式标准”,STD 11,RFC 822,1982年8月。

[RFC-1468] Murai, J., Crispin, M., and E. van der Poel, "Japanese Character Encoding for Internet Messages", RFC 1468, June 1993.

[RFC-1468]Murai,J.,Crispin,M.和E.van der Poel,“互联网信息的日语字符编码”,RFC 1468,1993年6月。

[RFC-1766] Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.

[RFC-1766]Alvestrand,H.,“语言识别标签”,RFC 1766,1995年3月。

[RFC-2045] Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, December 1996.

[RFC-2045]Freed,N.和N.Borenstein,“多用途Internet邮件扩展(MIME)第一部分:Internet邮件正文格式”,RFC 20451996年12月。

[RFC-2046] Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, December 1996.

[RFC-2046]Freed,N.和N.Borenstein,“多用途互联网邮件扩展(MIME)第二部分:媒体类型”,RFC 2046,1996年12月。

[RFC-2047] Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part Three: Representation of Non-ASCII Text in Internet Message Headers", RFC 2047, December 1996.

[RFC-2047]Moore,K.,“多用途Internet邮件扩展(MIME)第三部分:Internet消息头中非ASCII文本的表示”,RFC 2047,1996年12月。

[RFC-2048] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet Mail Extensions (MIME) Part Four: MIME Registration Procedures", RFC 2048, December 1996.

[RFC-2048]Freed,N.,Klensin,J.和J.Postel,“多用途互联网邮件扩展(MIME)第四部分:MIME注册程序”,RFC 2048,1996年12月。

[RFC-2049] Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples", RFC 2049, December 1996.

[RFC-2049]Freed,N.和N.Borenstein,“多用途Internet邮件扩展(MIME)第五部分:一致性标准和示例”,RFC 2049,1996年12月。

[RFC-2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[RFC-2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,RFC 211997年3月。

Author's Address

作者地址

Kenzaburo Tamaru Microsoft Corporation One Microsoft Way Redmond, WA 98052-6399

Kenzaburo Tamaru微软公司华盛顿州雷德蒙微软大道一号,邮编:98052-6399

   EMail: kenzat@microsoft.com
        
   EMail: kenzat@microsoft.com
        

Full Copyright Statement

完整版权声明

Copyright (C) The Internet Society (1997). All Rights Reserved.

版权所有(C)互联网协会(1997年)。版权所有。

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。