Network Working Group                                         H. Maruyama
Request for Comments: 2803                                      K. Tamura
Category: Informational                                        N. Uramoto
                                                                      IBM
                                                               April 2000
        
Network Working Group                                         H. Maruyama
Request for Comments: 2803                                      K. Tamura
Category: Informational                                        N. Uramoto
                                                                      IBM
                                                               April 2000
        

Digest Values for DOM (DOMHASH)

DOM的摘要值(DOMHASH)

Status of this Memo

本备忘录的状况

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

本备忘录为互联网社区提供信息。它没有规定任何类型的互联网标准。本备忘录的分发不受限制。

Copyright Notice

版权公告

Copyright (C) The Internet Society (2000). All Rights Reserved.

版权所有(C)互联网协会(2000年)。版权所有。

Abstract

摘要

This memo defines a clear and unambiguous definition of digest (hash) values of the XML objects regardless of the surface string variation of XML. This definition can be used for XML digital signature as well efficient replication of XML objects.

此备忘录定义了XML对象的摘要(哈希)值的明确定义,而不考虑XML的表面字符串变化。此定义可用于XML数字签名以及XML对象的高效复制。

Table of Contents

目录

   1. Introduction............................................2
   2. Digest Calculation......................................3
   2.1. Overview..............................................3
   2.2. Namespace Considerations..............................4
   2.3. Definition with Code Fragments........................5
   2.3.1. Text Nodes..........................................5
   2.3.2. Processing Instruction Nodes........................6
   2.3.3. Attr Nodes..........................................6
   2.3.4. Element Nodes.......................................7
   2.3.5. Document Nodes......................................9
   3. Discussion..............................................9
   4. Security Considerations.................................9
   References................................................10
   Authors' Addresses........................................10
   Full Copyright Statement..................................11
        
   1. Introduction............................................2
   2. Digest Calculation......................................3
   2.1. Overview..............................................3
   2.2. Namespace Considerations..............................4
   2.3. Definition with Code Fragments........................5
   2.3.1. Text Nodes..........................................5
   2.3.2. Processing Instruction Nodes........................6
   2.3.3. Attr Nodes..........................................6
   2.3.4. Element Nodes.......................................7
   2.3.5. Document Nodes......................................9
   3. Discussion..............................................9
   4. Security Considerations.................................9
   References................................................10
   Authors' Addresses........................................10
   Full Copyright Statement..................................11
        
1. Introduction
1. 介绍

The purpose of this document is to give a clear and unambiguous definition of digest (hash) values of the XML objects [XML]. Two subtrees are considered identical if their hash values are the same, and different if their hash values are different.

本文档的目的是给出XML对象[XML]的摘要(哈希)值的明确定义。如果两个子树的散列值相同,则认为它们相同;如果它们的散列值不同,则认为它们不同。

There are at least two usage scenarios of DOMHASH. One is as a basis for digital signatures for XML. Digital signature algorithms normally require hashing a signed content before signing. DOMHASH provides a concrete definition of the hash value calculation.

DOMHASH至少有两种使用场景。一种是作为XML数字签名的基础。数字签名算法通常需要在签名之前对签名内容进行哈希运算。DOMHASH提供了哈希值计算的具体定义。

The other is to use DOMHASH when synchronizing two DOM structures [DOM]. Suppose that a server program generates a DOM structure which is to be rendered by clients. If the server makes frequent small changes on a large DOM tree, it is desirable that only the modified parts are sent over to the client. A client can initiate a request by sending the root hash value of the structure in the cache memory. If it matches with the root hash value of the current server structure, nothing needs be sent. If not, then the server compares the client hash with the older versions in the server's cache. If it finds one that matches the client's version of the structure, then it locates differences with the current version by recursively comparing the hash values of each node. This way, the client can receive only an updated portion of a large structure without requesting the whole thing.

另一种是在同步两个DOM结构[DOM]时使用DOMHASH。假设服务器程序生成一个DOM结构,该结构将由客户端呈现。如果服务器频繁地对大型DOM树进行小的更改,则只需要将修改后的部分发送给客户机。客户端可以通过在高速缓存中发送结构的根哈希值来启动请求。如果它与当前服务器结构的根哈希值匹配,则不需要发送任何内容。如果没有,则服务器会将客户端哈希与服务器缓存中的旧版本进行比较。如果它找到一个与客户端版本的结构相匹配的,那么它将通过递归比较每个节点的哈希值来定位与当前版本的差异。通过这种方式,客户机只能接收大型结构的更新部分,而无需请求整个结构。

One way of defining digest values is to take a surface string as the input for a digest algorithm. However, this approach has several drawbacks. The same internal DOM structure may be represented in may different ways as surface strings even if they strictly conform to the XML specification. Treatment of white spaces, selection of character encodings, entity references (i.e., use of ampersands), and so on have impact on the generation of a surface string. If the implementations of surface string generation are different, the hash values would be different, resulting in unvalidatable digital signatures and unsuccessful detection of identical DOM structures. Therefore, it is desirable that digest of DOM is defined in the DOM terms -- that is, as an unambiguous algorithm operating on a DOM tree. This is the approach we take in this specification.

定义摘要值的一种方法是将曲面字符串作为摘要算法的输入。然而,这种方法有几个缺点。相同的内部DOM结构可能以不同的方式表示为表面字符串,即使它们严格符合XML规范。空格的处理、字符编码的选择、实体引用(即,使用符号)等都会影响表面字符串的生成。如果表面字符串生成的实现不同,则散列值也会不同,从而导致无法验证数字签名和无法检测相同的DOM结构。因此,最好用DOM术语定义DOM摘要——也就是说,作为在DOM树上运行的明确算法。这是我们在本规范中采用的方法。

Introduction of namespace is another source of variation of surface string because different namespace prefixes can be used for representing the same namespace URI [URI]. In the following example, the namespace prefix "edi" is bound to the URI "http://ecommerce.org/schema" but this prefix can be arbitrary chosen without changing the logical contents as shown in the second example.

名称空间的引入是表面字符串变化的另一个原因,因为可以使用不同的名称空间前缀来表示相同的名称空间URI[URI]。在以下示例中,命名空间前缀“edi”绑定到URI“http://ecommerce.org/schema但是这个前缀可以任意选择,而不改变逻辑内容,如第二个示例所示。

    <?xml version="1.0"?>
    <root xmlns:edi='http://ecommerce.org/schema'>
        <edi:order>
            :
        </edi:order>
    </root>
        
    <?xml version="1.0"?>
    <root xmlns:edi='http://ecommerce.org/schema'>
        <edi:order>
            :
        </edi:order>
    </root>
        
    <?xml version="1.0"?>
    <root xmlns:ec='http://ecommerce.org/schema'>
        <ec:order>
            :
        </ec:order>
    </root>
        
    <?xml version="1.0"?>
    <root xmlns:ec='http://ecommerce.org/schema'>
        <ec:order>
            :
        </ec:order>
    </root>
        

The DOMHASH defined in this document is designed so that the choice of the namespace prefix does not affect the digest value. In the above example, both the "root" elements will get the same digest value.

本文档中定义的DOMHASH的设计使名称空间前缀的选择不会影响摘要值。在上面的示例中,两个“根”元素将获得相同的摘要值。

2. Digest Calculation
2. 摘要计算
2.1. Overview
2.1. 概述

Hash values are defined on the DOM type Node. We consider the following five node types that are used for representing a DOM document structure:

哈希值在DOM类型节点上定义。我们考虑用于表示DOM文档结构的以下五种节点类型:

- Text - ProcessingInstruction - Attr - Element - Document

- 文本-处理说明-属性-元素-文档

Comment nodes and Document Type Definitions (DTDs) do not participate in the digest value calculation. This is because DOM does not require a conformant processor to create data structures for these. DOMHASH is designed so that it can be computed with any XML processor conformant to the DOM or SAX [SAX] specification.

注释节点和文档类型定义(DTD)不参与摘要值计算。这是因为DOM不需要一致的处理器来为这些对象创建数据结构。DOMHASH的设计使得它可以使用符合DOM或SAX[SAX]规范的任何XML处理器进行计算。

Nodes with the node type EntityReference must be expanded prior to digest calculation.

在摘要计算之前,必须展开节点类型为EntityReference的节点。

The digest values are defined recursively on each level of the DOM tree so that only a relevant part needs to be recalculated when a small portion of the tree is changed.

摘要值在DOM树的每个级别上递归定义,因此当树的一小部分发生更改时,只需要重新计算相关部分。

Below, we give the precise definitions of digest for these types. We describe the format of the data to be supplied to a hash algorithm using a figure and a simple description, followed by a Java code fragment using the DOM API and the JDK 1.1 Platform Core API only. Therefore, the semantics should be unambiguous.

下面,我们给出这些类型的摘要的精确定义。我们使用一个图和一个简单的描述来描述要提供给散列算法的数据的格式,然后是仅使用domapi和jdk1.1平台核心API的Java代码片段。因此,语义应该是明确的。

As the rule of thumb, all strings are to be in UTF-16BE [UTF16]. If there is a sequence of text nodes without any element nodes in between, these text nodes are merged into one by concatenating them. A zero-length text node is always ignored.

根据经验,所有字符串都应采用UTF-16BE[UTF16]格式。如果有一系列文本节点之间没有任何元素节点,则通过将这些文本节点串联在一起,将它们合并为一个文本节点。始终忽略长度为零的文本节点。

Note that validating and non-validating XML processors may generate different DOM trees from the same XML document, due to attribute normalization and default attributes. If DOMHASH is to be used for testing logical equivalence between two XML documents (as opposed to DOM trees), it may be necessary to normalize attributes and supply default attributes prior to DOMHASH calculation.

请注意,由于属性规范化和默认属性,验证和非验证XML处理器可能会从同一XML文档生成不同的DOM树。如果要使用DOMHASH测试两个XML文档(与DOM树相反)之间的逻辑等价性,则可能需要在DOMHASH计算之前规范化属性并提供默认属性。

Some legacy character encodings (such as ISO-2022-JP) have certain ambiguity in translating into Unicode. This is again dependent on XML processors. Treatment of such processor dependencies is out of scope of this document.

一些传统字符编码(如ISO-2022-JP)在转换为Unicode时具有一定的模糊性。这同样依赖于XML处理器。这种处理器依赖关系的处理超出了本文档的范围。

2.2. Namespace Considerations
2.2. 命名空间注意事项

To avoid the dependence on the namespace prefix, we use "expanded names" to do digest calculation. If an element name or an attribute name is qualified either by a explicit namespace prefix or by a default namespace, the name's LocalPart is prepended by the URI of the namespace (the namespace name as defined in the Namespace specification [NAM]) and a colon before digest calculation. In the following example, the default qualified name "order" is expanded into "http://ecommerce.org/schema:order" while the explicit qualified name "book:title" is expanded into "urn:loc.gov:books:title" before digest calculation.

为了避免对名称空间前缀的依赖,我们使用“扩展名称”进行摘要计算。如果元素名或属性名由显式名称空间前缀或默认名称空间限定,则名称的LocalPart在摘要计算之前由名称空间的URI(名称空间规范[NAM]中定义的名称空间名称)和冒号前缀。在下面的示例中,默认限定名称“order”扩展为“http://ecommerce.org/schema:order而在摘要计算之前,显式限定名“book:title”扩展为“urn:loc.gov:books:title”。

   <?xml version="1.0"?>
        
   <?xml version="1.0"?>
        
   <root xmlns='http://ecommerce.org/schema'
            xmlns:book='urn:loc.gov:books'>
       <order>
          <book:title> ... </book:title>
           :
       </order>
   </root>
        
   <root xmlns='http://ecommerce.org/schema'
            xmlns:book='urn:loc.gov:books'>
       <order>
          <book:title> ... </book:title>
           :
       </order>
   </root>
        

We define an expanded name (either for element or attribute) as follows:

我们定义一个扩展名称(元素或属性),如下所示:

If a name is not qualified, the expanded name is the name itself.

如果名称未限定,则扩展名称就是名称本身。

If a name is qualified with the prefix "xmlns", the expanded name is undefined.

如果名称使用前缀“xmlns”限定,则扩展名称未定义。

If a name is qualified either by default or by an explicit namespace prefix, the expanded name is URI bound to the namespace + ":" + LocalPart

如果名称在默认情况下或通过显式名称空间前缀限定,则扩展名称的URI绑定到名称空间+“:”+LocalPart

In the following example code, we assume that the getExpandedName() method (which returns the expanded name as defined above) is defined in both Element and Attr interfaces of DOM.

在下面的示例代码中,我们假设getExpandedName()方法(返回上面定义的扩展名称)是在DOM的元素和Attr接口中定义的。

Note that the digest values are not defined on namespace declarations. In other words, the digest value is not defined for an attribute when

请注意,摘要值不是在命名空间声明上定义的。换言之,在以下情况下,不会为属性定义摘要值:

- the attribute name is "xmlns", or - the namespace prefix is "xmlns".

- 属性名为“xmlns”,或-命名空间前缀为“xmlns”。

In the above example, the two attributes which are namespace declarations do not have digest values and therefore will not participate in the calculation of the digest value of the "root" element.

在上面的示例中,作为名称空间声明的两个属性没有摘要值,因此不会参与“root”元素摘要值的计算。

2.3. Definition with Code Fragments
2.3. 带有代码片段的定义

The code fragments in the definitions below assume that they are in implementation classes of Node. Therefore, a methods call without an explicit object reference is for the Node itself. For example, getData() returns the text data of the current node if it is a Text node. The parameter digestAlgorithm is to be replaced by an identifier of the digest algorithm, such as "MD5" [MD5] and "SHA-1" [SHA].

下面定义中的代码片段假定它们位于节点的实现类中。因此,没有显式对象引用的方法调用是针对节点本身的。例如,如果当前节点是文本节点,则getData()返回该节点的文本数据。参数digestAlgorithm将由摘要算法的标识符替换,例如“MD5”[MD5]和“SHA-1”[SHA]。

The computation should begin with a four byte integer that represents the type of the node, such as TEXT_NODE or ELEMENT_NODE.

计算应该从一个表示节点类型的四字节整数开始,例如TEXT\u节点或ELEMENT\u节点。

2.3.1. Text Nodes
2.3.1. 文本节点

The hash value of a Text node is computed on the four byte header followed by the UTF-16BE encoded text string.

文本节点的哈希值在四字节头上计算,后跟UTF-16BE编码的文本字符串。

- TEXT_NODE (3) in 32 bit network-byte-ordered integer - Text data in UTF-16BE stream (variable length)

- 32位网络字节顺序整数中的文本节点(3)-UTF-16BE流中的文本数据(可变长度)

   public byte[] getDigest(String digestAlgorithm) {
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)3);
       md.update(getData().getBytes("UnicodeBigUnmarked"));
       return md.digest();
   }
        
   public byte[] getDigest(String digestAlgorithm) {
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)3);
       md.update(getData().getBytes("UnicodeBigUnmarked"));
       return md.digest();
   }
        

Here, MessageDigest is in the package java.security.*, one of the built-in packages of JDK 1.1.

这里,MessageDigest位于包java.security.*中,它是JDK1.1的内置包之一。

2.3.2. ProcessingInstruction Nodes
2.3.2. 处理指令节点

A ProcessingInstruction (PI) node has two components: the target and the data. Accordingly, the hash is computed on the concatenation of both, separated by 'x0000'. PI data is from the first non white space character after the target to the character immediately preceding the "?>".

ProcessingInstruction(PI)节点有两个组件:目标和数据。因此,哈希是在两者的串联上计算的,由“x0000”分隔。PI数据从目标后面的第一个非空白字符到“>”前面的字符。

- PROCESSING_INSTRUCTION_NODE (7) in 32 bit network-byte-ordered integer - PI target in UTF-16BE stream (variable length) - 0x00 0x00 - PI data in UTF-16BE stream (variable length)

- 处理32位网络字节顺序整数中的\u指令\u节点(7)-UTF-16BE流中的PI目标(可变长度)-0x00 0x00-UTF-16BE流中的PI数据(可变长度)

   public byte[] getDigest(String digestAlgorithm) {
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)7);
       md.update(getName().getBytes("UnicodeBigUnmarked"));
       md.update((byte)0);
       md.update((byte)0);
       md.update(getData().getBytes("UnicodeBigUnmarked"));
       return md.digest();
   }
        
   public byte[] getDigest(String digestAlgorithm) {
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)7);
       md.update(getName().getBytes("UnicodeBigUnmarked"));
       md.update((byte)0);
       md.update((byte)0);
       md.update(getData().getBytes("UnicodeBigUnmarked"));
       return md.digest();
   }
        
2.3.3. Attr Nodes
2.3.3. 属性节点

The digest value of Attr nodes are defined similarly to PI nodes, except that we need a separator between the expanded attribute name and the attribute value. The '0x0000' value in UTF-16BE is allowed nowhere in an XML document, so it can serve as an unambiguous separator. The expanded name must be used as the attribute name because it may be qualified. Note that if the attribute is a

Attr节点的摘要值的定义类似于PI节点,只是我们需要在扩展的属性名称和属性值之间使用分隔符。UTF-16BE中的“0x0000”值在XML文档中是不允许的,因此它可以用作明确的分隔符。扩展名称必须用作属性名称,因为它可能是限定的。请注意,如果属性是

namespace declaration (either the attribute name is "xmlns" or its prefix is "xmlns"), the digest value is undefined and the getDigest() method should return null.

命名空间声明(属性名称为“xmlns”或其前缀为“xmlns”),摘要值未定义,getDigest()方法应返回null。

- ATTRIBUTE_NODE (2) in 32 bit network-byte-ordered integer - Expanded attribute name in UTF-16BE stream (variable length) - 0x00 0x00 - Attribute value in UTF-16BE stream (variable length)

- 32位网络字节顺序整数中的属性_节点(2)-UTF-16BE流中的扩展属性名称(可变长度)-0x00 0x00-UTF-16BE流中的属性值(可变长度)

   public byte[] getDigest(String digestAlgorithm) {
       if (getNodeName().equals("xmlns")
               || getNodeName().startsWith("xmlns:"))
           return null;
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)2);
       md.update(getExpandedName().getBytes("UnicodeBigUnmarked"));
       md.update((byte)0);
       md.update((byte)0);
       md.update(getValue().getBytes("UnicodeBigUnmarked"));
       return md.digest();
   }
        
   public byte[] getDigest(String digestAlgorithm) {
       if (getNodeName().equals("xmlns")
               || getNodeName().startsWith("xmlns:"))
           return null;
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)0);
       md.update((byte)2);
       md.update(getExpandedName().getBytes("UnicodeBigUnmarked"));
       md.update((byte)0);
       md.update((byte)0);
       md.update(getValue().getBytes("UnicodeBigUnmarked"));
       return md.digest();
   }
        
2.3.4. Element Nodes
2.3.4. 元素节点

Element nodes are the most complex because they consist of other nodes recursively. Hash values of these component nodes are used to calculate the node's digest so that we can save computation when the structure is partially changed.

元素节点是最复杂的,因为它们递归地由其他节点组成。这些组件节点的散列值用于计算节点的摘要,以便在结构部分更改时节省计算。

First, all the attributes except for namespace declarations must be collected. This list is sorted lexicographically by the expanded attribute names (based on Unicode character code points). When no surrogate characters are involved, this is the same as sorting in ascending order in terms of the UTF-16BE encoded expanded attribute names, using the string comparison operator String.compareTo() in Java.

首先,必须收集除名称空间声明之外的所有属性。此列表按扩展属性名称(基于Unicode字符代码点)按字典顺序排序。当不涉及代理字符时,这与使用Java中的字符串比较运算符string.compareTo()按UTF-16BE编码的扩展属性名称升序排序相同。

- ELEMENT_NODE (1) in 32 bit network-byte-ordered integer - Expanded element name in UTF-16BE stream (variable length) - 0x00 0x00 - A number of non-namespace-declaration attributes in 32 bit network-byte-ordered unsigned integer - Sequence of digest values of non-namespace-declaration attributes, sorted lexicographically by expanded attribute names - A number of child nodes (except for Comment nodes) in 32bit

- 32位网络字节顺序整数中的元素_节点(1)-UTF-16BE流中的扩展元素名称(可变长度)-0x00 0x00-32位网络字节顺序无符号整数中的许多非命名空间声明属性-非命名空间声明属性的摘要值序列,按扩展属性名称按字典顺序排序-32位中的许多子节点(注释节点除外)

network-byte-ordered unsigned integer - Sequence of digest values of each child node except for Comment nodes (variable length) (A sequence of child texts is merged to one text. A zero-length text and Comment nodes are not counted as child)

网络字节顺序无符号整数-每个子节点的摘要值序列,注释节点除外(可变长度)(子文本序列合并为一个文本。长度为零的文本和注释节点不算作子文本)

   public byte[] getDigest(String digestAlgorithm) {
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       ByteArrayOutputStream baos = new ByteArrayOutputStream();
       DataOutputStream dos = new DataOutputStream(baos);
       dos.writeInt(ELEMENT_NODE);//This is stored in network byte order
       dos.write(getExpandedName().getBytes("UnicodeBigUnmarked"));
       dos.write((byte)0);
       dos.write((byte)0);
       // Collect all attributes except for namespace declarations
       NamedNodeMap nnm = this.getAttributes();
       int len = nnm.getLength()
               // Find "xmlns" or "xmlns:foo" in nnm and omit it.
       ...
       dos.writeInt(len);    // This is sorted in the network byte order
       // Sort attributes lexicographically by expanded attribute
       // names.
       ...
       // Assume that `Attr[] aattr' has sorted Attribute instances.
       for (int i = 0;  i < len;  i ++)
           dos.write(aattr[i].getDigest(digestAlgorithm));
       Node n = this.getFirstChild();
       // Assume that adjoining Texts are merged,
       // there is  no 0-length Text, and
       // comment nodes are removed.
       len = this.getChildNodes().getLength();
       dos.writeInt(len);    // This is stored in the network byte order
       while (n != null) {
           dos.write(n.getDigest(digestAlgorithm));
           n = n.getNextSibling();
       }
       dos.close();
       md.update(baos.toByteArray());
       return md.digest();
   }
        
   public byte[] getDigest(String digestAlgorithm) {
       MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
       ByteArrayOutputStream baos = new ByteArrayOutputStream();
       DataOutputStream dos = new DataOutputStream(baos);
       dos.writeInt(ELEMENT_NODE);//This is stored in network byte order
       dos.write(getExpandedName().getBytes("UnicodeBigUnmarked"));
       dos.write((byte)0);
       dos.write((byte)0);
       // Collect all attributes except for namespace declarations
       NamedNodeMap nnm = this.getAttributes();
       int len = nnm.getLength()
               // Find "xmlns" or "xmlns:foo" in nnm and omit it.
       ...
       dos.writeInt(len);    // This is sorted in the network byte order
       // Sort attributes lexicographically by expanded attribute
       // names.
       ...
       // Assume that `Attr[] aattr' has sorted Attribute instances.
       for (int i = 0;  i < len;  i ++)
           dos.write(aattr[i].getDigest(digestAlgorithm));
       Node n = this.getFirstChild();
       // Assume that adjoining Texts are merged,
       // there is  no 0-length Text, and
       // comment nodes are removed.
       len = this.getChildNodes().getLength();
       dos.writeInt(len);    // This is stored in the network byte order
       while (n != null) {
           dos.write(n.getDigest(digestAlgorithm));
           n = n.getNextSibling();
       }
       dos.close();
       md.update(baos.toByteArray());
       return md.digest();
   }
        
2.3.5. Document Nodes
2.3.5. 文档节点

A Document node may have PI nodes before and after the root Element node. The digest value of a Document node is computed based on the sequence of the digest values of the pre-root PI nodes, the root Element node, and the post-root PI nodes in this order. Comment nodes and DocumentType nodes, if any, are ignored.

文档节点可以在根元素节点之前和之后具有PI节点。文档节点的摘要值是根据前根PI节点、根元素节点和后根PI节点的摘要值顺序计算的。注释节点和DocumentType节点(如果有)将被忽略。

- DOCUMENT_NODE (9) in 32 bit network-byte-ordered integer - A number of child nodes (except for Comment and DocumentType nodes) in 32bit network-byte-ordered unsigned integer - Sequence of digest values of each child node except for Comment and DocumentType nodes (variable length)

- 32位网络字节顺序整数中的文档_节点(9)-32位网络字节顺序无符号整数中的子节点数(注释和文档类型节点除外)-每个子节点的摘要值序列,注释和文档类型节点除外(可变长度)

     public byte[] getDigest(String digestAlgorithm) {
         MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
         ByteArrayOutputStream baos = new ByteArrayOutputStream();
         DataOutputStream dos = new DataOutputStream(baos);
         dos.writeInt(DOCUMENT_NODE);//This is stored in network byte order
        
     public byte[] getDigest(String digestAlgorithm) {
         MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
         ByteArrayOutputStream baos = new ByteArrayOutputStream();
         DataOutputStream dos = new DataOutputStream(baos);
         dos.writeInt(DOCUMENT_NODE);//This is stored in network byte order
        
         // Assume that Comment and DocumentType nodes are removed and this
         // node has only an Element node and PI nodes.
         len = this.getChildNodes().getLength();
         dos.writeInt(len);    // This is stored in the network byte order
         Node n = this.getFirstChild();
         while (n != null) {
             dos.write(n.getDigest(digestAlgorithm));
             n = n.getNextSibling();
         }
         dos.close();
         md.update(baos.toByteArray());
         return md.digest();
     }
        
         // Assume that Comment and DocumentType nodes are removed and this
         // node has only an Element node and PI nodes.
         len = this.getChildNodes().getLength();
         dos.writeInt(len);    // This is stored in the network byte order
         Node n = this.getFirstChild();
         while (n != null) {
             dos.write(n.getDigest(digestAlgorithm));
             n = n.getNextSibling();
         }
         dos.close();
         md.update(baos.toByteArray());
         return md.digest();
     }
        
3. Discussion
3. 讨论

The definition described above can be efficiently implemented with any XML processor that is conformant to either DOM and SAX specification. Reference implementations are available on request.

上述定义可以通过任何符合DOM和SAX规范的XML处理器有效地实现。参考实现可根据要求提供。

4. Security Considerations
4. 安全考虑

DOMHASH is expected to be used as the basis for digital signatures and other security and integrity uses. It's appropriateness for such uses depends on the security of the hash algorithm used and inclusion of the fundamental characteristics it is desired to check in parts of the DOM model incorporated in the digest by DOMHASH.

DOMHASH预计将用作数字签名和其他安全和完整性用途的基础。这种用途是否合适取决于所用哈希算法的安全性,以及是否包含希望签入DOMHASH摘要中包含的DOM模型部分的基本特征。

References

工具书类

   [DOM]   "Document Object Model (DOM), Level 1 Specification", October
         1998, http://www.w3.org/TR/REC-DOM-Level-1/
        
   [DOM]   "Document Object Model (DOM), Level 1 Specification", October
         1998, http://www.w3.org/TR/REC-DOM-Level-1/
        

[MD5] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992.

[MD5]Rivest,R.,“MD5消息摘要算法”,RFC 13211992年4月。

[NAM] Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in XML", http://www.w3.org/TR/1999/REC-xml-names-19990114.

[NAM]Tim Bray,Dave Hollander,Andrew Layman,“XML中的名称空间”,http://www.w3.org/TR/1999/REC-xml-names-19990114.

[SAX] David Megginson, "SAX 1.0: The Simple API for XML", http://www.megginson.com/SAX/, May 1998.

[SAX]David Megginson,“SAX1.0:XML的简单API”,http://www.megginson.com/SAX/,1998年5月。

[SHA] (US) National Institute of Standards and Technology, "Federal Information Processing Standards Publication 180-1: Secure Hash Standard", 17 April 1995.

[SHA](美国)国家标准与技术研究所,“联邦信息处理标准出版物180-1:安全哈希标准”,1995年4月17日。

[URI] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[URI]Berners Lee,T.,Fielding,R.和L.Masinter,“统一资源标识符(URI):通用语法”,RFC 2396,1998年8月。

[UTF16] Hoffman, P., Yergeau, F., "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000.

[UTF16]Hoffman,P.,Yergeau,F.,“UTF-16,ISO 10646编码”,RFC 2781,2000年2月。

   [XML]   Tim Bray, Jean Paoli, C. M. Sperber-McQueen, "Extensible
         Markup Language (XML) 1.0", http://www.w3.org/TR/1998/REC-xml-
         19980210
        
   [XML]   Tim Bray, Jean Paoli, C. M. Sperber-McQueen, "Extensible
         Markup Language (XML) 1.0", http://www.w3.org/TR/1998/REC-xml-
         19980210
        

Authors' Addresses

作者地址

Hiroshi Maruyama, IBM Research, Tokyo Research Laboratory

Hiroshi Maruyama,IBM研究中心,东京研究实验室

   EMail: maruyama@jp.ibm.com
        
   EMail: maruyama@jp.ibm.com
        

Kent Tamura, IBM Research, Tokyo Research Laboratory

Kent Tamura,IBM研究,东京研究实验室

   EMail: kent@trl.ibm.co.jp
        
   EMail: kent@trl.ibm.co.jp
        

Naohiko Uramoto, IBM Research, Tokyo Research Laboratory

浦本直彦,IBM研究,东京研究实验室

   EMail: uramoto@jp.ibm.com
        
   EMail: uramoto@jp.ibm.com
        

Full Copyright Statement

完整版权声明

Copyright (C) The Internet Society (2000). All Rights Reserved.

版权所有(C)互联网协会(2000年)。版权所有。

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。

Acknowledgment

致谢

Funding for the RFC Editor function is currently provided by the Internet Society.

RFC编辑功能的资金目前由互联网协会提供。