RFC 6576: IP Performance Metrics (IPPM) Standard Advancement Testing (Chinese translation)

URL: https://datatracker.ietf.org/doc/html/rfc6576
Title: RFC 6576
Translation type: automatically generated

Internet Engineering Task Force (IETF)                      R. Geib, Ed.
Request for Comments: 6576                              Deutsche Telekom
BCP: 176                                                       A. Morton
Category: Best Current Practice                                AT&T Labs
ISSN: 2070-1721                                                R. Fardid
                                                    Cariden Technologies
                                                            A. Steinmitz
                                                        Deutsche Telekom
                                                              March 2012

IP Performance Metrics (IPPM) Standard Advancement Testing

Abstract

This document specifies tests to determine if multiple independent instantiations of a performance-metric RFC have implemented the specifications in the same way. This is the performance-metric equivalent of interoperability, required to advance RFCs along the Standards Track. Results from different implementations of metric RFCs will be collected under the same underlying network conditions and compared using statistical methods. The goal is an evaluation of the metric RFC itself to determine whether its definitions are clear and unambiguous to implementors and therefore a candidate for advancement on the IETF Standards Track. This document is an Internet Best Current Practice.

Status of This Memo

This memo documents an Internet Best Current Practice.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on BCPs is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6576.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. Basic Idea
   3. Verification of Conformance to a Metric Specification
      3.1. Tests of an Individual Implementation against a Metric
           Specification
      3.2. Test Setup Resulting in Identical Live Network
           Testing Conditions
      3.3. Tests of Two or More Different Implementations
           against a Metric Specification
      3.4. Clock Synchronization
      3.5. Recommended Metric Verification Measurement Process
      3.6. Proposal to Determine an Equivalence Threshold for
           Each Metric Evaluated
   4. Acknowledgements
   5. Contributors
   6. Security Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   Appendix A.  An Example on a One-Way Delay Metric Validation
     A.1.  Compliance to Metric Specification Requirements
     A.2.  Examples Related to Statistical Tests for One-Way Delay
   Appendix B.  Anderson-Darling K-sample Reference and 2 Sample
                C++ Code
   Appendix C.  Glossary

1. Introduction

   The Internet Standards Process as updated by RFC 6410 [RFC6410]
   specifies that widespread deployment and use is sufficient to show
   interoperability as a condition for advancement to Internet Standard.
   The previous requirement of interoperability tests prior to advancing
   an RFC to the Standard maturity level specified in RFC 2026 [RFC2026]
   and RFC 5657 [RFC5657] has been removed.  While the modified
   requirement is applicable to protocols, wide deployment of different
   measurement systems does not prove that the implementations measure
   metrics in a standard way.  Section 5.3 of RFC 5657 [RFC5657]
   explicitly mentions the special case of Standards that are not "on-
   the-wire" protocols.  While this special case is not explicitly
   mentioned by RFC 6410 [RFC6410], the four criteria in Section 2.2 of
   RFC 6410 [RFC6410] are augmented by this document for RFCs that
   specify performance metrics.  This document takes the position that
   flexible metric definitions can be proven to be clear and unambiguous
   through tests that compare the results from independent
   implementations.  It describes tests that infer whether metric
   specifications are sufficient using a definition of metric
   "interoperability": measuring equivalent results (in a statistical
   sense) under the same network conditions.  The document expands on
   this problem and its solution.

In the case of a protocol specification, the notion of "interoperability" is reasonably intuitive -- the implementations must successfully "talk to each other", while exercising all features and options. To achieve interoperability, two implementors need to interpret the protocol specifications in equivalent ways. In the case of IP Performance Metrics (IPPM), this definition of interoperability is only useful for test and control protocols like the One-Way Active Measurement Protocol (OWAMP) [RFC4656] and the Two-Way Active Measurement Protocol (TWAMP) [RFC5357].

A metric specification RFC describes one or more metric definitions, methods of measurement, and a way to report the results of measurement. One example would be a way to test and report the one-way delay that data packets incur while being sent from one network location to another, using the One-Way Delay Metric.

In the case of metric specifications, the conditions that satisfy the "interoperability" requirement are less obvious, and there is a need for IETF agreement on practices to judge metric specification "interoperability" in the context of the IETF Standards Process. This memo provides methods that should be suitable to evaluate metric specifications for Standards Track advancement. The methods proposed here MAY be generally applicable to metric specification RFCs beyond those developed under the IPPM Framework [RFC2330].

Since many implementations of IP metrics are embedded in measurement systems that do not interact with one another (they were built before OWAMP and TWAMP), the interoperability evaluation called for in the IETF Standards Process cannot be determined by observing that independent implementations interact properly for various protocol exchanges. Instead, verifying that different implementations give statistically equivalent results under controlled measurement conditions takes the place of interoperability observations. Even when evaluating OWAMP and TWAMP RFCs for Standards Track advancement, the methods described here are useful to evaluate the measurement results because their validity would not be ascertained in protocol interoperability testing.

The Standards advancement process aims at producing confidence that the metric definitions and supporting material are clearly worded and unambiguous, or reveals ways in which the metric definitions can be revised to achieve clarity. The process also permits identification of options that were not implemented, so that they can be removed from the advancing specification. Thus, the product of this process is information about the metric specification RFC itself: determination of the specifications or definitions that are clear and unambiguous and those that are not (as opposed to an evaluation of the implementations that assist in the process).

This document defines a process to verify that implementations (or practically, measurement systems) have interpreted the metric specifications in equivalent ways and produce equivalent results.

Testing for statistical equivalence requires ensuring identical test setups (or awareness of differences) to the best possible extent. Thus, producing identical test conditions is a core goal of this memo. Another important aspect of this process is to test individual implementations against specific requirements in the metric specifications using customized tests for each requirement. These tests can distinguish equivalent interpretations of each specific requirement.

Conclusions on equivalence are reached by two measures.

First, implementations are compared against individual metric specifications to make sure that differences in implementation are minimized or at least known.

Second, a test setup is proposed ensuring identical networking conditions so that unknowns are minimized and comparisons are simplified. The resulting separate data sets may be seen as samples taken from the same underlying distribution. Using statistical methods, the equivalence of the results is verified. To illustrate application of the process and methods defined here, evaluation of the One-Way Delay Metric [RFC2679] is provided in Appendix A. While test setups will vary with the metrics to be validated, the general methodology of determining equivalent results will not. Documents defining test setups to evaluate other metrics should be developed once the process proposed here has been agreed and approved.

The metric RFC advancement process begins with a request for protocol action accompanied by a memo that documents the supporting tests and results. The procedures of [RFC2026] are expanded in [RFC5657], including sample implementation and interoperability reports. [TESTPLAN] can serve as a template for a metric RFC report that accompanies the protocol action request to the Area Director, including a description of the test setup, procedures, results for each implementation, and conclusions.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Basic Idea

The implementation of a standard compliant metric is expected to meet the requirements of the related metric specification. So, before comparing two metric implementations, each metric implementation is individually compared against the metric specification.

Most metric specifications leave freedom to implementors on non-fundamental aspects of an individual metric (or options). Comparing different measurement results using a statistical test with the assumption of identical test path and testing conditions requires knowledge of all differences in the overall test setup. Metric specification options chosen by implementors have to be documented. It is RECOMMENDED to use identical metric options for any test proposed here (an exception would be if a variable parameter of the metric definition is not configurable in one or more implementations). Calibrations specified by metric standards SHOULD be performed to further identify (and possibly reduce) potential sources of error in the test setup.

The IPPM Framework [RFC2330] expects that a "methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in consistent measurements". This means an implementation is expected to repeatedly measure a metric with consistent results (repeatability with the same result). Small deviations in the test setup are expected to lead to small deviations in results only. To characterize statistical equivalence in the case of small deviations, [RFC2330] and [RFC2679] suggest applying a 95% confidence interval. Quoting RFC 2679, "95 percent was chosen because ... a particular confidence level should be specified so that the results of independent implementations can be compared".

Two different implementations are expected to produce statistically equivalent results if they both measure a metric under the same networking conditions. Formulating in statistical terms: separate metric implementations collect separate samples from the same underlying statistical process (the same network conditions). The statistical hypothesis to be tested is the expectation that both samples do not expose statistically different properties. This requires careful test design:

o The measurement test setup must be self-consistent to the largest possible extent. To minimize the influence of the test and measurement setup on the result, network conditions and paths MUST be identical for the compared implementations to the largest possible degree. This includes both the stability and non-ambiguity of routes taken by the measurement packets. See [RFC2330] for a discussion on self-consistency.

o To minimize the influence of implementation options on the result, metric implementations SHOULD use identical options and parameters for the metric under evaluation.

o The sample size must be large enough to minimize its influence on the consistency of the test results. This consideration may be especially important if two implementations measure with different average packet transmission rates.

o The implementation with the lowest average packet transmission rate determines the smallest temporal interval for which samples can be compared.

o Repeat comparisons with several independent metric samples to avoid random indications of compatibility (or the lack of it).
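
The sample-size and transmission-rate considerations in the list above can be illustrated with a small calculation. The following Python sketch is not part of the RFC; the function name, the rates, and the target of 500 packets per sample are assumptions chosen for illustration. It shows that the slowest sender dictates how long a comparison interval has to be before every implementation has contributed a sample of the desired size.

```python
def smallest_comparison_interval(rates_pps, min_packets_per_sample):
    """Shortest interval (seconds) in which every implementation sends
    at least min_packets_per_sample packets (hypothetical helper)."""
    rates = list(rates_pps)
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("need at least one positive packet rate")
    if min_packets_per_sample <= 0:
        raise ValueError("sample size must be positive")
    # The implementation with the lowest average rate is the bottleneck.
    slowest = min(rates)
    return min_packets_per_sample / slowest


# Assumed average packet rates in packets per second.
rates = {"impl_a": 10.0, "impl_b": 2.5}
interval_s = smallest_comparison_interval(rates.values(), 500)
print(f"compare samples over intervals of at least {interval_s} s")
```

With these assumed numbers, the faster implementation collects four times as many packets in the same interval, which is one reason the sample-size consideration matters when rates differ.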

The metric specifications themselves are the primary focus of evaluation, rather than the implementations of metrics. The documentation produced by the advancement process should identify which metric definitions and supporting material were found to be clearly worded and unambiguous, OR it should identify ways in which the metric specification text should be revised to achieve clarity and unified interpretation.

The process should also permit identification of options that were not implemented, so that they can be removed from the advancing specification (this is an aspect more typical of protocol advancement along the Standards Track).

Note that this document does not propose to base interoperability indications of performance-metric implementations on comparisons of individual singletons. Individual singletons may be impacted by many statistical effects while they are measured. Comparing two singletons of different implementations may result in failures with higher probability than comparing samples.

3. Verification of Conformance to a Metric Specification

This section specifies how to verify compliance of two or more IPPM implementations against a metric specification. This document only proposes a general methodology. Compliance criteria to a specific metric implementation need to be defined for each individual metric specification. The only exception is the statistical test comparing two metric implementations that are simultaneously tested. This test is applicable without metric-specific decision criteria.

Several testing options exist to compare two or more implementations:

o Use a single test lab to compare the implementations and emulate the Internet with an impairment generator.

o Use a single test lab to compare the implementations and measure across the Internet.

o Use remotely separated test labs to compare the implementations and emulate the Internet with two "identically" configured impairment generators.

o Use remotely separated test labs to compare the implementations and measure across the Internet.

o Use remotely separated test labs to compare the implementations, measure across the Internet, and include a single impairment generator to impact all measurement flows in a non-discriminatory way.

The first two approaches work, but involve higher expenses than the others (due to travel and/or shipping plus installation). For the third option, ensuring two identically configured impairment generators requires well-defined test cases and possibly identical hardware and software.

As documented in a test report [TESTPLAN], the last option was required to prove compatibility of two delay metric implementations. An impairment generator is probably required when testing compatibility of most other metrics, and it is therefore RECOMMENDED to include an impairment generator in metric test setups.

3.1. Tests of an Individual Implementation against a Metric Specification

A metric implementation is compliant with a metric specification if it supports the requirements classified as "MUST" and "REQUIRED" in the related metric specification. An implementation that implements all requirements is fully compliant with the specification, and the degree of compliance SHOULD be noted in the conclusions of the report.
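
The compliance rule above can be expressed as a small check. The following Python sketch is illustrative only: the Requirement record, the example requirement texts, and the three result labels are assumptions, not terms defined by this memo. It treats an implementation as compliant when every "MUST" or "REQUIRED" item is supported, and as fully compliant when all listed requirements are supported.

```python
from dataclasses import dataclass

# Levels that the text above treats as mandatory for compliance.
MANDATORY_LEVELS = {"MUST", "REQUIRED"}


@dataclass(frozen=True)
class Requirement:
    text: str        # requirement wording taken from the metric specification
    level: str       # RFC 2119 key word attached to it
    supported: bool  # whether the implementation under test supports it


def degree_of_compliance(requirements):
    """Classify an implementation against one metric specification."""
    mandatory = [r for r in requirements if r.level in MANDATORY_LEVELS]
    if not all(r.supported for r in mandatory):
        return "not compliant"
    if all(r.supported for r in requirements):
        return "fully compliant"
    return "compliant"


# Hypothetical requirement list for an implementation under test.
report = [
    Requirement("timestamps taken at wire time", "MUST", True),
    Requirement("packet type documented", "REQUIRED", True),
    Requirement("calibration results reported", "SHOULD", False),
]
print(degree_of_compliance(report))
```

The returned label is the kind of statement that could be recorded in the conclusions of a test report, next to the list of unsupported items.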

Further, supported options of a metric implementation SHOULD be documented in sufficient detail to evaluate whether the specification was correctly interpreted. The documentation of chosen options should minimize (and recognize) differences in the test setup if two metric implementations are compared. Further, this documentation is used to validate or clarify the wording of the metric specification option, to remove options that saw no implementation or that are badly specified from the metric specification. This documentation SHOULD be included for all implementation-relevant specifications of a metric picked for a comparison, even those that are not explicitly marked as "MUST" or "REQUIRED" in the RFC text. This applies for the following sections of all metric specifications:

o Singleton Definition of the Metric.

o Sample Definition of the Metric.

o Statistics Definition of the Metric. As statistics are compared by the test specified here, this documentation is required even in the case that the metric specification does not contain a Statistics Definition.

o Timing- and Synchronization-related specification (if relevant for the Metric).

o Any other technical part present or missing in the metric specification, which is relevant for the implementation of the Metric.

[RFC2330] and [RFC2679] emphasize precision as an aim of IPPM metric implementations. A single IPPM-conforming implementation should under otherwise identical network conditions produce precise results for repeated measurements of the same metric.

RFC 2330 prefers the "empirical distribution function" (EDF) to describe collections of measurements. RFC 2330 states that "unless otherwise stated, IPPM goodness-of-fit tests are done using 5% significance". The goodness-of-fit test determines with which precision two or more samples of a metric implementation belong to the same underlying distribution (of measured network performance events). The goodness-of-fit test suggested for the metric test is the Anderson-Darling K sample test (ADK sample test, K stands for the number of samples to be compared) [ADK]. Please note that RFC 2330 and RFC 2679 apply an Anderson-Darling goodness-of-fit test, too.
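
As a reminder of what an EDF is, the following Python sketch builds one from a list of measured values: F(x) is the fraction of observations less than or equal to x. The helper name and the delay values are made up for illustration and do not come from any test report.

```python
from bisect import bisect_right


def edf(sample):
    """Return the empirical distribution function F of a sample, where
    F(x) is the fraction of observations that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right counts how many ordered values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical one-way delay singletons in milliseconds.
delays_ms = [10.2, 10.4, 10.4, 11.0]
F = edf(delays_ms)
print(F(10.0), F(10.4), F(11.0))
```

Goodness-of-fit tests such as the ADK test compare EDFs of this kind across samples rather than comparing individual values.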

The results of a repeated test with a single implementation MUST pass an ADK sample test with a confidence level of 95%. The conditions for which the ADK test has been passed with the specified confidence level MUST be documented. To formulate this differently, the requirement is to document the set of parameters with the smallest deviation at which the results of the tested metric implementation pass an ADK test with a confidence level of 95%. The minimum resolution available in the reported results from each implementation MUST be taken into account in the ADK test.
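
To give a feel for the kind of quantity an ADK test evaluates, the following Python sketch computes the unstandardized two-sample Anderson-Darling statistic for data without ties. It is a simplified illustration under stated assumptions, not the procedure this memo requires: it rejects tied values, it does not standardize the statistic, and it applies no critical value, so it cannot by itself decide whether a test is passed at the 95% level. The function name and the sample values are hypothetical; [ADK] remains the reference for the full test.

```python
def ad_two_sample(x, y):
    """Unstandardized two-sample Anderson-Darling statistic, no ties.

    A^2 = 1/(m*n) * sum_{i=1}^{N-1} (N*M_i - m*i)^2 / (i*(N-i)),
    where N = m + n and M_i is the number of x-values among the i
    smallest values of the pooled sample.
    """
    m, n = len(x), len(y)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    total = m + n
    # Tag each value with its origin: 0 for x, 1 for y.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    if len({v for v, _ in pooled}) != total:
        raise ValueError("ties present; this sketch assumes distinct values")
    acc = 0.0
    from_x = 0
    for i, (_, origin) in enumerate(pooled[:-1], start=1):
        if origin == 0:
            from_x += 1
        acc += (total * from_x - m * i) ** 2 / (i * (total - i))
    return acc / (m * n)


# Made-up samples: well-mixed values versus clearly separated values.
mixed = ad_two_sample([1, 3, 5, 7], [2, 4, 6, 8])
separated = ad_two_sample([1, 2, 3, 4], [5, 6, 7, 8])
print(mixed, separated)
```

The statistic grows as the two empirical distributions move apart, which is why the separated samples score higher than the well-mixed ones. Measured data reported at a finite resolution usually contains ties, so a practical test needs the tie-aware form of the statistic.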

The test conditions to be documented for a passed metric test include:

o The metric resolution at which a test was passed (e.g., the resolution of timestamps).

o The parameters modified by an impairment generator.

o The impairment generator parameter settings.
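
One possible reading of "the metric resolution at which a test was passed" is that results are rounded to a common, coarser resolution before they are compared. The following Python sketch illustrates that reading only; the helper name, the use of integer microseconds, and the resolution values are assumptions, and the memo itself does not prescribe this exact procedure.

```python
def quantize(sample_us, resolution_us):
    """Round non-negative integer microsecond values to the nearest
    multiple of resolution_us (halves round up)."""
    if resolution_us <= 0:
        raise ValueError("resolution must be positive")
    half = resolution_us // 2
    return [((v + half) // resolution_us) * resolution_us for v in sample_us]


# Assumed reporting resolutions of two implementations, in microseconds.
resolution_a_us, resolution_b_us = 1, 100
common_us = max(resolution_a_us, resolution_b_us)

# Hypothetical delay singletons from the finer-grained implementation.
delays_a_us = [10234, 10251, 10349]
print(quantize(delays_a_us, common_us))
```

Rounding of this kind generally introduces tied values, so the statistical test applied afterwards has to handle ties, and the resolution used belongs in the documented test conditions.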

3.2. Test Setup Resulting in Identical Live Network Testing Conditions

Two major issues complicate tests for metric compliance across live networks under identical testing conditions. One is the general point that metric definition implementations cannot be conveniently examined in field measurement scenarios. The other one is more broadly described as "parallelism in devices and networks", including mechanisms like those that achieve load balancing (see [RFC4928]).

This section proposes two measures to deal with both issues. Tunneling mechanisms can be used to avoid parallel processing of different flows in the network. Measuring by separate parallel probe flows results in repeated collection of data. If both measures are combined, Wide Area Network (WAN) conditions are identical for a number of independent measurement flows, no matter what the network conditions are in detail.

Any measurement setup must ensure that the probing traffic itself does not impede the metric measurement. The created measurement load must not result in congestion at the access link connecting the measurement implementation to the WAN. The created measurement load must not overload the measurement implementation itself, e.g., by causing a high CPU load or by causing timestamp imprecision due to unwanted queuing while transmitting or receiving test packets.
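
A rough plausibility check of the created measurement load can be done before a test is started. The following Python sketch adds up the offered load of all probe flows and relates it to the access link capacity. The flow parameters, the 10 Mbit/s link, and the 10% utilization limit are assumptions for illustration; packet sizes are assumed to include all headers counted on the access link.

```python
def offered_load_bps(flows):
    """Total probe load in bit/s for (packets_per_second, packet_bytes) flows."""
    return sum(pps * packet_bytes * 8 for pps, packet_bytes in flows)


def access_link_utilization(flows, link_capacity_bps):
    """Fraction of the access link capacity used by the probe flows."""
    if link_capacity_bps <= 0:
        raise ValueError("link capacity must be positive")
    return offered_load_bps(flows) / link_capacity_bps


# Four hypothetical probe flows: 100 packets/s of 100 bytes each.
probe_flows = [(100, 100)] * 4
LINK_CAPACITY_BPS = 10_000_000   # assumed 10 Mbit/s access link
MAX_UTILIZATION = 0.10           # assumed safety limit, not from the RFC

utilization = access_link_utilization(probe_flows, LINK_CAPACITY_BPS)
print(f"probe load uses {utilization:.1%} of the access link")
if utilization > MAX_UTILIZATION:
    print("warning: probe traffic may congest the access link")
```

Such a check covers only the average load on the access link. It says nothing about bursts or about CPU load and queuing inside the measurement implementation, which have to be verified separately.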

Tunneling multiple flows destined for a single physical port of a network element allows transmission of all packets via the same path. Applying tunnels to avoid undesired influence of standard routing for measurement purposes is a concept known from literature, see e.g., GRE-encapsulated multicast probing [GU-Duffield]. An existing IP-in-IP tunnel protocol can be applied to avoid Equal-Cost Multi-Path (ECMP) routing of different measurement streams if it meets the following criteria:

o Inner IP packets from different measurement implementations are mapped into a single tunnel with a single outer IP origin and destination address as well as origin and destination port numbers that are identical for all packets.

o An easily accessible tunneling protocol allows for carrying out a metric test from more test sites.

o A low operational overhead may enable a broader audience to set up a metric test with the desired properties.

o The tunneling protocol should be reliable and stable in setup and operation to avoid disturbances or influence on the test results.

o The tunneling protocol should not incur any extra cost for those interested in setting up a metric test.

An illustration of a test setup with two layer 2 tunnels and two flows between two linecards of one implementation is given in Figure 1.

           Implementation                   ,---.       +--------+
                               +~~~~~~~~~~~/     \~~~~~~| Remote |
            +------->-----F2->-|          /       \     |->---+  |
            | +---------+      | Tunnel 1(         )    |     |  |
            | | transmit|-F1->-|         (         )    |->+  |  |
            | | LC1     |      +~~~~~~~~~|         |~~~~|  |  |  |
            | | receive |-<--+           (         )    | F1  F2 |
            | +---------+    |           |Internet |    |  |  |  |
            *-------<-----+  F2          |         |    |  |  |  |
              +---------+ |  | +~~~~~~~~~|         |~~~~|  |  |  |
              | transmit|-*  *-|         |         |    |--+<-*  |
              | LC2     |      | Tunnel 2(         )    |  |     |
              | receive |-<-F1-|          \       /     |<-*     |
              +---------+      +~~~~~~~~~~~\     /~~~~~~| Router |
                                            `-+-'       +--------+

For simplicity, only two linecards of one implementation and two flows F between them are shown.

Figure 1: Illustration of a Test Setup with Two Layer 2 Tunnels

Figure 2 shows the network elements required to set up layer 2 tunnels as shown by Figure 1.

Implementation

            +-----+                   ,---.
            | LC1 |                  /     \
            +-----+                 /       \              +------+
               |        +-------+  (         )  +-------+  |Remote|
            +--------+  |       |  |         |  |       |  |      |
            |Ethernet|  | Tunnel|  |Internet |  | Tunnel|  |      |
            |Switch  |--| Head  |--|         |--| Head  |--|      |
            +--------+  | Router|  |         |  | Router|  |      |
               |        |       |  (         )  |       |  |Router|
            +-----+     +-------+   \       /   +-------+  +------+
            | LC2 |                  \     /
            +-----+                   `-+-'

Figure 2: Illustration of a Hardware Setup to Realize the Test Setup Illustrated by Figure 1 with Layer 2 Tunnels or Pseudowires

The test setup successfully used during a delay metric test [TESTPLAN] is given as an example in Figure 3. Note that the shown setup allows a metric test between two remote sites.

           +----+  +----+                                +----+  +----+
           |LC10|  |LC11|           ,---.                |LC20|  |LC21|
           +----+  +----+          /     \    +-------+  +----+  +----+
             | V10  | V11         /       \   | Tunnel|   | V20   |  V21
             |      |            (         )  | Head  |   |       |
            +--------+  +------+ |         |  | Router|__+----------+
            |Ethernet|  |Tunnel| |Internet |  +---B---+  |Ethernet  |
            |Switch  |--|Head  |-|         |      |      |Switch    |
            +-+--+---+  |Router| |         |  +---+---+  +--+--+----+
              |__|      +--A---+ (         )--|Option.|     |__|
                                  \       /   |Impair.|
            Bridge                 \     /    |Gener. |     Bridge
            V20 to V21              `-+-'     +-------+     V10 to V11

Figure 3: Example of Test Setup Successfully Used during a Delay Metric Test

In Figure 3, LC10 identifies measurement clients / linecards. V10 and the others denote VLANs. All VLANs are using the same tunnel from A to B and in the reverse direction. The remote site VLANs are U-bridged at the local site Ethernet switch. The measurement packets of site 1 travel tunnel A->B first, are U-bridged at site 2, and travel tunnel B->A second. Measurement packets of site 2 travel tunnel B->A first, are U-bridged at site 1, and travel tunnel A->B second. So, all measurement packets pass the same tunnel segments, but in different segment order.

If tunneling is applied, two tunnels MUST carry all test traffic in between the test site and the remote site. For example, if 802.1Q Virtual LANs (VLANs) are applied and the measurement streams are carried in different VLANs, the IP tunnel or pseudowires, respectively, are set up in physical port mode to avoid setup of pseudowires per VLAN (which may see different paths due to ECMP routing); see [RFC4448]. The remote router and the Ethernet switch shown in Figure 3 have to support 802.1Q in this setup.

The IP packet size of the metric implementation SHOULD be chosen small enough to avoid fragmentation due to the added Ethernet and tunnel headers. Otherwise, the impact of tunnel overhead on fragmentation and interface MTU size must be understood and taken into account (see [RFC4459]).

An Ethernet port mode IP tunnel carrying several 802.1Q VLANs each containing measurement traffic of a single measurement system was successfully applied when testing compatibility of two metric implementations [TESTPLAN]. Ethernet over Layer 2 Tunneling Protocol Version 3 (L2TPv3) [RFC4719] was picked for this test.

The following headers may have to be accounted for when calculating total packet length, if VLANs and Ethernet over L2TPv3 tunnels are applied:

o Ethernet 802.1Q: 22 bytes.

o L2TPv3 Header: 4-16 bytes for L2TPv3 data messages over IP; 16-28 bytes for L2TPv3 data messages over UDP.

o IPv4 Header (outer IP header): 20 bytes.

o MPLS Labels may be added by a carrier. Each MPLS Label has a length of 4 bytes. At the time of this writing, between 1 and 4 Labels seems to be a fair guess of what's expected.
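
The header sizes listed above can be added up mechanically. The following Python sketch is only an illustration: the function name and its parameters are hypothetical, and the byte counts are simply the values from the list, not independently derived figures. It returns the smallest and largest total packet length for a given inner IP packet.

```python
# Header sizes in bytes, copied from the list above.
ETHERNET_8021Q = 22
IPV4_OUTER = 20
MPLS_LABEL = 4
L2TPV3_OVER_IP = (4, 16)    # (min, max) for L2TPv3 data messages over IP
L2TPV3_OVER_UDP = (16, 28)  # (min, max) for L2TPv3 data messages over UDP


def total_packet_length(inner_ip_length, over_udp=False, mpls_labels=0):
    """Return (min, max) total length in bytes for one tunneled packet.

    inner_ip_length: size of the measurement IP packet (hypothetical input).
    over_udp:        True if L2TPv3 data messages are carried over UDP.
    mpls_labels:     number of MPLS labels assumed to be added by a carrier.
    """
    if inner_ip_length < 0 or mpls_labels < 0:
        raise ValueError("lengths and label counts must not be negative")
    l2tp_min, l2tp_max = L2TPV3_OVER_UDP if over_udp else L2TPV3_OVER_IP
    fixed = inner_ip_length + ETHERNET_8021Q + IPV4_OUTER + mpls_labels * MPLS_LABEL
    return fixed + l2tp_min, fixed + l2tp_max


if __name__ == "__main__":
    # A 100-byte measurement packet, L2TPv3 over IP, no MPLS labels.
    print(total_packet_length(100))
```

Comparing the upper value with the smallest MTU expected along the path gives a quick check of whether a chosen measurement packet size risks fragmentation.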

The applicability of one or more of the following tunneling protocols may be investigated by interested parties if Ethernet over L2TPv3 is felt to be unsuitable: IP in IP [RFC2003] or Generic Routing Encapsulation (GRE) [RFC2784]. RFC 4928 [RFC4928] proposes measures to avoid ECMP treatment in MPLS networks.

L2TP is a commodity tunneling protocol [RFC2661]. At the time of this writing, L2TPv3 [RFC3931] is the latest version of L2TP. If L2TPv3 is applied, software-based implementations of this protocol are not suitable for the test setup, as such implementations may cause incalculable delay shifts.

Ethernet pseudowires may also be set up on MPLS networks [RFC4448]. While there is no technical issue with this solution, MPLS interfaces are mostly found in the network provider domain. Hence, not all of the above criteria for selecting a tunneling protocol are met.

Note that setting up a metric test environment is not a plug-and-play issue. Skilled networking engineers should be consulted and involved if a setup between remote sites is preferred.

Passing or failing an ADK test with 2 samples could be a random result (note that [RFC2330] defines a sample as a set of singleton metric values produced by a measurement stream, and we continue to use this terminology here). The error margin of a statistical test is higher if the number of samples it is based on is low (the number of samples taken influences the so-called "degree of freedom" of a statistical test, and a higher degree of freedom produces more reliable results). To pass an ADK with higher probability, the number of samples collected per implementation under identical networking conditions SHOULD be greater than 2. Hardware and load constraints may enforce an upper limit on the number of simultaneous measurement streams. The ADK test allows one to combine different samples (see Section 9 of [ADK]) and then to run a 2-sample test between combined samples. At least 4 samples per implementation captured under identical networking conditions is RECOMMENDED when comparing different metric implementations by a statistical test.

It is RECOMMENDED that tests be carried out by establishing N different parallel measurement flows. Two or three linecards per implementation serving to send or receive measurement flows should be sufficient to create 4 or more parallel measurement flows. Other options are to separate flows by DiffServ marks (without deploying any Quality of Service (QoS) in the inner or outer tunnel) or to use a single Constant Bitrate (CBR) flow and evaluate whether every n-th singleton belongs to a specific measurement flow. Note that a practical test indeed showed that ADK passed with 4 samples even if a 2-sample test failed [TESTPLAN].
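
The single-CBR-flow option can be sketched as follows. This is a minimal illustration, assuming the singletons are available as a list in sending order; the function name is hypothetical and not part of any tool referenced by this document.

```python
def split_cbr_stream(singletons, n_flows):
    """Split one CBR stream into n_flows interleaved samples.

    Singleton number i (counted from 0, in sending order) is assigned to
    sample i mod n_flows, so every n-th singleton lands in the same sample.
    """
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    return [list(singletons[i::n_flows]) for i in range(n_flows)]


if __name__ == "__main__":
    # Hypothetical one-way delay singletons in milliseconds.
    delays_ms = [10.1, 10.4, 10.2, 10.3, 10.5, 10.0, 10.2, 10.6]
    for index, sample in enumerate(split_cbr_stream(delays_ms, 4)):
        print(index, sample)
```

Because all sub-samples come from the same stream, they cover the same time interval, which matches the requirement that compared samples be captured simultaneously.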

Some additional guidelines to calculate and compare samples to perform a metric test are:

o Comparing different probes of a common underlying distribution in terms of metrics characterizing a communication network requires respecting the temporal nature for which the assumption of a common underlying distribution may hold. Any singletons or samples to be compared must be captured within the same time interval.

o If statistical events like rates are used to characterize measured metrics of a time interval, a minimum of 5 singletons of a relevant metric should be picked to ensure a minimum confidence in the reported value. The error margin of the determined rate depends on the number of singletons (refer to statistical textbooks on Student's t-test). As an example, any packet loss measurement interval to be compared with the results of another implementation should contain at least five lost packets to have some confidence that the observed loss rate wasn't caused by a small number of random packet drops.

o The minimum number of singletons or samples to be compared by an Anderson-Darling test should be 100 per tested metric implementation. Note that the Anderson-Darling test detects small differences in distributions fairly well and will fail for a high number of compared results (RFC 2330 mentions an example with 8192 measurements where an Anderson-Darling test always failed).

o Generally, the Anderson-Darling test is sensitive to differences in the accuracy or bias associated with varying implementations or test conditions. These dissimilarities may result in differing averages of samples to be compared. An example may be different packet sizes, resulting in a constant delay difference between compared samples. Therefore, samples to be compared by an Anderson-Darling test MAY be calibrated by the difference of the average values of the samples. Any calibration of this kind MUST be documented in the test result.
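
The calibration by average values and the comparison step can be illustrated with a short Python sketch. It is not the code referred to in Appendix B: the function names are hypothetical, the statistic is the basic k-sample Anderson-Darling form for samples without tied values, and no critical values are computed. For an actual metric test, the statistic would have to be evaluated as described in [ADK] or with the tools discussed in Appendix B.

```python
from bisect import bisect_right
from statistics import mean


def calibrate_by_mean(sample, reference):
    """Shift `sample` so that its average equals the average of `reference`.

    Returns the shifted sample and the applied offset; the offset has to be
    documented in the test result.
    """
    offset = mean(sample) - mean(reference)
    return [x - offset for x in sample], offset


def adk_statistic(*samples):
    """Basic k-sample Anderson-Darling statistic (sketch, no tie handling)."""
    if len(samples) < 2 or any(len(s) == 0 for s in samples):
        raise ValueError("need at least two non-empty samples")
    pooled = sorted(x for s in samples for x in s)
    n_total = len(pooled)
    if len(set(pooled)) != n_total:
        raise ValueError("this sketch does not handle tied values")
    statistic = 0.0
    for s in samples:
        ordered = sorted(s)
        n_i = len(ordered)
        inner = 0.0
        for j in range(1, n_total):
            # Number of values in this sample not greater than the
            # j-th smallest value of the pooled sample.
            m_ij = bisect_right(ordered, pooled[j - 1])
            inner += (n_total * m_ij - j * n_i) ** 2 / (j * (n_total - j))
        statistic += inner / n_i
    return statistic / n_total


if __name__ == "__main__":
    # Hypothetical delay samples (ms); implementation B has a constant offset.
    impl_a = [10.0, 10.4, 10.9, 11.5]
    impl_b = [12.1, 12.6, 13.2, 13.3]
    shifted_b, offset = calibrate_by_mean(impl_b, impl_a)
    print("offset:", offset)
    print("before calibration:", adk_statistic(impl_a, impl_b))
    print("after calibration:", adk_statistic(impl_a, shifted_b))
```

A larger statistic indicates stronger evidence against a common underlying distribution; whether a test passes depends on the critical value for the chosen confidence level, which this sketch leaves out.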

3.3. Tests of Two or More Different Implementations against a Metric Specification

[RFC2330] expects that "a methodology for a given metric exhibits continuity if, for small variations in conditions, it results in small variations in the resulting measurements. Slightly more precisely, for every positive epsilon, there exists a positive delta, such that if two sets of conditions are within delta of each other, then the resulting measurements will be within epsilon of each other". A small variation in conditions in the context of the metric test proposed here can be seen as different implementations measuring the same metric along the same path.

IPPM metric specifications, however, allow for implementor options to the largest possible degree. It cannot be expected that two implementors allow 100% identical options in their implementations. Testers SHOULD pick the same metric measurement configurations for their systems when comparing their implementations by a metric test.

In some cases, a goodness-of-fit test may not be possible or show disappointing results. To clarify the difficulties arising from different metric implementation options, the individual options picked for every compared metric implementation should be documented as specified in Section 3.5. If the cause of the failure is a lack of specification clarity or multiple legitimate interpretations of the definition text, the text should be modified and the resulting memo proposed for consensus and (possible) advancement to Internet Standard.

The same statistical test as applicable to quantify precision of a single metric implementation must be used to compare metric result equivalence for different implementations. To document compatibility, the smallest measurement resolution at which the compared implementations passed the ADK sample test must be documented.

For different implementations of the same metric, "variations in conditions" are reasonably expected. The ADK test comparing samples of the different implementations may result in a lower precision than the test for precision in the same-implementation comparison.

3.4. Clock Synchronization

Clock synchronization effects require special attention. Accuracy of one-way active delay measurements for any metric implementation depends on clock synchronization between the source and destination of tests. Ideally, one-way active delay measurement [RFC2679] test endpoints either have direct access to independent GPS or CDMA-based time sources or indirect access to nearby NTP primary (stratum 1) time sources, equipped with GPS receivers. Access to these time sources may not be available at all test locations associated with different Internet paths, for a variety of reasons out of scope of this document.

When secondary (stratum 2 and above) time sources are used with NTP running across the same network, whose metrics are subject to comparative implementation tests, network impairments can affect clock synchronization and distort sample one-way values and their interval statistics. Discarding sample one-way delay values for any implementation is recommended when one of the following reliability conditions is met:

o Delay is measured and is finite in one direction but not the other.

o Absolute value of the difference between the sum of one-way measurements in both directions and the round-trip measurement is greater than X% of the latter value.

Examination of the second condition requires round-trip time (RTT) measurement for reference, e.g., based on TWAMP [RFC5357] in conjunction with one-way delay measurement.
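
The two reliability conditions can be expressed as a small check. The sketch below is an assumption-laden illustration: the function name is hypothetical, a lost or undefined one-way delay is represented as infinity, and the threshold `x_percent` is a free parameter because this document does not specify a value for X.

```python
import math


def should_discard(owd_forward, owd_reverse, rtt, x_percent):
    """Return True if a pair of one-way delay values should be discarded.

    owd_forward, owd_reverse: one-way delays in seconds (math.inf if undefined).
    rtt:       round-trip reference measurement in seconds.
    x_percent: hypothetical threshold X, in percent of the round-trip value.
    """
    forward_finite = math.isfinite(owd_forward)
    reverse_finite = math.isfinite(owd_reverse)
    # Condition 1: delay is finite in one direction but not in the other.
    if forward_finite != reverse_finite:
        return True
    if not forward_finite:
        # Both directions undefined: neither condition applies.
        return False
    # Condition 2: the sum of both one-way delays deviates from the
    # round-trip measurement by more than X% of the latter.
    deviation = abs((owd_forward + owd_reverse) - rtt)
    return deviation > rtt * x_percent / 100.0


if __name__ == "__main__":
    print(should_discard(0.010, 0.011, 0.0205, x_percent=10))  # consistent pair
    print(should_discard(0.010, 0.020, 0.020, x_percent=10))   # large deviation
```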

Specification of X% to strike a balance between identification of unreliable one-way delay samples and misidentification of reliable samples under a wide range of Internet path RTTs requires further study.

An IPPM-compliant metric implementation of an RFC that requires synchronized clocks is expected to provide precise measurement results.

If an implementation publishes a specification of its precision, such as "a precision of 1 ms (+/- 500 us) with a confidence of 95%", then the specification should be met over a useful measurement duration. For example, if the metric is measured along an Internet path that is stable and not congested, then the precision specification should be met over durations of an hour or more.

3.5. Recommended Metric Verification Measurement Process

In order to meet their obligations under the IETF Standards Process, the IESG must be convinced that each metric specification advanced to Internet Standard status is clearly written, that there are a sufficient number of verified equivalent implementations, and that options that have been implemented are documented.

In the context of this document, metrics are designed to measure some characteristic of a data network. An aim of any metric definition should be that it is specified in a way that can reliably measure the specific characteristic in a repeatable way across multiple independent implementations.

Each metric, statistic, or option of those to be validated MUST be compared against a reference measurement or another implementation as specified in this document.

Finally, the metric definitions, embodied in the text of the RFCs, are the objects that require evaluation and possible revision in order to advance to Internet Standard.

IF two (or more) implementations do not measure an equivalent metric as specified by this document,

AND sources of measurement error do not adequately explain the lack of agreement,

THEN the details of each implementation should be audited along with the exact definition text to determine if there is a lack of clarity that has caused the implementations to vary in a way that affects the correspondence of the results.

IF there was a lack of clarity or multiple legitimate interpretations of the definition text,

THEN the text should be modified and the resulting memo proposed for consensus and (possible) advancement along the Standards Track.

Finally, all the findings MUST be documented in a report that can support advancement to Internet Standard, as described here (similar to the reports described in [RFC5657]). The list of measurement devices used in testing satisfies the implementation requirement, while the test results provide information on the quality of each specification in the metric RFC (the surrogate for feature interoperability).

The complete process of advancing a metric specification to a Standard as defined by this document is illustrated in Figure 4.

      ,---.
     /     \
    ( Start )
     \     /    Implementations
      `-+-'        +-------+
        |         /|   1   `.
    +---+----+   / +-------+ `.-----------+     ,-------.
    |  RFC   |  /             |Check for  |   ,' was RFC `. YES
    |        | /              |Equivalence....  clause x   ------+
    |        |/    +-------+  |under      |   `. clear?  ,'      |
    | Metric \.....|   2   ....relevant   |     `---+---'   +----+-----+
    | Metric |\    +-------+  |identical  |      No |       |Report    |
    | Metric | \              |network    |      +--+----+  |results + |
    |  ...   |  \             |conditions |      |Modify |  |Advance   |
    |        |   \ +-------+  |           |      |Spec   +--+RFC       |
    +--------+    \|   n   |.'+-----------+      +-------+  |request   |
                   +-------+                                +----------+

Figure 4: Illustration of the Metric Standardization Process

Any recommendation for the advancement of a metric specification MUST be accompanied by an implementation report. The implementation report needs to include the tests performed, the applied test setup, the specific metrics in the RFC, and reports of the tests performed with two or more implementations. The test plan needs to specify the precision reached for each measured metric and thus define the meaning of "statistically equivalent" for the specific metrics being tested.

Ideally, the test plan would co-evolve with the development of the metric, since that's when participants have the clearest context in their minds regarding the different subtleties that can arise.

In particular, the implementation report MUST include the following at minimum:

o The metric compared and the RFC specifying it. This includes statements as required by Section 3.1 ("Tests of an Individual Implementation against a Metric Specification") of this document.

o The measurement configuration and setup.

o A complete specification of the measurement stream (mean rate, statistical distribution of packets, packet size or mean packet size, and their distribution), Differentiated Services Code Point (DSCP), and any other measurement stream properties that could result in deviating results. Deviations in results can also be caused if chosen IP addresses and ports of different implementations result in different layer 2 or layer 3 paths due to operation of Equal Cost Multi-Path routing in an operational network.

o The duration of each measurement to be used for a metric validation, the number of measurement points collected for each metric during each measurement interval (i.e., the probe size), and the level of confidence derived from this probe size for each measurement interval.

o The result of the statistical tests performed for each metric validation as required by Section 3.3 ("Tests of Two or More Different Implementations against a Metric Specification") of this document.

o A parameterization of laboratory conditions and applied traffic and network conditions allowing reproduction of these laboratory conditions for readers of the implementation report.

o The documentation helping to improve metric specifications defined by this section.

All of the tests for each set SHOULD be run in a test setup as specified in Section 3.2 ("Test Setup Resulting in Identical Live Network Testing Conditions").

If a different test setup is chosen, it is recommended to avoid effects falsifying results of validation measurements caused by real data networks (like parallelism in devices and networks). Data networks may forward packets differently in the case of:

o Different packet sizes chosen for different metric implementations. A proposed countermeasure is selecting the same packet size when validating results of two samples or a sample against an original distribution.

o Selection of differing IP addresses and ports used by different metric implementations during metric validation tests. If ECMP is applied on the IP or MPLS level, different paths can result (note that it may be impossible to detect an MPLS ECMP path from an IP endpoint). A proposed countermeasure is to connect the measurement equipment to be compared by a NAT device or establish a single tunnel to transport all measurement traffic. The aim is to have the same IP addresses and port for all measurement packets or to avoid ECMP-based local routing diversion by using a layer 2 tunnel.

o Different IP options.

o Different DSCP.

o If the N measurements are captured using sequential measurements instead of simultaneous ones, then the following factors come into play: time varying paths and load conditions.

3.6. Proposal to Determine an Equivalence Threshold for Each Metric Evaluated

This section describes a proposal for maximum error of equivalence, based on performance comparison of identical implementations. This comparison may be useful for both ADK and non-ADK comparisons.

Each metric is tested by two or more implementations (cross-implementation testing).

Each metric is also tested twice simultaneously by the *same* implementation, using different Src/Dst Address pairs and other differences such that the connectivity differences of the cross-implementation tests are also experienced and measured by the same implementation.

Comparative results for the same implementation represent a bound on cross-implementation equivalence. This should be particularly useful when the metric does *not* produce a continuous distribution of singleton values, such as with a loss metric or a duplication metric. Appendix A indicates how the ADK will work for one-way delay and should be likewise applicable to distributions of delay variation.

Appendix B discusses two possible ways to perform the ADK analysis: the R statistical language [Rtool] with ADK package [Radk] and C++ code.

Conclusion: the largest difference observed in the homogeneous (same-implementation) comparison results is the lower bound on the equivalence threshold, noting that there may be other systematic errors to account for when comparing implementations.

Thus, when evaluating equivalence in cross-implementation results:

Maximum_Error = Same_Implementation_Error + Systematic_Error

and only the systematic error need be decided beforehand.

In the case of ADK comparison, the largest same-implementation resolution of distribution equivalence can be used as a limit on cross-implementation resolutions (at the same confidence level).
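
The threshold rule of this section can be sketched as follows. The function names maximum_error and within_threshold are hypothetical, and the sketch assumes that each same-implementation comparison has already been reduced to a single non-negative difference value.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns Maximum_Error as defined above.
// same_impl_errors holds the difference observed when each implementation
// is compared against itself (one entry per implementation tested).
double maximum_error(const std::vector<double>& same_impl_errors,
                     double systematic_error) {
    assert(!same_impl_errors.empty());
    // The largest same-implementation difference sets the bound.
    const double same_implementation_error =
        *std::max_element(same_impl_errors.begin(), same_impl_errors.end());
    return same_implementation_error + systematic_error;
}

// Cross-implementation results are treated as equivalent when their
// observed difference does not exceed the bound.
bool within_threshold(double cross_impl_difference, double max_error) {
    return cross_impl_difference <= max_error;
}
```

Only systematic_error has to be chosen before the test. The other term comes from the measurements themselves.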

4. Acknowledgements

Gerhard Hasslinger commented on a first draft version of this document; he suggested statistical tests and the evaluation of time series information. Matthias Wieser's thesis on a metric test resulted in new input for this document. Henk Uijterwaal and Lars Eggert have encouraged and helped to organize this work. Mike Hamilton, Scott Bradner, David McDysan, and Emile Stephan commented on this document. Carol Davids reviewed a version of the document before it became a WG item.

5. Contributors

Scott Bradner, Vern Paxson, and Allison Mankin drafted [METRICTEST], and major parts of it are included in this document.

6. Security Considerations

This memo does not raise any specific security issues.

7. References

7.1. Normative References

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, "Framework for IP Performance Metrics", RFC 2330, May 1998.

[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

[RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way Delay Metric for IPPM", RFC 2679, September 1999.

[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

[RFC3931] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

[RFC4448] Martini, L., Rosen, E., El-Aawar, N., and G. Heron, "Encapsulation Methods for Transport of Ethernet over MPLS Networks", RFC 4448, April 2006.

[RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. Zekauskas, "A One-way Active Measurement Protocol (OWAMP)", RFC 4656, September 2006.

[RFC4719] Aggarwal, R., Townsley, M., and M. Dos Santos, "Transport of Ethernet Frames over Layer 2 Tunneling Protocol Version 3 (L2TPv3)", RFC 4719, November 2006.

[RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal Cost Multipath Treatment in MPLS Networks", BCP 128, RFC 4928, June 2007.

[RFC5657] Dusseault, L. and R. Sparks, "Guidance on Interoperation and Implementation Reports for Advancement to Draft Standard", BCP 9, RFC 5657, September 2009.

[RFC6410] Housley, R., Crocker, D., and E. Burger, "Reducing the Standards Track to Two Maturity Levels", BCP 9, RFC 6410, October 2011.

7.2. Informative References

[ADK] Scholz, F. and M. Stephens, "K-sample Anderson-Darling Tests of Fit, for Continuous and Discrete Cases", University of Washington, Technical Report No. 81, May 1986.

[GU-Duffield] Gu, Y., Duffield, N., Breslau, L., and S. Sen, "GRE Encapsulated Multicast Probing: A Scalable Technique for Measuring One-Way Loss", SIGMETRICS'07 San Diego, California, USA, June 2007.

[METRICTEST] Bradner, S. and V. Paxson, "Advancement of metrics specifications on the IETF Standards Track", Work in Progress, August 2007.

[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-Network Tunneling", RFC 4459, April 2006.

[RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, October 2008.

[Radk] Scholz, F., "adk: Anderson-Darling K-Sample Test and Combinations of Such Tests. R package version 1.0", 2008.

[Rtool] R Development Core Team, "R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0", 2011, <http://www.R-project.org/>.

[TESTPLAN] Ciavattone, L., Geib, R., Morton, A., and M. Wieser, "Test Plan and Results for Advancing RFC 2679 on the Standards Track", Work in Progress, March 2012.

Appendix A. An Example on a One-Way Delay Metric Validation

The text of this appendix is not binding. It is an example of what parts of a One-Way Delay Metric test could look like.

A.1. Compliance to Metric Specification Requirements

One-Way Delay, Loss Threshold, RFC 2679

This test determines if implementations use the same configured maximum waiting time delay from one measurement to another under different delay conditions and correctly declare packets arriving in excess of the waiting time threshold as lost. See Sections 3.5 (3rd bullet point) and 3.8.2 of [RFC2679].

(1) Configure a path with 1-second one-way constant delay.

(2) Measure one-way delay with 2 or more implementations, using identical waiting time thresholds for loss set at 2 seconds.

(3) Configure the path with 3-second one-way delay.

(4) Repeat measurements.

(5) Observe that the increase measured in step 4 caused all packets to be declared lost and that all packets that arrive successfully in step 2 are assigned a valid one-way delay.
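
The pass criterion of steps (1) to (5) can be sketched as below. The LossSummary structure and the classify_by_threshold function are hypothetical names, and the sketch assumes that a packet whose delay exceeds the waiting time threshold is declared lost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result record for one measurement run.
struct LossSummary {
    std::size_t delivered;  // packets assigned a valid one-way delay
    std::size_t lost;       // packets declared lost
};

// Classifies each packet by comparing its one-way delay (in seconds) with
// the configured waiting time threshold (in seconds).
LossSummary classify_by_threshold(const std::vector<double>& delays_s,
                                  double threshold_s) {
    LossSummary summary{0, 0};
    for (double delay : delays_s) {
        if (delay > threshold_s) {
            ++summary.lost;       // arrived after the threshold expired
        } else {
            ++summary.delivered;  // valid one-way delay singleton
        }
    }
    return summary;
}
```

With a 2-second threshold, the 1-second path of step 2 should yield no losses, and the 3-second path of step 4 should yield only losses.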

One-Way Delay, First Bit to Last Bit, RFC 2679

This test determines if implementations register the same relative increase in delay from one measurement to another under different delay conditions. This test tends to cancel the sources of error that may be present in an implementation. See Section 3.7.2 of [RFC2679] and Section 10.2 of [RFC2330].

(1) Configure a path with X ms one-way constant delay and ideally include a low-speed link.

(2) Measure one-way delay with 2 or more implementations, using identical options and equal size small packets (e.g., 100 octet IP payload).

(3) Maintain the same path with X ms one-way delay.

(4) Measure one-way delay with 2 or more implementations, using identical options and equal size large packets (e.g., 1500 octet IP payload).

(5) Observe that the increase measured in steps 2 and 4 is equivalent to the increase in ms expected due to the larger serialization time for each implementation. Most of the measurement errors in each system should cancel, if they are stationary.
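
The expected increase in step (5) can be estimated with a short sketch. It assumes that a single low-speed link dominates serialization time and that headers are the same size in both runs; the function names and the 2 Mbit/s rate in the usage note are hypothetical.

```cpp
#include <cmath>

// Expected extra serialization time, in milliseconds, when the packet
// grows from small_octets to large_octets on a link of link_bps bit/s.
double serialization_increase_ms(unsigned small_octets,
                                 unsigned large_octets,
                                 double link_bps) {
    const double extra_bits =
        (static_cast<double>(large_octets) - small_octets) * 8.0;
    return extra_bits * 1000.0 / link_bps;
}

// True if the measured delay increase of one implementation matches the
// expected increase within tolerance_ms.
bool increase_as_expected(double measured_small_ms, double measured_large_ms,
                          double expected_increase_ms, double tolerance_ms) {
    const double measured = measured_large_ms - measured_small_ms;
    return std::fabs(measured - expected_increase_ms) <= tolerance_ms;
}
```

For example, on a hypothetical 2 Mbit/s link, moving from 100 to 1500 octets adds about 5.6 ms. Every implementation should report roughly that increase, whatever its constant measurement offset is.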

One-Way Delay, RFC 2679

This test determines if implementations register the same relative increase in delay from one measurement to another under different delay conditions. This test tends to cancel the sources of error that may be present in an implementation. This test is intended to evaluate measurements in Sections 3 and 4 of [RFC2679].

(1) Configure a path with X ms one-way constant delay.

(2) Measure one-way delay with 2 or more implementations, using identical options.

(3) Configure the path with X+Y ms one-way delay.

(4) Repeat measurements.

(5) Observe that the increase measured in steps 2 and 4 is ~Y ms for each implementation. Most of the measurement errors in each system should cancel, if they are stationary.
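
The check in step (5) can be sketched as a comparison of mean delays. The function names and the tolerance parameter are hypothetical, and the sketch assumes that each implementation's measurement error is stationary, so that it cancels in the difference.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of the delay singletons of one run (milliseconds).
double mean_delay_ms(const std::vector<double>& delays_ms) {
    assert(!delays_ms.empty());
    return std::accumulate(delays_ms.begin(), delays_ms.end(), 0.0) /
           delays_ms.size();
}

// True if the increase between the step 2 run and the step 4 run is
// approximately y_ms for this implementation.
bool increase_matches(const std::vector<double>& step2_ms,
                      const std::vector<double>& step4_ms,
                      double y_ms, double tolerance_ms) {
    const double increase = mean_delay_ms(step4_ms) - mean_delay_ms(step2_ms);
    return std::fabs(increase - y_ms) <= tolerance_ms;
}
```

A constant offset in an implementation's timestamps shifts both means equally and therefore does not affect the result.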

Error Calibration, RFC 2679

This is a simple check to determine if an implementation reports the error calibration as required in Section 4.8 of [RFC2679]. Note that the context (Type-P) must also be reported.

A.2. Examples Related to Statistical Tests for One-Way Delay

A one-way delay measurement may pass an ADK test with a timestamp resolution of 1 ms. The same test may fail if timestamps with a resolution of 100 microseconds are evaluated. The implementation then conforms to the metric specification up to a timestamp resolution of 1 ms.

Let's assume another one-way delay measurement comparison between implementation 1 probing with a frequency of 2 probes per second and implementation 2 probing at a rate of 2 probes every 3 minutes. To ensure reasonable confidence in results, sample metrics are calculated from at least 5 singletons per compared time interval. This means that sample delay values are calculated for each system for identical 6-minute intervals for the duration of the whole test.

Per 6-minute interval, the sample metric is calculated from 720 singletons for implementation 1 and from 6 singletons for implementation 2. Note that if outliers are not filtered, moving averages are an option for an evaluation too. The minimum move of an averaging interval is three minutes in this example.

The data in Table 1 may result from measuring one-way delay with implementation 1 (see column Implemnt_1) and implementation 2 (see column Implemnt_2). Each data point in the table represents a (rounded) average of the sampled delay values per interval. The resolution of the clock is one microsecond. The difference in the delay values may result, e.g., from different probe packet sizes.

         +------------+------------+-----------------------------+
         | Implemnt_1 | Implemnt_2 | Implemnt_2 - Delta_Averages |
         +------------+------------+-----------------------------+
         |    5000    |    6549    |             4997            |
         |    5008    |    6555    |             5003            |
         |    5012    |    6564    |             5012            |
         |    5015    |    6565    |             5013            |
         |    5019    |    6568    |             5016            |
         |    5022    |    6570    |             5018            |
         |    5024    |    6573    |             5021            |
         |    5026    |    6575    |             5023            |
         |    5027    |    6577    |             5025            |
         |    5029    |    6580    |             5028            |
         |    5030    |    6585    |             5033            |
         |    5032    |    6586    |             5034            |
         |    5034    |    6587    |             5035            |
         |    5036    |    6588    |             5036            |
         |    5038    |    6589    |             5037            |
         |    5039    |    6591    |             5039            |
         |    5041    |    6592    |             5040            |
         |    5043    |    6599    |             5047            |
         |    5046    |    6606    |             5054            |
         |    5054    |    6612    |             5060            |
         +------------+------------+-----------------------------+

Table 1

Average values of sample metrics captured during identical time intervals are compared. This excludes random differences caused by differing probing intervals or differing temporal distance of singletons resulting from their Poisson-distributed sending times.

In the example, 20 values have been picked (note that at least 100 values are recommended for a single run of a real test). Data must be ordered by ascending rank. The data of Implemnt_1 and Implemnt_2 as shown in the first two columns of Table 1 clearly fails an ADK test with 95% confidence.

The results of Implemnt_2 are now reduced by the difference of the averages of column 2 (rounded to 6581 us) and column 1 (rounded to 5029 us), which is 1552 us. The result may be found in column 3 of Table 1. Comparing column 1 and column 3 of the table by an ADK test shows that the data contained in these columns passes an ADK test with 95% confidence.
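
The arithmetic behind column 3 of Table 1 can be reproduced with a short sketch. The function and constant names are hypothetical; the data values are those of Table 1, in microseconds.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Rounded arithmetic mean of a column (microseconds).
long rounded_mean(const std::vector<long>& column) {
    assert(!column.empty());
    const double sum = std::accumulate(column.begin(), column.end(), 0.0);
    return std::lround(sum / column.size());
}

// Subtracts the difference of the rounded column means from every value
// of the second column, which yields the third column of Table 1.
std::vector<long> shift_by_delta(const std::vector<long>& col1,
                                 const std::vector<long>& col2) {
    const long delta = rounded_mean(col2) - rounded_mean(col1);
    std::vector<long> shifted;
    shifted.reserve(col2.size());
    for (long value : col2) {
        shifted.push_back(value - delta);
    }
    return shifted;
}

// Columns 1 and 2 of Table 1.
const std::vector<long> kImplemnt1 = {
    5000, 5008, 5012, 5015, 5019, 5022, 5024, 5026, 5027, 5029,
    5030, 5032, 5034, 5036, 5038, 5039, 5041, 5043, 5046, 5054};
const std::vector<long> kImplemnt2 = {
    6549, 6555, 6564, 6565, 6568, 6570, 6573, 6575, 6577, 6580,
    6585, 6586, 6587, 6588, 6589, 6591, 6592, 6599, 6606, 6612};
```

The rounded means are 5029 us and 6581 us, so every value of column 2 is reduced by 1552 us. The shifted column is what is then compared with column 1 in the ADK test.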

Comment: Extensive averaging was used in this example because of the vastly different sampling frequencies. As a result, the distributions compared do not exactly align with a metric in [RFC2679] but illustrate the ADK process adequately.

Appendix B. Anderson-Darling K-sample Reference and 2 Sample C++ Code

There are many statistical tools available, and this appendix describes two that are familiar to the authors.

The "R tool" is a language and command-line environment for statistical computing and plotting [Rtool]. With the optional "adk" package installed [Radk], it can perform individual and combined sample ADK computations. The user must consult the package documentation and the original paper [ADK] to interpret the results, but this is as it should be.
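
For orientation, the quantity behind these tools can be sketched compactly. The function below is a minimal sketch and not the code of this appendix. It assumes the tie-adjusted two-sample form of the Anderson-Darling statistic, written with the same quantities (hj, Hj, F1j, F2j) that the comments in the appendix code use. It returns the unstandardized statistic only. Standardizing it and comparing it with a critical value is left to the tools named above. The function name ad2_statistic is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unstandardized two-sample Anderson-Darling statistic with tie handling.
// Both samples are taken by value and sorted internally.
double ad2_statistic(std::vector<double> s1, std::vector<double> s2) {
    assert(!s1.empty() && !s2.empty());
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    const double n1 = s1.size();
    const double n2 = s2.size();
    const double N = n1 + n2;

    std::size_t i1 = 0, i2 = 0;  // number of values < zj in each sample
    double sum1 = 0.0, sum2 = 0.0;
    while (i1 < s1.size() || i2 < s2.size()) {
        // zj is the smallest value of the pooled sample not yet processed.
        double z;
        if (i2 == s2.size() || (i1 < s1.size() && s1[i1] <= s2[i2])) {
            z = s1[i1];
        } else {
            z = s2[i2];
        }
        // Count the values equal to zj in each sample.
        std::size_t e1 = i1, e2 = i2;
        while (e1 < s1.size() && s1[e1] == z) ++e1;
        while (e2 < s2.size() && s2[e2] == z) ++e2;

        const double h1 = e1 - i1;
        const double h2 = e2 - i2;
        const double hj = h1 + h2;             // pooled values equal to zj
        const double Hj = i1 + i2 + hj / 2.0;  // pooled values < zj, plus hj/2
        const double F1j = i1 + h1 / 2.0;      // same quantity for sample 1
        const double F2j = i2 + h2 / 2.0;      // same quantity for sample 2
        const double denom = Hj * (N - Hj) - N * hj / 4.0;
        if (denom > 0.0) {  // skipped when all pooled values are equal
            const double t1 = N * F1j - n1 * Hj;
            const double t2 = N * F2j - n2 * Hj;
            sum1 += hj * t1 * t1 / denom;
            sum2 += hj * t2 * t2 / denom;
        }
        i1 = e1;
        i2 = e2;
    }
    return (N - 1.0) / N * (sum1 / n1 + sum2 / n2);
}
```

Identical samples give a statistic of zero, and the value grows as the two empirical distributions move apart.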

The C++ code below will perform an AD2-sample comparison when compiled and presented with two column vectors in a file (using white space as separation). This version contains modifications made by Wes Eddy in Sept 2011 to use the vectors and run as a stand-alone module. The status of the comparison can be checked on the command line with "$ echo $?" or the last line can be replaced with a printf statement for adk_result instead.

  /*
   * Redistribution and use in source and binary forms, with or without
   * modification, is permitted pursuant to, and subject to the license
   * terms contained in, the Simplified BSD License set forth in
   * Section 4.c of the IETF Trust's Legal Provisions Relating to IETF
   * Documents (http://trustee.ietf.org/license-info).
   */

  /* Routines for computing the Anderson-Darling 2 sample
  * test statistic.
  *
  * Implemented based on the description in
  * "Anderson-Darling K Sample Test" Heckert, Alan and
  * Filliben, James, editors, Dataplot Reference Manual,
  * Chapter 15 Auxiliary, NIST, 2004.
  * Official Reference by 2010
  * Heckert, N. A. (2001).  Dataplot website at the
  * National Institute of Standards and Technology:
  * http://www.itl.nist.gov/div898/software/dataplot.html/
  * June 2001.
 */

 #include <iostream>
 #include <fstream>
 #include <vector>
 #include <sstream>

using namespace std;

 int main() {
    vector<double> vec1, vec2;
    double adk_result;
    static int k, val_st_z_samp1, val_st_z_samp2,
               val_eq_z_samp1, val_eq_z_samp2,
               j, n_total, n_sample1, n_sample2, L,
               max_number_samples, line, maxnumber_z;
    static int column_1, column_2;
    static double adk, n_value, z, sum_adk_samp1,
                  sum_adk_samp2, z_aux;
    static double H_j, F1j, hj, F2j, denom_1_aux, denom_2_aux;
    static bool next_z_sample2, equal_z_both_samples;
    static int stop_loop1, stop_loop2, stop_loop3,old_eq_line2,
               old_eq_line1;

    static double adk_criterium = 1.993;

    /* vec1 and vec2 to be initialized with sample 1 and
     * sample 2 values in ascending order */
    while (!cin.eof()) {
       double f1, f2;
       cin >> f1;
       cin >> f2;
       vec1.push_back(f1);
       vec2.push_back(f2);

}

    k = 2;
    n_sample1 = vec1.size() - 1;
    n_sample2 = vec2.size() - 1;

    // -1 because vec[0] is a dummy value
    n_total = n_sample1 + n_sample2;

    /* value equal to the line with a value = zj in sample 1.
     * Here j=1, so the line is 1.
     */
    val_eq_z_samp1 = 1;

    /* value equal to the line with a value = zj in sample 2.
     * Here j=1, so the line is 1.
     */
    val_eq_z_samp2 = 1;

    /* value equal to the last line with a value < zj
     * in sample 1.  Here j=1, so the line is 0.
     */
    val_st_z_samp1 = 0;

    /* value equal to the last line with a value < zj
     * in sample 2.  Here j=1, so the line is 0.
     */
    val_st_z_samp2 = 0;

    sum_adk_samp1 = 0;
    sum_adk_samp2 = 0;
    j = 1;

    // as mentioned above, j=1
    equal_z_both_samples = false;

    next_z_sample2 = false;

    //assuming the next z to be of sample 1
    stop_loop1 = n_sample1 + 1;

    // + 1 because vec[0] is a dummy, see n_sample1 declaration
    stop_loop2 = n_sample2 + 1;
    stop_loop3 = n_total + 1;

    /* The required z values are calculated until all values
     * of both samples have been taken into account.  See the
     * lines above for the stoploop values.  Construct required
     * to avoid a mathematical operation in the while condition.
     */
    while (((stop_loop1 > val_eq_z_samp1)
           || (stop_loop2 > val_eq_z_samp2)) && stop_loop3 > j)
    {
      if(val_eq_z_samp1 < n_sample1+1)
      {
     /* here, a preliminary zj value is set.
      * See below how to calculate the actual zj.
      */
            z = vec1[val_eq_z_samp1];

     /* this while sequence calculates the number of values
      * equal to z.
      */
            while ((val_eq_z_samp1+1 < n_sample1)
                    && z == vec1[val_eq_z_samp1+1] )
                    {
                    val_eq_z_samp1++;
                    }
            }
            else
            {
            val_eq_z_samp1 = 0;
            val_st_z_samp1 = n_sample1;

    // this should be val_eq_z_samp1 - 1 = n_sample1
            }

    if(val_eq_z_samp2 < n_sample2+1)
            {
            z_aux = vec2[val_eq_z_samp2];

    /* this while sequence calculates the number of values
     * equal to z_aux
     */

            while ((val_eq_z_samp2+1 < n_sample2)
                    && z_aux == vec2[val_eq_z_samp2+1] )
                    {
                    val_eq_z_samp2++;
                    }

    /* the smaller of the two actual data values is picked
     * as the next zj.
     */

             if(z > z_aux)

                    {
                    z = z_aux;
                    next_z_sample2 = true;
                    }
             else
                    {
                    if (z == z_aux)
                    {
                    equal_z_both_samples = true;
                    }

    /* This is the case if the last value of column1 is
     * smaller than the remaining values of column2.
     */
                   if (val_eq_z_samp1 == 0)
                    {
                    z = z_aux;
                    next_z_sample2 = true;
                    }
                }
            }
           else
              {
            val_eq_z_samp2 = 0;
            val_st_z_samp2 = n_sample2;

    // this should be val_eq_z_samp2 - 1 = n_sample2

}

     /* in the following, sum j = 1 to L is calculated for
      * sample 1 and sample 2.
      */
           if (equal_z_both_samples)
              {

              /* hj is the number of values in the combined sample
               * equal to zj
               */
                   hj = val_eq_z_samp1 - val_st_z_samp1
                  + val_eq_z_samp2 - val_st_z_samp2;

              /* H_j is the number of values in the combined sample
               * smaller than zj plus one half the number of
               * values in the combined sample equal to zj
               * (that's hj/2).
               */
                  H_j = val_st_z_samp1 + val_st_z_samp2

                      + hj / 2;

              /* F1j is the number of values in the 1st sample
               * that are less than zj plus one half the number
               * of values in this sample that are equal to zj.
               */

                  F1j = val_st_z_samp1 + (double)
                      (val_eq_z_samp1 - val_st_z_samp1) / 2;

              /* F2j is the number of values in the 2nd sample
               * that are less than zj plus one half the number
               * of values in this sample that are equal to zj.
               */
                  F2j = val_st_z_samp2 + (double)
                     (val_eq_z_samp2 - val_st_z_samp2) / 2;

              /* set the line of values equal to zj to the
               * actual line of the last value picked for zj.
               */
                  val_st_z_samp1 = val_eq_z_samp1;

              /* Set the line of values equal to zj to the actual
               * line of the last value picked for zj of each
               * sample.  This is required as data smaller than zj
               * is accounted differently than values equal to zj.
               */
                  val_st_z_samp2 = val_eq_z_samp2;

              /* next the lines of the next values z, i.e., zj+1
               * are addressed.
               */
                val_eq_z_samp1++;

              /* next the lines of the next values z, i.e.,
               * zj+1 are addressed
               */
                  val_eq_z_samp2++;
                  }
           else
                  {

              /* the smaller z value was contained in sample 2;
               * hence, this value is the zj to base the following
               * calculations on.
               */
                            if (next_z_sample2)
                            {

              /* hj is the number of values in the combined
               * sample equal to zj; in this case, these are
               * within sample 2 only.
               */
                            hj = val_eq_z_samp2 - val_st_z_samp2;

              /* H_j is the number of values in the combined sample
               * smaller than zj plus one half the number of
               * values in the combined sample equal to zj
               * (that's hj/2).
               */
                                H_j = val_st_z_samp1 + val_st_z_samp2
                              + hj / 2;

              /* F1j is the number of values in the 1st sample that
               * are less than zj plus one half the number of values in
               * this sample that are equal to zj.
               * As val_eq_z_samp2 < val_eq_z_samp1, these are the
               * val_st_z_samp1 only.
               */
                            F1j = val_st_z_samp1;

              /* F2j is the number of values in the 2nd sample that
               * are less than zj plus one half the number of values in
               * this sample that are equal to zj.  The latter are from
               * sample 2 only in this case.
               */

                    F2j = val_st_z_samp2 + (double)
                         (val_eq_z_samp2 - val_st_z_samp2) / 2;

              /* Set the line of values equal to zj to the actual line
               * of the last value picked for zj of sample 2 only in
               * this case.
               */
                                val_st_z_samp2 = val_eq_z_samp2;

              /* next the line of the next value z, i.e., zj+1 is
               * addressed.  Here, only sample 2 must be addressed.
               */

                    val_eq_z_samp2++;
                                    if (val_eq_z_samp1 == 0)
                                    {
                                    val_eq_z_samp1 = stop_loop1;
                                    }
                            }

    /* the smaller z value was contained in sample 1;
     * hence, this value is the zj to base the following
     * calculations on.
     */
                  else {

    /* hj is the number of values in the combined
     * sample equal to zj; in this case, these are
     * within sample 1 only.
     */
                  hj = val_eq_z_samp1 - val_st_z_samp1;

    /* H_j is the number of values in the combined
     * sample smaller than zj plus one half the number
     * of values in the combined sample equal to zj
     * (that's hj/2).
     */

          H_j = val_st_z_samp1 + val_st_z_samp2
                + hj / 2;

    /* F1j is the number of values in the 1st sample that
     * are less than zj plus one half the number of values
     * in this sample that are equal to zj.  The latter are
     * from sample 1 only in this case.
     */

          F1j = val_st_z_samp1 + (double)
               (val_eq_z_samp1 - val_st_z_samp1) / 2;

    /* F2j is the number of values in the 2nd sample that
     * are less than zj plus one half the number of values
     * in this sample that are equal to zj.  As
     * val_eq_z_samp1 < val_eq_z_samp2, these are the
     * val_st_z_samp2 only.
     */

          F2j = val_st_z_samp2;

    /* Set the line of values equal to zj to the actual line
     * of the last value picked for zj of sample 1 only in
     * this case.
     */

          val_st_z_samp1 = val_eq_z_samp1;

    /* next the line of the next value z, i.e., zj+1 is
     * addressed.  Here, only sample 1 must be addressed.
     */
                  val_eq_z_samp1++;

                  if (val_eq_z_samp2 == 0)
                          {
                          val_eq_z_samp2 = stop_loop2;
                          }
                  }
                  }

            denom_1_aux = n_total * F1j - n_sample1 * H_j;
            denom_2_aux = n_total * F2j - n_sample2 * H_j;

            sum_adk_samp1 = sum_adk_samp1 + hj
                    * (denom_1_aux * denom_1_aux) /
                      (H_j * (n_total - H_j)
                       - n_total * hj / 4);
            sum_adk_samp2 = sum_adk_samp2 + hj
                    * (denom_2_aux * denom_2_aux) /
                      (H_j * (n_total - H_j)
                       - n_total * hj / 4);

            next_z_sample2 = false;
            equal_z_both_samples = false;

    /* index to count the z.  It is only required to prevent
     * the while loop from executing endlessly
     */
            j++;
            }

    // calculating the adk value is the final step.
    adk_result = (double) (n_total - 1) / (n_total
           * n_total * (k - 1))
            * (sum_adk_samp1 / n_sample1
            + sum_adk_samp2 / n_sample2);

    /* if(adk_result <= adk_criterium)
     * adk_2_sample test is passed
     */
    return adk_result <= adk_criterium;
 }

Appendix C. Glossary

   +-------------+-----------------------------------------------------+
   | ADK         | Anderson-Darling K-Sample test, a test used to      |
   |             | check whether two samples have the same statistical |
   |             | distribution.                                       |
   | ECMP        | Equal Cost Multipath, a load-balancing mechanism    |
   |             | evaluating MPLS label stacks, IP addresses, and     |
   |             | ports.                                              |
   | EDF         | The "empirical distribution function" of a set of   |
   |             | scalar measurements is a function F(x), which for   |
   |             | any x gives the fractional proportion of the total  |
   |             | measurements that were smaller than or equal to x.  |
   | Metric      | A measured quantity related to the performance and  |
   |             | reliability of the Internet, expressed by a value.  |
   |             | This could be a singleton (single value), a sample  |
   |             | of single values, or a statistic based on a sample  |
   |             | of singletons.                                      |
   | OWAMP       | One-Way Active Measurement Protocol, a protocol for |
   |             | communication between IPPM measurement systems      |
   |             | specified by IPPM.                                  |
   | OWD         | One-Way Delay, a performance metric specified by    |
   |             | IPPM.                                               |
   | Sample      | A sample metric is derived from a given singleton   |
   | metric      | metric by evaluating a number of distinct instances |
   |             | together.                                           |
   | Singleton   | A singleton metric is, in a sense, one atomic       |
   | metric      | measurement of this metric.                         |
   | Statistical | A 'statistical' metric is derived from a given      |
   | metric      | sample metric by computing some statistic of the    |
   |             | values defined by the singleton metric on the       |
   |             | sample.                                             |
   | TWAMP       | Two-Way Active Measurement Protocol, a protocol for |
   |             | communication between IPPM measurement systems      |
   |             | specified by IPPM.                                  |
   +-------------+-----------------------------------------------------+
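
The EDF entry in the glossary above can be illustrated with a few lines of code. This is a minimal sketch, assuming the measurements are already sorted in ascending order; the function name edf is hypothetical and is not used elsewhere in this document.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: F(x) = fraction of measurements that are
// smaller than or equal to x.  'sorted' must be in ascending order.
double edf(const std::vector<double>& sorted, double x) {
    if (sorted.empty()) return 0.0;  // assumption: empty set yields 0
    // upper_bound returns the first element strictly greater than x,
    // so its offset is the count of values <= x.
    auto it = std::upper_bound(sorted.begin(), sorted.end(), x);
    return static_cast<double>(it - sorted.begin())
           / static_cast<double>(sorted.size());
}
```

With the measurements {1, 2, 2, 4}, F(2) is 0.75 because three of the four values are smaller than or equal to 2.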

Authors' Addresses

   Ruediger Geib (editor)
   Deutsche Telekom
   Heinrich Hertz Str. 3-7
   Darmstadt 64295
   Germany

   Phone: +49 6151 58 12747
   EMail: Ruediger.Geib@telekom.de

   Al Morton
   AT&T Labs
   200 Laurel Avenue South
   Middletown, NJ 07748
   USA

   Phone: +1 732 420 1571
   Fax:   +1 732 368 1192
   EMail: acmorton@att.com
   URI:   http://home.comcast.net/~acmacm/

   Reza Fardid
   Cariden Technologies
   888 Villa Street, Suite 500
   Mountain View, CA 94041
   USA

   Phone:
   EMail: rfardid@cariden.com

   Alexander Steinmitz
   Deutsche Telekom
   Memmelsdorfer Str. 209b
   Bamberg 96052
   Germany

   Phone:
   EMail: Alexander.Steinmitz@telekom.de