Network Working Group                                           T. Zseby
Request for Comments: 5475                              Fraunhofer FOKUS
Category: Standards Track                                      M. Molina
                                                                   DANTE
                                                             N. Duffield
                                                    AT&T Labs - Research
                                                            S. Niccolini
                                                         NEC Europe Ltd.
                                                              F. Raspall
                                                                EPSC-UPC
                                                              March 2009
        
Network Working Group                                           T. Zseby
Request for Comments: 5475                              Fraunhofer FOKUS
Category: Standards Track                                      M. Molina
                                                                   DANTE
                                                             N. Duffield
                                                    AT&T Labs - Research
                                                            S. Niccolini
                                                         NEC Europe Ltd.
                                                              F. Raspall
                                                                EPSC-UPC
                                                              March 2009
        

Sampling and Filtering Techniques for IP Packet Selection

IP包选择的采样和过滤技术

Status of This Memo

关于下段备忘

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

本文件规定了互联网社区的互联网标准跟踪协议,并要求进行讨论和提出改进建议。有关本协议的标准化状态和状态,请参考当前版本的“互联网官方协议标准”(STD 1)。本备忘录的分发不受限制。

Copyright Notice

版权公告

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

版权所有(c)2009 IETF信托基金和确定为文件作者的人员。版权所有。

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

本文件受BCP 78和IETF信托在本文件出版之日生效的与IETF文件有关的法律规定的约束(http://trustee.ietf.org/license-info). 请仔细阅读这些文件,因为它们描述了您对本文件的权利和限制。

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

本文件可能包含2008年11月10日之前发布或公开的IETF文件或IETF贡献中的材料。控制某些材料版权的人员可能未授予IETF信托允许在IETF标准流程之外修改此类材料的权利。在未从控制此类材料版权的人员处获得充分许可的情况下,不得在IETF标准流程之外修改本文件,也不得在IETF标准流程之外创建其衍生作品,除了将其格式化以RFC形式发布或将其翻译成英语以外的其他语言。

Abstract

摘要

This document describes Sampling and Filtering techniques for IP packet selection. It provides a categorization of schemes and defines what parameters are needed to describe the most common selection schemes. Furthermore, it shows how techniques can be combined to build more elaborate packet Selectors. The document provides the basis for the definition of information models for configuring selection techniques in Metering Processes and for reporting the technique in use to a Collector.

本文档描述了IP数据包选择的采样和过滤技术。它提供了方案的分类,并定义了描述最常见的选择方案所需的参数。此外,它还展示了如何结合各种技术来构建更精细的数据包选择器。本文档为定义信息模型提供了基础,用于在计量过程中配置选择技术,并向收集器报告正在使用的技术。

Table of Contents

目录

   1. Introduction ....................................................3
      1.1. Conventions Used in This Document ..........................4
   2. PSAMP Documents Overview ........................................4
   3. Terminology .....................................................4
      3.1. Observation Points, Packet Streams, and Packet Content .....4
      3.2. Selection Process ..........................................5
      3.3. Reporting ..................................................7
      3.4. Metering Process ...........................................7
      3.5. Exporting Process ..........................................8
      3.6. PSAMP Device ...............................................8
      3.7. Collector ..................................................8
      3.8. Selection Methods ..........................................8
   4. Categorization of Packet Selection Techniques ..................11
   5. Sampling .......................................................12
      5.1. Systematic Sampling .......................................13
      5.2. Random Sampling ...........................................14
           5.2.1. n-out-of-N Sampling ................................14
           5.2.2. Probabilistic Sampling .............................14
   6. Filtering ......................................................16
      6.1. Property Match Filtering ..................................16
      6.2. Hash-Based Filtering ......................................19
           6.2.1. Application Examples for Coordinated Packet
                  Selection ..........................................19
           6.2.2. Desired Properties of Hash Functions ...............21
           6.2.3. Security Considerations for Hash Functions .........22
           6.2.4. Choice of Hash Function ............................26
   7. Parameters for the Description of Selection Techniques .........29
      7.1. Description of Sampling Techniques ........................30
      7.2. Description of Filtering Techniques .......................31
   8. Composite Techniques ...........................................34
      8.1. Cascaded Filtering->Sampling or Sampling->Filtering .......34
      8.2. Stratified Sampling .......................................34
   9. Security Considerations ........................................35
   10. Contributors ..................................................36
   11. Acknowledgments ...............................................36
        
   1. Introduction ....................................................3
      1.1. Conventions Used in This Document ..........................4
   2. PSAMP Documents Overview ........................................4
   3. Terminology .....................................................4
      3.1. Observation Points, Packet Streams, and Packet Content .....4
      3.2. Selection Process ..........................................5
      3.3. Reporting ..................................................7
      3.4. Metering Process ...........................................7
      3.5. Exporting Process ..........................................8
      3.6. PSAMP Device ...............................................8
      3.7. Collector ..................................................8
      3.8. Selection Methods ..........................................8
   4. Categorization of Packet Selection Techniques ..................11
   5. Sampling .......................................................12
      5.1. Systematic Sampling .......................................13
      5.2. Random Sampling ...........................................14
           5.2.1. n-out-of-N Sampling ................................14
           5.2.2. Probabilistic Sampling .............................14
   6. Filtering ......................................................16
      6.1. Property Match Filtering ..................................16
      6.2. Hash-Based Filtering ......................................19
           6.2.1. Application Examples for Coordinated Packet
                  Selection ..........................................19
           6.2.2. Desired Properties of Hash Functions ...............21
           6.2.3. Security Considerations for Hash Functions .........22
           6.2.4. Choice of Hash Function ............................26
   7. Parameters for the Description of Selection Techniques .........29
      7.1. Description of Sampling Techniques ........................30
      7.2. Description of Filtering Techniques .......................31
   8. Composite Techniques ...........................................34
      8.1. Cascaded Filtering->Sampling or Sampling->Filtering .......34
      8.2. Stratified Sampling .......................................34
   9. Security Considerations ........................................35
   10. Contributors ..................................................36
   11. Acknowledgments ...............................................36
        
   12. References ....................................................36
      12.1. Normative References .....................................36
      12.2. Informative References ...................................36
   Appendix A. Hash Functions ........................................40
   A.1 IP Shift-XOR (IPSX) Hash Function..............................40
   A.2 BOB Hash Function..............................................41
        
   12. References ....................................................36
      12.1. Normative References .....................................36
      12.2. Informative References ...................................36
   Appendix A. Hash Functions ........................................40
   A.1 IP Shift-XOR (IPSX) Hash Function..............................40
   A.2 BOB Hash Function..............................................41
        
1. Introduction
1. 介绍

There are two main drivers for the evolution in measurement infrastructures and their underlying technology. First, network data rates are increasing, with a concomitant growth in measurement data. Second, the growth is compounded by the demand of measurement-based applications for increasingly fine-grained traffic measurements. Devices that perform the measurements, require increasingly sophisticated and resource-intensive measurement capabilities, including the capture of packet headers or even parts of the payload, and classification for flow analysis. All these factors can lead to an overwhelming amount of measurement data, resulting in high demands on resources for measurement, storage, transfer, and post processing.

测量基础设施及其基础技术的发展有两个主要驱动因素。首先,网络数据速率在增加,同时测量数据也在增长。其次,基于测量的应用程序对越来越细粒度的流量测量的需求加剧了这种增长。执行测量的设备需要越来越复杂和资源密集型的测量能力,包括捕获数据包头甚至有效负载的一部分,以及流分析的分类。所有这些因素都会导致大量的测量数据,导致对测量、存储、传输和后处理资源的高需求。

The sustained capture of network traffic at line rate can be performed by specialized measurement hardware. However, the cost of the hardware and the measurement infrastructure required to accommodate the measurements preclude this as a ubiquitous approach. Instead, some form of data reduction at the point of measurement is necessary.

以线路速率持续捕获网络流量可由专用测量硬件执行。然而,硬件成本和测量所需的测量基础设施,以适应测量排除了这一普遍存在的方法。相反,需要在测量点进行某种形式的数据简化。

This can be achieved by an intelligent packet selection through Sampling or Filtering. Another way to reduce the amount of data is to use aggregation techniques (not addressed in this document). The motivation for Sampling is to select a representative subset of packets that allow accurate estimates of properties of the unsampled traffic to be formed. The motivation for Filtering is to remove all packets that are not of interest. Aggregation combines data and allows compact pre-defined views of the traffic. Examples of applications that benefit from packet selection are given in [RFC5474]. Aggregation techniques are out of scope of this document.

这可以通过采样或过滤的智能数据包选择来实现。减少数据量的另一种方法是使用聚合技术(本文档中未介绍)。采样的动机是选择具有代表性的数据包子集,以便准确估计待形成的未采样流量的特性。过滤的动机是删除所有不感兴趣的数据包。聚合结合了数据,并允许对流量进行紧凑的预定义视图。[RFC5474]中给出了受益于数据包选择的应用程序示例。聚合技术超出了本文档的范围。

1.1. Conventions Used in This Document
1.1. 本文件中使用的公约

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

本文件中的关键词“必须”、“不得”、“要求”、“应”、“不应”、“应”、“不应”、“建议”、“可”和“可选”应按照RFC 2119[RFC2119]中所述进行解释。

2. PSAMP Documents Overview
2. PSAMP文档概述

This document is one out of a series of documents from the PSAMP group.

本文档是PSAMP组的一系列文档之一。

[RFC5474]: "A Framework for Packet Selection and Reporting" describes the PSAMP framework for network elements to select subsets of packets by statistical and other methods, and to export a stream of reports on the selected packets to a Collector.

[RFC5474]:“数据包选择和报告框架”描述了PSAMP框架,用于网络元件通过统计和其他方法选择数据包子集,并将所选数据包的报告流导出到收集器。

RFC 5475 (this document): "Sampling and Filtering Techniques for IP Packet Selection" describes the set of packet selection techniques supported by PSAMP.

RFC 5475(本文档):“IP数据包选择的采样和过滤技术”描述了PSAMP支持的数据包选择技术集。

[RFC5476]: "Packet Sampling (PSAMP) Protocol Specifications" specifies the export of packet information from a PSAMP Exporting Process to a PSAMP Collecting Process.

[RFC5476]:“数据包采样(PSAMP)协议规范”指定将数据包信息从PSAMP导出进程导出到PSAMP收集进程。

[RFC5477]: "Information Model for Packet Sampling Exports" defines an information and data model for PSAMP.

[RFC5477]:“数据包采样导出的信息模型”定义了PSAMP的信息和数据模型。

3. Terminology
3. 术语

The PSAMP terminology defined here is fully consistent with all terms listed in [RFC5474] but includes additional terms required for the description of packet selection methods. An architecture overview and possible configurations of PSAMP elements can be found in [RFC5474]. PSAMP terminology also aims at consistency with terms used in [RFC3917]. The relationship between PSAMP and IPFIX terms is described in [RFC5474].

此处定义的PSAMP术语与[RFC5474]中列出的所有术语完全一致,但包括描述数据包选择方法所需的附加术语。可在[RFC5474]中找到PSAMP元素的体系结构概述和可能的配置。PSAMP术语还旨在与[RFC3917]中使用的术语保持一致。[RFC5474]中描述了PSAMP和IPFIX术语之间的关系。

In the PSAMP documents, all defined PSAMP terms are written capitalized. This document uses the same convention.

在PSAMP文档中,所有定义的PSAMP术语都大写。本文件使用相同的约定。

3.1. Observation Points, Packet Streams, and Packet Content
3.1. 观察点、数据包流和数据包内容

* Observation Point

* 观测点

An Observation Point [RFC5101] is a location in the network where packets can be observed. Examples include:

观察点[RFC5101]是网络中可以观察数据包的位置。例子包括:

(i) A line to which a probe is attached;

(i) 连接探针的线路;

(ii) a shared medium, such as an Ethernet-based LAN;

(ii)共享介质,例如基于以太网的LAN;

(iii) a single port of a router, or set of interfaces (physical or logical) of a router;

(iii)路由器的单个端口,或路由器的一组接口(物理或逻辑);

(iv) an embedded measurement subsystem within an interface.

(iv)接口内的嵌入式测量子系统。

Note that one Observation Point may be a superset of several other Observation Points. For example, one Observation Point can be an entire line card. This would be the superset of the individual Observation Points at the line card's interfaces.

请注意,一个观测点可能是其他几个观测点的超集。例如,一个观察点可以是一张完整的线卡。这将是线路卡接口处单个观测点的超集。

* Observed Packet Stream

* 观测数据包流

The Observed Packet Stream is the set of all packets observed at the Observation Point.

观察到的分组流是在观察点处观察到的所有分组的集合。

* Packet Stream

* 包流

A Packet Stream denotes a set of packets from the Observed Packet Stream that flows past some specified point within the Metering Process. An example of a Packet Stream is the output of the selection process. Note that packets selected from a stream, e.g., by Sampling, do not necessarily possess a property by which they can be distinguished from packets that have not been selected. For this reason, the term "stream" is favored over "flow", which is defined as a set of packets with common properties [RFC3917].

数据包流表示来自观测数据包流的一组数据包,这些数据包流经过计量过程中的某个指定点。分组流的一个示例是选择过程的输出。注意,例如通过采样从流中选择的分组不一定具有可通过其与未选择的分组区分的属性。因此,术语“流”优于“流”,后者被定义为具有公共属性的一组数据包[RFC3917]。

* Packet Content

* 数据包内容

The Packet Content denotes the union of the packet header (which includes link layer, network layer, and other encapsulation headers) and the packet payload. At some Observation Points, the link header information may not be available.

分组内容表示分组报头(包括链路层、网络层和其他封装报头)和分组有效载荷的联合。在某些观测点,链路头信息可能不可用。

3.2. Selection Process
3.2. 选择过程

* Selection Process

* 选择过程

A Selection Process takes the Observed Packet Stream as its input and selects a subset of that stream as its output.

选择过程将观察到的分组流作为其输入,并选择该流的子集作为其输出。

* Selection State

* 选择状态

A Selection Process may maintain state information for use by the Selection Process. At a given time, the Selection State may depend on packets observed at and before that time, and other variables. Examples include:

选择过程可以维护状态信息以供选择过程使用。在给定时间,选择状态可能取决于在该时间和之前观察到的数据包以及其他变量。例子包括:

(i) sequence numbers of packets at the input of Selectors;

(i) 选择器输入端的数据包序列号;

(ii) a timestamp of observation of the packet at the Observation Point;

(ii)在观察点观察分组的时间戳;

(iii) iterators for pseudorandom number generators;

(iii)伪随机数生成器的迭代器;

(iv) hash values calculated during selection;

(iv)选择期间计算的哈希值;

(v) indicators of whether the packet was selected by a given Selector.

(v) 指示数据包是否由给定选择器选择的指示器。

Selection Processes may change portions of the Selection State as a result of processing a packet. Selection State for a packet is to reflect the state after processing the packet.

选择过程可作为处理分组的结果而改变选择状态的部分。数据包的选择状态反映处理数据包后的状态。

* Selector

* 选择器

A Selector defines what kind of action a Selection Process performs on a single packet of its input. If selected, the packet becomes an element of the output Packet Stream.

选择器定义选择过程对其输入的单个数据包执行的操作类型。如果选择,则数据包成为输出数据包流的一个元素。

The Selector can make use of the following information in determining whether a packet is selected:

选择器可以利用以下信息来确定是否选择了分组:

(i) the Packet Content;

(i) 分组内容;

(ii) information derived from the packet's treatment at the Observation Point;

(ii)从观察点的数据包处理中获得的信息;

(iii) any Selection State that may be maintained by the Selection Process.

(iii)选择过程可能维持的任何选择状态。

* Composite Selector

* 复合选择器

A Composite Selector is an ordered composition of Selectors, in which the output Packet Stream issuing from one Selector forms the input Packet Stream to the succeeding Selector.

复合选择器是选择器的有序组合,其中从一个选择器发出的输出数据包流形成到后续选择器的输入数据包流。

* Primitive Selector

* 基本选择器

A Selector is primitive if it is not a Composite Selector.

如果选择器不是复合选择器,则它是基本选择器。

* Selection Sequence

* 选择序列

From all the packets observed at an Observation Point, only a few packets are selected by one or more Selectors. The Selection Sequence is a unique value per Observation Domain describing the Observation Point and the Selector IDs through which the packets are selected.

从一个观察点观察到的所有数据包中,只有少数数据包被一个或多个选择器选择。选择序列是每个观察域的唯一值,用于描述观察点和选择数据包的选择器ID。

3.3. Reporting
3.3. 报告

* Packet Reports

* 数据包报告

Packet Reports comprise a configurable subset of a packet's input to the Selection Process, including the Packet's Content, information relating to its treatment (for example, the output interface), and its associated Selection State (for example, a hash of the Packet's Content).

分组报告包括分组对选择过程的输入的可配置子集,包括分组的内容、与其处理相关的信息(例如,输出接口)及其关联的选择状态(例如,分组内容的散列)。

* Report Interpretation

* 报告解释

Report Interpretation comprises subsidiary information, relating to one or more packets, that is used for interpretation of their Packet Reports. Examples include configuration parameters of the Selection Process.

报告解释包括用于解释其数据包报告的与一个或多个数据包相关的辅助信息。示例包括选择过程的配置参数。

* Report Stream

* 报告流

The Report Stream is the output of a Metering Process, comprising two distinguished types of information: Packet Reports and Report Interpretation.

报告流是计量过程的输出,包括两种不同类型的信息:数据包报告和报告解释。

3.4. Metering Process
3.4. 计量过程

A Metering Process selects packets from the Observed Packet Stream using a Selection Process, and produces as output a Report Stream concerning the selected packets.

计量处理使用选择处理从观察到的分组流中选择分组,并产生关于所选分组的报告流作为输出。

The PSAMP Metering Process can be viewed as analogous to the IPFIX Metering Process [RFC5101], which produces Flow Records as its output, with the difference that the PSAMP Metering Process always contains a Selection Process. The relationship between PSAMP and IPFIX is further described in [RFC5477] and [RFC5474].

PSAMP计量过程可被视为类似于IPFIX计量过程[RFC5101],该过程产生流量记录作为其输出,不同之处在于PSAMP计量过程始终包含一个选择过程。[RFC5477]和[RFC5474]中进一步描述了PSAMP和IPFIX之间的关系。

3.5. Exporting Process
3.5. 导出过程

* Exporting Process

* 导出过程

An Exporting Process sends, in the form of Export Packets, the output of one or more Metering Processes to one or more Collectors.

导出过程以导出包的形式将一个或多个计量过程的输出发送到一个或多个收集器。

* Export Packet

* 导出包

An Export Packet is a combination of Report Interpretations and/or one or more Packet Reports that are bundled by the Exporting Process into an Export Packet for exporting to a Collector.

导出数据包是报告解释和/或一个或多个数据包报告的组合,这些数据包由导出过程捆绑到导出数据包中,以导出到收集器。

3.6. PSAMP Device
3.6. PSAMP装置

* PSAMP Device

* PSAMP装置

A PSAMP Device is a device hosting at least an Observation Point, a Metering Process (which includes a Selection Process), and an Exporting Process. Typically, corresponding Observation Point(s), Metering Process(es), and Exporting Process(es) are colocated at this device, for example, at a router.

PSAMP设备是至少承载观测点、计量过程(包括选择过程)和导出过程的设备。通常,相应的观测点、计量过程和导出过程在该设备上(例如,在路由器上)共同定位。

3.7. Collector
3.7. 收藏家

* Collector

* 收藏家

A Collector receives a Report Stream exported by one or more Exporting Processes. In some cases, the host of the Metering and/or Exporting Processes may also serve as the Collector.

收集器接收由一个或多个导出进程导出的报告流。在某些情况下,计量和/或导出过程的主机也可以用作收集器。

3.8. Selection Methods
3.8. 选择方法

* Filtering

* 过滤

A filter is a Selector that selects a packet deterministically based on the Packet Content, or its treatment, or functions of these occurring in the Selection State. Two examples are:

过滤器是一个选择器,它基于包内容或其处理或选择状态中出现的这些内容或处理的功能来确定地选择包。两个例子是:

(i) Property Match Filtering: A packet is selected if a specific field in the packet equals a predefined value.

(i) 属性匹配筛选:如果数据包中的特定字段等于预定义值,则选择数据包。

(ii) Hash-based Selection: A Hash Function is applied to the Packet Content, and the packet is selected if the result falls in a specified range.

(ii)基于散列的选择:对数据包内容应用散列函数,如果结果在指定范围内,则选择数据包。

* Sampling

* 取样

A Selector that is not a filter is called a Sampling operation. This reflects the intuitive notion that if the selection of a packet cannot be determined from its content alone, there must be some type of Sampling taking place. Sampling operations can be divided into two subtypes:

不是过滤器的选择器称为采样操作。这反映了一个直观的概念,即如果不能仅从数据包的内容来确定数据包的选择,则必须进行某种类型的采样。采样操作可分为两个子类型:

(i) Content-independent Sampling, which does not use Packet Content in reaching Sampling decisions. Examples include systematic Sampling, and uniform pseudorandom Sampling driven by a pseudorandom number whose generation is independent of Packet Content. Note that in content-independent Sampling, it is not necessary to access the Packet Content in order to make the selection decision.

(i) 与内容无关的采样,在做出采样决定时不使用数据包内容。示例包括系统采样和由伪随机数驱动的均匀伪随机采样,伪随机数的生成与数据包内容无关。注意,在与内容无关的采样中,不必为了做出选择决策而访问分组内容。

(ii) Content-dependent Sampling, in which the Packet Content is used in reaching selection decisions. An application is pseudorandom selection according to a probability that depends on the contents of a packet field, e.g., Sampling packets with a probability dependent on their TCP/UDP port numbers. Note that this is not a Filter.

(ii)与内容相关的抽样,其中分组内容用于达成选择决策。应用程序是根据取决于数据包字段内容的概率进行的伪随机选择,例如,采样数据包的概率取决于其TCP/UDP端口号。请注意,这不是一个过滤器。

* Hash Domain

* 散列域

A Hash Domain is a subset of the Packet Content and the packet treatment, viewed as an N-bit string for some positive integer N.

散列域是数据包内容和数据包处理的子集,被视为某个正整数N的N位字符串。

* Hash Range

* 散列范围

A Hash Range is a set of M-bit strings for some positive integer M that defines the range of values that the result of the hash operation can take.

散列范围是某个正整数M的一组M位字符串,它定义了散列操作结果可以采用的值的范围。

* Hash Function

* 散列函数

A Hash Function defines a deterministic mapping from the Hash Domain into the Hash Range.

散列函数定义从散列域到散列范围的确定性映射。

* Hash Selection Range

* 散列选择范围

A Hash Selection Range is a subset of the Hash Range. The packet is selected if the action of the Hash Function on the Hash Domain for the packet yields a result in the Hash Selection Range.

哈希选择范围是哈希范围的子集。如果对分组的哈希域的哈希函数的操作产生哈希选择范围内的结果,则选择分组。

* Hash-based Selection

* 基于散列的选择

A Hash-based Selection is Filtering specified by a Hash Domain, a Hash Function, a Hash Range, and a Hash Selection Range.

基于散列的选择是由散列域、散列函数、散列范围和散列选择范围指定的过滤。

* Approximative Selection

* 近似选择

Selectors in any of the above categories may be approximated by operations in the same or another category for the purposes of implementation. For example, uniform pseudorandom Sampling may be approximated by Hash-based Selection, using a suitable Hash Function and Hash Domain. In this case, the closeness of the approximation depends on the choice of Hash Function and Hash Domain.

为了实现的目的,上述任何类别中的选择器可以通过相同或另一类别中的操作来近似。例如,均匀伪随机采样可以通过使用合适的散列函数和散列域的基于散列的选择来近似。在这种情况下,近似值的接近程度取决于哈希函数和哈希域的选择。

* Population

* 人口

A Population is a Packet Stream or a subset of a Packet Stream. A Population can be considered as a base set from which packets are selected. An example is all packets in the Observed Packet Stream that are observed within some specified time interval.

填充是分组流或分组流的子集。可以将总体视为从中选择数据包的基本集。一个示例是在某个指定的时间间隔内观察到的所观察的分组流中的所有分组。

* Population Size

* 人口规模

The Population Size is the number of all packets in the Population.

填充大小是填充中所有数据包的数量。

* Sample Size

* 样本量

The Sample Size is a number of packets selected from the Population by a Selector.

样本大小是由选择器从总体中选择的数据包数。

* Configured Selection Fraction

* 配置选择分数

The Configured Selection Fraction is the expected ratio of the Sample Size to the Population Size, as based on the configured selection parameters.

配置的选择分数是基于配置的选择参数的样本大小与总体大小的预期比率。

* Attained Selection Fraction

* 获得的选择分数

The Attained Selection Fraction is the ratio of the actual Sample Size to the Population Size. For some Sampling methods, the Attained Selection Fraction can differ from the Configured Selection Fraction due to, for example, the inherent statistical variability in Sampling decisions of probabilistic Sampling and Hash-based Selection. Nevertheless, for large Population Sizes and properly configured Selectors, the Attained Selection Fraction usually approaches the Configured Selection Fraction.

获得的选择分数是实际样本量与总体规模的比率。对于一些抽样方法,由于例如概率抽样和基于散列的选择的抽样决策中固有的统计可变性,所获得的选择分数可以不同于所配置的选择分数。然而,对于较大的总体规模和正确配置的选择器,获得的选择分数通常接近配置的选择分数。

4. Categorization of Packet Selection Techniques
4. 分组选择技术的分类

Packet selection techniques generate a subset of packets from an Observed Packet Stream at an Observation Point. We distinguish between Sampling and Filtering.

分组选择技术从观测点处的观测分组流生成分组子集。我们区分采样和过滤。

Sampling is targeted at the selection of a representative subset of packets. The subset is used to infer knowledge about the whole set of observed packets without processing them all. The selection can depend on packet position, and/or on Packet Content, and/or on (pseudo) random decisions.

采样的目标是选择具有代表性的数据包子集。该子集用于推断关于整个观测数据包集的知识,而无需对其全部进行处理。选择可以取决于分组位置和/或分组内容和/或(伪)随机决策。

Filtering selects a subset with common properties. This is used if only a subset of packets is of interest. The properties can be directly derived from the Packet Content, or depend on the treatment given by the router to the packet. Filtering is a deterministic operation. It depends on Packet Content or router treatment. It never depends on packet position or on (pseudo) random decisions.

过滤选择具有公共属性的子集。如果只对数据包的子集感兴趣,则使用该方法。属性可以直接从数据包内容派生,也可以取决于路由器对数据包的处理。过滤是一种确定性操作。这取决于数据包内容或路由器处理。它从不依赖于数据包位置或(伪)随机决策。

Note that a common technique to select packets is to compute a Hash Function on some bits of the packet header and/or content and to select it if the hash value falls in the Hash Selection Range. Since hashing is a deterministic operation on the Packet Content, it is a Filtering technique according to our categorization. Nevertheless, Hash Functions are sometimes used to emulate random Sampling. Depending on the chosen input bits, the Hash Function, and the Hash Selection Range, this technique can be used to emulate the random selection of packets with a given probability p. It is also a powerful technique to consistently select the same packet subset at multiple Observation Points [DuGr00].

注意,选择分组的常见技术是在分组报头和/或内容的某些位上计算哈希函数,并在哈希值落在哈希选择范围内时选择它。由于哈希是对数据包内容的一种确定性操作,因此根据我们的分类,它是一种过滤技术。然而,哈希函数有时用于模拟随机采样。根据选择的输入位、散列函数和散列选择范围,该技术可用于模拟具有给定概率p的分组的随机选择。在多个观测点一致地选择相同的数据包子集也是一种强大的技术[DuGr00]。

The following table gives an overview of the schemes described in this document and their categorization. X means that the characteristic applies to the selection scheme. (X) denotes schemes for which content-dependent and content-independent variants exist. For instance, Property Match Filtering is typically based on Packet Content and therefore is content dependent. But as explained in Section 6.1, it may also depend on router state and then would be independent of the content. It easily can be seen that only schemes with both properties, content dependence and deterministic selection, are considered as Filters.

下表概述了本文档中描述的方案及其分类。X表示该特征适用于选择方案。(十) 表示存在内容相关和内容无关变体的方案。例如,属性匹配筛选通常基于数据包内容,因此依赖于内容。但如第6.1节所述,它也可能取决于路由器状态,然后独立于内容。很容易看出,只有同时具有内容依赖性和确定性选择这两个属性的方案才被视为过滤器。

        Selection Scheme   | Deterministic | Content -| Category
                           |  Selection    | Dependent|
   ------------------------+---------------+----------+----------
    Systematic             |       X       |     _    | Sampling
    Count-based            |               |          |
   ------------------------+---------------+----------+----------
    Systematic             |       X       |     -    | Sampling
    Time-based             |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |     -    | Sampling
    n-out-of-N             |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |     -    | Sampling
    uniform probabilistic  |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |    (X)   | Sampling
    non-uniform probabil.  |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |    (X)   | Sampling
    non-uniform Flow-State |               |          |
   ------------------------+---------------+----------+----------
    Property Match         |       X       |    (X)   | Filtering
    Filtering              |               |          |
   ------------------------+---------------+----------+----------
    Hash function          |       X       |     X    | Filtering
   ------------------------+---------------+----------+----------
        
        Selection Scheme   | Deterministic | Content -| Category
                           |  Selection    | Dependent|
   ------------------------+---------------+----------+----------
    Systematic             |       X       |     _    | Sampling
    Count-based            |               |          |
   ------------------------+---------------+----------+----------
    Systematic             |       X       |     -    | Sampling
    Time-based             |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |     -    | Sampling
    n-out-of-N             |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |     -    | Sampling
    uniform probabilistic  |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |    (X)   | Sampling
    non-uniform probabil.  |               |          |
   ------------------------+---------------+----------+----------
    Random                 |       -       |    (X)   | Sampling
    non-uniform Flow-State |               |          |
   ------------------------+---------------+----------+----------
    Property Match         |       X       |    (X)   | Filtering
    Filtering              |               |          |
   ------------------------+---------------+----------+----------
    Hash function          |       X       |     X    | Filtering
   ------------------------+---------------+----------+----------
        

The categorization just introduced is mainly useful for the definition of an information model describing Primitive Selectors. More complex selection techniques can be described through the composition of cascaded Sampling and Filtering operations. For example, a packet selection that weights the selection probability on the basis of the packet length can be described as a cascade of a Filtering and a Sampling scheme. However, this descriptive approach is not intended to be rigid: if a common and consolidated selection practice turns out to be too complex to be described as a composition of the mentioned building blocks, an ad hoc description can be specified instead and added as a new scheme to the information model.

刚才介绍的分类主要用于定义描述基本选择器的信息模型。更复杂的选择技术可以通过级联采样和滤波操作的组合来描述。例如,基于分组长度加权选择概率的分组选择可以描述为滤波和采样方案的级联。但是,这种描述方法并不严格:如果一种常见的综合选择实践过于复杂,无法描述为所述构建块的组合,则可以指定一种特殊描述,并将其作为新方案添加到信息模型中。

5. Sampling
5. 取样

The deployment of Sampling techniques aims at the provisioning of information about a specific characteristic of the parent Population at a lower cost than a full census would demand. In order to plan a suitable Sampling strategy, it is therefore crucial to determine the needed type of information and the desired degree of accuracy in advance.

抽样技术的部署旨在以低于全面普查所需的成本提供有关父母人口特定特征的信息。因此,为了规划合适的抽样策略,必须提前确定所需的信息类型和所需的准确度。

First of all, it is important to know the type of metric that should be estimated. The metric of interest can range from simple packet counts [JePP92] up to the estimation of whole distributions of flow characteristics (e.g., packet sizes) [ClPB93].

首先,了解应该估计的度量的类型很重要。感兴趣的度量可以从简单的数据包计数[JePP92]到流特征(例如,数据包大小)的整体分布估计[ClPB93]。

Second, the required accuracy of the information and with this, the confidence that is aimed at, should be known in advance. For instance, for usage-based accounting the required confidence for the estimation of packet counters can depend on the monetary value that corresponds to the transfer of one packet. That means that a higher confidence could be required for expensive packet flows (e.g., premium IP service) than for cheaper flows (e.g., best effort). The accuracy requirements for validating a previously agreed quality can also vary extremely with the customer demands. These requirements are usually determined by the service level agreement (SLA).

第二,信息所需的准确性,以及以此为目标的信心,应该提前知道。例如,对于基于使用的记帐,分组计数器估计所需的置信度可以取决于与一个分组的传输相对应的货币值。这意味着,对于昂贵的数据包流(例如,高级IP服务),可能需要比便宜的数据流(例如,尽力而为)更高的置信度。验证先前商定的质量的精度要求也可能因客户需求而发生极大变化。这些需求通常由服务级别协议(SLA)确定。

The Sampling method and the parameters in use must be clearly communicated to all applications that use the measurement data. Only with this knowledge a correct interpretation of the measurement results can be ensured.

采样方法和使用的参数必须清楚地传达给使用测量数据的所有应用程序。只有具备这些知识,才能确保测量结果的正确解释。

Sampling methods can be characterized by the Sampling algorithm, the trigger type used for starting a Sampling interval, and the length of the Sampling interval. These parameters are described here in detail. The Sampling algorithm describes the basic process for selection of samples. In accordance to [AmCa89] and [ClPB93], we define the following basic Sampling processes.

采样方法可以由采样算法、用于启动采样间隔的触发器类型以及采样间隔的长度来描述。这里详细描述了这些参数。采样算法描述了选择样本的基本过程。根据[AmCa89]和[ClPB93],我们定义了以下基本采样过程。

5.1. Systematic Sampling
5.1. 系统抽样

Systematic Sampling describes the process of selecting the start points and the duration of the selection intervals according to a deterministic function. This can be for instance the periodic selection of every k-th element of a trace but also the selection of all packets that arrive at predefined points in time. Even if the selection process does not follow a periodic function (e.g., if the time between the Sampling intervals varies over time), we consider this as systematic Sampling as long as the selection is deterministic.

系统抽样描述了根据确定性函数选择起点和选择间隔持续时间的过程。例如,这可以是对跟踪的每个第k个元素的周期性选择,也可以是对到达预定义时间点的所有数据包的选择。即使选择过程不遵循周期函数(例如,如果采样间隔之间的时间随时间变化),我们认为这是系统采样,只要选择是确定的。

The use of systematic Sampling always involves the risk of biasing the results. If the systematics in the Sampling process resemble systematics in the observed stochastic process (occurrence of the characteristic of interest in the network), there is a high probability that the estimation will be biased. Systematics in the observed process might not be known in advance.

系统抽样的使用总是有可能使结果产生偏差。如果抽样过程中的系统分类与观测到的随机过程中的系统分类相似(网络中出现感兴趣的特征),则估计很可能会有偏差。观测过程中的系统学可能事先不知道。

Here only equally spaced schemes are considered, where triggers for Sampling are periodic, either in time or in packet count. All packets occurring in a selection interval (either in time or packet count) beyond the trigger are selected.

这里只考虑等距方案,其中采样触发器在时间或数据包计数上是周期性的。在触发器之外的选择间隔(时间或数据包计数)内发生的所有数据包都将被选择。

Systematic count-based In systematic count-based Sampling, the start and stop triggers for the Sampling interval are defined in accordance to the spatial packet position (packet count).

基于系统计数在基于系统计数的采样中,根据空间数据包位置(数据包计数)定义采样间隔的开始和停止触发器。

Systematic time-based In systematic time-based Sampling, time-based start and stop triggers are used to define the Sampling intervals. All packets are selected that arrive at the Observation Point within the time intervals defined by the start and stop triggers (i.e., arrival time of the packet is larger than the start time and smaller than the stop time).

基于系统时间的在基于系统时间的采样中,基于时间的启动和停止触发器用于定义采样间隔。选择在开始和停止触发器定义的时间间隔内到达观察点的所有数据包(即,数据包的到达时间大于开始时间,小于停止时间)。

Both schemes are content-independent selection schemes. Content-dependent deterministic Selectors are categorized as filters.

这两种方案都是内容独立的选择方案。依赖于内容的确定性选择器被分类为过滤器。

5.2. Random Sampling
5.2. 随机抽样

Random Sampling selects the starting points of the Sampling intervals in accordance to a random process. The selection of elements is an independent experiment. With this, unbiased estimations can be achieved. In contrast to systematic Sampling, random Sampling requires the generation of random numbers. One can differentiate two methods of random Sampling: n-out-of-N Sampling and probabilistic Sampling.

随机采样根据随机过程选择采样间隔的起点。元素的选择是一个独立的实验。这样,就可以实现无偏估计。与系统抽样相比,随机抽样要求生成随机数。可以区分两种随机抽样方法:n取n抽样和概率抽样。

5.2.1. n-out-of-N Sampling
5.2.1. n取n抽样

In n-out-of-N Sampling, n elements are selected out of the parent Population that consists of N elements. One example would be to generate n different random numbers in the range [1,N] and select all packets that have a packet position equal to one of the random numbers. For this kind of Sampling, the Sample Size n is fixed.

在n取n抽样中,从由n个元素组成的父总体中选择n个元素。一个示例是在范围[1,n]内生成n个不同的随机数,并选择分组位置等于其中一个随机数的所有分组。对于这种抽样,样本大小n是固定的。

5.2.2. Probabilistic Sampling
5.2.2. 概率抽样

In probabilistic Sampling, the decision whether or not an element is selected is made in accordance to a predefined selection probability. An example would be to flip a coin for each packet and select all packets for which the coin showed the head. For this kind of Sampling, the Sample Size can vary for different trials. The selection probability does not necessarily have to be the same for each packet. Therefore, we distinguish between uniform probabilistic Sampling (with the same selection probability for all packets) and

在概率抽样中,根据预定义的选择概率决定是否选择元素。例如,为每个包投掷一枚硬币,并选择硬币显示头部的所有包。对于这种抽样,不同试验的样本量可能不同。对于每个分组,选择概率不一定必须相同。因此,我们区分均匀概率抽样(所有数据包的选择概率相同)和均匀概率抽样

non-uniform probabilistic Sampling (where the selection probability can vary for different packets).

非均匀概率抽样(不同数据包的选择概率可能不同)。

5.2.2.1. Uniform Probabilistic Sampling
5.2.2.1. 均匀概率抽样

For Uniform Probabilistic Sampling, packets are selected independently with a uniform probability p. This Sampling can be count-driven, and is sometimes referred to as geometric random Sampling, since the difference in count between successive selected packets is an independent random variable with a geometric distribution of mean 1/p. A time-driven analog, exponential random Sampling, has the time between triggers exponentially distributed.

对于均匀概率抽样,分组以均匀概率p独立选择。这种抽样可以是计数驱动的,有时称为几何随机抽样,因为连续选择的数据包之间的计数差是一个独立的随机变量,其几何分布平均值为1/p。时间驱动的模拟,指数随机采样,触发器之间的时间呈指数分布。

Both geometric and exponential random Sampling are examples of what is known as additive random Sampling, defined as Sampling where the intervals or counts between successive samples are independent identically distributed random variables.

几何随机抽样和指数随机抽样都是被称为加性随机抽样的例子,加性随机抽样定义为连续样本之间的间隔或计数是独立的同分布随机变量的抽样。

5.2.2.2. Non-Uniform Probabilistic Sampling
5.2.2.2. 非均匀概率抽样

This is a variant of Probabilistic Sampling in which the Sampling probabilities can depend on the selection process input. This can be used to weight Sampling probabilities in order, e.g., to boost the chance of Sampling packets that are rare but are deemed important. Unbiased estimators for quantitative statistics are recovered by re-normalization of sample values; see [HT52].

这是概率抽样的一种变体,其中抽样概率取决于选择过程输入。这可用于对采样概率进行加权,例如,增加对罕见但被认为重要的数据包进行采样的机会。通过样本值的重新标准化,恢复定量统计的无偏估计量;见[HT52]。

5.2.2.3. Non-Uniform Flow State Dependent Sampling
5.2.2.3. 非均匀流状态相关采样

Another type of Sampling that can be classified as probabilistic Non-Uniform is closely related to the flow concept as defined in [RFC3917], and it is only used jointly with a flow monitoring function (IPFIX Metering Process). Packets are selected, dependent on a Selection State. The point, here, is that the Selection State is determined also by the state of the flow the packet belongs to and/or by the state of the other flows currently being monitored by the associated flow monitoring function. An example for such an algorithm is the "sample and hold" method described in [EsVa01]:

另一种可归类为概率非均匀的采样类型与[RFC3917]中定义的流量概念密切相关,且仅与流量监控功能(IPFIX计量过程)联合使用。根据选择状态选择数据包。这里的要点是,选择状态也由分组所属的流的状态和/或由当前由相关联的流监视功能监视的其他流的状态来确定。这种算法的一个例子是[EsVa01]中描述的“采样并保持”方法:

- If a packet accounts for a Flow Record that already exists in the IPFIX flow recording process, it is selected (i.e., the Flow Record is updated).

- 如果数据包说明了IPFIX流记录过程中已经存在的流记录,则选择该数据包(即更新流记录)。

- If a packet doesn't account for any existing Flow Record, it is selected with probability p. If it has been selected, a new Flow Record has to be created.

- 如果一个数据包不考虑任何现有的流记录,则以概率p选择它。如果已选择,则必须创建新的流量记录。

A further algorithm that fits into the category of non-uniform flow state dependent Sampling is described in [Moli03].

[Moli03]中描述了另一种适用于非均匀流状态相关采样的算法。

This type of Sampling is content dependent because the identification of the flow the packet belongs to requires analyzing part of the Packet Content. If the packet is selected, then it is passed as an input to the IPFIX monitoring function (this is called "Local Export" in [RFC5474]). Selecting the packet depending on the state of a flow cache is useful when memory resources of the flow monitoring function are scarce (i.e., there is no room to keep all the flows that have been scheduled for monitoring).

这种类型的采样依赖于内容,因为数据包所属流的标识需要分析数据包内容的一部分。如果选择了数据包,则将其作为输入传递给IPFIX监控功能(在[RFC5474]中称为“本地导出”)。当流监控功能的内存资源稀缺时(即,没有空间保留已调度用于监控的所有流),根据流缓存的状态选择分组是有用的。

5.2.2.4. Configuration of Non-Uniform Probabilistic and Flow State Sampling

5.2.2.4. 非均匀概率和流动状态采样的配置

Many different specific methods can be grouped under the terms non-uniform probabilistic and flow state Sampling. Dependent on the Sampling goal and the implemented scheme, a different number and type of input parameters are required to configure such a scheme.

许多不同的具体方法可以根据非均匀概率和流动状态采样进行分组。根据采样目标和实施的方案,配置此类方案需要不同数量和类型的输入参数。

Some concrete proposals for such methods exist from the research community (e.g., [EsVa01], [DuLT01], [Moli03]). Some of these proposals are still in an early stage and need further investigations to prove their usefulness and applicability. It is not our aim to indicate preference among these methods. Instead, we only describe here the basic methods and leave the specification of explicit schemes and their parameters up to vendors (e.g., as an extension of the information model).

研究界对此类方法提出了一些具体建议(例如,[EsVa01]、[DuLT01]、[Moli03])。其中一些建议仍处于早期阶段,需要进一步调查,以证明其有用性和适用性。我们的目的不是要表明对这些方法的偏好。相反,我们在这里只描述基本方法,并将显式方案的规范及其参数留给供应商(例如,作为信息模型的扩展)。

6. Filtering
6. 过滤

Filtering is the deterministic selection of packets based on the Packet Content, the treatment of the packet at the Observation Point, or deterministic functions of these occurring in the Selection State. The packet is selected if these quantities fall into a specified range. The role of Filtering, as the word itself suggest, is to separate all the packets having a certain property from those not having it. A distinguishing characteristic from Sampling is that the selection decision does not depend on the packet position in time or in space, or on a random process.

过滤是基于包内容的包的确定性选择、在观察点处对包的处理或这些在选择状态下发生的确定性功能。如果这些数量在指定范围内,则选择数据包。正如这个词本身所暗示的那样,过滤的作用是将具有特定属性的所有数据包与不具有特定属性的数据包分开。抽样的一个区别特征是,选择决策不依赖于数据包在时间或空间上的位置,也不依赖于随机过程。

We identify and describe in the following two Filtering techniques.

我们在以下两种过滤技术中识别和描述。

6.1. Property Match Filtering
6.1. 属性匹配过滤

With this Filtering method, a packet is selected if specific fields within the packet and/or properties of the router state equal a predefined value. Possible filter fields are all IPFIX flow

使用这种过滤方法,如果包内的特定字段和/或路由器状态的属性等于预定义值,则选择包。可能的筛选器字段都是IPFIX流

attributes specified in [RFC5102]. Further fields can be defined by proposing new information elements or defining vendor-specific extensions.

[RFC5102]中指定的属性。可以通过建议新的信息元素或定义特定于供应商的扩展来定义更多字段。

A packet is selected if Field=Value. Masks and ranges are only supported to the extent to which [RFC5102] allows them, e.g., by providing explicit fields like the netmasks for source and destination addresses.

如果字段=值,则选择数据包。掩码和范围仅在[RFC5102]允许的范围内受支持,例如,通过提供源地址和目标地址的网络掩码等显式字段。

AND operations are possible by concatenating filters, thus producing a composite selection operation. In this case, the ordering in which the Filtering happens is implicitly defined (outer filters come after inner filters). However, as long as the concatenation is on filters only, the result of the cascaded filter is independent from the order, but the order may be important for implementation purposes, as the first filter will have to work at a higher rate. In any case, an implementation is not constrained to respect the filter ordering, as long as the result is the same, and it may even implement the composite Filtering in one single step.

和操作可以通过串联过滤器来实现,从而产生复合选择操作。在这种情况下,过滤发生的顺序是隐式定义的(外部过滤器在内部过滤器之后)。然而,只要级联仅在滤波器上,级联滤波器的结果与阶数无关,但是阶数对于实现目的可能很重要,因为第一个滤波器必须以更高的速率工作。在任何情况下,只要结果相同,实现都不受过滤器顺序的约束,甚至可以在一个步骤中实现复合过滤。

OR operations are not supported with this basic model. More sophisticated filters (e.g., supporting bitmasks, ranges, or OR operations) can be realized as vendor-specific schemes.

此基本模型不支持或操作。更复杂的过滤器(例如,支持位掩码、范围或操作)可以作为特定于供应商的方案实现。

All IPFIX flow attributes defined in [RFC5102] can be used for Property Match Filtering. Further information elements can be easily defined. Property match operations should be available for different protocol portions of the packet header:

[RFC5102]中定义的所有IPFIX流属性都可用于属性匹配筛选。进一步的信息元素可以很容易地定义。属性匹配操作应可用于数据包头的不同协议部分:

(i) IP header (excluding options in IPv4, stacked headers in IPv6)

(i) IP标头(不包括IPv4中的选项,IPv6中的堆叠标头)

(ii) transport protocol header (e.g., TCP, UDP)

(ii)传输协议头(例如TCP、UDP)

(iii) encapsulation headers (e.g., the MPLS label stack, if present)

(iii)封装头(例如,MPLS标签堆栈,如果存在)

When the PSAMP Device offers Property Match Filtering, and, in its usual capacity other than in performing PSAMP functions, identifies or processes information from IP, transport protocol or encapsulation protocols, then the information should be made available for Filtering. For example, when a PSAMP Device routes based on destination IP address, that field should be made available for Filtering. Conversely, a PSAMP Device that does not route is not expected to be able to locate an IP address within a packet, or make it available for Filtering, although it may do so.

当PSAMP设备提供属性匹配过滤,并且在执行PSAMP功能以外的通常能力下,识别或处理来自IP、传输协议或封装协议的信息时,则应使该信息可用于过滤。例如,当PSAMP设备基于目标IP地址路由时,该字段应可用于筛选。相反,不路由的PSAMP设备预计无法在数据包中定位IP地址,或使其可用于过滤,尽管它可能会这样做。

Since packet encryption conceals the real values of encrypted fields, Property Match Filtering must be configurable to ignore encrypted packets, when detected.

由于数据包加密隐藏加密字段的真实值,因此必须配置属性匹配筛选,以便在检测到加密数据包时忽略加密数据包。

The Selection Process may support Filtering based on the properties of the router state:

选择过程可支持基于路由器状态的属性进行过滤:

(i) Ingress interface at which a packet arrives equals a specified value

(i) 数据包到达的入口接口等于指定值

(ii) Egress interface to which a packet is routed to equals a specified value

(ii)数据包路由到的出口接口等于指定值

(iii) Packet violated Access Control List (ACL) on the router

(iii)路由器上的数据包违反访问控制列表(ACL)

(iv) Failed Reverse Path Forwarding (RPF)

(iv)反向路径转发(RPF)失败

(v) Failed Resource Reservation Protocol (RSVP)

(v) 失败的资源保留协议(RSVP)

(vi) No route found for the packet

(vi)未找到数据包的路由

(vii) Origin Border Gateway Protocol (BGP) Autonomous System (AS) [RFC4271] equals a specified value or lies within a given range

(vii)源边界网关协议(BGP)自治系统(AS)[RFC4271]等于指定值或位于给定范围内

(viii) Destination BGP AS equals a specified value or lies within a given range

(viii)目标BGP等于指定值或位于给定范围内

Packets that match the failed Reverse Path Forwarding (RPF) condition are packets for which ingress Filtering failed as defined in [RFC3704].

符合失败反向路径转发(RPF)条件的数据包是[RFC3704]中定义的入口过滤失败的数据包。

Packets that match the failed Resource Reservation Protocol (RSVP) condition are packets that do not fulfill the RSVP specification as defined in [RFC2205].

符合失败资源预留协议(RSVP)条件的数据包是不符合[RFC2205]中定义的RSVP规范的数据包。

Router architectural considerations may preclude some information concerning the packet treatment being available at line rate for selection of packets. For example, the Selection Process may not be implemented in the fast path that is able to access router state at line rate. However, when Filtering follows Sampling (or some other selection operation) in a Composite Selector, the rate of the Packet Stream output from the sampler and input to the filter may be sufficiently slow that the filter could select based on router state.

路由器体系结构考虑可能会排除一些关于分组处理的信息,这些信息可用于选择分组。例如,选择过程可能不在能够以线路速率访问路由器状态的快速路径中实现。然而,当在复合选择器中的采样(或某个其他选择操作)之后进行过滤时,从采样器输出并输入到过滤器的分组流的速率可能足够慢,以致过滤器可以基于路由器状态进行选择。

6.2. Hash-Based Filtering
6.2. 基于散列的过滤

A Hash Function h maps the Packet Content c, or some portion of it, onto a Hash Range R. The packet is selected if h(c) is an element of S, which is a subset of R called the Hash Selection Range. Thus, Hash-Based selection is a particular case of Filtering. The object is selected if c is in inv(h(S)). But for desirable Hash Functions, the inverse image inv(h(S)) will be extremely complex, and hence h would not be expressible as, say, a Property Match Filter or a simple combination of these.

散列函数h将数据包内容c或其部分映射到散列范围R上。如果h(c)是S的一个元素,则选择该数据包,S是R的子集,称为散列选择范围。因此,基于散列的选择是过滤的一种特殊情况。如果c在inv(h(S))中,则选择该对象。但对于所需的散列函数,逆图像inv(h(S))将是极其复杂的,因此h将不能表示为,例如,属性匹配过滤器或它们的简单组合。

Hash-based Selection is mainly used to realize a coordinated packet selection. That means that the same packets are selected at different Observation Points. This is useful for instance to observe the path (trajectory) that a packet took through the network or to apply packet selection to passive one-way measurements.

基于散列的选择主要用于实现协调的分组选择。这意味着在不同的观察点选择相同的数据包。例如,这对于观察数据包通过网络的路径(轨迹)或将数据包选择应用于被动单向测量非常有用。

A prerequisite for the method to work and to ensure interoperability is that the same Hash Function with the same parameters (e.g., input vector) is used at the Observation Points.

该方法工作和确保互操作性的先决条件是在观测点使用具有相同参数(例如,输入向量)的相同哈希函数。

A consistent packet selection is also possible with Property Match Filtering. Nevertheless, Hash-based Selection can be used to approximate a random selection. The desired statistical properties are discussed in Section 6.2.2.

通过属性匹配筛选,也可以选择一致的数据包。然而,基于散列的选择可以用于近似随机选择。第6.2.2节讨论了所需的统计特性。

In the following subsections, we give some application examples for coordinated packet selection.

在下面的小节中,我们将给出一些协调数据包选择的应用示例。

6.2.1. Application Examples for Coordinated Packet Selection
6.2.1. 协调分组选择的应用实例
6.2.1.1. Trajectory Sampling
6.2.1.1. 轨迹采样

Trajectory Sampling is the consistent selection of a subset of packets at either all of a set of Observation Points or none of them. Trajectory Sampling is realized by Hash-based Selection if all Observation Points in the set use a common Hash Function, Hash Domain, and Selection Range. The Hash Domain comprises all or part of the Packet Content that is invariant along the packet path. Fields such as Time-to-Live, which is decremented per hop, and header CRC [RFC1624], which is recalculated per hop, are thus excluded from the Hash Domain. The Hash Domain needs to be wider than just a flow key, if packets are to be selected quasi-randomly within flows.

轨迹采样是在一组观测点或所有观测点上一致选择数据包子集。如果集合中的所有观测点使用公共哈希函数、哈希域和选择范围,则轨迹采样通过基于哈希的选择实现。散列域包括沿分组路径不变的分组内容的全部或部分。因此,诸如生存时间(每跳递减)和头CRC[RFC1624]等字段(每跳重新计算)从哈希域中排除。如果要在流中准随机地选择数据包,则散列域需要比流密钥更宽。

The trajectory (or path) followed by a packet is reconstructed from PSAMP reports on it that reach a Collector. Reports on a given packet originating from different Observation Points are associated by matching a label from the reports. The label may comprise that

数据包后面的轨迹(或路径)由到达收集器的PSAMP报告重建。来自不同观察点的给定数据包上的报告通过匹配报告中的标签进行关联。该标签可包括:

portion of the invariant Packet Content that is reported, or possibly some digest of the invariant Packet Content that is inserted into the packet report at the Observation Point. Such a digest may be constructed by applying a second Hash Function to the invariant Packet Content. The reconstruction of trajectories and methods for dealing with possible ambiguities due to label collisions (identical labels reported for different packets) and potential loss of reports in transmission are dealt with in [DuGr00], [DuGG02], and [DuGr04].

报告的不变分组内容的一部分,或者可能是在观察点插入到分组报告中的不变分组内容的一些摘要。可以通过对不变分组内容应用第二散列函数来构造这样的摘要。[DuGr00]、[DuGG02]和[DuGr04]中讨论了由于标签冲突(针对不同数据包报告的相同标签)和传输中报告的潜在丢失而导致的轨迹重建和处理可能歧义的方法。

Applications of trajectory Sampling include (i) estimation of the network path matrix, i.e., the traffic intensities according to network path, broken down by flow key; (ii) detection of routing loops, as indicated by self-intersecting trajectories; (iii) passive performance measurement: prematurely terminating trajectories indicate packet loss, packet one-way delay can be determined if reports include (synchronized) timestamps of packet arrival at the Observation Point; and (iv) network attack tracing, of the actual paths taken by attack packets with spoofed source addresses.

轨迹采样的应用包括(i)网络路径矩阵的估计,即根据网络路径按流键分解的流量强度;(ii)如自相交轨迹所示,探测路由环路;(iii)被动性能测量:提前终止轨迹表明数据包丢失,如果报告包括(同步)数据包到达观测点的时间戳,则可以确定数据包单向延迟;和(iv)网络攻击跟踪,跟踪具有伪造源地址的攻击包所采取的实际路径。

6.2.1.2. Passive One-Way Measurements
6.2.1.2. 被动单向测量

Coordinated packet selection can be applied for instance to one-way delay measurements in order to reduce the required resources. In one-way delay measurements, packets are collected at different Observation Points in the network. A packet digest is generated for each packet that helps to identify the packet. The packet digest and the arrival time of the packet at the Observation Point are reported to a process that calculates the delay. The delay is calculated by subtracting the arrival time of the same packet at the Observation Points (e.g., [ZsZC01]). With high data rates, capturing all packets can require a lot of resources for storage, transfer, and processing. To reduce resource consumption, packet selection methods can be applied. But for such selection techniques, it has to be ensured that the same packets are collected at different Observation Points. Hash-based Selection provides this feature.

协调分组选择可以例如应用于单向延迟测量,以减少所需资源。在单向延迟测量中,在网络中的不同观察点收集数据包。为每个数据包生成一个数据包摘要,以帮助识别该数据包。数据包摘要和数据包在观测点的到达时间报告给计算延迟的进程。通过减去相同数据包在观测点(例如[ZsZC01])的到达时间来计算延迟。对于高数据速率,捕获所有数据包可能需要大量资源进行存储、传输和处理。为了减少资源消耗,可以应用包选择方法。但是对于这种选择技术,必须确保在不同的观察点收集相同的数据包。基于哈希的选择提供了此功能。

6.2.1.3. Generation of Pseudorandom Numbers
6.2.1.3. 伪随机数的产生

Although pseudorandom number generators with well-understood properties have been developed, they may not be the method of choice in settings where computational resources are scarce. A convenient alternative is to use Hash Functions of Packet Content as a source of randomness. The hash (suitably re-normalized) is a pseudorandom variate in the interval [0,1]. Other schemes may use packet fields in iterators for pseudorandom numbers. However, the statistical properties of an ideal packet selection law (such as independent

虽然已经开发出了具有众所周知特性的伪随机数生成器,但在计算资源匮乏的环境中,它们可能不是首选方法。一种方便的替代方法是使用数据包内容的哈希函数作为随机性源。散列(适当地重新规范化)是区间[0,1]中的伪随机变量。其他方案可以在迭代器中使用伪随机数的数据包字段。然而,一个理想的数据包选择律的统计特性(如独立的

Sampling for different packets, or independence on Packet Content) may not be exactly rendered by an implementation, but only approximately so.

不同数据包的采样(或数据包内容的独立性)可能不会由实现精确呈现,但只能大致呈现。

Use of Packet Content to generate pseudorandom variates shares with non-uniform probabilistic Sampling (see Section 5.2.2.2 above) the property that selection decisions depend on Packet Content. However, there is a fundamental difference between the two. In the former case, the content determines pseudorandom variates. In the latter case, the content only determines the selection probabilities: selection could then proceed, e.g., by use of random variates obtained by an independent pseudorandom number generator.

使用数据包内容生成伪随机变量与非均匀概率抽样(见上文第5.2.2.2节)共享选择决定取决于数据包内容的属性。然而,两者之间有一个根本的区别。在前一种情况下,内容确定伪随机变量。在后一种情况下,内容仅确定选择概率:然后可以进行选择,例如,通过使用独立伪随机数生成器获得的随机变量。

6.2.2. Desired Properties of Hash Functions
6.2.2. 哈希函数的期望属性

Here we formulate desired properties for Hash Functions. For this, we have to distinguish whether a Hash Function is used for packet selection or just as a packet digest. The main focus of this document is on packet selection. Nevertheless, we also provide some requirements for the use of Hash Functions as packet digest.

这里我们为散列函数制定所需的属性。为此,我们必须区分哈希函数是用于数据包选择还是仅用作数据包摘要。本文档的主要重点是数据包选择。然而,我们也提供了一些使用散列函数作为数据包摘要的要求。

First of all, we need to define suitable input fields from the packet. In accordance to [DuGr00], input field should be:

首先,我们需要从数据包中定义合适的输入字段。根据[DuGr00],输入字段应为:

- invariant on the path - variable among packets

- 包间路径变量的不变量

Only if the input fields are the same at different Observation Points is it possible to recognize the packet. The input fields should be variable among packets in order to distribute the hash results over the selection range.

只有在不同观察点的输入字段相同时,才能识别数据包。输入字段在数据包之间应该是可变的,以便在选择范围内分布散列结果。

6.2.2.1. Requirements for Packet Selection
6.2.2.1. 数据包选择的要求

In accordance to considerations in [MoND05] and [Henk08], we define the following desired properties of Hash Functions used for packet selection:

根据[MoND05]和[Henk08]中的考虑,我们定义了用于数据包选择的哈希函数的以下期望属性:

(i) Speed: The Hash Function has to be applied to each packet that traverses the Observation Point. Therefore, it has to be fast in order to cope with the high packet rates. In the ideal case, the hash operation should not influence the performance on the PSAMP Device.

(i) 速度:散列函数必须应用于通过观察点的每个数据包。因此,它必须是快速的,以应付高分组速率。在理想情况下,哈希操作不应影响PSAMP设备的性能。

(ii) Uniformity: The Hash Function h should have good mixing properties, in the sense that small changes in the input (e.g., the flipping of a single bit) cause large changes in the output (many bits change). Then any local clump of values of c is spread widely over R by h, and so the distribution of h(c) is fairly uniform even if the distribution of c is not. Then the Attained Selection Fraction gets close to the Configured Selection Fraction (#S/#R), which can be tuned by choice of S.

(ii)一致性:散列函数h应具有良好的混合特性,即输入中的小变化(例如,单个位的翻转)会导致输出中的大变化(许多位的变化)。然后,c值的任何局部簇被h广泛地分布在R上,因此,即使c的分布不是均匀的,h(c)的分布也是相当均匀的。然后,获得的选择分数接近配置的选择分数(#S/#R),可以通过选择S进行调整。

(iii) Unbiasedness: The selection decision should be as independent of packet attributes as possible. The set of selected packets should not be biased towards a specific type of packets.

(iii)无偏性:选择决策应尽可能独立于数据包属性。所选数据包集不应偏向于特定类型的数据包。

(iv) Representativeness of sample: The sample should be as representative as possible for the observed traffic.

(iv)样本的代表性:样本应尽可能代表观测的交通量。

(v) Non-linearity: The function should not be linear. This increases the mixing properties (uniformity criterion). In addition to this, it decreases the predictability of the output and therefore the vulnerabilities against attacks.

(v) 非线性:函数不应是线性的。这增加了混合特性(均匀性标准)。除此之外,它还降低了输出的可预测性,从而降低了抵御攻击的漏洞。

(vi) Robustness against vulnerabilities: The Hash Function should be robust against attacks. Potential vulnerabilities are described in Section 6.2.3.

(vi)对漏洞的鲁棒性:哈希函数应该对攻击具有鲁棒性。第6.2.3节描述了潜在漏洞。

6.2.2.2. Requirements for Packet Digesting
6.2.2.2. 包装消化要求

For digesting Packet Content for inclusion in a reported label, the most important property is a low collision frequency. A secondary requirement is the ability to accept variable-length input, in order to allow inclusion of maximal amount of packet as input. Execution speed is of secondary importance, since the digest need only be formed from selected packets.

对于要包含在报告标签中的数据包内容,最重要的特性是低冲突频率。第二个要求是能够接受可变长度的输入,以允许包含最大数量的数据包作为输入。执行速度是次要的,因为摘要只需要从选定的数据包中形成。

6.2.3. Security Considerations for Hash Functions
6.2.3. 哈希函数的安全注意事项

A concern for Hash-based Selection is whether some large set of related packets could be disproportionately sampled, i.e., that the Attained Selection Fraction is significantly different from the Configured Selection Fraction. This can happen either

基于散列的选择的一个关注点是,一些大型相关分组集合是否会被不成比例地采样,即,所获得的选择分数与所配置的选择分数显著不同。这两种情况都可能发生

(i) through unanticipated behavior in the Hash Function, or

(i) 通过哈希函数中的意外行为,或

(ii) because the packets had been deliberately crafted to have this property.

(ii)因为这些数据包是经过精心设计而具有此属性的。

The first point underlines the importance of using a Hash Function with good mixing properties. For this, the statistical properties of candidate Hash Functions need to be evaluated. Since the hash output depends on the traffic mix, the evaluation should be done preferably on up-to-date packet traces from the network in which the Hash-based Selection will be deployed.

第一点强调了使用具有良好混合特性的哈希函数的重要性。为此,需要评估候选哈希函数的统计特性。由于散列输出取决于通信量混合,因此最好在将部署基于散列的选择的网络的最新分组跟踪上进行评估。

However, Hash Functions that perform well on typical traffic may not be sufficiently strong to withstand attacks specifically targeted against them. Such potential attacks have been described in [GoRe07].

但是,在典型流量上表现良好的哈希函数可能不够强大,无法抵御专门针对它们的攻击。[GoRe07]中描述了此类潜在攻击。

In the following subsections, we point out different potential attack scenarios. We encourage the use of standardized Hash Functions. Therefore, we assume that the Hash Function itself is public and hence known to an attacker.

在下面的小节中,我们指出了不同的潜在攻击场景。我们鼓励使用标准化的散列函数。因此,我们假设哈希函数本身是公开的,因此攻击者知道它。

Nevertheless, we also assume the possibility of using a private input parameter for the Hash Function that is kept secret. Such an input parameter can for instance be attached to the hash input before the hash operation is applied. With this, at least parts of the hash operation remain secret.

尽管如此,我们也假设有可能对保密的哈希函数使用私有输入参数。例如,可以在应用散列操作之前将这样的输入参数附加到散列输入。这样,哈希操作的至少部分仍然是秘密的。

For the attack scenarios, we assume that an attacker uses its knowledge of the Hash Function to craft packets that are then dispatched, either as the attack itself or to elicit further information that can be used to refine the attack.

对于攻击场景,我们假设攻击者利用其对哈希函数的了解来制作数据包,然后将这些数据包作为攻击本身发送,或者获取可用于改进攻击的进一步信息。

Two scenarios are considered. In the first scenario, the attacker has no knowledge about whether or not the crafted packets are selected. In the second scenario, the attacker uses some knowledge of Sampling outcomes. The means by which this might be acquired is discussed below. Some additional attacks that involve tampering with Export Packets in transit, as opposed to attacking the PSAMP Device, are discussed in [GoRe07].

考虑了两种情况。在第一种情况下,攻击者不知道是否选择了精心编制的数据包。在第二个场景中,攻击者使用一些采样结果的知识。下文将讨论获取该信息的方法。[GoRe07]中讨论了一些涉及在传输过程中篡改导出数据包的额外攻击,而不是攻击PSAMP设备。

6.2.3.1. Vulnerabilities of Hash-Based Selection without Knowledge of Selection Outcomes

6.2.3.1. 不知道选择结果的基于哈希的选择的漏洞

(i) The Hash Function does not use a private parameter.

(i) 哈希函数不使用私有参数。

If no private input parameter is used, potential attackers can easily calculate which packets result in which hash values. If the selection range is public, an attacker can craft packets whose selection properties are known in advance. If the selection range is private, an attacker cannot determine whether a crafted packet is selected. However, by computing the hash on different trial crafted packets, and selecting those yielding a given hash value, the

如果没有使用私有输入参数,潜在的攻击者可以轻松计算哪些数据包产生哪些散列值。如果选择范围是公共的,则攻击者可以制作选择属性事先已知的数据包。如果选择范围是私有的,则攻击者无法确定是否选择了精心编制的数据包。然而,通过计算不同的试制作数据包上的散列,并选择产生给定散列值的数据包

attacker can construct an arbitrarily large set of distinct packets with a common selection properties, i.e., packets that will be either all selected or all not selected. This can be done whatever the strength of the Hash Function.

攻击者可以构造任意大的一组具有公共选择属性的不同数据包,即全部选中或全部未选中的数据包。无论哈希函数的强度如何,这都可以实现。

(ii) The Hash Function is not cryptographically strong.

(ii)哈希函数在加密方面不强。

If the Hash Function is not cryptographically strong, it may be possible to construct sequences of distinct packets with the common selection property even if a private parameter is used.

如果散列函数的加密性不强,则即使使用私有参数,也可能构造具有公共选择属性的不同数据包序列。

An example is the standard CRC-32 Hash Function used with a private modulus (but without a private string post-pended to the input). It has weak mixing properties for low-order bits. Consequently, simply by incrementing the hash input, one obtains distinct packets whose hashes mostly fall in a narrow range, and hence are likely commonly selected; see [GoRe07].

例如,与专用模一起使用的标准CRC-32散列函数(但没有挂起到输入的专用字符串)。对于低阶比特,它具有弱混合特性。因此,简单地通过增加散列输入,可以获得散列大部分落在狭窄范围内的不同分组,因此可能是通常选择的分组;见[GoRe07]。

Suitable parameterization of the Hash Function can make such attacks more difficult. For example, post-pending a private string to the input before hashing with CRC-32 will give stronger mixing properties over all bits of the input. However, with a Hash Function, such as CRC-32, that is not cryptographically strong, the possibility of discovering a method to construct packet sets with the common selected property cannot be ruled out, even when a private modulus or post-pended string is used.

哈希函数的适当参数化会使此类攻击更加困难。例如,在使用CRC-32进行散列之前,将一个私有字符串post挂起到输入中,将在输入的所有位上提供更强的混合属性。然而,对于加密性不强的哈希函数(如CRC-32),即使使用私有模或后挂起字符串,也不能排除发现构造具有公共选定属性的数据包集的方法的可能性。

6.2.3.2. Vulnerabilities of Hash-Based Selection Using Knowledge of Selection Outcomes

6.2.3.2. 使用选择结果知识的基于哈希的选择漏洞

Knowledge of the selection outcomes of crafted packets can be used by an attacker to more easily construct sets of packets that are disproportionately sampled and/or are commonly selected. For this, the attacker does not need any a priori knowledge about the Hash Function or selection range.

攻击者可以利用对精心制作的数据包的选择结果的了解来更容易地构造不相称采样和/或通常选择的数据包集。为此,攻击者不需要任何有关哈希函数或选择范围的先验知识。

There are several ways an attacker might acquire this knowledge about the selection outcome:

攻击者可以通过多种方式获取有关选择结果的信息:

(i) Billing Reports: If samples are used for billing purposes, then the selection outcomes of packets may be able to be inferred by correlating a crafted Packet Stream with the billing reports that it generates. However, the rate at which knowledge of selection outcomes can be acquired depends on the temporal and spatial granularity of the billing reports; being slower the more aggregated the reports are.

(i) 计费报告:如果样本用于计费目的,则可以通过将精心编制的数据包流与其生成的计费报告相关联来推断数据包的选择结果。然而,获取选择结果知识的速率取决于计费报告的时间和空间粒度;越慢,报告的聚合度越高。

(ii) Feedback from an Intrusion Detection System: e.g., a botmaster adversary learns if his packets were detected by the intrusion detection system by seeing if one of his bots is blocked by the network.

(ii)来自入侵检测系统的反馈:例如,botmaster对手通过查看其一个机器人是否被网络阻止来了解其数据包是否被入侵检测系统检测到。

(iii) Observation of the Report Stream: Export Packets sent across a public network may be eavesdropped on by an adversary. Encryption of the Export Packets provides only a partial defense, since it may be possible to infer the selection outcomes of packets by correlating a crafted Packet Stream with the occurrence (not the content) of packets in the export stream that it generates. The rate at which such knowledge could be acquired is limited by the temporal resolution at which reports can be associated with packets, e.g., due to processing and propagation variability, and difficulty in distinguishing report on attack packets from those of background traffic, if present. The association between packets and their reports on which this depends could be removed by padding Export Packets to a constant length and sending them at a constant rate.

(iii)观察报告流:通过公共网络发送的导出数据包可能被对手窃听。导出数据包的加密仅提供部分防御,因为可以通过将精心编制的数据包流与其生成的导出流中数据包的出现(而不是内容)相关联来推断数据包的选择结果。可获取此类知识的速率受到报告可与分组相关联的时间分辨率的限制,例如,由于处理和传播可变性,以及难以区分攻击分组的报告与背景流量的报告(如果存在)。通过将导出数据包填充为固定长度并以固定速率发送数据包,可以删除数据包与其报告之间的关联。

We now turn to attacks that can exploit knowledge of selection outcomes. First, with a non-cryptographic Hash Function, knowledge of selection outcomes for a trial stream may be used to further craft a packet set with the common selection property. This has been demonstrated for the modular hash f(x) = a x + b mod k, for private parameters a, b, and k. With Sampling rate p, knowledge of the Sampling outcomes of roughly 2/p is sufficient for the attack to succeed, independent of the values of a, b, and k. With knowledge of the selection outcomes of a larger number of packets, the parameters a, b, and k can be determined; see [GoRe07].

我们现在转向可以利用选择结果知识的攻击。首先,利用非加密散列函数,可以使用关于试用流的选择结果的知识来进一步制作具有公共选择属性的分组集。对于模块散列f(x)=a x+b mod k,对于私有参数a、b和k,已经证明了这一点。对于采样率p,了解大约2/p的采样结果就足以使攻击成功,这与a、b和k的值无关。在知道大量分组的选择结果的情况下,可以确定参数a、b和k;见[GoRe07]。

A cryptographic Hash Function employing a private parameter and operating in one of the pseudorandom function modes specified above is not vulnerable to these attacks, even if the selection range is known.

使用私有参数并在上述伪随机函数模式之一下操作的加密哈希函数不易受到这些攻击,即使选择范围已知。

6.2.3.3. Vulnerabilities to Replay Attacks
6.2.3.3. 重放攻击的漏洞

Since Hash-based Selection is deterministic, any packet or set of packets with known selection properties can be replayed into a network and experience the same selection outcomes provide the Hash Function and its parameters are not changed. Repetition of a single packet may be noticeable to other measurement methods if employed (e.g., collection of flow statistics), whereas a set of distinct packets that appears statistically similar to regular traffic may be less noticeable.

由于基于散列的选择是确定性的,因此具有已知选择属性的任何分组或分组集都可以被重放到网络中,并且在提供散列函数且其参数不改变的情况下体验相同的选择结果。如果采用其他测量方法(例如,流量统计数据的收集),则单个数据包的重复可能会引起注意,而在统计上与常规流量相似的一组不同数据包可能不太明显。

Replay attacks may be mitigated by repeated changing of Hash Function parameters. This also prevents attacks that exploit knowledge of Sampling outcomes, at least if the parameters are changed at least as fast as the knowledge can be acquired by an attacker. In order to preserve the ability to perform trajectory Sampling, parameter change would have to be simultaneous (or approximately so) across all Observation Points.

重播攻击可以通过反复更改哈希函数参数来缓解。这还可以防止利用采样结果知识的攻击,至少在参数的更改速度与攻击者可以获取的知识速度相同的情况下。为了保持执行轨迹采样的能力,必须在所有观测点上同时(或大致如此)改变参数。

6.2.4. Choice of Hash Function
6.2.4. 哈希函数的选择

The specific choice of Hash Function represents a trade-off between complexity and ease of implementation. Ideally, a cryptographically strong Hash Function employing a private parameter and operating in pseudorandom function mode as specified above would be used, yielding a good emulation of a random packet selection at a target Sampling rate, and giving maximal robustness against the attacks described in the previous section. Unfortunately, there is currently no single Hash Function that fulfills all the requirements.

哈希函数的具体选择代表了复杂性和易实现性之间的权衡。理想情况下,将使用采用私有参数并在如上所述的伪随机函数模式下操作的加密强散列函数,以目标采样率产生对随机分组选择的良好模拟,并提供对上一节中描述的攻击的最大鲁棒性。不幸的是,目前没有一个散列函数可以满足所有要求。

As detailed in Section 6.2.3, only cryptographic Hash Functions employing a private parameter operating in pseudorandom function mode are sufficiently strong to withstand the range of conceivable attacks. For example, fixed- or variable-length inputs could be hashed using a block cipher (like Advanced Encryption Standard (AES)) in cipher-block-chaining mode. Fixed-length inputs could also be hashed using an iterated cryptographic Hash Function (like MD5 or SHA1), with a private initial vector. For variable-length inputs, an iterated cryptographic Hash Function (like MD5 or SHA1) should employ private string post-pended to the data in addition to a private initial vector. For more details, see the "append-cascade" construction of [BeCK96]. We encourage the use of such cryptographically strong Hash Functions wherever possible.

如第6.2.3节所述,只有采用在伪随机函数模式下运行的私有参数的加密哈希函数才足够强大,能够承受可能的攻击范围。例如,可以在密码块链接模式下使用块密码(如高级加密标准(AES))对固定长度或可变长度输入进行散列。固定长度的输入也可以使用迭代加密哈希函数(如MD5或SHA1)和私有初始向量进行哈希。对于可变长度输入,迭代加密哈希函数(如MD5或SHA1)除了使用私有初始向量外,还应使用私有字符串post挂起到数据。有关更多详细信息,请参见[BeCK96]的“追加级联”构造。我们鼓励尽可能使用这种加密强散列函数。

However, a problem with using such functions is the low performance. As shown for instance in [Henk08], the computation times for MD5 and SHA are about 7-10 times higher compared to non-cryptographic functions. The difference increases for small hash input lengths.

但是,使用这些函数的一个问题是性能低下。例如,如[Henk08]所示,MD5和SHA的计算时间比非加密函数高7-10倍。对于较小的散列输入长度,差异会增大。

Therefore, it is not assumed that all PSAMP Devices will be capable of applying a cryptographically strong Hash Function to every packet at line rate. For this reason, the Hash Functions listed in this section will be of a weaker variety. Future protocol extensions that employ stronger Hash Functions are highly welcome.

因此,并不认为所有PSAMP设备都能够以行速率对每个数据包应用加密强哈希函数。因此,本节中列出的散列函数的种类较少。未来使用更强大的散列函数的协议扩展非常受欢迎。

Comparisons of Hash Functions for packet selection and packet digesting with regard to various criteria can be found in [MoND05] and [Henk08].

在[MoND05]和[Henk08]中可以找到关于不同标准的数据包选择和数据包摘要的哈希函数的比较。

6.2.4.1. Hash Functions for Packet Selection
6.2.4.1. 用于数据包选择的哈希函数

If Hash-based packet Selection is applied, the BOB function MUST be used for packet selection operations in order to be compliant with PSAMP. The specification of BOB is given in the appendix. Both the parameter (the init value) and the selection range should be kept private. The initial vector of the Hash Function MUST be configurable out of band to prevent security breaches like exposure of the initial vector content.

如果应用了基于散列的数据包选择,则必须将BOB函数用于数据包选择操作,以符合PSAMP。附录中给出了BOB的规格。参数(初始值)和选择范围都应保持私有。哈希函数的初始向量必须可在带外配置,以防止安全漏洞,如初始向量内容的暴露。

Other functions, such as CRC-32 and IPSX, MAY be used. The IPSX function is described in the appendix, and the CRC-32 function is described in [RFC1141]. If CRC-32 is used, the input should first be post-pended with a private string that acts as a parameter, and the modulus of the CRC should also be kept private.

可以使用其他功能,例如CRC-32和IPSX。附录中描述了IPSX功能,而[RFC1141]中描述了CRC-32功能。如果使用CRC-32,则应首先使用充当参数的专用字符串对输入进行后挂起,并且CRC的模数也应保持专用。

IPSX is simple to implement and was correspondingly about an order of magnitude faster to execute per packet than BOB or CRC-32 [MoND05].

IPSX实现简单,相应地,每个数据包的执行速度比BOB或CRC-32快一个数量级[MoND05]。

All three Hash Functions evaluated showed relatively poor uniformity with 16-byte input that was drawn from only invariant fields in the IP and TCP/UDP headers (i.e., header fields that do not change from hop to hop). IPSX is inherently limited to 16 bytes.

评估的所有三个哈希函数都显示出相对较差的一致性,16字节的输入仅来自IP和TCP/UDP头中的不变字段(即,头字段不随跃点变化)。IPSX本身被限制为16字节。

BOB and CRC-32 exhibit noticeably better uniformity when 4 or more bytes from the payload are also included in the input [MoND05]. Also with other criteria BOB performed quite well [Henk08].

当输入中还包括来自有效负载的4个或更多字节时,BOB和CRC-32表现出明显更好的一致性[MoND05]。此外,根据其他标准,鲍勃表现相当出色[Henk08]。

Although the characteristics have been checked for different traffic traces, results cannot be generalized to arbitrary traffic. Since Hash-based Selection is a deterministic function on the Packet Content, it can always be biased towards packets with specific attributes. Furthermore, it should be noted that all Hash Functions were evaluated only for IPv4.

虽然已经针对不同的流量跟踪检查了特征,但结果不能推广到任意流量。由于基于散列的选择是对数据包内容的确定性函数,因此它总是偏向于具有特定属性的数据包。此外,应该注意的是,所有哈希函数仅针对IPv4求值。

None of these Hash Functions is recommended for cryptographic purposes. Please also note that the use of a private parameter only slightly reduces the vulnerabilities against attacks. As shown in Section 6.2.3, functions that are not cryptographically strong (e.g., BOB and CRC) cannot prevent attackers from crafting packets that are disproportionally selected even if a private parameter is used and the selection range is kept secret.

出于加密目的,建议不要使用这些哈希函数。还请注意,使用私有参数只会略微降低攻击漏洞。如第6.2.3节所示,加密功能不强的函数(例如,BOB和CRC)无法阻止攻击者制作不成比例选择的数据包,即使使用了私有参数且选择范围保密。

As described in Section 6.2.2, the input bytes for the Hash Function need to be invariant along the path the packet is traveling. Only with this it is ensured that the same packets are selected at

如第6.2.2节所述,哈希函数的输入字节需要沿数据包移动的路径保持不变。只有这样,才能确保在同一时间选择相同的数据包

different Observation Points. Furthermore, they should have a high variability between different packets to generate a high variation in the Hash Range. An evaluation of the variability of different packet header fields can be found in [DuGr00], [HeSZ08], and [Henk08].

不同的观察点。此外,它们在不同分组之间应该具有高可变性,以在散列范围中产生高变化。在[DuGr00]、[HES008]和[Henk08]中可以找到对不同数据包头字段可变性的评估。

If a Hash-based Selection with the BOB function is used with IPv4 traffic, the following input bytes MUST be used.

如果将基于哈希的选择与BOB函数一起用于IPv4通信,则必须使用以下输入字节。

- IP identification field

- IP标识字段

- Flags field

- 标志字段

- Fragment offset

- 碎片偏移量

- Source IP address

- 源IP地址

- Destination IP address

- 目标IP地址

- A configurable number of bytes from the IP payload, starting at a configurable offset

- IP有效负载的可配置字节数,从可配置的偏移量开始

Due to the lack of suitable IPv6 packet traces, all candidate Hash Functions in [DuGr00], [MoND05], and [Henk08] were evaluated only for IPv4. Due to the IPv6 header fields and address structure, it is expected that there is less randomness in IPv6 packet headers than in IPv4 headers. Nevertheless, the randomness of IPv6 traffic has not yet been evaluated sufficiently to get any evidence. In addition to this, IPv6 traffic profiles may change significantly in the future when IPv6 is used by a broader community.

由于缺少合适的IPv6数据包跟踪,[DuGr00]、[MoND05]和[Henk08]中的所有候选哈希函数仅针对IPv4进行评估。由于IPv6报头字段和地址结构,预计IPv6数据包报头中的随机性比IPv4报头中的随机性小。然而,IPv6流量的随机性尚未得到充分评估,无法获得任何证据。除此之外,当IPv6被更广泛的社区使用时,IPv6流量配置文件在未来可能会发生重大变化。

If a Hash-based Selection with the BOB function is used with IPv6 traffic, the following input bytes MUST be used.

如果将基于哈希的选择与BOB函数一起用于IPv6通信,则必须使用以下输入字节。

- Payload length (2 bytes)

- 有效负载长度(2字节)

- Byte number 10,11,14,15,16 of the IPv6 source address

- IPv6源地址的字节号10,11,14,15,16

- Byte number 10,11,14,15,16 of the IPv6 destination address

- IPv6目标地址的字节号10,11,14,15,16

- A configurable number of bytes from the IP payload, starting at a configurable offset. It is recommended to use at least 4 bytes from the IP payload.

- IP有效负载的可配置字节数,从可配置的偏移量开始。建议至少使用IP有效负载中的4个字节。

The payload itself is not changing during the path. Even if some routers process some extension headers, they are not going to strip them from the packet. Therefore, the payload length is invariant along the path. Furthermore, it usually differs for different packets. The IPv6 address has 16 bytes. The first part is the

有效载荷本身在路径过程中不会发生变化。即使某些路由器处理一些扩展头,它们也不会从数据包中删除它们。因此,有效载荷长度沿路径不变。此外,对于不同的数据包,它通常是不同的。IPv6地址有16个字节。第一部分是全文的重点

network part and contains low variation. The second part is the host part and contains higher variation. Therefore, the second part of the address is used. Nevertheless, the uniformity has not been checked for IPv6 traffic.

网络部分和包含低变化。第二部分是主体部分,包含较高的变体。因此,使用地址的第二部分。然而,IPv6流量的一致性尚未得到检查。

6.2.4.2. Hash Functions Suitable for Packet Digesting
6.2.4.2. 适用于数据包摘要的哈希函数

For this purpose also the BOB function SHOULD be used. Other functions (such as CRC-32) MAY be used. Among the functions capable of operating with variable-length input, BOB and CRC-32 have the fastest execution, BOB being slightly faster. IPSX is not recommended for digesting because it has a significantly higher collision rate and takes only a fixed-length input.

为此,还应使用BOB函数。可以使用其他功能(如CRC-32)。在能够使用可变长度输入操作的函数中,BOB和CRC-32的执行速度最快,BOB稍快。不建议对IPSX进行消化,因为它的冲突率明显较高,并且只需要固定长度的输入。

7. Parameters for the Description of Selection Techniques
7. 用于描述选择技术的参数

This section gives an overview of different alternative selection schemes and their required parameters. In order to be compliant with PSAMP, at least one of proposed schemes MUST be implemented.

本节概述了不同的备选方案及其所需参数。为了符合PSAMP,必须实施至少一个提议的方案。

The decision whether or not to select a packet is based on a function that is performed when the packet arrives at the selection process. Packet selection schemes differ in the input parameters for the selection process and the functions they require to do the packet selection. The following table gives an overview.

是否选择分组的决定基于分组到达选择处理时执行的功能。数据包选择方案在选择过程的输入参数和进行数据包选择所需的功能方面有所不同。下表给出了概述。

     Scheme       |   Input parameters     |     Functions
   ---------------+------------------------+-------------------
    systematic    |    packet position     |  packet counter
    count-based   |    Sampling pattern    |
   ---------------+------------------------+-------------------
    systematic    |      arrival time      |  clock or timer
    time-based    |     Sampling pattern   |
   ---------------+------------------------+-------------------
    random        |  packet position       |  packet counter,
    n-out-of-N    |  Sampling pattern      |  random numbers
                  | (random number list)   |
   ---------------+------------------------+-------------------
    uniform       |        Sampling        |  random function
    probabilistic |      probability       |
   ---------------+------------------------+-------------------
    non-uniform   |e.g., packet position,  | selection function,
    probabilistic |  Packet Content(parts) |  probability calc.
   ---------------+------------------------+-------------------
    non-uniform   |e.g., flow state,       | selection function,
    flow-state    |  Packet Content(parts) |  probability calc.
   ---------------+------------------------+-------------------
    property      | Packet Content(parts)  |  filter function or
    match         | or router state        |  state discovery
   ---------------+------------------------+-------------------
    hash-based    |  Packet Content(parts) |  Hash Function
   ---------------+------------------------+-------------------
        
     Scheme       |   Input parameters     |     Functions
   ---------------+------------------------+-------------------
    systematic    |    packet position     |  packet counter
    count-based   |    Sampling pattern    |
   ---------------+------------------------+-------------------
    systematic    |      arrival time      |  clock or timer
    time-based    |     Sampling pattern   |
   ---------------+------------------------+-------------------
    random        |  packet position       |  packet counter,
    n-out-of-N    |  Sampling pattern      |  random numbers
                  | (random number list)   |
   ---------------+------------------------+-------------------
    uniform       |        Sampling        |  random function
    probabilistic |      probability       |
   ---------------+------------------------+-------------------
    non-uniform   |e.g., packet position,  | selection function,
    probabilistic |  Packet Content(parts) |  probability calc.
   ---------------+------------------------+-------------------
    non-uniform   |e.g., flow state,       | selection function,
    flow-state    |  Packet Content(parts) |  probability calc.
   ---------------+------------------------+-------------------
    property      | Packet Content(parts)  |  filter function or
    match         | or router state        |  state discovery
   ---------------+------------------------+-------------------
    hash-based    |  Packet Content(parts) |  Hash Function
   ---------------+------------------------+-------------------
        
7.1. Description of Sampling Techniques
7.1. 取样技术说明

In this section, we define what elements are needed to describe the most common Sampling techniques. Here the selection function is predefined and given by the Selector ID.

在本节中,我们将定义描述最常见的采样技术所需的元素。这里,选择函数是预定义的,由选择器ID给出。

Sampler Description: SELECTOR_ID SELECTOR_TYPE SELECTOR_PARAMETERS

采样器说明:选择器\ ID选择器\类型选择器\参数

Where:

哪里:

SELECTOR_ID: Unique ID for the packet sampler.

选择器ID:数据包采样器的唯一ID。

SELECTOR_TYPE: For Sampling processes, the SELECTOR TYPE defines what Sampling algorithm is used. Values: Systematic count-based | Systematic time-based | Random |n-out-of-N | uniform probabilistic | Non-uniform probabilistic | Non-uniform flow state

选择器类型:对于采样过程,选择器类型定义使用的采样算法。值:基于系统计数|基于系统时间|随机| n取n |均匀概率|非均匀概率|非均匀流动状态

SELECTOR_PARAMETERS: For Sampling processes, the SELECTOR PARAMETERS define the input parameters for the process. Interval length in systematic Sampling means that all packets that arrive in this interval are selected. The spacing parameter defines the spacing in time or number of packets between the end of one Sampling interval and the start of the next succeeding interval.

选择器参数:对于采样过程,选择器参数定义过程的输入参数。系统采样中的间隔长度意味着选择在此间隔内到达的所有数据包。spacing参数定义一个采样间隔结束与下一个后续间隔开始之间的时间间隔或数据包数。

Case n-out-of-N: - Population Size N, Sample size n

案例n-of-n:-总体规模n,样本规模n

Case systematic time-based: - Interval length (in usec), Spacing (in usec)

基于时间的案例系统:-间隔长度(单位:usec),间隔(单位:usec)

Case systematic count-based: - Interval length (in packets), Spacing (in packets)

基于大小写的系统计数:-间隔长度(以数据包为单位),间隔(以数据包为单位)

Case uniform probabilistic (with equal probability per packet): - Sampling probability p

案例一致概率(每个包的概率相等):-抽样概率p

Case non-uniform probabilistic: - Calculation function for Sampling probability p (see also Section 5.2.2.4)

情形非均匀概率:-抽样概率p的计算函数(另见第5.2.2.4节)

Case flow state: - Information reported for flow state Sampling is not defined in this document (see also Section 5.2.2.4)

案例流量状态:-本文件未定义流量状态取样报告的信息(另见第5.2.2.4节)

7.2. Description of Filtering Techniques
7.2. 过滤技术说明

In this section, we define what elements are needed to describe the most common Filtering techniques. The structure closely parallels the one presented for the Sampling techniques.

在本节中,我们将定义描述最常见的过滤技术所需的元素。该结构与取样技术的结构非常相似。

Filter Description: SELECTOR_ID SELECTOR_TYPE SELECTOR_PARAMETERS

过滤器描述:选择器\ ID选择器\类型选择器\参数

Where:

哪里:

SELECTOR_ID: Unique ID for the packet filter. The ID can be calculated under consideration of the SELECTION SEQUENCE and a local ID.

选择器ID:数据包筛选器的唯一ID。可以在考虑选择序列和本地ID的情况下计算ID。

SELECTOR_TYPE: For Filtering processes, the SELECTOR TYPE defines what Filtering type is used. Values: Matching | Hashing | Router_state

选择器类型:对于过滤过程,选择器类型定义使用的过滤类型。值:匹配|哈希|路由器|状态

SELECTOR_PARAMETERS: For Filtering processes, the SELECTOR PARAMETERS define formally the common property of the packet being filtered. For the filters of type matching and hashing, the definitions have a lot of points in common.

选择器参数:对于过滤过程,选择器参数正式定义了被过滤数据包的公共属性。对于类型匹配和哈希的过滤器,这些定义有许多共同点。

Values:

价值观:

Case matching: - Information Element (from [RFC5102]) - Value (type in accordance to [RFC5102])

案例匹配:-信息元素(来自[RFC5102])-值(根据[RFC5102]输入)

In case of multiple match criteria, multiple "case matching" has to be bound by a logical AND.

在多个匹配条件的情况下,多个“案例匹配”必须由逻辑AND绑定。

   Case hashing:
      - Hash Domain (input bits from packet)
           - <Header type = IPv4>
           - <Input bit specification, header part>
           - <Header type =  IPv6>
           - <Input bit specification, header part>
           - <payload byte number N>
           - <Input bit specification, payload part>
      - Hash Function
           - Hash Function name
           - Length of input key (eliminate 0x bytes)
           - Output value (length M and bitmask)
           - Hash Selection Range, as a list of non-overlapping
             intervals [start value, end value] where value is in
             [0,2^M-1]
           - Additional parameters are dependent on specific Hash
             Function (e.g., hash input bits (seed))
        
   Case hashing:
      - Hash Domain (input bits from packet)
           - <Header type = IPv4>
           - <Input bit specification, header part>
           - <Header type =  IPv6>
           - <Input bit specification, header part>
           - <payload byte number N>
           - <Input bit specification, payload part>
      - Hash Function
           - Hash Function name
           - Length of input key (eliminate 0x bytes)
           - Output value (length M and bitmask)
           - Hash Selection Range, as a list of non-overlapping
             intervals [start value, end value] where value is in
             [0,2^M-1]
           - Additional parameters are dependent on specific Hash
             Function (e.g., hash input bits (seed))
        

Notes to input bits for case hashing:

大小写哈希输入位的注意事项:

- Input bits can be from header part only, from the payload part only, or from both.

- 输入位可以仅来自报头部分、仅来自有效负载部分或两者。

- The bit specification, for the header part, can be specified for IPv4 or IPv6 only, or both.

- 可以仅为IPv4或IPv6或两者指定头部分的位规范。

- In case of IPv4, the bit specification is a sequence of 20 hexadecimal numbers [00,FF] specifying a 20-byte bitmask to be applied to the header.

- 对于IPv4,位规范是20个十六进制数字[00,FF]的序列,指定要应用于报头的20字节位掩码。

- In case of IPv6, it is a sequence of 40 hexadecimal numbers [00,FF] specifying a 40-byte bitmask to be applied to the header.

- 对于IPv6,它是一个由40个十六进制数字[00,FF]组成的序列,指定要应用于报头的40字节位掩码。

- The bit specification, for the payload part, is a sequence of hexadecimal numbers [00,FF] specifying the bitmask to be applied to the first N bytes of the payload, as specified by the previous field. In case the hexadecimal number sequence is longer than N, only the first N numbers are considered.

- 有效负载部分的位规范是一系列十六进制数字[00,FF],指定应用于有效负载前N个字节的位掩码,如前一字段所指定。如果十六进制数序列长于N,则只考虑前N个数。

- In case the payload is shorter than N, the Hash Function cannot be applied. Other options, like padding with zeros, may be considered in the future.

- 如果有效负载小于N,则无法应用哈希函数。将来可能会考虑其他选项,如用零填充。

- A Hash Function cannot be defined on the options field of the IPv4 header, neither on stacked headers of IPv6.

- 无法在IPv4标头的选项字段上定义哈希函数,也不能在IPv6的堆叠标头上定义哈希函数。

- The Hash Selection Range defines a range of hash values (out of all possible results of the hash operation). If the hash result for a specific packet falls in this range, the packet is selected. If the value is outside the range, the packet is not selected. For example, if the selection interval specification is [1:3], [6:9] all packets are selected for which the hash result is 1,2,3,6,7,8, or 9. In all other cases, the packet is not selected.

- 散列选择范围定义了散列值的范围(散列操作的所有可能结果)。如果特定数据包的哈希结果在此范围内,则选择该数据包。如果该值超出范围,则不选择该数据包。例如,如果选择间隔规范为[1:3],[6:9],则选择哈希结果为1、2、3、6、7、8或9的所有分组。在所有其他情况下,不选择数据包。

Case router state:

案例路由器状态:

- Ingress interface at which the packet arrives equals a specified value

- 数据包到达的入口接口等于指定值

- Egress interface to which the packet is routed equals a specified value

- 数据包路由到的出口接口等于指定值

- Packet violated Access Control List (ACL) on the router

- 路由器上的数据包违反访问控制列表(ACL)

- Reverse Path Forwarding (RPF) failed for the packet

- 数据包的反向路径转发(RPF)失败

- Resource Reservation is insufficient for the packet

- 资源预留不足以容纳数据包

- No route is found for the packet

- 找不到该数据包的路由

- Origin AS equals a specified value or lies within a given range

- 原点等于指定值或位于给定范围内

- Destination AS equals a specified value or lies within a given range

- 目标值等于指定值或位于给定范围内

Note to case router state:

案例路由器状态的注意事项:

- All router state entries can be linked by AND operators

- 所有路由器状态条目都可以由和运算符链接

8. Composite Techniques
8. 复合技术

Composite schemes are realized by combining the Selector IDs into a Selection Sequence. The Selection Sequence contains all Selector IDs that are applied to the Packet Stream subsequently. Some examples of composite schemes are reported below.

组合方案是通过将选择器ID组合成一个选择序列来实现的。选择序列包含随后应用于分组流的所有选择器id。下文报告了一些复合方案的例子。

8.1. Cascaded Filtering->Sampling or Sampling->Filtering
8.1. 级联滤波->采样或采样->滤波

If a filter precedes a Sampling process, the role of Filtering is to create a set of "parent populations" from a single stream that can then be fed independently to different Sampling functions, with different parameters tuned for the Population itself (e.g., if streams of different intensity result from Filtering, it may be good to have different Sampling rates). If Filtering follows a Sampling process, the same Selection Fraction and type are applied to the whole stream, independently of the relative size of the streams resulting from the Filtering function. Moreover, also packets not destined to be selected in the Filtering operation will "load" the Sampling function. So, in principle, Filtering before Sampling allows a more accurate tuning of the Sampling procedure, but if filters are too complex to work at full line rate (e.g., because they have to access router state information), Sampling before Filtering may be a need.

如果在采样过程之前有一个过滤器,则过滤的作用是从单个流创建一组“父总体”,然后这些“父总体”可以独立地馈送到不同的采样函数,并为总体本身调整不同的参数(例如,如果过滤产生不同强度的流,最好采用不同的采样率)。如果过滤遵循采样过程,则相同的选择分数和类型将应用于整个流,与过滤功能产生的流的相对大小无关。此外,在过滤操作中未指定要选择的数据包也将“加载”采样功能。因此,原则上,采样前过滤允许对采样过程进行更精确的调整,但如果过滤器过于复杂,无法以全行速率工作(例如,因为它们必须访问路由器状态信息),则可能需要在过滤前进行采样。

8.2. Stratified Sampling
8.2. 分层抽样

Stratified Sampling is one example for using a composite technique. The basic idea behind stratified Sampling is to increase the estimation accuracy by using a priori information about correlations of the investigated characteristic with some other characteristic that is easier to obtain. The a priori information is used to perform an intelligent grouping of the elements of the parent Population. In this manner, a higher estimation accuracy can be achieved with the same sample size or the sample size can be reduced without reducing the estimation accuracy.

分层抽样是使用复合技术的一个例子。分层抽样的基本思想是通过使用调查特征与其他一些更容易获得的特征之间的相关性的先验信息来提高估计精度。先验信息用于对父总体的元素进行智能分组。以这种方式,可以使用相同的样本量实现更高的估计精度,或者可以在不降低估计精度的情况下减小样本量。

Stratified Sampling divides the Sampling process into multiple steps. First, the elements of the parent Population are grouped into subsets in accordance to a given characteristic. This grouping can be done in multiple steps. Then samples are taken from each subset.

分层抽样将抽样过程分为多个步骤。首先,根据给定的特征将父总体的元素分组为子集。此分组可分为多个步骤。然后从每个子集中抽取样本。

The stronger the correlation between the characteristic used to divide the parent Population (stratification variable) and the characteristic of interest (for which an estimate is sought after), the easier is the consecutive Sampling process and the higher is the stratification gain. For instance, if the dividing characteristic were equal to the investigated characteristic, each element of the subgroup would be a perfect representative of that characteristic. In this case, it would be sufficient to take one arbitrary element out of each subgroup to get the actual distribution of the characteristic in the parent Population. Therefore, stratified Sampling can reduce the costs for the Sampling process (i.e., the number of samples needed to achieve a given level of confidence).

用于划分亲本群体的特征(分层变量)与感兴趣的特征(寻求估计值)之间的相关性越强,连续抽样过程越容易,分层增益越高。例如,如果分割特征等于调查特征,则子组的每个元素都将是该特征的完美代表。在这种情况下,从每个子群中取出一个任意元素就足以得到特征在父总体中的实际分布。因此,分层抽样可以降低抽样过程的成本(即,达到给定置信水平所需的样本数量)。

For stratified Sampling, one has to specify classification rules for grouping the elements into subgroups and the Sampling scheme that is used within the subgroups. The classification rules can be expressed by multiple filters. For the Sampling scheme within the subgroups, the parameters have to be specified as described above. The use of stratified Sampling methods for measurement purposes is described for instance in [ClPB93] and [Zseb03].

对于分层抽样,必须指定将元素分组为子组的分类规则以及子组内使用的抽样方案。分类规则可以用多个过滤器表示。对于子组内的采样方案,必须如上所述指定参数。例如,[ClPB93]和[Zseb03]中描述了分层抽样方法在测量中的使用。

9. Security Considerations
9. 安全考虑

Security considerations concerning the choice of a Hash Function for Hash-based Selection have been discussed in Section 6.2.3. That section discussed a number of potential attacks to craft Packet Streams that are disproportionately detected and/or discover the Hash Function parameters, the vulnerabilities of different Hash Functions to these attacks, and practices to minimize these vulnerabilities.

第6.2.3节讨论了选择基于散列的选择的散列函数的安全注意事项。该部分讨论了一些可能的攻击,这些攻击是为了制造被不成比例地检测和/或发现哈希函数参数的数据包流,不同哈希函数对这些攻击的漏洞,以及将这些漏洞最小化的实践。

In addition to this, a user can gain knowledge about the start and stop triggers in time-based systematic Sampling, e.g., by sending test packets. This knowledge might allow users to modify their send schedule in a way that their packets are disproportionately selected or not selected [GoRe07].

除此之外,用户可以在基于时间的系统采样中获得关于启动和停止触发器的知识,例如,通过发送测试数据包。这一知识可能允许用户修改其发送计划,使其数据包被不成比例地选择或未被选择[GoRe07]。

For random Sampling, a cryptographically strong random number generator should be used in order to prevent that an advisory can predict the selection decision [GoRe07].

对于随机抽样,应使用加密强随机数生成器,以防止咨询可以预测选择决定[GoRe07]。

Further security threats can occur when Sampling parameters are configured or communicated to other entities. The configuration and reporting of Sampling parameters are out of scope of this document. Therefore, the security threats that originate from this kind of communication cannot be assessed with the information given in this document.

配置采样参数或与其他实体通信时,可能会出现进一步的安全威胁。采样参数的配置和报告不在本文件范围内。因此,无法使用本文件中给出的信息评估源自此类通信的安全威胁。

Some of these threats can probably be addressed by keeping configuration information confidential and by authenticating entities that configure Sampling. Nevertheless, a full analysis and assessment of threats for configuration and reporting has to be done if configuration or reporting methods are proposed.

其中一些威胁可能可以通过对配置信息保密和对配置采样的实体进行身份验证来解决。然而,如果提出配置或报告方法,则必须对配置和报告威胁进行全面分析和评估。

10. Contributors
10. 贡献者

Sharon Goldberg contributed to the security considerations for Hash-based Selection.

Sharon Goldberg为基于哈希的选择的安全考虑做出了贡献。

Sharon Goldberg Department of Electrical Engineering Princeton University F210-K EQuad Princeton, NJ 08544, USA EMail: goldbe@princeton.edu

Sharon Goldberg普林斯顿大学电气工程系F210-K EQuad Princeton,NJ 08544,美国电子邮件:goldbe@princeton.edu

11. Acknowledgments
11. 致谢

We would like to thank the PSAMP group, especially Benoit Claise and Stewart Bryant, for fruitful discussions and for proofreading the document. We thank Sharon Goldberg for her input on security issues concerning Hash-based Selection.

我们要感谢PSAMP小组,特别是Benoit Claise和Stewart Bryant,他们进行了富有成效的讨论,并对文件进行了校对。我们感谢Sharon Goldberg就基于哈希的选择的安全问题提供的意见。

12. References
12. 工具书类
12.1. Normative References
12.1. 规范性引用文件

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。

12.2. Informative References
12.2. 资料性引用

[AmCa89] Paul D. Amer, Lillian N. Cassel, "Management of Sampled Real-Time Network Measurements", 14th Conference on Local Computer Networks, October 1989, Minneapolis, pages 62-68, IEEE, 1989.

[AmCa89]Paul D.Amer,Lillian N.Cassel,“采样实时网络测量的管理”,第14届本地计算机网络会议,1989年10月,明尼阿波利斯,第62-68页,IEEE,1989年。

[BeCK96] M. Bellare, R. Canetti and H. Krawczyk, "Pseudorandom Functions Revisited: The Cascade Construction and its Concrete Security", Symposium on Foundations of Computer Science, 1996.

[BeCK96]M.Bellare,R.Canetti和H.Krawczyk,“重新审视伪随机函数:级联结构及其具体安全性”,计算机科学基础研讨会,1996年。

[ClPB93] K.C. Claffy, George C. Polyzos, Hans-Werner Braun, "Application of Sampling Methodologies to Network Traffic Characterization", Proceedings of ACM SIGCOMM'93, San Francisco, CA, USA, September 13 - 17, 1993.

Calpy,George C. Polyzos,Hans Werner Braun,“采样方法应用到网络流量表征”,ACM SigCOMM’93,旧金山,CA,美国,9月13日- 17, 1993的诉讼。

[DuGG02] N.G. Duffield, A. Gerber, M. Grossglauser, "Trajectory Engine: A Backend for Trajectory Sampling", IEEE Network Operations and Management Symposium 2002, Florence, Italy, April 15-19, 2002.

[DuGG02]N.G.Duffield,A.Gerber,M.Grossglauser,“轨迹引擎:轨迹采样的后端”,2002年IEEE网络运营和管理研讨会,意大利佛罗伦萨,2002年4月15-19日。

[DuGr00] N.G. Duffield, M. Grossglauser, "Trajectory Sampling for Direct Traffic Observation", Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 28 - September 1, 2000.

[DuGr00]N.G.Duffield,M.Grossglauser,“直接交通观测的轨迹采样”,2000年ACM SIGCOMM会议记录,瑞典斯德哥尔摩,2000年8月28日至9月1日。

[DuGr04] N.G. Duffield and M. Grossglauser "Trajectory Sampling with Unreliable Reporting", Proc IEEE Infocom 2004, Hong Kong, March 2004.

[DuGR04] N.G.Duffieland M.GrssGraseSe“轨迹采样与不可靠报告”,PROC IEEE ICOFCOM 2004,香港,2004年3月。

[DuLT01] N.G. Duffield, C. Lund, and M. Thorup, "Charging from Sampled Network Usage", ACM Internet Measurement Workshop IMW 2001, San Francisco, USA, November 1-2, 2001.

[Dult01] N.G. Duffield,C.Lund,M. Thorup,“从采样网络使用收费”,ACM互联网测量工作坊IMW 2001,旧金山,美国,十一月1-2,2001。

[EsVa01] C. Estan and G. Varghese, "New Directions in Traffic Measurement and Accounting", ACM SIGCOMM Internet Measurement Workshop 2001, San Francisco (CA) Nov. 2001.

C. Estan和G. Varghese,《交通测量与会计新方向》,ACM SIGCOMM互联网测量工作坊2001,旧金山(CA)11月2001日。

[GoRe07] S. Goldberg, J. Rexford, "Security Vulnerabilities and Solutions for Packet Sampling", IEEE Sarnoff Symposium, Princeton, NJ, May 2007.

[GoRe07]S.Goldberg,J.Rexford,“数据包采样的安全漏洞和解决方案”,IEEE Sarnoff研讨会,新泽西州普林斯顿,2007年5月。

[HT52] D.G. Horvitz and D.J. Thompson, "A Generalization of Sampling without replacement from a Finite Universe" J. Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952.

[HT52]D.G.Horvitz和D.J.Thompson,“有限宇宙中无替换采样的推广”,J.Amer。统计学家。《联合公报》第47卷,第663-685页,1952年。

[Henk08] Christian Henke, Evaluation of Hash Functions for Multipoint Sampling in IP Networks, Diploma Thesis, TU Berlin, April 2008.

[Henk08]Christian Henke,IP网络中多点采样哈希函数的评估,毕业论文,柏林大学,2008年4月。

[HeSZ08] Christian Henke, Carsten Schmoll, Tanja Zseby, Evaluation of Header Field Entropy for Hash-Based Packet Selection, Proceedings of Passive and Active Measurement Conference PAM 2008, Cleveland, Ohio, USA, April 2008.

[HeSZ08]Christian Henke,Carsten Schmoll,Tanja Zseby,基于散列的数据包选择的报头字段熵评估,被动和主动测量会议记录PAM 2008,美国俄亥俄州克利夫兰,2008年4月。

[Jenk97] B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal, September 1997. http://burtleburtle.net/bob/hash/doobs.html.

[Jenk97]B.Jenkins,“算法巷”,Dobb博士期刊,1997年9月。http://burtleburtle.net/bob/hash/doobs.html.

[JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic Estimation for the Largest Sources on a Network, Using Packet Sampling with Limited Storage", HP technical report, Managemenr, Mathematics and Security Department, HP Laboratories, Bristol, March 1992, http://www.hpl.hp.com/techreports/92/HPL-92-35.html.

[JePP92]Jonathan Jedwab,Peter Phaal,Bob Pinna,“网络上最大数据源的流量估计,使用有限存储的数据包采样”,惠普技术报告,管理者,数学和安全部,惠普实验室,布里斯托尔,1992年3月,http://www.hpl.hp.com/techreports/92/HPL-92-35.html.

[Moli03] M. Molina, "A scalable and efficient methodology for flow monitoring in the Internet", International Teletraffic Congress (ITC-18), Berlin, Sep. 2003.

[Moli03]M.Molina,“互联网流量监测的可扩展有效方法”,国际电信流量大会(ITC-18),柏林,2003年9月。

[MoND05] M. Molina, S. Niccolini, N.G. Duffield, "A Comparative Experimental Study of Hash Functions Applied to Packet Sampling", International Teletraffic Congress (ITC-19), Beijing, August 2005.

[MoND05]M.Molina,S.Niccolini,N.G.Duffield,“应用于分组抽样的哈希函数的比较实验研究”,国际电信业务大会(ITC-19),北京,2005年8月。

[RFC1141] Mallory, T. and A. Kullberg, "Incremental updating of the Internet checksum", RFC 1141, January 1990.

[RFC1141]Mallory,T.和A.Kullberg,“互联网校验和的增量更新”,RFC 114119990年1月。

[RFC1624] Rijsinghani, A., Ed., "Computation of the Internet Checksum via Incremental Update", RFC 1624, May 1994.

[RFC1624]Rijsinghani,A.,编辑,“通过增量更新计算互联网校验和”,RFC1624,1994年5月。

[RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S. Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification", RFC 2205, September 1997.

[RFC2205]Braden,R.,Ed.,Zhang,L.,Berson,S.,Herzog,S.,和S.Jamin,“资源预留协议(RSVP)——版本1功能规范”,RFC 22052997年9月。

[RFC3704] Baker, F. and P. Savola, "Ingress Filtering for Multihomed Networks", BCP 84, RFC 3704, March 2004.

[RFC3704]Baker,F.和P.Savola,“多宿网络的入口过滤”,BCP 84,RFC 37042004年3月。

[RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander, "Requirements for IP Flow Information Export (IPFIX)", RFC 3917, October 2004.

[RFC3917]Quitek,J.,Zseby,T.,Claise,B.,和S.Zander,“IP流信息导出(IPFIX)的要求”,RFC 39172004年10月。

[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4271]Rekhter,Y.,Ed.,Li,T.,Ed.,和S.Hares,Ed.,“边境网关协议4(BGP-4)”,RFC 42712006年1月。

[RFC5101] Claise, B., Ed., "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information", RFC 5101, January 2008.

[RFC5101]Claise,B.,Ed.,“交换IP流量信息的IP流量信息导出(IPFIX)协议规范”,RFC 5101,2008年1月。

[RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. Meyer, "Information Model for IP Flow Information Export", RFC 5102, January 2008.

[RFC5102]Quitek,J.,Bryant,S.,Claise,B.,Aitken,P.,和J.Meyer,“IP流信息导出的信息模型”,RFC 5102,2008年1月。

[RFC5474] Duffield, N., Ed., "A Framework for Packet Selection and Reporting", RFC 5474, March 2009.

[RFC5474]Duffield,N.,Ed.“数据包选择和报告框架”,RFC 5474,2009年3月。

[RFC5476] Claise, B., Ed., "Packet Sampling (PSAMP) Protocol Specifications", RFC 5476, March 2009.

[RFC5476]Claise,B.,编辑,“数据包采样(PSAMP)协议规范”,RFC 54762009年3月。

[RFC5477] Dietz, T., Claise, B., Aitken, P., Dressler, F., and G. Carle, "Information Model for Packet Sampling Exports", RFC 5477, March 2009.

[RFC5477]Dietz,T.,Claise,B.,Aitken,P.,Dressler,F.,和G.Carle,“数据包抽样出口的信息模型”,RFC 5477,2009年3月。

[Zseb03] T. Zseby, "Stratification Strategies for Sampling-based Non-intrusive Measurement of One-way Delay", Proceedings of Passive and Active Measurement Workshop (PAM 2003), La Jolla, CA, USA, pp. 171-179, April 2003.

[Zseb03]T.Zseby,“基于采样的单向延迟非侵入性测量的分层策略”,被动和主动测量研讨会论文集(PAM 2003),美国加利福尼亚州拉霍拉,第171-179页,2003年4月。

[ZsZC01] Tanja Zseby, Sebastian Zander, Georg Carle. Evaluation of Building Blocks for Passive One-way-delay Measurements. Proceedings of Passive and Active Measurement Workshop (PAM 2001), Amsterdam, The Netherlands, April 23-24, 2001.

[ZsZC01]坦贾·泽比、塞巴斯蒂安·赞德、格奥尔格·卡尔。评估被动单向延迟测量的构建块。被动和主动测量研讨会论文集(PAM 2001),荷兰阿姆斯特丹,2001年4月23-24日。

Appendix A. Hash Functions
附录A.散列函数
A.1. IP Shift-XOR (IPSX) Hash Function
A.1. IP移位异或(IPSX)哈希函数

The IPSX Hash Function is tailored for acting on IP version 4 packets. It exploits the structure of IP packets and in particular the variability expected to be exhibited within different fields of the IP packet in order to furnish a hash value with little apparent correlation with individual packet fields. Fields from the IPv4 and TCP/UDP headers are used as input. The IPSX Hash Function uses a small number of simple instructions.

IPSX哈希函数是为处理IP版本4数据包而定制的。它利用IP分组的结构,尤其是期望在IP分组的不同字段中显示的可变性,以便提供与各个分组字段几乎没有明显相关性的散列值。IPv4和TCP/UDP头中的字段用作输入。IPSX哈希函数使用少量简单指令。

Input parameters: None

输入参数:无

Built-in parameters: None

内置参数:无

Output: The output of the IPSX is a 16-bit number

输出:IPSX的输出为16位数字

Functioning:

运转:

The functioning can be divided into two parts: input selection, whose forms are composite input from various portions of the IP packet, followed by computation of the hash on the composite.

功能可分为两部分:输入选择,其形式是来自IP数据包不同部分的复合输入,然后是复合数据包上散列的计算。

Input Selection:

输入选择:

The raw input is drawn from the first 20 bytes of the IP packet header and the first 8 bytes of the IP payload. If IP options are not used, the IP header has 20 bytes, and hence the two portions adjoin and comprise the first 28 bytes of the IP packet. We now use the raw input as four 32-bit subportions of these 28 bytes. We specify the input by bit offsets from the start of IP header or payload.

原始输入来自IP数据包头的前20个字节和IP有效负载的前8个字节。如果未使用IP选项,则IP报头有20个字节,因此这两个部分邻接并构成IP数据包的前28个字节。我们现在使用原始输入作为这28个字节的四个32位子部分。我们通过从IP报头或有效负载开始的位偏移量来指定输入。

f1 = bits 32 to 63 of the IP header, comprising the IP identification field, flags, and fragment offset.

f1=IP报头的位32至63,包括IP标识字段、标志和片段偏移量。

f2 = bits 96 to 127 of the IP header, the source IP address.

f2=IP报头的第96位到第127位,即源IP地址。

f3 = bits 128 to 159 of the IP header, the destination IP address.

f3=IP报头的128至159位,即目标IP地址。

f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4 comprises the TCP sequence number followed by the message length. For a UDP packet, f4 comprises the UDP checksum.

f4=IP有效负载的位32至63。对于TCP数据包,f4包括TCP序列号和消息长度。对于UDP数据包,f4包含UDP校验和。

Hash Computation:

散列计算:

The hash is computed from f1, f2, f3, and f4 by a combination of XOR (^), right shift (>>), and left shift (<<) operations. The intermediate quantities h1, v1, and v2 are 32-bit numbers.

散列是通过XOR(^)、右移位(>>)和左移位(<<)操作的组合从f1、f2、f3和f4计算的。中间量h1、v1和v2是32位数字。

1. v1 = f1 ^ f2; 2. v2 = f3 ^ f4; 3. h1 = v1 << 8; 4. h1 ^= v1 >> 4; 5. h1 ^= v1 >> 12; 6. h1 ^= v1 >> 16; 7. h1 ^= v2 << 6; 8. h1 ^= v2 << 10; 9. h1 ^= v2 << 14; 10. h1 ^= v2 >> 7;

1. v1=f1^f2;2.v2=f3^f4;3.h1=v1<8;4.h1^=v1>>4;5.h1^=v1>>12;6.h1^=v1>>16;7.h1^=v2<6;8.h1^=v2<10;9h1^=v2<14;10h1^=v2>>7;

The output of the hash is the least significant 16 bits of h1.

散列的输出是h1的最低有效16位。

A.2. BOB Hash Function
A.2. 鲍勃散列函数

The BOB Hash Function is a Hash Function designed for having each bit of the input affecting every bit of the return value and using both 1-bit and 2-bit deltas to achieve the so-called avalanche effect [Jenk97]. This function was originally built for hash table lookup with fast software implementation.

BOB散列函数是一种散列函数,设计用于使输入的每一位影响返回值的每一位,并使用1位和2位增量来实现所谓的雪崩效应[Jenk97]。此函数最初是为哈希表查找而构建的,具有快速的软件实现。

Input parameters:

输入参数:

The input parameters of such a function are:

该函数的输入参数为:

- the length of the input string (key) to be hashed, in bytes. The elementary input blocks of BOB hash are the single bytes; therefore, no padding is needed.

- 要散列的输入字符串(键)的长度,以字节为单位。BOB散列的基本输入块是单个字节;因此,不需要填充。

- an init value (an arbitrary 32-bit number).

- 初始值(任意32位数字)。

Built-in parameters:

内置参数:

The BOB hash uses the following built-in parameter:

BOB哈希使用以下内置参数:

- the golden ratio (an arbitrary 32-bit number used in the Hash Function computation: its purpose is to avoid mapping all zeros to all zeros).

- 黄金比率(散列函数计算中使用的任意32位数字:其目的是避免将所有零映射到所有零)。

Note: The mix sub-function (see mix (a,b,c) macro in the reference code below) has a number of parameters governing the shifts in the registers. The one presented is not the only possible choice.

注:mix子功能(参见下面参考代码中的mix(a、b、c)宏)有许多参数控制寄存器中的移位。提出的方案并不是唯一可能的选择。

It is an open point whether these may be considered additional built-in parameters to specify at function configuration.

这些参数是否可以被视为在功能配置时指定的附加内置参数,这是一个未知数。

Output:

输出:

The output of the BOB function is a 32-bit number. It should be specified:

BOB函数的输出为32位数字。应规定:

- A 32-bit mask to apply to the output

- 应用于输出的32位掩码

- The Selection Range as a list of non-overlapping intervals [start value, end value] where value is in [0,2^32]

- 选择范围为非重叠间隔[起始值,结束值]列表,其中值为[0,2^32]

Functioning:

运转:

The hash value is obtained computing first an initialization of an internal state (composed of three 32-bit numbers, called a, b, c in the reference code below), then, for each input byte of the key the internal state is combined by addition and mixed using the mix sub-function. Finally, the internal state mixed one last time and the third number of the state (c) is chosen as the return value.

首先通过计算内部状态(由三个32位数字组成,在下面的参考代码中称为a、b、c)的初始化来获得散列值,然后,对于密钥的每个输入字节,通过加法和使用mix子函数混合内部状态。最后,选择最后一次混合的内部状态和状态的第三个数字(c)作为返回值。

   typedef unsigned long int  ub4;   /* unsigned 4-byte quantities
   */
   typedef unsigned      char ub1;   /* unsigned 1-byte quantities
   */
        
   typedef unsigned long int  ub4;   /* unsigned 4-byte quantities
   */
   typedef unsigned      char ub1;   /* unsigned 1-byte quantities
   */
        
   #define hashsize(n) ((ub4)1<<(n))
   #define hashmask(n) (hashsize(n)-1)
        
   #define hashsize(n) ((ub4)1<<(n))
   #define hashmask(n) (hashsize(n)-1)
        
   /* ------------------------------------------------------
     mix -- mix three 32-bit values reversibly.
        
   /* ------------------------------------------------------
     mix -- mix three 32-bit values reversibly.
        

For every delta with one or two bits set, and the deltas of all three high bits or all three low bits, whether the original value of a,b,c is almost all zero or is uniformly distributed, * If mix() is run forward or backward, at least 32 bits in a,b,c have at least 1/4 probability of changing. * If mix() is run forward, every bit of c will change between 1/3 and 2/3 of the time (well, 22/100 and 78/100 for some 2- bit deltas) mix() was built out of 36 single-cycle latency instructions in a structure that could support 2x parallelism, like so:

对于设置了一个或两个位的每个增量,以及所有三个高位或所有三个低位的增量,无论a、b、c的原始值几乎都为零还是均匀分布,*如果向前或向后运行mix(),则a、b、c中至少32位的变化概率至少为1/4。*如果mix()向前运行,c的每一位将在1/3和2/3之间变化(对于某些2位增量,是22/100和78/100)。mix()是由36条单周期延迟指令组成的,其结构可以支持2倍并行,如下所示:

           a -= b;
           a -= c; x = (c>>13);
           b -= c; a ^= x;
           b -= a; x = (a<<8);
           c -= a; b ^= x;
           c -= b; x = (b>>13);
           ...
   Unfortunately, superscalar Pentiums and Sparcs can't take
   advantage of that parallelism.  They've also turned some of
   those single-cycle latency instructions into multi-cycle latency
   instructions
        
           a -= b;
           a -= c; x = (c>>13);
           b -= c; a ^= x;
           b -= a; x = (a<<8);
           c -= a; b ^= x;
           c -= b; x = (b>>13);
           ...
   Unfortunately, superscalar Pentiums and Sparcs can't take
   advantage of that parallelism.  They've also turned some of
   those single-cycle latency instructions into multi-cycle latency
   instructions
        
   ------------------------------------------------------------*/
        
   ------------------------------------------------------------*/
        
     #define mix(a,b,c)  \
     { \
       a -= b; a -= c; a ^= (c>>13); \
       b -= c; b -= a; b ^= (a<<8); \
       c -= a; c -= b; c ^= (b>>13); \
       a -= b; a -= c; a ^= (c>>12);  \
       b -= c; b -= a; b ^= (a<<16); \
       c -= a; c -= b; c ^= (b>>5); \
       a -= b; a -= c; a ^= (c>>3);  \
       b -= c; b -= a; b ^= (a<<10); \
       c -= a; c -= b; c ^= (b>>15); \
     }
        
     #define mix(a,b,c)  \
     { \
       a -= b; a -= c; a ^= (c>>13); \
       b -= c; b -= a; b ^= (a<<8); \
       c -= a; c -= b; c ^= (b>>13); \
       a -= b; a -= c; a ^= (c>>12);  \
       b -= c; b -= a; b ^= (a<<16); \
       c -= a; c -= b; c ^= (b>>5); \
       a -= b; a -= c; a ^= (c>>3);  \
       b -= c; b -= a; b ^= (a<<10); \
       c -= a; c -= b; c ^= (b>>15); \
     }
        
     /* -----------------------------------------------------------
   hash() -- hash a variable-length key into a 32-bit value
   k       : the key (the unaligned variable-length array of bytes)
   len     : the length of the key, counting by bytes
   initval : can be any 4-byte value
   Returns a 32-bit value.  Every bit of the key affects every bit
   of the return value.  Every 1-bit and 2-bit delta achieves
   avalanche.  About 6*len+35 instructions.
        
     /* -----------------------------------------------------------
   hash() -- hash a variable-length key into a 32-bit value
   k       : the key (the unaligned variable-length array of bytes)
   len     : the length of the key, counting by bytes
   initval : can be any 4-byte value
   Returns a 32-bit value.  Every bit of the key affects every bit
   of the return value.  Every 1-bit and 2-bit delta achieves
   avalanche.  About 6*len+35 instructions.
        
   The best hash table sizes are powers of 2.  There is no need to do
   mod a prime (mod is so slow!).  If you need less than 32 bits, use a
   bitmask.  For example, if you need only 10 bits, do h = (h &
   hashmask(10)), in which case, the hash table should have hashsize(10)
   elements.
        
   The best hash table sizes are powers of 2.  There is no need to do
   mod a prime (mod is so slow!).  If you need less than 32 bits, use a
   bitmask.  For example, if you need only 10 bits, do h = (h &
   hashmask(10)), in which case, the hash table should have hashsize(10)
   elements.
        
   If you are hashing n strings (ub1 **)k, do it like this: for (i=0,
   h=0; i<n; ++i) h = hash( k[i], len[i], h);
        
   If you are hashing n strings (ub1 **)k, do it like this: for (i=0,
   h=0; i<n; ++i) h = hash( k[i], len[i], h);
        
   By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may use
   this code any way you wish, private, educational, or commercial.
   It's free.  See http://burtleburtle.net/bob/hash/evahash.html.
   Use for hash table lookup, or anything where one collision in 2^^32
   is acceptable.  Do NOT use for cryptographic purposes.
    ----------------------------------------------------------- */
        
   By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may use
   this code any way you wish, private, educational, or commercial.
   It's free.  See http://burtleburtle.net/bob/hash/evahash.html.
   Use for hash table lookup, or anything where one collision in 2^^32
   is acceptable.  Do NOT use for cryptographic purposes.
    ----------------------------------------------------------- */
        
     ub4 bob_hash(k, length, initval)
     register ub1 *k;        /* the key */
     register ub4  length;   /* the length of the key */
     register ub4  initval;  /* an arbitrary value */
     {
        register ub4 a,b,c,len;
        
     ub4 bob_hash(k, length, initval)
     register ub1 *k;        /* the key */
     register ub4  length;   /* the length of the key */
     register ub4  initval;  /* an arbitrary value */
     {
        register ub4 a,b,c,len;
        
        /* Set up the internal state */
        len = length;
        a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value
   */
        c = initval;         /* another arbitrary value */
        
        /* Set up the internal state */
        len = length;
        a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value
   */
        c = initval;         /* another arbitrary value */
        
   /*------------------------------------ handle most of the key */
        
   /*------------------------------------ handle most of the key */
        
        while (len >= 12)
        {
           a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16)
   +((ub4)k[3]<<24));
           b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16)
   +((ub4)k[7]<<24));
           c += (k[8] +((ub4)k[9]<<8)
   +((ub4)k[10]<<16)+((ub4)k[11]<<24));
           mix(a,b,c);
           k += 12; len -= 12;
        }
        
        while (len >= 12)
        {
           a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16)
   +((ub4)k[3]<<24));
           b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16)
   +((ub4)k[7]<<24));
           c += (k[8] +((ub4)k[9]<<8)
   +((ub4)k[10]<<16)+((ub4)k[11]<<24));
           mix(a,b,c);
           k += 12; len -= 12;
        }
        
        /*---------------------------- handle the last 11 bytes */
        c += length;
        switch(len)       /* all the case statements fall through*/
        {
        case 11: c+=((ub4)k[10]<<24);
        case 10: c+=((ub4)k[9]<<16);
        case 9 : c+=((ub4)k[8]<<8);
           /* the first byte of c is reserved for the length */
        case 8 : b+=((ub4)k[7]<<24);
        case 7 : b+=((ub4)k[6]<<16);
        case 6 : b+=((ub4)k[5]<<8);
        case 5 : b+=k[4];
        case 4 : a+=((ub4)k[3]<<24);
        case 3 : a+=((ub4)k[2]<<16);
        
        /*---------------------------- handle the last 11 bytes */
        c += length;
        switch(len)       /* all the case statements fall through*/
        {
        case 11: c+=((ub4)k[10]<<24);
        case 10: c+=((ub4)k[9]<<16);
        case 9 : c+=((ub4)k[8]<<8);
           /* the first byte of c is reserved for the length */
        case 8 : b+=((ub4)k[7]<<24);
        case 7 : b+=((ub4)k[6]<<16);
        case 6 : b+=((ub4)k[5]<<8);
        case 5 : b+=k[4];
        case 4 : a+=((ub4)k[3]<<24);
        case 3 : a+=((ub4)k[2]<<16);
        
        case 2 : a+=((ub4)k[1]<<8);
        case 1 : a+=k[0];
          /* case 0: nothing left to add */
        }
        mix(a,b,c);
        /*-------------------------------- report the result */
        return c;
     }
        
        case 2 : a+=((ub4)k[1]<<8);
        case 1 : a+=k[0];
          /* case 0: nothing left to add */
        }
        mix(a,b,c);
        /*-------------------------------- report the result */
        return c;
     }
        

Authors' Addresses

作者地址

Tanja Zseby Fraunhofer Institute for Open Communication Systems Kaiserin-Augusta-Allee 31 10589 Berlin Germany Phone: +49-30-34 63 7153 EMail: tanja.zseby@fokus.fraunhofer.de

Tanja Zseby Fraunhofer开放通信系统研究所Kaiserin Augusta Allee 31 10589柏林德国电话:+49-30-34 63 7153电子邮件:Tanja。zseby@fokus.fraunhofer.de

Maurizio Molina DANTE City House 126-130 Hills Road Cambridge CB21PQ United Kingdom Phone: +44 1223 371 300 EMail: maurizio.molina@dante.org.uk

Maurizio Molina DANTE City House 126-130 Hills Road Cambridge CB21PQ英国电话:+44 1223 371 300电子邮件:Maurizio。molina@dante.org.uk

Nick Duffield AT&T Labs - Research Room B-139 180 Park Ave Florham Park, NJ 07932 USA Phone: +1 973-360-8726 EMail: duffield@research.att.com

Nick Duffield AT&T实验室-研究室B-139美国新泽西州弗洛勒姆公园公园大道180号07932电话:+1 973-360-8726电子邮件:duffield@research.att.com

Saverio Niccolini Network Laboratories, NEC Europe Ltd. Kurfuerstenanlage 36 69115 Heidelberg Germany Phone: +49-6221-9051118 EMail: saverio.niccolini@netlab.nec.de

Saverio Niccolini网络实验室,NEC欧洲有限公司Kurfuerstenalage 36 69115德国海德堡电话:+49-6221-9051118电子邮件:Saverio。niccolini@netlab.nec.de

Frederic Raspall EPSC-UPC Dept. of Telematics Av. del Canal Olimpic, s/n Edifici C4 E-08860 Castelldefels, Barcelona Spain EMail: fredi@entel.upc.es

Frederic Raspall EPSC-UPC远程信息处理Av部。del Canal Olipic,s/n大厦C4 E-08860卡斯特德费尔斯,巴塞罗那西班牙电子邮件:fredi@entel.upc.es