Network Working Group                                    C. Bestler, Ed.
Request for Comments: 5045                                      Neterion
Category: Informational                                         L. Coene
                                                  Nokia Siemens Networks
                                                            October 2007
        

Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct Data Placement Protocol (DDP)

Status of This Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Abstract

This document describes the applicability of Remote Direct Memory Access Protocol (RDMAP) and the Direct Data Placement Protocol (DDP). It compares and contrasts the different transport options over IP that DDP can use, provides guidance to ULP developers on choosing between available transports and/or how to be indifferent to the specific transport layer used, compares use of DDP with direct use of the supporting transports, and compares DDP over IP transports with non-IP transports that support RDMA functionality.

Table of Contents

   1.  Introduction
   2.  Definitions
   3.  Direct Placement
     3.1.  Direct Placement Using Only the LLP
     3.2.  Fewer Required ULP Interactions
   4.  Tagged Messages
     4.1.  Order-Independent Reception
     4.2.  Reduced ULP Notifications
     4.3.  Simplified ULP Exchanges
     4.4.  Order-Independent Sending
     4.5.  Untagged Messages and Tagged Buffers as ULP Credits
   5.  RDMA Read
   6.  LLP Comparisons
     6.1.  Multistreaming Implications
     6.2.  Out-of-Order Reception Implications
     6.3.  Header and Marker Overhead
     6.4.  Middlebox Support
     6.5.  Processing Overhead
     6.6.  Data Integrity Implications
       6.6.1.  MPA/TCP Specifics
       6.6.2.  SCTP Specifics
     6.7.  Non-IP Transports
       6.7.1.  No RDMA-Layer Ack
     6.8.  Other IP Transports
     6.9.  LLP-Independent Session Establishment
       6.9.1.  RDMA-Only Session Establishment
       6.9.2.  RDMA-Conditional Session Establishment
   7.  Local Interface Implications
   8.  Security Considerations
     8.1.  Connection/Association Setup
     8.2.  Tagged Buffer Exposure
     8.3.  Impact of Encrypted Transports
   9.  References
     9.1.  Normative References
     9.2.  Informative References
        
1. Introduction

Remote Direct Memory Access Protocol (RDMAP) [RFC5040] and Direct Data Placement (DDP) [RFC5041] work together to provide application-independent efficient placement of application payload directly into buffers specified by the Upper Layer Protocol (ULP).

The DDP protocol is responsible for direct placement of received payload into ULP-specified buffers. The RDMAP protocol provides completion notifications to the ULP and support for Data-Sink-initiated fetch of Advertised Buffers (RDMA Reads).

DDP and RDMAP are both application-independent protocols that allow the ULP to perform remote direct data placement. DDP can use multiple standard IP transports including SCTP and TCP.

By clarifying the situations where the functionality of these protocols is applicable, this document can guide implementers and application and protocol designers in selecting which protocols to use.

The applicability of RDMAP/DDP is driven by their unique capabilities:

o This document will discuss when common data placement procedures are of more benefit to applications than application-specific solutions built on top of direct use of the underlying transport.

o DDP supports both Untagged and Tagged Buffers. Tagged Buffers allow the Data Sink ULP to be indifferent to the order in which (or the messages in which) the Data Source sent the data, and to the order in which packets are received. Typically, tagged data can be used for payload transfer, while untagged is best used for control messages. However, each upper-layer protocol can determine the optimal use of Tagged and Untagged Messages for itself. This document will discuss when Data Source flexibility is of benefit to applications.

o RDMAP consolidates ULP notifications, thereby minimizing the number of required ULP interactions.

o RDMAP defines RDMA Reads, which allow remote access to Advertised Buffers. This document will review the advantages of using RDMA Reads as contrasted to alternate solutions.

A more comprehensive introduction to the RDMAP and DDP protocols and discussion of their security considerations can be found in [RFC5042].

Some non-IP transports, such as InfiniBand, directly integrate RDMA features. This document will review the applicability of providing RDMA services over ubiquitous IP transports instead of over customized transport protocols. Because DDP is defined cleanly as a layer over existing IP transports, DDP has simpler ordering rules than some prior RDMA protocols. This may have some implications for application designers.

The full capabilities of DDP and RDMAP can be realized only by applications that are designed to exploit them. The coexistence of RDMAP/DDP-aware local interfaces with traditional socket interfaces will also be explored.

Finally, DDP support is defined for at least two IP transports: SCTP [RFC5043] and TCP [RFC5044]. The rationale for supporting both transports is reviewed, as well as when each would be the appropriate selection.

2. Definitions

Advertisement - The act of informing a Remote Peer that a local RDMA Buffer is available to it. A Node makes available an RDMA Buffer for incoming RDMA Read or RDMA Write access by informing its RDMA/DDP peer of the Tagged Buffer identifiers (STag, base address, and buffer length). This Advertisement of Tagged Buffer information is not defined by RDMA/DDP and is left to the ULP. A typical method would be for the Local Peer to embed the Tagged Buffer's Steering Tag, base address, and length in a Send Message destined for the Remote Peer.

Data Sink - The peer receiving a data payload. Note that the Data Sink can be required to both send and receive RDMA/DDP Messages to transfer a data payload.

Data Source - The peer sending a data payload. Note that the Data Source can be required to both send and receive RDMA/DDP Messages to transfer a data payload.

Lower Layer Protocol (LLP) - The transport protocol that provides services to DDP. This is an IP transport with any required adaptation layer. Adaptation layers are defined for SCTP and TCP.

Steering Tag (STag) - An identifier of a Tagged Buffer on a Node, valid as defined within a protocol specification.

Tagged Message - A DDP message that is directed to a ULP-specified buffer based upon embedded addressing information. In the immediate sense, the destination buffer is specified by the message sender. The message receiver is given no independent indication that a Tagged Message has been received.

Untagged Message - A DDP message that is directed to a ULP-specified buffer based upon a Message Sequence Number being matched with a receiver-supplied buffer. The destination buffer is specified by the message receiver. The message receiver is notified by some mechanism that an Untagged Message has been received.

Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services. In addition to protocols such as iSER [RFC5046] and NFSv4 over RDMA [NFSDIRECT], the ULP may be embedded in an application or a middleware layer, as is often the case for the Sockets Direct Protocol (SDP) and Remote Procedure Call (RPC) protocols.

3. Direct Placement

Direct Data Placement optimizes the placement of ULP Payload into the correct destination buffers, typically eliminating intermediate copying. Placement is enabled without regard to order of arrival, order of transmission, or requirement of per-placement interaction with the ULP.

RDMAP minimizes the required ULP interactions. This capability is most valuable for applications that require multiple transport layer packets for each required ULP interaction.

3.1. Direct Placement Using Only the LLP

Direct data placement can be achieved without RDMA. Pre-posting of receive buffers could allow a non-RDMA network stack to place data directly to user buffers.

The degree to which DDP optimizes depends on which transport it is being compared with, and on the nature of the local interface. Without RDMAP/DDP, pre-posting buffers requires the receiving side to accurately predict the required buffers and their sizes. This is not feasible for all ULPs. By contrast, DDP only requires the ULP to predict the sequence and size of incoming Untagged Messages.

An application that could predict incoming messages and required nothing more than direct placement into buffers might be able to do so with a properly designed local interface to native SCTP or TCP (without RDMA). This is easier using native SCTP because the application would only have to predict the sequence of messages and the maximum size of each message, not the exact size.

The main benefit of DDP for such an application would be that pre-posting of receive buffers is a mandated local interface capability, and that predictions can always be made on a per-message basis (not per byte).
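
The per-message nature of pre-posted receives can be illustrated with a minimal Python sketch. It is not taken from the DDP specification: the class `UntaggedQueue` and its methods `post_receive` and `place` are hypothetical names, and the model keeps only the idea that an Untagged Message is matched to a receiver-supplied buffer by Message Sequence Number (MSN) and that the ULP predicts a maximum size per message.

```python
class UntaggedQueue:
    """Hypothetical model of a single untagged receive queue."""

    def __init__(self):
        self.buffers = {}   # MSN -> buffer pre-posted by the ULP
        self.next_msn = 1   # MSN that the next posted buffer will serve

    def post_receive(self, max_size):
        """The ULP predicts only the maximum size of the next message."""
        msn = self.next_msn
        self.buffers[msn] = bytearray(max_size)
        self.next_msn += 1
        return msn

    def place(self, msn, offset, payload):
        """Place one segment of message `msn` directly into its buffer."""
        buf = self.buffers.get(msn)
        if buf is None:
            raise LookupError(f"no buffer posted for MSN {msn}")
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("segment does not fit the pre-posted buffer")
        buf[offset:offset + len(payload)] = payload


queue = UntaggedQueue()
first = queue.post_receive(max_size=16)
second = queue.post_receive(max_size=16)

# Segments are placed as they arrive; no intermediate copy is modeled.
queue.place(second, 0, b"reply-2")
queue.place(first, 6, b"-1")
queue.place(first, 0, b"header")
```

In this sketch the prediction is made once per message (one `post_receive` call each), never per byte, which is the property the paragraph above attributes to DDP.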

The Lower Layer Protocol, LLP, can also be used directly if ULP-specific knowledge is built into the protocol stack to allow "parse and place" handling of received packets. Such a solution either requires interaction with the ULP or the protocol stack's knowledge of ULP-specific syntax rules.

DDP achieves the benefits of directly placing incoming payload without requiring tight coupling between the ULP and the protocol stack. However, "parse and place" capabilities can certainly provide equivalent services to a limited number of ULPs.

3.2. Fewer Required ULP Interactions

While reducing the number of required ULP interactions is in itself desirable, it is critical for high-speed connections. The burst packet rate for a high-speed interface could easily exceed the host system's ability to switch ULP contexts.

Content access applications are important examples of applications that require high bandwidth and can transfer a significant amount of content between required ULP interactions. These applications include file access protocols (NAS), storage access (SAN), database access, and other application-specific forms of content access such as HTTP, XML, and email.

4. Tagged Messages

This section covers the major benefits from the use of Tagged Messages.

A more critical advantage of DDP is the ability of the Data Source to use Tagged Buffers. Tagging messages allows the Data Source to choose the ordering and packetization of its payload deliveries. With direct data placement based solely upon pre-posted receives, the packetization and delivery of payload must be agreed by the ULP peers in advance.

The Upper Layer Protocol can allocate content between Untagged and/or Tagged Messages to maximize the potential optimizations. Placing content within an Untagged Message can deliver the content in the same packet that signals completion to the receiver. This can improve latency. It can even eliminate round trips. But it requires making larger anonymous buffers available.

Some examples of data that typically belongs in the Untagged Message would include:

o short fixed-size control data that is inherently part of the control message. This is especially true when the data is a required part of the control message.

o relatively short payload that is almost always needed, especially when its inclusion would eliminate a round trip to fetch the data. Examples would include the initial data on a write request and Advertisements of Tagged Buffers.

Tagged Messages standardize direct placement of data without per-packet interaction with the upper layers. Even if there is an upper-layer protocol encoding of what is being transferred, as is common with middleware solutions, this information is not understood at the application-independent layers. The directions on where to place the incoming data cannot be accessed without switching to the ULP first. DDP provides a standardized 'packing list', which can be interpreted without requiring ULP interaction. Indeed, it is designed to be implementable in hardware.

4.1. Order-Independent Reception

Tagged Messages are directed to a buffer based on an included Steering Tag. Additionally, no notice is provided to the ULP for each individual Tagged Message's arrival. Together these allow Tagged Messages received out of order to be processed without intermediate buffering or additional notifications to the ULP.
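
As an illustration only, the following Python sketch models order-independent reception. The class `TaggedBuffers`, the STag value, and the 8-byte segment size are assumptions made for the example; real DDP segments carry additional header fields that are omitted here. Each segment names its own destination (STag plus offset), so it can be placed on arrival, and the hypothetical `ulp_notifications` counter is never touched by tagged placement.

```python
import random


class TaggedBuffers:
    """Hypothetical table of Tagged Buffers, keyed by Steering Tag."""

    def __init__(self):
        self.by_stag = {}
        self.ulp_notifications = 0  # tagged placement never increments this

    def register(self, stag, length):
        self.by_stag[stag] = bytearray(length)

    def place(self, stag, offset, payload):
        """Place a segment using only the addressing it carries."""
        buf = self.by_stag[stag]
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("placement outside the Tagged Buffer")
        buf[offset:offset + len(payload)] = payload


content = b"The quick brown fox jumps over the lazy dog."
sink = TaggedBuffers()
sink.register(stag=0x1234, length=len(content))

# Cut the content into 8-byte segments and deliver them in shuffled order.
segments = [(off, content[off:off + 8]) for off in range(0, len(content), 8)]
random.Random(7).shuffle(segments)
for off, data in segments:
    sink.place(0x1234, off, data)
```

Whatever order the shuffle produces, the buffer ends up holding the original content, and no reordering queue or per-segment ULP callback is needed.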

4.2. Reduced ULP Notifications

RDMAP offers both Tagged and Untagged Messages. No receiving-side ULP interactions are required for Tagged Messages. By optimally dividing traffic between Tagged and Untagged Messages, the ULP can limit the number of events that must be dealt with at the ULP layer. This typically reduces the number of context switches required and improves performance.

RDMAP further reduces required ULP interactions by consolidating the completion notifications of Tagged Messages with the completion notification of a trailing Untagged Message. For most ULPs, this radically reduces the number of required ULP interactions.

While RDMAP consolidation of notices is beneficial to most applications, it may be detrimental to some applications that benefit from streamed delivery to enable ULP processing of received data as promptly as possible. A ULP that uses RDMAP cannot begin processing any portion of an exchange until it receives notification that the entire exchange has been placed. An "exchange" here is a set of zero or more Tagged Messages and a single terminating Untagged Message. An application that would prefer to begin work on the received payload as soon as possible, no matter what order it arrived in, might prefer to work directly with the LLP. RDMAP is optimized for applications that are more concerned with when the entire exchange is complete.

An application that benefits from being able to begin processing of each received packet as quickly as possible may find RDMAP interferes with that goal.

Such an application might be able to retain most of the benefits of RDMAP by using the DDP layer directly. However, in addition to taking on the responsibilities of the RDMAP layer, the application would likely have more difficulty finding support for a DDP-only API. Many hardware implementations may choose to tightly couple RDMAP and DDP, and might not provide an API directly to DDP services.

These features minimize the required interactions with the ULP. This can be extremely beneficial for applications that use multiple transport layer packets to accomplish what is a single ULP interaction.

4.3. Simplified ULP Exchanges

The notification rules for Tagged Messages allow ULPs to create multi-message "exchanges" consisting of zero or more Tagged Messages followed by a single Untagged Message, which together represent a single step in the ULP interaction. The receiving ULP is notified that the Untagged Message has arrived, and is thereby implicitly notified of any associated Tagged Messages.

If a ULP cannot effectively use Tagged Messages, it would derive little benefit from use of RDMAP/DDP by comparison to direct use of SCTP. But, while Tagged Buffers are the justification for RDMAP/DDP, Untagged Buffers are still necessary. Without Untagged Buffers, the only method to exchange buffer Advertisements would require out-of-band communications. Most RDMA-aware ULPs use Untagged Buffers for requests and responses. Buffer Advertisements are typically done within these Untagged Messages.

More importantly, there would be no reliable method for the upper-layer peers to synchronize. The absence of any guarantees about ordering within or between Tagged Messages is fundamental to allowing the DDP layer to optimize transfer of tagged payload.

Therefore, no ULP can be defined entirely in terms of Tagged Messages. Eventually, a notification that confirms delivery must be generated from the RDMAP/DDP layer.

Limiting use of Untagged Buffers to requests and responses by moving all bulk data using tagged transfers can greatly simplify the amount of prediction that the Data Sink must perform in pre-posting receive buffers. For example, a typical RDMA-enabled interaction would consist of the following:

1. The client sends a transaction request to the server as an Untagged Message.

2. This message includes buffer Advertisements for the buffers where the results are to be placed.

3. The server sends multiple Tagged Messages to the Advertised buffers.

4. The server sends a transaction reply as an Untagged Message to the client.

5. The client receives a single notification, indicating completion of the interaction.

With this type of exchange, the pacing and required size of Untagged Buffers are highly predictable. The variability of response sizes is absorbed by tagged transfers.
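
The five-step interaction can be sketched in Python as follows. This is a toy model under stated assumptions, not an RDMAP implementation: the `Client` class, the `serve` function, the dictionary layout of the request, and the 4-byte chunk size are all hypothetical. The point shown is that the variable-size result travels in Tagged Messages, while the client ULP sees exactly one notification, for the trailing Untagged Message.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical Data Sink that Advertised `result` as a Tagged Buffer."""
    result: bytearray
    notifications: list = field(default_factory=list)

    def receive_tagged(self, stag, offset, payload):
        # Placed silently: Tagged Messages produce no ULP notification.
        self.result[offset:offset + len(payload)] = payload

    def receive_untagged(self, message):
        # The one notification that completes the whole exchange.
        self.notifications.append(message)


def serve(request, client, data, chunk=4):
    """Steps 3 and 4: tagged bulk transfer, then an untagged reply."""
    ad = request["advertisement"]
    if len(data) > ad["length"]:
        raise ValueError("result does not fit the Advertised buffer")
    for off in range(0, len(data), chunk):
        client.receive_tagged(ad["stag"], off, data[off:off + chunk])
    client.receive_untagged({"status": "ok", "bytes": len(data)})


# Steps 1 and 2: an untagged request carrying the buffer Advertisement.
client = Client(result=bytearray(32))
request = {"op": "read",
           "advertisement": {"stag": 7, "base": 0, "length": 32}}
serve(request, client, b"variable-size result")
```

Only the request and the reply need pre-posted Untagged Buffers, and both have a small, predictable size regardless of how large the result is.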

4.4. Order-Independent Sending

Use of Tagged Messages is especially applicable when the Data Sink does not know the actual size, structure, or location of the content it is requesting (or updating).

For example, suppose the Data Sink ULP needs to fetch four related pieces of data into four separate buffers. With SCTP, the Data Sink ULP could receive four messages into four separate buffers, only having to predict the maximum size of each. However, it would have to dictate the order in which the Data Source supplied the separate pieces. If the Data Source found it advantageous to fetch them in a different order, it would have to use intermediate buffering to re-order the pieces into the expected order even though the application only required that all four be delivered and did not truly have an ordering requirement.

Techniques such as RAID striping and mirroring take this same problem one step further. What appears to be a single resource to the Data Sink is actually stored in separate locations by the Data Source. Non-RDMA protocols would either require the Data Source to fetch the material in the desired order or force the Data Source to use its own holding buffers to assemble an image of the destination buffer.

While sometimes referred to as a "buffer-to-buffer" solution, RDMA more fundamentally enables remote buffer access. The ULP is free to work with larger remote buffers than it has locally. This reduces buffering requirements and the number of times the data must be copied in an end-to-end transfer.

There are numerous reasons why the Data Sink would not know the true order or location of the requested data. The layout could differ for each client, for the records selected, and/or for the sort order requested. It could also be affected by RAID striping, file fragmentation, volume fragmentation, volume mirroring, and server-side dynamic compositing of content (such as server-side includes for HTTP).

In all of these cases, the Data Source is free to assemble the desired data in the Data Sink's buffer in whatever order the component data becomes available to it. It is not constrained on ordering. It does not have to assemble an image in its own memory before creating it in the Data Sink's buffers.
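
A small Python sketch of the sending side, under assumed names: `tagged_writes` is a hypothetical helper, the four byte strings stand in for stripes held in separate locations, and `completion_order` stands in for the order in which local fetches happen to finish. The Data Source emits each piece as a tagged write at its final offset as soon as it is available and never builds a staging image of the destination.

```python
def tagged_writes(stripes, completion_order):
    """Yield (offset, data) writes in the order the stripes become available.

    `stripes` lists the pieces in destination order; `completion_order`
    lists stripe indexes in the order the local fetches complete.
    """
    offsets = {}
    position = 0
    for index, stripe in enumerate(stripes):
        offsets[index] = position
        position += len(stripe)
    for index in completion_order:
        yield offsets[index], stripes[index]


stripes = [b"AAAA", b"BBBBBB", b"CC", b"DDDD"]
sink_buffer = bytearray(sum(len(s) for s in stripes))

# The third stripe happens to be ready first; it is written first.
for offset, data in tagged_writes(stripes, completion_order=[2, 0, 3, 1]):
    sink_buffer[offset:offset + len(data)] = data
```

The Data Sink's buffer ends up identical to an in-order transfer, although the Data Source chose the write order freely.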

Note that while DDP enables use of Tagged Messages for bulk transfer, there are some application scenarios where Untagged Messages would still be used for bulk transfer. For example, a file server may not expose its own memory to its clients. A client wishing to write may Advertise a buffer upon which the server will issue RDMA Reads. However, when performing a small write, it may be preferable to include the data in the Untagged Message rather than incurring an additional round trip with the RDMA Read and its response.

Generally, the best use of an Untagged Message is to synchronize and to deliver data that is naturally tied to the same message as the synchronization. For initial data transfers, this has the additional benefit of avoiding the need to Advertise specific Tagged Buffers for indefinite time periods. Instead, anonymous buffers can be used for initial data reception. Because anonymous buffers do not need to be tied to specific messages in advance, this can be a major benefit.

4.5. Untagged Messages and Tagged Buffers as ULP Credits

The handling of end-to-end buffer credits differs considerably between DDP and direct ULP use of either TCP or SCTP.

With both TCP and SCTP, buffer credits take the form of the receiver granting transmit permission for a total number of bytes. These credits reflect system buffering resources and/or simple flow control. They do not represent ULP resources.

DDP defines no standard flow control, but presumes the existence of a ULP mechanism. The presumed mechanism is that the Data Sink ULP has issued credits to the Data Source, allowing the Data Source to send a specific number of Untagged Messages.

The ULP peers must ensure that the sender is aware of the maximum size that can be sent to any specific target buffer. One method of doing so is to use a standard size for all Untagged Buffers within a given connection. For example, a ULP may specify an initial Untagged Buffer size to be used immediately after session establishment, and then optionally specify mechanisms for negotiating changes.
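
One possible shape for such a ULP mechanism is sketched below in Python. Nothing here is defined by DDP or RDMAP: the class `UntaggedCredits`, its methods, and the 64-byte standard buffer size are assumptions chosen for the example. Credits are counted in messages, not bytes, and the agreed buffer size bounds what any single Untagged Message may carry.

```python
class UntaggedCredits:
    """Hypothetical Data Source view of ULP-level message credits."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size  # standard size agreed by the ULP peers
        self.credits = 0                # Untagged Messages we may still send

    def grant(self, count):
        """The Data Sink reports that it posted `count` more Untagged Buffers."""
        self.credits += count

    def try_send(self, message):
        """Consume one credit if available; return whether sending is allowed."""
        if len(message) > self.buffer_size:
            raise ValueError("message exceeds the agreed Untagged Buffer size")
        if self.credits == 0:
            return False
        self.credits -= 1
        return True


source = UntaggedCredits(buffer_size=64)
source.grant(2)
results = [source.try_send(b"request") for _ in range(3)]
```

With two credits granted, the third send attempt is refused until the Data Sink grants more.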

Tagged Buffers are ULP resources Advertised directly from ULP to ULP. A DDP put to a known Tagged Buffer is constrained only by transport level flow control, not by available system buffering.

Either Tagged or Untagged Buffers allow bypassing of system buffer resources. Use of Tagged Buffers additionally allows the Data Source to choose in what order to exercise the credits.

To the extent allowed by the ULP, Tagged Buffers are also divisible resources. The Data Sink can Advertise a single 100 KB buffer, and then receive notifications from its peer that it had written 50 KB, 20 KB, and 30 KB to that buffer in three successive transactions.
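
The 100 KB example can be modelled with a short Python sketch of the Data Sink's bookkeeping. It assumes the peer's notifications arrive as ULP-level messages that report a byte count; the class name and the choice of 1 KB = 1024 bytes are illustrative assumptions only.

```python
KB = 1024  # assumption: binary kilobytes


class AdvertisedBuffer:
    """Data Sink accounting for one divisible Tagged Buffer."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.consumed = 0

    def record_write(self, nbytes: int) -> int:
        """Apply a peer notification and return the bytes still unused."""
        if nbytes <= 0:
            raise ValueError("notification must report a positive size")
        if self.consumed + nbytes > self.length:
            raise ValueError("peer reported more data than was Advertised")
        self.consumed += nbytes
        return self.length - self.consumed


buf = AdvertisedBuffer(100 * KB)
# Three successive transactions of 50 KB, 20 KB, and 30 KB.
remaining = [buf.record_write(n * KB) for n in (50, 20, 30)]
```

After the third notification the buffer is fully consumed, so remaining ends with 0.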

ULP management of Tagged Buffer resources, independent of transport and DDP layer credits, is an additional benefit of RDMA protocols. Large bulk transfers cannot be blocked by limited general-purpose buffering capacity. Applications can flow control based upon higher level abstractions, such as number of outstanding requests, independent of the amount of data that must be transferred.

However, use of system buffering, as offered by direct use of the underlying transports, can be preferable under certain circumstances.

One example would be when the number of target ULP Buffers is sufficiently large, and the rate at which any writes arrive is sufficiently low, that pinning all the target ULP Buffers in memory would be undesirable. The maximum transfer rate, and hence the maximum amount of system buffering required, may be more stable and predictable than the total ULP Buffer exposure.

Another example would be when the Data Sink wishes to receive a stream of data at a predictable rate, but does not know in advance what the size of each data packet will be. This is common with streaming media that has been encoded with a variable bit rate. With DDP, the Data Sink would either have to use Untagged Buffers large enough for the largest packet, or Advertise a circular buffer. If, for security or other reasons, the Data Sink did not want the size of its buffer to be publicly known, using the underlying SCTP transport directly may be preferable because of its byte-oriented credits.

5. RDMA Read

RDMA Reads are a further service provided by RDMAP. RDMA Reads allow the Data Sink to fetch exactly the portion of the peer ULP Buffer required on a "just in time" basis. This can be done without requiring per-fetch support from the Data Source ULP.

Storage servers may wish to limit the maximum write buffer allocated to any single session. The storage server may be a very minimal layer between the client and the disk storage media, or the server may merely wish to limit the total resources that would be required if all clients could push the entire payload they wished written at their own convenience.

In either case, there is little benefit in transferring data from the Data Source far in advance of when it will be written to the persistent storage media. RDMA Reads allow the Storage Server to fetch the payload on a "just in time" basis. In this fashion, a relatively small number of block-sized buffers can be used to execute a single transaction that specified writing a large file, or a Storage Server with numerous clients can fetch buffers from the individual clients in the order that is most convenient to the server.
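
The "just in time" pattern can be sketched in Python as a loop that pulls one block-sized piece at a time. The read_remote callable stands in for an RDMA Read against the client's Advertised Buffer; it and make_reader are hypothetical helpers that simulate the peer with an in-memory byte string, not a real RDMA interface.

```python
from typing import Callable, Iterator


def fetch_just_in_time(read_remote: Callable[[int, int], bytes],
                       total_len: int,
                       block_size: int) -> Iterator[bytes]:
    """Yield the remote payload one block at a time.

    Only one block-sized buffer is needed at any moment, however large
    the payload named by the transaction is.
    """
    offset = 0
    while offset < total_len:
        length = min(block_size, total_len - offset)
        yield read_remote(offset, length)  # stands in for one RDMA Read
        offset += length


def make_reader(advertised: bytes) -> Callable[[int, int], bytes]:
    """Simulate a peer's Advertised Buffer with local bytes."""
    def read_remote(offset: int, length: int) -> bytes:
        return advertised[offset:offset + length]
    return read_remote
```

A server would consume each yielded block, for example by writing it to disk, before requesting the next one, so the pace of the fetches follows the pace of the storage media.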

This same capability can be used when the desired portion of the Advertised Buffer is not known in advance. For example, the Advertised Buffer could contain performance statistics. The Data Sink could request the portions of the data it required, without requiring an interaction with the Data Source ULP.

This is applicable for many applications that publish semi-volatile data that does not require transactional validity checking (i.e., authorized users have read access to the entire set of data). It is less applicable when there are ULP consistency checks that must be performed upon the data. Such applications would be better served by having the client send a request, and having the server use RDMA Writes to publish the requested data. Neither RDMAP nor DDP provides mechanisms for bundling multiple disjoint updates into an atomic operation. Therefore, use of an Advertised Buffer as a data resource is subject to the same caveats as any randomly updated data resource, such as flat files, that does not enforce its own consistency.

6. LLP Comparisons

Normally, the choice of underlying IP transport is irrelevant to the ULP. RDMAP and DDP provide the same services over either. There may be performance impacts of the choice, however. It is the responsibility of the ULP to determine which IP transport is best suited to its needs.

SCTP provides for preservation of message boundaries. Each DDP Segment will be delivered within a single SCTP packet. The equivalent services are only available with TCP through the use of the MPA (Marker PDU Alignment) adaptation layer.

6.1. Multistreaming Implications

SCTP also provides multi-streaming. When the same pair of hosts have need for multiple DDP streams, this can be a major advantage. A single SCTP association carries multiple DDP streams, consolidating connection setup, congestion control, and acknowledgements.

Completions are controlled by the DDP Source Sequence Number (DDP-SSN) on a per-stream basis. Therefore, combining multiple DDP Streams into a single SCTP association cannot result in a dropped packet carrying data for one stream delaying completions on others.

6.2. Out-of-Order Reception Implications

The use of unordered Data Chunks with SCTP guarantees that the DDP layer will be able to perform placements when IP datagrams are received out of order.

Placement of out-of-order DDP Segments carried over MPA/TCP is not guaranteed, but certainly allowed. The ability of the MPA receiver to process out-of-order DDP Segments may be impaired when alignment of TCP segments and MPA FPDUs is lost. Using SCTP, each DDP Segment is encoded in a single Data Chunk and never spread over multiple IP datagrams.

6.3. Header and Marker Overhead

MPA and TCP headers together are smaller than the headers used by SCTP and its adaptation layer. However, this advantage can be reduced by the insertion of MPA markers. The difference in ULP Payload per IP Datagram is not likely to be a significant factor.

6.4. Middlebox Support

Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP will appear to all network middleboxes as a normal TCP connection. In many environments, there may be a requirement to use only TCP connections to satisfy existing network elements and/or to facilitate monitoring and control of connections. While SCTP is certainly just as monitorable and controllable as TCP, there is no guarantee that the network management infrastructure has the required support for both.

6.5. Processing Overhead

A DDP stream delivered via MPA/TCP will require more processing effort than one delivered over SCTP. However, this extra work may be justified for many deployments where full SCTP support is unavailable in the endpoints of the network, or where middleboxes impair the usability of SCTP.

6.6. Data Integrity Implications

Both the SCTP [RFC4960] and MPA/TCP [RFC5044] adaptations provide end-to-end CRC32c protection, or its equivalent, against accidental data corruption.
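
For readers unfamiliar with CRC32c, the following Python sketch computes it bit by bit using the reflected Castagnoli polynomial 0x82F63B78. It is written for clarity rather than speed and is not taken from either adaptation specification; real implementations normally use table-driven or hardware-assisted code.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283

# A single flipped bit changes the checksum, so the corruption is detected.
original = b"DDP Segment payload"
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
assert crc32c(original) != crc32c(corrupted)
```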

A ULP that requires a greater degree of protection may add its own. However, DDP and RDMAP headers will only be guaranteed to have the equivalent of end-to-end CRC32c protection. A ULP that requires data integrity checking more thorough than an end-to-end CRC32c should first invalidate all STags that reference a buffer before applying its own integrity check.

CRC32c only provides protection against random corruption. To protect against unauthorized alteration or forging of data packets, security methods must be applied. The RDMA security document [RFC5042] specifies usage of RFC 2406 [RFC2406] for both adaptation layers. As stated in [RFC5042], note that the IPsec requirements for RDDP are based on the version of IPsec specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC 3723 [RFC3723], despite the existence of a newer version of IPsec specified in RFC 4301 [RFC4301] and related RFCs.

6.6.1. MPA/TCP Specifics

It is mandatory for MPA/TCP implementations to implement CRC32c, but it is not mandatory to use the CRC32c during an RDMA connection. The activating or deactivating of the CRC in MPA/TCP is an administrative configuration operation at the local and remote end. The administration of the CRC (ON/OFF) is invisible to the ULP.

Applications should assume that disabling CRC32c will only be used when the end-to-end protection is at least as effective as a transport layer CRC32c. Applications should not use additional integrity checks based solely on the possibility that CRC32c could be disabled without equivalent integrity checks at a lower level.

CRC32c must not be disabled unless equivalent or better end-to-end integrity protection is provided.

If the CRC is active/used for one direction/end, then the use of the CRC is mandatory in both directions/ends.

If both ends have been configured not to use the CRC, then this is allowed as long as an equivalent protection (comparable to or better than CRC) from undetected errors on the connection is provided.
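
The CRC usage rules in this section can be condensed into one small decision function. This is only a sketch of the rules as stated here: the function and parameter names are hypothetical, and how an administrator establishes that equivalent protection exists is outside its scope.

```python
def mpa_crc_must_be_used(local_wants_crc: bool,
                         remote_wants_crc: bool,
                         equivalent_protection: bool) -> bool:
    """Return True if CRC32c has to be active on the MPA/TCP connection."""
    # If either end is configured to use the CRC, both directions use it.
    if local_wants_crc or remote_wants_crc:
        return True
    # Both ends opted out: acceptable only with equivalent protection.
    return not equivalent_protection
```

For instance, two ends that both disable the CRC on a path without comparable integrity protection would still get True, which flags the configuration as one that should not run without the CRC.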

6.6.2. SCTP Specifics

SCTP provides CRC32c protection automatically. The adaptation to SCTP provides for no option to suppress SCTP CRC32c protection.

6.7. Non-IP Transports

DDP is defined to operate over ubiquitous IP transports such as SCTP and TCP. This enables a new DDP-enabled node to be added anywhere to an IP network. No DDP-specific support from middleboxes is required.

There are non-IP transport fabrics offering RDMA capabilities. Because these capabilities are integrated with the transport protocol, they have some technical advantages when compared to RDMA over IP. For example, fencing of RDMA Operations can be based upon transport level acks. Because DDP is cleanly layered over an IP transport, any explicit RDMA layer ack must be separate from the transport layer ack.

There may be deployments where the benefits of RDMA/transport integration outweigh the benefits of being on an IP network.

6.7.1. No RDMA-Layer Ack

DDP does not provide for its own acknowledgements. The only form of ack provided at the RDMAP layer is an RDMA Read Response. DDP and RDMAP rely almost entirely upon other layers for flow control and pacing. The LLP is relied upon to guarantee delivery and avoid network congestion, and ULP-level acking is relied upon for ULP pacing and to avoid ULP Buffer overruns.

Previous RDMA protocols, such as InfiniBand, have been able to use their integration with the transport layer to provide stronger ordering guarantees. It is important that application designers that require such guarantees provide them through ULP interaction.

Specifically:

There is no ability for a local interface to "fence" outbound messages to guarantee that prior Tagged Messages have been placed prior to sending a Tagged Message. The only guarantees available from the other side would be an RDMA Read Response (coming from the RDMAP layer) or a response from the ULP layer. Remember that the normal ordering rules only guarantee when the Data Sink ULP will be notified of Untagged Messages; they do not control when data is placed into receive buffers.

Re-use of Tagged Buffers must be done with extreme care. The fact that an Untagged Message indicates that all prior Tagged Messages have been placed does not guarantee that no later Tagged Message has. The best strategy is to change the state of any given Advertised Buffer only with Untagged Messages.

As covered elsewhere in this document, flow control of Untagged Messages is the responsibility of the ULP.

6.8. Other IP Transports

Both TCP and SCTP provide DDP with reliable transport with TCP-friendly rate control. Currently, DDP is defined to work over reliable transports and implicitly relies upon some form of rate control.

DDP is fully compatible with a non-reliable protocol. Out-of-order placement is obviously not dependent on whether the other DDP Segments ever actually arrive.

However, RDMAP requires the LLP to provide reliable service. An alternate completion handling protocol would be required if DDP were to be deployed over an unreliable IP transport.

As noted in the prior section on Tagged Buffers as ULP credits, neither RDMAP nor DDP provides any flow control for Tagged Messages. If no transport layer flow control is provided, an RDMAP/DDP application would be limited only by the link layer rate, almost inevitably resulting in severe network congestion.

RDMAP encourages applications to be ignorant of the underlying transport path MTU. The ULP is only notified when all messages ending in a single Untagged Message have completed. The ULP is not aware of the granularity or ordering of the underlying message. This approach assumes that the ULP is only interested in the complete set of messages, and has no use for a subset of them.

6.9. LLP-Independent Session Establishment

For an RDMAP/DDP application, the transport services provided by a pair of SCTP streams and by a TCP connection both provide the same service (reliable delivery of DDP Segments between two connected RDMAP/DDP endpoints).

6.9.1. RDMA-Only Session Establishment

It is also possible to allow for transport-neutral establishment of RDMAP/DDP sessions between endpoints. Combined, these two features would allow most applications to be unconcerned as to which LLP was actually in use.

Specifically, the procedures for DDP Stream Session establishment discussed in section 3 of the SCTP mapping, and section 13.3 of the MPA/TCP mapping, both allow for the exchange of ULP-specific data ("Private Data") before enabling the exchange of DDP Segments. This delay can allow for proper selection and/or configuration of the endpoints based upon the exchanged data. For example, each DDP Stream Session associated with a single client session might be assigned to the same DDP Protection Domain.

To be transport neutral, the applications should exchange Private Data as part of session establishment messages to determine how the RDMA endpoints are to be configured. One side must be the Initiator, and the other, the Responder.

With SCTP, a pair of SCTP streams can be used for successive sessions while the SCTP association remains open. With MPA/TCP, each connection can be used for, at most, one session. However, the same source/destination pair of ports can be re-used for a subsequent TCP connection, as allowed by TCP.

Both SCTP and MPA limit the private data size to a maximum of 512 bytes.

MPA/TCP requires the end of the TCP connection that initiated the conversion to MPA mode to send the first DDP Segment. SCTP does not have this requirement. ULPs that wish to be transport neutral should require the initiating end to send the first message. A zero-length RDMA Write can be used for this purpose if the ULP logic itself does not naturally support this restriction.
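
The transport-neutral rules of this section can be collected into a small planning helper. The sketch below is one hypothetical way a ULP might check its own session parameters before handing them to either LLP; the function name, the role strings, and the returned dictionary are illustrative assumptions, not part of any specification.

```python
MAX_PRIVATE_DATA = 512  # limit stated in this section for both SCTP and MPA


def plan_session(role: str, private_data: bytes) -> dict:
    """Validate ULP session parameters in a transport-neutral way."""
    if role not in ("initiator", "responder"):
        raise ValueError("one side must be the Initiator, the other the Responder")
    if len(private_data) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data exceeds 512 bytes")
    return {
        "role": role,
        "private_data": private_data,
        # Required by MPA/TCP; adopted for SCTP as well to stay neutral.
        "sends_first_message": role == "initiator",
    }
```

Applying the initiator-sends-first rule on both transports costs little on SCTP and removes one place where ULP behavior would otherwise depend on the LLP.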

6.9.2. RDMA-Conditional Session Establishment

It is sometimes desirable for the active side of a session to connect with the passive side before knowing whether the passive side supports RDMA.

This style of session establishment can be supported with either TCP or SCTP, but not as transparently as for RDMA-only sessions. Pre-existing non-RDMA servers are also far more likely to be using TCP than SCTP.

With TCP, a normal TCP connection is established. It is then used by the ULP to determine whether or not to convert to MPA mode and use RDMA. This will typically be integral with other session-establishment negotiations.

With SCTP, the establishment of an association tests whether RDMA is supported. If not supported, the application simply requests the association without the RDMA adaptation indication.

One key difference is that with SCTP the determination as to whether the peer can support RDMA is made before the transport layer association/connection is established, while with TCP the established connection itself is used to determine whether RDMA is supported.

7. Local Interface Implications

Full utilization of DDP and RDMAP capabilities requires a local interface that explicitly requests these services. Protocols such as Sockets Direct Protocol (SDP) can allow applications to keep their traditional byte-stream or message-stream interface and still enjoy many of the benefits of the optimized wire level protocols.

8. Security Considerations

RDMA security considerations are discussed in the RDMA security document [RFC5042]. This document will only deal with the more usage-oriented aspects, and where there are implications in the choice of underlying transport.

8.1. Connection/Association Setup

Both the SCTP and TCP adaptations allow for existing procedures to be followed for the establishment of the SCTP association or TCP connection. Use of DDP does not impair the use of any security measures to filter, validate, and/or log the remote end of an association/connection.

8.2. Tagged Buffer Exposure

DDP only exposes ULP memory to the extent explicitly allowed by ULP actions. These include posting of receive operations and enabling of Steering Tags.

Neither RDMAP nor DDP places requirements on how ULPs Advertise Buffers. A ULP may use a single Steering Tag for multiple buffer Advertisements. However, the ULP should be aware that enforcement on STag usage is likely limited to the overall range that is enabled. If the Remote Peer writes into the 'wrong' Advertised Buffer, neither the DDP nor the RDMAP layer will be aware of this. Nor is there any report to the ULP on how the Remote Peer specifically used Tagged Buffers.

Unless the ULP peers have an adequate basis for mutual trust, the receiving ULP might be well advised to use a distinct STag for each interaction, and to invalidate it after each use, or to require its peer to use the RDMAP option to invalidate the STag with its responding Untagged Message.
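
The "distinct STag for each interaction" advice can be modelled with a toy registry. In a real system, STags are allocated and enforced by the RDMA-capable interface; the Python class below is a hypothetical ULP-side model that uses a simple counter purely to show the lifecycle.

```python
import itertools


class STagRegistry:
    """Toy model: one STag per interaction, invalidated after use."""

    def __init__(self) -> None:
        self._next = itertools.count(1)
        self._valid = {}  # STag -> Advertised Buffer length in bytes

    def advertise(self, buffer_len: int) -> int:
        """Enable a fresh STag for a single interaction."""
        stag = next(self._next)
        self._valid[stag] = buffer_len
        return stag

    def is_valid(self, stag: int) -> bool:
        return stag in self._valid

    def invalidate(self, stag: int) -> None:
        """Revoke access once the interaction completes."""
        self._valid.pop(stag, None)
```

Because each interaction gets its own STag, a late or misdirected Tagged Message that names an already invalidated STag no longer matches any enabled buffer.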

8.3. Impact of Encrypted Transports

While DDP is cleanly layered over the LLP, its maximum benefit may be limited when the LLP Stream is secured with a stream cipher, such as Transport Layer Security (TLS) [RFC4346]. If the LLP must decrypt in order, it cannot provide out-of-order DDP Segments to the DDP layer for placement purposes. IPsec [RFC2401] tunnel mode encrypts entire IP Datagrams. IPsec transport mode encrypts TCP Segments or SCTP packets, as does use of Datagram TLS (DTLS) [RFC4347] over UDP beneath TCP or SCTP. Neither IPsec nor this use of DTLS precludes providing out-of-order DDP Segments to the DDP layer for placement.

Note that end-to-end use of cryptographic integrity protection may allow suppression of MPA CRC generation and checking under certain circumstances. This is one example where the LLP may be judged to have "or equivalent" protection to an end-to-end CRC32c.

9. References

9.1. Normative References

[RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload (ESP)", RFC 2406, November 1998.

[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia, "A Remote Direct Memory Access Protocol Specification", RFC 5040, October 2007.

[RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct Data Placement over Reliable Transports", RFC 5041, October 2007.

[RFC5042] Pinkerton, J. and E. Deleganes, "DDP/RDMAP Security", RFC 5042, October 2007.

[RFC5043] Bestler, C. and R. Stewart, "Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation", RFC 5043, October 2007.

[RFC5044] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. Carrier, "Marker PDU Aligned Framing for TCP Specification", RFC 5044, October 2007.

9.2. Informative References

[NFSDIRECT] Talpey, T., Callaghan, B., and I. Property, "NFS Direct Data Placement", Work in Progress, June 2007.

[RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and F. Travostino, "Securing Block Storage Protocols over IP", RFC 3723, April 2004.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4346] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.1", RFC 4346, April 2006.

[RFC4347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer Security", RFC 4347, April 2006.

[RFC5046] Ko, M., Chadalapaka, M., Elzur, U., Shah, H., and P. Thaler, "Internet Small Computer System Interface (iSCSI) Extensions for Remote Direct Memory Access (RDMA)", RFC 5046, October 2007.

Authors' Addresses

   Caitlin Bestler (editor)
   Neterion
   20230 Stevens Creek Blvd.
   Suite C
   Cupertino, CA 95014
   USA

   Phone: 408-366-4639
   EMail: caitlin.bestler@neterion.com

   Lode Coene
   Nokia Siemens Networks
   Atealaan 26
   Herentals 2200
   Belgium

   Phone: +32-14-252081
   EMail: lode.coene@nsn.com

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
