Network Working Group                                          V. Paxson
Request for Comments: 2525                                        Editor
Category: Informational                                     ACIRI / ICSI
                                                               M. Allman
                            NASA Glenn Research Center/Sterling Software
                                                               S. Dawson
                                          Real-Time Computing Laboratory
                                                               W. Fenner
                                                              Xerox PARC
                                                               J. Griner
                                              NASA Glenn Research Center
                                                              I. Heavens
                                                    Spider Software Ltd.
                                                                K. Lahey
                                           NASA Ames Research Center/MRJ
                                                                J. Semke
                                        Pittsburgh Supercomputing Center
                                                                 B. Volz
                                            Process Software Corporation
                                                              March 1999
        

Known TCP Implementation Problems

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

Table of Contents

   1.  INTRODUCTION
   2.  KNOWN IMPLEMENTATION PROBLEMS
     2.1  No initial slow start
     2.2  No slow start after retransmission timeout
     2.3  Uninitialized CWND
     2.4  Inconsistent retransmission
     2.5  Failure to retain above-sequence data
     2.6  Extra additive constant in congestion avoidance
     2.7  Initial RTO too low
     2.8  Failure of window deflation after loss recovery
     2.9  Excessively short keepalive connection timeout
     2.10 Failure to back off retransmission timeout
     2.11 Insufficient interval between keepalives
     2.12 Window probe deadlock
     2.13 Stretch ACK violation
     2.14 Retransmission sends multiple packets
     2.15 Failure to send FIN notification promptly
     2.16 Failure to send a RST after Half Duplex Close
     2.17 Failure to RST on close with data pending
     2.18 Options missing from TCP MSS calculation
   3.  SECURITY CONSIDERATIONS
   4.  ACKNOWLEDGEMENTS
   5.  REFERENCES
   6.  AUTHORS' ADDRESSES
   7.  FULL COPYRIGHT STATEMENT
        
1. Introduction

This memo catalogs a number of known TCP implementation problems. The goal in doing so is to improve conditions in the existing Internet by enhancing the quality of current TCP/IP implementations. It is hoped that both performance and correctness issues can be resolved by making implementors aware of the problems and their solutions. In the long term, it is hoped that this will provide a reduction in unnecessary traffic on the network, the rate of connection failures due to protocol errors, and load on network servers due to time spent processing both unsuccessful connections and retransmitted data. This will help to ensure the stability of the global Internet.

Each problem is defined as follows:

Name of Problem: The name associated with the problem. In this memo, the name is given as a subsection heading.

Classification: One or more of the categories into which the problem is classified: "congestion control", "performance", "reliability", "resource management".

Description: A definition of the problem, succinct but including necessary background material.

Significance: A brief summary of the sorts of environments for which the problem is significant.

Implications: Why the problem is viewed as a problem.

Relevant RFCs: The RFCs defining the TCP specification with which the problem conflicts. These RFCs often qualify behavior using terms such as MUST, SHOULD, MAY, and other capitalized terms. See RFC 2119 for the exact interpretation of these terms.

Trace file demonstrating the problem: One or more ASCII trace files demonstrating the problem, if applicable.

Trace file demonstrating correct behavior: One or more examples of how correct behavior appears in a trace, if applicable.

References: References that further discuss the problem.

How to detect: How to test an implementation to see if it exhibits the problem. This discussion may include difficulties and subtleties associated with causing the problem to manifest itself, and with interpreting traces to detect the presence of the problem (if applicable).

How to fix: For known causes of the problem, how to correct the implementation.

2. Known implementation problems

2.1.

Name of Problem: No initial slow start

Classification: Congestion control

Description: When a TCP begins transmitting data, it is required by RFC 1122, 4.2.2.15, to engage in a "slow start" by initializing its congestion window, cwnd, to one packet (one segment of the maximum size). (Note that an experimental change to TCP, documented in [RFC2414], allows an initial value somewhat larger than one packet.) It subsequently increases cwnd by one packet for each ACK it receives for new data. The minimum of cwnd and the receiver's advertised window bounds the highest sequence number the TCP can transmit. A TCP that fails to initialize and increment cwnd in this fashion exhibits "No initial slow start".
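The sender-side rules just described can be sketched as a small state machine. This is an illustrative model, not code from any real TCP stack. The class and method names are hypothetical, windows are counted in bytes rather than packets, the one-segment initial window ignores the experimental [RFC2414] variant, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class SlowStartSender:
    """Hypothetical model of initial slow start (illustration only)."""
    mss: int            # bytes in one maximum-size segment
    cwnd: int = 0       # congestion window, bytes
    rwnd: int = 0       # receiver's advertised window, bytes
    snd_una: int = 0    # oldest unacknowledged sequence number
    snd_nxt: int = 0    # next sequence number to send

    def start(self, rwnd: int, iss: int = 0) -> None:
        # Slow start begins with a congestion window of one segment.
        self.cwnd = self.mss
        self.rwnd = rwnd
        self.snd_una = self.snd_nxt = iss

    def usable_window(self) -> int:
        # min(cwnd, rwnd) bounds the highest sequence number that may be
        # sent; subtract the data already in flight.
        limit = self.snd_una + min(self.cwnd, self.rwnd)
        return max(0, limit - self.snd_nxt)

    def send(self, nbytes: int) -> int:
        # Send as much of `nbytes` as the window allows; return bytes sent.
        n = min(nbytes, self.usable_window())
        self.snd_nxt += n
        return n

    def on_ack(self, ack: int, rwnd: int) -> None:
        self.rwnd = rwnd
        if ack > self.snd_una:      # only ACKs for new data grow cwnd
            self.snd_una = ack
            self.cwnd += self.mss   # one more segment per ACK
```

With `mss=1460`, a freshly started sender in this model can emit only 1460 bytes before its first ACK. After one ACK for new data it may have 2920 bytes in flight. A duplicate ACK leaves cwnd unchanged.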

Significance: In congested environments, detrimental to the performance of other connections, and possibly to the connection itself.

Implications: A TCP that fails to slow start when beginning a connection generates traffic bursts that can stress the network, leading to excessive queueing delays and packet loss.

Implementations exhibiting this problem might do so because they suffer from the general problem of not including the required congestion window. These implementations will also suffer from "No slow start after retransmission timeout".

There are different shades of "No initial slow start". From the perspective of stressing the network, the worst is a connection that simply always sends based on the receiver's advertised window, with no notion of a separate congestion window. Another form is described in "Uninitialized CWND" below.

Relevant RFCs: RFC 1122 requires use of slow start. RFC 2001 gives the specifics of slow start.

Trace file demonstrating it: Made using tcpdump [Jacobson89] recording at the connection responder. No losses reported by the packet filter.

   10:40:42.244503 B > A: S 1168512000:1168512000(0) win 32768
                           <mss 1460,nop,wscale 0> (DF) [tos 0x8]
   10:40:42.259908 A > B: S 3688169472:3688169472(0)
                           ack 1168512001 win 32768 <mss 1460>
   10:40:42.389992 B > A: . ack 1 win 33580 (DF) [tos 0x8]
   10:40:42.664975 A > B: P 1:513(512) ack 1 win 32768
   10:40:42.700185 A > B: . 513:1973(1460) ack 1 win 32768
   10:40:42.718017 A > B: . 1973:3433(1460) ack 1 win 32768
   10:40:42.762945 A > B: . 3433:4893(1460) ack 1 win 32768
   10:40:42.811273 A > B: . 4893:6353(1460) ack 1 win 32768
   10:40:42.829149 A > B: . 6353:7813(1460) ack 1 win 32768
   10:40:42.853687 B > A: . ack 1973 win 33580 (DF) [tos 0x8]
   10:40:42.864031 B > A: . ack 3433 win 33580 (DF) [tos 0x8]
        

After the third packet, the connection is established. A, the connection responder, begins transmitting to B, the connection initiator. Host A quickly sends 6 packets comprising 7812 bytes, even though the SYN exchange agreed upon an MSS of 1460 bytes (implying that an initial congestion window of 1 segment corresponds to 1460 bytes), and so A should have sent at most 1460 bytes.

The ACKs sent by B to A in the last two lines show that this trace is not a measurement error, that is, a case where slow start really occurred but the packet filter dropped the corresponding ACKs.

A second trace confirmed that the problem is repeatable.

Trace file demonstrating correct behavior: Made using tcpdump recording at the connection originator. No losses reported by the packet filter.

   12:35:31.914050 C > D: S 1448571845:1448571845(0)
                            win 4380 <mss 1460>
   12:35:32.068819 D > C: S 1755712000:1755712000(0)
                            ack 1448571846 win 4096
   12:35:32.069341 C > D: . ack 1 win 4608
   12:35:32.075213 C > D: P 1:513(512) ack 1 win 4608
   12:35:32.286073 D > C: . ack 513 win 4096
   12:35:32.287032 C > D: . 513:1025(512) ack 1 win 4608
   12:35:32.287506 C > D: . 1025:1537(512) ack 1 win 4608
   12:35:32.432712 D > C: . ack 1537 win 4096
   12:35:32.433690 C > D: . 1537:2049(512) ack 1 win 4608
   12:35:32.434481 C > D: . 2049:2561(512) ack 1 win 4608
   12:35:32.435032 C > D: . 2561:3073(512) ack 1 win 4608
   12:35:32.594526 D > C: . ack 3073 win 4096
   12:35:32.595465 C > D: . 3073:3585(512) ack 1 win 4608
   12:35:32.595947 C > D: . 3585:4097(512) ack 1 win 4608
   12:35:32.596414 C > D: . 4097:4609(512) ack 1 win 4608
   12:35:32.596888 C > D: . 4609:5121(512) ack 1 win 4608
   12:35:32.733453 D > C: . ack 4097 win 4096
        

References: This problem is documented in [Paxson97].

How to detect: For implementations that always manifest this problem, it shows up immediately in a packet trace or a sequence plot, as illustrated above.
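The check can be partly automated. Total the payload bytes the sender transmits before the peer acknowledges any of its data, then compare that total with the permitted initial window. The sketch below assumes the tcpdump output has already been parsed into hypothetical `(src, kind, start, end)` tuples using relative sequence numbers, with data starting at 1. It is not a tcpdump parser, and the function names are illustrative. The `initial_window` argument lets a tester allow for the larger [RFC2414] initial window.

```python
def bytes_before_first_ack(events, sender, first_data_seq=1):
    """Total payload `sender` sends before the peer ACKs any of that data.

    events: (src, kind, start, end) tuples in trace order, where kind is
    "data" (bytes start:end) or "ack" (start is the ACK number, end None).
    """
    total = 0
    for src, kind, start, end in events:
        if kind == "ack" and src != sender and start > first_data_seq:
            break  # the peer has now acknowledged some of the sender's data
        if kind == "data" and src == sender:
            total += end - start
    return total


def lacks_initial_slow_start(events, sender, initial_window):
    # Flag a sender that bursts more than its initial window before any ACK.
    return bytes_before_first_ack(events, sender) > initial_window
```

Applied to the six data packets in the first trace above, the function yields 7812 bytes against a one-segment window of 1460. Applied to the correct-behavior trace, it yields 512 bytes.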

How to fix: If the root problem is that the implementation lacks a notion of a congestion window, then unfortunately this requires significant work to fix. However, doing so is important, as such implementations also exhibit "No slow start after retransmission timeout".

2.2.

Name of Problem: No slow start after retransmission timeout

Classification: Congestion control

Description: When a TCP experiences a retransmission timeout, it is required by RFC 1122, 4.2.2.15, to engage in "slow start" by initializing its congestion window, cwnd, to one packet (one segment of the maximum size). It subsequently increases cwnd by one packet for each ACK it receives for new data until it reaches the "congestion avoidance" threshold, ssthresh, at which point the congestion avoidance algorithm for updating the window takes over. A TCP that fails to enter slow start upon a timeout exhibits "No slow start after retransmission timeout".
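The timeout response and the per-ACK window growth that follows can be sketched using the RFC 2001 rules. On timeout, ssthresh is set to half of min(cwnd, receiver window), but to no less than two segments, and cwnd drops to one segment. Each ACK for new data then adds one segment while cwnd <= ssthresh, and roughly MSS*MSS/cwnd bytes once cwnd exceeds ssthresh. The class name is hypothetical and the arithmetic uses whole bytes. This is a sketch of the rules, not any particular implementation.

```python
from dataclasses import dataclass


@dataclass
class CongestionWindow:
    """Hypothetical RFC 2001-style window state, in bytes."""
    mss: int
    cwnd: int
    ssthresh: int
    rwnd: int

    def on_timeout(self) -> None:
        # Remember half the current window (at least two segments) ...
        self.ssthresh = max(min(self.cwnd, self.rwnd) // 2, 2 * self.mss)
        # ... then re-enter slow start with a one-segment window.
        self.cwnd = self.mss

    def on_new_ack(self) -> None:
        if self.cwnd <= self.ssthresh:
            self.cwnd += self.mss  # slow start: one segment per ACK
        else:
            # Congestion avoidance: about one segment per window of ACKs.
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)
```

As an example, take an MSS of 1460 and an assumed pre-timeout cwnd of 29200 bytes. A timeout leaves cwnd at 1460 and ssthresh at 14600. The next new-data ACK raises cwnd to 2920, matching the expectation discussed for the trace that follows.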

Significance: In congested environments, severely detrimental to the performance of other connections, and also to the connection itself.

Implications: Entering slow start upon timeout forms one of the cornerstones of Internet congestion stability, as outlined in [Jacobson88]. If TCPs fail to do so, the network is at risk of suffering "congestion collapse" [RFC896].

Relevant RFCs: RFC 1122 requires use of slow start after loss. RFC 2001 gives the specifics of how to implement slow start. RFC 896 describes congestion collapse.

The retransmission timeout discussed here should not be confused with the separate "fast recovery" retransmission mechanism discussed in RFC 2001.

Trace file demonstrating it: Made using tcpdump recording at the sending TCP (A). No losses reported by the packet filter.

   10:40:59.090612 B > A: . ack 357125 win 33580 (DF) [tos 0x8]
   10:40:59.222025 A > B: . 357125:358585(1460) ack 1 win 32768
   10:40:59.868871 A > B: . 357125:358585(1460) ack 1 win 32768
   10:41:00.016641 B > A: . ack 364425 win 33580 (DF) [tos 0x8]
   10:41:00.036709 A > B: . 364425:365885(1460) ack 1 win 32768
   10:41:00.045231 A > B: . 365885:367345(1460) ack 1 win 32768
   10:41:00.053785 A > B: . 367345:368805(1460) ack 1 win 32768
   10:41:00.062426 A > B: . 368805:370265(1460) ack 1 win 32768
   10:41:00.071074 A > B: . 370265:371725(1460) ack 1 win 32768
   10:41:00.079794 A > B: . 371725:373185(1460) ack 1 win 32768
   10:41:00.089304 A > B: . 373185:374645(1460) ack 1 win 32768
   10:41:00.097738 A > B: . 374645:376105(1460) ack 1 win 32768
   10:41:00.106409 A > B: . 376105:377565(1460) ack 1 win 32768
   10:41:00.115024 A > B: . 377565:379025(1460) ack 1 win 32768
   10:41:00.123576 A > B: . 379025:380485(1460) ack 1 win 32768
   10:41:00.132016 A > B: . 380485:381945(1460) ack 1 win 32768
   10:41:00.141635 A > B: . 381945:383405(1460) ack 1 win 32768
   10:41:00.150094 A > B: . 383405:384865(1460) ack 1 win 32768
   10:41:00.158552 A > B: . 384865:386325(1460) ack 1 win 32768
   10:41:00.167053 A > B: . 386325:387785(1460) ack 1 win 32768
   10:41:00.175518 A > B: . 387785:389245(1460) ack 1 win 32768
   10:41:00.210835 A > B: . 389245:390705(1460) ack 1 win 32768
   10:41:00.226108 A > B: . 390705:392165(1460) ack 1 win 32768
   10:41:00.241524 B > A: . ack 389245 win 8760 (DF) [tos 0x8]
        

The first packet indicates the ack point is 357125. 130 msec after receiving the ACK, A transmits the packet after the ACK point, 357125:358585. 640 msec after this transmission, it retransmits 357125:358585, in an apparent retransmission timeout. At this point, A's cwnd should be one MSS, or 1460 bytes, as A enters slow start. The trace is consistent with this possibility.

B replies with an ACK of 364425, indicating that A has filled a sequence hole. At this point, A's cwnd should be 1460*2 = 2920 bytes, since in slow start receiving an ACK advances cwnd by one MSS. However, A then launches 19 consecutive packets, which is inconsistent with slow start.

A second trace confirmed that the problem is repeatable.

Trace file demonstrating correct behavior: Made using tcpdump recording at the sending TCP (C). No losses reported by the packet filter.

   12:35:48.442538 C > D: P 465409:465921(512) ack 1 win 4608
   12:35:48.544483 D > C: . ack 461825 win 4096
   12:35:48.703496 D > C: . ack 461825 win 4096
   12:35:49.044613 C > D: . 461825:462337(512) ack 1 win 4608
   12:35:49.192282 D > C: . ack 465921 win 2048
   12:35:49.192538 D > C: . ack 465921 win 4096
   12:35:49.193392 C > D: P 465921:466433(512) ack 1 win 4608
   12:35:49.194726 C > D: P 466433:466945(512) ack 1 win 4608
   12:35:49.350665 D > C: . ack 466945 win 4096
   12:35:49.351694 C > D: . 466945:467457(512) ack 1 win 4608
   12:35:49.352168 C > D: . 467457:467969(512) ack 1 win 4608
   12:35:49.352643 C > D: . 467969:468481(512) ack 1 win 4608
   12:35:49.506000 D > C: . ack 467969 win 3584
        

After C transmits the first packet shown to D, it takes no action in response to D's ACKs for 461825, because the first packet already reached the advertised window limit of 4096 bytes above 461825. 600 msec after transmitting the first packet, C retransmits 461825:462337, presumably due to a timeout. Its congestion window is now one MSS (512 bytes).

D acks 465921, indicating that C's retransmission filled a sequence hole. This ACK advances C's cwnd from 512 to 1024. Very shortly after, D acks 465921 again in order to update the offered window from 2048 to 4096. This ACK does not advance cwnd since it is not for new data. Very shortly after, C responds to the newly enlarged window by transmitting two packets. D acks both, advancing cwnd from 1024 to 1536. C in turn transmits three packets.

References: This problem is documented in [Paxson97].

How to detect: Packet loss is common enough in the Internet that generally it is not difficult to find an Internet path that will force retransmission due to packet loss.

If the effective window prior to loss is large enough, however, then the TCP may retransmit using the "fast recovery" mechanism described in RFC 2001. In a packet trace, the signature of fast recovery is that the packet retransmission occurs in response to the receipt of three duplicate ACKs, and subsequent duplicate ACKs may lead to the transmission of new data, above both the ack point and the highest sequence transmitted so far. An absence of three duplicate ACKs prior to retransmission suffices to distinguish between timeout and fast recovery retransmissions. In the face of only observing fast recovery retransmissions, generally it is not difficult to repeat the data transfer until observing a timeout retransmission.
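The three-duplicate-ACK test for separating the two kinds of retransmission can be sketched as follows, using the same hypothetical pre-parsed `(src, kind, start, end)` trace tuples. A segment counts as a retransmission if it starts below the highest sequence number sent so far. This heuristic is approximate. It counts any repeated ACK number as a duplicate, so pure window updates, such as the two "ack 465921" lines in the correct-behavior trace above, are not distinguished from true duplicate ACKs.

```python
def classify_retransmissions(events, sender, dupthresh=3):
    """Label each retransmission by `sender` as "fast" or "timeout".

    events: (src, kind, start, end) tuples in trace order; kind is "data"
    (bytes start:end) or "ack" (start is the ACK number, end None).
    """
    snd_max = None     # highest sequence number sent so far
    last_ack = None    # most recent ACK number from the peer
    dups = 0           # consecutive repeats of last_ack
    labels = []
    for src, kind, start, end in events:
        if kind == "ack" and src != sender:
            if start == last_ack:
                dups += 1        # window updates are counted here too
            else:
                last_ack, dups = start, 0
        elif kind == "data" and src == sender:
            if snd_max is not None and start < snd_max:
                labels.append((start, "fast" if dups >= dupthresh else "timeout"))
            snd_max = end if snd_max is None else max(snd_max, end)
    return labels
```

For the opening lines of the "No slow start after retransmission timeout" trace, the retransmission of 357125:358585 has no duplicate ACKs before it, so it is labeled "timeout".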

Once armed with a trace exhibiting a timeout retransmission, determining whether the TCP follows slow start is done by computing the correct progression of cwnd and comparing it to the amount of data transmitted by the TCP subsequent to the timeout retransmission.
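Comparing the correct cwnd progression with the data actually sent can be sketched as below. Starting from the timeout retransmission, the model sets cwnd to one MSS and adds one MSS per ACK for new data. It then flags every later send that would put more than cwnd bytes beyond the cumulative ACK point. The function name and tuple format are hypothetical. The sketch assumes cwnd stays below ssthresh for the span checked, and it ignores the receiver's advertised window.

```python
def slow_start_violations(events, sender, mss, rto_index):
    """List sends after a timeout retransmission that exceed slow start's cwnd.

    events: (src, kind, start, end) tuples in trace order; kind is "data"
    (bytes start:end) or "ack" (start is the ACK number, end None).
    rto_index: position in `events` of the timeout retransmission.
    """
    snd_una = events[rto_index][2]  # retransmission starts at the ack point
    cwnd = mss                       # one segment after the timeout
    violations = []
    for src, kind, start, end in events[rto_index + 1:]:
        if kind == "ack" and src != sender and start > snd_una:
            snd_una = start
            cwnd += mss                  # slow start growth per new-data ACK
        elif kind == "data" and src == sender and end - snd_una > cwnd:
            violations.append((start, end, end - snd_una, cwnd))
    return violations
```

On the "No slow start after retransmission timeout" trace, only the first two of the 19 packets sent after the ACK of 364425 fit within the 2920-byte window. The other 17 are flagged. The correct-behavior trace produces no violations.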

How to fix: If the root problem is that the implementation lacks a notion of a congestion window, then unfortunately this requires significant work to fix. However, doing so is critical, for reasons outlined above.

2.3.

Name of Problem: Uninitialized CWND

Classification: Congestion control

Description: As described above for "No initial slow start", when a TCP connection begins, cwnd is initialized to one segment (or perhaps a few segments, if experimenting with [RFC2414]). One particular form of "No initial slow start", worth separate mention because the bug is fairly widely deployed, is "Uninitialized CWND". That is, while the TCP implements the proper slow start mechanism, it fails to initialize cwnd properly, so slow start in fact fails to occur.

One way the bug can occur is if, during the connection establishment handshake, the SYN ACK packet arrives without an MSS option. The faulty implementation uses receipt of the MSS option to initialize cwnd to one segment; if the option fails to arrive, then cwnd is instead initialized to a very large value.

Significance: In congested environments, detrimental to the performance of other connections, and likely to the connection itself. The burst can be so large (see below) that it has deleterious effects even in uncongested environments.

Implications: A TCP exhibiting this behavior stresses the network with a large burst of packets, which can cause loss in the network.

Relevant RFCs: RFC 1122 requires use of slow start. RFC 2001 gives the specifics of slow start.

Trace file demonstrating it: This trace was made using tcpdump running on host A. Host A is the sender and host B is the receiver. The advertised window and timestamp options have been omitted for clarity, except for the first segment sent by host A. Note that A sends an MSS option in its initial SYN but B does not include one in its reply.

   16:56:02.226937 A > B: S 237585307:237585307(0) win 8192
         <mss 536,nop,wscale 0,nop,nop,timestamp[|tcp]>
   16:56:02.557135 B > A: S 1617216000:1617216000(0)
         ack 237585308 win 16384
   16:56:02.557788 A > B: . ack 1 win 8192
   16:56:02.566014 A > B: . 1:537(536) ack 1
   16:56:02.566557 A > B: . 537:1073(536) ack 1
   16:56:02.567120 A > B: . 1073:1609(536) ack 1
   16:56:02.567662 A > B: P 1609:2049(440) ack 1
   16:56:02.568349 A > B: . 2049:2585(536) ack 1
   16:56:02.568909 A > B: . 2585:3121(536) ack 1
        

[54 additional burst segments deleted for brevity]

   16:56:02.936638 A > B: . 32065:32601(536) ack 1
   16:56:03.018685 B > A: . ack 1
        

After the three-way handshake, host A bursts 61 segments into the network before duplicate ACKs on the first segment cause a retransmission to occur. Since host A did not wait for the ACK on the first segment before sending additional segments, it is exhibiting "Uninitialized CWND".

Trace file demonstrating correct behavior

See the example for "No initial slow start".

References

This problem is documented in [Paxson97].

How to detect

This problem can be detected by examining a packet trace recorded at either the sender or the receiver. However, the bug can be difficult to induce because it requires finding a remote TCP peer that does not send an MSS option in its SYN ACK.

How to fix

This problem can be fixed by ensuring that cwnd is initialized upon receipt of a SYN ACK, even if the SYN ACK does not contain an MSS option.
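
A minimal C sketch of that fix, assuming a hypothetical connection structure (`struct tcb_sketch`) and handler name (`on_syn_ack`). The point is that cwnd is set on the SYN ACK path unconditionally, rather than only where an MSS option is parsed. It falls back to the default send MSS of 536 bytes that RFC 1122 specifies when no MSS option is received, and uses an initial window of one segment.

```c
#include <stdint.h>

/* RFC 1122: assume a send MSS of 536 if no MSS option is received. */
#define DEFAULT_SEND_MSS 536u

/* Hypothetical, minimal connection state for this sketch only. */
struct tcb_sketch {
    uint32_t snd_mss;   /* effective send MSS, in bytes */
    uint32_t cwnd;      /* congestion window, in bytes */
};

/* Process the MSS-related part of an arriving SYN ACK.
 * peer_mss:  value of the peer's MSS option, or 0 if the option was absent
 *            (an assumed convention for this sketch).
 * local_mss: largest segment this host is willing to send. */
void on_syn_ack(struct tcb_sketch *tp, uint32_t peer_mss, uint32_t local_mss)
{
    uint32_t mss = (peer_mss != 0) ? peer_mss : DEFAULT_SEND_MSS;

    if (mss > local_mss)
        mss = local_mss;
    tp->snd_mss = mss;

    /* Initialize cwnd here, on every SYN ACK, rather than only inside the
     * code that parses an MSS option; otherwise a SYN ACK without the
     * option leaves cwnd unset. One segment is the slow-start initial window. */
    tp->cwnd = mss;
}
```

For example, a SYN ACK with no MSS option (as sent by host B in the trace above) yields a cwnd of 536 bytes instead of an uninitialized value.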

2.4.

Name of Problem

Inconsistent retransmission

Classification

Reliability

Description

If, for a given sequence number, a sending TCP retransmits different data than previously sent for that sequence number, then a strong possibility arises that the receiving TCP will reconstruct a different byte stream than that sent by the sending application, depending on which instance of the sequence number it accepts.

Such a sending TCP exhibits "Inconsistent retransmission".

Significance

Critical for all environments.

Implications

Reliable delivery of data is a fundamental property of TCP.

Relevant RFCs

RFC 793, section 1.5, discusses the central role of reliability in TCP operation.

Trace file demonstrating it

Made using tcpdump recording at the receiving TCP (B). No losses reported by the packet filter.

   12:35:53.145503 A > B: FP 90048435:90048461(26)
                             ack 393464682 win 4096
                                        4500 0042 9644 0000
                    3006 e4c2 86b1 0401 83f3 010a b2a4 0015
                    055e 07b3 1773 cb6a 5019 1000 68a9 0000
   data starts here>504f 5254 2031 3334 2c31 3737*2c34 2c31
                    2c31 3738 2c31 3635 0d0a
   12:35:53.146479 B > A: R 393464682:393464682(0) win 8192
   12:35:53.851714 A > B: FP 90048429:90048463(34)
                          ack 393464682 win 4096
                                        4500 004a 965b 0000
                    3006 e4a3 86b1 0401 83f3 010a b2a4 0015
                    055e 07ad 1773 cb6a 5019 1000 8bd3 0000
   data starts here>5041 5356 0d0a 504f 5254 2031 3334 2c31
                    3737*2c31 3035 2c31 3431 2c34 2c31 3539
                    0d0a
        

The sequence numbers shown in this trace are absolute and not adjusted to reflect the ISN. The 4-digit hex values show a dump of the packet's IP and TCP headers, as well as payload. A first sends to B data for 90048435:90048461. The corresponding data begins with hex words 504f, 5254, etc.

B responds with a RST. Since the recording location was local to B, it is unknown whether A received the RST.

A then sends 90048429:90048463, which includes six sequence positions below the earlier transmission, all 26 positions of the earlier transmission, and two additional sequence positions.

The retransmission disagrees starting just after sequence 90048447, annotated above with a leading '*'. These two bytes were originally transmitted as hex 2c34 but retransmitted as hex 2c31. Subsequent positions disagree as well.
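
One way to locate such a disagreement mechanically is to compare the two transmissions byte by byte over the sequence range they share. The C sketch below does this; the function name `first_mismatch` is hypothetical, and it assumes the overlapping range does not wrap around the 32-bit sequence space. Applied to the payload bytes of the first trace above (which decode to ASCII text), it reports sequence number 90048448, the first byte after sequence 90048447.

```c
#include <stdint.h>

/* Compare two transmissions of (possibly) overlapping sequence space.
 * a[] holds len_a bytes starting at sequence seq_a; likewise for b[].
 * Returns 1 and stores the first disagreeing sequence number in *where,
 * or returns 0 if the bytes agree wherever the two ranges overlap.
 * Assumes the overlap does not wrap around 2^32. */
int first_mismatch(uint32_t seq_a, const uint8_t *a, uint32_t len_a,
                   uint32_t seq_b, const uint8_t *b, uint32_t len_b,
                   uint32_t *where)
{
    uint32_t end_a = seq_a + len_a;
    uint32_t end_b = seq_b + len_b;
    uint32_t lo = (seq_a > seq_b) ? seq_a : seq_b;   /* later start */
    uint32_t hi = (end_a < end_b) ? end_a : end_b;   /* earlier end */

    for (uint32_t s = lo; s < hi; s++) {
        if (a[s - seq_a] != b[s - seq_b]) {
            *where = s;
            return 1;
        }
    }
    return 0;
}
```

Because it only examines the shared range, the same check also flags the second example below, where identical payload bytes are resent at a base sequence number one higher.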

This behavior has been observed in other traces involving different hosts. It is unknown how to repeat it.

In this instance, no corruption would occur, since B has already indicated it will not accept further packets from A.

A second example illustrates a slightly different instance of the problem. The tracing again was made with tcpdump at the receiving TCP (D).

   22:23:58.645829 C > D: P 185:212(27) ack 565 win 4096
                                        4500 0043 90a3 0000
                    3306 0734 cbf1 9eef 83f3 010a 0525 0015
                    a3a2 faba 578c 70a4 5018 1000 9a53 0000
   data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538
                    2c32 3339 2c35 2c34 330d 0a
   22:23:58.646805 D > C: . ack 184 win 8192
                                        4500 0028 beeb 0000
                    3e06 ce06 83f3 010a cbf1 9eef 0015 0525
                    578c 70a4 a3a2 fab9 5010 2000 342f 0000
   22:31:36.532244 C > D: FP 186:213(27) ack 565 win 4096
                                        4500 0043 9435 0000
                    3306 03a2 cbf1 9eef 83f3 010a 0525 0015
                    a3a2 fabb 578c 70a4 5019 1000 9a51 0000
   data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538
                    2c32 3339 2c35 2c34 330d 0a
        

In this trace, sequence numbers are relative. C sends 185:212, but D only sends an ACK for 184 (so sequence number 184 is missing). C then sends 186:213. The packet payload is identical to the previous payload, but the base sequence number is one higher, resulting in an inconsistent retransmission.

Neither trace exhibits checksum errors.

Trace file demonstrating correct behavior

(Omitted, as presumably correct behavior is obvious.)

References

None known.

How to detect

This problem can unfortunately be very difficult to detect, since available experience indicates that it manifests only rarely. No "trigger" has been identified that can be used to reproduce the problem.

How to fix

In the absence of a known "trigger", we cannot always assess how to fix the problem.

In one implementation (not the one illustrated above), the problem manifested itself when (1) the sender received a zero window and stalled; (2) eventually an ACK arrived that offered a window larger than that in effect at the time of the stall; (3) the sender transmitted out of the buffer of data it held at the time of the stall, but (4) failed to limit this transfer to the buffer length, instead using the newly advertised (and larger) offered window. Consequently, in addition to the valid buffer contents, it sent whatever garbage values followed the end of the buffer. If it then retransmitted the corresponding sequence numbers, at that point it sent the correct data, resulting in an inconsistent retransmission. Note that this instance of the problem reflects a more general problem, that of initially transmitting incorrect data.
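
The missing check in that implementation was a bound on the data actually held in the send buffer. A minimal C sketch of the correct computation, assuming a hypothetical `struct snd_sketch` and ignoring the congestion window and Nagle's algorithm: the amount that may be sent is the smaller of the remaining offered window and the buffered data not yet sent.

```c
#include <stdint.h>

/* Hypothetical sender state for this sketch, in bytes. */
struct snd_sketch {
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;   /* next sequence number to send */
    uint32_t snd_wnd;   /* window most recently offered by the peer */
    uint32_t buffered;  /* bytes held in the send buffer, counted from snd_una */
};

/* Bytes that may be transmitted now: limited by BOTH the offered window
 * and the data actually present in the buffer, so a window larger than
 * the buffer never causes bytes past the end of the buffer to be sent. */
uint32_t sendable(const struct snd_sketch *s)
{
    uint32_t in_flight = s->snd_nxt - s->snd_una;
    uint32_t win_left = (s->snd_wnd > in_flight) ? s->snd_wnd - in_flight : 0;
    uint32_t buf_left = (s->buffered > in_flight) ? s->buffered - in_flight : 0;

    return (win_left < buf_left) ? win_left : buf_left;
}
```

With 300 bytes buffered and a newly offered 8192-byte window, this returns 300; the faulty behavior described above corresponds to returning the window-based value alone.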

2.5.

Name of Problem

Failure to retain above-sequence data

Classification

Congestion control, performance

Description

When a TCP receives an "above sequence" segment, meaning one with a sequence number exceeding RCV.NXT but below RCV.NXT+RCV.WND, it SHOULD queue the segment for later delivery (RFC 1122, 4.2.2.20). (See RFC 793 for the definition of RCV.NXT and RCV.WND.) A TCP that fails to do so is said to exhibit "Failure to retain above-sequence data".
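
The classification above can be expressed with sequence-number comparisons modulo 2^32. A minimal C sketch, assuming hypothetical helper names (`seq_lt`, `seq_gt`, `is_above_sequence`) and the common idiom of interpreting the unsigned difference as a signed 32-bit value:

```c
#include <stdint.h>

/* Sequence-space comparisons modulo 2^32: a is "before" b when the
 * difference a - b, viewed as a signed 32-bit value, is negative. */
static int seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int seq_gt(uint32_t a, uint32_t b) { return (int32_t)(a - b) > 0; }

/* 1 if a segment starting at seg_seq is "above sequence": it exceeds
 * RCV.NXT but lies below RCV.NXT + RCV.WND, so it should be queued. */
int is_above_sequence(uint32_t seg_seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return seq_gt(seg_seq, rcv_nxt) && seq_lt(seg_seq, rcv_nxt + rcv_wnd);
}
```

Using the relative numbers from the trace below (RCV.NXT = 222686, offered window 25800), the segment starting at 223222 is above sequence, and the modular comparison keeps the test correct when the window straddles the 2^32 wraparound.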

It may sometimes be appropriate for a TCP to discard above-sequence data to reclaim memory. If it does so only rarely, then we would not consider it to exhibit this problem. Instead, the particular concern is with TCPs that always discard above-sequence data.

Significance

In environments prone to packet loss, detrimental to the performance of both other connections and the connection itself.

Implications

In times of congestion, a failure to retain above-sequence data will lead to numerous otherwise-unnecessary retransmissions, aggravating the congestion and potentially reducing performance by a large factor.

Relevant RFCs

RFC 1122 revises RFC 793 by upgrading the latter's MAY to a SHOULD on this issue.

Trace file demonstrating it

Made using tcpdump recording at the receiving TCP. No losses reported by the packet filter.

B is the TCP sender, A the receiver. A exhibits failure to retain above-sequence data:

   10:38:10.164860 B > A: . 221078:221614(536) ack 1 win 33232 [tos 0x8]
   10:38:10.170809 B > A: . 221614:222150(536) ack 1 win 33232 [tos 0x8]
   10:38:10.177183 B > A: . 222150:222686(536) ack 1 win 33232 [tos 0x8]
   10:38:10.225039 A > B: . ack 222686 win 25800

Here B has sent up to (relative) sequence 222686 in-sequence, and A accordingly acknowledges.

   10:38:10.268131 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8]
   10:38:10.337995 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8]
   10:38:10.344065 B > A: . 224294:224830(536) ack 1 win 33232 [tos 0x8]
   10:38:10.350169 B > A: . 224830:225366(536) ack 1 win 33232 [tos 0x8]
   10:38:10.356362 B > A: . 225366:225902(536) ack 1 win 33232 [tos 0x8]
   10:38:10.362445 B > A: . 225902:226438(536) ack 1 win 33232 [tos 0x8]
   10:38:10.368579 B > A: . 226438:226974(536) ack 1 win 33232 [tos 0x8]
   10:38:10.374732 B > A: . 226974:227510(536) ack 1 win 33232 [tos 0x8]
   10:38:10.380825 B > A: . 227510:228046(536) ack 1 win 33232 [tos 0x8]
   10:38:10.387027 B > A: . 228046:228582(536) ack 1 win 33232 [tos 0x8]
   10:38:10.393053 B > A: . 228582:229118(536) ack 1 win 33232 [tos 0x8]
   10:38:10.399193 B > A: . 229118:229654(536) ack 1 win 33232 [tos 0x8]
   10:38:10.405356 B > A: . 229654:230190(536) ack 1 win 33232 [tos 0x8]

A now receives 13 additional packets from B. These are above-sequence because 222686:223222 was dropped. The packets do however fit within the offered window of 25800. A does not generate any duplicate ACKs for them.

The trace contributor (V. Paxson) verified that these 13 packets had valid IP and TCP checksums.

   10:38:11.917728 B > A: . 222686:223222(536) ack 1 win 33232 [tos 0x8]
   10:38:11.930925 A > B: . ack 223222 win 32232

B times out for 222686:223222 and retransmits it. Upon receiving it, A only acknowledges 223222. Had it retained the valid above-sequence packets, it would instead have ack'd 230190.
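
A receiver that retains above-sequence data needs only a simple reassembly step: when a segment fills the hole at RCV.NXT, it advances RCV.NXT across any queued segments that are now contiguous. The C sketch below is a hypothetical, simplified model (fixed-size queue of sequence ranges, no window or checksum checks, no data copying); fed the relative sequence numbers from the trace above, it returns the ACK of 230190 that A could have sent.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXQ 64   /* arbitrary queue capacity for this sketch */

/* Hypothetical, simplified receive state. */
struct reasm {
    uint32_t rcv_nxt;                 /* next in-sequence byte expected */
    size_t   n;                       /* number of queued ranges */
    uint32_t start[MAXQ], end[MAXQ];  /* queued [start, end) ranges */
};

/* Process an acceptable segment [seq, seq+len); return the cumulative ACK. */
uint32_t reasm_input(struct reasm *r, uint32_t seq, uint32_t len)
{
    uint32_t end = seq + len;

    if ((int32_t)(seq - r->rcv_nxt) > 0) {      /* above sequence: retain it */
        if (r->n < MAXQ) {
            r->start[r->n] = seq;
            r->end[r->n] = end;
            r->n++;
        }
        return r->rcv_nxt;                      /* duplicate ACK */
    }
    if ((int32_t)(end - r->rcv_nxt) > 0)
        r->rcv_nxt = end;                       /* in-sequence data */

    /* Pull in queued ranges that now start at or below rcv_nxt. */
    int progressed = 1;
    while (progressed) {
        progressed = 0;
        for (size_t i = 0; i < r->n; i++) {
            if ((int32_t)(r->start[i] - r->rcv_nxt) <= 0) {
                if ((int32_t)(r->end[i] - r->rcv_nxt) > 0)
                    r->rcv_nxt = r->end[i];
                r->start[i] = r->start[r->n - 1];  /* remove entry i */
                r->end[i] = r->end[r->n - 1];
                r->n--;
                progressed = 1;
                break;
            }
        }
    }
    return r->rcv_nxt;
}
```

A real implementation would also hold the data itself and bound the queue by the advertised window; the sketch tracks only sequence ranges to show why the ACK jumps forward.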

   10:38:12.048438 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8]
   10:38:12.054397 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8]
   10:38:12.068029 A > B: . ack 224294 win 31696

B retransmits two more packets, and A only acknowledges them. This pattern continues as B retransmits the entire set of previously-received packets.

A second trace confirmed that the problem is repeatable.

Trace file demonstrating correct behavior

Made using tcpdump recording at the receiving TCP (C). No losses reported by the packet filter.

   09:11:25.790417 D > C: . 33793:34305(512) ack 1 win 61440
   09:11:25.791393 D > C: . 34305:34817(512) ack 1 win 61440
   09:11:25.792369 D > C: . 34817:35329(512) ack 1 win 61440
   09:11:25.792369 D > C: . 35329:35841(512) ack 1 win 61440
   09:11:25.793345 D > C: . 36353:36865(512) ack 1 win 61440
   09:11:25.794321 C > D: . ack 35841 win 59904

A sequence hole occurs because 35841:36353 has been dropped.

   09:11:25.794321 D > C: . 36865:37377(512) ack 1 win 61440
   09:11:25.794321 C > D: . ack 35841 win 59904
   09:11:25.795297 D > C: . 37377:37889(512) ack 1 win 61440
   09:11:25.795297 C > D: . ack 35841 win 59904
   09:11:25.796273 C > D: . ack 35841 win 61440
   09:11:25.798225 D > C: . 37889:38401(512) ack 1 win 61440
   09:11:25.799201 C > D: . ack 35841 win 61440
   09:11:25.807009 D > C: . 38401:38913(512) ack 1 win 61440
   09:11:25.807009 C > D: . ack 35841 win 61440
   (many additional lines omitted)
   09:11:25.884113 D > C: . 52737:53249(512) ack 1 win 61440
   09:11:25.884113 C > D: . ack 35841 win 61440

Each additional, above-sequence packet C receives from D elicits a duplicate ACK for 35841.

      09:11:25.887041 D > C: . 35841:36353(512) ack 1 win 61440
      09:11:25.887041 C > D: . ack 53249 win 44032

D retransmits 35841:36353 and C acknowledges receipt of data all the way up to 53249.

References

This problem is documented in [Paxson97].

How to detect

Packet loss is common enough in the Internet that it is generally not difficult to find an Internet path that will result in some above-sequence packets arriving. A TCP that exhibits "Failure to retain ..." may not generate duplicate ACKs for these packets. However, some TCPs that do retain above-sequence data also do not generate duplicate ACKs, so the absence of duplicate ACKs does not by itself identify the problem. Instead, the key observation is whether, upon retransmission of the dropped packet, data that was previously above-sequence is acknowledged.

Detecting this problem using a packet trace involves two considerations. First, it is easiest to do so with a trace made at the TCP receiver, in order to unambiguously determine which packets arrived successfully. Second, such packets may still be correctly discarded if they arrive with checksum errors. The latter can be tested either by capturing the entire packet contents and performing the IP and TCP checksum algorithms to verify their integrity, or by confirming that the packets arrive with the same checksum and contents as those with which they were sent, presuming that the sending TCP correctly calculates checksums for the packets it transmits.
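
The first test amounts to recomputing the Internet checksum (RFC 1071): the ones'-complement sum of all 16-bit words, including the transmitted checksum field, folds to 0xffff when the data is intact. For TCP the sum also covers a pseudo-header made of the source address, destination address, protocol number (6), and TCP length. The C sketch below assumes IPv4 and uses hypothetical function names (`ip_header_ok`, `tcp_segment_ok`); it is not taken from any particular TCP implementation. Run against the bytes of the first packet in the "Inconsistent retransmission" trace, both its IP and TCP checksums verify.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Accumulate 16-bit big-endian words into a 32-bit running sum. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t acc)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)                       /* pad an odd trailing byte with 0 */
        acc += (uint32_t)(buf[len - 1] << 8);
    return acc;
}

/* Fold carries back in and take the ones' complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)~acc;
}

/* 1 if an IPv4 header (checksum field included) verifies. */
int ip_header_ok(const uint8_t *ip, size_t hdr_len)
{
    return fold(sum16(ip, hdr_len, 0)) == 0;
}

/* 1 if a TCP segment (header + data, checksum field included) verifies. */
int tcp_segment_ok(const uint8_t src[4], const uint8_t dst[4],
                   const uint8_t *tcp, size_t tcp_len)
{
    uint8_t pseudo[12];
    memcpy(pseudo, src, 4);
    memcpy(pseudo + 4, dst, 4);
    pseudo[8] = 0;
    pseudo[9] = 6;                             /* protocol: TCP */
    pseudo[10] = (uint8_t)(tcp_len >> 8);      /* TCP length */
    pseudo[11] = (uint8_t)(tcp_len & 0xff);
    return fold(sum16(tcp, tcp_len, sum16(pseudo, sizeof pseudo, 0))) == 0;
}
```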

It is considerably easier to verify that an implementation does NOT exhibit this problem. This can be done by recording a trace at the data sender, and observing that sometimes after a retransmission the receiver acknowledges a higher sequence number than just that which was retransmitted.
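
The sender-side check reduces to comparing the first cumulative ACK after a retransmission with the end of the retransmitted range. A minimal C sketch with a hypothetical function name follows. A result of 1 shows that the receiver had retained above-sequence data; a single 0 proves nothing, since the receiver may simply not have received any data beyond the hole.

```c
#include <stdint.h>

/* seq and len describe the retransmitted segment; ack is the first
 * cumulative ACK the sender sees after it. Returns 1 if that ACK covers
 * data beyond the retransmitted range, which the receiver can only have
 * had if it retained above-sequence segments. */
int ack_shows_retained_data(uint32_t seq, uint32_t len, uint32_t ack)
{
    return (int32_t)(ack - (seq + len)) > 0;
}
```

For the correct-behavior trace above (retransmission of 35841:36353 answered by ack 53249) this returns 1; for the faulty receiver (222686:223222 answered by ack 223222) it returns 0.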

How to fix

If the root problem is that the implementation lacks buffer space, then unfortunately fixing it requires significant work. However, doing so is important, for the reasons outlined above.

2.6.

Name of Problem

Extra additive constant in congestion avoidance

Classification

Congestion control / performance

Description

RFC 1122 section 4.2.2.15 states that TCP MUST implement Jacobson's "congestion avoidance" algorithm [Jacobson88], which calls for increasing the congestion window, cwnd, by:

   MSS * MSS / cwnd

for each ACK received for new data [RFC2001]. This has the effect of increasing cwnd by approximately one segment in each round trip time.
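
The "one segment per round trip" figure follows from counting ACKs. Assuming one ACK per full-sized segment, a window of cwnd bytes produces about cwnd/MSS ACKs per RTT, each adding MSS^2/cwnd; with one ACK per two segments (the policy of the receivers in the traces of this section), only half as many ACKs arrive:

```latex
\[
  \Delta\mathit{cwnd}_{\text{per RTT}}
    \approx \frac{\mathit{cwnd}}{\mathit{MSS}} \cdot \frac{\mathit{MSS}^2}{\mathit{cwnd}}
    = \mathit{MSS},
  \qquad
  \Delta\mathit{cwnd}_{\text{per RTT, ACK every 2 segments}}
    \approx \frac{\mathit{cwnd}}{2\,\mathit{MSS}} \cdot \frac{\mathit{MSS}^2}{\mathit{cwnd}}
    = \frac{\mathit{MSS}}{2}
\]
```

The result is approximate because cwnd itself grows slightly during the round trip.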

Some TCP implementations add an additional fraction of a segment (typically MSS/8) to cwnd for each ACK received for new data [Stevens94, Wright95]:

   (MSS * MSS / cwnd) + MSS/8

These implementations exhibit "Extra additive constant in congestion avoidance".

Significance

May be detrimental to performance even in completely uncongested environments (see Implications).

In congested environments, may also be detrimental to the performance of other connections.

Implications

The extra additive term allows a TCP to more aggressively open its congestion window (quadratic rather than linear increase). For congested networks, this can increase the loss rate experienced by all connections sharing a bottleneck with the aggressive TCP.

However, even for completely uncongested networks, the extra additive term can lead to diminished performance, as follows. In congestion avoidance, a TCP sender probes the network path to determine its available capacity, which often equates to the number of buffers available at a bottleneck link. With linear congestion avoidance, the TCP only probes for sufficient capacity (buffer) to hold one extra packet per RTT.

Thus, when it exceeds the available capacity, generally only one packet will be lost (since on the previous RTT it already found that the path could sustain a window with one less packet in flight). If the congestion window is sufficiently large, then the TCP will recover from this single loss using fast retransmission and avoid an expensive (in terms of performance) retransmission timeout.

However, when the extra additive term is used, cwnd can increase by more than one packet per RTT, in which case the TCP probes more aggressively. If in the previous RTT it had reached the available capacity of the path, then the excess due to the extra increase will again be lost, but now this will result in multiple losses from the flight instead of a single loss. TCPs that do not utilize SACK [RFC2018] generally will not recover from multiple losses without incurring a retransmission timeout [Fall96,Hoe96], significantly diminishing performance.

Relevant RFCs

RFC 1122 requires use of the "congestion avoidance" algorithm. RFC 2001 outlines the fast retransmit/fast recovery algorithms. RFC 2018 discusses the SACK option.

Trace file demonstrating it

Recorded using tcpdump running on the same FDDI LAN as host A. Host A is the sender and host B is the receiver. The connection establishment specified an MSS of 4,312 bytes and a window scale factor of 4. We omit the establishment and the first 2.5 MB of data transfer, as the problem is best demonstrated when the window has grown to a large value. At the beginning of the trace excerpt, the congestion window is 31 packets. The connection is never receiver-window limited, so we omit window advertisements from the trace for clarity.

   11:42:07.697951 B > A: . ack 2383006
   11:42:07.699388 A > B: . 2508054:2512366(4312)
   11:42:07.699962 A > B: . 2512366:2516678(4312)
   11:42:07.700012 B > A: . ack 2391630
   11:42:07.701081 A > B: . 2516678:2520990(4312)
   11:42:07.701656 A > B: . 2520990:2525302(4312)
   11:42:07.701739 B > A: . ack 2400254
   11:42:07.702685 A > B: . 2525302:2529614(4312)
   11:42:07.703257 A > B: . 2529614:2533926(4312)
   11:42:07.703295 B > A: . ack 2408878
   11:42:07.704414 A > B: . 2533926:2538238(4312)
   11:42:07.704989 A > B: . 2538238:2542550(4312)
   11:42:07.705040 B > A: . ack 2417502
   11:42:07.705935 A > B: . 2542550:2546862(4312)
   11:42:07.706506 A > B: . 2546862:2551174(4312)
   11:42:07.706544 B > A: . ack 2426126
   11:42:07.707480 A > B: . 2551174:2555486(4312)
   11:42:07.708051 A > B: . 2555486:2559798(4312)
   11:42:07.708088 B > A: . ack 2434750
   11:42:07.709030 A > B: . 2559798:2564110(4312)
   11:42:07.709604 A > B: . 2564110:2568422(4312)
   11:42:07.710175 A > B: . 2568422:2572734(4312) *
   11:42:07.710215 B > A: . ack 2443374
   11:42:07.710799 A > B: . 2572734:2577046(4312)
   11:42:07.711368 A > B: . 2577046:2581358(4312)
   11:42:07.711405 B > A: . ack 2451998
   11:42:07.712323 A > B: . 2581358:2585670(4312)
   11:42:07.712898 A > B: . 2585670:2589982(4312)
   11:42:07.712938 B > A: . ack 2460622
   11:42:07.713926 A > B: . 2589982:2594294(4312)
   11:42:07.714501 A > B: . 2594294:2598606(4312)
   11:42:07.714547 B > A: . ack 2469246
   11:42:07.715747 A > B: . 2598606:2602918(4312)
   11:42:07.716287 A > B: . 2602918:2607230(4312)
   11:42:07.716328 B > A: . ack 2477870
   11:42:07.717146 A > B: . 2607230:2611542(4312)
   11:42:07.717717 A > B: . 2611542:2615854(4312)
   11:42:07.717762 B > A: . ack 2486494
   11:42:07.718754 A > B: . 2615854:2620166(4312)
   11:42:07.719331 A > B: . 2620166:2624478(4312)
   11:42:07.719906 A > B: . 2624478:2628790(4312) **
   11:42:07.719958 B > A: . ack 2495118
   11:42:07.720500 A > B: . 2628790:2633102(4312)
   11:42:07.721080 A > B: . 2633102:2637414(4312)
   11:42:07.721739 B > A: . ack 2503742
   11:42:07.722348 A > B: . 2637414:2641726(4312)
   11:42:07.722918 A > B: . 2641726:2646038(4312)
   11:42:07.769248 B > A: . ack 2512366

The receiver's acknowledgment policy is one ACK per two packets received. Thus, for each ACK arriving at host A, two new packets are sent, except when cwnd increases due to congestion avoidance, in which case three new packets are sent.

With an ack-every-two-packets policy, cwnd should increase by only one MSS per 2 RTTs. However, at the point marked "*" the window increases after 7 ACKs have arrived, and then again at "**" after 6 more ACKs.
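
These ACK counts can be reproduced with a short C model of the per-ACK increment using integer byte arithmetic, assuming MSS = 4312 and a cwnd of exactly 31 segments at the first ACK of the excerpt. This is a sketch under those assumptions, not the code of the traced TCP, and the function names are hypothetical. With the MSS/8 term, cwnd crosses the next segment boundary after 7 ACKs and then after 6 more, matching the points marked "*" and "**"; without it, at least 32 ACKs are needed.

```c
#include <stdint.h>

/* Per-ACK congestion-avoidance increment in bytes.
 * extra = 0: MSS*MSS/cwnd (as RFC 1122 / [Jacobson88] call for)
 * extra = 1: MSS*MSS/cwnd + MSS/8 (the problematic variant) */
static uint32_t ca_increment(uint32_t cwnd, uint32_t mss, int extra)
{
    uint32_t incr = mss * mss / cwnd;
    if (extra)
        incr += mss / 8;
    return incr;
}

/* Apply ACKs for new data until cwnd reaches the next whole multiple of
 * mss (the point at which one more segment may be sent). Updates *cwnd
 * and returns the number of ACKs this took. */
static int acks_to_grow_one_segment(uint32_t *cwnd, uint32_t mss, int extra)
{
    uint32_t target = (*cwnd / mss + 1) * mss;
    int acks = 0;

    while (*cwnd < target) {
        *cwnd += ca_increment(*cwnd, mss, extra);
        acks++;
    }
    return acks;
}
```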

While we do not have space to show the effect, this trace suffered from repeated timeout retransmissions due to multiple packet losses during a single RTT.

Trace file demonstrating correct behavior

Made using the same host and tracing setup as above, except now A's TCP has been modified to remove the MSS/8 additive constant. Tcpdump reported 77 packet drops; the excerpt below is fully self-consistent so it is unlikely that any of these occurred during the excerpt.

We again begin when cwnd is 31 packets (this occurs significantly later in the trace, because the congestion avoidance is now less aggressive with opening the window).

   14:22:21.236757 B > A: . ack 5194679
   14:22:21.238192 A > B: . 5319727:5324039(4312)
   14:22:21.238770 A > B: . 5324039:5328351(4312)
   14:22:21.238821 B > A: . ack 5203303
   14:22:21.240158 A > B: . 5328351:5332663(4312)
   14:22:21.240738 A > B: . 5332663:5336975(4312)
   14:22:21.270422 B > A: . ack 5211927
   14:22:21.271883 A > B: . 5336975:5341287(4312)
   14:22:21.272458 A > B: . 5341287:5345599(4312)
   14:22:21.279099 B > A: . ack 5220551
   14:22:21.280539 A > B: . 5345599:5349911(4312)
   14:22:21.281118 A > B: . 5349911:5354223(4312)
   14:22:21.281183 B > A: . ack 5229175
   14:22:21.282348 A > B: . 5354223:5358535(4312)
   14:22:21.283029 A > B: . 5358535:5362847(4312)
   14:22:21.283089 B > A: . ack 5237799
   14:22:21.284213 A > B: . 5362847:5367159(4312)
   14:22:21.284779 A > B: . 5367159:5371471(4312)
   14:22:21.285976 B > A: . ack 5246423
   14:22:21.287465 A > B: . 5371471:5375783(4312)
   14:22:21.288036 A > B: . 5375783:5380095(4312)
   14:22:21.288073 B > A: . ack 5255047
   14:22:21.289155 A > B: . 5380095:5384407(4312)
   14:22:21.289725 A > B: . 5384407:5388719(4312)
   14:22:21.289762 B > A: . ack 5263671
   14:22:21.291090 A > B: . 5388719:5393031(4312)
   14:22:21.291662 A > B: . 5393031:5397343(4312)
   14:22:21.291701 B > A: . ack 5272295
   14:22:21.292870 A > B: . 5397343:5401655(4312)
   14:22:21.293441 A > B: . 5401655:5405967(4312)
   14:22:21.293481 B > A: . ack 5280919
   14:22:21.294476 A > B: . 5405967:5410279(4312)
   14:22:21.295053 A > B: . 5410279:5414591(4312)
   14:22:21.295106 B > A: . ack 5289543
   14:22:21.296306 A > B: . 5414591:5418903(4312)
   14:22:21.296878 A > B: . 5418903:5423215(4312)
   14:22:21.296917 B > A: . ack 5298167
   14:22:21.297716 A > B: . 5423215:5427527(4312)
   14:22:21.298285 A > B: . 5427527:5431839(4312)
   14:22:21.298324 B > A: . ack 5306791
   14:22:21.299413 A > B: . 5431839:5436151(4312)
   14:22:21.299986 A > B: . 5436151:5440463(4312)
   14:22:21.303696 B > A: . ack 5315415
   14:22:21.305177 A > B: . 5440463:5444775(4312)
   14:22:21.305755 A > B: . 5444775:5449087(4312)
   14:22:21.308032 B > A: . ack 5324039
   14:22:21.309525 A > B: . 5449087:5453399(4312)
   14:22:21.310101 A > B: . 5453399:5457711(4312)
   14:22:21.310144 B > A: . ack 5332663           ***
   14:22:21.311615 A > B: . 5457711:5462023(4312)
   14:22:21.312198 A > B: . 5462023:5466335(4312)
   14:22:21.341876 B > A: . ack 5341287
   14:22:21.343451 A > B: . 5466335:5470647(4312)
   14:22:21.343985 A > B: . 5470647:5474959(4312)
   14:22:21.350304 B > A: . ack 5349911
   14:22:21.351852 A > B: . 5474959:5479271(4312)
   14:22:21.352430 A > B: . 5479271:5483583(4312)
   14:22:21.352484 B > A: . ack 5358535
   14:22:21.353574 A > B: . 5483583:5487895(4312)
   14:22:21.354149 A > B: . 5487895:5492207(4312)
   14:22:21.354205 B > A: . ack 5367159
   14:22:21.355467 A > B: . 5492207:5496519(4312)
   14:22:21.356039 A > B: . 5496519:5500831(4312)
   14:22:21.357361 B > A: . ack 5375783
   14:22:21.358855 A > B: . 5500831:5505143(4312)
   14:22:21.359424 A > B: . 5505143:5509455(4312)
   14:22:21.359465 B > A: . ack 5384407
   14:22:21.360605 A > B: . 5509455:5513767(4312)
   14:22:21.361181 A > B: . 5513767:5518079(4312)
   14:22:21.361225 B > A: . ack 5393031
   14:22:21.362485 A > B: . 5518079:5522391(4312)
   14:22:21.363057 A > B: . 5522391:5526703(4312)
   14:22:21.363096 B > A: . ack 5401655
   14:22:21.364236 A > B: . 5526703:5531015(4312)
   14:22:21.364810 A > B: . 5531015:5535327(4312)
   14:22:21.364867 B > A: . ack 5410279
   14:22:21.365819 A > B: . 5535327:5539639(4312)
   14:22:21.366386 A > B: . 5539639:5543951(4312)
   14:22:21.366427 B > A: . ack 5418903
   14:22:21.367586 A > B: . 5543951:5548263(4312)
   14:22:21.368158 A > B: . 5548263:5552575(4312)
   14:22:21.368199 B > A: . ack 5427527
   14:22:21.369189 A > B: . 5552575:5556887(4312)
   14:22:21.369758 A > B: . 5556887:5561199(4312)
   14:22:21.369803 B > A: . ack 5436151
   14:22:21.370814 A > B: . 5561199:5565511(4312)
   14:22:21.371398 A > B: . 5565511:5569823(4312)
   14:22:21.375159 B > A: . ack 5444775
   14:22:21.376658 A > B: . 5569823:5574135(4312)
   14:22:21.377235 A > B: . 5574135:5578447(4312)
   14:22:21.379303 B > A: . ack 5453399
   14:22:21.380802 A > B: . 5578447:5582759(4312)
   14:22:21.381377 A > B: . 5582759:5587071(4312)
   14:22:21.381947 A > B: . 5587071:5591383(4312) ****
        
      "***" marks the end of the first round trip.  Note that cwnd did
      not increase (as evidenced by each ACK eliciting two new data
      packets).  Only at "****", which comes near the end of the second
      round trip, does cwnd increase by one packet.

This trace did not suffer any timeout retransmissions. It transferred the same amount of data as the first trace in about half as much time. This difference is repeatable between hosts A and B.

References
[Stevens94] and [Wright95] discuss this problem. The problem of Reno TCP failing to recover from multiple losses except via a retransmission timeout is discussed in [Fall96, Hoe96].

How to detect
If source code is available, that is generally the easiest way to detect this problem. Search for each modification to the cwnd variable; (at least) one of these will be for congestion avoidance, and inspecting the related code should immediately reveal the problem if it is present.
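
As a rough aid to the source inspection just described, the Python sketch below lists every line that appears to assign to a cwnd variable. The identifier pattern (`cwnd` or `snd_cwnd`) and the C fragment it is run on are hypothetical, modeled loosely on BSD-style code rather than taken from any particular implementation; the script only narrows the search, and the code around each hit still has to be read by hand.

```python
import re
from typing import Iterator, Tuple

# Matches an assignment to "cwnd" or "snd_cwnd" (=, +=, -=, <<=, ...),
# but not a comparison such as "==", "<=", ">=" or "!=".
CWND_WRITE = re.compile(r"\b(?:snd_)?cwnd\s*(?:<<|>>|[-+*/|&])?=(?!=)")


def cwnd_writes(source: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, stripped_line) for each line that seems to modify cwnd."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CWND_WRITE.search(line):
            yield lineno, line.strip()


# Hypothetical C fragment exhibiting the "+ MSS/8" problem.
SAMPLE = """\
u_int cw = tp->snd_cwnd;
u_int incr = tp->t_maxseg;
if (cw > tp->snd_ssthresh)
    incr = incr * incr / cw + incr / 8;
tp->snd_cwnd = min(cw + incr, TCP_MAXWIN);
if (tp->snd_cwnd == 0)
    return;
"""

for lineno, text in cwnd_writes(SAMPLE):
    print(lineno, text)
```

For this fragment only line 5 is reported; the faulty `incr / 8` term sits on the line just above it, which is why the related code has to be inspected as well.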

The problem can also be detected by closely examining packet traces taken near the sender. During congestion avoidance, cwnd will increase by an additional segment upon the receipt of (typically) eight acknowledgements without a loss. This increase comes on top of the one-segment increase per round-trip time (or per two round-trip times if the receiver uses delayed ACKs).

Furthermore, graphs of the sequence number vs. time, taken from packet traces, are normally linear during congestion avoidance. When viewing packet traces of transfers from senders exhibiting this problem, the graphs appear quadratic instead of linear.

Finally, the traces will show that, with sufficiently large windows, nearly every loss event results in a timeout.

How to fix
This problem may be corrected by removing the "+ MSS/8" term from the congestion avoidance code that increases cwnd each time an ACK of new data is received.
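
The effect of the extra term can be checked with a little arithmetic. The Python sketch below counts how many ACKs of new data it takes for cwnd to grow by one MSS during congestion avoidance, with and without the "+ MSS/8" term. It assumes a byte-counting cwnd that grows by MSS*MSS/cwnd per ACK (integer division) and borrows the 4312-byte segments and the 31-segment starting window from the traces above; it is an illustration, not any particular stack's code.

```python
def acks_to_grow_one_mss(start_segments: int, mss: int, extra_mss_over_8: bool) -> int:
    """Count ACKs needed for cwnd (in bytes) to grow by one MSS in congestion avoidance."""
    cwnd = start_segments * mss
    target = cwnd + mss
    acks = 0
    while cwnd < target:
        incr = mss * mss // cwnd      # the intended per-ACK increase
        if extra_mss_over_8:
            incr += mss // 8          # the faulty "+ MSS/8" additive constant
        cwnd += incr
        acks += 1
    return acks


MSS = 4312  # segment size seen in the traces above
buggy = acks_to_grow_one_mss(31, MSS, extra_mss_over_8=True)
fixed = acks_to_grow_one_mss(31, MSS, extra_mss_over_8=False)
print(f"with MSS/8: {buggy} ACKs, without: {fixed} ACKs")
```

Under these assumptions the window grows by a segment after 7 ACKs with the extra term, matching the point marked "*" in the first trace, versus 32 ACKs without it. With delayed ACKs, 32 ACKs cover roughly 64 segments, or about two round trips at a 31-segment window.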

2.7.

Name of Problem
Initial RTO too low

Classification
Performance

Description
When a TCP first begins transmitting data, it lacks the RTT measurements necessary to compute an adaptive retransmission timeout (RTO). RFC 1122, section 4.2.3.1, states that a TCP SHOULD initialize RTO to 3 seconds. A TCP that uses a lower value exhibits "Initial RTO too low".

Significance
In environments with large RTTs (where "large" means any value larger than the initial RTO), TCPs will experience very poor performance.

Implications
Whenever RTO < RTT, very poor performance can result: packets are unnecessarily retransmitted (because RTO expires before an ACK for the packet can arrive), and the connection enters slow start and congestion avoidance. Generally, the algorithms for computing RTO avoid this problem by adding a positive term to the estimated RTT. However, when a connection first begins it must use some estimate for RTO, and if it picks a value less than RTT, the above problems will arise.

Furthermore, when the initial RTO < RTT, it can take a long time for the TCP to correct the problem by adapting the RTT estimate, because the use of Karn's algorithm (mandated by RFC 1122, 4.2.3.1) will discard many of the candidate RTT measurements made after the first timeout, since they will be measurements of retransmitted segments.

Relevant RFCs
RFC 1122 states that TCPs SHOULD initialize RTO to 3 seconds and MUST implement Karn's algorithm.
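
A minimal Python sketch of the behavior described above: RTO starts at 3 seconds, is refined from RTT samples using Jacobson's smoothed mean/deviation estimator, doubles on each timeout, and ignores samples taken from retransmitted segments (Karn's algorithm). The class name, the gains of 1/8 and 1/4, and the omission of clock granularity and of upper and lower RTO bounds are simplifying assumptions of this sketch, not requirements quoted from RFC 1122.

```python
from typing import Optional

INITIAL_RTO = 3.0  # seconds; the value RFC 1122, section 4.2.3.1 recommends


class RtoEstimator:
    """Sketch of an adaptive retransmission timer that applies Karn's algorithm."""

    def __init__(self, initial_rto: float = INITIAL_RTO) -> None:
        self.srtt: Optional[float] = None    # smoothed RTT
        self.rttvar: Optional[float] = None  # smoothed mean deviation
        self.rto = initial_rto               # used until the first valid sample

    def on_rtt_sample(self, rtt: float, segment_was_retransmitted: bool) -> None:
        # Karn: an ACK for a retransmitted segment is ambiguous, so discard it.
        if segment_was_retransmitted:
            return
        if self.srtt is None or self.rttvar is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # Update the deviation first, using the previous smoothed RTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = self.srtt + 4 * self.rttvar

    def on_timeout(self) -> None:
        self.rto *= 2  # exponential backoff


est = RtoEstimator()
est.on_rtt_sample(0.68, segment_was_retransmitted=True)   # ignored under Karn's rule
print(est.rto)                                             # still the initial 3.0
est.on_rtt_sample(0.8, segment_was_retransmitted=False)   # first valid sample
print(est.rto)
```

Because samples from retransmitted segments are dropped, a TCP whose initial RTO is below the path RTT can go a long time without any usable sample, which is the slow recovery described under Implications.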

Trace file demonstrating it
The following trace file was taken using tcpdump at host A, the data sender. The advertised window and SYN options have been omitted for clarity.

   07:52:39.870301 A > B: S 2786333696:2786333696(0)
   07:52:40.548170 B > A: S 130240000:130240000(0) ack 2786333697
   07:52:40.561287 A > B: P 1:513(512) ack 1
   07:52:40.753466 A > B: . 1:513(512) ack 1
   07:52:41.133687 A > B: . 1:513(512) ack 1
   07:52:41.458529 B > A: . ack 513
   07:52:41.458686 A > B: . 513:1025(512) ack 1
   07:52:41.458797 A > B: P 1025:1537(512) ack 1
   07:52:41.541633 B > A: . ack 513
   07:52:41.703732 A > B: . 513:1025(512) ack 1
   07:52:42.044875 B > A: . ack 513
   07:52:42.173728 A > B: . 513:1025(512) ack 1
   07:52:42.330861 B > A: . ack 1537
   07:52:42.331129 A > B: . 1537:2049(512) ack 1
   07:52:42.331262 A > B: P 2049:2561(512) ack 1
   07:52:42.623673 A > B: . 1537:2049(512) ack 1
   07:52:42.683203 B > A: . ack 1537
   07:52:43.044029 B > A: . ack 1537
   07:52:43.193812 A > B: . 1537:2049(512) ack 1

Note that the SYN/SYN-ACK exchange shows an RTT of over 600 msec. However, from the elapsed time between the third and fourth lines (the first data packet being sent and then retransmitted), it is apparent that RTO was initialized to under 200 msec. The next line shows that this value has doubled to 400 msec (correct exponential backoff of RTO), but that still does not suffice to avoid an unnecessary retransmission.
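
The timings quoted above can be recomputed directly from the trace. The Python sketch below converts tcpdump timestamps to integer microseconds (avoiding floating-point rounding) and, as a simple model, counts how many retransmission timeouts fire before the first ACK when RTO starts at a given value and doubles on each expiry. The helper names are illustrative, and the model assumes the ACK for the original segment returns one fixed RTT after it was sent.

```python
def to_usec(ts: str) -> int:
    """Convert a tcpdump 'HH:MM:SS.ffffff' timestamp to integer microseconds."""
    hh, mm, rest = ts.split(":")
    sec, frac = rest.split(".")
    return ((int(hh) * 60 + int(mm)) * 60 + int(sec)) * 1_000_000 + int(frac.ljust(6, "0"))


def spurious_timeouts(initial_rto: float, rtt: float) -> int:
    """Timeouts that expire before the first ACK when RTO doubles after each one."""
    count, elapsed, rto = 0, 0.0, initial_rto
    while elapsed + rto < rtt:
        elapsed += rto  # timer fires: the segment is retransmitted, timer restarted
        rto *= 2
        count += 1
    return count


syn_rtt = to_usec("07:52:40.548170") - to_usec("07:52:39.870301")
first_rto = to_usec("07:52:40.753466") - to_usec("07:52:40.561287")
second_rto = to_usec("07:52:41.133687") - to_usec("07:52:40.753466")
first_ack = to_usec("07:52:41.458529") - to_usec("07:52:40.561287")
print(syn_rtt, first_rto, second_rto, first_ack)  # 677869 192179 380221 897242

print(spurious_timeouts(first_rto / 1e6, first_ack / 1e6))  # 2
print(spurious_timeouts(3.0, first_ack / 1e6))              # 0
```

With the observed initial RTO of about 192 msec, the model predicts two unnecessary retransmissions of the first segment, which is what the trace shows; with a 3-second initial RTO it predicts none.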

Finally, an ACK from B arrives for the first segment. Later, two more duplicate ACKs for 513 arrive, indicating that both the original and the two retransmissions arrived at B. (Indeed, a concurrent trace at B showed that no packets were lost during the entire connection.) The first ACK opens the congestion window to two packets, which are sent back-to-back, but at 07:52:41.703732 RTO again expires after a little over 200 msec, leading to an unnecessary retransmission, and the pattern repeats. By the end of the trace excerpt above, only 1536 bytes have been successfully transmitted from A to B, over an interval of more than 2 seconds, reflecting terrible performance.

Trace file demonstrating correct behavior
The following trace file was taken using tcpdump at host C, the data sender. The advertised window and SYN options have been omitted for clarity.

   17:30:32.090299 C > D: S 2031744000:2031744000(0)
   17:30:32.900325 D > C: S 262737964:262737964(0) ack 2031744001
   17:30:32.900326 C > D: . ack 1
   17:30:32.910326 C > D: . 1:513(512) ack 1
   17:30:34.150355 D > C: . ack 513
   17:30:34.150356 C > D: . 513:1025(512) ack 1
   17:30:34.150357 C > D: . 1025:1537(512) ack 1
   17:30:35.170384 D > C: . ack 1025
   17:30:35.170385 C > D: . 1537:2049(512) ack 1
   17:30:35.170386 C > D: . 2049:2561(512) ack 1
   17:30:35.320385 D > C: . ack 1537
   17:30:35.320386 C > D: . 2561:3073(512) ack 1
   17:30:35.320387 C > D: . 3073:3585(512) ack 1
   17:30:35.730384 D > C: . ack 2049

The initial SYN/SYN-ACK exchange shows that RTT is more than 800 msec, and for some subsequent packets it rises above 1 second, but C's retransmit timer never expires.

References
This problem is documented in [Paxson97].

How to detect
This problem is readily detected by inspecting a packet trace of the startup of a TCP connection made over a long-delay path. It can be diagnosed from either a sender-side or a receiver-side trace. Long-delay paths can often be found by locating remote sites on other continents.

How to fix
Because this problem arises from a faulty initialization, one would hope that fixing it requires only a one-line change to the TCP source code.

2.8.

Name of Problem
Failure of window deflation after loss recovery

Classification
Congestion control / performance

Description
The fast recovery algorithm allows TCP senders to continue to transmit new segments during loss recovery. First, fast retransmission is initiated after a TCP sender receives three duplicate ACKs. At this point, a retransmission is sent and cwnd is halved. The fast recovery algorithm then allows additional segments to be sent when sufficient additional duplicate ACKs arrive. Some implementations of fast recovery compute when to send additional segments by artificially incrementing cwnd, first by three segments to account for the three duplicate ACKs that triggered fast retransmission, and subsequently by 1 MSS for each new duplicate ACK that arrives. When cwnd allows, the sender transmits new data segments.

When an ACK arrives that covers new data, cwnd is to be reduced by the amount by which it was artificially increased. However, some TCP implementations fail to "deflate" the window, causing an inappropriate amount of data to be sent into the network after recovery. One cause of this problem is the "header prediction" code, which is used to handle incoming segments that require little work. In some implementations of TCP, the header prediction code does not check to make sure cwnd has not been artificially inflated, and therefore does not reduce the artificially increased cwnd when appropriate.

Significance
TCP senders that exhibit this problem will transmit a burst of data immediately after recovery, which can degrade performance as well as network stability. Effectively, the sender does not reduce the size of cwnd as much as it should (to half its value when loss was detected), if at all. This can harm the performance of the TCP connection itself, as well as competing TCP flows.

Implications
A TCP sender exhibiting this problem does not reduce cwnd appropriately in times of congestion, and therefore may contribute to congestive collapse.

Relevant RFCs
RFC 2001 outlines the fast retransmit/fast recovery algorithms. [Brakmo95] outlines this implementation problem and offers a fix.

Trace file demonstrating it
The following trace file was taken using tcpdump at host A, the data sender. The advertised window (which never changed) has been omitted for clarity, except for the first packet sent by each host.

   08:22:56.825635 A.7505 > B.7505: . 29697:30209(512) ack 1 win 4608
   08:22:57.038794 B.7505 > A.7505: . ack 27649 win 4096
   08:22:57.039279 A.7505 > B.7505: . 30209:30721(512) ack 1
   08:22:57.321876 B.7505 > A.7505: . ack 28161
   08:22:57.322356 A.7505 > B.7505: . 30721:31233(512) ack 1
   08:22:57.347128 B.7505 > A.7505: . ack 28673
   08:22:57.347572 A.7505 > B.7505: . 31233:31745(512) ack 1
   08:22:57.347782 A.7505 > B.7505: . 31745:32257(512) ack 1
   08:22:57.936393 B.7505 > A.7505: . ack 29185
   08:22:57.936864 A.7505 > B.7505: . 32257:32769(512) ack 1
   08:22:57.950802 B.7505 > A.7505: . ack 29697 win 4096
   08:22:57.951246 A.7505 > B.7505: . 32769:33281(512) ack 1
   08:22:58.169422 B.7505 > A.7505: . ack 29697
   08:22:58.638222 B.7505 > A.7505: . ack 29697
   08:22:58.643312 B.7505 > A.7505: . ack 29697
   08:22:58.643669 A.7505 > B.7505: . 29697:30209(512) ack 1
   08:22:58.936436 B.7505 > A.7505: . ack 29697
   08:22:59.002614 B.7505 > A.7505: . ack 29697
   08:22:59.003026 A.7505 > B.7505: . 33281:33793(512) ack 1
   08:22:59.682902 B.7505 > A.7505: . ack 33281
   08:22:59.683391 A.7505 > B.7505: P 33793:34305(512) ack 1
   08:22:59.683748 A.7505 > B.7505: P 34305:34817(512) ack 1 ***
   08:22:59.684043 A.7505 > B.7505: P 34817:35329(512) ack 1
   08:22:59.684266 A.7505 > B.7505: P 35329:35841(512) ack 1
   08:22:59.684567 A.7505 > B.7505: P 35841:36353(512) ack 1
   08:22:59.684810 A.7505 > B.7505: P 36353:36865(512) ack 1
   08:22:59.685094 A.7505 > B.7505: P 36865:37377(512) ack 1

The first 12 lines of the trace show incoming ACKs clocking out a window of data segments. At this point in the transfer, cwnd is 7 segments. The next 4 lines of the trace show 3 duplicate ACKs arriving from the receiver, followed by a retransmission from the sender. At this point, cwnd is halved (to 3 segments, rounding down) and artificially incremented by the three duplicate ACKs that have arrived, making cwnd 6 segments. The next two lines show 2 more duplicate ACKs arriving, each of which increases cwnd by 1 segment. So, after these two duplicate ACKs arrive, cwnd is 8 segments and the sender has permission to send 1 new segment (since there are 7 segments outstanding). The next line in the trace shows this new segment being transmitted. The next packet shown in the trace is an ACK from host B that covers the first 7 outstanding segments (all but the new segment sent during recovery). This should reduce cwnd to 3 segments and allow 2 new segments to be transmitted (since there is already 1 outstanding segment in the network). However, as shown by the last 7 lines of the trace, cwnd is not reduced, causing a line-rate burst of 7 new segments.
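
The cwnd arithmetic in the paragraph above can be replayed with a small model of RFC 2001 fast retransmit and fast recovery. In this Python sketch, cwnd, ssthresh and the outstanding data are counted in whole segments rather than bytes (a simplifying assumption), ordinary cwnd growth on new ACKs is left out, and the `deflate` flag is a hypothetical switch that models an implementation which, like the one traced, skips deflation on the first new ACK.

```python
from dataclasses import dataclass

DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit


@dataclass
class FastRecoverySender:
    cwnd: int             # congestion window, in segments
    outstanding: int      # segments sent but not yet acknowledged
    ssthresh: int = 64
    dupacks: int = 0
    deflate: bool = True  # False models the failure to deflate

    def on_dupack(self) -> None:
        self.dupacks += 1
        if self.dupacks == DUPACK_THRESHOLD:
            # Fast retransmit: halve cwnd (no less than 2 segments), then
            # inflate it by the three duplicate ACKs already received.
            self.ssthresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ssthresh + DUPACK_THRESHOLD
        elif self.dupacks > DUPACK_THRESHOLD:
            self.cwnd += 1  # each further duplicate ACK inflates cwnd by one segment

    def sendable(self) -> int:
        return max(self.cwnd - self.outstanding, 0)

    def send_new(self, n: int) -> None:
        self.outstanding += n

    def on_new_ack(self, segments_acked: int) -> None:
        if self.dupacks >= DUPACK_THRESHOLD and self.deflate:
            self.cwnd = self.ssthresh  # deflate: remove the artificial inflation
        self.dupacks = 0
        self.outstanding -= segments_acked


def replay_trace(deflate: bool) -> int:
    """Replay the trace above; return how many new segments the final ACK releases."""
    s = FastRecoverySender(cwnd=7, outstanding=7, deflate=deflate)
    for _ in range(5):        # 3 duplicate ACKs (cwnd 3 + 3 = 6), then 2 more (cwnd 8)
        s.on_dupack()         # the fast retransmission itself adds no new data
    s.send_new(s.sendable())  # 8 - 7 = 1 new segment sent during recovery
    s.on_new_ack(segments_acked=7)
    return s.sendable()


print(replay_trace(deflate=True), replay_trace(deflate=False))  # 2 7
```

The correct path releases 2 segments after the recovery ACK, while skipping deflation releases the 7-segment burst seen at the end of the trace.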

   Trace file demonstrating correct behavior
      The trace would appear identical to the one above, only it would
      stop after the line marked "***", because at this point host A
      would correctly reduce cwnd after recovery, allowing only 2
      segments to be transmitted, rather than producing a burst of 7
      segments.

References
This problem is documented, and its performance implications analyzed, in [Brakmo95].

How to detect
Failure of window deflation after loss recovery can be found by examining sender-side packet traces recorded during periods of moderate loss (so that cwnd can grow large enough to allow for fast recovery when loss occurs).
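
One rough way to automate this search is to measure, in a sender-side trace, the longest run of data segments the sender emits between consecutive arriving packets. The Python sketch below does this for tcpdump lines in the format used above. The regular expression, the function name, and the idea of treating an unusually long run (7 here, where about 2 would be expected) as a red flag are heuristics assumed for illustration; this is not a complete tcpdump parser.

```python
import re
from typing import Iterable

# Timestamp, "src > dst:", flags, and an optional "first:last(len)" data range.
LINE_RE = re.compile(r"^\s*\S+\s+(\S+) > (\S+): [.PSFRU]+ (?:(\d+):(\d+)\((\d+)\))?")


def max_burst(lines: Iterable[str], sender: str) -> int:
    """Longest run of data segments from `sender` between two packets from the peer."""
    burst = best = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        src, _dst, _first, _last, length = m.groups()
        if src == sender:
            if length is not None and int(length) > 0:
                burst += 1
                best = max(best, burst)
        else:
            burst = 0  # a packet from the peer (an ACK) starts a new run
    return best


# The tail of the trace above, starting with the segment sent during recovery.
TRACE_TAIL = """\
08:22:59.003026 A.7505 > B.7505: . 33281:33793(512) ack 1
08:22:59.682902 B.7505 > A.7505: . ack 33281
08:22:59.683391 A.7505 > B.7505: P 33793:34305(512) ack 1
08:22:59.683748 A.7505 > B.7505: P 34305:34817(512) ack 1
08:22:59.684043 A.7505 > B.7505: P 34817:35329(512) ack 1
08:22:59.684266 A.7505 > B.7505: P 35329:35841(512) ack 1
08:22:59.684567 A.7505 > B.7505: P 35841:36353(512) ack 1
08:22:59.684810 A.7505 > B.7505: P 36353:36865(512) ack 1
08:22:59.685094 A.7505 > B.7505: P 36865:37377(512) ack 1
"""

print(max_burst(TRACE_TAIL.splitlines(), "A.7505"))  # 7
```

Truncating the input after the second segment that follows the ACK (as in the correct-behavior description) yields a run of 2 instead.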

How to fix
When this bug is caused by incorrect header prediction, the fix is to add a predicate to the header prediction test that checks whether cwnd is inflated; if so, the header prediction test fails and the usual ACK processing occurs, which (in this case) takes care to deflate the window. See [Brakmo95] for details.
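
The sketch below illustrates where such a predicate sits. It is Python rather than kernel C, and the control block is hypothetical and heavily simplified: byte sequence numbers without wraparound, no window updates, and no cwnd growth on the slow path. Using the duplicate-ACK count to recognize an inflated cwnd is one assumed way of writing the check; see [Brakmo95] for the actual fix.

```python
from dataclasses import dataclass

DUPACK_THRESHOLD = 3


@dataclass
class Tcb:
    snd_una: int      # oldest unacknowledged sequence number
    snd_max: int      # highest sequence number sent so far
    cwnd: int         # bytes; inflated while in fast recovery
    ssthresh: int     # bytes
    dupacks: int = 0


def header_prediction_ok(tcb: Tcb, ack: int, seg_len: int, check_inflation: bool) -> bool:
    # The fast path applies only to a pure ACK that acknowledges new data.
    if seg_len != 0 or not (tcb.snd_una < ack <= tcb.snd_max):
        return False
    # Added predicate: an inflated cwnd must take the slow path.
    if check_inflation and tcb.dupacks >= DUPACK_THRESHOLD:
        return False
    return True


def process_new_ack(tcb: Tcb, ack: int, seg_len: int = 0, check_inflation: bool = True) -> str:
    if header_prediction_ok(tcb, ack, seg_len, check_inflation):
        tcb.snd_una = ack        # fast path: cwnd is never examined
        tcb.dupacks = 0
        return "fast"
    if tcb.dupacks >= DUPACK_THRESHOLD:
        tcb.cwnd = tcb.ssthresh  # slow path deflates the window
    tcb.dupacks = 0
    tcb.snd_una = ack
    return "slow"


# State just before the ACK for 33281 in the trace above (MSS 512):
# cwnd inflated to 8 segments, ssthresh 3 segments, 5 duplicate ACKs seen.
for check in (False, True):
    tcb = Tcb(snd_una=29697, snd_max=33793, cwnd=8 * 512, ssthresh=3 * 512, dupacks=5)
    path = process_new_ack(tcb, 33281, check_inflation=check)
    print(check, path, tcb.cwnd // 512)
```

Without the extra predicate the ACK takes the fast path and cwnd stays at 8 segments; with it, the ACK falls through to the slow path and cwnd drops back to 3.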

2.9.

Name of Problem
Excessively short keepalive connection timeout

Classification
Reliability

Description
Keep-alive is a mechanism for checking whether an idle connection is still alive. According to RFC 1122, keep-alive should only be invoked in server applications that might otherwise hang indefinitely and consume resources unnecessarily if a client crashes or aborts a connection during a network failure.

RFC 1122 also specifies that if a keep-alive mechanism is implemented it MUST NOT interpret failure to respond to any specific probe as a dead connection. The RFC does not specify a particular mechanism for timing out a connection when no response is received for keepalive probes. However, if the mechanism does not allow ample time for recovery from network congestion or delay, connections may be timed out unnecessarily.

Significance
In congested networks, this can lead to unwarranted termination of connections.

Implications
It is possible for the network connection between two peer machines to become congested or to exhibit packet loss at the time that a keep-alive probe is sent on a connection. If the keep-alive mechanism does not allow sufficient time before dropping connections in the face of unacknowledged probes, connections may be dropped even when both peers of a connection are still alive.

Relevant RFCs
RFC 1122 specifies that the keep-alive mechanism may be provided. It does not specify a mechanism for determining dead connections when keep-alive probes are not acknowledged.

Trace file demonstrating it
Made using the Orchestra tool at the peer of the machine using keep-alive. After connection establishment, incoming keep-alives were dropped by Orchestra to simulate a dead connection.

   22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN
   22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
   22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
   (more than two hours elapse)
   00:23:00.680000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
   00:23:01.770000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
   00:23:02.870000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
   00:23:03.970000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
   00:23:05.070000 A > B: 22666019:2496002 win 8760 datasz 1 ACK

The initial three packets are the SYN exchange for connection setup. About two hours later, the keep-alive timer fires because the connection has been idle. Keep-alive probes are transmitted a total of 5 times, about 1 second apart, after which the connection is dropped. This is problematic because a 5-second network outage at the time of the first probe results in the connection being killed.

Trace file demonstrating correct behavior
Made using the Orchestra tool at the peer of the machine using keep-alive. After connection establishment, incoming keep-alives were dropped by Orchestra to simulate a dead connection.

   16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN
   16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
   16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK
   (two hours elapse)
   18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
   18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK
        

In this trace, when the keep-alive timer expires, 9 keepalive probes are sent at 75-second intervals. 75 seconds after the last probe is sent, a final RST segment is sent indicating that the connection has been closed. This implementation therefore waits about 11 minutes before timing out the connection (eight 75-second gaps between the nine probes plus a final 75-second wait, 675 seconds in all), while the first implementation shown allows only 5 seconds.

References
This problem is documented in [Dawson97].

How to detect
For implementations manifesting this problem, it shows up on a packet trace after the keepalive timer fires if the peer machine receiving the keepalive does not respond. Usually the keepalive timer will fire at least two hours after keepalive is turned on, but it may fire sooner if the timer value has been configured lower, or if the keepalive mechanism violates the specification (see the "Insufficient interval between keepalives" problem, section 2.11). In this example, suppressing the response of the peer to keepalive probes was accomplished using the Orchestra toolkit, which can be configured to drop packets. It could also have been done by creating a connection, turning on keepalive, and disconnecting the network connection at the receiver machine.

How to fix
This problem can be fixed by using a different method for timing out keepalives that allows a longer period of time to elapse before dropping the connection. For example, the algorithm for timing out on dropped data could be used. Another possibility is an algorithm such as the one shown in the trace above, which sends 9 probes at 75 second intervals and then waits an additional 75 seconds for a response before closing the connection.
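The effect of probe count and spacing can be worked out with a small model. The following Python sketch is illustrative only. `keepalive_schedule` and `time_until_drop` are hypothetical names. The model assumes that the first probe goes out when the idle timer fires, that each later probe follows `probe_interval` seconds after the previous one, and that the TCP gives up one further interval after the last unanswered probe. Under these assumptions it reproduces the 5-second and roughly 11-minute figures given above.

```python
def keepalive_schedule(idle_timeout, probe_interval, probe_count):
    """Return (probe send times, drop time) in seconds since the connection went idle.

    Assumes the first probe is sent when the idle timer fires, each later probe
    follows probe_interval seconds after the previous one, and the connection is
    dropped one further probe_interval after the last unanswered probe.
    """
    probes = [idle_timeout + i * probe_interval for i in range(probe_count)]
    drop_at = probes[-1] + probe_interval
    return probes, drop_at


def time_until_drop(probe_interval, probe_count):
    """Seconds from the first unanswered probe until the connection is dropped."""
    probes, drop_at = keepalive_schedule(0, probe_interval, probe_count)
    return drop_at - probes[0]


# Problematic implementation: 5 probes about 1 second apart.
print(time_until_drop(1, 5))    # 5
# Implementation in the second trace: 9 probes 75 seconds apart.
print(time_until_drop(75, 9))   # 675, about 11 minutes
```

Under this model, an outage that begins at the first probe and lasts more than about 5 seconds kills the connection in the first implementation. The second implementation rides out outages of up to roughly 11 minutes.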

2.10.

Name of Problem
Failure to back off retransmission timeout

Classification
Congestion control / reliability

Description
The retransmission timeout is used to determine when a packet has been dropped in the network. When this timeout has expired without the arrival of an ACK, the segment is retransmitted. Each time a segment is retransmitted, the timeout is adjusted according to an exponential backoff algorithm, doubling each time. If a TCP fails to receive an ACK after numerous attempts at retransmitting the same segment, it terminates the connection. A TCP that fails to double its retransmission timeout upon repeated timeouts is said to exhibit "Failure to back off retransmission timeout".

Significance
Backing off the retransmission timer is a cornerstone of network stability in the presence of congestion. Consequently, this bug can have severe adverse effects in congested networks. It also affects TCP reliability in congested networks, as discussed under Implications below.

Implications
It is possible for the network connection between two TCP peers to become congested or to exhibit packet loss at the time that a retransmission is sent on a connection. If the retransmission mechanism does not allow sufficient time before dropping connections in the face of unacknowledged segments, connections may be dropped even when, by waiting longer, the connection could have continued.

Relevant RFCs
RFC 1122 specifies mandatory exponential backoff of the retransmission timeout, and the termination of connections after some period of time (at least 100 seconds).

Trace file demonstrating it
Made using tcpdump on an intermediate host:

   16:51:12.671727 A > B: S 510878852:510878852(0) win 16384
   16:51:12.672479 B > A: S 2392143687:2392143687(0)
                            ack 510878853 win 16384
   16:51:12.672581 A > B: . ack 1 win 16384
   16:51:15.244171 A > B: P 1:3(2) ack 1 win 16384
   16:51:15.244933 B > A: . ack 3 win 17518  (DF)
        

<receiving host disconnected>

   16:51:19.381176 A > B: P 3:5(2) ack 1 win 16384
   16:51:20.162016 A > B: P 3:5(2) ack 1 win 16384
   16:51:21.161936 A > B: P 3:5(2) ack 1 win 16384
   16:51:22.161914 A > B: P 3:5(2) ack 1 win 16384
   16:51:23.161914 A > B: P 3:5(2) ack 1 win 16384
   16:51:24.161879 A > B: P 3:5(2) ack 1 win 16384
   16:51:25.161857 A > B: P 3:5(2) ack 1 win 16384
   16:51:26.161836 A > B: P 3:5(2) ack 1 win 16384
   16:51:27.161814 A > B: P 3:5(2) ack 1 win 16384
   16:51:28.161791 A > B: P 3:5(2) ack 1 win 16384
   16:51:29.161769 A > B: P 3:5(2) ack 1 win 16384
   16:51:30.161750 A > B: P 3:5(2) ack 1 win 16384
   16:51:31.161727 A > B: P 3:5(2) ack 1 win 16384
        
   16:51:32.161701 A > B: R 5:5(0) ack 1 win 16384
        

The initial three packets are the SYN exchange for connection setup, followed by a single data packet to verify that data can be transferred. The destination host was then disconnected and more data was sent. Retransmissions occur every second for 12 seconds, and then the connection is terminated with a RST. This is problematic because a 12-second pause in connectivity could result in the termination of a connection.

Trace file demonstrating correct behavior
Again, a tcpdump taken from a third host:

   16:59:05.398301 A > B: S 2503324757:2503324757(0) win 16384
   16:59:05.399673 B > A: S 2492674648:2492674648(0)
                           ack 2503324758 win 16384
   16:59:05.399866 A > B: . ack 1 win 17520
   16:59:06.538107 A > B: P 1:3(2) ack 1 win 17520
   16:59:06.540977 B > A: . ack 3 win 17518  (DF)
        

<receiving host disconnected>

   16:59:13.121542 A > B: P 3:5(2) ack 1 win 17520
   16:59:14.010928 A > B: P 3:5(2) ack 1 win 17520
   16:59:16.010979 A > B: P 3:5(2) ack 1 win 17520
   16:59:20.011229 A > B: P 3:5(2) ack 1 win 17520
   16:59:28.011896 A > B: P 3:5(2) ack 1 win 17520
   16:59:44.013200 A > B: P 3:5(2) ack 1 win 17520
   17:00:16.015766 A > B: P 3:5(2) ack 1 win 17520
   17:01:20.021308 A > B: P 3:5(2) ack 1 win 17520
   17:02:24.027752 A > B: P 3:5(2) ack 1 win 17520
   17:03:28.034569 A > B: P 3:5(2) ack 1 win 17520
   17:04:32.041567 A > B: P 3:5(2) ack 1 win 17520
   17:05:36.048264 A > B: P 3:5(2) ack 1 win 17520
   17:06:40.054900 A > B: P 3:5(2) ack 1 win 17520
        
   17:07:44.061306 A > B: R 5:5(0) ack 1 win 17520
        

In this trace, when the retransmission timer expires, 12 retransmissions are sent at exponentially increasing intervals until the interval reaches 64 seconds, at which point the interval stops growing. 64 seconds after the last retransmission, a final RST segment is sent indicating that the connection has been closed. This implementation waits about 9 minutes before timing out the connection, while the first implementation shown allows only 12 seconds.
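The two traces differ only in whether the timer is doubled after each expiry. The Python sketch below models both schedules. It assumes an initial RTO of 1 second, 12 retransmissions, a 64-second ceiling and a reset one further timeout after the last retransmission, which are the values seen in the traces above. `retransmit_schedule` and its parameters are illustrative names, not taken from any real stack.

```python
def retransmit_schedule(initial_rto, retransmits, backoff=True, max_rto=64.0):
    """Return (retransmission times, reset time), in seconds after the original send.

    With backoff=True the timeout doubles after every expiry until it reaches
    max_rto; with backoff=False it stays fixed, as in the problematic trace.
    The connection is reset one further timeout after the last retransmission.
    """
    times = []
    now = 0.0
    rto = initial_rto
    for _ in range(retransmits):
        now += rto                        # timer expires, segment is resent
        times.append(now)
        if backoff:
            rto = min(rto * 2, max_rto)   # exponential backoff with a ceiling
    return times, now + rto


fixed_times, fixed_reset = retransmit_schedule(1.0, 12, backoff=False)
backoff_times, backoff_reset = retransmit_schedule(1.0, 12)
print(fixed_reset)         # 13.0
print(backoff_times[:8])   # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0, 127.0, 191.0]
print(backoff_reset)       # 511.0
```

The modeled 511 seconds is close to the roughly 8.5 minutes between the original transmission and the RST in the second trace. Without backoff, the same number of retransmissions is used up in 13 seconds.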

References
None known.

How to detect
A simple transfer can easily be interrupted by disconnecting the receiving host from the network. tcpdump or another appropriate tool should show the retransmissions being sent. Several trials in a low-RTT environment may be required to demonstrate the bug.
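One way to automate the check is to extract the send times of the repeatedly retransmitted segment from the trace and verify that each gap grows until it reaches the ceiling. The sketch below makes several assumptions:

- timestamps are in the tcpdump-style `HH:MM:SS.ffffff` form used above and do not cross midnight;
- the ceiling is 64 seconds;
- each gap should grow by a factor of at least 1.5, to tolerate timer granularity.

The function names and thresholds are hypothetical.

```python
def to_seconds(stamp):
    """Convert an 'HH:MM:SS.ffffff' timestamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def backs_off(stamps, max_rto=64.0, min_growth=1.5):
    """Return True if the gaps between successive sends of one segment keep growing.

    A gap is only required to grow while the previous gap is still well below
    max_rto, since a correct TCP stops doubling once it reaches the ceiling.
    """
    secs = [to_seconds(s) for s in stamps]
    gaps = [later - earlier for earlier, later in zip(secs, secs[1:])]
    for prev, cur in zip(gaps, gaps[1:]):
        if prev < 0.9 * max_rto and cur < prev * min_growth:
            return False  # timer expired again without having been backed off
    return True


problem = ["16:51:19.381176", "16:51:20.162016", "16:51:21.161936",
           "16:51:22.161914", "16:51:23.161914"]
correct = ["16:59:13.121542", "16:59:14.010928", "16:59:16.010979",
           "16:59:20.011229", "16:59:28.011896", "16:59:44.013200",
           "17:00:16.015766", "17:01:20.021308", "17:02:24.027752"]
print(backs_off(problem))  # False
print(backs_off(correct))  # True
```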

How to fix
For one of the implementations studied, this problem seemed to be the result of an error introduced with the addition of the Brakmo-Peterson RTO algorithm [Brakmo95], which can return a value of zero where the older Jacobson algorithm always returns a positive value. Brakmo and Peterson specified an additional step of min(rtt + 2, RTO) to avoid problems with this. Unfortunately, in the implementation this step was omitted when calculating the exponential backoff for the RTO. This results in an RTO of 0 seconds being multiplied by the backoff, again yielding zero, and then being subjected to a later MAX operation that increases it to 1 second, regardless of the backoff factor.

A similar TCP persist failure has the same cause.
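The ordering mistake can be illustrated with a small model. This is not the Brakmo-Peterson computation itself and does not reproduce its min(rtt + 2, RTO) step. It assumes a hypothetical base RTO estimate that can be zero, a 1-second floor standing in for the later MAX operation described above, and a hypothetical 64-second ceiling. It shows why the floor has to be applied before the backoff multiplier rather than after it.

```python
MIN_RTO = 1.0   # hypothetical floor standing in for the later MAX operation
MAX_RTO = 64.0  # hypothetical ceiling on the backed-off timeout


def backed_off_rto_buggy(base_rto, backoff_shift):
    """Multiply first, floor afterwards: a zero base estimate stays zero, so the
    floor returns MIN_RTO no matter how many times the timer has expired."""
    return min(max(base_rto * (1 << backoff_shift), MIN_RTO), MAX_RTO)


def backed_off_rto_fixed(base_rto, backoff_shift):
    """Bound the base estimate away from zero before applying the backoff."""
    return min(max(base_rto, MIN_RTO) * (1 << backoff_shift), MAX_RTO)


for shift in range(4):
    print(shift, backed_off_rto_buggy(0.0, shift), backed_off_rto_fixed(0.0, shift))
# 0 1.0 1.0
# 1 1.0 2.0
# 2 1.0 4.0
# 3 1.0 8.0
```

With the buggy ordering, every retransmission is scheduled 1 second after the previous one. This matches the fixed one-second spacing in the first trace of this section.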

2.11.

Name of Problem
Insufficient interval between keepalives

Classification
Reliability

Description
Keep-alive is a mechanism for checking whether an idle connection is still alive. According to RFC 1122, keep-alive may be included in an implementation. If it is included, the interval between keep-alive packets MUST be configurable, and MUST default to no less than two hours.

Significance
In congested networks, this can lead to unwarranted termination of connections.

Implications
According to RFC 1122, keep-alive is not required of implementations because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth ("if no one is using the connection, who cares if it is still good?"); and (3) cost money for an Internet path that charges for packets. Regarding this last point, we note that in addition the presence of dial-on-demand links in the route can greatly magnify the cost penalty of excess keepalives, potentially forcing a full-time connection on a link that would otherwise only be connected a few minutes a day.

If keepalive is provided, the RFC states that the required inter-keepalive distance MUST default to no less than two hours. If it does not, the probability of connections breaking increases, the bandwidth used due to keepalives increases, and cost increases over paths which charge per packet.

Relevant RFCs
RFC 1122 specifies that the keep-alive mechanism may be provided. It also specifies the two hour minimum for the default interval between keepalive probes.

Trace file demonstrating it
Made using the Orchestra tool at the peer of the machine using keep-alive. Machine A was configured to use default settings for the keepalive timer.

   11:36:32.910000 A > B: 3288354305:0      win 28672 datasz 4 SYN
   11:36:32.930000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
   11:36:32.950000 A > B: 3288354306:896002 win 28672 datasz 0 ACK
        
   11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
   11:50:01.210000 B > A: 896002:3288354306 win 4096  datasz 0 ACK
        
   12:03:29.410000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
   12:03:29.430000 B > A: 896002:3288354306 win 4096  datasz 0 ACK
        
   12:16:57.630000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
   12:16:57.650000 B > A: 896002:3288354306 win 4096  datasz 0 ACK
        
   12:30:25.850000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
   12:30:25.870000 B > A: 896002:3288354306 win 4096  datasz 0 ACK
        
   12:43:54.070000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
   12:43:54.090000 B > A: 896002:3288354306 win 4096  datasz 0 ACK
        

The initial three packets are the SYN exchange for connection setup. About 13 minutes later, the keepalive timer fires because the connection is idle. The keepalive is acknowledged, and the timer fires again in about 13 more minutes. This behavior continues indefinitely until the connection is closed, and is a violation of the specification.

Trace file demonstrating correct behavior
Made using the Orchestra tool at the peer of the machine using keep-alive. Machine A was configured to use default settings for the keepalive timer.

   17:37:20.500000 A > B: 34155521:0       win 4096 datasz 4 SYN
   17:37:20.520000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
   17:37:20.540000 A > B: 34155522:6272002 win 4096 datasz 0 ACK
        
   19:37:25.430000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
   19:37:25.450000 B > A: 6272002:34155522 win 4096 datasz 0 ACK
        
   21:37:30.560000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
   21:37:30.570000 B > A: 6272002:34155522 win 4096 datasz 0 ACK
        
   23:37:35.580000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
   23:37:35.600000 B > A: 6272002:34155522 win 4096 datasz 0 ACK
        
   01:37:40.620000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
   01:37:40.640000 B > A: 6272002:34155522 win 4096 datasz 0 ACK
        
   03:37:45.590000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
   03:37:45.610000 B > A: 6272002:34155522 win 4096 datasz 0 ACK
        

The initial three packets are the SYN exchange for connection setup. Just over two hours later, the keepalive timer fires because the connection is idle. The keepalive is acknowledged, and the timer fires again just over two hours later. This behavior continues indefinitely until the connection is closed.

References
This problem is documented in [Dawson97].

How to detect
For implementations manifesting this problem, it shows up on a packet trace. If the connection is left idle, the keepalive probes will arrive closer together than the two hour minimum.
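A trace can be checked mechanically by measuring the gaps between successive probes on an idle connection. The sketch below makes two assumptions: the probe timestamps have already been extracted in the `HH:MM:SS.ffffff` form used in the traces above, and no two consecutive probes are more than 24 hours apart. The helper names are hypothetical. The check is only meaningful when the implementation is running with its default settings, since RFC 1122 allows the interval to be configured lower.

```python
TWO_HOURS = 2 * 60 * 60  # RFC 1122 minimum default keepalive interval, in seconds


def to_seconds(stamp):
    """Convert an 'HH:MM:SS.ffffff' timestamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def keepalive_gaps(stamps):
    """Gaps in seconds between successive keepalive probes, allowing midnight wraps."""
    secs = [to_seconds(s) for s in stamps]
    gaps = []
    for prev, cur in zip(secs, secs[1:]):
        if cur < prev:              # the trace crossed midnight
            cur += 24 * 60 * 60
        gaps.append(cur - prev)
    return gaps


def too_frequent(stamps, minimum=TWO_HOURS):
    """Return the probe gaps that are shorter than the required minimum."""
    return [gap for gap in keepalive_gaps(stamps) if gap < minimum]


problem = ["11:50:01.190000", "12:03:29.410000", "12:16:57.630000"]
correct = ["19:37:25.430000", "21:37:30.560000", "23:37:35.580000",
           "01:37:40.620000", "03:37:45.590000"]
print(len(too_frequent(problem)))  # 2
print(too_frequent(correct))       # []
```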

2.12.

Name of Problem
Window probe deadlock

Classification
Reliability

Description
When an application reads a single byte from a full window, the window should not be updated, in order to avoid Silly Window Syndrome (SWS; see [RFC813]). If the remote peer uses a single byte of data to probe the window, that byte can be accepted into the buffer. In some implementations, at this point a negative argument to a signed comparison causes all further new data to be considered outside the window; consequently, it is discarded (after sending an ACK to resynchronize). These discards include the ACKs for the data packets sent by the local TCP, so the TCP will consider the data unacknowledged.

Consequently, the application may be unable to complete sending new data to the remote peer, because it has exhausted the transmit buffer available to its local TCP, and buffer space is never being freed because incoming ACKs that would do so are being discarded. If the application does not read any more data, which may happen due to its failure to complete such sends, then deadlock results.
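The failure can be modeled with the RFC 793 segment acceptability test. The sketch below is an illustration under stated assumptions, not code from the affected implementation. It assumes the receive window is computed as the signed difference between the right edge last advertised (RCV.NXT plus the advertised window) and the current RCV.NXT. It uses the relative sequence numbers from the trace that follows and ignores sequence-number wraparound. `window_buggy` and `window_fixed` are hypothetical names. Once the probe byte at 57345 has been accepted past a zero window, the difference is -1, and even a pure ACK carrying exactly RCV.NXT fails the test.

```python
def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Simplified RFC 793 acceptability test (no sequence-number wraparound)."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    if rcv_wnd == 0:
        return False
    last = seg_seq + seg_len - 1
    return (rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
            or rcv_nxt <= last < rcv_nxt + rcv_wnd)


def window_buggy(rcv_adv, rcv_nxt):
    # Hypothetical: the signed difference is used as-is, so it goes negative
    # once a probe byte is accepted beyond the advertised right edge.
    return rcv_adv - rcv_nxt


def window_fixed(rcv_adv, rcv_nxt):
    # Clamp at zero so the window can never be negative.
    return max(rcv_adv - rcv_nxt, 0)


# A advertised "ack 57345 win 0", so the right edge is 57345; accepting
# B's one-byte probe then moves RCV.NXT to 57346.
rcv_adv, rcv_nxt = 57345, 57346
# B's pure ACK of A's data carries sequence number 57346 and no data.
print(segment_acceptable(57346, 0, rcv_nxt, window_buggy(rcv_adv, rcv_nxt)))  # False
print(segment_acceptable(57346, 0, rcv_nxt, window_fixed(rcv_adv, rcv_nxt)))  # True
```

In this model, an unacceptable segment is dropped after an ACK is sent in reply, so its acknowledgment field is never processed. That is consistent with A repeatedly trying to resynchronize in the trace.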

Significance
It's relatively rare for applications to use TCP in a manner that can exercise this problem. Most applications only transmit bulk data if they know the other end is prepared to receive the data. However, if a client fails to consume data, putting the server in persist mode, and then consumes a small amount of data, it can mistakenly compute a negative window. At this point the client will discard all further packets from the server, including ACKs of the client's own data, since they are not inside the (impossibly-sized) window. If subsequently the client consumes enough data to then send a window update to the server, the situation will be rectified. That is, this situation can only happen if the client consumes 1 < N < MSS bytes, so as not to cause a window update, and then starts its own transmission towards the server of more than a window's worth of data.

Implications
TCP connections will hang and eventually time out.

Relevant RFCs
RFC 793 describes zero window probing. RFC 813 describes Silly Window Syndrome.

Trace file demonstrating it
Trace made from a version of tcpdump modified to print out the sequence number attached to an ACK even if it's dataless. An unmodified tcpdump would not print seq:seq(0); however, for this bug, the sequence number in the ACK is important for unambiguously determining how the TCP is behaving.

   [ Normal connection startup and data transmission from B to A.
     Options, including MSS of 16344 in both directions, omitted
     for clarity. ]
   16:07:32.327616 A > B: S 65360807:65360807(0) win 8192
   16:07:32.327304 B > A: S 65488807:65488807(0) ack 65360808 win 57344
   16:07:32.327425 A > B: . 1:1(0) ack 1 win 57344
   16:07:32.345732 B > A: P 1:2049(2048) ack 1 win 57344
   16:07:32.347013 B > A: P 2049:16385(14336) ack 1 win 57344
   16:07:32.347550 B > A: P 16385:30721(14336) ack 1 win 57344
   16:07:32.348683 B > A: P 30721:45057(14336) ack 1 win 57344
   16:07:32.467286 A > B: . 1:1(0) ack 45057 win 12288
        
   16:07:32.467854 B > A: P 45057:57345(12288) ack 1 win 57344
        
   [ B fills up A's offered window ]
   16:07:32.667276 A > B: . 1:1(0) ack 57345 win 0
        
   [ B probes A's window with a single byte ]
   16:07:37.467438 B > A: . 57345:57346(1) ack 1 win 57344
        
   [ A resynchronizes without accepting the byte ]
   16:07:37.467678 A > B: . 1:1(0) ack 57345 win 0
        
   [ B probes A's window again ]
   16:07:45.467438 B > A: . 57345:57346(1) ack 1 win 57344
        
   [ A resynchronizes and accepts the byte (per the ack field) ]
   16:07:45.667250 A > B: . 1:1(0) ack 57346 win 0
        
   [ The application on A has started generating data.  The first
     packet A sends is small due to a memory allocation bug. ]
   16:07:51.358459 A > B: P 1:2049(2048) ack 57346 win 0
        
   [ B acks A's first packet ]
   16:07:51.467239 B > A: . 57346:57346(0) ack 2049 win 57344
        
   [ This looks as though A accepted B's ACK and is sending
     another packet in response to it.  In fact, A is trying
     to resynchronize with B, and happens to have data to send
     and can send it because the first small packet didn't use
     up cwnd. ]
   16:07:51.467698 A > B: . 2049:14337(12288) ack 57346 win 0

   [ B acks all of the data that A has sent ]
   16:07:51.667283 B > A: . 57346:57346(0) ack 14337 win 57344
        
   [ A tries to resynchronize.  Notice that by the packets
     seen on the network, A and B *are* in fact synchronized;
     A only thinks that they aren't. ]
   16:07:51.667477 A > B: . 14337:14337(0) ack 57346 win 0
        
   [ A's retransmit timer fires, and B acks all of the data.
     A once again tries to resynchronize. ]
   16:07:52.467682 A > B: . 1:14337(14336) ack 57346 win 0
   16:07:52.468166 B > A: . 57346:57346(0) ack 14337 win 57344
   16:07:52.468248 A > B: . 14337:14337(0) ack 57346 win 0
        
   [ A's retransmit timer fires again, and B acks all of the data.
     A once again tries to resynchronize. ]
   16:07:55.467684 A > B: . 1:14337(14336) ack 57346 win 0
        
   16:07:55.468172 B > A: . 57346:57346(0) ack 14337 win 57344
   16:07:55.468254 A > B: . 14337:14337(0) ack 57346 win 0
        

Trace file demonstrating correct behavior
Made between the same two hosts after applying the bug fix mentioned below (and using the same modified tcpdump).

   [ Connection starts up with data transmission from B to A.
     Note that due to a separate bug (the fact that A and B
     are communicating over a loopback driver), B erroneously
     skips slow start. ]
   17:38:09.510854 A > B: S 3110066585:3110066585(0) win 16384
   17:38:09.510926 B > A: S 3110174850:3110174850(0)
                            ack 3110066586 win 57344
   17:38:09.510953 A > B: . 1:1(0) ack 1 win 57344
   17:38:09.512956 B > A: P 1:2049(2048) ack 1 win 57344
   17:38:09.513222 B > A: P 2049:16385(14336) ack 1 win 57344
   17:38:09.513428 B > A: P 16385:30721(14336) ack 1 win 57344
   17:38:09.513638 B > A: P 30721:45057(14336) ack 1 win 57344
   17:38:09.519531 A > B: . 1:1(0) ack 45057 win 12288
   17:38:09.519638 B > A: P 45057:57345(12288) ack 1 win 57344
        
   [ B fills up A's offered window ]
   17:38:09.719526 A > B: . 1:1(0) ack 57345 win 0
        
   [ B probes A's window with a single byte.  A resynchronizes
     without accepting the byte ]
   17:38:14.499661 B > A: . 57345:57346(1) ack 1 win 57344
   17:38:14.499724 A > B: . 1:1(0) ack 57345 win 0
        
   [ B probes A's window again.  A resynchronizes and accepts
     the byte, as indicated by the ack field ]
   17:38:19.499764 B > A: . 57345:57346(1) ack 1 win 57344
   17:38:19.519731 A > B: . 1:1(0) ack 57346 win 0
        
   [ B probes A's window with a single byte.  A resynchronizes
     without accepting the byte ]
   17:38:24.499865 B > A: . 57346:57347(1) ack 1 win 57344
   17:38:24.499934 A > B: . 1:1(0) ack 57346 win 0
        
   [ The application on A has started generating data.
     B acks A's data and A accepts the ACKs and the
     data transfer continues ]
   17:38:28.530265 A > B: P 1:2049(2048) ack 57346 win 0
   17:38:28.719914 B > A: . 57346:57346(0) ack 2049 win 57344
        
   17:38:28.720023 A > B: . 2049:16385(14336) ack 57346 win 0
   17:38:28.720089 A > B: . 16385:30721(14336) ack 57346 win 0
        
   17:38:28.720370 B > A: . 57346:57346(0) ack 30721 win 57344
        
   17:38:28.720462 A > B: . 30721:45057(14336) ack 57346 win 0
   17:38:28.720526 A > B: P 45057:59393(14336) ack 57346 win 0
   17:38:28.720824 A > B: P 59393:73729(14336) ack 57346 win 0
   17:38:28.721124 B > A: . 57346:57346(0) ack 73729 win 47104
        
   17:38:28.721198 A > B: P 73729:88065(14336) ack 57346 win 0
   17:38:28.721379 A > B: P 88065:102401(14336) ack 57346 win 0
        
   17:38:28.721557 A > B: P 102401:116737(14336) ack 57346 win 0
   17:38:28.721863 B > A: . 57346:57346(0) ack 116737 win 36864
        

References
None known.

How to detect
Initiate a connection from a client to a server. Have the server continuously send data until its buffers have been full for long enough to exhaust the window. Next, have the client read 1 byte and then delay for long enough that the server TCP sends a window probe. Now have the client start sending data. At this point, if it ignores the server's ACKs, then the client's TCP suffers from the problem.

How to fix
In one implementation known to exhibit the problem (derived from 4.3-Reno), the problem was introduced when the macro MAX() was replaced by the function call max() for computing the amount of space in the receive window:

          tp->rcv_wnd = max(win, (int)(tp->rcv_adv - tp->rcv_nxt));
        

When data has been received into a window beyond what has been advertised to the other side, rcv_nxt > rcv_adv, making this value negative. It's clear from the (int) cast that this is intended, but max() operates on unsigned values: the negative number is sign-extended into a very large unsigned value, so it compares as "larger". The fix is to change max() to imax():

          tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
        

4.3-Tahoe and before did not have this bug, since it used the macro MAX() for this calculation.
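
The effect of passing a negative (int) value to an unsigned max() can be reproduced outside the kernel. The C sketch below is illustrative only: umax_sketch() and imax_sketch() are hypothetical stand-ins for the unsigned max() and the signed imax() described above, and sequence numbers and windows are modeled with 32-bit fixed-width types (an assumption; the historical types may differ).

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the historical kernel source:
 * umax_sketch() compares as unsigned, like the max() described above;
 * imax_sketch() compares as signed, like imax(). */
static uint32_t umax_sketch(uint32_t a, uint32_t b) { return a > b ? a : b; }
static int32_t imax_sketch(int32_t a, int32_t b) { return a > b ? a : b; }

/* Space between the right edge last advertised (rcv_adv) and the next
 * sequence number expected (rcv_nxt). It becomes negative once data
 * beyond the advertised window has been accepted. The conversion to
 * int32_t assumes the usual two's complement wrap-around. */
static int32_t adv_space(uint32_t rcv_adv, uint32_t rcv_nxt)
{
    return (int32_t)(rcv_adv - rcv_nxt);
}

/* Buggy form: -1 is converted to 0xFFFFFFFF and "wins" the comparison. */
static uint32_t rcv_wnd_buggy(int32_t win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    return umax_sketch((uint32_t)win, (uint32_t)adv_space(rcv_adv, rcv_nxt));
}

/* Fixed form: a signed comparison treats the negative value as small. */
static int32_t rcv_wnd_fixed(int32_t win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    return imax_sketch(win, adv_space(rcv_adv, rcv_nxt));
}
```

With win = 0, rcv_adv = 57345 and rcv_nxt = 57346 (one probe byte accepted beyond the advertised edge, as in the trace above), rcv_wnd_buggy() yields UINT32_MAX while rcv_wnd_fixed() yields 0. When rcv_adv is ahead of rcv_nxt, both forms agree.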

2.13.

Name of Problem
Stretch ACK violation

Classification
Congestion Control/Performance

Description
To improve efficiency (both computer and network), a data receiver may refrain from sending an ACK for each incoming segment, according to [RFC1122]. However, an ACK should not be delayed an inordinate amount of time. Specifically, ACKs SHOULD be sent for every second full-sized segment that arrives. If a second full-sized segment does not arrive within a given timeout (of no more than 0.5 seconds), an ACK should be transmitted, according to [RFC1122]. A TCP receiver which does not generate an ACK for every second full-sized segment exhibits a "Stretch ACK Violation".

Significance
TCP receivers exhibiting this behavior will cause TCP senders to generate burstier traffic, which can degrade performance in congested environments. In addition, generating fewer ACKs increases the amount of time needed by the slow start algorithm to open the congestion window to an appropriate point, which diminishes performance in environments with large bandwidth-delay products. Finally, generating fewer ACKs may cause needless retransmission timeouts in lossy environments, as it increases the possibility that an entire window of ACKs is lost, forcing a retransmission timeout.

Implications
When not in loss recovery, every ACK received by a TCP sender triggers the transmission of new data segments. The burst size is determined by the number of previously unacknowledged segments each ACK covers. Therefore, a TCP receiver ack'ing more than 2 segments at a time causes the sending TCP to generate a larger burst of traffic upon receipt of the ACK. This large burst of traffic can overwhelm an intervening gateway, leading to higher drop rates for both the connection and other connections passing through the congested gateway.

In addition, the TCP slow start algorithm increases the congestion window by 1 segment for each ACK received. Therefore, increasing the ACK interval (thus decreasing the rate at which ACKs are transmitted) increases the amount of time it takes slow start to increase the congestion window to an appropriate operating point, and the connection consequently suffers from reduced performance. This is especially true for connections using large windows.
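
The slow start penalty can be estimated with an idealized model. The C sketch below is not taken from any implementation; it assumes no losses, an unlimited receiver window, a congestion window counted in whole segments, one ACK per segs_per_ack segments, and a delayed-ACK timer that acknowledges any leftover segments at the end of each round trip. In the same model, each ACK covering k segments lets the sender release about k + 1 new segments, which is the burst effect described in the previous paragraph.

```c
/* Idealized slow start: returns the number of round trips needed for the
 * congestion window (in segments, starting at 1) to reach `target` when
 * the receiver sends one ACK per `segs_per_ack` segments (must be >= 1).
 * Assumptions: no loss, unlimited receiver window, and a delayed-ACK
 * timer that acknowledges any leftover segments every round trip. */
static int slow_start_rtts(unsigned target, unsigned segs_per_ack)
{
    unsigned cwnd = 1;
    int rtts = 0;

    while (cwnd < target) {
        /* ACKs generated for one window: ceil(cwnd / segs_per_ack). */
        unsigned acks = (cwnd + segs_per_ack - 1) / segs_per_ack;
        cwnd += acks;   /* slow start: +1 segment per ACK received */
        rtts++;
    }
    return rtts;
}
```

Under these assumptions, reaching a 64-segment window takes 6 round trips when every segment is ACKed, 10 when every second segment is ACKed, and 12 when every third segment is ACKed.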

Relevant RFCs
RFC 1122 outlines delayed ACKs as a recommended mechanism.

Trace file demonstrating it
Trace file taken using tcpdump at host B, the data receiver (and ACK originator). The advertised window (which never changed) and timestamp options have been omitted for clarity, except for the first packet sent by A:

   12:09:24.820187 A.1174 > B.3999: . 2049:3497(1448) ack 1
       win 33580 <nop,nop,timestamp 2249877 2249914> [tos 0x8]
   12:09:24.824147 A.1174 > B.3999: . 3497:4945(1448) ack 1
   12:09:24.832034 A.1174 > B.3999: . 4945:6393(1448) ack 1
   12:09:24.832222 B.3999 > A.1174: . ack 6393
   12:09:24.934837 A.1174 > B.3999: . 6393:7841(1448) ack 1
   12:09:24.942721 A.1174 > B.3999: . 7841:9289(1448) ack 1
   12:09:24.950605 A.1174 > B.3999: . 9289:10737(1448) ack 1
   12:09:24.950797 B.3999 > A.1174: . ack 10737
   12:09:24.958488 A.1174 > B.3999: . 10737:12185(1448) ack 1
   12:09:25.052330 A.1174 > B.3999: . 12185:13633(1448) ack 1
   12:09:25.060216 A.1174 > B.3999: . 13633:15081(1448) ack 1
   12:09:25.060405 B.3999 > A.1174: . ack 15081
        

This portion of the trace clearly shows that the receiver (host B) sends an ACK for every third full-sized packet received. Further investigation of this implementation found that the cause of the increased ACK interval was the TCP options being used. The implementation sent an ACK after it was holding 2*MSS worth of unacknowledged data. In the above case, the MSS is 1460 bytes, so the receiver transmits an ACK after it is holding at least 2920 bytes of unacknowledged data. However, the TCP options being used [RFC1323] took 12 bytes away from the data portion of each packet. This produced packets containing 1448 bytes of data. But the additional bytes used by the options in the header were not taken into account when determining when to trigger an ACK. Therefore, it took 3 data segments before the data receiver was holding enough unacknowledged data (>= 2*MSS, or 2920 bytes in the above example) to transmit an ACK.
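
The miscount can be reproduced in a few lines of C. In this sketch, struct rcv_sketch and both should_ack functions are hypothetical; the option-aware test is one plausible way of taking option length into account, not the actual modification made to the implementation traced here. The MSS of 1460 bytes and the 12 bytes of options per segment are taken from the example.

```c
/* Hypothetical receiver state for this sketch. */
struct rcv_sketch {
    unsigned mss;       /* MSS from the SYN exchange, e.g. 1460 */
    unsigned optlen;    /* option bytes in every segment, e.g. 12 (< mss) */
    unsigned unacked;   /* data bytes received but not yet acknowledged */
};

/* Buggy trigger: ACK once 2 * MSS data bytes are held, ignoring that
 * options shrink the payload of each full-sized segment. */
static int should_ack_buggy(const struct rcv_sketch *r)
{
    return r->unacked >= 2 * r->mss;
}

/* Option-aware trigger: a full-sized segment carries mss - optlen bytes. */
static int should_ack_fixed(const struct rcv_sketch *r)
{
    return r->unacked >= 2 * (r->mss - r->optlen);
}

/* Feed full-sized segments to a receiver until the trigger fires and
 * return how many segments that took. */
static int segments_per_ack(unsigned mss, unsigned optlen,
                            int (*should_ack)(const struct rcv_sketch *))
{
    struct rcv_sketch r = { mss, optlen, 0 };
    int segs = 0;

    do {
        r.unacked += mss - optlen;   /* one full-sized segment arrives */
        segs++;
    } while (!should_ack(&r));
    return segs;
}
```

With mss = 1460 and optlen = 12, the buggy trigger fires only after the third 1448-byte segment (4344 >= 2920), while the option-aware trigger fires after the second (2896 >= 2896). Without options, both fire after two segments.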

Trace file demonstrating correct behavior
Trace file taken using tcpdump at host B, the data receiver (and ACK originator), again with window and timestamp information omitted except for the first packet:

   12:06:53.627320 A.1172 > B.3999: . 1449:2897(1448) ack 1
       win 33580 <nop,nop,timestamp 2249575 2249612> [tos 0x8]
   12:06:53.634773 A.1172 > B.3999: . 2897:4345(1448) ack 1
   12:06:53.634961 B.3999 > A.1172: . ack 4345
   12:06:53.737326 A.1172 > B.3999: . 4345:5793(1448) ack 1
   12:06:53.744401 A.1172 > B.3999: . 5793:7241(1448) ack 1
   12:06:53.744592 B.3999 > A.1172: . ack 7241
        
   12:06:53.752287 A.1172 > B.3999: . 7241:8689(1448) ack 1
   12:06:53.847332 A.1172 > B.3999: . 8689:10137(1448) ack 1
   12:06:53.847525 B.3999 > A.1172: . ack 10137
        

This trace shows the TCP receiver (host B) ack'ing every second full-sized packet, according to [RFC1122]. This is the same implementation shown above, with slight modifications that allow the receiver to take the length of the options into account when deciding when to transmit an ACK.

References
This problem is documented in [Allman97] and [Paxson97].

How to detect
Stretch ACK violations show up immediately in receiver-side packet traces of bulk transfers, as shown above. However, packet traces made on the sender side of the TCP connection may lead to ambiguities when diagnosing this problem due to the possibility of lost ACKs.
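
For receiver-side traces, the check amounts to counting full-sized data segments between consecutive ACKs. The C sketch below assumes the trace has already been reduced to a hypothetical array of events (a positive value is the payload length of an arriving data segment, 0 marks an ACK sent by the receiver); parsing tcpdump output is not shown. A result above 2 points to a stretch ACK violation.

```c
/* Events from a receiver-side trace, in order: a positive value is the
 * payload length of an arriving data segment, 0 marks an ACK sent by
 * the receiver. Returns the largest number of full-sized segments
 * (payload >= full_size) covered by a single ACK; segments after the
 * last ACK are ignored. */
static int max_full_segments_per_ack(const unsigned *events, int n,
                                     unsigned full_size)
{
    int run = 0, worst = 0;

    for (int i = 0; i < n; i++) {
        if (events[i] == 0) {            /* receiver sent an ACK */
            if (run > worst)
                worst = run;
            run = 0;
        } else if (events[i] >= full_size) {
            run++;                       /* full-sized data segment */
        }
    }
    return worst;
}
```

Taking 1448 bytes as the full-sized payload, the first trace above (three segments per ACK) gives 3, and the trace demonstrating correct behavior gives 2.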

2.14.

Name of Problem
Retransmission sends multiple packets

Classification
Congestion control

Description
When a TCP retransmits a segment due to a timeout expiration or beginning a fast retransmission sequence, it should only transmit a single segment. A TCP that transmits more than one segment exhibits "Retransmission Sends Multiple Packets".

Instances of this problem have been known to occur due to miscomputations involving the use of TCP options. TCP options increase the TCP header beyond its usual size of 20 bytes. The total size of header must be taken into account when retransmitting a packet. If a TCP sender does not account for the length of the TCP options when determining how much data to retransmit, it will send too much data to fit into a single packet. In this case, the correct retransmission will be followed by a short segment (tinygram) containing data that may not need to be retransmitted.

A specific case is a TCP using the RFC 1323 timestamp option, which adds 12 bytes to the standard 20-byte TCP header. On retransmission of a packet, the 12-byte option is incorrectly interpreted as part of the data portion of the segment. A standard TCP header and a new 12-byte option are added to the data, which yields a transmission of 12 bytes more data than contained in the original segment. This overflow causes a smaller packet, with 12 data bytes, to be transmitted.
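
The overflow can be illustrated with a small C sketch. retransmit_split() is a hypothetical helper, not code from any particular stack; the 460-byte per-packet data limit and the 12-byte option length are assumptions taken from the trace in this section.

```c
/* Split the bytes selected for retransmission into packets carrying at
 * most max_data payload bytes each. With buggy != 0, the option bytes
 * of the original segment are wrongly counted as data, as described
 * above. Returns the number of packets; lens[i] gets each payload size. */
static int retransmit_split(unsigned data_len, unsigned optlen,
                            unsigned max_data, int buggy,
                            unsigned lens[], int max_pkts)
{
    unsigned remaining = buggy ? data_len + optlen : data_len;
    int n = 0;

    while (remaining > 0 && n < max_pkts) {
        unsigned chunk = remaining < max_data ? remaining : max_data;
        lens[n++] = chunk;
        remaining -= chunk;
    }
    return n;
}
```

With buggy set, 460 data bytes plus 12 option bytes become two packets of 460 and 12 bytes, matching the 7821:8281(460) and 8281:8293(12) pair in the trace; with the option bytes excluded, a single 460-byte packet is sent.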

Significance
This problem is somewhat serious for congested environments because the TCP implementation injects more packets into the network than is appropriate. However, since a tinygram is only sent in response to a fast retransmit or a timeout, it does not affect the sustained sending rate.

Implications
A TCP exhibiting this behavior is stressing the network with more traffic than appropriate, and stressing routers by increasing the number of packets they must process. The redundant tinygram will also elicit a duplicate ACK from the receiver, resulting in yet another unnecessary transmission.

Relevant RFCs
RFC 1122 requires use of slow start after loss; RFC 2001 explicates slow start; RFC 1323 describes the timestamp option that has been observed to lead to some implementations exhibiting this problem.

Trace file demonstrating it
Made using tcpdump recording at a machine on the same subnet as Host A. Host A is the sender and Host B is the receiver. The advertised window and timestamp options have been omitted for clarity, except for the first segment sent by host A. In addition, portions of the trace file not pertaining to the packet in question have been removed (missing packets are denoted by "[...]" in the trace).

   11:55:22.701668 A > B: . 7361:7821(460) ack 1
       win 49324 <nop,nop,timestamp 3485348 3485113>
   11:55:22.702109 A > B: . 7821:8281(460) ack 1
   [...]
        
   11:55:23.112405 B > A: . ack 7821
   11:55:23.113069 A > B: . 12421:12881(460) ack 1
   11:55:23.113511 A > B: . 12881:13341(460) ack 1
   11:55:23.333077 B > A: . ack 7821
   11:55:23.336860 B > A: . ack 7821
   11:55:23.340638 B > A: . ack 7821
   11:55:23.341290 A > B: . 7821:8281(460) ack 1
   11:55:23.341317 A > B: . 8281:8293(12) ack 1
        
   11:55:23.498242 B > A: . ack 7821
   11:55:23.506850 B > A: . ack 7821
   11:55:23.510630 B > A: . ack 7821
        

[...]

   11:55:23.746649 B > A: . ack 10581
        

The second packet in the above trace shows the original transmission of a segment which is later dropped. After 3 duplicate ACKs, the ninth packet in the trace shows the dropped packet (7821:8281), with a 460-byte payload, being retransmitted. Immediately following this retransmission, a packet with a 12-byte payload is unnecessarily sent.

Trace file demonstrating correct behavior
The trace file would be identical to the one above, with a single line:

      11:55:23.341317 A > B: . 8281:8293(12) ack 1
        

omitted.

References
[Brakmo95]

How to detect
This problem can be detected by examining a packet trace of the TCP connections of a machine using TCP options, during which a packet is retransmitted.

2.15.

Name of Problem
Failure to send FIN notification promptly

Classification
Performance

Description
When an application closes a connection, the corresponding TCP should send the FIN notification promptly to its peer (unless prevented by the congestion window). If a TCP implementation delays in sending the FIN notification, for example due to waiting until unacknowledged data has been acknowledged, then it is said to exhibit "Failure to send FIN notification promptly".

Also, while not strictly required, FIN segments should include the PSH flag to ensure expedited delivery of any pending data at the receiver.
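
The two behaviors differ only in the condition used to release the FIN. The C sketch below uses a hypothetical sender state in relative sequence numbers (struct snd_sketch and both functions are illustrative, not from any real implementation) and ignores the sequence number consumed by the FIN itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender state, in relative sequence numbers. */
struct snd_sketch {
    uint32_t snd_una;   /* oldest unacknowledged byte */
    uint32_t snd_nxt;   /* next byte to send */
    uint32_t snd_end;   /* one past the last byte the application queued */
    uint32_t usable;    /* bytes the send and congestion windows still allow */
    bool app_closed;    /* application has closed the connection */
};

/* Delayed FIN: wait until all data is sent AND acknowledged. */
static bool fin_now_buggy(const struct snd_sketch *s)
{
    return s->app_closed && s->snd_nxt == s->snd_end &&
           s->snd_una == s->snd_nxt;
}

/* Prompt FIN: send it (on the last data segment, with PSH, if data is
 * still unsent) as soon as the application has closed and the remaining
 * queued data fits in the usable window. Unacknowledged data does not
 * hold it back. */
static bool fin_now_prompt(const struct snd_sketch *s)
{
    return s->app_closed && (uint32_t)(s->snd_end - s->snd_nxt) <= s->usable;
}
```

In the first trace below, machine B has sent bytes 1-50, with bytes 11-50 unacknowledged, when the application closes: fin_now_buggy() stays false until the ACK for byte 51 arrives, whereas fin_now_prompt() is true immediately. Machine D in the second trace corresponds to fin_now_prompt() being true while the final 40 bytes are still unsent, so the FIN rides on that segment.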

Significance
The greatest impact occurs for short-lived connections, since for these the additional time required to close the connection introduces the greatest relative delay.

The additional time can be significant in the common case of the sender waiting for an ACK that is delayed by the receiver.

Implications
Can diminish total throughput as seen at the application layer, because connection termination takes longer to complete.

Relevant RFCs
RFC 793 indicates that a receiver should treat an incoming FIN flag as implying the push function.

Trace file demonstrating it
Made using tcpdump (no losses reported by the packet filter).

   10:04:38.68 A > B: S 1031850376:1031850376(0) win 4096
                   <mss 1460,wscale 0,eol> (DF)
   10:04:38.71 B > A: S 596916473:596916473(0) ack 1031850377
                   win 8760 <mss 1460> (DF)
   10:04:38.73 A > B: . ack 1 win 4096 (DF)
   10:04:41.98 A > B: P 1:4(3) ack 1 win 4096 (DF)
   10:04:42.15 B > A: . ack 4 win 8757 (DF)
   10:04:42.23 A > B: P 4:7(3) ack 1 win 4096 (DF)
   10:04:42.25 B > A: P 1:11(10) ack 7 win 8754 (DF)
   10:04:42.32 A > B: . ack 11 win 4096 (DF)
   10:04:42.33 B > A: P 11:51(40) ack 7 win 8754 (DF)
   10:04:42.51 A > B: . ack 51 win 4096 (DF)
   10:04:42.53 B > A: F 51:51(0) ack 7 win 8754 (DF)
   10:04:42.56 A > B: FP 7:7(0) ack 52 win 4096 (DF)
   10:04:42.58 B > A: . ack 8 win 8754 (DF)
        

Machine B in the trace above does not send out a FIN notification promptly if there is any data outstanding. It instead waits for all unacknowledged data to be acknowledged before sending the FIN segment. The connection was closed at 10:04:42.33 after requesting 40 bytes to be sent. However, the FIN notification isn't sent until 10:04:42.53, after the (delayed) acknowledgement of the 40 bytes of data arrives at 10:04:42.51.

Trace file demonstrating correct behavior
Made using tcpdump (no losses reported by the packet filter).

   10:27:53.85 C > D: S 419744533:419744533(0) win 4096
                   <mss 1460,wscale 0,eol> (DF)
   10:27:53.92 D > C: S 10082297:10082297(0) ack 419744534
                   win 8760 <mss 1460> (DF)
   10:27:53.95 C > D: . ack 1 win 4096 (DF)
   10:27:54.42 C > D: P 1:4(3) ack 1 win 4096 (DF)
   10:27:54.62 D > C: . ack 4 win 8757 (DF)
   10:27:54.76 C > D: P 4:7(3) ack 1 win 4096 (DF)
   10:27:54.89 D > C: P 1:11(10) ack 7 win 8754 (DF)
   10:27:54.90 D > C: FP 11:51(40) ack 7 win 8754 (DF)
   10:27:54.92 C > D: . ack 52 win 4096 (DF)
   10:27:55.01 C > D: FP 7:7(0) ack 52 win 4096 (DF)
   10:27:55.09 D > C: . ack 8 win 8754 (DF)
        

Here, Machine D sends a FIN with 40 bytes of data even before the original 10 octets have been acknowledged. This is correct behavior as it provides for the highest performance.

References
This problem is documented in [Dawson97].

How to detect
For implementations manifesting this problem, it shows up on a packet trace.

2.16.

Name of Problem
Failure to send a RST after Half Duplex Close

Classification
Resource management

Description
RFC 1122 4.2.2.13 states that a TCP SHOULD send a RST if data is received after "half duplex close", i.e., if it cannot be delivered to the application. A TCP that fails to do so is said to exhibit "Failure to send a RST after Half Duplex Close".

Significance
Potentially serious for TCP endpoints that manage large numbers of connections, due to exhaustion of memory and/or process slots available for managing connection state.

Implications
Failure to send the RST can lead to permanently hung TCP connections. This problem has been demonstrated when HTTP clients abort connections, which is common when users move on to a new page before the current page has finished downloading. The HTTP client closes by transmitting a FIN while the server is transmitting images, text, etc. The server TCP receives the FIN, but its application does not close the connection until all data has been queued for transmission. Since the server will not transmit a FIN until all the preceding data has been transmitted, deadlock results if the client TCP does not consume the pending data or tear down the connection: the window decreases to zero, since the client cannot pass the data to the application, and the server sends probe segments. The client acknowledges the probe segments with a zero window. As mandated in RFC 1122 4.2.2.17, the probe segments are transmitted forever. Server connection state remains in CLOSE_WAIT, and eventually server processes are exhausted.

Note that there are two bugs. First, probe segments should be ignored if the window can never subsequently increase. Second, a RST should be sent when data is received after half duplex close. Fixing the first bug, but not the second, results in the probe segments eventually timing out the connection, but the server remains in CLOSE_WAIT for a significant and unnecessary period.
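
The second fix amounts to one extra check in segment processing. In this C sketch, struct conn_sketch, its app_closed flag and on_incoming() are hypothetical names; real stacks track this state differently. The point is that a segment carrying data (including a one-byte window probe) for a connection the application has closed is answered with a RST, while a pure ACK, such as the server's ACK of the client's FIN, is not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state for this sketch. */
struct conn_sketch {
    bool app_closed;    /* application has closed and will never read again */
};

enum seg_action {
    SEG_ACCEPT,         /* process normally (deliver data, handle ACK) */
    SEG_SEND_RST        /* data can never be delivered: reset instead */
};

/* Decide how to handle an incoming segment carrying seg_len data bytes.
 * Zero-window probes carry data, so they take the RST path too; the
 * caller would then discard the connection state. */
static enum seg_action on_incoming(const struct conn_sketch *c, uint32_t seg_len)
{
    if (c->app_closed && seg_len > 0)
        return SEG_SEND_RST;    /* RFC 1122 4.2.2.13 */
    return SEG_ACCEPT;
}
```

In the trace demonstrating correct behavior, each data segment the server sends after the client's FIN would take the SEG_SEND_RST branch, which matches the client's RST responses.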

Relevant RFCs
RFC 1122 sections 4.2.2.13 and 4.2.2.17.

Trace file demonstrating it
Made using an unknown network analyzer. No drop information available.

   client.1391 > server.8080: S 0:1(0) ack: 0 win: 2000 <mss: 5b4>
   server.8080 > client.1391: SA 8c01:8c02(0) ack: 1 win: 8000 <mss:100>
   client.1391 > server.8080: PA
   client.1391 > server.8080: PA 1:1c2(1c1) ack: 8c02 win: 2000
   server.8080 > client.1391: [DF] PA 8c02:8cde(dc) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A 8cde:9292(5b4) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A 9292:9846(5b4) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A 9846:9dfa(5b4) ack: 1c2 win: 8000
   client.1391 > server.8080: PA
   server.8080 > client.1391: [DF] A 9dfa:a3ae(5b4) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A a3ae:a962(5b4) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A a962:af16(5b4) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A af16:b4ca(5b4) ack: 1c2 win: 8000
   client.1391 > server.8080: PA
   server.8080 > client.1391: [DF] A b4ca:ba7e(5b4) ack: 1c2 win: 8000
   server.8080 > client.1391: [DF] A b4ca:ba7e(5b4) ack: 1c2 win: 8000
        
   client.1391 > server.8080: PA
   server.8080 > client.1391: [DF] A ba7e:bdfa(37c) ack: 1c2 win: 8000
   client.1391 > server.8080: PA
   server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c2 win: 8000
   client.1391 > server.8080: PA
        

   [ HTTP client aborts and enters FIN_WAIT_1 ]

   client.1391 > server.8080: FPA

   [ server ACKs the FIN and enters CLOSE_WAIT ]

   server.8080 > client.1391: [DF] A

   [ client enters FIN_WAIT_2 ]

   server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
        

   [ server continues to try to send its data ]

   client.1391 > server.8080: PA < window = 0 >
   server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
   client.1391 > server.8080: PA < window = 0 >
   server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
   client.1391 > server.8080: PA < window = 0 >
   server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
   client.1391 > server.8080: PA < window = 0 >
   server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
   client.1391 > server.8080: PA < window = 0 >
        

   [ ... repeat ad exhaustium ... ]

Trace file demonstrating correct behavior
Made using an unknown network analyzer. No drop information available.

   client > server D=80 S=59500 Syn Seq=337 Len=0 Win=8760
   server > client D=59500 S=80 Syn Ack=338 Seq=80153 Len=0 Win=8760
   client > server D=80 S=59500 Ack=80154 Seq=338 Len=0 Win=8760
        

   [ ... normal data omitted ... ]

   client > server D=80 S=59500 Ack=14559 Seq=596 Len=0 Win=8760
   server > client D=59500 S=80 Ack=596 Seq=114559 Len=1460 Win=8760
        

   [ client closes connection ]

   client > server D=80 S=59500 Fin Seq=596 Len=0 Win=8760
        
   server > client D=59500 S=80 Ack=597 Seq=116019 Len=1460 Win=8760
        

[ client sends RST (RFC1122 4.2.2.13) ]

   client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
   server > client D=59500 S=80 Ack=597 Seq=117479 Len=1460 Win=8760
   client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
   server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760
   client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
   server > client D=59500 S=80 Ack=597 Seq=120399 Len=892 Win=8760
   client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
   server > client D=59500 S=80 Ack=597 Seq=121291 Len=1460 Win=8760
   client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
        

"client" sends a number of RSTs, one in response to each incoming packet from "server". One might wonder why "server" keeps sending data packets after it has received a RST from "client"; the explanation is that "server" had already transmitted all five of the data packets before receiving the first RST from "client", so it is too late to avoid transmitting them.

How to detect

The problem can be detected by inspecting packet traces of a large, interrupted bulk transfer.
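As a sketch of that inspection, the following Python counts, for the endpoint that closed first, how many data packets reached it after its FIN and how many RSTs it sent back. It assumes the analyzer line layout shown in the trace above (`src > dst ... Flag ... Len=N ...`); the function name `rsts_after_close` is hypothetical and not part of any tool.

```python
import re

# Assumed layout of the analyzer output shown above, e.g.
#   "client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0"
LINE = re.compile(r"^\s*(\S+) > (\S+) .*?\bLen=(\d+)")


def rsts_after_close(lines, closer="client"):
    """Return (data packets sent to `closer` after its FIN, RSTs sent by `closer`)."""
    fin_seen = False
    data_in = rsts_out = 0
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip annotations such as "[ client closes connection ]"
        src, dst, length = m.group(1), m.group(2), int(m.group(3))
        words = line.split()
        if src == closer and "Fin" in words:
            fin_seen = True
        elif fin_seen and dst == closer and length > 0:
            data_in += 1
        elif fin_seen and src == closer and "Rst" in words:
            rsts_out += 1
    return data_in, rsts_out


sample = [
    "client > server D=80 S=59500 Fin Seq=596 Len=0 Win=8760",
    "server > client D=59500 S=80 Ack=597 Seq=116019 Len=1460 Win=8760",
    "client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0",
    "server > client D=59500 S=80 Ack=597 Seq=117479 Len=1460 Win=8760",
    "client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0",
]
print(rsts_after_close(sample))  # (2, 2): one RST per incoming data packet
```

A TCP with the problem would show data packets arriving after its FIN but a RST count of zero.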

2.17.

Name of Problem

Failure to RST on close with data pending

Classification

Resource management

Description

When an application closes a connection in such a way that it can no longer read any received data, the TCP SHOULD, per section 4.2.2.13 of RFC 1122, send a RST if there is any unread received data, or if any new data is received. A TCP that fails to do so exhibits "Failure to RST on close with data pending".
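The rule above amounts to a small decision. The following Python is a minimal sketch assuming the TCP knows how many received octets are still unread when the application closes; the class and method names are hypothetical and do not come from any implementation.

```python
from dataclasses import dataclass


@dataclass
class ClosingEndpoint:
    """Hypothetical model of an endpoint whose application can no longer read data."""
    unread_octets: int  # received data buffered but never read by the application

    def segment_on_close(self) -> str:
        # RFC 1122 4.2.2.13: unread received data means the TCP SHOULD send a RST
        return "RST" if self.unread_octets > 0 else "FIN"

    def response_to_arrival(self, payload_octets: int) -> str:
        # Any new data that arrives after such a close SHOULD also elicit a RST
        return "RST" if payload_octets > 0 else "normal processing"


# An endpoint that has advertised a zero window holds unread data (e.g. 4096 octets)
print(ClosingEndpoint(unread_octets=4096).segment_on_close())  # RST
```

A TCP exhibiting the problem behaves as if `segment_on_close` always returned "FIN".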

Note that, for some TCPs, this situation can be caused by an application "crashing" while a peer is sending data.

We have observed a number of TCPs that exhibit this problem. The problem is less serious if any subsequent data sent to the now-closed connection endpoint elicits a RST (see illustration below).

Significance

This problem is most significant for endpoints that engage in large numbers of connections, as their ability to do so will be curtailed as they leak away resources.

Implications

Failure to reset the connection can lead to permanently hung connections, in which the remote endpoint takes no further action to tear down the connection because it is waiting on the local TCP to first take some action. This is particularly the case if the local TCP also allows the advertised window to go to zero, and fails to tear down the connection when the remote TCP engages in "persist" probes (see example below).

Relevant RFCs

RFC 1122 section 4.2.2.13. Also, RFC 1122 section 4.2.2.17 for the zero-window probing discussion below.

Trace file demonstrating it

Made using tcpdump. No drop information available.

   13:11:46.04 A > B: S 458659166:458659166(0) win 4096
                       <mss 1460,wscale 0,eol> (DF)
   13:11:46.04 B > A: S 792320000:792320000(0) ack 458659167
                       win 4096
   13:11:46.04 A > B: . ack 1 win 4096 (DF)
   13:11:55.80 A > B: . 1:513(512) ack 1 win 4096 (DF)
   13:11:55.80 A > B: . 513:1025(512) ack 1 win 4096 (DF)
   13:11:55.83 B > A: . ack 1025 win 3072
   13:11:55.84 A > B: . 1025:1537(512) ack 1 win 4096 (DF)
   13:11:55.84 A > B: . 1537:2049(512) ack 1 win 4096 (DF)
   13:11:55.85 A > B: . 2049:2561(512) ack 1 win 4096 (DF)
   13:11:56.03 B > A: . ack 2561 win 1536
   13:11:56.05 A > B: . 2561:3073(512) ack 1 win 4096 (DF)
   13:11:56.06 A > B: . 3073:3585(512) ack 1 win 4096 (DF)
   13:11:56.06 A > B: . 3585:4097(512) ack 1 win 4096 (DF)
   13:11:56.23 B > A: . ack 4097 win 0
   13:11:58.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
   13:11:58.16 B > A: . ack 4097 win 0
   13:12:00.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
   13:12:00.16 B > A: . ack 4097 win 0
   13:12:02.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
   13:12:02.16 B > A: . ack 4097 win 0
   13:12:05.37 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
   13:12:05.37 B > A: . ack 4097 win 0
   13:12:06.36 B > A: F 1:1(0) ack 4097 win 0
   13:12:06.37 A > B: . ack 2 win 4096 (DF)
   13:12:11.78 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
        
   13:12:11.78 B > A: . ack 4097 win 0
   13:12:24.59 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
   13:12:24.60 B > A: . ack 4097 win 0
   13:12:50.22 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
   13:12:50.22 B > A: . ack 4097 win 0
        

Machine B in the trace above does not drop received data when the socket is "closed" by the application (in this case, the application process was terminated). This occurred at approximately 13:12:06.36 and resulted in the FIN being sent in response to the close. However, because there is no longer an application to deliver the data to, the TCP should have instead sent a RST.

Note: Machine A's zero-window probing is also broken. It is resending old data, rather than new data. Section 3.7 in RFC 793 and Section 4.2.2.17 in RFC 1122 discuss zero-window probing.
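To make the distinction concrete, the following Python sketch classifies a one-octet probe by comparing its sequence range with the receiver's last acknowledgment: octets below that number have already been acknowledged, so resending them is "old data". The helper name is hypothetical, and it reads ranges in tcpdump's `first:last(nbytes)` notation, where `last` is one past the final octet carried.

```python
def classify_probe(first: int, last: int, peer_ack: int) -> str:
    """Classify a window probe from tcpdump's first:last(nbytes) fields (hypothetical helper)."""
    if last - first != 1:
        return "not a one-octet probe"
    # Sequence numbers below peer_ack were already acknowledged by the receiver
    return "new data" if first >= peer_ack else "old data"


print(classify_probe(4096, 4097, peer_ack=4097))  # old data, as Machine A sends above
print(classify_probe(4097, 4098, peer_ack=4097))  # new data
```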

Trace file demonstrating better behavior

Made using tcpdump. No drop information available.

Better, but still not fully correct, behavior, per the discussion below. We show this behavior because it has been observed for a number of different TCP implementations.

   13:48:29.24 C > D: S 73445554:73445554(0) win 4096
                       <mss 1460,wscale 0,eol> (DF)
   13:48:29.24 D > C: S 36050296:36050296(0) ack 73445555
                       win 4096 <mss 1460,wscale 0,eol> (DF)
   13:48:29.25 C > D: . ack 1 win 4096 (DF)
   13:48:30.78 C > D: . 1:1461(1460) ack 1 win 4096 (DF)
   13:48:30.79 C > D: . 1461:2921(1460) ack 1 win 4096 (DF)
   13:48:30.80 D > C: . ack 2921 win 1176 (DF)
   13:48:32.75 C > D: . 2921:4097(1176) ack 1 win 4096 (DF)
   13:48:32.82 D > C: . ack 4097 win 0 (DF)
   13:48:34.76 C > D: . 4096:4097(1) ack 1 win 4096 (DF)
   13:48:34.84 D > C: . ack 4097 win 0 (DF)
   13:48:36.34 D > C: FP 1:1(0) ack 4097 win 4096 (DF)
   13:48:36.34 C > D: . 4097:5557(1460) ack 2 win 4096 (DF)
   13:48:36.34 D > C: R 36050298:36050298(0) win 24576
   13:48:36.34 C > D: . 5557:7017(1460) ack 2 win 4096 (DF)
   13:48:36.34 D > C: R 36050298:36050298(0) win 24576
        

In this trace, the application process is terminated on Machine D at approximately 13:48:36.34. Its TCP sends the FIN with the window opened again (since it discarded the previously received data). Machine C promptly sends more data, causing Machine D to reset the connection since it cannot deliver the data to the application. Ideally, Machine D SHOULD send a RST instead of dropping the data and re-opening the receive window.

Note: Machine C's zero-window probing is broken, the same as in the example above.

Trace file demonstrating correct behavior

Made using tcpdump. No losses reported by the packet filter.

   14:12:02.19 E > F: S 1143360000:1143360000(0) win 4096
   14:12:02.19 F > E: S 1002988443:1002988443(0) ack 1143360001
                       win 4096 <mss 1460> (DF)
   14:12:02.19 E > F: . ack 1 win 4096
   14:12:10.43 E > F: . 1:513(512) ack 1 win 4096
   14:12:10.61 F > E: . ack 513 win 3584 (DF)
   14:12:10.61 E > F: . 513:1025(512) ack 1 win 4096
   14:12:10.61 E > F: . 1025:1537(512) ack 1 win 4096
   14:12:10.81 F > E: . ack 1537 win 2560 (DF)
   14:12:10.81 E > F: . 1537:2049(512) ack 1 win 4096
   14:12:10.81 E > F: . 2049:2561(512) ack 1 win 4096
   14:12:10.81 E > F: . 2561:3073(512) ack 1 win 4096
   14:12:11.01 F > E: . ack 3073 win 1024 (DF)
   14:12:11.01 E > F: . 3073:3585(512) ack 1 win 4096
   14:12:11.01 E > F: . 3585:4097(512) ack 1 win 4096
   14:12:11.21 F > E: . ack 4097 win 0 (DF)
   14:12:15.88 E > F: . 4097:4098(1) ack 1 win 4096
   14:12:16.06 F > E: . ack 4097 win 0 (DF)
   14:12:20.88 E > F: . 4097:4098(1) ack 1 win 4096
   14:12:20.91 F > E: . ack 4097 win 0 (DF)
   14:12:21.94 F > E: R 1002988444:1002988444(0) win 4096
        

When the application terminates at 14:12:21.94, F immediately sends a RST.

Note: Machine E's zero-window probing is (finally) correct.

How to detect

The problem can often be detected by inspecting packet traces of a transfer in which the receiving application terminates abnormally. When doing so, there can be an ambiguity (if only looking at the trace) as to whether the receiving TCP did indeed have unread data that it could now no longer deliver. To provoke this, it may help to suspend the receiving application so that it fails to consume any data, eventually exhausting the advertised window. At this point, since the advertised window is zero, we know that the receiving TCP has undelivered data buffered up. Terminating the application process then should suffice to test the correctness of the TCP's behavior.
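Once the window has been driven to zero, the check reduces to which teardown segment the receiver sends first. The following Python sketch scans tcpdump-style lines (the layout is assumed from the traces above; the function name is hypothetical) and reports "FIN" or "RST". On the traces above it would report "FIN" for Machine B, "FIN" for Machine D (whose RST only followed new data), and "RST" for Machine F.

```python
import re

# Assumed tcpdump layout: "TIME SRC > DST: FLAGS ... win N ..."
SEGMENT = re.compile(r"^\s*\S+ (\S+) > \S+: (\S+) .*\bwin (\d+)")


def close_after_zero_window(lines, receiver):
    """Return "FIN" or "RST" for the first teardown segment that `receiver`
    sends after advertising a zero window, or None if there is none."""
    zero_window_seen = False
    for line in lines:
        m = SEGMENT.match(line)
        if not m or m.group(1) != receiver:
            continue  # other host, or a continuation line without addresses
        flags, win = m.group(2), int(m.group(3))
        if zero_window_seen and "R" in flags:
            return "RST"
        if zero_window_seen and "F" in flags:
            return "FIN"
        if win == 0:
            zero_window_seen = True  # receiver now holds unread data
    return None


trace_b = [
    "13:11:56.23 B > A: . ack 4097 win 0",
    "13:12:05.37 A > B: . 4096:4097(1) ack 1 win 4096 (DF)",
    "13:12:06.36 B > A: F 1:1(0) ack 4097 win 0",
]
print(close_after_zero_window(trace_b, "B"))  # FIN
```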

2.18.

Name of Problem

Options missing from TCP MSS calculation

Classification

Reliability / performance

Description

When a TCP determines how much data to send per packet, it calculates a segment size based on the MTU of the path. It must then subtract from that MTU the size of the IP and TCP headers in the packet. If IP options and TCP options are not taken into account correctly in this calculation, the resulting segment size may be too large. TCPs that do so are said to exhibit "Options missing from TCP MSS calculation".
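RFC 1122 (section 4.2.2.6) gives the effective send MSS as Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize, where TCPhdrsize includes any TCP options. The Python sketch below evaluates that formula alongside a variant that omits both option terms. It assumes IPv4, takes MMS_S as the path MTU minus the 20-octet fixed IP header, and uses illustrative values: a 1500-octet MTU, 8 octets of IP options (the optlen reported in the traces below) and a 12-octet TCP timestamp option.

```python
IP_FIXED_HDR = 20   # IPv4 header without options
TCP_FIXED_HDR = 20  # TCP header without options


def eff_snd_mss(send_mss: int, path_mtu: int, ip_opt: int, tcp_opt: int) -> int:
    """Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize (RFC 1122)."""
    mms_s = path_mtu - IP_FIXED_HDR         # assumption: MMS_S from the fixed IP header only
    tcp_hdr_size = TCP_FIXED_HDR + tcp_opt  # TCPhdrsize counts TCP options
    return min(send_mss + TCP_FIXED_HDR, mms_s) - tcp_hdr_size - ip_opt


def mss_without_options(send_mss: int, path_mtu: int) -> int:
    """The flawed calculation: neither IP nor TCP options are subtracted."""
    return min(send_mss + TCP_FIXED_HDR, path_mtu - IP_FIXED_HDR) - TCP_FIXED_HDR


def packet_size(payload: int, ip_opt: int, tcp_opt: int) -> int:
    """Size of the resulting IP datagram, headers and options included."""
    return IP_FIXED_HDR + ip_opt + TCP_FIXED_HDR + tcp_opt + payload


good = eff_snd_mss(1460, 1500, ip_opt=8, tcp_opt=12)
bad = mss_without_options(1460, 1500)
print(good, packet_size(good, 8, 12))  # 1440 1500
print(bad, packet_size(bad, 8, 12))    # 1460 1520
```

With the options ignored, the datagram overshoots the 1500-octet MTU by exactly the 20 octets of options, so it must either be fragmented or, with DF set, cannot be sent at all.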

Significance

In some implementations, this causes the transmission of strangely fragmented packets. In some implementations with Path MTU (PMTU) discovery [RFC1191], this problem can actually result in a total failure to transmit any data at all, regardless of the environment (see below).

Arguably, especially since the wide deployment of firewalls, IP options appear only rarely in normal operations.

Implications

In implementations using PMTU discovery, this problem can result in packets that are too large for the output interface, and that have the DF (don't fragment) bit set in the IP header. Thus, the IP layer on the local machine is not allowed to fragment the packet to send it out the interface. It instead informs the TCP layer of the correct MTU size of the interface; the TCP layer again miscomputes the MSS by failing to take into account the size of IP options; and the problem repeats, with no data flowing.
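The loop described above can be simulated. This is a hypothetical Python model, not code from any TCP: each attempt computes a payload size, builds a DF datagram, and, if it is too large, has the IP layer report the interface MTU back to TCP, which recomputes. It reuses illustrative sizes (a 1500-octet MTU, 8 octets of IP options, 12 octets of TCP options); `max_attempts` exists only to bound the simulation.

```python
def attempts_until_sent(mtu: int, ip_opt: int, tcp_opt: int,
                        count_options: bool, max_attempts: int = 5):
    """Return (attempt number, payload) of the first DF datagram that fits, or None."""
    reported_mtu = mtu  # MTU the TCP currently believes in
    for attempt in range(1, max_attempts + 1):
        payload = reported_mtu - 20 - 20  # fixed IPv4 and TCP headers
        if count_options:
            payload -= ip_opt + tcp_opt   # the correct calculation
        datagram = 20 + ip_opt + 20 + tcp_opt + payload
        if datagram <= mtu:
            return attempt, payload
        # DF forbids local fragmentation: IP reports the interface MTU, which is
        # the same value as before, so the next attempt computes the same payload.
        reported_mtu = mtu
    return None


print(attempts_until_sent(1500, 8, 12, count_options=True))   # (1, 1440)
print(attempts_until_sent(1500, 8, 12, count_options=False))  # None
```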

Relevant RFCs

RFC 1122 describes the calculation of the effective send MSS. RFC 1191 describes Path MTU discovery.

Trace file demonstrating it

Trace file taken using tcpdump on host C. The first trace demonstrates the fragmentation that occurs without path MTU discovery:

   13:55:25.488728 A.65528 > C.discard:
           P 567833:569273(1440) ack 1 win 17520
           <nop,nop,timestamp 3839 1026342>
           (frag 20828:1472@0+)
           (ttl 62, optlen=8 LSRR{B#} NOP)
        
   13:55:25.488943 A > C:
           (frag 20828:8@1472)
           (ttl 62, optlen=8 LSRR{B#} NOP)
        
   13:55:25.489052 C.discard > A.65528:
           . ack 566385 win 60816
           <nop,nop,timestamp 1026345 3839> (DF)
           (ttl 60, id 41266)
        

Host A repeatedly sends 1440-octet data segments, but these are fragmented into two packets, one with 1432 octets of data, and another with 8 octets of data.
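When reading traces like the one above, tcpdump's `(frag ID:SIZE@OFFSET+)` annotation gives the IP identification, the number of octets the fragment carries after its IP header, and its offset; a trailing `+` means more fragments follow. A small Python sketch (the function name is hypothetical) that groups these annotations by IP ID shows the split directly:

```python
import re
from collections import defaultdict

# tcpdump fragment annotation: (frag ID:SIZE@OFFSET) with "+" when MF is set
FRAG = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")


def fragments_by_id(lines):
    """Map each IP ID to [(offset, size, more_fragments), ...] in trace order."""
    frags = defaultdict(list)
    for line in lines:
        for ip_id, size, offset, more in FRAG.findall(line):
            frags[int(ip_id)].append((int(offset), int(size), more == "+"))
    return dict(frags)


frag_lines = ["(frag 20828:1472@0+)", "(frag 20828:8@1472)"]
pieces = fragments_by_id(frag_lines)[20828]
print(pieces)                              # [(0, 1472, True), (1472, 8, False)]
print(sum(size for _, size, _ in pieces))  # 1480
```

Any IP ID with more than one entry belongs to a datagram that was fragmented on its way out.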

The second trace demonstrates the failure to send any data segments, sometimes seen with hosts doing path MTU discovery:

   13:55:44.332219 A.65527 > C.discard:
           S 1018235390:1018235390(0) win 16384
           <mss 1460,nop,wscale 0,nop,nop,timestamp 3876 0> (DF)
           (ttl 62, id 20912, optlen=8 LSRR{B#} NOP)
        
   13:55:44.333015 C.discard > A.65527:
           S 1271629000:1271629000(0) ack 1018235391 win 60816
           <mss 1460,nop,wscale 0,nop,nop,timestamp 1026383 3876> (DF)
           (ttl 60, id 41427)
        
   13:55:44.333206 C.discard > A.65527:
           S 1271629000:1271629000(0) ack 1018235391 win 60816
           <mss 1460,nop,wscale 0,nop,nop,timestamp 1026383 3876> (DF)
           (ttl 60, id 41427)
        

This is all of the activity seen on this connection. Eventually host C will time out attempting to establish the connection.

How to detect

The "netcat" utility [Hobbit96] is useful for generating source routed packets:

      1% nc C discard
      (interactive typing)
      ^C
      2% nc C discard < /dev/zero
      ^C
      3% nc -g B C discard
      (interactive typing)
      ^C
      4% nc -g B C discard < /dev/zero
      ^C
        

Lines 1 through 3 should generate appropriate packets, which can be verified using tcpdump. If the problem is present, line 4 should generate one of the two kinds of packet traces shown.

How to fix

The implementation should ensure that the effective send MSS calculation includes a term for the IP and TCP options, as mandated by RFC 1122.

3. Security Considerations

This memo does not discuss any specific security-related TCP implementation problems, as the working group decided to pursue documenting those in a separate document. Some of the implementation problems discussed here, however, can be used for denial-of-service attacks. Those classified as congestion control present opportunities to subvert TCPs used for legitimate data transfer into excessively loading network elements. Those classified as "performance", "reliability" and "resource management" may be exploitable for launching surreptitious denial-of-service attacks against the user of the TCP. Both of these types of attacks can be extremely difficult to detect because in most respects they look identical to legitimate network traffic.

4. Acknowledgements

Thanks to numerous correspondents on the tcp-impl mailing list for their input: Steve Alexander, Larry Backman, Jerry Chu, Alan Cox, Kevin Fall, Richard Fox, Jim Gettys, Rick Jones, Allison Mankin, Neal McBurnett, Perry Metzger, der Mouse, Thomas Narten, Andras Olah, Steve Parker, Francesco Potorti`, Luigi Rizzo, Allyn Romanow, Al Smith, Jerry Toporek, Joe Touch, and Curtis Villamizar.

Thanks also to Josh Cohen for the traces documenting the "Failure to send a RST after Half Duplex Close" problem; and to John Polstra, who analyzed the "Window probe deadlock" problem.

5. References
[Allman97] M. Allman, "Fixing Two BSD TCP Bugs," Technical Report CR-204151, NASA Lewis Research Center, Oct. 1997. http://roland.grc.nasa.gov/~mallman/papers/bug.ps
        

[RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's Initial Window", RFC 2414, September 1998.

[RFC1122] Braden, R., Editor, "Requirements for Internet Hosts -- Communication Layers", STD 3, RFC 1122, October 1989.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[Brakmo95] L. Brakmo and L. Peterson, "Performance Problems in BSD4.4 TCP," ACM Computer Communication Review, 25(5):69-86, 1995.

[RFC813] Clark, D., "Window and Acknowledgement Strategy in TCP," RFC 813, July 1982.

[Dawson97] S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six Commercial TCP Implementations Using a Software Fault Injection Tool," to appear in Software Practice & Experience, 1997. A technical report version of this paper can be obtained at ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

[Fall96] K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe, Reno, and SACK TCP," ACM Computer Communication Review, 26(3):5-21, 1996.

[Hobbit96] Hobbit, Avian Research, netcat, available via anonymous ftp to ftp.avian.org, 1996.

[Hoe96] J. Hoe, "Improving the Start-up Behavior of a Congestion Control Scheme for TCP," Proc. SIGCOMM '96.

[Jacobson88] V. Jacobson, "Congestion Avoidance and Control," Proc. SIGCOMM '88. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
        

[Jacobson89] V. Jacobson, C. Leres, and S. McCanne, tcpdump, available via anonymous ftp to ftp.ee.lbl.gov, Jun. 1989.

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP Selective Acknowledgement Options", RFC 2018, October 1996.

[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC896] Nagle, J., "Congestion Control in IP/TCP Internetworks", RFC 896, January 1984.

[Paxson97] V. Paxson, "Automated Packet Trace Analysis of TCP Implementations," Proc. SIGCOMM '97, available from ftp://ftp.ee.lbl.gov/papers/vp-tcpanaly-sigcomm97.ps.Z.

[RFC793] Postel, J., Editor, "Transmission Control Protocol," STD 7, RFC 793, September 1981.

[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

[Stevens94] W. Stevens, "TCP/IP Illustrated, Volume 1", Addison-Wesley Publishing Company, Reading, Massachusetts, 1994.

[Wright95] G. Wright and W. Stevens, "TCP/IP Illustrated, Volume 2", Addison-Wesley Publishing Company, Reading, Massachusetts, 1995.

6. Authors' Addresses

   Vern Paxson
   ACIRI / ICSI
   1947 Center Street
   Suite 600
   Berkeley, CA 94704-1198

   Phone: +1 510/642-4274 x302
   EMail: vern@aciri.org
        

   Mark Allman <mallman@grc.nasa.gov>
   NASA Glenn Research Center/Sterling Software
   Lewis Field
   21000 Brookpark Road MS 54-2
   Cleveland, OH 44135
   USA

   Phone: +1 216/433-6586
   Email: mallman@grc.nasa.gov
        

   Scott Dawson
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI 48109-2122
   USA

   Phone: +1 313/763-5363
   EMail: sdawson@eecs.umich.edu
        

   William C. Fenner
   Xerox PARC
   3333 Coyote Hill Road
   Palo Alto, CA 94304
   USA

   Phone: +1 650/812-4816
   EMail: fenner@parc.xerox.com
        

   Jim Griner <jgriner@grc.nasa.gov>
   NASA Glenn Research Center
   Lewis Field
   21000 Brookpark Road MS 54-2
   Cleveland, OH 44135
   USA

   Phone: +1 216/433-5787
   EMail: jgriner@grc.nasa.gov
        

   Ian Heavens
   Spider Software Ltd.
   8 John's Place, Leith
   Edinburgh EH6 7EL
   UK

   Phone: +44 131/475-7015
   EMail: ian@spider.com
        

   Kevin Lahey
   NASA Ames Research Center/MRJ
   MS 258-6
   Moffett Field, CA 94035
   USA

   Phone: +1 650/604-4334
   EMail: kml@nas.nasa.gov
        

   Jeff Semke
   Pittsburgh Supercomputing Center
   4400 Fifth Ave
   Pittsburgh, PA 15213
   USA

   Phone: +1 412/268-4960
   EMail: semke@psc.edu
        

   Bernie Volz
   Process Software Corporation
   959 Concord Street
   Framingham, MA 01701
   USA

   Phone: +1 508/879-6994
   EMail: volz@process.com
        
7. Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
