Internet Engineering Task Force (IETF)                          S. Banks
Request for Comments: 7654                                VSS Monitoring
Category: Informational                                      F. Calabria
ISSN: 2070-1721                                            Cisco Systems
                                                              G. Czirjak
                                                               R. Machat
                                                        Juniper Networks
                                                            October 2015
        

Benchmarking Methodology for In-Service Software Upgrade (ISSU)

Abstract

Modern forwarding devices attempt to minimize any control- and data-plane disruptions while performing planned software changes by implementing a technique commonly known as In-Service Software Upgrade (ISSU). This document specifies a set of common methodologies and procedures designed to characterize the overall behavior of a Device Under Test (DUT), subject to an ISSU event.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7654.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions Used in This Document
   3. Generic ISSU Process, Phased Approach
      3.1. Software Download
      3.2. Software Staging
      3.3. Upgrade Run
      3.4. Upgrade Acceptance
   4. Test Methodology
      4.1. Test Topology
      4.2. Load Model
   5. ISSU Test Methodology
      5.1. Pre-ISSU Recommended Verifications
      5.2. Software Staging
      5.3. Upgrade Run
      5.4. Post-ISSU Verification
      5.5. ISSU under Negative Stimuli
   6. ISSU Abort and Rollback
   7. Final Report: Data Presentation and Analysis
      7.1. Data Collection Considerations
   8. Security Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Acknowledgments
   Authors' Addresses
        
1. Introduction

As required by most Service Provider (SP) network operators, ISSU functionality has been implemented by modern forwarding devices to upgrade or downgrade from one software version to another with a goal of eliminating the downtime of the router and/or the outage of service. However, it is noted that while most operators desire complete elimination of downtime, minimization of downtime and service degradation is often the expectation.

The ISSU operation may apply in terms of an atomic version change of the entire system software or it may be applied in a more modular sense, such as for a patch or maintenance upgrade. The procedure described herein may be used to verify either approach, as may be supported by the vendor hardware and software.

In support of this document, the desired behavior for an ISSU operation can be summarized as follows:

- The software is successfully migrated from one version to a successive version or vice versa.

- There are no control-plane interruptions throughout the process. That is, the upgrade/downgrade could be accomplished while the device remains "in service". It is noted, however, that most service providers will still undertake such actions in a maintenance window (even in redundant environments) to minimize any risk.

- Interruptions to the forwarding plane are minimal to none.

- The total time to accomplish the upgrade is minimized, again to reduce potential network outage exposure (e.g., an external failure event might impact the network as it operates with reduced redundancy).

This document provides a set of procedures to characterize a given forwarding device's ISSU behavior quantitatively, from the perspective of meeting the above expectations.

Different hardware configurations may be expected to be benchmarked, but a typical configuration for a forwarding device that supports ISSU consists of at least one pair of Routing Processors (RPs) that operate in a redundant fashion, and single or multiple forwarding engines (line cards) that may or may not be redundant, as well as fabric cards or other components as applicable. This does not preclude the possibility that a device in question can perform ISSU functions through the operation of independent process components, which may be upgraded without impact to the overall operation of the device. As an example, perhaps the software module involved in SNMP functions can be upgraded without impacting other operations.

The concept of a multi-chassis deployment may also be characterized by the current set of proposed methodologies, but the implementation-specific details (i.e., process placement and others) are beyond the scope of the current document.

Since most modern forwarding devices, where ISSU would be applicable, do consist of redundant RPs and hardware-separated control-plane and data-plane functionality, this document will focus on methodologies that would be directly applicable to those platforms. It is anticipated that the concepts and approaches described herein may be readily extended to accommodate other device architectures as well.

2. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

In this document, these words will appear with that interpretation only when in ALL CAPS. Lowercase uses of these words are not to be interpreted as carrying the significance of RFC 2119.

3. Generic ISSU Process, Phased Approach

ISSU may be viewed as the behavior of a device when exposed to a planned change in its software functionality. This may mean changes to the core operating system, separate processes or daemons, or even firmware logic in programmable hardware devices (e.g., Complex Programmable Logic Device (CPLD) or Field-Programmable Gate Array (FPGA)). The goal of an ISSU implementation is to permit such actions with minimal or no disruption to the primary operation of the device in question.

ISSU may be user initiated through direct interaction with the device or activated through some automated process on a management system or even on the device itself. For the purposes of this document, we will focus on the model where the ISSU action is initiated by direct user intervention.

The ISSU process can be viewed as a series of different phases or activities, as defined below. For each of these phases, the test operator must record the outcome as well as any relevant observations (defined further in the present document). Note that a given vendor implementation may or may not permit an in-progress ISSU to be aborted at particular stages. There may also be certain restrictions as to ISSU availability given certain functional configurations (for example, ISSU in the presence of Bidirectional Forwarding Detection (BFD) [RFC5880] may not be supported). It is incumbent upon the test operator to ensure that the DUT is appropriately configured to provide the appropriate test environment. As with any properly orchestrated test effort, the test plan document should reflect these and other relevant details and should be written with close attention to the expected production operating environment. The combined analysis of the results of each phase will characterize the overall ISSU process with the main goal of being able to identify and quantify any disruption in service (from the data- and control-plane perspective), allowing operators to plan their maintenance activities with greater precision.
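One way to keep the per-phase records consistent across trials is a small structured log. The following Python sketch is illustrative only: the class name PhaseRecord, its fields, and the summarize helper are assumptions made for this example and are not defined by this document.

```python
from dataclasses import dataclass, field
from typing import List

# Phase names follow Sections 3.1 through 3.4 of this document.
PHASES = ("Software Download", "Software Staging", "Upgrade Run", "Upgrade Acceptance")


@dataclass
class PhaseRecord:
    """Outcome and observations for one ISSU phase (hypothetical structure)."""
    phase: str
    succeeded: bool
    observations: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject phase names that this document does not define.
        if self.phase not in PHASES:
            raise ValueError(f"unknown ISSU phase: {self.phase!r}")


def summarize(records: List[PhaseRecord]) -> str:
    """Return one line per phase: name, outcome, and number of observations."""
    lines = []
    for rec in records:
        outcome = "ok" if rec.succeeded else "FAILED"
        lines.append(f"{rec.phase}: {outcome} ({len(rec.observations)} observation(s))")
    return "\n".join(lines)
```

A test operator might create one record per phase and attach free-form observations (log excerpts, prompts seen, anomalies) as the phase completes.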

3.1. Software Download

In this first phase, the requested software package may be downloaded to the router and is typically stored on a storage device. The downloading of software may be performed automatically by the device as part of the upgrade process, or it may be initiated separately. Such separation allows an administrator to download the new code inside or outside of a maintenance window; it is anticipated that downloading new code and saving it to disk on the router will not impact operations. In the case where the software can be downloaded outside of the actual upgrade process, the administrator should do so; downloading software can skew timing results based on factors that are often not comparable between tests. Internal compatibility verification may be performed by the software running on the DUT, to verify the checksum of the files downloaded as well as any other pertinent checks. Depending upon vendor implementation, these mechanisms may include 1) verifying that the downloaded module(s) meet a set of identified prerequisites such as (but not limited to) hardware or firmware compatibility or minimum software requirements or even 2) ensuring that the device is "authorized" to run the target software.

Where such mechanisms are made available by the product, they should be verified, by the tester, with the goal of avoiding operational issues in production. Verification should include both positive verification (ensuring that an ISSU action should be permitted) as well as negative tests (creation of scenarios where the verification mechanisms would report exceptions).
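As an off-box complement to the DUT's own checks, a tester may wish to confirm the integrity of the image before it is transferred. The sketch below assumes the vendor publishes a SHA-256 digest for the image; the actual checksum algorithm is vendor specific, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the published one (case-insensitive)."""
    # Assumption: the published digest is SHA-256 in hexadecimal form.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A deliberately corrupted copy of the image gives a simple negative test: image_matches should return False for it, and the DUT's own verification would be expected to report an exception.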

3.2. Software Staging

In this second phase, the requested software package is loaded in the pertinent components of a given forwarding device (typically the RP in standby state). Internal compatibility verification may be performed by the software running on the DUT, as part of the upgrade process itself, to verify the checksum of the files downloaded as well as any other pertinent checks. Depending upon vendor implementation, these mechanisms may include verification that the downloaded module(s) meet a set of identified prerequisites such as hardware or firmware compatibility or minimum software requirements. Where such mechanisms are made available by the product, they should be verified, by the tester (again with the goal of avoiding operational issues in production). In this case, the execution of these checks is within the scope of the upgrade time and should be included in the testing results. Once the new software is downloaded to the pertinent components of the DUT, the upgrade begins, and the DUT begins to prepare itself for upgrade. Depending on the vendor implementation, it is expected that redundant hardware pieces within the DUT are upgraded, including the backup or secondary RP.

3.3. Upgrade Run

In this phase, a switchover of RPs may take place, where one RP is now upgraded with the new version of software. More importantly, the "Upgrade Run" phase is where the internal changes made to information and state (stored on the router, on disk, and in memory) are either migrated to the "new" version of code, or transformed/rebuilt to meet the standards of the new version of code, and pushed onto the appropriate pieces of hardware. It is within this phase that any outage(s) on the control or forwarding plane may be expected to be observed. This is the critical phase of the ISSU, where the control plane should not be impacted and any interruptions to the forwarding plane should be minimal to none.

If any control- or data-plane interruptions are observed within this stage, they should be recorded as part of the results document.

For some implementations, the two stages described in Section 3.2 and in this section may be concatenated into one monolithic operation. In that case, the calculation of the respective ISSU time intervals may need to be adapted accordingly.

3.4. Upgrade Acceptance

In this phase, the new version of software must be running in all the physical nodes of the logical forwarding device (RPs and line cards as applicable). At this point, configuration control is returned to the operator, and normal device operation, i.e., outside of ISSU-oriented operation, is resumed.

4. Test Methodology

As stated by [RFC6815], the Test Topology Setup must be part of an Isolated Test Environment (ITE).

The reporting of results must take into account the repeatability considerations from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple trials and report average results. The results are reported in a simple statement including the measured frame loss and ISSU impact times.
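The averaging of multiple trials can be sketched as follows. The metric names used in the usage note (frames_lost, impact_ms) are placeholders chosen for this example; this document does not prescribe a record format.

```python
from statistics import mean
from typing import Dict, List


def summarize_trials(trials: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric across trials; every trial must report the same keys."""
    if not trials:
        raise ValueError("at least one trial is required")
    keys = trials[0].keys()
    for trial in trials:
        if trial.keys() != keys:
            raise ValueError("trials report different metrics")
    return {key: mean(trial[key] for trial in trials) for key in keys}
```

For instance, two trials reporting 100 and 300 lost frames average to 200 lost frames. Reporting the number of trials alongside the average helps a reader judge repeatability.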

4.1. Test Topology

The hardware configuration of the DUT (Device Under Test) should be identical to the one currently deployed, or expected to be deployed, in production in order for the benchmark to have relevance. This would include the number of RPs, hardware version, memory, and initial software release; any common chassis components, such as fabric hardware in the case of a fabric-switching platform; and the specific line cards (version, memory, interface type, rate, etc.).

For the control and data plane, differing configuration approaches may be utilized. The recommended approach relies on "mimicking" the existing production data- and control-plane information, in order to emulate all the necessary Layer 1 through Layer 3 communications and, if appropriate, the upper-layer characteristics of the network, as well as end-to-end traffic/communication pairs. In other words, design a representative load model of the production environment and deploy a collapsed topology utilizing test tools and/or external devices, where the DUT will be tested. Note that the negative impact of ISSU operations is likely to affect scaled, dynamic topologies to a greater extent than simpler, static environments. As such, this methodology (based upon production configuration) is advised for most test scenarios.

The second, more simplistic approach is to deploy an ITE in which endpoints are "directly" connected to the DUT. In this manner, control-plane information is kept to a minimum (only connected interfaces), and only a basic data plane of sources and destinations is applied. If this methodology is selected, care must be taken to understand that the systemic behavior of the ITE may not be identical to that experienced by a device in a production network role. That is, control-plane validation may be minimal to none with this methodology. Consequently, if this approach is chosen, comparison with at least one production configuration is recommended in order to understand the direct relevance and limitations of the test exercise.

4.2. Load Model

In consideration of the defined test topology, a load model must be developed to exercise the DUT while the ISSU event is introduced. This applied load should be defined in such a manner as to provide a granular, repeatable verification of the ISSU impact on transit traffic. Sufficient traffic load (rate) should be applied to permit timing extrapolations at a minimum granularity of 100 milliseconds, e.g., 100 Mbps for a 10 Gbps interface. The use of steady traffic streams rather than bursty loads is preferred to simplify analysis.
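To check that a chosen rate supports the desired timing granularity, it can help to compute how much outage time a single lost frame represents in a steady stream. The sketch below assumes Ethernet framing with 20 bytes of per-frame overhead (preamble, start-of-frame delimiter, and inter-frame gap) and a rate expressed in line-rate bits per second; both are assumptions of this example, and the function names are hypothetical.

```python
ETHERNET_OVERHEAD_BYTES = 20  # assumption: 8-byte preamble/SFD + 12-byte inter-frame gap


def frames_per_second(rate_bps: float, frame_bytes: int) -> float:
    """Frame rate of a constant stream at `rate_bps`, counting per-frame overhead."""
    return rate_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def loss_resolution_ms(rate_bps: float, frame_bytes: int) -> float:
    """Outage time represented by a single lost frame, in milliseconds."""
    return 1000.0 / frames_per_second(rate_bps, frame_bytes)
```

Under these assumptions, 64-byte frames at 100 Mbps yield roughly 148,810 frames per second, so each lost frame corresponds to about 6.7 microseconds, well inside the 100-millisecond granularity.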

The traffic should be patterned to provide a broad range of source and destination pairs, which resolve to a variety of FIB (Forwarding Information Base) prefix lengths. If the production network environment includes multicast traffic or VPNs (L2, L3, or IPsec), it is critical to include these in the model.

For mixed protocol environments (e.g., IPv4 and IPv6), frames should be distributed between the different protocols. The distribution should approximate the network conditions of deployment. In all cases, the details of the mixed protocol distribution must be included in the reporting.
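A minimal sketch of how a total offered load might be divided among protocols is shown below. The helper name and the example shares are illustrative assumptions; the actual distribution should be taken from the deployment being modeled.

```python
from typing import Dict


def split_offered_load(total_fps: float, shares: Dict[str, float]) -> Dict[str, float]:
    """Divide a total frame rate among protocols; `shares` are fractions summing to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("protocol shares must sum to 1.0")
    return {proto: total_fps * share for proto, share in shares.items()}
```

For example, a 75/25 split of 10,000 frames per second between IPv4 and IPv6 gives 7,500 and 2,500 frames per second, respectively. The shares dictionary itself is the detail that belongs in the report.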

The feature, protocol timing, and other relevant configurations should be matched to the expected production environment. Deviations from the production templates may be deemed necessary by the test operator (for example, certain features may not support ISSU or the test bed may not be able to accommodate such). However, the impact of any such divergence should be clearly understood, and the differences must be recorded in the results documentation. It is recommended that a Network Management System (NMS) be deployed, preferably similar to that utilized in production. This will allow for monitoring of the DUT while it is being tested, both in terms of supporting the impact analysis on system resources as well as detecting interference with non-transit (management) traffic as a result of the ISSU operation. It is suggested that the actual test exercise be managed utilizing direct console access to the DUT, if at all possible, to avoid the possibility that a network interruption impairs execution of the test exercise.

All in all, the load model should attempt to simulate the production network environment to the greatest extent possible in order to maximize the applicability of the results generated.

5. ISSU Test Methodology

As previously described, for the purposes of this test document, the ISSU process is divided into three main phases. The following methodology assumes that a suitable test topology has been constructed per Section 4. A description of the methodology to be applied for each of the above phases follows.

5.1. Pre-ISSU Recommended Verifications

The steps of this phase are as follows.

1. Verify that enough hardware and software resources are available to complete the Load operation (e.g., enough disk space).

2. Verify that the redundancy states between RPs and other nodes are as expected (e.g., redundancy on, RPs synchronized).

3. Verify that the device, if running protocols capable of NSR (Non-Stop Routing), is in a "ready" state; that is, that the sync between RPs is complete and the system is ready for failover, if necessary.

4. Gather a configuration snapshot of the device and all of its applicable components.

5. Verify that the node is operating in a "steady" state (that is, no critical or maintenance function is being currently performed).

6. Note any other operational characteristics that the tester may deem applicable to the specific implementation deployed.

5.2. Software Staging

The steps of this phase are as follows.

1. Establish all relevant protocol adjacencies and stabilize routing within the test topology. In particular, ensure that the scaled levels of the dynamic protocols are dimensioned as specified by the test topology plan.

2. Clear relevant logs and interface counters to simplify analysis. If possible, set logging timestamps to a highly granular mode. If the topology includes management systems, ensure that the appropriate polling levels have been applied, sessions have been established, and the responses are as expected.

3. Apply the traffic loads as specified in the load model previously developed for this exercise.

4. Document an operational baseline for the test bed with relevant data supporting the above steps (include all relevant load characteristics of interest in the topology, e.g., routing load, traffic volumes, memory and CPU utilization).

5. Note the start time (T0) and begin the code change process utilizing the appropriate mechanisms as expected to be used in production (e.g., active download with TFTP, FTP, SCP, etc., or direct install from local or external storage facility). In order to ensure that ISSU process timings are not skewed by the lack of a network-wide synchronization source, the use of a network NTP source is encouraged.

6. Take note of any logging information and command-line interface (CLI) prompts as needed. (This detail will be vendor specific.) Respond to any DUT prompts in a timely manner.

7. Monitor the DUT for the reload of the secondary RP to the new software level. Once the secondary has stabilized on the new code, note the completion time. The duration of these steps will be recorded as "T1".

8. Review system logs for any anomalies, check that relevant dynamic protocols have remained stable, and note traffic loss if any. Verify that deployed management systems have not identified any unexpected behavior.
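The timing in steps 5 and 7 (a start time T0, a completion time, and the resulting duration T1) can be derived from recorded timestamps. The sketch below assumes the tester records ISO 8601 timestamps that carry an explicit UTC offset, for example from an NTP-synchronized host; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2015-10-01T12:00:00+00:00'."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return parsed.astimezone(timezone.utc)


def duration_seconds(start: str, end: str) -> float:
    """Elapsed seconds between two timestamps; rejects an end before the start."""
    elapsed = (parse_utc(end) - parse_utc(start)).total_seconds()
    if elapsed < 0:
        raise ValueError("end precedes start; check clock synchronization")
    return elapsed
```

Here T1 would be duration_seconds(T0, completion_time). Requiring an explicit offset guards against silently mixing local time and UTC when timestamps come from different devices.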

5.3. Upgrade Run

The following assumes that the software load step and upgrade step are discretely controllable. If not, maintain the aforementioned timer and monitor for completion of the ISSU as described below.

1. Note the start time and initiate the actual upgrade procedure.

2. Monitor the operation of the secondary route processor while it initializes with the new software and assumes mastership of the DUT. At this point, pay particular attention to any indications of control-plane disruption, traffic impact, or other anomalous behavior. Once the DUT has converged upon the new code and returned to normal operation, note the completion time and log the duration of this step as "T2".

3. Review the syslog data in the DUT and neighboring devices for any behavior that would be disruptive in a production environment (line card reloads, control-plane flaps, etc.). Examine the traffic generators for any indication of traffic loss over this interval. If the Test Set reports any traffic loss, note the number of frames lost as "TPL_frames", where TPL stands for "Total Packet Loss". If the Test Set also provides outage duration, note this as "TPL_time". (Alternatively, TPL_time may be calculated as (TPL_frames / Offered Load) * 1000. The units for Offered Load are packets per second; the units for TPL_time are milliseconds.)

4. Verify the DUT status observations as per any NMS managing the DUT and its neighboring devices. Document the observed CPU and memory statistics both during and after the ISSU upgrade event, and ensure that memory and CPU have returned to an expected (previously baselined) level.
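The alternative TPL_time calculation in step 3 can be written out directly. This sketch reads "TPL" in that formula as the lost-frame count TPL_frames and assumes a constant offered load over the outage interval; the function name is hypothetical.

```python
def tpl_time_ms(tpl_frames: int, offered_load_pps: float) -> float:
    """TPL_time in milliseconds = (TPL_frames / Offered Load) * 1000."""
    if offered_load_pps <= 0:
        raise ValueError("offered load must be positive")
    if tpl_frames < 0:
        raise ValueError("lost-frame count cannot be negative")
    return (tpl_frames / offered_load_pps) * 1000.0
```

For example, 1,000 lost frames at an offered load of 8,000 packets per second corresponds to a TPL_time of 125 milliseconds. The result is only as meaningful as the steadiness of the stream, which is one reason bursty loads complicate the analysis.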

4. 根据管理DUT及其相邻设备的任何NMS,验证DUT状态观察结果。记录ISU升级事件期间和之后观察到的CPU和内存统计信息,并确保内存和CPU已恢复到预期(以前基线化)水平。
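The alternative TPL_time calculation in step 3 can be sketched as follows. This is a minimal illustration, not part of the methodology itself; the function name and the sample figures are assumptions chosen only for the example.

```python
def tpl_time_ms(tpl_frames: int, offered_load_pps: float) -> float:
    """Estimate TPL_time in milliseconds from the frames lost (TPL_frames).

    Implements (TPL_frames / Offered Load) * 1000, with Offered Load
    expressed in packets per second.
    """
    if offered_load_pps <= 0:
        raise ValueError("offered load must be a positive packet rate")
    if tpl_frames < 0:
        raise ValueError("frame loss cannot be negative")
    # Multiply first so that round figures stay exact in floating point.
    return tpl_frames * 1000.0 / offered_load_pps


# Hypothetical trial: 5,000 frames lost at 100,000 packets per second.
print(tpl_time_ms(5_000, 100_000))  # 50.0
```

The estimate is only meaningful if the offered load was held constant over the measurement interval; where the Test Set reports outage duration directly, that value is the one to record.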

5.4. Post-ISSU Verification
5.4. 发行后验证

The following describes a set of post-ISSU verification tasks that are not directly part of the ISSU process, but are recommended for execution in order to validate a successful upgrade.

以下描述了一组ISU后验证任务,这些任务不是ISU流程的直接组成部分,但建议执行,以验证升级是否成功。

1. Configuration delta analysis

1. 配置增量分析

Examine the post-ISSU configurations to determine if any changes have occurred either through process error or due to differences in the implementation of the upgraded code.

检查发行后的配置,以确定是否由于流程错误或升级代码的实现差异而发生了任何更改。

2. Exhaustive control-plane analysis

2. 穷举控制平面分析

Review the details of the Routing Information Base (RIB) and FIB to assess whether any unexpected changes have been introduced in the forwarding paths.

查看路由信息库(RIB)和FIB的详细信息,以评估转发路径中是否引入了任何意外更改。

3. Verify that both RPs are up and that the redundancy mechanism for the control plane is enabled and fully synchronized.

3. 验证两个RPs均已启动,并且控制平面的冗余机制已启用并完全同步。

4. Verify that no control-plane (protocol) events or flaps were detected.

4. 确认未检测到控制平面(协议)事件或襟翼。

5. Verify that no L1 and/or L2 interface flaps were observed.

5. 确认未观察到L1和/或L2接口振荡。

6. Document the hitless operation or presence of an outage based upon the counter values provided by the Test Set.

6. 根据测试集提供的计数器值,记录无故障运行或存在停机。
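The configuration delta and control-plane checks in tasks 1 and 2 lend themselves to simple scripted comparisons of pre-ISSU and post-ISSU snapshots. The sketch below assumes the configurations are available as plain text and the RIB or FIB as a prefix-to-next-hop mapping; the function names, configuration lines, and addresses are hypothetical.

```python
from __future__ import annotations

import difflib


def config_delta(pre: str, post: str) -> list[str]:
    """Return only the configuration lines removed ("- ") or added ("+ ")."""
    diff = difflib.ndiff(pre.splitlines(), post.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


def route_delta(pre: dict[str, str], post: dict[str, str]) -> dict[str, tuple]:
    """Map each changed prefix to its (before, after) next hop.

    A value of None means the prefix was absent in that snapshot.
    """
    changes = {}
    for prefix in pre.keys() | post.keys():
        before, after = pre.get(prefix), post.get(prefix)
        if before != after:
            changes[prefix] = (before, after)
    return changes


# Hypothetical snapshots taken before and after the ISSU event.
pre_cfg = "hostname dut1\nsnmp-server community lab"
post_cfg = "hostname dut1\nsnmp-server community lab\nlogging buffered"
print(config_delta(pre_cfg, post_cfg))  # ['+ logging buffered']

pre_rib = {"198.51.100.0/24": "192.0.2.1", "203.0.113.0/24": "192.0.2.1"}
post_rib = {"198.51.100.0/24": "192.0.2.1", "203.0.113.0/24": "192.0.2.9"}
print(route_delta(pre_rib, post_rib))
```

An empty result from both functions indicates that no delta was detected in the captured data. Any non-empty result still needs manual review to separate expected release-level differences from faults.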

5.5. ISSU under Negative Stimuli
5.5. 负刺激下的ISU

As an OPTIONAL Test Case, the operator may want to perform an ISSU test while the DUT is under stress by introducing route churn to any or all of the involved phases of the ISSU process.

作为可选测试用例,操作员可能希望在DUT处于压力下时,通过在ISU过程的任何或所有相关阶段引入路由搅动来执行ISU测试。

One approach relies on the operator to gather statistical information from the production environment and determine a specific number of routes to flap every 'fixed' or 'variable' interval. Alternatively, the operator may wish to simply preselect a fixed number of prefixes to flap. As an example, an operator may decide to flap 1% of all the BGP routes every minute and restore them 1 minute afterwards. The tester may wish to apply this negative stimulus throughout the entire ISSU process or, most importantly, during the run phase. The routes introduced solely for stress purposes must not overlap the ones (per the load model) specifically leveraged to calculate the TPL_time (recorded outage). Furthermore, there should not be 'operator-induced' control-plane protocol adjacency flaps for the duration of the test process, as they may adversely affect the characterization of the entire test exercise. For example, triggering IGP adjacency events may force recomputation of underlying routing tables, with attendant impact on the perceived ISSU timings. While not recommended, if such trigger events are desired by the test operator, care should be taken to avoid the introduction of unexpected anomalies within the test harness.

一种方法是由操作员从生产环境中收集统计信息,并确定每个“固定”或“可变”间隔内要振荡的具体路由数量。或者,操作员也可以简单地预先选定固定数量的前缀进行振荡。例如,操作员可以决定每分钟振荡所有BGP路由的1%,并在1分钟后恢复它们。测试人员可以在整个ISSU过程中施加这种负面刺激,或者最重要的是在运行阶段施加。仅为施加压力而引入的这些路由不得与(根据负载模型)专门用于计算TPL_time(记录的中断时间)的路由重叠。此外,在测试过程中不应出现“操作员引发的”控制平面协议邻接振荡,因为这可能对整个测试的表征产生不利影响。例如,触发IGP邻接事件可能会强制重新计算底层路由表,从而影响所观察到的ISSU计时。虽然不推荐,但如果测试操作员需要此类触发事件,应注意避免在测试环境中引入意外异常。
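The route-selection constraint described above (flap a fixed share of routes while never touching the prefixes used for the TPL_time measurement) can be sketched as follows. The function name, the default 1% fraction taken from the example in the text, and the synthetic prefixes are assumptions for illustration only.

```python
import random


def select_flap_prefixes(all_prefixes, measured_prefixes, fraction=0.01, seed=None):
    """Pick a batch of prefixes to flap, excluding measured ones.

    The batch size is `fraction` of all advertised prefixes; the batch is
    drawn only from prefixes that carry no measured (load model) traffic.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    unique = set(all_prefixes)
    measured = set(measured_prefixes)
    candidates = sorted(unique - measured)  # sorted for reproducibility
    count = max(1, round(len(unique) * fraction))
    if count > len(candidates):
        raise ValueError("not enough non-measured prefixes to flap")
    return random.Random(seed).sample(candidates, count)


# Hypothetical table of 1,000 routes, the first 100 of which carry
# the traffic used to compute TPL_time.
all_routes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
measured_routes = all_routes[:100]
batch = select_flap_prefixes(all_routes, measured_routes, seed=1)
print(len(batch))                            # 10
print(set(batch).isdisjoint(measured_routes))  # True
```

Fixing the seed makes the same batch reproducible across trials, which helps when comparing results between runs.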

6. ISSU Abort and Rollback
6. ISU中止和回滚

Where a vendor provides such support, the ISSU process could be aborted for any reason by the operator. However, the end results and behavior may depend on the specific phase where the process was aborted. While this is implementation dependent, as a general recommendation, if the process is aborted during the "Software Download" or "Software Staging" phases, no impact to service or device functionality should be observed. In contrast, if the process is aborted during the "Upgrade Run" or "Upgrade Accept" phases, the system may reload and revert back to the previous software release, and, as such, this operation may be service affecting. Where vendor support is available, the abort/rollback functionality should be verified, and the impact, if any, quantified generally following the procedures provided above.

如果供应商提供此类支持,运营商可出于任何原因中止ISU流程。但是,最终结果和行为可能取决于进程中止的特定阶段。虽然这取决于实现,但作为一般建议,如果在“软件下载”或“软件暂存”阶段中止该过程,则不应观察到对服务或设备功能的影响。相反,如果进程在“升级运行”或“升级接受”阶段中止,系统可能会重新加载并恢复到以前的软件版本,因此,此操作可能会影响服务。如果有供应商支持,则应验证中止/回滚功能,并按照上述程序对影响(如有)进行量化。

7. Final Report: Data Presentation and Analysis
7. 最终报告:数据展示和分析

All ISSU impact results are summarized in a simple statement describing the "ISSU Disruption Impact" including the measured frame loss and impact time, where impact time is defined as the time frame determined per the TPL_time reported outage. These are considered to be the primary data points of interest.

所有ISU影响结果总结在一个简单的陈述中,描述了“ISU中断影响”,包括测量的帧损失和影响时间,其中影响时间定义为根据TPL_时间报告的中断确定的时间帧。这些被认为是主要的数据点。

However, the entire ISSU operational impact should also be considered in support of planning for maintenance, and, as such, additional reporting points are included.

但是,还应考虑整个ISU运营影响,以支持维护规划,因此,还应包括额外的报告点。

   Software download / secondary update      T1

   Upgrade/Run                               T2

   ISSU Traffic Disruption (Frame Loss)      TPL_frames

   ISSU Traffic Impact Time (milliseconds)   TPL_time

   ISSU Housekeeping Interval                T3
   (Time for both RPs up on new code and
   fully synced - Redundancy restored)

   Total ISSU Maintenance Window             T4 (sum of T1+T2+T3)

   软件下载/备用RP更新                       T1

   升级/运行                                 T2

   ISSU流量中断(帧丢失)                      TPL_frames

   ISSU流量影响时间(毫秒)                    TPL_time

   ISSU内务处理间隔                          T3
   (两个RP均在新代码上启动并完全同步的
   时间 - 冗余恢复)

   ISSU维护窗口总时长                        T4(T1+T2+T3之和)
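The reporting points above can be held in a small record so that T4 is always derived rather than entered by hand. This is a sketch only: the class and field names are hypothetical, and the use of seconds for T1 through T3 is an assumption, since the text fixes units only for TPL_time (milliseconds).

```python
from dataclasses import dataclass


@dataclass
class IssuTimings:
    t1_s: float         # software download / secondary update (assumed seconds)
    t2_s: float         # upgrade/run (assumed seconds)
    t3_s: float         # housekeeping interval (assumed seconds)
    tpl_frames: int     # ISSU traffic disruption, frames lost
    tpl_time_ms: float  # ISSU traffic impact time, milliseconds

    @property
    def t4_s(self) -> float:
        """Total ISSU maintenance window: T4 = T1 + T2 + T3."""
        return self.t1_s + self.t2_s + self.t3_s

    @property
    def hitless(self) -> bool:
        """True when the Test Set counters show no frame loss."""
        return self.tpl_frames == 0


# Hypothetical measurements from a single trial.
trial = IssuTimings(t1_s=600.0, t2_s=420.0, t3_s=180.0,
                    tpl_frames=0, tpl_time_ms=0.0)
print(trial.t4_s)     # 1200.0
print(trial.hitless)  # True
```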

The results reporting must provide the following information:

结果报告必须提供以下信息:

- DUT hardware and software detail

- DUT硬件和软件详细信息

- Test Topology definition and diagram (especially as related to the ISSU operation)

- 测试拓扑定义和图表(特别是与ISU操作相关的)

- Load Model description including protocol mixes and any divergence from the production environment

- 负载模型描述,包括协议混合和与生产环境的任何差异

- Time Results as per above

- 时间结果如上所述

- Anomalies Observed during ISSU

- 在ISU期间观察到的异常情况

- Anomalies Observed in post-ISSU analysis

- 发行后分析中观察到的异常情况

It is RECOMMENDED that the following parameters be reported as outlined below:

建议如下所述报告以下参数:

   Parameter                Units or Examples
   ---------------------------------------------------------------

   Traffic Load             Frames per second and bits per second

   Disruption (average)     Frames

   Impact Time (average)    Milliseconds

   Number of trials         Integer count

   Protocols                IPv4, IPv6, MPLS, etc.

   Frame Size               Octets

   Port Media               Ethernet, Gigabit Ethernet (GbE),
                            Packet over SONET (POS), etc.

   Port Speed               10 Gbps, 1 Gbps, 100 Mbps, etc.

   Interface Encaps         Ethernet, Ethernet VLAN, PPP,
                            High-Level Data Link Control (HDLC), etc.

   Number of Prefixes       Integer count

   flapped (ON Interval)    (Optional) # of prefixes / Time (min.)

   flapped (OFF Interval)   (Optional) # of prefixes / Time (min.)

   参数                     单位或示例
   ---------------------------------------------------------------

   流量负载                 每秒帧数和每秒比特数

   中断(平均)               帧

   影响时间(平均)           毫秒

   试验次数                 整数计数

   协议                     IPv4、IPv6、MPLS等

   帧大小                   八位组

   端口介质                 以太网、千兆以太网(GbE)、
                            SONET承载数据包(POS)等

   端口速度                 10 Gbps、1 Gbps、100 Mbps等

   接口封装                 以太网、以太网VLAN、PPP、
                            高级数据链路控制(HDLC)等

   前缀数                   整数计数

   振荡(ON间隔)             (可选)前缀数/时间(分钟)

   振荡(OFF间隔)            (可选)前缀数/时间(分钟)
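The "Disruption (average)", "Impact Time (average)", and "Number of trials" parameters imply aggregating several ISSU runs. A minimal sketch of that aggregation follows, assuming each trial is recorded as a (TPL_frames, TPL_time in milliseconds) pair; the function name, dictionary keys, and sample values are hypothetical.

```python
from statistics import mean


def summarize_trials(trials):
    """Average frame loss and impact time over repeated ISSU trials.

    `trials` is a list of (tpl_frames, tpl_time_ms) tuples, one per trial.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    frames = [t[0] for t in trials]
    times = [t[1] for t in trials]
    return {
        "number_of_trials": len(trials),
        "disruption_avg_frames": mean(frames),
        "impact_time_avg_ms": mean(times),
    }


# Three hypothetical trials: one hitless, two with measurable loss.
summary = summarize_trials([(0, 0.0), (1200, 12.0), (300, 3.0)])
print(summary["number_of_trials"])       # 3
print(summary["disruption_avg_frames"])  # 500
print(summary["impact_time_avg_ms"])     # 5.0
```

An average can hide a single severe outage, so it may be worth keeping the per-trial values alongside the summary.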

Document any configuration deltas that are observed after the ISSU upgrade has taken effect. Note differences that are driven by changes in the patch or release level, as well as items that are aberrant changes due to software faults. In either of these cases, any unexpected behavioral changes should be analyzed and a determination made as to the impact of the change (be it functional variances or operational impacts to existing scripts or management mechanisms).

记录ISU升级生效后观察到的任何配置增量。请注意由修补程序或版本级别中的更改驱动的差异,以及由于软件故障导致异常更改的项目。在这两种情况下,应分析任何意外的行为变化,并确定变化的影响(无论是功能差异还是对现有脚本或管理机制的操作影响)。

7.1. Data Collection Considerations
7.1. 数据收集注意事项

When a DUT is undergoing an ISSU operation, it's worth noting that the DUT's data collection and reporting of data, such as counters, interface statistics, log messages, etc., may not be accurate. As such, one should not rely on the DUT's data collection methods, but rather, should use the test tools and equipment to collect data used for reporting in Section 7. Care and consideration should be paid in testing or adding new test cases, such that the desired data can be collected from the test tools themselves, or other external equipment, outside of the DUT itself.

当DUT正在进行ISSU操作时,值得注意的是,DUT的数据收集和数据报告,如计数器、接口统计数据、日志消息等,可能不准确。因此,不应依赖DUT的数据收集方法,而应使用测试工具和设备收集用于第7节报告的数据。在测试或添加新的测试用例时,应谨慎考虑,以便可以从测试工具本身或DUT本身之外的其他外部设备收集所需的数据。

8. Security Considerations
8. 安全考虑

All BMWG memos are limited to testing in a laboratory Isolated Test Environment (ITE), thus avoiding accidental interruption to production networks due to test activities.

所有BMWG备忘录仅限于在实验室隔离测试环境(ITE)中进行测试,从而避免因测试活动而对生产网络造成意外中断。

All benchmarking activities are limited to technology characterization using controlled stimuli in a laboratory environment with dedicated address space and the other constraints described in [RFC2544].

所有基准测试活动仅限于在具有专用地址空间和其他约束条件的实验室环境中使用受控刺激进行技术表征[RFC2544]。

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

基准网络拓扑将是一个独立的测试设置,不得连接到可能将测试流量转发到生产网络或将流量错误路由到测试管理网络的设备。

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the Device Under Test / System Under Test (DUT/SUT).

此外,基准测试是在“黑盒”基础上进行的,仅依赖于被测设备/被测系统(DUT/SUT)外部可观察到的测量值。

Special capabilities should not exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT should be identical in the lab and in production networks.

DUT/SUT中不应存在专门用于基准测试的特殊能力。DUT/SUT对网络安全的任何影响应在实验室和生产网络中相同。

9. References
9. 工具书类
9.1. Normative References
9.1. 规范性引用文件

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,DOI 10.17487/RFC2119,1997年3月<http://www.rfc-editor.org/info/rfc2119>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999, <http://www.rfc-editor.org/info/rfc2544>.

[RFC2544]Bradner,S.和J.McQuaid,“网络互连设备的基准测试方法”,RFC 2544,DOI 10.17487/RFC2544,1999年3月<http://www.rfc-editor.org/info/rfc2544>.

9.2. Informative References
9.2. 资料性引用

[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, <http://www.rfc-editor.org/info/rfc5880>.

[RFC5880]Katz,D.和D.Ward,“双向转发检测(BFD)”,RFC 5880,DOI 10.17487/RFC5880,2010年6月<http://www.rfc-editor.org/info/rfc5880>.

[RFC6815] Bradner, S., Dubray, K., McQuaid, J., and A. Morton, "Applicability Statement for RFC 2544: Use on Production Networks Considered Harmful", RFC 6815, DOI 10.17487/RFC6815, November 2012, <http://www.rfc-editor.org/info/rfc6815>.

[RFC6815]Bradner,S.,Dubrey,K.,McQuaid,J.,和A.Morton,“RFC 2544的适用性声明:在被认为有害的生产网络上使用”,RFC 6815,DOI 10.17487/RFC6815,2012年11月<http://www.rfc-editor.org/info/rfc6815>.

Acknowledgments

致谢

The authors wish to thank Vibin Thomas for his valued review and feedback.

作者希望感谢Vibin Thomas的宝贵评论和反馈。

Authors' Addresses

作者地址

Sarah Banks VSS Monitoring Email: sbanks@encrypted.net

Sarah Banks VSS监控电子邮件:sbanks@encrypted.net

Fernando Calabria Cisco Systems Email: fcalabri@cisco.com

费尔南多·卡拉布里亚思科系统电子邮件:fcalabri@cisco.com

Gery Czirjak Juniper Networks Email: gczirjak@juniper.net

Gery Czirjak Juniper Networks电子邮件:gczirjak@juniper.net

Ramdas Machat Juniper Networks Email: rmachat@juniper.net

Ramdas Machat Juniper Networks电子邮件:rmachat@juniper.net