Network Working Group                              D. Papadimitriou, Ed.
Request for Comments: 4428                                       Alcatel
Category: Informational                                   E. Mannie, Ed.
                                                                Perceval
                                                              March 2006
        

Analysis of Generalized Multi-Protocol Label Switching (GMPLS)-based Recovery Mechanisms (including Protection and Restoration)

Status of This Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document provides an analysis grid to evaluate, compare, and contrast the Generalized Multi-Protocol Label Switching (GMPLS) protocol suite capabilities with the recovery mechanisms currently proposed at the IETF CCAMP Working Group. A detailed analysis of each of the recovery phases is provided using the terminology defined in RFC 4427. This document focuses on transport plane survivability and recovery issues and not on control plane resilience and related aspects.

Table of Contents

   1. Introduction
   2. Contributors
   3. Conventions Used in this Document
   4. Fault Management
      4.1. Failure Detection
      4.2. Failure Localization and Isolation
      4.3. Failure Notification
      4.4. Failure Correlation
   5. Recovery Mechanisms
      5.1. Transport vs. Control Plane Responsibilities
      5.2. Technology-Independent and Technology-Dependent Mechanisms
           5.2.1. OTN Recovery
           5.2.2. Pre-OTN Recovery
           5.2.3. SONET/SDH Recovery
      5.3. Specific Aspects of Control Plane-Based Recovery Mechanisms
           5.3.1. In-Band vs. Out-Of-Band Signaling
           5.3.2. Uni- vs. Bi-Directional Failures
           5.3.3. Partial vs. Full Span Recovery
           5.3.4. Difference between LSP, LSP Segment and Span Recovery
      5.4. Difference between Recovery Type and Scheme
      5.5. LSP Recovery Mechanisms
           5.5.1. Classification
           5.5.2. LSP Restoration
           5.5.3. Pre-Planned LSP Restoration
           5.5.4. LSP Segment Restoration
   6. Reversion
      6.1. Wait-To-Restore (WTR)
      6.2. Revertive Mode Operation
      6.3. Orphans
   7. Hierarchies
      7.1. Horizontal Hierarchy (Partitioning)
      7.2. Vertical Hierarchy (Layers)
           7.2.1. Recovery Granularity
      7.3. Escalation Strategies
      7.4. Disjointness
           7.4.1. SRLG Disjointness
   8. Recovery Mechanisms Analysis
      8.1. Fast Convergence (Detection/Correlation and Hold-off Time)
      8.2. Efficiency (Recovery Switching Time)
      8.3. Robustness
      8.4. Resource Optimization
           8.4.1. Recovery Resource Sharing
           8.4.2. Recovery Resource Sharing and SRLG Recovery
           8.4.3. Recovery Resource Sharing, SRLG Disjointness and Admission Control
   9. Summary and Conclusions
   10. Security Considerations
   11. Acknowledgements
   12. References
      12.1. Normative References
      12.2. Informative References
        
1. Introduction

This document provides an analysis grid to evaluate, compare, and contrast the Generalized MPLS (GMPLS) protocol suite capabilities with the recovery mechanisms proposed at the IETF CCAMP Working Group. The focus is on transport plane survivability and recovery issues and not on control-plane-resilience-related aspects. Although the recovery mechanisms described in this document impose different requirements on GMPLS-based recovery protocols, the protocols' specifications will not be covered in this document. Though the concepts discussed are technology independent, this document implicitly focuses on SONET [T1.105]/SDH [G.707], Optical Transport Networks (OTN) [G.709], and pre-OTN technologies, except when specific details need to be considered (for instance, in the case of failure detection).

A detailed analysis is provided for each of the recovery phases as identified in [RFC4427]. These phases define the sequence of generic operations that need to be performed when an LSP/span failure (or any other event generating such failures) occurs:

- Phase 1: Failure Detection
- Phase 2: Failure Localization (and Isolation)
- Phase 3: Failure Notification
- Phase 4: Recovery (Protection or Restoration)
- Phase 5: Reversion (Normalization)

Together, the failure detection, localization, and notification phases are referred to as "fault management". Within a recovery domain, the entities involved during the recovery operations are defined in [RFC4427]; these entities include ingress, egress, and intermediate nodes. The term "recovery mechanism" is used to cover both protection and restoration mechanisms. Specific terms such as "protection" and "restoration" are used only when differentiation is required. Likewise, the term "failure" is used to represent both signal failure and signal degradation.
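As an informal illustration, the five recovery phases and the "fault management" grouping described above can be modeled as an ordered enumeration. This is only a sketch: the names RecoveryPhase, FAULT_MANAGEMENT, and is_fault_management are hypothetical and are not defined in [RFC4427] or in this document.

```python
from enum import IntEnum


class RecoveryPhase(IntEnum):
    """Recovery phases in the order they are performed (names are illustrative)."""
    FAILURE_DETECTION = 1
    FAILURE_LOCALIZATION = 2   # includes isolation
    FAILURE_NOTIFICATION = 3
    RECOVERY = 4               # protection or restoration
    REVERSION = 5              # normalization


# Phases 1 to 3 are collectively referred to as "fault management".
FAULT_MANAGEMENT = frozenset({
    RecoveryPhase.FAILURE_DETECTION,
    RecoveryPhase.FAILURE_LOCALIZATION,
    RecoveryPhase.FAILURE_NOTIFICATION,
})


def is_fault_management(phase: RecoveryPhase) -> bool:
    """Return True if the phase belongs to the fault management group."""
    return phase in FAULT_MANAGEMENT
```

Using an ordered enumeration keeps the phase sequence explicit, so sorting the members yields the order in which the operations are performed.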

In addition, when analyzing the different hierarchical recovery mechanisms including disjointness-related issues, a clear distinction is made between partitioning (horizontal hierarchy) and layering (vertical hierarchy). In order to assess the current GMPLS protocol capabilities and the potential need for further extensions, the dimensions for analyzing each of the recovery mechanisms detailed in this document are introduced. This document concludes by detailing the applicability of the current GMPLS protocol building blocks for recovery purposes.

2. Contributors

This document is the result of a joint effort by the CCAMP Working Group Protection and Restoration design team. Besides the editors, the following authors contributed to the present memo:

Deborah Brungard (AT&T)
200 S. Laurel Ave.
Middletown, NJ 07748, USA

   EMail: dbrungard@att.com

Sudheer Dharanikota

   EMail: sudheer@ieee.org

Jonathan P. Lang (Sonos)
506 Chapala Street
Santa Barbara, CA 93101, USA

   EMail: jplang@ieee.org

Guangzhi Li (AT&T)
180 Park Avenue
Florham Park, NJ 07932, USA

   EMail: gli@research.att.com

Eric Mannie
Perceval
Rue Tenbosch, 9
1000 Brussels
Belgium

   Phone: +32-2-6409194
   EMail: eric.mannie@perceval.net

Dimitri Papadimitriou (Alcatel)
Francis Wellesplein, 1
B-2018 Antwerpen, Belgium

   EMail: dimitri.papadimitriou@alcatel.be

Bala Rajagopalan
Microsoft India Development Center
Hyderabad, India

   EMail: balar@microsoft.com

Yakov Rekhter (Juniper)
1194 N. Mathilda Avenue
Sunnyvale, CA 94089, USA

   EMail: yakov@juniper.net
        
3. Conventions Used in this Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Any other recovery-related terminology used in this document conforms to that defined in [RFC4427]. The reader is also assumed to be familiar with the terminology developed in [RFC3945], [RFC3471], [RFC3473], [RFC4202], and [RFC4204].

4. Fault Management

4.1. Failure Detection

Transport failure detection is the only phase that cannot be achieved by the control plane alone because the latter needs a hook to the transport plane in order to collect the related information. It has to be emphasized that even if failure events themselves are detected by the transport plane, the latter, upon a failure condition, must trigger the control plane for subsequent actions through the use of GMPLS signaling capabilities (see [RFC3471] and [RFC3473]) or Link Management Protocol capabilities (see [RFC4204], Section 6).

Therefore, by definition, transport failure detection is transport technology dependent (so, exceptionally, the "transport plane" terminology is retained here). In transport fault management, a distinction is made between a defect and a failure. Here, the discussion addresses failure detection (persistent fault cause). In the technology-dependent descriptions, a more precise specification will be provided.

As an example, SONET/SDH (see [G.707], [G.783], and [G.806]) provides supervision capabilities covering:

- Continuity: SONET/SDH monitors the integrity of the continuity of a trail (i.e., section or path). This operation is performed by monitoring the presence/absence of the signal. Examples are Loss of Signal (LOS) detection for the physical layer, Unequipped (UNEQ) Signal detection for the path layer, and Server Signal Fail detection (e.g., AIS) at the client layer.

- Connectivity: SONET/SDH monitors the integrity of the routing of the signal between end-points. Connectivity monitoring is needed if the layer provides flexible connectivity, either automatically (e.g., cross-connects) or manually (e.g., fiber distribution frame). An example is the Trail (i.e., section or path) Trace Identifier used at the different layers and the corresponding Trail Trace Identifier Mismatch detection.

- Alignment: SONET/SDH checks that the client and server layer frame start can be correctly recovered from the detection of loss of alignment. The specific processes depend on the signal/frame structure and may include: (multi-)frame alignment, pointer processing, and alignment of several independent frames to a common frame start in case of inverse multiplexing. Loss of alignment is a generic term. Examples are loss of frame, loss of multi-frame, or loss of pointer.

- Payload type: SONET/SDH checks that compatible adaptation functions are used at the source and the destination. Normally, this is done by adding a payload type identifier (referred to as the "signal label") at the source adaptation function and comparing it with the expected identifier at the destination. For instance, a difference between the received and the expected identifier triggers the corresponding mismatch detection.

- Signal Quality: SONET/SDH monitors the performance of a signal. For instance, if the performance falls below a certain threshold, a defect -- excessive errors (EXC) or degraded signal (DEG) -- is detected.

The most important point is that the supervision processes and the corresponding failure detection (used to initiate the recovery phase(s)) result in either:

- Signal Degrade (SD): A signal indicating that the associated data has degraded in the sense that a degraded defect condition is active (for instance, a dDEG declared when the Bit Error Rate exceeds a preset threshold); or

- Signal Fail (SF): A signal indicating that the associated data has failed in the sense that a signal-interrupting near-end defect condition is active (as opposed to the degraded defect).
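The distinction between the two outcomes can be sketched as follows. This is a deliberately simplified model, not the supervision process of any ITU-T Recommendation: it assumes that only two inputs are available (whether a signal is present and a measured Bit Error Rate), and the names Condition, classify, and deg_threshold, as well as the default threshold value, are hypothetical.

```python
from enum import Enum
from typing import Optional


class Condition(Enum):
    SIGNAL_FAIL = "SF"
    SIGNAL_DEGRADE = "SD"


def classify(signal_present: bool,
             bit_error_rate: float,
             deg_threshold: float = 1e-6) -> Optional[Condition]:
    """Map supervision output to SF, SD, or no failure condition.

    deg_threshold is an illustrative preset; real values are
    technology and operator dependent.
    """
    if not signal_present:
        # A signal-interrupting defect (e.g., loss of signal) is active.
        return Condition.SIGNAL_FAIL
    if bit_error_rate > deg_threshold:
        # The signal is present but degraded beyond the preset threshold.
        return Condition.SIGNAL_DEGRADE
    return None
```

The signal-interrupting check comes first because a measured error rate is meaningless when no signal is received.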

In Optical Transport Networks (OTN), equivalent supervision capabilities are provided at the optical/digital section layers (i.e., Optical Transmission Section (OTS), Optical Multiplex Section (OMS) and Optical channel Transport Unit (OTU)) and at the optical/digital path layers (i.e., Optical Channel (OCh) and Optical channel Data Unit (ODU)). Interested readers are referred to the ITU-T Recommendations [G.798] and [G.709] for more details.

The above are examples that illustrate cases where the failure detection and reporting entities (see [RFC4427]) are co-located. The following example illustrates the scenario where the failure detecting and reporting entities (see [RFC4427]) are not co-located.

In pre-OTN networks, a failure may be masked by an intermediate O-E-O-based Optical Line System (OLS), preventing a Photonic Cross-Connect (PXC) from detecting upstream failures. In such cases, failure detection may be assisted by an out-of-band communication channel, and the failure condition may be reported to the PXC control plane. This can be provided by using [RFC4209] extensions that deliver IP message-based communication between the PXC and the OLS control plane. Also, since PXCs are independent of the framing format, failure conditions can only be triggered either by detecting the absence of the optical signal or by measuring its quality. These mechanisms are generally less reliable than electrical (digital) ones. Both types of detection mechanisms are outside the scope of this document. If the intermediate OLS supports electrical (digital) mechanisms, these failure conditions are reported to the PXC over the LMP communication channel, and subsequent recovery actions are performed as described in Section 5. As such, from the control plane viewpoint, this mechanism turns the OLS-PXC-composed system into a single logical entity, thus having the same failure management mechanisms as any other O-E-O capable device.

More generally, the following are typical failure conditions in SONET/SDH and pre-OTN networks:

- Loss of Light (LOL)/Loss of Signal (LOS): Signal Failure (SF) condition where the optical signal is no longer detected on the receiver of a given interface.

- Signal Degrade (SD): detection of the signal degradation over a specific period of time.

- For SONET/SDH payloads, all of the above-mentioned supervision capabilities can be used, resulting in SD or SF conditions.

In summary, the following cases apply when considering the communication between the detecting and reporting entities:

- Co-located detecting and reporting entities: both the detecting and reporting entities are on the same node (e.g., SONET/SDH equipment, Opaque cross-connects, and, with some limitations, Transparent cross-connects, etc.)

- Non-co-located detecting and reporting entities:

o with in-band communication between entities: entities are physically separated, but the transport plane provides in-band communication between them (e.g., Server Signal Failures such as Alarm Indication Signal (AIS), etc.)

o with out-of-band communication between entities: entities are physically separated, but an out-of-band communication channel is provided between them (e.g., using [RFC4204]).

4.2. Failure Localization and Isolation

Failure localization provides information to the deciding entity about the location (and so the identity) of the transport plane entity that detects the LSP(s)/span(s) failure. The deciding entity can then make an accurate decision to achieve finer grained recovery switching action(s). Note that this information can also be included as part of the failure notification (see Section 4.3).

In some cases, determining this accurate failure localization information may be less urgent, particularly if it requires performing more time-consuming failure isolation (see also Section 4.4). This is notably the case when edge-to-edge LSP recovery is performed based on a simple failure notification (including the identification of the working LSPs under failure condition). Note that "edge" refers to a sub-network end-node, for instance. In this case, a more accurate localization and isolation can be performed after recovery of these LSPs.

Failure localization should be triggered immediately after the fault detection phase. This operation can be performed at the transport plane level and/or, if the operation is unavailable via the transport plane, at the control plane level, where dedicated signaling messages can be used. When performed at the control plane level, a protocol such as LMP (see [RFC4204], Section 6) can be used for failure localization purposes.

4.3. Failure Notification

Failure notification is used 1) to inform intermediate nodes that an LSP/span failure has occurred and has been detected and 2) to inform the deciding entities (which can correspond to any intermediate or end-point of the failed LSP/span) that the corresponding service is not available. In general, these deciding entities will be the ones making the appropriate recovery decision. When co-located with the recovering entity, these entities will also perform the corresponding recovery action(s).

Failure notification can be provided either by the transport plane or by the control plane. As an example, let us first briefly describe the failure notification mechanism defined at the SONET/SDH transport plane level (also referred to as maintenance signal supervision):

- AIS (Alarm Indication Signal) occurs as a result of a failure condition such as Loss of Signal and is used to notify downstream nodes (of the appropriate layer processing) that a failure has occurred. AIS performs two functions: 1) inform the intermediate nodes (with the appropriate layer monitoring capability) that a failure has been detected and 2) notify the connection end-point that the service is no longer available.

For a distributed control plane supporting one (or more) failure notification mechanism(s), regardless of the mechanism's actual implementation, the same capabilities are needed with more (or less) information provided about the LSPs/spans under failure condition, their detailed statuses, etc.

The most important difference between these mechanisms is related to the fact that transport plane notifications (as defined today) would directly initiate either a certain type of protection switching (such as those described in [RFC4427]) via the transport plane or restoration actions via the management plane.

On the other hand, using a failure notification mechanism through the control plane would provide the possibility of triggering either a protection or a restoration action via the control plane. This has the advantage that a control-plane-recovery-responsible entity does not necessarily have to be co-located with a transport maintenance/recovery domain. A control plane recovery domain can be defined at entities not supporting a transport plane recovery.

Moreover, as specified in [RFC3473], notification message exchanges through a GMPLS control plane may not follow the same path as the LSP/spans for which these messages carry the status. In turn, this ensures a fast, reliable (through acknowledgement and the use of either a dedicated control plane network or disjoint control channels), and efficient (through the aggregation of several LSP/span statuses within the same message) failure notification mechanism.

The other important properties to be met by the failure notification mechanism are mainly the following:

- Notification messages must provide enough information such that the most efficient subsequent recovery action will be taken at the recovering entities (in most of the recovery types and schemes this action is even deterministic). Remember here that these entities can be either intermediate or end-points through which normal traffic flows. Based on local policy, intermediate nodes may not use this information for subsequent recovery actions (see for instance the APS protocol phases as described in [RFC4427]). In addition, fast notification is a mechanism running in collaboration with the existing GMPLS signaling (see [RFC3473]) that also allows intermediate nodes to stay informed about the status of the working LSP/spans under failure condition.

The trade-off here arises when defining what information the LSP/span end-points (more precisely, the deciding entities) need in order for the recovering entity to take the best recovery action: If not enough information is provided, the decision cannot be optimal (note that in this eventuality, the important issue is to quantify the level of sub-optimality). If too much information is provided, the control plane may be overloaded with unnecessary information and the aggregation/correlation of this notification information will be more complex and time-consuming to achieve. Note that a more detailed quantification of the amount of information to be exchanged and processed is strongly dependent on the failure notification protocol.

- If the failure localization and isolation are not performed by one of the LSP/span end-points or some intermediate points, these points should receive enough information from the notification message in order to locate the failure. Otherwise, they would need to (re-)initiate a failure localization and isolation action.

- Avoiding so-called notification storms implies that 1) the failure detection output is correlated (i.e., alarm correlation) and aggregated at the node detecting the failure(s), 2) the failure notifications are directed to a restricted set of destinations (in general the end-points), and 3) failure notification suppression (i.e., alarm suppression) is provided in order to limit flooding in case of multiple and/or correlated failures detected at several locations in the network.

- 避免所谓的通知风暴意味着1)故障检测输出在检测故障的节点上进行关联(即报警关联)和聚合,2)故障通知被定向到一组受限的目的地(通常是端点),以及3)故障通知抑制(即报警抑制)为了在网络中多个位置检测到多个和/或相关故障时限制泛洪。

- Alarm correlation and aggregation (at the failure-detecting node) implies a consistent decision based on the conditions for which a trade-off between fast convergence (at detecting node) and fast notification (implying that correlation and aggregation occurs at receiving end-points) can be found.

- 报警关联和聚合(在故障检测节点处)意味着基于以下条件的一致性决策:快速聚合(在检测节点处)和快速通知(意味着关联和聚合发生在接收端点处)之间的权衡。

4.4. Failure Correlation
4.4. 故障关联

A single failure event (such as a span failure) can cause multiple failure conditions (such as individual LSP failures) to be reported. These can be grouped (i.e., correlated) to reduce the number of failure conditions communicated on the reporting channel, for both in-band and out-of-band failure reporting.

单个故障事件(如span故障)可导致报告多个故障(如单个LSP故障)情况。对于带内和带外故障报告,这些可以分组(即相关)以减少报告通道上传达的故障条件数量。

In such a scenario, it can be important to wait for a certain period of time, typically called failure correlation time, and gather all the failures to report them as a group of failures (or simply group failure). For instance, this approach can be provided using LMP-WDM for pre-OTN networks (see [RFC4209]) or when using Signal Failure/Degrade Group in the SONET/SDH context.

在这种情况下,等待一段时间(通常称为故障关联时间)并收集所有故障以将其报告为一组故障(或简单的组故障)可能很重要。例如,对于OTN之前的网络(参见[RFC4209]),或者在SONET/SDH上下文中使用信号故障/降级组时,可以使用LMP-WDM提供这种方法。

Note that a default average time interval during which failure correlation operation can be performed is difficult to provide since it is strongly dependent on the underlying network topology. Therefore, providing a per-node configurable failure correlation time can be advisable. The detailed selection criteria for this time interval are outside of the scope of this document.

请注意,执行故障关联操作的默认平均时间间隔很难提供,因为它强烈依赖于底层网络拓扑。因此,建议提供每个节点可配置的故障关联时间。此时间间隔的详细选择标准不在本文件范围内。

When failure correlation is not provided, multiple failure notification messages may be sent out in response to a single failure (for instance, a fiber cut). Each failure notification message contains a set of information on the failed working resources (for instance, the individual lambda LSPs flowing through this fiber). This allows for a more prompt response, but can potentially overload the control plane due to the large number of failure notifications.

当未提供故障关联时,可能会发送多个故障通知消息以响应单个故障(例如,光纤切断)。每个故障通知消息都包含一组关于故障工作资源的信息(例如,流经此光纤的单个lambda LSP)。这允许更迅速的响应,但由于大量的故障通知,可能会导致控制平面过载。
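The correlation behavior described in this section can be sketched as follows. This is a minimal illustration only, not part of any GMPLS specification: the class name FailureCorrelator, its methods, and the use of seconds for the correlation time are all assumptions made for the example. Failures detected within one per-node configurable correlation time are gathered and reported as a single group.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureCorrelator:
    """Collects failure conditions seen within one correlation window."""

    correlation_time: float  # per-node configurable, in seconds (assumed unit)
    window_start: Optional[float] = field(default=None, init=False)
    pending: List[str] = field(default_factory=list, init=False)

    def report(self, lsp_id: str, now: float) -> Optional[List[str]]:
        """Record one failed LSP; return an earlier group if its window closed."""
        closed = self.flush(now)
        if self.window_start is None:
            self.window_start = now  # the first failure opens a new window
        self.pending.append(lsp_id)
        return closed

    def flush(self, now: float) -> Optional[List[str]]:
        """Return the pending group once the correlation time has elapsed."""
        if self.window_start is None:
            return None
        if now - self.window_start < self.correlation_time:
            return None
        group, self.pending = self.pending, []
        self.window_start = None
        return group


# Hypothetical usage: a fiber cut fails two lambda LSPs 10 ms apart.
correlator = FailureCorrelator(correlation_time=0.05)
correlator.report("lambda-1", now=0.00)
correlator.report("lambda-2", now=0.01)
group = correlator.flush(now=0.06)  # a single group failure for both LSPs
```

A shorter correlation time gives a more prompt notification at the cost of more messages; a longer one trades notification delay for fewer, larger group reports.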

5. Recovery Mechanisms
5. 恢复机制
5.1. Transport vs. Control Plane Responsibilities
5.1. 传输平面与控制平面的职责

When applicable, recovery resources are provisioned, for both protection and restoration, using GMPLS signaling capabilities. Thus, these are control plane-driven actions (topological and resource-constrained) that are always performed in this context.

如果适用,将使用GMPLS信令功能为保护和恢复提供恢复资源。因此,这些是控制平面驱动的动作(拓扑和资源受限),总是在这种情况下执行。

The following tables give an overview of the responsibilities taken by the control plane in case of LSP/span recovery:

下表概述了LSP/span恢复时控制平面所承担的责任:

1. LSP/span Protection

1. LSP/span保护

   - Phase 1: Failure Detection                  Transport plane
   - Phase 2: Failure Localization/Isolation     Transport/Control plane
   - Phase 3: Failure Notification               Transport/Control plane
   - Phase 4: Protection Switching               Transport/Control plane
   - Phase 5: Reversion (Normalization)          Transport/Control plane

   - 阶段1:故障检测                  传输平面
   - 阶段2:故障定位/隔离             传输/控制平面
   - 阶段3:故障通知                  传输/控制平面
   - 阶段4:保护切换                  传输/控制平面
   - 阶段5:恢复(正常化)            传输/控制平面

Note: in the context of LSP/span protection, control plane actions can be performed for operational purposes, synchronization purposes (vertical synchronization between transport and control plane), and/or notification purposes (horizontal synchronization between end-nodes at control plane level). This suggests selecting the responsible plane (in particular for protection switching) during the provisioning phase of the protected/protection LSP.

注意:在LSP/span保护的上下文中,控制平面动作可以出于操作目的和/或同步目的(传输平面和控制平面之间的垂直同步)和/或通知目的(控制平面级别的端节点之间的水平同步)执行。这建议在受保护/保护LSP的供应阶段选择责任平面(特别是保护切换)。

2. LSP/span Restoration

2. LSP/span恢复

   - Phase 1: Failure Detection                  Transport plane
   - Phase 2: Failure Localization/Isolation     Transport/Control plane
   - Phase 3: Failure Notification               Control plane
   - Phase 4: Recovery Switching                 Control plane
   - Phase 5: Reversion (Normalization)          Control plane

   - 阶段1:故障检测                  传输平面
   - 阶段2:故障定位/隔离             传输/控制平面
   - 阶段3:故障通知                  控制平面
   - 阶段4:恢复切换                  控制平面
   - 阶段5:恢复(正常化)            控制平面

Therefore, this document primarily focuses on provisioning of LSP recovery resources, failure notification mechanisms, recovery switching, and reversion operations. Moreover, some additional considerations can be dedicated to the mechanisms associated to the failure localization/isolation phase.

因此,本文档主要关注LSP恢复资源的配置、故障通知机制、恢复切换和恢复操作。此外,还可以专门考虑与故障定位/隔离阶段相关的机制。

5.2. Technology-Independent and Technology-Dependent Mechanisms
5.2. 技术独立和技术依赖机制

The present recovery mechanisms analysis applies to any circuit-oriented data plane technology with discrete bandwidth increments (like SONET/SDH, G.709 OTN, etc.) being controlled by a GMPLS-based distributed control plane.

目前的恢复机制分析适用于由基于GMPLS的分布式控制平面控制的离散带宽增量(如SONET/SDH、G.709 OTN等)的任何面向电路的数据平面技术。

The following sub-sections are not intended to favor one technology over another. They list the pros and cons of each technology in order to determine the mechanisms that GMPLS-based recovery must deliver to overcome their cons and make use of their pros in their respective applicability context.

以下各小节的目的不是支持一种技术而不是另一种技术。他们列出了每种技术的优缺点,以确定基于GMPLS的恢复必须提供哪些机制来克服它们的缺点,并在各自的适用性环境中利用它们的优点。

5.2.1. OTN Recovery
5.2.1. OTN恢复

OTN recovery specifics are left for further consideration.

OTN恢复细节留待进一步考虑。

5.2.2. Pre-OTN Recovery
5.2.2. OTN前恢复

Pre-OTN recovery specifics (also referred to as "lambda switching") present mainly the following advantages:

OTN前恢复规范(也称为“lambda交换”)主要具有以下优点:

- They benefit from a simpler architecture, making them more suitable for mesh-based recovery types and schemes (on a per-channel basis).

- 它们受益于更简单的体系结构,使其更适合基于网格的恢复类型和方案(基于每个通道)。

- Failure suppression at intermediate node transponders, e.g., use of squelching, implies that failures (such as Loss of Light, LoL) will propagate to edge nodes. Thus, edge nodes will have the possibility to initiate recovery actions driven by upper layers (vs. use of non-standard masking of upstream failures).

- 中间节点转发器的故障抑制,例如使用静噪,意味着故障(如LoL)将传播到边缘节点。因此,边缘节点将有可能启动由上层驱动的恢复操作(与使用上游故障的非标准掩蔽相比)。

The main disadvantage is the lack of interworking due to the large number of failure management mechanisms (in particular failure notification protocols) and recovery mechanisms currently available.

主要缺点是由于目前可用的大量故障管理(特别是故障通知协议)和恢复机制,缺乏互通性。

Note also that for all-optical networks, the combination of recovery with optical physical impairments is left for a future release of this document because the corresponding detection technologies are under specification.

另外请注意,对于全光网络,恢复与光物理损伤的结合将留待本文档的未来版本,因为相应的检测技术正在规范中。

5.2.3. SONET/SDH Recovery
5.2.3. SONET/SDH恢复

Some of the advantages of SONET [T1.105]/SDH [G.707], and more generically any Time Division Multiplexing (TDM) transport plane recovery, are that they provide:

SONET[T1.105]/SDH[G.707]和更一般的任何时分多路复用(TDM)传输平面恢复的一些优点在于,它们提供:

- Protection types operating at the data plane level that are standardized (see [G.841]) and can operate across protected domains and interwork (see [G.842]).

- 在数据平面级别运行的标准化保护类型(见[G.841]),可跨受保护域和互通运行(见[G.842])。

- Failure detection, notification, and path/section Automatic Protection Switching (APS) mechanisms.

- 故障检测、通知和路径/区段自动保护切换(APS)机制。

- Greater control over the granularity of the TDM LSPs/links that can be recovered, compared with coarser optical channel (or whole fiber content) recovery switching.

- 对TDM LSP/链路的粒度进行更大的控制,这些链路可以针对较粗的光信道(或整个光纤内容)恢复交换进行恢复

Some of the limitations of the SONET/SDH recovery are:

SONET/SDH恢复的一些限制是:

- Limited topological scope: Inherently, the use of ring topologies (typically dedicated Sub-Network Connection Protection (SNCP) or shared protection rings) offers less flexibility and resource efficiency than the (somewhat more complex) meshed recovery.

- 拓扑范围有限:环形拓扑(通常为专用子网连接保护(SNCP)或共享保护环)的使用降低了网状恢复的灵活性和资源效率。

- Inefficient use of spare capacity: SONET/SDH protection is largely applied to ring topologies, where spare capacity often remains idle, making the efficiency of bandwidth usage a real issue.

- 备用容量使用效率低下:SONET/SDH保护主要应用于环形拓扑,在环形拓扑中,备用容量通常处于空闲状态,这使得带宽使用效率成为一个真正的问题。

- Support of meshed recovery requires intensive network management development, and the functionality is limited by both the network elements and the capabilities of the element management systems (thus justifying the development of GMPLS-based distributed recovery mechanisms).

- 支持网状恢复需要密集的网络管理开发,其功能受到网络元件和元件管理系统能力的限制(因此有理由开发基于GMPLS的分布式恢复机制)

5.3. Specific Aspects of Control Plane-Based Recovery Mechanisms
5.3. 基于控制平面的恢复机制的特定方面
5.3.1. In-Band vs. Out-Of-Band Signaling
5.3.1. 带内与带外信令

The nodes communicate through the use of IP terminating control channels defining the control plane (transport) topology. In this context, two classes of transport mechanisms can be considered here: in-fiber or out-of-fiber (through a dedicated physically diverse control network referred to as the Data Communication Network or DCN). The potential impact of the usage of an in-fiber (signaling) transport mechanism is briefly considered here.

节点通过使用定义控制平面(传输)拓扑的IP端接控制通道进行通信。在这种情况下,这里可以考虑两类传输机制:光纤内或光纤外(通过称为数据通信网络或DCN的专用物理多样性控制网络)。这里简要考虑使用光纤内(信令)传输机制的潜在影响。

In-fiber transport mechanisms can be further subdivided into in-band and out-of-band. As such, the distinction between in-fiber in-band and in-fiber out-of-band signaling reduces to the consideration of a logically- versus physically-embedded control plane topology with respect to the transport plane topology. In the scope of this document, it is assumed that at least one IP control channel between each pair of adjacent nodes is continuously available to enable the exchange of recovery-related information and messages. Thus, in either case (i.e., in-band or out-of-band) at least one logical or physical control channel between each pair of nodes is always expected to be available.

光纤内传输机制可进一步细分为带内传输机制和带外传输机制。因此,带内光纤和带外光纤信令之间的区别减少到考虑相对于传输平面拓扑的逻辑与物理嵌入式控制平面拓扑。在本文档的范围内,假设每对相邻节点之间至少有一个IP控制通道连续可用,以实现恢复相关信息和消息的交换。因此,在任一情况下(即,带内或带外),每对节点之间的至少一个逻辑或物理控制信道总是期望可用。

Therefore, the key issue when using in-fiber signaling is whether one can assume independence between the fault-tolerance capabilities of control plane and the failures affecting the transport plane (including the nodes). Note also that existing specifications like the OTN provide a limited form of independence for in-fiber signaling by dedicating a separate optical supervisory channel (OSC, see [G.709] and [G.874]) to transport the overhead and other control traffic. For OTNs, failure of the OSC does not result in failing the optical channels. Similarly, loss of the control channel must not result in failing the data channels (transport plane).

因此,使用光纤信令时的关键问题是,是否可以假设控制平面的容错能力与影响传输平面(包括节点)的故障之间的独立性。还请注意,现有规范(如OTN)通过专用于传输开销和其他控制流量的单独光监控信道(OSC,见[G.709]和[G.874]),为光纤内信令提供了有限形式的独立性。对于OTN,OSC故障不会导致光信道故障。同样,控制通道的丢失不得导致数据通道(传输平面)失效。

5.3.2. Uni- vs. Bi-Directional Failures
5.3.2. 单向故障与双向故障

The failure detection, correlation, and notification mechanisms (described in Section 4) can be triggered when either a uni-directional or a bi-directional LSP/Span failure occurs (or a combination of both). As illustrated in Figures 1 and 2, two alternatives can be considered here:

当发生单向或双向LSP/Span故障(或两者的组合)时,可以触发故障检测、关联和通知机制(如第4节所述)。如图1和图2所示,此处可考虑两种备选方案:

1. Uni-directional failure detection: the failure is detected on the receiver side only, i.e., by the node downstream of the failure (or by the upstream node, depending on the failure propagation direction).

1. 单向故障检测:在接收器侧检测故障,即仅由故障的下游节点检测(或由上游节点检测,具体取决于故障传播方向)。

2. Bi-directional failure detection: the failure is detected on the receiver side of both the downstream node AND the upstream node of the failure.

2. 双向故障检测:在故障的下游节点和上游节点的接收器侧检测故障。

Notice that after the failure detection time, if only control-plane-based failure management is provided, the peering node is unaware of the failure detection status of its neighbor.

请注意,在故障检测时间之后,如果仅提供基于控制平面的故障管理,对等节点不知道其邻居的故障检测状态。

    -------             -------           -------             -------
   |       |           |       |Tx     Rx|       |           |       |
   | NodeA |----...----| NodeB |xxxxxxxxx| NodeC |----...----| NodeD |
   |       |----...----|       |---------|       |----...----|       |
    -------             -------           -------             -------
        
   t0                                >>>>>>> F
        
   t1                      x <---------------x
                               Notification
   t2  <--------...--------x                 x--------...-------->
          Up Notification                      Down Notification
        

Figure 1: Uni-directional failure detection

图1:单向故障检测

    -------             -------           -------             -------
   |       |           |       |Tx     Rx|       |           |       |
   | NodeA |----...----| NodeB |xxxxxxxxx| NodeC |----...----| NodeD |
   |       |----...----|       |xxxxxxxxx|       |----...----|       |
    -------             -------           -------             -------
        
   t0                      F <<<<<<< >>>>>>> F
        
   t1                      x <-------------> x
                               Notification
   t2  <--------...--------x                 x--------...-------->
          Up Notification                      Down Notification
        

Figure 2: Bi-directional failure detection

图2:双向故障检测

After failure detection, the following failure management operations can be subsequently considered:

故障检测后,可随后考虑以下故障管理操作:

- Each detecting entity sends a notification message to the corresponding transmitting entity. For instance, in Figure 1, node C sends a notification message to node B. In Figure 2, node C sends a notification message to node B while node B sends a notification message to node C. To ensure reliable failure notification, a dedicated acknowledgement message can be returned to the sender node.

- 每个检测实体向相应的发送实体发送通知消息。例如,在图1中,节点C向节点B发送通知消息。在图2中,节点C向节点B发送通知消息,而节点B向节点C发送通知消息。为了确保可靠的故障通知,可以将专用确认消息返回给发送方节点。

- Next, within a certain (and pre-determined) time window, nodes impacted by the failure occurrences may perform their correlation. In case of uni-directional failure, node B only receives the notification message from node C, and thus the time for this operation is negligible. In case of bi-directional failure, node B has to correlate the received notification message from node C with the corresponding locally detected information (and node C has to do the same with the message from node B).

- 接下来,在特定(和预先确定的)时间窗口内,受故障事件影响的节点可以执行它们的关联。在单向故障的情况下,节点B仅从节点C接收通知消息,因此此操作的时间可以忽略不计。在双向故障的情况下,节点B必须将从节点C接收到的通知消息与相应的本地检测到的信息相关联(并且节点C必须对来自节点B的消息执行相同的操作)。

- After some (pre-determined) period of time, referred to as the hold-off time, if the local recovery actions (see Section 5.3.4) were not successful, the following occurs. In case of uni-directional failure and depending on the directionality of the LSP, node B should send an upstream notification message (see [RFC3473]) to the ingress node A. Node C may send a downstream notification message (see [RFC3473]) to the egress node D. However, in that case, only node A would initiate an edge-to-edge recovery action. Node A is referred to as the "master", and node D is referred to as the "slave", per [RFC4427]. Note that the other LSP end-node (node D in this case) may optionally be notified using a downstream notification message (see [RFC3473]).

- 在一段(预先确定的)时间(称为延迟时间)后,如果本地恢复操作(见第5.3.4节)未成功,则会发生以下情况。在单向故障的情况下,并且取决于LSP的方向性,节点B应该向入口节点A发送上游通知消息(参见[RFC3473])。节点C可以向出口节点D发送下游通知消息(参见[RFC3473])。然而,在这种情况下,只有节点A将发起边到边恢复动作。根据[RFC4427],节点A被称为“主节点”,节点D被称为“从节点”。注意,可以选择使用下游通知消息(参见[RFC3473])通知另一LSP端节点(在本例中为节点D)。

In case of bi-directional failure, node B should send an upstream notification message (see [RFC3473]) to the ingress node A. Node C may send a downstream notification message (see [RFC3473]) to the egress node D. However, due to the dependence on the LSP directionality, only ingress node A would initiate an edge-to-edge recovery action. Note that the other LSP end-node (node D in this case) should also be notified of this event using a downstream notification message (see [RFC3473]). For instance, if an LSP directed from D to A is under failure condition, only the notification message sent from node C to D would initiate a recovery action. In this case, per [RFC4427], the deciding and recovering node D is referred to as the "master", while node A is referred to as the "slave" (i.e., recovering only entity).

在双向故障的情况下,节点B应向入口节点A发送上游通知消息(参见[RFC3473])。节点C可向出口节点D发送下游通知消息(参见[RFC3473])。然而,由于对LSP方向性的依赖,只有入口节点A将发起边到边恢复动作。注意,还应使用下游通知消息(参见[RFC3473])将此事件通知另一个LSP端节点(在本例中为节点D)。例如,如果从D定向到A的LSP处于故障状态,则只有从节点C发送到D的通知消息才会启动恢复操作。在这种情况下,根据[RFC4427],决定和恢复节点D被称为“主节点”,而节点A被称为“从节点”(即,仅恢复实体)。

Note: The determination of the master and the slave may be based either on configured information or dedicated protocol capability.

注:主设备和从设备的确定可能基于配置信息或专用协议能力。

In the above scenarios, the path followed by the upstream and downstream notification messages does not have to be the same as the one followed by the failed LSP (see [RFC3473] for more details on the notification message exchange). The important point concerning this mechanism is that either the detecting/reporting entity (i.e., nodes B and C) is also the deciding/recovery entity or the detecting/reporting entity is simply an intermediate node in the subsequent recovery process. One refers to local recovery in the former case, and to edge-to-edge recovery in the latter one (see also Section 5.3.4).

在上述场景中,上游和下游通知消息所遵循的路径不必与失败LSP所遵循的路径相同(有关通知消息交换的更多详细信息,请参阅[RFC3473])。关于该机制的重要一点是,检测/报告实体(即节点B和C)也是决策/恢复实体,或者检测/报告实体只是后续恢复过程中的中间节点。一种是指前一种情况下的局部恢复,另一种情况下的边到边恢复(另见第5.3.4节)。
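The message exchanges of Figures 1 and 2 can be summarized in a short sketch. It is illustrative only: the function names and the tuple layout are hypothetical, and the master is assumed to be the LSP ingress node, as in the A-to-D example above (the text notes that this determination may instead rely on configuration or a dedicated protocol capability).

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, kind)


def notification_plan(upstream: str, downstream: str, ingress: str,
                      egress: str, bidirectional: bool) -> List[Message]:
    """List the notifications sent for a failure between two adjacent nodes."""
    # t1: each detecting entity notifies the corresponding transmitting entity.
    plan = [(downstream, upstream, "Notification")]
    if bidirectional:
        plan.append((upstream, downstream, "Notification"))
    # t2: after the hold-off time, if local recovery was not successful,
    # the LSP end-nodes are notified.
    plan.append((upstream, ingress, "Up Notification"))
    plan.append((downstream, egress, "Down Notification"))
    return plan


def master_and_slave(ingress: str, egress: str) -> Tuple[str, str]:
    """Assumed rule for this sketch: the LSP ingress is the deciding entity."""
    return ingress, egress


# Figure 1: uni-directional failure on the B->C direction of an A-to-D LSP.
figure1 = notification_plan("B", "C", "A", "D", bidirectional=False)
# Figure 2: bi-directional failure between B and C.
figure2 = notification_plan("B", "C", "A", "D", bidirectional=True)
```

For an LSP directed from D to A, the same sketch applies with the ingress and egress arguments swapped, which makes node D the master.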

5.3.3. Partial vs. Full Span Recovery
5.3.3. 部分恢复与完整恢复

When a given span carries more than one LSP or LSP segment, an additional aspect must be considered. In case of span failure, the LSPs it carries can be recovered individually, as a group (aka bulk LSP recovery), or as independent sub-groups. When correlation time windows are used and simultaneous recovery of several LSPs can be performed using a single request, the selection of this mechanism would be triggered independently of the failure notification granularity. Moreover, criteria for forming such sub-groups are outside of the scope of this document.

当给定跨距承载多个LSP或LSP段时,必须考虑附加方面。在span故障的情况下,它携带的LSP可以单独恢复,作为一个组(也称为批量LSP恢复)或作为独立的子组。如果使用了相关时间窗口,并且可以使用单个请求同时恢复多个LSP,则此机制的选择将独立于故障通知粒度而触发。此外,形成此类子组的标准不在本文件范围内。

Additional complexity arises in the case of (sub-)group LSP recovery. Between a given pair of nodes, the LSPs that a given (sub-)group contains may have been created from different source nodes (i.e., initiator) and directed toward different destination nodes. Consequently the failure notification messages following a bi-directional span failure that affects several LSPs (or the whole group of LSPs it carries) are not necessarily directed toward the same initiator nodes. In particular, these messages may be directed to both the upstream and downstream nodes to the failure. Therefore, such span failure may trigger recovery actions to be performed from both sides (i.e., from both the upstream and the downstream nodes to the failure). In order to facilitate the definition of the corresponding recovery mechanisms (and their sequence), one assumes here as well that, per [RFC4427], the deciding (and recovering) entity (referred to as the "master") is the only initiator of the recovery of the whole LSP (sub-)group.

在(子)组LSP恢复的情况下会产生额外的复杂性。在给定的一对节点之间,给定(子)组包含的LSP可能是从不同的源节点(即发起方)创建的,并指向不同的目的节点。因此,在影响多个LSP(或其承载的整个LSP组)的双向span故障之后,故障通知消息不一定指向相同的发起方节点。特别地,这些消息可能同时发往故障的上游节点和下游节点。因此,这种span故障可能会触发从两侧(即,从故障的上游和下游节点)执行的恢复动作。为了便于定义相应的恢复机制(及其顺序),此处同样假设,根据[RFC4427],决定(和恢复)实体(称为“主节点”)是整个LSP(子)组恢复的唯一发起方。
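The consequence for (sub-)group recovery can be sketched as follows: after a span failure, the affected LSPs are partitioned by their deciding entity, so that each master initiates recovery only for its own sub-group. The function name and the (LSP identifier, master) pairs are hypothetical and serve only to illustrate the grouping; the criteria for forming sub-groups remain outside the scope of this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_master(failed_lsps: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Partition failed LSPs by the node that initiates their recovery."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for lsp_id, master in failed_lsps:
        groups[master].append(lsp_id)
    return dict(groups)


# Hypothetical span failure affecting three LSPs with two different masters.
affected = [("lsp-1", "A"), ("lsp-2", "D"), ("lsp-3", "A")]
sub_groups = group_by_master(affected)
```

Here node A would receive one notification covering lsp-1 and lsp-3, while node D would receive one covering lsp-2, so recovery is triggered from both sides of the failed span.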

5.3.4. Difference between LSP, LSP Segment and Span Recovery
5.3.4. LSP、LSP段和跨度恢复之间的差异

The recovery definitions given in [RFC4427] are quite generic and apply to link (or local span) and LSP recovery. The major difference between LSP, LSP Segment and span recovery is related to the number of intermediate nodes that the signaling messages have to traverse. Since nodes are not necessarily adjacent in the case of LSP (or LSP Segment) recovery, signaling message exchanges from the reporting to the deciding/recovery entity may have to cross several intermediate nodes. In particular, this applies to the notification messages due to the number of hops separating the location of a failure occurrence from its destination. This results in an additional propagation and forwarding delay. Note that the former delay may in certain circumstances be non-negligible; e.g., in a copper out-of-band network, the delay is approximately 1 ms per 200 km.

[RFC4427]中给出的恢复定义非常通用,适用于链路(或本地跨度)和LSP恢复。LSP、LSP段和跨度恢复之间的主要区别与信令消息必须经过的中间节点的数量有关。由于在LSP(或LSP段)恢复的情况下节点不一定相邻,因此从报告实体到决定/恢复实体的信令消息交换可能必须跨越多个中间节点。特别是,这适用于通知消息,因为故障发生的位置与其目的地之间相隔若干跳。这会导致额外的传播和转发延迟。注意,在某些情况下,前一种延迟可能是不可忽略的;例如,在铜缆带外网络中,延迟约为每200 km 1 ms。
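As a rough worked example of the delay mentioned above, the following sketch adds the propagation delay (using the 1 ms per 200 km figure from the text) to a forwarding delay charged once per hop. The per-hop forwarding value is a purely hypothetical parameter, since the document does not quantify it.

```python
from typing import Sequence

KM_PER_MS = 200.0  # from the text: roughly 1 ms of propagation per 200 km


def notification_delay_ms(hop_lengths_km: Sequence[float],
                          per_hop_forwarding_ms: float = 0.0) -> float:
    """Estimate the one-way delay of a notification crossing several hops."""
    propagation = sum(hop_lengths_km) / KM_PER_MS
    # Assumption for this sketch: a fixed forwarding cost per hop traversed.
    forwarding = per_hop_forwarding_ms * len(hop_lengths_km)
    return propagation + forwarding


# Two hops of 200 km and 400 km: 600 km of propagation in total.
delay = notification_delay_ms([200, 400], per_hop_forwarding_ms=0.5)
```

With these assumed values, propagation contributes 3 ms and forwarding 1 ms, which shows why the notification path length matters more for LSP recovery than for span recovery between adjacent nodes.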

Moreover, the recovery mechanisms applicable to end-to-end LSPs and to the segments that may compose an end-to-end LSP (i.e., edge-to-edge recovery) can be exactly the same. However, in the latter case, one expects that the destination of the failure notification message will be the ingress/egress of each of these segments. Therefore, using the mechanisms described in Section 5.3.2, failure notification messages can be exchanged first between terminating points of the LSP segment, and after expiration of the hold-off time, between terminating points of the end-to-end LSP.

此外,适用于端到端LSP和可组成端到端LSP的段(即,边到边恢复)的恢复机制可以完全相同。然而,在后一种情况下,人们期望故障通知消息的目的地将是这些段中的每个段的入口/出口。因此,使用第5.3.2节中描述的机制,故障通知消息可以首先在LSP段的终止点之间交换,并且在延迟时间到期后,在端到端LSP的终止点之间交换。

Note: Several studies provide quantitative analysis of the relative performance of LSP/span recovery techniques. [WANG], for instance, provides an analysis grid for these techniques showing that dynamic LSP restoration (see Section 5.5.2) performs well under medium network loads, but suffers performance degradations at higher loads due to greater contention for recovery resources. LSP restoration upon span failure, as defined in [WANG], degrades at higher loads because paths around failed links tend to increase the hop count of the affected LSPs and thus consume additional network resources. Also, performance of LSP restoration can be enhanced by a failed working LSP's source node that initiates a new recovery attempt if an initial attempt fails. A single retry attempt is sufficient to produce large increases in the restoration success rate and ability to initiate successful LSP restoration attempts, especially at high loads, while not adding significantly to the long-term average recovery time. Allowing additional attempts produces only small additional gains in performance. This suggests using additional (intermediate) crankback signaling when using dynamic LSP restoration (described in Section 5.5.2 - case 2). Details on crankback signaling are outside the scope of this document.

注:一些研究提供了LSP/span恢复技术相对性能的定量分析。例如,[WANG]为这些技术提供了一个分析网格,表明动态LSP恢复(见第5.5.2节)在中等网络负载下表现良好,但在更高负载下由于对恢复资源的竞争加剧而导致性能下降。如[WANG]中所定义的,span故障时的LSP恢复会在更高的负载下性能下降,因为绕过故障链路的路径往往会增加受影响LSP的跳数,从而消耗额外的网络资源。此外,如果初始尝试失败,由故障工作LSP的源节点发起新的恢复尝试,可以提高LSP恢复的性能。一次重试尝试就足以大大提高恢复成功率和成功发起LSP恢复尝试的能力,特别是在高负载下,同时不会显著增加长期平均恢复时间。允许额外的尝试只会在性能上产生很小的额外收益。这建议在使用动态LSP恢复(如第5.5.2节-情况2所述)时使用附加(中间)回退(crankback)信令。有关回退信令的详细信息不在本文件范围内。

5.4. Difference between Recovery Type and Scheme
5.4. 恢复类型和方案之间的差异

[RFC4427] defines the basic LSP/span recovery types. This section describes the recovery schemes that can be built using these recovery types. In brief, a recovery scheme is defined as the combination of several ingress-egress node pairs supporting a given recovery type (from the set of the recovery types they allow). Several examples are provided here to illustrate the difference between recovery types such as 1:1 or M:N, and recovery schemes such as (1:1)^n or (M:N)^n (referred to as shared-mesh recovery).

[RFC4427]定义了基本的LSP/span恢复类型。本节介绍可以使用这些恢复类型构建的恢复方案。简言之,恢复方案被定义为支持给定恢复类型(来自它们允许的恢复类型集合)的多个入口-出口节点对的组合。这里提供了几个示例来说明恢复类型(如1:1或M:N)与恢复方案(如(1:1)^n或(M:N)^n,称为共享网格恢复)之间的区别。

1. (1:1)^n with recovery resource sharing

1. (1:1)^n与恢复资源共享

The exponent, n, indicates the number of times a 1:1 recovery type is applied between at most n different ingress-egress node pairs. Here, at most n pairs of disjoint working and recovery LSPs/spans share a common resource at most n times. Since the working LSPs/spans are mutually disjoint, simultaneous requests for use of the shared (common) resource will only occur in case of simultaneous failures, which are less likely to happen.

指数n表示在最多n个不同的入口-出口节点对之间应用1:1恢复类型的次数。在这里,最多n对不相交的工作和恢复LSP/跨度共享一个公共资源,最多n次。由于工作LSP/SPAN是相互不相交的,因此只有在同时发生故障的情况下才会同时请求使用共享(公共)资源,这是不太可能发生的。

For instance, in the common (1:1)^2 case, if the 2 recovery LSPs in the group overlap the same common resource, then it can handle only single failures; any multiple working LSP failures will cause at least one working LSP to be denied automatic recovery. Consider for instance the following topology with the working LSPs A-B-C and F-G-H and their respective recovery LSPs A-D-E-C and F-D-E-H that share a common D-E link resource.

例如,在常见的(1:1)^2情况下,如果组中的两个恢复LSP重叠使用同一公共资源,则它只能处理单个故障;任何多个工作LSP故障都将导致至少一个工作LSP被拒绝自动恢复。例如,考虑以下拓扑:工作LSP A-B-C和F-G-H及其各自的恢复LSP A-D-E-C和F-D-E-H共享公共的D-E链路资源。

                          A---------B---------C
                           \                 /
                            \               /
                             D-------------E
                            /               \
                           /                 \
                          F---------G---------H
        

2. (M:N)^n with recovery resource sharing

2. (M:N)^n与恢复资源共享

The (M:N)^n scheme is documented here for the sake of completeness only (i.e., it is not mandated that GMPLS capabilities support this scheme). The exponent, n, indicates the number of times an M:N recovery type is applied between at most n different ingress-egress node pairs. So the interpretation follows from the previous case, except that here disjointness applies to the N working LSPs/spans and to the M recovery LSPs/spans while sharing at most n times M common resources.

此处记录(M:N)^n方案仅是为了完整性(即并不强制要求GMPLS功能支持此方案)。指数n表示在最多n个不同的入口-出口节点对之间应用M:N恢复类型的次数。因此,其解释与前一种情况相同,只是此处不相交性适用于N个工作LSP/跨度和M个恢复LSP/跨度,同时共享最多n倍的M个公共资源。

In both schemes, it results in a "group" of sum{n=1}^N N{n} working LSPs and a pool of shared recovery resources, not all of which are available to any given working LSP. In such conditions, defining a metric that describes the amount of overlap among the recovery LSPs would give some indication of the group's ability to handle simultaneous failures of multiple LSPs.

在这两种方案中,结果都是一个由sum{n=1}^N N{n}个工作LSP组成的“组”和一个共享恢复资源池,其中并非所有资源都可供任何给定的工作LSP使用。在这种情况下,定义一个描述恢复LSP之间重叠量的指标,将在一定程度上表明该组处理多个LSP同时故障的能力。

For instance, in the simple (1:1)^n case, if n recovery LSPs in a (1:1)^n group overlap, then the group can handle only single failures; any simultaneous failure of multiple working LSPs will cause at least one working LSP to be denied automatic recovery. But if one considers, for instance, a (2:2)^2 group in which there are two pairs of overlapping recovery LSPs, then two LSPs (belonging to the same pair) can be simultaneously recovered. The latter case can be illustrated by the following topology with 2 pairs of working LSPs A-B-C and F-G-H and their respective recovery LSPs A-D-E-C and F-D-E-H that share two common D-E link resources.

例如,在简单的(1:1)^n情况下,如果(1:1)^n组中的n个恢复LSP重叠,则该组只能处理单个故障;多个工作LSP的任何同时故障都将导致至少一个工作LSP被拒绝自动恢复。但是,如果考虑(2:2)^2组,其中有两对重叠的恢复LSP,则可以同时恢复两个LSP(属于同一对)。后一种情况可以通过以下拓扑来说明,其中2对工作lsp A-B-C和F-G-H及其各自的恢复lsp A-D-E-C和F-D-E-H共享两个公共D-E链路资源。

                           A========B========C
                           \\               //
                            \\             //
                             D =========== E
                            //             \\
                           //               \\
                           F========G========H
        

Moreover, in all these schemes, (working) path disjointness can be enforced by exchanging information related to working LSPs during the recovery LSP signaling. Specific issues related to the combination of shared (discrete) bandwidth and disjointness for recovery schemes are described in Section 8.4.2.

此外,在所有这些方案中,可以通过在恢复LSP信令期间交换与工作LSP相关的信息来强制(工作)路径不相交。第8.4.2节描述了与恢复方案的共享(离散)带宽和不连续性组合相关的具体问题。
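The effect of recovery resource sharing in the two topologies of this section can be illustrated with a small sketch. It is a simplified model under stated assumptions: the function names are hypothetical, only the links listed in shared_capacity are treated as shared (all other links are assumed dedicated and always available), and recovery requests are served in the order given.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def links_of(path: str) -> List[Link]:
    """Split a path such as 'A-D-E-C' into its undirected links."""
    nodes = path.split("-")
    return [tuple(sorted(pair)) for pair in zip(nodes, nodes[1:])]


def recover(failed: List[str], recovery_paths: Dict[str, str],
            shared_capacity: Dict[Link, int]) -> Tuple[List[str], List[str]]:
    """Return (recovered, denied) working LSPs for simultaneous failures."""
    remaining = dict(shared_capacity)
    recovered, denied = [], []
    for lsp in failed:
        needed = [l for l in links_of(recovery_paths[lsp]) if l in remaining]
        if all(remaining[l] > 0 for l in needed):
            for l in needed:
                remaining[l] -= 1  # consume one unit of the shared resource
            recovered.append(lsp)
        else:
            denied.append(lsp)
    return recovered, denied


# Working LSPs A-B-C and F-G-H with recovery LSPs sharing the D-E link.
paths = {"A-B-C": "A-D-E-C", "F-G-H": "F-D-E-H"}
single_shared = recover(["A-B-C", "F-G-H"], paths, {("D", "E"): 1})
double_shared = recover(["A-B-C", "F-G-H"], paths, {("D", "E"): 2})
```

With one shared D-E resource, as in the (1:1)^2 example, the second simultaneous failure is denied automatic recovery; with two D-E resources, as in the second topology, two failed LSPs can be recovered simultaneously.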

5.5. LSP Recovery Mechanisms
5.5. LSP恢复机制
5.5.1. Classification
5.5.1. 分类

The recovery time and ratio of LSPs/spans depend on proper recovery LSP provisioning (meaning pre-provisioning when performed before failure occurrence) and the level of overbooking of recovery resources (i.e., over-provisioning). A proper balance of these two operations will result in the desired LSP/span recovery time and ratio when single or multiple failures occur. Note also that these operations are mostly performed during the network planning phases.

LSP/跨度的恢复时间和比率取决于适当的恢复LSP资源调配(即在发生故障之前执行预资源调配)和恢复资源的超订级别(即超配)。当发生单个或多个故障时,这两个操作的适当平衡将产生所需的LSP/span恢复时间和比率。还请注意,这些操作大多在网络规划阶段执行。

The different options for LSP (pre-)provisioning and overbooking are classified below to structure the analysis of the different recovery mechanisms.

下面对LSP(预)供应和超售的不同选项进行了分类,以构建对不同恢复机制的分析。

1. Pre-Provisioning

1. 预调配

Proper recovery LSP pre-provisioning will help to alleviate the failure of the working LSPs (due to the failure of the resources that carry these LSPs). As an example, one may compute and establish the recovery LSP either end-to-end or segment-per-segment, to protect a working LSP from multiple failure events affecting link(s), node(s) and/or SRLG(s). The recovery LSP pre-provisioning options are classified as follows in the figure below:

适当的恢复LSP预配置将有助于缓解工作LSP的故障(由于承载这些LSP的资源出现故障)。例如,可以计算并建立端到端或每段的恢复LSP,以保护工作LSP免受影响链路、节点和/或SRLG的多个故障事件的影响。恢复LSP预配置选项分类如下图所示:

(1) The recovery path can be either pre-computed or computed on-demand.

(1) 恢复路径可以是预先计算的,也可以是按需计算的。

(2) When the recovery path is pre-computed, it can be either pre-signaled (implying recovery resource reservation) or signaled on-demand.

(2) 当预先计算恢复路径时,可以预先发出信号(意味着恢复资源保留)或按需发出信号。

(3) When the recovery resources are pre-signaled, they can be either pre-selected or selected on-demand.

(3) 当恢复资源预先发出信号时,可以预先选择或按需选择。

Recovery LSP provisioning phases:

恢复LSP资源调配阶段:

   (1) Path Computation --> On-demand
           |
           |
            --> Pre-Computed
                    |
                    |
                   (2) Signaling --> On-demand
                           |
                           |
                            --> Pre-Signaled
                                    |
                                    |
                                   (3) Resource Selection --> On-demand
                                                |
                                                |
                                                 --> Pre-Selected
        

Note that these different options lead to different LSP/span recovery times. The following sections will consider the above-mentioned pre-provisioning options when analyzing the different recovery mechanisms.

请注意,这些不同的选项会导致不同的LSP/span恢复时间。以下部分将在分析不同的恢复机制时考虑上述预配置选项。
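The three nested choices above can be enumerated mechanically. The following is a minimal sketch, not part of the original analysis; the class and field names are illustrative assumptions. It encodes the constraints that pre-signaling requires a pre-computed path and that pre-selection requires pre-signaled resources, and it lists which phases remain to be performed after a failure.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PreProvisioning:
    """One branch of the provisioning tree (names are illustrative only)."""
    path_precomputed: bool
    pre_signaled: bool
    resources_preselected: bool

    def is_valid(self):
        # (2) Pre-signaling is only possible along a pre-computed path.
        if self.pre_signaled and not self.path_precomputed:
            return False
        # (3) Pre-selection is only possible for pre-signaled resources.
        if self.resources_preselected and not self.pre_signaled:
            return False
        return True

    def on_demand_phases(self):
        """Phases that are left to perform once the failure has occurred."""
        phases = []
        if not self.path_precomputed:
            phases.append("path computation")
        if not self.pre_signaled:
            phases.append("signaling")
        if not self.resources_preselected:
            phases.append("resource selection")
        return phases


# Enumerate all flag combinations and keep those the tree allows.
VALID_OPTIONS = [
    option
    for option in (PreProvisioning(*flags)
                   for flags in product([False, True], repeat=3))
    if option.is_valid()
]

for option in VALID_OPTIONS:
    print(option, "->", option.on_demand_phases() or ["none"])
```

Only four of the eight flag combinations survive the two constraints, matching the four leaves of the figure. The fewer phases left on-demand, the shorter the expected recovery time.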

2. Overbooking

2. 超售

There are many mechanisms available that allow the overbooking of the recovery resources. This overbooking can be done per LSP (as in the example mentioned above), per link (such as span protection), or even per domain. In all these cases, the level of overbooking, as shown in the below figure, can be classified as dedicated (such as 1+1 and 1:1), shared (such as 1:N and M:N), or unprotected (and thus restorable, if enough recovery resources are available).

有许多机制允许超额预订恢复资源。这种超售可以根据LSP(如上面提到的示例)、每个链路(如span保护)甚至每个域进行。在所有这些情况下,超售级别(如下图所示)可分为专用(如1+1和1:1)、共享(如1:N和M:N)或未受保护(因此,如果有足够的恢复资源可用,则可恢复)。

Overbooking levels:

超售水平:

                    +----- Dedicated (for instance: 1+1, 1:1, etc.)
                    |
                    |
                    +----- Shared (for instance: 1:N, M:N, etc.)
                    |
   Level of         |
   Overbooking -----+----- Unprotected (for instance: 0:1, 0:N)

Also, when using shared recovery, one may support preemptible extra-traffic; the recovery mechanism is then expected to allow preemption of this low priority traffic in case of recovery resource contention during recovery operations. The following sections will consider the above-mentioned overbooking options when analyzing the different recovery mechanisms.

此外,当使用共享恢复时,可以支持可抢占的额外流量;然后,在恢复操作期间发生恢复资源争用的情况下,恢复机制将允许抢占此低优先级流量。以下各节在分析不同的恢复机制时将考虑上述超售选项。

5.5.2. LSP Restoration
5.5.2. LSP恢复

The following times are defined to provide a quantitative estimation about the time performance of the different LSP restoration mechanisms (also referred to as LSP re-routing):

定义以下时间是为了定量估计不同LSP恢复机制(也称为LSP重路由)的时间性能:

- Path Computation Time: Tc
- Path Selection Time: Ts
- End-to-end LSP Resource Reservation Time: Tr (a delta for resource selection is also considered, the corresponding total time is then referred to as Trs)
- End-to-end LSP Resource Activation Time: Ta (a delta for resource selection is also considered, the corresponding total time is then referred to as Tas)

- 路径计算时间:Tc
- 路径选择时间:Ts
- 端到端LSP资源保留时间:Tr(也考虑资源选择的增量,相应的总时间称为Trs)
- 端到端LSP资源激活时间:Ta(也考虑资源选择的增量,相应的总时间称为Tas)

The Path Selection Time (Ts) is considered when a pool of recovery LSP paths between a given pair of source/destination end-points is pre-computed, and after a failure occurrence one of these paths is selected for the recovery of the LSP under failure condition.

当预先计算给定源/目标端点对之间的恢复LSP路径池时,考虑路径选择时间(Ts),并且在发生故障后,选择其中一条路径用于在故障条件下恢复LSP。

Note: failure management operations such as failure detection, correlation, and notification are considered (for a given failure event) as equally time-consuming for all the mechanisms described below:

注:故障管理操作(如故障检测、关联和通知)(对于给定故障事件)被视为对以下所述所有机制同样耗时:

1. With Route Pre-computation (or LSP re-provisioning)

1. 具有路由预计算(或LSP重新配置)

An end-to-end restoration LSP is established after the failure(s) occur(s) based on a pre-computed path. As such, one can define this as an "LSP re-provisioning" mechanism. Here, one or more (disjoint) paths for the restoration LSP are computed (and optionally pre-selected) before a failure occurs.

在故障发生后,基于预先计算的路径建立端到端恢复LSP。因此,可以将其定义为“LSP重新配置”机制。这里,在发生故障之前,计算(并且可选地预先选择)恢复LSP的一个或多个(不相交)路径。

No reservation or selection of resources is performed along the restoration path before failure occurrence. As a result, there is no guarantee that a restoration LSP is available when a failure occurs.

在发生故障之前,不会沿恢复路径执行资源保留或选择。因此,无法保证发生故障时恢复LSP可用。

The expected total restoration time T is thus equal to Ts + Trs, or to Trs alone when a dedicated computation is performed for each working LSP (in which case no path selection is needed).

因此,当对每个工作LSP执行专用计算时,预期总恢复时间T等于Ts+Trs或Trs。

2. Without Route Pre-computation (or Full LSP re-routing)

2. 无路由预计算(或完全LSP重新路由)

An end-to-end restoration LSP is dynamically established after the failure(s) occur(s). After failure occurrence, one or more (disjoint) paths for the restoration LSP are dynamically computed and one is selected. As such, one can define this as a complete "LSP re-routing" mechanism.

在发生故障后动态建立端到端恢复LSP。故障发生后,动态计算恢复LSP的一条或多条(不相交)路径,并从中选择一条。因此,可以将其定义为完整的“LSP重新路由”机制。

No reservation or selection of resources is performed along the restoration path before failure occurrence. As a result, there is no guarantee that a restoration LSP is available when a failure occurs.

在发生故障之前,不会沿恢复路径执行资源保留或选择。因此,无法保证发生故障时恢复LSP可用。

The expected total restoration time T is thus equal to Tc (+ Ts) + Trs. Therefore, time performance between these two approaches differs by the time required for route computation Tc (and its potential selection time, Ts).

因此,预期的总恢复时间T等于Tc(+Ts)+Trs。因此,这两种方法之间的时间性能因路由计算Tc(及其潜在选择时间Ts)所需的时间而不同。
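The two expressions for T in this section can be compared with a few lines of arithmetic. This is only a sketch: the function names and the millisecond values are hypothetical and are not taken from this document or from any measurement.

```python
def t_with_route_precomputation(trs_ms, ts_ms=0):
    """T = Ts + Trs. Use ts_ms=0 when a single path was computed per working LSP."""
    return ts_ms + trs_ms


def t_full_rerouting(tc_ms, trs_ms, ts_ms=0):
    """T = Tc (+ Ts) + Trs. Ts applies only when several candidate paths are computed."""
    return tc_ms + ts_ms + trs_ms


# Hypothetical timings in milliseconds, chosen only to make the arithmetic visible.
tc, ts, trs = 40, 5, 120

precomputed = t_with_route_precomputation(trs, ts)
full = t_full_rerouting(tc, trs, ts)

print("with route pre-computation:", precomputed, "ms")
print("full LSP re-routing:       ", full, "ms")
print("difference (equals Tc):    ", full - precomputed, "ms")
```

With the same Ts in both cases the difference is exactly Tc. If the pre-computed case uses a single dedicated path (Ts = 0) while full re-routing selects among several computed paths, the difference grows to Tc + Ts.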

5.5.3. Pre-Planned LSP Restoration
5.5.3. 预先计划的LSP恢复

Pre-planned LSP restoration (also referred to as pre-planned LSP re-routing) implies that the restoration LSP is pre-signaled. This in turn implies the reservation of recovery resources along the restoration path. Two cases can be defined based on whether the recovery resources are pre-selected.

预先计划的LSP恢复(也称为预先计划的LSP重路由)意味着恢复LSP是预先发信号的。这反过来意味着沿恢复路径保留恢复资源。可以根据是否预先选择了恢复资源来定义两种情况。

1. With resource reservation and without resource pre-selection

1. 有资源预留,无资源预选

Before failure occurrence, an end-to-end restoration path is pre-selected from a set of pre-computed (disjoint) paths. The restoration LSP is signaled along this pre-selected path to reserve resources at each node, but these resources are not selected.

在发生故障之前,从一组预先计算(不相交)的路径中预先选择端到端恢复路径。恢复LSP沿着此预先选择的路径发送信号,以在每个节点保留资源,但这些资源未被选择。

In this case, the resources reserved for each restoration LSP may be dedicated or shared between multiple restoration LSPs whose working LSPs are not expected to fail simultaneously. Local node policies can be applied to define the degree to which these resources can be shared across independent failures. Also, since a restoration scheme is considered, resource sharing should not be limited to restoration LSPs that start and end at the same ingress and egress nodes. Therefore, each node participating in this scheme is expected to receive some feedback information on the sharing degree of the recovery resource(s) that this scheme involves.

在这种情况下,为每个恢复LSP保留的资源可以在其工作LSP预计不会同时失败的多个恢复LSP之间专用或共享。可以应用本地节点策略来定义这些资源在独立故障之间共享的程度。此外,由于考虑了恢复方案,因此资源共享不应限于在相同的入口和出口节点处开始和结束的恢复lsp。因此,期望参与该方案的每个节点接收关于该方案涉及的恢复资源的共享程度的一些反馈信息。

Upon failure detection/notification message reception, signaling is initiated along the restoration path to select the resources, and to perform the appropriate operation at each node crossed by the restoration LSP (e.g., cross-connections). If lower priority LSPs were established using the restoration resources, they must be preempted when the restoration LSP is activated.

在接收到故障检测/通知消息时,沿着恢复路径启动信令以选择资源,并在恢复LSP交叉的每个节点上执行适当的操作(例如交叉连接)。如果使用恢复资源建立了低优先级LSP,则必须在激活恢复LSP时抢占这些LSP。

Thus, the expected total restoration time T is equal to Tas (post-failure activation), while operations performed before failure occurrence take Tc + Ts + Tr.

因此,预期总恢复时间T等于Tas(故障后激活),而故障发生前执行的操作采用Tc+Ts+Tr。

2. With both resource reservation and resource pre-selection

2. 具有资源预留和资源预选功能

Before failure occurrence, an end-to-end restoration path is pre-selected from a set of pre-computed (disjoint) paths. The restoration LSP is signaled along this pre-selected path to reserve AND select resources at each node, but these resources are not committed at the data plane level. The selection of the recovery resources is thus committed at the control plane level only; no cross-connections are performed along the restoration path.

在发生故障之前,从一组预先计算(不相交)的路径中预先选择端到端恢复路径。恢复LSP沿着此预先选择的路径发送信号,以在每个节点保留和选择资源,但这些资源不会在数据平面级别提交。因此,恢复资源的选择仅在控制平面级别提交,不会沿恢复路径执行交叉连接。

In this case, the resources reserved and selected for each restoration LSP may be dedicated or even shared between multiple restoration LSPs whose associated working LSPs are not expected to fail simultaneously. Local node policies can be applied to define the degree to which these resources can be shared across independent failures. Also, because a restoration scheme is considered, resource sharing should not be limited to restoration LSPs that start and end at the same ingress and egress nodes. Therefore, each node participating in this scheme is expected to receive some feedback information on the sharing degree of the recovery resource(s) that this scheme involves.

在这种情况下,为每个恢复LSP保留和选择的资源可以专用,甚至可以在多个恢复LSP之间共享,这些恢复LSP的相关工作LSP预计不会同时失败。可以应用本地节点策略来定义这些资源在独立故障之间共享的程度。此外,由于考虑了恢复方案,因此资源共享不应限于在相同的入口和出口节点处开始和结束的恢复lsp。因此,期望参与该方案的每个节点接收关于该方案涉及的恢复资源的共享程度的一些反馈信息。

Upon failure detection/notification message reception, signaling is initiated along the restoration path to activate the reserved and selected resources, and to perform the appropriate operation at each node crossed by the restoration LSP (e.g., cross-connections). If lower priority LSPs were established using the restoration resources, they must be preempted when the restoration LSP is activated.

在接收到故障检测/通知消息时,沿着恢复路径启动信令,以激活保留的和选定的资源,并在恢复LSP交叉的每个节点上执行适当的操作(例如交叉连接)。如果使用恢复资源建立了低优先级LSP,则必须在激活恢复LSP时抢占这些LSP。

Thus, the expected total restoration time T is equal to Ta (post-failure activation), while operations performed before failure occurrence take Tc + Ts + Trs. Therefore, time performance between these two approaches differs only by the time required for resource selection during the activation of the recovery LSP (i.e., Tas - Ta).

因此,预期总恢复时间T等于Ta(故障后激活),而故障发生前执行的操作采用Tc+Ts+Trs。因此,这两种方法之间的时间性能仅因恢复LSP(即Tas-Ta)激活期间选择资源所需的时间不同而不同。
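The split between work done before and after the failure can be made explicit for the two pre-planned cases. The sketch below is illustrative only: all names and values are hypothetical, and it assumes that the same resource selection delta applies to reservation (Trs = Tr + delta) and to activation (Tas = Ta + delta).

```python
from collections import namedtuple

Timing = namedtuple("Timing", ["before_failure_ms", "after_failure_ms"])


def preplanned_without_preselection(tc, ts, tr, tas):
    """Resources are reserved only; selection is deferred to activation."""
    return Timing(before_failure_ms=tc + ts + tr, after_failure_ms=tas)


def preplanned_with_preselection(tc, ts, trs, ta):
    """Resources are reserved and selected; only activation remains."""
    return Timing(before_failure_ms=tc + ts + trs, after_failure_ms=ta)


# Hypothetical timings in milliseconds.
tc, ts, tr, ta, delta = 40, 5, 100, 30, 20
trs, tas = tr + delta, ta + delta  # assumption: same selection delta in both

case1 = preplanned_without_preselection(tc, ts, tr, tas)
case2 = preplanned_with_preselection(tc, ts, trs, ta)

print("without pre-selection:", case1)
print("with pre-selection:   ", case2)
print("post-failure gain (Tas - Ta):", case1.after_failure_ms - case2.after_failure_ms, "ms")
```

Under this assumption, pre-selection moves the selection delta from the post-failure phase to the pre-failure phase, so the total work is unchanged and only the restoration time T shrinks.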

5.5.4. LSP Segment Restoration
5.5.4. LSP段恢复

The above approaches can be applied on an edge-to-edge LSP basis rather than end-to-end LSP basis (i.e., to reduce the global recovery time) by allowing the recovery of the individual LSP segments constituting the end-to-end LSP.

通过允许恢复构成端到端LSP的各个LSP段,可以在边到边LSP的基础上而不是在端到端LSP的基础上应用上述方法(即,减少全局恢复时间)。

Also, by using the horizontal hierarchy approach described in Section 7.1, an end-to-end LSP can be recovered by multiple recovery mechanisms applied on an LSP segment basis (e.g., 1:1 edge-to-edge LSP protection in a metro network, and M:N edge-to-edge protection in the core). These mechanisms are ideally independent and may even use different failure localization and notification mechanisms.

此外,通过使用第7.1节中描述的水平层次方法,可以通过基于LSP段应用的多个恢复机制来恢复端到端LSP(例如,城域网络中的1:1边到边LSP保护,以及核心中的M:N边到边保护)。这些机制在理想情况下是独立的,甚至可以使用不同的故障定位和通知机制。

6. Reversion
6. 回归

Reversion (a.k.a. normalization) is defined as the mechanism allowing switching of normal traffic from the recovery LSP/span to the working LSP/span previously under failure condition. Use of normalization is at the discretion of the recovery domain policy. Normalization may impact the normal traffic (a second hit) depending on the normalization mechanism used.

恢复(又称标准化)是指允许在故障状态下将正常通信量从恢复LSP/span切换到工作LSP/span的机制。正常化的使用由恢复域策略决定。根据使用的规范化机制,规范化可能会影响正常通信量(第二次命中)。

If normalization is supported, then 1) the LSP/span must be returned to the working LSP/span when the failure condition clears and 2) the capability to de-activate (turn-off) the use of reversion should be provided. De-activation of reversion should not impact the normal traffic, regardless of whether it is currently using the working or recovery LSP/span.

如果支持标准化,则1)当故障条件清除时,必须将LSP/span返回到工作LSP/span,2)应提供取消激活(关闭)恢复使用的功能。恢复的取消激活不应影响正常通信量,无论其当前使用的是工作LSP/span还是恢复LSP/span。

Note: during the failure, the reuse of any non-failed resources (e.g., LSP and/or spans) belonging to the working LSP/span is under the discretion of recovery domain policy.

注意:在故障期间,属于工作LSP/span的任何非故障资源(如LSP和/或span)的重用由恢复域策略决定。

6.1. Wait-To-Restore (WTR)
6.1. 等待恢复(WTR)

A specific mechanism (Wait-To-Restore) is used to prevent frequent recovery switching operations due to an intermittent defect (e.g., Bit Error Rate (BER) fluctuating around the SD threshold).

特定机制(等待恢复)用于防止由于间歇性缺陷(例如,误码率(BER)在SD阈值附近波动)而导致频繁的恢复切换操作。

First, an LSP/span under failure condition must become fault-free, e.g., a BER less than a certain recovery threshold. After the recovered LSP/span (i.e., the previously working LSP/span) meets this criterion, a fixed period of time shall elapse before normal traffic uses the corresponding resources again. This duration, called the Wait-To-Restore (WTR) period or timer, is generally on the order of a few minutes (for instance, 5 minutes) and should be configurable. The WTR timer may be either a fixed period, or provide for incrementally longer periods before retrying. An SF or SD condition on the previously working LSP/span will override the WTR timer value (i.e., the WTR cancels and the WTR timer will restart).

首先,故障条件下的LSP/span必须无故障,例如,BER小于某个恢复阈值。在恢复的LSP/span(即先前工作的LSP/span)满足该标准后,在正常业务再次使用相应的资源之前,应经过一段固定的时间。这种称为等待恢复(WTR)周期或计时器的持续时间通常为几分钟(例如,5分钟),应该能够设置。WTR定时器可以是固定的周期,或者在重试之前提供增量更长的周期。先前工作的LSP/span上的SF或SD条件将覆盖WTR定时器值(即,WTR取消,WTR定时器将重新启动)。
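The WTR behavior described above can be sketched as a small timer object. This is a hypothetical illustration, not a protocol definition: the class and method names are assumptions, only the fixed-period variant is modeled, and time is passed in explicitly (in seconds) so that the logic stays deterministic.

```python
class WaitToRestore:
    """Fixed-period WTR timer for one previously working LSP/span (sketch)."""

    def __init__(self, wtr_seconds=300):
        self.wtr_seconds = wtr_seconds  # configurable, e.g. 5 minutes
        self._fault_free_since = None   # None while the LSP/span is in SF/SD

    def on_fault_cleared(self, now):
        """The working LSP/span became fault-free: start the timer."""
        if self._fault_free_since is None:
            self._fault_free_since = now

    def on_signal_fail_or_degrade(self):
        """A new SF/SD cancels the WTR; it restarts on the next clear."""
        self._fault_free_since = None

    def may_revert(self, now):
        """True once the LSP/span has been fault-free for the whole period."""
        if self._fault_free_since is None:
            return False
        return now - self._fault_free_since >= self.wtr_seconds


wtr = WaitToRestore(wtr_seconds=300)
wtr.on_fault_cleared(now=0)
print(wtr.may_revert(now=299))   # still waiting
wtr.on_signal_fail_or_degrade()  # intermittent defect reappears
wtr.on_fault_cleared(now=400)    # timer restarts from 400
print(wtr.may_revert(now=600))   # only 200 s fault-free
print(wtr.may_revert(now=700))   # full 300 s elapsed
```

An intermittent defect that keeps reappearing within the WTR period therefore never triggers a switch back, which is the purpose of the mechanism.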

6.2. Revertive Mode Operation
6.2. 回复模式操作

In revertive mode of operation, when the recovery LSP/span is no longer required, i.e., the failed working LSP/span is no longer in SD or SF condition, a local Wait-to-Restore (WTR) state will be activated before switching the normal traffic back to the recovered working LSP/span.

在恢复操作模式下,当不再需要恢复LSP/span时,即故障工作LSP/span不再处于SD或SF状态,在将正常通信量切换回恢复工作LSP/span之前,将激活本地等待恢复(WTR)状态。

During the reversion operation, since this state becomes the highest in priority, signaling must maintain the normal traffic on the recovery LSP/span from the previously failed working LSP/span. Moreover, during this WTR state, any null traffic or extra traffic (if applicable) request is rejected.

在恢复操作期间,由于该状态成为最高优先级,信令必须将来自先前故障的工作LSP/span的正常通信量维持在恢复LSP/span上。此外,在此WTR状态期间,拒绝任何空流量或额外流量(如果适用)请求。

However, deactivation (cancellation) of the wait-to-restore timer may occur if there are higher priority request attempts. That is, the recovery LSP/span usage by the normal traffic may be preempted if a higher priority request for this recovery LSP/span is attempted.

但是,如果有更高优先级的请求尝试,则可能会停用(取消)等待恢复计时器。也就是说,如果尝试对此恢复LSP/span的更高优先级请求,则正常通信量的恢复LSP/span使用可能会被抢占。

6.3. Orphans
6.3. 孤儿

When a reversion operation is requested, normal traffic must be switched from the recovery to the recovered working LSP/span. A particular situation occurs when the previously working LSP/span cannot be recovered, so normal traffic cannot be switched back. In that case, the LSP/span under failure condition (also referred to as "orphan") must be cleared (i.e., removed) from the pool of resources allocated for normal traffic. Otherwise, potential de-synchronization between the control and transport plane resource usage can appear. Depending on the signaling protocol capabilities and behavior, different mechanisms are expected here.

当请求恢复操作时,必须将正常通信量从恢复切换到恢复的工作LSP/span。当先前工作的LSP/span无法恢复时,会出现一种特殊情况,因此无法切换回正常通信量。在这种情况下,故障条件下的LSP/span(也称为“孤立”)必须从为正常流量分配的资源池中清除(即删除)。否则,可能会出现控制平面和传输平面资源使用之间的潜在不同步。根据信令协议的功能和行为,这里需要不同的机制。

Therefore, any reserved or allocated resources for the LSP/span under failure condition must be unreserved/de-allocated. Several ways can be used for that purpose: wait for the clear-out time interval to elapse, initiate a deletion from the ingress or the egress node, or trigger the initiation of deletion from an entity (such as an EMS or NMS) capable of reacting upon reception of an appropriate notification message.

因此,在故障条件下,LSP/span的任何保留或分配的资源都必须取消保留/取消分配。有几种方法可用于此目的:等待清除时间间隔过去,从入口或出口节点发起删除,或触发从能够在接收到适当通知消息时作出反应的实体(如EMS或NMS)发起删除。

7. Hierarchies
7. 等级制度

Recovery mechanisms are being made available at multiple (if not all) transport layers within so-called "IP/MPLS-over-optical" networks. However, each layer has certain recovery features, and one needs to determine the exact impact of the interaction between the recovery mechanisms provided by these layers.

在所谓的“光上IP/MPLS”网络中的多个(如果不是全部的话)传输层上提供了恢复机制。但是,每一层都有特定的恢复功能,需要确定这些层提供的恢复机制之间的交互的确切影响。

Hierarchies are used to build scalable complex systems. By hiding the internal details, abstraction is used as a mechanism to build large networks or as a technique for enforcing technology, topological, or administrative boundaries. The same hierarchical concept can be applied to control the network survivability. Network survivability is the set of capabilities that allow a network to restore affected traffic in the event of a failure. Network survivability is defined further in [RFC4427]. In general, it is expected that the recovery action is taken by the recoverable LSP/span closest to the failure in order to avoid the multiplication of recovery actions. Moreover, recovery hierarchies also can be bound to control plane logical partitions (e.g., administrative or topological boundaries). Each logical partition may apply different recovery mechanisms.

层次结构用于构建可伸缩的复杂系统。通过隐藏内部细节,抽象被用作构建大型网络的一种机制,或者作为强制执行技术、拓扑或管理边界的一种技术。同样的分层概念也可用于控制网络的生存性。网络生存能力是一组功能,允许网络在发生故障时恢复受影响的通信量。[RFC4427]中进一步定义了网络生存性。通常,预计由最接近故障的可恢复LSP/span采取恢复操作,以避免恢复操作的倍增。此外,恢复层次结构还可以绑定到控制平面逻辑分区(例如,管理或拓扑边界)。每个逻辑分区可以应用不同的恢复机制。

In brief, it is commonly accepted that the lower layers can provide coarse but faster recovery while the higher layers can provide finer but slower recovery. Moreover, it is also desirable to avoid similar layers with functional overlaps in order to optimize network resource utilization and processing overhead, since repeating the same capabilities at each layer does not create any added value for the network as a whole. In addition, even if a lower layer recovery mechanism is enabled, it does not prevent the additional provision of a recovery mechanism at the upper layer. The inverse statement does not necessarily hold; that is, enabling an upper layer recovery mechanism may prevent the use of a lower layer recovery mechanism. In this context, this section analyzes these hierarchical aspects including the physical (passive) layer(s).

简言之,人们普遍认为,较低层可以提供粗略但较快的恢复,而较高层可以提供更精细但较慢的恢复。此外,还希望避免具有功能重叠的类似层,以便优化网络资源利用率和处理开销,因为在每个层重复相同的功能不会为整个网络创造任何附加值。此外,即使启用了下层恢复机制,也不会阻止在上层额外提供恢复机制。相反的陈述不一定成立;也就是说,启用上层恢复机制可以防止使用下层恢复机制。在这种情况下,本节分析这些分层方面,包括物理(被动)层。

7.1. Horizontal Hierarchy (Partitioning)
7.1. 水平层次结构(分区)

A horizontal hierarchy is defined when partitioning a single-layer network (and its control plane) into several recovery domains. Within a domain, the recovery scope may extend over a link (or span), LSP segment, or even an end-to-end LSP. Moreover, an administrative domain may consist of a single recovery domain or can be partitioned into several smaller recovery domains. The operator can partition the network into recovery domains based on physical network topology, control plane capabilities, or various traffic engineering constraints.

将单层网络(及其控制平面)划分为多个恢复域时,定义了水平层次结构。在域内,恢复范围可以扩展到链路(或范围)、LSP段甚至端到端LSP。此外,管理域可以由单个恢复域组成,也可以划分为几个较小的恢复域。运营商可以根据物理网络拓扑、控制平面功能或各种流量工程约束将网络划分为恢复域。

An example often addressed in the literature is the metro-core-metro application (sometimes extended to a metro-metro/core-core) within a single transport layer (see Section 7.2). For such a case, an end-to-end LSP is defined between the ingress and egress metro nodes, while LSP segments may be defined within the metro or core sub-networks. Each of these topological structures determines a so-called "recovery domain" since each of the LSPs they carry can have its own recovery type (or even scheme). The support of multiple recovery types and schemes within a sub-network is referred to as a "multi-recovery capable domain" or simply "multi-recovery domain".

文献中经常提到的一个例子是单个传输层内的metro core metro应用程序(有时扩展到metro/core core)(见第7.2节)。对于这种情况,在入口和出口城域节点之间定义端到端LSP,而LSP段可以在城域或核心子网内定义。每个拓扑结构都决定了所谓的“恢复域”,因为它们所承载的每个LSP都可以有自己的恢复类型(甚至是方案)。在子网内支持多种恢复类型和方案称为“支持多恢复的域”或简称“多恢复域”。

7.2. Vertical Hierarchy (Layers)
7.2. 垂直层次结构(层)

It is very challenging to combine the different recovery capabilities available across the path (i.e., switching capable) and section layers to ensure that certain network survivability objectives are met for the network-supported services.

将路径层(即具有交换能力的)和区段层上的不同恢复能力结合起来,以确保网络支持的服务满足某些网络生存性目标,这是一项非常具有挑战性的工作。

As a first analysis step, one can draw the following guidelines for a vertical coordination of the recovery mechanisms:

作为第一个分析步骤,可以为恢复机制的纵向协调制定以下指南:

- The lower the layer, the faster the notification and switching.

- 层越低,通知和切换速度越快。

- The higher the layer, the finer the granularity of the recoverable entity and therefore the granularity of the recovery resource.

- 层越高,可恢复实体的粒度越细,因此恢复资源的粒度也越细。

Moreover, in the context of this analysis, a vertical hierarchy consists of multiple layered transport planes providing different:

此外,在本分析中,垂直层次结构由多层传输平面组成,提供不同的:

- Discrete bandwidth granularities for non-packet LSPs such as OCh, ODUk, STS_SPE/HOVC, and VT_SPE/LOVC LSPs and continuous bandwidth granularities for packet LSPs.

- 非分组LSP(如OCh、ODUk、STS_SPE/HOVC和VT_SPE/LOVC LSP)的离散带宽粒度以及分组LSP的连续带宽粒度。

- Potential recovery capabilities with different temporal granularities: ranging from milliseconds to tens of seconds

- 具有不同时间粒度的潜在恢复功能:从毫秒到数十秒不等

Note: based on the bandwidth granularity, we can determine four classes of vertical hierarchies: (1) packet over packet, (2) packet over circuit, (3) circuit over packet, and (4) circuit over circuit. Below we briefly expand on (4) only. (2) is covered in [RFC3386]. (1) is extensively covered by the MPLS Working Group, and (3) by the PWE3 Working Group.

注:根据带宽粒度,我们可以确定四类垂直层次结构:(1)包对包,(2)包对电路,(3)电路对包,以及(4)电路对电路。下面,我们仅对(4)进行简单的扩展。(2) 参见[RFC3386]。(1) 被MPLS工作组广泛覆盖,(3)被PWE3工作组广泛覆盖。

In SONET/SDH environments, one typically considers the VT_SPE/LOVC and STS_SPE/HOVC as independent layers (for example, VT_SPE/LOVC LSP uses the underlying STS_SPE/HOVC LSPs as links). In OTN, the ODUk path layers will lie on the OCh path layer, i.e., the ODUk LSPs use the underlying OCh LSPs as OTUk links. Note here that lower layer LSPs may simply be provisioned and not necessarily dynamically triggered or established (control driven approach). In this context, an LSP at the path layer (i.e., established using GMPLS signaling), such as an optical channel LSP, appears at the OTUk layer as a link, controlled by a link management protocol such as LMP.

在SONET/SDH环境中,通常将VT_SPE/LOVC和STS_SPE/HOVC视为独立层(例如,VT_SPE/LOVC LSP使用底层STS_SPE/HOVC LSP作为链路)。在OTN中,ODUk路径层将位于OCh路径层上,即ODUk LSP将底层OCh LSP用作OTUk链路。这里注意,较低层LSP可以简单地被供应,而不必动态地触发或建立(控制驱动方法)。在该上下文中,路径层(即,使用GMPLS信令建立的)处的LSP,例如光信道LSP,作为链路出现在OTUk层,由诸如LMP的链路管理协议控制。

The first key issue with multi-layer recovery is that achieving individual or bulk LSP recovery will be as efficient as the underlying link (local span) recovery. In such a case, the span can be either protected or unprotected, but the LSP it carries must be (at least locally) recoverable. Therefore, the span recovery process can be either independent when protected (or restorable), or triggered by the upper LSP recovery process. The former case requires coordination to achieve subsequent LSP recovery. Therefore, in order to achieve robustness and fast convergence, multi-layer recovery requires a fine-tuned coordination mechanism.

多层恢复的第一个关键问题是,实现单个或批量LSP恢复将与底层链路(本地span)恢复一样高效。在这种情况下,span可以是受保护的,也可以是不受保护的,但它所承载的LSP必须(至少在本地)可恢复。因此,span恢复过程在受保护(或可恢复)时可以是独立的,也可以由上层LSP恢复过程触发。前一种情况需要协调以实现后续LSP恢复。因此,为了实现鲁棒性和快速收敛,多层恢复需要微调协调机制。

Moreover, in the absence of adequate recovery mechanism coordination (for instance, a pre-determined coordination when using a hold-off timer), a failure notification may propagate from one layer to the next one within a recovery hierarchy. This can cause "collisions" and trigger simultaneous recovery actions that may lead to race conditions and, in turn, reduce the optimization of the resource utilization and/or generate global instabilities in the network (see [MANCHESTER]). Therefore, a consistent and efficient escalation strategy is needed to coordinate recovery across several layers.

此外,在缺乏充分的恢复机制协调的情况下(例如,使用延迟计时器时的预定义协调),故障通知可能会在恢复层次结构中从一层传播到下一层。这可能导致“冲突”并触发同步恢复操作,这可能导致竞争条件,进而降低资源利用率的优化和/或在网络中产生全局不稳定性(参见[MANCHESTER])。因此,需要一个一致且高效的上报策略来协调跨多个层的恢复。

One can expect that the definition of the recovery mechanisms and protocol(s) is technology-independent so that they can be consistently implemented at different layers; this would in turn simplify their global coordination. Moreover, as mentioned in [RFC3386], some looser form of coordination and communication between (vertical) layers such as a consistent hold-off timer configuration (and setup through signaling during the working LSP establishment) can be considered, thereby allowing the synchronization between recovery actions performed across these layers.

可以预期,恢复机制和协议的定义与技术无关,因此它们可以在不同的层上一致地实现;这将反过来简化它们的全球协调。此外,如[RFC3386]中所述,可以考虑(垂直)层之间的一些松散形式的协调和通信,例如一致的延迟计时器配置(以及在工作LSP建立期间通过信令进行的设置),从而允许在这些层之间执行的恢复动作之间进行同步。

7.2.1. Recovery Granularity
7.2.1. 恢复粒度

In most environments, the design of the network and the vertical distribution of the LSP bandwidth are such that the recovery granularity is finer at higher layers. The OTN and SONET/SDH layers can recover only the whole section or the individual connections they transport, whereas the IP/MPLS control plane can recover individual packet LSPs or groups of packet LSPs independently of their granularity. On the other hand, the recovery granularity at the sub-wavelength level (i.e., SONET/SDH) can be provided only when the network includes devices switching at the same granularity (and thus not with optical channel level). Therefore, the network layer can deliver control-plane-driven recovery mechanisms on a per-LSP basis if and only if these LSPs have their corresponding switching granularity supported at the transport plane level.

在大多数环境中,网络的设计和LSP带宽的垂直分布使得恢复粒度在更高的层上更精细。OTN和SONET/SDH层只能恢复其传输的整个部分或单个连接,而IP/MPLS控制平面可以独立于其粒度恢复单个分组LSP或分组LSP组。另一方面,仅当网络包括以相同粒度(因此不具有光信道级别)交换的设备时,才能提供亚波长级别(即,SONET/SDH)的恢复粒度。因此,当且仅当这些LSP在传输平面级别支持其相应的交换粒度时,网络层可以基于每个LSP提供控制平面驱动的恢复机制。

7.3. Escalation Strategies
7.3. 升级战略

There are two types of escalation strategies (see [DEMEESTER]): bottom-up and top-down.

有两种升级策略(参见[DEMEESTER]):自下而上和自上而下。

The bottom-up approach assumes that lower layer recovery types and schemes are more expedient and faster than upper layer ones. Therefore, we can inhibit or hold off higher layer recovery. However, this assumption is not entirely true. Consider for instance a SONET/SDH based protection mechanism (with a protection switching time of less than 50 ms) lying on top of an OTN restoration mechanism (with a restoration time of less than 200 ms). Therefore, this assumption should be (at least) clarified as: the lower layer recovery mechanism is expected to be faster than the upper level one, if the same type of recovery mechanism is used at each layer.

自下而上的方法假设下层恢复类型和方案比上层恢复类型和方案更方便、更快。因此,我们可以抑制或延缓更高层的恢复。然而,这一假设并不完全正确。例如,考虑基于SONET/SDH的保护机制(保护切换时间小于50毫秒)位于OTN恢复机制的顶部(恢复时间小于200毫秒)。因此,该假设应(至少)澄清为:如果在每层使用相同类型的恢复机制,则下层恢复机制预计比上层恢复机制更快。

Consequently, taking into account the recovery actions at the different layers in a bottom-up approach: if lower layer recovery mechanisms are provided and sequentially activated in conjunction with higher layer ones, the lower layers must have an opportunity to recover normal traffic before the higher layers do. However, if lower layer recovery is slower than higher layer recovery, the lower layer must either communicate the failure-related information to the higher layer(s) (and allow it to perform recovery), or use a hold-off timer in order to temporarily set the higher layer recovery action in a "standby mode". Note that the a priori information exchange between layers concerning their efficiency is not within the current scope of this document. Nevertheless, the coordination functionality between layers must be configurable and tunable.

因此,考虑到自下而上方法中不同层的恢复操作:如果提供较低层的恢复机制并与较高层的恢复机制一起顺序激活,则较低层必须有机会在较高层之前恢复正常流量。但是,如果较低层恢复比较高层恢复慢,较低层必须将故障相关信息传达给较高层(并允许其执行恢复),或者使用延迟计时器,以便将较高层恢复操作临时设置为“待机模式”。请注意,各层之间关于其效率的先验信息交换不在本文件的当前范围内。然而,层之间的协调功能必须是可配置和可调的。
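The effect of a hold-off timer in a bottom-up strategy can be illustrated with a simplified timing model. The sketch below rests on loudly stated assumptions: the function name is hypothetical, both layers detect the failure at time zero, recovery times are fixed and known, and the lower layer is not stopped when the higher layer starts. It only reuses the 50 ms and 200 ms figures from the example above as upper bounds.

```python
def bottom_up_outcome(lower_recovery_ms, higher_recovery_ms, hold_off_ms):
    """Return (restoring_layer, outage_ms, both_layers_acted) for one failure.

    lower_recovery_ms is None when the lower layer cannot recover the failure.
    The higher layer starts its recovery only when the hold-off timer expires
    and traffic is still not restored.
    """
    candidates = [(hold_off_ms + higher_recovery_ms, "higher")]
    if lower_recovery_ms is not None:
        candidates.append((lower_recovery_ms, "lower"))
    outage_ms, layer = min(candidates)
    # Both layers act when the lower layer is still busy at timer expiry.
    both_acted = lower_recovery_ms is not None and lower_recovery_ms > hold_off_ms
    return layer, outage_ms, both_acted


# Lower layer: OTN restoration (200 ms); higher layer: SONET/SDH protection (50 ms).
print(bottom_up_outcome(200, 50, hold_off_ms=0))     # no coordination at all
print(bottom_up_outcome(200, 50, hold_off_ms=250))   # lower layer gets its chance
print(bottom_up_outcome(None, 50, hold_off_ms=250))  # lower layer cannot recover
```

In this model a hold-off shorter than the lower layer recovery time leads to simultaneous recovery actions at both layers, while a longer one avoids them at the cost of a longer outage whenever the lower layer fails to recover. This is why the coordination is expected to be configurable and tunable.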

For example, coordination between the optical and packet layer control plane enables the optical layer to perform the failure management operations (in particular, failure detection and notification) while giving to the packet layer control plane the authority to decide and perform the recovery actions. If the packet layer recovery action is unsuccessful, fallback at the optical layer can be performed subsequently.

例如,光学层和分组层控制平面之间的协调使得光学层能够执行故障管理操作(尤其是故障检测和通知),同时向分组层控制平面授予决定和执行恢复动作的权限。如果分组层恢复动作不成功,则可随后在光学层执行回退。

The top-down approach attempts service recovery at the higher layers before invoking lower layer recovery. Higher layer recovery is service selective, and permits "per-CoS" or "per-connection" re-routing. With this approach, the most important aspect is that the upper layer should provide its own reliable failure detection mechanism, independent of the lower layer.

自上而下的方法在调用较低层恢复之前尝试在较高层进行服务恢复。更高层的恢复是服务选择性的,并且允许“每CoS”或“每连接”重新路由。使用这种方法,最重要的方面是上层应提供自己的可靠和独立的故障检测机制。

[DEMEESTER] also suggests recovery mechanisms incorporating a coordinated effort shared by two adjacent layers with periodic status updates. Moreover, some of these recovery operations can be pre-assigned (on a per-link basis) to a certain layer, e.g., a given link will be recovered at the packet layer while another will be recovered at the optical layer.

[DEMEESTER]还建议采用恢复机制,包括由两个相邻层共享的协调工作,并定期更新状态。此外,这些恢复操作中的一些可以预先分配(基于每个链路)到某一层,例如,给定链路将在分组层恢复,而另一链路将在光学层恢复。

7.4. Disjointness
7.4. 不相交

Having link and node diverse working and recovery LSPs/spans does not guarantee their complete disjointness. Due to the common physical layer topology (passive), additional hierarchical concepts, such as the Shared Risk Link Group (SRLG), and mechanisms, such as SRLG diverse path computation, must be developed to provide complete working and recovery LSP/span disjointness (see [IPO-IMP] and [RFC4202]). Otherwise, a failure affecting the working LSP/span would also potentially affect the recovery LSP/span; one refers to such an event as "common failure".

链路和节点多样化的工作和恢复LSP/spans并不能保证它们完全不相交。由于通用物理层拓扑(被动),必须开发额外的分层概念,如共享风险链路组(SRLG)和机制,如SRLG多样路径计算,以提供完整的工作和恢复LSP/span不相交性(参见[IPO-IMP]和[RFC4202])。否则,影响工作LSP/span的故障也可能影响恢复LSP/span;有人将此类事件称为“常见故障”。

7.4.1. SRLG Disjointness
7.4.1. SRLG不相交性

A Shared Risk Link Group (SRLG) is defined as the set of links sharing a common risk (for example, a common physical resource such as a fiber link or a fiber cable). For instance, a set of links L belongs to the same SRLG s if they are provisioned over the same fiber link f.

共享风险链路组(SRLG)定义为共享公共风险的链路集(例如,光纤链路或光缆等公共物理资源)。例如,一组链路L属于相同的SRLG s,如果它们是通过相同的光纤链路f提供的。

The SRLG properties can be summarized as follows:

SRLG属性可总结如下:

1) A link belongs to more than one SRLG if and only if it crosses one of the resources covered by each of them.

1) 一个链接属于多个SRLG,当且仅当它跨越每个SRLG覆盖的一个资源时。

2) Two links belonging to the same SRLG can belong individually to (one or more) other SRLGs.

2) 属于同一SRLG的两个链路可以分别属于(一个或多个)其他SRLG。

3) The SRLG set S of an LSP is defined as the union of the individual SRLG s of the individual links composing this LSP.

3) LSP的SRLG集定义为构成该LSP的各个链路的各个SRLG的并集。

SRLG disjointness is also applicable to LSPs:

SRLG不相交性也适用于LSP:

The LSP SRLG disjointness concept is based on the following postulate: an LSP (i.e., a sequence of links and nodes) covers an SRLG if and only if it crosses one of the links or nodes belonging to that SRLG.

LSP SRLG不相交性概念基于以下假设:LSP(即链路和节点序列)覆盖SRLG当且仅当其穿过属于该SRLG的链路或节点之一时。

Therefore, the SRLG disjointness for LSPs can be defined as follows: two LSPs are disjoint with respect to an SRLG s if and only if they do not both cover this SRLG s.

因此,LSP的SRLG不相交性可定义如下:两个LSP关于SRLG s不相交当且仅当它们不同时覆盖该SRLG s时。

Similarly, the SRLG disjointness for LSPs with respect to a set S of SRLGs is defined as follows: two LSPs are disjoint with respect to a set of SRLGs S if and only if the set of SRLGs that are common to both LSPs is disjoint from set S.

类似地,LSP关于SRLG集合S的SRLG不相交性定义如下:两个LSP关于SRLG集合S不相交,当且仅当两个LSP共有的SRLG集合与集合S不相交。

The impact on recovery is noticeable: SRLG disjointness is a necessary (but not a sufficient) condition to ensure network survivability. With respect to the physical network resources, a working-recovery LSP/span pair must be SRLG-disjoint in case of dedicated recovery type. On the other hand, in case of shared recovery, a group of working LSP/spans must be mutually SRLG-disjoint in order to allow for a (single and common) shared recovery LSP that is itself SRLG-disjoint from each of the working LSPs/spans.

对恢复的影响是显而易见的:SRLG不相交是确保网络生存性的必要(但不是充分)条件。对于物理网络资源,如果是专用恢复类型,则工作恢复LSP/span对必须是SRLG不相交的。另一方面,在共享恢复的情况下,一组工作LSP/跨度必须相互SRLG不相交,以便允许(单个和公共)共享恢复LSP,该LSP本身与每个工作LSP/跨度SRLG不相交。
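The definitions of Section 7.4.1 can be sketched in a few lines of Python. This is only an illustration: the link names, the SRLG identifiers, and the function names are hypothetical, and the model is simplified to links only (the postulate above also lets nodes belong to an SRLG).

```python
from typing import Dict, Iterable, Set

# Hypothetical TE view: link name -> set of SRLG IDs the link belongs to.
LinkSrlgs = Dict[str, Set[int]]


def lsp_srlg_set(lsp_links: Iterable[str], link_srlgs: LinkSrlgs) -> Set[int]:
    """SRLG set S of an LSP: union of the SRLGs of its links (property 3)."""
    srlgs: Set[int] = set()
    for link in lsp_links:
        srlgs |= link_srlgs.get(link, set())
    return srlgs


def disjoint_wrt_srlg(lsp_a, lsp_b, s: int, link_srlgs: LinkSrlgs) -> bool:
    """Two LSPs are disjoint w.r.t. SRLG s iff they do not both cover s."""
    covers_a = s in lsp_srlg_set(lsp_a, link_srlgs)
    covers_b = s in lsp_srlg_set(lsp_b, link_srlgs)
    return not (covers_a and covers_b)


def disjoint_wrt_srlg_set(lsp_a, lsp_b, srlg_set: Set[int],
                          link_srlgs: LinkSrlgs) -> bool:
    """Disjoint w.r.t. a set S iff the SRLGs common to both LSPs avoid S."""
    common = lsp_srlg_set(lsp_a, link_srlgs) & lsp_srlg_set(lsp_b, link_srlgs)
    return common.isdisjoint(srlg_set)


# Illustrative data: links C-D and E-F share SRLG 7 (e.g., the same duct).
LINK_SRLGS = {"A-C": {1}, "C-D": {2, 7}, "A-E": {3}, "E-F": {4, 7}, "F-D": {5}}
WORKING = ["A-C", "C-D"]
RECOVERY = ["A-E", "E-F", "F-D"]
```

With this data, the working and recovery LSPs are link and node diverse, yet they are not disjoint with respect to SRLG 7, which is the "common failure" case described in Section 7.4.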

8. Recovery Mechanisms Analysis
8. 恢复机制分析

In order to provide a structured analysis of the recovery mechanisms detailed in the previous sections, the following dimensions can be considered:

为了对前面章节中详述的恢复机制进行结构化分析,可以考虑以下方面:

1. Fast convergence (performance): provide a mechanism that aggregates multiple failures (implying fast failure detection and correlation mechanisms) and fast recovery decision independently of the number of failures occurring in the optical network (also implying a fast failure notification).

1. 快速收敛(性能):提供一种聚合多个故障(意味着快速故障检测和关联机制)和快速恢复决策的机制,独立于光网络中发生的故障数量(也意味着快速故障通知)。

2. Efficiency (scalability): minimize the switching time required for LSP/span recovery independently of the number of LSPs/spans being recovered (this implies efficient failure correlation, fast failure notification, and time-efficient recovery mechanisms).

2. 效率(可伸缩性):最大限度地减少LSP/span恢复所需的切换时间,与正在恢复的LSP/span的数量无关(这意味着高效的故障关联、快速的故障通知和高效的恢复机制)。

3. Robustness (availability): minimize the LSP/span downtime independently of the underlying topology of the transport plane (this implies a highly responsive recovery mechanism).

3. 健壮性(可用性):独立于传输平面的底层拓扑,最大限度地减少LSP/span停机时间(这意味着具有高度响应的恢复机制)。

4. Resource optimization (optimality): minimize the resource capacity, including LSPs/spans and nodes (switching capacity), required for recovery purposes; this dimension can also be referred to as optimizing the sharing degree of the recovery resources.

4. 资源优化(优化):最小化恢复所需的资源容量,包括LSP/跨度和节点(交换容量);这个维度也可以称为优化恢复资源的共享程度。

5. Cost optimization: provide a cost-effective recovery type/scheme.

5. 成本优化:提供经济高效的回收类型/方案。

However, these dimensions are either outside the scope of this document (such as cost optimization and recovery path computational aspects) or mutually conflicting. For instance, it is obvious that providing 1+1 LSP protection minimizes the LSP downtime (in case of failure), but it does not scale and consumes recovery resources without enabling any extra traffic.

但是,这些维度要么不在本文档的范围内(例如成本优化和恢复路径计算方面),要么相互冲突。例如,很明显,提供1+1 LSP保护可以最大限度地减少LSP停机时间(发生故障时),同时不具有可扩展性,并且在不启用任何额外流量的情况下消耗恢复资源。

The following sections analyze the recovery phases and mechanisms detailed in the previous sections with respect to the dimensions described above in order to assess the GMPLS protocol suite capabilities and applicability. In turn, this allows the evaluation of the potential need for further GMPLS signaling and routing extensions.

以下各节分析了前几节中针对上述维度详述的恢复阶段和机制,以评估GMPLS协议套件的功能和适用性。反过来,这允许评估进一步GMPLS信令和路由扩展的潜在需求。

8.1. Fast Convergence (Detection/Correlation and Hold-off Time)
8.1. 快速收敛(检测/相关和延迟时间)

Fast convergence is related to the failure management operations. It refers to the time that elapses between failure detection/correlation and the expiry of the hold-off time, that is, the point at which the recovery switching actions are initiated. This point has been detailed in Section 4.

快速收敛与故障管理操作有关。它是指故障检测/关联和保持时间之间经过的时间,即启动恢复切换操作的时间点。这一点已在第4节中详细说明。

8.2. Efficiency (Recovery Switching Time)
8.2. 效率(恢复切换时间)

In general, the more pre-assignment/pre-planning of the recovery LSP/span, the more rapid the recovery is. Because protection implies pre-assignment (and cross-connection) of the protection resources, in general, protection recovers faster than restoration.

通常,恢复LSP/span的预分配/预规划越多,恢复越快。因为保护意味着预分配(和交叉连接)保护资源,所以一般来说,保护恢复比恢复快。

Span restoration is likely to be slower than most span protection types; however this greatly depends on the efficiency of the span restoration signaling. LSP restoration with pre-signaled and pre-selected recovery resources is likely to be faster than fully dynamic LSP restoration, especially because of the elimination of any potential crankback during the recovery LSP establishment.

跨度恢复可能比大多数跨度保护类型慢;然而,这在很大程度上取决于跨度恢复信令的效率。使用预发信号和预选恢复资源的LSP恢复可能比完全动态LSP恢复更快,特别是因为在恢复LSP建立过程中消除了任何潜在的回退。

If one excludes the crankback issue, the difference between dynamic and pre-planned restoration depends on the restoration path computation and selection time. Since computational considerations are outside the scope of this document, it is up to the vendor to determine the average and maximum path computation time in different scenarios, and up to the operator to decide, depending on the network environment, whether or not dynamic restoration is advantageous over pre-planned schemes. This difference also depends on the flexibility provided by pre-planned restoration versus dynamic restoration. Pre-planned restoration implies a somewhat limited number of failure scenarios (that can be due, for instance, to local storage capacity limitation). Dynamic restoration enables on-demand path computation based on the information received through failure notification messages, and as such, it is more robust with respect to the failure scenario scope.

如果排除回退问题,则动态恢复和预先计划的恢复之间的差异取决于恢复路径计算和选择时间。由于计算方面的考虑不在本文件的范围内,因此供应商应确定不同场景下的平均和最大路径计算时间,运营商应决定动态恢复是否优于依赖于网络环境的预先计划的方案。这种差异还取决于预先计划的恢复与动态恢复所提供的灵活性。预先计划的恢复意味着故障场景的数量有限(例如,可能是由于本地存储容量限制)。动态恢复能够根据通过故障通知消息接收到的信息按需计算路径,因此,它对于故障场景范围更为稳健。

Moreover, LSP segment restoration, in particular, dynamic restoration (i.e., no path pre-computation, so none of the recovery resource is pre-reserved) will generally be faster than end-to-end LSP restoration. However, local LSP restoration assumes that each LSP segment end-point has enough computational capacity to perform this operation while end-to-end LSP restoration requires only that LSP end-points provide this path computation capability.

此外,LSP段恢复,特别是动态恢复(即,没有路径预计算,因此没有任何恢复资源被预保留)通常比端到端LSP恢复快。然而,本地LSP恢复假定每个LSP段端点具有足够的计算能力来执行该操作,而端到端LSP恢复仅要求LSP端点提供该路径计算能力。

Recovery time objectives for SONET/SDH protection switching (not including time to detect failure) are specified in [G.841] at 50 ms, taking into account constraints on distance, number of connections involved, and in the case of ring enhanced protection, number of nodes in the ring. Recovery time objectives for restoration mechanisms have been proposed through a separate effort [RFC3386].

[G.841]中规定了SONET/SDH保护切换的恢复时间目标(不包括检测故障的时间)为50 ms,并考虑了距离、所涉及的连接数量以及(在环增强保护的情况下)环中节点数的限制。恢复机制的恢复时间目标已通过单独的工作提出[RFC3386]。

8.3. Robustness
8.3. 健壮性

In general, the less pre-assignment (protection)/pre-planning (restoration) of the recovery LSP/span, the more robust the recovery type or scheme is to a variety of single failures, provided that adequate resources are available. Moreover, the pre-selection of the recovery resources gives (in the case of multiple failure scenarios) less flexibility than no recovery resource pre-selection. For instance, if failures occur that affect two LSPs sharing a common link along their restoration paths, then only one of these LSPs can be recovered. This occurs unless the restoration path of at least one of these LSPs is re-computed, or the local resource assignment is modified on the fly.

通常,恢复LSP/span的预分配(保护)/预规划(恢复)越少,恢复类型或方案对各种单一故障的鲁棒性就越强,前提是有足够的资源可用。此外,恢复资源的预选(在多个故障场景的情况下)比没有恢复资源预选的灵活性要小。例如,如果发生的故障影响两个LSP沿其恢复路径共享公共链路,则只能恢复其中一个LSP。除非重新计算这些LSP中至少一个的恢复路径,或者动态修改本地资源分配,否则会发生这种情况。

In addition, recovery types and schemes with pre-planned recovery resources (in particular, LSP/spans for protection and LSPs for restoration purposes) will not be able to recover from failures that simultaneously affect both the working and recovery LSP/span. Thus, the recovery resources should ideally be as disjoint as possible (with respect to link, node, and SRLG) from the working ones, so that any single failure event will not affect both working and recovery LSP/span. In brief, working and recovery resources must be fully diverse in order to guarantee that a given failure will not simultaneously affect the working and the recovery LSP/span. The risk of simultaneous failure of the working and the recovery LSPs can also be reduced, either by computing a new recovery path whenever a failure occurs along one of the recovery LSPs, or by computing a new recovery path and provisioning the corresponding LSP whenever a failure occurs along a working LSP/span. Both methods enable the network to keep the number of available recovery paths constant.

此外,具有预先计划的恢复资源的恢复类型和方案(特别是用于保护的LSP/span和用于恢复的LSP)将无法从同时影响工作和恢复LSP/span的故障中恢复。因此,理想情况下,恢复资源应尽可能与工作资源分离(关于链路、节点和SRLG),以便任何单一故障事件都不会影响工作和恢复LSP/span。简言之,工作和恢复资源必须完全多样化,以确保给定故障不会同时影响工作和恢复LSP/span。此外,还可以降低工作和恢复LSP同时失效的风险。通过在故障沿着其中一个恢复LSP发生时计算新的恢复路径,或者在故障沿着工作LSP/span发生时计算新的恢复路径并提供相应的LSP,可以减少故障。这两种方法都使网络能够保持可用恢复路径数不变。

The robustness of a recovery scheme is also determined by the amount of pre-reserved (i.e., signaled) recovery resources within a given shared resource pool: as the sharing degree of recovery resources increases, the recovery scheme becomes less robust to multiple LSP/span failure occurrences. Recovery schemes, in particular restoration, with pre-signaled resource reservation (with or without pre-selection) should be capable of reserving an adequate amount of resource to ensure recovery from any specific set of failure events, such as any single SRLG failure, any two SRLG failures, etc.

恢复方案的健壮性还取决于给定共享资源池中预保留(即,发信号的)恢复资源的数量:随着恢复资源共享程度的增加,恢复方案对多个LSP/span故障发生的健壮性降低。恢复方案,特别是恢复方案,具有预发信号的资源保留(有预选或无预选),应能够保留足够的资源,以确保从任何特定的故障事件集恢复,如任何单个SRLG故障、任何两个SRLG故障等。
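As an illustration of "reserving an adequate amount of resource" for any single SRLG failure, the sketch below sizes the shared reservation on one TE link. It assumes (this is not stated in the document) that each recovery LSP sharing the link is described by the SRLG set of its working LSP and by its bandwidth in discrete units; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def single_srlg_failure_reservation(
    demands: Iterable[Tuple[Set[int], int]]
) -> int:
    """Bandwidth units to reserve on a shared TE link.

    Each demand is (SRLGs covered by the working LSP, bandwidth units of
    its recovery LSP). A single SRLG failure disrupts every working LSP
    covering that SRLG, so the reservation must fit the worst such group.
    """
    load_per_srlg = defaultdict(int)
    for working_srlgs, bandwidth in demands:
        for srlg in working_srlgs:
            load_per_srlg[srlg] += bandwidth
    return max(load_per_srlg.values(), default=0)


# Three recovery LSPs share the link; the first two working LSPs both
# cover SRLG 2, so they can be disrupted by the same failure.
DEMANDS = [({1, 2}, 1), ({2, 3}, 2), ({4}, 1)]
```

For `DEMANDS`, a dedicated reservation would need 4 units, whereas 3 units are enough to survive any single SRLG failure. Covering any two SRLG failures would require taking the worst pair of SRLGs instead, which this sketch does not attempt.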

8.4. Resource Optimization
8.4. 资源优化

It is commonly accepted that sharing recovery resources provides network resource optimization. Therefore, from a resource utilization perspective, protection schemes are often classified with respect to their degree of sharing recovery resources with the working entities. Moreover, non-permanent bridging protection types allow (under normal conditions) for extra traffic over the recovery resources.

人们普遍认为,共享恢复资源可以优化网络资源。因此,从资源利用的角度来看,保护方案通常根据其与工作实体共享恢复资源的程度进行分类。此外,非永久性桥接保护类型允许(在正常情况下)恢复资源上的额外流量。

From this perspective, the following statements are true:

从这个角度来看,以下陈述是正确的:

1) 1+1 LSP/Span protection is the most resource-consuming protection type because it does not allow for any extra traffic.

1) 1+1 LSP/Span保护是最消耗资源的保护类型,因为它不允许任何额外流量。

2) 1:1 LSP/span recovery requires a dedicated recovery LSP/span, allowing for extra traffic.

2) 1:1 LSP/span恢复需要专用恢复LSP/span,以允许额外流量。

3) 1:N and M:N LSP/span recovery require 1 (and M, respectively) recovery LSP/span (shared between the N working LSP/span) allowing for extra traffic.

3) 1:N和M:N LSP/span恢复需要1(分别为M)个恢复LSP/span(在N个工作LSP/span之间共享),以允许额外的流量。

Obviously, 1+1 protection precludes, and 1:1 recovery does not allow for any recovery LSP/span sharing, whereas 1:N and M:N recovery do allow sharing of 1 (M, respectively) recovery LSP/spans between N working LSP/spans. However, despite the fact that 1:1 LSP recovery precludes the sharing of the recovery LSP, the recovery schemes that can be built from it (e.g., (1:1)^n, see Section 5.4) do allow sharing of its recovery resources. In addition, the flexibility in the usage of shared recovery resources (in particular, shared links) may be limited because of network topology restrictions, e.g., fixed ring topology for traditional enhanced protection schemes.

显然,1+1保护不允许,1:1恢复不允许任何恢复LSP/span共享,而1:N和M:N恢复允许在N个工作LSP/span之间共享1(分别为M)个恢复LSP/span。但是,尽管1:1 LSP恢复排除了恢复LSP的共享,但可以从中构建的恢复方案(例如,(1:1)^n,请参见第5.4节)确实允许共享其恢复资源。此外,由于网络拓扑限制(例如,传统增强保护方案的固定环拓扑),共享恢复资源(尤其是共享链路)使用的灵活性可能会受到限制。
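The three statements above can be summarized as a small lookup, shown here as a sketch. The `RecoveryProfile` type and the `recovery_profile` function are hypothetical names introduced for illustration only; N and M are the numbers of working and recovery LSP/spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryProfile:
    recovery_entities: int   # recovery LSP/spans needed for the group
    shared: bool             # recovery LSP/span shared between working ones
    extra_traffic: bool      # extra traffic allowed under normal conditions


def recovery_profile(scheme: str, n: int = 1, m: int = 1) -> RecoveryProfile:
    """Resource profile of a recovery type protecting n working LSP/spans."""
    if scheme == "1+1":
        return RecoveryProfile(1, shared=False, extra_traffic=False)
    if scheme == "1:1":
        return RecoveryProfile(1, shared=False, extra_traffic=True)
    if scheme == "1:N":
        return RecoveryProfile(1, shared=n > 1, extra_traffic=True)
    if scheme == "M:N":
        return RecoveryProfile(m, shared=True, extra_traffic=True)
    raise ValueError(f"unknown recovery type: {scheme}")


def recovery_per_working(scheme: str, n: int = 1, m: int = 1) -> float:
    """Recovery LSP/spans consumed per working LSP/span."""
    working = 1 if scheme in ("1+1", "1:1") else n
    return recovery_profile(scheme, n, m).recovery_entities / working
```

For example, `recovery_per_working("M:N", n=8, m=2)` gives 0.25, against 1.0 for 1+1 and 1:1, which is the resource argument made in this section.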

On the other hand, when using LSP restoration with pre-signaled resource reservation, the amount of reserved restoration capacity is determined by the local bandwidth reservation policies. In LSP restoration schemes with re-provisioning, a pool of spare resources can be defined from which all resources are selected after failure occurrence for the purpose of restoration path computation. The degree to which restoration schemes allow sharing amongst multiple independent failures is then directly inferred from the size of the resource pool. Moreover, in all restoration schemes, spare resources can be used to carry preemptible traffic (thus over preemptible LSP/span) when the corresponding resources have not been committed for LSP/span recovery purposes.

另一方面,当使用具有预信号化资源保留的LSP恢复时,保留的恢复容量的量由本地带宽保留策略确定。在具有重新配置的LSP恢复方案中,可以定义一个备用资源池,在发生故障后从中选择所有资源,以便进行恢复路径计算。恢复方案允许在多个独立故障之间共享的程度直接从资源池的大小推断出来。此外,在所有恢复方案中,当相应的资源尚未提交用于LSP/span恢复目的时,可以使用备用资源来承载可抢占的通信量(从而超过可抢占的LSP/span)。

From this, it clearly follows that fewer recovery resources (i.e., LSP/spans and switching capacity) have to be allocated to a shared recovery resource pool if a greater sharing degree is allowed. Thus, the network survivability level is determined by the policy that defines the amount of shared recovery resources and by the maximum sharing degree allowed for these recovery resources.

由此可以清楚地看出,如果允许更大的共享度,则需要分配给共享恢复资源池的恢复资源(即LSP/span和交换容量)更少。因此,网络生存性级别由定义共享恢复资源量的策略和这些恢复资源允许的最大共享程度决定。

8.4.1. Recovery Resource Sharing
8.4.1. 恢复资源共享

When recovery resources are shared over several LSP/Spans, the use of the Maximum Reservable Bandwidth, the Unreserved Bandwidth, and the Maximum LSP Bandwidth (see [RFC4202]) provides the information needed to obtain the optimization of the network resources allocated for shared recovery purposes.

当在多个LSP/跨度上共享恢复资源时,最大可保留带宽、无保留带宽和最大LSP带宽(请参见[RFC4202])的使用提供了获得为共享恢复目的分配的网络资源优化所需的信息。

The Maximum Reservable Bandwidth is defined as the Maximum Link Bandwidth but it may be greater in case of link over-subscription.

最大可保留带宽定义为最大链路带宽,但在链路过度订阅的情况下,它可能会更大。

The Unreserved Bandwidth (at priority p) is defined as the bandwidth not yet reserved on a given TE link (its initial value for each priority p corresponds to the Maximum Reservable Bandwidth). Last, the Maximum LSP Bandwidth (at priority p) is defined as the smaller of Unreserved Bandwidth (at priority p) and Maximum Link Bandwidth.

未保留带宽(优先级p)定义为给定TE链路上尚未保留的带宽(每个优先级p的初始值对应于最大可保留带宽)。最后,将最大LSP带宽(优先级为p)定义为未保留带宽(优先级为p)和最大链路带宽中的较小者。
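The relationship between these three quantities can be sketched as follows. The sketch assumes eight priority levels (0 is the highest) and assumes that a reservation made at priority q reduces the Unreserved Bandwidth of every priority p >= q; both assumptions follow common traffic engineering practice rather than this document. The function names are hypothetical.

```python
from typing import List

PRIORITIES = 8  # assumed number of priority levels, 0 being the highest


def unreserved_bandwidth(max_reservable: int, reserved: List[int]) -> List[int]:
    """Unreserved Bandwidth per priority, in discrete bandwidth units.

    reserved[q] is the bandwidth already reserved at priority q. The
    initial value at each priority is the Maximum Reservable Bandwidth.
    """
    unreserved = []
    held = 0
    for p in range(PRIORITIES):
        held += reserved[p]          # reservations at priority <= p count
        unreserved.append(max_reservable - held)
    return unreserved


def maximum_lsp_bandwidth(unreserved: List[int], max_link_bw: int) -> List[int]:
    """Smaller of Unreserved Bandwidth (at priority p) and Maximum Link Bandwidth."""
    return [min(u, max_link_bw) for u in unreserved]


# Over-subscribed link: 10 units of link bandwidth, 12 units reservable.
UNRESERVED = unreserved_bandwidth(12, [1, 0, 0, 3, 0, 0, 0, 1])
MAX_LSP = maximum_lsp_bandwidth(UNRESERVED, 10)
```

In this example the Unreserved Bandwidth at priority 0 is 11 units, but the Maximum LSP Bandwidth is capped at the 10 units of Maximum Link Bandwidth.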

Here, one generally considers a recovery resource sharing degree (or ratio) to globally optimize the shared recovery resource usage. The distribution of the bandwidth utilization per TE link can be inferred from the per-priority bandwidth pre-allocation. By using the Maximum LSP Bandwidth and the Maximum Reservable Bandwidth, the amount of (over-provisioned) resources that can be used for shared recovery purposes is known from the IGP.

这里,人们通常会考虑恢复资源共享程度(或比率),以全局优化共享恢复资源的使用。每个TE链路的带宽利用率分布可以从每个优先级的带宽预分配中推断出来。通过使用最大LSP带宽和最大可保留带宽,可以从IGP知道用于共享恢复目的的(过度配置的)资源量。

In order to analyze this behavior, we define the difference between the Maximum Reservable Bandwidth (in the present case, this value is greater than the Maximum Link Bandwidth) and the Maximum LSP Bandwidth per TE link i as the Maximum Shareable Bandwidth or max_R[i]. Within this quantity, the amount of bandwidth currently allocated for shared recovery per TE link i is defined as R[i]. Both quantities are expressed in terms of discrete bandwidth units (and thus, the Minimum LSP Bandwidth is of one bandwidth unit).

为了分析这种行为,我们将最大可保留带宽(在当前情况下,该值大于最大链路带宽)和每个TE链路i的最大LSP带宽之间的差异定义为最大可共享带宽或max_R[i]。在该数量内,当前分配给每个TE链路i的共享恢复的带宽量被定义为R[i]。这两个量都以离散带宽单位表示(因此,最小LSP带宽为一个带宽单位)。

The knowledge of this information available per TE link can be exploited in order to optimize the usage of the resources allocated per TE link for shared recovery. If one refers to r[i] as the actual bandwidth per TE link i (in terms of discrete bandwidth units) committed for shared recovery, then the following quantity must be maximized over the potential TE link candidates:

可以利用每个TE链路可用的此信息的知识,优化每个TE链路分配的资源的使用情况,以实现共享恢复。如果将r[i]称为每个TE链路i的实际带宽(以离散带宽单位表示)用于共享恢复,则必须在潜在TE链路候选上最大化以下数量:

        sum_{i=1}^{N} [(R[i] - r[i]) / (t[i] - b[i])]

        or equivalently: sum_{i=1}^{N} [(R[i] - r[i]) / r[i]]

        with R[i] >= 1 and r[i] >= 1 (in terms of per component
        bandwidth unit)

In this formula, N is the total number of links traversed by a given LSP, t[i] is the Maximum Link Bandwidth per TE link i, and b[i] is the sum per TE link i of the bandwidth committed for working LSPs and other recovery LSPs (thus excluding "shared bandwidth" LSPs). The quantity [(R[i] - r[i])/r[i]] is defined as the Shared (Recovery) Bandwidth Ratio per TE link i. In addition, TE links for which R[i] reaches max_R[i] or for which r[i] = 0 are pruned during shared recovery path computation, as are TE links for which max_R[i] = r[i], which simply cannot be shared.

在该公式中,N是给定LSP穿过的链路总数,t[i]是每个TE链路i的最大链路带宽,b[i]是每个TE链路i为工作LSP和其他恢复LSP(因此“共享带宽”LSP除外)承诺的带宽之和。数量[(R[i] - r[i])/r[i]]被定义为每个TE链路i的共享(恢复)带宽比。此外,在共享恢复路径计算期间,对R[i]达到max_R[i]或r[i] = 0的TE链路进行修剪,max_R[i] = r[i]的TE链路根本无法共享,同样被修剪。
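A minimal sketch of how the second form of the sum and the pruning rules could be applied to rank candidate recovery paths is given below. Each TE link is represented as a hypothetical tuple (R, r, max_R) in discrete bandwidth units; the function names and the sample values are illustrative and are not taken from the document.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[int, int, int]  # (R[i], r[i], max_R[i]) in bandwidth units


def is_pruned(R: int, r: int, max_R: int) -> bool:
    """Pruning rules: R reaches max_R, r is zero, or max_R equals r."""
    return R >= max_R or r == 0 or r == max_R


def path_sharing_score(links: List[Link]) -> Optional[float]:
    """Sum of the Shared (Recovery) Bandwidth Ratio (R - r) / r over a path.

    Returns None when any link is pruned, i.e., the path is not a candidate.
    """
    total = 0.0
    for R, r, max_R in links:
        if is_pruned(R, r, max_R):
            return None
        total += (R - r) / r
    return total


def best_path(candidates: Dict[str, List[Link]]) -> Optional[str]:
    """Name of the candidate path maximizing the score, if any remains."""
    scored = {}
    for name, links in candidates.items():
        score = path_sharing_score(links)
        if score is not None:
            scored[name] = score
    return max(scored, key=scored.get) if scored else None


CANDIDATES = {
    "P1": [(3, 1, 5), (2, 1, 4)],   # score 2.0 + 1.0
    "P2": [(2, 2, 4), (4, 2, 6)],   # score 0.0 + 1.0
    "P3": [(4, 1, 4)],              # pruned: R has reached max_R
}
```

Here `best_path(CANDIDATES)` selects "P1", the path whose links have the most allocated but not yet committed shared recovery bandwidth relative to what is already committed.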

More generally, one can draw the following mapping between the available bandwidth at the transport and control plane level:

更一般地,可以在传输和控制平面级别的可用带宽之间绘制以下映射:

                                 - ---------- Max Reservable Bandwidth
                                |  -----  ^
                                |R -----  |
                                |  -----  |
                                 - -----  |max_R
                                   -----  |
   --------  TE link Capacity    - ------ | - Maximum TE Link Bandwidth
   -----                        |r -----  v
   -----     <------ b ------>   - ---------- Maximum LSP Bandwidth
   -----                           -----
   -----                           -----
   -----                           -----
   -----                           -----
   -----                           ----- <--- Minimum LSP Bandwidth
   -------- 0                      ---------- 0
        

Note that the above approach does not require the flooding of any per LSP information or any detailed distribution of the bandwidth allocation per component link or individual ports or even any per-priority shareable recovery bandwidth information (using a dedicated sub-TLV). The latter would provide the same capability as the already defined Maximum LSP Bandwidth per-priority information. This approach is referred to as a Partial (or Aggregated) Information Routing as described in [KODIALAM1] and [KODIALAM2]. These references show that the difference obtained with a Full (or Complete) Information Routing approach (where for the whole set of working and recovery LSPs, the amount of bandwidth units they use per-link is known at each node and for each link) is clearly negligible. The Full Information Routing approach is detailed in [GLI]. Note also that both approaches rely on the deterministic knowledge (at different degrees) of the network topology and resource usage status.

注意,上述方法不需要任何每LSP信息的泛洪,也不需要每个组件链路或单个端口的带宽分配的任何详细分布,甚至不需要任何每优先级可共享恢复带宽信息(使用专用子TLV)。后者将提供与已定义的每个优先级信息的最大LSP带宽相同的能力。这种方法称为部分(或聚合)信息路由,如[KODIALAM1]和[KODIALAM2]所述。这些文献表明,使用全(或完整)信息路由方法(其中对于整个工作和恢复LSP组,每个节点和每个链路上每个链路使用的带宽单元数量是已知的)获得的差异显然可以忽略不计。全信息路由方法详见[GLI]。还请注意,这两种方法都依赖于网络拓扑和资源使用状态的确定性知识(在不同程度上)。

Moreover, extending the GMPLS signaling capabilities can enhance the Partial Information Routing approach. It is enhanced by allowing working-LSP-related information and, in particular, its path (including link and node identifiers) to be exchanged with the recovery LSP request. This enables more efficient admission control at upstream nodes of shared recovery resources, and in particular, links (see Section 8.4.3).

此外,扩展GMPLS信令能力可以增强部分信息路由方法。它通过允许与工作LSP相关的信息,特别是其路径(包括链路和节点标识符)与恢复LSP请求交换而得到增强。这使得共享恢复资源的上游节点,特别是链路的准入控制更加有效(见第8.4.3节)。

8.4.2. Recovery Resource Sharing and SRLG Recovery
8.4.2. 恢复资源共享和SRLG恢复

Resource shareability can also be maximized with respect to the number of times each SRLG is protected by a recovery resource (in particular, a shared TE link) and methods can be considered for avoiding contention of the shared recovery resources in case of single SRLG failure. These methods enable the sharing of recovery resources between two (or more) recovery LSPs, if their respective working LSPs are mutually disjoint with respect to link, node, and SRLGs. Then, a single failure does not simultaneously disrupt several (or at least two) working LSPs.

资源共享性还可以根据每个SRLG受恢复资源(特别是共享TE链路)保护的次数最大化,并且可以考虑在单个SRLG故障的情况下避免共享恢复资源争用的方法。如果两个(或多个)恢复LSP各自的工作LSP在链路、节点和SRLGs方面相互不相交,则这些方法可以在两个(或多个)恢复LSP之间共享恢复资源。然后,单个故障不会同时中断多个(或至少两个)工作LSP。

For instance, [BOUILLET] shows that the Partial Information Routing approach can be extended to cover recovery resource shareability with respect to SRLG recoverability (i.e., the number of times each SRLG is recoverable). By flooding this aggregated information per TE link, path computation and selection of SRLG-diverse recovery LSPs can be optimized with respect to the sharing of recovery resource reserved on each TE link. This yields a performance difference of less than 5% relative to the corresponding Full Information Flooding approach (see [GLI]), which is negligible.

例如,[BOUILLET]表明,可以扩展部分信息路由方法,以涵盖与SRLG可恢复性相关的恢复资源共享性(即,每个SRLG可恢复的次数)。通过在每个TE链路上泛洪此聚合信息,可以优化SRLG不同恢复LSP的路径计算和选择,以共享每个TE链路上保留的恢复资源。这会产生小于5%的性能差异,与相应的全信息泛洪方法相比,这一差异可以忽略不计(参见[GLI])。

For this purpose, additional extensions to [RFC4202] in support of path computation for shared mesh recovery have often been considered in the literature. TE link attributes would include, among others, the current number of recovery LSPs sharing the recovery resources reserved on the TE link, and the current number of SRLGs recoverable by this amount of (shared) recovery resources reserved on the TE link. The latter is equivalent to the current number of SRLGs that will be recovered by the recovery LSPs sharing the recovery resource reserved on the TE link. Then, if explicit SRLG recoverability is considered, a TE link attribute would be added that includes the explicit list of SRLGs (recoverable by the shared recovery resource reserved on the TE link) and their respective shareable recovery bandwidths. The latter information is equivalent to the shareable recovery bandwidth per SRLG (or per group of SRLGs), which implies that the amount of shareable bandwidth and the number of listed SRLGs will decrease over time.

为此,文献中经常考虑对[RFC4202]进行额外扩展,以支持共享网格恢复的路径计算。TE链路属性将包括共享TE链路上保留的恢复资源的恢复LSP的当前数量,以及通过TE链路上保留的(共享)恢复资源量可恢复的SRLG的当前数量。后者相当于将由共享TE链路上保留的恢复资源的恢复LSP恢复的SRLG的当前数量。然后,如果考虑显式SRLG可恢复性,则将添加TE链路属性,该属性包括SRLG的显式列表(可由保留在TE链路上的共享恢复资源恢复)及其各自的可共享恢复带宽。后一种信息相当于每个SRLG(或每组SRLG)的可共享恢复带宽,这意味着可共享带宽的数量和列出的SRLG的数量将随时间减少。

Compared to the case of recovery resource sharing only (regardless of SRLG recoverability, as described in Section 8.4.1), these additional TE link attributes would potentially deliver better path computation and selection (at a distinct ingress node) for shared mesh recovery purposes. However, due to the lack of evidence of better efficiency and due to the complexity that such extensions would generate, they are not further considered in the scope of the present analysis. For instance, a per-SRLG group minimum/maximum shareable recovery bandwidth is restricted by the length that the corresponding (sub-) TLV may take and thus the number of SRLGs that it can include. Therefore, the corresponding parameter should not be translated into GMPLS routing (or even signaling) protocol extensions in the form of TE link sub-TLV.

与仅共享恢复资源的情况相比(与SRLG可恢复性无关,如第8.4.1节所述),这些附加TE链路属性可能会为共享网格恢复目的提供更好的路径计算和选择(在不同的入口节点)。然而,由于缺乏证据表明效率更高,而且由于这种扩展会产生复杂性,因此在本分析的范围内没有进一步考虑这些问题。例如,每个SRLG组的最小/最大可共享恢复带宽受到相应(子)TLV可能采用的长度以及它可以包括的SRLG数量的限制。因此,相应的参数不应以TE链路子TLV的形式转换为GMPLS路由(甚至信令)协议扩展。

8.4.3. Recovery Resource Sharing, SRLG Disjointness and Admission Control

8.4.3. 恢复资源共享、SRLG分离和接纳控制

Admission control is a strict requirement to be fulfilled by nodes giving access to shared links. This can be illustrated using the following network topology:

允许控制是允许访问共享链路的节点必须满足的严格要求。可以使用以下网络拓扑来说明这一点:

      A ------ C ====== D
      |        |        |
      |        |        |
      |        B        |
      |        |        |
      |        |        |
       ------- E ------ F
        
      A ------ C ====== D
      |        |        |
      |        |        |
      |        B        |
      |        |        |
      |        |        |
       ------- E ------ F
        

Node A creates a working LSP to D (A-C-D). Simultaneously, B creates a working LSP to D (B-C-D) and a recovery LSP (B-E-F-D) to the same destination. Then, A decides to create a recovery LSP to D (A-E-F-D), but since the C-D span carries both working LSPs, node E should either assign a dedicated resource for this recovery LSP or reject this request if the C-D span has already reached its maximum recovery bandwidth sharing ratio. In the latter case, C-D span failure would imply that one of the working LSPs would not be recoverable.

节点A创建工作LSP到D(A-C-D),B同时创建工作LSP到D(B-C-D)和恢复LSP(B-E-F-D)到同一目的地。然后,A决定创建一个到D的恢复LSP(A-E-F-D),但是由于C-D跨度携带两个工作LSP,节点E应该为该恢复LSP分配一个专用资源,或者如果C-D跨度已经达到其最大恢复带宽共享率,则拒绝该请求。在后一种情况下,C-D跨度故障意味着其中一个工作LSP将无法恢复。

Consequently, node E must have the required information to perform admission control for the recovery LSP requests it processes (implying, for instance, that the path followed by the working LSP is carried with the corresponding recovery LSP request). If node E can guarantee that the working LSPs (A-C-D and B-C-D) are SRLG disjoint over the C-D span, it may safely accept the incoming recovery LSP request and assign to the recovery LSPs (A-E-F-D and B-E-F-D) the same resources on the link E-F. This may occur if the link E-F has not yet reached its maximum recovery bandwidth sharing ratio. In this example, one assumes that the node failure probability is negligible compared to the link failure probability.

因此,节点E必须具有对其处理的恢复LSP请求执行接纳控制所需的信息(例如,意味着工作LSP所遵循的路径与相应的恢复LSP请求一起携带)。如果节点E可以保证工作LSP(A-C-D和B-C-D)在C-D跨度上是SRLG不相交的,那么它可以安全地接受传入的恢复LSP请求,并向恢复LSP(A-E-F-D和B-E-F-D)分配链路E-F上的相同资源。如果链路E-F尚未达到其最大恢复带宽共享率,则可能发生这种情况。在此示例中,假设节点故障概率与链路故障概率相比可以忽略不计。

To achieve this, the path followed by the working LSP is transported with the recovery LSP request and examined at each upstream node of potentially shareable links. Admission control is performed using the interface identifiers (included in the path) to retrieve from the TE DataBase the list of SRLG IDs associated with each of the working LSP links. If the working LSPs (A-C-D and B-C-D) have one or more links or SRLG IDs in common (in this example, one or more SRLG IDs in common over the span C-D), node E should not assign the same resource over link E-F to the recovery LSPs (A-E-F-D and B-E-F-D). Otherwise, one of these working LSPs would not be recoverable if C-D span failure occurred.

为了实现这一点,工作LSP遵循的路径与恢复LSP请求一起传输,并在潜在可共享链路的每个上游节点处进行检查。使用接口标识符(包括在路径中)执行接纳控制,以在TE数据库中检索与每个工作LSP链路相关联的SRLG id的列表。如果工作LSP(A-C-D和B-C-D)具有一个或多个共同的链路或SRLG ID(在本例中,在跨度C-D上有一个或多个共同的SRLG ID),则节点E不应通过链路E-F将相同的资源分配给恢复LSP(A-E-F-D和B-E-F-D)。否则,如果发生C-D span故障,其中一个工作LSP将无法恢复。
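The admission control check performed by node E can be sketched as follows. The TE database is modeled as a hypothetical mapping from link (interface) identifiers to SRLG IDs, the C-D span is assumed to consist of two component links named "C-D/1" and "C-D/2", and the function names are illustrative; the sharing-ratio limit on link E-F is not modeled.

```python
from typing import Dict, Iterable, List, Set

TeDatabase = Dict[str, Set[int]]  # link identifier -> SRLG IDs


def srlgs_of(path: Iterable[str], te_db: TeDatabase) -> Set[int]:
    """SRLG IDs associated with the links of a working LSP path."""
    ids: Set[int] = set()
    for link in path:
        ids |= te_db.get(link, set())
    return ids


def may_share_recovery_resource(new_working: List[str],
                                protected_workings: List[List[str]],
                                te_db: TeDatabase) -> bool:
    """True if the new recovery LSP may reuse an already shared resource.

    Sharing is refused when the working LSP carried in the request has a
    link or an SRLG ID in common with a working LSP already protected by
    that resource.
    """
    new_links = set(new_working)
    new_srlgs = srlgs_of(new_working, te_db)
    for other in protected_workings:
        if new_links & set(other):
            return False
        if new_srlgs & srlgs_of(other, te_db):
            return False
    return True


TE_DB = {"A-C": {10}, "B-C": {11}, "C-D/1": {20}, "C-D/2": {21}}
```

When both working LSPs use "C-D/1", the check fails and node E must assign a dedicated resource or reject the request. When they use different component links with no SRLG ID in common, the same resource on link E-F can be assigned to both recovery LSPs.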

There are some issues related to this method; the major one is the number of SRLG IDs that a single link can cover (more than 100 in complex environments). Moreover, when using link bundles, this approach may lead to the rejection of some recovery LSP requests. This occurs when the SRLG sub-TLV corresponding to a link bundle includes the union of the SRLG ID lists of all the component links belonging to this bundle (see [RFC4202] and [RFC4201]).

该方法存在一些相关问题;主要原因是单个链路可以覆盖的SRLG ID数量(在复杂环境中超过100个)。此外,当使用链路包时,这种方法可能会拒绝某些恢复LSP请求。当对应于链路束的SRLG子TLV包括属于该束的所有组件链路的SRLG id列表的并集时,就会发生这种情况(请参见[RFC4202]和[RFC4201])。

In order to overcome this specific issue, an additional mechanism may consist of querying the nodes where the information would be available (in this case, node E would query C). The main drawback of this method is that (in addition to the dedicated mechanism(s) it requires) it may become complex when several common nodes are traversed by the working LSPs. Therefore, when using link bundles, solving this issue is closely related to the sequence of the recovery operations. Per-component flooding of SRLG identifiers would deeply impact the scalability of the link state routing protocol. Therefore, one may rely on the usage of an on-line accessible network management system.

为了克服这个特定问题,另一种机制可能包括查询信息可用的节点(在这种情况下,节点E将查询C)。这种方法的主要缺点是(除了它需要的专用机制外),当工作LSP遍历多个公共节点时,它可能变得复杂。因此,在使用链接束时,解决此问题与恢复操作的顺序密切相关。SRLG标识符的每组件泛滥将严重影响链路状态路由协议的可伸缩性。因此,可以依赖在线可访问网络管理系统的使用。

9. Summary and Conclusions
9. 摘要和结论

The following table summarizes the different recovery types and schemes analyzed throughout this document.

下表总结了本文档中分析的不同恢复类型和方案。

   --------------------------------------------------------------------
              |       Path Search (computation and selection)
   --------------------------------------------------------------------
              |       Pre-planned (a)      |         Dynamic (b)
   --------------------------------------------------------------------
          |   | faster recovery            | Does not apply
          |   | less flexible              |
          | 1 | less robust                |
          |   | most resource-consuming    |
   Path   |   |                            |
   Setup   ------------------------------------------------------------
          |   | relatively fast recovery   | Does not apply
          |   | relatively flexible        |
          | 2 | relatively robust          |
          |   | resource consumption       |
          |   |  depends on sharing degree |
           ------------------------------------------------------------
          |   | relatively fast recovery   | slower (computation)
          |   | more flexible              | most flexible
          | 3 | relatively robust          | most robust
          |   | less resource-consuming    | least resource-consuming
          |   |  depends on sharing degree |
   --------------------------------------------------------------------
        

1a. Recovery LSP setup (before failure occurrence) with resource reservation (i.e., signaling) and selection is referred to as LSP protection.


2a. Recovery LSP setup (before failure occurrence) with resource reservation (i.e., signaling) and with resource pre-selection is referred to as pre-planned LSP re-routing with resource pre-selection. This implies only recovery LSP activation after failure occurrence.


3a. Recovery LSP setup (before failure occurrence) with resource reservation (i.e., signaling) and without resource selection is referred to as pre-planned LSP re-routing without resource pre-selection. This implies recovery LSP activation and resource (i.e., label) selection after failure occurrence.


3b. Recovery LSP setup after failure occurrence is referred to as LSP re-routing, which is termed full LSP re-routing when the recovery LSP path computation also occurs after failure occurrence.

Thus, the term pre-planned refers to recovery LSP path pre-computation, signaling (reservation), and a priori resource selection (optional), but not cross-connection. Also, the shared-mesh recovery scheme can be viewed as a particular case of 2a) and 3a), using the additional constraint described in Section 8.4.3.

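The four cases above can be read as a small decision procedure over what has already been done for the recovery LSP before the failure occurs. The sketch below is illustrative only: the enum and function names are hypothetical, and the use of a cross-connection flag to separate case 1a from case 2a is an assumption drawn from the statement that the term pre-planned does not cover cross-connection.

```python
from enum import Enum


class RecoveryType(Enum):
    LSP_PROTECTION = "1a"
    PREPLANNED_WITH_PRESELECTION = "2a"
    PREPLANNED_WITHOUT_PRESELECTION = "3a"
    LSP_REROUTING = "3b"


def classify(signaled_before_failure: bool,
             resources_selected_before_failure: bool,
             cross_connected_before_failure: bool) -> RecoveryType:
    """Map the pre-failure state of a recovery LSP to a case of the table."""
    if not signaled_before_failure:
        # Nothing can be selected or cross-connected without prior signaling.
        if resources_selected_before_failure or cross_connected_before_failure:
            raise ValueError("selection or cross-connection requires signaling")
        return RecoveryType.LSP_REROUTING
    if cross_connected_before_failure:
        # Assumption: a cross-connected recovery LSP has its resources selected.
        if not resources_selected_before_failure:
            raise ValueError("cross-connection requires resource selection")
        return RecoveryType.LSP_PROTECTION
    if resources_selected_before_failure:
        # Only activation remains after the failure.
        return RecoveryType.PREPLANNED_WITH_PRESELECTION
    # Activation and resource (label) selection remain after the failure.
    return RecoveryType.PREPLANNED_WITHOUT_PRESELECTION
```

For example, a recovery LSP that has been signaled with labels chosen but not yet cross-connected falls under case 2a in this reading, while one set up entirely after the failure falls under case 3b.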

The implementation of these recovery mechanisms requires considering only extensions to GMPLS signaling protocols (i.e., [RFC3471] and [RFC3473]). These GMPLS signaling extensions should mainly focus on delivering (1) recovery LSP pre-provisioning for cases 1a, 2a, and 3a, (2) LSP failure notification, (3) recovery LSP switching action(s), and (4) reversion mechanisms.

Moreover, the present analysis (see Section 8) shows that no GMPLS routing extensions are expected to be needed to implement any of these recovery types and schemes efficiently.

10. Security Considerations

This document does not introduce any additional security issue or imply any specific security consideration from [RFC3945] to the current RSVP-TE GMPLS signaling, routing protocols (OSPF-TE, IS-IS-TE) or network management protocols.


However, the authorization of requests for resources by GMPLS-capable nodes should determine whether a given party, presumably already authenticated, has a right to access the requested resources. This determination is typically a matter of local policy control, for example, by setting limits on the total bandwidth made available to some party in the presence of resource contention. Such policies may become quite complex as the number of users, types of resources, and sophistication of authorization rules increase. This is particularly the case for recovery schemes that assume pre-planned sharing of recovery resources, or contention for resources in case of dynamic re-routing.

Therefore, control elements should match the requests against the local authorization policy. These control elements must be capable of making decisions based on the identity of the requester, as verified cryptographically and/or topologically.

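As a rough illustration of matching a request against a local authorization policy, the sketch below applies a per-party limit on total bandwidth. It is not a mechanism specified by this document: the class, its fields, the party names, and the use of Mb/s as the unit are hypothetical, and identity verification (cryptographic and/or topological) is assumed to happen elsewhere and is represented only by a boolean.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalPolicy:
    # Hypothetical per-party ceilings on total bandwidth, in Mb/s.
    limits_mbps: Dict[str, float]
    # Bandwidth already granted to each party, in Mb/s.
    in_use_mbps: Dict[str, float] = field(default_factory=dict)

    def authorize(self, requester: str, identity_verified: bool,
                  requested_mbps: float) -> bool:
        """Grant the request only for a verified, known party within its limit."""
        if not identity_verified:
            return False
        limit = self.limits_mbps.get(requester)
        if limit is None:
            # No policy entry for this party: deny by default.
            return False
        used = self.in_use_mbps.get(requester, 0.0)
        if used + requested_mbps > limit:
            return False
        self.in_use_mbps[requester] = used + requested_mbps
        return True
```

Denying parties that have no policy entry is a design choice of this sketch; a real deployment would follow its own local policy, which may be considerably more complex when recovery resources are shared.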

11. Acknowledgements

The authors would like to thank Fabrice Poppe (Alcatel) and Bart Rousseau (Alcatel) for their revision effort, and Richard Rabbat (Fujitsu Labs), David Griffith (NIST), and Lyndon Ong (Ciena) for their useful comments.


Thanks also to Adrian Farrel for the thorough review of the document.


12. References
12.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.


[RFC3471] Berger, L., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Functional Description", RFC 3471, January 2003.


[RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions", RFC 3473, January 2003.


[RFC3945] Mannie, E., "Generalized Multi-Protocol Label Switching (GMPLS) Architecture", RFC 3945, October 2004.


[RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.


[RFC4202] Kompella, K., Ed. and Y. Rekhter, Ed., "Routing Extensions in Support of Generalized Multi-Protocol Label Switching (GMPLS)", RFC 4202, October 2005.


[RFC4204] Lang, J., Ed., "Link Management Protocol (LMP)", RFC 4204, October 2005.


[RFC4209] Fredette, A., Ed. and J. Lang, Ed., "Link Management Protocol (LMP) for Dense Wavelength Division Multiplexing (DWDM) Optical Line Systems", RFC 4209, October 2005.


[RFC4427] Mannie, E., Ed. and D. Papadimitriou, Ed., "Recovery (Protection and Restoration) Terminology for Generalized Multi-Protocol Label Switching (GMPLS)", RFC 4427, March 2006.

12.2. Informative References

[BOUILLET] E. Bouillet, et al., "Stochastic Approaches to Compute Shared Meshed Restored Lightpaths in Optical Network Architectures," IEEE Infocom 2002, New York City, June 2002.


[DEMEESTER] P. Demeester, et al., "Resilience in Multilayer Networks," IEEE Communications Magazine, Vol. 37, No. 8, pp. 70-76, August 1998.


[GLI] G. Li, et al., "Efficient Distributed Path Selection for Shared Restoration Connections," IEEE Infocom 2002, New York City, June 2002.


[IPO-IMP] Strand, J. and A. Chiu, "Impairments and Other Constraints on Optical Layer Routing", RFC 4054, May 2005.


[KODIALAM1] M. Kodialam and T.V. Lakshman, "Restorable Dynamic Quality of Service Routing," IEEE Communications Magazine, pp. 72-81, June 2002.


[KODIALAM2] M. Kodialam and T.V. Lakshman, "Dynamic Routing of Restorable Bandwidth-Guaranteed Tunnels using Aggregated Network Resource Usage Information," IEEE/ACM Transactions on Networking, pp. 399-410, June 2003.

[MANCHESTER] J. Manchester, P. Bonenfant and C. Newton, "The Evolution of Transport Network Survivability," IEEE Communications Magazine, August 1999.


[RFC3386] Lai, W. and D. McDysan, "Network Hierarchy and Multilayer Survivability", RFC 3386, November 2002.


[T1.105] ANSI, "Synchronous Optical Network (SONET): Basic Description Including Multiplex Structure, Rates, and Formats," ANSI T1.105, January 2001.


[WANG] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, "Path vs. Subpath vs. Link Restoration for Fault Management in IP-over-WDM Networks: Performance Comparisons Using GMPLS Control Signaling," IEEE Communications Magazine, pp. 80-87, November 2002.


For information on the availability of the following documents, please see http://www.itu.int


[G.707] ITU-T, "Network Node Interface for the Synchronous Digital Hierarchy (SDH)," Recommendation G.707, October 2000.


[G.709] ITU-T, "Network Node Interface for the Optical Transport Network (OTN)," Recommendation G.709, February 2001 (and Amendment no.1, October 2001).


[G.783] ITU-T, "Characteristics of Synchronous Digital Hierarchy (SDH) Equipment Functional Blocks," Recommendation G.783, October 2000.


[G.798] ITU-T, "Characteristics of optical transport network hierarchy equipment functional block," Recommendation G.798, June 2004.


[G.806] ITU-T, "Characteristics of Transport Equipment - Description Methodology and Generic Functionality", Recommendation G.806, October 2000.


[G.841] ITU-T, "Types and Characteristics of SDH Network Protection Architectures," Recommendation G.841, October 1998.


[G.842] ITU-T, "Interworking of SDH network protection architectures," Recommendation G.842, October 1998.


[G.874] ITU-T, "Management aspects of the optical transport network element," Recommendation G.874, November 2001.


Editors' Addresses

   Dimitri Papadimitriou
   Alcatel
   Francis Wellesplein, 1
   B-2018 Antwerpen, Belgium

   Phone: +32 3 240-8491
   EMail: dimitri.papadimitriou@alcatel.be

   Eric Mannie
   Perceval
   Rue Tenbosch, 9
   1000 Brussels
   Belgium

   Phone: +32-2-6409194
   EMail: eric.mannie@perceval.net


Full Copyright Statement


Copyright (C) The Internet Society (2006).


This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.


This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property


The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.


Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.


The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.


Acknowledgement


Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).
