RFC 6366: Requirements for an Internet Audio Codec

URL: https://datatracker.ietf.org/doc/html/rfc6366
Title: RFC 6366

Internet Engineering Task Force (IETF)                          J. Valin
Request for Comments: 6366                                       Mozilla
Category: Informational                                           K. Vos
ISSN: 2070-1721                                 Skype Technologies, S.A.
                                                             August 2011

Requirements for an Internet Audio Codec

Abstract

This document provides specific requirements for an Internet audio codec. These requirements address quality, sampling rate, bit-rate, and packet-loss robustness, as well as other desirable properties.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6366.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Definitions
   3. Applications
      3.1. Point-to-Point Calls
      3.2. Conferencing
      3.3. Telepresence
      3.4. Teleoperation and Remote Software Services
      3.5. In-Game Voice Chat
      3.6. Live Distributed Music Performances / Internet Music Lessons
      3.7. Delay-Tolerant Networking or Push-to-Talk Services
      3.8. Other Applications
   4. Constraints Imposed by the Internet on the Codec
   5. Detailed Basic Requirements
      5.1. Operating Space
      5.2. Quality and Bit-Rate
      5.3. Packet-Loss Robustness
      5.4. Computational Resources
   6. Additional Considerations
      6.1. Low-Complexity Audio Mixing
      6.2. Encoder Side Potential for Improvement
      6.3. Layered Bit-Stream
      6.4. Partial Redundancy
      6.5. Stereo Support
      6.6. Bit Error Robustness
      6.7. Time Stretching and Shortening
      6.8. Input Robustness
      6.9. Support of Audio Forensics
      6.10. Legacy Compatibility
   7. Security Considerations
   8. Acknowledgments
   9. Informative References

1. Introduction

This document provides requirements for an audio codec designed specifically for use over the Internet. The requirements attempt to address the needs of the most common Internet interactive audio transmission applications and ensure good quality when operating in conditions that are typical for the Internet. These requirements also address the quality, sampling rate, delay, bit-rate, and packet-loss robustness. Other desirable codec properties are considered as well.

2. Definitions

Throughout this document, the following conventions refer to the sampling rate of a signal:

Narrowband: 8 kilohertz (kHz)

Wideband: 16 kHz

Super-wideband: 24/32 kHz

Full-band: 44.1/48 kHz

Codec bit-rates in bits per second (bit/s) will be considered without counting any overhead (IP/UDP/RTP headers, padding, etc.). The codec delay is the total algorithmic delay when one adds the codec frame size to the "look-ahead". Thus, it is the minimum theoretically achievable end-to-end delay of a transmission system that uses the codec.
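
The two definitions above can be illustrated with a short sketch. The function names and the 40-byte header figure (IPv4 + UDP + RTP with no extensions, one frame per packet) are assumptions made for this example and do not come from this document.

```python
def codec_delay_ms(frame_ms: float, lookahead_ms: float) -> float:
    """Algorithmic codec delay: frame size plus look-ahead, in milliseconds."""
    return frame_ms + lookahead_ms


def codec_bitrate(payload_bytes: int, frame_ms: float) -> float:
    """Codec bit-rate in bit/s, counting only the compressed payload."""
    return payload_bytes * 8 * 1000 / frame_ms


def on_wire_bitrate(payload_bytes: int, frame_ms: float,
                    header_bytes: int = 40) -> float:
    """Bit-rate in bit/s including per-packet headers.

    The 40-byte default is an assumption: IPv4 (20) + UDP (8) + RTP (12).
    """
    return (payload_bytes + header_bytes) * 8 * 1000 / frame_ms
```

For instance, a 40-byte payload every 20 ms is a 16 kbit/s codec bit-rate under this definition, even though roughly twice that is sent on the wire under the stated header assumption.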

3. Applications

The following applications should be considered for Internet audio codecs, along with their requirements:

o Point-to-point calls

o Conferencing

o Telepresence

o Teleoperation

o In-game voice chat

o Live distributed music performances / Internet music lessons

o Delay-tolerant networking or push-to-talk services

o Other applications

3.1. Point-to-Point Calls

Point-to-point calls are voice over IP (VoIP) calls between two "standard" (fixed or mobile) phones, implemented in hardware or software. For these applications, a wideband codec is required, along with narrowband support for compatibility with a public switched telephone network (PSTN). The range of useful bit-rates is expected to be 12 - 32 kilobits per second (kbit/s) for wideband speech and 8 - 16 kbit/s for narrowband speech. The codec delay must be less than 40 milliseconds (ms), and a delay of no more than 25 ms is desirable. Support for encoding music is not required, but it is desirable for the codec not to make background (on-hold) music excessively unpleasant to hear. Also, the codec should be robust to noise (produce intelligible speech and no annoying artifacts) even at lower bit-rates.

3.2. Conferencing

Conferencing applications (that support multi-party calls) have additional requirements on top of the requirements for point-to-point calls. Conferencing systems often have higher-fidelity audio equipment and have greater network bandwidth available -- especially when video transmission is involved. Therefore, support for super-wideband audio becomes important, with useful bit-rates in the 32 - 64 kbit/s range. The ability to vary the bit-rate, according to the "difficulty" of the audio signal, is a desirable feature for the codec. This not only saves bandwidth "on average", but it can also help conference servers make more efficient use of the available bandwidth, by using more bandwidth for important audio streams and less bandwidth for less important ones (e.g., background noise).

Conferencing end-points often operate in hands-free conditions, which creates acoustic echo problems. Therefore, lower delay is important, as it reduces the quality degradation due to any residual echo after acoustic echo cancellation (AEC). Consequently, the codec delay must be less than 30 ms for this application. An optional low-delay mode with less than 10 ms delay is desirable, but not required.

Most conferencing systems operate with a bridge that mixes some (or all) of the audio streams and sends them back to all the participants. In that case, it is important that the codec not produce annoying artifacts when two voices are present at the same time. Also, this mixing operation should be as easy as possible to perform. To make it easier to determine which streams have to be mixed (and which are noise/silence), it must be possible to measure (or estimate) the voice activity in a packet without having to fully decode the packet (saving most of the complexity when the packet need not be decoded). The ability to reduce the computational complexity of mixing is also desirable, but not required. For example, a transform codec may make it possible to mix the streams in the transform domain, without having to go back to the time domain. Low-complexity up-sampling and down-sampling within the codec is also a desirable feature when mixing streams with different sampling rates.
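
A minimal sketch of the bridge behaviour described above, assuming a cheap voice-activity check is available per packet. The packet representation and the `is_active` and `decode` callables are hypothetical placeholders, not part of any codec API; mixing here is done in the time domain on 16-bit samples.

```python
from typing import Callable, List, Sequence

Packet = dict  # hypothetical packet representation, for illustration only


def mix_active(
    packets: Sequence[Packet],
    is_active: Callable[[Packet], bool],
    decode: Callable[[Packet], List[int]],
) -> List[int]:
    """Decode and mix only the packets flagged as carrying voice."""
    # Inactive packets are skipped before decoding, which is where the
    # complexity saving comes from.
    decoded = [decode(p) for p in packets if is_active(p)]
    if not decoded:
        return []
    length = min(len(pcm) for pcm in decoded)
    # Sum sample by sample and clamp to the signed 16-bit range.
    return [
        max(-32768, min(32767, sum(pcm[i] for pcm in decoded)))
        for i in range(length)
    ]
```

Only the streams that pass the activity check are ever decoded, so the cost of the bridge scales with the number of active talkers rather than the number of participants.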

3.3. Telepresence

Most telepresence applications can be considered to be essentially very high-quality video-conferencing environments, so all of the conferencing requirements also apply to telepresence. In addition, telepresence applications require super-wideband and full-band audio capability with useful bit-rates in the 32 - 80 kbit/s range. While voice is still the most important signal to be encoded, it must be possible to obtain good quality (even if not transparent) music.

Most telepresence applications require more than one audio channel, so support for stereo and multi-channel is important. While this can always be accomplished by encoding multiple single-channel streams, it is preferable to take advantage of the redundancy that exists between channels.

3.4. Teleoperation and Remote Software Services

Teleoperation applications are similar to telepresence, with the exception that they involve remote physical interactions. For example, the user may be controlling a robot while receiving real-time audio feedback from that robot. For these applications, the delay has to be less than 10 ms. The other requirements of telepresence (quality, bit-rate, multi-channel) apply to teleoperation as well. The only exception is that mixing is not an important issue for teleoperation.

The requirements for remote software services are similar to those of teleoperation. These applications include remote desktop applications, remote virtualization, and interactive media applications being rendered remotely (e.g., video games rendered on central servers). For all these applications, full-band audio with an algorithmic delay below 10 ms is important.

3.5. In-Game Voice Chat

An increasing number of computer/console games make use of VoIP to allow players to communicate in real time. The requirements for gaming are similar to those of conferencing, with the main difference being that narrowband compatibility is not necessary. While for most applications a codec delay up to 30 ms is acceptable, a low-delay (< 10 ms) option is highly desirable, especially for games with rapid interactions. The ability to use variable bit-rate (VBR) (with a maximum allowed bit-rate) is also highly desirable because it can significantly reduce the bandwidth requirement for a game server.

3.6. Live Distributed Music Performances / Internet Music Lessons

Live music over the Internet requires extremely low end-to-end delay and is one of the most demanding applications for interactive audio transmission. It has been observed that for most scenarios, total end-to-end delays up to 25 ms could be tolerated by musicians, with the absolute limit (where none of the scenarios are possible) being around 50 ms [carot09]. In order to achieve this low delay on the Internet -- either in the same city or in a nearby city -- the network propagation time must be taken into account. When also subtracting the delay of the audio buffer, jitter buffer, and acoustic path, that leaves around 2 ms to 10 ms for the total delay of the codec. Considering the speed of light in fiber, every 1 ms reduction in the codec delay increases the range over which synchronization is possible by approximately 200 km.
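
The delay budget above can be worked through numerically. In this sketch the 200 km/ms figure is the fiber propagation speed used in the text; the buffer and acoustic-path values in the usage note are illustrative assumptions only.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed in fiber, per the text


def codec_budget_ms(total_ms: float, distance_km: float,
                    audio_buffer_ms: float, jitter_buffer_ms: float,
                    acoustic_ms: float) -> float:
    """Delay left for the codec after subtracting the other contributions."""
    propagation_ms = distance_km / FIBER_KM_PER_MS
    return (total_ms - propagation_ms - audio_buffer_ms
            - jitter_buffer_ms - acoustic_ms)


def extra_range_km(codec_delay_saved_ms: float) -> float:
    """Additional distance gained for each millisecond removed from the codec."""
    return codec_delay_saved_ms * FIBER_KM_PER_MS
```

With a 25 ms end-to-end target, 1000 km of fiber, and an assumed 5 ms each for the audio buffer, jitter buffer, and acoustic path, `codec_budget_ms(25, 1000, 5, 5, 5)` leaves 5 ms for the codec, which falls inside the 2 ms to 10 ms range quoted above.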

Acoustic echo is expected to be an even more important issue for network music than it is in conferencing, especially considering that the music quality requirements essentially forbid the use of a "non-linear processor" (NLP) with AEC. This is another reason why very low delay is essential.

Considering that the application is music, the full audio bandwidth (44.1 or 48 kHz sampling rate) must be transmitted with a bit-rate that is sufficient to provide near-transparent to transparent quality. With the current audio coding technology, this corresponds to approximately 64 kbit/s to 128 kbit/s per channel. As for telepresence, support for two or more channels is often desired, so it would be useful for a codec to be able to take advantage of the redundancy that is often present between audio channels.

3.7. Delay-Tolerant Networking or Push-to-Talk Services

Internet transmissions are subject to interruptions of connectivity that severely disturb a phone call. This may happen in cases of route changes, handovers, slow fading, or device failures. To overcome this distortion, the phone call can be halted and then resumed after the connectivity has been reestablished.

Also, if transmission capacity is lower than the minimal coding rate, switching to a push-to-talk mode still allows for effective communication. In this situation, voice is transmitted at slower-than-real-time bit-rate and conversations are interrupted until the speech has been transmitted.

These modes require interrupting the audio playout and continuing after a pause of arbitrary duration.

3.8. Other Applications

The above list is by no means a complete list of all applications involving interactive audio transmission on the Internet. However, it is believed that meeting the needs of all these different applications should be sufficient to ensure that the needs of other applications not listed will also be met.

4. Constraints Imposed by the Internet on the Codec

Packet losses are inevitable on the Internet, and dealing with them is one of the most fundamental requirements for an Internet audio codec. While any audio codec can be combined with a good packet-loss concealment (PLC) algorithm, the important aspect is what happens on the first packets received _after_ the loss. More specifically, this means that:

o it should be possible to interpret the contents of any received packet, irrespective of previous losses as specified in BCP 36 [PAYLOADS]; and

o the decoder should re-synchronize as quickly as possible (i.e., the output should quickly converge to the output that would have been obtained if no loss had occurred).

The constraint of being able to decode any packet implies the following considerations for an audio codec:

o The size of a compressed frame must be kept smaller than the MTU to avoid fragmentation;

o The interpretation of any parameter encoded in the bit-stream must not depend on information contained in other packets. For example, it is not acceptable for a codec to allow signaling a mode change in one packet and assume that subsequent frames will be decoded according to that mode.
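
One way to satisfy the second consideration is to have every packet carry the parameters needed to interpret it. The sketch below uses a hypothetical one-byte header holding a mode index; this layout is invented for illustration and is not the format of any real codec.

```python
from dataclasses import dataclass

# Hypothetical mode table: index -> (bandwidth name, sampling rate in Hz).
MODES = {
    0: ("narrowband", 8000),
    1: ("wideband", 16000),
    2: ("super-wideband", 32000),
    3: ("full-band", 48000),
}


@dataclass
class Frame:
    bandwidth: str
    sample_rate: int
    payload: bytes


def pack(mode: int, payload: bytes) -> bytes:
    """Prefix the payload with its own mode byte."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode}")
    return bytes([mode]) + payload


def parse(packet: bytes) -> Frame:
    """Interpret a packet using only information contained in that packet."""
    if not packet:
        raise ValueError("empty packet")
    mode = packet[0]
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode}")
    name, rate = MODES[mode]
    return Frame(name, rate, packet[1:])
```

Because `parse` keeps no state between calls, a packet received right after a loss, or out of order, is interpreted the same way as any other.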

Although the interpretation of parameters cannot depend on other packets, it is still reasonable to use some amount of prediction across frames, provided that the predictors can resynchronize quickly in case of a lost packet. In this case, it is important to use the best compromise between the gain in coding efficiency and the loss in packet loss robustness due to the use of inter-frame prediction. It is a desirable property for the codec to allow some real-time control of that trade-off, so that it can take advantage of more prediction when the loss rate is small, while being more robust to losses when the loss rate is high.

To improve the robustness to packet loss, it would be desirable for the codec to allow an adaptive (data- and network-dependent) amount of side information to help improve audio quality when losses occur. For example, side information may include the retransmission of certain parameters encoded in the previous frame(s).

To ensure freedom of implementation, decoder-side-only error concealment does not need to be specified, although a functional PLC algorithm is desirable as part of the codec reference implementation. Obviously, any information signaled in the bit-stream intended to aid PLC needs to be specified.

Another important property of the Internet is that it is mostly a best-effort network, with no guaranteed bandwidth. This means that the codec has to be able to vary its output bit-rate dynamically (in real time), without requiring an out-of-band signaling mechanism, and without causing audible artifacts at the bit-rate change boundaries. Additional desirable features are:

o Having the possibility to use smooth bit-rate changes with one byte/frame resolution;

o Making it possible for a codec to adapt its bit-rate based on the source signal being encoded (source-controlled VBR) to maximize the quality for a certain _average_ bit-rate.
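
To see what one byte/frame resolution means in terms of bit-rate, the arithmetic can be sketched as follows. The function names and the 20 ms frame size in the usage note are assumptions chosen for the example.

```python
def bitrate_step(frame_ms: float) -> float:
    """Bit-rate change (bit/s) caused by adding one byte to every frame."""
    return 8 * 1000 / frame_ms


def bytes_per_frame(target_bps: float, frame_ms: float) -> int:
    """Largest whole number of bytes per frame that stays within the target."""
    return int(target_bps * frame_ms / 8000)


def actual_bitrate(nbytes: int, frame_ms: float) -> float:
    """Bit-rate (bit/s) produced by frames of exactly nbytes bytes."""
    return nbytes * 8000 / frame_ms
```

With 20 ms frames, each additional byte per frame changes the bit-rate by 400 bit/s, so a 24.1 kbit/s target is served by 60-byte frames (24.0 kbit/s) and the next available step is 61 bytes (24.4 kbit/s).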

Because the Internet transmits data in bytes, a codec should produce compressed data in integer numbers of bytes. In general, the codec design should take into consideration explicit congestion notification (ECN) and may include features that would improve the quality of an ECN implementation.

The IETF has defined a set of application-layer protocols to be used for real-time transport of multimedia data, including voice. Thus, it is important for the resulting codec to be easy to use with these protocols. For example, it must be possible to create an [RTP] payload format that conforms to BCP 36 [PAYLOADS]. If any codec parameters need to be negotiated between end-points, the negotiation should be as easy as possible to carry over session initiation protocol (SIP) [RFC3261] / session description protocol (SDP) [RFC4566] or alternatively over extensible messaging and presence protocol (XMPP) [RFC6120] / Jingle [XEP-0167].

5. Detailed Basic Requirements

This section summarizes all the constraints imposed by the target applications and by the Internet into a set of actual requirements for codec development.

5.1. Operating Space

The operating space for the target applications can be divided in terms of delay: most applications require a "medium delay" (20-30 ms), while a few require a "very low delay" (< 10 ms). It makes sense to divide the space based on delay because lowering the delay has a cost in terms of quality versus bit-rate.

For medium delay, the resulting codec must be able to efficiently operate within the following range of bit-rates (per channel):

o Narrowband: 8 kbit/s to 16 kbit/s

o Wideband: 12 to 32 kbit/s

o Super-wideband: 24 to 64 kbit/s

o Full-band: 32 to 80 kbit/s

Obviously, a lower-delay codec that can operate in the above range is also acceptable.

For very low delay, the resulting codec will need to operate within the following range of bit-rates (per channel):

o Super-wideband: 32 to 80 kbit/s

o Full-band: 48 to 128 kbit/s

o (Narrowband and wideband not required)
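
The two operating-space lists can be captured as a small lookup table. The dictionary keys and the helper name below are choices made for this sketch; the numeric ranges (in kbit/s per channel) are the ones listed above.

```python
# Bit-rate ranges in kbit/s per channel, keyed by delay class and bandwidth.
OPERATING_SPACE = {
    "medium": {            # 20-30 ms codec delay
        "narrowband": (8, 16),
        "wideband": (12, 32),
        "super-wideband": (24, 64),
        "full-band": (32, 80),
    },
    "very-low": {          # < 10 ms codec delay
        "super-wideband": (32, 80),
        "full-band": (48, 128),
    },
}


def in_operating_space(delay_class: str, bandwidth: str, kbps: float) -> bool:
    """True if the bit-rate lies in the required range for that combination."""
    bounds = OPERATING_SPACE[delay_class].get(bandwidth)
    if bounds is None:  # e.g. narrowband is not required at very low delay
        return False
    low, high = bounds
    return low <= kbps <= high
```

For example, `in_operating_space("very-low", "full-band", 64)` is true, while any narrowband or wideband query in the very-low-delay class returns false because those combinations are not required.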

5.2. Quality and Bit-Rate

The quality of a codec is directly linked to the bit-rate, so these two must be considered jointly. When comparing the bit-rate of codecs, the overhead of IP/UDP/RTP headers should not be considered, but any additional bits required in the RTP payload format, after the header (e.g., required signaling), should be considered. In terms of quality versus bit-rate, the codec to be developed must be better than the following codecs, which are generally considered royalty-free:

o For narrowband: Speex (NB) [Speex], and internet low bit-rate codec (iLBC)(*) [RFC3951]

o For wideband: Speex (WB) [Speex], G.722.1(*) [ITU.G722.1]

o For super-wideband/fullband: G.722.1C(*) [ITU.G722.1]

The codecs marked with (*) have additional licensing restrictions, but the codec to be developed should still not perform significantly worse. In addition to the quality targets listed above, a desirable objective is for the codec quality to be no worse than Adaptive Multi-Rate (AMR-NB) and Adaptive Multi-Rate Wideband (AMR-WB). Quality should be measured for multiple languages, including tonal languages. The case of multiple simultaneous voices (as sometimes happens in conferencing) should be evaluated as well.

The comparison with the above codecs assumes that the codecs being compared have similar delay characteristics. The bit-rate required, for a certain level of quality, may be higher than the referenced codecs in cases where a much lower delay is required. In that case, the increase in bit-rate must be less than the ratio between the delays.
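
One reading of the last sentence is that the ratio of bit-rates must stay below the ratio of delays. The sketch below encodes that reading; it is an interpretation for illustration, and the reference figures in the usage note are made up rather than measured.

```python
def bitrate_ceiling_kbps(ref_kbps: float, ref_delay_ms: float,
                         new_delay_ms: float) -> float:
    """Upper limit on bit-rate for equal quality at a lower delay.

    Assumes the rule means: new_rate / ref_rate < ref_delay / new_delay.
    """
    if new_delay_ms >= ref_delay_ms:
        # No delay reduction, so no bit-rate increase is justified.
        return ref_kbps
    return ref_kbps * ref_delay_ms / new_delay_ms
```

Under this reading, a codec with 5 ms delay compared against a hypothetical 20 ms reference running at 32 kbit/s would need to reach the same quality below `bitrate_ceiling_kbps(32, 20, 5)`, that is, below 128 kbit/s.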

It is desirable for the codec to support source-controlled variable bit-rate (VBR), because different inputs require different bit-rates to achieve the same quality. However, it should still be possible to use the codec at a truly constant bit-rate to ensure that no information leak is possible when using an encrypted channel.

5.3. Packet-Loss Robustness

Robustness to packet loss is a very important aspect of any codec to be used on the Internet. Codecs must maintain acceptable quality at loss rates up to 5% and maintain good intelligibility up to 15% loss rate. At any sampling rate, bit-rate, and packet-loss rate, the quality must be no less than the quality obtained with the Speex codec or the Global System for Mobile Communications - Full Rate (GSM-FR) codec in the same conditions. The actual packet-loss "patterns" to be used in testing must be obtained from real packet-loss traces collected on the Internet, rather than from loss models. These traces should be representative of the typical environments in which the applications of Section 3 operate. For example, traces related to VoIP calls should consider the loss patterns observed for typical home broadband and corporate connections.
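
When working with recorded loss traces, the loss rate and burst structure are the first things to extract. The sketch below assumes a trace is simply a list of booleans (True for a received packet); that representation and the function names are assumptions for illustration.

```python
from itertools import groupby
from typing import List


def loss_rate(trace: List[bool]) -> float:
    """Fraction of packets lost in the trace (True = received)."""
    return trace.count(False) / len(trace)


def burst_lengths(trace: List[bool]) -> List[int]:
    """Lengths of each run of consecutive lost packets, in order."""
    return [len(list(run)) for received, run in groupby(trace) if not received]


def requirement_at(rate: float) -> str:
    """Which target from the text applies at a given loss rate."""
    if rate <= 0.05:
        return "acceptable quality"
    if rate <= 0.15:
        return "good intelligibility"
    return "beyond the stated targets"
```

Two traces with the same loss rate can have very different burst lengths, which is one reason real traces are more informative than a single loss percentage.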

5.4. Computational Resources

The resulting codec should be implementable on a wide range of devices, so there should be a fixed-point implementation or at least assurance that a reasonable fixed-point implementation is possible. The computational resources figures listed below are meant to be upper bounds. Even below these bounds, resources should still be minimized. Any proposed increase in computational resources consumption (e.g., to increase quality) should be carefully evaluated even if the resulting resource consumption is below the upper bound. Having variable complexity would be useful (but not required) in achieving that goal, as it would allow trading quality/bit-rate for lower complexity.

The computational requirements for real-time encoding and decoding of a mono signal on one core of a recent x86 CPU (as measured with the Unix "time" utility or equivalent) are as follows:

o Narrowband: 40 megahertz (MHz) (2% of a 2 gigahertz (GHz) CPU core)

o Wideband: 80 MHz (4% of a 2 GHz CPU core)

o Super-wideband/fullband: 200 MHz (10% of a 2 GHz CPU core)

It is desirable that the MHz values listed above also be achievable on fixed-point digital signal processors that are capable of single-cycle multiply-accumulate operations (16x16 multiplication accumulated into 32 bits).
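
The MHz figures in this section are a CPU-time fraction multiplied by the clock rate. A minimal sketch of the conversion follows, assuming CPU time has been measured for encoding plus decoding a file of known duration; the function names and the measurement values in the usage note are illustrative.

```python
# Upper bounds from the list above, in MHz.
LIMITS_MHZ = {
    "narrowband": 40,
    "wideband": 80,
    "super-wideband/fullband": 200,
}


def required_mhz(cpu_seconds: float, audio_seconds: float,
                 clock_mhz: float) -> float:
    """Clock cycles consumed per second of audio, expressed in MHz."""
    return cpu_seconds * clock_mhz / audio_seconds


def within_budget(bandwidth: str, cpu_seconds: float, audio_seconds: float,
                  clock_mhz: float = 2000.0) -> bool:
    """True if the measured load does not exceed the bound for that bandwidth."""
    return required_mhz(cpu_seconds, audio_seconds, clock_mhz) <= LIMITS_MHZ[bandwidth]
```

For example, 6 seconds of CPU time to process 300 seconds of audio on a 2 GHz core corresponds to 40 MHz, which sits exactly at the narrowband bound.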

For applications that require mixing (e.g., conferencing), it should be possible to estimate the energy and/or the voice activity status of the decoded signal with less than 10% of the complexity figures listed above.

It is the intent to maximize the range of devices on which a codec can be implemented. Therefore, the reference implementation must not depend on special hardware features or instructions to be present in order to meet the complexity requirement. However, it may be desirable to take advantage of such hardware when available (e.g., hardware accelerators for operations like Fast Fourier Transforms (FFT) and convolutions). A codec should also minimize the use of saturating arithmetic so as to be implementable on architectures that do not provide hardware saturation (e.g., ARMv4).
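
To illustrate why saturating arithmetic has a cost on such architectures, the sketch below contrasts a 16-bit saturating addition done in software (an explicit clamp, which costs extra comparisons per operation) with plain wrap-around addition. The helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def sat16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))

def add_sat16(a, b):
    """Saturating addition: the result sticks at the range limits."""
    return sat16(a + b)

def wrap16(x):
    """Two's-complement wrap-around to 16 bits, as plain integer hardware does."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x
```

Without hardware support, every saturating operation needs the explicit clamp shown in sat16, so an algorithm that relies on it heavily runs slower on such targets.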

The combined codec size and data read-only memory (ROM) should be small enough not to cause significant implementation problems on typical embedded devices. The codec context/state size required should be no more than 2*R*C bytes in floating-point, where R is the sampling rate and C is the number of channels. For fixed-point, that size should be less than R*C bytes. The scratch space required should also be less than 2*R*C bytes for floating-point or less than R*C bytes for fixed-point.
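
As a worked example of these bounds, the sketch below evaluates them for a given configuration. It assumes R is expressed in Hz; the function name is hypothetical.

```python
def memory_bound_bytes(sample_rate_hz, channels, fixed_point=False):
    """Return the 2*R*C (floating-point) or R*C (fixed-point) bound in bytes."""
    if sample_rate_hz <= 0 or channels <= 0:
        raise ValueError("sample rate and channel count must be positive")
    factor = 1 if fixed_point else 2
    return factor * sample_rate_hz * channels

# 48 kHz stereo
float_bound = memory_bound_bytes(48000, 2)
fixed_bound = memory_bound_bytes(48000, 2, fixed_point=True)
```

For 48 kHz stereo this gives 192000 bytes for floating-point and 96000 bytes for fixed-point. The same bounds apply to the scratch space.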

6. Additional Considerations

There are additional features or characteristics that may be desirable under some circumstances, but should not be part of the strict requirements. The benefit of meeting these considerations should be weighed against the associated cost.

6.1. Low-Complexity Audio Mixing

In many applications that require a mixing server (e.g., conferencing, games), it is important to minimize the computational cost of the mixing. As much as possible, it should be possible to perform the mixing with fewer computations than it would take to decode all the streams, mix them, and re-encode the result. Properties that reduce the complexity of the mixing process include:

o The ability to derive sufficient parameters, such as loudness and/or spectral envelope, for estimating voice activity of a compressed frame without fully decoding that frame;

o The ability to mix the streams in an intermediate representation (e.g., transform domain), rather than having to fully decode the signals before the mixing;

o The use of bit-stream layers (Section 6.3) by aggregating a small number of active streams at lower quality.
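
The first and third properties above can be combined in a mixing server that decodes only the few loudest active streams. The sketch below assumes a per-stream loudness estimate (in dB) has already been obtained without full decoding. The function name, the threshold value, and the limit of three streams are illustrative assumptions.

```python
import heapq

def select_streams_to_mix(loudness_db_by_stream, max_active=3,
                          vad_threshold_db=-50.0):
    """Return the ids of the loudest active streams, loudest first.

    loudness_db_by_stream maps a stream id to a loudness estimate derived
    from the compressed frame. Streams below vad_threshold_db are treated
    as inactive and are never decoded.
    """
    active = [(db, stream_id)
              for stream_id, db in loudness_db_by_stream.items()
              if db >= vad_threshold_db]
    return [stream_id for db, stream_id in heapq.nlargest(max_active, active)]

selected = select_streams_to_mix(
    {"a": -20.0, "b": -60.0, "c": -10.0, "d": -35.0, "e": -30.0})
```

Only the selected streams then need to be decoded and mixed, so the server's cost grows with max_active rather than with the number of participants.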

For conferencing applications, the total complexity of the decoding, voice activity detection (VAD), and mixing should be considered when evaluating proposals.

6.2. Encoder Side Potential for Improvement

In many codecs, it is possible to improve the quality by improving the encoder without breaking compatibility (i.e., without changing the decoder). Potential for improvement varies from one codec to another. It is generally low for pulse code modulation (PCM) or adaptive differential pulse code modulation (ADPCM) codecs and higher for perceptual transform codecs. All things being equal, being able to improve a codec after the bit-stream is frozen is a desirable property. However, this should not be done at the expense of quality in the reference encoder. Other potential improvements include signal-adaptive frame size selection and improved discontinuous transmission (DTX) algorithms that take advantage of predicting the decoder side's packet-loss concealment (PLC) algorithms.

6.3. Layered Bit-Stream

A layered codec makes it possible to transmit only a certain subset of the bits and still obtain a valid bit-stream with a quality that is equivalent to the quality that would be obtained from encoding at the corresponding rate. While this is not a necessary feature for most applications, it can be desirable for cases where a "mixing server" needs to handle a large number of streams with limited computational resources.
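
A minimal sketch of how a server might exploit a layered bit-stream follows. It assumes, purely for illustration, that a frame is available as a list of byte strings with the base layer first and that any prefix of that list is a valid frame; this document does not define such a format.

```python
def truncate_layers(layers, byte_budget):
    """Keep the longest prefix of layers whose total size fits the budget.

    layers is ordered base layer first. Enhancement layers are dropped
    from the end; no re-encoding takes place.
    """
    kept, used = [], 0
    for layer in layers:
        if used + len(layer) > byte_budget:
            break
        kept.append(layer)
        used += len(layer)
    return kept

frame = [b"B" * 20, b"E" * 15, b"F" * 30]  # hypothetical base + 2 enhancement layers
reduced = truncate_layers(frame, 40)
```

Because the operation is a simple truncation, its cost is negligible compared with decoding and re-encoding at a lower rate.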

6.4. Partial Redundancy

One possible way of increasing robustness to packet loss is to include partial redundancy within packets. This can be achieved either by including the base layer of the previous frame (for a layered codec) or by transmitting other parameters from the previous frame(s) to assist the PLC algorithm in case of loss. The ability to include partial redundancy for high-loss scenarios is desirable, provided that the feature can be dynamically turned on or off (so that no bandwidth is wasted in case of loss-free transmission).
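
The mechanism can be sketched as follows. Each packet optionally carries a coarse copy of the previous frame, modeled here (as a stand-in for a base layer or PLC parameters) by the first few bytes of that frame. The packet layout, field names, and function names are assumptions made for this example.

```python
def build_packets(frames, redundancy=True, redundant_bytes=2):
    """Packetize frames, optionally attaching a coarse copy of the previous one."""
    packets = []
    for seq, frame in enumerate(frames):
        prev = None
        if redundancy and seq > 0:
            prev = frames[seq - 1][:redundant_bytes]  # coarse stand-in
        packets.append({"seq": seq, "frame": frame, "prev": prev})
    return packets

def recover(received, frame_count):
    """Rebuild the frame sequence; None marks frames that stay lost."""
    out = [None] * frame_count
    for pkt in received:
        out[pkt["seq"]] = pkt["frame"]
    for pkt in received:
        seq = pkt["seq"]
        if pkt["prev"] is not None and out[seq - 1] is None:
            out[seq - 1] = pkt["prev"]  # fill the gap with the coarse copy
    return out

packets = build_packets([b"AAAA", b"BBBB", b"CCCC"])
received = [p for p in packets if p["seq"] != 1]  # packet 1 is lost
frames_out = recover(received, 3)
```

The redundancy flag corresponds to the requirement that the feature can be switched off dynamically, so that loss-free links carry no extra bytes.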

6.5. Stereo Support

It is highly desirable for the codec to have stereo support. At a minimum, the codec should be able to encode two channels independently without causing significant stereo image artifacts. It is also desirable for the codec to take advantage of the inter-channel redundancy in stereo audio to reduce the bit-rate (for an equivalent quality) of stereo audio compared to coding channels independently.

6.6. Bit Error Robustness

The vast majority of Internet-based applications do not need to be robust to bit errors because packets either arrive unaltered or do not arrive at all. Therefore, the emphasis should be on packet-loss robustness and packet-loss concealment. That being said, often, the extra robustness to bit errors can be achieved at no cost at all (i.e., no increase in size, complexity, or bit-rate; no decrease in quality, or packet-loss robustness, etc.). In those cases, it is useful to make a change that increases the robustness to bit errors. This can be useful for applications that use UDP Lite transmission (e.g., over a wireless LAN). Robustness to packet loss should *never* be sacrificed to achieve higher bit error robustness.

6.7. Time Stretching and Shortening

When adaptive jitter buffers are used, it is often necessary to stretch or shorten the audio signal to allow changes in buffering. While this operation can be performed directly on the decoder's output, it is often more computationally efficient to stretch or shorten the signal directly within the decoder. It is desirable for the reference implementation to provide a time stretching/shortening implementation, although it should not be normative.

6.8. Input Robustness

The systems providing input to the encoder and receiving output from the decoder may be far from ideal in actual use. Input and output audio streams may be corrupted by compounding non-linear artifacts from analog hardware and digital processing. The codecs to be developed should be tested to ensure that they degrade gracefully under adverse audio input conditions. Types of digital corruption that may be tested include tandeming, transcoding, low-quality resampling, and digital clipping. Types of analog corruption that may be tested include microphones with substantial background noise, analog clipping, and loudspeaker distortion. No specific end-to-end quality requirements are mandated for use with the proposed codec. It is advisable, however, that several typical in situ environments/processing chains be specified for the purpose of benchmarking end-to-end quality with the proposed codec.

6.9. Support of Audio Forensics

Emergency calls can be analyzed using audio forensics if the context and situation of the caller have to be identified. Thus, it is important not only to transmit the voice of the callers well, but also to transmit background noise at high quality. In these situations, sounds or noises of low volume should also not be compressed or dropped. Therefore, the encoder must allow DTX to be disabled when required (e.g., for emergency calls).

6.10. Legacy Compatibility

In order to create the best possible codec for the Internet, there is no requirement for compatibility with legacy Internet codecs.

7. Security Considerations

Although this document itself does not have security considerations, this section describes the security requirements for the codec.

As for any protocol to be used over the Internet, security is a very important aspect to consider. This goes beyond the obvious considerations of preventing buffer overflows and similar attacks that can lead to denial-of-service (DoS) or remote code execution. One very important security aspect is to make sure that the decoders have a bounded and reasonable worst-case complexity. This prevents an attacker from causing a DoS by sending packets that are specially crafted to take a very long (or infinite) time to decode.

A more subtle aspect is the information leak that can occur when the codec is used over an encrypted channel (e.g., [SRTP]). For example, it was suggested [wright08] [white11] that use of source-controlled VBR may reveal some information about a conversation through the size of the compressed packets. Therefore, it should be possible to use the codec at a truly constant bit-rate, if needed.
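
To make the concern concrete, the sketch below shows the observable effect of constant bit-rate operation: every packet has the same size regardless of content. It does so by padding at the packetization stage, which is only an illustration; a codec would normally provide constant bit-rate as an encoder mode rather than by padding. The two-byte length prefix and the function names are assumptions made for this example.

```python
def pad_to_constant_size(frame, packet_size):
    """Wrap a variable-size frame in a fixed-size packet.

    Layout (hypothetical): 2-byte big-endian frame length, frame, zero padding.
    """
    if len(frame) > min(packet_size - 2, 0xFFFF):
        raise ValueError("frame does not fit in the fixed packet size")
    padding = bytes(packet_size - 2 - len(frame))
    return len(frame).to_bytes(2, "big") + frame + padding

def unpad(packet):
    """Recover the original frame from a fixed-size packet."""
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]

quiet = pad_to_constant_size(b"\x01" * 10, 80)
loud = pad_to_constant_size(b"\x02" * 70, 80)
```

After encryption, both packets have the same length on the wire, so packet size no longer reveals anything about the underlying signal.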

8. Acknowledgments

We would like to thank all the people who contributed directly or indirectly to this document, including Slava Borilin, Christopher Montgomery, Raymond (Juin-Hwey) Chen, Jason Fischl, Gregory Maxwell, Alan Duric, Jonathan Christensen, Julian Spittka, Michael Knappe, Christian Hoene, and Henry Sinnreich. We would also like to thank Cullen Jennings, Jonathan Rosenberg, and Gregory Lebovitz for their advice.

9. Informative References

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC6120] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, March 2011.

[XEP-0167] Ludwig, S., Saint-Andre, P., Egan, S., McQueen, R., and D. Cionoiu, "Jingle RTP Sessions", XSF XEP 0167, December 2009.

[RFC3951] Andersen, S., Duric, A., Astrom, H., Hagen, R., Kleijn, W., and J. Linden, "Internet Low Bit Rate Codec (iLBC)", RFC 3951, December 2004.

[ITU.G722.1] International Telecommunications Union, "Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss", ITU-T Recommendation G.722.1, May 2005.

[Speex] Xiph.Org Foundation, "Speex: http://www.speex.org/", 2003.

[carot09] Carot, A., Werner, C., and T. Fischinger, "Towards a Comprehensive Cognitive Analysis of Delay-Influenced Rhythmical Interaction: http://www.carot.de/icmc2009.pdf", 2009.

[PAYLOADS] Handley, M. and C. Perkins, "Guidelines for Writers of RTP Payload Format Specifications", BCP 36, RFC 2736, December 1999.

[RTP] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[SRTP] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[wright08] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. Masson, "Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations: http://www.cs.jhu.edu/~cwright/oakland08.pdf", 2008.

[white11] White, A., Matthews, A., Snow, K., and F. Monrose, "Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on fon-iks: http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf", 2011.

Authors' Addresses

   Jean-Marc Valin
   Mozilla
   650 Castro Street
   Mountain View, CA 94041
   USA

   EMail: jmvalin@jmvalin.ca

   Koen Vos
   Skype Technologies, S.A.
   Stadsgarden 6
   Stockholm, 11645
   Sweden

   EMail: koen.vos@skype.net
