Internet Engineering Task Force (IETF)                        M. Schmidt
Request for Comments: 6416                            Dolby Laboratories
Obsoletes: 3016                                               F. de Bont
Category: Standards Track                            Philips Electronics
ISSN: 2070-1721                                                S. Doehla
                                                          Fraunhofer IIS
                                                                  J. Kim
                                                     LG Electronics Inc.
                                                            October 2011
        

RTP Payload Format for MPEG-4 Audio/Visual Streams

Abstract

This document describes Real-time Transport Protocol (RTP) payload formats for carrying each of MPEG-4 Audio and MPEG-4 Visual bitstreams without using MPEG-4 Systems. This document obsoletes RFC 3016. It contains a summary of changes from RFC 3016 and discusses backward compatibility to RFC 3016. It is a necessary revision of RFC 3016 in order to correct misalignments with the 3GPP Packet-switched Streaming Service (PSS) specification regarding the RTP payload format for MPEG-4 Audio.

For the purpose of directly mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, this document provides specifications for the use of RTP header fields and also specifies fragmentation rules. It also provides specifications for Media Type registration and the use of the Session Description Protocol (SDP). The audio payload format described in this document has some limitations related to the signaling of audio codec parameters for the required multiplexing format. Therefore, new system designs should utilize RFC 3640, which does not have these restrictions. Nevertheless, this revision of RFC 3016 is provided to update and complete the specification and to enable interoperable implementations.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6416.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1.  Introduction
     1.1.  MPEG-4 Visual RTP Payload Format
     1.2.  MPEG-4 Audio RTP Payload Format
     1.3.  Interoperability with RFC 3016
     1.4.  Relation with RFC 3640
   2.  Definitions and Abbreviations
   3.  Clarifications on Specifying Codec Configurations for MPEG-4 Audio
   4.  LATM Restrictions for RTP Packetization of MPEG-4 Audio Bitstreams
   5.  RTP Packetization of MPEG-4 Visual Bitstreams
     5.1.  Use of RTP Header Fields for MPEG-4 Visual
     5.2.  Fragmentation of MPEG-4 Visual Bitstream
     5.3.  Examples of Packetized MPEG-4 Visual Bitstream
   6.  RTP Packetization of MPEG-4 Audio Bitstreams
     6.1.  RTP Packet Format
     6.2.  Use of RTP Header Fields for MPEG-4 Audio
     6.3.  Fragmentation of MPEG-4 Audio Bitstream
   7.  Media Type Registration for MPEG-4 Audio/Visual Streams
     7.1.  Media Type Registration for MPEG-4 Visual
     7.2.  Mapping to SDP for MPEG-4 Visual
       7.2.1.  Declarative SDP Usage for MPEG-4 Visual
     7.3.  Media Type Registration for MPEG-4 Audio
     7.4.  Mapping to SDP for MPEG-4 Audio
       7.4.1.  Declarative SDP Usage for MPEG-4 Audio
         7.4.1.1.  Example: In-Band Configuration
         7.4.1.2.  Example: 6 kbit/s CELP
         7.4.1.3.  Example: 64 kbit/s AAC LC Stereo
         7.4.1.4.  Example: Use of the "SBR-enabled" Parameter
         7.4.1.5.  Example: Hierarchical Signaling of SBR
         7.4.1.6.  Example: HE AAC v2 Signaling
         7.4.1.7.  Example: Hierarchical Signaling of PS
         7.4.1.8.  Example: MPEG Surround
         7.4.1.9.  Example: MPEG Surround with Extended SDP Parameters
         7.4.1.10. Example: MPEG Surround with Single-Layer Configuration
   8.  IANA Considerations
   9.  Acknowledgements
   10. Security Considerations
   11. Differences to RFC 3016
   12. References
     12.1. Normative References
     12.2. Informative References
        
1. Introduction

The RTP payload formats described in this document specify how MPEG-4 Audio [14496-3] and MPEG-4 Visual streams [14496-2] are to be fragmented and mapped directly onto RTP packets.

These RTP payload formats enable transport of MPEG-4 Audio/Visual streams without using the synchronization and stream management functionality of MPEG-4 Systems [14496-1]. Such RTP payload formats will be used in systems that have intrinsic stream management functionality and thus require no such functionality from MPEG-4 Systems. H.323 [H323] terminals are an example of such systems, where MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems Object Descriptors but by H.245 [H245]. The streams are directly mapped onto RTP packets without using the MPEG-4 Systems Sync Layer. Other examples are the Session Initiation Protocol (SIP) [RFC3261] and Real Time Streaming Protocol (RTSP) where media type and SDP are used. Media type and SDP usages of the RTP payload formats described in this document are defined to directly specify the attribute of Audio/Visual streams (e.g., media type, packetization format, and codec configuration) without using MPEG-4 Systems. The obvious benefit is that these MPEG-4 Audio/Visual RTP payload formats can be handled in a unified way together with those formats defined for non-MPEG-4 codecs. The disadvantage is that interoperability with environments using MPEG-4 Systems may be difficult; hence, other payload formats may be better suited to those applications.

The semantics of RTP headers in such cases need to be clearly defined, including the association with MPEG-4 Audio/Visual data elements. In addition, it is beneficial to define the fragmentation rules of RTP packets for MPEG-4 Video streams so as to enhance error resiliency by utilizing the error resiliency tools provided inside the MPEG-4 Video stream.

1.1. MPEG-4 Visual RTP Payload Format

MPEG-4 Visual is a visual coding standard with many features, including: high coding efficiency; high error resiliency; and multiple, arbitrary shape object-based coding [14496-2]. It covers a wide range of bitrates from scores of kbit/s to several Mbit/s. It also covers a wide variety of networks, ranging from those guaranteed to be almost error-free to mobile networks with high error rates.

Because MPEG-4 Visual is used over a wide variety of networks, the fragmentation rules for an MPEG-4 Visual bitstream defined in this document should not be overly restrictive; a rule such as "a single video packet shall always be mapped on a single RTP packet" may be inappropriate. On the other hand, careless, media-unaware fragmentation may degrade error resiliency and bandwidth efficiency. The fragmentation rules described in this document are therefore flexible, defining only the minimum rules needed to prevent meaningless fragmentation while still utilizing the error resiliency functionalities of MPEG-4 Visual.

The fragmentation rule "Different Video Object Planes (VOPs) SHOULD be fragmented into different RTP packets" is made so that the RTP timestamp uniquely indicates the VOP time framing. On the other hand, MPEG-4 video may generate VOPs of very small size, such as an empty VOP (vop_coded=0) containing only a VOP header or an arbitrarily shaped VOP with a small number of coding blocks. To reduce the overhead for such cases, the fragmentation rule permits concatenating multiple VOPs in an RTP packet. (See fragmentation rule (4) in Section 5.2 and the descriptions of marker bit and timestamp in Section 5.1.)

While the additional media-specific RTP header defined for such video coding tools as H.261 [H261] or MPEG-1/2 is effective in helping to recover picture headers corrupted by packet losses, MPEG-4 Visual already has error resiliency functionalities for recovering corrupt headers, and these can be used on RTP/IP networks as well as on other networks (H.223/mobile, MPEG-2 Transport Stream, etc.). Therefore, no extra RTP header fields are defined in this MPEG-4 Visual RTP payload format.

1.2. MPEG-4 Audio RTP Payload Format

MPEG-4 Audio is an audio standard that integrates many different types of audio coding tools. Low-overhead MPEG-4 Audio Transport Multiplex (LATM) manages the sequences of audio data with relatively small overhead. In audio-only applications, then, it is desirable for LATM-based MPEG-4 Audio bitstreams to be directly mapped onto RTP packets without using MPEG-4 Systems.

For MPEG-4 Audio coding tools, as is true for other audio coders, if the payload is a single audio frame, packet loss will not impair the decodability of adjacent packets. Therefore, the additional media-specific header for recovering errors will not be required for MPEG-4 Audio. Existing RTP protection mechanisms, such as Generic Forward Error Correction [RFC5109] and Redundant Audio Data [RFC2198], MAY be applied to improve error resiliency.

1.3. Interoperability with RFC 3016

This specification is not backward compatible with [RFC3016], as a binary-incompatible LATM version is mandated. Existing implementations of RFC 3016 that use a recent LATM version may already comply with this specification and must be considered not compliant with RFC 3016. The 3GPP PSS service [3GPP] is one such example, as a more recent LATM version is mandated in the 3GPP PSS specification. Existing implementations that use the LATM version as specified in RFC 3016 MUST be updated to comply with this specification.

1.4. Relation with RFC 3640

This document specifies a payload format for the transport of MPEG-4 Elementary Streams. For MPEG-4 Audio streams, "out-of-band" signaling is defined such that a receiver is not obliged to decode the payload data to determine the audio codec and its configuration. The signaling capabilities specified in this document are less explicit than those defined in [RFC3640]. However, the use of MPEG-4 LATM in various transmission standards justifies the continued existence of this payload format; see also Section 1.2.

2. Definitions and Abbreviations

This document makes use of terms specified in [14496-2], [14496-3], and [23003-1]. In addition, the following terms are used in this document and have a specific meaning within the context of this document.

Abbreviations:

AAC: Advanced Audio Coding

ASC: AudioSpecificConfig

HE AAC: High Efficiency AAC

LATM: Low-overhead MPEG-4 Audio Transport Multiplex

PS: Parametric Stereo

SBR: Spectral Band Replication

VOP: Video Object Plane

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Clarifications on Specifying Codec Configurations for MPEG-4 Audio

For MPEG-4 Audio [14496-3] streams, the decoder output configuration can differ from the core codec configuration, depending on the use of the SBR and PS tools.

The core codec sampling rate is the default audio codec sampling rate. When SBR is used, twice the core codec sampling rate is typically regarded as the definitive sampling rate (i.e., the decoder's output sampling rate).

Note: The exception is down-sampled SBR mode, in which case the SBR sampling rate and core codec sampling rate are identical.

The core codec channel configuration is the default audio codec channel configuration. When PS is used, the core codec channel configuration indicates one channel (i.e., mono), whereas the definitive channel configuration is two channels (i.e., stereo). When MPEG Surround is used, the definitive channel configuration depends on the output of the MPEG Surround decoder.
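
As a non-normative illustration, the relationship between the core codec configuration and the definitive output configuration can be sketched in Python. The function name, its parameters, and the ValueError raised for a non-mono core with PS are assumptions made for this example. MPEG Surround is left out because its output configuration depends on the MPEG Surround decoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputConfig:
    sampling_rate_hz: int
    channels: int


def definitive_output_config(core_rate_hz, core_channels, sbr=False,
                             downsampled_sbr=False, ps=False):
    """Derive the decoder output configuration from the core codec one.

    Hypothetical helper: it only encodes the rules stated in this section.
    """
    if ps and core_channels != 1:
        # Assumption of this sketch: with PS the core codec signals mono.
        raise ValueError("PS expects a mono core codec configuration")

    rate = core_rate_hz
    if sbr and not downsampled_sbr:
        # Typical SBR case: output rate is twice the core codec rate.
        rate = core_rate_hz * 2
    # Down-sampled SBR mode: SBR rate and core codec rate are identical.

    channels = 2 if ps else core_channels
    return OutputConfig(rate, channels)
```

For instance, a 24 kHz mono core with SBR and PS yields a 48 kHz stereo output under these rules, while a core stream that uses neither tool keeps its core configuration.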

4. LATM Restrictions for RTP Packetization of MPEG-4 Audio Bitstreams

LATM has several multiplexing features as follows:

o carrying configuration information with audio data,

o concatenating multiple audio frames in one audio stream,

o multiplexing multiple objects (programs), and

o multiplexing scalable layers.

However, in RTP transmission, there is no need for the last two features. Therefore, these two features MUST NOT be used in applications based on RTP packetization specified by this document. Since LATM has been developed for only natural audio coding tools, i.e., not for synthesis tools, it seems difficult to transmit Structured Audio (SA) data and Text-to-Speech Interface (TTSI) data by LATM. Therefore, SA data and TTSI data MUST NOT be transported by the RTP packetization in this document.

For transmission of scalable streams, audio data of each layer SHOULD be packetized onto different RTP streams allowing for the different layers to be treated differently at the IP level, for example, via some means of differentiated service. On the other hand, all configuration data of the scalable streams are contained in one LATM configuration data "StreamMuxConfig", and every scalable layer shares the StreamMuxConfig. The mapping between each layer and its configuration data is achieved by LATM header information attached to the audio data. In order to indicate the dependency information of the scalable streams, the signaling mechanism as specified in [RFC5583] SHOULD be used (see Section 6.2).

5. RTP Packetization of MPEG-4 Visual Bitstreams

This section specifies RTP packetization rules for MPEG-4 Visual content. An MPEG-4 Visual bitstream is mapped directly onto RTP packets without the addition of extra header fields or any removal of Visual syntax elements. The Combined Configuration/Elementary stream mode MUST be used so that configuration information will be carried to the same RTP port as the elementary stream. (See Subclause 6.2.1, "Start codes", of [14496-2].) The configuration information MAY additionally be specified by some out-of-band means. If needed by systems using media type parameters and SDP parameters, e.g., SIP and RTSP, the optional parameter "config" MUST be used to specify the configuration information (see Sections 7.1 and 7.2).

When the short video header mode is used, the RTP payload format for H.263 SHOULD be used. (The format defined in [RFC4629] is RECOMMENDED, but the [RFC4628] format MAY be used for compatibility with older implementations.)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           | Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               | RTP
|       MPEG-4 Visual stream (byte aligned)                     | Pay-
|                                                               | load
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 1: An RTP Packet for MPEG-4 Visual Stream
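
The fixed part of the header shown in Figure 1 can be illustrated with a short, non-normative Python sketch. The function names and the returned dictionary layout are assumptions made for this example; the bit positions follow the figure. The sketch sets P and X to 0 when packing and does not skip a header extension when parsing.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, sequence_number, timestamp, ssrc,
                    marker=False, csrcs=()):
    """Build the RTP header of Figure 1 (P=0, X=0) as bytes."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field, at most 15 CSRCs")
    byte0 = (RTP_VERSION << 6) | len(csrcs)       # V=2, P=0, X=0, CC
    byte1 = (int(bool(marker)) << 7) | payload_type  # M, PT
    header = struct.pack("!BBHII", byte0, byte1,
                         sequence_number & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    for csrc in csrcs:
        header += struct.pack("!I", csrc & 0xFFFFFFFF)
    return header


def parse_rtp_header(packet):
    """Decode the fixed header fields; the payload starts at payload_offset."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = byte0 & 0x0F
    payload_offset = 12 + 4 * csrc_count  # ignores any header extension
    if len(packet) < payload_offset:
        raise ValueError("packet shorter than its CSRC list")
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "payload_offset": payload_offset,
    }
```

A sender would append the byte-aligned MPEG-4 Visual data directly after the bytes returned by pack_rtp_header(), since this payload format defines no extra header fields.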

5.1. Use of RTP Header Fields for MPEG-4 Visual

Payload Type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding, or if that is not done, then a payload type in the dynamic range SHALL be chosen by means of an out-of-band signaling protocol (e.g., H.245, SIP).

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: Incremented by 1 for each RTP data packet sent, starting, for security reasons, with a random initial value.

Marker (M) bit: The marker bit is set to 1 to indicate the last RTP packet (or only RTP packet) of a VOP. When multiple VOPs are carried in the same RTP packet, the marker bit is set to 1.

Timestamp: The timestamp indicates the sampling instant of the VOP contained in the RTP packet. A constant random offset is added for security reasons.

o When multiple VOPs are carried in the same RTP packet, the timestamp indicates the earliest of the VOP times within the VOPs carried in the RTP packet. Timestamp information of the rest of the VOPs is derived from the timestamp fields in the VOP header (modulo_time_base and vop_time_increment).

o If the RTP packet contains only configuration information and/or Group_of_VideoObjectPlane() fields, the timestamp of the next VOP in the coding order is used.

o If the RTP packet contains only visual_object_sequence_end_code information, the timestamp of the immediately preceding VOP in the coding order is used.

The resolution of the timestamp is set to its default value of 90 kHz, unless specified otherwise by out-of-band means (e.g., SDP parameter or media type parameter as defined in Section 7).

Other header fields are used as described in [RFC3550].
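
The marker bit and timestamp rules of this section can be sketched as follows. This is a non-normative illustration: the function names are hypothetical, VOP times are assumed to be available to the sender as seconds, and the 90 kHz default clock is used unless the caller passes another rate.

```python
import secrets

VIDEO_CLOCK_HZ = 90_000  # default timestamp resolution for this format


def random_timestamp_offset():
    """Pick the constant random offset once per RTP stream."""
    return secrets.randbits(32)


def rtp_timestamp(vop_times_s, offset, clock_hz=VIDEO_CLOCK_HZ):
    """Timestamp for a packet carrying data of the given VOP time(s).

    With several VOPs in one packet, the earliest VOP time is used.
    The result wraps around at 32 bits, like the RTP timestamp field.
    """
    if not vop_times_s:
        raise ValueError("at least one VOP time is required")
    ticks = round(min(vop_times_s) * clock_hz)
    return (ticks + offset) & 0xFFFFFFFF


def marker_bit(carries_end_of_vop, vop_count=1):
    """1 for the last (or only) packet of a VOP or for a multi-VOP packet."""
    return 1 if (carries_end_of_vop or vop_count > 1) else 0
```

A packet that carries only configuration information would be stamped by passing the time of the next VOP in coding order, and a packet carrying only visual_object_sequence_end_code by passing the time of the immediately preceding VOP.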

5.2. Fragmentation of MPEG-4 Visual Bitstream

A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP payload without any addition of extra header fields or any removal of Visual syntax elements.

In the following, header means one of the following:

o Configuration information (Visual Object Sequence Header, Visual Object Header, and Video Object Layer Header)

o visual_object_sequence_end_code

o The header of the entry point function for an elementary stream (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), video_plane_with_short_header(), MeshObject(), or FaceObject())

o The video packet header (video_packet_header() excluding next_resync_marker())

o The header of gob_layer()

See Subclause 6.2.1 ("Start codes") of [14496-2] for the definition of the configuration information and the entry point functions.

The Combined Configuration/Elementary streams mode is used. The following rules apply for the fragmentation.

(1) Configuration information and Group_of_VideoObjectPlane() fields SHALL be placed at the beginning of the RTP payload (just after the RTP header) or just after the header of the syntactically upper-layer function.


(2) If one or more headers exist in the RTP payload, the RTP payload SHALL begin with the header of the syntactically highest function. Note: The visual_object_sequence_end_code is regarded as the lowest function.


(3) A header SHALL NOT be split into a plurality of RTP packets.


(4) Different VOPs SHOULD be fragmented into different RTP packets so that one RTP packet consists of the data bytes associated with a unique VOP time instance (that is indicated in the timestamp field in the RTP packet header), with the exception that multiple consecutive VOPs MAY be carried within one RTP packet in the decoding order if the size of the VOPs is small.


Note: When multiple VOPs are carried in one RTP payload, the timestamp of the VOPs after the first one may be calculated by the decoder. This operation is necessary only for RTP packets in which the marker bit equals 1 and the beginning of the RTP payload corresponds to a start code. (See the descriptions of timestamp and marker bit in Section 5.1.)

(5) It is RECOMMENDED that a single video packet is sent as a single RTP packet. The size of a video packet SHOULD be adjusted in such a way that the resulting RTP packet is not larger than the Path MTU. If the video packet is disabled by the coder configuration (by setting resync_marker_disable in the VOL header to 1), or in coding tools where the video packet is not supported, a VOP MAY be split at arbitrary byte positions.


The video packet starts with the VOP header or the video packet header, followed by motion_shape_texture(), and ends with next_resync_marker() or next_start_code().
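As an illustration of rule (5) for the case where video packets are disabled, the following Python sketch splits one VOP at arbitrary byte positions while keeping the VOP header whole, as rule (3) requires. The function name split_vop and the parameters header_len and max_payload are hypothetical and not part of this specification. The sketch assumes the caller already knows the header length, and it sets the marker flag on the last fragment only, following the marker bit description in Section 5.1.

```python
from typing import List, Tuple


def split_vop(vop: bytes, header_len: int, max_payload: int) -> List[Tuple[bytes, bool]]:
    """Split one VOP into (payload, marker) pairs of at most max_payload bytes.

    header_len is the size of the VOP header at the start of `vop`
    (assumed known by the caller). The header always lands in the first
    fragment, so it is never split across RTP packets.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if header_len > max_payload:
        raise ValueError("VOP header does not fit in a single RTP payload")
    if not vop:
        return []
    chunks = [vop[i:i + max_payload] for i in range(0, len(vop), max_payload)]
    # Marker is True only for the final fragment of the VOP.
    return [(chunk, index == len(chunks) - 1) for index, chunk in enumerate(chunks)]
```

In practice, max_payload would be derived from the Path MTU minus the IP, UDP, and RTP header sizes.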


5.3. Examples of Packetized MPEG-4 Visual Bitstream

Figure 2 shows examples of RTP packets generated based on the criteria described in Section 5.2.

(a) is an example of the first RTP packet or the random access point of an MPEG-4 Visual bitstream containing the configuration information. According to criterion (1), the Visual Object Sequence Header (VS header) is placed at the beginning of the RTP payload, preceding the Visual Object Header and the Video Object Layer Header (VO header, VOL header). Since the fragmentation rule defined in Section 5.2 guarantees that the configuration information, starting with visual_object_sequence_start_code, is always placed at the beginning of the RTP payload, RTP receivers can detect the random access point by checking if the first 32-bit field of the RTP payload is visual_object_sequence_start_code.


(b) is another example of the RTP packet containing the configuration information. It differs from example (a) in that the RTP packet also contains a VOP header and a video packet in the VOP following the configuration information. Since the length of the configuration information is relatively short (typically scores of bytes) and an RTP packet containing only the configuration information may thus increase the overhead, the configuration information and the subsequent VOP can be packetized into a single RTP packet.


(c) is an example of an RTP packet that contains Group_of_VideoObjectPlane (GOV). Following criterion (1), the GOV is placed at the beginning of the RTP payload. It would be a waste of RTP/IP header overhead to generate an RTP packet containing only a GOV whose length is 7 bytes. Therefore, the following VOP (or a part of it) can be placed in the same RTP packet as shown in (c).


(d) is an example of the case where one video packet is packetized into one RTP packet. When the packet-loss rate of the underlying network is high, this kind of packetization is recommended. Even when the RTP packet containing the VOP header is discarded by a packet loss, the other RTP packets can be decoded by using the HEC (Header Extension Code) information in the video packet header. No extra RTP header field is necessary.


(e) is an example of the case where more than one video packet is packetized into one RTP packet. This kind of packetization is effective in saving RTP/IP header overhead when the bitrate of the underlying network is low. However, it decreases the packet-loss resiliency because multiple video packets are discarded by a single RTP packet loss. The optimal number of video packets in an RTP packet and the length of the RTP packet can be determined by considering the packet-loss rate and the bitrate of the underlying network.

(f) is an example of the case when the video packet is disabled by setting resync_marker_disable in the VOL header to 1. In this case, a VOP may be split into a plurality of RTP packets at arbitrary byte positions. For example, it is possible to split a VOP into fixed-length packets. This kind of coder configuration and RTP packet fragmentation may be used when the underlying network is guaranteed to be error-free.
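The random access point check described for example (a) can be sketched as follows. This is only an illustration. The function name is hypothetical, and the start code value 0x000001B0 is taken from the beginning of the "config" examples in Section 7.2.1.

```python
# visual_object_sequence_start_code as a 32-bit, big-endian field.
VISUAL_OBJECT_SEQUENCE_START_CODE = bytes.fromhex("000001B0")


def is_random_access_point(rtp_payload: bytes) -> bool:
    """Return True if the payload begins with visual_object_sequence_start_code."""
    return rtp_payload[:4] == VISUAL_OBJECT_SEQUENCE_START_CODE
```

A receiver joining a session mid-stream could discard payloads until this check succeeds and start decoding from that packet.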


Figure 3 shows examples of RTP packets prohibited by the criteria of Section 5.2.


Fragmentation of a header into multiple RTP packets, as in Figure 3(a), will not only increase the overhead of RTP/IP headers but also decrease the error resiliency. Therefore, it is prohibited by criterion (3).


When concatenating more than one video packet into an RTP packet, the VOP header or video_packet_header() is not allowed to be placed in the middle of the RTP payload. The packetization as in Figure 3(b) is not allowed by criterion (2) for reasons of error resiliency. Comparing this example with Figure 2(d), although two video packets are mapped onto two RTP packets in both cases, the packet-loss resiliency is not identical. Namely, if the second RTP packet is lost, both video packets 1 and 2 are lost in the case of Figure 3(b), whereas only video packet 2 is lost in the case of Figure 2(d).

    +------+------+------+------+
(a) | RTP  |  VS  |  VO  | VOL  |
    |header|header|header|header|
    +------+------+------+------+
        
    +------+------+------+------+------+------------+
(b) | RTP  |  VS  |  VO  | VOL  | VOP  |Video Packet|
    |header|header|header|header|header|            |
    +------+------+------+------+------+------------+
        
    +------+-----+------------------+
(c) | RTP  | GOV |Video Object Plane|
    |header|     |                  |
    +------+-----+------------------+
        
    +------+------+------------+  +------+------+------------+
(d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
    |header|header|    (1)     |  |header|header|    (2)     |
    +------+------+------------+  +------+------+------------+
        
    +------+------+------------+------+------------+------+------------+
(e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
    |header|header|     (1)    |header|    (2)     |header|    (3)     |
    +------+------+------------+------+------------+------+------------+
        
    +------+------+------------+  +------+------------+
(f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
    |header|header|    (1)     |  |header|    (2)     | . . .
    +------+------+------------+  +------+------------+
        

Figure 2: Examples of RTP Packetized MPEG-4 Visual Bitstream


      +------+-------------+  +------+------------+------------+
  (a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
      |header|  VP header  |  |header|  VP header |            |
      +------+-------------+  +------+------------+------------+
        
      +------+------+----------+  +------+---------+------+------------+
  (b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
      |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
      +------+------+----------+  +------+---------+------+------------+
        

Figure 3: Examples of Prohibited RTP Packetization for MPEG-4 Visual


6. RTP Packetization of MPEG-4 Audio Bitstreams

This section specifies RTP packetization rules for MPEG-4 Audio bitstreams. MPEG-4 Audio streams MUST be formatted as LATM (Low-overhead MPEG-4 Audio Transport Multiplex) [14496-3] streams, and the LATM-based streams are then mapped onto RTP packets as described in the sections below.

6.1. RTP Packet Format

LATM-based streams consist of a sequence of audioMuxElements that include one or more PayloadMux elements that carry the audio frames. A complete audioMuxElement or a part of one SHALL be mapped directly onto an RTP payload without any removal of audioMuxElement syntax elements (see Figure 4). The first byte of each audioMuxElement SHALL be located at the first payload location in an RTP packet.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               |RTP
:                 audioMuxElement (byte aligned)                :Payload
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 4 - An RTP packet for MPEG-4 Audio
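The layout in Figure 4 can be read with a few lines of Python. The sketch below is an illustration only. It parses the fixed RTP header as defined in [RFC3550], skips any CSRC identifiers and header extension, removes the optional RTP padding, and returns the remaining bytes, which carry the audioMuxElement or a fragment of it. The names RtpPacket and parse_rtp are hypothetical.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload: bytes  # audioMuxElement bytes (or a fragment), padding removed


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse one RTP packet laid out as in Figure 4."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)          # skip CSRC identifiers
    if b0 & 0x10:                           # X bit: header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:                           # P bit: last octet holds the padding count
        if end <= offset:
            raise ValueError("padding flag set on empty payload")
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad
    if offset > end:
        raise ValueError("truncated packet")
    return RtpPacket(bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, packet[offset:end])
```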


In order to decode the audioMuxElement, the following muxConfigPresent information is required to be indicated by out-of-band means. When SDP is utilized for this indication, the media type parameter "cpresent" corresponds to the muxConfigPresent information (see Section 7.3). The following restrictions apply:


o In the out-of-band configuration case, the number of PayloadMux elements contained in each audioMuxElement can only be set once. If more than one PayloadMux element is contained in each audioMuxElement, special care is required to ensure that the last RTP packet remains decodable.


o To construct the audioMuxElement in the in-band configuration case, non-octet-aligned configuration data is inserted immediately before the one or more PayloadMux elements. Since the generation of RTP payloads with non-octet-aligned data is not possible with RTP hint tracks, as defined by the MP4 file format [14496-12] [14496-14], this document does not support RTP hint tracks for the in-band configuration case.


muxConfigPresent: If this value is set to 1 (in-band mode), the audioMuxElement SHALL include an indication bit "useSameStreamMux" and MAY include the configuration information for audio compression "StreamMuxConfig". The useSameStreamMux bit indicates whether the StreamMuxConfig element in the previous frame is applied in the current frame. If the useSameStreamMux bit indicates to use the StreamMuxConfig from the previous frame, but if the previous frame has been lost, the current frame may not be decodable. Therefore, in case of in-band mode, the StreamMuxConfig element SHOULD be transmitted repeatedly depending on the network condition. On the other hand, if muxConfigPresent is set to 0 (out-of-band mode), the StreamMuxConfig element is required to be transmitted by an out-of-band means. In case of SDP, the media type parameter "config" is utilized (see Section 7.3).
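The in-band dependency on the previous frame can be sketched as follows. This is a simplified illustration, not a LATM parser. It assumes that, with muxConfigPresent set to 1, useSameStreamMux is the first (most significant) bit of the audioMuxElement, which is how the author of this sketch reads the syntax in [14496-3]. It also applies a conservative, hypothetical policy in which any packet loss invalidates the stored configuration until a frame carrying StreamMuxConfig arrives. The names used are not part of this specification.

```python
def use_same_stream_mux(audio_mux_element: bytes) -> bool:
    """Read the assumed leading useSameStreamMux bit of an in-band audioMuxElement."""
    if not audio_mux_element:
        raise ValueError("empty audioMuxElement")
    return bool(audio_mux_element[0] & 0x80)


class InBandConfigTracker:
    """Track whether a usable StreamMuxConfig is known (in-band mode only)."""

    def __init__(self) -> None:
        self.have_config = False

    def on_loss(self) -> None:
        # The lost frame may have carried a new StreamMuxConfig, so the
        # stored one can no longer be trusted.
        self.have_config = False

    def can_decode(self, audio_mux_element: bytes) -> bool:
        if not use_same_stream_mux(audio_mux_element):
            # The element carries its own StreamMuxConfig.
            self.have_config = True
        return self.have_config
```

This is why periodic retransmission of StreamMuxConfig shortens the interval during which a receiver cannot decode after a loss.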


6.2. Use of RTP Header Fields for MPEG-4 Audio

Payload Type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding, or if that is not done, then a payload type in the dynamic range shall be chosen by means of an out-of-band signaling protocol (e.g., H.245, SIP). In the dynamic assignment of RTP payload types for scalable streams, the server SHALL assign a different value to each layer. The dependency relationships between the enhanced layer and the base layer MUST be signaled as specified in [RFC5583]. An example of the use of such signaling for scalable audio streams can be found in [RFC5691].

Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It is set to 1 to indicate that the RTP packet contains a complete audioMuxElement or the last fragment of an audioMuxElement.


Timestamp: The timestamp indicates the sampling instance of the first audio frame contained in the RTP packet. Timestamps are RECOMMENDED to start at a random value for security reasons.


Unless specified by an out-of-band means, the resolution of the timestamp is set to its default value of 90 kHz.


Sequence Number: Incremented by 1 for each RTP packet sent, starting, for security reasons, with a random value.


Other header fields are used as described in [RFC3550].
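A sender-side sketch of the timestamp and sequence number handling described above is given below. It is an illustration under stated assumptions. The class name RtpCounters and its methods are hypothetical, random initial values are drawn with Python's secrets module, and the timestamp is derived from a sample index using integer arithmetic.

```python
import secrets


class RtpCounters:
    """Generate RTP sequence numbers and timestamps with random starting values."""

    def __init__(self, clock_rate: int = 90000) -> None:
        self.clock_rate = clock_rate          # default timestamp resolution of 90 kHz
        self.seq = secrets.randbits(16)       # random initial sequence number
        self.ts_base = secrets.randbits(32)   # random initial timestamp

    def next_sequence(self) -> int:
        """Return the current sequence number, then increment it modulo 2^16."""
        value = self.seq
        self.seq = (self.seq + 1) & 0xFFFF
        return value

    def timestamp_at(self, sample_index: int, sampling_rate: int) -> int:
        """Timestamp of the audio frame whose first sample is sample_index."""
        ticks = sample_index * self.clock_rate // sampling_rate
        return (self.ts_base + ticks) & 0xFFFFFFFF
```

When the clock rate is signaled as equal to the audio sampling rate, the tick count is simply the sample index.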


6.3. Fragmentation of MPEG-4 Audio Bitstream

It is RECOMMENDED to put one audioMuxElement in each RTP packet. If the size of an audioMuxElement can be kept small enough that the size of the RTP packet containing it does not exceed the size of the Path MTU, this will be no problem. If it cannot, the audioMuxElement SHALL be fragmented and spread across multiple packets.
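On the receiving side, fragmented audioMuxElements can be put back together using the marker bit defined in Section 6.2. The following sketch is an illustration only. The function name reassemble is hypothetical. It assumes packets are already ordered by sequence number and that the first packet starts an audioMuxElement, and it conservatively discards data after a sequence gap until the next marker bit, since a receiver cannot tell whether the packet following a gap starts a new element.

```python
from typing import Iterable, List, Tuple


def reassemble(packets: Iterable[Tuple[int, bool, bytes]]) -> List[bytes]:
    """Rebuild audioMuxElements from (sequence_number, marker, payload) tuples."""
    elements: List[bytes] = []
    buffer = bytearray()
    expected = None
    damaged = False
    for seq, marker, payload in packets:
        if expected is not None and seq != expected:
            # Sequence gap: the element in progress cannot be trusted.
            buffer.clear()
            damaged = True
        expected = (seq + 1) & 0xFFFF
        if not damaged:
            buffer += payload
        if marker:
            # Marker set: complete element or last fragment of one.
            if not damaged:
                elements.append(bytes(buffer))
            buffer.clear()
            damaged = False
    return elements
```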


7. Media Type Registration for MPEG-4 Audio/Visual Streams

The following sections describe the media type registrations for MPEG-4 Audio/Visual streams, which are registered in accordance with [RFC4855] and use the template of [RFC4288]. Media type registration and SDP usage for the MPEG-4 Visual stream are described in Sections 7.1 and 7.2, respectively, while media type registration and SDP usage for MPEG-4 Audio stream are described in Sections 7.3 and 7.4, respectively.


7.1. Media Type Registration for MPEG-4 Visual

The receiver MUST ignore any unspecified parameter in order to ensure that additional parameters can be added in any future revision of this specification.


Type name: video


Subtype name: MP4V-ES


Required parameters: none


Optional parameters:


"rate": This parameter is used only for RTP transport. It indicates the resolution of the timestamp field in the RTP header. If this parameter is not specified, its default value of 90000 (90 kHz) is used.


"profile-level-id": A decimal representation of MPEG-4 Visual Profile and Level indication value (profile_and_level_indication) defined in Table G-1 of [14496-2]. This parameter MAY be used in the capability exchange or session setup procedure to indicate the MPEG-4 Visual Profile and Level combination of which the MPEG-4 Visual codec is capable. If this parameter is not specified by the procedure, its default value of 1 (Simple Profile/Level 1) is used.


"config": This parameter SHALL be used to indicate the configuration of the corresponding MPEG-4 Visual bitstream. It SHALL NOT be used to indicate the codec capability in the capability exchange procedure. It is a hexadecimal representation of an octet string that expresses the MPEG-4 Visual configuration information, as defined in Subclause 6.2.1 ("Start codes") of [14496-2]. The configuration information is mapped onto the octet string most significant bit (MSB) first. The first bit of the configuration information SHALL be located at the MSB of the first octet. The configuration information indicated by this parameter SHALL be the same as the configuration information in the corresponding MPEG-4 Visual stream, except for first_half_vbv_occupancy and latter_half_vbv_occupancy (if they exist), which may vary in the repeated configuration information inside an MPEG-4 Visual stream. (See Subclause 6.2.1, "Start codes", of [14496-2].)


Published specification:


The specifications for MPEG-4 Visual streams are presented in [14496-2]. The RTP payload format is described in [RFC6416].


Encoding considerations:


Video bitstreams MUST be generated according to MPEG-4 Visual specifications [14496-2]. A video bitstream is binary data and MUST be encoded for non-binary transport (for email, the Base64 encoding is sufficient). This type is also defined for transfer via RTP. The RTP packets MUST be packetized according to the MPEG-4 Visual RTP payload format defined in [RFC6416].


Security considerations:


See Section 10 of [RFC6416].


Interoperability considerations:


MPEG-4 Visual provides a large and rich set of tools for the coding of visual objects. For effective implementation of the standard, subsets of the MPEG-4 Visual tool sets have been provided for use in specific applications. These subsets, called 'Profiles', limit the size of the tool set a decoder is required to implement. In order to restrict computational complexity, one or more Levels are set for each Profile. A Profile@Level combination allows:


* a codec builder to implement only the subset of the standard he needs, while maintaining interworking with other MPEG-4 devices included in the same combination, and


* checking whether MPEG-4 devices comply with the standard ('conformance testing').


The visual stream SHALL be compliant with the MPEG-4 Visual Profile@Level specified by the parameter "profile-level-id". Interoperability between a sender and a receiver may be achieved by specifying the parameter "profile-level-id" or by arranging a capability exchange/announcement procedure for this parameter.


Applications that use this media type:


Audio and visual streaming and conferencing tools


Additional information: none


Person and email address to contact for further information:


See Authors' Addresses section at the end of [RFC6416].


Intended usage: COMMON


Author:


See Authors' Addresses section at the end of [RFC6416].


Change controller:


IETF Audio/Video Transport Payloads working group delegated from the IESG.


7.2. Mapping to SDP for MPEG-4 Visual

The media type video/MP4V-ES string is mapped to fields in SDP [RFC4566], as follows:


o The media type (video) goes in SDP "m=" as the media name.


o The Media subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding name.


o The optional parameter "rate" goes in "a=rtpmap" as the "clock rate".


o The optional parameters "profile-level-id" and "config" go in the "a=fmtp" line to indicate the coder capability and configuration, respectively. These parameters are expressed as a string, in the form of a semicolon-separated list of parameter=value pairs.

      Example usages for the "profile-level-id" parameter are:
      1  : MPEG-4 Visual Simple Profile/Level 1
      34 : MPEG-4 Visual Core Profile/Level 2
      145: MPEG-4 Visual Advanced Real Time Simple Profile/Level 1
        
7.2.1. Declarative SDP Usage for MPEG-4 Visual

The following are some examples of media representations in SDP:


   Simple Profile/Level 1, rate=90000(90 kHz), "profile-level-id" and
   "config" are present in "a=fmtp" line:
     m=video 49170/2 RTP/AVP 98
     a=rtpmap:98 MP4V-ES/90000
     a=fmtp:98 profile-level-id=1;config=000001B001000001B50900000100000
        00120008440FA282C2090A21F
        
   Core Profile/Level 2, rate=90000(90 kHz), "profile-level-id" is
   present in "a=fmtp" line:
     m=video 49170/2 RTP/AVP 98
     a=rtpmap:98 MP4V-ES/90000
     a=fmtp:98 profile-level-id=34
        
   Advanced Real Time Simple Profile/Level 1, rate=90000(90 kHz),
   "profile-level-id" is present in "a=fmtp" line:
     m=video 49170/2 RTP/AVP 98
     a=rtpmap:98 MP4V-ES/90000
     a=fmtp:98 profile-level-id=145
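The "a=fmtp" lines above can be read by a receiver along the following lines. This Python sketch is an illustration only. The function names parse_fmtp and mp4v_es_settings are hypothetical. The sketch treats parameter names as case-sensitive for simplicity, applies the default "profile-level-id" of 1 when the parameter is absent, and decodes "config" from its hexadecimal form into octets.

```python
from typing import Dict, Optional, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an SDP a=fmtp line into (payload type, parameter dictionary)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, params = line[len(prefix):].partition(" ")
    result: Dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without a value: " + item)
        result[name.strip()] = value.strip()
    return int(fmt), result


def mp4v_es_settings(params: Dict[str, str]) -> Tuple[int, Optional[bytes]]:
    """Return (profile-level-id, config octets or None) for video/MP4V-ES."""
    profile = int(params.get("profile-level-id", "1"))  # default: Simple Profile/Level 1
    config = bytes.fromhex(params["config"]) if "config" in params else None
    return profile, config


# Usage with the first example above (the wrapped config value is rejoined).
example = ("a=fmtp:98 profile-level-id=1;config="
           "000001B001000001B50900000100000"
           "00120008440FA282C2090A21F")
payload_type, fmtp_params = parse_fmtp(example)
profile_level_id, config_octets = mp4v_es_settings(fmtp_params)
```

Unknown parameters are kept in the dictionary and otherwise ignored, which is in line with the requirement in Section 7.1 that receivers ignore unspecified parameters.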
        
7.3. Media Type Registration for MPEG-4 Audio

The receiver MUST ignore any unspecified parameter, to ensure that additional parameters can be added in any future revision of this specification.


Type name: audio


Subtype name: MP4A-LATM


Required parameters:


"rate": the "rate" parameter indicates the RTP timestamp "clock rate". The default value is 90000. Other rates MAY be indicated only if they are set to the same value as the audio sampling rate (number of samples per second).


In the presence of SBR, the sampling rates for the core encoder/ decoder and the SBR tool are different in most cases. Therefore, this parameter SHALL NOT be considered as the definitive sampling rate. If this parameter is used, the server must follow the rules below:


* When the presence of SBR is not explicitly signaled by the optional SDP parameters such as "object", "profile-level-id", or "config", this parameter SHALL be set to the core codec sampling rate.


* When the presence of SBR is explicitly signaled by the optional SDP parameters such as "object", "profile-level-id", or "config", this parameter SHALL be set to the SBR sampling rate.


NOTE: The optional parameter "SBR-enabled" in SDP "a=fmtp" is useful for implicit HE AAC / HE AAC v2 signaling. But the "SBR-enabled" parameter can also be used in the case of explicit HE AAC / HE AAC v2 signaling. Therefore, its existence (in itself) is not the criterion to determine whether HE AAC / HE AAC v2 signaling is explicit or not.
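The two rules for the "rate" parameter can be summarized in a short sketch. It is illustrative only. The function name and its arguments are hypothetical, and the caller is assumed to have already determined from "object", "profile-level-id", or "config" whether SBR is explicitly signaled.

```python
from typing import Optional


def latm_rate_parameter(core_rate: int, sbr_rate: Optional[int], sbr_explicit: bool) -> int:
    """Choose the "rate" value when it is tied to the audio sampling rate.

    core_rate:    sampling rate of the core codec, in Hz
    sbr_rate:     sampling rate of the SBR tool in Hz, or None if SBR is unused
    sbr_explicit: True if SBR is explicitly signaled by the optional SDP parameters
    """
    if sbr_explicit:
        if sbr_rate is None:
            raise ValueError("explicit SBR signaling requires an SBR sampling rate")
        return sbr_rate
    # Implicit or absent SBR signaling: use the core codec sampling rate.
    return core_rate
```

For example, a stream with a 24000 Hz core and a 48000 Hz SBR output would use 24000 under implicit signaling and 48000 under explicit signaling.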


Optional parameters:


"profile-level-id": a decimal representation of MPEG-4 Audio Profile Level indication value defined in [14496-3]. This parameter indicates which MPEG-4 Audio tool subsets the decoder is capable of using. If this parameter is not specified in the capability exchange or session setup procedure, its default value of 30 (Natural Audio Profile/Level 1) is used.


"MPS-profile-level-id": a decimal representation of the MPEG Surround Profile Level indication as defined in [14496-3]. This parameter indicates the MPEG Surround profile and level that a decoder must support to be capable of decoding the stream.

"object": a decimal representation of the MPEG-4 Audio Object Type value defined in [14496-3]. This parameter specifies the tool to be used by the decoder. It CAN be used to limit the capability within the specified "profile-level-id".

“对象”:在[14496-3]中定义的MPEG-4音频对象类型值的十进制表示。此参数指定解码器要使用的工具。它可用于将功能限制在指定的“配置文件级别id”内。

"bitrate": the data rate for the audio bitstream.

"cpresent": a boolean parameter that indicates whether audio payload configuration data has been multiplexed into an RTP payload (see Section 6.1). A 0 indicates the configuration data has not been multiplexed into an RTP payload, and in that case, the "config" parameter MUST be present; a 1 indicates that it has been multiplexed. The default if the parameter is omitted is 1. If this parameter is set to 1 and the "config" parameter is present, the multiplexed configuration data and the value of the "config" parameter SHALL be consistent.

"config": a hexadecimal representation of an octet string that expresses the audio payload configuration data "StreamMuxConfig", as defined in [14496-3]. Configuration data is mapped onto the octet string on an MSB-first basis. The first bit of the configuration data SHALL be located at the MSB of the first octet. In the last octet, zero-padding bits, if necessary, SHALL follow the configuration data. Senders MUST set the StreamMuxConfig elements taraBufferFullness and latmBufferFullness to their largest respective value, indicating that buffer fullness measures are not used in SDP. Receivers MUST ignore the value of these two elements contained in the "config" parameter.

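The MSB-first mapping of "config" and the dependency between "cpresent" and "config" can be sketched as follows. This is a minimal Python illustration, not part of the payload format: the helper names bits_to_config_hex and effective_cpresent, and the use of a plain dict for the parsed "a=fmtp" parameters, are assumptions made for this example. The bit fields in the usage lines are illustrative values chosen so that the result matches the "config" string of the AAC LC example in Section 7.4.1.3.

```python
from typing import Dict


def bits_to_config_hex(bits: str) -> str:
    """Pack a '0'/'1' string MSB-first into hex, zero-padding the last octet."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    padded = bits + "0" * (-len(bits) % 8)  # zero-padding bits follow the data
    octets = [int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)]
    return "".join(f"{octet:02x}" for octet in octets)


def effective_cpresent(fmtp: Dict[str, str]) -> int:
    """Return cpresent (default 1); cpresent=0 requires a "config" parameter."""
    raw = fmtp.get("cpresent", "1")
    if raw not in ("0", "1"):
        raise ValueError("cpresent must be 0 or 1")
    if raw == "0" and "config" not in fmtp:
        raise ValueError('cpresent=0 requires the "config" parameter')
    return int(raw)


# Illustrative 44-bit configuration, written field by field (assumed layout).
fields = ["0", "1", "000000", "0000", "000",   # multiplex header fields
          "00010", "0110", "0010", "000",      # audio configuration fields
          "000", "11111111", "0", "0"]         # framing, buffer fullness, flags
config = bits_to_config_hex("".join(fields))
print(config)
print(effective_cpresent({"cpresent": "0", "config": config}))
```

The four padding bits appended to the 44 data bits are what produce the trailing zero nibble of the hexadecimal string.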

"MPS-asc": a hexadecimal representation of an octet string that expresses audio payload configuration data "AudioSpecificConfig", as defined in [14496-3]. If this parameter is not present, the relevant signaling is performed by other means (e.g., in-band or contained in the "config" string).

The same mapping rules as for the "config" parameter apply.

"ptime": duration of each packet in milliseconds.

"SBR-enabled": a boolean parameter that indicates whether SBR-data can be expected in the RTP-payload of a stream. This parameter is relevant for an SBR-capable decoder if the presence of SBR cannot be detected from an out-of-band decoder configuration (e.g., contained in the "config" string).

If this parameter is set to 0, a decoder MAY expect that SBR is not used. If this parameter is set to 1, a decoder can up-sample the audio data with the SBR tool, regardless of whether or not SBR data is present in the stream.

If the presence of SBR cannot be detected from out-of-band configuration and the "SBR-enabled" parameter is not present, the parameter defaults to 1 for an SBR-capable decoder. If the resulting output sampling rate or the computational complexity is not supported, the SBR tool can be disabled or run in down-sampled mode.

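The defaulting rule for "SBR-enabled" can be summarized in a few lines of Python. This is only a sketch: the function name expect_sbr and its three inputs are hypothetical, and treating explicit out-of-band signaling as taking precedence over the "SBR-enabled" value is an assumption of this example rather than a statement of the specification.

```python
from typing import Optional


def expect_sbr(sbr_enabled: Optional[str],
               sbr_signaled_out_of_band: bool,
               decoder_supports_sbr: bool) -> bool:
    """Decide whether an SBR-capable decoder should plan to run the SBR tool."""
    if not decoder_supports_sbr:
        return False  # a plain AAC decoder never runs SBR
    if sbr_signaled_out_of_band:
        return True   # assumption: explicit signaling wins over SBR-enabled
    if sbr_enabled is None:
        return True   # omitted parameter defaults to 1 for SBR-capable decoders
    if sbr_enabled not in ("0", "1"):
        raise ValueError("SBR-enabled must be 0 or 1")
    return sbr_enabled == "1"


print(expect_sbr(None, False, True))
print(expect_sbr("0", False, True))
```

A decoder that gets True here may still disable the SBR tool or run it in down-sampled mode when the output sampling rate or complexity is not supported.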

The timestamp resolution at the RTP layer is determined by the "rate" parameter.

Published specification:

Encoding specifications are provided in [14496-3]. The RTP payload format specification is described in [RFC6416].

Encoding considerations:

This type is only defined for transfer via RTP.

Security considerations:

See Section 10 of [RFC6416].

Interoperability considerations:

MPEG-4 Audio provides a large and rich set of tools for the coding of audio objects. For effective implementation of the standard, subsets of the MPEG-4 Audio tool sets similar to those used in MPEG-4 Visual have been provided (see Section 7.1).

The audio stream SHALL be compliant with the MPEG-4 Audio Profile@Level specified by the parameters "profile-level-id" and "MPS-profile-level-id". Interoperability between a sender and a receiver may be achieved by specifying the parameters "profile-level-id" and "MPS-profile-level-id" or by arranging in the capability exchange procedure to set these parameters mutually to the same value. Furthermore, the "object" parameter can be used to limit the capability within the specified Profile@Level in the capability exchange.

Applications that use this media type:

Audio and video streaming and conferencing tools.

Additional information: none

Person and email address to contact for further information:

See Authors' Addresses section at the end of [RFC6416].

Intended usage: COMMON

Author:

See Authors' Addresses section at the end of [RFC6416].

Change controller:

IETF Audio/Video Transport Payloads working group delegated from the IESG.

7.4. Mapping to SDP for MPEG-4 Audio

The media type audio/MP4A-LATM string is mapped to fields in SDP [RFC4566], as follows:

o The media type (audio) goes in SDP "m=" as the media name.

o The Media subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the encoding name.

o The required parameter "rate" goes in "a=rtpmap" as the "clock rate".

o The optional parameter "ptime" goes in the SDP "a=ptime" attribute.

o The optional parameters "profile-level-id", "MPS-profile-level-id", and "object" go in the "a=fmtp" line to indicate the coder capability.

The following are some examples of the "profile-level-id" value:

   1 : Main Audio Profile Level 1
   9 : Speech Audio Profile Level 1
   15: High Quality Audio Profile Level 2
   30: Natural Audio Profile Level 1
   44: High Efficiency AAC Profile Level 2
   48: High Efficiency AAC v2 Profile Level 2
   55: Baseline MPEG Surround Profile (see ISO/IEC 23003-1) Level 3

The optional payload-format-specific parameters "bitrate", "cpresent", "config", "MPS-asc", and "SBR-enabled" also go in the "a=fmtp" line. These parameters are expressed as a string, in the form of a semicolon-separated list of parameter=value pairs.

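A receiver has to split the "a=fmtp" value back into its parameter=value pairs. The following Python sketch shows one way to do that; the function name parse_fmtp is hypothetical, and tolerating a trailing semicolon and whitespace around separators is a leniency chosen for this example, not a rule stated here.

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an unwrapped "a=fmtp" line into (payload type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params: Dict[str, str] = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' or doubled separators
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        params[name.strip()] = value.strip()
    return int(payload_type), params


pt, params = parse_fmtp(
    "a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; "
    "object=2; config=400026203fc0")
print(pt, params)
```

Parameter names the receiver does not recognize simply stay in the returned dict and can be skipped by the caller.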

7.4.1. Declarative SDP Usage for MPEG-4 Audio

The following sections contain some examples of the media representation in SDP.

Note that the "a=fmtp" line in some of the examples has been wrapped to fit the page; each would comprise a single line in the SDP file.

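To make the single-line requirement concrete, the mapping of Section 7.4 can be sketched as a small Python generator of SDP lines. The function name build_mp4a_latm_media and its argument list are hypothetical; the output reproduces the 6 kbit/s CELP example of Section 7.4.1.2 with the "a=fmtp" attribute unwrapped.

```python
from typing import Dict, List, Optional


def build_mp4a_latm_media(port: int, payload_type: int, rate: int,
                          fmtp: Dict[str, str],
                          channels: Optional[int] = None,
                          ptime: Optional[int] = None) -> List[str]:
    """Return the SDP lines describing one audio/MP4A-LATM stream."""
    encoding = f"MP4A-LATM/{rate}" + (f"/{channels}" if channels else "")
    lines = [f"m=audio {port} RTP/AVP {payload_type}",
             f"a=rtpmap:{payload_type} {encoding}"]
    if fmtp:
        # All format parameters share one a=fmtp line, separated by "; ".
        pairs = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    return lines


celp = build_mp4a_latm_media(
    49230, 96, 8000,
    {"profile-level-id": "9", "object": "8", "cpresent": "0",
     "config": "40008B18388380"},
    ptime=20)
print("\n".join(celp))
```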

7.4.1.1. Example: In-Band Configuration

In this example, the audio configuration data appears in the RTP payload exclusively (i.e., the MPEG-4 audio configuration is known when a StreamMuxConfig element appears within the RTP payload).

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A-LATM/90000
      a=fmtp:96 object=2; cpresent=1
        

The "clock rate" is set to 90 kHz. This is the default value, and the real audio sampling rate is known when the audio configuration data is received.

7.4.1.2. Example: 6 kbit/s CELP

This example shows a 6 kbit/s CELP (Code-Excited Linear Prediction) bitstream (with an audio sampling rate of 8 kHz).

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/8000
     a=fmtp:96 profile-level-id=9; object=8; cpresent=0;
       config=40008B18388380
     a=ptime:20
        

In this example, audio configuration data is not multiplexed into the RTP payload and is described only in SDP. Furthermore, the "clock rate" is set to the audio sampling rate.

7.4.1.3. Example: 64 kbit/s AAC LC Stereo

This example shows a 64 kbit/s AAC LC stereo bitstream (with an audio sampling rate of 24 kHz).

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/24000/2
     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
       object=2; config=400026203fc0
        

In this example, audio configuration data is not multiplexed into the RTP payload and is described only in SDP. Furthermore, the "clock rate" is set to the audio sampling rate.

In this example, the presence of SBR cannot be determined by the SDP parameter set. The "clock rate" represents the core codec sampling rate. An SBR-enabled decoder can use the SBR tool to up-sample the audio data if the complexity and resulting output sampling rate permit.

7.4.1.4. Example: Use of the "SBR-enabled" Parameter

These two examples are identical to the example above with the exception of the "SBR-enabled" parameter. The presence of SBR is not signaled by the SDP parameters "object", "profile-level-id", and "config", but instead the "SBR-enabled" parameter is present. The "rate" parameter and the StreamMuxConfig contain the core codec sampling rate.

This example shows "SBR-enabled=0", with definitive and core codec sampling rates of 24 kHz.

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/24000/2
     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
       SBR-enabled=0; config=400026203fc0
        

This example shows "SBR-enabled=1", with core codec sampling rate of 24 kHz, and definitive and SBR sampling rates of 48 kHz:

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/24000/2
     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
       SBR-enabled=1; config=400026203fc0
        

In this example, the "clock rate" is still 24000, and this information is used for RTP timestamp calculation. The value of 24000 is used to support old AAC decoders. It allows a decoder that supports only AAC to decode the HE AAC coded data, although only the plain AAC part is decoded. An HE AAC decoder is able to generate output data with the SBR sampling rate.

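The effect of keeping the "clock rate" at the core codec rate can be checked with a little arithmetic. The Python sketch below assumes 1024 core codec samples per audio frame, doubled to 2048 output samples when SBR is applied, and one audio frame per timestamp step; these frame sizes are assumptions of the example and are not stated in this section. The function names are hypothetical.

```python
RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def rtp_timestamp(initial: int, frame_index: int,
                  samples_per_frame: int = 1024) -> int:
    """Timestamp of the given frame, counted in ticks of the "rate" clock."""
    return (initial + frame_index * samples_per_frame) % RTP_TIMESTAMP_MODULUS


def frame_duration_ms(sampling_rate: int, samples_per_frame: int = 1024) -> float:
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * samples_per_frame / sampling_rate


print(rtp_timestamp(0, 3))                                 # ticks at 24000 Hz
print(frame_duration_ms(24000))                            # core codec view
print(frame_duration_ms(48000, samples_per_frame=2048))    # SBR output view
```

Under these assumptions both views give the same frame duration, so a timestamp clock of 24000 serves plain AAC and HE AAC decoders alike.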

7.4.1.5. Example: Hierarchical Signaling of SBR

When the presence of SBR is explicitly signaled by the SDP parameters "object", "profile-level-id", or "config", as in the example below, the StreamMuxConfig contains both the core codec sampling rate and the SBR sampling rate.

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/48000/2
     a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
       config=40005623101fe0; SBR-enabled=1
        

This "config" string uses the explicit signaling mode 2.A (hierarchical signaling; see [14496-3]). This means that the AOT (Audio Object Type) is SBR (5) and the SFI (Sampling Frequency Index) is 6 (24000 Hz), which refers to the underlying core codec sampling frequency. The CC (Channel Configuration) is stereo (2), and the ESFI (Extension Sampling Frequency Index) is 3 (48000 Hz), which refers to the sampling frequency of the extension tool (SBR).

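The fields named above can be read back out of the "config" string with a small bit reader. The following Python sketch is an illustration only: it follows the leading StreamMuxConfig and AudioSpecificConfig fields as understood from [14496-3], handles only the first program and layer, and stops after the sampling-rate fields. All function and key names are hypothetical.

```python
SAMPLING_FREQUENCIES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                        22050, 16000, 12000, 11025, 8000, 7350]


class BitReader:
    """Reads unsigned integers MSB-first from an octet string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read(self, count: int) -> int:
        value = 0
        for _ in range(count):
            if self.pos >= 8 * len(self.data):
                raise ValueError("config string is too short")
            octet = self.data[self.pos // 8]
            value = (value << 1) | ((octet >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def latm_get_value(reader: BitReader) -> int:
    """Variable-length value: 2-bit octet count minus one, then the octets."""
    return reader.read(8 * (reader.read(2) + 1))


def read_object_type(reader: BitReader) -> int:
    aot = reader.read(5)
    return 32 + reader.read(6) if aot == 31 else aot


def read_frequency(reader: BitReader) -> int:
    index = reader.read(4)
    if index == 0xF:
        return reader.read(24)  # escape: frequency written out explicitly
    if index >= len(SAMPLING_FREQUENCIES):
        raise ValueError("reserved sampling frequency index")
    return SAMPLING_FREQUENCIES[index]


def parse_config(hex_string: str) -> dict:
    """Extract AOT, sampling rates, and channel configuration (first layer)."""
    reader = BitReader(bytes.fromhex(hex_string))
    mux_version = reader.read(1)
    if mux_version == 1:
        if reader.read(1) != 0:
            raise NotImplementedError("audioMuxVersionA != 0")
        latm_get_value(reader)   # taraBufferFullness, ignored by receivers
    reader.read(1 + 6 + 4 + 3)   # framing flag, numSubFrames, numProgram, numLayer
    if mux_version == 1:
        latm_get_value(reader)   # ascLen
    aot = read_object_type(reader)
    core_rate = read_frequency(reader)
    channels = reader.read(4)
    extension_rate, core_aot = None, aot
    if aot in (5, 29):           # hierarchical signaling of SBR (5) or PS (29)
        extension_rate = read_frequency(reader)
        core_aot = read_object_type(reader)
    return {"aot": aot, "core_aot": core_aot, "core_rate": core_rate,
            "extension_rate": extension_rate, "channel_configuration": channels}


print(parse_config("40005623101fe0"))
```

Applied to the "config" string of this example, the sketch reports AOT 5 over a core AOT of 2, a core rate of 24000 Hz, an extension rate of 48000 Hz, and channel configuration 2.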

7.4.1.6. Example: HE AAC v2 Signaling

HE AAC v2 decoders are required to always produce a stereo signal from a mono signal. Hence, no parameter is necessary to signal the presence of PS.

This example shows "SBR-enabled=1" with 1 channel signaled in the "a=rtpmap" line and within the "config" parameter. The core codec sampling rate is 24 kHz; the definitive and SBR sampling rates are 48 kHz. The core codec channel configuration is mono; the PS channel configuration is stereo.

     m=audio 49230 RTP/AVP 110
     a=rtpmap:110 MP4A-LATM/24000/1
     a=fmtp:110 profile-level-id=15; object=2; cpresent=0;
       config=400026103fc0; SBR-enabled=1
        
7.4.1.7. Example: Hierarchical Signaling of PS

This example shows 48 kHz stereo audio input.

     m=audio 49230 RTP/AVP 110
     a=rtpmap:110 MP4A-LATM/48000/2
     a=fmtp:110 profile-level-id=48; cpresent=0; config=4001d613101fe0
        

The "config" parameter indicates explicit hierarchical signaling of PS and SBR. This configuration method is not supported by legacy AAC and HE AAC decoders, and these are therefore unable to decode the coded data.

7.4.1.8. Example: MPEG Surround

The following examples show how MPEG Surround configuration data can be signaled using SDP. The configuration is carried within the "config" string in the first example by using two different layers. The general parameters in this example are: AudioMuxVersion=1; allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0; numLayer=1. The first layer describes the HE AAC payload and signals the following parameters: ascLen=25; audioObjectType=2 (AAC LC); extensionAudioObjectType=5 (SBR); samplingFrequencyIndex=6 (24 kHz); extensionSamplingFrequencyIndex=3 (48 kHz); channelConfiguration=2 (2.0 channels). The second layer describes the MPEG Surround payload and specifies the following parameters: ascLen=110; AudioObjectType=30 (MPEG Surround); samplingFrequencyIndex=3 (48 kHz); channelConfiguration=6 (5.1 channels); sacPayloadEmbedding=1; SpatialSpecificConfig=(48 kHz; 32 slots; 525 tree; ResCoding=1; ResBands=[7,7,7,7]).
        

In this example, the signaling is carried by using two different LATM layers. The MPEG Surround payload is carried together with the AAC payload in a single layer as indicated by the sacPayloadEmbedding Flag.

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/48000
     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
       SBR-enabled=1;
       config=8FF8004192B11880FF0DDE3699F2408C00536C02313CF3CE0FF0
        
7.4.1.9. Example: MPEG Surround with Extended SDP Parameters

The following example is an extension of the configuration given above by the MPEG-Surround-specific parameters. The "MPS-profile-level-id" parameter specifies the MPEG Surround Baseline Profile at Level 3 (PLI55), and the "MPS-asc" string contains the hexadecimal representation of the MPEG Surround ASC [audioObjectType=30 (MPEG Surround); samplingFrequencyIndex=0x3 (48 kHz); channelConfiguration=6 (5.1 channels); sacPayloadEmbedding=1; SpatialSpecificConfig=(48 kHz; 32 slots; 525 tree; ResCoding=1; ResBands=[0,13,13,13])].
        
     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/48000
     a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
       config=40005623101fe0; MPS-profile-level-id=55;
       MPS-asc=F1B4CF920442029B501185B6DA00;
        
7.4.1.10. Example: MPEG Surround with Single-Layer Configuration

The following example shows how MPEG Surround configuration data can be signaled using the SDP "config" parameter. The configuration is carried within the "config" string using a single layer. The general parameters in this example are: AudioMuxVersion=1; allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0; numLayer=0. The single layer describes the combination of HE AAC and MPEG Surround payload and signals the following parameters: ascLen=101; audioObjectType=2 (AAC LC); extensionAudioObjectType=5 (SBR); samplingFrequencyIndex=7 (22.05 kHz); extensionSamplingFrequencyIndex=4 (44.1 kHz); channelConfiguration=2 (2.0 channels). A backward-compatible extension according to [14496-3/Amd.1] signals the presence of MPEG Surround payload data and specifies the following parameters: SpatialSpecificConfig=(44.1 kHz; 32 slots; 525 tree; ResCoding=0).
        

In this example, the signaling is carried by using a single LATM layer. The MPEG Surround payload is carried together with the HE AAC payload in a single layer.

     m=audio 49230 RTP/AVP 96
     a=rtpmap:96 MP4A-LATM/44100
     a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
       SBR-enabled=1; config=8FF8000652B920876A83A1F440884053620FF0;
       MPS-profile-level-id=55
        
8. IANA Considerations

This document updates the media subtypes "MP4A-LATM" and "MP4V-ES" from RFC 3016. The new registrations are in Sections 7.1 and 7.3 of this document.

9. Acknowledgements

The authors would like to thank Yoshihiro Kikuchi, Yoshinori Matsui, Toshiyuki Nomura, Shigeru Fukunaga, and Hideaki Kimata for their work on RFC 3016, and Ali Begen, Keith Drage, Roni Even, and Qin Wu for their valuable input and comments on this document.

10. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550] and in any applicable RTP profile. The main security considerations for the RTP packet carrying the RTP payload format defined within this document are confidentiality, integrity, and source authenticity. Confidentiality is achieved by encryption of the RTP payload, and integrity of the RTP packets is achieved through a suitable cryptographic integrity protection mechanism. A cryptographic system may also allow the authentication of the source of the payload. A suitable security mechanism for this RTP payload format should provide confidentiality, integrity protection, and (at least) source authentication capable of determining whether or not an RTP packet is from a member of the RTP session.

Note that most MPEG-4 codecs define an extension mechanism to transmit extra data within a stream that is gracefully skipped by decoders that do not support this extra data. This may be used to transmit unwanted data in an otherwise valid stream.

The appropriate mechanism to provide security to RTP and payloads following this specification may vary. It depends on the application, the transport, and the signaling protocol employed. Therefore, a single mechanism is not sufficient, although, if suitable, the usage of the Secure Real-time Transport Protocol (SRTP) [RFC3711] is recommended. Other mechanisms that may be used are IPsec [RFC4301] and Transport Layer Security (TLS) [RFC5246] (e.g., for RTP over TCP), but other alternatives may also exist.

This RTP payload format and its media decoder do not exhibit any significant non-uniformity in the receiver-side computational complexity for packet processing, and thus are unlikely to pose a denial-of-service threat due to the receipt of pathological data. The complete MPEG-4 System allows for transport of a wide range of content, including Java applets (MPEG-J) and scripts. Since this payload format is restricted to audio and video streams, it is not possible to transport such active content in this format.

11. Differences to RFC 3016

The RTP payload format for MPEG-4 Audio as specified in RFC 3016 is used by the 3GPP PSS service [3GPP]. However, there are some misalignments between RFC 3016 and the 3GPP PSS specification that are addressed by this update:

o The audio payload format (LATM) referenced in this document is the newer format specified in [14496-3], which is binary compatible with the format used in [3GPP]. This newer format is not binary compatible with the LATM referenced in RFC 3016, which is specified in [14496-3:1999/Amd.1:2000].

o The audio signaling format (StreamMuxConfig) referenced in this document is binary compatible with the format used in [3GPP]. The StreamMuxConfig element has also been revised by MPEG since RFC 3016.

o The use of an audio parameter "SBR-enabled" is now defined in this document, which is used by 3GPP implementations [3GPP]. RFC 3016 does not define this parameter.

o The "rate" parameter is defined unambiguously in this document for the case of presence of SBR (Spectral Band Replication). In RFC 3016, the definition of the "rate" parameter is ambiguous.

o The number of audio channels parameter is defined unambiguously in this document for the case of presence of PS (Parametric Stereo). At the time RFC 3016 was written, PS was not yet defined.

Furthermore, some comments have been addressed and signaling support for MPEG Surround [23003-1] was added.

Below is a summary of the changes in requirements by this update:

o In the dynamic assignment of RTP payload types for scalable MPEG-4 Audio streams, the server SHALL assign a different value to each layer.

o The dependency relationships between the enhanced layer and the base layer for scalable MPEG-4 Audio streams MUST be signaled as specified in [RFC5583].

o If the size of an audioMuxElement is so large that the size of the RTP packet containing it exceeds the size of the Path MTU, the audioMuxElement SHALL be fragmented and spread across multiple packets.

o The receiver MUST ignore any unspecified parameter in order to ensure that additional parameters can be added in any future revision of this specification.

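The fragmentation requirement in the list above can be illustrated with a short Python sketch. The function name fragment_audio_mux_element is hypothetical, the 40-octet header overhead (IPv4, UDP, and a fixed RTP header without options or extensions) is an assumption, and flagging the packet that carries the last fragment is modeled here as a plain boolean on the assumption that the marker bit is used for that purpose.

```python
from typing import List, Tuple


def fragment_audio_mux_element(element: bytes,
                               max_payload: int) -> List[Tuple[bytes, bool]]:
    """Split one audioMuxElement into (fragment, is_last_fragment) pairs."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not element:
        raise ValueError("audioMuxElement must not be empty")
    chunks = [element[i:i + max_payload]
              for i in range(0, len(element), max_payload)]
    return [(chunk, index == len(chunks) - 1)
            for index, chunk in enumerate(chunks)]


PATH_MTU = 1500                  # assumed Path MTU in octets
HEADER_OVERHEAD = 20 + 8 + 12    # assumed IPv4 + UDP + fixed RTP header
fragments = fragment_audio_mux_element(bytes(4000), PATH_MTU - HEADER_OVERHEAD)
print([(len(chunk), last) for chunk, last in fragments])
```

An element that already fits within the payload budget comes back as a single pair flagged as the last fragment, so the same code path serves both cases.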

12. References
12.1. Normative References

[14496-2] MPEG, "ISO/IEC International Standard 14496-2 - Coding of audio-visual objects, Part 2: Visual", 2003.

[14496-3] MPEG, "ISO/IEC International Standard 14496-3 - Coding of audio-visual objects, Part 3 Audio", 2009.

[14496-3/Amd.1] MPEG, "ISO/IEC International Standard 14496-3 - Coding of audio-visual objects, Part 3: Audio, Amendment 1: HD-AAC profile and MPEG Surround signaling", 2009.

[23003-1] MPEG, "ISO/IEC International Standard 23003-1 - MPEG Surround (MPEG D)", 2007.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC4629] Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R. Even, "RTP Payload Format for ITU-T Rec", RFC 4629, January 2007.

[RFC4855] Casner, S., "Media Type Registration of RTP Payload Formats", RFC 4855, February 2007.

[RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding Dependency in the Session Description Protocol (SDP)", RFC 5583, July 2009.

12.2. Informative References

[14496-1] MPEG, "ISO/IEC International Standard 14496-1 - Coding of audio-visual objects, Part 1 Systems", 2004.

[14496-12] MPEG, "ISO/IEC International Standard 14496-12 - Coding of audio-visual objects, Part 12 ISO base media file format".

[14496-14] MPEG, "ISO/IEC International Standard 14496-14 - Coding of audio-visual objects, Part 14 MP4 file format".

[14496-3:1999/Amd.1:2000] MPEG, "ISO/IEC International Standard 14496-3 - Coding of audio-visual objects, Part 3 Audio, Amendment 1: Audio extensions", 2000.

[3GPP] 3GPP, "3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs (Release 9)", 3GPP TS 26.234 V9.5.0, December 2010.

[H245] International Telecommunication Union, "Control protocol for multimedia communication", ITU Recommendation H.245, December 2009.

[H261] International Telecommunication Union, "Video codec for audiovisual services at p x 64 kbit/s", ITU Recommendation H.261, March 1993.

[H323] International Telecommunication Union, "Packet-based multimedia communications systems", ITU Recommendation H.323, December 2009.

[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

[RFC3016] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H. Kimata, "RTP Payload Format for MPEG-4 Audio/Visual Streams", RFC 3016, November 2000.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3640] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and P. Gentric, "RTP Payload Format for Transport of MPEG-4 Elementary Streams", RFC 3640, November 2003.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4628] Even, R., "RTP Payload Format for H.263 Moving RFC 2190 to Historic Status", RFC 4628, January 2007.

[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error Correction", RFC 5109, December 2007.

[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC5691] de Bont, F., Doehla, S., Schmidt, M., and R. Sperschneider, "RTP Payload Format for Elementary Streams with MPEG Surround Multi-Channel Audio", RFC 5691, October 2009.

Authors' Addresses

   Malte Schmidt
   Dolby Laboratories
   Deutschherrnstr. 15-19
   90537 Nuernberg
   DE

   Phone: +49 911 928 91 42
   EMail: malte.schmidt@dolby.com
        

   Frans de Bont
   Philips Electronics
   High Tech Campus 36
   5656 AE Eindhoven
   NL

   Phone: +31 40 2740234
   EMail: frans.de.bont@philips.com
        

   Stefan Doehla
   Fraunhofer IIS
   Am Wolfmantel 33
   91058 Erlangen
   DE

   Phone: +49 9131 776 6042
   EMail: stefan.doehla@iis.fraunhofer.de
        

   Jaehwan Kim
   LG Electronics Inc.
   VCS/HE, 16Fl. LG Twin Towers
   Yoido-Dong, YoungDungPo-Gu, Seoul 150-721
   Korea

   Phone: +82 10 6225 0619
   EMail: kjh1905m@naver.com
        