Network Working Group                                         Y. Kikuchi
Request for Comments: 3016                                       Toshiba
Category: Standards Track                                      T. Nomura
                                                                     NEC
                                                             S. Fukunaga
                                                                     Oki
                                                               Y. Matsui
                                                              Matsushita
                                                               H. Kimata
                                                                     NTT
                                                           November 2000
        

RTP Payload Format for MPEG-4 Audio/Visual Streams

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

This document describes Real-Time Transport Protocol (RTP) payload formats for carrying each of MPEG-4 Audio and MPEG-4 Visual bitstreams without using MPEG-4 Systems. For the purpose of directly mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it provides specifications for the use of RTP header fields and also specifies fragmentation rules. It also provides specifications for Multipurpose Internet Mail Extensions (MIME) type registrations and the use of Session Description Protocol (SDP).

1. Introduction

The RTP payload formats described in this document specify how MPEG-4 Audio [3][5] and MPEG-4 Visual streams [2][4] are to be fragmented and mapped directly onto RTP packets.

These RTP payload formats enable transport of MPEG-4 Audio/Visual streams without using the synchronization and stream management functionality of MPEG-4 Systems [6]. Such RTP payload formats will be used in systems that have intrinsic stream management functionality and thus require no such functionality from MPEG-4 Systems. H.323 terminals are an example of such systems, where MPEG-4 Audio/Visual streams are managed not by MPEG-4 Systems Object Descriptors but by H.245. The streams are directly mapped onto RTP packets without using the MPEG-4 Systems Sync Layer. Other examples are SIP and RTSP, where MIME and SDP are used. The MIME types and SDP usage of the RTP payload formats described in this document are defined to directly specify the attributes of Audio/Visual streams (e.g., media type, packetization format and codec configuration) without using MPEG-4 Systems. The obvious benefit is that these MPEG-4 Audio/Visual RTP payload formats can be handled in a unified way together with the formats defined for non-MPEG-4 codecs. The disadvantage is that interoperability with environments using MPEG-4 Systems may be difficult; other payload formats may be better suited to those applications.

The semantics of RTP headers in such cases need to be clearly defined, including the association with MPEG-4 Audio/Visual data elements. In addition, it is beneficial to define the fragmentation rules of RTP packets for MPEG-4 Video streams so as to enhance error resiliency by utilizing the error resilience tools provided inside the MPEG-4 Video stream.

1.1 MPEG-4 Visual RTP payload format

MPEG-4 Visual is a visual coding standard with many new features: high coding efficiency; high error resiliency; multiple, arbitrary shape object-based coding; etc. [2]. It covers a wide range of bitrates from scores of Kbps to several Mbps. It also covers a wide variety of networks, ranging from those guaranteed to be almost error-free to mobile networks with high error rates.

With respect to the fragmentation rules for an MPEG-4 Visual bitstream defined in this document: since MPEG-4 Visual is used over a wide variety of networks, it is desirable not to place too many restrictions on fragmentation, and a rule such as "a single video packet shall always be mapped onto a single RTP packet" may be inappropriate. On the other hand, careless, media-unaware fragmentation may degrade error resiliency and bandwidth efficiency. The fragmentation rules described in this document are flexible, but they define the minimum rules needed to prevent meaningless fragmentation while utilizing the error resilience functionalities of MPEG-4 Visual.

The fragmentation rule recommends against mapping more than one VOP into an RTP packet, so that the RTP timestamp uniquely indicates the VOP time framing. On the other hand, MPEG-4 video may generate very small VOPs, for example an empty VOP (vop_coded=0) containing only the VOP header, or an arbitrarily shaped VOP with a small number of coding blocks. To reduce the overhead in such cases, the fragmentation rule permits concatenating multiple VOPs in an RTP packet. (See fragmentation rule (4) in section 3.2 and the marker bit and timestamp in section 3.1.)

While the additional media-specific RTP headers defined for video coding tools such as H.261 or MPEG-1/2 are effective in helping to recover picture headers corrupted by packet losses, MPEG-4 Visual already has error resilience functionalities for recovering corrupted headers, and these can be used on RTP/IP networks as well as on other networks (H.223/mobile, MPEG-2/TS, etc.). Therefore, no extra RTP header fields are defined in this MPEG-4 Visual RTP payload format.

1.2 MPEG-4 Audio RTP payload format

MPEG-4 Audio is a new kind of audio standard that integrates many different types of audio coding tools. Low-overhead MPEG-4 Audio Transport Multiplex (LATM) manages the sequences of audio data with relatively small overhead. In audio-only applications, then, it is desirable for LATM-based MPEG-4 Audio bitstreams to be directly mapped onto the RTP packets without using MPEG-4 Systems.

LATM has several multiplexing features:

- carrying configuration information with audio data,
- concatenation of multiple audio frames in one audio stream,
- multiplexing multiple objects (programs),
- multiplexing scalable layers.

In RTP transmission there is no need for the last two features. Therefore, these two features MUST NOT be used in applications based on the RTP packetization specified by this document. Since LATM has been developed only for natural audio coding tools, i.e., not for synthesis tools, it seems difficult to transmit Structured Audio (SA) data and Text to Speech Interface (TTSI) data by LATM. Therefore, SA data and TTSI data MUST NOT be transported by the RTP packetization in this document.

For transmission of scalable streams, the audio data of each layer SHOULD be packetized into different RTP packets, allowing the different layers to be treated differently at the IP level, for example via some means of differentiated service. On the other hand, all configuration data of the scalable streams are contained in one LATM configuration data element, "StreamMuxConfig", which every scalable layer shares. The mapping between each layer and its configuration data is achieved by LATM header information attached to the audio data. In order to indicate the dependency information of the scalable streams, a restriction is applied to the dynamic assignment rule of payload type (PT) values (see section 4.2).

For MPEG-4 Audio coding tools, as is true for other audio coders, if the payload is a single audio frame, packet loss will not impair the decodability of adjacent packets. Therefore, no additional media-specific header for error recovery is required for MPEG-4 Audio. Existing RTP protection mechanisms, such as Generic Forward Error Correction (RFC 2733) and Redundant Audio Data (RFC 2198), MAY be applied to improve error resiliency.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [7].

3. RTP Packetization of MPEG-4 Visual bitstream

This section specifies RTP packetization rules for MPEG-4 Visual content. An MPEG-4 Visual bitstream is mapped directly onto RTP packets without the addition of extra header fields or any removal of Visual syntax elements. The Combined Configuration/Elementary stream mode MUST be used so that configuration information will be carried to the same RTP port as the elementary stream (see 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4]). The configuration information MAY additionally be specified by some out-of-band means. If this is needed for an H.323 terminal, the H.245 codepoint "decoderConfigurationInformation" MUST be used for this purpose. If this is needed by systems using MIME content types and SDP parameters, e.g., SIP and RTSP, the optional parameter "config" MUST be used to specify the configuration information (see 5.1 and 5.2).

When the short video header mode is used, the RTP payload format for H.263 SHOULD be used (the format defined in RFC 2429 is RECOMMENDED, but the RFC 2190 format MAY be used for compatibility with older implementations).

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           | Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               | RTP
|       MPEG-4 Visual stream (byte aligned)                     | Pay-
|                                                               | load
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 1 - An RTP packet for MPEG-4 Visual stream
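The fixed part of the header in Figure 1 is the ordinary 12-byte RTP header of RFC 1889 [8]; this payload format adds nothing to it. As a minimal sketch, the header can be packed with Python's standard `struct` module. The function name is illustrative, and the dynamic payload type 96 in the usage note is an arbitrary example, since PT assignment is outside the scope of this document.

```python
import struct


def pack_rtp_header(marker: bool, payload_type: int, seq: int, timestamp: int,
                    ssrc: int, csrcs: tuple = (), padding: bool = False,
                    extension: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (version 2), then any CSRC identifiers."""
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("PT is a 7-bit field")
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field, so at most 15 CSRCs fit")
    # Byte 0: V (2 bits) | P (1 bit) | X (1 bit) | CC (4 bits)
    byte0 = (2 << 6) | (int(padding) << 5) | (int(extension) << 4) | len(csrcs)
    # Byte 1: M (1 bit) | PT (7 bits)
    byte1 = (int(marker) << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # Each CSRC is a 32-bit identifier in network byte order.
    return header + b"".join(struct.pack("!I", c & 0xFFFFFFFF) for c in csrcs)
```

For example, `pack_rtp_header(True, 96, 1, 90000, 0x12345678)` yields 12 bytes beginning with `0x80 0xE0` (V=2, M=1, PT=96); the MPEG-4 Visual bytes are then appended unchanged as the payload.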

3.1 Use of RTP header fields for MPEG-4 Visual

Payload Type (PT): The assignment of an RTP payload type for this new packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding, or if that is not done then a payload type in the dynamic range SHALL be chosen by means of an out of band signaling protocol (e.g., H.245, SIP, etc).

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: Incremented by one for each RTP data packet sent, starting, for security reasons, with a random initial value.

Marker (M) bit: The marker bit is set to one to indicate the last RTP packet (or only RTP packet) of a VOP. When multiple VOPs are carried in the same RTP packet, the marker bit is set to one.
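A minimal sketch of the marker-bit rule, assuming the sender already holds the list of RTP payloads produced for one VOP (the function name is hypothetical). When several small VOPs are concatenated into a single payload, passing that one payload yields M=1, which matches the second sentence above.

```python
def assign_marker_bits(payloads_for_vop: list[bytes]) -> list[tuple[bytes, bool]]:
    """Return (payload, marker) pairs for the RTP packets carrying one VOP.

    Only the last (or only) RTP packet of the VOP carries M=1.
    """
    if not payloads_for_vop:
        raise ValueError("a VOP produces at least one RTP payload")
    last = len(payloads_for_vop) - 1
    return [(payload, i == last) for i, payload in enumerate(payloads_for_vop)]
```

A receiver can use the same bit in reverse: M=1 tells it that the VOP is complete and can be handed to the decoder without waiting for the next timestamp.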

Timestamp: The timestamp indicates the sampling instant of the VOP contained in the RTP packet. A random constant offset is added for security reasons.

- When multiple VOPs are carried in the same RTP packet, the timestamp indicates the earliest VOP time among the VOPs carried in the RTP packet. Timestamp information for the remaining VOPs is derived from the timestamp fields in the VOP header (modulo_time_base and vop_time_increment).
- If the RTP packet contains only configuration information and/or Group_of_VideoObjectPlane() fields, the timestamp of the next VOP in coding order is used.
- If the RTP packet contains only visual_object_sequence_end_code information, the timestamp of the immediately preceding VOP in coding order is used.
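The three timestamp cases above can be expressed as one selection function. This is a sketch only: the packet "kind" labels and parameter names are hypothetical, the caller is assumed to know the RTP timestamps of the neighbouring VOPs in coding order, and 32-bit timestamp wraparound is ignored when picking the earliest VOP time.

```python
from typing import Optional, Sequence


def packet_timestamp(kind: str, vop_times: Sequence[int] = (),
                     previous_vop: Optional[int] = None,
                     next_vop: Optional[int] = None) -> int:
    """Choose the RTP timestamp for one MPEG-4 Visual RTP packet.

    kind (hypothetical labels):
      "vop"           - the packet carries (part of) one or more VOPs
      "config_or_gov" - only configuration information and/or GOV fields
      "end_code"      - only visual_object_sequence_end_code
    """
    if kind == "vop":
        if not vop_times:
            raise ValueError("need the times of the VOPs in the packet")
        # Earliest VOP time among the VOPs carried (wraparound ignored).
        return min(vop_times)
    if kind == "config_or_gov":
        if next_vop is None:
            raise ValueError("need the next VOP in coding order")
        return next_vop
    if kind == "end_code":
        if previous_vop is None:
            raise ValueError("need the immediately preceding VOP in coding order")
        return previous_vop
    raise ValueError(f"unknown packet kind: {kind!r}")
```

`min()` is used rather than the first VOP's time because VOPs are carried in decoding order, which can differ from presentation order when B-VOPs are present.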

The resolution of the timestamp is set to its default value of 90kHz, unless specified by an out-of-band means (e.g., SDP parameter or MIME parameter as defined in section 5).
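With the default 90 kHz resolution, a VOP time must be converted from the MPEG-4 Visual time base (whole seconds signalled through modulo_time_base, plus vop_time_increment in units of 1/vop_time_increment_resolution) to RTP clock ticks. The sketch below assumes the caller has already accumulated the whole-second count; that bookkeeping and the function name are assumptions, not part of this document. Rational arithmetic avoids drift for resolutions such as 30000.

```python
import secrets
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # default timestamp resolution for MPEG-4 Visual


def vop_rtp_timestamp(seconds: int, vop_time_increment: int,
                      vop_time_increment_resolution: int,
                      random_offset: int, clock_hz: int = RTP_CLOCK_HZ) -> int:
    """Map a VOP time onto the RTP clock and add the random constant offset."""
    vop_time = seconds + Fraction(vop_time_increment, vop_time_increment_resolution)
    ticks = round(vop_time * clock_hz)  # exact rational product, then round to a tick
    return (random_offset + ticks) % (1 << 32)  # RTP timestamps are 32 bits


# Chosen once per stream: the random constant offset required by section 3.1.
STREAM_OFFSET = secrets.randbits(32)
```

For example, with vop_time_increment_resolution = 30000 and vop_time_increment = 1001, the VOP lies 3003 ticks after the start of its second.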

Other header fields are used as described in RFC 1889 [8].

3.2 Fragmentation of MPEG-4 Visual bitstream

A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP payload without the addition of any extra header fields or the removal of any Visual syntax elements. The Combined Configuration/Elementary stream mode is used. The following rules apply to fragmentation.

In the following rules, "header" means any one of the following:

- Configuration information (Visual Object Sequence Header, Visual Object Header and Video Object Layer Header)
- visual_object_sequence_end_code
- The header of the entry point function for an elementary stream (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), video_plane_with_short_header(), MeshObject() or FaceObject())
- The video packet header (video_packet_header() excluding next_resync_marker())
- The header of gob_layer()

See 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4] for the definition of the configuration information and the entry point functions.
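To apply these rules, a packetizer has to locate the headers in the bitstream. Most of them begin with a byte-aligned start code: the prefix 0x000001 followed by a one-byte value. The sketch below scans for that prefix and names the value using a subset of the start-code table in 6.2.1 of ISO/IEC 14496-2 (values as we understand that table; check them against the edition in use). It assumes no start-code emulation in the data and does not detect video packet headers (which begin with a resync marker) or short-header pictures, since neither uses this prefix.

```python
START_CODE_PREFIX = b"\x00\x00\x01"

# Subset of start-code values (the byte after 0x000001) from ISO/IEC 14496-2.
NAMED_START_CODES = {
    0xB0: "visual_object_sequence_start_code",
    0xB1: "visual_object_sequence_end_code",
    0xB2: "user_data_start_code",
    0xB3: "group_of_vop_start_code",
    0xB5: "visual_object_start_code",
    0xB6: "vop_start_code",
}


def classify_start_code(value: int) -> str:
    """Name the start code whose last byte is `value`."""
    if value <= 0x1F:
        return "video_object_start_code"
    if 0x20 <= value <= 0x2F:
        return "video_object_layer_start_code"
    return NAMED_START_CODES.get(value, "other_start_code")


def find_start_codes(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, start-code name) for each 0x000001xx found in data."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify_start_code(data[pos + 3])))
        pos = data.find(START_CODE_PREFIX, pos + 4)  # skip past this 4-byte code
    return found
```

The offsets returned are the only safe cut points for the configuration information, GOV and VOP headers under rule (3).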

(1) Configuration information and Group_of_VideoObjectPlane() fields SHALL be placed at the beginning of the RTP payload (just after the RTP header) or just after the header of the syntactically upper layer function.

(2) If one or more headers exist in the RTP payload, the RTP payload SHALL begin with the header of the syntactically highest function. Note: The visual_object_sequence_end_code is regarded as the lowest function.

(3) A header SHALL NOT be split into a plurality of RTP packets.

(4) Different VOPs SHOULD be fragmented into different RTP packets, so that one RTP packet consists of the data bytes associated with a unique VOP time instance (indicated in the timestamp field of the RTP packet header). As an exception, multiple consecutive VOPs MAY be carried within one RTP packet, in decoding order, if the VOPs are small.

Note: When multiple VOPs are carried in one RTP payload, the timestamps of the VOPs after the first one may be calculated by the decoder. This operation is necessary only for RTP packets in which the marker bit equals one and the beginning of the RTP payload corresponds to a start code. (See timestamp and marker bit in section 3.1.)

(5) It is RECOMMENDED that a single video packet be sent as a single RTP packet. The size of a video packet SHOULD be adjusted so that the resulting RTP packet is not larger than the path MTU. Note: Rule (5) does not apply when video packets are disabled by the coder configuration (by setting resync_marker_disable in the VOL header to 1), or in coding tools where video packets are not supported. In this case, a VOP MAY be split at arbitrary byte positions.

The video packet starts with the VOP header or the video packet header, followed by motion_shape_texture(), and ends with next_resync_marker() or next_start_code().
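Rule (5), together with the trade-off between Figure 2(d) and 2(e), can be sketched as a greedy packer: whole video packets of one VOP are concatenated until the next one would exceed the payload budget. The sketch assumes each input element is one complete video packet (the first starting with the VOP header, the rest with video packet headers), so every payload begins with a header and no header is split. The function name and the budget value are illustrative.

```python
def group_video_packets(video_packets: list[bytes], max_payload: int) -> list[bytes]:
    """Pack whole video packets of one VOP into RTP payloads of at most max_payload bytes.

    Payload boundaries always fall on video packet boundaries, so no header is
    split (rule 3) and every payload begins with a VOP or video packet header.
    """
    payloads: list[bytes] = []
    current = bytearray()
    for vp in video_packets:
        if len(vp) > max_payload:
            # Rule (5): the encoder should have produced smaller video packets.
            raise ValueError("video packet larger than the payload budget")
        if current and len(current) + len(vp) > max_payload:
            payloads.append(bytes(current))
            current = bytearray()
        current += vp
    if current:
        payloads.append(bytes(current))
    return payloads
```

As an illustrative budget, a 1500-byte path MTU minus 20 bytes of IPv4, 8 of UDP and 12 of RTP header (no options, CSRCs or extension) leaves 1460 bytes. A budget equal to the video packet size reproduces Figure 2(d); a larger one moves toward Figure 2(e).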

3.3 Examples of packetized MPEG-4 Visual bitstream

Figure 2 shows examples of RTP packets generated based on the criteria described in section 3.2.

(a) is an example of the first RTP packet, or of a random access point, of an MPEG-4 Visual bitstream containing the configuration information. According to criterion (1), the Visual Object Sequence Header (VS header) is placed at the beginning of the RTP payload, preceding the Visual Object Header and the Video Object Layer Header (VO header, VOL header). Since the fragmentation rules defined in 3.2 guarantee that the configuration information, starting with visual_object_sequence_start_code, is always placed at the beginning of the RTP payload, RTP receivers can detect the random access point by checking whether the first 32-bit field of the RTP payload is visual_object_sequence_start_code.

(b) is another example of the RTP packet containing the configuration information. It differs from example (a) in that the RTP packet also contains a video packet in the VOP following the configuration information. Since the length of the configuration information is relatively short (typically scores of bytes) and an RTP packet containing only the configuration information may thus increase the overhead, the configuration information and the immediately following GOV and/or (a part of) VOP can be packetized into a single RTP packet as in this example.

(c) is an example of an RTP packet that contains Group_of_VideoObjectPlane (GOV). Following criterion (1), the GOV is placed at the beginning of the RTP payload. Generating an RTP packet containing only a GOV, whose length is 7 bytes, would waste RTP/IP header overhead. Therefore, (a part of) the following VOP can be placed in the same RTP packet, as shown in (c).

(d) is an example of the case where one video packet is packetized into one RTP packet. When the packet-loss rate of the underlying network is high, this kind of packetization is recommended. Even when the RTP packet containing the VOP header is lost, the other RTP packets can be decoded by using the HEC (Header Extension Code) information in the video packet header. No extra RTP header field is necessary.

(e) is an example of the case where more than one video packet is packetized into one RTP packet. This kind of packetization is effective in saving the overhead of RTP/IP headers when the bit rate of the underlying network is low. However, it decreases packet-loss resiliency, because multiple video packets are discarded when a single RTP packet is lost. The optimal number of video packets in an RTP packet and the length of the RTP packet can be determined by considering the packet-loss rate and the bit rate of the underlying network.

(f) is an example of the case where video packets are disabled by setting resync_marker_disable in the VOL header to 1. In this case, a VOP may be split into a plurality of RTP packets at arbitrary byte positions. For example, it is possible to split a VOP into fixed-length packets. This kind of coder configuration and RTP packet fragmentation may be used when the underlying network is guaranteed to be error-free. On the other hand, it is not recommended in error-prone environments, since it provides only poor packet-loss resiliency.
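Two of these examples translate directly into code: the receiver-side random access check from (a) and the fixed-length split from (f). The sketch below strips the RTP header (CSRC list, header extension and padding as laid out in RFC 1889) before testing the first 32 bits of the payload. In `split_vop_fixed`, the `vop_header_len` parameter is a hypothetical caller-supplied value used only to make sure the first fragment keeps the VOP header whole, since rule (3) still applies when a VOP is split at arbitrary byte positions.

```python
VOS_START_CODE = b"\x00\x00\x01\xb0"  # visual_object_sequence_start_code


def rtp_payload(packet: bytes) -> bytes:
    """Strip the RTP header (CSRCs, extension, padding) and return the payload."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = 12 + 4 * (packet[0] & 0x0F)  # skip the CSRC list
    if packet[0] & 0x10:  # X bit: 4-byte extension header + length in 32-bit words
        words = int.from_bytes(packet[offset + 2:offset + 4], "big")
        offset += 4 + 4 * words
    end = len(packet)
    if packet[0] & 0x20:  # P bit: last octet counts the padding octets
        end -= packet[-1]
    return packet[offset:end]


def is_random_access_point(packet: bytes) -> bool:
    """Example (a): the payload starts with the VS header's start code."""
    return rtp_payload(packet)[:4] == VOS_START_CODE


def split_vop_fixed(vop: bytes, fragment_size: int,
                    vop_header_len: int) -> list[tuple[bytes, bool]]:
    """Example (f): fixed-length fragments when resync_marker_disable == 1.

    Returns (payload, marker) pairs; only the last fragment has M=1.
    """
    if fragment_size < vop_header_len:
        raise ValueError("first fragment must hold the whole VOP header (rule 3)")
    frags = [vop[i:i + fragment_size] for i in range(0, len(vop), fragment_size)]
    return [(frag, i == len(frags) - 1) for i, frag in enumerate(frags)]
```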

Figure 3 shows examples of RTP packets prohibited by the criteria of 3.2.

Fragmentation of a header into multiple RTP packets, as in (a), will not only increase the overhead of RTP/IP headers but also decrease the error resiliency. Therefore, it is prohibited by the criterion (3).

When concatenating more than one video packet into an RTP packet, the VOP header or video_packet_header() shall not be placed in the middle of the RTP payload. The packetization in (b) is not allowed by criterion (2) because of its effect on error resiliency. Compare this example with Figure 2(d): although two video packets are mapped onto two RTP packets in both cases, the packet-loss resiliency is not identical. Namely, if the second RTP packet is lost, both video packets 1 and 2 are lost in the case of Figure 3(b), whereas only video packet 2 is lost in the case of Figure 2(d).
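The two prohibited cases in Figure 3 can be caught by a simple check on proposed cut points. The sketch assumes the packetizer knows the byte span of every header in the bitstream; the function name and the (start, length) representation are illustrative. It flags a cut inside a header (rule (3), Figure 3(a)) and a payload that contains a header but does not begin with one (rule (2), Figure 3(b)). It is simplified: it does not rank headers by syntactic level, so it does not verify that a payload begins with the syntactically highest header.

```python
def check_payload_cuts(stream_len: int, headers: list[tuple[int, int]],
                       cuts: list[int]) -> list[str]:
    """Report rule (2)/(3) violations for the given payload cut points.

    headers: (start, length) byte spans of headers in the bitstream.
    cuts: byte offsets where one RTP payload ends and the next begins.
    """
    problems = []
    header_starts = {start for start, _ in headers}
    for cut in cuts:
        for start, length in headers:
            if start < cut < start + length:
                problems.append(f"cut at {cut} splits header at {start} (rule 3)")
    bounds = [0, *sorted(cuts), stream_len]
    for begin, end in zip(bounds, bounds[1:]):
        has_header = any(begin <= s < end for s in header_starts)
        if has_header and begin not in header_starts:
            problems.append(f"payload [{begin}, {end}) contains a header "
                            f"but does not start with one (rule 2)")
    return problems
```

For a VOP header at bytes 0 to 9 and a video packet header at bytes 500 to 504, a cut at 500 gives the Figure 2(d) split with no problems, while a cut at 250 reproduces Figure 3(b).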

    +------+------+------+------+
(a) | RTP  |  VS  |  VO  | VOL  |
    |header|header|header|header|
    +------+------+------+------+
        
    +------+------+------+------+------------+
(b) | RTP  |  VS  |  VO  | VOL  |Video Packet|
    |header|header|header|header|            |
    +------+------+------+------+------------+
        
    +------+-----+------------------+
(c) | RTP  | GOV |Video Object Plane|
    |header|     |                  |
    +------+-----+------------------+
        
    +------+------+------------+  +------+------+------------+
(d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
    |header|header|    (1)     |  |header|header|    (2)     |
    +------+------+------------+  +------+------+------------+
        
    +------+------+------------+------+------------+------+------------+
(e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
    |header|header|     (1)    |header|    (2)     |header|    (3)     |
    +------+------+------------+------+------------+------+------------+
        
    +------+------+------------+  +------+------------+
(f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
    |header|header|    (1)     |  |header|    (2)     | ___
    +------+------+------------+  +------+------------+
        

Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

    +------+-------------+  +------+------------+------------+
(a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
    |header|  VP header  |  |header|  VP header |            |
    +------+-------------+  +------+------------+------------+
        
    +------+------+----------+  +------+---------+------+------------+
(b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
    |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
    +------+------+----------+  +------+---------+------+------------+
        

Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual bitstream

4. RTP Packetization of MPEG-4 Audio bitstream

This section specifies RTP packetization rules for MPEG-4 Audio bitstreams. MPEG-4 Audio streams MUST be formatted by the LATM (Low-overhead MPEG-4 Audio Transport Multiplex) tool [5], and the LATM-based streams are then mapped onto RTP packets as described in the three sections below.

4.1 RTP Packet Format

LATM-based streams consist of a sequence of audioMuxElements that include one or more audio frames. A complete audioMuxElement or a part of one SHALL be mapped directly onto an RTP payload without any removal of audioMuxElement syntax elements (see Figure 4). The first byte of each audioMuxElement SHALL be located at the first payload location in an RTP packet.
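A minimal sender-side sketch of this mapping, under our reading of the rule: because the first byte of every audioMuxElement must be the first payload byte of an RTP packet, each payload carries at most one audioMuxElement or one fragment of it, and a large element is split across consecutive packets. The function names and the payload budget are illustrative; RTP header fields are not modeled here.

```python
def packetize_audio_mux_element(element: bytes, max_payload: int) -> list[bytes]:
    """Split one audioMuxElement into payloads of at most max_payload bytes.

    The element's first byte is always the first byte of the first payload.
    """
    if not element:
        raise ValueError("empty audioMuxElement")
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [element[i:i + max_payload] for i in range(0, len(element), max_payload)]


def packetize_latm_stream(elements: list[bytes], max_payload: int) -> list[bytes]:
    """Map a sequence of audioMuxElements onto RTP payloads, in order."""
    payloads: list[bytes] = []
    for element in elements:
        # Each audioMuxElement starts a fresh payload, so bytes of two
        # elements never share an RTP packet.
        payloads.extend(packetize_audio_mux_element(element, max_payload))
    return payloads
```

A small element thus costs a whole RTP/IP header of its own; that overhead is the price of letting a receiver find every audioMuxElement boundary at the start of a payload.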

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               |RTP
:                 audioMuxElement (byte aligned)                :Payload
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 4 - An RTP packet for MPEG-4 Audio
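As an illustration of Figure 4, the following Python sketch splits a received packet into its RFC 1889 header fields and the payload that carries the audioMuxElement (or a fragment of one). It assumes the standard 12-octet fixed header followed by an optional CSRC list, header extension and padding; the function name `parse_rtp_audio_packet` is hypothetical, and no LATM parsing is attempted.

```python
import struct

def parse_rtp_audio_packet(packet: bytes) -> dict:
    """Split an RTP packet into header fields and its audioMuxElement payload."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count  # skip the CSRC identifiers
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # 16 profile-defined bits, then the extension length in 32-bit words
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:
        end -= packet[-1]  # last octet counts the padding octets, itself included
    if end < offset:
        raise ValueError("malformed RTP packet")
    return {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        # The audioMuxElement starts at the first payload octet.
        "payload": packet[offset:end],
    }
```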

In order to decode the audioMuxElement, the following muxConfigPresent information is required to be indicated by an out-of-band means. When SDP is utilized for this indication, MIME parameter "cpresent" corresponds to the muxConfigPresent information (see section 5.3).

muxConfigPresent: If this value is set to 1 (in-band mode), the audioMuxElement SHALL include an indication bit "useSameStreamMux" and MAY include the configuration information for audio compression, "StreamMuxConfig". The useSameStreamMux bit indicates whether the StreamMuxConfig element of the previous frame applies to the current frame. If the useSameStreamMux bit indicates that the previous frame's StreamMuxConfig is to be used but the previous frame has been lost, the current frame may not be decodable. Therefore, in in-band mode, the StreamMuxConfig element SHOULD be transmitted repeatedly, at an interval chosen according to network conditions. Conversely, if muxConfigPresent is set to 0 (out-of-band mode), the StreamMuxConfig element is required to be transmitted by an out-of-band means. When SDP is used, the MIME parameter "config" carries it (see section 5.3).
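The loss problem described above can be made concrete with a small, conservative receiver-side sketch in Python. It assumes the caller has already extracted the useSameStreamMux bit (and, when that bit is 0, the StreamMuxConfig bytes that follow it) from each audioMuxElement; the class and method names are hypothetical. After any loss, a frame that reuses the previous configuration is treated as undecodable until a frame carrying an explicit StreamMuxConfig arrives, which is why repeating StreamMuxConfig bounds the outage.

```python
from typing import Optional

class InBandConfigTracker:
    """Hypothetical receiver helper for muxConfigPresent = 1 (in-band mode)."""

    def __init__(self) -> None:
        self.current: Optional[bytes] = None  # last StreamMuxConfig received
        self.uncertain = False  # True once a loss may have hidden a config change

    def on_loss(self) -> None:
        # The lost frame may have carried a new StreamMuxConfig.
        self.uncertain = True

    def on_frame(self, use_same_stream_mux: bool,
                 config: Optional[bytes]) -> Optional[bytes]:
        """Return the StreamMuxConfig for this frame, or None if not decodable."""
        if not use_same_stream_mux:
            if config is None:
                raise ValueError("useSameStreamMux = 0 but no StreamMuxConfig given")
            self.current, self.uncertain = config, False
            return config
        if self.current is None or self.uncertain:
            return None  # depends on a configuration we may have missed
        return self.current
```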

4.2 Use of RTP Header Fields for MPEG-4 Audio

Payload Type (PT): The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding, or if that is not done, then a payload type in the dynamic range shall be chosen by means of an out-of-band signaling protocol (e.g., H.245, SIP, etc.). In the dynamic assignment of RTP payload types for scalable streams, a different value SHOULD be assigned to each layer. The assigned values SHOULD be in order of enhancement layer dependency, where the base layer has the smallest value.
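A minimal Python sketch of the layer-ordering rule follows. It assumes the dynamic payload type range 96 to 127 of the RTP audio/video profile and, for simplicity, hands out contiguous values; the text above only requires distinct values in dependency order, and the function name is hypothetical.

```python
from typing import List

DYNAMIC_PT_MIN, DYNAMIC_PT_MAX = 96, 127  # dynamic range of the RTP A/V profile

def assign_layer_payload_types(num_layers: int,
                               first_free: int = DYNAMIC_PT_MIN) -> List[int]:
    """Assign one dynamic payload type per layer of a scalable stream.

    Layers are indexed in order of enhancement-layer dependency, index 0
    being the base layer, so the base layer receives the smallest value.
    """
    if num_layers < 1:
        raise ValueError("need at least one layer")
    last = first_free + num_layers - 1
    if first_free < DYNAMIC_PT_MIN or last > DYNAMIC_PT_MAX:
        raise ValueError("not enough dynamic payload types available")
    return list(range(first_free, last + 1))
```

For example, a base layer plus two enhancement layers would receive 96, 97 and 98.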

Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It is set to one to indicate that the RTP packet contains a complete audioMuxElement or the last fragment of an audioMuxElement.

Timestamp: The timestamp indicates the sampling instant of the first audio frame contained in the RTP packet. Timestamps are recommended to start at a random value for security reasons.

Unless specified by an out-of-band means, the resolution of the timestamp is set to its default value of 90 kHz.
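The timestamp advances by the number of clock ticks spanned by the audio each packet carries. The Python sketch below works this out for a frame of N samples; the 1024-samples-per-frame figure used in the checks is an assumption corresponding to a typical AAC frame, and the helper names are hypothetical. With the default 90 kHz clock the per-frame increment is not always an integer (for example at 44.1 kHz), so deriving each timestamp from the frame index, rather than accumulating rounded increments, avoids drift. This is one reason to set the clock rate equal to the sampling rate, as section 5.3 allows.

```python
from fractions import Fraction

RTP_TS_MODULUS = 2 ** 32  # the RTP timestamp field is 32 bits wide

def timestamp_increment(samples_per_frame: int, sampling_rate: int,
                        clock_rate: int = 90000) -> Fraction:
    """Exact number of RTP clock ticks spanned by one audio frame."""
    return Fraction(samples_per_frame * clock_rate, sampling_rate)

def frame_timestamp(initial: int, frame_index: int, samples_per_frame: int,
                    sampling_rate: int, clock_rate: int = 90000) -> int:
    """Timestamp of the first sample of frame `frame_index`, computed from
    the frame index so that rounding errors do not accumulate."""
    ticks = frame_index * samples_per_frame * clock_rate // sampling_rate
    return (initial + ticks) % RTP_TS_MODULUS
```

At 24 kHz, a 1024-sample frame spans 1024 ticks when the clock rate equals the sampling rate, and 3840 ticks at the default 90 kHz.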

Sequence Number: Incremented by one for each RTP packet sent, starting, for security reasons, with a random value.

Other header fields are used as described in RFC 1889 [8].
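Putting the rules of this section together, a sender might build packets as in the Python sketch below. It is a sketch under stated assumptions: a single source (CC = 0), no header extension or padding, a payload type already chosen out of band, and a caller that supplies media time in RTP clock ticks; the class name `LatmRtpSender` is hypothetical. The standard-library `secrets.randbits` draws the random initial sequence number and timestamp.

```python
import secrets
import struct

class LatmRtpSender:
    """Sketch of the Section 4.2 header rules for a single source."""

    def __init__(self, payload_type: int, ssrc: int) -> None:
        if not 0 <= payload_type <= 127:
            raise ValueError("payload type is a 7-bit field")
        self.payload_type = payload_type
        self.ssrc = ssrc & 0xFFFFFFFF
        # Random initial sequence number and timestamp, for security reasons.
        self.seq = secrets.randbits(16)
        self.ts_base = secrets.randbits(32)

    def build(self, payload: bytes, media_ticks: int, marker: bool) -> bytes:
        """Build one packet.

        media_ticks: sampling instant of the first audio frame in the packet,
        in RTP clock ticks counted from the start of the stream.
        marker: True for a complete audioMuxElement or its last fragment.
        """
        b0 = 0x80  # V=2, P=0, X=0, CC=0
        b1 = (0x80 if marker else 0x00) | self.payload_type
        timestamp = (self.ts_base + media_ticks) & 0xFFFFFFFF
        header = struct.pack("!BBHII", b0, b1, self.seq, timestamp, self.ssrc)
        self.seq = (self.seq + 1) & 0xFFFF  # +1 per packet, wrapping at 2^16
        return header + payload
```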

4.3 Fragmentation of MPEG-4 Audio bitstream

It is RECOMMENDED to put one audioMuxElement in each RTP packet. If the audioMuxElement can be kept small enough that the RTP packet containing it does not exceed the path MTU, this poses no problem. If it cannot, the audioMuxElement MAY be fragmented and spread across multiple packets.
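The fragmentation rule can be sketched in Python as follows, together with marker-bit-based reassembly. The per-packet budget assumes IPv4 and UDP headers (20 + 8 octets) below a 12-octet RTP header without CSRCs; those numbers and the function names are assumptions for illustration. The reassembly helper also assumes payloads arrive in sequence-number order without loss; a real receiver would check sequence numbers and discard incomplete elements.

```python
from typing import Iterable, List, Tuple

RTP_HEADER_SIZE = 12  # fixed header, no CSRCs or extension

def fragment_audio_mux_element(element: bytes, path_mtu: int,
                               lower_overhead: int = 28) -> List[Tuple[bytes, bool]]:
    """Split one audioMuxElement into (payload, marker) pairs.

    lower_overhead: assumed IPv4 (20) + UDP (8) header octets below RTP.
    The marker is set only on the last fragment (or a complete element).
    """
    max_payload = path_mtu - lower_overhead - RTP_HEADER_SIZE
    if max_payload <= 0:
        raise ValueError("path MTU too small")
    chunks = [element[i:i + max_payload]
              for i in range(0, len(element), max_payload)] or [b""]
    return [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]

def reassemble(payloads: Iterable[Tuple[bytes, bool]]) -> List[bytes]:
    """Rebuild audioMuxElements from in-order payloads using the marker bit."""
    elements: List[bytes] = []
    pending = bytearray()
    for payload, marker in payloads:
        pending += payload
        if marker:  # complete element or its last fragment
            elements.append(bytes(pending))
            pending.clear()
    return elements
```

With a 1500-octet path MTU, each fragment can carry at most 1460 payload octets under these assumptions.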

5. MIME type registration for MPEG-4 Audio/Visual streams

The following sections describe the MIME type registrations for MPEG-4 Audio/Visual streams. MIME type registration and SDP usage for the MPEG-4 Visual stream are described in Sections 5.1 and 5.2, respectively, while MIME type registration and SDP usage for MPEG-4 Audio stream are described in Sections 5.3 and 5.4, respectively.

5.1 MIME type registration for MPEG-4 Visual

MIME media type name: video

MIME subtype name: MP4V-ES

Required parameters: none

Optional parameters:

rate: This parameter is used only for RTP transport. It indicates the resolution of the timestamp field in the RTP header. If this parameter is not specified, its default value of 90000 (90kHz) is used.

profile-level-id: A decimal representation of MPEG-4 Visual Profile and Level indication value (profile_and_level_indication) defined in Table G-1 of ISO/IEC 14496-2 [2][4]. This parameter MAY be used in the capability exchange or session setup procedure to indicate MPEG-4 Visual Profile and Level combination of which the MPEG-4 Visual codec is capable. If this parameter is not specified by the procedure, its default value of 1 (Simple Profile/Level 1) is used.

config: This parameter SHALL be used to indicate the configuration of the corresponding MPEG-4 Visual bitstream. It SHALL NOT be used to indicate the codec capability in the capability exchange procedure. It is a hexadecimal representation of an octet string that expresses the MPEG-4 Visual configuration information, as defined in subclause 6.2.1, "Start codes", of ISO/IEC 14496-2 [2][4][9]. The configuration information is mapped onto the octet string on an MSB-first basis. The first bit of the configuration information SHALL be located at the MSB of the first octet. The configuration information indicated by this parameter SHALL be the same as the configuration information in the corresponding MPEG-4 Visual stream, except for first_half_vbv_occupancy and latter_half_vbv_occupancy, if they exist, which may vary in the repeated configuration information inside an MPEG-4 Visual stream (see subclause 6.2.1, "Start codes", of ISO/IEC 14496-2).
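To illustrate the octet-string layout, the Python sketch below decodes a "config" value, lists the start codes it contains (the three-octet prefix 0x000001 followed by a start-code value), and reads the 8-bit profile_and_level_indication that follows visual_object_sequence_start_code (0x000001B0). The function names are hypothetical and the scan does no further syntax parsing. Applied to the config value of the Simple Profile/Level 1 SDP example in section 5.2 (with its line wrap removed), it finds the visual object sequence (0xB0), visual object (0xB5), video object (0x00) and video object layer (0x20) start codes and returns 1, matching profile-level-id=1.

```python
from typing import List, Tuple

VISUAL_OBJECT_SEQUENCE_START_CODE = 0xB0

def visual_config_start_codes(config_hex: str) -> List[Tuple[int, int]]:
    """Return (octet offset, start code value) for every 0x000001xx start code."""
    data = bytes.fromhex(config_hex)
    found: List[Tuple[int, int]] = []
    i = 0
    while i + 3 < len(data):
        if data[i] == 0x00 and data[i + 1] == 0x00 and data[i + 2] == 0x01:
            found.append((i, data[i + 3]))
            i += 4  # skip past the whole start code
        else:
            i += 1
    return found

def profile_and_level_indication(config_hex: str) -> int:
    """Read the 8-bit field that follows visual_object_sequence_start_code."""
    data = bytes.fromhex(config_hex)
    for offset, code in visual_config_start_codes(config_hex):
        if code == VISUAL_OBJECT_SEQUENCE_START_CODE and offset + 4 < len(data):
            return data[offset + 4]
    raise ValueError("no visual_object_sequence_start_code with a following octet")
```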

Example usages for these parameters are:

- MPEG-4 Visual Simple Profile/Level 1: Content-type: video/mp4v-es; profile-level-id=1

- MPEG-4 Visual Core Profile/Level 2: Content-type: video/mp4v-es; profile-level-id=34

- MPEG-4 Visual Advanced Real Time Simple Profile/Level 1: Content-type: video/mp4v-es; profile-level-id=145
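Because "profile-level-id" is the decimal form of the 8-bit profile_and_level_indication, the three examples above correspond to 0x01, 0x22 and 0x91 in the bitstream. The Python sketch below extracts the value from a Content-type string of the form shown, applying the default of 1 (Simple Profile/Level 1) when the parameter is absent. It is a simplified, hypothetical helper: it does not handle quoted parameter values or other details of MIME syntax.

```python
from typing import Dict, Tuple

def parse_mp4v_content_type(content_type: str) -> Tuple[str, int]:
    """Return (media type, profile_and_level_indication) from a Content-type value."""
    media_type, *params = [part.strip() for part in content_type.split(";")]
    values: Dict[str, str] = {}
    for param in params:
        if param:
            name, _, value = param.partition("=")
            values[name.strip().lower()] = value.strip()
    # Default when the parameter is not given: 1 (Simple Profile/Level 1).
    pli = int(values.get("profile-level-id", "1"), 10)
    if not 0 <= pli <= 0xFF:
        raise ValueError("profile_and_level_indication is an 8-bit value")
    return media_type.lower(), pli
```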

Published specification: The specifications for MPEG-4 Visual streams are presented in ISO/IEC 14496-2 [2][4][9]. The RTP payload format is described in RFC 3016.

Encoding considerations: Video bitstreams MUST be generated according to MPEG-4 Visual specifications (ISO/IEC 14496-2). A video bitstream is binary data and MUST be encoded for non-binary transport (for Email, the Base64 encoding is sufficient). This type is also defined for transfer via RTP. The RTP packets MUST be packetized according to the MPEG-4 Visual RTP payload format defined in RFC 3016.

Security considerations: See section 6 of RFC 3016.

Interoperability considerations: MPEG-4 Visual provides a large and rich set of tools for the coding of visual objects. For effective implementation of the standard, subsets of the MPEG-4 Visual tool sets have been provided for use in specific applications. These subsets, called 'Profiles', limit the size of the tool set a decoder is required to implement. In order to restrict computational complexity, one or more Levels are set for each Profile. A Profile@Level combination allows:

o a codec builder to implement only the subset of the standard that is needed, while maintaining interworking with other MPEG-4 devices included in the same combination, and

o checking whether MPEG-4 devices comply with the standard ('conformance testing').

The visual stream SHALL be compliant with the MPEG-4 Visual Profile@Level specified by the parameter "profile-level-id". Interoperability between a sender and a receiver may be achieved by specifying the parameter "profile-level-id" in MIME content, or by arranging in the capability exchange/announcement procedure to set this parameter mutually to the same value.

Applications which use this media type: Audio and visual streaming and conferencing tools, Internet messaging and Email applications.

Additional information: none

Person & email address to contact for further information: The authors of RFC 3016. (See section 8.)

Intended usage: COMMON

Author/Change controller: The authors of RFC 3016. (See section 8.)

5.2 SDP usage of MPEG-4 Visual

The MIME media type video/MP4V-ES string is mapped to fields in the Session Description Protocol (SDP), RFC 2327, as follows:

o The MIME type (video) goes in SDP "m=" as the media name.

o The MIME subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding name.

o The optional parameter "rate" goes in "a=rtpmap" as the clock rate.

o The optional parameters "profile-level-id" and "config" go in the "a=fmtp" line to indicate the coder capability and configuration, respectively. These parameters are expressed as a MIME media type string, in the form of a semicolon-separated list of parameter=value pairs.
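The mapping above can be sketched as a small Python helper that emits the media-level SDP lines for an MP4V-ES stream. The function name and signature are hypothetical; it writes a single port in the "m=" line (some of the examples that follow use the "port/number of ports" form instead) and joins the "a=fmtp" parameters with semicolons.

```python
from typing import List, Optional

def mp4v_es_sdp(port: int, payload_type: int, rate: int = 90000,
                profile_level_id: Optional[int] = None,
                config: Optional[str] = None) -> List[str]:
    """Build the media-level SDP lines for video/MP4V-ES."""
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",   # MIME type -> media name
        f"a=rtpmap:{payload_type} MP4V-ES/{rate}",  # subtype and clock rate
    ]
    params = []
    if profile_level_id is not None:
        params.append(f"profile-level-id={profile_level_id}")
    if config is not None:
        params.append(f"config={config}")
    if params:  # only emit a=fmtp when there is something to say
        lines.append(f"a=fmtp:{payload_type} " + ";".join(params))
    return lines
```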

The following are some examples of media representation in SDP:

Simple Profile/Level 1, rate=90000(90kHz), "profile-level-id" and
"config" are present in "a=fmtp" line:
  m=video 49170/2 RTP/AVP 98
  a=rtpmap:98 MP4V-ES/90000
  a=fmtp:98 profile-level-id=1;config=000001B001000001B509000001000000012
     0008440FA282C2090A21F
        
Core Profile/Level 2, rate=90000(90kHz), "profile-level-id" is present in
"a=fmtp" line:
  m=video 49170/2 RTP/AVP 98
  a=rtpmap:98 MP4V-ES/90000
  a=fmtp:98 profile-level-id=34
        
Advanced Real Time Simple Profile/Level 1, rate=90000(90kHz),
"profile-level-id" is present in "a=fmtp" line:
  m=video 49170/2 RTP/AVP 98
  a=rtpmap:98 MP4V-ES/90000
  a=fmtp:98 profile-level-id=145
        
5.3 MIME type registration of MPEG-4 Audio

MIME media type name: audio

MIME subtype name: MP4A-LATM

Required parameters: rate: the rate parameter indicates the RTP timestamp clock rate. The default value is 90000. Other rates MAY be specified only if they are set to the same value as the audio sampling rate (number of samples per second).

Optional parameters: profile-level-id: a decimal representation of MPEG-4 Audio Profile Level indication value defined in ISO/IEC 14496-1 ([6] and its amendments). This parameter indicates which MPEG-4 Audio tool subsets the decoder is capable of using. If this parameter is not specified in the capability exchange or session setup procedure, its default value of 30 (Natural Audio Profile/Level 1) is used.

object: a decimal representation of the MPEG-4 Audio Object Type value defined in ISO/IEC 14496-3 [5]. This parameter specifies the tool to be used by the coder. It can be used to limit the capability within the specified "profile-level-id".

bitrate: the data rate for the audio bitstream, in bits per second.

cpresent: a boolean parameter that indicates whether audio payload configuration data has been multiplexed into an RTP payload (see section 4.1). A value of 0 indicates that the configuration data has not been multiplexed into an RTP payload; a value of 1 indicates that it has. If the parameter is omitted, the default is 1.

config: a hexadecimal representation of an octet string that expresses the audio payload configuration data "StreamMuxConfig", as defined in ISO/IEC 14496-3 [5] (see section 4.1). Configuration data is mapped onto the octet string in an MSB-first basis. The first bit of the configuration data SHALL be located at the MSB of the first octet. In the last octet, zero-padding bits, if necessary, SHALL follow the configuration data.
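The MSB-first mapping with zero padding can be sketched in Python as below. It assumes the StreamMuxConfig has already been serialized into a string of '0'/'1' characters (producing those bits requires the ISO/IEC 14496-3 syntax, which is not reproduced here); the function names are hypothetical. Decoding the hexadecimal string back to bits also yields the padding bits, since their number is not recorded in the octet string itself.

```python
def config_bits_to_hex(bits: str) -> str:
    """Map configuration bits onto octets MSB-first, zero-padding the last octet."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    padded = bits + "0" * (-len(bits) % 8)  # zero-padding bits follow the data
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return octets.hex().upper()

def config_hex_to_bits(config_hex: str) -> str:
    """Inverse mapping; the result still includes any trailing padding bits."""
    return "".join(f"{octet:08b}" for octet in bytes.fromhex(config_hex))
```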

ptime: RECOMMENDED duration of each packet in milliseconds.

Published specification: Payload format specifications are described in this document. Encoding specifications are provided in ISO/IEC 14496-3 [3][5].

Encoding considerations: This type is only defined for transfer via RTP.

Security considerations: See Section 6 of RFC 3016.

Interoperability considerations: MPEG-4 Audio provides a large and rich set of tools for the coding of audio objects. For effective implementation of the standard, subsets of the MPEG-4 Audio tool sets similar to those used in MPEG-4 Visual have been provided (see section 5.1).

The audio stream SHALL be compliant with the MPEG-4 Audio Profile@Level specified by the parameter "profile-level-id". Interoperability between a sender and a receiver may be achieved by specifying the parameter "profile-level-id" in MIME content, or by arranging in the capability exchange procedure to set this parameter mutually to the same value. Furthermore, the "object" parameter can be used to limit the capability within the specified Profile@Level in capability exchange.

Applications which use this media type: Audio and video streaming and conferencing tools.

Additional information: none

Person & email address to contact for further information: See Section 8 of RFC 3016.

Intended usage: COMMON

Author/Change controller: See Section 8 of RFC 3016.

5.4 SDP usage of MPEG-4 Audio

The MIME media type audio/MP4A-LATM string is mapped to fields in the Session Description Protocol (SDP), RFC 2327, as follows:

o The MIME type (audio) goes in SDP "m=" as the media name.

o The MIME subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the encoding name.

o The required parameter "rate" goes in "a=rtpmap" as the clock rate.

o The optional parameter "ptime" goes in SDP "a=ptime" attribute.

o The optional parameter "profile-level-id" goes in the "a=fmtp" line to indicate the coder capability. The "object" parameter goes in the "a=fmtp" attribute. The payload-format-specific parameters "bitrate", "cpresent" and "config" also go in the "a=fmtp" line. These parameters are expressed as a MIME media type string, in the form of a semicolon-separated list of parameter=value pairs.

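On the receiving side, the "a=fmtp" parameters can be recovered with a small parser like the Python sketch below, which tolerates the optional space after each semicolon seen in the examples that follow. The helper names are hypothetical; the sketch assumes the whole attribute sits on one line (the wrapped lines in this document are layout only) and interprets "cpresent" with its default of 1 when it is absent.

```python
from typing import Dict, Tuple

def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an "a=fmtp:<format> p1=v1;p2=v2" line into its parts."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].strip().partition(" ")
    params: Dict[str, str] = {}
    for item in rest.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            params[name.strip().lower()] = value.strip()
    return int(fmt), params

def mux_config_in_band(params: Dict[str, str]) -> bool:
    """True if StreamMuxConfig travels in the RTP payload (cpresent defaults to 1)."""
    return params.get("cpresent", "1") == "1"
```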
The following are some examples of the media representation in SDP:

For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz),
  m=audio 49230 RTP/AVP 96
  a=rtpmap:96 MP4A-LATM/8000
  a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070
  a=ptime:20
        

For 64 kb/s AAC LC stereo bitstreams (with an audio sampling rate of 24 kHz),

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A-LATM/24000
      a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
      config=9122620000
        

In the above two examples, audio configuration data is not multiplexed into the RTP payload and is described only in SDP. Furthermore, the "clock rate" is set to the audio sampling rate.

If the clock rate has been set to its default value and it is necessary to obtain the audio sampling rate, this can be done by parsing the "config" parameter (see the following example).

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A-LATM/90000
      a=fmtp:96 object=8; cpresent=0; config=9128B1071070
        

The following example shows that the audio configuration data appears in the RTP payload.

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A-LATM/90000
      a=fmtp:96 object=2; cpresent=1
        
6. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [8]. This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed on the compressed data so there is no conflict between the two operations.

The complete MPEG-4 system allows for transport of a wide range of content, including Java applets (MPEG-J) and scripts. Since this payload format is restricted to audio and video streams, it is not possible to transport such active content in this format.

7. References

1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

2 ISO/IEC 14496-2:1999, "Information technology - Coding of audio-visual objects - Part 2: Visual".

3 ISO/IEC 14496-3:1999, "Information technology - Coding of audio-visual objects - Part 3: Audio".

4 ISO/IEC 14496-2:1999/Amd.1:2000, "Information technology - Coding of audio-visual objects - Part 2: Visual, Amendment 1: Visual extensions".

5 ISO/IEC 14496-3:1999/Amd.1:2000, "Information technology - Coding of audio-visual objects - Part 3: Audio, Amendment 1: Audio extensions".

6 ISO/IEC 14496-1:1999, "Information technology - Coding of audio-visual objects - Part 1: Systems".

7 Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

8 Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.

9 ISO/IEC 14496-2:1999/Cor.1:2000, "Information technology - Coding of audio-visual objects - Part 2: Visual, Technical corrigendum 1".

8. Authors' Addresses

Yoshihiro Kikuchi Toshiba Corporation 1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan

   EMail: yoshihiro.kikuchi@toshiba.co.jp
        

Yoshinori Matsui Matsushita Electric Industrial Co., LTD. 1006, Kadoma, Kadoma-shi, Osaka, Japan

   EMail: matsui@drl.mei.co.jp
        

Toshiyuki Nomura NEC Corporation 4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Japan

   EMail: t-nomura@ccm.cl.nec.co.jp
        

Shigeru Fukunaga Oki Electric Industry Co., Ltd. 1-2-27 Shiromi, Chuo-ku, Osaka 540-6025 Japan.

   EMail: fukunaga444@oki.co.jp
        

Hideaki Kimata Nippon Telegraph and Telephone Corporation 1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan

   EMail: kimata@nttvdt.hil.ntt.co.jp
        
9. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
