Network Working Group                                          M. Civanlar
Request for Comments: 2343                                         G. Cash
Category: Experimental                                          B. Haskell
                                                        AT&T Labs-Research
                                                                  May 1998
        
Network Working Group                                          M. Civanlar
Request for Comments: 2343                                         G. Cash
Category: Experimental                                          B. Haskell
                                                        AT&T Labs-Research
                                                                  May 1998
        

RTP Payload Format for Bundled MPEG

捆绑MPEG的RTP有效负载格式

Status of this Memo

本备忘录的状况

This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited.

这份备忘录为互联网社区定义了一个实验性协议。本备忘录未规定任何类型的互联网标准。要求进行讨论并提出改进建议。本备忘录的分发不受限制。

Copyright Notice

版权公告

Copyright (C) The Internet Society (1998). All Rights Reserved.

版权所有(C)互联网协会(1998年)。版权所有。

Abstract

摘要

This document describes a payload type for bundled, MPEG-2 encoded video and audio data that may be used with RTP, version 2. Bundling has some advantages for this payload type particularly when it is used for video-on-demand applications. This payload type may be used when its advantages are important enough to sacrifice the modularity of having separate audio and video streams.

本文档描述了捆绑的MPEG-2编码视频和音频数据的有效负载类型,该数据可与RTP版本2一起使用。捆绑对于这种负载类型有一些优势,特别是当它用于视频点播应用时。当这种有效负载类型的优点非常重要,足以牺牲具有单独音频和视频流的模块性时,可以使用这种有效负载类型。

1. Introduction
1. 介绍

This document describes a bundled packetization scheme for MPEG-2 encoded audio and video streams using the Real-time Transport Protocol (RTP), version 2 [1].

本文档描述了使用实时传输协议(RTP)版本2[1]的MPEG-2编码音频和视频流的捆绑打包方案。

The MPEG-2 International standard consists of three layers: audio, video and systems [2]. The audio and the video layers define the syntax and semantics of the corresponding "elementary streams." The systems layer supports synchronization and interleaving of multiple compressed streams, buffer initialization and management, and time identification. RFC 2250 [3] describes packetization techniques to transport individual audio and video elementary streams as well as the transport stream, which is defined at the system layer, using the RTP.

MPEG-2国际标准由三层组成:音频、视频和系统[2]。音频和视频层定义了相应“基本流”的语法和语义。系统层支持多个压缩流的同步和交织、缓冲区初始化和管理以及时间标识。RFC 2250[3]描述了使用RTP传输单个音频和视频基本流以及在系统层定义的传输流的打包技术。

The bundled packetization scheme is needed because it has several advantages over other schemes for some important applications including video-on-demand (VOD) where, audio and video are always used together. Its advantages over independent packetization of audio and video are:

捆绑打包方案是必要的,因为对于一些重要的应用,包括视频点播(VOD),它比其他方案有几个优点,其中音频和视频总是一起使用。与音频和视频的独立打包相比,其优势在于:

1. Uses a single port per "program" (i.e. bundled A/V). This may increase the number of streams that can be served e.g., from a VOD server. Also, it eliminates the performance hit when two ports are used for the separate audio and video streams on the client side.

1. 每个“程序”使用一个端口(即捆绑a/V)。这可能增加例如从VOD服务器可以服务的流的数量。此外,当客户端的两个端口用于单独的音频和视频流时,它消除了性能影响。

2. Provides implicit synchronization of audio and video. This is particularly convenient when the A/V data is stored in an interleaved format at the server.

2. 提供音频和视频的隐式同步。当A/V数据以交织格式存储在服务器上时,这尤其方便。

3. Reduces the header overhead. Since using large packets increases the effects of losses and delay, audio only packets need to be smaller increasing the overhead. An A/V bundled format can provide about 1% overall overhead reduction. Considering the high bitrates used for MPEG-2 encoded material, e.g. 4 Mbps, the number of bits saved, e.g. 40 Kbps, may provide noticeable audio or video quality improvement.

3. 减少标头开销。由于使用大数据包会增加丢失和延迟的影响,因此仅音频数据包需要更小,从而增加开销。A/V捆绑格式可以提供大约1%的总体开销减少。考虑到用于MPEG-2编码材料的高比特率,例如4mbps,保存的比特数,例如40kbps,可以提供显著的音频或视频质量改进。

4. May reduce overall receiver buffer size. Audio and video streams may experience different delays when transmitted separately. The receiver buffers need to be designed for the longest of these delays. For example, let's assume that using two buffers, each with a size B, is sufficient with probability P when each stream is transmitted individually. The probability that the same buffer size will be sufficient when both streams need to be received is P times the conditional probability of B being sufficient for the second stream given that it was sufficient for the first one. This conditional probability is, generally, less than one requiring use of a larger buffer size to achieve the same probability level.

4. 可能会减少整个接收器缓冲区的大小。音频和视频流在单独传输时可能会经历不同的延迟。接收机缓冲器需要设计为这些延迟中最长的一个。例如,假设当每个流单独传输时,使用两个缓冲区(每个缓冲区的大小为B)的概率P就足够了。当两个流都需要接收时,相同缓冲区大小足够的概率是B对第二个流足够的条件概率的P倍,前提是B对第一个流足够。通常,该条件概率小于需要使用较大缓冲区大小以达到相同概率水平的概率。

5. May help with the control of the overall bandwidth used by an A/V program.

5. 可能有助于控制A/V程序使用的总带宽。

And, the advantages over packetization of the transport layer streams are:

并且,与传输层流的分组相比的优点是:

1. Reduced overhead. It does not contain systems layer information which is redundant for the RTP (essentially they address similar issues).

1. 减少了开销。它不包含RTP冗余的系统层信息(基本上它们解决了类似的问题)。

2. Easier error recovery. Because of the structured packetization consistent with the application layer framing (ALF) principle, loss concealment and error recovery can be made simpler and more effective.

2. 更容易的错误恢复。由于与应用层成帧(ALF)原理一致的结构化分组,丢失隐藏和错误恢复可以变得更简单、更有效。

2. Encapsulation of Bundled MPEG Video and Audio
2. 捆绑MPEG视频和音频的封装

Video encapsulation follows rules similar to the ones described in [3] for encapsulation of MPEG elementary streams. Specifically,

视频封装遵循与[3]中描述的MPEG基本流封装规则相似的规则。明确地

1. The MPEG Video_Sequence_Header, when present, will always be at the beginning of an RTP payload.

1. MPEG Video_Sequence_头(如果存在)将始终位于RTP有效负载的开头。

2. An MPEG GOP_header, when present, will always be at the beginning of the RTP payload, or will follow a Video_Sequence_Header.

2. MPEG GOP_头(当存在时)将始终位于RTP有效载荷的开头,或者将跟随视频序列_头。

3. An MPEG Picture_Header, when present, will always be at the beginning of a RTP payload, or will follow a GOP_header.

3. MPEG Picture_头(如果存在)将始终位于RTP有效负载的开头,或紧跟在GOP_头之后。

In addition to these, it is required that:

除此之外,还要求:

4. Each packet must contain an integral number of video slices.

4. 每个数据包必须包含整数个视频片段。

It is the application's responsibility to adjust the slice sizes and the number of slices put in each RTP packet so that lower level fragmentation does not occur. This approach simplifies the receivers while somewhat increasing the complexity of the transmitter's packetizer. Considering that a slice can be as small as a single macroblock, it is possible to prevent fragmentation for most of the cases. If a packet size exceeds the path maximum transmission unit (path-MTU) [4], this payload type depends on the lower protocol layers for fragmentation although, this may cause problems with packet classification for integrated services (e.g. with RSVP).

应用程序负责调整切片大小和放入每个RTP数据包中的切片数量,以便不发生较低级别的碎片。这种方法简化了接收机,同时在一定程度上增加了发射机打包器的复杂性。考虑到切片可以与单个宏块一样小,在大多数情况下可以防止碎片。如果数据包大小超过路径最大传输单元(路径MTU)[4],则此有效负载类型取决于较低的协议层的碎片,尽管这可能会导致集成服务(例如,使用RSVP)的数据包分类问题。

The video data is followed by a sufficient number of integral audio frames to cover the duration of the video segment included in a packet. For example, if the first packet contains three 1/900 seconds long slices of video, and Layer I audio coding is used at a 44.1kHz sampling rate, only one audio frame covering 384/44100 seconds of audio need be included in this packet. Since the length of this audio frame (8.71 msec.) is longer than that of the video segment contained in this packet (3.33 msec), the next few packets may not contain any audio frames until the packet in which the covered video time extends outside the length of the previously transmitted audio frames. Alternatively, it is possible, in this proposal, to repeat the latest audio frame in "no-audio" packets for

视频数据之后是足够数量的完整音频帧,以覆盖包括在分组中的视频段的持续时间。例如,如果第一个分组包含三个1/900秒长的视频片段,并且以44.1kHz采样率使用第一层音频编码,则该分组中只需要包括一个覆盖384/44100秒音频的音频帧。由于该音频帧的长度(8.71毫秒)比该分组中包含的视频段的长度(3.33毫秒)长,因此在覆盖视频时间延伸到先前发送的音频帧长度之外的分组之前,接下来的几个分组可能不包含任何音频帧。或者,在该方案中,可以在“无音频”分组中为每个用户重复最新的音频帧

packet loss resilience. Again, it is the application's responsibility to adjust the bundled packet size according to the minimum MTU size to prevent fragmentation.

数据包丢失恢复能力。同样,应用程序负责根据最小MTU大小调整捆绑包大小,以防止碎片。

2.1. RTP Fixed Header for BMPEG Encapsulation
2.1. 用于BMPEG封装的RTP固定头

The following RTP header fields are used:

使用以下RTP标头字段:

Payload Type: A distinct payload type number, which may be dynamic, should be assigned to BMPEG.

有效负载类型:应为BMPEG分配一个不同的有效负载类型编号,该编号可能是动态的。

M Bit: Set for packets containing end of a picture.

M位:为包含图片结尾的数据包设置。

timestamp: 32-bit 90 kHz timestamp representing sampling time of the MPEG picture. May not be monotonically increasing if B pictures are present. Same for all packets belonging to the same picture. For packets that contain only a sequence, extension and/or GOP header, the timestamp is that of the subsequent picture.

时间戳:表示MPEG图片采样时间的32位90 kHz时间戳。如果存在B图片,则可能不是单调递增的。对于属于同一图片的所有数据包相同。对于仅包含序列、扩展和/或GOP报头的数据包,时间戳是后续图片的时间戳。

2.2. BMPEG Specific Header:

2.2. BMPEG特定标题:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | P |N|MBZ|    Audio Length   | |         Audio Offset          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 MBZ
        
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | P |N|MBZ|    Audio Length   | |         Audio Offset          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 MBZ
        

P: Picture type (2 bits). I (0), P (1), B (2).

P:图片类型(2位)。I(0),P(1),B(2)。

N: Header data changed (1 bit). Set if any part of the video sequence, extension, GOP and picture header data is different than that of the previously sent headers. It gets reset when all the header data gets repeated (see Appendix 1).

N:标题数据已更改(1位)。设置视频序列、分机、GOP和图片标题数据的任何部分是否与先前发送的标题不同。当所有标题数据重复时,它会被重置(见附录1)。

MBZ: Must be zero. Reserved for future use.

MBZ:必须是零。保留供将来使用。

Audio Length: (10 bits) Length of the audio data in this packet in bytes. Start of the audio data is found by subtracting "Audio Length" from the total length of the received packet.

音频长度:(10位)此数据包中音频数据的长度(字节)。音频数据的开始是通过从接收到的数据包的总长度中减去“音频长度”来找到的。

Audio Offset: (16 bits) The offset between the start of the audio frame and the RTP timestamp for this packet in number of audio samples (for multi-channel sources, a set of samples covering all channels is counted as one sample for this purpose.)

音频偏移:(16位)音频帧开始和该数据包的RTP时间戳之间的偏移,以音频样本数表示(对于多通道源,覆盖所有通道的一组样本被计为一个样本)

Audio offset is a signed integer in two's complement form. It allows a ~ +/- 750 msec offset at 44.1 KHz audio sampling rate. For a very low video frame rate (e.g., a frame per second), this offset may not be sufficient and this payload format may not be usable.

音频偏移量是2的补码形式的有符号整数。它允许在44.1 KHz音频采样率下~+/-750毫秒的偏移。对于非常低的视频帧速率(例如,每秒一帧),该偏移量可能不够,并且该有效负载格式可能不可用。

If B frames are present, audio frames are not re-ordered along with video. Instead, they are packetized along with video frames in their transmission order (e.g., an audio segment packetized with a video segment corresponding to a P picture may belong to a B picture, which will be transmitted later and should be rendered at the same time with this audio segment.) Even though the video segments are reordered, the audio offset for a particular audio segment is still relative to the RTP timestamp in the packet containing that audio segment.

如果存在B帧,则音频帧不会与视频一起重新排序。相反,它们按照其传输顺序与视频帧一起打包(例如,与对应于P图片的视频段打包的音频段可能属于B图片,其将在稍后传输并且应与该音频段同时呈现)。即使视频段被重新排序,特定音频段的音频偏移量仍然相对于包含该音频段的数据包中的RTP时间戳。

Since a special picture counter, such as the "temporal reference (TR)" field of [3], is not included in this payload format, lost GOP headers may not be detected. The only effect of this may be incorrect decoding of the B pictures immediately following the lost GOP header for some edited video material.

由于特殊图片计数器(例如[3]的“时间参考(TR)”字段)不包括在该有效载荷格式中,因此可能无法检测到丢失的GOP报头。这种情况的唯一影响可能是,对于某些编辑过的视频材料,在丢失GOP头之后,B图片的解码不正确。

3. Security Considerations
3. 安全考虑

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [1]. This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.

使用本规范中定义的有效负载格式的RTP数据包应遵守RTP规范[1]中讨论的安全注意事项。这意味着媒体流的机密性是通过加密实现的。由于与此有效负载格式一起使用的数据压缩是端到端应用的,因此可以在压缩之后执行加密,因此两个操作之间没有冲突。

This payload type does not exhibit any significant non-uniformity in the receiver side computational complexity for packet processing to cause a potential denial-of-service threat.

这种有效负载类型在数据包处理的接收方计算复杂性方面没有表现出任何显著的不均匀性,从而导致潜在的拒绝服务威胁。

A security review of this payload format found no additional considerations beyond those in the RTP specification.

对该有效负载格式的安全性审查发现,除了RTP规范中的考虑之外,没有其他考虑。

Appendix 1. Error Recovery

附录1。错误恢复

Packet losses can be detected from a combination of the sequence number and the timestamp fields of the RTP fixed header. The extent of the loss can be determined from the timestamp, the slice number and the horizontal location of the first slice in the packet. The slice number and the horizontal location can be determined from the slice header and the first macroblock address increment, which are located at fixed bit positions.

可以从RTP固定报头的序列号和时间戳字段的组合中检测分组丢失。丢失的程度可以根据时间戳、片编号和数据包中第一个片的水平位置来确定。片数和水平位置可根据片头和第一宏块地址增量确定,它们位于固定位位置。

If lost data consists of slices all from the same picture, new data following the loss may simply be given to the video decoder which will normally repeat missing pixels from a previous picture. The next audio frame must be played at the appropriate time determined by the timestamp and the audio offset contained in the received packet. Appropriate audio frames (e.g., representing background noise) may need to be fed to the audio decoder in place of the lost audio frames to keep the lip-synch and/or to conceal the effects of the losses.

如果丢失的数据由来自同一图片的所有片段组成,则丢失后的新数据可以简单地提供给视频解码器,视频解码器通常会重复先前图片中丢失的像素。下一个音频帧必须在由时间戳和接收数据包中包含的音频偏移量确定的适当时间播放。可能需要将适当的音频帧(例如,表示背景噪声)馈送到音频解码器以代替丢失的音频帧,以保持唇形同步和/或隐藏丢失的影响。

If the received new data after a loss is from the next picture (i.e. no complete picture loss) and the N bit is not set, previously received headers for the particular picture type (determined from the P bits) can be given to the video decoder followed by the new data. If N is set, data deletion until a new picture start code is advisable unless headers are made available to the receiver through some other channel.

如果丢失后接收到的新数据来自下一个图片(即,没有完整的图片丢失),并且未设置N位,则可将先前接收到的特定图片类型的报头(由P位确定)提供给视频解码器,随后是新数据。如果设置了N,则建议删除数据,直到新的图片开始代码,除非通过其他通道使标题可供接收器使用。

If data for more than one picture is lost and headers are not available, unless N is zero and at least one packet has been received for every intervening picture of the same type and that the N bit was 0 for each of those pictures, resynchronization to a new video sequence header is advisable.

如果一个以上图片的数据丢失,且报头不可用,除非N为零,并且对于相同类型的每个介入图片至少接收到一个数据包,并且对于这些图片中的每个图片,N位为0,否则建议重新同步到新的视频序列报头。

In all cases of heavy packet losses, if the correct headers for the missing Pictures are available, they can be given to the video decoder and the received data can be used irrespective of the N bit value or the number of lost pictures.

在所有严重分组丢失的情况下,如果丢失图片的正确报头可用,则可以将其提供给视频解码器,并且可以使用接收到的数据,而与N比特值或丢失图片的数量无关。

Appendix 2. Resynchronization

附录2。再同步

As described in [3], use of frequent video sequence headers makes it possible to join in a program at arbitrary times. Also, it reduces the resynchronization time after severe losses.

如[3]所述,使用频繁的视频序列头可以在任意时间加入程序。此外,它还缩短了严重损失后的重新同步时间。

References

工具书类

[1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.

[1] Schulzrinne,H.,Casner,S.,Frederick,R.,和V.Jacobson,“RTP:实时应用的传输协议”,RFC 1889,1996年1月。

[2] ISO/IEC International Standard 13818; "Generic coding of moving pictures and associated audio information," November 1994.

[2] ISO/IEC国际标准13818;“运动图像和相关音频信息的通用编码”,1994年11月。

[3] Hoffman, D., Fernando, G., Goyal, V., and M. Civanlar, "RTP Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

[3] Hoffman,D.,Fernando,G.,Goyal,V.,和M.Civanlar,“MPEG1/MPEG2视频的RTP有效载荷格式”,RFC 2250,1998年1月。

[4] Mogul, J., and S. Deering, "Path MTU Discovery", RFC 1191, November 1990.

[4] Mogul,J.和S.Deering,“MTU发现路径”,RFC 119119119990年11月。

Authors' Addresses

作者地址

M. Reha Civanlar AT&T Labs-Research 100 Schultz Drive Red Bank, NJ 07701 USA

M.Reha Civanlar AT&T实验室研究中心,美国新泽西州红银行舒尔茨大道100号,邮编:07701

   EMail: civanlar@research.att.com
        
   EMail: civanlar@research.att.com
        

Glenn L. Cash AT&T Labs-Research 100 Schultz Drive Red Bank, NJ 07701 USA

美国新泽西州红银行舒尔茨大道100号Glenn L.Cash AT&T实验室研究中心07701

   EMail: glenn@research.att.com
        
   EMail: glenn@research.att.com
        

Barry G. Haskell AT&T Labs-Research 100 Schultz Drive Red Bank, NJ 07701 USA

Barry G.Haskell AT&T实验室研究所,美国新泽西州红银行舒尔茨大道100号,邮编:07701

   EMail: bgh@research.att.com
        
   EMail: bgh@research.att.com
        

Full Copyright Statement

完整版权声明

Copyright (C) The Internet Society (1998). All Rights Reserved.

版权所有(C)互联网协会(1998年)。版权所有。

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。