IMS

 

 

 

 

RTP/RTCP

 

RTP stands for Real-time Transport Protocol and RTCP stands for RTP Control Protocol (also written as Real-time Transport Control Protocol). Simply put, RTP is a protocol to carry various real-time data (e.g., audio, video) and RTCP is a control mechanism for RTP.

 

 

 

Basic Functionalities of RTP and RTCP

 

Some of the important functions of RTP and RTCP are listed below. I think the best way to get a detailed understanding of RTP and RTCP is to go through the header structure of these protocols, see what kind of parameters (information) are included in the headers, and understand the role (meaning) of each parameter.

  • RTP provides a thin protocol to support various real-time applications (e.g., audio, video). It provides the means to reconstruct timing, detect loss, support security and carry content identification.
  • RTP is designed to follow the architectural principle known as Application Level Framing.
  • RTP provides a flexible mechanism by which new applications can be developed without repeatedly revising RTP itself.
  • RTCP provides the means to monitor and control RTP. It offers QoS feedback from the receivers and supports the synchronization of different media streams. It also carries information about the participants in a group session.

 

 

 

 

Interplay between RTP and RTCP

 

In most SIP-based traffic, RTP and RTCP work together. RTP is responsible for transmitting the multimedia content, while RTCP is responsible for providing feedback and control information about the RTP stream. Together, the two protocols help deliver high-quality multimedia content over the Internet.

 

RTP is responsible for transporting the multimedia data, such as audio and video, from the source to the destination. It works by breaking the data into small packets and sending them over the network. RTP does not provide any error correction or flow control, but it does provide timestamping and sequencing information that allows the receiving end to reconstruct the original data stream.
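As a rough illustration of how sequence numbers let a receiver rebuild the stream, here is a minimal Python sketch. The function name reorder_and_find_gaps and the (sequence_number, payload) tuple layout are assumptions made for this example, and it ignores the 16-bit wraparound of real RTP sequence numbers to keep things short.

```python
def reorder_and_find_gaps(packets):
    """Sort received packets by sequence number and list the missing ones.

    packets: list of (sequence_number, payload) tuples in arrival order.
    Simplification: sequence number wraparound (65535 -> 0) is not handled.
    """
    ordered = sorted(packets, key=lambda p: p[0])
    seen = {seq for seq, _ in ordered}
    first, last = ordered[0][0], ordered[-1][0]
    missing = [seq for seq in range(first, last + 1) if seq not in seen]
    return ordered, missing


# Packets 101 and 102 arrive swapped, and 103 never arrives.
arrived = [(100, b"a"), (102, b"c"), (101, b"b"), (104, b"e")]
ordered, missing = reorder_and_find_gaps(arrived)
print([seq for seq, _ in ordered])  # [100, 101, 102, 104]
print(missing)                      # [103]
```

A real receiver would do this incrementally in a jitter buffer and use the timestamp to decide when each payload should be played out.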

 

On the other hand, RTCP is responsible for providing feedback and control information about the RTP stream. RTCP packets are sent from the receiver back to the sender, containing information such as the number of packets received, the amount of data received, and the quality of service. This information can be used by the sender to adjust the transmission rate and quality of the stream.

 

 

 

Interplay between SIP and RTP/RTCP

 

In short, SIP is used to establish and manage multimedia sessions in IP-based networks, while RTP and RTCP are used to transport the actual multimedia content between the endpoints. SIP, through the SDP body it carries, is used to negotiate the parameters for the RTP and RTCP streams, including the IP addresses and port numbers to be used for them.

 

When a SIP session is established, the SIP protocol is used to negotiate the parameters for the multimedia exchange, such as the codecs to be used, the transport protocol (UDP or TCP), and the port numbers for the RTP and RTCP streams. Once the parameters are negotiated, the endpoints can begin transmitting RTP and RTCP packets over the network to transport the actual multimedia content.

 

For example, in a typical voice or video call using SIP, the endpoints first use SIP to establish the call and negotiate the parameters for the RTP and RTCP streams. Once the call is established, the endpoints begin sending RTP packets containing the voice or video data, along with RTCP packets that report on those RTP streams.

 

During the call, the RTP and RTCP packets are sent independently of the SIP signaling messages, and are typically transported over separate UDP or TCP connections. However, the SIP signaling messages may contain information about the RTP and RTCP streams, such as the IP addresses and port numbers to be used, in order to facilitate the setup of the RTP and RTCP connections.

 

The following is a general procedure for a voice call over SIP showing the interplay of SIP, RTP and RTCP. (NOTE : This is just an overall flow. The details can vary with each implementation. Check out this note for a concrete example of voice call over IMS/SIP in LTE).

    1) The user initiates a voice call from their SIP client, sending a SIP INVITE message to the recipient's SIP client. The INVITE message includes the details of the call, such as the SIP addresses of the caller and recipient, the media type (audio), and the preferred codec.

    2) The recipient's SIP client receives the INVITE message and sends a SIP 200 OK response message to the caller's SIP client, indicating that the call has been accepted. The 200 OK message also includes the IP address and port number for the RTP and RTCP streams.

    3) The caller's SIP client sends an ACK message to the recipient's SIP client, acknowledging the acceptance of the call.

    4) The caller's SIP client initiates the RTP stream by sending the first RTP packet containing the voice data to the recipient's IP address and port number. The RTP packet contains a sequence number and timestamp.

    5) The recipient's SIP client receives the RTP packet and sends an RTCP receiver report message back to the caller's SIP client. The receiver report includes information about the quality of the received voice data, such as the number of packets lost or received.

    6) The caller's SIP client receives the RTCP receiver report and adjusts the transmission rate of the voice data if necessary.

    7) The caller's SIP client continues to send RTP packets containing the voice data to the recipient's IP address and port number. The recipient's SIP client continues to send RTCP messages back to the caller's SIP client to provide feedback on the received voice data.

    8) At the end of the call, either the caller or the recipient sends a SIP BYE message to terminate the call. The BYE message is acknowledged with a SIP 200 OK message.

    9) The RTP and RTCP streams are closed, and the call is terminated.
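The ordering of the steps above can be sketched as plain data plus one small check. This is only an illustrative model: the tuple layout, the names CALL_FLOW and media_allowed, and the simplification that provisional responses (such as 180 Ringing) are left out are all assumptions of this example.

```python
# (sender, receiver, protocol, message) in the order described above.
CALL_FLOW = [
    ("caller", "callee", "SIP", "INVITE"),
    ("callee", "caller", "SIP", "200 OK"),
    ("caller", "callee", "SIP", "ACK"),
    ("caller", "callee", "RTP", "voice packet"),
    ("callee", "caller", "RTCP", "Receiver Report"),
    ("caller", "callee", "SIP", "BYE"),
    ("callee", "caller", "SIP", "200 OK"),
]


def media_allowed(messages_so_far):
    """Return True once INVITE / 200 OK / ACK is complete and no BYE was sent."""
    sip = [msg for _, _, proto, msg in messages_so_far if proto == "SIP"]
    return sip[:3] == ["INVITE", "200 OK", "ACK"] and "BYE" not in sip


print(media_allowed(CALL_FLOW[:2]))  # False: the ACK has not been sent yet
print(media_allowed(CALL_FLOW[:3]))  # True: RTP/RTCP can start flowing
print(media_allowed(CALL_FLOW[:6]))  # False: BYE ends the session
```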

 

 

 

How can we figure out the type of transport for RTP ?

 

How can I know whether TCP or UDP is used for a specific RTP session ?

 

The decision to use either TCP or UDP as the RTP transport is typically made during the SDP negotiation carried in the SIP INVITE and its response. Each media description line ("m=") in the SDP has the form m=<media> <port> <proto> <fmt>, and the <proto> field indicates the transport protocol to be used for that media stream.

If the <proto> field is set to "RTP/AVP", it indicates that the RTP and RTCP streams will use UDP as the transport protocol. If the <proto> field is set to "TCP/RTP/AVP" (the token registered by RFC 4571), it indicates that the RTP and RTCP streams will use TCP as the transport protocol.

Here is an example of an SDP message that indicates the use of UDP for RTP:

    v=0

    o=user1 1234 5678 IN IP4 192.0.2.1

    s=Session SDP

    c=IN IP4 192.0.2.1

    t=0 0

    m=audio 49170 RTP/AVP 0  // there is no keyword TCP here, implying that it is UDP with port number 49170

    a=rtpmap:0 PCMU/8000

    a=sendrecv

Here is an example of an SDP message that indicates the use of TCP for RTP:

    v=0

    o=user1 1234 5678 IN IP4 192.0.2.1

    s=Session SDP

    c=IN IP4 192.0.2.1

    t=0 0

    m=audio 49170 TCP/RTP/AVP 0  // the keyword TCP is present here, implying that it is TCP with port number 49170

    a=rtpmap:0 PCMU/8000

    a=sendrecv
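If you want to check this programmatically, the following is a minimal Python sketch that reads the media lines of an SDP body and reports the transport. The function name rtp_transports is hypothetical, and the rule "any proto value containing a TCP token means TCP, everything else is treated as UDP" is a simplifying assumption of this example, not a complete treatment of every registered proto value.

```python
def rtp_transports(sdp_text):
    """Return a list of (media, port, transport) for each m= line.

    Assumption: a proto field with a 'TCP' token means TCP; anything
    else (e.g. RTP/AVP) is treated as UDP.
    """
    result = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith("m="):
            continue
        media, port, proto = line[2:].split()[:3]
        port_number = int(port.split("/")[0])  # port may be "49170/2"
        tokens = proto.upper().split("/")
        transport = "TCP" if "TCP" in tokens else "UDP"
        result.append((media, port_number, transport))
    return result


sdp = """v=0
o=user1 1234 5678 IN IP4 192.0.2.1
s=Session SDP
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=sendrecv
"""
print(rtp_transports(sdp))  # [('audio', 49170, 'UDP')]
```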

 

 

 

RTP Structure

 

The following diagram shows the structure of an RTP header carried in UDP. The same structure can be embedded into a TCP payload as well.

 

  • V (Version) : Fixed to '2' as of now (Dec 2017)
  • P (Padding) : Indicates whether or not the packet contains padding octets at the end.
    • 0 - No Padding in the packet
    • 1 - Padding in the packet. The last octet of the padding contains a count of how many padding octets should be ignored. NOTE : Why do we need this kind of padding ? It is for the cases where the packet has to be filled up to a block of a certain size as required by an encryption algorithm.
  • X (Extension) : Indicates the existence of the extension header
    • 0 - No extension header
    • 1 - Exactly one extension header follows the fixed header
  • CC (CSRC Count) : Indicates the number of CSRC identifiers that follow the fixed header.
  • M (Marker) : This field is intended for marking special events (e.g., a frame boundary). The exact meaning of this marker varies with the profile.
  • PT (Payload Type) : Indicates the format of the RTP payload and determines its interpretation by the application.
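To make the bit layout concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header. The function name parse_rtp_header is just an illustrative choice. The widths of the fields after PT (16-bit sequence number, 32-bit timestamp, 32-bit SSRC) follow RFC 3550, and the sample values are the ones shown in the Wireshark capture of Example 01.

```python
import struct


def parse_rtp_header(data):
    """Decode the fixed 12-byte RTP header (CSRC list and extension are ignored)."""
    if len(data) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": b0 >> 6,             # V: top 2 bits
        "padding": bool(b0 & 0x20),     # P: 1 bit
        "extension": bool(b0 & 0x10),   # X: 1 bit
        "csrc_count": b0 & 0x0F,        # CC: low 4 bits
        "marker": bool(b1 & 0x80),      # M: 1 bit
        "payload_type": b1 & 0x7F,      # PT: low 7 bits
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Header bytes rebuilt from the values in Example 01 (V=2, M=1, PT=97).
header = struct.pack("!BBHII", 0x80, 0x80 | 97, 21770, 483390931, 0x113DB031)
fields = parse_rtp_header(header)
print(fields["version"], fields["marker"], fields["payload_type"])  # 2 True 97
```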

 

< RFC 3551 : Table 4: Payload types (PT) for audio encodings >

 

               PT   encoding    media type  clock rate   channels

                    name                    (Hz)

               ___________________________________________________

               0    PCMU        A            8,000       1

               1    reserved    A

               2    reserved    A

               3    GSM         A            8,000       1

               4    G723        A            8,000       1

               5    DVI4        A            8,000       1

               6    DVI4        A           16,000       1

               7    LPC         A            8,000       1

               8    PCMA        A            8,000       1

               9    G722        A            8,000       1

               10   L16         A           44,100       2

               11   L16         A           44,100       1

               12   QCELP       A            8,000       1

               13   CN          A            8,000       1

               14   MPA         A           90,000       (see text)

               15   G728        A            8,000       1

               16   DVI4        A           11,025       1

               17   DVI4        A           22,050       1

               18   G729        A            8,000       1

               19   reserved    A

               20   unassigned  A

               21   unassigned  A

               22   unassigned  A

               23   unassigned  A

               dyn  G726-40     A            8,000       1

               dyn  G726-32     A            8,000       1

               dyn  G726-24     A            8,000       1

               dyn  G726-16     A            8,000       1

               dyn  G729D       A            8,000       1

               dyn  G729E       A            8,000       1

               dyn  GSM-EFR     A            8,000       1

               dyn  L8          A            var.        var.

               dyn  RED         A                        (see text)

               dyn  VDVI        A            var.        1

 

 

< RFC 3551 : Table 5: Payload types (PT) for video and combined encodings >

 

               PT      encoding    media type  clock rate

                       name                    (Hz)

               _____________________________________________

               24      unassigned  V

               25      CelB        V           90,000

               26      JPEG        V           90,000

               27      unassigned  V

               28      nv          V           90,000

               29      unassigned  V

               30      unassigned  V

               31      H261        V           90,000

               32      MPV         V           90,000

               33      MP2T        AV          90,000

               34      H263        V           90,000

               35-71   unassigned  ?

               72-76   reserved    N/A         N/A

               77-95   unassigned  ?

               96-127  dynamic     ?

               dyn     H263-1998   V           90,000
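As a small illustration of how the PT value is interpreted, the sketch below looks up a few static payload types copied from the two tables above and treats 96-127 as dynamic. The dictionary holds only a subset of the tables, and the name describe_payload_type is an assumption of this example.

```python
# A few static assignments copied from the RFC 3551 tables above: PT -> (name, clock rate).
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),
    3: ("GSM", 8000),
    8: ("PCMA", 8000),
    9: ("G722", 8000),
    18: ("G729", 8000),
    26: ("JPEG", 90000),
    31: ("H261", 90000),
    34: ("H263", 90000),
}


def describe_payload_type(pt):
    """Return a short description for a 7-bit RTP payload type value."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if pt in STATIC_PAYLOAD_TYPES:
        name, rate = STATIC_PAYLOAD_TYPES[pt]
        return f"{name}/{rate}"
    if 96 <= pt <= 127:
        return "dynamic (defined by a=rtpmap in SDP)"
    return "not in this subset (reserved, unassigned or omitted)"


print(describe_payload_type(0))   # PCMU/8000
print(describe_payload_type(97))  # dynamic (defined by a=rtpmap in SDP)
```

The second call corresponds to the DynamicRTP-Type-97 shown in Example 01, where the actual codec can only be known from the SDP that set up the stream.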

 

 

Example 01 >

    Real-Time Transport Protocol

        [Stream setup by SDP (frame 9)]

            [Setup frame: 9]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 .... = Extension: False

        .... 0000 = Contributing source identifiers count: 0

        1... .... = Marker: True

        Payload type: DynamicRTP-Type-97 (97)

        Sequence number: 21770

        [Extended sequence number: 87306]

        Timestamp: 483390931

        Synchronization Source identifier: 0x113db031 (289255473)

        Payload: f03c70696cc8e17b8588e8e60ff623270580007e56418a28...

 

 

 

RTCP Structure

 

The RTCP packet structure provides a mechanism for monitoring the quality of multimedia streams in real-time, enabling applications to adjust their behavior in response to changes in the network conditions.

 

The structure of the RTCP packet may vary depending on the type of packet being sent. For example, RTCP Sender Report (SR) packets contain additional information, such as the NTP and RTP timestamps taken when the report was sent, the number of packets sent, and the number of bytes sent. RTCP Receiver Report (RR) packets contain information about the quality of the received packets, such as the number of packets lost and the interarrival jitter.

  • Header: The header is 8 bytes long and contains information about the RTCP packet, such as the packet type, the version of the protocol being used, the length of the packet in 32-bit words minus one, and the synchronization source (SSRC) identifier.
  • Payload: The payload contains a variable number of report blocks, which are used to provide feedback on the quality of the multimedia stream. Each report block is 24 bytes long and contains information about a particular source contributing to the multimedia stream. The payload may also contain other types of information, such as SDES (Source Description) information, which provides additional details about the sources contributing to the multimedia stream.
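Several RTCP packets are usually stacked in one datagram (for example SR followed by SDES, as in the captures later on this page). The following Python sketch walks such a compound packet using only the first 4 bytes of each header. The function name walk_rtcp is hypothetical, and the test data is padded with zero bytes rather than real report contents.

```python
import struct


def walk_rtcp(data):
    """Return one dict per RTCP packet found in a compound RTCP datagram."""
    packets = []
    offset = 0
    while offset + 4 <= len(data):
        b0, packet_type, length = struct.unpack_from("!BBH", data, offset)
        size = (length + 1) * 4  # length field counts 32-bit words minus one
        packets.append({
            "version": b0 >> 6,
            "padding": bool(b0 & 0x20),
            "count": b0 & 0x1F,        # RC / SC depending on the packet type
            "packet_type": packet_type,
            "size": size,
        })
        offset += size
    return packets


# SR with Length=12 (52 bytes) followed by SDES with Length=4 (20 bytes).
sr = struct.pack("!BBH", 0x81, 200, 12) + bytes(48)
sdes = struct.pack("!BBH", 0x81, 202, 4) + bytes(16)
for pkt in walk_rtcp(sr + sdes):
    print(pkt["packet_type"], pkt["size"])
# 200 52
# 202 20
```

The two sizes add up to 72 bytes, which matches the "RTCP frame length check: OK - 72 bytes" line in the first RTCP capture.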

 

 

RTCP Header

 

The following diagram shows the structure of an RTCP header carried in UDP. The same structure can be embedded into a TCP payload as well.

 

 

RTCP - SR (Sender Report)

 

RTCP SR (Sender Report) is a type of RTCP packet that the sender of an RTP stream transmits periodically. It reports the characteristics of the stream so that the other participants in the session can adjust their reception parameters accordingly.

 

 

 

The following are brief descriptions of each field :

  • Version (2 bits): The version of the RTCP protocol being used. Currently, the only version is 2.
  • Padding (1 bit): Indicates whether the packet contains additional padding bytes at the end.
  • Reception report count (RR Count) (5 bits): The number of reception report blocks (RR blocks) that follow the sender information.
  • Packet type (PT) (8 bits): The RTCP packet type, which in this case is set to 200 for SR packets.
  • Length (16 bits): The length of the RTCP packet in 32-bit words, minus one.
  • Sender SSRC (32 bits): The synchronization source identifier for the sender of the RTP stream.
  • NTP timestamp (64 bits): The current time according to the sender's clock, expressed as the number of seconds since January 1, 1900, and the fraction of a second.
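  • RTP timestamp (32 bits): The RTP timestamp that corresponds to the same instant as the NTP timestamp above, expressed in the same units and with the same random offset as the timestamps in the RTP data packets. This pairing is what allows receivers to synchronize different media streams.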
  • Sender's packet count (32 bits): The total number of RTP data packets sent by the sender.
  • Sender's octet count (32 bits): The total number of payload bytes (excluding headers) sent by the sender.
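The 64-bit NTP timestamp shows up in Wireshark as two 32-bit halves (MSW and LSW). A minimal Python sketch of the conversion to a calendar time is shown below. The function name ntp_to_datetime is an assumption of this example, and it simply counts from January 1, 1900 without any handling of NTP era rollover.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(msw, lsw):
    """Convert NTP seconds (msw) and the 32-bit fraction (lsw) to a UTC datetime."""
    microseconds = (lsw * 1_000_000) >> 32  # fraction is in units of 1/2**32 s
    return NTP_EPOCH + timedelta(seconds=msw, microseconds=microseconds)


# MSW / LSW values taken from the Sender Report in Example 02.
print(ntp_to_datetime(3721497922, 1046146659))
```

For the Example 02 values this gives December 5, 2017 at 21:25:22 UTC plus a fraction of a second, in line with the decoded time shown in that capture.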

 

 

RTCP - RR (Receiver Report)

 

The reception report blocks (RR blocks) contain information about the quality of service of the RTP stream as experienced by the other participants in the session. This feedback allows the sender to adjust its transmission parameters to improve the quality of the stream.

The following are the field descriptions of a reception report block :

  • Reception report SSRC (32 bits): The synchronization source identifier of the source that this report block is about.
  • Fraction lost (8 bits): The fraction of RTP data packets from that source lost since the previous SR or RR packet was sent, expressed as a fixed-point number with the binary point at the left edge of the field (i.e., the number of packets lost divided by the number expected, multiplied by 256).
  • Cumulative number of packets lost (24 bits): The total number of RTP data packets from that source that have been lost since the beginning of reception.
  • Extended highest sequence number received (32 bits): The low 16 bits contain the highest sequence number received from that source, and the high 16 bits contain the count of sequence number cycles (wraparounds).
  • Interarrival jitter (32 bits): An estimate of the statistical variance of the RTP data packet interarrival time at the receiver, measured in RTP timestamp units.
  • Last SR timestamp (32 bits): The middle 32 bits of the 64-bit NTP timestamp in the most recent RTCP SR packet received from that source. It is set to zero if no SR has been received yet.
  • Delay since last SR (32 bits): The delay, expressed in units of 1/65536 seconds, between receiving the last SR packet from that source and sending this reception report block. It is set to zero if no SR has been received yet.
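Several of these fields are stored in a packed form, so here is a minimal Python sketch that turns the raw numbers into more readable values. The function name decode_report_block and the returned key names are assumptions of this example. The scaling factors (1/256 for fraction lost, 1/65536 seconds for the delay, and the 16/16-bit split of the extended sequence number) follow RFC 3550.

```python
def decode_report_block(fraction_lost, ext_highest_seq, dlsr):
    """Convert raw reception report block values into readable units."""
    return {
        "loss_percent": fraction_lost * 100 / 256,  # 8-bit fixed point, x/256
        "seq_cycles": ext_highest_seq >> 16,        # high 16 bits
        "highest_seq": ext_highest_seq & 0xFFFF,    # low 16 bits
        "dlsr_ms": dlsr * 1000 // 65536,            # units of 1/65536 s
    }


# Raw values taken from the report block in Example 03 (4/256 lost, seq 52).
print(decode_report_block(4, 52, 0))
# Raw values taken from the report block in Example 02 (delay 130277).
print(decode_report_block(0, 6346, 130277)["dlsr_ms"])  # 1987
```

The 1987 ms result agrees with the "Delay since last SR timestamp: 130277 (1987 milliseconds)" line in the Example 02 capture.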

 

 

 

RTCP - SDES (Source Description)

 

SDES (Source Description) is a type of RTCP packet that is used to provide additional information about the participants in a session. It is typically sent periodically by each participant in a session to provide additional information about themselves and to enable the other participants to identify them. The information provided can be used, for example, to display the names of the participants in a video conference or to help diagnose network problems by providing information about the tools being used by the participants. The SDES items are optional and can be included or excluded as needed, depending on the requirements of the application.

 

 

The RTCP SDES packet contains the following fields:

  • Version (2 bits): The version of the RTCP protocol being used. Currently, the only version is 2.
  • Padding (1 bit): Indicates whether the packet contains additional padding bytes at the end.
  • Source count (5 bits): The number of SSRC/CSRC chunks contained in the packet.
  • Packet type (PT) (8 bits): The RTCP packet type, which in this case is set to 202 for SDES packets.
  • Length (16 bits): The length of the RTCP packet in 32-bit words, minus one.
  • SSRC/CSRC (32 bits): The synchronization source identifier for the participant being described.
  • SDES items: A series of SDES items containing information about the participant being described.

 

Each SDES item consists of an 8-bit type field, an 8-bit length field and a text (value) field of that length. The type field specifies the type of information being provided, while the text field contains the actual information. A type of 0 (END) marks the end of the item list in a chunk. The following are some of the most commonly used SDES item types:

  • CNAME (Canonical Name): The canonical name of the participant being described.
  • NAME (Name): The name of the participant being described.
  • EMAIL (Email Address): The email address of the participant being described.
  • PHONE (Phone Number): The phone number of the participant being described.
  • LOC (Location): The geographic location of the participant being described.
  • TOOL (Tool): The name and version of the tool being used by the participant being described.
  • NOTE (Note): A note or comment about the participant being described.
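To show how the item list is laid out on the wire, here is a minimal Python sketch that reads the items of one SDES chunk (the bytes that follow the SSRC/CSRC identifier). The function name parse_sdes_items is hypothetical. The numeric type codes 1 to 7 follow RFC 3550, and the sample bytes are rebuilt from the CNAME values shown in the captures on this page.

```python
SDES_TYPES = {1: "CNAME", 2: "NAME", 3: "EMAIL", 4: "PHONE",
              5: "LOC", 6: "TOOL", 7: "NOTE"}


def parse_sdes_items(data):
    """Parse SDES items (type, length, text) until END (type 0) or end of data."""
    items = []
    offset = 0
    while offset + 2 <= len(data) and data[offset] != 0:
        item_type = data[offset]
        length = data[offset + 1]
        raw = data[offset + 2 : offset + 2 + length]
        name = SDES_TYPES.get(item_type, f"type {item_type}")
        items.append((name, raw.decode("utf-8", errors="replace")))
        offset += 2 + length
    return items


# CNAME item with Length 7 and Text "unknown", followed by END and padding.
chunk = bytes([1, 7]) + b"unknown" + bytes(3)
print(parse_sdes_items(chunk))  # [('CNAME', 'unknown')]
```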

 

 

Example 01 >

    Real-time Transport Control Protocol (Sender Report)

        [Stream setup by SDP (frame 2)]

            [Setup frame: 2]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Reception report count: 1

        Packet type: Sender Report (200)

        Length: 12 (52 bytes)

        Sender SSRC: 0x113db030 (289255472)

        Timestamp, MSW: 300 (0x0000012c)

        Timestamp, LSW: 1786706395 (0x6a7ef9db)

        [MSW and LSW as NTP timestamp: Feb  7, 2036 06:33:16.415999999 UTC]

        RTP timestamp: 483391347

        Sender's packet count: 4

        Sender's octet count: 132

        Source 1

            Identifier: 0x113db031 (289255473)

            SSRC contents

                Fraction lost: 0 / 256

                Cumulative number of packets lost: 1

            Extended highest sequence number received: 0

                Sequence number cycles count: 0

                Highest sequence number received: 0

            Interarrival jitter: 0

            Last SR timestamp: 0 (0x00000000)

            Delay since last SR timestamp: 0 (0 milliseconds)

    Real-time Transport Control Protocol (Source description)

        [Stream setup by SDP (frame 2)]

            [Setup frame: 2]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Source count: 1

        Packet type: Source description (202)

        Length: 4 (20 bytes)

        Chunk 1, SSRC/CSRC 0x113DB030

            Identifier: 0x113db030 (289255472)

            SDES items

                Type: CNAME (user and domain) (1)

                Length: 7

                Text: unknown

                Type: END (0)

    [RTCP frame length check: OK - 72 bytes]

 

 

Example 02 >

    Real-time Transport Control Protocol (Sender Report)

        [Stream setup by SDP (frame 112)]

            [Setup frame: 112]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Reception report count: 1

        Packet type: Sender Report (200)

        Length: 12 (52 bytes)

        Sender SSRC: 0x3f8ead04 (1066315012)

        Timestamp, MSW: 3721497922 (0xddd18d42)

        Timestamp, LSW: 1046146659 (0x3e5aee63)

        [MSW and LSW as NTP timestamp: Dec  5, 2017 21:25:22.243574999 UTC]

        RTP timestamp: 1313316619

        Sender's packet count: 72

        Sender's octet count: 60905

        Source 1

            Identifier: 0x3f8ead03 (1066315011)

            SSRC contents

                Fraction lost: 0 / 256

                Cumulative number of packets lost: 0

            Extended highest sequence number received: 6346

                Sequence number cycles count: 0

                Highest sequence number received: 6346

            Interarrival jitter: 2363

            Last SR timestamp: 2369797437 (0x8d403d3d)

            Delay since last SR timestamp: 130277 (1987 milliseconds)

 

 

Example 03 >

    Real-time Transport Control Protocol (Sender Report)

        [Stream setup by SDP (frame 125)]

            [Setup frame: 125]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Reception report count: 1

        Packet type: Sender Report (200)

        Length: 12 (52 bytes)

        Sender SSRC: 0x00819e3b (8494651)

        Timestamp, MSW: 8496 (0x00002130)

        Timestamp, LSW: 2924872728 (0xae560418)

        [MSW and LSW as NTP timestamp: Feb  7, 2036 08:49:52.680999999 UTC]

        RTP timestamp: 32000

        Sender's packet count: 53

        Sender's octet count: 2223

        Source 1

            Identifier: 0x00819e3c (8494652)

            SSRC contents

                Fraction lost: 4 / 256

                Cumulative number of packets lost: 1

            Extended highest sequence number received: 52

                Sequence number cycles count: 0

                Highest sequence number received: 52

            Interarrival jitter: 3

            Last SR timestamp: 0 (0x00000000)

            Delay since last SR timestamp: 0 (0 milliseconds)

    Real-time Transport Control Protocol (Source description)

        [Stream setup by SDP (frame 125)]

            [Setup frame: 125]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Source count: 1

        Packet type: Source description (202)

        Length: 6 (28 bytes)

        Chunk 1, SSRC/CSRC 0x819E3B

            Identifier: 0x00819e3b (8494651)

            SDES items

                Type: CNAME (user and domain) (1)

                Length: 14

                Text: 2001:0:0:1::11

                Type: END (0)

 

 

Example 04 >

    Real-time Transport Control Protocol (Goodbye)

        [Stream setup by SDP (frame 125)]

            [Setup frame: 125]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Source count: 1

        Packet type: Goodbye (203)

        Length: 2 (12 bytes)

        Identifier: 0x3f8ead03 (1066315011)

        Length: 0

        Text:

 

 

Example 05 >

    Real-time Transport Control Protocol (Sender Report)

        [Stream setup by SDP (frame 125)]

            [Setup frame: 125]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Reception report count: 1

        Packet type: Sender Report (200)

        Length: 12 (52 bytes)

        Sender SSRC: 0x00819e3b (8494651)

        Timestamp, MSW: 8530 (0x00002152)

        Timestamp, LSW: 2744484102 (0xa3958106)

        [MSW and LSW as NTP timestamp: Feb  7, 2036 08:50:26.638999999 UTC]

        RTP timestamp: 575328

        Sender's packet count: 789

        Sender's octet count: 30996

        Source 1

            Identifier: 0x00819e3c (8494652)

            SSRC contents

                Fraction lost: 0 / 256

                Cumulative number of packets lost: 1

            Extended highest sequence number received: 788

                Sequence number cycles count: 0

                Highest sequence number received: 788

            Interarrival jitter: 3

            Last SR timestamp: 558933082 (0x2150a45a)

            Delay since last SR timestamp: 129369 (1974 milliseconds)

    Real-time Transport Control Protocol (Source description)

        [Stream setup by SDP (frame 125)]

            [Setup frame: 125]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Source count: 1

        Packet type: Source description (202)

        Length: 6 (28 bytes)

        Chunk 1, SSRC/CSRC 0x819E3B

            Identifier: 0x00819e3b (8494651)

            SDES items

                Type: CNAME (user and domain) (1)

                Length: 14

                Text: 2001:0:0:1::11

                Type: END (0)

    Real-time Transport Control Protocol (Goodbye)

        [Stream setup by SDP (frame 125)]

            [Setup frame: 125]

            [Setup Method: SDP]

        10.. .... = Version: RFC 1889 Version (2)

        ..0. .... = Padding: False

        ...0 0001 = Source count: 1

        Packet type: Goodbye (203)

        Length: 7 (32 bytes)

        Identifier: 0x00819e3b (8494651)

        Length: 22

        Text: Disconnect IMS Session

 

 

 

 
