TheVoĉo

RTP vs SRTP: How Media Actually Flows in SIP Calls

Understand RTP and SRTP: essential protocols for voice and video in SIP. This guide for VoIP engineers clarifies media flow, security, and common pitfalls, ensuring robust real-time communication.

Product Team
5 min read

Real-time Transport Protocol (RTP) is the workhorse behind delivering audio and video streams in Voice over IP (VoIP) calls, including those set up by SIP. It carries the timing and sequencing information that real-time media depends on. Secure Real-time Transport Protocol (SRTP), defined in RFC 3711, is a security profile of RTP that adds encryption, message authentication, and replay protection to these media streams, protecting the privacy and integrity of sensitive communications.

RTP emerged to tackle the unique challenges of real-time communication over packet-switched networks like the internet. Unlike file transfers, where some delay is acceptable, voice and video must be played out on time. Networks introduce jitter (variation in packet arrival times), packet loss, and reordering. RTP does not prevent any of these, but it gives receivers what they need to cope with them and rebuild a smooth conversation or video feed. It normally runs over UDP, which avoids the retransmission delays of TCP, and adds sequence numbers and timestamps so that receivers can detect loss, restore packet order, and schedule playout through a jitter buffer.

However, RTP itself offers no security. Media packets crossing the internet are exposed to eavesdropping, tampering, and replay attacks. SRTP closes this gap: it encrypts the media payload, authenticates each packet so that any modification is detected, and provides replay protection so that old packets cannot be maliciously re-inserted. Note that the RTP header itself is authenticated but not encrypted. For businesses handling sensitive data or operating under regulations such as GDPR or HIPAA, SRTP is a necessity rather than an optional feature.

How it works (step-by-step):

RTP Basics: The Unsung Hero of Media Flow

Before any audio or video flows, the Session Initiation Protocol (SIP) sets up the call, negotiating parameters such as codecs, IP addresses, and port numbers through the Session Description Protocol (SDP). Once SIP signaling has negotiated these parameters, RTP takes over.

An RTP packet consists of a header (12 bytes in its fixed form, optionally followed by a list of contributing-source identifiers and a header extension) and a payload.
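
To make the header layout concrete, here is a minimal Python sketch that packs and unpacks the fixed 12-byte RTP header defined in RFC 3550. The helper names (parse_rtp_header, build_rtp_header, RtpHeader) are hypothetical, and the sketch deliberately skips the optional CSRC list and header extension, so treat it as an illustration of the bit layout rather than a complete RTP stack.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int          # always 2 for RTP as used today
    padding: bool         # P bit: payload ends with padding octets
    extension: bool       # X bit: a header extension follows the CSRCs
    csrc_count: int       # CC: number of 32-bit CSRC identifiers that follow
    marker: bool          # M bit: meaning defined by the RTP profile
    payload_type: int     # 7-bit codec identifier (e.g. 0 = PCMU)
    sequence_number: int  # 16-bit, +1 per packet
    timestamp: int        # 32-bit, in media clock ticks
    ssrc: int             # 32-bit synchronization source identifier


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header (RFC 3550, section 5.1).

    CSRC entries and header extensions are not parsed in this sketch.
    """
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    # "!" = network byte order; B = 1 byte, H = 2 bytes, I = 4 bytes
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def build_rtp_header(payload_type: int, seq: int, ts: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Build a version-2 header with no padding, extension, or CSRCs."""
    b0 = 2 << 6  # version 2, P=0, X=0, CC=0
    b1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    # Mask values so counters wrap the way they do on the wire.
    return struct.pack("!BBHII", b0, b1, seq & 0xFFFF,
                       ts & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


if __name__ == "__main__":
    # A 20 ms G.711 mu-law packet (payload type 0): 160 samples at 8000 Hz.
    pkt = build_rtp_header(payload_type=0, seq=1000, ts=160_000,
                           ssrc=0x1234ABCD) + bytes(160)
    hdr = parse_rtp_header(pkt)
    print(hdr.version, hdr.payload_type, hdr.sequence_number,
          hdr.timestamp, hex(hdr.ssrc))  # prints: 2 0 1000 160000 0x1234abcd
```

Because the timestamp counts media clock ticks rather than milliseconds, consecutive 20 ms PCMU packets in this example would differ by 1 in sequence number but by 160 in timestamp.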

The header contains vital information:
- Sequence Number: A 16-bit counter that increments by one with each packet, allowing the receiver to detect lost packets and reorder out-of-sequence ones.
- Timestamp: A 32-bit value giving the sampling instant of the first octet in the RTP payload, counted in ticks of the media clock (8000 Hz for G.711). Receivers use it to schedule playout, absorb jitter, and synchronize media.
- Synchronization Source (SSRC) Identifier: A randomly chosen 32-bit number, unique within the RTP session, that identifies the source of a stream and helps distinguish participants in a multi-party conference.
- Payload Type: A 7-bit field identifying the media format (codec) in use, e.g., G.711, G.729, or H.264. Some values are statically assigned (0 is PCMU, 8 is PCMA), while dynamic values from 96 to 127 are mapped to a codec by an a=rtpmap line in the SDP.

RTP itself doesn't guarantee delivery, but it gives applications the tools to manage real-time streams. Complementing RTP is the RTP Control Protocol (RTCP), which reports delivery statistics such as packet loss, interarrival jitter, and round-trip delay, sending quality feedback between the communicating endpoints. This feedback lets endpoints adapt the stream when network conditions degrade.

SRTP: Adding a Layer of Trust

SRTP extends RTP with cryptographic protection. When SRTP is enabled, the SDP exchanged during SIP call setup signals secure media. The core enhancements are:
1. Encryption: The RTP payload is typically encrypted with AES in Counter Mode (AES-CM), making it unreadable to unauthorized listeners.
2. Authentication and Integrity: An HMAC-SHA1 authentication tag, computed over the RTP header and encrypted payload and commonly truncated to 80 bits, is appended to each SRTP packet so the receiver can verify that the packet hasn't been tampered with.
3. Replay Protection: The receiver maintains a sliding window of recently received packet indices and discards duplicates or packets that are too old, preventing attackers from re-sending old but valid SRTP packets.

The most crucial aspect of SRTP is key exchange, which SRTP itself does not define. Older methods such as SDES embed the keys directly in the SDP (a=crypto lines), so anyone who can read the signaling can read the keys unless SIP is carried over TLS. Many modern deployments, and all WebRTC endpoints, instead use DTLS-SRTP: a DTLS (Datagram Transport Layer Security) handshake runs over the media path and the SRTP keys are derived from it. This authenticates the peers through certificate exchange and yields fresh keys for each session.
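
The replay protection in point 3 is usually implemented as a sliding bitmask over the SRTP packet index. The following Python sketch assumes the receiver has already computed the 48-bit packet index (the rollover counter times 65536 plus the 16-bit sequence number), which RFC 3711 defines but this code does not; the class and method names are hypothetical.

```python
class ReplayWindow:
    """Sliding-window replay check modeled on RFC 3711, section 3.3.2.

    Works on the 48-bit SRTP packet index (ROC * 65536 + SEQ). Deriving that
    index from the 16-bit sequence number is out of scope for this sketch.
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size    # RFC 3711 specifies a minimum window of 64
        self.highest = -1   # highest index accepted so far (-1 = none yet)
        self.bitmap = 0     # bit k set => index (highest - k) was accepted

    def is_replay(self, index: int) -> bool:
        """Return True if the packet must be dropped (duplicate or too old)."""
        if index > self.highest:
            return False                          # ahead of the window: new
        offset = self.highest - index
        if offset >= self.size:
            return True                           # behind the window: too old
        return bool((self.bitmap >> offset) & 1)  # inside: seen before?

    def accept(self, index: int) -> None:
        """Record a packet; call only after its authentication tag verifies."""
        if index > self.highest:
            shift = index - self.highest
            # Slide the window forward, mark the new highest index, and
            # drop bits that fell off the old end.
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
        elif self.highest - index < self.size:
            self.bitmap |= 1 << (self.highest - index)


if __name__ == "__main__":
    window = ReplayWindow()
    for idx in [1, 2, 4, 3, 2, 100, 30, 37]:
        if window.is_replay(idx):
            print(idx, "dropped")
        else:
            window.accept(idx)  # real SRTP: only after the auth tag checks out
            print(idx, "accepted")
    # The second 2 is a duplicate and 30 is too old; the rest are accepted.
```

Keeping the check and the update separate mirrors the processing order in RFC 3711: the replay list is consulted before the authentication tag is verified but updated only afterwards, so a forged packet with a far-future index cannot slide the window forward and cause genuine packets to be rejected.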

When DTLS-SRTP is used, the SDP typically contains an a=fingerprint attribute carrying a hash of the certificate the endpoint will present in the DTLS handshake.

Media Path with and without an SBC; NAT and ICE Considerations:

The actual media path can vary significantly. In simple peer-to-peer setups, RTP/SRTP flows directly between the two endpoints once negotiated. In enterprise environments, or when calls cross network boundaries, a Session Border Controller (SBC) often acts as an intermediary. An SBC can proxy both signaling and media, providing security (such as topology hiding), NAT traversal, quality of service, and interworking between otherwise incompatible networks. When an SBC proxies media, it terminates and re-originates the RTP/SRTP streams (for SRTP, this typically means decrypting and re-encrypting on each leg), which can add latency but gives the operator more control.

A common challenge for RTP/SRTP is Network Address Translation (NAT). VoIP endpoints often sit behind private IP addresses, and the addresses they advertise in SDP are not reachable from outside, so direct media fails without assistance. This is where ICE (Interactive Connectivity Establishment), together with STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT), becomes vital. STUN lets an endpoint discover its public address, TURN supplies a relay when no direct path exists, and ICE gathers these candidate addresses and runs connectivity checks to select the most direct path that actually works, even through complex NAT scenarios. For a deeper dive into these techniques, check out our post on SIP NAT Traversal.

Example: SDP for SRTP Negotiation

Here's a simplified look at how DTLS-SRTP capabilities might be advertised in an SDP offer:

```
v=0
o=alice 2890844526 2890844526 IN IP4 client.example.com
s=SIP Call
c=IN IP4 192.0.2.1
t=0 0
m=audio 50000 UDP/TLS/RTP/SAVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=fingerprint:sha-256 00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD:EE:FF:...
a=setup:actpass
a=mid:audio
```

The key lines here are the m=audio line and the DTLS attributes. In the m-line, SAVP stands for Secure Audio/Video Profile, meaning the stream uses SRTP; the UDP/TLS/ prefix marks DTLS-SRTP keying (an SDES offer would use plain RTP/SAVP instead), and the SAVPF variants additionally enable the RTCP-based feedback profile. The a=fingerprint attribute signals DTLS-SRTP by providing a cryptographic hash of the endpoint's certificate, which the other side uses to authenticate the DTLS handshake. The a=setup:actpass attribute in the offer lets the answerer choose the DTLS roles: an answer with a=setup:active makes the answerer the DTLS client, while a=setup:passive makes the offerer the client. For more on how SDP drives call setup, see our article on the SDP Offer/Answer Model.

Diagram: Media Traversal

Understanding the path media takes is crucial for troubleshooting. Here's a simplified sequence diagram of the media flow, distinguishing between direct media and media proxied by an SBC.

```mermaid
sequenceDiagram
    participant Alice
    participant SIP_Proxy
    participant Bob
    participant SBC
    Note over Alice,Bob: SIP Signaling to set up call
    Alice->>SIP_Proxy: INVITE (SDP Offer)
    SIP_Proxy->>Bob: INVITE (SDP Offer)
    Bob->>SIP_Proxy: 200 OK (SDP Answer)
    SIP_Proxy->>Alice: 200 OK (SDP Answer)
    Note over Alice,Bob: RTP/SRTP media parameters negotiated
    alt Direct Media Flow (No SBC proxy)
        Note over Alice,Bob: Media flows directly between endpoints
        Alice-->>Bob: RTP/SRTP Audio/Video
        Bob-->>Alice: RTP/SRTP Audio/Video
    else Media Flow via SBC Proxy
        Note over Alice,SBC: SBC proxies media for security, NAT, etc.
        Alice->>SBC: RTP/SRTP Audio/Video
        SBC->>Bob: RTP/SRTP Audio/Video
        Bob->>SBC: RTP/SRTP Audio/Video
        SBC->>Alice: RTP/SRTP Audio/Video
    end
```

This diagram shows that while SIP signaling (the INVITE and 200 OK messages) often traverses a SIP proxy, the RTP/SRTP media does not: it flows either directly between the endpoints or through a media-aware element such as an SBC. For a complete understanding of the entire process, refer to our SIP Call Flow Explained post.

Common Mistakes:
1. Mismatched SRTP Profiles/Keys: The most frequent issue. If endpoints offer incompatible SRTP profiles or crypto suites, or if the DTLS-SRTP key exchange fails (e.g., certificate or fingerprint mismatches, firewalls blocking the handshake, cipher suite mismatches), the call may connect with no audio. Always check the a=crypto (SDES) or a=fingerprint (DTLS-SRTP) lines in the SDP and look for DTLS errors in the logs.
2. Firewall Misconfiguration: RTP/SRTP uses dynamically assigned UDP ports, often from a range such as 10000-20000. Firewalls must be configured to allow this traffic; an SBC can help centralize firewall rules.
3. NAT Traversal Failures: Incorrect STUN/TURN server configuration, or firewalls blocking UDP, can prevent public addresses from being discovered correctly, leading to one-way or no audio.
4. Assuming End-to-End Encryption: SRTP encrypts only the media. SIP signaling (INVITE, 200 OK) may still travel unencrypted unless it is secured with TLS (SIPS). Don't assume a call is fully secure just because the media uses SRTP.
5. Lack of SRTP Verification: Always confirm that SRTP is active by checking for a secure profile (RTP/SAVP, RTP/SAVPF, or their UDP/TLS/ forms used with DTLS-SRTP) in the final SDP answer and, for DTLS-SRTP, a successful DTLS handshake in the call logs.

Related Terms (Internal Links):
- SIP (Session Initiation Protocol): The signaling protocol for establishing, modifying, and terminating real-time multimedia sessions.
- SDP (Session Description Protocol): Describes the media parameters for a session, such as codecs, IP addresses, and ports. See: SDP Offer/Answer Model
- RTCP (RTP Control Protocol): Companion protocol to RTP, used to monitor delivery quality and synchronize streams.
- SBC (Session Border Controller): A network element that controls SIP signaling and media streams, often providing security, NAT traversal, and QoS.
- NAT Traversal (Network Address Translation): Techniques allowing devices behind NAT to communicate with external hosts. See: SIP NAT Traversal
- ICE (Interactive Connectivity Establishment): A framework using STUN and TURN to find the best communication path between peers, especially through NATs.
- DTLS-SRTP (Datagram Transport Layer Security for Secure Real-time Transport Protocol): The modern, secure method for SRTP key exchange, using a DTLS handshake on the media path.

Conclusion:

RTP and SRTP are foundational to modern VoIP and real-time communication. RTP efficiently transports the audio and video itself, while SRTP adds the security layer needed for private, compliant communications. Understanding how they work, from SDP negotiation to NAT traversal and common pitfalls, helps VoIP engineers and developers build and maintain robust, secure communication systems. As demand for secure real-time interaction grows, mastering the nuances of RTP vs SRTP is not just good practice but a critical skill.

Tags: rtp, srtp, sip, media, security