SDP Lines Explained: m=, a=, c=, rtpmap, fmtp

The Session Description Protocol (SDP) is the unsung hero behind every successful VoIP call and real-time communication session. It's the language devices use to describe, negotiate, and agree upon the parameters for a multimedia session before a single media packet is exchanged. Understanding fundamental SDP lines like m=, a=, c=, rtpmap, and fmtp is critical for anyone troubleshooting, developing, or operating communication systems.

Why SDP Exists: The Problem It Solves

Imagine two people wanting to talk, but one speaks only English and the other only Spanish. They need a way to agree on a common language before conversation can begin. In the world of real-time communications, devices face a similar challenge. They need to agree on various session parameters:

Which codecs to use (e.g., G.711, Opus, H.264)
Where to send the media (IP address and port)
What type of media is being exchanged (audio, video, text)
And various other settings like packetization time, security mechanisms, and more.

Without SDP, devices would struggle to establish compatible communication channels, leading to failed calls, one-way audio, or video glitches. SDP provides a standardized, text-based format for endpoints to exchange these capabilities and negotiate a common ground, paving the way for smooth media flow via RTP.

How SDP Works: A Step-by-Step Negotiation

SDP primarily operates within an "Offer/Answer" model, often carried by signaling protocols like SIP. One endpoint (the "offerer") sends an SDP message describing its preferred session parameters. The other endpoint (the "answerer") then responds with its own SDP message, accepting or modifying those parameters based on its capabilities and preferences. This exchange ensures both parties are on the same page. For a deeper dive, read our post on SDP Offer/Answer.

Let's break down the key SDP lines:

m= (Media Description Line):
The m= line is perhaps the most fundamental. It describes a specific media stream within the session. A single SDP message can contain multiple m= lines for different media types (e.g., one for audio, one for video).
- Syntax: m=<media> <port> <protocol> <format list>
- <media>: The media type (e.g., audio, video, text, application).
- <port>: The port on which the endpoint expects to receive the media stream. In an offer/answer exchange, a port of 0 means the stream is rejected or disabled.
- <protocol>: The transport protocol for the media (e.g., RTP/AVP for audio/video over RTP, RTP/SAVP for secure RTP (SRTP), udp for media carried directly over UDP).
- <format list>: A space-separated list of format identifiers. For RTP-based protocols these are payload type numbers that the endpoint is willing to use for this media stream. Static payload types (such as 0 and 8) have predefined meanings, while dynamic ones (96-127) are mapped to specific codecs using the a=rtpmap attribute.
Example: m=audio 5004 RTP/AVP 0 8 18 101
This indicates an audio stream on port 5004 using RTP/AVP, supporting payload types 0, 8, 18, and 101.
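The field breakdown above can be sketched as a small parser. This is a minimal illustration, not a complete SDP implementation; the `MediaLine` class and `parse_media_line` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class MediaLine:
    media: str          # e.g. "audio" or "video"
    port: int           # 0 means the stream is rejected or disabled
    proto: str          # e.g. "RTP/AVP" or "RTP/SAVP"
    formats: list[str]  # kept as strings: non-RTP protocols may use non-numeric formats


def parse_media_line(line: str) -> MediaLine:
    """Split one 'm=' line into media, port, protocol and format list."""
    if not line.startswith("m="):
        raise ValueError(f"not a media line: {line!r}")
    fields = line[2:].split()
    if len(fields) < 4:
        raise ValueError(f"expected media, port, proto and a format: {line!r}")
    media, port, proto, *formats = fields
    # The port may carry an optional "/<number of ports>" suffix; keep the base port.
    base_port = int(port.split("/")[0])
    return MediaLine(media, base_port, proto, formats)


parsed = parse_media_line("m=audio 5004 RTP/AVP 0 8 18 101")
print(parsed.media, parsed.port, parsed.proto, parsed.formats)
```

Keeping the formats as strings avoids assuming every protocol uses numeric payload types; convert them with `int()` only once you know the stream is RTP.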

c= (Connection Data Line):
The c= line specifies the connection information, primarily the IP address where the media will be sent. It can appear at the session level (applying to every media stream that does not carry its own c= line) or at the media level (placed after the m= line it applies to, where it overrides any session-level value for that stream).
- Syntax: c=<nettype> <addrtype> <connection-address>
- <nettype>: Network type (e.g., IN for Internet).
- <addrtype>: Address type (e.g., IP4 for IPv4, IP6 for IPv6).
- <connection-address>: The IP address or multicast address.
Example: c=IN IP4 192.168.1.100
This specifies that media should be sent to 192.168.1.100.
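As a rough sketch of the syntax above, a c= line can be split into its three fields. The function name `parse_connection_line` is a hypothetical helper, and the multicast handling assumes the `/<ttl>` suffix form used for IPv4 multicast addresses.

```python
def parse_connection_line(line: str) -> dict:
    """Split one 'c=' line into network type, address type and address."""
    if not line.startswith("c="):
        raise ValueError(f"not a connection line: {line!r}")
    fields = line[2:].split()
    if len(fields) != 3:
        raise ValueError(f"expected nettype, addrtype and address: {line!r}")
    nettype, addrtype, address = fields
    # Multicast addresses may carry "/<ttl>" or "/<count>" suffixes; keep the base address.
    host = address.split("/")[0]
    return {"nettype": nettype, "addrtype": addrtype, "address": host}


conn = parse_connection_line("c=IN IP4 192.168.1.100")
print(conn["address"])
```

The address field may also be a hostname rather than an IP literal, so this sketch deliberately does not validate it as an IP address.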

a= (Attribute Line):
Attribute lines are highly versatile and provide additional information about the session or specific media streams. They can appear at the session level (before any m= lines) or at the media level (after an m= line, applying only to that specific stream). There are hundreds of registered SDP attributes, but two are particularly important for codec negotiation: rtpmap and fmtp.
- Syntax: a=<attribute>:<value> or a=<attribute>
Let's focus on rtpmap and fmtp:
- a=rtpmap (RTP Map Attribute):
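  This attribute maps a payload type from the m= line to a specific codec and clock rate. Static payload types (like 0 for PCMU, 8 for PCMA, and 9 for G.722) are predefined in the RTP audio/video profile, so rtpmap is optional for them, though it is commonly included anyway. Codecs without a static assignment (like Opus or Speex) use dynamic payload types in the 96-127 range, and for those rtpmap is required.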
  - Syntax: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]
  - <payload type>: An integer from the m= line's format list.
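  - <encoding name>: The codec name (e.g., PCMU, G722, opus). Encoding names are compared case-insensitively.
  - <clock rate>: The RTP timestamp clock rate in Hertz (e.g., 8000 for G.711, 48000 for Opus). This usually equals the codec's sampling rate, but G.722 is a well-known exception: it samples audio at 16000 Hz yet is signaled with a clock rate of 8000 for historical reasons.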
  - <encoding parameters> (optional): Additional parameters like the number of audio channels.
  Example: a=rtpmap:101 telephone-event/8000 (for DTMF tones)
  Example: a=rtpmap:9 G722/8000 (static payload type 9; note the 8000 clock rate)
  Example: a=rtpmap:109 opus/48000/2
- a=fmtp (Format Parameters Attribute):
  The fmtp attribute provides codec-specific configuration parameters for a given payload type. Some codecs need them to interoperate correctly (H.264's packetization-mode, for example), while for others they are optional preferences.
  - Syntax: a=fmtp:<payload type> <format specific parameters>
  Example (for Opus): a=fmtp:109 minptime=10;useinbandfec=1;maxaveragebitrate=64000;cbr=1
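  For payload type 109, this declares the endpoint's receive preferences: packets of at least 10 ms, the ability to make use of in-band FEC, an average bitrate of at most 64 kbps, and a preference for constant bitrate. Note that minptime is commonly emitted by WebRTC implementations even though it is not among the parameters defined in the Opus RTP payload specification (RFC 7587).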
  Example (for H.264 video): a=fmtp:99 profile-level-id=42e01f;packetization-mode=1;level-asymmetry-allowed=1
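The rtpmap and fmtp syntax described above can be sketched as two small parsers. Both function names are hypothetical, and the default of one channel when the encoding parameter is omitted is an assumption that only makes sense for audio.

```python
def parse_rtpmap(value: str) -> dict:
    """Parse the value of 'a=rtpmap:', e.g. '109 opus/48000/2'."""
    payload_type, encoding = value.split(None, 1)
    name, clock_rate, *params = encoding.strip().split("/")
    return {
        "payload_type": int(payload_type),
        "encoding_name": name,
        "clock_rate": int(clock_rate),
        # For audio the optional parameter is the channel count (assumed 1 if absent).
        "channels": int(params[0]) if params else 1,
    }


def parse_fmtp(value: str) -> tuple[int, dict]:
    """Parse the value of 'a=fmtp:' into (payload type, parameters)."""
    payload_type, _, params = value.partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        # Some formats are not key=value pairs (telephone-event uses "0-16").
        result[key] = val if sep else None
    return int(payload_type), result


print(parse_rtpmap("109 opus/48000/2"))
print(parse_fmtp("109 minptime=10;useinbandfec=1;maxaveragebitrate=64000;cbr=1"))
print(parse_fmtp("101 0-16"))
```

Parameter values are left as strings on purpose, since their meaning depends on the codec: `profile-level-id=42e01f` is hexadecimal, for instance, while `maxaveragebitrate=64000` is decimal.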

Example: SDP Offer/Answer Flow

Let's trace a simplified SDP exchange for an audio call:

Endpoint A (Offerer) sends SDP:

v=0
o=Alice 2890844526 2890844526 IN IP4 192.168.1.100
s=A simple call
c=IN IP4 192.168.1.100
t=0 0
m=audio 5004 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv

Endpoint A offers audio on port 5004 from its IP 192.168.1.100.
It supports PCMU (payload 0), PCMA (payload 8), and DTMF telephone-event (payload 101).
a=sendrecv indicates it can send and receive media.
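To make the session-level versus media-level distinction concrete, here is a minimal sketch that splits the offer above into its session part and its media sections, then resolves where each stream should be sent. The function names `split_sdp` and `media_targets` are hypothetical, and malformed lines are simply skipped.

```python
OFFER = """\
v=0
o=Alice 2890844526 2890844526 IN IP4 192.168.1.100
s=A simple call
c=IN IP4 192.168.1.100
t=0 0
m=audio 5004 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv
"""


def split_sdp(text: str):
    """Return (session_lines, media_sections) as lists of (type, value) pairs."""
    session, media = [], []
    current = session
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines in this sketch
        kind, value = line[0], line[2:]
        if kind == "m":
            # Every m= line opens a new media section.
            current = []
            media.append(current)
        current.append((kind, value))
    return session, media


def media_targets(text: str):
    """List (media, address, port) per stream; a media-level c= overrides the session one."""
    session, media = split_sdp(text)
    session_conn = next((v for k, v in session if k == "c"), None)
    targets = []
    for section in media:
        m_fields = section[0][1].split()
        conn = next((v for k, v in section if k == "c"), session_conn)
        address = conn.split()[2] if conn else None
        targets.append((m_fields[0], address, int(m_fields[1].split("/")[0])))
    return targets


print(media_targets(OFFER))  # [('audio', '192.168.1.100', 5004)]
```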

Endpoint B (Answerer) receives, processes, and sends SDP:

v=0
o=Bob 2890844527 2890844527 IN IP4 10.0.0.50
s=A simple call
c=IN IP4 10.0.0.50
t=0 0
m=audio 6000 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv

Endpoint B answers, specifying its IP 10.0.0.50 and port 6000.
It accepts PCMA (payload 8) and telephone-event (payload 101) and leaves PCMU (payload 0) out of its answer. Because PCMA is the only audio codec that appears in both the offer and the answer, it will be used for the call.
Both parties now know to send PCMA audio to the other's specified IP and port.
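The answerer's selection step can be sketched as follows. This is an illustration under stated assumptions: `BOB_SUPPORTED` is a made-up policy for Endpoint B, the function names are hypothetical, and the static payload table is only a small subset of the RTP audio/video profile. Matching is done on codec name and clock rate rather than on the payload number, because dynamic payload numbers only have meaning through rtpmap.

```python
# A few static payload types from the RTP audio/video profile.
STATIC_PAYLOAD_TYPES = {"0": "PCMU/8000", "8": "PCMA/8000", "9": "G722/8000", "18": "G729/8000"}

OFFER_AUDIO = [
    "m=audio 5004 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
]


def offered_codecs(media_lines):
    """Return [(payload type, 'name/rate')] in the offerer's preference order."""
    formats, rtpmap = [], {}
    for line in media_lines:
        if line.startswith("m="):
            formats = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[pt] = encoding.strip()
    # Fall back to the static table when the offer has no rtpmap for a payload type.
    return [(pt, rtpmap.get(pt, STATIC_PAYLOAD_TYPES.get(pt))) for pt in formats]


def select_for_answer(media_lines, supported):
    """Keep the offered formats the answerer supports, preserving the offer's order."""
    supported_lower = {codec.lower() for codec in supported}
    return [
        (pt, codec)
        for pt, codec in offered_codecs(media_lines)
        if codec and codec.lower() in supported_lower
    ]


BOB_SUPPORTED = {"PCMA/8000", "telephone-event/8000"}  # hypothetical policy
accepted = select_for_answer(OFFER_AUDIO, BOB_SUPPORTED)
answer_m_line = "m=audio 6000 RTP/AVP " + " ".join(pt for pt, _ in accepted)
print(answer_m_line)  # m=audio 6000 RTP/AVP 8 101
```

Preserving the offer's relative order in the answer follows the recommendation in the offer/answer specification; an answerer with its own strong preferences may reorder the list instead.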

Diagram Hint:

Here's a simple sequence diagram illustrating the SDP Offer/Answer flow:

sequenceDiagram
    participant EndpointA
    participant SignalingServer
    participant EndpointB

    EndpointA->>SignalingServer: SIP INVITE (with SDP Offer)
    SignalingServer->>EndpointB: SIP INVITE (with SDP Offer)
    EndpointB->>EndpointB: Process SDP Offer (Select Codecs, Ports)
    EndpointB->>SignalingServer: SIP 200 OK (with SDP Answer)
    SignalingServer->>EndpointA: SIP 200 OK (with SDP Answer)
    EndpointA->>EndpointA: Process SDP Answer
    EndpointA->>EndpointB: Start RTP/SRTP Media (using agreed parameters)
    EndpointB->>EndpointA: Start RTP/SRTP Media (using agreed parameters)

Common Mistakes When Working with SDP

Codec Mismatch / Order: If the offerer proposes codecs A, B, C and the answerer only supports D, E, F, there's no common ground, and the answerer will typically reject the offer (in SIP, with a 488 Not Acceptable Here response), so the call fails. The order in the m= line indicates preference, but the answerer makes the final choice among the common ones.
Incorrect c= Address (NAT Issues): One of the most frequent problems. If a device is behind a NAT, it might advertise its private IP address in the c= line. The remote party, unable to reach this private IP, will fail to deliver media, which typically shows up as one-way or no audio. ICE addresses this: the endpoint uses STUN and TURN servers to discover its public and relayed addresses and advertises them as a=candidate attributes.
Missing rtpmap or fmtp: For dynamic payload types or complex codecs, omitting rtpmap prevents the mapping from payload type to codec, and missing fmtp can lead to improper codec initialization or suboptimal performance.
Attribute Placement: Misplacing an attribute (e.g., putting a media-specific attribute like a=ptime at the session level when it should apply only to audio) can lead to unintended behavior or parsing errors.
Unreachable Ports: The m= line specifies a port. If this port is blocked by a firewall or not properly configured for media reception, the call will experience one-way audio or no audio.
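Two of the mistakes above lend themselves to automated checks. The sketch below flags private addresses in c= lines and dynamic payload types that have no rtpmap. The function name `lint_sdp` and the warning texts are hypothetical, and a private address is only a hint: it is perfectly valid on a shared LAN or when ICE candidates are present.

```python
import ipaddress


def lint_sdp(text: str) -> list[str]:
    """Return human-readable warnings for a few common SDP problems."""
    warnings = []
    sections = []  # one dict per m= line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            fields = line[2:].split()
            if len(fields) == 3:
                host = fields[2].split("/")[0]
                try:
                    if ipaddress.ip_address(host).is_private:
                        warnings.append(f"c= advertises private address {host}")
                except ValueError:
                    pass  # not an IP literal (e.g. a hostname); nothing to check
        elif line.startswith("m="):
            fields = line[2:].split()
            if len(fields) >= 4:
                sections.append({"media": fields[0], "proto": fields[2],
                                 "formats": fields[3:], "rtpmap": set()})
        elif line.startswith("a=rtpmap:") and sections:
            sections[-1]["rtpmap"].add(line[len("a=rtpmap:"):].split()[0])
    for section in sections:
        if "RTP/" not in section["proto"]:
            continue  # payload type rules only apply to RTP-based streams
        for pt in section["formats"]:
            # Payload types 96-127 are dynamic and meaningless without rtpmap.
            if pt.isdigit() and 96 <= int(pt) <= 127 and pt not in section["rtpmap"]:
                warnings.append(f"{section['media']}: dynamic payload type {pt} has no a=rtpmap")
    return warnings


sample = "c=IN IP4 192.168.1.100\nm=audio 5004 RTP/AVP 0 96\na=rtpmap:0 PCMU/8000\n"
for warning in lint_sdp(sample):
    print(warning)
```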