How to calculate effective time offset in RTP
I have to calculate the time offset between packets in RTP streams. With a video stream encoded with the Theora codec, I see timestamp fields like
2856000
2940000
3024000
...
So I assume that the transmission offset is 84000. With the Speex audio codec, I see timestamp fields like
38080
38400
38720
...
So I assume that the transmission offset is 320. Why are the values so different? Are they microseconds, milliseconds, or something else? Can I generalize a formula to calculate the delay between packets in microseconds that works with any codec? Thank you.
RTP timestamps are media dependent. They count ticks at the clock rate of the codec in use, which for audio is usually the sampling rate. You have to convert them to a real time unit such as milliseconds before comparing them with your clock or with timestamps from other RTP streams.
Added:
To convert the timestamp to seconds, just divide the timestamp by the sample rate. For many narrowband audio codecs (PCMU and PCMA, for example), the sample rate is 8 kHz.
See here for a few examples.
Note that video codecs typically use 90000 for the timestamp rate.
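To answer the "general formula" part of the question, here is a minimal Python sketch of the conversion. The function name rtp_delta_us is made up for illustration, and the clock rates in the example calls are assumptions (90 kHz for the Theora stream, 16 kHz for a wideband Speex stream); the real values have to come from the session description. The difference is taken modulo 2^32 because the RTP timestamp is a 32-bit field that wraps around.

```python
def rtp_delta_us(prev_ts: int, curr_ts: int, clock_rate: int) -> float:
    """Microseconds between two RTP timestamps of the same stream.

    Assumes curr_ts belongs to the later packet; reordered packets would
    need a signed comparison instead of this plain modular difference.
    """
    ticks = (curr_ts - prev_ts) % (1 << 32)  # handles 32-bit wraparound
    return ticks * 1_000_000 / clock_rate


# Timestamps from the question; both clock rates are assumptions.
print(rtp_delta_us(2856000, 2940000, 90000))  # video, assumed 90 kHz clock
print(rtp_delta_us(38080, 38400, 16000))      # audio, assumed 16 kHz clock
```

With those assumed rates, the 320-tick audio step works out to 20 ms, which is a common audio frame duration.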
Instead of guessing at the clock rate, look at the a=rtpmap line in the SDP for the payload type in use. Example:
m=audio 5678 RTP/AVP 0 8 99
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:99 AAC-LD/16000
If the payload type is 0 or 8, the timestamp clock is 8 kHz. If it's 99, the clock is 16 kHz. Note that the rtpmap line has an optional 'channels' parameter, as in "a=rtpmap:payload name/rate[/channels]"
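As a rough sketch of how the lookup could be automated, the Python below pulls the clock rate for each payload type out of SDP text. It assumes the standard "a=rtpmap:<payload type> <name>/<rate>[/<channels>]" form with a colon after rtpmap; the helper name clock_rates is invented for this example, and static payload types that have no rtpmap line are not handled.

```python
import re

# Matches "a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]".
RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)(?:/(\S+))?$")


def clock_rates(sdp: str) -> dict:
    """Map each payload type found in the SDP text to its clock rate in Hz."""
    rates = {}
    for line in sdp.splitlines():
        m = RTPMAP_RE.match(line.strip())
        if m:
            rates[int(m.group(1))] = int(m.group(3))
    return rates


sdp = """m=audio 5678 RTP/AVP 0 8 99
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:99 AAC-LD/16000
"""
print(clock_rates(sdp))
```

The payload type field of each incoming RTP packet can then be used as the key to pick the right divisor.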
I've been researching this question for about an hour for the case of audio. It seems the answer is: the RTP timestamp is incremented by the number of audio time units (samples per channel) in a packet. Take this example, where you have a stream of encoded 2-channel audio that was sampled at 44100 Hz before encoding. Say that you send 512 audio samples (256 time units, because we have 2-channel audio) in every packet. Assuming the first packet has a timestamp of 0 (according to the RTP spec, RFC 3550, the initial value should be random), the second timestamp would be 256 and the third 512. The receiver can convert the value back to an actual time by dividing the timestamp by the audio sample rate, so the first packet would be at T0, the second at 256/44100=0.0058 seconds, the third at 512/44100=0.0116 seconds, etc.
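The same arithmetic as a small Python sketch, using the numbers from this example. The names packet_timestamp and FRAMES_PER_PACKET are made up for illustration, and the starting timestamp is 0 only to match the text.

```python
SAMPLE_RATE = 44100       # Hz, audio sampling rate from the example
FRAMES_PER_PACKET = 256   # 512 samples / 2 channels = 256 time units


def packet_timestamp(n: int, first_ts: int = 0) -> int:
    """RTP timestamp of the n-th packet (0-based), wrapped to 32 bits."""
    return (first_ts + n * FRAMES_PER_PACKET) % (1 << 32)


for n in range(3):
    ts = packet_timestamp(n)
    # Receiver side: ticks divided by the clock rate gives seconds.
    print(n, ts, round(ts / SAMPLE_RATE, 4))
```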
Someone please correct me if I'm wrong; I'm not sure why there aren't any articles online that state it this way. I guess it would be more complicated if the resolution of the RTP timestamp is different from the sample rate of the audio stream. Nevertheless, converting the timestamp to a different resolution is not complicated. Use the same example as before, but change the resolution of the RTP timestamp to 90 kHz, as in MPEG4 Audio (RFC 3016). On the source side the first timestamp is 0, the second is 90000*(256/44100)=522 (rounded down), and the third is 1044. On the receiver, the time is 0 for the first packet, 522/90000=0.0058 seconds for the second, and 1044/90000=0.0116 seconds for the third. Again, someone please correct me if I'm wrong.
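A short Python sketch of that rescaling, under the same assumptions (44100 Hz audio, 256 time units per packet, 90 kHz RTP clock). The function name rescale is hypothetical. It truncates to an integer, which reproduces the 522 and 1044 figures above.

```python
def rescale(ticks: int, from_rate: int, to_rate: int) -> int:
    """Convert a tick count from one clock rate to another, truncating."""
    return ticks * to_rate // from_rate


AUDIO_RATE = 44100  # Hz, sample rate of the audio
RTP_RATE = 90000    # Hz, RTP timestamp resolution in this example

for n in range(3):
    samples_sent = n * 256  # cumulative time units before packet n
    ts = rescale(samples_sent, AUDIO_RATE, RTP_RATE)
    print(ts, round(ts / RTP_RATE, 4))
```

Scaling the cumulative sample count, rather than adding a pre-rounded 522 per packet, keeps the rounding error from accumulating over a long stream.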