- Should Have Gone With Cisco - http://shouldhavegonewithcisco.com -
One-way audio?
Posted By ted On October 4, 2008 @ 11:13 am In Voice | 2 Comments
I had a nice one-way audio issue I was troubleshooting on Friday.
Here’s the setup ::
Cali HQ site <-----> MPLS Cloud <-------> LA Regional office (PRI located here for LA Sales Office PSTN access)
Cali HQ site <-----> MPLS Cloud <-------> LA Sales office
This is a pretty simple setup: the HQ site hosts CallManager and connects to the MPLS cloud, and the remote Regional and Sales offices connect to that same MPLS cloud.
The LA Sales office was reporting one-way audio on both outside (PSTN) calls and on-net calls (across the WAN).
Here are some questions I asked ::
1. How long has this been going on?
Answer :: This is a new site they just turned up, and the problem has been there since going live.
2. Who is affected?
Answer :: Everyone at LA Sales office.
3. Does it affect calls to other extensions in other offices or just outside callers?
Answer :: Both.
4. Does it matter who calls who?
Answer :: No.
5. Which direction is the one-way audio?
Answer :: Nobody can hear the LA Sales office users, but they can always hear the other party.
When troubleshooting one-way audio, I have found that about 80% of the time it’s a routing issue. In this case, that would mean the LA Sales office phones don’t have a route back to the PSTN GW or to phone IP addresses in other offices.
The other important thing to remember is that we are dealing with a new site, so neither the routing nor the WAN link has been proven stable.
The first thing I did was verify routing end-to-end. This was easy enough to do by going to the LA Sales office edge router and pinging the PSTN GW and the phones in other sites (in this case, I was focusing on phones in the Cali HQ site).
The PSTN GW for this site is not on any router in the LA Sales office; the PRI is on the LA Regional office GW. This was done because the remote sales office isn’t big enough to justify a PRI, and because every regional office has quite a few satellite sales offices. So it makes sense to centralize the PRI and have all inbound and outbound PSTN calls go across the WAN from the LA Sales office to the Regional office. This told me that in both cases, on-net or outside calling, I was going across the WAN link.
So when testing routing, I was on the Sales office router sourcing my pings from the phone subnet to IP phones in the HQ site and to the IP of the GW with the PRI in the Regional office (ping the IP of the GW that’s registered to CCM). In both cases the pings worked, so routing was fine.
So in this case, I decided to focus on site-to-site calling between two IP phones. Phones in the LA Sales office and HQ were both registered to the same cluster in Cali. I checked the region settings to make sure we weren’t running into some kind of weird codec issue. Everything was set to use G711 between sites, which makes things easy.
The next step was to get a packet capture from one end. I had the user I was talking to in the HQ site install Wireshark on the PC attached to his phone. I had him start the capture and then place a test call to a user in the LA Sales office. I also checked that I had the right CCM trace settings enabled so I could look at the traces if needed. Before doing the packet capture, I had him look at the TX and RX counters on his phone. It seemed like his phone was actually receiving packets, but I needed to verify with a capture.
I decided to look at the packet capture first. There was no question the packets were making it to his phone, as I could see the LA Sales office phone IP address talking to his phone. What I also noticed was that the ratio of packets didn’t seem right. For every 8 or so packets that the HQ phone sent, it only received 1 from the Sales office phone.
See below ::
The HQ phone is 192.168.39.23 and the LA Sales office phone is 172.32.254.101.
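If you would rather count than eyeball the packet list, here is a rough Python sketch of the same check. It assumes you have already exported the capture to a list of (source, destination) pairs; that export step, the function names, and the sample data are all made up for illustration. Only the two IP addresses come from the capture above.

```python
from collections import Counter

HQ_PHONE = "192.168.39.23"       # HQ phone, from the capture
SALES_PHONE = "172.32.254.101"   # LA Sales office phone, from the capture


def count_by_direction(packets):
    """Count packets per (source, destination) pair.

    `packets` is an iterable of (src_ip, dst_ip) tuples. How you get
    them out of the capture file is not shown here.
    """
    return Counter((src, dst) for src, dst in packets)


def direction_ratio(packets, a, b):
    """Return (packets sent a -> b) divided by (packets sent b -> a)."""
    counts = count_by_direction(packets)
    forward = counts[(a, b)]
    reverse = counts[(b, a)]
    if reverse == 0:
        return float("inf")  # nothing came back at all
    return forward / reverse


# Made-up sample: 8 packets HQ -> Sales for every 1 packet Sales -> HQ.
sample = [(HQ_PHONE, SALES_PHONE)] * 8 + [(SALES_PHONE, HQ_PHONE)]
print(direction_ratio(sample, HQ_PHONE, SALES_PHONE))  # 8.0
```

On a healthy G711 call both phones send at the same rate, so this ratio should sit close to 1.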
After looking at that, I knew we didn’t have a firewall issue or a routing issue. Maybe we had a QoS issue? Look below at the analysis of the RTP stream ::
Each screenshot is one direction. The first screenshot is from the HQ phone to the LA Sales office phone. Notice the inter-packet delay of 20ms, which is our default packetization interval. Everything looks good here.
Now look at the packets coming in from the LA Sales office. Yikes, look at that jitter! The inter-packet delay is really bad!
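For anyone wondering what the jitter column is measuring, here is a minimal Python sketch of a running jitter estimate in the style of RFC 3550 (each new deviation is smoothed in with a 1/16 gain). It assumes the sender emits one packet every 20ms, which simplifies the real calculation that uses RTP timestamps. The function name and the arrival times are invented for illustration; they are not values from my capture.

```python
def interarrival_jitter(arrivals_ms, send_interval_ms=20.0):
    """Running jitter estimate, RFC 3550 style.

    arrivals_ms      -- packet arrival times in milliseconds, in order
    send_interval_ms -- assumed spacing at the sender (20ms for our G711)

    For each consecutive pair, D is how far the arrival spacing strays
    from the send spacing; the estimate moves 1/16 of the way toward |D|.
    """
    jitter = 0.0
    for prev, cur in zip(arrivals_ms, arrivals_ms[1:]):
        d = (cur - prev) - send_interval_ms
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Made-up arrival times (ms): a steady stream and a bursty one.
steady = [0, 20, 40, 60, 80]
bursty = [0, 20, 40, 360, 365, 370, 700]

print(interarrival_jitter(steady))  # 0.0
print(interarrival_jitter(bursty))  # well above zero
```

A steady 20ms stream keeps the estimate at zero, while long gaps followed by bursts push it up quickly, which is the pattern in the second screenshot.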
So after looking at this, I went back to the router at the LA Sales office for more testing. I knew our own QoS config wasn’t the issue because I checked the config and did a “show policy-map interface” for the outbound QoS policy. No queue drops; things were fine. No errors incrementing on the WAN circuit, and no utilization issues either. Dang, so what is it?
Well, I then did the following ping tests ::
I basically pinged from the IP phone subnet to the remote phone in HQ. Notice how in the first test I left the packets at the default TOS value of 0. In the second test, I tagged them with a TOS value of 184 (DSCP 46/EF, which is how phone media gets tagged).
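In case the jump from DSCP 46 to TOS 184 isn’t obvious: the DSCP value occupies the top six bits of the old TOS byte, so the number you type at the extended ping prompt is the DSCP shifted left by two bits. A small Python sketch of the arithmetic follows; the function names are just mine for illustration.

```python
def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the full 8-bit TOS byte value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2  # the low two bits of the byte are left at 0


def tos_to_dscp(tos):
    """Recover the DSCP value from an 8-bit TOS byte value."""
    if not 0 <= tos <= 255:
        raise ValueError("TOS must be between 0 and 255")
    return tos >> 2


print(dscp_to_tos(46))   # 184, the value used in the second ping test
print(tos_to_dscp(184))  # 46 (EF)
```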
LA_Sales_Office#ping
Protocol [ip]:
Target IP address: 192.168.39.23
Repeat count [5]: 100
Datagram size [100]: 500
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 172.32.254.1
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 100, 500-byte ICMP Echos to 192.168.39.23, timeout is 2 seconds:
Packet sent with a source address of 172.32.254.1
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 68/68/72 ms
—————-
LA_Sales_Office#ping
Protocol [ip]:
Target IP address: 192.168.39.23
Repeat count [5]: 50
Datagram size [100]: 500
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 172.32.254.1
Type of service [0]: 184
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 50, 500-byte ICMP Echos to 192.168.39.23, timeout is 2 seconds:
Packet sent with a source address of 172.32.254.1
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (50/50), round-trip min/avg/max = 72/661/676 ms
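If you run this kind of test across a lot of sites, it can help to compare the summary lines mechanically. The following is a rough Python sketch that parses the two summary lines above and compares the average round-trip times. The regular expression is only written against the exact line format shown here, and the function and key names are my own for illustration.

```python
import re

# Matches the IOS ping summary line exactly as it appears in the output above.
SUMMARY_RE = re.compile(
    r"Success rate is (\d+) percent \((\d+)/(\d+)\), "
    r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms"
)


def parse_ping_summary(line):
    """Return the numbers from one ping summary line as a dict."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    percent, received, sent, rtt_min, rtt_avg, rtt_max = map(int, match.groups())
    return {
        "percent": percent,
        "received": received,
        "sent": sent,
        "min_ms": rtt_min,
        "avg_ms": rtt_avg,
        "max_ms": rtt_max,
    }


tos0 = parse_ping_summary(
    "Success rate is 100 percent (100/100), round-trip min/avg/max = 68/68/72 ms"
)
tos184 = parse_ping_summary(
    "Success rate is 100 percent (50/50), round-trip min/avg/max = 72/661/676 ms"
)

# How many times slower the EF-marked pings were on average.
print(tos184["avg_ms"] / tos0["avg_ms"])
```

Neither test lost a single packet, so a check that only looks at success rate would have missed the problem entirely.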
So for some reason, packets tagged with DSCP 46 are getting huge delays (600ms+ round-trip), while DSCP 0 is just fine. So why was it causing one-way audio? I would say the de-jitter buffer on the HQ phone couldn’t deal with such high jitter, and the packets were discarded. The funny thing is, the guy in the HQ site said he could sometimes hear a blip or two from the LA Sales office phone. That would be pretty consistent with what we see here, especially during the first second or two when the phone was picked up (seeing as the jitter values weren’t that bad for the first few packets).
So at this point, I pretty much handed it over to the MPLS carrier and told them to check their QoS policies in that particular direction. The workaround is to set the DSCP value to 0 on the outbound QoS policy-map.
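To illustrate the de-jitter buffer theory, here is a toy Python model of a fixed playout buffer. This is a simplification and not how the phone’s real (adaptive) buffer works. The 60ms buffer depth, the function name, and the arrival times are assumptions I made up for the example.

```python
def late_packets(arrivals_ms, buffer_ms=60.0, interval_ms=20.0):
    """Return the indices of packets that miss their playout deadline.

    Toy model: playout starts buffer_ms after the first packet arrives,
    and packet i is due interval_ms * i after that. A packet arriving
    after its deadline is counted as late (a real phone would discard it).
    """
    if not arrivals_ms:
        return []
    start = arrivals_ms[0] + buffer_ms
    late = []
    for i, arrival in enumerate(arrivals_ms):
        deadline = start + i * interval_ms
        if arrival > deadline:
            late.append(i)
    return late


# Made-up arrivals (ms): the first few packets are on time, then a long stall.
arrivals = [0, 20, 40, 400, 420, 440]
print(late_packets(arrivals))  # [3, 4, 5]
```

In this model the first few packets make their deadlines and get played, and everything after the stall is thrown away, which lines up with hearing a blip at the start of the call and then nothing.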
This isn’t the first time I’ve seen this problem with MPLS carriers. It’s getting more and more common it seems.
URL to article: http://shouldhavegonewithcisco.com/2008/10/04/one-way-audio/