Bobbie Smulders Freelance Software Developer

Reverse engineering the Polycom CX300 USB phone

Close-up of the Polycom CX300 display

I recently purchased the Polycom CX300 USB phone, which works with Skype for Business out of the box. With all this technology accessible over USB, I was keen to find out whether I could use it for other purposes. How can I interface with this phone outside of Skype?

Polycom CX300

Polycom CX300 phone

The Polycom CX300 is a USB phone designed specifically for Microsoft Skype for Business (formerly Microsoft Lync). It plugs into a USB port on your computer and attaches to Skype for Business to make phone calls to co-workers and external numbers. It has a handset, a speakerphone and a connection for an external headset. It also has a display, a traditional keypad, and buttons for controlling the volume and selecting the audio output.

On this page, I investigate how to interface with this specific phone outside of Skype for Business. This could be useful for using a different VoIP client, using the phone as an intercom, using the phone to listen to jokes, making a personal voice assistant, displaying the title of your currently playing song, or whatever creative project you can come up with.

Audio

macOS audio interfaces

Audio is very easy on this device. After plugging in the phone over USB, it presents itself as a USB audio device. Using it is as easy as setting it as your default audio output. Switching between the speakerphone, handset and headset can be done by pressing the buttons on the phone. Audio input is the same, with the phone switching between the handset microphone and the built-in microphone automatically.

Buttons and state

The phone uses USB HID to communicate button presses and state to the computer. This is where it got a bit tricky. Since macOS High Sierra, it has been possible to use Wireshark to sniff USB packets, and I found a good tutorial for doing so. Pressing the “5” button on the keypad gave me a total of four USB packets:

Four USB interrupt packets

Given the timing, it was clear that the first two were for pressing the button down and the last two for releasing the button. But going inside the first packet, it was not instantly clear what it all meant. In the following screenshot, the final block of bytes is the one carrying the data.

Contents of the packet after pressing 5

So what happened when I pressed “6” on the keypad instead of “5”? The screenshot below shows the content of the new USB packet. The third byte changed from 0x06 to 0x07, which was a strong indication that this was the byte that contained the key being pressed.

Contents of the packet after pressing 6

After pressing all buttons and trying out various sensors, I ended up with a look-up table for all buttons. As it turned out, the second and third bytes of the packet identify the button pressed.

KEYPAD_0_DOWN    = 00 01
KEYPAD_1_DOWN    = 00 02
KEYPAD_2_DOWN    = 00 03
KEYPAD_3_DOWN    = 00 04
KEYPAD_4_DOWN    = 00 05
KEYPAD_5_DOWN    = 00 06
KEYPAD_6_DOWN    = 00 07
KEYPAD_7_DOWN    = 00 08
KEYPAD_8_DOWN    = 00 09
KEYPAD_9_DOWN    = 00 0A
KEYPAD_STAR_DOWN = 00 0B
KEYPAD_DASH_DOWN = 00 0C
REDIAL_DOWN      = 04 00
FLASH_DOWN       = 02 00
DELETE_DOWN      = 20 00
MUTE_DOWN        = 10 00
HOOK_UP          = 01 00
NO_KEY           = 00 00
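The table above translates directly into a lookup in code. What follows is a minimal Java sketch, not code from the GitHub application: the class name `Cx300Keys` and the method `describe` are made up for illustration, and the two bytes are simply combined into one integer key.

```java
import java.util.HashMap;
import java.util.Map;

final class Cx300Keys {
    // Key codes copied from the table above; second byte in the high half, third byte in the low half.
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        // Keypad digits: 00 01 is "0", 00 02 is "1", up to 00 0A for "9".
        for (int digit = 0; digit <= 9; digit++) {
            NAMES.put(digit + 1, "KEYPAD_" + digit + "_DOWN");
        }
        NAMES.put(0x000B, "KEYPAD_STAR_DOWN");
        NAMES.put(0x000C, "KEYPAD_DASH_DOWN");
        NAMES.put(0x0400, "REDIAL_DOWN");
        NAMES.put(0x0200, "FLASH_DOWN");
        NAMES.put(0x2000, "DELETE_DOWN");
        NAMES.put(0x1000, "MUTE_DOWN");
        NAMES.put(0x0100, "HOOK_UP");
        NAMES.put(0x0000, "NO_KEY");
    }

    /** Returns the table name for the second and third byte of an input packet. */
    static String describe(int secondByte, int thirdByte) {
        int code = ((secondByte & 0xFF) << 8) | (thirdByte & 0xFF);
        String name = NAMES.get(code);
        if (name != null) {
            return name;
        }
        // Combinations that are not in the table are reported as-is rather than guessed.
        return String.format("UNKNOWN(%02X %02X)", secondByte & 0xFF, thirdByte & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x00, 0x06)); // prints "KEYPAD_5_DOWN"
        System.out.println(describe(0x10, 0x00)); // prints "MUTE_DOWN"
    }
}
```

Codes that do not appear in the table (for example two buttons held at once) are reported as unknown, since the table does not say how the phone encodes them.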

Fiddling with the phone and deciphering the received packets also resulted in various other discoveries. The fourth byte communicates whether the phone is receiving audio:

ENABLED          = 00
DISABLED         = 03

The fifth byte is to communicate the type of audio device the phone is using:

HANDSET          = 40
SPEAKER          = 50
HEADSET          = 60
NO_CHANGE        = 00
SPEAKER_LOUD     = 01
SPEAKER_LOUDER   = 02

The sixth and seventh bytes communicate the current volume level:

VOLUME_01 = 70 0B
VOLUME_02 = 27 10
VOLUME_03 = D1 16
VOLUME_04 = 3B 20
VOLUME_05 = 86 2D
VOLUME_06 = 4E 40
VOLUME_07 = D5 5A
VOLUME_08 = 4E 80
VOLUME_09 = 3C B5
VOLUME_10 = FF FF
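One way to read this table: if each pair is taken as a 16-bit number with the low byte first, the values rise steadily from 0x0B70 at level 1 to 0xFFFF at level 10. That byte order is an interpretation of the captured values, not something the phone documents. Below is a small Java sketch under that assumption; `Cx300Volume`, `toValue` and `levelOf` are illustrative names.

```java
final class Cx300Volume {
    // Values from the table above, read low byte first (so "70 0B" becomes 0x0B70).
    private static final int[] LEVELS = {
        0x0B70, 0x1027, 0x16D1, 0x203B, 0x2D86,
        0x404E, 0x5AD5, 0x804E, 0xB53C, 0xFFFF
    };

    /** Combines the sixth and seventh byte of a packet, assuming little-endian order. */
    static int toValue(int sixthByte, int seventhByte) {
        return ((seventhByte & 0xFF) << 8) | (sixthByte & 0xFF);
    }

    /** Returns the volume level 1..10, or -1 if the value is not in the table. */
    static int levelOf(int sixthByte, int seventhByte) {
        int value = toValue(sixthByte, seventhByte);
        for (int i = 0; i < LEVELS.length; i++) {
            if (LEVELS[i] == value) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(levelOf(0x4E, 0x40)); // prints "6"
    }
}
```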

The eighth byte is to communicate whether or not the microphone is muted:

NOT_MUTED        = 00
MUTED            = 01

Applying all of this to the USB packet from earlier gives:

01 00 06 00 00 4e 40 00

Keypad:     00 06 KEYPAD_5_DOWN
Audio:      00    ENABLED
Audio type: 00    NO_CHANGE
Volume:     4e 40 VOLUME_06
Microphone: 00    NOT_MUTED
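The same breakdown can be sketched in Java. This is only an illustration of the byte layout described above: the class `Cx300Report` and its field names are invented here, and the first byte of the packet (01 in the capture) is skipped because its meaning is not covered on this page.

```java
final class Cx300Report {
    final int keyCode;    // second and third byte, e.g. 0x0006 for KEYPAD_5_DOWN
    final int audio;      // fourth byte: 0x00 ENABLED, 0x03 DISABLED
    final int audioType;  // fifth byte: 0x40 HANDSET, 0x50 SPEAKER, 0x60 HEADSET, ...
    final int volume;     // sixth and seventh byte in table order, e.g. 0x4E40 for VOLUME_06
    final boolean muted;  // eighth byte: 0x01 means MUTED

    Cx300Report(byte[] packet) {
        if (packet.length < 8) {
            throw new IllegalArgumentException("expected at least 8 bytes, got " + packet.length);
        }
        // packet[0] is the undescribed first byte and is ignored.
        keyCode = (unsigned(packet[1]) << 8) | unsigned(packet[2]);
        audio = unsigned(packet[3]);
        audioType = unsigned(packet[4]);
        volume = (unsigned(packet[5]) << 8) | unsigned(packet[6]);
        muted = unsigned(packet[7]) == 0x01;
    }

    private static int unsigned(byte b) {
        return b & 0xFF;
    }

    /** Turns a whitespace-separated hex dump such as "01 00 06" into bytes. */
    static byte[] parseHex(String hexDump) {
        String[] parts = hexDump.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return bytes;
    }

    @Override
    public String toString() {
        return String.format("key=%04X audio=%02X type=%02X volume=%04X muted=%b",
                keyCode, audio, audioType, volume, muted);
    }

    public static void main(String[] args) {
        System.out.println(new Cx300Report(parseHex("01 00 06 00 00 4e 40 00")));
        // prints "key=0006 audio=00 type=00 volume=4E40 muted=false"
    }
}
```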

And that concludes the buttons and device state.

Display

The display was quite the puzzle to figure out. I needed Skype for Business to send data to the display so I could see what was happening. Even before logging in, the phone displayed the current date and time on the right-hand side of the display.

CX300 display showing the current date and time

Inspecting this in Wireshark showed that the computer sent 16 packets to the phone to do so. The first two were control packets for setting up the USB device. The rest were interrupt packets for sending data. The 12th and 16th packets each contained a large chunk of data, making them the prime suspects for the display text.

USB packets after setting the display from Skype

Time to decipher what was being sent:

  Data sent to phone:
  15 80 31 00 34 00 2f 00 30 00 34 00 00 00 00 00 00 00
  15 80 31 00 37 00 3a 00 35 00 30 00 00 00 00 00 00 00
  
  Text on display:
  14/04
  17:50
  
  Difference in packets:
  31 00 34 00 2f 00 30 00 34
  31 00 37 00 3a 00 35 00 30
  
  After removing filler bytes:
  31 34 2f 30 34
  31 37 3a 35 30
  
  31 = 1
  34 = 4
  2f = /
  30 = 0
  34 = 4
  
  31 = 1
  37 = 7
  3a = :
  35 = 5
  30 = 0
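The manual decoding above can be double-checked in code. A character byte followed by a 00 byte happens to be exactly how UTF-16LE encodes ASCII characters, so this Java sketch leans on the standard `StandardCharsets.UTF_16LE` charset. Whether the phone accepts anything beyond ASCII is not known, and the class name `Cx300DisplayDecoder` is made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class Cx300DisplayDecoder {
    /** Extracts the text from a captured display packet such as "15 80 31 00 34 00 ...". */
    static String decodeText(byte[] packet) {
        // Skip the two leading bytes (15 80), then walk the character/filler pairs
        // until the first all-zero pair, which is padding.
        int end = 2;
        while (end + 1 < packet.length && (packet[end] != 0 || packet[end + 1] != 0)) {
            end += 2;
        }
        return new String(packet, 2, end - 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] date = {0x15, (byte) 0x80, 0x31, 0, 0x34, 0, 0x2f, 0, 0x30, 0, 0x34, 0, 0, 0, 0, 0, 0, 0};
        byte[] time = {0x15, (byte) 0x80, 0x31, 0, 0x37, 0, 0x3a, 0, 0x35, 0, 0x30, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(decodeText(date)); // prints "14/04"
        System.out.println(decodeText(time)); // prints "17:50"
    }
}
```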

Attentive readers will have noticed that the data is plain ASCII, with a 0x00 filler byte after every character (which is also how UTF-16LE encodes these characters). But what about all the other data that is sent to the phone? After creating my own packets and playing with the data, this is what I figured out:

Clear display:                      13 00

Show text in four corners:          13 0D
Display in the top left corner:     14 01 80
Display in the bottom left corner:  14 02 80
Display in the top right corner:    14 03 80
Display in the bottom right corner: 14 04 80

Show text in two lines:             13 15
Display in the top line:            14 05 80
Display in the bottom line:         14 0A 80

To send data to the display, I first send a data packet selecting the mode I want to use: four corners or two lines. I then send a data packet selecting the corner or line I want to write to. Finally, I send the actual message encoded in ASCII, in packets that start with 0x15. The second byte of each message packet indicates whether it is the last one: use 0x00 for every packet except the last, which needs 0x80 so the phone knows that all data has been received. Repeat these steps for the other corners or lines.

  15 00 48 00 65 00 6c 00 6c 00 6f 00 20 00 77 00 6f 00
  Hello wo
  
  15 80 72 00 6c 00 64 00 21 00 00 00 00 00 00 00 00 00
  rld!
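Splitting a message into packets like the two above can be sketched as follows. This is a minimal Java illustration, assuming every text packet is 18 bytes long (two header bytes plus eight characters with filler), as in the captures on this page. The names `Cx300DisplayEncoder` and `textPackets` are hypothetical, and replacing non-ASCII characters with `?` is a choice made for this sketch. The mode and position packets still have to be sent first.

```java
import java.util.ArrayList;
import java.util.List;

final class Cx300DisplayEncoder {
    // Eight characters per packet, matching the 18-byte packets shown above.
    static final int CHARS_PER_PACKET = 8;

    /** Splits text into 0x15 packets; only the last one carries 0x80 in its second byte. */
    static List<byte[]> textPackets(String text) {
        List<byte[]> packets = new ArrayList<>();
        int start = 0;
        do {
            int end = Math.min(start + CHARS_PER_PACKET, text.length());
            byte[] packet = new byte[2 + 2 * CHARS_PER_PACKET]; // zero-filled, so padding is free
            packet[0] = 0x15;
            packet[1] = (byte) (end == text.length() ? 0x80 : 0x00);
            for (int i = start; i < end; i++) {
                char c = text.charAt(i);
                // Character byte first; the filler byte after it stays 0x00.
                packet[2 + 2 * (i - start)] = (byte) (c < 0x80 ? c : '?');
            }
            packets.add(packet);
            start = end;
        } while (start < text.length());
        return packets;
    }

    public static void main(String[] args) {
        for (byte[] packet : textPackets("Hello world!")) {
            StringBuilder hex = new StringBuilder();
            for (byte b : packet) {
                hex.append(String.format("%02x ", b & 0xFF));
            }
            System.out.println(hex.toString().trim());
        }
    }
}
```

An empty string still produces one packet with 0x80 set, so the phone always receives an end marker.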

The final pieces of the puzzle are the status LEDs on the phone. First, the presence LED on the bottom right of the phone:

STATUS_AVAILABLE      = 16 01
STATUS_BUSY           = 16 03
STATUS_BE_RIGHT_BACK  = 16 05
STATUS_AWAY           = 16 05
STATUS_DO_NOT_DISTURB = 16 06
STATUS_OFF_WORK       = 16 07

Or when translated to colors:

STATUS_LED_GREEN      = 16 01
STATUS_LED_RED        = 16 03
STATUS_LED_YELLOW_RED = 16 04
STATUS_LED_YELLOW     = 16 05
STATUS_LED_OFF        = 16 07

And the speakerphone status LED:

OFF = 02 00
ON  = 02 01
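For completeness, the LED commands can be wrapped in a few helpers. The Java sketch below only builds the byte sequences from the tables above. The names `Cx300Leds`, `PresenceColor` and `speakerphoneLed` are illustrative, and whether the phone expects these commands padded to a fixed length is not covered here.

```java
final class Cx300Leds {
    /** Presence LED colors, using the codes from the color table above. */
    enum PresenceColor {
        GREEN(0x01), RED(0x03), YELLOW_RED(0x04), YELLOW(0x05), OFF(0x07);

        private final int code;

        PresenceColor(int code) {
            this.code = code;
        }

        /** Builds the two-byte command: 0x16 followed by the color code. */
        byte[] command() {
            return new byte[] {0x16, (byte) code};
        }
    }

    /** Builds the two-byte speakerphone LED command: 02 01 for on, 02 00 for off. */
    static byte[] speakerphoneLed(boolean on) {
        return new byte[] {0x02, (byte) (on ? 0x01 : 0x00)};
    }

    public static void main(String[] args) {
        byte[] red = PresenceColor.RED.command();
        System.out.println(String.format("%02X %02X", red[0], red[1])); // prints "16 03"
    }
}
```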

Java

I made a small Java application that controls the Polycom CX300 using the information on this page; it is available on GitHub. It uses hid4java, which uses JNA to communicate with USB HID devices. When started, the application displays a clock on the plugged-in Polycom CX300. Key presses and audio state are written to the console.

Wrapping it up

The end result is a detailed description of the protocol that the Polycom CX300 uses for communication, and an application that uses the protocol to control all functions of the phone. Feel free to play around with it. If you need a phone, they are fairly cheap on eBay. The Plantronics P540 is a rebranded device that may also work; please let me know if it does.