This post is about a critical technical development in the history of Voice over IP which had a wide-reaching impact on voice and related communications solutions. I’m referring to IP media, which was introduced early in the 2000s and has been ramping up ever since.

In my last two posts, we discussed two important technologies which were instrumental in the early and middle years of voice-based solutions. The first post covered the introduction of voice boards and the second post reviewed the impact of media gateways on voice solutions.

On the business side, the introduction of media gateways provided a stepping stone which encouraged the pioneers of Voice over IP and other voice solution providers to decide where they offered the most strategic value to their customers. In particular, should they focus on applications, or on enabling technology which could either complement applications or be used to build underlying infrastructure? The introduction of IP media pushed companies further down this decision path.

Two directions emerged. Several manufacturers of voice boards began to kick the tires on creating software-based versions of their voice boards. In the post on voice boards, we noted how voice and media functions were controlled using Application Program Interfaces (APIs) and that private APIs tied to particular vendor product families gained much more market traction than attempts at standards-based APIs. Hence, an early product in the space, Host Based Media Processing (HMP) from Dialogic®, offered the value proposition of being software-based, but was still controlled using the same set of APIs that were used with Dialogic voice boards.

In parallel, another movement emerged. Two startup companies, Snowshore and Convedia, introduced a new category of product called the Media Server. In the last post, I mentioned how the Session Initiation Protocol (SIP) started to gain traction early in the 2000s. The Media Server concept took SIP as a starting point and added the ability to manipulate media by using a markup language, which was typically based on the Extensible Markup Language (XML) recently standardized by the World Wide Web Consortium (W3C).

The implications were profound, both on the technical and business sides, but like many innovations, the transition to this new approach took many years. For example, by the time Media Servers truly hit the mainstream, the two originating companies had both been acquired by larger organizations that were able to make the capital investment needed to build sustainable businesses for media servers.

Of the two approaches, IP Media controlled by APIs was essentially an incremental development, while IP Media managed by Media Servers introduced radical change. Let’s consider why this was the case. IP Media controlled by APIs retained the API-based model for control of media. For existing voice application developers, this was great. They could start the transition away from including voice board hardware in their solutions and thus vastly simplify their go-to-market strategies. As a result, many voice application developers raised the flag and said they were now out of the hardware business and their solutions were totally software-based. In reality, this typically meant their application software would run on industry standard Commercial Off-The-Shelf (COTS) PCs or servers using Intel or compatible CPUs such as those offered by AMD. But by using IP Media, the solution providers could skip the step of adding voice boards to their computer-based solutions and eliminate all of the complications of integrating third-party hardware. They did have to be careful to have enough CPU horsepower to run both their applications and IP media software, but it represented a major step forward. Voice and multimedia application solutions had now become a separate business in the Voice over IP market.

I mentioned that the introduction of the IP-based Media Server was a more radical step. So, I’ll review a few points to back up that assertion.

  1. The need to have a private API controlled by a single vendor went away. The new concept of “the protocol is the API” replaced the programmatic approaches which had required developers to use programming languages such as C, C++ or Java for media operations. Instead, simple operations like playing back voice prompts or collecting digits could be accomplished using the combination of SIP and an XML-based markup language, thus eliminating the need for a programming language to carry out these operations.
  2. The application developer could focus clearly on making their applications best-of-breed and partner with media server vendors, who would focus on creating world-class voice and multimedia solutions.
  3. The application developers no longer needed to include media processing in their applications at all, thus reducing the CPU cycles needed for those media operations. However, the application developers did need to partner closely with the media server vendors and ensure their SIP + XML commands would work correctly when issued over an IP network to the paired media server.
  4. The concepts of the standalone application server and the standalone media server were included in the new IP Multimedia Subsystem (IMS) architecture, which was being standardized by the Third Generation Partnership Project (3GPP) as a linchpin for the next generation of mobile networks.
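
To make “the protocol is the API” a bit more concrete, here is a minimal Python sketch of what an application server might send to a media server to play a prompt and collect digits. It is only an illustration under stated assumptions: the XML element and attribute names (`mediarequest`, `play`, `collect`, `maxdigits`), the host names and the prompt URL are all hypothetical and are not taken from any particular vendor's markup language, and the SIP INFO request is heavily simplified (a real request would also carry headers such as Via, From, To, Call-ID and CSeq).

```python
import xml.etree.ElementTree as ET


def build_prompt_and_collect(prompt_uri: str, max_digits: int) -> str:
    """Build a hypothetical XML media request: play a prompt, then collect digits."""
    # All element and attribute names here are invented for illustration.
    root = ET.Element("mediarequest")
    ET.SubElement(root, "play", {"uri": prompt_uri})
    ET.SubElement(root, "collect", {"maxdigits": str(max_digits)})
    return ET.tostring(root, encoding="unicode")


def wrap_in_sip_info(body: str, media_server: str) -> str:
    """Wrap the XML body in a simplified SIP INFO request (mandatory headers omitted)."""
    headers = [
        f"INFO sip:{media_server} SIP/2.0",
        "Content-Type: application/xml",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # SIP separates headers from the body with a blank line, using CRLF line endings.
    return "\r\n".join(headers) + "\r\n\r\n" + body


# Hypothetical prompt URL and media server address.
xml_body = build_prompt_and_collect("http://example.com/prompts/welcome.wav", 4)
message = wrap_in_sip_info(xml_body, "ms.example.com")
print(message)
```

The point of the sketch is that the application only builds text and sends it over the network. There is no vendor library to link against, which is why the application and the media server could come from different companies and run on different machines.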

So the move toward IP Media was a major step forward for the Voice over IP industry and encouraged further market segmentation. For the first time, companies could specialize in applications and include the ability to support voice, tones and other multimedia, and do all of this in software which would run on industry standard COTS servers. In turn, hardware component and appliance vendors were able to focus on more distinct market segments where they could either use embedded solutions technology or start making the move toward running media on COTS servers.

Voice Development Models: A Journey Begins

During the past three years, I had product management responsibilities for products which covered the spectrum from hardware-centered to software-centered development.  In telecom, there’s been an evolution in development models as solution providers have taken a series of steps to gradually move away from hardware.  However, like many technical trends, there is a long tail as the older technology goes away only gradually.  In this post and others to follow, I’ll review models for voice applications at a high level and consider some steps along the way which have led to the software-oriented nirvana sought by many solution providers.

In the Nineties, voice development was often done with PCs at the center and embedded board hardware was an important component. The CPUs of the PCs ranged from models like the 386 on up to Pentium. Voice applications entailed lots of media processing, so voice boards with lots of Digital Signal Processors (DSPs) were critical to get scalable applications. The DSPs did all of the heavy lifting for the media and the CPU of the PC was freed up to support the application side for solutions such as call centers, interactive voice response and fax on demand. Many of the applications developed during this time are still being used, though the actual PCs or servers may have been replaced and there may also have been some upgrades on the voice board hardware. Nonetheless, thousands of voice boards are still being sold to support these applications. On the software side, there were efforts to create industry standard Application Program Interfaces (APIs) such as S.100 from the Enterprise Computer Telephony Forum (ECTF) and T.611 from the International Telecommunication Union, but most of the boards were controlled using private APIs supplied by the board vendors.

In the model above, the boards and applications were all designed to work over the circuit-switched telephone network, which ranged from analog services (POTS or Plain Old Telephone Service) to digital approaches which began with the Integrated Services Digital Network (ISDN) and continued with the Signaling System 7 (SS7) network overlay. The phone companies worldwide assumed that these circuit-switched networks with Time-Division Multiplexing (TDM) and the related seven layer Open Systems Interconnection (OSI) models would be the focus going forward, replacing analog networks, and would perhaps be supplemented by new OSI stacks such as the Asynchronous Transfer Mode (ATM).

But a revolution had already begun as alternative, flatter telecom stacks based on the upstart Internet Protocol (IP) were being used both for existing applications such as email and new applications like the World Wide Web. In the telecom industry, a few companies began to explore running voice over IP networks, thus creating a new Voice over IP (VoIP) technical and business model for phone networks. In the early days (from the late Nineties to the early 2000s), VoIP was mainly used to bypass existing long distance networks to reduce long distance charges, but the range of applications for IP soon began to expand.

At first, this looked like a great opportunity for the voice board manufacturers. Now, they could add IP support to their boards or potentially just give software developers access to the Ethernet ports on the PC. An important new board category was created: the media gateway. These early media gateway boards allowed developers to use the existing circuit networks for most of their connections, but also tap into new IP networks where they existed. Continuing the same API trend, board vendors extended their private APIs to support IP in addition to TDM. So now solution developers could run their solutions over both existing TDM and new IP networks, using these new hybrid boards which often could support voice, fax and tones.

