How IP Media Changed the Voice Business

This post is about a critical technical development in the history of Voice over IP, one which had a wide-reaching impact on voice and related communications solutions. I’m referring to IP media, which was introduced early in the 2000s and has been ramping up ever since.

In my last two posts, we discussed two important technologies which were instrumental in the early and middle years of voice-based solutions. The first post covered the introduction of voice boards and the second post reviewed the impact of media gateways on voice solutions.

On the business side, the introduction of media gateways provided a stepping stone which encouraged the pioneers of Voice over IP and other voice solution providers to decide where they offered the most strategic value to their customers. In particular, should they focus on applications, or on enabling technology which could either complement applications or be used to build underlying infrastructure? The introduction of IP media pushed companies further down this decision path.

Two directions emerged. Several manufacturers of voice boards began to kick the tires on creating software-based versions of their voice boards. In the post on voice boards, we noted how voice and media functions were controlled using Application Program Interfaces (APIs), and that private APIs tied to particular vendor product families gained much more market traction than attempts at standards-based APIs. Hence, an early product in the space, Host Based Media Processing (HMP) from Dialogic®, offered the value proposition of being software-based while still being controlled by the same set of APIs that were used with Dialogic voice boards.

In parallel, another movement emerged. Two startup companies, SnowShore and Convedia, introduced a new category of product called the Media Server. In the last post, I mentioned how the Session Initiation Protocol (SIP) started to gain traction early in the 2000s. The Media Server concept took SIP as a starting point and added the ability to manipulate media using a markup language, typically based on the Extensible Markup Language (XML), which the World Wide Web Consortium (W3C) had standardized a few years earlier. The implications were profound on both the technical and business sides but, like many innovations, the transition to this new approach took many years. For example, by the time Media Servers truly hit the mainstream, both originating companies had been acquired by larger organizations that were able to make the capital investment needed to build sustainable media server businesses.
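To make the markup idea concrete, here is a minimal Python sketch of the kind of XML document an application might hand to a media server to play a prompt and collect digits. The element and attribute names (mediaRequest, playCollect, prompt, maxDigits, timeoutMs) and the prompt URL are hypothetical; the media server control languages of that era each defined their own schemas, so treat this as an illustration of the style rather than any real format.

```python
import xml.etree.ElementTree as ET


def build_play_collect(prompt_url: str, max_digits: int, timeout_ms: int) -> str:
    """Build a hypothetical "play a prompt, then collect digits" request.

    The element and attribute names are illustrative assumptions, not the
    schema of any real media server control language.
    """
    root = ET.Element("mediaRequest", version="1.0")
    op = ET.SubElement(root, "playCollect",
                       maxDigits=str(max_digits), timeoutMs=str(timeout_ms))
    # The media server would fetch and play this prompt, then gather DTMF digits.
    ET.SubElement(op, "prompt", url=prompt_url)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_play_collect("http://prompts.example.com/welcome.wav", 4, 5000))
```

The point is that the request is data, not a compiled call into a vendor library, so any application that can produce this text can drive the media server.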

Of the two approaches, IP Media controlled by APIs was essentially an incremental development, while IP Media managed by Media Servers introduced radical change. Let’s consider why. IP Media controlled by APIs retained the API-based model for control of media. For existing voice application developers, this was great. They could start the transition away from including voice board hardware in their solutions and thus vastly simplify their go-to-market strategies. As a result, many voice application developers raised the flag and said they were now out of the hardware business and their solutions were totally software-based. In reality, this typically meant their application software would run on industry-standard Commercial Off the Shelf (COTS) PCs or servers using Intel or compatible CPUs such as those offered by AMD. Even so, by using IP Media, the solution providers could skip the step of adding voice boards to their computer-based solutions and eliminate the complications of integrating third-party hardware. They did have to be careful to provision enough CPU horsepower to run both their applications and the IP media software, but it represented a major step forward. Voice and multimedia application solutions had now become a separate business in the Voice over IP market.
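The CPU trade-off mentioned above comes down to simple arithmetic. The Python sketch below estimates how many software media channels fit on one server once a safety reserve and the application's own load are subtracted. The function name and every input figure are hypothetical planning assumptions, not vendor sizing data.

```python
def max_media_channels(cores: int, reserve_pct: int, app_millicores: int,
                       per_channel_millicores: int) -> int:
    """Estimate how many software media channels fit on one server.

    CPU is counted in millicores (1000 per core) so the arithmetic stays in
    integers. All inputs are hypothetical planning figures.
    """
    total = cores * 1000
    # Hold back a safety margin for load spikes and the operating system.
    usable = total * (100 - reserve_pct) // 100
    # The application shares the same CPUs as the IP media software.
    remaining = usable - app_millicores
    if remaining <= 0:
        return 0
    return remaining // per_channel_millicores


if __name__ == "__main__":
    # 8 cores, 20% reserve, 2 cores for the application, 10 millicores per channel.
    print(max_media_channels(8, 20, 2000, 10))
```

Because the application and the media processing compete for the same cores, every extra unit of application load directly reduces channel capacity, which is the balancing act developers faced once the voice boards were gone.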

I mentioned that the introduction of the IP-based Media Server was a more radical step. So, I’ll review a few points to back up that assertion.

  1. The need to have a private API controlled by a single vendor went away. The new concept of “the protocol is the API” replaced the programmatic approaches, which had required developers to use programming languages such as C, C++ or Java for media operations. Instead, simple operations like playing back voice prompts or collecting digits could be accomplished using the combination of SIP and an XML-based markup language, eliminating the need for a programming language to carry out these operations.
  2. The application developer could focus clearly on making their applications best-of-breed and partner with media server vendors, who would focus on creating world-class voice and multimedia solutions.
  3. The application developers no longer needed to include media processing in their applications at all, freeing the CPU cycles those media operations had consumed. However, the application developers did need to partner closely with the media server vendors and ensure their SIP + XML commands would work correctly when issued over an IP network to the paired media server.
  4. The concepts of the standalone application server and the standalone media server were incorporated into the new IP Multimedia Subsystem (IMS) architecture, which was being standardized by the Third Generation Partnership Project (3GPP) as a linchpin for the next generation of mobile networks.
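As a rough illustration of “the protocol is the API,” the Python sketch below wraps an XML media command in the body of a SIP request. It assumes the command travels in a SIP INFO request within an existing call; the hostnames, tags, Call-ID, and XML body are placeholders, and the generic application/xml content type stands in for whatever type a particular media server expects.

```python
import uuid


def build_sip_info(target_uri: str, from_uri: str, call_id: str,
                   cseq: int, xml_body: str) -> str:
    """Build a SIP INFO request whose body is an XML media command.

    Header values are placeholders: a real user agent would reuse the dialog
    state (From/To tags, route set) established by the initial INVITE.
    """
    body_bytes = xml_body.encode("utf-8")
    # RFC 3261 branch IDs start with the "z9hG4bK" magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"INFO {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP app.example.com:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>;tag=ms-placeholder",
        f"From: <{from_uri}>;tag=app-placeholder",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/xml",
        f"Content-Length: {len(body_bytes)}",
    ]
    # SIP header lines end in CRLF, and an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + xml_body


if __name__ == "__main__":
    command = ('<mediaRequest version="1.0">'
               '<play url="http://prompts.example.com/welcome.wav"/></mediaRequest>')
    print(build_sip_info("sip:ms@media.example.com", "sip:app@app.example.com",
                         "a84b4c76e66710@app.example.com", 2, command))
```

Nothing here links against a vendor library: the application only has to emit correctly formatted text over the network, which is why close interoperability testing with the paired media server mattered.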

So the move toward IP Media was a major step forward for the Voice over IP industry and encouraged further market segmentation. For the first time, companies could specialize in applications that supported voice, tones and other multimedia, and do all of this in software running on industry-standard COTS servers. In turn, hardware component and appliance vendors were able to focus on more distinct market segments, where they could either use embedded technology or start moving toward running media on COTS servers.

In my next post, I’ll talk more about how the business models for voice and unified communications solutions have evolved due to the more widespread use of server- and appliance-based technology for applications, signaling and media.


Impact of Media Gateways on Voice Solutions

This is the latest in a series of posts on how voice development has been moving from hardware to software centered models. In my last post, we reviewed the classic approach to developing voice-centered solutions, which typically utilized voice boards. In this post, I’ll review how media gateways helped change the model.

In the classic voice model, the voice board was often used both for voice processing and to connect to a phone network, which might be either digital or analog. When Voice over IP (VoIP) began to emerge, new options became available for voice solutions. In the early days of VoIP, the H.323 protocol suite was used to connect to IP networks, but the Session Initiation Protocol (SIP) got some crucial support in the 2000-2001 time frame from Microsoft and the Third Generation Partnership Project (3GPP), the leading standards organization for mobile phone networks. Within a few years, voice developers began to add SIP to their development capabilities. This had multiple implications.

Let’s look at some business-side drivers. After the dot com crash and the related “Telecom Downturn,” which decimated the engineering staffs of the large vendors known as Telecom Equipment Manufacturers (TEMs), these companies were looking for ways to reduce the amount of hardware in their solutions. In the classic voice solution, the voice board processed media and also connected to the circuit-switched networks. When SIP became popular, many of the TEMs started saying they wanted to move away from the hardware business. Some of these companies started processing media as part of their voice applications, and others continued to rely upon voice boards for this processing. In either case, if they outsourced the network connection to another box, they could reduce the number of hardware-dependent elements in their solution and simplify the process of building and shipping it.

Enter the Media Gateway. As application developers included SIP in their solutions, they could connect to a media gateway via SIP and then let the media gateway take over the role of connecting to the existing circuit-switched network. This had been possible before SIP with H.323, but SIP offered much more flexibility for the complex call processing needed by voice developers, and it continued to gain market momentum. In turn, various hardware companies started building purpose-built media gateway appliances to connect to digital or analog networks. The gateways first supported the most common networks, such as ISDN, but eventually some gateways got more sophisticated and added Signaling System #7 (SS7) support as well. This decomposition of the voice solution offered benefits for both types of vendors. The solution vendors could start their move away from hardware and focus more on software, whereas the media gateway vendors were able to specialize in connections between SIP and the circuit-switched networks. Each type of company could specialize in its area of expertise, and the solution providers could add value to their solutions by buying best-of-breed media gateways. Since the network protocols were standards-based, the gateways needed robust standard protocol implementations, and this helped create a competitive market for media gateways.
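Once the application speaks SIP, handing a call to the circuit-switched network largely reduces to choosing a gateway and addressing the call to it. The Python sketch below picks a gateway by longest matching number prefix and builds a SIP Request-URI. The routing table and hostnames are hypothetical, and it assumes dialed numbers already arrive in international (E.164) form.

```python
def gateway_request_uri(dialed: str, routes: dict[str, str], default_gateway: str) -> str:
    """Choose a media gateway by longest matching prefix and build a SIP Request-URI.

    `dialed` is assumed to already be an international (E.164) number;
    punctuation such as '+', spaces, dashes, and parentheses is stripped.
    """
    digits = "".join(ch for ch in dialed if ch.isdigit())
    gateway, best_len = default_gateway, -1
    for prefix, host in routes.items():
        # Prefer the most specific route that matches the number.
        if digits.startswith(prefix) and len(prefix) > best_len:
            gateway, best_len = host, len(prefix)
    # user=phone marks the user part of the SIP URI as a telephone number.
    return f"sip:+{digits}@{gateway};user=phone"


if __name__ == "__main__":
    # Hypothetical dial plan: two country codes and one city prefix.
    routes = {"1": "gw-us.example.com", "1212": "gw-nyc.example.com",
              "44": "gw-uk.example.com"}
    print(gateway_request_uri("+1 (212) 555-0100", routes, "gw-default.example.com"))
```

The application never touches ISDN or SS7 signaling here; everything past the Request-URI is the gateway's job, which is exactly the division of labor described above.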

As a result, solution developers took another step along the path of reducing their dependency on embedded hardware, since they could now outsource the network connection to a media gateway.  In the next post, I’ll talk about developments in IP-based media which continued the evolution toward software-based voice applications.

If you participated in the evolution described here, please feel free to weigh in with your comments. If you’d like to explore strategies on how to evolve your company’s solutions to meet customer needs, you can reach me on LinkedIn.

Voice Development Models: A Journey Begins

During the past three years, I have had product management responsibilities for products that covered the spectrum from hardware-centered to software-centered development. In telecom, there’s been an evolution in development models as solution providers have taken a series of steps to gradually move away from hardware. However, like many technical trends, this one has a long tail, since the older technology goes away only gradually. In this post and others to follow, I’ll review models for voice applications at a high level and consider some steps along the way which have led to the software-oriented nirvana sought by many solution providers.

In the Nineties, voice development was often done with PCs at the center, and embedded board hardware was an important component. The CPUs of the PCs ranged from models like the 386 up to the Pentium. Voice applications entailed lots of media processing, so voice boards with many Digital Signal Processors (DSPs) were critical for scalable applications. The DSPs did all of the heavy lifting for the media, and the CPU of the PC was freed up to support the application side for solutions such as call centers, interactive voice response (IVR) and fax on demand. Many of the applications developed during this time are still being used, though the actual PCs or servers may have been replaced and there may also have been some upgrades to the voice board hardware. Nonetheless, thousands of voice boards are still being sold to support these applications. On the software side, there were efforts to create industry-standard Application Program Interfaces (APIs) such as S.100 from the Enterprise Computer Telephony Forum (ECTF) and T.611 from the International Telecommunication Union (ITU), but most of the boards were controlled using private APIs supplied by the board vendors.
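Tone detection is a good example of the media work the DSPs took off the host CPU. A common technique for detecting DTMF (touch-tone) digits is the Goertzel algorithm, which measures signal energy at each of the eight DTMF frequencies. The Python sketch below illustrates the algorithm; it is not taken from any voice board's firmware, and the 205-sample block at 8 kHz and the power threshold are assumed values.

```python
import math
from typing import Optional

ROW_FREQS = (697, 770, 852, 941)       # Hz, low-group DTMF tones
COL_FREQS = (1209, 1336, 1477, 1633)   # Hz, high-group DTMF tones
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: list[float], freq: float, sample_rate: int) -> float:
    """Signal power at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_dtmf(samples: list[float], sample_rate: int = 8000,
                min_power: float = 100.0) -> Optional[str]:
    """Return the DTMF key in one block of samples, or None if none is found."""
    row_power = [goertzel_power(samples, f, sample_rate) for f in ROW_FREQS]
    col_power = [goertzel_power(samples, f, sample_rate) for f in COL_FREQS]
    r = max(range(4), key=lambda i: row_power[i])
    c = max(range(4), key=lambda i: col_power[i])
    # Require energy in both tone groups. This absolute threshold is an
    # assumption; real detectors add checks such as twist and relative energy.
    if row_power[r] < min_power or col_power[c] < min_power:
        return None
    return KEYPAD[r][c]


def dual_tone(low: float, high: float, n: int = 205, sample_rate: int = 8000) -> list[float]:
    """Synthesize n samples of two equal-amplitude sine tones (test helper)."""
    return [0.5 * math.sin(2 * math.pi * low * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
            for t in range(n)]


if __name__ == "__main__":
    print(detect_dtmf(dual_tone(697, 1336)))
```

Running this for every channel, every few milliseconds, is the kind of repetitive arithmetic that justified dedicated DSPs when host CPUs were 386s and early Pentiums.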

In the model above, the boards and applications were all designed to work over the circuit-switched telephone network, which ranged from analog services (POTS, or Plain Old Telephone Service) to digital approaches, which began with the Integrated Services Digital Network (ISDN) and continued with the Signaling System 7 (SS7) network overlay. The phone companies worldwide assumed that these circuit-switched networks, with Time-Division Multiplexing (TDM) and the related seven-layer Open Systems Interconnection (OSI) model, would be the focus going forward, replacing analog networks, perhaps supplemented by new stacks such as Asynchronous Transfer Mode (ATM).
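In TDM, each call owns a fixed timeslot in a frame that repeats 8,000 times per second, matching the 8 kHz sampling rate of digital voice. The Python sketch below shows byte interleaving of channels into frames and the resulting T1 and E1 line rates. Framing bits and signaling are deliberately left out of the interleaver to keep the idea visible.

```python
def t1_line_rate_bps() -> int:
    """T1: 24 timeslots of 8 bits plus 1 framing bit, 8000 frames per second."""
    return (24 * 8 + 1) * 8000


def e1_line_rate_bps() -> int:
    """E1: 32 timeslots of 8 bits (timeslot 0 carries framing), 8000 frames per second."""
    return 32 * 8 * 8000


def tdm_interleave(channels: list[bytes]) -> list[bytes]:
    """Byte-interleave equal-length channel streams into TDM frames.

    Frame i holds sample i from every channel, always in the same timeslot
    order, so a channel is identified purely by its position in the frame.
    Framing bits and signaling are omitted from this sketch.
    """
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must carry the same number of samples")
    return [bytes(ch[i] for ch in channels) for i in range(length)]


if __name__ == "__main__":
    print(t1_line_rate_bps(), e1_line_rate_bps())
    print(tdm_interleave([b"ab", b"cd", b"ef"]))
```

The fixed slot positions are what make circuit switching predictable, and also what made it rigid compared with the packet networks described next.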

But a revolution had already begun, as alternative, flatter telecom stacks based on the upstart Internet Protocol (IP) were being used both for existing applications such as email and for new applications like the World Wide Web. In the telecom industry, a few companies began to explore running voice over IP networks, creating a new Voice over IP (VoIP) technical and business model for phone networks. In the early days (from the late Nineties to the early 2000s), VoIP was mainly used to bypass existing long distance networks to reduce long distance charges, but the range of applications for IP soon began to expand.
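Carrying voice over IP means chopping digitized audio into packets, most commonly with the Real-time Transport Protocol (RTP, RFC 3550) over UDP. The Python sketch below packs a minimal RTP header and computes the IP-layer bandwidth of one G.711 stream. It assumes IPv4, no header compression, and ignores link-layer overhead.

```python
import struct

PCMU_PAYLOAD_TYPE = 0  # static RTP payload type for G.711 mu-law audio


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = PCMU_PAYLOAD_TYPE, marker: bool = False) -> bytes:
    """Pack a minimal 12-byte RTP fixed header.

    Version 2, no padding, no header extension, no contributing sources.
    """
    byte0 = 2 << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def g711_ip_bandwidth_bps(packet_ms: int = 20) -> int:
    """IP-layer bit rate of one G.711 stream over IPv4/UDP/RTP (no link layer)."""
    if packet_ms <= 0 or 1000 % packet_ms:
        raise ValueError("packet_ms must divide 1000 evenly")
    payload_bytes = 8 * packet_ms       # 8000 samples/s x 1 byte = 8 bytes per ms
    header_bytes = 20 + 8 + 12          # IPv4 + UDP + RTP fixed headers
    packets_per_second = 1000 // packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second


if __name__ == "__main__":
    header = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD)
    print(len(header), header.hex())
    print(g711_ip_bandwidth_bps(20))
```

At 20 ms per packet, 64 kbit/s of voice becomes 80 kbit/s on the IP network, which gives a sense of the header overhead that packetized voice trades for the flexibility of IP.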

At first, this looked like a great opportunity for the voice board manufacturers. Now, they could add IP support to their boards or potentially just give software developers access to Ethernet ports on the PC. An important new board category was created: the media gateway. These early media gateway boards allowed developers to use the existing circuit networks for most of their connections, but also tap into new IP networks where they existed. Continuing the same API trends, board vendors extended their private APIs to support IP in addition to TDM. So now solution developers could run their solutions over both existing TDM and new IP networks, using these new hybrid boards, which often could support voice, fax and tones.
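The value of extending an existing API to IP was that application code did not have to change. The Python sketch below shows the idea with a hypothetical channel interface that has a TDM and an IP implementation; the class and method names are invented for illustration and do not represent any vendor's actual API.

```python
from abc import ABC, abstractmethod


class CallChannel(ABC):
    """Hypothetical vendor-style channel API (invented for illustration)."""

    @abstractmethod
    def dial(self, destination: str) -> str:
        """Start an outbound call; each network type implements this differently."""

    def play(self, prompt: str) -> str:
        # Media operations look the same regardless of the network underneath.
        return f"playing {prompt}"


class TdmChannel(CallChannel):
    def __init__(self, span: int, timeslot: int) -> None:
        self.span = span
        self.timeslot = timeslot

    def dial(self, destination: str) -> str:
        return f"TDM span {self.span}, timeslot {self.timeslot}: dialing {destination}"


class IpChannel(CallChannel):
    def __init__(self, local_address: str) -> None:
        self.local_address = local_address

    def dial(self, destination: str) -> str:
        return f"IP {self.local_address}: sending call setup to {destination}"


def greet_caller(channel: CallChannel, number: str) -> list[str]:
    """Application logic written once against the API; runs on either network."""
    return [channel.dial(number), channel.play("welcome")]


if __name__ == "__main__":
    for ch in (TdmChannel(span=1, timeslot=5), IpChannel("192.0.2.10")):
        print(greet_caller(ch, "5550100"))
```

The catch, as noted above, is that the interface itself stayed private, so the application remained tied to one vendor's product family.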

In my next post, I’ll talk about how media gateways helped to kick off a new voice development model which accelerated the separation between software and hardware for voice and the new application category which became Unified Communications.

If you participated in the evolution described here, please feel free to weigh in with your comments.  If you’d like to explore strategies on how to evolve your solutions, you can reach me on LinkedIn.

Security – A Teachable Moment?

The recent headlines about the NSA capturing data related to phone calls bring up a familiar topic: security. I’ve been managing a session border controller product for the past year, and I’ve often been asked if the product supports security. This can be a frustrating question for a product manager, since security is a blanket term that can cover so many areas, and this kind of naive question means that the discussion needs to start at a pretty basic level. However, the question can be turned around. One logical response is to ask what kind of security the person wants to know about. An even better response is to get back to basics and ask what they (typically a customer) are trying to protect. In other words, what are the threats?

In the world of international telecom standards, the definition of security starts with the analysis of threats. The National Institute of Standards and Technology (NIST) wrote a fine paper on security for Voice over IP networks. The authors analyzed potential threats to such networks and then proposed solutions. This is preferable to the common approach of prescribing a security solution before understanding what its goals are.

Returning to the topic of the NSA, the President responded to critics by saying that the NSA was not recording phone calls, as if that were the only issue in play. But if we look at this from a threats perspective and you are an individual subscriber of phone services, you might want assurances from the service provider that it protects the privacy of both the content of your communications and the records of who you are talking to. We’ve all seen television shows where the police get a warrant to dump the cell phone records of a potential suspect and, just by analyzing the call patterns, are able to figure out who the suspect was calling, when and for how long. Analyzing this kind of information is often called “traffic analysis,” and it can be very revealing. If your company is discussing a merger deal with another company, access to these kinds of phone records might reveal the potential merger participants in advance of any public announcement. So is there an incentive for businesses and individuals to protect against people who want to do traffic analysis on their voice (or other) communications? You bet.
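To see how revealing call metadata can be without any recorded content, here is a minimal Python sketch of traffic analysis over call detail records. The record fields and sample data are hypothetical; the function simply tallies whom one subscriber calls, how often, and for how long.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One hypothetical call detail record: metadata only, no audio."""
    caller: str
    callee: str
    duration_s: int


def contact_profile(records: list[CallRecord], subscriber: str) -> list[tuple[str, int, int]]:
    """Rank the parties a subscriber calls by call count, then by talk time.

    Returns (callee, call_count, total_seconds) tuples, busiest first.
    """
    counts: dict[str, int] = defaultdict(int)
    seconds: dict[str, int] = defaultdict(int)
    for r in records:
        if r.caller == subscriber:
            counts[r.callee] += 1
            seconds[r.callee] += r.duration_s
    return sorted(((callee, counts[callee], seconds[callee]) for callee in counts),
                  key=lambda t: (t[1], t[2]), reverse=True)


if __name__ == "__main__":
    cdrs = [
        CallRecord("cfo", "bank", 120),
        CallRecord("cfo", "rival-ceo", 30),
        CallRecord("cfo", "bank", 600),
        CallRecord("bank", "cfo", 60),
        CallRecord("cfo", "home", 45),
        CallRecord("cfo", "rival-ceo", 15),
    ]
    for row in contact_profile(cdrs, "cfo"):
        print(row)
```

Even this crude tally ranks a subscriber's most important contacts; adding timestamps would also show when those conversations happen.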

I’ve been hearing the argument that if people participate on Facebook and Twitter, their public activities are an open book for anybody with Internet access. Sure, that’s true to an extent, though there are battles going on between Facebook and its members about where the privacy lines get drawn. However, I think most phone subscribers, be they individuals or businesses, expect that their private communications will remain so.

On the technical side, this story boils down to a question of where to draw the lines between security and privacy. If this story and the resulting publicity cause individuals and businesses to consider what information they’d like to remain private, which data is considered “fair use” by the government, and under what guidelines, then maybe we can have a useful public debate on these matters and not “leave it to the experts.”