Conformance levels in Audio over IP networking


Why should you conform?

Different pieces of technical equipment have always needed to work together. As technologies have evolved, that requirement has called for both more complexity and greater harmonisation. In audio this was relatively straightforward when everything was analogue: the major challenge was finding the right connector or, if necessary, adaptor.

Fast forward to today, where Audio over IP (AoIP) technologies are becoming the norm. These systems, which include Livewire, Dante and RAVENNA, use the same, or similar, underlying network technology and standards but were initially incompatible with each other. This did not impede the adoption of AoIP but the fundamental incompatibility confined the respective formats to working in self-contained environments.

There can be advantages to this, with each proprietary system providing scope for adjusting operational values to meet the needs of different situations. Nevertheless, users and standards organisations realised that a single, unifying solution could serve different types and sizes of installation, as well as driving adoption.

The result was the AES67 interoperability standard (introduced in 2013), which has accelerated the continuing growth in AoIP usage by bringing greater interconnectivity and enabling the different, incompatible AoIP formats to work together. AES67 also forms the basis of SMPTE ST 2110-30, the audio transport component of the standard for sending digital media over IP networks.

The flexibility and choice that a fully interconnected and interoperable system brings is a major advantage when designing a system. It does, however, create a new problem: the larger a standard grows, the more complex it becomes for manufacturers to implement. This is where the conformance requirements within AES67 and ST 2110-30 come into play.

Conforming to the standards

Rather than enforcing complex conditions on all compatible systems, AES67 provides a base level of interoperability. Although this is a minimum, it in fact maximises compatibility between equipment by guaranteeing a useful, basic level of interoperability. This does not, however, prevent manufacturers from going further and supporting more 'edge case' values if they can.

When transmitting digital audio over an IP network, there are several factors to consider. Key factors include both how the audio is broken up into packets (number of channels and the length of audio transmitted) and the format of the audio (sampling rate and bit depth). There is scope within the AES67 and ST 2110-30 standards for equipment to accommodate more than one value for these parameters, maximising support for different installation requirements.

Packet time is the real-time duration of the media data contained in a media packet. It details how much audio is captured before being packetized and transmitted. The shorter the packet time, the sooner the audio is transmitted and hence the end-to-end latency is lower. However, a shorter packet time increases the load on the network as more packets are required, so a balance needs to be reached. It should be made clear that despite its influence on latency, packet time is not the same as latency.
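The balance between packet time and network load can be put into numbers. The following is a minimal sketch rather than anything taken from the standards: it assumes uncompressed linear PCM, and the 40 bytes of per-packet header (RTP, UDP and IPv4, with Ethernet framing ignored) and the function name packet_stats are illustrative assumptions.

```python
from fractions import Fraction

# Assumed header sizes: RTP (12) + UDP (8) + IPv4 (20), no extensions or options.
HEADER_BYTES = 12 + 8 + 20


def packet_stats(packet_time_us, sample_rate_hz=48_000, channels=2, bit_depth=24):
    """Return (samples per channel per packet, packets per second, payload bytes)."""
    samples = Fraction(sample_rate_hz * packet_time_us, 1_000_000)
    if samples.denominator != 1:
        raise ValueError("packet time does not hold a whole number of samples")
    packets_per_second = Fraction(1_000_000, packet_time_us)
    payload_bytes = int(samples) * channels * bit_depth // 8
    return int(samples), packets_per_second, payload_bytes


for packet_time_us in (1000, 125):
    samples, pps, payload = packet_stats(packet_time_us)
    header_share = HEADER_BYTES / (HEADER_BYTES + payload)
    print(f"{packet_time_us} us: {samples} samples/channel, {float(pps):.0f} packets/s, "
          f"{payload} payload bytes, {header_share:.0%} of each packet is header")
```

For a stereo, 24-bit, 48 kHz stream, a 1 ms packet carries 48 samples per channel in 288 bytes of payload at 1,000 packets per second, while a 125 μs packet carries 6 samples per channel in 36 bytes at 8,000 packets per second. Under the header sizes assumed here, the share of each packet taken up by headers rises from roughly an eighth to more than half.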

The channel count defines the number of mono audio channels in a stream. Depending on how the audio channels from specific pieces of equipment relate to each other, it can be useful for mono audio channels to be grouped together into a stream. It is more efficient and operationally simpler for a receiving device to manage a single stream containing all the required channels. However, it would be inefficient to send a single, large stream containing all audio channels to all devices if they only required a small subset of the channels within the stream. Building in flexibility over the number of channels in a stream allows for efficient system design.
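A rough illustration of this trade-off, as a sketch only: the receiver below is hypothetical and needs just one stereo pair from a 64-channel source. The helper names are made up for the example, channels are numbered from zero, and only the audio payload is counted (packet headers are ignored).

```python
def payload_bits_per_second(channels, sample_rate_hz=48_000, bit_depth=24):
    """Audio payload rate only; packet headers are ignored in this sketch."""
    return channels * sample_rate_hz * bit_depth


def streams_for(wanted_channels, channels_per_stream):
    """Indices of the streams that carry at least one wanted channel (0-based)."""
    return sorted({ch // channels_per_stream for ch in wanted_channels})


wanted = [0, 1]  # hypothetical receiver that only needs one stereo pair

for channels_per_stream in (64, 8, 2):
    subscribed = streams_for(wanted, channels_per_stream)
    received = len(subscribed) * channels_per_stream
    mbps = payload_bits_per_second(received) / 1_000_000
    print(f"{channels_per_stream:>2} ch/stream: {len(subscribed)} stream(s), "
          f"{received} channels received, {mbps:.3f} Mbit/s of audio payload")
```

The other side of the trade-off shows up for a receiver that needs everything: streams_for(range(64), 8) returns eight stream indices, so that device has eight subscriptions to manage instead of one.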

The sampling rate and bit depth also need to be defined as they would be for any digital audio system, although there will be less variation than for the packet time and channel count. In broadcast audio this is typically 48 kHz, 24-bit.

Levels of conformance

Standards are carefully worded to define a required base level of functionality, with clear descriptions of additional recommended and optional functionality. A key difference between AES67 and ST 2110-30 lies in how each presents the recommended values for the four parameters described above (packet time, channel count, sampling rate and bit depth), which together ensure interoperable and efficient networking.

AES67 provides a base level of compliance, which ST 2110-30 builds on. The mandatory configuration an AES67-compliant device must support covers a stream channel count of one to eight channels, a packet time of 1 ms, a sampling rate of 48 kHz and either 16 or 24-bit audio. These could be described as 'middle of the road' values, which meet the requirements for a wide range of use cases.
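The mandatory values can be written down as a simple check. This is a sketch based only on the figures quoted in this article, not on the text of the standard itself; StreamConfig and within_aes67_baseline are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    channels: int
    packet_time_us: int
    sample_rate_hz: int
    bit_depth: int


def within_aes67_baseline(cfg):
    """True if the stream uses only the mandatory values quoted in the article."""
    return (
        1 <= cfg.channels <= 8
        and cfg.packet_time_us == 1000
        and cfg.sample_rate_hz == 48_000
        and cfg.bit_depth in (16, 24)
    )


print(within_aes67_baseline(StreamConfig(2, 1000, 48_000, 24)))   # inside the baseline
print(within_aes67_baseline(StreamConfig(16, 1000, 48_000, 24)))  # too many channels
```

A stream that fails this check is not necessarily invalid; it simply relies on values that a receiver is not obliged to support.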


ST 2110-30 has six levels of conformance for payload and packet time. Only Level A is mandatory; it specifies 48 kHz, 16 or 24-bit streams for one to eight audio channels delivered with a 1 ms packet time. Level B supports the same audio format and channel count but with a shorter packet time of 125 μs. Level C adds support for a maximum of 64 audio channels per stream with the shorter 125 μs packet time. The remaining levels, AX, BX and CX, provide support for 96 kHz sampling rates with the respective audio channel counts halved (4, 4 and 32).
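The 48 kHz levels can be expressed as a small lookup table. The sketch below uses only the figures quoted above for Levels A, B and C and leaves out the 96 kHz levels. It also assumes that a Level B or Level C receiver still accepts Level A streams, since Level A is mandatory. The names LEVELS_48K and levels_accepting are illustrative.

```python
# Maximum channels per stream, keyed by packet time in microseconds (48 kHz only).
# Assumption: Levels B and C include everything Level A requires.
LEVELS_48K = {
    "A": {1000: 8},
    "B": {1000: 8, 125: 8},
    "C": {1000: 8, 125: 64},
}


def levels_accepting(channels, packet_time_us, sample_rate_hz=48_000, bit_depth=24):
    """List the levels (A, B, C) whose receivers would accept the given stream."""
    if sample_rate_hz != 48_000 or bit_depth not in (16, 24):
        return []  # 96 kHz levels (AX, BX, CX) are outside this sketch
    return [
        level
        for level, limits in LEVELS_48K.items()
        if 1 <= channels <= limits.get(packet_time_us, 0)
    ]


print(levels_accepting(8, 1000))   # ['A', 'B', 'C']
print(levels_accepting(8, 125))    # ['B', 'C']
print(levels_accepting(64, 125))   # ['C']
```

Read this way, the level a system designer needs is set by the most demanding stream that any one receiver has to accept.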

ST 2110-30 Level A matches the mandatory conformance requirements of AES67. This means an AES67 device will typically also be ST 2110-30 compliant, although this is not guaranteed. The shorter packet times and higher channel counts supported by Levels B and C suit environments where end-to-end latency is critical or where larger groups of audio channels need to be transmitted. With up to 64 channels per stream, Level C can act as a direct replacement for 64-channel MADI or 16-channel SDI connections. It may still be beneficial to break up a transmission that was previously distributed as a high channel count, point-to-point connection into smaller streams, especially if some of the receiving equipment does not need access to all the channels.

Many broadcast audio devices are capable of supporting ST 2110-30 Levels A, B and C, which gives system designers a great degree of flexibility in creating audio networks for IP-based broadcast installations. This includes ST 2110-30 compatible Dante and RAVENNA devices, along with native ST 2110-30 equipment from a range of manufacturers. The additional levels defined in ST 2110-30 have been to everyone's benefit, both tightening the focus for manufacturers and broadening the scope for those using the equipment. AES67 achieved the aim of providing a level of unification between previously disparate Audio over IP solutions. ST 2110-30 has built on this to ensure suitability for a wider range of applications.

What about resilience?

When it comes to live production, resilience has always been a key consideration and this remains the case for IP-based installations. In SDI installations, redundant paths relied on a frame sync as part of the switch function. In the IP domain, seamless switching is based on the data packets. The requirements for creating a single, reconstructed output stream with seamless protection using multiple redundant streams of Real-time Transport Protocol (RTP) packets are specified in the ST 2022-7 standard. This describes an approach in which two streams carrying the same data are typically sent along different routes to the destination. If packets are lost along either path, the original stream is reconstructed from whichever copies arrive, with switching between the two taking place without affecting the output.
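The principle of packet-based reconstruction can be shown with a deliberately simplified sketch. It is not the ST 2022-7 algorithm: it works offline on complete lists rather than on a bounded real-time buffer, it ignores RTP sequence number wrap-around, and the packets are hypothetical (sequence number, payload) pairs.

```python
def reconstruct(path_a, path_b):
    """Merge packets from two redundant paths, keeping one copy per sequence number."""
    merged = {}
    for seq, payload in list(path_a) + list(path_b):
        merged.setdefault(seq, payload)  # first copy seen wins; duplicates are dropped
    return [(seq, merged[seq]) for seq in sorted(merged)]


# Hypothetical example: packet 2 is lost on path A, packet 4 is lost on path B.
path_a = [(1, "p1"), (3, "p3"), (4, "p4")]
path_b = [(1, "p1"), (2, "p2"), (3, "p3")]

print(reconstruct(path_a, path_b))
```

As long as every packet survives on at least one path, the output is complete and in order. A packet lost on both paths would still leave a gap.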

In a similar manner to ST 2110-30, ST 2022-7 includes provision for different levels of compliance, termed Class A, B, C and D. These are specific to the receiving equipment and detail the maximum time differential between receipt of the two streams that the receiver must tolerate in order to guarantee seamless switching. Class D requires support for a maximum of 150 μs, suitable for live production and environments with minimal latency. Classes A, B and C cover a range of larger values (10, 50 and 450 ms respectively), which enable a wider array of applications including longer-haul networks.
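Matching a receiver class to a network then becomes a comparison against the measured difference in arrival time between the two paths. The sketch below uses the limits quoted above; the function name and the skew values in the example are assumptions for illustration.

```python
# Maximum differential between the two streams, in microseconds, per receiver class.
CLASS_MAX_SKEW_US = {
    "D": 150,
    "A": 10_000,
    "B": 50_000,
    "C": 450_000,
}


def classes_tolerating(skew_us):
    """Receiver classes whose limit covers the given path differential."""
    return [cls for cls, limit in CLASS_MAX_SKEW_US.items() if skew_us <= limit]


print(classes_tolerating(100))      # ['D', 'A', 'B', 'C']
print(classes_tolerating(20_000))   # ['B', 'C']
print(classes_tolerating(500_000))  # []
```

An empty result means that none of the four classes guarantees seamless switching for that path differential, so the routes themselves would need to be reconsidered.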


As with ST 2110-30, when designing a resilient IP-based broadcast network it is a case of matching the resilience requirements of the production to the equipment available.

Conclusion

The various levels of conformance are there to meet different technical requirements. The parameters that make up the conformance levels should be taken into consideration when designing a facility or installation. The simplification of these parameters into a clear set of defined levels within both the ST 2110-30 and ST 2022-7 standards helps manufacturers, system integrators and broadcasters alike to deliver optimal products and solutions. As with any large-scale broadcast installation, a level of planning is required to ensure that system designs are compatible, but the groundwork provided by the technical standards makes this achievable.
