Broadcast News


What Is The Future For Immersive Audio?

Peter Poers, Managing Director at Jünger Audio, looks at production efforts versus consumer experience.

Along with the evolution of higher resolution in video images, a new way of creating and delivering audio content will be required and is already on the way. All the changes for future audio systems are covered by the general title "Next Generation Audio" (NGA). In addition to the very common existing channel-based or fixed-mix audio formats, more audio channels will need to be added to make a difference.

Obviously, the creation and delivery of single-layer surround sound (horizontal, with the listener surrounded by audio elements) will no longer meet future expectations. Following the successful introduction of 3D audio in cinemas by Dolby® Atmos and in VR applications, we can expect that immersive audio programs will become part of the delivery for future TV formats. Of course, the client at home can't expect to have a listening experience similar to that of a large-scale multi-speaker cinema theatre. However, spatial audio effects can be delivered by using additional height channels reproduced by separate speakers or special 3D sound bars (better called sound projectors, as opposed to ordinary external amplifier/loudspeaker combinations), or by using headphones driven by 3D virtualization software. That will give immersive audio a realistic chance to become a standard feature of home entertainment systems in the near future.

The future NGA-based surround sound formats adopted by TV broadcast and OTT will typically be a maximum of 7.1 + 4H channels: in total, up to 11.1 speakers (as referenced in the DVB NGA survey of May 2015), arranged as a mid-layer surround array of up to seven speaker positions and up to four height speakers on a top layer, plus the sub-woofer for low frequency effects.
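The channel arithmetic behind labels such as 7.1 + 4H can be made explicit with a small sketch. The function name `parse_layout` and the returned field names are hypothetical, chosen only for this illustration, and the parser assumes labels written exactly in the style used in this article.

```python
import re

def parse_layout(label):
    """Split a layout label such as '7.1 + 4H' into its speaker counts.

    Hypothetical helper for illustration; it only understands the
    'mid.lfe' and 'mid.lfe + heightH' notations used in this article.
    """
    match = re.fullmatch(r"\s*(\d+)\.(\d+)(?:\s*\+\s*(\d+)H)?\s*", label)
    if match is None:
        raise ValueError(f"unrecognised layout label: {label!r}")
    # The height group is None when the label has no '+ nH' part.
    mid, lfe, height = (int(g) if g else 0 for g in match.groups())
    return {
        "mid": mid,                    # speakers in the mid layer
        "lfe": lfe,                    # sub-woofer (low frequency effects)
        "height": height,              # speakers in the top layer
        "full_range": mid + height,    # the "11" in 11.1
        "total": mid + lfe + height,   # every loudspeaker, sub-woofer included
    }
```

For the maximum layout quoted above, `parse_layout("7.1 + 4H")` gives seven mid-layer and four height speakers, which is 11 full-range speakers plus one sub-woofer, matching the 11.1 figure.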

A higher channel count is not the only element that will define the next generation of audio format technology, though: there is also the presence of audio objects. At the moment, audio programs are typically produced and mixed in their final reproduction surround sound audio format. That can be, for example, 5.1, 5.1 + 2H or even 7.1 + 4H. The mix is created and finalised and is then ready for delivery. We can call this type of program mix a channel-based immersive audio format. For the NGA formats, there will be the additional introduction of audio objects. Audio objects are typically discrete mono or stereo audio channels that are rendered into the reproduction mix by the audio decoder in the final receiver. With this method, these audio elements remain separate objects, and individual changes can be applied right at the end of the audio delivery path.
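The idea of rendering an object into the mix at the receiver can be sketched in a few lines. This is a minimal illustration only: the function names, the stereo-only bed and the constant-power pan law are assumptions made for the example, and none of them are taken from MPEG-H, Dolby AC-4 or DTS:X, whose renderers are far more elaborate.

```python
import math

def pan_gains(position):
    """Constant-power gains for a position from -1.0 (left) to +1.0 (right)."""
    position = max(-1.0, min(1.0, position))
    angle = (position + 1.0) * math.pi / 4.0  # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)

def render_object(bed_left, bed_right, obj, position, gain_db=0.0):
    """Mix a mono object into a stereo bed and return new (left, right) lists.

    The bed is left untouched; position and gain_db stand in for the kind
    of per-object values a receiver could apply late in the chain.
    """
    gain = 10.0 ** (gain_db / 20.0)  # decibels to linear amplitude
    g_left, g_right = pan_gains(position)
    left = [b + gain * g_left * s for b, s in zip(bed_left, obj)]
    right = [b + gain * g_right * s for b, s in zip(bed_right, obj)]
    return left, right
```

Because the object stays separate until `render_object` is called, its position or level can still be changed at the last step without touching the bed, which is the property the paragraph above describes.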

Another element defining the NGA formats is the use of metadata. All of the new audio codec systems (e.g. MPEG-H, Dolby AC-4, DTS:X) use an extensive set of metadata to describe audio program details, to optimize production workflow, to control audio encoding and to allow optimum audio performance in the decoder of the final receiver device. Besides controlling and monitoring audio content in the process of program production, the generation of metadata is a most important step for introducing and launching next generation audio formats. Working with metadata will be essential to "authoring" audio programs in new formats.

Workflow considerations – introduction of a "side car" device
The next generation of immersive and personalized audio formats will require changes in the audio production workflow. New procedures need to be defined for managing object-based encoded content and for personalizing services through the selection of alternative audio objects (such as commentator languages). Of course, loudness control during production and the loudness definition for the final output formats are other aspects to consider. The NGA formats will offer a new surround sound experience, and the use of upmix, format rendering and downmix algorithms will be essential for creating and monitoring the audio programs.

Some additional tools and changes to existing production environments will be required to create audio content for these new audio formats. One important way of giving the new formats a good chance to succeed will be to minimize the cost of transition. Production costs on the professional side cannot be raised significantly without running the risk of the industry rejecting the new formats. The use of existing digital production infrastructures will be essential to begin content creation for new formats in the near future.

One particular new supporting tool will become most important for different workflow areas: the Multichannel Monitoring & Authoring Unit (MMA). This tool must combine audio interfacing, audio computing and metadata authoring in a unique way and will be the key to starting production for immersive audio encoding systems or technologies. It will host intellectual property elements from the codec vendor of choice to perform codec-specific features and processing. In addition, depending on the workflow situation, sophisticated audio processing features such as surround upmix and loudness control may be options that could be integrated.

Monitoring the immersive audio content will require rendering and downmix, especially if the local speaker setup is not capable of reproducing the higher order audio formats. It is strongly recommended also to monitor (or emulate) lower order speaker setups to verify the result of rendering and downmix for home reproduction in environments with different speaker installations. The metadata controlling these processes must also be verified so that the settings are correct for optimum performance.
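To make the downmix step concrete, the following sketch folds a single 5.1 sample frame to stereo. It assumes the widely used convention of attenuating the center and surround channels by about 3 dB and discarding the LFE channel; the function name and the dictionary keys are hypothetical, and in a real NGA chain the coefficients would presumably be taken from the program's metadata rather than hard-coded.

```python
import math

# Assumed coefficients: center and surrounds at about -3 dB (1 / sqrt(2)).
CENTER_GAIN = 1.0 / math.sqrt(2.0)
SURROUND_GAIN = 1.0 / math.sqrt(2.0)

def downmix_51_to_stereo(frame):
    """Fold one 5.1 sample frame to a (left, right) pair.

    frame is a dict with the keys 'L', 'R', 'C', 'LFE', 'Ls' and 'Rs'.
    The LFE channel is discarded in this sketch.
    """
    left = frame["L"] + CENTER_GAIN * frame["C"] + SURROUND_GAIN * frame["Ls"]
    right = frame["R"] + CENTER_GAIN * frame["C"] + SURROUND_GAIN * frame["Rs"]
    return left, right
```

Emulating a lower order setup on a monitoring desk amounts to running such a fold-down on the full mix and listening to the result, so that level build-up or masked dialogue can be caught before delivery.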

Immersive – by the introduction of audio objects
The addition of audio objects is the key to delivering a personalized audio experience. In the case of personalized audio, certain separated audio tracks will be mixed to the final receiver audio format based on decisions made by the end user. The user might select certain objects to use and might also define the mixing ratio between the audio bed and the objects. One example of this application will be dialogue enhancement. Object-based technology will also bring advantages for multi-language programs. Several commentary tracks, not just different languages but also different presenters and perspectives, can be delivered within the same audio mix. Additional descriptive audio tracks can be mixed to any possible output format. Of course, only one mix can be monitored at any one time. The limits for possible gain changes available to the viewer must be set and will be part of the metadata structure. The final audio format will be determined by the channel count of the audio bed. Depending on the channel order of the audio bed, a rendering procedure and downmix will be required for lower order audio formats. If the final format isn't 3D immersive, the personalized objects will typically be mixed to the center channel and/or to the front stereo side channels.
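Dialogue enhancement within authored limits can be sketched as follows. The class `ObjectGainLimits`, its field names and both functions are hypothetical and do not reflect the metadata syntax of any particular codec; the sketch only assumes that the limits arrive as a minimum and maximum gain in decibels and that the dialogue object is mixed to the center channel.

```python
from dataclasses import dataclass

@dataclass
class ObjectGainLimits:
    """Hypothetical metadata: the gain range the viewer may choose from."""
    min_db: float
    max_db: float

def clamp_user_gain(requested_db, limits):
    """Restrict the viewer's requested gain to the authored range."""
    return max(limits.min_db, min(limits.max_db, requested_db))

def mix_dialogue_to_center(center_bed, dialogue, requested_db, limits):
    """Add a dialogue object to the center channel at the permitted gain."""
    gain_db = clamp_user_gain(requested_db, limits)
    gain = 10.0 ** (gain_db / 20.0)  # decibels to linear amplitude
    return [c + gain * d for c, d in zip(center_bed, dialogue)]
```

With limits of -6 dB to +6 dB, a viewer asking for +12 dB would receive +6 dB, so the mix never leaves the range that was checked during authoring.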

Conclusion 1
It will take more time for the market for NGA formats at home to become established. The time frame will be set by codec releases from known vendors, by the technical preparation of professional production and distribution networks (content creation and delivery), and finally by support from the consumer industry in implementing codecs and offering sophisticated reproduction systems (home theater systems, 3D sound projectors, 3D binaural headphone virtualization).

But never forget: many people are still quite happy with the "easy listening" experience, with no interaction needed on their part to select from a list of available audio tracks. Another limiting factor is the practicality of receiving immersive audio for viewers globally! Just a fraction of consumers will have the chance to use the higher order audio formats in their home or when out and about using mobile devices. For the majority of countries, we should expect that just 5 to 10% of households will be prepared and capable of using real 5.1 or higher order audio formats (17% of German households had 5.1 AV home systems in 2014, according to Verband Deutscher Musikindustrie). All the others will get immersive and surround sound content only as a (rendered) stereo downmix.

Conclusion 2
One question remains. What is the real definition of immersive with the new audio formats? And who will get the most out of it? Immersive can mean very different things to different people and is not necessarily just a case of hearing sounds from above! Simple, well done audio recordings, in a simple format that delivers a meaningful audio experience, can be really immersive! In recent years, the quality of audio productions has not improved in terms of natural and good sounding audio content. We are living in a world where many audio programs no longer represent the dynamic range and the structure that such content should typically offer. Whilst in previous decades audio professionals did their best to overcome the technical limitations, now that we have all-digital technologies, we can no longer maintain the audio quality of the content! Loudness is largely solved but, as we see in many cases now, speech intelligibility is often worse than ever.

Yes, there is some audio from above and it is surrounding us, but by nature we do not focus on listening from above. So I guess the third dimension in audio cannot, on its own, be the motivation to move to new codec systems. Many common codecs in use today are from an old generation. New codecs can bring technical improvements and a higher level of audio quality at lower bit rates. Object-based audio (OBA) and the option to personalize delivered audio content may be more attractive for many consumers, even if it does not really improve the delivery and performance of audio programs. Both three-dimensional audio and object-based audio will require changes to production and delivery. Now is the time to discuss and explore how to move forward in the direction of creating a new audio experience.

This article is also available to read at BFV online as part of this issue's Audio feature here, page 33.

