IoSR Blog : 01 June 2018

Modifying envelopment in object-based audio

Towards the beginning of the S3A project, we conducted some research to determine the perceptual features ("attributes") that listeners use to differentiate spatial audio reproduction systems [1], and to find out which of those attributes were particularly important for determining listener preference [2]. We found that "envelopment" was the most important attribute.

The definition of envelopment has been widely discussed in the literature [3], but the simple definition produced by participants in our experiment was: "how immersed/enveloped you feel in the sound field (from not at all enveloping to fully enveloping)". In reproduced audio, this sensation can be produced by distinct sound images coming from many directions, or by being surrounded by diffuse or reverberant sound.

Because envelopment is such an important aspect of the listening experience, it is desirable to be able to control it in audio reproduction. For example, a highly enveloping mix created for a five-channel surround sound system might be much less successful when replayed on a two-channel stereo system in a living room. Sound that comes from behind the listener is important for creating envelopment, but if we know that a given loudspeaker layout cannot produce it, then we might compensate by making other changes; for example, increasing the level of reverberant sound. Another possibility is to allow listener personalisation, letting listeners set the envelopment as low or high as they wish.

To find out which parameters we’d need to modify, we performed an experiment in which we asked mix engineers to make mixes at different levels of envelopment, and we recorded where they set the controls [4]. By looking at the average control settings for low, medium, and high levels of envelopment, we were able to map mix parameters to envelopment levels and create a system for automatically controlling envelopment. You can find out more about the experiment and hear a demonstration of the system in action in the video below.
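
As a rough illustration of how such a mapping could drive automatic control, the sketch below interpolates between averaged "low", "medium", and "high" mixes to derive settings for any target envelopment level. The parameter names and values here are invented for illustration only; the parameters and values that were actually measured are reported in [4].

    # Minimal sketch: map a target envelopment level in [0, 1] to mix
    # parameters by interpolating between averaged mixes. All names and
    # numbers below are hypothetical, not the values reported in [4].
    import numpy as np

    # Averaged control settings from (hypothetical) low/medium/high mixes.
    ENVELOPMENT_PRESETS = {
        0.0: {"reverb_gain_db": -12.0, "rear_object_spread_deg": 30.0},
        0.5: {"reverb_gain_db": -6.0, "rear_object_spread_deg": 90.0},
        1.0: {"reverb_gain_db": 0.0, "rear_object_spread_deg": 180.0},
    }

    def mix_parameters(envelopment: float) -> dict:
        """Linearly interpolate mix parameters for a target envelopment."""
        levels = sorted(ENVELOPMENT_PRESETS)
        params = {}
        for name in ENVELOPMENT_PRESETS[levels[0]]:
            values = [ENVELOPMENT_PRESETS[level][name] for level in levels]
            params[name] = float(np.interp(envelopment, levels, values))
        return params

    print(mix_parameters(0.75))  # halfway between the medium and high mixes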

The technology that underpins this process is object-based audio. In object-based audio, the elements of an audio scene are kept separate and transmitted alongside metadata that defines how they should be mixed together (e.g. describing their levels and positions). That metadata can be altered to change how the scene is reproduced, enabling the optimisation and personalisation procedures discussed above. The S3A project system for intelligent object-based audio rendering through metadata adaptation will be presented in a paper at the upcoming Audio Engineering Society Conference on Spatial Reproduction [5].
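
To make the idea concrete, here is a toy sketch of metadata adaptation in Python. It is not the S3A renderer or any standard metadata format (such as the Audio Definition Model); the object structure, field names, and the 3 dB boost are all hypothetical. It adapts a scene for a two-channel layout by clamping rear object positions into the frontal arc and raising the level of diffuse objects, echoing the reverberant-level compensation suggested above.

    # Toy illustration of object-based metadata adaptation. Each object
    # carries metadata that a renderer would interpret; adapting the
    # metadata changes the reproduced scene without touching the audio.
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class AudioObject:
        name: str
        azimuth_deg: float   # 0 = in front of the listener, +/-180 = behind
        gain_db: float
        diffuse: bool        # True for reverberant/ambience objects

    scene = [
        AudioObject("dialogue", azimuth_deg=0.0, gain_db=0.0, diffuse=False),
        AudioObject("ambience", azimuth_deg=135.0, gain_db=-9.0, diffuse=True),
    ]

    def adapt_for_stereo(objects, reverb_boost_db=3.0):
        """Clamp rear positions into the frontal +/-90 degree arc and boost
        diffuse objects to retain some envelopment (illustrative values)."""
        adapted = []
        for obj in objects:
            azimuth = max(-90.0, min(90.0, obj.azimuth_deg))
            gain = obj.gain_db + (reverb_boost_db if obj.diffuse else 0.0)
            adapted.append(replace(obj, azimuth_deg=azimuth, gain_db=gain))
        return adapted

    for obj in adapt_for_stereo(scene):
        print(obj)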

In the future, we plan to continue this research. One avenue to explore is developing tools that allow producers to meter the level of envelopment in their mix, and to see how this would be affected by reproducing the mix on different systems. To do this, envelopment models (such as the model produced by George et al. [6]) need to be tested and developed.

References

[1] J. Francombe, T. Brookes, R. Mason, “Evaluation of Spatial Audio Reproduction Methods (Part 1): Elicitation of Perceptual Differences,” Journal of the Audio Engineering Society, vol. 65, pp. 198–211 (2017), http://dx.doi.org/10.17743/jaes.2016.0070.

[2] J. Francombe, T. Brookes, R. Mason, J. Woodcock, “Evaluation of Spatial Audio Reproduction Methods (Part 2): Analysis of Listener Preference,” Journal of the Audio Engineering Society, vol. 65, pp. 212–225 (2017), http://dx.doi.org/10.17743/jaes.2016.0071.

[3] J. Berg, “The Contrasting and Conflicting Definitions of Envelopment,” in Audio Engineering Society 126th Convention (Munich, Germany, 2009), Paper No. 7808.

[4] J. Francombe, T. Brookes, R. Mason, “Determination and Validation of Mix Parameters for Modifying Envelopment in Object-Based Audio,” Journal of the Audio Engineering Society, vol. 66, pp. 127–145 (2018), https://doi.org/10.17743/jaes.2018.0011.

[5] J. Woodcock, J. Francombe, A. Franck, R. Hughes, Y. Tang, H. Kim, D. Menzies, Q. Liu, M. Simon Galvez, P. Coleman, W. J. Davies, T. Brookes, R. Mason, B. M. Fazenda, T. J. Cox, P. J. B. Jackson, C. Pike, F. M. Fazi, A. Hilton, “A Framework for Intelligent Metadata Adaptation in Object-Based Audio,” in Audio Engineering Society Conference on Spatial Reproduction (Tokyo, Japan, 7–9 Aug 2018).

[6] S. George, S. Zielinski, F. Rumsey, P. J. B. Jackson, R. Conetta, M. Dewhirst, D. Meares, S. Bech, “Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings,” Journal of the Audio Engineering Society, vol. 58, pp. 1013–1031 (2010).

by Jon Francombe (now at BBC R&D, jon.francombe@bbc.co.uk)