Adaptive KLT

The KLT is signal-dependent: its transform matrix is derived from the inter-channel statistics of a multichannel programme. Since these statistics vary significantly between programmes, this signal dependence allows the KLT to achieve better audio quality than a fixed-matrix transform in the context of HBL [P3]. The inter-channel statistics also vary over time. In this work, an audio library consisting of 140 multichannel audio excerpts was used. The excerpts were collected from more than fifty tracks, and their durations ranged from 3 s to 41 s. The short-term KLT matrices of these excerpts were analysed using a novel visualisation technique. In many excerpts, even very short ones, the KLT matrix was found to vary considerably over time; see the examples in Figure 1.

Figure 1: Adaptive-KLT matrix coefficients analysis
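
As an illustration of how a KLT matrix can be derived from inter-channel statistics, the following is a minimal sketch using the textbook construction: eigendecomposition of the inter-channel covariance matrix of one block of audio. The function name klt_matrix, the use of NumPy, and the synthetic three-channel test signal are assumptions made for illustration; they are not taken from [P3] or [P4], and the processing used in this work may differ.

```python
import numpy as np

def klt_matrix(block):
    """Return (K, eigvals) for a block shaped (n_channels, n_samples).

    The rows of K are the eigenvectors of the inter-channel covariance
    matrix, sorted by decreasing eigenvalue (textbook KLT, assumed here).
    """
    block = np.asarray(block, dtype=float)
    # Inter-channel covariance: each row of `block` is one channel.
    cov = np.cov(block)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T, eigvals[order]

# Hypothetical test signal: three channels mixed from two sources.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 4800))
mix = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
x = mix @ sources

K, eigvals = klt_matrix(x)
y = K @ x  # transformed (decorrelated) channels
```

Because K is derived from the covariance of x itself, the channels of y are mutually uncorrelated over this block, and a different programme (a different x) yields a different K.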

This suggested that, in KLT-based HBL, the KLT matrix should be updated adaptively over time. However, the following questions needed to be answered:

1. How frequently should the KLT matrix be updated?

2. What kinds of perceptual effects would adaptive KLT cause?

To answer these questions, a series of listening tests was carried out. The first listening test compared the global audio quality degradation, in terms of Basic Audio Quality (BAQ), caused by non-adaptive and adaptive KLT-based HBL. For adaptive KLT, the time window length ranged from 2 ms to 2000 ms. The results showed that, for adaptive KLT, the BAQ score tended to decrease as the adaptation rate increased. Adaptive KLT processing resulted in better audio quality than non-adaptive KLT; see Figure 2.

Figure 2: Basic Audio Quality for different processing
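
To make the notion of a time window length concrete, the sketch below recomputes a KLT matrix for each consecutive, non-overlapping window of a multichannel signal. It is an illustration only: the function name short_term_klt, the 48 kHz sample rate, the five-channel noise signal, and the absence of overlap or smoothing between windows are all assumptions, and the adaptation scheme actually tested is described in [P4].

```python
import numpy as np

def short_term_klt(signal, sample_rate, window_ms):
    """Return one KLT matrix per non-overlapping window.

    signal: array shaped (n_channels, n_samples).
    Output: array shaped (n_windows, n_channels, n_channels).
    """
    signal = np.asarray(signal, dtype=float)
    # Window length in samples (at least 2, so a covariance exists).
    win = max(2, int(round(sample_rate * window_ms / 1000.0)))
    n_windows = signal.shape[1] // win
    if n_windows == 0:
        raise ValueError("signal is shorter than one window")
    matrices = []
    for i in range(n_windows):
        block = signal[:, i * win:(i + 1) * win]
        # Eigenvectors of the inter-channel covariance of this window.
        eigvals, eigvecs = np.linalg.eigh(np.cov(block))
        order = np.argsort(eigvals)[::-1]  # decreasing eigenvalue
        matrices.append(eigvecs[:, order].T)
    return np.stack(matrices)

# Hypothetical input: 2 s of five-channel noise at an assumed 48 kHz.
rng = np.random.default_rng(1)
signal = rng.standard_normal((5, 96000))
fast = short_term_klt(signal, 48000, 2)     # shortest window tested
slow = short_term_klt(signal, 48000, 2000)  # longest window tested
```

A shorter window means more matrices per second, that is, a higher adaptation rate; with these assumed settings the 2 ms window updates the matrix a thousand times more often than the 2000 ms window.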

However, adaptive KLT processing introduced other artefacts that depended on the adaptation rate. When the adaptation rate was relatively slow, dynamic spatial distortion was perceived in the form of a moving image and moving sources. When the adaptation rate was very high, timbral distortions were predominant; see Figure 3.

Figure 3: Perceptual attributes for different processing

This work is reported in detail in [P4].