Spherical correlation as a similarity measure for 3-D radiation patterns of musical instruments

– We investigate the use of spherical cross-correlation as a similarity measure of sound radiation patterns, with potential applications for their study, organization, and manipulation. This work is motivated by the application of corpus-based synthesis techniques to spatial projection based on the radiation patterns of orchestral instruments. To this end, we wish to derive spatial descriptors to complement other audio features available for the organization of the sample corpus. Considering two directivity functions on the sphere, their spherical correlation can be computed from their spherical harmonic coef ﬁ cients. In addition, one can search for the 3-D rotation matrix which maximizes the cross-correlation, i.e. which offers the optimal spherical shape matching. The mathematical foundations of these tools are well established in the literature; however, their practical use in the ﬁ eld of acoustics remains relatively limited and challenging. As a proof of concept, we apply these techniques both to simulated radiation data and to measurements derived from an existing database of 3-D directivity patterns of orchestral instruments. Using these examples we present several test cases to compare the results of spherical correlation to mathematical and acoustical expectations. A range of visualization methods are applied to analyze the test cases, including multi-dimensional scaling, employed as an ef ﬁ - cient technique for data reduction and navigation. This article is an extended version of a study previously published in Carpentier, T. & Einbond, A. (2022). Spherical correlation as a similarity measure for 3D radiation patterns of musical instruments. 16th Congrès Français d ’ Acoustique (CFA), 11 – 15 Apr 2022, Marseille, France. https://openaccess.city.ac.uk/id/eprint/28202/.


Introduction
The directivity properties of sound sources, in particular musical instruments and voices, have been shown to be a key factor in the perception of acoustical sources in a reverberant environment [1,2], for the reproduction of sound by electroacoustic devices [3,4], and for the realistic synthesis of sources in auralization and virtual acoustic frameworks [5][6][7][8]. In all these areas of ongoing research, it would be beneficial to formulate a compact way to describe, compare, and classify sound radiation patterns.
Audio descriptors are widely and successfully used for the classification of musical and sonic material [9][10][11]. However, they are primarily evaluated on monophonic signals and therefore do not capture the complex spatial patterns of acoustical sources such as musical instruments. In order to analyze these patterns computationally, we require a similarity measure for pairwise comparisons, from which we can derive a multidimensional spatial description. The primary aim of our study is to investigate the cross-correlation of these patterns as a similarity measure and clustering tool.

Motivation
This work was initiated during an artistic-research residency where composer Aaron Einbond sought to extend corpus-based concatenative synthesis (CBCS) with spatial features. CBCS [12] is a sound synthesis method where short sound segments, or units, are automatically selected from a large database of audio samples and then assembled (i.e. concatenated) for playback. By re-arranging units from a corpus of live-or pre-recorded sounds, the technique is capable of synthesizing rich and musically expressive materials [13].
In CBCS, a unit selection algorithm is responsible for finding the sequence of units that best matches a given target specifying the sound or phrase to be synthesized. The selection is performed according to descriptors of the units, that is, feature characteristics extracted from or attributed to the source sounds. A multi-dimensional descriptor space is populated with the sound units and a distance function *Corresponding author: thibaut.carpentier@ircam.fr guides the unit selection algorithm to find the optimal sequence of units to fit the target.
Most often, the descriptors used are acoustical parameters computationally analyzed from the audio signals [14] and expected to be representative of the acoustical structure of the sound units. This typically includes temporal, spectral, harmonic, spectro-temporal, and energetic properties of the sound events, such as fundamental frequency, loudness, spectral centroid, etc. However, these descriptors do not commonly include spatial information, which would be useful especially when considering audio signals from acoustic instruments. One initial motivation for our work is to incorporate spatial descriptors into the CBCS method as a way to extend and improve our previous experiments on spatialized CBCS [15][16][17]. More specifically, we wish to apply CBCS to a corpus of musical instrument recordings, taking their radiation patterns into account as supplemental descriptors for the organization of the database.

Previous work
To incorporate radiation patterns in the context of CBCS, as described above, requires a rigorous comparison, or distance function between patterns. However, radiation patterns are in essence multidimensional data, and comparing them is therefore not an obvious task.
One way to assess their resemblance is by evaluating their spatial correlation, defined as follows. A cross-correlation coefficient is a frequency-dependent scalar value that expresses the similarity between sound pressures at two given positions in space [18]. This coefficient can then be integrated over the entire discrete set of measurement sensors for two radiation functions: the result is their spatial correlation. Looking chronologically at examples of this technique, Moreau [19,20] employs such a spatial correlation for the qualitative assessment of a directivity model of spherical microphone arrays against measured data. Pollow [21] (Sects. 3.3.1-3.3.5) uses spatial correlation for the analysis of instrumental radiation patterns from the Technical University of Berlin (TU Berlin), discussed in further detail below. He produces scatter plots of the correlation values of partial tones radiated by the instruments, similar to what we present in Section 5 using a different algorithm. Sridhar et al. [22] use spatial correlation to compare polar responses of different loudspeakers and assess their invariance over a specified frequency range.
However, spatial correlation presents disadvantages that motivated us to pursue a more flexible and efficient method based on spherical harmonics. As radiation patterns are acoustic wave fields evaluated on the surface of a sphere, they can be conveniently represented by their spherical harmonic coefficients, using a decomposition of the angular functions into the orthonormal set of spherical harmonics [23]. This representation has already proven useful for the analysis [4], reproduction [8], interpolation [24], extrapolation [25], or auralization [26] of directivity functions.
The spherical harmonic representation can also be used to evaluate the correlation of two signals on the sphere; this will be referred to as spherical correlation. Spherical correlation offers some advantages over spatial correlation as presented above: (a) it is independent of the acquisition setup, i.e. from the number and positions of measurement microphones; this potentially allows for comparisons of data from different origins. (b) As spherical harmonic expansions can be easily rotated, spherical correlation can be used to determine the angular displacement (or lag) needed to align the two signals on the sphere. This is widely used in the field of image processing in order to perform 3-D shape-matching, also known as shape registration [27][28][29][30][31][32]. (c) As spherical harmonic expansion offers the most compact representation of radiation data for a given resolution, spherical correlation can be computed efficiently.
The spherical correlation formalism is discussed in depth in the mathematical literature. The state of the art is introduced in [33], building on the foundation of the spatial Fourier transform [34]. However, while "conventional" correlation is ubiquitous for the analysis of audio signals, so far the practical use of spherical correlation in the field of acoustics remains relatively limited and challenging [35].
In a chronological review of acoustics applications, spherical correlation is briefly presented in [4,36], but actually unused. It is used by Guillon [37,38] to compute similarities between spatial frequency response surfaces (SFRS), and later to clusterize a dataset of SFRS; by Deboy and Zotter [39] to perform rotational tracking of a moving trumpet, for which they address the question of discretizing the 3-D rotation group; and by Pollow et al. [25] to measure the quality of range extrapolation of head-related transfer functions, independent of a possible gain mismatch. In a recent publication, Pezzoli et al. [40] propose a set of metrics, including the spherical correlation coefficient, to compare the sound radiation of several historical violins. Similar to our study, Hohl and Zotter [41] use spherical correlation as a similarity measure of radiation patterns of musical instruments. They examine whether different partials at the same frequency, but originating from different played pitches, exhibit similar radiation on a given instrument. This is promising work, but their short paper does not provide much detail.

Source data and methods
Our study is both motivated and facilitated by the availability of existing datasets of radiation patterns of orchestral instruments and voices that have been obtained by previous researchers. These studies use acoustical measurements, in anechoic conditions, with surrounding circular or spherical microphone arrays. Instruments are excited either by human players or, in some cases, by electromechanical devices such as an artificial mouth for brass instruments. Many authors have reported such measurements either using synchronous multichannel recordings [2,5,[42][43][44][45][46][47][48][49][50][51][52] or repeating the signals and rotating the instrument [48,49,[53][54][55][56][57]. In the case of a human player or singer, the reproducibility of the signals is of course a matter of concern.
Throughout this article, we use a database published by TU Berlin [50,58] that constitutes, to date, one of the most comprehensive datasets of publicly available 3-D high-resolution synchronous measurements, with radiation patterns for 41 modern and historical orchestral instruments.
In the following sections, we review the theory of spherical correlation and expand on the studies reviewed above to present applications to the analysis of directivity functions. We argue that the spherical correlation coefficient, as a simple scalar number, is a powerful tool for the comparison and similarity measurement of radiation patterns. After introducing the mathematical preliminaries (Sect. 2), we discuss practical considerations for an actual implementation, in particular with respect to the discretization of the search space for rotational matching (Sect. 3). In Section 4, we present numerical simulations to elicit basic properties of spherical correlation and to validate our implementation. We then present several visualization techniques, based on a matrix of cross-correlation values, that can be helpful to detect similarities among radiation patterns, and we propose the use of multidimensional scaling to clusterize radiation data. We apply these tools to several case studies of measured directivity data of musical instruments (Sect. 5), and we show that spherical correlation, both with and without rotational matching, can unveil interesting radiation similarities across frequencies or partials of a given instrument, or across multiple instruments. Given two shapes f and g, we wish to find the rotation that best aligns them. Let f be a square-integrable function on the unit sphere f 2 L 2 ðS 2 Þ. In the simple case, let us assume that g is a rotated version of f , i.e. f = K R (g) for some 3-D rotation R. We denote K R the rotational operator defined such that 8X 2 S 2 We wish to find the rotation R. In the more general case, given the two patterns f and g, we wish to find the rotation that best aligns the two shapes on the sphere. This can be accomplished by evaluating the cross-correlation between the two functions and finding the rotation R that maximizes the above integral [33]. However, evaluating C R (f , g) for all possible rotations in the spatial domain is a time-consuming task. Instead, we undertake it in the spatial Fourier domain.
Since f and g are square-integrable on the sphere S 2 , we can write their Fourier expansions [34] f Y m n ðXÞ are the spherical harmonic functions, and f m n are the Fourier coefficients of f . Here we have further assumed that f and g are bandlimited, with their bandwidth B = (N + 1), N being the maximum order of the Fourier expansion.
A well-known property of the conventional (Euclidian) Fourier transform is that a translation in the time domain is interpreted as a phase shift in the frequency domain. For the Fourier transform on S 2 , this property means that the magnitudes of the Fourier coefficients are invariant under rotation: writing f n (X) for the nth frequency component of f , the quantity |f n (X)| is invariant under rotation. Furthermore, the spherical harmonic basis functions of each order n transform among themselves under rotation according to where D n mm 0 ðRÞ is the Wigner-D function (see Appendix). Equation (5) is valid for complex-valued spherical harmonics. When considering real-valued spherical harmonics (as typically used in the field of Ambisonics research), a similar result holds, however involving the Wigner-d function instead of Wigner-D; details can be found for instance in [59] (Eqs. (16)- (20)). Now, using equations (3)-(5), it is possible to simplify equation (2) into: A complete demonstration can be found for instance in [33]. Equation (6) allows us efficiently to compute the crosscorrelation by combining the Fourier coefficients of f and g. From this, we can now search for which rotation R maximizes C R (f , g).
In equations (5) and (6), the Fourier coefficients are rotated by means of the explicit formulae with the Wigner-D functions (or Wigner-d for real-valued harmonics); in our practical implementation, we rather use recurrence relations as this appears to be computationally more efficient and numerically stable [60][61][62].

Normalized cross-correlation
The normalized cross-correlation [30,38,63] is simply a variant of equation (2) normalized by the energy of f and g, written The numerator of this expression has already been developed in equation (6). The denominator can be easily calculated thanks to Parseval's identity: Z As expressed above, the cross-correlation coefficient C R ðf ; gÞ corresponds to a cosine similarity measure. It is also possible to construct a Pearson correlation coefficient by replacing f and g with f and ğ, where f results from centering f on its spatial average. In the spherical harmonic domain, the spatial average is simply given by the 0th-order component f 0 0 . This approach is used, for example, in [37,38,40]. In [64], this is also referred to as the normalized cross-covariance coefficient.

Finding the optimal rotation R
A rotation in R 3 can be equivalently represented by: (a) a 3 Â 3 rotation matrix, (b) a set of 3 Euler angles (or alternatively 3 Tait-Bryan angles), (c) a unit quaternion, also known as a versor, or (d) an axis-angle representation (i.e. a unit vector indicating the direction of an axis of rotation, and an angle describing the magnitude of the rotation about the axis). The different properties of these formalismssuch as compactness, numerical stability, computational cost, singularity or gimbal lock, etc.can easily be found in the literature, as well as conversion formulae from one representation to another [65][66][67].
We denote as SO(3) the 3-D rotation group, also known as the special orthogonal group. SO(3) contains all rotations R about the origin in 3-D Euclidean space: where I 3 is the 3 Â 3 identity matrix. In other words, SO(3) is the subgroup of orthogonal matrices with determinant +1.
We must explore the SO(3) space in order to determine the rotation R 2 SO(3) that maximizes equation (6) or (7). One possibility would be to use a gradient descent technique, which has been formalized on the SO(3) rotation group in [68,69]. However, we have decided not to use this approach, as it presents several challenges: the procedure requires approximation of the directional derivatives, with an adjustable small step size, in order to converge to the nearest local maximum. It might also be necessary to restart the algorithm from several initial values, and choosing the optimal step size or the initial value is not a trivial task. Therefore, in the scope of this paper, we instead proceed with a sampling-based exploration of the SO(3) space, as this is conceptually simpler and straightforward in its implementation.

Sampling the rotation group SO(3)
Due to the topology of SO (3), different choices of parametrization may yield distributions of samples with various properties or biases. To implement our approach, we would prefer a uniform and deterministic sampling scheme. The definition of uniform is subject to interpretation, but essentially the sampling grid should ensure both global coverage and local separation. Note that sampling the SO(3) space is not the same task as uniformly distributing samples on the sphere S 2 , which is a well-studied question in the field of spherical acoustics processing [70]. In contrast, the SO(3) sampling problem is examined for instance in [71][72][73][74]. While an extensive analysis is beyond the scope of this article, in the following subsections we discuss and evaluate the most relevant approaches.

Parametrization by Euler angles
For the sake of simplicity, we will follow [33,37] and use regular sampling in terms of ZYZ Euler angles (a, b, c): with 0 j, k < 2B. According to [33], this sampling scheme is suitable for the analysis of band-limited functions with bandwidth B. In this case, the size of the search space is 2B Â 2B Â 2B.
Note that the null rotation (a = b = c = 0) is not part of the sampling grid. Consequently, correlating a signal with itself will not yield a = b = c = 0, but rather b = p/(4B). This is counter-intuitive but not problematic. To circumvent this, it has been proposed [37] first to rotate one of the patterns, f or g, by Àp/(4B) around the y axis.
We should also note that the true rotation R might not be on the sampling grid. Therefore, we might only find an approximate solutionR that should be close to R. The notion of distance in SO(3) will be discussed in Section 3.1.4.

Sampling by Halton sequences
Another sampling strategy is proposed in [74], extending and improving a probabilistic approach first proposed in [75,76] with the use of Halton sequences in the unit cube. This approach is easy to implement, and allows the generation of sampling grids with arbitrary numbers of samples.

Axis-angle visualization
Any 3-D rotation can be represented by an axis-angle representation. This formalism is useful for visualizing the projective space in R 3 [72]: each rotation is drawn as a vector with direction n (the axis of the rotation) and a magnitude corresponding to 0 (the rotation angle). In Figure 1, we present several sampling grids generated via (a) Euler parametrization, (b) Halton sequences, or (c) the Hopf fibration sampling proposed in [72]. The later is another, more elaborated, deterministic strategy that produces dense and highly uniform grids. It can be observed graphically in Figure 1 that Euler sampling is not perfectly uniform, while the Hopf fibration grid divides the surface of SO(3) into regions of (apparently) equal volume. However, the Hopf fibration approach is restricted to fixed sequences of samples, i.e. it cannot generate an arbitrary number of samples. Therefore, for the remainder of this article, we will use the basic Euler sampling (Eq. (9)) for its simplicity and scalability (with respect to B).

Distance measure on SO(3)
Various functions for measuring the "closeness" between 3-D rotations have been proposed in the literature. The usual (angular) distance between two rotation matrices R, Q 2 SO(3) is given by [39,71,77] . This metric measures the angle of rotation needed to map the transformation R to the transformation Q, or equivalently the angle of rotation associated to the transformation QR À1 . Alternatively, we can interpret d 1 (R, Q) as a scaled Froebenius norm, since . Several other distance functions have also been proposed, producing values in different ranges and of different units. Huynh [78] presents a detailed review and analysis of various metrics, and demonstrates that many of them are functionally equivalent. Huynh concludes that the following metric, based on quaternions, is both spatially and computationally more efficient: where q R is the unit quaternion corresponding to matrix R, and Á denotes the quaternion inner product (scalar inner product of two 4-D unit vectors). Equation (10) gives values in the range [0, 1], with 0 denoting that R and Q are close [73,78]. This metric will be used for the remainder of this article. Note that, while a distance metric in SO(3) is useful, it must be handled with care in the context of this paper: consider for example an axis-symmetric pattern f , for example a cardioid in the y direction. Rotate f around the z-axis with R 0 (a 0 , b 0 , c 0 ) = (p, 0, 0). Rotate f around the x-axis with R 1 (a 1 , b 1 , c 1 ) = (0, 0, p). Both scenarios result in the exact same pattern g; however, the two rotations differ, and their distance is d(R 0 , R 1 ) = 1.

Numerical simulations
We run numerical simulations on simple test cases in order to elicit basic properties of spherical correlation and to validate our implementation.

Impact of SO(3) sampling
As discussed above (Sect. 3.1), the choice of the sampling grid for SO(3) might have an influence on the accuracy and efficiency of the maximization problem equation (7). For the sake of simplicity, here we consider only the regular Euler sampling of equation (9). However, we investigate the use of oversampling i.e. we use a smaller step size in discretizing the Euler angles (a, b, c). Instead of using 2B samples for each parameter, we use N ! 2B. This seems relevant as our scenario involves directional functions with relatively low bandwidth (B = 5 if we consider the radiation patterns available in the TU Berlin database) compared to other authors (Kostelec [33] typically presents results with B = 128 or B = 256). With such low bandwidth (B = 5), the angular step size is very large (p/B = 36°), and therefore high angular misalignment might occur. Of course, oversampling the search grid results in higher computation time, but that is not the focus of this paper.
We run the following numerical simulation: (a) generate a random directional function f with a given bandwidth B; (b) generate a random rotation matrix [79] R 2 SO(3) and compute g = K R (f ), a rotated version of f ; (c) sample SO(3) and search for the rotation matrixR that maximizes the cross-correlation; (d) compute the cross-correlation after rotational matching, i.e. the cross-correlation between g and KRðf Þ; (e) repeat the simulation for various samplings of SO (3), varying the oversampling factor N =ð2BÞ; (f) repeat the simulation for various bandwidths 1 B 6. For each test case, we perform 10,000 Monte-Carlo runs. The results are presented in Figure 2.  For each simulated bandwidth B, it can be observed that oversampling the search space improves accuracy, allowing for a higher cross-correlation (closer to 1), and a smaller distance (closer to 0) between the expected and estimated rotation matrices. For B = 2, the distance criteria dðR;RÞ appears bounded: this is due to the non-uniqueness of the solution, as several matricesR achieve rotational matching (with normalized cross-correlation values close to 1), while being "far" from the expected/simulated matrix R (see concluding remark in Sect. 3.1.4).
As a result of this simulation, we speculate that sufficiently oversampled Euler schemes yield accuracy comparable to other highly uniform grids (such as Halton sequences or Hopf fibration).

Auto-correlation of elementary patterns
We now perform a simulation to highlight some basic properties of the spherical auto-correlation, i.e. the correlation C R ðf ; f Þ of a signal f with a rotated copy of itself. For f we use different elementary functions (see legend of Fig. 3). For the sake of simplicity and visualization, we examine rotations only around the yaw angle: R R(a). Figure 3 presents the auto-correlation as a function of the rotational lag a.
We can make a few simple observations in agreement with theoretical expectations: (1) the auto-correlation coefficient C R ðf ; f Þ varies in range [À1; 1]; (2) as the omnidirectional pattern Y 0 0 ðXÞ is rotationally invariant, its autocorrelation always equals 1; (3) as all examples of f are real functions, the auto-correlation is an even function of a; (4) the auto-correlation reaches its peak at the origin (a = 0), and for any lag a we have: jC RðaÞ ðf ; f Þj C Rð0Þ ðf ; f Þ; (5) the auto-correlation of a periodic function is, itself, periodic with the same period; (6) the Dirac distribution is an eigenfunction of the auto-correlation function, i.e. the auto-correlation of a Dirac delta is a Dirac distribution itself. All of these are well-known properties of the autocorrelation function of time signals, here pertained to spherical auto-correlation of directional signals on S 2 .

Comparing two patterns of different shapes
So far, we have assumed that the two patterns f and g are rotated cousins and therefore have the same bandwidth.
In the most general case, we are interested in comparing two arbitrary radiation patterns, potentially with different bandwidths. Such a scenario generally yields jC R ðf ; gÞj < 1, but the rotational alignment can still be effective.
As a proof of concept, we investigate the crosscorrelation function of two patterns f and g having different "shapes". For f , we choose an Nth-order Dirac delta, i.e. a maximally directional pattern in direction X 0 : For g, we build a "directionally reduced" version of f such as: where f 2 [0-100] is a directivity (or "aperture") factor, and w n (f) is a spherical harmonics weighting function. The precise choice of w n (f) is not relevant here; we simply build a weighting function that allows us to generate a series of patterns with significantly different characteristics. We further impose that f = 100% produces a Dirac delta, and f = 0% produces an omnidirectional pattern. In other words, w n (f) is used to simulate patterns with varying bandwidths (see Fig. 4). Techniques for building such weighting functions have been proposed e.g. in [80]. Finally, we rotate g with a random matrix R 2 SO (3), repeating the simulation with 10,000 Monte-Carlo runs. One example of resulting rotational alignment is presented in Figure 5.
In Figure 6, we present the cross-correlation of the functions f and g for varying values of f. As expected, we observe that the spherical correlation coefficient decreases when f tends towards 0, i.e. when g becomes more omnidirectional. When f = 0%, the correlation value is not 0, but the curve is bounded by 0.2. Indeed, when g reduces to the omnidirectional pattern, it is easy to show that "f : When f is a Dirac delta, this further simplifies to C R ðf ; gÞ ¼ 1 B , which equals 0.2 in our example. The actual shape of the C R ðf ; gÞ curve ( Fig. 6) depends on the chosen weighting functions w n (f). Similarly, the distance between the estimated and expected rotation matrix significantly increases as f gets smaller.  These results suggest that the spherical correlation coefficient can be a relevant indicator of similarity between patterns of different shapes/bandwidth. As with any correlation measure, this essentially allows for qualitative interpretation; a quantitative analysis of the correlation coefficient depends on the context and purposes.

Application to measured data
We now apply the approaches presented in the previous sections to measured data. As mentioned in the introduction, our source is the acoustic instrumental radiation database, containing 41 modern and historical orchestral instruments, made available by TU Berlin [50,58]. This database contains single-tone recordings at two dynamic levels (pp and ff). For our processing, we use the ff data due to its better signal-to-noise ratio.
In the TU Berlin database, radiation patterns were measured by a surrounding spherical array of 32 microphones with a radius of 2.1 m. The database is published with spherical harmonic coefficients calculated for the first 10 partials of each played note, and third-octave band-averaged patterns are also provided [81]. Methods to obtain the spherical harmonic coefficients from such measurements are not discussed here; readers can refer e.g. to [4,8,23,36,54]. In our work, we directly exploit the precomputed spherical harmonic coefficients for the order N = 4 (25 coefficients) as provided by the TU database authors. An acoustic source centering algorithm [82,83] has also been applied, by the TU researchers, to these spherical harmonic coefficients (below 1 kHz) in order to align the acoustic center of the sound source to the geometrical center of the microphone array, and to account for the resulting phase shifts at the microphone positions. This source re-alignment procedure is known to be particularly important for directivity functions that model the complex sound pressure (as is the case in the TU database), with significant impact on the spherical harmonic coefficients; when only the absolute values (sound pressure levels) are considered, the impact of displaced sources might be less severe [84,85]. The TU database includes both uncentered and centered data, and we have used the latter to minimize the influence on our calculations of changes to instrument alignment. The practical setup used in the collection of the TU data exhibits a spatial aliasing frequency of approximately 1.1 kHz [58,82]. Observations made above this frequency should therefore be interpreted cautiously.
Finally, let us emphasize that our aim in this section is not to provide a thorough analysis of the TU database, but rather to exemplify the qualitative results that can potentially be obtained through spherical correlation. More in-depth discussions about the TU data can be found e.g. in [21,58,[86][87][88].

Similarities over frequencies
In Figure 7, we propose scatter plots to visualize the matrix of cross-correlation values, i.e. the normalized cross-correlation of all pairs of partial tones of a given instrument, for four different instruments. The color code depicts the magnitude of the correlation coefficient (its absolute value), from blue (low correlation) to red (high correlation). The x and y axes represent the frequency on a logarithmic scale. The size of the scatter dots is slightly adjusted with frequency in order to avoid overlapping dots and to improve readability. This visualization is similar to what is proposed in [21,89]. Such a diagram allows the compact combination of a significant amount of information: the number of dots is equal to (N p Â N t ) 2 where N t is the number of played tones and N p the number of partials for each tone. In the TU dataset, N p = 10. As the matrix of cross-correlation values is persymmetric, so is the scatter diagram. The cross-correlation values presented here have been calculated without rotational alignment. When Figure 5. Example of rotational alignment of two patterns with different bandwidths. In magenta, the original pattern f (Dirac delta with N = 4); in black, the rotated and directionally reduced pattern g (with f = 40%); in red, pattern f after rotational alignment with the estimated matrixR. We observe that the rotation matching is still effective, although the main lobes of the patterns are not perfectly aligned (dðR;RÞ % 10 À1 ). The correlation coefficient is C R ðf ; gÞ % 0:5. applying rotational alignment, the scatter diagrams (not shown here) do not exhibit significant differences, suggesting the musicians did not rotate dramatically during the recording session.
Inspecting these diagrams, as well as those of other instruments not displayed here for brevity, we can draw several general observations. The results are consistent with the categorization of the TU database into three groups of instruments, as proposed by Shabtai et al. [58]: those with one expected radiation point, such as brass instruments; those with several expected radiation points, such as woodwind instruments, with sound radiated by the bell, the fingering holes, and the mouthpiece; and those with a full body radiating sound, such as string instruments. For brass instruments (tuba in Fig. 7), the cross-correlation values vary slowly with frequency, and the diagonal in the matrix has a wide "spread". This suggests that the radiation patterns evolve smoothly over frequency. In other words, nearby frequencies produce similar directivity patterns. For woodwind instruments (clarinet in Fig. 7), there is greater correlation of all partials for low frequencies, approximately below 300 Hz. With increasing frequencies, the correlation values exhibit abrupt changes, manifested by vertical and horizontal "stripes" and "clusters" in the depicted matrix containing the cross-correlation values. These discontinuities are likely related to changes of fingering, for example using the register key. For string instruments (cello and double bass in Fig. 7), the crosscorrelation is relatively large in the low frequency region (for wavelengths larger than the dimensions of the instrument), but drops sharply at higher frequencies, suggesting greater variations in their radiation patterns. For further interpretations see e.g. [21,46,89].

Similarities over partials
Following an idea from Hohl and Zotter [41,46], we now plot the matrix of cross-correlations organized by partials rather than by frequencies. In other words, we generate a 2-D color-coded diagram, similar to the one presented in the previous section, but with x and y axes now organized by partial tones instead of frequencies. The matrix is organized in blocks each containing a chromatic scale, with each successive block representing a different partial for each played note. Black dashed lines separate each partial index, for ease of readability. This sorting makes the comparison of cross-correlation between partials more evident. With this matrix arrangement, partials with matching frequency are located on the secondary diagonals. Figure 8 illustrates examples of results for the same four instruments as in Figure 7.
For the tuba, it can be observed that most partials at the same frequency exhibit strongly correlated radiation, regardless of the played pitch from which they originate. This is in line with the observations made in Section 5.1. Similar results can be observed for other brass instruments (not shown here). For the clarinet, however, the diagram is less regular. Along the diagonal, a checkered pattern appears, revealing that partial tones with similar or nearby frequencies radiate differently. More precisely, pitches from C to G# are strongly correlated, but differ significantly from pitches A to B. The abrupt change between these two blocks corresponds to the transition between the first and second registers of the B[ clarinet. This suggests that the addition of the register key, and consequently discontinuous change in fingerings, strongly impacts the radiation pattern. For string instruments such as the cello and double  bass, except for a few narrow diagonal red lines, the overall similarities between partials are low. Each partial seems to have its own radiation pattern, uncorrelated with the other partials, suggesting that the directivities of these instruments are rather complex. The diagrams presented in Figure 8 are qualitatively comparable to ones obtained by Hohl and Zotter with another database of instrument recordings. Further analysis can be found in [41,46].

Visualization with multidimensional scaling
Another motivation for this work is to classify the TU Berlin database for subsequent manipulation with corpusbased synthesis techniques: by organizing instrumental samples according to their directional characteristics, the resulting low-dimensional representation can be used, along with other audio descriptors, to navigate the sample corpus.
With an approach to pairwise spherical correlation in place, we can now examine applications to classify and navigate larger collections of 3-D radiation patterns by using multidimensional scaling (MDS) [90]. MDS is used to translate information about the pairwise distances (or dissimilarities) among a set of p objects into a configuration of p points mapped onto an abstract Cartesian space. MDS is therefore a means of visualizing the level of similarity of samples within a dataset.
We apply MDS analysis, using a dimensionality of 2 for simplicity of visualization and interpretation, to measured radiation patterns of different instruments for the fundamental frequency of various played pitches. Examples of MDS for two different pitches are presented as scatter plots in Figure 9. The color code corresponds to the three categories of instruments cited above [58]: in red, instruments with one expected radiation point such as brass; in blue, instruments with several expected radiation points, such as woodwinds; and in black, instruments with a complex radiation pattern, such as strings. The MDS plots for these two pitches, as well as for others not shown for brevity, show groupings among instruments of these three categories. Further sub-classes of instruments of similar construction appear to cluster together: for example, the cylindrical-bore brass instruments in the trombone family, which are separated from conical-bore French horn and tuba. Members of the single-reed clarinet and saxophone families cluster together, as do members of the double-reed bassoon and dulcian family. As expected, historical instruments are positioned near their modern counterparts. Therefore we observe that MDS based on spherical correlation allows us efficiently to segregate and organize the instruments in accordance with their predicted categories.
The examples presented in Figure 9 were evaluated with pitches performed at dynamic ff. When both dynamic levels pp and ff are included in the MDS analysis, the corresponding points (not shown for clarity) are mapped closely in the MDS space, indicating that the played dynamic level does not substantially affect the clustering operation.
Rotational matching has not been applied to these examples in order better to visualize the distinctions between instrument categoriesas, by definition, rotational alignment always yields higher correlations. We examine the effects of rotational matching in the following section.

Exploring rotational matches
We now show that rotational matching can be a convenient tool to explore the database further and search for similarities, regardless of the orientation of the instruments. The following analyses refer to the radiation coefficients averaged over third-octave bands [58]. While both pp and ff dynamics have been analyzed, the results are presented only for ff data for the sake of readability.

Across instruments
In Figure 10, we examine the cross-correlation between all instruments in a single frequency band, before and after applying rotational matching. The axes are organized according to the expected three main categories of radiation characteristics, separated by black lines ("brass-like", "woodwind-like", and "string-like", as discussed in Sect. 5.3). Within each category, instruments are presented in alphabetical order without presupposing any finer categorization. White spaces indicate instruments for which no data is present in the chosen frequency band (timpani, flutes, and oboes in Fig. 10). As the matrix of cross-correlation values is persymmetric, only half of it is shown: instead, in the upper and lower triangles we present the correlation values with and without rotational alignment, respectively.
We can make a number of observations, ranging from more general to more specific: 1. The upper triangle has higher values than the lower triangle, by definition of rotational matching. 2. In the upper triangle, nearly all correlation values are greater than 0.5, indicating some degree of similarities across most instruments. This is somewhat expected: in this relatively low frequency band, most instruments have energy concentrated in the lower spherical harmonic orders, and they behave roughly like firstorder functions, with cardioid-or dipole-like patterns. The concentration of energy within the lower orders is also, partly, a consequence of the acoustic centering process that has been applied [82]. 3. The three "diagonal blocks", representing correlation among instruments of the same category, show overall higher correlation than the other blocks, comparing instruments of unlike categories. This is observed both before and after rotational matching, but with several nuances: for example, the "brass-like" instruments show a particularly large increase in correlation after rotational matching, which is consistent with the assumption that they each have a single point of radiation but oriented in different directions. To a lesser extent, the "string-like" instruments show a significant improvement in correlation following rotational matching, which again fits the observation that several of these instruments feature a similar construction but oriented in different directions (for example violin, viola, cello, and guitar). In these cases, therefore, the similarities among instruments are most clearly revealed when they are rotated. 4. A few instruments visually emerge as "outliers", exhibiting a line (vertical or horizontal) with strikingly low correlation values, both with and without rotational matching. In particular, this is the case for the double action harp, pedal timpani, French horn, and to a lesser extent dulcian. Indeed, it appears that these instruments have significant energy in higher orders (N > 1), and consequently their radiation patterns are dissimilar to other instruments. We can propose several hypothetical explanations: (a) These instruments are also outliers in terms of their construction and performance technique, with few obvious correlates in the database: the timpani and harp are percussion instruments that are only provisionally grouped with "strings" due to their full-body radiation. The horn, while technically a brass instrument, has historically been acknowledged as an exceptional member of that group due to its timbre and directivity [1]: it is the only instrument whose bell is oriented behind the performer, with the performer's hand mediating the output. Its directional characteristic is therefore expected to be more complex (of higher order) than other brass instruments performed with open bells. While the dulcian may be expected to correlate with other members of the bassoon family, it is an outlier as the oldest instrument of the database by nearly a century (ca. 1600) [50] and we hypothesize that its unusual directivity is consistent with historical trends favoring more focused directivity as instruments have modernized. (b) Consistent with these observations, the presence of higher order components may be caused by the contribution of large, spatially extended, excitation sources (i.e. the absence of a unique natural acoustic center) and/or by off-center sound sources and aliasing errors; this seems especially plausible for the harp and timpani, as they are among the largest instruments in the database. We hypothesize that the performance of the centering algorithm might be degraded for these instruments (in spite of the relatively low frequency). (c) When measuring harp and timpani, "a relatively small-area floor was used" to support the instrument [83]; for the wavelength of interest, this might have contributed to undesired reflections or scattering (based on the available pictures, this floor was covered with absorptive materials for the timpani only). (d) Finally, the timpani are unique in the database as only one played pitch was recorded for each (historical and modern); this could explain differences from the other instruments, due to relatively fewer data for the thirdoctave averaging. 5. As already noted in Section 5.3, modern instruments and their historical counterparts usually exhibit high correlation values, even without rotational matching. The best rotational alignment is typically achieved with a relatively small angular displacement (not shown here for brevity), suggesting that the difference between modern and historical instruments might be essentially due to slight changes in the performer's orientation and instrument construction. It is, however, not obvious why modern and historical viola exhibit low correlation values in this particular frequency band. 6. Double bass exhibits only moderate correlation with other members of the string quartet; this bias is possibly an effect of its large size relative to the frequency band being considered: we have noted in Section 5.1 that, for string instruments, partial tones within a small frequency interval show low correlation values for wavelengths smaller than the size of the instrument; consequently, the averaged directivity pattern of Double bass in the frequency band centered at 198 Hz is questionable. 7. While it is surprising there is not higher correlation between French horn and natural horn, the latter may be anomalous because relatively few performed pitches in the natural harmonic scale leave sparse data for third-octave averaging in this register (similar to timpani as mentioned above).
8. We observe high correlation between several unexpected pairs of instruments before rotational matching, such as trumpet and historical violin, tuba and historical violin, or trumpet and clarinet. There is not space to analyze every example in this paper; however, spherical correlation and the proposed visualization strategies appear as promising tools for further research. For brevity we also have not systematically discussed here the specific angle of rotation that produces the optimal rotational alignment; again, this would be a fruitful topic for further study.

Across frequencies
As mentioned in observation 3, rotational alignment is especially relevant for brass, so we will focus on the tenor trombone as one particular representative. Still considering its data in the 198 Hz-centered band, we present in Figure 11 its cross-correlation with all other instruments in all available frequency bands.
Before rotational matching, we observe high correlation values with the other trombones at the same frequency (historical tenor trombone, modern and historical bass trombone), but also with the bass trombone in nearby frequency bands (99 Hz and 157 Hz). This expected affinity between all trombones in nearby frequencies is further increased after applying rotational alignment. There is also some correlation with the trumpet in the 314 Hz band, where the shift to higher frequencies could be reasonably explained by the trumpet's smaller dimensions.
The three instruments that "benefit" the most from rotational alignment are contrabassoon, bass clarinet, and historical bass trombone. One hypothesis is that these instruments have a relatively similar size, but are performed with their bells oriented in different directions: the trombone points forwards in front of the performer, while the bassoon and contrabassoon bells point upwards, and the bass clarinet bell is located at the bottom of the instrument but pointing upwards. This is partially confirmed by Shabtai et al. [58] (Figs. 8a and 9a) in their measurements of acoustic source centers, where they observe that both trombone and bassoon have strong excitation sources at their bells, but with different spatial orientations. Therefore it is plausible that a suitable rotation would produce a high correlation between the radiation patterns of these instruments.
More surprisingly, there is also noticeable correlation, before and after rotational matching, with a few strings instruments (violas, basses, historical cello, and acoustic guitar) in certain frequency bands. As in Section 5.4.2, observation 2, this could be due to the relatively low-order radiation patterns of all instruments in low frequency bands.
In summary, beyond the expected correlations with other brass, Figure 11 reveals that rotational matching can capture less obvious similarities with different instrument categories, such as some woodwinds and strings, in different frequency regions. Nevertheless, a much more systematic study would be required to confidently relate these observations to the instruments' construction and performance technique. Now focusing on one of the above examples, in Figure 12 we graphically represent the radiation patterns for tenor trombone and contrabassoon in the respective frequency bands centered at 198 Hz and 63 Hz, where rotational alignment induced a large increase in cross-correlation. As the normalized cross-correlation is used here, it is possible to detect similarities regardless of differences in overall energies of the two instruments, as already noted by Pollow [21] (Sect. 2.3.7). The contrabassoon radiation pattern shown in Figure 12c is essentially a vertical dipole, consistent with the vertical orientation of the instrument, as compared to the horizontal orientation of the trombone. However, after rotating the contrabassoon radiation pattern, in Figure 12f we can observe that the 2-D polar patterns align nicely. This is not always the case: indeed, maximizing the 3-D shape alignment does not necessarily imply strong similarity on the horizontal plane.
Beyond acoustical analysis, such high correlations suggest clear applications from an artistic perspective in the case of CBCS, as discussed in the introduction. By concatenating instrumental samples according to spherical correlation, smooth transitions can be made between radiation characteristics of disparate instruments, with or without rotational alignment, suggesting a novel approach to corpus-based spatial sound synthesis.

Conclusion
In this paper, we discussed the use of cross-correlation on the sphere as a similarity measure for the classification of 3-D radiation patterns. We showed that this tool can facilitate thorough analysis of directivity pattern similarities across partials of one instrument or between different instruments. We have presented several visualization tools that allow the compact organization and examination of multidimensional data, revealing or confirming the radiation behavior or categorization of instruments. Multidimensional scaling, in particular, appears to be a powerful approach to clusterize instruments based on their radiation characteristics, with potential applications to the creative exploration of the corpus. We have also discussed some challenges regarding the implementation of rotational alignment, and demonstrated that rotational matching can be useful to detect similarities among different categories of instruments, or different frequency bands, by mitigating the effects of differing instrument construction and orientation.
As a consequence, spherical correlation presents promising possibilities for human or machine navigation of a database of radiation patterns. We have already applied CBCS to computer improvisation based on audio features, in particular using timbral descriptors to structure sequences of complex sounds with an audio oracle algorithm [91]. These CBCS sequences can be synthesized spatially using spherical harmonic coefficients derived from the TU database [17] and projected with a compact spherical loudspeaker array such as the IKO [92] to approximate the dynamic radiation patterns of acoustic instruments, as implemented in Einbond's composition Prestidigitation for percussion and 3-D electronics in 2022. A further step will be to use an MDS visualization of the TU database to train the computer improvisation agent directly on spherical correlation distances themselves, allowing for direct learning and continuation of spatial gestures.
Future work will investigate other spatial descriptors for the classification of the 3-D database of orchestral instruments as well as other metrics recently proposed in [40]. Finally, we will examine whether spherical correlation can be useful to detect or correct possible rotational misalignment of a human performer across multiple measurements, thereby improving reproducibility.

Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 101019164 (ERC MusAI -Music and Artificial Intelligence: Building Critical Interdisciplinary Studies).

Data availability statement
The research data associated with this article are available in github, under the reference https://github.com/ tcarpent/shxcorr.
Cite this article as: Carpentier T. & Einbond A. 2023. Spherical correlation as a similarity measure for 3-D radiation patterns of musical instruments. Acta Acustica, 7, 40.