Issue
Acta Acust.
Volume 8, 2024
Topical Issue - Musical Acoustics: Latest Advances in Analytical, Numerical and Experimental Methods Tackling Complex Phenomena in Musical Instruments
Article Number 65
Number of page(s) 12
DOI https://doi.org/10.1051/aacus/2024042
Published online 20 November 2024

© The Author(s), Published by EDP Sciences, 2024

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

Across various engineering domains, physical modeling has become a widely used strategy to simulate and predict the behavior of physical systems. In musical acoustics, physical models of musical instruments have been developed over the last few decades for sound synthesis and to better understand the underlying mechanisms behind the functioning of musical instruments. These models also elucidate the relationships between design and acoustic responses such as intonation, ease of blowing, and timbre. Being able to numerically compute quantitative descriptors associated with the functioning of musical instruments can be crucial for instrument makers, as it enables virtual prototyping as a tool in the development process of new instruments. Another important benefit of virtual prototyping is the potential savings in time and resources typically associated with producing hardware prototypes through traditional trial-and-error procedures. Altogether, providing musical instrument designers with numerical tools based on physical modeling is highly appealing for assisting the development of new instruments.

In musical acoustics, physical models of brass instruments have been proposed using a modal description of the exciter (the lips) and the resonator (the air column), along with a nonlinear flow equation that couples the two linear systems [1]. These models have shown great capability in reproducing the behavior of brass instruments [2–4] and offer a system description with a relatively limited number of equations. They also provide enough precision to compare instruments with slight design changes, provided that the modal parameters of the instrument are computed or extracted from measurements with enough accuracy [5–7]. Different outputs can be extracted from these models, depending on the numerical method applied. The bifurcation diagram represents the evolution of a system’s solution with respect to one or several parameters. The solution branches and their stability provide a global image of the model’s behavior from which performance descriptors of the instrument can be extracted [5]. Numerical continuation is one way to compute the solution branches and the bifurcation diagram of a trumpet. The Asymptotic Numerical Method, associated with the Harmonic Balance Method and implemented in the software MANLAB, is one continuation method that has been applied to brass instrument models [8, 9]. Another method, based on a prediction-correction algorithm and implemented in the software AUTO, has also been recently applied to wind and brass instrument models [10–12].

Although these continuation methods have demonstrated their efficiency in computing bifurcation diagrams of brass instruments for comparing instruments [5], shedding light on phenomena such as the influence of impedance inharmonicity [10], and the production of ghost notes [12], handling these methods requires theoretical and technical knowledge, making them difficult for a novice to use. Furthermore, selecting the parameters for the model, particularly the lip parameters in brass instruments, is critical since the mechanical parameters of the lips are challenging to estimate experimentally and can significantly impact the model’s behavior. To account for the variability and uncertainty in the lip parameter values, computing solutions for a large number of lip parameter values (virtual players) can be a valid option. Nevertheless, it usually induces a high computational cost, which can be a significant limitation, especially when combined with an optimization routine. Given these constraints, transferring this technology into an “easy-to-use” application for use in a designer’s workshop is not straightforward, and solutions must be found to overcome this issue.

One strategy explored in previous work involves training a machine learning model (surrogate model) to compute solutions at a much lower computational cost than numerical integration or continuation methods. This approach was applied to trumpet bore optimization [7, 14], and in an initial attempt to predict descriptors associated with trumpet bifurcation diagrams [13]. This methodology, which combines physical modeling with artificial intelligence and data-driven methods, has also been applied to other instruments, such as the violin for predicting eigenfrequencies of the violin body [15], and the piano for sensitivity analysis of the dynamic behavior of the soundboard [16].

As depicted in Figure 1, our goal is to precisely compute the outputs of numerical continuation – specifically the bifurcation diagram and its associated descriptors – using a “black box” model that does not require any knowledge of continuation methods and requires very little computational time. The benefits of this approach are straightforward: 1) the model can be encapsulated into an “easy-to-use” software application that can be used autonomously by a designer, and 2) calculations can be performed for several virtual players to account for uncertainties in lip parameters, providing a richer analysis of an instrument.

Figure 1

Goal: replacing the traditional physical approach with a fast and interpretable machine learning approach.

In this study, we followed this approach to develop a numerical tool for trumpet designers that allows several descriptors associated with the bifurcation diagram to be computed using machine learning. The article is organized as follows: Section 2 provides an overview of the physical model and bifurcation diagram of a trumpet. Section 3 discusses the generation of the dataset and details the application of machine learning models. Finally, Section 4 presents the proposed tool, followed by the conclusions.

2 Trumpet bifurcation diagrams

2.1 Physical model of a trumpet and bifurcation diagram

We consider here a classical model based on three equations that assumes linear propagation in the resonator. It includes a mechanical equation for the lips, represented by a one-degree-of-freedom damped oscillator, an equation for the resonator, represented by a series of complex modes, and a Bernoulli-like flow equation. Note that lip models with two degrees of freedom can also be considered [17], but we prefer to keep the model simple enough to work with a limited number of parameters for the lips.

Denoting y the vertical lip position, y0 the lip position at rest, ωl, Ql, μl, and b the lip mechanical parameters (resonance angular frequency, quality factor, mass per surface area, and lip opening width, respectively), sk and Ck with k ∈ [1, N] the modal parameters (poles and residues, respectively) of the N resonances of the acoustic impedance of the instrument, Zc the characteristic impedance, u the volume flow, p the downstream pressure at the input of the instrument (in the mouthpiece), and p0 the upstream (mouth) static pressure, the model is written as follows:

$$ \left\{\begin{array}{l}\ddot{y}(t)+\frac{\omega_l}{Q_l}\dot{y}(t)+\omega_l^2\,(y(t)-y_0)=\frac{1}{\mu_l}\,(p_0-p(t))\\ \dot{p}_k(t)=Z_c C_k u(t)+s_k p_k(t),\quad \forall k\in [1,N]\\ u=\sqrt{\frac{2|p_0-p|}{\rho}}\; b\cdot \mathrm{sign}(p_0-p)\cdot \theta (y),\end{array}\right. $$ (1)

with $ \theta (y)=\frac{|y|+y}{2}$, ρ the air density, and the mouthpiece pressure p obtained from $ p(t)=2\sum_{k=1}^{N} \mathfrak{R}(p_k(t))$.
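As a reading aid, the right-hand side of equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MANLAB implementation used in this work: the function names, the ordering of the lip parameters in the tuple `lips`, and the default air density of 1.2 kg·m−3 are choices made for the example.

```python
import math

def theta(y):
    """Positive part of the lip opening: (|y| + y) / 2."""
    return (abs(y) + y) / 2.0

def volume_flow(p0, p, y, b, rho=1.2):
    """Bernoulli-like flow u (third line of equation (1)); rho is an assumed value."""
    dp = p0 - p
    return math.sqrt(2.0 * abs(dp) / rho) * b * math.copysign(1.0, dp) * theta(y)

def mouthpiece_pressure(pk):
    """p(t) = 2 * sum over modes of Re(p_k(t))."""
    return 2.0 * sum(z.real for z in pk)

def rhs(y, ydot, pk, p0, lips, modes, Zc, rho=1.2):
    """Time derivatives (ydot, yddot, [pkdot]) of the state of equation (1).

    lips  = (omega_l, Q_l, mu_l, b, y0)   -- hypothetical ordering
    modes = [(s_k, C_k), ...]             -- complex poles and residues
    """
    omega_l, Q_l, mu_l, b, y0 = lips
    p = mouthpiece_pressure(pk)
    u = volume_flow(p0, p, y, b, rho)
    yddot = (p0 - p) / mu_l - (omega_l / Q_l) * ydot - omega_l ** 2 * (y - y0)
    pkdot = [Zc * C * u + s * z for (s, C), z in zip(modes, pk)]
    return ydot, yddot, pkdot
```

With the lips at rest (y = y0), no mouth pressure, and zero modal pressures, every derivative returned by `rhs` vanishes, which is the equilibrium from which the oscillating branches of the bifurcation diagram emerge.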

This model can be written in a dimensionless form with quadratic nonlinearity, making it easy to analyze using the Asymptotic Numerical Method (ANM) [5]. Bifurcation diagrams, such as the one for a B♭4 (fundamental frequency f0 ≃ 470 Hz) shown in Figure 2, can then be computed using the software MANLAB. This bifurcation diagram was obtained with eleven acoustic modes (N = 11) enabling an accurate fit on the measured input impedance, and with the following lip parameter values taken from [5] and determined based on values provided in the literature, considering constraints on the blowing pressure levels to generate sound: Ql = 3, μl = 2 kg·m−2, y0 = 0.1 mm, b = 8 mm. The natural frequency of the lips fl = 382.18 Hz is obtained by Linear Stability Analysis (LSA) in order to locally minimize the threshold mouth pressure [2].

Figure 2

Descriptors associated with the bifurcation diagram of a B♭ trumpet for a B♭4. The thin lines represent the unstable branches, while the bold lines represent the stable branches. Detailed definitions of these descriptors are provided in Table 1.

Note that the bifurcation diagram in Figure 2 reveals an inverse bifurcation. Indeed, measurements of crescendo-decrescendo maneuvers on artificial player systems and musicians usually reveal some hysteresis around the oscillation thresholds [18]. This observation aligns more closely with an inverse bifurcation, although direct bifurcations are likely to occur in real trumpet playing. With the physical model used in this study, direct bifurcations can be obtained, for instance, by reducing the lip position at rest y0, or increasing the value of Ql.

2.2 Landmarks of the bifurcation diagram – performance descriptors

A number of specific values can be extracted from the bifurcation diagram, particularly from the |p| and f0 traces. These quantities describe the main features of the solution branches constituting the bifurcation diagram and are closely related to the performance of the instrument with a given set of lip parameters. Therefore, we choose to refer to these values as “performance descriptors”, and we extract 10 of them from the bifurcation diagram, as illustrated in Figure 2. The definitions of these descriptors are provided in Table 1.

Table 1

Definition of the performance descriptors extracted from the bifurcation diagrams.

It can be considered that for each {player – instrument} pair, these 10 descriptors constitute a quantitative performance evaluation that facilitates comparisons between instruments. Furthermore, using these descriptors, it is possible to reconstruct the skeleton or the outlines of the associated bifurcation diagram.

The final goal of this study is to predict these descriptors using a machine learning model, thereby replacing numerical continuation. Note first that from Figure 2, it appears that Pmin1 and Pf0min are almost identical (Pf0min is slightly larger than Pmin1), although we are not aware of any theoretical reason justifying the observation of the minimum value of f0 on the stable branch nearly at the Hopf bifurcation mouth pressure. Additionally, other descriptors that potentially carry significance for players in terms of instrument performance can be deduced from the 10 descriptors in Table 1: the “hysteresis amplitude” H = Pmin1 − Pmin2, the “dynamic range” D = pmax − pmin, or the “pitch stability” Δf0 = f0max − f0min.
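The three derived quantities follow directly from the descriptors by subtraction. A minimal sketch, assuming the descriptors of one {player – instrument} pair are stored in a dictionary keyed by the names used in the text (the dictionary layout and the numerical values in the usage note are hypothetical):

```python
def derived_descriptors(d):
    """Hysteresis amplitude, dynamic range and pitch stability from the descriptors."""
    return {
        "H": d["Pmin1"] - d["Pmin2"],         # hysteresis amplitude
        "D": d["pmax"] - d["pmin"],           # dynamic range
        "delta_f0": d["f0max"] - d["f0min"],  # pitch stability
    }
```

For instance, a hypothetical pair with Pmin1 = 3000 and Pmin2 = 2500 (in the pressure unit of the diagram) would have a hysteresis amplitude H = 500 in the same unit.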

2.3 Relevance of the descriptors

Initially, we propose to assess the ability of the descriptors to classify high-end trumpets. It is indeed valuable to confirm the relevance of the descriptors extracted from the bifurcation diagram in distinguishing between existing instruments within the same product range (specifically professional instruments).

We consider four real trumpets labeled as B, S, V, and W, along with 800 virtual players to construct a dataset consisting of 3200 bifurcation diagrams. The set of 800 virtual players is generated using uniform sampling of Ql, μl, and y0, within boundaries corresponding to ±10% of the parameter values given in Section 2.1. Initially, we classify the trumpets using the 10 performance descriptors extracted from these bifurcation diagrams. Subsequently, we interpret the classification results by analyzing the feature importance of the machine learning model.

2.3.1 Trumpet classification

Now, we aim to demonstrate that the descriptors extracted from the bifurcation diagrams can effectively classify different trumpets. We utilize the XGBoost algorithm [19], a popular machine learning algorithm for classification tasks based on decision trees. We follow a standard procedure to train the XGBoost classifier. The dataset is split into a training set (75% of the dataset) and a test set (25% of the dataset) while maintaining the same proportion of each trumpet in each set. To optimize the performance of the XGBoost classifier, we employ a stratified 5-fold cross-validation [20] on the training set to fine-tune its hyperparameters, as illustrated in Figure 3. Adjusting these hyperparameters alters how decision trees are constructed from the training data.

Figure 3

Complete 5-fold stratified cross-validation procedure for hyperparameter search. (1) The dataset is randomly split into a training set and a testing set while maintaining equal proportions of each trumpet in both sets. (2) The training set is randomly split into 5 subsets (or folds), ensuring each trumpet’s proportion is preserved across folds. (3) The model is trained on 4 folds and its performance is evaluated (validated) on the remaining fold. This process repeats 5 times, resulting in 5 trained models with associated validation performances. (4) The accuracy is computed and averaged over the 5 validation sets. (5) Steps (3) and (4) are repeated with different hyperparameter sets, iterating this process 100 times. (6) The hyperparameter set yielding the highest average accuracy is selected. The final model is trained on the entire training set (from step 1) using these hyperparameters. The performance of the final model is evaluated on the testing set to assess its generalization ability.
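Steps (2) to (6) of this procedure can be sketched with the Python standard library alone. The sketch below is illustrative rather than the code used in this study: a toy one-dimensional k-nearest-neighbours classifier stands in for XGBoost, its number of neighbours plays the role of the hyperparameter set, and all function names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Step (2): split indices into k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin over the folds
    return folds

def cv_accuracy(make_model, X, y, folds):
    """Steps (3) and (4): train on k-1 folds, validate on the last, average."""
    scores = []
    for f, val in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        predict = make_model([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / len(scores)

def knn_factory(n_neighbors):
    """Toy stand-in for the classifier (NOT XGBoost): 1-D nearest neighbours."""
    def make_model(X_train, y_train):
        def predict(x):
            nearest = sorted(zip(X_train, y_train), key=lambda t: abs(t[0] - x))
            votes = [lab for _, lab in nearest[:n_neighbors]]
            return max(set(votes), key=votes.count)
        return predict
    return make_model

def random_search(X, y, candidates, n_iter=100, k=5, seed=0):
    """Steps (5) and (6): draw hyperparameters at random, keep the best one."""
    rng = random.Random(seed)
    folds = stratified_kfold(y, k, seed)
    best = None
    for _ in range(n_iter):
        h = rng.choice(candidates)
        score = cv_accuracy(knn_factory(h), X, y, folds)
        if best is None or score > best[1]:
            best = (h, score)
    return best  # (hyperparameter, mean validation accuracy)
```

The selected hyperparameter would then be used to refit the model on the whole training set before the single evaluation on the held-out test set of step (1).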

We use the mean accuracy, denoted as ACCmean, to evaluate the classification performance. ACCmean is computed as the number of correct classifications divided by the total number of classifications. Additionally, we replicate the experiment using descriptors represented in the principal components space. This is achieved by applying Principal Component Analysis (PCA) [21] to the normalized descriptors of the training set before training the XGBoost classifier. PCA is a statistical method that transforms descriptors into a new coordinate system, emphasizing the variance in the data.
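The accuracy metric and the PCA preprocessing can be sketched as follows, assuming NumPy is available. The sketch computes PCA through a singular value decomposition of the standardized training descriptors; the function names are hypothetical, and the projection learned on the training set is reused unchanged for the test set so that no test information leaks into the preprocessing.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Number of correct classifications divided by the total number."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_pca(X_train):
    """Standardize the descriptors, then obtain the principal axes by SVD."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    Z = (X_train - mean) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_ratio = S ** 2 / np.sum(S ** 2)  # explained variance ratio per axis
    return mean, std, Vt, explained_ratio

def transform(X, mean, std, Vt):
    """Project descriptors onto the principal axes learned on the training set."""
    return ((X - mean) / std) @ Vt.T
```

Here each row of `X_train` would hold the 10 descriptors of one bifurcation diagram, and the columns of the transformed array are the coordinates along the principal axes, ordered by decreasing explained variance.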

Figure 4 displays the confusion matrices of the classification outcomes using raw descriptors and descriptors represented in the principal components space. The confusion matrices show that classifying trumpets based on raw descriptors achieves a mean accuracy of 81.12%. However, preprocessing the descriptors using PCA significantly enhances the classification performance, yielding a mean accuracy of 99.50%. In summary, these results underscore the relevance of the performance descriptors in effectively distinguishing between the four instruments.

Figure 4

Confusion matrix for instrument prediction based on descriptors using the XGBoost. Left: features are raw descriptors, ACCmean = 81.12%. Right: features are descriptors represented in principal components space, ACCmean = 99.50%.

2.3.2 Interpretability of the classification results

To interpret the classification results, we analyze the feature importance using XGBoost. As the XGBoost classifier is based on decision trees, we measure the importance of each feature in the classification by calculating the gain of each feature. Gain quantifies how much each feature contributes to improving the model’s accuracy when it makes decisions. Figure 5 illustrates the gain of each feature in both cases: using descriptors represented in the original descriptor space and in the principal components space. Higher gain values indicate that a feature contributes more significantly to the classification process. In essence, features with higher gain values are more important because they lead to greater improvements in the model’s accuracy when utilized for decision-making.

Figure 5

Gain – improvement in accuracy – of each feature in the XGBoost classifier. Left: features are raw descriptors. Right: features are descriptors represented in principal components space.
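For reference, the gain associated with a single split in the usual XGBoost formulation can be written as below. This expression is recalled from the general description of the algorithm rather than from the present study: G and H denote the sums of the first- and second-order derivatives of the loss over the samples falling in the left (L) and right (R) child nodes, and λ_reg and γ are the regularization hyperparameters of XGBoost (λ_reg is unrelated to the λ of the Lasso problem in Section 3.3).

$$ \mathrm{Gain}=\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda_{\mathrm{reg}}}+\frac{G_R^2}{H_R+\lambda_{\mathrm{reg}}}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda_{\mathrm{reg}}}\right]-\gamma $$

Under this definition, the gain of a split is the reduction in regularized training loss it produces, and the importance of a feature aggregates this quantity over all the splits that use the feature (as an average or a sum, depending on the importance type chosen).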

In the descriptor space, no descriptor appears to be useless for classification. The most important descriptors are f0 fold, slope and f0max. This highlights that the bounds of the f0 trace are crucial performance descriptors for distinguishing trumpets in this space. However, relying solely on these descriptors is insufficient, as two trumpets with different virtual players can yield the same f0 bounds.

Looking at the descriptors in the principal components space, it is notable that the most important axes for the classification do not necessarily correspond to those with the highest explained variance ratio. Instead, the second, and then the ninth axes demonstrate larger importance. Examining the biplot of these axes in Figure 6, we observe that they primarily differentiate between trumpets. Even though the variance ratio on the ninth axis is almost negligible, these results are stable across all the validation splits of the cross-validation scheme. The robust classification performance of the XGBoost classifier using descriptors represented in the principal components space arises from its effective ability to distinguish between trumpets through decision trees in this transformed space. This study demonstrates that the selected set of performance descriptors can identify trumpets with minimal error. Therefore, we can try to generate these descriptors without solving the classical physical model. This approach would enable the creation of a fast and easy-to-use tool for trumpet designers.

Figure 6

Biplots of the principal component analysis axes with the highest gain in XGBoost classifier. The explained variance ratio of the axes is in parentheses. Arrows in the biplot represent descriptors. The direction of each arrow indicates increasing values for that descriptor, while the arrow’s length represents the descriptor’s contribution to overall variation in the dataset.

3 Prediction of performance descriptors

In this section, we outline the machine learning models utilized to predict the performance descriptors extracted from trumpet bifurcation diagrams. Firstly, we define the problem statement and discuss the virtual instruments and players employed to generate the training set. Following this, we present the results of a benchmark that compares the effectiveness of different machine learning algorithms in predicting the descriptors of bifurcation diagrams, providing insights into the most effective algorithm. Finally, we elaborate on the machine learning model chosen to predict the performance descriptors of the bifurcation diagrams.

3.1 Virtual instruments and virtual players

As explained in Section 2.2, the primary objective of this work is to replace numerical continuation calculations with a machine learning model capable of predicting the 10 descriptors outlined in Table 1 for a specific trumpet identified by its input impedance. Our goal is to replace each of the np virtual players, characterized by their lip parameters, with a machine learning model that can predict the 10 descriptors of the bifurcation diagram based on a given input impedance. To achieve this, it is crucial to generate a dataset that includes performance descriptors for a sufficiently large number of virtual instruments defined by their input impedance, specifically by their modal parameters. Moreover, we stress the importance of the model’s accuracy and reliability, especially within the design space – the range of modal parameters that encompasses various instruments. Therefore, the training set must be representative of the modal parameter space of real instruments, ensuring that the model performs accurately across this space.

To ensure coverage of the virtual trumpets’ conditions, modal parameters from impedance measurements of nine commercial trumpets from different makers are extracted using the high-resolution ESPRIT method [22]. These modal parameters set boundary values for each of the 11 modes defining the input impedance of the instruments.

The input impedance, decomposed into complex modes characterized by their complex poles sk and residues Ck, defines a design space of dimension nd = 44. Subsequently, a total of nt = 200 virtual trumpets are generated by randomly sampling these 44 parameters from a uniform distribution within the bounds provided by the measurements of the nine professional trumpets.
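The generation of the virtual trumpets can be sketched as follows. This is a minimal illustration, assuming that the bounds of each of the 44 parameters are the minimum and maximum observed over the measured instruments (the text only states that the measurements set the boundary values) and that each trumpet is stored as a flat list of 44 real numbers; the function names are hypothetical.

```python
import random

def bounds_from_measurements(measured):
    """Per-parameter lower and upper bounds over the measured instruments.

    Assumption: the bounds are the min and max over the measured trumpets.
    """
    columns = list(zip(*measured))
    return [min(c) for c in columns], [max(c) for c in columns]

def sample_virtual_trumpets(lower, upper, n_trumpets=200, seed=0):
    """Draw each modal parameter independently and uniformly within its bounds."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(n_trumpets)
    ]
```

Because the 44 parameters are drawn independently, a sampled trumpet is a combination of plausible modal values rather than the impedance of a manufacturable bore, which is consistent with calling these instruments virtual.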

Figure 7 depicts the corresponding impedances of the 200 virtual trumpets. For this study, focus is placed on the note B♭4 (with f0 ≃ 470 Hz), whose fundamental frequency lies between the fourth and fifth impedance peaks. To provide a detailed examination of the impedances of the virtual instruments around this fundamental frequency, a zoom around the fourth and fifth peaks is presented in Figure 8.

Figure 7

Normalized amplitude and phase angle of the impedances of the 200 virtual instruments generated to train the machine learning models.

Figure 8

Zoom around the fourth (left) and fifth (right) peaks of the impedances of the 200 virtual instruments.

As previously mentioned, a significant advantage of machine learning is its low computational cost, allowing consideration of various virtual players for each instrument. Each virtual player is defined by four parameters (Ql, μl, y0, and b), with the lip natural frequency fl calculated by LSA for each {player – instrument} pair. A set of np = 60 virtual players is generated using Latin hypercube sampling of Ql, μl, and y0, constrained within boundaries corresponding to ±10% of the parameter values outlined in Section 2.1 (Fig. 9). It is worth noting that these boundaries in the virtual player space are somewhat arbitrary; we assume that ±10% variations represent reasonable expectations for differences among human players or variations within a group of players of similar playing level.

Figure 9

Lip parameters associated with the 60 virtual players. The red dot represents the baseline virtual player (Ql = 3, μl = 2 kg·m−2, y0 = 0.1 mm).
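Latin hypercube sampling of the three lip parameters can be sketched with the standard library. The sketch below is illustrative and not the sampler used in this study: each parameter range is cut into as many equal strata as there are players, one value is drawn in every stratum, and the strata are shuffled independently for each parameter. The dictionary keys and the function name are hypothetical; the baseline values and the ±10% bounds are those given in the text.

```python
import random

def latin_hypercube(bounds, n_samples, seed=0):
    """One sample per stratum and per dimension, with strata shuffled per dimension."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        pts = [lo + (hi - lo) * (i + rng.random()) / n_samples
               for i in range(n_samples)]
        rng.shuffle(pts)  # decouple the stratum order between dimensions
        columns.append(pts)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Baseline virtual player of Section 2.1 (SI units: y0 in metres).
baseline = {"Ql": 3.0, "mu_l": 2.0, "y0": 1e-4}
bounds = [(0.9 * v, 1.1 * v) for v in baseline.values()]
players = latin_hypercube(bounds, n_samples=60)  # 60 tuples (Ql, mu_l, y0)
```

Compared with plain uniform sampling, this construction guarantees that every one-sixtieth of each parameter range is visited exactly once, which helps cover the player space evenly with only 60 points.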

Overall, the training set comprises a total of 12,000 bifurcation diagrams for the note B♭4, each yielding a set of 10 descriptors. These diagrams are computed using an automated Matlab routine and the MANLAB source code. Given that it takes approximately 3 min to compute one bifurcation diagram on the computer used for calculations, generating the entire training set required approximately 600 h (equivalent to 25 days) of computational time.

3.2 Comparative study of machine learning algorithms

We conducted a comparative analysis of various machine learning algorithms to predict the descriptors of bifurcation diagrams using the dataset generated in Section 3.1. Due to the relatively small size of the dataset, we focused on machine learning algorithms known for their performance on small datasets. The problem at hand is a regression task, where the objective is to predict each descriptor from the trumpet’s impedance for each {musician – descriptor} pair. We considered several machine learning algorithms for regression, including linear approaches such as Linear Regression, LassoLars [24], and Ridge Regression [25], as well as Support Vector Machine for Regression (SVR) [26] with various kernels, and Nearest Neighbors [27] approaches. Additionally, we included a baseline algorithm that predicts a constant value for each descriptor, set to the mean of the descriptor in the training set.

We follow a common approach to train the machine learning models (one for each musician-descriptor pair). First, we normalize the dataset by subtracting the mean of each modal parameter and dividing by its standard deviation. Then, we split the dataset into a training set (75% of the dataset) and a test set (25% of the dataset). For each algorithm, we employ a 5-fold cross-validation procedure to fine-tune its hyperparameters and evaluate its performance, as illustrated in Figure 3. The chosen performance metric is the Mean Absolute Percentage Error (MAPE), calculated as the mean of the absolute differences between the predicted and true descriptor values, divided by the true descriptor value. Table 2 summarizes these results. Despite its relative simplicity, the LassoLars approach outperforms the other methods in predicting the descriptor values of the bifurcation diagrams, displaying the lowest MAPE for each descriptor. Notably, for frequency descriptors (f0min, f0max, f0 fold and f0H), the prediction errors range between 0.5 and 0.7 cents.
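The normalization, the MAPE, and the conversion of a frequency error into cents can be written compactly. This is a minimal sketch with hypothetical function names; the MAPE is returned in percent, and the cent conversion uses the usual definition of 1200 cents per octave.

```python
import math
import statistics

def standardize(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * statistics.fmean(errors)

def cents(f_pred, f_true):
    """Signed interval between a predicted and a true frequency, in cents."""
    return 1200.0 * math.log2(f_pred / f_true)
```

As an order of magnitude, an error of 0.5 cents around 470 Hz corresponds to a frequency deviation of about 0.14 Hz.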

Table 2

Mean Absolute Percentage Error (MAPE) and Standard deviation of the Absolute Percentage Error (StdAPE) in predicting descriptor values using various machine learning algorithms on the test set. The best performance for each descriptor is highlighted in bold.

3.3 Focus on the LassoLars algorithm

In this section, we describe the LassoLars algorithm, which outperforms other algorithms in predicting the descriptors of the bifurcation diagrams.

Consider one of the descriptors obtained from the bifurcation diagram. The question we are exploring is whether, for a given musician, it is possible to predict the value of this descriptor using a linear combination of the nd = 44 modal coefficients and a bias (or intercept). If we consider a single trumpet, the answer is positive, and there are even an infinite number of possible combinations since the problem is linear and underdetermined (nd + 1 = 45 unknowns for a single equation). As we include more trumpets in the training set, the number of equations increases while the number of unknowns remains constant. However, when all nt = 200 trumpets are considered, it is highly unlikely that an exact solution to this overdetermined system exists (200 linear equations for 45 unknowns). Therefore, we seek the solution that minimizes, in the least squares sense, the difference between the 200 predicted descriptor values for the nt = 200 trumpets and the exact descriptor values obtained from their respective bifurcation diagrams. Thus, we aim to solve the following optimization problem for each descriptor d and each musician m:

$$ \underset{w_{m,d}\in \mathbb{R}^{n_d},\; w_{m,d}^0\in \mathbb{R}}{\mathrm{min}}\; \Vert y_{m,d}-Xw_{m,d}-w_{m,d}^0\Vert_2^2 \quad \mathrm{with}\quad y_{m,d}\in \mathbb{R}^{n_t},\; X\in \mathbb{R}^{n_t\times n_d}. $$ (2)

Here, the ith coordinate of ym,d represents the descriptor d of the bifurcation diagram of the mth musician and the ith trumpet. The ith row of the matrix X contains the nd = 44 modal coefficients of the ith trumpet, with i ∈ [1, nt]. Meanwhile, wm,d is the vector of coefficients of the linear model for the mth musician and the descriptor d.

However, to make the solution more interpretable, we complete this minimization problem by adding a constraint promoting the sparsest possible solution, i.e., with a maximum number of zero terms among the nd = 44 components of the solution vector wm,d. This way, the modal coefficients that contribute most to predicting the value of the descriptor are easily identified. The Lasso method addresses this problem by incorporating ℓ1-norm regularization into equation (2):

$$ \left\{\tilde{w}_{m,d},\; \tilde{w}_{m,d}^0\right\}= \underset{w_{m,d}\in \mathbb{R}^{n_d},\; w_{m,d}^0\in \mathbb{R}}{\mathrm{arg\,min}}\; \Vert y_{m,d}-Xw_{m,d}-w_{m,d}^0\Vert_2^2+\lambda \Vert w_{m,d}\Vert_1 \quad \mathrm{with}\quad \lambda \in \mathbb{R}^{+}. $$ (3)

Here, λ serves as a regularization parameter that controls the sparsity of the solution. This optimization problem is known as the Lasso problem [23]. To solve this Lasso problem efficiently, we employ the Lars algorithm [24], a least-angle regression algorithm solving (3) efficiently for a set of well-chosen λ values.
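To make the effect of the ℓ1 penalty concrete, problem (3) can be solved for a fixed λ with a few lines of NumPy. The sketch below deliberately uses cyclic coordinate descent with soft-thresholding instead of the Lars algorithm employed in this work; both target the same objective for a given λ, but this is a swapped-in solver chosen for brevity, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x towards zero by t, returning exactly 0 when |x| <= t."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise ||y - X w - w0||_2^2 + lam * ||w||_1 by coordinate descent."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean      # centring absorbs the unpenalised intercept
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq = (Xc ** 2).sum(axis=0)       # squared norm of each centred column
    residual = yc.copy()                 # residual = yc - Xc @ w, kept up to date
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Correlation of column j with the residual that ignores feature j.
            rho = Xc[:, j] @ residual + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam / 2.0) / col_sq[j]
            residual += Xc[:, j] * (w[j] - w_new)
            w[j] = w_new
    w0 = y_mean - x_mean @ w             # recover the intercept
    return w, w0
```

Coefficients whose correlation with the residual stays below λ/2 are set exactly to zero, which is the mechanism that yields sparse, interpretable models. Note that library implementations may scale the objective differently; for example, scikit-learn's `LassoLars` divides the squared error by 2·nt, so its `alpha` would correspond to λ/(2·nt) in the notation of equation (3).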

Figure 10 represents the bifurcation diagram skeletons generated using the machine learning (LassoLars) approach in solid lines, and using the physical model (continuation method) in dashed lines, for 3 players and 2 trumpets that the model has not encountered during its training. For the |p| diagram, it can be seen that the LassoLars algorithm accurately predicts the bifurcation diagrams of these unseen trumpets, with an error significantly lower than the inter-instrument and inter-player variability. For the f0 diagram, although the overlap between the prediction and continuation results is not as striking as in the |p| diagram, the relative position of the different curves is well preserved, and the slightly higher pitch for one trumpet (blue lines) is well predicted by the machine learning model. Note that the maximum difference in f0 value (along the stable branches) between the two instruments is quite low (about 7 cents), which helps explain why the machine learning prediction may look less accurate in this figure despite the very good prediction performance reported in Table 2.

Figure 10

Bifurcation diagram skeletons generated using the machine learning (LassoLars) model (solid line) and the continuation method (dashed line) for 2 trumpets (blue and purple) the model has not encountered during its training, considering 3 players.

We can conclude that the LassoLars algorithm is a reliable and robust model for predicting the descriptor values of bifurcation diagrams. Moreover, as detailed in the next section, the embedded sparsity constraint allows an insightful interpretation of the influence of modal coefficients on the descriptors of the bifurcation diagrams.

3.4 Interpretability of the LassoLars algorithm

We proceed to interpret the influence of modal coefficients on the descriptors of the bifurcation diagrams using the LassoLars algorithm by looking at the amplitude of the coefficients of $ \tilde{w}_{m,d}$. Each coefficient of $ \tilde{w}_{m,d}$ represents the impact of the corresponding modal coefficient on the target descriptor/musician pair. Because the Lasso favors sparse solutions, we can discern the impact of modal coefficients on descriptors by examining the non-zero coefficients of the linear model.

Figure 11 illustrates the normalized importance of modal coefficients in predicting Pmin1, averaged over all virtual players. We compute this importance by obtaining the mean $ \tilde{w}_{\mathrm{mean},P_{\mathrm{min}1}}$, where $ \tilde{w}_{\mathrm{mean},P_{\mathrm{min}1}}^i=\frac{1}{n_p}\sum_{m=1}^{n_p} \frac{|\tilde{w}_{m,P_{\mathrm{min}1}}^i|}{\Vert \tilde{w}_{m,P_{\mathrm{min}1}}\Vert_1},\ i\in [1,n_d]$, and examining the modal coefficients associated with its highest coefficients. Notably, the 4th and 5th acoustic modes contribute to over 60% of the importance in predicting Pmin1. This is quite coherent with the underlying physics of sound production. Indeed, due to the nonlinear coupling between the lips and the instrument, the fundamental frequency of the B♭4 lies between the 4th and 5th impedance peaks. In other words, the impedance “seen” by the instrument at the fundamental frequency strongly depends on these two resonances. Therefore, this is consistent with the expectation that the modal parameters of the 4th and 5th modes will have a significant effect on the dynamics of the system, especially concerning threshold blowing pressures.

Figure 11

Mean and standard deviation of the normalized importance of the top 25 modal coefficients in predicting the Pmin1 descriptor. $\mathfrak{R}(s_k)$ and $\mathfrak{I}(s_k)$ are the real and imaginary parts of the kth resonance pole of the acoustic impedance of the instrument. $\mathfrak{R}(C_k)$ and $\mathfrak{I}(C_k)$ are the real and imaginary parts of the kth resonance residue of the acoustic impedance of the instrument.
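The normalized importance defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `normalized_importance`, the layout of the weight matrix `W` (one row per virtual player, one column per modal coefficient), and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def normalized_importance(W):
    """Mean and standard deviation of L1-normalized absolute weights.

    W is assumed to have shape (n_players, n_coeffs): row m holds the
    linear-model weights for virtual player m and one fixed descriptor.
    """
    abs_w = np.abs(np.asarray(W, dtype=float))
    l1 = abs_w.sum(axis=1, keepdims=True)  # L1 norm of each player's weights
    # Divide each row by its L1 norm; rows that are entirely zero stay at zero.
    share = np.divide(abs_w, l1, out=np.zeros_like(abs_w), where=l1 > 0)
    return share.mean(axis=0), share.std(axis=0)

# Toy weights (hypothetical values): 2 players, 4 modal coefficients.
W = np.array([[0.0, 2.0, -2.0, 0.0],
              [1.0, 0.0, -3.0, 0.0]])
mean_imp, std_imp = normalized_importance(W)
ranking = np.argsort(mean_imp)[::-1]  # coefficient indices, most important first
```

Because every row is divided by its own L1 norm, the mean importances sum to one whenever no row is entirely zero, which is what allows them to be read as percentage contributions.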

4 Tool for trumpet designers

To make this technology easy to use for trumpet designers, a software tool was developed. From modal parameters given as input, it computes the 10 descriptors for the 60 virtual musicians and automatically displays them in a convenient and readable way.

In practice, users can input modal parameters for one or multiple trumpets, and the tool returns predicted descriptor values for all virtual players in the form of Figure 12. In this example, the descriptors are calculated for three trumpets that fall within the training space (they respect the boundaries of the modal parameters) but were not used to train the machine learning model. Figure 12 then provides an overview of the characteristics of the three instruments concerning the descriptors. The boxplot representation allows us to observe differences in the medians. For instance, there are differences in intonation (playing frequencies). We also observe a smaller value of the descriptor slope (that could also be associated with the dynamic range of the instrument) for trumpet 3 compared to the others. Moreover, it also sheds light on the dispersion around the median, which we may interpret as the sensitivity of the instrument to the virtual musicians. For instance, trumpet 1 appears less sensitive to the player parameters in terms of the threshold pressure Pmin1 than trumpets 2 and 3. This last aspect is particularly interesting for the comparison of instruments and for predicting the perceived quality, and a benefit of the machine learning model that allows very fast computation of the descriptors for several virtual musicians.

Figure 12

Box plot of descriptor predictions for the 60 virtual players on 3 test instruments (unseen during the model training).
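The two quantities read from such a box plot, the median and the dispersion around it, can be computed directly from the per-player predictions. The sketch below is illustrative only: the function name `summarize`, the dictionary layout, and the numerical values are hypothetical and do not come from the tool or from the article's data. The interquartile range is used here as one possible measure of an instrument's sensitivity to the virtual players.

```python
import numpy as np

def summarize(predictions):
    """Median and interquartile range of one descriptor for each trumpet.

    predictions maps a trumpet name to the predicted descriptor values,
    one value per virtual player (assumed layout).
    """
    summary = {}
    for name, values in predictions.items():
        q1, med, q3 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
        summary[name] = {"median": float(med), "iqr": float(q3 - q1)}
    return summary

# Hypothetical predictions of a single descriptor for 5 virtual players.
preds = {
    "trumpet_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "trumpet_2": [0.0, 2.0, 4.0, 6.0, 8.0],
}
stats = summarize(preds)
# The trumpet with the smallest spread is the least sensitive to the player.
least_sensitive = min(stats, key=lambda name: stats[name]["iqr"])
```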

Finally, users can also train a new model by providing a new dataset of modal parameters and descriptors. The tool is built with the Python programming language, Scikit-learn [28] and Streamlit (https://docs.streamlit.io/get-started). It provides almost real-time computation: training takes only 2 min and prediction less than 1 s on a regular laptop. A video demonstration of the tool is available under the reference [29].

5 Conclusions

In this article, we have proposed an approach to enable trumpet designers to access the results of non-linear dynamics calculations. More precisely, using a machine-learning approach, we have shown that the use of a surrogate model enables the value of 10 descriptors characteristic of the detailed bifurcation diagram to be predicted in less than a second. This result is obtained directly without the need for the user to have any expertise in non-linear dynamics.

Following a benchmark between several classic solutions in the machine learning literature, the article shows that the most accurate model for carrying out this task is obtained by the LassoLars method. To achieve a large enough training set, we showed in the article how to take advantage of the experience acquired in recent years in analyzing the non-linear dynamics of physical trumpet models using numerical continuation. This approach allows constructing detailed bifurcation diagrams that take account of the acoustic characteristics of the instrument (via its input impedance) and the way it is played (via the values of the musician’s parameters). The training set was made up of around 12,000 bifurcation diagrams from which the values of 10 descriptors were extracted for each one. Building the training set is the longest stage in our approach since it involves both the physical model and complex analyses of how it works. It corresponds to around 600 h of computation.

From a practical point of view, the solution proposed in the article of a software interface for querying the surrogate model in an instrument-making context responds to two difficulties: firstly, calculation codes in non-linear dynamics require know-how and scientific knowledge that not all musical instrument designers have. Secondly, a database of bifurcation diagrams, even a large one, is by its very nature incomplete, since instrument designers are conceiving instruments that do not yet exist. The solution presented in the article therefore allows musical instrument designers to be autonomous in their exploration of the performance of different trumpets. Even for a model they are currently designing, they can instantly compare it with other trumpets, for different types of musicians.

We defer the analysis of multiple musical notes for further study. That investigation could provide a more comprehensive understanding of trumpet dynamics across its entire range. This would entail examining how different notes influence the instrument’s behavior and its interaction with players. Moreover, delving deeper into the modeling of the virtual player could yield valuable insights into the nuanced requirements for trumpet design. This could involve learning to predict the descriptors of the bifurcation diagram from both the virtual player’s parameters and the modal parameters of the trumpet using models such as Physics-Informed Neural Networks (PINNs). However, this step towards a more comprehensive surrogate model was beyond the scope of this paper. Regarding the trumpets themselves, the approach proposed in this article could be extended to instruments with modal parameters outside the bounds of the commercial instruments used in this study. Such an extension would enable trumpet designers to explore instrument behavior beyond the traditional range of modal parameters and compare them with existing instruments.

Conflicts of interest

The authors declare that they have no conflicts of interest.

Data availability statement

The audio files are publicly available from GitHub under the reference [29].

References

  1. M. Campbell, J. Gilbert, A. Myers: The science of brass instruments, Springer International Publishing, New York City, USA, 2021.
  2. J.S. Cullen, J. Gilbert, D.M. Campbell: Brass instruments: linear stability analysis and experiments with an artificial mouth, Acta Acustica 86 (2000) 704–724.
  3. L. Velut, C. Vergez, J. Gilbert, M. Djahanbani: How well can linear stability analysis predict the behaviour of an outward-striking valve brass instrument model? Acta Acustica united with Acustica 103 (2017) 132–148.
  4. T. Kaburagi, C. Kuroki, S. Hidaka, S. Ishikawa: Numerical method for analyzing steady-state oscillation in trumpets, Acoustical Science and Technology 44, 3 (2023) 269–280.
  5. V. Fréour, L. Guillot, H. Masuda, S. Usa, E. Tominaga, Y. Tohgi, C. Vergez, B. Cochelin: Numerical continuation of a physical model of brass instruments: application to trumpet comparisons, Journal of the Acoustical Society of America 148, 2 (2020) 748–758.
  6. V. Fréour, H. Masuda, B. Cochelin, C. Vergez: Identification of lip parameters associated to different trumpets using constrained continuation, in: Proceedings of Forum Acusticum 2023, Turin, Italy, 11–15 September, 2023.
  7. J.F. Petiot, M. Roatta, V. Fréour, K. Arimoto: Contribution of machine learning and physic-based sound simulations for the characterization of brass instruments, in: Proceedings of Forum Acusticum 2023, Turin, Italy, 11–15 September, 2023.
  8. B. Cochelin: A path following technique via an asymptotic-numerical method, Computers and Structures 53, 5 (1994) 1181–1192.
  9. B. Cochelin, C. Vergez: A high order purely frequency-based harmonic balance formulation for continuation of periodic solutions, Journal of Sound and Vibration 324 (2009) 242–262.
  10. J. Gilbert, S. Maugeais, C. Vergez: Minimal blowing pressure allowing periodic oscillations in a simplified reed musical instrument model: Bouasse-Benade prescription assessed through numerical continuation, Acta Acustica 4 (2020) 27.
  11. R. Matteoli, J. Gilbert, C. Vergez, J.-P. Dalmont, S. Maugeais, S. Terrien, F. Ablitzer: Minimal blowing pressure allowing periodic oscillations in a model of bass brass instruments, Acta Acustica 5 (2021) 57.
  12. R. Matteoli, J. Gilbert, S. Terrien, J.-P. Dalmont, C. Vergez: Diversity of ghost notes in tubas, euphoniums and saxhorns, Acta Acustica 6 (2021) 32.
  13. V. Fréour, M. Mohamed, K. Arimoto, V. Emiya, B. Cochelin, C. Vergez: Machine learning applied to the prediction of trumpet bifurcation diagrams: towards a tool for trumpet designers, in: Proceedings of Forum Acusticum 2023, Turin, Italy, 11–15 September, 2023.
  14. R. Tournemenne, J.F. Petiot, B. Talgorn, M. Kokkolaras, J. Gilbert: Brass instruments design using physics-based sound simulation models and surrogate-assisted derivative-free optimization, Journal of Mechanical Design 139 (2017) 0141401-1–011401-9.
  15. S. Gonzales, D. Salvi, D. Baeza, F. Antonacci, A. Sarti: A data-driven approach to violin making, Scientific Reports 11 (2021) 9455.
  16. F. Mokdad, S. Missoum: A fully parametrized finite element model of a grand piano soundboard for sensitivity analysis of the dynamic behavior, in: Proceedings of the ASME 2013, Portland, Oregon, USA, August 4–7, 2013.
  17. S. Adachi, M.A. Sato: Trumpet sound simulation using a two-dimensional lip vibration model, Journal of the Acoustical Society of America 99, 2 (1996) 1200–1209.
  18. V. Fréour, L. Guillot, H. Masuda, C. Vergez, B. Cochelin: Parameter identification of a physical model of brass instruments by constrained continuation, Acta Acustica 6 (2022) 9.
  19. T. Chen, C. Guestrin: Xgboost: a scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August, Association for Computing Machinery, 2016, pp. 785–794.
  20. D. Berrar: Cross-validation, Encyclopedia of Bioinformatics and Computational Biology 1 (2019) 542–545.
  21. H. Hotelling: Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24, 6 (1933) 417.
  22. R. Roy, T. Kailath: Esprit: estimation of signal parameters via rotational invariance techniques, IEEE Transactions on Acoustics, Speech, and Signal Processing 37, 7 (1989) 984–995.
  23. R. Tibshirani: Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996) 267–288.
  24. B. Efron, T. Hastie, I. Johnstone, R. Tibshirani: Least angle regression, Annals of Statistics 32, 2 (2004) 407–451.
  25. A.E. Hoerl, R.W. Kennard: Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12, 1 (1970) 55–67.
  26. V. Vapnik: The nature of statistical learning theory Chapter 5–6, Springer Science & Business Media, Berlin, Germany, 2013.
  27. N.S. Altman: An introduction to kernel and nearest-neighbor nonparametric regression, American Statistician 46, 3 (1992) 175–185.
  28. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas: Scikit-learn: machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
  29. M. Mohamed: A machine learning tool for the prediction of trumpet bifurcation diagrams – Supplementary materials, GitHub, 2024. Available at https://github.com/mimoun-mohamed-lab/Prediction-of-trumpet-descriptors.

Cite this article as: Mohamed M, Fréour V, Vergez C, Arimoto K, Emiya V, et al. 2024. Prediction of trumpet performance descriptors using machine learning. Acta Acustica, 8, 65. https://doi.org/10.1051/aacus/2024042.
