Issue
Acta Acust.
Volume 7, 2023
Topical Issue - CFA 2022
Article Number 28
Number of page(s) 12
DOI https://doi.org/10.1051/aacus/2023022
Published online 14 June 2023

© The Author(s), Published by EDP Sciences, 2023

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

Models of brass instruments have been used for a long time to understand the physics (cf. [1]) and synthesize their sound (cf. [2]). However, one remaining difficulty is the calibration of these models as many of the constants appearing in the mathematical equations are difficult to measure, in particular when they involve human body parts such as the lips of a brass player.

A good calibration of these parameters is useful not only for sound synthesis, where a realistic sound is the main goal, but also for theoretical purposes, as the equations governing models of brass instruments are nonlinear, and are therefore very sensitive to changes in the constants of the model.

In the literature, many models exist for the lips, with different degrees of complexity, either as a one degree of freedom oscillator (which is the most common), or a two degrees of freedom oscillator taking into account different polarities (cf. [3]), and models trying to come closer to the geometry of the opening section of vibrating lips (cf. [1], Sect. 5.1.2). However, the more complex the model, the more constants there are that need to be calibrated, and the higher the uncertainty.

The calibration problem is not restricted to lip models of brass players. A related problem is that of the embouchure of reed instruments, which may seem easier as measurements can be made directly on the reed. The reed parameters are the subject of a large literature (see e.g., [4–10]), which can serve as a source of methods for the lips.

Concerning brass instruments, the body of literature is smaller, although several articles deserve to be cited and serve as references for the present work (see [11] for a table summarizing known values). Most notably, [12] was one of the first to give a complete set of parameters, [13] built an asymptotic state observer, [14] used simulated annealing with a cost function depending on the playing frequency only, [3] used data from a high-speed camera, and [15] used bifurcation diagrams.

The present article aims to provide a new method to identify embouchure parameters that can be used with very little apparatus on actual musicians, and that can follow their evolution while playing. It introduces a new cost function which is a combination of [14] (for the frequency part) and [8] (without the displacement), together with some penalization (see Sect. 2.4). An optimization algorithm is used to minimize this cost function on recordings with actual musicians and the results are discussed (cf. Sect. 3).

2 Method

2.1 Model

To reduce the number of parameters that must be calibrated, the model chosen in the present article is the simplest one, described by (1), with only three equations (cf. [16]), which has been shown to replicate many of the properties of brass instruments (see e.g., [17]). It relates the mouth pressure $p_m$ to the mouthpiece pressure $p$ and the lip opening $h$ through a spring-mass-dashpot equation describing the lips, a valve effect computing the flow $u$ through the lips from the pressure difference, and the expression of the input impedance:

$$ \left\{\begin{array}{l}\ddot{h}+\frac{\omega_\ell}{Q_\ell}\dot{h}+\omega_\ell^2\left(h-H\right)=\frac{p_m-p}{\mu}\\ \dot{p}_n=Z_c C_n u+s_n p_n\\ p=2\mathfrak{R}\left(\sum_{n=1}^N p_n\right)\\ u=w h^{+}\,\mathrm{sgn}\left(p_m-p\right)\sqrt{\frac{2\left|p_m-p\right|}{\rho}}\end{array}\right. $$ (1)
where $h^{+} = \max(h, 0)$, $w$ is the width of the lips, and $Z_c$ the characteristic impedance.

Here, the impedance is decomposed using modal analysis as a sum of simple fractions

$$ Z(\omega)=Z_c\left(\sum_n \frac{C_n}{j\omega-s_n}+\frac{C_n^{*}}{j\omega-s_n^{*}}\right) $$ (2)
the $p_n$ being complex valued.

The variables pn form a decomposition of the mouthpiece pressure using the special form of the impedance. They are not really modal coordinates, as they are not naturally orthogonal for some scalar product, but give a convenient way to solve the problem.

The input impedance is measured through an impedance bridge and is therefore known. It is decomposed into formula (2) using the Rational Fraction Polynomial method (see [18], Sect. 4.4.3), giving the values for sn and Cn which are therefore fixed for a given instrument.
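As an illustration of formula (2), the following minimal sketch evaluates the impedance from a set of modal coefficients. The function name `modal_impedance` and the numerical values of `C`, `s` and `Zc` are hypothetical and chosen only for the example; they are not the measured trombone values of Table A3.

```python
import cmath

def modal_impedance(omega, Zc, C, s):
    """Evaluate formula (2): Zc * sum_n [C_n/(j w - s_n) + C_n*/(j w - s_n*)]."""
    jw = 1j * omega
    total = 0j
    for Cn, sn in zip(C, s):
        total += Cn / (jw - sn) + Cn.conjugate() / (jw - sn.conjugate())
    return Zc * total

# Hypothetical two-mode example (illustrative values only).
Zc = 1.0
C = [complex(100.0, 0.0), complex(150.0, 5.0)]
s = [complex(-10.0, 2 * cmath.pi * 110.0), complex(-15.0, 2 * cmath.pi * 230.0)]

Z_peak = modal_impedance(2 * cmath.pi * 110.0, Zc, C, s)   # near the first resonance
Z_off = modal_impedance(2 * cmath.pi * 170.0, Zc, C, s)    # between the two resonances
```

Because each pole appears together with its complex conjugate, the impedance computed this way satisfies $Z(-\omega)=Z(\omega)^{*}$, as expected for a real-valued impulse response.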

The values of $C_n$ and $s_n$ computed for the Bb trombone in first position and used in the measurements (Courtois bass trombone with a Holton mouthpiece) are given in Table A3.

The mask parameters are the remaining constants appearing in equations (1): the lip angular resonance frequency $\omega_\ell=2\pi F_\ell$, the quality factor $Q_\ell$, the lip surface density $\mu$ and the lip opening at rest $H$.

The set of equations (1) can be rewritten as a real-valued ordinary differential equation (cf. [16]) $\dot{y}=f(y)$ of order 1 in dimension 2N + 2 with the state variable

$$ y=[h,\dot{h},\mathfrak{R}(p_1),\mathfrak{I}(p_1),\dots,\mathfrak{R}(p_N),\mathfrak{I}(p_N)]. $$ (3)

Such a formulation is very convenient both for time numerical simulation, and for the study of bifurcation diagrams, as computed by the software auto-07p [19] (see Sect. 2.9).

The time simulations performed in this project are based on the Runge-Kutta 4 algorithm and are coded in C language as a large number of them are performed. The sampling rate is set to 44,100 Hz as it gives a sufficient precision for the results (comparable to measured data), ensures the stability and convergence of the numerical scheme, and is sufficiently fast (0.18 s for a one second signal).
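The first-order formulation and the Runge-Kutta 4 scheme can be sketched as follows. This is a minimal pure-Python illustration, not the C code used in this work; the names `rhs`, `rk4_step`, `simulate` and the parameter dictionary `prm` are assumptions made for the example, and the numerical values are hypothetical.

```python
import math

def rhs(y, prm):
    """Right-hand side f(y) of system (1), with y = [h, dh, Re p1, Im p1, ...]."""
    h, dh = y[0], y[1]
    p = 2.0 * sum(y[2::2])                       # p = 2 Re(sum_n p_n)
    dp = prm["pm"] - p
    # Flow through the lips: u = w h+ sgn(pm - p) sqrt(2 |pm - p| / rho)
    u = prm["w"] * max(h, 0.0) * math.copysign(1.0, dp) * math.sqrt(2.0 * abs(dp) / prm["rho"])
    wl = 2.0 * math.pi * prm["Fl"]
    out = [dh, dp / prm["mu"] - wl / prm["Ql"] * dh - wl ** 2 * (h - prm["H"])]
    for n, (Cn, sn) in enumerate(zip(prm["C"], prm["s"])):
        pn = complex(y[2 + 2 * n], y[3 + 2 * n])
        dpn = prm["Zc"] * Cn * u + sn * pn       # complex modal equation
        out += [dpn.real, dpn.imag]
    return out

def rk4_step(y, dt, prm):
    """One classical Runge-Kutta 4 step."""
    k1 = rhs(y, prm)
    k2 = rhs([a + 0.5 * dt * b for a, b in zip(y, k1)], prm)
    k3 = rhs([a + 0.5 * dt * b for a, b in zip(y, k2)], prm)
    k4 = rhs([a + dt * b for a, b in zip(y, k3)], prm)
    return [a + dt / 6.0 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def simulate(prm, duration, fs=44100):
    """Integrate from the zero initial state; return final state and pressure samples."""
    y = [0.0] * (2 + 2 * len(prm["C"]))
    dt = 1.0 / fs
    pressure = []
    for _ in range(int(duration * fs)):
        y = rk4_step(y, dt, prm)
        pressure.append(2.0 * sum(y[2::2]))
    return y, pressure

# Sanity check with hypothetical parameters: with pm = 0 there is no flow,
# the mouthpiece pressure stays at zero and the lips relax to h = H.
prm = {"pm": 0.0, "w": 12e-3, "rho": 1.2, "Fl": 100.0, "Ql": 3.0, "mu": 1.0,
       "H": 1e-4, "Zc": 1.0, "C": [complex(100.0, 0.0)],
       "s": [complex(-10.0, 2 * math.pi * 110.0)]}
y_end, p = simulate(prm, duration=0.2)
```

Setting `pm` to zero is only a convenient check of the integrator; an actual simulation would use the measured mouth pressure and the modal coefficients of the instrument.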

The time simulations have to be performed on a time range long enough so that stationary regime is attained. In practice, this means signals of up to 4 s have to be simulated as transient regime can be quite long for some mask parameters.

For a set of mask parameters

$$ \mathfrak{M}=(\omega_\ell,Q_\ell,H,\mu), $$ (4)
we write $p_{\mathfrak{M}}$ for the mouthpiece pressure obtained by solving the system (1) with zero initial value y(0) = [0, …, 0] (cf. Eq. (3)). All the other parameters of the model, including $p_m$, are fixed and constant during one optimization.

2.2 Experimental protocol

The goal of the project is to get access to as many mask parameters as possible with as little apparatus as necessary so as not to hinder the musician’s playing. We therefore focused on only two piezoresistive pressure sensors (Endevco 8507C-5): one in the mouth of the musician (or artificial mouth), and one in the mouthpiece. Simultaneous recordings give access at each time step to the quantities pm and p. Both signals are sampled at 44,100 Hz.

As only one period of p is needed by the algorithm below, the signal can be broken into pieces to see the evolution of parameters (see Sect. 2.4). Once a period is chosen, pm is averaged on the same time period to have a constant value to feed the numerical simulation.

Three sets of measures were performed with two experienced amateur trombone players, one of them being recorded twice, labeled A1, A2 and B in the rest of this article. For each session, the musician was asked to play 6 notes on a Bb bass trombone in first position (Bb2, F3, Bb3, D4, F4, Bb4), together with a bend on F4 (first down, then up), and a crescendo on F4.

2.3 Extraction of a signal’s period

For a given sampled signal p, either simulated (as in Eq. (1)) or measured (see Sect. 2.2), determining a period is critical for the method. The first step is to identify the periodic regime and its frequency, which is done using a Python implementation [20] of the Yin algorithm (see [21]).

The Yin algorithm produces an estimator called the harmonic rate, a real number between 0 and 1 that quantifies how periodic the signal is: a very small harmonic rate (ideally 0) means that the signal is close to periodic. In practice, we consider the signal to be periodic when the harmonic rate is lower than $10^{-3}$, and extract the part of the periodic regime with the smallest harmonic rate.

Yin also gives an estimation of the instantaneous frequency F at each point. From this, it is already possible to extract a waveform p of duration exactly one period in the periodic regime. However, as this waveform has to be compared to another one coming from a reference signal, a phase condition has to be fixed. We therefore demand that all the waveforms

$$ \left\{\begin{array}{l}\text{begin by crossing } 0,\\ \text{in an increasing way.}\end{array}\right. $$ (5)

This is always possible in practice as p has a mean value equal to 0. The normalization is achieved by considering the waveform p and shifting it to the left until the first point satisfies the phase condition, giving rise to a new waveform $\tilde{p}$ which is used as a reference.

It should be noted that on a general signal, the phase condition may not be sufficient to uniquely determine $\tilde{p}$. However, for the signals obtained either numerically or experimentally, this condition proved to be sufficient.

For a set of mask parameters $\mathfrak{M}$, we also write $F_{\mathfrak{M}}$ and $\tilde{p}_{\mathfrak{M}}$ for the frequency and normalized waveform of the signal $p_{\mathfrak{M}}$ obtained in Section 2.1.
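The phase normalization can be sketched as below, assuming that the waveform is stored as a list covering exactly one period and that "shifting to the left" is implemented as a circular rotation. The function name `normalize_phase` is hypothetical.

```python
import math

def normalize_phase(period):
    """Rotate a one-period waveform so that it starts at an upward zero crossing."""
    n = len(period)
    for k in range(n):
        prev, cur = period[k - 1], period[k]     # period[-1] wraps around (periodicity)
        if prev < 0.0 <= cur:
            return period[k:] + period[:k]
    raise ValueError("no upward zero crossing found")

# Example: one period of a sine wave with an arbitrary phase offset.
N = 100
raw = [math.sin(2 * math.pi * k / N + 1.0) for k in range(N)]
p_tilde = normalize_phase(raw)
```

The sketch returns the first upward crossing it finds; as noted above, a general signal may have several such crossings, in which case an additional criterion would be needed.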

2.4 Definition of the cost function

The cost function compares two periodic signals $p_{\mathrm{ref}}$ and $p$, from which frequencies $F_{\mathrm{ref}}$, $F$ and waveforms $\tilde{p}_{\mathrm{ref}}$, $\tilde{p}$ are extracted. The signal $p_{\mathrm{ref}}$ can be either a recorded signal, or a simulated signal obtained from known mask parameters (for test purposes), and is the reference against which the model outputs are compared. Although in theory it should be enough to compare $\tilde{p}_{\mathrm{ref}}$ and $\tilde{p}$, doing so puts too much emphasis on the waveform itself, and too little on the frequencies. As both timbre and intonation are important for the applications, it is necessary to add an extra weight to the difference in frequencies.

The preliminary cost function $\mathcal{C}_{\circ}$ is therefore

$$ \mathcal{C}_{\circ}\left(p_{\mathrm{ref}},p\right)=\frac{1}{\|\tilde{p}_{\mathrm{ref}}\|_2^2}\int_0^{\min\left(\frac{1}{F_{\mathrm{ref}}},\frac{1}{F}\right)}\left(\tilde{p}_{\mathrm{ref}}(t)-\tilde{p}(t)\right)^2\mathrm{d}t+\alpha_F\left(1200\,\log_2\left(F_{\mathrm{ref}}/F\right)\right)^2 $$ (6)
with

$$ \|\tilde{p}_{\mathrm{ref}}\|_2^2=\int_0^{1/F_{\mathrm{ref}}}\tilde{p}_{\mathrm{ref}}(t)^2\,\mathrm{d}t. $$

The first term of the sum is the square of the relative RMS difference, and the second one is the square of the frequency difference expressed in cents, weighted by $\alpha_F$.

The choice of the constant αF guides the optimization procedure either toward a better approximation of the waveform (small αF) or toward a better approximation of the frequency (big αF).

In our case, the choice of $\alpha_F = 0.02$, obtained by trial and error, leads to good results during optimization for trombone sounds, in that intonation (errors around 10 cents) and waveforms (errors around 30%) are respected. In particular it means that the difference in cents between two signals is $\le \sqrt{\mathcal{C}_{\circ}(p_{\mathrm{ref}},p)/\alpha_F}$.
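A discrete version of the preliminary cost function (6) can be sketched as follows, assuming that both normalized waveforms are sampled at the same rate and that the integrals are approximated by Riemann sums. The function name `preliminary_cost` and its signature are hypothetical.

```python
import math

def preliminary_cost(p_ref, F_ref, p, F, fs=44100, alpha_F=0.02):
    """Riemann-sum approximation of (6) for two normalized one-period waveforms."""
    dt = 1.0 / fs
    norm2 = sum(v * v for v in p_ref) * dt               # squared L2 norm of the reference
    n = min(len(p_ref), len(p))                          # integrate up to min(1/F_ref, 1/F)
    rms2 = sum((a - b) ** 2 for a, b in zip(p_ref[:n], p[:n])) * dt / norm2
    cents = 1200.0 * math.log2(F_ref / F)                # frequency difference in cents
    return rms2 + alpha_F * cents ** 2

# Artificial example: identical waveforms, first with equal frequencies,
# then with frequencies one equal-tempered semitone (100 cents) apart.
N = 100
wave = [math.sin(2 * math.pi * k / N) for k in range(N)]
cost_same = preliminary_cost(wave, 440.0, wave, 440.0)
cost_semitone = preliminary_cost(wave, 440.0, wave, 440.0 * 2 ** (1 / 12))
```

In the second case the waveform term vanishes and the cost reduces to $\alpha_F \times 100^2$, which illustrates how strongly the frequency term dominates for large detunings.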

2.5 Penalization

As is already well known (cf. [8]) the inversion problem is not well defined and it is actually easy to find multiple sets of mask parameters which give signals with very similar waveforms (see Tab. 1).

Table 1

Values of two different sets of mask parameters giving almost identical signals: relative RMS difference is 0.9% and difference in frequencies is 1.41 cents. Mouth pressure is equal for both simulations and fixed at 2500 Pa.

This means in particular that the cost function lacks convexity. One typical solution to remedy this problem is to convexify the cost function using Tikhonov regularization, which amounts to adding quadratic terms with respect to some mask parameters. More precisely, we define the complete cost function $\mathcal{C}$

$$ \mathcal{C}\left(p_{\mathrm{ref}},\mathfrak{M}\right)=\mathcal{C}_{\circ}\left(p_{\mathrm{ref}},p_{\mathfrak{M}}\right)+\beta_Q Q_\ell^2+\beta_H H^2. $$ (7)

The choice of the specific penalization has been made on $Q_\ell$ and $H$ because it proved to be

  • sufficient to have a well defined solution up to a sufficiently good precision (cf. Tab. 1),

  • necessary to remove very different solutions (cf. Sect. 2.8).

It should be noted that this particular choice of penalization, instead of a more general form like $(H - H_0)^2$ for a reference $H_0$ which should be fixed for the whole optimization, implies that the optimization procedure will favor solutions with the smallest quality factor and lip opening. This was chosen for lack of a good candidate for $H_0$.

This particular choice of penalization proved to give results close to those in the literature, except for H (cf. Sect. 3).

The values of $\beta_Q$ and $\beta_H$ are fixed so that a typical value of each penalization term is of the same magnitude as $\mathcal{C}_{\circ}(p_{\mathrm{ref}},p)$. As we expect $\mathcal{C}_{\circ}(p_{\mathrm{ref}},p)$ to be about 0.3 (cf. Sect. 3), with $Q_\ell \cong 7$ and $H \cong 10^{-4}$ m (chosen among the known values in [11]), we took $\beta_Q = 5 \times 10^{-3}$ and $\beta_H = 3 \times 10^{7}$ m$^{-2}$.
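As a quick check of these orders of magnitude, reading the typical values as $Q_\ell \cong 7$ and $H \cong 10^{-4}$ m and the weights as $\beta_Q = 5\times10^{-3}$ and $\beta_H = 3\times10^{7}$ m$^{-2}$, the two penalization terms of (7) evaluate to

$$ \beta_Q Q_\ell^2 \approx 5\times10^{-3}\times 7^2 = 0.245, \qquad \beta_H H^2 \approx 3\times10^{7}\,\mathrm{m^{-2}}\times\left(10^{-4}\,\mathrm{m}\right)^2 = 0.3, $$

both of which are indeed comparable to the expected value of about 0.3 for $\mathcal{C}_{\circ}$.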

2.6 Continuity, optimization algorithm

The algorithm chosen to find the minimum of the cost function is the dual annealing optimization (cf. [22]) which is a stochastic algorithm that requires neither the cost function to be regular, nor the minimum to be unique.

Indeed, the cost function in this article is not continuous: a small variation of the mask parameters can lead to completely dissimilar solutions. For example, a trombone player can obtain the different notes by varying only the lip resonance frequency: the variation of playing frequency with lip resonance frequency clearly has jumps (cf. [16], Fig. 4).

Another problem of the cost function is that it has many local minima. Although we do not have a mathematical or musical reason for this, it is clearly seen during the optimization process using dual annealing as it performs local searches before jumping to other locations (see Tab. 2 where each line represents a local minimum).

Table 2

Data for optimization: mouth pressure $p_m = 1656$ Pa and width of the lips $w = 12 \times 10^{-3}$ m. Search space is $F_\ell \in [150, 200]$, $Q_\ell \in [0.1, 6]$, $\mu \in [0.1, 3]$, $H \in [10^{-5}, 10^{-3}]$. Error on playing frequency: 6 cents.

2.7 Limitations of dual annealing

The dual annealing algorithm is known to give a solution for some very slow (logarithmically) decreasing temperatures (cf. [22]), but that kind of evolution of the temperature implies a very slow convergence. In practice, a faster decreasing temperature is used, but the convergence is not assured.

Moreover, as this algorithm is of stochastic nature, there is no simple criterion to stop it, and an arbitrary condition has to be chosen: the choice was made to bound by 1000 the number of calls to the cost function. With this choice, a typical run lasts about 1 h on a desktop computer.

The probabilistic nature of this algorithm also means that two runs of the same algorithm, with the same initialization data (except for the seed of the random generator) give different solutions that can be quite different, as the global minimum might not have been reached in one run. It is therefore often necessary to launch the algorithm iteratively until a new run does not produce a solution with lower cost function.
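The iterative relaunching described above can be sketched as follows, assuming the dual annealing implementation of SciPy (`scipy.optimize.dual_annealing`) with its `maxfun` and `seed` arguments. The helper `minimize_with_restarts` and the function `toy_cost` are hypothetical: `toy_cost` is a cheap stand-in for the real cost function, which would require a time simulation for each call.

```python
from scipy.optimize import dual_annealing

def minimize_with_restarts(cost, bounds, max_calls=1000, max_runs=10, seed=0):
    """Rerun dual annealing with new seeds until a run no longer lowers the cost."""
    best = None
    for run in range(max_runs):
        res = dual_annealing(cost, bounds, maxfun=max_calls, seed=seed + run)
        if best is not None and res.fun >= best.fun:
            break                                # no improvement: stop iterating
        best = res
    return best

# Search space of Table 2, in the order F_l, Q_l, mu, H.
bounds = [(150.0, 200.0), (0.1, 6.0), (0.1, 3.0), (1e-5, 1e-3)]

def toy_cost(x, target=(170.0, 3.0, 1.0, 2e-5)):
    """Hypothetical stand-in cost: normalized squared distance to a target point."""
    return sum(((xi - ti) / (hi - lo)) ** 2
               for xi, ti, (lo, hi) in zip(x, target, bounds))

best = minimize_with_restarts(toy_cost, bounds)
```

The stopping rule used here (stop as soon as a new run fails to improve the best cost) is one possible reading of the procedure; the cap `max_runs` is an additional safeguard introduced for the sketch.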

2.8 Precision of the algorithm

For a given set of mask parameters $\mathfrak{M}_{\mathrm{ref}}$, the goal is to find an algorithm that captures an approximation of $\mathfrak{M}_{\mathrm{ref}}$ with only the knowledge of $\tilde{p}_{\mathrm{ref}}$ and $F_{\mathrm{ref}}$. The algorithm we propose here is not able to do that without a prior assumption on $\mathfrak{M}_{\mathrm{ref}}$, namely

$$ (*)\quad \mathfrak{M}_{\mathrm{ref}}\ \text{must be a minimum of the cost function}\ \mathfrak{M}\mapsto\mathcal{C}\left(p_{\mathrm{ref}},\mathfrak{M}\right). $$

We say that a set of mask parameters satisfying this hypothesis is $\mathcal{C}$-admissible. A random set of mask parameters is not $\mathcal{C}$-admissible in general. Indeed, two sets $\mathfrak{M}_1$ and $\mathfrak{M}_2$ can give very similar waveforms (cf. Tab. 1), so that $\mathcal{C}_{\circ}(p_{\mathrm{ref}},p_{\mathfrak{M}_1})=\mathcal{C}_{\circ}(p_{\mathrm{ref}},p_{\mathfrak{M}_2})$, but have different quality factors or lip openings, implying for example $\mathcal{C}(p_{\mathrm{ref}},\mathfrak{M}_1)<\mathcal{C}(p_{\mathrm{ref}},\mathfrak{M}_2)$. In that case, $\mathfrak{M}_2$ cannot be $\mathcal{C}$-admissible.

Moreover, this definition depends strongly on the choices made in the definition of $\mathcal{C}$, be it $\alpha_F$ or the choice of penalizations. A set of mask parameters may be $\mathcal{C}$-admissible for one choice, but no longer for another one.

The assumption in this article is that for every “realistic” signal (i.e. coming from the recording of a trombone), there is only one set of mask parameters that is $\mathcal{C}$-admissible. It is not at all clear that this is true, and this is even known to be false if the penalizations are not added (see Sect. 2.5). Taking this assumption for granted, a set of mask parameters obtained by minimization of the cost function is automatically $\mathcal{C}$-admissible.

To find a suitable $\mathcal{C}$-admissible set of mask parameters and to stay close to an actual trombone signal, so that the robustness of the optimization procedure can be tested, we applied the optimization procedure to a reference signal from a recorded D4 (cf. Sect. 2.2), as summarized in Figure 1.

Figure 1

Block diagram explaining how to get a $\mathcal{C}$-admissible set of mask parameters from a recorded signal and use it to assess the precision of the algorithm.

The obtained values are given in the second column of Table 2, denoted by $\mathfrak{M}_{\mathrm{ref}}$. The set of mask parameters used for the initialization of the dual annealing algorithm is given in the third column, and the result of the optimization algorithm is in the last one. The search space is given in its caption.

The resulting waveforms for the two sets of mask parameters are indistinguishable, with a relative RMS error of only 3%, and an error on the frequencies of about 6 cents.

The results of optimization are acceptable (much lower than the dispersion of the values found in the literature) and representative of the errors we found on other simulations (cf. Sect. 3) except for the opening at rest H, but its value is so small that it is hard to give a physical interpretation (see Sect. 3.2.4).

The difference between $\mathfrak{M}_{\mathrm{ref}}$ and $\mathfrak{M}_{\mathrm{optim}}$ may be explained by the fact that we had to stop the algorithm at one point (cf. Sect. 2.7) or because $\mathcal{C}$ is insufficiently convexified.

2.9 Continuation

During the analysis of the bend (cf. Sect. 3.4), the continuation software auto-07p ([19]) is used to follow the evolution of the playing frequency with different parameters.

The continuation is first initialized for a reference set of mask parameters $\mathfrak{M}_1$ which is chosen for each musician to be the point which minimizes the cost function $\mathcal{C}_{\circ}$ among all the optimized values of mask parameters, so as to be as close as possible to the actual recording.

The bifurcation diagram is built using the dependency on $p_m$ up to the recorded value of the reference signal, so that auto-07p is now precisely set to the signal $p_{\mathfrak{M}_1}$.

Then a continuation curve along one of the mask parameters (either $Q_\ell$, $F_\ell$, $H$ or $\mu$) is computed, all other physical variables being fixed, and the playing frequency is plotted.

3 Results

The results obtained for the different musicians are presented in the following subsections, but first it is interesting to look at the mouth pressure as a function of the playing frequency (see Fig. 2). Although these data are directly recorded, and not optimized, we can clearly see differences between the two recording sessions of musician A: the second session has a larger mouth pressure, which could translate into perceptible differences within the optimized data of a single musician.

Figure 2

Measured mouth pressure Pm averaged over one period as a function of the playing frequency for all three musicians and the six notes.

3.1 Errors on sustained notes

Both RMS and frequency errors obtained at the end of optimization are presented in Figures 3 and 5. The RMS error can be quite large for some notes (up to 40%), which is not surprising as the model is one of the simplest and many physical details are neglected. As the optimization looks for a best fit among all mask parameters, this means the model should be made more complex to take into account more of the physics of the instrument if precision on timbre and playing frequency has to be maintained, provided the dual annealing algorithm gives results close enough to the global minimum.

Figure 3

RMS error of the signals obtained from optimized mask parameters, relative to the measured signal for 6 different notes and the three musicians.

Figure 4

Typical waveform for the recorded signal (green) and for the signal obtained from optimized mask parameters (orange) as played by musician A2 on a Bb3. The difference between signals is drawn in dashed red. The relative RMS error is 0.28.

Figure 5

Frequency error in cents of the signals obtained from optimized mask parameters, relative to the recorded signal for 6 different notes and the three musicians.

For reference, a typical waveform is shown in Figure 4 where the reference signal is given in green, the reconstructed signal is in orange and the difference between them is in dotted red. The relative RMS error for this particular signal is 0.28. Although many properties are well approximated, the higher harmonics of the signal are clearly not in agreement with the experimental signal. This gives a typical value that can be expected for the RMS error.

Concerning repeatability, estimations of the mask parameters of musician A are coherent and give almost the same results for both the RMS error and the frequency error (except for the playing frequency of the note Bb4). However, the errors differ largely between the two players, player B mostly getting the lowest error. This may suggest that the two musicians use different techniques, and that player B is closer to the simple model (1).

Note in particular the difference between errors for the note F3, where musician A has the largest RMS error of all (cf. Fig. 4), and musician B has one of the lowest.

3.2 Discussion on sustained notes

3.2.1 Lip resonance frequency

The lip resonance frequency as a function of the playing frequency is shown in Figure 6 for all three musicians. As for any outward model, the frequency of the lips is lower than the playing frequency (cf. [23]), which is clearly seen in this figure as the circles are below the line $F_{\mathrm{play}}=F_\ell$.

Figure 6

Variation of $F_\ell$ as a function of $F_{\mathrm{play}}$ for all three musicians in circles, together with the diagonal $F_{\mathrm{play}}=F_\ell$ in dashed black, and the regression line in dashed red.

Note that for a given frequency, there is little dispersion from player A (A1 or A2) to player B. Moreover the regression line

$$ F_\ell=0.9366\,F_{\mathrm{play}}-31.57 $$ (8)
gives a good fit with $R^2 = 0.996$ and could be used as a first estimation of the playing frequency using only the lip frequency.
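For illustration, the regression line (8) and its inverse can be written as two small functions. The function names are hypothetical, and the frequencies are assumed to be expressed in Hz; since the fit was obtained on the six recorded notes (Bb2 to Bb4), using it outside that range is an extrapolation.

```python
def lip_frequency_from_playing(F_play):
    """Regression line (8): lip resonance frequency from the playing frequency."""
    return 0.9366 * F_play - 31.57

def playing_frequency_from_lip(F_lip):
    """Inverse of (8): first estimate of the playing frequency from the lip frequency."""
    return (F_lip + 31.57) / 0.9366

# Example for a playing frequency of 349.23 Hz (equal-tempered F4 with A4 = 440 Hz).
F_lip_F4 = lip_frequency_from_playing(349.23)
```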

3.2.2 Quality factor

The estimated quality factors for all three players are displayed in Figure 7. As in the case of lip frequency (cf. Sect. 3.2.1) results are very close for all three players, and also for all notes, being between 2 and 5.

Figure 7

Variation of $Q_\ell$ as a function of $F_{\mathrm{play}}$ for all three musicians in circles.

Compared to the literature, they are however smaller than the measured values of [24] (between 9 and 10.5) but comparable to the estimations of [25] (around 5), [26] (between 1.2 and 1.8), [27] (around 3.7), [28] (around 2.88) and [29] (between 0.5 and 3). Except for the first reference, this justifies the penalization on $Q_\ell$, which tends to favor the smallest possible values.

The values obtained for the two recordings of player A are always very close; this may mean that the quality factor depends very little on the loudness.

3.2.3 Surface density

The optimized values of $\mu^{-1}$ are given in Figure 8. Except for the 4 highest values, they are comparable to [28]. However, they are overestimated compared to other values found in the literature ([12, 24, 25, 27]) where they are between 0.03 and 0.2 m$^2$·kg$^{-1}$.

Figure 8

Variation of $\mu^{-1}$ as a function of $F_{\mathrm{play}}$ for all three musicians.

We can see that the data for A1 is systematically higher than that of A2, indicating a possible dependency on the mouth pressure and the loudness.

3.2.4 Opening at rest

The values of the opening at rest obtained by optimization are given in Figure 9. They seem very small compared to what was obtained by other authors, up to a factor 10: a typical value obtained by optimization is around $2 \times 10^{-5}$ m (see Fig. 9) whereas [24] and [27] have a typical value of $5 \times 10^{-4}$ m, [25] $2 \times 10^{-4}$ m, and [29] $1.10 \times 10^{-3}$ m, all with comparable lip widths.

Figure 9

Variation of H as a function of Fplay for the three musicians.

However, when shifting from opening at rest to mean opening (cf. Fig. 10) using formula (A3), which takes also into account the mouth pressure, the lip frequency and the lip surface density, the results are comparable to those of [30], which are between 0.6 mm and 2 mm. It should be noted that in this case, the value of opening at rest is negligible in the formula (A3) in Appendix.
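Formula (A3) is given in the Appendix and is not reproduced here. As a hedged sketch of why the mouth pressure, the lip frequency and the lip surface density enter the mean opening, one can average the first line of (1) over one period of the periodic regime: the averages of $\ddot{h}$ and $\dot{h}$ vanish, $p_m$ is constant and $p$ has zero mean (cf. Sect. 2.3), so that the model implies

$$ \bar{h}=H+\frac{p_m}{\mu\,\omega_\ell^2}, $$

where $\bar{h}$ denotes the mean opening over one period. This relation is an assumption about the form of (A3), consistent with its description above; with $H$ of the order of $2 \times 10^{-5}$ m, the first term is small compared to mean openings between 0.6 mm and 2 mm.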

Figure 10

Mean opening of the lips as a function of Fplay for the three musicians.

Just as for $\mu^{-1}$, there appears to be a correlation between loudness and mean opening. This is not only expected, but actually obvious from formula (A3), where the mouth pressure appears.

3.3 Optimization for bent notes

The musicians were instructed to perform pitch bends on F4: without moving the slide, the player used embouchure adjustments to vary the pitch, first below its normal value, then above, then below, and then back to F4. For each recording of approximately 10 s, the signals are cut into chunks of 0.2 s, and the optimization procedure is applied independently on each chunk. The note F4 has been chosen because it is one of the most comfortable to bend for the musician. The RMS and frequency errors can be found in Figures 11 and 12.
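The segmentation step can be sketched as follows, assuming the recording is available as a list of samples at 44,100 Hz. The function name `split_into_chunks` is hypothetical, and the choice to discard an incomplete final chunk is an assumption of this sketch.

```python
def split_into_chunks(signal, fs=44100, chunk_duration=0.2):
    """Cut a sampled signal into consecutive chunks of chunk_duration seconds."""
    n = int(round(fs * chunk_duration))          # number of samples per chunk
    # An incomplete final chunk, if any, is discarded.
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

# Example: a 10 s recording (here a dummy signal of zeros).
recording = [0.0] * (10 * 44100)
chunks = split_into_chunks(recording)
```

Each chunk would then go through the period extraction of Section 2.3 and the optimization procedure, with $p_m$ averaged over the selected period.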

Figure 11

RMS error for bent note on F4 for all three musicians. The dashed line represents the average frequency of the actual played F4 with no bend.

Figure 12

Frequency error for bent note on F4 for all three musicians. The dashed line represents the average frequency of the actual played F4 with no bend.

The RMS error is around 0.25, except for a group of notes played by musician B with low frequencies, which may be related to a particular technique used by this musician.

The frequency error is quite low except for the lowest notes, and is in agreement with the very low frequency error found for F4 in Figure 5. This suggests that the model is not able to predict precisely what the musician is doing for the lowest frequencies of the bend.

Indeed, there seem to be two regions for the errors, with a change at around 352 Hz, as if the optimization process could not find parameters that fit the playing frequency below that value. In practice, musicians often use special techniques to bend to very low notes, such as using the vocal tract. This technique is clearly not taken into account in the model, and it seems the algorithm indicates its own limits.

3.4 Discussion on bent notes

During bending, the musician varies many parameters. This makes it quite difficult to see the influence of any of them. In the following diagrams, the evolution of playing frequency is shown with respect to the mask parameters. To put it into perspective, the theoretical evolution with respect to only the considered parameter (the other parameters being kept constant) is also computed using auto-07p (see Sect. 2.9). The mask parameters used to initialize the continuation are those with smallest cost function among all the optimized values for this recording, to ensure that the model is as close to the measures as possible.

3.4.1 Quality factor $Q_\ell$

The results of the optimization for bent notes are presented in Figure 13 for the quality factor. The values obtained are within the same range as in Figure 7.

Figure 13

Variation of $Q_\ell$ as a function of $F_{\mathrm{play}}$. Crosses represent the measures for all three musicians, circles the initialization parameter for continuation (see Sect. 2.9), and lines the continuation obtained with auto-07p. The dashed line represents the average frequency of the actual played F4 with no bend.

One striking feature is the proximity between the measures for musician B and the results of continuation obtained by auto-07p. It seems as if the playing frequency is completely predicted by the evolution of the quality factor. However, the precision of the fit must be put into perspective with the rather large errors in the optimization (see Figs. 11 and 12).

The fit is not so good with musician A, although the results of the continuation go in the right direction.

3.4.2 Lip resonance frequency Fl$ {F}_{\mathcal{l}}$

The results of optimization for the lip resonance frequency are given in Figure 14, and are quite difficult to interpret. Even more than in Figure 12, there seem to be two regions, one before 352 Hz, and one after.

Figure 14

Variation of $F_{\ell}$ as a function of $F_{\mathrm{play}}$. Crosses represent the measurements for all three musicians, circles the initialization parameter for continuation (see Sect. 2.9), and lines the continuation obtained with auto-07p. The dashed line represents the average frequency of the actual played F4 with no bend.

Above 352 Hz, the estimated lip frequency shows no clear trend. Although the lip frequency could be expected to increase with playing frequency, as in Figure 6, this is not what appears in the figure. This suggests that the lip frequency acts only as a coarse tuner, while the quality factor acts as the fine tuner.

3.4.3 Lip surface density μ

The results of the optimization for bent notes are presented in Figure 15 for the lip surface density. The values are compatible with those in Figure 8, and the evolution of playing frequency relative to μ is compatible with the theoretical one obtained by continuation.

Figure 15

Variation of $\mu^{-1}$ as a function of $F_{\mathrm{play}}$. Crosses represent the measurements for all three musicians, circles the initialization parameter for continuation (see Sect. 2.9), and lines the continuation obtained with auto-07p. The dashed line represents the average frequency of the actual played F4 with no bend.

3.4.4 Opening at rest H

The results of the optimization for bent notes are presented in Figure 16 for the opening at rest. The values obtained are within the same range as in Figure 9.

Figure 16

Variation of $H$ as a function of $F_{\mathrm{play}}$. Crosses represent the measurements for all three musicians, circles the initialization parameter for continuation (see Sect. 2.9), and lines the continuation obtained with auto-07p. The dashed line represents the average frequency of the actual played F4 with no bend.

As explained in Section 3.2.4, the values obtained for H are very small, and therefore not very well defined (see error term in Tab. 2).

The mean opening obtained from other optimized values and formula (A3) is given in Figure 17, and a clear tendency can be observed above 350 Hz: the mean opening increases with the playing frequency for all musicians.

Figure 17

Variation of $H_{\mathrm{mean}}$ as a function of $F_{\mathrm{play}}$. Crosses represent the measurements for all three musicians, circles the initialization parameter for continuation (see Sect. 2.9), and lines the continuation obtained with auto-07p. The dashed line represents the average frequency of the actual played F4 with no bend.

Below 350 Hz the tendency is not so clear. Moreover, one must be careful with interpretation as there may be other phenomena involved than those directly modeled (cf. Sect. 3.3).

3.5 Optimization for a crescendo

The same procedure as in Section 3.3 was used for the recordings of a crescendo on F4 for all three musicians.

The relative RMS error is presented in Figure 18, and indicates that the higher the mouth pressure, the higher the RMS error. This shows that the simple model (1) is good at reproducing the timbre for low pressure, but not so much for higher pressures. This may be due to the nonlinear propagation along the length of the trombone.
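As an illustration of the quantity discussed above, the following minimal sketch computes a relative RMS error between a measured and a simulated signal. It assumes the error is the RMS of the difference divided by the RMS of the measured signal, and that both signals are sampled on the same time grid and already aligned in phase; the function name `relative_rms_error` and the test signals are hypothetical and not taken from the article.

```python
import numpy as np

def relative_rms_error(measured, simulated):
    """RMS of (simulated - measured) divided by the RMS of measured.

    Assumption: both signals share the same sampling grid and are
    already time-aligned; this normalization is an illustrative choice.
    """
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    if measured.shape != simulated.shape:
        raise ValueError("signals must have the same shape")
    rms_diff = np.sqrt(np.mean((simulated - measured) ** 2))
    rms_ref = np.sqrt(np.mean(measured ** 2))
    return rms_diff / rms_ref

# Synthetic example: one period of a sine, and a copy with 80% of its amplitude.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
measured_demo = np.sin(2.0 * np.pi * t)
simulated_demo = 0.8 * measured_demo
error_demo = relative_rms_error(measured_demo, simulated_demo)
print(f"relative RMS error: {error_demo:.3f}")
```

With this definition, a pure amplitude mismatch of 20% gives a relative error of 0.2, which helps to read the orders of magnitude reported for the crescendo.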

Figure 18

Variation of RMS error as a function of Pm for all three players.

The error in frequency is presented in Figure 19 and is compatible with that in Figure 5. It shows that the model (1) is actually quite good at reproducing the playing frequency, whatever the playing dynamic. This indicates that the limits of the model lie not so much in the frequency as in the timbre.
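Frequency errors are expressed in cents throughout this section. As a reminder of the scale, the sketch below uses the standard definition of the cent (1200 cents per octave) to convert between a frequency ratio and an interval in cents; the helper names `cents` and `shift_by_cents` and the 350 Hz reference are illustrative choices, not values from the article.

```python
import math

def cents(f_sim, f_ref):
    """Interval in cents between a simulated and a reference frequency."""
    return 1200.0 * math.log2(f_sim / f_ref)

def shift_by_cents(f_ref, c):
    """Frequency obtained by shifting f_ref by c cents."""
    return f_ref * 2.0 ** (c / 1200.0)

# Illustrative reference frequency near the F4 discussed in the text.
f_ref = 350.0
f_shifted = shift_by_cents(f_ref, 1.5)
print(f"1.5 cents above {f_ref} Hz: {f_shifted:.3f} Hz "
      f"(difference {f_shifted - f_ref:.3f} Hz)")
```

At this pitch, an error of a few cents corresponds to a fraction of a hertz.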

Figure 19

Variation of frequency error as a function of Pm for all three players.

The waveforms for two different dynamics are shown in Figure 20, for both the measured signal and the signal reconstructed from the optimized mask parameters. The difference in timbre is clearly seen for the forte recording.

Figure 20

Comparison of waveforms for measured signal vs. optimized signal for musician B. Top for piano (relative RMS error: 20%, playing frequency error 1.5 cents), and bottom for forte (relative RMS error: 34%, playing frequency error 0.7 cents).

The details of the figures obtained through optimization during a crescendo, as well as a comparison of measured vs. simulated sounds with these parameters, can be found at http://perso.univ-lemans.fr/~smauge/mask.

4 Conclusion

In this article, a new method is proposed to estimate the mask parameters of a brass musician within a set of acceptable parameters (so-called $\mathcal{C}$-admissible). This approach is applied to recordings of actual musicians during playing, and is able to deliver a coherent set of parameters (except maybe for the opening at rest), in that they are not too far from existing results in the literature, and their values evolve during playing in a way that is compatible with theory.

The values obtained show that a simple model is already capable of reproducing a playing frequency close to that played by an actual musician, with a dynamic and waveform that are similar to the measured ones. This may prove useful for instrument making, although more research should be done to assess the robustness of the method, and investigate the variability of the mask parameters from player to player. Moreover, it can give new leads to a better understanding of intonation, such as the almost linear relation between playing and lip frequencies, or the role of quality factor as a fine tuner.

Furthermore, the system seems to be able to detect when a particular technique is used, for example on the lowest part of bent notes where the vocal tract is used by the musician, as the difference between the techniques is clearly seen on the cost function.

Acknowledgments

The authors would like to thank Christophe Vergez for many discussions and very helpful comments on the manuscript.

In memoriam

This article is dedicated to the memory of Joël Gilbert (1963–2022), who was at the origin of this project and a constant source of ideas.

Appendix

A.1 Mean opening

Suppose that $p$ and $h$ are $T$-periodic, and write $\bar{f}$ for the mean value over one period of any $T$-periodic function $f$. Applying this averaging to the equation

$$ h'' + \frac{\omega_{\ell}}{Q_{\ell}} h' + \omega_{\ell}^2 (h - H) = \frac{p_{\mathrm{m}} - p}{\mu} \qquad \text{(A1)} $$

gives

$$ \overline{h''} + \frac{\omega_{\ell}}{Q_{\ell}} \overline{h'} + \omega_{\ell}^2 \left(\bar{h} - H\right) = \frac{p_{\mathrm{m}} - \bar{p}}{\mu}. \qquad \text{(A2)} $$

As $h$ is $T$-periodic, $\overline{h'} = \overline{h''} = 0$, and as $\bar{p} = 0$ we get $\omega_{\ell}^2 (\bar{h} - H) = \frac{p_{\mathrm{m}}}{\mu}$. So that

$$ \bar{h} = H + \frac{p_{\mathrm{m}}}{\mu \omega_{\ell}^2}. \qquad \text{(A3)} $$
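Formula (A3) can be checked numerically. The sketch below evaluates the mean opening and then verifies the averaging argument on a synthetic periodic opening: if $h$ has the mean given by (A3), the pressure $p$ deduced from (A1) has zero mean. All numerical values are hypothetical choices inside the search space of Table 2 (apart from the mouth pressure of 1656 Pa quoted there), and the synthetic waveform is purely illustrative; none of them are results from the article.

```python
import numpy as np

def mean_opening(H, p_m, mu, F_l):
    """Mean lip opening from formula (A3): H + p_m / (mu * omega_l**2).

    Units assumed here: H in m, p_m in Pa, mu in kg/m^2, F_l in Hz.
    """
    omega_l = 2.0 * np.pi * F_l
    return H + p_m / (mu * omega_l ** 2)

# Hypothetical mask parameters (illustrative only).
H, mu, F_l, Q_l = 1e-4, 1.0, 180.0, 3.0
p_m = 1656.0
h_mean = mean_opening(H, p_m, mu, F_l)

# Synthetic T-periodic opening with mean h_mean and two harmonics.
T = 1.0 / 350.0
t = np.linspace(0.0, T, 4096, endpoint=False)
w = 2.0 * np.pi / T
a, b = 2e-4, 5e-5
h = h_mean + a * np.cos(w * t) + b * np.sin(2.0 * w * t)
dh = -a * w * np.sin(w * t) + 2.0 * b * w * np.cos(2.0 * w * t)
d2h = -a * w ** 2 * np.cos(w * t) - 4.0 * b * w ** 2 * np.sin(2.0 * w * t)

# Pressure deduced from (A1); its mean over one period should vanish.
omega_l = 2.0 * np.pi * F_l
p = p_m - mu * (d2h + omega_l / Q_l * dh + omega_l ** 2 * (h - H))
mean_p = p.mean()
print(f"mean opening: {h_mean:.6e} m, mean of p: {mean_p:.2e} Pa")
```

The derivative terms average to zero over a period, so only the static balance between the mouth pressure and the lip stiffness determines the mean opening.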

A.2 Tables

Table A1

Optimization steps. Each line represents the result of a gradient descent performed by the dual annealing algorithm.

Table A2

Values obtained by optimization for different notes played by musician B (mezzo forte) for a width w = 0.012 m. Comparison of measured vs. simulated sounds with these parameters can be found at http://perso.univ-lemans.fr/smauge/mask/#sounds.

Table A3

Values of the coefficients in the equation for the impedance decomposition for the Bb trombone in first position.

References

  1. M. Campbell, J. Gilbert, A. Myers: The science of brass instruments. Springer Nature Switzerland AG, Cham, 2021.
  2. R.L. Harrison-Harsley: Physical modelling of brass instruments using finite-difference time-domain methods. Ph.D. thesis, University of Edinburgh, 2018.
  3. H. Boutin, J. Smith, J. Wolfe: Trombone lip mechanics with inertive and compliant loads (“lipping up and down”). The Journal of the Acoustical Society of America 147, 6 (2020) 4133–4144.
  4. F. Avanzini, M. van Walstijn: Modelling the mechanical response of the reed-mouthpiece-lip system of a clarinet. Part I. A one-dimensional distributed model. Acta Acustica united with Acustica 90, 3 (2004) 537–547.
  5. M. van Walstijn, F. Avanzini: Modelling the mechanical response of the reed-mouthpiece-lip system of a clarinet. Part II: A lumped model approximation. Acta Acustica united with Acustica 93, 3 (2007) 435–446.
  6. A. Muñoz Arancón, B. Gazengel, J.-P. Dalmont, E. Conan: Estimation of saxophone reed parameters during playing. The Journal of the Acoustical Society of America 139, 5 (2016) 2754–2765.
  7. V. Chatziioannou, M. van Walstijn: Estimation of clarinet reed parameters by inverse modelling. Acta Acustica united with Acustica 98, 4 (2012) 629–639.
  8. T. Helie, C. Vergez, J. Levine, X. Rodet: Inversion of a physical model of a trumpet, in: Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No. 99CH36304), vol. 3, 1999, pp. 2593–2598. ISSN: 0191–2216.
  9. V. Chatziioannou, S. Schmutzhard, M. Pàmies-Vilà, A. Hofmann: Investigating clarinet articulation using a physical model and an artificial blowing machine. Acta Acustica united with Acustica 105, 4 (2019) 682–694.
  10. T. Smyth, J.S. Abel: Toward an estimation of the clarinet reed pulse from instrument performance. The Journal of the Acoustical Society of America 131, 6 (2012) 4799–4810.
  11. L. Velut, C. Vergez, J. Gilbert, M. Djahanbani: How well can linear stability analysis predict the behaviour of an outward-striking valve brass instrument model? Acta Acustica united with Acustica 103, 1 (2017) 132–148.
  12. S.J. Elliott, J.M. Bowsher: Regeneration in brass wind instruments. Journal of Sound and Vibration 83, 2 (1982) 181–217.
  13. B. d’Andréa-Novel, J.-M. Coron, T. Hélie: Asymptotic state observers for a simplified brass instrument model. Acta Acustica united with Acustica 96, 4 (2010) 733–742.
  14. C. Vergez, P. Tisserand: The BRASS project, from physical models to virtual musical instruments: playability issues, in: R. Kronland-Martinet, T. Voinier, S. Ystad (Eds.), Computer music modeling and retrieval. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2006, pp. 24–33.
  15. V. Fréour, L. Guillot, H. Masuda, C. Vergez, B. Cochelin: Parameter identification of a physical model of brass instruments by constrained continuation. Acta Acustica 6 (2022) 9.
  16. R. Mattéoli, J. Gilbert, C. Vergez, J.-P. Dalmont, S. Maugeais, S. Terrien, F. Ablitzer: Minimal blowing pressure allowing periodic oscillations in a model of bass brass instruments. Acta Acustica 5 (2021) 57.
  17. R. Mattéoli, J. Gilbert, S. Terrien, J.-P. Dalmont, C. Vergez, S. Maugeais, E. Brasseur: Diversity of ghost notes in tubas, euphoniums and saxhorns. Acta Acustica 6 (2022) 32.
  18. D.J. Ewins: Modal testing: Theory, practice and application, 2nd edn. Research Studies Press, Baldock Hertfordshire England Philadelphia PA, 2000.
  19. E.J. Doedel: Auto-07p, continuation and bifurcation software for ordinary differential equations. Ver. 0.9.3. https://github.com/auto-07p/auto-07p.
  20. P. Guyot: Fast Python implementation of the Yin algorithm. (Version v1.1.1). Zenodo. https://doi.org/10.5281/zenodo.1220947.
  21. A. de Cheveigné, H. Kawahara: YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America 111, 4 (2002) 1917–1930.
  22. Y. Xiang, D.Y. Sun, W. Fan, X.G. Gong: Generalized simulated annealing algorithm and its application to the Thomson model. Physics Letters A 233, 3 (1997) 216–220.
  23. M. Campbell: Brass instruments as we know them today. Acta Acustica united with Acustica 90, 4 (2004) 600–610.
  24. J.S. Cullen, J. Gilbert, D.M. Campbell: Brass instruments: Linear stability analysis and experiments with an artificial mouth. Acta Acustica united with Acustica 86, 4 (2000) 704–724.
  25. I. Lopez, A. Hirschberg, A. Van Hirtum, N. Ruty, X. Pelorson: Physical modeling of buzzing artificial lips: The effect of acoustical feedback. Acta Acustica united with Acustica 92, 6 (2006) 1047–1059.
  26. M.J. Newton: Experimental mechanical and fluid mechanical investigations of the brass instrument lip-reed and the human vocal folds. Ph.D. thesis, University of Edinburgh, 2009.
  27. O. Richards: Investigation of the lip reed using computational modelling and experimental studies with an artificial mouth. Ph.D. thesis, University of Edinburgh, 2003.
  28. X. Rodet, C. Vergez: Physical models of trumpet-like instruments. Detailed behavior and model improvements, in: Proceedings of the 1996 International Computer Music Conference, ICMC 1996, Hong Kong, August 19–24, 1996. Michigan Publishing, 1996, pp. 448–453.
  29. S. Adachi, M. Sato: Time-domain simulation of sound production in the brass instrument. The Journal of the Acoustical Society of America 97, 6 (1995) 3850–3861.
  30. S. Bromage, M. Campbell, J. Gilbert: Open areas of vibrating lips in trombone playing. Acta Acustica united with Acustica 96, 4 (2010) 603–613.

Cite this article as: Maugeais S. & Gilbert J. 2023. Brass player’s mask parameters obtained by inverse method. Acta Acustica, 7, 28.

