Brass player's mask parameters obtained by inverse method

An optimization method is proposed to find mask parameters of a brass player coming from a one degree of freedom lip model, with only constant mouth pressure and periodic mouthpiece pressure as input data, and a cost function relying on the waveform and the frequency of the signal. It delivers a set of parameters called C-admissible, which is a subset of all mask parameters that allows the inverse problem to be well defined up to an acceptable precision. Values for the mask parameters are found that give a good approximation of real signals, with an error on the playing frequency of less than 5 cents for some notes. The evolution of the mask parameters is assessed during recordings with real musicians playing bent notes, and their effects on the playing frequency are compared to the theoretical change on a model.


Introduction
Models of brass instruments have been used for a long time to understand the physics (cf. [1]) and synthesize their sound (cf. [2]). However, one remaining difficulty is the calibration of these models as many of the constants appearing in the mathematical equations are difficult to measure, in particular when they involve human body parts such as the lips of a brass player.
A good calibration of these parameters is useful not only for sound synthesis, where a realistic sound is the main goal, but also for theoretical purposes, as the equations governing models of brass instruments are nonlinear, and are therefore very sensitive to changes in the constants of the model.
In the literature, many models exist for the lips, with different degrees of complexity, either as a one degree of freedom oscillator (which is the most common), or a two degrees of freedom oscillator taking into account different polarities (cf. [3]), and models trying to come closer to the geometry of the opening section of vibrating lips (cf. [1], Sect. 5.1.2). However, the more complex the model, the more constants there are that need to be calibrated, and the higher the uncertainty.
The question of calibration of the parameters is not restricted to lip models of brass players, and a related problem is that of the embouchure of reed instruments which may seem easier as measures can be undertaken directly on the reed. The study of the reed parameters is the source of a large literature (see e.g., [4][5][6][7][8][9][10]) which can be used as a source for methods dedicated to the lips.
Concerning brass instruments, the body of literature is smaller, although many articles deserve to be cited and will serve as references for the present work (see [11] for a table summarizing known values). Most notable are [12], which was one of the first to give a complete set of parameters, [13], which built an asymptotic state observer, [14], which used simulated annealing with a cost function depending on playing frequency only, [3], with data coming from a high speed camera, and [15], which used bifurcation diagrams.
The present article aims to provide a new method to identify embouchure parameters that can be used with very little apparatus on actual musicians, and that can follow their evolution while playing. It introduces a new cost function which is a combination of [14] (for the frequency part) and [8] (without the displacement), together with some penalization (see Sect. 2.4). An optimization algorithm is used to minimize this cost function on recordings with actual musicians and the results are discussed (cf. Sect. 3).

Model
To reduce the number of parameters that must be calibrated, the model chosen in the present article is the simplest one, described by (1), with only three equations (cf. [16]), which has proved to replicate many of the properties of brass instruments (see e.g., [17]). It relates the mouth pressure $p_m$ to the mouthpiece pressure $p$ and the lip opening $h$ through a spring-mass-dashpot equation describing the lips, a valve effect computing the flow $u$ through the lips from the pressure difference, and the expression of the input impedance. In these equations, $h^+ = \max(h, 0)$, $w$ is the width of the lips, and $Z_c$ the characteristic impedance.
The impedance is decomposed using modal analysis as a sum of simple fractions (2), the variables $p_n$ being complex valued. The variables $p_n$ form a decomposition of the mouthpiece pressure using the special form of the impedance. They are not really modal coordinates, as they are not naturally orthogonal for some scalar product, but give a convenient way to solve the problem.
The input impedance is measured through an impedance bridge and is therefore known. It is decomposed into formula (2) using the Rational Fraction Polynomial method (see [18], Sect. 4.4.3), giving the values for s n and C n which are therefore fixed for a given instrument.
The values of $C_n$ and $s_n$ computed for the Bb trombone in first position used in the measurements (Courtois bass trombone with a Holton mouthpiece) are given in Table A3.
The mask parameters are the remaining constants appearing in equations (1): the lip angular resonance frequency $\omega_\ell = 2\pi F_\ell$, the quality factor $Q_\ell$, the lip surface density $\mu$ and the lip opening at rest $H$.
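The display equations (1) and (2) do not appear in this text. As a reading aid, the form usually given in the literature for an outward-striking one degree of freedom lip model of this kind (cf. [16]) is sketched below. This is an assumption about the notation, not a quotation: the air density $\rho$ and the number of modes $N$ are symbols introduced here, and the expressions should be checked against the original equations.

```latex
\begin{aligned}
\ddot h + \frac{\omega_\ell}{Q_\ell}\,\dot h + \omega_\ell^2\,(h - H) &= \frac{p_m - p}{\mu},\\
u &= w\,h^+\,\operatorname{sign}(p_m - p)\,\sqrt{\frac{2\,|p_m - p|}{\rho}},\\
\dot p_n &= s_n\,p_n + Z_c\,C_n\,u,\qquad p = 2\sum_{n=1}^{N}\operatorname{Re}(p_n),
\end{aligned}
\qquad
Z(\omega) = Z_c \sum_{n=1}^{N}\left(\frac{C_n}{j\omega - s_n} + \frac{C_n^{*}}{j\omega - s_n^{*}}\right).
```

With this form, the real state vector collects $h$, $\dot h$ and the real and imaginary parts of the $N$ variables $p_n$, which is consistent with a system of dimension $2N + 2$.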
The set of equations (1) can be rewritten as a real valued ordinary differential equation (cf. [16]) $\dot{y} = f(y)$ of order 1 in dimension $2N + 2$, with a state variable $y$ given by (3). Such a formulation is very convenient both for time-domain numerical simulation and for the study of bifurcation diagrams, as computed by the software auto-07p [19] (see Sect. 2.9). The time simulations performed in this project are based on the Runge-Kutta 4 algorithm and are coded in C, as a large number of them are performed. The sampling rate is set to 44,100 Hz as it gives a sufficient precision for the results (comparable to measured data), ensures the stability and convergence of the numerical scheme, and is sufficiently fast (0.18 s for a one second signal).
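The time-stepping scheme described above can be illustrated by a minimal Python sketch (the simulations of the article are written in C, which is not shown here). The function `lip_model_rhs` assumes the usual outward-striking form of the model and a state ordering $[h, \dot h, \mathrm{Re}\,p_1, \mathrm{Im}\,p_1, \ldots]$; both, together with the air density value and all function names, are assumptions made for illustration.

```python
import numpy as np

RHO = 1.2  # air density in kg/m^3 (assumed value)


def lip_model_rhs(y, p_m, mask, s, C, Z_c, w):
    """Assumed right-hand side f(y) for y = [h, dh/dt, Re p_1, Im p_1, ...]."""
    omega, Q, mu, H = mask
    h, dh = y[0], y[1]
    p_n = y[2::2] + 1j * y[3::2]          # complex modal pressures
    p = 2.0 * np.sum(p_n.real)            # mouthpiece pressure
    dp = p_m - p
    # Valve effect: flow through the (non-negative) lip opening
    u = w * max(h, 0.0) * np.sign(dp) * np.sqrt(2.0 * abs(dp) / RHO)
    out = np.empty_like(y)
    out[0] = dh
    out[1] = -omega / Q * dh - omega**2 * (h - H) + dp / mu
    dp_n = s * p_n + Z_c * C * u
    out[2::2] = dp_n.real
    out[3::2] = dp_n.imag
    return out


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(f, y0, duration, sample_rate=44100):
    """Integrate dy/dt = f(y) from y0, returning one state per sample."""
    dt = 1.0 / sample_rate
    n = int(round(duration * sample_rate))
    ys = np.empty((n + 1, len(y0)))
    ys[0] = y0
    for i in range(n):
        ys[i + 1] = rk4_step(f, ys[i], dt)
    return ys
```

A simulation of the model would then be obtained with something like `simulate(lambda y: lip_model_rhs(y, p_m, mask, s, C, Z_c, w), np.zeros(2 * N + 2), 4.0)`, the modal constants `s` and `C` being complex arrays of length `N`.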
The time simulations have to be performed on a time range long enough so that stationary regime is attained. In practice, this means signals of up to 4 s have to be simulated as transient regime can be quite long for some mask parameters.
For a set of mask parameters $M$ we write $p_M$ for the mouthpiece pressure obtained by solving the system (1) with zero initial value $y(0) = [0, \ldots, 0]$ (cf. Eq. (3)). All the other parameters of the model, including $p_m$, are fixed and constant during one optimization.

Experimental protocol
The goal of the project is to get access to as many mask parameters as possible with as little apparatus as necessary, so as not to hinder the musician's playing. We therefore used only two piezoresistive pressure sensors (Endevco 8507C-5): one in the mouth of the musician (or artificial mouth), and one in the mouthpiece. Simultaneous recordings give access at each time step to the quantities $p_m$ and $p$. Both signals are sampled at 44,100 Hz.
As only one period of $p$ is needed by the algorithm below, the signal can be broken into pieces to see the evolution of parameters (see Sect. 2.4). Once a period is chosen, $p_m$ is averaged over the same time period to obtain a constant value to feed the numerical simulation.
Three sets of measures were performed with two experienced amateur trombone players, one of them being recorded twice, labeled A1, A2 and B in the rest of this article. For each session, the musician was asked to play 6 notes on a Bb bass trombone in first position (Bb2, F3, Bb3, D4, F4, Bb4), together with a bend on F4 (first down, then up), and a crescendo on F4.

Extraction of a signal's period
For a given sampled signal $p$, either simulated (as in Eq. (1)) or measured (see Sect. 2.2), the determination of a period is critical for the method. The first step is the identification of the periodic regime and its frequency. This is done using a Python implementation [20] of the Yin algorithm (see [21]).
The Yin algorithm produces an estimator called the harmonic rate, which is a real number between 0 and 1 that quantifies how periodic the signal is: a very small harmonic rate (ideally 0) means that the signal is close to being periodic. In practice, we consider the signal to be periodic when the harmonic rate is lower than $10^{-3}$, and extract the part of the periodic regime with the smallest harmonic rate.
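To make the notion of harmonic rate more concrete, here is a minimal single-frame sketch of the core of the Yin algorithm. It is not the implementation [20] used in the article: it assumes that the harmonic rate corresponds to the value of the cumulative mean normalized difference at the selected lag, omits the parabolic interpolation of the full algorithm, and uses a hypothetical function name `yin_sketch`.

```python
import numpy as np


def yin_sketch(x, sample_rate, tau_max, threshold=0.1):
    """Return (frequency estimate, harmonic rate) for one frame of signal.

    Simplified Yin: difference function, cumulative mean normalization,
    first lag below the threshold, then descent to the local minimum.
    """
    x = np.asarray(x, dtype=float)
    w = len(x) - tau_max                      # integration window length
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):
        diff = x[:w] - x[tau:tau + w]
        d[tau] = np.dot(diff, diff)           # squared difference at lag tau
    # Cumulative mean normalized difference (equal to 1 at lag 0 by convention)
    cmnd = np.ones(tau_max)
    running_sum = np.maximum(np.cumsum(d[1:]), 1e-300)
    cmnd[1:] = d[1:] * np.arange(1, tau_max) / running_sum
    below = np.nonzero(cmnd[1:] < threshold)[0]
    if below.size == 0:
        return None, float(cmnd[1:].min())    # no periodicity detected
    tau = int(below[0]) + 1
    while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
        tau += 1                              # walk down to the local minimum
    return sample_rate / tau, float(cmnd[tau])
```

On a clean periodic frame the returned rate is close to 0, so a test such as `rate < 1e-3` plays the role of the periodicity criterion described above.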
Yin also gives an estimation of the instantaneous frequency $F$ at each point. From this, it is already possible to extract a waveform $p_y$ of duration exactly one period in the periodic regime. However, as this waveform has to be compared to another one coming from a reference signal, a phase condition has to be fixed. We therefore demand that all the waveforms begin by crossing 0 in an increasing way (phase condition (5)).
This is always possible in practice as $p$ has a mean value equal to 0. The normalization is achieved by considering the waveform $p_y$ and shifting it to the left until the first point satisfies the phase condition, giving rise to a new waveform $\tilde{p}$ which is used as a reference.
It should be noted that on a general signal, the phase condition may not be sufficient to uniquely determine $\tilde{p}$. However, for the signals obtained either numerically or experimentally, this condition proved to be sufficient.
For a set of mask parameters $M$, we also write $F_M$ and $\tilde{p}_M$ for the frequency and normalized waveform of the signal $p_M$ obtained in Section 2.1.
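The phase normalization can be sketched as follows. The discrete form of the phase condition used here (first sample non-positive, second sample positive) is an assumption, since the exact expression of the condition is not reproduced in this text, and `normalize_phase` is a hypothetical name.

```python
import numpy as np


def normalize_phase(waveform):
    """Circularly shift one period so that it starts at an upward zero crossing.

    Assumed discrete phase condition: q[0] <= 0 < q[1].
    """
    p = np.asarray(waveform, dtype=float)
    n = len(p)
    for k in range(n):
        if p[k] <= 0.0 < p[(k + 1) % n]:
            return np.roll(p, -k)     # shift to the left by k samples
    raise ValueError("no increasing zero crossing found in this period")
```

The function returns the first crossing it meets, which reflects the remark above: on a waveform with several upward zero crossings per period the condition does not single out one shift.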

Definition of the cost function
The goal of the cost function is to compare two periodic signals $p_{\mathrm{ref}}$ and $p$, from which frequencies $F_{\mathrm{ref}}$, $F$ and waveforms $\tilde{p}_{\mathrm{ref}}$, $\tilde{p}$ are extracted. The signal $p_{\mathrm{ref}}$ can be either a recorded signal, or a simulated signal obtained from known mask parameters (for test purposes), and is the reference against which the model outputs are compared. Although in theory it should be enough to compare $\tilde{p}_{\mathrm{ref}}$ and $\tilde{p}$, this puts too much emphasis on the waveform itself, and too little on the frequencies. As both timbre and intonation are important for the applications, it is necessary to add an extra weight to the difference in frequencies.
The preliminary cost function $C(p_{\mathrm{ref}}, p)$ is therefore a sum of two terms, where $\|\tilde{p}_{\mathrm{ref}}\|$ denotes the norm of the reference waveform. The first term of the sum is the square of the relative RMS difference, and the second one the square of the relative frequency difference. The choice of the constant $\alpha_F$ guides the optimization procedure either toward a better approximation of the waveform (small $\alpha_F$) or toward a better approximation of the frequency (big $\alpha_F$).
In our case, the choice of $\alpha_F = 0.02$, obtained by trial and error, leads to good results during optimization for trombone sounds, in that intonation (errors around 10 cents) and waveforms (errors around 30%) are respected. In particular it means that the difference in cents between two signals is $\sqrt{C(p_{\mathrm{ref}}, p)/\alpha_F}$.
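The display equation of the cost function is not reproduced in this text, so the following sketch is only one possible reading of the description. It assumes that the frequency term is $\alpha_F$ times the squared frequency deviation expressed in cents (which is consistent with the square-root expression above, up to the waveform term), and that waveforms of different lengths are compared after resampling each period to a common number of points. Both choices, and all function names, are assumptions.

```python
import numpy as np


def resample_period(waveform, n_points=256):
    """Resample one period onto a common grid, treating it as periodic."""
    p = np.asarray(waveform, dtype=float)
    src = np.arange(len(p)) / len(p)          # original positions in [0, 1)
    dst = np.arange(n_points) / n_points      # common positions in [0, 1)
    return np.interp(dst, src, p, period=1.0)


def cents(f, f_ref):
    """Frequency deviation of f from f_ref, in cents."""
    return 1200.0 * np.log2(f / f_ref)


def preliminary_cost(wave_ref, f_ref, wave, f, alpha_f=0.02):
    """Assumed form: squared relative RMS difference + alpha_f * cents^2."""
    a = resample_period(wave_ref)
    b = resample_period(wave)
    rms_term = np.sum((a - b) ** 2) / np.sum(a ** 2)
    freq_term = alpha_f * cents(f, f_ref) ** 2
    return rms_term + freq_term
```

Under these assumptions, a 30% relative RMS difference alone contributes 0.09 to the cost, and a 3 cent deviation alone contributes 0.18.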

Penalization
As is already well known (cf. [8]) the inversion problem is not well defined and it is actually easy to find multiple sets of mask parameters which give signals with very similar waveforms (see Tab. 1).
This means in particular that the cost function lacks convexity. One typical solution to remedy this problem is to convexify the cost function using Tikhonov regularization, which amounts to adding quadratic terms with respect to some mask parameters. More precisely, we define the complete cost function $C$ by adding to the preliminary one quadratic penalization terms in $Q_\ell$ and $H$, with weights $\beta_Q$ and $\beta_H$. The penalization was placed on $Q_\ell$ and $H$ because this proved sufficient to obtain a solution that is well defined up to a sufficiently good precision (cf. Tab. 1), and necessary to remove very different solutions (cf. Sect. 2.8).
It should be noted that this particular choice of penalization, instead of a more general form like $(H - H_0)^2$ for a reference $H_0$ which should be fixed for the whole optimization, implies that the optimization procedure will favor solutions with the smallest quality factor and lip opening. This was chosen for lack of a good candidate for $H_0$.
This particular choice of penalization proved to give results close to those in the literature, except for H (cf. Sect. 3).
The values of $\beta_Q$ and $\beta_H$ are fixed so that a typical value of each penalization term is of the same magnitude as $C(p_{\mathrm{ref}}, p)$. As we expect $C(p_{\mathrm{ref}}, p)$ to be about 0.3 (cf. Sect. 3), and taking $Q_\ell \simeq 7$ and $H \simeq 10^{-4}$ m (chosen among the known values in [11]), we took $\beta_Q = 5 \times 10^{-3}$ and $\beta_H = 3 \times 10^{7}$ m$^{-2}$.
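As a check of the orders of magnitude, and assuming the penalization terms are the quadratic terms $\beta_Q Q_\ell^2$ and $\beta_H H^2$ (an assumption consistent with the unit of $\beta_H$ and with Tikhonov regularization, since the display equation is not reproduced in this text), the typical values quoted above give:

```latex
\beta_Q\,Q_\ell^2 \simeq 5\times 10^{-3} \times 7^2 \approx 0.25,
\qquad
\beta_H\,H^2 \simeq 3\times 10^{7}\,\mathrm{m^{-2}} \times \left(10^{-4}\,\mathrm{m}\right)^2 = 0.3 .
```

Both terms are indeed comparable to the expected value of about 0.3 for the preliminary cost function.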

Continuity, optimization algorithm
The algorithm chosen to find the minimum of the cost function is the dual annealing optimization (cf. [22]) which is a stochastic algorithm that requires neither the cost function to be regular, nor the minimum to be unique.
Indeed, the cost function in this article is not continuous: a small variation of the mask parameters can lead to completely dissimilar solutions. For example, the trombone player can obtain the different notes by only varying their lip resonance frequency: the variation of playing frequency with lip resonance frequency clearly has jumps (cf. [16], Fig. 4).
Another problem of the cost function is that it has many local minima. Although we do not have a mathematical or musical reason for this, it is clearly seen during the optimization process using dual annealing as it performs local searches before jumping to other locations (see Tab. 2 where each line represents a local minimum).

Limitations of dual annealing
The dual annealing algorithm is known to give a solution for some very slow (logarithmically) decreasing temperatures (cf. [22]), but that kind of evolution of the temperature implies a very slow convergence. In practice, a faster decreasing temperature is used, but the convergence is not assured.
Moreover, as this algorithm is of stochastic nature, there is no simple criterion to stop it, and an arbitrary condition has to be chosen: the choice was made to bound by 1000 the number of calls to the cost function. With this choice, a typical run lasts about 1 h on a desktop computer.
The probabilistic nature of this algorithm also means that two runs of the same algorithm, with the same initialization data (except for the seed of the random generator), can give quite different solutions, as the global minimum might not have been reached in one run. It is therefore often necessary to launch the algorithm iteratively until a new run does not produce a solution with a lower value of the cost function. We say that a set of mask parameters satisfying this hypothesis is C-admissible. A random set of mask parameters is not C-admissible in general. Moreover, this definition highly depends on the choices made for the definition of $C$, be it $\alpha_F$ or the choice of penalizations. A set of mask parameters may be C-admissible for one choice, but no longer for another one.
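The restart strategy described above can be sketched with the `dual_annealing` function of SciPy, which is one available implementation of this family of algorithms (the article does not state which implementation was used). The cost function below is a cheap hypothetical stand-in that is discontinuous and has many local minima; in the actual procedure each evaluation would require a full time simulation. The helper name `optimize_until_stable` and the cap on the number of runs are assumptions.

```python
import numpy as np
from scipy.optimize import dual_annealing


def toy_cost(x):
    """Hypothetical stand-in cost: discontinuous, with many local minima."""
    x = np.asarray(x, dtype=float)
    jumps = 0.5 * np.sum(np.floor(3.0 * x) % 2)
    ripples = 0.1 * np.sum(np.cos(8.0 * x))
    return float(np.sum(x ** 2) + jumps + ripples)


def optimize_until_stable(cost, bounds, x0, max_runs=5, maxfun=1000):
    """Relaunch dual annealing until a new run no longer lowers the cost."""
    best_x = np.asarray(x0, dtype=float)
    best_f = cost(best_x)
    for seed in range(max_runs):
        # Each run is bounded by maxfun evaluations and restarts from the
        # best point found so far, with a different random seed.
        res = dual_annealing(cost, bounds, maxfun=maxfun, seed=seed, x0=best_x)
        if res.fun < best_f:
            best_x, best_f = res.x, res.fun
        else:
            break                     # no improvement: accept the current point
    return best_x, best_f


bounds = [(-2.0, 2.0)] * 4            # one interval per parameter
x_opt, f_opt = optimize_until_stable(toy_cost, bounds, np.ones(4))
```

The point returned by such a loop is only stable with respect to the runs that were actually performed, which mirrors the caveat that admissibility depends on the stopping rule and on the definition of the cost function.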

Precision of the algorithm
The assumption in this article is that for every "realistic" signal (i.e. coming from the recording of a trombone), there is only one set of mask parameters that is C-admissible. It is not at all clear that this is true, and this is even known to be false if the penalizations are not added (see Sect. 2.5). Taking this assumption for granted, a set of mask parameters obtained by minimization of the cost function is automatically C-admissible.
To find a suitable C-admissible set of mask parameters and to stay close to an actual trombone signal, so that the robustness of the optimization procedure can be tested, we applied the optimization procedure to a reference signal on a recorded D4 (cf. Sect. 2.2), cf. Figure 1.
The obtained values are given in the second column of Table 2, denoted by M ref . The set of mask parameters used for the initialization of the dual annealing algorithm is given in the third column, and the result of the optimization algorithm is in the last one. The search space is given in its caption.
The resulting waveforms for the two sets of mask parameters are indistinguishable, with a relative RMS error of only 3%, and an error on the frequencies of about 6 cents.
The results of optimization are acceptable (much lower than the dispersion of the values found in the literature) and representative of the errors we found on other simulations (cf. Sect. 3) except for the opening at rest H , but its value is so small that it is hard to give a physical interpretation (see Sect. 3.2.4).
The difference between M ref and M optim may be explained by the fact that we had to stop the algorithm at one point (cf. Sect. 2.7) or because C is insufficiently convexified.

Continuation
During the analysis of the bend (cf. Sect. 3.4), the continuation software auto-07p ( [19]) is used to follow the evolution of the playing frequency with different parameters.
The continuation is first initialized for a reference set of mask parameters M 1 which is chosen for each musician to be the point which minimizes the cost function C among all the optimized values of mask parameters, so as to be as close as possible to the actual recording.  The bifurcation diagram is built using the dependency on p m up to the recorded value of the reference signal, so that auto-07p is now precisely set to the signal p M 1 .
Then a continuation curve along one of the mask parameters (either $Q_\ell$, $F_\ell$, $H$ or $\mu$) is computed, all other physical variables being fixed, and the playing frequency is drawn.

Results
The results obtained for the different musicians are presented in the following subsections, but first it is interesting to look at the mouth pressure as a function of the playing frequency, cf Figure 2. Indeed, although all these data are directly recorded, and not optimized, we can clearly see differences between the two recording sessions of musician A, where the second session has a larger mouth pressure, which could translate into perceptible differences within the optimized data of a single musician.

Errors on sustained notes
Both RMS and frequency errors obtained at the end of optimization are presented in Figures 3 and 5. The RMS error can be quite large for some notes (up to 40%), which is not surprising as the model is one of the simplest and many physical details are neglected. As the optimization looks for a best fit among all mask parameters, this means the model should be made more complex, taking into account more of the physics of the instrument, if precision on both timbre and playing frequency has to be maintained, provided the dual annealing algorithm gives results close enough to the global minimum.
For reference, a typical waveform is shown in Figure 4 where the reference signal is given in green, the reconstructed signal is in orange and the difference between them is in dotted red. The relative RMS error for this particular signal is 0.28. Although many properties are well approximated, the higher harmonics of the signal are clearly not in agreement with the experimental signal. This gives a typical value that can be expected for the RMS error.
Concerning repeatability, estimations of the mask parameters of musician A are coherent and give almost the same results for both the RMS error and the frequency error (except for the playing frequency of the note Bb4). However, the errors differ largely between both players, player B mainly getting the lowest error. This may suggest that both musicians use different techniques, and that player B is closer to the simple model (1).
Note in particular the difference between errors for the note F3, where musician A has the largest RMS error of all (cf. Fig. 4), and musician B has one of the lowest.

Lip resonance frequency
The lip resonance frequency as a function of the playing frequency is shown in Figure 6 for all three musicians. As for any outward model, the frequency of the lips is lower than the playing frequency (cf. [23]), which is clearly seen in this figure as the circles are below the line $F_{\mathrm{play}} = F_\ell$.
Note that for a given frequency, there is little dispersion from player A (A1 or A2) to player B. Moreover, the regression line gives a good fit with $R^2 = 0.996$ and could be used as a first estimation of the playing frequency using only the lip frequency.

Quality factor
The estimated quality factors for all three players are displayed in Figure 7. As in the case of lip frequency (cf. Sect. 3.2.1), results are very close for all three players, and also for all notes, being between 2 and 5. Compared to the literature, they are however smaller than the measured values of [24] (between 9 and 10.5) but comparable to the estimation of [25] (around 5), [26] (between 1.2 and 1.8), [27] (around 3.7), [28] (around 2.88) and [29] (between 0.5 and 3). Except for the first reference, this justifies the penalization on $Q_\ell$, which tends to favor the smallest possible values.
The values obtained for the two recordings of player A are always very close, which may mean that the quality factor depends very little on the loudness.

Surface density
The optimized values of $\mu^{-1}$ are given in Figure 8. Except for the 4 highest values, they are comparable to [28]. However, they are overestimated compared to other values found in the literature ([12, 24, 25, 27]), where they are between 0.03 and 0.2 m$^2\cdot$kg$^{-1}$.
We can see that the data for A1 is systematically higher than that of A2, indicating a possible dependency on the mouth pressure and the loudness.

Opening at rest
The values of opening at rest obtained by optimization are given in Figure 9. They seem very small compared to what was obtained by other authors, up to a factor 10: a typical value obtained by optimization is around $2 \times 10^{-5}$ m (see Fig. 9) whereas [24] and [27] have a typical value of $5 \times 10^{-4}$ m, [25] $2 \times 10^{-4}$ m, and [29] $1.10 \times 10^{-3}$ m, all with comparable lip widths.
However, when shifting from opening at rest to mean opening (cf. Fig. 10) using formula (A3), which takes also into account the mouth pressure, the lip frequency and the lip surface density, the results are comparable to those of [30], which are between 0.6 mm and 2 mm. It should be noted that in this case, the value of opening at rest is negligible in the formula (A3) in Appendix.
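Formula (A3) is not reproduced in this section. A plausible route to such a formula is sketched below: it assumes that the lip equation has the usual outward-striking form $\ddot h + (\omega_\ell/Q_\ell)\,\dot h + \omega_\ell^2 (h - H) = (p_m - p)/\mu$ and averages it over one period of the periodic regime, where the derivative terms and the mouthpiece pressure $p$ average to zero. The numerical values are illustrative assumptions, not measurements from this study.

```latex
\langle h \rangle = H + \frac{p_m}{\mu\,\omega_\ell^2},
\qquad
p_m = 2\times 10^{3}\ \mathrm{Pa},\quad
\mu^{-1} = 1\ \mathrm{m^2\,kg^{-1}},\quad
F_\ell = 300\ \mathrm{Hz}
\;\Longrightarrow\;
\frac{p_m}{\mu\,\omega_\ell^2} \approx 5.6\times 10^{-4}\ \mathrm{m}.
```

With such values the pressure term is more than an order of magnitude larger than an opening at rest of about $2 \times 10^{-5}$ m, which illustrates why $H$ can be negligible in the mean opening.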
Just as for $\mu^{-1}$, there appears to be a correlation between loudness and mean opening. It is not only expected, but actually obvious from the formula (A3), where the mouth pressure appears.

Figure 4. Typical waveform for the recorded signal (green) and for the signal obtained from optimized mask parameters (orange) as played by musician A2 on a Bb3. The difference between signals is drawn in dashed red. The relative RMS error is 0.28.

Optimization for bent notes
The musicians were instructed to perform pitch bends on F4: without moving the slide, the player used embouchure adjustments to vary the pitch, first below its normal value, then above, then below, and then back to F4. For each recording of approximately 10 s, the signals are cut into chunks of 0.2 s, and the optimization procedure is applied independently on each chunk. The note F4 has been chosen because it is one of the most comfortable to bend for the musician. The RMS and frequency errors can be found in Figures 11 and 12.
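The chunking step can be sketched as follows, assuming the mouthpiece and mouth pressure signals are available as arrays sampled at 44,100 Hz. The function name `chunk_inputs` is hypothetical, and returning the mean mouth pressure per chunk follows the averaging of $p_m$ described in the experimental protocol.

```python
import numpy as np


def chunk_inputs(p, p_m, sample_rate=44100, chunk_duration=0.2):
    """Split simultaneous recordings into chunks of fixed duration.

    Returns a list of (mouthpiece pressure chunk, mean mouth pressure) pairs.
    A trailing incomplete chunk is discarded.
    """
    p = np.asarray(p, dtype=float)
    p_m = np.asarray(p_m, dtype=float)
    n = int(round(sample_rate * chunk_duration))   # samples per chunk
    n_chunks = min(len(p), len(p_m)) // n
    return [
        (p[i * n:(i + 1) * n], float(np.mean(p_m[i * n:(i + 1) * n])))
        for i in range(n_chunks)
    ]
```

For a recording of 10 s this gives 50 chunks of 8820 samples, each of which is then optimized independently.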
The RMS error is around 0.25, except for a group of notes played by musician B with low frequencies, which may be related to a particular technique used by this musician.
The frequency error is quite low except for the lowest notes, and is in agreement with the very low frequency error found for F4 in Figure 5. This suggests that the model is not able to predict precisely what the musician is doing for the lowest frequencies of the bend.
Indeed, there seem to be two regions for the errors, with a change at around 352 Hz, as if the optimization process could not find parameters that fit the playing frequency below that value. In practice, musicians often use special techniques to bend to very low notes, such as using the vocal tract. This technique is clearly not taken into account in the model, and it seems the algorithm indicates its own limits.

Discussion on bent notes
During bending, the musician varies many parameters. This makes it quite difficult to see the influence of any of them. In the following diagrams, the evolution of playing frequency is shown with respect to the mask parameters. To put it into perspective, the theoretical evolution with respect to only the considered parameter (the other parameters being kept constant) is also computed using auto-07p (see Sect. 2.9). The mask parameters used to initialize the    continuation are those with smallest cost function among all the optimized values for this recording, to ensure that the model is as close to the measures as possible.

Quality factor $Q_\ell$
The results of the optimization for bent notes are presented in Figure 13 for the quality factor. The values obtained are within the same range as in Figure 7.
One striking feature is the closeness between the measurements for musician B and the continuation results obtained by auto-07p. It seems as if the playing frequency is completely predicted by the evolution of the quality factor. However, the precision of the fit must be put into perspective with the rather large errors in the optimization (see Figs. 11 and 12).
The fit is not so good for musician A, although the results of the continuation go in the right direction.

Lip resonance frequency $F_\ell$
The results of optimization for the lip resonance frequency are given in Figure 14, and are quite difficult to interpret. Even more than in Figure 12, there seem to be two regions, one before 352 Hz, and one after.
Above 352 Hz, the estimation of lip frequency does not give a clear tendency. Although we could expect the lip frequency to increase with playing frequency, just as in Figure 6, this is not what appears in the figure. This suggests that the lip frequency is only a coarse tuner, and the quality factor is actually the fine tuner.

Lip surface density $\mu$
The results of the optimization for bent notes are presented in Figure 15 for the lip surface density. The values are compatible with those in Figure 8, and the evolution of playing frequency with respect to $\mu$ is compatible with the theoretical one obtained by continuation.

Opening at rest H
The results of the optimization for bent notes are presented in Figure 16 for the opening at rest. The values obtained are within the same range as in Figure 9.
As explained in Section 3.2.4, the values obtained for H are very small, and therefore not very well defined (see error term in Tab. 2).
The mean opening obtained from other optimized values and formula (A3) is given in Figure 17, and a clear tendency can be observed above 350 Hz: the mean opening increases with the playing frequency for all musicians.   Below 350 Hz the tendency is not so clear. Moreover, one must be careful with interpretation as there may be other phenomena involved than those directly modeled (cf. Sect. 3.3).

Optimization for a crescendo
The same procedure as in Section 3.3 was used for the recordings of a crescendo on F4 for all three musicians.
The relative RMS error is presented in Figure 18, and indicates that the higher the mouth pressure, the higher the RMS error. This shows that the simple model (1) is good at reproducing the timbre for low pressure, but not so much for higher pressures. This may be due to the nonlinear propagation along the length of the trombone.
The error in frequency is presented in Figure 19 and is compatible with that in Figure 5. It proves that the model (1) is actually quite good at reproducing the playing frequency, whatever the dynamic of the playing. This indicates that the limits of the model are not so much on the frequency, but more on timbre.
The waveforms for two different dynamics are shown in Figure 20, for both the measured signal and the signal reconstructed from optimized mask parameters. The difference in timbre is clearly seen for the forte recording. The details of the figures obtained through optimization during a crescendo can be found at Comparison of measured vs. simulated sounds with these parameters can be found at http://perso.univ-lemans.fr/~smauge/mask.

Conclusion
In this article, a new method is proposed to estimate the mask parameters of a brass musician within a set of acceptable parameters (so called C-admissible). This approach is used on recordings of actual musicians during playing, and is able to deliver a coherent set of parameters (except maybe for the opening at rest), in that they are not too far from existing results in the literature, and their values evolve in a way that is compatible with theory during the playing.
The values obtained prove that a simple model is already capable of reproducing a playing frequency close to that played by an actual musician, with a dynamic and waveform that are similar to the measured ones. This may prove useful for instrument making, although more research should be done to assess the robustness of the method, and investigate the variability of the mask parameters from player to player. Moreover, it can give new leads to a better understanding of intonation, such as the almost linear relation between playing and lip frequencies, or the role of quality factor as a fine tuner. Furthermore, the system seems to be able to detect when a particular technique is used, for example on the lowest part of bent notes where the vocal tract is used by the musician, as the difference between the techniques is clearly seen on the cost function.