Uncertainty in tuning evaluation with low-register complex tones of orchestra instruments

The relationship between perceived pitch and harmonic spectrum in complex tones is ambiguous. In this study, 31 professional orchestra musicians participated in a listening experiment where they adjusted the pitch of complex low-register successively presented tones to unison. Tones ranged from A0 to A2 (27.6–110 Hz) and were derived from acoustic instrument samples at three different dynamic levels. Four orchestra instruments were chosen as sources of the stimuli: double bass, bass tuba, contrabassoon, and contrabass clarinet. In addition, a sawtooth tone with 13 harmonics was included as a synthetic reference stimulus. The deviation of subjects’ tuning adjustments from unison tuning was greatest for the lowest tones, but remained unexpectedly high for higher tones as well, even though all participants had long experience in accurate tuning. Preceding studies have proposed spectral centroid and Terhardt’s virtual pitch theory as useful predictors of the influence of the envelope of a harmonic spectrum on the perceived pitch. However, neither of these concepts was supported by our results. According to the principal component analysis of spectral differences between the presented tone pairs, the contrabass clarinet-type spectrum, where every second harmonic is attenuated, lowered the perceived pitch of a tone compared with tones with the same fundamental frequency but a different spectral envelope. In summary, the pitches of the stimuli were perceived as undefined and highly dependent on the listener, spectrum, and dynamic level. Despite their high professional level, the subjects did not perceive a common, unambiguous pitch of any of the stimuli. The contrabass clarinet-type spectrum lowered the perceived pitch.


Introduction
The auditory sensation evoked by a musical tone, according to Fletcher [1], can be divided into characteristics of pitch, loudness, and timbre. These aspects all correlate with each other, that is, the pitch is primarily related to the frequency, but spectral differences and intensity may also influence the perceived pitch. However, the relationship between perceived pitch, harmonic component levels, and intensity in complex tones has remained mostly unresolved, in particular in the lowest and highest musical pitch ranges. The present study aims to disentangle the question on the subjective accuracy and influence of harmonic components on perceived pitch in the low-frequency range. This is done with a listening experiment combined with an analysis of the spectral differences between the evaluated tones.
Pitch is a perceptual attribute that allows the ordering of sounds on a frequency-related scale from low to high.
The fundamental frequency is the corresponding physical term, defined as the inverse of the period of the signal. The perceived pitch is highly dependent on the listener and can be quantified only by a listening test, where the pitch of the evaluated tone is compared with the pitch of another tone serving as a reference.
In low-cost electronic tuning machines [2], the pitch detection algorithm typically estimates the period of a quasiperiodic signal and converts the length of the period to the fundamental frequency. In such devices, the relative amplitudes of the partials of a harmonic complex tone should not influence the estimated periodic pitch.
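As a rough illustration of period-based estimation, the following Python sketch picks the autocorrelation peak of a synthetic harmonic tone. It is not the algorithm of any particular tuner in [2]; the function names, the 4400 Hz sample rate, the 55 Hz test frequency, and the two amplitude sets are assumptions chosen for the example.

```python
import math

def harmonic_tone(f0, amplitudes, sample_rate, n_samples):
    """Sum of sine partials at integer multiples of f0 (zero starting phase)."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / sample_rate)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]

def estimate_f0(signal, sample_rate, f_min, f_max):
    """Return sample_rate / lag for the lag that maximises the autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    window = len(signal) - lag_max  # same number of products for every lag
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(signal[i] * signal[i + lag] for i in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 4400  # assumed: gives exactly 80 samples per period at 55 Hz
F0 = 55.0           # assumed test frequency in the low register

# Two hypothetical spectral envelopes with the same fundamental frequency.
sawtooth_like = [1 / k for k in range(1, 11)]
# Index k is even for odd harmonics (1, 3, 5, ...); even harmonics are attenuated.
even_attenuated = [a if k % 2 == 0 else 0.1 * a
                   for k, a in enumerate(sawtooth_like)]

estimates = [
    estimate_f0(harmonic_tone(F0, amps, SAMPLE_RATE, 430), SAMPLE_RATE, 40.0, 100.0)
    for amps in (sawtooth_like, even_attenuated)
]
```

Both envelopes share the same period, so the period-based estimate is the same for both tones, whereas the perceived pitch may differ between them.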
The pitch perception of a complex tone is intricate and still not fully explained. Schouten et al. [3] defined the perceived pitch of a complex tone as a joint perception of several individual harmonics (see also virtual pitch by Terhardt [4,5]). In a complex tone, all its harmonics, that is, individual sinusoids, are present simultaneously and are separately coded as neural patterns. Although each harmonic component is coded individually, the aggregate of harmonics is perceived as one tone with a single pitch [6].
In a previous experiment [7], we observed that the perceived pitches of the low-register orchestra instruments at C1 (32.5 Hz) differed notably from each other, although the fundamental frequency and harmonic overtone frequencies were identical. In particular, the contrabass clarinet was perceived even one semitone lower than all other bass instruments. Although there are few studies available that report the influence of the harmonic spectrum envelope on perceived pitch [8][9][10][11], no comprehensive explanation has been published earlier to the authors' knowledge. Latent connections between spectral patterns and perceived pitch are difficult to distinguish without a dimension reduction technique. We approached this phenomenon by principal component analysis (PCA) [12], by which such patterns with certain tendencies can be isolated.

Background
Pitch perception is fundamentally influenced by the physical properties of the evaluated tone. In the context of the current study, variation of the relative levels of harmonic overtones [13] is a central influencing factor. As one underlying effect, some harmonics might be masked by other stronger harmonics, possibly altering the perceived pitch. The result may be considered as a synthesis of different neurophysiological and psychoacoustical mechanisms.
The following sections briefly review previously proposed mechanisms for the influence of the spectral envelope on the perceived pitch.

Pitch of individual sinusoids
A harmonic spectrum consists of individual sinusoids which can be perceived differently depending on their intensity and frequency. Stevens [14] observed that the perceived pitch of high sinusoidal tones (>2500 Hz) increased with intensity (up to 13.5% at 12 kHz), whereas the perception of low sinusoids (<2500 Hz) showed an opposite effect of decreased pitch (up to 6% at 150 Hz). He suggested that the resonant characteristics of the ear acted as a divider. Snow [15] validated this phenomenon in the low register (<1000 Hz). Morgan and Galambos [16] observed high interindividual differences in the direction and magnitude of pitch shifts when the intensity was changed. They suggested that irregularities in auditory sensitivity between individual listeners may explain even contradictory observations with the same stimuli. Cohen [17] in part replicated Stevens' experiment with more participants with a similar outcome. Although the directions of the pitch shifts were congruent, the magnitudes of the shifts were smaller (<2%).
In summary, the pitch of individual sinusoidal tones may shift downwards or upwards depending on intensity, frequency, and listener. This may influence the perceived pitch of a complex tone if the downwards- or upwards-shifted individual harmonics are dominant. This effect has been considered in advanced pitch detection algorithms (e.g., Terhardt's model [13]). Short reviews of previous studies on how individual harmonics (or groups of harmonics) may influence the perceived pitch are given in the following sections.

Terhardt's algorithm for simulating pitch perception
Moving from a pure tone to multiple sinusoids, which together constitute a complex tone, introduces a range of overlapping aspects.
Terhardt et al. [5] and later Terhardt et al. [13] presented an algorithm for simulating pitch perception in a human listener including a number of auditory processes. The final estimation of pitch is dependent on spectral pitch (analytic listening to individual harmonics) and virtual pitch (holistic listening to one evoked pitch). In this model, there are two separate pitch modes, the spectral pitch pattern (SP) and the virtual pitch pattern (VP). Both patterns include several pitch values and weights that determine the relative prominence of the individual pitches. SP includes spectral analyses, extraction of tonal components, masking of other harmonics, pitch shifts of individual harmonics, and weighting of spectral dominance. VP is derived from SP with an algorithm for subharmonic coincidence. In the current study, an implementation by Cabrera in the AARAE package [18], based on Terhardt's original code, was employed.

Spectral centroid
The spectral centroid describes the balance point in frequency for the spectral energy. It is a robust indicator of the perceived "brightness" of a complex tone [19]. In simplified experiments, when the spectral centroid is moved from low to middle and high harmonics, its influence on the change in perceived pitch and timbre is possible to evaluate. Singh and Hirsh [8] explored how changes in spectral locus and fundamental frequency affect perceived pitch. They found that changes in fundamental frequency were the primary cue, but also changes in the spectral centroid had some influence on the pitch perception. "Brighter" or "sharper" tones, corresponding to a higher spectral centroid, were perceived as higher in pitch. However, in our opinion, listeners may not be able to distinguish high pitch from bright timbre with the applied listening test method. All in all, this effect may also be explained by the listeners' attention to individual harmonics, i.e., analytic listening.
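A minimal sketch of the centroid computation for a harmonic tone is given below. It assumes power weighting (squared linear amplitudes); amplitude-weighted definitions are also in use, and the source does not state which one applies here. The function name and the two example envelopes are hypothetical.

```python
def spectral_centroid(f0, amplitudes):
    """Power-weighted mean frequency of harmonics k * f0 (k = 1, 2, ...)."""
    powers = [a * a for a in amplitudes]               # linear amplitude -> power
    freqs = [f0 * (k + 1) for k in range(len(amplitudes))]
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)

# Two hypothetical envelopes on the same 100 Hz fundamental.
bright = spectral_centroid(100.0, [1.0, 1.0, 1.0])     # equal-strength harmonics
dark = spectral_centroid(100.0, [1.0, 0.5, 0.25])      # rapidly decaying harmonics
```

With these numbers the "brighter" envelope has its centroid at 200 Hz and the "darker" one at roughly 129 Hz, although both tones share the same fundamental frequency.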

Dominance region
The spectral region which is most important to the pitch perception is called the dominance region. Ritsma [20] found that the six lowest harmonics dominate the pitch perception as long as their amplitudes exceed the hearing threshold level (fundamental frequencies were 100, 200, and 400 Hz). Moore et al. [21] evaluated the existence of a dominance region by mistuning individual harmonics.
They concluded that the dominating harmonics were within the six lowest partials, at least for fundamental frequencies of 100, 200, and 400 Hz. In a later study, Jackson and Moore [22] suggested that in the case of complex tones with low fundamental frequency, the low harmonics in general mask the information of the higher harmonics.
Dai [23] argued that in the case of complex tones with a low fundamental frequency (<800 Hz), harmonics close to a fixed frequency at about 600 Hz are the most important or dominating in the pitch perception, irrespective of their rank order. This result contrasts with the findings above where the dominating harmonics are claimed to depend on the fundamental frequency (fixed harmonic order). Moreover, Terhardt et al. [5] proposed that there is a spectral dominance region that is symmetrical in log frequency and centered on 700 Hz, i.e., in absolute frequency. Partials nearer the center of that curve are more likely to be dominant.

Influence of spectral envelope change on perceived pitch in a musical context
There are only a few studies available on the influence of changes in the envelope of a harmonic spectrum on the perceived pitch in a musical context. Russo and Thompson [9] investigated musical intervals and their perceived size when the slope of the envelope of a harmonic spectrum was manipulated. The results indicated that changes in the spectrum envelope did not directly affect the perceived pitch, but the size of the intervals was experienced as expanded or contracted.
Vurma and Ross [10] had a more musical approach to the topic. In their first experiment, professional classical singers matched the pitches of their voices to synthesized oboe and piano tones. Since the participants sang in different registers from bass to soprano, it was clear that all participants did not match their voice to a common reference tone. However, the vocal sounds were on average 7-13 cents lower than the instrumental sounds.
In the second experiment, participants rated in a forced-choice task whether the instrumental sounds were lower, equal, or higher with respect to the vocal sound of a single performer. As a result, an approximately 20 cents lower vocal sound was perceived to be in tune with an instrumental sound. This, in contrast, indicates that the pitch of the vocal sound was perceived as higher than the corresponding instrumental sound with the same fundamental frequency. As an explanation, they suggested differences in the energy distributions of the power spectra. In a second paper, Vurma et al. [11] had more participants (13 musicians and 13 nonmusicians) and different instruments (tenor opera singer, viola, and trumpet). The experimental design applied pairwise comparison of successively presented tones. The tones described by the authors as having a brighter overall timbre (trumpet and tenor voice) were perceived about 15-20 cents higher than the viola, which in turn had a darker timbre. However, no deeper analysis of a possible influence of differences in the spectral envelope was presented.

Psychoacoustical vs. neurophysiological methods
The short overview above of some psychoacoustic studies on pitch perception indicates that several underlying neurophysiological processes influence the perceived pitch of a complex tone. The processes include, among others, pitch bending of an individual harmonic, masking of harmonics, and irregular auditory sensitivity in listeners. In combination, they may cause interindividual deviations between listeners' perceptions.
In addition to psychoacoustical methods, different neurophysiological techniques for measuring pitch-related neural representations at different locations in the auditory pathway are available all the way from the auditory periphery to the auditory cortex [24][25][26][27][28]. However, current neuroimaging techniques cannot clarify how the spectral properties of complex stimulus tones influence the perception of pitch. Hence, we must continue to approach the multifaceted phenomenon of pitch perception of complex tones with psychoacoustic methods.

Aim and motivation for this study
This study investigates how the perceived pitch in the low-frequency range depends on the spectrum envelope, dynamic level, and listener, using stimuli with complex tones which represent different orchestral instruments.
The focus is on how perceived differences in pitch can be attributed to differences in the spectrum envelope. The study was motivated by a phenomenon observed in an earlier experiment [7], where the pitches of contrabass clarinet tones were perceived to be lower in pitch in comparison with other bass instruments. The presented approach, combining a listening experiment using stimulus tones from orchestral instruments with a principal component analysis on the spectral differences between the evaluated stimuli, has not been reported earlier.

Participants
The participants (N = 31) of the listening experiments were professional orchestra musicians, aged from 31 to 61 years (mean 46.8, SD 8.1; 11 females, 20 males). Most of them are instrument section principals employed by top-tier Finnish symphony orchestras (Helsinki Philharmonic Orchestra, Finnish Radio Symphony Orchestra, Tapiola Sinfonietta, Turku Philharmonic Orchestra, Finnish National Opera Orchestra). Five subjects reported absolute pitch. All orchestra instruments had at least one representative: 11 strings, 9 woodwinds, 8 brass, 1 harp, 1 piano, and 1 percussion. Although none of the participants self-reported a severe hearing loss, some of them may well have mild hearing loss due to their profession and/or age. Usually, severe hearing loss does not occur at lower frequencies, at least not in people who are working as professional musicians in a symphony orchestra. In our experiment, all stimuli had a low fundamental frequency (≤110.5 Hz ±100 cents). According to Moore et al. [21], in the low register tones, the six lowest harmonics are the most important for pitch perception. For that reason, we suppose that a possible hearing loss in the range of higher harmonics (>2 kHz) does not significantly influence the perceived pitch in our experiment.

Stimuli
The listening experiment included stimuli based on processed samples of four instruments: double bass (db), contrabassoon (cbsn), contrabass clarinet (cbcl), and bass tuba (tb). Most samples were extracted from the Vienna Symphony Library (VSL GmbH, Vienna, Austria). The contrabass clarinet and the additional instrument samples in the lowest extrema were captured by the authors with professional musicians. The experiments included a total of 108 individual samples: four instruments at nine musical pitches, and three dynamic levels.
Since the A musical pitch (note) is a common reference tone (A4), we chose A0-A2 for the experiment's musical pitch range despite the fact that no orchestra instrument can play below B♭0. The A0 tones were derived from B♭0 tones. B♭0 was in the normal playable range of all instruments except the double bass. A five-string double bass was tuned down from B0 to B♭0 in the recording session.
The attack portion of an instrument tone in the lowest register can be unstable and ambiguous (e.g. tuba). The time until the tuning of a tone reaches a stable state could therefore be long, which in part might strongly influence the results. Therefore, all attacks were removed by editing the recordings and using the stable-tuned part in steady-state wavetable synthesis. A short section from the sustain phase of an instrument signal was isolated and oversampled at 384 kHz sample rate at 32 bits for extracting a single period in Wavelab 9.5 software (Steinberg GmbH, Hamburg, Germany). The high sample rate enabled precise period isolation to avoid artificial discontinuities. After the isolation of an individual period, the spectrum of the sample was analyzed to verify that the spectrum was not distorted due to a discontinuity. Each of the single periods derived from the sustain part was carefully aurally chosen so that they sounded as natural as possible. For that purpose, single periods were multiplied to obtain a suitable duration for evaluation. Finally, the selected periods were imported to Matlab 2018a (Mathworks Inc, MA, USA) for further processing.
The final stimulus signals were generated by repeating the single wave period by a known integer multiple and applying resampling to attain an accurately tuned complex harmonic tone with the desired length. Fundamental frequency calculation and the stimulus synthesis procedure were identical to our previous study, and the entire process has been explained thoroughly in [7], including frequency resolution considerations. Since all instrument samples were extracted from real instruments, the phases of the harmonics were closely representative of the actual instruments.
The subjects reported the instruments to be well recognizable and distinguishable by timbre, even though they did not have an attack, nor did they include small period-to-period fluctuations in frequency and amplitude (jitter and/or shimmer).
In order to facilitate a repeatable comparison anchor for possible future studies, we included a purely artificial reference instrument with a sawtooth-like waveform in our experimental design. This type of tone was motivated by two factors. First, many acoustical instruments have spectral envelopes relatively close to a 1/f spectral envelope corresponding to a sawtooth waveform. Second, due to the shifts in the perceived pitch of pure tones at different SPLs [14][15][16][17], a certain number of harmonics was found necessary to include in the reference complex tone for producing a more stable perception of the pitch [29].
Various orders of harmonics were evaluated aurally by the authors to arrive at a pleasant and adequately natural timbre of the sawtooth. The implemented sawtooth waveform was synthesized in the frequency domain as a zero-phase 1/f amplitude spectrum with 10 harmonics. The choice was a compromise between the stability of perceived pitch and timbre pleasantness. Three supplemental harmonics with a linear amplitude fade out to zero were added above the first ten partials. This addition was due to the observation that an abruptly truncated harmonic series produces a "ghost" tone in perceived pitch, which was not present in the Fourier analysis [30]. It should be noted that the reference sawtooth-like tones only serve as additional data, supplementing the central comparisons between instrument tones. All stimuli were unfiltered.
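The construction of the 13-harmonic reference spectrum can be sketched as follows. The exact shape of the fade is not specified in the text; the sketch assumes that the three supplemental harmonics step down linearly from the amplitude of the tenth harmonic and would reach zero at the fourteenth. The use of sine partials with zero starting phase, the function names, and the 256-sample period are likewise assumptions.

```python
import math

def sawtooth_like_amplitudes(n_full=10, n_fade=3):
    """1/k amplitudes for the first n_full harmonics plus a linear fade-out."""
    amps = [1.0 / k for k in range(1, n_full + 1)]
    last = amps[-1]
    for j in range(1, n_fade + 1):
        # Assumed fade: equal steps from the last 1/k amplitude towards zero.
        amps.append(last * (1.0 - j / (n_fade + 1)))
    return amps

def synthesize_period(amplitudes, samples_per_period):
    """One period of the waveform as a sum of zero-phase sine partials."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * n / samples_per_period)
            for k, a in enumerate(amplitudes))
        for n in range(samples_per_period)
    ]

amps = sawtooth_like_amplitudes()
period = synthesize_period(amps, 256)
```

Under these assumptions, harmonics 11 to 13 receive amplitudes of 0.075, 0.05, and 0.025 relative to the fundamental.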
In Zenodo [31], a complete set of example stimuli is provided in monaural audio files with the following sequence: fixed tone; adjusted tone −6 cents; fixed; adjusted 0 cents; fixed; adjusted +6 cents. Moreover, a complete set of figures including sound pressure level spectra, harmonic peaks, and respective spectral differences for the 10 first harmonics are provided [32].

Experiment design
The task of a participant in the listening experiment was to adjust pairs of two successive tones to perceived unison. Each presented pair consisted of two different instrument tones, where the first one was a fixed tone as a reference and the second one a user-adjustable tone with a tuning range of ±100 cents. Tuning adjustment steps were 3 cents [for adjustment between 0 and ±12c], 4 cents [±12-28c], 5 cents [±28-48c], 6 cents [±48-72c], and 7 cents [±72-100c].
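The step sizes listed above imply a fixed grid of reachable tuning offsets, which the following sketch enumerates for the positive half of the range. Which step applies exactly at a range boundary (e.g., at 12 cents) is an assumption, as are the function names.

```python
def step_size(offset_cents):
    """Adjustment step in cents at a given distance from the reference."""
    magnitude = abs(offset_cents)
    if magnitude < 12:
        return 3
    if magnitude < 28:
        return 4
    if magnitude < 48:
        return 5
    if magnitude < 72:
        return 6
    return 7

def adjustment_grid(limit=100):
    """Offsets reached by stepping upwards from 0 cents to the limit."""
    positions = [0]
    while positions[-1] < limit:
        positions.append(positions[-1] + step_size(positions[-1]))
    return positions

grid = adjustment_grid()
```

Each range contains four steps (4 × 3, 4 × 4, 4 × 5, 4 × 6, and 4 × 7 cents), so the grid ends exactly at 100 cents after 20 steps.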
Not all 12 possible permutations of the four instruments as the reference tone and adjustable tone in a stimulus pair were included in the test. Three instruments were used for the reference tone and all four instruments for the following adjustable tone according to the following scheme, giving a total of six stimulus combinations of instruments (reference–adjustable): contrabassoon–contrabass clarinet/tuba, double bass–contrabassoon/tuba, and contrabass clarinet–double bass/tuba. For pairs including the sawtooth spectrum, this tone was always the reference, yielding four combinations.
Nine musical pitches were included (A0, C1, E♭1, F♯1, A1, C2, E♭2, F♯2, and A2). These notes correspond to a fundamental frequency range of 27.6-110.5 Hz. In all presented pairs, the first reference tone was tuned to equal-tempered fundamental frequency derived from A4 (442 Hz), and the second tone started arbitrarily from a fundamental frequency between ±15 cents from the reference.
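The equal-tempered fundamental frequencies of the nine pitches follow directly from the A4 = 442 Hz reference. The sketch below uses MIDI note numbers (A4 = 69) purely as a convenient indexing convention; the dictionary and function names are assumptions.

```python
A4_HZ = 442.0
MIDI_A4 = 69  # MIDI numbering is an indexing convenience, not part of the study

# The nine pitches are spaced by minor thirds (three semitones).
NOTE_MIDI = {
    "A0": 21, "C1": 24, "Eb1": 27, "F#1": 30, "A1": 33,
    "C2": 36, "Eb2": 39, "F#2": 42, "A2": 45,
}

def equal_tempered_hz(midi_note, a4_hz=A4_HZ):
    """Equal-tempered frequency of a note relative to the A4 reference."""
    return a4_hz * 2.0 ** ((midi_note - MIDI_A4) / 12.0)

def detune(freq_hz, cents):
    """Shift a frequency by a number of cents (100 cents = one semitone)."""
    return freq_hz * 2.0 ** (cents / 1200.0)

frequencies = {name: equal_tempered_hz(m) for name, m in NOTE_MIDI.items()}
```

A0 and A2 come out at 27.625 Hz and 110.5 Hz, matching the stated range, and `detune` gives the frequency of an adjustable tone at a chosen offset from the reference.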
The total number of test pairs was 270 for each subject. All test pairs were different; no repeats were included. This means that every subject evaluated each individual test pair only once. The pairs were presented in random order and each tone was 1 s long. We considered that for the lowest tones it is necessary to have a long enough duration for perceiving a stable pitch. With a shorter duration, there may not be enough periods for accurate perception in the lowest register. Krumbholz et al. [33] used 800 ms and 1000 ms stimulus duration. In addition, Rogala et al. [34] showed that longer tones with a duration of 500-1000 ms are preferable in the lowest register for stable pitch perception.
In contrast to many studies on pitch discrimination, there was no silent gap between the two tones in a stimulus pair. A gap of 500 ms, for example, in Ref. [35], has been traditionally motivated by the circumvention of so-called streaming effects with uninterrupted audio stimuli. However, the present experiment was intended to represent the evaluation of musical intervals as would be experienced by a musician playing a monophonic instrument.
The test pair was repeated continuously until the "next" or "stop" button was pressed. The average duration of the experiment was 1.5 h including regular pauses.
The amplitudes of the tones were equalized to C-weighted sound pressure levels of 62, 68, and 74 dB, respectively, representing three nominal dynamic levels (pp-mf-ff). Equalizing did not change the relative amplitudes between the harmonics in the stimulus tone. Feasible ranges for sound pressure levels were piloted by the authors. The range of presented sound pressure levels with different dynamic levels was reduced from the range that is possible for most orchestra instruments: 62 dB can be regarded only moderately soft for pianissimo, as many instruments can reach considerably lower SPL. Furthermore, most instruments produce substantially more than 74 dB in fortissimo. The lower bound was chosen so that all tones could still be heard and tuned with relative ease. Correspondingly, the upper bound was limited in order to avoid presenting unpleasantly loud sound pressure levels, considering the duration of the experiment.
In this context, it should be noted that the stimulus tones presented at different dynamic levels were not just linearly scaled versions of one and the same spectrum. Besides the variation in sound pressure level between the three dynamic levels, the spectra of the stimulus tones also included the dynamic changes in the spectrum envelope (i.e., disproportionate amplification of higher overtones) exhibited by each instrument type.
SPL calibration values were measured with a fixture where the headphones rested against a small panel, and a calibrated sound level meter was attached flush through the fixture at the location of the ear canal entrance. The listening test system was implemented with Max MSP 8.0 software (Cycling '74 Inc., CA, USA) and sound reproduction was performed with headphones (AKG K550, AKG Acoustics GmbH, Vienna, Austria). A Zoom H6 (Zoom Corporation, Tokyo, Japan) portable recorder was used as an external sound card and a digitally controlled headphone amplifier.

Statistical analysis and virtual pitch estimation
The collected data were first subjected to regression analysis with the pitch and instrument pairs as primary independent variables. Spectral centroids, as well as individual magnitudes of the 10 first harmonics, were calculated for each tone for subsequent analyses.
Statistical analysis was conducted by computing Pearson's correlation coefficient between the calculated distances between the spectral centroids of the compared instrument tones and the perceptually adjusted tuning values of the same tones.
The term "tuning value" was introduced to indicate the difference in fundamental frequency (in cents) between the adjustable tone and the reference tone (in that order) when the two tones are perceived to have the same pitch. A positive tuning value means that the adjustable tone has a higher fundamental frequency than the reference when the two tones are perceived to be tuned in unison. Turned the other way round, a positive tuning value means that the reference tone is perceived to be higher in pitch than the adjustable tone when the two tones have the same fundamental frequency.
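Expressed as a formula, the tuning value is the interval in cents from the reference to the adjustable tone at perceived unison. A minimal sketch, with a hypothetical function name and hypothetical example frequencies:

```python
import math

def tuning_value_cents(adjustable_hz, reference_hz):
    """Interval from reference to adjustable tone in cents (positive = adjustable higher)."""
    return 1200.0 * math.log2(adjustable_hz / reference_hz)

# Hypothetical case: the listener sets the adjustable tone 10 cents above
# a 55.25 Hz reference before hearing the two tones as a unison.
reference = 55.25
adjustable = reference * 2.0 ** (10.0 / 1200.0)
value = tuning_value_cents(adjustable, reference)
```

Here the tuning value is +10 cents, meaning that at equal fundamental frequencies the reference would have been heard as the higher of the two tones.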
The data were explored by correlation analysis and visual inspection for relationships between the dependent variable (tuning value) and the independent variables described above. Initial observations directed our interest toward the trend in the overall deviation of tuning values as a function of the fundamental frequency. Subsequently, we applied principal component analysis (PCA) for investigating potential latent connections between the spectral features and the perceived pitch.
Terhardt's virtual pitch was estimated with the Matlab implementation (AARAE toolbox) of the original algorithm [18]. However, the virtual pitch estimation was observed to produce values deviating over one semitone from the nominal fundamental frequency for tones below A1. The same behavior was detected with the actual stimuli as well as pure tone testing signals. Hence, the virtual pitch estimates that deviated from the nominal pitch by more than 100 cents were omitted.

Principal component analysis
In principal component analysis, the original multidimensional data are transformed to a new space with the aim of describing the data by a few salient data dimensions.
The new dimensions are denoted as principal components and are orthogonal to each other. The first principal component explains the maximum amount of variance in the original data and the following explain successively decreasing amount of the total variance. PCA was conducted with the FactoMineR package in R environment [36]. The analysis was based on comparing the spectral differences between the presented instrument pairs. The magnitude differences (in dB scale) between the 10 first harmonics of the compared instrument tones were used as variables. The mean tuning values (see Sect. 3.4) of the same tones were used as supplementary variables. That is, the tuning values did not influence the PCA; they were only projected on the resulting dimensions. This approach facilitates correlation analysis between tuning values and the principal components which characterize the differences between compared spectra. Furthermore, instrument pairs and dynamic levels were included in the PCA as additional grouping factors.
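The role of a supplementary variable can be illustrated with a small standard-library sketch: the first principal component is computed from the harmonic level differences alone, and the tuning values are only correlated with the resulting scores afterwards. This is not the FactoMineR procedure; the sketch centers but does not rescale the variables, uses power iteration for the first component only, and all numbers (four harmonics, five pairs) are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_component(rows, n_iter=100):
    """Loadings and scores of the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Invented level differences (dB) for harmonics 1-4 of five stimulus pairs;
# harmonics 2 and 4 vary together, mimicking attenuated even harmonics.
level_diffs = [
    [0.5, -2.0, 0.3, -1.5],
    [1.0, -8.0, -0.5, -7.0],
    [-0.5, -14.0, 0.8, -13.0],
    [0.2, -20.0, -0.2, -19.5],
    [0.0, -5.0, 0.4, -4.0],
]
tuning_values = [-2.0, -9.0, -15.0, -24.0, -4.0]  # invented, supplementary only

loadings, scores = first_component(level_diffs)
supplementary_corr = pearson(scores, tuning_values)
```

Because the tuning values never enter `first_component`, they cannot shape the component; they are only related to it afterwards through the correlation. The sign of a principal component is arbitrary, so only the magnitude of that correlation is meaningful.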
Ten harmonics were included in the PCA based on the following two principles. First, the 10 harmonics enabled a consistent comparison between all instrument pairs, including the synthesized sawtooth tone, which has only 10 harmonics that strictly follow the 1/f spectrum. Second, according to earlier studies by Ritsma [20] and Moore et al. [21], the lowest harmonics are the most significant for the pitch perception of the low fundamental frequency tones.
The PCA was conducted in two separate runs. In the first run, stimulus pairs including the sawtooth were omitted from the PCA, as the synthesized spectral envelope was perfectly equal in all sawtooth tones. Consequently, the difference between the instrument spectrum and the constant envelope of the sawtooth spectrum would dominate as the first PCA component and therefore complicate the interpretation of the instrument pairs. In the second run, the PCA included only pairs with the sawtooth and an instrument.
Furthermore, we sharpened the dataset for the PCA by performing a t-test with the criterion p < 0.05. The small-p-value dataset included only stimulus pairs where the variability of the tuning values was relatively small, and the differences between the means of the tuning values for the reference and adjusted tone, respectively, deviated from zero with statistical significance. The sharpened dataset included 91 out of 162 pairs with instruments only, and 46 out of 108 pairs that included the sawtooth waveform.

Results
The collected data were aggregated and inspected by the basic categories of instrument pair, dynamic level, and pitch. Figure 1 shows the grand average of the subjects' preferred tuning values, including all instrument combinations and dynamic levels, combined in a single boxplot. The order of presentation of the instruments in the stimulus pairs (reference tone–adjusted tone) is not taken into account. For example, the contrabass clarinet and the contrabassoon appear both as a reference and adjusted tone in different stimulus pairs.
The figure illustrates that the variability of the tuning values is substantially larger for the lowest tones and decreases towards the higher. At the lowest musical pitch (A0), the interquartile range (25th-75th quantile) spans an interval of about 70 cents, reducing to 20 cents at the highest musical pitch (A2). Outliers reaching ±100 cents are present from A0 up to C2. The median values lie just below the zero line for all musical pitches, the only exception being A0 which reaches zero.
When looking at tuning values for the individual instrument pairs at different dynamic levels in Figure 2 it is hard to identify any general trend other than the increase in variability towards the lowest tones observed in Figure 1. In Figure 2, the order between the reference instrument tone and the adjusted tone is preserved. It is seen that the tuning value medians often vary between positive and negative values within one instrument pair. This suggests that the pitches of the tones of a particular instrument are heard seemingly randomly higher or lower than the compared instrument.
Furthermore, the 25th and 75th quantile ranges (boxes) typically cross the zero line, which represents tuning to equal fundamental frequencies (0 cents). The results indicate that individual subjects perceived the pitch of the low-register tones with a relatively wide variation. Visual investigation of the instrument pairs suggests that a slightly more constant tendency could be found in the pairs with the contrabass clarinet and the tuba at pp level (Fig. 2B). The contrabass clarinet requires tuning upwards to match a unison with the tuba.
The pitch of the sawtooth tones (Fig. 3) was in most cases perceived as higher than the pitch of the paired instrument tone. Similar to the stimulus pairs with instrument tones in Figure 2, the variability is high and of the same order of magnitude, increasing towards the lower musical pitches.
Whereas the mean tuning values across instruments and musical pitches appear random (Figs. 2 and 3), the degree of uncertainty in the tuning follows a more predictable curve. This effect is visualized in Figure 4 showing the mean absolute deviation in tuning values across all cases (instruments and sawtooth) and subjects for each of the nine musical pitches. The choice of mean absolute deviation (MAD) for describing the variability in tuning data, instead of the more commonly used standard deviation, was made in order to enable easier comparisons with other studies on pitch perception.
The data point for each musical pitch (thick line) reflects the spread of 930 tuning values (31 subjects × 10 stimulus pairs × 3 dynamic levels). Also here a higher variability in tuning for the lower musical pitches is evident. The mean absolute deviation increases from about 16 cents at A2 to 41 cents at A0. The calculated 95% confidence interval is narrow, about ±3 cents.
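The spread measure can be sketched as follows. The text does not state how the deviation was centered or how the confidence interval was obtained, so this example assumes deviations from the arithmetic mean and uses a percentile bootstrap as one possible way of estimating the interval. The 930 tuning values are simulated, not the experimental data.

```python
import random
import statistics


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean (assumed centering)."""
    center = statistics.fmean(values)
    return statistics.fmean([abs(v - center) for v in values])


def bootstrap_ci(values, statistic, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` (an assumed method)."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Simulated tuning values in cents for one musical pitch:
# 31 subjects x 10 stimulus pairs x 3 dynamic levels = 930 values.
rng = random.Random(1)
tuning_values = [rng.gauss(-3.0, 25.0) for _ in range(31 * 10 * 3)]

mad = mean_absolute_deviation(tuning_values)
ci_low, ci_high = bootstrap_ci(tuning_values, mean_absolute_deviation)
print(f"MAD = {mad:.1f} cents, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

With 930 values per musical pitch, an interval of only a few cents is plausible even when the individual tuning values scatter widely.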
The mean absolute deviations for each of the 31 subjects are included as separate curves, each data point representing 30 tuning values. The most reliable subjects showed mean absolute deviations between 7 and 27 cents, whereas the most uncertain subjects performed seemingly randomly with mean absolute deviations between 25 and 55 cents (excluding outliers). Since each stimulus pair was evaluated only once by each listener (no repeats), a formal analysis of the intrasubject variability in tuning was not possible.
Regarding the possibility of improved tuning accuracy among subjects who play a low-register instrument, the difference in performance between these subjects and the remaining subjects was explored over all tones. This was evaluated with Welch's two-sample t-test, which did not show a statistically significant difference between the subject groups (t(24) = −0.29, p = 0.77).
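For reference, Welch's test statistic and its approximate degrees of freedom (Welch–Satterthwaite) can be computed as in the sketch below. The group sizes and per-subject values are hypothetical and serve only to show the calculation. The p-value additionally requires the cumulative t-distribution, which is omitted here.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means (unbiased sample variances).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(
        se2_a + se2_b
    )
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical per-subject mean absolute deviations in cents.
bass_players = [18.2, 24.5, 30.1, 21.7, 27.3, 35.0, 19.8, 26.4]
other_players = [20.1, 23.9, 28.8, 33.2, 17.5, 25.6, 29.9, 22.4, 31.0, 26.7]

t_value, dof = welch_t(bass_players, other_players)
print(f"t({dof:.0f}) = {t_value:.2f}")
```

Unlike Student's t-test, this variant does not assume equal variances in the two groups, which is why the degrees of freedom are generally non-integer and smaller than the total sample size minus two.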

Spectral centroid
The correlation between the spectral centroid and mean tuning values was estimated for every combination of the instruments in the stimulus pair and for each of the nine musical pitches. Pearson's correlation coefficients ranged from ρ = −0.09 (double bass and tuba) to ρ = 0.18 (double bass and contrabassoon). This outcome suggests that the spectral centroid is not a consistent and indicative measure for explaining the perceptual pitch difference between two complex tones.
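A minimal sketch of such an analysis is given below. It assumes an amplitude-weighted centroid over the harmonics and correlates the centroid difference within each pair with the mean tuning value. Both choices, as well as all numerical values, are illustrative assumptions and may differ from the procedure actually used.

```python
import math


def spectral_centroid(f0, levels_db):
    """Amplitude-weighted mean frequency of the harmonics k * f0.

    `levels_db[k - 1]` is the level of harmonic k in dB. Amplitude
    weighting is an assumption; power weighting is also common.
    """
    amplitudes = [10.0 ** (level / 20.0) for level in levels_db]
    weighted = sum((k + 1) * f0 * a for k, a in enumerate(amplitudes))
    return weighted / sum(amplitudes)


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical harmonic levels (dB) of a reference and an adjusted tone at 55 Hz.
reference_db = [0.0, -6.0, -9.5, -12.0, -14.0]
adjusted_db = [0.0, -25.0, -8.0, -30.0, -13.0]
centroid_difference = spectral_centroid(55.0, reference_db) - spectral_centroid(
    55.0, adjusted_db
)

# Hypothetical centroid differences (Hz) and mean tuning values (cents).
centroid_differences = [12.0, -5.0, 20.0, 3.0, -9.0, 15.0]
mean_tunings = [4.0, 6.0, -2.0, 1.0, -3.0, 5.0]
rho = pearson(centroid_differences, mean_tunings)
print(f"centroid difference = {centroid_difference:.1f} Hz, rho = {rho:.2f}")
```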

Virtual pitch
The calculated virtual pitches showed no systematic correlation with the observed mean tuning values (see Fig. 2, virtual pitches marked by crosses). However, in a few cases (e.g., contrabassoon-contrabass clarinet pair in mf, see Fig. 2C), virtual pitch values follow the mean tuning values moderately well. Such effects were investigated statistically by calculating the correlations between mean tuning and virtual pitch for stimulus pairs. The highest correlation for the instrument pairs occurred between contrabass clarinet and tuba (ρ = 0.17). With the sawtooth tone included, the combination of sawtooth and contrabassoon produced ρ = 0.24. In all other cases, the correlation coefficients were lower. In short, no significant correlation between the mean tuning values and the virtual pitches could be observed.

Principal component analysis
The correlation analyses above did not reveal any consistent relationship between the mean tuning and overall spectral properties, represented by the spectral centroid and virtual pitch. A deepened analysis focused on differences in the harmonic structure of the compared tones as a possible cause of the large variability in the perceived pitches. "Difference spectra" were calculated and subjected to PCA, which decomposes the spectral differences into fewer common salient features. The difference spectra were calculated by subtracting the levels of the harmonics of the adjusted tone from the levels of the fixed reference tone on a decibel scale. An example of a difference spectrum is provided in Figure 5. The reference tone is the sawtooth waveform showing a 1/f magnitude spectrum. The adjusted instrument is the contrabass clarinet, which, similar to the clarinet, is characterized by attenuated second and fourth harmonics. The subtraction of the magnitude spectra results in the difference spectrum, shown in Figure 5B. Difference spectra spanning the first 10 harmonics were calculated for all included pairs of stimulus tones.
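The calculation of a difference spectrum can be sketched as follows. The sawtooth levels follow from the 1/f magnitude spectrum, whereas the clarinet-like tone is a hypothetical construction (the sawtooth levels with the second and fourth harmonics attenuated by an assumed 20 dB) and does not represent the measured contrabass clarinet samples.

```python
import math

N_HARMONICS = 10  # the difference spectra span the first 10 harmonics


def sawtooth_levels_db(n_harmonics=N_HARMONICS):
    """Harmonic levels of a 1/f magnitude spectrum, in dB re the fundamental."""
    return [-20.0 * math.log10(k) for k in range(1, n_harmonics + 1)]


def difference_spectrum(reference_db, adjusted_db):
    """Reference level minus adjusted level for each harmonic, in dB."""
    if len(reference_db) != len(adjusted_db):
        raise ValueError("both spectra must cover the same harmonics")
    return [ref - adj for ref, adj in zip(reference_db, adjusted_db)]


reference = sawtooth_levels_db()

# Hypothetical clarinet-like tone: attenuate harmonics 2 and 4 by 20 dB.
adjusted = list(reference)
for harmonic in (2, 4):
    adjusted[harmonic - 1] -= 20.0

difference = difference_spectrum(reference, adjusted)
for k, value in enumerate(difference, start=1):
    print(f"harmonic {k:2d}: {value:+.1f} dB")
```

With this sign convention, a positive value means that the harmonic is stronger in the reference tone than in the adjusted tone.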
As mentioned, the PCA was based on a sharpened dataset consisting of stimulus pairs showing a mean tuning deviation from zero with statistical significance (see Sect. 3.5). The PCA solutions for the difference spectra were obtained separately for tone pairs without and with the sawtooth tone (see Fig. 6). The first component (PC1) was identified to characterize the emphasis of the lowest harmonics and very weak even-order harmonics. The next component (PC2) illustrates a combination of pronounced 2nd and 4th harmonics as well as reduced spectral balance at further harmonics. Notably, PCA gives this component an opposing polarity in the analyses with and without the sawtooth tone. PC3 describes the absence of the fundamental and harmonics 5–7, as well as the second harmonic in the analysis without the sawtooth tone. Components with a strong contrast between odd and even harmonics can be associated with a contrabass clarinet-type tone, which often features attenuated even harmonics.
Although the eigenvalues of PC4 and PC5 do not exceed unity, these remaining components are shown here in a residual sense.
For instrument tone pairs (Fig. 6, top row), the first two principal components together explain nearly half of the total variance of the spectral differences (28% and 21%, respectively). The corresponding values for the sawtooth–instrument pairs were 24% and 23%, respectively.
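The explained-variance proportions quoted above are the eigenvalues of the leading components divided by the total variance. The sketch below shows this with a plain power iteration on the covariance matrix of a few hypothetical difference spectra. It is not the software or the preprocessing used in the analysis. In particular, the reference to eigenvalues exceeding unity suggests standardized variables, which is not done here.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows` (observations x variables)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenpairs(matrix, count, iterations=500):
    """Largest eigenpairs of a symmetric PSD matrix (power iteration, deflation)."""
    d = len(matrix)
    scale = sum(matrix[i][i] for i in range(d)) or 1.0
    work = [row[:] for row in matrix]
    pairs = []
    for _component in range(count):
        # Deterministic, non-uniform start vector (an arbitrary choice).
        vec = [float(i + 1) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm <= 1e-12 * scale:
                break  # remaining variance is numerically zero
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(d) for j in range(d))
        pairs.append((value, vec))
        # Deflation: remove the component just found.
        for i in range(d):
            for j in range(d):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def explained_variance_ratios(rows, count):
    """Share of the total variance carried by each of the first `count` components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [value / total for value, _vec in leading_eigenpairs(cov, count)]


# Hypothetical difference spectra (dB), four harmonics per tone pair.
difference_spectra = [
    [2.0, 18.0, -1.0, 21.0],
    [1.0, 15.0, 0.5, 17.0],
    [-3.0, -12.0, 2.0, -15.0],
    [0.5, 3.0, -4.0, 1.0],
    [-1.5, -6.0, 5.0, -2.0],
    [4.0, 9.0, -2.5, 12.0],
]
ratios = explained_variance_ratios(difference_spectra, 2)
print([f"{100 * r:.0f}%" for r in ratios])
```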
The biplots in Figure 7 show the difference between instrument pairs in the PC1-PC2 dimensions. For improved visual clarity, the clouds of data points for individual instrument pairs are shown with confidence ellipses. The arrows represent the 5 harmonic difference components (HDC) individually. Their lengths and directions indicate the contribution to the corresponding PC dimensions.
Together, the confidence ellipses, their centers, and the directions of HDC arrows illustrate the variability of spectral differences between instrument pairs. The location of the center of the confidence ellipse describes the average composition of the spectral difference in terms of PC1 and PC2 associated with a certain instrument pair. The angle of the ellipse major axis characterizes the strongest variability of the spectral differences within the respective pair. The smallest confidence ellipses are found for instrument pairs where the spectral difference remains constant regardless of the fundamental frequency and dynamic level.
The key findings can be interpreted as follows: In Figure 7A, the pair contrabassoon-contrabass clarinet shows a strong average deviation from the origin towards positive PC1 and negative PC2 values. The relations to the HDC arrows for harmonics 1, 2, and 4 suggest that the reference instrument tone of the pair (cbsn) contains substantially more even harmonics but its fundamental is weaker. For this instrument pair, the overall variation in spectral differences across tones and dynamic levels includes both PC1 and PC2.
A seemingly contradictory result can be observed for tone pairs including the contrabass clarinet (bolded ellipses in Fig. 7A). Since the contrabass clarinet is included both as a reference tone (with double bass and tuba) and an adjustable tone (with contrabassoon), subtraction of the adjustable tone spectrum from the reference tone spectrum results in a partially opposing spectral difference for contrabass clarinet pairs (opposite signs of the harmonic magnitude differences). Hence, the slopes and directions from the origin of the corresponding confidence ellipses vary between pairs including the contrabass clarinet as a reference and adjustable tone, respectively.
Within the instrument group, the centers of all three ellipses including contrabass clarinet deviate from the origin more than other instrument pairs, which in turn are notably co-centric (Fig. 7A). This strongly suggests that the absence of even harmonics in the clarinet-type spectrum is the most differentiating single spectral feature. The influence of the lowest even-order HDC, i.e., second and fourth harmonics, is prominent in both biplots in Figure 7 (without and with sawtooth tone pairs). This is particularly apparent in the pair with the sawtooth tone as a reference and the contrabass clarinet as an adjustable tone in Figure 7B. Here, HDC2 and HDC4 both point in the positive PC2 direction, suggesting that these harmonics were substantially stronger in the sawtooth tone than in the contrabass clarinet. The positive sign of HDC2 and HDC4 is reflected in the mean tuning values for sawtooth and contrabass clarinet in Figure 3 which all lie above or on the zero line. This indicates that the weak second and fourth harmonics in the contrabass clarinet contributed to a lowering of the perceived pitch compared to the sawtooth. As a consequence, the fundamental frequency of the contrabass clarinet has to be adjusted to a higher value to obtain a unison with the sawtooth.
For comparison, the analyses above were repeated on the entire unsharpened tuning data set. That is, all tone pairs were included regardless of the p-value of the mean tuning values. The PCA result remained generally similar to the sharpened dataset results, and the explained variances did not differ more than 1%.
Music dynamics had in general only a marginal effect on the perceived pitch differences according to the PCA, although the tuning values of different instrument combinations in Figures 2 and 3 showed some variations between dynamic levels.
As a result of the PCA, the salient harmonic difference components between the tone pairs were compared with the mean tuning values in Section 3.5. On the sharpened data set, which contains only the tone pairs whose mean tuning deviated significantly from zero, PC2 showed the highest correlation with the mean tuning value for stimulus pairs including sawtooth as reference (ρ = 0.31). For stimulus pairs with two instruments, PC1 had the highest correlation with the mean tuning value (ρ = 0.24). The same PCs showed the strongest correlations with mean tuning across the full data set, but the correlation coefficients were lower: ρ = 0.14 for sawtooth pairs (PC2) and ρ = 0.22 for instrument pairs (PC1).
Referring to the shape of the difference spectra in Figure 6, this result suggests that the absence of even harmonics has the most substantial influence on the lowering of the perceived pitch of low-register complex tones. Thus, the adjustable instrument tone with missing even harmonics (contrabass clarinet) was tuned higher to match the perceived pitch of a tone with a more uniform harmonic spectrum (see Figs. 2B-2C and 3).

Discussion
A striking result of the listening test is the overall large spread in the tuning values and the large variations between subjects. Altogether, the results reflect a large uncertainty in the perception of pitch at low frequencies.
Most of the participants reported that tuning of stimulus pairs at higher fundamental frequencies (>50 Hz) was straightforward and easy. Despite this apparently easy task, a remarkably large spread in tuning adjustments was evident in the results. For the lowest musical pitches, i.e., A0 (27.6 Hz) and C1 (32.5 Hz), the variability was high because it was difficult even to identify which tone was presented, which indicates that the stimuli gave very weak cues to the perception of melodic pitch. This observation is consistent with the results of a study by Krumbholz et al. [33], where the lower limit for temporal processing of pitch was reported to be about 30 Hz.
The high variability between and within subjects does not, however, conceal the general trend in the data, showing larger uncertainty in tuning towards lower musical pitches. This is evident from Figure 4. The grand average of the mean tuning values (black line), which is based on 930 observations (30 × 31) of each of the nine musical pitches, shows a narrow 95% confidence interval of only about ±3 cents.
A formal analysis of the intrasubject variability in tuning was not possible to conduct as each stimulus pair was evaluated only once by each listener (no repeats). It is clear, however, that within the group of 31 professional musicians who participated in the study, there are large individual differences in the pitch perception and tuning accuracy. As seen in Figure 4 a few participants (about 5) had quite random and wide mean deviations. A majority of participants (about 15) were placed close to the average and a smaller group (about 10) performed considerably better. The best subjects lie consistently about 10 cents below the grand average line.
It is interesting to compare our results with previous studies on pitch discrimination, in particular measurements of the just noticeable difference in pitch (JND), usually reported as the corresponding difference in fundamental frequency. The great majority of studies of JNDs have been made using pure tone stimuli. The results are therefore not directly comparable with our experiments, in which pairs of stimulus tones obtained from samples of musical instruments were compared. Published data in the low-frequency register are sparse, but a common result from previous studies is that JNDs worsened significantly towards the lowest musical pitches, in agreement with our results. The grand average curve of the mean absolute deviation in Figure 4 replicates the JND curves of previous studies, but it is shifted down by about 30–60 cents compared to pure tone stimuli [37]. Using harmonic tones apparently improves the pitch discrimination.
In some previous experiments, pairs of complex harmonic tones that differ in a few spectral properties were compared, which is still far from our stimuli based on tones from musical instruments. In a recent study by Mehta and Oxenham [35], 12-harmonic complex tones were used as stimuli, which resemble our 13-harmonic sawtooth stimulus. They used stimuli where the three lowest harmonics were absent, which definitely worsened the JND. The aim was to study the influence of listening to individual harmonics rather than to "overall" pitch defined by the periodicity of the tone. The deviation from our results in Figure 4 was large, especially in the lowest register where they reported JNDs up to 130 cents higher at 30 Hz (about A0). Their experiment is also relevant for studying the pitch perception of bass instruments with weak or missing fundamentals in the lowest register, for example, the double bass.
In our study, the participants were instructed to listen to the tones holistically without paying attention to individual harmonics. However, a few listeners may at times have been particularly sensitive to certain frequencies, which may have placed unwanted emphasis on some harmonics and, in turn, shifted the listener into a "wrong" (spectral) listening mode. This effect could result in intermittent outliers in otherwise more consistent data. However, according to Mehta and Oxenham [35], this listening option is not available for the lowest pitches as all harmonics are spectrally unresolved in the auditory periphery.
The contrabass clarinet was reported to be the most challenging stimulus. This opinion was supported by the PCA results which indicate that the contrabass clarinet evoked a pitch perception that was different from the other instruments. It has a deviating spectrum contour, showing alternating strong and weak harmonics and a strong fundamental frequency (Fig. 5). The PCA results showed that the strength of the second and fourth harmonics influenced the perceived pitch, as reflected in the mean tuning values. The contrabass clarinet required tuning upwards to match a unison with most other instruments and the sawtooth. That means that the pitch of the contrabass clarinet was perceived to be lower than other instruments when adjusted to the same fundamental frequency. The reason could be that the weak or almost lacking second and fourth harmonics (octave and double octave above the fundamental) may make the pitch perception more difficult.
Although a direct comparison of the mean values and variability in tuning values for different instrument combinations in Figures 2 and 3 did not indicate any particular difficulties in the pitch perception of the contrabass clarinet, the PCA suggests that the harmonic structure with weak low even-order harmonics had a major influence on the pitch perception and tuning values for stimulus pairs including this instrument. Therefore, we could conclude that due to the spectral difference, the contrabass clarinet-like sound was perceived to have a lower pitch in the lowest register.
According to our earlier psychoacoustic study on the octave enlargement phenomenon [7], the general stretching curve is almost horizontal below A2 (110 Hz). However, in that study, the clarinet curve differed from other instruments and was more like a "J"-shape on its side, where the lowest register bends upwards. This effect may be explained by the findings in this study. If the pitch of the clarinet is perceived lower than would be expected from the periodicity of the tone, the pitch has to be adjusted upwards to achieve a perceptually correct octave interval.
Apparently, the influence of the relative strength of the harmonics on the perceived pitch is an important factor to consider. In this connection, it should be noted that since the sound pressure levels were equalized within pairs in our experiment, the prominence of the higher harmonics was relatively emphasized for instruments having weak or missing fundamental, like the double bass in the lowest octave.
Furthermore, due to the low sensitivity of the human ear in the low-frequency region [38], a weak fundamental may even have been completely inaudible for the lowest tones. Altogether, this may have influenced the perceived pitch.
Regarding the subjects' ability to hear differences between presented spectra, the conventional audiogram does not reveal much about real sensitivity to individual harmonics. Between the sparse measurement points of the audiogram, narrow frequency bands may be differently sensitive, which can cause amplification of some harmonics and attenuation of others. This, in part, may also affect the perceived pitch and explain the relatively large intersubject difference as suggested by Morgan and Galambos [16]. If audiograms with narrow frequency bands (e.g., one-semitone steps instead of octaves or fifths) were collected from participants, they might help in similar types of studies to find correlations between hearing sensitivity, perceived pitch, and spectral envelope.
We did not find any signs that the players of bass instruments would have performed better on the listening test. Probably, the accuracy of pitch perception in the low register has its limits which do not depend significantly on the listener's training or background.
In the context of electronic tuning machines, it may be relevant to consider their usefulness, especially in the lowest register even though they are technically precise. This is an important question, particularly in the case of clarinet instruments, whose perceived pitch seems to be lower compared to other instruments.

Conclusions
The conducted listening experiment showed that the perceived pitch of low-register complex tones (derived from musical instrument samples) exhibits large variability and is highly dependent on the listener, spectrum, and dynamic level. Using 31 professional musicians as participants, the spread (mean absolute deviation) in the tuning of a melodic interval to unison, using complex tones with different spectra, increased continuously from 16 to 41 cents in the low-frequency range from 110 Hz (A2) to 27.6 Hz (A0). The result suggests that the participants were not able to determine an unambiguous reference tone over a considerable part of this frequency range. That is, towards the lowest register the uncertainty in the pitch judgments increased. However, it is debatable whether that is due to a reduced judgment ability (central) or inherent uncertainty in the information in the auditory nerve (peripheral). From a musical perspective, the result would imply that a melodic line in the bass register may be perceived as undefined, in particular in the lowest octave A0-A1.
The incongruity between the perceived pitch of the reference tones (using four different spectra) and Terhardt's model of the virtual pitch was substantial.
A prominent result achieved by PCA was that the musicians perceived the pitch of tones derived from samples of the contrabass clarinet to be somewhat lower than the pitch of other bass instruments. The plausible reason for this slight pitch shift is that the second and fourth harmonics are attenuated in the contrabass clarinet spectrum.

Conflict of interest
Author declared no conflict of interests.

Data availability statement
The research data (see Sect. 3.2) associated with this article are available in Zenodo, under the references [31] and [32].