Open Access
Issue
Acta Acust.
Volume 8, 2024
Article Number 12
Number of page(s) 12
Section Speech
DOI https://doi.org/10.1051/aacus/2024002
Published online 01 March 2024

© The Author(s), Published by EDP Sciences, 2024

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

Human voice directivity has been investigated for more than 200 years [1–3]. However, only a few of these studies determined directivity patterns in the vertical plane or, more generally, analyzed full-spherical voice radiation. The first such study, by Dunn and Farnsworth [4], was published in 1939 and determined the spherical directivity patterns for a spoken sentence at different distances in octave or one-half octave bands from 63 Hz up to 12 kHz. Although this is not mentioned in the paper, the data allow the analysis of specific aspects of vertical speech directivity, such as the frequency dependence of the main radiation direction (MRD). Similar vertical voice directivity patterns were measured by Marshall and Meyer [5], who already briefly discussed a downward-directed MRD in their study. Chu and Warnock [6] provided means and standard deviations of voice directivity patterns measured for 40 subjects at 92 positions on a spherical grid in one-third octave bands. Owing to the higher frequency resolution, these data show more clearly than the previous studies how the MRD of the human voice varies with frequency. While the radiation is slightly downward directed for most frequency bands, at about 1 kHz the MRD is elevated by about 30°. Recently, Leishman et al. [7] published results of measurements on six subjects with a resolution of 5° in the horizontal and vertical planes that further support previous findings on the MRD. However, because the directivity patterns in all these studies were averaged over a full sentence of fluent speech, these datasets do not allow for the analysis of phoneme dependencies.

In Pörschmann and Arend [8, 9], we determined the spherical directivity patterns of 23 phonemes of different groups for 13 subjects using a 32-channel surrounding spherical microphone array and examined both the directivity patterns and the directivity indices. Our results showed statistically significant differences between phoneme groups and between some phonemes within each group. Most importantly in the context of the present study, this comprehensive database allows a phoneme-dependent analysis of the MRD.

Furthermore, only a few studies addressed the perceptual influence of the ground reflection, which is often only slightly lower in intensity than the direct sound, and thus can affect localization, distance perception, and perceived spaciousness [10–15]. However, none of these studies considered natural human voice radiation or a downward-directed MRD, which can lead to a ground reflection reaching a listener at a higher intensity than the direct sound. The present study aims to disentangle these phoneme and frequency dependencies of the MRD and its effect on the ground reflection, extending our previous studies on human voice directivity [8, 9].

2 Methods

This work uses the measured datasets from our previous studies [8, 9, 16]¹, which were carried out with 13 subjects in the anechoic chamber of TH Köln using a surrounding spherical microphone array with 32 sampling positions [17, 18]. In the present study, the following articulations were considered:

  • Phonetically balanced sentence²

  • Vowels: [a], [e], [i], [o], [u]

  • Plosives: [p], [t], [k], [b], [d], [g]

  • Unvoiced fricatives: [f], [s], [ʃ], [x], [h]

  • Voiced fricatives: [z], [v]

  • Nasals: [m], [n], [ŋ]

  • Voiced alveolars: [l], [r].

The phonetically balanced sentence was recorded to determine an average directivity. For all the phonemes, a suitable voice excitation signal was chosen that had a broadband energy spectrum and thus allowed calculating the directivity pattern continuously over frequency. For the voiced phonemes, the glissando method [19, 20] was applied, which means the subjects sang the phoneme with an increasing pitch over at least one octave. For the plosives and the unvoiced fricatives, the participants repeated the respective phoneme three times for each measurement. For the unvoiced [x], the subjects continuously articulated the phoneme for at least three seconds. For each measured articulation, we determined relative transfer functions related to the frontal direction by deconvolving the microphone array signals with the signal from an additional frontal reference microphone, yielding a dataset with 32 raw impulse responses per articulation. The measured datasets were spatially upsampled to a Lebedev grid [21] with 2702 sampling points using the SUpDEq (Spatial Upsampling by Directional Equalization) method [22], which is suitable for upsampling voice directivities [23]. The spatially upsampled directivity patterns were stored as impulse responses with a length of 128 coefficients at a sampling rate of 48 kHz. Thus, the datasets from [8, 9, 16] provide a spectrally and spatially dense description of the directivities. While for the sentence only one directivity dataset per person was measured, each phoneme was measured twice per person and the resulting directivity data were averaged.
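The deconvolution step can be illustrated with a minimal Python sketch. It assumes a regularized spectral division, which the text does not specify; the function name, the regularization constant `rel_eps`, and the synthetic test signals are hypothetical. Only the length of 128 coefficients is taken from the description above.

```python
import numpy as np

def relative_impulse_response(mic, ref, n_taps=128, rel_eps=1e-8):
    """Deconvolve one array-microphone signal by the frontal reference signal.

    mic, ref: 1-D arrays of equal length (same recording, different microphones).
    Returns the first n_taps samples of the relative impulse response.
    """
    n = len(ref)
    X = np.fft.rfft(ref, n)
    Y = np.fft.rfft(mic, n)
    power = np.abs(X) ** 2
    # Regularized spectral division (an assumption of this sketch): keeps the
    # quotient bounded where the reference signal carries little energy.
    H = Y * np.conj(X) / (power + rel_eps * power.max())
    return np.fft.irfft(H, n)[:n_taps]

# Synthetic check: the "array" signal is the reference, attenuated by a
# factor of 0.5 and circularly delayed by 5 samples.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
mic = 0.5 * np.roll(ref, 5)
h = relative_impulse_response(mic, ref)
```

In this toy case the recovered response is close to a single tap of height 0.5 at sample 5, which is what a pure delay and attenuation relative to the frontal microphone should produce.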

We analyzed the directivities $D(\phi_0, \theta, f)$ in the frontal vertical plane (also called the sagittal plane; azimuth ϕ0 = 0°, elevation θ = −90° to 90°, where positive angles point upward). This was done by transforming the upsampled datasets to the spherical harmonic (SH) domain with an SH order of N = 35 and then resampling them using the inverse SH transform to dense datasets along the vertical plane in steps of 1° as evaluated in our previous study [23]. From this, the frequency-dependent value of the MRD of each individual dataset was calculated:

$$ \mathrm{MRD}(f) = \underset{\theta}{\arg\max}\, D(\phi_0, \theta, f). $$ (1)
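As a minimal sketch of equation (1), the MRD can be read off a sampled vertical-plane directivity with an argmax over elevation. The array layout (elevations along the first axis, frequency bins along the second), the use of the magnitude, and the Gaussian toy lobes are assumptions made for illustration only.

```python
import numpy as np

def main_radiation_direction(D, elevations_deg):
    """Elevation (in degrees) of maximum magnitude for each frequency bin.

    D: array of shape (n_elevations, n_frequencies), vertical-plane directivity.
    elevations_deg: the n_elevations sampled elevation angles.
    """
    idx = np.argmax(np.abs(D), axis=0)
    return np.asarray(elevations_deg)[idx]

# Toy data on a 1-degree grid: two frequency bins whose lobes point
# to -45 degrees and +30 degrees, respectively.
elevations = np.arange(-90, 91)
centers = np.array([-45.0, 30.0])
D = np.exp(-0.5 * ((elevations[:, None] - centers[None, :]) / 20.0) ** 2)
mrd = main_radiation_direction(D, elevations)
```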

Furthermore, we determined the main radiation direction gain (MRDG), which describes the level difference between radiation towards the MRD and the frontal direction:

$$ \mathrm{MRDG}(f) = 20 \lg \frac{|D(\phi_0, \mathrm{MRD}, f)|}{|D(\phi_0, \theta_0, f)|}, $$ (2)
with $(\phi_0, \theta_0)$ the frontal direction.
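Equation (2) can be sketched in the same way: the MRDG is the level of the largest magnitude along the vertical plane relative to the magnitude at the frontal elevation. The function name, the array layout, and the toy values are hypothetical.

```python
import numpy as np

def mrd_gain_db(D, elevations_deg, front_deg=0.0):
    """Level difference (dB) between radiation towards the MRD and the front.

    D: array of shape (n_elevations, n_frequencies).
    """
    mag = np.abs(D)
    elevations_deg = np.asarray(elevations_deg, dtype=float)
    # Index of the sampled elevation closest to the frontal direction.
    front = int(np.argmin(np.abs(elevations_deg - front_deg)))
    return 20.0 * np.log10(mag.max(axis=0) / mag[front])

elevations = np.arange(-90, 91)
D = np.ones((elevations.size, 2))
D[45, 1] = 2.0  # second bin: twice the frontal amplitude at -45 degrees
gain = mrd_gain_db(D, elevations)
```

By construction the MRDG cannot be negative, because the frontal direction is itself one of the candidate directions in the maximum.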

Finally, to analyze the strength of the ground reflection, we determined the ground reflection gain (GRG) as the level difference between the ground reflection and the direct sound incidence at the listener’s position:

$$ \mathrm{GRG}(d,f) = 20 \lg \left( \sqrt{1-\alpha(f)}\; \frac{d}{d'}\; \frac{|D(\phi_0, \theta', f)|}{|D(\phi_0, \theta_0, f)|} \right), $$ (3)
where α denotes the frequency-dependent absorption coefficient of the ground floor. The geometric quantities depend on the relative position of speaker and listener: d denotes the distance between the human speaker and the listener, d′ the path length of the ground reflection, and θ′ the radiation direction of the ground reflection.
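The following Python sketch evaluates equation (3) for a single frequency. It assumes a flat floor and an image-source construction for the reflection path, which the text implies but does not spell out; the function names and the parameter `dir_ratio` (the linear ratio of the directivity magnitudes towards θ′ and towards the front) are hypothetical.

```python
import math

def ground_reflection_geometry(d, h_src, h_rcv):
    """Image-source geometry for a flat floor (assumption of this sketch).

    Returns the reflection path length d' in metres and the radiation
    elevation theta' in degrees (negative means downward).
    """
    d_refl = math.hypot(d, h_src + h_rcv)
    theta_refl = -math.degrees(math.atan2(h_src + h_rcv, d))
    return d_refl, theta_refl

def grg_db(d, h_src, h_rcv, alpha, dir_ratio):
    """Ground reflection gain in dB for one frequency, following equation (3)."""
    d_refl, _ = ground_reflection_geometry(d, h_src, h_rcv)
    return 20.0 * math.log10(math.sqrt(1.0 - alpha) * d / d_refl * dir_ratio)

# Omnidirectional source (dir_ratio = 1) over a fully reflecting floor,
# mouth and ears at 1.5 m, 4 m apart.
d_refl, theta_refl = ground_reflection_geometry(4.0, 1.5, 1.5)
grg_omni = grg_db(4.0, 1.5, 1.5, alpha=0.0, dir_ratio=1.0)
```

For an omnidirectional source the GRG is always negative because d′ > d, so a positive GRG can only arise when the directivity ratio outweighs the longer path and the floor absorption.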

3 Results

3.1 Main radiation direction

Figures 1 and 2 show the vertical-plane directivity averaged over all subjects for all examined phonemes as well as the respective MRD. For the phonetically balanced sentence as well as for almost all phonemes, there is a prominent maximum of radiation downwards towards θ ≈ −45° between 600 Hz and 800 Hz and upwards towards θ ≈ 30° between 800 Hz and 1 kHz. As an exception, both maxima can hardly be observed for the voiceless fricatives [f], [s], and [ʃ]. For most phonemes, there are further local maxima in a third region between 1.5 kHz and 1.8 kHz and in a fourth region above 4 kHz, where the radiation is directed downwards to θ ≈ −30°. The fourth region lies at about 5 kHz for the vowels, the plosives, and most of the fricatives, and at about 7 kHz for the nasals. The black lines in Figures 1 and 2 show the MRD averaged over all subjects, which points downwards at $-45^{\circ} \le \theta \le -15^{\circ}$ below 800 Hz and above 1.6 kHz, confirming the results of previous studies [4–7]. The plots also show that for all phonemes, the MRD rises sharply to θ ≈ +15° at f ≈ 800 Hz and drops back to θ ≈ −20° at f ≈ 1.6 kHz.

Figure 1. Vertical directivity averaged over all subjects over frequency for the sentence, the vowels and the plosives. The black line shows the MRD averaged over all subjects.

Figure 2. Vertical directivity averaged over all subjects over frequency for the fricatives, nasals and alveolars. The black line shows the MRD averaged over all subjects.

Even though some phoneme dependencies can be observed, it is difficult to compare them directly based on these plots. Therefore, Figure 3 shows the subject-averaged MRD for the different phoneme groups, revealing deviations of the MRD within the vowels, nasals, and especially the plosives. For further comparison, each plot also depicts the MRD for the sentence, which can be regarded as a phonetically averaged value. Generally, above 4 kHz, both the differences between the phonemes and the standard deviations, which indicate the amount of inter-subject variation, increase. Both effects could be due to the higher spatial complexity of the directional characteristics, with peaks and dips at different frequencies and directions. In the case of the nasals, the MRD is slightly shifted towards lower frequencies, so that the sharp increase from a downward to an upward MRD starts at about 700 Hz. The MRD of the voiced [z] is similar to that of the voiceless fricatives [s] and [ʃ]. However, the standard deviations below 1 kHz are much larger for the [z] than for any of the other phonemes. In contrast, for the plosives, the standard deviations are small compared to the other phonemes.

Figure 3. Mean values and standard deviations across subjects of the MRD (1/12-octave smoothed) for the phoneme groups. For comparison, each plot also shows the MRD for the phonetically balanced sentence.

3.2 Main radiation direction gain

Figure 4 shows the means and standard deviations of the MRDG for the different phoneme groups, illustrating the strength of the increased sound radiation towards the MRD. The plots indicate frequency- and phoneme-dependent differences in MRDG, which were statistically confirmed by a Greenhouse–Geisser (GG) corrected [24] two-way repeated measures analysis of variance (ANOVA) with the within-subject factors phoneme (23 phonemes) and frequency (19 third-octave frequency bands from 125 Hz to 8 kHz). The ANOVA revealed a significant main effect of phoneme [F(22,264) = 15.42, pGG < .001, $\eta_p^2 = .56$, ϵ = .25] and frequency [F(18,216) = 50.80, pGG < .001, $\eta_p^2 = .81$, ϵ = .15], as well as a significant phoneme × frequency interaction effect [F(396,4752) = 7.48, pGG < .001, $\eta_p^2 = .38$, ϵ = .02].

Figure 4. Mean values and standard deviations across subjects of the MRDG (1/12-octave smoothed) for the phoneme groups. For comparison, each plot also shows the MRDG for the phonetically balanced sentence.

For most phonemes, there is a prominent peak at f ≈ 700 Hz. However, some phonemes, such as the unvoiced fricatives [f], [s], and [ʃ], do not show this peak, whereas others, such as the voiced alveolars [l] and [r], show a stronger peak when compared visually to the other phonemes. For statistical analysis of the MRDG across phoneme groups, we performed pairwise comparisons between the frequency-dependent means of each phoneme group (i.e., the MRDG averaged across the respective phonemes of each group) using nested GG-corrected two-way repeated measures ANOVAs with the within-subject factors frequency and phoneme group. The initial significance level of 0.05 was further corrected according to Hochberg [25] to prevent alpha-error accumulation. Most interestingly, the ANOVAs yielded significant frequency × phoneme group interaction effects for all five comparisons involving the unvoiced fricatives (all pGG ≤ .003; for the sake of conciseness, the statistical results of the nested ANOVAs are not reported in greater detail throughout the manuscript). The effect is driven by the deviations in MRDG below 1 kHz for [f], [s], and [ʃ], indicating that the downward radiation is weaker for these unvoiced fricatives than for any of the other phonemes examined in the present study. Furthermore, the analysis revealed significant frequency × phoneme group interaction effects for the five pairwise comparisons involving the voiced alveolars (all pGG ≤ .01), indicating that the downward radiation below 1 kHz is stronger for these phonemes than for all other phonemes. The MRDG remains below 2 dB for all phonemes between 1 kHz and 4 kHz. Above 4 kHz, the MRDG increases most for the nasals, reaching values of up to 6 dB. For all other phonemes, the MRDG is below 4 dB. This is consistent with our earlier observations [9, Fig. 6], where we found for the nasals a strong dip in frontal radiation relative to other directions above 5 kHz. The statistical analysis further confirmed this observation, with significant frequency × phoneme group interaction effects for the five pairwise comparisons involving the nasals (all pGG < .001), which are driven by the comparably high MRDG of the nasals in the higher frequency range. Thus, in the frequency range above 4 kHz, the downward radiation seems to be significantly stronger for nasals than for any of the other phonemes analyzed.
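The Hochberg correction [25] mentioned above is a step-up procedure, which can be sketched in a few lines of Python. The function name and the example p-values are hypothetical and do not correspond to the results of this study.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure.

    Returns a list of booleans, True where the null hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    reject = [False] * m
    # Start at the largest p-value and stop at the first k that satisfies
    # p_(k) <= alpha / (m - k + 1); all smaller p-values are rejected too.
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            for i in order[:k]:
                reject[i] = True
            break
    return reject

all_rejected = hochberg_reject([0.01, 0.04, 0.03])
one_rejected = hochberg_reject([0.06, 0.03, 0.001])
```

In the first example the largest p-value already passes the uncorrected level, so all three hypotheses are rejected; in the second only the smallest p-value passes its threshold of 0.05/3.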

3.3 Ground reflection gain

To estimate how the MRD affects the ground reflection, we further determined the GRG. A downward-oriented MRD amplifies the ground reflection compared to the horizontally directed direct sound component. As a result, the ground reflection can be even stronger than the direct sound component, resulting in GRG > 0 dB. The GRG is maximal at distances where the radiation direction of the ground reflection coincides with the MRD and thus has a strong frequency dependence that is additionally affected by the distance between the human speaker and the listener. In the following, we assume an ideally reflecting ground floor (α = 0) and a height of 1.5 m for both the human speaker’s mouth and the listener’s ears, which represent typical values for a natural communication scenario with two persons facing each other. We consider a varying distance between the human speaker and the listener. Figures 5 and 6 show the resulting subject-averaged GRG for the phonetically balanced sentence and each of the phonemes. In each plot, the black line indicates where the direct sound and the ground reflection are equally strong (GRG = 0 dB).
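To make the link between distance and radiation direction concrete, the sketch below computes the speaker-listener distance at which the floor reflection leaves the mouth at a given downward angle. It assumes a flat floor and specular reflection; the function name is hypothetical, and the 1.5 m heights are the values stated above.

```python
import math

def distance_for_reflection_angle(theta_deg, h_src=1.5, h_rcv=1.5):
    """Distance (m) at which the floor reflection is radiated at theta_deg.

    theta_deg must be a downward elevation, i.e. between -90 and 0 degrees.
    """
    if not -90.0 < theta_deg < 0.0:
        raise ValueError("theta_deg must lie between -90 and 0 degrees")
    return (h_src + h_rcv) / math.tan(math.radians(-theta_deg))

d_45 = distance_for_reflection_angle(-45.0)
d_30 = distance_for_reflection_angle(-30.0)
d_15 = distance_for_reflection_angle(-15.0)
```

Under these assumptions, radiation angles of −45°, −30°, and −15° correspond to distances of 3 m, about 5.2 m, and about 11.2 m, so a downward MRD between −45° and −15° is aligned with the floor reflection at conversational to moderately large distances.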

Figure 5. Subject-averaged GRG over frequency for the phonetically balanced sentence, the vowels and the plosives. The black lines indicate where the direct sound and the ground reflection reach the listener with equal strength (GRG = 0 dB).

Figure 6. Subject-averaged GRG over frequency for the fricatives, nasals and alveolars. The black lines indicate where the direct sound and the ground reflection reach the listener with equal strength (GRG = 0 dB).

Below 800 Hz, the subject-averaged GRG is higher than 0 dB for most phonemes at distances above 3 m. For the sentence, the GRG is maximal at a distance of 4 m at 750 Hz with a GRG of 2.6 dB. The phonemes lead to a maximal GRG of 4.6 dB for an [i] at 700 Hz at a distance of 3.85 m and an [l] at 800 Hz at a distance of 3.2 m. In contrast, for frequencies between 800 Hz and 1.6 kHz, the GRG is below 0 dB for all phonemes. Thus, we found a similar structure of the GRG for most phonemes. Exceptions are the voiceless fricatives [f], [s], [ʃ], and the [z], for which the maximal values of the GRG below 800 Hz are less than 1 dB. For the [s], the [ʃ], and the [z], the highest GRG of more than 1 dB can be found in a region between 4 kHz and 7 kHz. These similarities between the [z] and the [s], which are voiced and unvoiced counterparts, are worth noting. For the plosives [t], [k], [b], [d], and [g], there is a second frequency range with an increased GRG at about 5.5 kHz, reaching values of up to 2.7 dB for the [d]. The nasals show similar behavior but at slightly higher frequencies of about 7.5 kHz, reaching maximal values of 3.2 dB for the [m] at a distance of 4 m.

The ground reflection and thus the GRG are directly affected by the absorption coefficient α of the ground floor, according to equation (3). The influence of α is visualized for the phonetically balanced sentence in Figure 7. For α = 0.2, there is still a region below 1 kHz at distances above 3 m where the GRG is above 0 dB and another one above 4 kHz. For α = 0.4, the region shrinks to a small area around 700 Hz and 5 m distance, and it vanishes completely for α = 0.6.
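Because α enters equation (3) only through the factor √(1 − α), a frequency-independent absorption coefficient shifts the whole GRG by a constant offset of 10 lg(1 − α). The short Python sketch below evaluates this offset; the function name is hypothetical, and the 2.6 dB reference is the maximum reported above for the sentence at α = 0.

```python
import math

def absorption_offset_db(alpha):
    """Level offset of the ground reflection caused by floor absorption.

    Equals 20*lg(sqrt(1 - alpha)) = 10*lg(1 - alpha); requires 0 <= alpha < 1.
    """
    return 10.0 * math.log10(1.0 - alpha)

sentence_max_db = 2.6  # maximum GRG of the sentence for alpha = 0 (from the text)
shifted = {a: sentence_max_db + absorption_offset_db(a) for a in (0.0, 0.2, 0.4, 0.6)}
```

The offsets are roughly −1.0 dB, −2.2 dB, and −4.0 dB for α = 0.2, 0.4, and 0.6. The shifted maximum of the sentence therefore stays slightly above 0 dB for α = 0.4 and falls below 0 dB for α = 0.6, in line with the regions described for Figure 7.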

Figure 7. Subject-averaged GRG over frequency for the phonetically balanced sentence with a frequency-independent varying absorption coefficient α of the ground floor. The black lines indicate where the direct sound and the ground reflection reach the listener with equal strength (GRG = 0 dB).

At the listener’s position, the direct sound and the ground reflection are superimposed, causing constructive or destructive interference depending on the phase characteristics. Generally, this interference results in comb filter effects with a distance-dependent structure of peaks and dips over frequency.
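A rough idea of this distance dependence can be obtained from the path-length difference alone. The Python sketch below is a simplification: it assumes a rigid floor without phase inversion, ignores any phase differences in the directivity between the two radiation directions, and uses a speed of sound of 343 m/s. The function name and default heights are hypothetical choices for illustration.

```python
import math

def comb_filter_frequencies(d, h_src=1.5, h_rcv=1.5, c=343.0, n=3):
    """First n peak and dip frequencies (Hz) of the direct-plus-reflection sum."""
    d_refl = math.hypot(d, h_src + h_rcv)   # image-source path length
    tau = (d_refl - d) / c                  # delay of the reflection in seconds
    peaks = [k / tau for k in range(1, n + 1)]
    dips = [(2 * k - 1) / (2 * tau) for k in range(1, n + 1)]
    return peaks, dips

peaks, dips = comb_filter_frequencies(4.0)
```

For a distance of 4 m the path difference is 1 m, which places the first dip near 171.5 Hz and the peaks at multiples of 343 Hz. Larger distances reduce the path difference and move the whole pattern towards higher frequencies.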

4 Discussion

This study examined frequency dependencies of the MRD for human voice radiation and investigated to what extent the radiation towards the MRD is stronger than in the frontal direction. Our results confirm those of other studies in the literature, all of which, however, are based on datasets averaged over a complete sentence and thus cannot resolve phoneme dependencies as in the present study. Dunn and Farnsworth [4] performed measurements in octave bands for f < 500 Hz and one-half octave bands for f ≥ 500 Hz at a distance of 1 m and a vertical resolution of 45°. They found that in the one-half octave bands between 500 Hz and 1 kHz, the main direction is downward (θ = −45°) and between 1 kHz and 1.4 kHz upward (θ = +45°), which is consistent with our results for the phonetically balanced sentence. It is worth noting that the data of that study show a similar trend for near-field measurements as well. Marshall and Meyer [5] determined the vertical voice directivity in octave bands with center frequencies between 125 Hz and 8 kHz and found the MRD to be about θ = −20°. Because this study provides data only in octave bands, the abrupt changes in MRD around 1 kHz could not be resolved. However, in the 1 kHz octave band, the strength of the radiation is similar for θ = −20° and θ = +45°. Chu and Warnock [6], who performed measurements in one-third octave bands, found downward MRDs ($-42^{\circ} \le \theta \le -13^{\circ}$) for the bands with center frequencies between 315 Hz and 800 Hz and an upward MRD (θ = 31°) for the bands with center frequencies of 1 kHz and 1.25 kHz, which is also in line with our results. Similarly, the results of Moreno and Pfretzschner [26] show upward and downward MRDs in these frequency bands, but with slightly different absolute values, probably due to the different sampling grid.

The MRDG as depicted in Figure 4 shows a prominent peak for most of the phonemes at f ≈ 700 Hz. Similarly, the spherical directivity index has a characteristic dip in this frequency range [8, 9], as both are direct counterparts. Since the directivity index is calculated related to the frontal direction, it decreases when the radiation into other directions increases. Accordingly, the dip in directivity index is less prominent for the unvoiced fricatives [f], [s], and [ʃ], and so is the MRDG for these phonemes (cf. [8] and Fig. 4).

Although the general behavior of the MRD can be determined from data in the literature, it has not yet been studied in detail, and no causes have been identified for the increase in MRDG at f ≈ 700 Hz. Since our results show that it is similar for a large number of phonemes, it is unlikely to be caused by properties of specific phonemes, such as different mouth opening sizes and radiation through the mouth and nostrils in the nasals. In this context, Birkholz et al. [27] recently simulated the frontal radiation of a modeled human mouth and torso, which showed a sharp dip at f ≈ 800 Hz and a rise to a (local) maximum at f ≈ 1.3 kHz. Their comparative simulations of a spherical cap in a spherical baffle (without any kind of torso) did not show these peaks and dips, indicating that diffraction and reflection at the shoulder and torso could be a cause of these effects. In a recent study, Blandin et al. [28] examined torso diffraction and reflections in more detail. They showed that when the torso is present, interference patterns occur containing repeated maxima and minima in both the horizontal and vertical planes. For the frequency range below 2 kHz, where we found the prominent pattern of downward and upward directed MRDs, the study by Blandin et al. showed a significant effect of the torso on the directivity index. These results are supported by the findings of Bellows and Leishman [29], who compared voice directivity patterns of a KEMAR HATS with and without a torso. Their plots of the 1.6 kHz one-third octave band show a downward-directed (local) maximum for the measurements with the torso that does not occur in the measurements without the torso. Similarly, we observed an area between 1.5 kHz and 1.8 kHz in our measurements where the sound radiation is directed downwards. The study by Blandin et al. [28] further revealed that torso diffraction and reflections are enhanced by the presence of the lips compared to a simple model of a piston placed on the surface of a spherical head model.

The downward-directed MRD strongly affects the ground reflection intensity. The strongest effect can be observed for distances between 3 m and 7 m. Below 800 Hz, the ground reflection is stronger than the direct sound for most phonemes, with a peak of 4.6 dB for an [i] and an [l]. Even though this effect is weaker for the other phonemes, the behavior is similar for all phonemes except a few unvoiced fricatives. Towards higher frequencies, there are phoneme-dependent maxima, e.g., for most of the plosives.

Bech [11, 12] determined the perceptual thresholds of 17 early reflections in a simulated small room with a volume of 113 m³. The results showed that the signal level of four early reflections exceeded the human threshold of detection, including the ground reflection, which was attenuated by 1.36 dB relative to the direct sound. Bech suggested that the ground reflection in particular affects the perceived spaciousness. A few other studies indicate an effect of the ground reflection on localization. Guski [10] studied the influence of reflections from the ground, ceiling, and side walls on localization and showed that adding a reflection from the ground significantly reduces the vertical localization error of a speech signal. Gourévitch and Brette [13] analyzed the influence of ground reflections on binaural cues using numerical models. The authors hypothesized that the ground reflection may provide additional cues for distance and elevation estimation. Wendt et al. [15] investigated aspects of vertical sum localization due to the ground reflection. While the authors did not find a significant influence of the ground reflection for broadband noise, for speech signals the ground reflection contributed to the perception of height. Finally, the ground reflection can affect distance perception. In this context, the results of Ebelt et al. [14] indicate an influence of the ground reflection on distance estimation if the subject is familiar with the environment. None of these studies took into account the directivity of the human voice, which has an MRD that is directed downward over a wide frequency range. Accordingly, for the human voice, the effects of the ground reflection could be even stronger than the ones determined in these studies. As the ground reflection is present in many typical conversation scenarios, future studies need to address the perceptual relevance of the increased downward radiation of the human voice.

In our study, we assumed a non-tilted head with a frontal head orientation. However, it has been shown, e.g., by Munhall et al. [30], that people intuitively tend to make small head movements during conversations and to tilt their head while speaking, partly to support what is being said. We plan to investigate the influence of non-frontal head orientation on voice radiation in future studies.

5 Conclusion

We investigated the main radiation direction (MRD) of the human voice for a variety of phonemes, showing a characteristic pattern of MRD over frequency. We found statistically significant differences, for example, between the unvoiced fricatives [f], [s], and [ʃ], and the other phonemes. For all phonemes, the MRD is slightly downward in a range of −15° to −45° over a wide frequency range, but between 800 Hz and 1.6 kHz the MRD is upward to about 15°. The downward radiation strongly affects the ground reflection, with amplitudes that are higher than those of the direct sound component over a wide frequency range. As a consequence, for distances greater than 3 m between speaker and listener, the reflection from an ideal reflecting ground floor reaches the listener with a higher amplitude than the direct sound at frequencies below 800 Hz. Accordingly, perceptual aspects of the ground reflection’s magnitude and the perceptual relevance of considering phoneme dependencies need to be addressed in future research.

Funding

The research presented in this paper has been partly funded by the Federal Ministry of Education and Research in Germany, support code BMBF 03FH014IX5NarDasS, and by the German Research Foundation under Grant No. DFG WE 4057/21-1.

Conflict of interest

The authors have no conflicts of interest to disclose.

Data availability statement

This work uses measured datasets from our previous studies [8, 9, 16], which are publicly available as research data on Zenodo under the references [31, 32].


¹ The datasets are available as supplementary material to these papers and can be accessed at https://doi.org/10.5281/zenodo.7452117 and at https://doi.org/10.5281/zenodo.7834210.

² The subjects articulated the following German sentence: "Die Schüssel mit Äpfeln haben wir auf den Küchentisch gedeckt."

References

  1. G. Saunders: Treatise on theaters. I. and J. Taylor, London, 1790.
  2. B. Wyatt: Observation on the design for the theatre royal, Drury Lane. J. Taylor, London, 1813.
  3. J. Henry: Annual Report of the Board of Regents of the Smithsonian Institution. Technical report, A. G. F. Nicholson, Washington, DC, 1857.
  4. H.K. Dunn, D.W. Farnsworth: Exploration of pressure field around the human head during speech. Journal of the Acoustical Society of America 10 (1939) 184–199.
  5. A.H. Marshall, J. Meyer: The directivity and auditory impressions of singers. Acustica 58 (1985) 130–140.
  6. W.T. Chu, A.C.C. Warnock: Detailed directivity of sound fields around human talkers. Technical Report NRC-IRC-15212, National Research Council of Canada, Ottawa, 2002.
  7. T.W. Leishman, S.D. Bellows, C.M. Pincock, J.K. Whiting: High-resolution spherical directivity of live speech from a multiple-capture transfer function method. Journal of the Acoustical Society of America 149, 3 (2021) 1507–1523.
  8. C. Pörschmann, J.M. Arend: Investigating phoneme-dependencies of spherical voice directivity patterns. Journal of the Acoustical Society of America 149, 6 (2021) 4553–4564.
  9. C. Pörschmann, J.M. Arend: Investigating phoneme-dependencies of spherical voice directivity patterns II: Various groups of phonemes. Journal of the Acoustical Society of America 153, 1 (2023) 179–190.
  10. R. Guski: Auditory localization: Effects of reflecting surfaces. Perception 19 (1990) 819–830.
  11. S. Bech: Timbral aspects of reproduced sound in small rooms. I. Journal of the Acoustical Society of America 97, 3 (1995) 1717–1726.
  12. S. Bech: Timbral aspects of reproduced sound in small rooms. II. Journal of the Acoustical Society of America 99, 6 (1996) 3539–3549.
  13. B. Gourévitch, R. Brette: The impact of early reflections on binaural cues. Journal of the Acoustical Society of America 132, 1 (2012) 9–27.
  14. M.D. Ebelt, J.M. Arend, C. Pörschmann: Influences of the floor reflection on auditory distance perception. In: Proceedings of the 42nd DAGA, Deutsche Gesellschaft für Akustik, 2016, pp. 1467–1469.
  15. F. Wendt, R. Höldrich, M. Frank: The influence of the floor reflection on the perception of sound elevation. In: Proceedings of the 43rd DAGA, Deutsche Gesellschaft für Akustik, 2017, pp. 767–770.
  16. C. Pörschmann: A database for the comparison of measured datasets of human voice directivity. In: Proc. Forum Acusticum Torino, Italy, European Acoustic Association, 2023, pp. 4131–4138.
  17. J.M. Arend, P. Stade, C. Pörschmann: A system for binaural reproduction of self-generated sound in VAEs. In: Proceedings of the 43rd DAGA, 2017, pp. 271–274.
  18. J.M. Arend, T. Lübeck, C. Pörschmann: A reactive virtual acoustic environment for interactive immersive audio. In: Proceedings of the AES Conference on Immersive and Interactive Audio, 2019.
  19. M. Kob, H. Jers: Directivity measurement of a singer. Journal of the Acoustical Society of America 105 (1999) 1003.
  20. M. Brandner, R. Blandin, M. Frank, A. Sontacchi: A pilot study on the influence of mouth configuration and torso on singing voice directivity. Journal of the Acoustical Society of America 148, 3 (2020) 1169–1180.
  21. V.I. Lebedev: Spherical quadrature formulas exact to orders 25–29. Siberian Mathematical Journal 18, 1 (1977) 99–107.
  22. C. Pörschmann, J.M. Arend, F. Brinkmann: Directional equalization of sparse head-related transfer function sets for spatial upsampling. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, 6 (2019) 1060–1071.
  23. C. Pörschmann, J.M. Arend: A method for spatial upsampling of voice directivity by directional equalization. Journal of the Audio Engineering Society 68, 9 (2020) 649–663.
  24. S.W. Greenhouse, S. Geisser: On methods in the analysis of profile data. Psychometrika 24, 2 (1959) 95–112.
  25. Y. Hochberg: A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 4 (1988) 800–802.
  26. A. Moreno, J. Pfretzschner: Human head directivity in speech emission: A new approach. Acoustics Letters 1 (1978) 78–84.
  27. P. Birkholz, S. Ossmann, R. Blandin, A. Willbrandt, P.K. Krug, M. Fleischer: Modeling speech sound radiation with different degrees of realism for articulatory synthesis. IEEE Access (2022) 95008–95019.
  28. R. Blandin, J. Geng, P. Birkholz: Investigation of the influence of the torso, lips and vocal tract configuration on speech directivity using measurements from a custom head and torso simulator. Acta Acustica 7 (2023) 39.
  29. S. Bellows, T.W. Leishman: Effect of head orientation on speech directivity. In: Interspeech 2022, ISCA, September 2022, pp. 246–250.
  30. K.G. Munhall, J.A. Jones, D.E. Callan, T. Kuratate, E. Vatikiotis-Bateson: Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science 15, 2 (2004) 133–137.
  31. C. Pörschmann, J.M. Arend: Supplementary material for “Investigating phoneme-dependencies of spherical voice directivity patterns II: Various groups of phonemes” [Data set]. Journal of the Acoustical Society of America 153, 1 (2023). Zenodo. https://doi.org/10.5281/zenodo.7452117.
  32. C. Pörschmann: Supplementary material for “A database for the comparison of measured datasets of human voice directivity” [Data set]. In: Forum Acusticum 2023, Torino, Italy, 2023. Zenodo. https://doi.org/10.5281/zenodo.7834210.

Cite this article as: Pörschmann C. & Arend JM. 2024. On the impact of downward-directed human voice radiation on ground reflections. Acta Acustica, 8, 12.
