Comparing sound emergence and sound pressure level as predictors of short-term annoyance from wind turbine noise

While most countries rely on sound-pressure-level-based limit values to regulate wind energy development, sound emergence as defined in ISO 1996-1 is used in a few national legislations and also in international guidelines. There is, however, no published evidence that sound emergence is a relevant noise descriptor for that kind of source, namely that there is a correlation between this metric and perception or annoyance. A listening test was carried out to evaluate the relative merits of sound pressure level and sound emergence as predictors of annoyance from wind turbine noise. The test samples consisted of 45 wind turbine sounds, each 30 s long, at three different A-weighted sound pressure levels and five different signal-to-noise ratios. Thirty-two persons rated the test samples according to the ISO 15666 standard scale in a dry room equipped with loudspeakers. The results indicate that short-term annoyance is better predicted by A-weighted sound pressure levels than by sound emergence. It is also observed that sound emergence is a poor predictor of the audibility of wind turbine sounds.


Introduction
In the past decades, wind power has experienced sustained growth all over the world. In 2019, 54.2 GW of onshore wind capacity was added globally, taking cumulative onshore wind capacity beyond 600 GW [1]. In Europe the installed power was 205 GW at the end of 2019 [2]. With a growth rate above 10% per year between 2009 and 2019, wind power is close to becoming the most important power source in Europe. Since wind power is identified as one of the major tools in the transition toward a carbon-free energy mix, the pressure to further increase the installed power is not likely to recede in the near future.
While it is well known that noise explains only a small part of the variance of annoyance due to noise-generating activities in general [3] and wind energy in particular [4], acoustics is a key factor in the acceptance of wind energy by the communities. There is clear empirical evidence that wind turbine noise can be a source of annoyance [4][5][6][7]. Due to the pressure to increase the installed capacity, over time one can expect that the density of wind farms per unit area will be increased and that the new wind farm projects will be located closer to populations. The risk of conflicts with the neighborhood will then be increased. Therefore it is worth considering the relevance of the metric used for setting legal noise limits.
In the particular context of wind turbine noise control, most countries rely on various ratings based on either (1) the total sound only, or (2) the specific sound (or source-attributable sound) and the residual sound, i.e. the remaining sound without any source-attributable sound [8]. Some legislation and guidelines, however, prefer to rely on (3) sound emergence e.
To our knowledge only two countries worldwide, namely Italy [9][10][11] and France [12], specify sound emergence limit values. The concept of sound emergence also appears implicitly in the guidelines issued by the World Bank [13,14], which specify a maximum increase in "background levels of 3 dB at the nearest receiver location". This guideline value is always combined with limit values expressed in L eq and applies beyond the property boundaries of the noisy facilities. The current compliance criterion is that emergence must be lower than 3 dB for night-time and 5 dB for day-time in Italy and France [10,12]. The field assessment of the compliance of a wind farm with such low limit values is not devoid of challenges [15]. The main issue, however, is the psychoacoustic foundation of sound emergence as a noise descriptor.
While dose-response curves have been developed for A-weighted sound pressure levels [5,7] and the merits of C-weighted sound pressure levels with respect to A-weighted ones have also been assessed [16], it appears that there is no published evidence supporting the use of sound emergence [15] as an alternative to sound pressure level in the context of annoyance due to wind turbine sounds. The only reference we found [17] deals with a listening test based on 24 synthetic sounds that were built from three different residual sounds (quiet at 34 dB(A), natural at 40 dB(A) and road traffic at 50 dB(A)) and four different specific industrial sounds including a wind turbine sound at two sound emergences (e = 3 dB and e = 5 dB). It is concluded in [17] that the degree of short-term annoyance is source-dependent at constant e and that e = 0 dB does not imply that the specific sound cannot be detected. In [17], the design of the listening test prevents the evaluation of annoyance at various sound levels ceteris paribus because the type of residual sound used depends on the sound pressure level.
The purpose of our paper is to report on a listening test carried out in the laboratory regarding the relative merits of sound emergence and total sound pressure level as proxies for short-term annoyance due to wind turbine noise. A second objective is to assess the capacity of sound emergence to predict the audibility of wind turbine sounds.
The paper is organized as follows. Section 2 provides the definition of sound emergence. Section 3 presents the methods. It focuses on the collection of audio samples of both specific and residual sound in the field to generate stimuli with controlled total sound pressure level and signal-to-noise ratio, and on the preparation of a listening test. Section 4 describes the results obtained. Section 5 discusses (1) the correlation between short-term annoyance and the two quantitative metrics considered, (2) the issue of the relationship between audibility of the specific sound and sound emergence, and (3) several methodological aspects. Section 6 concludes this paper.

Definitions
In the remainder of this paper, sound emergence e is defined as:
$$e = L_{\mathrm{tot}} - L_{\mathrm{res}}$$
where L tot is the sound pressure level of the total sound and L res the sound pressure level of the residual sound. It is useful to further introduce L spec for the specific sound, i.e. the sound pressure level attributable to the source under investigation. Here, these four concepts of sound emergence, of total, specific and residual sound are used as defined in the current ISO 1996-1 standard [15,18]. If the residual and the specific sound are not correlated it is easy to derive the following relationship between e and the signal-to-noise ratio (SNR) [15]:
$$e = 10 \log_{10}\left(1 + 10^{\mathrm{SNR}/10}\right)$$

Methods

Recording equipment and procedure
Since the aim was to generate auditory stimuli with controlled signal-to-noise ratio where the signal is wind turbine sound, two types of recordings were collected or occasionally reused from previous work. The first type was wind turbine sound in a situation of high signal-to-noise ratio. The second type was residual sound, where no wind turbine sound is present but other sounds that are typical of a rural environment are.
All the recordings were made with a 1/2-inch, 200 V class 1 condenser microphone (Brüel and Kjaer type 4190) connected to a preamplifier (Norsonic type 336) with a high pass filter at 20 Hz. The preamplifier served as a front end to a digital audio recorder (Sound Devices type 722). The recordings were made with a sampling frequency of 48 kHz and a resolution of 24 bits. Before and after each measurement series, the recording channel was calibrated with a class 1 sound calibrator (Brüel and Kjaer type 4231). Wind speed was measured at 2 m height with a handheld anemometer (Windmate type WM-200).
Residual sound samples were recorded using a standard 9 cm diameter spherical wind screen and the microphone was placed on a tripod. Wind turbine sounds used in this study were instead recorded with reference to IEC 61400-11:2018 for the measurement of sound power from wind turbines [19]. Terrain unevenness, however, made it impossible to meet the requirements on distance from the microphone to the wind turbine. In accordance with [19], the microphone was mounted flush on a circular rigid plate placed on the ground. A 9 cm diameter standard wind screen cut in half was used as a primary wind screen. A secondary wind screen was also used (handmade or Microtech Gefell type GFM 920).

Recording campaigns
The wind turbine sounds used were either taken from a previous measurement campaign at Smøla wind farm, Møre og Romsdal, Norway in April 2018 [20] or recorded in May 2019 at Skomakerfjellet wind farm, Roan, Trøndelag, Norway. Both sites are away from any noise generating infrastructure. Smøla can be characterized as a flat patchwork of moorland, of barren rocky ground and of soil covered by low vegetation like grass or heath. Skomakerfjellet can be described as hilly terrain, with mostly barren rocky ground, moss and grass. Facts about these wind farms are provided in Table 1. At Smøla wind farm the recordings were made at various positions where the closest wind turbine was a Bonus B76/2000. Six 5-min recordings were collected at distances between 236 m and 360 m. At Skomakerfjellet the microphone was placed 150 m away from the closest turbine. Five 5-min recordings were performed at various orientations from the plane of the blades of the closest turbine.
Residual sound samples were recorded in May 2019 at Rotvoll, in the surroundings of Trondheim, Trøndelag, Norway, close to the coast. The place can be described as a semi-rural environment with forest and crop patches, scattered housing, industrial buildings, secondary roads within 100 m and a railway line at 200 m distance. Three 10-min recordings were made at different locations.

Removing tonality
Considering that tonality in wind turbine sounds at typical distances from dwellings reflects either poor design or poor maintenance [21], the recordings made were scanned for tonal components using both listening and a Python program [22] that implements the ISO PAS 20065 standard procedure for the detection of tonality [23].
When tonality was detected in a recording, it was filtered out in the frequency domain using local estimates of two characteristics. The first one is the average power spectral density (PSD) and the second one the maximum PSD. The two estimates were obtained from the PSD of the frequencies adjacent to the frequency range where the tonal components occur. In the frequency interval with tonal components, the PSD was then replaced by values drawn from a Gaussian random deviate defined by interpolation of the average and the maximum PSD to the left and to the right of the frequency interval.
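The replacement step described above can be illustrated with a minimal sketch. It is not the program used in the study: the function name, the number of adjacent bins, the linear interpolation and the rule mapping the local maximum to a standard deviation (about three standard deviations above the mean) are all assumptions made for illustration, since the text does not specify them.

```python
import numpy as np


def replace_tonal_band(psd, lo, hi, n_adjacent=20, rng=None):
    """Replace PSD bins lo..hi-1 by Gaussian random values (hypothetical helper).

    psd        : 1-D array of power spectral density values (linear units)
    lo, hi     : first bin and one-past-last bin of the tonal interval
    n_adjacent : number of neighbouring bins used on each side (assumed value)
    """
    psd = np.asarray(psd, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    left = psd[max(lo - n_adjacent, 0):lo]
    right = psd[hi:hi + n_adjacent]
    if left.size == 0 or right.size == 0:
        raise ValueError("the tonal interval needs neighbouring bins on both sides")

    # Interpolate the local mean and the local maximum across the interval.
    w = np.linspace(0.0, 1.0, hi - lo)
    mean = (1.0 - w) * left.mean() + w * right.mean()
    peak = (1.0 - w) * left.max() + w * right.max()

    # Assumption: the local maximum lies about three standard deviations
    # above the local mean; the source does not state this mapping.
    sigma = (peak - mean) / 3.0
    values = rng.normal(mean, sigma)

    # Keep the synthetic values within the range seen in the adjacent bins.
    floor = min(left.min(), right.min())
    values = np.clip(values, floor, peak)

    cleaned = psd.copy()
    cleaned[lo:hi] = values
    return cleaned
```

The sketch operates on a PSD estimate only; going back to a time signal would additionally require handling the phase of the replaced bins, which is left out here.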

Accounting for atmospheric attenuation
Since the aim was to collect wind turbine sounds with as high a signal-to-noise ratio as possible, the distance to the wind turbine was short in comparison to the shortest typical distance between a turbine and a dwelling. While it would be relevant to assess the annoyance from wind turbines outdoors, for instance because the surroundings of a wind farm can be used as a recreational area, our focus was on the impact on inhabited areas. Therefore, atmospheric attenuation at 500 m was computed according to the ISO 9613-1 standard [24] assuming 20°C, 70% humidity and 101,325 Pa atmospheric pressure. The choice of 500 m corresponds to the minimal distance allowed from a wind turbine to the closest dwelling in the French law [25]. The corresponding filter was applied in the frequency domain using narrow bands.
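To give an order of magnitude for this correction, the sketch below evaluates the pure-tone attenuation coefficient with the closed-form expressions of ISO 9613-1 as we understand them. The function names are hypothetical and the constants should be checked against the standard before any serious use. Applying the result over the full 500 m path is a simplification made here for illustration.

```python
import math


def iso9613_alpha(f_hz, temp_c=20.0, rel_hum=70.0, p_pa=101325.0):
    """Atmospheric absorption coefficient in dB/m for a pure tone at f_hz."""
    t = temp_c + 273.15          # air temperature in kelvin
    t0 = 293.15                  # reference temperature
    t01 = 273.16                 # triple-point temperature of water
    p_rel = p_pa / 101325.0      # pressure relative to the reference pressure

    # Molar concentration of water vapour (in %) from relative humidity.
    psat_rel = 10.0 ** (-6.8346 * (t01 / t) ** 1.261 + 4.6151)
    h = rel_hum * psat_rel / p_rel

    # Relaxation frequencies of oxygen and nitrogen.
    fr_o = p_rel * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = p_rel * (t / t0) ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * ((t / t0) ** (-1.0 / 3.0) - 1.0))
    )

    classical = 1.84e-11 / p_rel * math.sqrt(t / t0)
    oxygen = 0.01275 * math.exp(-2239.1 / t) / (fr_o + f_hz ** 2 / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / t) / (fr_n + f_hz ** 2 / fr_n)
    return 8.686 * f_hz ** 2 * (classical + (t / t0) ** -2.5 * (oxygen + nitrogen))


def attenuation_db(f_hz, distance_m=500.0, **conditions):
    """Attenuation in dB accumulated over distance_m at frequency f_hz."""
    return iso9613_alpha(f_hz, **conditions) * distance_m
```

Under the conditions quoted above, `attenuation_db(1000.0)` comes out at roughly 2.5 dB, and the attenuation grows quickly with frequency, so the correction mainly reduces the high-frequency content of the close-range recordings.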

Selected residual and specific sounds
Three 30-s recordings of wind turbine sound were selected to be used as the specific sounds S0, S1 and S2 in the generation of stimuli with controlled L tot and e. An overview of their time or frequency characteristics before correcting for tonality and atmospheric attenuation is given in Figure 1. S0 (Smøla) is an excerpt from a recording taken 360 m away from the closest wind turbine with a wind speed of 4 m/s at 2 m height. The original recording is free from tonality. S1 and S2 were recorded at Skomakerfjellet (see Table 1). S1 comes from a recording made downwind at 60° from the horizontal projection of the rotation axis, S2 from a recording made upwind along the horizontal projection of the rotation axis (see Table 1). In either case the average wind speed was 4.2 m/s at 2 m height.
The recordings made at Skomakerfjellet contained a series of four tonal components between 3750 Hz and 4250 Hz (see Fig. 1 upper right). Contrary to the usual assumption that tonality originates from gearboxes [21], the tonal components here are probably attributable to sound radiated by power inverters, a little-documented noise generation mechanism in the context of wind turbines [26]. In addition, clear fluctuation or "swish" can be heard in S1 (see Fig. 1 lower left), whereas S0 and S2 can be described as steady sounds.
The 30-s recording selected for the residual sound R0 contains a mixture of low frequency sound from distant urban traffic and vocalizations from four bird species, the Common chiffchaff (Phylloscopus collybita), the Common chaffinch (Fringilla coelebs), the Blackbird (Turdus merula) and the Willow warbler (Phylloscopus trochilus). These are common species that are widespread throughout Europe. Therefore, the overall soundscape does not have any typically Norwegian characteristic and it could have been recorded in many other places in Europe. The PSD of the signal and a spectrogram are presented in Figure 1.

Combining residual and specific sounds
As explained in Section 3.3, the wind turbine sounds were cleaned from any occasional tonal component and their frequency spectrum was extrapolated to 500 m distance. This results in three different pairs (S0, R0), (S1, R0) and (S2, R0) of specific and residual sounds. Each pair was used to generate a series of stimuli. Residual and specific sound were mixed to obtain specified SNR values. The amplitude of the mix was also adjusted to match specified L tot values. The metric used for both L tot and SNR is L Aeq,30s. While the intention was to propose stimuli where L tot is in 10 dB steps up to 50 dB(A), it was necessary to ensure that the signal-to-noise ratio in the test room would be high enough. Assuming that the usual 10 dB SNR rule holds in this context, preliminary sound pressure level measurements showed that the lowest admissible value for L tot was 35 dB(A).
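The two-step adjustment described above (first the SNR, then the total level) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, the signals are assumed to be calibrated pressure signals in Pa, and the levels are computed without A-weighting for brevity, whereas the study relies on L Aeq,30s.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in Pa


def level_db(x):
    """Equivalent sound pressure level of a pressure signal, in dB re 20 µPa."""
    return 10.0 * np.log10(np.mean(np.square(x)) / P_REF ** 2)


def mix(specific, residual, snr_db, l_tot_db):
    """Mix two signals at a given SNR, then scale the mix to a given total level.

    Returns the scaled mixture together with its specific and residual parts,
    so that the levels of the components can be checked afterwards.
    """
    # Step 1: scale the specific sound so that L_spec - L_res equals snr_db.
    gain_spec = 10.0 ** ((level_db(residual) + snr_db - level_db(specific)) / 20.0)
    mixture = gain_spec * specific + residual

    # Step 2: scale the whole mixture so that its level equals l_tot_db.
    gain_mix = 10.0 ** ((l_tot_db - level_db(mixture)) / 20.0)
    return gain_mix * mixture, gain_mix * gain_spec * specific, gain_mix * residual
```

Because the second gain is applied to the mixture as a whole, it changes L tot without altering the SNR set in the first step, which is what allows the two parameters to be controlled independently.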
SNR was deemed more convenient than e when it comes to preparing the stimuli because the range of variation of e is limited to 3 dB for negative SNR. The choice of 30-s stimuli was dictated by the intention to evoke long-lasting exposure to wind turbine sounds while keeping the listening test within reasonable time limits.
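The compression of e at negative SNR follows directly from the relationship between e and SNR for uncorrelated sounds given in the Definitions section. A minimal sketch, with hypothetical function names and SNR values chosen only for illustration (they are not necessarily those of the test):

```python
import math


def emergence_from_snr(snr_db):
    """Sound emergence e in dB for uncorrelated specific and residual sounds."""
    return 10.0 * math.log10(1.0 + 10.0 ** (snr_db / 10.0))


def snr_from_emergence(e_db):
    """Inverse relationship; only defined for strictly positive emergence."""
    if e_db <= 0.0:
        raise ValueError("sound emergence must be strictly positive")
    return 10.0 * math.log10(10.0 ** (e_db / 10.0) - 1.0)


if __name__ == "__main__":
    # Illustrative SNR values only.
    for snr in (-10.0, -5.0, 0.0, 5.0, 10.0):
        print(f"SNR = {snr:+5.1f} dB -> e = {emergence_from_snr(snr):5.2f} dB")
```

At SNR = 0 dB the emergence is about 3 dB, and at SNR = −10 dB it is already down to about 0.4 dB, so all negative SNR values map onto the narrow interval between 0 and 3 dB.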

Listening test protocol
The room used for the listening test is a shoebox-shaped room of dimensions 7.2 × 6 × 2.9 m. The room is designed to be visually neutral and acoustically dry. It has no window and its white walls are without any furniture. Moreover, the walls were entirely covered by either acoustic absorbers or diffusors. A suspended ceiling provided additional absorption. We measured the A-weighted equivalent sound pressure level in the room in June 2019 with a calibrated class 1 sound level meter (Norsonic type Nor140) during office hours and we obtained L Aeq,1s,30min = 25.3 dB. The L Aeq,1s showed only minor fluctuations. The same sound level meter was used to calibrate sound reproduction.
The listening test relied on software developed by one of the authors. Its user interface guides the subject all the way through the listening test. Starting with the collection of personal data, it proceeds with the introduction to the task, the presentation and the rating of the different stimuli and the storage of the results in a text file for further processing. The language used is Python with the Tkinter binding to Tk for the graphical user interface. The user interface does not perform any kind of processing on the acoustic stimuli, which are prepared beforehand and stored in mono PCM 48 kHz 24 bits format, but the software presents the different stimuli in random order. The user interface controls the audio hardware using the sounddevice Python binding to the PortAudio library while collecting the subject's answers regarding short-term annoyance and the audibility of wind turbine sounds for each stimulus. Short-term annoyance is rated using the standard ISO 15666 [27] scale that is displayed by the graphical user interface. Regarding audibility, the subject answers whether he or she could hear a wind turbine in the stimulus just played back.
The hardware used was a laptop computer (Dell type Latitude 5580) and its embedded sound card (Realtek type ALC3246-CG). The mono two-channel output from the sound card was connected to two active 2-way electrodynamic loudspeakers (Dynaudio type BM6A). For the sake of the listening test, the frequency response of the loudspeakers was measured in anechoic conditions and an inverse filter applied to equalize the frequency response from 31.5 Hz to 10,000 Hz.
During the test, the listener would sit at a table where the laptop was placed. Seen from above, the two loudspeakers and the head of the subject formed an equilateral triangle. The loudspeakers were placed on a stand so that the tweeter was at about the same height as the ears of the subject. Absorbing materials were placed on the ground to reduce the contribution of ground reflections.

Test subjects
Subjects were recruited among temporary and permanent staff members from the Faculty of Information Technology and Electrical Engineering at Norwegian University of Science and Technology, Trondheim via a mailing list announcement and by going door-to-door. Announcements were also made using social networks from student organizations (NTNU International Students) and the French Alliance in Trondheim. The subjects were not asked to take an audiometric test but to self-report any hearing impairment.
Thirty-two subjects with self-declared normal hearing were recruited and took part in the listening test. There were 14 female subjects and 18 male ones. The age of the subjects ranged from 20 to 49 with a median age of 29 years. Nineteen subjects were NTNU employees, eight were students and five did not belong to these categories. Fourteen nationalities were represented in the panel of subjects. Eighteen participants out of 32 were Europeans. Only two participants reported a prior exposure to wind turbine sounds.
Overall there were no acoustic disturbances during the test. However, one subject reported disturbance from the sound of a helicopter fly-over and another from a slamming door. Since the impact of the disturbance was limited to a single stimulus, the input from these subjects was used in the statistical analysis.

Results

Annoyance
Figure 2 presents a regression analysis of short-term annoyance as the dependent variable and sound emergence e as the independent variable. In addition, it gives a synthetic view of the distribution of the experimental data as a function of sound emergence. Figure 3 presents the counterpart for short-term annoyance and sound pressure level. The annoyance ratings appear to be much more scattered when sound emergence is used than with sound pressure level. For instance, the whole ISO 15666 scale is used whatever the level of sound emergence considered, whereas this is not the case when the ratings are grouped by sound pressure level.
The linear regression analyses indicate coefficients of determination R² that range between 0.024 and 0.06 between short-term annoyance and sound emergence for different data clusters, and between 0.31 and 0.34 for short-term annoyance and sound pressure level. Numerical results for the complete data set are given in Table 2. The slope of the annoyance function is also higher for L p than for e while the range of variation for L p is lower than that of e in the test. If one includes both L tot and e in a multilinear regression analysis, the R² value is only slightly increased when compared to a linear model that only depends on L tot. Furthermore, a Principal Component Analysis carried out on the data set indicates, as shown in Figure 4, that short-term annoyance and L tot correspond more or less to the same dimension of the dataset while e corresponds to another principal component. A two-way ANOVA was also conducted to examine the influence of sound pressure level and sound emergence on short-term annoyance. It shows that the variance explained by sound pressure level (F = 704) is larger than that explained by sound emergence (F = 92), while both parameters have a statistically significant influence (p < 0.001).
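For readers who wish to reproduce this type of comparison on their own ratings, the coefficient of determination of a single-predictor linear model can be computed as in the minimal sketch below. The function name is hypothetical and the numbers in the usage example are toy values chosen to be easy to check by hand; they are not data from this study.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = intercept + slope * x and its R²."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R² is the share of the variance of y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values only, not ratings from the listening test.
    predictor = [0.0, 1.0, 2.0, 3.0]
    rating = [0.0, 2.0, 1.0, 3.0]
    print(linear_fit_r2(predictor, rating))
```

Running the same function once with L tot and once with e as the predictor gives the two R² values that are compared in the text.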
All this suggests that sound pressure level reflects the annoyance data better than sound emergence. Therefore, if a single parameter is to be used, L tot should take priority over e.

Audibility
The answers to the question "Could you hear wind turbine sounds?" make it possible to calculate audibility rates for the various stimuli proposed. As expected, wind turbine sounds are more easily detectable as sound emergence increases. The results for the two lower values of sound emergence are presented in Figure 5. It appears that even for values that are close to 0, i.e. the minimum value for e by definition, the lowest audibility rate is 28% for the mixture S0R0 at L tot = 35 dB(A). At the scale of our study, ceteris paribus, the audibility rate is stimulus-dependent. Stimuli based on S1 present a higher detection rate than those based on the two other specific sounds. But for the three sound mixtures proposed, the audibility rate is an increasing function of L tot.

Discussion
The main finding of our study is that, for the set of synthetic soundscapes proposed in this listening test and for a specific definition of sound emergence, the A-weighted sound pressure level is a better predictor of short-term annoyance due to wind turbine sound than sound emergence. While the correlation between short-term annoyance and sound emergence is statistically significant, sound emergence clearly appears as a second-order descriptor. Therefore, if one is to use a single descriptor, the A-weighted sound pressure level should be preferred. Here we used L eq,30s to calculate sound emergence. This finding may not hold if sound emergence were based on another descriptor like a fractile sound level for instance.
In addition, since a large part of the subjects were able to hear the wind turbine in situations of negative signal-to-noise ratio, corresponding to sound emergence as low as 0.4 dB, there are good reasons to question the relevance of the 3 dB or 5 dB sound emergence thresholds found in the existing legislation that relies on sound emergence to set noise limits. The common interpretation is that these thresholds come from the observation that when sound emergence is above 3 dB then the specific sound should be clearly detected without the need to concentrate on the task. The listening test suggests that this empirical detectability threshold should be closer to 0 dB. But this would make it impossible to set a legal limit between compliance and non-compliance based on sound emergence because sound emergence is positive by definition. In addition, the audibility rates derived from our listening test are likely to underestimate the real audibility rates in situations of long-term exposure as, in the long run, the human ear may get trained to detect specific sounds when repeatedly exposed to them in a given soundscape. The observation that stimuli based on S1 lead to higher audibility rates can be explained by the periodic time pattern revealed by the sonogram in Figure 1. The ability of subjects to detect wind turbine sounds when the signal-to-noise ratio is negative is documented in previous research. For instance, [28] found that the detection thresholds for wind turbine noise in the presence of natural sounds were around −8 dB to −12 dB SNR.
Table 2. For the complete data set, linear models of short-term annoyance as a function of L tot, SNR and e, and corresponding performance measures.
As generally observed, our study illustrates that the total sound pressure level accounts for a relatively low percentage of the variance of short-term annoyance. Adding a second variable may help build a better model of annoyance from wind turbine noise. As said above, including sound emergence would only have a minor impact. The higher audibility rates found for S1 suggest that a source-specific parameter, like a metric reflecting the periodic sound pressure level fluctuation, often called amplitude modulation for the sake of conciseness, is more relevant. Recent research also showed that this apparent amplitude modulation is an important source of complaints [4] in the context of wind energy.
Regarding the collection of wind turbine sound samples, it is obvious that placing the microphone on the ground is not representative of a typical listener's position. Without any a priori information, however, when listening to such a recording made on the ground it is hard to tell where the microphone was. Furthermore, the recording obtained does sound like a wind turbine sound. None of the listeners involved in this study reported that the wind turbine component of the soundscapes felt unrealistic. Our main concern here was to collect audio samples that were free from wind-induced noise on the microphone. While noise measurements can tolerate a certain amount of wind-induced noise on the microphone, we believe that the human ear is very sensitive to such spurious noises. The ground is a very favorable position when it comes to minimizing wind flow on the microphone.
One can also wonder whether the wind turbine sounds used to prepare stimuli are really free from any other sound. It is well known that vegetation can generate sounds when exposed to wind. Emission models have been developed, however only for trees [29][30][31]. In both sites the occasional vegetation is very low. Coniferous or deciduous trees are absent. Assuming a flat ground with a roughness length of 0.005 m for short grass [32] and a logarithmic wind speed profile one can extrapolate the wind speeds measured at 2 m height down to 0.5 m height. This can be seen as the absolute maximum height of vegetation in Smøla. The vegetation at Skomakerfjellet was lower. The wind speed at 0.5 m height is about 3.1 m/s (11 km/h). Considering the type of vegetation observed, such a wind speed is not likely to generate any significant vegetation noise. On site we did not notice any vegetation noise and this impression was confirmed in the lab.
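The extrapolation above can be checked with a short sketch of the logarithmic wind profile under neutral conditions. The function name is hypothetical; the inputs are the values quoted in the text (4 m/s measured at 2 m height at Smøla and a roughness length of 0.005 m).

```python
import math


def wind_speed_at_height(u_ref, z_ref, z, z0):
    """Wind speed at height z from a measurement u_ref at z_ref (log profile)."""
    return u_ref * math.log(z / z0) / math.log(z_ref / z0)


if __name__ == "__main__":
    u = wind_speed_at_height(u_ref=4.0, z_ref=2.0, z=0.5, z0=0.005)
    print(f"{u:.2f} m/s = {3.6 * u:.1f} km/h")
```

With these inputs the ratio of logarithms is about 0.77, which gives a wind speed of about 3.1 m/s (11 km/h) at 0.5 m height, in line with the figure stated above.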
When extrapolating our close-range wind turbine sound recordings to 500 m, we corrected for atmospheric attenuation. Close range is convenient to achieve a high SNR. This approach is similar to [16]. But our correction does not account for ground effect and the ground dip in particular [33]. Ground effect is less pronounced for elevated sources like wind turbines than for sources near the ground like road vehicles. Still, ground effect is likely to modify to some extent the overall shape of the spectrum. In a further study, ground effect could be taken into account for instance by computing the corresponding excess attenuation with the parabolic equation.
The number of sound stimuli proposed in our listening test may seem limited. This is merely the consequence of the need to propose long enough individual stimuli so that the listener can better imagine being exposed to the same sound in the long term, and of the need to keep the total duration of the listening test for a subject within reasonable limits. Nevertheless, the results obtained are consistent with a wide meta-analysis carried out on annoyance from environmental noise sources, although mostly transportation ones and at higher sound pressure levels [34], i.e. that annoyance is only marginally influenced by residual sound pressure levels and that the total sound pressure level is a more important factor.
Our study focused on short-term annoyance. A field study would be necessary to evaluate annoyance in the long-term sense. But such a field study would raise several practical issues because of the difficulty of monitoring sound emergence in the field. Due to the typical distances between a wind farm and a building façade, the signal-to-noise ratio of a measurement is likely to be too low to allow for an accurate estimation of L spec. Estimating L res experimentally would imply costly long-term measurements. The procedure developed in Gallo et al. [35] to separate the residual sound from the specific sound would certainly be worth considering here. Simulating sound emergence is not theoretically impossible but it is much more demanding than simulating the acoustic contribution of a wind farm because it requires modeling a wide variety of sound sources present in the environment [15].
Figure 5. Audibility rates for the two lower values of e considered in this study, as a function of the stimulus and of L tot. In this figure, the bars are not stacked but superimposed. In other words, the audibility rate for a specific L tot can be read at the right end of the corresponding rectangular area.

Conclusion
From recordings of wind turbine sounds and residual sounds, we constructed a set of synthetic soundscapes where the control parameters are the sound pressure level and sound emergence. These soundscapes that could have been recorded in many different places in Europe formed the basis of a listening test. For a specific definition of sound emergence, the results of the test indicate that short-term annoyance from wind turbine sound is better predicted by the A-weighted sound pressure levels than by sound emergence. With the soundscapes used in our listening test it was also observed that sound emergence was a poor predictor of the audibility of wind turbine sounds.
While sound emergence has been present for decades in a number of legal texts and guidelines in Europe and at the international level, to our knowledge this is the first time a systematic evaluation of this indicator ceteris paribus has been carried out. Further work is needed to confirm our findings in the specific context of wind turbine noise. In addition, since sound emergence is not only used to regulate wind turbine noise, it would also be worth investigating the merits of this indicator when applied to other community or industry noise sources.