Contribution of bone-reverberated waves to sound localization of dolphins: A numerical model

We implement a new algorithm to model acoustic wave propagation through and around a dolphin skull, using the k-Wave software package [1]. The equation of motion is integrated numerically in a complex three-dimensional structure via a pseudospectral scheme which, importantly, accounts for lateral heterogeneities in the mechanical properties of bone. Modeling wave propagation in the skull of dolphins contributes to our understanding of how their sound localization and echolocation mechanisms work. Dolphins are known to be highly effective at localizing sound sources; in particular, they have been shown to be equally sensitive to changes in the elevation and azimuth of the sound source, while other studied species, e.g. humans, are much more sensitive to the latter than to the former. A laboratory experiment conducted by our team on a dry skull [2] has shown that sound reverberated in bones could possibly play an important role in enhancing localization accuracy, and it has been speculated that the dolphin sound localization system could somehow rely on the analysis of this information. We employ our new numerical model to simulate the response of the same skull used by [2] to sound sources at a wide and dense set of locations on the vertical plane. This work is the first step towards the implementation of a new tool for modeling source (echo)location in dolphins; in future work, this will allow us to effectively explore a wide variety of emitted signals and anatomical features.


Introduction
Most literature concerned with the evolution of sound localization postulates that this task is accomplished by means of several well established auditory cues, at least in mammals [3,4]. Firstly, the binaural or interaural timeor phase-difference (ITD or IPD): the delay between the arrival time of a sound at the two ears. Secondly, the binaural or interaural intensity or level difference (IID or ILD): the difference in the intensity of a sound, as perceived at the two ears. The locus of sources that can be associated to a given ITD and/or ILD is a cone, sometimes dubbed the "cone of confusion" [4]. Since humans and other species are known to distinguish different sources within such a cone, a third psychoacoustic cue must exist. In fact, it is clear from the physics of acoustic-wave propagation that the same signal interacts differently with our anatomy depending on the direction it comes from; these differences are summarized in the "head-related transfer function" (HRTF). Behavioural experiments on humans suggest that each individual unconsciously learns to use its HRTF to localize sounds [5]. It is generally accepted that, in practice, the auditory system is very sensitive to certain prominent peaks and notches of the frequency spectrum perceived by the ears, or "spectral cues", whose amplitude and location along the frequency axis, controlled by the complex shape of the pinnae, depend on the elevation of the source; a number of studies have shown that up-down and front-back ambiguities are dealt with based on such spectral cues [4,6].
The ITD/ILD/spectral-cue model has been successfully applied to explain the results of behavioural experiments, and has found neurophysiological and neuroanatomical support across different species [4,7]. At least in the case of humans, it has also been shown to have limitations that can be predicted theoretically and are confirmed by behavioural experiments. Namely, (i) since binaural cues play a fundamental role, subjects that are deaf on one side can be expected and are indeed observed to perform poorly in sound localization tasks [4]; (ii) normal-hearing subjects fail at determining the correct elevation of narrow-band sources: elevation is determined through the identification of spectral peaks and notches, which requires that the emitted sound be broadband [8,9]; (iii) because spectral cues are not as accurate as ITD and ILD, at least humans have been observed to be less sensitive to changes in source elevation than azimuth: according to experiments, their minimum audible angle (MAA, or the smallest detectable angular difference between the locations of two identical sources of sound) for sources within the vertical plane is never less than 4°, vs.~1°on the horizontal plane [10].
Behavioural experiments indicate that odontocete cetaceans, which rely on sound and on their biosonar to carry out many essential tasks, do not suffer from (some of) the mentioned limitations. They are known to emit highfrequency clicks and burst-pulsed calls as well as lower frequency whistles. For example, the frequency range of echolocation signals emitted by short-beaked common dolphins is between 23 and 67 kHz [11]. Despite the absence of pinnae, bottlenose dolphins have been shown to be equally sensitive to changes in the elevation or azimuth of signals similar to their echolocation clicks [12,13]; they have also been shown to respond to clicks with a MAA of 0.7°, superior to the localization accuracy of other studied mammals [3]. It has been suggested [14] that these results cannot be explained without invoking, within the auditory system of dolphins, a localization mechanism different from the ITD/ILD/spectral-cue model attributed to other species.
An essential factor in assessing this speculation is the analysis of the dolphins' HRTF, and the information it might carry pertaining to source location. In an earlier study [15], our team has tackled this problem with an experimental approach, recording and analyzing mechanical waves propagating through a dolphin skull. Their experiments were performed using two different signals: a narrow-band chirp and a sinusoidal burst, both with a central frequency of 45 kHz. Reinwald [15] concluded that the skull shape and structure does not give rise to prominent, location-dependent features in the HRTF spectrum; they found, on the other hand, that the skull transfer function contains sufficient information for sources to be precisely localized by an algorithm based on Pearson's correlation, equivalent to the so-called "time-reversal" scheme postulated by Catheline et al. [16].
The frequency band of Reinwald et al.'s [2] data was severely limited by their experimental setup; additionally, it would be very difficult, if possible at all, to carry out similar experiments on full heads, including soft tissues: yet, soft tissues might have a non-negligible impact on the HRTF and thus on localization. We here accordingly develop a model of elastic-wave propagation through a dolphin head, and validate it by application on the same skull that was employed in the Reinwald et al. [2] experiments. We obtain high-resolution (voxel size = 111 lm) X-ray tomography images of the skull resolving small but possibly important heterogeneity within the bone. Based on grayscale values, we define a model of the skull's mechanical properties, and numerically simulate elastic wave propagation through it via a pseudospectral scheme [1]. Importantly, bone heterogeneity, neglected in other similarlyminded numerical studies (e.g. [17,18]) is accounted for in our simulations. While it is impossible (for a number of technical reasons that we discuss) to reproduce Reinwald et al.'s [2] experiment exactly, our results share all the important features of their data.

Method
In this section, we introduce the numerical tool we employed; we explain how we built a model of the skull's mechanical properties based on a computed tomography scan and we describe the configuration of our numerical experiment.

Numerical simulation toolbox
We treat the skull as a purely elastic medium and model wave propagation via k-Wave 1 , an open-source toolbox which solves the coupled first-order equations [1]: where r denotes stress, v particle velocity, and the Einstein summation notation is adopted. k and l are the Lamé parameters and q is the mass density. The compressional and shear wave speed (a and b, respectively) are, Equations (1) and (2) are solved for an elastic isotropic medium using the Fourier pseudospectral method. The latter is based on the Fourier collocation spectral method for computing spatial derivatives and the finite difference method for time integration [1,19].

Model of dolphin's skull
The flawless skull of a young adult short-beaked common dolphin (Delphinus delphis) of~50 cm length and 20 cm width, was scanned at the AST-RX X-ray tomography platform at the National Museum of Natural History in Paris, using a v|tome|x L 240-180 industrial micro CT scanner with voltage = 150 kV, current = 310 lA and exposure time of 333 ms. Data were reconstructed using the datos|x software and exported into 4106 slices of 16-bit TIFF images in coronal view. The resulting images are in grayscale which is representative of the mean attenuation of X-rays at each voxel. Their voxel size is 111 lm along x, y and z directions, which is large compared to the grid size envisaged for our simulations. We hence downsampled the images. Voxel size is an important parameter as it defines the spatial grid size in the simulations. Moreover, decreasing the resolution implies loss in the information on the structure. The gray values are important because they are the basis for associating elastic parameters to each voxel. This image processing step demands a compromise between the computational cost of the simulations and the degree of information on the physical structure of the skull that should be kept. In Figure 1 we show a slice with the original voxel size (i.e., 111 lm) and the 20-times increased voxel size (i.e. 2220 lm).
We chose to implement this level of resolution, as a good compromise between accuracy and computational cost, after a suite of simulations on two-dimensional structures of comparable complexity; these tests showed that signals modeled at the receiver locations were not modified significantly by changes in numerical grid spacing (See Appendix A for more details). Downsampling blurs the images and hence some detailed information on the structure is lost. For example, the distinction between the trabecular and cortical bone becomes less obvious. These two main parts of the bone structure are shown for different slices with the original resolution in Figure 2.
Also, the gray scale values vary from the original ones. It should be noted that the processed image here has been sharpened in order to magnify the contrast of the gray values in the downsampled image and all the image processing steps have been performed using ImageJ software.
We next associate a value for the mass density and wave speed to each voxel. We convert the images from 16-bit to 8-bit, so that the gray scale values range from 0 to 255. We apply a self-calibration on CT values, so that the brighter regions are assigned higher values of mass density and wave speed compared to the darker regions with lower CT values. We set mass density to vary from 1001 to 2000 kg/m 3 with 255 equal increments. Next, based on the density associated to each voxel, the voxel-wise values of a and b are calculated based on Equations (4) and (5), respectively. The value of 0.3 was used for the Poisson ratio (m) [20]: Equation (4) was obtained by fitting to experimental data (compressional wave speed and mass density for human cortical bone samples). This data was obtained as described in [21]. Based on the Resonant Ultrasound Spectroscopy (RUS) method, anisotropic velocities are measured for cuboid specimens. As resonant frequencies are directly related to the dimension of the sample and the wave speed inside it, RUS provides values for anisotropic velocities independent of density measurements. Figure 3 shows the map of the wave speeds for the same slice shown in Figure 1. Considering that the background medium is water with a = 1500 m/s and b = 0, the minimum and maximum values for the compressional wave speed in these maps are 1500 m/s and 3861 m/s and the corresponding values for b are 0 m/s and 2919 m/s.

Numerical set-up
In the experiment performed by [2], the skull is immersed in water far from the walls and floor of the pool, and is illuminated by sources along different elevations. The acceleration is measured using accelerometers attached to the posterior part of the mandible. In order to be close to the experimental set-up in [2] and since one of the goals of this study is to investigate the mechanism by virtue of which the elevation of a sound source is determined, in the following we shall position numerical sources on the median plane (the hypothetical plane that divides the head into left and right), at a range of different elevations. In practice, we illuminate the skull with a plane wave whose wave vector varies from 0°to 90°. Angle zero corresponds to the wave coming from the front of the skull while angle 90 corresponds to the wave coming from above it.
The input signal is a sinusoidal burst with the central frequency of 45 kHz and total duration of 100 ls as shown in  [19], we select c 0 Át Áx = 0.1, where Dt and Dx are the temporal and spatial steps, respectively. The value of 0.1 was chosen after careful trial and error; this value, sensibly smaller than 1, is compatible with example values suggested in the k-Wave documentation. Our numerical configuration is shown in Figure 5.
Once the elastic parameters and the numerical properties are determined, we illuminate the skull by a plane wave. This is equivalent to limiting our application to sources or echolocation targets that are not in the immediate vicinity of the subject. It is important to note that we do not intend to model the actual acoustical environment of the dolphin, but to characterize its localization abilities. This is usually done by considering sound coming from different directions, i.e., the angle of the incoming wave should be clearly defined. Plane waves are better suited for this goal, since they have a unique and well-defined direction of propagation.
In order to numerically generate a plane wave, one might deploy a point source in the far field of the object, but this approach requires a large enough simulation box to satisfy the far field condition, i.e. a significant increase in computational cost. We therefore adopt another approach here assuming that a plane wave already formed outside the simulation box, we consider the points located on the edges of the simulation domain as point sources, each emitting the same signal with a previously-calculated delay. By firing an identical signal with the proper time delay from neighboring points, we ensure the formation of a plane wave front which in contrast to the point-source approach does not require a large simulation box. The source mask in 3D is two perpendicular planes at the edge of the simulation box where each point on these planes acts as a source.
We record signals inside the pan bone (i.e. the thin posterior end of the lower jaw bone) on either side of the mandible. The exact locations of receivers were not recorded when conducting the analogue experiment [2], but the distance between receivers was: in our simulations, receivers were positioned as similar to the experiment as possible, based on this datum and on several available photos of the experimental set-up. A 2D view of the position of the receivers is shown in Figure 6.
The receivers are positioned on the bone with identical elastic properties on each side and they measure velocity of oscillation along x, y and z directions. Each simulation takes approximately 24 h on CPUs of type Inter(R) Xenon(R) CPU E5-2695 v3.

Results
In this section, we show the results pertaining to the validity of our model through different sets of comparisons with the experimental case; namely as direct comparison of the signals, measurement of the inter-receiver time difference and correlation maps.

Waveform comparison
We adopt different criteria to validate the reliability of our numerical model of the skull. In all cases, we use data from the mentioned laboratory experiment conducted by our team on the same skull [2]. In this experiment, the skull was immersed in water, centered in depth and width inside a water tank and illuminated from different angles on the vertical plane using a broadband marine transducer. The input signal in the simulation and experimental case are the same (Fig. 4). The receivers in the experiment are accelerometers. The position of the receivers in simulations is very close to the ones implemented in the experiment with a maximum difference of~5 mm. In contrast to the experiment where data are available for a range of angles varying from À90°to 90°, we have data for the range of 0°-90°.
As a first proxy, we directly compare the experimentally recorded signals with their numerically computed counterparts. In the latter case, acceleration is computed from velocity by numerical differentiation, and only the x component, which is normal to the mandible and pointing outwards from it, is considered (Fig. 5). This component is the closest one to what is measured in the experiment. Compared to the input signal in Figure 4, the recorded signals (in both experiment and simulation) are perturbed, due to the interference of the incoming wave with the bone-reverberated one. Because wave propagation is faster in bone than in water, reverberated signal arrives almost as early as the direct one, and cannot be easily separated from it by visual analysis. This phenomenon can be more or less prominent depending on the angle of incidence. As an example, in Figure 7a, we show the comparison between the experimental and simulation data for the left receiver and an incoming angle of 25°. Figure 7 shows that experimental and numerical signals are most often in phase with one another (this is true for both the direct and reverberated signal); the amplitude of experimental data is, however, strongly attenuated, while attenuation has been neglected in simulations. In order to facilitate the comparison, we scale the amplitude of the experimental signal and show the resulting waveforms in Figure 7b. Amplitudes are scaled assuming an exponential decay of the signal with time (t), as is commonly done in medical ultrasound imaging. This technique, called time gain compensation (TGC), assumes an exponential decay of the signals as they penetrate into the attenuating tissue. In order to retrieve the lost energy by time, the amplitude is multiplied by exp(at). In our case, the value of a is chosen by trial and error.
The consistency between the simulated and experimental signals becomes less clear for some angles. The worst case of the observed vs. simulated waveform decorrelation is shown in Figure 8. In this case, the codas of the two signals are not in phase, but, again, the envelope of the reverberated wavefield is captured fairly well. Simulated and experimental data for both left and right receivers are compared in Appendix B for a suite of different elevations of the source.

ITD
The second criterion for validating our numerical model is the time difference between the signals recorded at the left and right receivers. Assuming that the skull is symmetric, the interaural time difference (ITD) between the two received signals from the sources located on the median plane is expected to be zero. However, the dolphin skull is not symmetric. Here, we compare the values of ITD for simulated and experimental signals. We find the ITD by taking the difference between the arrival time of the first peak in the signals recorded at the left and right receivers. This arrival time was first identified automatically, as the time when signal amplitude would exceed a selected threshold, and subsequently checked visually by the first author of this article. This approach is applied to both simulation and experimental data, and the ITD versus the incidence angle for both cases is shown in Figure 9.
As can be seen, the variations of ITD values obtained from our simulations show the same overall "bias" as the experimental data, but with elevation-dependent fluctuations that are less-reproduced compared to the experimental case. Furthermore, an increasing trend from 0°to 20°i s observed in the simulation data but it is not as sharp as the experimental case. It should be noted that the orientation of the incoming wavefront with respect to the skull in the experimental case is not as accurately-defined as in the simulations. Therefore, there might be an error in the direction of the incident wave as recorded by the experimenters in [2]. The slightly higher values of experimental ITD are likely to be related to the fact that the distance between the receivers in the simulations is~5 mm shorter than in the experimental case.

Correlation-based localization
A given biosonar signal, reflected by a target at given azimuth and elevation, will result in a similar received signal at the ears of the echolocating individual. This signal importantly includes the coda arising from reverberations within the bone and is strongly dependent on target location. It can be hypothesized that, upon hearing a click-like sound, a dolphin could compare it with a library of echoes that, while training itself to echolocate, it has associated with unique target locations. The dolphin would then locate the source of the perceived click at the location corresponding to the stored echo that, of all those available in the library, is best correlated with it. In this case, we have two receivers but localization based on correlation is still possible thanks to the presence of a complex structure i.e. the skull. This is equivalent to what has previously been shown in time reversal experiments [22]. We quantify the effectiveness of this algorithm by using signals resulting from our simulations. This allows us to identify the contribution of different parts of the signal to localization.
Pearson's correlation [23] is used as a quantitative measure of similarity. In practice, our algorithm first aligns a pair of signals based on the first (non reverberated) arrival, similar to a "normal moveout" correction in seismics [24]. The correlation of the so aligned traces is then computed, summing over the sample by sample product of the two signals. We then normalize the correlation value by the square root of the product of the energy of the two signals as follows, where s 1 (t) is a signal received at a given receiver from a given angle and s 0 2 ðtÞ is the delayed version of s 2 (t) which could be any of the signals received at the same receiver from all other angles [23].
We prefer the combination of moveout and correlation rather than simple cross-correlation, because we have neglected the attenuation in our simulations. This results in relatively large-amplitude reverberations, which might cause cross-correlation to be amplitude-biased. Consequently, the maximum of the cross-correlation which is expected to be the maximum of the similarity could be biased.
After aligning the signals, we explore two different approaches: We correlate only the direct part of the signals. Direct waves are those part of the signals that mimic the form of the incident wave that has not undergone any reverberations (see Fig. 7b for distinction between different contributions in signals). We apply the correlation to the full-time signals, (i.e. direct + reverberated waves) In the following, similar to Figure 9 in [2], we show the resulting correlation maps for the left and right receivers. The y axis in these maps is the true elevation of the source (/ 0 ) (see Fig. 5) and the x axis is the elevation of all sources available in the library (a i ). The library includes the recordings made at both receivers, and a i is the source elevation for each pair of signals. The color values are representative of the maximum of the normalized correlation coefficient.
By construction, the correlation is maximum and equal to one only along the diagonal of the plots in Figure 10, and, ideally, should quickly decay to zero away from the diagonal. If correlation decays quickly to zero away from the diagonal, source localization resolution can be considered relatively high; conversely, large correlations for significant differences between a i and / 0 mean low resolution. When only the direct part of the traces is correlated, we observe that high values of correlation are spread over almost all angles, implying that the resolution of the localization is not high. On the other hand, when we correlate the fulltime signals, there is a strong improvement in resolution, with the regions of high similarity becoming more concentrated along the diagonal (i.e. true source elevation). This improvement is most striking in the 20°-70°elevation range. We next changed the input signal to a linear chirp whose frequency varies between 20 and 45 kHz. The reason for choosing this form of input signal (shown in Fig. 11) is to be as close as possible to the real world in the limits of the simulation configuration. The correlation maps we obtained for this case (not shown here), look very similar to the case where the input signal is a sinusoidal burst.
This observation confirms that information carried by signal reverberated in the bone enhances significantly the precision of the localization. We can quantify this improvement through the width of the focusing function. The latter is common in time reversal experiments (e.g. [25,26]), where through focusing of time-reversed received waves a localization in time and space is achieved. The -3dB width of the focusing function is defined as the width of the focusing pattern where the value of the correlation coefficient falls to 70% of the maximum value (e.g. [2]). As we have data for a limited range of angles, and since the dominant improvement in localization happens for the angles between 20°and 70°, we measure the À3dB width for this range of angles. The results of this exercise for the case of the chirplike input are shown in Figure 12.
This, again, reveals the increase in the precision of the localization as a result of taking into account the reverberated part of the signals. The same processing procedure applied to experimental data shows a similar enhancement in the localization. Moreover, the decrease in the À3dB width of the focusing function thanks to considering the

Discussion
In this study, the propagation of elastic waves through a dolphin skull is modeled numerically, based on high-resolution X-ray tomography of the skull, and a numerical algorithm that solves first-order coupled equations for the velocity and stress in the time domain. We validate our numerical results with data from an experimental study [2] conducted on the same skull. We do not expect to reproduce experimental data exactly, because: (i) our current numerical setup neglects attenuation (which is hard to quantify); (ii) there is an uncertainty in how we associate wavespeed and density estimates to the X-ray scan; (iii) position and orientation of sensors in the laboratory experiment also carry an uncertainty; (iv) the coupling between sensors and bone in the experiment is probably not perfect, but cannot be modeled easily in our framework. Taking all these differences into account, we consider the match between numerical and experimental results to be satisfactory.
We next employ our numerical data to evaluate the hypothesis, emitted by Reinwald et al. [2] that dolphin echolocation might involve the correlation of each newly received waveform with a library of recorded echoes. Our results confirm that, in this approach, using the information carried by the waves that are reverberated through the bone before reaching the ear complex improves localization accuracy significantly. As noted by Reinwald et al., our "correlation" algorithm is equivalent to acoustic time reversal, and our conclusion accordingly confirms those of timereversal literature (e.g. [16,26]).
The important contribution of bone reverberation in the context of echolocation has been previously confirmed through in-vivo experiments (e.g. [27]). Our study confirms the salient role of bone-reverberated waves in the context of localization based on time-reversal acoustics. It should be noted that here, we do not investigate the accuracy of echolocation as a function of arrival elevation. Also, our model consists of only a part of the head and not the full head. Nevertheless, on the basis of this model we can conclude that localization is indeed improved by information carried by bone.
Having verified the validity of our model and the performance of our algorithm on this simplified model, we can now apply it to more realistic models and to a wide variety of species. The emphasis of this study was on a particular skull for which detailed experimental data are available, allowing the validation of our numerical model. The next, particularly important step will be to take into account soft tissues, which is necessary if any inferences relevant to biology are to be made from our results. Our future work will shed further light on the nature of sound localization in cetaceans, and its implications for evolutionary biology. We observe that there is a good match in amplitude between the three signals in both the direct and reverberated parts. There is a slight difference in the phase of the signals which comes from the variation of elastic parameters from one model to the other. The important feature is that different wave packets are clearly captured even in the case of 20-times decreased resolution. Repeating this procedure for different slices, we observed a similar behaviour. This strongly suggests that our 3D simulations conducted on the 20 times downsampled images are sufficiently accurate to be compared to experimental data. This resolution provides a geometry representative of the real geometry with sufficient details on the medium and at the same time reduces the computational cost and time.

Appendix B Comparison of numerical and simulated signals
In this appendix, we show visual and quantitative comparisons of numerical and experimental results for a suite of different source elevations. The overall similarity between simulations and experiments is reflected in a value of the Pearson's correlation coefficients always >0.6 for the direct signal (Direct signals are found based on a threshold on the amplitude and the known duration of the input signal); correlation coefficients for the reverberated signals are low, but visual analysis suggests that the envelope, if not the phase, of the reverberated signal is often reproduced fairly well (Fig. B.1).