A model framework for simulating spatial hearing of bilateral cochlear implant users

Bilateral cochlear implants (CIs) greatly improve spatial hearing acuity for CI users, but substantial gaps still exist compared to normal-hearing listeners. For example, CI users have poorer localization skills, little or no binaural unmasking, and reduced spatial release from masking. Multiple factors have been identified that limit binaural hearing with CIs. These include degradation of cues due to the various sound processing stages, the viability of the electrode-neuron interface, impaired brainstem neurons, and deterioration in connectivity between different cortical layers. To help quantify the relative importance and inter-relationship between these factors, computer models can and arguably should be employed. While models exploring single stages are often in good agreement with selected experimental data, their combination often does not yield a comprehensive and accurate simulation of perception. Here, we combine information from CI sound processing with computational auditory model stages in a modular and open-source framework, resembling an artificial bilateral CI user. The main stages are (a) binaural signal generation with optional head-related impulse response filtering, (b) generic CI sound processing not restricted to a specific manufacturer, (c) electrode-to-neuron transmission, (d) binaural interaction, and (e) a decision model. The function and the outputs of different model stages are demonstrated with examples of localization experiments. However, the model framework is not tailored to a specific dataset. It offers a selection of sound coding strategies and allows for third-party model extensions or substitutions; thus, it is possible to employ the model for a wide range of binaural applications and even for educational purposes.


Introduction
Cochlear implants (CIs) have successfully restored the sensation of hearing to those with severe to profound deafness since the 1980s [1], and there are over 1 million implant recipients worldwide as of 2022 [2]. In the past 10-15 years, there have been improvements in advanced implant features and a large increase in the number of patients (predominantly children) who receive bilateral CIs. These changes have led to significant improvements in CI outcomes due to the restoration of some binaural processing abilities, resulting in better speech understanding in noisy environments and better sound localization. For example, the accuracy of a bilateral CI user's localization in the azimuthal plane is much improved over that of unilateral CI users, by exploiting interaural level differences (ILD) [3][4][5], and potentially making better use of envelope interaural time differences (ITD) [6,7]. However, substantial differences still exist in many binaural tasks between bilateral CI users and normal hearing (NH) listeners [8][9][10]. Multiple factors have been identified that limit the binaural hearing abilities of people with bilateral CIs, and the influence of these factors can depend on the individual. Factors or processing stages include the acoustic environment, the head-related impulse response (HRIR), various CI speech processing stages and their individual settings, electrode-neuron transmission, binaural interaction in brainstem neurons, as well as particularly complex and plastic cortical information decoding and decision-making stages (see, e.g., [11] for a review). These stages are mostly covered in the proposed framework. Figure 1 shows a schematic of the model framework, which includes input signals, various CI processing stages, the electrode-neuron interface stage, the auditory nerve (AN) stage, the binaural integration stage, and simplified decision-making stages.
Within such a long sound encoding/decoding pipeline, it is often not possible to identify the reason for a particular impairment or limitation. Computer models of auditory processing can be employed to help diagnose and understand the cause of a specific perceptual result or measurement by using one or a combination of several aforementioned stages.
Theoretically, a combination of verified stage-specific models in a single framework should allow for the simulation of an individual CI user's perception (Fig. 1). Yet, as is evident from the literature referenced above, models of each stage still have some unresolved details, or their parametrization is already extremely complex even when used in isolation. It is unclear to what extent very detailed parameter definitions are necessary to simulate auditory processing on a macroscopic level. These difficulties and limitations restrict the development of multi-stage sound-to-perception models. In fact, accurate simulations with most models are often limited to relatively narrow datasets. In the binaural context, simulating human perception with models built upon physiological data remains an ongoing challenge even for understanding normal hearing (see, e.g., [38] for a review). However, some recently developed, fairly comprehensive models (e.g., [39][40][41]) have overcome some of these challenges and demonstrated the feasibility of full perceptual models.
Previously, Kelvasa and Dietz [34] published a sound-to-perception model for bilateral CIs by combining a CI speech coding strategy mimicking the advanced combination encoder (ACE) strategy from Cochlear Corporation [42,43] with several auditory processing models to estimate the direction of a sound source: first, an electrode-neuron interface; second, an AN model; third, a binaural interaction stage; and finally, a decision stage. Like most of the above-mentioned models, this model was able to account for selected localization data but was not generalizable to other experiments. In addition, there were some programming limitations; most notably, their Hodgkin and Huxley model stage for the lateral superior olive (LSO) is the computational bottleneck and is not provided in MATLAB. Therefore, it was deemed necessary to revise some of the model stages from Kelvasa and Dietz [34] to simulate a broader range of experiments.

Figure 1 (caption, continued): (B) The "Input Signals" stage incorporates functions for generating binaural signals, with or without head-related impulse responses (HRIRs). (C) The "CI Processing" stage comprises a general pre-emphasis high-pass filter, automatic gain controls (AGCs), filter banks, and coding strategies. Note that this block can be bypassed by providing pulse trains directly from the signal generator block. (D) The "Processing Model" includes an electrode-neuron interface (ENI), an auditory nerve (AN) periphery model, and an excitation-inhibition (EI) binaural interaction model. (E) The "Decision Model" maps the spiking rate to psychoacoustic behavioral data. Abbreviations: continuous interleaved sampling (CIS); N-of-M channel selection (NofM); x-channel fine structure processing (FSx); peak picking (PP).
The first extension is the simulation of different CI stimulation techniques, including direct stimulation of single or multiple electrodes, audio line input, as well as free-field stimulation based on head-related impulse responses (HRIRs) or even binaural room impulse responses (BRIRs). A second extension allows a much wider range of sound-coding strategies to be explored, incorporating options such as the filter bank type, fixed or local-peak-driven pulse rates, and N-of-M channel selection. The third extension replaces the Hodgkin and Huxley LSO model in the binaural integration stage with a functional and computationally more efficient model [44], as used in Klug et al. [40] and Hu et al. [45]. The fourth extension adds a variety of decision stages that can be selected in a task-dependent manner.
In this work, the framework is demonstrated using simulations of free-field localization results of bilateral CI users, based on typical and well-established experimental findings previously conducted in both research (e.g., [46]) and clinical (e.g., [4,5,7,9]) settings. The model outputs are shown at different stages along the processing chain, including HRIR-based virtual acoustic rendering, interaurally independent automatic gain controls (AGCs), and various coding strategies with different filter banks. These model simulations aim to correspond to the average CI user's performance, providing a starting point for future individualized applications.

Model framework and stages
The lower panel of Figure 1 illustrates the five main stages of the proposed modular framework in direct comparison to the physiological stages represented in the upper panel: (A) a parameter initialization stage, (B) a binaural signal generation stage, (C) a bilateral CI processing stage, (D) an auditory processing model stage including binaural interaction, and (E) a decision model stage. Most stages can be switched on and off, to allow for different test setups. Due to its flexible nature, the framework allows for easy third-party model extensions or substitutions.

Parameter settings
The parameter initialization stage (Fig. 1A) loads the default parameters for most stages except for the "Decision Model" stage and, if applicable, changes them to the user-defined settings. More specifically, users can define the switches and set the parameters of stages B, C, and D (Fig. 1) based on their specific research topic, such as switching AGCs on and off, switching coding strategies, or varying cochlear insertion depth.

Input signals
The signal generation stage (Fig. 1B) contains different functions for generating binaural signals, either using HRIRs or by adding ILDs and ITDs directly. For simulations with CI processing, six binaural signal generation functions are provided as example input stimuli that have been commonly used in lateralization and localization studies. These are tone bursts [6,47], broad-band, low-pass, and high-pass noise bursts [5,7,[48][49][50][51], click trains [6], and speech fragments [6,52]. Furthermore, custom signals can be loaded as input. Depending on the HRIR switch in the "Input Signals" module (Fig. 1B), the binaural cues can be introduced either by adding ITD or ILD cues independently or by applying HRIRs. In the current framework, the HRIRs were taken from Kayser et al. [53] and were those from the most frontal of the three behind-the-ear (BTE) microphones mounted on the Bruel and Kjaer artificial head and torso simulator (HATS, type 4128C) with a source distance of 3 m. This database covers the range of ±90° at 5° intervals in azimuth, and from −10° to 20° in steps of 10° in elevation. It should be noted that the HRIRs of positive azimuthal angles (right side) were simply mirrored from the negative ones (left side).
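For the direct cue-insertion route (HRIR switch off), the basic operation can be sketched as follows. This is a minimal pure-Python illustration, not the framework's MATLAB code; the whole-sample delay and the symmetric split of the ILD across the two ears are simplifying assumptions.

```python
import math

def apply_itd_ild(signal, fs, itd_s=0.0, ild_db=0.0):
    """Create a two-channel (left, right) signal with a broadband ITD,
    applied as a whole-sample delay, and an ILD, applied as a symmetric
    level offset. Positive itd_s and ild_db favor the right ear."""
    n = int(round(abs(itd_s) * fs))        # delay in whole samples
    pad = [0.0] * n
    if itd_s > 0:                          # right ear leads: delay the left channel
        left, right = pad + list(signal), list(signal) + pad
    else:                                  # left ear leads (or no ITD)
        left, right = list(signal) + pad, pad + list(signal)
    g = 10.0 ** (ild_db / 40.0)            # half the ILD on each side
    left = [s / g for s in left]
    right = [s * g for s in right]
    return left, right
```

Sub-sample ITDs would require fractional-delay filtering; the whole-sample version suffices to demonstrate the cue path through the rest of the pipeline.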

CI processing
The CI processing (Fig. 1C) consists of a pre-emphasis filter, AGC, two possible filter banks, four research coding strategies, and electric amplitude mapping. For simulating direct stimulation for single-or multiple-electrodes, this block can be bypassed by providing pulse trains with independently added ITD and ILD cues from the signal generator (Fig. 1B).

Pre-emphasis filter
The first step of the CI processing is a pre-emphasis filter, as a substitute mostly for the ear canal and middle ear transfer characteristics. This is implemented using a first-order Butterworth high-pass filter with a corner frequency of 1200 Hz. The present implementation permits changes to the pre-emphasis frequencies and the deactivation of the pre-emphasis filter.
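A minimal sketch of such a switchable first-order Butterworth high-pass, derived here via the bilinear transform so it stays dependency-free (the framework's own implementation may differ):

```python
import math

def pre_emphasis(x, fs, fc=1200.0, enabled=True):
    """First-order Butterworth high-pass (bilinear transform) with a
    configurable corner frequency fc and an on/off switch."""
    if not enabled:
        return list(x)
    k = math.tan(math.pi * fc / fs)
    b0 = 1.0 / (1.0 + k)          # feed-forward coefficients
    b1 = -b0
    a1 = (k - 1.0) / (k + 1.0)    # feedback coefficient
    y, x1, y1 = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 - a1 * y1
        y.append(yn)
        x1, y1 = xn, yn
    return y
```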

Automatic gain control
The AGC stage compresses the input signal above a defined threshold (knee point). Here, a general broadband AGC based on Giannoulis et al. [54] is implemented in the framework as an optional stage. In brief, the digital signal is amplified with a gain that depends on the compression ratio and on the knee point. Temporal smoothing is achieved with a digital one-pole filter, which includes options for adjusting attack and release times. The default threshold, ratio, attack time, and release time were 50 dB SPL, 2:1, 50 ms, and 100 ms, respectively (see Appendix for full details). With these parameters, results comparable to the literature for both static and moving sources (e.g., [55,56]) and to commercial CI devices have been achieved.
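The gain computation can be illustrated as follows. This is a simplified hard-knee, feed-forward variant in the spirit of Giannoulis et al. [54]; the full-scale-to-SPL calibration offset `cal_db` is an assumed convention, and the framework's actual implementation (e.g., its soft knee) is not reproduced here.

```python
import math

def broadband_agc(x, fs, threshold_db=50.0, ratio=2.0,
                  attack_s=0.050, release_s=0.100, cal_db=100.0):
    """Hard-knee feed-forward compressor: the static gain curve attenuates
    levels above the knee point by (1 - 1/ratio) dB per dB, and the gain
    is smoothed by a one-pole filter with separate attack/release times."""
    a_att = math.exp(-1.0 / (attack_s * fs))
    a_rel = math.exp(-1.0 / (release_s * fs))
    g_smooth = 0.0                      # smoothed gain in dB (0 = no reduction)
    y = []
    for xn in x:
        level_db = cal_db + 20.0 * math.log10(max(abs(xn), 1e-12))
        over = level_db - threshold_db
        g_target = over * (1.0 / ratio - 1.0) if over > 0.0 else 0.0
        # attack when more reduction is needed, release otherwise
        coef = a_att if g_target < g_smooth else a_rel
        g_smooth = coef * g_smooth + (1.0 - coef) * g_target
        y.append(xn * 10.0 ** (g_smooth / 20.0))
    return y
```

With the default 2:1 ratio, a steady input 20 dB above the knee point converges to 10 dB of gain reduction.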

Filter banks
After the AGC, the input signal is split into M channels. For this, either a Fast Fourier Transform (FFT)-based filter bank or a gammatone filter bank [57,58] can be selected. The channel number and the corner frequencies for each channel can be specified by the user. For the FFT-based filter bank, the sample rate is fixed at 16 kHz, and a 128-point FFT with a moving Hann window is applied to the audio signal, covering the range of 250 to 7437 Hz [59]. After transforming the input signal into a spectrogram, the M-channel signal envelope is obtained by summing several (weighted) power frequency bins (please refer to [59][60][61] for more details). The FFT-based approach and the corresponding parameters coarsely resemble the signal processing of Cochlear Corporation devices [62].
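The envelope extraction can be sketched as follows. The direct DFT is used only for transparency (an FFT would be used in practice), and the bin-to-channel allocation in `channel_bins` is a toy example, not the clinical frequency table.

```python
import cmath
import math

def fft_envelopes(x, n_fft=128, hop=8, channel_bins=((2, 3), (4, 5), (6, 8))):
    """Slide a Hann-windowed n_fft-point DFT across x; each channel envelope
    is the square root of the summed power of its assigned bins (inclusive
    bin ranges). Returns one M-channel envelope vector per frame."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * win[n] for n in range(n_fft)]
        spec = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)) for k in range(n_fft // 2 + 1)]
        frames.append([math.sqrt(sum(abs(spec[k]) ** 2
                                     for k in range(lo, hi + 1)))
                       for lo, hi in channel_bins])
    return frames
```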
The gammatone filter bank implementation outputs a complex-valued analytical signal in each band, allowing for the extraction of both the Hilbert envelope and the temporal fine structure [58]. The default filter bank has center frequencies between 158 and 7480 Hz [63] and consists of 12 fourth-order gammatone filters. The bandwidth of each auditory filter scales with its center frequency according to Glasberg and Moore [64]. The center frequencies and bandwidths are chosen so that the −3 dB corner frequencies of adjacent filters are approximately aligned. The Hilbert envelope in each band is smoothed using a 200 Hz low-pass filter.1 This processing and parametrization coarsely resemble elements of MED-EL signal processing, even though they do not use a gammatone filter bank but rather bell-shaped finite impulse response filters [65].

Coding strategies
Four CI speech coding strategies were implemented. Three of them are based on well-known commercial coding strategies, or on what is publicly known about them: (1) continuous interleaved sampling (CIS) [66], (2) N-of-M selection as part of the advanced combination encoder (ACE) [42,43], and (3) x-channel fine structure processing (FSx), typically with x = 4 [67]. The fourth strategy, "peak picking" (PP) [52], contains elements of fundamental asynchronous stimulus timing (FAST) [68], the peak-derived timing (PDT) [69] coding strategy, and fine structure processing. Briefly, instead of a fixed rate, the stimulation time and amplitude of each channel are specified by the local temporal maxima. For the three most apical channels, the pulse timing and amplitude of the local maxima are extracted from the real part of the complex-valued gammatone-filter output, which contains the temporal fine structure of the signal. The remaining mid- to high-frequency channels of PP are similar to the FAST coding strategy, i.e., the local maxima are extracted from the envelope. It should be noted that all these research coding strategies were self-implemented for different purposes and are not intended to correspond exactly to their commercial counterparts. They were implemented in a way that is not restricted by commercial hardware or computing limitations, allowing for flexible parametrization. This includes the total number of electrodes along the CI electrode array and the stimulation pulse rate, and it allows for arbitrarily short pulses. Consequently, the default parameters might not be optimized for all tasks. However, they have been validated in different studies with CI users and showed speech intelligibility in noise similar to the commercial strategies (e.g., [8,52,61]).
The implemented FSx coding strategy combines elements of the gammatone filter bank-based CIS with fine structure processing, as in the PP strategy, in the x most apical channels. The remaining mid- to high-frequency channels of FSx are based on the low-pass filtered temporal Hilbert envelope and sampled at a fixed pulse rate. For example, with x = 4, the four most apical channels contain the temporal fine structure of the signal, where the pulse timing and amplitude of the local temporal maxima are used for the stimulation; this is related to, but not identical to, the MED-EL FS4 and FS4p strategies.
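The peak-driven pulse timing shared by the PP and FSx strategies can be illustrated with a minimal local-maxima detector. This is a sketch only: the actual strategies apply it per channel to the gammatone fine structure (apical channels) or envelope (basal channels).

```python
def local_maxima_pulses(channel_signal, fs):
    """Return (time_s, amplitude) pairs at the positive local temporal
    maxima of one channel signal, i.e., peak-driven stimulation timing
    instead of a fixed pulse rate."""
    pulses = []
    for n in range(1, len(channel_signal) - 1):
        s = channel_signal[n]
        if s > channel_signal[n - 1] and s >= channel_signal[n + 1] and s > 0:
            pulses.append((n / fs, s))
    return pulses
```

For a pure tone, this yields one pulse per cycle at the waveform peaks, which is the fine-structure-following behavior described above.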
Since speech is one of the most commonly used stimuli for comparing coding strategies, Figure 2 shows the output at selected stages up to the end of the CI processing for the Chinese word "shui" (water) from the Chinese Mandarin matrix sentence test (CMNmatrix) [70]. The original word was a single-channel signal with a sound level of 60 dB SPL (Fig. 2A). Figures 2B-2D show the two-channel waveform of the processed signal after applying the HRIR of −60° (B), pre-emphasis (C), and two independent AGCs (D). The corresponding spectrograms of the left channel signal are shown in the second row. Figures 2E-2H are the electrodograms of the compressed signals (D) processed with one of the four research coding strategies: 12-channel CIS (E), FFT-based CIS with 8-of-22 selection (F), 12-channel FSx = 4 (G), and 12-channel PP (H).
As expected, the three gammatone filter bank-based coding strategies (E, G, and H) show stimulation patterns more similar to one another than to the FFT-based CIS with 8-of-22 channel selection (F), stimulation rate aside. The interaural pulse time differences are clearly visible in the zoomed-in plots of the FSx = 4 (G, channels 1-4) and PP coding strategies (H). In addition, the frequency-dependent delay in each individual channel can be observed in the gammatone filter bank-based electrodograms (e.g., at the beginning) when compared with the FFT filter bank-based electrodogram. Table 1 lists the default settings of these four exemplary coding strategies used in the following demonstrations as well as in the Supplementary materials.

Electric amplitude mapping
To map the acoustic amplitude into the CI user's electrical dynamic range, a frequency-independent compression is performed for all coding strategies on each channel by applying a logarithmic-shaped compression function whose steepness is controlled by the parameter a_c = 415.96 [60].

1 It is a first-order Butterworth filter applied with forward and backward (zero-phase) filtering. The non-causal approach was used to ease the temporal alignment of fine structure and envelope for the sound coding strategies. A causal filter implementation would be needed for real-time applications.
Unless stated otherwise, a threshold level of 25 dB SPL and a maximum level of 65 dB SPL were set as the acoustic input range of each band-pass filtered channel, and the resulting levels were then mapped to the clinical unit (CU) between the CI electrode threshold and saturation levels (e.g., 100 and 200 CU in the examples), respectively.
These values were then mapped onto current using the mapping function in equation (1). Finally, biphasic pulses were sent to the auditory nerve (AN) model, although the currently implemented electrical AN model is agnostic to polarity. Unless stated otherwise, the default biphasic pulse is cathodic/anodic with a phase duration of 25 µs and an inter-phase gap of 8 µs. The stimulation was done in an interleaved manner from apical to basal electrodes. Other electric amplitude mapping equations can be implemented to replace equation (1) to convert any clinical unit to a current value in µA for other CI manufacturers.
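The two mapping steps can be sketched as follows. The logarithmic compression uses the a_c-controlled loudness-growth shape referenced above, and the CU-to-current relation I = 17.5 · 100^(CU/255) µA is the widely published Cochlear-style convention; both exact formulas are assumptions here, standing in for the unspecified equation (1).

```python
import math

def acoustic_to_cu(env_db_spl, t_spl=25.0, m_spl=65.0,
                   thr_cu=100.0, sat_cu=200.0, alpha_c=415.96):
    """Map a band envelope level (dB SPL) into clinical units (CU).
    The level is normalized into [0, 1] over the acoustic input range,
    logarithmically compressed (steepness alpha_c, an assumed formula),
    and scaled between the electrode threshold and saturation levels."""
    p = min(max((env_db_spl - t_spl) / (m_spl - t_spl), 0.0), 1.0)
    c = math.log(1.0 + alpha_c * p) / math.log(1.0 + alpha_c)
    return thr_cu + c * (sat_cu - thr_cu)

def cu_to_current_ua(cu):
    """Illustrative CU-to-current conversion using the commonly cited
    Cochlear-style relation I = 17.5 * 100**(CU/255) microamperes
    (an assumption, not necessarily the framework's equation (1))."""
    return 17.5 * 100.0 ** (cu / 255.0)
```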

Processing model
The encoded sound after CI processing is then passed into the auditory processing model stage, which starts with the electrode-neuron interface (ENI), followed by the AN model, and finally the binaural excitation-inhibition (EI) interaction model (see Fig. 1D). Figure 3 illustrates exemplary settings of these three parts and their interactions. The same electrical AN model and EI model as in Hu et al. [45] were used but were extended from a single EI neuron to multiple EI neurons, taking current spread into account: each electrical pulse can activate many AN fibers (ANFs), and the outputs of these ANFs then serve as inputs to several EI neurons, linking the left and right hemispheres.

Electrode neuron interface
The ENI is a bridge between the CI electrode output and the AN model. It allows the simulation of different experimental settings. For example, users can specify the cochlear geometry (e.g., dead regions or the number of AN fibers), the CI electrode array (e.g., the length, the electrode number, or the insertion depth), and the spread of excitation in the left and right ear independently. For demonstration purposes, one predefined interface is shown in Figure 3. The exemplary interface assumes: (1) left and right sides are identical; (2) the electrode positions along the unwound cochlea can theoretically reach from the most apical end (35 mm) to the most basal end (0 mm); (3) there are 28 AN fibers per mm, i.e., in total 980 fibers, evenly distributed along the tonotopical axis from 0 mm to 35 mm. Figure 3A shows the location of a predefined example CI electrode array on the straightened and simplified cochlea (brown), where 12 electrodes are equally distributed along the tonotopic axis between 5 to 26 mm. The frequencies shown at the bottom of Figure 3A correspond to the full frequency range (20-20,000 Hz) of a normal-hearing ear [71]. The input frequency allocation of each electrode is defined as part of the CI coding strategies.
Next, the electric field at the positions of the auditory nerve fibers was simulated, without considering the generally important spatial structure of the fibers. Here, the decline of the electric field, commonly referred to as the spread of excitation, is modeled by a double-sided, one-dimensional, exponentially decaying spatial-spread function controlled by the parameter k [36]:

I = I_0 · exp(−|x_n − x_el| / k),   (2)

where I_0 is the active input current at the position x_el and I is the effective current, due to the spread of excitation, at the position x_n. The presented framework adopted k = 9 mm as its default value. This corresponds to a medium amount of spread of excitation, according to electrically evoked compound action potential data [72]. The parameter k can be interpreted as a surrogate parameter inversely related to the distance between the electrodes and the AN fibers (the radial distance is assumed to be constant along the cochlea). Figure 3B shows one set of exemplary spread-of-excitation functions of the 12 electrodes. When it is intended to use a 3D model, one possibility is to derive the excitation matrix from all electrodes to all neurons, or even to all compartments of all neurons within the 3D model, and then substitute the spread-of-excitation stage with such a matrix.
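Equation (2) reduces to a one-liner; positions are in mm along the unwound cochlea:

```python
import math

def spread_of_excitation(i0, x_el, x_n, k_mm=9.0):
    """Effective current at neuron position x_n for an active electrode
    at x_el carrying current i0, using the double-sided exponential
    decay of equation (2) with space constant k (default 9 mm)."""
    return i0 * math.exp(-abs(x_n - x_el) / k_mm)
```

Evaluating this function for every electrode-fiber pair yields the excitation matrix mentioned above for the 1D case.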

Auditory nerve model
The electrical AN model of Hamacher [18] in the implementation of Fredelake and Hohmann [36] was used to simulate electrically stimulated AN fibers. This model incorporates the effects of cell membrane, membrane noise, refractory period, mean response latency, and response time jitter in a functional way. As in Hu et al. [45], all default AN parameters were the same as in Fredelake and Hohmann [36]. The main information transferred from the CI processing outputs (electrodogram, with optional current spread) to the AN fibers after the ENI stage includes a sequence of stimulated electrodes and their stimulation time, effective stimulation current I (see Eq. (2)), and phase duration of pulses. The output of the AN stage is a spike matrix containing the effective firing time of action potentials for each fiber. The response probability of each fiber depends on two parameters: Firstly, the membrane voltage after the first pulse phase, which is influenced by the threshold potential, stimulation current and the first phase duration. Secondly, the time difference between the onset of the current pulse and the last action potential. When following the above-mentioned input-output format, the AN model can be replaced by other AN models (e.g., [19,20,22]).
In the example, 980 simulated AN fibers were created on each side, organized as 35 bundles of 28 fibers, with all fibers in a bundle sharing the same functional properties. To be more specific, a bundle of 28 fibers was first randomly generated and then replicated 35 times along the tonotopic axis and across ears, simulating identical left and right sides. Each bundle was assigned to the excitatory and inhibitory inputs of a single EI neuron described in the next section (binaural neuron model). Each bundle of these 28 AN fibers is represented by a black or gray color-filled rectangle in Figure 3B. Figure 4A shows the average spike rates of each fiber bundle along the tonotopic axis from 0 to 35 mm. These spikes were evoked by the same Chinese word "shui" shown in Figure 2 (60 dB SPL), convolved with the HRIR of azimuth −60° or 0° (with AGCs). Figure 4B shows the average spike rates of all AN fibers evoked by the same stimuli presented at different azimuthal angles. The AN spike patterns are useful for understanding the input-output relations of each processing stage shown in Figure 1. For example, they enable the user to check whether spatial cues are still encoded at this level and to understand at which level the information may get corrupted or lost.

Binaural neuron model
As in Klug et al. [40], the single EI neuron model used in Hu et al. [45] was extended to a population of EI model neurons along the tonotopic array. In the current model configuration, the outputs of the left and right ANFs (Fig. 4A; 35 bundles and in total 980 ANFs on each side) serve as the inputs to the EI stage. There are 35 EI neurons in each hemisphere in response to these 35 ANF bundles (Fig. 3B). Each EI neuron receives 20 excitatory inputs from the neighboring ipsilateral 20 AN fibers and eight inhibitory inputs from the contralateral eight AN fibers, all placed around the same center frequencies. Figure 3C illustrates one left-hemispheric EI neuron receiving excitatory input from the ipsilateral side (20 ANFs) and inhibitory input from the contralateral side (eight ANFs). The EI spike outputs were generated according to the coincidence counting model described by Ashida et al. [44], with the same parameters as in Klug et al. [40] and Hu et al. [45].

Figure 4. Simulation outputs at AN, EI, and decision stages for the same Chinese word "shui" at 60 dB SPL. (A) Average spike rates of each ANF bundle along the tonotopic axis for the left (blue) and right (red) sides, at azimuths of −60° (left subplot) and 0° (right subplot). (B) Average spike rates across all 980 ANFs for different azimuth angles (left and right AN rate-azimuth tuning curves, in blue and red). (C) Spike rate of each EI neuron along the tonotopic axis of the left (blue) and right (red) sides, at azimuth angles of −60° and 0°. (D) Average spike rates across 35 EI neurons for different azimuth angles (left and right EI rate-azimuth tuning curves, blue and red) and the corresponding spike rate difference between the right and left hemispheres (green). (E) The normalized rate difference (right y-axis) and the corresponding predicted response azimuth angles (left y-axis) after applying a linear mapping.
Briefly, ipsilateral ANF spikes are convolved with a 1.1 ms long rectangular window (of amplitude 1) and contralateral spikes with a 3.1 ms long window of amplitude 2, simulating inhibition. If the sum of both convolved signals reaches the response threshold of 3, an action potential is generated. Within the refractory period of 1.6 ms, only the first action potential leads to a spike. Figure 4C shows the spike pattern of the 35 EI neurons evoked by the 60 dB SPL word "shui" convolved with the HRIR of azimuth −60° (left subplot) and 0° (right subplot), where a reduction of the on-frequency rate can be observed. Figure 4D shows the average spike rates across the 35 EI neurons evoked by the same word at different azimuth angles (the blue and red curves correspond to the left and right hemispheres). Under a fixed combination of the above-mentioned EI neuron parameters (e.g., coincidence and inhibitory window size, response threshold, refractory period), different properties of the excitatory and inhibitory AN fibers from the bundle of 28 AN fibers can also lead to different results. In the demonstrated setups, inhibition will eventually win and suppress excitation when a pulse is presented bilaterally to a single electrode pair, stimulating all fibers in the relevant bundles. However, the different durations of inhibitory and excitatory inputs result in an additional rate-dependence of this mechanism. It is noteworthy that the rate- and level-dependent outputs of the EI model, shown in Supplementary Figure 12, align qualitatively with the results from [73].
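The coincidence-counting rule can be sketched as follows. Representing inhibition as a negative-amplitude window and summing both traces is our reading of the excitation-inhibition interaction, and spike times are discretized to a sample grid; this is an illustration, not the Ashida et al. [44] reference implementation.

```python
def ei_neuron_spikes(fs, exc_spikes, inh_spikes, dur_s,
                     exc_win_s=0.0011, inh_win_s=0.0031,
                     inh_amp=2.0, threshold=3.0, refractory_s=0.0016):
    """Coincidence-counting EI neuron sketch: each excitatory input spike
    adds +1 for 1.1 ms, each inhibitory spike adds -2 for 3.1 ms; an
    output spike occurs whenever the summed trace reaches the threshold
    of 3, gated by a 1.6 ms refractory period. Spike times in seconds."""
    n = int(round(dur_s * fs))
    trace = [0.0] * n
    for t in exc_spikes:
        for i in range(int(t * fs), min(n, int((t + exc_win_s) * fs))):
            trace[i] += 1.0
    for t in inh_spikes:
        for i in range(int(t * fs), min(n, int((t + inh_win_s) * fs))):
            trace[i] -= inh_amp
    out, last = [], -1.0
    for i, v in enumerate(trace):
        if v >= threshold and (i / fs - last) > refractory_s:
            out.append(i / fs)
            last = i / fs
    return out
```

With these parameters, three coincident excitatory spikes fire the neuron, while a single preceding inhibitory spike (weight 2) vetoes them, matching the "inhibition eventually wins" behavior described above.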

Decision model
As many questions about the human "decision stage" remain open, it is challenging to define a single comprehensive model of the decision-making process for different tasks and applications. In previous studies, various central readout mechanisms have been used (for a review, see [74]). Kelvasa and Dietz [34] showed that the averaged hemispheric LSO model response difference is approximately proportional to azimuthal sound source localization in CI users. Klug et al. [40] adopted this simple linear mapping and demonstrated that the mean rate difference between the left-hemispheric and the right-hemispheric binaural EI model neurons, together with a subject-specific factor, is largely sufficient to explain both envelope-ITD-based and ILD-based lateralization for narrow-band stimuli in normal-hearing listeners. As previously emphasized, it is out of the scope of the current model framework to provide a validated decision model for all possible applications. For demonstration purposes, a normalized spiking rate difference r_D, modified from Klug et al. [40], was employed. It was obtained as

r_D = Σ_{n=1}^{N} (r_{R,n} − r_{L,n}) / Σ_{n=1}^{N} (r_{R,n} + r_{L,n}),   (3)

where N is the number of EI model neurons on each side (35 in this study), and r_{R,n} and r_{L,n} are the spike rates of the two EI model neurons (with the same nth center frequency) on the right and left hemispheres, respectively. Comparing the rate difference (Σ_{n=1}^{N} (r_{R,n} − r_{L,n}); Fig. 4D, green, Right−Left) to the normalized rate difference (Fig. 4E, r_D), the normalization leads to a value between −1 and 1. The normalization was achieved by dividing the rate difference by the total spike rate across the two hemispheres, Σ_{n=1}^{N} (r_{R,n} + r_{L,n}), which requires a non-zero total spike count. If one hemispheric rate drops to zero, the decision variable is either −1 or 1. This assumes that, as soon as only one hemisphere is active, localization does not depend on the spike rate. This assumption might not be valid in other applications.
For example, in a lateralization experiment, the amount of spike rate might be related to the extent of lateralization. In such a case, normalization might reduce the possible effects of certain factors on the extent of lateralization. For instance, the lateralization differences between single and multiple electric stimulations might be underestimated because of normalization. It should also be noted that such normalization can, to some extent, diminish some features that are present in the rate difference tuning curves, such as the local peaks near the azimuths of ±45° in Figure 4D. In addition, as shown in Figure 4C, the 35 EI spike rates along the tonotopic axis evoked by the Chinese word "shui" were not identical; however, in the calculation of the normalized r_D (Eq. (3)), all EI neurons are equally weighted. Further optimization could consider tonotopically weighted rate differences (e.g., taking into account the spectral dominance region [75]). Afterwards, alternative mapping functions can be used to project the normalized spiking rate difference r_D onto various scales to account for a range of psychoacoustic data (see Supplementary Materials). Similar to Kelvasa and Dietz [34] and Klug et al. [40], a simple linear function was introduced to map r_D to azimuth angles:

azimuth = c · r_D + d.   (4)

For example, c = 90° and d = 0° were used in equation (4) to map r_D ∈ [−1, 1] to azimuth responses between −90° and 90°, as shown in classic localization experimental data (e.g., Fig. 4E). It should be noted that no other dedicated individual calibration of the linear mapping, such as using a subject-specific factor [40], was applied in this demonstration.
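A minimal sketch of this decision stage, following equations (3) and (4); the fallback value returned for a zero total spike count is an arbitrary choice of this sketch.

```python
def normalized_rate_difference(r_right, r_left):
    """Eq. (3): hemispheric EI rate difference divided by the total rate
    across both hemispheres, clipped to [-1, 1]. The inputs are the
    per-neuron spike rates of the right and left EI populations."""
    diff = sum(r - l for r, l in zip(r_right, r_left))
    total = sum(r + l for r, l in zip(r_right, r_left))
    if total == 0:
        return 0.0  # undefined case; arbitrary neutral fallback
    return max(-1.0, min(1.0, diff / total))

def map_to_azimuth(r_d, c=90.0, d=0.0):
    """Eq. (4): linear mapping of r_D in [-1, 1] to a response azimuth
    in degrees, with slope c and offset d (defaults map to +/-90 deg)."""
    return c * r_d + d
```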

Discussion
The goal of this research was to provide a general CI model framework for simulating the spatial hearing abilities of bilateral CI users in a broad range of binaural experiments. The framework combines various CI processing strategies with characteristics of different ascending auditory pathway stages in modular, open-source software. To demonstrate the outputs of the different model stages, a localization experiment was simulated with a simple decision model. Signal representations were shown after the different CI processing stages, simulated neural response rates were shown at the level of the AN and after binaural interaction, and finally the decision variable was visualized as a function of sound source azimuth.
Beyond the examples demonstrated in this paper, the model has been applied to several psychoacoustic datasets including lateralization and localization experiments in bilateral CI users (e.g., [6,[76][77][78][79][80]). In general, with the same AN, binaural, and multi-task decision stages, the model was able to capture the overall results of the bilateral CI users in these experimental studies, such as the HRIR-, ITD-, and ILD-based localization/lateralization judgments, the effect of coding strategies, and the effect of level-dependent compression. These results are provided in the Supplementary Materials.
The following subsections discuss the limitations of the current model framework and possible applications with further extension.

Limitations
In the CI processing stage of the current model implementation, four different coding strategies and two different filter banks were implemented. Their implementations, particularly for CIS and FSx, were partially based on secondary literature due to the limited published details from manufacturers. It should be noted that they were not validated to the same extent as commercial coding strategies. However, some studies using the self-implemented coding strategies demonstrated similar speech reception thresholds with direct CI stimulation when compared to participants using their own speech processors [8,52]. Additionally, as discussed in the following Section 3.2, users should be able to easily integrate their own coding strategies into the model framework. It is also important to recognize that the current binaural stage solely incorporates the LSO circuitry. For a more comprehensive representation of binaural processing, it may be necessary to include the medial superior olive (MSO) circuitry in future work.
The decision stage model was introduced to mimic how central pathways extract a lateralization or localization percept from the ensemble of binaural model neuron spikes. Owing to this complexity, a variety of decoding stages have been suggested in previous studies (for a review, see [74]). In this paper, the normalized rate difference r_D and linear mapping functions were used to map the model outputs to different psychoacoustic data. With this multi-task decision model, ITD and ILD threshold, lateralization, and localization experiments were simulated using single- or multi-electrode direct stimulation or the simulated CI processor. Overall, this simplified decision model showed plausible results in predicting the average CI user's performance with different stimulation techniques. However, not all features of the selected experiments could be captured with this approach. For example, by treating all EI neurons equally, the model may underestimate the effect of multiple-electrode stimulation compared to single-electrode stimulation, or the differences between coding strategies with and without fine-structure encoding, and it may exaggerate the detrimental effect of two independent AGCs. More specifically, the normalized rate difference assigns the same weight to the outputs of all 35 EI neurons, and this might be one of the reasons for some of the unusual simulation results included in the Supplementary Materials. For example, one unusual result that occurred with the use of two independent AGCs (Supplementary Figure 5) might be due to this limitation: there, the model suggested a reversal of the perceived localization direction for broad-band stimuli at high input levels.
However, to our knowledge, this effect has not been explicitly reported in the localization literature with bilateral CI users, although inverted ILDs have been reported in the low-frequency channels with bilaterally independent AGCs (e.g., [46,48,81,82]) or N-of-M selection (e.g., [34]). Sparsely distributed reversed localization data points were shown for some Advanced Bionics bilateral CI users [46], but this was not specifically discussed by the authors. This implies that either the current simplified decision model exaggerated the effect of the inverted ILDs in the low-frequency channels, or the reversed direction may occur in a minority of CI users under some extreme circumstances. To address these issues with the decision stage, more advanced decision models, such as weighted rate differences or machine-learning based methods (e.g., [83,84]), might help. Furthermore, a decision stage including a listener-specific factor (e.g., [40]) for fitting the model outputs to each participant might be required to account for individual performance in the psychoacoustic data.
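Such a weighted rate difference could, for instance, take the following form (an illustrative Python sketch; the function name and the weight vector are hypothetical and would need to be fitted to data, e.g., to emphasize a spectral dominance region):

```python
import numpy as np

def weighted_rate_difference(r_right, r_left, weights):
    """Hypothetical tonotopically weighted variant of r_D: each EI
    channel n contributes with a weight w_n instead of all channels
    being weighted equally as in Eq. (3)."""
    r_right = np.asarray(r_right, dtype=float)
    r_left = np.asarray(r_left, dtype=float)
    w = np.asarray(weights, dtype=float)
    total = np.sum(w * (r_right + r_left))
    if total == 0:
        raise ValueError("total weighted spike rate must be non-zero")
    return np.sum(w * (r_right - r_left)) / total
```

With uniform weights this reduces exactly to the normalized rate difference of equation (3), so the equal-weighting decision model is a special case of this more general readout.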
All the examples demonstrated here were based on the current "general purpose" parameter settings. However, individual predictions are possible by customizing, in addition to the above-mentioned decision model, the parameters of the other stages (CI processing, ENI, AN, binaural stage) to a listener's clinical profile. For example, all the simulation results presented in this paper and the Supplementary Materials assumed ideal bilateral synchronization and bilaterally symmetrical hearing (same CI fittings, ENIs, AN fibers, and EI neurons) in both ears. To simulate configurations such as a dominant CI, different ENIs and ANFs (e.g., different AN properties or neuron numbers) need to be set for each EI neuron across the tonotopic array, or even across ears. For this purpose, the parameters can easily be adapted to include asymmetric neuron survival, different electrode insertion depths, different spreads of excitation, different CI dynamic ranges, two unsynchronized independent speech processors, and bilaterally mismatched fittings. Consequently, further studies involving, e.g., mismatched insertion depths [85][86][87], bilaterally synchronized N-of-M channel selection [34,[88][89][90][91], and bilaterally synchronized AGCs [46,81,82] are possible within the current framework.
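As an illustration, a bilaterally asymmetric configuration might be expressed along these lines (all parameter names here are hypothetical placeholders, not the framework's actual MATLAB interface):

```python
# Hypothetical bilateral configuration with asymmetric ears. The keys
# are placeholders chosen for illustration; the framework's actual
# parameter names may differ.
config = {
    "left": {
        "insertion_depth_mm": 25.0,   # electrode insertion depth
        "n_surviving_anf": 200,       # AN fibers per EI neuron
        "dynamic_range_db": 40.0,     # electric dynamic range
        "agc": True,
    },
    "right": {
        "insertion_depth_mm": 22.0,   # shallower insertion (mismatch)
        "n_surviving_anf": 120,       # poorer neural survival
        "dynamic_range_db": 30.0,
        "agc": True,
    },
    "bilateral_sync": False,          # two unsynchronized processors
}
```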

Possible applications with further extension
The model framework is designed to be flexible enough to cover a wide range of binaural research topics. Although it was only tested with lateralization and localization experiments in this study (see also Supplementary Materials), by selecting different stage combinations and different decision models, its applications can go beyond the provided experimental examples. By referring to the MATLAB examples provided, users can replace the current functions with customized ones, such as incorporating other HRIRs or BRIRs for complex acoustic environments (e.g., [53,[92][93][94][95][96][97][98]), self-developed AGCs, different coding strategies, different AN models, or different decision models. Each model stage can also be used or extended on its own for research and education purposes. The framework can thus serve as a tool for validating binaural algorithms [99,100], for designing binaural studies with real CI users, or for studying the implications of binaural fitting [87]. Although the current model framework is binaural by default, it can be used for monaural purposes or with only certain stages included. For example, one of the main stages of the current framework, the implemented CI processing, includes most of the technical details of a typical CI speech processor (e.g., AGC, filter banks, coding strategies). Thus, the model can also be used for developing or validating new sound processing strategies (e.g., [52,59,61,90,[101][102][103][104][105][106]).
Since modeling studies are not restricted by clinic appointment timeframes, the model can also be a powerful tool for testing new concepts or experiments before starting clinical tests, where recruiting many participants with the same coding strategies or pre-processing is very time-consuming, expensive, and cumbersome. For example, in pre-clinical studies, certain settings might deviate from parameters typically used by clinical populations (e.g., switching off the AGC or assuming synchronized AGCs, or using single- or multi-electrode direct stimulation instead of the participant's own speech processor), which makes it difficult to predict the performance of CI users in real life with their everyday processor settings. The framework can help to "generalize" such laboratory research findings by exploring the impact of variations in the parameter settings across a range of ecologically relevant acoustic conditions.
In addition to the above-mentioned applications, which focus on CI device-related limitations (e.g., binaural cue distortions) and AN-related issues (e.g., ENI, ANFs, and EI interaction), the framework also allows for third-party model extensions or substitutions for other applications, although this is less straightforward. For example: (1) the demonstrated ENI assumed an unwound cochlea with all AN fibers equally distributed between the most basal (0 mm) and the most apical (35 mm) ends. It is possible, and might be necessary, to incorporate 3D implanted cochlear models (e.g., [25-28, 30, 107-109]) for ENI individualization, to simulate interactions between the implanted cochlear anatomy and the stimulation parameters (e.g., [29,110,111]).
(2) The current AN model does not take into account the fiber morphology and orientation relative to the electric field, nor the first and second spatial derivatives of the field, which are essential for simulating the activation of AN fibers. The present model is insensitive to the number and polarity of pulse phases and to the size of the inter-phase gap. Several studies have investigated the polarity sensitivity of auditory nerve fibers using eCAP recordings in human CI users [112,113]; to model such experiments, other AN models (e.g., [19,20,114]) can be incorporated. (3) Extensions to predict bilateral speech intelligibility are possible by connecting the AN spike outputs to back-end processing stages for other purposes, such as an automatic speech recognition system (e.g., [37,110,111]), or by replacing them with more abstract internal model representations (e.g., [115,116]). (4) CI users' bilateral loudness perception (e.g., [117]) can be predicted by combining, e.g., the AN fiber outputs with a physiologically motivated binaural loudness model (e.g., [118]). (5) The electrical hearing of one side could be replaced with an acoustic hearing periphery (e.g., [40,45,119]), or electrical hearing could be combined with acoustic hearing on the same side [110,120]. In this way, the model framework can also be useful for investigating more complex input systems (e.g., CI users with single-sided deafness or electro-acoustic stimulation) in complex acoustic scenarios.

Summary and conclusion
An open-source computer model framework for simulating the spatial hearing abilities of bilateral CI listeners was developed based on the model of Kelvasa and Dietz [34]. It includes a binaural signal generation stage, a CI processing stage with a wider range of speech-coding strategies, an electrode-neuron interface stage, an auditory nerve model stage, an excitation-inhibition binaural interaction model stage, and a decision stage based on the normalized hemispheric rate differences. The framework can be used to simulate left/right discrimination, lateralization, and localization performance of CI users for single- or multiple-electrode direct stimulation, and for free-field listening experiments. It can be extended or modified to model further experiments. The MATLAB code of the model framework, as well as the code to reproduce the model data and figures, are published as open source on Zenodo [121].

Data Availability Statement
The research data associated with this article are included in the Supplementary materials of this article and are also available on Zenodo [121].

Author contributions
Conceptualization and framework design: HH, MD.

Acknowledgments
The AGC MATLAB code used in the framework was implemented and validated as part of Jakob Dießel's bachelor thesis. The authors thank Stephan D. Ewert for comments on earlier versions of the manuscript.

Supplementary materials
The supplementary material of this article is available at https://acta-acustica.edpsciences.org/10.1051/aacus/2023036/olm.
Supplementary Figure 1: An overview of the four experiments selected for demonstrating and validating the proposed framework. Experiments 1 and 2 simulate localization (with HRIR) and lateralization (without HRIR) experiments in bilateral CI users, respectively. In addition, the effect of the AGC is illustrated in experiments 1a and 1b (with and without AGC). Experiments 3 and 4 simulate the ITD just noticeable difference (JND) or ITD- and ILD-based lateralization, where single-electrode or multiple-electrode pairs were stimulated without a CI processing stage.
Supplementary Figure 2: Waveforms (column 1), spectrograms (column 2), and electrodograms (columns 3-6) of the six acoustic stimuli (250 Hz sinusoidal tones, BB-, LP-, and HP-noise bursts, clicks, and the Chinese word 'shui') at 60 dB SPL. The electrodograms of four coding strategies are shown in columns 3-6: the 12-channel gammatone filter-bank based CIS (column 3), FSx = 4 (column 4), PP (column 5), and the 8-of-22 channel FFT filter-bank based ACE* (column 6). The Matlab function for creating these plots is 'plot_suppfig2.m'.
Supplementary Figure 3: The electrodograms (A, C, within 60 ms, from 60°) and the ILD-azimuth functions (B, D) of the BB stimuli without (AGC = 0) and with AGC (AGC = 1), obtained with the 12-channel FSx = 4 coding strategy. The electrodogram of the right CI is shown in black behind the colored left electrodogram. The broadband ILDs before the CI processing are plotted as black dotted lines, while the ILDs of each individual electrode pair are plotted as colored solid lines. The simulated source level was 70 dB SPL. The corresponding Matlab function for the plots is 'plot_suppfig3'.
Supplementary Figure 4: Examples of the effects of different coding strategies (different colors represent the 12-channel gammatone filter-bank based CIS, 8of22-channel CIS, FSx = 4, and PP, without AGC) on azimuthal localization performance with the six stimulus types (columns 1-6) at 40 dB SPL: the spike pattern along the tonotopic axis for 0° azimuth (upper panels) and response azimuth over target azimuth functions (lower panels). The corresponding Matlab functions for generating the simulation and the plots are 'gen_suppfig4.m' and 'plot_subfig4.m'.
Supplementary Figure 5: Demonstration of the effect of the AGC on the EI outputs (localization) for the six stimuli at different levels with the 12-channel FSx = 4 coding strategy. The first row shows the estimated direction vs. the original direction without AGC, the second row with AGC. The corresponding Matlab functions for generating the simulation and the plots are 'gen_suppfig5.m' and 'plot_subfig5.m'.
Supplementary Figure 6: Example of an estimated psychometric ILD curve on the r_D scale (left y-axis) and the corresponding lateralization scale (right y-axis) after the mapping. The slope, width, and range are indicated by a, x, and range.
Supplementary Figure 7: The spike pattern along the tonotopic axis for ITD = 0 and ILD = 0 (upper panels), and the r_D-ITD (middle panels) and r_D-ILD (bottom panels) functions of the EI outputs for the six stimulus types with a generic 12-channel CIS coding strategy without AGC at six different levels (represented by different colors). The corresponding Matlab functions for generating the simulation and the plots are 'gen_suppfig7.m' and 'plot_subfig7.m'.
Supplementary Figure 8: Effects of different coding strategies (different colors represent the 12-channel gammatone filter-bank based CIS, FSx = 4, PP, and 8of22-channel FFT-based CIS; without AGC) on the ITD and ILD discrimination for the six stimulus types (columns 1-6) at 60 dB SPL: the spike pattern along the tonotopic axis for ITD = 0 and ILD = 0 (upper panels), and the % right response-ITD (middle panels) and % right response-ILD (bottom panels) discrimination functions. The corresponding Matlab functions for generating the simulation and the plots are 'gen_suppfig8.m' and 'plot_subfig8.m'.
Supplementary Figure 9: Stimulation locations used in experiments 3 and 4. The marked red points are located at 5, 10, 15, 20, and 25 mm along the tonotopic axis.
Supplementary Figure 10: The normalized spike rate difference (r_D)-IPD functions (A, 380 µA; B, 600 µA) and the ITD JND-pulse rate functions (C, blue, 380 µA; purple, 600 µA) of single-electrode stimulation at different pulse rates. The corresponding Matlab functions for the simulation and the generation of the plots are 'gen_suppfig10.m' and 'plot_subfig10.m'.
Supplementary Figure 11: The average spike rate of the 35 EI neurons along the tonotopic axis at ITD = 0 and ILD = 0 for 100 pps (A) and 1000 pps (B). On the y-axis, the spike rate is shown in spikes per second (sp/s); the x-axis shows the tonotopic location as well as the corresponding Greenwood frequency. The normalized r_D-ITD functions (C and D, left y-axes) and the corresponding lateralization results (right y-axes). The dotted lines in D are the corresponding results for 1000 pps. The stimulation level is 480 µA. The corresponding Matlab functions for the simulation and generating the plots are 'gen_suppfig11.m' and 'plot_subfig11.m'.
Supplementary Figure 12: The average spike rate of the 35 EI neurons along the tonotopic axis for single-electrode pair stimulation located at 15 mm. Three pulse rates of 100 pps (A), 500 pps (B), and 1000 pps (C) were simulated for unmodulated pulses at eight different input levels with ITD = 0 and ILD = 0. On the y-axis, the spike rate is shown in spikes per second (sp/s), and the x-axis is the location along the basilar membrane. Panel (D) shows the on-site (at 15 mm) EI neuron's rate-level functions for these three pulse rates. The corresponding Matlab functions for the simulation and generating the plots are 'gen_suppfig12.m' and 'plot_subfig12.m'.