ABSTRACT  
 
Title of Dissertation: THE ROLE OF FREQUENCY, TIMING AND LEVEL DISTORTION ON BINAURAL
PROCESSING IN SIMULATIONS OF COCHLEAR IMPLANT USERS WITH SINGLE-SIDED DEAFNESS.

Jessica Marie Wess
Doctor of Philosophy, 2017

Dissertation directed by: Adjunct Professor Joshua G. W. Bernstein
Neuroscience and Cognitive Science Program
University of Maryland – College Park
and
Audiology & Speech Pathology Center
Walter Reed National Military Medical Center
Bethesda, Maryland

Professor Sandra Gordon-Salant
Department of Hearing and Speech Sciences
University of Maryland College Park
 
 
Cochlear implants are a promising new treatment option for single-sided deafness and
have been shown to improve speech perception in noise and aid in sound localization.
However, this intervention does not restore performance to the level of acoustic hearing,
and listeners exhibit large variability in hearing outcomes. These limitations may be
caused by certain distortions inherent in the
processing of the sound signals by the cochlear implant. This dissertation examined the 
role that three key cochlear implant distortions might play in limiting speech perception 
in noise for listeners with single-sided deafness.  The first distortion examined was the 
frequency mismatch between the cochlear implant and the acoustic ear.  The next 
distortion examined was the effect of timing differences between the cochlear implant 
and the normal-hearing ear. Finally, the effect of compression on hearing speech in spatial 
noise was investigated. These limitations and distortions could limit binaural processing 
ability in those with single-sided deafness who receive a cochlear implant. The goal of 
this dissertation was to examine the role of cochlear-implant distortions on binaural 
hearing using simulations of cochlear implant processing presented to normal-hearing 
listeners. Normal-hearing listeners were presented with vocoder simulations of cochlear-
implant processing to one ear, and unprocessed signals to the other ear. These simulations 
were used to examine the ability to understand binaural speech signals in noisy 
environments and to examine auditory object formation in simulated free-field 
environments. These data provided insight into how CI distortions and mapping strategies 
can limit binaural benefits for those with single-sided deafness. Knowledge of these 
limitations could lead to better programming strategies to improve binaural hearing and 
quality of life for those with single-sided deafness who receive a cochlear implant. 
 
 
 
 
  
THE ROLE OF FREQUENCY, TIMING AND LEVEL DISTORTION ON BINAURAL 
PROCESSING IN SIMULATIONS OF COCHLEAR IMPLANT USERS WITH 
SINGLE-SIDED DEAFNESS. 
  
By 
Jessica Marie Wess 
 
Dissertation submitted to the Faculty of the Graduate School of the 
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
2017 
 
 
Advisory Committee: 
 
Joshua Bernstein, Ph.D., Co-Chair 
Sandra Gordon-Salant, Ph.D., Co-Chair 
Douglas Brungart, Ph.D. 
Kenneth Grant, Ph.D. 
Matthew Goupell, Ph.D. 
Jonathan Simon, Ph.D. 
 
 
 
 
  
 
 
 
 
 
                        © Copyright by  
                     Jessica Marie Wess 
                                  2017
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Dedication   
 
 
This dissertation is dedicated to my parents, Rose and John, for their unconditional love,
support and encouragement.
 
 
 
  
 
Acknowledgements 
 
 
First and foremost I’d like to thank my adviser Josh Bernstein. Josh took me 
in as a wayward graduate student, as I was looking for greener scientific pastures. I 
have learned so much in the last four years at Walter Reed and working with such 
an amazing scientist has been a real honor. Josh has been extremely patient with me 
and my colloquial personality. Josh’s door was always open whenever I needed help 
with anything and I’m really grateful for all his guidance, support and 
encouragement.    
  I would like to extend my sincere gratitude to my Co-Adviser Dr. Sandra 
Gordon-Salant, for her patience and the time she has spent helping me become a 
better writer and scientist.  
Many thanks to my dissertation committee members: Dr. Matthew 
Goupell, Dr. Douglas Brungart, Dr. Ken Grant and Dr. Jonathan Simon for their 
time, input and guidance.   
Finally, I would like to thank my husband Paul for all the love and
encouragement, for being my best friend, for providing interesting and helpful
scientific discussions, and for help with the occasional data analysis or plot
generation.
 
This research has been supported by a grant from the Defense Medical Research and 
Development Program (DM130007; PI: Joshua Bernstein).
 
Table of Contents: 
Abstract 
Dedication ........................................................................................................................... ii 
Acknowledgements ............................................................................................................ iii 
Table of Contents ............................................................................................................... iv 
List of Tables ..................................................................................................................... xi 
List of Figures ................................................................................................................... xii 
List of Abbreviations ....................................................................................................... xiv 
 
Chapter 1. Introduction to binaural hearing and cochlear implants for single-sided 
deafness ................................................................................................................................1   
General Introduction ............................................................................................................1 
Dissertation Aims.................................................................................................................3 
Binaural hearing is critical for speech perception in noisy environments .....................8 
The role of binaural fusion and auditory grouping in spatial hearing............................3 
Single-sided deafness and treatment options ...............................................................14 
Possible sources of distortion in CI users with SSD ....................................................24 
Spectral mismatches and their effects on binaural hearing ....................................24 
Temporal disparities between cochlear implants and normal hearing ears ...........29 
Loudness growth, compression and their effects on binaural hearing ...................32 
 
Chapter 2. The effect of interaural mismatches on contralateral unmasking in vocoder 
simulations of single-sided deafness. .................................................................................37 
Introduction ........................................................................................................................37 
Experiment 2.1. The role of spectral mismatches on contralateral unmasking in 
simulations of CI users with SSD.  ....................................................................................45 
Experimental question and hypothesis.........................................................................45 
Methods..............................................................................................................................45 
Participants ...................................................................................................................45 
Approach ......................................................................................................................45 
Stimuli  .........................................................................................................................46 
Procedure .....................................................................................................................48 
Results ................................................................................................................................49 
Summary ............................................................................................................................52 
 
Experiment 2.2. The role of temporal mismatches on contralateral unmasking in 
simulations of CI users with SSD ......................................................................................53 
Experimental question and hypothesis ..........................................................................53 
Methods..............................................................................................................................53 
Participants ...................................................................................................................53 
Stimuli  .........................................................................................................................54 
Procedure .....................................................................................................................55 
Results ................................................................................................................................55 
Summary ............................................................................................................................56 
 
Experiment 2.3. The role of spectral mismatches and vocoder channel resolution on 
contralateral unmasking in simulations of CI users with SSD. .........................................56 
Experimental question and hypothesis.........................................................................56 
Methods..............................................................................................................................57 
Participants ...................................................................................................................57 
Stimuli ..........................................................................................................................57 
Procedure .....................................................................................................................58 
Results ................................................................................................................................60 
Summary ............................................................................................................................63 
 
Experiment 2.4. The role of spectral and temporal mismatches on contralateral 
unmasking in simulations of CI users with SSD ...............................................................63 
Experimental question and hypothesis.........................................................................63 
Methods..............................................................................................................................63 
Participants ...................................................................................................................63 
Stimuli ..........................................................................................................................64 
Procedure .....................................................................................................................64 
Results ................................................................................................................................65 
Summary ............................................................................................................................67 
Discussion ..........................................................................................................................68 
Impacts of a spectral mismatch ....................................................................................69 
Impacts of spectral resolution ......................................................................................71 
Effects of temporal mismatch ......................................................................................73 
Effects of combined spectral and temporal mismatch .................................................75 
Implications for SSD-CI listeners ................................................................................76 
Study Limitations .........................................................................................................78 
Conclusions ........................................................................................................................80 
 
Chapter 3. Effect of compression and expansion on binaural hearing in simulations of CI 
users with SSD ...................................................................................................................82 
Introduction ........................................................................................................................82 
Experiment 3.1. The effect of compression and expansion on squelch in simulations of 
cochlear implants for SSD listeners. ..................................................................................88 
Experimental question .................................................................................................89 
Hypothesis....................................................................................................................89 
Methods..............................................................................................................................90 
Approach ......................................................................................................................91 
Participants ...................................................................................................................93 
Stimuli ..........................................................................................................................94 
Generation of HRTFs ...................................................................................................94 
Noise Vocoding ...........................................................................................................95 
Loudness manipulations...............................................................................................96 
Procedure .....................................................................................................................97 
Results ................................................................................................................................99 
Summary ..........................................................................................................................103 
 
Experiment 3.2. The effect of compression and expansion on head-shadow benefit in 
simulations of cochlear implants for SSD listeners. ........................................................104 
Experimental question ...............................................................................................104 
Hypothesis..................................................................................................................104 
Methods............................................................................................................................107 
Approach ....................................................................................................................107 
Participants .................................................................................................................108 
Stimuli ........................................................................................................................108 
Procedure ...................................................................................................................109 
Results ..............................................................................................................................110 
Discussion ........................................................................................................................115 
The effect of compression and expansion on squelch ...............................................118 
The effect of compression and expansion on head-shadow benefit ..........................120 
Implications for CI listeners.......................................................................................122 
Study Limitations .......................................................................................................122 
Conclusions ......................................................................................................................125 
 
Chapter 4. The role of spectral mismatch on perceived binaural fusion in vocoder 
simulations of cochlear implant listening. ......................................................................128 
Introduction ......................................................................................................................128 
Experiment 4.1. Numerosity judgments of binaural fusion .............................................136 
Study objectives ...............................................................................................................136 
Experimental questions ..............................................................................................136 
Hypothesis..................................................................................................................136 
Methods............................................................................................................................137 
Participants .................................................................................................................137 
Stimuli ........................................................................................................................137 
Procedure ...................................................................................................................137 
Noise Vocoding .........................................................................................................144 
Results ..............................................................................................................................145 
Interim Discussion 4.1 .....................................................................................................150 
 
Experiment 4.2. Discrimination, spectral mismatch and binaural fusion. .......................152 
Study objectives ...............................................................................................................152 
Experimental question ...............................................................................................152 
Hypothesis..................................................................................................................152 
Methods............................................................................................................................152 
Participants .................................................................................................................152 
Stimuli ........................................................................................................................153 
Procedure ...................................................................................................................155 
Apparatus ...................................................................................................................155 
Results ..............................................................................................................................156 
Discussion ........................................................................................................................157 
Impacts of spectral mismatch.....................................................................................159 
Disruption of temporal processing .............................................................................162 
Implications for SSD-CI listeners ..............................................................................163 
Study limitations ........................................................................................................165 
Conclusions ......................................................................................................................168 
 
Chapter 5. Summary of dissertation and general discussion ...........................................170 
General discussion ...........................................................................................................175 
 
References ........................................................................................................................182 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
List of Tables 
 
Table I.  Hypothesis table for effect of compression and expansion on  
contralateral unmasking ..............................................................................88 
Table II.  Statistical post hoc results for squelch experiment ...................................100 
Table III.  Hypothesis table for effect of compression and expansion on  
head-shadow benefit ..................................................................................103 
Table IV.  Statistical post hoc results for head-shadow experiment...........................109 
Table V.  Experimental parameters fusion diotic conditions vs foils for  
Experiment 4.1A .......................................................................................135 
Table VI.  Analysis and synthesis channel allocation table 
standard vs place-matched ........................................................................138 
Table VII.  Experimental parameters monaural and bilateral vocoded and 
unprocessed conditions for experiment 4.1A ............................................139 
  
 
List of Figures 
 
Figure 1.1. Release from informational masking in CI users with SSD ........................20 
Figure 1.2. Performance variability among CI users with SSD .....................................21 
Figure 1.3.  Release from masking in vocoder simulations of CI users with SSD .........24 
 
Figure 2.1. The effect of spectral shift on contralateral unmasking ..............................49 
Figure 2.2. The effect of spectral shift on contralateral unmasking as a function of 
TMR ............................................................................................................50 
Figure 2.3. The effect of temporal shift on contralateral unmasking ............................54 
Figure 2.4. Analysis and synthesis band edges for +4 ERB spectral shift ....................57 
Figure 2.5. The effect of spectral shift and spectral resolution on contralateral 
unmasking ...................................................................................................59 
Figure 2.6. The effect of spectral shift and temporal resolution on contralateral 
unmasking as a function of TMR ................................................................60 
Figure 2.7. The effect of spectral shift and temporal resolution on contralateral 
unmasking ...................................................................................................64 
Figure 2.8. The effect of spectral shift and temporal resolution on contralateral 
unmasking as a function of TMR ................................................................65 
 
Figure 3.1.     Prediction for squelch experiment ...............................................................90 
Figure 3.2. Spatial configuration for HRTF contralateral unmasking experiment ........92 
Figure 3.3. HRTF acquisition schematic .......................................................................92 
Figure 3.4. Compression and Expansion Input/Output function ...................................94 
Figure 3.5. Contralateral unmasking monaural vs linear bilateral .................................96 
Figure 3.6. The effect of compression and expansion on contralateral unmasking .......98 
Figure 3.7. Contralateral unmasking compression parameter as a function of TMR ....99 
Figure 3.8. Prediction for head-shadow experiment ......................................................99 
Figure 3.9. Spatial configuration for HRTF head-shadow benefit experiment ...........104 
Figure 3.10. Head-shadow benefit monaural vs linear bilateral ....................................106 
Figure 3.11. The effect of compression and expansion on head-shadow benefit ..........107 
Figure 3.12. Head-shadow compression parameter as a function of TMR ...................110 
Figure 3.13. Loudness growth in cochlear implants and normal hearing ......................124 
 
 
Figure 4.1. Data from experiment 4.1A. Sets A, B, C and D ......................................140 
Figure 4.2. Data from experiment 4.1B. Numerosity judgments for all vocoded and 
unprocessed conditions ..............................................................................142 
Figure 4.3. Schematic of two example trials from experiment 4.2 ..............................149 
Figure 4.4. Data from experiment 4.1B. Numerosity judgments for all vocoded and 
unprocessed conditions ..............................................................................150 
  
 
List of Abbreviations 
• Alternative forced choice – AFC 
• Analysis of variance – ANOVA 
• Auditory scene analysis – ASA 
• Automatic gain control – AGC  
• Behind the ear – BTE 
• Bilateral cochlear implant – BICI  
• Bone-anchored hearing aids – BAHA  
• Cochlear implant – CI 
• Computerized tomography – CT 
• Continuous interleaved sampling – CIS  
• Contralateral routing of signal (hearing aid) – CROS 
• Coordinate response measure – CRM  
• Decibels – dB 
• Dynamic range – DR 
• Electrical auditory brainstem response – eABR 
• Equivalent rectangular bandwidth – ERB  
• Fast Fourier transform – FFT  
• Graphical user interface – GUI 
• Head-related transfer functions – HRTFs  
• Hearing impaired – HI  
• Hearing loss – HL 
• Hertz – Hz 
• In the ear – ITE 
• Interaural level differences – ILDs 
• Interaural timing differences – ITDs 
• Just noticeable differences – JND 
• Knowles Electronic Manikin for Acoustic Research – KEMAR  
• Normal hearing – NH 
• Root mean squared – RMS  
• Signal to noise ratios – SNRs 
• Single-sided deafness – SSD 
• Spatial release from masking – SRM 
• Sound pressure level – SPL  
• Superior olivary complex – SOC 
• Target-to-masker ratios – TMR
 
Chapter 1: Introduction to binaural hearing and cochlear implants for 
single-sided deafness 
 
 
General Introduction 
 
The human auditory system allows normal hearing (NH) individuals to detect 
speech and other signals embedded in dynamic noisy environments. Having two ears with 
NH (binaural hearing) is immensely important for hearing in these situations. Normal 
human interaction requires being able to hear a particular talker of interest when multiple 
people are talking in the background. For example, the ability to understand and 
communicate with someone across the table in a crowded cafeteria is facilitated by 
binaural hearing. Individuals who have only one functional ear are at a severe disadvantage 
in terms of normal social interaction in such situations. 
Cochlear implants (CIs) are the world’s first widely successful neuroprosthetic 
devices. CIs can restore partial hearing in completely deaf individuals to a point where they 
can carry on verbal conversations normally. Traditionally, CIs have only been 
implanted in profoundly hearing-impaired individuals, but more recent outcomes indicate 
that they may also be useful for individuals with deafness in one ear, referred to as
single-sided deafness (SSD), to partially restore binaural hearing. More specifically, for 
listeners with SSD, CIs have been shown to restore basic spatial hearing functions, 
including facilitation of speech understanding in spatial noise and improved sound 
localization ability. However, CIs do not match the performance of acoustic ears, and there is considerable 
variability in outcomes. Despite the benefits CIs can provide, especially for speech 
understanding in quiet environments, the auditory signals provided by CIs are crude 
relative to a NH ear. Several alterations occur to a sound signal after being transduced by 
a CI that may limit the benefit of CIs for SSD individuals who still have a functional 
acoustic ear. Many of these alterations can be grouped into frequency, timing and level 
distortions. These distortions likely cause very different neural representations of two 
identical acoustic signals presented to a CI and an NH ear.  
The overall goal of this dissertation was to investigate CI distortions and measure 
the effect on listeners’ ability to segregate voices in competing talker environments. 
Knowledge of the impact of these distortions will allow clinicians to make better 
programming and mapping choices for SSD-CI listeners, which could potentially improve 
their binaural hearing outcomes. The experiments used simulated CI speech processing 
(vocoding) presented to NH listeners. This was accomplished by presenting unprocessed 
sound to one ear and vocoded sound (sounds designed to mimic the auditory stimulus 
provided to SSD patients by the CI) to the other ear. The aims of this dissertation 
investigated how vocoder-simulated CI distortions might affect speech perception in 
competing-talker environments when the signals presented to the ears are controlled 
independently (Chapter 2), under more realistic simulated spatial configurations (Chapter 
3) and whether the effects of interaural mismatch can be attributed to the perceived fusion 
of speech signals across the ears (Chapter 4). Exploration of the effect of CI distortions on 
binaural hearing will help determine: (i) which distortions are particularly problematic and 
(ii) possible mapping and programming techniques that could improve binaural hearing for 
SSD-CI users.  
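
As an illustration of the stimulus arrangement described above (unprocessed sound to one ear, vocoded sound to the other), the following is a minimal sketch of a noise vocoder. It is not the processing used in this dissertation: the FFT brick-wall filters, the 50-Hz envelope cutoff, the ear assignment and all function names (`bandpass_fft`, `noise_vocode`, `simulate_ssd_ci`) are assumptions chosen for brevity.

```python
import numpy as np


def bandpass_fft(x, fs, lo_hz, hi_hz):
    """Brick-wall band-pass: zero every FFT bin outside [lo_hz, hi_hz)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs >= hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def noise_vocode(signal, fs, band_edges_hz, env_cutoff_hz=50.0, seed=0):
    """Replace the fine structure in each band with envelope-modulated noise.

    band_edges_hz holds N + 1 ascending edges for N channels (hypothetical
    values; the channel allocation of the actual experiments is not assumed).
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    out = np.zeros_like(signal)
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band = bandpass_fft(signal, fs, lo, hi)
        # Envelope: half-wave rectify, low-pass, then clip ringing below zero.
        env = bandpass_fft(np.maximum(band, 0.0), fs, 0.0, env_cutoff_hz)
        env = np.maximum(env, 0.0)
        # Carrier: white noise restricted to the same band as the analysis filter.
        carrier = bandpass_fft(rng.standard_normal(len(signal)), fs, lo, hi)
        out += env * carrier
    # Match the overall RMS level of the input.
    rms_in = np.sqrt(np.mean(signal ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out


def simulate_ssd_ci(signal, fs, band_edges_hz):
    """Return an (n_samples, 2) array: column 0 unprocessed, column 1 vocoded."""
    signal = np.asarray(signal, dtype=float)
    return np.column_stack([signal, noise_vocode(signal, fs, band_edges_hz)])
```

Calling `simulate_ssd_ci(speech, fs, edges)` with a mono waveform yields a two-channel stimulus in which one channel stands in for the acoustic ear and the other for the simulated CI ear. Which ear receives which channel is a presentation choice left open here.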
 
Dissertation Aims 
 
The overall goal of this dissertation was to examine how distortions associated with 
CI processing could potentially limit hearing speech in multiple concurrent talker 
environments for those with SSD. Specifically, frequency, timing and level CI distortions 
were examined. The technique used to accomplish this goal was to incorporate distortions 
in these three dimensions into vocoder simulations presented to NH listeners.  This 
overarching goal was approached through a series of three specific Aims.  
 
Aim 1 (Chapter 2): Measure the negative impacts of interaural mismatch in frequency
and timing on contralateral unmasking in vocoder simulations of CI listeners with SSD.
 
In this dissertation, contralateral unmasking is operationally defined as the 
improvement in speech perception associated with adding interfering voices to an ear 
contralateral to the target speech. CI users with SSD show contralateral unmasking, but 
there is large variability across users, and they do not appear to benefit as much as NH 
participants listening to a vocoder. The contralateral unmasking metric is a proxy for how 
well listeners are able to combine information across the ears to facilitate hearing speech 
in background noise. Frequency mismatches between place of stimulation of the implant 
array and place of excitation in the NH ear could reduce the ability to benefit from binaural 
cues. Latency dissimilarities between the CI processor and NH ear are also likely to be 
present and could cause disruptions in binaural hearing. Chapter 2 consisted of 4 
experiments examining the effects of spectral and temporal mismatch on speech 
understanding in the presence of interfering talkers using a contralateral unmasking 
paradigm that focused on the processes of binaural integration in a speech task. Experiment 
2.1 examined the effect of a spectral shift on contralateral unmasking and found a strong 
dependence of frequency match on performance in the task (i.e., less spectral shift = more 
contralateral unmasking). Experiment 2.2 examined temporal mismatch and its effect on 
contralateral unmasking, and found that physiologically plausible temporal mismatches did 
not greatly disrupt contralateral unmasking. Experiment 2.3 investigated the interaction 
between the frequency resolution of the vocoder and spectral mismatch and found that 
broader channel vocoding made listeners more immune to spectral shifts. Finally, 
Experiment 2.4 examined the potential interaction between spectral and temporal shifts and 
found that once a mismatch was implemented in the vocoder, the addition of a second 
mismatch (either temporal or spectral) did not further disrupt performance. More 
specifically, instead of finding an additive effect, once a mismatch was present the 
additional mismatch had a negligible effect on performance. Therefore, spectral mismatch 
affects performance more than temporal mismatch for the perceptual separation of a target 
from a masker background.  
 
Aim 2 (Chapter 3): Determine how contralateral unmasking and head-shadow benefit 
can be affected by envelope compression and expansion in HRTF-generated virtual 
auditory environments. 
 
The goal of this chapter was to examine how compression distortions are likely to 
manifest in a simulated free-field environment. The experiments in Chapter 3 examined 
binaural squelch (3.1) and head-shadow (3.2) in a more realistic auditory environment than 
was used in Chapter 2. Specifically, the experiments examined how compression and 
expansion might affect the relative interaural level differences (ILDs) and target-to-masker 
ratios (TMRs) in the two ears and impact speech perception in the presence of interfering 
talkers that are spatially separated from the target talker of interest. Normally, listeners 
have access to spatial cues (interaural timing differences [ITDs] and ILDs) of 
environmental signals arriving at the two ears, which help listeners segregate competing 
talkers and other background noises to aid them in streaming auditory sources of interest. 
The contralateral unmasking paradigm employed in Chapter 2 was an artificial situation 
that would never occur in the free field. In a simulated free field environment, the listener 
has two different TMRs in the two ears and these will likely be distorted by compression. 
In Chapter 3, two experiments examined spatial hearing benefit with spatial cues provided 
via generalized head-related transfer functions (HRTFs) that mimic the effects of head-
shadow and path-length differences that are encountered for signals in the free field. Level 
compression and expansion were implemented in the vocoder to determine what effect 
amplitude manipulation had on speech perception in the presence of spatially separated 
interfering talkers. Envelope compression was found to have a negative effect on both 
squelch and head-shadow benefit. Envelope expansion had little effect on head-shadow 
benefit (Experiment 3.2) but increased binaural squelch, relative to the compression 
conditions (Experiment 3.1). It is likely that in the squelch experiment compression and 
expansion exerted their effects by changing the ILDs between the target and the maskers; 
with the ILD between the target and maskers decreasing with compression and increasing 
with expansion. For the head-shadow experiment, diminished performance after 
compression could have been a result of change in the TMR at the vocoded ear (closest to 
target), which reduced audibility of the target. However, since both expansion and 
compression disrupted performance it is likely that envelope distortion reduced 
intelligibility of the target.  
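
One way to see why compression would be expected to shrink level differences, and expansion to enlarge them, is a simple power-law sketch. It assumes, purely for illustration, that each envelope value is raised to an exponent p (p < 1 compresses, p > 1 expands); the compression scheme actually used in Chapter 3 is not specified in this section, and the function names and values below are hypothetical. Under this assumption, any level difference expressed in dB is multiplied by p. In a real mixture the target and maskers are processed together, so the effect on ILDs and TMRs is less tidy than this two-value example suggests.

```python
import math

def to_db(amplitude):
    """Convert a positive linear amplitude to decibels."""
    return 20.0 * math.log10(amplitude)

def power_law(envelope, p):
    """Hypothetical level manipulation: raise each envelope value to the power p."""
    return [e ** p for e in envelope]

def level_difference_db(a, b):
    """Level difference in dB between two positive envelope values."""
    return to_db(a) - to_db(b)

# Two envelope values that start 20 dB apart (illustrative numbers only).
loud, soft = 1.0, 0.1

compressed = power_law([loud, soft], 0.5)  # p < 1: compression
expanded = power_law([loud, soft], 2.0)    # p > 1: expansion

diff_original = level_difference_db(loud, soft)
diff_compressed = level_difference_db(compressed[0], compressed[1])
diff_expanded = level_difference_db(expanded[0], expanded[1])

print(diff_original, diff_compressed, diff_expanded)
```

With these values the 20 dB difference is halved under compression and doubled under expansion, which matches the direction of the ILD changes proposed above.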
 
Aim 3 (Chapter 4): Elucidate a possible fusion mechanism for the contralateral 
unmasking (squelch) effect in Chapters 2 and 3. More specifically, to develop and test 
a paradigm to measure binaural fusion in the presence of a spectral mismatch. 
 
Chapters 2 and 3 described experiments in which the addition of the vocoder provided 
contralateral unmasking, but this benefit was eliminated or largely diminished after CI 
distortions were implemented in the vocoder. The loss of binaural squelch and contralateral 
unmasking after vocoder distortion could be explained by a loss of binaural fusion ability. 
To more directly test this hypothesis, the experiments in Chapter 4 aimed to measure 
binaural fusion ability with and without a spectral mismatch. The spectral mismatch 
distortion was tested in these experiments, because that distortion profoundly disrupted 
contralateral unmasking in Chapter 2, compared to the other distortions tested in this 
dissertation. In Chapter 4, spectral mismatch was implemented not by linearly shifting the 
vocoder channels (as was done in Chapter 2), but rather by utilizing a more realistic 
mismatch based on published radiographic data averaged across CI users. The 
experimental approach had listeners identify the number of voices in the environment, 
instead of relying on intelligibility. Binaural fusion was tested in two ways. The first 
experiment (4.1) involved listeners counting the number of voices they heard in a complex 
mixture. If an unprocessed and a vocoded version of the same voice presented to opposite 
ears were fused, the listener should report one voice; if not, they should report hearing two. 
This should occur even when the voice to be fused was accompanied by other voices in the 
mixture. Experiment 4.2 was a two-alternative forced choice (2AFC) task, in which the 
listeners had to discriminate a fusion interval from a non-fusion interval. In this experiment, 
the “fused” interval had the same voice in the two ears (one vocoded, one unprocessed), 
and the non-fused interval had two different voices in the two ears. If listeners were able 
to perceptually fuse the two voices in the first interval, they should have been able to more 
easily tell the difference between the mixture containing the same voice in the two ears and 
the mixture containing no common voices in the two ears. Experiment 4.1 found that 
people reported a number of voices that indicated they were not fusing the fusion stimulus, 
regardless of the vocoder condition. In Experiment 4.2, when binaural fusion was assessed 
via a discrimination test, listeners were generally more likely to select the correct fusion 
interval with a place-matched vocoder mapping than with a mismatched mapping. Taken 
together, these results suggested that the listeners were achieving incomplete fusion. For 
spectrally matched stimuli, the speech stimuli were sufficiently fused between the two ears 
and this was enough to detect that there was a common voice presented to the two ears 
(Experiment 4.2). Yet, the stimuli were unfused enough that listeners still reported a 
diotically presented voice as two voices when they were asked to count the number of 
talkers they heard in the mixture. 
 
The remainder of this chapter will review literature relevant to the main question 
raised by this dissertation: How do simulated CI distortions affect listeners’ ability to 
segregate voices in competing talker environments? First, the role of binaural hearing in 
speech perception in noisy environments is discussed. Second, the concepts of binaural 
fusion and auditory grouping are introduced. Third, an overview of SSD and its treatment 
options is presented. Fourth, possible sources of distortion in CI processing that can affect 
binaural hearing are described.  
 
Binaural hearing is critical for improving speech perception in noisy environments 
 
 Binaural hearing provides a number of benefits for listening in complex acoustic 
environments. Two of the most important benefits are the ability to localize sounds and the 
ability to understand speech in noise. The phenomenon of being able to successfully focus 
attention on a particular stimulus or talker while filtering out or ignoring competing talkers 
has been referred to as “the cocktail-party effect” (Cherry, 1953; Bronkhorst, 2000). 
Binaural hearing is critical for successful hearing in these environments (Hawley, Litovsky, 
& Culling, 2004). Having two ears allows for computations of spatial cues to perceptually 
separate sound sources based on their different locations.  
Perceiving a talker of interest in multiple talker environments is difficult due to 
auditory masking. Auditory masking occurs when the presence of one sound interferes with 
the perception of another. There are multiple types of masking (Gelfand, 2004), which are 
generally divided into two categories: energetic and informational (Kidd, Mason, & 
Deliwala, 1994; Leek, Brown, & Dorman, 1991; Watson, 2005). Energetic masking can 
occur when the masking energy renders the signal inaudible, which tends to occur when 
there is a high degree of spectral and temporal overlap between targets and maskers. 
Energetic masking occurs from sound-wave interference in the cochlea. For NH listeners, 
spatially separating a target from noise results in reduced energetic masking of the target. 
Up to a 10 decibel (dB) benefit in speech-reception thresholds (i.e., binaural unmasking) 
can occur in energetic masking situations when the target is spatially separated from the 
maskers (Best, Thompson, Mason, & Kidd, 2013; Bronkhorst, 2000). Release from 
energetic masking is thought to occur via a combination of two mechanisms: (i) the head-
shadow effect, utilized predominantly for higher sound frequencies (van Hoesel, 2012) and 
(ii) binaural squelch, which requires neural computation of interaural difference cues. The 
head-shadow effect results in one ear having a better signal-to-noise ratio (SNR) than the 
other ear when the source of interest and the masking sounds are spatially separated. The 
target-to-masker ratio (TMR) is analogous to the SNR, with the target talker treated as the 
signal and the masking talkers treated as the noise.  Change in speech reception thresholds 
due to the head-shadow effect is approximately 6 dB in the speech frequency range 
(500-2000 Hertz (Hz)) and up to 15 dB at higher frequencies (Schleich, Nopp, & D’Haese, 
2004a). The head-shadow benefit therefore involves attending to the ear with the more 
favorable SNR for the target signal of interest.  
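
As a rough illustration of the better-ear idea described above, the following sketch computes a TMR in dB at each ear from root-mean-square (RMS) levels and selects the ear with the more favorable ratio. The function names, the toy waveforms and the attenuation factor of one half (about 6 dB) are hypothetical choices made for this example and are not taken from any of the studies cited here.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tmr_db(target, masker):
    """Target-to-masker ratio in dB, computed from RMS amplitudes."""
    return 20.0 * math.log10(rms(target) / rms(masker))

def better_ear(tmr_left, tmr_right):
    """Name the ear with the higher TMR (ties go to the left ear)."""
    return "left" if tmr_left >= tmr_right else "right"

# Hypothetical scene: target on the left, masker on the right.
# The head is assumed to halve the amplitude of each source at the far ear.
target_left = [1.0, -1.0, 1.0, -1.0]
target_right = [0.5 * s for s in target_left]
masker_right = [1.0, 1.0, -1.0, -1.0]
masker_left = [0.5 * s for s in masker_right]

left_tmr = tmr_db(target_left, masker_left)
right_tmr = tmr_db(target_right, masker_right)

print(round(left_tmr, 2), round(right_tmr, 2), better_ear(left_tmr, right_tmr))
```

In this toy configuration the left ear has a TMR of roughly +6 dB and the right ear roughly -6 dB, so a listener using a better-ear strategy would attend to the left ear.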
Binaural squelch requires neural computations of interaural cues to facilitate 
hearing in spatial noise. The shape and size of the human head creates ITDs and ILDs.  
ITDs are created by a sound originating from a specific location having differential arrival 
times at each ear, because it has to take a longer path to reach the far ear. ILDs are created 
by the intensity difference that occurs when sounds are attenuated in one ear relative to the 
other (due to the head shadow, for example). The auditory system takes advantage of these 
interaural differences to improve signal detection in noise. According to the equalization-
cancellation model of masking release, the binaural system can reduce the impact of 
masking noise by carrying out neural computations that effectively attenuate and delay the 
entire signal in one ear relative to the other ear (equalization). By subtracting the resulting 
signals between the ears (cancellation), the binaural system can effectively reduce the 
amount of masking experienced by the listener (Durlach, 1963). An alternative theory for 
masking release involves “glimpse listening,” which requires that the listener take 
advantage of dips in the background noise in order to better detect the target signal of 
interest. This is thought to occur by providing the brain with the “lost” signal components 
in each individual ear and then integrating this information from each ear (Cooke, 2006). 
In situations involving multiple competing talkers, binaural squelch can also be thought of 
in terms of added listening advantage obtained by perceived spatial separation between a 
target and masker. Binaural squelch can of course arise from actual spatial separation but 
it is important to note that perceived spatial separation is often sufficient (Freyman, Helfer, 
McCall, & Clifton, 1999) at least in situations with multiple simultaneous talkers that are 
difficult to perceptually separate based on monaural cues alone. 
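
The equalization and cancellation steps described earlier in this section can be sketched in a few lines. This is a minimal, noise-free illustration and not Durlach's full model, which also includes internal processing errors. It assumes that the masker reaches the right ear as a delayed and attenuated copy of the masker at the left ear, that the target is identical at the two ears, and that the delay and gain of the masker are known exactly. All function names and signal values are hypothetical.

```python
def delay(signal, d):
    """Delay a signal by d samples, zero-padding the start and keeping its length."""
    return [0.0] * d + signal[:len(signal) - d]

def scale(signal, g):
    """Multiply every sample by the gain g."""
    return [g * s for s in signal]

def add(a, b):
    """Sample-by-sample sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    """Sample-by-sample difference a - b of two equal-length signals."""
    return [x - y for x, y in zip(a, b)]

def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)

def ec_cancel(left, right, d, g):
    """Equalize the left-ear signal to the masker at the right ear, then subtract."""
    equalized = scale(delay(left, d), g)   # equalization: delay and attenuate
    return subtract(right, equalized)      # cancellation: subtract across ears

# Assumed interaural delay (samples) and gain of the masker at the right ear.
d, g = 2, 0.5
masker = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0, 1.0, 0.0]
target = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]

masker_right = scale(delay(masker, d), g)
left_ear = add(target, masker)
right_ear = add(target, masker_right)

# Pass the components through separately to see what survives cancellation.
masker_residual = ec_cancel(masker, masker_right, d, g)
target_residual = ec_cancel(target, target, d, g)
mixture_output = ec_cancel(left_ear, right_ear, d, g)

print(energy(masker_residual), energy(target_residual))
```

Because the masker at the two ears differs only by the assumed delay and gain, it cancels completely, whereas the target (which has a different interaural relationship) leaves a residual. The output for the mixture is therefore a filtered version of the target alone.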
The other category of masking that complicates speech perception in noisy 
environments is informational masking. Informational masking occurs due to a difficulty 
in identifying an audible signal that is accompanied by other similar sounding signals (Leek 
et al., 1991). An example of informational masking is the difficulty encountered when 
trying to listen to one talker in the midst of multiple competing talkers, all of which are 
audible to the listener. The difficulty is even greater when the talkers are the 
same gender (Brungart, 2001). The problem is likely a failure of auditory stream 
segregation or auditory scene analysis (ASA). ASA is the process by which the auditory 
system can separate and segregate sounds coming from different sources and locations 
(Bregman, 1994). For speech stimuli, a failure of stream segregation or informational 
masking occurs most often when targets and maskers are perceptually and semantically 
similar (Ihlefeld & Shinn-Cunningham, 2008). Informational masking is much more likely 
to occur when targets and maskers sound alike, for example talkers with comparable voice 
pitch. In this situation, fewer cues are available to either stream the target and masker 
speech apart or to identify which of the words spoken belong to the target and which belong 
to the masker talker. Confusability can also be encountered by trying to follow a string of 
words spoken by a target talker while a masker talker is also reciting a string of words 
concurrently.  It has been proposed that the two main mechanisms driving informational 
masking are stimulus uncertainty and target-masker similarity (Durlach et al., 2003). 
Informational masking can also occur with non-speech stimuli such as a task involving a 
complex mixture of tonal stimuli (Kidd, Mason, & Arbogast, 2002).  
For speech stimuli, a number of cues are known to aid listeners in achieving release 
from informational masking, including voice pitch, relative onset timing of different 
speakers, and spatial separation between targets and maskers. Differences in voice pitch 
between talkers can aid in streaming targets of interest from a background of competing 
talkers. Onset timing differences can aid in release from masking due to the strong 
influence of timing on stream segregation (Darwin & Hukin, 1998; de Cheveigné, 
McAdams, & Marin, 1997). It is known that spatial cues can also provide a great deal of 
release from informational masking (Arbogast, Mason, & Kidd, 2002; Freyman, Helfer, & 
Balakrishnan, 2005; Hall, Buss, & Grose, 2005; Kidd, Mason, Rohtla, & Deliwala, 1998). 
In particular, the two primary binaural cues—ILDs and ITDs—are theorized to play a role 
in contributing to binaural unmasking (Hawley et al., 2004; Kidd, Mason, Best, & Marrone, 
2010; Middlebrooks & Green, 1991), but the extent to which each cue is involved is still a 
matter of debate.  
Binaural localization cues have been shown to facilitate binaural unmasking. In 
cases of high informational masking, a squelch or unmasking benefit arises from ILD and 
ITD cues that allow the listener to perceive the target and maskers as arriving from different 
points in space (Freyman, Balakrishnan, & Helfer, 2001). ITDs are generally only useful 
for frequencies below 1500 Hz (except for envelope ITDs for modulated stimuli, which 
can be relayed at higher carrier frequencies), and ILDs are only useful for frequencies 
above 1500 Hz (Middlebrooks, Macpherson, & Onsan, 2000; Rayleigh, 1907; Wightman 
& Kistler, 1992). High- or low-pass filtering of speech allows for the “removal” of usable 
ILD or ITD information in a signal. Several studies have examined the role of either ILDs 
or ITDs in binaural unmasking using young listeners with NH (Hawley et al., 2004; Ihlefeld 
& Shinn-Cunningham, 2008; Kidd et al., 2010). Some research points to the dominance of 
ITDs as the cue most necessary for binaural unmasking (Hawley et al., 2004). Other 
research has found that either cue can provide sufficient binaural unmasking when 
controlling for head-shadow benefits (Kidd et al., 2010). Gallun et al. (2005) assessed the 
role of ILDs and ITDs separately in NH listeners. They employed a word-identification 
task using the coordinate response measure (CRM) corpus (Bolia, Nelson, Ericson, & 
Simpson, 2000), which has been shown to produce a great deal of informational masking 
(Brungart, 2001). Gallun et al. (2005) presented the target monaurally and the maskers 
diotically (identical signals presented to each ear) to create a perceived spatial difference 
between the locations of the target and masker signals. They systematically varied the ILD 
and ITD components in the masker signal to examine the role of each cue in release from 
informational masking. They found substantial release from masking with ITDs or ILDs 
alone. More importantly, they found that ILDs played a role in masking release when TMRs 
were held constant at the “better-ear.” Therefore, this masking release cannot be explained 
by better-ear listening alone.  
 
The role of binaural fusion and auditory grouping in spatial hearing 
 
An important prerequisite to the ability of the binaural system to facilitate the 
perceptual separation of concurrent voices is that the listener must be able to perceptually 
fuse the coherent auditory information arriving at the two ears. This is referred to as 
binaural fusion and it allows NH listeners to perceive diotic sounds as a single centered 
sound.  
Binaural fusion is believed to occur in the mammalian superior olivary complex 
(SOC) in the brainstem, with the higher-order auditory areas receiving a more complete 
and summed auditory object after subcortical processing (Moore, 2000). It is believed that 
coincidence detectors and/or interaural cross-correlation give rise to fused perception of 
binaural signals (Roberts, Seeman, & Golding, 2013). Shinn, Baran, Moncrieff and Musiek 
(2005) tested NH listeners on a variety of dichotic speech tasks and found that binaural 
fusion was less likely to be affected by memory or the listener’s attention than were other 
speech tasks. They therefore concluded that binaural fusion likely occurs below a listener’s 
conscious control at the subcortical level. Even when asked to switch focus to one ear or 
another, listeners still reported one fused stimulus, indicating they did not have conscious 
“control” over the percept. Further evidence of fusion occurring in the brainstem comes 
from electrophysiological experiments. With presentation of matched interaural input, 
large binaural-difference response amplitudes can be measured at the level of the 
brainstem. This binaural-difference response has been measured in humans, for NH, 
hearing-impaired (HI), and CI listeners as well as for animals (Cai et al., 2015; Goksoy, 
Demirtas, Yagcioglu, & Ungan, 2005; Pelizzone, Kasper, & Montandon, 1990; Riedel & 
Kollmeier, 2002). Additionally, the binaural-interaction component (i.e., the difference 
waveform between the summed monaural response and the binaural response) has been 
linked to perceptual fusion ability (Zhou & Durrant, 2003). Therefore, proper integration 
of binaural stimuli at the level of the brainstem is paramount to successful binaural fusion.  
 
Single-sided deafness and treatment options  
 
Due to the immense importance of binaural hearing for communication in noisy 
environments, individuals with only one functional ear are at a severe disadvantage. SSD—
the profound loss of hearing in one ear while the other ear remains normal-hearing or near-
normal hearing—is a form of hearing loss with functional limitations that has been 
traditionally underappreciated. It is estimated that there are nearly 60,000 new cases of 
SSD a year in the US (Baguley et al., 2009; Carlyon et al., 2010; Sinopoli, 2003). SSD is 
now known to cause many problems for those afflicted with it. Some common complaints 
include social isolation, driving difficulties, problems working, embarrassment and loss of 
confidence (McKinney, 2002). Traditionally, SSD was not treated because it was not 
considered incapacitating (i.e., individuals with SSD still have a normal-hearing ear). 
However, a number of studies have demonstrated that SSD is, in fact, a substantial 
disability. For example, individuals with SSD exhibit reduced language comprehension as 
well as reduced oral communication abilities (Lieu, Tye-Murray, Karzon, & Piccirillo, 
2010). Additionally, for children with SSD, learning and academic challenges have been 
widely reported, with these children being 10 times more likely to be held back in school 
(Bess & Tharpe, 1984; English & Church, 1999). Until recently, the only treatments 
available for SSD involved hearing-aid solutions that routed signals from a microphone 
near the deaf ear to the NH ear.  The two most common solutions are bone-anchored 
hearing aids (BAHAs) and contralateral routing of signal (CROS) hearing aids. BAHAs 
are surgically implanted into the bone just behind the deaf ear, and transmit sound to the 
opposite (functional) cochlea through the skull via bone conduction. CROS hearing 
aids are removable devices that contain a microphone on the deaf side of the head and 
transmit sound to a receiver worn on the functional ear. These methods have been 
successful in alleviating some of the adverse effects of SSD mainly by giving access to 
signals presented toward the deaf side. However, these devices can impair performance in 
cases where the unwanted noise is on the deaf side. This occurs because in these cases, the 
device transmits noise to the normal hearing ear, thereby offsetting the head-shadow 
advantage that is otherwise present (Arndt et al., 2010).  Moreover, these treatments do not 
restore binaural hearing, and as a result, these patients still experience difficulty with sound 
localization and speech understanding in noise (Grantham et al., 2012; Linstrom, 
Silverman, & Yu, 2009). 
In the past several years, CIs have been considered as a possible new treatment 
option for SSD. Although CIs are not currently approved by the United States Food and 
Drug Administration for SSD patients, criteria for implant candidacy at individual centers 
and hospitals have been relaxed in the last few years and a substantial number of individuals with 
SSD in the U.S. and in Europe have received CIs. CIs are the world’s first widely successful 
neuroprosthetic devices. They are implanted in individuals with severe or profound hearing 
loss, allowing restoration of basic levels of hearing and speech understanding. Over a 
quarter of a million people have been implanted with CIs worldwide and that number is 
steadily rising (NIH Report, 2013). A CI consists of an external microphone, a speech 
processor, a transmitter, a receiver, a stimulator and an electrode array. A behind-the-ear 
microphone picks up sounds from the environment and the speech processor then filters 
the signal into a number of frequency bands (depending on the number of electrode 
channels) and extracts information about the signal envelopes (i.e., slow fluctuations in the 
range of 2-50 Hz) in each band. The receiver and stimulator then convert the signal 
envelopes into a series of “signal-shaped” pulse trains that activate the electrodes of the 
implant array. The activated electrodes in turn directly stimulate the neurons of the auditory 
nerve. This provides the brain with a signal that captures important features of the original 
signal in the environment. This method of delivering sound to the brain, referred to as 
electric hearing, lacks the temporal and spectral resolution of sounds that are received by 
individuals with a normal auditory system (i.e., acoustic hearing) (Rubinstein & Miller, 
1999). However, the impoverished signals of a CI are still able to relay enough information 
for high intelligibility of speech for many individuals (O’Donoghue, Nikolopoulos, & 
Archbold, 2000).  
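
The envelope-extraction step described above can be sketched for a single frequency band. The example below assumes one common approach, full-wave rectification followed by a simple one-pole low-pass filter with a 50 Hz cutoff (the upper end of the envelope range mentioned above); it is not intended to represent the method used by any particular CI processor. The function name, sampling rate and test signal are hypothetical.

```python
import math

def extract_envelope(band_signal, sample_rate, cutoff_hz=50.0):
    """Full-wave rectify a band-limited signal, then smooth it with a
    one-pole low-pass filter to keep only the slow amplitude fluctuations."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope = []
    state = 0.0
    for sample in band_signal:
        state += alpha * (abs(sample) - state)  # smooth the rectified signal
        envelope.append(state)
    return envelope

sample_rate = 16000   # samples per second (assumed for this example)
num_samples = sample_rate  # one second of signal

# Test signal: a 1000 Hz carrier whose amplitude is modulated at 4 Hz.
signal = [
    (1.0 + 0.8 * math.sin(2.0 * math.pi * 4.0 * n / sample_rate))
    * math.sin(2.0 * math.pi * 1000.0 * n / sample_rate)
    for n in range(num_samples)
]

envelope = extract_envelope(signal, sample_rate)

# The modulator peaks near sample 1000 and reaches its minimum near sample 3000.
print(envelope[1000] > envelope[3000])
```

The extracted envelope follows the slow 4 Hz modulation while discarding the 1000 Hz fine structure, which is the kind of information loss that distinguishes electric hearing from acoustic hearing.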
Although CIs have been widely used as a treatment for the profoundly deaf, the 
first use of CIs in individuals with SSD was intended as a treatment for debilitating tinnitus 
in the deaf ear (van de Heyning et al., 2008). CIs proved to be successful in alleviating 
tinnitus for many patients and also had an encouraging secondary benefit: improved sound 
localization and hearing in noisy competing talker environments (Vermeire & van de 
Heyning, 2009). CIs for SSD allow for the use of two separate auditory signals (one in the 
implanted ear, one in the NH ear). This is in contrast to BAHAs and CROS hearing aids, 
which route the signals at the deaf ear to the one working ear. The availability of two 
distinct auditory inputs afforded with a single CI in one ear and acoustic hearing in the NH 
ear offers the potential for binaural hearing advantages among those with SSD.  
Unfortunately, there exist several reasons that the same cues from binaural hearing 
utilized by NH listeners for speech perception (as previously discussed) may not be as 
effective for SSD-CI users. First, with regard to binaural squelch, CI users do not have 
access to fine structure ITDs so they would need to rely mainly on ILD information to 
receive contralateral unmasking (Loizou, 2006). Therefore, CI listeners must rely on 
accurate ILD cues for spatial hearing. Second, with regard to binaural fusion, “fused” 
perception is likely to be impaired due to the presence of potential distortions (discussed 
below) in electric hearing that are encountered by SSD-CI listeners. Because spatial release 
from informational masking depends on the listener perceiving the target and masking 
speech as coming from different spatial locations (Freyman et al., 2001), listeners would 
likely not get a squelch benefit if they were unable to integrate signals across the ears to 
create a single perceptual object. The prediction, therefore, is that SSD-CI listeners would 
receive less spatial release from masking (SRM) than NH listeners because diotic signals 
are less likely to be perceived 
with a fused image. Third, CI processing comes with a severe loss of pitch information 
making pitch cues much less effective for release from informational masking (Freyman, 
Balakrishnan, & Helfer, 2008).  
Fortunately, there also exists compelling evidence that SSD-CIs can aid in spatial 
hearing. This comes from studies that have examined performance in localizing a sound 
source (Arndt et al., 2010; Firszt et al., 2012; Hansen et al., 2013) and from studies that 
have assessed the advantages for listening to speech in noise when there is a spatial 
separation between the two (Bernstein, Schuchman, & Rivera, 2017; Buechner et al., 2010; 
Firszt et al., 2012; Hansen et al., 2013). CIs primarily improve speech perception for 
listeners with SSD in configurations where the signal is on the deaf side, and/or the masker 
is on the NH side. This pattern of conditions for which a benefit is observed is consistent 
with the idea that the CI allows users to take advantage of head-shadow effects and a better-
ear listening strategy (Arndt et al., 2010; Bernstein et al., 2017; Buechner et al., 2010; Firszt 
et al., 2012; Hansen et al., 2013). Having two ears allows the listener to take advantage of 
listening to the ear with the better SNR, regardless of which side of the head receives the 
better SNR (Schleich et al., 2004a). The actual benefit that SSD listeners receive is smaller 
than in NH individuals, on the order of 2-5 dB. This is probably because the CI signal is 
distorted relative to that received by the NH ear, which appears to reduce the normal head-
shadow advantage. 
While previous studies suggest CIs can provide a head-shadow benefit, to date there 
is little evidence that a CI can provide people with SSD with other speech-in-noise benefits 
associated with binaural hearing, namely, binaural squelch. However, the results of a pair 
of recent studies suggest that SSD-CI listeners may experience a binaural-squelch benefit 
for speech understanding in certain situations. Bernstein, Goupell, Schuchman, Rivera, and 
Brungart (2016) investigated whether a CI could provide benefits to speech perception in 
complex auditory scenes beyond those provided by the head-shadow (better-ear) 
advantage.  They employed a paradigm that eliminated the head-shadow advantage in order 
to investigate whether listeners could combine information across the two ears to improve 
speech reception performance via a binaural benefit. This was accomplished by using 
headphones to present the target talker and two interfering maskers to the one acoustic ear.  
They then investigated the impact on performance of also presenting the same interfering 
masker signals to the opposite ear via direct connection to the CI. Putatively, for NH 
listeners presented with signals over headphones in this manner, this results in the 
perception that the maskers are speaking to them from the center of the head, while the 
target is speaking to them from the side, thereby providing a spatial cue to perceptually 
separate the target signal from the maskers (Bernstein et al., 2016). Figure 1.1 shows the 
results for the SSD-CI listeners in this study. SSD-CI listeners received a binaural benefit 
in conditions involving competing talkers of the same gender as the target talker. The 
interpretation of this result is that the spatial information provided by the CI helped 
listeners to perceptually separate the competing talkers in conditions where the target and 
maskers were easily confused with each other (informational masking). These implant 
users presumably capitalized on differences in the combined target and masker signals in 
the two ears, allowing for improved perceptual segregation of multiple competing voices.  
The opposite-gender masker conditions did not result in significant binaural unmasking, 
presumably because these were situations with less informational masking, thus the target 
and interfering speech could be segregated via monaural cues (Figure 1.1; Bernstein et al., 
2016). Because the target signal was not presented to the second ear, there was no better-
ear advantage provided at the CI ear using this paradigm. These results show that the 
contralateral unmasking paradigm is an effective way to study the role of binaural squelch 
(integration of information across the ears via spatial cues) for the release from 
informational masking. Bernstein et al. (2017) found a similar result when testing SSD-CI 
users in the free field with the target in front and symmetric maskers on either side, so there 
was no long-term head-shadow advantage available to the listeners.  As in the Bernstein et 
al. (2016) study, listeners showed a benefit from the implant with same-gender interferers 
but not with speech-shaped noise or opposite-gender interferers. 
 
 
Figure 1.1. Significant improvements in performance were measured in the one- and two-masker 
same-gender conditions. Therefore, binaural unmasking in this population seems to occur mainly in 
situations with high informational masking (adapted from Bernstein et al., 2016). 
 
 
  
Figure 1.2. Large amounts of inter-subject variability are seen among CI users with SSD. The best 
CI listener is at the level of the mean of the vocoder data in all speech masker situations 
(adapted from Bernstein et al. 2016). 
 
Despite the advantages of CIs for SSD that have been observed, there are several 
indications that these listeners are not receiving the maximum benefit possible from their 
device. First, there is a large degree of inter-subject variability in the amount of masking 
release each patient receives with their CI (Figure 1.2; Bernstein et al., 2016). 
Second, vocoder simulations of cochlear implantation for SSD presented unilaterally to 
NH listeners show more masking release than is observed for actual CI patients. The CI 
listener who had the best performance in this task was just about as good as an average NH 
listener who was listening to vocoder simulations (discussed in following paragraph). 
These results suggest that the SSD-CI listeners were not performing the task optimally. 
SSD-CI users could potentially receive a larger binaural benefit from their implant with 
performance more closely matching that of the NH vocoder listeners.  
 Vocoded speech presented to NH listeners is often utilized to manipulate 
experimental parameters involving CI processing without all of the variability inherent in 
actual CI users; it is a common simulation technique used in CI research (Loizou, 2006). 
Vocoding performs some of the same signal-processing steps that are carried out in a CI 
processor, including allocating the original signal into separate channels (analysis filters) 
within the audible speech range of 100 Hz to 10,000 Hz, and then extracting the envelopes 
from the resulting signals. These envelopes are then used to modulate an acoustic carrier 
signal instead of electrical pulse trains in CIs. Vocoding also permits manipulation of 
variables relating to CI processing while avoiding common confounds in CI data, for 
example, duration of CI user deafness, and differences in coding strategies and electrode 
configurations across different CI manufacturers. Vocoding allows for the independent 
manipulation of certain distortions inherent in CI processing, which allows for more careful 
study of each distortion.   
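The vocoder steps described above (band-splitting with analysis filters, envelope extraction, and modulation of an acoustic carrier) can be sketched as follows. This is only a minimal illustration and not the processing used in the cited studies: the logarithmic channel spacing, the second-order band-pass filters, the 50 Hz envelope smoothing, the sine-tone carriers, and all function names are assumptions chosen for brevity.

```python
import math

def channel_edges(n_channels, f_lo=100.0, f_hi=10000.0):
    """Band edges from f_lo to f_hi; logarithmic spacing is an assumption."""
    ratio = f_hi / f_lo
    return [f_lo * ratio ** (i / n_channels) for i in range(n_channels + 1)]

def bandpass(x, fs, f1, f2):
    """Analysis filter: second-order band-pass biquad (0 dB peak gain)."""
    f0 = math.sqrt(f1 * f2)              # geometric center frequency
    q = f0 / (f2 - f1)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0     # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def envelope(x, fs, cutoff_hz=50.0):
    """Envelope: full-wave rectification followed by a one-pole low-pass."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for xn in x:
        state = a * state + (1.0 - a) * abs(xn)
        env.append(state)
    return env

def vocode(x, fs, n_channels=8):
    """Sum of sine carriers, each modulated by one channel envelope."""
    edges = channel_edges(n_channels)
    out = [0.0] * len(x)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(x, fs, f1, f2), fs)
        fc = math.sqrt(f1 * f2)          # carrier at the channel center
        for n, e in enumerate(env):
            out[n] += e * math.sin(2.0 * math.pi * fc * n / fs)
    return out
```

Calling `vocode(signal, fs)` on a list of samples returns a list of the same length. In a sketch like this, an interaural frequency mismatch could be simulated by shifting the carrier frequencies relative to the analysis bands, which is one reason vocoders are convenient for manipulating a single distortion at a time.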
Although useful, vocoder processing is an imperfect estimation of what CI users 
might hear (Freyman et al., 2008; Ihlefeld & Litovsky, 2012; Li & Loizou, 2009). Vocoder 
simulations can lack certain aspects that are characteristic of CI processing, such as spectral 
smearing, because electrical current spread is difficult to represent acoustically. Also, 
different coding strategies, such as continuous interleaved sampling (CIS), are challenging 
to reproduce using a vocoder. CIS requires that the pulses sent to an electrode array are 
presented in non-overlapping sequences. This technique is difficult to mimic in a 
simulation. Nevertheless, vocoder simulations have been an invaluable tool for studying 
CI processing and perception. Bernstein, Iyer and Brungart (2015) and Bernstein et al. 
(2016) examined binaural unmasking using vocoder simulations of CI users with SSD and 
the same competing talker task described above (see Figure 1.1). For the vocoder listeners, 
masking release was observed for all multi-talker conditions (Figure 1.3; Bernstein et al., 
2016). These results are in contrast to the results from the actual CI users, because these 
vocoder-simulation studies show contralateral unmasking for all background masker 
conditions, not just for same-gender interferers. While the specific reasons for the 
variability in performance between the CI listeners and the NH listeners who are presented 
with vocoded speech are unknown, there are a few possible explanations. One explanation 
is variation in intrinsic characteristics of individual CI listeners, which cannot easily be 
addressed through signal processing simulations. These include current spread (van 
Hoesel, 2012), lack of cortical plasticity (Litovsky et al., 2012; Maslin, Munro, & El-
Deredy, 2013), spiral ganglion neural survival (Maslin et al. 2013) and duration of deafness 
(Blamey et al., 2012). Alternatively, the variability could be caused by certain 
programming characteristics of the CI, particularly distortions inherent in CI processing 
that could affect binaural hearing. Since these distortions can possibly be remedied by 
signal-processing or mapping techniques, these extrinsic characteristics will be examined 
in this dissertation.  It is hypothesized that the actual CI users with SSD did not receive the 
same levels of contralateral unmasking as those in the simulation, because of distortions 
inherent in CI processing such as spectral, temporal and level mismatches between the CI 
processor and the NH ear. The next section explores how these kinds of CI distortions 
might affect binaural hearing. 
 
 
Figure 1.3.  Vocoder simulations of CI users with SSD show masking release in all multi-talker 
conditions. No release was observed for the noise masker condition (adapted from Bernstein 
et al. 2016).  
 
 
Possible sources of distortion in CI users with SSD 
 
Spectral mismatches and their effect on binaural hearing. 
 
 Accurate binaural processing requires inputs that are frequency matched across the 
ears (Joris, Smith, & Yin, 1998).  Therefore, a mismatch between the cochlear place of 
stimulation for the CI and acoustic ear is likely to limit binaural benefit for SSD-CI 
listeners. CIs are usually programmed to deliver the frequencies important for speech, 
between about 150 and 8000 Hz. However, the electrode array is not inserted all the way 
into the cochlea. As a result, this frequency mapping does not correspond to the intrinsic 
mapping of the basilar membrane of the inner ear. Thus, for the vast majority of CI users, 
there exists a large incongruity between the mapping of their CI electrode and the tonotopic 
axis of their basilar membrane. For traditional CI patients with two deaf ears, this approach 
makes sense, because the goal of the CI is to restore as many speech cues as possible to the 
implanted ear.  Research has shown that with months (or years) of training and experience, 
post-lingually deafened CI users are able to “remap” speech sounds and understand speech 
(Svirsky, Silveira, Neuburger, Teoh, & Suárez, 2004).  
There is reason to believe, however, that this might not be the optimal approach to 
clinical mapping for CI users with SSD. Because SSD listeners still have one functioning 
ear, the main role of the CI is to assist the NH acoustic ear by providing spatial hearing 
benefits. For these patients, speech intelligibility via the CI alone may not be the ultimate 
goal. Thus, these patients might benefit from an electrode mapping that more closely 
matches the tonotopic organization of their NH basilar membrane, at the risk of not 
including some portions of the full frequency spectrum that is typically provided to CI 
users. This would allow for a frequency match between the implanted ear and the 
functioning ear, thereby potentially facilitating a larger binaural benefit. Insertion depths 
vary between CI recipients, either because of the properties of the electrode array or 
difficulties during surgery. The average insertion depth is about 20 mm, but some CI users 
can have much shallower insertion depths (Ketten et al., 1998). A normal cochlea is about 
35 mm long (Fried, 1990). The Greenwood Function relates frequency selectivity along the 
cochlea to the position of the hair cells that respond to that frequency (Fried, 1990). The 
Greenwood Function can be used to estimate the lowest frequencies that a CI user can hear. 
For example, if an electrode array were fully inserted (25 mm), the lowest auditory-nerve-fiber 
characteristic frequency that could be stimulated by a CI would be around 500 Hz. 
Thus, if a place-matched mapping strategy were implemented for SSD-CI 
listeners, they would lose some low frequency speech information from the CI signal. 
However, this loss of low-frequency information is much less likely to be as deleterious to 
SSD-CI listeners as for traditional CI listeners, because the SSD listener can rely on their 
NH ear for low-frequency speech cues. Because head-shadow is minimal at low 
frequencies (Bronkhorst & Plomp, 1988; Rayleigh, 1907), there will be very little 
difference in SNR between the two ears in this frequency range.  Therefore, any low-
frequency speech cues that are available at the CI ear will also be available at the NH ear. 
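The insertion-depth estimate above can be reproduced with a short sketch of the Greenwood frequency-position function. The constants used here (A = 165.4, a = 2.1, k = 0.88, with position expressed as a proportion of cochlear length from the apex) are the values commonly cited for the human cochlea and are an assumption, since the text does not state which parameters were used. The 35 mm length is taken from the text, and the function names are hypothetical.

```python
def greenwood_hz(dist_from_apex_mm, length_mm=35.0):
    """Characteristic frequency at a place along the cochlea.

    Assumed human constants: F = A * (10 ** (a * x) - k), where x is the
    distance from the apex as a proportion of total cochlear length.
    """
    A, a, k = 165.4, 2.1, 0.88
    x = dist_from_apex_mm / length_mm
    return A * (10.0 ** (a * x) - k)

def lowest_cf_hz(insertion_depth_mm, length_mm=35.0):
    """Lowest place frequency reached by an array inserted from the base."""
    return greenwood_hz(length_mm - insertion_depth_mm, length_mm)

if __name__ == "__main__":
    for depth in (20.0, 25.0):
        print(f"{depth:.0f} mm insertion: {lowest_cf_hz(depth):.0f} Hz")
```

Under these assumptions a 25 mm insertion reaches a place tuned to roughly 510 Hz, in line with the estimate of about 500 Hz, while a shallower 20 mm insertion reaches only a place tuned to more than 1 kHz.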
Place-of-stimulation mismatches in bilateral CI (BICI) users are known to disrupt 
calculations of spatial cues, such as timing and level differences. Small interaural frequency 
offsets can cause a substantial disruption in localization of a free-field sound source 
(Goupell, Stoelb, Kan, & Litovsky, 2013; Kan, Stoelb, Litovsky, & Goupell, 2013; 
Litovsky et al., 2012). The effect of small offsets is measured by changes in just noticeable 
differences (JNDs) in ITD and ILD perception. Small mismatches of ±2 electrode pairs can 
change perception of ITDs and ILDs in a lateralization task. However, when JNDs were 
estimated from the lateralization data, it was found that ILDs were generally more immune 
to these interaural mismatches than were ITDs. Interaural mismatch led to a doubling of 
normalized JNDs for ITDs with only a 3 mm mismatch; for ILDs, the mismatch had to increase 
to 12 mm before this occurred (Kan et al., 2013). Therefore, even a small mismatch 
between ears would require a larger change in stimulus location to be correctly localized. 
Binaural fusion has also been shown to be limited in BICI when spectral mismatches are 
applied; listeners report unfused auditory images and often perceive multiple auditory 
images when there should only be one (Kan et al., 2013). Goupell et al. (2013) examined 
the effect of interaural frequency mismatch on binaural fusion in NH participants listening 
to bilateral vocoder stimuli. They found that listeners were more likely to report multiple 
auditory images (i.e. stimuli were not fused) with increasing spectral mismatch between 
the ears. Corroborating Goupell et al. (2013), Kan et al. (2013) 
performed the same interaural spectral mismatch experiment, but in actual BICI listeners. 
They found that increasing mismatch led to perception of multiple auditory images in some 
of the listeners and more variability in responses across listeners. Taken together, these two 
studies indicate that spectral mismatch impairs fusion in both CI listeners and vocoder 
listeners alike. This degradation of binaural cues with frequency mismatch is likely to affect 
binaural squelch and subsequent contralateral unmasking in CI listeners.  
SSD-CI listeners often show difficulty obtaining binaural summation after 
implantation, in contrast to BICI users who are able to obtain summation relatively quickly 
after bilateral implantation (Dunn, Tyler, Witt, Ji, & Gantz, 2012; Eapen, Buss, Adunka, 
Pillsbury, & Buchman, 2009). Binaural summation refers to the listening advantage 
obtained by having two copies of the same signal (i.e., one in each ear): the loudness of the 
signal is increased, which can lead to improved detection thresholds (Reynolds & Stevens, 
1960).  Binaural fusion is a related process but refers to a listener’s ability to combine 
information across the ears to create the percept of a single fused sound (discussed 
previously for NH listeners).  Aronoff, Shayman, Prasad, Suneel, and Stelmach (2015) 
tested binaural fusion with temporal and spectral compression in vocoder simulations of 
SSD-CI listening. They tested fusion by presenting vocoded stimuli in one ear and 
unprocessed stimuli in the other, and asked listeners if they heard one sound or two. The 
authors then applied various levels of spectral compression and found that more spectral 
mismatch resulted in less binaural fusion. Reiss et al. (2014) examined fusion using 
dichotic tones presented in a five-alternative forced-choice (AFC) task 
to CI users with residual hearing in their non-implanted ear. These listeners had moderate-
to-severe hearing loss and were fitted with a hearing aid in their acoustic ear and are 
referred to as bimodal CI listeners. To examine fusion, the authors presented a stimulus 
simultaneously to the implant and the acoustic ear. The listeners were asked if they heard 
1 or 2 sounds. If one sound was selected, they were asked to report which ear had the higher 
pitch or if they had the same pitch (indicating fusion). Many of the listeners reported fusion 
ranges of an octave or more (ranges much higher than measured in NH listeners). Most 
interestingly, the fusion ranges tended to match the pitch mismatch between mapped 
electrode pitches. The authors suggested that listeners might be compensating for spectral 
mismatch by increasing binaural fusion ranges, at least for bimodal CI users (Reiss et al., 
2014). Even though CI users with SSD differ from bimodal CI listeners and BICI users 
in having one relatively normal acoustic ear, the binaural cues 
available to SSD listeners are nevertheless limited by the poorer CI ear. Thus, for SSD-CI 
listeners, it is likely that a spectral mismatch would reduce the ability to stream 
simultaneous voices and thus reduce binaural benefits for speech in noise.  
The aforementioned studies examining spectral mismatch and fusion used simple 
tonal stimuli or stimuli presented to a single electrode. In studies examining the impact of 
spectral mismatch on perception of more complicated stimuli such as speech, the results 
are generally the same—spectral mismatch reduces binaural fusion and intelligibility of 
bilaterally presented stimuli. Given the widely reported effects of frequency mismatch on 
binaural fusion and the fact that spectral mismatch is essentially guaranteed to be present 
in SSD-CI listeners, spectral mismatch is a likely contributor to limitations and variability 
in binaural unmasking benefits for SSD-CI listeners. Therefore, the experiments described 
in Chapter 2 examined the effect of a linear spectral mismatch on binaural squelch, using 
the contralateral unmasking paradigm developed by Bernstein et al. (2015; 2016) to examine 
the benefit of CIs for SSD in a situation where the CI does not produce a head-shadow 
benefit. The experiments in Chapter 4 examined the effects of a more realistic spectral shift 
(based on published CI insertion angle data; Landsberger, Svrakic, Roland, & Svirsky, 
2015) on binaural fusion. 
 
Temporal disparities between cochlear implants and normal hearing ears.  
 
Many of the binaural advantages that NH listeners experience depend on the 
detection of temporally coherent signals across the ears. Therefore, a temporal delay 
between a signal received by the NH ear and the CI ear could negatively impact spatial 
hearing performance for SSD-CI listeners. The net temporal delay between the CI and NH 
ears is determined by the relative delay between electrical and acoustic processing. The 
temporal delay of the CI signal depends on the manufacturer of the device, the stimulation 
rate of the processor, and the coding strategy employed (Green, Faulkner, & Rosen, 2002). 
Temporal responses for NH ears depend on the mechanical properties of the traveling wave 
in the cochlea and the firing rates of the auditory nerve fibers to encode acoustic input. 
Electrically evoked auditory brainstem responses (eABRs) have been used to compare 
latency differences at the level of the brainstem for NH and CI users. Because a CI 
stimulates the auditory nerve directly, the response of a normal cochlea, which includes the 
latency of the traveling wave, is on average 4 to 8 ms slower than the response to a CI, when measured at the level of 
the inferior colliculus (wave V) (Dooley et al., 1993; Rasetshwane, Argenyi, Neely, Kopun, 
& Gorga, 2013a). This traveling wave is not replicated in a CI. However, in most cases the 
delay of the CI speech processor is even longer than the latency of the traveling wave and 
neural transduction in a NH ear.  The delay of the speech processor is also not uniform 
across CI manufacturers. To get an idea of the estimated delay associated with the speech 
processor, we contacted research staff for each of the major CI manufacturers who provided 
estimates. Cochlear Ltd. uses a coding algorithm that induces a delay at the CI ear of about 
10.5 to 12.5 ms relative to an acoustic ear (vanDijk, private communication 2015). 
However, Med-El uses a proprietary processing scheme that incorporates frequency-
dependent group delays.  Thus, their devices have a delay that more closely matches that 
of an acoustic ear on the order of 0.5 to 1.6 ms relative to the CI (Zirn, Arndt, Aschendorff, 
& Wesarg, 2015). The delay of Advanced Bionics devices falls between that of Cochlear 
and Med-El, with a delay of about 9 to 11 ms relative to an acoustic ear (Litvak, private 
communication 2016).  
Differences between the latency in the acoustic ear and delay in the CI ear make it 
nearly impossible for a listener to obtain useful ITD information, given that the maximum 
interaural delay for real-world sounds is less than 1 ms (Middlebrooks, 1999). Additionally, previous 
research has found that bilateral interstimulus intervals greater than 1 ms decrease binaural 
fusion in NH children performing binaural fusion tests (Chermak & Lee, 2005). The fusion 
stimuli used in Chermak and Lee (2005) were dichotic, white noise stimuli. In contrast, the 
binaural fusion of speech stimuli may be more resilient to interaural delays. The auditory 
system has the ability to suppress echoes that occur within a certain time window after the 
initial stimulus.  This “precedence effect” results in the echo not being perceived as a 
separate object.  The echo threshold for speech is on the order of 30 ms (Litovsky et al. 
1999; Stecker & Hafter, 2002). 
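The scale of these delays can be put side by side in a short sketch. The largest natural ITD is approximated here with the Woodworth spherical-head formula, which is not taken from the text; the head radius (8.75 cm), the speed of sound (343 m/s), and the function names are likewise assumptions. The device delay ranges and the 30 ms echo threshold are the values quoted above.

```python
import math

def woodworth_itd_ms(azimuth_deg, head_radius_m=0.0875, c_m_per_s=343.0):
    """Approximate ITD for a distant source (spherical-head assumption)."""
    theta = math.radians(azimuth_deg)
    return 1000.0 * (head_radius_m / c_m_per_s) * (theta + math.sin(theta))

MAX_ITD_MS = woodworth_itd_ms(90.0)   # source directly to one side
ECHO_THRESHOLD_MS = 30.0              # echo threshold for speech, from the text

# CI processing delay ranges (ms) as quoted in the text.
DEVICE_DELAY_MS = {
    "Cochlear": (10.5, 12.5),
    "Advanced Bionics": (9.0, 11.0),
    "Med-El": (0.5, 1.6),
}

def exceeds_natural_itd(delay_ms):
    """True if a delay is larger than any ITD a real source could produce."""
    return delay_ms > MAX_ITD_MS

if __name__ == "__main__":
    print(f"maximum natural ITD: {MAX_ITD_MS:.2f} ms")
    for name, (lo, hi) in DEVICE_DELAY_MS.items():
        print(name, exceeds_natural_itd(lo), hi < ECHO_THRESHOLD_MS)
```

With these assumed values the largest natural ITD is well under 1 ms, so most of the quoted processing delays are an order of magnitude larger than any usable ITD, yet all of them remain below the echo threshold for speech.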
Temporal disparities between a CI ear and a NH ear in people with SSD might occur 
not only due to differences in encoding in the CI and NH ears, but also as a result of 
physiological changes in the brain after deafness. Duration of unilateral deafness is known 
to impact cortical as well as subcortical circuits, including at the level of the brainstem 
(Dong, Mulders, Rodger, & Robertson, 2009). The brainstem is highly involved in ITD 
computations for humans and other mammals alike (Grothe, Pecka, & McAlpine, 2010). 
Abnormal timing delays have been measured in eABRs after unilateral deafness, 
suggesting the brainstem is susceptible to changes in input soon after deafness (Gordon, 
Valero, van Hoesel, & Papsin, 2008). These timing changes in neural circuitry can affect 
how well a CI ear and a NH ear can integrate temporal information and perform spatial 
computations, which could also impact binaural unmasking. Thus, a temporal mismatch 
might not only arise from processing delays in CIs, but can be inherent to the brain after 
deafness. Aside from changes in the brainstem that occur after deafness, this temporal 
disparity between a NH ear and a CI ear might be mitigated by speeding up CI processing 
by ~5-10 ms. This could potentially limit any binaural processing issues that could arise 
from altered timing between the ears for those with SSD.  
 The effect of timing disparities on hearing depends on the listening situation. For 
binaural calculations of timing differences between the ears, there is very little latitude for 
delays introduced by CI processing, because natural ITDs are smaller than 1 ms. 
Therefore, delays between CI and acoustic processing would render otherwise usable 
interaural timing cues useless for spatial processing. In contrast, speech perception is 
generally immune to delays up until around 40 ms (echo threshold), which affords much 
more leeway in terms of speech understanding, even with a large CI delay. This is an 
interesting contrast, since the contralateral unmasking paradigm used in this dissertation 
involves both spatial cues and speech perception. Experiment 2.2 in Chapter 2 will 
investigate the effect of interaural disparities in temporal delay on contralateral unmasking.  
 
Loudness growth, compression and their effects on binaural hearing.  
 
There exists a large dynamic range (DR) difference between acoustic and CI ears. 
This DR disparity results in very different loudness growth between the two. Because ILDs 
are so important for relaying binaural cues to CI listeners (Litovsky et al., 2004; van Hoesel 
& Tyler, 2003), differences in loudness growth between the CI and NH ears are likely to 
affect spatial hearing. In CIs, loudness is encoded by the amount of electrical charge 
delivered by the current-pulse amplitudes. When the amplitude of the current is increased, 
the loudness percept is also increased. The smallest possible change in charge that can be 
produced by the CI processor results in large increases in perceived loudness, which has 
the effect of reducing the available DR. It is common for CI users to have a reduced total 
DR of about 40 dB (McDermott & Varsavsky, 2009), whereas the DR of hearing for a 
healthy NH ear is approximately 120 dB (Moore, 2003).   
McDermott, McKay, Richardson, and Henshall (2003) describe in detail the 
loudness-encoding scheme of a CI processor. A signal is received at the CI microphone and 
converted into an electrical signal, then amplified with an automatic gain control (AGC) 
mechanism. The AGC circuit is similar to that used to amplify signals in a hearing aid. The 
AGC limits the range of sound levels sent to the processor to include those that are above 
the noise floor of the processor. The AGC usually discards signals below about 25 dB sound 
pressure level (SPL) and maps signals in the range of 25 dB to 65 dB SPL to the listener’s 
electrical dynamic range. Stimulus levels above about 65 dB SPL are usually “compressed” 
and represented at the maximum electrical stimulus level, equivalent to a 65 dB SPL signal. 
The internal noise of the CI in combination with the electrical response characteristics of 
individual neurons prohibits proper encoding below a certain threshold, usually deemed 
the T-level, which is the threshold for electric hearing. A comfortable C-level is then 
computed as a loud but bearable maximum level for the implant user. Sounds louder than 
the C-level are compressed to fall at or below the C-level. These measurements are made 
for each electrode channel. This type of loudness programming contributes to the reduced 
dynamic range and loudness-growth issues in CI listeners. 
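The level mapping described above can be summarized in a small sketch. The 25 dB and 65 dB SPL limits come from the text; the linear-in-dB interpolation between the T-level and C-level, the use of arbitrary current units, and the function name are assumptions made for illustration, since actual processors apply manufacturer-specific loudness-growth functions.

```python
def map_to_electric_level(spl_db, t_level, c_level,
                          floor_db=25.0, ceiling_db=65.0):
    """Map an acoustic input level onto one electrode's electrical range.

    Inputs below floor_db are discarded (None), inputs above ceiling_db
    are limited to the C-level, and levels in between are interpolated
    linearly in dB between T-level and C-level (an assumed growth rule).
    """
    if spl_db < floor_db:
        return None                       # below the input range
    limited = min(spl_db, ceiling_db)     # compress everything above ceiling
    fraction = (limited - floor_db) / (ceiling_db - floor_db)
    return t_level + fraction * (c_level - t_level)

if __name__ == "__main__":
    for spl in (20, 25, 45, 65, 80):
        print(spl, map_to_electric_level(spl, t_level=100.0, c_level=200.0))
```

In this sketch a 65 dB SPL input and an 80 dB SPL input produce the same electrical level, which illustrates how level differences above the ceiling are lost in the CI ear.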
With respect to loudness growth, the electrode-neural interface has also been 
implicated as a potential source of variability in CI users. Larger than average dynamic 
ranges (for CI listeners) have been correlated with higher amounts of spiral ganglion cell 
survival (Kawano, Seldon, Pyman, & Clark, 1995). The electrode-neural interface and 
health of the auditory nerve determine how well a given CI listener will be able to code 
intensity information.  The electrode-neural interface broadly refers to the physical junction 
between the individual electrodes on the CI array and the corresponding neurons along the 
basilar membrane. However, many peripheral factors contribute to the interface, such as 
electrode placement, scar tissue growth, bone regeneration and the number and integrity of 
the spiral ganglion neurons in the cochlea (DeVries, Scheperle, & Bierer, 2016). These 
factors can contribute to current spread and channel interactions, which can interfere with 
transmission of speech information and lead to pitch perception impairments (Crew, 
Galvin, & Fu, 2012; Jones, Won, Drennan, & Rubinstein, 2013). More specifically, 
loudness growth in CI listeners depends on the proximity of surviving spiral ganglion 
neurons to the location of the active electrodes on the array. Therefore, the poorer the 
electrode-neural interface, the more compression is needed to encode amplitude and the 
more poorly loudness will be represented for the CI listener. A related limitation of CI amplitude 
processing is that there are a limited number of discriminable sound intensity steps 
available to the listener. Schroder, Viemeister, and Nelson (1994) estimated that the total 
number of discriminable intensity steps for NH listeners is about 83. In contrast, it has been 
estimated that the number of intensity steps for CI listeners ranges from 7 to 45 (best case) 
and this number is highly variable across CI listeners (Nelson, Schmitz, Donaldson, 
Viemeister, & Javel, 1996). The number of discriminable intensity steps is thought to be 
important for identifying different speech formants via differences in perceived loudness 
in adjacent frequency channels (Stafford, Stafford, Wells, Loizou, & Keller, 2014). The 
ability to reliably identify formants is especially important for segregating different talkers 
in multi-talker environments. 
In order to fit the full dynamic range of an acoustic ear into the limited dynamic 
range of the CI, compression algorithms are implemented in CI processing. The most 
common compression technique is static envelope compression. This method uses a fixed 
compression ratio, meaning the ratio is the same over time. Although commonly used, this 
strategy is not optimal and does not enable the listener to make the best use of their limited 
DR. In static envelope compression the ratio is fixed across channels, instead of being 
optimized for each individual channel. However, due to the wide use of static envelope 
compression algorithms in CI processing, the effect of this type of compression on spatial 
hearing was examined in Chapter 3 (Experiment 3.1 and 3.2). 
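Static envelope compression with a fixed ratio can be illustrated with a power-law sketch. The power-law form, the reference amplitude of 1.0, and the function names are assumptions for illustration and are not the algorithm of any particular CI processor or of the experiments in Chapter 3.

```python
import math

def compress_envelope(env, ratio, ref=1.0):
    """Apply the same fixed compression ratio to every envelope sample.

    The output level in dB re `ref` equals the input level divided by
    `ratio`; the ratio does not change over time or across channels.
    """
    return [ref * (e / ref) ** (1.0 / ratio) if e > 0.0 else 0.0
            for e in env]

def dynamic_range_db(env):
    """Range in dB between the largest and smallest non-zero samples."""
    positive = [e for e in env if e > 0.0]
    return 20.0 * math.log10(max(positive) / min(positive))

if __name__ == "__main__":
    env = [0.001, 0.01, 0.1, 1.0]        # spans 60 dB
    squeezed = compress_envelope(env, ratio=3.0)
    print(dynamic_range_db(env), dynamic_range_db(squeezed))
```

With a 3:1 ratio, an envelope spanning 60 dB is reduced to about 20 dB, so the peaks and valleys of the envelope all move closer together by the same factor.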
The presence of envelope compression in CI processing is likely to impact speech 
perception for SSD-CI listeners in several different ways.  First, distortions occur in speech 
after envelope compression is applied in CI processing. Envelope compression is known 
to smear acoustic landmarks important not only for vowel comprehension, but for 
identification of word boundaries (Li & Loizou, 2009). These distortions introduced by 
compression could impair fusion of spatially separated maskers due to distortion of the 
signal envelope in the CI or vocoded ear. Second, CI envelope compression is likely to 
distort the ILD cues that are important for spatial hearing (Grantham et al., 2008). Because 
CIs do a poor job of relaying ITD information, CI listeners mainly rely on ILD cues for 
spatial hearing, that is, to localize sounds (Dorman et al., 2015) and to identify 
differences in spatial location between concurrent sources in the environment. Finally, 
envelope compression is also likely to affect masked speech perception for SSD-CI 
listeners by changing the effective SNR (i.e., the ratio between the target and masker levels 
in the CI ear). In competing-talker speech tasks, performance can vary in a complex way 
as a function of the relative levels of the target and masker speech in each ear.  Compression 
amplifies quieter sounds relative to louder ones; therefore, compression will have a 
different effect depending on the relative levels and spatial locations of the targets and 
maskers. Depending on the situation, compression could benefit the listener or impair 
performance. For example, if compression results in the talker of interest becoming louder 
in the acoustic ear it could improve the TMR in that ear, and consequently improve 
performance. In some cases, compression could make the target and masker signals more 
similar to each other, and therefore reduce the perceived difference in spatial location, 
thereby impairing performance. The effects of amplitude compression and expansion on 
contralateral unmasking and head-shadow benefit were examined in the experiments 
described in Chapter 3 of this dissertation.   
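The dependence of the compression effect on the relative target and masker levels can be illustrated with a simplified calculation. It assumes that the target and masker are compressed independently, as if they did not overlap in time, and uses a fixed ratio with a 65 dB reference level; real compression acts on the mixture, so this is only a rough sketch with hypothetical function names.

```python
def compress_db(level_db, ratio, ref_db=65.0):
    """Fixed-ratio compression of a level expressed in dB."""
    return ref_db + (level_db - ref_db) / ratio

def effective_tmr_db(target_db, masker_db, ratio):
    """Target-to-masker ratio after each level is compressed separately."""
    return compress_db(target_db, ratio) - compress_db(masker_db, ratio)

if __name__ == "__main__":
    # Target 10 dB below the masker, then 10 dB above it, at a 2:1 ratio.
    print(effective_tmr_db(55.0, 65.0, ratio=2.0))
    print(effective_tmr_db(65.0, 55.0, ratio=2.0))
```

Under these assumptions a target that is 10 dB weaker than the masker ends up only 5 dB weaker after 2:1 compression, whereas a target that is 10 dB stronger loses half of its advantage, which is one way compression could either help or hurt performance.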
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 2: The effect of interaural mismatches on contralateral 
unmasking in vocoder simulations of cochlear-implant listeners with 
single-sided deafness 
  
The work described in this chapter is published in Ear and Hearing. 
Wess, J.M., Brungart, D.S., & Bernstein, J.G.W. (2017). The Effect of Interaural Mismatches 
on Contralateral Unmasking With Single-Sided Vocoders. Ear Hear. 38, 374-386. 
 
 
Introduction 
  
Binaural hearing provides a number of benefits for NH listeners in noisy 
environments (Zurek, 1993). Head-shadow effects allow listeners to obtain a substantial 
listening benefit simply by listening to the “better ear” where the SNR of the target is most 
favorable. In addition, having two ears can generate an additional “squelch” benefit by 
allowing the listener to take advantage of precise timing and level differences between the 
signals arriving at the two ears to increase the intelligibility of the target speech in the 
presence of spatially separated masking sounds (Drullman & Bronkhorst, 2000). As it 
relates to the current study, squelch or contralateral unmasking is defined as the 
improvement in speech understanding when the speech and noise are spatially separated 
and the ear with the poorer SNR is added. Squelch is particularly beneficial in situations 
involving multiple competing talkers, whereby interaural difference cues provide 
information to facilitate the perceptual separation of competing sound sources (Hawley et 
al., 2004). Overall, NH listeners show substantially more binaural benefit (about 3-5 dB) 
for speech understanding in noise compared to BICI listeners (Aronoff, Freed, Fisher, Pal, 
& Soli, 2011; Culling, Jelfs, Talbert, Grange, & Backhouse, 2012).  
Individuals with SSD — one NH ear and one deaf ear — are at a severe disadvantage 
when listening to speech in complex listening environments because they lack the benefits 
of binaural hearing (e.g., squelch, head-shadow) that are available to individuals with two 
healthy ears (Welsh, Rosen, Welsh, & Dragonette, 2004). When SSD has been treated at all, typical treatments 
have included osseointegrated or CROS hearing aids. Hearing-aid treatments have been 
successful in alleviating some of the adverse effects of SSD, by providing access to signals 
presented from the deaf side of the head by routing them to the NH side (Stewart, Clark, & 
Niparko, 2011). However, these treatments do not restore access to binaural cues and these 
patients still have trouble hearing in noisy environments and difficulty with sound 
localization (Grantham et al. 2008). 
In the past several years, CIs have been considered as a possible new treatment 
option for SSD1.  Although CIs have been widely used as a treatment for the profoundly 
deaf, the first use of CIs in individuals with SSD was to treat debilitating tinnitus in the 
deaf ear  (van de Heyning et al., 2008). Since then, a number of studies have found that CIs 
can also improve sound localization and speech perception in noise for individuals with 
SSD  (Arndt et al., 2010; Buechner et al., 2010; Erbele, Bernstein, Schuchman, Brungart, 
& Rivera, 2015; Firszt et al., 2012; Hansen et al., 2013; Vermeire & Van de Heyning, 2009; 
Zeitler et al., 2015). In general, the benefit provided by a CI for speech perception is 
observed in configurations where the target signal is on the deaf side and/or the interferer 
is on the NH side. This is consistent with the idea that the CI allows users to take advantage 
of head-shadow effects and a better-ear listening strategy (Arndt et al. 2011; Buechner et 
al. 2010; Firszt et al. 2012; Hansen et al. 2013; Zeitler et al. 2015), with little evidence of 
binaural squelch. A number of studies have measured binaural squelch for bilateral CI 
listeners, and have found that a second CI provides either no squelch at all (e.g.,  Loizou et 
al., 2009; Tyler, Noble, Dunn, & Witt, 2006), or very modest squelch effects on the order 
of 1–2 dB (e.g.,  Eapen et al., 2009). This likely reflects the fact that CIs do not deliver the 
temporal fine-structure information (van Hoesel, 2012) that allows the NH binaural system 
to take advantage of ITDs to increase the effectiveness of binaural hearing.  Contralateral 
unmasking has also been demonstrated in CI users, insofar as they are able to detect tones 
embedded in noise that is uncorrelated across processors (Long, Eddington, Colburn, & 
Rabinowitz, 2003). BICI listeners have demonstrated the ability to detect changes in 
interaural envelopes, although just noticeable differences for CI users are much worse than 
for NH listeners (Goupell & Litovsky, 2015). It is unknown how well SSD CI listeners can 
detect changes in envelope correlation across the ears. 
 Bernstein et al. (2015, 2016) recently provided some evidence suggesting that SSD-
CI listeners can benefit from squelch in certain situations2. They employed a paradigm that 
did not provide any head-shadow benefit, thus ensuring that any observed advantage of the 
CI could be attributed to squelch. They presented a mixture containing a target talker and 
one or two interfering talkers to the acoustic ear, and a mixture containing a copy of the 
interfering talkers to the CI ear. For NH listeners presented with unprocessed signals, this 
paradigm results in the perception that the interferers are originating at the center of the 
head, while the target is speaking to them from the side, thereby providing a reliable spatial 
cue that can be used to perceptually segregate the target signal from the interferers 
(Freyman et al. 2008). Thus, it is not surprising that NH listeners obtained a substantial 
benefit in this listening configuration. What is more surprising is that SSD-CI listeners (and 
NH listeners presented with vocoder simulations of SSD-CI listening) also received 
substantial benefit when target and interfering talkers were of the same gender, such that 
few monaural cues (i.e. voice, pitch and timbre) were available to allow the listener to 
perceptually separate the concurrent talkers. This effect was likely mediated by the ability 
of the listeners to fuse the unprocessed and CI (or vocoder)-processed interferer waveforms 
across the ears, to more easily perceptually separate the monaural target from the binaurally 
presented interferers.  
Despite the evidence of squelch apparent in the average results for SSD-CI 
listeners, Bernstein et al. (2016) observed a large amount of intersubject variability in the 
magnitude of contralateral unmasking. While the specific reasons for the inter-subject 
variability in performance are unknown, there are several possible explanations. First, there 
are intrinsic characteristics of the individual listener that cannot easily be fixed or addressed 
through signal processing means, and that are known or suspected to influence speech 
perception for traditional CI listeners. These include neural survival (e.g., Maslin et al. 
2013), current spread (e.g., van Hoesel 2012), duration of deafness (e.g., Blamey et al., 
2013) and lack of cortical plasticity (e.g., Litovsky et al., 2012; Maslin et al., 2013). 
Second, the variability might also reflect certain extrinsic CI distortions that are potentially 
rectifiable via signal-processing or clinical-mapping procedures. These include frequency 
mismatch between the ears (brought on by electrode placement and mapping procedures), 
timing incongruities originating from different processing latencies between a CI and NH 
ear, and loudness distortions due to CI compression and reduced dynamic range. The 
current study focused on two of these extrinsic factors — spectral and temporal mismatch 
— that have the potential to be addressed by either adjusting the frequency allocation tables 
or introducing a temporal delay to one ear. 
Spectral mismatch — i.e., a mismatch between the cochlear places of stimulation 
between the CI and acoustic ears — is one of the most obvious extrinsic factors that might 
negatively impact contralateral unmasking for SSD-CI listeners. The average insertion 
depth of a CI is about 20 mm, with some CI users experiencing much shallower insertion 
depths (Ketten et al., 1998), whereas a normal cochlea is approximately 35 mm long 
(Greenwood, 1961). As a result, the CI electrodes are generally unable to stimulate the 
apical portions of the cochlea where the lowest frequencies (approximately 500 Hz and 
below) are typically processed. For the profoundly deaf, speech perception through the CI 
is the primary goal of cochlear implantation; therefore, CIs are often programmed to 
correspond to the frequencies most important for speech perception, between 150 and 8000 
Hz (van Hoesel, 2012). This results in a large incongruity between the frequencies mapped 
to a given CI electrode and the acoustic best frequencies of the spiral ganglion neurons 
adjacent to the electrode (Landsberger et al., 2015; Stakhovskaya, Sridhar, Bonham, & 
Leake, 2007).  Radiographic insertion depth data from Landsberger et al. (2015) can be 
used to estimate the mismatch between the cochlear place of stimulation for a given 
electrode and the associated place of cochlear stimulation for an acoustic stimulus at that 
electrode’s allocated center frequency. For an average CI patient with a default frequency 
map, this mismatch is approximately 4-6 equivalent rectangular bandwidths (ERBs) (3.6 – 
5.4 mm), depending on the manufacturer and the specific electrode within the array. 
However, the intersubject variability in insertion angle, and therefore the electric-acoustic 
mismatch, is substantial. Landsberger et al. (2015) also conducted a literature survey and 
reported the across-subject mean and standard deviations of the insertion angles of the most 
apical electrode across a number of studies. By combining this information across studies 
and averaging across the three major CI manufacturers, we estimate that the range of 
electric-acoustic mismatch in the cochlear place of stimulation for 95% of CI users (i.e., 
±2 standard deviations from the mean) extends from -0.6 to 12 ERBs (-0.5 to 11 mm). 
Although there is some evidence that post-lingually deafened CI users are able to adapt to the 
shifted stimulus to “remap” speech sounds and better understand speech (e.g., Reiss, 
Turner, Erenberg, & Gantz, 2007; Svirsky et al., 2004), this plasticity is likely to be 
incomplete, especially for individuals with an extraordinarily large mismatch, meaning that 
SSD-CI listeners might still benefit from an improved match between the frequency 
allocation of the CI and their normal acoustic ear.   
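The millimeter equivalents quoted above follow from a fixed scale factor. A minimal sketch, assuming the 0.9 mm/ERB conversion used elsewhere in this chapter (the function and constant names are illustrative only):

```python
MM_PER_ERB = 0.9  # assumed conversion: 0.9 mm of basilar membrane per ERB


def erb_to_mm(erbs):
    """Convert a place mismatch in ERBs to millimeters along the basilar membrane."""
    return erbs * MM_PER_ERB


# Average default-map mismatch (4 to 6 ERBs) and the estimated 95% range (-0.6 to 12 ERBs)
ranges_erb = {"average listener": (4.0, 6.0), "95% of listeners": (-0.6, 12.0)}
for label, (low, high) in ranges_erb.items():
    print(f"{label}: {erb_to_mm(low):.1f} to {erb_to_mm(high):.1f} mm")
```

With this factor, the average range evaluates to 3.6 to 5.4 mm and the 95% range to about -0.5 to 10.8 mm, consistent with the rounded values given in the text.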
A temporal delay between the NH and the CI ears could also limit contralateral 
unmasking. The processing delay for a CI depends on the stimulation rate of the processor 
as well as the coding strategy employed (Green et al., 2002). There is no uniform delay 
across CI manufacturers. For example, Cochlear Ltd. uses a filtering and coding strategy 
that causes a delay at the CI ear of about 10.5–12.5 ms relative to an acoustic ear (van Dijk, 
private communication 2015). In contrast, Med-El finite impulse response filters have 
group delays that increase with decreasing frequency and more closely match the traveling-
wave latencies for an acoustic ear, resulting in frequency-dependent delays in the acoustic 
ear on the order of 0.5–1.6 ms relative to the CI (Zirn et al., 2015).  Advanced Bionics 
processing latency falls somewhere in between, with 9–11 ms latency relative to an 
acoustic ear (Litvak, private communication 2016).  Delays of this magnitude in either 
direction would make it very difficult to relay accurate ITD information, given that the 
maximum interaural delay for real-world sources is less than 1 ms for humans 
(Middlebrooks, 1999), and sensitivity to ITD is known to deteriorate when the delay 
exceeds 300–500 µs (Mills, 1960). However, contralateral unmasking for SSD-CI listeners 
is likely to depend on interaural correlation between the temporal envelopes of the stimuli 
processed in the two ears. Since speech envelopes are dominated by very slow 2–8 Hz 
temporal modulations (Elliott & Theunissen, 2009), the contralateral unmasking effect 
might be more resilient than ITD discrimination to interaural time delays on the order of 
12 ms or less. 
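To give a rough sense of why slow envelopes might tolerate such delays, consider a single sinusoidal modulation component. This is a simplifying assumption for illustration, not a model of real speech envelopes. A delay of τ seconds shifts a modulation at rate f by 360·f·τ degrees, and the correlation between the component and its delayed copy (taken over whole modulation periods) is cos(2πfτ).

```python
import math


def envelope_phase_shift_deg(mod_rate_hz, delay_s):
    """Phase shift (degrees) that a pure delay imposes on a sinusoidal modulation."""
    return 360.0 * mod_rate_hz * delay_s


def sinusoid_correlation(mod_rate_hz, delay_s):
    """Correlation between a sinusoid and its delayed copy, over whole periods."""
    return math.cos(2.0 * math.pi * mod_rate_hz * delay_s)


# Modulation rates spanning the 2-8 Hz range, and a few illustrative delays
for rate_hz in (2.0, 8.0):
    for delay_ms in (1.6, 12.0, 100.0):
        delay_s = delay_ms / 1000.0
        phase = envelope_phase_shift_deg(rate_hz, delay_s)
        corr = sinusoid_correlation(rate_hz, delay_s)
        print(f"{rate_hz:.0f} Hz, {delay_ms:5.1f} ms: {phase:6.1f} deg, r = {corr:.3f}")
```

Under this idealization, a 12-ms delay shifts an 8-Hz modulation by only about 35 degrees, leaving the two envelopes highly correlated, whereas the same delay is far outside the range of naturally occurring ITDs.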
A third factor that could influence contralateral unmasking is spectral resolution of 
the CI due to physical current spread in the cochlea (Nie, Barco, & Zeng, 2006). Whereas 
an acoustic ear is capable of encoding 30–50 channels of spectral information (Shannon, 
Fu, & Galvin, 2004), a CI has only about 8 functional channels (Friesen, Shannon, Baskent, 
& Wang, 2001). Poor spectral resolution has been shown to limit spatial release from 
masking for CI users in noisy environments, such as competing talker situations (Fu & 
Nogaki, 2005). Spectral resolution is, for the most part, an intrinsic characteristic of the 
electrode-neural interface that cannot easily be overcome with signal-processing or clinical 
mapping solutions. Nevertheless, the degree of spectral resolution could have an impact on 
the extent to which spectral mismatch affects contralateral unmasking. 
This study used vocoder simulations to investigate the extent to which spectral and 
temporal distortions may negatively impact contralateral masking release in NH listeners 
presented with unprocessed stimuli in one ear and vocoded stimuli in the other.  Vocoders 
simulate certain aspects of CI processing such as filtering the original speech signal, 
extracting the amplitude temporal envelope and exciting a certain physical location in the 
cochlea (Goupell et al., 2013). Although vocoder simulations are not considered a 
comprehensive CI simulation, this approach allowed us to directly control the amount of 
interaural spectral and temporal mismatch. This study took as a starting point the previous 
results from Bernstein et al. (2016) showing that (a) SSD-CI listeners experienced 
contralateral unmasking, and (b) that the average magnitude of contralateral unmasking for 
NH vocoder listeners was equal to the largest observed benefit for an individual SSD-CI 
listener. We expected that contralateral unmasking would be sensitive to distortions in all 
three dimensions of interest (spectral and temporal mismatch and spectral resolution), and 
that for an extreme case in each dimension, contralateral unmasking would disappear 
completely. Given the large amount of inter-subject variability in interaural mismatch 
reported in the literature – especially in the spectral dimension – the goal was to determine 
to what extent contralateral unmasking would be resilient to interaural mismatch.  
Interaural temporal and spectral mismatch, along with the spectral resolution of the 
vocoder, were varied parametrically in a series of four experiments to measure how large 
a mismatch could be tolerated in each dimension before contralateral unmasking was 
reduced or eliminated. Experiment 2.1 examined the effects of interaural spectral mismatch 
on contralateral unmasking. Experiment 2.2 examined the effects of an interaural temporal 
mismatch. Experiment 2.3 examined the interaction between frequency resolution of the 
vocoder and interaural spectral mismatch. Experiment 2.4 examined the interaction 
between interaural spectral and temporal mismatch.   
 
 
Experiment 2.1: The role of spectral mismatches on contralateral unmasking in 
simulations of CI users with SSD 
 
Experiment 2.1 investigated the effect of interaural spectral mismatch on 
contralateral unmasking.  We hypothesized that spectral mismatch would reduce 
contralateral unmasking, with the idea that the dissimilarity between the acoustic 
interferers presented to one ear and the vocoded interferers presented to the other ear would 
negatively impact performance. 
 
Methods. 
Participants. Experiment 2.1 was a pilot study carried out at Walter Reed National 
Military Medical Center.  Seven paid listeners (age range 18-30) participated in this 
experiment. All listeners had NH, defined as symmetrical thresholds equal to or better than 
20 dB hearing level at octave frequencies between 125 and 8000 Hz and were free from 
cognitive and neurological disorders. All listeners were native English speakers. 
 
Approach. This study employed the contralateral-unmasking paradigm of 
Bernstein et al. (2015, 2016) to measure the squelch benefit provided by a second 
(vocoded) ear in perceptually separating concurrent streams of speech. The left ear was 
always presented with unprocessed target and interfering speech. The right ear was 
presented with either silence (in the monaural condition) or a vocoded copy of the 
interfering speech (in the bilateral condition).   
 
Stimuli.  The target and interfering speech were taken from the CRM speech corpus 
for multi-talker communication research (Bolia et al., 2000; Brungart, 2001). The CRM 
corpus consists of phrases of the form “Ready (call sign) go to (color) (number) now.” 
There were eight possible call signs (“Arrow,” “Baron,” “Charlie,” “Eagle,” “Hopper,” 
“Laker,” “Ringo” and “Tiger”), four possible colors (“blue,” “green,” “red” and “white”), 
and eight possible integer numbers (one through eight, including seven). A typical sentence 
would be “Ready Charlie go to white five now.” The target sentence call sign was always 
“Baron”, which provided the cue for the listener to identify which of the concurrent talkers 
was the target. The interferers used other call signs (e.g., “Arrow” or “Ringo”). Eight 
speakers (four females, four males) were used to record all possible combinations. 
Noise vocoding was used to extract speech envelopes in a number of frequency 
channels and use the envelopes to excite specified regions of the cochlea (via synthesis 
filters). The algorithm was similar to that described by Hopkins and Moore (2009) and 
Bernstein et al. (2015, 2016), except that the signals were further manipulated to produce 
spectral mismatches between the unprocessed and vocoded ears. First, stimuli were passed 
through a bank of linear-phase finite-impulse-response “analysis” filters with bandwidths 
proportional to the equivalent rectangular bandwidth (ERB) of a NH auditory filter 
(Glasberg & Moore, 1986). This particular algorithm and the definition of channels in 
terms of ERBs were used to match the processing employed by Bernstein et al. (2015, 
2016), although ERBs can easily be translated into millimeter equivalent distances along 
the basilar membrane (0.9 mm/ERB; Moore, 1986).  The order of the filters was varied so 
that the filter skirts had similar slopes. Although the filters were engineered to be quite 
steep, their shape was not symmetrical: the low-frequency edge rolled off at about 60 
dB per octave, while the high-frequency edge rolled off at about 80 dB. These steep filters 
reduced any potential channel interactions.  Delays introduced by the filtering process were 
offset by removing the appropriate number of samples (half the filter length) from the 
beginning of the output signal, so that the output signal was time-aligned with the input 
signal.  The envelope of the signal in each channel was extracted via a Hilbert transform. 
Each envelope was multiplied by a white noise carrier, with the resulting signal then passed 
through a bandpass “synthesis” filter with cutoff frequencies selected to stimulate a 
specified region of the cochlea.  The level of the resulting signal in each channel was 
adjusted to be equal to the root-mean-squared (RMS) level of the input signal for that 
channel, and the delays associated with the filtering process were removed. Finally, the 
signals were summed across channels to create the noise-vocoded signal.  
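The processing chain described above can be sketched as follows. This is an illustrative approximation rather than the code used in the experiments, which were run in MATLAB. It assumes NumPy, the Glasberg and Moore (1990) ERB-number formula, equal-ERB-width channels, and Hamming-windowed sinc filters of fixed length; all function names and the `shift_erbs` parameter are hypothetical.

```python
import numpy as np


def erb_number(f_hz):
    # ERB-number scale (Glasberg & Moore, 1990); assumed here for illustration
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37


def bandpass_fir(low_hz, high_hz, fs, numtaps=513):
    """Linear-phase band-pass FIR: difference of two windowed-sinc low-pass filters."""
    if not 0.0 <= low_hz < high_hz <= fs / 2.0:
        raise ValueError("band edges must satisfy 0 <= low < high <= fs/2")
    n = np.arange(numtaps) - (numtaps - 1) / 2.0

    def lowpass(fc):
        return 2.0 * fc / fs * np.sinc(2.0 * fc / fs * n)

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(numtaps)


def filter_aligned(x, h):
    """Filter x with h, then drop the group delay so the output aligns with x."""
    delay = (len(h) - 1) // 2  # half the filter length
    return np.convolve(x, h)[delay:delay + len(x)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal, via an FFT-based Hilbert transform."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def noise_vocode(x, fs, low_hz, high_hz, n_channels, shift_erbs=0.0, seed=0):
    """Noise vocoder whose synthesis bands are offset from the analysis bands."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(erb_number(low_hz), erb_number(high_hz), n_channels + 1)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        analysis = bandpass_fir(erb_number_to_hz(lo), erb_number_to_hz(hi), fs)
        synthesis = bandpass_fir(erb_number_to_hz(lo + shift_erbs),
                                 erb_number_to_hz(hi + shift_erbs), fs)
        band = filter_aligned(x, analysis)
        envelope = hilbert_envelope(band)
        carrier = rng.standard_normal(len(x))  # white-noise carrier
        channel = filter_aligned(envelope * carrier, synthesis)
        # Match the channel RMS to the RMS of the analysis-band signal
        rms_in = np.sqrt(np.mean(band ** 2))
        rms_out = np.sqrt(np.mean(channel ** 2))
        if rms_out > 0.0:
            channel = channel * (rms_in / rms_out)
        out += channel
    return out
```

A call such as `noise_vocode(x, fs, 100.0, 2502.0, 6, shift_erbs=4.0)` would then produce a six-channel vocoded signal with the synthesis bands moved upward by 4 ERBs; the caller is responsible for keeping the shifted band edges between 0 Hz and the Nyquist frequency.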
Interaural spectral mismatch was introduced through the use of synthesis filters that did 
not match the analysis filters used to extract the envelope, thereby stimulating a different 
cochlear place than would be stimulated by an unprocessed acoustic signal. This was done, 
rather than shift the frequencies of the analysis filters, to simulate the large range of 
possible electrode positions within the cochlea across a population of CI listeners 
(Landsberger et al., 2015), and to determine at what point an interaural spectral mismatch 
would harm performance. The synthesis-filter cutoff frequencies were shifted upward or 
downward relative to the analysis filters by 1, 2, 4 or 7 ERBs (equivalent to 0.9, 1.8, 3.6 or 
6.3 mm). A 6-channel vocoder was used in this experiment, with each channel 4 ERBs (3.6 
mm) wide. The frequency range of the analysis filters was 100 to 2502 Hz. It was important 
to ensure that any effects of the spectral processing reflected the introduction of a spectral 
mismatch between the two ears and not the loss of audibility of a portion of the speech 
spectrum. Therefore, the vocoder high-frequency cutoff was set to a lower frequency than 
is customary to allow for the possibility of large upward spectral shifts without removing 
acoustic frequency content, although shifts larger than 7 ERBs were ultimately not included 
based on pilot results.  
 
Procedure.  To maximize the difficulty in perceptually separating concurrent talkers in 
the target ear, this experiment used all same-gender interferers and targets (Brungart, 
Simpson, Ericson, & Scott, 2001). These are the conditions that produced the most masking 
release for SSD-CI listeners (Bernstein et al., 2016) and for single-sided vocoder listeners 
(Bernstein et al., 2015, 2016). The three talkers (target and two interferers) in a given trial 
were always of the same gender, although the gender varied randomly from trial to trial. 
The three simultaneous sentences were constrained such that they were always spoken by 
a different talker and had a different call sign, color and number. The target speech was 
presented at 60 dB SPL, with the interferer level adjusted to yield the desired target-to-
masker ratio (TMR). Two TMRs were tested (0 and 4 dB), because they yielded the largest 
amount of contralateral unmasking in the vocoder study of Bernstein et al. (2015). In the 
bilateral conditions, the interferers were played at the same level to both ears. 
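The trial constraints described above can be sketched in a few lines. This is a hypothetical illustration, not the experimental software: the talker labels are placeholders, and the TMR is assumed to be defined relative to the level of each individual interferer.

```python
import random

CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = [1, 2, 3, 4, 5, 6, 7, 8]
# Placeholder talker labels; the corpus has four female and four male talkers
TALKERS = {"female": ["F1", "F2", "F3", "F4"], "male": ["M1", "M2", "M3", "M4"]}
TARGET_CALL_SIGN = "Baron"
TARGET_LEVEL_DB_SPL = 60.0


def make_trial(tmr_db, rng):
    """Return [target, interferer, interferer] for one same-gender trial."""
    gender = rng.choice(["female", "male"])
    talkers = rng.sample(TALKERS[gender], 3)  # three different talkers
    other_signs = [c for c in CALL_SIGNS if c != TARGET_CALL_SIGN]
    call_signs = [TARGET_CALL_SIGN] + rng.sample(other_signs, 2)
    colors = rng.sample(COLORS, 3)    # all three colors differ
    numbers = rng.sample(NUMBERS, 3)  # all three numbers differ
    # Assumption: TMR is relative to each interferer's level
    interferer_level = TARGET_LEVEL_DB_SPL - tmr_db
    levels = [TARGET_LEVEL_DB_SPL, interferer_level, interferer_level]
    return [
        {"talker": t, "call_sign": s, "color": c, "number": n, "level_db_spl": lvl}
        for t, s, c, n, lvl in zip(talkers, call_signs, colors, numbers, levels)
    ]


trial = make_trial(tmr_db=4.0, rng=random.Random(0))
for sentence in trial:
    print(sentence)
```

Under this assumption, a TMR of 4 dB places each interferer at 56 dB SPL, and a TMR of 0 dB places it at the 60 dB SPL target level.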
Participants were seated in a sound booth and directed their attention to a computer 
screen. The speech stimulus was generated in MATLAB and played via an RME 
Hammerfall (Haimhausen, Germany) sound card and presented over Sennheiser HD 280 
headphones. The computer screen displayed an eight-column, four-row array of colored 
digits corresponding to the response set of the CRM. The listener used the mouse to select 
the colored digit corresponding to the number and color spoken by the target talker who 
used the call sign “Baron”. After each response, the subject received feedback, with the 
button associated with the correct answer flashing briefly. For a response to be scored as 
correct, both the color and number needed to be correctly identified. Listeners were 
presented with 100 trials for each TMR (0 and 4 dB) in the monaural condition, and for 
each combination of spectral shift and TMR in the bilateral conditions, for a total of 2000 
trials for each listener. Listeners were presented with blocks of 30 trials with the spectral-
shift condition held fixed (or stimuli presented monaurally) for all of the trials in the block. 
The TMR varied randomly from trial to trial.  
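As a check on the design arithmetic, the condition grid can be enumerated directly. The sketch below assumes the nine spectral shifts listed in the Stimuli section plus the monaural condition. It also computes the chance level for the 32-alternative response array, assuming uniform guessing.

```python
from itertools import product

SPECTRAL_SHIFTS_ERB = [-7, -4, -2, -1, 0, 1, 2, 4, 7]
TMRS_DB = [0, 4]
TRIALS_PER_CELL = 100

# One monaural condition plus one bilateral condition per spectral shift
conditions = ["monaural"] + [f"bilateral {s:+d} ERB" for s in SPECTRAL_SHIFTS_ERB]
cells = list(product(conditions, TMRS_DB))
total_trials = len(cells) * TRIALS_PER_CELL

# Response array: four colors by eight numbers
chance_level = 1 / (4 * 8)

print(f"{len(cells)} cells x {TRIALS_PER_CELL} trials = {total_trials} trials")
print(f"chance performance = {chance_level:.3%}")
```

Ten conditions crossed with two TMRs give 20 cells and the 2000 trials per listener reported above; chance performance is 1 in 32, or about 3%.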
 
Results. 
 
Figure 2.1.  Results from experiment 2.1 plotting mean performance in correctly identifying 
the target number and color, with data averaged across TMR. Mean monaural performance is 
depicted by the horizontal line, with the horizontal light grey box representing ± one standard 
error of the mean. The vertical dark gray shaded region represents the range of mismatch 
expected for actual SSD-CI listeners. Maximum performance was observed with no spectral 
shift (0 ERBs), and decreased with increasing spectral shift. Error bars represent ± one standard 
error of the mean. 
 
Figure 2.1 plots the mean proportion of trials where the color and number were both 
identified correctly as a function of the spectral shift. Fig. 2.1 shows the data averaged 
across TMR. The vertical shaded region indicates the range of expected spectral mismatch 
across the cochlear partition for an average CI listener (roughly 4–6 ERBs; recall, however, 
that the range of mismatch across individual listeners is much larger, on the order of -0.6 
–12 ERBs, Landsberger et al., 2015). The horizontal shaded region indicates mean 
monaural performance ± one standard error. The data in Fig. 2.1 show a clear effect of 
spectral mismatch on the magnitude of contralateral unmasking, with a benefit of 18 
percentage points with no shift, decreasing to no benefit at all for shifts of −4 or +7 ERBs.  
Figure 2.2.  Results from experiment 2.1 plotting mean performance in correctly identifying 
the target number and color, with data plotted separately for the two TMRs tested: (A) 0 dB 
and (B) 4 dB. Mean monaural performance is depicted by the horizontal line, with the 
horizontal light grey box representing ± one standard error of the mean. The vertical dark gray 
shaded region represents the range of mismatch expected for actual SSD CI listeners. Error 
bars represent ± one standard error of the mean. 
 
For clarity, the data have also been plotted separately for each TMR (Fig. 2.2).  The 
data were analyzed using a repeated-measures binary-logistic regression analysis with two 
within-subject factors (spectral shift and TMR). This analysis was used because the data 
were binary in nature (correct or not) and the analysis takes into account the likelihood that 
percentage-correct scores are different based on the number of trials presented. For the 
purposes of the statistical analysis, the monaural condition was considered as an additional 
spectral-shift condition. There were significant main effects of spectral mismatch [χ² (9) = 
2324.2, p<0.001] and TMR [χ² (1) = 606.8, p<0.001] and a significant interaction between 
the two factors [χ² (6) = 69.5, p<0.001]. This interaction was investigated through a 
series of post-hoc tests. The first set of tests sought to determine for which spectral-shift 
conditions contralateral unmasking was observed by comparing performance to the 
monaural condition, with Bonferroni corrections applied for (18) multiple comparisons. 
For a TMR of 0 dB, bilateral performance was significantly better than monaural 
performance for spectral shifts of 0, ±1 and ±2 ERBs (p<0.001).  For a TMR of 4 dB, 
bilateral performance was significantly better than monaural performance for spectral 
shifts of 0, -1, -2 and +4 ERBs (p<0.001).  The second set of tests sought to determine the 
point at which contralateral unmasking was reduced relative to the unshifted condition. For 
a TMR of 0 dB, spectral shifts of -4 and ±7 ERBs yielded reduced contralateral unmasking 
relative to the zero-shift condition. For a TMR of 4 dB, only the two largest negative shifts 
(-4 and -7 ERBs) yielded reduced contralateral unmasking.  
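The Bonferroni correction used for the post-hoc tests divides the family-wise significance level by the number of comparisons. A minimal sketch, assuming a family-wise level of 0.05 (the level is an assumption for illustration; the text reports the corrected outcomes only):

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value, alpha, n_comparisons):
    """True if an uncorrected p-value survives the Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)


# 18 comparisons, as in the first set of post-hoc tests; alpha = 0.05 is assumed
threshold = bonferroni_threshold(0.05, 18)
print(f"per-comparison threshold: {threshold:.5f}")
print(is_significant(0.001, 0.05, 18), is_significant(0.01, 0.05, 18))
```

With 18 comparisons, an uncorrected p-value must fall below roughly 0.0028 to be declared significant at a family-wise level of 0.05.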
 
Summary. 
In summary, the results of experiment 2.1 show that significant contralateral 
unmasking was preserved for spectral shifts smaller than ±2 ERBs, was significantly 
reduced by spectral shifts of ±2–4 ERBs, and was completely eliminated by spectral shifts 
of ±7 ERBs.  
 
Experiment 2.2: The role of temporal mismatches on contralateral unmasking in 
simulations of CI users with SSD 
 
 Experiment 2.2 investigated the effect of interaural delays on contralateral 
unmasking. We hypothesized that a large interaural delay would reduce contralateral 
unmasking, but it was not clear to what extent the unmasking effect would be resilient to 
small delays. 
 
Methods. 
Participants. Eight NH paid listeners participated. Listeners were tested at the Air 
Force Research Laboratory, Wright Patterson Air Force Base, Ohio. The listener panel 
consisted of professional listeners, in that they are paid to conduct multiple psychoacoustic 
experiments. All listeners were native English speakers. 
 
Stimuli. The methods were generally the same as in experiment 2.1, except that 
interaural disparities were implemented in the temporal instead of the spectral dimension.  
Vocoder processing was carried out in 8 frequency bands covering a range of 100 to 10,000 
Hz. This full-bandwidth vocoder was employed because no spectral shifts were applied. 
Interaural temporal mismatches were induced by delaying the vocoded signals presented 
to the right ear (defined as a positive delay) or by delaying the unprocessed signals 
presented to the left ear (negative delay). Temporal mismatches of ±6, 12, 18, 24, 
50 and 100 ms were tested. 
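The sign convention for the interaural delay can be made concrete with a short sketch. It is illustrative only and assumes that a delay is implemented by prepending silence to the delayed ear and appending the same amount to the other ear, so that both channels keep equal length; the function name is hypothetical.

```python
def apply_interaural_delay(left, right, delay_ms, fs):
    """Delay the right (vocoded) ear for positive values, the left ear for negative."""
    n_pad = round(abs(delay_ms) * fs / 1000.0)
    silence = [0.0] * n_pad
    if delay_ms >= 0:
        # Positive delay: the vocoded signal in the right ear lags
        return list(left) + silence, silence + list(right)
    # Negative delay: the unprocessed signal in the left ear lags
    return silence + list(left), list(right) + silence


# Toy example at a 1-kHz sampling rate, so that 1 ms equals 1 sample
left_out, right_out = apply_interaural_delay([1.0, 2.0], [3.0, 4.0], 12, 1000)
print(len(left_out), len(right_out), right_out.index(3.0))
```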
Procedure. The data reported here form a subset of the data collected from a larger 
experiment exploring the cues that listeners might use to perform the contralateral 
unmasking task.  Thus, only a TMR of 0 dB was tested, and the number of trials for the 
reported conditions differs from that in the other experiments. Listeners were 
presented with 50 trials for each temporal shift condition, 200 trials for the zero-shift 
condition and 500 trials for the monaural condition, for a total of 2600 trials per listener. 
Listeners were presented with blocks of 32 or 64 trials, with all experimental parameters 
(i.e., temporal shift or monaural condition) varying randomly from trial to trial within each 
block.  
 
Results. 
 
 
Figure 2.3.  Results from Experiment 2.2 showing mean performance as a function of interaural 
temporal mismatch. Mean monaural performance is depicted by the horizontal line, with the 
horizontal light grey box representing ± one standard error of the mean. The vertical dark gray 
shaded region represents the range of mismatch expected for actual SSD-CI listeners. 
Performance was maximum with no interaural delay (0 ms) and decreased with increasing 
temporal delay, although there was relatively little effect for an interaural delay of 12 ms or 
less (the expected range for SSD-CI listeners). Error bars represent ± one standard error of the 
mean. 
 
Figure 2.3 plots the mean proportion of keywords correctly identified as a function 
of temporal shift. Interaural temporal mismatch reduced contralateral unmasking, but the 
effect was relatively small for delays in the +0.5-12.5 ms range expected for SSD-CI 
listeners. A repeated-measures binary-logistic regression analysis revealed a significant 
main effect of temporal shift [χ² (11) = 2549.3, p<0.001]. Post-hoc tests were carried out 
to determine at what point contralateral unmasking was reduced relative to the zero-delay 
condition, and at which point contralateral unmasking disappeared completely, with 
Bonferroni corrections applied for (13) multiple comparisons. Performance was 
significantly poorer (p<0.05) than in the zero-delay condition for positive temporal shifts 
(i.e., vocoder leading) of 24 ms or larger, and for negative shifts (i.e., vocoder lagging) of -18 
ms or larger in magnitude, with the exception of -24 ms, which was not significant. Performance was 
significantly better than in the monaural condition (p<0.05) for all temporal shifts between 
-24 and +18 ms.  
Summary. 
In summary, temporal shifts of ±50–100 ms completely eliminated contralateral 
unmasking and shifts smaller than ±24 ms preserved most of the benefit. A temporal 
mismatch in the 0.5–12 ms range expected for SSD-CI listeners did not significantly reduce 
contralateral unmasking. 
 
Experiment 2.3: The role of spectral mismatches and vocoder channel resolution on 
binaural unmasking in simulations of CI users with SSD 
 
Experiment 2.3 examined the interaction between interaural spectral mismatch and 
the frequency resolution of the vocoder.  Many CI users have limited spectral resolution 
(Loizou, 2006). We hypothesized that although reduced frequency resolution would reduce 
contralateral unmasking (Bernstein et al., 2015) when no mismatch is present, it might also 
mitigate the negative effects of spectral mismatch on contralateral unmasking to some 
extent. The idea was that for a given degree of interaural mismatch, with broader vocoder 
channels there would be an increased likelihood of interaural correlation between the 
speech envelopes. 
Methods. 
Participants.  Nine NH paid listeners participated in experiment 2.3, six of whom had 
also participated in experiment 2.2. All listeners who participated in both experiments 
completed experiment 2.2 first. Listeners were tested at the Air Force Research Laboratory, 
Wright Patterson Air Force Base, Ohio.  
 
Stimuli.  The methods were generally the same as in experiment 2.1, except that 
the number of frequency channels in the vocoder was manipulated in addition to the 
introduction of interaural spectral mismatches. Four different numbers of vocoder channels 
were tested (3, 5, 8 and 10). Synthesis filters were shifted by 0, ±0.5, ±1, ±2, ±4 and ±7 
ERBs relative to an acoustic ear. By shifting the synthesis filters, the signal excited a 
different cochlear place than would be excited by the unshifted vocoder signal. The 
frequency range of the vocoder analysis filters was 576 to 4102 Hz. The high-frequency 
cutoff was higher than in Experiment 2.1 because the maximum spectral shift was limited 
to 7 ERBs (maximum synthesis filter cutoff = 8960 Hz). The low-frequency cutoff was set 
higher than in Experiment 2.1 to ensure that frequency information was not removed with 
negative spectral shifts  (minimum synthesis filter cutoff for a shift of -7 ERBs = 147 Hz). 
Figure 2.4 shows examples of the analysis and synthesis filter band edges with a spectral 
shift of +4 ERBs, for vocoders with 3, 5, 8 and 10 channels. 
 
Figure 2.4.  The analysis and synthesis band edges for a +4-ERB spectral mismatch for all of 
the vocoder-channel conditions in experiment 2.3. The numbers above and below each pair of 
lines identify the corresponding analysis and synthesis channels.  For the 3-channel vocoder, 
the corresponding bands overlap. For the 5-channel vocoder, the bands do not overlap, but they 
are nearly adjacent.  For the 8- and 10-channel vocoders, the analysis and synthesis bands are 
separated by at least one channel. 
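The relationship between channel width and spectral shift illustrated in Figure 2.4 can be approximated numerically. The sketch below assumes the Glasberg and Moore (1990) ERB-number formula and channels of equal width on the ERB scale; the exact band edges used in the experiment may differ slightly from these estimates.

```python
import math

LOW_HZ, HIGH_HZ = 576.0, 4102.0  # analysis range in experiment 2.3
SHIFT_ERBS = 4.0                 # spectral shift shown in Figure 2.4


def erb_number(f_hz):
    # ERB-number scale (Glasberg & Moore, 1990); an assumption for illustration
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37


def channel_width_erbs(n_channels):
    """Width of each channel when the analysis range is split evenly in ERBs."""
    return (erb_number(HIGH_HZ) - erb_number(LOW_HZ)) / n_channels


for n_channels in (3, 5, 8, 10):
    width = channel_width_erbs(n_channels)
    # Matching analysis and synthesis bands overlap only if the shift is
    # smaller than the channel width
    relation = "overlap" if SHIFT_ERBS < width else "do not overlap"
    print(f"{n_channels:2d} channels: {width:.2f} ERBs wide, "
          f"shift = {SHIFT_ERBS / width:.2f} channels, bands {relation}")

# Extreme synthesis-filter edges for the largest shifts of -7 and +7 ERBs
lowest = erb_number_to_hz(erb_number(LOW_HZ) - 7.0)
highest = erb_number_to_hz(erb_number(HIGH_HZ) + 7.0)
print(f"lowest edge = {lowest:.0f} Hz, highest edge = {highest:.0f} Hz")
```

Under these assumptions the 3-channel bands are about 5.2 ERBs wide, so a 4-ERB shift leaves corresponding bands overlapping, whereas the 5-channel bands are about 3.1 ERBs wide and no longer overlap. The estimated extreme edges (roughly 150 Hz and 8970 Hz) are close to, but not identical with, the 147-Hz and 8960-Hz values reported in the text.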
 
Procedure.  This experiment examined an interaction between two factors and 
therefore included a greater number of conditions than experiment 2.1. To limit the 
duration of the experiment, fewer trials were presented per condition. Listeners were 
presented with 45 trials for each combination of spectral shift, TMR (0 and 4 dB), and 
number of vocoder channels in the bilateral conditions. In the monaural condition, listeners 
were also presented with 45 trials for each TMR, but the structure of the automated program 
repeated these 45 monaural trials 4 times (once for each channel-number condition), 
resulting in a total of 180 monaural trials for each TMR. 
Thus, each listener completed a total of 4320 trials. The stimuli were presented in 
blocks consisting of 50 trials, including 45 bilateral trials with the number of vocoder 
channels held constant throughout the block and 5 monaural trials. The spectral shift 
condition and TMR varied randomly from trial to trial within a block. 
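The trial total follows from the design described above, as the following arithmetic sketch shows (the variable names are illustrative):

```python
SPECTRAL_SHIFTS_ERB = [0, 0.5, -0.5, 1, -1, 2, -2, 4, -4, 7, -7]
TMRS_DB = [0, 4]
CHANNEL_COUNTS = [3, 5, 8, 10]
TRIALS_PER_CELL = 45

# Bilateral: every combination of shift, TMR and number of channels
bilateral_trials = (len(SPECTRAL_SHIFTS_ERB) * len(TMRS_DB)
                    * len(CHANNEL_COUNTS) * TRIALS_PER_CELL)

# Monaural: 45 trials per TMR, repeated once per channel-number condition
monaural_trials_per_tmr = TRIALS_PER_CELL * len(CHANNEL_COUNTS)
monaural_trials = monaural_trials_per_tmr * len(TMRS_DB)

total_trials = bilateral_trials + monaural_trials
print(bilateral_trials, monaural_trials_per_tmr, monaural_trials, total_trials)
```

The 3960 bilateral trials and 360 monaural trials (180 per TMR) sum to the 4320 trials completed by each listener.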
 
 
 
 
 
 
 
 
 
 
 
 
 
Results. 
 
 
Figure 2.5.  Results from Experiment 2.3 plotting mean performance in correctly identifying 
the target number and color as a function of spectral mismatch, with data averaged across TMR. 
Curves represent fits to the data for each vocoder condition (3, 5, 8 or 10 channels) using 
Pearson type 7 distributions with four free parameters. Mean monaural performance is depicted 
by the horizontal line, with the horizontal light grey box representing ± one standard error of 
the mean. The vertical dark gray shaded region represents the range of mismatch expected for 
actual SSD CI listeners. While maximum performance was slightly better in conditions with 
better spectral resolution, performance dropped off substantially with relatively small spectral 
shifts in these conditions. Conditions with fewer vocoder channels were more immune to the 
effects of spectral shift. Error bars represent ± one standard error of the mean. 
 
Figure 2.5 plots mean performance (averaged across TMR) as a function of spectral 
shift for the four vocoder-channel conditions, along with curves (Pearson Type 7, four free 
parameters; see footnote 3) fitted to the data. As in Fig. 2.1, the vertical shaded region indicates the 
expected range of spectral mismatch across the cochlear partition for average SSD-CI 
listeners (based on Landsberger et al., 2015). The horizontal shaded region indicates 
monaural performance. For completeness, Fig. 2.6 plots the effects of spectral resolution and 
spectral mismatch separately for each TMR.  
 
 
Figure 2.6.  Results from experiment 2.3 plotting mean performance in correctly identifying 
the target number and color as a function of spectral mismatch, with data plotted separately for 
the two TMRs tested: (A) 0 dB and (B) 4 dB. Mean monaural performance is depicted by the 
horizontal line, with the horizontal light grey box representing ± one standard error of the mean. 
The vertical dark gray shaded region represents the range of mismatch expected for actual SSD 
CI listeners. Error bars represent ± one standard error of the mean. 
 
 
A binary-logistic regression analysis revealed significant main effects of spectral 
mismatch [χ²(9) = 50.1, p<0.001] and TMR [χ²(1) = 463.4, p<0.001], but no main effect of 
spectral resolution (p>0.05). There was a significant three-way interaction between all 
three factors [χ²(9) = 91.6, p<0.001], and significant two-way interactions between spectral 
resolution and spectral shift [χ²(9) = 110.0, p<0.001] and between TMR and spectral shift 
[χ²(9) = 1323.8, p<0.001].  The interaction between spectral resolution and spectral shift is visible 
in Fig. 2.5, whereby conditions with fewer channels (poorer spectral resolution) were more 
resilient to spectral shifts. For example, for a vocoder with 3-5 channels, performance was 
only marginally affected by a spectral shift of 4 ERBs, whereas for a vocoder with 8-10 
channels the contralateral unmasking benefit was almost completely eliminated by a 4-
ERB shift. 
Planned comparisons were made between performance with a spectral shift and 
performance in the monaural and zero-shift conditions. To reduce the number of tests, 
pairwise comparisons were only made between the monaural condition and the bilateral 0-
ERB and +4-ERB spectral-shift conditions. These two spectral-shift conditions were 
selected because they represent a perfect interaural spectral match, and a mismatch that fell 
within the range of the shifts expected for an average SSD-CI listener. Bonferroni 
corrections were made for 8 multiple comparisons (4 vocoder-channel conditions × 2 
TMRs). 
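As a brief illustration of what a Bonferroni correction for 8 comparisons implies, the sketch below computes the per-comparison criterion. It assumes a family-wise criterion of 0.05 (the p<0.05 level reported in the text) and the standard approach of dividing that criterion by the number of comparisons. The function names and example p-values are hypothetical, not data from the study.

```python
def bonferroni_threshold(family_alpha, n_comparisons):
    """Per-comparison significance criterion under a Bonferroni correction."""
    return family_alpha / n_comparisons

def is_significant(p_value, family_alpha, n_comparisons):
    """True if an uncorrected p-value survives the Bonferroni correction."""
    return p_value < bonferroni_threshold(family_alpha, n_comparisons)

# 4 vocoder-channel conditions x 2 TMRs, as described in the text.
n_comparisons = 4 * 2
threshold = bonferroni_threshold(0.05, n_comparisons)
print(f"per-comparison criterion: {threshold}")

# Hypothetical uncorrected p-values, for illustration only.
for p in (0.001, 0.01, 0.04):
    print(p, is_significant(p, 0.05, n_comparisons))
```

An equivalent formulation multiplies each uncorrected p-value by the number of comparisons and compares the result with 0.05.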
Overall, there were many more significant effects (p<0.05) for the 0-dB than for 
the 4-dB TMR. The 0-dB data are discussed first. With no spectral shift (0 ERBs) there 
was significant contralateral unmasking for vocoders with 3, 8 or 10 channels (p<0.05). 
However, conditions with better spectral resolution (more channels) were more sensitive 
to spectral mismatch. A spectral shift of 4 ERBs significantly reduced performance 
(relative to the 0-ERB condition) for a 10-channel vocoder, but not for vocoders with 3, 5 
or 8 channels. Only the 3-channel vocoder still yielded significant contralateral unmasking 
(relative to the monaural condition) when there was a 4-ERB spectral shift. While relatively 
few of the comparisons were significant for the 4-dB TMR, there was also some indication 
of the same basic pattern of results. Only the 10-channel condition showed reduced 
performance for a 4-ERB relative to a 0-ERB shift, while only the 5-channel condition 
showed significant contralateral unmasking with a 4-ERB shift. 
Summary. 
In summary, the results of experiment 2.3 show that contralateral unmasking was 
greater for a vocoder with higher spectral resolution when the spectrum was matched 
perfectly, but that vocoders with poorer spectral resolution were more robust to spectral 
mismatch.  
 
Experiment 2.4: The role of spectral and temporal mismatches on contralateral 
unmasking in simulations of CI users with SSD 
 
Experiment 2.4 explored the interaction between spectral and temporal mismatch 
in their effect on contralateral unmasking. We hypothesized that the negative impact of a 
mismatch in one dimension might be compounded by a mismatch in the other dimension.  
 
Methods. 
Participants.  Ten NH listeners participated in this experiment, 4 of whom had 
participated in both Experiments 2.2 and 2.3; these listeners completed Experiments 2.2 and 2.3 
before completing Experiment 2.4. Listeners were tested at the Air Force Research 
Laboratory, Wright-Patterson Air Force Base, Ohio.  
Stimuli.  The methods were generally the same as in the previous experiments, 
except that spectral and temporal mismatches were combined. This experiment employed 
a 10-channel vocoder with a frequency range of 354 to 5752 Hz. The vocoder bandwidth 
was larger than in Experiments 2.1 and 2.3 because a narrower range of spectral shifts was 
tested (±4 ERBs). 
Procedure.  Listeners were presented with 60 trials for each combination of spectral 
shift (0, ±2 and ±4 ERBs), temporal shift (0, ±12, ±18, ±24, ±50 and ±100 ms) and TMR 
(0 and 4 dB). Because the monaural conditions were coded as additional temporal-shift 
conditions in the experimental software, listeners completed 5 times as many trials (i.e., 
300) at each TMR in the monaural conditions. Each block consisted of 48 trials with a fixed 
temporal shift (or monaural presentation), while the TMR and spectral shift (where 
applicable) varied randomly from trial to trial within the block.  
 
 
 
 
 
 
 
 
 
Results. 
 
Figure 2.7.  Results from experiment 2.4 plotting mean performance in correctly identifying 
the target number and color, averaged across TMR. Mean monaural performance is depicted 
by the horizontal line, with the horizontal light grey box representing ± one standard error of 
the mean. The vertical dark gray shaded region represents the range of spectral or temporal 
mismatch expected for actual SSD CI listeners. (A) Data plotted as a function of temporal shift. 
(B) The same data plotted as a function of spectral shift, with the ±50 and ±100 ms temporal-
shift conditions excluded for visual clarity. Error bars represent ± one standard error of the 
mean. 
 
Figure 2.7A plots the mean performance as a function of temporal shift, with 
individual curves representing the different spectral-shift conditions. Figure 2.7B plots the 
same data as a function of spectral shift, with individual curves representing the different 
temporal-shift conditions tested.  The ±50- and ±100-ms conditions were excluded from 
Fig. 2.7B for visual clarity. A binary-logistic regression analysis revealed significant main 
effects of TMR [χ²(1) = 637.2, p<0.001], temporal shift [χ²(8) = 81.8, p<0.001] and spectral 
shift [χ²(4) = 17.0, p<0.05], significant two-way interactions between spectral shift and 
temporal shift [χ²(12) = 5.46E13, p<0.001] and between TMR and temporal shift [χ²(8) = 53.2, 
p<0.001], and a significant three-way interaction between all three factors [χ²(11) = 
63320198.5, p<0.001]. The two-way interaction between TMR and spectral shift was not 
significant (p>0.05). The interaction between the effects of spectral and temporal mismatch 
is visible in Fig. 2.7, whereby the effect of a mismatch in one dimension became more 
muted (i.e., the individual curves in Figs. 2.7A and B became flatter) when there was also 
a mismatch in the other dimension. For completeness, Fig. 2.8 plots the effect of temporal 
and spectral mismatch as a function of TMR. 
Figure 2.8.  Results from experiment 2.4 plotting mean performance in correctly identifying 
the target number and color, with data plotted separately for the two TMRs tested. Mean 
monaural performance is depicted by the horizontal line, with the horizontal light grey box 
representing ± one standard error of the mean. The vertical dark gray shaded region represents 
the range of spectral or temporal mismatch expected for actual SSD CI listeners. The scale for 
the ordinate changes in each panel. Top row: data plotted as a function of temporal shift for 
TMRs of (A) 0 dB and (B) 4 dB. Bottom row: data plotted as a function of spectral shift, with 
the ±50 and ±100 ms conditions excluded for clarity, for TMRs of (C) 0 dB and (D) 4 dB. Error 
bars represent ± one standard error of the mean. 
 
Post-hoc tests further evaluated the interaction between spectral and temporal 
mismatch. To limit the number of planned comparisons, pairwise comparisons were made 
only for conditions involving spectrally or temporally matched stimuli (0 ERBs or 0 ms) 
and spectral or temporal mismatches in the range expected for an average SSD-CI listener 
(+4 ERBs or +12 ms). Bonferroni corrections were made for 2 multiple comparisons: +4 
ERBs and +12 ms (2 TMRs). Significant differences were only observed for the 0-dB 
TMR. When there was no spectral mismatch (0 ERBs), a 12-ms temporal mismatch 
significantly reduced performance (p<0.05). But there was no significant effect of a 12-ms 
temporal mismatch when there was also a 4-ERB spectral mismatch (p>0.05).  When there 
was no temporal mismatch, a 4-ERB spectral mismatch just failed to significantly reduce 
performance relative to the 0-ERB condition (p>0.05).  There was no effect of a 4-ERB 
mismatch when there was also a 12-ms temporal mismatch (p>0.05). 
Summary. 
In summary, these results show that while mismatches in either dimension 
(temporal or spectral) can reduce contralateral unmasking, these two types of mismatch do 
not interact in an additive fashion. Instead, the results show that once masking release was 
diminished by a shift in one dimension, an additional shift in the other dimension had a 
relatively small effect.  
 
 
 
Discussion 
The results of the four experiments in this study demonstrate that interaural spectral 
and temporal mismatch introduced into vocoder processing can reduce or eliminate the 
contralateral unmasking effects experienced by NH listeners presented with unprocessed 
sounds in one ear and vocoded sounds in the other ear. The competing-talker paradigm 
produced a situation where it would have been very difficult to perceptually separate the 
target talker of interest from the concurrent same-gender interfering talkers based on 
monaural cues alone. The addition of a second copy of the interfering voices 
contralaterally, either via vocoder processing to a second NH ear (Bernstein et al., 2015, 
2016) or via direct connection to a CI (Bernstein et al., 2016), has previously been shown 
to provide the listener with sufficient interaural cues to facilitate the perceptual separation 
of concurrent voices and improve performance in the speech-identification task. The 
current study replicated this finding for NH listeners presented with vocoded stimuli, and 
extended it by providing information about the degree of interaural spectral and temporal 
alignment required to facilitate the effect.  
Across the four experiments, the greatest amount of contralateral unmasking was 
achieved in the “ideal” condition whereby a high-resolution vocoded signal was exactly 
matched both spectrally and temporally to the unprocessed ear. Performance in this ideal 
situation was as much as 20 percentage points better than in 
the monaural condition. This result suggests that SSD-CI listeners should perform best 
in this type of task if signal-processing and clinical frequency mapping procedures could 
be established to achieve interaural alignment in both the spectral and temporal dimensions. 
If perfect spectral and temporal matches cannot be obtained, the “optimal” level of 
performance would be difficult to achieve. In cases where there was a spectral mismatch of 
more than 2 ERBs or a temporal mismatch of 12 ms or more, there was a significant 
reduction in contralateral unmasking performance relative to the ideally matched condition. 
Given the likely difficulty in achieving these tight tolerances with current technology, it 
will be necessary for clinicians to tolerate some level of interaural mismatch, as well as 
limited spectral resolution, when fitting and counseling SSD-CI patients. The implications 
of distortions in each of these dimensions are discussed in the following. 
Impacts of a spectral mismatch. 
When considered relative to the range of mismatch expected for an average SSD-
CI listener, spectral mismatch had the greatest negative impact on contralateral unmasking 
of the three distortions examined in this study. Contralateral unmasking was maximal for 
a vocoder with no spectral shift, decreasing to approximately half this maximum value for 
a shift of ±2 ERBs, and to nearly zero (i.e., no contralateral unmasking) for a shift of ±4 
ERBs or more, in the range expected for an average CI user (Landsberger et al., 2015).  
Reduced contralateral unmasking with spectral mismatch is consistent with other 
examples whereby interaural frequency match affects binaural processing for bilateral CI 
or bilateral vocoder listeners. Small mismatches (on the order of 3 mm) disrupt 
ILD and ITD discrimination performance for bilateral CI users (Goupell et al., 2013; Kan 
et al., 2013) and bilateral vocoder listeners (Siciliano, Faulkner, Rosen, & Mair, 2010). 
Similarly, binaural fusion has been found to be disrupted by spectral compression for NH 
listeners presented with bilaterally vocoded signals (Aronoff et al., 2015). For bilateral CI 
listeners, interaural spectral mismatch reduces binaural fusion, causing a single stimulus to 
be perceived as multiple sounds (Kan et al., 2013). Binaural fusion is an important 
prerequisite to the proper grouping and perceptual separation of concurrent sounds in the 
environment (Bregman, 1994). In the current study, the perceptual fusion of the 
unprocessed interferers presented to one ear and the vocoded interferers presented to the 
other ear would be required to allow the listener to perceive these sounds as a single 
auditory object and perceptually separate them from the monaural target speech. One way 
of interpreting the results is that spectral mismatch disrupted the interaural envelope 
correlation that is required to facilitate fusion. Correlated envelope information between 
the ears has been shown to facilitate auditory object formation and fusion (Carrell & Opie, 
1992). 
These results are also consistent with a recent vocoder study that examined the role 
of spectral mismatch and its effect on integration of speech information across ears. Ma, 
Morris, and Kitterick (2016) found that bilateral presentation of the vocoded speech 
resulted in better performance than with stimuli presented monaurally, but that this 
improvement was reduced by an interaural spectral mismatch. One caveat to the 
interpretation of these results is that Ma et al. (2016) presented target speech information 
to both ears, which makes it difficult to know whether the result reflects the effects of 
spectral shifts on binaural integration or better-ear listening.  In the current study, no target 
speech information was presented to the vocoder ear, ensuring that the observed effects 
reflect the integration of information across the ears.  
The current results regarding the effects of spectral mismatch contrast with the 
results of Bernstein et al. (2016), who found that most of the SSD-CI listeners in the study 
did obtain a substantial release from masking. The release occurred despite the likelihood 
that many of the listeners experienced an interaural spectral mismatch on the order 
of 4 ERBs or more (Landsberger et al., 2015), which should have been large enough to 
extinguish any contralateral unmasking. The reason for this discrepancy between the 
studies is not clear, but perhaps it could be related to plasticity/adaptation effects for the CI 
listeners. The NH listeners in the current study were naïve to the mismatched frequency 
channels used in this experiment, whereas the CI listeners in the prior experiment had 
substantial experience listening to their own possibly mismatched frequency maps. Indeed, 
previous work has shown that CI listeners can adapt to frequency mismatches over time 
(Svirsky et al., 2004; Reiss et al., 2007). Alternatively, it is possible that the actual SSD-
CI listeners had relatively poor spectral resolution as a result of current spread and auditory-
nerve activation across a broad swath of the cochlea for a given electrode. Current spread 
in CIs typically causes reduced speech understanding in noise (Srinivasan, Padilla, 
Shannon & Landsberger, 2013). However, in cases where the listener does not depend on 
the vocoded signals for information about the target speech, Experiment 2.3 showed that 
reduced frequency resolution could mitigate the negative effects of spectral mismatch on 
contralateral unmasking. 
 
Impacts of spectral resolution.  
Spectral resolution had relatively little effect on contralateral unmasking under 
conditions with no spectral mismatch, in that contralateral unmasking was observed even 
in conditions with few vocoder channels. This result is in qualitative agreement with 
Bernstein et al. (2015) who found that contralateral unmasking was maximal with six 
vocoder channels, was only modestly reduced with four channels, and did not disappear 
completely until processing was carried out in a single broadband vocoder channel. In 
contrast, spectral resolution had a substantial impact on contralateral unmasking when 
there was also a spectral shift. Contrary to the typical improvement in speech perception 
associated with better resolution (e.g., Nie et al., 2006), in this case better spectral 
resolution actually led to poorer performance when there was also a spectral mismatch 
present.  
Our interpretation of this result is that in conditions with broader channels, there 
was more resilience to spectral shift because at least some of the synthesis-filter bandwidth 
overlapped with the analysis-filter frequency range from which the signal was derived. 
This can be seen in Fig. 2.4, which shows the analysis and synthesis filter cutoff frequencies 
for the conditions with a 4-ERB spectral mismatch. For a vocoder with 3 channels, there is 
some overlap in the analysis and synthesis filter bandwidths for a given channel, which 
likely yielded some interaural correlation between the envelopes at a given cochlear place. 
In contrast, for a vocoder with 10 channels, there was no overlap between the analysis and 
synthesis filter bandwidths for a given channel. This would cause the envelope in a given 
frequency region to be decorrelated with the acoustic envelope in the other ear in the 
corresponding frequency region, thereby limiting contralateral unmasking.  The 
bandwidths for the 5-channel vocoder are somewhat at odds with this interpretation, since 
the analysis and synthesis filters do not overlap, yet significant contralateral unmasking 
was observed. However, it is known that there is some envelope correlation between 
neighboring spectral bands (Buss, Whittle, Grose, & Hall, 2009); the analysis and synthesis 
filters might have still been close enough to one another in the 5-channel case to allow for 
some interaural correlation. 
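The overlap argument above can be illustrated with a simplified calculation. The sketch below assumes that the vocoder channels have equal widths on an ERB-number scale and that the analysis range spans 14 ERBs in total. Both are assumptions made only for illustration, since the actual range for Experiment 2.3 is not given in this passage, and the function names are hypothetical.

```python
def channel_width_erbs(total_span_erbs, n_channels):
    """Width of one channel when the span is split into equal ERB-scale bands."""
    return total_span_erbs / n_channels

def overlap_fraction(total_span_erbs, n_channels, shift_erbs):
    """Fraction of a synthesis band that still overlaps its own analysis band."""
    width = channel_width_erbs(total_span_erbs, n_channels)
    return max(0.0, 1.0 - abs(shift_erbs) / width)

def gap_in_channel_widths(total_span_erbs, n_channels, shift_erbs):
    """Gap between matching analysis and synthesis bands, in channel widths."""
    width = channel_width_erbs(total_span_erbs, n_channels)
    return max(0.0, abs(shift_erbs) - width) / width

ASSUMED_SPAN_ERBS = 14.0  # hypothetical total analysis range
SHIFT_ERBS = 4.0          # the mismatch shown in Fig. 2.4

for n in (3, 5, 8, 10):
    ov = overlap_fraction(ASSUMED_SPAN_ERBS, n, SHIFT_ERBS)
    gap = gap_in_channel_widths(ASSUMED_SPAN_ERBS, n, SHIFT_ERBS)
    print(f"{n:2d} channels: overlap = {ov:.2f}, gap = {gap:.2f} channel widths")
```

Under these assumptions only the 3-channel vocoder retains any overlap, the 5-channel bands are separated by less than one channel width, and the 8- and 10-channel bands are separated by more than one channel width. This matches the qualitative pattern described for Fig. 2.4.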
It should be noted that even though poorer vocoder resolution led to an advantage 
in this particular instance involving spectral mismatch, this does not necessarily mean that 
SSD-CI listeners might benefit in general from reduced spectral resolution. For example, 
if the experiment had been designed differently with the target presented to the vocoded 
ear, reduced spectral resolution is likely to have yielded a reduction in performance. Thus, 
in situations where SSD-CI listeners take advantage of a better SNR at the CI ear as a result 
of acoustic head shadow (e.g., Vermeire & van de Heyning, 2009; Arndt et al. 2011; Firszt 
et al. 2012), reduced spectral resolution would be more likely to harm performance. 
 
Effects of temporal mismatch. 
Experiments 2.2 and 2.4 examined the effect of interaural temporal mismatch on 
contralateral unmasking. Contralateral unmasking was only modestly affected by temporal 
mismatch in the expected range for SSD-CI listeners (0.5–12 ms), although larger 
mismatches of 24 ms or greater did substantially reduce the unmasking effect (Figure 2.2). 
A possible explanation for these results involves the integration of early reflections to enhance 
perception of the speech signal. Reflections that occur within a small window (< 
50 ms) have been shown to enhance speech perception, whereas reflections that occur later 
degrade a listener’s ability to hear the speech signal. These early reflections (or 
delays) combine with the original signal and enhance the SNR, which could explain why 
small delays did not disrupt contralateral unmasking (Bradley, Reich, & Norcross, 1999; 
Soulodre, Popplewell, & Bradley, 1989).  Related to the early-reflections interpretation, the 
robustness of contralateral unmasking to interaural delays less than 24 ms is consistent with 
the well-established precedence effect for speech, which has a high echo threshold, i.e., 
a long maximum duration over which stimulus echoes “fuse” with the direct sound and are 
perceived as part of a single auditory object (Litovsky et al., 1999). This process enables 
humans to hear in highly reverberant environments and is likely the result of cortical 
processing (Miller et al., 2009), with speech sounds arriving within a 30 – 40 ms window 
being perceived as one auditory object (Grant, Wassenhove, & Poeppel, 2004; Litovsky et 
al., 1999). An interesting caveat to interpreting these results in 
terms of the precedence effect is that the drop-off in performance with temporal mismatch 
was relatively symmetrical. If the results of this experiment truly did reflect a precedence 
mechanism, then listeners would likely have performed much better in the vocoder-leading 
conditions. The leading vocoded sound would have taken 
“precedence,” and the listeners would have received a clear spatial cue that the vocoded 
voices were on the right and the target was on the left. The opposite should have occurred 
in the acoustic-ear-leading conditions. The leading acoustic interferers should have sounded 
like they were primarily coming from the acoustic side, combined with the target in that 
ear. This would have strongly degraded performance in the acoustic-ear-leading relative to the 
acoustic-ear-lagging conditions. This was not what occurred; therefore, an alternative 
explanation based on interaural envelope correlation might be appropriate.  The effects of 
temporal mismatch on contralateral unmasking might be thought of in terms of interaural 
coherence of the interferer envelopes. Speech contains inherent envelope fluctuations from 
2–5 Hz for syllables and 15–30 Hz for phonemes (Elliott & Theunissen, 2009), 
corresponding to modulation periods of 200–500 ms for syllables and roughly 33–67 ms for 
phonemes. For these slow modulations, some interaural temporal misalignment will have 
relatively little impact on interaural correlation of speech envelopes.  
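A simple way to see why slow envelope modulations tolerate small interaural delays is to treat the envelope as a single sinusoidal modulation. This is a deliberate simplification of real speech envelopes and is not an analysis from the study. For such an envelope, the correlation between the original and a delayed copy is the cosine of the phase lag, cos(2πfτ). The modulation rates below are illustrative values chosen from within the ranges cited above.

```python
import math

def envelope_correlation(mod_rate_hz, delay_ms):
    """Correlation between a sinusoidal envelope and a copy delayed by delay_ms."""
    phase_lag = 2.0 * math.pi * mod_rate_hz * (delay_ms / 1000.0)
    return math.cos(phase_lag)

# Illustrative modulation rates: 4 Hz (syllable range), 20 Hz (phoneme range).
for rate_hz in (4.0, 20.0):
    for delay_ms in (12.0, 24.0, 100.0):
        r = envelope_correlation(rate_hz, delay_ms)
        print(f"{rate_hz:4.1f} Hz modulation, {delay_ms:5.1f} ms delay: r = {r:+.2f}")
```

In this idealized case, a 12-ms delay leaves a 4-Hz envelope correlated at about 0.95, whereas a 100-ms delay drives the correlation negative. Faster modulations decorrelate at much shorter delays.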
 
Effects of combined spectral and temporal mismatch. 
SSD-CI listeners are likely to experience both a temporal and a spectral mismatch 
simultaneously. Experiment 2.4 investigated the interaction between these two distortions. 
We hypothesized that these distortions would be additive — that introducing a temporal 
mismatch in addition to a spectral mismatch (or vice versa) would cause an even larger 
reduction in contralateral unmasking. Temporal delays caused by frequency-dependent 
differences in signal latency between the NH ear and CI ear could disrupt bilateral 
unmasking because common onset times are an important cue for grouping of sounds 
(Bregman, 1994). This disruption could be especially pronounced when accompanied by 
spectral compression, which is known to limit binaural fusion (Aronoff et al., 2015). 
However, the results did not support this hypothesis. In fact, the opposite occurred: 
mismatch in one dimension had a smaller additional effect if there was already mismatch 
in the other dimension (Fig. 2.7). One interpretation of this result is that if fusion is already 
disrupted, then further distortion does not have as much effect.  From a clinical perspective, 
this result suggests that if one distortion is present, then reducing the other distortion 
through signal processing or re-programming the CI will only modestly improve 
contralateral unmasking. To maximize contralateral unmasking, temporal and 
spectral interaural disparities must both be minimized. 
 
Implications for SSD-CI listeners. 
These results suggest that to maximize a listener’s ability to use their two ears 
together to better understand speech in competing backgrounds, steps could be taken to 
minimize spectral and temporal distortions. Perhaps the most encouraging result from this 
paper is that the distortion that had the largest negative impact on contralateral unmasking 
— spectral mismatch — is also the distortion that can most readily be addressed clinically. 
Theoretically, a place-matched frequency mapping based on electrode location could be 
provided by an audiologist to better match the place of cochlear stimulation for a given CI 
electrode to the cochlear place of stimulation for an acoustic signal presented to the NH 
ear. This process would require estimates of the cochlear places of stimulation associated 
with individual electrodes in the array, which could be accomplished in one of several 
ways. Computerized tomography (CT) scans (Noble, Gifford, Hedley-Williams, Dawant, 
& Labadie, 2014) or radiographs (Landsberger et al., 2015) could be used to estimate the 
insertion angles of individual electrodes. Comparisons of ITD sensitivity for a given 
electrode and a range of acoustic stimulus frequencies might also provide information 
about which acoustic frequency would be best matched to a particular electrode (Goupell 
et al., 2013; Kan et al., 2013). Pitch matching between individual electrodes and acoustic 
stimuli (Carlyon et al., 2010) could also provide information about cochlear place of 
stimulation, although the pitch-matching estimates have been shown to be susceptible to 
adaptation effects (Reiss et al., 2014). Hu and Dietz (2015) compared pitch matching and 
ITD sensitivity in BICI users and found that pitch-matching preferences were nearly identical to 
the programmed electrode frequency bands, suggesting that pitch percepts adapt to the CI 
processor allocation. There were also large differences between the chosen pitch-matched 
pairs and maximal ITD-sensitive pairs, as shown in previous studies (Long et al., 2003; 
Poon, Eddington, Noel, & Colburn, 2009; van Hoesel & Clark, 1997). Due to the 
adaptability of pitch percepts and the biological importance of ITD sensitivity for binaural 
hearing, Hu and Dietz (2015) concluded that identifying ITD-sensitive electrode pairs is the 
most promising method for remapping a CI, at least for BICI listeners.  
Minimizing temporal mismatch is likely to be more difficult than minimizing 
spectral mismatch, given the expectation that CI stimulation will be delayed relative to the 
auditory-nerve response in the acoustic ear. In a group of bimodal CI listeners (CI in one 
ear and severely impaired acoustic hearing in the other ear), Francart and McDermott 
(2013) used lateralization judgements to establish that the auditory-nerve response time is 
faster in the CI ear than in the acoustic ear (with no hearing aid worn) by about 1.5 ms, due 
to the delay associated with the cochlear traveling wave in the acoustic ear (Rasetshwane, 
Argenyi, Neely, Kopun, & Gorga, 2013b). However, this study used direct 
stimulation to control the electrical stimulation pattern on the array, and did not make use 
of the listeners’ external speech processors. In everyday listening conditions, the speech 
processor can add substantial delay to the overall processing time, resulting in a slower 
auditory-nerve response time in the CI ear relative to the acoustic ear (Zirn et al. 
2015). Thus, the only ways to reduce the interaural temporal mismatch are (1) to reduce 
the processing time for the CI external speech processor, or (2) to introduce a delay to the 
acoustic ear. Many individuals who might be classified as SSD have some hearing loss in 
the acoustic ear and wear a hearing aid to provide amplification (e.g., Vermeire & van de 
Heyning 2009; Firszt et al. 2012). Theoretically, the time delays in the hearing-aid and CI 
ears could be adjusted to minimize interaural delay. However, this approach would not be 
reasonable for an SSD-CI listener with normal hearing in the acoustic ear, for whom adding 
any processing to the acoustic signal via a hearing-aid device is likely to be undesirable.  
 
Study Limitations. 
Vocoder simulations are imperfect estimates of the acoustic information that is 
delivered to the auditory nerve for a CI recipient (Freyman et al. 2008; Li & Loizou 2009).  
Nevertheless, Bernstein et al. (2016) employed the same contralateral unmasking paradigm 
as the current study, and found that NH listeners presented with vocoded signals yielded a 
similar qualitative pattern of results to actual SSD-CI listeners. Furthermore, the best 
performing SSD-CI listener obtained about the same amount of contralateral unmasking as 
the average vocoder listener. Thus, the substantial effects of spectral mismatch and, to some 
extent, of temporal mismatch observed for NH listeners presented with vocoder 
simulations suggest the possibility that minimizing these particular distortions could 
improve performance for SSD-CI listeners.  
Another important difference between CI and vocoder listeners is that the vocoder 
listeners did not have chronic exposure to the vocoded and interaurally mismatched stimuli, 
and therefore could not take advantage of any possible adaptation to distorted and 
mismatched inputs over time (Svirsky et al. 2004; Reiss et al. 2007). For CI users, it is 
possible that contralateral unmasking might emerge or improve following long-term 
exposure to mismatch. On the other hand, a subset of our vocoder listeners did take part in 
several of the experiments over time.  We did not observe any evidence of training effects 
over the course of the multiple experiments.  In the monaural conditions, for which the 
stimulus parameters were identical across the individual experiments, there was no 
evidence of training effects for the four listeners who participated in experiments 2.2 (34% 
correct), 2.3 (23% correct) and 2.4 (27% correct). There was also no evidence that our 
vocoder listeners grew resistant to spectral mismatch as the study progressed. With the 
exception of the conditions with very poor vocoder spectral resolution in Experiment 2.3, 
no significant contralateral unmasking was observed for a spectral mismatch of 4 ERBs in 
any of the experiments. Still, this study was not specifically designed to examine effects of 
training and plasticity; we cannot rule out the possibility that with more extensive and 
controlled exposure to inter-aurally mismatched stimuli, adaptation could emerge. 
An additional limitation of the current study is that it measured a very specific 
aspect of binaural hearing. These results might not generalize to all listening situations, and 
there could be listening environments and situations where remapping to reduce spectral 
mismatch might not be advantageous. While the contralateral unmasking paradigm 
demonstrates that CI and vocoder listeners are capable of experiencing squelch, the 
complete isolation of the interfering speech from the target in the CI or vocoder ear is an 
artificial situation that would not be encountered in real environments.  Still, Bernstein et 
al. (2015) showed that the squelch effect was reduced when a more typical 6 dB of 
contralateral attenuation was employed. Previous studies have shown that SSD listeners 
can benefit from a CI for sound localization and for taking advantage of better-ear listening 
(Arndt et al., 2010; Buechner et al., 2010; Erbele et al., 2015; Firszt et al., 2012; Hansen et 
al., 2013; Vermeire & van de Heyning, 2009; Zeitler et al., 2015). Reducing spectral 
mismatch might improve localization since small interaural offsets can cause a 
considerable disruption of the ITD and ILD cues needed for sound-source localization 
(Goupell et al., 2013; Kan, Stoelb, Litovsky, & Goupell, 2013b; Litovsky et al., 2012). On 
the other hand, altering the CI frequency allocation might impair speech perception in the 
implant ear, since the remapping would preclude the inclusion of low frequencies in the CI 
map. However, this loss of low-frequency information might not affect speech perception 
for an SSD-CI listener because head shadow is very limited at low frequencies (Bronkhorst 
& Plomp, 1988). As a result, in the free field, nearly identical low-frequency acoustic 
information should be available in the NH acoustic ear.  In any case, further work is needed 
to investigate the possible impact of remapping to reduce interaural mismatch on a wider 
variety of speech-perception and sound-localization tasks before clinical recommendations 
can be made to take this approach for SSD-CI listeners. 
 
Conclusions 
The results of the experiments presented here demonstrate that spectral and 
temporal interaural mismatches reduce contralateral unmasking in a speech-identification 
task with interfering talkers for NH listeners presented with unprocessed signals in one ear 
and vocoded signals in the other. Spectral mismatches in the range that an average SSD-
CI listener is likely to experience with standard frequency mapping (4–6 ERBs) were 
particularly detrimental to performance (Experiments 2.1, 2.3 and 2.4). The detrimental 
effect was mitigated to some extent by reducing the spectral resolution of the vocoder 
(Experiment 2.2), although an approach that purposefully reduces frequency resolution is 
likely to impair speech-reception performance in other conditions not tested here. 
Temporal mismatches in the range expected for SSD-CI listeners (<12 ms) had a less 
pronounced negative effect (Experiments 2.3 and 2.4), although maximum contralateral 
unmasking was observed when signals were aligned across the ears in both time and 
frequency (Experiment 2.4). Overall, the results of this study highlight the need for 
interaural alignment to maximize the use of interaural differences to parse a complex 
auditory scene involving multiple competing talkers when presented with unprocessed 
speech in one ear and only envelope information contralaterally. SSD-CI listeners might 
benefit from strategies to reduce interaural mismatch, such as frequency remapping or 
introducing a processing delay to the acoustic ear.  
 
Footnotes Chapter 2 
1 Cochlear implants are not currently labeled by the United States Food & Drug 
Administration for use for the treatment of SSD. 
2 Bernstein et al. (2016) also showed a similar squelch benefit for bilateral CI listeners, but 
the SSD-CI configuration is the focus of the current study. 
3 A Pearson type 7 distribution is similar to a normal distribution but can additionally account for 
kurtosis and skew; thus, the data could be more accurately represented using this type of 
fitting function. The free parameters used to fit the data were mean, variance (standard 
deviation), amplitude and skew.  
 
 
 
Chapter 3: Effect of compression and expansion on binaural 
hearing in simulations of SSD-CI listeners 
 
 
Introduction 
 
Binaural hearing is integral for sound localization and hearing speech in noisy 
environments. Therefore, individuals without binaural hearing are at a severe disadvantage 
when it comes to listening in our complex, noisy world. A form of hearing loss with 
functional limitations that has been traditionally under-appreciated is SSD, which refers to 
a profound loss of hearing in one ear. Traditionally, SSD was not treated because it was not 
considered incapacitating. However, after it became apparent that SSD was a disability, 
hearing aid treatments became available. Treatments include a CROS hearing aid, which 
routes the signals from the deaf side via wireless or wired link to a hearing-aid transducer 
placed in the NH ear.  Additionally, a BAHA that routes the signals from the deaf side of 
the head to the NH ear via a transducer surgically implanted to stimulate the listener’s skull 
has been used as a treatment for SSD.  These methods have been successful in alleviating 
some of the adverse effects of SSD mainly by giving access to signals presented on the 
deaf side (by delivering them to the NH side). However, these treatments do not restore 
binaural hearing, and these patients still have trouble with sound localization and hearing 
in noisy environments (Grantham et al., 2012; Linstrom et al., 2009). 
In the past several years, CIs have been considered as a possible new treatment 
option for SSD. A CI is a surgically implanted device that treats deafness by bypassing the 
dead or damaged hair cells in the cochlea by direct electrical stimulation of the spiral 
ganglion neurons in the auditory nerve.  Historically, CIs have only been implanted in the 
profoundly deaf.  Although CIs are not currently approved by the United States Food and 
Drug Administration for the treatment of SSD, criteria for implant candidacy at individual 
centers have been relaxed in the last few years and a substantial number of individuals with 
SSD in the U.S. have received CIs. Currently, the most compelling evidence for CIs aiding 
SSD listeners in spatial hearing comes from studies that have examined performance in 
localizing a sound source (Arndt et al., 2010; Firszt et al., 2012; Hansen et al., 2013) and 
from studies that have assessed the advantages for listening to speech in noise when there 
is a spatial separation between the two signals (Buechner et al., 2010; Firszt et al., 2012; 
Hansen et al., 2013).  
Binaural hearing provides the listener with two main advantages: head-shadow and 
squelch. Head-shadow allows listeners to take advantage of listening to the ear with the 
better SNR, regardless of which side of the head receives the better SNR (Schleich et al., 
2004). Squelch is a neural process that involves the use of differences in timing and level 
from sources originating in different locations to reduce the effective amount of masking. 
CIs mainly provide a benefit to SSD listeners in configurations where the signal is on the 
deaf side, and/or the masker is on the NH side of the head, consistent with the idea that the 
CI allows users to take advantage of head-shadow effects and a better-ear listening strategy 
(Arndt et al., 2010; Buechner et al., 2010; Firszt et al., 2012; Hansen et al., 2013). The 
magnitude of the head-shadow effect is approximately 2-5 dB for SSD-CI listeners 
(Kamal, Robinson, & Diaz, 2012; Schleich, Nopp & D’Haese, 2004b). Until recently, there 
has been limited evidence that a CI for SSD can provide binaural squelch to the listener. 
Squelch is defined operationally in this study as the improvement in speech understanding 
(relative to monaural performance) when the speech and noise are spatially separated and 
the ear with the poorer SNR is added. For the binaural squelch advantage there is no added 
target speech information that is not already available at the other ear; this kind of 
measurement isolates the component of the benefit related to binaural interactions.  The 
binaural squelch benefit is different than the head-shadow benefit. The head-shadow 
advantage is defined operationally in this study as the improvement in speech 
understanding when the speech and noise are spatially separated and the ear with the better 
SNR is added. 
A series of studies examining SSD-CI listeners (Bernstein et al., 2016; Bernstein et 
al., 2017) and NH listeners presented with vocoder simulations of SSD-CI listening 
(Bernstein et al., 2015; Wess et al., 2017) have demonstrated binaural squelch under certain 
conditions. Specifically, these studies found a significant squelch benefit in listening 
situations where the target and interfering speech were produced by talkers of the same 
gender. This was taken as evidence that having hearing restored in the deaf ear via a CI can 
provide spatial cues to perceptually separate competing talkers when they are difficult to 
perceptually separate based on monaural pitch and timbre cues alone.  
Despite the fact that, on average, SSD-CI listeners (and vocoder-simulated SSD-CI listeners) 
experience binaural squelch, there was considerable individual variability in the magnitude 
of the squelch benefit.  Furthermore, the squelch benefit was considerably larger for NH 
listeners presented with vocoder simulations of SSD-CI listeners than for actual SSD-CI 
listeners (Bernstein et al., 2016). There are many factors that could cause this inter-subject 
variability and limited squelch benefit for SSD-CI listeners, such as neural survival (e.g., 
Maslin et al. 2013), limited cortical plasticity (e.g., Litovsky et al. 2012; Maslin et al. 2013), 
electrical current spread (e.g., van Hoesel, 2012) and duration of deafness before 
implantation (e.g., Blamey et al., 2012). Alternatively, certain distortions inherent in CI 
processing are likely to lead to large differences in processing between the NH and CI ears 
which could limit binaural integration required to generate squelch. Wess et al. (2017) (as 
reported in Chapter 2) showed that spectral mismatch, and to some extent temporal 
mismatch, between a NH ear and a vocoder-processed ear reduced the magnitude of the 
squelch benefit. This chapter examines another possible factor that may limit speech-
perception benefits for SSD-CI listeners: a mismatch in loudness growth between the two 
ears.  
CIs have a dramatically reduced DR relative to a NH acoustic ear, due in part to a 
poor electrode-neural interface (Kawano et al., 1995). CI users have a reduced total DR of 
about 40 dB (McDermott & Varsavsky, 2009), whereas the DR for a healthy NH ear is 
approximately 120 dB (Moore, 2003).  Electrical stimulation allows for fewer just-
noticeable differences (JNDs) between threshold and maximum acceptable loudness level 
than an acoustic ear. It has been estimated that the number of intensity steps for CI listeners 
ranges from 7 to 45 (compared to 83 in NH listeners) and this number is highly variable 
across CI listeners (Nelson et al., 1996). This means that CI technology cannot encode the 
full DR into an electrical representation. Compression is a necessary step in CI signal 
processing; however, it does alter some important features present in the signal. 
Compression is usually implemented on the signal envelope via a static logarithmic 
function. A static compressive function has been shown to preserve semi-normal loudness 
growth in many CI users (Zeng & Shannon, 1992). Conversely, expansion relates to 
expanding the fast amplitude fluctuations of a speech signal envelope to potentially 
increase the intelligibility of the signal (van Buuren, Festen, & Houtgast, 1999). 
Compression could affect speech perception for SSD-CI listeners in three ways. 
First, compression is known to smear acoustic landmarks important not only for 
vowel comprehension but for identification of word boundaries (Li & Loizou, 2009). 
Second, compression distorts binaural cues by raising thresholds for discriminating ILDs 
(Grantham et al., 2008). CI listeners must rely on ILDs for spatial hearing cues, because 
ITD fine structure cues are discarded in CI processing (Loizou, 2006).  Thus, ILD threshold 
increments that occur with compression are likely to weaken an SSD-CI listener’s ability to 
perceptually stream apart acoustic events based on location, and thereby reduce binaural 
squelch.  As noted previously, the ability to achieve binaural squelch is based on ILDs, 
which provide information about where in space the target and interferers are located.  
Third, compression could alter the effective TMR, which could change the amount of 
speech information that comes through the CI ear.   
The specific attribute of the interaural difference in loudness growth that this study 
examined was a range of envelope compression and expansion factors in vocoder 
simulations of SSD-CI listening.  To examine the effects of envelope compression on 
speech perception, this study took as a starting point the results of prior studies (Bernstein 
et al., 2015, 2016; Wess et al., 2017; Chapter 2), that were designed to measure binaural 
squelch in the absence of head-shadow effects. In those studies, the target was presented 
to only the unprocessed ear while the interferers were presented simultaneously to both 
ears (bilateral condition) or to only the acoustic ear (monaural condition). This contralateral 
unmasking paradigm allowed for an examination of the binaural squelch (i.e., reduction in 
masking effectiveness) provided by the vocoded ear because no target speech information 
was provided to the vocoded ear. Moreover, the contralateral unmasking paradigm allowed 
for complete isolation of the interferer signals in the vocoder ear (i.e., no target energy in 
the vocoded ear), which is not realistic and would not happen in the real world.  
Furthermore, in real world competing talker situations, both ears contain mixtures of target 
and masker energy.  However, in the current compression/expansion experiment, it was 
important to have a mixture of target and interferers in the vocoder ear to examine the effect 
of a change in the effective TMR in that ear. Therefore, compression/expansion is likely to 
affect spatial hearing in this more realistic paradigm by altering TMRs and ILDs.  For these 
two reasons, the experiments described in this chapter examined the effects of 
compression/expansion using simulations of spatially separated target and interfering 
talkers.   
HRTFs were used to simulate the effects of the head on the amplitude and phase 
characteristics of a signal coming from any given direction. An HRTF is an individualized 
frequency response describing how a sound signal is transformed by the head, external ear 
and to a certain extent, torso (Gardner, 1995). HRTFs permit researchers to represent any 
degree in the horizontal plane, rather than being restrained by physical speaker locations. 
They also allow for signal manipulations that could not be reliably represented in the free-
field (i.e., SSD vocoder simulations). Most importantly, generalized HRTFs are reasonably 
reliable across individuals due to similarities in head size and shape among different 
listeners; this applies specifically to interaural cues and not pinna cues. Individualized 
HRTFs are better for localization tasks than generalized HRTFs, especially in the vertical 
plane. However, generalized HRTFs are consistent across individuals in producing spatial 
perceptions in the horizontal plane and are valuable tools for studying hearing in spatial 
noise (Begault, Wenzel, & Anderson, 2001; Wenzel, Wightman, & Kistler, 1991).  
 As in the previous chapter, this study utilized vocoder simulations of SSD-CI 
listening. Vocoder simulations are used extensively in CI research and are an invaluable 
tool for studying aspects of CI processing without potential sources of uncontrolled inter-
subject variability often found in experiments involving CI listeners. Although vocoder 
processing is an imperfect estimation of what CI users might hear (Freyman et al., 2008; 
Ihlefeld & Litovsky, 2012; Li & Loizou, 2009), the key advantage of using vocoder 
simulations in this set of experiments is that it allows for more direct control over the relative 
compression in the two ears than what could be achieved with actual CI listeners.   
The goal of this study was to examine how interaural differences in loudness growth 
in vocoder simulations of SSD-CI listening would affect two main benefits to speech 
perception that CIs are known to provide to listeners with SSD:  binaural squelch and head-
shadow. More specifically, this study examined the effect of envelope compression or 
expansion on binaural squelch (Experiment 3.1) and on head-shadow benefit (Experiment 
3.2). 
 
 
 
Experiment 3.1: The effect of envelope compression and expansion on squelch in 
simulations of cochlear implants for SSD listeners 
 
Experimental question. How do envelope compression and expansion affect 
squelch in an HRTF-generated virtual free-field environment? 
 
Hypothesis. The relative effects of both compression and expansion on squelch will 
likely depend on the TMR. The hypothesized effects of compression and expansion are 
summarized in Table I below. At positive TMRs, squelch was not expected based on results 
from previous studies (Bernstein et al., 2015, 2016) that indicated the most squelch occurs 
at negative TMRs where monaural cues are insufficient for the perceptual separation of 
competing talkers. Therefore, we did not expect an effect of compression or expansion at 
positive TMRs. At negative TMRs, compression will amplify quiet sounds (i.e., the target) 
relative to louder sounds (i.e., the interferers), which will effectively decrease the 
amplitude differences between target and interferers. Because in this case the target is the 
quieter sound, the low-level amplitude compression should make the ILD for the interferers 
and target more similar. This would reduce the perceived spatial difference between them 
and potentially reduce squelch. On the other hand, expansion should exaggerate the relative 
difference in amplitude between the target and interfering speech in the vocoder ear, 
thereby increasing the perceived spatial separation.   
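
The direction of these predicted effects can be illustrated with a short worked relation. This is only a sketch under simplifying assumptions: the envelope in the vocoded ear is raised to a power P (the power-law form described under Loudness manipulations), the target and interferer envelopes are treated separately (as if they did not overlap in time within a channel), and the later RMS level equalization is ignored. The symbols L_T and L_M (target and masker envelope levels in dB) are introduced here for illustration only.

```latex
\begin{align}
  L = 20\log_{10} x, \quad y = x^{P}
    &\;\Rightarrow\; 20\log_{10} y = P\,L \\
  \mathrm{TMR}_{\mathrm{out}} = P\,L_T - P\,L_M
    &= P\,\mathrm{TMR}_{\mathrm{in}}
\end{align}
```

Under these assumptions, a nominal TMR of -8 dB in the vocoded ear would become -4 dB with P = 0.5 (compression) and -16 dB with P = 2 (expansion). When the target and interferers overlap in time, the envelopes combine before the nonlinearity and the change in effective TMR would be less straightforward.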
 
 
 
 
 
 
Table I. Hypothesis table for Experiment 3.1 

TMR    Compression @ vocoded ear              Expansion @ vocoded ear 
+      Unlikely to see unmasking benefit      Unlikely to see unmasking benefit 
       No effect                              No effect 
−      ↑ Target level relative to masker      ↓ Target level relative to masker 
       ↓ Performance                          ↑ Performance 
 
Table I.  Hypothesis table indicating the predicted outcome after compression and expansion. 
Based on previous experiments, when the TMR is positive, listeners are unlikely to 
demonstrate squelch. Therefore, it was predicted that compression and/or expansion would not 
have any effect on performance. However, when TMRs are negative, an unmasking benefit is 
likely, and therefore compression and expansion are likely to affect performance. We predicted 
compression would increase the level of the target in the vocoded ear, making the target and 
masker level more similar, essentially reducing the ILD between the target and maskers, 
disrupting performance. Expansion should exaggerate the level difference between the target 
and masker and improve performance.  
 
 
 
 
 
Figure 3.1. Prediction of what might happen to the squelch advantage after compression in 
Experiment 3.1. 
 
 
 
Methods. 
 
 
Approach. This experiment employed an HRTF-based simulation of spatially 
separated targets and maskers. This paradigm allowed for more realistic presentation of 
competing speech signals, with both ears receiving a mixture of target and masker energy 
as would occur in the free-field. This experiment was similar to the contralateral-
unmasking paradigm of Bernstein et al. (2015, 2016) and Wess et al. (2017) in that it was 
used to measure the squelch benefit provided by a second (vocoded) ear in perceptually 
separating concurrent streams of speech. An additional similarity was that the target was 
located closest to the acoustic ear and the two same-gender maskers were located closer to 
the vocoded ear. As in these previous studies, monaural performance was compared to 
bilateral performance.  The amount of binaural squelch benefit was calculated as the 
difference between monaural (unprocessed ear only) and binaural performance – i.e., the 
magnitude of benefit the listener receives from the addition of the vocoded ear. In this 
experiment, the interfering speech was located closest to the vocoded ear, which therefore 
had a poorer TMR than the unprocessed ear.  Thus, the addition of the vocoder ear had the 
opportunity to provide a squelch benefit, but not a head-shadow advantage. The target was 
presented virtually on the left side (-60 degrees) and two interfering talkers were presented 
on the right (+60 degrees) using HRTFs (see Figure 3.2).  Compression and expansion 
were varied parametrically in the vocoder processing. This configuration was chosen 
because the largest possible head-shadow effects, on the order of 9 dB, arise for sources 
originating from a 60-degree azimuth (Culling, Jelfs, Talbert, Grange, & Backhouse, 2012). 
Therefore, for a target and interfering sources at ±60 degrees, the TMR 
is about 18 dB higher at the ear closer to the target source than at the ear closer to the 
interferer. However, this is the theoretical maximum and is unlikely to occur in the real 
world, where reverberation will decrease the SNR difference between the ears (Culling et 
al., 2012). These spatial configurations were specifically chosen to maximize spatial 
differences (head-shadow), while also providing enough spatial separation to facilitate the 
perceptual separation of target and interfering voices based on perceived differences in 
spatial location.  
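
The level arithmetic in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes a single broadband head-shadow value of 9 dB applied symmetrically to the nominal TMR, and the function names (ear_tmrs, squelch_benefit) are hypothetical, not taken from the study's analysis code.

```python
# Approximate broadband head-shadow magnitude quoted in the text for a
# source at 60 degrees azimuth (an idealized, anechoic value).
HEAD_SHADOW_DB = 9.0


def ear_tmrs(nominal_tmr_db, head_shadow_db=HEAD_SHADOW_DB):
    """Return (TMR at the ear nearer the target, TMR at the ear nearer the interferers).

    Assumes the head shadow raises the TMR by head_shadow_db at one ear
    and lowers it by the same amount at the other.
    """
    near_target = nominal_tmr_db + head_shadow_db
    near_interferers = nominal_tmr_db - head_shadow_db
    return near_target, near_interferers


def squelch_benefit(bilateral_correct, monaural_correct):
    """Squelch: change in proportion correct when the poorer-TMR ear is added."""
    return bilateral_correct - monaural_correct


if __name__ == "__main__":
    for tmr in (-16, -12, -8, -4, 0, 4):
        acoustic, vocoded = ear_tmrs(tmr)
        print(f"nominal {tmr:+d} dB: acoustic ear {acoustic:+.0f} dB, "
              f"vocoded ear {vocoded:+.0f} dB")
```

The interaural TMR difference is 18 dB at every nominal TMR under this assumption; in practice the difference varies with frequency and would be smaller in reverberant rooms.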
 
 
Figure 3.2.   Schematic of the squelch experimental setup in experiment 3.1. The stimuli were 
presented over headphones, and each spatial configuration was created by convolving the 
speech with a generalized HRTF before additional processing. The target talker is located at 
˗60 degrees, closer to the acoustic ear. The two same-gender maskers are located at +60 
degrees, closer to the vocoded ear. 
 
Participants. Experiment 3.1 was carried out at Walter Reed National Military 
Medical Center, Bethesda, Maryland. Seven paid listeners (age range 18-39) participated in 
this experiment. All listeners had NH (defined as symmetrical thresholds equal to or better 
than 20 dB hearing level at octave frequencies between 125 and 8000 Hz) and were free 
from cognitive and neurological disorders. All listeners were native English speakers. The 
listeners who participated in Experiment 3.1 also participated in Experiment 3.2. 
 
Stimuli. The target and interfering speech were taken from the CRM speech corpus for 
multi-talker communication research (Bolia et al., 2000; Brungart, 2001). The CRM corpus 
consists of phrases of the form “Ready (call sign) go to (color) (number) now.” There were 
eight possible call signs (“Arrow,” “Baron,” “Charlie,” “Eagle,” “Hopper,” “Laker,” 
“Ringo” and “Tiger”), four possible colors (“blue,” “green,” “red” and “white”), and eight 
possible integer numbers (one through eight). A typical sentence would be “Ready Charlie 
go to white five now.” The target sentence call sign was always “Baron,” which provided 
the cue for the listener to identify which of the concurrent talkers was the target. The 
interferers used other call signs (e.g., “Arrow” or “Ringo”). Eight speakers (four females, 
four males) were used to record all possible combinations. To maximize the difficulty in 
perceptually separating concurrent talkers in the target (acoustic) ear, this experiment used 
all same-gender interferers and targets (Brungart et al., 2001). These are the conditions that 
produced the most masking release for SSD-CI listeners (Bernstein et al. 2016) and for 
single-sided vocoder listeners (Bernstein et al. 2015, 2016). The three talkers (target and 
two interferers) in a given trial were always of the same gender, although the gender varied 
randomly from trial to trial. The three simultaneous sentences were constrained such that 
they were always spoken by a different talker and had a different call sign, color and 
number. The target speech was presented at 60 dB SPL, with the interferer level adjusted 
to yield the desired TMR. The following TMRs were tested: -16, -12, -8, -4, 0 and +4 dB. 
The TMRs were defined before the HRTFs and vocoding were applied. The nominal TMR varied 
from -16 dB to +4 dB; after the HRTFs were applied, the TMR at each ear varied across the 
frequency range, with a speech importance-function weighted average 9 dB higher than the 
nominal TMR in one ear and 9 dB lower in the other. 
The left ear was the unprocessed ear and the right ear was always the vocoded ear in this 
experiment. In the monaural condition, only the signals at the left (unprocessed) ear were 
presented. 
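
The constraints on the three concurrent sentences, and the level adjustment used to set the TMR, can be sketched as follows. This is a minimal illustration, not the study's stimulus code: the talker indices, the function names, and the reading of the TMR as the level of the target relative to each individual interferer are assumptions made for this example.

```python
import random

CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))
TARGET_CALL_SIGN = "Baron"
# Hypothetical talker indices: four female and four male talkers.
TALKERS = {"female": [0, 1, 2, 3], "male": [4, 5, 6, 7]}


def draw_trial(rng):
    """Draw one target and two interferer sentences.

    All three talkers share a gender, and the sentences differ in
    talker, call sign, color and number. The target is listed first.
    """
    gender = rng.choice(["female", "male"])
    talkers = rng.sample(TALKERS[gender], 3)
    other_calls = [c for c in CALL_SIGNS if c != TARGET_CALL_SIGN]
    calls = [TARGET_CALL_SIGN] + rng.sample(other_calls, 2)
    colors = rng.sample(COLORS, 3)
    numbers = rng.sample(NUMBERS, 3)
    return [
        {"talker": t, "call_sign": c, "color": col, "number": n}
        for t, c, col, n in zip(talkers, calls, colors, numbers)
    ]


def interferer_gain(tmr_db):
    """Linear amplitude gain applied to an interferer, relative to the target.

    Assumes TMR = target level minus interferer level, in dB.
    """
    return 10.0 ** (-tmr_db / 20.0)


if __name__ == "__main__":
    trial = draw_trial(random.Random(1))
    for sentence in trial:
        print(sentence)
    print("gain at -16 dB TMR:", round(interferer_gain(-16), 2))
```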
 
Generation of HRTFs.  We used HRTFs recorded at Oldenburg University (Wierstorf, 
Geier, Raake, & Spors, 2011). HRTFs were generated using an in-the-ear (ITE) microphone 
and a behind-the-ear (BTE) microphone (Siemens) on a Knowles Electronics Manikin for 
Acoustic Research (KEMAR). The HRTFs generated from the ITE microphone were used 
for stimuli presented to the unprocessed (left) ear and the BTE HRTFs were used for stimuli 
presented to the vocoded (right; CI simulation) ear. The excitation signal for the impulse 
response measurement used to create the HRTFs was presented from a loudspeaker at a 
distance of 80 cm from the center of the mannequin’s head. This study used HRTFs that 
were recorded at -60 and +60 degrees. 
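
Applying an HRTF to a speech signal amounts to convolving the signal with the left-ear and right-ear impulse responses for the desired azimuth. The sketch below illustrates this for the spatial configuration used here. It is an assumption-laden example rather than the study's code: the function names are hypothetical, the impulse responses are passed in as plain arrays, and all source signals (and all impulse responses) are assumed to have equal length.

```python
import numpy as np


def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right pair of head-related
    impulse responses; returns an array of shape (2, n_samples)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


def render_scene(target, interferers, hrir_target, hrir_interferer):
    """Sum a spatialized target and spatialized interferers at each ear.

    hrir_target and hrir_interferer are (left, right) tuples, e.g. the
    impulse responses for -60 and +60 degrees. In the configuration
    described above, the left responses would come from the ITE
    microphone and the right responses from the BTE microphone.
    """
    scene = spatialize(target, *hrir_target)
    for talker in interferers:
        scene = scene + spatialize(talker, *hrir_interferer)
    return scene
```

Row 0 of the result would be the signal at the unprocessed (left) ear and row 1 the signal passed on to the vocoder for the right ear.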
                       
Figure 3.3. Schematic of HRTF acquisition for an ITE microphone and a BTE microphone.   
 
Noise vocoding. An 8-channel noise vocoder was used to extract speech envelopes 
in a number of frequency channels and the envelopes were used to modulate bands of noise. 
First, stimuli were passed through a bank of fourth-order Butterworth (-24 dB/octave roll-off) 
“analysis” filters; the frequency range of the analysis filters was 100 to 10,000 Hz. The 
envelope of the signal in each channel was extracted via half-wave rectification followed by 
low-pass filtering at 400 Hz with a second-order Butterworth filter. Compression or 
expansion was applied (if applicable) at this stage in vocoder processing. Each envelope 
was then multiplied by a white noise carrier, with the resulting signal then passed through 
a series of bandpass “synthesis” filters. In this study, no frequency mismatch was applied. 
Frequency content was always delivered to the correct cochlear place, with the synthesis-
filter cutoff frequencies matching those of the analysis filters. Finally, the signals were 
summed across channels to create the noise-vocoded signal.  
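
The processing chain described above can be sketched with SciPy as follows. This is a minimal illustration under several assumptions that the text does not settle: the channel edges are spaced logarithmically between 100 and 10,000 Hz, the sampling rate is above 20 kHz, and the "fourth-order" specification is passed directly to scipy.signal.butter (which, for a band-pass design, yields a filter of twice that order). The function names are hypothetical, and the per-channel level equalization and delay removal described under Loudness manipulations are not implemented here.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def channel_edges(n_channels=8, f_lo=100.0, f_hi=10000.0):
    """Band edges in Hz. Logarithmic spacing is an assumption."""
    return np.geomspace(f_lo, f_hi, n_channels + 1)


def noise_vocode(x, fs, n_channels=8, exponent=1.0, seed=0):
    """Noise-vocode x with matched analysis and synthesis bands.

    exponent < 1 compresses and exponent > 1 expands each channel
    envelope; exponent = 1 leaves the envelope unchanged.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    edges = channel_edges(n_channels)
    # Second-order Butterworth low-pass at 400 Hz for envelope smoothing.
    env_sos = butter(2, 400.0, btype="lowpass", fs=fs, output="sos")
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(band_sos, x)                    # analysis filter
        env = sosfilt(env_sos, np.maximum(band, 0.0))  # half-wave rectify, smooth
        env = np.maximum(env, 0.0) ** exponent         # clip filter ripple, apply power law
        carrier = rng.standard_normal(len(x))          # white-noise carrier
        out += sosfilt(band_sos, env * carrier)        # synthesis filter, same band
    return out
```

Because the synthesis bands equal the analysis bands, this sketch delivers each envelope to its original frequency region, corresponding to the no-mismatch condition used in this chapter.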
 
Loudness manipulations.  Compression and expansion were implemented using a 
power-law function, $y = Ax^{P} + B$, where x is the input envelope, y is the output 
envelope, P is the exponent, and A and B are constants, with A the threshold and B 
the maximum comfortable level (expressed in microamps in electric hearing). Since this formula 
was used to manipulate the envelope of an acoustic signal, no noise floor or maximum 
comfortable level was necessary for our simulation; therefore, A and B were not included and 
the formula becomes $y = x^{P}$. Instantaneous compression or expansion was applied to 
signal envelopes with exponents of 0.25 and 0.50 (compression), 1 (linear, no compression), 
and 1.5 and 2 (expansion) before the envelopes were applied to the noise carriers. The input/output 
function and a stem plot for the compression, linear and expansion conditions are shown 
in Figure 3.4. The nonlinear transformation (compression or expansion) was applied to 
each vocoder channel independently. The level of the resulting signal in each channel was 
adjusted to be equal to the RMS level of the input signal for that channel, and the delays 
associated with the filtering process were removed. 
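
A minimal sketch of the envelope manipulation and the subsequent level equalization is given below. The helper names (rms, apply_power_law, match_rms) are hypothetical, and the example envelope is a toy signal chosen only to show that the RMS level is preserved while the envelope shape changes.

```python
import numpy as np


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def apply_power_law(envelope, exponent):
    """Instantaneous power-law transform of a non-negative envelope.

    exponent < 1 compresses, exponent > 1 expands, 1 is linear.
    """
    env = np.maximum(np.asarray(envelope, dtype=float), 0.0)
    return env ** exponent


def match_rms(signal, reference):
    """Scale signal so that its RMS equals that of reference."""
    current = rms(signal)
    if current == 0.0:
        return np.asarray(signal, dtype=float)
    return np.asarray(signal, dtype=float) * (rms(reference) / current)


if __name__ == "__main__":
    # Toy envelope: a slow modulation between 0.1 and 1.0.
    t = np.linspace(0.0, 1.0, 1000)
    envelope = 0.55 + 0.45 * np.sin(2 * np.pi * 4 * t)
    for p in (0.25, 0.5, 1.0, 1.5, 2.0):
        shaped = match_rms(apply_power_law(envelope, p), envelope)
        depth_db = 20 * np.log10(shaped.max() / shaped.min())
        print(f"exponent {p}: peak-to-trough range {depth_db:.1f} dB")
```

Equalizing the RMS after the transform means that the exponent changes the depth of the envelope fluctuations within a channel rather than the overall channel level.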
 
 
Figure 3.4. Input/output function for the compression, linear and expansion conditions.  
 
Procedure.  Listeners were instructed to identify the target talker based on the call 
sign “Baron” and repeat back the color and number reported by the target talker, while 
ignoring two interfering talkers who used different call signs. Listeners were told on which 
side of the head the target would be presented. For the training portion of the experiment, 
12 trials were presented for each TMR tested for the monaural condition and for the linear 
condition with no compression applied to the vocoder. Listeners were presented with 
blocks of 30 trials for a total of 120 trials in the training condition. The TMR varied 
randomly from trial to trial. The monaural and linear bilateral conditions were fixed for all 
trials in a block.  
For the experimental portion of the experiment, listeners were presented with 6 
different vocoder-compression conditions: two compression conditions (compression 
factors = 0.25 and 0.5), two expansion conditions (compression factors = 1.5 and 2.0), a 
linear condition (compression factor = 1.0), and a monaural condition where no stimulus 
(interferers) was presented to the vocoder ear. Listeners completed 18 trials for each 
combination of TMR (-16, -12, -8, -4, 0 and +4 dB) and vocoder-compression condition.  
Listeners were presented with blocks of 36 trials for a total of 648 trials for each listener. 
Within each block, only one vocoder-compression was presented but the TMR varied 
randomly from trial to trial. 
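
The trial counts for the experimental portion follow from the design: 6 conditions × 6 TMRs × 18 trials = 648 trials, or 18 blocks of 36. The sketch below builds one possible schedule with these properties. How TMRs were distributed across the blocks of a given condition, and how blocks were ordered, are not stated, so the shuffling here is an assumption, as are the condition labels and the function name.

```python
import random

TMRS_DB = [-16, -12, -8, -4, 0, 4]
# Hypothetical labels for the six vocoder-compression conditions.
CONDITIONS = ["monaural", "exp=0.25", "exp=0.5", "exp=1.0", "exp=1.5", "exp=2.0"]
TRIALS_PER_CELL = 18
BLOCK_SIZE = 36


def build_blocks(rng):
    """Return a list of (condition, [TMR per trial]) blocks.

    Each block holds a single condition; TMR varies randomly from
    trial to trial within the block.
    """
    blocks = []
    for condition in CONDITIONS:
        trials = TMRS_DB * TRIALS_PER_CELL  # 108 trials per condition
        rng.shuffle(trials)
        for start in range(0, len(trials), BLOCK_SIZE):
            blocks.append((condition, trials[start:start + BLOCK_SIZE]))
    rng.shuffle(blocks)  # block order is an assumption
    return blocks


if __name__ == "__main__":
    schedule = build_blocks(random.Random(0))
    total = sum(len(tmrs) for _, tmrs in schedule)
    print(len(schedule), "blocks,", total, "trials")
```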
Participants were seated in a sound booth and directed their attention to a computer 
screen. The speech stimulus was generated by MATLAB, played via an RME Hammerfall 
(Haimhausen, Germany) sound card, and presented over Sennheiser HD 280 headphones. 
The RMS of the signal varied depending on the TMR and spatial location. The RMS of the 
signals ranged from 60 – 75 dB SPL. The signal level was not fixed because preservation 
of the ILDs created by the HRTFs was necessary for this experiment. The computer screen 
displayed an eight-column, four-row array of colored digits corresponding to the response 
set of the CRM. The listener used the mouse to select the colored digit corresponding to 
the number and color spoken by the target talker who used the call sign “Baron”. After 
each response, the subject received feedback, with the button associated with the correct 
answer flashing briefly. For the response to be considered correct, both the color and 
number needed to be correctly identified.  
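
The scoring rule stated above can be written compactly. The sketch below is illustrative only; the function names are hypothetical, and the chance level assumes a listener guessing uniformly among the 32 buttons of the response array.

```python
from fractions import Fraction

COLORS = ("blue", "green", "red", "white")
NUMBERS = tuple(range(1, 9))

# Chance of selecting the correct button when guessing uniformly
# among the 4 x 8 response alternatives.
CHANCE = Fraction(1, len(COLORS) * len(NUMBERS))


def is_correct(response, target):
    """A trial is correct only if both color and number match.

    response and target are (color, number) tuples.
    """
    return response[0] == target[0] and response[1] == target[1]


def proportion_correct(responses, targets):
    """Proportion of trials with both keywords correct."""
    hits = sum(is_correct(r, t) for r, t in zip(responses, targets))
    return hits / len(targets)
```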
 
Results. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.5. The linear bilateral and monaural data from experiment 3.1. The monaural 
conditions are depicted with white circles and the linear bilateral conditions are depicted with 
black triangles. These data indicate that the listeners did not receive a significant squelch 
benefit (p > 0.05). 
 
 
Figure 3.5 plots the mean proportion of trials where the color and number were both 
correctly identified as a function of TMR for the monaural and linear vocoder conditions. 
Percent correct was determined by summing the correct responses for each keyword (color 
and number) separately, then that number was divided by two.  
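The keyword scoring described here can be illustrated with a short sketch. It assumes that each trial is stored as a (color_correct, number_correct) pair of booleans and that the halved keyword sum is expressed relative to the number of trials; both the data layout and the function names are hypothetical. The stricter both-correct proportion is computed alongside for comparison.

```python
def keyword_score(trials):
    """Proportion correct with color and number keywords scored separately."""
    color_hits = sum(1 for color_ok, _ in trials if color_ok)
    number_hits = sum(1 for _, number_ok in trials if number_ok)
    # Sum the two keyword counts, divide by two, then normalize by trial count.
    return (color_hits + number_hits) / 2 / len(trials)


def both_correct_score(trials):
    """Proportion of trials with color AND number both correct."""
    return sum(1 for color_ok, number_ok in trials if color_ok and number_ok) / len(trials)


# Hypothetical responses from four trials.
trials = [(True, True), (True, False), (False, False), (True, True)]
print(keyword_score(trials))       # 0.625
print(both_correct_score(trials))  # 0.5
```

The two measures differ whenever a listener gets only one keyword right, so the keyword score is never lower than the both-correct score.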
 
 
Figure 3.6.  Compression and expansion data from experiment 3.1. The left panel (3.6A) shows
the effect of compression on squelch compared to monaural and linear bilateral performance.
The right panel (3.6B) shows the effect of expansion on squelch. The negative effect of
compression compared to linear bilateral is clear at lower TMRs. The slightly positive effect
of expansion is also evident at lower TMRs.
 
 
Figure 3.6 plots the mean proportion of trials where the color and number were both
correctly identified as a function of TMR for all six vocoder-compression conditions tested
in the experiment. Figure 3.6.A shows the effect of compression, plotting the results for
the two compression conditions (exp = 0.25 and 0.50) along with the monaural and linear-vocoder
conditions replotted from Fig. 3.5. Figure 3.6.B plots the results for the two
expansion conditions (exp = 1.5 and 2.0) together with the monaural and linear-vocoder
conditions replotted from Fig. 3.5.
 
Figure 3.7.  Data from experiment 3.1 plotted as a function of the compression parameter and
TMR. The trend of improvement in squelch from compression to expansion is more clearly
represented in this graph. Especially at low TMRs, listeners’ performance improves as the
vocoded signal moves from highly compressive to expansive. The dashed lines indicate
monaural performance for a given TMR.
 
 
Figure 3.7 plots the same data as in Figures 3.5 and 3.6, but as a function of
the compression parameter and TMR.  The horizontal dashed lines indicate monaural
performance for a given TMR. Plotted this way, these data more clearly show an effect of
compression/expansion condition on performance, especially at lower TMRs.
The data were analyzed using a repeated-measures binary-logistic regression 
analysis with two within-subject factors (compression parameter and TMR). This analysis 
was used because the data were binary in nature (correct or not) and the analysis takes into 
account the likelihood that percentage-correct scores are different based on the number of 
trials presented. The initial analysis included all the vocoder conditions (plus monaural) as 
well as all the TMRs tested. There was no significant main effect of condition, but there 
was a significant main effect of TMR [χ² (5) = 10555.7, p<0.001] and a significant 
interaction between TMR and condition [χ² (6) = 148.5, p<0.001].  A subsequent series of 
binary-logistic regression analyses was conducted to determine the source of the significant 
interaction by examining the effect of compression/expansion condition on performance at 
each TMR.  Bonferroni corrections for 6 comparisons (TMR) were applied.  There was a 
significant main effect of compression/expansion condition for all TMRs except -4 dB. The 
statistical results for each TMR are as follows: TMR -16 dB = [χ² (5) = 99.1, p<0.005], 
TMR -12 dB = [χ² (5) = 41.9,  p<0.005], TMR -8 dB  = [χ² (5) = 32.5, p<0.005], TMR -4 
dB = [χ² (5) = 11.7, p=0.25], TMR 0 dB = [χ² (5) = 23.9, p<0.005] and TMR +4 dB = [χ² 
(5) = 161.4, p<0.005].  
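As a reminder of what the Bonferroni correction for 6 comparisons does, the sketch below multiplies each raw p value by the number of comparisons (capped at 1), which is equivalent to testing each raw p value against 0.05 / 6. The raw p values used here are hypothetical and are not the values obtained in this experiment; the function name is likewise illustrative.

```python
def bonferroni(p_values, n_comparisons):
    """Return Bonferroni-adjusted p values, capped at 1.0."""
    return [min(1.0, p * n_comparisons) for p in p_values]


raw_p = [0.0004, 0.03, 0.2]            # hypothetical uncorrected p values
adjusted = bonferroni(raw_p, n_comparisons=6)
print([round(p, 4) for p in adjusted])  # [0.0024, 0.18, 1.0]

# Equivalent decision rule: compare raw p values with a stricter alpha.
alpha_per_test = 0.05 / 6
significant = [p < alpha_per_test for p in raw_p]
print(significant)                      # [True, False, False]
```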
  
For the TMRs that showed a significant main effect, a series of post-hoc pairwise
comparisons was conducted to determine which pairs of conditions differed significantly.
Bonferroni corrections for 15 comparisons were applied after statistical analysis.  The
results of the post-hoc tests are summarized in Table II. In short, high compression
(exp = 0.25) disrupted performance compared to expansion (exp = 2.0) at -12 and -16 dB TMR
and compared to the linear condition at -16 dB TMR.
 
Comparison        TMR       p value
0.25 vs Linear    -16 dB    p=0.03
0.25 vs 2.0       -16 dB    p<0.001
0.25 vs 2.0       -12 dB    p=0.01
 
Table II. Results from the post hoc tests from experiment 3.1. These results indicate that high 
compression (exp= 0.25) disrupts performance compared to a linear (exp=1.0) vocoded signal 
or an expanded one (exp =2.0). 
 
Summary. 
In summary, the results of experiment 3.1 demonstrated an effect of the
compression coefficient. As the exponent moved from exp = 0.25 (highly compressed) to
linear (1.0) and expanded (2.0), performance improved. The largest performance difference
was observed between the highly expansive and highly compressive conditions at lower
TMRs.
 
 
 
 
 
Experiment 3.2: The effect of compression and expansion on head-shadow benefit in 
simulations of cochlear implants for SSD listeners. 
 
Introduction 
 
The previous experiments described in Chapters 1 and 2 focused on binaural 
squelch. Head-shadow effects are also important for hearing speech in spatial noise.  One 
of the most frequently reported speech-perception benefits for SSD-CI listeners is that the 
CI in the deaf ear allows them to take advantage of situations where the TMR is better at 
the CI ear (i.e., the head-shadow advantage). Here, the effect of compression and expansion
on the head-shadow benefit was examined. The magnitude of the head-shadow advantage
provided by the CI will depend on the TMR at the CI. Therefore, we hypothesized this 
advantage would likely be impacted by compression or expansion due to changes imposed 
on the effective TMR at the CI ear.  
 
Experimental question. What role do amplitude compression and expansion play
in head-shadow benefit for vocoder-simulated SSD-CI listening in an HRTF-generated
virtual free-field environment?
 
Hypothesis. The effect of compression on head-shadow benefit will likely depend
on the TMR at the vocoded ear after HRTF processing. Table III summarizes the
hypothesized effects of compression and expansion on TMR and performance.
Compression will tend to increase the level of the softer speech relative to the louder
speech. Therefore, at negative TMRs, compression should increase the effective TMR and
therefore improve performance, while at positive TMRs, compression should decrease the
effective TMR and decrease performance. Expansion is expected to have the opposite
effect. As illustrated in Figure 3.8, compression will potentially reduce the TMR, whereas
expansion should increase it and thereby improve performance.  The TMR at the ear is the
relevant quantity, since that is what the vocoder receives as its input signal.  The TMR at
the loudspeakers and the effective TMR at the ears are different.  Although the relationship
between the TMR at the ear and the TMR of the original signal is frequency dependent,
Culling et al. (2012) showed that for the spatial configuration tested here, the speech-weighted
average TMR at the vocoded ear is about 9 dB higher than the signal TMR after
being convolved with an HRTF.
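The direction of these predictions can be checked with a deliberately simplified calculation. The sketch assumes (a) a fixed 9 dB increase in TMR at the vocoded ear, taken from the speech-weighted average cited above, and (b) that a power-law exponent applied to amplitude scales a level difference in dB by that exponent, as if the target and masker envelopes were processed independently. Assumption (b) ignores frequency dependence and the interaction of the two envelopes in the mixture, so the numbers are illustrative only; the function names are hypothetical.

```python
HEAD_SHADOW_GAIN_DB = 9.0  # approximate speech-weighted TMR increase at the vocoded ear


def tmr_at_vocoded_ear(signal_tmr_db):
    """TMR at the vocoded ear after HRTF filtering (simplified constant offset)."""
    return signal_tmr_db + HEAD_SHADOW_GAIN_DB


def effective_tmr_after_power_law(tmr_db, exponent):
    """Idealized TMR after raising amplitudes to `exponent`.

    20*log10((a/b)**e) == e * 20*log10(a/b), so a dB difference scales by e.
    """
    return exponent * tmr_db


# Positive TMR at the ear: compression lowers it, expansion raises it.
ear_tmr = tmr_at_vocoded_ear(-4)                      # 5.0 dB
print(effective_tmr_after_power_law(ear_tmr, 0.25))   # 1.25
print(effective_tmr_after_power_law(ear_tmr, 2.0))    # 10.0

# Negative TMR at the ear: compression moves it toward 0 dB (an increase).
print(effective_tmr_after_power_law(-8.0, 0.5))       # -4.0
```

Under these assumptions the sign of the TMR at the ear determines whether compression helps or hurts, which is the pattern summarized in Table III.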
In addition to the possible role of the effective TMR, compression or expansion
could also distort the envelope and the speech cues it carries, which could also disrupt
performance. This distortion of acoustic features and speech cues is more likely to
negatively affect performance in this experiment (3.2) than in the previous squelch
experiment (3.1), because the target is presented on the vocoded side of the head and
therefore the listener will rely more heavily on target speech information contained in the
vocoded signal. This is in contrast to Experiment 3.1, where the vocoder ear mainly
contained interfering speech information and was used primarily to provide spatial
information for the listener to perceptually separate the target from the interfering signals.
 
 
 
 
TMR    Compression @ vocoded ear    Expansion @ vocoded ear
+      ↓ TMR, ↓ Performance         ↑ TMR, ↑ Performance
−      ↑ TMR, ↑ Performance         ↓ TMR, ↓ Performance
Table III. Hypothesis table indicating the predicted outcome after compression and expansion. 
Since the talker is now located on the vocoded side of the head, predictions differ from 
experiment 3.1. Here in experiment 3.2, compression and expansion are likely to influence the 
effective TMR at the vocoded ear. At a positive TMR, compression should lower the TMR and 
disrupt performance. At negative TMRs, compression should increase the TMR and improve 
performance. At a positive TMR, expansion should increase the TMR and improve 
performance and at negative TMRs, expansion should decrease the TMR and disrupt 
performance relative to the linear bilateral condition.  
 
 
 
Figure 3.8. Prediction of what might happen to the head-shadow advantage after compression 
in Experiment 3.2. 
 
 
 
 
Methods. 
 
Approach.  This experiment employed the same spatial paradigm as in Experiment 
3.1 to maximize head-shadow effects, except that the locations of the target and interfering 
speech were reversed. Target and interferer azimuths of +60 and -60 degrees were used to 
maximize the potential for head-shadow benefit produced by adding the vocoder ear 
relative to the monaural condition (Culling et al., 2012). The target was presented on the 
vocoded, right side (+60 degrees) and the maskers were presented on the acoustic, left side 
(-60 degrees) using HRTFs to simulate the level and timing effects of the physical barrier 
created by the head. Comparing the monaural performance (unprocessed ear only, poorer 
TMR) to the bilateral performance (both ears) gives us a measure of head-shadow benefit 
afforded by the vocoder ear (better TMR). As in Experiment 3.1, the right ear was the 
vocoded ear and the left ear was unprocessed.  
 
 
 
 
Figure 3.9.  Schematic of the head-shadow experimental setup in experiment 3.2. As in
experiment 3.1, the stimuli were presented over headphones and each spatial configuration was
created by convolving the speech with a generalized HRTF before additional processing. Here,
the target talker was located at +60 degrees, closest to the vocoded ear. The two same-gender
maskers were located at -60 degrees, closest to the acoustic ear.
 
 
Participants. The same 7 NH listeners who completed Experiment 3.1 also
participated in Experiment 3.2. The order of experiments was randomized across
participants to reduce any chance of order effects.
 
Stimuli. The stimuli for Experiment 3.2 were similar to those used in Experiment
3.1, except for the range of TMRs tested and the spatial locations of the target and maskers.
Specifically, the range of TMRs (-4, 0, +4, +8, +12 and +16 dB) was higher to offset the
large amount of attenuation of the target in the baseline monaural condition. This ensured that
the TMR was almost always positive at the vocoded ear (i.e., about 9 dB higher than the
TMR before HRTF filtering), so that the task was possible, even in the monaural
conditions. Therefore, the prediction (Figure 3.8) is that compression will reduce the TMR
and thereby reduce performance. Expansion should increase the TMR and increase
performance in this task.
 
Procedure.  This experiment required the listener to identify the target talker based
on the call sign “Baron” and repeat back the color and number reported by the target talker,
while ignoring two interfering talkers who used different call signs. Listeners were told
on which side of the head the target would be presented. For the training portion of the
experiment, 12 trials were presented for each TMR tested for both the monaural condition
and the bilateral condition with no compression applied to the vocoder (linear).  Listeners
were presented with blocks of 30 trials for a total of 120 trials in the training condition.
The TMR varied randomly from trial to trial. The condition (monaural or linear bilateral)
was fixed for all trials in a block.
For the experimental portion of the experiment, listeners were presented with 6
different vocoder-compression conditions: two compression conditions (compression
factors = 0.25 and 0.5), two expansion conditions (compression factors = 1.5 and 2.0), a
linear condition (compression factor = 1.0), and a monaural condition where no stimuli
(interferers) were presented to the vocoded ear (i.e., a simulation of SSD without a device
intervention). Listeners completed 18 trials for each combination of TMR and vocoder-compression
condition.  Listeners were presented with blocks of 36 trials for a total of 648
trials for each listener. Within each block, only one vocoder-compression condition was
presented, but the TMR varied randomly from trial to trial. As in Experiment 3.1, signals presented
to the left ear were unprocessed, while signals presented to the right ear were vocoded. But
in contrast to Experiment 3.1, the target speech was presented from the right (vocoded)
side, so the vocoder ear had the better TMR.
 
Results. 
 
Figure 3.10.  The linear bilateral and monaural data from experiment 3.2. The monaural
conditions are depicted with white circles and the linear bilateral conditions are depicted with
black triangles. These data indicate that the listeners received a head-shadow benefit at all TMRs
except +16 dB (p=0.06), where performance between monaural and bilateral was not significantly
different.
 
Figure 3.11.  Compression and expansion data from experiment 3.2. The left figure 3.11A 
shows the effect of compression on head-shadow benefit compared to monaural and linear 
bilateral performance. The right figure 3.11B shows the effect of expansion on head-shadow 
benefit. The negative effect of compression compared to linear bilateral is clear at nearly all 
the TMRs tested (3.11A).  
 
Figure 3.10 plots the mean proportion of trials where the color and number were
both correctly identified as a function of TMR for the monaural and linear bilateral
conditions. The data in Figure 3.10 show a clear head-shadow advantage for all TMRs
tested except +16 dB. To illustrate the effect of compression and expansion on
performance, Figure 3.11 plots the mean proportion of trials where the color and number
were both correctly identified as a function of TMR for the monaural and all of the bilateral
conditions. Figure 3.11.A plots the data for the two bilateral conditions with compression
(exp = 0.25 and 0.50). Figure 3.11.B plots the results for the two bilateral conditions with
expansion (exp = 1.5 and 2.0). The monaural (white circles) and linear bilateral data (green
triangles) from Figure 3.10 are replotted in both panels of Figure 3.11 for comparison.
 
Table IV: Head-shadow benefit experiment post-hoc results 
 
Table IV.  Significant results from the post-hoc tests from experiment 3.2.  The “M” refers to
monaural performance. These results indicate that high compression (exp = 0.25) disrupted
performance compared to all other conditions (exp = 0.50, linear, exp = 1.5 and exp = 2.0), at
least at some TMRs. These data indicate a large negative effect of compression on head-shadow
benefits.
 
Comparison  TMR  p value 
0.25 vs Linear -4 dB p<0.001 
0.25 vs 1.50 -4 dB p=0.01 
0.50 vs Linear -4 dB p=0.004 
M vs 1.5 -4 dB p=0.02 
M vs Linear -4 dB p=0.006 
0.25 vs 0.50 0 dB p=0.03 
0.25 vs Linear 0 dB p=0.001 
0.25 vs 1.5 0 dB p<0.001 
0.25 vs 2.0 0 dB p<0.001 
M vs 2.0 0 dB p<0.001 
M vs 1.5 0 dB p=0.003 
M vs Linear 0 dB p=0.002 
0.25 vs Linear +4 dB p=0.005 
0.25 vs 1.5 +4 dB p=0.01 
0.25 vs 2.0 +4 dB p=0.006 
M vs 2.0 +4 dB p<0.001 
M vs 1.5 +4 dB p<0.001 
M vs Linear +4 dB p<0.001 
0.25 vs 0.50 +8 dB p=0.002 
0.25 vs Linear +8 dB p<0.001 
0.25 vs 1.5 +8 dB p<0.001 
0.25 vs 2.0 +8 dB p=0.005 
M vs 1.5 +8 dB p<0.001 
0.50 vs Linear +8 dB p<0.001 
M vs Linear +8 dB p<0.001 
0.25 vs Linear +12 dB p<0.001 
0.25 vs 1.5 +12 dB p=0.001 
M vs 2.0 +12 dB p=0.049 
 
Figure 3.12. Data from experiment 3.2 plotted as a function of the compression parameter and 
TMR. The trend of improvements in head-shadow from compression to expansion is more 
clearly represented in this graph. At low TMRs, the data appear in a bell shape, with 
performance being worse than linear bilateral with both compression and expansion.  
 
Figure 3.12 plots the same data as in Figures 3.10 and 3.11, as a function of the
compression parameter.  The horizontal dashed lines indicate monaural performance for a
given TMR. Plotted this way, the data more clearly show an effect of the compression
parameter on performance at all but the highest TMR tested (+16 dB). The data were
analyzed using a repeated-measures binary-logistic regression analysis with two within-subject
factors (compression parameter and TMR). The initial analysis included all the
vocoder conditions (plus monaural) as well as all the TMRs tested. There was a significant
main effect of condition [χ² (5) = 1923.7, p<0.001], a significant main effect of TMR [χ²
(5) = 28856.6, p<0.001] and a significant interaction between TMR and condition [χ² (6)
= 28.0, p<0.001].
Because the initial analysis revealed a main effect of condition, a main effect of TMR and
an interaction between TMR and vocoder condition, a subsequent analysis was
conducted at each TMR separately to determine the source of these significant results.
Bonferroni corrections for 6 comparisons (TMR) were applied after statistical analysis.
The statistical results for each TMR are as follows: TMR -4 dB = [χ² (5) = 251.2, p<0.005],
TMR 0 dB = [χ² (5) = 242.4, p<0.005], TMR +4 dB = [χ² (5) = 64.7, p<0.005], TMR +8
dB = [χ² (5) = 667.4, p<0.005] and TMR +12 dB = [χ² (5) = 660.2, p<0.005].  Thus, a
significant main effect of compression and expansion was found at each TMR except +16
dB.
For the TMRs that showed a significant main effect, a series of post-hoc pairwise
comparisons was conducted to determine which pairs of vocoder and monaural conditions
differed significantly. Bonferroni corrections for multiple (15) comparisons were
applied for each analysis.  The results of the post-hoc tests are summarized in Table IV. In
summary, these results showed significant differences (a) between the 0.25 and linear
conditions at most of the TMRs, (b) between 0.25 and expansion (1.5 or 2.0) at some
TMRs, and (c) between the monaural condition and the linear or expanded (1.5 or 2.0)
vocoder conditions at some TMRs.
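The figure of 15 comparisons follows from the six conditions tested, since six items form 6 × 5 / 2 = 15 unordered pairs. The short sketch below enumerates them; the condition labels follow the abbreviations used in Table IV, and the per-pair alpha shown is simply 0.05 divided by the number of pairs, which is one common way to express the Bonferroni criterion rather than a description of the statistical software actually used.

```python
from itertools import combinations

# Condition labels as abbreviated in Table IV ("M" = monaural).
CONDITIONS = ["M", "0.25", "0.50", "Linear", "1.5", "2.0"]

pairs = list(combinations(CONDITIONS, 2))
print(len(pairs))  # 15

alpha_per_pair = 0.05 / len(pairs)
print(round(alpha_per_pair, 5))  # 0.00333
```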
 
 
 
In summary, the results of Experiment 3.2 show a significant negative effect of 
compression on head-shadow benefit at all TMRs except +16 dB.  Compression completely 
eliminated the head-shadow benefit in many cases. Expansion had no significant effect on 
performance, although there was a non-significant trend for expansion to slightly reduce 
performance relative to the linear condition. 
 
 
Discussion 
 
The goal of this study was to examine the effects of compression and expansion on 
squelch and head-shadow benefit in vocoder simulations of SSD-CI listening in virtual 
auditory environments. The results of these experiments show that compression was
detrimental to performance in both the squelch and the head-shadow experiments.
Expansion afforded a modest benefit in the squelch experiment when compared to
performance in the highly compressed condition. There was a non-significant trend toward
expansion having a slight negative effect on performance in the head-shadow experiment.
The impact of envelope compression and expansion could be attributed to
changes in the effective TMRs or ILDs or to the distortion of envelope speech cues. TMR
and ILD are related quantities; both are determined by the interaction between head
shadow and spatial origin. However, they likely played different roles in determining
outcomes in the squelch and head-shadow experiments. In this study, TMR is a monaural
quantity referring to the level of the target relative to the masker in one ear. The TMR is
especially important in the head-shadow experiment (3.2) since any listening advantage is
based on monaural listening to the ear with the better TMR. The essential cue for the
squelch experiment was the ILD, which is the difference in loudness between the target
and the maskers across the ears. In the squelch experiment (3.1), the ILD provided the
spatial cue necessary to perceptually segregate the target from the masker.  Finally,
envelope distortion after compression/expansion could have reduced the intelligibility of
the target. This likely would have been particularly detrimental in the head-shadow
experiment, because the listener depended on the vocoder ear for the extraction of target
speech information. This is in contrast to the squelch experiment, where the vocoder ear
provided spatial cues.
 
The effect of compression and expansion on squelch. 
 
The effects of compression and expansion in this experiment can be understood in
terms of changes in the effective ILDs of the target and interfering speech. Referenced to
the unprocessed ear, the target ILD was positive (i.e., louder at the acoustic ear) and the
masker ILD was negative, due to the spatial locations of the target and maskers. Since
compression amplifies quieter sounds relative to louder ones, the effect of compression in
the vocoder on the target and masker levels depends on the TMR at the vocoder ear.
At negative TMRs, compression would amplify the target relative to the masker. The TMR at the
vocoded ear was very negative: the TMRs tested were mostly negative (at the level of the
loudspeakers) and the HRTFs exacerbated this effect. Therefore, in this experiment
compression would have amplified the target relative to the masker in the vocoded ear.
This effectively caused the ILDs of the target and masker to become more similar,
thereby reducing the perceived spatial difference and ultimately minimizing squelch.  This
is what occurred in this experiment: squelch was reduced as the compression exponent
decreased from expansive (exp = 2.0) to compressive (exp = 0.25).  In contrast, expansion
should have exaggerated the difference between target and masker in the vocoder ear,
thereby increasing the effective ILD difference and improving unmasking. This is indeed
what occurred: expansion provided a small listening benefit (i.e., more squelch) compared
to that provided by the compressed vocoder signal. The HRTF-generated virtual auditory
environment utilized in this experiment allowed for differences in spatial cues to be
represented to the listener over headphones. In CI and vocoder processing alike, the major
interaural spatial cue is the level difference, or ILD.  A study by Grantham et al. (2008)
examined the role of compression on ITD and ILD thresholds in BICI listeners. They
measured ILD thresholds (with Gaussian noise bursts) with compression turned on and off,
and found that compression drastically raised ILD thresholds for 10 out of 12 CI listeners.
They found a mean ILD threshold of 3.8 dB with compression on, compared with 1.9 dB with
compression off. Similar negative effects of compression on ILD thresholds
in CI listeners have been found by other researchers (Laback, Egger, & Majdak, 2014;
Senn, Kompis, Vischer, & Haeusler, 2005).
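The ILD argument in this section can be made concrete with a simplified calculation. With ILD defined as the level at the acoustic ear minus the level at the vocoded ear, the difference between the target ILD and the masker ILD equals the TMR at the acoustic ear minus the TMR at the vocoded ear. If, as an idealization, the power-law exponent simply scales the vocoded-ear TMR in dB, the separation becomes TMR_acoustic - exp × TMR_vocoded. The TMR values below are hypothetical, and the idealization ignores envelope interactions between target and masker.

```python
def ild_separation_db(tmr_acoustic_db, tmr_vocoded_db, exponent=1.0):
    """Target ILD minus masker ILD (dB), with ILD = acoustic-ear level
    minus vocoded-ear level, after an idealized power law at the vocoded ear."""
    return tmr_acoustic_db - exponent * tmr_vocoded_db


# Hypothetical squelch configuration: target favored at the acoustic ear,
# maskers dominating the vocoded ear.
tmr_acoustic, tmr_vocoded = 5.0, -20.0

for exponent in (0.25, 0.5, 1.0, 1.5, 2.0):
    print(exponent, ild_separation_db(tmr_acoustic, tmr_vocoded, exponent))
# 0.25 -> 10.0, 0.5 -> 15.0, 1.0 -> 25.0, 1.5 -> 35.0, 2.0 -> 45.0
```

In this idealization, compression shrinks the ILD separation between target and maskers and expansion enlarges it, which matches the ordering of squelch across conditions reported above.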
The use of primarily ILD cues by CI and vocoder listeners has been widely
reported in the literature (Buss et al., 2009; Garadat, Litovsky, Yu, & Zeng, 2009; Li &
Loizou, 2009; Schleich et al., 2004b; van Hoesel, 2008). Mechanistically, this experiment
creates ILD differences between the target and the interferers, which provide the listeners
with a cue to differentiate the target from the masker based on the perceived difference in
spatial location. Using the squelch paradigm described in Chapter 2, Bernstein et al. (2015)
investigated a situation (albeit artificial) where the masker ILD was held at 0 dB and the
target ILD was adjusted from negative infinity to 0 dB by mixing target energy with the
masker energy in the vocoded ear. They found that as the target ILD decreased and became
more similar to the masker ILD, the squelch benefit started to disappear. A target ILD of 6
dB or less completely eliminated squelch.
Another factor that might have influenced the results is the envelope distortion
caused by compression and expansion. Distortion of the signal envelope could have led to
decorrelation of the signals between the acoustic ear and the vocoded ear, which has been
previously shown to limit unmasking (van de Par & Kohlrausch, 1998).  However, both
compression and expansion would have distorted the envelope relative to the unprocessed
ear and should therefore have had a similar effect on performance. This is not what was found,
suggesting that the decrease in performance after compression had more to do with ILDs than
with the distortion of speech information.
 
 
The effect of compression and expansion on head-shadow benefit. 
 
In the head-shadow experiment (Experiment 3.2), performance generally improved
when the linearly processed vocoder signal was provided to the listener (compared to
monaural). This is because in this paradigm the target talker was located closest to the
vocoded ear. Therefore, this result reflects a head-shadow advantage when the stimuli were
provided to the vocoded ear, because listeners could now use the vocoded ear, which had the
better TMR, to hear the target speech.
The results of this experiment can be thought of in terms of changes in the TMR at
the vocoded ear after compression and expansion. The TMR is the important parameter
here, since the head-shadow benefit is based on monaural listening to the ear with the better
TMR. Since the TMR was always positive at the vocoded ear, we hypothesized that
compression should have amplified the masker relative to the target, thereby reducing
performance. Conversely, expansion should have increased the level of the target in the
vocoded ear, which should have improved performance. However, this is not what
occurred. Whereas compression did reduce performance, expansion did not improve it, and
there was even a non-significant trend toward a reduction in performance. This suggests
that the distortion of speech cues might have offset any TMR advantage that expansion
might have provided. It is likely that a disruption of intelligibility via envelope distortion
caused by compression and expansion contributed to the observed decrease in
performance (for compression) and the lack of improvement in performance (for
expansion).
Envelope distortion and loss of intelligibility likely played a larger role in this
experiment than in Experiment 3.1 since the listener had to rely primarily on the vocoder
signal to hear the target.   There is evidence in the literature that compression
and expansion distort speech cues, and this is particularly relevant for the head-shadow
case because listeners are relying on speech cues in the vocoder ear. According to the
lexical access (i.e., speech recognition) model suggested by Stevens (2002), the first
component in successful speech perception involves breaking down a speech signal into
“acoustic landmarks” based on frequency features and amplitude peaks in the signal.  If
detection of these acoustic landmarks is compromised, then the listeners will have
difficulty perceiving the speech because word boundaries and syllable onsets will be
misconstrued. Envelope compression has been shown to skew acoustic landmarks and
subsequent word boundaries in speech, especially in noise. Combined with the poor
spectral resolution of the vocoder, which reduces speech redundancy, compression causes
the listener to lose the reliable cues required to correctly hear speech (Li & Loizou, 2009).
Envelope distortion is also known to occur after envelope expansion (Clarkson & Bahgat,
1991; Fu & Shannon, 1998; Lorenzi, Berthommier, Apoux, & Bacri, 1999). The effect of
expansion on intelligibility is not expected to be as great as that of compression, at least for
NH listeners, as predicted by the Speech Transmission Index (Steeneken & Houtgast,
1980). The head-shadow experiment specifically called for the listener to attend primarily
to the signal in their vocoded ear, since the target was located at 60 degrees, closest to that
ear. It is likely that any alteration of the speech envelope in the head-shadow experiment
could reduce performance. Expansion was not as detrimental as compression, possibly due
to offsetting effects, whereby expansion might have increased TMR but also caused
distortion in the signal. Alternatively, it could be that expansion is not as detrimental to the
signal as envelope compression.
 
 
Implications for CI listeners. 
 
In this study, nonlinear loudness growth was found to affect head-shadow
advantage and squelch for the NH listeners presented with vocoder simulations of SSD-CI
listening. This suggests that adjusting the compression function is a potential candidate for
optimization for SSD-CI listeners. The ideal clinical solution would be to have
perfectly matched loudness growth between the CI and the acoustic ear for SSD-CI populations.
Given the reduced DR of a CI, this would be nearly impossible to achieve. However,
audiologists do have some control over the details of the CI compressive function. This
could allow the opportunity to test different strategies to offset the limitations imposed by
envelope compression. For instance, envelope distortion via compression has been shown
to severely limit the peak-to-trough ratio in the signal, which is a proxy for acoustic
(obstruent) landmarks. The obstruent landmarks are consonants created by obstruction of
vocal airflow.  A study by Li and Loizou (2009) measured the peak-to-trough ratio for
linear and compressed speech and found a decrease of 7.6 dB in the ratio after compression
(down from 10 dB). They concluded that CI listeners will not likely be able to perceive such
a small ratio (2.4 dB) and that acoustic landmark identification will be greatly reduced.
To address the distortion caused by CIs, researchers have suggested implementing
different types of compression. For example, one type of compressive function uses an
s-shaped input-output function, which expands low-level input up to a certain point
(the knee point) and applies compression once the knee point is reached. Theoretically,
the audiologist could adjust the knee point based on estimated noise levels. This could
enable expansion of acoustic landmarks in speech, with compression applied to less important
features of speech rather than to the louder, more salient portions. This is accomplished by
amplifying the portion of the DR where speech features are more likely to occur. Kasturi
and Loizou (2007) implemented a sigmoid-shaped compressive function and found that
this more sophisticated compression improved speech perception for CI listeners, but only
when it was optimized for each listener individually.  The CI listeners showed improved
sentence recognition in noise when using a dynamic s-shaped function compared to a
logarithmic compressive function, which they attributed to less distortion of critical speech
features in the speech envelope. The success of a dynamic compressive function suggests
that this might be an important potential target for optimization to reduce envelope
distortion and subsequent speech perception deficits for SSD-CI listeners.
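To make the idea of an s-shaped input-output function concrete, the sketch below joins an expansive power law below a knee point to a compressive power law above it, on a normalized 0 to 1 level scale. This is a generic illustration and not the function used by Kasturi and Loizou (2007); the knee position, the two exponents, and the function name are all hypothetical choices.

```python
def s_shaped_map(x, knee=0.3, expand_exp=2.0, compress_exp=0.5):
    """Map a normalized input level x in [0, 1] to an output in [0, 1].

    Below the knee the mapping is expansive (exponent > 1); above the knee
    it is compressive (exponent < 1). The two segments meet at (knee, knee).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x < knee:
        return knee * (x / knee) ** expand_exp
    return knee + (1.0 - knee) * ((x - knee) / (1.0 - knee)) ** compress_exp


# Sample the curve: convex below the knee, concave above it.
for level in (0.0, 0.15, 0.3, 0.65, 1.0):
    print(level, round(s_shaped_map(level), 3))
```

Moving `knee` shifts the input level at which the mapping switches from expansion to compression, which is the adjustment the text attributes to the audiologist.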
 
Study Limitations. 
 
This study was conducted on NH participants listening to vocoder simulations of 
SSD-CI listeners. The power-law compression implemented in this study was very basic 
and likely didn’t capture the complexity of compression in CI processing. For example, 
our vocoder processing did not include any pre-emphasis speech amplification or gain 
control mechanisms. Nevertheless, the objective of this study was to examine spatial 
speech perception with simple CI compression and expansion, and this study was a critical 
first step in examining the spatial hearing outcomes of amplitude distortion in CI listeners. 
Another possible limitation to this vocoder experiment is that it does not capture the effects 
of plasticity that can occur after altered sensory input. More specifically, CI listeners have 
time to adapt and get accustomed to listening to their CI which contains compression 
distortions. Therefore, CI listeners who have substantial listening experience with their 
implant might not experience the same decrease in speech perception that occurred in our 
vocoder experiments. Although vocoder experiments are imperfect estimates of what 
actual CI listeners experience, they are valuable in that they allow for specific and 
independent manipulation of certain aspects of CI processing in a very precise way.  The 
vocoder results alone do not determine what might happen in actual SSD-CI users, but 
these results suggest that effects of compression should be examined in future studies in 
SSD-CI listeners. 
A potential limitation of this study is that envelope compression was examined in 
isolation, without any loudness-specific conditions examining level differences between 
targets and maskers and their possible effects on binaural squelch. In the real world, a CI has very 
different loudness growth than an NH ear. The consequence of this is that the perceived 
loudness of the CI will change relative to the acoustic ear as a function of level. These 
differences in loudness growth would affect both the target and masker level (since HRTFs 
were used in this study). For actual SSD-CI listeners in the squelch experiment, where the 
target is located at -60 degrees and the maskers are located at 60 degrees, relative loudness 
differences between the target and maskers might profoundly affect performance in a non-
standard way.  This problem can be understood in terms of loudness growth curves from 
acoustic and CI ears adapted from McDermott & Varsavsky (2009) (Figure 3.13). For 
instance, for masker levels below the noise floor (below 25 dB SPL), SSD-CI listeners 
would not receive any binaural unmasking benefits, since the sound will not be transduced 
by the CI. The perceptual separation cue will not be provided by the CI and all talker energy 
will be relayed to the acoustic ear. For a masker level falling slightly above the noise floor 
(but less than 45 dB SPL), the SSD-CI listener might still not receive an unmasking benefit 
because the masker energy will not be loud enough to combine with masker energy in the 
acoustic ear and no listening benefit will be obtained (location on the growth curve marked 
with number 1). However, a masker level of about 50 dB (first cross-over 
point) would provide an unmasking benefit since the loudness of the masker will be 
matched across the ears (Figure 3.13, point “2”). The situation changes for a masker sound 
that is loud (around 75 dB SPL, point “3”). In this situation, the masker will sound much 
louder in the CI ear relative to the acoustic ear. This situation might improve performance 
due to a very large ILD difference between the masker and target. Conversely, the louder 
masker sound in the CI ear relative to the target-dominated mixture in the acoustic ear could 
vastly increase the salience of the masker signals, which would limit a listener’s ability to 
hear the target. Finally, at the second cross-over point (about 85 dB SPL, point “4”), the 
loudness between the target and maskers should match again and the listeners should 
receive an unmasking benefit. Taken together, the location of targets and maskers and the 
relative level differences of targets and maskers (due to CI loudness growth) could have 
profound effects on spatial hearing for SSD-CI listeners in real world situations. 
 
 
 
 
 
 
 
 
 
Figure 3.12. Loudness growth curve for a CI and acoustic ear adapted from McDermott 
& Varsavsky (2009). 
 
         Optimizing compression in CIs has become a focus of recent research efforts. Lopez-
Poveda et al. (2016) developed a compression strategy that was meant to mimic the efferent 
olivocochlear reflex (OCR) found in normal hearing. The OCR is important for NH 
listeners because this pathway allows for dynamic manipulation of the physical properties 
of the basilar membrane, which can effectively adjust the gain of signals reaching the brain. 
The OCR can be initiated by ipsilateral and contralateral input, and has been shown to aid 
in speech perception in noise (Mishra & Lutman, 2014). Unlike for NH listeners, 
compression is fixed in the CI processor and is likely inferior to the dynamic, adjustable 
compression seen in an NH listener’s basilar membrane. Lopez-Poveda et al. (2016) 
examined the effect of compression on speech intelligibility in bilateral and SSD-CI 
listeners. They implemented a dynamic compressive function and compared performance 
to a standard logarithmic compressive function. They found a significant improvement in 
SRM and improved speech recognition for spatially separated speech and noise in both 
bilateral and SSD-CI listeners. These results are promising since the strategy is only an 
approximation of what an NH basilar membrane is doing at any given time. Ultimately, to 
ameliorate some of the CI amplitude distortions, steps need to be taken to address the 
lack of front-end compression (compression occurring at the microphone before additional 
signal processing) and the addition of artificial back-end compression (compression 
occurring later on in each individual channel), which can raise ILD thresholds and distort 
speech envelopes. 
An additional limitation of this study was the small number of subjects that were 
tested (n = 7). In spite of this, many of the comparisons between the compression and 
expansion conditions were significant. It was difficult to compare linear/compressive and 
linear/expansive conditions because of small effect sizes. More subjects would help 
differentiate these nuances in the data. 
 
Conclusions 
HRTF-generated virtual auditory environments were used to test whether 
compression or expansion had an impact on spatial hearing in vocoder simulations of SSD-CI 
listening. In both the squelch and head-shadow experiments, the linear vocoder provided a 
listening advantage over monaural listening. This was especially true in the head-shadow 
experiment, in which bilateral performance was better than monaural at all TMRs tested. 
Compression disrupted performance in both experiments but to a larger extent in the head-
shadow experiment. Expansion caused a slight improvement in performance in the squelch 
experiment (relative to compression), but showed no significant effect on performance in 
the head-shadow experiment. Taken together, these results suggest that compression and 
expansion had two effects on performance: they (i) changed the relative loudness of target and 
maskers and (ii) introduced envelope distortions. The results of the squelch experiment are 
likely attributable to changes in ILDs, with compression causing target and masker ILDs to 
become more similar to one another and thereby reducing perceived spatial separation. 
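
One property of power-law processing helps to make the level argument concrete: raising an envelope amplitude to an exponent p multiplies any level difference expressed in dB by p. The sketch below illustrates this with hypothetical amplitudes and exponents; they are not the values used in this study. It also treats the target and masker as if they were processed separately, which is a simplification, since the vocoder operates on the envelope of the mixture in each channel.

```python
import math

def level_difference_db(a, b):
    """Level difference in dB between two envelope amplitudes a and b."""
    return 20.0 * math.log10(a / b)

def power_law(amplitude, exponent):
    """Power-law mapping of an envelope amplitude.

    An exponent below 1 is compressive, 1 is linear, above 1 is expansive.
    """
    return amplitude ** exponent

# Hypothetical amplitudes in the vocoded ear (arbitrary linear units).
target_amp, masker_amp = 0.2, 0.8

for name, p in [("compressive", 0.5), ("linear", 1.0), ("expansive", 2.0)]:
    diff = level_difference_db(power_law(masker_amp, p), power_law(target_amp, p))
    print(f"{name:12s} p={p}: masker re target = {diff:.1f} dB")
```

Under these assumptions, a compressive exponent shrinks the level difference between two sounds in the processed ear and an expansive exponent enlarges it.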
 In the head-shadow experiment, the results indicate that distortion of speech cues 
via envelope manipulation likely contributed most to the observed outcomes. Since both 
compression and expansion disrupted performance, this suggests that envelope distortion 
likely played a role in the results. Additionally, compression and expansion could have 
changed the TMR in the vocoded ear. Compression should have reduced the loudness of 
the target in the vocoded ear, diminishing performance. Expansion should have amplified 
the target in the vocoded ear, increasing performance. The compressive function is an 
aspect of CI processing that clinicians can control to some extent. 
Therefore, compression may be an important target for optimization to improve speech 
perception outcomes for SSD-CI listeners.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 4: The role of spectral mismatch on perceived binaural fusion 
in vocoder simulations of SSD-CI listening 
 
Introduction 
 
Binaural hearing improves the ability to hear in noisy, complex environments. For 
those with SSD (one normal-hearing ear and one deaf ear), this loss of binaural hearing can 
be challenging. CIs can restore some of the benefits of having two ears and facilitate spatial 
hearing for SSD listeners in a number of ways, such as improving sound localization and 
speech perception in noise. Most of the previous work examining spatial hearing in SSD-
CI listeners has found a clear head-shadow benefit after implantation, as evidenced by the 
fact that the CI provides improved speech perception when a target talker is on the deaf 
side or the noise is on the acoustic side of the head (Arndt et al., 2011; Bernstein et al., 
2017; Buechner et al., 2010; Firszt et al., 2012; Hansen et al., 2013; Zeitler et al., 2015). 
For NH listeners, the head-shadow effect arises from the physical acoustic barrier created 
by the head and allows listeners to attend to the ear with the better SNR for the signal of 
interest. For SSD-CI listeners, this head-shadow benefit arises because the CI allows the 
listener to take advantage of the fact that the deaf ear has a better SNR. This benefit does 
not require any binaural computations and arises solely from the physical barrier of the 
listener’s head.  
In addition to head-shadow benefits, individuals with two normal-hearing ears also 
receive an additional advantage that can improve speech perception in noise: binaural 
squelch. Binaural squelch involves neural computations based on differences in timing and 
level across the two ears to reduce the effective amount of masking in situations involving 
spatially separated sound sources (Drullman & Bronkhorst, 2000). Although most studies 
of speech perception in noise have not identified a binaural squelch benefit for SSD-CI 
listeners, a series of recent studies has shown that SSD-CI listeners (Bernstein et al., 2016; 
2017) and NH listeners presented with vocoder simulations of SSD-CI listening (Bernstein 
et al., 2015, 2016; Wess et al., 2017) can benefit from binaural squelch in situations with 
multiple competing talkers. In particular, binaural squelch is observed for SSD-CI and 
SSD-vocoder listeners when the target speech and interfering voices are all of the same 
gender, such that they are difficult to perceptually separate based on monaural cues such 
as voice pitch and timbre. 
The binaural squelch benefit is believed to arise out of binaural fusion ability, such that 
if listeners can fuse signals across the ears, they will receive a squelch benefit. In the 
contralateral unmasking paradigm discussed in Chapter 2, listeners received a binaural 
squelch benefit (relative to monaural listening) when a copy of the interfering speech was 
added to the vocoded ear. The improvement in performance in this paradigm is thought to 
occur because the listener can perceptually fuse the maskers presented to the vocoded ear 
and unprocessed ear. This fusion then led to the perceived spatial separation between target 
(perceived at the unprocessed ear) and masker (perceived as a diffuse image or in the center 
of the head), and thereby improved speech intelligibility.  
This binaural fusion hypothesis may also explain some of the results from Chapters 2 
and 3. We found that certain vocoder distortions reduced the amount of binaural squelch. 
According to the fusion hypothesis, the introduction of misalignments between the 
unprocessed ear and vocoded ear impaired the ability to perceptually fuse the diotic speech 
signals, thus eliminating the perceived spatial separation of the target and interferer stimuli, 
which reduced binaural squelch. In particular, spectral and temporal mismatch were found 
to reduce binaural squelch, with spectral mismatch causing the most detriment to 
performance (Chapter 2). Envelope compression was also detrimental to contralateral 
unmasking and head-shadow benefit compared to linearly processed vocoded signals 
(Chapter 3). However, the results of the compression experiments were interpreted in terms 
of TMR and ILD distortions and were not obviously attributable to binaural fusion 
mechanisms.  Overall, the results from the CI distortion vocoder experiments implicated 
spectral mismatch as the largest potential cause of the performance decrease in our 
experiments, and hence, potentially most deleterious for actual SSD-CI listeners.  
Mechanistically, we sought to explain the deleterious effect of spectral mismatch on 
binaural hearing. This led to the current set of experiments. We hypothesized that the loss 
of contralateral unmasking benefit after spectral mismatch could be caused by a loss of 
binaural fusion between the stimuli presented to the ears. Normally, listeners are able to 
integrate the sounds in their two ears together to hear a single voice, allowing it to be 
perceptually separated from other voices in a mixture based on spatial differences. 
However, frequency mismatch could have distorted the vocoded signal to such a degree 
that the acoustic ear was no longer able to integrate and fuse bilaterally presented signals 
with the vocoded ear.  
The loss of binaural fusion hypothesis is a plausible explanation for the results of the 
previous experiments. However, the speech intelligibility measure is an indirect test of 
binaural fusion, and it has not been verified that these listeners have difficulty integrating 
signals across the ears after interaural misalignment. Additionally, it remains to be 
elucidated if interaural alignment promotes fusion in this population.  The goal of these 
experiments was to more directly assess and measure binaural fusion of speech signals in 
a multi-talker mixture using vocoder simulations of SSD-CI listening. This was 
accomplished by either asking the listener how many voices they heard (Experiment 4.1) or 
asking the listener to discriminate between diotic speech (the same signal 
presented to both ears) and dichotic speech (different signals presented to the two ears) 
(Experiment 4.2). 
Binaural fusion is the subjective experience of the perception of one sound rather than 
two that occurs when listeners are presented with signals to both ears (Steel, Papsin, & 
Gordon, 2015). Binaural fusion is essential for hearing in noisy environments and is 
encountered in nearly every real-life listening situation. Integration of information across 
the ears into a cohesive, specific and continuous percept is an indispensable prerequisite to 
properly analyze an auditory environment and group sounds into distinct sources. 
 Despite the importance of fusion for listening in complex auditory scenes, binaural 
fusion is difficult to measure. The perceptual measurement of binaural fusion is usually 
accomplished by eliciting subjective reports of how “fused” the inputs to the two ears 
sound. This can be done using basic signals such as tones or noise, or by presenting speech 
to both ears (Aronoff et al., 2015; Reiss et al., 2014). For example, dichotic speech tasks 
involve presenting different verbal stimuli to each ear simultaneously and asking the 
listener whether the sounds are fused into one auditory image or object (integration) or if 
they are perceived as two separate sounds (separation). Binaural fusion has been 
investigated in CI listeners, with a focus on how spectral mismatch affects fusion. Fusion 
ability appears to be limited in BICI listeners, even with a small degree of spectral 
mismatch between their processors. Listeners report unfused auditory images and often 
perceive multiple auditory images when there should only be one image (Kan et al., 2013). 
Goupell et al. (2013) also found that spectral mismatch impaired CI listeners’ ability to 
achieve auditory image fusion, and images that were perceived correctly were often 
lateralized incorrectly after spectral mismatch. Further study of fusion, but in bimodal CI 
listeners (CI in one ear, hearing aid in the other), came from work by Reiss et al. (2014) 
who measured very wide fusion ranges in their listeners; these same listeners were more 
readily able to fuse pitch-matched signals across the ears than BICI users. The ability to 
pitch-match may have occurred because the bimodal listeners had some acoustic hearing 
in their hearing-aid ear. Abnormally large fusion ranges could likely lead to interference of 
speech perception for these listeners. Finally, a vocoder study by Aronoff et al. (2015) 
examined the effect of CI distortion on binaural fusion in SSD-vocoder listeners by 
applying either spectral or temporal compression to the vocoded signal. They found that 
both distortions disrupted fusion, but spectral compression was far more detrimental to 
binaural fusion.  
Most of these previous studies focused on tonal or single electrode stimuli or 
directly asked about “fusion,” which relies on the participants’ subjective understanding of 
the meaning of this terminology.  
To be successful, a subjective measurement that asks the listener to report how 
“fused” the ears sound requires the listener to understand what is meant by the question. 
Many individuals might not be able to characterize their auditory perceptions at this level 
of abstraction. Therefore, this study took a different approach to addressing the fusion 
question by directly asking the listener to report the number of concurrent voices they heard 
in a mixture (Experiment 4.1) or by asking listeners to discriminate two mixtures based on 
how many voices were presented (Experiment 4.2). The idea was that a lack of fusion 
should lead to an increase in the number of voices reported or poor performance in 
discriminating a single diotic voice from two dichotic voices. In contrast, complete fusion 
should lead to an accurate estimate of the number of voices present in a mixture or good 
performance in discriminating diotic from dichotic voices.  
In Experiment 4.1, listeners were asked to report the total number of talkers they 
heard in the scene (called a “numerosity” judgment). The key condition included was a 
diotic condition, where the same voice was presented bilaterally to the unprocessed and 
vocoded ears. This was the “fusion” condition, and it was paired with additional voices in 
the unprocessed or vocoded ear. The other key conditions were the control (foil) conditions, 
which were designed to be equivalent to the fusion conditions in all other respects except 
that the diotic voice was replaced by two different (i.e., dichotic) voices to represent the 
situation in which listeners were unable to perceptually fuse the diotic voice. If the listener 
was able to fuse the diotic stimulus, they should have reported it as one voice. Conversely, 
if the listeners were not able to fuse the diotic stimulus, they should have reported two voices for 
that stimulus (one voice in each ear). Listeners were only asked to report the number of 
voices they heard. They were not asked to recall any of the spoken speech.   
Listeners were tested in two vocoder conditions: with a “mismatched” vocoder 
where speech information was delivered to the wrong cochlear place (as would be expected 
with a standard CI allocation and incomplete electrode insertion) and with a “matched” 
vocoder where the frequency content was spectrally aligned in the two ears. This was done 
to investigate whether spectral mismatch, which was shown to impair contralateral 
unmasking in Chapter 2, also affected numerosity judgments of fusion. Moreover, if the 
mismatched vocoder negatively affected fusion, then the listeners might report the diotic 
stimulus as two voices. Additionally, conditions were included that just required the 
listeners to segregate all NH or all vocoded talkers presented either monaurally or 
bilaterally. These additional conditions were run in an attempt to explain the results from 
the diotic fusion conditions of interest (Experiment 4.1).  
 The paradigm used in Experiment 4.1 introduced a potential confound relating to 
perceptual limits of voice counting in listeners. The perceptual limit of voice counting is 
broadly referred to as the limit in numerosity judgments. Knowing the numerosity limits 
of the listener’s ability to count the number of voices in a scene is essential to 
understanding perceptual limits in multi-talker environments.  A recent study by 
Kawashima and Sato (2015) investigated the numerosity judgment limit for multiple 
concurrent talkers. They found that listeners were generally accurate in the range of 3 to 5 
voices, and accuracy increased when talkers were spatially separated. Knowing numerosity 
limits is important to understanding the results from the first fusion experiment in this study 
(Experiment 4.1).  
The paradigm in Experiment 4.1 was ultimately found to not be sensitive to spectral 
mismatch. This could have occurred because the listeners found it difficult to accurately 
count the number of voices in the mixture. There was also no feedback provided to guide 
them to learn what was being asked of them. Experiment 4.2 was designed to ask a similar 
question about whether listeners were able to fuse two copies of a speech signal presented 
to each ear. But in this case, listeners were asked to discriminate between “diotic” and 
“foil” stimuli in a two-alternative forced-choice (2AFC) task.  One ear was always 
presented with unprocessed speech (acoustic ear) and the number of voices varied from 
one to six. The other ear was always presented with a vocoded stimulus (the vocoded ear) 
and only one voice was presented to the vocoded ear at a time.  The vocoded speech was 
either the same voice and the same speech segment as one of the voices in the acoustic ear 
or was a completely different voice and speech segment. The listener was instructed to pick 
the interval that contained a “fused” or “stereo” voice (i.e., the interval that contained the 
same voice presented to the vocoded and acoustic ear).  With only a single voice presented 
to the unprocessed ear, this was a trivial task.  However, the task became more difficult 
with the systematic addition of unprocessed voices to the NH ear. The key question was 
whether the ability to discriminate between the “fused” and “unfused” mixtures was 
affected by spectral mismatch.  In contrast to Experiment 4.1, this did not require listeners 
to count voices.  Additionally, listeners received correct-answer feedback to train them on 
the discrimination judgments. We hypothesized that the matched vocoder would give rise 
to a fused percept and the listener would have an easier time selecting the correct “fusion” 
interval than with a mismatched vocoder.  
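
The logic of a single 2AFC trial can be sketched as follows. This is a hypothetical illustration of the task structure described above, not the experimental code: the function name, the pool of eight voice labels, and the choice to present the same acoustic-ear voices in both intervals are all assumptions.

```python
import random

def make_2afc_trial(voices, n_acoustic, rng=random):
    """Build two intervals for one hypothetical 2AFC trial.

    In exactly one interval, the single voice in the vocoded ear is also
    present in the acoustic ear (the "fused" interval). In the other interval,
    the vocoded-ear voice is absent from the acoustic ear (the foil).
    Returns the list of two intervals and the index of the fused interval.
    """
    acoustic = rng.sample(voices, n_acoustic)
    others = [v for v in voices if v not in acoustic]
    fused = {"acoustic": acoustic, "vocoded": rng.choice(acoustic)}
    foil = {"acoustic": acoustic, "vocoded": rng.choice(others)}
    target_interval = rng.randrange(2)
    intervals = [foil, foil]
    intervals[target_interval] = fused
    return intervals, target_interval

# Example: three unprocessed voices in the acoustic ear, drawn from eight labels.
intervals, correct = make_2afc_trial(list("ABCDEFGH"), 3)
print(intervals, "correct interval:", correct)
```

In this sketch, task difficulty is varied through `n_acoustic`, the number of unprocessed voices in the acoustic ear.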
 
 
 
 
 
 
 
 
Numerosity judgments of binaural fusion: Experiment 4.1. 
 
Study Objectives.  The goal of this study was to develop a test of the perceptual 
binaural fusion of speech stimuli—based on counting or discriminating the number of 
voices in a mixture—that was sensitive to changes in interaural spectral mismatch. 
Experiment 4.1A was designed to evaluate how many individual talkers the listener heard 
in a mixture when one or two of the talkers were presented concurrently to the unprocessed 
and vocoder ears.  A second control experiment (4.1B) evaluated the number of total voices 
(either unprocessed or vocoded) that listeners could reliably count (i.e., numerosity 
judgments). Knowing the maximum number of voices that can be counted allowed us to 
determine whether any lack of difference between conditions could be ascribed to a limit 
in the number of perceptible voices in the mixture.  
  Experiment 4.1A presented combinations of one or more concurrent talkers, with 
each talker in the mixture presented just to the left ear (normal unprocessed speech), just 
to the right ear (vocoded speech) or diotically to both ears (normal speech in the left ear 
and vocoded speech in the right ear). Participants listened to a short segment of speech and 
then reported how many total voices they heard (0-6). In the “matched vocoder” conditions, 
the vocoder used the same synthesis and analysis filters, thereby yielding a match in the 
cochlear place of stimulation across the ears. In the “standard vocoder” conditions, 
radiographic insertion depth data taken from Landsberger et al. (2015) were used to 
approximate the average spectral mismatch between the frequency allocation of the CI 
electrode array and basilar membrane for a typical CI listener. In cases where the diotic 
voice was fused, we expected listeners to report the correct number of voices in the mixture. 
In cases where the diotic voice was not fused, we expected listeners to report one extra 
voice because the diotic signal would be perceived as two separate voices. We 
hypothesized that the standard vocoder would give rise to the unfused perception and the 
matched vocoder would lead to fusion of the diotic stimulus.   
Experiment 4.1B was a control experiment designed to determine the perceptual 
limits of accurate numerosity judgments for the NH listeners participating in this study. 
Experiment 4.1B also presented combinations of talkers to one or both ears, but in this case 
either all of the voices were vocoded or none of the voices were vocoded (i.e., 
unprocessed). This provided a set of control data that established how many talkers the 
listeners could count, using the same basic experimental procedure and stimuli.  
 
 
 Experimental Questions.  4.1A) Will a more accurate “place-matched” vocoder 
mapping facilitate better binaural fusion, leading listeners to better identify the correct 
number of talkers in a scene over a standard vocoder mapping?   
4.1B) How many total talkers (unprocessed or vocoded) can people segregate in an 
auditory scene?  
 
Hypotheses.  4.1A) A more accurate “place matched” map compared to a standard 
map will facilitate the listener’s ability to correctly identify the number of talkers in the 
acoustic scene. 4.1B) Listeners will be better able to identify the correct number of talkers 
in an acoustic scene when the talkers are presented acoustically. Numerosity judgments 
will likely be worse when the talkers are vocoded. Accuracy will diminish in both situations 
when the number of talkers in the scene increases. 
 
Methods. 
Participants. There were 10 paid listeners (age range 18-30) in this experiment. All 
listeners had NH, defined as symmetrical thresholds equal to or better than 20 dB hearing 
level at octave frequencies between 125 and 8000 Hz and were free from cognitive and 
neurological disorders. Listeners were tested at the Air Force Research Laboratory, Wright 
Patterson Air Force Base, Ohio. The listener panel consisted of professional listeners, in 
that they are paid to conduct multiple psychoacoustic experiments. 
Stimuli.  This experiment utilized the CUNY topic sentence corpus (Boothroyd et 
al., 1988). The corpus was originally developed using just two different talkers discussing 
12 different topic areas such as food, work, family, weather, etc. An example sentence 
would be “The thunder and lightning from the storm last night woke up all of us.”  In order 
to create more than two talkers, the original corpus was modified using Praat software 
(Boersma & Weenink, 2007) to change the fundamental frequency, the intensity contours 
and other speech features. A total of 8 voices were used in this study: the 2 original talkers 
(1 Female and 1 Male) and 3 additional male and 3 additional female talkers that were 
created based on the original two recordings. 
Sentences from the corpus were concatenated by topic area and talker. After 
combining all of the sentences from a given talker and topic, 90-second paragraphs were 
created. Two-second samples of the concatenated paragraph were chosen randomly for 
each talker in each trial. 
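
The sampling step can be sketched as follows. The function name and the assumption that segment start times were drawn uniformly over the paragraph are illustrative only; the source states just that two-second samples were chosen randomly from the 90-second paragraphs.

```python
import random

def pick_segment(paragraph_s=90.0, segment_s=2.0, rng=random):
    """Return (start, end) times in seconds of a random segment.

    The start time is drawn uniformly so that the whole segment lies inside
    the concatenated paragraph (uniform sampling is an assumption).
    """
    start = rng.uniform(0.0, paragraph_s - segment_s)
    return start, start + segment_s

# One independent draw per talker in a hypothetical three-talker trial.
for talker in ("talker_1", "talker_2", "talker_3"):
    start, end = pick_segment()
    print(f"{talker}: {start:.2f} s to {end:.2f} s")
```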
 
Procedure.  Experiment 4.1: The experiment involved having a listener report how 
many total talkers they heard in an auditory scene presented over headphones. Trials 
consisted of combinations of multiple concurrent talkers, each talking for 2 seconds.  Each 
individual talker was presented at a level of 60 dB SPL (acoustic ear) or at the matched level 
in the vocoded ear. The experimental conditions are summarized in Table V below. The 
experiment was divided into two sets of conditions: the “diotic” conditions of interest and 
the “foil” conditions. The diotic “fusion” conditions included diotic presentation of the 
same talker signal, therefore the number of voices heard should depend on the amount of 
binaural fusion between the ears. The foil conditions are conditions where each voice is 
only presented to one ear, so there should be no impact of the effect of binaural fusion on 
the results. The foil conditions served as a control, to examine whether any differences 
measured in the test conditions can be attributed to differences in the amount of binaural 
fusion and not to other perceptual differences imposed by the vocoder frequency mapping.   
To create the percept of a talker originating from either the left, right or center of 
the listener’s head, each individual talker was presented to one ear or both ears 
simultaneously. The key condition was a diotic condition in which the same stimulus was 
presented simultaneously to both ears. The main question addressed in this experiment was 
whether or not listeners perceptually fused the diotic signal presented to both ears to hear 
a single voice. (NH listeners presented with diotic speech perceive a single talker in the 
center of the head.) The diotic “fusion” conditions were chosen to examine binaural fusion 
and numerosity judgments with increasing numbers of total talkers. Four sets (A, B, C and 
D) of conditions (fusion condition plus two foils) were presented (Table V). The table 
provides details about the arrangement of the speech stimuli in each condition. Each “X” 
denotes one voice. For example, Set A had the fewest total voices.  The diotic 
condition consisted of a single voice presented to both the vocoded and unprocessed ears.  
The listener therefore should report one voice if the stimulus was perceptually fused and 
two if not.  There were two controls included in Set A. The “unfused” foil condition 
represented a situation where the listener could not fuse two voices across the ears: two 
different voices which were presented to the vocoded and unprocessed ears.  Because 
different speech segments and voices were presented to the two ears, the listeners should 
always report two voices. The “fused” foil condition consisted of a single voice presented 
vocoded to the right ear.  This condition served as a control for what listeners would report 
if they heard the diotic voice in the test condition as a single fused voice. Although Set A 
was relatively easy with just a single voice in the test condition, Sets B-D introduced 
additional diotic or monaural voices into the mixture.  The configuration of the Sets 
corresponded to the location of the “fused” voice and the additional acoustic voices added 
to the mix. 
 
 
 
 
 
 
 
Table V.  Experimental conditions for Experiment 4.1A. 

Set  Configuration   Condition        Total #   Left ear only   Right ear only   Diotic
                                      talkers   (unprocessed)   (vocoded)
A    Center          Test A           1         -               -                X
A                    Foil (Unfused)   2         X               X                -
A                    Foil (Fused)     1         -               X                -

B    Center/Left     Test B           2         X               -                X
B                    Foil (Unfused)   3         XX              X                -
B                    Foil (Fused)*    2         X               X                -

C    2 Center/Left   Test C           3         X               -                XX
C                    Foil (Unfused)   5         XXX             XX               -
C                    Foil (Fused)     3         X               XX               -

D    Center/2 Left   Test D           3         XX              -                X
D                    Foil (Unfused)   4         XXX             X                -
D                    Foil (Fused)*    3         XX              X                -

Table V.  Experimental conditions for Experiment 4.1A. Each set contains the test (fusion) 
condition of interest and two control conditions. Each “X” denotes one talker. The unfused foil 
represents a control condition in which the listener is not expected to experience fusion because 
the voices are presented dichotically. The fused foil represents what the test condition might 
sound like if it were actually fused (i.e., it would sound like a single 
vocoded voice). 
 
*Set B fused is repeat of Set A unfused: Was not repeated twice 
*Set D fused is same as Set B unfused: Was not repeated twice 
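
The predicted responses for the test conditions in Table V follow from simple counting: a diotic voice contributes one perceived talker if it is fused and two if it is not. The short Python sketch below works through that arithmetic. The function and variable names are illustrative only and are not part of the software used in the experiment.

```python
# Voice counts for each test condition, read from Table V:
# (left-ear-only voices, right-ear-only voices, diotic voices)
TEST_CONDITIONS = {
    "A": (0, 0, 1),
    "B": (1, 0, 1),
    "C": (1, 0, 2),
    "D": (2, 0, 1),
}


def predicted_counts(left, right, diotic):
    """Return (fused, unfused) predictions for the reported number of talkers."""
    fused = left + right + diotic        # each diotic voice heard as one talker
    unfused = left + right + 2 * diotic  # each diotic voice heard as two talkers
    return fused, unfused


predictions = {name: predicted_counts(*counts)
               for name, counts in TEST_CONDITIONS.items()}

for name, (fused, unfused) in predictions.items():
    print(f"Set {name}: fused prediction = {fused}, unfused prediction = {unfused}")
```

The two predictions for each set equal the total number of talkers in that set's fused and unfused foils in Table V, which is why the foils can serve as perceptual anchors for the test condition.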
 
The experiment began with a 15-minute training session. The training session was
identical to the experimental session, except that only foil conditions were presented (no
test conditions) and listeners were provided feedback at the end of each trial. After each
trial, a GUI appeared asking the listener, “How many total talkers do you hear?”  The
listener used the mouse to select the button corresponding to the number of talkers they
heard (1-6). During training, the listeners were provided feedback about the correct answer
(via blinking of the numbered button corresponding to the total number of talkers in the
trial). During the main experiment, feedback was not provided, since the goal of the
experiment was to measure listeners’ subjective impression of the number of voices in the
mixture.  Experimental conditions were presented randomly, with 10 trials per condition,
per experiment.
Participants were seated in a sound booth and directed their attention to a computer
screen. The speech stimuli were generated by MATLAB, played via an RME
Hammerfall (Haimhausen, Germany) sound card and presented over Sennheiser HD 280
headphones at a comfortable presentation level of 60 dB SPL.
 
Noise Vocoding:  Noise vocoding was used to extract speech envelopes in seven
frequency channels and then to use those envelopes to excite specified regions of the
cochlea (via synthesis filters). First, stimuli were passed through a bank of “analysis”
filters whose frequency range was 100 to 10000 Hz. The envelope of the signal in each
channel was extracted via half-wave rectification followed by low-pass filtering at 400 Hz
with a second-order Butterworth filter. Each envelope was then multiplied by a white
noise carrier, and the resulting signal was passed through a series of bandpass “synthesis”
filters. The level of the resulting signal in each channel was adjusted to be equal to the
RMS level of the input signal for that channel, and the delays associated with the filtering
process were removed. Finally, the signals were summed across channels to create the
noise-vocoded signal.
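
The envelope-extraction stage described above (half-wave rectification followed by a 400-Hz second-order Butterworth low-pass filter) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB code used in the study: the bilinear-transform filter design, the sample-by-sample filtering loop, the 16-kHz sampling rate and the 1-kHz test tone are all hypothetical choices, and the analysis filter bank, noise carrier, synthesis filters and level matching are not shown.

```python
import math


def butter2_lowpass(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients via the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    q = 1.0 / math.sqrt(2.0)                 # Butterworth quality factor
    norm = 1.0 / (1.0 + k / q + k * k)
    b = (k * k * norm, 2.0 * k * k * norm, k * k * norm)           # feed-forward
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - k / q + k * k) * norm)  # feedback
    return b, a


def extract_envelope(signal, fs_hz, cutoff_hz=400.0):
    """Half-wave rectify the signal, then low-pass filter it."""
    b, a = butter2_lowpass(cutoff_hz, fs_hz)
    x1 = x2 = y1 = y2 = 0.0
    envelope = []
    for sample in signal:
        x0 = max(sample, 0.0)                # half-wave rectification
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        envelope.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return envelope


# Hypothetical demonstration: 0.1 s of a unit-amplitude 1-kHz tone at 16 kHz.
fs = 16000.0
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(1600)]
env = extract_envelope(tone, fs)
```

For a steady tone the extracted envelope settles around the mean of the rectified waveform (close to 1/π for a unit-amplitude sinusoid), with a small residual ripple at the tone frequency that the second-order filter does not fully remove.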
 
Interaural spectral mismatch was introduced through the use of synthesis filters that did
not match the analysis filters used to extract the envelope, thereby stimulating a different
cochlear place than would be stimulated by an unprocessed acoustic signal. This spectral
mismatch differs from the one implemented in the experiments in Chapter 2 (2.1-2.4).
Instead of a linear spectral shift, this experiment employed a more realistic spectral
mismatch that took into account published mismatch measurements for CI listeners.
Radiographic insertion depth data from Landsberger et al. (2015) were used to estimate
the mismatch that would occur for an average CI listener. These data were combined with
clinical frequency allocations to create the corresponding standard and place-matched
vocoder mappings summarized in Table VI. For a typical CI listener, each electrode is
fixed and stimulates a specific place in the cochlea; this cannot be changed.  In a vocoder
simulation, this is emulated by a fixed set of synthesis bands, so that the vocoder always
stimulates the same set of locations on the cochlea. For CI users, the electrode array is not
fully inserted along the length of the cochlea. Therefore, the basilar membrane can only be
stimulated down to the ~400 Hz place in the cochlea. This was represented by setting the
low end of the lowest synthesis band to 438 Hz. An audiologist has control over the
analysis bands, which dictate which acoustic frequencies are delivered to each band. In our
vocoder simulation for the “place-matched” case, the analysis bands were set equal to the
synthesis bands, thereby providing a frequency match between analysis and synthesis
channels.  In the “standard” case, the typical 100-8500 Hz frequency range is input to the
available channels, generating a place mismatch. In the standard map, the upper frequency
cutoff for the analysis bands was set to 3548 Hz (Table VI). This was done to ensure that
extra channels were not added for synthesis filters above 8500 Hz, both for audibility
reasons (it is hard to excite frequencies that high) and to avoid including extra channels of
auditory information. This ensured that the two vocoder conditions had the same number
of active “electrodes” (i.e., synthesis bands).
 
Table VI.  Frequency allocation for the place-matched and standard vocoder map.

Channel #   Synthesis Bands   Analysis Bands:       Analysis Bands:
            (Hz)              Place-Matched (Hz)    Standard Map (Hz)
1           438 - 576         438 - 576             100 - 237
2           576 - 757         576 - 757             237 - 431
3           757 - 1238        757 - 1238            431 - 710
4           1238 - 2072       1238 - 2072           710 - 1115
5           2072 - 3548       2072 - 3548           1115 - 1707
6           3548 - 5623       3548 - 5623           1707 - 2574
7           5623 - 8500       5623 - 8500           2574 - 3849

The spectral mismatch was created using radiographic insertion depth data from Landsberger
et al. (2015) to estimate the average mismatch for a typical CI listener. These data were
combined with clinical frequency allocations to create the corresponding standard analysis
bands. The synthesis bands remained fixed for both vocoder conditions, since the synthesis
bands represent the physical location of the electrode array on the basilar membrane in this
vocoder simulation. For the place-matched map, the analysis bands are equal to the synthesis
bands. For the standard map, the analysis bands are set as the standard CI frequency allocation.
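
To give a rough sense of the size of the place mismatch implied by Table VI, the band centers can be converted to cochlear place with the Greenwood frequency-position function. The sketch below is illustrative only. The use of the Greenwood function with the commonly cited human constants (165.4, 0.06 per mm and 0.88), the geometric-mean band center and all names are assumptions made here; this is not the procedure that was used to construct the table.

```python
import math

# Band edges in Hz from Table VI: (synthesis band, standard-map analysis band)
CHANNELS = [
    ((438, 576), (100, 237)),
    ((576, 757), (237, 431)),
    ((757, 1238), (431, 710)),
    ((1238, 2072), (710, 1115)),
    ((2072, 3548), (1115, 1707)),
    ((3548, 5623), (1707, 2574)),
    ((5623, 8500), (2574, 3849)),
]


def greenwood_place_mm(freq_hz):
    """Distance from the cochlear apex in mm (assumed human Greenwood constants)."""
    return math.log10(freq_hz / 165.4 + 0.88) / 0.06


def band_center_hz(low_hz, high_hz):
    """Geometric mean of the band edges (an assumed definition of band center)."""
    return math.sqrt(low_hz * high_hz)


shifts_mm = []
for channel, (synthesis, analysis) in enumerate(CHANNELS, start=1):
    shift = (greenwood_place_mm(band_center_hz(*synthesis))
             - greenwood_place_mm(band_center_hz(*analysis)))
    shifts_mm.append(shift)
    print(f"Channel {channel}: estimated basalward shift = {shift:.1f} mm")
```

A positive value means that, under the standard map, the envelope from a given analysis band is delivered to a more basal (higher-frequency) place than the one that band would excite acoustically. For the place-matched map the shift is zero by construction.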
 
In control experiment 4.1B, all vocoded stimuli were presented with the place-matched
vocoder. The goal of this part of the experiment was to determine how many total voices
listeners were able to accurately count in a mixture of vocoded or unprocessed stimuli.
Conditions are summarized in Table VII below. The total number of talkers varied from 2
to 6, and the talkers were either all vocoded or all unprocessed. In the bilateral conditions,
the talkers were divided roughly evenly between the ears. In the monaural conditions, all
the voices were presented to one ear. Experimental conditions were presented randomly,
with 10 trials per tracked condition per experiment.

Configuration   Condition      Total #   Voices,     Voices,
                               talkers   left ear    right ear
Bilateral       Vocoded        2         X           X
Bilateral       Unprocessed    2         X           X
Bilateral       Vocoded        3         XX          X
Bilateral       Unprocessed    3         XX          X
Bilateral       Vocoded        4         XX          XX
Bilateral       Unprocessed    4         XX          XX
Bilateral       Vocoded        5         XXX         XX
Bilateral       Unprocessed    5         XXX         XX
Bilateral       Vocoded        6         XXX         XXX
Bilateral       Unprocessed    6         XXX         XXX
Monaural        Vocoded        2         -           XX
Monaural        Unprocessed    2         XX          -
Monaural        Vocoded        3         -           XXX
Monaural        Unprocessed    3         XXX         -
Monaural        Vocoded        4         -           XXXX
Monaural        Unprocessed    4         XXXX        -
Monaural        Vocoded        5         -           XXXXX
Monaural        Unprocessed    5         XXXXX       -
Monaural        Vocoded        6         -           XXXXXX
Monaural        Unprocessed    6         XXXXXX      -

Table VII. Experimental conditions for numerosity experiment 4.1B. Stimuli were presented
as either unprocessed or vocoded. Each “X” denotes one talker; a dash marks an empty cell.
Two through six voices were presented either monaurally or roughly split between the left
and right ears.
 
Results. 
Figure 4.1.  Results from Experiment 4.1A: Sets A, B, C and D plotting perceived number of 
talkers vs experimental condition. The black bar represents the place-matched vocoder, the 
grey bar represents the standard vocoder map. The dashed line indicates the expected number
of voices if the listener fuses the diotic voice(s). The solid black line indicates the expected
number if the listener does not fuse them. The results indicate that listeners were more likely
to report the test condition stimulus as unfused. Error bars represent ± one standard error of
the mean.
 
The results of Experiment 4.1A are shown in Figure 4.1. Each panel plots the mean 
perceived number of talkers as a function of listening condition for one set of diotic 
“fusion” and foil conditions. The two bars in each pair are the different vocoder conditions, 
the standard (mismatched) map in grey and the place-matched map in black. The first pair 
of bars in each panel represents the diotic fusion condition which included one or two 
diotically presented talkers. The two other pairs of bars in each panel represent the two 
associated foil conditions, one representing an unfused percept and the other representing 
a fused percept. The horizontal lines in each plot represent the number of talkers the listener 
would have reported if they had perceived the diotic voice(s) as completely unfused (upper 
solid line) or completely fused (lower dashed line). 
In all four sets, the number of perceived voices increased with the number of voices
included in the mixture.  Listeners reported more talkers in the unfused controls than in
the fused controls. The choice of vocoder map had only a marginal effect on the number of
perceived talkers. In the test conditions, listeners generally reported a greater number of
talkers than were presented (counting each diotic voice as a single talker), indicating a
lack of fusion.  This is evident from the similar responses in the unfused control and test
conditions for Sets B, C and D. For each of the four sets of conditions, a repeated-measures
two-way analysis of variance (ANOVA) was conducted to examine the effects of vocoder
condition and experimental condition on the reported number of talkers in the scene.
Vocoder condition contained two levels (place-matched vs standard map) and experimental
condition contained three levels (test, fused foil and unfused foil).
 
For Set A, there was a significant main effect of experimental condition
[F(2, 16) = 64.4, p < 0.001]. Post hoc tests found differences between the fused foil and the
test condition (p < 0.001) and between the fused and unfused foil conditions (p < 0.001).
For Set B, there was a significant main effect of experimental condition
[F(2, 16) = 64.4, p < 0.001]. Post hoc tests found a difference between the test condition
and the fused foil condition (p < 0.01), and a small (2.93 vs. 3.10 voices) but significant
difference between the test condition and the unfused foil condition (p < 0.01). The latter
difference suggests that the listeners might have experienced very slight partial fusion of
the diotic voice in this condition.
For Set C, there was a significant main effect of experimental condition
[F(2, 16) = 55.5, p < 0.001]. Post hoc tests found differences between the unfused foil and
the test conditions (p < 0.001) and between the unfused and fused foil conditions (p < 0.001).
For Set D, there was a significant main effect of experimental condition
[F(2, 16) = 26.6, p < 0.05].  Post hoc tests found differences between the unfused foil and
the test condition (p < 0.05) and between the unfused and fused foil conditions (p < 0.001).
These analyses revealed no effect of vocoder condition on the perceived number of
voices, but strong effects of experimental condition. Generally, there were no differences
in the perceived number of voices between the test condition and the unfused foil
condition.  Listeners did, however, report more voices for the test condition than for the
fused foil condition, and more voices for the unfused foil than for the fused foil condition.
 
Figure 4.2.  Results from the numerosity experiment (Experiment 4.1B). The solid lines
represent the unprocessed speech stimuli and the dashed lines indicate the vocoder conditions.
The solid grey line is the identity line, along which the listener’s perceived number of voices
would match the actual number of voices. Error bars represent ± one standard error of
the mean.
 
The results of Experiment 4.1B are shown in Figure 4.2. Figure 4.2 plots the 
perceived number of talkers as a function of the number of voices in the mixture. For this 
portion of the experiment, listeners had to count the number of voices all presented to one 
ear or distributed relatively evenly to both ears.  All of the voices in a mixture were 
unprocessed or they were all vocoded. Only the place-matched vocoder was used for this 
portion of the experiment. 
For the numerosity experiment, a repeated-measures three-way ANOVA was
conducted to examine the main effects of listening condition (monaural vs bilateral),
processing condition (acoustic vs vocoded) and number of voices (two through six voices),
and the interactions among all three variables. There was a significant main effect of
number of voices [F(4, 28) = 82.2, p < 0.001]; no other significant main effects or
interactions were found. Four main trends are apparent in the data. Just as in Experiment
4.1A, the perceived number of talkers increased with the actual number of talkers.
However, listeners tended to report more voices than were actually presented for two
talkers. With three total voices, listeners were accurate in all conditions, and they
underreported for four talkers and above. Surprisingly, there was no significant difference
in performance between the vocoded and unprocessed conditions.  There was also no
significant difference between monaural and bilateral presentation. Overall, numerosity
judgments were relatively accurate in the two-to-four talker range, but listeners
underestimated the number of talkers when there were more than four talkers in the
mixture. Four voices may therefore be the numerosity limit for unprocessed and vocoded
voices, at least in this experiment with limited spatial conditions (monaural or bilateral).
 
 
 
Interim discussion: Experiment 4.1. 
 
The results of Experiment 4.1A demonstrated that listeners almost always reported
the “diotic” stimuli as two separate voices.  In every case except Set A, the number of
perceived voices was equal to the predictions based on the “unfused” foil, and substantially
greater than predictions for the “fused” foil.  In this paradigm, listeners did not fuse the
voices in the diotic “fusion” condition; they reported the fusion stimulus as two separate
voices instead of one fused voice.
These results run counter to the contralateral unmasking results from Bernstein et
al. (2015, 2016) and Wess et al. (2017).  Listeners must have been combining the diotic
sounds from the two ears in some way to achieve the contralateral unmasking advantage,
but perhaps they were not fusing the voices to the point where they heard one single sound.
This “incomplete fusion” could have been enough to provide a spatial listening advantage,
but the percept was not fused enough for listeners to freely report one single fused voice.
Experiment 4.2 took a slightly different approach to asking the question of whether 
listeners were able to perceive the same stimulus presented to the unprocessed and vocoded 
ear as a perceptually fused entity. In this experiment, listeners discriminated between two 
sequential speech mixtures: a diotic mixture (same voice presented to the unprocessed and 
vocoded ear) and a dichotic mixture (two different voices presented to each ear). This 
paradigm did not rely on the subjective report of the listener. The idea was that if the
listener experienced even partial fusion in the fusion interval, they should have been able
to report the correct interval. Another possible explanation for the results of Experiment
4.1 was that no feedback was provided to the listener. The discrimination paradigm
employed in Experiment 4.2 allowed feedback (correct/incorrect) to be provided to the
listener after each response.
 
Discrimination, spectral mismatch and binaural fusion: Experiment 4.2 
 
Study Objectives.  The goal of this study was to examine whether a discrimination-
based perceptual test of binaural fusion of speech stimuli is sensitive to changes in 
interaural spectral mismatch. As in Experiment 4.1, vocoder simulations of SSD-CI 
listening were used to investigate the effect of spectral mismatch on the perception of 
fusion in NH listeners. A virtual cocktail party was created by presenting combinations of 
one or more concurrent talkers to the left ear (normal unprocessed speech), to the right ear 
(vocoded speech) or to both ears (normal speech and vocoded speech).  
The experiment was a 2AFC task in which the listener was required to identify which
interval contained a diotic speech signal. The signal interval contained the same speech
waveform, presented unprocessed to the left ear and vocoded to the right ear (fusion
possible). The reference interval presented unrelated voices to the left and right ears (no
fusion possible). With only a single diotic voice in the signal interval, this task was trivial,
and most listeners could easily determine which interval contained the diotic voice.  The
task was made more difficult by systematically adding unprocessed voices to the mixture
in the left ear.
 
  
Experimental question.  Are NH listeners presented with vocoder simulations of 
SSD-CI listening better able to identify the correct interval in which a diotic speech signal 
is presented with a place-matched vocoder frequency map than with a standard map? 
 
Hypothesis.  The hypothesis was that listeners would be more likely to choose the
correct fusion interval with a place-matched map. A place-matched map should facilitate
binaural fusion because the matching frequency bands across the ears should result in
more interaural correlation between the signals. The place-matched map should facilitate
more fusion than the standard map, no matter how many additional voices are added.
However, performance could decline in both vocoder conditions as the number of
additional unprocessed acoustic voices added to the mixture in the left ear increases.
 
Methods. 
 
Participants.  Nine paid listeners (age range 18-30) participated in this experiment.
All listeners had NH, defined as symmetrical thresholds equal to or better than 20 dB
hearing level (HL) at octave frequencies between 125 and 8000 Hz, and were free from
cognitive and neurological disorders. Listeners were tested at the Air Force Research
Laboratory, Wright-Patterson Air Force Base, Ohio. Seven of the nine listeners who
participated in Experiment 4.1 also participated in Experiment 4.2.
 
Stimuli.  The stimuli used in this experiment were the same that were used in 
Experiment 4.1 (the CUNY topic sentence corpus with 8 different talkers). 
 
Procedure.  The experiment used a 2AFC paradigm to assess binaural fusion. The
signal interval always contained a 2-second segment of speech that was presented diotically
(unprocessed in one ear, vocoded in the other). The reference interval always contained
two different segments of speech produced by two different talkers saying two different
things, one unprocessed and one vocoded.  These intervals were similar to the “test” and
“unfused foil” conditions from Experiment 4.1 (see Figure 4.3 for two example trials).
Example trial one is the easiest case, where the listener should have no difficulty selecting
the correct interval. Example trial two is harder, and the listener might have more trouble
selecting the correct interval.
After each trial, a GUI window appeared with the following text: “Which interval
contained one ‘stereo’ voice that was the same in both ears?” The listener’s task was to
identify the interval in which one vocoded speech signal in the right ear matched one of the
unprocessed speech signals in the left ear. The listener used the computer mouse to select
the button corresponding to the first or second interval. Blocks consisted of combinations
of multiple concurrent talkers. The listeners were provided feedback (via blinking of the
numbered button corresponding to the interval that contained the diotic stimulus).
The training portion of the experiment contained three multi-talker conditions (one,
two or three voices in the acoustic ear) and two vocoder conditions (standard vs place-
matched). The six combinations in the training were presented randomly in each block.
The vocoder condition was fixed for each block. Listeners were presented with 30 trials
per block and 20 trials for each combination of number of talkers and vocoder condition,
for a total of 120 trials for the training portion of the experiment.
 The experimental portion consisted of six talker conditions (one, two, three, four,
five and seven voices in the acoustic ear) and two vocoder conditions (standard vs place-
matched). The 12 conditions in the experimental blocks were presented randomly. The
vocoder condition was held fixed for each block. Listeners were presented with 100 trials
per block and 20 trials per tracked condition, for a total of 240 trials for the experimental
portion of the experiment.
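
The trial total for the experimental portion follows directly from the design: six talker conditions by two vocoder conditions by 20 trials each. The Python sketch below builds one such randomized trial list to make the bookkeeping concrete. It is a hypothetical illustration, not the experiment software: the dictionary keys, the fixed random seed, the random assignment of the signal to interval 1 or 2, and the grouping of trials by vocoder condition are assumptions made here, and the block sizes described in the text are not modeled.

```python
import random

TALKER_CONDITIONS = [1, 2, 3, 4, 5, 7]      # voices in the acoustic (left) ear
VOCODER_CONDITIONS = ["standard", "place-matched"]
TRIALS_PER_CONDITION = 20


def build_trial_lists(seed=0):
    """Return a shuffled list of trials for each vocoder condition."""
    rng = random.Random(seed)
    trial_lists = {}
    for vocoder in VOCODER_CONDITIONS:
        trials = []
        for n_talkers in TALKER_CONDITIONS:
            for _ in range(TRIALS_PER_CONDITION):
                trials.append({
                    "vocoder": vocoder,
                    "n_left_talkers": n_talkers,
                    # Assumed: the diotic signal is equally likely in either interval.
                    "signal_interval": rng.choice([1, 2]),
                })
        rng.shuffle(trials)                  # random order within a vocoder condition
        trial_lists[vocoder] = trials
    return trial_lists


trial_lists = build_trial_lists()
total_trials = sum(len(trials) for trials in trial_lists.values())
print(total_trials)  # 240 = 6 talker conditions x 2 vocoder conditions x 20 trials
```

The same arithmetic gives the 120 training trials: three talker conditions by two vocoder conditions by 20 trials.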
 
Figure 4.3. Schematic of possible perception for two example trials from experiment 4.2. 
Example 1 depicts the easiest scenario, where the signal interval should be easily 
distinguishable from the reference interval. In the signal interval, a diotic voice is presented to 
the listener, with the same speech segment spoken by the same talker presented unprocessed 
to the left ear and vocoded to the right ear. In the reference interval, two different speech 
segments produced by two different talkers’ voices are presented to the vocoded and 
unprocessed ears. Example 2 is similar to example 1, except the task is made more difficult by 
presenting two additional unprocessed speech segments to the left ear in both intervals.  
 
Procedure.  Participants were seated in a sound booth and directed their attention
to a computer screen. The speech stimuli were generated by MATLAB, played via an
RME Hammerfall (Haimhausen, Germany) sound card and presented over Sennheiser HD
280 headphones at a comfortable presentation level of 60 dB SPL. The overall RMS level
was held fixed at 60 dB SPL regardless of the number of talkers in the condition.
 
Results. 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.4.  Results from fusion experiment 4.2. The dashed line indicates chance performance.
The place-matched vocoder is represented by the white circles. The standard condition is
represented by the black squares. Taken as a whole, the place-matched vocoder yielded a higher
proportion of correct answers than the standard vocoder. Post hoc analysis revealed that
listeners were significantly more likely to identify the signal interval with 5 talkers in the left
ear with the place-matched vocoder than with the standard vocoder. These results indicate
better performance when there was more of a spectral match between the diotic stimuli
presented to each ear. Error bars represent ± one standard error of the mean.
 
The results of Experiment 4.2 are shown in Figure 4.4. The mean percentage correct
in identifying the interval containing a diotic voice is plotted as a function of the number
of unprocessed talkers presented to the left ear. With only a single talker, performance was
very high regardless of the vocoder type. With an increasing number of unprocessed talkers
in the left ear, performance decreased, but more rapidly for the standard (mismatched)
vocoder frequency map than for the matched vocoder.   A binary logistic regression
analysis revealed significant main effects of vocoder condition [χ²(1) = 10.40, p < 0.001]
and number of talkers [χ²(5) = 89.89, p < 0.001] and a significant two-way interaction
between vocoder condition and number of talkers [χ²(5) = 205.87, p < 0.001]. Post hoc
tests were performed to compare the vocoder conditions at each number of talkers. After
Bonferroni correction for 6 comparisons, the vocoder conditions differed significantly
only in the case of 5 talkers (p = 0.002).
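
The Bonferroni correction for the six post hoc comparisons can be illustrated with a short calculation. The sketch below assumes a familywise alpha of 0.05 and treats the reported p = 0.002 as an uncorrected value; neither assumption is stated in the text, and the variable names are illustrative.

```python
ALPHA = 0.05               # assumed familywise error rate
N_COMPARISONS = 6          # one vocoder comparison per number-of-talkers condition
p_five_talkers = 0.002     # value reported for the 5-talker comparison


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def bonferroni_adjust(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


threshold = bonferroni_threshold(ALPHA, N_COMPARISONS)
adjusted_p = bonferroni_adjust(p_five_talkers, N_COMPARISONS)
significant = p_five_talkers < threshold

print(f"per-comparison threshold = {threshold:.4f}")
print(f"adjusted p = {adjusted_p:.3f}, significant = {significant}")
```

Under these assumptions the 5-talker comparison remains significant either way: the raw value falls below the per-comparison threshold, and equivalently the adjusted value falls below 0.05. If the reported value had already been corrected, it would also fall below 0.05.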
 
 
Discussion 
 
The goal of this series of experiments was to further explore the possible 
mechanisms behind the contralateral unmasking results from experiments in Chapters 2 
and 3 in terms of fusion and object formation. In Experiment 2.1 (Chapter 2) the 
contralateral unmasking benefit for NH listeners presented with vocoder simulations of 
SSD-CI listening disappeared after a modest spectral mismatch of 3.6 mm or more. These 
results led to the current set of experiments whose goal was to determine if the loss of 
squelch benefit after frequency mismatch in experiment 2.1 could be attributed to 
disruption of object formation (forming discrete auditory percepts) and binaural fusion. 
Binaural fusion was assessed in two ways: by having listeners count the voices in a
mixture, or by having them discriminate between diotic and dichotic voices in the
vocoder simulations.  In Experiment 4.1, listeners
reported a number of voices that indicated they were not fusing the diotic stimulus (the 
same speech segment presented unprocessed to one ear and vocoded to the other), 
regardless of the vocoder condition. This result indicated the listeners had difficulty fusing 
an identical acoustic stimulus with the vocoded one. The goal of Experiment 4.2 was to 
examine the effect of a realistic spectral mismatch on discrimination of a diotic fusion 
stimulus from a dichotic reference stimulus. More specifically, the aim was to determine if 
listeners were more likely to achieve successful binaural fusion between their acoustic ear 
and vocoded ear when the vocoded ear’s frequency allocation more closely matched their 
acoustic ear.  In contrast to the lack of an effect of vocoder mismatch in Experiment 4.1, 
Experiment 4.2 demonstrated that listeners were more likely to correctly select the interval 
containing the diotic stimulus with a place-matched vocoder mapping than with a mapping 
that was based on a standard CI frequency map. The contrasting results from Experiments 
4.1 and 4.2 lead to several interesting interpretations about fusion and spectral mismatch. 
When simply asking listeners to count the number of voices in an acoustic scene, there was 
no difference between vocoder conditions (Experiment 4.1). However, when listeners were 
asked to discriminate between diotic fused and non-fused intervals, performance was 
sensitive to spectral mismatch, i.e., the listeners performed worse with the mismatched 
vocoder (Experiment 4.2). One possible interpretation of these contrasting results is that 
listeners were experiencing partial or incomplete fusion. On one hand, the diotic stimulus 
might not have been sufficiently fused for a listener to identify it as one voice instead of 
two when asked for a free answer (Experiment 4.1). On the other hand, the diotic signal 
might have been sufficiently fused for listeners to detect interaural coherence in the signals 
(Experiment 4.2).  These results suggest that traditional subjective measures of fusion 
might be less sensitive to changes in interaural coherence than an objective discrimination 
task.  Another important difference between the current study and previous studies of 
fusion in real or simulated CI listeners is that these experiments involved stimuli that 
included extra voices in the mixture. Most studies of binaural fusion involve only a single 
sound presented to each ear such as a tone or noise, or a segment of speech presented to 
each ear. The complex mixture of extra voices included with the fusion stimulus in the 
current studies likely required the use of stream segregation and object formation in 
addition to binaural fusion. By evaluating fusion in the context of a complex mixture of
voices, this study revealed the negative effects of the spectrally mismatched vocoder on
binaural hearing processes. This effect was not apparent in the simplest condition (i.e.,
one voice) but emerged when a complex mixture of voices was presented to the listener.
 
 
 
 
 
Impacts of spectral mismatch. 
 
The results of these experiments are in agreement with other recent binaural fusion data in 
CI and vocoder listeners. Aronoff et al. (2015) examined the effect of spectral compression 
on binaural fusion for NH listeners listening to vocoder simulations where one ear was 
spectrally mismatched relative to the other. They also concluded that spectral mismatch 
resulted in significantly less fusion. The test of fusion in Aronoff et al. (2015) was a basic 
subjective test of fusion, where they simply asked if the listeners heard the same sound in 
both ears or a different sound. The conclusions of the Aronoff study were similar to the 
conclusions in this study although the methodology used herein was more quantitative. 
Determining the effect of frequency distortion on discrimination is a more objective way
to measure the functional limitations of a spectral mismatch for an SSD-CI listener, because
discrimination is a pivotal step in segregating different voices in an acoustic scene.
Goupell et al. (2013) examined the effect of spectral mismatch on binaural fusion in
vocoder simulations of BICI listeners. Goupell et al. (2013) measured fusion by varying
the spectral mismatch and measuring the perceived image location on a GUI that displayed 
a face, which the listener could click on. The researchers predicted fused stimuli would 
cause the listeners to choose a location near the center of the face, and partially fused or 
unfused stimuli would cause the listener to choose a location that was diffuse or off center. 
They found that as spectral mismatch increased between the ears, so did the likelihood that 
the listeners would report more than one auditory image and that the perception was biased 
towards the ear where the stimulus had a higher carrier frequency.  In a follow up study 
conducted in BICI listeners, Kan et al. (2013) performed the same experiment but 
controlled spectral mismatch by selecting single electrode pairs to present the bilateral 
stimuli.  They found similar results: as mismatch between electrodes increased so did the 
propensity for the listeners to report multiple auditory images.  The reduced binaural fusion 
that occurred after processing with the standard mismatched vocoder is consistent with 
other studies examining the effects of frequency mismatch on binaural processing.  
An interesting result from the current study was that the difference between
vocoder conditions was only revealed as the number of concurrent talkers in the mixture
was increased. Thus, when multiple talkers were present, a spectrally matched vocoder
was critical for achieving fusion. In the current study, successful selection of the fusion
interval depended on the listener’s ability to combine information from the vocoded ear
and the acoustic ear to create a single auditory object (i.e., the fused voice).
Ma et al. (2016) examined the role of frequency mismatch on binaural integration 
in vocoder simulations of SSD listening. They found that perception of speech presented 
bilaterally was better than speech presented monaurally, but that this effect was largely 
reduced by a spectral mismatch. However, a caveat of their work was that they presented 
target speech to both ears, which prevented the results from being interpreted purely in 
terms of binaural squelch. Taken together, these studies indicate that spectral mismatch is detrimental 
to binaural fusion and binaural processing in general. Fortunately for CI listeners, spectral 
mismatch can be clinically addressed with current technology and slight adjustments of 
mapping procedures. 
 
 
 
Disruption of temporal processing. 
 
In the current experiment, spectral mismatch could have disrupted the temporal 
grouping cues needed for binaural fusion, which could have 
contributed to the listeners’ poor performance in the standard vocoder mapping case. 
Accurate auditory grouping is an integral step for fusion of binaural stimuli. Auditory 
grouping refers to the process of breaking down a complete auditory scene into its 
constituent components, or auditory objects, and then connecting these objects together into 
streams. Spatial cues are powerful grouping cues when other more salient cues are 
unavailable, such as pitch cues which are mostly absent in CI listeners and vocoder listeners 
alike. Grouping based on temporal coherence is much more likely to occur when there is 
spectral overlap between binaural stimuli, since it provides continuity to the listener 
(Shamma, Elhilali, & Micheyl, 2011). Temporal coherence is an integral first step for 
auditory grouping. This is because linking dynamic, rapidly changing speech into the 
appropriate auditory object will enable the listener to stream said object (such as a target 
talker in a backdrop of competing talkers).  Related to coherence is temporal integration 
which refers to the neural process of integrating sounds in a certain temporal window. 
Temporal integration has been shown to be negatively impacted by spectral mismatch in 
CI listeners (Poon et al., 2009). This could be due to miscalculations in anatomical areas 
that process both spectral and temporal information, namely the brainstem. Moreover, the 
tonotopic nature of the auditory system has been shown to extend to the auditory cortex 
and association areas. It is probable that binaural processing could be affected by spectral 
mismatches originating in the periphery and further propagated to higher areas in the 
auditory system.  Therefore, diminished temporal grouping might have played a role in 
these experiments. 
An additional way to think about the decline in performance with the standard 
mapping condition is in terms of envelope correlation between the ears. Spectral mismatch 
might have disrupted the interaural envelope correlation between the vocoded signal and 
the acoustic signal.  Correlated envelope information has been shown to facilitate auditory 
object formation and binaural fusion (Carrell & Opie, 1992). Frequency mismatch could 
shift the envelope to higher frequencies, thus lowering interaural correlation and inhibiting 
fusion. 
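
The envelope-correlation account above can be illustrated with a minimal sketch. It is not the vocoder used in this study: Gaussian noise stands in for speech, the band edges are hypothetical, and the envelope is a simple FFT-based Hilbert envelope. The sketch compares the envelope that the acoustic ear would receive in one frequency band with the envelope that a vocoder would deliver to that place if its analysis band were matched, partly overlapping, or completely displaced. Because noise has less co-modulation across frequency than speech, the decorrelation shown here is probably larger than it would be for real talkers.

```python
import numpy as np

def band_envelope(signal, fs, f_lo, f_hi):
    """Hilbert envelope of `signal` restricted to the band [f_lo, f_hi) Hz."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    # Keep only positive frequencies inside the band and double them, so the
    # inverse FFT is the analytic signal of the band-limited waveform.
    mask = (freqs >= f_lo) & (freqs < f_hi)
    analytic = np.fft.ifft(2.0 * spectrum * mask)
    return np.abs(analytic)

def envelope_correlation(signal, fs, acoustic_band, analysis_band):
    """Pearson correlation between the envelopes of two bands of one signal."""
    env_acoustic = band_envelope(signal, fs, *acoustic_band)
    env_vocoded = band_envelope(signal, fs, *analysis_band)
    return float(np.corrcoef(env_acoustic, env_vocoded)[0, 1])

rng = np.random.default_rng(0)      # fixed seed so the sketch is repeatable
fs = 16000                          # hypothetical sampling rate (Hz)
noise = rng.standard_normal(4 * fs) # 4 s of noise as a stand-in for speech

acoustic_band = (1000.0, 1500.0)    # hypothetical band heard by the acoustic ear
r_matched = envelope_correlation(noise, fs, acoustic_band, (1000.0, 1500.0))
r_partial = envelope_correlation(noise, fs, acoustic_band, (750.0, 1250.0))
r_disjoint = envelope_correlation(noise, fs, acoustic_band, (500.0, 1000.0))
print(r_matched, r_partial, r_disjoint)
```

In this sketch the matched case yields a correlation of essentially 1, and the correlation falls as the overlap between the analysis band and the acoustic band shrinks, which is the pattern the envelope-correlation interpretation would predict.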
 
Implications for SSD-CI listeners. 
 
In this study, our listeners were more likely to pick the correct fusion interval with 
a place-matched map than with a standard map. Not only was spectral mismatch shown to be 
detrimental to binaural fusion in this study, but it also caused the largest decrease 
in performance on contralateral unmasking in the experiments in Chapter 2 (2.1, 2.3 and 
2.4). Currently, CIs are programmed for the profoundly deaf with little or no attention given 
to binaural hearing ability after CI implantation. However, for those with SSD-CI, a more 
place-matched map based on the electrode location on the basilar membrane could be 
implemented by an audiologist.  This type of change is readily realizable from a 
technological standpoint, since it would only require a shift in the speech-processor 
frequency-to-electrode allocation table (similar to the change in vocoder analysis filters in 
the current study, Table VII). However, an accurate interaural frequency match is not trivial 
to accomplish, because it requires knowledge regarding the characteristic frequencies of 
the auditory nerve fibers being stimulated by each electrode in the array. This could be 
accomplished in a number of ways. CT scans (Noble et al., 2014) or radiographs 
(Landsberger et al., 2015) could be used to estimate the insertion angles of individual 
electrodes. In fact, the radiographic data from Landsberger et al. (2015) were used to 
generate an estimated average spectral mismatch map and corresponding place-matched 
map in this study. Individualized CT scans after implantation could give clinicians a good 
approximation of the various electrode locations for their patient, and this information could 
help guide an individualized place-matched remapping protocol. However, these CT scans 
would not inform the audiologist about the characteristic frequency of the neurons located below the 
individual electrodes. Psychoacoustic methods might be used to try to determine electrode 
location. However, many psychoacoustic measures are laborious and time-consuming, and thus 
not clinically feasible.   
The 2AFC paradigm utilized in Experiment 4.2 could potentially be used to 
determine efficacy of new “place-matched” maps for CI listeners.  The speech task used in 
Experiment 4.2 is unique in that it does not rely on speech intelligibility. This is important 
because speech intelligibility with a CI is plastic and changes over time with adaptation, 
which makes intelligibility measures difficult to use as an acute tool for comparing maps. 
Using the paradigm introduced in Experiment 4.2, an audiologist could potentially fit a 
listener with a few different maps and determine which map is best for binaural fusion. 
This would eliminate the added confound of reduced intelligibility that might occur after 
acute map changes. Additionally, Experiment 4.2 is a good candidate for a clinical tool to 
build new maps, because relative to the above psychophysical techniques, it is fast and 
easy for subjects to understand, and sensitive to mismatch.  However, it still remains to be 
elucidated whether binaural fusion is solely brainstem-mediated or if the cortex plays a 
role. If fusion does instead involve cortically mediated processing, then listeners could 
possibly adapt to their mismatched maps and achieve partial fusion without remapping 
(Svirsky et al., 2004). However, adaptation might occur faster if interaural frequency 
alignment occurs right at implant activation (Svirsky, Talavage, Sinha, Neuburger, & 
Azadpour, 2015). Regardless, there are many potential avenues for clinicians and 
researchers to reduce the impact of spectral mismatch on hearing for both SSD-CI 
and BICI listeners.  
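
The clinical use of the 2AFC paradigm proposed above, in which a listener is tested with a few candidate maps, could be scored along the following lines. This is a minimal sketch under stated assumptions: the map names and trial counts are hypothetical placeholders invented for illustration (they are not data from this study), chance performance in a two-interval task is taken to be 50%, and an exact one-sided binomial test is used simply as one reasonable way to check whether a map supports above-chance selection of the fusion interval.

```python
from math import comb

def binomial_p_above_chance(n_correct, n_trials, chance=0.5):
    """Exact one-sided p-value: probability of n_correct or more by guessing."""
    return sum(
        comb(n_trials, k) * chance**k * (1.0 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

def summarize_maps(results):
    """results maps a map name to (n_correct, n_trials) from the 2AFC task."""
    summary = {}
    for name, (n_correct, n_trials) in results.items():
        summary[name] = {
            "proportion_correct": n_correct / n_trials,
            "p_vs_chance": binomial_p_above_chance(n_correct, n_trials),
        }
    return summary

# Hypothetical counts, invented purely to show the bookkeeping.
example_results = {"standard": (27, 50), "place_matched": (41, 50)}
summary = summarize_maps(example_results)
best_map = max(summary, key=lambda name: summary[name]["proportion_correct"])
print(best_map, summary[best_map])
```

Selecting the map with the highest proportion correct is only one possible decision rule; a clinical protocol would also need to specify how many trials are required for a reliable comparison between maps.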
 
Study Limitations. 
 
The current study utilized vocoder simulation of SSD-CI listening.  Although 
valuable for studying the various effects of CI processing on auditory perception, vocoder 
simulations are an imperfect approximation of what actual CI listeners hear (Freyman et 
al., 2008; Li & Loizou, 2009). Duration of deafness, spiral ganglion nerve survival, amount 
of time after implantation, listeners’ age and electrode placement and programming can all 
impact CI listeners’ outcome after implantation. Vocoder simulations reduce these 
confounds and enable researchers to study the effects of specific aspects of CI 
processing on auditory perception.    
The use of multiple different talkers was necessary for these experiments. The 
CUNY topic sentence corpus only contained two original talkers; the additional six were 
created using Praat synthesis techniques.  However, this strategy of creating multiple talkers might 
not have been ideal for these experiments. One of the features of Praat is that it can change 
the fundamental frequencies of the talkers. In order to implement the spectral mismatch in 
these studies, the new talkers were created by shifting the fundamental frequency of the 
original two talkers down in frequency. This method could have created new talkers that 
still sounded like the original talkers. However, this was likely not a significant confound 
in this study, since speech segments and talkers were all randomized. The odds that the 
same talker was presented saying the same thing in a trial were extremely low. A related 
issue with this approach is that Praat software alters the formant frequencies (shifted up or 
down), which could have reduced the difference between one of the original voices and a 
‘new’ voice; that is, the concern was that the Praat software shifts the voice and the 
vocoder shifts it back again. Because of the randomness of the speech samples, however, 
this is also likely of minor significance.  Despite this limitation, in future fusion 
experiments, a corpus with multiple (actually) different talkers should be used. 
A potential issue with Experiment 4.2 relates to the possible cues that listeners 
might have used to complete the task. This experiment was designed to measure binaural 
fusion. However, the listeners could have possibly been using additional cues to help them 
complete the experiments. In particular, since the fusion interval contained a talker saying 
the same words in both ears, the listeners could have potentially been monitoring each ear 
for common words to complete the task. This strategy would likely only work when the 
total number of talkers was low, since individual words in the mixture would become more 
difficult to understand when the total number of words presented is high (as would occur 
with multiple talkers). Additional research could investigate whether listeners could have 
used this potential alternative cue. For instance, this fusion experiment can be manipulated 
such that a listener must choose the interval where the same person is saying the same 
sentence in both ears (fusion interval) rather than choose the interval where different people 
are saying the same sentence in both ears. This alternative technique would isolate the 
fusion cue from any potential word cue strategy and remove this potential confound from 
the experiment.  
An important shortcoming of the vocoder approach is that actual CI listeners have 
time to adapt to their mismatched frequency maps, whereas vocoder listeners do not, at 
least in these acute experiments. A study by Siciliano et al. (2010) found that even after more than 10 
hours of training with a unilateral frequency-shifted vocoder, listeners received no 
binaural benefits, i.e., they could not learn the shifted map. This indicates a limit on 
binaural plasticity, at least in NH vocoder listeners. There is some limited 
evidence that post-lingually deafened monaural CI listeners and bimodal users can adapt 
to this frequency mismatch between their implanted ear and their acoustic ear (Svirsky et 
al., 2004; Reiss et al., 2007). However, this plasticity is likely to be incomplete, and these 
listeners would probably benefit from a more aligned spectral mapping, especially as it 
pertains to binaural hearing.  With regards to those with SSD, plasticity mechanisms could 
theoretically overcome some of the limitations imposed by spectral mismatch on binaural 
fusion. This would be more likely if the fusion percept is occurring at the cortical level. 
However, most of the evidence points to fusion occurring at a lower level of the auditory 
pathway where envelope coherence is highly sensitive to frequency mismatch 
(Buss et al., 2009). 
 
Conclusions 
 
The present set of experiments examined how spectral mismatch in vocoder 
simulations of SSD-CI listening affected the ability of listeners to fuse binaural stimuli and 
form auditory objects.  Experiments 4.1, A and B were designed to measure fusion in the 
context of the formation of perceptual objects in a multi-talker environment. The results of 
these experiments indicated that neither vocoder provided enough fusion cues for the 
listener to report the diotic stimuli as one voice. This could have occurred because the 
listeners were achieving partial fusion and the percept was not strong enough for the 
listeners to report the fusion stimulus as one voice. Experiment 4.2 measured a listener’s 
ability to choose the interval that contained a fused voice (signal interval) from a reference 
interval that contained two different voices (1 vocoded, 1 NH). The 2AFC task in 
Experiment 4.2 was much more sensitive to spectral manipulation. If the listeners in this 
study were achieving partial fusion, it could have been enough to discriminate between the 
two intervals in Experiment 4.2, but not enough for the listeners to report one voice in 
Experiment 4.1. Similarly, the previous results of spectral mismatch on contralateral 
unmasking probably reflect some degree of fusion – spectral mismatch leads to enough 
fusion to identify when the sounds from the two ears originate from the same source 
(Experiment 4.2), and gives rise to binaural squelch under some conditions (Bernstein et 
al., 2015; 2016; 2017; Wess et al., 2017).  However, this percept did not provide enough fusion to 
sound like one voice (this study) or to produce binaural squelch in non-informational 
masking conditions (Bernstein et al., 2015; 2016).  
Taken together, these results suggest that a typical mismatch associated with the 
average insertion angle of the CI electrode array may have a substantial effect on the ability 
to perceptually fuse a diotic speech signal in the acoustic and CI ear for SSD-CI listeners, 
and limit the ability to correctly parse the auditory scene.  Still, the place-matched vocoder 
did not provide listeners with enough interaural coherence to achieve full fusion of the 
diotic stimuli, indicating that even under ideal circumstances, a crude vocoder signal (or 
CI signal) might yield, at best, a partially fused percept. Overall, the fusion paradigm in 
Experiment 4.2 was sensitive to interaural mismatch and was relatively easy for 
participants to understand. This makes the measurement technique developed for 
Experiment 4.2 a potentially useful clinical tool to determine optimal frequency mapping 
or to evaluate outcomes of binaural integration for SSD-CI users. What is clear from the 
data is that for an SSD-CI listener to have the best chance of binaural hearing with their 
implant, steps need to be taken to reduce spectral mismatch between their acoustic ear and 
their CI.  
 
 
 
 
 
 
 
 
 
 
 
Chapter 5: Summary of dissertation and general discussion 
 
The goal of this dissertation was to examine the effect of common CI distortions on 
binaural hearing in vocoder simulations of SSD-CI listening. Individuals with SSD are at 
a severe disadvantage when it comes to listening in noisy environments due to lack of 
binaural hearing.  This dissertation was primarily concerned with how CI distortions affect 
binaural squelch and perceptual fusion. Vocoder simulations in NH listeners enabled the 
selective manipulation of certain aspects of CI processing as a first step in determining how 
these distortions might disrupt binaural hearing for SSD-CI listeners. This work was 
motivated in part by results from Bernstein et al. (2015, 2016). In these previous 
studies, SSD-CI listeners were found to receive a contralateral unmasking benefit from 
their implant. In this dissertation, contralateral unmasking is defined as the improvement 
in speech perception associated with adding the interfering voices to the ear contralateral 
to the target speech.  SSD-CI listeners demonstrated highly variable performance, poorer 
than for vocoder listeners performing the same task, and some SSD-CI listeners did not 
receive a binaural benefit at all (Bernstein et al., 2016).  This may be explained by a key 
difference between SSD-CI listeners and SSD-vocoder listeners—that is, vocoder 
simulations in NH listeners do not fully capture the distortion and functional limitations of 
a CI. Nevertheless, experimental manipulations that create the effect of these distortions 
are much more easily controlled in vocoder simulations of SSD-CI listeners; therefore, the 
entirety of this dissertation was conducted using vocoder simulations. Using these 
manipulations, this dissertation addressed questions to examine the relative importance of 
specific sources of the variability in binaural hearing outcomes for SSD-CI listeners. The 
results of these dissertation studies identified the dimensions that should be studied and 
manipulated in actual CI listeners to see if binaural outcomes can be improved, and 
ultimately enable clinicians to make better programming choices for SSD-CI listeners.  
A series of experiments tested the effects of frequency mismatch, temporal 
disparities and amplitude compression on the ability to binaurally integrate unprocessed 
speech in one ear and vocoded speech in the other. The over-arching goal was to better 
understand binaural perception of speech in the presence of interfering talkers. To elucidate 
some of the effects of CI distortions on contralateral unmasking, the experiments in 
Chapter 2 manipulated three variables related to CI processing: interaural temporal and 
spectral mismatch and spectral resolution. Spectral mismatch was chosen as the first 
variable to study since an SSD-CI listener will typically possess a frequency mismatch 
between their acoustic ear and implant. This is because (i) the implant array is not fully 
inserted into the cochlea and (ii) an implant is normally programmed between 100 and 8500 
Hz to cover the most important frequencies for speech perception. Spectral mismatch was 
applied by linearly shifting the vocoded signal up and down in frequency in the range of 
1.8 - 7.4 mm. Experiment 2.1 showed that contralateral unmasking was 
completely eliminated with a relatively small mismatch of 4-6 ERBs (3.6 - 5.4 mm). This is at 
the low end of the expected spectral mismatch for an average CI listener.  
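
To give a sense of what a shift expressed in millimeters means in Hz, the following sketch converts between frequency and cochlear place using the Greenwood (1990) frequency-position function with its standard human constants. It is an illustration only: this summary does not specify the mapping used to construct the vocoder filters, so the choice of the Greenwood function is an assumption, and the value of 0.9 mm per ERB is simply inferred from the equivalence stated above (4-6 ERBs corresponding to 3.6 - 5.4 mm). The function and variable names are hypothetical.

```python
import math

# Greenwood (1990) human constants; x is distance from the apex in mm.
A, ALPHA, K = 165.4, 0.06, 0.88
MM_PER_ERB = 0.9  # inferred from the 4-6 ERB = 3.6-5.4 mm equivalence in the text

def place_to_freq(x_mm):
    """Characteristic frequency (Hz) at x_mm from the apex."""
    return A * (10.0 ** (ALPHA * x_mm) - K)

def freq_to_place(f_hz):
    """Cochlear place (mm from the apex) with characteristic frequency f_hz."""
    return math.log10(f_hz / A + K) / ALPHA

def shift_frequency(f_hz, shift_mm):
    """Frequency reached by moving shift_mm toward the base from f_hz."""
    return place_to_freq(freq_to_place(f_hz) + shift_mm)

for n_erb in (4, 6):
    shift_mm = n_erb * MM_PER_ERB
    shifted = shift_frequency(1000.0, shift_mm)
    print(f"{n_erb} ERB = {shift_mm:.1f} mm: 1000 Hz maps to about {shifted:.0f} Hz")
```

Under these assumptions, a basalward shift at the lower end of the range that eliminated contralateral unmasking moves a 1000-Hz component by well over half an octave, which helps explain why a shift that is small in millimeters can be perceptually large.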
The next distortion examined was temporal mismatch. Although CI manufacturers 
do not have a uniform delay in their processors, on average the processing delay of a CI 
speech processor is ~10 ms longer than the traveling wave latency in an acoustic ear. 
A range of temporal 
delays was applied to the vocoded speech presented to one ear. We found that contralateral 
unmasking was not negatively affected by timing differences between the vocoder and 
acoustic ear up until about 24 ms, which is well beyond the timing discrepancy that would 
occur in actual SSD-CI listeners (Experiment 2.2).  Thus, the findings suggest that the 
interaural temporal mismatch is most likely not an important source contributing to the 
limited binaural unmasking observed in SSD-CI listeners. 
Next the effect of spectral mismatch was examined along with changes in spectral 
resolution. This was done because CIs have reduced spectral resolution and, due to current 
spread, only have about 8 functional channels at a time. The interaction between spectral 
mismatch and spectral resolution of the vocoder was examined by implementing a vocoder 
with either 3, 5, 8 or 10 channels, while systematically shifting the frequency allocation of 
the vocoder. Spectral resolution only affected performance when it accompanied a 
frequency mismatch, such that performance was more robust to spectral mismatch when 
the resolution of the vocoder was reduced (Experiment 2.3). This was a somewhat 
surprising result; the interpretation is that a lower number of frequency channels made the 
listener more immune to spectral shifts. In other words, broader channels allowed for more 
interaural correlation between the ears. Even after frequency manipulation, some interaural 
correlation was perceived. When frequency resolution was high, even a small change in 
frequency would cause the bands in both ears to become decorrelated.   
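
The interpretation above, that broader channels tolerate a given shift better, can be made concrete with a simple geometric proxy. The sketch below assumes, purely for illustration, that the vocoder channels divide the 100-8500 Hz range into bands of equal width in cochlear place (using the Greenwood function), and it computes the fraction of each band that still overlaps its unshifted counterpart after a fixed place shift. The actual filter spacing used in Experiment 2.3 is not restated in this summary, and band overlap is only a stand-in for the interaural correlation that listeners would actually experience.

```python
import math

# Greenwood (1990) human constants; place is in mm from the apex.
A, ALPHA, K = 165.4, 0.06, 0.88

def freq_to_place_mm(f_hz):
    """Cochlear place (mm from the apex) with characteristic frequency f_hz."""
    return math.log10(f_hz / A + K) / ALPHA

def channel_width_mm(n_channels, f_lo=100.0, f_hi=8500.0):
    """Width of each channel when the range is split evenly in cochlear place."""
    span = freq_to_place_mm(f_hi) - freq_to_place_mm(f_lo)
    return span / n_channels

def overlap_fraction(n_channels, shift_mm):
    """Fraction of a band that still overlaps its unshifted counterpart."""
    width = channel_width_mm(n_channels)
    return max(0.0, 1.0 - abs(shift_mm) / width)

shift_mm = 3.6  # lower end of the mismatch range that eliminated unmasking
overlaps = {n: overlap_fraction(n, shift_mm) for n in (3, 5, 8, 10)}
for n, frac in overlaps.items():
    print(f"{n:2d} channels: width {channel_width_mm(n):.2f} mm, overlap {frac:.2f}")
```

With these assumptions, the 3- and 5-channel bands still partly overlap their unshifted counterparts after a 3.6-mm shift, whereas the 8- and 10-channel bands are narrower than the shift and do not overlap at all, in keeping with the greater robustness to mismatch observed at low spectral resolution.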
Finally, the interaction between temporal and spectral mismatch was also 
examined, since both distortions are likely to coexist in a typical CI listener. The 
temporal-spectral mismatch interaction experiment showed that performance was 
best when the signals were aligned in frequency and in latency. In cases where a mismatch 
was present in one dimension, an additional mismatch in the other dimension did not further disrupt performance 
(Experiment 2.4). Contrary to expectation, the effects of the two distortions were not additive. This 
result was encouraging because CI listeners will likely have both a temporal and spectral 
mismatch between their CI and acoustic ear. Taken together, the results of Chapter 2 
indicated that spectral mismatch was by far the largest disruptor to binaural squelch. 
Chapter 3 examined the effect of amplitude compression and expansion on head-shadow 
benefit and squelch. CI listeners have fewer discriminable intensity steps than are 
available to NH listeners (Nelson et al., 1996). Additionally, the large dynamic range (DR) enjoyed by NH 
listeners is dramatically reduced for CI listeners, so compression must be applied to 
map a wide range of input amplitudes into a much smaller range. Chapter 
3 used HRTFs (horizontal spatial cues provided to the listener) to examine two spatial 
configurations to study the effects of compression on binaural squelch (3.1) and on head-
shadow benefit (3.2). The effect of expansion was also examined to determine if the 
opposite distortion (i.e. exaggerating the amplitude of the signal) could enhance 
performance in this spatial listening task. Compression was shown to have a negative effect 
on head-shadow benefit and binaural squelch. The results of Chapter 3 indicate that 
compression likely reduced ILD cues in the squelch experiment and reduced the effective 
TMRs in the head-shadow experiment, which reduced perceived spatial separation of the 
target and maskers. A direct comparison between the results of Experiment 2.1 (spectral 
mismatch and contralateral unmasking) and the experiments in Chapter 3 is difficult. The 
paradigms used in Chapter 2 and 3 are different, with the experiments in Chapter 3 
providing spatial cues to the listeners. Compression disrupted performance in both spatial 
conditions in Chapter 3, having a larger detrimental effect in the head-shadow case. In the 
head-shadow experiment (Experiment 3.2), the listener was required to listen to the 
vocoded speech to adequately perform the task. Therefore, any further manipulation to the 
vocoded signal (compression/expansion) potentially corrupted the speech and reduced 
intelligibility.  In contrast, the contralateral unmasking experiment required the listener to 
primarily attend to the acoustic ear and ignore the vocoded ear. Based on that distinction, 
it is reasonable to conclude that the mechanisms pertaining to the effects of compression 
and spectral mismatch on binaural hearing are different and spectral mismatch might be 
more detrimental to binaural hearing.  
Finally, the experiments in Chapter 4 sought to determine if the decrease in 
contralateral unmasking (measured in Chapter 2) after spectral mismatch was related to a 
loss of binaural fusion ability. The results of this dissertation implicated spectral mismatch 
as causing a large hindrance to contralateral unmasking. An interpretation of these results 
from the contralateral unmasking experiments was that frequency mismatch disrupted 
binaural fusion between the signals in the ears. In other words, listeners might not have 
been able to use spatial cues to perceptually pull the target talker from the maskers, if the 
maskers were not perceived as distinct, separate fused voices. Chapter 4 more directly 
investigated whether listeners could integrate signals between their two ears to hear a single 
voice in the context of multiple interfering talkers. Binaural fusion ability was examined 
in two different experiments. The first experiment examined numerosity judgments and the 
second examined binaural fusion in a discrimination task. These two experiments produced 
divergent results. When listeners were asked to freely report the number of voices they 
heard, their responses suggested that they always reported the diotic stimulus as unfused, 
with no effect of vocoder mismatch. On the other hand, when listeners were asked to 
discriminate between a diotic fusion interval and a non-fusion interval, they performed 
significantly better with the place-matched vocoder than the standard vocoder. 
Additionally, the listeners were better able to determine when there was a stereo voice 
present in the mixture with the place-matched vocoder. The interpretation of these studies 
is that the listeners might have been achieving partial fusion. This partial fusion was enough 
for the listeners to identify the correct fusion interval (Experiment 4.2) but not enough to 
report the diotic signal as one voice (Experiment 4.1). The idea behind this is that interaural 
frequency alignment facilitates identification of the correct fusion interval (Experiment 
4.2) and enables listeners to receive a binaural benefit to better understand speech in noise, 
i.e., receive partial fusion to facilitate contralateral unmasking (Experiment 2.1).  
 
General Discussion. 
Despite the negative impact CI distortions have on spatial hearing, many SSD-CI 
listeners receive binaural benefits such as squelch from their implant. This seems to 
contradict the results of this dissertation, which would predict that typical CI programming 
would greatly reduce or eliminate binaural benefit for SSD-CI listeners. In our vocoder 
simulations, level and frequency distortions of similar magnitude to what many SSD-CI 
listeners likely experience substantially reduced or eliminated binaural benefits. However, 
despite the likely presence of these distortions, many actual SSD-CI listeners achieve 
partial restoration of binaural function. This suggests that over time, individuals’ auditory 
systems might be compensating for these mismatched inputs (Reiss et al., 2007; Svirsky et 
al., 2004).  Still, those SSD-CI listeners who achieve some binaural hearing after 
implantation might nevertheless benefit from remapping strategies to diminish the 
effects of spectral mismatch and compression. These listeners might still not be 
maximizing the potential power of their implant to provide binaural hearing. It is an open 
question as to what kind of hearing benefits SSD-CI listeners might achieve with CI 
mapping that more closely reflects the needs of SSD individuals. In contrast, for SSD-CI 
users who do not exhibit any binaural hearing benefits after implantation, it is possible that 
a modified frequency allocation and compression algorithm would restore some aspects of 
binaural hearing for these individuals.  
Frequency mismatch and compression are viable and realistic targets for 
optimization because they can be minimized with current CI technology and techniques. 
Simply changing the frequency allocation of an SSD-CI listener’s electrode array has the 
potential to reduce spectral mismatch if it can be determined what the optimal map is. 
Additionally, dynamic and adaptive compression algorithms could be implemented that 
might be less disruptive to spatial hearing than static envelope compression.  Reduction of 
the frequency mismatch between an acoustic ear and an implanted ear has the most 
potential for improved spatial hearing for those with SSD. Although the data are sparse, 
the binaural system in the brainstem is believed to be based on coincidence detection by 
spectrally matched inputs coming from each ear (Joris et al., 1998). Therefore, simply 
providing a better interaural frequency match might restore many binaural hearing benefits 
for SSD-CI listeners via improved alignment of subcortical circuitry. Additionally, most 
plasticity is seen during development and normally there exists no reason to rewire 
subcortical binaural circuits in adulthood (King, Parsons, & Moore, 2000). Once the head 
and ears reach adult size, these brainstem-mediated binaural circuits are essentially stable. 
Therefore, plasticity mechanisms cannot necessarily be relied on to remedy misalignment 
in subcortical circuits. Thus, providing the binaural system with a more accurate alignment 
between the implant and acoustic ear is of principal importance for binaural hearing.  
Frequency mismatch between the CI and acoustic ear can be diminished by 
innovative mapping techniques. This type of change is readily realizable from a 
technological standpoint, since it would only require a shift in the speech processor 
frequency-to-electrode allocation table (similar to the change in vocoder analysis filters in 
Experiment 4.2). However, a completely accurate interaural frequency match is not trivial 
to accomplish, because it requires knowledge regarding the characteristic frequencies of 
the auditory nerve fibers being stimulated by each electrode in the array. Determining the 
location of the electrode array could be accomplished in a number of ways. CT scans 
(Noble et al., 2014) or radiographs (Landsberger et al., 2015) could be used to estimate the 
insertion angles of individual electrodes. Individualized CT scans after implantation could 
give clinicians a good approximation of the various electrode locations for their patient and 
this information can help guide an individualized place-matched remapping protocol. 
However, CT scans would only provide the location of the electrode array and would not 
inform the audiologist about the characteristic frequency of the neurons located below the 
individual electrodes, neural survival and potential electric field interactions.   
Alternatively, psychoacoustic methods might be used to try to determine electrode 
location. Pitch matching between acoustic and electrical stimuli could be used to determine 
electrode location. However, pitch-matching procedures can be susceptible to methodological 
bias (Carlyon et al., 2010) and have been shown to be susceptible to adaptation effects 
(Reiss et al., 2014). Pitch perception changes reflect cortical plasticity instead of brainstem 
relative alignments and would likely not be implicated in optimized binaural function. ITD-
sensitivity comparisons between a given electrode and a limited range of acoustic stimuli 
could be used to approximate the location of a listener’s electrodes on the basilar membrane 
(Goupell et al., 2013; Kan et al., 2013).  Identifying ITD-sensitive pairs of electrodes might 
be the most direct way to determine the best frequency allocation for pairs of electrodes.  
This approach directly engages the binaural system and has been shown to be a promising 
psychoacoustic method for determining electrode location for bilateral CI listeners (Hu & 
Dietz, 2015; Kan et al., 2013). ITD sensitivity could also be measured in SSD-CI listeners 
by presenting a narrow-band acoustic stimulus to the NH ear paired with a sensitive 
single electrode in the CI ear. ITD sensitivity might not be as susceptible to adaptation 
effects, as is pitch matching. This is because ITD processing is a brainstem-mediated 
computation, and therefore is less vulnerable to plasticity mechanisms than the perception 
of pitch, which could be subject to cortical plasticity (Weinberger, 1995). However, these 
ITD measurement experiments take a very long time to complete and some SSD-CI 
listeners are unable to complete the task at all. The fusion paradigm utilized in Experiment 
4.2 could be a good candidate for determining optimal binaural sensitivity in SSD-CI 
listeners. A clinician could test different maps to determine which map facilitates fusion 
for the SSD-CI listener. The experiment is relatively straightforward, quick to administer 
and has the potential to determine whether a particular map can lead to binaural fusion. 
Based on these putative brainstem mechanisms for ITD and the results presented in Chapter 
2 that demonstrate the largest detriments from spectral mismatch, more accurate alignment 
between the basilar membrane and implant array would likely lead to the largest 
improvements in binaural hearing for SSD-CI listeners.  
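As an illustration of what alignment means in practice, the sketch below converts a frequency-to-place mismatch into millimeters along the basilar membrane using Greenwood's frequency-position function for the human cochlea. The constants are the commonly cited human values. The 35 mm cochlear length, the function names, and the example frequencies are assumptions chosen for illustration and are not taken from the experiments in this dissertation.

```python
import math

# Commonly cited human constants for Greenwood's frequency-position function,
# f = A * (10 ** (ALPHA * x) - K), with x the proportion of cochlear length
# measured from the apex (0 = apex, 1 = base).
A, ALPHA, K = 165.4, 2.1, 0.88
COCHLEAR_LENGTH_MM = 35.0  # assumed nominal length; real cochleae vary


def place_from_frequency(freq_hz):
    """Distance from the apex (mm) whose characteristic frequency is freq_hz."""
    x = math.log10(freq_hz / A + K) / ALPHA
    return x * COCHLEAR_LENGTH_MM


def frequency_from_place(place_mm):
    """Characteristic frequency (Hz) at a given distance from the apex (mm)."""
    x = place_mm / COCHLEAR_LENGTH_MM
    return A * (10 ** (ALPHA * x) - K)


def place_mismatch_mm(allocated_hz, electrode_place_hz):
    """Signed mismatch in mm. Positive values mean the electrode sits basal to
    the place that normally codes the frequency allocated to it."""
    return place_from_frequency(electrode_place_hz) - place_from_frequency(allocated_hz)


if __name__ == "__main__":
    # Hypothetical electrode: the map allocates 500 Hz to it, but it sits at
    # the cochlear place whose characteristic frequency is 1000 Hz.
    shift = place_mismatch_mm(allocated_hz=500.0, electrode_place_hz=1000.0)
    print(f"Estimated place mismatch: {shift:.1f} mm (basal shift)")
```

Expressing mismatch in millimeters rather than Hertz makes shifts comparable across the array, because equal frequency differences correspond to unequal distances at the apex and the base.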
Distortion caused by compression was found to be substantially detrimental to 
binaural hearing in the experiments in Chapter 3. Because of the compromised state of the 
cochlea after deafness and the limitations in how electric current can encode level, 
compression is necessary in CI processing. However, just as with spectral mismatch, 
several possible remedies are available. One innovative solution comes from a study by 
Kasturi and Loizou (2007), who implemented a dynamic compressive function to determine the 
effects of a rapidly changing compression function on speech understanding in CI listeners. 
Static envelope compression falls short when background noise is increased; therefore, the 
authors aimed to determine what effect a sigmoid-shaped compression function might have 
on perception of speech in noise. This innovative technique involves suppressing any 
signals that fall below the noise floor and retaining any signals that fall above the noise 
floor (likely speech). The sigmoid function likely works well because the knee point 
(compression threshold) was set to change depending on the listening environment 
(dynamic compression based on the current noise floor). After examining speech 
perception, they found that the sigmoid compressive function produced significantly lower 
speech reception thresholds than the standard logarithmic compression algorithm. A 
follow-up study by the same research group (Hu, Loizou, Li, & Kasturi, 2007) compared the 
sigmoid compressive function with the CI listeners' own everyday strategy and obtained 
the same result: the dynamic sigmoid compressive function outperformed the 
standard compressive function in every noise condition tested.  
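The contrast between a static logarithmic map and a noise-floor-dependent sigmoid map can be sketched as follows. This is a minimal illustration, not the function used by Kasturi and Loizou (2007): the logistic form, the slope value, the amplitude range, and all function names are assumptions made for this example.

```python
import math


def log_compress(x, x_min=1e-3, x_max=1.0):
    """Static logarithmic map from envelope amplitude to a 0..1 output range."""
    x = min(max(x, x_min), x_max)  # clip to the assumed input dynamic range
    return math.log(x / x_min) / math.log(x_max / x_min)


def sigmoid_compress(x, noise_floor, slope=8.0, x_min=1e-3, x_max=1.0):
    """Logistic map applied in the log-amplitude domain.

    The knee point follows the current noise-floor estimate: inputs below it
    are pushed toward 0, inputs above it toward 1, and an input exactly at
    the noise floor maps to 0.5.
    """
    level = log_compress(x, x_min, x_max)
    knee = log_compress(noise_floor, x_min, x_max)
    return 1.0 / (1.0 + math.exp(-slope * (level - knee)))


if __name__ == "__main__":
    # Hypothetical envelope amplitudes with an assumed noise floor of 0.1.
    for amp in (0.01, 0.05, 0.1, 0.3, 0.9):
        print(f"amp={amp:5.2f}  log={log_compress(amp):.3f}  "
              f"sigmoid={sigmoid_compress(amp, noise_floor=0.1):.3f}")
```

Because the knee is an argument rather than a constant, re-estimating the noise floor as the environment changes shifts the whole function, which is the dynamic behavior described above.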
Additional research examining adaptive compression strategies in vocoder 
simulations corroborated these findings. Lai, Tsao, and Chen (2015) 
implemented an envelope compression strategy that enhanced the modulation depth of 
the vocoded signals and compared speech perception performance to that of a standard 
static compression algorithm.  They found that the adaptive strategy substantially improved 
speech intelligibility in noise. They concluded that this type of adaptive strategy could show 
real promise in actual CI listeners by enhancing signal envelopes while reducing the 
impacts of background noise. Taken together, the research on dynamic, adaptive 
compression algorithms shows real promise in improving speech perception in noise for 
CI listeners. Adapting sigmoid-shaped compression for SSD-CI listeners might facilitate 
hearing speech in noise. This type of compression is thought to improve spectral contrast 
(as would be needed in competing-talker situations) without disrupting loudness. 
Sigmoid-shaped compression should also attenuate more spatial noise because of the 
adaptive noise floor, which would facilitate hearing in noisy environments. Although these 
strategies are promising, it remains to be determined whether they could 
improve spatial hearing outcomes for SSD-CI listeners. 
The future is bright for optimization of compression in CI processing for all CI users. 
Ultimately, for CI compression to be enhanced, it needs to be adaptive to the listening 
environment (noisy or quiet) and situation. Additionally, the compression parameter should 
be adjusted for each individual listener based on his or her unique needs and limitations. 
New technologies are being developed to restore normal loudness growth for CI listeners 
and potentially the number of discriminable intensity steps as well.  
Taken together, the results of this dissertation indicate that common CI distortions 
can impose some listening challenges for SSD-CI listeners. The principal findings of this 
dissertation identify frequency mismatch and compression as important possible targets for 
optimization to facilitate binaural hearing for SSD-CI listeners. Follow-up studies should 
specifically target these two CI distortions in actual SSD-CI listeners to determine what 
effect they have on binaural hearing.  Fortunately, these distortions can likely be minimized 
by innovative mapping and signal programming techniques in order to ensure that SSD-CI 
listeners receive binaural hearing benefits from their implant. Given the importance of 
verbal communication in our society, better spatial hearing in noise for SSD-CI listeners 
would likely improve their quality of life. More broadly, better hearing outcomes for 
current SSD-CI listeners may motivate more individuals with SSD to 
seek out CIs as a treatment option.  
 
References 
 
Arbogast, T. L., Mason, C. R., & Kidd, G. (2002). The effect of spatial separation on 
informational and energetic masking of speech. The Journal of the Acoustical Society 
of America, 112(5), 2086. http://doi.org/10.1121/1.1510141 
Arndt, S., Aschendorff, A., Laszig, R., Beck, R., Schild, C., Kroeger, S., … Wesarg, T. 
(2010). Comparison of Pseudobinaural Hearing to Real Binaural Hearing 
Rehabilitation After Cochlear Implantation in Patients With Unilateral Deafness and 
Tinnitus. 
Aronoff, J. M., Freed, D. J., Fisher, L. M., Pal, I., & Soli, S. D. (2011). The Effect of 
Different Cochlear Implant Microphones on Acoustic Hearing Individuals' Binaural 
Benefits for Speech Perception in Noise. Ear and Hearing, 468–484. 
http://doi.org/10.1097/AUD.0b013e31820dd3f0 
Aronoff, J. M., Shayman, C., Prasad, A., Suneel, D., & Stelmach, J. (2015). Unilateral 
spectral and temporal compression reduces binaural fusion for normal hearing 
listeners with cochlear implant simulations. Hearing Research, 320, 24–29. 
http://doi.org/10.1016/j.heares.2014.12.005 
Begault, D. R., Wenzel, E. M., & Anderson, M. R. (2001). Direct comparison of the impact 
of head tracking, reverberation, and individualized head-related transfer functions on 
the spatial perception of a virtual speech source. Journal of the Audio Engineering 
Society, 49, 904–916. 
Bernstein, J. G. W., Goupell, M. J., Iyer, N., Schuchman, G. I., Rivera, A. L., & Brungart, 
D. S. (2013). Binaural speech stream segregation for single-sided deaf and bilateral 
cochlear implantees. Poster presentation, Conference on Implantable Auditory 
Prostheses. 
Bernstein, J. G. W., Goupell, M. J., Schuchman, G. I., Rivera, A. L., & Brungart, D. S. 
(2016). Having Two Ears Facilitates the Perceptual Separation of Concurrent Talkers 
for Bilateral and Single-Sided Deaf Cochlear Implantees. Ear and Hearing, 289–302. 
http://doi.org/10.1097/AUD.0000000000000284 
Bernstein, J. G. W., Iyer, N., & Brungart, D. S. (2015). Release from informational 
masking in a monaural competing-speech task with vocoded copies of the maskers 
presented contralaterally. The Journal of the Acoustical Society of America, 137(2), 
702–13. http://doi.org/10.1121/1.4906167 
Bess, F. H., & Tharpe, A. M. (1984). Unilateral hearing impairment in children. Pediatrics, 
74(2), 206–16. Retrieved from 
http://pediatrics.aappublications.org/content/74/2/206.abstract 
Best, V., Thompson, E. R., Mason, C. R., & Kidd, G. (2013). An energetic limit on spatial 
release from masking. JARO - Journal of the Association for Research in 
Otolaryngology, 14, 603–610. http://doi.org/10.1007/s10162-013-0392-1 
Blamey, P., Artieres, F., Başkent, D., Bergeron, F., Beynon, A., Burke, E., … Lazard, D. 
S. (2012). Factors affecting auditory performance of postlinguistically deaf adults 
using cochlear implants: An update with 2251 patients. Audiology and Neurotology, 
18(1), 36–47. http://doi.org/10.1159/000343189 
Boersma, P., & Weenink, D. (2007). Praat: doing phonetics by computer (Version 4.5) 
[Computer program]. Retrieved from http://www.praat.org/ 
Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for 
multitalker communications research. The Journal of the Acoustical Society of 
America. http://doi.org/10.1121/1.428288 
Bradley, J. S., Reich, R. D., & Norcross, S. G. (1999). On the combined effects of signal-
to-noise ratio and room acoustics on speech intelligibility. The Journal of the 
Acoustical Society of America, 106(4 Pt 1), 1820–8. http://doi.org/10.1121/1.427932 
Bregman, A. S. (1994). The Auditory Scene. In Auditory Scene Analysis: The perceptual 
organization of sound (pp. 1–45). 
Bronkhorst, A. W. (2000). The Cocktail Party Phenomenon: A Review of Research on 
Speech Intelligibility in Multiple-Talker Conditions. Acustica, 86, 117–128. 
http://doi.org/10.1306/74D710F5-2B21-11D7-8648000102C1865D 
Bronkhorst, A. W., & Plomp, R. (1988). The effect of head-induced interaural time and 
level differences on speech intelligibility in noise. The Journal of the Acoustical 
Society of America, 83, 1508–1516. http://doi.org/10.1121/1.395906 
Brungart, D. S. (2001). Informational and energetic masking effects in the perception of 
two simultaneous talkers. The Journal of the Acoustical Society of America, 109(3), 
1101–1109. http://doi.org/10.1121/1.1345696 
Brungart, D. S., Simpson, B. D., Ericson, M. A., & Scott, K. R. (2001). Informational and 
energetic masking effects in the perception of multiple simultaneous talkers. The 
Journal of the Acoustical Society of America, 110(5), 2527. 
http://doi.org/10.1121/1.1408946 
Buechner, A., Brendel, M., Lesinski-Schiedat, A., Wenzel, G., Frohne-Buechner, C., 
Jaeger, B., & Lenarz, T. (2010). Cochlear implantation in unilateral deaf subjects 
associated with ipsilateral tinnitus. Otology & Neurotology : Official Publication of 
the American Otological Society, American Neurotology Society [and] European 
Academy of Otology and Neurotology, 31(9), 1381–5. 
http://doi.org/10.1097/MAO.0b013e3181e3d353 
Buss, E., Whittle, L. N., Grose, J. H., & Hall, J. W. (2009). Masking release for words in 
amplitude-modulated noise as a function of modulation rate and task. The Journal of 
the Acoustical Society of America, 126(1), 269–80. http://doi.org/10.1121/1.3129506 
Cai, Y., Zheng, Y., Liang, M., Zhao, F., Yu, G., Liu, Y., … Chen, G. (2015). Auditory 
spatial discrimination and the mismatch negativity response in hearing-impaired 
individuals. PLoS ONE, 10(8). http://doi.org/10.1371/journal.pone.0136299 
Carlyon, R. P., Macherey, O., Frijns, J. H. M., Axon, P. R., Kalkman, R. K., Boyle, P., … 
Dauman, R. (2010). Pitch comparisons between electrical stimulation of a cochlear 
implant and acoustic stimuli presented to a normal-hearing contralateral ear. Journal 
of the Association for Research in Otolaryngology : JARO, 11(4), 625–40. 
http://doi.org/10.1007/s10162-010-0222-7 
Carrell, T. D., & Opie, J. M. (1992). The effect of amplitude comodulation on auditory 
object formation in sentence perception. Perception & Psychophysics, 52(4), 437–45. 
http://doi.org/10.3758/BF03206703 
Chermak, G., & Lee, J. (2005). Comparison of children’s performance on four tests of 
temporal resolution. Journal of the American Academy of Audiology, 16(8), 554–563. 
http://doi.org/10.3766/jaaa.16.8.4 
Clarkson, P. M., & Bahgat, S. F. (1991). Envelope expansion methods for speech 
enhancement. The Journal of the Acoustical Society of America, 89(3), 1378–82. 
http://doi.org/10.1121/1.400538 
Cooke, M. (2006). A glimpsing model of speech perception in noise. The Journal of the 
Acoustical Society of America, 119, 1562–1573. http://doi.org/10.1121/1.2166600 
Crew, J. D., Galvin, J. J., & Fu, Q.-J. (2012). Channel interaction limits melodic pitch 
perception in simulated cochlear implants. The Journal of the Acoustical Society of 
America, 132(October), EL429. 
http://doi.org/10.1121/1.4758770 
Culling, J. F., Jelfs, S., Talbert, A., Grange, J. A., & Backhouse, S. S. (2012). The benefit of 
bilateral versus unilateral cochlear implantation to speech intelligibility in noise. Ear 
and Hearing, 33(6), 673–82. http://doi.org/10.1097/AUD.0b013e3182587356 
Darwin, C. J., & Hukin, R. W. (1998). Perceptual segregation of a harmonic from a vowel 
by interaural time difference in conjunction with mistuning and onset asynchrony. The 
Journal of the Acoustical Society of America, 103, 1080–1084. 
http://doi.org/10.1121/1.421221 
de Cheveigné, A., McAdams, S., & Marin, C. M. H. (1997). Concurrent vowel 
identification. II. Effects of phase, harmonicity, and task. The Journal of the 
Acoustical Society of America. http://doi.org/10.1121/1.419476 
DeVries, L., Scheperle, R., & Bierer, J. A. (2016). Assessing the Electrode-Neuron 
Interface with the Electrically Evoked Compound Action Potential, Electrode 
Position, and Behavioral Thresholds. JARO - Journal of the Association for Research 
in Otolaryngology, 17(3), 237–252. http://doi.org/10.1007/s10162-016-0557-9 
Dong, S., Mulders, W. H. A. M., Rodger, J., & Robertson, D. (2009). Changes in neuronal 
activity and gene expression in guinea-pig auditory brainstem after unilateral partial 
hearing loss. Neuroscience, 159(3), 1164–74. 
http://doi.org/10.1016/j.neuroscience.2009.01.043 
Dooley, G. J., Blamey, P. J., Seligman, P. M., Alcantara, J. I., Clark, G. M., Shallop, J. K., 
… Menapace, C. M. (1993). Combined Electrical and Acoustical Stimulation Using 
a Bimodal Prosthesis. 
Dorman, M. F., Zeitler, D., Cook, S. J., Loiselle, L., Yost, W. A., Wanna, G. B., & Gifford, 
R. H. (2015). Interaural level difference cues determine sound source localization by 
single-sided deaf patients fit with a cochlear implant. Audiology and Neurotology, 
20(3), 183–188. http://doi.org/10.1159/000375394 
Drullman, R., & Bronkhorst, A. W. (2000). Multichannel speech intelligibility and talker 
recognition using monaural, binaural, and three-dimensional auditory presentation. 
The Journal of the Acoustical Society of America, 107(4), 2224–2235. 
http://doi.org/10.1121/1.428503 
Dunn, C. C., Tyler, R. S., Witt, S., Ji, H., & Gantz, B. J. (2012). Sequential bilateral 
cochlear implantation: Speech perception and localization pre-and post-second 
cochlear implantation. American Journal of Audiology, 21, 181–189. 
http://doi.org/10.1044/1059-0889(2012/12-0004) 
Durlach, N. I. (1963). Equalization and Cancellation Theory of Binaural Masking-Level 
Differences. The Journal of the Acoustical Society of America. 
http://doi.org/10.1121/1.1918675 
Durlach, N. I., Mason, C. R., Shinn-Cunningham, B. G., Arbogast, T. L., Colburn, H. S., 
& Kidd, G. (2003). Informational masking: Counteracting the effects of stimulus 
uncertainty by decreasing target-masker similarity. The Journal of the Acoustical 
Society of America, 114(1), 368. http://doi.org/10.1121/1.1577562 
Eapen, R. J., Buss, E., Adunka, M. C., Pillsbury, H. C., & Buchman, C. A. (2009). Hearing-
in-noise benefits after bilateral simultaneous cochlear implantation continue to 
improve 4 years after implantation. Otology & Neurotology : Official Publication of 
the American Otological Society, American Neurotology Society [and] European 
Academy of Otology and Neurotology, 30, 153–159. 
http://doi.org/10.1097/MAO.0b013e3181925025 
Elliott, T. M., & Theunissen, F. E. (2009). The modulation transfer function for speech 
intelligibility. PLoS Computational Biology, 5(3), e1000302. 
http://doi.org/10.1371/journal.pcbi.1000302 
English, K., & Church, G. (1999). Unilateral hearing loss in children: An update for the 
1990s. Language, Speech, and Hearing Services in Schools, 30(1), 26–31. Retrieved 
from http://lshss.asha.org/cgi/content/abstract/30/1/26 
Erbele, I. D., Bernstein, J. G. W., Schuchman, G. I., Brungart, D. S., & Rivera, A. (2015). 
An initial experience of cochlear implantation for patients with single-sided deafness 
after prior osseointegrated hearing device. Otology & Neurotology : Official 
Publication of the American Otological Society, American Neurotology Society [and] 
European Academy of Otology and Neurotology, 36(1), e24-9. 
http://doi.org/10.1097/MAO.0000000000000652 
Firszt, J. B., Holden, L. K., Reeder, R. M., Cowdrey, L., & King, S. (2012). Cochlear 
implantation in adults with asymmetric hearing loss. Ear and Hearing, 33(4), 521–
33. http://doi.org/10.1097/AUD.0b013e31824b9dfc 
Francart, T., & McDermott, H. J. (2013). Psychophysics, fitting, and signal processing for 
combined hearing aid and cochlear implant stimulation. Ear and Hearing, 34(6), 685–
700. http://doi.org/10.1097/AUD.0b013e31829d14cb 
Freyman, R. L., Balakrishnan, U., & Helfer, K. S. (2001). Spatial release from 
informational masking in speech recognition. The Journal of the Acoustical 
Society of America, 109(5), 2112–2122. http://doi.org/10.1121/1.1354984 
Freyman, R. L., Balakrishnan, U., & Helfer, K. S. (2008). Spatial release from masking 
with noise-vocoded speech. The Journal of the Acoustical Society of America, 124(3), 
1627–37. http://doi.org/10.1121/1.2951964 
Freyman, R. L., Helfer, K. S., & Balakrishnan, U. (2005). Spatial and spectral factors in 
release from informational masking in speech recognition. Acta Acustica United with 
Acustica, 91, 537–545. http://doi.org/10.1121/1.1354984 
Freyman, R. L., Helfer, K. S., McCall, D. D., & Clifton, R. K. (1999). The role of 
perceived spatial separation in the unmasking of speech. Journal of the Acoustical 
Society of America, 106(6), 3578–3588. http://doi.org/10.1121/1.428211 
Fried, D. L. (1990). Greenwood frequency measurements. Journal of the Optical Society 
of America A. http://doi.org/10.1364/JOSAA.7.000946 
Friesen, L. M., Shannon, R. V., Baskent, D., & Wang, X. (2001). Speech recognition in 
noise as a function of the number of spectral channels: Comparison of acoustic hearing 
and cochlear implants. The Journal of the Acoustical Society of America, 110(2), 
1150. http://doi.org/10.1121/1.1381538 
Fu, Q.-J., & Nogaki, G. (2005). Noise susceptibility of cochlear implant users: the role of 
spectral resolution and smearing. Journal of the Association for Research in 
Otolaryngology : JARO, 6(1), 19–27. http://doi.org/10.1007/s10162-004-5024-3 
Fu, Q. J., & Shannon, R. V. (1998). Effects of amplitude nonlinearity on phoneme 
recognition by cochlear implant users and normal-hearing listeners. The Journal of 
the Acoustical Society of America, 104(5), 2570–2577. 
http://doi.org/10.1121/1.423912 
Gallun, F. J., Mason, C. R., & Kidd, G. (2005). Binaural release from informational 
masking in a speech identification task. The Journal of the Acoustical Society of 
America, 118(3), 1614. http://doi.org/10.1121/1.1984876 
Garadat, S. N., Litovsky, R. Y., Yu, G., & Zeng, F.-G. (2009). Role of binaural hearing in 
speech intelligibility and spatial release from masking using vocoded speech. The 
Journal of the Acoustical Society of America, 126(5), 2522–2535. 
http://doi.org/10.1121/1.3238242 
Gardner, W. G. (1995). HRTF measurements of a KEMAR. The Journal of the Acoustical 
Society of America. http://doi.org/10.1121/1.412407 
Gelfand, S. (2004). Hearing: An Introduction to Psychological and Physiological 
Acoustics (4th ed.). New York: Marcel Dekker. 
Glasberg, B. R., & Moore, B. C. (1986). Auditory filter shapes in subjects with unilateral 
and bilateral cochlear impairments. The Journal of the Acoustical Society of America, 
79, 1020–1033. http://doi.org/10.1121/1.393374 
Goksoy, C., Demirtas, S., Yagcioglu, S., & Ungan, P. (2005). Interaural delay-dependent 
changes in the binaural interaction component of the guinea pig brainstem responses. 
Brain Research, 1054(2), 183–191. http://doi.org/10.1016/j.brainres.2005.06.083 
Gordon, K. A., Valero, J., van Hoesel, R., & Papsin, B. C. (2008). Abnormal timing delays 
in auditory brainstem responses evoked by bilateral cochlear implant use in children. 
Otology & Neurotology : Official Publication of the American Otological Society, 
American Neurotology Society [and] European Academy of Otology and 
Neurotology, 29, 193–198. http://doi.org/10.1097/mao.0b013e318162514c 
Goupell, M. J., & Litovsky, R. Y. (2015). Sensitivity to interaural envelope correlation 
changes in bilateral cochlear-implant users. The Journal of the Acoustical Society of 
America, 137(1), 335–349. http://doi.org/10.1121/1.4904491 
Goupell, M. J., Stoelb, C., Kan, A., & Litovsky, R. Y. (2013). Effect of mismatched place-
of-stimulation on the salience of binaural cues in conditions that simulate bilateral 
cochlear-implant listening. The Journal of the Acoustical Society of America, 133(4), 
2272–87. http://doi.org/10.1121/1.4792936 
Grant, K. W., Wassenhove, V. Van, & Poeppel, D. (2004). Detection of auditory (cross-
spectral) and auditory–visual (cross-modal) synchrony. Speech Communication, 
44(1–4), 43–53. http://doi.org/10.1016/j.specom.2004.06.004 
Grantham, D. W., Ashmead, D. H., Haynes, D. S., Hornsby, B. W. Y., Labadie, R. F., & 
Ricketts, T. A. (2012). Horizontal Plane Localization in Single-Sided Deaf Adults 
Fitted With a Bone-Anchored Hearing Aid (Baha). Ear and Hearing. 
http://doi.org/10.1097/AUD.0b013e3182503e5e 
Grantham, D. W., Ashmead, D. H., Ricketts, T. A., Haynes, D. S., & Labadie, R. F. (2008). 
Interaural time and level difference thresholds for acoustically presented signals in 
post-lingually deafened adults fitted with bilateral cochlear implants using CIS+ 
processing. Ear and Hearing, 29, 33–44. 
http://doi.org/10.1097/AUD.0b013e31815d636f 
Green, T., Faulkner, A., & Rosen, S. (2002). Spectral and temporal cues to pitch in noise-
excited vocoder simulations of continuous-interleaved-sampling cochlear implants. 
The Journal of the Acoustical Society of America, 112(5), 2155. 
http://doi.org/10.1121/1.1506688 
Greenwood, D. D. (1961). Auditory Masking and the Critical Band. The Journal of the 
Acoustical Society of America. http://doi.org/10.1121/1.1908699 
Grothe, B., Pecka, M., & McAlpine, D. (2010). Mechanisms of sound localization in 
mammals. Physiological Reviews, 90(3), 983–1012. 
http://doi.org/10.1152/physrev.00026.2009 
Hall, J. W., Buss, E., & Grose, J. H. (2005). Informational masking release in children and 
adults. The Journal of the Acoustical Society of America, 118, 1605–1613. 
http://doi.org/10.1121/1.1992675 
Hansen, M. R., Gantz, B. J., & Dunn, C. (2013). Outcomes After Cochlear Implantation 
for Patients With Single-Sided Deafness, Including Those With Recalcitrant Ménière's 
Disease. 
Hawley, M. L., Litovsky, R. Y., & Culling, J. F. (2004). The benefit of binaural hearing in 
a cocktail party: Effect of location and type of interferer. The Journal of the Acoustical 
Society of America, 115(2), 833. http://doi.org/10.1121/1.1639908 
Hoesel, R. Van. (2012). Auditory Prostheses. (F.-G. Zeng, A. N. Popper, & R. R. Fay, 
Eds.) (Vol. 39). New York, NY: Springer New York. http://doi.org/10.1007/978-1-
4419-9434-9 
Hopkins, K., & Moore, B. C. J. (2009). The contribution of temporal fine structure to the 
intelligibility of speech in steady and modulated noise. The Journal of the Acoustical 
Society of America, 125(1), 442–6. http://doi.org/10.1121/1.3037233 
Hu, H., & Dietz, M. (2015). Comparison of Interaural Electrode Pairing Methods for 
Bilateral Cochlear Implants. Trends in Hearing, 19, 233121651561714. 
http://doi.org/10.1177/2331216515617143 
Hu, Y., Loizou, P. C., Li, N., & Kasturi, K. (2007). Use of a sigmoidal-shaped function for 
noise attenuation in cochlear implants. The Journal of the Acoustical Society of 
America, 122(4), EL128–EL134. http://doi.org/10.1121/1.2772401 
Ihlefeld, A., & Litovsky, R. Y. (2012). Interaural level differences do not suffice for 
restoring spatial release from masking in simulated cochlear implant listening. PLoS 
ONE, 7(9), e45296. http://doi.org/10.1371/journal.pone.0045296 
Ihlefeld, A., & Shinn-Cunningham, B. (2008). Spatial release from energetic and 
informational masking in a selective speech identification task. The Journal of the 
Acoustical Society of America, 123(6), 4369–79. http://doi.org/10.1121/1.2904826 
Jones, G. L., Won, J. H., Drennan, W. R., & Rubinstein, J. T. (2013). Relationship between 
channel interaction and spectral-ripple discrimination in cochlear implant users. 
The Journal of the Acoustical Society of America, 133(1), 425–433. 
http://doi.org/10.1121/1.4768881 
Joris, P. X., Smith, P. H., & Yin, T. C. T. (1998). Coincidence detection in the auditory 
system: 50 years after Jeffress. Neuron. http://doi.org/10.1016/S0896-
6273(00)80643-1 
Kamal, S. M., Robinson, A. D., & Diaz, R. C. (2012). Cochlear implantation in single-
sided deafness for enhancement of sound localization and speech perception. Current 
Opinion in Otolaryngology & Head and Neck Surgery, 20(5), 393–397. 
http://doi.org/10.1097/MOO.0b013e328357a613 
Kan, A., Stoelb, C., Litovsky, R. Y., & Goupell, M. J. (2013a). Effect of mismatched place-
of-stimulation on binaural fusion and lateralization in bilateral cochlear-implant users. 
The Journal of the Acoustical Society of America, 134(4), 2923–36. 
http://doi.org/10.1121/1.4820889 
Kan, A., Stoelb, C., Litovsky, R. Y., & Goupell, M. J. (2013b). Effect of mismatched place-
of-stimulation on binaural fusion and lateralization in bilateral cochlear-implant users. 
The Journal of the Acoustical Society of America, 134(4), 2923–36. 
http://doi.org/10.1121/1.4820889 
Kasturi, K., & Loizou, P. C. (2007). Use of S-shaped input-output functions for noise 
suppression in cochlear implants. Ear and Hearing, 28(3), 402–411. 
http://doi.org/10.1097/AUD.0b013e31804793c4 
Kawano, A., Seldon, H. L., Pyman, B., & Clark, G. M. (1995). Intracochlear factors 
contributing to psychophysical percepts following cochlear implantation: A case 
study. In Annals of Otology, Rhinology and Laryngology (Vol. 104, pp. 54–57). 
http://doi.org/10.1080/00016489850183386 
Ketten, D. R., Skinner, M. W., Wang, G., Vannier, M. W., Gates, G. A., & Neely, J. G. 
(1998). In vivo measures of cochlear length and insertion depth of nucleus cochlear 
implant electrode arrays. Annals of Otology, Rhinology and Laryngology, 107, 1–16. 
Kidd, G., Mason, C. R., & Arbogast, T. L. (2002). Similarity, uncertainty, and masking in 
the identification of nonspeech auditory patterns. The Journal of the Acoustical 
Society of America, 111(3), 1367. http://doi.org/10.1121/1.1448342 
Kidd, G., Mason, C. R., Best, V., & Marrone, N. (2010). Stimulus factors influencing 
spatial release from speech-on-speech masking. The Journal of the Acoustical Society 
of America, 128, 1965–1978. http://doi.org/10.1121/1.3478781 
Kidd, G., Mason, C. R., & Deliwala, P. S. (1994). Reducing informational masking by 
sound segregation. The Journal of the Acoustical Society of America, 95(6), 3475–3480. 
Kidd, G., Mason, C. R., Rohtla, T. L., & Deliwala, P. S. (1998). Release from masking due 
to spatial separation of sources in the identification of nonspeech auditory patterns. 
The Journal of the Acoustical Society of America, 104, 422–431. 
http://doi.org/10.1121/1.423246 
King, A. J., Parsons, C. H., & Moore, D. R. (2000). Plasticity in the neural coding of 
auditory space in the mammalian brain. Proceedings of the National Academy of 
Sciences of the United States of America, 97(22), 11821–11828. 
http://doi.org/10.1073/pnas.97.22.11821 
Laback, B., Egger, K., & Majdak, P. (2014). Perception and coding of interaural time 
differences with bilateral cochlear implants. Hearing Research, 1–13. 
http://doi.org/10.1016/j.heares.2014.10.004 
Lai, Y. H., Tsao, Y., & Chen, F. (2015). Effects of adaptation rate and noise suppression 
on the intelligibility of compressed-envelope based speech. PLoS ONE, 10(7), 1–19. 
http://doi.org/10.1371/journal.pone.0133519 
Landsberger, D. M., Svrakic, M., Roland, J. T., & Svirsky, M. (2015). The Relationship 
Between Insertion Angles, Default Frequency Allocations, and Spiral Ganglion Place 
Pitch in Cochlear Implants. Ear and Hearing, 36, 207–213. 
http://doi.org/10.1097/AUD.0000000000000163 
Leek, M., Brown, M., & Dorman, M. (1991). Informational masking and auditory 
attention. Perception & Psychophysics, 50(3), 205–214. Retrieved from 
http://link.springer.com/article/10.3758/BF03206743 
Li, N., & Loizou, P. C. (2009). Factors affecting masking release in cochlear-implant 
vocoded speech. The Journal of the Acoustical Society of America, 126(1), 338–46. 
http://doi.org/10.1121/1.3133702 
Lieu, J. E. C., Tye-Murray, N., Karzon, R. K., & Piccirillo, J. F. (2010). Unilateral hearing 
loss is associated with worse speech-language scores in children. Pediatrics, 125(6), 
e1348–e1355. http://doi.org/10.1542/peds.2009-2448 
Linstrom, C. J., Silverman, C. A., & Yu, G.-P. (2009). Efficacy of the bone-anchored hearing 
aid for single-sided deafness. The Laryngoscope, 119(4), 713–20. 
http://doi.org/10.1002/lary.20164 
Litovsky, R. Y., Colburn, H. S., Yost, W. A., & Guzman, S. J. (1999). The precedence 
effect. The Journal of the Acoustical Society of America, 106, 1633–1654. 
Litovsky, R. Y., Goupell, M. J., Godar, S., Grieco-Calub, T., Jones, G. L., Garadat, S. N., 
… Misurelli, S. (2012). Studies on bilateral cochlear implants at the University of 
Wisconsin’s Binaural Hearing and Speech Laboratory. Journal of the American 
Academy of Audiology, 23(6), 476–94. http://doi.org/10.3766/jaaa.23.6.9 
Litovsky, R. Y., Parkinson, A., Arcaroli, J., Peters, R., Lake, J., Johnstone, P., & Yu, G. 
(2004). Bilateral Cochlear Implants in Adults and Children. Archives of 
Otolaryngology–Head & Neck Surgery, 130(5), 648. 
http://doi.org/10.1001/archotol.130.5.648 
Loizou, P. C. (2006). Speech processing in vocoder-centric cochlear implants. Advances 
in Oto-Rhino-Laryngology. http://doi.org/10.1159/000094648 
Loizou, P. C., Hu, Y., Litovsky, R., Yu, G., Peters, R., Lake, J., & Roland, P. (2009). 
Speech recognition by bilateral cochlear implant users in a cocktail-party setting. The 
Journal of the Acoustical Society of America, 125(1), 372–83. 
http://doi.org/10.1121/1.3036175 
Long, C. J., Eddington, D. K., Colburn, H. S., & Rabinowitz, W. M. (2003). Binaural 
sensitivity as a function of interaural electrode position with a bilateral cochlear 
implant user. The Journal of the Acoustical Society of America, 114(3), 1565. 
http://doi.org/10.1121/1.1603765 
Lopez-Poveda, E. A., Eustaquio-Martín, A., Stohl, J. S., Wolford, R. D., Schatzer, R., & 
Wilson, B. S. (2016). A Binaural Cochlear Implant Sound Coding Strategy Inspired 
by the Contralateral Medial Olivocochlear Reflex. Ear and Hearing, 37(3), e138-48. 
http://doi.org/10.1097/AUD.0000000000000273 
Lorenzi, C., Berthommier, F., Apoux, F., & Bacri, N. (1999). Effects of envelope 
expansion on speech recognition. Hearing Research, 136(1–2), 131–138. 
http://doi.org/10.1016/S0378-5955(99)00117-3 
Ma, N., Morris, S., & Kitterick, P. (2015). Benefits to speech perception in noise from the 
binaural integration of electric and acoustic signals in unilateral deafness. Ear and 
Hearing. 
Manuscript, A., & Listeners, C. (2013). NIH Public Access, 33(5), 645–659. 
http://doi.org/10.1097/AUD.0b013e318252caae.Timbre 
Maslin, M. R. D., Munro, K. J., & El-Deredy, W. (2013). Evidence for multiple 
mechanisms of cortical plasticity: a study of humans with late-onset profound 
unilateral deafness. Clinical Neurophysiology : Official Journal of the International 
Federation of Clinical Neurophysiology, 124(7), 1414–21. 
http://doi.org/10.1016/j.clinph.2012.12.052 
McDermott, H. J., McKay, C. M., Richardson, L. M., & Henshall, K. R. (2003). 
Application of loudness models to sound processing for cochlear implants. The 
Journal of the Acoustical Society of America, 114(4), 2190. 
http://doi.org/10.1121/1.1612488 
McDermott, H., & Varsavsky, A. (2009). Better fitting of cochlear implants: modeling 
loudness for acoustic and electric stimuli. Journal of Neural Engineering, 6, 65007. 
http://doi.org/10.1088/1741-2560/6/6/065007 
McKinney, C. (2002). Hear the other side – a report on Single Sided Deafness. Entific 
Medical Systems. 
Middlebrooks, J. C. (1999). Individual differences in external-ear transfer functions 
reduced by scaling in frequency. The Journal of the Acoustical Society of America, 
106(3 Pt 1), 1480–1492. http://doi.org/10.1121/1.427176 
Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. 
Annual Review of Psychology, 42, 135–159. 
http://doi.org/10.1146/annurev.ps.42.020191.001031 
Middlebrooks, J. C., Macpherson, E. A., & Onsan, Z. A. (2000). Psychophysical 
customization of directional transfer functions for virtual sound localization. The 
Journal of the Acoustical Society of America. http://doi.org/10.1121/1.1322026 
Mills, A. W. (1960). Lateralization of High-Frequency Tones. The Journal of the 
Acoustical Society of America, 32(1), 132. http://doi.org/10.1121/1.1907864 
Mishra, S. K., & Lutman, M. E. (2014). Top-down influences of the medial olivocochlear 
efferent system in speech perception in noise. PloS One, 9(1), e85756. 
http://doi.org/10.1371/journal.pone.0085756 
Moore, B. C. J. (2003). An Introduction to the Psychology of Hearing. Boston Academic 
Press (Vol. 3). http://doi.org/10.1016/j.tins.2007.05.005 
Moore, J. K. (2000). Organization of the human superior olivary complex. Microscopy 
Research and Technique, 51(4), 403–412. 
http://doi.org/10.1002/1097-0029(20001115)51:4<403::AID-JEMT8>3.0.CO;2-Q 
Nelson, D. A., Schmitz, J. L., Donaldson, G. S., Viemeister, N. F., & Javel, E. (1996). 
Intensity discrimination as a function of stimulus level with electric stimulation. The 
Journal of the Acoustical Society of America, 100(4 Pt 1), 2393–2414. 
http://doi.org/10.1121/1.417949 
Nie, K., Barco, A., & Zeng, F.-G. (2006). Spectral and temporal cues in cochlear implant 
speech perception. Ear and Hearing, 27(2), 208–217. 
http://doi.org/10.1097/01.aud.0000202312.31837.25 
Noble, J. H., Gifford, R. H., Hedley-Williams, A. J., Dawant, B. M., & Labadie, R. F. 
(2014). Clinical evaluation of an image-guided cochlear implant programming 
strategy. Audiology and Neurotology, 19(6), 400–411. 
http://doi.org/10.1159/000365273 
O’Donoghue, G. M., Nikolopoulos, T. P., & Archbold, S. M. (2000). Determinants of 
speech perception in children after cochlear implantation. Lancet, 356, 466–468. 
http://doi.org/10.1016/S0140-6736(00)02555-1 
Pelizzone, M., Kasper, A., & Montandon, P. (1990). Binaural interaction in a cochlear 
implant patient. Hearing Research, 48(3), 287–290. 
http://doi.org/10.1016/0378-5955(90)90069-2 
Poon, B. B., Eddington, D. K., Noel, V., & Colburn, H. S. (2009). Sensitivity to interaural 
time difference with bilateral cochlear implants: Development over time and effect of 
interaural electrode spacing. The Journal of the Acoustical Society of America, 126(2), 
806–815. http://doi.org/10.1121/1.3158821 
Stafford, R. C., Stafford, J. W., Wells, J. D., Loizou, P. C., & Keller, M. D. (2014). Vocoder simulations of highly 
focused cochlear stimulation with limited dynamic range and discriminable steps. Ear 
and Hearing, 35(2), 262–270. http://doi.org/10.1097/AUD.0b013e3182a768e8 
Rasetshwane, D. M., Argenyi, M., Neely, S. T., Kopun, J. G., & Gorga, M. P. (2013a). 
Latency of tone-burst-evoked auditory brain stem responses and otoacoustic 
emissions: level, frequency, and rise-time effects. The Journal of the Acoustical 
Society of America, 133(5), 2803–17. http://doi.org/10.1121/1.4798666 
Rasetshwane, D. M., Argenyi, M., Neely, S. T., Kopun, J. G., & Gorga, M. P. (2013b). 
Latency of tone-burst-evoked auditory brain stem responses and otoacoustic 
emissions: level, frequency, and rise-time effects. The Journal of the Acoustical 
Society of America, 133, 2803–17. http://doi.org/10.1121/1.4798666 
Rayleigh, Lord. (1907). On our perception of sound direction. Philosophical Magazine 
Series 6, 13, 214–232. http://doi.org/10.1080/14786440709463595 
Reiss, L. A. J., Ito, R. A., Eggleston, J. L., & Wozny, D. R. (2014). Abnormal binaural 
spectral integration in cochlear implant users. JARO - Journal of the Association for 
Research in Otolaryngology, 15(2), 235–248. 
http://doi.org/10.1007/s10162-013-0434-8 
Reiss, L., Turner, C. W., Erenberg, S. R., & Gantz, B. J. (2007). Changes in pitch with a 
cochlear implant over time. Journal of the Association for Research in 
Otolaryngology : JARO, 8(2), 241–57. http://doi.org/10.1007/s10162-007-0077-8 
Reynolds, G. S., & Stevens, S. S. (1960). Binaural Summation of Loudness. The Journal 
of the Acoustical Society of America, 32(10), 1337–1344. 
http://doi.org/10.1121/1.1907903 
Riedel, H., & Kollmeier, B. (2002). Comparison of binaural auditory brainstem responses 
and the binaural difference potential evoked by chirps and clicks. Hearing Research, 
169(1–2), 85–96. http://doi.org/10.1016/S0378-5955(02)00342-8 
Roberts, M. T., Seeman, S. C., & Golding, N. L. (2013). A mechanistic understanding of 
the role of feedforward inhibition in the mammalian sound localization circuitry. 
Neuron, 78(5), 923–935. http://doi.org/10.1016/j.neuron.2013.04.022 
Rubinstein, J. T., & Miller, C. A. (1999). How do cochlear prostheses work? Current 
Opinion in Neurobiology. http://doi.org/10.1016/S0959-4388(99)80060-9 
Schleich, P., Nopp, P., & D’Haese, P. (2004a). Head shadow, squelch, and 
summation effects in bilateral users of the MED-EL COMBI 40/40+ cochlear implant. 
Ear and Hearing, 25, 197–204. http://doi.org/10.1097/01.AUD.0000130792.43315.97 
Schleich, P., Nopp, P., & D’Haese, P. (2004b). Head shadow, squelch, and 
summation effects in bilateral users of the MED-EL COMBI 40/40+ cochlear implant. 
Ear and Hearing, 25(3), 197–204. http://doi.org/10.1097/01.AUD.0000130792.43315.97 
Schroder, A. C., Viemeister, N. F., & Nelson, D. A. (1994). Intensity discrimination in 
normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society 
of America, 96(5), 2683. http://doi.org/10.1121/1.411276 
Senn, P., Kompis, M., Vischer, M., & Haeusler, R. (2005). Minimum audible angle, just 
noticeable interaural differences and speech intelligibility with bilateral cochlear 
implants using clinical speech processors. Audiology and Neurotology, 10(6), 
342–352. http://doi.org/10.1159/000087351 
Shamma, S. A., Elhilali, M., & Micheyl, C. (2011). Temporal coherence and attention in 
auditory scene analysis. Trends in Neurosciences. 
http://doi.org/10.1016/j.tins.2010.11.002 
Shannon, R. V, Fu, Q.-J., & Galvin, J. (2004). The number of spectral channels required 
for speech recognition depends on the difficulty of the listening situation. Acta Oto-
Laryngologica. Supplementum, (February), 50–54. 
http://doi.org/10.1080/03655230410017562 
Shinn, J. B., Baran, J. A., Moncrieff, D. W., & Musiek, F. E. (2005). Differential attention 
effects on dichotic listening. Journal of the American Academy of Audiology, 16(4), 
205–18. http://doi.org/10.3766/jaaa.16.4.2 
Siciliano, C. M., Faulkner, A., Rosen, S., & Mair, K. (2010). Resistance to learning 
binaurally mismatched frequency-to-place maps: implications for bilateral 
stimulation with cochlear implants. The Journal of the Acoustical Society of America, 
127(3), 1645–1660. http://doi.org/10.1121/1.3293002 
Sinopoli, T. (2003). Single Sided Deafness: Issues and Alternatives. 
www.Audiologyonline.com. 
Soulodre, G. A., Popplewell, N., & Bradley, J. S. (1989). Combined effects of early 
reflections and background noise on speech intelligibility. Journal of Sound and 
Vibration, 135(1), 123–133. http://doi.org/10.1016/0022-460X(89)90759-1 
Stakhovskaya, O., Sridhar, D., Bonham, B. H., & Leake, P. A. (2007). Frequency map for 
the human cochlear spiral ganglion: implications for cochlear implants. Journal of the 
Association for Research in Otolaryngology : JARO, 8(2), 220–33. 
http://doi.org/10.1007/s10162-007-0076-9 
Stecker, G. C., & Hafter, E. R. (2002). Temporal weighting in sound localization. The 
Journal of the Acoustical Society of America, 112, 1046–1057. 
http://doi.org/10.1121/1.1497366 
Steel, M. M., Papsin, B. C., & Gordon, K. A. (2015). Binaural fusion and listening effort 
in children who use bilateral cochlear implants: A psychoacoustic and pupillometric 
study. PLoS ONE, 10(2), 1–29. http://doi.org/10.1371/journal.pone.0117611 
Steeneken, H. J. M., & Houtgast, T. (1980). A physical method for measuring 
speech-transmission quality. The Journal of the Acoustical Society of America, 67, 
318–326. 
Stevens, K. N. (2002). Toward a model for lexical access based on acoustic landmarks and 
distinctive features. Journal of the Acoustical Society of America, 111(4), 1872–1891. 
http://doi.org/10.1121/1.1458026 
Stewart, C. M., Clark, J. H., & Niparko, J. K. (2011). Bone-anchored devices in single-
sided deafness. In Implantable Bone Conduction Hearing Aids (Vol. 71, pp. 92–102). 
http://doi.org/10.1159/000323589 
Svirsky, M. A., Silveira, A., Neuburger, H., Teoh, S.-W., & Suárez, H. (2004). Long-term 
auditory adaptation to a modified peripheral frequency map. Acta Oto-Laryngologica, 
124, 381–386. http://doi.org/10.1080/00016480310000593 
Svirsky, M. A., Talavage, T. M., Sinha, S., Neuburger, H., & Azadpour, M. (2015). 
Gradual adaptation to auditory frequency mismatch. Hearing Research, 322, 
163–170. http://doi.org/10.1016/j.heares.2014.10.008 
Tyler, R. S., Noble, W., Dunn, C., & Witt, S. (2006). Some benefits and limitations of 
binaural cochlear implants and our ability to measure them. International Journal of 
Audiology, 45 Suppl 1, S113-9. http://doi.org/10.1080/14992020600783095 
van Buuren, R. A., Festen, J. M., & Houtgast, T. (1999). Compression and expansion of 
the temporal envelope: evaluation of speech intelligibility and sound quality. The 
Journal of the Acoustical Society of America, 105(5), 2903–2913. 
http://doi.org/10.1121/1.426943 
Van de Heyning, P., Vermeire, K., Diebl, M., Nopp, P., Anderson, I., & De Ridder, D. 
(2008). Incapacitating Unilateral Tinnitus in Single-Sided Deafness Treated by 
Cochlear Implantation. Annals of Otology, Rhinology & Laryngology, 117(9), 
645–652. http://doi.org/10.1177/000348940811700903 
van de Par, S., & Kohlrausch, A. (1998). Comparison of monaural (CMR) and binaural 
(BMLD) masking release. The Journal of the Acoustical Society of America, 103(3), 
1573–1579. http://doi.org/10.1121/1.421292 
van Hoesel, R. J., & Clark, G. M. (1997). Psychophysical studies with two binaural 
cochlear implant subjects. The Journal of the Acoustical Society of America, 102(1), 
495–507. http://doi.org/10.1121/1.419611 
van Hoesel, R. J. M. (2008). Observer weighting of level and timing cues in bilateral 
cochlear implant users. The Journal of the Acoustical Society of America, 124, 
3861–3872. http://doi.org/10.1121/1.2998974 
van Hoesel, R. J. M., & Tyler, R. S. (2003). Speech perception, localization, and 
lateralization with bilateral cochlear implants. The Journal of the Acoustical Society 
of America, 113(3), 1617–1630. http://doi.org/10.1121/1.1539520 
Vermeire, K., & Van de Heyning, P. (2009). Binaural hearing after cochlear implantation 
in subjects with unilateral sensorineural deafness and tinnitus. Audiology & Neuro-
Otology, 14(3), 163–71. http://doi.org/10.1159/000171478 
Watson, C. S. (2005). Some comments on informational masking. Acta Acustica United 
with Acustica, 91(3), 502–512. 
Weinberger, N. M. (1995). Dynamic regulation of receptive fields and maps in the adult 
sensory cortex. Annual Review of Neuroscience, 18, 129–158. 
http://doi.org/10.1146/annurev.ne.18.030195.001021 
Welsh, L. W., Rosen, L. F., Welsh, J. J., & Dragonette, J. E. (2004). Functional 
impairments due to unilateral deafness. Annals of Otology, Rhinology and 
Laryngology, 113(12), 987–993. 
Wenzel, E. M., Wightman, F. L., & Kistler, D. J. (1991). Localization with non-
individualized virtual acoustic display cues. Proceedings of the SIGCHI Conference 
on Human Factors in Computing Systems Reaching through Technology - CHI ’91, 
351–359. http://doi.org/10.1145/108844.108941 
Wierstorf, H., Geier, M., Raake, A., & Spors, S. (2011). A Free Database of Head-Related 
Impulse Response Measurements in the Horizontal Plane with Multiple Distances. 
Audio Engineering Society Convention, 130, 3–6. Retrieved from 
https://dev.qu.tu-berlin.de/projects/measurements/ 
Wightman, F. L., & Kistler, D. J. (1992). The dominant role of low-frequency interaural 
time differences in sound localization. The Journal of the Acoustical Society of 
America, 91, 1648–1661. 
Zeitler, D. M., Dorman, M. F., Natale, S. J., Loiselle, L., Yost, W. A., & Gifford, R. H. 
(2015). Sound Source Localization and Speech Understanding in Complex Listening 
Environments by Single-sided Deaf Listeners After Cochlear Implantation. Otology 
& Neurotology, 36(9), 1467–1471. http://doi.org/10.1097/MAO.0000000000000841 
Zeng, F. G., & Shannon, R. V. (1992). Loudness balance between electric and acoustic 
stimulation. Hearing Research, 60(2), 231–235. 
http://doi.org/10.1016/0378-5955(92)90024-H 
Zhou, J., & Durrant, J. D. (2003). Effects of interaural frequency difference on binaural 
fusion evidenced by electrophysiological versus psychoacoustical measures. The 
Journal of the Acoustical Society of America, 114(3), 1508–1515. 
http://doi.org/10.1121/1.1600718 
Zirn, S., Arndt, S., Aschendorff, A., & Wesarg, T. (2015). Interaural stimulation timing in 
single sided deaf cochlear implant users. Hearing Research, 328, 148–156. 
http://doi.org/10.1016/j.heares.2015.08.010 
Zurek, P. M. (1993). A note on onset effects in binaural hearing. The Journal of the 
Acoustical Society of America, 93, 1200–1201. http://doi.org/10.1121/1.405516