ABSTRACT

Title of dissertation: Optimal Space-Time-Frequency Design of Microphone Networks
Yenming Mark Lai, Doctor of Philosophy, 2014
Dissertation directed by: Professor Radu V. Balan, Department of Mathematics

Consider a sensing system using a large number N of microphones placed in multiple dimensions to monitor an acoustic field. Using all the microphones at once is impractical because of the amount of data generated. Instead, we choose a subset of D microphones to be active. Specifically, we wish to find the set of D microphones that minimizes the largest interference gain at multiple frequencies while monitoring a target of interest. A direct, combinatorial approach – testing all N choose D subsets of microphones – is impractical because of the problem size. Instead, we use a convex optimization technique that induces sparsity through an l1-penalty to determine which subset of microphones to use. Our work investigates not only the optimal placement (space) of microphones but also how to process the output of each microphone (time/frequency). We explore this problem for both single- and multi-frequency sources, optimizing both microphone weights and positions simultaneously. In addition, we explore this problem for random sources where the output of each of the N microphones is processed by an individual multirate filterbank. The N processed filterbank outputs are then combined to form one final signal. In this case, we fix all the analysis filters and optimize over all the synthesis filters. We show how to convert the continuous frequency problem to a discrete frequency approximation that is computationally tractable. In this random source/multirate filterbank case, we once again optimize over space-time-frequency simultaneously.

Optimal Space-Time-Frequency Design of Microphone Networks

by Yenming Mark Lai

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2014

Advisory Committee:
Professor Radu V. Balan, Chair/Advisor
Professor John J. Benedetto
Professor Howard Elman
Professor Patrick Fitzpatrick
Professor James Yorke

© Copyright by Yenming Mark Lai 2014

To April and Richard Lai, my loving parents.

Acknowledgments

I would first like to thank my advisor Radu Balan. It has been an incredible experience being under your tutelage for these last three years. I appreciate the countless hours you have invested in me and am so very grateful to have had such a kind and intelligent advisor. I carry your intellectual banner on to Texas!

Thanks to John Gilbert for encouraging me to start my graduate school journey six years ago. Thank you for believing in me.

Thanks to John Benedetto for his excellent coaching throughout these past six years. We know who is boss.

Thanks to Howard Elman for mid-class stretch breaks and giving me a chance to redeem myself on your final. Your coaching has helped me more than you can imagine.

Thanks to Jim Yorke for always hearing me out. You are a role model in scholarship and leadership, and it is difficult to walk away from a conversation with you without learning something new.

Thanks to Giovanni Forni for his patient office-hours tutoring in Complex Analysis my very first semester. You are a brilliant teacher and scholar.

Thanks to Bob Warner for cutting a clear path through the analysis jungle for uninitiated newcomers.

Thanks to Mike Fitzpatrick for always lending me his ear.
I will miss seeing your open door on the third floor.

Thanks to Sennur Ulukus for her helpful advice and for modeling clear and excellent teaching. I still remember 667 and the pack of cookies you brought to our makeup class!

Thanks to Rama Chellappa for great stories and insights. Your work sets the bar in scholarship.

Thanks to Min Wu for a tremendous DSP class. I learned so much.

Thanks to Jeff Henrikson for his fantastic technical support. My thesis would have no computational meat without your ninja skills. Sorry about flipping the power switch!

Thanks to Eitan Tadmor for graciously providing CSCAMM resources. My large-scale computations would not have been possible without the CSCAMM cluster.

Thanks to Michael Grant for providing commercial-grade optimization software for free to academics. Your toolbox is helping the research community press into unknown territory.

Thanks to Dan Mote for taking the time to meet with a graduate student. I would wrestle with choices A and B, and somehow you always suggested C, the superior option.

Thanks to Stephen Trachtenberg for his leadership insights. It is such a treat to sit in front of a master.

Thanks to Dave Mount for helpful coaching and great tennis practices. I learned how to volley playing with you!

Thanks to Razi Haimi-Cohen for his excellent mentoring at Bell Labs and for giving me my first taste of research. I respect and appreciate your high standards, work ethic, and integrity.

Thanks to Terri Rogers for her support and advice at Bell Labs. I appreciate your always-open door!

Thanks to Heiko Claussen for a super fast computer and always helpful suggestions. You are both a great teammate and leader.

Thanks to Justinian Rosca for great vision and for helping me to see the bigger picture. We need to play tennis again!

Thanks to Lex McCusker for your NSF I-Corps mentorship. You have fundamentally altered how I see both research and entrepreneurship and broadened my horizons. I learned so much working side-by-side with you.

Thanks to Karl Ginter for knowing everything about everything. I lean heavily on your wisdom and think so much of your leadership. Thanks for showing me the first few steps in how to hunt.

Thanks to Nate Strawn for helping me make decisions at crucial crossroads. Your advice was invaluable and always right.

Thanks to Stan Reeves and Chris Rozell for excellent coaching during my job hunt. I respect the way you have balanced both career and family.

Thanks to a special group of professors at Rice University who inspired my love for mathematics and engineering. Behnaam Aazhang, you are both smart and savvy and also a great leader. Rich Baraniuk, your enthusiasm is infectious, and your teaching and research light so many fires! Don Johnson, your class first taught me what it means to think really hard, and those early lessons have carried me far.

Thanks to a group of guys who helped me hack it through Rice – Shay Harnoy, Patrick MacAlpine, Juan Navarro, and Gary Printy. I am sad to see BCS come to a close. I am so proud to have known and worked with each of you when we were all clueless undergraduates and look forward to seeing the impacts each of you will have in your respective professions. (Shay – Fortune 500 CEO, Patmac – brilliant technical lead, Juan – Federal Reserve executive, Gary Printy – the guy who pulls strings around town!) Also, Shay, is TomNod going to pay for our dinner sometime?

Thanks to Jim Tour for modeling scholarship, faith, and boldness and to Shireen Tour for providing warmth and hospitality week after week.
Your lives are powerful testimonies.

Thanks to Robert Newsome for his steadfast friendship and adventuresome spirit. This thesis would not have been written without your help and hospitality.

Thanks to Sarah Berkey for her support and friendship. I am grateful our lives intersected in Maryland.

Thanks to Ryan Penley and Maryland Cru for the incredible encouragement over these last six years. I can't wait to see what is in store for the next six years.

Thanks to the guys who were crazy enough to slog it out in the gym and on the track with me: Alex Cloninger, James Murphy, Ryan Hunter, Avinash Sahu, Kevin Galloway, Daniel Yin, Chao-wei Chen, Mike Fahey, Pat Brown, Stefan Zavalin, Casey Allred, and Mike Casto. You each have a little bit of iron in your souls.

Thanks to Mike Casto and Kyungjin Yoo for running countless tennis drills with me. I am working hard to get up to a 4.5 rating in Texas!

Thanks to the MATH bible study group: Josh Ballew, Victoria Taroudaki, Bryant Angelos, Clare Wickman, Rui Can, Jonathan Fernandez, Christiana Sabett, Asia Wyatt, Franck Olivier Ndjakou Njeunje. Studying and praying with you has been such nourishment.

Thanks to Kevin Offner and the Intervarsity Graduate Fellowship group. I have learned so much about reconciling faith and work in my day-to-day life.

Thanks to my housemates and friends: Kevin Galloway (captain worthy of respect), Matt Videon (athlete, friend, and hunter of White Buffalo), Eli Tung (thoughtful friend and super chef), Matt Buttner (County, country, and a good guy), Cain Jeffries (good music and good chicken!), Sebastian Van Neste (leader, comrade, and always striving for the best), Andrew Flavin (thinker, diplomat, and brother), Jim Spatz (friend and roommate of the year), Troy Stevens (fun train), Esrael Seyoum (Pad Thai and man of faith), Ruben Amalalo (Are you sure Ghanaians are not Asians?), Sheldon Neuberger (John Mayer, expert computer scientist, and a chill guy), Jadon Ramsing (soccer star with a kind heart), Jeremy Perry (amazing voice and caring teacher), Emmanuel Yeboah (extraordinary work ethic and built like a rock), and Matt Shriver (faithful in everything he does). 5010 will always have a special place in my heart because of you. Great times, great food, and great stories to tell the kids! When's the next housecleaning?

Thanks to Kevin Moore for teaching me how to teach. I deeply respect your practice and am thankful for your friendship.

Thanks to Terrence Long for helping me get through MATH631. I had no business being in that class and would not have made it through without your patient tutoring.

Thanks to AJ Hergenroeder for his steadfast friendship and constant encouragement. I love your intensity and spirit, brother.

Thanks to Austin Bratton for always checking in. You are a faithful friend.

Thanks to Justin Lo for his help small and large. You are always watching out for me, and I am very grateful for your friendship.

Thanks to Jim James McTuttle for helping me get back to Texas. I am grateful for your friendship and the way it challenges me.

Thanks to Jonathan Lugo for all the on-the-fly coding coaching. It has been wonderful to chat with both an expert and a friend.

Thanks to Joaquin Guadarrama. You are a friend in stormy times, and I would not have made it through Lanier without your help. You have my vote, literally and figuratively.

Thanks to Gilbert Wang for cooking a feast for my going-away party. You are going to make a woman very happy one day, and without a doubt, you are solid.
Thanks to William Mantzel for giving me both backroom insights into academia and thoughtful suggestions.

Thanks to Bill Hodges for helping me on a very important drive from DC to Nashville and for constant encouragement through the years. Have you looked at the Comp210 assignment yet?

A special thanks to Alverda McCoy. You are wise, kind, and gracious, and a mom to so many students. I am so grateful to have been under your leadership for five years. I will dearly miss our regular chats and your wise counsel. Thank you for your support, time, and prayers. AMSC is so blessed to have you on board.

Thanks to Konstantina Trivisa for taking a chance and admitting a graduate student mid-year. You have incredible vision and energy and a warm heart. I will miss hearing your voice in the hallways, and I anticipate big things in your future!

Thanks to Angel and Chang Ho for their steadfast support and gracious hospitality. Indianapolis has a special place in my heart, especially Angel's refrigerator and cupboard.

Thanks to Mike and Jenny Holberg for their support and prayers. It has been a joy to see your family grow and grow up!

Thanks to my mom April Lai for her faithful prayers and encouragement through these years. You are a source of strength.

Thanks to my dad Richard Lai for always modeling diligence, patience, and integrity. You have repeatedly earned my respect.

Finally, thanks to Elizabeth Matthews for her constant support and encouragement. You hold up half the sky. I hope one day to dedicate a book to you.

List of Tables

3.1 Number of separated minima of criterion J for V = 5 active microphones out of N and a neighborhood radius R, showing that the vast majority of combinations are locally nonoptimal.
3.2 Comparison of the λ-method vs. the global minimum.
3.3 WTIG for the λ-method and SA.
4.1 Worst interference gains [dB] for: Beamformer, MultiFreq λ-method, SA MultiFreq λ-method, SingleFreq λ-method, SA SingleFreq λ-method.

List of Figures

3.1 Sample Scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{200}{5} \approx 2.5 \times 10^9$ possible sets of 5 microphones to use.
3.2 Gain pattern of a 32-element microphone array at 1000 Hz found by the λ-method. The worst target-to-interference gain is 1.5 dB. The array is down-selected from 1000 uniformly distributed microphones.
3.3 Worst target-to-interference gain for a number of active microphones and different target locations. 32 microphones are sufficient to achieve performance similar to 1000 microphones. The sufficient number of microphones depends on the target location.
3.4 Gain pattern of a 32-element microphone array at 1000 Hz found by simulated annealing. The worst target-to-interference gain is 0.1 dB. The search was initialized by the result of the λ-method.
4.1 Sample Scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{200}{5} \approx 2.5 \times 10^9$ possible sets of 5 microphones to use. The L sources are assumed to be broadband, that is, each source consists of the F frequencies f_1, f_2, ..., f_F. We allow the N filter weights to vary for each of the F frequencies, and hence our optimization problem returns a set of F · N filter weights.
4.2 Gain map for the classical beamformer that maximizes SNR at 1000 Hz.
4.3 Gain map for the beamformer found by the multi-frequency λ-method at 1000 Hz.
4.4 Comparison of mean and variance of maximum gain for a varying number of randomly placed interferences at 1000 Hz.
4.5 Ordered weights found by the multi-frequency λ-method.
5.1 Single channel of a multi-rate filterbank.
5.2 Filterbank for microphone n. Notice that D, the downsampling and upsampling factor, is not necessarily equal to C, the number of channels (subbands).
6.1 A single acoustic source x_r propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse response (FIR) filterbank. The subscript r of x_r indexes the I + 1 sources, which consist of I interference sources, whose overall gain we wish to minimize, and 1 target source, whose gain we wish to be exactly 1 so that our system perfectly reconstructs the target. We index the target source by setting r = 0, and hence the target source is denoted by x_0. The propagation from source x_r to filterbank n is modeled with transfer function P_{r,n}. The prefilter I_{0,n} inverts the propagation from the target source, x_0, to filterbank n. The cascade of the propagation filter P_{r,n} and the target inversion filter I_{0,n} is denoted as H_{r,n}. The outputs of the N filterbanks are summed to give a processed signal y_r.
6.2 FIR multi-rate filterbank for the n-th microphone.
6.3 A single acoustic source x propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse response (FIR) filterbank. The propagation from source x to filterbank n is modeled with transfer function P_n. The prefilter I_{0,n} inverts the propagation from the target source, x_0, to filterbank n. The cascade of the propagation filter P_n and the target inversion filter I_{0,n} is denoted as H_n. The outputs of the N filterbanks are summed to give a processed signal y.
8.1 Room environment.
8.2 Performance on the optimization grid consisting of 1240 interferences, in dB.
8.3 Maximum magnitude of each subchannel's synthesis filter for the λ value that produced the desired number of subchannels.
8.4 Maximum magnitude of each subchannel's synthesis filter after debiasing.
8.5 Locations of active low-frequency subchannels.
8.6 Locations of active high-frequency subchannels.
8.7 The synthesis filters' frequency responses of a filterbank whose low-frequency subchannel alone is active.
8.8 The synthesis filters' frequency responses of a filterbank whose high-frequency subchannel alone is active.
8.9 The synthesis filters' frequency responses of a filterbank whose subchannels are both active.
8.10 The synthesis filters' frequency responses of a filterbank whose subchannels are both inactive.
8.11 Performance on the evaluation grid consisting of 4960 interferences, in dB.

List of Abbreviations

DFT Discrete Fourier Transform
DTFT Discrete Time Fourier Transform
WSS Wide Sense Stationary
WSCS Wide Sense Cyclostationary

Chapter 1: Overview

In this work, we explore the design of audio systems to monitor targets in complex environments. We propose algorithms not only to choose the placement of the microphones but also to process each microphone's sampled signal so as to monitor a target while attenuating other interfering sources.

This work is primarily motivated by industrial needs. For example, engineering managers are interested in monitoring specific bearings on a wind turbine, car manufacturers in the sound of a specific piston, and train conductors in detecting aberrant sounds in a specific wheel set. These monitoring problems would be trivial if any of the following conditions applied:

• Microphones can be placed adjacent to the target of interest.
• The target source is generated in a quiet or interference-free environment.
• The interference sources' locations and signatures are known.

Each of the three previous use cases violates some, if not all, of these conditions. We will assume our environment contains either a large number of interferences with known locations or a small number of interferences with unknown locations. In addition, we will assume we are limited to using only N microphones because of bandwidth constraints.

Furthermore, to make our problem computationally tractable, we will discretize the possible microphone locations so that there is a finite set of candidate positions. Choosing a fixed number of microphone locations from a set of possible locations is by definition a combinatorial problem, and for even a moderately sized problem the number of possibilities can be overwhelming. For example, there are approximately 10^60 possibilities when choosing 32 microphone locations from a possible set of 1000 microphone locations. In comparison, there are approximately 10^50 atoms that make up planet Earth. To make reasonable comparisons in our simulations, we conduct exhaustive searches on small problem setups to discover the global optimum for each setup and compare our algorithm, a convex optimization scheme, against these global "bests".

This work consists of three parts, with each subsequent part building on the previous parts' ideas and algorithms. The content of each part, however, is self-contained and can be read individually without consulting the other parts.

1.1 Part I: Processing Single Frequency Source Using Delay-Scale-Sum Beamformer

In Part I, we explore the case where the acoustic source consists of a single frequency.
We propose an algorithm to find not only the microphone locations but also the corresponding beamforming weights. In other words, we both place the microphones and offer algorithms to process the sampled output of each of the microphones.

Our processing scheme is a basic delay-scale-sum beamformer. Our system takes the output of each of the N microphones, applies a chosen delay and amplitude scaling to each microphone's output, and then sums up the N processed signals to give a final output. In the frequency domain, this delay-and-scale beamforming weight can be represented simply as a scaled complex exponential.

In practice, the processing system works as follows. Each of the N microphones samples the continuous-time signal at an appropriate sampling rate (at or above the Nyquist rate). A Discrete Fourier Transform (DFT) of sufficient length to achieve the needed frequency resolution is then taken on each of the N streams of discrete samples. If the original source signal consisted only of a pure tone and the correct sampling rate and DFT length were chosen, the DFT should produce a set of coefficients with only one non-zero entry. For each of the N sets of DFT coefficients, the system multiplies in the computed beamforming weight at the non-zero frequency bin. We note that the beamforming weights can vary across the N processing streams. The outputs of the N processing streams are then summed to give the final result.

1.2 Part II: Processing Broadband Source Using Delay-Scale-Sum Beamformer

Part II is a direct extension of Part I. Here, we explore the broadband case, where signals are assumed to be a sum of F narrowband signals. Our algorithm again finds the optimal placement of microphones but now also computes beamforming weights for each of the F frequencies of interest.

If the original source signals consisted only of a sum of F pure tones and the correct sampling rate and DFT length were chosen, the DFT should produce a set of coefficients with only F non-zero entries. Our algorithm computes beamforming weights for each of the F non-zero entries for each of the N processing streams.
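To make the narrowband pipeline of Sections 1.1 and 1.2 concrete, here is a minimal numpy sketch of the sample-DFT-weight-sum chain for a single pure tone. It is illustrative only: the sampling rate, tone frequency, delays, and microphone count are made-up placeholders, not values used anywhere in this work.

    import numpy as np

    fs = 8000          # sampling rate [Hz], above Nyquist for the tone below
    f0 = 1000.0        # single-frequency source [Hz]
    T = 1024           # DFT length; f0 * T / fs = 128, so the tone lands on one bin
    N = 4              # number of microphones (tiny, for illustration)
    t = np.arange(T) / fs

    # Simulated microphone streams: the same tone arriving with different delays.
    delays = np.array([0.0, 1e-4, 2e-4, 3e-4])   # seconds
    x = np.array([np.cos(2 * np.pi * f0 * (t - d)) for d in delays])

    X = np.fft.fft(x, axis=1)      # one DFT per microphone stream
    k0 = round(f0 * T / fs)        # the single non-zero (positive-frequency) bin

    # Delay-scale-sum in the frequency domain: each weight is a scaled complex
    # exponential; here it simply compensates each microphone's delay.
    w = np.exp(2j * np.pi * f0 * delays)
    y = np.sum(w * X[:, k0])       # summed output of the N processing streams
    print(abs(y))                  # approx. N * T / 2 when the streams add coherently

The key point of the sketch is that, at a single frequency, "delay and scale" reduces to multiplication by one complex number per microphone, which is exactly the weight the optimization in Part I chooses.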
1.3 Part III: Processing Broadband Source Using Multi-rate Filterbanks

Part III generalizes the microphone placement and processing scheme of Parts I and II. In this part's setup, we now have N multi-rate filterbanks, each processing the output of one of the N microphones. Each of the N filterbanks decomposes the discrete input into C subchannels, and hence our system uses a total of N · C subchannels. The previous constraint of Parts I and II – using only N microphones because of bandwidth limitations – can now be generalized to the constraint of using only N · C subchannels. By selecting which subchannels to process in each filterbank, we can use the output of between N and N · C microphones and still not violate our bandwidth constraint. By using N microphones and all C subchannels of each, we meet our bandwidth constraint of N · C subchannels. By using N · C microphones but choosing to use only one subchannel of each microphone, we also meet our bandwidth constraint of N · C subchannels. In other words, instead of choosing the placement of N microphones out of a set of P possible microphone locations, we choose a subset of N · C subchannels to use out of a possible P · C subchannels. This type of processing scheme is appropriate when deploying microphones is relatively inexpensive but transferring the collected data of each microphone is expensive.

In multi-rate filterbanks, each subchannel is processed by both an analysis and a synthesis filter. We fix the analysis filters to reduce the computational complexity and instead compute only the synthesis filters for each of the chosen N · C subchannels. Computing the filters for the multi-rate filterbanks generalizes computing the beamforming weights in Parts I and II: the DFT and IDFT implementations of Parts I and II can be interpreted as the analysis and synthesis filtering, respectively, and the choice of beamforming weights in Parts I and II corresponds to the choice of the synthesis filters in Part III.

Chapter 2: Notation

We denote both the label and the frequency response of a filter by a capital letter, and its impulse response by the lowercase of the same letter. Hence a filter F has transfer function F and impulse response f.

Summation over all possible indices t is indicated by placing the index variable t below the summation, that is,

$\sum_t = \sum_{t=-\infty}^{\infty}$

The complex number $\sqrt{-1}$ is denoted by the letter j, in contrast to the letter i. Hence, we write Euler's formula as $e^{j\theta} = \cos(\theta) + j\sin(\theta)$.

The complex conjugate of a complex number is denoted by a bar over the number. Hence we have, for all real a and b,

$\overline{a + jb} = a - jb$

Continuous parameters are placed in parentheses, whereas discrete parameters are placed in brackets. Hence, the function f(t) = sin(t), 0 ≤ t < 2π, has a continuous parameter t, whereas f[t] = 3t, t = 0, 1, 2, ..., 5, has a discrete parameter t.

We denote the z-transform of a discrete signal r[k] as $R(z) = \sum_k r[k] z^{-k}$.

The formula for the sum of the first D terms of a geometric series with ratio r ≠ 1,

$\sum_{k=0}^{D-1} r^k = \frac{1 - r^D}{1 - r},$

yields the useful identity

$\sum_{k=0}^{D-1} e^{-j 2\pi \frac{n}{D} k} = \begin{cases} D & \text{if } n \text{ is an integer multiple of } D \\ 0 & \text{otherwise} \end{cases}$  (2.1)

Indeed, setting $r = e^{-j2\pi n/D}$: if D divides n, every term equals 1 and the sum is D; otherwise r ≠ 1 while $r^D = e^{-j2\pi n} = 1$, so the sum vanishes.

In addition, we will let $W_D$ be the D-th root of unity, that is,

$W_D = e^{-\frac{2\pi j}{D}}$  (2.2)

The delta sequence with a single parameter m, $\delta_m$, is defined as follows:

$\delta_m = \begin{cases} 1 & \text{if } m = 0 \\ 0 & \text{otherwise} \end{cases}$  (2.3)

The Kronecker delta symbol with two parameters m, n, $\delta_{m,n}$, is defined as follows:

$\delta_{m,n} = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}$  (2.4)

Matrices, and in turn vectors, will be denoted in boldface. Hence, A refers to a matrix A. The (i, j) entry – the entry in the i-th row and j-th column – of a matrix is denoted $(A)_{i,j}$. Indices of matrices begin with 0; hence $(A)_{2,0}$ refers to the entry in the third row and first column of the matrix A.

$\bar{A}$ denotes the matrix A with each entry complex conjugated. Hence, we have

$(\bar{A})_{i,j} = \overline{(A)_{i,j}}$

$A^T$ refers to the transpose of the matrix A. Hence, we have

$(A^T)_{i,j} = (A)_{j,i}$

$A^*$ refers to the conjugate transpose of the matrix A, obtained by taking the transpose of A and then taking the complex conjugate of each entry. Hence we have

$A^* = \bar{A^T}$

The z-transform of a discrete function x[t], X(z), is defined as follows:

$X(z) = \sum_t x[t] z^{-t}$

The Discrete Time Fourier Transform (DTFT) of a discrete function x[t], X(w), is defined as follows:

$X(w) = \sum_t x[t] e^{-jwt}$

We note that if we set $z = e^{jw}$ in the z-transform, we obtain the DTFT, that is,

$X(e^{jw}) = \sum_t x[t] (e^{jw})^{-t} = X(w)$  (2.5)

Using (2.5), we have

$X(w + 2\pi) = \sum_t x[t] (e^{j(w+2\pi)})^{-t} = \sum_t x[t] (e^{jw})^{-t} \underbrace{e^{-j2\pi t}}_{1} = X(w)$  (2.6)

In other words, the DTFT is 2π-periodic.
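As a quick numerical sanity check of identity (2.1) and the 2π-periodicity (2.6) – not part of the original development, and assuming only numpy – one can verify:

    import numpy as np

    # Identity (2.1): the sum of D-th roots of unity is D when D divides n, else 0.
    D = 8
    for n in range(-16, 17):
        s = sum(np.exp(-2j * np.pi * n * k / D) for k in range(D))
        assert abs(s - (D if n % D == 0 else 0)) < 1e-9

    # (2.6): the DTFT X(w) = sum_t x[t] e^{-jwt} is 2*pi periodic.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)            # a finitely supported discrete signal
    t = np.arange(16)
    dtft = lambda w: np.sum(x * np.exp(-1j * w * t))
    for w in [0.3, 1.7, 2.9]:
        assert abs(dtft(w + 2 * np.pi) - dtft(w)) < 1e-9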
If we again set $z = e^{jw}$ in the z-transform, we have the following relationship:

$X(z W_D^d) = X(e^{j(w - 2\pi \frac{d}{D})}) = X(e^{j(w - w_{D,d})}) = X(w - w_{D,d})$  (2.7)

using the shortened notation

$w_{D,d} = 2\pi \frac{d}{D}$  (2.8)

Equation (2.7) connects the z-transform notation and the DTFT notation when $z = e^{jw}$. Without loss of generality, we will assume that the z-transform series converges in a region around the unit circle $e^{jw}$, except possibly at a finite number of points. We will use the z-transform notation exclusively in this work, with the assumption $z = e^{jw}$.

Convolution between two discrete functions f and g, evaluated at time t, is defined as follows:

$(f * g)[t] = \sum_\tau f[t - \tau] g[\tau]$

A change of variable shows that convolution is commutative, that is, $(f * g)[t] = (g * f)[t]$. A change of variable also shows that if $a[t] = (f * g)[t]$, then

$A(z) = F(z) G(z), \qquad A(w) = F(w) G(w)$

In other words, convolution in the time domain becomes a product in the z-domain and in the Fourier domain.

The Hilbert-Schmidt norm of a matrix A of size M × N is denoted $\|A\|_{HS}$ and is defined as follows:

$\|A\|_{HS}^2 = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} |(A)_{m,n}|^2$  (2.9)

The Hilbert-Schmidt inner product of matrices A and B, both of size M × N, is denoted $\langle A, B \rangle_{HS}$ and is defined as follows:

$\langle A, B \rangle_{HS} = \mathrm{Tr}(A B^*)$  (2.10)

Part I
Processing Single Frequency Source Using Delay-Scale-Sum Beamformer

Chapter 3: Processing Single Frequency Source Using Delay-Scale-Sum Beamformer

3.1 Overview

The main goal of this work is to adaptively employ a large set of microphone sensors distributed in multiple dimensions to scan an acoustic field. Processing data from a large set of sensors will necessarily involve intelligent definition of suitable subsets of sensors active at various times. This paper presents a novel method for optimal beam pattern design for large-scale sensor arrays, using convex and non-convex optimization techniques to define optimal subsets of sensors capable of selecting a target location while suppressing a large number of interferences. The first of the two optimization techniques we present uses a LASSO-type approach to convexify the corresponding combinatorial optimization problem. The second approach employs simulated annealing to search for optimal solutions with a fixed-size subset of active sensors. Our numerical simulations show that for scenarios of practical interest, the convex optimization solution is almost optimal.

3.2 Introduction

Consider a large-scale microphone array having N microphones that monitors a surveillance area. Here N could be on the order of thousands. Data could be collected from all microphones, processed locally, and further communicated to a central processing unit. For example, for N = 1000 microphones and a data sampling rate of 100,000 samples per second, the bandwidth requirement for a system performing exhaustive sampling is 100 Msamples/sec, an overwhelming amount of data for even current hardware systems. Alternatively, the central processing unit could implement a policy of sparse spatial sampling, where only a subset of microphones is polled for data at any time. This may be desirable for practical reasons: the cost (in power consumption, communication, or processing bandwidth) is simply too high for an exhaustive strategy. This part advocates the need for advanced, dynamic control and signal processing of what data should be manipulated in place of the full extent of the data, and it introduces novel strategies and formal analysis of what can be achieved when only a sparse subset of microphones is sampled simultaneously.
Assume a surveillance area that consists of a set of point sources that we wish to distinguish from one another. Our task is to sample any given source location using a maximum of V microphones of the very large array (V ≪ N). We then process the raw data from the V microphones to estimate the point source at that location. Our system can then scan the room by using different V-size subsets of microphones for each possible source location at different times. By doing so, the central processing unit processes data from V microphones at a time and not all N microphones at a time. Figure 3.1 provides a possible setup scenario. We will thus design and analyze sparse spatial sampling strategies with near-optimal performance. Specifically, we develop algorithms that minimize the p-norm of interference gains, with p = 1, 2, ∞, while reconstructing the target source perfectly. Conceptually, the system works in two steps. First, the discovery step estimates the locations of all point-like sources. Second, the system separates source signals, one source at a time. Below we define the problem in precise terms.

Throughout this part, we assume the following hypotheses:

H1. The total number of microphones N and their locations (x, y, z coordinates) are known.
H2. The total number of sources L and their locations (x, y, z coordinates) are known.
H3. The maximum number of microphones active at any time is fixed in advance and denoted by V. In general V ≪ L.
H4. Signals are narrowband and operate at the same frequency f.

In practice, the number of sources and their locations are often unknown. By assuming a large number of known sources and locations, our system can search each possible source location to detect whether or not a real source is present.

Fig. 3.1. Sample Scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{200}{5} \approx 2.5 \times 10^9$ possible sets of 5 microphones to use.

3.3 Problem formulation

Consider the setup described earlier and sketched in Figure 3.1. We assume that both the N microphone and L source locations are known. We use the direct-path model for the propagation from source to microphone, and hence the transfer function between source l and microphone n at frequency f is given by

$H_{l,n}(f) = \frac{e^{-2\pi i (f/c) \|r_l - r_n\|}}{\|r_l - r_n\|}, \quad 0 \le l \le L-1, \; 0 \le n \le N-1$  (3.1)

where $r_n$ and $r_l$ denote the position vectors of microphone n and source l respectively, and c is the wave propagation speed, in our case the speed of sound, 343 m/sec.
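Equation (3.1) translates directly into code. The following numpy sketch builds the full L × N matrix of direct-path transfer functions at one frequency; the room geometry is a made-up placeholder, and transfer_matrix is an illustrative helper name, not notation from the text.

    import numpy as np

    c = 343.0  # speed of sound [m/sec]

    def transfer_matrix(src, mic, f):
        # Direct-path transfer functions of (3.1): H[l, n] from source l to mic n.
        # src: (L, dim) source coordinates; mic: (N, dim) microphone coordinates.
        d = np.linalg.norm(src[:, None, :] - mic[None, :, :], axis=2)
        return np.exp(-2j * np.pi * (f / c) * d) / d

    # Example: random positions inside a 10 m x 8 m room (placeholder geometry).
    rng = np.random.default_rng(1)
    sources = rng.uniform([0.0, 0.0], [10.0, 8.0], size=(193, 2))
    mics = rng.uniform([0.0, 0.0], [10.0, 8.0], size=(200, 2))
    H = transfer_matrix(sources, mics, f=1000.0)   # complex, shape (193, 200)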
Each of the L signals received by the microphone originates from one of the L sources, Sl, and the propagation of the source signal Sl to microphone n is modeled by multiplying the source signal Sl by transfer function Hl,n. Our system estimates one source signal at a time. Hence, one of the sources is declared the target, and the remaining sources are declared interferences. Our system can then estimate all the L sources by repeating this process L times – choosing a target source, declaring the rest of the sources as interferences, and then estimating the target source. Our system is a delay-scale-sum beamformer. In other words, the input to each microphone, xn, is delayed and scaled by some variable amount, and then all the delayed and scaled microphone signals are summed up to give a single final output signal, y. By choosing an appropriate delay and scaling for each microphone, we wish this final output signal to contain an undistorted target source signal and attenuated versions of the other L− 1 interference source signals. The delay and scaling that microphone n performs can be modeled in the frequency domain as multiplication by a weighted complex exponential or equivalently multiplication by a complex number, wn(f). A key observation is that if wn(f) = 0, then microphone n is inactive at frequency f . For simplicity of notation, we assume the zero indexed source is the target and the remaining 15 L− 1 sources are interferences. The STFT of the final output y is given by Y (f, t) = N−1∑ n=0 wn(f)Xn(f, t) = L−1∑ l=0 ( N−1∑ n=0 wn(f)Hl,n(f) ) Sl(f, t) + N−1∑ n=0 wn(f)νn(f, t) (3.3) We observe from (3.3) that the response factor of source l, Sl, is given by Rl(f) = N−1∑ n=0 wn(f)Hl,n(f). (3.4) We note that Rl(f) is a complex scalar that shows how our delay-scale-sum beamformer processes source l. By taking the modulus of Rl(f), we get the gain Kl(f), that is Kl(f) = |Rl(f)| (3.5) The gain Kl(f) is a real non-negative scalar that measures how much our system attenuates or amplifies a given source l. If 0 ≤ Kl(f) < 1, the system attenuates the source l and conversely if Kl(f) > 1, the system amplifies source l. If Kl(f) = 1, the system does not distort the amplitude of source l but may possibly introduce a phase shift to source l. Ideally, for fixed frequency f , we wish to choose N beamforming weights such that Rl(f) = 0 for l = 1, 2, . . . L − 1 and R0(f) = 1. In other words, our system of N microphones completely cancels out all L− 1 interferences while perfectly reconstructing the the target source, source 0. This is achievable in the case where there are more microphones than sources, that is N > L. In general, for fixed frequency f , we wish to choose N beamforming weights such that p-norm of the L − 1 interference responses, Rl(f) for l = 1, 2, . . . L − 1, is small while still achieving perfect reconstruction of the target source, that is R0(f) = 1. We denote the column vector of interference responses, size (L− 1)× 1, as RI,(L−1)×1(f) where 16 RI,(L−1)×1(f) = [R1(f), R2(f), . . . , RL−1] T (3.6) In our setup, each of the L sources propagates to each of the N microphones where the L sources consist of 1 target and L − 1 interferences. For the l-th source, there are N transfer functions modeling the propagation from the l-th source to the N microphones. We denote the row vector of transfer functions for source l, size 1×N , as Hl,1×N (f) where Hl,1×N (f) = [Hl,1(f), Hl,2(f), . . . 
,Hl,N−1(f)] (3.7) Since we have a total of N microphones, there are a total of L×N transfer functions. We denote the source transfer function matrix, size L×N , as HL×N (f) where HL×N (f) =                 H0,1×N (f) H1,1×N (f) H2,l×N (f) ... HL−1,1×N (f)                 (3.8) If we consider only transfer functions corresponding to the L − 1 from interferences, we can delete the first row of (3.8) to get the interference transfer function matrix, size (L− 1)×N , HI,(L−1)×N (f) where HI,(L−1)×N (f) =             H1,1×N (f) H2,l×N (f) ... HL−1,1×N (f)             (3.9) 17 We can denote the column vector of all beamforming weights, size N × 1, as wN×1(f) where wN×1(f) = [w0(f), w1(f), . . . , wN−1(f)] T (3.10) We observe from the definition of response factor, (3.4), that all L−1 interference response factors, RI,(L−1)×1(f) , can be expressed as a matrix-vector multiply between interference trans- fer function matrix HI,(L−1)×N (f) and weight column vector of wN×1(f), that is RI,(L−1)×1(f) = |HI,(L−1)×N (f)wN×1(f)| (3.11) In addition, the target perfect reconstruction requirement, that is R0(f) = 1, can be com- pactly rewritten as 〈wN×1(f), H¯0,1×N (f)〉 = 1, where the bar denotes complex conjugation, to compensate for the complex conjugation in the inner product. Hypothesis H3 limits the number of simultaneously active microphones. Microphone n is considered active if for fixed frequency f , wn(f) 6= 0. Thus an appropriate measure of the number of active microphones for fixed frequency f is the pseudo-norm ‖wN×1(f)‖0 defined by ‖wN×1(f)‖0 = |{n ; wn(f) 6= 0}| (3.12) where |U | represents the cardinal of set U . We can now state our optimization problem as follows: Optimal V -Sparse Beampattern Design Problem for Fixed Frequency f min ‖HI,(L−1)×N (f)wN×1(f)‖p wN×1(f) subject to 〈wN×1(f), H¯0,1×N (f)〉 = 1 ‖wN×1(f)‖0 = V (3.13) 18 Note that this is a non-convex optimization problem due to the l0 pseudo-norm constraint ‖w‖0 ≤ V . 3.4 Optimization Strategies We study two optimization strategies for problem (4.5), which is non-convex and combi- natorial by nature. First, we present a solution we call the λ-method, which solves a regularized version the problem. Second, we present a simulated annealing based method, which has global search capability. 3.4.1 The λ-method This method is inspired by LASSO regularization [1], a regression technique that minimizes the sum of squares of residual errors subject to the l1 norm of the coefficients being less than a con- stant. Furthermore, similar to the sparse signal and model estimation approach in [2], the l0 pseudo-norm (3.12) is replaced by the l1 norm ‖wN×1(f)‖1 = ∑N n=1 |wn(f)| which is then incorporated into the optimization criterion using a Lagrange multiplier λ. The optimization prob- lem (4.5) is first replaced by the following convex optimization problem which is the λ-method: min ‖HI,(L−1)×N (f)wN×1(f)‖∞ + λ‖wN×1(f)‖1 wN×1(f) subject to 〈wN×1(f), H¯0,1×N 〉 = 1 (3.14) As λ increases, the l1 penalty term forces more and more of the weights wn(f) to have very small magnitude, which means that the corresponding microphones are very near inactive. For very large λ, the optimization problem finds a solution with exactly one non-trivial weight and hence exactly one active microphone so that the target perfect reconstruction constraint is fulfilled. As 19 λ decreases, the penalty term becomes less expensive and more microphones become active. 
At the lower limit of λ = 0, all microphones are allowed to be active. We note, however, in general, there is no need for more than min(N,L) active microphones. We note that λ is a data-dependent parameter that needs to be tuned to discover the final subset of microphones that are to be used. An additional step is needed to find the corresponding microphone weights. Bisection method to discover λ that produces V microphones We discover the λ that produces exactly V microphones using the bisection method. The algorithim follows: (1) Choose an upper bound bupper for λ. We note that bupper is user-chosen and data- dependent. The lower bound blower for λ is automatically set to 0. (2) Set λ = (bupper + blower)/2. (3) Solve the optimization problem (3.14) with the λ calculated in step (2). (4) We measure the number of active microphones returned by the solution of step (3) as follows. We note that the the solution of (3.14) is a N -length vector of beamforming weights, i.e. compex numbers. First, we measure how strongly active microphone n is by measuring the magnitude of its beamforming weight wn. Hence, let gn = |wn(f)|, 1 ≤ n ≤ N be the measure of activity of microphone n. Sort the sequence (gn)1≤n≤N so that its values are monotonically decreasing, that is gn1 ≥ gn2 ≥ · · · ≥ gnN . In other words, sort the microphones in terms of how active they are, with the most active microphones first. If gnVλ+1 is 3 orders of magnitude smaller than gn1 , we say that Vλ microphones are active. In other words, if the activity of a microphone is much smaller than the activity of the most active microphone, we consider the microphone inactive. (5) If Vλ > V , set blower = λ and go to step (2). In other words, increase λ since there 20 are too many active microphones. If Vλ < V , set bupper = λ and go to step (2). In other words, decrease λ since there are too few active microphones. If Vλ = V , we are done. The set of active of microphones, A, is then A = {n1, n2, . . . , nVλ} (3.15) Remark The above algorithim assumes monotonicity in the number of active microphones with respect to the tuning parameter λ , that is as λ increases, the number of active microphones is non- increasing. In practice, we find the assumption holds globally but not always locally. While, in general, as λ increases, the number of active microphones decreases. we find that small changes in λ sometimes leads to non-monotonic behavior. For example, a small increase in λ might increase the number of active microphones from 4 to 6 or similarly a small decrease in λ might decrease the number of active microphones from 6 to 4. To handle these cases, we introduce these additional checks in the above algorithm: (1) We keep track of the number of active microphones discovered by the previous λ it- eration and also the previous λ value. Before entering step (5), we check if the current λ itera- tion produces a number of active microphones that exhibits non-monotonic behavior. We define non-monotonic behavior as an increase in the number of active microphones if λ was previously increased and, similarly, a decrease in the number of active microphones if λ was previously de- creased. If non-monotonic behavior has occurred, we immediately return to step (2), and instead of computing the average of blower and bupper, we choose a random number between blower and bupper. 
(2) To guarantee convergence of the algorithm, we also check the following two termination conditions:

• In step (2), if the difference between the new λ and the previous λ falls below a user-defined tolerance, terminate the algorithm.

• In step (5), if the number of iterations exceeds a user-defined maximum number of iterations, terminate the algorithm. An iteration is defined as the completion of one round of steps (2), (3), and (4).

Discovery of final weights using only V microphones

The final weights are obtained by solving (3.14) a second time, restricted to the submatrix indexed by A, with either λ = 0, if the submatrix is well-conditioned, or the same λ otherwise. This step is known as "debiasing". It is needed because the weight vector found by the final step (3), while containing V large weights, also contains N − V weights that are small but still non-zero.
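Putting the pieces together, the bisection loop of this subsection – stripped of the non-monotonicity safeguards and termination checks described above – can be sketched as follows, reusing the illustrative lambda_method helper from the previous sketch:

    import numpy as np

    def bisect_lambda(H, V, b_upper, max_iter=50):
        # Simplified sketch of the bisection of Section 3.4.1.
        b_lower = 0.0
        for _ in range(max_iter):
            lam = (b_upper + b_lower) / 2
            w = lambda_method(H, lam)
            order = np.argsort(np.abs(w))[::-1]      # most active mics first
            g = np.abs(w)[order]
            V_lam = int(np.sum(g > 1e-3 * g[0]))     # step (4): 3 orders of magnitude
            if V_lam == V:
                return lam, order[:V]                # the active set A of (3.15)
            if V_lam > V:
                b_lower = lam    # too many active microphones: increase lambda
            else:
                b_upper = lam    # too few active microphones: decrease lambda
        raise RuntimeError("maximum number of iterations reached")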
By choosing our initial point in this manner, we guarantee that SA will find a solution no worse than the solution discovered by the λ-method. We perturb these V indices in proportion to the current temparature T . If v >= N or v < 0, we set v = v (mod N). By doing so, we make all the new indices valid microphone indices but also introduce the notion that the microphones are arranged in a circular manner, that is the first microphone, microphone 0, and the last microphone, microphone N − 1, are only a distance 1 apart. In our experiments, we use the following SA parameters. The initial temperature is set to 23 100, and the annealing schedule decreases the temperature by 5% each iteration. We limit the length of the Markov walk to a total length of 1200 iterations. After every 100 iterations, the annealing is restarted by setting the temperature back to its initial value of 100. Further, we rerun SA ten times. 3.5 Relation to Prior Work In this section we compare and relate our approach to four problems in the literature: the beam pattern design, co-array design, grid-based beamforming, and compressive sampling. Beampattern Design Lebret and Boyd [4] showed that given arbitrary microphone loca- tions, finding the set of microphone weights that minimizes the maximum interference gain could be formulated as a convex optimization problem. They modeled sources as point sources, com- plex exponentials decaying as a function of distance. Specifically, the problem could be showed equivalent with a Second Order Cone Programming (SOCP) problem and thus efficiently solved by interior point methods [5]. Our approach is similar in spirit to that of Ling et al. [6]. The authors followed the ap- proach of [4] but added an additional l1 norm penalty to the weights. The l1 penalty sparsifies the microphone weights and therefore microphone locations. Specifically, the l1 penalty on weights causes many of the beamforming weights to be close to zero in magnitude. Our work differs in the following three aspects. First, we minimize p-norm of interference gains instead of the side lobes at different incident angles since we are interested in a two dimensional beam pattern. Sec- ond, our simulations consider microphone locations on the perimeter of a rectangular room, rather than only on a straight line. Third, we do not add an l2 penalty on the weights to protect against large gains. Instead, after optimizing over a large set of microphones with a l1 penalty, we select the microphone locations with the largest weight values and then reoptimize over this subset of 24 locations. During the reoptimization, we use the same l1 penalty. The performance of the proposed λ-method is compared against the performance of an exhaustive search, showing how close to optimal the λ-method achieves. The search space of possible microphone configurations is also analyzed by counting the number of local minima. Co-array Design A common approach to choose microphone locations for a variety of problems is to examine the co-arrays formed by the microphone locations [7,8]. Recently, Pal and Vaidyanathan [9] proposed an alternate method to find microphone positions with desired co-array properties through the use of nested uniform arrays. However, the theory of co-prime arrays deals principally with the one dimensional case. The optimization criteria we consider in the present paper can deal implicitly with two or three dimensional array designs. 
3.5 Relation to Prior Work

In this section we compare and relate our approach to four problems in the literature: beam pattern design, co-array design, grid-based beamforming, and compressive sampling.

Beampattern Design. Lebret and Boyd [4] showed that, given arbitrary microphone locations, finding the set of microphone weights that minimizes the maximum interference gain can be formulated as a convex optimization problem. They modeled sources as point sources – complex exponentials decaying as a function of distance. Specifically, the problem can be shown equivalent to a Second Order Cone Programming (SOCP) problem and thus efficiently solved by interior point methods [5].

Our approach is similar in spirit to that of Ling et al. [6]. The authors followed the approach of [4] but added an additional l1 norm penalty on the weights. The l1 penalty sparsifies the microphone weights and therefore the microphone locations. Specifically, the l1 penalty causes many of the beamforming weights to be close to zero in magnitude. Our work differs in the following three aspects. First, we minimize the p-norm of interference gains instead of the side lobes at different incident angles, since we are interested in a two-dimensional beam pattern. Second, our simulations consider microphone locations on the perimeter of a rectangular room, rather than only on a straight line. Third, we do not add an l2 penalty on the weights to protect against large gains. Instead, after optimizing over a large set of microphones with an l1 penalty, we select the microphone locations with the largest weight values and then reoptimize over this subset of locations. During the reoptimization, we use the same l1 penalty. The performance of the proposed λ-method is compared against the performance of an exhaustive search, showing how close to optimal the λ-method comes. The search space of possible microphone configurations is also analyzed by counting the number of local minima.

Co-array Design. A common approach to choosing microphone locations for a variety of problems is to examine the co-arrays formed by the microphone locations [7, 8]. Recently, Pal and Vaidyanathan [9] proposed an alternate method to find microphone positions with desired co-array properties through the use of nested uniform arrays. However, the theory of co-prime arrays deals principally with the one-dimensional case. The optimization criteria we consider in the present work can deal implicitly with two- or three-dimensional array designs.

Grid-based Beamforming. Brandstein and Ward modeled an acoustic enclosure as a rectangular grid of point sources in [10]. Grid points were labeled either as sources or interferences based on prior knowledge. Microphone weights were calculated to maximize the optimization criterion given by the ratio between source gains and interference gains. This prior work did not analyze the relationship between microphone placement and the optimization criterion. The authors extended their algorithm [11] to address room reverberation by allowing sources to lie outside the room, an idea drawn from the "image" model, where reflections of sources off walls are modeled as virtual sources lying outside the room [12].

Compressive Sampling. The widespread compressive sampling problem (see [13, 14]) is to minimize the l0 pseudo-norm of a vector x subject to a linear constraint Ax = b in the absence of noise, or an inequality $\|Ax - b\|_p \le \varepsilon$ in the presence of noise. Using Lagrange multipliers, this problem becomes

$\min_x \|Ax - b\|_p + \lambda \|x\|_0$  (3.17)

where p ≥ 1.

Problem (3.13) can be brought to this form if we make the additional assumption that a specific microphone is active. For simplicity of notation, assume that we know $|w_{N-1}(f)| > 0$, that is, the last microphone is active. We know that at least one microphone is active, since if no microphones were active the target unit gain constraint could not be fulfilled. Next, we solve for $w_{N-1}(f)$ from the target unit gain constraint $\langle w_{N\times 1}(f), \bar{H}_{0,1\times N}(f) \rangle = 1$ and substitute back into $H_{I,(L-1)\times N}(f) w_{N\times 1}(f)$. If we denote by $\tilde{w}_{(N-1)\times 1}(f)$ the (N−1) × 1 column vector consisting of the first N − 1 components of $w_{N\times 1}(f)$, that is,

$\tilde{w}_{(N-1)\times 1}(f) = [w_0(f), w_1(f), \ldots, w_{N-2}(f)]^T,$

and by $A_{L\times(N-1)}(f)$ the L × (N−1) matrix whose (l, n) entry is given by

$\left( A_{L\times(N-1)}(f) \right)_{l,n} = H_{l,n}(f) - \frac{H_{l,N-1}(f) H_{0,n}(f)}{H_{0,N-1}(f)},$

and by $b_{L\times 1}(f)$ the L × 1 column vector whose l-th entry is given by

$(b_{L\times 1}(f))_l = -\frac{H_{l,N-1}(f)}{H_{0,N-1}(f)},$

then (3.13) becomes

$\min_{\tilde{w}_{(N-1)\times 1}(f) : \|\tilde{w}_{(N-1)\times 1}(f)\|_0 = V-1} \|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p$  (3.18)

which can be converted into a problem similar in form to (3.17):

$\min_{\tilde{w}_{(N-1)\times 1}(f)} \|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p + \lambda \|\tilde{w}_{(N-1)\times 1}(f)\|_0.$  (3.19)

A different formulation of (3.19) is to minimize the number of sensors that achieve a given gain level τ, that is,

$\min_{\tilde{w}_{(N-1)\times 1}(f)} \|\tilde{w}_{(N-1)\times 1}(f)\|_0$ subject to $\|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p \le \tau$  (3.20)

The convex relaxation is then given by

$\min_{\tilde{w}_{(N-1)\times 1}(f)} \|\tilde{w}_{(N-1)\times 1}(f)\|_1$ subject to $\|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p \le \tau$  (3.21)

We note that the existing literature, such as [15], gives guarantees on the quality of the solution found by optimization problem (3.21) if $A_{L\times(N-1)}$ satisfies the Restricted Isometry Property (RIP) [16]. However, in our problem setup, $A_{L\times(N-1)}$ does not satisfy RIP for reasonable numbers of available microphones (e.g., N = 1000), since the microphones are closely spaced and hence the columns of $A_{L\times(N-1)}$ have high similarity.

3.6 Experimental Results

Below we present two types of results. First, we do a comprehensive analysis of the combinatorial optimization problem (3.13) at a tractable size. Second, we study a large-scale setup. All reported gain results are in [dB], defined as $10 \log_{10} \|Hw\|_\infty$.

I). Consider a rectangular room similar to that of Figure 3.1, of size 10 m × 8 m, but with the target located at (1 m, 4 m) and 6200 interferences operating at f = 1 kHz.
Number of separated minima of criterion J for D = 5 active microphones out of N and a neighborhood radius R shows the vast majority of combinations are nonoptimal locally. @ @ @ @ @@N R 1 2 3 4 5 6 7 8 9 10 (N5 ) 10 13 7 3 2 2 1 1 1 1 1 252 12 13 2 1 1 1 1 1 1 1 1 792 14 29 9 5 1 1 1 1 1 1 1 2002 16 112 27 10 4 4 3 2 2 2 2 4368 18 184 49 15 10 8 7 1 1 1 1 8568 20 288 56 24 13 6 6 2 2 2 2 15504 microphones N runs from 10 to 20 but only V = 5 are active microphones. For each combination pi = (pi1, pi2, pi3, pi4, pi5) of 5 microphones out of N we solve the reduced optimization problem J(pi) = min w˜ ‖H˜w˜‖∞ , subject to 〈w˜, ¯˜H0〉 = 1 (3.22) where H˜ is the 6200 × 5 submatrix of H corresponding to the combination pi of active micro- phones. Then we analyze the number of separated (local) minima across the set of (N 5 ) combina- tions. We use the l1 modulo N for distance. We analyze neighborhoods of radius R, with R from 1 to 10. Specifically a combination ρ is a separated minimum if J(ρ) ≤ J(pi) for all combinations pi so that d(ρ, pi) ≤ R. Table 3.1 summarizes the number of separated minima for a given R. Next we apply the λ-method where we tune λ to achieve 5 significant nonzero weights. Due to axial symmetry in cases N=12 and N=14 the 5th and 6th largest weights are equal. In the other cases, 5th largest weight was at least 1000 times larger than the 6th weight. The optimal criterion found by the λ-method is less than 0.75dB higher than the global minimum (see Table 3.2). II). We perform simulations to show the performance and properties of the λ-method, where we aim to separate sounds from a known target location and unknown interference locations. To account for the unknown locations we consider a uniform grid of 100×80 interferences. We refer to this grid as the optimization grid. All experiments assume a rectangular room of dimension 28 Table 3.2. Comparison λ-method vs. global minimum. N 10 12 14 16 18 20 λ-method [dB] -0.86 0.94 0.55 -1.39 -0.95 -1.55 Global optim [dB] -1.31 0.54 0.54 -1.6 -1.68 -1.62 10m x 8m and N = 1000 microphones. We disregard interferences that are closer than 0.5m to the source or the walls of the room and report the worst target to interference gain (WTIG) as performance measure. First, we use the λ-method to select a subset of size V = 32 microphones from the original 1000 that achieves the lowest WTIG for a 1kHz target signal at position (x, y) = (3, 4) for a single path reflection model on a fine 1000 × 800 evaluation grid. This is achieved as discussed in Section 4.4. The resulting gain beam pattern is illustrated in Fig. 3.2. Here, the WTIG is found is 1.5dB. The disregarded regions close to the walls and the target are indicated by white coloring. Note that both inferred microphone locations are beam pattern are symmetric. Also, the beam pattern shows ray-like regions of higher gains “emitting” from pairs of microphones. These rays overlap in a 2D design to archive a high gain at the target and to suppress signals at other locations in the room. Next, we analyze performance as a function of the number of selected microphones V and the target location. Figure 3.3 illustrates the WTIG of a 1kHz target signal at positions (1,4) and (3,4) for D = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000 active microphones and λ = 0.1. This shows that a small subset of e.g., 32 microphones can be sufficient to achieve performance similar to the solution of all 1000 microphones. 
II). We perform simulations to show the performance and properties of the λ-method, where we aim to separate sounds from a known target location and unknown interference locations. To account for the unknown locations we consider a uniform grid of 100 × 80 interferences, which we refer to as the optimization grid. All experiments assume a rectangular room of dimension 10 m × 8 m and N = 1000 microphones. We disregard interferences that are closer than 0.5 m to the source or to the walls of the room, and report the worst target-to-interference gain (WTIG) as the performance measure.

First, we use the λ-method to select a subset of V = 32 microphones from the original 1000 that achieves the lowest WTIG for a 1 kHz target signal at position (x, y) = (3, 4) under a single-path reflection model, evaluated on a fine 1000 × 800 evaluation grid. This is achieved as discussed in Section 4.4. The resulting gain beam pattern is illustrated in Fig. 3.2; here, the WTIG found is 1.5 dB. The disregarded regions close to the walls and the target are indicated by white coloring. Note that both the inferred microphone locations and the beam pattern are symmetric. Also, the beam pattern shows ray-like regions of higher gain "emitted" from pairs of microphones. These rays overlap in a 2-D design to achieve a high gain at the target and to suppress signals at other locations in the room.

Next, we analyze performance as a function of the number of selected microphones V and the target location. Figure 3.3 illustrates the WTIG of a 1 kHz target signal at positions (1, 4) and (3, 4) for D = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000 active microphones and λ = 0.1. This shows that a small subset of, e.g., 32 microphones can be sufficient to achieve performance similar to the solution using all 1000 microphones. However, it also shows that the number of microphones necessary for close-to-optimal performance depends on the target location. That is, while sub-selecting 32 microphones results in a WTIG increase over the 1000-microphone solution of 0.1 dB for position (1, 4) on the evaluation grid, it results in a WTIG increase of 4.8 dB for position (3, 4).

Fig. 3.2. Gain pattern of a 32-element microphone array at 1000 Hz found by the λ-method. The worst target-to-interference gain is 1.5 dB. The array is down-selected from 1000 uniformly distributed microphones.

Finally, we evaluate the quality of the result of the λ-method by comparing it with the results of the SA method. This is necessary due to the non-convex nature of the optimization problem (4.5), which suggests that our approach could converge to a sub-optimal local minimum. Both the λ-method and the SA method use the optimization grid to find the microphone locations and weights. SA is initialized at the final microphone locations of the λ-method to reduce processing time. The result of this comparison is given in Table 3.3. The small improvement from SA illustrates the quality of the λ-method result. In Fig. 3.4, the gain pattern is no longer symmetric but achieves a better WTIG of 0.1 dB.

Table 3.3 and Fig. 3.3 also show that the microphone selection and weights obtained using the optimization grid perform sub-optimally on the finer evaluation grid. That is, while Table 3.3 reports a WTIG of −12.71 dB for position (1, 4) when the evaluation is based on the optimization grid, the same position only achieves a WTIG of −4.79 dB when the evaluation is based on the finer evaluation grid. Note that it is computationally infeasible to use the fine evaluation grid for optimization with the λ-method.

Fig. 3.3. Worst target-to-interference gain versus the number of active microphones for different target locations. 32 microphones are sufficient to achieve performance similar to 1000 microphones; the sufficient number of microphones depends on the target location.

Table 3.3. WTIG [dB] for the λ-method and the SA method.

                                  Pos. (1,4)   Pos. (3,2.5)   Pos. (3,4)
Opt. Grid (100×80)     λ-method    -12.71        -7.35          -5.73
                       SA          -13.00        -7.66          -5.73
Eval. Grid (1000×800)  λ-method     -4.79         0.30           1.5
                       SA           -5.34        -0.14           0.1

Fig. 3.4. Gain pattern of a 32-element microphone array at 1000 Hz found by simulated annealing. The worst target-to-interference gain is 0.1 dB. The search was initialized with the result of the λ-method.

3.7 Conclusions

Our problem is acoustic scene understanding, which subsumes a number of high-level tasks such as localization of acoustic sources, tracking of features of the sources, and even source separation. We aim to utilize a very large number of microphones distributed spatially in the volume of interest. This paper discusses optimal beam patterns for very large microphone arrays. It shows that only a sparse active subset of microphones is sufficient at any given time. The equivalent problem for a small set of microphones would be element spacing. The paper formalizes the problem of defining active subsets of microphones capable of zooming into a target location while suppressing a large number of interferences.
It shows that near-optimal subsets are solutions of a convex approximation of the originally stated non-convex criterion using a LASSO-inspired approach. Future work will show how solutions representing multiple active subsets of a very large multidimensional microphone array could be adaptively employed in order to locate and separate sources of interest.

Part II: Processing Broadband Source Using Delay-Scale-Sum Beamformer

Chapter 4: Processing Broadband Source Using Delay-Scale-Sum Beamformer

4.1 Overview

Consider a sensing system using a large number of N microphones placed in multiple dimensions to monitor a broadband acoustic field. Using all the microphones at once is impractical because of the amount of data generated. Instead, we choose a subset of V microphones to be active. Specifically, we wish to find the set of V microphones that minimizes the p-norm of the interference gains at multiple frequencies while monitoring a target of interest, with p ∈ {1, 2, ∞}. A direct, combinatorial approach (testing all N choose V subsets of microphones) is impractical because of the problem size. Instead, we use a convex optimization technique that induces sparsity through an $\ell_1$-penalty to determine which subset of microphones to use. We test the robustness of our solution through simulated annealing and compare its performance against a classical beamformer that maximizes SNR. Since switching from one subset of D microphones to another at every sample is possible, we construct a space-time-frequency sampling scheme that achieves near-optimal performance.

4.2 Introduction

Consider a large-scale microphone array having N microphones that monitors a surveillance area. Using all microphones simultaneously may be unreasonable in terms of power consumption and data processing. For example, for N = 10000 microphones and a data sampling rate of 100000 samples per second, the bandwidth requirement is 1 Gsample/sec. Instead, we could poll only a subset of D microphones at any one given time. The N-choose-D choices of microphones allow for a myriad of microphone configurations, and the task is then to choose a subset that achieves our objective. Alternatively, our problem is to place D microphones in a constrained region of space. We convert this non-convex optimization problem into a combinatorial problem by discretizing the possible set of microphone locations. In this context our approach can be seen as an optimal test design.

Assume the surveillance area consists of a set of point-like sources. We seek designs that minimize the p-norm of interference gains across frequencies from a potentially very large number of locations while still maintaining unit target gain at each frequency. Throughout this paper, we assume the following hypotheses:

H1. Microphone locations, that is, their (x, y, z) coordinates, are known. Their locations, however, can be arbitrary.

H2. The number of real interferences, their locations, and their spectral powers are unknown.

H3. The maximum number of microphones active at any time, V, is fixed in advance.

H4. All sources, both target and interferences, are broadband; that is, each source consists of F frequencies $f_0, f_1, \ldots, f_{F-1}$.

When microphones have local computational processing power, we make an additional hypothesis:

H5. Microphones can band-pass signals, and they can switch data transmission on a sample-by-sample basis.

Under H5, the system can selectively sample the space-time-frequency domain.
Our system divides the surveillance area into a large number of small, non-overlapping target areas. For each fixed target area, we find a subset of microphones that minimizes the p-norm of the broadband gains of a large number of virtual interferences, with p ∈ {1, 2, ∞}. By doing so, the system is robust to a wide variety of unknown interference configurations. In addition, we note that the system can choose a different set of microphone locations for each of the three different p-norms.

After finding a set of V microphones to monitor each of the individual target areas, the system begins to iteratively scan each target area by polling the predetermined set of V microphones for the corresponding target area. We note that the V microphones need not be unique for each target area; rather, the same microphone can be reused across different V-microphone configurations as target areas vary. By polling a set of V microphones for each target area, we can measure the signal strength in each target area and hence discover and monitor the actual targets.

Our goal then is to find, for a fixed target area, the optimal set of V microphones and their corresponding broadband weights that minimize the p-norm of the broadband gains of the interferences while maintaining unit target gain at each frequency, with p ∈ {1, 2, ∞}.

Fig. 4.1. Sample scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{100}{5} \approx 7.5\times 10^7$ possible sets of 5 microphones to use. The L sources are assumed to be broadband; that is, each source consists of the F frequencies $f_1, f_2, \ldots, f_F$. We allow the N filter weights to vary for each of the F frequencies, and hence our optimization problem returns a set of $F \cdot N$ filter weights.

4.3 Problem formulation

Consider the setup described earlier and sketched in Figure 4.1. We assume we know the microphone locations and then fix a target area. Since the number, location, and power distribution of the interferences are unknown, we assume a large number of virtual interferences, say L. To account for possible reflections off of walls, we include virtual interferences outside our surveillance area. Our objective is to minimize the p-norm of the interference broadband gains with p ∈ {1, 2, ∞}.

Assume the zero-indexed source is the target, and the remaining L sources are interferences. For these L + 1 "virtual" sources we use the direct-path model, where the transfer function between source l and microphone n at frequency $f_i$ is given by

$$H_{l,n}(f_i) = \frac{e^{-2\pi i \frac{f_i}{c}\|r_l - r_n\|}}{\|r_l - r_n\|}, \qquad 0 \le l \le L,\ 1 \le n \le N \qquad (4.1)$$

where $r_n$, $r_l$ denote the position vectors of microphone $n$ and source $l$ respectively, and $c$ is the speed of sound, 343 m/sec. The direct-path model assumes that the signal amplitude decays in proportion to distance, and that the phase varies cyclically in proportion to distance and frequency and in inverse proportion to the propagation speed.
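As an illustration of the direct-path model (4.1), here is a small helper that assembles the transfer matrix for given source and microphone positions; the function name and the 2-D toy geometry are ours, not the thesis code.

```python
# Direct-path transfer matrix of (4.1), assuming positions in meters.
import numpy as np

C_SOUND = 343.0  # speed of sound [m/s]

def transfer_matrix(src_pos, mic_pos, f):
    """H[l, n] = exp(-2*pi*1j*(f/c)*d) / d with d = ||r_l - r_n||."""
    d = np.linalg.norm(src_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.exp(-2j * np.pi * f / C_SOUND * d) / d

sources = np.array([[3.0, 4.0], [7.0, 2.0]])   # target plus one interference
mics = np.stack([np.linspace(0, 10, 5), np.zeros(5)], axis=1)  # along a wall
H = transfer_matrix(sources, mics, f=1000.0)
print(H.shape)   # (2, 5): L+1 sources by N microphones
```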
Let $w_n(f_i)$ denote the processing weight for microphone $n$ at frequency $f_i$. In other words, we assume that each microphone applies a delay and a scaling in the time domain to the signal at each frequency of interest $f_i$. This can be accomplished by splitting the signal into frequency bands via bandpass filters and operating on each bandlimited signal individually. A key observation is that if $w_n(f_i) = 0$, then microphone $n$ is inactive at frequency $f_i$.

For a fixed interference $l$ and frequency $f_i$, the system processing response to source $l$ of our delay-scale-sum system, $R_l(f_i)$, is given by

$$R_l(f_i) = \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i). \qquad (4.2)$$

We note that for a fixed frequency $f_i$, each of the N microphones receives a "copy" of source $l$ that has been distorted by a transfer function $H_{l,n}$ depending on the distance between microphone $n$'s and source $l$'s locations. Each microphone $n$'s copy is processed by a weight $w_n$, and all N processed, distorted copies are summed to give the system-processed source $l$. $R_l(f_i)$ is then a complex scalar that is exactly the delay-scale-sum system's effect on source $l$ at frequency $f_i$. By taking the modulus of $R_l(f_i)$, we get the gain $K_l(f_i)$, that is,

$$K_l(f_i) = |R_l(f_i)| \qquad (4.3)$$

The gain $K_l(f_i)$ is a real non-negative scalar that measures how much our system attenuates or amplifies a given source $l$ at frequency $f_i$. If $0 \le K_l(f_i) < 1$, the system attenuates source $l$; conversely, if $K_l(f_i) > 1$, the system amplifies source $l$. If $K_l(f_i) = 1$, the system does not distort the amplitude of source $l$ but may introduce a phase shift to source $l$ at frequency $f_i$.

Assume that we have F distinct frequencies of interest, $f_0, f_1, \ldots, f_{F-1}$. Given F sets of N microphone weights, each interference $l$ has F different gains, one for each frequency. Note that for each of the F frequencies, we use a different set of N microphone weights to calculate the interference gains at that frequency. Our objective is to minimize the p-norm, over the interference sources, of the gains summed across the F frequencies, while still achieving unit target response at each frequency, with p ∈ {1, 2, ∞}. We call the sum of an interference's gains across the F frequencies the interference's broadband gain. Hence, the broadband gain for source $l$ at the frequencies of interest $f_0, f_1, \ldots, f_{F-1}$, a real non-negative scalar, is given by

$$K_l(f_0, f_1, \ldots, f_{F-1}) = \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| \qquad (4.4)$$

Assumption H3, however, limits us to V simultaneously active microphones. In this multi-frequency setup, when H5 is not satisfied, microphone $n$ becomes active if $w_n(f_i) \ne 0$ for any $i$. Thus, the number of non-zeros in the N-dimensional vector $\{\max_{0\le i\le F-1} |w_n(f_i)|\}_{0\le n\le N-1}$ is an appropriate measure of the number of active microphones. In other words, if a fixed microphone $n$ has a non-zero weight for any of its F filter weights, the microphone is considered active. Let $\|w\|_0$ denote the pseudo-norm that counts the number of non-zeros in the vector $w$. We can now state our optimization problem:

$$\min_{(w_n(f_i))_{i=0,n=0}^{F-1,N-1}}\ \Big\| \Big\{ \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| \Big\}_{1\le l\le L} \Big\|_p$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \quad \text{for } i = 0, 1, \ldots, F-1$$
$$\Big\| \Big\{ \max_{0\le i\le F-1} |w_n(f_i)| \Big\}_{0\le n\le N-1} \Big\|_0 \le V \qquad (4.5)$$

Note that this is a non-convex optimization problem due to the $\ell_0$ pseudo-norm constraint. When hypothesis H5 is satisfied, we can solve the optimization problem (4.5) independently for each frequency and then implement adaptive space-time-frequency sampling.
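The broadband gain (4.4) and the active-microphone count in (4.5) are straightforward to evaluate numerically. A minimal sketch, assuming a transfer tensor H of shape (F, L+1, N) with the target in row 0 and random placeholder weights:

```python
# Evaluating the broadband gains (4.4) and the sparsity measure of (4.5).
import numpy as np

rng = np.random.default_rng(2)
F, L, N = 4, 100, 30
H = rng.standard_normal((F, L + 1, N)) + 1j * rng.standard_normal((F, L + 1, N))
W = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

R = np.einsum("fln,fn->fl", H, W)           # responses R_l(f_i), eq. (4.2)
broadband = np.abs(R[:, 1:]).sum(axis=0)    # K_l summed over frequencies
active = np.count_nonzero(np.abs(W).max(axis=0) > 1e-12)
print("worst broadband interference gain:", broadband.max())
print("active microphones:", active)
```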
4.4 Convex Optimization Strategy

Our method is inspired by LASSO regularization [1], a regression technique that minimizes the sum of squares of residual errors subject to the $\ell_1$ norm of the coefficients being less than a constant. Similar to the sparse signal and model estimation approach in [2], the $\ell_0$ pseudo-norm is replaced by the $\ell_1$ norm $\|w\|_1 = \sum_{n=0}^{N-1} |w_n(f)|$, which is then incorporated into the optimization criterion using a Lagrange multiplier λ. The optimization problem (4.5) is then replaced by the following convex optimization problem, which we call the λ-method:

$$\min_{(w_n(f_i))_{i=0,n=0}^{F-1,N-1}}\ \max_{1\le l\le L} \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| + \lambda \sum_{n=0}^{N-1} \max_{0\le i\le F-1} |w_n(f_i)|$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \quad \text{for } i = 0, 1, \ldots, F-1 \qquad (4.6)$$

For very large λ, the penalty term forces many of the microphones to become inactive. Specifically, let $W_{N\times F}$ be the N by F matrix of microphone weights produced by the optimization, where a row represents the F filter weights of an individual microphone. With a large λ penalty, many of the rows of $W_{N\times F}$ contain only values very close to zero. If the n-th row contains only such small values, microphone n is inactive. As λ decreases, the penalty term becomes less expensive and more microphones become active. In the limit λ = 0, all microphones are allowed to be active. We fine-tune λ until we get V non-zero components. Specifically, this is accomplished when $g_{(V+1)}$ is significantly smaller than $g_{(1)}$ (e.g., by 3 orders of magnitude), where $g_n = \max_{0\le i\le F-1} |w_n(f_i)|$ and $g_{(i)}$ is the i-th largest element in magnitude of the set $\{g_n\}$, $n = 0, 1, \ldots, N-1$. We then solve (4.6) using this λ. The final weights are obtained by solving (4.6) a second time, restricted to the V microphones and with λ = 0, commonly known as the debiasing step [1].

4.5 Varying Microphone Support Over Frequencies

When we optimize over multiple frequencies (4.6), the support of the chosen microphones remains fixed over the frequencies. However, we can also run (4.6) for each individual frequency of interest $f_i$; that is, we solve the optimization problem

$$\min_{(w_n(f_i))_{n=0}^{N-1}}\ \max_{1\le l\le L} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| + \lambda_i \sum_{n=0}^{N-1} |w_n(f_i)|$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \qquad (4.7)$$

We have to run (4.7) a total of F times, once for each frequency of interest $f_i$. A run of (4.7) for frequency $f_i$ means iteratively choosing $\lambda_i$ until we discover a $\lambda_i$ that produces V active microphones. We note that every time we test a new candidate $\lambda_i$, the optimization problem (4.7) needs to be solved. Each run of (4.7) for fixed $f_i$ produces a different set of V microphones, and hence the support of the microphones can vary over frequencies.

If the frequencies are chosen with uniform spacing, we can use non-uniform sampling to reconstruct the signals of interest at each frequency. Specifically, assume we have a total of F frequency bands (e.g., F = 4 as in the results below). Then each microphone has an additional F-channel filter bank, each channel processing a disjoint frequency band. The filter bank outputs are downsampled by F, and the corresponding sample streams are sent according to the transmission policy obtained in the optimization step. The central controller reconstructs the broadband signal by putting together the disjoint frequency bands.

Another method of allowing the microphone supports to vary over frequency is to solve the following optimization problem:

$$\min_{(w_n(f_i))_{i=0,n=0}^{F-1,N-1}}\ \max_{1\le l\le L} \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| + \sum_{i=0}^{F-1} \lambda_i \sum_{n=0}^{N-1} |w_n(f_i)|$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \quad \text{for } i = 0, 1, \ldots, F-1 \qquad (4.8)$$

Every time (4.8) is solved, F different $\lambda_i$ must be chosen. The question then arises of how to iteratively choose the F different $\lambda_i$ so that V active microphones are selected for each frequency. One possible approach for tuning a single λ is sketched below.
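The following is a hedged sketch of tuning a single λ in the per-frequency problem (4.7) by bisection on a logarithmic grid; the CVXPY formulation, thresholds, and bracket are assumptions rather than the thesis implementation.

```python
# Bisection on lambda for the per-frequency lambda-method (4.7).
import numpy as np
import cvxpy as cp

def lam_method(H, h0, lam):
    """Solve (4.7) for one frequency; H is L x N, h0 the target row."""
    w = cp.Variable(H.shape[1], complex=True)
    obj = cp.norm(H @ w, "inf") + lam * cp.norm(w, 1)
    cp.Problem(cp.Minimize(obj), [h0 @ w == 1]).solve()
    return w.value

def count_active(w, rel=1e-3):
    return int(np.sum(np.abs(w) > rel * np.abs(w).max()))

def tune(H, h0, V, lo=1e-4, hi=1e2, iters=30):
    for _ in range(iters):            # larger lambda -> fewer active mics
        mid = np.sqrt(lo * hi)        # geometric bisection
        k = count_active(lam_method(H, h0, mid))
        if k > V:
            lo = mid
        elif k < V:
            hi = mid
        else:
            return mid
    return mid
```

Once the tuned λ is found, the debiasing step simply re-solves (4.7) with λ = 0 on the selected support.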
We also note that (4.8) differs from (4.7) in that the objective function includes the broadband gain.

4.5.1 Varying λ1 and λ2 For Two Frequencies

We are interested in the pair of lambda values that produces exactly the same number of significant microphone weights for each of the two frequencies, where one lambda is the scalar multiplier of the $\ell_1$ norm of the microphone weights for the first frequency and the other lambda is, similarly, the scalar multiplier of the $\ell_1$ norm of the microphone weights for the second frequency within the objective function. In other words, we have a fixed number of microphones for each frequency, but the microphone positions and weights are allowed to vary with frequency. For example, if we had 100 possible microphone locations, we would then have 200 total microphone weights to optimize, 100 weights for each of the two frequencies. We want to find the pair of lambda values that produces 32 significant weights among the 100 weights corresponding to frequency 1 and 32 significant weights among the 100 weights corresponding to frequency 2. A weight is considered significant if its magnitude is greater than 0.1% of the largest magnitude of the weights for that frequency.

We note that the optimization problem consists of an objective function and a constraint. The objective function is the sum of three terms: the p-norm of the interference gains summed across the two frequencies, the first lambda multiplied by the $\ell_1$ norm of the microphone weights corresponding to frequency one, and the second lambda multiplied by the $\ell_1$ norm of the microphone weights corresponding to frequency two. In addition, we have the constraint that the microphone weights produce unit gain for the target at both frequencies.

We also note that by increasing the lambda value corresponding to a single frequency, the number of significant weights for that frequency decreases, and similarly, by decreasing the lambda value corresponding to a single frequency, the number of significant weights for that frequency increases. What is unknown is the effect of changing the lambda for a given frequency on the number of significant weights for the other frequency. To gain intuition on the effect of varying one lambda on the other frequency's number of significant weights, we run the following experiment.

We fix the possible microphone locations, the target, the interferences, and the two frequencies. We then vary the two lambda values within the objective function. For example, if we wish to test the lambda values 0, 1, and 100, we would have $3^2 = 9$ lambda pairs to test. For each pair of lambdas we test, we collect four values: the number of significant weights for each of the two frequencies and the $\ell_1$ norm of the microphone weights for each of the two frequencies. We specify the numbers of significant microphone weights per frequency that we are interested in; in our case, 5, 8, and 32 microphone weights out of 100.

For example, for 32 significant microphones, we plot the contour lines for 32 microphones for each of the two frequencies on the same plot, with the x-axis and y-axis representing the lambda values tested. We have then plotted two different contour lines, one for each frequency. We find the intersection of the two contour lines, and for that intersection point, a specific pair of lambdas, we look up the corresponding $\ell_1$ norms of the weights for each of the two frequencies. This yields two different $\ell_1$ norms.
One is the $\ell_1$ norm of the weights for frequency one and the other is the $\ell_1$ norm of the weights for frequency two. We note that quantization (round-off error) is involved in finding this intersection point, since not all possible lambda values between the smallest and largest lambda, an uncountably infinite set, are tested. The intersection point is chosen to be the pair of tested lambda values closest in distance to the intersection of the two contour lines, where the contour lines are themselves interpolated curves. We then plot the contour lines for these two $\ell_1$ norms for both frequencies on the same plot, with the x-axis and y-axis representing the lambda values tested. This yields four different contour lines.

4.6 Simulated Annealing

We test the robustness of the solution found by convex optimization through simulated annealing (SA). Simulated annealing is a simple randomized technique for iterative improvement introduced in [3]. SA probabilistically accepts worse transitions in order to avoid local minima. In our case, SA minimizes, over the locations of the microphones in a fixed-size subset, the objective function given by the largest gain over all interference positions. More precisely, given a fixed number D of microphones, we run the same convex optimization problem as in the λ-method (i.e., find the filter weights that minimize the maximum valid interference gain over a coarse grid while maintaining a target gain of unity) but with λ set to 0. The objective function value f is then the largest gain over the valid interferences. If the optimization program fails for any reason, the objective function value is set to infinity. The initial temperature is set to 100, and the annealing schedule decreases the temperature by 5% each iteration. We limit the length of the Markov walk to a total of 1200 iterations. The initial search point is the configuration produced by the λ found through an iterative binary search that yields D = 32 microphones. We assume that the target location, room size, and frequency of interest are fixed. A sketch of this annealing loop is given below.
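The annealing loop described above can be written compactly. The move rule (swap one active microphone for a random inactive one) is our illustrative choice; the temperature schedule and iteration count follow the text, and `objective` is assumed to wrap the λ = 0 convex solve.

```python
# Simulated annealing over microphone subsets; objective(subset) returns the
# largest valid interference gain (or infinity on solver failure).
import numpy as np

def anneal(objective, subset0, n_total, rng, T=100.0, cool=0.95, iters=1200):
    cur = list(subset0)
    f_cur = objective(cur)
    best, f_best = list(cur), f_cur
    for _ in range(iters):
        cand = list(cur)
        i = rng.integers(len(cand))                  # pick one active mic
        cand[i] = rng.choice([m for m in range(n_total) if m not in cand])
        f_cand = objective(cand)
        # accept improvements always, worse moves with Boltzmann probability
        if f_cand < f_cur or rng.random() < np.exp(-(f_cand - f_cur) / T):
            cur, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = list(cur), f_cur
        T *= cool                                    # 5% cooling per step
    return best, f_best
```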
4.7 Relation to Prior Work

In this section, we review prior work in relation to our own. The author of [17] derives a probabilistic expression for the peak side lobe level of a linear array with randomly placed elements. The elements' weights are uniform in magnitude, but their phases are chosen to maximize the main lobe in the direction normal to the array. In our case, we allow weights to vary in both magnitude and phase.

In [18], the authors address the problem of finding the sparsest linear array given a desired beam pattern and error tolerance. The elements' phases are fixed and their magnitudes are allowed to vary. They furthermore assume symmetry in their linear array so that their optimization variables are real. Analogously to our setup, they formulate an $\ell_0$ quasi-norm objective to count the number of active array elements by counting the number of non-zero weight values. The number of possible active array elements (N in our case) is large but finite, to make the problem computationally tractable. They convert the objective to an $\ell_p$ norm objective with 0 < p < 1 and solve the optimization problem using a simplex search.

In [19], the author assumes a symmetric linear array setup and formulates an optimization problem to find the real weights that minimize the peak side lobe level for fixed sensor locations. He then proposes another optimization problem that finds both sensor weights and sparse locations simultaneously. To minimize the number of sensors, he formulates an $\ell_0$ quasi-norm objective that counts the number of active array elements by counting the number of non-zero weights. The optimization problem also includes a user-specified constraint on the maximum allowable peak side lobe value. He solves this combinatorially difficult problem for small sizes using a branch-and-bound method. In our case, we work with 2-D arrays and optimize over complex weights. In addition, we fix the number of active microphones rather than trying to minimize it, and we seek to minimize the maximum interference gain in the case p = ∞. We relax the $\ell_0$ quasi-norm to an $\ell_1$ norm to discover microphone locations.

In [20], the authors propose two greedy deletion algorithms and an $\ell_1$-minimization algorithm to find a sparse linear array containing the desired number of elements. First is the "minimum increase" algorithm: for a given array, an element is removed and the performance of the remaining elements is measured; this procedure is repeated for each element of the array, and the element that affects the performance the least is deleted. This is repeated iteratively until the desired number of array elements remains. Second is the "smallest coefficient" algorithm, where the array element with the smallest weight is deleted; this too is repeated iteratively until the desired number of array elements remains. The third algorithm relaxes an $\ell_0$ pseudo-norm constraint to an $\ell_1$ norm to get a sparse solution, then chooses the elements with the V largest weight magnitudes, where V is the desired number of active elements; finally, it finds new weights for the chosen V elements, a debiasing step. Our algorithm is similar in spirit to the third algorithm, but instead we rerun the optimization problem, tuning a sparsity parameter λ via a bisection method, until the $\ell_1$-norm minimization problem produces the desired number V of microphones.

In [21], the authors use the theory of difference sets to place elements. In addition, they propose the idea of "spatial hopping," wherein one part of the array forms one desired beam pattern and another part of the array forms another desired beam pattern. This idea is analogous to our idea of using different subsets of microphones to focus on targets at different locations.

The authors of [22] solve the problem of finding the maximally sparse linear array that matches a given reference beam pattern within a specified tolerance by converting it to the problem of finding the maximally sparse linear array that matches the reference beam pattern with the highest a posteriori probability. They solve the converted problem via Bayesian compressive sampling. Their algorithm depends on an initial estimate of the error variance, $\sigma_0^2$.

4.8 Experimental Results

We run experiments by optimizing over a simpler model and then evaluating over a more sophisticated model. Our optimization model is as follows. The room size is 10 m by 8 m. The target of interest is located at (3 m, 4 m). There are 1000 possible microphone locations along the perimeter of the rectangular room. We optimize over the four frequencies 250, 500, 750, and 1000 Hz. There are 6200 virtual interferences, and a direct-path model is used to calculate the transfer functions.
We do not place these interferences within 0.5 m of the perimeter of the room or of the target. The evaluation model differs from the optimization model in two ways: it uses a denser set of 620000 interferences, and we include reflections for each of them.

We run five types of experiments to compare the performance of the λ-method. First, we run the optimization problem (4.6), simultaneously optimizing over the four frequencies; this setup fixes the support of the microphone subset across all 4 frequencies. Second, we run the optimization problem (4.6) four times, once at each individual frequency; the support of the chosen microphones is then allowed to vary over frequencies. Third, we randomly perturb the set of microphones found by the multi-frequency optimization of experiment (1) using simulated annealing (SA) to see by how much we can improve the solution. Fourth, we again use SA to perturb the set of microphones found by the single-frequency optimization of experiment (2). Fifth, we test the performance of the beamformer that maximizes the signal-to-noise ratio (SNR); this beamformer can be shown to use the set of microphones that lie closest to the target.

Table 4.1 shows the worst interference gain in dB for the five setups using the evaluation model. The results show our multi-frequency λ-method (column 3) outperforming the beamformer that maximizes SNR (column 2) at every frequency. The single-frequency λ-method performs better than the multi-frequency λ-method, since microphone locations are allowed to vary across frequencies. Simulated annealing sometimes, but not always, finds better-performing solutions when measured with the evaluation model; by construction, simulated annealing finds a solution at least as good as the initial point when measured on the optimization model. Figs. 4.2 and 4.3 show the beam patterns for both the λ-method and the maximum-SNR beamformer at 1000 Hz, along with the placement of the microphones. Results are in dB, with unit target gain (0 dB). Fig. 4.4 compares the expected value and variance of the maximum gain among a varying number of randomly placed interferences in the surveillance area. Finally, Fig. 4.5 shows the sharp drop in the filter weights produced by the multi-frequency λ-method.

Table 4.1. Worst interference gains [dB] for: beamformer (BF), multi-frequency λ-method, SA multi-frequency λ-method, single-frequency λ-method, SA single-frequency λ-method.

f [Hz]     BF    MF λ    SA MF λ    SF λ    SA SF λ
250       15.6    9.2      3.9     -1.19     -3.1
500       14.5    7.6      4.3      3.3       2.1
750       12.5    3.4      4.1     -0.9      -0.2
1000      10.4    2.4      5.8      1.5       0.1

4.9 Conclusions

We aim to utilize a very large number of available microphones by using customized subsets of microphones to monitor specific areas of interest. This selective sampling of microphones then produces reasonable amounts of data to be processed. An equivalent problem to our microphone subset selection is microphone spacing. Our optimization criterion finds microphones that suppress a large number of interferences across multiple frequencies while monitoring a target location. We allow the chosen subset of microphones to have different weights at different frequencies of interest. We show that our multi-frequency LASSO-inspired convex optimization technique can find subsets of microphones that give reasonable performance on evaluation models containing large numbers of virtual interferences and reflections, even though the optimization criterion assumes many fewer virtual interferences and no reflections.
If the frequencies of interest are uniformly spaced, we can achieve even better performance by allowing the active microphone subset to change over frequencies and then using space-time-frequency sampling to recover our signal of interest.

Fig. 4.2. Gain map for the classical beamformer that maximizes SNR at 1000 Hz.

Fig. 4.3. Gain map for the beamformer found by the multi-frequency λ-method at 1000 Hz.

Fig. 4.4. Comparison of mean and variance of the maximum gain for a varying number of randomly placed interferences at 1000 Hz.

Fig. 4.5. Ordered weights found by the multi-frequency λ-method.

Part III: Processing Broadband Source Using Multi-rate Filterbanks

Chapter 5: Multirate Filterbank Review

We note that this chapter's derivations are known in the literature. In particular, we refer the reader to [23].

5.1 Filtering Followed by Downsampling

Fig. 5.1. Single channel of a multi-rate filterbank: the input x is filtered by F, downsampled by D to give v, upsampled by D to give e, and filtered by G to give y.

In the first half of Fig. 5.1, the input signal x is first filtered by filter F and then downsampled by integer D. The output after downsampling is denoted by v. The output at time $t'$ after filtering but before downsampling is given by the convolution $q[t'] = \sum_t f[t'-t]\, x[t]$. If the input to the downsampler is q, then the output of the downsampler is $v[m] = q[mD]$. Combining these two relationships yields the input/output relationship

$$v[m] = q[mD] = \sum_k f[mD - k]\, x[k] \qquad (5.1)$$

Reference [23], Section 4.1, shows that the z-transform of (5.1) is given by

$$V(z^D) = \frac{1}{D} \sum_{d=0}^{D-1} F(zW_D^d)\, X(zW_D^d) \qquad (5.2)$$

5.2 Upsampling Followed by Filtering

In the second half of Fig. 5.1, the signal v is upsampled by integer D and then filtered by filter G. We denote the signal after upsampling but before filtering by $e[\tau]$ and the output after filtering by $y[t]$. When upsampling by integer D, we have

$$e[\tau] = \begin{cases} v[\tau/D] & \text{if } \tau \bmod D = 0; \\ 0 & \text{otherwise.} \end{cases} \qquad (5.3)$$

The output y is a convolution of the filter G and e, so by the previous relationship we can derive the input/output relationship as follows:

$$y[t] = (g * e)[t] = \sum_\tau g[t - \tau]\, e[\tau] = \sum_k g[t - kD]\, e[kD] = \sum_k g[t - kD]\, v[k] \qquad (5.4)$$

In the third equality, we let $\tau = kD$ with integer k, since $e[\tau]$ is non-zero only when $\tau$ is an integer multiple of D. Reference [23], §4.1, shows that the z-transform of (5.4) is given by

$$Y(z) = G(z)\, V(z^D) \qquad (5.5)$$

Fig. 5.2. Filterbank for microphone n: channel l consists of analysis filter $F_{n,l}(z)$, downsampling by D, upsampling by D, and synthesis filter $G_{n,l}(z)$; the C channel outputs are summed to give $y_n$. Notice that D, the downsampling and upsampling factor, is not necessarily equal to C, the number of channels (subbands).
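A quick numerical check of (5.1) and (5.3)-(5.4), assuming toy FIR filters and input; np.convolve plays the role of the convolutions, and the guards restrict the direct sums to the filters' supports.

```python
# Verify the down- and up-sampling input/output relationships numerically.
import numpy as np

rng = np.random.default_rng(3)
D, taps = 3, 8
f = rng.standard_normal(taps)        # analysis filter
g = rng.standard_normal(taps)        # synthesis filter
x = rng.standard_normal(64)

# (5.1): filter, then downsample
q = np.convolve(f, x)
v = q[::D]                           # v[m] = q[mD]
v_direct = np.array([sum(f[m*D - k] * x[k] for k in range(len(x))
                         if 0 <= m*D - k < taps) for m in range(len(v))])
print(np.allclose(v, v_direct))      # True

# (5.3)-(5.4): upsample (zero interpolation), then filter
e = np.zeros(D * len(v)); e[::D] = v
y = np.convolve(g, e)
y_direct = np.array([sum(g[t - k*D] * v[k] for k in range(len(v))
                         if 0 <= t - k*D < taps) for t in range(len(y))])
print(np.allclose(y, y_direct))      # True
```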
5.3 Filterbank Processing

The output of the n-th microphone's filterbank, $y_n$, is the sum of the processing results of its C channels. Hence, we have

$$y_n[t] = \sum_{l=0}^{C-1} y_{n,l}[t] \qquad (5.6)$$

where $y_{n,l}$ is the processed result of the l-th channel of the n-th microphone's filterbank. The z-transform of (5.6) is simply

$$Y_n(z) = \sum_{l=0}^{C-1} Y_{n,l}(z) \qquad (5.7)$$

Using (5.4), we can expand (5.6) as follows:

$$y_n[t] = \sum_{l=0}^{C-1} \sum_k g_{n,l}[t - kD]\, v_{n,l}[k] \qquad (5.8)$$

where $g_{n,l}$ is the synthesis filter of the l-th channel of the n-th microphone and $v_{n,l}$ is the decimated channel signal of the l-th channel of the n-th microphone. From (5.7), (5.2), and (5.5), we can express the output of the n-th microphone as follows:

$$Y_n(z) = \frac{1}{D} \sum_{l=0}^{C-1} G_{n,l}(z) \sum_{d=0}^{D-1} F_{n,l}(zW_D^d)\, X_n(zW_D^d) \qquad (5.9)$$

Equation (5.9) can equivalently be expressed as a scaled product of a row vector, a matrix, and a column vector, that is,

$$Y_n(z) = \frac{1}{D}\, G_{n,0,D,1\times C}(z)\, F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) \qquad (5.10)$$

where

$$G_{n,d,D,1\times C}(z) = [G_{n,0}(zW_D^d),\ G_{n,1}(zW_D^d),\ \cdots,\ G_{n,C-1}(zW_D^d)] \qquad (5.11)$$

is a row vector, size 1 by C, containing all C synthesis filters of the n-th filterbank (with arguments modulated by $W_D^d$),

$$F_{n,C\times D}(z) = \begin{bmatrix} F_{n,0}(z) & F_{n,0}(zW_D) & \cdots & F_{n,0}(zW_D^{D-1}) \\ F_{n,1}(z) & F_{n,1}(zW_D) & \cdots & F_{n,1}(zW_D^{D-1}) \\ \vdots & \vdots & \ddots & \vdots \\ F_{n,C-1}(z) & F_{n,C-1}(zW_D) & \cdots & F_{n,C-1}(zW_D^{D-1}) \end{bmatrix} \qquad (5.12)$$

is the C by D analysis modulation matrix ([24], §4.1) of the n-th filterbank, and

$$X^T_{n,1\times D}(z) = [X_n(z),\ X_n(zW_D),\ \cdots,\ X_n(zW_D^{D-1})]^T \qquad (5.13)$$

is a column vector, size D by 1, containing aliased versions of $X_n(z)$, the z-transform of the n-th filterbank's input. We also define the synthesis modulation matrix, size D by C, of the n-th filterbank as

$$G_{n,D\times C}(z) = \begin{bmatrix} G_{n,0}(z) & G_{n,1}(z) & \cdots & G_{n,C-1}(z) \\ G_{n,0}(zW_D) & G_{n,1}(zW_D) & \cdots & G_{n,C-1}(zW_D) \\ \vdots & \vdots & \ddots & \vdots \\ G_{n,0}(zW_D^{D-1}) & G_{n,1}(zW_D^{D-1}) & \cdots & G_{n,C-1}(zW_D^{D-1}) \end{bmatrix} \qquad (5.14)$$

If we expand the matrix-vector product $F_{n,C\times D}(z)\, X^T_{n,1\times D}(z)$ as

$$F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) = \sum_{d=0}^{D-1} X_n(zW_D^d)\, \big[F_{n,0}(zW_D^d),\ F_{n,1}(zW_D^d),\ \ldots,\ F_{n,C-1}(zW_D^d)\big]^T \qquad (5.15)$$

and observe that $W_D^d = e^{-j2\pi \frac{d}{D}}$ is D-periodic in d, we notice the following useful equality, which follows from the commutativity of addition:

$$F_{n,C\times D}(zW_D^d)\, X^T_{n,1\times D}(zW_D^d) = F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) \qquad (5.16)$$

for $d = 0, 1, \ldots, D-1$. Using (5.16), the scalar input/output equality (5.10) can be written in vector form as

$$Y^T_{n,1\times D}(z) = \frac{1}{D}\, G_{n,D\times C}(z)\, F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) \qquad (5.17)$$

where

$$Y^T_{n,1\times D}(z) = [Y_n(z),\ Y_n(zW_D),\ \cdots,\ Y_n(zW_D^{D-1})]^T \qquad (5.18)$$

is a column vector, size D by 1, containing aliased versions of $Y_n(z)$.
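To connect the time-domain filterbank of Fig. 5.2 with the modulation-matrix formula (5.9), the following toy check simulates one filterbank in the time domain and compares its z-transform at a point on the unit circle against the right-hand side of (5.9); filters and input are random placeholders.

```python
# Numerical check of (5.9) against a brute-force time-domain filterbank.
import numpy as np

rng = np.random.default_rng(8)
D, C, taps = 3, 3, 5
x = rng.standard_normal(30)
F = rng.standard_normal((C, taps))
G = rng.standard_normal((C, taps))

def ztrans(h, z):                        # polynomial in z^{-1}
    return sum(h[k] * z ** (-k) for k in range(len(h)))

# time-domain filterbank: analysis, down/up-sampling, synthesis, sum
y = np.zeros(len(x) + 2 * taps)          # long enough for all convolutions
for l in range(C):
    q = np.convolve(F[l], x)
    e = np.zeros(len(q)); e[::D] = q[::D]    # down- then up-sample
    yl = np.convolve(G[l], e)
    y[:len(yl)] += yl

z = np.exp(1j * 0.7)                     # test point on the unit circle
W = np.exp(-2j * np.pi / D)
Y_formula = sum(ztrans(G[l], z) *
                sum(ztrans(F[l], z * W**d) * ztrans(x, z * W**d)
                    for d in range(D))
                for l in range(C)) / D
print(np.isclose(ztrans(y, z), Y_formula))   # True
```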
Chapter 6: Processing Random Source Using Plurality of Multirate Filterbanks

In this chapter, Section 6.1 builds on top of the results of Chapter 5. Section 6.2 presents the second-order description of cyclostationary processes. Sections 6.3 and 6.4 describe how cyclostationary processes propagate through a plurality of multirate filterbanks, which is one of our main contributions. We note that the single-multirate-filterbank description for cyclostationary processes has been derived in [25].

6.1 Problem Setup

Assume that we have N microphones and that the output of each of the N microphones, after pre-filtering, is processed by an individual filterbank. Each filterbank is implemented as a multi-rate, finite-impulse-response (FIR) filterbank, and each filterbank has the same setup as shown in Fig. 5.2, with D, the downsampling and upsampling rate, and C, the number of channels, fixed for all N filterbanks. The analysis and synthesis filters for each subchannel of each filterbank, however, are allowed to vary. Each of the N filterbanks receives a signal propagating from source $x_r$. See Fig. 6.1 for the overall system processing setup for a single source $x_r$.

Fig. 6.1. A single acoustic source $x_r$ propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse-response (FIR) filterbank. The subscript r of $x_r$ indexes the I + 1 sources, which consist of I interference sources, whose overall gain we wish to minimize, and 1 target source, whose gain we wish to be exactly 1 so that our system perfectly reconstructs the target. We index the target source by setting r = 0, and hence the target source is denoted by $x_0$. The propagation from source $x_r$ to filterbank n is modeled with transfer function $P_{r,n}$. The prefilter $I_{0,n}$ inverts the propagation from the target source, $x_0$, to filterbank n. The cascade of the propagation filter $P_{r,n}$ and target inversion filter $I_{0,n}$ is denoted by $H_{r,n}$. The outputs of the N filterbanks are summed to give a processed signal $y_r$.

The n-th filterbank receives a sampled input originating from acoustic source $x_r$. Each microphone samples at the same uniform rate, and this rate is sufficient to recover all I + 1 sources, each of which is assumed to be bandlimited. The sampled input is given by $x_{r,n}[k] = x_{r,n}(kT_s)$, where k is an integer time index and $T_s$ is the sampling period. The input signal for filterbank n, $x_{r,n}$, is then fed into C subchannels. The l-th subchannel of the n-th filterbank consists of an FIR analysis filter $F_{n,l}(z)$, a downsampler by integer D, an upsampler (zero interpolator) by integer D, and an FIR synthesis filter $G_{n,l}(z)$. The outputs of the C subchannels are combined to give an output signal $y_{r,n}$, or, in the z-domain,

$$Y_{r,n}(z) = \sum_{d=0}^{D-1} \Big( \frac{1}{D} \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) \Big)\, X_{r,n}(zW_D^d) \qquad (6.1)$$

where the above equation is simply a rearrangement of the terms of (5.9). $G_{n,l}$ is the transfer function of the synthesis filter of filterbank n's subchannel l, and $F_{n,l}$ is the transfer function of the analysis filter of filterbank n's subchannel l. In short, $y_{r,n}$ is the processed output of filterbank n given an input signal propagating from source $x_r$.

Assume we have I + 1 acoustic point sources in the environment. We denote the r-th source by $x_r$. We set the 0-th source, $x_0$, to be the target of interest and denote all the remaining I sources as interferences, whose overall gain we wish to minimize. Let $X_{r,n}(z)$ be the signal received by the n-th filterbank from source $x_r$, and let $P_{r,n}$ be the transfer function that models the propagation from source $x_r$ to the n-th filterbank. Each filterbank input is pre-filtered with a target inversion filter, that is, a filter $I_{0,n}$ that inverts the propagation effect on the target's phase from the target $x_0$ to filterbank n. We denote the cascade of the propagation and target inversion filters from source $x_r$ to microphone n by $H_{r,n}$, that is,

$$H_{r,n}(z) = P_{r,n}(z)\, I_{0,n}(z) \qquad (6.2)$$

We then have

$$X_{r,n}(z) = H_{r,n}(z)\, X_r(z) \qquad (6.3)$$

Note that the target inversion pre-filter does not vary with the source $x_r$ but only with the filterbank n.

Remark. Ideally, the target source $x_0$ enters each of the N filterbanks with only an amplitude scaling and with its original phase.
If we assume that the propagation $P_{0,n}$ from the target source $x_0$ to microphone n can be inverted perfectly by the prefilter $I_{0,n}$, then we have

$$H_{0,n}(z) = P_{0,n}(z)\, I_{0,n}(z) = \alpha_{0,n}\, z^{-\Delta_{H_n}}, \qquad (6.4)$$

where $\alpha_{0,n}$ is a real scalar representing the amplitude change from propagation and $\Delta_{H_n}$ represents a processing delay.

The processed outputs of the N filterbanks, $Y_{r,n}(z)$, are summed to give the final output signal $Y_r(z)$, the system output for system input $X_r(z)$, that is,

$$Y_r(z) = \sum_{n=0}^{N-1} Y_{r,n}(z) \qquad (6.5)$$

By substituting the right-hand side of (6.1) for $Y_{r,n}(z)$ in (6.5), and then substituting the right-hand side of (6.3) evaluated at $zW_D^d$ for $X_{r,n}(zW_D^d)$ in the resulting equation, we have, after rearranging terms,

$$Y_r(z) = \sum_{d=0}^{D-1} \Big( \frac{1}{D} \sum_{n=0}^{N-1} H_{r,n}(zW_D^d) \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) \Big)\, X_r(zW_D^d) \qquad (6.6)$$

We want our system to perfectly reconstruct the target source $x_0$ after some processing delay Δ; that is, we want, in the z-domain,

$$Y_0(z) = z^{-\Delta}\, X_0(z) \qquad (6.7)$$

By substituting the right-hand side of (6.6) with r = 0, the target index, into the left-hand side of (6.7), we get

$$\sum_{d=0}^{D-1} \Big( \frac{1}{D} \sum_{n=0}^{N-1} H_{0,n}(zW_D^d) \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) \Big)\, X_0(zW_D^d) = z^{-\Delta}\, X_0(z) \qquad (6.8)$$

For the equality (6.8) to hold, the following D conditions need to hold:

$$\sum_{n=0}^{N-1} H_{0,n}(zW_D^d) \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) = \begin{cases} D z^{-\Delta} & \text{if } d = 0; \\ 0 & \text{if } 1 \le d \le D-1 \end{cases} \qquad (6.9)$$

where Δ is some positive integer denoting the overall delay of the system. In matrix-vector form, the D target perfect-reconstruction conditions of (6.9) can be written as

$$\underbrace{\begin{bmatrix} \dot F_{0,1\times C}(z) & \dot F_{1,1\times C}(z) & \cdots & \dot F_{N-1,1\times C}(z) \\ \dot F_{0,1\times C}(zW_D) & \dot F_{1,1\times C}(zW_D) & \cdots & \dot F_{N-1,1\times C}(zW_D) \\ \vdots & \vdots & \ddots & \vdots \\ \dot F_{0,1\times C}(zW_D^{D-1}) & \dot F_{1,1\times C}(zW_D^{D-1}) & \cdots & \dot F_{N-1,1\times C}(zW_D^{D-1}) \end{bmatrix}}_{\dot F_{D\times N\cdot C}} \underbrace{\begin{bmatrix} G^T_{0,1\times C}(z) \\ G^T_{1,1\times C}(z) \\ \vdots \\ G^T_{N-1,1\times C}(z) \end{bmatrix}}_{G^T_{1\times N\cdot C}} = \begin{bmatrix} D z^{-\Delta} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (6.10)$$

where the matrix $\dot F_{D\times N\cdot C}$, size D by N·C, consists of D·N row vectors, size 1 by C, defined as

$$\dot F_{n,1\times C}(z) = H_{0,n}(z) \cdot [F_{n,0}(z),\ F_{n,1}(z),\ \cdots,\ F_{n,C-1}(z)] \qquad (6.11)$$

and the column vector $G^T_{1\times N\cdot C}$, size N·C by 1, consists of N column vectors, size C by 1, defined as

$$G^T_{n,1\times C}(z) = [G_{n,0}(z),\ G_{n,1}(z),\ \cdots,\ G_{n,C-1}(z)]^T \qquad (6.12)$$

Fig. 6.2. FIR multi-rate filter bank for the n-th microphone (identical in structure to Fig. 5.2).
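The perfect-reconstruction system (6.10) can be evaluated numerically at a frequency on the unit circle. In the sketch below the filters are random placeholders, so the residual is of course large; the point is only the assembly of $\dot F$, the stacked synthesis vector, and the target vector.

```python
# Evaluate the residual of the perfect-reconstruction conditions (6.10).
import numpy as np

rng = np.random.default_rng(5)
N, C, D, taps, delay = 2, 3, 3, 4, 5

F = rng.standard_normal((N, C, taps))     # analysis filters f_{n,l}
G = rng.standard_normal((N, C, taps))     # synthesis filters g_{n,l}
H0 = rng.standard_normal((N, taps))       # target cascades H_{0,n}

def tf(h, z):                             # polynomial in z^{-1}
    return sum(h[k] * z ** (-k) for k in range(len(h)))

W = np.exp(-2j * np.pi / D)

def residual(w):
    z = np.exp(1j * w)
    # rows d of F-dot: entries H_{0,n}(z W^d) F_{n,l}(z W^d), n-major order
    Fdot = np.array([[tf(H0[n], z * W**d) * tf(F[n, l], z * W**d)
                      for n in range(N) for l in range(C)]
                     for d in range(D)])                 # D x (N*C)
    g = np.array([tf(G[n, l], z) for n in range(N) for l in range(C)])
    target = np.zeros(D, complex); target[0] = D * z ** (-delay)
    return np.linalg.norm(Fdot @ g - target)

print(residual(0.3))
```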
The spectral density of two zero mean discrete random processes xn1 , xn2 which are jointly WSS is the z-transform, evaluated at ejw, of their correlation function, that is Sxn1 ,xn2 (e jw) = ∑ τ Rxn1 ,xn2 [τ ]e −jwτ (6.16) Definition 6.2.4. Two zero mean discrete random processes xn1 , xn2 are said to be jointly wide- sense cyclostationary (WSCS) with period D if their correlation function is periodic, that is Rxn1 ,xn2 [t1 +D, t2 +D] = Rxn1 ,xn2 [t1, t2] for all t1, t2 (6.17) Remark Assume xn1 , xn2 are jointly WSS, then we have Rxn1 ,xn2 [t1 + P0, t2 + P0] = Rxn1 ,xn2 [t1 − t2] = Rxn1 ,xn2 [t1, t2] (6.18) (6.18) shows that jointly WSS is a special case of jointly WSCS since jointly WSS implies jointly WSCS with period 1. Remark Assume xn1 , xn2 are jointly WSCS with period D and fix an integer u0. Then their correlation function, defined as a function of integer k, Rxn1 ,xn2 [u0 + k, k] (6.19) is D periodic with respect to input parameter k. Definition 6.2.5. The cyclic correlation function for period D and integer d with 0 ≤ d ≤ D − 1 is defined as 64 Rd/Dxn1 ,xn2 [u] = 1 D D−1∑ k=0 Rxn1 ,xn2 [k + u, k]e −j2pi dD k (6.20) The value d/D is known as the cyclic frequency. Remark From equation (6.17), we observe that for fixed u, Rxn1 ,xn2 [k + u, k] is a periodic sequence in k with period D if xn1 , xn2 are jointly WSCS with period D . Hence, R d/D xn1 ,xn2 [u] are the Fourier coefficients of Rxn1 ,xn2 [k + u, k], that is Rxn1 ,xn2 [k + u, k] = D−1∑ d=0 Rd/Dxn1 ,xn2 [u]W −kd D (6.21) (6.21) can be verified by substituting in (6.20) into the right hand side and using (2.1). Remark If xn1 , xn2 are jointly WSS, then the application of (2.1) for any D ≥ 1 shows that Rd/Dxn1 ,xn2 [u] =    Rxn1 ,xn2 [u] if d = 0 mod D; 0 if 0 < d ≤ D − 1. (6.22) Definition 6.2.6. The cyclic spectral density for period D and integer d is the z-transform, evalu- ated at ejw, of the cyclic correlation function for period D and integer d: Sd/Dxn1 ,xn2 (e jw) = ∑ u Rd/Dxn1 ,xn2 [u]e −jwu (6.23) where w ∈ [0, 2pi). Remark If xn1 , xn2 are jointly WSS, we observe from (6.22) that the cyclic spectral density for d = 0 is exactly the spectral density, that is S0xn1 ,xn2 (e jw) = Sxn1 ,xn2 (e jw) (6.24) 65 6.3 System Input/Output Relationship The following section derives an input/output relationship for the system given in Fig. 6.3. We note that Fig. 6.3 is identical to that of Fig. 6.1 except the subscript r has been dropped from the source x since in the following derivations we are assuming a fixed source xr. We assume that the source x is white noise and hence WSS and with uncorrelated time samples and the output y is the sum of N multirate filterbanks each processing a signal propagating from the source x. In the following, we drop the subscript r from our notation Remark We will show in the following section that when the input signal x is WSS, then xn1 , xn2 , the input to the n1 and n2 filterbank respectively, are jointly WSS. Furthermore, vn1,p, vn2,q, the p and q subchannel signal of the n1 and n2 filterbank respectively are also jointly WSS. However, yn1 , yn2 , the output of the n1 and n2 filterbank respectively, are jointly WSCS with period D. 6.3.1 Filterbank Input Cyclic Spectral Density Lemma 6.3.1. Assume the fixed source x is WSS with spectral density Sx,x(ejw) and is processed with the scheme depicted in Fig. 6.3. Then xn1 and xn2 are jointly WSS with cyclic spectral density given as follows: Sd/Dxn1 ,xn2 (e jw) =    Sx,x(ejw)Hn1(e jw)Hn2(ejw) if d = 0 mod D; 0 if d 6= 0 mod D. 
6.3 System Input/Output Relationship

The following section derives an input/output relationship for the system given in Fig. 6.3. We note that Fig. 6.3 is identical to Fig. 6.1 except that the subscript r has been dropped from the source x, since in the following derivations we assume a fixed source $x_r$. We assume that the source x is white noise, hence WSS with uncorrelated time samples, and the output y is the sum of N multirate filterbanks, each processing a signal propagating from the source x. In the following, we drop the subscript r from our notation.

Fig. 6.3. A single acoustic source x propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse-response (FIR) filterbank. The propagation from source x to filterbank n is modeled with transfer function $P_n$. The prefilter $I_{0,n}$ inverts the propagation from the target source, $x_0$, to filterbank n. The cascade of the propagation filter $P_n$ and target inversion filter $I_{0,n}$ is denoted by $H_n$. The outputs of the N filterbanks are summed to give a processed signal y.

Remark. We will show in the following section that when the input signal x is WSS, then $x_{n_1}$, $x_{n_2}$, the inputs to the $n_1$-th and $n_2$-th filterbanks respectively, are jointly WSS. Furthermore, $v_{n_1,p}$, $v_{n_2,q}$, the p-th and q-th subchannel signals of the $n_1$-th and $n_2$-th filterbanks respectively, are also jointly WSS. However, $y_{n_1}$, $y_{n_2}$, the outputs of the $n_1$-th and $n_2$-th filterbanks respectively, are jointly WSCS with period D.

6.3.1 Filterbank Input Cyclic Spectral Density

Lemma 6.3.1. Assume the fixed source x is WSS with spectral density $S_{x,x}(e^{jw})$ and is processed with the scheme depicted in Fig. 6.3. Then $x_{n_1}$ and $x_{n_2}$ are jointly WSS with cyclic spectral density given by

$$S^{d/D}_{x_{n_1},x_{n_2}}(e^{jw}) = \begin{cases} S_{x,x}(e^{jw})\, H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})} & \text{if } d = 0 \bmod D; \\ 0 & \text{if } d \ne 0 \bmod D \end{cases} \qquad (6.25)$$

where D is any given period, $D \ge 1$. In the special case when the input signal x is white noise, that is, x is WSS and decorrelated, with variance $\sigma_x^2$, the cyclic spectral density simplifies to

$$S^{d/D}_{x_{n_1},x_{n_2}}(e^{jw}) = \begin{cases} \sigma_x^2\, H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})} & \text{if } d = 0 \bmod D; \\ 0 & \text{if } d \ne 0 \bmod D \end{cases} \qquad (6.26)$$

where, as before, D is any given period, $D \ge 1$.

Proof. The inputs $x_{n_1}$ and $x_{n_2}$ to the $n_1$-th and $n_2$-th filterbanks, respectively, from a single WSS source x are jointly WSS, that is,

$$R_{x_{n_1},x_{n_2}}[t_1, t_2] = R_{x_{n_1},x_{n_2}}[t_1 - t_2] \qquad (6.27)$$

A simple proof of this statement follows:

$$E[x_{n_1}[t_1]\, x_{n_2}[t_2]] = E\Big[\sum_k x[t_1 - k]\, h_{n_1}[k] \sum_l x[t_2 - l]\, h_{n_2}[l]\Big] = \sum_k \sum_l h_{n_1}[k]\, h_{n_2}[l]\, E[x[t_1 - k]\, x[t_2 - l]] = \sum_k \sum_l h_{n_1}[k]\, h_{n_2}[l]\, R_{x,x}[t_1 - t_2 - k + l],$$

a function of $t_1 - t_2$ only, where the last equality holds because the source x is assumed to be WSS.

To show the desired result, we first note that (6.22) implies that $S^{d/D}_{x_{n_1},x_{n_2}}(e^{jw}) = 0$ for $d \ne 0 \bmod D$, since $x_{n_1}$ and $x_{n_2}$ are jointly WSS. Hence, we now calculate the cyclic spectral density between $x_{n_1}$ and $x_{n_2}$ for period D and d = 0:

$$S^0_{x_{n_1},x_{n_2}}(e^{jw}) = \sum_u R^0_{x_{n_1},x_{n_2}}[u]\, e^{-jwu} = \sum_u E\big[(x * h_{n_1})[u]\,(x * h_{n_2})[0]\big]\, e^{-jwu} = \sum_k \sum_l \sum_u h_{n_1}[u - k]\, h_{n_2}[-l]\, R_{x,x}[k - l]\, e^{-jw(u-k)}\, e^{-jw(k-l)}\, e^{-jwl} = H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})}\, S^0_{x,x}(e^{jw}) \qquad (6.28)$$

Hence, we have proven (6.25). If the source signal x is white noise with variance $\sigma_x^2$, then we have

$$R_{x,x}[k + u, k] = R_{x,x}[u] = \sigma_x^2\, \delta_u \qquad (6.29)$$

Using the definition of the cyclic spectral density, (6.23), together with (6.22) and the correlation of white noise, (6.29), we then have

$$S^0_{x,x}(e^{jw}) = \sum_u R^0_{x,x}[u]\, e^{-jwu} = \sum_u R_{x,x}[u]\, e^{-jwu} = \sigma_x^2 \qquad (6.30)$$

Substituting (6.30) into (6.28), we then have the desired

$$S^0_{x_{n_1},x_{n_2}}(e^{jw}) = \sigma_x^2\, H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})} \qquad (6.31)$$

which proves (6.26). ∎

6.3.2 Downsampled Subchannel Signal Spectral Density

We now derive the spectral density of the subchannel signals $v_{n_1,p}$ and $v_{n_2,q}$, where $v_{n,k}$ is the signal of the k-th subchannel of the n-th filterbank after being processed by the analysis filter $F_{n,k}$ and then downsampled by D. As before, $n = 0, 1, \cdots, N-1$ indexes the filterbanks and $k = 0, 1, \cdots, C-1$ indexes the subchannels. See Fig. 5.2.

Lemma 6.3.2. The signals $v_{n_1,p}$ and $v_{n_2,q}$ are jointly WSS with spectral density given by

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \frac{1}{D} \sum_{m=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})}) \sum_{n'=0}^{D-1} \overline{F_{n_2,q}(e^{j(w - w_{D,n'})})}\, S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) \qquad (6.32)$$

where, as in (2.8), $w_{D,d} = \frac{2\pi}{D} d$.

Remark. The statement of this lemma holds even under the weaker assumption that $x_{n_1}$, $x_{n_2}$ are jointly WSCS with period D; the following proof remains the same under the weaker assumption.

Proof. We note that this derivation parallels the derivation in Appendix I of [25].
First we show that the downsampled subchannel signals $v_{n_1,p}$, $v_{n_2,q}$ are jointly WSS, so that their spectral density exists. Using (5.1), we can derive a relationship between the correlation of the downsampled channels $v_{n_1,p}$, $v_{n_2,q}$ and the correlation of the filter input signals $x_{n_1}$, $x_{n_2}$ (see Fig. 5.2 for the setup of a single filterbank):

$$R_{v_{n_1,p},v_{n_2,q}}[m, n] = E[v_{n_1,p}[m]\, v_{n_2,q}[n]] = E\Big[\sum_k f_{n_1,p}[k]\, x_{n_1}[Dm - k] \sum_{k'} f_{n_2,q}[k']\, x_{n_2}[Dn - k']\Big] = \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k']\, R_{x_{n_1},x_{n_2}}[Dm - k, Dn - k'] = \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k']\, R_{x_{n_1},x_{n_2}}[D(m-n) - k, -k'] \qquad (6.33)$$

where we subtracted Dn from both arguments in the last equality, since $x_{n_1}$, $x_{n_2}$ are jointly WSS, (6.27), and therefore jointly WSCS with period 1, (6.18). The last line shows that $R_{v_{n_1,p},v_{n_2,q}}[m, n]$ is a function of $m - n$ only, and therefore $v_{n_1,p}$, $v_{n_2,q}$ are jointly WSS.

We now begin the derivation of (6.32) with the definition of the spectral density of $v_{n_1,p}$ and $v_{n_2,q}$, (6.16), evaluated at the argument $e^{jDw}$:

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \sum_\tau R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-jDw\tau} \qquad (6.34)$$

We now substitute (6.33) into (6.34), letting $\tau = m - n$, yielding

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \sum_\tau \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k']\, R_{x_{n_1},x_{n_2}}[D\tau - k, -k']\, e^{-jDw\tau}$$

We factor common terms out of the summations (i.e., push each summation onto only the terms that vary with its index), add and subtract $k'$ in the first argument of $R_{x_{n_1},x_{n_2}}$, and let $\tau' = D\tau + k' - k$ to get

$$= \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k'] \sum_{\tau' :\, \tau' - k' + k \,=\, 0 \bmod D} R_{x_{n_1},x_{n_2}}[-k' + \tau', -k']\, e^{-jw(\tau' - k' + k)}$$

We next insert the following identity, which can be verified using the formula for the sum of a geometric series:

$$\frac{1}{D} \sum_{m=0}^{D-1} W_D^{-m(\tau' - k' + k)} = \begin{cases} 1 & \text{if } \tau' - k' + k \text{ is a multiple of } D; \\ 0 & \text{otherwise} \end{cases}$$

where, as before, $W_D = e^{-j2\pi/D}$. By inserting this identity, the summation over $\tau' : \tau' - k' + k = 0 \bmod D$ can be replaced by a summation over all $\tau'$. Hence, we have

$$= \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k'] \sum_{\tau'} R_{x_{n_1},x_{n_2}}[-k' + \tau', -k'] \Big( \frac{1}{D} \sum_{m=0}^{D-1} W_D^{-m(\tau' - k' + k)} \Big)\, e^{-jw(\tau' - k' + k)}$$

We now use equation (6.21) to expand $R_{x_{n_1},x_{n_2}}[-k' + \tau', -k']$ into its Fourier series representation, rearrange terms, and use the definition of the discrete-time Fourier transform to get

$$= \frac{1}{D} \sum_{m=0}^{D-1} \sum_{n=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})})\, \overline{F_{n_2,q}(e^{j(w - w_{D,m+n})})}\, S^{n/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})})$$

where, as in (2.8), $w_{D,d} = \frac{2\pi}{D} d$. We again factor common terms out of the summation and let $n' = m + n$ to get

$$= \frac{1}{D} \sum_{m=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})}) \sum_{n'=m}^{m+D-1} \overline{F_{n_2,q}(e^{j(w - w_{D,n'})})}\, S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})})$$

For fixed m with $0 \le m \le D-1$, the inner sum over $n'$ has $(m + D - 1) - D + 1 = m$ terms whose index $n'$ is greater than or equal to D. Each of these "overflow" indices can be shifted by −D to make it less than m, because of the following two relationships, each of which can be verified by direct substitution:

$$S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) = S^{(n'-D-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})})$$
$$F_{n_2,q}(e^{j(w - w_{D,n'})}) = F_{n_2,q}(e^{j(w - w_{D,n'-D})})$$

After shifting the "overflow" indices, the summation over $n'$ runs from 0 to D − 1 instead of from m to m + D − 1.
We then have the desired result:

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \frac{1}{D} \sum_{m=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})}) \sum_{n'=0}^{D-1} \overline{F_{n_2,q}(e^{j(w - w_{D,n'})})}\, S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) \qquad \blacksquare$$

6.3.3 Filterbank Input and Downsampled Subchannel Signal Matrix Relationship

We first define the cyclic spectral density matrix of the filterbank input signals $x_{n_1}$ and $x_{n_2}$, of size D × D, as

$$\big(S_{x_{n_1},x_{n_2},D\times D}(e^{jw})\big)_{m,n} = S^{(n-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) \qquad (6.35)$$

where $m, n = 0, \ldots, D-1$ and $w_{D,m} = \frac{2\pi}{D} m$, as in (2.8). Note that when $x_{n_1}$, $x_{n_2}$ are jointly WSS, $S_{x_{n_1},x_{n_2},D\times D}(e^{jw})$ is a diagonal matrix. Then (6.25) implies that we can decompose the matrix $S_{x_{n_1},x_{n_2},D\times D}(e^{jw})$ as

$$S_{x_{n_1},x_{n_2},D\times D}(e^{jw}) = H_{n_1,D\times D}(e^{jw})\, S_{x,x,D\times D}(e^{jw})\, H^*_{n_2,D\times D}(e^{jw}) \qquad (6.36)$$

where

$$H_{n_k,D\times D}(e^{jw}) = \mathrm{diag}\big(H_{n_k}(e^{jw}),\ H_{n_k}(e^{j(w - w_{D,1})}),\ \ldots,\ H_{n_k}(e^{j(w - w_{D,D-1})})\big) \qquad (6.37)$$

for $k = 0, 1, \ldots, N-1$, and

$$S_{x,x,D\times D}(e^{jw}) = \mathrm{diag}\big(S_{x,x}(e^{jw}),\ S_{x,x}(e^{j(w - w_{D,1})}),\ \ldots,\ S_{x,x}(e^{j(w - w_{D,D-1})})\big) \qquad (6.38)$$

If x is white noise with variance $\sigma_x^2$, then (6.36) simplifies to

$$S_{x_{n_1},x_{n_2},D\times D}(e^{jw}) = \sigma_x^2\, H_{n_1,D\times D}(e^{jw})\, H^*_{n_2,D\times D}(e^{jw}) \qquad (6.39)$$

We observe that the scalar spectral density of the downsampled subchannel signals, (6.32), can then be written as a product of a row vector, a matrix, and a column vector, that is,

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \frac{1}{D}\, F_{n_1,p,1\times D}(e^{jw})\, S_{x_{n_1},x_{n_2},D\times D}(e^{jw})\, F^*_{n_2,q,1\times D}(e^{jw}) \qquad (6.40)$$

where

$$F_{n,r,1\times D}(e^{jw}) = [F_{n,r}(e^{jw}),\ F_{n,r}(e^{j(w - w_{D,1})}),\ \ldots,\ F_{n,r}(e^{j(w - w_{D,D-1})})] \qquad (6.41)$$

Notice that in (6.32) the subchannel indices p and q are both fixed. Each of the two filterbanks $n_1$, $n_2$ has C subchannels, and therefore both p and q can vary between 0 and C − 1, that is, $0 \le p, q \le C-1$. To capture all C × C spectral density relationships between the different downsampled subchannels of filterbanks $n_1$ and $n_2$, we define the spectral density matrix of downsampled subchannel signals, size C × C, as

$$\big(S_{v_{n_1},v_{n_2},C\times C}(e^{jDw})\big)_{p,q} = S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) \qquad (6.42)$$

Then, using (6.40), the modulation matrix of the analysis filters, (5.12), and (6.36), we have

$$S_{v_{n_1},v_{n_2},C\times C}(e^{jDw}) = \frac{1}{D}\, F_{n_1,C\times D}(e^{jw})\, S_{x_{n_1},x_{n_2},D\times D}(e^{jw})\, F^*_{n_2,C\times D}(e^{jw}) = \frac{1}{D}\, F_{n_1,C\times D}(e^{jw})\, H_{n_1,D\times D}(e^{jw})\, S_{x,x,D\times D}(e^{jw})\, H^*_{n_2,D\times D}(e^{jw})\, F^*_{n_2,C\times D}(e^{jw}) \qquad (6.43)$$
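The matrix relationship (6.43) is easy to assemble numerically for white-noise input, where (6.39) gives $S_{x,x,D\times D} = \sigma_x^2 I$; in the sketch below the filters are random placeholders and the helper names are ours.

```python
# Assemble the C x C subchannel spectral-density matrix of (6.43) at one w.
import numpy as np

rng = np.random.default_rng(6)
N, C, D, taps, sigma2 = 2, 3, 3, 4, 1.0
F = rng.standard_normal((N, C, taps))     # analysis filters
H = rng.standard_normal((N, taps))        # cascades H_n

tf = lambda h, z: sum(h[k] * z ** (-k) for k in range(len(h)))
W = np.exp(-2j * np.pi / D)

def S_v(n1, n2, w):
    z = np.exp(1j * w)
    Fm = lambda n: np.array([[tf(F[n, l], z * W**d) for d in range(D)]
                             for l in range(C)])               # (5.12), C x D
    Hm = lambda n: np.diag([tf(H[n], z * W**d) for d in range(D)])  # (6.37)
    Q1, Q2 = Fm(n1) @ Hm(n1), Fm(n2) @ Hm(n2)
    return (sigma2 / D) * Q1 @ Q2.conj().T    # (6.43) with S_x = sigma^2 I

print(S_v(0, 1, 0.4).shape)   # (3, 3)
```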
6.3.4 Filterbank Output Cyclic Correlation

We now derive the cyclic correlation between two filterbanks' output signals $y_{n_1}$ and $y_{n_2}$ in terms of the correlation between downsampled subchannel signals, where $y_n$ is the output of the $n$-th filterbank and $v_{n,l}$ is the $l$-th subchannel signal of the $n$-th filterbank after downsampling.

Lemma 6.3.3. For any integer $u$ and $0 \le d \le D-1$,

$$
R^{d/D}_{y_{n_1},y_{n_2}}[u] = \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_r \sum_{\tau} g_{n_1,p}[r+u-D\tau]\, g_{n_2,q}[r]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} r} \quad (6.44)
$$

Proof. First, we derive an expression for the correlation between two filterbanks' outputs in terms of their downsampled subchannel signals, where $y_n$ is the multirate output of the $n$-th filterbank, $n = 0, 1, \dots, N-1$, and $v_{n,l}$ is the $l$-th subchannel signal after downsampling of the $n$-th filterbank, $l = 0, 1, \dots, C-1$. We use (6.13), (5.8), and (6.33) to get

$$
R_{y_{n_1},y_{n_2}}[n,k] = E[y_{n_1}[n]\, y_{n_2}[k]]
= E\Big[\sum_{p=0}^{C-1} \sum_{m'} g_{n_1,p}[n-Dm']\, v_{n_1,p}[m'] \sum_{q=0}^{C-1} \sum_m g_{n_2,q}[k-Dm]\, v_{n_2,q}[m]\Big]
$$
$$
= \sum_{p,q=0}^{C-1} \sum_{m'} \sum_m g_{n_1,p}[n-Dm']\, g_{n_2,q}[k-Dm]\, R_{v_{n_1,p},v_{n_2,q}}[m'-m] \quad (6.45)
$$

Using two changes of variables ($m'' = m'-1$, $m''' = m-1$) in (6.45), we can verify that the correlation of the processed filterbank outputs is cyclostationary with period $D$, that is,

$$
R_{y_{n_1},y_{n_2}}[n+D,\, k+D] = R_{y_{n_1},y_{n_2}}[n,k] \quad (6.46)
$$

If we let $n = k+u$ and $\tau = m'-m$ in (6.45), we have

$$
R_{y_{n_1},y_{n_2}}[k+u,\, k] = \sum_{p,q=0}^{C-1} \sum_{\tau} \sum_m g_{n_1,p}[k+u-D(\tau+m)]\, g_{n_2,q}[k-Dm]\, R_{v_{n_1,p},v_{n_2,q}}[\tau] \quad (6.47)
$$

We substitute (6.47) into the definition of the cyclic correlation function, (6.20), rearrange terms, and multiply by $e^{j2\pi \frac{d}{D} Dm} = 1$ for integers $d$ and $m$ to get

$$
R^{d/D}_{y_{n_1},y_{n_2}}[u] = \frac{1}{D} \sum_{k=0}^{D-1} \sum_{p,q=0}^{C-1} \sum_{\tau} \sum_m g_{n_1,p}[k+u-D(\tau+m)]\, g_{n_2,q}[k-Dm]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} k}
$$
$$
= \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_{\tau} R_{v_{n_1,p},v_{n_2,q}}[\tau] \Big( \sum_{k=0}^{D-1} \sum_m g_{n_1,p}[k-Dm+u-D\tau]\, g_{n_2,q}[k-Dm]\, e^{-j2\pi \frac{d}{D}(k-Dm)} \Big)
$$

We observe that $\sum_{k=0}^{D-1} \sum_m f[k-Dm] = \sum_r f[r]$, and hence we let $r = k-Dm$ to combine the summations over $k$ and $m$ into a single summation over $r$, which gives the desired cyclic cross correlation:

$$
R^{d/D}_{y_{n_1},y_{n_2}}[u] = \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_r \sum_{\tau} g_{n_1,p}[r+u-D\tau]\, g_{n_2,q}[r]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} r}
$$

6.3.5 Filterbank Output Cyclic Spectral Density

We now derive the filterbank output cyclic spectral density in terms of the spectral density of downsampled subchannel signals. We begin with the definition of the cyclic spectral density, (6.23), substitute in (6.44), factor out common terms in the summations, multiply by $e^{jw(r-r+D\tau-D\tau)} = 1$, and use the definition of the discrete-time Fourier transform to get

$$
S^{d/D}_{y_{n_1},y_{n_2}}(e^{jw}) = \sum_u R^{d/D}_{y_{n_1},y_{n_2}}[u]\, e^{-jwu}
= \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_u \sum_r \sum_{\tau} g_{n_1,p}[r+u-D\tau]\, g_{n_2,q}[r]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} r}\, e^{-jwu}
$$
$$
= \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_r g_{n_2,q}[r]\, e^{j(w - 2\pi \frac{d}{D}) r} \sum_{\tau} R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-jwD\tau} \sum_u g_{n_1,p}[r+u-D\tau]\, e^{-jw(r+u-D\tau)}
$$
$$
= \frac{1}{D} \sum_{p,q=0}^{C-1} \overline{G_{n_2,q}}(e^{j(w-w_{D,d})})\, S_{v_{n_1,p},v_{n_2,q}}(e^{jDw})\, G_{n_1,p}(e^{jw}) \quad (6.48)
$$

where, as before in (2.8), $w_{D,d} = 2\pi \frac{d}{D}$.

6.3.6 Downsampled Subchannel Signal and Filterbank Output Matrix Relationship

We observe that (6.48) can be written as the product of a row vector, a matrix, and a column vector:

$$
S^{d/D}_{y_{n_1},y_{n_2}}(e^{jw}) = \frac{1}{D}\, \mathbf{G}_{n_1,0,D,1\times C}(e^{jw})\, S_{v_{n_1},v_{n_2},C\times C}(e^{jDw})\, \mathbf{G}^*_{n_2,d,D,1\times C}(e^{jw}) \quad (6.49)
$$

where

$$
\mathbf{G}_{n,d,D,1\times C}(e^{jw}) = \big[G_{n,0}(e^{j(w-w_{D,d})}),\, G_{n,1}(e^{j(w-w_{D,d})}),\, \dots,\, G_{n,C-1}(e^{j(w-w_{D,d})})\big]
$$

We define the cyclic spectral density matrix of output signals $y_{n_1}, y_{n_2}$, of size $D \times D$, as

$$
\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)_{m,n} = S^{(n-m)/D}_{y_{n_1},y_{n_2}}(e^{j(w-w_{D,m})}) \quad (6.50)
$$

where, as before in (2.8), $w_{D,m} = 2\pi \frac{m}{D}$. Then, using (6.49) and the modulation matrix of the synthesis filters, (5.14), we have

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{1}{D}\, \mathbf{G}_{n_1,D\times C}(e^{jw})\, S_{v_{n_1},v_{n_2},C\times C}(e^{jDw})\, \mathbf{G}^*_{n_2,D\times C}(e^{jw}) \quad (6.51)
$$
6.3.7 Matrix Relationship Between Filterbank Inputs and Filterbank Outputs

Combining the previous matrix relationships (6.36), (6.43), and (6.51) gives the following statistical input/output relationship between two different filterbanks $n_1, n_2$:

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{1}{D^2} \big( \mathbf{G}_{n_1,D\times C}(e^{jw})\, \mathbf{F}_{n_1,C\times D}(e^{jw})\, H_{n_1,D\times D}(e^{jw}) \big)\, S_{x,x,D\times D}(e^{jw})\, \big( H^*_{n_2,D\times D}(e^{jw})\, \mathbf{F}^*_{n_2,C\times D}(e^{jw})\, \mathbf{G}^*_{n_2,D\times C}(e^{jw}) \big) \quad (6.52)
$$

To shorten notation, we define

$$
Q_{n,D\times D}(e^{jw}) = \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H_{n,D\times D}(e^{jw}) \quad (6.53)
$$

where the synthesis modulation matrix $\mathbf{G}_{n,D\times C}(e^{jw})$ is given by (5.14), the analysis modulation matrix $\mathbf{F}_{n,C\times D}(e^{jw})$ is given by (5.12), and the propagation and target inversion cascade matrix $H_{n,D\times D}(e^{jw})$, a diagonal matrix, is given by (6.37). Hence, we can write (6.52) simply as

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{1}{D^2}\, Q_{n_1,D\times D}(e^{jw})\, S_{x,x,D\times D}(e^{jw})\, Q^*_{n_2,D\times D}(e^{jw}) \quad (6.54)
$$

In the case when $x$ is white noise with variance $\sigma_x^2$, (6.54) simplifies to

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{\sigma_x^2}{D^2}\, Q_{n_1,D\times D}(e^{jw})\, Q^*_{n_2,D\times D}(e^{jw}) \quad (6.55)
$$
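As a concrete illustration of (6.53) and (6.55), the sketch below assembles $Q_{n,D\times D}$ for one filterbank, again under the modulation-matrix conventions of (5.12) and (5.14); the synthesis and cascade taps are made-up values, not taken from the dissertation's experiments:

```python
import numpy as np

def dtft(h, w):
    k = np.arange(len(h))
    return complex(np.exp(-1j * w * k) @ h)

def Q_matrix(g_filters, f_filters, h, w, D):
    """Q of (6.53): synthesis modulation (5.14) x analysis modulation (5.12)
    x diagonal cascade matrix (6.37)."""
    G = np.array([[dtft(g, w - 2*np.pi*d/D) for g in g_filters]
                  for d in range(D)])                       # D x C
    F = np.array([[dtft(f, w - 2*np.pi*d/D) for d in range(D)]
                  for f in f_filters])                      # C x D
    H = np.diag([dtft(h, w - 2*np.pi*m/D) for m in range(D)])
    return G @ F @ H

# Toy values: Haar analysis filters, hypothetical synthesis/cascade taps.
D, w, sigma2 = 2, 0.4, 1.0
haar = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]
g = [np.array([0.9, 0.1]), np.array([0.2, -0.4])]
h = np.array([1.0, 0.5])
Q = Q_matrix(g, haar, h, w, D)
S_yy = sigma2 / D**2 * Q @ Q.conj().T    # (6.55) with n1 = n2
print(np.round(S_yy, 4))
```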
6.4 Time Averaged Energy of Output

We now show that the time-averaged energy of the system output $y$ (the sum of the processing from $N$ filterbanks) due to a single source $x$ is related to the trace of the cyclic spectral density matrix of filterbank outputs $S_{y_{n_1},y_{n_2},D\times D}$. We measure the time-averaged energy of the output, $\sigma_y^2$, as

$$
\sigma_y^2 = \lim_{\substack{T_1\to\infty \\ T_2\to\infty}} \frac{1}{T_1+T_2+1} \sum_{t=-T_1}^{T_2} E[y[t]\, y[t]] \quad (6.56)
$$

The following theorem is used to measure the gain of a single WSS source processed by a plurality of multirate filterbanks.

Theorem 6.4.1. Assume the source $x$ is white noise with variance $\sigma_x^2$. Then the output $y$ after processing through the system given by Fig. 6.1 has time-averaged energy

$$
\sigma_y^2 = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 dw \quad (6.57)
$$

where $Q_{n,D\times D}(e^{jw}) = \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H_{n,D\times D}(e^{jw})$ as defined in (6.53).

To simplify the notation of (6.57), we define the 2-norm of a $D\times D$ matrix-valued function $Q_{D\times D}(e^{jw})$ as

$$
\|Q_{D\times D}(e^{jw})\|_2^2 \equiv \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \big| \big( Q_{D\times D}(e^{jw}) \big)_{d_1,d_2} \big|^2 dw \quad (6.58)
$$

With this definition, (6.57) becomes

$$
\sigma_y^2 = \frac{\sigma_x^2}{2\pi D^2} \Big\| \sum_{n=0}^{N-1} Q_{n,D\times D} \Big\|_2^2 \quad (6.59)
$$

Proof. In Fig. 6.1, for a given interference $x$, the output $y$ is the sum of the $N$ outputs of the microphones' filterbanks, that is,

$$
y[t] = \sum_{n=0}^{N-1} y_n[t] \quad (6.60)
$$

where $y_n$ is the output of the $n$-th microphone's filterbank. Using (6.60) and (6.46), we can verify that $y$ is WSCS with period $D$. Hence, we can measure the energy of the output over just one period $D$, as the following calculation shows:

$$
\sigma_y^2 = \lim_{\substack{T_1\to\infty\\ T_2\to\infty}} \frac{1}{T_1+T_2+1} \sum_{t=-T_1}^{T_2} E[y[t]\,y[t]]
= \lim_{N_p\to\infty} \frac{1}{D(2N_p+1)} \sum_{p=-N_p}^{N_p} \sum_{l=0}^{D-1} E[y[l+pD]\, y[l+pD]]
$$
$$
= \frac{1}{D} \sum_{l=0}^{D-1} \lim_{N_p\to\infty} \frac{1}{2N_p+1} \sum_{p=-N_p}^{N_p} E[y[l+pD]\, y[l+pD]]
= \frac{1}{D} \sum_{l=0}^{D-1} \lim_{N_p\to\infty} \frac{2N_p+1}{2N_p+1}\, E[y[l]\,y[l]]
$$
$$
= \frac{1}{D} \sum_{l=0}^{D-1} E[y[l]\,y[l]] = \frac{1}{D} \sum_{l=0}^{D-1} R_y[l,l] \quad (6.61)
$$

Expanding the correlation $R_y[l,l]$ shows that we are interested in the correlations of each of the $N$ filterbanks' outputs:

$$
R_y[l,l] = E[y[l]\,y[l]] = E\Big[\sum_{n_1=0}^{N-1} y_{n_1}[l] \sum_{n_2=0}^{N-1} y_{n_2}[l]\Big] = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} R_{y_{n_1},y_{n_2}}[l,l] \quad (6.62)
$$

Substituting (6.62) into (6.61) and changing the order of the finite summations, we have

$$
\sigma_y^2 = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \frac{1}{D} \sum_{l=0}^{D-1} R_{y_{n_1},y_{n_2}}[l,l] \quad (6.63)
$$

We observe that the weighted inner sum is simply the time-averaged energy of the cross correlation between filterbank outputs $y_{n_1}, y_{n_2}$ (see (6.61)), that is,

$$
\sigma_y^2 = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \sigma^2_{y_{n_1},y_{n_2}} \quad (6.64)
$$

The following derivation relates $\sigma^2_{y_{n_1},y_{n_2}}$ to the trace of the cyclic spectral density matrix $S_{y_{n_1},y_{n_2}}$:

$$
\sigma^2_{y_{n_1},y_{n_2}} = \frac{1}{D} \sum_{l=0}^{D-1} R_{y_{n_1},y_{n_2}}[l,l]
= \frac{1}{D} \sum_{l=0}^{D-1} \sum_u R_{y_{n_1},y_{n_2}}[l+u,l] \cdot \underbrace{\frac{1}{2\pi} \int_{-2\pi+\pi/D}^{\pi/D} e^{-jwu}\, dw}_{1 \text{ if } u=0 \text{ and } 0 \text{ otherwise}}
$$
$$
= \frac{1}{2\pi} \int_{-2\pi+\pi/D}^{\pi/D} \sum_u \underbrace{\frac{1}{D} \sum_{l=0}^{D-1} R_{y_{n_1},y_{n_2}}[l+u,l]}_{R^0_{y_{n_1},y_{n_2}}[u]}\, e^{-jwu}\, dw
= \frac{1}{2\pi} \int_{-2\pi+\pi/D}^{\pi/D} \underbrace{\sum_u R^0_{y_{n_1},y_{n_2}}[u]\, e^{-jwu}}_{S^0_{y_{n_1},y_{n_2}}(e^{jw})}\, dw
$$
$$
= \frac{1}{2\pi} \sum_{r=0}^{D-1} \int_{-\pi/D}^{\pi/D} S^0_{y_{n_1},y_{n_2}}(e^{j(w-w_{D,r})})\, dw
= \frac{1}{2\pi} \int_{-\pi/D}^{\pi/D} \sum_{r=0}^{D-1} S^0_{y_{n_1},y_{n_2}}(e^{j(w-w_{D,r})})\, dw
= \frac{1}{2\pi} \int_{-\pi/D}^{\pi/D} \mathrm{Tr}\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)\, dw \quad (6.65)
$$

The third and fourth lines follow from the definitions of the cyclic correlation, (6.20), and the cyclic spectral density, (6.23), respectively. The fifth line breaks the integration into $D$ intervals, each of length $\frac{2\pi}{D}$, and the last line follows from the definition of $S_{y_{n_1},y_{n_2},D\times D}(e^{jw})$, (6.50). Substituting (6.65) into (6.64) yields

$$
\sigma_y^2 = \frac{1}{2\pi} \int_{-\pi/D}^{\pi/D} \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \mathrm{Tr}\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)\, dw \quad (6.66)
$$

Using (6.55), the definition of the Hilbert-Schmidt inner product, (2.10), and the definition of the Hilbert-Schmidt norm, (2.9), we can simplify the integrand of (6.66) as follows:

$$
\sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \mathrm{Tr}\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)
= \frac{\sigma_x^2}{D^2} \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \mathrm{Tr}\big(Q_{n_1,D\times D}(e^{jw})\, Q^*_{n_2,D\times D}(e^{jw})\big)
$$
$$
= \frac{\sigma_x^2}{D^2} \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \big\langle Q_{n_1,D\times D}(e^{jw}),\, Q_{n_2,D\times D}(e^{jw}) \big\rangle_{HS}
= \frac{\sigma_x^2}{D^2} \Big\langle \sum_{n_1=0}^{N-1} Q_{n_1,D\times D}(e^{jw}),\, \sum_{n_2=0}^{N-1} Q_{n_2,D\times D}(e^{jw}) \Big\rangle_{HS}
$$
$$
= \frac{\sigma_x^2}{D^2} \Big\| \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big\|_{HS}^2
= \frac{\sigma_x^2}{D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 \quad (6.67)
$$

Substituting (6.67) into (6.66) yields the desired result:

$$
\sigma_y^2 = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 dw
$$

Chapter 7: Optimization Over Synthesis Filters Using a Sparse Number of Subchannels

7.1 Overview

Our overall objective is to minimize the p-norm of the interference source gains while both reconstructing the target source perfectly and using a sparse number of subchannels. In our problem setup there are two types of sources: $I$ interferences, whose gains we wish to attenuate, and a single target, whose gain through the system processing we wish to be exactly 1. In other words, our system will process the target source with no distortion. Specifically, our desired optimization problem reads

$$
\min_{\mathbf{G}_{n,D\times C}(e^{jw})} \Big\| \Big\{ \Big\| \sum_{n=0}^{N-1} \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H^{(r)}_{n,D\times D}(e^{jw}) \Big\|_2^2 \Big\}_{1\le r\le L} \Big\|_p
$$
subject to
$$
\sum_{n=0}^{N-1} \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H^{(0)}_{n,D\times D}(e^{jw}) = D e^{jw}\, \mathrm{diag}\big[(e^{-2\pi j d/D})_{d=0}^{D-1}\big]
$$
$$
\sum_{n=0}^{N-1} \Big\| \Big\{ \max_{-\pi\le w\le\pi} \big| \big( \mathbf{G}_{n,D\times C}(e^{jw}) \big)_{0,l} \big| \Big\}_{0\le l\le C-1} \Big\|_0 \le V_S \quad (7.1)
$$

where $H^{(r)}_{n,D\times D}(e^{jw})$ refers to the product of two frequency-domain objects, the propagation of source $r$ to microphone $n$ and the target inversion filter specific to microphone $n$; $D$ is the decimation factor; $C$ is the number of subchannels per filterbank; and $V_S$ is the desired number of active subchannels. Unfortunately, this is not a convex optimization problem.

We assume our set of $N \cdot C$ analysis filters, denoted by $F$, is fixed and known, where $N$ is the number of microphones and $C$ is the number of subchannels for each microphone, that is,

$$
F = \big( F_{n,l}(e^{jw}) \big)_{\substack{0\le n\le N-1,\ 0\le l\le C-1,\ w\in[0,2\pi)}} \quad (7.2)
$$

is a known quantity. We further assume that the locations of the sources and microphones are known, and hence the cascade of the source propagation and target inversion filters, denoted by $H$, is also known. There are $I+1$ sources ($I$ interferences and a single target) and $N$ microphones, and hence $H$ consists of $(I+1)\cdot N$ transfer functions.
We define $H$ as

$$
H = \big( H_{r,n}(e^{jw}) \big)_{\substack{0\le r\le I,\ 0\le n\le N-1,\ w\in[0,2\pi)}} \quad (7.3)
$$

Our optimization is then over the $N \cdot C$ synthesis filters, denoted by $G$, defined as

$$
G = \big( G_{n,l}(e^{jw}) \big)_{\substack{0\le n\le N-1,\ 0\le l\le C-1,\ w\in[0,2\pi)}} \quad (7.4)
$$

To make finding the unknown $G$ computationally tractable, we discretize the continuous variable $w$, $0 \le w < 2\pi$, with $N_f$ equally spaced points, that is, we let

$$
w = \frac{2\pi f}{N_f} \quad \text{with } f = 0, 1, \dots, N_f - 1 \quad (7.5)
$$

To simplify our computations, we also assume that $N_f$, the number of discretized frequencies, is an even multiple of $D$, the downsampling factor, that is,

$$
N_f = M \cdot D \quad (7.6)
$$

with $M$ a positive even integer. In addition, to vectorize our computations, we combine the indices $n$ and $l$ into a single index $s$ (for subchannel) by letting

$$
s = l + nC \quad \text{and} \quad S = N \cdot C \quad (7.7)
$$

Hence the discretized version of the unknown synthesis filters $G$, denoted $G_{\text{compute}}$, consists of $S \cdot N_f$ unknown complex numbers, that is,

$$
G_{\text{compute}} = \big( G_{n,l}(e^{j 2\pi f/N_f}) \big)_{\substack{0\le s\le S-1,\ 0\le f\le N_f-1}} \quad (7.8)
$$

By penalizing non-zero subchannel synthesis tap coefficients in our optimization problem with an absolute-value penalty term, in the spirit of the LASSO algorithm given in [1], we force certain subchannels to be inactive and thereby create a sparse set of active subchannels. A subchannel is considered inactive if its synthesis tap coefficients are zero or close to zero. We measure gain using the time-averaged energy given by (6.57). Notice that (6.57) assumes a fixed source $x$, and hence we will need to compute (6.57) for each of our $I+1$ sources $x_r$. (These indexing and discretization conventions are illustrated in the short sketch below.)
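The conventions above translate directly into code. In the following sketch the sizes happen to match the experiment of Chapter 8 ($N = 32$ microphones, $C = 2$ subchannels, $D = 2$, $M = 8$), though the helper names are my own:

```python
import numpy as np

# Discretization and indexing conventions, in code:
#   frequencies w_f = 2*pi*f/Nf, f = 0..Nf-1, with Nf = M*D (M even), (7.5)-(7.6)
#   flattened subchannel index s = l + n*C, s = 0..S-1, S = N*C, (7.7)
N, C, D, M = 32, 2, 2, 8
S, Nf = N * C, M * D
w_grid = 2 * np.pi * np.arange(Nf) / Nf

def s_index(n, l):
    """Map (microphone n, subchannel l) to the flat index s of (7.7)."""
    return l + n * C

# G_compute of (7.8): one unknown complex response per (s, f) pair.
G_compute = np.zeros((S, Nf), dtype=complex)
print(G_compute.shape, s_index(n=3, l=1))   # (64, 16) and s = 7
```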
In this chapter, we first review prior work before detailing our problem solution. Next, we discuss the objective function, which consists of two terms: the p-norm of the interference source gains and the sparse subchannel penalty term. Afterwards, we discuss the target source perfect reconstruction constraint, and we combine these results in the final section to give the full optimization problem.

7.2 Prior Work

Our work optimizes both the placement of microphones and the design of the multirate filterbanks processing the microphones' inputs simultaneously. To the best of our knowledge, this is a novel contribution. There has been extensive work, however, in both sensor placement and the optimization of multirate filterbanks.

7.2.1 Sensor Placement

There has been extensive work on the sensor placement problem using a variety of strategies. In [26], the authors use simulated annealing to simultaneously optimize both the weights and the sensor locations of a linear array. In [27], the authors address a parameter estimation problem that includes sensor location using convex optimization; they relax the binary variable of a sensor being off (0) or on (1) by letting the variable instead lie in the range [0, 1]. In [28], the authors relax the same binary variable by converting the unknown vector to a matrix of 0s and 1s belonging to the class of Stiefel matrices; the relaxation is to a 1-d sphere, and they extend to multiple dimensions using a greedy algorithm. These authors are interested in optimizing an objective criterion involving the Kullback-Leibler divergence. [29] minimizes the sidelobes produced by a sparse linear array using a greedy deletion algorithm. [30] maximizes the Fisher information in its optimization criterion; when formulating an optimization problem to choose sensors, the authors convert a matrix equality to a matrix inequality in their convex relaxation. [31] addresses the problem of recovering $K$ target sources emanating from $N$ possible locations using $N$ sensors, where $K \ll N$; the authors convert the problem to one of selecting rows that minimize the correlation coefficient and propose a genetic algorithm that is data-dependent.

[32] addresses the following problem: given a desired beampattern response and error tolerance, what is the sparsest sensor setup that will achieve the desired beampattern within the given error tolerance? The authors address this problem using reweighted $l_1$ minimization, assuming a regularly spaced rectangular array, symmetric subarrays, and weights with conjugate symmetry. In [22], the authors address the same problem using a regularization approach inspired by Bayesian statistics. [33] also addresses the same problem; the authors constrain microphone distances, guess a configuration, and then perturb the configuration seeking a better solution. [34] addresses the same problem for a beampattern produced by a transmission array and proposes a greedy addition algorithm. In contrast, our work addresses the following problem: given a fixed number of sensors, what is the best possible beampattern achievable?

7.2.2 Filterbank Optimization

There has also been extensive work on the optimization of filterbanks. [35], published in 1980, was an early forerunner in the field. The author optimizes a quadrature mirror filterbank to meet a user-given frequency response criterion, minimizing both the ripple energy and the out-of-band energy using a search algorithm whose success is highly dependent on both the starting point and the step size. Similar to our proposed problem setup, but for a single filterbank, [36] fixes the analysis filters and optimizes the synthesis filters to achieve the best possible reconstruction given a user-specified integer time delay; they convert their problem to an $H_\infty$ problem to take advantage of existing software. In [37], the authors design a multi-dimensional perfect reconstruction filterbank where both the analysis and synthesis filters are FIR filters of equal length; they directly embed this non-linear and non-convex constraint into their optimization problem, whose objective function measures the difference between a desired analysis filterbank and the optimized analysis filterbank. The author of [38] designs perfect reconstruction and near perfect reconstruction filterbanks using a quadratically constrained least-squares optimization problem, solving a relaxed problem by converting the quadratic constraints to linear constraints. The authors of [39] design cosine-modulated filterbanks with a linear phase prototype filter using convex optimization by converting a non-convex set to a convex set. More recently, Davidson in [40] provides a novice-friendly tutorial on convex optimization techniques for designing FIR filters. Finally, [25] optimizes a single FIR filterbank to minimize the time-averaged mean-squared error when the high-frequency subband is dropped. We build off their techniques to derive an input/output relationship for our system consisting of a plurality of filterbanks.

7.3 Objective Function

Our objective function is the sum of two terms: one measuring the p-norm of the gains of the $I$ interferences, $J_{I,p}$, and one penalizing active subchannels, $J_S$, that is, those subchannels with non-zero synthesis filter responses. Typical p-norms of interest are $p = 1, 2, \infty$.
We adjust the severity of the active subchannel penalty by choosing a non-negative constant $\lambda$ to weight the active subchannel term $J_S$. The larger the chosen $\lambda$, the more severe the active subchannel penalty and the fewer active subchannels our optimization problem will recover. Conversely, the smaller the chosen $\lambda$, the less severe the penalty and the more active subchannels the problem will recover. By choosing $\lambda = 0$, we eliminate the active subchannel penalty altogether and allow the use of all $N \cdot C$ subchannels.

Our optimization is over the $N \cdot C$ synthesis filter responses, where $N$ is the number of microphones and $C$ is the number of subchannels for each microphone. We denote this set of filter responses by $G$, defined in (7.4). Hence our objective function $J$ can be written as

$$
J(G) = J_{I,p}(G) + \lambda \cdot J_S(G) \quad (7.9)
$$

As stated above, $G$ is a function of the continuous variable $w$; to make $J(G)$ computationally tractable, we discretize $G$ using $G_{\text{compute}}$, defined in (7.8), and update both $J_{I,p}$ and $J_S$ accordingly. Hence, what we actually compute for our objective function $J(G)$ is the approximation $J_{\text{compute}}(G_{\text{compute}})$:

$$
J(G) \approx J_{\text{compute}}(G_{\text{compute}}) = J_{I,p,\text{compute}}(G_{\text{compute}}) + \lambda \cdot J_{S,\text{compute}}(G_{\text{compute}}) \quad (7.10)
$$

7.3.1 p-norm of Interference Gains Term

We let the p-norm of the interference gains be the p-norm of the interference sources' time-averaged energies, that is,

$$
J_{I,p}(G) = \big\| (\sigma^2_{y_r})_r \big\|_p \quad (7.11)
$$

where the time-averaged energy $\sigma^2_{y_r}$ is given by (6.57). We observe that (6.57) is a function of $Q_{n,D\times D}$, which varies with $n$, the microphone index. The definition of $Q_{n,D\times D}$, (6.53), further shows that for fixed $n$, $Q_{n,D\times D}$ varies with $H_{n,D\times D}$, the cascade of propagation and target inversion filter responses; $\mathbf{F}_{n,C\times D}$, the analysis filter responses; and $\mathbf{G}_{n,D\times C}$, the synthesis filter responses. As stated above, we assume that both $H$ and $F$, the sets of target inversion cascades and analysis filter responses for all $n$, respectively, are fixed and known, and hence in our setup the time-averaged energy varies only with the synthesis filter responses for each $n$, that is, with $G$.

For the case $p = \infty$, $J_{I,p}(G)$ becomes

$$
J_{I,\infty}(G) = \max_r \sigma^2_{y_r} \quad (7.12)
$$
7.3.2 Discretization of $\sigma^2_{y_r}$

In the following derivations, we discretize $\sigma^2_{y_r}$ over the frequency $w$ with a finite set of uniformly spaced points to give $\sigma^2_{y_r,\text{compute}}$, a computationally tractable term. We further assume all our sources have equal variance $\sigma_x^2$ to simplify computations, giving $\hat\sigma^2_{y_r,\text{compute}}$.

To begin, we use (6.57) to expand $\sigma^2_{y_r}$ and observe that the $(d_1,d_2)$ entry of the sum of the $N$ matrices $Q_{n,D\times D}$ equals the sum of the $(d_1,d_2)$ entries of those same $N$ matrices:

$$
\sigma^2_{y_r} = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 dw
= \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \sum_{n=0}^{N-1} \big( Q_{n,D\times D}(e^{jw}) \big)_{d_1,d_2} \Big|^2 dw \quad (7.13)
$$

Next, observing that $H_{n,D\times D}(e^{jw})$ is a diagonal matrix in the definition of $Q_{n,D\times D}(e^{jw})$, (6.53), we can write the scalar expression $\big( Q_{n,D\times D}(e^{jw}) \big)_{d_1,d_2}$ as

$$
\big( Q_{n,D\times D}(e^{jw}) \big)_{d_1,d_2} = \sum_{l=0}^{C-1} G_{n,l}(e^{j(w-w_{D,d_1})})\, F_{n,l}(e^{j(w-w_{D,d_2})})\, H_{r,n}(e^{j(w-w_{D,d_2})}) \quad (7.14)
$$

Substituting (7.14) into (7.13) yields

$$
\sigma^2_{y_r} = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j(w-w_{D,d_1})})\, F_{n,l}(e^{j(w-w_{D,d_2})})\, H_{r,n}(e^{j(w-w_{D,d_2})}) \Big|^2 dw \quad (7.15)
$$

We can then approximate the integral in (7.15) as follows:

$$
\int_{-\pi/D}^{\pi/D} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j(w-\frac{2\pi d_1}{D})})\, F_{n,l}(e^{j(w-\frac{2\pi d_2}{D})})\, H_{r,n}(e^{j(w-\frac{2\pi d_2}{D})}) \Big|^2 dw
$$
$$
\approx \frac{2\pi}{N_f} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.16)
$$

Substituting (7.16) for the integral in (7.15) yields $\sigma^2_{y_r,\text{compute}}$, a computationally tractable approximation of $\sigma^2_{y_r}$:

$$
\sigma^2_{y_r} \approx \sigma^2_{y_r,\text{compute}} = \frac{\sigma_{x_r}^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.17)
$$

If we assume that each source $x_r$ has the same variance, that is, $\sigma_{x_r}^2 = \sigma_x^2$ for all $r$, we can treat the leading coefficient as a constant, and (7.17) becomes

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.18)
$$

To simplify notation, we define

$$
\dot F_{n,l,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \equiv F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \quad (7.19)
$$

We note that the value of (7.19) is known by assumption. We also combine the summations over $n$ and $l$ inside the magnitude squared into a single summation over $s$ (for subchannel) by letting

$$
s = l + nC \quad \text{and} \quad S = N \cdot C \quad (7.20)
$$

Hence, (7.18) becomes

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{s=0}^{S-1} G_s(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, \dot F_{s,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.21)
$$

To compute (7.21) efficiently, we would like to rewrite it as a product of a row vector, a matrix, and a column vector, where the row and column vectors contain the unknown discretized frequency responses of all $S$ synthesis filters. To begin, we expand the magnitude squared of (7.21) and rearrange the finite summations to get

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, \overline{G_{s_2}}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})}) \underbrace{\Big( \sum_{d_2=0}^{D-1} \dot F_{s_1,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, \overline{\dot F_{s_2,r}}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big)}_{\Phi_r(s_1,s_2,f)} \quad (7.22)
$$

We choose the symbol $\Phi$ since it is the capital "F" in Greek. In order to reduce the summations over both $d_1$ and $f$ to a single summation over $f$, we use the assumption that $N_f$ is an even multiple of $D$, that is,

$$
N_f = M \cdot D \quad (7.23)
$$

with $M$ a positive even integer. (A small numerical illustration of $\Phi_r$ and its $M$-periodicity, established in (A.1), follows.)
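The definition of $\Phi_r$ and the periodicity claimed in (A.1) can be spot-checked numerically. The sketch below uses two hypothetical subchannels whose combined analysis/propagation taps $\dot F$ are toy values of my own choosing:

```python
import numpy as np

def dtft(h, w):
    k = np.arange(len(h))
    return complex(np.exp(-1j * w * k) @ h)

# Phi_r(s1, s2, f) of (7.22), built from the known products
# Fdot_s(w) = F_s(w) * H_r(w); the FIR taps below are hypothetical.
D, M = 2, 4
Nf = M * D
fdot_taps = [np.array([1.0, 0.5]), np.array([0.3, -0.7])]   # two subchannels

def Fdot(s, f, d2):
    return dtft(fdot_taps[s], 2 * np.pi * (f / Nf - d2 / D))

def Phi(s1, s2, f):
    return sum(Fdot(s1, f, d2) * np.conj(Fdot(s2, f, d2)) for d2 in range(D))

# Spot-check the M-periodicity in f claimed in (A.1):
for s1 in range(2):
    for s2 in range(2):
        for f in range(Nf):
            assert abs(Phi(s1, s2, f) - Phi(s1, s2, f + M)) < 1e-12
print("Phi is M-periodic in f (checked numerically)")
```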
Rewriting the arguments of $G_{s_1}$ and $\overline{G_{s_2}}$, we then have

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \overline{G_{s_2}}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \Phi_r(s_1,s_2,f) \quad (7.24)
$$

A calculation in the appendix, (A.1), shows that $\Phi_r(s_1,s_2,f)$ is $M$-periodic in $f$, that is,

$$
\Phi_r(s_1,s_2,f-M) = \Phi_r(s_1,s_2,f) \quad (7.25)
$$

Hence we have

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \overline{G_{s_2}}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \Phi_r(s_1,s_2,f-Md_1) \quad (7.26)
$$

By letting $f = -\frac{N_f}{2D}, \dots, \frac{N_f}{2D}-1$ and $d_1 = 0, 1, \dots, D-1$ in the relationship $\hat f = (f - Md_1) \bmod N_f$, we obtain $\hat f = 0, 1, \dots, N_f-1$. In addition, we observe that the z-transform is $2\pi$-periodic in $w$ for $z = e^{jw}$, which means we can re-index $f - Md_1 \bmod N_f$ in the arguments of $G_{s_1}$ and $\overline{G_{s_2}}$. Finally, since $N_f = M \cdot D$ by assumption, $\Phi_r(s_1,s_2,f)$ is also $N_f$-periodic in $f$, which means we can also re-index $f - Md_1 \bmod N_f$ in the arguments of $\Phi_r$. We can then combine the summations over $d_1$ and $f$ in (7.26) into one summation over $f$:

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{f=0}^{N_f-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f) \quad (7.27)
$$

If we assume that the analysis and synthesis filters' FIR coefficients are real, the term $G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f)$ is conjugate symmetric in the frequency index $f$. Hence we can write (7.27) as

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \Big[ 2 \sum_{f=1}^{\frac{N_f}{2}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f)
$$
$$
+ \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{0}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{0}{N_f}})\, \Phi_r(s_1,s_2,0)
+ \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{N_f/2}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{N_f/2}{N_f}})\, \Phi_r(s_1,s_2,\tfrac{N_f}{2}) \Big] \quad (7.28)
$$

For a fixed $f$, any of the three summations over $s_1$ and $s_2$ in (7.28) can be expressed as a product of a row vector, a matrix, and a column vector:

$$
\sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f) = \mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}})\, \Phi_{r,S\times S}(f)\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}}) \quad (7.29)
$$

where

$$
\mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}}) = \big[ \mathbf{G}_{0,1\times C}(e^{j2\pi\frac{f}{N_f}}),\, \mathbf{G}_{1,1\times C}(e^{j2\pi\frac{f}{N_f}}),\, \dots,\, \mathbf{G}_{N-1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \big] \quad (7.30)
$$

is a row vector, of size $1 \times S$, containing all $S$ synthesis filters' responses at discretized frequency $f$. The entries of the square matrix $\Phi_{r,S\times S}(f)$, of size $S \times S$, are given by

$$
\big( \Phi_{r,S\times S}(f) \big)_{s_1,s_2} = \Phi_r(s_1,s_2,f) = \sum_{d_2=0}^{D-1} \dot F_{s_1,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, \overline{\dot F_{s_2,r}}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \quad (7.31)
$$

In addition, $\Phi_{r,S\times S}(f)$ can be expressed as the product of the analysis matrix $\dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})$ and its conjugate transpose:

$$
\Phi_{r,S\times S}(f) = \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \dot{\mathbf{F}}^*_{r,S\times D}(e^{j2\pi\frac{f}{N_f}}) \quad (7.32)
$$

where the matrix $\dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})$, of size $S \times D$, is defined as

$$
\dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}}) =
\begin{bmatrix}
\dot{\mathbf{F}}_{r,0,1\times C}(e^{j2\pi\frac{f}{N_f}}) & \cdots & \dot{\mathbf{F}}_{r,N-1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\dot{\mathbf{F}}_{r,0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) & \cdots & \dot{\mathbf{F}}_{r,N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) \\
\vdots & \ddots & \vdots \\
\dot{\mathbf{F}}_{r,0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})}) & \cdots & \dot{\mathbf{F}}_{r,N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})})
\end{bmatrix}^T \quad (7.33)
$$

and consists of $D \cdot N$ column vectors, of size $C \times 1$, defined in row-vector notation as

$$
\dot{\mathbf{F}}_{r,n,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) = H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \cdot \big[ F_{n,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, F_{n,1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] \quad (7.34)
$$

with $d \in \{0, 1, \dots, D-1\}$.
Hence, we can further rewrite (7.29) as

$$
\sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f)
= \mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}})\, \Phi_{r,S\times S}(f)\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}})
$$
$$
= \mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}})\, \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \dot{\mathbf{F}}^*_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}})
= \big\| \dot{\mathbf{F}}^*_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}}) \big\|^2 \quad (7.35)
$$

We can then express the right-hand side of (7.28) as the squared norm of a block-diagonal matrix times a column vector, that is,

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \cdot 2\, \big\| \dot{\mathbf{F}}^*_{r,S(\frac{N_f}{2}+1)\times D(\frac{N_f}{2}+1)}\, \mathbf{G}^*_{1\times S(\frac{N_f}{2}+1)} \big\|^2 \quad (7.36)
$$

and by transposition and conjugation we have

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \cdot 2\, \big\| \mathbf{G}_{1\times S(\frac{N_f}{2}+1)}\, \dot{\mathbf{F}}_{r,S(\frac{N_f}{2}+1)\times D(\frac{N_f}{2}+1)} \big\|^2 \quad (7.37)
$$

where

$$
\dot{\mathbf{F}}_{r,S(\frac{N_f}{2}+1)\times D(\frac{N_f}{2}+1)} = \mathrm{diag}\Big( \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{0}{N_f}}),\; \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{N_f/2-1}{N_f}}),\; \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{N_f/2}{N_f}}) \Big) \quad (7.38)
$$

and

$$
\mathbf{G}_{1\times S(\frac{N_f}{2}+1)} = \big[ \mathbf{G}_{1\times S}(e^{j2\pi\frac{0}{N_f}}),\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{N_f/2}{N_f}}) \big] \quad (7.39)
$$

7.3.3 Computationally Tractable Approximation of $J_{I,p}(G)$

Using (7.37), we then have the computationally tractable approximation of (7.11):

$$
J_{I,p}(G) = \big\| (\sigma^2_{y_r})_r \big\|_p \approx \big\| (\hat\sigma^2_{y_r,\text{compute}})_r \big\|_p = J_{I,p,\text{compute}}(G_{\text{compute}}) \quad (7.40)
$$

(The equivalence between the triple sum (7.27) and the quadratic form (7.29) is checked numerically in the sketch below.)
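The sketch below uses random stand-ins for the synthesis responses and builds each $\Phi$ matrix in the PSD form (7.32); all names and sizes are illustrative assumptions:

```python
import numpy as np

# Evaluate the interference-gain term two ways for one interference r:
# (i) the triple sum of (7.27), and (ii) the quadratic form (7.29).
rng = np.random.default_rng(0)
S, Nf, D = 4, 8, 2
G_f = rng.standard_normal((S, Nf)) + 1j * rng.standard_normal((S, Nf))
A = rng.standard_normal((Nf, S, D)) + 1j * rng.standard_normal((Nf, S, D))
Phi_f = [A[f] @ A[f].conj().T for f in range(Nf)]   # PSD, as in (7.32)

c = 1.0 / (D**2 * Nf)          # leading constant with sigma_x^2 = 1
triple = c * sum(G_f[s1, f] * np.conj(G_f[s2, f]) * Phi_f[f][s1, s2]
                 for f in range(Nf) for s1 in range(S) for s2 in range(S))
quad = c * sum(G_f[:, f] @ Phi_f[f] @ G_f[:, f].conj() for f in range(Nf))
assert abs(triple - quad) < 1e-9 and quad.real >= 0
print("triple sum = quadratic form =", np.round(quad.real, 6))
```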
7.3.4 Sparse Subchannels Penalty Term

In this section we derive a computationally tractable and efficient sparse subchannel penalty term $J_{S,\text{compute}}(G_{\text{compute}})$. The derivation begins by defining $J_{S,\text{sgn}}(G)$, which counts the number of active subchannels by checking whether each subchannel's synthesis filter frequency response is non-zero. Next, the continuous frequency variable $w$ is discretized along a finite, uniformly spaced set of points to give the computationally tractable term $J_{S,\text{sgn},\text{compute}}(G_{\text{compute}})$. Finally, an $l_1$-like penalty is substituted, both to increase computational efficiency and to induce sparse solutions, giving the desired $J_{S,\text{compute}}(G_{\text{compute}})$.

Our goal is an objective function that not only minimizes the gain (time-averaged energy) of the interference sources but also encourages sparse subchannels, that is, only a few of the $N \cdot C$ subchannels being active. As before, $N$ is the number of microphones and $C$ is the number of subchannels of each microphone. We say a subchannel is inactive if its synthesis filter frequency response is zero or very small in magnitude. We measure the number of active subchannels as

$$
J_{S,\text{sgn}}(G) = \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} \mathrm{sgn}^2\Big( \max_{0\le w<\pi} \big| G_{n,l}(e^{jw}) \big| \Big) \quad (7.41)
$$

In other words, a channel is considered active if any portion of its frequency response is non-zero. We note that $0 \le w < \pi$ rather than $0 \le w < 2\pi$, since we assume our filter taps are real and hence the frequency response is conjugate symmetric. (7.41) counts the number of active subchannels, and since $\mathrm{sgn}$ is applied to the maximum of absolute values, the value of (7.41) lies in the appropriate range of $0$ to $N \cdot C$.

To make (7.41) computationally tractable, we discretize the continuous variable $w$ using $N_f$ points as in (7.16), so (7.41) becomes

$$
J_{S,\text{sgn}}(G) \approx J_{S,\text{sgn},\text{compute}}(G_{\text{compute}}) = \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} \mathrm{sgn}^2\Big( \max_{f\in\{0,1,\dots,N_f/2\}} \big| G_{n,l}(e^{j2\pi\frac{f}{N_f}}) \big| \Big) \quad (7.42)
$$

We combine the summations over $n$ and $l$ into a single summation over $s$ using (7.20), as before. In addition, in the spirit of compressive sampling, we replace the $\mathrm{sgn}^2$ function by the absolute value. Hence (7.42) becomes

$$
J_{S,\text{abs},\text{compute}}(G_{\text{compute}}) = \sum_{s=0}^{S-1} \Big( \max_{f\in\{0,1,\dots,N_f/2\}} \big| G_s(e^{j2\pi\frac{f}{N_f}}) \big| \Big) \quad (7.43)
$$

In the objective function that we compute, (7.10), we use (7.43), and hence we shorten its notation as

$$
J_{S,\text{compute}}(G_{\text{compute}}) = J_{S,\text{abs},\text{compute}}(G_{\text{compute}}) \quad (7.44)
$$

(In code, the penalty (7.43) and the companion active-subchannel count are each one line; see the sketch below.)
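A minimal NumPy sketch, with a random stand-in for $G_{\text{compute}}$; the relative threshold in `count_active` mirrors the 1/1000 rule used later in Chapter 8:

```python
import numpy as np

def J_S(G_compute, Nf):
    """The penalty of (7.43): sum over subchannels of each subchannel's
    peak response magnitude over f = 0..Nf/2 (real taps assumed)."""
    peaks = np.abs(G_compute[:, : Nf // 2 + 1]).max(axis=1)
    return peaks.sum()

def count_active(G_compute, Nf, tol=1e-3):
    """Count subchannels whose peak exceeds a relative threshold."""
    peaks = np.abs(G_compute[:, : Nf // 2 + 1]).max(axis=1)
    return int((peaks > tol * peaks.max()).sum())

rng = np.random.default_rng(1)
G = rng.standard_normal((6, 8)) * (rng.random((6, 1)) > 0.5)  # some silent rows
print(J_S(G, 8), count_active(G, 8))
```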
7.4 Target Source Perfect Reconstruction Condition

The target perfect reconstruction condition, TPR, is given by (6.9). To make (6.9) computationally tractable, we discretize the continuous variable $w$ using $N_f$ points as in (7.16), so that for $f = 0, 1, \dots, N_f-1$, (6.9) becomes

$$
\mathrm{TPR}_{\text{compute}}(f,d) = \sum_{n=0}^{N-1} H_{0,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi\frac{f}{N_f}})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) =
\begin{cases} D e^{j2\pi(\frac{f}{N_f})(-\Delta)} & \text{if } d = 0; \\ 0 & \text{if } 1 \le d \le D-1 \end{cases} \quad (7.45)
$$

Since we have $D$ constraints for each of our $N_f$ discretized frequencies, the TPR condition now has a total of $N_f \cdot D$ constraints.

In matrix-vector form, and analogous to (6.10), the $D$ target perfect reconstruction conditions of (7.45) for a fixed $f \in \{0, 1, \dots, N_f-1\}$ can be written as

$$
\mathrm{TPR}_{\text{compute}}(f) :\quad \dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}}) = D e^{j2\pi(\frac{f}{N_f})(-\Delta)} \cdot \mathbf{e}^T_{0,1\times D} \quad (7.46)
$$

where $S = N \cdot C$ is the number of subchannels. The matrix $\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})$, of size $D \times S$, is defined as

$$
\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}}) =
\begin{bmatrix}
\dot{\mathbf{F}}_{0,1\times C}(e^{j2\pi\frac{f}{N_f}}) & \cdots & \dot{\mathbf{F}}_{N-1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\dot{\mathbf{F}}_{0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) & \cdots & \dot{\mathbf{F}}_{N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) \\
\vdots & \ddots & \vdots \\
\dot{\mathbf{F}}_{0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})}) & \cdots & \dot{\mathbf{F}}_{N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})})
\end{bmatrix} \quad (7.47)
$$

and consists of $D \cdot N$ row vectors, of size $1 \times C$, defined as

$$
\dot{\mathbf{F}}_{n,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) = H_{0,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \cdot \big[ F_{n,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, F_{n,1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] \quad (7.48)
$$

with $d \in \{0, 1, \dots, D-1\}$. The column vector $\mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}})$, of size $S \times 1$, is defined as

$$
\mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}}) =
\begin{bmatrix}
\mathbf{G}^T_{0,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\mathbf{G}^T_{1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\vdots \\
\mathbf{G}^T_{N-1,1\times C}(e^{j2\pi\frac{f}{N_f}})
\end{bmatrix} \quad (7.49)
$$

and consists of $N$ column vectors, of size $C \times 1$, defined as

$$
\mathbf{G}^T_{n,1\times C}(e^{j2\pi\frac{f}{N_f}}) = \big[ G_{n,0}(e^{j2\pi\frac{f}{N_f}}),\, G_{n,1}(e^{j2\pi\frac{f}{N_f}}),\, \dots,\, G_{n,C-1}(e^{j2\pi\frac{f}{N_f}}) \big]^T \quad (7.50)
$$

The column vector $\mathbf{e}^T_{k,1\times D}$, of size $D \times 1$, is the $D \times 1$ zero vector with the $k$-th entry set to 1:

$$
\mathbf{e}^T_{k,1\times D} = [0, \dots, 0, \underbrace{1}_{k\text{-th entry}}, 0, \dots, 0]^T \quad (7.51)
$$

We note that if we use the same set of $C$ analysis filters for each filterbank, that is,

$$
\big[ F_{n_1,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n_1,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] = \big[ F_{n_2,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n_2,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] \quad (7.52)
$$

for all $0 \le n_1, n_2 \le N-1$, $0 \le d \le D-1$, and $0 \le f \le N_f-1$, and the target inversion pre-filter removes the effect of propagation on phase perfectly, that is,

$$
H_{0,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) = \alpha_{0,n} \quad (7.53)
$$

for all $0 \le n \le N-1$, $0 \le d \le D-1$, and $0 \le f \le N_f-1$, then (7.47) has rank $\min(D,C)$, since every block of $C$ columns is a scalar multiple of the previous block of $C$ columns. Since we need to fulfill $D$ constraints, these two additional assumptions imply that $D \le C$.

If we assume that all filter taps are real, then we can almost halve the number of constraints using the conjugate symmetry of the filter responses and the $2\pi$-periodicity in $w$ of the z-transform for $z = e^{jw}$. First, if (7.46) holds for $0 \le f \le N_f-1$, then the conjugate of the entire equation also holds, that is,

$$
\overline{\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}})} = \overline{D \cdot \mathbf{e}^T_{0,1\times D}} = D \cdot \mathbf{e}^T_{0,1\times D} \quad (7.54)
$$

where the last equality holds because $\mathbf{e}^T_{0,1\times D}$ contains all real entries. Next, we have

$$
\overline{\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})}\; \overline{\mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}})} = D \cdot \mathbf{e}^T_{0,1\times D}
$$
$$
\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{-f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{-f}{N_f}}) = D \cdot \mathbf{e}^T_{0,1\times D}
$$
$$
\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{(-f) \bmod N_f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{(-f) \bmod N_f}{N_f}}) = D \cdot \mathbf{e}^T_{0,1\times D}
$$

where the second line follows from conjugate symmetry and the third from the $2\pi$-periodicity in $w$. In summary, we have shown that if $\mathrm{TPR}_{\text{compute}}(f)$ holds, so does $\mathrm{TPR}_{\text{compute}}((-f) \bmod N_f)$. Hence, we can express the constraints of (7.46) for all $f$ as a single block-diagonal matrix-vector product:

$$
\dot{\mathbf{F}}^T_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D}\; \mathbf{G}^T_{1\times(\frac{N_f}{2}+1)S} = D \cdot \dot{\mathbf{E}}^T_{0,1\times(\frac{N_f}{2}+1)D} \quad (7.55)
$$

and by transposition we have

$$
\mathbf{G}_{1\times(\frac{N_f}{2}+1)S}\; \dot{\mathbf{F}}_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D} = D \cdot \dot{\mathbf{E}}_{0,1\times(\frac{N_f}{2}+1)D} \quad (7.56)
$$

where the block diagonal matrix $\dot{\mathbf{F}}_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D}$, of size $(\frac{N_f}{2}+1)S \times (\frac{N_f}{2}+1)D$, is given by

$$
\dot{\mathbf{F}}_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D} = \mathrm{diag}\Big( \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{0}{N_f}}),\; \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{N_f/2-1}{N_f}}),\; \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{N_f/2}{N_f}}) \Big) \quad (7.57)
$$

the row vector of unknowns $\mathbf{G}_{1\times(\frac{N_f}{2}+1)S}$, of size $1 \times (\frac{N_f}{2}+1)S$, is given by

$$
\mathbf{G}_{1\times(\frac{N_f}{2}+1)S} = \big[ \mathbf{G}_{1\times S}(e^{j2\pi\frac{0}{N_f}}),\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{N_f/2}{N_f}}) \big] \quad (7.58)
$$

and the row vector of constraints $\dot{\mathbf{E}}_{0,1\times(\frac{N_f}{2}+1)D}$, of size $1 \times (\frac{N_f}{2}+1)D$, is given by

$$
\dot{\mathbf{E}}_{0,1\times(\frac{N_f}{2}+1)D} = \Big[ \underbrace{\tfrac{1}{\sqrt 2}\, e^{j2\pi(\frac{0}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=0},\; \underbrace{e^{j2\pi(\frac{1}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=1},\; \dots,\; \underbrace{e^{j2\pi(\frac{N_f/2-1}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=\frac{N_f}{2}-1},\; \underbrace{\tfrac{1}{\sqrt 2}\, e^{j2\pi(\frac{N_f/2}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=\frac{N_f}{2}} \Big] \quad (7.59)
$$

As before, $S$ is the total number of subchannels, equal to the product of the number of filterbanks $N$ and the number of subchannels per filterbank $C$, that is, $S = N \cdot C$. In addition, by conjugating (7.55), we make the terms consistent with (7.35), where the unknown synthesis filters form a column vector, and hence we have

$$
\mathrm{TPR}_{\text{compute}} :\quad \dot{\mathbf{F}}^*_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D}\; \mathbf{G}^*_{1\times(\frac{N_f}{2}+1)S} = D \cdot \dot{\mathbf{E}}^*_{0,1\times(\frac{N_f}{2}+1)D} \quad (7.60)
$$

7.5 Optimization Problem

We can now combine the results of the previous sections into an optimization problem that is computationally tractable:

$$
\begin{aligned}
\underset{G_{\text{compute}}}{\text{minimize}} \quad & J_{\text{compute}}(G_{\text{compute}}) = J_{I,p,\text{compute}}(G_{\text{compute}}) + \lambda\, J_{S,\text{compute}}(G_{\text{compute}}) \\
\text{subject to} \quad & \mathrm{TPR}_{\text{compute}}
\end{aligned} \quad (7.61)
$$

where $G_{\text{compute}}$ is given by (7.8), $J_{I,p,\text{compute}}$ by (7.40), $J_{S,\text{compute}}$ by (7.44), and $\mathrm{TPR}_{\text{compute}}$ by (7.56). We note again that $\lambda$ is a user-chosen constant that sets the severity of the sparse subchannel penalty, and that problem (7.61) is convex.
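For concreteness, here is a minimal CVXPY sketch of problem (7.61) under simplifying assumptions: random stand-ins replace the precomputed $\dot{\mathbf{F}}$ matrices, the $1/\sqrt 2$ endpoint weights of (7.38) and (7.57) are omitted, and all helper names are mine. The dissertation's experiments used the MATLAB CVX toolbox with MOSEK; CVXPY is a close analogue:

```python
import numpy as np
import cvxpy as cp

# Fdot[r][f] stands in for the S x D matrix of (7.33) for interference r;
# Fdot0[f] stands in for the target matrix of (7.47). All data are toys.
S, D, Nfh, R = 8, 2, 9, 3        # Nfh = Nf/2 + 1 unique frequencies
Nf, delta, lam = 16, 1, 0.1
rng = np.random.default_rng(2)
Fdot = [[rng.standard_normal((S, D)) + 1j * rng.standard_normal((S, D))
         for _ in range(Nfh)] for _ in range(R)]
Fdot0 = [rng.standard_normal((S, D)) + 1j * rng.standard_normal((S, D))
         for _ in range(Nfh)]

G = cp.Variable((S, Nfh), complex=True)
c = 1.0 / (D**2 * Nf)
# Interference gains via (7.35), using ||Fdot* G*|| = ||Fdot^T G|| per f:
gains = [c * sum(cp.sum_squares(Fdot[r][f].T @ G[:, f]) for f in range(Nfh))
         for r in range(R)]
# Sparse-subchannel penalty (7.43): sum of per-subchannel peak magnitudes.
penalty = cp.sum(cp.max(cp.abs(G), axis=1))
# Target perfect reconstruction (7.46): D equalities per frequency.
e0 = np.eye(D)[0]
constraints = [Fdot0[f].T @ G[:, f] ==
               D * np.exp(-2j * np.pi * f * delta / Nf) * e0
               for f in range(Nfh)]
prob = cp.Problem(cp.Minimize(cp.max(cp.hstack(gains)) + lam * penalty),
                  constraints)
prob.solve()                     # e.g. prob.solve(solver=cp.MOSEK)
print(prob.status, round(prob.value, 4))
```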
Chapter 8: Numerical Results When Optimizing Over Synthesis Filters

In this chapter, we detail our numerical simulations. Our simulations rely heavily on Michael Grant's CVX toolbox, a package for specifying and solving convex programs [41], and its integration with the professional solver MOSEK.

Fig. 8.1. Room environment.

The rectangular room shown in Fig. 8.1 has dimensions 10 meters by 8 meters, and 32 microphones are placed uniformly along its 4 walls. The lower left corner of the room is defined as the origin, coordinate (0, 0), and our target source is located at (3, 2.5). There are 1240 virtual interferences in our room. Four-fifths of the interferences lie outside the room and model reflections off the four walls. We model our room environment using a large number of virtual interferences since we do not know a priori where the actual sources lie.

Each microphone is processed by a two-channel filterbank, with each channel decimated and then later upsampled by a factor of 2. Hence we have 32 · 2 = 64 subchannels. We fix the two analysis filters of each filterbank to be Haar filters. We wish to design the synthesis filters that minimize the maximum gain of the interferences, and hence we let p = ∞. In addition, we wish to reconstruct the target source perfectly while using only 20 of the possible 64 subchannels. In other words, only 20 of the 64 synthesis filters are allowed to be active and have non-zero frequency responses. We note that 64 choose 20 is approximately 2 × 10^16, so there is an overwhelming number of subsets of subchannels to choose from.

Fig. 8.2. Performance on optimization grid consisting of 1240 interferences, in dB.

We run our optimization problem iteratively, tuning the parameter λ at each iteration until we find exactly 20 active subchannels. λ, a non-negative scalar, is found through a bisection algorithm (sketched below), since as λ increases the number of active subchannels decreases, and as λ decreases the number of active subchannels increases. A subchannel is considered inactive if the maximum magnitude of its synthesis filter is less than 1/1000 of the maximum magnitude over all the synthesis filters. Fig. 8.3 shows the maximum magnitude of each subchannel's synthesis filter after our iterations completed. Notice that 20 of the 64 subchannels have non-trivial synthesis filters.

Fig. 8.3. Maximum magnitude of each subchannel's synthesis filter for the λ value that produced the desired number of subchannels.
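A runnable sketch of this λ search, in which a toy stand-in (`solve_for_lambda`, a name and behavior of my own invention) replaces the actual CVX solve of (7.61) so the bisection has a deterministic, monotone knob to steer:

```python
import numpy as np

def solve_for_lambda(lam):
    """Stand-in for the CVX solve of (7.61): a fixed random 'solution'
    whose entries are crudely zeroed out as lambda grows."""
    rng = np.random.default_rng(3)           # same data on every call
    G = rng.random((64, 9))
    G[G < lam] = 0.0
    return G

def count_active(G, tol=1e-3):
    peaks = np.abs(G).max(axis=1)
    return int((peaks > tol * max(peaks.max(), 1e-30)).sum())

def tune_lambda(target, lam_lo=0.0, lam_hi=1.0, iters=60):
    """Bisect on lambda: larger lambda -> fewer active subchannels."""
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        G = solve_for_lambda(lam)
        k = count_active(G)
        if k == target:
            return lam, G
        if k > target:          # too many active subchannels: penalize more
            lam_lo = lam
        else:                   # too few: penalize less
            lam_hi = lam
    raise RuntimeError("no lambda found; widen the bracket or iterations")

lam, G = tune_lambda(target=20)
print("lambda =", round(lam, 4), "active =", count_active(G))
```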
Our optimization problem finds a sparse number of synthesis filters and thus a sparse number of active subchannels. However, the frequency responses of the inactive synthesis filters are not exactly zero. Hence, we run our optimization routine one more time, optimizing the synthesis filters of only the previously discovered active subchannels and setting our sparsity parameter λ to zero. This is known as the debiasing step. In some sense, this step redistributes the "crumbs" of energy in the inactive subchannels' synthesis filters to the active subchannels' synthesis filters. Fig. 8.4 shows each subchannel's maximum synthesis filter magnitude after this debiasing step.

Fig. 8.4. Maximum magnitude of each subchannel's synthesis filter after debiasing.

Our optimization routine returns a setup that uses slightly more low-frequency subchannels than high-frequency subchannels. Furthermore, the setup even occasionally uses only a single subchannel of a filterbank and not the other subchannel. See Fig. 8.7, 8.8, 8.9, and 8.10 for sample synthesis filter frequency responses of the 64 subchannels, 20 of which are active. Fig. 8.5 shows the locations of the active low-frequency subchannels, and Fig. 8.6 shows the locations of the active high-frequency subchannels. We optimized over 16 discrete frequencies, 9 of which are unique since we assume all our filter taps are real and therefore the frequency responses are conjugate symmetric. Notice that the total number of active subchannels sums to the desired number of active subchannels, 20.

Fig. 8.5. Locations of active low-frequency subchannels.

Fig. 8.6. Locations of active high-frequency subchannels.

Fig. 8.2 shows how the 20 synthesis filters we found performed on the set of sources we optimized over. The target gain is 0 dB because target perfect reconstruction was an optimization constraint. The worst interference gain is −8.27 dB. While this result is impressive, a more realistic measure of performance is to test our system's filters in a room with a denser set of interferences, as in Fig. 8.11, where the worst interference gain is 1.84 dB. Not surprisingly, the performance is worst near the microphones. However, the target gain is again 0 dB, and the gain map decays very smoothly.

Fig. 8.7. Synthesis filter frequency responses of a filterbank whose low-frequency subchannel alone is active.

Fig. 8.8. Synthesis filter frequency responses of a filterbank whose high-frequency subchannel alone is active.

Fig. 8.9. Synthesis filter frequency responses of a filterbank whose subchannels are both active.

Fig. 8.10. Synthesis filter frequency responses of a filterbank whose subchannels are both inactive.

Fig. 8.11. Performance on evaluation grid consisting of 4960 interferences, in dB.

Chapter A: Properties of $\Phi_r$

In the following, we assume that $N_f$, the number of discretized frequencies, is a multiple of $D$, the downsampling factor, that is, $N_f = M \cdot D$ with $M$ a positive integer. We now show that $\Phi_r(s_1,s_2,f)$ is $M$-periodic in $f$, that is,

$$
\Phi_r(s_1,s_2,f+M) = \Phi_r(s_1,s_2,f) \quad (A.1)
$$

Proof. To simplify notation, we let

$$
\phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2}{N_f}}) = \dot F_{s_1,r}(e^{j2\pi\frac{f-Md_2}{N_f}})\, \overline{\dot F_{s_2,r}}(e^{j2\pi\frac{f-Md_2}{N_f}})
$$

Also let $d_2' = d_2 - 1$. Then we have

$$
\Phi_r(s_1,s_2,f+M) = \sum_{d_2=0}^{D-1} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-M(d_2-1)}{N_f}})
= \sum_{d_2'=-1}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
= \phi_{s_1,s_2,r}(e^{j2\pi\frac{f+M}{N_f}}) + \sum_{d_2'=0}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
$$
$$
= \phi_{s_1,s_2,r}(e^{j2\pi\frac{f+M}{N_f}}\, e^{j2\pi\frac{-MD-M+M}{N_f}}) + \sum_{d_2'=0}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
= \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-M(D-1)}{N_f}}) + \sum_{d_2'=0}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
$$
$$
= \sum_{d_2'=0}^{D-1} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
= \Phi_r(s_1,s_2,f)
$$

where the fourth line uses the fact that the z-transform is $2\pi$-periodic in $w$ for $z = e^{jw}$.

We now show that the matrix $\Phi_{r,S\times S}(f)$ is positive semi-definite, that is,

$$
x^* \Phi_{r,S\times S}(f)\, x \ge 0 \quad (A.2)
$$

for any non-zero column vector $x \in \mathbb{C}^S$.
Proof. From (7.31), we have

$$
x^* \Phi_{r,S\times S}(f)\, x = \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} \overline{x_{s_1}}\, x_{s_2} \sum_{d_2=0}^{D-1} \dot F_{s_1,r}(e^{j2\pi\frac{f-Md_2}{N_f}})\, \overline{\dot F_{s_2,r}}(e^{j2\pi\frac{f-Md_2}{N_f}})
$$
$$
= \sum_{d_2=0}^{D-1} \Big( \sum_{s_1=0}^{S-1} \overline{x_{s_1}}\, \dot F_{s_1,r}(e^{j2\pi\frac{f-Md_2}{N_f}}) \Big) \Big( \sum_{s_2=0}^{S-1} x_{s_2}\, \overline{\dot F_{s_2,r}}(e^{j2\pi\frac{f-Md_2}{N_f}}) \Big)
= \sum_{d_2=0}^{D-1} \Big| \sum_{s=0}^{S-1} \overline{x_s}\, \dot F_{s,r}(e^{j2\pi\frac{f-Md_2}{N_f}}) \Big|^2 \ge 0
$$

where $x_k$ is the $k$-th entry of the vector $x$.

Chapter B: Parseval Relationship For Energy of Filters

The following equation relates the $l_2$ energy of a filter $g$'s tap coefficients to the $l_2$ energy of its transfer function:

$$
\sum_l |g[l]|^2 = \frac{1}{2\pi} \int_0^{2\pi} |G(w)|^2\, dw \quad (B.1)
$$

Proof. The inverse discrete-time Fourier transform relates the tap coefficients of a filter $g$ to its frequency response $G$:

$$
g[l] = \frac{1}{2\pi} \int_0^{2\pi} G(w)\, e^{jwl}\, dw \quad (B.2)
$$

The Poisson summation formula gives us a useful "time-frequency" relationship:

$$
\sum_n e^{2\pi j n x} = \sum_k \delta(x-k) \quad (B.3)
$$

We note that the delta distribution $\delta(x)$ has the following scaling property:

$$
|a|\, \delta(ax) = \delta(x) \quad (B.4)
$$

We can now relate the $l_2$ energy of the filter's tap coefficients to the $l_2$ energy of the filter's transfer function:

$$
\sum_l |g[l]|^2 = \sum_l \Big| \frac{1}{2\pi} \int_0^{2\pi} G(w)\, e^{jlw}\, dw \Big|^2
= \sum_l \frac{1}{4\pi^2} \int_0^{2\pi} G(w_1)\, e^{jlw_1}\, dw_1 \cdot \int_0^{2\pi} \overline{G(w_2)}\, e^{-jlw_2}\, dw_2
$$
$$
= \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{2\pi} G(w_1)\, \overline{G(w_2)} \underbrace{\sum_l e^{2\pi j l \frac{w_1-w_2}{2\pi}}}_{\sum_n \delta(\frac{w_1-w_2}{2\pi}-n)}\, dw_1\, dw_2
= \frac{1}{2\pi} \sum_n \int_0^{2\pi}\!\!\int_0^{2\pi} G(w_1)\, \overline{G(w_2)}\, \delta(w_1-w_2-2\pi n)\, dw_1\, dw_2
= \frac{1}{2\pi} \int_0^{2\pi} |G(w)|^2\, dw
$$

Note that $|w_1 - w_2| < 2\pi$ since both $w_1$ and $w_2$ range over $[0, 2\pi)$, and hence the sum over $n$ produces a non-zero $\delta(w_1-w_2-2\pi n)$ only when $n = 0$. (We ignore a difference of exactly $2\pi$ because it has measure 0.)
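A quick FFT-based check of (B.1): for an FIR filter, the discrete Parseval identity makes the uniform-grid Riemann sum of $|G(w)|^2$ exact (provided the grid has at least as many points as taps), so tap energy and frequency-domain energy agree to machine precision. The filter below is a random toy:

```python
import numpy as np

rng = np.random.default_rng(4)
g = rng.standard_normal(16)        # random real FIR taps
Nw = 4096                          # dense frequency grid
G = np.fft.fft(g, Nw)              # samples of G(w) at w = 2*pi*k/Nw
lhs = np.sum(g**2)
rhs = np.sum(np.abs(G)**2) / Nw    # (1/2*pi) * integral as a Riemann sum
assert abs(lhs - rhs) < 1e-10
print(lhs, rhs)
```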
References

[1] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society (Series B), 58:267–288, 1996.

[2] E. J. Candès and Y. Plan. Near-ideal model selection by l1 minimization. Annals of Statistics, 37(5A):2145–2177, 2009.

[3] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[4] H. Lebret and S. Boyd. Antenna array pattern synthesis via convex optimization. Signal Processing, IEEE Transactions on, 45(3):526–532, 1997.

[5] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284(1):193–228, 1998.

[6] C. Ling, S. Wee, C. Wei, and Y. Zhu Liang. Linear sparse array synthesis via convex optimization. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 4233–4236, 2010.

[7] R. A. Haubrich. Array design. Bulletin of the Seismological Society of America, 58(3):977–991, 1968.

[8] S. Lang, C. Duckworth, and I. McClellan. Array design for MEM and MLM array processing. In Acoustics, Speech, and Signal Processing, IEEE International Conference on (ICASSP), pages 145–148, Atlanta, GA, USA, 1981.

[9] P. Pal and P. P. Vaidyanathan. Nested arrays: a novel approach to array processing with enhanced degrees of freedom. Signal Processing, IEEE Transactions on, 58(8):4167–4181, 2010.

[10] D. B. Ward and M. S. Brandstein. Grid-based beamformer design for room-environment microphone arrays. In Applications of Signal Processing to Audio and Acoustics, 1999 IEEE Workshop on, pages 23–26, 1999.

[11] M. S. Brandstein and D. B. Ward. Cell-based beamforming (CE-BABE) for speech acquisition with microphone arrays. Speech and Audio Processing, IEEE Transactions on, 8(6):738–743, 2000.

[12] J. B. Allen and D. A. Berkley. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am., 65(4):943–950, 1979.

[13] R. G. Baraniuk, E. Candès, R. Nowak, and M. Vetterli. Compressive sampling. IEEE Signal Processing Magazine, 25(2):12–13, 2008.

[14] E. J. Candès and M. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, 2008.

[15] E. J. Candès and Y. Plan. Near-ideal model selection by l1 minimization. The Annals of Statistics, 37(5A):2145–2177, 2009.

[16] Emmanuel J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9):589–592, 2008.

[17] Bernard D. Steinberg. The peak sidelobe of the phased array having randomly located elements. Antennas and Propagation, IEEE Transactions on, 20(2):129–136, 1972.

[18] Richard M. Leahy and Brian D. Jeffs. On the design of maximally sparse beamforming arrays. Antennas and Propagation, IEEE Transactions on, 39(8):1178–1187, 1991.

[19] Sverre Holm, Bjørnar Elgetun, and Geir Dahl. Properties of the beampattern of weight- and layout-optimized sparse arrays. Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, 44(5):983–991, 1997.

[20] Thomas Baran, Dennis Wei, and Alan V. Oppenheim. Linear programming algorithms for sparse filter design. Signal Processing, IEEE Transactions on, 58(3):1605–1617, 2010.

[21] David G. Leeper. Isophoric arrays: massively thinned phased arrays with well-controlled sidelobes. Antennas and Propagation, IEEE Transactions on, 47(12):1825–1835, 1999.

[22] Giacomo Oliveri and Andrea Massa. Bayesian compressive sampling for pattern synthesis with maximally sparse non-uniform linear arrays. Antennas and Propagation, IEEE Transactions on, 59(2):467–481, 2011.

[23] P. P. Vaidyanathan. Multirate Systems And Filter Banks. Dorling Kindersley, 1993.

[24] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.

[25] Shuichi Ohno and Hideaki Sakai. Optimization of filter banks using cyclostationary spectral analysis. Signal Processing, IEEE Transactions on, 44(11):2718–2725, 1996.

[26] Andrea Trucco and Vittorio Murino. Stochastic optimization of linear sparse arrays. Oceanic Engineering, IEEE Journal of, 24(3):291–299, 1999.

[27] Siddharth Joshi and Stephen Boyd. Sensor selection via convex optimization. Signal Processing, IEEE Transactions on, 57(2):451–462, 2009.

[28] Dragana Bajovic, Bruno Sinopoli, and João Xavier. Sensor selection for event detection in wireless sensor networks. Signal Processing, IEEE Transactions on, 59(10):4938–4953, 2011.

[29] Ling Cen, Wee Ser, Zhu Liang Yu, Susanto Rahardja, and Wei Cen. Linear sparse array synthesis with minimum number of sensors. Antennas and Propagation, IEEE Transactions on, 58(3):720–726, 2010.

[30] Venkat Roy, Sundeep Prabhakar Chepuri, and Geert Leus. Sparsity-enforcing sensor selection for DOA estimation. In Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2013 IEEE 5th International Workshop on, pages 340–343. IEEE, 2013.

[31] Jian Wang, Wei-Xing Sheng, Yu-Bing Han, and Xiao-Feng Ma. Adaptive beamforming with compressed sensing for sparse receiving array. Aerospace and Electronic Systems, IEEE Transactions on, 50(2):823–833, 2014.

[32] Siew Eng Nai, Wee Ser, Zhu Liang Yu, and Huawei Chen. Beampattern synthesis for linear and planar arrays with antenna selection by convex optimization. Antennas and Propagation, IEEE Transactions on, 58(12):3923–3930, 2010.

[33] Zhi Guo Feng, Ka Fai Cedric Yiu, and Sven E. Nordholm. Placement design of microphone arrays in near-field broadband beamformers. Signal Processing, IEEE Transactions on, 60(3):1195–1204, 2012.

[34] J. L. Araque Quijano, M. Righero, and G. Vecchi. Sparse 2-d array placement for arbitrary pattern mask and with excitation constraints: a simple deterministic approach. Antennas and Propagation, IEEE Transactions on, 62(4):1652–1662, 2014.

[35] James D. Johnston. A filter family designed for use in quadrature mirror filter banks. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'80, volume 5, pages 291–294. IEEE, 1980.

[36] Tongwen Chen and B. A. Francis. Design of multirate filter banks by H∞ optimization. IEEE Transactions on Signal Processing, 43:2822–2830, 1995.

[37] Eric Viscito and Jan P. Allebach. The analysis and design of multidimensional FIR perfect reconstruction filter banks for arbitrary sampling lattices. Circuits and Systems, IEEE Transactions on, 38(1):29–41, 1991.

[38] Truong Q. Nguyen. Digital filter bank design quadratic-constrained formulation. Signal Processing, IEEE Transactions on, 43(9):2103–2108, 1995.
[39] Ha Hoang Kha, Hoang Duong Tuan, and Truong Q. Nguyen. Efficient design of cosine-modulated filter banks via convex optimization. Signal Processing, IEEE Transactions on, 57(3):966–976, 2009.

[40] Timothy Davidson. Enriching the art of FIR filter design via convex optimization. Signal Processing Magazine, IEEE, 27(3):89–101, 2010.

[41] Michael Grant and Stephen Boyd. CVX: MATLAB software for disciplined convex programming, version 2.1, March 2014.