ABSTRACT

Title of dissertation: Optimal Space-Time-Frequency Design of Microphone Networks
Yenming Mark Lai, Doctor of Philosophy, 2014
Dissertation directed by: Professor Radu V. Balan, Department of Mathematics

Consider a sensing system using a large number N of microphones placed in multiple dimensions to monitor an acoustic field. Using all the microphones at once is impractical because of the amount of data generated. Instead, we choose a subset of D microphones to be active. Specifically, we wish to find the set of D microphones that minimizes the largest interference gain at multiple frequencies while monitoring a target of interest. A direct, combinatorial approach – testing all N choose D subsets of microphones – is impractical because of the problem size. Instead, we use a convex optimization technique that induces sparsity through an l1-penalty to determine which subset of microphones to use. Our work investigates not only the optimal placement (space) of microphones but also how to process the output of each microphone (time/frequency). We explore this problem for both single- and multi-frequency sources, optimizing both microphone weights and positions simultaneously. In addition, we explore this problem for random sources where the output of each of the N microphones is processed by an individual multirate filterbank. The N processed filterbank outputs are then combined to form one final signal. In this case, we fix all the analysis filters and optimize over all the synthesis filters. We show how to convert the continuous frequency problem to a discrete frequency approximation that is computationally tractable. In this random source/multirate filterbank case, we once again optimize over space-time-frequency simultaneously.

Optimal Space-Time-Frequency Design of Microphone Networks

by Yenming Mark Lai

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2014

Advisory Committee:
Professor Radu V. Balan, Chair/Advisor
Professor John J. Benedetto
Professor Howard Elman
Professor Patrick Fitzpatrick
Professor James Yorke

© Copyright by Yenming Mark Lai 2014

To April and Richard Lai, my loving parents.

Acknowledgments

I would first like to thank my advisor Radu Balan. It has been an incredible experience being under your tutelage for these last three years. I appreciate the countless hours you have invested in me and am so very grateful to have had such a kind and intelligent advisor. I carry your intellectual banner on to Texas!

Thanks to John Gilbert for encouraging me to start my graduate school journey six years ago. Thank you for believing in me.

Thanks to John Benedetto for his excellent coaching throughout these past six years. We know who is boss.

Thanks to Howard Elman for mid-class stretch breaks and giving me a chance to redeem myself on your final. Your coaching has helped me more than you can imagine.

Thanks to Jim Yorke for always hearing me out. You are a role model in scholarship and leadership, and it is difficult to walk away from a conversation with you without learning something new.

Thanks to Giovanni Forni for his patient office-hours tutoring in Complex Analysis my very first semester. You are a brilliant teacher and scholar.

Thanks to Bob Warner for cutting a clear path through the analysis jungle for uninitiated newcomers.

Thanks to Mike Fitzpatrick for always lending me his ear.
I will miss seeing your open door on the third floor.

Thanks to Sennur Ulukus for her helpful advice and for modeling clear and excellent teaching. I still remember 667 and the pack of cookies you brought to our makeup class!

Thanks to Rama Chellappa for great stories and insights. Your work sets the bar in scholarship.

Thanks to Min Wu for a tremendous DSP class. I learned so much.

Thanks to Jeff Henrikson for his fantastic technical support. My thesis would have no computational meat without your ninja skills. Sorry about flipping the power switch!

Thanks to Eitan Tadmor for graciously providing CSCAMM resources. My large-scale computations would not have been possible without the CSCAMM cluster.

Thanks to Michael Grant for providing commercial-grade optimization software for free to academics. Your toolbox is helping the research community press into unknown territory.

Thanks to Dan Mote for taking the time to meet with a graduate student. I would wrestle with choices A and B, and somehow you always suggested C, the superior option.

Thanks to Stephen Trachtenberg for his leadership insights. It is such a treat to sit in front of a master.

Thanks to Dave Mount for helpful coaching and great tennis practices. I learned how to volley playing with you!

Thanks to Razi Haimi-Cohen for his excellent mentoring at Bell Labs and for giving me my first taste of research. I respect and appreciate your high standards, work ethic, and integrity.

Thanks to Terri Rogers for her support and advice at Bell Labs. I appreciate your always-open door!

Thanks to Heiko Claussen for a super fast computer and always helpful suggestions. You are both a great teammate and leader.

Thanks to Justinian Rosca for great vision and for helping me to see the bigger picture. We need to play tennis again!

Thanks to Lex McCusker for your NSF I-Corps mentorship. You have fundamentally altered how I see both research and entrepreneurship and broadened my horizons. I learned so much working side-by-side with you.

Thanks to Karl Ginter for knowing everything about everything. I lean heavily on your wisdom and think so much of your leadership. Thanks for showing me the first few steps in how to hunt.

Thanks to Nate Strawn for helping me make decisions at crucial crossroads. Your advice was invaluable and always right.

Thanks to Stan Reeves and Chris Rozell for excellent coaching during my job hunt. I respect the way you have balanced both career and family.

Thanks to a special group of professors at Rice University who inspired my love for mathematics and engineering. Behnaam Aazhang, you are both smart and savvy and also a great leader. Rich Baraniuk, your enthusiasm is infectious, and your teaching and research light so many fires! Don Johnson, your class first taught me what it means to think really hard, and those early lessons have carried me far.

Thanks to a group of guys who helped me hack it through Rice – Shay Harnoy, Patrick MacAlpine, Juan Navarro, and Gary Printy. I am sad to see BCS come to a close. I am so proud to have known and worked with each of you when we were all clueless undergraduates and look forward to seeing the impacts each of you will have in your respective professions. (Shay – Fortune 500 CEO, Patmac – brilliant technical lead, Juan – Federal Reserve executive, Gary Printy – the guy who pulls strings around town!) Also, Shay, is TomNod going to pay for our dinner sometime?

Thanks to Jim Tour for modeling scholarship, faith, and boldness and to Shireen Tour for providing warmth and hospitality week after week.
Your lives are powerful testimonies.

Thanks to Robert Newsome for his steadfast friendship and adventuresome spirit. This thesis would not have been written without your help and hospitality.

Thanks to Sarah Berkey for her support and friendship. I am grateful our lives intersected in Maryland.

Thanks to Ryan Penley and Maryland Cru for the incredible encouragement over these last six years. I can't wait to see what is in store for the next six years.

Thanks to the guys who were crazy enough to slog it out in the gym and on the track with me: Alex Cloninger, James Murphy, Ryan Hunter, Avinash Sahu, Kevin Galloway, Daniel Yin, Chao-wei Chen, Mike Fahey, Pat Brown, Stefan Zavalin, Casey Allred, and Mike Casto. You each have a little bit of iron in your souls.

Thanks to Mike Casto and Kyungjin Yoo for running countless tennis drills with me. I am working hard to get up to a 4.5 rating in Texas!

Thanks to the MATH bible study group: Josh Ballew, Victoria Taroudaki, Bryant Angelos, Clare Wickman, Rui Can, Jonathan Fernandez, Christiana Sabett, Asia Wyatt, Franck Olivier Ndjakou Njeunje. Studying and praying with you has been such nourishment.

Thanks to Kevin Offner and the Intervarsity Graduate Fellowship group. I have learned so much about reconciling faith and work in my day-to-day life.

Thanks to my housemates and friends: Kevin Galloway (captain worthy of respect), Matt Videon (athlete, friend, and hunter of White Buffalo), Eli Tung (thoughtful friend and super chef), Matt Buttner (County, country, and a good guy), Cain Jeffries (good music and good chicken!), Sebastian Van Neste (leader, comrade, and always striving for the best), Andrew Flavin (thinker, diplomat, and brother), Jim Spatz (friend and roommate of the year), Troy Stevens (fun train), Esrael Seyoum (Pad Thai and man of faith), Ruben Amalalo (Are you sure Ghanaians are not Asians?), Sheldon Neuberger (John Mayer, expert computer scientist, and a chill guy), Jadon Ramsing (soccer star with a kind heart), Jeremy Perry (amazing voice and caring teacher), Emmanuel Yeboah (extraordinary work ethic and built like a rock), and Matt Shriver (faithful in everything he does). 5010 will always have a special place in my heart because of you. Great times, great food, and great stories to tell the kids! When's the next housecleaning?

Thanks to Kevin Moore for teaching me how to teach. I deeply respect your practice and am thankful for your friendship.

Thanks to Terrence Long for helping me get through MATH631. I had no business being in that class and would not have made it through without your patient tutoring.

Thanks to AJ Hergenroeder for his steadfast friendship and constant encouragement. I love your intensity and spirit, brother.

Thanks to Austin Bratton for always checking in. You are a faithful friend.

Thanks to Justin Lo for his help small and large. You are always watching out for me, and I am very grateful for your friendship.

Thanks to Jim James McTuttle for helping me get back to Texas. I am grateful for your friendship and the way it challenges me.

Thanks to Jonathan Lugo for all the on-the-fly coding coaching. It has been wonderful to chat with both an expert and a friend.

Thanks to Joaquin Guadarrama. You are a friend in stormy times, and I would not have made it through Lanier without your help. You have my vote, literally and figuratively.

Thanks to Gilbert Wang for cooking a feast for my going-away party. You are going to make a woman very happy one day, and without a doubt, you are solid.
Thanks to William Mantzel for giving me both backroom insights into academia and thoughtful suggestions.

Thanks to Bill Hodges for helping me on a very important drive from DC to Nashville and for constant encouragement through the years. Have you looked at the Comp210 assignment yet?

A special thanks to Alverda McCoy. You are wise, kind, and gracious, and a mom to so many students. I am so grateful to have been under your leadership for five years. I will dearly miss our regular chats and your wise counsel. Thank you for your support, time, and prayers. AMSC is so blessed to have you on board.

Thanks to Konstantina Trivisa for taking a chance and admitting a graduate student mid-year. You have incredible vision and energy and a warm heart. I will miss hearing your voice in the hallways, and I anticipate big things in your future!

Thanks to Angel and Chang Ho for their steadfast support and gracious hospitality. Indianapolis has a special place in my heart, especially Angel's refrigerator and cupboard.

Thanks to Mike and Jenny Holberg for their support and prayers. It has been a joy to see your family grow and grow up!

Thanks to my mom April Lai for her faithful prayers and encouragement through these years. You are a source of strength.

Thanks to my dad Richard Lai for always modeling diligence, patience, and integrity. You have repeatedly earned my respect.

Finally, thanks to Elizabeth Matthews for her constant support and encouragement. You hold up half the sky. I hope one day to dedicate a book to you.

List of Tables

3.1 Number of separated minima of criterion J for V = 5 active microphones out of N and a neighborhood radius R, showing that the vast majority of combinations are locally nonoptimal.
3.2 Comparison of the λ-method vs. the global minimum.
3.3 WTIG for the λ-method and SA.
4.1 Worst interference gains [dB] for: Beamformer, MultiFreq λ-method, SA MultiFreq λ-method, SingleFreq λ-method, SA SingleFreq λ-method.

List of Figures

3.1 Sample Scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{200}{5} \approx 2.5 \times 10^9$ possible sets of 5 microphones to use.
3.2 Gain pattern of a 32-element microphone array at 1000 Hz found by the λ-method. The worst target-to-interference gain is 1.5 dB. The array is down-selected from 1000 uniformly distributed microphones.
3.3 Worst target-to-interference gain for a number of active microphones and different target locations. 32 microphones are sufficient to achieve performance similar to 1000 microphones. The sufficient number of microphones depends on the target location.
3.4 Gain pattern of a 32-element microphone array at 1000 Hz found by simulated annealing. The worst target-to-interference gain is 0.1 dB. The search was initialized by the result of the λ-method.
4.1 Sample Scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{200}{5} \approx 2.5 \times 10^9$ possible sets of 5 microphones to use. The L sources are assumed to be broadband, that is, each source consists of the F frequencies f_1, f_2, ..., f_F. We allow the N filter weights to vary for each of the F frequencies, and hence our optimization problem returns a set of F · N filter weights.
4.2 Gain map for the classical beamformer that maximizes SNR at 1000 Hz.
4.3 Gain map for the beamformer found by the multi-frequency λ-method at 1000 Hz.
4.4 Comparison of mean and variance of maximum gain for a varying number of randomly placed interferences at 1000 Hz.
4.5 Ordered weights found by the multi-frequency λ-method.
5.1 Single channel of a multi-rate filterbank.
5.2 Filterbank for microphone n. Notice that D, the downsampling and upsampling factor, is not necessarily equal to C, the number of channels (subbands).
6.1 A single acoustic source x_r propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse response (FIR) filterbank. The subscript r of x_r indexes the I + 1 sources, which consist of I interference sources, whose overall gain we wish to minimize, and 1 target source, whose gain we wish to be exactly 1 so that our system perfectly reconstructs the target. We index the target source by setting r = 0, and hence the target source is denoted by x_0. The propagation from source x_r to filterbank n is modeled with transfer function P_{r,n}. The prefilter I_{0,n} inverts the propagation from the target source, x_0, to filterbank n. The cascade of the propagation filter P_{r,n} and the target inversion filter I_{0,n} is denoted as H_{r,n}. The outputs of the N filterbanks are summed to give a processed signal y_r.
6.2 FIR multi-rate filterbank for the n-th microphone.
6.3 A single acoustic source x propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse response (FIR) filterbank. The propagation from source x to filterbank n is modeled with transfer function P_n. The prefilter I_{0,n} inverts the propagation from the target source, x_0, to filterbank n. The cascade of the propagation filter P_n and the target inversion filter I_{0,n} is denoted as H_n. The outputs of the N filterbanks are summed to give a processed signal y.
8.1 Room environment.
8.2 Performance on the optimization grid consisting of 1240 interferences, in dB.
8.3 Maximum magnitude of each subchannel's synthesis filter for the λ value that produced the desired number of subchannels.
8.4 Maximum magnitude of each subchannel's synthesis filter after debiasing.
8.5 Locations of active low-frequency subchannels.
8.6 Locations of active high-frequency subchannels.
8.7 The synthesis filters' frequency responses of a filterbank whose low-frequency subchannel alone is active.
8.8 The synthesis filters' frequency responses of a filterbank whose high-frequency subchannel alone is active.
8.9 The synthesis filters' frequency responses of a filterbank whose subchannels are both active.
8.10 The synthesis filters' frequency responses of a filterbank whose subchannels are both inactive.
8.11 Performance on the evaluation grid consisting of 4960 interferences, in dB.

List of Abbreviations

DFT Discrete Fourier Transform
DTFT Discrete Time Fourier Transform
WSS Wide Sense Stationary
WSCS Wide Sense Cyclostationary

Chapter 1: Overview

In this work, we explore the design of audio systems to monitor targets in complex environments. We propose algorithms not only to choose the placement of the microphones but also to process each microphone's sampled signal so as to monitor a target while attenuating other interfering sources.

This work is primarily motivated by industrial needs. For example, engineering managers are interested in monitoring specific bearings on a wind turbine, car manufacturers in the sound of a specific piston, and train conductors in detecting aberrant sounds in a specific wheel set. These monitoring problems would be trivial if any of the following conditions applied:

• Microphones can be placed adjacent to the target of interest.
• The target source is generated in a quiet or interference-free environment.
• The interference sources' locations and signatures are known.

Each of the three previous use cases violates some, if not all, of these conditions. We will assume our environment contains either a large number of interferences with known locations or a small number of interferences with unknown locations. In addition, we will assume we are limited to using only N microphones because of bandwidth constraints.

Furthermore, to make our problem computationally tractable, we will discretize the possible microphone locations so that there is a finite set of candidate positions. Choosing a fixed number of microphone locations from a set of possible locations is by definition a combinatorial problem, and for even a moderately sized problem the number of possibilities can be overwhelming. For example, there are approximately 10^60 possibilities when choosing 32 microphone locations from a possible set of 1000 microphone locations. In comparison, there are approximately 10^50 atoms that make up planet Earth. To make reasonable comparisons in our simulations, we conduct exhaustive searches on small problem setups to discover the global optimum for each setup and compare our algorithm, a convex optimization scheme, against these global "bests".

This work consists of three parts, with each subsequent part building on the previous parts' ideas and algorithms. The content of each part, however, is self-contained and can be read individually without consulting the other parts.

1.1 Part I: Processing Single Frequency Source Using Delay-Scale-Sum Beamformer

In Part I, we explore the case where the acoustic source consists of a single frequency.
We propose an algorithm to find not only the microphone locations but also the corresponding beamforming weights. In other words, we both place the microphones and offer algorithms to process the sampled output of each of the microphones.

Our processing scheme is a basic delay-scale-sum beamformer. Our system takes the output of each of the N microphones, applies a chosen delay and amplitude scaling to each microphone's output, and then sums up the N processed signals to give a final output. In the frequency domain, this delay-and-scale beamforming weight can be represented simply as a scaled complex exponential.

In practice, the processing system works as follows. Each of the N microphones samples the continuous-time signal at an appropriate sampling rate (at or above the Nyquist rate). A Discrete Fourier Transform (DFT) of sufficient length to achieve the needed frequency resolution is then taken on each of the N streams of discrete samples. If the original source signal consisted only of a pure tone and the correct sampling rate and DFT length were chosen, the DFT should produce a set of coefficients with only one non-zero entry. For each of the N sets of DFT coefficients, the system multiplies in the computed beamforming weight at the non-zero frequency bin. We note that the beamforming weights can vary across the N processing streams. The outputs of the N processing streams are then summed to give the final result.

1.2 Part II: Processing Broadband Source Using Delay-Scale-Sum Beamformer

Part II is a direct extension of Part I. Here, we explore the broadband case, where signals are assumed to be a sum of F narrowband signals. Our algorithm again finds the optimal placement of microphones but now also computes beamforming weights for each of the F frequencies of interest.

If the original source signals consisted only of a sum of F pure tones and the correct sampling rate and DFT length were chosen, the DFT should produce a set of coefficients with only F non-zero entries. Our algorithm computes beamforming weights for each of the F non-zero entries for each of the N processing streams.
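To make the narrowband pipeline of Sections 1.1 and 1.2 concrete, here is a minimal numpy sketch of the sample-DFT-weight-sum chain for a single pure tone. It is illustrative only: the sampling rate, tone frequency, delays, and microphone count are made-up placeholders, not values used anywhere in this work.

    import numpy as np

    fs = 8000          # sampling rate [Hz], above Nyquist for the tone below
    f0 = 1000.0        # single-frequency source [Hz]
    T = 1024           # DFT length; f0 * T / fs = 128, so the tone lands on one bin
    N = 4              # number of microphones (tiny, for illustration)
    t = np.arange(T) / fs

    # Simulated microphone streams: the same tone arriving with different delays.
    delays = np.array([0.0, 1e-4, 2e-4, 3e-4])   # seconds
    x = np.array([np.cos(2 * np.pi * f0 * (t - d)) for d in delays])

    X = np.fft.fft(x, axis=1)      # one DFT per microphone stream
    k0 = round(f0 * T / fs)        # the single non-zero (positive-frequency) bin

    # Delay-scale-sum in the frequency domain: each weight is a scaled complex
    # exponential; here it simply compensates each microphone's delay.
    w = np.exp(2j * np.pi * f0 * delays)
    y = np.sum(w * X[:, k0])       # summed output of the N processing streams
    print(abs(y))                  # approx. N * T / 2 when the streams add coherently

The key point of the sketch is that, at a single frequency, "delay and scale" reduces to multiplication by one complex number per microphone, which is exactly the weight the optimization in Part I chooses.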
1.3 Part III: Processing Broadband Source Using Multi-rate Filterbanks

Part III generalizes the microphone placement and processing scheme of Parts I and II. In this part's setup, we now have N multi-rate filterbanks, each processing the output of one of the N microphones. Each of the N filterbanks decomposes the discrete input into C subchannels, and hence our system uses a total of N · C subchannels. The previous constraint of Parts I and II – using only N microphones because of bandwidth limitations – can now be generalized to the constraint of using only N · C subchannels. By selecting which subchannels to process in each filterbank, we can use the output of between N and N · C microphones and still not violate our bandwidth constraint. By using N microphones and all C subchannels of each, we meet our bandwidth constraint of N · C subchannels. By using N · C microphones but choosing to use only one subchannel of each microphone, we also meet our bandwidth constraint of N · C subchannels. In other words, instead of choosing the placement of N microphones out of a set of P possible microphone locations, we choose a subset of N · C subchannels to use out of a possible P · C subchannels. This type of processing scheme is appropriate when deploying microphones is relatively inexpensive but transferring the collected data of each microphone is expensive.

In multi-rate filterbanks, each subchannel is processed by both an analysis and a synthesis filter. We fix the analysis filters to reduce the computational complexity and instead compute only the synthesis filters for each of the chosen N · C subchannels. Computing the filters for the multi-rate filterbanks generalizes computing the beamforming weights in Parts I and II: the DFT and IDFT implementations of Parts I and II can be interpreted as the analysis and synthesis filtering, respectively, and the choice of beamforming weights in Parts I and II corresponds to the choice of the synthesis filters in Part III.

Chapter 2: Notation

We denote both the label and the frequency response of a filter by a capital letter, and its impulse response by the lowercase of the same letter. Hence a filter F has transfer function F and impulse response f.

Summation over all possible indices t is indicated by placing the index variable t below the summation, that is,

$\sum_t = \sum_{t=-\infty}^{\infty}$

The complex number $\sqrt{-1}$ is denoted by the letter j, in contrast to the letter i. Hence, we write Euler's formula as $e^{j\theta} = \cos(\theta) + j\sin(\theta)$.

The complex conjugate of a complex number is denoted by a bar over the number. Hence we have, for all real a and b,

$\overline{a + jb} = a - jb$

Continuous parameters are placed in parentheses, whereas discrete parameters are placed in brackets. Hence, the function f(t) = sin(t), 0 ≤ t < 2π, has a continuous parameter t, whereas f[t] = 3t, t = 0, 1, 2, ..., 5, has a discrete parameter t.

We denote the z-transform of a discrete signal r[k] as $R(z) = \sum_k r[k] z^{-k}$.

The formula for the sum of the first D terms of a geometric series with ratio r ≠ 1,

$\sum_{k=0}^{D-1} r^k = \frac{1 - r^D}{1 - r},$

yields the useful identity

$\sum_{k=0}^{D-1} e^{-j 2\pi \frac{n}{D} k} = \begin{cases} D & \text{if } n \text{ is an integer multiple of } D \\ 0 & \text{otherwise} \end{cases}$  (2.1)

Indeed, setting $r = e^{-j2\pi n/D}$: if D divides n, every term equals 1 and the sum is D; otherwise r ≠ 1 while $r^D = e^{-j2\pi n} = 1$, so the sum vanishes.

In addition, we will let $W_D$ be the D-th root of unity, that is,

$W_D = e^{-\frac{2\pi j}{D}}$  (2.2)

The delta sequence with a single parameter m, $\delta_m$, is defined as follows:

$\delta_m = \begin{cases} 1 & \text{if } m = 0 \\ 0 & \text{otherwise} \end{cases}$  (2.3)

The Kronecker delta symbol with two parameters m, n, $\delta_{m,n}$, is defined as follows:

$\delta_{m,n} = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}$  (2.4)

Matrices, and in turn vectors, will be denoted in boldface. Hence, A refers to a matrix A. The (i, j) entry – the entry in the i-th row and j-th column – of a matrix is denoted $(A)_{i,j}$. Indices of matrices begin with 0; hence $(A)_{2,0}$ refers to the entry in the third row and first column of the matrix A.

$\bar{A}$ denotes the matrix A with each entry complex conjugated. Hence, we have

$(\bar{A})_{i,j} = \overline{(A)_{i,j}}$

$A^T$ refers to the transpose of the matrix A. Hence, we have

$(A^T)_{i,j} = (A)_{j,i}$

$A^*$ refers to the conjugate transpose of the matrix A, obtained by taking the transpose of A and then taking the complex conjugate of each entry. Hence we have

$A^* = \bar{A^T}$

The z-transform of a discrete function x[t], X(z), is defined as follows:

$X(z) = \sum_t x[t] z^{-t}$

The Discrete Time Fourier Transform (DTFT) of a discrete function x[t], X(w), is defined as follows:

$X(w) = \sum_t x[t] e^{-jwt}$

We note that if we set $z = e^{jw}$ in the z-transform, we obtain the DTFT, that is,

$X(e^{jw}) = \sum_t x[t] (e^{jw})^{-t} = X(w)$  (2.5)

Using (2.5), we have

$X(w + 2\pi) = \sum_t x[t] (e^{j(w+2\pi)})^{-t} = \sum_t x[t] (e^{jw})^{-t} \underbrace{e^{-j2\pi t}}_{1} = X(w)$  (2.6)

In other words, the DTFT is 2π-periodic.
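As a quick numerical sanity check of identity (2.1) and the 2π-periodicity (2.6) – not part of the original development, and assuming only numpy – one can verify:

    import numpy as np

    # Identity (2.1): the sum of D-th roots of unity is D when D divides n, else 0.
    D = 8
    for n in range(-16, 17):
        s = sum(np.exp(-2j * np.pi * n * k / D) for k in range(D))
        assert abs(s - (D if n % D == 0 else 0)) < 1e-9

    # (2.6): the DTFT X(w) = sum_t x[t] e^{-jwt} is 2*pi periodic.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)            # a finitely supported discrete signal
    t = np.arange(16)
    dtft = lambda w: np.sum(x * np.exp(-1j * w * t))
    for w in [0.3, 1.7, 2.9]:
        assert abs(dtft(w + 2 * np.pi) - dtft(w)) < 1e-9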
If we again set $z = e^{jw}$ in the z-transform, we have the following relationship:

$X(z W_D^d) = X(e^{j(w - 2\pi \frac{d}{D})}) = X(e^{j(w - w_{D,d})}) = X(w - w_{D,d})$  (2.7)

using the shortened notation

$w_{D,d} = 2\pi \frac{d}{D}$  (2.8)

Equation (2.7) connects the z-transform notation and the DTFT notation when $z = e^{jw}$. Without loss of generality, we will assume that the z-transform series converges in a region around the unit circle $e^{jw}$, except possibly at a finite number of points. We will use the z-transform notation exclusively in this work, with the assumption $z = e^{jw}$.

Convolution between two discrete functions f and g, evaluated at time t, is defined as follows:

$(f * g)[t] = \sum_\tau f[t - \tau] g[\tau]$

A change of variable shows that convolution is commutative, that is, $(f * g)[t] = (g * f)[t]$. A change of variable also shows that if $a[t] = (f * g)[t]$, then

$A(z) = F(z) G(z), \qquad A(w) = F(w) G(w)$

In other words, convolution in the time domain becomes a product in the z-domain and in the Fourier domain.

The Hilbert-Schmidt norm of a matrix A of size M × N is denoted $\|A\|_{HS}$ and is defined as follows:

$\|A\|_{HS}^2 = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} |(A)_{m,n}|^2$  (2.9)

The Hilbert-Schmidt inner product of matrices A and B, both of size M × N, is denoted $\langle A, B \rangle_{HS}$ and is defined as follows:

$\langle A, B \rangle_{HS} = \mathrm{Tr}(A B^*)$  (2.10)

Part I
Processing Single Frequency Source Using Delay-Scale-Sum Beamformer

Chapter 3: Processing Single Frequency Source Using Delay-Scale-Sum Beamformer

3.1 Overview

The main goal of this work is to adaptively employ a large set of microphone sensors distributed in multiple dimensions to scan an acoustic field. Processing data from a large set of sensors will necessarily involve intelligent definition of suitable subsets of sensors active at various times. This paper presents a novel method for optimal beam pattern design for large-scale sensor arrays, using convex and non-convex optimization techniques to define optimal subsets of sensors capable of selecting a target location while suppressing a large number of interferences. The first of the two optimization techniques we present uses a LASSO-type approach to convexify the corresponding combinatorial optimization problem. The second approach employs simulated annealing to search for optimal solutions with a fixed-size subset of active sensors. Our numerical simulations show that for scenarios of practical interest, the convex optimization solution is almost optimal.

3.2 Introduction

Consider a large-scale microphone array having N microphones that monitors a surveillance area. Here N could be on the order of thousands. Data could be collected from all microphones, processed locally, and further communicated to a central processing unit. For example, for N = 1000 microphones and a data sampling rate of 100,000 samples per second, the bandwidth requirement for a system performing exhaustive sampling is 100 Msamples/sec, an overwhelming amount of data for even current hardware systems. Alternatively, the central processing unit could implement a policy of sparse spatial sampling, where only a subset of microphones is polled for data at any time. This may be desirable for practical reasons: the cost (in power consumption, communication, or processing bandwidth) is simply too high for an exhaustive strategy. This part advocates the need for advanced, dynamic control and signal processing of what data should be manipulated in place of the full extent of the data, and it introduces novel strategies and formal analysis of what can be achieved when only a sparse subset of microphones is sampled simultaneously.
Assume a surveillance area that consists of a set of point sources that we wish to distinguish from one another. Our task is to sample any given source location using a maximum of V microphones of the very large array (V ≪ N). We then process the raw data from the V microphones to estimate the point source at that location. Our system can then scan the room by using different V-size subsets of microphones for each possible source location at different times. By doing so, the central processing unit processes data from V microphones at a time and not all N microphones at a time. Figure 3.1 provides a possible setup scenario. We will thus design and analyze sparse spatial sampling strategies with near-optimal performance. Specifically, we develop algorithms that minimize the p-norm of interference gains, with p = 1, 2, ∞, while reconstructing the target source perfectly. Conceptually, the system works in two steps. First, the discovery step estimates the locations of all point-like sources. Second, the system separates source signals, one source at a time. Below we define the problem in precise terms.

Throughout this part, we assume the following hypotheses:

H1. The total number of microphones N and their locations (x, y, z coordinates) are known.
H2. The total number of sources L and their locations (x, y, z coordinates) are known.
H3. The maximum number of microphones active at any time is fixed in advance and denoted by V. In general V ≪ L.
H4. Signals are narrowband and operate at the same frequency f.

In practice, the number of sources and their locations are often unknown. By assuming a large number of known sources and locations, our system can search each possible source location to detect whether or not a real source is present.

Fig. 3.1. Sample Scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{200}{5} \approx 2.5 \times 10^9$ possible sets of 5 microphones to use.

3.3 Problem formulation

Consider the setup described earlier and sketched in Figure 3.1. We assume that both the N microphone and L source locations are known. We use the direct-path model for the propagation from source to microphone, and hence the transfer function between source l and microphone n at frequency f is given by

$H_{l,n}(f) = \frac{e^{-2\pi i (f/c) \|r_l - r_n\|}}{\|r_l - r_n\|}, \quad 0 \le l \le L-1, \; 0 \le n \le N-1$  (3.1)

where $r_n$ and $r_l$ denote the position vectors of microphone n and source l respectively, and c is the wave propagation speed, in our case the speed of sound, 343 m/sec.
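Equation (3.1) translates directly into code. The following numpy sketch builds the full L × N matrix of direct-path transfer functions at one frequency; the room geometry is a made-up placeholder, and transfer_matrix is an illustrative helper name, not notation from the text.

    import numpy as np

    c = 343.0  # speed of sound [m/sec]

    def transfer_matrix(src, mic, f):
        # Direct-path transfer functions of (3.1): H[l, n] from source l to mic n.
        # src: (L, dim) source coordinates; mic: (N, dim) microphone coordinates.
        d = np.linalg.norm(src[:, None, :] - mic[None, :, :], axis=2)
        return np.exp(-2j * np.pi * (f / c) * d) / d

    # Example: random positions inside a 10 m x 8 m room (placeholder geometry).
    rng = np.random.default_rng(1)
    sources = rng.uniform([0.0, 0.0], [10.0, 8.0], size=(193, 2))
    mics = rng.uniform([0.0, 0.0], [10.0, 8.0], size=(200, 2))
    H = transfer_matrix(sources, mics, f=1000.0)   # complex, shape (193, 200)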
Each of the L signals received by the microphone originates from one of the L sources, Sl, and the propagation of the source signal Sl to microphone n is modeled by multiplying the source signal Sl by transfer function Hl,n. Our system estimates one source signal at a time. Hence, one of the sources is declared the target, and the remaining sources are declared interferences. Our system can then estimate all the L sources by repeating this process L times – choosing a target source, declaring the rest of the sources as interferences, and then estimating the target source. Our system is a delay-scale-sum beamformer. In other words, the input to each microphone, xn, is delayed and scaled by some variable amount, and then all the delayed and scaled microphone signals are summed up to give a single final output signal, y. By choosing an appropriate delay and scaling for each microphone, we wish this final output signal to contain an undistorted target source signal and attenuated versions of the other L− 1 interference source signals. The delay and scaling that microphone n performs can be modeled in the frequency domain as multiplication by a weighted complex exponential or equivalently multiplication by a complex number, wn(f). A key observation is that if wn(f) = 0, then microphone n is inactive at frequency f . For simplicity of notation, we assume the zero indexed source is the target and the remaining 15 L− 1 sources are interferences. The STFT of the final output y is given by Y (f, t) = N−1∑ n=0 wn(f)Xn(f, t) = L−1∑ l=0 ( N−1∑ n=0 wn(f)Hl,n(f) ) Sl(f, t) + N−1∑ n=0 wn(f)νn(f, t) (3.3) We observe from (3.3) that the response factor of source l, Sl, is given by Rl(f) = N−1∑ n=0 wn(f)Hl,n(f). (3.4) We note that Rl(f) is a complex scalar that shows how our delay-scale-sum beamformer processes source l. By taking the modulus of Rl(f), we get the gain Kl(f), that is Kl(f) = |Rl(f)| (3.5) The gain Kl(f) is a real non-negative scalar that measures how much our system attenuates or amplifies a given source l. If 0 ≤ Kl(f) < 1, the system attenuates the source l and conversely if Kl(f) > 1, the system amplifies source l. If Kl(f) = 1, the system does not distort the amplitude of source l but may possibly introduce a phase shift to source l. Ideally, for fixed frequency f , we wish to choose N beamforming weights such that Rl(f) = 0 for l = 1, 2, . . . L − 1 and R0(f) = 1. In other words, our system of N microphones completely cancels out all L− 1 interferences while perfectly reconstructing the the target source, source 0. This is achievable in the case where there are more microphones than sources, that is N > L. In general, for fixed frequency f , we wish to choose N beamforming weights such that p-norm of the L − 1 interference responses, Rl(f) for l = 1, 2, . . . L − 1, is small while still achieving perfect reconstruction of the target source, that is R0(f) = 1. We denote the column vector of interference responses, size (L− 1)× 1, as RI,(L−1)×1(f) where 16 RI,(L−1)×1(f) = [R1(f), R2(f), . . . , RL−1] T (3.6) In our setup, each of the L sources propagates to each of the N microphones where the L sources consist of 1 target and L − 1 interferences. For the l-th source, there are N transfer functions modeling the propagation from the l-th source to the N microphones. We denote the row vector of transfer functions for source l, size 1×N , as Hl,1×N (f) where Hl,1×N (f) = [Hl,1(f), Hl,2(f), . . . 
,Hl,N−1(f)] (3.7) Since we have a total of N microphones, there are a total of L×N transfer functions. We denote the source transfer function matrix, size L×N , as HL×N (f) where HL×N (f) =                 H0,1×N (f) H1,1×N (f) H2,l×N (f) ... HL−1,1×N (f)                 (3.8) If we consider only transfer functions corresponding to the L − 1 from interferences, we can delete the first row of (3.8) to get the interference transfer function matrix, size (L− 1)×N , HI,(L−1)×N (f) where HI,(L−1)×N (f) =             H1,1×N (f) H2,l×N (f) ... HL−1,1×N (f)             (3.9) 17 We can denote the column vector of all beamforming weights, size N × 1, as wN×1(f) where wN×1(f) = [w0(f), w1(f), . . . , wN−1(f)] T (3.10) We observe from the definition of response factor, (3.4), that all L−1 interference response factors, RI,(L−1)×1(f) , can be expressed as a matrix-vector multiply between interference trans- fer function matrix HI,(L−1)×N (f) and weight column vector of wN×1(f), that is RI,(L−1)×1(f) = |HI,(L−1)×N (f)wN×1(f)| (3.11) In addition, the target perfect reconstruction requirement, that is R0(f) = 1, can be com- pactly rewritten as 〈wN×1(f), H¯0,1×N (f)〉 = 1, where the bar denotes complex conjugation, to compensate for the complex conjugation in the inner product. Hypothesis H3 limits the number of simultaneously active microphones. Microphone n is considered active if for fixed frequency f , wn(f) 6= 0. Thus an appropriate measure of the number of active microphones for fixed frequency f is the pseudo-norm ‖wN×1(f)‖0 defined by ‖wN×1(f)‖0 = |{n ; wn(f) 6= 0}| (3.12) where |U | represents the cardinal of set U . We can now state our optimization problem as follows: Optimal V -Sparse Beampattern Design Problem for Fixed Frequency f min ‖HI,(L−1)×N (f)wN×1(f)‖p wN×1(f) subject to 〈wN×1(f), H¯0,1×N (f)〉 = 1 ‖wN×1(f)‖0 = V (3.13) 18 Note that this is a non-convex optimization problem due to the l0 pseudo-norm constraint ‖w‖0 ≤ V . 3.4 Optimization Strategies We study two optimization strategies for problem (4.5), which is non-convex and combi- natorial by nature. First, we present a solution we call the λ-method, which solves a regularized version the problem. Second, we present a simulated annealing based method, which has global search capability. 3.4.1 The λ-method This method is inspired by LASSO regularization [1], a regression technique that minimizes the sum of squares of residual errors subject to the l1 norm of the coefficients being less than a con- stant. Furthermore, similar to the sparse signal and model estimation approach in [2], the l0 pseudo-norm (3.12) is replaced by the l1 norm ‖wN×1(f)‖1 = ∑N n=1 |wn(f)| which is then incorporated into the optimization criterion using a Lagrange multiplier λ. The optimization prob- lem (4.5) is first replaced by the following convex optimization problem which is the λ-method: min ‖HI,(L−1)×N (f)wN×1(f)‖∞ + λ‖wN×1(f)‖1 wN×1(f) subject to 〈wN×1(f), H¯0,1×N 〉 = 1 (3.14) As λ increases, the l1 penalty term forces more and more of the weights wn(f) to have very small magnitude, which means that the corresponding microphones are very near inactive. For very large λ, the optimization problem finds a solution with exactly one non-trivial weight and hence exactly one active microphone so that the target perfect reconstruction constraint is fulfilled. As 19 λ decreases, the penalty term becomes less expensive and more microphones become active. 
At the lower limit of λ = 0, all microphones are allowed to be active. We note, however, in general, there is no need for more than min(N,L) active microphones. We note that λ is a data-dependent parameter that needs to be tuned to discover the final subset of microphones that are to be used. An additional step is needed to find the corresponding microphone weights. Bisection method to discover λ that produces V microphones We discover the λ that produces exactly V microphones using the bisection method. The algorithim follows: (1) Choose an upper bound bupper for λ. We note that bupper is user-chosen and data- dependent. The lower bound blower for λ is automatically set to 0. (2) Set λ = (bupper + blower)/2. (3) Solve the optimization problem (3.14) with the λ calculated in step (2). (4) We measure the number of active microphones returned by the solution of step (3) as follows. We note that the the solution of (3.14) is a N -length vector of beamforming weights, i.e. compex numbers. First, we measure how strongly active microphone n is by measuring the magnitude of its beamforming weight wn. Hence, let gn = |wn(f)|, 1 ≤ n ≤ N be the measure of activity of microphone n. Sort the sequence (gn)1≤n≤N so that its values are monotonically decreasing, that is gn1 ≥ gn2 ≥ · · · ≥ gnN . In other words, sort the microphones in terms of how active they are, with the most active microphones first. If gnVλ+1 is 3 orders of magnitude smaller than gn1 , we say that Vλ microphones are active. In other words, if the activity of a microphone is much smaller than the activity of the most active microphone, we consider the microphone inactive. (5) If Vλ > V , set blower = λ and go to step (2). In other words, increase λ since there 20 are too many active microphones. If Vλ < V , set bupper = λ and go to step (2). In other words, decrease λ since there are too few active microphones. If Vλ = V , we are done. The set of active of microphones, A, is then A = {n1, n2, . . . , nVλ} (3.15) Remark The above algorithim assumes monotonicity in the number of active microphones with respect to the tuning parameter λ , that is as λ increases, the number of active microphones is non- increasing. In practice, we find the assumption holds globally but not always locally. While, in general, as λ increases, the number of active microphones decreases. we find that small changes in λ sometimes leads to non-monotonic behavior. For example, a small increase in λ might increase the number of active microphones from 4 to 6 or similarly a small decrease in λ might decrease the number of active microphones from 6 to 4. To handle these cases, we introduce these additional checks in the above algorithm: (1) We keep track of the number of active microphones discovered by the previous λ it- eration and also the previous λ value. Before entering step (5), we check if the current λ itera- tion produces a number of active microphones that exhibits non-monotonic behavior. We define non-monotonic behavior as an increase in the number of active microphones if λ was previously increased and, similarly, a decrease in the number of active microphones if λ was previously de- creased. If non-monotonic behavior has occurred, we immediately return to step (2), and instead of computing the average of blower and bupper, we choose a random number between blower and bupper. 
(2) To guarantee convergence of the algorithm, we also check the following two termination conditions:

• In step (2), if the difference between the new λ and the previous λ falls below a user-defined tolerance, terminate the algorithm.

• In step (5), if the number of iterations exceeds a user-defined maximum number of iterations, terminate the algorithm. An iteration is defined as the completion of one round of steps (2), (3), and (4).

Discovery of final weights using only V microphones

The final weights are obtained by solving (3.14) a second time, restricted to the submatrix indexed by A, with either λ = 0, if the submatrix is well-conditioned, or the same λ otherwise. This step is known as "debiasing". It is needed because the weight vector found by the final step (3), while containing V large weights, also contains N − V weights that are small but still non-zero.
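Putting the pieces together, the bisection loop of this subsection – stripped of the non-monotonicity safeguards and termination checks described above – can be sketched as follows, reusing the illustrative lambda_method helper from the previous sketch:

    import numpy as np

    def bisect_lambda(H, V, b_upper, max_iter=50):
        # Simplified sketch of the bisection of Section 3.4.1.
        b_lower = 0.0
        for _ in range(max_iter):
            lam = (b_upper + b_lower) / 2
            w = lambda_method(H, lam)
            order = np.argsort(np.abs(w))[::-1]      # most active mics first
            g = np.abs(w)[order]
            V_lam = int(np.sum(g > 1e-3 * g[0]))     # step (4): 3 orders of magnitude
            if V_lam == V:
                return lam, order[:V]                # the active set A of (3.15)
            if V_lam > V:
                b_lower = lam    # too many active microphones: increase lambda
            else:
                b_upper = lam    # too few active microphones: decrease lambda
        raise RuntimeError("maximum number of iterations reached")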
By choosing our initial point in this manner, we guarantee that SA will find a solution no worse than the solution discovered by the λ-method. We perturb these V indices in proportion to the current temparature T . If v >= N or v < 0, we set v = v (mod N). By doing so, we make all the new indices valid microphone indices but also introduce the notion that the microphones are arranged in a circular manner, that is the first microphone, microphone 0, and the last microphone, microphone N − 1, are only a distance 1 apart. In our experiments, we use the following SA parameters. The initial temperature is set to 23 100, and the annealing schedule decreases the temperature by 5% each iteration. We limit the length of the Markov walk to a total length of 1200 iterations. After every 100 iterations, the annealing is restarted by setting the temperature back to its initial value of 100. Further, we rerun SA ten times. 3.5 Relation to Prior Work In this section we compare and relate our approach to four problems in the literature: the beam pattern design, co-array design, grid-based beamforming, and compressive sampling. Beampattern Design Lebret and Boyd [4] showed that given arbitrary microphone loca- tions, finding the set of microphone weights that minimizes the maximum interference gain could be formulated as a convex optimization problem. They modeled sources as point sources, com- plex exponentials decaying as a function of distance. Specifically, the problem could be showed equivalent with a Second Order Cone Programming (SOCP) problem and thus efficiently solved by interior point methods [5]. Our approach is similar in spirit to that of Ling et al. [6]. The authors followed the ap- proach of [4] but added an additional l1 norm penalty to the weights. The l1 penalty sparsifies the microphone weights and therefore microphone locations. Specifically, the l1 penalty on weights causes many of the beamforming weights to be close to zero in magnitude. Our work differs in the following three aspects. First, we minimize p-norm of interference gains instead of the side lobes at different incident angles since we are interested in a two dimensional beam pattern. Sec- ond, our simulations consider microphone locations on the perimeter of a rectangular room, rather than only on a straight line. Third, we do not add an l2 penalty on the weights to protect against large gains. Instead, after optimizing over a large set of microphones with a l1 penalty, we select the microphone locations with the largest weight values and then reoptimize over this subset of 24 locations. During the reoptimization, we use the same l1 penalty. The performance of the proposed λ-method is compared against the performance of an exhaustive search, showing how close to optimal the λ-method achieves. The search space of possible microphone configurations is also analyzed by counting the number of local minima. Co-array Design A common approach to choose microphone locations for a variety of problems is to examine the co-arrays formed by the microphone locations [7,8]. Recently, Pal and Vaidyanathan [9] proposed an alternate method to find microphone positions with desired co-array properties through the use of nested uniform arrays. However, the theory of co-prime arrays deals principally with the one dimensional case. The optimization criteria we consider in the present paper can deal implicitly with two or three dimensional array designs. 
3.5 Relation to Prior Work

In this section we compare and relate our approach to four problems in the literature: beam pattern design, co-array design, grid-based beamforming, and compressive sampling.

Beampattern Design. Lebret and Boyd [4] showed that, given arbitrary microphone locations, finding the set of microphone weights that minimizes the maximum interference gain can be formulated as a convex optimization problem. They modeled sources as point sources – complex exponentials decaying as a function of distance. Specifically, the problem can be shown equivalent to a Second Order Cone Programming (SOCP) problem and thus efficiently solved by interior point methods [5].

Our approach is similar in spirit to that of Ling et al. [6]. The authors followed the approach of [4] but added an additional l1 norm penalty on the weights. The l1 penalty sparsifies the microphone weights and therefore the microphone locations. Specifically, the l1 penalty causes many of the beamforming weights to be close to zero in magnitude. Our work differs in the following three aspects. First, we minimize the p-norm of interference gains instead of the side lobes at different incident angles, since we are interested in a two-dimensional beam pattern. Second, our simulations consider microphone locations on the perimeter of a rectangular room, rather than only on a straight line. Third, we do not add an l2 penalty on the weights to protect against large gains. Instead, after optimizing over a large set of microphones with an l1 penalty, we select the microphone locations with the largest weight values and then reoptimize over this subset of locations. During the reoptimization, we use the same l1 penalty. The performance of the proposed λ-method is compared against the performance of an exhaustive search, showing how close to optimal the λ-method comes. The search space of possible microphone configurations is also analyzed by counting the number of local minima.

Co-array Design. A common approach to choosing microphone locations for a variety of problems is to examine the co-arrays formed by the microphone locations [7, 8]. Recently, Pal and Vaidyanathan [9] proposed an alternate method to find microphone positions with desired co-array properties through the use of nested uniform arrays. However, the theory of co-prime arrays deals principally with the one-dimensional case. The optimization criteria we consider in the present work can deal implicitly with two- or three-dimensional array designs.

Grid-based Beamforming. Brandstein and Ward modeled an acoustic enclosure as a rectangular grid of point sources in [10]. Grid points were labeled either as sources or interferences based on prior knowledge. Microphone weights were calculated to maximize the optimization criterion given by the ratio between source gains and interference gains. This prior work did not analyze the relationship between microphone placement and the optimization criterion. The authors extended their algorithm [11] to address room reverberation by allowing sources to lie outside the room, an idea drawn from the "image" model, where reflections of sources off walls are modeled as virtual sources lying outside the room [12].

Compressive Sampling. The widespread compressive sampling problem (see [13, 14]) is to minimize the l0 pseudo-norm of a vector x subject to a linear constraint Ax = b in the absence of noise, or an inequality $\|Ax - b\|_p \le \varepsilon$ in the presence of noise. Using Lagrange multipliers, this problem becomes

$\min_x \|Ax - b\|_p + \lambda \|x\|_0$  (3.17)

where p ≥ 1.

Problem (3.13) can be brought to this form if we make the additional assumption that a specific microphone is active. For simplicity of notation, assume that we know $|w_{N-1}(f)| > 0$, that is, the last microphone is active. We know that at least one microphone is active, since if no microphones were active the target unit gain constraint could not be fulfilled. Next, we solve for $w_{N-1}(f)$ from the target unit gain constraint $\langle w_{N\times 1}(f), \bar{H}_{0,1\times N}(f) \rangle = 1$ and substitute back into $H_{I,(L-1)\times N}(f) w_{N\times 1}(f)$. If we denote by $\tilde{w}_{(N-1)\times 1}(f)$ the (N−1) × 1 column vector consisting of the first N − 1 components of $w_{N\times 1}(f)$, that is,

$\tilde{w}_{(N-1)\times 1}(f) = [w_0(f), w_1(f), \ldots, w_{N-2}(f)]^T,$

and by $A_{L\times(N-1)}(f)$ the L × (N−1) matrix whose (l, n) entry is given by

$\left( A_{L\times(N-1)}(f) \right)_{l,n} = H_{l,n}(f) - \frac{H_{l,N-1}(f) H_{0,n}(f)}{H_{0,N-1}(f)},$

and by $b_{L\times 1}(f)$ the L × 1 column vector whose l-th entry is given by

$(b_{L\times 1}(f))_l = -\frac{H_{l,N-1}(f)}{H_{0,N-1}(f)},$

then (3.13) becomes

$\min_{\tilde{w}_{(N-1)\times 1}(f) : \|\tilde{w}_{(N-1)\times 1}(f)\|_0 = V-1} \|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p$  (3.18)

which can be converted into a problem similar in form to (3.17):

$\min_{\tilde{w}_{(N-1)\times 1}(f)} \|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p + \lambda \|\tilde{w}_{(N-1)\times 1}(f)\|_0.$  (3.19)

A different formulation of (3.19) is to minimize the number of sensors that achieve a given gain level τ, that is,

$\min_{\tilde{w}_{(N-1)\times 1}(f)} \|\tilde{w}_{(N-1)\times 1}(f)\|_0$ subject to $\|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p \le \tau$  (3.20)

The convex relaxation is then given by

$\min_{\tilde{w}_{(N-1)\times 1}(f)} \|\tilde{w}_{(N-1)\times 1}(f)\|_1$ subject to $\|A_{L\times(N-1)}(f) \tilde{w}_{(N-1)\times 1}(f) - b_{L\times 1}(f)\|_p \le \tau$  (3.21)

We note that the existing literature, such as [15], gives guarantees on the quality of the solution found by optimization problem (3.21) if $A_{L\times(N-1)}$ satisfies the Restricted Isometry Property (RIP) [16]. However, in our problem setup, $A_{L\times(N-1)}$ does not satisfy RIP for reasonable numbers of available microphones (e.g., N = 1000), since the microphones are closely spaced and hence the columns of $A_{L\times(N-1)}$ have high similarity.

3.6 Experimental Results

Below we present two types of results. First, we do a comprehensive analysis of the combinatorial optimization problem (3.13) at a tractable size. Second, we study a large-scale setup. All reported gain results are in [dB], defined as $10 \log_{10} \|Hw\|_\infty$.

I). Consider a rectangular room similar to that of Figure 3.1, of size 10 m × 8 m, but with the target located at (1 m, 4 m) and 6200 interferences operating at f = 1 kHz.
Number of separated minima of criterion J for D = 5 active microphones out of N and a neighborhood radius R shows the vast majority of combinations are nonoptimal locally. @ @ @ @ @@N R 1 2 3 4 5 6 7 8 9 10 (N5 ) 10 13 7 3 2 2 1 1 1 1 1 252 12 13 2 1 1 1 1 1 1 1 1 792 14 29 9 5 1 1 1 1 1 1 1 2002 16 112 27 10 4 4 3 2 2 2 2 4368 18 184 49 15 10 8 7 1 1 1 1 8568 20 288 56 24 13 6 6 2 2 2 2 15504 microphones N runs from 10 to 20 but only V = 5 are active microphones. For each combination pi = (pi1, pi2, pi3, pi4, pi5) of 5 microphones out of N we solve the reduced optimization problem J(pi) = min w˜ ‖H˜w˜‖∞ , subject to 〈w˜, ¯˜H0〉 = 1 (3.22) where H˜ is the 6200 × 5 submatrix of H corresponding to the combination pi of active micro- phones. Then we analyze the number of separated (local) minima across the set of (N 5 ) combina- tions. We use the l1 modulo N for distance. We analyze neighborhoods of radius R, with R from 1 to 10. Specifically a combination ρ is a separated minimum if J(ρ) ≤ J(pi) for all combinations pi so that d(ρ, pi) ≤ R. Table 3.1 summarizes the number of separated minima for a given R. Next we apply the λ-method where we tune λ to achieve 5 significant nonzero weights. Due to axial symmetry in cases N=12 and N=14 the 5th and 6th largest weights are equal. In the other cases, 5th largest weight was at least 1000 times larger than the 6th weight. The optimal criterion found by the λ-method is less than 0.75dB higher than the global minimum (see Table 3.2). II). We perform simulations to show the performance and properties of the λ-method, where we aim to separate sounds from a known target location and unknown interference locations. To account for the unknown locations we consider a uniform grid of 100×80 interferences. We refer to this grid as the optimization grid. All experiments assume a rectangular room of dimension 28 Table 3.2. Comparison λ-method vs. global minimum. N 10 12 14 16 18 20 λ-method [dB] -0.86 0.94 0.55 -1.39 -0.95 -1.55 Global optim [dB] -1.31 0.54 0.54 -1.6 -1.68 -1.62 10m x 8m and N = 1000 microphones. We disregard interferences that are closer than 0.5m to the source or the walls of the room and report the worst target to interference gain (WTIG) as performance measure. First, we use the λ-method to select a subset of size V = 32 microphones from the original 1000 that achieves the lowest WTIG for a 1kHz target signal at position (x, y) = (3, 4) for a single path reflection model on a fine 1000 × 800 evaluation grid. This is achieved as discussed in Section 4.4. The resulting gain beam pattern is illustrated in Fig. 3.2. Here, the WTIG is found is 1.5dB. The disregarded regions close to the walls and the target are indicated by white coloring. Note that both inferred microphone locations are beam pattern are symmetric. Also, the beam pattern shows ray-like regions of higher gains “emitting” from pairs of microphones. These rays overlap in a 2D design to archive a high gain at the target and to suppress signals at other locations in the room. Next, we analyze performance as a function of the number of selected microphones V and the target location. Figure 3.3 illustrates the WTIG of a 1kHz target signal at positions (1,4) and (3,4) for D = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000 active microphones and λ = 0.1. This shows that a small subset of e.g., 32 microphones can be sufficient to achieve performance similar to the solution of all 1000 microphones. 
II). We perform simulations to show the performance and properties of the λ-method, where we aim to separate sounds from a known target location and unknown interference locations. To account for the unknown locations we consider a uniform grid of 100 × 80 interferences, which we refer to as the optimization grid. All experiments assume a rectangular room of dimension 10 m × 8 m and N = 1000 microphones. We disregard interferences that are closer than 0.5 m to the source or to the walls of the room, and report the worst target-to-interference gain (WTIG) as the performance measure.

First, we use the λ-method to select a subset of V = 32 microphones from the original 1000 that achieves the lowest WTIG for a 1 kHz target signal at position (x, y) = (3, 4) under a single-path reflection model, evaluated on a fine 1000 × 800 evaluation grid. This is achieved as discussed in Section 4.4. The resulting gain beam pattern is illustrated in Fig. 3.2; here, the WTIG found is 1.5 dB. The disregarded regions close to the walls and the target are indicated by white coloring. Note that both the inferred microphone locations and the beam pattern are symmetric. Also, the beam pattern shows ray-like regions of higher gain "emitted" from pairs of microphones. These rays overlap in a 2-D design to achieve a high gain at the target and to suppress signals at other locations in the room.

Next, we analyze performance as a function of the number of selected microphones V and the target location. Figure 3.3 illustrates the WTIG of a 1 kHz target signal at positions (1, 4) and (3, 4) for D = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000 active microphones and λ = 0.1. This shows that a small subset of, e.g., 32 microphones can be sufficient to achieve performance similar to the solution using all 1000 microphones. However, it also shows that the number of microphones necessary for close-to-optimal performance depends on the target location. That is, while sub-selecting 32 microphones results in a WTIG increase over the 1000-microphone solution of 0.1 dB for position (1, 4) on the evaluation grid, it results in a WTIG increase of 4.8 dB for position (3, 4).

Fig. 3.2. Gain pattern of a 32-element microphone array at 1000 Hz found by the λ-method. The worst target-to-interference gain is 1.5 dB. The array is down-selected from 1000 uniformly distributed microphones.

Finally, we evaluate the quality of the result of the λ-method by comparing it with the results of the SA method. This is necessary due to the non-convex nature of the optimization problem (4.5), which suggests that our approach could converge to a sub-optimal local minimum. Both the λ-method and the SA method use the optimization grid to find the microphone locations and weights. SA is initialized at the final microphone locations of the λ-method to reduce processing time. The result of this comparison is given in Table 3.3. The small improvement from SA illustrates the quality of the λ-method result. In Fig. 3.4, the gain pattern is no longer symmetric but achieves a better WTIG of 0.1 dB.

Table 3.3 and Fig. 3.3 also show that the microphone selection and weights obtained using the optimization grid perform sub-optimally on the finer evaluation grid. That is, while Table 3.3 reports a WTIG of −12.71 dB for position (1, 4) when the evaluation is based on the optimization grid, the same position only achieves a WTIG of −4.79 dB when the evaluation is based on the finer evaluation grid. Note that it is computationally infeasible to use the fine evaluation grid for optimization with the λ-method.

Fig. 3.3. Worst target-to-interference gain versus the number of active microphones for different target locations. 32 microphones are sufficient to achieve performance similar to 1000 microphones; the sufficient number of microphones depends on the target location.

Table 3.3. WTIG [dB] for the λ-method and the SA method.

                                  Pos. (1,4)   Pos. (3,2.5)   Pos. (3,4)
Opt. Grid (100×80)     λ-method    -12.71        -7.35          -5.73
                       SA          -13.00        -7.66          -5.73
Eval. Grid (1000×800)  λ-method     -4.79         0.30           1.5
                       SA           -5.34        -0.14           0.1

Fig. 3.4. Gain pattern of a 32-element microphone array at 1000 Hz found by simulated annealing. The worst target-to-interference gain is 0.1 dB. The search was initialized with the result of the λ-method.

3.7 Conclusions

Our problem is acoustic scene understanding, which subsumes a number of high-level tasks such as localization of acoustic sources, tracking of features of the sources, and even source separation. We aim to utilize a very large number of microphones distributed spatially in the volume of interest. This paper discusses optimal beam patterns for very large microphone arrays. It shows that only a sparse active subset of microphones is sufficient at any given time. The equivalent problem for a small set of microphones would be element spacing. The paper formalizes the problem of defining active subsets of microphones capable of zooming into a target location while suppressing a large number of interferences.
It shows that near-optimal subsets are solutions of a convex approximation of the originally stated non-convex criterion using a LASSO-inspired approach. Future work will show how solutions representing multiple active subsets of a very large multidimensional microphone array could be adaptively employed in order to locate and separate sources of interest.

Part II: Processing Broadband Source Using Delay-Scale-Sum Beamformer

Chapter 4: Processing Broadband Source Using Delay-Scale-Sum Beamformer

4.1 Overview

Consider a sensing system using a large number of N microphones placed in multiple dimensions to monitor a broadband acoustic field. Using all the microphones at once is impractical because of the amount of data generated. Instead, we choose a subset of V microphones to be active. Specifically, we wish to find the set of V microphones that minimizes the p-norm of the interference gains at multiple frequencies while monitoring a target of interest, with p ∈ {1, 2, ∞}. A direct, combinatorial approach (testing all N choose V subsets of microphones) is impractical because of the problem size. Instead, we use a convex optimization technique that induces sparsity through an $\ell_1$-penalty to determine which subset of microphones to use. We test the robustness of our solution through simulated annealing and compare its performance against a classical beamformer that maximizes SNR. Since switching from one subset of D microphones to another at every sample is possible, we construct a space-time-frequency sampling scheme that achieves near-optimal performance.

4.2 Introduction

Consider a large-scale microphone array having N microphones that monitors a surveillance area. Using all microphones simultaneously may be unreasonable in terms of power consumption and data processing. For example, for N = 10000 microphones and a data sampling rate of 100000 samples per second, the bandwidth requirement is 1 Gsample/sec. Instead, we could poll only a subset of D microphones at any one given time. The N-choose-D choices of microphones allow for a myriad of microphone configurations, and the task is then to choose a subset that achieves our objective. Alternatively, our problem is to place D microphones in a constrained region of space. We convert this non-convex optimization problem into a combinatorial problem by discretizing the possible set of microphone locations. In this context our approach can be seen as an optimal test design.

Assume the surveillance area consists of a set of point-like sources. We seek designs that minimize the p-norm of interference gains across frequencies from a potentially very large number of locations while still maintaining unit target gain at each frequency. Throughout this paper, we assume the following hypotheses:

H1. Microphone locations, that is, their (x, y, z) coordinates, are known. Their locations, however, can be arbitrary.

H2. The number of real interferences, their locations, and their spectral powers are unknown.

H3. The maximum number of microphones active at any time, V, is fixed in advance.

H4. All sources, both target and interferences, are broadband; that is, each source consists of F frequencies $f_0, f_1, \ldots, f_{F-1}$.

When microphones have local computational processing power, we make an additional hypothesis:

H5. Microphones can band-pass signals, and they can switch data transmission on a sample-by-sample basis.

Under H5, the system can selectively sample the space-time-frequency domain.
Our system divides the surveillance area into a large number of small, non-overlapping target areas. For each fixed target area, we find a subset of microphones that minimizes the p-norm of the broadband gains of a large number of virtual interferences, with p ∈ {1, 2, ∞}. By doing so, the system is robust to a wide variety of unknown interference configurations. In addition, we note that the system can choose a different set of microphone locations for each of the three different p-norms.

After finding a set of V microphones to monitor each of the individual target areas, the system begins to iteratively scan each target area by polling the predetermined set of V microphones for the corresponding target area. We note that the V microphones need not be unique for each target area; rather, the same microphone can be reused across different V-microphone configurations as target areas vary. By polling a set of V microphones for each target area, we can measure the signal strength in each target area and hence discover and monitor the actual targets.

Our goal then is to find, for a fixed target area, the optimal set of V microphones and their corresponding broadband weights that minimize the p-norm of the broadband gains of the interferences while maintaining unit target gain at each frequency, with p ∈ {1, 2, ∞}.

Fig. 4.1. Sample scenario: L = 193 sources (1 target and 192 interferences), N = 200 microphones. The 200 microphones are placed on the walls of a 10 m by 8 m room. To monitor the target located at (3 m, 4 m), we wish to use only V microphones instead of all N so as not to overwhelm our central processing unit with data. If we set V = 5, there are $\binom{N}{V} = \binom{100}{5} \approx 7.5\times 10^7$ possible sets of 5 microphones to use. The L sources are assumed to be broadband; that is, each source consists of the F frequencies $f_1, f_2, \ldots, f_F$. We allow the N filter weights to vary for each of the F frequencies, and hence our optimization problem returns a set of $F \cdot N$ filter weights.

4.3 Problem formulation

Consider the setup described earlier and sketched in Figure 4.1. We assume we know the microphone locations and then fix a target area. Since the number, location, and power distribution of the interferences are unknown, we assume a large number of virtual interferences, say L. To account for possible reflections off of walls, we include virtual interferences outside our surveillance area. Our objective is to minimize the p-norm of the interference broadband gains with p ∈ {1, 2, ∞}.

Assume the zero-indexed source is the target, and the remaining L sources are interferences. For these L + 1 "virtual" sources we use the direct-path model, where the transfer function between source l and microphone n at frequency $f_i$ is given by

$$H_{l,n}(f_i) = \frac{e^{-2\pi i \frac{f_i}{c}\|r_l - r_n\|}}{\|r_l - r_n\|}, \qquad 0 \le l \le L,\ 1 \le n \le N \qquad (4.1)$$

where $r_n$, $r_l$ denote the position vectors of microphone $n$ and source $l$ respectively, and $c$ is the speed of sound, 343 m/sec. The direct-path model assumes that the signal amplitude decays in proportion to distance, and that the phase varies cyclically in proportion to distance and frequency and in inverse proportion to the propagation speed.
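As an illustration of the direct-path model (4.1), here is a small helper that assembles the transfer matrix for given source and microphone positions; the function name and the 2-D toy geometry are ours, not the thesis code.

```python
# Direct-path transfer matrix of (4.1), assuming positions in meters.
import numpy as np

C_SOUND = 343.0  # speed of sound [m/s]

def transfer_matrix(src_pos, mic_pos, f):
    """H[l, n] = exp(-2*pi*1j*(f/c)*d) / d with d = ||r_l - r_n||."""
    d = np.linalg.norm(src_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.exp(-2j * np.pi * f / C_SOUND * d) / d

sources = np.array([[3.0, 4.0], [7.0, 2.0]])   # target plus one interference
mics = np.stack([np.linspace(0, 10, 5), np.zeros(5)], axis=1)  # along a wall
H = transfer_matrix(sources, mics, f=1000.0)
print(H.shape)   # (2, 5): L+1 sources by N microphones
```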
Let $w_n(f_i)$ denote the processing weight for microphone $n$ at frequency $f_i$. In other words, we assume that each microphone applies a delay and a scaling in the time domain to the signal at each frequency of interest $f_i$. This can be accomplished by splitting the signal into frequency bands via bandpass filters and operating on each bandlimited signal individually. A key observation is that if $w_n(f_i) = 0$, then microphone $n$ is inactive at frequency $f_i$.

For a fixed interference $l$ and frequency $f_i$, the system processing response to source $l$ of our delay-scale-sum system, $R_l(f_i)$, is given by

$$R_l(f_i) = \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i). \qquad (4.2)$$

We note that for a fixed frequency $f_i$, each of the N microphones receives a "copy" of source $l$ that has been distorted by a transfer function $H_{l,n}$ depending on the distance between microphone $n$'s and source $l$'s locations. Each microphone $n$'s copy is processed by a weight $w_n$, and all N processed, distorted copies are summed to give the system-processed source $l$. $R_l(f_i)$ is then a complex scalar that is exactly the delay-scale-sum system's effect on source $l$ at frequency $f_i$. By taking the modulus of $R_l(f_i)$, we get the gain $K_l(f_i)$, that is,

$$K_l(f_i) = |R_l(f_i)| \qquad (4.3)$$

The gain $K_l(f_i)$ is a real non-negative scalar that measures how much our system attenuates or amplifies a given source $l$ at frequency $f_i$. If $0 \le K_l(f_i) < 1$, the system attenuates source $l$; conversely, if $K_l(f_i) > 1$, the system amplifies source $l$. If $K_l(f_i) = 1$, the system does not distort the amplitude of source $l$ but may introduce a phase shift to source $l$ at frequency $f_i$.

Assume that we have F distinct frequencies of interest, $f_0, f_1, \ldots, f_{F-1}$. Given F sets of N microphone weights, each interference $l$ has F different gains, one for each frequency. Note that for each of the F frequencies, we use a different set of N microphone weights to calculate the interference gains at that frequency. Our objective is to minimize the p-norm, over the interference sources, of the gains summed across the F frequencies, while still achieving unit target response at each frequency, with p ∈ {1, 2, ∞}. We call the sum of an interference's gains across the F frequencies the interference's broadband gain. Hence, the broadband gain for source $l$ at the frequencies of interest $f_0, f_1, \ldots, f_{F-1}$, a real non-negative scalar, is given by

$$K_l(f_0, f_1, \ldots, f_{F-1}) = \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| \qquad (4.4)$$

Assumption H3, however, limits us to V simultaneously active microphones. In this multi-frequency setup, when H5 is not satisfied, microphone $n$ becomes active if $w_n(f_i) \ne 0$ for any $i$. Thus, the number of non-zeros in the N-dimensional vector $\{\max_{0\le i\le F-1} |w_n(f_i)|\}_{0\le n\le N-1}$ is an appropriate measure of the number of active microphones. In other words, if a fixed microphone $n$ has a non-zero weight for any of its F filter weights, the microphone is considered active. Let $\|w\|_0$ denote the pseudo-norm that counts the number of non-zeros in the vector $w$. We can now state our optimization problem:

$$\min_{(w_n(f_i))_{i=0,n=0}^{F-1,N-1}}\ \Big\| \Big\{ \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| \Big\}_{1\le l\le L} \Big\|_p$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \quad \text{for } i = 0, 1, \ldots, F-1$$
$$\Big\| \Big\{ \max_{0\le i\le F-1} |w_n(f_i)| \Big\}_{0\le n\le N-1} \Big\|_0 \le V \qquad (4.5)$$

Note that this is a non-convex optimization problem due to the $\ell_0$ pseudo-norm constraint. When hypothesis H5 is satisfied, we can solve the optimization problem (4.5) independently for each frequency and then implement adaptive space-time-frequency sampling.
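The broadband gain (4.4) and the active-microphone count in (4.5) are straightforward to evaluate numerically. A minimal sketch, assuming a transfer tensor H of shape (F, L+1, N) with the target in row 0 and random placeholder weights:

```python
# Evaluating the broadband gains (4.4) and the sparsity measure of (4.5).
import numpy as np

rng = np.random.default_rng(2)
F, L, N = 4, 100, 30
H = rng.standard_normal((F, L + 1, N)) + 1j * rng.standard_normal((F, L + 1, N))
W = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

R = np.einsum("fln,fn->fl", H, W)           # responses R_l(f_i), eq. (4.2)
broadband = np.abs(R[:, 1:]).sum(axis=0)    # K_l summed over frequencies
active = np.count_nonzero(np.abs(W).max(axis=0) > 1e-12)
print("worst broadband interference gain:", broadband.max())
print("active microphones:", active)
```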
4.4 Convex Optimization Strategy

Our method is inspired by LASSO regularization [1], a regression technique that minimizes the sum of squares of residual errors subject to the $\ell_1$ norm of the coefficients being less than a constant. Similar to the sparse signal and model estimation approach in [2], the $\ell_0$ pseudo-norm is replaced by the $\ell_1$ norm $\|w\|_1 = \sum_{n=0}^{N-1} |w_n(f)|$, which is then incorporated into the optimization criterion using a Lagrange multiplier λ. The optimization problem (4.5) is then replaced by the following convex optimization problem, which we call the λ-method:

$$\min_{(w_n(f_i))_{i=0,n=0}^{F-1,N-1}}\ \max_{1\le l\le L} \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| + \lambda \sum_{n=0}^{N-1} \max_{0\le i\le F-1} |w_n(f_i)|$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \quad \text{for } i = 0, 1, \ldots, F-1 \qquad (4.6)$$

For very large λ, the penalty term forces many of the microphones to become inactive. Specifically, let $W_{N\times F}$ be the N by F matrix of microphone weights produced by the optimization, where a row represents the F filter weights of an individual microphone. With a large λ penalty, many of the rows of $W_{N\times F}$ contain only values very close to zero. If the n-th row contains only such small values, microphone n is inactive. As λ decreases, the penalty term becomes less expensive and more microphones become active. In the limit λ = 0, all microphones are allowed to be active. We fine-tune λ until we get V non-zero components. Specifically, this is accomplished when $g_{(V+1)}$ is significantly smaller than $g_{(1)}$ (e.g., by 3 orders of magnitude), where $g_n = \max_{0\le i\le F-1} |w_n(f_i)|$ and $g_{(i)}$ is the i-th largest element in magnitude of the set $\{g_n\}$, $n = 0, 1, \ldots, N-1$. We then solve (4.6) using this λ. The final weights are obtained by solving (4.6) a second time, restricted to the V microphones and with λ = 0, commonly known as the debiasing step [1].

4.5 Varying Microphone Support Over Frequencies

When we optimize over multiple frequencies (4.6), the support of the chosen microphones remains fixed over the frequencies. However, we can also run (4.6) for each individual frequency of interest $f_i$; that is, we solve the optimization problem

$$\min_{(w_n(f_i))_{n=0}^{N-1}}\ \max_{1\le l\le L} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| + \lambda_i \sum_{n=0}^{N-1} |w_n(f_i)|$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \qquad (4.7)$$

We have to run (4.7) a total of F times, once for each frequency of interest $f_i$. A run of (4.7) for frequency $f_i$ means iteratively choosing $\lambda_i$ until we discover a $\lambda_i$ that produces V active microphones. We note that every time we test a new candidate $\lambda_i$, the optimization problem (4.7) needs to be solved. Each run of (4.7) for fixed $f_i$ produces a different set of V microphones, and hence the support of the microphones can vary over frequencies.

If the frequencies are chosen with uniform spacing, we can use non-uniform sampling to reconstruct the signals of interest at each frequency. Specifically, assume we have a total of F frequency bands (e.g., F = 4 as in the results below). Then each microphone has an additional F-channel filter bank, each channel processing a disjoint frequency band. The filter bank outputs are downsampled by F, and the corresponding sample streams are sent according to the transmission policy obtained in the optimization step. The central controller reconstructs the broadband signal by putting together the disjoint frequency bands.

Another method of allowing the microphone supports to vary over frequency is to solve the following optimization problem:

$$\min_{(w_n(f_i))_{i=0,n=0}^{F-1,N-1}}\ \max_{1\le l\le L} \sum_{i=0}^{F-1} \Big| \sum_{n=0}^{N-1} H_{l,n}(f_i)\, w_n(f_i) \Big| + \sum_{i=0}^{F-1} \lambda_i \sum_{n=0}^{N-1} |w_n(f_i)|$$
$$\text{subject to} \quad \sum_{n=0}^{N-1} H_{0,n}(f_i)\, w_n(f_i) = 1 \quad \text{for } i = 0, 1, \ldots, F-1 \qquad (4.8)$$

Every time (4.8) is solved, F different $\lambda_i$ must be chosen. The question then arises of how to iteratively choose the F different $\lambda_i$ so that V active microphones are selected for each frequency. One possible approach for tuning a single λ is sketched below.
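The following is a hedged sketch of tuning a single λ in the per-frequency problem (4.7) by bisection on a logarithmic grid; the CVXPY formulation, thresholds, and bracket are assumptions rather than the thesis implementation.

```python
# Bisection on lambda for the per-frequency lambda-method (4.7).
import numpy as np
import cvxpy as cp

def lam_method(H, h0, lam):
    """Solve (4.7) for one frequency; H is L x N, h0 the target row."""
    w = cp.Variable(H.shape[1], complex=True)
    obj = cp.norm(H @ w, "inf") + lam * cp.norm(w, 1)
    cp.Problem(cp.Minimize(obj), [h0 @ w == 1]).solve()
    return w.value

def count_active(w, rel=1e-3):
    return int(np.sum(np.abs(w) > rel * np.abs(w).max()))

def tune(H, h0, V, lo=1e-4, hi=1e2, iters=30):
    for _ in range(iters):            # larger lambda -> fewer active mics
        mid = np.sqrt(lo * hi)        # geometric bisection
        k = count_active(lam_method(H, h0, mid))
        if k > V:
            lo = mid
        elif k < V:
            hi = mid
        else:
            return mid
    return mid
```

Once the tuned λ is found, the debiasing step simply re-solves (4.7) with λ = 0 on the selected support.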
We also note that (4.8) differs from (4.7) in that the objective function includes the broadband gain.

4.5.1 Varying λ1 and λ2 For Two Frequencies

We are interested in the pair of lambda values that produces exactly the same number of significant microphone weights for each of the two frequencies, where one lambda is the scalar multiplier of the $\ell_1$ norm of the microphone weights for the first frequency and the other lambda is, similarly, the scalar multiplier of the $\ell_1$ norm of the microphone weights for the second frequency within the objective function. In other words, we have a fixed number of microphones for each frequency, but the microphone positions and weights are allowed to vary with frequency. For example, if we had 100 possible microphone locations, we would then have 200 total microphone weights to optimize, 100 weights for each of the two frequencies. We want to find the pair of lambda values that produces 32 significant weights among the 100 weights corresponding to frequency 1 and 32 significant weights among the 100 weights corresponding to frequency 2. A weight is considered significant if its magnitude is greater than 0.1% of the largest magnitude of the weights for that frequency.

We note that the optimization problem consists of an objective function and a constraint. The objective function is the sum of three terms: the p-norm of the interference gains summed across the two frequencies, the first lambda multiplied by the $\ell_1$ norm of the microphone weights corresponding to frequency one, and the second lambda multiplied by the $\ell_1$ norm of the microphone weights corresponding to frequency two. In addition, we have the constraint that the microphone weights produce unit gain for the target at both frequencies.

We also note that by increasing the lambda value corresponding to a single frequency, the number of significant weights for that frequency decreases, and similarly, by decreasing the lambda value corresponding to a single frequency, the number of significant weights for that frequency increases. What is unknown is the effect of changing the lambda for a given frequency on the number of significant weights for the other frequency. To gain intuition on the effect of varying one lambda on the other frequency's number of significant weights, we run the following experiment.

We fix the possible microphone locations, the target, the interferences, and the two frequencies. We then vary the two lambda values within the objective function. For example, if we wish to test the lambda values 0, 1, and 100, we would have $3^2 = 9$ lambda pairs to test. For each pair of lambdas we test, we collect four values: the number of significant weights for each of the two frequencies and the $\ell_1$ norm of the microphone weights for each of the two frequencies. We specify the numbers of significant microphone weights per frequency that we are interested in; in our case, 5, 8, and 32 microphone weights out of 100.

For example, for 32 significant microphones, we plot the contour lines for 32 microphones for each of the two frequencies on the same plot, with the x-axis and y-axis representing the lambda values tested. We have then plotted two different contour lines, one for each frequency. We find the intersection of the two contour lines, and for that intersection point, a specific pair of lambdas, we look up the corresponding $\ell_1$ norms of the weights for each of the two frequencies. This yields two different $\ell_1$ norms.
One is the $\ell_1$ norm of the weights for frequency one and the other is the $\ell_1$ norm of the weights for frequency two. We note that quantization (round-off error) is involved in finding this intersection point, since not all possible lambda values between the smallest and largest lambda, an uncountably infinite set, are tested. The intersection point is chosen to be the pair of tested lambda values closest in distance to the intersection of the two contour lines, where the contour lines are themselves interpolated curves. We then plot the contour lines for these two $\ell_1$ norms for both frequencies on the same plot, with the x-axis and y-axis representing the lambda values tested. This yields four different contour lines.

4.6 Simulated Annealing

We test the robustness of the solution found by convex optimization through simulated annealing (SA). Simulated annealing is a simple randomized technique for iterative improvement introduced in [3]. SA probabilistically accepts worse transitions in order to avoid local minima. In our case, SA minimizes, over the locations of the microphones in a fixed-size subset, the objective function given by the largest gain over all interference positions. More precisely, given a fixed number D of microphones, we run the same convex optimization problem as in the λ-method (i.e., find the filter weights that minimize the maximum valid interference gain over a coarse grid while maintaining a target gain of unity) but with λ set to 0. The objective function value f is then the largest gain over the valid interferences. If the optimization program fails for any reason, the objective function value is set to infinity. The initial temperature is set to 100, and the annealing schedule decreases the temperature by 5% each iteration. We limit the length of the Markov walk to a total of 1200 iterations. The initial search point is the configuration produced by the λ found through an iterative binary search that yields D = 32 microphones. We assume that the target location, room size, and frequency of interest are fixed. A sketch of this annealing loop is given below.
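The annealing loop described above can be written compactly. The move rule (swap one active microphone for a random inactive one) is our illustrative choice; the temperature schedule and iteration count follow the text, and `objective` is assumed to wrap the λ = 0 convex solve.

```python
# Simulated annealing over microphone subsets; objective(subset) returns the
# largest valid interference gain (or infinity on solver failure).
import numpy as np

def anneal(objective, subset0, n_total, rng, T=100.0, cool=0.95, iters=1200):
    cur = list(subset0)
    f_cur = objective(cur)
    best, f_best = list(cur), f_cur
    for _ in range(iters):
        cand = list(cur)
        i = rng.integers(len(cand))                  # pick one active mic
        cand[i] = rng.choice([m for m in range(n_total) if m not in cand])
        f_cand = objective(cand)
        # accept improvements always, worse moves with Boltzmann probability
        if f_cand < f_cur or rng.random() < np.exp(-(f_cand - f_cur) / T):
            cur, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = list(cur), f_cur
        T *= cool                                    # 5% cooling per step
    return best, f_best
```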
4.7 Relation to Prior Work

In this section, we review prior work in relation to our own. The author of [17] derives a probabilistic expression for the peak side lobe level of a linear array with randomly placed elements. The elements' weights are uniform in magnitude, but their phases are chosen to maximize the main lobe in the direction normal to the array. In our case, we allow weights to vary in both magnitude and phase.

In [18], the authors address the problem of finding the sparsest linear array given a desired beam pattern and error tolerance. The elements' phases are fixed and their magnitudes are allowed to vary. They furthermore assume symmetry in their linear array so that their optimization variables are real. Analogously to our setup, they formulate an $\ell_0$ quasi-norm objective to count the number of active array elements by counting the number of non-zero weight values. The number of possible active array elements (N in our case) is large but finite, to make the problem computationally tractable. They convert the objective to an $\ell_p$ norm objective with 0 < p < 1 and solve the optimization problem using a simplex search.

In [19], the author assumes a symmetric linear array setup and formulates an optimization problem to find the real weights that minimize the peak side lobe level for fixed sensor locations. He then proposes another optimization problem that finds both sensor weights and sparse locations simultaneously. To minimize the number of sensors, he formulates an $\ell_0$ quasi-norm objective that counts the number of active array elements by counting the number of non-zero weights. The optimization problem also includes a user-specified constraint on the maximum allowable peak side lobe value. He solves this combinatorially difficult problem for small sizes using a branch-and-bound method. In our case, we work with 2-D arrays and optimize over complex weights. In addition, we fix the number of active microphones rather than trying to minimize it, and we seek to minimize the maximum interference gain in the case p = ∞. We relax the $\ell_0$ quasi-norm to an $\ell_1$ norm to discover microphone locations.

In [20], the authors propose two greedy deletion algorithms and an $\ell_1$-minimization algorithm to find a sparse linear array containing the desired number of elements. First is the "minimum increase" algorithm: for a given array, an element is removed and the performance of the remaining elements is measured; this procedure is repeated for each element of the array, and the element that affects the performance the least is deleted. This is repeated iteratively until the desired number of array elements remains. Second is the "smallest coefficient" algorithm, where the array element with the smallest weight is deleted; this too is repeated iteratively until the desired number of array elements remains. The third algorithm relaxes an $\ell_0$ pseudo-norm constraint to an $\ell_1$ norm to get a sparse solution, then chooses the elements with the V largest weight magnitudes, where V is the desired number of active elements; finally, it finds new weights for the chosen V elements, a debiasing step. Our algorithm is similar in spirit to the third algorithm, but instead we rerun the optimization problem, tuning a sparsity parameter λ via a bisection method, until the $\ell_1$-norm minimization problem produces the desired number V of microphones.

In [21], the authors use the theory of difference sets to place elements. In addition, they propose the idea of "spatial hopping," wherein one part of the array forms one desired beam pattern and another part of the array forms another desired beam pattern. This idea is analogous to our idea of using different subsets of microphones to focus on targets at different locations.

The authors of [22] solve the problem of finding the maximally sparse linear array that matches a given reference beam pattern within a specified tolerance by converting it to the problem of finding the maximally sparse linear array that matches the reference beam pattern with the highest a posteriori probability. They solve the converted problem via Bayesian compressive sampling. Their algorithm depends on an initial estimate of the error variance, $\sigma_0^2$.

4.8 Experimental Results

We run experiments by optimizing over a simpler model and then evaluating over a more sophisticated model. Our optimization model is as follows. The room size is 10 m by 8 m. The target of interest is located at (3 m, 4 m). There are 1000 possible microphone locations along the perimeter of the rectangular room. We optimize over the four frequencies 250, 500, 750, and 1000 Hz. There are 6200 virtual interferences, and a direct-path model is used to calculate the transfer functions.
We do not place these interferences within 0.5 m of the perimeter of the room or of the target. The evaluation model differs from the optimization model in two ways: it uses a denser set of 620000 interferences, and we include reflections for each of them.

We run five types of experiments to compare the performance of the λ-method. First, we run the optimization problem (4.6), simultaneously optimizing over the four frequencies; this setup fixes the support of the microphone subset across all 4 frequencies. Second, we run the optimization problem (4.6) four times, once at each individual frequency; the support of the chosen microphones is then allowed to vary over frequencies. Third, we randomly perturb the set of microphones found by the multi-frequency optimization of experiment (1) using simulated annealing (SA) to see by how much we can improve the solution. Fourth, we again use SA to perturb the set of microphones found by the single-frequency optimization of experiment (2). Fifth, we test the performance of the beamformer that maximizes the signal-to-noise ratio (SNR); this beamformer can be shown to use the set of microphones that lie closest to the target.

Table 4.1 shows the worst interference gain in dB for the five setups using the evaluation model. The results show our multi-frequency λ-method (column 3) outperforming the beamformer that maximizes SNR (column 2) at every frequency. The single-frequency λ-method performs better than the multi-frequency λ-method, since microphone locations are allowed to vary across frequencies. Simulated annealing sometimes, but not always, finds better-performing solutions when measured with the evaluation model; by construction, simulated annealing finds a solution at least as good as the initial point when measured on the optimization model. Figs. 4.2 and 4.3 show the beam patterns for both the λ-method and the maximum-SNR beamformer at 1000 Hz, along with the placement of the microphones. Results are in dB, with unit target gain (0 dB). Fig. 4.4 compares the expected value and variance of the maximum gain among a varying number of randomly placed interferences in the surveillance area. Finally, Fig. 4.5 shows the sharp drop in the filter weights produced by the multi-frequency λ-method.

Table 4.1. Worst interference gains [dB] for: beamformer (BF), multi-frequency λ-method, SA multi-frequency λ-method, single-frequency λ-method, SA single-frequency λ-method.

f [Hz]     BF    MF λ    SA MF λ    SF λ    SA SF λ
250       15.6    9.2      3.9     -1.19     -3.1
500       14.5    7.6      4.3      3.3       2.1
750       12.5    3.4      4.1     -0.9      -0.2
1000      10.4    2.4      5.8      1.5       0.1

4.9 Conclusions

We aim to utilize a very large number of available microphones by using customized subsets of microphones to monitor specific areas of interest. This selective sampling of microphones then produces reasonable amounts of data to be processed. An equivalent problem to our microphone subset selection is microphone spacing. Our optimization criterion finds microphones that suppress a large number of interferences across multiple frequencies while monitoring a target location. We allow the chosen subset of microphones to have different weights at different frequencies of interest. We show that our multi-frequency LASSO-inspired convex optimization technique can find subsets of microphones that give reasonable performance on evaluation models containing large numbers of virtual interferences and reflections, even though the optimization criterion assumes many fewer virtual interferences and no reflections.
If the frequencies of interest are uniformly spaced, we can achieve even better performance by allowing the active microphone subset to change over frequencies and then using space-time-frequency sampling to recover our signal of interest.

Fig. 4.2. Gain map for the classical beamformer that maximizes SNR at 1000 Hz.

Fig. 4.3. Gain map for the beamformer found by the multi-frequency λ-method at 1000 Hz.

Fig. 4.4. Comparison of mean and variance of the maximum gain for a varying number of randomly placed interferences at 1000 Hz.

Fig. 4.5. Ordered weights found by the multi-frequency λ-method.

Part III: Processing Broadband Source Using Multi-rate Filterbanks

Chapter 5: Multirate Filterbank Review

We note that this chapter's derivations are known in the literature. In particular, we refer the reader to [23].

5.1 Filtering Followed by Downsampling

Fig. 5.1. Single channel of a multi-rate filterbank: the input x is filtered by F, downsampled by D to give v, upsampled by D to give e, and filtered by G to give y.

In the first half of Fig. 5.1, the input signal x is first filtered by filter F and then downsampled by integer D. The output after downsampling is denoted by v. The output at time $t'$ after filtering but before downsampling is given by the convolution $q[t'] = \sum_t f[t'-t]\, x[t]$. If the input to the downsampler is q, then the output of the downsampler is $v[m] = q[mD]$. Combining these two relationships yields the input/output relationship

$$v[m] = q[mD] = \sum_k f[mD - k]\, x[k] \qquad (5.1)$$

Reference [23], Section 4.1, shows that the z-transform of (5.1) is given by

$$V(z^D) = \frac{1}{D} \sum_{d=0}^{D-1} F(zW_D^d)\, X(zW_D^d) \qquad (5.2)$$

5.2 Upsampling Followed by Filtering

In the second half of Fig. 5.1, the signal v is upsampled by integer D and then filtered by filter G. We denote the signal after upsampling but before filtering by $e[\tau]$ and the output after filtering by $y[t]$. When upsampling by integer D, we have

$$e[\tau] = \begin{cases} v[\tau/D] & \text{if } \tau \bmod D = 0; \\ 0 & \text{otherwise.} \end{cases} \qquad (5.3)$$

The output y is a convolution of the filter G and e, so by the previous relationship we can derive the input/output relationship as follows:

$$y[t] = (g * e)[t] = \sum_\tau g[t - \tau]\, e[\tau] = \sum_k g[t - kD]\, e[kD] = \sum_k g[t - kD]\, v[k] \qquad (5.4)$$

In the third equality, we let $\tau = kD$ with integer k, since $e[\tau]$ is non-zero only when $\tau$ is an integer multiple of D. Reference [23], §4.1, shows that the z-transform of (5.4) is given by

$$Y(z) = G(z)\, V(z^D) \qquad (5.5)$$

Fig. 5.2. Filterbank for microphone n: channel l consists of analysis filter $F_{n,l}(z)$, downsampling by D, upsampling by D, and synthesis filter $G_{n,l}(z)$; the C channel outputs are summed to give $y_n$. Notice that D, the downsampling and upsampling factor, is not necessarily equal to C, the number of channels (subbands).
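A quick numerical check of (5.1) and (5.3)-(5.4), assuming toy FIR filters and input; np.convolve plays the role of the convolutions, and the guards restrict the direct sums to the filters' supports.

```python
# Verify the down- and up-sampling input/output relationships numerically.
import numpy as np

rng = np.random.default_rng(3)
D, taps = 3, 8
f = rng.standard_normal(taps)        # analysis filter
g = rng.standard_normal(taps)        # synthesis filter
x = rng.standard_normal(64)

# (5.1): filter, then downsample
q = np.convolve(f, x)
v = q[::D]                           # v[m] = q[mD]
v_direct = np.array([sum(f[m*D - k] * x[k] for k in range(len(x))
                         if 0 <= m*D - k < taps) for m in range(len(v))])
print(np.allclose(v, v_direct))      # True

# (5.3)-(5.4): upsample (zero interpolation), then filter
e = np.zeros(D * len(v)); e[::D] = v
y = np.convolve(g, e)
y_direct = np.array([sum(g[t - k*D] * v[k] for k in range(len(v))
                         if 0 <= t - k*D < taps) for t in range(len(y))])
print(np.allclose(y, y_direct))      # True
```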
5.3 Filterbank Processing

The output of the n-th microphone's filterbank, $y_n$, is the sum of the processing results of its C channels. Hence, we have

$$y_n[t] = \sum_{l=0}^{C-1} y_{n,l}[t] \qquad (5.6)$$

where $y_{n,l}$ is the processed result of the l-th channel of the n-th microphone's filterbank. The z-transform of (5.6) is simply

$$Y_n(z) = \sum_{l=0}^{C-1} Y_{n,l}(z) \qquad (5.7)$$

Using (5.4), we can expand (5.6) as follows:

$$y_n[t] = \sum_{l=0}^{C-1} \sum_k g_{n,l}[t - kD]\, v_{n,l}[k] \qquad (5.8)$$

where $g_{n,l}$ is the synthesis filter of the l-th channel of the n-th microphone and $v_{n,l}$ is the decimated channel signal of the l-th channel of the n-th microphone. From (5.7), (5.2), and (5.5), we can express the output of the n-th microphone as follows:

$$Y_n(z) = \frac{1}{D} \sum_{l=0}^{C-1} G_{n,l}(z) \sum_{d=0}^{D-1} F_{n,l}(zW_D^d)\, X_n(zW_D^d) \qquad (5.9)$$

Equation (5.9) can equivalently be expressed as a scaled product of a row vector, a matrix, and a column vector, that is,

$$Y_n(z) = \frac{1}{D}\, G_{n,0,D,1\times C}(z)\, F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) \qquad (5.10)$$

where

$$G_{n,d,D,1\times C}(z) = [G_{n,0}(zW_D^d),\ G_{n,1}(zW_D^d),\ \cdots,\ G_{n,C-1}(zW_D^d)] \qquad (5.11)$$

is a row vector, size 1 by C, containing all C synthesis filters of the n-th filterbank (with arguments modulated by $W_D^d$),

$$F_{n,C\times D}(z) = \begin{bmatrix} F_{n,0}(z) & F_{n,0}(zW_D) & \cdots & F_{n,0}(zW_D^{D-1}) \\ F_{n,1}(z) & F_{n,1}(zW_D) & \cdots & F_{n,1}(zW_D^{D-1}) \\ \vdots & \vdots & \ddots & \vdots \\ F_{n,C-1}(z) & F_{n,C-1}(zW_D) & \cdots & F_{n,C-1}(zW_D^{D-1}) \end{bmatrix} \qquad (5.12)$$

is the C by D analysis modulation matrix ([24], §4.1) of the n-th filterbank, and

$$X^T_{n,1\times D}(z) = [X_n(z),\ X_n(zW_D),\ \cdots,\ X_n(zW_D^{D-1})]^T \qquad (5.13)$$

is a column vector, size D by 1, containing aliased versions of $X_n(z)$, the z-transform of the n-th filterbank's input. We also define the synthesis modulation matrix, size D by C, of the n-th filterbank as

$$G_{n,D\times C}(z) = \begin{bmatrix} G_{n,0}(z) & G_{n,1}(z) & \cdots & G_{n,C-1}(z) \\ G_{n,0}(zW_D) & G_{n,1}(zW_D) & \cdots & G_{n,C-1}(zW_D) \\ \vdots & \vdots & \ddots & \vdots \\ G_{n,0}(zW_D^{D-1}) & G_{n,1}(zW_D^{D-1}) & \cdots & G_{n,C-1}(zW_D^{D-1}) \end{bmatrix} \qquad (5.14)$$

If we expand the matrix-vector product $F_{n,C\times D}(z)\, X^T_{n,1\times D}(z)$ as

$$F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) = \sum_{d=0}^{D-1} X_n(zW_D^d)\, \big[F_{n,0}(zW_D^d),\ F_{n,1}(zW_D^d),\ \ldots,\ F_{n,C-1}(zW_D^d)\big]^T \qquad (5.15)$$

and observe that $W_D^d = e^{-j2\pi \frac{d}{D}}$ is D-periodic in d, we notice the following useful equality, which follows from the commutativity of addition:

$$F_{n,C\times D}(zW_D^d)\, X^T_{n,1\times D}(zW_D^d) = F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) \qquad (5.16)$$

for $d = 0, 1, \ldots, D-1$. Using (5.16), the scalar input/output equality (5.10) can be written in vector form as

$$Y^T_{n,1\times D}(z) = \frac{1}{D}\, G_{n,D\times C}(z)\, F_{n,C\times D}(z)\, X^T_{n,1\times D}(z) \qquad (5.17)$$

where

$$Y^T_{n,1\times D}(z) = [Y_n(z),\ Y_n(zW_D),\ \cdots,\ Y_n(zW_D^{D-1})]^T \qquad (5.18)$$

is a column vector, size D by 1, containing aliased versions of $Y_n(z)$.
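To connect the time-domain filterbank of Fig. 5.2 with the modulation-matrix formula (5.9), the following toy check simulates one filterbank in the time domain and compares its z-transform at a point on the unit circle against the right-hand side of (5.9); filters and input are random placeholders.

```python
# Numerical check of (5.9) against a brute-force time-domain filterbank.
import numpy as np

rng = np.random.default_rng(8)
D, C, taps = 3, 3, 5
x = rng.standard_normal(30)
F = rng.standard_normal((C, taps))
G = rng.standard_normal((C, taps))

def ztrans(h, z):                        # polynomial in z^{-1}
    return sum(h[k] * z ** (-k) for k in range(len(h)))

# time-domain filterbank: analysis, down/up-sampling, synthesis, sum
y = np.zeros(len(x) + 2 * taps)          # long enough for all convolutions
for l in range(C):
    q = np.convolve(F[l], x)
    e = np.zeros(len(q)); e[::D] = q[::D]    # down- then up-sample
    yl = np.convolve(G[l], e)
    y[:len(yl)] += yl

z = np.exp(1j * 0.7)                     # test point on the unit circle
W = np.exp(-2j * np.pi / D)
Y_formula = sum(ztrans(G[l], z) *
                sum(ztrans(F[l], z * W**d) * ztrans(x, z * W**d)
                    for d in range(D))
                for l in range(C)) / D
print(np.isclose(ztrans(y, z), Y_formula))   # True
```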
Chapter 6: Processing Random Source Using Plurality of Multirate Filterbanks

In this chapter, Section 6.1 builds on top of the results of Chapter 5. Section 6.2 presents the second-order description of cyclostationary processes. Sections 6.3 and 6.4 describe how cyclostationary processes propagate through a plurality of multirate filterbanks, which is one of our main contributions. We note that the single-multirate-filterbank description for cyclostationary processes has been derived in [25].

6.1 Problem Setup

Assume that we have N microphones and that the output of each of the N microphones, after pre-filtering, is processed by an individual filterbank. Each filterbank is implemented as a multi-rate, finite-impulse-response (FIR) filterbank, and each filterbank has the same setup as shown in Fig. 5.2, with D, the downsampling and upsampling rate, and C, the number of channels, fixed for all N filterbanks. The analysis and synthesis filters for each subchannel of each filterbank, however, are allowed to vary. Each of the N filterbanks receives a signal propagating from source $x_r$. See Fig. 6.1 for the overall system processing setup for a single source $x_r$.

Fig. 6.1. A single acoustic source $x_r$ propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse-response (FIR) filterbank. The subscript r of $x_r$ indexes the I + 1 sources, which consist of I interference sources, whose overall gain we wish to minimize, and 1 target source, whose gain we wish to be exactly 1 so that our system perfectly reconstructs the target. We index the target source by setting r = 0, and hence the target source is denoted by $x_0$. The propagation from source $x_r$ to filterbank n is modeled with transfer function $P_{r,n}$. The prefilter $I_{0,n}$ inverts the propagation from the target source, $x_0$, to filterbank n. The cascade of the propagation filter $P_{r,n}$ and target inversion filter $I_{0,n}$ is denoted by $H_{r,n}$. The outputs of the N filterbanks are summed to give a processed signal $y_r$.

The n-th filterbank receives a sampled input originating from acoustic source $x_r$. Each microphone samples at the same uniform rate, and this rate is sufficient to recover all I + 1 sources, each of which is assumed to be bandlimited. The sampled input is given by $x_{r,n}[k] = x_{r,n}(kT_s)$, where k is an integer time index and $T_s$ is the sampling period. The input signal for filterbank n, $x_{r,n}$, is then fed into C subchannels. The l-th subchannel of the n-th filterbank consists of an FIR analysis filter $F_{n,l}(z)$, a downsampler by integer D, an upsampler (zero interpolator) by integer D, and an FIR synthesis filter $G_{n,l}(z)$. The outputs of the C subchannels are combined to give an output signal $y_{r,n}$, or, in the z-domain,

$$Y_{r,n}(z) = \sum_{d=0}^{D-1} \Big( \frac{1}{D} \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) \Big)\, X_{r,n}(zW_D^d) \qquad (6.1)$$

where the above equation is simply a rearrangement of the terms of (5.9). $G_{n,l}$ is the transfer function of the synthesis filter of filterbank n's subchannel l, and $F_{n,l}$ is the transfer function of the analysis filter of filterbank n's subchannel l. In short, $y_{r,n}$ is the processed output of filterbank n given an input signal propagating from source $x_r$.

Assume we have I + 1 acoustic point sources in the environment. We denote the r-th source by $x_r$. We set the 0-th source, $x_0$, to be the target of interest and denote all the remaining I sources as interferences, whose overall gain we wish to minimize. Let $X_{r,n}(z)$ be the signal received by the n-th filterbank from source $x_r$, and let $P_{r,n}$ be the transfer function that models the propagation from source $x_r$ to the n-th filterbank. Each filterbank input is pre-filtered with a target inversion filter, that is, a filter $I_{0,n}$ that inverts the propagation effect on the target's phase from the target $x_0$ to filterbank n. We denote the cascade of the propagation and target inversion filters from source $x_r$ to microphone n by $H_{r,n}$, that is,

$$H_{r,n}(z) = P_{r,n}(z)\, I_{0,n}(z) \qquad (6.2)$$

We then have

$$X_{r,n}(z) = H_{r,n}(z)\, X_r(z) \qquad (6.3)$$

Note that the target inversion pre-filter does not vary with the source $x_r$ but only with the filterbank n.

Remark. Ideally, the target source $x_0$ enters each of the N filterbanks with only an amplitude scaling and with its original phase.
If we assume that the propagation $P_{0,n}$ from the target source $x_0$ to microphone n can be inverted perfectly by the prefilter $I_{0,n}$, then we have

$$H_{0,n}(z) = P_{0,n}(z)\, I_{0,n}(z) = \alpha_{0,n}\, z^{-\Delta_{H_n}}, \qquad (6.4)$$

where $\alpha_{0,n}$ is a real scalar representing the amplitude change from propagation and $\Delta_{H_n}$ represents a processing delay.

The processed outputs of the N filterbanks, $Y_{r,n}(z)$, are summed to give the final output signal $Y_r(z)$, the system output for system input $X_r(z)$, that is,

$$Y_r(z) = \sum_{n=0}^{N-1} Y_{r,n}(z) \qquad (6.5)$$

By substituting the right-hand side of (6.1) for $Y_{r,n}(z)$ in (6.5), and then substituting the right-hand side of (6.3) evaluated at $zW_D^d$ for $X_{r,n}(zW_D^d)$ in the resulting equation, we have, after rearranging terms,

$$Y_r(z) = \sum_{d=0}^{D-1} \Big( \frac{1}{D} \sum_{n=0}^{N-1} H_{r,n}(zW_D^d) \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) \Big)\, X_r(zW_D^d) \qquad (6.6)$$

We want our system to perfectly reconstruct the target source $x_0$ after some processing delay Δ; that is, we want, in the z-domain,

$$Y_0(z) = z^{-\Delta}\, X_0(z) \qquad (6.7)$$

By substituting the right-hand side of (6.6) with r = 0, the target index, into the left-hand side of (6.7), we get

$$\sum_{d=0}^{D-1} \Big( \frac{1}{D} \sum_{n=0}^{N-1} H_{0,n}(zW_D^d) \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) \Big)\, X_0(zW_D^d) = z^{-\Delta}\, X_0(z) \qquad (6.8)$$

For the equality (6.8) to hold, the following D conditions need to hold:

$$\sum_{n=0}^{N-1} H_{0,n}(zW_D^d) \sum_{l=0}^{C-1} G_{n,l}(z)\, F_{n,l}(zW_D^d) = \begin{cases} D z^{-\Delta} & \text{if } d = 0; \\ 0 & \text{if } 1 \le d \le D-1 \end{cases} \qquad (6.9)$$

where Δ is some positive integer denoting the overall delay of the system. In matrix-vector form, the D target perfect-reconstruction conditions of (6.9) can be written as

$$\underbrace{\begin{bmatrix} \dot F_{0,1\times C}(z) & \dot F_{1,1\times C}(z) & \cdots & \dot F_{N-1,1\times C}(z) \\ \dot F_{0,1\times C}(zW_D) & \dot F_{1,1\times C}(zW_D) & \cdots & \dot F_{N-1,1\times C}(zW_D) \\ \vdots & \vdots & \ddots & \vdots \\ \dot F_{0,1\times C}(zW_D^{D-1}) & \dot F_{1,1\times C}(zW_D^{D-1}) & \cdots & \dot F_{N-1,1\times C}(zW_D^{D-1}) \end{bmatrix}}_{\dot F_{D\times N\cdot C}} \underbrace{\begin{bmatrix} G^T_{0,1\times C}(z) \\ G^T_{1,1\times C}(z) \\ \vdots \\ G^T_{N-1,1\times C}(z) \end{bmatrix}}_{G^T_{1\times N\cdot C}} = \begin{bmatrix} D z^{-\Delta} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (6.10)$$

where the matrix $\dot F_{D\times N\cdot C}$, size D by N·C, consists of D·N row vectors, size 1 by C, defined as

$$\dot F_{n,1\times C}(z) = H_{0,n}(z) \cdot [F_{n,0}(z),\ F_{n,1}(z),\ \cdots,\ F_{n,C-1}(z)] \qquad (6.11)$$

and the column vector $G^T_{1\times N\cdot C}$, size N·C by 1, consists of N column vectors, size C by 1, defined as

$$G^T_{n,1\times C}(z) = [G_{n,0}(z),\ G_{n,1}(z),\ \cdots,\ G_{n,C-1}(z)]^T \qquad (6.12)$$

Fig. 6.2. FIR multi-rate filter bank for the n-th microphone (identical in structure to Fig. 5.2).
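The perfect-reconstruction system (6.10) can be evaluated numerically at a frequency on the unit circle. In the sketch below the filters are random placeholders, so the residual is of course large; the point is only the assembly of $\dot F$, the stacked synthesis vector, and the target vector.

```python
# Evaluate the residual of the perfect-reconstruction conditions (6.10).
import numpy as np

rng = np.random.default_rng(5)
N, C, D, taps, delay = 2, 3, 3, 4, 5

F = rng.standard_normal((N, C, taps))     # analysis filters f_{n,l}
G = rng.standard_normal((N, C, taps))     # synthesis filters g_{n,l}
H0 = rng.standard_normal((N, taps))       # target cascades H_{0,n}

def tf(h, z):                             # polynomial in z^{-1}
    return sum(h[k] * z ** (-k) for k in range(len(h)))

W = np.exp(-2j * np.pi / D)

def residual(w):
    z = np.exp(1j * w)
    # rows d of F-dot: entries H_{0,n}(z W^d) F_{n,l}(z W^d), n-major order
    Fdot = np.array([[tf(H0[n], z * W**d) * tf(F[n, l], z * W**d)
                      for n in range(N) for l in range(C)]
                     for d in range(D)])                 # D x (N*C)
    g = np.array([tf(G[n, l], z) for n in range(N) for l in range(C)])
    target = np.zeros(D, complex); target[0] = D * z ** (-delay)
    return np.linalg.norm(Fdot @ g - target)

print(residual(0.3))
```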
The spectral density of two zero mean discrete random processes xn1 , xn2 which are jointly WSS is the z-transform, evaluated at ejw, of their correlation function, that is Sxn1 ,xn2 (e jw) = ∑ τ Rxn1 ,xn2 [τ ]e −jwτ (6.16) Definition 6.2.4. Two zero mean discrete random processes xn1 , xn2 are said to be jointly wide- sense cyclostationary (WSCS) with period D if their correlation function is periodic, that is Rxn1 ,xn2 [t1 +D, t2 +D] = Rxn1 ,xn2 [t1, t2] for all t1, t2 (6.17) Remark Assume xn1 , xn2 are jointly WSS, then we have Rxn1 ,xn2 [t1 + P0, t2 + P0] = Rxn1 ,xn2 [t1 − t2] = Rxn1 ,xn2 [t1, t2] (6.18) (6.18) shows that jointly WSS is a special case of jointly WSCS since jointly WSS implies jointly WSCS with period 1. Remark Assume xn1 , xn2 are jointly WSCS with period D and fix an integer u0. Then their correlation function, defined as a function of integer k, Rxn1 ,xn2 [u0 + k, k] (6.19) is D periodic with respect to input parameter k. Definition 6.2.5. The cyclic correlation function for period D and integer d with 0 ≤ d ≤ D − 1 is defined as 64 Rd/Dxn1 ,xn2 [u] = 1 D D−1∑ k=0 Rxn1 ,xn2 [k + u, k]e −j2pi dD k (6.20) The value d/D is known as the cyclic frequency. Remark From equation (6.17), we observe that for fixed u, Rxn1 ,xn2 [k + u, k] is a periodic sequence in k with period D if xn1 , xn2 are jointly WSCS with period D . Hence, R d/D xn1 ,xn2 [u] are the Fourier coefficients of Rxn1 ,xn2 [k + u, k], that is Rxn1 ,xn2 [k + u, k] = D−1∑ d=0 Rd/Dxn1 ,xn2 [u]W −kd D (6.21) (6.21) can be verified by substituting in (6.20) into the right hand side and using (2.1). Remark If xn1 , xn2 are jointly WSS, then the application of (2.1) for any D ≥ 1 shows that Rd/Dxn1 ,xn2 [u] =    Rxn1 ,xn2 [u] if d = 0 mod D; 0 if 0 < d ≤ D − 1. (6.22) Definition 6.2.6. The cyclic spectral density for period D and integer d is the z-transform, evalu- ated at ejw, of the cyclic correlation function for period D and integer d: Sd/Dxn1 ,xn2 (e jw) = ∑ u Rd/Dxn1 ,xn2 [u]e −jwu (6.23) where w ∈ [0, 2pi). Remark If xn1 , xn2 are jointly WSS, we observe from (6.22) that the cyclic spectral density for d = 0 is exactly the spectral density, that is S0xn1 ,xn2 (e jw) = Sxn1 ,xn2 (e jw) (6.24) 65 6.3 System Input/Output Relationship The following section derives an input/output relationship for the system given in Fig. 6.3. We note that Fig. 6.3 is identical to that of Fig. 6.1 except the subscript r has been dropped from the source x since in the following derivations we are assuming a fixed source xr. We assume that the source x is white noise and hence WSS and with uncorrelated time samples and the output y is the sum of N multirate filterbanks each processing a signal propagating from the source x. In the following, we drop the subscript r from our notation Remark We will show in the following section that when the input signal x is WSS, then xn1 , xn2 , the input to the n1 and n2 filterbank respectively, are jointly WSS. Furthermore, vn1,p, vn2,q, the p and q subchannel signal of the n1 and n2 filterbank respectively are also jointly WSS. However, yn1 , yn2 , the output of the n1 and n2 filterbank respectively, are jointly WSCS with period D. 6.3.1 Filterbank Input Cyclic Spectral Density Lemma 6.3.1. Assume the fixed source x is WSS with spectral density Sx,x(ejw) and is processed with the scheme depicted in Fig. 6.3. Then xn1 and xn2 are jointly WSS with cyclic spectral density given as follows: Sd/Dxn1 ,xn2 (e jw) =    Sx,x(ejw)Hn1(e jw)Hn2(ejw) if d = 0 mod D; 0 if d 6= 0 mod D. 
6.3 System Input/Output Relationship

The following section derives an input/output relationship for the system given in Fig. 6.3. We note that Fig. 6.3 is identical to Fig. 6.1 except that the subscript r has been dropped from the source x, since in the following derivations we assume a fixed source $x_r$. We assume that the source x is white noise, hence WSS with uncorrelated time samples, and the output y is the sum of N multirate filterbanks, each processing a signal propagating from the source x. In the following, we drop the subscript r from our notation.

Fig. 6.3. A single acoustic source x propagates to N microphones, and the output of each of the N microphones is processed by an individual multi-rate, finite-impulse-response (FIR) filterbank. The propagation from source x to filterbank n is modeled with transfer function $P_n$. The prefilter $I_{0,n}$ inverts the propagation from the target source, $x_0$, to filterbank n. The cascade of the propagation filter $P_n$ and target inversion filter $I_{0,n}$ is denoted by $H_n$. The outputs of the N filterbanks are summed to give a processed signal y.

Remark. We will show in the following section that when the input signal x is WSS, then $x_{n_1}$, $x_{n_2}$, the inputs to the $n_1$-th and $n_2$-th filterbanks respectively, are jointly WSS. Furthermore, $v_{n_1,p}$, $v_{n_2,q}$, the p-th and q-th subchannel signals of the $n_1$-th and $n_2$-th filterbanks respectively, are also jointly WSS. However, $y_{n_1}$, $y_{n_2}$, the outputs of the $n_1$-th and $n_2$-th filterbanks respectively, are jointly WSCS with period D.

6.3.1 Filterbank Input Cyclic Spectral Density

Lemma 6.3.1. Assume the fixed source x is WSS with spectral density $S_{x,x}(e^{jw})$ and is processed with the scheme depicted in Fig. 6.3. Then $x_{n_1}$ and $x_{n_2}$ are jointly WSS with cyclic spectral density given by

$$S^{d/D}_{x_{n_1},x_{n_2}}(e^{jw}) = \begin{cases} S_{x,x}(e^{jw})\, H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})} & \text{if } d = 0 \bmod D; \\ 0 & \text{if } d \ne 0 \bmod D \end{cases} \qquad (6.25)$$

where D is any given period, $D \ge 1$. In the special case when the input signal x is white noise, that is, x is WSS and decorrelated, with variance $\sigma_x^2$, the cyclic spectral density simplifies to

$$S^{d/D}_{x_{n_1},x_{n_2}}(e^{jw}) = \begin{cases} \sigma_x^2\, H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})} & \text{if } d = 0 \bmod D; \\ 0 & \text{if } d \ne 0 \bmod D \end{cases} \qquad (6.26)$$

where, as before, D is any given period, $D \ge 1$.

Proof. The inputs $x_{n_1}$ and $x_{n_2}$ to the $n_1$-th and $n_2$-th filterbanks, respectively, from a single WSS source x are jointly WSS, that is,

$$R_{x_{n_1},x_{n_2}}[t_1, t_2] = R_{x_{n_1},x_{n_2}}[t_1 - t_2] \qquad (6.27)$$

A simple proof of this statement follows:

$$E[x_{n_1}[t_1]\, x_{n_2}[t_2]] = E\Big[\sum_k x[t_1 - k]\, h_{n_1}[k] \sum_l x[t_2 - l]\, h_{n_2}[l]\Big] = \sum_k \sum_l h_{n_1}[k]\, h_{n_2}[l]\, E[x[t_1 - k]\, x[t_2 - l]] = \sum_k \sum_l h_{n_1}[k]\, h_{n_2}[l]\, R_{x,x}[t_1 - t_2 - k + l],$$

a function of $t_1 - t_2$ only, where the last equality holds because the source x is assumed to be WSS.

To show the desired result, we first note that (6.22) implies that $S^{d/D}_{x_{n_1},x_{n_2}}(e^{jw}) = 0$ for $d \ne 0 \bmod D$, since $x_{n_1}$ and $x_{n_2}$ are jointly WSS. Hence, we now calculate the cyclic spectral density between $x_{n_1}$ and $x_{n_2}$ for period D and d = 0:

$$S^0_{x_{n_1},x_{n_2}}(e^{jw}) = \sum_u R^0_{x_{n_1},x_{n_2}}[u]\, e^{-jwu} = \sum_u E\big[(x * h_{n_1})[u]\,(x * h_{n_2})[0]\big]\, e^{-jwu} = \sum_k \sum_l \sum_u h_{n_1}[u - k]\, h_{n_2}[-l]\, R_{x,x}[k - l]\, e^{-jw(u-k)}\, e^{-jw(k-l)}\, e^{-jwl} = H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})}\, S^0_{x,x}(e^{jw}) \qquad (6.28)$$

Hence, we have proven (6.25). If the source signal x is white noise with variance $\sigma_x^2$, then we have

$$R_{x,x}[k + u, k] = R_{x,x}[u] = \sigma_x^2\, \delta_u \qquad (6.29)$$

Using the definition of the cyclic spectral density, (6.23), together with (6.22) and the correlation of white noise, (6.29), we then have

$$S^0_{x,x}(e^{jw}) = \sum_u R^0_{x,x}[u]\, e^{-jwu} = \sum_u R_{x,x}[u]\, e^{-jwu} = \sigma_x^2 \qquad (6.30)$$

Substituting (6.30) into (6.28), we then have the desired

$$S^0_{x_{n_1},x_{n_2}}(e^{jw}) = \sigma_x^2\, H_{n_1}(e^{jw})\, \overline{H_{n_2}(e^{jw})} \qquad (6.31)$$

which proves (6.26). ∎

6.3.2 Downsampled Subchannel Signal Spectral Density

We now derive the spectral density of the subchannel signals $v_{n_1,p}$ and $v_{n_2,q}$, where $v_{n,k}$ is the signal of the k-th subchannel of the n-th filterbank after being processed by the analysis filter $F_{n,k}$ and then downsampled by D. As before, $n = 0, 1, \cdots, N-1$ indexes the filterbanks and $k = 0, 1, \cdots, C-1$ indexes the subchannels. See Fig. 5.2.

Lemma 6.3.2. The signals $v_{n_1,p}$ and $v_{n_2,q}$ are jointly WSS with spectral density given by

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \frac{1}{D} \sum_{m=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})}) \sum_{n'=0}^{D-1} \overline{F_{n_2,q}(e^{j(w - w_{D,n'})})}\, S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) \qquad (6.32)$$

where, as in (2.8), $w_{D,d} = \frac{2\pi}{D} d$.

Remark. The statement of this lemma holds even under the weaker assumption that $x_{n_1}$, $x_{n_2}$ are jointly WSCS with period D; the following proof remains the same under the weaker assumption.

Proof. We note that this derivation parallels the derivation in Appendix I of [25].
First we show that the downsampled subchannel signals $v_{n_1,p}$, $v_{n_2,q}$ are jointly WSS, so that their spectral density exists. Using (5.1), we can derive a relationship between the correlation of the downsampled channels $v_{n_1,p}$, $v_{n_2,q}$ and the correlation of the filter input signals $x_{n_1}$, $x_{n_2}$ (see Fig. 5.2 for the setup of a single filterbank):

$$R_{v_{n_1,p},v_{n_2,q}}[m, n] = E[v_{n_1,p}[m]\, v_{n_2,q}[n]] = E\Big[\sum_k f_{n_1,p}[k]\, x_{n_1}[Dm - k] \sum_{k'} f_{n_2,q}[k']\, x_{n_2}[Dn - k']\Big] = \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k']\, R_{x_{n_1},x_{n_2}}[Dm - k, Dn - k'] = \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k']\, R_{x_{n_1},x_{n_2}}[D(m-n) - k, -k'] \qquad (6.33)$$

where we subtracted Dn from both arguments in the last equality, since $x_{n_1}$, $x_{n_2}$ are jointly WSS, (6.27), and therefore jointly WSCS with period 1, (6.18). The last line shows that $R_{v_{n_1,p},v_{n_2,q}}[m, n]$ is a function of $m - n$ only, and therefore $v_{n_1,p}$, $v_{n_2,q}$ are jointly WSS.

We now begin the derivation of (6.32) with the definition of the spectral density of $v_{n_1,p}$ and $v_{n_2,q}$, (6.16), evaluated at the argument $e^{jDw}$:

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \sum_\tau R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-jDw\tau} \qquad (6.34)$$

We now substitute (6.33) into (6.34), letting $\tau = m - n$, yielding

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \sum_\tau \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k']\, R_{x_{n_1},x_{n_2}}[D\tau - k, -k']\, e^{-jDw\tau}$$

We factor common terms out of the summations (i.e., push each summation onto only the terms that vary with its index), add and subtract $k'$ in the first argument of $R_{x_{n_1},x_{n_2}}$, and let $\tau' = D\tau + k' - k$ to get

$$= \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k'] \sum_{\tau' :\, \tau' - k' + k \,=\, 0 \bmod D} R_{x_{n_1},x_{n_2}}[-k' + \tau', -k']\, e^{-jw(\tau' - k' + k)}$$

We next insert the following identity, which can be verified using the formula for the sum of a geometric series:

$$\frac{1}{D} \sum_{m=0}^{D-1} W_D^{-m(\tau' - k' + k)} = \begin{cases} 1 & \text{if } \tau' - k' + k \text{ is a multiple of } D; \\ 0 & \text{otherwise} \end{cases}$$

where, as before, $W_D = e^{-j2\pi/D}$. By inserting this identity, the summation over $\tau' : \tau' - k' + k = 0 \bmod D$ can be replaced by a summation over all $\tau'$. Hence, we have

$$= \sum_k \sum_{k'} f_{n_1,p}[k]\, f_{n_2,q}[k'] \sum_{\tau'} R_{x_{n_1},x_{n_2}}[-k' + \tau', -k'] \Big( \frac{1}{D} \sum_{m=0}^{D-1} W_D^{-m(\tau' - k' + k)} \Big)\, e^{-jw(\tau' - k' + k)}$$

We now use equation (6.21) to expand $R_{x_{n_1},x_{n_2}}[-k' + \tau', -k']$ into its Fourier series representation, rearrange terms, and use the definition of the discrete-time Fourier transform to get

$$= \frac{1}{D} \sum_{m=0}^{D-1} \sum_{n=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})})\, \overline{F_{n_2,q}(e^{j(w - w_{D,m+n})})}\, S^{n/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})})$$

where, as in (2.8), $w_{D,d} = \frac{2\pi}{D} d$. We again factor common terms out of the summation and let $n' = m + n$ to get

$$= \frac{1}{D} \sum_{m=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})}) \sum_{n'=m}^{m+D-1} \overline{F_{n_2,q}(e^{j(w - w_{D,n'})})}\, S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})})$$

For fixed m with $0 \le m \le D-1$, the inner sum over $n'$ has $(m + D - 1) - D + 1 = m$ terms whose index $n'$ is greater than or equal to D. Each of these "overflow" indices can be shifted by −D to make it less than m, because of the following two relationships, each of which can be verified by direct substitution:

$$S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) = S^{(n'-D-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})})$$
$$F_{n_2,q}(e^{j(w - w_{D,n'})}) = F_{n_2,q}(e^{j(w - w_{D,n'-D})})$$

After shifting the "overflow" indices, the summation over $n'$ runs from 0 to D − 1 instead of from m to m + D − 1.
We then have the desired result:

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \frac{1}{D} \sum_{m=0}^{D-1} F_{n_1,p}(e^{j(w - w_{D,m})}) \sum_{n'=0}^{D-1} \overline{F_{n_2,q}(e^{j(w - w_{D,n'})})}\, S^{(n'-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) \qquad \blacksquare$$

6.3.3 Filterbank Input and Downsampled Subchannel Signal Matrix Relationship

We first define the cyclic spectral density matrix of the filterbank input signals $x_{n_1}$ and $x_{n_2}$, of size D × D, as

$$\big(S_{x_{n_1},x_{n_2},D\times D}(e^{jw})\big)_{m,n} = S^{(n-m)/D}_{x_{n_1},x_{n_2}}(e^{j(w - w_{D,m})}) \qquad (6.35)$$

where $m, n = 0, \ldots, D-1$ and $w_{D,m} = \frac{2\pi}{D} m$, as in (2.8). Note that when $x_{n_1}$, $x_{n_2}$ are jointly WSS, $S_{x_{n_1},x_{n_2},D\times D}(e^{jw})$ is a diagonal matrix. Then (6.25) implies that we can decompose the matrix $S_{x_{n_1},x_{n_2},D\times D}(e^{jw})$ as

$$S_{x_{n_1},x_{n_2},D\times D}(e^{jw}) = H_{n_1,D\times D}(e^{jw})\, S_{x,x,D\times D}(e^{jw})\, H^*_{n_2,D\times D}(e^{jw}) \qquad (6.36)$$

where

$$H_{n_k,D\times D}(e^{jw}) = \mathrm{diag}\big(H_{n_k}(e^{jw}),\ H_{n_k}(e^{j(w - w_{D,1})}),\ \ldots,\ H_{n_k}(e^{j(w - w_{D,D-1})})\big) \qquad (6.37)$$

for $k = 0, 1, \ldots, N-1$, and

$$S_{x,x,D\times D}(e^{jw}) = \mathrm{diag}\big(S_{x,x}(e^{jw}),\ S_{x,x}(e^{j(w - w_{D,1})}),\ \ldots,\ S_{x,x}(e^{j(w - w_{D,D-1})})\big) \qquad (6.38)$$

If x is white noise with variance $\sigma_x^2$, then (6.36) simplifies to

$$S_{x_{n_1},x_{n_2},D\times D}(e^{jw}) = \sigma_x^2\, H_{n_1,D\times D}(e^{jw})\, H^*_{n_2,D\times D}(e^{jw}) \qquad (6.39)$$

We observe that the scalar spectral density of the downsampled subchannel signals, (6.32), can then be written as a product of a row vector, a matrix, and a column vector, that is,

$$S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) = \frac{1}{D}\, F_{n_1,p,1\times D}(e^{jw})\, S_{x_{n_1},x_{n_2},D\times D}(e^{jw})\, F^*_{n_2,q,1\times D}(e^{jw}) \qquad (6.40)$$

where

$$F_{n,r,1\times D}(e^{jw}) = [F_{n,r}(e^{jw}),\ F_{n,r}(e^{j(w - w_{D,1})}),\ \ldots,\ F_{n,r}(e^{j(w - w_{D,D-1})})] \qquad (6.41)$$

Notice that in (6.32) the subchannel indices p and q are both fixed. Each of the two filterbanks $n_1$, $n_2$ has C subchannels, and therefore both p and q can vary between 0 and C − 1, that is, $0 \le p, q \le C-1$. To capture all C × C spectral density relationships between the different downsampled subchannels of filterbanks $n_1$ and $n_2$, we define the spectral density matrix of downsampled subchannel signals, size C × C, as

$$\big(S_{v_{n_1},v_{n_2},C\times C}(e^{jDw})\big)_{p,q} = S_{v_{n_1,p},v_{n_2,q}}(e^{jDw}) \qquad (6.42)$$

Then, using (6.40), the modulation matrix of the analysis filters, (5.12), and (6.36), we have

$$S_{v_{n_1},v_{n_2},C\times C}(e^{jDw}) = \frac{1}{D}\, F_{n_1,C\times D}(e^{jw})\, S_{x_{n_1},x_{n_2},D\times D}(e^{jw})\, F^*_{n_2,C\times D}(e^{jw}) = \frac{1}{D}\, F_{n_1,C\times D}(e^{jw})\, H_{n_1,D\times D}(e^{jw})\, S_{x,x,D\times D}(e^{jw})\, H^*_{n_2,D\times D}(e^{jw})\, F^*_{n_2,C\times D}(e^{jw}) \qquad (6.43)$$
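The matrix relationship (6.43) is easy to assemble numerically for white-noise input, where (6.39) gives $S_{x,x,D\times D} = \sigma_x^2 I$; in the sketch below the filters are random placeholders and the helper names are ours.

```python
# Assemble the C x C subchannel spectral-density matrix of (6.43) at one w.
import numpy as np

rng = np.random.default_rng(6)
N, C, D, taps, sigma2 = 2, 3, 3, 4, 1.0
F = rng.standard_normal((N, C, taps))     # analysis filters
H = rng.standard_normal((N, taps))        # cascades H_n

tf = lambda h, z: sum(h[k] * z ** (-k) for k in range(len(h)))
W = np.exp(-2j * np.pi / D)

def S_v(n1, n2, w):
    z = np.exp(1j * w)
    Fm = lambda n: np.array([[tf(F[n, l], z * W**d) for d in range(D)]
                             for l in range(C)])               # (5.12), C x D
    Hm = lambda n: np.diag([tf(H[n], z * W**d) for d in range(D)])  # (6.37)
    Q1, Q2 = Fm(n1) @ Hm(n1), Fm(n2) @ Hm(n2)
    return (sigma2 / D) * Q1 @ Q2.conj().T    # (6.43) with S_x = sigma^2 I

print(S_v(0, 1, 0.4).shape)   # (3, 3)
```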
6.3.4 Filterbank Output Cyclic Correlation

We now derive the cyclic correlation between two filterbanks' output signals $y_{n_1}$ and $y_{n_2}$ in terms of the correlation between downsampled subchannel signals, where $y_n$ is the output of the $n$-th filterbank and $v_{n,l}$ is the $l$-th subchannel signal of the $n$-th filterbank after downsampling.

Lemma 6.3.3. For any integer $u$ and $0 \le d \le D-1$,

$$
R^{d/D}_{y_{n_1},y_{n_2}}[u] = \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_r \sum_{\tau} g_{n_1,p}[r+u-D\tau]\, g_{n_2,q}[r]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} r} \quad (6.44)
$$

Proof. First, we derive an expression for the correlation between two filterbanks' outputs in terms of their downsampled subchannel signals, where $y_n$ is the multirate output of the $n$-th filterbank, $n = 0, 1, \dots, N-1$, and $v_{n,l}$ is the $l$-th subchannel signal after downsampling of the $n$-th filterbank, $l = 0, 1, \dots, C-1$. We use (6.13), (5.8), and (6.33) to get

$$
R_{y_{n_1},y_{n_2}}[n,k] = E[y_{n_1}[n]\, y_{n_2}[k]]
= E\Big[\sum_{p=0}^{C-1} \sum_{m'} g_{n_1,p}[n-Dm']\, v_{n_1,p}[m'] \sum_{q=0}^{C-1} \sum_m g_{n_2,q}[k-Dm]\, v_{n_2,q}[m]\Big]
$$
$$
= \sum_{p,q=0}^{C-1} \sum_{m'} \sum_m g_{n_1,p}[n-Dm']\, g_{n_2,q}[k-Dm]\, R_{v_{n_1,p},v_{n_2,q}}[m'-m] \quad (6.45)
$$

Using two changes of variables ($m'' = m'-1$, $m''' = m-1$) in (6.45), we can verify that the correlation of the processed filterbank outputs is cyclostationary with period $D$, that is,

$$
R_{y_{n_1},y_{n_2}}[n+D,\, k+D] = R_{y_{n_1},y_{n_2}}[n,k] \quad (6.46)
$$

If we let $n = k+u$ and $\tau = m'-m$ in (6.45), we have

$$
R_{y_{n_1},y_{n_2}}[k+u,\, k] = \sum_{p,q=0}^{C-1} \sum_{\tau} \sum_m g_{n_1,p}[k+u-D(\tau+m)]\, g_{n_2,q}[k-Dm]\, R_{v_{n_1,p},v_{n_2,q}}[\tau] \quad (6.47)
$$

We substitute (6.47) into the definition of the cyclic correlation function, (6.20), rearrange terms, and multiply by $e^{j2\pi \frac{d}{D} Dm} = 1$ for integers $d$ and $m$ to get

$$
R^{d/D}_{y_{n_1},y_{n_2}}[u] = \frac{1}{D} \sum_{k=0}^{D-1} \sum_{p,q=0}^{C-1} \sum_{\tau} \sum_m g_{n_1,p}[k+u-D(\tau+m)]\, g_{n_2,q}[k-Dm]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} k}
$$
$$
= \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_{\tau} R_{v_{n_1,p},v_{n_2,q}}[\tau] \Big( \sum_{k=0}^{D-1} \sum_m g_{n_1,p}[k-Dm+u-D\tau]\, g_{n_2,q}[k-Dm]\, e^{-j2\pi \frac{d}{D}(k-Dm)} \Big)
$$

We observe that $\sum_{k=0}^{D-1} \sum_m f[k-Dm] = \sum_r f[r]$, and hence we let $r = k-Dm$ to combine the summations over $k$ and $m$ into a single summation over $r$, which gives the desired cyclic cross correlation:

$$
R^{d/D}_{y_{n_1},y_{n_2}}[u] = \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_r \sum_{\tau} g_{n_1,p}[r+u-D\tau]\, g_{n_2,q}[r]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} r}
$$

6.3.5 Filterbank Output Cyclic Spectral Density

We now derive the filterbank output cyclic spectral density in terms of the spectral density of downsampled subchannel signals. We begin with the definition of the cyclic spectral density, (6.23), substitute in (6.44), factor out common terms in the summations, multiply by $e^{jw(r-r+D\tau-D\tau)} = 1$, and use the definition of the discrete-time Fourier transform to get

$$
S^{d/D}_{y_{n_1},y_{n_2}}(e^{jw}) = \sum_u R^{d/D}_{y_{n_1},y_{n_2}}[u]\, e^{-jwu}
= \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_u \sum_r \sum_{\tau} g_{n_1,p}[r+u-D\tau]\, g_{n_2,q}[r]\, R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-j2\pi \frac{d}{D} r}\, e^{-jwu}
$$
$$
= \frac{1}{D} \sum_{p,q=0}^{C-1} \sum_r g_{n_2,q}[r]\, e^{j(w - 2\pi \frac{d}{D}) r} \sum_{\tau} R_{v_{n_1,p},v_{n_2,q}}[\tau]\, e^{-jwD\tau} \sum_u g_{n_1,p}[r+u-D\tau]\, e^{-jw(r+u-D\tau)}
$$
$$
= \frac{1}{D} \sum_{p,q=0}^{C-1} \overline{G_{n_2,q}}(e^{j(w-w_{D,d})})\, S_{v_{n_1,p},v_{n_2,q}}(e^{jDw})\, G_{n_1,p}(e^{jw}) \quad (6.48)
$$

where, as before in (2.8), $w_{D,d} = 2\pi \frac{d}{D}$.

6.3.6 Downsampled Subchannel Signal and Filterbank Output Matrix Relationship

We observe that (6.48) can be written as the product of a row vector, a matrix, and a column vector:

$$
S^{d/D}_{y_{n_1},y_{n_2}}(e^{jw}) = \frac{1}{D}\, \mathbf{G}_{n_1,0,D,1\times C}(e^{jw})\, S_{v_{n_1},v_{n_2},C\times C}(e^{jDw})\, \mathbf{G}^*_{n_2,d,D,1\times C}(e^{jw}) \quad (6.49)
$$

where

$$
\mathbf{G}_{n,d,D,1\times C}(e^{jw}) = \big[G_{n,0}(e^{j(w-w_{D,d})}),\, G_{n,1}(e^{j(w-w_{D,d})}),\, \dots,\, G_{n,C-1}(e^{j(w-w_{D,d})})\big]
$$

We define the cyclic spectral density matrix of output signals $y_{n_1}, y_{n_2}$, of size $D \times D$, as

$$
\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)_{m,n} = S^{(n-m)/D}_{y_{n_1},y_{n_2}}(e^{j(w-w_{D,m})}) \quad (6.50)
$$

where, as before in (2.8), $w_{D,m} = 2\pi \frac{m}{D}$. Then, using (6.49) and the modulation matrix of the synthesis filters, (5.14), we have

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{1}{D}\, \mathbf{G}_{n_1,D\times C}(e^{jw})\, S_{v_{n_1},v_{n_2},C\times C}(e^{jDw})\, \mathbf{G}^*_{n_2,D\times C}(e^{jw}) \quad (6.51)
$$
6.3.7 Matrix Relationship Between Filterbank Inputs and Filterbank Outputs

Combining the previous matrix relationships (6.36), (6.43), and (6.51) gives the following statistical input/output relationship between two different filterbanks $n_1, n_2$:

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{1}{D^2} \big( \mathbf{G}_{n_1,D\times C}(e^{jw})\, \mathbf{F}_{n_1,C\times D}(e^{jw})\, H_{n_1,D\times D}(e^{jw}) \big)\, S_{x,x,D\times D}(e^{jw})\, \big( H^*_{n_2,D\times D}(e^{jw})\, \mathbf{F}^*_{n_2,C\times D}(e^{jw})\, \mathbf{G}^*_{n_2,D\times C}(e^{jw}) \big) \quad (6.52)
$$

To shorten notation, we define

$$
Q_{n,D\times D}(e^{jw}) = \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H_{n,D\times D}(e^{jw}) \quad (6.53)
$$

where the synthesis modulation matrix $\mathbf{G}_{n,D\times C}(e^{jw})$ is given by (5.14), the analysis modulation matrix $\mathbf{F}_{n,C\times D}(e^{jw})$ is given by (5.12), and the propagation and target inversion cascade matrix $H_{n,D\times D}(e^{jw})$, a diagonal matrix, is given by (6.37). Hence, we can write (6.52) simply as

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{1}{D^2}\, Q_{n_1,D\times D}(e^{jw})\, S_{x,x,D\times D}(e^{jw})\, Q^*_{n_2,D\times D}(e^{jw}) \quad (6.54)
$$

In the case when $x$ is white noise with variance $\sigma_x^2$, (6.54) simplifies to

$$
S_{y_{n_1},y_{n_2},D\times D}(e^{jw}) = \frac{\sigma_x^2}{D^2}\, Q_{n_1,D\times D}(e^{jw})\, Q^*_{n_2,D\times D}(e^{jw}) \quad (6.55)
$$
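As a concrete illustration of (6.53) and (6.55), the sketch below assembles $Q_{n,D\times D}$ for one filterbank, again under the modulation-matrix conventions of (5.12) and (5.14); the synthesis and cascade taps are made-up values, not taken from the dissertation's experiments:

```python
import numpy as np

def dtft(h, w):
    k = np.arange(len(h))
    return complex(np.exp(-1j * w * k) @ h)

def Q_matrix(g_filters, f_filters, h, w, D):
    """Q of (6.53): synthesis modulation (5.14) x analysis modulation (5.12)
    x diagonal cascade matrix (6.37)."""
    G = np.array([[dtft(g, w - 2*np.pi*d/D) for g in g_filters]
                  for d in range(D)])                       # D x C
    F = np.array([[dtft(f, w - 2*np.pi*d/D) for d in range(D)]
                  for f in f_filters])                      # C x D
    H = np.diag([dtft(h, w - 2*np.pi*m/D) for m in range(D)])
    return G @ F @ H

# Toy values: Haar analysis filters, hypothetical synthesis/cascade taps.
D, w, sigma2 = 2, 0.4, 1.0
haar = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]
g = [np.array([0.9, 0.1]), np.array([0.2, -0.4])]
h = np.array([1.0, 0.5])
Q = Q_matrix(g, haar, h, w, D)
S_yy = sigma2 / D**2 * Q @ Q.conj().T    # (6.55) with n1 = n2
print(np.round(S_yy, 4))
```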
6.4 Time Averaged Energy of Output

We now show that the time-averaged energy of the system output $y$ (the sum of the processing from $N$ filterbanks) due to a single source $x$ is related to the trace of the cyclic spectral density matrix of filterbank outputs $S_{y_{n_1},y_{n_2},D\times D}$. We measure the time-averaged energy of the output, $\sigma_y^2$, as

$$
\sigma_y^2 = \lim_{\substack{T_1\to\infty \\ T_2\to\infty}} \frac{1}{T_1+T_2+1} \sum_{t=-T_1}^{T_2} E[y[t]\, y[t]] \quad (6.56)
$$

The following theorem is used to measure the gain of a single WSS source processed by a plurality of multirate filterbanks.

Theorem 6.4.1. Assume the source $x$ is white noise with variance $\sigma_x^2$. Then the output $y$ after processing through the system given by Fig. 6.1 has time-averaged energy

$$
\sigma_y^2 = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 dw \quad (6.57)
$$

where $Q_{n,D\times D}(e^{jw}) = \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H_{n,D\times D}(e^{jw})$ as defined in (6.53).

To simplify the notation of (6.57), we define the 2-norm of a $D\times D$ matrix-valued function $Q_{D\times D}(e^{jw})$ as

$$
\|Q_{D\times D}(e^{jw})\|_2^2 \equiv \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \big| \big( Q_{D\times D}(e^{jw}) \big)_{d_1,d_2} \big|^2 dw \quad (6.58)
$$

With this definition, (6.57) becomes

$$
\sigma_y^2 = \frac{\sigma_x^2}{2\pi D^2} \Big\| \sum_{n=0}^{N-1} Q_{n,D\times D} \Big\|_2^2 \quad (6.59)
$$

Proof. In Fig. 6.1, for a given interference $x$, the output $y$ is the sum of the $N$ outputs of the microphones' filterbanks, that is,

$$
y[t] = \sum_{n=0}^{N-1} y_n[t] \quad (6.60)
$$

where $y_n$ is the output of the $n$-th microphone's filterbank. Using (6.60) and (6.46), we can verify that $y$ is WSCS with period $D$. Hence, we can measure the energy of the output over just one period $D$, as the following calculation shows:

$$
\sigma_y^2 = \lim_{\substack{T_1\to\infty\\ T_2\to\infty}} \frac{1}{T_1+T_2+1} \sum_{t=-T_1}^{T_2} E[y[t]\,y[t]]
= \lim_{N_p\to\infty} \frac{1}{D(2N_p+1)} \sum_{p=-N_p}^{N_p} \sum_{l=0}^{D-1} E[y[l+pD]\, y[l+pD]]
$$
$$
= \frac{1}{D} \sum_{l=0}^{D-1} \lim_{N_p\to\infty} \frac{1}{2N_p+1} \sum_{p=-N_p}^{N_p} E[y[l+pD]\, y[l+pD]]
= \frac{1}{D} \sum_{l=0}^{D-1} \lim_{N_p\to\infty} \frac{2N_p+1}{2N_p+1}\, E[y[l]\,y[l]]
$$
$$
= \frac{1}{D} \sum_{l=0}^{D-1} E[y[l]\,y[l]] = \frac{1}{D} \sum_{l=0}^{D-1} R_y[l,l] \quad (6.61)
$$

Expanding the correlation $R_y[l,l]$ shows that we are interested in the correlations of each of the $N$ filterbanks' outputs:

$$
R_y[l,l] = E[y[l]\,y[l]] = E\Big[\sum_{n_1=0}^{N-1} y_{n_1}[l] \sum_{n_2=0}^{N-1} y_{n_2}[l]\Big] = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} R_{y_{n_1},y_{n_2}}[l,l] \quad (6.62)
$$

Substituting (6.62) into (6.61) and changing the order of the finite summations, we have

$$
\sigma_y^2 = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \frac{1}{D} \sum_{l=0}^{D-1} R_{y_{n_1},y_{n_2}}[l,l] \quad (6.63)
$$

We observe that the weighted inner sum is simply the time-averaged energy of the cross correlation between filterbank outputs $y_{n_1}, y_{n_2}$ (see (6.61)), that is,

$$
\sigma_y^2 = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \sigma^2_{y_{n_1},y_{n_2}} \quad (6.64)
$$

The following derivation relates $\sigma^2_{y_{n_1},y_{n_2}}$ to the trace of the cyclic spectral density matrix $S_{y_{n_1},y_{n_2}}$:

$$
\sigma^2_{y_{n_1},y_{n_2}} = \frac{1}{D} \sum_{l=0}^{D-1} R_{y_{n_1},y_{n_2}}[l,l]
= \frac{1}{D} \sum_{l=0}^{D-1} \sum_u R_{y_{n_1},y_{n_2}}[l+u,l] \cdot \underbrace{\frac{1}{2\pi} \int_{-2\pi+\pi/D}^{\pi/D} e^{-jwu}\, dw}_{1 \text{ if } u=0 \text{ and } 0 \text{ otherwise}}
$$
$$
= \frac{1}{2\pi} \int_{-2\pi+\pi/D}^{\pi/D} \sum_u \underbrace{\frac{1}{D} \sum_{l=0}^{D-1} R_{y_{n_1},y_{n_2}}[l+u,l]}_{R^0_{y_{n_1},y_{n_2}}[u]}\, e^{-jwu}\, dw
= \frac{1}{2\pi} \int_{-2\pi+\pi/D}^{\pi/D} \underbrace{\sum_u R^0_{y_{n_1},y_{n_2}}[u]\, e^{-jwu}}_{S^0_{y_{n_1},y_{n_2}}(e^{jw})}\, dw
$$
$$
= \frac{1}{2\pi} \sum_{r=0}^{D-1} \int_{-\pi/D}^{\pi/D} S^0_{y_{n_1},y_{n_2}}(e^{j(w-w_{D,r})})\, dw
= \frac{1}{2\pi} \int_{-\pi/D}^{\pi/D} \sum_{r=0}^{D-1} S^0_{y_{n_1},y_{n_2}}(e^{j(w-w_{D,r})})\, dw
= \frac{1}{2\pi} \int_{-\pi/D}^{\pi/D} \mathrm{Tr}\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)\, dw \quad (6.65)
$$

The third and fourth lines follow from the definitions of the cyclic correlation, (6.20), and the cyclic spectral density, (6.23), respectively. The fifth line breaks the integration into $D$ intervals, each of length $\frac{2\pi}{D}$, and the last line follows from the definition of $S_{y_{n_1},y_{n_2},D\times D}(e^{jw})$, (6.50). Substituting (6.65) into (6.64) yields

$$
\sigma_y^2 = \frac{1}{2\pi} \int_{-\pi/D}^{\pi/D} \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \mathrm{Tr}\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)\, dw \quad (6.66)
$$

Using (6.55), the definition of the Hilbert-Schmidt inner product, (2.10), and the definition of the Hilbert-Schmidt norm, (2.9), we can simplify the integrand of (6.66) as follows:

$$
\sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \mathrm{Tr}\big(S_{y_{n_1},y_{n_2},D\times D}(e^{jw})\big)
= \frac{\sigma_x^2}{D^2} \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \mathrm{Tr}\big(Q_{n_1,D\times D}(e^{jw})\, Q^*_{n_2,D\times D}(e^{jw})\big)
$$
$$
= \frac{\sigma_x^2}{D^2} \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \big\langle Q_{n_1,D\times D}(e^{jw}),\, Q_{n_2,D\times D}(e^{jw}) \big\rangle_{HS}
= \frac{\sigma_x^2}{D^2} \Big\langle \sum_{n_1=0}^{N-1} Q_{n_1,D\times D}(e^{jw}),\, \sum_{n_2=0}^{N-1} Q_{n_2,D\times D}(e^{jw}) \Big\rangle_{HS}
$$
$$
= \frac{\sigma_x^2}{D^2} \Big\| \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big\|_{HS}^2
= \frac{\sigma_x^2}{D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 \quad (6.67)
$$

Substituting (6.67) into (6.66) yields the desired result:

$$
\sigma_y^2 = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 dw
$$

Chapter 7: Optimization Over Synthesis Filters Using a Sparse Number of Subchannels

7.1 Overview

Our overall objective is to minimize the p-norm of the interference source gains while both reconstructing the target source perfectly and using a sparse number of subchannels. In our problem setup there are two types of sources: $I$ interferences, whose gains we wish to attenuate, and a single target, whose gain through the system processing we wish to be exactly 1. In other words, our system will process the target source with no distortion. Specifically, our desired optimization problem reads

$$
\min_{\mathbf{G}_{n,D\times C}(e^{jw})} \Big\| \Big\{ \Big\| \sum_{n=0}^{N-1} \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H^{(r)}_{n,D\times D}(e^{jw}) \Big\|_2^2 \Big\}_{1\le r\le L} \Big\|_p
$$
subject to
$$
\sum_{n=0}^{N-1} \mathbf{G}_{n,D\times C}(e^{jw})\, \mathbf{F}_{n,C\times D}(e^{jw})\, H^{(0)}_{n,D\times D}(e^{jw}) = D e^{jw}\, \mathrm{diag}\big[(e^{-2\pi j d/D})_{d=0}^{D-1}\big]
$$
$$
\sum_{n=0}^{N-1} \Big\| \Big\{ \max_{-\pi\le w\le\pi} \big| \big( \mathbf{G}_{n,D\times C}(e^{jw}) \big)_{0,l} \big| \Big\}_{0\le l\le C-1} \Big\|_0 \le V_S \quad (7.1)
$$

where $H^{(r)}_{n,D\times D}(e^{jw})$ refers to the product of two frequency-domain objects, the propagation of source $r$ to microphone $n$ and the target inversion filter specific to microphone $n$; $D$ is the decimation factor; $C$ is the number of subchannels per filterbank; and $V_S$ is the desired number of active subchannels. Unfortunately, this is not a convex optimization problem.

We assume our set of $N \cdot C$ analysis filters, denoted by $F$, is fixed and known, where $N$ is the number of microphones and $C$ is the number of subchannels for each microphone, that is,

$$
F = \big( F_{n,l}(e^{jw}) \big)_{\substack{0\le n\le N-1,\ 0\le l\le C-1,\ w\in[0,2\pi)}} \quad (7.2)
$$

is a known quantity. We further assume that the locations of the sources and microphones are known, and hence the cascade of the source propagation and target inversion filters, denoted by $H$, is also known. There are $I+1$ sources ($I$ interferences and a single target) and $N$ microphones, and hence $H$ consists of $(I+1)\cdot N$ transfer functions.
We define $H$ as

$$
H = \big( H_{r,n}(e^{jw}) \big)_{\substack{0\le r\le I,\ 0\le n\le N-1,\ w\in[0,2\pi)}} \quad (7.3)
$$

Our optimization is then over the $N \cdot C$ synthesis filters, denoted by $G$, defined as

$$
G = \big( G_{n,l}(e^{jw}) \big)_{\substack{0\le n\le N-1,\ 0\le l\le C-1,\ w\in[0,2\pi)}} \quad (7.4)
$$

To make finding the unknown $G$ computationally tractable, we discretize the continuous variable $w$, $0 \le w < 2\pi$, with $N_f$ equally spaced points, that is, we let

$$
w = \frac{2\pi f}{N_f} \quad \text{with } f = 0, 1, \dots, N_f - 1 \quad (7.5)
$$

To simplify our computations, we also assume that $N_f$, the number of discretized frequencies, is an even multiple of $D$, the downsampling factor, that is,

$$
N_f = M \cdot D \quad (7.6)
$$

with $M$ a positive even integer. In addition, to vectorize our computations, we combine the indices $n$ and $l$ into a single index $s$ (for subchannel) by letting

$$
s = l + nC \quad \text{and} \quad S = N \cdot C \quad (7.7)
$$

Hence the discretized version of the unknown synthesis filters $G$, denoted $G_{\text{compute}}$, consists of $S \cdot N_f$ unknown complex numbers, that is,

$$
G_{\text{compute}} = \big( G_{n,l}(e^{j 2\pi f/N_f}) \big)_{\substack{0\le s\le S-1,\ 0\le f\le N_f-1}} \quad (7.8)
$$

By penalizing non-zero subchannel synthesis tap coefficients in our optimization problem with an absolute-value penalty term, in the spirit of the LASSO algorithm given in [1], we force certain subchannels to be inactive and thereby create a sparse set of active subchannels. A subchannel is considered inactive if its synthesis tap coefficients are zero or close to zero. We measure gain using the time-averaged energy given by (6.57). Notice that (6.57) assumes a fixed source $x$, and hence we will need to compute (6.57) for each of our $I+1$ sources $x_r$. (These indexing and discretization conventions are illustrated in the short sketch below.)
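The conventions above translate directly into code. In the following sketch the sizes happen to match the experiment of Chapter 8 ($N = 32$ microphones, $C = 2$ subchannels, $D = 2$, $M = 8$), though the helper names are my own:

```python
import numpy as np

# Discretization and indexing conventions, in code:
#   frequencies w_f = 2*pi*f/Nf, f = 0..Nf-1, with Nf = M*D (M even), (7.5)-(7.6)
#   flattened subchannel index s = l + n*C, s = 0..S-1, S = N*C, (7.7)
N, C, D, M = 32, 2, 2, 8
S, Nf = N * C, M * D
w_grid = 2 * np.pi * np.arange(Nf) / Nf

def s_index(n, l):
    """Map (microphone n, subchannel l) to the flat index s of (7.7)."""
    return l + n * C

# G_compute of (7.8): one unknown complex response per (s, f) pair.
G_compute = np.zeros((S, Nf), dtype=complex)
print(G_compute.shape, s_index(n=3, l=1))   # (64, 16) and s = 7
```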
In this chapter, we first review prior work before detailing our problem solution. Next, we discuss the objective function, which consists of two terms: the p-norm of the interference source gains and the sparse subchannel penalty term. Afterwards, we discuss the target source perfect reconstruction constraint, and we combine these results in the final section to give the full optimization problem.

7.2 Prior Work

Our work optimizes both the placement of microphones and the design of the multirate filterbanks processing the microphones' inputs simultaneously. To the best of our knowledge, this is a novel contribution. There has been extensive work, however, in both sensor placement and the optimization of multirate filterbanks.

7.2.1 Sensor Placement

There has been extensive work on the sensor placement problem using a variety of strategies. In [26], the authors use simulated annealing to simultaneously optimize both the weights and the sensor locations of a linear array. In [27], the authors address a parameter estimation problem that includes sensor location using convex optimization; they relax the binary variable of a sensor being off (0) or on (1) by letting the variable instead lie in the range [0, 1]. In [28], the authors relax the same binary variable by converting the unknown vector to a matrix of 0s and 1s belonging to the class of Stiefel matrices; the relaxation is to a 1-d sphere, and they extend to multiple dimensions using a greedy algorithm. These authors are interested in optimizing an objective criterion involving the Kullback-Leibler divergence. [29] minimizes the sidelobes produced by a sparse linear array using a greedy deletion algorithm. [30] maximizes the Fisher information in its optimization criterion; when formulating an optimization problem to choose sensors, the authors convert a matrix equality to a matrix inequality in their convex relaxation. [31] addresses the problem of recovering $K$ target sources emanating from $N$ possible locations using $N$ sensors, where $K \ll N$; the authors convert the problem to one of selecting rows that minimize the correlation coefficient and propose a genetic algorithm that is data-dependent.

[32] addresses the following problem: given a desired beampattern response and error tolerance, what is the sparsest sensor setup that will achieve the desired beampattern within the given error tolerance? The authors address this problem using reweighted $l_1$ minimization, assuming a regularly spaced rectangular array, symmetric subarrays, and weights with conjugate symmetry. In [22], the authors address the same problem using a regularization approach inspired by Bayesian statistics. [33] also addresses the same problem; the authors constrain microphone distances, guess a configuration, and then perturb the configuration seeking a better solution. [34] addresses the same problem for a beampattern produced by a transmission array and proposes a greedy addition algorithm. In contrast, our work addresses the following problem: given a fixed number of sensors, what is the best possible beampattern achievable?

7.2.2 Filterbank Optimization

There has also been extensive work on the optimization of filterbanks. [35], published in 1980, was an early forerunner in the field. The author optimizes a quadrature mirror filterbank to meet a user-given frequency response criterion, minimizing both the ripple energy and the out-of-band energy using a search algorithm whose success is highly dependent on both the starting point and the step size. Similar to our proposed problem setup, but for a single filterbank, [36] fixes the analysis filters and optimizes the synthesis filters to achieve the best possible reconstruction given a user-specified integer time delay; they convert their problem to an $H_\infty$ problem to take advantage of existing software. In [37], the authors design a multi-dimensional perfect reconstruction filterbank where both the analysis and synthesis filters are FIR filters of equal length; they directly embed this non-linear and non-convex constraint into their optimization problem, whose objective function measures the difference between a desired analysis filterbank and the optimized analysis filterbank. The author of [38] designs perfect reconstruction and near perfect reconstruction filterbanks using a quadratically constrained least-squares optimization problem, solving a relaxed problem by converting the quadratic constraints to linear constraints. The authors of [39] design cosine-modulated filterbanks with a linear phase prototype filter using convex optimization by converting a non-convex set to a convex set. More recently, Davidson in [40] provides a novice-friendly tutorial on convex optimization techniques for designing FIR filters. Finally, [25] optimizes a single FIR filterbank to minimize the time-averaged mean-squared error when the high-frequency subband is dropped. We build off their techniques to derive an input/output relationship for our system consisting of a plurality of filterbanks.

7.3 Objective Function

Our objective function is the sum of two terms: one measuring the p-norm of the gains of the $I$ interferences, $J_{I,p}$, and one penalizing active subchannels, $J_S$, that is, those subchannels with non-zero synthesis filter responses. Typical p-norms of interest are $p = 1, 2, \infty$.
We adjust the severity of the active subchannel penalty by choosing a non-negative constant $\lambda$ to weight the active subchannel term $J_S$. The larger the chosen $\lambda$, the more severe the active subchannel penalty and the fewer active subchannels our optimization problem will recover. Conversely, the smaller the chosen $\lambda$, the less severe the penalty and the more active subchannels the problem will recover. By choosing $\lambda = 0$, we eliminate the active subchannel penalty altogether and allow the use of all $N \cdot C$ subchannels.

Our optimization is over the $N \cdot C$ synthesis filter responses, where $N$ is the number of microphones and $C$ is the number of subchannels for each microphone. We denote this set of filter responses by $G$, defined in (7.4). Hence our objective function $J$ can be written as

$$
J(G) = J_{I,p}(G) + \lambda \cdot J_S(G) \quad (7.9)
$$

As stated above, $G$ is a function of the continuous variable $w$; to make $J(G)$ computationally tractable, we discretize $G$ using $G_{\text{compute}}$, defined in (7.8), and update both $J_{I,p}$ and $J_S$ accordingly. Hence, what we actually compute for our objective function $J(G)$ is the approximation $J_{\text{compute}}(G_{\text{compute}})$:

$$
J(G) \approx J_{\text{compute}}(G_{\text{compute}}) = J_{I,p,\text{compute}}(G_{\text{compute}}) + \lambda \cdot J_{S,\text{compute}}(G_{\text{compute}}) \quad (7.10)
$$

7.3.1 p-norm of Interference Gains Term

We let the p-norm of the interference gains be the p-norm of the interference sources' time-averaged energies, that is,

$$
J_{I,p}(G) = \big\| (\sigma^2_{y_r})_r \big\|_p \quad (7.11)
$$

where the time-averaged energy $\sigma^2_{y_r}$ is given by (6.57). We observe that (6.57) is a function of $Q_{n,D\times D}$, which varies with $n$, the microphone index. The definition of $Q_{n,D\times D}$, (6.53), further shows that for fixed $n$, $Q_{n,D\times D}$ varies with $H_{n,D\times D}$, the cascade of propagation and target inversion filter responses; $\mathbf{F}_{n,C\times D}$, the analysis filter responses; and $\mathbf{G}_{n,D\times C}$, the synthesis filter responses. As stated above, we assume that both $H$ and $F$, the sets of target inversion cascades and analysis filter responses for all $n$, respectively, are fixed and known, and hence in our setup the time-averaged energy varies only with the synthesis filter responses for each $n$, that is, with $G$.

For the case $p = \infty$, $J_{I,p}(G)$ becomes

$$
J_{I,\infty}(G) = \max_r \sigma^2_{y_r} \quad (7.12)
$$
7.3.2 Discretization of $\sigma^2_{y_r}$

In the following derivations, we discretize $\sigma^2_{y_r}$ over the frequency $w$ with a finite set of uniformly spaced points to give $\sigma^2_{y_r,\text{compute}}$, a computationally tractable term. We further assume all our sources have equal variance $\sigma_x^2$ to simplify computations, giving $\hat\sigma^2_{y_r,\text{compute}}$.

To begin, we use (6.57) to expand $\sigma^2_{y_r}$ and observe that the $(d_1,d_2)$ entry of the sum of the $N$ matrices $Q_{n,D\times D}$ equals the sum of the $(d_1,d_2)$ entries of those same $N$ matrices:

$$
\sigma^2_{y_r} = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \Big( \sum_{n=0}^{N-1} Q_{n,D\times D}(e^{jw}) \Big)_{d_1,d_2} \Big|^2 dw
= \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \sum_{n=0}^{N-1} \big( Q_{n,D\times D}(e^{jw}) \big)_{d_1,d_2} \Big|^2 dw \quad (7.13)
$$

Next, observing that $H_{n,D\times D}(e^{jw})$ is a diagonal matrix in the definition of $Q_{n,D\times D}(e^{jw})$, (6.53), we can write the scalar expression $\big( Q_{n,D\times D}(e^{jw}) \big)_{d_1,d_2}$ as

$$
\big( Q_{n,D\times D}(e^{jw}) \big)_{d_1,d_2} = \sum_{l=0}^{C-1} G_{n,l}(e^{j(w-w_{D,d_1})})\, F_{n,l}(e^{j(w-w_{D,d_2})})\, H_{r,n}(e^{j(w-w_{D,d_2})}) \quad (7.14)
$$

Substituting (7.14) into (7.13) yields

$$
\sigma^2_{y_r} = \frac{\sigma_x^2}{2\pi D^2} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \int_{-\pi/D}^{\pi/D} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j(w-w_{D,d_1})})\, F_{n,l}(e^{j(w-w_{D,d_2})})\, H_{r,n}(e^{j(w-w_{D,d_2})}) \Big|^2 dw \quad (7.15)
$$

We can then approximate the integral in (7.15) as follows:

$$
\int_{-\pi/D}^{\pi/D} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j(w-\frac{2\pi d_1}{D})})\, F_{n,l}(e^{j(w-\frac{2\pi d_2}{D})})\, H_{r,n}(e^{j(w-\frac{2\pi d_2}{D})}) \Big|^2 dw
$$
$$
\approx \frac{2\pi}{N_f} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.16)
$$

Substituting (7.16) for the integral in (7.15) yields $\sigma^2_{y_r,\text{compute}}$, a computationally tractable approximation of $\sigma^2_{y_r}$:

$$
\sigma^2_{y_r} \approx \sigma^2_{y_r,\text{compute}} = \frac{\sigma_{x_r}^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.17)
$$

If we assume that each source $x_r$ has the same variance, that is, $\sigma_{x_r}^2 = \sigma_x^2$ for all $r$, we can treat the leading coefficient as a constant, and (7.17) becomes

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.18)
$$

To simplify notation, we define

$$
\dot F_{n,l,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \equiv F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \quad (7.19)
$$

We note that the value of (7.19) is known by assumption. We also combine the summations over $n$ and $l$ inside the magnitude squared into a single summation over $s$ (for subchannel) by letting

$$
s = l + nC \quad \text{and} \quad S = N \cdot C \quad (7.20)
$$

Hence, (7.18) becomes

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{d_2=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \Big| \sum_{s=0}^{S-1} G_s(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, \dot F_{s,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big|^2 \quad (7.21)
$$

To compute (7.21) efficiently, we would like to rewrite it as a product of a row vector, a matrix, and a column vector, where the row and column vectors contain the unknown discretized frequency responses of all $S$ synthesis filters. To begin, we expand the magnitude squared of (7.21) and rearrange the finite summations to get

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})})\, \overline{G_{s_2}}(e^{j2\pi(\frac{f}{N_f}-\frac{d_1}{D})}) \underbrace{\Big( \sum_{d_2=0}^{D-1} \dot F_{s_1,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, \overline{\dot F_{s_2,r}}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \Big)}_{\Phi_r(s_1,s_2,f)} \quad (7.22)
$$

We choose the symbol $\Phi$ since it is the capital "F" in Greek. In order to reduce the summations over both $d_1$ and $f$ to a single summation over $f$, we use the assumption that $N_f$ is an even multiple of $D$, that is,

$$
N_f = M \cdot D \quad (7.23)
$$

with $M$ a positive even integer. (A small numerical illustration of $\Phi_r$ and its $M$-periodicity, established in (A.1), follows.)
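The definition of $\Phi_r$ and the periodicity claimed in (A.1) can be spot-checked numerically. The sketch below uses two hypothetical subchannels whose combined analysis/propagation taps $\dot F$ are toy values of my own choosing:

```python
import numpy as np

def dtft(h, w):
    k = np.arange(len(h))
    return complex(np.exp(-1j * w * k) @ h)

# Phi_r(s1, s2, f) of (7.22), built from the known products
# Fdot_s(w) = F_s(w) * H_r(w); the FIR taps below are hypothetical.
D, M = 2, 4
Nf = M * D
fdot_taps = [np.array([1.0, 0.5]), np.array([0.3, -0.7])]   # two subchannels

def Fdot(s, f, d2):
    return dtft(fdot_taps[s], 2 * np.pi * (f / Nf - d2 / D))

def Phi(s1, s2, f):
    return sum(Fdot(s1, f, d2) * np.conj(Fdot(s2, f, d2)) for d2 in range(D))

# Spot-check the M-periodicity in f claimed in (A.1):
for s1 in range(2):
    for s2 in range(2):
        for f in range(Nf):
            assert abs(Phi(s1, s2, f) - Phi(s1, s2, f + M)) < 1e-12
print("Phi is M-periodic in f (checked numerically)")
```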
Rewriting the arguments of $G_{s_1}$ and $\overline{G_{s_2}}$, we then have

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \overline{G_{s_2}}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \Phi_r(s_1,s_2,f) \quad (7.24)
$$

A calculation in the appendix, (A.1), shows that $\Phi_r(s_1,s_2,f)$ is $M$-periodic in $f$, that is,

$$
\Phi_r(s_1,s_2,f-M) = \Phi_r(s_1,s_2,f) \quad (7.25)
$$

Hence we have

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{d_1=0}^{D-1} \sum_{f=-\frac{N_f}{2D}}^{\frac{N_f}{2D}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \overline{G_{s_2}}(e^{j2\pi(\frac{f-Md_1}{N_f})})\, \Phi_r(s_1,s_2,f-Md_1) \quad (7.26)
$$

By letting $f = -\frac{N_f}{2D}, \dots, \frac{N_f}{2D}-1$ and $d_1 = 0, 1, \dots, D-1$ in the relationship $\hat f = (f - Md_1) \bmod N_f$, we obtain $\hat f = 0, 1, \dots, N_f-1$. In addition, we observe that the z-transform is $2\pi$-periodic in $w$ for $z = e^{jw}$, which means we can re-index $f - Md_1 \bmod N_f$ in the arguments of $G_{s_1}$ and $\overline{G_{s_2}}$. Finally, since $N_f = M \cdot D$ by assumption, $\Phi_r(s_1,s_2,f)$ is also $N_f$-periodic in $f$, which means we can also re-index $f - Md_1 \bmod N_f$ in the arguments of $\Phi_r$. We can then combine the summations over $d_1$ and $f$ in (7.26) into one summation over $f$:

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \sum_{f=0}^{N_f-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f) \quad (7.27)
$$

If we assume that the analysis and synthesis filters' FIR coefficients are real, the term $G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f)$ is conjugate symmetric in the frequency index $f$. Hence we can write (7.27) as

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \Big[ 2 \sum_{f=1}^{\frac{N_f}{2}-1} \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f)
$$
$$
+ \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{0}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{0}{N_f}})\, \Phi_r(s_1,s_2,0)
+ \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{N_f/2}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{N_f/2}{N_f}})\, \Phi_r(s_1,s_2,\tfrac{N_f}{2}) \Big] \quad (7.28)
$$

For a fixed $f$, any of the three summations over $s_1$ and $s_2$ in (7.28) can be expressed as a product of a row vector, a matrix, and a column vector:

$$
\sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f) = \mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}})\, \Phi_{r,S\times S}(f)\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}}) \quad (7.29)
$$

where

$$
\mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}}) = \big[ \mathbf{G}_{0,1\times C}(e^{j2\pi\frac{f}{N_f}}),\, \mathbf{G}_{1,1\times C}(e^{j2\pi\frac{f}{N_f}}),\, \dots,\, \mathbf{G}_{N-1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \big] \quad (7.30)
$$

is a row vector, of size $1 \times S$, containing all $S$ synthesis filters' responses at discretized frequency $f$. The entries of the square matrix $\Phi_{r,S\times S}(f)$, of size $S \times S$, are given by

$$
\big( \Phi_{r,S\times S}(f) \big)_{s_1,s_2} = \Phi_r(s_1,s_2,f) = \sum_{d_2=0}^{D-1} \dot F_{s_1,r}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})})\, \overline{\dot F_{s_2,r}}(e^{j2\pi(\frac{f}{N_f}-\frac{d_2}{D})}) \quad (7.31)
$$

In addition, $\Phi_{r,S\times S}(f)$ can be expressed as the product of the analysis matrix $\dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})$ and its conjugate transpose:

$$
\Phi_{r,S\times S}(f) = \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \dot{\mathbf{F}}^*_{r,S\times D}(e^{j2\pi\frac{f}{N_f}}) \quad (7.32)
$$

where the matrix $\dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})$, of size $S \times D$, is defined as

$$
\dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}}) =
\begin{bmatrix}
\dot{\mathbf{F}}_{r,0,1\times C}(e^{j2\pi\frac{f}{N_f}}) & \cdots & \dot{\mathbf{F}}_{r,N-1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\dot{\mathbf{F}}_{r,0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) & \cdots & \dot{\mathbf{F}}_{r,N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) \\
\vdots & \ddots & \vdots \\
\dot{\mathbf{F}}_{r,0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})}) & \cdots & \dot{\mathbf{F}}_{r,N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})})
\end{bmatrix}^T \quad (7.33)
$$

and consists of $D \cdot N$ column vectors, of size $C \times 1$, defined in row-vector notation as

$$
\dot{\mathbf{F}}_{r,n,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) = H_{r,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \cdot \big[ F_{n,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, F_{n,1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] \quad (7.34)
$$

with $d \in \{0, 1, \dots, D-1\}$.
Hence, we can further rewrite (7.29) as

$$
\sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} G_{s_1}(e^{j2\pi\frac{f}{N_f}})\, \overline{G_{s_2}}(e^{j2\pi\frac{f}{N_f}})\, \Phi_r(s_1,s_2,f)
= \mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}})\, \Phi_{r,S\times S}(f)\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}})
$$
$$
= \mathbf{G}_{1\times S}(e^{j2\pi\frac{f}{N_f}})\, \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \dot{\mathbf{F}}^*_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}})
= \big\| \dot{\mathbf{F}}^*_{r,S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^*_{1\times S}(e^{j2\pi\frac{f}{N_f}}) \big\|^2 \quad (7.35)
$$

We can then express the right-hand side of (7.28) as the squared norm of a block-diagonal matrix times a column vector, that is,

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \cdot 2\, \big\| \dot{\mathbf{F}}^*_{r,S(\frac{N_f}{2}+1)\times D(\frac{N_f}{2}+1)}\, \mathbf{G}^*_{1\times S(\frac{N_f}{2}+1)} \big\|^2 \quad (7.36)
$$

and by transposition and conjugation we have

$$
\hat\sigma^2_{y_r,\text{compute}} = \frac{\sigma_x^2}{D^2 N_f} \cdot 2\, \big\| \mathbf{G}_{1\times S(\frac{N_f}{2}+1)}\, \dot{\mathbf{F}}_{r,S(\frac{N_f}{2}+1)\times D(\frac{N_f}{2}+1)} \big\|^2 \quad (7.37)
$$

where

$$
\dot{\mathbf{F}}_{r,S(\frac{N_f}{2}+1)\times D(\frac{N_f}{2}+1)} = \mathrm{diag}\Big( \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{0}{N_f}}),\; \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{N_f/2-1}{N_f}}),\; \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{r,S\times D}(e^{j2\pi\frac{N_f/2}{N_f}}) \Big) \quad (7.38)
$$

and

$$
\mathbf{G}_{1\times S(\frac{N_f}{2}+1)} = \big[ \mathbf{G}_{1\times S}(e^{j2\pi\frac{0}{N_f}}),\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{N_f/2}{N_f}}) \big] \quad (7.39)
$$

7.3.3 Computationally Tractable Approximation of $J_{I,p}(G)$

Using (7.37), we then have the computationally tractable approximation of (7.11):

$$
J_{I,p}(G) = \big\| (\sigma^2_{y_r})_r \big\|_p \approx \big\| (\hat\sigma^2_{y_r,\text{compute}})_r \big\|_p = J_{I,p,\text{compute}}(G_{\text{compute}}) \quad (7.40)
$$

(The equivalence between the triple sum (7.27) and the quadratic form (7.29) is checked numerically in the sketch below.)
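The sketch below uses random stand-ins for the synthesis responses and builds each $\Phi$ matrix in the PSD form (7.32); all names and sizes are illustrative assumptions:

```python
import numpy as np

# Evaluate the interference-gain term two ways for one interference r:
# (i) the triple sum of (7.27), and (ii) the quadratic form (7.29).
rng = np.random.default_rng(0)
S, Nf, D = 4, 8, 2
G_f = rng.standard_normal((S, Nf)) + 1j * rng.standard_normal((S, Nf))
A = rng.standard_normal((Nf, S, D)) + 1j * rng.standard_normal((Nf, S, D))
Phi_f = [A[f] @ A[f].conj().T for f in range(Nf)]   # PSD, as in (7.32)

c = 1.0 / (D**2 * Nf)          # leading constant with sigma_x^2 = 1
triple = c * sum(G_f[s1, f] * np.conj(G_f[s2, f]) * Phi_f[f][s1, s2]
                 for f in range(Nf) for s1 in range(S) for s2 in range(S))
quad = c * sum(G_f[:, f] @ Phi_f[f] @ G_f[:, f].conj() for f in range(Nf))
assert abs(triple - quad) < 1e-9 and quad.real >= 0
print("triple sum = quadratic form =", np.round(quad.real, 6))
```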
7.3.4 Sparse Subchannels Penalty Term

In this section we derive a computationally tractable and efficient sparse subchannel penalty term $J_{S,\text{compute}}(G_{\text{compute}})$. The derivation begins by defining $J_{S,\text{sgn}}(G)$, which counts the number of active subchannels by checking whether each subchannel's synthesis filter frequency response is non-zero. Next, the continuous frequency variable $w$ is discretized along a finite, uniformly spaced set of points to give the computationally tractable term $J_{S,\text{sgn},\text{compute}}(G_{\text{compute}})$. Finally, an $l_1$-like penalty is substituted, both to increase computational efficiency and to induce sparse solutions, giving the desired $J_{S,\text{compute}}(G_{\text{compute}})$.

Our goal is an objective function that not only minimizes the gain (time-averaged energy) of the interference sources but also encourages sparse subchannels, that is, only a few of the $N \cdot C$ subchannels being active. As before, $N$ is the number of microphones and $C$ is the number of subchannels of each microphone. We say a subchannel is inactive if its synthesis filter frequency response is zero or very small in magnitude. We measure the number of active subchannels as

$$
J_{S,\text{sgn}}(G) = \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} \mathrm{sgn}^2\Big( \max_{0\le w<\pi} \big| G_{n,l}(e^{jw}) \big| \Big) \quad (7.41)
$$

In other words, a channel is considered active if any portion of its frequency response is non-zero. We note that $0 \le w < \pi$ rather than $0 \le w < 2\pi$, since we assume our filter taps are real and hence the frequency response is conjugate symmetric. (7.41) counts the number of active subchannels, and since $\mathrm{sgn}$ is applied to the maximum of absolute values, the value of (7.41) lies in the appropriate range of $0$ to $N \cdot C$.

To make (7.41) computationally tractable, we discretize the continuous variable $w$ using $N_f$ points as in (7.16), so (7.41) becomes

$$
J_{S,\text{sgn}}(G) \approx J_{S,\text{sgn},\text{compute}}(G_{\text{compute}}) = \sum_{n=0}^{N-1} \sum_{l=0}^{C-1} \mathrm{sgn}^2\Big( \max_{f\in\{0,1,\dots,N_f/2\}} \big| G_{n,l}(e^{j2\pi\frac{f}{N_f}}) \big| \Big) \quad (7.42)
$$

We combine the summations over $n$ and $l$ into a single summation over $s$ using (7.20), as before. In addition, in the spirit of compressive sampling, we replace the $\mathrm{sgn}^2$ function by the absolute value. Hence (7.42) becomes

$$
J_{S,\text{abs},\text{compute}}(G_{\text{compute}}) = \sum_{s=0}^{S-1} \Big( \max_{f\in\{0,1,\dots,N_f/2\}} \big| G_s(e^{j2\pi\frac{f}{N_f}}) \big| \Big) \quad (7.43)
$$

In the objective function that we compute, (7.10), we use (7.43), and hence we shorten its notation as

$$
J_{S,\text{compute}}(G_{\text{compute}}) = J_{S,\text{abs},\text{compute}}(G_{\text{compute}}) \quad (7.44)
$$

(In code, the penalty (7.43) and the companion active-subchannel count are each one line; see the sketch below.)
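A minimal NumPy sketch, with a random stand-in for $G_{\text{compute}}$; the relative threshold in `count_active` mirrors the 1/1000 rule used later in Chapter 8:

```python
import numpy as np

def J_S(G_compute, Nf):
    """The penalty of (7.43): sum over subchannels of each subchannel's
    peak response magnitude over f = 0..Nf/2 (real taps assumed)."""
    peaks = np.abs(G_compute[:, : Nf // 2 + 1]).max(axis=1)
    return peaks.sum()

def count_active(G_compute, Nf, tol=1e-3):
    """Count subchannels whose peak exceeds a relative threshold."""
    peaks = np.abs(G_compute[:, : Nf // 2 + 1]).max(axis=1)
    return int((peaks > tol * peaks.max()).sum())

rng = np.random.default_rng(1)
G = rng.standard_normal((6, 8)) * (rng.random((6, 1)) > 0.5)  # some silent rows
print(J_S(G, 8), count_active(G, 8))
```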
7.4 Target Source Perfect Reconstruction Condition

The target perfect reconstruction condition, TPR, is given by (6.9). To make (6.9) computationally tractable, we discretize the continuous variable $w$ using $N_f$ points as in (7.16), so that for $f = 0, 1, \dots, N_f-1$, (6.9) becomes

$$
\mathrm{TPR}_{\text{compute}}(f,d) = \sum_{n=0}^{N-1} H_{0,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \sum_{l=0}^{C-1} G_{n,l}(e^{j2\pi\frac{f}{N_f}})\, F_{n,l}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) =
\begin{cases} D e^{j2\pi(\frac{f}{N_f})(-\Delta)} & \text{if } d = 0; \\ 0 & \text{if } 1 \le d \le D-1 \end{cases} \quad (7.45)
$$

Since we have $D$ constraints for each of our $N_f$ discretized frequencies, the TPR condition now has a total of $N_f \cdot D$ constraints.

In matrix-vector form, and analogous to (6.10), the $D$ target perfect reconstruction conditions of (7.45) for a fixed $f \in \{0, 1, \dots, N_f-1\}$ can be written as

$$
\mathrm{TPR}_{\text{compute}}(f) :\quad \dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}}) = D e^{j2\pi(\frac{f}{N_f})(-\Delta)} \cdot \mathbf{e}^T_{0,1\times D} \quad (7.46)
$$

where $S = N \cdot C$ is the number of subchannels. The matrix $\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})$, of size $D \times S$, is defined as

$$
\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}}) =
\begin{bmatrix}
\dot{\mathbf{F}}_{0,1\times C}(e^{j2\pi\frac{f}{N_f}}) & \cdots & \dot{\mathbf{F}}_{N-1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\dot{\mathbf{F}}_{0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) & \cdots & \dot{\mathbf{F}}_{N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{1}{D})}) \\
\vdots & \ddots & \vdots \\
\dot{\mathbf{F}}_{0,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})}) & \cdots & \dot{\mathbf{F}}_{N-1,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{D-1}{D})})
\end{bmatrix} \quad (7.47)
$$

and consists of $D \cdot N$ row vectors, of size $1 \times C$, defined as

$$
\dot{\mathbf{F}}_{n,1\times C}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) = H_{0,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \cdot \big[ F_{n,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, F_{n,1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] \quad (7.48)
$$

with $d \in \{0, 1, \dots, D-1\}$. The column vector $\mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}})$, of size $S \times 1$, is defined as

$$
\mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}}) =
\begin{bmatrix}
\mathbf{G}^T_{0,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\mathbf{G}^T_{1,1\times C}(e^{j2\pi\frac{f}{N_f}}) \\
\vdots \\
\mathbf{G}^T_{N-1,1\times C}(e^{j2\pi\frac{f}{N_f}})
\end{bmatrix} \quad (7.49)
$$

and consists of $N$ column vectors, of size $C \times 1$, defined as

$$
\mathbf{G}^T_{n,1\times C}(e^{j2\pi\frac{f}{N_f}}) = \big[ G_{n,0}(e^{j2\pi\frac{f}{N_f}}),\, G_{n,1}(e^{j2\pi\frac{f}{N_f}}),\, \dots,\, G_{n,C-1}(e^{j2\pi\frac{f}{N_f}}) \big]^T \quad (7.50)
$$

The column vector $\mathbf{e}^T_{k,1\times D}$, of size $D \times 1$, is the $D \times 1$ zero vector with the $k$-th entry set to 1:

$$
\mathbf{e}^T_{k,1\times D} = [0, \dots, 0, \underbrace{1}_{k\text{-th entry}}, 0, \dots, 0]^T \quad (7.51)
$$

We note that if we use the same set of $C$ analysis filters for each filterbank, that is,

$$
\big[ F_{n_1,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n_1,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] = \big[ F_{n_2,0}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}),\, \dots,\, F_{n_2,C-1}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) \big] \quad (7.52)
$$

for all $0 \le n_1, n_2 \le N-1$, $0 \le d \le D-1$, and $0 \le f \le N_f-1$, and the target inversion pre-filter removes the effect of propagation on phase perfectly, that is,

$$
H_{0,n}(e^{j2\pi(\frac{f}{N_f}-\frac{d}{D})}) = \alpha_{0,n} \quad (7.53)
$$

for all $0 \le n \le N-1$, $0 \le d \le D-1$, and $0 \le f \le N_f-1$, then (7.47) has rank $\min(D,C)$, since every block of $C$ columns is a scalar multiple of the previous block of $C$ columns. Since we need to fulfill $D$ constraints, these two additional assumptions imply that $D \le C$.

If we assume that all filter taps are real, then we can almost halve the number of constraints using the conjugate symmetry of the filter responses and the $2\pi$-periodicity in $w$ of the z-transform for $z = e^{jw}$. First, if (7.46) holds for $0 \le f \le N_f-1$, then the conjugate of the entire equation also holds, that is,

$$
\overline{\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}})} = \overline{D \cdot \mathbf{e}^T_{0,1\times D}} = D \cdot \mathbf{e}^T_{0,1\times D} \quad (7.54)
$$

where the last equality holds because $\mathbf{e}^T_{0,1\times D}$ contains all real entries. Next, we have

$$
\overline{\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{f}{N_f}})}\; \overline{\mathbf{G}^T_{1\times S}(e^{j2\pi\frac{f}{N_f}})} = D \cdot \mathbf{e}^T_{0,1\times D}
$$
$$
\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{-f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{-f}{N_f}}) = D \cdot \mathbf{e}^T_{0,1\times D}
$$
$$
\dot{\mathbf{F}}^T_{S\times D}(e^{j2\pi\frac{(-f) \bmod N_f}{N_f}})\, \mathbf{G}^T_{1\times S}(e^{j2\pi\frac{(-f) \bmod N_f}{N_f}}) = D \cdot \mathbf{e}^T_{0,1\times D}
$$

where the second line follows from conjugate symmetry and the third from the $2\pi$-periodicity in $w$. In summary, we have shown that if $\mathrm{TPR}_{\text{compute}}(f)$ holds, so does $\mathrm{TPR}_{\text{compute}}((-f) \bmod N_f)$. Hence, we can express the constraints of (7.46) for all $f$ as a single block-diagonal matrix-vector product:

$$
\dot{\mathbf{F}}^T_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D}\; \mathbf{G}^T_{1\times(\frac{N_f}{2}+1)S} = D \cdot \dot{\mathbf{E}}^T_{0,1\times(\frac{N_f}{2}+1)D} \quad (7.55)
$$

and by transposition we have

$$
\mathbf{G}_{1\times(\frac{N_f}{2}+1)S}\; \dot{\mathbf{F}}_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D} = D \cdot \dot{\mathbf{E}}_{0,1\times(\frac{N_f}{2}+1)D} \quad (7.56)
$$

where the block diagonal matrix $\dot{\mathbf{F}}_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D}$, of size $(\frac{N_f}{2}+1)S \times (\frac{N_f}{2}+1)D$, is given by

$$
\dot{\mathbf{F}}_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D} = \mathrm{diag}\Big( \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{0}{N_f}}),\; \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{N_f/2-1}{N_f}}),\; \tfrac{1}{\sqrt 2}\, \dot{\mathbf{F}}_{S\times D}(e^{j2\pi\frac{N_f/2}{N_f}}) \Big) \quad (7.57)
$$

the row vector of unknowns $\mathbf{G}_{1\times(\frac{N_f}{2}+1)S}$, of size $1 \times (\frac{N_f}{2}+1)S$, is given by

$$
\mathbf{G}_{1\times(\frac{N_f}{2}+1)S} = \big[ \mathbf{G}_{1\times S}(e^{j2\pi\frac{0}{N_f}}),\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{1}{N_f}}),\; \dots,\; \mathbf{G}_{1\times S}(e^{j2\pi\frac{N_f/2}{N_f}}) \big] \quad (7.58)
$$

and the row vector of constraints $\dot{\mathbf{E}}_{0,1\times(\frac{N_f}{2}+1)D}$, of size $1 \times (\frac{N_f}{2}+1)D$, is given by

$$
\dot{\mathbf{E}}_{0,1\times(\frac{N_f}{2}+1)D} = \Big[ \underbrace{\tfrac{1}{\sqrt 2}\, e^{j2\pi(\frac{0}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=0},\; \underbrace{e^{j2\pi(\frac{1}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=1},\; \dots,\; \underbrace{e^{j2\pi(\frac{N_f/2-1}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=\frac{N_f}{2}-1},\; \underbrace{\tfrac{1}{\sqrt 2}\, e^{j2\pi(\frac{N_f/2}{N_f})(-\Delta)}\, \mathbf{e}_{0,1\times D}}_{f=\frac{N_f}{2}} \Big] \quad (7.59)
$$

As before, $S$ is the total number of subchannels, equal to the product of the number of filterbanks $N$ and the number of subchannels per filterbank $C$, that is, $S = N \cdot C$. In addition, by conjugating (7.55), we make the terms consistent with (7.35), where the unknown synthesis filters form a column vector, and hence we have

$$
\mathrm{TPR}_{\text{compute}} :\quad \dot{\mathbf{F}}^*_{(\frac{N_f}{2}+1)S\times(\frac{N_f}{2}+1)D}\; \mathbf{G}^*_{1\times(\frac{N_f}{2}+1)S} = D \cdot \dot{\mathbf{E}}^*_{0,1\times(\frac{N_f}{2}+1)D} \quad (7.60)
$$

7.5 Optimization Problem

We can now combine the results of the previous sections into an optimization problem that is computationally tractable:

$$
\begin{aligned}
\underset{G_{\text{compute}}}{\text{minimize}} \quad & J_{\text{compute}}(G_{\text{compute}}) = J_{I,p,\text{compute}}(G_{\text{compute}}) + \lambda\, J_{S,\text{compute}}(G_{\text{compute}}) \\
\text{subject to} \quad & \mathrm{TPR}_{\text{compute}}
\end{aligned} \quad (7.61)
$$

where $G_{\text{compute}}$ is given by (7.8), $J_{I,p,\text{compute}}$ by (7.40), $J_{S,\text{compute}}$ by (7.44), and $\mathrm{TPR}_{\text{compute}}$ by (7.56). We note again that $\lambda$ is a user-chosen constant that sets the severity of the sparse subchannel penalty, and that problem (7.61) is convex.
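For concreteness, here is a minimal CVXPY sketch of problem (7.61) under simplifying assumptions: random stand-ins replace the precomputed $\dot{\mathbf{F}}$ matrices, the $1/\sqrt 2$ endpoint weights of (7.38) and (7.57) are omitted, and all helper names are mine. The dissertation's experiments used the MATLAB CVX toolbox with MOSEK; CVXPY is a close analogue:

```python
import numpy as np
import cvxpy as cp

# Fdot[r][f] stands in for the S x D matrix of (7.33) for interference r;
# Fdot0[f] stands in for the target matrix of (7.47). All data are toys.
S, D, Nfh, R = 8, 2, 9, 3        # Nfh = Nf/2 + 1 unique frequencies
Nf, delta, lam = 16, 1, 0.1
rng = np.random.default_rng(2)
Fdot = [[rng.standard_normal((S, D)) + 1j * rng.standard_normal((S, D))
         for _ in range(Nfh)] for _ in range(R)]
Fdot0 = [rng.standard_normal((S, D)) + 1j * rng.standard_normal((S, D))
         for _ in range(Nfh)]

G = cp.Variable((S, Nfh), complex=True)
c = 1.0 / (D**2 * Nf)
# Interference gains via (7.35), using ||Fdot* G*|| = ||Fdot^T G|| per f:
gains = [c * sum(cp.sum_squares(Fdot[r][f].T @ G[:, f]) for f in range(Nfh))
         for r in range(R)]
# Sparse-subchannel penalty (7.43): sum of per-subchannel peak magnitudes.
penalty = cp.sum(cp.max(cp.abs(G), axis=1))
# Target perfect reconstruction (7.46): D equalities per frequency.
e0 = np.eye(D)[0]
constraints = [Fdot0[f].T @ G[:, f] ==
               D * np.exp(-2j * np.pi * f * delta / Nf) * e0
               for f in range(Nfh)]
prob = cp.Problem(cp.Minimize(cp.max(cp.hstack(gains)) + lam * penalty),
                  constraints)
prob.solve()                     # e.g. prob.solve(solver=cp.MOSEK)
print(prob.status, round(prob.value, 4))
```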
Chapter 8: Numerical Results When Optimizing Over Synthesis Filters

In this chapter, we detail our numerical simulations. Our simulations rely heavily on Michael Grant's CVX toolbox, a package for specifying and solving convex programs [41], and its integration with the professional solver MOSEK.

Fig. 8.1. Room environment.

The rectangular room shown in Fig. 8.1 has dimensions 10 meters by 8 meters, and 32 microphones are placed uniformly along its 4 walls. The lower left corner of the room is defined as the origin, coordinate (0, 0), and our target source is located at (3, 2.5). There are 1240 virtual interferences in our room. Four-fifths of the interferences lie outside the room and model reflections off the four walls. We model our room environment using a large number of virtual interferences since we do not know a priori where the actual sources lie.

Each microphone is processed by a two-channel filterbank, with each channel decimated and then later upsampled by a factor of 2. Hence we have 32 · 2 = 64 subchannels. We fix the two analysis filters of each filterbank to be Haar filters. We wish to design the synthesis filters that minimize the maximum gain of the interferences, and hence we let p = ∞. In addition, we wish to reconstruct the target source perfectly while using only 20 of the possible 64 subchannels. In other words, only 20 of the 64 synthesis filters are allowed to be active and have non-zero frequency responses. We note that 64 choose 20 is approximately 2 × 10^16, so there is an overwhelming number of subsets of subchannels to choose from.

Fig. 8.2. Performance on optimization grid consisting of 1240 interferences, in dB.

We run our optimization problem iteratively, tuning the parameter λ at each iteration until we find exactly 20 active subchannels. λ, a non-negative scalar, is found through a bisection algorithm (sketched below), since as λ increases the number of active subchannels decreases, and as λ decreases the number of active subchannels increases. A subchannel is considered inactive if the maximum magnitude of its synthesis filter is less than 1/1000 of the maximum magnitude over all the synthesis filters. Fig. 8.3 shows the maximum magnitude of each subchannel's synthesis filter after our iterations completed. Notice that 20 of the 64 subchannels have non-trivial synthesis filters.

Fig. 8.3. Maximum magnitude of each subchannel's synthesis filter for the λ value that produced the desired number of subchannels.
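A runnable sketch of this λ search, in which a toy stand-in (`solve_for_lambda`, a name and behavior of my own invention) replaces the actual CVX solve of (7.61) so the bisection has a deterministic, monotone knob to steer:

```python
import numpy as np

def solve_for_lambda(lam):
    """Stand-in for the CVX solve of (7.61): a fixed random 'solution'
    whose entries are crudely zeroed out as lambda grows."""
    rng = np.random.default_rng(3)           # same data on every call
    G = rng.random((64, 9))
    G[G < lam] = 0.0
    return G

def count_active(G, tol=1e-3):
    peaks = np.abs(G).max(axis=1)
    return int((peaks > tol * max(peaks.max(), 1e-30)).sum())

def tune_lambda(target, lam_lo=0.0, lam_hi=1.0, iters=60):
    """Bisect on lambda: larger lambda -> fewer active subchannels."""
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        G = solve_for_lambda(lam)
        k = count_active(G)
        if k == target:
            return lam, G
        if k > target:          # too many active subchannels: penalize more
            lam_lo = lam
        else:                   # too few: penalize less
            lam_hi = lam
    raise RuntimeError("no lambda found; widen the bracket or iterations")

lam, G = tune_lambda(target=20)
print("lambda =", round(lam, 4), "active =", count_active(G))
```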
Our optimization problem finds a sparse number of synthesis filters and thus a sparse number of active subchannels. However, the frequency responses of the inactive synthesis filters are not exactly zero. Hence, we run our optimization routine one more time, optimizing the synthesis filters of only the previously discovered active subchannels and setting our sparsity parameter λ to zero. This is known as the debiasing step. In some sense, this step redistributes the "crumbs" of energy in the inactive subchannels' synthesis filters to the active subchannels' synthesis filters. Fig. 8.4 shows each subchannel's maximum synthesis filter magnitude after this debiasing step.

Fig. 8.4. Maximum magnitude of each subchannel's synthesis filter after debiasing.

Our optimization routine returns a setup that uses slightly more low-frequency subchannels than high-frequency subchannels. Furthermore, the setup even occasionally uses only a single subchannel of a filterbank and not the other subchannel. See Fig. 8.7, 8.8, 8.9, and 8.10 for sample synthesis filter frequency responses of the 64 subchannels, 20 of which are active. Fig. 8.5 shows the locations of the active low-frequency subchannels, and Fig. 8.6 shows the locations of the active high-frequency subchannels. We optimized over 16 discrete frequencies, 9 of which are unique since we assume all our filter taps are real and therefore the frequency responses are conjugate symmetric. Notice that the total number of active subchannels sums to the desired number of active subchannels, 20.

Fig. 8.5. Locations of active low-frequency subchannels.

Fig. 8.6. Locations of active high-frequency subchannels.

Fig. 8.2 shows how the 20 synthesis filters we found performed on the set of sources we optimized over. The target gain is 0 dB because target perfect reconstruction was an optimization constraint. The worst interference gain is −8.27 dB. While this result is impressive, a more realistic measure of performance is to test our system's filters in a room with a denser set of interferences, as in Fig. 8.11, where the worst interference gain is 1.84 dB. Not surprisingly, the performance is worst near the microphones. However, the target gain is again 0 dB, and the gain map decays very smoothly.

Fig. 8.7. Synthesis filter frequency responses of a filterbank whose low-frequency subchannel alone is active.

Fig. 8.8. Synthesis filter frequency responses of a filterbank whose high-frequency subchannel alone is active.

Fig. 8.9. Synthesis filter frequency responses of a filterbank whose subchannels are both active.

Fig. 8.10. Synthesis filter frequency responses of a filterbank whose subchannels are both inactive.

Fig. 8.11. Performance on evaluation grid consisting of 4960 interferences, in dB.

Chapter A: Properties of $\Phi_r$

In the following, we assume that $N_f$, the number of discretized frequencies, is a multiple of $D$, the downsampling factor, that is, $N_f = M \cdot D$ with $M$ a positive integer. We now show that $\Phi_r(s_1,s_2,f)$ is $M$-periodic in $f$, that is,

$$
\Phi_r(s_1,s_2,f+M) = \Phi_r(s_1,s_2,f) \quad (A.1)
$$

Proof. To simplify notation, we let

$$
\phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2}{N_f}}) = \dot F_{s_1,r}(e^{j2\pi\frac{f-Md_2}{N_f}})\, \overline{\dot F_{s_2,r}}(e^{j2\pi\frac{f-Md_2}{N_f}})
$$

Also let $d_2' = d_2 - 1$. Then we have

$$
\Phi_r(s_1,s_2,f+M) = \sum_{d_2=0}^{D-1} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-M(d_2-1)}{N_f}})
= \sum_{d_2'=-1}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
= \phi_{s_1,s_2,r}(e^{j2\pi\frac{f+M}{N_f}}) + \sum_{d_2'=0}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
$$
$$
= \phi_{s_1,s_2,r}(e^{j2\pi\frac{f+M}{N_f}}\, e^{j2\pi\frac{-MD-M+M}{N_f}}) + \sum_{d_2'=0}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
= \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-M(D-1)}{N_f}}) + \sum_{d_2'=0}^{D-2} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
$$
$$
= \sum_{d_2'=0}^{D-1} \phi_{s_1,s_2,r}(e^{j2\pi\frac{f-Md_2'}{N_f}})
= \Phi_r(s_1,s_2,f)
$$

where the fourth line uses the fact that the z-transform is $2\pi$-periodic in $w$ for $z = e^{jw}$.

We now show that the matrix $\Phi_{r,S\times S}(f)$ is positive semi-definite, that is,

$$
x^* \Phi_{r,S\times S}(f)\, x \ge 0 \quad (A.2)
$$

for any non-zero column vector $x \in \mathbb{C}^S$.
Proof. From (7.31), we have

$$
x^* \Phi_{r,S\times S}(f)\, x = \sum_{s_1=0}^{S-1} \sum_{s_2=0}^{S-1} \overline{x_{s_1}}\, x_{s_2} \sum_{d_2=0}^{D-1} \dot F_{s_1,r}(e^{j2\pi\frac{f-Md_2}{N_f}})\, \overline{\dot F_{s_2,r}}(e^{j2\pi\frac{f-Md_2}{N_f}})
$$
$$
= \sum_{d_2=0}^{D-1} \Big( \sum_{s_1=0}^{S-1} \overline{x_{s_1}}\, \dot F_{s_1,r}(e^{j2\pi\frac{f-Md_2}{N_f}}) \Big) \Big( \sum_{s_2=0}^{S-1} x_{s_2}\, \overline{\dot F_{s_2,r}}(e^{j2\pi\frac{f-Md_2}{N_f}}) \Big)
= \sum_{d_2=0}^{D-1} \Big| \sum_{s=0}^{S-1} \overline{x_s}\, \dot F_{s,r}(e^{j2\pi\frac{f-Md_2}{N_f}}) \Big|^2 \ge 0
$$

where $x_k$ is the $k$-th entry of the vector $x$.

Chapter B: Parseval Relationship For Energy of Filters

The following equation relates the $l_2$ energy of a filter $g$'s tap coefficients to the $l_2$ energy of its transfer function:

$$
\sum_l |g[l]|^2 = \frac{1}{2\pi} \int_0^{2\pi} |G(w)|^2\, dw \quad (B.1)
$$

Proof. The inverse discrete-time Fourier transform relates the tap coefficients of a filter $g$ to its frequency response $G$:

$$
g[l] = \frac{1}{2\pi} \int_0^{2\pi} G(w)\, e^{jwl}\, dw \quad (B.2)
$$

The Poisson summation formula gives us a useful "time-frequency" relationship:

$$
\sum_n e^{2\pi j n x} = \sum_k \delta(x-k) \quad (B.3)
$$

We note that the delta distribution $\delta(x)$ has the following scaling property:

$$
|a|\, \delta(ax) = \delta(x) \quad (B.4)
$$

We can now relate the $l_2$ energy of the filter's tap coefficients to the $l_2$ energy of the filter's transfer function:

$$
\sum_l |g[l]|^2 = \sum_l \Big| \frac{1}{2\pi} \int_0^{2\pi} G(w)\, e^{jlw}\, dw \Big|^2
= \sum_l \frac{1}{4\pi^2} \int_0^{2\pi} G(w_1)\, e^{jlw_1}\, dw_1 \cdot \int_0^{2\pi} \overline{G(w_2)}\, e^{-jlw_2}\, dw_2
$$
$$
= \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{2\pi} G(w_1)\, \overline{G(w_2)} \underbrace{\sum_l e^{2\pi j l \frac{w_1-w_2}{2\pi}}}_{\sum_n \delta(\frac{w_1-w_2}{2\pi}-n)}\, dw_1\, dw_2
= \frac{1}{2\pi} \sum_n \int_0^{2\pi}\!\!\int_0^{2\pi} G(w_1)\, \overline{G(w_2)}\, \delta(w_1-w_2-2\pi n)\, dw_1\, dw_2
= \frac{1}{2\pi} \int_0^{2\pi} |G(w)|^2\, dw
$$

Note that $|w_1 - w_2| < 2\pi$ since both $w_1$ and $w_2$ range over $[0, 2\pi)$, and hence the sum over $n$ produces a non-zero $\delta(w_1-w_2-2\pi n)$ only when $n = 0$. (We ignore a difference of exactly $2\pi$ because it has measure 0.)
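A quick FFT-based check of (B.1): for an FIR filter, the discrete Parseval identity makes the uniform-grid Riemann sum of $|G(w)|^2$ exact (provided the grid has at least as many points as taps), so tap energy and frequency-domain energy agree to machine precision. The filter below is a random toy:

```python
import numpy as np

rng = np.random.default_rng(4)
g = rng.standard_normal(16)        # random real FIR taps
Nw = 4096                          # dense frequency grid
G = np.fft.fft(g, Nw)              # samples of G(w) at w = 2*pi*k/Nw
lhs = np.sum(g**2)
rhs = np.sum(np.abs(G)**2) / Nw    # (1/2*pi) * integral as a Riemann sum
assert abs(lhs - rhs) < 1e-10
print(lhs, rhs)
```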
References

[1] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society (Series B), 58:267–288, 1996.

[2] E. J. Candès and Y. Plan. Near-ideal model selection by l1 minimization. Annals of Statistics, 37(5A):2145–2177, 2009.

[3] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[4] H. Lebret and S. Boyd. Antenna array pattern synthesis via convex optimization. Signal Processing, IEEE Transactions on, 45(3):526–532, 1997.

[5] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284(1):193–228, 1998.

[6] C. Ling, S. Wee, C. Wei, and Y. Zhu Liang. Linear sparse array synthesis via convex optimization. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 4233–4236, 2010.

[7] R. A. Haubrich. Array design. Bulletin of the Seismological Society of America, 58(3):977–991, 1968.

[8] S. Lang, C. Duckworth, and I. McClellan. Array design for MEM and MLM array processing. In Acoustics, Speech, and Signal Processing, IEEE International Conference on (ICASSP), pages 145–148, Atlanta, GA, USA, 1981.

[9] P. Pal and P. P. Vaidyanathan. Nested arrays: a novel approach to array processing with enhanced degrees of freedom. Signal Processing, IEEE Transactions on, 58(8):4167–4181, 2010.

[10] D. B. Ward and M. S. Brandstein. Grid-based beamformer design for room-environment microphone arrays. In Applications of Signal Processing to Audio and Acoustics, 1999 IEEE Workshop on, pages 23–26, 1999.

[11] M. S. Brandstein and D. B. Ward. Cell-based beamforming (CE-BABE) for speech acquisition with microphone arrays. Speech and Audio Processing, IEEE Transactions on, 8(6):738–743, 2000.

[12] J. B. Allen and D. A. Berkley. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am., 65(4):943–950, 1979.

[13] R. G. Baraniuk, E. Candès, R. Nowak, and M. Vetterli. Compressive sampling. IEEE Signal Processing Magazine, 25(2):12–13, 2008.

[14] E. J. Candès and M. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, 2008.

[15] E. J. Candès and Y. Plan. Near-ideal model selection by l1 minimization. The Annals of Statistics, 37(5A):2145–2177, 2009.

[16] Emmanuel J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9):589–592, 2008.

[17] Bernard D. Steinberg. The peak sidelobe of the phased array having randomly located elements. Antennas and Propagation, IEEE Transactions on, 20(2):129–136, 1972.

[18] Richard M. Leahy and Brian D. Jeffs. On the design of maximally sparse beamforming arrays. Antennas and Propagation, IEEE Transactions on, 39(8):1178–1187, 1991.

[19] Sverre Holm, Bjørnar Elgetun, and Geir Dahl. Properties of the beampattern of weight- and layout-optimized sparse arrays. Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, 44(5):983–991, 1997.

[20] Thomas Baran, Dennis Wei, and Alan V. Oppenheim. Linear programming algorithms for sparse filter design. Signal Processing, IEEE Transactions on, 58(3):1605–1617, 2010.

[21] David G. Leeper. Isophoric arrays: massively thinned phased arrays with well-controlled sidelobes. Antennas and Propagation, IEEE Transactions on, 47(12):1825–1835, 1999.

[22] Giacomo Oliveri and Andrea Massa. Bayesian compressive sampling for pattern synthesis with maximally sparse non-uniform linear arrays. Antennas and Propagation, IEEE Transactions on, 59(2):467–481, 2011.

[23] P. P. Vaidyanathan. Multirate Systems And Filter Banks. Dorling Kindersley, 1993.

[24] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.

[25] Shuichi Ohno and Hideaki Sakai. Optimization of filter banks using cyclostationary spectral analysis. Signal Processing, IEEE Transactions on, 44(11):2718–2725, 1996.

[26] Andrea Trucco and Vittorio Murino. Stochastic optimization of linear sparse arrays. Oceanic Engineering, IEEE Journal of, 24(3):291–299, 1999.

[27] Siddharth Joshi and Stephen Boyd. Sensor selection via convex optimization. Signal Processing, IEEE Transactions on, 57(2):451–462, 2009.

[28] Dragana Bajovic, Bruno Sinopoli, and João Xavier. Sensor selection for event detection in wireless sensor networks. Signal Processing, IEEE Transactions on, 59(10):4938–4953, 2011.

[29] Ling Cen, Wee Ser, Zhu Liang Yu, Susanto Rahardja, and Wei Cen. Linear sparse array synthesis with minimum number of sensors. Antennas and Propagation, IEEE Transactions on, 58(3):720–726, 2010.

[30] Venkat Roy, Sundeep Prabhakar Chepuri, and Geert Leus. Sparsity-enforcing sensor selection for DOA estimation. In Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2013 IEEE 5th International Workshop on, pages 340–343. IEEE, 2013.

[31] Jian Wang, Wei-Xing Sheng, Yu-Bing Han, and Xiao-Feng Ma. Adaptive beamforming with compressed sensing for sparse receiving array. Aerospace and Electronic Systems, IEEE Transactions on, 50(2):823–833, 2014.

[32] Siew Eng Nai, Wee Ser, Zhu Liang Yu, and Huawei Chen. Beampattern synthesis for linear and planar arrays with antenna selection by convex optimization. Antennas and Propagation, IEEE Transactions on, 58(12):3923–3930, 2010.

[33] Zhi Guo Feng, Ka Fai Cedric Yiu, and Sven E. Nordholm. Placement design of microphone arrays in near-field broadband beamformers. Signal Processing, IEEE Transactions on, 60(3):1195–1204, 2012.

[34] J. L. Araque Quijano, M. Righero, and G. Vecchi. Sparse 2-d array placement for arbitrary pattern mask and with excitation constraints: a simple deterministic approach. Antennas and Propagation, IEEE Transactions on, 62(4):1652–1662, 2014.

[35] James D. Johnston. A filter family designed for use in quadrature mirror filter banks. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'80, volume 5, pages 291–294. IEEE, 1980.

[36] Tongwen Chen and B. A. Francis. Design of multirate filter banks by H∞ optimization. IEEE Transactions on Signal Processing, 43:2822–2830, 1995.

[37] Eric Viscito and Jan P. Allebach. The analysis and design of multidimensional FIR perfect reconstruction filter banks for arbitrary sampling lattices. Circuits and Systems, IEEE Transactions on, 38(1):29–41, 1991.

[38] Truong Q. Nguyen. Digital filter bank design quadratic-constrained formulation. Signal Processing, IEEE Transactions on, 43(9):2103–2108, 1995.
[39] Ha Hoang Kha, Hoang Duong Tuan, and Truong Q. Nguyen. Efficient design of cosine-modulated filter banks via convex optimization. Signal Processing, IEEE Transactions on, 57(3):966–976, 2009.

[40] Timothy Davidson. Enriching the art of FIR filter design via convex optimization. Signal Processing Magazine, IEEE, 27(3):89–101, 2010.

[41] Michael Grant and Stephen Boyd. CVX: MATLAB software for disciplined convex programming, version 2.1, March 2014.