Learnable Wavelet Scattering Networks: Applications to Fault Diagnosis of Analog Circuits and Rotating Machinery

Varun Khemani *, Michael H. Azarian and Michael G. Pecht

Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD 20742, USA; mazarian@umd.edu (M.H.A.); pecht@umd.edu (M.G.P.)
* Correspondence: vkheman@umd.edu

Abstract: Analog circuits are a critical part of industrial electronics and systems. Estimates in the literature show that, even though analog circuits comprise less than 20% of all circuits, they are responsible for more than 80% of faults. Hence, analog circuit fault diagnosis and isolation can be a valuable means of ensuring the reliability of circuits. This paper introduces a novel technique of learning time–frequency representations, using learnable wavelet scattering networks, for the fault diagnosis of circuits and rotating machinery. Wavelet scattering networks, which are fixed time–frequency representations based on existing wavelets, are modified to be learnable so that they can learn features that are optimal for fault diagnosis. The learnable wavelet scattering networks are developed using the genetic algorithm-based optimization of second-generation wavelet transform operators. The simulation and experimental results for the diagnosis of analog circuit faults demonstrate that the developed diagnosis scheme achieves greater fault diagnosis accuracy than other methods in the literature, even while considering a larger number of fault classes. The performance of the diagnosis scheme on benchmark datasets of bearing faults and gear faults shows that the developed method generalizes well to fault diagnosis in multiple domains and has good transfer learning performance, too.

Keywords: wavelet scattering networks; analog circuits; rotating machinery; fault diagnosis; scattering networks; fault isolation; second-generation wavelet transform

1. Introduction

Electronic circuits are ubiquitous in our everyday lives, in applications ranging from the commercial domain to the safety-critical domain. As a result, unforeseen circuit failures can have enormous consequences for the safety and financial well-being of their users and producers [1,2]. Analog circuit failures can be attributed to interconnect failures or component faults, which are associated with either parametric drift (soft faults) or short circuit/open circuit (hard faults) [3]. Analog circuits have become increasingly complex and, consequently, fault diagnosis is increasingly difficult, due to: (a) component tolerances, (b) interactions among components, (c) inadequate accessible measurement nodes; and (d) the inherent non-linearity in the behavior of analog circuits. Compared to digital circuits, analog circuits are more susceptible to interference and have fewer measurement nodes.
Interestingly, even though analog circuits account for less than 20% of all circuits, they are responsible for more than 80% of circuit faults [4,5]. Therefore, the fault diagnosis of analog circuits has become a highly important research area in recent years.

There are two broad categories of fault diagnosis approaches for circuits: analytical methods and data-driven methods. Circuit transfer function equations are required to apply analytical methods [6]. If these equations are unavailable, they can be determined using design principles or parameter identification techniques [7], and fault diagnosis is then achieved by exposing the circuit to a test stimulus and using the response to estimate the circuit parameters. This technique is suitable for linear analog circuits but is not feasible for nonlinear analog circuits because of the complexity involved [8].

Data-driven methods [9–12] require data obtained under faulty conditions to be available either through testing, operation, or simulation, such that a comparison can be made to data obtained under healthy conditions for fault diagnosis. Features of the data are used for this comparison and can be time domain, frequency domain, or time–frequency domain. Various machine learning approaches such as neural networks, support vector machines, the Naïve Bayes classifier, etc., have been used for fault diagnosis under the broad umbrella of data-driven methods. Neural-network-based fault-diagnosis approaches [13,14] have included, for feature generation: kurtosis and entropy [15], wavelet transforms [16], and fractional wavelet transforms [17]; and for dimensionality reduction: kernel PCA (kPCA) [16,17]. Support vector machine (SVM)-based [18] fault-diagnosis approaches have further included, for feature generation: the fractional Fourier transform [19], the cross-wavelet transform [20,21], deep belief networks (DBN) [22,23], and empirical mode decomposition [24]; for dimensionality reduction: parametric t-SNE [20] and principal component analysis [21]; and for SVM hyperparameter optimization: the double-chains quantum genetic algorithm [24], the fruit fly algorithm [25], the barnacles mating optimizer algorithm [26], and the firefly algorithm [27]. Naïve-Bayes-classifier-based [28] fault-diagnosis approaches include, for feature generation: the cross-wavelet transform [29]; and for dimensionality reduction: bilateral 2D linear discriminant analysis.

The standard approach that the vast majority of these methods follow is to extract features and apply a dimensionality reduction algorithm to obtain a lower-dimensional feature set, which is then fed to a classification algorithm. Extracting features informative for fault diagnosis requires technical expertise, which restricts its application as a generalized method. Recently, techniques have been proposed involving the direct application of deep learning methods for fault diagnosis. These techniques use input data to learn features autonomously through a multi-layered neural network. This avoids the need for manual feature extraction and feature selection.
For example, different 2D representations [30,31] have been developed for circuit outputs for use with state-of-the-art deep learning networks such as ResNet50 [32] to achieve fault diagnosis. However, the creation of an optimal custom deep learning network structure for the problem at hand requires subject matter expertise and extensive trial-and-error [33]. Inspired by wavelet scattering theory [34] and the second-generation wavelet transform [35], we propose a novel technique that does not need to be optimized for structure and learns wavelet filters, instead of random filters, from the data. Hence, it overcomes the shortcomings of deep learning networks.

The remainder of the paper is organized as follows: Section 2 presents a theoretical background of the techniques involved in the approach. Section 3 details the developed fault diagnosis methodology. Section 4 details the application of the approach to the fault diagnosis of two circuits, a bearing dataset, and a gear dataset. The conclusions follow in Section 5.

2. Theoretical Background

As mentioned earlier, in this paper, time–frequency representations are learnt from the circuit outputs for fault diagnosis using learnable wavelet scattering networks (LWSNs). This involves modifying wavelet scattering networks, which are fixed time–frequency representations based on existing wavelets, such that they can learn features that are optimal for fault diagnosis. Learnable wavelet scattering networks are developed using the genetic-algorithm-based optimization of second-generation wavelet transform operators. Support vector machines (SVMs) are used as classifiers for the features learned by the LWSN. In the following subsections, we review the basics of a wavelet transform, a wavelet scattering network, a genetic algorithm, and a support vector machine, and introduce the concept of learnable wavelet scattering networks.

2.1. Wavelet Transform

A wavelet transform is a collection of bandpass filters with progressively broader bandwidths at higher frequencies. A wavelet is a time-limited waveform that has a non-zero norm and a zero average value. Often, signals are piecewise smooth but have momentary transients; for example, edges in images or transients caused by rapid changes in economic conditions in financial time series. The Fourier basis is not suited to the sparse representation of these signals, as its sinusoids have infinite duration and would require sine waves of various frequencies for representation. Wavelets, being irregular and of limited time, require the break-up of a signal into a limited number of variations of the original wavelet,

\psi_{u,s}(t) = \frac{1}{\sqrt{s}} \, \psi\left(\frac{t-u}{s}\right)

The scale parameter s is inversely proportional to the frequency. A small scale s leads to a compressed wavelet, which is ideal for high-frequency signals with rapidly changing details. A long scale s leads to a stretched wavelet, which is ideal for slowly changing signals with coarse features, i.e., a low-frequency signal. This increases the flexibility of the time–frequency analysis. The wavelet transform (1) has scale-varying basis functions:

W f(u, s) = \int_{-\infty}^{\infty} f(t) \, \frac{1}{\sqrt{s}} \, \psi^{*}\left(\frac{t-u}{s}\right) dt \quad (1)

The continuous wavelet transform (CWT) (2) compares a signal with shifted and scaled versions of the mother wavelet:

\psi_{(u,s)} = \frac{1}{\sqrt{2^{j/v}}} \, \psi\left(\frac{t-m}{2^{j/v}}\right) \quad (2)

Here, v is the number of voices per octave, as it requires v intermediate scales to increase the scale by an octave. Higher values of v result in a finer discretization of the scale parameter s and an increase in the amount of computation required.
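As a concrete illustration of Equations (1) and (2), the following is a minimal NumPy sketch, written for this presentation rather than taken from the paper, of a Morlet-based CWT with v voices per octave; the wavelet choice, signal, and parameter names are our own assumptions.

```python
import numpy as np

def morlet(t, w0=6.0):
    # Complex Morlet mother wavelet (zero mean in practice, time-limited envelope)
    return np.pi**-0.25 * np.exp(1j * w0 * t) * np.exp(-t**2 / 2)

def cwt(x, num_octaves=5, v=8, fs=1.0):
    """Continuous wavelet transform with v voices per octave (cf. Eq. (2))."""
    n = len(x)
    t = (np.arange(n) - n // 2) / fs
    coeffs = []
    for j in range(num_octaves * v):
        s = 2.0 ** (j / v)                   # scale grows by one octave every v steps
        psi = morlet(t / s) / np.sqrt(s)     # scaled, normalized wavelet
        # Convolution evaluates the translation u over all positions at once
        coeffs.append(np.convolve(x, np.conj(psi[::-1]), mode="same"))
    return np.array(coeffs)                  # shape: (num_octaves * v, n)

# Example: a transient burst shows up at the fine (small) scales
fs = 1000.0
sig = np.sin(2 * np.pi * 50 * np.arange(1024) / fs)
sig[500:520] += 2.0
W = cwt(sig, num_octaves=5, v=8, fs=fs)
print(W.shape)  # (40, 1024)
```

Doubling v doubles the number of scales per octave, and hence the computation, which is the trade-off noted above.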
The discrete wavelet transform (DWT) has a much coarser discretization of the scale parameter, such that the number of voices per octave is always one. Depending on the translation parameter discretization, there are two broad types of DWT: the decimated DWT and the non-decimated DWT.

Decimated DWT (3): The translation parameter is 2^j m, where m is a non-negative integer and j is the scale. The decimated DWT is a sparse representation; hence, it is used for compression, denoising, signal transmission, etc.

\psi_{(u,s)} = \frac{1}{\sqrt{2^{j}}} \, \psi\left(\frac{t - 2^{j} m}{2^{j}}\right) \quad (3)

Non-decimated DWT (4): As in the case of the CWT, the translation parameter is independent of the scale parameter. The non-decimated DWT is a more redundant representation than the decimated DWT and is translation invariant.

\psi_{(u,s)} = \frac{1}{\sqrt{2^{j}}} \, \psi\left(\frac{t - m}{2^{j}}\right) \quad (4)

2.2. Wavelet Scattering Networks (WSNs)

In an effort to create interpretable networks that mimic human performance on vision and auditory tasks, some researchers use wavelet-transform-based methods, as wavelets are an approximation of the response of the human visual cortex and cochlea to stimuli [36]. For example, the wavelet transform renders a time domain signal to the time–frequency plane with a decreasing frequency resolution with increasing frequency, which is similar to the human cochlear response. Mallat [37] proposed WSNs (Figure 1) as a first step in understanding the success of Convolutional Neural Networks (CNNs). A wavelet scattering network computes a representation that preserves high-frequency information, is stable to deformations, and is translation invariant, which makes it a good feature extractor for classification. It is a cascade (tree) of convolutions between Gabor wavelet transforms (represented by ψ in Figure 1) and non-linear modulus and averaging operators (represented by φ in Figure 1), which "scatter" the signal along multiple paths. The number of paths at each node of the WSN is the scale of the wavelet transform (scale = 3 in Figure 1), and the number of layers of wavelet transforms is typically two. Discrete versions of WSNs were proposed by Wiatowski [36] and involve existing discrete orthogonal and biorthogonal wavelets.

Figure 1. Wavelet scattering network.
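The cascade in Figure 1 can be sketched by chaining wavelet convolution, modulus, and averaging. The snippet below is a rough, assumption-laden illustration of a zeroth- and first-order scattering path, not the paper's implementation; it reuses the cwt function and sig signal from the previous sketch and assumes a Gaussian averaging window phi.

```python
import numpy as np

def lowpass(n, width):
    # Gaussian averaging window phi (illustrative choice), width in samples
    t = np.arange(n) - n // 2
    phi = np.exp(-0.5 * (t / width) ** 2)
    return phi / phi.sum()

def scattering_order1(x, v=8, num_octaves=5, fs=1.0):
    """First-order scattering: S1 = |x * psi_j| * phi (modulus, then averaging)."""
    W = cwt(x, num_octaves=num_octaves, v=v, fs=fs)   # from the sketch above
    phi = lowpass(len(x), width=2 ** num_octaves)     # averaging at the coarsest scale
    S0 = np.convolve(x, phi, mode="same")             # zeroth order: averaged signal
    S1 = np.array([np.convolve(np.abs(w), phi, mode="same") for w in W])
    return S0, S1

S0, S1 = scattering_order1(sig, fs=fs)
print(S1.shape)  # one averaged modulus envelope per wavelet scale
```

A second layer would repeat the wavelet/modulus/averaging step on each first-order envelope, producing the two-layer tree of Figure 1.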
Unlike CNNs, a scattering network outputs coefficients at all layers, not just the last layer, and its filters are not learned from data but are predefined wavelets. Thus, the filters retain their physical meaning, which cannot be said of the filters that are developed through the learning process in a typical convolutional neural network. Operations in both CNNs and wavelet scattering networks can be represented as P(σ(x ⋆ w)), where x is the input signal, w is the filter, σ is the nonlinearity, and P is the pooling operator. In CNNs, the weights w are those of learned (randomly initialized) filters, while in WSNs, they are those of the fixed wavelet filters. Scattering networks provide state-of-the-art classification accuracies on simple to moderately complex datasets, such as textures in the CUReT dataset [34], musical genre and environmental sound classification [37], and images in the MNIST dataset [38]. However, for extremely complex datasets such as ImageNet [39] or the TIMIT Acoustic-Phonetic Continuous Speech Corpus [40], CNNs are still more accurate than scattering networks. A major reason for this is that scattering networks are fixed feature generators, while CNNs learn features from the data. As a result, an effort is made here to give the discrete wavelet scattering networks the learnability property, such that they can learn features from the data.

2.3. Learnable Wavelet Scattering Networks (LWSNs)

Instead of the fixed wavelet filters of the WSN, the wavelet filters in the LWSN are learnable using the second-generation wavelet transform (SGWT). The classical wavelet transform is realized through the translation and expansion of the mother wavelet function. This definition is very restrictive, so the SGWT does away with it. The lifting method [35], or lifting scheme (Figure 2), is a space domain wavelet construction method used to construct the SGWT filters, and it builds sparse representations by exploiting the correlation inherent in most real-world data. It consists of three basic steps:

1. Split: Let x(n) be an original signal. In this step, x(n) is divided into two subsets: the even subset x_e(n) and the odd subset x_o(n). The subsets are correlated according to the correlation structure of the original signal.

x_e(n) = x(2n) \quad (5)

x_o(n) = x(2n + 1) \quad (6)
2. Predict: The odd coefficients x_o(n) are predicted from the neighboring even coefficients x_e(n), and the prediction differences d(n) are defined as the detail signal,

d(n) = x_o(n) - P(x_e(n)) \quad (7)

where P = [p(1), \ldots, p(N)]^{T} is the prediction operator.

3. Update: A coarse approximation c(n) to the original signal is created by combining the even coefficients x_e(n) with a linear combination of the prediction differences,

c(n) = x_e(n) + U(d(n)) \quad (8)

where U = [u(1), \ldots, u(N)]^{T} is the update operator.

By iterating on the approximation signal c(n) using the three steps, the approximation and the detail signals are obtained at different levels (a code sketch of one lifting step follows Figure 2). The optimization of the lifting scheme's Update (U) and Predict (P) operators in the LWSN is carried out using the genetic algorithm (GA). The optimized Update (U) and Predict (P) operators are converted to the wavelet (ψ) and averaging (φ) operators using Claypoole's algorithm [35], such that the structure in Figure 1 can be used to learn time–frequency representations from the data.

Figure 2. Lifting scheme.
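As a concrete illustration of Equations (5)–(8), here is a minimal NumPy sketch, a simplification rather than the authors' implementation, of one lifting step with length-8 Predict and Update operators applied as FIR filters; the Haar-like initial coefficients are placeholders that the GA would evolve.

```python
import numpy as np

def lifting_step(x, p, u):
    """One split/predict/update lifting step (Eqs. (5)-(8)).

    p, u: coefficient vectors of the Predict and Update operators
    (length 8 in this paper; here treated as FIR filter taps).
    """
    xe, xo = x[0::2], x[1::2]                  # split into even/odd samples
    pred = np.convolve(xe, p, mode="same")     # predict odds from neighboring evens
    d = xo - pred                              # detail signal, Eq. (7)
    c = xe + np.convolve(d, u, mode="same")    # coarse approximation, Eq. (8)
    return c, d

# Placeholder initial operators; the GA evolves these 16 coefficients
p0 = np.zeros(8); p0[3] = 1.0
u0 = np.zeros(8); u0[3] = 0.5
x = np.random.randn(1024)
c, d = lifting_step(x, p0, u0)
print(c.shape, d.shape)  # (512,), (512,)
```

Iterating lifting_step on the returned approximation c yields the multi-level decomposition described above.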
Table 1 illustrates the differences between deep learning networks, wavelet scattering networks, and learnable wavelet scattering networks.

Table 1. Differences between networks.

| | Deep Learning Networks | Wavelet Scattering Networks | Learnable Wavelet Scattering Networks |
| Features | Learnt from data | Fixed wavelet type and coefficients (not learnt from data) | Wavelet type and coefficients learnt from data |
| Features output at | Last layer | Every layer | Every layer |
| Number of layers | Variable number of hidden (convolutional) layers | Two layers (typically) of fixed wavelets | Two layers of learned wavelets |
| Nonlinearity | Rectified Linear Unit/Hyperbolic Tangent, etc. | Modulus | Modulus |
| Pooling | Max/Averaging, etc. | Averaging | Averaging |
| Learning algorithm | Gradient descent and backpropagation | NA | Lifting method and genetic algorithm |
| Classifier | SoftMax | Any (e.g., SVM) | Any (e.g., SVM) |
| Architecture | Various architectures, e.g., ResNet [32], AlexNet [41], Recurrent Neural Network [42], etc. | See Figure 1 | See Figure 1 |

2.4. Genetic Algorithm (GA)

The GA [43] mimics the theory of natural selection. As is the case with evolution, a population consists of individuals which reproduce to create the next generation. This reproduction involves the combination of genetic material from parents to create an offspring. Each subsequent generation is created by parent individuals combining their genes. The selection of parents (individuals) to combine is based on their fitness, and the fitness of an individual is based on the fitness function. A total of 10% of the individuals with the best fitness move on to the next generation. This mechanism is called elitism, and the percentage of elite individuals can be changed. The remaining individuals take part in crossover, where the genes of two individuals (parents) are combined to create the genes of an individual of the next generation (child). Crossover is carried out until the required number of individuals (children) is created in the next generation. Analogous to mutation in natural reproduction, random changes are added to the genes of a fraction of the children created. This helps to avoid getting stuck in local minima during the optimization of the fitness function. The process repeats for the new generation and the subsequent generations until the predefined maximum number of generations is reached or there is no improvement in the fitness in consecutive generations. A minimal sketch of this loop is shown below.
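The following is a minimal, self-contained sketch of the GA loop described above, evolving a real-valued gene vector such as the 16 P and U coefficients used in Section 3; the fitness function, population operators, and numeric settings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_minimize(fitness, n_genes=16, pop_size=100, elite_frac=0.10,
                mutation_rate=0.05, max_gens=200, patience=30):
    pop = rng.normal(size=(pop_size, n_genes))
    best, stall = np.inf, 0
    for gen in range(max_gens):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)                      # lower fitness is better
        if scores[order[0]] < best - 1e-9:
            best, stall = scores[order[0]], 0
        else:
            stall += 1
            if stall >= patience:                       # no improvement: stop
                break
        n_elite = int(elite_frac * pop_size)
        elite = pop[order[:n_elite]]                    # elitism: best 10% survive
        children = []
        while len(children) < pop_size - n_elite:
            pa, pb = pop[rng.choice(order[:pop_size // 2], size=2)]
            mask = rng.random(n_genes) < 0.5            # uniform crossover
            child = np.where(mask, pa, pb)
            mut = rng.random(n_genes) < mutation_rate   # random mutation
            child[mut] += rng.normal(scale=0.1, size=mut.sum())
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    return pop[np.argmin([fitness(ind) for ind in pop])], best

# Toy fitness: a sphere function standing in for the DB index of Section 3
sol, val = ga_minimize(lambda g: np.sum(g ** 2))
print(val)
```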
2.5. Support Vector Machine (SVM)

An SVM is based on the concept of finding decision planes or hyperplanes that maximize the separation between classes. If the classes are not linearly separable, a kernel trick is used to map the data into higher dimensions in an effort to separate them. To find the support vectors and hence construct an optimal hyperplane, the following optimization problem [44] is solved:

\min_{w} \Phi(w) = \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i\left(w^{T}\varphi(x_i) + b\right) \geq 1 - \xi_i \quad (9)

where C is the penalty parameter that guards against overfitting, and ξ_i are the slack variables introduced to handle inseparable data. The input data consist of x_i and y_i, which are the independent and the dependent variables (class labels), respectively. The kernel function φ transforms the input data x_i into higher dimensions.

3. Fault Diagnosis Methodology

The implementation of the diagnostic scheme is depicted in Figure 3. Firstly, a dataset of signals acquired while the circuit components are degrading is obtained via simulation or experimentation. This dataset is randomly split into a training dataset [XTrain, YTrain] and a testing dataset [XTest, YTest], where XTrain and XTest represent the circuit output signals in the training and the testing dataset, respectively, and YTrain and YTest represent the corresponding labels (degrading components). A subset of signals (30%), XTrain′, is randomly selected from the entire training dataset to be used with the GA. This is done to prevent overfitting to the training dataset and to reduce the time taken for GA optimization. The fitness function used is the Davies–Bouldin (DB) index [45], as it considers the ratio of within-class and between-class distances. As a result, the minimization of the DB index leads to maximum separation between the classes. The GA is used to optimize the Predict and Update operators of the SGWT, such that the DB index is minimized. The genes of each individual in the GA are the coefficients of the P and U operators that need to be optimized by the GA. The P and U operators are assumed to be of length 8; hence, the number of genes in each individual is 16. Other hyperparameters chosen for the GA include a population size of 100, an elite count of 10%, a crossover fraction of 90%, and a mutation rate of 5%; the stopping criterion of the GA is when there is no appreciable improvement in the fitness function for 30 consecutive generations. The feature space (XTrainMod) created by the LWSN, with the optimized P and U operators, is classified using the SVM as the classifier. Since SVM hyperparameter optimization is not the focus of this paper, the hyperparameter optimization was carried out using built-in MATLAB functions.

Figure 3. Fault diagnosis methodology.
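A compact sketch of the loop in Figure 3, assuming scikit-learn in place of the paper's built-in MATLAB functions: the GA minimizes the Davies–Bouldin index of the feature space, and the resulting features are classified with an SVM. The lwsn_features stand-in below reuses lifting_step and ga_minimize from the earlier sketches and is only a placeholder for the full LWSN.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def lwsn_features(X, genes):
    # Illustrative stand-in: one lifting step per signal, simple statistics as features
    p, u = genes[:8], genes[8:]
    feats = []
    for x in X:
        c, d = lifting_step(x, p, u)      # from the earlier lifting sketch
        feats.append([c.mean(), c.std(), np.abs(d).mean(), d.std()])
    return np.array(feats)

def fitness(genes, X_sub, y_sub):
    # GA fitness: DB index of the learned feature space (lower = better separated)
    return davies_bouldin_score(lwsn_features(X_sub, genes), y_sub)

# X: (n_signals, n_samples) raw signals; y: integer fault labels (toy data here)
X = np.random.randn(200, 1024); y = np.repeat(np.arange(4), 50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y)
genes, _ = ga_minimize(lambda g: fitness(g, X_tr[:60], y_tr[:60]))  # ~30% GA subset
clf = SVC().fit(lwsn_features(X_tr, genes), y_tr)
print(clf.score(lwsn_features(X_te, genes), y_te))
```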
4. Experiments and Results

The proposed method was verified using two analog circuits, the Sallen–Key bandpass filter circuit and the two-switch forward convertor circuit, and two rotating machinery datasets, the CWRU bearing faults dataset and the UoC gear faults dataset. Fault data for the circuits were generated by varying component values around their nominal values within SPICE; i.e., if the nominal value of a component is Y, the lower range and the upper range of the deviation constituting the parametric fault of the component are [0.25·Y, 0.9·Y] and [1.1·Y, 1.75·Y], respectively. When the component value is between 0.9·Y and 1.1·Y, it is considered to be within its tolerance range, i.e., a tolerance range of 10%. The training data were obtained by conducting 1000 SPICE simulations, where components are varied in the aforementioned ranges one at a time, while the other components are held at their nominal values. A sketch of this sampling procedure is shown below.
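For concreteness, a small sketch of drawing faulty component values from the parametric fault ranges described above; the component names and the commented simulator call are placeholders, not the actual SPICE interface used.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_faulty_value(nominal):
    """Draw a parametric fault value from [0.25Y, 0.9Y] or [1.1Y, 1.75Y]."""
    if rng.random() < 0.5:
        return rng.uniform(0.25 * nominal, 0.90 * nominal)   # drift below tolerance
    return rng.uniform(1.10 * nominal, 1.75 * nominal)       # drift above tolerance

nominals = {"R1": 1e3, "R2": 1e3, "R3": 2e3, "C1": 5e-9}     # e.g., Table 2 values
for comp, Y in nominals.items():
    faulty = sample_faulty_value(Y)
    # run_spice(circuit, overrides={comp: faulty})  # placeholder for the simulator call
    print(comp, faulty)
```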
4.1. Sallen–Key Bandpass Filter

The first circuit under test (CUT1) is the Sallen–Key bandpass filter (Figure 4), which is the most frequently studied circuit for analog circuit fault diagnosis. Unlike other papers that only consider the fault diagnosis of four of the seven passive components, we considered all seven passive components for fault diagnosis. The parametric fault ranges for the seven components considered are shown in Table 2. As can be seen from Table 2, we considered a single class for each component, as opposed to other papers in the literature that consider two classes for each component. The data for each class were split into training and testing datasets via a 75%–25% split. The LWSN was trained on the training data, and the testing accuracy of the LWSN is reported in Table 3, along with the testing accuracy of the original wavelet scattering network and the Gaussian–Bernoulli Deep Belief Network (GB-DBN)-based approach [22], which was used for comparison. This paper was used for comparison because it uses a deep-learning-based feature extractor, the DBN, along with an SVM for classification. Hence, it is conceptually similar to our paper. The confusion matrix for the fault diagnosis of the Sallen–Key bandpass filter using the LWSN is shown in Table 4.

Figure 4. Sallen–Key bandpass filter.

Table 2. Nominal values and parametric fault range of Sallen–Key bandpass filter components.

| Fault Class | Fault Code | Nominal Value | Faulty Range |
| Healthy | F0 | NA | NA |
| R1 | F1 | 1 kΩ | [0.25 k, 0.9 k] and [1.1 k, 1.75 k] |
| R2 | F2 | 1 kΩ | [0.25 k, 0.9 k] and [1.1 k, 1.75 k] |
| R3 | F3 | 2 kΩ | [0.5 k, 1.8 k] and [2.2 k, 3.5 k] |
| R4 | F4 | 2 kΩ | [0.5 k, 1.8 k] and [2.2 k, 3.5 k] |
| R5 | F5 | 2 kΩ | [0.5 k, 1.8 k] and [2.2 k, 3.5 k] |
| C1 | F6 | 5 nF | [1.25 n, 4.50 n] and [5.50 n, 8.75 n] |
| C2 | F7 | 5 nF | [1.25 n, 4.50 n] and [5.50 n, 8.75 n] |

Table 3. Fault diagnosis accuracy of LWSN and comparison with other methods.

| Circuit | Literature (GB-DBN) [22] | Wavelet Scattering Networks | Proposed Method (LWSN) |
| CUT1 | 99.12% | 90.01% | 99.72% |
| CUT2 | 84.34% | 82.45% | 92.93% |
| CUT2 (Experimental Validation) | NA | 81.12% | 90.71% |

Table 4. Confusion matrix summary for LWSN for the Sallen–Key bandpass filter (per-class accuracy and largest single misclassification, %).

| True Class | Correct | Largest Misclassification |
| F0 | 99.4 | 0.6 |
| F1 | 99.8 | 0.2 |
| F2 | 99.8 | 0.2 |
| F3 | 100 | – |
| F4 | 100 | – |
| F5 | 100 | – |
| F6 | 98.2 | 1.8 |
| F7 | 100 | – |
The Sallen–Key bandpass filter circuit involved seven fault types and one healthy class to detect and identify, which correspond to the 14 fault types for methods used in the literature. From Table 3, it can be seen that the proposed LWSN method achieved a marginal improvement of 0.7% in the fault diagnosis accuracy over comparable methods in the literature [18] and a 9% improvement in the fault diagnosis accuracy over a traditional WSN. As can be seen from the confusion matrix in Table 4, fault type F6, which corresponds to capacitor C1, was misdiagnosed most often; however, the diagnosis of the other fault types was almost perfect.

4.2. Two-Switch Forward Convertor

The second circuit under test (CUT2) is the two-switch forward convertor circuit (Figure 5). A forward convertor is a switching power supply circuit that is used for energy transfer when the two switches (transistors) are simultaneously turned on. The parametric fault ranges for the components considered after sensitivity analysis are shown in Table 5, along with the values used for experimental verification. As can be seen from Table 5, we considered a single class for each single fault (single component degradation), as opposed to other papers in the literature that consider two classes for each single fault. The advantage of doing so is that we could consider one class for every double fault (two components degrading simultaneously), as can be seen from fault codes F14 and F15. If we were to consider two classes for each single fault, we would have to consider four classes for every double fault. The data for each class were split into training and testing datasets via a 75%–25% split. The testing accuracy of the LWSN on both the simulation and experimental data is reported in Table 3, along with the testing accuracy of the original wavelet scattering network and the Gaussian–Bernoulli Deep Belief Network (GB-DBN)-based approach [22], which were used for comparison. The confusion matrix for the fault diagnosis of the two-switch forward convertor circuit using the LWSN is shown in Table 6.

Figure 5. Two-switch forward convertor circuit.
Table 5. Nominal values and parametric fault range of two-switch forward convertor circuit components.

| Fault Class | Fault Code | Nominal Value | Faulty Range | Experimental Values |
| Healthy | F0 | NA | NA | NA |
| R1 | F1 | 33 Ω | [8.25 Ω, 29.7 Ω] and [36.3 Ω, 57.75 Ω] | 10 Ω, 20 Ω, 40 Ω, 50 Ω |
| C4 | F2 | 0.1 µF | [0.025 µF, 0.09 µF] and [0.11 µF, 0.175 µF] | 0.025 µF, 0.05 µF, 0.12 µF, 0.15 µF |
| RL | F3 | 100 Ω | [25 Ω, 90 Ω] and [110 Ω, 175 Ω] | 30 Ω, 80 Ω, 120 Ω, 170 Ω |
| L3 | F4 | 100 µH | [25 µH, 90 µH] and [110 µH, 175 µH] | 30 µH, 75 µH, 156 µH, 170 µH |
| R5 | F5 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R6 | F6 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R7 | F7 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R8 | F8 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R10 | F9 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R11 | F10 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R12 | F11 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R13 | F12 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| R16 | F13 | 0 Ω | [0.1 Ω, 10 Ω] | 2 Ω, 4 Ω, 6 Ω, 8 Ω |
| RL & C4 | F14 | 100 Ω & 0.1 µF | ([25 Ω, 90 Ω] and [110 Ω, 175 Ω]) × ([0.025 µF, 0.09 µF] and [0.11 µF, 0.175 µF]) | (30 Ω, 0.025 µF), (30 Ω, 0.175 µF), (170 Ω, 0.025 µF), (170 Ω, 0.175 µF) |
| R1 & R2 | F15 | 33 Ω & 33 Ω | ([8.25 Ω, 29.7 Ω] and [36.3 Ω, 57.75 Ω]) × ([8.25 Ω, 29.7 Ω] and [36.3 Ω, 57.75 Ω]) | (10 Ω, 20 Ω), (10 Ω, 40 Ω), (30 Ω, 10 Ω), (50 Ω, 50 Ω) |
| R2 | F16 | 33 Ω | [8.25 Ω, 29.7 Ω] and [36.3 Ω, 57.75 Ω] | 10 Ω, 20 Ω, 40 Ω, 50 Ω |

Table 6. Confusion matrix summary for LWSN for the two-switch forward convertor circuit (per-class accuracy and largest single misclassification, %).

| True Class | Correct | Largest Misclassification |
| F0 | 91.7 | 3.5 |
| F1 | 94.4 | 5.3 |
| F2 | 89.9 | 9.6 |
| F3 | 78.4 | 14.0 |
| F4 | 89.7 | 5.7 |
| F5 | 98.2 | 0.5 |
| F6 | 92.0 | 3.9 |
| F7 | 94.8 | 3.1 |
| F8 | 81.9 | 14.4 |
| F9 | 99.2 | 0.3 |
| F10 | 88.3 | 9.9 |
| F11 | 97.1 | 1.0 |
| F12 | 90.3 | 4.2 |
| F13 | 98.3 | 0.5 |
| F14 | 100 | – |
| F15 | 94.9 | 4.1 |
| F16 | 83.7 | 4.7 |

The experimental setup that was used to demonstrate our approach is shown in Figure 6. The two-switch forward convertor circuit (CUT2) was used with pulse width waveforms to trigger the two switches, generated using an Agilent Arbitrary Waveform Generator 33250A. The circuit components were swapped out with components with the values shown in the Experimental Values column of Table 5.
1F.roFmro mTabTlaeb 3le, it3 ,caitn cbaen seben steheant tthaet pthroepporsoepdo LseWdSLNW mSNethmoedt haocdhiaecvheide vae sdigansifiigcnainfitc iamnpt rimovpermoveenmt oefn 8t.o9%f 8 i.9n% thien ftahuelt faduialtgndoiasgisn oascicsuaraccyu roavcyero tvheer ctohme cpoamrapbaler ambleethmoedt hino dthine ltihterlaitueraet u[2r2e] [2a2n]da an d10a.91%0. 9im%- imprporvoevmemenetn itn itnheth feaufaltu dltiadginagosnios saisccaucrcaucrya ocyveorv tehre tthraedtirtaiodnitaiol nWaSl NW. SANs. cAans bcaen sebeens fereonm frtohme ctohnefucosinofnu smioantrmixa itnri TxainblTea 6b, lfea6u,ltf atuypltet yFp3e, wF3h,icwhh cicohrrceosprroenspdos ntod sretsoisrteosris RtoLr, wRLa,s wmais- mdiisadginaogsneods eads fasuflta utylpt ety Fp8e (Fre8si(srteosris Rto8r).R O8t)h.eOr tnhoetrabnloet ambilseclmasisicfliacsastifioncas tiinocnlsudinec tlhued seinthgele sfinauglte Ffa1u (lrteFs1is(troers iRs1to) raRn1d) tahned dtohuebdloe ufbauleltf aFu1l5t F(r1e5si(srteosris Rto1r aRn1da nRd2)R. 2T)h. iTs hhiisghilgighhlitgsh tthse thcoemcopmlepxlietyx itoyf oafnaanloaglo gcircciruciut itfafuaultl tddiaiaggnnoosissi.s .HHoowweevveerr, , the developed LLWSSNm meeththoodd sstatannddsso ouut ti nint etermrmsso of ff afauultltd diaiaggnnoosissisp perefroformrmaanncecei ninc ocommpparairsiosonnt otoe xeixsitsitninggm metehthodods.s. 44.3.3. .B BeeaarrininggF FaauultltD Diaiaggnnoosissis InInr orotatatitninggm maacchhinineeryrya apppplilcicaatitoionnss, ,r orolllilninggb beeaarrininggf afauultlstsa arreet thheem moossttc coommmmoonn,,l leeaadd-- ininggt otot htheep peerrffoorrmmaannccee ddeetteerriioorraattiioonn ooff mmaacchhiinneerryy.. HHeennccee,, bbeeaarriinngg ffaauulltt ddiiaaggnnoossiiss ppllaayyss a avvitiatal lrroolele iinn tthhee hheeaalltthh mmaannaaggeemmeenntt ooff mmaacchhiinneerryy [[4466]].. TToo tteesstt tthhee eefffefecctitviveenneessss oofft hthee mmeeththoodda accrroossss ddiiffffeerreenntt ddoommaaiinnss ooff ffaauulltt ddiiaaggnnoossisis, ,ththee ddeveveleolpopeded mmetehtohdo dwwasa tsestteesdte odn oan baeabreinargi nfaguflatsu bltesnbcehnmcahrmk adraktadsaetta. Tsehte. TChaeseC WaseestWerens RteersnerRvees UernvieveUrsniitvye (rCsiWtyR(UC)W mRoUto)r mboetaorrinbge adraitnagsedta wtaasse tgewnaesragteende urastiendg uas tiensgt raigte csotnrisgisctionngs iosft ian 2g hopf aR2elhiapnRcee lEialenccteriEc lmecotrtoicr, mao ttoorrq, uaet otrrqaunesdtruacnesrd/euncceord/eern,c ao ddeyrn, aamdyonmaemteorm, aentedr ,darnivded-ernivde -aenndd faannd-efnand- eSnvdenSsvkean Kskual- Klaugllearg-Fera-bFraikberink edneedpe-egpr-ogorvoeo vbealbl ablelabreinargisn. gIsn.nIenrn reirngri,n ogu, toeur treirngri,n agn,da nrodllrionlgli neglemeleemnte dnet- dfeefcetcst swwereer emmaannuufafactcutureredd inintoto tthhee bbeeaarriinnggss.. TThhee mmoottoorr wwaass rruunn aatt aa nneeaarr--ccoonnsstatanntts sppeeeedd (1(1772200??11779977r /r/mmiinn)) wwiitthh ddiiffffeerreennttl looaaddss( (00??33h hpp))p prroovvidideeddb byyt thheed dyynnaammoommeeteter.r.V Vibibraratitoionn ddaatataw weererec ocollleleccteteddu usisningga acccceelelerorommeeteterrss, ,w whhicichhw weererev veertritcicaalllylya atttatacchheeddt otot htheeh hoouusisningg wwitihthm maaggnneetitcicb baaseses.s.S Saammpplilninggf rfereqquueennccieiessw weerere1 122k kHHzzf oforrs osommeeo offt htheet etestsstsa anndd4 488k kHHzz foforrt htheeo tohtehresr.sF. uFruthrtehredr edtaeitlasiclsa ncabne bfoeu fnodunatdt haet tChWe RCUWBReUa rBinegarDinagta DCaetnat eCrewnteebrs iwtee[b4s7i]t.e A[s47s]h. 
As shown in Table 7, one healthy bearing and three fault modes, including the inner ring fault, the rolling element fault, and the outer ring fault, were classified into ten categories (one health state and nine fault states) according to different fault sizes. A plot of the data can be seen in Figure 7. The data were resampled such that the entire dataset had a constant sampling rate, and then the data were split into chunks with sizes of 1024. The dataset was then split into training and testing datasets in the ratio of 75%:25% using stratified sampling (a sketch of this preprocessing is shown after Figure 7). The LWSN achieved 99.2% accuracy on the testing dataset, which is comparable to the state-of-the-art methods [48]. The confusion matrix is shown in Table 8.

Table 7. CWRU faults.

| Fault Mode | Description |
| Health state | The normal bearing at 1791 rpm and 0 HP |
| Inner ring 1 | 0.007-inch inner ring fault at 1797 rpm and 0 HP |
| Inner ring 2 | 0.014-inch inner ring fault at 1797 rpm and 0 HP |
| Inner ring 3 | 0.021-inch inner ring fault at 1797 rpm and 0 HP |
| Rolling element 1 | 0.007-inch rolling element fault at 1797 rpm and 0 HP |
| Rolling element 2 | 0.014-inch rolling element fault at 1797 rpm and 0 HP |
| Rolling element 3 | 0.021-inch rolling element fault at 1797 rpm and 0 HP |
| Outer ring 1 | 0.007-inch outer ring fault at 1797 rpm and 0 HP |
| Outer ring 2 | 0.014-inch outer ring fault at 1797 rpm and 0 HP |
| Outer ring 3 | 0.021-inch outer ring fault at 1797 rpm and 0 HP |

Figure 7. Vibration signals of the different faults in the CWRU dataset.
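The following is a brief sketch, assuming SciPy and scikit-learn are available, of the resampling, chunking, and stratified splitting steps described above; the target sampling rate and the toy recordings are illustrative.

```python
import numpy as np
from scipy.signal import resample
from sklearn.model_selection import train_test_split

CHUNK = 1024
TARGET_FS = 12_000  # assumed common rate for the mixed 12/48 kHz recordings

def make_chunks(recordings):
    """recordings: list of (signal, fs, label) tuples -> fixed-size labeled chunks."""
    X, y = [], []
    for sig, fs, label in recordings:
        if fs != TARGET_FS:  # resample to a constant sampling rate
            sig = resample(sig, int(len(sig) * TARGET_FS / fs))
        n = len(sig) // CHUNK
        X.extend(np.split(sig[:n * CHUNK], n))  # non-overlapping 1024-sample chunks
        y.extend([label] * n)
    return np.array(X), np.array(y)

# Toy stand-in for the CWRU recordings
recs = [(np.random.randn(48_000), 48_000, 0), (np.random.randn(12_000), 12_000, 1)]
X, y = make_chunks(recs)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                          random_state=0)  # stratified 75:25 split
print(X_tr.shape, X_te.shape)
```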
Table 8. Confusion matrix summary for LWSN for the CWRU dataset (per-class accuracy and largest single misclassification, %).

| True Class | Correct | Largest Misclassification |
| Healthy | 100.0 | – |
| Inner ring 1 | 100.0 | – |
| Inner ring 2 | 96.7 | 3.3 |
| Inner ring 3 | 100.0 | – |
| Rolling element 1 | 100.0 | – |
| Rolling element 2 | 100.0 | – |
| Rolling element 3 | 100.0 | – |
| Outer ring 1 | 100.0 | – |
| Outer ring 2 | 96.0 | 4.0 |
| Outer ring 3 | 100.0 | – |

The CWRU bearing dataset involves nine fault classes and one healthy class. As can be seen from the confusion matrix in Table 8, fault types F3 (inner ring 2) and F9 (outer ring 2) were misdiagnosed most often; however, the diagnosis of the other fault types was perfect.

4.4. Gear Fault Diagnosis

The second rotating machinery fault diagnosis dataset considered was the University of Connecticut (UoC) gear fault dataset [49]. The CWRU dataset and the UoC dataset have been ranked the simplest and the most difficult benchmark datasets, respectively [48], for rotating machinery fault diagnosis. The average RMS and the average power of the signals in the CWRU and the UoC datasets were 0.27, −9.36 dB and 0.07, −21.91 dB, respectively. Preprocessing methods such as stochastic resonance [50] can be used to enhance weak fault characteristics in datasets such as UoC; however, in this paper, the LWSN method was applied directly to the raw vibration data. In the UoC dataset, nine different gear conditions were introduced to the pinions on the input shaft, including the healthy condition, root crack, missing tooth, spalling, and chipping tip with five different levels of severity. All the collected data were used and classified into nine categories (one health state and eight fault states: missing, crack, spall, chip5a, chip4a, chip3a, chip2a, and chip1a) to test the performance. The data were resampled such that the entire dataset had a constant sampling rate, and then the data were split into chunks with sizes of 1024. The dataset was then split into training and testing datasets in the ratio 75%:25% using stratified sampling. The LWSN achieved 96.51% accuracy for the testing dataset, and the confusion matrix is shown in Table 9. Our result is marginally better than the state of the art, as the best result reported in [48] was 96.19%. Since the UoC dataset has 3600 samples per fault class and there are nine fault classes, the developed method is able to process the big data of rotating machinery.

Table 9. Confusion matrix summary for LWSN for the UoC dataset (per-class accuracy and largest single misclassification, %).

| True Class | Correct | Largest Misclassification |
| Healthy | 99.0 | 0.3 |
| Missing tooth | 98.6 | 0.3 |
| Root crack | 91.6 | 1.4 |
| Spalling | 98.2 | 0.8 |
| Chipping tip 1a | 95.5 | 1.0 |
| Chipping tip 2a | 98.2 | 0.6 |
| Chipping tip 3a | 99.0 | 0.1 |
| Chipping tip 4a | 98.2 | 0.6 |
| Chipping tip 5a | 99.5 | 0.1 |

4.5. Transfer Learning

In recent years, transfer learning has been gaining importance, as it enables knowledge acquired through training on data from a source domain to be transferred to gain insight in a target domain. This importance arises from the fact that it is very challenging to collect data from all possible conditions that machinery may encounter.
Umdale et al. [51] created different datasets by dividing the original CWRU dataset based on speed and load, as can be seen in Table 10. For instance, in dataset D1, the goal was to determine whether training on lower speeds in the source dataset would still enable acceptable fault diagnosis on a dataset with higher rotational speeds, as can be seen from the target dataset of D1. In dataset D2, the opposite was true: the goal was to determine whether datasets with higher speeds would carry vital information for fault diagnosis at lower speeds, whereas mixtures of speeds were considered in datasets D3 and D4. The maximum training and testing accuracies reported by [51] are shown in Table 10, where the testing accuracies are an indication of the effectiveness of transfer learning. As can be seen from Table 10, the developed LWSN is more effective for transfer learning across all four datasets. Exploratory work suggests that the LWSN can perform at least as well as deep learning networks at transfer learning, but further work needs to be undertaken to determine if there is a fundamental improvement. A sketch of this source-to-target evaluation protocol is shown after Table 10.

Table 10. Comparison of transfer learning accuracies across different datasets.

| Dataset | Source Dataset | Target Dataset | Training Accuracy [51] | Testing Accuracy [51] | Training Accuracy (LWSN) | Testing Accuracy (LWSN) |
| D1 | 1730 RPM and 3 HP, 1750 RPM and 2 HP | 1772 RPM and 1 HP, 1797 RPM and 0 HP | 97.22 | 97.02 | 100 | 99.96 |
| D2 | 1772 RPM and 1 HP, 1797 RPM and 0 HP | 1730 RPM and 3 HP, 1750 RPM and 2 HP | 94.17 | 92.88 | 100 | 99.87 |
| D3 | 1730 RPM and 3 HP, 1797 RPM and 0 HP | 1750 RPM and 2 HP, 1772 RPM and 1 HP | 96.92 | 95.77 | 100 | 99.39 |
| D4 | 1750 RPM and 2 HP, 1772 RPM and 1 HP | 1730 RPM and 3 HP, 1797 RPM and 0 HP | 95.77 | 94.48 | 100 | 99.93 |

These results imply that the LWSN network can extract discriminative information from raw data effectively and achieve fault classification with high accuracy, irrespective of the complexity and domain of the dataset.
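A minimal sketch of this source-to-target protocol; it assumes recordings tagged with operating conditions and reuses make_chunks, lwsn_features, genes, and SVC from the earlier sketches, so it is an outline rather than a reproduction of [51] or of the paper's pipeline.

```python
from sklearn.svm import SVC

def transfer_eval(recordings, genes, source_conds, target_conds):
    """Train on source operating conditions, test on unseen target conditions.

    recordings: iterable of (signal, fs, label, (speed_rpm, load_hp)) tuples.
    """
    src = [(s, fs, lab) for s, fs, lab, cond in recordings if cond in source_conds]
    tgt = [(s, fs, lab) for s, fs, lab, cond in recordings if cond in target_conds]
    Xs, ys = make_chunks(src)
    Xt, yt = make_chunks(tgt)
    clf = SVC().fit(lwsn_features(Xs, genes), ys)        # fit on source domain only
    return (clf.score(lwsn_features(Xs, genes), ys),     # training accuracy
            clf.score(lwsn_features(Xt, genes), yt))     # transfer (testing) accuracy

# Dataset D1: train at lower speeds/higher loads, test at higher speeds/lower loads
# train_acc, test_acc = transfer_eval(recs, genes,
#                                     {(1730, 3), (1750, 2)}, {(1772, 1), (1797, 0)})
```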
5. Conclusions

Traditional fault diagnosis methods involve the extraction of fixed representations in the time domain, frequency domain, or time–frequency domain. These methods require technical expertise for designing appropriate features from the fixed representations. In this paper, a new feature extraction technique based on learnable wavelet scattering networks was developed to diagnose faults, primarily in analog circuits and rotating machinery. By learning a time–frequency representation from the data, the developed method has a better ability to extract the essential features of fault signals. This results in better fault diagnosis accuracy, by almost 9%, compared to the state-of-the-art fault diagnosis method in the literature. By considering more classes for fault diagnosis than any other paper in the literature, a more thorough fault diagnosis was demonstrated. The fault diagnosis performance of this method was verified by experiments on the two-switch forward convertor circuit. The experiments indicated that the fault diagnosis model trained on simulation data is able to effectively diagnose faults in the actual circuit.

Analog circuits and gears/bearings are the predominant sources of faults in electronic systems and rotary mechanical systems, respectively. The developed fault diagnosis approach was applied to the CWRU bearing faults and the UoC gear faults benchmark datasets and achieved fault diagnosis accuracy that is comparable to state-of-the-art methods. Since the UoC gear faults benchmark dataset is considered the most challenging benchmark dataset in rotating machinery fault diagnosis, this speaks to the ability of the developed method to extract weak fault signatures. Hence, the generalizability of the developed fault diagnosis approach across the most common industrial fault diagnosis domains was demonstrated. Initial experiments indicated that the developed approach is also effective in transfer learning; however, further experiments need to be carried out to confirm these observations.

The incorporation of learnability in traditional wavelet scattering networks resulted in a 10% improvement in fault diagnosis accuracy. As opposed to deep learning networks, the developed learnable wavelet scattering networks do not require an extensive trial-and-error process to optimize their structure. Additionally, the developed learnable wavelet scattering networks learn wavelet filters, as opposed to the random filters learnt in deep learning networks. Hence, the filters learnt by learnable wavelet scattering networks are interpretable, which enables the wavelets to be used to gain further insight into circuit faults. The interpretability of the wavelets learnt by the learnable wavelet scattering networks and digital circuit fault diagnosis are possible avenues for future research.

Author Contributions: Conceptualization, methodology, investigation, software, writing (original draft), V.K.; writing (review and editing), M.H.A.; writing (review and editing), supervision, M.G.P. All authors have read and agreed to the published version of the manuscript.

Funding: The Center for Advanced Life Cycle Engineering (CALCE) and the Center for Advances in Reliability and Safety (CAiRS) in Hong Kong provided financial support for this research work.

Data Availability Statement: Publicly available datasets were analyzed in this study. The data can be found here: https://figshare.com/articles/dataset/Gear_Fault_Data/6127874/1 (accessed on 31 December 2021) and https://engineering.case.edu/bearingdatacenter (accessed on 31 December 2021).

Acknowledgments: The authors thank the Center for Advanced Life Cycle Engineering (CALCE) and its over 150 funding companies and the Center for Advances in Reliability and Safety (CAiRS) in Hong Kong for supporting research into advanced topics in reliability, safety, and sustainment.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Pecht, M.; Jaai, R. A prognostics and health management roadmap for information and electronics-rich systems. Microelectron. Reliab. 2010, 50, 317–323.
2. Binu, D.; Kariyappa, B.S. A survey on fault diagnosis of analog circuits: Taxonomy and state of the art. AEU-Int. J. Electron. Commun. 2017, 73, 68–83.
3. Vasan, A.S.S.; Long, B.; Pecht, M. Diagnostics and prognostics method for analog electronic circuits. IEEE Trans. Ind. Electron. 2013, 60, 5277–5291.
4. Yang, H.; Meng, C.; Wang, C. Data-driven feature extraction for analog circuit fault diagnosis using 1-D convolutional neural network. IEEE Access 2020, 8, 18305–18315.
5. Li, F.; Woo, P.Y. Fault detection for linear analog IC: The method of short-circuit admittance parameters. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2002, 49, 105–108.
6. Tadeusiewicz, M.; Halgas, S.; Korzybski, M. An algorithm for soft-fault diagnosis of linear and nonlinear circuits. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2002, 49, 1648–1653.
7. Luo, H.; Wang, Y.; Lin, H.; Jiang, Y. Module level fault diagnosis for analog circuits based on system identification and genetic algorithm. Meas. J. Int. Meas. Confed. 2012, 45, 769–777. [CrossRef]
8. Cannas, B.; Fanni, A.; Montisci, A. Algebraic approach to ambiguity-group determination in nonlinear analog circuits. IEEE Trans. Circuits Syst. I Regul. Pap. 2010, 57, 438–447. [CrossRef]
9. Dai, X.; Gao, Z. From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis. IEEE Trans. Ind. Inform. 2013, 9, 2226–2238. [CrossRef]
10. Bandyopadhyay, I.; Purkait, P.; Koley, C. Performance of a classifier based on time-domain features for incipient fault detection in inverter drives. IEEE Trans. Ind. Inform. 2019, 15, 3–14. [CrossRef]
11. Queiroz, L.P.; Rodrigues, F.C.M.; Gomes, J.P.P.; Brito, F.T.; Chaves, I.C.; Paula, M.R.P.; Salvador, M.R.; Machado, J.C. A fault detection method for hard disk drives based on mixture of Gaussians and nonparametric statistics. IEEE Trans. Ind. Inform. 2017, 13, 542–550. [CrossRef]
12. Nasser, A.R.; Azar, A.T.; Humaidi, A.J.; Al-Mhdawi, A.K.; Ibraheem, I.K. Intelligent fault detection and identification approach for analog electronic circuits based on fuzzy logic classifier. Electronics 2021, 10, 2888. [CrossRef]
13. Shi, J.; Deng, Y.; Wang, Z. Analog circuit fault diagnosis based on density peaks clustering and dynamic weight probabilistic neural network. Neurocomputing 2020, 407, 354–365. [CrossRef]
14. Aizenberg, I.; Belardi, R.; Bindi, M.; Grasso, F.; Manetti, S.; Luchetta, A.; Piccirilli, M.C. A neural network classifier with multi-valued neurons for analog circuit fault diagnosis. Electronics 2021, 10, 349. [CrossRef]
15. Yuan, L.; He, Y.; Huang, J.; Sun, Y. A new neural-network-based fault diagnosis approach for analog circuits by using kurtosis and entropy as a preprocessor. IEEE Trans. Instrum. Meas. 2010, 59, 586–595. [CrossRef]
16. Xiao, Y.; He, Y. A novel approach for analog fault diagnosis based on neural networks and improved kernel PCA. Neurocomputing 2011, 74, 1102–1115. [CrossRef]
17. Xiao, Y.; Feng, L. A novel linear ridgelet network approach for analog fault diagnosis using wavelet-based fractal analysis and kernel PCA as preprocessors. Meas. J. Int. Meas. Confed. 2012, 45, 297–310. [CrossRef]
18. Zhang, A.; Chen, C.; Jiang, B. Analog circuit fault diagnosis based UCISVM. Neurocomputing 2016, 173, 1752–1760. [CrossRef]
19. Song, P.; He, Y.; Cui, W. Statistical property feature extraction based on FRFT for fault diagnosis of analog circuits. Analog Integr. Circuits Signal Process. 2016, 87, 427–436. [CrossRef]
20. He, W.; He, Y.; Li, B.; Zhang, C. Analog circuit fault diagnosis via joint cross-wavelet singular entropy and parametric t-SNE. Entropy 2018, 20, 604. [CrossRef]
21. Cui, J.; Wang, Y. A novel approach of analog circuit fault diagnosis using support vector machines classifier. Meas. J. Int. Meas. Confed. 2011, 44, 281–289. [CrossRef]
22. Liu, Z.; Jia, Z.; Vong, C.M.; Bu, S.; Han, J.; Tang, X. Capturing high-discriminative fault features for electronics-rich analog system via deep learning. IEEE Trans. Ind. Inform. 2017, 13, 1213–1226. [CrossRef]
23. Zhao, G.; Liu, X.; Zhang, B.; Liu, Y.; Niu, G.; Hu, C. A novel approach for analog circuit fault diagnosis based on Deep Belief Network. Meas. J. Int. Meas. Confed. 2018, 121, 170–178. [CrossRef]
24. Chen, P.; Yuan, L.; He, Y.; Luo, S. An improved SVM classifier based on double chains quantum genetic algorithm and its application in analogue circuit diagnosis. Neurocomputing 2016, 211, 202–211. [CrossRef]
25. Wenxin, Y. Analog circuit fault diagnosis via FOA-LSSVM. Telkomnika 2020, 18, 251. [CrossRef]
26. Liang, H.; Zhu, Y.; Zhang, D.; Chang, L.; Lu, Y.; Zhao, X.; Guo, Y. Analog circuit fault diagnosis based on support vector machine classifier and fuzzy feature selection. Electronics 2021, 10, 1496. [CrossRef]
27. Gao, T.Y.; Yang, J.L.; Jiang, S.D.; Yang, C. A novel fault diagnostic method for analog circuits using frequency response features. Rev. Sci. Instrum. 2019, 90, 104708. [CrossRef]
28. He, W.; He, Y.; Li, B.; Zhang, C. A naive-Bayes-based fault diagnosis approach for analog circuit by using image-oriented feature extraction and selection technique. IEEE Access 2020, 8, 5065–5079. [CrossRef]
29. He, W.; He, Y.; Luo, Q.; Zhang, C. Fault diagnosis for analog circuits utilizing time-frequency features and improved VVRKFA. Meas. Sci. Technol. 2018, 29, 045004. [CrossRef]
30. Ji, L.; Fu, C.; Sun, W. Soft fault diagnosis of analog circuits based on a ResNet with circuit spectrum map. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 2841–2849. [CrossRef]
31. Khemani, V.; Azarian, M.H.; Pecht, M.G. Electronic circuit diagnosis with no data. In Proceedings of the 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), San Francisco, CA, USA, 17–20 June 2019; pp. 1–7. [CrossRef]
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
33. Elsken, T.; Metzen, J.H.; Hutter, F. Simple and efficient architecture search for convolutional neural networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Workshop Track Proceedings, Vancouver, BC, USA, 30 April–3 May 2018.
34. Bruna, J.; Mallat, S. Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1872–1886. [CrossRef]
35. Sweldens, W. The lifting scheme: A construction of second generation wavelets. SIAM J. Math. Anal. 1998, 29, 511–546. [CrossRef]
36. Wiatowski, T.; Tschannen, M.; Stanic, A.; Grohs, P.; Bolcskei, H. Discrete deep feature extraction: A theory and new architectures. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 5, pp. 3168–3183.
37. Andén, J.; Lostanlen, V.; Mallat, S. Joint time-frequency scattering. IEEE Trans. Signal Process. 2019, 67, 3704–3718. [CrossRef]
38. LeCun, Y.; Cortes, C.; Burges, C. The MNIST Database of Handwritten Digits. Courant Inst. Math. Sci. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 25 January 2022).
39. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
40. Garofolo, J.S.; Lamel, L.F.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S.; Dahlgren, N.L.; Zue, V. TIMIT Acoustic-Phonetic Continuous Speech Corpus; Linguistic Data Consortium: Philadelphia, PA, USA, 1993.
41. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012.
42. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
43. Holland, J.H. Genetic Algorithms. Sci. Am. 1992, 267, 66–73. [CrossRef]
44. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
45. Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227. [CrossRef]
46. Mao, W.; Wang, L.; Feng, N. A new fault diagnosis method of bearings based on structural feature selection. Electronics 2019, 8, 1406. [CrossRef]
47. Bearing Data Center | Case School of Engineering | Case Western Reserve University. Available online: https://engineering.case.edu/bearingdatacenter (accessed on 25 January 2022).
48. Zhao, Z.; Li, T.; Wu, J.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Deep learning algorithms for rotating machinery intelligent diagnosis: An open source benchmark study. ISA Trans. 2020, 107, 224–255. [CrossRef]
49. Gear Fault Data. Available online: https://figshare.com/articles/dataset/Gear_Fault_Data/6127874/1 (accessed on 25 January 2022).
50. Qiao, Z.; Elhattab, A.; Shu, X.; He, C. A second-order stochastic resonance method enhanced by fractional-order derivative for mechanical fault detection. Nonlinear Dyn. 2021, 106, 707–723. [CrossRef]
51. Udmale, S.S.; Singh, S.K.; Singh, R.; Sangaiah, A.K. Multi-fault bearing classification using sensors and ConvNet-based transfer learning approach. IEEE Sens. J. 2020, 20, 1433–1444. [CrossRef]