A sparse deep belief network with efficient fuzzy learning framework

Gongming Wang a,*, Qing-Shan Jia a, Junfei Qiao b, Jing Bi b, Caixia Liu c
a Department of Automation, Tsinghua University, Beijing 100084, China
b Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
c Department of Environmental Engineering, Peking University, Beijing 100871, China
* Corresponding author. E-mail address: [email protected] (G. Wang).

Neural Networks 121 (2020) 430–440. Received 12 May 2019; received in revised form 17 August 2019; accepted 22 September 2019; available online 5 October 2019.

Keywords: Deep belief network; Deep learning; Sparse representation; Fuzzy neural network; Nonlinear system modeling

Abstract

Deep belief network (DBN) is one of the most feasible ways to realize the deep learning (DL) technique, and it has been attracting more and more attention in nonlinear system modeling. However, DBN cannot provide satisfactory results in learning speed, modeling accuracy and robustness, which is mainly caused by dense representation and gradient diffusion. To address these problems and promote the development of DBN in cross-models, we propose a Sparse Deep Belief Network with Fuzzy Neural Network (SDBFNN) for nonlinear system modeling. In this novel framework, the sparse DBN is used as a pre-training technique to realize fast weight initialization and to obtain feature vectors; it balances the dense representation to improve robustness. A fuzzy neural network is developed for supervised modeling so as to eliminate gradient diffusion, and its input happens to be the obtained feature vector. As a novel cross-model, SDBFNN combines the advantages of both the pre-training technique and the fuzzy neural network to improve modeling capability, and its convergence is analyzed as well. A benchmark problem and a practical problem in wastewater treatment are conducted to demonstrate the superiority of SDBFNN. The extensive experimental results show that SDBFNN achieves better performance than the existing methods in learning speed, modeling accuracy and robustness.

1. Introduction

In recent years, deep learning (DL), inspired by the hierarchical representation of the human brain, has been put into the spotlight of the academic research community (Hinton & Salakhutdinov, 2006; LeCun, Bengio, & Hinton, 2015; Schmidhuber, 2015). Compared with artificial neural networks (ANNs), DL not only has an effective dimension-reduction capability, but also has a strong capability of identifying the nonlinear dynamics between given data pairs (Qiao, Wang, Li, & Li, 2018a). In particular, the deep belief network (DBN) is one of the most feasible architectures using the DL technique, and it has gained successful applications in computer vision and big data processing (Hinton, Osindero, & Teh, 2006; Qiao, Wang and Li, 2018). Recent years have also witnessed its success and promising prospect in nonlinear system modeling and identification (De la Rosa & Yu, 2016; Qiao, Wang, Li, & Li, 2018b; Wang, Qiao, Bi, Li, & Zhou, 2019). DBN consists of many restricted Boltzmann machines (RBMs) that are stacked one by one, and the first RBM's output is considered as the second one's input. The learning process of DBN includes two stages: unsupervised pre-training and supervised fine-tuning. The unsupervised pre-training initializes DBN's weights using unlabeled input data instead of random initialization. It is this layer-by-layer learning in stages that makes DBN succeed in pre-training a deep structure (Hinton et al., 2006; Hinton & Salakhutdinov, 2006).

In deep learning algorithms, one of the claimed objectives is to separately represent the variations of data (Ali, 2015; Bengio, 2009). However, the layers in DBN models are densely connected, which means that fluctuations in the inputs can change most of the features in the representation vector, thereby leading to poor robustness (Ali & Yangyu, 2017; Olshausen & Field, 1996). Meanwhile, too many hidden neurons easily lead to a redundant structure, which is also unstable. Furthermore, the fine-tuning uses a gradient-based back propagation (BP) algorithm starting from the top layer (target output) to the bottom layer (input layer). It is always time-consuming and easily leads to a local minimum and sometimes to training failures (Ding, Su, & Yu, 2011; Horikawa, Furuhashi, & Uchikawa, 1992; Jagannathan & Lewis, 1996), especially for a DBN with many hidden layers. Being stuck in a local minimum is an enormous threat to the learning capability of DBN and leads to lower accuracy. It is therefore necessary to develop more effective methods to balance the dense representation and to eliminate the problems resulting from the BP algorithm.

To balance the dense representation, many different methods have been proposed to build sparse RBMs for improving the robustness of a structure (Keyvanrad & Homayounpour, 2017; Li, Chang, Yang, Luo, & Fu, 2018; Sharma et al., 2017).
For example, a sparse response regularization induced by the L1-norm of codes is used to achieve a small code rate and to construct a sparse DBN (Ji, Zhang, & Zhang, 2014), which has been proved to be more effective in feature extraction. A new unsupervised learning algorithm called the sparse encoding symmetric machine (SESM) has been proposed to construct a sparse DBN (Boureau & Cun, 2008); it is based on an encoder–decoder paradigm and produces sparse over-complete representations efficiently without any need for filter normalization (Murray & Kreutz-Delgado, 2006). Although these methods have achieved some success in sparse representation, they have not yet realized a significant improvement in robustness without sacrificing running time. Therefore, we pose the first question: how to design an effective approach of sparse representation that achieves a pure and significant improvement in robustness?

With respect to the problems resulting from the BP algorithm, a harmony-search-based fine-tuning has been used to optimize the weights of DBN (Papa, Scheirer, & Cox, 2016), and the corresponding results demonstrate its effectiveness. In Qiao et al. (2018a), a partial least square regression (PLSR) model was first introduced into the learning process of DBN, which avoids the problems resulting from BP-based fine-tuning. Additionally, a novel deep prediction framework named the deep belief echo-state network (DBEN) has been proposed for time-series prediction (Sun, Li, Li, Huang, & Li, 2017) and proved effective in eliminating the problems resulting from the BP algorithm. These alternative methods have made progress. Unfortunately, they have not considered the convergence of the alternative learning schemes, which is very important for their successful application; this new thorny problem arises exactly when the BP algorithm is replaced. Therefore, we pose the second question: what is the most suitable method to realize a fast and accurate supervised learning framework?

How to address these two questions is what we are concerned with in this work. Related studies have found that the fuzzy neural network (FNN) is a hybrid learning model that combines fuzzy systems with neural networks (Huang, Ho, & Cao, 2005; Li & Liu, 2008; Zhao, Gao, & Mou, 2008). Owing to the semantic transparency of a fuzzy system and its learning capability, FNN is considered an effective and powerful tool to model the dynamical characteristic between input and output data points. Although the BP algorithm is still used in FNN, it is only conducted between the output layer and the rule layer one time instead of being repeated layer by layer many times. Therefore, there is no threat from gradient diffusion in FNN (Hadavandi, Shavandi, & Ghanbari, 2010).

Inspired by sparse representation and the efficient FNN supervised learning framework, we propose a Sparse Deep Belief Fuzzy Neural Network (SDBFNN). As a novel cross-model, SDBFNN combines the advantages of the pre-training technique (DBN) with the FNN model to improve modeling capability. The major contributions of this work are:

(1) It proposes two regularization terms to realize sparse representation in the RBM training process. The resulting Sparse DBN (SDBN) can be considered a pre-training technique to realize fast weight initialization and to obtain the feature-representation vector. It balances the dense representation to improve robustness.

(2) It proposes an FNN learning framework for supervised modeling so as to reduce the complexity of the layer-by-layer BP algorithm and even eliminate gradient diffusion. The input of the FNN happens to be the feature-representation vector derived from the last hidden layer of the SDBN model.

(3) It provides the convergence analysis of the resulting SDBFNN according to input–output stability and a Lyapunov function. In the simulation experiments, a benchmark problem and a practical problem in the wastewater treatment process are used to demonstrate the superiority of SDBFNN. The extensive experimental results show that SDBFNN achieves better performance than existing single-models in learning speed, modeling accuracy and robustness.

The rest of this work is organized as follows. Section 2 describes the related theoretical background about the RBM, DBN and FNN models. Section 3 gives the learning framework and algorithm of SDBFNN. The convergence analysis of SDBFNN is given in Section 4. Section 5 presents the simulation experiments to demonstrate the superiority of SDBFNN. Finally, this work is concluded in Section 6.

2. Theoretical background

2.1. Restricted Boltzmann Machine (RBM)

RBM is usually considered a kind of energy-based two-layer (visible and hidden layer) neural network, as shown in Fig. 1. In the learning process, RBM adopts an unsupervised manner according to the Contrastive Divergence (CD) algorithm. In Fig. 1, h = (h_1, h_2, ..., h_n) and v = (v_1, v_2, ..., v_m) denote the feature vectors of the hidden and visible layers, respectively, and w^R is a weight matrix of dimension m × n connecting the visible and hidden neurons.

Fig. 1. Structure of RBM.

Considering a parameter set θ = (w^R, b_v, b_h), the energy function is given as:

E(v, h; θ) = −∑_{i=1}^{m} b_{vi} v_i − ∑_{j=1}^{n} b_{hj} h_j − ∑_{i=1}^{m} ∑_{j=1}^{n} v_i w^R_{ij} h_j,   (1)

where w^R is the connecting matrix of the RBM, and b_h and b_v are the biases of the hidden and visible neurons, respectively.

The corresponding marginal probability distribution of v is given as:

P(v; θ) = ∑_h e^{−E(v,h;θ)} / ∑_{v,h} e^{−E(v,h;θ)}.   (2)

The derivative of log P(v; θ) with respect to θ is first calculated, and then the parameters are updated, i.e.,

∆θ = ∂ log P(v; θ) / ∂θ,   (3)

∂ log P(v; θ) / ∂θ = E_data(v_i h_j) − E_model(v_i h_j),   (4)

θ^{τ+1} = θ^τ + η ∆θ^τ,   (5)

where η is a learning rate, τ is the iteration number, and E_data(v_i h_j) and E_model(v_i h_j) are the mean values over the original data and over data sampled from the resulting model, respectively. Generally, it is extremely hard to calculate E_model(v_i h_j), so researchers use Gibbs sampling (Bengio, Courville, & Vincent, 2013; Hinton et al., 2006) to approximate it, as shown in Fig. 2.

Fig. 2. Gibbs sampling of CD algorithm.

During the Gibbs sampling process, the conditional probability distributions of the hidden and visible neurons are given as:

P(h_j = 1 | v, θ) = σ(b_{hj} + ∑_{i=1}^{m} v_i w^R_{ij}),   (6)

P(v_i = 1 | h, θ) = σ(b_{vi} + ∑_{j=1}^{n} w^R_{ij} h_j),   (7)

where σ(·) is the Sigmoid function.
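To make the CD update of Eqs. (3)–(7) concrete, the following minimal NumPy sketch performs one CD-1 step for a single Bernoulli RBM. It is our illustration rather than the authors' code; the batch size, learning rate and layer sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, eta=0.1):
    """One CD-1 step for a Bernoulli RBM, following Eqs. (3)-(7).

    v0  : (batch, m) visible data
    W   : (m, n) weight matrix w^R
    b_v : (m,) visible biases, b_h : (n,) hidden biases
    """
    # Positive phase: P(h = 1 | v0), Eq. (6)
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one Gibbs step, Eq. (7) followed by Eq. (6)
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)

    # Gradient approximation E_data(v_i h_j) - E_model(v_i h_j), Eq. (4)
    batch = v0.shape[0]
    dW = (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    db_v = (v0 - p_v1).mean(axis=0)
    db_h = (p_h0 - p_h1).mean(axis=0)

    # Parameter update, Eq. (5)
    return W + eta * dW, b_v + eta * db_v, b_h + eta * db_h

# Toy usage: 3 visible units, 8 hidden units (the size of the first RBM used in Section 5.1)
m, n = 3, 8
W = 0.01 * rng.standard_normal((m, n))
b_v, b_h = np.zeros(m), np.zeros(n)
v0 = rng.random((16, m))
W, b_v, b_h = cd1_update(v0, W, b_v, b_h)
```

In the CD-1 approximation, E_model(v_i h_j) in Eq. (4) is replaced by the statistics of a single reconstruction step, which is the standard shortcut used when exact sampling from the model distribution is intractable.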
2.2. Deep Belief Network (DBN)

The DBN model is actually a set of stacked RBMs, where each RBM's output is considered as the following RBM's input. The basic structure of DBN is shown in Fig. 3.

Fig. 3. Structure of DBN.

All the RBMs are sequentially trained by the CD algorithm to obtain the initialized weights W^R = (w^R_l, w^R_{l−1}, ..., w^R_1), where l is the number of hidden layers, namely the index of the last hidden layer of the DBN. This training process is considered as the initialization of the weights, which is proved to be an advantage over random initialization (Hinton et al., 2006). After the stacked RBMs are trained, the calculated difference between the desired and actual outputs is used to further perform supervised learning. This is the so-called fine-tuning and is usually implemented by a BP algorithm. Note that BP-based fine-tuning is conducted from the output layer to the input one to obtain the final weights W = (w_out, w_l, w_{l−1}, ..., w_1), and this process involves several middle hidden layers.

Remark 1. Generally, a DBN model has several hidden layers and many neurons, and the existing supervised fine-tuning is usually a repeated layer-by-layer BP algorithm, which is proved to be time-consuming and inefficient for a deep structure (Ding et al., 2011; Qiao et al., 2018a). Therefore, it is necessary to develop a more efficient supervised learning framework instead of the layer-by-layer BP algorithm.

2.3. Fuzzy neural network

The fuzzy neural network (FNN) is an efficient and powerful method to realize supervised learning and modeling. FNN consists of an input layer, a membership function layer, a rule layer and an output layer. Its structure is shown in Fig. 4.

Fig. 4. Structure of FNN.

Input layer: This layer receives the external input x = [x_1, x_2, ..., x_n]; the dimension of the input vector is the number of input-layer neurons.

Membership function layer: In this layer, each neuron denotes a linguistic variable and is used to calculate the degree of membership of an input component. In general, the membership function is described as:

f_ij = e^{−(x_i − c_ij)² / σ_ij²},   (8)

where f_ij is the membership function describing that the ith input belongs to the jth fuzzy set, and c_ij and σ_ij denote the center and width of the jth Gaussian membership function of x_i, respectively.

Rule layer: In the rule layer, each neuron denotes a fuzzy rule, whose output is the activation intensity:

a_j = f_1j(x_1) f_2j(x_2) ··· f_nj(x_n),   (9)

ā_j = a_j / ∑_{j=1}^{r} a_j,   (10)

where r is the number of rule-layer neurons, namely the number of fuzzy rules, and ā_j is the normalized value.

Output layer: This layer realizes the final output calculation:

y = ∑_{j=1}^{r} w_j ā_j,   (11)

where y is the output of the FNN, and w_j is the weight connecting the rule layer and the output layer.
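The forward computation of Eqs. (8)–(11) can be written in a few lines. The sketch below is our illustration; the toy sizes (three inputs, four rules) match the FNN structure used later in Section 5.1 but are otherwise arbitrary.

```python
import numpy as np

def fnn_forward(x, c, sigma, w):
    """Forward pass of the four-layer FNN in Eqs. (8)-(11).

    x     : (n,)   input vector
    c     : (n, r) centers c_ij of the Gaussian membership functions
    sigma : (n, r) widths  sigma_ij
    w     : (r,)   weights between the rule layer and the output layer
    """
    # Membership layer, Eq. (8)
    f = np.exp(-((x[:, None] - c) ** 2) / (sigma ** 2))
    # Rule layer: product over inputs, Eq. (9), then normalization, Eq. (10)
    a = np.prod(f, axis=0)
    a_bar = a / np.sum(a)
    # Output layer, Eq. (11)
    return np.dot(w, a_bar), f, a, a_bar

# Toy usage: 3 inputs, 4 fuzzy rules
rng = np.random.default_rng(1)
x = rng.random(3)
c, sigma = rng.random((3, 4)), np.ones((3, 4))
w = rng.standard_normal(4)
y, *_ = fnn_forward(x, c, sigma, w)
```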
Remark 2. In the FNN learning process, the main task is to train the weight w_j connecting the rule layer and the output layer, and the center c_ij and width σ_ij of the jth Gaussian membership function of x_i. The current popular algorithm used to train these parameters is gradient descent.

Compared with the BP-based fine-tuning of DBN, FNN has many advantages as a supervised learning framework. The most important one is that it only optimizes the weights between the rule layer and the output layer with the BP algorithm one time, and the parameters in the membership function layer are also optimized one time. That is to say, the BP algorithm in FNN is not conducted in a continuous layer-by-layer manner, which effectively eliminates the gradient diffusion problem of the BP algorithm in DBN's fine-tuning.

3. SDBFNN learning framework

In this section, we present a novel deep learning structure for nonlinear system modeling, which combines the pre-training technique of SDBN and the efficient supervised learning of FNN. The learning framework of SDBFNN is illustrated in Fig. 5. In this framework, the DBN is considered an effective pre-training technique to obtain a feature-representation vector, which is preserved as h_l in the last hidden layer. Because h_l is the most effective and representative feature vector of the input data, it is feasible and effective to consider h_l as the input of the FNN learning model. As described in Section 2.3, the FNN learning model can realize efficient supervised learning and modeling without the influence of the gradient diffusion of the BP algorithm. Therefore, SDBFNN is a novel combination of SDBN and FNN, and this work is the first effort to describe in detail how to do it.

Fig. 5. SDBFNN learning framework.

3.1. Sparse representation

A simple RBM encodes an input v and decodes it back to a reconstruction v̂. In order to avoid dense representation and make the RBM more tolerant to fluctuations of the input, additional constraints should be added so that it extracts useful features even if it is over-complete (Chen & Li, 2017). For this purpose, two sparsity terms R_1 and R_2 are designed to penalize the average activation probability of the hidden neurons as follows:

max_θ  log P(v) + λ (R_1 + R_2),   (12)

R_1 = −| µ − (1/n) ∑_{j=1}^{n} E(h_j | v) |²,   (13)

R_2 = ∑_{j=1}^{n} (1 / (σ√(2π))) e^{−(h_j − µ)² / (2σ²)},   (14)

where λ is a regularization constant. The first sparsity term penalizes the deviation of the activation expectation of the hidden neurons from a low level µ, and the second one controls the sparseness by using the variance σ² of a normal probability density function. n is the number of hidden neurons, and h_j is the average activation probability of the jth hidden neuron.

The log-likelihood term is updated by the CD algorithm, while the first and second terms are updated with a gradient descent method. The gradients are calculated as:

∆θ_{R1} = ∂(λ |µ − (1/n) ∑_{j=1}^{n} E(h_j | v)|²) / ∂θ = −(2λ/n) |µ − (1/n) ∑_{j=1}^{n} E(h_j | v)| ∑_{j=1}^{n} (E(h_j | v) − E(h_j | v)²),   (15)

∆θ_{R2} = ∂(λ ∑_{j=1}^{n} (1/(σ√(2π))) e^{−(h_j − µ)²/(2σ²)}) / ∂θ = (∂(·)/∂h_j)(∂h_j/∂U)(∂U/∂θ) = λ ((µ − h_j)/σ²) h_j (1 − h_j) v ∑_{j=1}^{n} (1/(σ√(2π))) e^{−(h_j − µ)²/(2σ²)},   (16)

where U denotes the input (pre-activation) of the hidden neurons. Therefore, the parameters of the Sparse RBM (SRBM) are updated as follows:

θ^{τ+1} = θ^τ + η (∆θ + ∆θ_{R1} + ∆θ_{R2}),   (17)

where η is the learning rate given in (5).

In this way, the SDBN is obtained by the successive training of several SRBMs, and the resulting effective feature vector is available for the following FNN-based supervised model.
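To show how the two sparsity terms enter the update of Eq. (17), the following sketch adds their gradients to a CD step. The gradient expressions follow our reading of Eqs. (15)–(16) (both terms are propagated through the sigmoid derivative h_j(1 − h_j) and the visible input v), so they should be checked against the original derivation; all names and constants are placeholders.

```python
import numpy as np

def sparsity_gradients(v, W, b_h, lam=2.0, mu=0.1, sigma=0.5):
    """Gradients of the two sparsity terms R1, R2 (Eqs. (13)-(16)) w.r.t. W and b_h.

    v : (batch, m) visible data; W : (m, n); b_h : (n,).
    Both terms are pushed through the sigmoid derivative p*(1-p) and the
    visible input v by the chain rule, as in Eqs. (15)-(16).
    """
    p = 1.0 / (1.0 + np.exp(-(v @ W + b_h)))      # E(h_j | v), shape (batch, n)
    n = p.shape[1]

    # R1 = -|mu - mean_j p_j|^2 penalizes the mean activation level, Eq. (13)
    dR1_dp = (2.0 * lam / n) * (mu - p.mean(axis=1, keepdims=True)) * np.ones_like(p)

    # R2 = sum_j N(p_j; mu, sigma^2) concentrates activations around mu, Eq. (14)
    gauss = np.exp(-((p - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    dR2_dp = lam * (mu - p) / sigma ** 2 * gauss

    # Chain rule through the sigmoid and the linear map
    delta = (dR1_dp + dR2_dp) * p * (1.0 - p)
    dW = v.T @ delta / v.shape[0]
    db_h = delta.mean(axis=0)
    return dW, db_h

# SRBM update, Eq. (17): theta <- theta + eta * (CD gradient + sparsity gradients),
# where the CD gradient comes from a CD-1 step such as the one sketched in Section 2.1.
```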
3.2. SDBFNN learning process

To solve the problems mentioned in Remark 1, we present an efficient FNN-based supervised modeling scheme whose input is the effective feature vector from the SDBN. The detailed process is described as follows.

Step 1: We extract the effective feature vector for the input layer. Once the SDBN is pre-trained, the effective feature vector is preserved in the last hidden layer and can be extracted as:

h_l = w^R_l w^R_{l−1} ··· w^R_2 w^R_1 D,   (18)

where l is the number of hidden layers and D is the original input data.

Step 2: We consider the feature-representation vector of the last hidden layer, h_l, as the input of the FNN. Let the input layer of the FNN receive x = [x_1, x_2, ..., x_n] = h_l = [h_1, h_2, ..., h_n].

Step 3: We perform parameter learning. Let the loss function be:

L_2 = (1/2)(y_d − y)²,   (19)

where y_d and y are the desired output and SDBFNN's output, respectively. The aim of parameter learning is to minimize the loss function, and the parameters are updated as follows:

w_j(t+1) = w_j(t) − η ∂L_2/∂w_j,
c_ij(t+1) = c_ij(t) − η ∂L_2/∂c_ij,   (20)
σ_ij(t+1) = σ_ij(t) − η ∂L_2/∂σ_ij,

where η is the learning rate. According to the chain rule, the partial derivatives of the loss function with respect to w_j, c_ij and σ_ij are:

∂L_2/∂w_j = (∂L_2/∂y)(∂y/∂w_j) = −(y_d − y) ā_j,   (21)

∂L_2/∂c_ij = (∂L_2/∂y)(∂y/∂ā_j)(∂ā_j/∂a_j)(∂a_j/∂f_ij)(∂f_ij/∂c_ij) = −(y_d − y) w_j (∑_{k≠j} a_k / (∑_{k=1}^{r} a_k)²) (∂f_ij/∂c_ij) ∏_{k≠i} f_kj,   (22)

∂L_2/∂σ_ij = (∂L_2/∂y)(∂y/∂ā_j)(∂ā_j/∂a_j)(∂a_j/∂f_ij)(∂f_ij/∂σ_ij) = −(y_d − y) w_j (∑_{k≠j} a_k / (∑_{k=1}^{r} a_k)²) (∂f_ij/∂σ_ij) ∏_{k≠i} f_kj,   (23)

where

∂f_ij/∂c_ij = (2(x_i − c_ij)/σ_ij²) e^{−(x_i − c_ij)²/σ_ij²},   (24)

∂f_ij/∂σ_ij = (2(x_i − c_ij)²/σ_ij³) e^{−(x_i − c_ij)²/σ_ij²}.   (25)

Step 4: We test the well-trained SDBFNN model on a given set of test data.
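Step 3 can be sketched as a single gradient-descent update per sample. The code below is our illustration of Eqs. (19)–(25): the gradients are obtained with the same chain rule, written in vectorized form, and x stands for the feature vector h_l delivered by Step 2.

```python
import numpy as np

def fnn_train_step(x, y_d, c, sigma, w, eta=0.05):
    """One gradient-descent update of w_j, c_ij, sigma_ij (Step 3, Eqs. (19)-(25)).

    x : (n,) feature vector h_l from the pre-trained SDBN (Step 2)
    c, sigma : (n, r) Gaussian centers and widths; w : (r,) output weights.
    """
    # Forward pass, Eqs. (8)-(11)
    f = np.exp(-((x[:, None] - c) ** 2) / (sigma ** 2))     # (n, r)
    a = np.prod(f, axis=0)                                  # (r,)
    S = a.sum()
    a_bar = a / S
    y = np.dot(w, a_bar)

    e = y_d - y                                             # error in Eq. (19)

    # dL2/dw_j = -(y_d - y) * a_bar_j, Eq. (21)
    grad_w = -e * a_bar
    # dy/da_j through the normalization of Eq. (10)
    dy_da = (w - y) / S
    # dL2/dc_ij and dL2/dsigma_ij, cf. Eqs. (22)-(25)
    grad_c = -e * dy_da * a * 2.0 * (x[:, None] - c) / sigma ** 2
    grad_sigma = -e * dy_da * a * 2.0 * (x[:, None] - c) ** 2 / sigma ** 3

    # Update, Eq. (20)
    return c - eta * grad_c, sigma - eta * grad_sigma, w - eta * grad_w
```

In SDBFNN, this update is applied sample by sample to the pairs (h_l, y_d) until the loss in Eq. (19) falls below the chosen target error.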
4. Convergence analysis

For the proposed method, the convergence of SDBFNN with respect to its weight parameters is an important issue for its successful application. Therefore, we analyze the convergence of the SDBFNN model in two parts: (1) the pre-training stage of SDBN; and (2) the FNN-based supervised modeling stage.

4.1. Pre-training stage

Pre-training is to train all SRBMs, and the convergence of an SRBM can measure the convergence of the proposed method. Because the SDBN can be considered as several stacked SRBMs, whose input and output stability is very important for the convergence, the convergence analysis of pre-training starts with stability.

Without loss of generality, the lower and upper asymptotes in (6) and (7) are denoted by A_L and A_H, respectively. With respect to a standard RBM, let s_i^0 and s_i^k be the input and the data reconstructed after k Gibbs samplings, and let s_j^0 and s_j^k be the hidden-layer state and the hidden state after k Gibbs samplings, respectively. We can conclude:

s_i^0 ∈ [A_L, A_H],   (26)

s_j^0 = A_L + (A_H − A_L) σ(b_{hj} + ∑_{i=1}^{m} s_i^0 W^R_{ij}),   (27)

s_i^k = A_L + (A_H − A_L) σ(b_{vi} + ∑_{j=1}^{n} W^R_{ij} s_j^{k−1}),   (28)

s_j^k = A_L + (A_H − A_L) σ(b_{hj} + ∑_{i=1}^{m} s_i^k W^R_{ij}).   (29)

From (26) to (29), we further conclude that the output state of the RBM is calculated from the middle states during a Gibbs sampling process. Therefore, we have the following result.

Theorem 1. If s_i^0 and s_j^k denote the input and output states and s_j^0 and s_i^k are considered the middle feature states, then a sufficient and necessary condition for the SRBM's convergence is that all the state vectors satisfy s_i^0, s_j^k, s_j^0, s_i^k ∈ [A_L, A_H].

Proof. (1) Necessity. If the stability and convergence of an SRBM are guaranteed, the input and output states are bounded, namely s_i^0, s_i^k ∈ [A_L, A_H]. In addition, because σ(·) is monotonically increasing, and the number of activated neurons is continuously increasing, we can conclude that s_j^k > s_i^k and s_i^k > s_j^0. Therefore, all the states satisfy s_i^0, s_j^k, s_j^0, s_i^k ∈ [A_L, A_H], and the necessity is proved.

(2) Sufficiency. It can be seen from (26)–(29) that when s_i^0, s_j^0, s_i^k ∈ [A_L, A_H], the hidden-layer state s_j^k satisfies s_j^k ∈ [A_L, A_H], which shows that the SRBM output is stable and bounded. Thus the convergence of the SRBM is guaranteed, which proves the sufficiency. ■

With respect to the convergence, the first sparsity term R_1 actually stems from traditional approaches (Chen & Li, 2017) that keep the hidden neurons inactive during training: the hidden neurons are given a very small desired activation, and the deviation between the real and desired activations is penalized. Assume that µ is the desired activation and µ_j is the average activation of the jth hidden neuron over S training samples:

µ_j = (1/S) ∑_{s=1}^{S} E(h_j | v_s),   (30)

where v_s denotes the sth training sample. If µ_j deviates too much from µ, it is penalized by the sparsity penalty term, which is intuitive and effective:

KL(µ ∥ µ_j) = µ log(µ/µ_j) + (1 − µ) log((1 − µ)/(1 − µ_j)),   (31)

where KL(·) is the Kullback–Leibler divergence, considered as a penalty between the real and target activations. In the sparsity penalty function KL(·), a small µ close to 0 works well. Fig. 6 plots the sparsity penalty function KL(·) when µ = 0.1, which shows that this sparse term can effectively penalize the activations of the hidden neurons and make them approximate the desired activation.

Fig. 6. Sparsity penalty function KL(·) when µ = 0.1.
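The behavior plotted in Fig. 6 can be reproduced in a few lines. The sketch below is our rendering of Eq. (31), added only for illustration.

```python
import numpy as np

def kl_sparsity(mu, mu_j):
    """Sparsity penalty of Eq. (31): KL(mu || mu_j) for Bernoulli activations."""
    return mu * np.log(mu / mu_j) + (1 - mu) * np.log((1 - mu) / (1 - mu_j))

# Shape of Fig. 6 for mu = 0.1: the penalty is zero at mu_j = 0.1 and grows
# quickly as the average activation drifts away from the desired level.
mu_j = np.linspace(0.01, 0.99, 99)
penalty = kl_sparsity(0.1, mu_j)
print(float(penalty[np.argmin(np.abs(mu_j - 0.1))]))   # ~0 near mu_j = 0.1
```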
The second sparsity term, as shown in (14), is a normal probability density function, which mainly controls the sparseness using the variance σ²: when σ² is larger, the activations of the hidden neurons are more scattered. This term therefore not only penalizes the deviation between the actual and desired activations of the hidden neurons but also controls the sparseness.

4.2. FNN-based supervised modeling stage

According to Eq. (19), let the FNN-based modeling error dynamics (Qiao & Han, 2012) be

ė(t) = ẏ_d(t) − ẏ(t) = g(h_l(t), Φ*) − g′(h_l(t), Φ(t)) − e(t),   (32)

where y_d and y are the desired output and SDBFNN's output, respectively; e(0) = 0; g(·) and g′(·) are the function resulting from the FNN and the ideal one, respectively; Φ* and Φ = [w, c, σ] are the ideal and actual parameter sets of the FNN, respectively; and t denotes the sample batch, which can also be regarded as time to some extent.

With respect to the convergence in the FNN-based supervised modeling stage, we analyze this issue in two parts: stability and approaching zero.

(1) Stability

Let the Lyapunov function be

V(e(t)) = (1/2) e^T(t) e(t),   (33)

then

∆V(e(t)) = V(e(t+1)) − V(e(t)) = (1/2)(e²(t+1) − e²(t)),   (34)

where e(t+1) = e(t) + ∆e(t).

The parameter set of the FNN-based supervised modeling is denoted by Φ_j = [w_j, c_ij, σ_ij]. According to the total differential theorem, we have

∆e(t) = ∑_{j=1}^{r} (∂e(t)/∂Φ_j(t)) ∆Φ_j(t),   (35)

where

∆Φ_j(t) = −η e(t) ∂e(t)/∂Φ_j(t) = η e(t) ∂y(t)/∂Φ_j(t),   (36)

so that

∆e(t) = ∑_{j=1}^{r} (∂e(t)/∂Φ_j(t)) η e(t) (∂y(t)/∂Φ_j(t)) = −η e(t) ∑_{j=1}^{r} (∂y(t)/∂Φ_j(t))².   (37)

According to (37), (34) can be rewritten as

∆V(e(t)) = (1/2)((e(t) + ∆e(t))² − e²(t)) = (1/2)(2 e(t) ∆e(t) + ∆e²(t)) = (1/2) e²(t) [ (1 − η ∑_{j=1}^{r} (∂y(t)/∂Φ_j(t))²)² − 1 ].   (38)

As indicated by (38), when the learning rate η satisfies

0 < η < 2 / ∑_{j=1}^{r} (∂y(t)/∂Φ_j(t))²,   (39)

we have

(1 − η ∑_{j=1}^{r} (∂y(t)/∂Φ_j(t))²)² − 1 < 0.   (40)

As shown in (38) and (40), ∆V(e(t)) < 0 and V(e(t)) → 0. According to the Lyapunov stability theorem, the error dynamics in (32) are uniformly ultimately bounded (UUB). Stability is guaranteed.
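Before the second part of the analysis, a quick numerical check (ours, with placeholder values rather than quantities from a trained model) illustrates the learning-rate condition of Eqs. (39)–(40).

```python
import numpy as np

# For any aggregated squared gradient G = sum_j (dy/dPhi_j)^2 and any eta in
# (0, 2/G), the factor (1 - eta*G)^2 - 1 appearing in Eq. (38) is non-positive,
# so Delta V <= 0 as required by the stability argument.
rng = np.random.default_rng(2)
for _ in range(1000):
    G = np.sum(rng.uniform(0.1, 2.0, size=5) ** 2)
    eta = rng.uniform(0.0, 2.0 / G)
    assert (1.0 - eta * G) ** 2 - 1.0 <= 0.0
```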
(2) Approaching zero

Define the error matrix of the weight parameters as Σ = Φ − Φ*. Considering the expanded (e, Σ)-space, let the Lyapunov function be

V(e, Σ) = (1/2)(e^T e + trace(Σ^T Σ)),   (41)

where trace(·) is the trace operator. Its derivative is given as:

V̇(e, Σ) = e^T ė + trace(Σ̇^T Σ) = e(−e − Σ F(h_l(t), Φ, Φ*)) + trace(Σ̇^T Σ) = −e² − e Σ F(h_l(t), Φ, Φ*) + trace(Σ̇^T Σ),   (42)

where

F(h_l(t), Φ, Φ*) = g′(h_l(t), Φ) − g(h_l(t), Φ*).   (43)

According to the weight update strategy of the FNN-based modeling, (42) can be rewritten as:

V̇(e, Σ) = −e² − trace(e Σ F(h_l(t), Φ, Φ*)) + trace(Σ̇^T Σ) = −e² − trace(Σ(e F(h_l(t), Φ, Φ*) − Σ̇^T)) = −e² − trace(Σ(e F(h_l(t), Φ, Φ*) − e F(h_l(t), Φ, Φ*))) = −e².   (44)

From (44), it is concluded that V̇(e, Σ) is negative semi-definite. As a result of the analysis above, the Lyapunov stabilization theorem and the approximation error dynamics (Han & Qiao, 2010; Qiao & Han, 2012), the approximation error e(t) satisfies:

lim_{t→∞} e(t) = 0.   (45)

Therefore, the whole error of the FNN-based supervised modeling,

E_sum(t) = ∑_{t=1}^{N_tr} e^T(t) e(t),   (46)

also satisfies

lim_{t→∞} E_sum(t) = 0,   (47)

where N_tr denotes the number of training samples, and t → ∞ means that the number of training samples is large enough.

In conclusion, the convergence of the proposed SDBFNN in both the pre-training and the FNN-based supervised modeling stages is theoretically guaranteed.

5. Simulation experiments

In this section, we perform simulation experiments to show the superiority and effectiveness of our method on two examples: nonlinear system modeling and water quality prediction in the wastewater treatment process (WWTP). All simulations are carried out in the MATLAB R2013b environment and run on a Core i7-4790 CPU at 3.6 GHz with 8.0 GB RAM. In the internal connections of the SDBN model, if the connection weights are all large, the SDBN can be considered a model constructed by dense representation. An SDBN with dense representation is highly input-dependent, because any fluctuation in the input can change most of the features in the representation vector, so the pursuit of sparse representation in the SDBN makes SDBFNN more robust to some extent. Therefore, we use two evaluation indices: the root mean square error (RMSE) and the sparsity degree (Sd). They evaluate SDBFNN's modeling accuracy and robustness, respectively:

RMSE = √( (1/N) ∑_{i=1}^{N} (y_i − y′_i)² ),   (48)

Sd = ∑_{l=2}^{L} ( ∑_{i=1}^{n_{l−1}} ∑_{j=1}^{n_l} |w^R_l(i, j)| ) / ∑_{l=2}^{L} ( ∑_{i=1}^{n_{l−1}} ∑_{j=1}^{n_l} (w^R_l(i, j))² ),   (49)

where N is the number of samples, y_i and y′_i are the ith desired and actual outputs, respectively, L is the number of hidden layers in the SDBN, n_l is the number of hidden neurons in layer l, and w^R_l(i, j) is an element of the sub-matrix w^R_l, which is a part of w^R. In particular, because the two sparse terms are used to penalize dense representation, the connection weights all remain at a low level; thus, the sparse representation and robustness are more obvious when Sd is smaller.
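The two evaluation indices can be computed directly from their definitions. The sketch below is our NumPy rendering of Eqs. (48)–(49); the list-of-matrices interface for Sd is an assumption about how the SDBN weights are stored.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (48)."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def sparsity_degree(weights):
    """Sparsity degree Sd of Eq. (49).

    weights: list of the SDBN weight matrices w^R_l, l = 2, ..., L
    (one matrix per pair of adjacent layers). Per the paper, a smaller Sd
    indicates a sparser and more robust representation.
    """
    num = sum(np.abs(W).sum() for W in weights)
    den = sum((W ** 2).sum() for W in weights)
    return num / den
```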
5.1. Nonlinear system modeling

Nonlinear dynamic system modeling is often used to evaluate the performance of neural networks (Pedrycz, Al-Hmouz, Balamash, & Morfeq, 2015; Qiao et al., 2018a). In this example, the nonlinear dynamic system is described as:

y(t+1) = y(t) y(t−1) (y(t) + 2.5) / (y²(t) + y²(t−1) + 1) + u(t),   (50)

where y(1) = y(2) = 0 and u(t) = sin(2πt/25).

According to (50), SDBFNN has three inputs, y(t−1), y(t) and u(t), and its output is y(t+1). The key issue is to model the input–output relation and predict y(t+1) using y(t−1), y(t) and u(t). The data samples used in this example are generated from (50) with sampling time t_s = 0.1: 10000 data points with t ∈ [0.1, 1000] are selected as training samples, and 200 data points with t ∈ [1000.1, 1020] are selected as testing samples.
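As an illustration of how such a dataset can be produced, the following sketch simulates Eq. (50) under one plausible reading of the sampling scheme (the recursion is advanced once per sampling instant, with u evaluated at t = 0.1, 0.2, ...); the paper does not spell this detail out, so treat it as an assumption.

```python
import numpy as np

def simulate_system(n_steps, t0=0.1, ts=0.1):
    """Simulate the benchmark system of Eq. (50).

    Assumption: one recursion step per sampling instant, with the excitation
    u evaluated at t = t0, t0 + ts, ...
    """
    y = np.zeros(n_steps + 2)                     # y(1) = y(2) = 0
    t = t0 + ts * np.arange(n_steps)
    u = np.sin(2 * np.pi * t / 25)
    for k in range(n_steps):
        y[k + 2] = (y[k + 1] * y[k] * (y[k + 1] + 2.5)
                    / (y[k + 1] ** 2 + y[k] ** 2 + 1) + u[k])
    # SDBFNN inputs [y(t-1), y(t), u(t)] and target y(t+1)
    X = np.stack([y[:-2], y[1:-1], u], axis=1)
    return X, y[2:]

X_all, y_all = simulate_system(10200)             # t in [0.1, 1020]
X_train, y_train = X_all[:10000], y_all[:10000]   # t in [0.1, 1000]
X_test, y_test = X_all[10000:], y_all[10000:]     # t in [1000.1, 1020]
```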
The structure size of the SDBN used for pre-training is 3-8-8-3, and the structure of the FNN learning model is 3-12-4-1. The number of iterations in unsupervised pre-training is 100, so that each RBM of SDBFNN is sufficiently trained to obtain an efficient feature. The main parameters are selected as λ = 2, µ = 0.1, σ = 0.5, and the target error is E_T = 0.001.

Figs. 7–8 show the training results and training error of SDBFNN, respectively. Fig. 9 shows the RMSE values during the training process for the proposed SDBFNN, the transfer-learning-based growing DBN (TL-GDBN) (Wang et al., 2019), the DBN-based echo-state network (DBESN) (Sun et al., 2017), the self-organizing cascade neural network (SCNN) (Li, Qiao, Han, & Yang, 2016) and the growing echo-state network (GESN) (Qiao, Li, Han, & Li, 2017). Figs. 10–11 show their testing results and testing errors, respectively. As shown in Figs. 7–11, SDBFNN gives more accurate training and testing results with the smallest errors, while the other methods predict the target data points with bigger errors. Furthermore, there are many periodic crests and troughs in the training and testing errors, which are caused by the strong irregular nonlinearity at the periodic inflection points shown in Figs. 7–10. In particular, it is universally acknowledged that it is very difficult to learn and predict strong irregular nonlinearity at periodic inflection points (Mattsson, Zachariah, & Stoica, 2018; Zhang, Lin, & Karim, 2018).

Fig. 7. SDBFNN training results for nonlinear system modeling.
Fig. 8. SDBFNN training error for nonlinear system modeling.
Fig. 9. Training RMSEs of different methods for nonlinear system modeling.
Fig. 10. Testing results of different methods for nonlinear system modeling.
Fig. 11. Testing errors of different methods for nonlinear system modeling.

In order to demonstrate the superiority of SDBFNN in detail, especially its robustness, 50 independent comparative trials with other methods are performed. The methods for comparison include TL-GDBN (Wang et al., 2019), the normal-regularization-based sparse DBN (NR-DBN) (Keyvanrad & Homayounpour, 2017), the rate-distortion-theory-based sparse DBN (RDT-DBN) (Ji et al., 2014), the sparse-feature-learning-based DBN (SFL-DBN) (Boureau & Cun, 2008), DBESN (Sun et al., 2017), SCNN (Li et al., 2016), GESN (Qiao et al., 2017), an automatic axon-neural network (AANN) (Han, Wang, & Qiao, 2013) and the genetic-algorithm-based FNN (GA-FNN) (Zhang & Tao, 2018), where NR-DBN, RDT-DBN and SFL-DBN are three DBN models with sparse representation. The statistical average results are listed in Table 1, where the results are their optimal outputs with consideration of all the performance indices, and the winner values are shown in boldface.

Table 1. Experimental results of different methods for nonlinear system modeling. Running time includes training and testing time; the winner is shown in boldface.

Methods | Structure size | Sd | Testing RMSE (Mean) | Testing RMSE (Var) | Training time | Testing time
SDBFNN | 3-8-8-3-12-4-1 | 0.63 | 2.62e-4 | 4.71e-7 | 8.13 | 3.95
TL-GDBN | 3-15-15-15-15-1 | 1.32 | 3.83e-3 | 2.67e-5 | 14.07 | 5.80
NR-DBN | 3-20-20-20-1 | 0.81 | 2.47e-3 | 1.02e-5 | 9.26 | 4.82
RDT-DBN | 3-18-18-18-1 | 0.92 | 2.13e-3 | 1.19e-5 | 9.97 | 5.29
SFL-DBN | 3-25-25-25-1 | 0.79 | 3.08e-3 | 8.63e-6 | 10.75 | 5.54
DBESN | 3-10-10-20-1 (a) | 1.65 | 1.36e-3 | 2.83e-5 | 12.16 | 6.10
SCNN | 3-50-1 | 1.30 | 5.89e-3 | 3.21e-5 | 16.07 | 6.30
GESN | 3-60-1 (b) | 0.98 | 9.87e-4 | 3.42e-5 | 14.28 | 5.85
AANN | (c) | (c) | 4.24e-2 | (c) | 8.89 | 4.42
GA-FNN | 3-36-9-1 | 2.16 | 1.03e-3 | 4.10e-5 | 9.675 | 4.58

(a) The reservoir size of the sub-model ESN is 20. (b) The reservoir size of GESN is 60. (c) Not listed in the original paper.

We conclude from Table 1 that SDBFNN performs better in nonlinear system modeling than its peers: firstly, the modeling accuracy of our method is the highest and its structure size is the simplest; secondly, its training and testing times are the shortest and its testing RMSE is the smallest; finally, the variance (Var) of the testing RMSE over the 50 independent experiments is the smallest, which indicates that our method has much higher robustness and stability than the other methods that do not consider sparse representation. Additionally, the smallest Sd value indicates that SDBFNN has a better sparse representation, which in return makes SDBFNN more robust.

5.2. Total phosphorus prediction in WWTP

With the development of society, wastewater treatment has become a hot issue in many countries. The key issue of a WWTP is to make sure that the quality of the treated wastewater conforms to the related standard, especially the concentration of total phosphorus (TP). In this example, SDBFNN is used to model and predict the effluent TP concentration based on practical data from a WWTP. Table 2 shows 10 measurable variables related to TP, which are used as the inputs of SDBFNN (Wang et al., 2019). The dataset used in this example contains 2300 samples collected from a wastewater treatment plant in Gaobeidian, Chaoyang, Beijing, China, from March to November 2016. In the specific experiment, 2000 samples are randomly selected as training samples, and the remaining 300 samples are selected as testing samples. SDBFNN has the ten input variables shown in Table 2 and one output, the TP concentration. The key issue is to model the relation between input and output and to predict the TP concentration from future inputs. The structure size of the SDBN used for pre-training is 10-15-15-4, and the structure of the FNN learning model is 4-24-8-1. The number of iterations in unsupervised pre-training is 100, so that each RBM of SDBFNN is sufficiently trained to obtain an efficient feature. The main parameters are selected as λ = 3, µ = 0.1, σ = 0.5, and the target error is E_T = 0.01.

Table 2. Measurable variables related to TP concentration.

Variables | Units | Descriptions
DO1 | mg/L | Dissolved oxygen concentration in the front end of the aerobic tank
DO2 | mg/L | Dissolved oxygen concentration in the back end of the aerobic tank
ITP | mg/L | Influent total phosphorus
NH4-N | mg/L | Ammonium nitrogen concentration in the effluent water
NO3-N | mg/L | Nitrate nitrogen concentration in the effluent water
ORP1 | mV | Oxidation-reduction potential in the anaerobic tank
ORP2 | mV | Oxidation-reduction potential in the effluent water
PH | - | Potential of hydrogen
T | °C | Water temperature
TSS | g/L | Total suspended solids concentration in the back end of the aerobic tank
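For orientation only, the following sketch shows one way the 2000/300 random split could be set up. The array name, column layout and min-max normalization are our assumptions; the paper does not describe its preprocessing.

```python
import numpy as np

# Hypothetical layout: `data` holds the 2300 collected samples, columns 0-9 are
# the ten variables of Table 2 and column 10 is the effluent TP concentration.
rng = np.random.default_rng(42)
data = rng.random((2300, 11))                  # placeholder for the real measurements

idx = rng.permutation(len(data))
train_idx, test_idx = idx[:2000], idx[2000:]   # 2000 random training, 300 testing samples

X_train, y_train = data[train_idx, :10], data[train_idx, 10]
X_test, y_test = data[test_idx, :10], data[test_idx, 10]

# Simple min-max scaling fitted on the training set only (an assumption;
# the paper does not state its normalization scheme).
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
X_train = (X_train - lo) / (hi - lo)
X_test = (X_test - lo) / (hi - lo)
```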
Fig. 12 shows the RMSE values during the training process, and Figs. 13–14 show the testing results and testing errors, respectively, for the proposed SDBFNN, TL-GDBN (Wang et al., 2019), DBESN (Sun et al., 2017), SCNN (Li et al., 2016) and GESN (Qiao et al., 2017). As shown in Figs. 12–14, SDBFNN gives more accurate training and testing results with the smallest errors, while the other methods predict the TP concentration with bigger errors.

Fig. 12. Training RMSEs of different methods for TP prediction.
Fig. 13. Testing results of different methods for TP prediction.
Fig. 14. Testing errors of different methods for TP prediction.

For completeness, an evaluation indicator named the coefficient of determination is introduced to intuitively evaluate the prediction result. It is symbolized as R² and defined as:

R² = ∑_{i=1}^{N} (y′_i − ȳ)² / ∑_{i=1}^{N} (y_i − ȳ)²,   (51)

where N is the number of samples, and y′_i, ȳ and y_i are the ith desired output, the average value and the ith actual output, respectively. In particular, 0 < R² < 1, and the predicted TP is closer to the target TP when R² is closer to 1. Fig. 15 shows the regression plot of the predicted TP against the target TP, where R² = 0.9793. As shown in Fig. 15, the predicted TP concentrations fit the target values with only a small deviation.

Fig. 15. Regression plots of predicted and target TP of SDBFNN.
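The R² value above can be computed directly from Eq. (51). The sketch below is a direct rendering of that formula; since the paper does not state which series the mean ȳ is taken over, the code assumes the mean of the desired (target) outputs.

```python
import numpy as np

def r_squared(y_desired, y_actual):
    """Coefficient of determination as defined in Eq. (51).

    Assumption: y_bar is the mean of the desired (target) outputs.
    """
    y_bar = np.mean(y_desired)
    return np.sum((y_desired - y_bar) ** 2) / np.sum((y_actual - y_bar) ** 2)
```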
To show the superiority of SDBFNN in detail, especially its robustness, 50 independent comparative trials with other methods are performed. The methods for comparison include the self-organizing DBN (SODBN) (Qiao et al., 2018b), TL-GDBN (Wang et al., 2019), NR-DBN (Keyvanrad & Homayounpour, 2017), RDT-DBN (Ji et al., 2014), SFL-DBN (Boureau & Cun, 2008), DBESN (Sun et al., 2017), SCNN (Li et al., 2016), GESN (Qiao et al., 2017), AANN (Han et al., 2013), GA-FNN (Zhang & Tao, 2018) and the orthogonal-least-squares-based cascade neural network (OLSCNN) (Huang, Song, & Wu, 2012), where NR-DBN, RDT-DBN and SFL-DBN are three DBN models with sparse representation. The statistical average results are listed in Table 3, where the results are their optimal outputs with consideration of all the performance indices, and the winner values are shown in boldface. We conclude from Table 3 that SDBFNN performs better in TP concentration modeling and prediction than its peers: firstly, the modeling accuracy of our method is the highest and its structure size is the simplest; secondly, its training and testing times are the shortest and its testing RMSE is the smallest; finally, the variance (Var) of the testing RMSE over the 50 independent experiments is the smallest, which indicates that our method has much higher robustness and stability than the other methods that do not consider sparse representation. Furthermore, the smallest Sd value indicates that SDBFNN has a better sparse representation, which in return makes SDBFNN more robust.

Remark 3. According to the experimental results of these two examples, the accuracy of SDBFNN for nonlinear system modeling is higher than that for TP concentration prediction. The first example is a benchmark nonlinear system with regular nonlinearities, and it is not difficult to learn its nonlinear dynamics, while the second example is a practical nonlinear system with complex and irregular nonlinearities, whose dynamics are somewhat more difficult to learn accurately.

Remark 4. As described in Section 3.1, two sparsity terms are used to penalize the dense representation so as to further improve the robustness of SDBFNN. The experimental results have shown their effectiveness: SDBFNN achieves an effective sparse representation and good robustness.

As a note to practitioners, SDBFNN is a novel cross-model combining a pre-training technique and fuzzy learning. It does well in applications that attach great importance to stability and robustness, especially practical systems with high-dimensional and complex input data, such as optimization and modeling in the wastewater treatment process (WWTP), the continuous stirred tank reactor (CSTR) system and the cyber-physical energy system (CPES).
Table 3. Experimental results of different methods for TP concentration prediction. Running time includes training and testing time; the winner is shown in boldface.

Methods | Structure size | Sd | Testing RMSE (Mean) | Testing RMSE (Var) | Training time (s) | Testing time (s)
SDBFNN | 10-15-15-4-24-8-1 | 0.39 | 8.27e-3 | 1.06e-6 | 6.15 | 2.06
SODBN | 10-16-15-12-10-1 | 1.71 | 5.39e-2 | 6.20e-4 | 18.42 | 7.30
TL-GDBN | 10-180-180-180-180-1 | 1.95 | 1.54e-2 | 7.29e-6 | 7.18 | 3.38
NR-DBN | 10-220-220-220-1 | 0.67 | 2.47e-3 | 1.02e-5 | 7.05 | 3.18
RDT-DBN | 10-200-200-200-1 | 0.78 | 2.13e-3 | 1.19e-5 | 7.12 | 3.41
SFL-DBN | 10-210-210-210-1 | 0.62 | 3.08e-3 | 8.63e-6 | 7.16 | 3.50
DBESN | 10-15-15-30-1 (a) | 2.03 | 9.71e-2 | 3.07e-4 | 9.10 | 5.03
SCNN | 10-60-1 | 1.38 | 2.04e-2 | 4.80e-4 | 12.67 | 5.25
GESN | 10-70-1 (b) | 0.86 | 8.36e-2 | 2.97e-4 | 11.23 | 4.85
AANN | (c) | (c) | 1.03e-2 | (c) | 23.19 | 10.08
GA-FNN | 10-30-10-1 | 1.59 | 5.04e-2 | 6.13e-4 | 8.34 | 4.62
OLSCNN | (c) | 2.21 | 9.43e-2 | 6.13e-4 | 36.82 | 13.73

(a) The reservoir size of the sub-model ESN is 30. (b) The reservoir size of GESN is 70. (c) Not listed in the original paper.

6. Conclusion

In this work, we propose a sparse deep belief network with a fuzzy neural network (SDBFNN) to improve the accuracy and robustness of a learning model. As a cross-model, SDBFNN is the first effort to combine a fuzzy learning framework with the pre-training technique of the deep belief network (DBN). The experimental results, obtained on a benchmark nonlinear system and a practical wastewater treatment system, demonstrate its effectiveness and superiority. In summary, the novelties and contributions are as follows.

(1) According to the experimental results, an interesting finding is that accuracy is not always inversely proportional to training time, which occurs in the cases where sparse representation is used. Sparse representation emphasizes fast, effective connections, which avoids complex dense representations in a large structure and achieves higher accuracy.

(2) As a novel combination, SDBFNN has achieved an encouraging result. It will contribute to the further development of cross-models, especially of the deep learning technique combined with different neural-network-based learning frameworks. SDBFNN has a promising prospect in engineering applications, especially where the datasets or learning tasks are very complex, with either very high or very low dimension.

(3) SDBFNN successfully overcomes several disadvantages of both the DBN and the fuzzy neural network (FNN) models. The pre-training technique of DBN provides a more efficient feature-representation vector as the input of the FNN, and the sparse representation makes SDBFNN more tolerant to external fluctuations, improving its robustness. Most importantly, the FNN-based supervised learning framework can replace the traditional back propagation (BP) algorithm-based fine-tuning to eliminate the problems mentioned in Section 1.

However, there is still room to further improve SDBFNN's performance. From the experimental results, we find that the assignment of hyper-parameters and how to select their values are very important for an accurate and stable SDBFNN, especially the hyper-parameters defining the structure size. Therefore, our next step focuses on addressing the fluctuations caused by hyper-parameter assignment and on designing self-adjusting structures.

Acknowledgments

This work is funded by the Key Project of the National Natural Science Foundation of China (No. 61533002), the National Natural Science Foundation of China (Nos. 61673229 and 61703011) and the National Science and Technology Major Project of China (No. 2018ZX07111005). The authors would like to thank these supports.

References

Ali, M. (2015). Use of dropouts and sparsity for regularization of autoencoders in deep neural networks (Thesis). Bilkent University.
Ali, A., & Yangyu, F. (2017). Automatic modulation classification using deep learning based on sparse autoencoders with nonnegativity constraints. IEEE Signal Processing Letters, 24(11), 1626–1630.
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Boureau, Y., & Cun, Y. (2008). Sparse feature learning for deep belief networks. Advances in Neural Information Processing Systems, 1185–1192.
Chen, Z., & Li, W. (2017). Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network. IEEE Transactions on Instrumentation and Measurement, 66(7), 1693–1702.
De la Rosa, E., & Yu, W. (2016). Randomized algorithms for nonlinear system identification with deep learning modification. Information Sciences, 364, 197–212.
Ding, S., Su, C., & Yu, J. (2011). An optimizing BP neural network algorithm based on genetic algorithm. Artificial Intelligence Review, 36(2), 153–162.
Hadavandi, E., Shavandi, H., & Ghanbari, A. (2010). Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting. Knowledge-Based Systems, 23(8), 800–808.
Han, H., & Qiao, J. (2010). A self-organizing fuzzy neural network based on a growing-and-pruning algorithm. IEEE Transactions on Fuzzy Systems, 18(6), 1129–1143.
Han, H., Wang, L., & Qiao, J. (2013). Efficient self-organizing multilayer neural network for nonlinear system modeling. Neural Networks, 43, 22–32.
Hinton, G., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Horikawa, S., Furuhashi, T., & Uchikawa, Y. (1992). On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm. IEEE Transactions on Neural Networks, 3(5), 801–806.
Huang, H., Ho, D., & Cao, J. (2005). Analysis of global exponential stability and periodic solutions of neural networks with time-varying delays. Neural Networks, 18(2), 161–170.
Huang, G., Song, S., & Wu, C. (2012). Orthogonal least squares algorithm for training cascade neural networks. IEEE Transactions on Circuits and Systems I: Regular Papers, 59(11), 2629–2637.
Jagannathan, S., & Lewis, F. (1996). Identification of nonlinear dynamical systems using multilayered neural networks. Automatica, 32(12), 1707–1712.
Ji, N., Zhang, J., & Zhang, C. (2014). A sparse-response deep belief network based on rate distortion theory. Pattern Recognition, 47(9), 3179–3191.
Keyvanrad, M., & Homayounpour, M. (2017). Effective sparsity control in deep belief networks using normal regularization term. Knowledge and Information Systems, 53(2), 533–550.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, J., Chang, H., Yang, J., Luo, W., & Fu, Y. (2018). Visual representation and classification by learning group sparse deep stacking network. IEEE Transactions on Image Processing, 27(1), 464–476.
Li, H., & Liu, Z. (2008). A probabilistic neural-fuzzy learning system for stochastic modeling. IEEE Transactions on Fuzzy Systems, 16(4), 898–908.
Li, F., Qiao, J., Han, H., & Yang, C. (2016). A self-organizing cascade neural network with random weights for nonlinear system modeling. Applied Soft Computing, 42, 184–193.
Mattsson, P., Zachariah, D., & Stoica, P. (2018). Recursive nonlinear-system identification using latent variables. Automatica, 93, 343–351.
Murray, J., & Kreutz-Delgado, K. (2006). Learning sparse overcomplete codes for images. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 45(1), 97–110.
Olshausen, B., & Field, D. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Papa, J., Scheirer, W., & Cox, D. (2016). Fine-tuning deep belief networks using harmony search. Applied Soft Computing, 46, 875–885.
Pedrycz, W., Al-Hmouz, R., Balamash, A., & Morfeq, A. (2015). Designing granular fuzzy models: A hierarchical approach to fuzzy modeling. Knowledge-Based Systems, 76, 42–52.
Qiao, J., & Han, H. (2012). Identification and modeling of nonlinear dynamical systems using a novel self-organizing RBF-based approach. Automatica, 48(8), 1729–1734.
Qiao, J., Li, F., Han, H., & Li, W. (2017). Growing echo-state network with multiple subreservoirs. IEEE Transactions on Neural Networks and Learning Systems, 28(2), 391–404.
Qiao, J., Wang, G., & Li, W. (2018). An adaptive deep Q-learning strategy for handwritten digit recognition. Neural Networks, 107, 61–71.
Qiao, J., Wang, G., Li, W., & Li, X. (2018a). A deep belief network with PLSR for nonlinear system modeling. Neural Networks, 104, 68–79.
Qiao, J., Wang, G., Li, X., & Li, W. (2018b). A self-organizing deep belief network for nonlinear system modeling. Applied Soft Computing, 65, 170–183.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Sharma, P., Abrol, V., & Sao, A. (2017). Deep-sparse-representation-based features for speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, 25(11), 2162–2175.
Sun, X., Li, T., Li, Q., Huang, Y., & Li, Y. (2017). Deep belief echo-state network and its application to time series prediction. Knowledge-Based Systems, 130, 17–29.
Wang, G., Qiao, J., Bi, J., Li, W., & Zhou, M. (2019). TL-GDBN: Growing deep belief network with transfer learning. IEEE Transactions on Automation Science and Engineering, 16(2), 874–885.
Zhang, L., Lin, J., & Karim, R. (2018). Adaptive kernel density-based anomaly detection for nonlinear systems. Knowledge-Based Systems, 139, 50–63.
Zhang, R., & Tao, J. (2018). A nonlinear fuzzy neural network modeling approach using an improved genetic algorithm. IEEE Transactions on Industrial Electronics, 65(7), 5882–5892.
Zhao, Y., Gao, H., & Mou, S. (2008). Asymptotic stability analysis of neural networks with successive time delay components. Neurocomputing, 71(13–15), 2848–2856.