
Proceedings of 2019 4th International Conference on Information Technology


TABLE V. SATISFACTION/NONSATISFACTION LEVEL FOR "MOISTURE" COSMETIC EFFECT

Age     Normal        Oily          Mixed         Dry           Sensitive     Atopic
teens   0.795/0.625   0.698/0.595   0.769/0.609   0.675/0.626   0.642/0.594   0.905/0.569
20's    0.689/0.641   0.657/0.590   0.653/0.618   0.645/0.612   0.544/0.603   0.747/0.631
30's    0.371/0.643   0.659/0.624   0.671/0.621   0.618/0.579   0.889/0.596   0.727/0.673
40's    Non/0.596     0.825/0.642   0.699/0.665   0.900/0.624   0.800/0.538   0.682/0.575

TABLE VI. SATISFACTION/NONSATISFACTION LEVEL FOR "WHITENING" COSMETIC EFFECT

Age     Normal        Oily          Mixed         Dry           Sensitive     Atopic
teens   0.486/0.141   0.389/0.145   0.520/0.144   0.466/0.166   0.430/0.122   0.351/0.106
20's    0.401/0.149   0.337/0.113   0.367/0.139   0.350/0.135   0.421/0.141   0.326/0.089
30's    0.340/0.140   0.370/0.141   0.358/0.124   0.346/0.130   0.312/0.122   0.240/0.062
40's    0.443/0.145   0.233/0.144   0.384/0.134   0.375/0.132   0.480/0.146   0.05/0.085

TABLE VII. SATISFACTION/NONSATISFACTION LEVEL FOR "PORES" COSMETIC EFFECT

Age     Normal        Oily          Mixed         Dry           Sensitive     Atopic
teens   0.398/0.131   0.067/0.158   0.221/0.162   0.389/0.124   0.287/0.154   0.133/0.074
20's    0.183/0.166   0.268/0.187   0.199/0.188   0.083/0.150   0.213/0.163   0.391/0.111
30's    0.130/0.157   0.379/0.185   0.168/0.173   0.095/0.149   0.150/0.160   0.249/0.099
40's    0.133/0.167   0.677/0.211   0.267/0.178   0.089/0.142   0.223/0.139   0.343/0.063

TABLE VIII. SATISFACTION/NONSATISFACTION LEVEL FOR "ACNE" COSMETIC EFFECT

Age     Normal        Oily          Mixed         Dry           Sensitive     Atopic
teens   0.486/0.153   0.538/0.181   0.535/0.183   0.653/0.133   0.532/0.197   0.344/0.124
20's    0.548/0.114   0.577/0.164   0.425/0.145   0.469/0.093   0.568/0.140   0.378/0.069
30's    0.374/0.078   0.555/0.125   0.415/0.099   0.365/0.070   0.366/0.107   0.147/0.052
40's    0.273/0.050   0.661/0.065   0.364/0.068   0.278/0.038   0.100/0.054   0.100/0.020

By extracting details other than cosmetic effect from the review text, such as skin abnormalities (e.g., itching and swelling), it is expected that we will be able to extract the names of ingredients that certain sets of user attributes should avoid. Future research will develop a more reliable recommendation system incorporating factors such as skin tone, the user's activity during the day, habits, or locality to make this research stronger.

ACKNOWLEDGMENT

We express many thanks to the anonymous referees for their valuable advice and their helpful editorial comments. This work was partially supported by JSPS KAKENHI Grant Numbers 19K11834 and 17K00324, and by the Cooperative Education/Research Project between Toyohashi University of Technology and the National Institute of Technology.

Long Short-Term Memory for Bed Position Classification

Sakada Sao
School of Information, Computer and Communication Technology
Sirindhorn International Institute of Technology, Thammasat University
Bangkok, Thailand
[email protected]

Virach Sornlertlamvanich
School of Information, Computer and Communication Technology
Sirindhorn International Institute of Technology, Thammasat University
Bangkok, Thailand
Department of Data Science, Faculty of Data Science, Musashino University, Japan
[email protected]

Abstract—This paper describes an approach for bed position classification using 2 stacked layers of the Long Short-Term Memory (LSTM) approach. The data are collected from a sensor panel which consists of 2 types of sensors, i.e. piezoelectric and pressure sensors. The raw data have been classified into 5 classes and go through min-max scaling normalization on a fixed range between 0 and 1. The data are assembled to fit a one-second interval of the 30 Hz sensor sampling rate. The model has been experimented with by changing the number of hidden nodes among 128, 80 and 50 nodes. The result is 91.70% accuracy, which is good enough compared with the previous works.

Keywords—bed position classification, LSTM, piezoelectric sensor, pressure sensor, elderly care.

I. INTRODUCTION

A human has to go through one natural phenomenon, which is aging. Because of that, daily activities and capabilities decrease gradually over time when getting older. Elderly people usually have sleep disorders, which eventually increase bed fall accidents while they are sleeping [1, 2]. Due to the above-mentioned incidents and their privacy, many approaches have been proposed to solve this problem with unobtrusive devices such as sensor panels or sensor mats [3]. Those devices are better than cameras, which can violate the elderly's privacy [4, 5].

Bennett et al. [6] use Kinotex fiber-optic pressure sensor mats. They use 2-D support vector machines (SVMs) and linear classification to monitor 3 in-bed positions, i.e. lying, sitting and standing, and get a good result, but the downside is that they have to use many pressure sensors in order to capture those body positions. Ostadabbas et al. [5] use a pressure sensor mat to eliminate the risk of pressure ulceration and reposition the patient according to schedule. Each pressure point on the mat is used to form a pressure image, and a 2D Gaussian mixture model is then applied for in-bed posture classification and limb detection. Even though they get a good result, their approach still relies on many pressure points to properly combine into the image.

The pressure-sensitive sensor mat produced by NITTA Corporation is tested with SVM, Naïve Bayes, Random Forest, and Neural Network (NN) by Mineharu et al. [1] in order to classify 9 sleeping positions. The result from SVM is better than the 3 others. Due to the similar arm postures between sleeping in the log position and the yearner position, the model misclassifies those positions. Moreover, Foubert et al. [7] work with a pressure sensor array to recognize lying and sitting positions. They compare SVM, Neural Network and k-nearest neighbor and get acceptable results for 5 out of 8 selected postures. Townsend et al. [2] use the output from the pressure sensor array to calculate the center-of-gravity signal to extract rollover positions. The rollover positions are recorded with 5 different placements of a pressure sensor array under the bed mattress. The data are applied to a decision tree technique for rollover detection; however, there is a limitation in the recorded data between rollover positions and many types of non-rollover positions for classification, and the experiment is done by a healthy volunteer in a non-sleep situation.

An effective approach from Viriyavit et al. [8] uses a Neural Network and a Bayesian Network for bed posture classification with a sensor panel which consists of only 4 sensors and gets a good result as well. After reviewing the previous work, we notice that bed positions often occur in sequence, i.e. lying to sitting to out of bed. Therefore, we propose the use of the Long Short-Term Memory (LSTM) approach, which has the potential to include this sequential information to make a better prediction for the current position [9].

In this research, we use the data from the previous work and apply a stacked LSTM to classify the bed position. In Section II, we briefly describe the equipment and data preparation. In Section III, we describe the proposed approach. In Section IV, the results and a discussion related to the previous works are presented.

II. EQUIPMENT AND DATA PREPARATION

A. Equipment
The sensor panel is made of a plastic plate which carries two kinds of sensors, i.e. pressure and piezoelectric. Each type of sensor is placed symmetrically on the left and right side of the panel, as shown in Fig. 1. The length of the panel is 60 cm and the width is 18 cm.

In operation, the panel is placed under the mattress in the thorax area of the patient. The sensors collect the signal at a sampling rate of 30 Hz, with values ranging from -127 to 128 for the piezoelectric sensors and 0 to 255 for the pressure sensors.

Fig. 1. Sensor Panel

B. Data Collection
The signal data generated from those sensors are sent via the Bluetooth box to the M2M box, as shown in Fig. 2. After that, the data from the M2M box are sent to the computer, which saves the signal data in the comma-separated value (CSV) format.

Fig. 2. The process flow of collecting the data

C. Data Preprocessing
The structure of the collected data is organized into 5 columns. Columns 1 to 4 are the data from each sensor, in the order PR: Piezo-Right, WR: Weight-Right, PL: Piezo-Left, WL: Weight-Left, and the last column is the label of the bed position, as shown in (1).

D = {PR, WR, PL, WL, Label}    (1)

To annotate the data, we installed a camera to record the movement of the patient on the bed. We observe the video and synchronize it with the recorded signal to decide the targeted position. However, recording the video is against the patient's privacy. Therefore, we obtained consent from the patient with a formal agreement to maintain personal privacy.

The signal data and video footage are collected for 120 hours from a patient whose age is more than 60. The predefined 5 classes of bed position are used for annotation. The annotated labels are represented by the numbers described in Table I.

TABLE I. FIVE CLASSES OF BED POSITION

Bed Position    Tagging Label
Out of Bed      1
Sitting         2
Sleep Center    3
Sleep Left      4
Sleep Right     5

The total dataset used for this experiment is more than 390,000 samples, which consist of out of bed, sitting, sleep at the center, sleep left side, and sleep right side. The numbers of samples for each position are around 44,000, 32,000, 90,000, 4,800, and 220,000 respectively. The total data are divided into 3 parts for training, validation, and testing, with 60%, 20% and 20% proportions of the total data.

D. Data Accumulation
The sensor panel records the signal at a 30 Hz sampling rate. We accumulate the data before feeding them into our model by transforming the 30 data points per second into one 1-second sample, which is equal to 30 x 4 sensors = 120 data points, as described in Fig. 3.

Fig. 3. Data accumulated structure

E. Data Normalization
Before the data can pass through the model, we need to mitigate some factors that might affect the signal, such as the weight of the patient, the weight of the mattress, and the different types of sensors. Hence, we apply the off-set numbers with min-max scaling normalization on a fixed range from 0 to 1 [10].

Xnorm = (Xt - min) / (max - min)    (2)

Xt is the signal point at time t, Xnorm is the normalized value, min is the minimum off-set value and max is the maximum off-set value.

III. METHODOLOGY

Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) with the capability to cope with long-term dependency [11, 12], proposed as a solution to the vanishing and exploding gradient problems during the long backpropagation learning process of RNN [13]. LSTM is commonly used for the classification of time series [14]. The mechanism of LSTM is described as a unit that allows the data to pass through with little modification [15]. Each unit has 3 gates: (1) the forget gate is used to decide what value needs to be remembered or forgotten inside the unit, (2) the input gate is used to decide how much the value needs to be updated inside the unit, and (3) the output gate is used to decide what value the unit is going to output. Fig. 4 is the representation of the LSTM unit, where xt is the input data, ht-1 is the hidden value from the previous unit, Ct-1 is the memory cell from the previous unit, ht is the hidden output value and Ct is the output memory cell.

Fig. 4. LSTM Unit [12]

Fig. 6. Confusion Matrix for 128 hidden nodes

Stacked LSTM is a stable approach for solving sequence prediction problems [9] and has been used for fraudulent transaction recognition and for modelling temporal dependence in EEG [16, 17]. Therefore, we propose the approach of 2 stacked layers of LSTM with a Softmax activation function, since we work on a classification problem and body positions on the bed usually happen in sequence. Fig. 5 shows the process flow of the experiment.

Fig. 5. Process Flow for Experiment

Fig. 7. Confusion Matrix for 80 hidden nodes

Fig. 8. Confusion Matrix for 50 hidden nodes

IV. RESULT AND DISCUSSION

A. Result
We test with 3 different experiments. The first experiment starts with 128 hidden nodes and a batch size of 6000. We input the data into the LSTM and allow it to train for 300 epochs. We achieve an overall accuracy of 92.73% on the testing set. Yet, it still has some weak points in its predictions, as shown in Fig. 6. The accuracies for the sitting and sleep left side positions are moderately low compared with our expectation. The sitting position is only 72% correctly predicted, while 20% is wrongly predicted as the sleep right position. For the sleep left position, 84% is predicted correctly and 11% is incorrectly predicted.

Fig. 7 shows the result of reducing the number of hidden nodes to 80 while the other factors remain the same. We can see that the overall accuracy is reduced to 92.43%. But there is a significant improvement for the sleep left position, from 84% to 95%. Unfortunately, the sitting position decreases from 72% to 65%.

After the number of hidden nodes is reduced to 50, the sitting position, which is the lowest-prediction position, shows a significant improvement. It increases back from 65% to 83% and the wrongly predicted proportion decreases from 20% to 13%, while the other positions are maintained with accuracies of more than 90% each and an overall accuracy of 91.70%, as described in Fig. 8.
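As a concrete illustration of the 2 stacked LSTM layers with a Softmax output described above, the following Keras sketch builds such a classifier. It is a plausible reconstruction, not the authors' implementation: the one-second window is assumed to be shaped as 30 time steps x 4 sensors, the hidden size follows the best run reported here (50, with 128 and 80 as alternatives), and the optimizer and loss are common defaults rather than values stated in the paper.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5      # out of bed, sitting, sleep center, sleep left, sleep right
HIDDEN_NODES = 50    # the paper also reports runs with 128 and 80 hidden nodes
TIME_STEPS, NUM_SENSORS = 30, 4

model = models.Sequential([
    # First LSTM layer returns the full sequence so it can feed the second layer.
    layers.LSTM(HIDDEN_NODES, return_sequences=True,
                input_shape=(TIME_STEPS, NUM_SENSORS)),
    # Second stacked LSTM layer summarizes the sequence into a single vector.
    layers.LSTM(HIDDEN_NODES),
    # Softmax layer over the 5 bed-position classes.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",   # assumed; the optimizer is not stated in the paper
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Assuming X has shape (n_windows, 30, 4) and y holds labels 0..4:
# model.fit(X, y, epochs=300, batch_size=6000, validation_split=0.2)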

B. Comparative Finding with Previous Work

Based on the results from the previous research, the combination of a Neural Network and a Bayesian Network shows 91.5% total accuracy [8]. Our proposed method achieves a slight increase to 91.7%. Although the overall accuracy is nearly the same, we can see a refinement for the sleep right position: NN and Bayesian Network reach only 75% [8], while our method can predict up to 94%.

There are some improvement points over the previous work. We can use the built-in mechanism of LSTM to adjust its network and provide a moderate output rather than using the combination of 2 networks. Even though we cannot outperform the previous approach, we can see an improvement for the sleep right side position, which NN and Bayesian Network find hard to predict on the same dataset.

As described in the previous work, the confusion among the predictions for sleep right, sleep center and sitting arises because the patient usually gets out of the bed and returns on the right side, which has a side effect on the prediction of those 3 positions.

V. CONCLUSION

This paper has shown some improvement over the previous work, with an accuracy of 91.7%. As mentioned above, our proposed method cannot outperform the previous work, but it shows some improvements which are worth studying. For example, we can reduce the workload of combining a Neural Network with a Bayesian Network and use only the LSTM. Moreover, this approach helps increase the prediction of the sleep right position without losing much accuracy on the other positions.

ACKNOWLEDGMENT

This research is financially supported by the Thammasat University Research Fund under the NRCT, Contract No. 25/2561, for the project "Digital platform for sustainable digital economy development", based on the RUN Digital Cluster collaboration scheme. We are very thankful to Mr. Shuichi Yoshitake, chairman of AIVS, for his strong support in the equipment utilization under the Japan International Cooperation Agency (JICA) grant for SME development support, and to the director together with the staff of Banphaeo Hospital for overall support in data collection.

REFERENCES

[1] A. Mineharu, N. Kuwahara, and K. Morimoto, "A study of automatic classification of sleeping position by a pressure-sensitive sensor," in 2015 International Conference on Informatics, Electronics & Vision (ICIEV), 2015, pp. 1-5.
[2] D. I. Townsend, R. Goubran, M. Frize, and F. Knoefel, "Preliminary results on the effect of sensor position on unobtrusive rollover detection for sleep monitoring in smart homes," in 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, pp. 6135-6138.
[3] K. Chaccour, R. Darazi, A. H. El Hassani, and E. Andrès, "From fall detection to fall prevention: A generic classification of fall-related systems," IEEE Sensors Journal, vol. 17, pp. 812-822, 2017.
[4] R. Steele, C. Secombe, and W. Brookes, "Using wireless sensor networks for aged care: The patient's perspective," in 2006 Pervasive Health Conference and Workshops, 2006, pp. 1-10.
[5] S. Ostadabbas, M. B. Pouyan, M. Nourani, and N. Kehtarnavaz, "In-bed posture classification and limb identification," in 2014 IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings, 2014, pp. 133-136.
[6] S. Bennett, Z. F. Ren, R. Goubran, K. Rockwood, and F. Knoefel, "In-bed mobility monitoring using pressure sensors," IEEE Transactions on Instrumentation and Measurement, vol. 64, pp. 2110-2120, Aug. 2015.
[7] N. Foubert, A. M. McKee, R. A. Goubran, and F. Knoefel, "Lying and sitting posture recognition and transition detection using a pressure sensor array," in 2012 IEEE International Symposium on Medical Measurements and Applications (MeMeA) Proceedings, 2012.
[8] W. Viriyavit, V. Sornlertlamvanich, W. Kongprawechnon, P. Pongpaibool, and T. Isshiki, "Neural network based bed posture classification enhanced by Bayesian approach," in 2017 8th International Conference on Information and Communication Technology for Embedded Systems (IC-ICTES), 2017, pp. 1-5.
[9] J. Brownlee, "Stacked long short-term memory networks," Aug. 18, 2017. Available: https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
[10] S. Raschka, "About feature scaling and normalization and the effect of standardization for machine learning algorithms," Jul. 11, 2014. Available: http://sebastianraschka.com/Articles/2014_about_feature_scaling.html#about-min-max-scaling
[11] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735-1780, 1997.
[12] C. Olah, "Understanding LSTM networks," Aug. 27, 2015. Available: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[13] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
[14] J. Patterson and A. Gibson, "Major architectures of deep networks," in Deep Learning: A Practitioner's Approach, M. Loukides and T. McGovern, Eds. United States of America: O'Reilly Media, Inc., 2017, pp. 117-164.
[15] B. Ramsundar and R. B. Zadeh, "Long short-term memory (LSTM)," in TensorFlow for Deep Learning, R. Roumeliotis, Ed. United States of America: O'Reilly Media, Inc., 2018, pp. 152-153.
[16] Y. Heryadi and H. L. H. S. Warnars, "Learning temporal representation of transaction amount for fraudulent transaction recognition using CNN, stacked LSTM, and CNN-LSTM," in 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), 2017, pp. 84-89.
[17] R. G. Hefron, B. J. Borghetti, J. C. Christensen, and C. M. S. Kabban, "Deep long short-term memory structures model temporal dependencies improving cognitive workload estimation," Pattern Recognition Letters, vol. 94, pp. 96-104, 2017.

2019 4th International Conference on Information Technology (InCIT2019) A Secure Fragile Video Watermarking Algorithm for Content Authentication Based on Arnold Cat Map Rinaldi Munir Harlili School of Electrical Engineering and Informatics School of Electrical Engineering and Informatics Institut Teknologi Bandung Institut Teknologi Bandung Bandung, Indonesia Bandung, Indonesia [email protected] [email protected] Abstract—This paper presents a fragile watermarking specific transformation (DCT, DWT, DFT, etc) [4]. The algorithm in spatial domain to authenticate integrity of the transform coefficients are modified by embedding the video digital content. The watermark is a binary image and it watermark bits [2, 3]. Watermarking in transform domain are has been replicated so that has the same size with frame size of more robust than spatial domain through non-malicious the video. To increase security, before embedding, the attacks like cropping, compression, scaling, rotation, etc. watermark is encrypted by XOR-ing it with a random image. The random image is generated by using a chaos map, i.e Robust video watermarking in transform domain is Arnold Cat Map. The encrypted watermark is embedded by solution to the problems of copyright protection, proving modifying pixel values of video frames. Some attacks has been ownership, illegal copying, and transaction tracking of video. done to the watermarked video. Experiment results show that The watermark in the video is difficult to remove through the algorithm can detect and localize the modified region of malicious and non-malicious attacks. On the contrary, fragile video frames very well. video watermarking is suitable to solve the problem of tamper detection of video content. The watermark in the video is Keywords—fragile watermarking, video, Arnold Cat Map fragile when the video is manipulated. Robustness is not important for fragile watermarking. I. INTRODUCTION Fragile video watermarking could be performed either in Digital video is one of a kind of digital data which block-wise scheme or in pixel-wise scheme. In block-wise contains more information than an image. An single image is scheme, the host image is divided into small blocks, then the only a frame, while a digital video consist of frames of image watermark is embedded into each block. This scheme makes and audio (if any). Data digital such as a video is easily tamper detection could be performed to tampered bocks. In copied, transferred, edited, altered, or manipulated. Once a pixel-wise scheme, the watermark is embedded into each digital video is manipulated, the integrity of video changes. In pixel, therefore it can identify until pixel level [1]. some cases, we need to know authenticity of the video. For example, a court need to decide if a video is genuine or has Based on source of watermark, fragile video been manipulated. Fragile watermarking provides a technique watermarking can be classified into two classes. The first is to authenticate originality of a video. In the fragile internal watermark scheme, which the watermark information watermarking technique, a data signal that called watermark is derived from gray values of host image, then it is embedded is inserted into a host video become to a watermarked video into the host image themselves. The second is external without affecting its perceptual quality. The watermark can be watermark scheme, which the watermark is input from the extracted again from the video. 
If the watermarked video is user, usually the watermark is a meaningfull binary image like manipulated by using a software, then the extracted a logo or something like that. watermarked is fragile or broken. Compared to the original watermark, the broken watermark is indication that the video In this paper we propose a video fragile watermarking has been altered. algorithm which it should has requirements as follows: Majority of research about fragile watermarking are 1. Domain: Embedding and extraction of watermark is specialized for image. However, we could also extent the performed in spatial domain and in pixel-wise scheme. image fragile watermarking schemes for video sequences. Basically a video is collection of frames where a frame is an 2. Perceptual quality: Embedding of watermark should image, therefore we could embed a watermark to each frame. not degrade quality of the host video. Based on domain to hide the watermark, digital 3. Watermark: The watermark is a binary image which watermarking schemes can be divided into spatial domain has the same size with the frame size of the host video. and transform domain. In spatial domain, watermarking is This requirement is made in order to we can identify performed by modifying pixel values of host video directly tampering until pixel level. [1, 2]. The watermark bits are embedded into pixel values. Digital watermarking in transform domain is performed by 4. Security: In order to only the authorized parties can modifying of the transform coefficients of the host image. do authentication of the received video, then the Before embedding the watermark, the host image (in spatial watermarking algorithm should consider security domain) is transformed to a transform domain by using a issue. 32

5. Location detection: the algorithm has the capability to localize the area being tampered with.

However, unfortunately, not all of the existing fragile watermarking schemes fulfil the security aspect, so unauthorized parties can authenticate the received video without knowing the secret key(s). To overcome this problem, we propose a secure fragile video watermarking based on a chaos map. Chaos systems are used in security for two reasons: (1) the nature of chaos is sensitive to the initial conditions of the system, and (2) random chaotic behavior.

The paper is organized into six sections. The first section is this introduction. The second section reviews some related works about fragile video watermarking and chaos maps. The algorithm to embed and extract the watermark is explained in the third section. The fourth and fifth sections present the experiment results. Finally, the last section gives the conclusion and future works.

II. RELATED WORKS

A. Video Fragile Watermarking
Fragile watermarking for digital video has become an interesting research topic. Recently, digital video has become very easy to modify using commercial software; therefore, one needs to prove the authenticity of video content. A fragile watermarking algorithm consists of two processes: embedding and extracting. The embedding process receives inputs such as a digital video, a digital watermark, and key(s). To extract the watermark from the video, the user gives inputs such as a watermarked video, key(s), and the original watermark. The extracted watermark is compared to the original watermark and a decision is made to decide whether the video has been tampered with or is authentic.

Elgamal et al. [1] proposed a fragile video watermarking based on block mean and modulation factor. The original video is transformed from the RGB model to the YCbCr model, and the Cr-component is partitioned into non-overlapping blocks of pixels, depending on the number of bits of the watermark. The watermark is a binary image. The watermark bits are embedded for each block separately. No key is used either in embedding or in extraction of the watermark. The proposed algorithm can detect tampering attacks such as filtering and geometric/non-geometric transformations.

Rupali et al. [5] proposed a public-key fragile video watermarking technique to embed and extract a watermark in the DCT domain. There are two watermarks to embed. The first watermark is the digital signature of the frame in the frequency domain, and the second watermark is the block numbers and frame numbers. The first watermark is used to detect tampering and the second watermark is used to localize the tampered area. The watermark embedding uses the private key, while the watermark extraction uses the public key. Anyone who knows the public key can extract the watermark. Experiments in which a single pixel value in a block is changed show that the tampered block can be detected. However, the proposed technique is not robust against compression.

Zhi-yu et al. [3] proposed a fragile watermarking scheme for color video authentication. The video is first transformed from the RGB model to the YST model. The T-component is then partitioned into 4×4 blocks. The watermark, called the authentication code, is created from the quantized DCT coefficients and is embedded into the last non-zero DCT coefficient. Embedding and extraction of the watermark use a public key, so anyone who knows the public key can do the embedding and extraction. The experiment results show that the scheme can detect tampering on the watermarked video.

Not all of the watermarking schemes fulfil the security aspect. One scheme does not use a key at all, and the others use a public key to extract the watermark, so anyone who knows the public key can extract it. We need a watermarking scheme in which embedding and extraction of the watermark are performed only by the authorized party, namely the owner of the video.

B. Arnold Cat Map
The Arnold Cat Map (ACM) is a 2-D chaos map that transforms an element from one position to another position in the same area [6]. In other words, ACM transforms a coordinate (x, y) of an N × N pixel image to a new coordinate (x', y'). The iteration equation is

(x_{i+1}, y_{i+1})^T = [[1, b], [c, bc+1]] (x_i, y_i)^T mod N    (1)

ACM is reversible, i.e. the transformed image can be returned to its original image with the equation

(x_i, y_i)^T = [[1, b], [c, bc+1]]^{-1} (x_{i+1}, y_{i+1})^T mod N    (2)

Parameters b and c are arbitrary positive integers, and the matrix determinant must be 1 so that the transformation is area-preserving, that is, the results remain in the same image area. ACM is repeated m times and each iteration produces an image that looks random. The values of b, c, and m can be considered as secret keys. After being iterated p times, the image is transformed back to the original image, as shown in Figure 1. The value of p varies for each image, depending on b, c, and N. According to [6], the research of Freeman J. Dyson and Harold Falk found that the period T satisfies T < 3N.

Fig. 1. Results of iteration of ACM [7]

III. PROPOSED ALGORITHM

This section explains the proposed fragile video watermarking. The algorithm is simple but can detect manipulation in video frames down to the pixel level.
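As an illustration of (1) and (2), a short NumPy sketch of the generalized Arnold Cat Map is given below. It is a hedged reconstruction under the paper's description (a square N × N binary image and keys b, c and an iteration count m), not the authors' code; the function names are assumptions.

import numpy as np

def arnold_cat_map(img, b, c, iterations):
    # Scramble a square N x N image with the generalized Arnold Cat Map, Eq. (1).
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "ACM is defined on square images"
    out = img.copy()
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                x_new = (x + b * y) % n
                y_new = (c * x + (b * c + 1) * y) % n
                scrambled[x_new, y_new] = out[x, y]
        out = scrambled
    return out

def inverse_arnold_cat_map(img, b, c, iterations):
    # Undo the scrambling with the inverse matrix of Eq. (2).
    n = img.shape[0]
    out = img.copy()
    for _ in range(iterations):
        restored = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                # Inverse of [[1, b], [c, bc+1]] (determinant 1) is [[bc+1, -b], [-c, 1]].
                x_old = ((b * c + 1) * x - b * y) % n
                y_old = (-c * x + y) % n
                restored[x_old, y_old] = out[x, y]
        out = restored
    return out

# Scrambling then unscrambling with the same keys recovers the original image:
# w_scrambled = arnold_cat_map(w, 5, 7, 5)
# assert np.array_equal(inverse_arnold_cat_map(w_scrambled, 5, 7, 5), w)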

Embedding and extraction of the watermark are performed in the spatial domain. To detect and localize manipulation in the watermarked video, we need the original watermark.

Fig. 2. Left: the original watermark; Right: the new watermark

The watermark is a binary image. However, the watermark size may be smaller than the video frame size; therefore the watermark needs to be replicated by duplicating it a number of times in order to produce a new watermark that has the same size as the host video frame. Fig. 2 shows an example of replication. The original watermark, the 'ITB logo', has a size of 185 × 185 pixels, whereas the video frames have a size of 480 × 720 pixels. This original watermark must be duplicated a number of times to produce a new watermark of size 480 × 720 pixels.

Next, to increase security, before embedding, the new watermark is encrypted by XOR-ing it with a random image. A random image is generated by scrambling an arbitrary binary image using the Arnold Cat Map (ACM). For example, the new watermark above is scrambled using ACM (see Fig. 3); the map is performed on each single watermark. The parameters of ACM are b, c, and the number of iterations m. Different parameters result in different random images. The new watermark is encrypted with the random image using the XOR operation to produce an encrypted watermark. Next, we embed the encrypted watermark into the host video. Because the new watermark has the same size as the frame, we can detect changes to the video frames down to the pixel level.

Fig. 3. Left: A new watermark; Right: A random image of the watermark

Based on the explanation above, we can design a simple but secure fragile video watermarking algorithm. For simplicity, the random image is generated from the watermark itself. The algorithm consists of two processes, the embedding algorithm and the extraction algorithm, each of which is described below.

A. Embedding Algorithm
Input: a host video file (v), a watermark file (w), ACM parameters (b, c, and m).
Output: a watermarked video (v').
Step 1: Read the frames of video v, the watermark w, and the ACM parameters (b, c, and m). If the video has audio, separate the audio.
Step 2: Copy the single watermark to produce a new watermark w' which has the same size as the host video frames.
Step 3: Scramble w' by iterating ACM (eq. 1) m times to produce a random image r.
Step 4: Encrypt w' as follows: w'' = w' ⊕ r.
Step 5: Embed the encrypted watermark w'' into each frame of the video by manipulating the least significant bit (LSB) of the pixels. If the frame has R, G, and B components, perform the embedding on each component.
Step 6: If the original video has audio, combine it with the watermarked frames to produce the watermarked video.

B. Extraction Algorithm
Input: a watermarked video file (v'), an original watermark file (w), ACM parameters (b, c, and m).
Output: an extracted watermark, location of tampered frames (if any).
Step 1: Read the frames of video v', the watermark w, and the ACM parameters (b, c, and m).
Step 2: Copy the single watermark a number of times to produce a new watermark w' which has the same size as the host video frames.
Step 3: Scramble w' by iterating ACM (eq. 1) m times to produce a random image r.
Step 4: For each frame, extract all of the least significant bits (LSB) of the pixels. This step yields an extracted watermark w''.
Step 5: Decrypt the watermark w'' as follows: w''' = w'' ⊕ r.
Step 6: Compare w''' with w'. If w''' = w', we conclude that the integrity of the video is authenticated. If not, go to Steps 7 and 8.
Step 7: To localize the tampered region, subtract w' from w'''. If a pixel is not changed, the subtraction yields 0; otherwise it yields 1.
Step 8: Identify the pixels in the watermarked video at the positions where the subtraction above yields 1. Those are the pixels that have been manipulated.

IV. EXPERIMENT RESULTS

After the watermarking algorithm was designed, we implemented it as a computer program. We tested the algorithm on a sample video. The sample video was a video clip of a cartoon film which has 450 frames; each frame has a size of 480 × 720 (Fig. 4a). This video contains audio. Fig. 4b and 4c show two frames of the video (frame 1 and frame 283). The watermark to be embedded into the host video is the 'ITB logo' as shown in Fig. 3, after being duplicated a number of times to produce a new watermark of size 480 × 720.
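To illustrate the embedding and extraction steps above, a minimal NumPy sketch for a single frame is given below. It follows the algorithm as described (replicate the binary watermark to the frame size, build the random image with ACM, XOR-encrypt, then overwrite the least significant bit of every pixel), but it is not the authors' code: the helper names are assumptions, arnold_cat_map refers to the sketch in Section II-B, and the single square watermark is scrambled before replication, as suggested by the description of Fig. 3.

import numpy as np

def tile_watermark(w, frame_hw):
    # Replicate the binary watermark (values 0/1, dtype uint8) to the frame size.
    reps = (-(-frame_hw[0] // w.shape[0]), -(-frame_hw[1] // w.shape[1]))  # ceiling division
    return np.tile(w, reps)[:frame_hw[0], :frame_hw[1]]

def embed_frame(frame, watermark, key):
    # Embedding Steps 2-5 for one RGB frame (uint8, shape H x W x 3).
    b, c, m = key
    hw = frame.shape[:2]
    w_new = tile_watermark(watermark, hw)                          # replicated watermark w'
    r = tile_watermark(arnold_cat_map(watermark, b, c, m), hw)     # random image r from ACM
    w_enc = np.bitwise_xor(w_new, r).astype(np.uint8)              # encrypted watermark w''
    out = frame.copy()
    for ch in range(out.shape[2]):                                 # embed into R, G and B
        out[..., ch] = (out[..., ch] & 0xFE) | w_enc               # overwrite the LSB
    return out

def extract_frame(frame, watermark, key):
    # Extraction Steps 2-7 for one frame: recover, decrypt and localize tampering.
    b, c, m = key
    hw = frame.shape[:2]
    w_new = tile_watermark(watermark, hw)
    r = tile_watermark(arnold_cat_map(watermark, b, c, m), hw)
    w_ext = frame[..., 0] & 0x01                                   # LSB plane, w''
    w_dec = np.bitwise_xor(w_ext, r)                               # decrypted watermark w'''
    tampered = w_dec != w_new                                      # True where pixels were changed
    return w_dec, tampered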

In the experiments below we used the following parameters of ACM: b = 5, c = 7, m = 5. We divide this section into two cases: (i) the no-attack case, and (ii) the tamper detection test.

Fig. 4. A host video to be watermarked

A. No-Attack Case
In this case, we did not manipulate anything in the watermarked video. We extracted the watermarks from the video. We could extract the watermarks in all frames or only in certain frames. Fig. 5 shows the extracted watermarks from frame number 1 and frame number 283 only. There is no damage in the extracted watermarks. We conclude that no tampering was performed on the watermarked video.

Fig. 5. The watermarked frames and the extracted watermarks

In this algorithm, the parameters of ACM behave as keys. Embedding and extraction of the watermark can only be done by the authorized party. If the receiver does not have the keys, the extracted watermark is not the same as the original watermark; therefore this algorithm provides the security aspect. For example, suppose the receiver used b = 10, c = 8, m = 5 to extract the watermark. Fig. 6 shows the extracted watermarks from two frames. Compared to the original watermark, this watermark is not the same.

Fig. 6. The watermarked frames and the extracted (wrong) watermarks

B. Tamper Detection Test
The main goal of fragile watermarking is to determine whether the video has been manipulated or not. If the video has been manipulated, the scheme should also be able to locate where the alteration was made on the video frames. In these experiments, we performed some attacks on the watermarked video.

1. Detection Test Against Text Addition
In this test, we modified the watermarked video by adding the text '(C) Cartoon Network' at the top left of the frames (Fig. 7a). Next, we extracted the watermark from a frame and obtained an extracted watermark (after decryption) containing the added text (Fig. 7b). Fig. 7c shows the detection of the pixels that have been manipulated by adding the text '(C) Cartoon Network'. Fig. 7d shows the tampered pixels in the corresponding frame.

Fig. 7. (a) Watermarked frame after adding a text, (b) extracted watermark; (c) and (d) detected tampering region

2. Detection Test Against Copy-Paste Attack

2019 4th International Conference on Information Technology (InCIT2019) On this test, figure of character ‘Doraemon’ was copied watermarked video and then extracted the watermark (Fig 10, and pasted into the watermarked video (Fig. 8a). We extracted left). The extracted watermarks contain noises (Fig 10, right). the watermarks from the video and got an extracted watermark as shown in Fig. 8b. The watermark contains a silhouette of strange object. Detection of copy-paste object is shown in Fig. 8c and 8d. We concluded that the video has been tampered. Fig. 8. (a) Watermarked frame after copy-paste attack, (b) extracted Fig. 10. The watermarked frames after changing the contrast and the watermark; (c) and (d) detected tampering region extracted watermarks 3. Detection Test Againts Adding Noise 5. Detection Test Againts Cropping Some videos maybe contain noise. There are some kind of Video cropping is one of geometrical attack. We noise such as gaussian noise, salt and pepper noise, poisson manipulated the watermarked video by cropping the certain noise, etc. In this test, we added salt and pepper noise with region, horizontally or vertically. In this test we cropped a low density 0.1 into the watermarked video (Fig. 9a). We found part of the video. Before we extracted the watermark, we that the extracted watermark also contained noise that returned the frames size into original size by adding white or indicated the video has been altered (Fig. 9b). The tampering black pixels. We found the extracted watermarks contained region is entire of frame (Fig 9c and 9d). the black region that indicated the cropped region in the frames (Fig. 11). Fig. 11. The watermarked frames after cropping and the extracted watermarks Fig. 9. (a) The watermarked frames after adding noise ‘salt and pepper’; V. DISCUSSION (b) the extracted watermark; (c) and (d) detected tampering region We have run some experiments to test performance of the 4. Detection Test Againts Contrast Change fragile video watermarking algorithm. If the watermarked One of the common manipulation of video is filtering. video is not manipulated, altered, or tampered, then we will get the extracted watermark is same exactly to the original The video is manipulated by changing its contrast or watermark. It is indication that the integrity of the video is brightness. On this test we changed contrast of the authenticated and we conclude that the video is genuine. Next we manipulated the watermarked video by adding a text, inserting a new object into the video, changing contrast, and cropping some pixels. Adding a text or inserting an object into the video result the extracted watermarks that contain silhouette of the object or text. We can detect them visually. 36

2019 4th International Conference on Information Technology (InCIT2019) Compared to the original watermark, the extracted watermark Thank to Institut Teknologi Bandung (ITB), Indonesia. is not same. We conclude the video has been altered. The This research is funded by Program Penelitian dan algorithm could detect the tampered region very well. Pengabdian Masyarakat ITB (P3MI) 2019. Changing contrast and brightness of video mean changing REFERENCES all pixel values. As the result, the extracted watermark also change entirely. It is broken totally. We conclude the vide has [1] A.F. Elgamal, N.A. Mosa, W.K., ElSaid, “A Fragile Video been manipulated. Watermarking Algorithm for Content Authentication based on Block Mean and Modulation Factor”, International Journal of Computer Cropping a block area in the watermarked video results Applications (0975 – 8887) Vol. 80 – No.4, October 2013. the extracted watermark is also cropped in the correspondence block. The watermark has a black region in the cropped area [2] T. Jayamalar, V. Radha, “Survey on Digital Watermarking Techniques of the correspondence frame. and Attacks watermark”, International Journal of Engineering Science and Technology, Vol. 2, No. 12, pp 6963-6937, 2010. VI. CONCLUSION AND FUTURE WORKS A secure fragile video watermarking on spatial domain [3] H. Zhi-yu, T. Xiang-Hong, “Integrity Authentication Scheme of Color based on Arnold Cat Map has been presented. Some Video Based on the Fragile Watermarking”, Proc. of 2011 experiments has been run to test the performance of the International Conference on Electronics, Communications and Control algorithm. The experiment results showed that the algorithm (ICECC). could detect tampering on the watermarked video. The algorithm could also locate the tampering region. [4] Maryam A., Mansoor R., Hamidreza A., “A novel robust scaling image For future works, the algorithm can be developed for the watermarking scheme based on Gaussian Mixture Model” in Expert compressed video. Embedding of watermark is performed in Systems with Applications 42, 2015, pp 1960–1971. encoding and decoding stage of the video. Of course watermark embedding and extraction must be operated in [5] Rupali D. P., Shilpa M., “Fragile Video Watermarking for Tampering transform domain. Detection and Localization”, Proc. of 2015 International Conference on Advances in Computing, Communications and Informatics ACKNOWLEDGMENT (ICACCI), 2015. [6] Katherine S., “A Chaotic Image Encryption”, Mathematics Senior Seminar, 4901, University of Minnesota, Morris, 2009. [7] Rinaldi M., “Algoritma Enkripsi Citra dengan Kombinasi Dua Chaos Map dan Penerapan Teknik Selektif Terhadap Bit-bit MSB”, Proc. of Seminar Nasional dan Aplikasi Teknologi Informasi (SNATI), Universitas Islam Indonesia Yogyakarta, 2012. 37

2019 4th International Conference on Information Technology (InCIT2019) Audio-Visual Speech Recognition System Using Recurrent Neural Network Yeh-Huann Goh Kai-Xian Lau Faculty of Engineering and Technology Faculty of Engineering and Technology Tunku Abdul Rahman University College Tunku Abdul Rahman University College Jalan Genting Kelang, 53300 Kuala Lumpur, Malaysia Jalan Genting Kelang, 53300 Kuala Lumpur, Malaysia [email protected] [email protected] Yoon-Ket Lee Faculty of Engineering and Technology Tunku Abdul Rahman University College Jalan Genting Kelang, 53300 Kuala Lumpur, Malaysia [email protected] Abstract—An audio-visual speech recognition system navigation system [2] have gained popularity in people’s (AVSR) integrates audio and visual information to perform daily lifestyle. Traditional automatic voice recognition speech recognition task. The AVSR has various applications (AVR) only performs well in environment with high in practice especially in natural language processing systems SNR. Over the years, many approaches were proposed to such as speech-to-text conversion, automatic translation improve the speech recognition rate and the robustness and sentiment analysis. Decades ago, researchers tend to of Automatic Speech Recognition (ASR) system such as: use Hidden Markov Model (HMM) to construct speech speech pre-processing [3]–[6], robust acoustic features recognition system due to its good achievements in suc- [7], [8] and uncertainty decoding [9]. A more recent cess recognition rate. However, HMM’s training dataset is approach is the introduction of visual features to improve enormous in order to have sufficient linguistic coverage. the overall recognition of an ASR system [10]–[14] Besides, its recognition rate under noisy environments is not satisfying. To overcome this deficiency, a Recurrent Neural Linear Predcitive Coding (LPC), Perceptual Linear Pre- Network (RNN) based AVSR is proposed. The proposed dictive (PLP) and MFCC features are common parameters AVSR model consists of three components: 1) audio fea- employed in speech processing and ASR systems. The tures extraction mechanism, 2) visual features extraction basic working of ASR system consists of pre-processing, mechanism and 3) audio and visual features integration feature extraction, knowledge model and pattern classifi- mechanism. The features integration mechanism combines cation. A performance analysis of the audio features ex- the output features from both audio and visual extraction traction methods in ASR system was done in [15]. In that mechanisms to generate final classification results. In this study, the ASR system was exclusively trained with 100 research, the audio features mechanism is modelled by Hindi word vocabulary. Reported results show a higher Mel-frequency Cepstrum Coefficient (MFCC) and further speech recognition rate of 95.4% has been achieved using processed by RNN system, whereas the visual features MFCC-based ASR system compared to PLP-based ASR mechanism is modelled by Haar-Cascade Detection with system. Subsequently, another HMM-based ASR system OpenCV and again, it is further processed by RNN system. for Punjabi language is developed and an experiment Then, both of these extracted features were integrated was also conducted on different audio features extraction by multimodal RNN-based features-integration mechanism. methods. 
Based on the result, the MFCC-based ASR has The performance in terms of the speech recognition rate and the highest accuracy in noise-free environment up to 86%, the robustness of the proposed AVSR system were evaluated whereas the PLP and Power Normalized Cepstral Coef- using speech under clean environment and Signal Noise ficients (PNCC)-based methods have only 83% accuracy Ratio (SNR) levels ranging from -20 dB to 20 dB with 5 dB [16]. For English pronounciation, Goh et al. show that interval. On average, final speech recognition rate is 89% speech recognition rate for MFCC-based ASR system is across different levels of SNR. higher than the speech recognition rate of PLP-based ASR system for continuous number digits recorded in TIDIGIT Keywords—Recurrent Neural Network, Speech Recog- database. However, in the same study, PLP-based ASR nition System, Audio-visual Speech Recognition System, system perform better than MFCC-based ASR system in Features Integration Mechanism, Audio Features Extraction correct phoneme accuracies for speech data recorded in Mechanism TIMIT database [8]. According to [17], LPC dominates in predicting but not in extracting parameters. In contrary, I. INTRODUCTION MFCC and PLP are derived based on the human auditory system that consist of filter bank, thus they have better AVSR system integrates visual information and audio response compare to LPC parameter. Subsequently, the information to create a reliable speech recognition sys- MFCC requires less computation power and performs tem. The basic idea of the AVSR is to extract human mouth or lip images features to aid traditional audio-based speech recognition system [1]. Recently, human-machine interface for intelligent devices are increasingly com- mon. To illustrate, voice commanding smartphones, voice controlled-domestic robots, autonomous car and voice 38

2019 4th International Conference on Information Technology (InCIT2019) Fig. 1. Example of model-based features modalities (audio and visual) and transforms them to a single feature vector, known as multimodal feature vector. well in clean environment compare to PLP [7], [18]. An experiment with the Restricted Boltzmann Machines In this study, since the database is mainly contributed (RBM) with the concatenated audio and video data has by digit numbers, MFCC features have been chosen for been carried out in [22]. The result obtained was not audio features extraction mechanism. Besides, based on satisfying as the training process overfitting due to the literature studies, we can observe that HMM is frequently correlations between the audio and video data are highly used in ASR as an audio features extraction mechanism. non-linear. The same approach can be applied at mid-level Although the HMM method has achieved high accuracy of network. Kingsbury et al. used Deep Belief Network of speech recognition in noise free environment, however (DBN) to train each modal and combine the features at its training dataset is enormous in order to have sufficient mid-level before feeding into DBN [13]. The downside linguistic coverage. Hence, the RNN approach which of these approaches is the difficulty in deciding which is able to learn audio feature representations has been information gained depends on the dynamic changes in proposed in this study as the speech recognition model. the reliability of multimodal information sources. As for the decision fusion approach, the outputs of classification As for visual features extraction mechanism, two networks are merged to determine the final classification. different approaches are available which are top-down Decision fusion approach improves the system robustness and bottom-up approaches. The top-down approach is by integrating the stream reliabilities together with multi- by generating a static model of designated shape and ple information sources as a criterion of information gain matches it with the image as input and recognises the for a classification model. For instance, Youssef Mroueh lips or mouth images features. The commonly used et al. presented deep multimodal learning based AVSR. technique for lip detection are model based technique This method trains the audio features and visual features which includes Active Shape Models (ASM), Active independently and fuses them in the final hidden layers to Appearance Model (AAM) and image based model which obtain a joint feature. This bilinear bimodal Deep Neural utilises spatial information, intensity, pixel colour, lines, Network was trained and tested with IBM large vocabu- corners, motion, and edge information [19]. The AAM lary audio-visual studio dataset, and obtained about phone was trained using a one-held-out methodology or cross error rate (PER) of 35.83% in clean environment [14]. validation method [20]. Beforehand, the linear-predictor The AVSR based on HMM has also been realised by [23], based tracker functions to track the speaker in each video [24] decades ago. However, all these published methods as the speakers in dataset are recorded in different angle. are unable to correctly recognize speech in unpredictable Besides that, the AAM was used with Discrete Cosine environment or in low SNR environment. 
Contrariwise, Transform (DCT) coefficients and Principle Coefficient RNN is a class of neural networks that performs ex- Analysis (PCA) coefficient extracted from the Region of ceptionally in natural language processing (NLP) system. Interest (ROI), which is the mouth in this case [21]. To The maximum weighted stream posterior (MWSP) model sum up, model-based features are suitable for a more has been used as a stream integration method in [25]. detailed ROI representation. However, it might require The MWSP method used a database, whereby the video precise manual labelling of the training data to construct was recorded in a quiet environment, and tested with a a statistical model of a lip shape model as shown in Figure corrupted MPEG-4 video compression. Additional full- 1. On the other hand, the bottom-up approaches directly band white noise with different SNR levels (-20dB to estimate the visual features from an image [11], [12]. The 20 dB) was added into the test set as well to examine pros of the bottom-up approaches are detailed static lip- the effects of corruption in the audio stream. The DCT shape models or manual labelled data for training are not (type II) was used as visual feature extraction; while the required. Nevertheless, they are sensitive to lighting con- MFCCs was used to represent the features in the audio. ditions, rotation and translation of input image. For this Overall, the AVSR based on MWSP has word recognition study, Haar-cascade detection is selected as visual feature error rate of approximately 15% to 2%, at −20dB to extraction mechanism due to the ease of implementation 20dB, respectively. Another example of AVSR based on and has many pre-trained classifiers for facial features. coupled hidden Markov models (CHMMs) having the feature of blind estimation of dynamic stream weights The multimodal recognition has better performance [26]. In other words, the model estimates the weightage in terms of speech recognition rate compared to uni- or parameter and influences the overall performance of modal recognition as the sources of information com- AVSR. The parameters are the probabilities of state plement each other. There are two common approaches transitions. The MFCCs served as the audio features; for audio-visual integration mechanism, which are 1) while the video features was determined by using the features fusion and 2) decision fusion in achieving the Viola-Jones face and mouth detector [27]. multimodal integration. First and foremost, the feature fusion approach links all the feature vectors from multiple In summary, RNN-based AVSR system enhances the accuracy of speech recognition performance in a noisy setting as it takes extra sensory modal to counterpart corrupted audio or undesired background sound. Hence, this study will explore and research on AVSR system. The proposed RNN-based AVSR system uses MFCC- 39

2019 4th International Conference on Information Technology (InCIT2019) based audio features integrated with visual information from scratch. The inner architecture of the proposed extracted using Haar Cascade Detection in OpenCV. By using RNN to integrate both the audio features mecha- AVSR system is illustrated in Figure 2. The audio and vi- nism and video features machanism, the proposed AVSR system shows a better robustness in noisy environment. sual features extraction mechanisms are constructed using The organisation of the paper is as follow: Section 2 LSTM and artificial neural network (ANN). The features- begins with integration mechanism is constructed using multimodal II. DATA ACQUISITION AND FEATURES EXTRACTION RNN and a softmax layer. An isolated digit numbers audio-visual dataset is used for the AVSR system experiment as presented in this Let na denotes the number of elements in the MFCC study. In this dataset, digit numbers from one to ten were input matrix, whereas nv denotes the number of elements obtained from 100 respondents, divided into 50 males in the visual input matrix. Let oa(i,j) and ov(i,j) denote and 50 females. The respondents constitute of different the audio and visual input matrices for LSTM systems ethnic groups, which include Malay, Chinese, Indian and others. The dataset consists of three parts, which are corresponding to the i-th and j-th input respectively. The video, audio, and visual. The videos are recorded with a sampling rate of 22.050Hz in the QuickTime File Format weighted state layer for audio output oA and visual output (.MOV) by using iPhone 5s. The frame size of each oV are as follows: videos is 720x1280 pixel 24 bits RGB facial view. Only the respondent’s face will be captured in the video. All the nai naj videos were taken under bright and vibrant environment and the respondents were requested to remove their eyes oA = wa(i,j) o(ai,j) (1) wear or face wear during the recording process. The raw video dataset was separated into audio and visual dataset. i=1 j=1 For every single video sample, the visual information is represented by five visual frames extracted from that nvi nvj particular video. In total, 1000 raw isolated digit number video samples were recorded, from the recorded video oV = wv(i,j) ov(i,j) (2) samples, 1000 audio samples with duration of 2 seconds and 5000 visual samples were extracted. i=1 j=1 Speech signals were processed by pre-emphasis and di- Where wv(i,j) ∈ R(1 ≤ i ≤ nvi, 1 ≤ j ≤ nvj) and vided into smaller speech frame with frame size of 25ms wa(i,j) ∈ R(1 ≤ i ≤ nai, 1 ≤ j ≤ naj) are learnable and 10ms stride between the frame. Each speech frame parameters. was further processed by Hamming Window and finally 12 mean normalized MFCC-features were extracted. As shown in Figure 2, the proposed audio and visual As for visual signals, by using the Haar-cascade de- features extraction mechanisms consist of two parts. The tection, the original RGB input image will be fed into first part is a sequence to sequence type LSTM and the the face classifier in grayscale mode. If any faces are detected, it will return the positions of the detected faces second part is an ANN system. Audio and visual LSTM and set the region as ROI for the face. Within the ROI, systems read the two seconds MFCC-features of size 12× eye detection, nose detection and mouth detection can be 200 and five concatenated visual-features of size 60 × 30 applied. Finally, image features in the mouth ROI region from one single video clip respectively. 
Audio and video will be saved in ‘.BMP’ format. Each extracted mouth input features are converted into an intermediate features image frames will be further resized to a dimension of of size 100 at the outputs of audio and visual LSTM 60×30 (width × height) image. Then, each resized image systems. These LSTM outputs were further fed into the will be further processed by edge detection. ANN system and final outputs of size 10 were produced for both audio and visual extraction mechanisms. The Both extracted audio and video features are stored in CSV file format in one dimensional form as CSV file is final extracted ANN outputs from both MFCC and visual a popular format for storing tabular data in plain text. extraction mechanisms are o(Ai) and o(Vi) respectively, the Besides, CSV file is readable by Python programming extracted audio and visual features were concatenated and environment and CSV file is able to deal with large amounts of data. The final extracted MFCC-features and fed into multimodal RNN-based integration mechanism. visual features are fed into the RNN system respectively The output of the RNN layer oF is found using (3) for speech recognition purpose. noA +noV III. RECURRENT NEURAL NETWORK MODEL oF = wF(i)(oA(i) + oV(i)) (3) In this paper, an end-to-end AVSR system which includes: 1) audio features extraction mechanism, 2) i=1 visual features extraction mechanism and 3) audio-visual features-integration mechanism is developed and trained Where wF(i) ∈ R(1 ≤ i ≤ nF ) is a learnable parameter. A logistic regression layer with Softmax activation is Fig. 2. The architecture of the proposed AVSR system 40
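For illustration, the audio front end just described (pre-emphasis, 25 ms Hamming-windowed frames with a 10 ms stride, and 12 mean-normalised MFCCs per frame) can be sketched with the librosa library as follows. The file name, the 0.97 pre-emphasis coefficient and the use of librosa itself are assumptions for this example rather than details taken from the original implementation.

```python
# Hedged sketch: 12 mean-normalised MFCCs from a 2-second clip,
# 25 ms Hamming-windowed frames with a 10 ms stride, as described above.
import numpy as np
import librosa

def extract_mfcc(wav_path, sr=22050, clip_seconds=2.0, n_mfcc=12):
    y, sr = librosa.load(wav_path, sr=sr, duration=clip_seconds)
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])    # pre-emphasis (coefficient assumed)
    n_fft = int(0.025 * sr)                       # 25 ms analysis window
    hop = int(0.010 * sr)                         # 10 ms stride
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop,
                                window="hamming")
    mfcc = mfcc - mfcc.mean(axis=1, keepdims=True)  # mean normalisation
    return mfcc.T                                   # roughly (200, 12) for a 2 s clip

features = extract_mfcc("digit_sample.wav")         # hypothetical file name
print(features.shape)
```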

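Similarly, a minimal sketch of the visual front end (grayscale face detection, a mouth ROI taken inside the detected face, resizing to 60 × 30 and edge detection) is given below using OpenCV's pre-trained Haar cascades. The mouth-cascade file name and the detection thresholds are assumptions; not every OpenCV build ships a mouth cascade.

```python
# Hedged sketch of the visual front end: Haar-cascade face detection,
# mouth ROI searched in the lower half of the face, resize to 60x30, edge detection.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# The mouth cascade below is an assumed, separately obtained file.
mouth_cascade = cv2.CascadeClassifier("haarcascade_mcs_mouth.xml")

def extract_mouth_roi(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face_roi = gray[y + h // 2: y + h, x: x + w]   # mouth lies in the lower half
        mouths = mouth_cascade.detectMultiScale(face_roi, 1.1, 11)
        for (mx, my, mw, mh) in mouths:
            mouth = face_roi[my: my + mh, mx: mx + mw]
            mouth = cv2.resize(mouth, (60, 30))        # width x height as in the paper
            return cv2.Canny(mouth, 50, 150)           # edge-detection step
    return None

edges = extract_mouth_roi(cv2.imread("frame_001.bmp"))  # hypothetical input frame
if edges is not None:
    cv2.imwrite("mouth_roi.bmp", edges)
```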
2019 4th International Conference on Information Technology (InCIT2019) Fig. 3. Architecture of individual audio network model Fig. 4. Architecture of individual visual network model Fig. 5. Validation Loss for Different Network Models followed after the RNN layer. The probability of the input were adopted in the training process. Each models sample belonging to each label class is found using 4: is trained with 10 epochs using 940 training data. The models will be evaluated by the remaining 60 o = Sof tmax(W (S)oM + b(s)) (4) clean audio signals and noisy audio signals that were superimposed with Gaussian white noises with different Where W (S) ∈ RC×dM and b(s) ∈ RC are the learnable SNRs ranging from −20dB to 20dB at 5dB intervals. parameters for the Softmax layer with C representing the As for the visual signal, only the clean visual signals number of classes. were used. Noisy visual signals were not used in this research. The hyperparameters used in the RNN By varying the configuration of the AVSR system models were kept constant throughout the research. through replacing other types of RNN, such as basic The fixed hyperparameters that shared across network RNN, LSTM, gated recurrent unit (GRU) and bidirec- models are batch size 60, number of epochs 10 and tional setting in the proposed system design, multimodal learning rate 0.01. Numbers of hidden neurons in audio RNN variants for individual audio and visual network features extraction mechanism, visual features extraction models were developed for comparison purpose. The mechanism and features integration mechanism have architectures for comparison RNN models are shown been as 100, 100 and 10 respectively in Figure 3 and Figure 4. The results obtained from all different types of configuration were recorded and A. Comparative study on Different Multimodal Networks compared with the proposed AVSR model. Besides, pa- rameters such as number of hidden neurons were varied In this section, different RNN which are: 1) basic RNN, and the effect of having different combinations of number 2) bidirectional basic RNN, 3) LSTM, 4) bidirectional of hidden neurons across the RNN models was explored. LSTM, 5) GRU and 6) bidirectional GRU are tested and The Softmax layer was set as constant layer throughout evaluated. All these network models are evaluated using the experiment. clean audio signals. Experimental results are shown in Figure 5 and Figure 6. A good loss gradient can be IV. RESULTS AND DISCUSSIONS observed from the results as the loss converges with respect to the increasing number of epochs. This indicates The proposed AVSR system was tested using cross that all the network models are able to converge after validation approach. The proposed system was trained a few epochs. Besides, losses happened at bidirectional using 940 instances from the dataset, while the other model shows similar trend as the non-bidirectional net- 60 instances were separated from training dataset and work models. The deviations between bidirectional and were used in validation. The 60 instances were randomly non-bidirectional is around 2%. As for the validation selected from both audio and visual dataset. The accuracy test for different network models, results show binary cross-entropy cost functions as the loss function, that the final accuracies of all network models reach RMSprop as optimizer, and an activity regularization 0.9 when the number of epochs is equalvalent to 5 and above. 
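The noisy test conditions described above can be reproduced by superimposing white Gaussian noise at a chosen SNR on the clean audio. The short sketch below assumes the usual power-based SNR definition and is illustrative only.

```python
# Hedged sketch: corrupt clean test audio with additive white Gaussian noise
# at target SNRs from -20 dB to 20 dB in 5 dB steps, as in the evaluation above.
import numpy as np

def add_white_noise(signal, snr_db, rng=np.random.default_rng(0)):
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.random.default_rng(1).standard_normal(22050 * 2)  # stand-in for a 2 s clip
noisy_sets = {snr: add_white_noise(clean, snr) for snr in range(-20, 25, 5)}
```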
Since uni-directional and bidirectional multimodal networks show similar losses and accuracies, both are suitable for constructing the proposed AVSR system. For simplicity, the uni-directional multimodal network was chosen. Moreover, past research shows that LSTM performs well in speech recognition [28], [29]; as a result, a uni-directional multimodal LSTM network was chosen to construct the proposed AVSR system.
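As a concrete illustration of the chosen design, the sketch below assembles an equivalent uni-directional multimodal LSTM network in Keras, using the sizes and training settings reported in this section. It is an approximation: the explicit stream weights of Eqs. (1)–(3) are absorbed into ordinary layer weights, and the activity-regularisation strength is assumed.

```python
# Hedged Keras sketch of the uni-directional multimodal LSTM system
# (12x200 MFCCs, five 60x30 visual frames, 100/100/10 hidden units, 10 digit classes).
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

audio_in = layers.Input(shape=(200, 12), name="mfcc_seq")
a = layers.LSTM(100, name="audio_lstm")(audio_in)               # audio intermediate features (size 100)
a = layers.Dense(10, activation="relu", name="audio_ann")(a)    # audio ANN output (size 10)

visual_in = layers.Input(shape=(5, 60 * 30), name="mouth_frames")
v = layers.LSTM(100, name="visual_lstm")(visual_in)             # visual intermediate features (size 100)
v = layers.Dense(10, activation="relu", name="visual_ann")(v)   # visual ANN output (size 10)

f = layers.Concatenate(name="feature_integration")([a, v])
f = layers.ActivityRegularization(l2=1e-4)(f)                   # regularisation strength assumed
f = layers.Reshape((20, 1))(f)
f = layers.SimpleRNN(10, name="integration_rnn")(f)             # multimodal RNN integration
out = layers.Dense(10, activation="softmax", name="digits")(f)  # softmax layer of Eq. (4)

model = models.Model([audio_in, visual_in], out)
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit([audio_x, visual_x], y, batch_size=60, epochs=10)   # settings as reported above
```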

2019 4th International Conference on Information Technology (InCIT2019) Fig. 6. Validation Accuracy for Different Network Models and a RNN for multimodal features integration showed that it outperforms conventional AVR system in word B. Comparative Study Between Proposed AVSR System and Individual Audio and Visual RNN Models recognition under different levels of SNR. The fixed In order to determine whether a multimodal AVSR sys- hyperparameters that shared across network models are tem can outperform conventional single modal ASR sys- batch size 60, number of epochs 10 and learning rate 0.01. tem, comparison between multimodal RNN-based AVSR Numbers of hidden neurons in audio features extraction system and single modal RNNs for audio and visual speech recognition systems was carried out using clean mechanism, visual features extraction mechanism and visual signals, clean audio signals and noise added audio features integration mechanism have been as 100, 100 and signals. The overall results for this experiment are shown 10 respectively. Different types of RNN, such as simple in Table I. Results show that the recognition performance RNN, LSTM, and GRU show slight difference in terms of the ASR system is dropping as SNR getting lower. At clean environment, single audio network model shows of recognition rate at the first few epochs and ultimately, the greatest word recognition rate of 95% compared to they able to reach up to 90% accuracy. The possible the other lower SNR regions. As for visual model, it future work will be applying the current AVSR system only obtains 12% of word recognition rate for clean visual dataset. After combining both the audio and visual to develop a practical and real-world applications. The signals together at the features integration mechanism, the proposed multimodal network AVSR system shows architecture of proposed AVSR can be amended for future consistent performance, it’s recognition rate remains at work. For instance, a 3D convolution neural network can around 89% across all various SNRs. The recognition results for the proposed AVSR system is slightly lower be included in visual feature extraction mechanism. As than audio network model at clean environment. For SNR a CNN can contributes to a more robust speech recog- regions from 20dB to 0dB, recognition rates achieved by AVSR system is just slightly higher than the audio nition performance in a real-world environment, where system, differences of ±2% can be observed. At low SNR regions (-5dB to -20dB), by processing both audio dynamic changes such as facial orientation, reverberation and visual signals together, recognition rates achieved and illumination occurs. The 3D convolution is better in by the AVSR network model is 3% to 6.3% higher than the recognition rates achieved by audio network recognising for video. system. The multimodal RNN AVSR system shows that it’s advantage as the additional sensory model is able REFERENCES to generate consistent recognition rate in different SNR environments. [1] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE international TABLE I conference on acoustics, speech and signal processing. IEEE, WORD RECOGNITION RATE OF DIFFERENT RNN MODELS UNDER 2013, pp. 6645–6649. DIFFERENT SNR REGIONS [2] A. Biswas, P. K. Sahu, and M. 
Chandra, “Multiple cameras audio visual speech recognition using active appearance model visual SNR(dB) Clean 20 15 10 5 0 -5 -10 -15 -20 features in car environment,” International Journal of Speech Technology, vol. 19, no. 1, pp. 159–171, 2016. Audio(%) 95.0 88.5 87.7 86.8 88.2 87.9 86.6 86.3 86.7 83.0 [3] E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and AVSR(%) 89.4 88.8 89.2 89.3 89.2 89.0 89.5 89.7 89.0 89.3 M. Matassoni, “The second chimespeech separation and recogni- tion challenge: Datasets, tasks and baselines,” in 2013 IEEE Inter- SNR(dB) Clean national Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 126–130. Visual(%) 12.0 [4] F. Weninger, M. Wo¨llmer, J. Geiger, B. Schuller, J. F. Gemmeke, V. CONCLUSION A. Hurmalainen, T. Virtanen, and G. Rigoll, “Non-negative ma- trix factorization for highly noise-robust asr: To enhance or to In this study, the proposed AVSR system based on recognize?” in 2012 IEEE International Conference on Acoustics, deep learning for audio and visual feature extraction Speech and Signal Processing (ICASSP). IEEE, 2012, pp. 4681– 4684. [5] Y. H. Goh, P. Raveendran, and Y. L. Goh, “Robust speech recognition system using bidirectional kalman filter,” IET Signal Processing, vol. 9, no. 6, pp. 491–497, 2015. [6] Y.-H. Goh, Y.-L. Goh, Y.-K. Lee, and Y.-H. Ko, “Robust speech recognition system using multi-parameter bidirectional kalman filter,” International Journal of Speech Technology, vol. 20, no. 3, pp. 455–463, 2017. [7] C. Kim and R. M. Stern, “Power-normalized cepstral coefficients (pncc) for robust speech recognition,” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 24, no. 7, pp. 1315–1329, 2016. [8] Y. H. Goh, P. Raveendran, and S. S. Jamuar, “Robust speech recognition using harmonic features,” IET Signal Processing, vol. 8, no. 2, pp. 167–175, 2013. [9] H. Xu, M. J. Gales, and K. Chin, “Joint uncertainty decoding with predictive methods for noise robust speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1665–1676, 2011. [10] K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, and T. Ogata, “Audio-visual speech recognition using deep learning,” Applied Intelligence, vol. 42, no. 4, pp. 722–737, 2015. [11] M. Ibrahim and D. Mulvaney, “A lip geometry approach for feature-fusion based audio-visual speech recognition,” in 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP). IEEE, 2014, pp. 644–647. [12] S. Tamura, H. Ninomiya, N. Kitaoka, S. Osuga, Y. Iribe, K. Takeda, and S. Hayamizu, “Audio-visual speech recognition using deep bottleneck features and high-performance lipreading,” 42

2019 4th International Conference on Information Technology (InCIT2019) in 2015 Asia-Pacific Signal and Information Processing Associa- tion Annual Summit and Conference (APSIPA). IEEE, 2015, pp. 575–582. [13] J. Huang and B. Kingsbury, “Audio-visual deep learning for noise robust speech recognition,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 7596–7599. [14] Y. Mroueh, E. Marcheret, and V. Goel, “Deep multimodal learning for audio-visual speech recognition,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 2130–2134. [15] A. Kuamr, M. Dua, and A. Choudhary, “Implementation and performance evaluation of continuous hindi speech recognition,” in 2014 International Conference on Electronics and Communication Systems (ICECS). IEEE, 2014, pp. 1–5. [16] A. Kaur and A. Singh, “Optimizing feature extraction techniques constituting phone based modelling on connected words for pun- jabi automatic speech recognition,” in 2016 International Confer- ence on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2016, pp. 2104–2108. [17] N. Dave, “Feature extraction methods lpc, plp and mfcc in speech recognition,” International journal for advance research in engineering and technology, vol. 1, no. 6, pp. 1–4, 2013. [18] M. Anusuya and S. Katti, “Comparison of different speech feature extraction techniques with and without wavelet transform to kannada speech recognition,” International Journal of Computer Applications, vol. 26, no. 4, pp. 19–24, 2011. [19] L. Lombardi et al., “A survey of automatic lip reading ap- proaches,” in Eighth International Conference on Digital Infor- mation Management (ICDIM 2013). IEEE, 2013, pp. 299–302. [20] I. Almajai, S. Cox, R. Harvey, and Y. Lan, “Improved speaker independent lip reading using speaker adaptive training and deep neural networks,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 2722–2726. [21] N. T. Chuong and J. Chaloupka, “Visual feature extraction for isolated word visual only speech recognition of vietnamese,” in 2013 36th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2013, pp. 459–463. [22] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal deep learning,” in Proceedings of the 28th international conference on machine learning (ICML-11), 2011, pp. 689–696. [23] S. Dupont and J. Luettin, “Audio-visual speech modeling for continuous speech recognition,” IEEE transactions on multimedia, vol. 2, no. 3, pp. 141–151, 2000. [24] X. Shao and J. Barker, “Stream weight estimation for multistream audio–visual speech recognition in a multispeaker environment,” Speech Communication, vol. 50, no. 4, pp. 337–353, 2008. [25] D. Stewart, R. Seymour, A. Pass, and J. Ming, “Robust audio- visual speech recognition under noisy audio-video conditions,” IEEE transactions on cybernetics, vol. 44, no. 2, pp. 175–184, 2014. [26] A. H. Abdelaziz, S. Zeiler, and D. Kolossa, “Learning dynamic stream weights for coupled-hmm-based audio-visual speech recog- nition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 5, pp. 863–876, 2015. [27] G. Bradski and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV library. ” O’Reilly Media, Inc.”, 2008. [28] H. Sak, A. Senior, and F. 
Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Fifteenth Annual Conference of the International Speech Communication Association, 2014. [29] X. Li and X. Wu, "Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 4520–4524.

2019 4th International Conference on Information Technology (InCIT2019) Very Short-Term Solar Power Forecast using Data from NWP Model Sukrit Jaidee Wanchalerm Pora Department of Electrical Engineering, Department of Electrical Engineering, Faculty of Engineering, Faculty of Engineering, Chulalongkorn University Chulalongkorn University Bangkok, Thailand Bangkok, Thailand [email protected] [email protected] Abstract—This article presents a method to forecast solar instantaneous consumption load and to be ready for power 4 hours in advance by using forecast weather data from production. Numerical Weather Prediction (NWP) and the measurement obtained from weather monitoring instrument with feature Basically, in Solar power forecast by machine learning engineering by machine learning techniques using Feed- techniques, input variables used in forecasting are forward Neural Network (FNN) and Recurrent Neural Network measurements of various weather variables and use machine (RNN). Four types of relevant variables—forecast weather data, learning models to transform measured values derived from the average of forecast weather data, forecast weather data with the measurement of weather variables into solar power the measurement from weather instrument, and the average of through training model [1] But the distribution of areas that forecast weather data from nearby areas with the measurement generate electricity from solar energy and the high cost of from weather instrument—are used to find the right model and installing weather monitoring instruments in solar farms, solar input variable for the forecast. The result shows that by using rooftops in each household and solar floating farms make it Feed-forward Neural Network with the forecast weather data difficult to predict the power generation from solar energy and the measurement from weather monitoring instrument, the with mentioned inputs. In order to make the forecast possible Root Mean Square Error (RMSE) value is 8.13%. On the other in each area of production and to further enhance the forecast hand, by using Feed-forward Neural Network with just the at the city or region level, we ,therefore, considered bringing forecast weather data from nearby areas, RMSE is at 8.38% Numerical Weather Prediction (NWP) to provide the weather which is only 0.25% higher than the forecast with the forecast of various variables such as solar irradiance measurement. Thus, only weather data from nearby areas can ( ∙ ), ambient temperature, relative humidity, etc. The be used for prediction of the area without the measurement NWP which can provide weather forecasts from one hour to from weather monitoring instrument and we can use it for several days with moderate accuracy are used to provide further prediction at the region level inputs to models of solar power forecast for solar farms, solar floating farms and solar rooftops that are not equipped with Keywords—Numerical Weather Prediction (NWP), Machine weather monitoring instruments and used to increase the Learning, Recurrent Neural Network (RNN), Feed-forward accuracy of forecasting for areas with measurements from Neural Network (FNN), Solar Power Forecast weather monitoring instruments. The study shows that 2 types of NWP and Feed-forward Neural Network (FNN) should be I. INTRODUCTION used to forecast the solar power generation of the power plant. 
[2] Most forecasting methods use historical values or variables Nowadays, the electricity generation capacity from solar related to predicted weather forecast data. In addition, energy has increased significantly according to Thailand Photovoltaic (PV) Power is used directly to training models Power Development Plan. Electricity Generating Authority of [3] or the solar power from solar irradiance ( ∙ )is Thailand (EGAT) has plans to build solar power plants, solar forecast by using downscaling models [4]. floating farms in all dams in Thailand as well as the increase in the installation of solar rooftop in each household, by the This article aims to improve the forecasting of electricity support of the Provincial Electricity Authority (PEA), for generation from solar energy in areas that are equipped with household use and to sell to the grid (Power Grid System). As weather monitoring equipment and in areas that do not have a result, the electricity generation from solar energy can be weather monitoring devices by using NWP together with other found in many different areas. However, the uncertainty of related variables and feature engineering process to create power generation from solar energy caused by natural variables of the solar irradiance ( ∙ ) obtained by fluctuations has resulted in the decrease of the reliability and calculating with the Exponential Moving Average (EMA). stability of the electrical system. Therefore, the forecasting of electricity generation from solar energy is a major challenge II. SOLAR POWER FORECAST USING NEURAL in the integration of volatile renewable energy into power grid NETWORKS system. With more and more volatile renewable energy into the grid system, the large fluctuations will occur if not handled A. Feed-forward Neural Network (FNN) properly. Problems with Power Supply and Frequency The proposed FNN structure consists of 8 inputs, 256 Regulation can occur. In order to deal with these problems, forecasting should be done to help us maintain frequency neurons in the hidden layer and 8 neurons in the output layer. stability and maintain the total energy produced to match the 44

2019 4th International Conference on Information Technology (InCIT2019) Fig. 3. General structure of RNN Fig. 1. Structure of Feed-forward Neural Network model Recurrent Units (GRU) architecture structure [6] and ReLU activation function because it is the architecture developed Fig. 2. Input dimensions for each sample and forecasting horizon (time from RNN in order to avoid vanishing gradient problems as steps) shown in Figure 3 from Figure 3 = ( , ) is new state, is the function with parameters ∙ is old state, As the forecast is 8 times steps, each step is in 0.5-hour scale. 8 time steps in this study are 4 hours in advance. The structure is the input vector that has the size according to the value of FNN is shown in Figure 1. In the training model, the model that we define, which, in this article, is set to 8, that is, the was trained with 32 batch size (batch size is the number of input in each sample will be a vector of the previous 8 hours in each input variable. 8 is a customized number, not by the experiment, to find the best numerical value for the model, and where = 1,2, … ,8 is the solar power in each time step of the forecast in advance. Hyperparameter as defined in the RNN model of this study is 2 hidden layers and each layer has a total of 256 neurons, 32 batch size, learning rate is 0.005. In order to reduce overfitting in this study, regularization with the dropout technique is used to cut off some nodes while training, so that there is not too much reference from these nodes because there is high possibility that the weather monitoring devices will sometimes be unavailable causing the unavailability of information that will be input to the model. Therefore, using dropout will help reduce this problem. samples per mini-batch) 50 epochs (epochs is the maximum number of times that the training algorithm will cycle through all samples). Use Rectified Linear Unit (ReLU) Activation C. Exponential Moving Average (EMA) 0, x́ < 00. The equation is f(x́ ) = x́ , x́ ≥ Optimizer use RMS Prop In order to improve the accuracy of solar power forecast, we would do feature engineering by adding the EMA variable and loss function using the Mean Squared Error (MSE) with of the solar irradiance ( ∙ ) , that is added as an additional techniques to find the best model from the early additional input to the model as EMA provides more weight stopping technique with the condition that if the value of min to the data that is in close proximity to the present time and delta after patience epochs causes poor validation loss, the lower the weight of the data in a distant period; it provides training will stop. The input of the model in each sample is the Weight Moving Average (WMA) where weight is in vector of the previous six hours of each variable used as input exponential. The equation of weight is to the model, and the forecasting horizon section is set to eight, which is the time steps in the prediction. Therefore, this model is 8 time steps prediction in advance as shown in Figure 2. , =0 ( ́ ) = ∙ + (1 − ) ∙ , >1 (1) B. Recurrent Neural Network (RNN) where = 0.9 since we experimented ( = 0.6,0.1, 0.9) and the result showed that = 0.9 provide the most accurate RNN is a non-parametric that greatly simplifies learning forecasting value with the model used in the study. The and is a natural extension of the ARIMA model, but RNN is more flexible and can also add other external variables to the values used in this study are the number we set up ourselves model. 
[5] In the structure of RNN, there is internal hidden state that can be fed back to the network, where the weight and which should be experimented to find the best in the range bias values are the same and can share through every step. In (0,1). As the trend of the solar irradiance ( ∙ )per RNN model, it is trained by Backpropagation Through Time (BPTT). It will forward through the entire series in order to area in the next four hours is related to the trend of the solar calculate the loss and then backward through the entire series irradiance in the previous four hours, that is the reason for to calculate the gradient. In this article, we used the Gated adding this variable to one of our inputs. 45
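The EMA feature of Eq. (1) can be computed directly from the irradiance series. A minimal sketch with α = 0.9 is shown below, assuming the standard recurrence with the first sample as the initial value; the hourly values used are illustrative only.

```python
# Hedged sketch of the EMA feature of Eq. (1):
#   ema[0] = x[0],  ema[t] = alpha * x[t] + (1 - alpha) * ema[t-1],  alpha = 0.9
import numpy as np

def exponential_moving_average(irradiance, alpha=0.9):
    ema = np.empty(len(irradiance), dtype=float)
    ema[0] = irradiance[0]
    for t in range(1, len(irradiance)):
        ema[t] = alpha * irradiance[t] + (1.0 - alpha) * ema[t - 1]
    return ema

# Hypothetical hourly irradiance values, for illustration only.
irr = np.array([0.0, 120.0, 340.0, 510.0, 620.0, 640.0, 580.0, 410.0])
print(exponential_moving_average(irr))
```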

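For reference, a minimal Keras sketch of the FNN described in Section II-A (a single 256-unit ReLU hidden layer, 8 outputs for the eight half-hour steps, RMSprop with MSE loss, and early stopping) is given below. The flattened input width and the early-stopping thresholds are assumptions, since the exact values are not fully specified.

```python
# Hedged Keras sketch of the FNN of Section II-A.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

n_inputs = 8 * 8          # assumed: 8 past time steps for each of 8 input features, flattened

model = models.Sequential([
    layers.Input(shape=(n_inputs,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(8),      # solar power for the next 8 half-hour steps (4 hours ahead)
])
model.compile(optimizer="rmsprop", loss="mse")

early_stop = callbacks.EarlyStopping(monitor="val_loss",
                                     min_delta=1e-4,   # assumed threshold
                                     patience=5,       # assumed patience
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=32, epochs=50, callbacks=[early_stop])
```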
2019 4th International Conference on Information Technology (InCIT2019) Fig. 4. Latitude and longitude grid points of NWP every 3 minutes. In this article, we resample the data in the hour scale because the values obtained from the NWP model D. Numerical Weather Prediction (NWP) are in hour scale. The weather forecast data from the NWP model used in this article use a Global Forecast System (GFS) Numerical Weather Prediction models are generally used for as input, and output to 00UTC and 12UTC, provided by forecasting the solar irradiance ( ∙ ) . However, it National Centers for Environmental Prediction (NCEP). This also provides other weather forecasting variables such as NWP model works twice a day to predict the 2-day weather temperature, air pressure, downward / upward short-wave forecast in advance. The forecasting data will be available radiative flux, etc. In this study, the NWP model was used in every hour. In the training model, the training dataset used in forecasting because it provides forecast values in the form of this study will be from 1 January 2017 - 31 December 2017 latitude and longitude grid points as shown in Figure 4. It is and the testing set used in this study will be from 1 January very useful for cases where the area we consider does not 2018. - 31 October 2018 with Performance Index used to have weather measurement devices. evaluate in this study is the Root Mean Square Error (RMSE) and Mean Bias Error (MBE) as the following equation We can use the points that provide weather forecast data by the NWP model near the area that we are interested by = 100% ∑ ( ) − ( ) /( .) using the 4 nearest points [7] for each area that we are interested to predict the power capacity of the area we are and = ∑ ( ( ) − ( )) where ( ) is forecasted interested and interpolate the forecasted values from all 4 solar power at time t, N is the number of samples and installed points to forecast for the area that we consider. We can capacity of CUBEMS is 8 kW. perform a single forecast for the area that we are interested from all the 4 nearest points [7] or as predictions from 1 IV. EXPERIMENTS AND RESULTS nearest point [8] or bring the distance between each point near the area we want to forecast to participate in the forecasting. A. Experiment on Feed-forward Neural Network (FNN) In this article, we would consider the nearest points of 1, 4, 9, The results of the study shown in Figure 5 can be 16, 25, and 36 points to be used in forecasting areas of interest. concluded that the feature set 3 that consists of features as in Table 1 gives the least RMSE in one point case compared to III. DATA DESCRIPTION other feature set values by 8.13%. While feature Set 1 with only the value of the weather forecast from the NWP model We divided the feature sets used in this study into 4 sets alone, the RMSE in one point case is 8.38%. It can be according to Table1 and divided the study into 6 cases: the concluded that feature sets 3 gives a reduced error from nearest 1, 4, 9, 16, 25 and 36 points, respectively. feature set 1 by 0.25%. And based on the results from the study shown in Figure 6, Feature Set 1 provide a high TABLE I. FEATURES IN EACH DATASET WHICH USED IN prediction error at 6:00 am, which gives the RMSE in one 1 THE IMPLEMENTATION point cases at 6:00 am at 2.73%. While the RMSE in one point cases at 6:00 am of feature set 3 is 2.39%. The study shows Feature Sets that the RMSE of feature set 3 decreases from feature set 1 by 0.34%. 
resulting in the high prediction error at 6:00 am decreasing significantly. The results shown in Figure 7 show that feature set 1 has a higher prediction error in each time step than feature set 3, except in time steps 1 and 2; feature set 1 gives a higher RMSE in each time step than feature set 3, as shown in Table III. Feature set 3 has an RMSE reduced by 0.68% and 0.51% relative to feature set 1 in time steps 3 and 8 respectively, and an RMSE increased by 0.34% relative to feature set 1 in time steps 1 and 2 respectively. Figure 7 also shows that the prediction error increases significantly from time step 5 onwards. In addition, the results shown in Figure 8 show that the MBE at 6:00 and 7:00 am of feature set 3 decreases from that of feature set 1 by 0.34%.

Fig. 5. RMSE (%) of each feature set in each case.

The measurement data obtained from the weather measurement equipment used in this article come from the Chulalongkorn University Building Energy Management System (CUBEMS), a system that receives and collects data from various weather monitoring devices installed at the Faculty of Engineering, Chulalongkorn University. This article used data from 1 January 2017 to 31 October 2018, restricted to 06:00–18:00 each day. The weather variables obtained from CUBEMS consist of solar irradiance, temperature, relative humidity, UV index, and wind speed, with all variables received every 3 minutes.
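A sketch of the grid-point selection described in the NWP subsection is given below: the k nearest NWP grid points around a site of interest are found and their forecast values combined (k ∈ {1, 4, 9, 16, 25, 36} in this study). The Euclidean distance on latitude/longitude and the plain average are simplifying assumptions; inverse-distance weighting or interpolation could be substituted.

```python
# Hedged sketch: select the k nearest NWP grid points around a site and
# average their forecast values for one weather variable.
import numpy as np

def nearest_grid_forecast(site_latlon, grid_latlon, grid_values, k=4):
    """grid_latlon: (N, 2) grid-point coordinates; grid_values: (N,) forecasts."""
    d = np.linalg.norm(grid_latlon - np.asarray(site_latlon), axis=1)
    nearest = np.argsort(d)[:k]
    return grid_values[nearest].mean()

# Hypothetical 0.25-degree grid around Bangkok, for illustration only.
lats, lons = np.meshgrid(np.arange(13.0, 14.5, 0.25), np.arange(100.0, 101.5, 0.25))
grid = np.column_stack([lats.ravel(), lons.ravel()])
irradiance_forecast = np.random.default_rng(0).uniform(300, 700, len(grid))
print(nearest_grid_forecast((13.73, 100.53), grid, irradiance_forecast, k=4))
```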

2019 4th International Conference on Information Technology (InCIT2019) TABLE II. RMSE (%) OF FEATURE SETS IN EACH CASE. TABLE III. RMSE (%) IN EACH TIME STEP OF FEATURE SETS 1 AND 3 (1 POINT) RMSE (%) in each case Sets 4 9 16 25 36 Sets RMSE (%) in each Time Step 1 1 2345 6 7 8 1 8.38 8.44 8.41 8.53 8.54 8.61 1 2.9 12.78 2.56 3.24 3.92 9.37 11.24 12.1 2 8.38 8.32 8.51 8.24 8.22 8.43 3 8.13 8.19 8.21 8.24 8.18 8.28 3 3.24 2.9 2.56 3.92 9.03 10.9 11.75 12.27 4 8.13 8.25 8.23 8.28 8.23 8.14 Feed-forward Neural Network (FNN) is a general model Fig. 8. MBE (%) of feature set 1 and 3 at 6:00 to 7:00 am (1Point) which is used in various applications such as image recognition but its limitation is that it does not have memory Fig. 9. RMSE (%) of each feature set in each case. unit, so it cannot remember previous State to model sequences of data; this is a critical problem for sequence data such as text B. Experiment on Recurrent Neural Network (RNN) or time series, etc. Moreover, despite its deeper model, FNN From the study, the results shown in Figure 9 show that learns less from time series data than RNN does because it doesn’t have memory unit, so the input is fixed-size input the Feature Set 3 that consists of Feature as in Table 1 gives while RNN can adjust the size of input in each round, so RNN the least RMSE in one-point case compared to other Feature can search more data. Set, with a value of 8.45% which is 0.32% more than the results obtained from the FNN model. And from the results Normally, to apply general FNN with time connections in shown in Figure 10, the trend of RMSE values in each time the same way as RNN, we will create training data set where steps of this model which is relatively high compared to the input is the latest n time steps and target or output is the next results of FNN model and the trend of RMSE increased with n+1th time step. Adding time data features ((such as hour-of- the farther Time Steps. By comparing between the Feature day, day-of-year, etc.) to the model and adding Deep, enough Set 3 of FNN model and RNN model, it can be seen that the complexity of data will help model time series effectively. But RMSE value in one-point case of FNN model is less than of for short time series (such as less than 100 time steps), we may the RNN model in Time Steps 2,3,5,6,7 and 8, and both not have to create intricate relationship through time model models have similar the trend of RMSE values in each time since with short time series, FNN can perform as well as RNN. steps. Also, based on the study results shown in Figure 11, However, it depends on the architecture of the networks, the the value of RMSE in one-point case at 13:00, 14:00, 15:00, training time, the initialization of weights and other and 18:00 pm of FNN model increaded from RNN model by parameters. 0.51%, 1.02%, 0.68%, and 0.17% respectively and Figure 12 shows that FNN model provides predictive values that are Fig. 6. the RMSE (%) in one-point cases at 6:00 am lower than the actual value, while the RNN model provides a forecast value that is higher than the actual value. The results Fig. 7. RMSE (%) in each time step of feature sets 1 and 3 (1 Point) of the study shown in Figure 13 and 14 shows that by using RNN with the forecast weather data, real weather data, and the EMA variable of solar irradiance, the Root Mean Square Error (RMSE) value is 8.45%. 
On the other hand, by using the RNN with only the forecast weather data, the RMSE is 8.73%, which is just 0.28% higher than the forecast that uses the forecast weather data, the real weather data, and the EMA variable of solar irradiance together.
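The performance indices used above can be computed as in the sketch below, assuming the usual capacity-normalised definitions of RMSE and MBE with the 8 kW installed capacity of CUBEMS.

```python
# Hedged sketch of the evaluation metrics: RMSE and MBE normalised by the
# 8 kW installed capacity and expressed in percent (normalisation assumed).
import numpy as np

def rmse_percent(forecast_kw, actual_kw, installed_capacity_kw=8.0):
    err = np.asarray(forecast_kw) - np.asarray(actual_kw)
    return 100.0 * np.sqrt(np.mean(err ** 2)) / installed_capacity_kw

def mbe_percent(forecast_kw, actual_kw, installed_capacity_kw=8.0):
    err = np.asarray(forecast_kw) - np.asarray(actual_kw)
    return 100.0 * np.mean(err) / installed_capacity_kw

# Toy values, illustration only.
p_hat = [1.2, 2.5, 4.1, 5.0]
p     = [1.0, 2.8, 4.0, 5.5]
print(rmse_percent(p_hat, p), mbe_percent(p_hat, p))
```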

2019 4th International Conference on Information Technology (InCIT2019) TABLE IV. RMSE (%) OF FEATURE SETS IN EACH CASE Sets # Points 25 36 1 4 9 16 9.04 9.05 1 8.73 8.96 9.03 9 2 8.73 9.02 8.98 9.01 9.03 9.01 3 8.45 8.8 8.72 8.87 8.61 8.58 4 8.45 8.84 8.88 8.95 8.88 8.77 Fig. 14. The actual value compared to the forecast value of each model. Fig. 10. RMSE (%) of feature sets 3 in one-point case for each time step V. CONCLUSIONS Fig. 11. RMSE (%) of FNN and RNN model in one-point case Fig. 12. MBE (%) in one-point case of feature set 3 from 6:00 - 18.00 Solar power forecast at the city or regional level is a major Fig. 13. RMSE (%) of each feature sets for FNN and RNN models challenge to combine renewable energy which has increasing proportion into the grid system. However, forecasting the power generation capacity at the regional level is difficult since weather variables are needed as inputs in forecasting but weather monitoring instruments cannot be installed in each solar farm or each house with a solar rooftop; it is costly. This article, therefore, proposes solar power forecast with the NWP model that will provide weather forecasts in different areas as grid points to cover the central region of Thailand as we have designated. Normally, with the time series data, the RNN model performs better than the FNN model because the model RNN has its feedback loop. As a result, it can use the previous state to predict the current and future status. In this paper, we have added the EMA variable of solar irradiance. As a result, the FNN model can use the previous state to predict the current and future status as well. Since the time step ( = 8) is a short time series and the FNN model has been transferred to connect time or simulate the time series through the characteristics of the variable (irradiance, temperature, forecast weather data etc.) that changes over time of the day, resulting in the FNN model performing better than the RNN model. However, it depends on the data, architecture of the networks, the training time, the initialization of the weights, and other parameters. From the experiment, the forecast by using the measured values from the weather monitoring instrument, the weather forecast data obtained from the NWP model, and EMA variable of solar irradiance has only 0.25% lower RMSE value than the forecast using just the weather forecast data obtained from the NWP model. Forecasting with FFN model and feature engineering by adding EMA variable of solar irradiance, in order to improve the accuracy of forecasting, shows that the RMSE at 6.00 and 7.00 a.m. decreased significantly from the case without the EMA variable of the solar irradiance and real weather data. And finding predictions in each interested area from grid points of NWP that provides forecast values in that area is another technique to help improve predictive accuracy. Therefore, the models and techniques proposed in this article will make the predictions at the city or regional level in Thailand possible. 48

2019 4th International Conference on Information Technology (InCIT2019)

ACKNOWLEDGMENT
The author would like to express sincere gratitude to the Electricity Generating Authority of Thailand for granting a scholarship for the author's education in the master's degree and for research funding, as well as to thank the advisor and those who have been advising and giving suggestions throughout the research.

REFERENCES
[1] Chow SKH, Lee EWM, Li DHW, "Short-term prediction of photovoltaic energy generation by intelligent approach", Energy Build. 2012; 55: 660–667.
[2] Fernandez-Jimenez LA, Muñoz-Jimenez A, Falces A, Mendoza-Villena M, Garcia-Garrido E, Lara-Santillan PM, Zorzano-Alba E, Zorzano-Santamaria PJ, "Short-term power forecasting system for photovoltaic plants", Renewable Energy 2012; 44: 311–317.
[3] Bacher P, Madsen H, Nielsen HA, "Online short-term solar power forecasting", Solar Energy 2009; 83(10): 1772–1783.
[4] Yona A, Senjyu T, Saber AY, Funabashi T, Sekine H, Kim CH, "Application of neural network to one-day-ahead 24 hours generating power forecasting of photovoltaic system", in Intelligent Systems Applications to Power Systems, 2007. ISAP 2007. International Conference on, 2008, pp. 1–6.
[5] Yijing Chen, Dmitry Pechyoni, Angus Taylor, Vanja Paunic, "Recurrent Neural Networks for Time Series Forecasting", AIConf2018-RNN for time series forecasting.
[6] Cho et al., "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation", 2014.
[7] Romain Juban, Patrick Quach, "Predicting daily incoming solar energy from weather data", Stanford University - CS229 Machine Learning.
[8] Jeff Patra, "Solar Energy Prediction Using Machine Learning", 2017.
[9] Xinmin Zhang, Yuan Li, Siyuan Lu, Hendrik F. Hamann, Bri-Mathias Hodge, and Brad Lehman, "A Solar Time-Based Analog Ensemble Method for Regional Solar Power Forecasting", IEEE Transactions on Sustainable Energy, vol. 10, no. 1, January 2019.
[10] Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, "Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond", International Journal of Forecasting, vol. 32, no. 3, pp. 896–913, July–September 2016.
[11] AMS 2013-2014 Solar Energy Prediction Contest, https://www.kaggle.com/c/ams-2014-solar-energy-prediction-contest

2019 4th International Conference on Information Technology (InCIT2019) Twitter Based Outcome Predictions of 2019 Indian General Elections Using Decision Tree Ferdin Joe John Joseph Faculty of Information Technology Thai-Nichi Institute of Technology, Bangkok [email protected] Abstract – Social Media is a huge corpus of raw data available future. Though the process of poll predictions is banned due to process and analyze the mood of the people. Various institutions to the model code of conduct, it applies to twitter based poll and corporations use social media data to understand market predictions as well. So this process is carried out during the response of their products. Twitter data is widely available one for polling season and the results are declared after the model this purpose. Similar to market analysis, this data is used to map code of conduct is lifted by the Election Commission of India, political mood of the general public. Most of the existing which is now legal. This paper discusses about the prediction methodologies use tweets downloaded on a particular criterion. of election outcomes from twitter data in the past and There is very limited study done on daily mood mapping of political proposes a new methodology good enough for a long electoral views among people. This paper addresses a methodology to predict process done in the Republic of India during the year 2019. the outcome of the 2019 Indian general elections using the The results obtained are for the incumbent ruling party against sentiment analysis of twitter data. Decision tree classifier is used to all the opposition parties combined. For a country with nearly train and test data and the predicted outcome is found to be close 1.3 billion populations, it is not practical enough to conduct to that of the actual outcome and most of the pre poll analysis done pre poll and exit poll surveys in all 543 constituencies with a so far. The experiments reported in this paper are only on tweets neutral and transparent manner. So the agencies managing the in English language and having the most number of retweets by political campaign and the media house need to gain the users. This methodology is efficient enough to map the mood knowledge through freely available raw data from which the of people over a timely basis across various phases of polls. exact outcome is understood. The section II will discuss on the various methodologies used so far to predict elections all Keywords – Sentiment Analysis, Decision Tree, Twitter Data, over the world and the section III discusses the proposed Election Prediction methodology in this paper. Then the results are visualized and the method is justified over in section IV. I. INTRODUCTION II. RELATED WORK The Indian general elections are held once in 5 years to elect the Member of Parliament (MP) from 541 constituencies all There are many methodologies proposed for election outcome over the country. These elected MPs will elect the Prime prediction from tweets. A handful of literature is available for Minister who will rule the country for the next 5 years. For Indian elections. Election outcome prediction using twitter confirming the term, the Prime Minister who took oath should has a history dating back to 2012. It was for Queensland state prove his majority in Lok Sabha, the lower house of the Indian election, the methodology focused on the issues raised in Parliament. In this motion, the Prime Minister must have twitter and the popular mention [1]. 
A substantial study on the support of at least 272 MPs including the vote of the elected sentiment on twitter was done to forecast 2013 Pakistani and speaker. This process is happening since 1952 when the 2014 Indian elections [2]. A Diffusion centric model was country was declared a republic. The country had elections in created to map the sentiments of people in election centric 1952, 57, 62, 67, 71, 77, 80, 84, 89, 91, 96, 98, 99, 2004, 09, issues. This has no support to the popularity or any forecast in 14 and 19 respectively. Until 2009 elections, the election the respective elections. An unsupervised learning based campaigning happened on party and ideology centric. Since methodology was used for 2016 US presidential poll 2014 the trend shifted towards Prime Minister candidate. campaign based tweets [3]. This was done for two days of There are many processes carried out by various independent tweets with around 60000 tweets each for Donald Trump and agencies to predict the outcome of polls in the past. Pre poll Hillary Clinton. analysis was conducted weeks or months before the election and Exit poll was conducted on the day of election with the Swedish elections were predicted in [4] using the frequent voters returning from the voting booth as sample space. This mapping of political behavior of users such as retweet count, is done on all the 7 phases of election conducted in various likes and other aspects. A support vector machine and parts of the country with different timetable. These kind of Convolutional Neural Network based classification was done poll predictions were banned by the Election Commission of to forecast elections in UK [5]. This methodology obtained India under the model code of conduct for the elections to around 80% accuracy. Normally these twitter based election happen without bias in the largest democracy of the world. The internet users in India has increased exponentially over the past decade and it is expected to continue the trend in the 50

2019 4th International Conference on Information Technology (InCIT2019) classification methods are good for countries with two party pruned. After this procedure the resultant text is then system. For countries like India having multi-party subjected to normalization by tokenizing the words and democracy, forecasting of election outcome is still a making it parse-able by the classifier. challenge. C. Sentiment Analysis German federal elections were predicted using twitter text. This attempt was done on the twitter mentions and they got Long before the data collection, preliminary experimentations an error rate of 1% [6]. However, the advent of twitter bots were done using classifiers like Artificial Neural Network, and IT wing of political parties make this method not Naïve Bayes Classifier and SVM. These methods were not convincing enough to predict. good enough to experiment by creating a tree of grammar from the Parts of Speech (POS tag). Decision tree was good From all the literature studied above, it is clear that a specific in proving the necessary scores in polarity and subjectivity to methodology Is needed to predict in a multi-party democracy map the mood of people. This was evident with a rough using twitter feed. This process of forecasting elections has an estimation during the State Assembly elections in end of impact in stock exchanges as the indices of market get 2018. However, this process is not a refined one to report as affected to a considerable extent when an unexpected a methodology. With this result of primary investigation, outcome is going to come in the actual result of the election. decision tree classifier is used to predict the polarity of tweets in the proposed methodology. Decision Tree classifier in III. PROPOSED METHODOLOGY Textblob [11] library is used to classify the text extracted. Using this classifier, the polarity and subjectivity of The proposed methodology consists of the 3 phases. The first sentiments from each tweet is calculated. From the scores of two phases are performed every day during a fixed time. polarity and subjectivity, the tweet is classified as positive, negative or neutral. The popularity score for each day is A. Data Collection calculated on the tweets downloaded on that particular day. This process was carried out for 50 days during the election Tweets are downloaded from the twitter database using season. twitter API connectivity. Jupyter Notebook with tweepy [7] library is used. The collected tweets are stored in Mongo DB. Popularity = ((0 x Negative tweets) + (Neutral tweets / 2) + This is done using the pymongo [8] library. Every day during Positive tweets)/Total tweets a particular time, 5000 tweets each for the ruling and opposition parties most famous twitter handles were Negative tweets are not given any score. The proposed extracted. Tweets with twitter handles of the then Prime methodology is designed to find only the popularity. Minister, ruling party leaders and the party itself are taken as Whatever be the lower bound or higher bound weights to ruling party tweets and the tweets with twitter handles of the negative tweets gave similar trend. This popularity score is leader of opposition, opposition party major stakeholders and recorded for tweets in favor of ruling party keywords and the opposition and regional parties are taken as opposition other parties’ keywords separately. For each phase of the poll, party tweets. 
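A minimal sketch of the daily collection step just described is shown below, using the tweepy 3.x interface and pymongo. The credentials, query handles and database names are placeholders, not values from this work.

```python
# Hedged sketch: download tweets for each campaign's handles and store them in MongoDB.
import tweepy
from pymongo import MongoClient

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

collection = MongoClient("mongodb://localhost:27017")["elections2019"]["tweets"]

def collect(query, label, n=5000):
    # English-language tweets mentioning the given handles/keywords (tweepy 3.x API).
    for status in tweepy.Cursor(api.search, q=query, lang="en",
                                tweet_mode="extended").items(n):
        doc = status._json
        doc["camp"] = label                      # e.g. "ruling" or "opposition"
        collection.insert_one(doc)

collect("@ruling_party_handle", "ruling")        # placeholder handle
collect("@opposition_handle", "opposition")      # placeholder handle
```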
These tweets are extracted with the conditions the average of popularity scores was recoded and they were of most popular tweets with the most retweets and English as used to calculate the number of seats possible to be won on the language of tweets. In the experimentations on the those getting polled on that phase of polls. The number of proposed methodology, tweets in English language alone are seats predicted was taken to project how many seats possible taken. The sentiment classifier and lexical analysis is to be won by the respective party. The results obtained was available for English language but not available especially for compared to the pre poll survey and the actual result obtained Indian languages. on the counting day. B. Preprocessing IV. RESULTS Preprocessing is done to prune regular expressions and The popularity score obtained every day for ruling party and emoji’s not available in ASCII or Unicode. These symbols are other parties are listed in Fig 1. There might be some ups and not interpretable to any kind of sentiment and were downs in the trend but the popularity scores’ difference was responsible for exceptions during the preliminary always positive for ruling part and it is evident from the trend experimentation. After pruning of unusable regular obtained from Fig 2. The highest popularity obtained by any expressions, the tweets are sorted based on the total number party in a particular day was 72% by the ruling party and the of retweets. Then the attributes ID, text and lang are taken for lowest was recorded by the opposition as 49%. This is the easier processing of data. This is made to include a sparse data daily trend of popularity. The popularity difference is processing and management [9]. Inorder to obtain an efficient obtained over various phases and it is the average of decision, stopwords are removed using the nltk corpus [10] of popularities from day 1 to the respective phase of election. stopwords. After removing stopwords, regular expressions, emojis, Unicode and punctuation marks are pruned. Then the text is checked whether it has any non-English words and then 51
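The sentiment scoring and the popularity score defined above can be sketched as follows. The paper trains a decision-tree classifier from TextBlob on labelled tweets; since that training set is not available here, the default TextBlob polarity analyser is used as a stand-in, and the polarity thresholds are assumptions.

```python
# Hedged sketch of per-tweet sentiment classification and the daily popularity score.
from textblob import TextBlob

def classify(text):
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.05:
        return "positive"
    if polarity < -0.05:
        return "negative"
    return "neutral"

def popularity(tweets):
    labels = [classify(t) for t in tweets]
    pos = labels.count("positive")
    neu = labels.count("neutral")
    # Popularity = (0 * negative + neutral / 2 + positive) / total, as defined above.
    return (0.5 * neu + pos) / len(labels) if labels else 0.0

day_tweets = ["Great work on the new policy", "Not impressed at all", "Rally held today"]
print(popularity(day_tweets))   # illustrative tweets, not real data
```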

2019 4th International Conference on Information Technology (InCIT2019) The projected seats over the phases shows the seats possible for ruling or other parties to win during that particular phase of poll and it is evident from Fig. 3. TABLE I: PREDICTION OF PROPOSED METHODOLOGY COMPARED WITH VARIOUS SURVEYS AND ACTUAL RESULT OBTAINED. Date Survey agency Ruling All Effect Obtained Party other to Parties Ruling Party 23 May 2019 General 303 239 Win 2019 Election Results (Actual) 20 May Proposed 293 249 Win 2019 Methodology 264 Win (Prediction on 268 Win Fig 1: Popularity scores obtained by ruling and other parties every day. Ruling Party) 233 Win April 2019 Times Now- 279 VMR [12] (Total for Alliance) April 2019 IndiaTV-CNX 275 [13] (Total for Alliance) April 2019 Jan Ki Baat [14] 310 (Total for Alliance) Fig 2: Difference between ruling party and other parties’ popularity during From the Fig. 3, the latest result predicted was 293 and it is various phases. compared against various pre poll surveys and actual result. While most of the pre poll surveys gave a near majority The lowest performance was recorded during the second number, the proposed methodology produced a prediction phase of polls. It was the phase when the seats taking poll result with 97% accuracy. Fig 3 shows that during the first gave results in favor of the opposition parties. The seats won phase of election, the ruling party was capable of winning by the ruling party on the phase 2 also had a little difference over 309 seats while during other phases it reduced to 303 due when compared to those won in other phases. to some controversial statements issued by the prime campaigners. All the experiments were carried out in Jupyter Notebook using Python 3.7 on a windows environment using 4GB RAM and intel core i7 processor. Internet connection stability works fine over a fibre optic connection with atleast 30 mbps download bandwidth. Fig. 3: Projected seats to be won by ruling party and other parties during V. CONCLUSION various phases. It is evident from the results that the proposed methodology gave a near prediction to the actual result from analyzing tweets in English language. There is a need for this methodology to analyze tweets in other languages. This will help predict elections where non English language is spoken in majority or to map the trends of people’s mood in country like India over various states speaking and tweeting multiple languages. The results obtained in the proposed methodology shows that this methodology has a promising future in predicting Indian General elections. However, this trend has to be validated with the results obtained by State Assembly elections. There are many other deep learning classifiers like Convolutional Neural Networks (CNN), Recurrent CNN etc 52

2019 4th International Conference on Information Technology (InCIT2019) but the proposed methodology is not dealt with any deep learning methodology. ACKNOWLEDGEMENT The author thanks the reviewers for their time and valuable comments while reviewing this paper. Gratitude is bestowed to the support provided by Thai- Nichi Institute of Technology, Bangkok during the entire process of research presented in this paper. REFERENCES [1] J. Burgess and A. Bruns, “(Not) the Twitter election: the dynamics of the# ausvotes conversation in relation to the Australian media ecology,” Journal. Pract., vol. 6, no. 3, pp. 384–402, 2012. [2] V. Kagan, A. Stevens, and V. S. Subrahmanian, “Using twitter sentiment to forecast the 2013 pakistani election and the 2014 indian election,” IEEE Intell. Syst., vol. 30, no. 1, pp. 2–5, 2015. [3] J. Ramteke, S. Shah, D. Godhia, and A. Shaikh, “Election result prediction using Twitter sentiment analysis,” in 2016 international conference on inventive computation technologies (ICICT), 2016, vol. 1, pp. 1–5. [4] A. O. Larsson and H. Moe, “Studying political microblogging: Twitter users in the 2010 Swedish election campaign,” New Media Soc., vol. 14, no. 5, pp. 729–747, 2012. [5] X. Yang, C. Macdonald, and I. Ounis, “Using word embeddings in twitter election classification,” Inf. Retr. J., vol. 21, no. 2–3, pp. 183–207, 2018. [6] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Election forecasts with Twitter: How 140 characters reflect the political landscape,” Soc. Sci. Comput. Rev., vol. 29, no. 4, pp. 402–418, 2011. [7] J. Roesslein, “tweepy Documentation,” Online] http://tweepy. readthedocs. io/en/v3, vol. 5, 2009. [8] A. Nayak, MongoDB Cookbook. Packt Publishing Ltd, 2014. [9] F. J. John Joseph, R. T, and J. J. C, “Classification of correlated subspaces using HoVer representation of Census Data,” in 2011 International Conference on Emerging Trends in Electrical and Computer Technology, 2011, pp. 906–911. [10] E. Loper and S. Bird, “NLTK: the natural language toolkit,” arXiv Prepr. cs/0205028, 2002. [11] S. Loria, “textblob Documentation,” 2018. [12] T. N. Bureau, “Times Now-VMR Opinion Poll For Election 2019: PM Narendra Modi-led NDA likely to get 279 seats, UPA 149,” Times Now News, New Delhi, 2019. [13] I. T. News Desk, “Lok Sabha Election 2019: NDA may get thin majority with 275 seats, BJD may retain Odisha, YSR Congress may win Andhra, says India TV-CNX pre-poll survey,” India TV, 2019. [14] “2019 Indian General Election,” Wikipedia, 2019. [Online]. Available: https://en.wikipedia.org/wiki/2019_Indian_general_election. 53

2019 4th International Conference on Information Technology (InCIT2019) Comparison of 3D Point Cloud Processing and CNN Prediction Based on RGBD Images for Bionic-eye’s Navigation Ananya Kuasakunrungroj Toshiaki Kondo Sitapa Rujikietgumjorn School of Information, Computer and School of Information, Computer and National Electronics and Computer Communication Technology (ICT) Communication Technology (ICT) Technology Center Sirindhorn International Institute Sirindhorn International Institute Pathumthani, Thailand [email protected] of Technology of Technology Thammasat University Thammasat University Pathumthani, Thailand Pathumthani, Thailand [email protected] [email protected] Hirohiko Kaneko Wen-Nung Lie Graduate School of Decision Science Department of Electrical Engineering and Center for Innovative Research on and Technology Tokyo Institute of Technology Aging Society (CIRAS) Tokyo, Japan National Chung Cheng University [email protected] Chia-Yi, Taiwan, ROC [email protected] Abstract— Bionic Eye is the device to restore the vision for the commercial device. Moreover, Bionics eye has only a few the blind. This device is developed based on vision partway levels of stimulation, which means that it can stimulate only a stimulation knowledge. However, because of the bionic eye few levels of greyscale. In order to generate a high usability stimulation hardware limitation, possible stimulation pattern pattern from intricate real-life environment image, image are low-resolution images. In order to make those low-resolution processing algorithms are required [2]. images to be useful for blinds, image processing is one of the major challenges in bionic eye development. According to real- One of the interesting topics to generate useful pattern is life applications, RGB-D images can be applied to enhance to increase blind’s mobility by using the bionic eye as object detection for a bionic eye. Moreover, the depth walkway guidance to avoiding the obstacle. Many studies in information can be used to generate a danger map for this topic as [3] used the intensity-based method in order to navigating the blind to the safe partway and avoid the obstacle segment the obstacle from the walkable way because depth that might be harmful to them. This paper focuses on comparing images are noisy and might lack some information due to the danger map results of several RGB-D processing methods RGB-D camera limitation. However, the amount of research on the bionic eye walkway navigation applications. The methods in robotic navigator application shows that in situations with under comparison include 3D point cloud processing and four a low intensity of light can benefit from using the depth image. models of CNN (convolution neural network) semantic segmentation based on RGB-D images. The images from SUN The depth images are significantly helpful to detect the RGB-D scene understanding benchmark suite are used as input obstacle that barely separated by intensity image. Moreover, images in this experiment. The result of this research shows that if both intensity-based information (RGB image) and depth each method has its own advantage. However, the convolution based information are used together, classification accuracy neural network seems to be significantly better in accuracy, can significantly be improved [3]. precision, and recall, when comparing with other resulting images. 
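For illustration, RANSAC-based ground-plane extraction of the kind discussed above can be sketched with the Open3D library as follows. Open3D, the thresholds and the floor test on the plane normal are assumptions rather than details of the compared implementations.

```python
# Hedged sketch: RANSAC plane segmentation on an RGB-D point cloud,
# keeping the plane as the floor only if its normal is close to vertical.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.ply")        # hypothetical point cloud file

plane_model, inliers = pcd.segment_plane(distance_threshold=0.02,
                                         ransac_n=3,
                                         num_iterations=1000)
a, b, c, d = plane_model

normal = np.array([a, b, c]) / np.linalg.norm([a, b, c])
if abs(normal[1]) > 0.9:                          # assumes the y-axis points up
    floor = pcd.select_by_index(inliers)
    obstacles = pcd.select_by_index(inliers, invert=True)
```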
Even, many methods can be applied in order to classify the floor from each point in RGB-D image, point cloud processing Keywords—Point cloud processing, RGB-D, CNN, Walkway, and RGB-D convolution neural networks (CNN) are an Bionic eye interesting method to study for this application. Because of its implement ability that flexible to many situations, especially I. INTRODUCTION for the noisy image as the testing real-life room images dataset that crowded by several obstacles. In 2017, WHO’s fact sheet reported that around 36 million people in this world are suffered from Blindness [1]. In order The ground plane segmentation base on RANSAC method to help the blind, many assistive devices and methods such as is a part of 3D point cloud processing that used in other braille language, white cane (blind stick), and tactile graphic application as robotic mapping and navigation application. were developed. The bionic eye is one of the devices that was RANSAC method is based on statistical probability; even the developed, by the knowledge about phosphene (bright spot) information from a 3D camera might be noisy by light artifact, pattern visual pathway stimulation, for blind to be able to see RANSAC still possible to separate the plane from the others things. However, because of current hardware limitation, the things and we can check the plane that matches to the floor resolution of the image that can be stimulated to the blind is condition as a floor [3]. only around 6 × 10 pixels (epiretinal bionic eye: Argus II) in 54
2019 4th International Conference on Information Technology (InCIT2019) Another method is the segmentation method base on simulation. The post-processing process also fulfills some are convolution neural network (CNN). Many researchers applied that cannot be segmented by the RANSAC method due to the this method to solve multiple situations. This method is also limitation of depth information. The overview process of 3D probability base method; it can handle the light artifact point cloud processing is shown in Fig. 1. problem. This method requires the training images to make statistic decision on the testing image. According to this Fig. 1. Overview of 3D point cloud processing. statistic decision, convolution neural network (CNN) can be used to solve the complicated situation base on the training set III. RGB-D CONVOLUTION NEURAL NETWORK that was applied. Moreover, there are many segmentation According to [9] the semantic labeling can be applied to models available to use. Those model can be used to reduce segment the walkway apart from the obstacle by RGB image. the development time. However, the depth data may help to improve the result for the complicated image that cannot be segmented by RGB This paper used 3D point cloud processing and RGB-D image alone. This paper used Seg-net and U-net, which are the convolution neural networks (CNN), as Seg-net and U-net, to general and well-known semantic segmentation models, in generated low-resolution image danger map (45 × 45 pixels 8 RGB-D image to compare with the RANSAC method. Even levels Bitmap) in camera view for walkway navigation as the both convolution neuron network and 2D image processing representation of subretinal Bionic eye’s phosphene pattern. already have the comparison study in this application[9] but This comparison in these two statistical-based methods can be the study for 3D processing and RGB-D convolution neural used as a base for further development in bionic eye’s network still required for complicated scene segmentation. application and other similar application. There are two main approaches to process RGB-D images in a convolution neural network, as illustrated in Fig. 2. The In order to reduce data preparation times, RGB-D images first structure, namely RGB-D fusion method (Fig. 2 (b)), from SUN RGB-D scene understanding benchmark suite [4] concatenates intensity base image (RGB images) with the are used as the real-life environment input image. This depth information first and then use the deep learning model benchmark suite contains 10,335 RGB-D images from NYU to extract the feature from the mixed information. While depth V2 dataset [5], Berkeley B3DO dataset [6], and SUN3D another model, RGB-D feature concatenation, started by dataset [7] and also provided RGB-D image, 3D point cloud, extracts the feature from RGB image and depth image and also ground truth image for trains the convolution neural separately then concatenated the features together. Each deep network. learning model has different result while using these two structures [10]. SUN RGB-D scene understanding benchmark suite is a large dataset that concludes the images from several 3D cameras. Each image in the dataset has a different resolution and a variety of obstacle. This variety of data are an excellent example for testing as real-life The paper is organized as follows, Section II is reviews related 3D point cloud processing technique and explain about the methods, that were used. Section III describes the models of convolution neural networks. 
Section IV presents the results from both methods, and Section V gives the conclusion of this paper.

II. 3D POINT CLOUD PROCESSING

For 3D point cloud processing, this work uses the RANSAC (random sample consensus) method to separate the floor plane from the other planes. RANSAC is a statistical, probability-based method that is widely used in many applications, including point cloud plane segmentation, due to its simple implementation. The RANSAC method starts with a random sample of the data and then estimates the plane parameters from that sample. After that, it calculates the error, identifies the inlier candidates, and repeats this process until it reaches a sufficient number of iterations [8]. After the floor is segmented as the inliers of the RANSAC fit, the segmented point cloud information is passed to a post-processing step based on a k-nearest neighbor method. This post-processing finds, for each point that has no usable depth information and therefore cannot be labeled by the RANSAC method, the nearest point with similar RGB characteristics. Then, the labeled results are converted into a camera-view, low-resolution danger map (low intensity for the safe area and high intensity for a near object that might be harmful to the blind) for the bionic eye stimulated pattern simulation.
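As a concrete illustration of the plane-fitting step described above, the following is a minimal NumPy sketch of RANSAC plane segmentation. It is not the implementation used in this work: the iteration count, the inlier distance threshold, and the floor-orientation test (a normal close to an assumed vertical axis of the camera frame) are illustrative assumptions.

```python
import numpy as np

def ransac_floor_plane(points, n_iters=500, dist_thresh=0.02,
                       up=np.array([0.0, 1.0, 0.0])):
    """Fit a plane to a 3D point cloud with RANSAC and accept it only if its
    normal is roughly aligned with the assumed 'up' direction (a floor-like
    plane). points: (N, 3) array. Returns (inlier_mask, (normal, d)) or (None, None)."""
    best_inliers, best_plane = None, None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        # Plane normal from the three sampled points.
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:            # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)
        inliers = dist < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    if best_plane is None:
        return None, None
    normal, d = best_plane
    # Keep only planes whose normal is close to vertical (floor condition).
    if abs(normal @ up) < 0.9:
        return None, None
    return best_inliers, best_plane

# Tiny demo: noisy points on the y = 0 plane plus random clutter.
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(-1, 1, 300),
                         rng.normal(0, 0.005, 300),
                         rng.uniform(0, 2, 300)])
clutter = rng.uniform(-1, 1, (100, 3))
mask, plane = ransac_floor_plane(np.vstack([floor, clutter]))
print(None if mask is None else int(mask.sum()))
```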
2019 4th International Conference on Information Technology (InCIT2019) Seg-net [11]and U-net [12]are chosen to be semantic The architecture of Seg-net (Fig. 3) begins with padding to segmentation models for compared between these two preserve the original size of the image. After that, perform structures. Seg-net and U-net are conventional convolutional convolution to extract the features. Then, batch normalized neural network models for semantic pixel segmentation and apply a rectified linear unit (ReLU) as an activation consist of multiple layers of encoder and decoder. The encoder function for this model. Continue by max pooling in order to is for classifying each point in the image and decoder part is translation invariance of the input image. Repeat this to map the low-resolution feature map into input resolution or performance for 4 times. Then, perform decoder network as higher resolution. Seg-net is developed before U-net model. It up sampling and use the convolution model for matching each is designed to reducing computation time and memory; It feature to each pixel. In order to Finalize the output image, suitable for scene understanding application in limited softmax classifier is applied to predict the segmentation class computation resource situation as this work [11]. However, in with maximum probability for each pixel[11]. term of scene minor detail understanding, U-net model which designed for biomedical image segmentation usually have a The architecture of U-net (Fig. 4) is similar to the Seg-net higher performance [12]. model because it also bases on convolution and rectified linear unit (ReLU). However, the segmentation border of each item This work evaluates both Seg-net and U-net models with region is predicted by duplicate high-resolution image and the two proposed structures, RGB-D fusion and RGB-D concatenated it with up sampling features. Because of this feature concatenated. In other word, this paper evaluates total border prediction, U-net has an outstanding localization 4 neuron network model mention each model as Seg-net precision [12]. feature concatenated, Seg-net RGB-D fusion, U-net feature concatenated, and U-net RGB-D fusion. Then, the post- In the experiment, 10335 images in the dataset are processing process, as used in the 3D point cloud process, is separated into 3 groups, 5230 images for a training dataset, applied to the segmentation results. The architecture of Seg- 2555 images for a validation dataset, and 2550 images for a net and U-net are shown in Figs. 3 and 4. testing dataset. To find an appropriate epoch number, U-net model is trained with 300 epochs from 1 to 300. Fig. 5 shows that the accuracy of the validation dataset seems to be in the steady-state since around epoch 50, so the number of epoch for training is set to 50 for all model. The graph in Fig. 5 illustrates the relationship between the epoch number and accuracy from the training dataset and the validation set. Fig. 2. Comparison of RGB-D Features Concatenation and RGB-D Fusion Fig. 3. The architecture of Seg-net model. neural network model. (a) RGB-D features concatenation method structure and (b) RGB-D fusion method structure. 56
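To make the difference between the two RGB-D input structures of Fig. 2 concrete, the following is a minimal PyTorch sketch (PyTorch is an assumed framework; the paper does not state its implementation) with toy encoders rather than the actual Seg-net and U-net models. Early fusion concatenates the RGB and depth channels before feature extraction, while feature concatenation merges features extracted by separate RGB and depth branches.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """RGB-D fusion: concatenate RGB and depth at the input, then encode."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)   # fuse before feature extraction
        return self.head(self.encoder(x))

class FeatureConcatNet(nn.Module):
    """Feature concatenation: separate RGB and depth encoders, merge features."""
    def __init__(self, n_classes=2):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU())
        self.rgb_branch, self.depth_branch = branch(3), branch(1)
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, rgb, depth):
        feats = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head(feats)

rgb = torch.randn(1, 3, 64, 64)
depth = torch.randn(1, 1, 64, 64)
print(EarlyFusionNet()(rgb, depth).shape, FeatureConcatNet()(rgb, depth).shape)
```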
2019 4th International Conference on Information Technology (InCIT2019) TABLE I. RESULT ACCURACY & PRECISION OF EACH METHOD 3D point Seg-net Seg-net U-net U-net cloud* Feature Feature concate- RGB-D concate- RGB-D nated fusion nated fusion True 649379 652326 758217 592149 749863 positive pixels pixels pixels pixels pixels False 249083 271110 156219 331287 173573 negative pixels pixels pixels pixels pixels False 297606 179620 751496 123924 526132 positive pixels pixels pixels pixels pixels True 3216407 4060694 3488818 4116390 3714182 negative pixels pixels pixels pixels pixels total 4412475 5163750 5163750 5163750 5163750 pixels pixels pixels pixels pixels Accuracy 0.876 0.913 0.822 0.912 0.864 Precision 0.686 0.784 0.502 0.827 0.588 Recall 0.723 0.706 0.821 0.641 0.812 *3D point cloud cannot successfully work on 371 images of 2550 test images Fig. 4. The architecture of U-net model. Table I shows that the feature concatenated CNN can effectively segment the floor from the obstacles better than the Fig. 5. Graph of accuracy for each epoch number while training the U-net RGB-D fusion CNN and 3D point cloud processing. On the model with this dataset. other hand, the RGB-D fusion method has the best recall compared to the other two methods. After segmented by CNN, the segmented images are processed with the same post-processing process as a result Moreover, after applied 3D point cloud method with all from RANSAC method to generate the low-resolution image 10335 images in the dataset, the result shows that 1702 images and to fill some areas of the image that lack depth information cannot be computed with this method especially the image and generate the danger map. that does not have enough floor information and the other 2143 images misrecognizes tables, beds, sofas, and other IV. RESULT AND DISCUSSION obstacles as the floor. Examples of the results are shown in Figs. 6 and 7. Table I shows the accuracy, recall, and precision of the result from However, when comparing each method result with some each method compared to the ground truth in the dataset. examples shown in Figs. 6 and 7, point cloud processing Every pixel (45 × 45 pixels) of 2550 segmentation result method seems to work well in the hallway image (Fig. 6) as image, a total of 5,163,750 pixels, were separated into 4 well as RGB-D fusion CNN method, while the Feature groups: True positive, False negative, False positive, and True concatenated CNN cannot recognize walkway in the far negative. Due to the limitation of RANSAC method, 371 distance according to the missing of depth information. images in 2550 image can’t be computed, the accuracy of 3D point cloud processing was computed by only 2179 images. In contrast to Fig. 6, Fig. 7 illustrates a situation with small detail as the leg of the white chair. In this case, U-net model and 3D Point cloud method can successfully segment the leg of the white chair while it was missing in the Seg-net model’s result. Comparing between feature concatenated model and RGB-D fusion model, feature concatenated model can segment the details of the obstacle better than RGB-D fusion model. As mention in Table I, 3D point cloud has a limitation that if the image is included many things or have only one essential plane in the image; the result of RANSAC method will be resulting as an error because it can not specify the plane (inlier) from the other (outlier). The CNN base method is not affected by this limitation. 
Conversely, a CNN-based method cannot handle situations that are not included in the training set. Moreover, time and computational resources are required to train the CNN model. These limitations are the main concerns when implementing a CNN-based method on a bionic eye device.
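The accuracy, precision, and recall figures in Table I follow directly from the pixel counts. A small Python sketch (the function name is ours) reproduces the 3D point cloud column of the table:

```python
def pixel_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from pixel-level counts, as in Table I."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# 3D point cloud column of Table I: matches the reported 0.876 / 0.686 / 0.723.
print(pixel_metrics(tp=649379, fn=249083, fp=297606, tn=3216407))
```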
2019 4th International Conference on Information Technology (InCIT2019) Fig. 6. Example of result 1. (a) Original color image, (b) Original depth Fig. 7. Example of result 2. (a) Original color image, (b) Original depth image, (c) Down sampling image of floor groundtruth from the dataset (mask image, (c) Down sampling image of floor groundtruth from the dataset (mask the floor as white pixel), (d) Danger map result from 3D point cloud the floor as white pixel), (d) Danger map result from 3D point cloud segmentation method, (e) Danger map result from Seg-net feature segmentation method, (e) Danger map result from Seg-net feature concatenated method, (f) Danger map result from Seg-net RGB-D fusion concatenated method, (f) Danger map result from Seg-net RGB-D fusion method, (g) Danger map result from U-net feature concatenated method, and method, (g) Danger map result from U-net feature concatenated method, and (h) Danger map result from U-net RGB-D fusion. (h) Danger map result from U-net RGB-D fusion. V. CONCLUSION The 3D point cloud is useful for specific details such as As a result, from Section IV, RGB-D CNN demonstrates distant points or small objects. The CNN model with feature good performances to generate low-resolution danger maps concatenated method has higher accuracy and precision while, based on the CNN semantic segmentation model, while 3D the RGB-D fusion method has a higher recall. In terms of point cloud processing method cannot perform in some bionic eye applications, a feature concatenated method seems situations. However, each method still has an advantage in to be the best method for this dataset. However, this different situations. comparison is based on only two neuron network models, U- net and Seg-net. As a future work, we plan to investigate other neuron network models. 58
2019 4th International Conference on Information Technology (InCIT2019) ACKNOWLEDGMENT This work was supported by Thailand Advanced Institute of Science and Technology - Tokyo Institute of Technology (TAIST-Tokyo Tech) and got some technical support by the College of Engineering, National Chung Cheng University, Taiwan, Republic of China. REFERENCES [1] R. R. A. Bourne et al., “Magnitude, temporal trends, and projections of the global prevalence of blindness and distance and near vision impairment: a systematic review and meta-analysis,” Lancet Glob. Heal., vol. 5, no. 9, pp. e888–e897, 2017. [2] H. Lorach, O. Marre, J. A. Sahel, R. Benosman, and S. Picaud, “Neural stimulation for visual rehabilitation: Advances and challenges,” J. Physiol. Paris, vol. 107, no. 5, pp. 421–431, 2013. [3] C. McCarthy, N. Barnes, and P. Lieby, “Ground surface segmentation for navigation with a low-resolution visual prosthesis.,” Conf. Proc. IEEE Eng. Med. Biol. Soc., vol. 2011, pp. 4457–60, 2011. [4] S. Song, S. P. Lichtenberg, and J. Xiao, “SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite.” [5] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor Segmentation and Support Inference from RGBD Images.” [6] A. Janoch et al., “A Category-Level 3-D Object Dataset: Putting the Kinect to Work.” [7] J. Xiao, A. Owens, and A. Torralba, “SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels,” 2013. [8] S. Choi, T. Kim, and W. Yu, “Performance Evaluation of RANSAC Family.” [9] L. Horne, J. Alvarez, C. McCarthy, M. Salzmann, and N. Barnes, “Semantic labeling for prosthetic vision,” Comput. Vis. Image Underst., vol. 149, pp. 113–125, 2016. [10] L. Shao, Z. Cai, L. Liu, and K. Lu, “Performance evaluation of deep feature learning for RGB-D image/video classification,” Inf. Sci. (Ny)., vol. 385–386, pp. 266–283, 2017. [11] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, 2017. [12] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation.” 59
2019 4th International Conference on Information Technology (InCIT2019) Disaster Risk Management Training Simulation for People with Hearing Impairment: A Design and Implementation of ASL Assisted Model Using Virtual Reality Arlene R. Caballero Jasmin D. Niguidula Jonathan M. Caballero College of Technology College of Information Technology College of Information Technology Lyceum of the Philippines University Education Education Manila, Philippines Technological Institute of the Philippines Technological Institute of the Philippines [email protected] Manila, Philippines Manila, Philippines [email protected] [email protected] Abstract—— Disaster is the unforeseen and most of the decisions on how to respond to hazard and disaster risk [5]. time a sudden event that causes destruction, human suffering, To reduce the risk, there should be an implementation of great damage on property and economic losses. As described prevention and mitigation measures. This involves the by the United Nation International Strategy for Disaster process of anticipating potential sources of risk, practicing Reduction (UNISDR), the impact of disaster could be more safety measures to avoid hazard and reduce the economic, than just a damage to property but may also result to physical social and environmental impacts with proper intervention and mental disabilities, and the worst effect of its impact can [6]. result to loss of life. Every person including persons with disability should be aware of any vulnerable and harmful There are different disaster trainings, safety measures condition that anyone may possibly place in. By proper and prevention programs being conducted by the planning and appropriate training to disaster management, government and non-government agencies. These programs the increasing risk and impact of disaster may be reduced. should be designed as disability-inclusive programs to be This study emphasizes on the design and implementation of able to guarantee an equal and accessible training to persons ASL assisted simulated training on disaster risk management. with disability. This social inclusion has been mentioned in This focuses on facilitating programs on disaster awareness, the United Nations Sustainability Development Goals capacity building activities and real environment training for (UNSDG) in terms of promoting inclusive education and proper respond to disasters designed for people with hearing training to individuals who have disabilities [7] [8]. impairment. This study also aims to design an ASL assisted application that uses virtual reality technology to be able to Persons with disability are also called vulnerable simulate disaster drills and exercises to people with hearing populations who may have special needs. This includes impairment without exposing them to harmful conditions. The persons with hearing limitations [7]. The live trainings for methods used to design and implement this study were disasters such as earthquake, typhoon, tsunami and fire may qualitative approach to design and prototyping techniques. be expensive and dangerous at some point [9] [10] [11]. Therefore, these trainings may require specific drills and Keywords— Social Inclusion, Disability-Inclusive, training exercises. This may necessitate additional resources Emergency Preparedness, Disaster Simulation, Virtual Reality and experts who are specialized in conducting live trainings Training specifically for people with hearing impairment. Since trainings on disasters are dangerous and expensive, I. 
INTRODUCTION technology is used to facilitate disaster drills and exercises through virtual training simulations [9]. Simulation is a Disaster is the unforeseen and most of the time a sudden virtual activity that allows the user to be immersed in a event that causes destruction, human suffering, great damage tangible scenario by imitating or mimicking the experience on property and economic losses [1] [2] [3]. This is a done in real world. These simulations are computer situation where the impact needs a serious attention from the programs capable of generating a three-dimensional images national or international community because the damages on where the user can easily move and interact. With this, the affected community exceeds their ability to cope using training simulations for earthquake, fire, tsunami, flood and its own resources [1] [4]. typhoon which is dangerous to perform can be achieved without exposing the people with hearing impairment in The impact of disaster as described by the United Nation hazardous condition [12] [13]. International Strategy for Disaster Reduction (UNISDR) could be more than just a damage to property but may also This study aims to implement a disaster risk management result to physical and mental disabilities, and the worst effect training simulation which is purposely designed for people of its impact can result to loss of life [2]. Undeniably, the with hearing impairment. This explores disaster trainings effect can be a serious damage not only for a short period of that focuses on the simulation of earthquake, fire, tsunami or time but possible long term effect to an individual. Every flood, and typhoon conditions. The designed prototype person including persons with disability should be aware of provides teaching and training simulations aided with any vulnerable and harmful condition that anyone may American Sign Language (ASL) feature to assist the people possibly place in. Most essentially, to be prepared and be with hearing impairment. The sign language avatar will help knowledgeable on how to respond to any life threatening the persons with hearing impairment to realize the content of situation at all times. the disaster training modules. The modules include disaster awareness, emergency preparedness, and capacity building Disaster risk management, emergency preparedness, and activities for proper response to disasters. At the end of every proper training can help an individual to make educated 60
2019 4th International Conference on Information Technology (InCIT2019) module, an assessment with ASL assist feature is provided B. Social Inclusion to determine the competency of the people with hearing Social inclusion as defined by the United Nations is the impairment who got immersed in the virtual reality environment. “process of improving the terms of participation in society, particularly for people who are disadvantaged, through II. RELATED STUDIES enhancing opportunities, access to resources, voice and respect for rights” [15]. This is an advocacy which ensures A. Natural Disaster that all groups of people within the society will be given an To be able to classify the most common phenomenon equal importance and value including persons with disability. included in this study, this section reviews the different types of disasters as described by the Centre for Research on the The concept of disability inclusiveness is captured in the Epidemiology of Disasters (CRED) [1]. This section further 17 United Nations Sustainable Development Goals discusses social inclusion of people with disabilities and the (UNSDG). There are 33 core articles of the United Nations various applications of virtual reality to be able to realize Convention on the Rights of Persons with Disabilities real-life drills and hazardous condition in a simulated (CRPD) which discusses disability, human rights and environment. sustainable development. This means that the 17 goals in the UNSDG mandates all the member countries to ratify Natural disaster as described by the American Red Cross disability-inclusiveness embedded in their programs. The includes climate occurrences such as tornadoes and CRPD compels all the 33 member countries of the United hurricanes, avalanches and floods, underground disasters, Nations to conduct programs for disaster awareness, disaster and biological disasters such as transmissible disease preparedness and response which are inclusive of, and outbreaks [14]. These natural phenomenon is said to be accessible to, people with a disability [16]. natural forces that man can hardly control. This may cause damage to property and the worst scenario is the loss of life During the occurrence of natural disasters, people with [2]. disability are being separated with their family members because of their immobility and inflexibility to respond to TABLE I. DISASTER SUBGROUP DEFINITION AND CLASSIFICATION the circumstance. This condition increases their vulnerability during natural disasters and this necessitate the community Disaster Definition Disaster Main to impose the disability inclusion in disaster management Subgroup Types [8][16]. Geophysical Events originating from solid Meteorological earth Earthquake, To be able to include people with disability, the Volcano, Mass community should be able to include these people in all Hydrological Events caused by short- Movement (dry) phases of disaster management. This includes disaster risk lived/small to meso scale reduction – preparedness, prevention and mitigation, along Climatological atmospheric processes (in the Storm with disaster relief, rehabilitation and recovery [16]. With spectrum from minutes to days) this, the people with disability will be educated on how to Biological Events caused by deviations in Flood, Mass properly respond to disaster. 
The government should also the normal water cycle and/or Movement (wet) incorporate disability measures by conducting disaster overflow of bodies of water awareness program, giving special trainings on common Extreme occurrences of disasters, capacity building activities, and caused by wind set-up Temperature, real environment training for proper respond to disasters Events caused by long- Drought, Wildfire designed specifically for people with disability [17]. lived/meso to macro scale processes (in the spectrum from Epidemic, Insect C. Virtual Reality in Training intra-seasonal to multi-decadal Infestation, One of the technologies that is fast emerging today is climate variability) Animal Stampede virtual reality. Virtual reality (VR) is a computer program Disaster caused by the exposure that generates a three-dimensional images that used to of living organisms to germs and imitate or mimic the real-life world [17] [12]. In the virtual environment, the users can easily move and interact with the toxic substances objects designed within the immersive environment. Table 1 depicts the natural disaster subgroup definition The VR technology allows the users to explore an and classification reported by the Centre for Research on the immersive experience which exposes the user to an actual Epidemiology of Disasters (CRED) in the Annual Disaster disaster condition using a real- life simulation. Exposing a Statistical Review in 2011 [1]. Generally, there are five person to real environment by adding a greater level of disaster subgroups namely Geophysical, Meteorological, realism to respond to actual disaster condition has a benefit Hydrological, Climatological, and Biological. As shown on of saving time, cost, equipment, and safety [9]. the table, the 5 subgroups includes the disaster main types such as Earthquake, Volcano, Mass Movement (Dry), Safety is the major advantage of using simulated training Storm, Flood, Mass Movement (Wet), Extreme specially on performing hazardous task. This is why the VR Temperature, Drought and Wildfire, Epidemic, Insect is used in different field for simulating military training for Infestation, and Animal Stampede. battle, flight simulator, medical and surgical operations, engineering design and constructions, and even in virtual In this study, the most common occurring natural tours for business. The VR technology serves as an disasters were selected to educate the vulnerable population alternative practice in performing real-life drills and which are the persons with disability specifically, the people exercises [9] [10] [18] [12]. with hearing impairment. The natural phenomenon such as earthquake, flood/tsunami, fire, and typhoon were explored to reduce the cost and fatalities during the occurrence of these disasters [10] [9]. 61
2019 4th International Conference on Information Technology (InCIT2019) In the same manner, the VR technology can also be used Fig. 2. Design 2 Infographics Approach Using A/B to disaster risk management training to avoid high exposure Testing Technique [19] to hazard most importantly to the persons with disability since they experience increased vulnerability during natural Figures 1 and 2 shows the screenshots of the storyboard disasters [16] [17]. from A/B testing technique used in the study. This split testing approach allowed the respondents to compare the III. METHODS two variants of visualized design concept. As shown on the figure, the test-variants illustrates Design 1 as variable A A. Qualitative Approach and Design 2 as variable B. The Design 1 is a guideline- To be able to design and implement a disaster risk based design which prompts the user what to do while the Design 2 proposed an infographic approach to learning prior management simulated training for people with hearing to taking the simulated training. impairment, this study involved three (3) local city government where National Disaster Risk Reduction and B. Development and Protoyping Technique Management Council of the Philippines (NDRRMC) offices are confined. From the pool of identified experts, a In order to design and implement the disaster risk qualitative approach to data were conducted. management training simulation for people with hearing impairment, prototyping techniques such as Rapid In the focus group discussion and in-depth interview, Prototyping (RP) and Virtual Prototyping (VP) were applied. there were four (4) experts from each of the three (3) local The Rapid Prototyping (RP) is a technique where three- city government who participated in the series of small group dimensional model of the required design will be initially discussion. Among the 12 pool of experts, 3-5 members were created using computer [20]. On the other hand, Virtual facilitated to be able to discover the group’s opinion about a Prototyping (VP) consists of many capabilities such as particular area or topic on conducting disaster trainings. creation and viewing of three-dimensional solid models with Each of them were directed questions for affirmation and various colors and surface textures. VP uses additional validation. This technique allows the researcher to get software simulation tools which are embedded in the system important feedback and capture small in order to the design as a plug-ins to digitally generate animations of mechanisms, the simulated training on disaster risk management. finite element analysis (FEA) and computational fluid dynamics (CFD) of mechanical products and structures. In order to test whether the design of the simulated training is acceptable to the experts and persons with hearing In this study, virtual prototyping were done in order to impairment, a controlled experiment was conducted. In the convert the conceptual models into object models within the controlled experiment, there were two (2) design variants program. In the simulated training environment, the common presented to the respondents using the A/B Testing place that will be used as a ground for simulated training technique [19]. The design variants were presented using drills were constructed. This included disaster conditions for storyboard and object prototyping. The storyboard earthquake, typhoon, fire and flood/tsunami. 
Among other techniques allows the user to participate in the requirement objects that were modeled in the virtual world are fire, validation process by giving comments and criticism waves, raindrops, buildings, equipment, and more others in interactively. While the object prototyping technique take which the users need to interact with. part in discovering the specific needs of the user and build a quality system based on the requirement [20]. C. Data Collection To validate whether the designed prototype meets the The standard data used in this study were gathered from requirement of the persons with hearing impairment, there the government agency named National Disaster Risk were two (2) respondents who are experts in American Sign Reduction and Management Council of the Philippines Language (ASL) participated in the A/AB Testing sessions. (NDRRMC) mandated by the Philippine government to This stage is fundamental in the controlled experiment since propagate disaster risk management and emergency every instruction should be interpreted clearly by the sign preparedness. The content and guidelines simulated in the language avatar used in the program to be able to system prototype basically modelled from the actual communicate the instructions properly. disaster training provided by the agency. As a secondary data, different sources such as published articles, books on Fig. 1. Design 1 – Guideline-based Approach Using A/B disaster management, training manuals and standard Testing Technique [19] learning tools provided by other government. 62
2019 4th International Conference on Information Technology (InCIT2019) IV. SYSTEM FEATURES Figure 4 shows the fire training simulation when the user One of the benefits of this study is that, the persons do do not follow the guiding rules demonstrated in the not need to be physically exposed to any disastrous condition simulated training. In this scenario, the user opted to stay in to be able to develop the competency on how to properly the house and do nothing. In this figure, the scenario respond to disasters. The designed prototype is capable of simulates what could possibly happen to the person in doing viewing the ASL feature to assist the people with hearing the improper course of action. impairment to better appreciate the instructions depicted on the screen. C. Cases of Typhoon A. Cases of Earthquake In the cases of typhoon, the training simulation In the cases of earthquake, the prototype is composed of demonstrates an indoor and outdoor scenarios during typhoon. This includes the standard safety measures to be virtual training before earthquake, during earthquake, and taken during typhoon to reduce the risk of exposing oneself after earthquake simulations. The before earthquake virtual to hazard conditions. training contains preparation guidelines that simulate on what to prepare before the earthquake. While during earthquake scenario shows a simulation on what do to during earthquake. Lastly, the after earthquake scenario simulates what to respond when there are aftershocks. Fig. 3. Earthquake Simulation before Earthquake Fig. 5. Typhoon Training Simulation (a) Figure 3 depicts a simulation before earthquake. This Figure 5 shows the typhoon training simulation which simulation focuses on the preparation and guidelines before demonstrates the guiding rules and safety measures to be the earthquake occurs. This shows a list of guiding rule and taken during typhoon. This indoor scenario trains the user course of action to be taken before the disaster comes. The on what to prepare in order to survive in an extreme guideline is visible on the left part of the screen to be able to emergency cases. In one of the safety measures review some critical points performed in the simulated demonstrated, the user was asked to store some food and training. clean water as a safety measure and survival when the worst B. Cases of Fire case scenario of typhoon happened. In the cases of fire, a simulated exercises on how to D. Cases of Tsunami/Flood handle equipment such as fire extinguisher is provided. Aside from the text that can be seen on the screen, the The cases of tsunami/flood training simulation is questions and choices are interpreted in sign language. This composed of outdoor scenarios that demonstrates the worst fire module also simulates real-life drills and training on how cases of tsunami or flood. The simulated training shows the to escape during fire incidents. different safety measures that can be taken by a person to be able to survive in the cases of tsunami or flood. Fig. 4. Fire Training Simulation (b) Fig. 6. Tsunami/Flood Training Simulation (a) Figure 6 shows the simulation in a worst case scenario of tsunami or flood. As shown on the figure, the sign language avatar interprets how a person might end up to drowning as the simulation demonstrates the scenario. 63
2019 4th International Conference on Information Technology (InCIT2019) V. RESULTS educated decisions on how to respond to hazard and disaster risk. To demonstrate the result of the A/B test, there were 12 participants from the NDRRMC representing the 3 local This study further confirms that disaster training city governments and 2 sign language experts whom all program for earthquake, fire, typhoon and tsunami/flood participated in the series of focus group discussion can be performed without spending on expensive equipment composed of 3-5 members. A total of 14 respondents for deploying live trainings and physical drills to respond on participated on the controlled experiment. The following disaster. Moreover, social inclusion as mandated by the table shows the result of the A/B testing conducted for United Nations can also be realized because most people disaster risk management training simulation for people with disability are the ones who experience high with hearing impairment. vulnerability during natural disaster occurrences. TABLE II. A/B TEST RESULT REFERENCES Frequency Percentage A/B Test [1] D. Guha-sapir, P. Hoyois, and R. Below, “Annual Disaster Statistical 11 79% Review 2010: The numbers and trends,” Rev. Lit. Arts Am., pp. 1– Design 1 50, 2011. -Guideline-based Approach 3 21% Design 2 14 100% [2] UNISDR, “2009 UNISDR Terminology on Disaster Risk -Infographic Approach Reduction,” Int. Strat. Disaster Reduct., pp. 1–30, 2009. TOTAL [3] R. Fino, M. J. Lin, A. Caballero, and F. F. Balahadia, “Disaster awareness simulation for children with autism spectrum disorder Table 2 shows the result of the A/B test conducted in using android virtual reality,” J. Telecommun. Electron. Comput. the study. The table revealed that the Design 1 perceived by Eng., vol. 9, no. 2–6, 2017. the experts to be more applicable in terms of learning and performance. As shown on the table, there were 11 out of [4] UNISDR, “Terminology: Basic Terms of Disaster Risk Reduction,” 14 or 79% of the participants selected the Design 1 which is Terminol. UNISDR, pp. 1–8, 2004. a Guideline-based Approach. While there were 3 out of 14 or 21% of the participants selected Design 2 which is the [5] R. W. Perry and M. K. Lindell, “Preparedness for Emergency Infographic Approach. The result revealed that the Response : Guide- lines for the Emergency Planning Process,” guideline-based approach design performs conducive Disasters, vol. 27, no. 4, pp. 336–350, 2003. method for simulated training on disaster. [6] M. L. Carreño, O. D. Cardona, and A. H. Barbat, “A disaster risk management performance index,” Nat. Hazards, vol. 41, no. 1, pp. 1– 20, 2007.M. Young, The Technical Writer’s Handbook. Mill Valley, CA: University Science, 1989. [7] E. B. Beckjord et al., “Enhancing Emergency Preparedness , Response , and Recovery Management for Vulnerable Populations Task 3 : Literature Review,” 2008. TABLE III. T-TEST: PAIRED TWO SAMPLE FOR MEANS [8] United Nations, “#Envision2030: 17 goals to transform the world for persons with disabilities,” United Nations Enable, 2018. [Online]. Design 1 Design 2 Available: Mean 0.79 0.21 https://www.un.org/development/desa/disabilities/envision2030.html. Variance 0.18 0.18 [9] R. M. Satava, “Virtual reality surgical simulator,” Surg. Endosc., vol. t Stat 2.51 7, no. 3, pp. 203–205, 1993. P(T<=t) one-tail 0.01 [10] P. B. Andreatta et al., “Virtual reality triage training provides a viable solution for disaster-preparedness,” Acad. Emerg. Med., vol. 17, no. 
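Assuming the 11-versus-3 split reported in Table II and coding each of the 14 evaluators' choices as 1 (Design 1) or 0 (Design 2), the paired test of Table III can be reproduced with SciPy (an assumed tool, not necessarily the one used by the authors):

```python
import numpy as np
from scipy import stats

# 14 evaluators: 1 = chose Design 1 (guideline-based), 0 = chose Design 2.
design1 = np.array([1] * 11 + [0] * 3, dtype=float)
design2 = 1.0 - design1

t_stat, p_two_tailed = stats.ttest_rel(design1, design2)
print(f"t = {t_stat:.2f}, one-tailed p = {p_two_tailed / 2:.2f}")
# Prints t = 2.51 and a one-tailed p of about 0.01, matching Table III.
```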
Table 3 shows the result of A/B Testing variable. The 8, pp. 870–876, 2010. result of the paired two-sample design that compares the random sample of experts participated in the study. The [11] J. Rickel and W. Johnson, “Virtual humans for team training in Design 1 has an increase in scores based on the evaluation virtual reality,” AI Educ., no. July, pp. 585–594, 1999. of the expert respondents with M=0.79 compared to Design 2 with M=0.21. Further, as depicted in the statistical [12] F. Kuijper, “HMD based virtual environments for military training- analysis of the pair t-Test, t (19) = -4.35, p = 0.01. Two cases,” Fys. EN Elektron. LAB TNO HAGUE (NETHERLANDS)., 2000. VI. CONCLUSION [13] W. D. McCarty, S. Sheashy, P. Amburn, M. R. Stytz, and C. Switzer, This study design and implement a simulated training “A Virtual Cockpit for a Distributed Interactive Simulation,” IEEE model for disaster risk management specially designed for Comput. Graph. Appl., vol. 14, no. 1, pp. 49–54, 1994. people with hearing impairment. It was discovered that simulations on disaster trainings can be designed and [14] American Red Cross, “Disaster Action Team Handbook,” in Disaster conducted to the community including persons with Action Team Handbook, p. 3. disability such as people with hearing impairment. It was also realized that the simulated training on disasters are [15] United Nations, “Identifying social inclusion and exclusion,” potentially capable of exposing a person to actual condition Leaving No One Behind – Imp. Incl. Dev., pp. 17–28, 2016. and scenario of disaster without causing harm to person immersed in hazardous environment. This study confirms [16] Christian Blind Mission, “DISABILITY INCLUSION : DISASTER that simulated training designed for disasters can serve as an alternative tool to develop competencies in making MANAGEMENT,” 2001. [Online]. Available: https://www.cbm.org/Inclusion-Made-Easy-329091.php. [17] A. R. Caballero and J. D. Niguidula, “Disaster Risk Management and Emergency Preparedness,” in Proceedings of the 4th International Conference on Human-Computer Interaction and User Experience in Indonesia, CHIuXiD ’18 - CHIuXiD ’18, 2018, pp. 31–37. [18] J. Rickel and W. L. Johnson, “Virtual humans for team training in virtual reality,” Proc. 9th Int. Conf. Comput. Educ., no. September, pp. 3–11, 1999. [19] Wingify, “The Complete Guide to A/B Testing.” [Online]. Available: https://vwo.com/ab-testing/. [20] K. H. Madsen and P. H. Aiken, “Experiences using cooperative interactive storyboard prototyping,” Commun. ACM, vol. 36, no. 6, pp. 57–64, 1993. 64
Segmentation of Shinbone Interosseous Space using GVF Techniques

Siwakorn Artraksa, Burapha University, Thailand, [email protected]
John Gatewood Ham, Burapha University, Thailand, [email protected]
Krisana Chinnasarn, Burapha University, Thailand, [email protected]

Abstract— In this research, a new shinbone interosseous space segmentation method was proposed. The X-ray images were obtained from Dual Energy X-ray Absorptiometry (DXA) and consisted of 3 components (muscle, fat, and bone). The DXA scanner produces two different images, a gray image and a color image, each containing different information. Muscle and fat are the two components that can be used to calculate muscle mass. The bone area is used to compute an estimate of the bone mineral density (BMD) measurement as an osteoporosis indicator. Muscle mass is used for the body mass index calculation. X-ray images are the main source of information to measure muscle, fat, and bone area in the human body. The main problems are the ambiguous outline of each component, the position of the legs placed into the X-ray machine, and the variety of leg shapes. Traditional methods such as watershed transformation, noise cancellation, and image enhancement were used in preprocessing the images. Analysis of light concentration fluctuations to determine the location, together with the Gradient Vector Flow (GVF), was the technique used to find the region of interest (ROI), the shinbone interosseous space. The Jaccard Index percentage reaches 93.92%, and the Overlapped Area percentage reaches 93.51%.

Keywords— X-ray images, Muscle Mass, ROI, GVF Technique, Interosseous space

I. INTRODUCTION

The body is mainly composed of muscles, fat, bones and water. These four components make up most of the weight of the body. Muscles are an important part that helps the body to move; the body needs more than just bones or joints. The amount of muscle mass is not stable and depends on many changing factors. As age increases, there is less movement and the amount of muscle mass will decrease. Having strong muscles, however, will help the body increase its metabolic rate, bone density, and bone strength. This in turn will reduce pain in the joints, reduce fractures of the bones, and help control weight. When the muscle mass is less than a built-in low threshold, the body will focus on building muscle rather than treating injuries. Chemicals that help build muscle also help to break down fat and reduce stress. Body weight alone cannot determine the muscle mass, because of the body's other components such as fat and water. People who have a greater proportion of muscle mass have a more effective metabolic system than people with a smaller percentage of muscle in their total body weight.

There are currently many ways to determine the amount of muscle. Calculation from weight and height is used for the Body Mass Index (BMI), which is an estimate of initial body fat. If the body mass index is high, a higher risk exists of diseases such as diabetes, high blood pressure, high blood cholesterol, and certain types of cancer. Traditional methods for calculating the Body Mass Index (BMI) may be wrong: elderly patients may have bone problems, resulting in a lower height due to aging. Bone density decreases, resulting in bones collapsing, and the height of the elderly therefore decreases over time, leading to errors in the calculation of the Body Mass Index (BMI).

In addition, body composition can be examined with other techniques such as a Computerized Tomography Scan (CT Scan) or computerized X-ray machines that inspect the body with high-radiation X-rays. Those X-ray photos are very sharp and precise, but those who are tested receive high amounts of radiation, and receiving X-rays in large quantities may cause genetic damage. There is also a risk of causing diseases such as cancer, tumors, and skin diseases. Dual Energy X-ray Absorptiometry (DXA) produces much lower radiation than a Computerized Tomography Scan (CT Scan); this work therefore takes X-ray images from Dual Energy X-ray Absorptiometry (DXA) to calculate the amount of muscle mass and fat in the area of the legs and arms. These images are also used to diagnose osteoporosis, or low bone density. Dual Energy X-ray Absorptiometry (DXA) measurements release a second X-ray energy through the body, capturing the tissues and bone density as well; the X-ray photos are based on bone density and tissues. The images obtained from the X-ray photos are composed of gray-scale images and color images. The gray-scale images show bones, and the color images show muscle and fat. Calculating the amount of muscle mass accurately requires separating the area of the bone and muscle, but the X-ray images obtained from Dual Energy X-ray Absorptiometry (DXA) have low contrast, resulting in the shinbone interosseous space being tallied incorrectly.

Hala Algailani et al. [1] present the segmentation of overlapping red blood cells with different characteristics in each cell. Denoising using a non-local means method and segmentation of the red blood cell area using the watershed method were employed. Karim El Soufi et al. [2] present a method for segmentation of bone tissue in X-ray images. They improve gray-scale images to increase contrast using Contrast Limited Adaptive Histogram Equalization (CLAHE) and to remove the background and soft tissue; then morphological operators are used to maintain the bone structure. S. Kazeminia et al. [3] present a method based on edge detection of the bone using intensity fluctuations, looking at the values of local maxima in intensity (peaks). In the research
presented above, the data in the imported images are clearly separated, and the quality of the imported images is high. Compared with that data, our images are less accurate because they have been created using low radiation levels, yielding images of lower quality. In this paper, we present a new method for identification of the shin bone and for finding the interosseous space adjacent to the shinbone. We start with an image consisting of the fibula and shin bone. We must improve the image to separate the two bones, so we take the gray image and improve its quality. The knee and ankle areas have high intensity; thus, these areas are used to identify the shinbone region, which is located between them. The direction of the vector field will move into the area of the space between the bones. Our first algorithm finds the local entropy to identify the knee and ankle. Then the Gradient Vector Flow (GVF) technique is used to find the ROI of the shin bone and to segment the image to separate out the interosseous space. The results were compared with ground truth produced by specialist radiologists from our local university hospital.

II. BACKGROUND KNOWLEDGE

Medical photographs are medical information that experts use to identify diseases and find the risk of disease. Therefore, before medical image processing can be done accurately, some basic knowledge is required.

A. Anatomy
1) Fibula and Tibia
The lower leg in the human body consists of 2 main parts, the fibula and the tibia (shin bone) [4], as shown in Figure 1. The tibia is the main weight-bearing bone of the lower leg and the second longest bone of the body, after the femur. The fibula and tibia are below the knee. The fibula is a long, tapered bone with a larger, bone-like head; its inner side is attached to the shin, and in the middle it has a thin, slender appearance. The lower end has a pointed appearance, called the 'lateral malleolus', and its tip is attached to the ankle joint. The tibia is larger than the fibula and is the second largest bone in the human body, after the femur. The cross section of the middle of the bone is shaped like a triangle. The lower end is smaller than the upper end, and the bottom area is attached to the tip of the fibula. The fibula and tibia, in the area below the knee and near the ankle bone, are shown in Figure 1.

Figure 1. The tibia and fibula [5]

B. Image Enhancement
1) TOP-HAT Transform [6]
Mathematical morphology is an important tool in image processing. TOP-HAT is one of the most important mathematical morphology operations and is determined by the dilation and erosion of two basic functions. The TOP-HAT transform is used to enhance features within bright images and is defined in Equation (1).

THat(x, y) = f(x, y) − (f ∘ b)(x, y)   (1)

where f is a grayscale image and b is a structuring element; the TOP-HAT transform uses the opening (f ∘ b) and the closing (f • b) of f(x, y) by b(x, y).

C. Local Entropy
Entropy [7] has been widely and efficiently used in image processing to quantify the information contained in an image. The concept of local entropy is to calculate the entropy within a sliding window (of size w × w). The window moves through the image data to every pixel within the image rows and columns. The entropy calculated for each sliding window is assigned to the middle pixel; its value therefore depends on all the neighboring pixels within the window, as in Equation (2).

E = − Σ_{i=1}^{w} Σ_{j=1}^{w} p(g_{ij}) log₂ p(g_{ij})   (2)

where E is the local entropy value, p is the probability of an intensity value, w is the width and height of the sliding window, and g_{ij} is the intensity of the grayscale image.
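A minimal Python sketch of the two operations defined in Equations (1) and (2) is given below, assuming NumPy and SciPy are available; the structuring-element and window sizes are illustrative (the paper's own window-size choice is discussed in Section III), and a faster alternative for the entropy map would be skimage.filters.rank.entropy.

```python
import numpy as np
from scipy import ndimage

def white_tophat(img, size=15):
    """White top-hat: the image minus its morphological opening (Eq. 1)."""
    opened = ndimage.grey_opening(img, size=(size, size))
    return img - opened            # opening is anti-extensive, so no underflow

def local_entropy(img, win=21, bins=256):
    """Shannon entropy of the grey-level histogram inside a win x win window,
    assigned to the window's centre pixel (Eq. 2). Expects an 8-bit grayscale
    image; plain, unoptimised loop for clarity."""
    pad = win // 2
    padded = np.pad(img, pad, mode='reflect')
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + win, j:j + win]
            hist, _ = np.histogram(patch, bins=bins, range=(0, bins))
            p = hist / hist.sum()
            p = p[p > 0]
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```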
D. Gradient Vector Flow (GVF)
The gradient vector flow [8] is used to solve problems with the range of the gradient-based external force field. The external force, GVF, can be expressed as V(x, y) = (u(x, y), v(x, y)), and its energy function is defined in Equation (3).

ε = ∬ μ(u_x² + u_y² + v_x² + v_y²) + |∇f|² |V − ∇f|² dx dy   (3)

where u and v are the horizontal and vertical components of the field and ∇f is the gradient of the edge map. Where |∇f| is small, the energy is dominated by the first term (the partial derivatives); where |∇f| is large, the second term dominates, and the energy is lowest when V = ∇f. μ is the weighting parameter, which must be adjusted to suit the amount of noise.

III. PROPOSED METHOD

In our method, we have two main processes: pre-processing and finding the space. Pre-processing uses grayscale images to find the knee and ankle area using local entropy. Then the picture quality is enhanced by using grayscale images and
color images and the Gradient Vector Flow (GVF) technique to find the shin bone and find the interosseous space, as shown in Figure 2.

Figure 2. Diagram of our proposed method

A. Pre-processing
Before beginning the process of finding the space between the bones, we must first find the knee and ankle areas, because the input image contains all the components of the human legs. The space between the leg bones is, in the end, the space between the tibia and fibula. Then, the tibia and fibula bones are segmented from the whole leg's image.

1) Knee and ankle joint identification
The procedure for identifying the area of the knee bone and foot bone imports the gray-scale image data, because the gray-scale image contains clearer bone data than the color image. The knee and ankle are found by discovering the grouping of the data after computing the local entropy with Equation (2). We group the knees and ankles by selecting the group that has the maximum-area object, which is the knee area. The largest remaining area object, which is below the knee position, is defined as the ankle area. The algorithm for finding the knee and ankle joint areas is shown in Figure 3.

Figure 3. Knee and ankle area identification algorithm.

To find the local entropy, the size of the window is varied from 15 × 15 to 25 × 25, and the peak signal-to-noise ratio (PSNR) is then computed as in Equation (4).

PSNR = 10 · log₁₀(MAX_I² / MSE)   (4)

where MSE is the mean squared error and MAX_I is the maximum possible pixel value of the image.

MSE = (1 / (m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (x̂_{ij} − x_{ij})²   (5)

where m and n are the dimensions of the data, x̂_{ij} is the local entropy value at a data point, and x_{ij} is the corresponding data point of the original input image.

The result of finding the local entropy value for windows sized 15 × 15 to 25 × 25 is shown in Figure 4. With the window of size 21 × 21, the value has an appropriate PSNR, and larger windows do not give a better PSNR value, as can be seen in the graph in Figure 5. Therefore, we chose to use the window size 21 × 21 to delimit the knee and ankle regions.

Figure 4. Window size of Local Entropy: (a) window size 15×15, (b) window size 17×17, (c) window size 19×19, (d) window size 21×21, (e) window size 23×23, (f) window size 25×25.
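The PSNR criterion of Equations (4) and (5) can be sketched as follows. This is a toy demonstration on synthetic data; in the paper the comparison is between the grayscale image and its local-entropy map for each candidate window size.

```python
import numpy as np

def psnr(original, processed, max_val=255.0):
    """Peak signal-to-noise ratio, following Equations (4) and (5)."""
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# Toy demonstration: compare a synthetic image with a noisy copy. In the paper,
# 'processed' would be the local-entropy map for window sizes 15x15 ... 25x25,
# and the knee of the PSNR curve (21x21) is selected.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128)).astype(np.uint8)
noisy = np.clip(img + rng.normal(0, 10, img.shape), 0, 255).astype(np.uint8)
print(round(psnr(img, noisy), 2))
```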

Figure 5. Result of PSNR for window sizes 15×15 to 25×25, which shows that the optimal point in the graph is window size 21×21.

2) Image cropping
The knee and ankle area can be identified as the limb areas of the fibula and tibia (shin bone). The green line is the tip of the knee area and the red line is the beginning of the ankle area, obtained using local entropy analysis. A cropped grayscale image is shown in Figure 6 (a), and the corresponding color images are shown in Figure 6 (b).

Figure 6. (a) Border of knee and ankle area (b) Gray and blue channel images

3) Image Enhancement
After cropping, the grayscale image and the blue channel image are enhanced using the top-hat transform. The result of this enhancement is shown in Figure 7.

Figure 7. Enhancement by top-hat transforms: (a) Grayscale image (b) Blue channel image

B. Bone Identification
1) Image Subtraction
To make the bone information clearer and eliminate tissue, the image quality is enhanced using image subtraction as defined in Equation (6); the result of the image subtraction is shown in Figure 8.

C = A − B    (6)

A and B are the two images (a) and (b) from Figure 7.

Figure 8. Result of image subtraction between grayscale and blue channel images.

2) Leg area localization using GVF
The gradient vector flow (GVF) method takes the image from the image subtraction and determines the slope of the intensity in both horizontal and vertical directions using the Sobel filters shown in Figure 9.

Figure 9. Horizontal and vertical Sobel filters.

The Sobel filter can reveal the edges of each image. Then, GVF is employed to determine the direction of each vector. The gradient magnitude is calculated to indicate the size, or force, of the movement of the vector approaching the edge. This is shown in Figure 10.

Figure 10. Gradient vector flow field of fibula and tibia bone.
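A compact OpenCV sketch of the top-hat enhancement, the subtraction of Eq. (6), and the Sobel gradients feeding the GVF step might look as follows; the structuring-element shape and size are assumptions, since the paper does not state them.

    import cv2
    import numpy as np

    def enhance_and_subtract(gray, blue, kernel_size=15):
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                           (kernel_size, kernel_size))
        # white top-hat keeps small bright structures (bone) and suppresses tissue
        a = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
        b = cv2.morphologyEx(blue, cv2.MORPH_TOPHAT, kernel)
        c = cv2.subtract(a, b)                       # Eq. (6): C = A - B
        # horizontal and vertical intensity slopes (Figure 9)
        gx = cv2.Sobel(c, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(c, cv2.CV_64F, 0, 1, ksize=3)
        magnitude = np.hypot(gx, gy)                 # gradient magnitude used by GVF
        return c, gx, gy, magnitude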

The data from the GVF operation yields the data set of the image edge (Figure 11). The data group that corresponds to the space between the bones is selected using the increasing and decreasing functions; that set of information is the space between the bones shown in Figure 12. The algorithm for finding the shinbone interosseous space area is given in Figure 13.

Figure 11. Graph of the Y-axis profiles of the left (red) and right (yellow) edges and the shinbone interosseous space (green).

Figure 12. ROI of shinbone interosseous space.

Figure 13. Shinbone interosseous space segmentation algorithm.

IV. EXPERIMENTAL RESULTS

The proposed method was applied to 40 X-ray images. The images were obtained from the local university hospital and were generated with a Dual-X-ray Absorptiometry (DXA) machine. The machine produced grayscale images and color images with low contrast and low resolution. The results were compared to the ground truth provided by expert radiologists from our local university hospital and are shown in Figure 14.

Figure 14. (a) Ground-truth, (b) Adaptive k-Means Algorithm, (c) Watershed transformation [1] and (d) Proposed method.

The comparison of the results was done with the Area Overlap (AO) and the Jaccard index (JI), defined in Equation (7) and Equation (8) respectively.

A. Overlapping Area Percentage (AOP)

AO = ( |Img₁ ∩ Img₂| / |Img₁ ∪ Img₂| ) × 100    (7)

Img_i is an image to compare.

Figure 15. (a) Image from proposed method, (b) Image from watershed transformation [1] and (c) Image from Adaptive k-Means Algorithm.

B. Jaccard Index (JI)

JI(A, B) = |A ∩ B| / |A ∪ B|    (8)

A is the ground-truth image and B is the segmented image. A ∩ B is the number of overlapped pixels of the segmented and ground-truth images. A ∪ B is the number of pixels of the segmented image combined with the ground-truth image.
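Equations (7) and (8) translate directly into set operations on binary masks; a minimal sketch (the mask names are ours):

    import numpy as np

    def area_overlap(mask1, mask2):
        # Eq. (7): AO = |Img1 ∩ Img2| / |Img1 ∪ Img2| * 100
        m1, m2 = mask1.astype(bool), mask2.astype(bool)
        union = np.logical_or(m1, m2).sum()
        return 100.0 * np.logical_and(m1, m2).sum() / union if union else 0.0

    def jaccard_index(ground_truth, segmented):
        # Eq. (8): JI(A, B) = |A ∩ B| / |A ∪ B|
        a, b = ground_truth.astype(bool), segmented.astype(bool)
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0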

Table I: Result from Jaccard Index (JI) and Area Overlap (AO) measurement of the Adaptive k-Means Algorithm, Watershed Transformation [1], and our new method.

Methods                          JI         AO
Adaptive k-Means Algorithm       82.36 %    76.05 %
Watershed Transformation [1]     87.31 %    83.79 %
Proposed Method                  93.92 %    93.51 %

Figure 16. Performance comparison between our approach (in the dashed frame) with watershed transformation and adaptive k-means algorithm.

The research consisted of three main phases. First, the input image was enhanced in a pre-processing step. Second, the bone area was identified. Finally, the shinbone region was segmented. The experimental results of our new proposed method had high accuracy: for the Jaccard index our average was 92.92% and for Area Overlap our average was 93.51%.

V. DISCUSSION
These results were for images with very low contrast and low resolution. In future work, we will improve our methods to increase the accuracy of the leg bone detection and to find the area of shinbone segmentation for images with low contrast and low resolution more precisely. This would allow us to estimate the true value of the shinbone interosseous space, even in areas obscured by tissues or bones caused by incorrect leg placement when the X-rays were made.

VI. ACKNOWLEDGMENT
This work was financially supported by the Research Grant of Burapha University through the National Research Council of Thailand (NRCT), fiscal year 2018, Faculty of Informatics, Burapha University, Burapha University Hospital, and Dr. Alisara Wongsuttileart, MD.

REFERENCES
1. H. Algailani and M. E. S. Hamad, "Detection of Sickle Cell Disease Based on an Improved Watershed Segmentation," 2018 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), Khartoum, 2018, pp. 1-4.
2. K. El Soufi, Y. Kabbara, A. Shahin, M. Khalil and A. Nait-Ali, "CIMOR: An automatic segmentation to extract bone tissue in hand X-ray images," 2013 2nd International Conference on Advances in Biomedical Engineering, Tripoli, 2013, pp. 171-174.
3. S. Kazeminia, N. Karimi, B. Mirmahboub, S. M. R. Soroushmehr, S. Samavi and K. Najarian, "Bone extraction in X-ray images by analysis of line fluctuations," 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 882-886.
4. OpenStax, "Anatomy and Physiology," 06-Mar-2013. [Online]. Available: https://opentextbc.ca/anatomyandphysiology/. [Accessed: 17-Jun-2019].
5. "Tibia," One Stop Information on Anatomy. [Online]. Available: https://www.knowyourbody.net/tibia.html. [Accessed: 17-Jun-2019].
6. X. Bai, S. Gu and F. Zhou, "Entropy powered image fusion based on multi scale top-hat transform," 2010 3rd International Congress on Image and Signal Processing, Yantai, 2010, pp. 1083-1087.
7. F. Hržić, V. Jansky, D. Sušanj, G. Gulan, I. Kožar and D. Ž. Jeričević, "Information entropy measures and clustering improve edge detection in medical X-ray images," 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, 2018, pp. 0164-0166.
8. W. Qiongfei, Z. Yong and Z. Zhiqiang, "Infrared Image Segmentation Based on Gradient Vector Flow Model," 2015 Sixth International Conference on Intelligent Systems Design and Engineering Applications (ISDEA), Guiyang, 2015, pp. 460-462.

Information of Sulci Vector for classifying Hydrocephalus and Cerebral Atrophy Symptom

Onsiri Singkorn, Burapha University, Chonburi, Thailand, [email protected]
Krisana Chinnasarn, Burapha University, Chonburi, Thailand, [email protected]

Abstract—Classification of Hydrocephalus and Cerebral Atrophy is extremely challenging, because the magnetic resonance imaging (MRI) of patients with these two diseases is very similar. This makes the diagnostic process difficult; hence, the process requires a specialist and is time-consuming. This paper proposes two new features: (1) the Ventricle and Sulci Ratio (VSR) and (2) Convergence Vector Summation (CVS). VSR is the ratio between the ventricle and the sum of sulci vectors, and CVS is the force of the vectors which converge into the ventricle. Both of these features make the classification model gain more accuracy. Experimental results show that both features provide higher accuracy than the existing methods which use traditional features. The performance is evaluated based on confusion matrix measurement. The TP rate, FP rate, and F-measure of the Hydrocephalus classification are 93.3%, 3.3%, and 90.3% respectively. Besides, for classifying Cerebral Atrophy disease, the results in percentage are 96.7%, 6.7%, and 97.5% respectively.

Keywords- Hydrocephalus; Cerebral Atrophy; Sulci; Vector; MRI

I. INTRODUCTION

Cerebral Atrophy (CA) is considered a dangerous disease group, because it affects the memory, temperament, and behaviour of the patient. This disease is commonly found in elderly persons and is caused by the death of nerve cells, drinking alcohol, or taking painkillers and hypnotic drugs for a long time. Normally, the disease causes muscle contraction symptoms, including decreased intelligence, mood and behaviour change, and forgetting events. Currently, this disease cannot be cured; it is only treated according to its symptoms.

Hydrocephalus (HC) is a condition with too much cerebrospinal fluid (CSF), resulting in increased skull pressure that alters the structures and functions of many relevant neural regions. These changes cause the loss of various nervous systems, leading to abnormal brain function, impaired body development, and impaired intelligence. Generally, HC happens to infants and elderly persons. The symptoms of the patient are becoming less conscious, speaking less, walking abnormally, and double vision. If not properly treated, the patient will lose nervous system function and have a physical handicap. This abnormal condition can be treated by a surgical operation.

In the medical analysis process, experts use magnetic resonance images (MRI) as a source, because it is a safe technology with no side effect from radiation. But the appearance of the ventricle in MRI of HC and CA is similar. In CA, the occurrence of tissue around the ventricle is less than in HC, but the depth of the sulci in CA is more than in HC. The size of the ventricle in both CA and HC is commonly larger than normal, and the sizes of the ventricles in HC and CA are the same, therefore making it difficult to diagnose the disease. The condition of HC comes from the amount of cerebrospinal fluid (CSF) in the brain: it increases the size of the ventricle, causing skull pressure and pressure in the brain. Moreover, the appearance of the brain groove (sulci) in both HC and CA is also similar. These make the diagnostic process difficult for the experts as well. Therefore, sometimes the diagnosis process may be delayed, which may result in life-threatening situations for patients. Thus, several types of research related to the extraction of brain features have focused on presenting ways to extract new features of the brain to increase the efficiency of classification. Manit et al. [1] proposed two new features of the brain for classifying CA and HC; the proposed features consist of the Frontal Occipital Horn Angle (FOHA) and the Sulci Ratio (SR). Anna Fabijanska et al. [3] proposed the Ventricular Angle (VA) and Frontal Horn Radius (FHR). O'Hayon et al. [5] proposed the Frontal and Occipital Horn Ratio (FOHR). Moore W. et al. [7] proposed quantitative measurements of lateral ventricular volume and total cortical thickness.

In this research, two new features focusing on information of the sulci are presented. They reveal the pattern of Hydrocephalus and Cerebral Atrophy symptoms. The related theories, proposed method, experimental results, and conclusion are described in the following sections.

II. BACKGROUND KNOWLEDGE

A. Anatomy of Ventricle
The ventricle is a space in the brain which is the storage of cerebrospinal fluid. The ventricle can be separated

into four parts, namely the lateral ventricles, in the cerebrum area, separated into left and right ventricles. The third ventricle is a single space in the middle between the thalami. In addition, the inner ventricle consists of 3 horns: the frontal horn, occipital horn, and temporal horn, as in Figure 1.

Figure 1. Brain image in the horizontal plane.

B. Brain MRI image
Magnetic Resonance Imaging (MRI) is a tool used to detect abnormalities in various organs of the body by using high-intensity electromagnetic fields. It is used to create cross-sectional and horizontal plane images. The images consist of 3 planes: (1) the coronal plane, (2) the sagittal plane, and (3) the horizontal plane. The advantage of the MRI image is its high resolution, which allows doctors to accurately diagnose the disease. An MRI image of the brain with its elements is shown in Figure 2.

Figure 2. MRI brain Anatomy

C. Literature review
Manit et al. [1] presented research on novel features for the classification of Hydrocephalus and Cerebral Atrophy. The two new features presented are the Frontal Occipital Horn Angle (FOHA) and the Sulci Ratio (SR), together with 3 basic features: the Evans Ratio (ER), Frontal Occipital Horn Ratio (FOHR), and Ventricular Angle (VA). The research uses only the cerebral tissue for analysis; the skull and meninges are not used. After extracting all 5 features, these features are applied to the learning set of MLP neural networks to classify Hydrocephalus and Cerebral Atrophy.

Anna Fabijanska et al. [3] presented research on the assessment of Hydrocephalus in children based on digital image processing and analysis, extracting the features of Hydrocephalus from computerized tomography (CT) scans. The first step is to extract 4 features: the Evans Ratio (ER), Frontal and Occipital Horn Ratio (FOHR), Ventricular Angle (VA), and Frontal Horn Radius (FHR). Finally, all 4 features are compared to find the relative error by using reference images from experts as a measure of accuracy.

O'Hayon BB et al. [5] presented research on the Frontal and Occipital Horn Ratio: a linear estimate of ventricular size for multiple imaging modalities in pediatric Hydrocephalus. They presented a new ratio called the frontal and occipital horn ratio (FOHR) and measured accuracy by using correlation coefficients and Spearman's correlation coefficients. It can be concluded that the normal FOHR is 0.37 and is independent of age.

D. Existing Features
1. The Sulci Ratio (SR)
Manit et al. [1] proposed a feature related to the sulci and gyri which can explain the differences between both diseases. The sulci ratio (SR) is the ratio between the total depth of all sulci and the brain area, as described in eq. (1),

SR = ( Σ_{i=1}^{n} d_i ) / A    (1)

where d_i is the depth of the i-th sulcus and A is the brain area.

2. Frontal and Occipital Horn Angle (FOHA)
Another feature proposed by Manit et al. [1] is the Frontal and Occipital Horn Angle (FOHA). This feature consists of 3 angles: the FOHA left angle, FOHA right angle, and FOHA top angle, as illustrated in Figure 3.

Figure 3. Three angles that occur in the ventricle.

3. Ventricular Angle (VA)
Anna Fabijanska et al. [3] proposed the Ventricular Angle (VA). The angle occurs between the diameter of the head and the tangent line of the frontal horn, as shown in Figure 4.

Figure 4. The angle that occurs at the frontal horn.

4. Evans Ratio (ER)
Evans et al. [6] proposed the ER, which is the ratio between the frontal horns of the ventricle, A_max, and the longest side within the skull, B_max, of normal people. The paper recommends that this ratio must be less than 0.29 for normal people. If it is not, the doctor will assume that it may be hydrocephalus, as described in eq. (2).

ER = A_max / B_max    (2)
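Equations (1) and (2) are simple ratios; a minimal sketch, assuming the sulcus depths, brain area, and horn/skull widths have already been measured (variable names are ours):

    def sulci_ratio(sulci_depths, brain_area):
        # Eq. (1): SR = (sum of sulcus depths d_i) / brain area A
        return sum(sulci_depths) / float(brain_area)

    def evans_ratio(a_max, b_max):
        # Eq. (2): ER = A_max / B_max; above about 0.29 suggests enlargement
        return a_max / float(b_max)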

5. Frontal Occipital Horn Ratio (FOHR)
O'Hayon BB et al. [5] proposed the FOHR. This ratio is calculated from the summation of A_max and the longest side of the occipital horn of the ventricle, C_max, divided by B_max, as described in eq. (3).

FOHR = (A_max + C_max) / B_max    (3)

III. PROPOSED METHOD
Our proposed method for the classification of the Hydrocephalus condition and Cerebral Atrophy diseases consists of the following steps, as shown in Figure 5.

Figure 5. Flowchart of proposed method.

A. MRI Image
The dataset is projected in the horizontal plane. The dimension of each image is 160 x 256 pixels. Figure 6 demonstrates the brain images in a horizontal plane.

Figure 6. Brain MRI horizontal plane.

B. Image-preprocessing
In this brain image processing, the elements within the brain are the objects of interest. Therefore, the preparation processes that eliminate some unwanted components of the images before processing are very important. These steps simplify the computation in the following steps and provide higher accuracy. In this paper, the MLO technique proposed in [2] has been used. There are 4 steps. First, background and foreground are identified using Object Attribute Thresholding (OAT). Second, erosion morphology is applied to separate the brain from surrounding objects. Third, labeling of each element in the segmented image is used, and the largest pixel group in the segmented image is identified as the brain area. Finally, filling holes using a modified morphology process is applied.

Figure 7. Flowchart of MLO method.

C. Feature extraction
1. The Ventricle and Sulci Ratio (VSR)
Ventricle and sulci regions are commonly used for identifying Hydrocephalus and Cerebral Atrophy symptoms. Because the ventricle is large, it can be set as the landmark for localizing and identifying the Cerebral Atrophy symptom. An enlarged ventricle comes together with loss of brain tissue; then, the sulci region becomes wide and deep. On the other hand, in the Hydrocephalus symptom, the ventricle region is expanded because the ventricle region contains an amount of water. Extension of the ventricle region causes the outer brain to narrow. Since both symptoms have an inverse proportion, we then propose the ratio between the ventricle region and the summation of sulci vectors (VSR) for separating Hydrocephalus from the Cerebral Atrophy symptom. The ratio between the ventricle (V) and the sum of sulci vectors (S_i) can be described as eq. (4):

VSR = V / Σ_{i=1}^{n} S_i    (4)

Figure 8. Sulci and ventricle that are used to calculate the ratio.
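Both Eq. (3) and Eq. (4) are likewise simple ratios; a minimal sketch, assuming the horn widths, ventricle area, and sulci vector magnitudes have already been measured (names are ours):

    import numpy as np

    def frontal_occipital_horn_ratio(a_max, c_max, b_max):
        # Eq. (3): FOHR = (A_max + C_max) / B_max
        return (a_max + c_max) / float(b_max)

    def ventricle_sulci_ratio(ventricle_area, sulci_vectors):
        # Eq. (4): VSR = V / (sum of sulci vector magnitudes S_i)
        return ventricle_area / float(np.sum(sulci_vectors))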

2. Convergence of Vector Summation (CVS)
In this feature extraction step, the CVS value is obtained by summing up 360 different rotations of vectors. Firstly, the centroid point of the brain object is located using eq. (5), as shown in Figure 8. Secondly, the secant line (SL) between the centroid coordinate and the brain object boundary is computed using eq. (6). Thirdly, by using eq. (7), the SL is rotated from 1 degree to 360 degrees, as exhibited in Figure 9. Fourthly, the overlapped area (OA) between each SL and the sulci region is identified; then, each coordinate and its OA are inserted into a vector for each rotational degree. The algorithm to find the sulci vector is demonstrated in Figure 10. Fifthly, to compute the internal force which converges to the centroid coordinate, the momentum formula is employed in eq. (8). According to the summation, an output with large momentum is likely to be Cerebral Atrophy, while an output with less momentum is classified as Hydrocephalus instead.

Figure 8. Center point of the object.

Centroid = ( x̄, ȳ ) = ( (a₁x₁ + a₂x₂ + ⋯ + aₙxₙ) / A , (a₁y₁ + a₂y₂ + ⋯ + aₙyₙ) / A )    (5)

Where
x̄, ȳ = coordinates of the centroid.
a_i = area of each object.
A = area of the brain.
x_i, y_i = x, y center position of each object.

x²/a² + y²/b² = 1,  a, b > 0    (6)

Where
x, y = coordinates of the centroid.
a, b = major axis and minor axis lengths.

Figure 9. Vectors converging to the center.

( x + r cos(θ), y + r sin(θ) )    (7)

Where
x, y = coordinates of the centroid.
r = radius of the object.
θ = the angle of rotation of the axis.

Figure 10. Finding Sulci Vector Algorithm.

Figure 11. Vectors convergence.

P⃗ = m ω⃗    (8)

Where
P = momentum.
m = mass of the vector.
ω = angular speed.
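A simplified reading of Eqs. (5)-(8) is sketched below: rays are cast from the brain centroid at every degree, their overlap with the sulci mask is taken as each vector's mass, and a unit angular step stands in for the angular speed. The ray-sampling scheme and these interpretations are our assumptions, not an exact transcription of the algorithm in Figure 10.

    import numpy as np

    def convergence_vector_summation(sulci_mask, centroid, radius):
        # Eqs. (5)-(8), simplified: rotate a secant line about the centroid
        # through 360 degrees, measure its overlapped area (OA) with the sulci
        # mask, and sum the momentum of the vectors converging to the centroid.
        cy, cx = centroid
        h, w = sulci_mask.shape
        total = 0.0
        for deg in range(1, 361):
            theta = np.deg2rad(deg)
            rs = np.arange(radius, dtype=float)
            xs = np.clip((cx + rs * np.cos(theta)).astype(int), 0, w - 1)  # Eq. (7)
            ys = np.clip((cy + rs * np.sin(theta)).astype(int), 0, h - 1)
            overlap = float(sulci_mask[ys, xs].sum())     # overlapped area OA
            total += overlap                              # Eq. (8), unit angular speed
        return total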

D. Classification
Brain features extracted from the MRI images of Hydrocephalus and Cerebral Atrophy are used as a learning set to create a classification model with the Weka software package, using the Naive Bayes method as described in eq. (9).

P(c|x) = P(x|c) P(c) / P(x)    (9)

Where x is the set of all attributes and c is the class; P(c) is the prior probability of the class and P(x) is the probability of the attributes.

IV. EXPERIMENTAL RESULTS

From a study of Hydrocephalus and Cerebral Atrophy, the researchers examined brain images of 75 patients in the experiment, using three standard data sets: 1. a database of patients with cerebral hemorrhage, 15 images, which is a database of the Department of Radiology and Medical Informatics, Uniformed Services University. In terms of images of patients with brain disease, the remaining images are divided into 2 sets of data: 1. a series of images of Alzheimer's patients from the Open Access Series of Imaging Studies (OASIS) database, 30 images, and 2. images of Parkinson's disease patients, 30 images.

The results showed that when we used the 3 earlier features, the results obtained were, for Hydrocephalus, a 53.3% TP rate, 5% FP rate, and 61.5% F-measure, and for Cerebral Atrophy, a 95.0% TP rate, 46.7% FP rate, and 91.9% F-measure, as shown in Table 2. When using the 5 features of [1], it can be seen that the FP rate for Cerebral Atrophy is still high, although the TP rate and F-measure of Hydrocephalus increase. Therefore, in order to increase accuracy, we use all 7 features, i.e., the existing features combined with the features we proposed. It can be seen that this gives better experimental results, as shown in Table 2.

TABLE 1. RESULTS OF ALL SEVEN FEATURES EXTRACTION (CVS, VSR, SR, FOHA TOP, FOHA LEFT, FOHA RIGHT, ER, FOHR, AND VA) FOR THE HYDROCEPHALUS (HC), PARKINSON'S DISEASE (PD), AND ALZHEIMER'S DISEASE (AD) IMAGES (1-75).

TABLE 2. MEASURE PERFORMANCE USING TRUE POSITIVE, FALSE POSITIVE, F-MEASURE.

Accuracy (%)                          Three features     Five features          Seven features
                                      (ER, VA, FOHR)     (ER, VA, FOHR,         (ER, VA, FOHR, FOHA,
                                                         FOHA, SR)              SR, CVS, VSR)
Hydrocephalus (HC)     TP             53.3               73.3                   93.3
                       FP             5.0                1.7                    3.3
                       F-measure      61.5               81.5                   90.3
Cerebral Atrophy (CA)  TP             95.0               98.3                   96.7
                       FP             46.7               26.7                   6.7
                       F-measure      91.9               95.9                   97.5
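The paper builds the classifier with the Naive Bayes method of the Weka package. An equivalent sketch in Python, using scikit-learn's GaussianNB as a stand-in for Eq. (9) and assuming one MRI image per row of the feature matrix and 10-fold cross-validation (both assumptions of ours), is shown below.

    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_predict

    def classify_naive_bayes(X, y, folds=10):
        # X: one row per MRI image with the seven feature values
        #    (ER, VA, FOHR, FOHA angles, SR, CVS, VSR); y: "HC" or "CA" labels
        model = GaussianNB()                   # Eq. (9) with Gaussian likelihoods
        return cross_val_predict(model, X, y, cv=folds)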

In this paper, we measure performance using the TP rate, FP rate, and F-measure with the following equations:

TP rate (Recall):
TPR = TP / (TP + FN)    (10)

FP rate:
FPR = FP / (FP + TN)    (11)

Precision:
Precision = TP / (TP + FP)    (12)

F-measure:
F-measure = (2 × Precision × Recall) / (Precision + Recall)    (13)

where
TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.

V. CONCLUSION AND DISCUSSION
In this research, the classification of brain diseases was proposed. Using only the five traditional features, the classification result reaches a high FP rate, which means that the output of the classification model with five input features contains numerous false results. Thus, two new features for the classification model are introduced. Firstly, we proposed the "Ventricle and Sulci Ratio (VSR)". It is the ratio between the brain ventricle and the sum of sulci vectors. Secondly, we proposed "Convergence Vector Summation (CVS)". It is the amount of virtual force converging to the center of the brain image. By combining the two newly proposed features with the five traditional features, the result of our approach reaches a 93.3% TP rate, 3.3% FP rate, and 90.3% F-measure in Hydrocephalus classification, and a 96.7% TP rate, 6.7% FP rate, and 97.5% F-measure in Cerebral Atrophy classification. In future work, more new features for classifying brain diseases with a thinned ventricle and damaged brain will be attempted.

In this research, CVS and VSR perform better in the classification of Hydrocephalus and Cerebral Atrophy; the sulci are used for determining the diseases. Both of the proposed features focus on the sulci, whereas the previous features focus on the ventricle, which may not be enough for classification. When adding our two features, the result of the experiment is improved.

REFERENCES
[1] M. Chansuparp, A. Rodtook, S. Rasmequan and K. Chinnasarn, "Novel features for classification of hydrocephalus and cerebral atrophy," 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, 2016, pp. 1-6.
[2] M. Chansuparp, A. Rodtook, S. Rasmequan and K. Chinnasarn, "The automated skull stripping of brain magnetic resonance images using the integrated method," 2015 8th Biomedical Engineering International Conference (BMEiCON), Pattaya, 2015, pp. 1-5.
[3] A. Fabijanska, T. Weglinski, K. Zakrzewski and E. Nowosławska, "Assessment of hydrocephalus in children based on digital image processing and analysis," International Journal of Applied Mathematics and Computer Science (AMCS), vol. 24, no. 2, 2014.
[4] H. Ng, C. Chuang and C. Hsu, "Extraction and Analysis of Structural Features of Lateral Ventricle in Brain Medical Images," 2012 Sixth International Conference on Genetic and Evolutionary Computing, Kitakyushu, 2012, pp. 35-38.
[5] B. B. O'Hayon, J. M. Drake, M. G. Ossip, S. Tuli and M. Clarke, "Frontal and occipital horn ratio: A linear estimate of ventricular size for multiple imaging modalities in pediatric hydrocephalus," Pediatr Neurosurg, Nov 1998.
[6] W. A. Evans, Jr., "An encephalographic ratio for estimating ventricular enlargement and cerebral atrophy," Arch. Neurol. Psychiat., vol. 47, pp. 931-937, 1942.
[7] D. W. Moore, I. Kovanlikaya and L. A. Heier, "A Pilot Study of Quantitative MRI Measurements of Ventricular Volume and Cortical Atrophy for the Differential Diagnosis of Normal Pressure Hydrocephalus," Neurology Research International, 2012.
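The measures defined in Eqs. (10)-(13) can be computed from raw confusion-matrix counts as in the following generic sketch (not code from the paper):

    def tp_rate(tp, fn):
        # Eq. (10): TP rate (recall) = TP / (TP + FN)
        return tp / float(tp + fn)

    def fp_rate(fp, tn):
        # Eq. (11): FP rate = FP / (FP + TN)
        return fp / float(fp + tn)

    def precision(tp, fp):
        # Eq. (12): precision = TP / (TP + FP)
        return tp / float(tp + fp)

    def f_measure(tp, fp, fn):
        # Eq. (13): F-measure = 2 * precision * recall / (precision + recall)
        p, r = precision(tp, fp), tp_rate(tp, fn)
        return 2.0 * p * r / (p + r)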

