

2020 IIS Brochure

Published by hcfeng, 2020-06-21 23:03:25


Index

所長的話 2
Director's Message 4
前言 6
Preface 7
亮點計畫 Spotlight Projects 9
人工智慧計畫 Artificial Intelligence Projects 27
所內合作計畫 IIS Collaborative Projects 71
八大實驗室 Research Laboratories 107
特聘講座 / 特聘研究員 Distinguished Visiting Chair / Distinguished Research Fellow 141
研究人員 Research Faculty 149
支援部門 Supporting Departments 189

1

所長的話

「山不在高,有仙則名;水不在深,有龍則靈」

學術研究的 impact 並不在以量取勝,中央研究院本身並沒有靠數量取勝的本錢。資訊所成立至今已超過 35 個年頭,所內同仁兢兢業業,歷年來獲得許多大獎的肯定。包括兩位前所長李德財教授及李琳山教授先後獲選為中央研究院院士。其他同仁的表現也不遑多讓,歷年來曾獲得國科會傑出特約研究員獎、傑出研究獎、吳大猷獎、中央研究院深耕計畫、中央研究院年輕學者著作獎、傑出人才基金會創新研究獎、潘文淵研究傑出獎、東元獎等。數位同仁並因在學術上之傑出表現獲選為 IEEE Fellow 及 ACM Fellow。

2

前所長李德財院士曾在就任時帶來兩句英文 "Every job is a self-portrait of those who did it. Autograph your work with quality"。當時公開徵求翻譯,第一名的朋友翻譯成「件件工作,反映自我;凡經我手,必為佳作。」個人非常認同這樣的工作態度。學術研究工作如果將目標界定在造福人類,那致富就不是它的終極目標。個人也非常認同李遠哲前院長「知識的果實與全人類共享」這樣的精神,也願意與同仁為這樣的理想共同努力。

最後,我以資訊所的簡稱「IIS」表達自己對未來最誠摯的期許。第一個「I」是 Integrity,我期待每一個同仁都能以「正直」當處世的態度,不管是對人或對事;第二個「I」是 Innovation,要產生有 impact 的研究成果,必定植基於創意;第三個字「S」是 Serenity,這個字的意涵遠遠超過它的直譯「寧靜」。它蘊涵著「淡泊明志、寧靜致遠」的意思。這種內在的寧靜能讓我們從事研究工作的人看得很遠,當研究遇到瓶頸時能堅毅不拔、冷靜面對、並有智慧的解決問題。就此與大家共勉,加油啦!

3

Director's Message

"Known will hills be if fairies dwell, no matter high or low; and charmed will waters be if dragons lurk, no matter deep or shallow."

The impact of academic research is not a product of quantity, and so quantity of research is not the main concern of Academia Sinica. The Institute of Information Science was established more than 35 years ago, and with a careful and conscientious mindset, our research fellows have earned many distinguished awards. Former directors Dr. Der-Tsai Lee and Dr. Lin-Shan Lee are chief examples, both having been elected as Academicians of Academia Sinica in succession. The achievements and honors of other faculty are equally significant―they include the Outstanding Appointed Researcher Award, Outstanding Research Award, Ta-You Wu Memorial Award, Academia Sinica (A.S.) Sprout Project Award, A.S. Award for Junior Research Investigators, Young Researcher Award of the Foundation for the Advancement of Outstanding Scholarship, Outstanding Award of Pan Wen Yuan Foundation, TECO Foundation Award, and more. A few faculty members

4

are honored as IEEE Fellow and ACM Fellow due to their academic achievements.

I fully subscribe to the motto promoted under former Director Dr. Der-Tsai Lee―"Every job is a self-portrait of those who did it. Autograph your work with quality." This mentality conveys that the ultimate goal of academic research is not monetary fortune, but rather the fortune of mankind. On top of that, I support and hope to carry on the spirit expressed by Dr. Yuan-Tseh Lee, the former President of Academia Sinica, when he said, "Share the fruit of knowledge with all mankind."

Finally, I.I.S., which stands for Institute of Information Science, also stands for the following: "I" for Integrity―may upright be our name; "I" for Innovation―impactful research always originates from creativity; "S" for Serenity―rather than the literal Chinese translation of "silence," its greater meaning derives from the saying, "A simple life preserves integrity; tranquility yields transcendence." This inner quietness empowers researchers to enlarge their scope and vision, to remain still when faced with a problem, and then to make a sound judgment that truly solves the problem with wisdom. Let's do it and strive for excellence!

5

前言

本次資訊所簡介的編排方式以不同的風格及樣貌呈現。為了讓外界能快速了解資訊所過去兩三年的研究重點及成果,以及未來五年的研究重心,我們將本所幾類重大的計畫逐一呈現。

首先特別介紹本所四個「亮點研究計畫」的成果。接著,資訊所獲得科技部補助的五個「AI 計畫」將從不同專業角度切入,詳述他們各自與 AI 發展及應用的關係。在介紹完 AI 計畫之後,是對本所四個「大型合作計畫」的詳細介紹。自 2019 年開始,所內改變合作計畫的執行方式─將各計畫預算金額放大、鼓勵與外界合作、並同時要求執行計畫者必須盡量讓計畫變成未來能代表資訊所的旗艦計畫。

最後,資訊所八個實驗室將分別介紹過去三年、目前、及未來數年的研究方向與成果;本所研究團隊及其各自研究領域、專長、貢獻亦將逐一精彩呈現於所簡介尾端。

以下的篇幅,我們先介紹資訊所的「亮點研究計畫」。

6

Preface

We present a brand new layout for the 2020 IIS brochure. In order for readers to capture at a glance the highlights of IIS research in recent years and future 5-year research priorities, we will present the outstanding research projects in the following categories respectively.

We first introduce the research achievements of our four Spotlight Projects. These are followed by developments and applications arising from five AI Research Projects funded by the Ministry of Science and Technology (MOST). We then present four outstanding IIS Collaborative Projects, representing a new strategy started in 2019. We have greatly increased the budget for all four of those projects to strongly encourage external collaborations. All respective Principal Investigators have been assigned the mission of designing their collaborations as future flagship projects for the Institute. The details of these projects cover quite a few pages in this brochure, and we are sure that our readers will not want to miss them!

Finally, each of our eight IIS laboratories presents their own significant research accomplishments and academic impacts over the past three years. Future research plans and directions are also emphasized in this section of the brochure. Thus, it is the research staff of the current faculty and their research expertise that have the honor of closing out the 2020 IIS brochure. Get ready to experience and explore a feast of Information Science! Enjoy!

7

亮點計畫 Spotlight Projects

音樂會動畫自動生成系統 10
The AMCA Project: Automatic Music Concert Animation 12
密碼學與量子計算 14
Cryptography and Quantum Computation 16
染色體層級之新世代定序組裝策略 18
A Reliable Gap-Filled Strategy for Non-Reference Chromosome-Level Assembly 20
空氣盒子:微型空氣品質感測系統 22
AirBox: Micro Air-Quality Sensing System 24

9

音樂會動畫自動生成系統
計畫主持人:蘇黎博士
計畫期程:2019/1~2021/12

多媒體產業的人工智慧化是一項細膩的工程,其往往關係到影像、聲音、乃至於情感層面等多模態(cross-modal)資料的整合。例如,在多媒體動畫的製作過程中,如何讓影像與音樂完美結合,是需要大量製作者耗費心力安排的工作。本計畫的目的即是希望能突破此藩籬,可以讓機器自動理解音樂內容,並對應到虛擬角色的肢體動作,甚至可以與真人一同表演。此技術未來預期可讓動畫製作者省下大量的製作時間,讓人機互動的多媒體展演增加無限可能。

本計畫聚焦於打造可以跟真人音樂家一起進行現場演出的虛擬音樂家。本系統分為三部分:音訊分析、動作生成、即時同步。音訊分析包含自動採譜、主旋律偵測、樂器種類偵測等等,在過去的做法中,因涉及不同的訊號特徵與音樂資料標註,難以建立整合型的音樂分析解決方案。如今由於神經網路在多任務學習(multitask learning)的發展,同時處理音高、時間和樂器種類的深度學習系統已經成為可能。我們疊合不同類型的訊號特徵進行特徵選取的工作,增加訓練模型的強健性,並達到移調不變性(transposition-invariant)並抑制掉音訊處理問題中典型的泛音錯誤。更精確的說,我們提出的方法將音訊分析簡化為電腦視覺中語意分割(semantic segmentation)的問題。我們基於 U-Net 的架構,考慮具有注意力或擴張機制的卷積核,同時處理不同尺寸的目標物件,例如辨識短音與長音。

在動作生成研究中,專注於小提琴演奏者的動作生成已經有了初步成果:以小提琴獨奏錄音檔為輸入訊號,即可自動產生虛擬小提琴家的肢體座標,並透過音樂情緒模型決定身體律動。相較於端到端的類神經網路訓練模式,我們的初步成果著重於可解釋、可操控的參數化肢體動作生成模式。本方法由右手的弓法模型、左手的指法模型、以及上半身的音樂情緒模型所組成:右手的模型由基於音訊的換弓點偵測達成,左手的模型則是透過音高偵測對應到把位與弦,左右手的弓指法資訊可決定生成骨架的型態。在音樂情緒的部分,由於頭部與上肢隨著節拍的週期性傾斜角度與音樂的激昂度(arousal)有關,我們根據音訊模型的拍點偵測(beat tracking)與音樂情緒模型的激昂度預測來控制頭部與上肢的傾斜角變化。同樣的原理也適用於其他種類的弦樂器。從音訊生成肢體動作的問題目前還在發展階段,未來有非常多的發展可能。

圖一:可與真人音樂家一起進行現場演出的虛擬音樂家概念圖。

10

Brochure 2020

最後,在即時同步的技術上,我們提出的系統包含音樂追蹤器(music tracker)、音樂偵測器(music detector)和位置估算(position estimation)三個部分。音樂追蹤器包含多執行緒之線上動態時間校正(online dynamic time warping, ODTW)演算法,每個執行緒使用 ODTW 估測現場演奏音樂當下的演奏速度,各自的結果加以平均得到精確的演奏速度值,與參照的演奏檔案比較,可以得出速度的相對值。音樂偵測器的功能在於偵測音樂什麼時候開始,這個機制可以讓我們不需要手動操作即時同步系統。最後,由於音樂中會有許多重複的片段,所以位置估算的機制可以讓我們同時追蹤目前可能演奏到的位置。結合以上三者,我們可以即時推出現場演奏音樂在原譜或參考音檔中的位置,而表演的設計者可以根據這個資訊作事件的對應。

我們目前已經將上述技術應用在音樂視覺化、自動伴奏 / 合奏、以及自動肢體動作生成等三種表演類型。我們的系統已經在數個表演現場演奏,包含〈日新樂譯〉音樂會(與沛思文教基金會合作,於國家音樂廳演出)、清大 AI 樂團開幕(與清大 AI 樂團合作)、〈夜之絮語〉音樂會(與長笛家林怡君等合作,於衛武營演奏廳演出),以及在 2019 年底演出的〈聲形〉音樂會(與口口實驗室合作,於濕地 Venue 演出)等等。除了是對於我們方法上的驗證以外,也成為技術開發者與製作人、表演者的發想與溝通的重要平台,期望這樣的技術落地成為下一代多媒體產業的核心。

圖二:小提琴演奏者的動作骨架生成系統。

11

The AMCA Project: Automatic Music Concert Animation
Principal Investigator: Dr. Li Su
Project Period: 2019/1~2021/12

Incorporating artificial intelligence into the multimedia industry is a complex project. It involves integration of cross-modal data such as images, sounds, and even emotions into multimedia content. For example, for the production of animation, how to make a video match perfectly with performed or background music is a task that requires a lot of effort from animation producers. The purpose of this project is to overcome these difficulties. We are endeavoring to teach machines to automatically understand music content and then match the body movement of virtual characters to that content. This would ultimately allow virtual musicians to perform with real people. We anticipate that, in the future, our technology will make the animation industry more efficient and enhance the possibilities of interactive multimedia presentations.

In this project, we are generating virtual musicians that can play music along with live musicians. Our proposed system can be divided into three elements: audio analysis, motion generation, and real-time synchronization. Audio analysis primarily involves automatic music transcription, melody detection, and musical instrument recognition. In the past, due to the diversity of signal characteristics and data labels, it was difficult to establish a systematic solution for automatic music analysis. Nowadays, deep learning-based systems that simultaneously detect multiple pitch, timing, and instrument types have become possible due to the development of neural networks (NN) in multi-task learning (MTL) approaches. Moreover, we can now superimpose different types of signal representations, allowing convolution kernels in NN to automatically select desired features. Consequently, the training model exhibits enhanced robustness, achieves transposition-invariance, and suppresses the challenging overtone errors usually generated in audio processing. More specifically, our proposed method simplifies the issue of musical transcription into semantic segmentation in computer vision. Our U-Net-based architecture considers convolution kernels via attention or dilation mechanisms to simultaneously process objects of different sizes, such as to identify both short and long musical notes.

To generate animated body movement, we have achieved preliminary results based on the motion of a violin player. Using a recording of a violin solo as the input signal, we automatically generate coordinate values of the body joints for a virtual violinist. Long-term body rhythms can also be determined by our music emotion recognition model. Instead of employing an end-to-end NN, we are focusing on more interpretable and controllable body movement

Figure 1 : The virtual musician system.

12
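The segmentation view of transcription described above is easy to picture in code. The sketch below is ours, not the project's implementation: it only illustrates how note events are rasterized into the time-by-pitch target mask that a U-Net-style segmenter would be trained to predict (the 88-key piano range, frame rate, and all names are our assumptions).

```python
import numpy as np

def notes_to_mask(notes, n_frames, n_pitches=88, fps=100):
    """Rasterize (onset_sec, offset_sec, midi_pitch) note events into a
    time-by-pitch binary mask, i.e. the target of segmentation-style
    music transcription."""
    mask = np.zeros((n_frames, n_pitches), dtype=np.uint8)
    for onset, offset, pitch in notes:
        lo, hi = int(onset * fps), int(offset * fps)
        mask[lo:hi, pitch - 21] = 1  # MIDI note 21 = A0, lowest piano key
    return mask
```

Short and long notes then become small and large connected regions of the same mask, which is exactly why kernels with several receptive-field sizes (attention or dilation) are useful.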

generation methods. Our proposed model consists of a bowing model for the right hand, a fingering (position) model for the left hand, and a musical emotion (expression) model for the upper body. The bowing model has been designed with an audio-based attack detection network, whereas the fingering model computes left-hand position from music pitch. From this information, patterns for the generated skeleton can be determined. In terms of music emotion recognition, since periodic head tilt and upper body motion tend to follow the rhythm and music type, we incorporate rhythm tracking from the audio model and an emotion predictor model to control those aspects of body motion. These same principles can be applied to other kinds of stringed instruments. We are still tackling the problem of generating body movements solely from audio content, but there are many possibilities for future development.

Figure 2 : The skeleton generation system for violin performance.

For real-time synchronization, our proposed system incorporates three elements, i.e., a music tracker, a music detector, and a position estimator. The music tracker includes online dynamic time-warping (ODTW) algorithms working across multiple threads. Each thread uses ODTW to estimate the current performance speed of the live music performance. Estimated values across threads are averaged to obtain a stable and accurate estimate of performance speed. Relative tempo values are obtained by comparing the live performance with a reference performance recording. The function of the music detector is to automatically detect when the music starts, meaning that there is no need to manually launch the real-time synchronization mechanism. Finally, since music exhibits many repetitive segments, our position estimation mechanism allows us to simultaneously track the positions that the musician is currently playing. Combining these three elements, we can immediately align the position of a live performance to a reference recording, allowing a program director to design responsive events based on that information. We have applied this system to music visualization, automatic accompaniment/ensemble, and generation of automatic body movements for a virtual musician.

Our system has been utilized for several live performances, including the Sound and Sense concert (in cooperation with the Pace Culture and Education Foundation, performed in the National Concert Hall), the opening ceremony of the NTHU AI Orchestra (in collaboration with the NTHU AI Orchestra), Whispers in the Night (in collaboration with flutist Sophia Lin, performed in the Weiwuying Auditorium), and Sound and Shape (in collaboration with Koko Lab. Inc., performed at Wetland Venue). These concerts were held not only to test our technology, but also to facilitate in-depth conversation among music producers, performers, and music technology developers, with the view to introducing new-age music technology to the multimedia industry.

13

密碼學與量子計算
計畫主持人:楊柏因博士、鐘楷閔博士
計畫期程:2018/1-2021/12

現代密碼學是探討如何在抵禦壞人行為下達成保密性、真實性與各種所需功能的學科,這可被比喻為「好人」(Honest Party) 與「壞人」(Adversary) 的戰爭。經過超過三十年的研究,密碼學已發展為一門極其豐富且有重要實際應用的領域。從理論的角度來說,密碼學的發展已遠超越歷史上安全通訊的目標,使我們能實現許多看似矛盾不可能的密碼學任務,如零知識證明 (Zero-knowledge Proofs)、安全多方計算 (Secure Multi-Party Computation) 與對加密資料的計算等,並伴隨著嚴謹的安全性保證。從實務的角度來說,密碼學是保護現代網路安全的基礎。每次與 Google 或成千上萬個網站建立連接時,我們都會使用加密技術。沒有它,現代網絡商務將一秒鐘也無法生存。

我們長期致力於密碼學的理論與實務研究。楊柏因研究員是廣泛使用的 Ed25519 數位簽章的共同作者。Ed25519 將被美國國家標準暨技術局(NIST)納入 FIPS 186-5 標準中,並有數億用戶使用。楊柏因和王柏堯研究員也是在「高安全性加密軟體」(High Assurance Crypto Software) 領域的先驅研究者。在理論研究方面,鐘楷閔研究員的研究涵蓋了多項重要研究主題如零知識證明與 ( 平行 ) 隨機存取機 ((Parallel) RAM) 計算模型下的密碼學等。

隨著最近量子科技與量子電腦的快速發展,了解量子對於密碼學的影響是十分重要的研究課題。( 大型 ) 量子電腦能利用秀爾演算法 (Shor's Algorithm) 對現今公鑰加密系統造成災難性的影響,這也是少數目前確認的量子計算的殺手級應用之一。秀爾演算法基本上是個利用量子特性尋找週期的演算法,能有效率的破解 RSA 與橢圓曲線密碼學 (Elliptic Curve Cryptography; ECC)。儘管量子電腦的出現速度比預期的要慢,但 NIST 的一份報告曾謹慎的預估量子電腦最早可能將於 2030 年破解橢圓曲線密碼學。

鑑於轉換現行密碼架構的困難性,這是當今亟待解決的問題。其嚴重性體現在轉換現行密碼架構所需的成本與時間。前者可以從當年解決 Y2K 問題的成本推估,後者我們則應注意到業界已經花了二十年但仍未完全轉換到 AES。後量子密碼學(Post-Quantum Cryptography; PQC)是一個快速發展的領域,它通過開發可抵禦量子攻擊的安全密碼系統(通常是公鑰密碼系統)來應對這一挑戰。

我們也可以將量子視為一把雙刃劍。儘管可能無法在可見的未來實用化,但量子也增強了好人的能力,使其達成更強安全性或更多的任務。近年來量子密碼學對於當好人取得量子能力時的可能性有豐富且令人興奮的發展。

後量子密碼學

密碼學是量子電腦造成最早也最可見實體影響的學科:美國國家標準局 (NIST) 在 2015 年就開始討論新一代的、可抗量子的公鑰密碼系統,在 2016 年他們公開向全世界徵求後量子密碼系統,這個甄選過程在 2017 年截止收件。在這四年間,幾乎所有實作派的密碼學家都參與後量子密碼系統的設計、實作和破密。

NIST 的標準化過程進行到第二階段,在 82 個投稿的系統中有 69 個「完整而正確」,它們成為第一階段的候選者,並被邀請參加 2018 年四月的第一次候選者大會。而幾乎是 NIST 才剛公布全部的名單,後量子密碼學家就已經在分析和攻破這些系統。大概有 50 個沒被攻破的系統中 NIST 挑出 26 個(其中 17 個加解密系統和 9 個數位簽章)晉級第二階段,並邀請它們參加 2019 年八月的第二次候選者大會。第三階段預計將在 2020 年六月開始。晉級這一階段的最後決選者中將預計在 2022 年選出勝利者,並在 1-3 年後標準化。這個過程和 AES 與 SHA-3 的競賽是類似的。

楊柏因研究員在這十幾年來一直研究後量子密碼學,並成為國際知名的專家(他是後量子密碼學國際研討會指導委員會的成員),同時他的實驗室也是在台灣唯一參加 2017 年 NIST 甄選投稿的團隊。

楊研究員是所謂多變量密碼系統的專家。這種密碼系統的安全性是繫於解多變量非線性方程組的複雜度。他的團隊完成了很多多變量密碼系統的理論和實務面的探討,而幾乎所有現存的多變量密碼系統的設計都有他經手的部份。他的團隊在 NIST 的甄選中投了兩個系統,其中之一(Rainbow 簽章系統)晉級第二階段且很可能進入第三階段。

14

同時,楊團隊也研究其他的後量子密碼系統的設計、實作和攻擊。他們曾在幾個公開的晶格密碼系統分析的競賽中名列前茅。在其他的密碼系統中也可以看到他的影響。他最近的一個研究是如何高速且定常時間的計算最高公因式 ( 數 ) 和模乘法反元素,這在 NTRU 及相關的密碼系統的金鑰生成上非常重要。

量子為壞人帶來的能力其實不僅限於量子計算,還有如處理量子信息,利用量子糾纏,與進行量子疊加的操作等,這些能力在不同的密碼學領域,如在 Leakage-resilient Cryptography 與 Tampering-resilient Cryptography 領域,以及在分析使用隨機預言機 (Random Oracle) 的構造的安全性時,可能為壞人帶來優勢。從理論的角度,了解這些優勢對安全性的影響,以及如何證明安全性是重要的研究課題。這樣的理論研究也會對實務上決定後量子密碼系統的安全參數產生影響。在此研究方向我們的研究集中在探討具有量子信息的壞人,發展對此證明安全性的技術。我們也探討量子隨機預言模型下的安全性。

量子密碼學

相對於後量子密碼學探討古典密碼系統如何抵禦量子壞人,量子密碼學廣泛地探討當好人也具備量子能力時能實現的密碼學任務。如量子密鑰分發 (Quantum Key Distribution; QKD) 可以實現具資訊理論安全性 (Information-theoretic Security) 的安全通訊 (Secure Communication),這在古典密碼學 ( 好人不具備量子能力時 ) 可被證明是不可能實現的。又例如當未來資料也變成量子資料時,我們如何對量子資料進行加密,能否像對古典資料一樣,對加密後的量子資料進行運算,也是量子密碼學探討的重要問題。儘管這樣的研究可能在可見的將來無法實用化,量子密碼學是令人興奮的跨領域的學科,需要結合密碼學、量子物理、複雜度理論 (Complexity Theory) 與資訊理論 (Information Theory) 等領域的最近技術來研究。量子密碼學也提供了豐富的情境與研究課題,往往能協助發展新的技術與深入的觀察,反過來推動這些領域的研究。

鐘楷閔研究員是理論學家,從事(後)量子密碼學的理論研究近十年。他曾在各主要國際密碼學會議,如 CRYPTO,Eurocrypt,Asiacrypt,QCrypt 與 TCC 擔任議程委員會 (Program Committee) 委員。除上述關於後量子密碼學的理論課題外,他也研究了量子密碼學中的多個課題,如設備無關密碼學 (Device-independent Cryptography)、安全多方量子計算 (Secure Multiparty Quantum Computation) 與古典委託量子計算 (Classical Delegation of Quantum Computation) 等,並利用量子密碼學中的技術來回答量子複雜度理論中的猜想。底下重點介紹其中兩項研究工作。

第一項是關於設備無關密碼學,其目的是設計協定 (Protocol) 讓古典的好人能安全地利用不可信任的量子設備。我們的研究工作提出了一個基於最少假設的設備無關亂數增強 (Device-independent Randomness Amplification; DI-RA) 協定。具體來說,我們的協定能只利用一個具足夠亂度 (Sufficient Min-entropy) 的弱亂數源 (Weak Random Source),在不需要其他結構上的假設下可驗證的產生純隨機數 (Truly Random Bits)。其他已知協定均需要使用具一定結構的 Santha-Vazirani 亂數源與特定的條件獨立性假設。此研究結果在基礎物理學上也有應用:我們的協定可被解讀為一個關於物理世界具隨機性或確定性的二分定理 (Dichotomy Theorem):只要物理世界不是確定性的,就可驗證的存在完全隨機無法預測的事件,因此排除了「弱隨機世界」的可能。

我們近期的研究工作發展並利用量子密碼學的技術來研究「低量子深度的經典量子混合計算模型」的計算能力,並回答 Scott Aaronson 與 Richard Jozsa 過去提出的猜想。在可見的未來量子電腦的能力可能會被侷限在低深度的量子計算,因此此類混合計算模型可以說是短期內利用量子電腦所能取得的計算能力。Jozsa 於 2006 年提出一個猜想,認為所有多項式時間 (Polynomial-time) 的量子計算都可以在一種基於所謂 Measurement-based Computation 所定義的混合計算模型下被模擬。但另一方面,Aaronson 持不同立場,猜想 BQP 與另一種自然的混合計算模型之間應該存在所謂的 Oracle Separation。我們利用量子密碼學的技術證明了 BQP 與這兩種混合計算模型之間的 Oracle Separation,證明了 Aaronson 的猜想,也表示 Jozsa 的猜想相對於 Oracle 並不成立。

15

Cryptography and Quantum Computation
Principal Investigators: Dr. Bo-Yin Yang and Dr. Kai-Min Chung
Project Period: 2018/1-2021/12

Traditionally, cryptography is about achieving secrecy and authenticity and other desired functionalities while withstanding adversarial behaviors, which can be viewed as a battle between honest parties and their adversaries. After more than three decades of research, cryptography has become an extremely rich field with significant practical impact. Theoretically, it has evolved far beyond the traditional goal of secure communications, enabling us to realize seemingly contradictory tasks such as zero-knowledge proofs, secure multi-party computation, and computation over encrypted data, whilst providing rigorous security guarantees. On the practical side, it serves as a fundamental building block of modern cybersecurity; every single connection to Google or any of thousands of websites involves the use of cryptography. Modern web commerce could not survive without it.

We have been devoted to both the theoretical and practical elements of cryptography. Dr. Bo-Yin Yang is co-author of the widely-used Ed25519 Digital Signature Scheme. Ed25519 will be included in the FIPS 186-5 standard of the U.S. National Institute of Standards and Technology (NIST), and is employed by hundreds of millions of users. Bo-Yin and Dr. Bow-Yaw Wang also pioneered research on HACS (High Assurance Crypto Software) for formal verification of cryptographic software. On the theoretical side, Dr. Kai-Min Chung has studied fundamental theoretical topics such as zero-knowledge proofs and cryptography in the (parallel) RAM model.

Given recent rapid developments in the building of quantum computers, it has become imperative to understand the effect of quantum technology in cryptography. Quantum computing, in the form of Shor's algorithm, can devastate currently deployed public-key cryptography. Shor's algorithm, one of a few currently recognized quantum computing "killer applications", fundamentally is a period-finding method that can "break" the RSA cryptosystem and ECC (Elliptic Curve Cryptography). Although the advent of quantum computing has been slower than expected, NIST has cautiously predicted that quantum computers will decode ECC by as early as 2030.

Due to the challenges of all society migrating to a new cryptography infrastructure, it is a pressing issue that must be addressed now. Not only is it a critical issue, but it has serious implications in terms of both cost (inferred from the costs of the Y2K transition) and time (as industry has not fully transitioned to AES despite the passing of two decades). Post-quantum cryptography (PQC) is a rapidly-growing research field that addresses this challenge, involving the development of cryptosystems (usually public-key cryptosystems) that are secure against quantum adversaries.

Quantum technology can also be viewed as a double-edged sword. Although it is much less likely to become practical in the foreseeable future, quantum computing enhances the power of honest parties to achieve stronger functionality or security. There has been a rich development in theoretical cryptography exploring the various exciting possibilities arising when we (the honest parties), and not just they (the adversaries), have quantum computers.

Post-quantum Cryptography

Cryptography is the field in which quantum computing has made the earliest and most visible real-world impact. NIST began discussing a new post-quantum standard in 2015, ultimately calling for proposals in 2016 with a deadline of 2017. Over the past four years, almost every real-world cryptographer has been involved in the design, implementation or cryptanalysis of post-quantum public-key cryptosystems.

NIST is currently undergoing Round 2 of the process for standardizing post-quantum cryptosystems. Of the 82 entries initially submitted, 69 were considered appropriate and sufficiently complete to call a conference in April 2018. PQC researchers from around the world are participating in this standardization process. They have been studying and attempting to break first-stage candidate cryptosystems upon their release. Of 50 as yet unbroken candidate cryptosystems, NIST selected 26 (17 encryption schemes and 9 digital signatures) to advance to Round 2. A second conference on those cryptosystems was held in August 2019. Round 3 of the standardization process is expected to begin in June 2020, with final candidates being announced in 2022 and standardization taking another 1-3 years. This process is broadly similar to the AES and SHA-3 competitions run by NIST.

Dr. Bo-Yin Yang has worked on post-quantum cryptography for more than a decade and is an internationally renowned scholar in this field. From its earliest days, he has sat on the Steering Committee of PQCrypto, an international workshop series on PQC. His team was the only one from Taiwan to participate in the NIST call for proposals in 2017.

Bo-Yin is an expert in the class of cryptosystems known as multivariates, the security of which is based on the difficulty of solving multivariate nonlinear systems of equations. He has generated many theoretical and practical results on multivariates and has been involved in the design of almost all

16
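The period-finding role of Shor's algorithm is concrete enough to demonstrate classically on toy numbers. In the sketch below (our own illustration, with the quantum subroutine replaced by brute force), knowing the period r of a^x mod N yields factors of N via two gcd computations:

```python
from math import gcd

def multiplicative_order(a, n):
    """Brute-force stand-in for the quantum period-finding step of
    Shor's algorithm: the smallest r > 0 with a**r == 1 (mod n)."""
    x, r = a % n, 1
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def shor_classical_postprocess(a, n):
    """Classical half of Shor's algorithm: turn the period r into
    nontrivial factors of n, or None if this choice of a fails."""
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None  # odd period: retry with another a
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None  # trivial square root: retry with another a
    return gcd(y - 1, n), gcd(y + 1, n)
```

For N = 15 and a = 2 the period is 4, and gcd(2^2 - 1, 15), gcd(2^2 + 1, 15) recover the factors 3 and 5; at cryptographic sizes only the quantum subroutine can find r efficiently, which is exactly the threat PQC responds to.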

extant multivariate cryptosystems. Bo-Yin's team submitted two proposals to NIST for their PQC competition. One of his proposals, Rainbow, is a Round 2 candidate and it will likely enter Round 3. He also studies the design, attack, and realization of other PQC systems, principally lattice-based cryptosystems. Over the years he has held first place in various lattice cryptanalysis challenges. His handiwork can be seen in other post-quantum cryptosystems. One of his principal contributions has been an algorithm for high-speed constant-time computation of greatest common divisors and modular multiplicative inverses, which is of paramount importance to generating keys for NTRU and related cryptosystems.

The issue of the advantages that quantum adversaries have is not limited to quantum computation, since those advantages can also contribute to understanding how quantum information is processed, to exploiting entanglements, and to making quantum superposition queries that naturally arise in various contexts such as leakage and tampering-resilient cryptography and constructions in the random oracle (RO) model. With regard to theoretical aspects, it is fundamental to understand those adversarial advantages and how they can enable enhanced security of practical post-quantum cryptosystems. Our work on this topic focuses on developing techniques to prove security against quantum adversaries in the presence of quantum (side) information, as well as security in the quantum RO model.

Quantum Cryptography

Unlike post-quantum cryptography that focuses on securing classical crypto constructs against quantum adversaries, the general field of quantum cryptography broadly explores what can be achieved when honest parties also have quantum technology. As a well-known example, quantum-key distribution (QKD) enables communication with information-theoretic security, which is classically deemed impossible. As another example, when our data becomes quantum, will we be able to encrypt quantum data and perform computation over it as we can for classical data? While practical research on this question might be a long way off, quantum cryptography is an exciting theoretical field that combines state-of-the-art techniques from cryptography, quantum physics, complexity theory, and information theory. In turn, quantum cryptography also provides a rich context for developing deep insights and new techniques to advance the study of these contributory fields.

Dr. Kai-Min Chung is a theorist who has worked on various theoretical aspects of (post-)quantum cryptography for nearly a decade. He has served on the program committees of major international cryptography conferences such as CRYPTO, Eurocrypt, Asiacrypt, QCrypt, and TCC. Apart from the theoretical topics on the aforementioned PQC, he has also worked on several topics in quantum cryptography such as device-independent cryptography, secure multiparty quantum computation, and classical delegation of quantum computation. He is also using techniques from quantum cryptography to investigate topics in quantum complexity theory.

One of his research interests is device-independent cryptography, through which protocols can be designed for classical parties to securely exploit the quantum power of untrusted quantum devices. His work on this topic resulted in construction of the only device-independent randomness amplification (DI-RA) protocol under proven minimal assumptions. Specifically, the DI-RA protocol his group has developed can certifiably generate truly random bits given a weak source with sufficient minimum entropy without any structural assumptions. In contrast, other existing protocols require structured Santha-Vazirani sources and certain conditional independence assumptions. Dr. Chung's protocol also implies a strong dichotomous theorem for intrinsic randomness in fundamental physics, asserting either that "Nature" is fully deterministic or that totally unpredictable events certifiably exist in "Nature".

Dr. Chung and his group have also developed techniques in quantum cryptography to study the power of classical-quantum hybrid computation using low-depth quantum computation, answering conjectures by Scott Aaronson and Richard Jozsa. Given that quantum computation will be restricted to low depth, such hybrid models can capture the computational power available in the near term, but the reliability of such hybrid models is unclear. In 2006, Richard Jozsa postulated that any polynomial-time quantum computation could be simulated in a hybrid model motivated by measurement-based quantum computation. In contrast, Scott Aaronson inferred an oracle separation between BQP and another type of hybrid model (first mentioned in 2005, and resurrected in 2011 and 2014). Building on techniques from quantum cryptography, Dr. Chung's team has revealed an oracle separation between BQP and both those hybrid models. These findings support Aaronson's conjecture and reject that of Jozsa.

17
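For a sense of the gcd and modular-inverse computation named in the post-quantum section, here is the textbook extended Euclidean method. This sketch is ours and is deliberately the simple variable-time version; the contribution described earlier is precisely a constant-time replacement, so that key generation for NTRU-like systems leaks nothing through timing.

```python
def xgcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b).
    Variable-time: the number of loop iterations depends on the inputs."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def mod_inverse(a, m):
    """Multiplicative inverse of a modulo m, the quantity needed during
    key generation in NTRU and related cryptosystems."""
    g, x, _ = xgcd(a % m, m)
    if g != 1:
        raise ValueError("a is not invertible modulo m")
    return x % m
```

Because the iteration count here depends on secret values, a naive implementation like this can leak key material through timing, which is why a high-speed constant-time algorithm matters in practice.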

染色體層級之新世代定序組裝策略:運用於個人基因體精準醫療與經濟生物之基因體育種
計畫主持人:林仲彥博士
計畫期程:2018/1~2021/12

日新月異的生物序列次世代定序技術、單細胞基因分析、經濟作物的全基因體解析與個人化精準醫學等,將生物醫學研究推進至巨量資料層級,特別是個人的基因體組裝(正常與癌化組織),伴隨巨量生物序列的累積,將能揭露更多致病機制與提供未來治療的可能策略,讓我們人類有機會邁向更為健康長壽的新境界。

2019 年 10 月 30 日由行政院所積極推動的「台灣精準醫療 Biobank 整合平台」,將帶領台灣個人化精準醫學,與個人基因體定序分析相關研究的大步邁進。然而,對於混合不同平台、不同品質與定序長度所產出之大量序列,來進行高品質的基因體重組 (de novo genome assembly),目前並沒有很好的工具程式。由於基因定序平台的快速演進與多樣化,定序長度已由原來的數百鹼基,進展到五千以上甚至數十萬的鹼基長度,同時一些與染色體結構相關的定序技術 ( 如 Hi-C 等 ),也提供染色體層級的對應資訊。此外新的策略如 Bionano Genome Mapping Systems 等,以光學方式重新進行染色體基因體長片段測繪 (Genome Mapping) 的方式,可以讓我們得以瞭解染色體層級的真實排列順序與大片段的基因結構差異。然而,現今既存之基因重組演算法大多無法結合不同定序策略的優點,少數可用之方法的組裝結果不佳且速度緩慢。針對這樣的現況,必需發展新一代的快速高效率演算法,利用雲端平台與分散式運算的優勢,才能結合二代定序、三代定序及基因體光學解析等技術,將基因體組裝推進至染色體層級,以更為完整的面貌,來解析複雜的基因上下游調控機制、找尋個人精準治療策略與協助基因體育種等之複雜課題。然而,如何在有限的定序經費與計算資源下,在定序策略與組裝品質之間取得平衡,是基因體研究領域當今所遇到的最大挑戰。

目前研究團隊透過手邊已有大量數 TB 的高品質自產資料 ( 包括次世代定序資料、第三代單分子定序及 10x Genomics 短片段標誌序列等 ),研發了新的混合組裝演算法。除了能在組裝前後進行序列的分析,也能動態地以遞迴 (Spiral assembling) 與多樣定序平台混搭的方式,來優化整個組裝的基因體,同時也能利用大量的 10x Genomics 短片段標誌序列,來重新組裝與修補在原有序列中的遺缺,以及透過組裝序列間的拓樸關係,來連結與修補可能相鄰的長序列大片段。目前在日本鰻、台灣鯛及個人基因體組裝上,已有相當不錯的進展,能大幅減少序列中的遺缺,並連結組裝片段,同時增加基因內涵。

在非模式物種方面,對於已有不同平台定序資料的雛形基因體,我們已發展可動態組合的演算法,依其定序量與類型,來找出最佳的組裝方式。對於全新的定序計畫,我們則建議直接以三代定序為主,結合加上標誌序列的二代定序 ( 以 10x Genomics 所產出的短序列標誌連鎖組 ) 與染色體構象 Hi-C 定序為輔,在考量定序成本與基因體覆蓋倍率的平衡下,利用所開發新一代組裝演算法模組,將可提供全新高品質基因體組裝新策略。基於前項的成功經驗,已有院內生多中心、台大、海大與台灣水產試驗所等團隊與我們接觸商討合作,並簽訂 MOU,希望透過我們的經驗,來完成複雜的基因體解碼工作(如白蟻、海水台灣鯛、白帶魚及金目鱸等),作為後續基因體育種的重要基礎 ( 圖一 )。

圖一:開發新的組裝策略並結合新的定序技術,大幅提昇基因體組裝的質與量。

18

在個人化基因體重新組裝部分,將與中研院生物醫學研究所團隊合作,整合 10x Genomics 與 bionano 光學測繪的大量資料 ( 一個人約有超過 100Gb 以上的數據 ),來發展新一代雲端組裝演算法,建構高品質個人基因體。目前已能有效地將原有組裝之遺缺 (Gaps) 部分大幅縮減,並增加序列長度與基因內容,也就是提昇整體組裝的解析度與質量,將能作為未來結合人工智慧與相關醫療資料後,發展精準醫療的重要紮實基礎 ( 圖二 )。同時,為了解決大量運算所造成的計算需求,在高效率雲端計算流程建構部分,研發團隊將與台灣微軟公司一同合作開發。

本研究之相關成果也陸續整理釋出程式源碼及建構 Docker images,放置在 GitHub 及 Docker Hub (https://hub.docker.com/u/lsbnb) 上。組裝出來的經濟生物基因體,也透過研發團隊所建構的平台,進行註解與線上資料庫建構,將這一些非模式物種之基因體研究,提昇到如同果蠅或小鼠等模式生物的水準,協助相關研究社群進行更詳盡的研究 ( 圖三、四 )。同時,我們也透過參與國際研討會與舉辦工作坊的方式,以及一年一度的院區開放日來宣傳與推廣,讓更多院內外的生物醫學研究社群得以善用這些工具,更深入他們自身的研究主題。此外,研發成果中的『人類與小鼠基因表現分析平台』,已與國內定序廠商合作,以技術授權的方式完成轉移威健生物科技股份有限公司,於 2018 年底開始商業試運轉。

圖二:利用我們所發展的策略來精進個人化基因體組裝品質,邁向未來精準醫療。

圖三:將非模式物種之基因體研究推至模式生物的架構。

圖四:基因體分析應用程式,及多個重要物種之線上資料庫平台,可透過 QR codes 連結網站。

19

Spotlight Projects亮 點 計 畫 A Reliable Gap-Filled Strategy for Non-Reference Chromosome-Level Assembly: Applications to Human Precise Medicine and Aquaculture Breeding Principal Investigators: Dr. Chun-Yen Lin Project Period: 2018/1~2021/12 Rapid progress in next-generation sequencing (NGS) not fully utilize all of the information available from raw and third-generation single-molecule sequencing (TGS) sequences. Emerging sequencing technologies such as technologies has moved biomedical research into the Hi-C (chromosome conformation capture) and Bionano era of big data. Big data has rendered sequencing-based (optical genome mapping technology) are prompting tasks―including bioinformatics analyses, data statistics us to integrate all available sequencing approaches to and visualization, and data transfer and storage―more characterize genomes more comprehensively than ever. challenging than ever, yet these issues have inspired new research avenues in bioinformatics and biomedicine. In We have generated and collected a considerable array particular, de novo genome assembly of human and non- of sequencing data, including NGS reads, linked reads, model organisms for personalized medicine and genome TGS reads, and Bionano optical mapping outputs. We are breeding are helping to analyze disease mechanisms and starting to develop a new hybrid and spiral approach to deciphering the secrets of the genome. assess this data by integrating gap-closing and sca olding algorithms using local de novo assembly of link reads (10x However, methods and software for dealing with hybrid genomics). This approach has improved the draft genome genome assembly of NGS linked-read (10x Genomics) assemblies of important aquacultural species (i.e., Japanese and TGS long-read data are currently limited, so results eel, Taiwan Tilapia, Giant Grouper) (Figure 1), both in terms are rarely satisfactory. However, the existing tools do of contiguity and completeness. 
Figure 1 : Our novel strategy for analyzing new sequencing technology outputs significantly improves the quality and extent of the Japanese eel genome assembly.

Moreover, our new approach integrates a large number of raw reads from 10x Genomics and Bionano optical mapping to assemble (up to chromosome level) de novo personalized genome drafts with more genomic content and fewer gaps (Figure 2). To ease the computational burden of local assembly, we are engaging in a collaboration on cloud computing with Taiwan Microsoft Inc.

To better interpret assembled genomes, we have implemented a web-based framework we name MOLAS (http://molas.iis.sinica.edu.tw). MOLAS aids genomic studies of non-model species to levels comparable to those of model organisms such as Drosophila or the mouse, allowing the research community to conduct more in-depth analyses (Figure 3). All software and tools derived from our study are being released to the public via GitHub and Docker Hub (https://hub.docker.com/u/lsbnb). Furthermore, we have made several of our web databases available to the wider research community (Figure 4).

Figure 2 : Our strategy can refine the quality of personalized genome assemblies, contributing to the future of precision medicine.
Figure 3 : Genomes of non-model organisms can be characterized to the level of model organisms for further studies based on our tools and platforms.
Figure 4 : Tools and web genome databases for collaborators and research communities.

空氣盒子：微型空氣品質感測系統
計畫主持人：陳伶志博士
計畫期程：2016/1~2021/12

微型空氣品質感測系統是一種利用低成本微型空氣品質感測器、透過即時數據傳輸，以及結合大數據時空資料分析而成的新型態環境物聯網系統。相較於傳統的環境監測系統，微型空氣品質感測系統承繼傳統物聯網低功耗、低成本與無線網路傳輸的特性，不但在測站佈建的數量與密度上可以輕而易舉遠勝傳統龐大且昂貴的專業環境監測系統，其透過無線網路的資料傳輸能力更大幅改善了以往環境監測使用人力傳遞資料的效率，並且開啟了小尺度 (finer-grained)、即時性 (real-time) 與適地性 (location-aware) 環境觀測的創新應用。因此，微型空氣品質感測系統已經成為當今各國政府、大型企業與民間社群所共同重視且致力推廣的環境監測基礎建設。

我們自 2013 年開始，針對微型空氣品質感測系統進行一系列的研究，並且於 2015 年正式以細懸浮微粒 (PM2.5) 為標的，發起參與式微型 PM2.5 感測系統，透過與民間社群、資通訊產業、地方政府與各地中小學校的合作串連，迅速在全台各地發酵。近期更透過政府的前瞻基礎建設計畫，深化到全國每一間中小學的教育現場，且透過學研社群的推廣擴散，截至 2020 年 3 月，已成功於全球 58 個國家達到超過一萬五千台的布建量，成為全球規模最大的微型空氣品質感測系統；同時，藉由與全球相關微型空品感測計畫的合作，我們所建置的微型 PM2.5 感測開放資料平台，更已經成為目前全球微型 PM2.5 感測學研社群開放資料集的最大集散地。

圖一：微型 PM2.5 感測系統布建狀況地理分佈圖（至 2020 年 3 月）。

我們的各項研究成果已受到各界廣泛的注目。除了國內各主要報章媒體的專訪與專題介紹外，在 2019 年國內首部空汙紀錄片「浮塵之島」中，更詳盡地記錄了我們的研究過程與各項成果。同時，我們的研究團隊亦有良好的國際能見度，並且與全球主要的微型空氣品質感測團隊建立密切的合作關係，例如：美國 Argonne National Lab 的 Array of Things 計畫、美國 Harvard University 的 The COGfx Study 計畫、美國 University of Utah 的 AQ&U 計畫、德國的 SmartAQNet 計畫、泰國的 DustBoy 計畫、以及韓國慶尚南道的 AirBox 計畫等，同時，也獲得國際知名媒體與節目的專題報導（CBS 新聞及 BBC News 的 Click 節目）。

TVBS T-觀點專題報導（2018 年 1 月）　CBS News（2018 年 8 月）

除了感測器的布建外，我們也根據微型空氣品質感測系統的特性，發展一系列的資料分析與應用服務演算法。例如：針對參與式感測系統可能衍生的資料品質問題，我們發展一個針對感測資料進行異常分析的即時運算演算法；針對短時間內的空氣品質預報，我們提出一個融合資料叢集與類神經網路技術的運算方法；根據感測資料與預報資料，我們結合 GIS 技術提出最小空汙曝露量的個人化導航創新服務。我們的研究成果除了在學理上力求創新突破，並發表在相關領域的國際重要期刊外，在實務上亦務求能實際應用於運行中的微型 PM2.5 感測系統，以期能化研為用，為環境品質的監測善盡一分力量。

由於微型空氣品質感測系統能提供更小時空尺度的量測資料，因此除了在資訊科學與環境科學上的貢獻外，在公共衛生、風險管理、遙測影像分析、都市規劃、大氣模擬、科技與社會研究等專業領域，也紛紛開啟了創新的跨領域研究議題，而其各項學研成果的推廣應用，更帶動了智慧城市的新發展，對於未來邁向智慧環境治理，以及促進智慧決策，也成了產官學研與民間社群共同努力共創雙贏的具體新方向。

(a) System architecture (IEEE Access '17)　(b) Anomaly detection (IEEE JIoT '18)　(c) Hybrid data forecast (IEEE Access '18)　(d) Clean Air Routing (IEEE Access '19)
圖二：微型 PM2.5 感測系統相關研究成果。

「浮塵之島」紀錄片（2019 年 1 月）　BBC News: Click 專題報導（2019 年 11 月）

AirBox: Micro Air-Quality Sensing System
Principal Investigator: Dr. Ling-Jyh Chen
Project Period: 2016/1~2021/12

Micro air-quality sensing is an emerging paradigm that combines advances in low-cost air sensors, long-range low-power communication, and big data analysis for environmental monitoring. Compared to conventional environmental monitoring systems, it is more cost-effective for large-scale and dense deployments, and it can provide measurements at a finer spatio-temporal scale. In recent years, diverse governments, industries, and communities worldwide have deployed large-scale micro air-quality sensing systems.

We have been devoted to research on micro air-quality sensing since 2013. We launched the AirBox project in 2015, which engages citizens to participate in fine particulate matter (a.k.a. PM2.5) sensing and empowers participants to make low-cost PM2.5 sensing devices on their own. The project has grown rapidly with support from local governments, domestic IT companies, and citizen communities. In 2019, our project was awarded a 4-year government grant to deploy AirBox devices in every K12 school in Taiwan. By March 2020, we had placed more than 15,000 devices in 58 countries, and our open data portal has become the largest and most popular repository of micro PM2.5-sensing data in the world.

Figure 1 : Participating countries in the AirBox project (to March 2020).

Our AirBox project has received extensive attention from all sectors. Apart from domestic media coverage―including in "Dust Island – Particulate Matters", the first documentary focusing on air pollution in Taiwan, aired in 2019―AirBox has been featured by CBS (USA) and BBC News (UK) in 2018 and 2019, respectively. Our project has good visibility in the international community, especially due to our tight collaborations with top research teams and government agencies such as the Array of Things project (Argonne National Lab, USA), COGfx Study (Harvard University, USA), AQ&U project (University of Utah, USA), SmartAQNet project (Karlsruhe Institute of Technology, Germany), DustBoy project (Chiang Mai University, Thailand), and AirBox Korea project (Gyeongsangnam-do Provincial Government, Korea).

TVBS T-Viewpoint (2018.01)　CBS News (2018.08)

In addition, we have investigated the properties of micro air-quality sensing data and generated a number of algorithms for data analysis. For instance, we designed an anomaly detection framework (ADF) to detect anomalous devices/events in sensing data streams, and we have proposed a hybrid model for short-term air-quality forecasting by incorporating data clustering and neural networks. Moreover, by combining real-time sensing data and short-term forecasting results, we have proposed a clean air routing (CAR) algorithm to provide route recommendations for minimal air pollution exposure. Our research is of both theoretical and practical value, and our findings have been published in prestigious journals. Our algorithms have been implemented in the AirBox system, and the results they generate are being used by governments and research communities.

Finally, by exploiting the fine-scale data resolution of micro air-quality sensing, our AirBox project not only can benefit research in the computer and environmental sciences, but it can also stimulate interdisciplinary innovation in public health, risk management, urban planning, atmospheric science, and various other science and technology fields. Our project has created a positive ecosystem in which academia, industry, governments, and citizens collaborate. It has the potential to facilitate smart city design, intelligent environmental governance, and public-private partnerships for the common good into the future.

(a) System architecture (IEEE Access '17)　(b) Anomaly detection (IEEE JIoT '18)　(c) Hybrid data forecast (IEEE Access '18)　(d) Clean Air Routing (IEEE Access '19)
Figure 2 : Selected research results from the AirBox project.

"Dust Island - Particulate Matters" documentary (2019.01)　BBC News: Click (2019.11)
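To illustrate the idea behind clean air routing, the sketch below treats the road network as a graph and runs Dijkstra's algorithm with edge costs weighted by estimated PM2.5 exposure (travel time multiplied by the mean concentration at the edge's endpoints). This is a simplified illustration under assumed inputs, not the published CAR algorithm; the graph, sensor readings, and cost model are all hypothetical.

```python
import heapq

def clean_air_route(graph, pm25, start, goal):
    """Dijkstra search that minimizes cumulative PM2.5 exposure.

    graph: {node: [(neighbor, travel_minutes), ...]}
    pm25:  {node: sensed PM2.5 concentration (ug/m^3)}
    Edge exposure ~ travel time x mean concentration of its endpoints.
    """
    frontier = [(0.0, start, [start])]
    settled = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for nxt, minutes in graph.get(node, []):
            exposure = minutes * (pm25[node] + pm25[nxt]) / 2.0
            heapq.heappush(frontier, (cost + exposure, nxt, path + [nxt]))
    return float("inf"), []
```

On a toy network where a short route passes a polluted node, the algorithm prefers a slightly longer detour through cleaner air, which is exactly the trade-off CAR-style routing is meant to expose.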

人工智慧計畫 Artificial Intelligence Projects

計畫綜覽 / Overview
深度學習與新興電腦視覺應用 / Deep Learning for Emerging Computer Vision Applications
深度學習智慧系統整合：異質性深度模型整合與檢索特徵學習 / Deep Learning Methods for Application-driven Smart System Integration
深度學習於多媒體資料處理的相關研究及應用 / Deep Learning for Multimedia Information Processing
具深度理解之對話系統及智慧型輔助學習機器人 / Intelligent Conversational Robot for Learning Assistance with Natural Language Understanding
建構概念為本且具語義結合性的中文知識庫 / The Construction of A Concept-based Chinese Knowledge Base with Semantic Composition Capability

計畫綜覽 / Overview

人工智慧的深度學習議題在 ImageNet 被開發出來並舉行 ILSVRC（ImageNet Large Scale Visual Recognition Challenge）後開始被熱烈探索，2016 年 AlphaGo 擊敗韓國圍棋世界冠軍李世乭後更讓深度學習的研發進入另一個高峰期。

本所 2017 年學術諮詢委員會議委員最後有一項建議 "The Institute needs an AI/Machine Learning group that is charged with the mission of innovation and research in the area of artificial intelligence, deep learning and data science; related activities exist in many current labs and cross-lab synergy should be maximized." 當年年中科技部也察覺到人工智慧的新時代已然來臨，於是快速提擬預算，向全國各領域徵求人工智慧計畫。

本所原來也為了諮詢委員的建議擬成立一個 AI group，但科技部的徵求計畫讓許多同仁蠢蠢欲動。因此，成立一個新的 group 的原始想法變成用各個實驗室原有的專長去投入 AI 研究。很幸運的，在全國超過 600 件申請計畫中，本所獲得了四件大型 AI 計畫補助，分別是多媒體實驗室 3 件（劉庭祿－源於 GAN 的深度學習技術與網路精簡化在電腦視覺的應用；陳祝嵩－基於 AI 應用之深度學習智慧系統整合；廖弘源－深度學習於多媒體資料處理的相關研究及應用），語言與知識處理實驗室 1 件（許聞廉－具深度理解之對話系統及智慧型輔助學習機器人）。隔年（2019 年開始），語言與知識處理實驗室又多了一件通過計畫（馬偉雲－建構概念為本且具語意結合性的中文知識庫）。

此五件人工智慧計畫皆為四年期（前四項計畫執行期間為 2018 年至 2021 年，最後一項計畫執行期間為 2019 年至 2022 年），除了金額相當龐大之外（總計約每年 4,500 萬台幣），也為近幾年資訊所研發重點設定其中一個方向。以下的篇幅將涵蓋五個 AI 計畫的介紹及其進行到目前（2020 年 3 月）的狀況。

The issue of deep learning in Artificial Intelligence (AI) has been discussed heatedly since ImageNet was developed and used to host ILSVRC (ImageNet Large Scale Visual Recognition Challenge). AlphaGo defeated South Korea's world Go champion Lee Sedol in 2016, representing a massive leap in deep learning research.

In 2017, the Academic Advisory Committee of IIS stated in its final conclusions that "The Institute needs an AI/Machine Learning group that is charged with the mission of innovation and research in the area of artificial intelligence, deep learning and data science; related activities exist in many current labs and cross-lab synergy should be maximized." At that time, the Ministry of Science and Technology also realized that the era of AI had come, so it quickly allocated funds to solicit AI projects in various fields from across the country.

In response to the 2017 Academic Advisory Committee, IIS originally intended to set up a dedicated AI lab. However, since MOST's request for projects elicited excitement among so many of the PIs in IIS, the institute decided to employ the existing expertise in each of its labs to apply for MOST's AI projects. Significantly, out of more than 600 project applications nationwide in the first year, IIS was awarded four out of only 66 funded projects. The Multimedia Technology Lab is conducting three of those projects and the Language and Knowledge Processing Lab is responsible for the other. In a second round of applications in 2019, the Language and Knowledge Processing Lab was awarded an additional funded project. All five funded projects have four-year terms and their combined budget is USD$1.5 million per year. In the following pages, each project reports on their progress to date (to March 2020).

深度學習與新興電腦視覺應用
計畫主持人：劉庭祿博士
計畫期程：2018/1~2021/12

本計畫為四年期的研究規劃，旨在發展深度學習技術與其在電腦視覺領域的最新應用。關於深度學習方面，本計畫將聚焦於發展深度學習的各項核心技術，包含從基礎的訓練資料正規化到深度學習網路架構的自動化搜尋等研究議題。針對監督式、半監督式、弱監督式與少例訓練資料的各種深度學習模式，亦是本計畫的重點研究項目。在電腦視覺應用方面，我們將立基於本實驗室過去在物件偵測、物件辨識與影像分割等研究成果，開發以深度學習為骨幹的新世代電腦視覺技術，並結合諸如自然語言技術等，提出新興電腦視覺研究議題與應用。我們亦將探討適用於 360 影像、點雲集與醫學影像的電腦視覺演算法。此外，本計畫與業界合作方面將著重於智慧零售與智慧製造，特別是發展考量邊緣設備運算資源之深度學習網路精簡化技術，搭配快速的物件偵測與辨識方法，來開發可自動化處理日常零售商品銷售的相關商業應用。以下簡述本計畫前兩年執行之主要研究成果與業界合作現狀。

類生成對抗網路之電腦視覺技術

有鑑於生成對抗網路（GAN）與相關深度學習模型的優異表現，在訓練深度學習網路（DNN）時若能善用類似生成對抗網路（GAN）的信息反饋方式，應亦可獲致令人印象深刻的神經網路效能提升。受此信息反饋方式的啟發，我們著手研究用於訓練深度學習網路時的各式有效信息反饋方式（例如網路聚合、注意力導引、記憶提示、局部或全域資訊、多模態融合等其他方式）。對於物件偵測問題，我們設計以非局部重點區域（ROI）配合邊框式相關性，來擴增物件偵測之特徵表示。該技術在 CVPR 2018 中取得了強健性視覺挑戰賽中的實例分割競賽第一名。針對處理影像分割問題，我們設計一種植基於生成對抗網路之非監督式學習機制，該機制無須透過明確的像素層級前後景區域標註，即可自動建模出泛化的影像前後景概念。更為具體地說，我們的方法將元學習過程轉化為一種組合圖像編輯任務，藉由取得網路上豐富的視覺效果圖像資料，該非監督式學習機制可訓練所設計之深度學習網路來模仿某種視覺效果，於模型訓練時獲得的相對應內部表示遮罩除可用來完成組合圖像編輯任務，也可用來描述前景（請參見圖一）。這項研究成果發表在 AAAI 2019 上，而所設計的非監督式學習機制現已廣泛用於解決實際應用。

圖一：Visual-Effect GAN (VEGAN) for figure-ground segmentation.

圖二：Referring image segmentation model and qualitative examples.

基於自然語言的電腦視覺技術

對於電影問答（QA）問題，我們開發了一個交互注意力推論之深度學習網路架構來解決電影問答。所提出的這項解決方法在 MovieQA 排行榜中直到 2018 年 8 月前都是該電影問答資料庫的最準確解決方案。語提示影像分割任務是根據任務所提供之句子提示來準確地分割圖像中的目標區域。對於語提示影像分割問題，我們提出的深度學習網路架構如圖二所示，該網路聚合了自下而上和自上而下的視覺與文本資訊來完成此分割任務。這項工作發表於 ICCV 2019，據我們所知，此項技術在當前的主要數據集上是最準確的語提示影像分割技術。

應用於電腦視覺之小樣本學習

人類在有限指導之下去學習新概念的能力非常出色。即便沒有某物件類別的先驗知識，人類視覺系統也能透過執行不同的功能來處理該物件識別任務，這些功能包括將影像中歸屬於物體的像素進行分群、提取重要影像特徵進行比較，以及應用注意力機制來做物件定位。在初步研究中，我們提出共同注意 (co-attention) 與共同激勵 (co-excitation) 的深度學習網路模組，來解決單一樣本物件偵測問題。圖三顯示了該模組如何透過特徵擠壓和共同激勵機制驅使單一樣本學習模型去自動偵測出查詢影像和目標影像間的共同特徵。這項工作的階段性成果發表在 NeurIPS 2019 上，我們目前正在進一步發展該技術。

圖三：Non-local proposals and co-excitation for one-shot object detection.

新興電腦視覺應用

對於處理 360 度影片 / 圖像，我們提出了一種簡單而有效的立方體填充（Cube Padding，CP）技術，這項研發成果發表在 CVPR 2018 中，並展示了該技術可有效解決 360 度影片中的顯著性偵測任務。我們還致力於設計高效的深度學習網路模型和演算法來解決應用於 3-D 點雲集的電腦視覺任務，所欲探討的 3-D 點雲集應用包括 3-D 物件偵測、語義分割和實例分割。

產業合作

這項四年期專案的關鍵績效指標（KPI）可由我們與產業合作夥伴的成功合作來衡量，本計畫積極協助提升產業合作夥伴在 AI 相關商業應用的研發能力。自 2019 年底以來，我們與一家具未來發展性的 AI 新創公司密切合作，該公司於 2020 年春季正式成立並專攻智慧零售產業，該項產業適用於諸多 AI 技術，如商品物件自動化偵測與識別等應用情境。

Deep Learning for Emerging Computer Vision Applications
Principal Investigator: Dr. Tyng-Luh Liu
Project Period: 2018/1~2021/12

The primary goal of this four-year project is to develop deep learning techniques related to emerging computer vision applications. Our research efforts focus on addressing crucial issues in designing deep learning techniques, including how to regulate layer-wise feature distributions and how to effectively carry out network architecture searches, amongst other topics. Since availability and quality of annotated training data can vary substantially in practical applications, we also intend to establish deep learning frameworks that, apart from supervised settings, account for semi-supervised, weakly supervised, or few-shot learning scenarios. Leveraging our past successes in dealing with conventional problems in computer vision―such as object detection, object recognition and image segmentation―this project will advance available methods by incorporating powerful deep learning approaches. We are also interested in combining computer vision and natural language processing techniques for emerging computer vision applications. In addition, we are working closely with industry to identify aspects of smart manufacture and smart retail for targeted intervention. Currently, we are designing hardware-aware network simplification techniques so that our proposed deep learning methods can be efficiently ported to target edge devices without significantly degrading their performance. Below is a brief description of key results arising from our research efforts over the first two years of this project.

GAN-inspired computer vision techniques

Inspired by the impressive performance gain owing to training a DNN with GAN-like informative feedback versus without such additional information, we set out to investigate the effects of training a DNN with regard to various forms of useful feedback, such as network aggregations, attention cues, memory cues, local-vs-global information, and multi-modality fusion, amongst others. For object detection, we propose non-local ROIs to augment feature representations with bounding box-wise correlations. This technique won first place in the instance segmentation contest of the Robust Vision Challenge at CVPR 2018. In dealing with the problem of image segmentation, we design a GAN-based unsupervised learning mechanism to model a general figure-ground concept without relying on explicit pixel-level annotations. More specifically, we formulate the meta-learning process as a compositional image editing task that learns to imitate a certain visual effect and derive the corresponding internal representation by exploring webly-abundant images of visual effects. (See Figure 1.) This work is published in AAAI 2019 and our proposed unsupervised scheme is now being used extensively to solve practical tasks.

Figure 1 : Visual-Effect GAN (VEGAN) for figure-ground segmentation.

Figure 2 : Referring image segmentation model and qualitative examples.

Natural language driven computer vision applications

We develop an attention-to-attention DNN framework to tackle the movie question answering (QA) problem. Our proposed method was ranked as the top-1 model in the MovieQA leaderboard until August 2018. The task of referring image segmentation is to correctly segment the target region(s) in the image, according to a provided sentence hint. As illustrated in Figure 2, our proposed network architecture considers bottom-up and top-down aggregations of visual-textual information to achieve the segmentation task. This work is presented in ICCV 2019, and to the best of our knowledge, is the current SOTA technique for referring image segmentation on the main benchmark datasets.

Few-shot learning for computer vision applications

The ability of humans to learn new concepts under limited guidance is remarkable. Even without prior knowledge about an object's category, the human visual system has evolved to handle such a task by performing different functionalities that include grouping the pixels of an object as a whole, extracting distinctive cues for comparison, and exhibiting attention or fixation for localization. In a preliminary study, we tackle the problem of one-shot object detection by proposing a co-attention and co-excitation DNN module. Figure 3 shows how the module of squeeze and co-excitation enables one-shot learning to seek common features between the query and the target image. Our work is published in NeurIPS 2019, and we are currently engaged in further developing this project.

Figure 3 : Non-local proposals and co-excitation for one-shot object detection.
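The squeeze-and-co-excitation idea can be sketched in a few lines: both feature maps are squeezed by global average pooling, and a shared channel gate re-weights the query and target features so that channels they activate jointly are emphasized. Note that the gate below is a fixed sigmoid over the channel-wise product of the squeezed descriptors, a simplified stand-in for the learned SE-style block of the actual NeurIPS 2019 model; all tensors here are illustrative assumptions.

```python
import numpy as np

def squeeze(feat):
    # Global average pooling over spatial dims: (C, H, W) -> (C,)
    return feat.mean(axis=(1, 2))

def co_excitation(query_feat, target_feat):
    """Re-weight both feature maps with a shared channel gate derived
    from their squeezed descriptors (sigmoid over the channel-wise
    product), emphasizing channels that fire in both images."""
    joint = squeeze(query_feat) * squeeze(target_feat)  # channel co-activation
    gate = 1.0 / (1.0 + np.exp(-joint))                 # sigmoid -> (C,)
    w = gate[:, None, None]
    return w * query_feat, w * target_feat
```

Channels that respond strongly in both the exemplar (query) and the search (target) image keep most of their magnitude, while channels active in only one image are suppressed, which is the behavior one-shot matching relies on.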

Emerging computer vision applications

In dealing with 360-degree videos/images, we have proposed a simple and effective cube padding (CP) technique and demonstrated its usefulness in performing saliency detection of 360-degree videos. This work is published in CVPR 2018. We are also endeavoring to design efficient DNN models and algorithms for computer vision applications based on 3-D point clouds. The underlying applications include 3-D object detection, semantic segmentation and instance segmentation, all in the form of point clouds.

Industrial collaborations

An important key performance index (KPI) of this four-year project is measured by our success in collaborating with industrial partners to boost their AI-related research and development capacity. Since the end of 2019, we have worked closely with a promising startup (officially established in the spring of 2020), which specializes in smart retail, a promising domain in which AI techniques such as object detection and recognition are widely applicable.
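The intuition behind cube padding can be shown with a toy function: instead of zero-padding a cube face before convolution, its border is filled with pixels from the adjacent face, so filters see continuous content across cube edges. The sketch below pads only the left border of a single-channel face and ignores the rotations/flips a full cube-map implementation must apply; it is an illustration of the idea, not the CVPR 2018 implementation.

```python
import numpy as np

def cube_pad_left(face, left_neighbor, pad=2):
    """Pad a cube face on its left border with the rightmost columns
    of the adjacent face rather than zeros, keeping content continuous
    across the cube edge. Orientation handling is deliberately omitted."""
    border = left_neighbor[:, -pad:]          # (H, pad) strip from the neighbor
    return np.concatenate([border, face], axis=1)
```

Applying the analogous operation on all four borders of all six faces lets a standard CNN process a 360-degree frame as six ordinary images without seam artifacts at the face boundaries.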

深度學習智慧系統整合：異質性深度模型整合與檢索特徵學習
計畫主持人：陳祝嵩博士；共同主持人：王新民博士、古倫維博士
計畫期程：2018/1~2021/12

深度學習系統是 AI 領域的重要工具，滲透到各個領域的應用之中，如電腦視覺、訊號分析、生醫影像、工業檢測、語音處理、自然語言理解、資訊檢索等。深度學習的模型主要是針對不同的功能加以開發，例如在電腦視覺領域，不同的任務常具備不同的深度模型，如人臉辨認、物體偵測、街景偵測、行人偵測、車輛偵測、場景文字、衣著辨認等，都有各自的深度學習模型，個別在其專門的任務上都可達到很好的表現。而隨著不同領域深度學習模型的發展，不同模態 (modal) 的 AI 功能在處理上也有同樣的需求。例如在語音處理的領域，語音辨認及語者識別的深度學習模型可能有所不同。而一個智慧系統需要結合多模態的深度學習模型，例如整合視覺人臉辨認、聽覺語者辨認、與自然語言情感分析，來進行對話人物身份判斷及狀態理解。因此亦需要深度學習模型之整合，讓多模態的功能統一建構於單一的模型之中，達到計算資源的有效運用。

目前眾多訓練好的深度模型都在網路開源可取得，本計畫的核心課題之一即為「如何整合既有的模型來開創多功能的應用？」除了單一的訊號源外，智慧型系統也可能需要不止一個的感測裝置 (sensor)，故亦需整合不同感測器取得的訊號源，如結合影像與聲音訊號來進行同步意圖理解與辨認的工作。當需進行多個任務時，過去的方法多以重行訓練新設計的網路 (learn-them-all) 或橋接多個網路來達成，然而如此作法之模型設計與訓練耗時，不易掌控並缺乏使用彈性，且複雜的模型不利於端點或嵌入式裝置上推論 (inference) 工作的實現。

本計畫目標包括了：
（一）開發多類異質深度模型間整合機制
（二）結合基礎技術，達成不同功能深度模型之整合，廣泛建立整合經驗

重點項目與成果如下：

（一）多類異質深度模型間整合機制

推論階段深度 CNN 分類網路整合方法研發

我們開發已經訓練好之 CNN 分類網路模型之整合方法。藉由發掘出多個已經訓練後之模型權重值間的相關性，進行對應層之間的共編碼，原本的網路結構因而得以維持，並可持續利用倒傳遞演算法將共編碼的 codebook 進行校正訓練與優化。成果發表於 AI 頂尖國際會議 IJCAI 2018。我們並藉由此方法開發在推論端整合人臉及語者辨認之深度模型，成果於 CVPR 2019 Workshop on Multimodal Learning and Applications 口頭報告論文。特點包括 (1) 異質性 CNN 網路整合：所整合的 CNN 模型可具備不同的層數，其卷積核大小與數目等也可不盡相同。(2) 網路共編碼與校正學習：將不同網路的權重進行跨模型之統一編碼，達成權重值之壓縮且共用的目的。過去雖已有單一深度模型的壓縮與簡化方法，然仍未有同時兼顧模型權重值之壓縮與共用的方法出現。本法是世界上首見可於推論階段進行深度模型整合之技術。

圖一：深度模型之整合與共壓縮。於推論端將多個已經充分訓練好之深度類神經網路模型，合成為單一模型，以提升系統執行速度與減少耗能。

不遺忘之永續深度學習

監督式深度學習的進展，除了利用已蒐集完成的資料集來進行學習外，也面對另一個困難：資料並非一次性即可蒐集與建構完成，而是分批獲得；而學習的任務或技能也並非一次可達成，而是在過去以某些資料集學習完相對應任務後，接著再以新的資料集學習新的任務。因此永續性 (continuous) 或終身 (life-long) 學習變得益加重要。這部分的方法最大的困難是如何避免災難性地遺忘了過去已習得的任務與技能。我們發展了濃縮化 (shrink) 與擴張化 (expand) 的交替步驟，能夠在避免任務遺忘的情況下，發展出實現多工任務的緊緻模型，達成在學習新任務的同時，持續完整地保有舊任務的功能。我們方法的特色是可完全避免遺忘，並在維持深度模型緊緻性 (compactness) 的狀況下來進行擴充。成果發表於 ICMR 2019。接著我們改良此方法，在壓縮 (compacting) 與擴張 (growing) 的交替步驟外，另增選擇 (picking) 的步驟，以便類似人一般在鞏固過去知識的同時，也能藉由選擇過去的關鍵知識來對於目前要學習的新任務有所助益。我們使用可微分的選擇遮罩 (selection mask) 連同模型的保留或擴充權重同時訓練，達成學習新任務但不遺忘舊任務的效果。成果發表於 AI 領域頂尖國際會議 NeurIPS 2019。

圖二：藉由模型壓縮、權重檢選、以及模型擴充之原理，達成不遺忘並可借重過去經驗的永續深度學習。

（二）多功能深度模型之整合應用

快速檢索碼之深度學習

我們發展無監督與半監督式的二元檢索碼的深度學習技術，以利於在大型資料庫中的快速搜尋與檢索。在半監督式學習中，只有部分的資料被標註，因此以標註的資料為基礎，透過自我學習的方式逐步擴充至未標註的資料，進而達成完整的學習。我們的方法考慮類別之間的關連性，類別的標籤值也是變數中的一環，可在學習網路權重時一同被調整。由於我們可以在學習的過程中發掘出類別的相關性，更容易將具標籤資料的學習結果推展給無標籤資料使用。相關成果發表於 ICIP 2019。無監督二元檢索碼的深度學習技術發表於 AI 領域頂尖期刊 IEEE Transactions on Pattern Analysis and Machine Intelligence 2019，其影響因子 (IF: 17.73) 為目前 AI 領域最高。

圖三：速搜特徵之深度學習。研發二元檢索碼之深度學習技術，以利於大型資料庫中的快速搜尋。標籤值在學習網路權重時可一併調整，自動發掘出類別間的相關性。

文字故事及影像故事之語意抽取及後端生成

影像故事生成為整合影像與文字資訊的多模式議題，給定一組圖片，需生成一個搭配圖片的最佳故事。在故事生成問題上，要使得上下文通順如真人所述說，原本就不容易；然而，若要能符合某些關鍵元素來說故事，將使此任務更加困難，影像故事生成就是這樣一個極具挑戰性的任務。我們從語意出發探討這個問題，經由影像語意簡化 (reduction, summarization) 到一個共同的語意架構 (semantic frames)，接著再以生成模型進行語意擴增 (generation and enrichment)，達到生成通順而豐富的故事之目標。目前本研究朝向增進前後文通順度 (coherence) 及可生成動態長度故事的方向研發，利用額外的知識 (knowledge graph or ontology) 來補充圖片內及圖片間元素的連結。所發展的模型經由真人閱讀實驗驗證，與其他模型比較能產生目前最佳的故事。成果發表於 AI 頂尖國際會議 AAAI 2020。

圖四：整合影像與文字資訊之故事生成方法流程圖。

角度思路導引之新聞推薦

本研究為整合影像與文字資訊的多模式議題。傳統在新聞推薦上，多是利用讀者過去看過的新聞推測未來會有興趣的新聞，或是推薦熱門新聞給讀者，忽略了讀者之所以會被某些新聞主題吸引的真正原因，也包括他們本來就關心的議題，或是新聞圖片中吸睛的主題。本研究同時利用文字及新聞附圖中所提示的語意重點，搭配自動從大量新聞語料學得的先驗知識概念，模擬讀者腦中的世界觀，導引模型推理出讀者最可能有興趣的新聞並加以推薦。實驗結果顯示，此一多模式新聞推薦模型，能有效同時利用圖文提示的重點，抓住讀者有興趣的概念並推薦出下一個被點閱的新聞。

圖五：整合影像與文字資訊的多模式新聞推薦方法流程圖。

基於鑑別式自動編碼器之語者辨識及語音辨識

自動編碼器 (autoencoder) 常用於無監督學習中的有效編碼。我們在 2017 年首度提出一個鑑別式自動編碼器 (discriminative autoencoder, DcAE)，應用在語者辨識，其基本原理是在編碼層藉助不同的減損函數將語者相關信息和語者無關的因素分離，使得語者相關表示具備更好的鑑別性，進而提高辨識率。2019 年，我們改良 DcAE 的架構，成功套用在語音辨識開發工具 Kaldi 的 nnet3 設置中的 TDNN 與 TDNN-LSTM 聲學模型架構中，並將測試在 WSJ 語料的 recipe 公開。成果發表於 Interspeech 2019。

圖六：基於 DcAE 的語音辨識聲學建模。基線系統的工作流（虛線）由輸入層、編碼器、P 代碼和輸出層組成。基本版 DcAE (DcAE-B) 的工作流用實線表示，其關鍵組件多了用於特徵重構的 R 代碼、解碼器和輸出層。DcAE-U 則進一步在編碼器和解碼器層之間增加連接。

基於變分自動編碼器之語音轉換

語音轉換 (voice conversion, VC) 是將來源語者的語音轉換為目標語者的語音，音色及語調改變，但語音內容不變。我們在 2016 年底首度將變分式自動編碼器 (variational autoencoder, VAE) 應用到無對齊語料條件下的語音轉換，發表在 APSIPA ASC 2016 的論文在 Google Scholar 已經有 102 次的引用。2017 年，我們提出結合生成式對抗網路 (generative adversarial network, GAN) 的無對齊語料條件下的 VAE-based VC，發表在 Interspeech 2017 的論文在 Google Scholar 已經有 145 次的引用。2018 年，我們提出讓模型同時滿足兩種以上的頻譜特徵，可以進一步提高轉換後語音的音質和與目標語者的相似度，發表在 ISCSLP 2018 的論文獲得最佳學生論文獎。2019 年，我們將 WaveNet 聲碼器引進 VAE-based VC，以取代傳統聲碼器，並提出一個在編碼層中進一步濾除基頻 (fundamental frequency, F0) 資訊的方法。整合相關研究成果的期刊論文於 2020 年初獲得 IEEE Trans. on ETCI 接受發表。

圖七：基於 VAE 的語音轉換流程。同傳統語音轉換，聲碼器首先將語音波形參數化為聲學特徵，各特徵流分別經過轉換，最後聲碼器將轉換後的特徵合成語音波形。我們的研究著重在用於頻譜特徵轉換的編碼器 (Eθ) 和解碼器 (GΦ) 的建模與學習。

未來展望

本計畫在基礎研究方面，將持續開展永續學習機制與應用，進行方法優化與改善。除了技術開發外，將與台大醫院進行智慧急診合作，以 AI 技術進行及早安全離部項目之整合，改善急診病人流動及解決壅塞。並與和信醫院合作醫學影像腫瘤分割之深度學習模型建置：整合鼻咽 / 淋巴腫瘤偵測，及藉由影像分析改進診療方案。預定也將與微軟新聞研究群展開合作，協助編輯自動化流程。

Deep Learning Methods for Application-driven Smart System Integration
Principal Investigator: Dr. Chu-Song Chen; Co-PIs: Dr. Hsin-Min Wang and Dr. Lun-Wei Ku
Project Period: 2018/1~2021/12

Deep neural networks have played a primary role in recent advances in AI. Multiple deep learning models have been designed to handle various tasks. Since those models are trained with particular datasets, they are only effective for specific purposes. Our objective is to integrate these different deep learning models (each trained with specific data). Our integrated network model will be capable of handling simultaneously all of the individual tasks of the original models. Integrated deep learning models of this nature have huge potential in real-world applications, such as in AI agents or robots that are required to conduct multiple recognition tasks based on the same or different signal sources (e.g., images, sounds). Even where only image signals are interpreted, a deep learning smart system may have to process various visual classification tasks (such as object recognition, face identification, hand gesture prediction).

A typical approach to tackling multiple recognition tasks in a single system is to design a new model and train that new model on the combined datasets for all of the tasks. However, it is often difficult to know in advance what might be a suitable model architecture for learning all of the tasks well. In this project, we are developing an approach whereby different deep learning models are integrated by removing the redundancies among their filters or weights. Unlike existing approaches that conduct this step only in the training stage, our method integrates multiple models at the inference stage. The resulting merged model can be further fine-tuned using the training data if necessary. We are also developing tools for continuous deep learning and integration into merged models, so that the multi-tasking of merged models can be continuously expanded. We anticipate our merged model will be capable of integrating image, audio, and natural language processing tasks. In addition, we are also establishing deep learning methods for efficient binary feature representations, thereby enabling rapid retrieval or recall from a large database. The project has two main aspects: (I) Integration of Heterogeneous Deep Models, and (II) Deep Learning Models for Smart System Applications. We discuss some of our results thus far below.

I. Integration of Heterogeneous Deep Models

Unifying and Merging Well-trained Deep Neural Networks at the Inference Stage

We are proposing a novel approach to merging convolutional neural networks at the inference stage. Our method can align the layers of two feed-forward neural networks already trained to handle different tasks and merge them into a unified model by sharing their representative weights. The performance of the resulting merged model can be improved by retraining. Our approach effectively produces a compact model that can simultaneously undertake the original tasks of individual models on resource-limited hardware. The development time for the merged model, as well as training overheads, is substantially reduced because our method leverages the weights shared among individual models and preserves the general architectures of their well-trained neural networks. The resulting merged model is jointly compressed and can be implemented faster than the original models but has comparable accuracy to them. Our results have been published in IJCAI 2018. We have also applied this approach to merging and co-compressing face-recognition and speaker-identification in a single compact model, which was presented at the CVPR 2019 Workshop on Multimodal Learning and Applications.

Figure 1 : Co-compression of deep CNN models. Our approach can merge well-trained deep CNN models into a more compact model for efficient inference.
Un-forgetting Continual Lifelong Learning of Deep Models

Continual lifelong learning is an essential aspect of many applications. We propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression, critical weights selection, and progressive network expansion. By enforcing iterative integration of individual tasks, we apply incremental learning that is scalable to the number of sequential tasks in a continual learning process. Our approach is easy to implement and exhibits several favorable characteristics. First, it overcomes the problem of "forgetting" (i.e., learning new tasks while remembering all previous tasks). Second, it allows model expansion while maintaining model compactness despite handling sequential tasks. Our compaction and selection/expansion mechanism demonstrates that the knowledge accumulated through learning previous tasks can help build a better model that can tackle new tasks, thereby dispensing with the need to independently retrain original models for new tasks. Experimental results show that our approach of incremental learning generates an integrated model that can tackle multiple tasks without forgetting the tasks of the contributory models, while maintaining model compactness and enhancing performance. The results of our endeavors have been published in NeurIPS 2019.

Figure 2 : Unforgetting continual lifelong learning of deep models via the principle of model compaction (C), weight picking (P), and model expansion (G). The CPG approach can exploit the experiences learned from previous tasks to boost the performance of the current task.

II. Deep Learning Models for Smart System Applications

Deep Learning of Binary Hash Codes for Fast Retrieval

We are introducing a binary hash codes learning approach, whereby class label representations are rendered adaptable during network training. We express the labels as hypercube vertices in a K-dimensional space, and both the network weights and class label representations are updated in the learning process. As the label representations are explored from available data, semantically similar categories are assigned label representations that are close to each other in terms of Hamming distance within the label space. These label representations then serve as the output of hash function learning, thereby yielding compact and discriminating binary hash codes. This approach has proven simple yet effective and it is applicable to both supervised and semi-supervised hash code learning. Our research on hash code deep learning methods has been published in ICIP 2019 and IEEE TPAMI 2019.

Figure 3 : Hash codes learning for efficient image retrieval. In our approach, each semantic label has its own representation codewords. The label representations, encoded as K-dimensional unit-hypercube corners, can be learned automatically.
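Once a network emits K-dimensional outputs trained toward hypercube-corner label representations, retrieval reduces to sign thresholding and Hamming ranking, as the toy functions below illustrate. The learning itself is omitted; the inputs are assumed to be network embeddings, and in practice the codes would be bit-packed so that XOR plus popcount makes the ranking fast at database scale.

```python
import numpy as np

def binarize(embeddings):
    # Sign thresholding turns real-valued network outputs into K-bit codes.
    return (embeddings > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]
```

Because semantically similar classes are pushed toward nearby hypercube corners during training, small Hamming distances between codes approximate semantic similarity between images.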

Arti cial Intelligence Projects人 工 智 Figure 4 : Visual Story Telling: Integrating visual and text 慧 information for story synthesis. 計 畫 Figure 5 : Integrating photo and text Information for multimodal news recommendation. Visual Storytelling Visual storytelling is a research topic involving multi-modal integration. Given a set of pictures, visual storytelling models aim to generate the best collocated story. For general textual storytelling, it is already difficult to generate coherent contexts like those spoken by people. Moreover, visual stories need to fit the features of related imagery, i.e., grounding, making this topic even more challenging. We are tackling this issue by targeting the commonalities conveyed by both images and texts. First, we extract key events from the images. Then, we represent these events using event frames defined in a common semantic framework, FrameNet, and leverage knowledge bases to connect them to enrich content. Finally, we adopt a separate textual story generation model, trained on a large dataset, to produce the nal story. We are now working on aspects of coherence, diversity and length exibility of the stories generated by this approach. Human evaluators have attested to the quality of the stories our model generates, and the results have been published in AAAI 2020. Multi-View News Recommendation This project involves integrating news articles and their accompanying images to generate multi- modal recommended outputs. Conventional models use reading histories (session-based), read news content (content-based), or \"hot news\" (collaborative filtering) to recommend news to users. However, these models overlook that users’ prior knowledge of the world may greatly impact their current interests or, more directly, users may simply be attracted by an image. 
Our model learns principal concepts from a large dataset of reading histories, representing prior knowledge, and identifies eye-catching prompts from the current news article and its accompanying images to recommend the next article. Our experimental results have confirmed the effectiveness of the model. Following on from this line of research, we are studying personalized recommendations and news descriptions generated from the perspectives of different users. Additionally, we are exploring the influence of recommendation dynamics in terms of information masking among users.
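A minimal sketch of the multi-view idea, under strong assumptions: the user profile, the fusion rule (a weighted sum of per-view cosine similarities), and all embeddings below are illustrative stand-ins, not our trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def profile(history):
    """User profile as the mean of previously read embeddings
    (a stand-in for the learned prior-knowledge concepts)."""
    dim = len(history[0])
    return [sum(vec[i] for vec in history) / len(history) for i in range(dim)]

def score(user_text, user_img, cand, alpha=0.6):
    """Fuse the text view and the image view of a candidate article;
    alpha weights the text view. cand is a (text_vec, image_vec) pair."""
    text_vec, img_vec = cand
    return alpha * cosine(user_text, text_vec) + (1 - alpha) * cosine(user_img, img_vec)

# Toy example: two candidate articles, each a (text, image) embedding pair.
read_texts = [[1.0, 0.0], [0.9, 0.1]]
read_imgs = [[0.0, 1.0], [0.1, 0.9]]
u_text, u_img = profile(read_texts), profile(read_imgs)
candidates = [([1.0, 0.1], [0.0, 1.0]),   # similar to the user's history
              ([0.0, 1.0], [1.0, 0.0])]   # dissimilar
best = max(range(len(candidates)), key=lambda i: score(u_text, u_img, candidates[i]))
```

The image view lets a visually striking photo raise a candidate's score even when its text differs from the reading history, which is the intuition behind recommending by both views.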

Brochure 2020

Discriminative Autoencoder-based Speaker/Speech Recognition

Autoencoders are often used for efficient encoding in unsupervised learning. We first proposed a discriminative autoencoder (DcAE) in 2017 and applied it to speaker recognition. The basis of this research is to separate speaker-related information from speaker-independent factors by considering different loss functions at the code layer, so that speaker-related representations are better discriminated, thereby improving recognition accuracy. In 2019, we modified the structure of our DcAE and successfully applied it to TDNN and TDNN-LSTM acoustic model architectures in the nnet3 setup of the Kaldi speech recognition toolkit. The corresponding recipe for the WSJ corpus was released, and the results of this work were published in Interspeech 2019.

Figure 6: Discriminative autoencoder-based speaker/speech recognition. Speaker-related information is separated from speaker-independent factors with different loss functions at the code layer, so that the speaker-related representation has better discriminating power and recognition accuracy is improved.

Variational Autoencoder-based Voice Conversion

Voice conversion (VC) aims to convert the speech of a source speaker to that of a target speaker without changing the linguistic content. In 2016, we first applied a variational autoencoder (VAE) to voice conversion under non-parallel training conditions. The resulting paper was published in APSIPA ASC 2016 and has been cited 102 times to date (Google Scholar data). In 2017, we further integrated a generative adversarial network (GAN) into our VAE-based VC system. That work was published in Interspeech 2017 (cited 145 times, Google Scholar data). Then, in 2018, we proposed a cross-domain VAE that simultaneously models two kinds of spectral features, further improving the quality of the converted speech and its similarity to the target speaker. That advance was published in ISCSLP 2018, and the paper won a Best Student Paper Award. In 2019, we introduced the WaveNet vocoder into our VAE-based VC system to replace the traditional WORLD vocoder, and we also proposed a method to remove fundamental frequency (F0) information at the code layer. A paper integrating those research results has recently been accepted for publication in IEEE Trans. on ETCI.

Figure 7: Voice conversion (VC) diagram. Our study focuses on the modeling and learning of the spectral feature encoder (Eθ) and decoder (GΦ).

Future Topics / We are continuing to seek more favorable approaches for conducting continuous lifelong learning and integrating them into various applications. We have recently begun cooperating with NTU Hospital to identify early and appropriate applications of smart emergency medicine, and are planning to collaborate with Microsoft Newsgroup on refining the editing process.
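The encode-then-decode data flow of VAE-based conversion can be sketched as follows. This is a toy illustration with random placeholder weights, not our trained system; the dimensions, weight matrices, and speaker codes are all hypothetical. It shows only the structure: a code z intended to carry linguistic content, with the decoder conditioned on a target speaker code.

```python
import random

random.seed(1)
FRAME_DIM, CODE_DIM, NUM_SPK = 6, 3, 2  # placeholder sizes

def linear(mat, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]

def rand_mat(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Placeholder weights standing in for the trained encoder E_theta and
# decoder G_phi; in the real system they are learned so that the code
# z carries linguistic content but no speaker identity.
enc_w = rand_mat(CODE_DIM, FRAME_DIM)
dec_w = rand_mat(FRAME_DIM, CODE_DIM + NUM_SPK)

def encode(frame):
    return linear(enc_w, frame)

def decode(code, speaker_id):
    one_hot = [1.0 if i == speaker_id else 0.0 for i in range(NUM_SPK)]
    return linear(dec_w, code + one_hot)  # condition on the target speaker

# Conversion: encode a source-speaker frame, then decode with the
# target speaker's code. Non-parallel training is possible because the
# same autoencoder reconstructs every speaker's frames.
source_frame = [random.gauss(0, 1) for _ in range(FRAME_DIM)]
z = encode(source_frame)
converted = decode(z, speaker_id=1)
```

Because the speaker identity enters only at the decoder, swapping the speaker code changes the output voice while the code z is reused unchanged, which is the key property a speaker-independent code layer must have.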

Deep Learning-Based Research and Applications for Multimedia Data Processing

Principal Investigator: Dr. Hong-Yuan Mark Liao
Project Duration: 2018/1–2021/12

Starting in 2018, my research team received funding from the Ministry of Science and Technology (MOST) for a four-year AI research project (2018/1–2021/12) with an annual budget of roughly NT$8.5 million. Our team's original expertise is multimedia signal and data processing, and MOST encouraged research teams with different training backgrounds to approach AI theory and applications from their own angles, having recognized AI as a critically important direction for the future: without investment now, Taiwan's competitiveness in both science and industry would inevitably decline. Given the short preparation time, we chose the combination of multimedia signal processing and deep learning as our topic from the outset. Concerned that teams might aim only at pure research and paper publication, MOST made collaboration with Taiwanese industry an evaluation criterion, the most important indicator being "industry poses the problems, academia solves them." Our team therefore chose to collaborate with the listed IC company Elan Microelectronics, hoping to integrate our respective software and hardware strengths to develop a smart-city traffic flow solution applicable not only in Taiwan but also exportable internationally.

Our goals are to accomplish the following:
1. Perform edge computing directly at intersections to compute traffic-flow parameters;
2. Use computer vision techniques to compute additional traffic parameters between neighboring intersections, such as vehicle speed and stopped-queue length;
3. Use the traffic parameters obtained above with reinforcement learning to dynamically adjust the traffic signals of all intersections within a given area.

In the first year of the project, our team mounted a 360-degree fisheye camera at an intersection and started from a YOLOv3 model to detect and count traffic. This task posed two major difficulties:
(1) The computation must run on a lightweight edge processor such as the Nvidia Jetson TX2, whose computing power is only one-twentieth that of a GTX 1080 Ti graphics card. Processing a massive amount of video while maintaining reasonable accuracy is inherently very difficult.
(2) To cover an entire intersection with as few cameras as possible, we adopted a 360-degree fisheye camera. The fisheye image space, however, is distorted: when machine learning must learn all possible samples, their appearance differs completely from that in a conventional, undistorted space.

Our team started from YOLOv3-tiny and modified the model so that it could detect and count traffic in this distorted space. With this result, Elan Microelectronics exhibited at Computex Taipei and won a gold medal of the Best Choice Award out of more than 550 exhibited products (only 8 products received gold medals). Figure 1 shows intersection traffic flow detected with the fisheye camera.

Figure 1: Intersection traffic flow detected with a fisheye camera.
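The per-frame counting step can be sketched as follows. The detection tuples and the confidence threshold below are hypothetical stand-ins for the output of a detector such as YOLOv3-tiny; a real pipeline would also track vehicles across frames.

```python
from collections import Counter

# Hypothetical per-frame detector output: (class_name, confidence, box),
# where box is (x, y, width, height) in pixels.
frame_detections = [
    ("car", 0.91, (120, 80, 60, 40)),
    ("car", 0.45, (300, 90, 55, 38)),   # low confidence, filtered out
    ("motorbike", 0.88, (210, 150, 20, 30)),
    ("bus", 0.76, (50, 60, 110, 70)),
]

def count_flow(detections, threshold=0.5):
    """Count detected road users per class, keeping only detections
    above a confidence threshold (a common post-processing step)."""
    counts = Counter()
    for cls, conf, _box in detections:
        if conf >= threshold:
            counts[cls] += 1
    return counts

flow = count_flow(frame_detections)
# flow holds one car, one motorbike, and one bus after thresholding
```

Per-class counts like these, aggregated over time, are the traffic-flow parameters that the edge device reports for each intersection.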

To effectively apply computer vision to detect vehicles and stopped queues and to compute parameters such as vehicle speed, our team decided to design a new machine learning model that speeds up video frame processing while maintaining, or even improving, counting and detection accuracy. We proposed a new model, the Cross Stage Partial Network (CSPNet). CSPNet optimizes the paths along which back-propagated gradient information flows through a convolutional neural network (CNN), maximizing the diversity of the features learned by the weights of each layer, so that a CNN can retain a high level of learning capability even after being made lightweight. This change applies to all current mainstream CNN architectures, including ResNet, ResNeXt and DenseNet, and maintains or improves image classification accuracy while reducing the computation of these CNNs by 10% to 30%. Moreover, because CSPNet was designed with computational cost, memory bandwidth, load balancing and other factors in mind, it generally reduces memory bandwidth requirements by 40% to 80%, making it especially suitable for resource-constrained edge computing platforms.

Figure 2: Architecture of the Cross Stage Partial Network (CSPNet).

Figure 2 shows the structure of CSPNet, in which the computational block can be a state-of-the-art structure such as a ResBlock, ResXBlock, Res2Block or DenseBlock. Thanks to the split-weight design, bottleneck layers are no longer needed within the computational block, which greatly reduces the computation and per-layer memory bandwidth required and improves load balancing. The transition layers before and after the cross-stage merge step truncate reused gradient information; they respectively increase the diversity of the gradients used for weight updates within a stage and across stages. This design reduces the redundant learning of duplicated gradient information, yielding higher parameter efficiency and giving CSPNet a particular advantage in lightweight CNNs.
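The cross-stage partial connectivity pattern can be sketched in a few lines. This toy version shows only the split-bypass-merge data flow: identity functions stand in for the transition layers, and a simple scaling function stands in for a real computational block (which would be a stack of convolutions).

```python
def csp_block(channels, computational_block):
    """Minimal sketch of a cross-stage partial connection on a list of
    per-channel features: one half passes through the computational
    block, the other half bypasses it, and the two gradient paths are
    merged afterwards. Transitions are shown as identity here."""
    half = len(channels) // 2
    part1, part2 = channels[:half], channels[half:]
    processed = computational_block(part2)   # Res/Dense block stand-in
    transition = lambda xs: list(xs)         # placeholder transition layer
    return transition(part1 + transition(processed))  # cross-stage merge

# Toy computational block: scales each channel feature.
double = lambda xs: [2 * x for x in xs]

out = csp_block([1, 2, 3, 4], double)
# channels 1-2 bypass the block unchanged; channels 3-4 are processed
```

Because only half the channels enter the computational block, its cost and the duplicated gradient information both drop, while the bypassed half keeps a separate gradient path to the merge point.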

Detectors equipped with a CSPNet backbone achieve superior performance on object detection tasks. Table 1 compares our model with today's state-of-the-art methods: at an equivalent level of accuracy, our model's inference speed far exceeds the others. It is generally 50% to 300% faster than other hand-designed models, and more than 30% faster than methods based on Network Architecture Search (NAS). Compared with the Structural-to-Modular Network Architecture Search (SM-NAS) method published at AAAI 2020, CSPNet needs only a lower image resolution (512×512 vs. 800×600) to run object detection at 3.35 times the inference speed (67 fps vs. 20 fps) with equivalent accuracy (42.7% vs. 42.8%). This also means that CSPNet applies to a far wider range of scenarios, completing the task with cheaper video-capture devices (any low-end to high-end camera or phone, rather than expensive high-resolution cameras) and lighter computing equipment (CSPNet on a single 1080 Ti exceeds the inference speed of SM-NAS on two V100s, at costs of roughly NT$30,000 versus NT$600,000).

Table 1: Performance comparison between CSPNet and the latest state-of-the-art detectors.

CSPNet's advantage is even more pronounced when computing resources are constrained (Figure 3): compared with state-of-the-art methods, CSPNet achieves the best performance at every inference speed. Compared with ThunderNet, currently the best lightweight model, CSPNet raises the running speed on a GPU by 133 fps to reach 400 fps at a higher accuracy, which means a single GPU can serve a real-time intelligent surveillance system of about 12 camera feeds simultaneously. The model also reaches 102 fps on a CPU and 72 fps on a TX2. These figures show that our CSPNet can bring AIoT to everything and everywhere. In addition, because CSPNet reduces memory footprint by 10% to 20%, computation by 10% to 30%, and memory bandwidth requirements by 40% to 80%, it can substantially lower the development cost and subsequent energy consumption of AI-specific ASIC hardware while improving stability.

Figure 3: Performance comparison between CSPNet and various state-of-the-art detectors.

With the intersection traffic-flow parameters obtained from the fisheye cameras and the other traffic parameters (vehicle speed, stopped-queue length, etc.) obtained from bullet cameras, our team has been commissioned by Elan Microelectronics since December 2019 and expects to spend 15 months building a smart traffic road network. Elan Microelectronics has installed fisheye and bullet cameras at five consecutive intersections in the Dayuan District of Taoyuan City (Figure 4). We will use reinforcement learning to optimize the traffic flow across several neighboring intersections, and use the computed results as the basis for dynamically adjusting the traffic signals.

Figure 4: Installation of fisheye and bullet cameras at five consecutive intersections in Dayuan District, Taoyuan City.
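As a hedged illustration of signal control by reinforcement learning (not the system under development), the sketch below runs tabular Q-learning on a toy two-phase intersection. The state, action, and reward definitions, the arrival model, and all constants are assumptions made purely for illustration.

```python
import random

random.seed(0)

ACTIONS = ("NS_green", "EW_green")  # illustrative two-phase signal

def simulate_step(queues, action):
    """Toy intersection: the served direction discharges up to 3
    vehicles; 0-2 new vehicles arrive in each direction."""
    ns, ew = queues
    if action == "NS_green":
        ns = max(0, ns - 3)
    else:
        ew = max(0, ew - 3)
    ns += random.randint(0, 2)
    ew += random.randint(0, 2)
    reward = -(ns + ew)  # fewer waiting vehicles is better
    return (ns, ew), reward

def bucket(queues, cap=5):
    """Discretize queue lengths so the Q-table stays small."""
    return tuple(min(q, cap) for q in queues)

# Tabular Q-learning over (state, action) pairs with an
# epsilon-greedy behavior policy.
Q = {}
state, eps, alpha, gamma = (4, 4), 0.1, 0.5, 0.9
for _ in range(5000):
    s = bucket(state)
    if random.random() < eps:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
    state, r = simulate_step(state, a)
    s2 = bucket(state)
    best_next = max(Q.get((s2, x), 0.0) for x in ACTIONS)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

A deployed controller would replace the toy simulator with measured queue lengths and speeds from the cameras, and the single intersection with the joint state of several neighboring ones.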

