In the second year of the project (2019), we focused on using computer vision technology to detect vehicles, to compute the length of car queues, and to estimate car speed as well as other parameters. We decided to design a new machine-learning model to meet the requirements of real-time edge computing. My research team then proposed a new deep learning-based neural network model that we named Cross Stage Partial Network (CSPNet). The main concept underlying CSPNet is optimization of the transmission path strategy of gradient flow for the back-propagation process of a Convolutional Neural Network (CNN) architecture. We endeavored to maximize the difference in weight features that can be obtained for each layer, permitting the CNN to maintain a high level of learning ability. This system can be applied to all current mainstream CNN architectures, including ResNet, ResNeXt, and DenseNet, amongst others, and it maintains or improves the accuracy of image classification while reducing the various CNN computations by 10-30%. Since we designed CSPNet with a view to computing costs, load balancing, and memory bandwidth (reduced by 40-80%), it is more suitable for edge computing platforms with limited resources.

Figure 2 illustrates the architecture of CSPNet. The computational block within this architecture can be represented by state-of-the-art blocks such as ResBlock, ResXBlock, Res2Block, or DenseBlock. By adopting the "partial" network design concept, there is no need to use bottleneck layers to design a computational block; in doing so, the amount of computation and the memory bandwidth required by each layer is significantly reduced and load balancing can be increased. It is worth mentioning that the transition layers before and after the cross-stage merge step are both used to truncate reuse of the gradient information. The transition layer "before" the cross-stage merge step is used to increase the variability of gradients used to update weights within the stage, whereas the transition layer "after" that step is used to update weights for the cross-stage block. This design reduces meaningless learning of redundant information, thereby ensuring a higher parameter usage rate and rendering it more advantageous in a lightweight CNN framework.

Figure 2 : Architecture of CSPNet.
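The cross-stage split-and-merge pattern described above can be sketched in a few lines of PyTorch-style code. This is only a minimal illustration of the idea, not the released CSPNet implementation; the channel split ratio, the 1x1 transition convolutions, and the pluggable computational block are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class CSPStage(nn.Module):
    """Minimal sketch of one cross-stage-partial stage: the input is split
    channel-wise, one part bypasses the computational block while the other
    passes through it, and transition layers are applied before and after
    the cross-stage merge (as described in the text)."""
    def __init__(self, channels, block):
        super().__init__()
        half = channels // 2
        self.block = block                                       # e.g., a ResBlock or DenseBlock over `half` channels
        self.pre_transition = nn.Conv2d(half, half, 1)           # transition "before" the merge
        self.post_transition = nn.Conv2d(channels, channels, 1)  # transition "after" the merge

    def forward(self, x):
        part1, part2 = torch.chunk(x, 2, dim=1)                  # partial split
        part2 = self.pre_transition(self.block(part2))
        merged = torch.cat([part1, part2], dim=1)                # cross-stage merge
        return self.post_transition(merged)
```

In this sketch the unprocessed branch carries its gradients directly across the stage, while the transition layers truncate duplicated gradient information, which is the mechanism credited above for the reduced computation and memory bandwidth.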
Object detectors equipped with our CSPNet backbone have been shown to achieve superior performance in object detection tasks relative to other models (see Table 1). It is obvious from Table 1 that the inference speed of our CSPNet model is far faster than that of other models meeting the same accuracy requirements (~50-300% faster than existing state-of-the-art models). Compared with more advanced models, for example, our system is more than 30% faster than the Structural-to-Modular Network Architecture Search (SM-NAS) model. CSPNet needs a much lower image resolution (512x512 vs 800x600) to operate object detectors at 3.35 times the inference speed of SM-NAS (67 fps vs 20 fps), yet it achieves the same level of accuracy (42.7% vs 42.8%). This enhanced performance means that CSPNet is applicable to broader scenarios, including low-cost video-capture equipment (i.e., any camera or mobile phone, not just expensive high-resolution cameras) and less powerful computing devices.

Table 1 : Performance comparisons between our method and other state-of-the-art methods.

The advantages of CSPNet become even more obvious when the computing resources utilizing it are limited. Compared with other state-of-the-art methods, CSPNet achieves the best performance under any inference-speed requirement. Moreover, relative to ThunderNet, the best lightweight model available today, CSPNet increases computation speed from 133 fps to 400 fps on a GPU, with improved accuracy from 33.7% to 34.2%. Therefore, CSPNet can use a single GPU to drive 12 camcorders simultaneously and compute traffic flow in real time. Furthermore, CSPNet can run on a GPU and a TX2, achieving excellent performance of 102 fps and 72 fps, respectively. These performance statistics mean that our CSPNet system can extend the Artificial Intelligence of Things (AIoT) to everything everywhere. Importantly, since CSPNet reduces memory space requirements by 10-20%, computations by 10-30%, and memory bandwidth by 40-80%, it significantly reduces development costs for AI-specific ASIC hardware and subsequent energy loss, while simultaneously improving stability.

Figure 3 : Performance of our CSPNet model compared with other state-of-the-art methods.
Using the traffic parameters obtained from the fisheye camcorders (2018) and gun-type camcorders (2019), in the third year of our project (2020) we accepted ELAN's challenge and will spend the next 15 months constructing an intelligent transportation road network. ELAN has deployed fisheye and gun-type camcorders at five intersections in Taoyuan County (Figure 4). We will develop algorithms based on reinforcement learning to optimize traffic flow at several of these consecutive intersections, and we will use the data we obtain to dynamically adjust traffic signals.

Figure 4 : Fisheye and gun-type camcorders deployed at five intersections in Taoyuan County.
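As a purely illustrative sketch of the reinforcement-learning direction mentioned above, the snippet below shows tabular Q-learning for signal control. The state and action encodings, reward, and hyperparameters are hypothetical and not the project's actual design.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, phase)] -> expected return
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def choose_phase(state, phases):
    """Epsilon-greedy choice of the next signal phase for a given
    (hypothetical) state, e.g., discretized queue lengths per approach."""
    if random.random() < EPSILON:
        return random.choice(phases)
    return max(phases, key=lambda p: Q[(state, p)])

def update(state, phase, reward, next_state, phases):
    """Standard Q-learning update; the reward could be, e.g., the negative
    total queue length observed after the chosen phase."""
    best_next = max(Q[(next_state, p)] for p in phases)
    Q[(state, phase)] += ALPHA * (reward + GAMMA * best_next - Q[(state, phase)])
```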
Intelligent Conversational Robot for Learning Assistance with Natural Language Understanding
Principal Investigators: Dr. Wen-Lian Hsu, Dr. Wei-Yun Ma, Dr. Ching-Ching Lu, and Dr. Yung-Chun Chang
Project Period: 2018/1~2021/12

Chinese has no word boundaries, loose grammar, pervasive omission, and free word order, making it very difficult for computers to process. In the past, Chinese word segmentation, speech input, and sentence parsing each relied on different algorithms. In our MOST AI project, drawing on our experience with primary school mathematical word problems and with language understanding for dialogue systems, we developed a brand-new Chinese processing algorithm called the "reduction-restoration method" ("reduction" for short). Combined with our earlier statistical principle-based model SPBA (the combination is called RPBA), it can simultaneously handle Chinese word segmentation, speech input, sentence parsing, and even Chinese language generation and machine translation. The principle behind RPBA closely resembles how humans process language, combining rules and impressions (statistics) in a single model, and it can be applied at many different levels.

The reduction method is grounded in word-to-word collocation relationships. A modifier of a word X is usually a word that is semantically compatible with X (called a "collocate"). For example, 「漂亮」 can describe a ball game, but 「美麗」 cannot. A complex sentence is usually built from a simple sentence by gradually adding semantically compatible modifiers, modifying clauses, modifiers of modifiers, complements, and so on. If we collect, for every word X, the set of its collocates FB(X), we can use the modification relationships among collocates to deduce the original simple sentence back from a complex one. To perform this computation, we need to derive all reasonable modification relationships in the sentence from FB and the sentence structure; doing so yields the sentence's dependency parse tree. We call the action of "merging" a modifier of a word X into X "reduction". To reduce a sentence, we recursively merge from the leaf nodes of the dependency parse tree into the collocates above them, working back to the original simple sentence. The reduction method is thus a way of producing a dependency parse tree from FB, and it can simultaneously assist speech recognition and language generation.

Semantic collocation (or dependency) relationships between words are extremely important in a sentence. We may say that no word exists independently in a sentence: every word must stand in a semantic collocation relationship with some other word. Many such collocations are conventional. For instance, we say 「打了一場漂亮的球賽」 ("played a fine ball game") but not 「打了一場美麗的球賽」 ("played a beautiful ball game"), even though 「漂亮」 and 「美麗」 are close in meaning. Without understanding such collocations, a computer often produces incorrect parses, as in the following examples:

1. 完成清掃家裡的工作 (Finish the job of house cleaning)
• 完成 { [ 清掃家裡 ] 的 工作 } --- ( 完成,工作 )

2. 完成清掃家裡的垃圾 (Finish cleaning the household garbage)
• 完成 { 清掃 [ 家裡的 垃圾 ] } --- ( 完成,( 清掃,垃圾 ) 事件 )

An ordinary parser easily parses the second sentence like the first one, taking the main event to be ( 完成,垃圾 ) ("finish, garbage"). The correct reading, however, is that the event 「清掃垃圾」 ("cleaning the garbage") was finished. That is, ( 完成,工作 ) is a proper semantic collocation pair, but ( 完成,垃圾 ) is not. There may be hundreds of millions of such meaningful collocation pairs; they can only be gathered statistically from very large datasets and cannot be discerned from any finite machine-learning training corpus. This also explains why the accuracy of ordinary machine learning in natural language has an inherent bottleneck.

We can compare the reduction method with the widely used N-gram language model. N-grams are very effective as a language model for speech recognition, but when N ≧ 3 the number of statistics becomes enormous; the reduction method does not have this problem. Reduction can be viewed as a word bigram model that, through recursive computation, composes long N-grams. For example, from the two collocation pairs AB and BC, the collocation chain ABC can be composed automatically without being stored in advance; likewise, longer chains such as ABCD and beyond can be composed automatically. In other words, at a certain level, the reduction method achieves the effect of an N-gram model.

However, many long-distance grammatical dependencies in sentences are not easily expressed by N-grams. Such structural properties can be precisely described by our earlier Statistical Principle-based Approach (SPBA). SPBA automatically clusters sentences so that each cluster can be covered by a set of backbone components (e.g., four words ABCD, with appropriate insertions between each pair). The quality of SPBA clustering depends on the backbone components (BC) it selects. Previously, there was no clear basis for selecting BCs; they were determined purely by the program, so the selected BCs could not be guaranteed to capture the main structure of a sentence. Now, we can first simplify a sentence to some degree, folding certain modifiers into their collocates so that the structurally important components emerge; when SPBA clustering is then performed, the selected BCs become far more representative. This phenomenon is particularly clear in named entity recognition. Reduction + SPBA (RPBA for short) is therefore the language analysis algorithm we are currently developing. Since neither SPBA nor reduction is language-specific, RPBA can be applied to any language.
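To make the reduction step concrete, the toy sketch below merges modifiers into their heads bottom-up over a dependency tree. The tree encoding and the FB lookup are illustrative assumptions; the actual RPBA implementation is not shown here.

```python
def reduce_tree(node, FB):
    """node = (head_word, [child_nodes]); FB[head] is the set of words
    semantically compatible with head (its collocates). Starting from the
    leaves, each compatible modifier is absorbed into its head, shrinking
    the sentence back toward its simple core."""
    head, children = node
    remaining = []
    for child in children:
        core = reduce_tree(child, FB)        # recurse from the leaf nodes
        if core in FB.get(head, set()):      # compatible collocate: absorb it
            continue
        remaining.append(core)               # keep anything that cannot be reduced
    return head if not remaining else " ".join(remaining + [head])
```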
Below, we look at some examples of the reduction method. Figure 1 walks step by step through the process of reducing the modifiers of the object 「蘋果」 (apple) in a sentence. Figure 2 then reduces the modifiers of the verb 「給」 (give) and, using its "verb frame", completes the parse tree of the whole sentence, which we call the incidence map.

Figure 1 : The process of reducing the modifiers of 「蘋果」.
Figure 2 : The incidence map of the whole sentence.
A major goal of this project is to construct a system that automatically solves and explains primary school mathematics problems. We use the reduction method to understand each sentence of a word problem, convert it into an incidence map, and add each sentence's mathematical implications. For example, the sentence 「蘋果每十公斤 300 元」 ("apples are 300 dollars per ten kilos") expresses a "unit price"; its mathematical implication is that apples cost 30 dollars per kilo. Furthermore, when a sentence describes a unit price, the problem frequently goes on to ask questions such as "How many kilos can 400 dollars buy?" or "How much do 8 kilos cost?". In other words, every sentence plays some role in the problem, and that role is usually paired with sentences playing other related roles, echoing one another. From such inter-sentence dependencies, we can derive a flow from the question sentence to its related sentences and the mathematical relations between them.

Another of our innovations is the use of natural language "scripts" (NL scripts) to describe these flow relations. Consider the following problem:

A sticker costs five dollars. Xiao-Ming has two stickers, and Xiao-Hua spent twenty dollars buying stickers. How many stickers do the two of them have altogether?

From "a sticker costs five dollars", we know this is the unit price of a sticker. From "Xiao-Hua spent twenty dollars buying stickers" we can directly infer that "Xiao-Hua bought four stickers", i.e., "Xiao-Hua has four stickers". The final question, "how many do the two of them have altogether?", generates a solution "script" such as:

1. Find out how many of "what".
2. Find out which two people "the two of them" are.
3. Find out the number of stickers each person has.
4. Add up the numbers in (3).

Each "instruction" in this script in turn generates finer-grained scripts, which the computer understands and executes until the answer to every instruction has been computed. Finally, the narrations of these scripts are automatically assembled into an "explanation" of the solution process. When the given conditions are insufficient, the system automatically responds "Cannot find …; this problem cannot be computed." The initial scripts are written by humans; later, when similar sentences are encountered, the machine performs "substitution" within the same framework to automatically produce analogous scripts. Sentence similarity is determined by matching incidence maps, during which the characters and words in a sentence must be suitably generalized to broaden the range of application of these scripts.

The advantage of describing solution flows with natural language scripts is that domain professionals without a programming background (e.g., PMs) can read and write these instructions. When errors occur, the cause is easy to find and fix without relying on software engineers for everything. In this way, professionals can design more creative and better-suited customized scripts. This is extremely useful in dialogue systems, and we have already applied this methodology to a finance-related dialogue system with excellent results.

Summary / This project makes two major contributions. First, we developed the unique and simple "reduction method" to pair with our existing SPBA as a tool for parsing and understanding Chinese sentences. Second, we use natural language scripts to describe solution and dialogue flows. Separating knowledge from programs lets domain professionals without programming backgrounds control programs with natural language instructions, contribute their domain know-how fully, and build more complete application systems.
Intelligent Conversational Robot for Learning Assistance with Natural Language Understanding
Principal Investigators: Dr. Wen-Lian Hsu, Dr. Wei-Yun Ma, Dr. Ching-Ching Lu, and Dr. Yung-Chun Chang
Project Period: 2018/1~2021/12

The main goal of this project is to tackle the formidable task of Chinese language understanding. The Chinese language has no word boundary, its grammar rules are loosely defined, word omission is prevalent, and word order is flexible, making it very difficult to process. Through our experiences with primary school mathematical word problems (MWP), we developed a reduction algorithm for the Chinese language. Together with our long-developed Statistical Principle-based Approach (SPBA), our new algorithm (called RPBA, i.e., reduction + SPBA) can simultaneously conduct word segmentation, speech recognition, parsing, language generation, and even translation. This interpretable machine-learning model operates in a way very similar to how humans learn languages, combining impression and grammar structure, and it can be applied to all aspects of the Chinese language.

The reduction algorithm is based on specific word collocation relationships. A modifier of word X is often semantically compatible with X. A complex sentence can usually be composed from a simple sentence by adding compatible modifiers, sub-clauses, and complements. Let FB(X) be a collection of compatible modifiers of X. Making use of FB(X) and sentence structure, we can reversely deduce the original simple sentence from a complex sentence. To do this, we need to derive all pertinent compatible pairs from FB(X). The action of merging a modifier with X is called reduction. Sentence reduction is performed recursively from the leaves of a dependency parse tree (DPT) to the root, i.e., its simplest form.

Word compatibility is crucial in a sentence, with every word having to be compatible with some other word in the sentence. Meaningful compatibility is an unwritten code. For example, one can say, "The ball game is great", but not "The ball game is huge." Without knowledge of word compatibility, computers often make interpretative mistakes. Take the following sentences as an example:

1. 完成清掃家裡的工作 (Finish the job of house cleaning)
• 完成 { [ 清掃家裡 ] 的 工作 } --- ( 完成,工作 )

2. 完成清掃家裡的垃圾 (Finish cleaning the household garbage)
• 完成 { 清掃 [ 家裡的 垃圾 ] } --- ( 完成,( 清掃,垃圾 ) 事件 )

A parser could easily treat the second sentence as being similar to the first one. Specifically, it would conclude that the main event in the second sentence is ( 完成,垃圾 ), whereas the correct event should be ( 完成,( 清掃,垃圾 )), because ( 完成,工作 ) is meaningful and ( 完成,垃圾 ) is not. The number of such meaningful word pairs could be in the order of hundreds of millions, which can only be collected from a huge corpus. Knowledge like this is abundant, explaining why machine learning in natural language processing (NLP) using a limited labeled corpus easily hits a bottleneck.

We have compared the popular N-gram model with RPBA. The N-gram model is very effective for speech recognition. However, when N ≧ 3, it becomes difficult to store such a huge number of N-grams. In contrast, RPBA works like a bi-gram, recursively composing longer N-grams. For example, from two compatible pairs, e.g., AB and BC, we can compose the tri-gram ABC without pre-storing it. This approach can be further extended to ABCD or longer N-grams. Reduction can be used to reduce modifiers for both nouns and verbs.

However, long-distance dependency is not easily captured by N-grams. Such structural properties can be modeled instead using SPBA. SPBA clusters sentences so that each cluster can be covered by a set of backbone component words and concepts (such as ABCD with appropriate insertions between adjacent components). In the past, the computer algorithm determined the quality of the backbone components. However, there was no guarantee that the main structure was captured by these components. Now, some modifiers can be reduced into their compatible nouns or verbs so that the more important components are more likely to be represented. Thus, reduction + SPBA (or RPBA for short) represents our new algorithm for natural language processing. Since neither reduction nor SPBA is language-dependent, RPBA can be used for any language.
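The AB + BC -> ABC composition described above can be illustrated in a few lines of Python. The pair set below is hypothetical; it merely shows how longer chains emerge from stored bigram collocations without being stored themselves.

```python
def compose_trigrams(pairs):
    """Given compatible word pairs, compose tri-grams on the fly:
    (A, B) and (B, C) yield (A, B, C)."""
    return {(a, b, c) for (a, b) in pairs for (b2, c) in pairs if b == b2}

pairs = {("A", "B"), ("B", "C"), ("C", "D")}
print(compose_trigrams(pairs))   # {('A', 'B', 'C'), ('B', 'C', 'D')}
```

Applying the same join to the resulting tri-grams reproduces ABCD and longer chains, which is the recursive composition the text refers to.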
Below, we illustrate an example of our RPBA reduction algorithm. In Figure 1, we illustrate the process of reducing the modifiers of the noun 「蘋果」. In Figure 2, we illustrate the process of reducing the modifiers of the verb 「給」. Finally, by adopting the verb frame of 「給」, we can complete the parsing tree, which we refer to as "the incidence map".

Figure 1 : Reducing the modifiers of the noun 「蘋果」.
Figure 2 : Producing the incidence map of the sentence.
A primary objective of our project is to construct a primary school MWP system that can not only solve a word problem, but also explain the process by which a solution was derived. We used RPBA to transform a given sentence into its corresponding incidence map and added its mathematical entailments. For example, the sentence "Ten kilos of apples are worth 300 dollars." represents "the unit price". Its mathematical entailment is "Every kilo of apples is worth 30 dollars." By extension, the question could be posed "How many kilos of apples would 400 dollars buy?" or "How much would 8 kilos of apples cost?" This scenario implies that every sentence related to a problem plays a role that can be associated with another sentence. Through such sentence compatibility, it is possible to derive associated sentences from the question sentence.

In order to describe the solution process, we have created natural language scripts (NL scripts) in solution flow charts. To illustrate this, consider the following problem:

A sticker sells for $5. John has two stickers, and Josh bought $20 worth of stickers. How many do both of them have?

From the sentence, "A sticker sells for $5," we can obtain the unit price of a sticker. From "Josh bought $20 worth of stickers," we derive that "Josh bought four stickers" or "Josh has four stickers." From the question "How many do both of them have?" we can create the following NL solution script (a toy interpreter for it is sketched after the summary below):

1. Find out how many items of "what".
2. Find out which two persons represent "both of them".
3. Find out the number of stickers each one has.
4. Add the numbers in (3) together.

Each instruction in these NL solution scripts can yield finer scripts that are executed by the model to identify answers and, ultimately, to compile an explanation. The first NL solution scripts are written manually. Later, scripts for similar sentences can be written automatically via the same framework by substitution. Similarity is measured by matching the incidence maps. Furthermore, appropriate word generalization can be adopted to expand the application range of these NL solution scripts.

The advantage of using NL solution scripts in flow charts is that domain experts lacking a programming background (such as Project Managers) would also be able to understand and write these instructions. Moreover, they could correct errors without having to rely on expert programmers. Thus, more versatile NL solution scripts can be designed to meet the specific needs of customers.

Summary / This project has two major contributions. First, we have developed a simple and unique RPBA algorithm that couples reduction with our SPBA approach, which can be used to parse and understand the Chinese language. Second, we are creating NL solution scripts to describe solution flow charts and explanations. Such scripts separate domain knowledge and programs, allowing non-programmers to write instructions as NL solution scripts and to construct more user-friendly applications.
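The four-step script above can be grounded in a toy interpreter. Everything below (the fact store, the derivation of Josh's stickers from the unit price, the handler structure) is a hypothetical simplification showing how script instructions bottom out in computable steps.

```python
unit_price = 5                         # "A sticker sells for $5"
facts = {"John": 2,                    # "John has two stickers"
         "Josh": 20 // unit_price}     # "$20 worth" -> 4 stickers

def run_script(persons):
    counts = [facts[p] for p in persons]   # steps 1-3: resolve the item and its owners
    return sum(counts)                     # step 4: add the numbers together

print(run_script(["John", "Josh"]))        # -> 6; the steps double as the explanation
```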
The Construction of a Concept-based Chinese Knowledge Base with Semantic Composition Capability
Principal Investigator: Dr. Wei-Yun Ma
Project Period: 2019/1~2022/12

My research team was awarded a four-year (2019/1~2022/12) artificial intelligence project by the Ministry of Science and Technology (MOST), which grants the project USD$230,000 per year. The objective is to build a Chinese knowledge base that can solve real-world problems and support AI development in Taiwan. Historically, in 2012 Google used a knowledge base (the Knowledge Graph) as an additional resource for search retrieval and achieved a substantial performance boost. Since then, knowledge base construction has attracted broad attention in both industry and academia, and various knowledge base applications have been successfully developed and deployed. Given the rapid development of deep learning, researchers have recently been encoding knowledge base information into forms that deep learning models can exploit, endowing them with the ability to solve a variety of practical application problems.

Developing a practical, high-quality knowledge base is critical but also full of challenges. The number of entities involved can reach millions or even tens of millions, so a knowledge base cannot be built manually; automatic construction techniques and mechanisms are required. At the same time, the knowledge base's inference mechanisms, including causal relations and reasoning processes, must be considered, so its representation format and reasoning capability must be carefully designed. To build a practical knowledge base of this nature, we are currently working on automatic knowledge acquisition, representation, reasoning, and application. We have developed both English and Chinese versions of the knowledge base format: the former allows comparison with other world-class knowledge bases, while the latter is used to build the various practical applications of a Chinese knowledge base that realize the project's goals.

In the first year of the project (2019), we focused mainly on acquiring knowledge from raw text and on knowledge representation. For knowledge acquisition from raw text, we proposed a new model, GraphRel, an end-to-end deep learning model that uses graph convolutional networks (GCNs) to jointly learn to extract named entities and the relations between them (see Figure 1). Unlike previous approaches, we consider the interactions between named entities and relations, exploiting GCNs in two phases to extract entities and their interacting relations more effectively. We also use the syntactic structure among words as textual features, building a word-relation graph for the input text through which lexical information can propagate to achieve the stated goals. Compared with prior work, this greatly improves performance in extracting interacting relations among named entities. We evaluated GraphRel on two public datasets, NYT and WebNLG. Our results show that GraphRel maintains high precision while substantially boosting recall. Compared with the best techniques of 2019, GraphRel surpasses them in F1 score with relative improvements of 3.2% on NYT and 5.8% on WebNLG.

Figure 1 : Overall architecture of GraphRel.
Another piece of our 2019 research on knowledge acquisition was to develop a method for removing erroneous samples under distant supervision. Supervised machine-learning techniques all require a training dataset, but labeling training data is expensive in both time and money. Distant supervision is an automated way of generating data: an existing knowledge base is matched by string comparison against a corpus to collect the contextual information for the relations to be extracted. This process yields a considerable amount of spurious training data, in particular false negative samples, which, if used for training, sharply degrade the final model's performance. To overcome this problem, we proposed H-FND, a hierarchical denoising framework (see Figure 2). H-FND comprises several steps that determine whether each automatically generated training sample should be kept, discarded, or revised into a positive sample of a specific relation, thereby providing better training data. We experimented on SemEval-2010 under a range of false-negative ratios. Our results show that even with false-negative ratios as high as 50%, H-FND can carry out most of the repair work and maintain a stable F1 score.

Figure 2 : Overall architecture of H-FND.
In 2019, we developed and released E-HowNet 2.0, a new entity-relation commonsense representation model with semantic composition capability for knowledge representation (see Figure 3). Compared with its previous version, it has the following features:

a. Reorganized hierarchy of primitive and basic concepts: We extended a large set of basic concepts, producing a deeper hierarchy and more precise semantic classification. This makes the word senses expressed by basic concepts more precise and readable. In addition, we divided the ontology into two parts: the first represents the hierarchy of entities, and the second represents the hierarchy of relations, i.e., semantic roles. Attributes and attribute values were also refined and attached to the same ontology.

b. Rich lexical information: Beyond semantic sense expressions, operational expressions and semantic functions are added to each lexical entry, facilitating future semantic composition processes. We also provide event frames for event-type words.

c. An automatic ontology construction system: After word senses or concept expressions are revised, the automatic ontology construction system re-attaches each word to the appropriate ontology nodes, producing a new ontology.

d. Improved word sense definitions through expansion and refinement of basic concepts: With the basic concepts expanded and refined, the definitions of many words can be revised to be more precise and readable. In addition, shared semantic features and explicit relation expressions (e.g., antonyms, attribute values, and entailment relations) establish more semantic links between words.

Figure 3 : A query example in E-HowNet 2.0, using the word 「蜻蜓 (dragonfly)」.
Early in the second year of the project (2020), we released a dataset of word-level semantic analogy tests aimed at evaluating commonsense reasoning over words, and we published our results at LREC 2020. Commonsense reasoning is fundamental to natural language inference tasks. Most models rely on word embeddings to supply background world knowledge, but the commonsense coverage of word embeddings is very limited. We therefore set out to build a word-level commonsense reasoning task. Most existing word-level reasoning tasks do not focus on the commonsense dimension. Take the Chinese word analogy benchmarks (CA) as an example: the simplified Chinese dataset and the traditional Chinese dataset Google-translated from English contain only dozens of relations, most of which are morphological (e.g., shared prefixes) or knowledge about named entities (e.g., a city being the capital of a country). Commonsense knowledge bases (e.g., WordNet, ConceptNet, and E-HowNet) annotate the commonsense properties of words; E-HowNet currently defines the commonsense of 88,000 Chinese words in a structured form. We proposed an automated procedure for extracting commonsense analogy questions from a commonsense representation model, extracting accurate analogies from E-HowNet that were then verified by linguists. The resulting test set, named CA-EHN, is the first commonsense analogy dataset, containing 90,505 analogies covering 5,656 words and 763 analogy relations. Moreover, our experimental analysis shows that commonsense-aware embeddings obtain higher scores on CA-EHN. Figure 4 shows some examples from CA-EHN.

Figure 4 : Examples from CA-EHN (word:word=word:synset).
The Construction of a Concept-based Chinese Knowledge Base with Semantic Composition Capability
Principal Investigator: Dr. Wei-Yun Ma
Project Period: 2019/1~2022/12

My research team was awarded a four-year (2019/1 ~ 2022/12) AI-related project from the Ministry of Science and Technology (MOST) that began in 2019. MOST grants USD$230,000 to this project each year. The objective is to develop a Chinese knowledge base with the ability to resolve realistic downstream problems and that can support AI development in Taiwan. In 2012, Google presented their knowledge base (Knowledge Graph) as an external resource to significantly enhance the value of information returned by Google searches. Since then, construction of knowledge bases has attracted a lot of attention, both within industry and academia. Consequently, various applications of knowledge bases have been successfully developed and deployed. Given rapid developments in deep learning, there is a boom in encoding the information from knowledge bases into deep learning models to empower them to resolve various downstream problems.

Developing a practical and high-quality knowledge base for information processing is crucial yet challenging. The number of entities involved can number in the millions, tens of millions, or even be infinite. Accordingly, it is impossible to build a knowledge base manually, so it must be built automatically. Concurrently, the inference mechanism of the knowledge base, including causality, relationships, and action processes, must also be considered, so the form of representation and the representation ability of the knowledge base must be carefully designed. To build a practical knowledge base of this nature, we are currently engaged in researching knowledge acquisition, representation, reasoning and utilization. We have developed respective models with both English and Chinese versions. We use the former primarily for model comparisons to meet academic requirements, whereas the latter are employed for practical construction of a Chinese knowledge base.

In the first year of the project (2019), we mainly focused on knowledge acquisition from raw texts and knowledge representation. For knowledge acquisition from raw texts, we developed GraphRel, an end-to-end relation extraction model that uses graph convolutional networks (GCNs) to jointly learn named entities and relations (see Figure 1). Unlike previous baselines, we consider the interactions between named entities and relations via a 2-phase relation-weighted GCN to better extract relations. Linear and dependency structures were both used to extract sequential and regional features of the input text, and then a complete word graph was employed to extract implicit features among all word pairs of that text. Through our graph-based approach, predictions for overlapping relations are substantially improved over previously reported sequential approaches. We have evaluated GraphRel against two public datasets: NYT and WebNLG. Our results show that GraphRel maintains high precision and exhibits substantially enhanced recall. Moreover, we found that GraphRel outperforms the state of the art by 3.2% F1 score on NYT and 5.8% F1 score on WebNLG, thereby achieving in 2019 a new state of the art for relation extraction.

Figure 1 : Overview of GraphRel with 2-phase relation-weighted GCN.
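A generic relation-weighted graph-convolution step, consistent with the description above, can be sketched as follows. This is not the released GraphRel code; the tensor shapes and single-layer form are simplifying assumptions.

```python
import torch
import torch.nn as nn

class RelWeightedGCN(nn.Module):
    """One GCN step over a word graph: each word aggregates its neighbors'
    features, weighted by edge scores (in phase 2 of GraphRel, these would
    be the relation weights predicted in phase 1)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (num_words, dim) word features; adj: (num_words, num_words) edge weights
        return torch.relu(self.linear(adj @ h))
```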
Another element of our research work on knowledge acquisition in 2019 pertained to the development of an effective approach for removing noisy samples under distant supervision. Most machine-learning techniques require a set of training data, but labeling training data is expensive both in terms of time and money. Distant supervision represents an alternative approach to generating training data. In distant supervision, an existing database is used to collect examples for the relation we want to extract, even though that process results in a large amount of noisy training data being generated. Thus, distant supervision is vulnerable to false-negative (FN) sentences, especially when extracting relations from text. Treating FN input as non-relation training sentences can diminish final model performance. To overcome this problem, we generated H-FND, a hierarchical false-negative denoising framework for robust relation extraction (see Figure 2). H-FND uses a hierarchical protocol that first determines if non-relation (NA) sentences should be kept, discarded, or revised during training. Then, those NA sentences are revised into appropriate relations for better training input. We have conducted experiments on SemEval-2010 and randomly filtered ratios of training/validation sentences into NA. Our results show that H-FND revises FN sentences correctly and maintains high F1 scores even when 50% of the sentences have been filtered out.

Figure 2 : H-FND framework. The process in this diagram is executed per epoch.
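The keep/discard/revise decision at the heart of H-FND can be sketched as below. The thresholds and classifier interface are illustrative assumptions, not the paper's exact procedure.

```python
def denoise(sentence, classifier, keep_thr=0.8, revise_thr=0.5):
    """Decide the fate of a sentence currently labeled non-relation (NA).
    `classifier` returns a dict of probabilities over relations incl. NA."""
    probs = classifier(sentence)
    best_rel, p = max(probs.items(), key=lambda kv: kv[1])
    if best_rel == "NA":
        # keep only confident true negatives; otherwise drop the sample
        return ("keep", "NA") if p >= keep_thr else ("discard", None)
    # a non-NA relation dominates: likely a false negative
    return ("revise", best_rel) if p >= revise_thr else ("discard", None)
```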
Also in 2019, we developed and released E-HowNet 2.0 – a new entity-relation commonsense representation model with semantic composition capability for knowledge representation (see Figure 3). It has the following improvements relative to its previous version:

a. Reorganization of the hierarchical structure of primitive and basic concepts: We have extended a large set of basic concepts, generating a deeper hierarchical structure and more precise semantic branching. This work has also resulted in lexical senses expressed from basic concepts becoming more precise and readable. We have reformed the ontology structure into two parts. The first part represents a hierarchy for entities and the second part is a hierarchy for relations, i.e., semantic roles. Furthermore, Attribute and Value types have also been reorganized accordingly.

b. Rich lexical information: In addition to sense definition, each entry of lexical sense may include operational expressions as well as semantic functions, facilitating future semantic composition processes. Event frames, i.e., argument structures, for event type primitives are also provided.

c. Development of a new automatic ontology reconstruction system: In cases where lexical sense expressions or nodes of conceptual hierarchy are revised, the ontology reconstruction system may re-attach each lexical entry to appropriate ontological nodes, resulting in a new ontology.

d. Improvements to sense definitions via basic concepts: Many word sense definitions can be revised and rendered more precise and readable by using basic concepts in their sense expressions. More semantic links are established due to the shared semantic features as well as explicit relation links, such as antonyms, attribute values, and entailment.

Figure 3 : Hierarchical structure including primitives and basic concepts in E-HowNet 2.0, with the example of the word "蜻蜓 (dragonfly)".
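The re-attachment step in item (c) can be summarized in one function. The scoring function and data layout are assumptions; they simply show the shape of the automatic reconstruction.

```python
def rebuild_ontology(entries, nodes, score):
    """After sense definitions are revised, re-attach each lexical entry to
    its best-matching ontology node; score(entry, node) compares a word's
    sense expression against a node's concept definition."""
    return {entry: max(nodes, key=lambda n: score(entry, n)) for entry in entries}
```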
Early in the second year of the project (2020), we released publicly a commonsense word analogy dataset and published our research results in LREC 2020. Commonsense reasoning is fundamental for natural language agents to generalize inferences beyond their training corpora. Although the Natural Language Inference (NLI) task has proven a good pre-training objective for sentence representations, the commonsense coverage is limited and most models are still end-to-end and heavily reliant on word representations to provide background world knowledge. Therefore, we propose to model commonsense knowledge down to word-level analogical reasoning. In this regard, existing analogy benchmarks are poor. Take for example Chinese Analogy (CA); the simplified Chinese dataset CA8 and the traditional Chinese dataset CA Google-translated from English only contain dozens of relations, most of which are either morphological (e.g., a shared prefix) or about named entities (such as capital-country). However, commonsense knowledge bases (such as WordNet and ConceptNet) have long annotated such relations. In Chinese, E-HowNet currently annotates 88,000 Chinese words with their structured definitions and English translations. We investigated the extraction of accurate commonsense analogies from our commonsense representation model E-HowNet, resulting in CA-EHN, a set of accurate analogies extracted from E-HowNet with refinements from linguists. CA-EHN is the first commonsense analogy dataset, containing 90,505 analogies covering 5,656 words and 763 relations. Experimental analysis demonstrates that embedding more commonsense knowledge is useful and that CA-EHN can test this aspect of word embeddings. Some examples of CA-EHN are shown in Figure 4.

Figure 4 : CA-EHN (word:word=word:synset).
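Analogy benchmarks like CA-EHN are typically scored with the standard vector-offset method over word embeddings, sketched below; this is the conventional evaluation recipe, not code from the paper.

```python
import numpy as np

def solve_analogy(emb, a, b, c):
    """Return the word completing a : b = c : ?, i.e., the nearest neighbor
    of emb[b] - emb[a] + emb[c] by cosine similarity (query words excluded).
    `emb` maps each word to a NumPy vector."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cos(emb[w], target))
```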
IIS Collaborative Projects
Contents
Overview
Digital Medicine: a Collaboration Between National Yang-Ming University and IIS
Conversational Open Domain Document-based Natural Speech Q&A (COSQA)
Recommendation, Modeling, and Search for Smart Digital Marketing
GraphStor: Efficient & Recoverable Graph-processing System Design, from Devices to Systems and Applications
Overview

Since 2019, the Institute has solicited large-scale "collaborative projects" under a new review mechanism, with the annual budget of each funded project raised substantially, to NT$4-8 million in principle. The aim is to set aside part of the Institute's funding to attract colleagues who genuinely want to pursue innovative topics, giving them the opportunity and sufficient resources to devote themselves fully to the work. It also encourages colleagues to dare to undertake large research projects, choose good research topics, and execute them rigorously.

It was therefore made clear from the outset that any funded project must undergo a progress review every six months and will be terminated if its execution falls short of the original goals. Another condition is that these projects serve as the Institute's flagship projects: when foreign guests visit or the Institute's Academic Advisory Committee meets, the investigators must report on their progress. Because of these measures, the colleagues who apply are those seeking research breakthroughs and willing to take bold risks.

In 2019, two projects passed review: Dr. Da-Wei Wang's project on medical informatics and Dr. Li Su's project on virtual concerts. This year (2020), four projects received funding: Dr. De-Nian Yang's smart recommendation system project, Dr. Yu-Fang Chen's graph-processing system design project, Dr. Chung-Yen Lin's project on hybrid algorithms for next-generation sequencing assembly, and Dr. Keh-Yih Su's natural language processing project. The projects of Dr. Li Su and Dr. Chung-Yen Lin are presented in the "Spotlights" section; the remaining four projects are reported in the following pages.
In 2019, the Institute established IIS Collaborative Projects to develop new research strategies. In principle, each project is subject to an annual funding limit of NT$4-8 million (approximately US$130K-270K), sufficient to achieve objectives relating to novel and important/flagship topics. Once granted, projects are reviewed every six months. If a project is deemed not to be achieving its goals, it can be suspended. When foreign research teams visit IIS and during IIS Academic Advisory Committee meetings, progress on all funded projects must be reported. Due to these strict requirements, applications for these grants have declined considerably, even though they can greatly facilitate breakthroughs in research. In 2019, two projects were funded: Prof. Da-Wei Wang's research on medical information, and Prof. Li Su's project on virtual concerts. In 2020, four projects were awarded funding, namely Prof. De-Nian Yang's "smart recommendation system", Prof. Yu-Fang Chen's "recoverable graph-processing system design", Prof. Chung-Yen Lin's "new generation sequencing and recombination hybrid algorithm", and Prof. Keh-Yih Su's "conversational open domain document-based natural speech Q&A". The projects of Profs. Li Su and Chung-Yen Lin are reported in the "Spotlights" project area of this brochure, and progress on the remaining four funded projects is detailed in the following pages.
Digital Medicine: a Collaboration Between National Yang-Ming University and IIS
Principal Investigator: Dr. Da-Wei Wang
Project Period: 2019/1~2021/12

Background

Healthcare is an information- and knowledge-intensive profession. With the dramatic growth in computing power and memory, applications of data science and machine learning in healthcare have become increasingly important. The Institute of Medicine (IOM) has proposed the learning healthcare system, which uses information technology to accelerate and deepen the pipeline from acquiring medical knowledge to applying it, and Lancet has launched a new journal, Digital Medicine; both point to the potential of information technology in medicine. However, medicine and informatics are each highly specialized fields: mastering a single field is hard enough, and commanding the essence of both at once is truly difficult. How to foster collaboration between domain experts, and how to lower the barriers to such collaboration, is therefore an important question. This study seeks an appropriate collaboration mechanism through cooperation between National Yang-Ming University and the Institute of Information Science, Academia Sinica. National Yang-Ming University has deep expertise in biomedicine and healthcare, while IIS hosts experts across the fields of information science; the two can be linked with the help of medical informatics experts. The preliminary plan is for each side to invest NT$5 million in research funding for the collaboration.

In the initial stage, research on stroke serves as the first collaborative project and as an experimental vehicle for designing the collaboration mechanism, searching for the most suitable mode of interdisciplinary cooperation. For the design of this mechanism, a planning team was established whose members include biomedical, information science, and medical informatics experts. The planning team's main tasks are to propose potential interdisciplinary research topics and to communicate with experts across fields, focusing on coordination and matchmaking, with the aim of distilling a systematic mechanism for promoting interdisciplinary collaboration and fostering long-term cooperation between National Yang-Ming University and IIS. After the project started, National Chiao Tung University also joined, so the participants comprise three teams, from National Yang-Ming University, National Chiao Tung University, and Academia Sinica, together with Dr. Fan Yangzhen of the NIH.

Work Items

1. Activities
1.1 January 28, 2019: "Digital Medicine and Information Technology Application Series (I)" held at IIS.
1.2 May 14, 2019: "Digital Medicine Summit and Matchmaking Forum" held at National Yang-Ming University.
1.3 August 22, 2019: "Biomedical Data Workshop" held at National Yang-Ming University.
1.4 December 5, 2019: an international workshop held at National Chiao Tung University, on the theme "The Workshop on Digital Medicine in the Era of AI and Big Data".
2. Initiated Collaborative Projects and Results

2.1 An automatic 3D instance segmentation and detection framework for cerebral microbleeds

Motivation: Population ageing is a major issue worldwide, and among ageing-related diseases dementia draws the most attention. Early detection of dementia, or of high-risk groups, helps with early diagnosis, appropriate treatment, and slowing disease progression. The cerebrovascular system is closely tied to the central nervous system; cerebrovascular damage can severely impair neurological function and even cause neurodegeneration. For example, stroke patients are at very high risk of developing dementia. Effectively detecting cerebrovascular abnormalities and the location and size of lesions, and clarifying the relationship between cerebrovascular lesions and neurological disease, will therefore aid diagnosis and guide treatment.

Microbleeds are a form of cerebral small-vessel disease. Previous studies have shown that the presence of microbleeds in the brain is associated with cognitive deficits and elevated stroke risk (Chung et al., 2016; Chung et al., 2017), and that they affect the treatment of ischemic stroke and subsequent preventive measures. Among clinical medical images, cerebral microbleeds can be observed in susceptibility-weighted angiography (SWAN) images, where they appear as round or oval low-signal regions with clear edges (Greenberg et al., 2009). However, many structures in such images resemble microbleeds and reduce labeling accuracy, e.g., normal vessels and calcifications. Moreover, because microbleeds are very small and hard to detect, examination and labeling demand substantial manpower and time. The primary goal of this study is thus to develop image processing and deep learning techniques that automatically detect the location and size of cerebral microbleeds.

Furthermore, to clarify the relationship between cerebrovascular and neurological disease, this study enrolls patients with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), a condition verified to be genetically inherited. Large numbers of microbleeds (Chen et al., 2014) and white matter hyperintensities are often found in these patients' brains. By investigating the relationship between the distribution of cerebral microbleeds and white matter lesions in CADASIL patients at different disease stages (pre-symptomatic, short duration, and long duration), we can help clarify the link between cerebrovascular lesions and neurological function.

Current results: Working with clinicians, we manually labeled and constructed a microbleed image database with ground-truth labeling. Building on the 2D instance segmentation model Mask R-CNN, we developed a 3D instance segmentation model (Figure 1) to remedy the insufficient microbleed detection accuracy of other 2D models (e.g., Retina U-Net and U-Faster R-CNN). Our 3D instance segmentation model achieves an AUC of 0.706 (IoU = 0.5), ahead of existing microbleed detection models (Figure 2). Detailed detection examples are shown in Figure 3. A paper on these results, titled "Automatic 3D Instance Segmentation and Detection Framework for Cerebral Microbleeds", is being prepared for submission to IEEE Transactions on Medical Imaging.
Figure 1 : Architecture of the 3D instance detection model.
Figure 2 : AUC comparison across models.
Figure 3 : Microbleed detection results of the 3D instance detection model. Red boxes are automatic detections; green boxes are ground truth; the numbers in the lower-left corners index different slices from the same subject.

2.2 A stroke app jointly developed by IIS, National Yang-Ming University, and National Chiao Tung University

We hope to identify common research topics through the development of this app. At present, the medical content is led by National Yang-Ming University, the authorization mechanism by Academia Sinica, and the app implementation by National Chiao Tung University. The IIS team is using this opportunity to realize a dynamic consent mechanism and, on that basis, to recruit individuals willing to contribute health-related information for research, with dynamic consent serving as the mechanism by which individuals control their own data. The core idea of dynamic consent is to treat data providers as active participants rather than passive subjects: conventionally, after signing a consent form, participants have almost no further interaction with the research team. Dynamic consent is a first step toward increasing personal participation.
Considering that health data will keep expanding, that the formats of such data may change from time to time, and that data subjects need to maintain the details of their authorizations over these data, the design of the authorization mechanism should address flexibility, integrity, and traceability. Flexibility means that different formats of external health data can be handled with a single set of logic. Integrity means that, since external health data may have different structures and data subjects may grant authorization at different levels of the data hierarchy, authorization results should be stored in as little space as possible while guaranteeing their completeness. Traceability means that, as data subjects continually maintain their authorizations, every modification must be retained, so that if a data subject later disputes the authorization content, a history is available for review. The current architecture is planned as shown below:

Figure 4 : System architecture of the stroke app authorization mechanism.

3. Externally Funded Collaborative Project: the MOST project "Decoding the Neurological Mechanism of Emotion Expression in Language Communication", project number MOST 108-2321-B-009-006-MY2.

4. Building Datasets: We purchased the MJ (Mei-Chao) health examination database; the IRB application has been approved, and under our agreement with MJ, IIS researchers may first use the database for research and purchase only the data they need once their experiments are complete. We plan to integrate Taiwan Biobank data in the future, along with environmental and other related data.

Conclusion and Discussion: Our experience so far suggests that organizing activities to foster interaction among researchers may be the most efficient approach. Building datasets can lower the barrier to entry for medical data research, but it requires considerable preparation and planning.
Digital Medicine: a Collaboration Between National Yang-Ming University and IIS
Principal Investigator: Dr. Da-Wei Wang
Project Period: 2019/1~2021/12

1. Motivation and goals
Medical science and the healthcare system depend on high-quality data and availability of ever-expanding knowledge. Applications of data analytics and machine learning in medicine are attracting global attention. Medical institutes are promoting the concept of learning healthcare systems to enhance and accelerate data collection, knowledge acquisition, and implementation in practice. Lancet recently published a new journal named Digital Medicine. However, both medicine and information science are highly specialized fields, and mastering the essence of both fields is a daunting task. Therefore, promoting cooperation between experts in these fields is critical. This study hopes to find a workable collaborative mechanism between National Yang-Ming University and IIS. National Yang-Ming University hosts many experts in biomedicine, whereas the expertise of IIS lies in information science. These two entities can be linked through medical informatics technology. The preliminary plan is for the two parties to invest NT$5 million in research funds to enhance cooperation between them.

In the early stages, research on stroke was adopted as the first collaborative project, representing a framework to design the most appropriate mechanism for cross-disciplinary cooperation. Initially, a planning team was established, including biomedical, information science, and medical informatics experts. The role of the planning team is to propose potential cross-disciplinary research topics and to communicate with experts across the various fields. It is anticipated that, ultimately, a systematic mechanism for promoting cross-domain cooperation will be developed to promote long-term cooperation between National Yang-Ming University and the IIS. Later, National Chiao Tung University and Dr. Fan Yangzhen of NIH joined this project.

2. Activities Organized
2.1 January 28, 2019. "Digital Medicine and Information Technology Application Series" in IIS.
2.2 May 14, 2019. "Digital Medicine Summit/Matching Forum" in National Yang-Ming University.
2.3 August 22, 2019. "Biomedical Data Workshop" in National Yang-Ming University.
2.4 December 5, 2019. "Workshop on Digital Medicine in the Era of AI and Big Data" in National Chiao Tung University.
3. Initiated Projects
3.1 Automatic 3D Instance Segmentation and Detection Framework for Cerebral Microbleeds

The ageing population has become a global issue. In particular, dementia places a tremendous burden on society. Early identification of high-risk groups, as well as early diagnosis, can effectively slow the progression of dementia through appropriate treatments. The cerebrovascular system is closely linked to the cerebral nervous system. Cerebrovascular damage can seriously affect functioning of the nervous system and may even induce neurodegeneration. Therefore, stroke patients are at a high risk of dementia. Effectively detecting cerebrovascular variations and the location and size of lesions, as well as clarifying the relationships between cerebrovascular and neurological diseases, could greatly help with diagnoses and treatments of dementia.

Cerebral microbleeding is a cerebrovascular disease that is correlated with loss of cognitive function and increased risk of stroke (Chung et al., 2016; Chung et al., 2017), and it impacts treatments for ischemic stroke and the options for subsequent precautionary measures. Cerebral microbleeds can be observed using susceptibility-weighted angiography (SWAN), manifesting in images as low-signal circular or oval-shaped areas with defined edges (Greenberg et al., 2009). However, other structures may present similar characteristics in SWAN images, such as normal blood vessels and calcification, impeding accurate diagnosis. Moreover, since microbleeds are small and difficult to detect, they are time-consuming to diagnose. The primary goal of this study is to develop enhanced image processing and deep learning technologies that can automatically localize and measure microbleeds from SWAN images.

Furthermore, in order to clarify the relationships between cerebrovascular and neurological diseases, we are studying patients with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). CADASIL is thought to be genetically inherited. Patients frequently exhibit multiple microbleeds and high-signal lesions in brain white matter (Chen et al., 2014). Establishing the relationship between microbleed distribution and white matter lesions in the brain will help clarify the correlation between cerebrovascular diseases and nervous system function.

We have partnered with clinicians to manually construct a microbleed image database with ground-truth labeling. A 3D object segmentation model has been developed based on 2D instance segmentation model technology (Mask R-CNN) (Figure 1) to remedy the insufficient accuracy of other 2D models (such as Retina U-Net and U-Faster R-CNN). The 3D object segmentation model we have created has an AUC of 0.706 (IoU = 0.5), which is better than those of existing microbleed detection models (Figure 2). A detailed example is presented in Figure 3.
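For reference, the IoU = 0.5 matching criterion quoted above is computed over 3D boxes as sketched below; the axis-aligned box format is an assumption for illustration.

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as arrays (x1, y1, z1, x2, y2, z2).
    A detection counts as correct when IoU >= 0.5 against a ground-truth box."""
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))       # overlap volume (0 if disjoint)
    vol = lambda b: np.prod(b[3:] - b[:3])
    return inter / (vol(box_a) + vol(box_b) - inter)

# e.g., two unit-offset 2x2x2 boxes overlap in a 1x1x1 cube -> IoU = 1/15
print(iou_3d(np.array([0, 0, 0, 2, 2, 2.0]), np.array([1, 1, 1, 3, 3, 3.0])))
```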
Figure 1 : System diagram of 3D object segmentation.
Figure 2 : AUC for microbleed detection models.
Figure 3 : Detection of microbleeds by a 3D object detection model (top row=full image; bottom row=enlargements). Red represents detected results; green is ground-truth results; orange is colocalization of model-detected and ground-truth microbleeds. The numbers in the lower left corner of the upper row of images represent different imaging aspects of the same patient.

3.2 Development of a stroke app
There are two objectives for this project; first, to develop a clear end-product and, second, to discover suitable collaborative mechanisms. The project has three principal components, i.e., content, system implementation, and a consent mechanism. National Yang-Ming University is leading the content development aspect, National Chiao Tung University is responsible for system implementation, and IIS is dealing with the consent mechanism. The IIS team is leveraging previously developed authorization service design to implement a dynamic consent mechanism and is currently engaged in recruiting patients to use this app and provide health-related information for further research. The core idea of dynamic consent is to treat data providers as active and continuous participants. Currently, participants tend to sign an informed consent form at the beginning of a research project and thereafter there is almost no further interaction between data provider and research team in terms of consent. As health data continues to expand, the formats by which health data is presented may also change. Therefore, data providers may need to adjust their consent accordingly. The design of a consent mechanism should incorporate flexibility, integrity and traceability. The system architecture is illustrated in Figure 4 below.

[Figure 4 diagram: the Resource Owner sets consent through a user interface, and the Authorization Server updates the consent database (MS SQL Server) and displays the updated consent; a Requesting Party applies for data via the Authorization Server, which checks consents and retrieves the consented data from the Resource Server, where requested data are stored in FHIR format before the requesting party is informed.]

Figure 4 : System architecture for a stroke app.

4. External Research Project
"Decoding the Neurological Mechanism of Emotion Expression in Language Communication" (funded by MOST). Project number: MOST 108-2321-B-009-006-MY2.

5. Build Data Repository
The Mei-Chao health examination dataset has been acquired and, following an IRB review and negotiations with Mei-Chao, researchers in IIS can use this dataset for their research, only paying requisite fees for the data actually utilized. We plan to incorporate other data sources to build an integrated research environment.

6. Conclusion
We have found that organizing interdisciplinary gatherings and discussions is the most cost-effective way to facilitate collaborations. Establishing a data repository has the potential to lower initial barriers to cooperation, but it requires dedicated resources and adequate planning.
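Returning to the dynamic-consent design of Section 3.2, the flexibility/integrity/traceability requirements can be made concrete with a minimal consent record such as the sketch below. The field names and scope syntax are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ConsentRecord:
    """One data subject's authorization for one slice of health data:
    scoped per resource (flexibility), explicit about what is granted
    (integrity), and retaining every modification (traceability)."""
    owner_id: str
    resource_scope: str                     # e.g., "health/bloodwork/*"
    granted: bool = False
    history: list = field(default_factory=list)

    def update(self, granted: bool):
        self.history.append((datetime.now(), self.granted))  # keep the audit trail
        self.granted = granted
```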
Conversational Open Domain Document-based Natural Speech Q&A (COSQA)
Principal Investigators: Dr. Keh-Yih Su, Dr. Hsin-Min Wang, and Dr. Yu Tsao
Project Period: 2020/1~2022/12

This is a three-year project funded by the Institute of Information Science to study and solve hard problems in Natural Language Processing and Speech Recognition, with an annual budget of NT$4 million. In this era of knowledge explosion[1], the vast accumulation of knowledge has far exceeded what the human brain can handle, so computer assistance is necessary: for example, to extract relevant material from various information sources, to make inferences over extracted content, to mine new knowledge from known information, to generate corresponding natural language text, and to summarize and translate content. Since most knowledge is presented in text form, various natural language processing tools have become indispensable. Moreover, because understanding content and drawing inferences from it are crucial to many advanced applications, Natural Language Understanding/Inference (NLU/NLI) has become a very important topic.

Since any NLU/NLI-related issue can be tested by asking an appropriate corresponding question, question answering systems are the most suitable testbed for evaluating progress on such issues. This project, the Conversational Open Domain Document-based Natural Speech Q&A System, not only serves as a testbed for NLU/NLI but can also be used in many different settings: customer service, medical consulting, personal assistants (e.g., Apple Siri/HomePod, Amazon Alexa/Echo, Microsoft Cortana, Google Assistant/Home), e-learning, information seeking, and car navigation systems. The proposed system architecture is shown in Figure 1, and the associated Q&A operation flow is shown in Figure 2. The system will be implemented on a Deep Neural Network framework.

Compared with image or speech processing, NLU/NLI remains to date the most challenging problem in artificial intelligence. This is mainly because it involves abstract/aggregative features, requires deep understanding and inference, and frequently demands large amounts of external knowledge (e.g., lexical/conceptual relations, commonsense, and domain knowledge), of which people are usually unaware when they understand and reason. Although some state-of-the-art systems have shown performance comparable to humans on simple datasets, they mainly exploit surface features that humans overlook to obtain answers, and do not truly reason. As 2018 Turing Award winner LeCun put it[2]: "Machines are still very, very stupid. The smartest AI systems today have less common sense than a house cat."

Although the Q&A task has a long history, this project differs from most previous systems in that we will tackle every difficult aspect of NLU/NLI: open-domain, free-text, multi-document-rationale, commonsense-reasoning, multi-hop-inference, pragmatic-reasoning (common in multi-turn dialogue), natural-speech, and text-speech-closely-coupled (as opposed to specific-domain, knowledge-base QA, single-passage-rationale, without-external-knowledge, single-hop-inference, no-pragmatic-reasoning, reading-speech, and text-speech-loosely-coupled). Carrying out this project will force us to face the true challenges of AI and to work hands-on with frontier research topics. Its success will move the field an important step forward. Moreover, the techniques developed in this project can be used in the many different applications mentioned above, helping the Institute extend its influence into industry.

[1] As of 2019/10/24, Wikipedia contained six million articles and forty-nine million pages.
[2] Communications of ACM. 62(6). June 2019. https://cacm.acm.org/magazines/2019/6/236990-neural-net-worth/fulltext
Figure 1 : Architecture of the proposed Conversational Open Domain Document-based Natural Speech Q&A system.
Figure 2 : The Q&A operation flow.

Given a question, Wikipedia, and some external resources (e.g., WordNet, ConceptNet, and Freebase), our task is to find (or infer) the most likely answer from Wikipedia and documents retrieved online. To reduce computation, we first use an off-the-shelf information retrieval tool (e.g., the Lucene search engine) in a preprocessing stage to extract the relevant Wikipages. Different kinds of questions (e.g., asking about a person/thing/time/place/object, a procedure, or a reason) have different result categories (e.g., the answer may be a person/location/organization/product/event name, a date/time, a duration, a distance, a procedure, a reason, etc.) and different expected answer forms (e.g., yes/no, a specific answer, or free text). Different question types often require different mechanisms (called answer-modes in Figure 2) to obtain the desired answer. For example, the computer may need to perform the following distinct operations to find the result: locating one or more text spans in the given documents; judging whether a proposition holds (for yes/no questions); and performing logical operations (e.g., intersection, union, complement), arithmetic operations (e.g., sum, difference), and aggregation operations (e.g., comparison, max/min, counting) on the extracted information.

Clearly, it would be very difficult for a single general-purpose module to handle all of the above answer-seeking mechanisms. We therefore propose a divide-and-conquer framework that converts this complex task into a set of simple subtasks. That is, each specific answer-seeking mechanism is handled by a separate answer-generation module/model, and for the same mechanism an ensemble approach can also be adopted (i.e., the answer is aggregated from the results of several different answer-generation modules). Under this mode of operation, the Dispatcher module first identifies the question's possible answer types and answer-modes (as shown in Figure 2) and then activates the corresponding answer-generation modules to obtain a set of candidate answers. The Aggregator module then merges those results to produce the final answer.
Our research will focus on the following issues: (1) how to guide users to a specific answer quickly through pragmatic reasoning; (2) how to locate relevant passages scattered across various documents via multi-hop search; (3) how to add and exploit intermediate clues extracted from different external knowledge bases (e.g., Freebase and Wikidata); (4) how to extract external knowledge (e.g., commonsense and domain knowledge) from WordNet/ConceptNet to strengthen multi-hop inference; and (5) how to handle natural speech and tightly couple the text processing and speech recognition modules.

In the first year of the project (2020), we will complete a domain-specific baseline system based on pre-extracted Wikipages. We expect to: (1) design a unified framework that integrates the various answer-seeking mechanisms; (2) design the various answer-generation modules to mimic human reasoning; and (3) exploit speech and text clues to tightly couple the text processing and speech recognition modules.

In the second year (2021), we will complete a system capable of multi-hop inference over multiple domains. We expect to: (1) use a multi-hop search algorithm to obtain relevant supporting evidence scattered across documents; (2) decompose a complex question into a sequence of simple questions; (3) conduct multi-hop inference based on the extracted supporting evidence; (4) perform domain adaptation; and (5) detect and remove repetitions and disfluencies in spontaneous speech to improve natural speech recognition.

In the final year (2022), we will complete an open-domain system that can process Wikipages and other online documents with enhanced commonsense reasoning. We expect to: (1) reason with commonsense (introducing appropriate intermediate goals); (2) perform pragmatic reasoning in the dialogue management system; (3) enhance interpretability (i.e., produce the associated reasoning process); and (4) strengthen the robustness of natural speech recognition through adversarial learning.

To demonstrate the advantages of our approach, we will continue to participate in the "Formosa Grand Challenge - Talk to AI" contest organized by the Ministry of Science and Technology, as well as the internationally renowned Machine Reading for QA (MRQA) shared task.
Conversational Open Domain Document-based Natural Speech Q&A (COSQA)
Principal Investigators: Dr. Keh-Yih Su, Dr. Hsin-Min Wang, and Dr. Yu Tsao
Project Period: 2020/1~2022/12

This three-year project is supported by IIS to study challenging problems in Natural Language Processing (NLP) and Speech Recognition (SR). The project is funded to the amount of US$130,000 each year. In this era of knowledge availability1, advanced knowledge processing capability is essential―necessitating information extraction from various sources, making inferences from content, gaining new knowledge from available information, generating respective natural language texts, and summarizing/translating content, among other aspects―with the huge volume of available knowledge far exceeding the processing capability of a single human brain. Since most knowledge is expressed in text form, various NLP tools have become indispensable. Moreover, as understanding content and making inferences from it are essential aspects of many advanced applications, Natural Language Understanding/Inference (NLU/NLI) has become a very important topic.

Since any NLU/NLI-related issue can be tackled by asking an appropriate corresponding question, Q&A systems are the most appropriate testbed for evaluating progress in NLU/NLI. The COSQA project not only represents an ideal testbed for conducting NLU/NLI research, but it can also contribute to many different applications, such as customer service, medical consultant systems, personal assistants (e.g., Apple Siri/HomePod, Amazon Alexa/Echo, Microsoft Cortana, Google Assistant/Home), e-learning/information-seeking systems, and car navigating systems, amongst others. A block diagram of our proposed system is shown in Figure 1, and an associated Q&A operation flow is illustrated in Figure 2. This system will be implemented on a Deep Neural Network framework.

Unlike image/speech processing, NLU still remains the most challenging problem in AI. This is mainly due to NLU involving abstract/aggregative features and deep understanding/inference, which frequently require considerable external knowledge (e.g., lexicon/concept relationship, commonsense and domain knowledge), which people may not be aware of even though they display understanding and reasoning. Although some state-of-the-art (SOTA) systems report excellent performance, equivalent to that of humans on some simple datasets, they mainly capture surface features ignored by humans and so cannot really be deemed to infer. Consequently, current SOTA systems lag far behind human performance on real datasets. As quoted from LeCun2: "Machines are still very, very stupid. The smartest AI systems today have less common sense than a house cat."

Although the Q&A task has a long history, the approach of the COSQA project differs from most previous systems in that we will handle in every aspect the difficult elements of NLU/NLI, i.e., open-domain, free-text, multi-document rationale, commonsense reasoning, multi-hop inference, pragmatic reasoning (required in multi-turn dialog), natural speech, closely-coupled text speech (versus specific domain, Knowledgebase QA, single-passage rationale, without external knowledge, single-hop inference, no pragmatic reasoning, reading-speech, loosely-coupled text speech). We are confident this project will significantly advance this field by tackling real challenges in AI.
Moreover, the techniques developed in this project could be used in many different applications, extending the influence of IIS on industry.

1 According to 2019/10/24 statistics, there are a total of 6/49 million articles/pages found in Wikipedia.
2 Communications of ACM. 62(6). June 2019. https://cacm.acm.org/magazines/2019/6/236990-neural-net-worth/fulltext
Figure 1 : Block diagram for the proposed Conversational Open Domain Document-based Natural Speech Q&A (COSQA) system.
Figure 2 : Proposed Q&A operation flow.

Given a Question, Wikipedia and some external Resources (such as WordNet, ConceptNet or Freebase), the task is to establish the most likely answer from related Wikipages and on-line retrieved documents. To reduce computation load, we first extract related Wikipages using an off-the-shelf information retrieval tool (e.g., the Lucene search engine) as a pre-processing stage. Then, different kinds of questions (e.g., Who, What, When, Where, How, and Why) might be posed to those results, generating different answer types (e.g., person/location/organization/product/event name, date/time, duration, distance, procedure, reason, etc.), in different answer forms (e.g., Yes/No, specific answer, free text). Importantly, different mechanisms (termed "answer-modes" in Figure 2) would be required to obtain various desired answers. For example, acquiring an answer might require conducting the following different operations: locating one or more spans from the given text, conducting entailment judgment (for yes/no questions), and performing logic (e.g., intersection, union, complement), arithmetic (e.g., sum, difference), and aggregative (e.g., comparison, Max/Min, counting, etc.) operations on the extracted information.

Obviously, it is exceedingly difficult to design a general-purpose module to handle all of the various answer-seeking mechanisms mentioned above. Therefore, we propose to utilize a Divide-and-Conquer framework to convert this complex task into a set of simple sub-tasks. Accordingly, each specific answer-mode will be handled by a different answer-generation module/model. Moreover, one answer-type might be handled by several answer-generation modules if an ensemble approach is adopted. Thus, we have designed a dispatcher module to first identify possible answer-types and answer-modes associated with a given question, allowing us to activate various corresponding answer-generation modules to get a set of results. Then, an aggregator module can generate the final answer by merging all of the results.
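A skeletal version of this dispatch-then-aggregate flow is shown below. The answer-mode tags, the placeholder modules, and the aggregation interface are illustrative assumptions only.

```python
ANSWER_MODULES = {
    "span":       lambda q, docs: "span answer",   # locate text span(s)
    "yes_no":     lambda q, docs: "yes",           # entailment judgment
    "arithmetic": lambda q, docs: 0,               # sum/difference over extracted numbers
}

def answer(question, docs, dispatcher, aggregator):
    """dispatcher(question) -> list of predicted answer-modes;
    aggregator(candidates) -> final merged answer."""
    modes = dispatcher(question)
    candidates = [ANSWER_MODULES[m](question, docs) for m in modes]
    return aggregator(candidates)
```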
Our research focuses on the following issues: (1) How to guide the user to quickly reach a specific answer through pragmatic reasoning; (2) How to identify related passages scattered over various documents via multi-hop searching; (3) How to add intermediate clues (in addition to supporting evidence) extracted from different external knowledge bases (e.g., Freebase, Wikidata); (4) How to conduct multi-hop inference enhanced with external knowledge (e.g., commonsense, domain-knowledge, etc.) extracted from WordNet/ConceptNet; and (5) How to handle natural speech and integrate both the NLP and SR modules.

In the first year of the project (2020), we will complete a domain-specific baseline system based on pre-extracted Wikipedia pages. To do this, we will: (1) design a unified framework to integrate various answer-seeking mechanisms; (2) design various answer-generation modules to mimic human reasoning; and (3) utilize speech and text clues to closely integrate the NLP and SR modules.

In the second year (2021), we will complete a multi-domain system with multi-hop inference capability by: (1) designing a multi-hop searching algorithm to obtain all associated supporting evidence scattered across different documents; (2) decomposing a complex question into a sequence of simpler questions; (3) conducting multi-hop inference based on the extracted supporting evidence; (4) conducting domain adaptation; and (5) detecting and removing disfluency to improve spontaneous speech recognition.

In the third year (2022), we will complete an open-domain system with enhanced commonsense reasoning capability over both Wikipages and other on-line documents by: (1) conducting commonsense reasoning (introducing various intermediate goals); (2) performing pragmatic reasoning in dialog management; (3) enhancing interpretability (i.e., generating associated rationales); and (4) enhancing robustness (via adversarial learning) to recognize natural speech.

To demonstrate the power of our developed methodologies, we will continuously participate in the "Formosa Grand Challenge -- Talk to AI" contest (organized by the Ministry of Science and Technology) and the international Machine Reading for QA (MRQA) shared task.
Recommendation, Modeling, and Search for Smart Digital Marketing
Principal Investigators: Dr. Mi-Yen Yeh and Dr. De-Nian Yang
Project Period: 2020/1~2022/12

Applying artificial intelligence (e.g., the Artificial Internet of Things and predictive analytics) to digital marketing is widely regarded as a disruptive trend for the near future, for example in retail within immersive Extended Reality (XR, which subsumes VR/AR/MR), social e-commerce, and programmatic advertising. The International Data Corporation (IDC) forecasts that worldwide spending on XR will reach USD$18.8 billion in 2020 (including $1.5 billion in retail) and that the XR market will sustain an annual growth rate of 77% through 2023. According to market surveys by Forbes, Walker Sands, and L.E.K., 79% of consumers are likely to visit an XR store displaying customized products; 65% of consumers are excited about XR shopping, and 54% of those acknowledge social shopping as the way they shop. Oracle accordingly reports that 78% of online retailers, such as IKEA, Lowe, Alibaba, and eBay, have already implemented (or are planning to implement) XR and AI. Furthermore, eMarketer predicts that 86.2% of digital advertising in the U.S. will be programmatic by 2020 and that global digital advertising spending will reach USD$375 billion by 2021. Meanwhile, Business Insider, Gartner, and Forbes foresee that by 2025 the IoT market will exceed USD$3 trillion in annual revenue and that IoT devices will number 64 billion, more than 80% of which will contain AI components.

Designing recommendation systems for smart digital marketing will be a complex undertaking, given 1) emerging XR technologies, such as Multi-View Display (MVD), which makes flexible and customized shopping environment design feasible; 2) the strategic behaviors of the multiple parties involved, such as online consumers, retailers, and advertising agents; and 3) the massive, biased, and noisy data from e-commerce purchase logs and Social Internet of Things (SIoT) devices. In this three-year project, we therefore aim to design the following core technologies: 1) a subsystem that leverages the flexibility of MVD in XR to configure product displays suited to individual consumers while accounting for social influence and interaction; 2) a subsystem that, through Knowledge Graphs (KG) and temporal social networks, effectively exploits the social influence driven by multiple correlated products and consumers' dynamic perceptions across multiple promotion campaigns; 3) a subsystem that accurately predicts the winning price in Real-time Bidding (RTB) for multiple advertising agents from incomplete information; and 4) a subsystem that rapidly and proactively deploys social/artificial IoT devices to detect consumer behavior in XR and to tackle the difficulties of unbiased recommendation and timely consumer support. Figure 1 shows the overall framework of the XR recommendation system.

Figure 1 : Data-processing framework for smart digital marketing.

First Year
This project aims to design a recommendation system for XR group shopping using MVD, which supports flexible switching between a primary view (viewing different products individually) and a group view (viewing common products with friends). Although the consumer interface and the physical products are consistent within each consumer's environment, the virtual products displayed can be fine-tuned to each consumer's preferences, while jointly viewed products encourage social interaction and discussion, boosting sales. Designing this novel XR group recommendation system is challenging because it must consider 1) individual preferences, 2) group interactions, 3) product correlations, and 4) group dynamics and view switching. By contrast, existing personalized recommendation does not consider social interaction, e.g., Collaborative Filtering (CF) and Bayesian Personalized Ranking (BPR), while group recommendation assigns a uniform set of products to a group of consumers based only on aggregated preferences, sacrificing individual preferences (e.g., preference aggregators designed with attention networks or graph neural networks). To derive consumer satisfaction under MVD, we formulate a new optimization problem, named Social-aware XR Group-Item Configuration (SXGIC), to maximize total individual satisfaction while ensuring that 1) no duplicate products are displayed to a consumer across different positions; 2) the number of shared group views of the same product is kept within a limit; 3) similar and/or complementary products are placed near one another; and 4) the partitions of groups sharing views at consecutive positions are similar. We will prove SXGIC to be NP-hard and inapproximable, and propose an integer program as a baseline for SXGIC. We will also use the fractional solutions of the linearly relaxed integer program to design a randomized rounding strategy that constructs the best MVD configuration. Finally, we will incorporate the proposed methods into a prototype XR shopping application built with Unity and equipped with hTC VIVE HMDs and Microsoft HoloLens to conduct consumer case studies, validating that actual consumer satisfaction correlates with our proposed optimization objective.

Figure 2 : Total individual satisfaction for different numbers of users.
Figure 3 : Total individual satisfaction across datasets.
Figure 4 : Preliminary results of the consumer case study.

Preliminary experimental results show that the MVD configurations constructed by our randomized rounding strategies (AVG and AVG-D in Figures 2 and 3) achieve at least 30% higher total satisfaction than existing recommendation strategies, and effectively balance individual preference (the black portion of satisfaction) against group interaction (the white portion). Moreover, in a preliminary consumer case study built with Unity, we extracted the key parameters of the optimization problem from consumer feedback (Figure 4a) and verified that actual user satisfaction matches the optimization model under these parameters (Figure 4b). Analyzing the constructed MVD configurations further, we find that the recommendation clusters users into tight subgroups with similar preferences, so the number of edges within subgroups far exceeds that across subgroups (Figure 4c), and most friends are able to view the same products (Figure 4d).

Second Year
The XR recommendation system above configures product displays according to pre-estimated individual preferences and the benefit of social influence; it does not capture the interplay between social influence and consumers' dynamic perceptions in repeated, multi-faceted promotion campaigns (termed co-evolution). Existing work on influence maximization selects a set of consumers as seeds to promote a single target product so that the most consumers are influenced; in reality, however, companies often promote multiple related products across multiple campaigns. In the second year, we therefore emphasize effectively quantifying and maximizing the benefit brought by social influence and consumers' dynamic perceptions over multiple promotion campaigns in e-commerce, which faces the following new challenges: 1) most prior research does not consider the complementary and substitutable relationships between products, which affect consumer preferences via the economic notion of cross elasticity of demand; 2) perceptions of product relationships are usually subjective; 3) these subjective perceptions are dynamic, since newly purchased products typically give consumers new experiences that change their perceptions of related products; and 4) changes in subjective perceptions of product relationships may alter consumer preferences and even the social influence exerted on friends.
IIS Collaborative Projects所 內 合 作 計 畫 因此,考慮到上述錯綜複雜的關係,為了處理社群影響力隨時間不同量化,我們將運用表述事實的知識 圖譜來捕捉商品間的關係,以及針對某種關係的元結構(meta-graph)來描述具有此關係兩商品間的連 結之語意,而針對每個元結構的權重則表示其對特定關係的重要性。根據先前購買的商品來學習和微調 元結構的權重,消費者對商品關係的動態主觀認知可藉由一個以商品為節點、商品間的關係(如:互補 及替代)及其相關性分數為邊的「主觀商品網路」捕捉。據此,我們將針對一系列相關商品的推銷, 提出一道新問題──帶有消費者動態主觀認知的影響力最大化(In uence Maximization with Dynamic Personal Perception,DPP-IM),其同時考慮 1) 商品間的關係、2) 因著先前購買商品而會隨時間改變的 個人喜好及社群影響力。DPP-IM 問題的目標在於在預算內選擇將在不同時間由適當消費者推銷的相關 商品,以最大化在動態主觀認知及社群影響力下之利益。第一年提出的延伸實境推薦系統有另一項限制 ──為完全中心化的系統(亦即顯示的商品配置完全由零售平台所支配)。但在現實中,零售平台所販 賣的大量商品來自不同的供應方,而這些供應方會為了有限零售資源相互競爭,如:延伸實境商場中的 商品顯示位置及電子商務的廣告投放位置等。因此,仔細檢驗延伸實境推薦系統中供應方的策略性及競 爭性的行為是很重要的。故在本計劃的第二年,我們將結合即時競標機制來協助將延伸實境推薦系統中 的商品顯示放置於正確位置。為了幫助供應方平台(即所提出的延伸實境推薦系統)推測競標結果及配 置商品顯示,及引導需求方平台(即廣告經紀商)在有限預算下的競標行為,設計一套能精準預測得標 價(通常是競標價中最高者)的子系統是必要的。在一般封閉式拍賣(Sealed-bid Auction)機制下,如: 次價投標拍賣(Second-price Auction),對需求方平台來說,一個主要的挑戰是缺少關於得標價的完整 資訊(特別是對在過去競標失敗者),因為得標價只有競標成功者才可見。這問題在採用首價投標拍賣 時會更為嚴重,且其相較於次價投標拍賣變得越來越普及。在首價投標拍賣中,即使需求方平台贏得廣 告曝光機會,得標價仍不可見。唯一可得知的資訊是需求方平台是否以其競標價贏得廣告曝光機會。因 此,我們的目標是協助需求方平台設計一套有效的在各種分佈及不同拍賣機制下的得標價預測方法。為 了設計一套新的得標價模型,本計畫將基於深度模型架構評估不同損失函數的效能及影響;再者,我們 將設計一模型階層(元件)及損失函數,自得標率習得雙重設限的(Double-Censored)得標價。我們亦 將分析正則項如何使分佈更為平滑,及如何影響得標價模型的效能;再者,我們將分析得標價預測如何 影響得標率及利潤,並研究如何運用得標價模型建立競標策略。我們將以利潤及成本曲線等度量來評估 我們的模型及競標策略。 第三年 我們將轉移焦點至密集的資料收集以及經由物聯網(Internet of Things,IoT)進行即時活動與適地性搜尋, 以促進消費者回饋的即時探測並提供延伸實境購物環境中的即時支援。因應延伸實境商店中的客製化商 品配置,探測即時消費者回饋是必要的。再者,為了支援不同種類的回饋以有效排序多視角顯示配置, 需要運用社群物聯網(Social Internet of Things,SIoT)協同辨識及處理局部活動,因此多種物聯網是必 要的。另一方面,作為科技的新趨勢,對延伸實境環境與介面不夠熟悉的消費者可能會遇上技術性問題 而需要求助,且在全面化使用延伸實境商店的情形下,運用社群物聯網來提供對消費者立即的支援及避 免部署過多的員工是重要的。因此,設計一套可擴展且快速的社群物聯網的部署與溝通方式以支援上述 需求是很重要的。據此,我們將考慮社群物聯網的溝通、運算與覆蓋的行動邊緣運算網路,探討延伸實 境商店中的社群物聯網群體建構與個別社群物聯網的選擇工作,以因應動態環境中對消費者的支援,供 應不同類型回饋的探測及適地性搜尋。我們將提出一道新的最佳化問題,以針對適地性探測及推薦,在 動態環境下得到最佳的搜尋結果。 92
Brochure 2020 最後,本計畫目標在於強化推薦模型,使得其能穩定處理有偏差的資料。在現實世界中從 過往購物紀錄及廣告競標紀錄收集而來的相關資料,必然會有雜訊與偏差,過往研究通常 基於資料沒有偏差的假設將推薦模型最佳化,且許多研究使用線下評估結果來表示線上的 結果。然而,收集到的資料可能會偏離特定傾向;基於這類有偏差的資料來訓練延伸實境 推薦系統可能會有兩個主要問題:1) 線下評估結果無法真實反映線上的結果,其會降低消 費者對延伸實境商店的滿意度與體驗品質;2) 分配到的顯示位置可能會偏向現有的廣告或 較普及的商品,其會影響推薦結果的全面性及健全性。因此,平衡訓練資料的偏差對我們 的延伸實境推薦系統是相當重要的。因應上述問題,我們將提出一套方法,以克服資料中 的偏差,並建立一套較為全面的推薦系統。更確切地來說,給定資料的偏差傾向,我們可 設計相對應的權重,以優化模型,進而從有偏差的資料得到更全面性的資訊;再者,我們 將設計一套產生有偏差的資料及完成接續的假設驗證的方法。我們將設計一套實驗環境, 以模擬有偏差的資料及無偏差資料在模型訓練上的效果;本計畫將提出一套負採樣的方法 以模擬由現有模型推薦,但消費者不點閱的內容,藉由上述的負採樣的方法,模型的推薦 效果可進一步提升。 已採用延伸實境的電子商務供應商,如:eBay、Myers、IKEA、Lowe 及 Amazon,將可得益 自本計畫開發的延伸實境推薦系統,以在延伸實境群體購物應用中,促進彈性且平衡的商 品顯示配置,進而經由社群影響力及互動提升銷售額,同時不犧牲消費者個人的喜好。再 者,對於尚未採用延伸實境的本地供應商,如:PChome、MOMO、Buy123 與 Rakuten, 可藉由運用商品的關聯性及了解消費者對商品的動態認知來強化推銷。我們的社群物聯網 部署子系統可為對社群物聯網平台與服務感興趣的公司企業帶來利益,如:Amazon AWS 的物聯網平台與 MinSphere (由 Alibaba 與 Siemens 共同建造),因它們連結到許多個社 群物聯網以提供普及與適地性的服務。最後,我們的無偏差且穩健的學習技術對所有一般 基於學習的預測模型的應用是有價值的,其包含(但不限於)推薦系統(如:Google、 Yahoo、YouTube 以及 Alibaba)。 93
Recommendation, Modeling, and Search for Smart Digital Marketing
Principal Investigators: Dr. Mi-Yen Yeh and Dr. De-Nian Yang
Project Period: 2020/1~2022/12

It is widely believed that, in the near future, incorporating AI technologies such as Artificial Internet of Things (AIoT) and predictive analytics into new digital marketing [e.g., retail with immersive Extended Reality (XR, which subsumes VR/AR/MR), social e-commerce, and programmatic advertising] will be extensive. For example, the International Data Corporation (IDC) forecasts worldwide expenditure on XR to reach USD$18.8 billion in 2020 (including $1.5 billion in retail) and foresees the XR market maintaining annual growth of 77% at least until 2023. According to marketing surveys by Forbes, Walker Sands, and L.E.K., 79% of consumers are likely to visit an XR store displaying customized products, 65% are excited about XR shopping, and 54% acknowledge social shopping as how they purchase products. Consequently, Oracle reports that 78% of online retailers (including IKEA, Lowe, Alibaba, and eBay) have already implemented or are planning to implement XR and AI. Furthermore, eMarketer predicts that 86.2% of digital display advertisements in the U.S. will be programmatic by 2020 and that global digital advertisement spending will exceed USD$375 billion by 2021. Business Insider, Gartner, and Forbes forecast that the IoT market will grow to over USD$3 trillion annually and that the number of IoT devices will reach 64 billion by 2025, more than 80% of which will have an AI component.

Devising recommendation systems for smart digital marketing will be a complex task given that: 1) XR technologies are constantly developing (e.g., Multi-View Display, MVD, which enables flexible and customized shopping environment design); 2) the multiple parties involved (e.g., online consumers, retailers, and advertising agents) behave strategically; and 3) the datasets collected from e-commerce purchase logs and Social Internet of Things (SIoT) devices are enormous, biased, and noisy. In this three-year project, we aim to design the following core technologies: 1) a main subsystem that leverages the flexibility of MVD in XR to configure product displays tailored to individual consumers while considering potential social influences and interactions; 2) a subsystem that effectively exploits the social influences driven by multiple correlated items and dynamic user perceptions in multiple promotion campaigns by leveraging Knowledge Graphs (KG) and temporal social networks; 3) a subsystem that accurately predicts the winning prices in real-time bidding (RTB) for multiple advertising agents with incomplete information; and 4) a subsystem that efficiently and proactively deploys SIoT/AIoT devices to detect XR consumer behavior and tackles the difficulties of unbiased recommendation and timely user support. An overall framework of our XR recommendation system is presented in Fig. 1.

Figure 1: Data processing for smart digital marketing.

First Year
We aim to devise a recommendation system for XR group shopping by exploiting MVD, which supports flexible switching between a primary view (viewing different items privately) and a group view (viewing common items with friends) during shopping. Although the user interface and the real objects are consistent within the customers' environments, the virtual items displayed can be tailored somewhat for different users based on their diverse preferences.
Meanwhile, shared common items can stimulate social interactions and discussions that boost sales. Consequently, the innovative XR group recommendation task is challenging, as it must consider 1) personal preferences, 2) social interactions, 3) item correlations, and 4) subgroup dynamics
and view switching. In contrast, existing personalized recommenders, e.g., Collaborative Filtering (CF) and Bayesian Personalized Ranking (BPR), fail to consider social interactions. Moreover, group recommenders, such as the preference aggregators devised through Attention Networks or Graph Neural Networks, sacrifice personal preferences by assigning a unified set of items to the entire group of users based only on their aggregate preference. To determine user satisfaction in MVD, we will formulate a new optimization problem, Social-aware XR Group-Item Configuration (SXGIC), to maximize overall personal satisfaction while also ensuring that: 1) no duplicated items are displayed at different slots to a user; 2) the number of users sharing group views of the same item is controlled; 3) similar and/or complementary items are placed near each other; and 4) the partitioning of shared-view subgroups between consecutive display slots is similar. We will prove the NP-hardness and inapproximability of SXGIC and propose an Integer Program (IP) as a baseline approach for solving SXGIC exactly. We will then devise randomized rounding strategies that construct promising MVD configurations by exploiting the fractional solutions of the linearly relaxed IP. Finally, we will incorporate the proposed methods into a prototype XR shopping application built in Unity with hTC VIVE HMD and Microsoft HoloLens, allowing us to conduct real-user studies to validate the correlation between real user satisfaction and our optimization problem.

Figure 2: Total satisfaction vs. the number of users. Figure 3: Total satisfaction vs. different datasets. Figure 4: Prototype user study results.

Preliminary experimental results show that our randomized rounding approach (termed AVG in Figures 2 and 3) and its deterministic variation (termed AVG-D) consistently outperform existing baseline recommendation approaches by at least 30% in terms of total satisfaction across different input parameter settings (Figure 2) and all datasets (Figure 3). As evidenced, our methods effectively balance personal preferences (black bars in Figures 2 and 3) against social interactions (white bars). Moreover, a preliminary user study conducted on our prototype XR shopping system surveys important problem parameters from user opinions (Figure 4(a)) and validates that users are more satisfied viewing item configurations given by AVG (Figure 4(b)). Further analysis of the resulting item configurations (Figures 4(c) and 4(d)) finds that our proposed approach clusters users into cohesive and dense subgroups (most friendship edges are preserved intra-subgroup in Figure 4(c)) while ensuring that most pairs of friends view similar items together (high co-display percentage in Figure 4(d)).

Second Year
The XR recommendation system outlined above configures item displays based on pre-evaluated personal preferences and social influence benefits. However, it does not capture the complicated interplay (known as co-evolution) between social influences and dynamic user perceptions in repeated and diverse promotion campaigns. Existing research on Influence Maximization (IM) selects k users as seeds to promote a single target item and maximize the number of influenced users. Nevertheless, in real life, companies often promote multiple relevant items in multiple events.
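To make this single-item IM baseline concrete (the setting our second-year task generalizes), the following is a minimal sketch of greedy seed selection under an independent cascade model with a uniform activation probability p. The diffusion model, the probability, and the graph encoding are illustrative assumptions, not the project's actual design.

```python
import random
from typing import Dict, List, Set

def simulate_ic(graph: Dict[str, List[str]], seeds: Set[str], p: float = 0.1) -> int:
    """One independent-cascade run; returns the number of influenced users."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and random.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(graph: Dict[str, List[str]], k: int, trials: int = 200) -> Set[str]:
    """Greedily pick k seed users maximizing the Monte Carlo estimated spread."""
    seeds: Set[str] = set()
    for _ in range(k):
        best, best_spread = None, -1.0
        for u in graph:
            if u in seeds:
                continue
            spread = sum(simulate_ic(graph, seeds | {u}) for _ in range(trials)) / trials
            if spread > best_spread:
                best, best_spread = u, spread
        if best is None:
            break
        seeds.add(best)
    return seeds
```

Our DPP-IM formulation below departs from this baseline by promoting multiple correlated items over time while preferences and perceptions evolve.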
Therefore, in the second year of the project, we will focus on effectively quantifying and optimizing the benefits of social influences and dynamic user perceptions for multiple promotion campaigns in social e-commerce. This aspect of the project presents new challenges. Firstly, most previous work does not consider the complementary and substitutable relationships between
promoted items, which can affect users' preferences through the economic concept of cross-elasticity of demand. Secondly, perceptions of the relationships between items are usually personal. Thirdly, personal perceptions of item relationships are dynamic, because newly adopted items usually open new experiences to users and change their perceptions of related items. Fourthly, changes in personal perceptions of item relationships may affect users' preferences and their social influence over friends. Therefore, to address the important need to quantify social influence over time, we will leverage KG to capture relationships between items: the KG represents facts, a meta-graph for a certain relationship describes the semantics of connectivity between items, and a weighting reflects the importance of that meta-graph to that relationship. By learning and tailoring the meta-graph weighting according to previously adopted items, dynamic personal perceptions of item relationships can be captured through personal item networks, with items as vertices and item relationships (e.g., complementary and substitutable), along with their relevance scores, as edges. Accordingly, we will formulate a new task, Influence Maximization with Dynamic Personal Perception (DPP-IM), to establish a sequence of promotions for relevant items wherein both the relationships between items and the time-varying personal preferences and social influences due to previously adopted items are considered. DPP-IM identifies correlated items to be promoted at different times to suitable users within a budget, thereby maximizing the benefit of the chosen items under dynamic user perceptions and social influence.

Another limitation of the XR recommendation system to be developed in the first year is that it assumes a fully centralized setting, i.e., the displayed item configuration is dictated entirely by the retailing platform. In reality, however, retailers sell a plethora of goods from various vendors, who naturally compete for limited retailing resources such as item display slots in XR malls and advertisement slots in e-commerce. It is therefore important to carefully examine the strategic and competitive behaviors of suppliers in the XR recommendation system. Accordingly, in the second year of the project, we will integrate Real-Time Bidding (RTB) mechanisms to help determine the rights of item display in our XR recommendation system. It is essential to devise a subsystem that precisely predicts the winning price (generally the highest bidding price among all competitors), both for the supply-side platform (SSP, which is our XR recommendation platform) to infer auction results and configure the item display, and for the demand-side platforms (DSPs, the agents for advertisers) to guide their bidding behavior with limited budgets. Under common sealed-bid auction mechanisms such as the second-price auction, a major challenge for DSPs is the lack of complete information about the winning price, especially for bids lost in the past, because the winning price is visible only to the winner. This problem is exacerbated when the first-price auction is adopted, which has become increasingly popular relative to the second-price auction. In a first-price auction, the winning price is invisible even when a DSP wins the impression; the only information available is whether the DSP won the impression with its bid price.
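To illustrate this censoring structure, the following is a minimal sketch that fits a winning-price distribution purely from win/lose outcomes at known bid prices: a win only reveals that the winning price was below our bid, and a loss only that it was at or above it. The log-normal assumption, the finite-difference optimizer, and all names are illustrative; our planned subsystem uses deep models and double-censored losses rather than this parametric toy.

```python
import math, random

def norm_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mean_censored_nll(data, mu, sigma):
    """data: list of (log_bid, won). Assume winning price W ~ LogNormal(mu, sigma).
    A win only tells us W < bid; a loss only tells us W >= bid."""
    eps, total = 1e-9, 0.0
    for log_bid, won in data:
        p_below = norm_cdf(log_bid, mu, sigma)
        total -= math.log(max(p_below if won else 1.0 - p_below, eps))
    return total / len(data)

def fit(data, steps=1000, lr=0.05):
    """Crude finite-difference gradient descent; enough for a sketch."""
    mu, sigma, h = 0.0, 1.0, 1e-4
    for _ in range(steps):
        g_mu = (mean_censored_nll(data, mu + h, sigma)
                - mean_censored_nll(data, mu - h, sigma)) / (2 * h)
        g_sigma = (mean_censored_nll(data, mu, sigma + h)
                   - mean_censored_nll(data, mu, sigma - h)) / (2 * h)
        mu -= lr * g_mu
        sigma = max(1e-3, sigma - lr * g_sigma)
    return mu, sigma

# Synthetic check: true W ~ LogNormal(1.0, 0.5); we observe only win/lose flags.
rng = random.Random(0)
data = [(math.log(b), b > w)
        for w, b in ((math.exp(rng.gauss(1.0, 0.5)),
                      math.exp(rng.gauss(1.0, 0.8))) for _ in range(2000))]
print(fit(data))  # should approach (1.0, 0.5)
```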
Therefore, we aim to help DSPs design an effective method for predicting the winning price under various distributions and different auction mechanisms. To devise a new winning-price model, we will evaluate the performance and influence of different loss functions based on a deep model architecture. Moreover, we will design a model layer (or component) and a loss function for learning a double-censored winning price from the winning rate. We will also analyze how the regularization term makes the distribution smoother and influences the performance of the winning-price model. Furthermore, we will assess how winning-price prediction influences the winning rate and revenue, and we will study how to use the winning-price model to construct a bidding strategy. Our model and bidding strategy will be evaluated by means of metrics such as revenue and cost curves.

Third Year
We will shift our focus to intensive data collection and real-time event- and location-based searching through the Internet of Things (IoT) to facilitate real-time detection of user feedback and to provide immediate user support in the XR shopping environment. Detecting real-time user feedback is essential for customized item configuration in XR stores. Moreover, XR customers unfamiliar with the environment or interfaces may encounter technical problems and require help, so it is crucial to exploit SIoT to provide immediate user support and guidance across XR stores while avoiding massive staff deployment. It is therefore vital to devise scalable and efficient SIoT deployment and communication. Accordingly, we will study SIoT group construction and individual SIoT selection for XR stores in terms of SIoT communication, computation, and mobile edge computing (MEC) network coverage, so as to provide heterogeneous feedback detection and location-based searching for user support in dynamic environments. We will formulate a new optimization problem to optimize search results under dynamic environments for location-based detection and recommendations.
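As a rough illustration of the selection flavor of this task, the sketch below greedily chooses SIoT devices to cover store zones under a device budget. This is a plain max-coverage heuristic with invented names; the actual problem will additionally model communication, computation, and dynamic environments.

```python
from typing import Dict, Set

def select_siot_devices(coverage: Dict[str, Set[str]], budget: int) -> Set[str]:
    """Greedy max-coverage: choose up to `budget` devices that together
    cover as many store zones as possible."""
    chosen: Set[str] = set()
    covered: Set[str] = set()
    for _ in range(budget):
        best = max((d for d in coverage if d not in chosen),
                   key=lambda d: len(coverage[d] - covered),
                   default=None)
        if best is None or not coverage[best] - covered:
            break
        chosen.add(best)
        covered |= coverage[best]
    return chosen

# Example: each candidate device maps to the store zones it can sense/serve.
print(select_siot_devices(
    {"cam1": {"A", "B"}, "cam2": {"B", "C"}, "shelf1": {"C", "D"}}, budget=2))
```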
Finally, we aim to make our recommendation system robust to biased data. Relevant real-world data, collected from previous e-commerce purchase logs and advertisement bidding logs, will inevitably contain noise and bias. Previous studies optimized recommendation models under the assumption that the data is unbiased and used offline assessment results to represent online results. However, the collected data may deviate toward specific tendencies. Training an XR recommendation system on such biased data is likely to cause two main problems: 1) offline evaluation results cannot truly reflect online results, diminishing user satisfaction and experience in XR stores; and 2) the allocated display slots become biased toward existing advertisers or popular items, inhibiting the comprehensiveness and soundness of the recommendation results. It is therefore important for our XR recommendation system to counterbalance bias in the training data. Accordingly, we will propose a method to overcome bias in the data. More specifically, given the bias tendency of the data, we will design corresponding weights to refine the model so that we can obtain more comprehensive information from biased data (a minimal sketch of this weighting idea appears at the end of this section). Moreover, we will design a method to generate biased data and conduct the subsequent hypothesis verification. Our experimental environment will simulate the effects of biased versus unbiased data on model training. We will also develop a negative sampling method that simulates content recommended by existing models but not clicked by users; with such negative samples, the model's recommendations can be further improved.

XR-enabled e-commerce vendors such as eBay, Myers, IKEA, Lowe, and Amazon will benefit from our XR recommendation system because it will facilitate flexible and balanced item configurations that boost sales via social influences and interactions without sacrificing individual user preferences. Moreover, local vendors that have not yet adopted XR technologies (e.g., PChome, MOMO, Buy123, and Rakuten) can enhance their promotions by utilizing correlations among items and understanding customers' dynamic perceptions of items. Our SIoT deployment subsystem can benefit companies interested in SIoT platforms and services, e.g., the Amazon AWS IoT platform and MindSphere (jointly built by Alibaba and Siemens), as they connect numerous SIoT devices to provide ubiquitous and location-based services. Finally, our techniques for unbiased and robust learning will prove valuable to all general applications of learning-based prediction models, including but not limited to recommendation systems (such as those of Google, Yahoo, YouTube, and Alibaba).
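As referenced above, one standard way to realize the weighting idea is inverse propensity scoring; the sketch below is an assumption for illustration, since the project does not commit to a specific estimator. It debiases a click-rate estimate by up-weighting impressions the logging policy rarely displayed.

```python
from typing import Iterable, Tuple

def snips_click_rate(log: Iterable[Tuple[bool, float]], clip: float = 0.05) -> float:
    """Self-normalized inverse-propensity estimate of a true click rate from
    biased exposure logs. Each record is (clicked, propensity), where
    `propensity` is the probability that the logging policy displayed the
    item. Clipping small propensities bounds the variance contributed by
    rare impressions."""
    num = den = 0.0
    for clicked, propensity in log:
        w = 1.0 / max(propensity, clip)   # rare exposures count for more
        num += w * (1.0 if clicked else 0.0)
        den += w
    return num / den if den else 0.0

# Popular items were shown often (high propensity), niche items rarely.
print(snips_click_rate([(True, 0.9), (False, 0.9), (True, 0.1)]))
```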
GraphStor: A High-Performance and Highly Reliable Graph Computing System Designed Holistically from Devices to Systems and Applications
Principal Investigators: Dr. Yu-Fang Chen and Dr. Yuan-Hao Chang
Project Period: 2020/1~2022/12

With the explosion of data in the big-data era, graph computing has attracted considerable attention in recent years as a means of efficiently turning diverse raw data into usable information. A major bottleneck of graph computing is the performance degradation caused by massive data input/output. History shows that data volumes grow far faster than memory capacity, so an increasing share of the data cannot remain resident in memory during execution; under current architectures, massive data exchange between memory and storage devices (such as solid-state drives) is unavoidable. Moreover, in today's computer architectures, every layer (e.g., the disk, the file system, and the database system) maintains its own backups or journals to improve reliability, so that data can be restored to a readable state after an unexpected power failure or crash. These per-layer backups, however, consume substantial space and generate a large amount of additional input/output.

Against this background, our three-year (2020/1~2022/12) GraphStor project passed review starting this January, with funding of NT$2.5 million per year. The goal of GraphStor is to design a processing system that is both highly efficient and highly reliable (i.e., "stable") to meet the graph computing demands of the big-data era. Our overall design direction is to reduce the number of system layers, thereby lowering the need for data movement and backup. For example, we consider in-memory computing, replacing memory directly with ultra-low-latency storage devices, redesigning the concept of a "file", and choosing structures better suited to storing graphs. In other words, this project combines the two principal investigators' expertise in formal verification and storage systems, aiming (1) in the near term, to develop a highly efficient and highly reliable storage system based on a crash-deterministic solid-state drive, and (2) in the medium-to-long term, to build a high-performance and highly reliable graph computing system on top of that storage system.

For the near-term goal, since unexpected power failures and crashes cannot be avoided in data storage systems, a crash recovery mechanism is an essential part of the design. To achieve both efficiency and stability, our initial research goal is to design a crash recovery mechanism that spans from the storage device all the way up to the application layer, eliminating redundant backups and extra journaling between layers while guaranteeing that, after every crash, the system can roll back to the last recorded checkpoint of the system state. This involves many intricate data structures and algorithms. To ensure that such a complex design meets our stability goal, we plan to use formal verification techniques to increase the system's reliability; formal verification is currently the only known method that can prove a piece of software free of errors, and it represents virtually the highest standard of software reliability.

Our target storage device is the mainstream solid-state drive (SSD). Under the traditional design, once a crash occurs, the system cannot guarantee returning to a stable state at the next boot: all data written after the last flush operation may be lost. Consider the simple example in Figure 1, with a disk of eight sectors. W1 occurs before the last flush, whereas W2 and W3 both occur after it; W2 writes one sector and W3 writes three. If a crash occurs after W3 completes, then at the next boot each of the sectors written by W2 and W3 may or may not have been persisted, giving 16 possible system states. This imposes a heavy burden on the file system or graph processing system above. A common remedy is for the file system (e.g., ext4) to maintain its own journal and, when a crash occurs, to restore the system state using its internal mechanisms and the journal's contents.
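To make the state explosion concrete, the short script below enumerates the post-crash disk states of this example; the dictionary-based disk encoding is ours, for illustration only.

```python
from itertools import product

# Eight-sector disk. W1 hit sector 0 before the last flush, so it is durable.
# W2 wrote sector 1 and W3 wrote sectors 2-4 after the flush; on a
# conventional SSD, each of those four sector writes may or may not have
# been persisted when the crash occurred.
durable = {0: "W1"}
pending = [(1, "W2"), (2, "W3"), (3, "W3"), (4, "W3")]

states = set()
for kept in product([False, True], repeat=len(pending)):
    disk = dict(durable)
    for (sector, writer), persisted in zip(pending, kept):
        if persisted:
            disk[sector] = writer
    states.add(frozenset(disk.items()))

print(len(states))  # 16: every subset of the four pending writes is possible
```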
Figure 1: Behavior of a conventional SSD versus a crash-deterministic SSD.

We exploit the out-of-place update property of SSDs to design algorithms and data structures that guarantee the disk space holding the system state as of the last flush is not reclaimed before the next flush. Under this design, once the system crashes, our algorithm is guaranteed to find and roll back to the state at the time of the last flush. We call this new design a crash-deterministic SSD. Intuitively, a flush on a crash-deterministic SSD is equivalent to a "save progress" operation. Upper-layer systems can exploit this property to simplify the design of their crash recovery mechanisms, issuing a flush at each critical point to save progress. For example, a database can issue a flush after every transaction, ensuring that no half-completed transaction can appear after a crash. The crash-deterministic SSD design is expected to serve as the foundation of GraphStor's crash recovery mechanism.
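The following toy model, a minimal sketch of our own and not the actual firmware, captures the flush-as-checkpoint intuition: because the last flushed state is never reclaimed in place, recovery always returns to exactly one state, rather than one of 16.

```python
class CrashDeterministicSSD:
    """Toy model of flush-as-checkpoint. Because updates are out-of-place,
    the blocks of the last flushed state are never reclaimed before the
    next flush, so recovery can always return to exactly that state."""

    def __init__(self):
        self.checkpoint = {}   # logical sector -> data, as of the last flush
        self.pending = {}      # writes issued since the last flush

    def write(self, sector, data):
        self.pending[sector] = data   # old version is not overwritten in place

    def flush(self):
        """'Save progress': pending writes become the new durable checkpoint."""
        self.checkpoint.update(self.pending)
        self.pending.clear()

    def crash_and_recover(self):
        """After a crash, any subset of pending writes may have reached flash,
        but recovery discards that ambiguous state and rolls back to the
        checkpoint, so the outcome is deterministic."""
        self.pending.clear()
        return dict(self.checkpoint)

# Replaying the example above: W1 is flushed; W2 and W3 are not.
ssd = CrashDeterministicSSD()
ssd.write(0, "W1")
ssd.flush()
for sector in (1, 2, 3, 4):
    ssd.write(sector, "W2" if sector == 1 else "W3")
assert ssd.crash_and_recover() == {0: "W1"}   # always one state, not 16
```

A database layered on such a device would simply call flush() at each transaction commit, which is the usage pattern described above.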
In our preliminary experiments (Figure 2), when the underlying storage uses a crash-deterministic SSD, the file system above can omit journaling (using ext2 in place of ext4), and the number of fsync operations (each of which forces the file system to drain its buffered state and then issue an SSD flush) needed for the SQLite database to commit a transaction drops from 5 to 1.

The only drawback of the crash-deterministic SSD design is that it relies on more complex data structures and algorithms, making the firmware implementation more error-prone; at times a single-bit error can render the entire system unreadable. To ensure that our firmware implementation is free of errors, we employ formal verification. Combining several verification tools, including the C-code verification toolchain Rosette+Serval and the theorem prover Agda, we have developed a system tailored to checking the crash-deterministic property. It automatically checks the C programs written by firmware engineers, either proving that a program is crash-deterministic or producing a counterexample showing that it is not.

Figure 2: Advantages of the crash-deterministic SSD storage system.

We find that designing the whole system's crash recovery mechanism starting from the storage layer is a good choice. MIT and the University of Washington have published related work in the past two years, but their crash recovery mechanisms mainly start from the file system. Because a file system is far more complex than storage firmware, the cost of formally verifying it is extremely high: MIT's verified file system DFSCQ, for example, took about 10 person-years to implement and prove, and it runs three times slower than the standard ext4. Starting instead from the crash-deterministic SSD, we expect the performance of the overall storage stack not to degrade but to improve, and we expect the degree of automation in the verification to be far higher than in MIT's work, shortening the total development effort to roughly 2-4 person-years.