【機器學習2021】Transformer (上)

Hung-yi Lee
26 Mar 2021 · 32:48

Summary

TLDR: This video script delves into the intricacies of Transformer models, particularly their application in various natural language processing and speech-related tasks. It covers the architecture of the Transformer encoder, explaining the role of self-attention, residual connections, and normalization layers. The versatility of sequence-to-sequence models is highlighted, showcasing their use in tasks like machine translation, speech recognition, grammar parsing, and multi-label classification. The script also touches on the potential of Transformer models in tackling unconventional problems and explores alternative design choices for optimizing their performance.

Takeaways

  • 😃 The Transformer model is a powerful Sequence-to-Sequence (Seq2Seq) model used for various tasks like machine translation, speech recognition, and natural language processing (NLP).
  • 🤖 The Transformer consists of an Encoder and a Decoder, where the Encoder processes the input sequence and the Decoder generates the output sequence.
  • 🧠 The Encoder in the Transformer uses self-attention mechanisms to capture dependencies between words in the input sequence.
  • 🔄 Each Encoder block wraps its self-attention and feed-forward sublayers in a residual connection, which adds the sublayer's input to its output, followed by layer normalization.
  • 💡 The original Transformer architecture is not necessarily the optimal design, and researchers continue to explore modifications like changing the order of operations or using different normalization techniques.
  • 🌐 Seq2Seq models can be applied to various NLP tasks like question answering, sentiment analysis, and grammar parsing by representing the input and output as sequences.
  • 🚀 End-to-end speech translation, where the model translates speech directly into text without intermediate transcription, is possible with Seq2Seq models.
  • 🎤 Seq2Seq models can be used for tasks like text-to-speech synthesis, where the input is text, and the output is an audio waveform.
  • 🔢 Multi-label classification problems, where an instance can belong to multiple classes, can be addressed using Seq2Seq models by allowing the model to output a variable-length sequence of class labels.
  • 🌉 Object detection, a computer vision task, can also be tackled using Seq2Seq models by representing the output as a sequence of bounding boxes and class labels.

Q & A

  • What is a Transformer?

    -A Transformer is a powerful sequence-to-sequence model widely used in natural language processing (NLP) and speech processing tasks. It uses self-attention mechanisms to capture long-range dependencies in input sequences.

  • What are some applications of sequence-to-sequence models discussed in the script?

    -The script discusses several applications, including speech recognition, machine translation, chatbots, question answering, grammar parsing, multi-label classification, and object detection.

  • How does a Transformer Encoder work?

    -The Transformer Encoder consists of multiple blocks, each containing a self-attention layer, a feed-forward network, and residual connections with layer normalization. The blocks process the input sequence and output a new sequence representation.

  • What is the purpose of residual connections in the Transformer architecture?

    -Residual connections are used to add the input of a block to its output, allowing the model to learn residual mappings and potentially improve gradient flow during training.

  • Why is layer normalization used in the Transformer instead of batch normalization?

    -Layer normalization is used instead of batch normalization because it operates on individual samples rather than mini-batches, which is more suitable for sequential data like text or speech.

  • What is the role of positional encoding in the Transformer?

    -Positional encoding is used to inject information about the position of each token in the input sequence, as self-attention alone cannot capture the order of the sequence.

  • Can the Transformer architecture be improved or modified?

    -Yes, the script mentions research papers that explore alternative designs for the Transformer, such as rearranging the layer normalization position or introducing a new normalization technique called Power Normalization.

  • What is the purpose of multi-head attention in the Transformer?

    -Multi-head attention allows the model to attend to different representations of the input sequence in parallel, capturing different types of relationships and dependencies.

  • Can sequence-to-sequence models be applied to tasks beyond translation?

    -Yes, the script mentions that sequence-to-sequence models can be applied to various NLP tasks, such as question answering, grammar parsing, multi-label classification, and even object detection in computer vision.

  • Why are sequence-to-sequence models useful for languages without written forms?

    -Sequence-to-sequence models can be used for speech-to-text translation, allowing languages without written forms to be translated directly into written languages that can be read and understood.

Outlines

00:00

🤖 Introduction to Transformer and Seq2Seq Models

This paragraph introduces the topic of Transformer and its relation to BERT. It explains the concept of Sequence-to-Sequence (Seq2Seq) models, which can be used for various tasks such as speech recognition, machine translation, and speech translation. The paragraph provides examples of how Seq2Seq models can be applied to tasks like translating Taiwanese to Chinese and Taiwanese speech synthesis.

05:01

🗣️ Applications of Seq2Seq Models in Language Processing

This paragraph discusses the potential applications of Seq2Seq models in language processing tasks, such as training a chatbot, question answering, translation, summarization, and sentiment analysis. It explains how various NLP tasks can be framed as question-answering problems and solved using Seq2Seq models. The paragraph also mentions the limitations of using a single Seq2Seq model for all tasks and the need for task-specific models.

10:03

🌐 Versatility of Seq2Seq Models across NLP Tasks

This paragraph further explores the versatility of Seq2Seq models in solving various NLP tasks, including grammar parsing, multi-label classification, and object detection. It explains how even tasks that may not seem like Seq2Seq problems can be formulated as such and solved using these models. The paragraph also mentions the potential of using Seq2Seq models for end-to-end language tasks without intermediate steps.

15:05

🧩 Seq2Seq Models for Grammar Parsing and Classification

This paragraph delves into the application of Seq2Seq models for grammar parsing and multi-label classification tasks. It explains how a tree structure can be represented as a sequence and fed into a Seq2Seq model for grammar parsing. The paragraph also discusses how multi-label classification problems, where an instance can belong to multiple classes, can be solved using Seq2Seq models by allowing the model to output variable-length sequences.
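
The tree-to-sequence idea above can be sketched in a few lines of Python. This is only an illustration: the nested-tuple tree format, the `linearize` helper, and the example sentence are assumptions made here, not details taken from the lecture.

```python
def linearize(tree):
    """Flatten a nested (label, child, child, ...) tree into bracket tokens."""
    if isinstance(tree, str):          # a leaf is a plain word
        return [tree]
    label, *children = tree
    tokens = ["(" + label]             # the opening bracket carries the phrase label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")                 # the closing bracket ends the phrase
    return tokens


# Hypothetical parse of "deep learning is powerful".
tree = ("S", ("NP", "deep", "learning"), ("VP", "is", ("ADJP", "powerful")))
sequence = linearize(tree)
print(" ".join(sequence))
```

Once the tree is a flat token sequence like this, it can serve as the target side of an ordinary Seq2Seq training pair, with the sentence as the source side.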

20:08

🔍 Overview of Transformer Architecture

This paragraph provides an overview of the Transformer architecture, which is the focus of the lecture. It explains that the Transformer consists of an Encoder and a Decoder, and the Encoder's primary function is to take a sequence of vectors as input and output another sequence of vectors. The paragraph introduces the concept of self-attention and its role in the Encoder architecture.

25:08

🧱 Detailed Structure of Transformer Encoder

This paragraph goes into the detailed structure of the Transformer Encoder. It explains the various components and operations within each block of the Encoder, including multi-head self-attention, residual connections, layer normalization, and feed-forward networks. The paragraph also discusses the reasoning behind certain design choices and potential alternatives proposed in research papers.
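
As a rough sketch of the block structure described above, the following pure-Python code chains self-attention, a residual connection with layer normalization, and a feed-forward network with a second residual connection and layer normalization. It is a simplification under stated assumptions: a single attention head instead of multi-head attention, no bias terms, no learned scale or shift in the normalization, no positional encoding, and random weights. All function and parameter names are illustrative.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(vec, eps=1e-5):
    """Normalize one vector across its own dimensions."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the whole sequence."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # one row of scores per position
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def feed_forward(x, w1, w2):
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]  # ReLU
    return matmul(hidden, w2)


def add_and_norm(x, sublayer_out):
    """Residual connection, then layer normalization, per position."""
    return [layer_norm([a + b for a, b in zip(xi, si)])
            for xi, si in zip(x, sublayer_out)]


def encoder_block(x, p):
    x = add_and_norm(x, self_attention(x, p["wq"], p["wk"], p["wv"]))
    return add_and_norm(x, feed_forward(x, p["w1"], p["w2"]))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_ff, seq_len = 4, 8, 3                  # toy sizes, chosen arbitrarily
params = {
    "wq": rand_matrix(d_model, d_model, rng),
    "wk": rand_matrix(d_model, d_model, rng),
    "wv": rand_matrix(d_model, d_model, rng),
    "w1": rand_matrix(d_model, d_ff, rng),
    "w2": rand_matrix(d_ff, d_model, rng),
}
x = rand_matrix(seq_len, d_model, rng)            # a sequence of 3 input vectors
y = encoder_block(x, params)
print(len(y), len(y[0]))                          # same shape as the input
```

The output has one vector per input position, so several such blocks can be stacked, each taking the previous block's output as its input.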

30:09

🔄 Alternatives and Improvements to Transformer Architecture

This paragraph discusses potential alternatives and improvements to the original Transformer architecture. It mentions research papers that explore the positioning of layer normalization within the blocks and the use of different normalization techniques, such as Power Normalization, which may outperform layer normalization in certain scenarios. The paragraph encourages exploring and considering alternative architectures for optimal performance.
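
The difference between the two orderings of layer normalization can be sketched as below. This is an illustrative comparison only: `toy_sublayer` is a hypothetical stand-in for self-attention or the feed-forward network, and the normalization omits the learned scale and shift.

```python
import math


def layer_norm(vec, eps=1e-5):
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def post_ln(x, sublayer):
    """Original ordering: add the residual, then normalize."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln(x, sublayer):
    """Alternative ordering: normalize the sublayer input, then add the residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(vec):
    # Hypothetical stand-in for self-attention or a feed-forward network.
    return [2.0 * v for v in vec]


x = [1.0, 2.0, 3.0, 4.0]
print(post_ln(x, toy_sublayer))
print(pre_ln(x, toy_sublayer))
```

In the second ordering the input `x` reaches the output through the residual path without being normalized, which is the structural change the alternative design makes.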

Keywords

💡Transformer

The Transformer is a type of neural network architecture that is particularly well-suited for handling sequences of data, such as text or audio. In the video, it's highlighted as a core technology behind many natural language processing tasks, including machine translation and speech recognition. The mention of Transformers being related to BERT indicates its importance in the development of state-of-the-art models for processing human language.

💡Sequence-to-sequence model (Seq2Seq)

Seq2Seq models are designed to convert sequences from one domain (input sequence) into sequences in another domain (output sequence), with potentially different lengths. This concept is crucial in the video, especially when discussing tasks like machine translation where the model needs to determine the output length dynamically. The flexibility of Seq2Seq models makes them suitable for a variety of applications including speech recognition and text generation.
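
A toy sketch of what "the model determines the output length" can look like in code: generation continues until a special end token appears. The `toy_next_token` function and its lookup table are hypothetical stand-ins for a trained model, and the end-token convention is an assumption here, since this part of the lecture does not cover how the decoder works.

```python
END = "<END>"

# Hypothetical lookup table standing in for a trained translation model.
TOY_TABLE = {("機", "器", "學", "習"): ["machine", "learning"]}


def toy_next_token(source, generated):
    """Return the next output token, or END when the output is complete."""
    target = TOY_TABLE.get(tuple(source), [])
    if len(generated) < len(target):
        return target[len(generated)]
    return END


def generate(source, next_token, max_len=50):
    """Keep asking for tokens until END appears; the length is not fixed in advance."""
    output = []
    while len(output) < max_len:
        token = next_token(source, output)
        if token == END:
            break
        output.append(token)
    return output


result = generate(list("機器學習"), toy_next_token)
print(result)
```

Here a four-character input yields a two-word output, and nothing in `generate` fixes that ratio ahead of time.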

💡Speech recognition

Speech recognition is the process of converting spoken language into text. In the video, it's presented as a prime example of a Seq2Seq model application. The model listens to the input audio signal, processes it as a sequence of vectors, and outputs the corresponding textual representation. This task illustrates the Seq2Seq model's ability to handle variable input and output lengths.

💡Machine translation

Machine translation involves converting text from one language to another. The video uses this task to illustrate the application of Seq2Seq models, demonstrating how the input text in one language is transformed into output text in another language. This process underlines the model's capability to determine the relationship between the lengths of input and output sequences.

Highlights

Transformer is a sequence-to-sequence (seq2seq) model, which takes in a sequence as input and outputs another sequence, but the output length is determined by the machine itself.

Transformer can be applied to various tasks like speech recognition, machine translation, chatbots, question answering, grammar parsing, multi-label classification, and object detection, by treating the task as a seq2seq problem.

The concept of using seq2seq models for tasks like grammar parsing, which traditionally do not seem like seq2seq problems, was introduced in a 2014 paper titled "Grammar as a Foreign Language".

Transformer has an encoder and a decoder architecture, with the encoder processing the input sequence and the decoder determining the output sequence.

In the Transformer encoder, each block consists of a multi-head self-attention layer, followed by a residual connection and layer normalization, then a feed-forward network with residual connection and layer normalization.

The self-attention layer in the Transformer encoder considers the entire input sequence and outputs a vector for each position, allowing the model to capture long-range dependencies.

The residual connection in the Transformer encoder adds the input vector to the output vector, while layer normalization normalizes the vector across its dimensions.

The original Transformer architecture design may not be optimal, as suggested by research papers exploring alternative designs, such as changing the order of layer normalization and residual connections.

A paper titled "On Layer Normalization in the Transformer Architecture" proposes applying layer normalization to the input of each sublayer, inside the residual branch, rather than after the residual addition, and reports that this ordering works better.

Another paper, "PowerNorm: Rethinking Batch Normalization in Transformers", suggests using power normalization instead of layer normalization, potentially improving performance.

The Transformer encoder adds positional encoding to its input, as self-attention alone cannot capture the order of the sequence.

The Transformer decoder architecture, which determines the output sequence, is not covered in detail in this transcript.

Transformer models can be trained on large datasets, such as TV shows and movies for chatbots, or paired data like audio and transcripts for speech recognition and translation tasks.

The transcript provides an example of training a Transformer model on 1,500 hours of Taiwanese drama data to perform Taiwanese speech recognition and translation to Chinese text.

The transcript discusses the potential of using Transformer models for end-to-end speech translation, particularly for languages without written forms, by training on paired audio and text data from another language.

Transcripts

00:02

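OK, next

00:04

we are going to talk about what

00:06

you will use in Homework 5: the Transformer.

00:09

We have already mentioned the Transformer

00:11

no fewer than N times.

00:13

If you still don't know

00:14

what a Transformer is,

00:15

well, a Transformer is actually a 變形金剛 (one of the Transformers robots), you know?

00:18

The English name for 變形金剛 is Transformer.

00:22

The Transformer is also very closely related to

00:24

BERT, which we will cover later.

00:28

That is why BERT is peeking out here,

00:31

to show that the Transformer and BERT

00:33

are closely related.

00:35

So what is a Transformer?

00:37

A Transformer is a

00:39

sequence-to-sequence model.

00:42

A sequence-to-sequence model

00:44

is abbreviated,

00:45

and we write it as Seq2seq.

00:47

And what is a sequence-to-sequence model,

00:49

then?

00:50

Earlier, when we covered the input-a-sequence

00:53

case,

00:54

we said the input is a sequence

00:56

and there are several possibilities for the output.

00:58

One is that the input and output have the same length;

01:01

that is what we did in Homework 2.

01:04

Another case is that the output is just

01:06

a single item;

01:07

that is what we did in Homework 4.

01:09

The case in Homework 5 is that

01:11

we don't know how long the output should be;

01:15

the machine decides the output length by itself.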

01:20

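So what are some examples,

01:21

what applications are there

01:22

where we need this kind of

01:24

sequence-to-sequence model,

01:26

that is, the input is a sequence,

01:27

the output is a sequence,

01:28

but we don't know how long the output should be

01:31

and the machine has to decide

01:33

the output length by itself?

01:34

What applications are there?

01:35

For example,

01:36

a very good application is speech recognition.

01:39

In speech recognition,

01:40

the input is an audio signal.

01:42

In this course

01:44

we have seen many times

01:44

that an input audio signal is really just

01:47

a sequence of vectors.

01:49

What is the output?

01:50

The output is the speech recognition result,

01:53

that is, the text corresponding to

01:55

the input audio signal.

01:56

Here we use circles to represent text;

02:00

each circle represents,

02:02

say, one Chinese character.

02:05

The input and output lengths

02:08

are of course somewhat related,

02:10

but there is no fixed relationship.

02:12

Say the input audio signal

02:14

has length T.

02:15

We have no way of knowing

02:18

from T what the output length N

02:21

must be.

02:22

So what do we do? Let the machine decide.

02:23

The machine listens to the content of the audio signal

02:28

and decides by itself how many characters to output.

02:31

How many characters the recognition result,

02:33

the output sentence, should contain

02:36

is decided by the machine itself.

02:38

That is speech recognition.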

02:40

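There are many other examples.

02:42

For instance, in Homework 5 we will do machine translation:

02:45

the machine reads a sentence in one language

02:47

and outputs a sentence in another language.

02:50

In machine translation,

02:52

the input text has length N

02:54

and the output sentence has length N'.

02:57

The relationship between N and N'

03:00

also has to be decided by the machine itself.

03:03

Say the input is the sentence 機器學習

03:05

and the output is "machine learning".

03:07

The input has four characters

03:10

and the output has two English words,

03:13

but it is not the case that for all Chinese and English

03:15

the output is always half the input length.

03:18

For a given input sentence,

03:20

how long the output English sentence should be

03:23

is decided by the machine itself.

03:25

You can even tackle more complex problems,

03:27

such as speech translation.

03:30

What is speech translation?

03:31

Speech translation means

03:32

you say something to the machine,

03:34

for example "machine learning",

03:36

and what it outputs is not English;

03:38

it directly translates the English audio signal

03:40

it heard into Chinese.

03:42

You say "machine learning" to it

03:44

and it outputs 機器學習.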

03:47

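You might ask

03:48

why we would want to do

03:49

a task like speech translation.

03:51

Why not just

03:52

do speech recognition

03:54

and then machine translation,

03:56

connecting a speech recognition system and a machine translation system

03:59

to get speech translation directly?

04:02

The reason is that many languages in the world

04:04

have no writing system at all.

04:07

There are more than seven thousand languages in the world,

04:10

and of those seven thousand languages,

04:12

more than half actually have no written form.

04:15

For languages without a written form,

04:18

doing speech recognition

04:19

may simply be impossible.

04:21

Because there is no written form,

04:22

you simply cannot do speech recognition.

04:24

But could we, for these languages,

04:27

do speech translation

04:29

and translate them directly into

04:31

text that we are able to read?

04:34

A good example might be

04:36

Taiwanese speech recognition.

04:38

I am not saying Taiwanese has no written form;

04:40

many people consider that Taiwanese does have one,

04:41

but written Taiwanese is not that widespread.

04:43

I hear elementary schools now teach written Taiwanese,

04:47

but written Taiwanese

04:48

is not something most people can read.

04:51

So if you do speech recognition

04:53

and give the machine a stretch of Taiwanese,

04:56

it might output 母湯,

04:58

and you would have no idea

04:59

what it is saying, right?

05:01

So we hope the machine can perhaps do translation,

05:04

speech translation:

05:05

you speak a sentence of Taiwanese to it

05:07

and it directly outputs a Chinese sentence

05:10

with the same meaning.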

05:11

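Then most people can read it.

05:14

Is it possible to do this?

05:16

Is it possible to train a neural network

05:18

that listens to the audio signal

05:20

of one language

05:21

and outputs text in another language?

05:24

It actually is possible.

05:26

For the Taiwanese example,

05:28

we know that

05:29

to train a neural network

05:31

you need paired inputs and outputs:

05:34

you need Taiwanese audio signals

05:36

matched with Chinese text.

05:39

Is such data easy to collect?

05:41

Such data

05:43

is not impossible to collect.

05:45

For example, on YouTube

05:46

there are lots of local dramas (鄉土劇).

05:47

Local dramas, you know, have

05:49

Taiwanese speech with Chinese subtitles.

05:51

So you just download the Taiwanese audio,

05:53

download the Chinese subtitles,

05:55

and you have the correspondence between Taiwanese audio signals

05:56

and Chinese text.

05:58

Then you can just brute-force train a model;

06:00

you can train what we just mentioned,

06:01

the Transformer we are about to cover,

06:03

and have the machine directly do Taiwanese speech recognition:

06:07

Taiwanese in, Chinese out.

06:10

You may think this idea is wild

06:12

and that it sounds like it has lots and lots of problems.

06:14

Well, our lab downloaded

06:15

1,500 hours of local drama data

06:18

and really used it to train a

06:20

speech recognition system.

06:21

You may think

06:22

this sounds like it has many problems.

06:23

For example, local dramas have a lot of noise

06:26

and a lot of music.

06:27

Just ignore it.

06:29

And the subtitles of local dramas

06:31

do not necessarily line up with the audio.

06:33

Just ignore that too.

06:34

Then you might think,

06:37

doesn't Taiwanese also have things like,

06:39

say, Tâi-lô romanization?

06:40

Taiwanese does have something like phonetic symbols.

06:42

Maybe we could first recognize the speech as phonetic symbols,

06:44

use that as an intermediate step,

06:45

and then convert the phonetic symbols into Chinese.

06:47

We did not do that either. We directly trained a model

06:49

whose input is the audio signal

06:50

and whose output is directly Chinese text.

06:53

This kind of not overthinking it, just pouring the data in

06:56

and training a model,

06:58

is called 硬train一發 (just brute-force training it), you know?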

07:03

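You may wonder

07:04

whether brute-force training like this can actually

07:07

produce a Taiwanese speech recognition system.

07:09

It turns out it really is possible.

07:12

Here are some real results.

07:15

After the machine has listened to 1,500 hours of

07:18

local dramas,

07:19

you can give it a sentence of Taiwanese

07:21

and it outputs a sentence of Chinese text.

07:23

Here are real examples.

07:24

The audio the machine heard sounds like this.

07:25

You can treat it as a Taiwanese listening test

07:28

and see whether what you recognize matches the machine.

07:31

The machine heard this sentence:

07:33

"Your body can't take it" (in Taiwanese)

07:34

So what did the machine output?

07:36

Its output is 你的身體撐不住 ("your body can't take it").

07:38

The audio signal is "your body can't take it" (in Taiwanese),

07:41

but the machine does not output 無勘;

07:43

it outputs 撐不住 ("can't take it").

07:45

Or the machine heard

07:47

an audio signal like this:

07:48

"Why take leave when nothing is wrong?" (in Taiwanese)

07:50

沒事你為什麼要請假 ("Why take leave when nothing is wrong?")

07:52

The machine heard "nothing is wrong" (in Taiwanese).

07:54

It does not output 沒代沒誌;

07:56

it outputs 沒事 ("nothing is wrong").

07:58

It hears four syllables, 沒代沒誌 (in Taiwanese),

08:00

but it knows that the Taiwanese 沒代沒誌,

08:02

translated into Chinese, should probably be output as 沒事.

08:05

So the machine's output is

08:06

沒事你為什麼要請假.

08:08

But the machine actually makes mistakes quite easily.

08:10

Below I picked out a few examples of mistakes

08:12

for you to listen to.

08:13

Listen to this audio clip:

08:15

"Don't you get tired of it?" (in Taiwanese)

08:16

He says, "Don't you get tired of it?" (in Taiwanese)

08:18

When I heard it myself, I thought,

08:19

and my answer matched the machine's,

08:20

that it said 要生了嗎 ("Is the baby coming?").

08:23

But for this sentence,

08:24

the correct answer is

08:25

"Don't you get tired of it?" (in Taiwanese),

08:27

不會膩嗎.

08:29

Of course, with inverted word order,

08:30

you know, sometimes going from Taiwanese

08:32

to a Chinese sentence requires reordering words,

08:34

and the machine does not seem to have learned the reordering part well.

08:38

For example, it heard this sentence:

08:40

我有跟廠長拜託 (in Taiwanese)

08:42

He says 我有跟廠長拜託 (in Taiwanese).

08:44

The machine's output is

08:45

我有幫廠長拜託 ("I asked a favor on the factory manager's behalf").

08:47

But you know this sentence

08:49

actually has inverted word order:

08:50

我有跟廠長拜託 (in Taiwanese)

08:52

means 我拜託廠長 ("I asked the factory manager for a favor").

08:54

But for the machine,

08:55

when the mapping from Taiwanese to Chinese requires reordering,

08:58

it still seems to be somewhat hard to learn.

09:00

What this example is meant to tell you is that

09:02

converting Taiwanese audio directly into Traditional Chinese

09:05

is not impossible;

09:06

it can be done.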

09:09

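In fact, many people in Taiwan are working on

09:10

Taiwanese speech recognition.

09:11

If you want to know more about

09:13

Taiwanese speech recognition,

09:15

you can take a look at the website below.

09:18

The reverse of Taiwanese speech recognition

09:20

is Taiwanese speech synthesis, right?

09:24

If we have a model

09:25

that takes Taiwanese audio as input and outputs Chinese text,

09:28

that is speech recognition.

09:30

Reverse it, text in and audio signal out,

09:34

and that is speech synthesis.

09:36

Here is a demo of Taiwanese speech synthesis.

09:40

The data used here is

09:41

from 台灣媠聲 (pronounced in Taiwanese).

09:42

Google 台灣媠聲

09:44

and you can find this dataset.

09:45

It contains Taiwanese audio signals.

09:48

It sounds like this.

09:50

For example, you tell it

09:51

"Welcome to the NTU Speech Processing Lab, Taiwan."

09:54

But I need to clarify one thing here:

09:56

we have not actually built an end-to-end model yet.

09:58

The model here still has two stages.

10:00

It first converts the Chinese text

10:02

into Tâi-lô romanization for Taiwanese,

10:05

which is like KK phonetic symbols for Taiwanese,

10:07

and then converts those Taiwanese phonetic symbols into an audio signal.

10:10

However, going from the Taiwanese phonetic symbols

10:11

to the audio signal

10:12

is done by a Transformer-like network.

10:15

It is actually a model called echotron,

10:18

which is essentially a Seq2Seq model

10:20

and looks roughly like this.

10:22

So when you input the text

10:24

"Welcome to the NTU Speech Processing Lab",

10:26

the machine's output sounds like this:

10:28

"Welcome to NTU" (in Taiwanese)

10:30

"Speech Processing Lab" (in Taiwanese)

10:32

Or you say this Chinese sentence to it

10:35

and the Taiwanese it outputs sounds like this:

10:38

"The pneumonia outbreak has been really serious lately." (in Taiwanese)

10:40

"Remember to wear a mask and wash your hands often." (in Taiwanese)

10:43

"If you are sick, see a doctor." (in Taiwanese)

10:45

So you really can

10:46

synthesize Taiwanese audio signals

10:49

using what we learn in this course,

10:52

the Transformer or a Seq2Seq model.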

10:56

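What we just covered is more related to speech.

10:59

On the text side,

11:01

Seq2Seq models are also used very widely.

11:05

For example, you can use a Seq2Seq model

11:08

to train a chatbot.

11:11

A chatbot is something where you say a sentence to it

11:14

and it has to give you a response.

11:16

Both input and output are text,

11:18

and text is a sequence of vectors,

11:21

so you can certainly use a Seq2Seq model

11:24

to build a chatbot.

11:27

So how do you train a chatbot?

11:29

You need to collect a large amount of human conversation.

11:31

For this kind of dialogue you can collect

11:33

lines from TV shows, movies, and so on.

11:36

You can collect

11:36

lots of conversations between people.

11:39

Suppose that in the dialogue

11:40

one person says "Hi"

11:42

and another person says

11:43

"Hello! How are you today?"

11:44

Then you can teach the machine that

11:46

when it sees the input "Hi",

11:47

its output should be as close to

11:49

"Hello! How are you today?"

11:51

as possible.

11:52

That way you can train a Seq2Seq model:

11:55

say a sentence to it

11:56

and it gives you a response.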
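
The data preparation described here, pairing each utterance with the reply that follows it, could be sketched roughly as below. The `make_training_pairs` helper and the third line of the sample dialogue are hypothetical additions for illustration; only the "Hi" / "Hello! How are you today?" exchange comes from the lecture.

```python
def make_training_pairs(dialogue):
    """Pair each utterance with the reply that follows it."""
    return list(zip(dialogue[:-1], dialogue[1:]))


dialogue = ["Hi", "Hello! How are you today?", "I am fine."]
pairs = make_training_pairs(dialogue)
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

Each pair gives the Seq2Seq model one input sequence and the output it should try to match as closely as possible.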

11:59

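In fact, Seq2Seq models

12:01

in the field of NLP,

12:02

natural language processing,

12:04

are used

12:05

far more widely than you might imagine.

12:09

Many natural language processing tasks

12:12

can be viewed as question answering,

12:16

QA tasks.

12:18

How so?

12:19

In question answering,

12:20

you give the machine a passage to read,

12:23

then you ask the machine a question

12:25

and hope it gives you a correct answer.

12:28

Many tasks that you would think have

12:30

nothing to do with question answering

12:32

can potentially be viewed as QA.

12:36

How so? For example,

12:37

suppose what you want to do is translation.

12:39

The passage the machine reads is an English sentence.

12:42

What is the question? The question is,

12:44

"What is the German translation of this sentence?"

12:46

and the output answer is the German.

12:49

Or you want the machine to summarize automatically.

12:51

Summarization means giving the machine a long article

12:53

and having it extract the key points.

12:56

So you give the machine a passage,

12:58

the question is "What is the summary of this passage?",

13:01

and you expect it to output a summary.

13:03

Or you want the machine to do

13:05

sentiment analysis.

13:06

What is sentiment analysis?

13:08

The machine automatically judges whether a sentence

13:09

is positive or negative.

13:11

An application like this comes up when,

13:13

say, you have built a product,

13:14

and after it launches

13:15

you want to know what people online think of it,

13:17

but you cannot keep

13:18

having someone go on PTT

13:18

and read through every single post.

13:20

So what do you do?

13:21

You build a sentiment analysis model.

13:23

When a post

13:24

mentions your product,

13:26

you feed that post into

13:27

your model

13:28

to judge whether the post

13:29

is positive or negative.

13:31

How do you treat sentiment analysis

13:33

as a QA problem?

13:35

You give the machine

13:36

the text you want to classify as positive or negative,

13:38

and your question is whether this sentence

13:40

is positive or negative,

13:41

and you hope the machine can tell you the answer.

13:43

So all kinds of NLP problems

13:46

can often be viewed as QA problems,

13:50

and QA problems

13:51

can be solved with a Seq2Seq model.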

13:56

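How do you solve a QA problem with

13:57

a Seq2Seq model?

14:00

You have a Seq2Seq model whose input

14:03

is the question and the passage concatenated together,

14:06

and whose output is the answer to the question.

14:08

That's it.

14:09

Your question plus the passage together

14:11

form one long piece of text,

14:12

and the answer is a piece of text.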
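
A minimal sketch of this question-plus-passage formatting, using the three tasks mentioned earlier (translation, summarization, sentiment analysis). The `to_qa_input` helper, the `[SEP]` separator string, and the sample passages are assumptions for illustration; the lecture only says that the question and the passage are joined into one long text.

```python
def to_qa_input(question, context, sep=" [SEP] "):
    """Concatenate a question and a passage into one input string."""
    return question + sep + context


examples = [
    ("What is the German translation of this sentence?", "Machine learning is fun."),
    ("What is the summary of this passage?", "<a long article goes here>"),
    ("Is this sentence positive or negative?", "I love this product."),
]

inputs = [to_qa_input(question, context) for question, context in examples]
for text in inputs:
    print(text)
```

Every task now has the same shape, one text sequence in and one text sequence out, which is what lets a single Seq2Seq formulation cover all of them.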