BERT tutorial

Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT stands for Bidirectional Encoder Representations from Transformers.

1. Transformer to BERT


1.1 ELMo

ELMo stands for Embeddings from Language Models.


1.2 Transformer


1.3 BERT


The segment embedding (E_A or E_B) indicates whether a token belongs to Sentence A or Sentence B.
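
As a minimal sketch of how segment membership is usually encoded, the snippet below packs two tokenized sentences into one input sequence and assigns segment id 0 (E_A) or 1 (E_B) to each position. The function name `build_pair_inputs` and the example tokens are hypothetical, chosen only for illustration. In the model itself, each id would index a learned embedding table that is summed with the token and position embeddings.

```python
def build_pair_inputs(tokens_a, tokens_b):
    """Pack two tokenized sentences into one BERT-style input sequence."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 (E_A) covers [CLS], Sentence A and the first [SEP];
    # segment 1 (E_B) covers Sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # One position id per token, starting from 0.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


tokens, segment_ids, position_ids = build_pair_inputs(
    ["my", "dog", "is", "cute"], ["he", "likes", "playing"]
)
for token, segment in zip(tokens, segment_ids):
    print(f"{token:10s} segment={segment}")
```

For a single-sentence task, Sentence B is simply omitted and every position gets segment id 0.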

1.4 Pre-training BERT


Reading comprehension is a harder version of QA.

3. Recap


Each word's representation is built from the information in the entire sentence.
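
The idea can be illustrated with a deliberately simplified self-attention step. This is a sketch under stated assumptions, not BERT's actual layer: it uses a single head and skips the learned query/key/value projections, so the token vectors are compared with each other directly. The function name `self_attention` and the toy vectors are hypothetical.

```python
import math


def self_attention(vectors):
    """Unprojected single-head self-attention over a list of token vectors."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Scaled dot-product score of this token against every token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        # Softmax turns the scores into positive weights that sum to 1.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The output for this token is a weighted mix of ALL token vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
contextual = self_attention(sentence)
for vector in contextual:
    print([round(x, 3) for x in vector])
```

Because every softmax weight is strictly positive, each output vector receives a contribution from every token in the sentence, in both directions.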

BERT pre-training runs for roughly 40 epochs; fine-tuning takes only 2 to 4 epochs.

Every token: 12 × 768, that is, a 12-layer Transformer (BERT-base) that produces a 768-dimensional vector for the token at each layer.

BERT's main drawback is that it is too large.
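
To make "too large" concrete, here is a rough parameter count for a 12-layer, 768-dimensional encoder. The vocabulary size (30522), maximum sequence length (512), feed-forward size (3072) and the final pooler layer are assumptions taken from the commonly used BERT-base uncased configuration, not from these notes. The function name `count_parameters` is hypothetical.

```python
def count_parameters(vocab=30522, hidden=768, layers=12,
                     max_positions=512, segments=2, feed_forward=3072):
    """Estimate the weight count of a BERT-base style encoder."""
    layer_norm = 2 * hidden  # scale and bias

    # Token, position and segment embedding tables plus one LayerNorm.
    embeddings = (vocab + max_positions + segments) * hidden + layer_norm

    # Query, key, value and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers of the feed-forward block (weights + biases).
    ffn = (hidden * feed_forward + feed_forward) + (feed_forward * hidden + hidden)
    per_layer = attention + ffn + 2 * layer_norm

    # Dense layer applied to the [CLS] vector.
    pooler = hidden * hidden + hidden

    return embeddings + layers * per_layer + pooler


total = count_parameters()
print(f"{total:,} parameters, about {total * 4 / 2**20:.0f} MiB in float32")
```

Under these assumptions the count comes to about 110 million parameters, most of them in the 12 Transformer layers.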
