BERT Interpretability
Syntactic Knowledge
Open Sesame: Getting inside BERT’s Linguistic Knowledge
Patient Knowledge Distillation for BERT Model Compression.
Linguistic Knowledge and Transferability of Contextual Representations.
Parsing as pretraining.
Are pre-trained language models aware of phrases? Simple but strong baselines for grammar induction.
Inducing syntactic trees from BERT representations.
Do attention heads in BERT track syntactic dependencies?
What does BERT learn about the structure of language?
A Structural Probe for Finding Syntax in Word Representations.
Emergent linguistic structure in artificial neural networks trained by self-supervision.
BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA.
Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation.
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models.
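The structural probe of Hewitt and Manning (listed above as "A Structural Probe for Finding Syntax in Word Representations") recovers syntax by learning a single linear map B over frozen BERT vectors so that the squared distance ||B(h_i − h_j)||² approximates the parse-tree distance between words i and j. Below is a minimal PyTorch sketch; the hidden size, probe rank, and training snippet are illustrative assumptions, not details taken from this reading list.

```python
# Minimal sketch of a structural probe over frozen BERT representations.
# Assumed values: hidden size 768, probe rank 128; `embeddings` and
# `tree_dists` are supplied by the caller (frozen layer outputs and
# gold parse-tree distances).
import torch
import torch.nn as nn

class StructuralProbe(nn.Module):
    """Learns B such that ||B(h_i - h_j)||^2 approximates tree distance."""
    def __init__(self, hidden_dim=768, probe_rank=128):
        super().__init__()
        self.proj = nn.Parameter(torch.randn(hidden_dim, probe_rank) * 0.05)

    def forward(self, embeddings):                 # (seq_len, hidden_dim)
        transformed = embeddings @ self.proj       # (seq_len, probe_rank)
        diffs = transformed.unsqueeze(1) - transformed.unsqueeze(0)
        return (diffs ** 2).sum(-1)                # (seq_len, seq_len) squared distances

def probe_loss(pred_dists, tree_dists, seq_len):
    # L1 loss between predicted squared distances and gold tree distances,
    # normalized by sentence length squared, as in the probe paper.
    return torch.abs(pred_dists - tree_dists).sum() / (seq_len ** 2)

# Usage sketch:
# probe = StructuralProbe()
# optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
# loss = probe_loss(probe(embeddings), tree_dists, embeddings.size(0))
# loss.backward(); optimizer.step()
```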
Semantic Knowledge
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models.
What do you learn from context? Probing for sentence structure in contextualized word representations.
Do NLP Models Know Numbers? Probing Numeracy in Embeddings.
What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
BERT Rediscovers the Classical NLP Pipeline.
Investigating Entity Knowledge in BERT with Simple Neural End-to-End Entity Linking.
Visualizing and Measuring the Geometry of BERT.
Understanding what each layer of BERT learns
Zhihu search for "bert 聚类" (BERT clustering): https://www.zhihu.com/search?type=content&q=bert%20%20%E8%81%9A%E7%B1%BB
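The last two items point to layer-by-layer analyses: extract the hidden states of every BERT layer and inspect (for example, cluster) the token vectors to see which distinctions each layer separates, in the spirit of "Visualizing and Measuring the Geometry of BERT". The sketch below is illustrative only; it assumes the HuggingFace transformers and scikit-learn packages, the bert-base-uncased checkpoint, and example sentences and a cluster count chosen for demonstration.

```python
# Minimal layer-wise clustering sketch (illustrative assumptions:
# bert-base-uncased, two example sentences, two clusters).
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Two senses of "bank"; a layer that separates them tends to put their
# context tokens in different clusters.
sentences = ["He sat on the bank of the river.", "She deposited cash at the bank."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of 13 tensors (embedding layer plus
# 12 encoder layers), each of shape (batch, seq_len, 768).
mask = inputs["attention_mask"].bool()
for layer_idx, layer_states in enumerate(outputs.hidden_states):
    token_vecs = layer_states[mask].numpy()             # (n_real_tokens, 768)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(token_vecs)
    print(f"layer {layer_idx:2d}: cluster sizes = {np.bincount(labels)}")
```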