BERT Interpretability
Syntactic Knowledge
Open Sesame: Getting inside BERT’s Linguistic Knowledge
Patient Knowledge Distillation for BERT Model Compression.
Linguistic Knowledge and Transferability of Contextual Representations.
Parsing as pretraining.
Are pre-trained language models aware of phrases? Simple but strong baselines for grammar induction.
Inducing syntactic trees from BERT representations.
Do attention heads in BERT track syntactic dependencies?
What does BERT learn about the structure of language?
A Structural Probe for Finding Syntax in Word Representations.
Emergent linguistic structure in artificial neural networks trained by self-supervision.
BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA.
Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation.
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models.
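The structural probe of Hewitt and Manning (listed above as "A Structural Probe for Finding Syntax in Word Representations") recovers syntax by learning a single linear map B over frozen BERT vectors so that the squared distance ||B(h_i − h_j)||² approximates the parse-tree distance between words i and j. Below is a minimal PyTorch sketch; the hidden size, probe rank, and training snippet are illustrative assumptions, not details taken from this reading list.

```python
# Minimal sketch of a structural probe over frozen BERT representations.
# Assumed values: hidden size 768, probe rank 128; `embeddings` and
# `tree_dists` are supplied by the caller (frozen layer outputs and
# gold parse-tree distances).
import torch
import torch.nn as nn

class StructuralProbe(nn.Module):
    """Learns B such that ||B(h_i - h_j)||^2 approximates tree distance."""
    def __init__(self, hidden_dim=768, probe_rank=128):
        super().__init__()
        self.proj = nn.Parameter(torch.randn(hidden_dim, probe_rank) * 0.05)

    def forward(self, embeddings):                 # (seq_len, hidden_dim)
        transformed = embeddings @ self.proj       # (seq_len, probe_rank)
        diffs = transformed.unsqueeze(1) - transformed.unsqueeze(0)
        return (diffs ** 2).sum(-1)                # (seq_len, seq_len) squared distances

def probe_loss(pred_dists, tree_dists, seq_len):
    # L1 loss between predicted squared distances and gold tree distances,
    # normalized by sentence length squared, as in the probe paper.
    return torch.abs(pred_dists - tree_dists).sum() / (seq_len ** 2)

# Usage sketch:
# probe = StructuralProbe()
# optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
# loss = probe_loss(probe(embeddings), tree_dists, embeddings.size(0))
# loss.backward(); optimizer.step()
```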
Semantic Knowledge
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models.
What do you learn from context? Probing for sentence structure in contextualized word representations.
Do NLP Models Know Numbers? Probing Numeracy in Embeddings.
What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
BERT Rediscovers the Classical NLP Pipeline.
Investigating Entity Knowledge in BERT with Simple Neural End-to-End Entity Linking.
Visualizing and Measuring the Geometry of BERT.
Understanding what each layer of BERT learns
Zhihu search for "bert 聚类" (BERT clustering): https://www.zhihu.com/search?type=content&q=bert%20%20%E8%81%9A%E7%B1%BB
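The last two items point to layer-by-layer analyses: extract the hidden states of every BERT layer and inspect (for example, cluster) the token vectors to see which distinctions each layer separates, in the spirit of "Visualizing and Measuring the Geometry of BERT". The sketch below is illustrative only; it assumes the HuggingFace transformers and scikit-learn packages, the bert-base-uncased checkpoint, and example sentences and a cluster count chosen for demonstration.

```python
# Minimal layer-wise clustering sketch (illustrative assumptions:
# bert-base-uncased, two example sentences, two clusters).
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Two senses of "bank"; a layer that separates them tends to put their
# context tokens in different clusters.
sentences = ["He sat on the bank of the river.", "She deposited cash at the bank."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of 13 tensors (embedding layer plus
# 12 encoder layers), each of shape (batch, seq_len, 768).
mask = inputs["attention_mask"].bool()
for layer_idx, layer_states in enumerate(outputs.hidden_states):
    token_vecs = layer_states[mask].numpy()             # (n_real_tokens, 768)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(token_vecs)
    print(f"layer {layer_idx:2d}: cluster sizes = {np.bincount(labels)}")
```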