What is this Dataset?
BERT Concept Net is a dataset of latent concepts learned within the representations of BERT. The goal of this dataset is to complement existing human-defined concepts such as linguistic and semantic properties (part-of-speech tags, syntactic tags, WordNet, etc.). The concepts are discovered in an unsupervised fashion and annotated using a semi-supervised method.
Please check out the ICLR'22 paper for more details on how the dataset was curated and annotated. The labels themselves can also be explored below.
Birds from SEM:animal:land_animal
Proper Nouns from SEM:origin:europe:germany
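The two example labels above appear to be colon-separated paths that run from a broad category (SEM) down to a specific one. As a minimal sketch, assuming that reading of the format is right, a label could be split into its levels as follows. The helper name `parse_label` is hypothetical and is not part of the dataset or any released tooling.

```python
from typing import List


def parse_label(label: str) -> List[str]:
    """Split a colon-separated label into its hierarchy levels.

    Assumption: ':' separates levels, as in the example labels shown on this page.
    """
    return [part for part in label.split(":") if part]


# The two label strings that appear on this page.
examples = ["SEM:animal:land_animal", "SEM:origin:europe:germany"]

for label in examples:
    levels = parse_label(label)
    # First element is the top-level category, the rest are sub-levels.
    print(levels[0], "->", " > ".join(levels[1:]))
```

Keeping the levels as a list makes it simple to group concepts by any prefix, for example everything under SEM:origin.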