What is NeuroX?

NeuroX is a framework that aims to interpret deep NLP models and increase the transparency of their inner workings and predictions. The goal of the framework and its methodologies is to go beyond input features for interpretation and provide richer explanations of a given model and its predictions. It encompasses several lines of work, including Neuron Probing, which identifies the components of a network (layers, attention heads, neurons) that learn specific concepts, and Latent Concept Discovery, which extracts the concepts captured within the learned representations.
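As a rough illustration of what neuron probing involves, the sketch below trains a binary logistic-regression probe on per-token activation vectors and ranks neurons by the magnitude of their learned weights. It is a minimal, dependency-free example on synthetic data. It assumes a binary property, plain full-batch gradient descent, and no regularization. The function names (`train_probe`, `rank_neurons`) are hypothetical and are not the NeuroX Toolkit API.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, epochs=200, lr=0.5):
    """Fit a binary logistic-regression probe with full-batch gradient descent.

    X: list of activation vectors (one per token); y: list of 0/1 labels.
    Returns (weights, bias), with one weight per neuron.
    (Hypothetical helper for illustration, not a NeuroX function.)
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error p - y for this token.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_neurons(w):
    """Neuron indices ordered by absolute probe weight, most salient first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


# Synthetic activations: 4 "neurons", of which only neuron 2 encodes the label.
random.seed(0)
X, y = [], []
for _ in range(200):
    label = random.randint(0, 1)
    acts = [random.gauss(0.0, 1.0) for _ in range(4)]
    acts[2] += 3.0 if label else -3.0
    X.append(acts)
    y.append(label)

w, b = train_probe(X, y)
print(rank_neurons(w))  # neuron 2 should come first
```

Ranking by absolute weight is only one simple way to read salience off a linear probe. The toolkit and the papers listed below may use different regularization and neuron-selection criteria.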

Projects

  • BERT Concept Net

    An annotated dataset of latent concepts learned within the representations of BERT. It was released as part of the paper "Discovering Latent Concepts Learned in BERT" (ICLR 2022).


    Project page
  • NeuroX Toolkit

    A Python library that encapsulates various methods for neuron interpretation and analysis, geared towards deep NLP models. The library is a one-stop shop for activation extraction, probe training, clustering analysis, neuron selection, and more.

  • Model Explorer

    A GUI toolkit that provides several methods to identify salient neurons with respect to the model itself or an external task. It also supports visualization, ablation, and manipulation of neurons within a given model.

    Try the Demo
  • ConceptX

    Explore latent concepts learned by a trained neural network model like BERT.

    Coming Soon!
  • ExplainMyPredictions



    Coming Soon!
  • Policy Police



    Coming Soon!
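The neuron ablation mentioned for Model Explorer can be illustrated with a toy example. The idea is to zero out selected neurons and measure how much a classifier's accuracy drops. The sketch below assumes a fixed, hand-set linear classifier and four made-up activation vectors. The helpers `ablate` and `accuracy` are hypothetical and are not part of any NeuroX tool.

```python
def ablate(activations, neuron_ids):
    """Copy each activation vector with the selected neurons zeroed out (hypothetical helper)."""
    drop = set(neuron_ids)
    return [[0.0 if j in drop else a for j, a in enumerate(vec)] for vec in activations]


def accuracy(weights, bias, activations, labels):
    """Accuracy of a fixed linear classifier that predicts 1 when w.x + b > 0."""
    correct = 0
    for vec, label in zip(activations, labels):
        score = bias + sum(w * a for w, a in zip(weights, vec))
        correct += int((score > 0) == bool(label))
    return correct / len(labels)


# Toy classifier that relies mostly on neuron 0 and ignores neuron 2.
weights, bias = [2.0, 0.5, 0.0], 0.0
acts = [[1.0, 0.2, 0.7], [-1.0, -0.1, 0.3], [0.8, -0.4, -0.9], [-0.9, 0.3, 0.1]]
labels = [1, 0, 1, 0]

print(accuracy(weights, bias, acts, labels))               # 1.0
print(accuracy(weights, bias, ablate(acts, [0]), labels))  # 0.5
print(accuracy(weights, bias, ablate(acts, [2]), labels))  # 1.0
```

Removing the heavily weighted neuron 0 halves accuracy, while removing the unused neuron 2 changes nothing. This contrast is the kind of signal ablation is meant to surface.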

Publications

[1] Hassan Sajjad, Fahim Dalvi, Nadir Durrani, and Preslav Nakov. On the effect of dropping layers of pre-trained transformer models. Computer Speech and Language, 77(C):101429, 2023.
[2] Hassan Sajjad, Nadir Durrani, and Fahim Dalvi. Neuron-level interpretation of deep NLP models: A survey. Transactions of the Association for Computational Linguistics, 11, 2023.
[3] Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, and Hassan Sajjad. Post-hoc analysis of Arabic transformer models. In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
[4] Hassan Sajjad, Firoj Alam, Fahim Dalvi, and Nadir Durrani. Effect of post-processing on contextualized word representations. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3127–3142, Gyeongju, Republic of Korea, October 2022. International Committee on Computational Linguistics.
[5] Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Khan, and Jia Xu. Analyzing encoded concepts in transformer language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3082–3101, Seattle, United States, July 2022. Association for Computational Linguistics.
[6] Fahim Dalvi, Abdul Khan, Firoj Alam, Nadir Durrani, Jia Xu, and Hassan Sajjad. Discovering latent concepts learned in BERT. In International Conference on Learning Representations, 2022.
[7] Nadir Durrani, Hassan Sajjad, and Fahim Dalvi. How transfer learning impacts linguistic knowledge in deep NLP models? In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4947–4957, Online, August 2021. Association for Computational Linguistics.
[8] Hassan Sajjad, Narine Kokhlikyan, Fahim Dalvi, and Nadir Durrani. Fine-grained interpretation and causation analysis in deep NLP models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials, pages 5–10, Online, June 2021. Association for Computational Linguistics.
[9] Shammur Absar Chowdhury, Nadir Durrani, and Ahmed Ali. What do end-to-end speech models learn about speaker, language and channel information? A layer-wise and neuron-level analysis. arXiv preprint, 2021.
[10] Hassan Sajjad, Firoj Alam, Fahim Dalvi, and Nadir Durrani. Effect of post-processing on contextualized word representations. arXiv preprint, 2021.
[11] Hassan Sajjad, Nadir Durrani, and Fahim Dalvi. Neuron-level interpretation of deep NLP models: A survey. arXiv preprint, 2021.
[12] Fahim Dalvi, Hassan Sajjad, Nadir Durrani, and Yonatan Belinkov. Analyzing redundancy in pretrained transformer models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4908–4926, Online, November 2020. Association for Computational Linguistics.
[13] Nadir Durrani, Hassan Sajjad, Fahim Dalvi, and Yonatan Belinkov. Analyzing individual neurons in pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4865–4880, Online, November 2020. Association for Computational Linguistics.
[14] John Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. Similarity analysis of contextual word representation models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4638–4655, Online, July 2020. Association for Computational Linguistics.
[15] Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. On the linguistic representational power of neural machine translation models. Computational Linguistics, 46(1):1–52, March 2020.
[16] Hassan Sajjad, Fahim Dalvi, Nadir Durrani, and Preslav Nakov. Poor man's BERT: Smaller and faster transformer models. CoRR, abs/2004.03844, 2020.
[17] Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Yonatan Belinkov, and Preslav Nakov. One size does not fit all: Comparing NMT representations of different granularities. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1504–1516, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[18] Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, and James Glass. NeuroX: A toolkit for analyzing individual neurons in neural networks. In AAAI Conference on Artificial Intelligence (AAAI), January 2019.
[19] Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, D. Anthony Bau, and James Glass. What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI, Oral presentation), January 2019.
[20] Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations, 2019.
[21] Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, and Stephan Vogel. Understanding and improving morphological learning in the neural machine translation decoder. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 142–151, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing.
[22] Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing.
[23] Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. What do neural machine translation models learn about morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, July 2017. Association for Computational Linguistics.

Media Coverage

Various works from the project have received coverage from science media.

Team

Core Team

Hassan Sajjad, Associate Professor, Faculty of Computer Science, Dalhousie University
Nadir Durrani, Senior Scientist, Qatar Computing Research Institute
Fahim Dalvi, Software Engineer, Qatar Computing Research Institute

Collaborators

Abdul Rafae Khan, Postdoctoral Researcher, Stevens Institute of Technology
Ahmed Abdelali, Senior Software Engineer, Qatar Computing Research Institute
Firoj Alam, Scientist, Qatar Computing Research Institute
Jia Xu, Assistant Professor, Stevens Institute of Technology

Past Collaborators

Anthony Bau, Undergraduate Student, MIT CSAIL
James Glass, Senior Research Scientist, MIT CSAIL
Narine Kokhlikyan, Software Engineer, Facebook AI
Yonatan Belinkov, Postdoctoral Researcher, MIT and Harvard University

Opportunities

  • Toolkit Contributions

    We are happy to mentor researchers or engineers interested in contributing to our open-source toolkit. Follow the link for some ideas, or get in touch to suggest your own!

    Potential Ideas