Welcome!
NeuroX is a Python library that encapsulates various methods for neuron interpretation and analysis, geared towards Deep NLP models. The library is a one-stop shop for activation extraction, probe training, clustering analysis, neuron selection and more. We currently support transformers models, with support for more toolkits coming soon.
Features
Support for extraction of activations from popular models, including all models in the transformers library, with extended support for other toolkits like OpenNMT-py planned in the near future.
Support for training linear probes on top of these activations, on the entire activation space of a model, on specific layers, or even on a specific set of neurons.
Support for extracting neurons related to specific concepts, using the Linear Correlation Analysis method (What is one Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models). The toolkit can extract either a local ranking of neurons important to a particular target class, or a global ranking of neurons important to all the target classes.
Support for ablation analysis by either removing or zeroing out specific neurons to determine their function and importance.
Support for subword and character level aggregation across a variety of tokenizers, including BPE and all tokenizers in the transformers library.
Support for activation visualization over regular text, to generate qualitative samples of neuron activity over particular sentences.
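To make the ablation and subword aggregation features above more concrete, here is a minimal pure-Python sketch of the underlying ideas. It does not use the NeuroX API: the function names (`zero_out_neurons`, `remove_neurons`, `aggregate_subwords`), the list-of-lists activation layout, and the "last" and "average" aggregation modes are all assumptions made for illustration only. See the API Reference for the toolkit's actual functions.

```python
from typing import List, Sequence

# Assumed layout: one row per token, one column per neuron.
Activations = List[List[float]]


def zero_out_neurons(activations: Activations, neurons: Sequence[int]) -> Activations:
    """Return a copy with the given neuron columns set to 0.0 (shape unchanged)."""
    ablated = set(neurons)
    return [[0.0 if j in ablated else v for j, v in enumerate(row)]
            for row in activations]


def remove_neurons(activations: Activations, neurons: Sequence[int]) -> Activations:
    """Return a copy with the given neuron columns deleted (fewer columns)."""
    ablated = set(neurons)
    return [[v for j, v in enumerate(row) if j not in ablated]
            for row in activations]


def aggregate_subwords(activations: Activations,
                       word_ids: Sequence[int],
                       mode: str = "last") -> Activations:
    """Collapse subword-level rows into one row per word.

    word_ids[i] is the index of the word that subword i belongs to;
    consecutive equal ids are treated as pieces of the same word.
    """
    if len(activations) != len(word_ids):
        raise ValueError("activations and word_ids must have the same length")

    # Group consecutive subword rows that share a word id.
    groups: List[Activations] = []
    previous = object()  # sentinel that never equals a real word id
    for row, word_id in zip(activations, word_ids):
        if word_id != previous:
            groups.append([])
            previous = word_id
        groups[-1].append(row)

    if mode == "last":
        # Represent each word by its final subword.
        return [list(group[-1]) for group in groups]
    if mode == "average":
        # Represent each word by the per-neuron mean over its subwords.
        return [[sum(column) / len(group) for column in zip(*group)]
                for group in groups]
    raise ValueError(f"unknown mode: {mode!r}")


# Three subword tokens, three neurons; the first two subwords form one word.
acts = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
word_ids = [0, 0, 1]

zeroed = zero_out_neurons(acts, [1])
removed = remove_neurons(acts, [1])
averaged = aggregate_subwords(acts, word_ids, mode="average")
last = aggregate_subwords(acts, word_ids, mode="last")
```

Zeroing keeps the activation matrix the same shape, so a probe trained on the full space can be re-evaluated unchanged, while removal shrinks the matrix and usually requires retraining the probe on the remaining neurons.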
Getting Started
See the Installation Instructions page for various ways of installing the toolkit. Browsing the methods in the API Reference is the best way to explore the toolkit. A Jupyter notebook is also provided that walks through a complete example, from activation extraction to visualizing top neurons.
Citation
Please cite our AAAI’19 paper if you use this toolkit in your work.
@article{dalvi2019neurox,
  title={NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks},
  author={Dalvi, Fahim
    and Nortonsmith, Avery
    and Bau, D Anthony
    and Belinkov, Yonatan
    and Sajjad, Hassan
    and Durrani, Nadir
    and Glass, James},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year={2019}
}