Potential Projects
Neuron Ablation and Intervention
Ablation is one of the most common methods for identifying the role of a neuron or a set of neurons in a neural network model; in other words, it measures the effect of a set of neurons on the model's prediction. For example, in a sentiment classification task, ablation can identify a set of neurons that are responsible for predicting the positive sentiment class. Ablation is performed by changing the values of a set of neurons to a specified new value (a rough sketch follows the task list).
Detailed features/tasks
- Implement single/multiple neuron ablation
- Make the code generic to allow multi-layer ablation
- Support changing neuron values based on a neuron's behavior
  - Change a neuron's value to its mean value over a set of sentences
  - Change neuron values to predefined values
- Support HuggingFace encoder-based models
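As a rough illustration, the sketch below ablates two neurons in a HuggingFace BERT encoder by overwriting their activations inside a forward hook. The layer path (model.encoder.layer[3]), the neuron indices, and the replacement value are illustrative assumptions, not part of the current NeuroX API.

```python
# Minimal ablation sketch, assuming a HuggingFace BERT encoder.
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.eval()

def make_ablation_hook(neuron_indices, new_value=0.0):
    """Overwrite the given neurons' activations with `new_value`
    (0.0 for zero-ablation, or a mean computed over a reference corpus)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[..., neuron_indices] = new_value  # ablate across all tokens
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Ablate neurons 10 and 42 in the output of encoder layer 3 (illustrative).
handle = model.encoder.layer[3].register_forward_hook(make_ablation_hook([10, 42]))
with torch.no_grad():
    outputs = model(**tokenizer("NeuroX makes probing easy.", return_tensors="pt"))
handle.remove()  # restore the unablated model
```

Mean ablation (one of the tasks above) would first average each neuron's activation over a reference set of sentences and pass that vector as new_value.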
Outcomes
Ablation and neuron activation forcing have been implemented for at least the most popular NLP transformer models. Alongside the methods, a short tutorial in the form of a notebook should also be committed to the repository.
Expected project size: 350 hours
Universal Activation Extraction
An important aspect of neuron analysis and probing methods is extracting features (activations) from the model being interpreted. NeuroX currently supports a large number of transformer models thanks to the unified API provided by HuggingFace. The goal of this project is to extend this support to more models, both within transformers and beyond (an extraction sketch follows the task list).
Detailed features/tasks
- Fully describe the scope of the current extraction code: which HuggingFace models it works with and which it does not
- Expand activation extraction to support more encoder models (if any)
- Expand activation extraction to support seq2seq models
- Extract activations with the predicted tokens
- Extract activations with specific ground truth (i.e. forced predictions)
- Integrate existing “generic PyTorch extractor” inside NeuroX
- Extend support to one other toolkit, e.g. OpenNMT
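For orientation, here is a minimal extraction sketch for HuggingFace encoders using the standard output_hidden_states mechanism; the seq2seq and OpenNMT extractors described above would need analogous handling of decoder states. The function name extract_activations is hypothetical.

```python
# Hedged sketch of layer-wise activation extraction for HuggingFace encoders.
import torch
from transformers import AutoModel, AutoTokenizer

def extract_activations(model_name, sentences):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    activations = []
    for sentence in sentences:
        encoded = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**encoded)
        # hidden_states is a tuple of (num_layers + 1) tensors, each of
        # shape (1, num_tokens, hidden_size); stack them into one tensor.
        activations.append(torch.stack(outputs.hidden_states).squeeze(1))
    return activations  # one (layers, tokens, hidden) tensor per sentence

acts = extract_activations("bert-base-uncased", ["NeuroX extracts activations."])
print(acts[0].shape)  # e.g. torch.Size([13, 9, 768])
```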
Outcomes
Activation extraction code supports transformers sequence-to-sequence models and at least one other toolkit, such as OpenNMT.
Expected project size: 350 hours
Probing Methods Implementation
The task is to implement new neuron discovery and analysis methods, such as Gaussian-based probing and corpus search.
Gaussian-based probe
Implement the Gaussian-based probe proposed by Hennigen et al. (2020). The method assumes that neuron activations exhibit Gaussian distributions with respect to concepts such as past- vs. present-tense verbs. It fits a multivariate Gaussian over neuron activations across a dataset to extract linguistic concepts such as tense and number (sketched below).
Paper: Intrinsic probing through dimension selection https://aclanthology.org/2020.emnlp-main.15.pdf
Code: https://github.com/rycolab/intrinsic-probing
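A minimal sketch of the core idea, assuming word-level activations have already been extracted: fit one multivariate Gaussian per concept class over a chosen neuron subset, then classify by class-conditional log-likelihood. The paper's greedy dimension-selection loop is omitted, and all names here are illustrative.

```python
# Hedged sketch of a Gaussian probe over a fixed neuron subset.
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_probe(activations, labels, neuron_subset):
    """activations: (num_words, num_neurons); labels: (num_words,)."""
    X = activations[:, neuron_subset]
    probe = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        # small diagonal term keeps the covariance well-conditioned
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(len(neuron_subset))
        probe[c] = (multivariate_normal(Xc.mean(axis=0), cov),
                    len(Xc) / len(X))  # (class-conditional Gaussian, prior)
    return probe

def predict(probe, activations, neuron_subset):
    X = activations[:, neuron_subset]
    classes = sorted(probe)
    # class-conditional log-likelihood plus log prior, per word
    scores = np.stack([probe[c][0].logpdf(X) + np.log(probe[c][1])
                       for c in classes], axis=1)
    return np.array(classes)[scores.argmax(axis=1)]
```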
Corpus Search method
Given a set of sentences, identify neurons that fire with the highest activations across the sentences.
Paper: Compositional Explanations of Neurons https://arxiv.org/abs/2006.14032
Paper: Finding Experts in Transformer Models https://arxiv.org/abs/2005.07647
Detailed features/tasks
- Implement Gaussian-based probing alongside the linear probes in the toolkit
- Implement a corpus search that takes a set of input sentences and returns neurons with an average activation value above a certain threshold
- Add an option for word-wise neuron search instead of the sentence-wise search of the previous task
- Implement intersection over union (IoU) to identify a neuron (Paper: Compositional Explanations of Neurons); a sketch of both search primitives follows this list
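A minimal sketch of the two search primitives, assuming activations are already pooled into matrices; the thresholds and shapes are illustrative.

```python
# Hedged sketch of corpus search plus the IoU criterion.
import numpy as np

def corpus_search(sentence_activations, threshold=0.9):
    """sentence_activations: (num_sentences, num_neurons), e.g. mean-pooled
    over tokens. Returns neurons whose mean activation exceeds the threshold."""
    return np.flatnonzero(sentence_activations.mean(axis=0) > threshold)

def iou_score(neuron_activations, concept_mask, act_threshold=0.5):
    """Intersection over union between a neuron's binarized activations and a
    binary concept annotation, both of shape (num_words,)."""
    fired = neuron_activations > act_threshold
    union = np.logical_or(fired, concept_mask).sum()
    return np.logical_and(fired, concept_mask).sum() / union if union else 0.0
```

The word-wise variant of corpus search would simply operate on a (num_words, num_neurons) matrix instead.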
Outcomes
New methods have been implemented and tested against existing codebases (where available) and against the results presented in the papers. In addition, tutorials that make use of the new implementations have been added alongside the existing examples.
Expected project size: 175 hours for 1 class of methods, 350 hours for both classes
Probing-based Model Cards
The concept of model cards has been proposed several times in the past, aiming to provide a concise and easy way to get an overview of a specific model and to compare models. NeuroX's neuron probing can be used to gauge the structure and extent of a model's internal knowledge for particular tasks and datasets. This project requires setting up the infrastructure and building an app that runs the NeuroX pipeline over several pre-existing datasets and produces a model card (a minimal backend sketch follows the task list).
Detailed features/tasks
- Accumulate a set of datasets that form a good representation of overall knowledge (or a subset to begin with)
- Pre-process datasets into a consistent form so that the pipeline can deal with new data/tasks
- Create a backend (REST API) that, given a model, computes the knowledge scores against the existing datasets
  - This computation pipeline can be expanded to use a microservices architecture
- Build a frontend where a user can submit new models, compare existing models, or delve deeper into a specific model’s card
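A minimal sketch of what the backend endpoint could look like, using FastAPI purely as an example framework; run_probing_pipeline and the dataset registry are placeholders, not existing NeuroX functionality.

```python
# Hedged sketch of a model-card backend endpoint.
from fastapi import FastAPI

app = FastAPI()
DATASETS = ["pos", "ner", "sentiment"]  # illustrative task registry

def run_probing_pipeline(model_name: str, dataset: str) -> float:
    # Would extract activations, train probes, and return a knowledge score.
    raise NotImplementedError

@app.post("/model-card/{model_name}")
def compute_model_card(model_name: str):
    scores = {d: run_probing_pipeline(model_name, d) for d in DATASETS}
    return {"model": model_name, "scores": scores}
```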
Outcomes
A web application and REST API are available to compute and retrieve a model card, given a transformers model and a predefined set of datasets.
Expected project size: 350 hours
Information-Theoretic Probing
Following Voita & Titov (2020):
You can choose priors for weights to get something interesting. For example, if you choose sparsity-inducing priors on the parameters, you can get variational dropout. What is more interesting, you can do this in such a way that you prune the whole neurons (not just individual weights): this is the Bayesian network compression method we use. This allows us to assess the probe complexity both using its description length and by inspecting the discovered architecture ...
We could use this information-theoretic view of probing to study the role of individual neurons in a probing task. The task, following the original implementation as well as the supplementary materials and alternative implementations, is to bring the information-theoretic paradigm to NeuroX (an online-coding sketch follows the lists below).
Detailed features/tasks
- (Part 1) Implement one of the information-theoretic probing papers on the same model back-end stack that NeuroX uses (i.e., provide an implementation compatible with the same frameworks) and write an analytic proposal for using the method to probe individual neurons. Provide a tutorial showing how to use the implemented method to perform probing (Difficulty: Medium, 175 hrs)
- (Part 2) Implement the ability to probe individual neurons with the proposed method, following the NeuroX API (Difficulty: Challenging, 175 hrs)
Required skills
- Math: familiarity with Bayesian statistics and information theory, enough to understand https://aclanthology.org/2020.acl-main.420.pdf and https://lena-voita.github.io/posts/mdl_probes.html
- Programming: familiarity with scikit-learn and the Python NLP stack; OOP skills
- NLP: familiarity with the Transformer architecture, knowledge of basic NLP tasks, and an interest in linguistics
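For concreteness, here is a hedged sketch of the online-coding variant of MDL probing from Voita & Titov (2020), with a scikit-learn logistic regression standing in for the probe; the block fractions are illustrative.

```python
# Hedged sketch of online-coding MDL probing (Voita & Titov, 2020).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def online_codelength(X, y, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    cuts = [int(f * len(y)) for f in fractions]
    # first block is transmitted with a uniform code over the label set
    total_bits = cuts[0] * np.log2(len(np.unique(y)))
    for start, end in zip(cuts[:-1], cuts[1:]):
        probe = LogisticRegression(max_iter=1000).fit(X[:start], y[:start])
        proba = probe.predict_proba(X[start:end])
        # log_loss returns nats; divide by ln(2) to convert to bits
        total_bits += log_loss(y[start:end], proba, normalize=False,
                               labels=probe.classes_) / np.log(2)
    return total_bits  # lower codelength = more extractable concept
```

Probing individual neurons (Part 2) would then amount to restricting X to a neuron subset and comparing codelengths.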
Outcomes
Information-theoretic probing methods have been explored and at least a proof of concept has been implemented in NeuroX (as part of the library or as an example notebook).
Expected project size: 175 hours per part
Coding challenge
Reproduce any of the information-theoretic probing works referenced above with the SentEval Tense task and the BERT model.
NeuroX Tutorials
The NeuroX toolkit serves as a supporting tool for various use cases such as neuron probing, analyzing redundancy in the network, and task-specific probing. This project aims at building a starter kit for the NeuroX toolkit. More specifically, the contributor will prepare end-to-end guides (Python notebooks) that showcase different use cases of NeuroX (a data-preparation sketch follows the task list).
Detailed features/tasks
- Prepare a tutorial on a binary classification probing task
  - Build a pipeline to convert multi-class data to a binary task, e.g. in POS tagging, train a task for Noun vs. not-Noun
  - Build a pipeline to create binary data based on a regex or a dictionary, e.g. a task of years vs. not-years, or country-name vs. not-country-name
- Prepare a tutorial on a multi-class probing task, e.g. POS tagging
- Incorporate different visualizations to analyze the results
- Incorporate various test cases such as control tasks, random embeddings, etc.
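A minimal sketch of the two data-conversion pipelines listed above; the parallel word/label-list format and the year regex are assumptions made for illustration.

```python
# Hedged sketch of multi-class-to-binary and regex-based task creation.
import re

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags

def pos_to_binary(words, pos_tags, positive=NOUN_TAGS):
    """Collapse multi-class POS labels into Noun (1) vs. not-Noun (0)."""
    return [(w, int(t in positive)) for w, t in zip(words, pos_tags)]

def regex_to_binary(words, pattern=r"^(1[0-9]{3}|20[0-9]{2})$"):
    """Label a word 1 if it matches the pattern (here: a year 1000-2099)."""
    return [(w, int(bool(re.match(pattern, w)))) for w in words]

print(pos_to_binary(["The", "cat", "sat"], ["DT", "NN", "VBD"]))
# -> [('The', 0), ('cat', 1), ('sat', 0)]
print(regex_to_binary(["In", "1999", "we"]))
# -> [('In', 0), ('1999', 1), ('we', 0)]
```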
Outcomes
Several tutorials have been added to the examples directory of the toolkit, alongside improvements in documentation where necessary.
Expected project size: 175 hours