neurox.interpretation
Submodules:
neurox.interpretation.ablation
Module for ablating neurons using various techniques.
This module provides a set of methods to ablate both layers and individual neurons from a given set.

neurox.interpretation.ablation.keep_specific_neurons(X, neuron_list)
Filter activations so that they only contain specific neurons.
Warning
This function is deprecated and will be removed in future versions. Use interpretation.ablation.filter_activations_keep_neurons instead.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neuron_list (list or numpy.ndarray) – List of neurons to keep
Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x len(neuron_list)]
Return type
numpy.ndarray view

neurox.interpretation.ablation.filter_activations_keep_neurons(X, neurons_to_keep)
Filter activations so that they only contain specific neurons.
Note
The returned value is a view, so modifying it will modify the original matrix.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_keep (list or numpy.ndarray) – List of neurons to keep
Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x len(neurons_to_keep)]
Return type
numpy.ndarray view

neurox.interpretation.ablation.filter_activations_remove_neurons(X, neurons_to_remove)
Filter activations so that they do not contain specific neurons.
Note
The returned value is a view, so modifying it will modify the original matrix.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_remove (list or numpy.ndarray) – List of neurons to remove
Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x (NUM_NEURONS - len(neurons_to_remove))]
Return type
numpy.ndarray view
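
The column selection performed by the two filtering functions can be illustrated with plain NumPy indexing. This is a minimal sketch under stated assumptions, not the library's implementation: `keep_columns` and `remove_columns` are hypothetical helpers, the data is a toy matrix, and the sketch does not try to reproduce the view semantics described in the notes above.

```python
import numpy as np

# Toy activations: 3 tokens x 5 neurons (hypothetical data).
X = np.arange(15, dtype=np.float32).reshape(3, 5)


def keep_columns(X, neurons_to_keep):
    """Select only the listed neuron columns (hypothetical helper)."""
    return X[:, neurons_to_keep]


def remove_columns(X, neurons_to_remove):
    """Select every neuron column except the listed ones (hypothetical helper)."""
    keep = np.setdiff1d(np.arange(X.shape[1]), neurons_to_remove)
    return X[:, keep]


kept = keep_columns(X, [0, 3])       # shape [NUM_TOKENS x 2]
removed = remove_columns(X, [0, 3])  # shape [NUM_TOKENS x (5 - 2)]
print(kept.shape, removed.shape)
```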

neurox.interpretation.ablation.zero_out_activations_keep_neurons(X, neurons_to_keep)
Mask all neuron activations with zero, except for the specified neurons.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_keep (list or numpy.ndarray) – List of neurons to not mask
Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]
Return type
numpy.ndarray

neurox.interpretation.ablation.zero_out_activations_remove_neurons(X, neurons_to_remove)
Mask specific neuron activations with zero.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_remove (list or numpy.ndarray) – List of neurons to mask
Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]
Return type
numpy.ndarray
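
Unlike filtering, zeroing out keeps the matrix shape and only overwrites the masked columns. The following is a small sketch of that idea, assuming the input should be left untouched (an assumption of this sketch); `zero_out_remove` and `zero_out_keep` are hypothetical names, not the library functions.

```python
import numpy as np

# Toy activations: 2 tokens x 4 neurons, all ones (hypothetical data).
X = np.ones((2, 4), dtype=np.float32)


def zero_out_remove(X, neurons_to_remove):
    """Return a copy of X with the listed neuron columns set to zero."""
    masked = X.copy()
    masked[:, neurons_to_remove] = 0
    return masked


def zero_out_keep(X, neurons_to_keep):
    """Return a copy of X where every column except the listed ones is zero."""
    masked = np.zeros_like(X)
    masked[:, neurons_to_keep] = X[:, neurons_to_keep]
    return masked


print(zero_out_remove(X, [1]))
print(zero_out_keep(X, [1, 3]))
```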

neurox.interpretation.ablation.filter_activations_by_layers(X, layers_to_keep, num_layers, bidirectional_filtering='none')
Filter activations so that they only contain specific layers.
Useful for performing layerwise analysis.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
layers_to_keep (list or numpy.ndarray) – List of layers to keep. Layers are 0-indexed
num_layers (int) – Total number of layers in the original model.
bidirectional_filtering (str) – Can be either “none” (Default), “forward” or “backward”. Useful if the model being analyzed is bidirectional and only layers in a certain direction need to be analyzed.
Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS_PER_LAYER * NUM_LAYERS]. The second dimension is doubled if the original model is bidirectional and no filtering is done.
Return type
numpy.ndarray
Notes
For bidirectional models, the method assumes that the internal structure is as follows: forward layer 0 neurons, backward layer 0 neurons, forward layer 1 neurons …
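
To make the assumed column layout concrete, the sketch below computes which column indices belong to the requested layers. It assumes that all layers have the same number of neurons and that, for a bidirectional model, each layer's block holds the forward half followed by the backward half. `layer_column_indices` is a hypothetical helper written for illustration only.

```python
def layer_column_indices(num_neurons, num_layers, layers_to_keep,
                         bidirectional_filtering="none"):
    """Return the column indices covered by the requested layers.

    Assumes equally sized layers laid out one after another, and (for
    "forward"/"backward") a forward half followed by a backward half
    inside every layer block.
    """
    per_layer = num_neurons // num_layers
    half = per_layer // 2
    columns = []
    for layer in layers_to_keep:
        start = layer * per_layer
        if bidirectional_filtering == "none":
            columns.extend(range(start, start + per_layer))
        elif bidirectional_filtering == "forward":
            columns.extend(range(start, start + half))
        elif bidirectional_filtering == "backward":
            columns.extend(range(start + half, start + per_layer))
        else:
            raise ValueError("expected 'none', 'forward' or 'backward'")
    return columns


# 3 layers of 4 neurons each; keep layers 0 and 2.
print(layer_column_indices(12, 3, [0, 2]))
print(layer_column_indices(12, 3, [0, 2], "forward"))
```

The resulting index list could then be used to select columns from an activation matrix, for example `X[:, columns]` with NumPy.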
neurox.interpretation.clustering
Module for clustering analysis.
This module contains functions to perform clustering analysis on neuron activations.

neurox.interpretation.clustering.create_correlation_clusters(X, use_abs_correlation=True, clustering_threshold=0.5, method='average')
Create clusters based on neuron activation correlation. All neurons in the same cluster are “highly correlated”, i.e. they fire similarly on similar inputs.
Parameters
X (numpy.ndarray) – Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
use_abs_correlation (bool, optional) – Whether to use absolute correlation values. Two neurons that are correlated in the opposite direction may represent the same “knowledge” in a large neural network.
clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.
Returns
cluster_labels – List of cluster labels for every neuron
Return type
list

neurox.interpretation.clustering.extract_independent_neurons(X, use_abs_correlation=True, clustering_threshold=0.5)
Extract independent neurons from the given set of neurons.
This method first clusters all of the given neurons, with every cluster representing similar neurons. A single neuron is then picked randomly from every cluster, and this forms the final set of independent neurons that is returned.
Parameters
X (numpy.ndarray) – Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
use_abs_correlation (bool, optional) – Whether to use absolute correlation values. Two neurons that are correlated in the opposite direction may represent the same “knowledge” in a large neural network.
clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.
Returns
independent_neurons – List of non-redundant independent neurons
Return type
list
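
The second step described above (one randomly chosen neuron per cluster) can be sketched in plain Python, starting from a list of cluster labels such as the one `create_correlation_clusters` is documented to return. `pick_one_per_cluster` and its `seed` argument are hypothetical and exist only for this illustration.

```python
import random
from collections import defaultdict


def pick_one_per_cluster(cluster_labels, seed=0):
    """Pick one neuron index at random from every cluster."""
    clusters = defaultdict(list)
    for neuron, label in enumerate(cluster_labels):
        clusters[label].append(neuron)
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    return sorted(rng.choice(members) for members in clusters.values())


# Six neurons grouped into three clusters (hypothetical labels).
cluster_labels = [1, 1, 2, 3, 2, 1]
independent = pick_one_per_cluster(cluster_labels)
print(independent)
```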

neurox.interpretation.clustering.print_clusters(cluster_labels)
Utility function for printing clusters
Parameters
cluster_labels (list) – List of cluster labels for every neuron. Usually the output of interpretation.clustering.create_correlation_clusters.

neurox.interpretation.clustering.scikit_extract_independent_neurons(X, clustering_threshold=0.5)
Alternative implementation of interpretation.clustering.extract_independent_neurons.
This is an alternative implementation of the extract_independent_neurons function using scikit-learn to create the correlation matrix instead of numpy. Should give identical results.
Parameters
X (numpy.ndarray) – Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.
Returns
independent_neurons (list) – List of non-redundant independent neurons
cluster_labels (list) – List of cluster labels for every neuron
neurox.interpretation.linear_probe
Module for layer and neuron level linear-probe based analysis.
This module contains functions to train, evaluate and use a linear probe for both layer-wise and neuron-wise analysis.

class neurox.interpretation.linear_probe.LinearProbe(input_size, num_classes)
Bases: torch.nn.modules.module.Module
Torch model for linear probe

training: bool


neurox.interpretation.linear_probe.l1_penalty(var)
L1/Lasso regularization penalty
Parameters
var (torch.Variable) – Torch variable representing the weight matrix over which the penalty should be computed
Returns
penalty – Torch variable containing the penalty as a single floating point value
Return type
torch.Variable

neurox.interpretation.linear_probe.l2_penalty(var)
L2/Ridge regularization penalty.
Parameters
var (torch.Variable) – Torch variable representing the weight matrix over which the penalty should be computed
Returns
penalty – Torch variable containing the penalty as a single floating point value
Return type
torch.Variable
Notes
The penalty is derived from the L2-norm, which has a square root. The exact optimization can also be done without the square root, but this makes no difference in the actual output of the optimization because of the scaling factor used along with the penalty.
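
As a plain-Python illustration of the two penalties (the library itself operates on torch variables), the sketch below computes the sum of absolute weights and the square root of the sum of squared weights for a small weight matrix. The function names and the nested-list representation are assumptions of this sketch.

```python
import math


def l1_penalty_sketch(weights):
    """Sum of absolute values over all entries of a nested-list matrix."""
    return sum(abs(w) for row in weights for w in row)


def l2_penalty_sketch(weights):
    """L2-norm (with the square root) over all entries of the matrix."""
    return math.sqrt(sum(w * w for row in weights for w in row))


W = [[3.0, -4.0], [0.0, 0.0]]  # hypothetical 2 x 2 weight matrix
print(l1_penalty_sketch(W))  # 7.0
print(l2_penalty_sketch(W))  # 5.0
```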

neurox.interpretation.linear_probe.train_logistic_regression_probe(X_train, y_train, lambda_l1=0, lambda_l2=0, num_epochs=10, batch_size=32, learning_rate=0.001)
Train a logistic regression probe.
This method trains a linear classifier that can be used as a probe to perform neuron analysis. Use this method when the task that is being probed for is a classification task. A logistic regression model is trained with Cross Entropy loss. The optimizer used is Adam with default torch.optim package hyperparameters.
Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32
y_train (numpy.ndarray) – Numpy Vector with 0-indexed class labels for each input token. The size of the vector must be [NUM_TOKENS]. Usually the output of interpretation.utils.create_tensors. Assumes that class labels are continuous from 0 to NUM_CLASSES-1. dtype of the vector must be np.int
lambda_l1 (float, optional) – L1 Penalty weight in the overall loss. Defaults to 0, i.e. no L1 regularization
lambda_l2 (float, optional) – L2 Penalty weight in the overall loss. Defaults to 0, i.e. no L2 regularization
num_epochs (int, optional) – Number of epochs to train the linear model for. Defaults to 10
batch_size (int, optional) – Batch size for the input to the linear model. Defaults to 32
learning_rate (float, optional) – Learning rate for optimizing the linear model. Defaults to 0.001
Returns
probe – Trained probe for the given task.
Return type
interpretation.linear_probe.LinearProbe

neurox.interpretation.linear_probe.train_linear_regression_probe(X_train, y_train, lambda_l1=0, lambda_l2=0, num_epochs=10, batch_size=32, learning_rate=0.001)
Train a linear regression probe.
This method trains a linear model that can be used as a probe to perform neuron analysis. Use this method when the task that is being probed for is a regression task. A linear regression model is trained with MSE loss. The optimizer used is Adam with default torch.optim package hyperparameters.
Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32
y_train (numpy.ndarray) – Numpy Vector with real-valued labels for each input token. The size of the vector must be [NUM_TOKENS]. Usually the output of interpretation.utils.create_tensors. dtype of the vector must be np.float32
lambda_l1 (float, optional) – L1 Penalty weight in the overall loss. Defaults to 0, i.e. no L1 regularization
lambda_l2 (float, optional) – L2 Penalty weight in the overall loss. Defaults to 0, i.e. no L2 regularization
num_epochs (int, optional) – Number of epochs to train the linear model for. Defaults to 10
batch_size (int, optional) – Batch size for the input to the linear model. Defaults to 32
learning_rate (float, optional) – Learning rate for optimizing the linear model. Defaults to 0.001
Returns
probe – Trained probe for the given task.
Return type
interpretation.linear_probe.LinearProbe

neurox.interpretation.linear_probe.evaluate_probe(probe, X, y, idx_to_class=None, return_predictions=False, source_tokens=None, batch_size=32, metric='accuracy')
Evaluates a trained probe.
This method evaluates a trained probe on the given data, and supports several standard metrics.
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32
y (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. For classification, 0-indexed class labels for each input token are expected. For regression, a real value per input token is expected. Usually the output of interpretation.utils.create_tensors
idx_to_class (dict, optional) – Class index to name mapping. Usually returned by interpretation.utils.create_tensors. If this mapping is provided, per-class metrics are also computed. Defaults to None.
return_predictions (bool, optional) – If set to True, actual predictions are also returned along with scores for further use. Defaults to False.
source_tokens (list of lists, optional) – List of all sentences, where each is a list of the tokens in that sentence. Usually returned by data.loader.load_data. If provided and return_predictions is True, each prediction will be paired with its original token. Defaults to None.
batch_size (int, optional) – Batch size for the input to the model. Defaults to 32
metric (str, optional) – Metric to use for evaluation scores. For supported metrics see interpretation.metrics
Returns
scores (dict) – The overall score on the given data with the key __OVERALL__. If idx_to_class mapping is provided, additional keys representing each class and their associated scores are also part of the dictionary.
predictions (list of 3-tuples, optional) – If return_predictions is set to True, this list will contain a 3-tuple for every input sample, representing (source_token, predicted_class, was_predicted_correctly)
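
The shape of the returned scores dictionary can be illustrated with a small accuracy-only sketch. How the library derives per-class scores is not stated here, so this sketch assumes a class score is the accuracy over the samples whose gold label is that class. `score_predictions` and the class names are hypothetical.

```python
def score_predictions(preds, labels, idx_to_class=None):
    """Build a scores dict with an overall entry and optional per-class entries."""
    correct = sum(p == l for p, l in zip(preds, labels))
    scores = {"__OVERALL__": correct / len(labels)}
    if idx_to_class is not None:
        for idx, name in idx_to_class.items():
            # Assumption: restrict to samples whose gold label is this class.
            pairs = [(p, l) for p, l in zip(preds, labels) if l == idx]
            if pairs:
                scores[name] = sum(p == l for p, l in pairs) / len(pairs)
    return scores


preds = [0, 1, 1, 0]
labels = [0, 1, 0, 0]
scores = score_predictions(preds, labels, {0: "NOUN", 1: "VERB"})
print(scores)
```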

neurox.interpretation.linear_probe.get_top_neurons(probe, percentage, class_to_idx)
Get top neurons from a trained probe.
This method returns the set of all top neurons based on the given percentage. It also returns top neurons per class. All neurons (sorted by weight in descending order) that account for percentage of the total weight mass are returned. See the given reference for the complete selection algorithm description.
Note
Absolute weight values are used for selection, instead of raw signed values
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
percentage (float) – Real number between 0 and 1, with 0 representing no weight mass and 1 representing the entire weight mass, i.e. all neurons.
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
Returns
overall_top_neurons (numpy.ndarray) – Numpy array with all top neurons
top_neurons (dict) – Dictionary with top neurons for every class, with the class name as the key and numpy.ndarray of top neurons (for that class) as the value.
Notes
One can expect distributed tasks to have more top neurons than focused tasks
One can also expect complex tasks to have more top neurons than simpler tasks
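
The idea of selecting neurons by weight mass can be sketched for a single weight vector as follows. This is an illustration under assumptions, not the library's algorithm (which also works per class and is described in the referenced work): neurons are sorted by absolute weight, largest first, and added until the accumulated mass reaches `percentage` of the total. `top_by_weight_mass` is a hypothetical helper.

```python
def top_by_weight_mass(weights, percentage):
    """Return neuron indices, largest |weight| first, covering the given mass."""
    abs_w = [abs(w) for w in weights]
    threshold = percentage * sum(abs_w)
    order = sorted(range(len(abs_w)), key=lambda i: abs_w[i], reverse=True)
    selected, mass = [], 0.0
    for i in order:
        if mass >= threshold:
            break
        selected.append(i)
        mass += abs_w[i]
    return selected


weights = [1.0, -4.0, 0.5, 2.0, -0.5]  # hypothetical weights, total mass 8.0
print(top_by_weight_mass(weights, 0.5))   # [1]
print(top_by_weight_mass(weights, 0.75))  # [1, 3]
```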

neurox.interpretation.linear_probe.get_top_neurons_hard_threshold(probe, fraction, class_to_idx)
Get top neurons from a trained probe based on the maximum weight.
This method returns the set of all top neurons based on the given threshold. All neurons that have a weight above threshold * max_weight, where threshold is the given fraction, are considered top neurons. It also returns top neurons per class.
Note
Absolute weight values are used for selection, instead of raw signed values
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
fraction (float) – Fraction of maximum weight per class to use for selection
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
Returns
overall_top_neurons (numpy.ndarray) – Numpy array with all top neurons
top_neurons (dict) – Dictionary with top neurons for every class, with the class name as the key and numpy.ndarray of top neurons (for that class) as the value.
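
For a single class, the hard-threshold rule amounts to comparing each absolute weight against a fraction of the largest absolute weight. A minimal sketch, with `above_fraction_of_max` as a hypothetical helper and a made-up weight vector:

```python
def above_fraction_of_max(weights, fraction):
    """Return indices whose |weight| is above fraction * max(|weight|)."""
    abs_w = [abs(w) for w in weights]
    cutoff = fraction * max(abs_w)
    return [i for i, w in enumerate(abs_w) if w > cutoff]


weights = [1.0, -4.0, 0.5, 2.0, -0.5]  # hypothetical weights for one class
print(above_fraction_of_max(weights, 0.25))  # [1, 3]
print(above_fraction_of_max(weights, 0.5))   # [1]
```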

neurox.interpretation.linear_probe.get_bottom_neurons(probe, percentage, class_to_idx)
Get bottom neurons from a trained probe.
Analogous to interpretation.linear_probe.get_top_neurons. This method returns the set of all bottom neurons based on the given percentage. It also returns bottom neurons per class. All neurons (sorted by weight in ascending order) that account for percentage of the total weight mass are returned. See the given reference for the complete selection algorithm description.
Note
Absolute weight values are used for selection, instead of raw signed values
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
percentage (float) – Real number between 0 and 1, with 0 representing no weight mass and 1 representing the entire weight mass, i.e. all neurons.
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
Returns
overall_bottom_neurons (numpy.ndarray) – Numpy array with all bottom neurons
bottom_neurons (dict) – Dictionary with bottom neurons for every class, with the class name as the key and numpy.ndarray of bottom neurons (for that class) as the value.

neurox.interpretation.linear_probe.get_random_neurons(probe, probability)
Get random neurons from a trained probe.
This method returns a random set of neurons based on the probability. Each neuron is either discarded or included based on the value of a uniform random variable (included if it is less than probability, discarded otherwise).
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
probability (float) – Real number between 0 and 1, with 0 representing no selection and 1 representing selection of all neurons.
Returns
random_neurons – Numpy array with random neurons
Return type
numpy.ndarray
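
The inclusion rule can be sketched with the standard library: draw one uniform number per neuron and keep the neuron when the draw is below `probability`. The helper `random_neurons` and its `seed` argument are hypothetical; the library function takes a trained probe rather than a neuron count.

```python
import random


def random_neurons(num_neurons, probability, seed=None):
    """Keep each neuron index independently with the given probability."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so 0 keeps nothing and 1 keeps all.
    return [i for i in range(num_neurons) if rng.random() < probability]


sample = random_neurons(100, 0.3, seed=7)
print(len(sample))
```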

neurox.interpretation.linear_probe.get_neuron_ordering(probe, class_to_idx, search_stride=100)
Get global ordering of neurons from a trained probe.
This method returns the global ordering of neurons in a model based on the given probe’s weight values. Top neurons are computed at increasing percentages of the weight mass and then accumulated in order. See given reference for a complete description of the selection algorithm.
For example, if the neuron list at 1% weight mass is [#2, #52, #134], and at 2% weight mass is [#2, #4, #52, #123, #130, #134, #567], the returned ordering will be [#2, #52, #134, #4, #123, #130, #567]. Within each percentage, the ordering of neurons is arbitrary. In this case, the importance of #2, #52 and #134 is not necessarily in that order. The cutoffs between each percentage selection are also returned. Increasing the search_stride will decrease the distance between each cutoff, making the overall ordering more accurate.
Note
Absolute weight values are used for selection, instead of raw signed values
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
search_stride (int, optional) – Defines how many pieces the percent weight mass selection is divided into. Higher leads to a more accurate ordering. Defaults to 100.
Returns
global_neuron_ordering (numpy.ndarray) – Numpy array of size NUM_NEURONS with neurons in decreasing order of importance.
cutoffs (list) – Indices where each percentage selection begins. All neurons between two cutoff values are arbitrarily ordered.
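
The accumulation step from the worked example can be sketched as follows: each successive selection contributes only the neurons not seen before, and the position where each new group starts is recorded. `accumulate_ordering` is a hypothetical helper, and the exact form of the cutoffs the library returns may differ from this sketch.

```python
def accumulate_ordering(selections):
    """Merge successive neuron selections into one ordering plus cutoffs."""
    ordering, cutoffs, seen = [], [], set()
    for selection in selections:
        new = [n for n in selection if n not in seen]
        if not new:
            continue  # this percentage added nothing new
        cutoffs.append(len(ordering))  # index where this group begins
        ordering.extend(new)
        seen.update(new)
    return ordering, cutoffs


# Selections at 1% and 2% weight mass, taken from the example in the text.
selections = [[2, 52, 134], [2, 4, 52, 123, 130, 134, 567]]
ordering, cutoffs = accumulate_ordering(selections)
print(ordering)  # [2, 52, 134, 4, 123, 130, 567]
print(cutoffs)   # [0, 3]
```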

neurox.interpretation.linear_probe.get_neuron_ordering_granular(probe, class_to_idx, granularity=50, search_stride=100)
Get global ordering of neurons from a trained probe.
This method is an alternative to interpretation.linear_probe.get_neuron_ordering. It works very similarly to that method, except that instead of adding the neurons from each percentage selection, neurons are added in chunks of granularity neurons.
Note
Absolute weight values are used for selection, instead of raw signed values
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
granularity (int, optional) – Approximate number of neurons in each chunk of selection. Defaults to 50.
search_stride (int, optional) – Defines how many pieces the percent weight mass selection is divided into. Higher leads to a more accurate ordering. Defaults to 100.
Returns
global_neuron_ordering (numpy.ndarray) – Numpy array of size NUM_NEURONS with neurons in decreasing order of importance.
cutoffs (list) – Indices where each chunk of selection begins. Each chunk will contain approximately granularity neurons. All neurons between two cutoff values (i.e. a chunk) are arbitrarily ordered.

neurox.interpretation.linear_probe.get_fixed_number_of_bottom_neurons(probe, num_bottom_neurons, class_to_idx)
Get global bottom neurons.
This method returns a fixed number of bottom neurons from the global ordering computed using interpretation.linear_probe.get_neuron_ordering.
Note
Absolute weight values are used for selection, instead of raw signed values
Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
num_bottom_neurons (int) – Number of bottom neurons for selection
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
Returns
global_bottom_neurons – Numpy array of size num_bottom_neurons with bottom neurons using the global ordering
Return type
numpy.ndarray
neurox.interpretation.metrics
Module that wraps around several standard metrics

neurox.interpretation.metrics.accuracy(preds, labels)
Accuracy.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
accuracy – Accuracy of the model
Return type
float

neurox.interpretation.metrics.f1(preds, labels)
F-Score or F1 score.
Note
The implementation from sklearn.metrics is used to compute the score.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
f1_score – F-Score of the model
Return type
float

neurox.interpretation.metrics.accuracy_and_f1(preds, labels)
Mean of Accuracy and F-Score.
Note
The implementation from sklearn.metrics is used to compute the F-Score.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
acc_f1_mean – Mean of Accuracy and F-Score of the model
Return type
float

neurox.interpretation.metrics.pearson(preds, labels)
Pearson’s correlation coefficient
Note
The implementation from scipy.stats is used to compute the score.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
pearson_score – Pearson’s correlation coefficient of the model
Return type
float

neurox.interpretation.metrics.spearman(preds, labels)
Spearman correlation coefficient
Note
The implementation from scipy.stats is used to compute the score.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
spearman_score – Spearman correlation coefficient of the model
Return type
float

neurox.interpretation.metrics.pearson_and_spearman(preds, labels)
Mean of Pearson and Spearman correlation coefficients.
Note
The implementation from scipy.stats is used to compute the scores.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
pearson_spearman_mean – Mean of Pearson and Spearman correlation coefficients of the model
Return type
float

neurox.interpretation.metrics.matthews_corrcoef(preds, labels)
Matthews correlation coefficient
Note
The implementation from sklearn.metrics is used to compute the score.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
Returns
mcc_score – Matthews correlation coefficient of the model
Return type
float

neurox.interpretation.metrics.compute_score(preds, labels, metric)
Utility function to compute scores using several metrics.
Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
metric (str) – One of accuracy, f1, accuracy_and_f1, pearson, spearman, pearson_and_spearman or matthews_corrcoef.
Returns
score – Score of the model with the chosen metric
Return type
float
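
Selecting a metric by name can be sketched as a lookup in a dictionary of scoring functions. The sketch below covers only two of the listed metrics and implements them in plain Python rather than through sklearn.metrics or scipy.stats; `compute_score_sketch` and `METRICS` are hypothetical names.

```python
import math


def accuracy(preds, labels):
    """Fraction of predictions that equal their label."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def pearson(preds, labels):
    """Pearson correlation coefficient computed from its definition."""
    n = len(preds)
    mean_p, mean_l = sum(preds) / n, sum(labels) / n
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(preds, labels))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    return cov / math.sqrt(var_p * var_l)


METRICS = {"accuracy": accuracy, "pearson": pearson}


def compute_score_sketch(preds, labels, metric):
    """Dispatch to the scoring function registered under `metric`."""
    if metric not in METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    return METRICS[metric](preds, labels)


print(compute_score_sketch([1, 0, 1], [1, 1, 1], "accuracy"))
print(compute_score_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], "pearson"))
```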
neurox.interpretation.probeless
Module for Probeless method
This module extracts neuron rankings for a label/tag (e.g. Verbs) or for an entire property set (e.g. Part of speech) without training any probes.

neurox.interpretation.probeless.get_neuron_ordering(X_train, y_train)
Returns a list of top neurons w.r.t. the overall task, e.g. POS
Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.
Returns
ranking – list of NUM_NEURONS neuron indices, in decreasing order of importance.
Return type
list

neurox.interpretation.probeless.get_neuron_ordering_for_tag(X_train, y_train, label2idx, tag)
Returns a list of top neurons w.r.t. a tag, e.g. noun
Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.
label2idx (dict) – Class name to index mapping. Usually returned by interpretation.utils.create_tensors.
tag (string) – tag for which rankings are extracted
Returns
ranking – list of NUM_NEURONS neuron indices, in decreasing order of importance.
Return type
list
Returns a dictionary of tags along with the top neurons for each tag, and a list with the overall ranking.
Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.
idx2label (dict) – Class index to name mapping. Usually returned by interpretation.utils.create_tensors.
Returns
overall_ranking (list) – list of NUM_NEURONS neuron indices, in decreasing order of importance.
ranking_per_tag (dict) – Dictionary with top neurons for every class, with the class name as the key and list of neurons as the values.
neurox.interpretation.utils

neurox.interpretation.utils.isnotebook()
Utility function to detect if the code being run is within a jupyter notebook. Useful to change progress indicators, for example.
Returns
isnotebook – True if the function is being called inside a notebook, False otherwise.
Return type
bool

neurox.interpretation.utils.get_progress_bar()
Utility function to get a progress bar depending on the environment the code is running in. A normal text-based progress bar is returned in normal shells, and a notebook widget-based progress bar is returned in jupyter notebooks.
Returns
progressbar – The appropriate progress bar from the tqdm library.
Return type
function

neurox.interpretation.utils.batch_generator(X, y, batch_size=32)
Generator function to generate batches of data for training/evaluation.
This function takes two tensors representing the activations and labels respectively, and yields batches of parallel data. The last batch may contain fewer than batch_size elements.
Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. For classification, 0-indexed class labels for each input token are expected. For regression, a real value per input token is expected. Usually the output of interpretation.utils.create_tensors
batch_size (int, optional) – Number of samples to return in each call. Defaults to 32.
Yields
X_batch (numpy.ndarray) – Numpy Matrix of size [batch_size x NUM_NEURONS]. The final batch may have fewer elements than the requested batch_size
y_batch (numpy.ndarray) – Numpy Vector of size [batch_size]. The final batch may have fewer elements than the requested batch_size
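
The batching behaviour described above, including the shorter final batch, can be sketched with plain slicing. `batch_generator_sketch` is a hypothetical stand-in that works on Python lists here; the same slicing would apply to NumPy arrays.

```python
def batch_generator_sketch(X, y, batch_size=32):
    """Yield parallel slices of X and y, batch_size rows at a time."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


# Five tokens with two neurons each (hypothetical data).
X = [[float(i), float(i) + 0.5] for i in range(5)]
y = [0, 1, 0, 1, 0]

batches = list(batch_generator_sketch(X, y, batch_size=2))
for X_batch, y_batch in batches:
    print(len(X_batch), y_batch)
```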

neurox.interpretation.utils.tok2idx(tokens)
Utility function to generate unique indices for a set of tokens.
Parameters
tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data
Returns
tok2idx_mapping – A dictionary with tokens as keys and a unique index for each token as values
Return type
dict

neurox.interpretation.utils.idx2tok(srcidx)
Utility function to create an inverse mapping from a tok2idx mapping.
Parameters
tok2idx_mapping (dict) – Token to index mapping, usually the output of interpretation.utils.tok2idx.
Returns
idx2tok – A dictionary with unique indices as keys and their associated tokens as values
Return type
dict
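
The relationship between the two mappings can be sketched in a few lines. The order in which the library assigns indices is not stated here, so this sketch assumes first-seen order; `build_tok2idx` and `invert_mapping` are hypothetical helpers.

```python
def build_tok2idx(tokens):
    """Assign a unique index to every distinct token, in first-seen order."""
    mapping = {}
    for sentence in tokens:
        for tok in sentence:
            if tok not in mapping:
                mapping[tok] = len(mapping)
    return mapping


def invert_mapping(tok2idx_mapping):
    """Build the index-to-token mapping from a token-to-index mapping."""
    return {idx: tok for tok, idx in tok2idx_mapping.items()}


sentences = [["the", "cat", "sat"], ["the", "dog"]]
tok2idx_mapping = build_tok2idx(sentences)
idx2tok_mapping = invert_mapping(tok2idx_mapping)
print(tok2idx_mapping)
print(idx2tok_mapping)
```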

neurox.interpretation.utils.count_target_words(tokens)
Utility function to count the total number of tokens in a dataset.
Parameters
tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data
Returns
count – Total number of tokens in the given tokens structure
Return type
int

neurox.interpretation.utils.
create_tensors
(tokens, activations, task_specific_tag, mappings=None, task_type='classification', binarized_tag=None, balance_data=False)[source]¶ Method to preprocess loaded datasets into tensors that can be used to train probes and perform analyis on. The input tokens are represented as list of sentences, where each sentence is a list of tokens. Each token also has an associated label. All tokens from all sentences are flattened into one dimension in the returned tensors. The returned tensors will thus have
total_num_tokens
rows. Parameters
tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from
data.loader.load_data
activations (list of numpy.ndarray) – List of sentence representations, where each sentence representation is a numpy matrix of shape
[num tokens in sentence x concatenated representation size]
. Usually retured fromdata.loader.load_activations
task_specific_tag (str) – Label to assign tokens with unseen labels. This is particularly useful if some labels are never seen during train, but are present in the dev or test set. This is usually set to the majority class in the task.
mappings (list of dicts) – List of four python dicts:
label2idx
,idx2label
,src2idx
andidx2src
for classification tasks. List of two dictssrc2idx
andidx2src
for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa. Usually returned from a previous call tocreate_tensors
.task_type (str) – Either “classification” or “regression”, indicate the kind of task that is being probed.
binarized_tag (str, optional) – Tag/Label to create binary data. All other labels in the dataset are changed to OTHER. Defaults to None in which case the data labels are processed asis.
balance_data (bool, optional) – Whether the incoming data should be balanced. Data is balanced using utils.balance_binary_class_data for binary data and utils.balance_multi_class_data for multiclass data using undersampling. Defaults to False.
 Returns
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]
y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]
mappings (list of dicts) – List of four Python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa.
Notes
mappings should be created exactly once and reused for subsequent calls. For example, mappings can be created on train data and then passed in during the calls for dev and test data.
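The flattening and mapping reuse described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: build_label_mapping and flatten_with_mapping are hypothetical helpers, plain Python lists stand in for numpy arrays, and only the label mapping (not the source-token mapping) is modelled. The fallback_label argument plays the role that task_specific_tag is described as having.

```python
def build_label_mapping(sentence_labels):
    """Assign an index to every label, in order of first appearance."""
    label2idx = {}
    for labels in sentence_labels:
        for label in labels:
            label2idx.setdefault(label, len(label2idx))
    idx2label = {idx: label for label, idx in label2idx.items()}
    return label2idx, idx2label


def flatten_with_mapping(sentence_activations, sentence_labels,
                         fallback_label, label2idx=None):
    """Flatten per-sentence data into one row per token (hypothetical helper).

    Labels missing from label2idx are mapped to the index of fallback_label.
    """
    if label2idx is None:
        # First call (e.g. train data): create the mapping.
        label2idx, _ = build_label_mapping(sentence_labels)
    fallback_idx = label2idx[fallback_label]
    X, y = [], []
    for vectors, labels in zip(sentence_activations, sentence_labels):
        for vector, label in zip(vectors, labels):
            X.append(list(vector))
            y.append(label2idx.get(label, fallback_idx))
    return X, y, label2idx


# Two train sentences (3 tokens total) and one dev sentence (2 tokens).
train_acts = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]]]
train_labels = [["NN", "VB"], ["NN"]]
dev_acts = [[[0.7, 0.8], [0.9, 1.0]]]
dev_labels = [["VB", "JJ"]]  # "JJ" never appears in the train data

# Create the mapping once on train, then reuse it for dev.
X_train, y_train, label2idx = flatten_with_mapping(train_acts, train_labels, "NN")
X_dev, y_dev, _ = flatten_with_mapping(dev_acts, dev_labels, "NN", label2idx)
```

Reusing label2idx keeps label indices consistent across splits, which is why the mapping is built only once.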

neurox.interpretation.utils.
print_overall_stats
(all_results)[source]¶ Method to pretty print overall results.
Warning
This method was primarily written to process results from internal scripts and pipelines.
 Parameters
all_results (dict) – Dictionary containing the probe, overall scores, scores from selected neurons, neuron ordering and neuron selections at various percentages

neurox.interpretation.utils.
print_machine_stats
(all_results)[source]¶ Method to print overall results in tsv format.
Warning
This method was primarily written to process results from internal scripts and pipelines.
 Parameters
all_results (dict) – Dictionary containing the probe, overall scores, scores from selected neurons, neuron ordering and neuron selections at various percentages

neurox.interpretation.utils.
balance_binary_class_data
(X, y)[source]¶ Method to balance binary class data.
Note
The majority class is randomly undersampled to match the minority class in size.
 Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually returned from interpretation.utils.create_tensors
y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]. Usually returned from interpretation.utils.create_tensors
 Returns
X_balanced (numpy.ndarray) – Numpy matrix of size [NUM_BALANCED_TOKENS x NUM_NEURONS]
y_balanced (numpy.ndarray) – Numpy vector of size [NUM_BALANCED_TOKENS]
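As a rough illustration of random undersampling for binary data, the sketch below selects row indices so that both classes end up the same size. It is an assumption-laden stand-in rather than the library's code: undersample_binary_indices is a hypothetical helper, it uses Python's random module with a fixed seed, and it returns indices that could then be used to select rows from X and y.

```python
import random


def undersample_binary_indices(y, seed=0):
    """Return sorted row indices with both classes cut to the minority size."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    # Group row indices by class label.
    by_class = {c: [i for i, label in enumerate(y) if label == c] for c in classes}
    minority_size = min(len(indices) for indices in by_class.values())
    selected = []
    for indices in by_class.values():
        # Sampling without replacement; the minority class is kept whole.
        selected.extend(rng.sample(indices, minority_size))
    return sorted(selected)


y = [0, 0, 0, 0, 1, 1]
X = [[float(i)] for i in range(len(y))]

keep = undersample_binary_indices(y)
X_balanced = [X[i] for i in keep]
y_balanced = [y[i] for i in keep]
```

Selecting by index keeps each row of X_balanced aligned with its label in y_balanced.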

neurox.interpretation.utils.
balance_multi_class_data
(X, y, num_required_instances=None)[source]¶ Method to balance multi class data.
Note
All classes are undersampled randomly to match the minority class in their size. If num_required_instances is provided, all classes are sampled proportionally so that the total number of selected examples is approximately num_required_instances (because of rounding proportions).
 Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually returned from interpretation.utils.create_tensors
y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]. Usually returned from interpretation.utils.create_tensors
num_required_instances (int, optional) – Total number of required instances. All classes are sampled proportionally.
 Returns
X_balanced (numpy.ndarray) – Numpy matrix of size [NUM_BALANCED_TOKENS x NUM_NEURONS]
y_balanced (numpy.ndarray) – Numpy vector of size [NUM_BALANCED_TOKENS]
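The two balancing modes described in the note can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: balance_multi_class_indices is a hypothetical helper, per-class quotas in the proportional mode are computed with Python's round (the actual rounding rule used by the library is not specified here), and the result is a list of row indices rather than numpy arrays.

```python
import random
from collections import defaultdict


def balance_multi_class_indices(y, num_required_instances=None, seed=0):
    """Return sorted row indices for a balanced or proportional sample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    total = len(y)
    minority_size = min(len(indices) for indices in by_class.values())
    selected = []
    for indices in by_class.values():
        if num_required_instances is None:
            # Mode 1: every class is cut down to the minority class size.
            k = minority_size
        else:
            # Mode 2: each class keeps its share of the requested total.
            # Rounding means the final count is only approximately the target.
            k = min(len(indices), round(num_required_instances * len(indices) / total))
        selected.extend(rng.sample(indices, k))
    return sorted(selected)


y = ["a"] * 8 + ["b"] * 4 + ["c"] * 4

equal = balance_multi_class_indices(y)
proportional = balance_multi_class_indices(y, num_required_instances=8)
```

With these inputs the first call keeps four examples per class, while the second preserves the original 2:1:1 class ratio in a smaller sample.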

neurox.interpretation.utils.
load_probe
(probe_path)[source]¶ Loads a probe and its associated mappings from probe_path
Warning
This method is currently not implemented.
 Parameters
probe_path (str) – Path to a pkl object saved by interpretation.utils.save_probe
 Returns
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
mappings (list of dicts) – List of four Python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa.

neurox.interpretation.utils.
save_probe
(probe_path, probe, mappings)[source]¶ Saves a model and its associated mappings as a pkl object at probe_path
Warning
This method is currently not implemented.
 Parameters
probe_path (str) – Path to save a pkl object
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
mappings (list of dicts) – List of four Python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa.
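Since save_probe and load_probe are documented as not implemented, one possible workaround is to pickle the probe and its mappings together. The sketch below is only an illustration of that idea: save_bundle and load_bundle are hypothetical names, a plain dict stands in for a trained probe, and whether a real LinearProbe instance pickles cleanly is an assumption that is not verified here.

```python
import os
import pickle
import tempfile


def save_bundle(path, probe, mappings):
    """Write the probe and its mappings to a single pkl file."""
    with open(path, "wb") as f:
        pickle.dump({"probe": probe, "mappings": mappings}, f)


def load_bundle(path):
    """Read back the (probe, mappings) pair written by save_bundle."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["probe"], bundle["mappings"]


# Stand-in objects: a dict instead of a trained probe, and the four
# classification mappings in the documented order.
probe = {"weights": [[0.1, -0.2], [0.3, 0.4]]}
mappings = [{"NN": 0, "VB": 1}, {0: "NN", 1: "VB"}, {"cat": 0}, {0: "cat"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "probe.pkl")
    save_bundle(path, probe, mappings)
    restored_probe, restored_mappings = load_bundle(path)
```

Storing the mappings next to the probe keeps label indices consistent when the probe is later applied to new data. Pickle files should only be loaded from trusted sources.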
Module contents: