neurox.interpretation¶
Submodules:
neurox.interpretation.ablation¶
Module for ablating neurons using various techniques.
This module provides a set of methods to ablate both layers and individual neurons from a given set.
neurox.interpretation.ablation.keep_specific_neurons(X, neuron_list)[source]¶
Filter activations so that they only contain specific neurons.
Warning
This function is deprecated and will be removed in future versions. Use interpretation.ablation.filter_activations_keep_neurons instead.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neuron_list (list or numpy.ndarray) – List of neurons to keep
- Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x len(neuron_list)]
- Return type
numpy.ndarray view
neurox.interpretation.ablation.filter_activations_keep_neurons(X, neurons_to_keep)[source]¶
Filter activations so that they only contain specific neurons.
Note
The returned value is a view, so modifying it will modify the original matrix.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_keep (list or numpy.ndarray) – List of neurons to keep
- Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x len(neurons_to_keep)]
- Return type
numpy.ndarray view
neurox.interpretation.ablation.filter_activations_remove_neurons(X, neurons_to_remove)[source]¶
Filter activations so that they do not contain specific neurons.
Note
The returned value is a view, so modifying it will modify the original matrix.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_remove (list or numpy.ndarray) – List of neurons to remove
- Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS - len(neurons_to_remove)]
- Return type
numpy.ndarray view
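A minimal usage sketch for the two filtering functions above, using a small synthetic activation matrix (the matrix shape and neuron indices are illustrative only; in practice X comes from interpretation.utils.create_tensors):

    import numpy as np
    import neurox.interpretation.ablation as ablation

    # Illustrative stand-in for the [NUM_TOKENS x NUM_NEURONS] matrix
    # normally produced by interpretation.utils.create_tensors.
    X = np.random.rand(100, 768).astype(np.float32)

    # Keep only a handful of neurons; the result has shape [100 x 3].
    X_kept = ablation.filter_activations_keep_neurons(X, [10, 42, 700])

    # Remove the same neurons instead; the result has shape [100 x 765].
    X_removed = ablation.filter_activations_remove_neurons(X, [10, 42, 700])

    print(X_kept.shape, X_removed.shape)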
neurox.interpretation.ablation.zero_out_activations_keep_neurons(X, neurons_to_keep)[source]¶
Mask all neuron activations with zero, other than the specified neurons.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_keep (list or numpy.ndarray) – List of neurons to not mask
- Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]
- Return type
numpy.ndarray
neurox.interpretation.ablation.zero_out_activations_remove_neurons(X, neurons_to_remove)[source]¶
Mask specific neuron activations with zero.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
neurons_to_remove (list or numpy.ndarray) – List of neurons to mask
- Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]
- Return type
numpy.ndarray
neurox.interpretation.ablation.filter_activations_by_layers(X, layers_to_keep, num_layers, bidirectional_filtering='none')[source]¶
Filter activations so that they only contain specific layers.
Useful for performing layer-wise analysis.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
layers_to_keep (list or numpy.ndarray) – List of layers to keep. Layers are 0-indexed
num_layers (int) – Total number of layers in the original model.
bidirectional_filtering (str) – Can be either “none” (Default), “forward” or “backward”. Useful if the model being analyzed is bi-directional and only layers in a certain direction need to be analyzed.
- Returns
filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS_PER_LAYER * NUM_LAYERS]. The second dimension is doubled if the original model is bidirectional and no filtering is done.
- Return type
numpy.ndarray
Notes
For bidirectional models, the method assumes that the internal structure is as follows: forward layer 0 neurons, backward layer 0 neurons, forward layer 1 neurons …
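A short sketch of extracting a single layer's activations for layer-wise analysis, assuming a 13-layer model (embedding plus 12 layers) with 768 neurons per layer; the shapes are illustrative:

    import numpy as np
    import neurox.interpretation.ablation as ablation

    NUM_LAYERS = 13          # illustrative: embedding layer + 12 model layers
    NEURONS_PER_LAYER = 768  # illustrative

    # Stand-in for the concatenated activations from create_tensors:
    # [NUM_TOKENS x (NUM_LAYERS * NEURONS_PER_LAYER)]
    X = np.random.rand(100, NUM_LAYERS * NEURONS_PER_LAYER).astype(np.float32)

    # Keep only the activations of layer 5 (layers are 0-indexed).
    X_layer5 = ablation.filter_activations_by_layers(X, [5], NUM_LAYERS)
    print(X_layer5.shape)  # expected: (100, 768)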
neurox.interpretation.clustering¶
Module for clustering analysis.
This module contains functions to perform clustering analysis on neuron activations.
neurox.interpretation.clustering.create_correlation_clusters(X, use_abs_correlation=True, clustering_threshold=0.5, method='average')[source]¶
Create clusters based on neuron activation correlation. All neurons in the same cluster are “highly correlated”, i.e. they fire similarly on similar inputs.
- Parameters
X (numpy.ndarray) – Matrix of size [ NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
use_abs_correlation (bool, optional) – Whether to use absolute correlation values. Two neurons that are correlated in the opposite direction may represent the same “knowledge” in a large neural network.
clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.
- Returns
cluster_labels – List of cluster labels for every neuron
- Return type
list
neurox.interpretation.clustering.extract_independent_neurons(X, use_abs_correlation=True, clustering_threshold=0.5)[source]¶
Extract independent neurons from the given set of neurons.
This method first clusters all of the given neurons, with every cluster representing similar neurons. A single neuron is then picked randomly from every cluster, and these picks form the final set of independent neurons that is returned.
- Parameters
X (numpy.ndarray) – Matrix of size [ NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
use_abs_correlation (bool, optional) – Whether to use absolute correlation values. Two neurons that are correlated in the opposite direction may represent the same “knowledge” in a large neural network.
clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.
- Returns
independent_neurons – List of non-redundant independent neurons
- Return type
list
neurox.interpretation.clustering.print_clusters(cluster_labels)[source]¶
Utility function for printing clusters.
- Parameters
cluster_labels (list) – List of cluster labels for every neuron. Usually the output of interpretation.clustering.create_correlation_clusters.
neurox.interpretation.clustering.scikit_extract_independent_neurons(X, clustering_threshold=0.5)[source]¶
Alternative implementation of interpretation.clustering.extract_independent_neurons.
This is an alternative implementation of the extract_independent_neurons function using scikit-learn to create the correlation matrix instead of numpy. Should give identical results.
- Parameters
X (numpy.ndarray) – Matrix of size [ NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.
- Returns
independent_neurons (list) – List of non-redundant independent neurons
cluster_labels (list) – List of cluster labels for every neuron
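A minimal sketch of the clustering workflow on a synthetic activation matrix (the data and threshold are illustrative; the scikit variant is used here because it is documented above to return both the independent neurons and the cluster labels):

    import numpy as np
    import neurox.interpretation.clustering as clustering

    # Stand-in for a [NUM_TOKENS x NUM_NEURONS] matrix from create_tensors.
    X = np.random.rand(200, 64).astype(np.float32)

    # Cluster neurons by correlation and inspect the clusters.
    cluster_labels = clustering.create_correlation_clusters(
        X, use_abs_correlation=True, clustering_threshold=0.5
    )
    clustering.print_clusters(cluster_labels)

    # Pick one representative neuron per cluster.
    independent_neurons, labels = clustering.scikit_extract_independent_neurons(
        X, clustering_threshold=0.5
    )
    print(len(independent_neurons), "independent neurons")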
neurox.interpretation.linear_probe¶
Module for layer and neuron level linear-probe based analysis.
This module contains functions to train, evaluate and use a linear probe for both layer-wise and neuron-wise analysis.
class neurox.interpretation.linear_probe.LinearProbe(input_size, num_classes)[source]¶
Bases: torch.nn.modules.module.Module
Torch model for linear probe
training: bool¶
neurox.interpretation.linear_probe.l1_penalty(var)[source]¶
L1/Lasso regularization penalty.
- Parameters
var (torch.Variable) – Torch variable representing the weight matrix over which the penalty should be computed
- Returns
penalty – Torch variable containing the penalty as a single floating point value
- Return type
torch.Variable
neurox.interpretation.linear_probe.l2_penalty(var)[source]¶
L2/Ridge regularization penalty.
- Parameters
var (torch.Variable) – Torch variable representing the weight matrix over which the penalty should be computed
- Returns
penalty – Torch variable containing the penalty as a single floating point value
- Return type
torch.Variable
Notes
The penalty is derived from the L2-norm, which has a square root. The exact optimization can also be done without the square root, but this makes no difference in the actual output of the optimization because of the scaling factor used along with the penalty.
neurox.interpretation.linear_probe.train_logistic_regression_probe(X_train, y_train, lambda_l1=0, lambda_l2=0, num_epochs=10, batch_size=32, learning_rate=0.001)[source]¶
Train a logistic regression probe.
This method trains a linear classifier that can be used as a probe to perform neuron analysis. Use this method when the task that is being probed for is a classification task. A logistic regression model is trained with Cross Entropy loss. The optimizer used is Adam with default torch.optim package hyperparameters.
- Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32
y_train (numpy.ndarray) – Numpy Vector with 0-indexed class labels for each input token. The size of the vector must be [NUM_TOKENS]. Usually the output of interpretation.utils.create_tensors. Assumes that class labels are continuous from 0 to NUM_CLASSES-1. dtype of the vector must be np.int
lambda_l1 (float, optional) – L1 Penalty weight in the overall loss. Defaults to 0, i.e. no L1 regularization
lambda_l2 (float, optional) – L2 Penalty weight in the overall loss. Defaults to 0, i.e. no L2 regularization
num_epochs (int, optional) – Number of epochs to train the linear model for. Defaults to 10
batch_size (int, optional) – Batch size for the input to the linear model. Defaults to 32
learning_rate (float, optional) – Learning rate for optimizing the linear model.
- Returns
probe – Trained probe for the given task.
- Return type
interpretation.linear_probe.LinearProbe
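A minimal sketch of training a logistic regression probe on synthetic data (the shapes, labels and regularization weights are illustrative only):

    import numpy as np
    import neurox.interpretation.linear_probe as linear_probe

    # Stand-ins for the outputs of interpretation.utils.create_tensors.
    X_train = np.random.rand(500, 768).astype(np.float32)  # [NUM_TOKENS x NUM_NEURONS]
    y_train = np.random.randint(0, 3, size=500)             # 0-indexed labels, 3 classes

    # Train with a small amount of L1/L2 regularization.
    probe = linear_probe.train_logistic_regression_probe(
        X_train, y_train,
        lambda_l1=0.001, lambda_l2=0.001,
        num_epochs=5, batch_size=32, learning_rate=0.001,
    )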
neurox.interpretation.linear_probe.train_linear_regression_probe(X_train, y_train, lambda_l1=0, lambda_l2=0, num_epochs=10, batch_size=32, learning_rate=0.001)[source]¶
Train a linear regression probe.
This method trains a linear model that can be used as a probe to perform neuron analysis. Use this method when the task that is being probed for is a regression task. A linear regression model is trained with MSE loss. The optimizer used is Adam with default torch.optim package hyperparameters.
- Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32
y_train (numpy.ndarray) – Numpy Vector with real-valued labels for each input token. The size of the vector must be [NUM_TOKENS]. Usually the output of interpretation.utils.create_tensors. dtype of the vector must be np.float32
lambda_l1 (float, optional) – L1 Penalty weight in the overall loss. Defaults to 0, i.e. no L1 regularization
lambda_l2 (float, optional) – L2 Penalty weight in the overall loss. Defaults to 0, i.e. no L2 regularization
num_epochs (int, optional) – Number of epochs to train the linear model for. Defaults to 10
batch_size (int, optional) – Batch size for the input to the linear model. Defaults to 32
learning_rate (float, optional) – Learning rate for optimizing the linear model.
- Returns
probe – Trained probe for the given task.
- Return type
interpretation.linear_probe.LinearProbe
neurox.interpretation.linear_probe.evaluate_probe(probe, X, y, idx_to_class=None, return_predictions=False, source_tokens=None, batch_size=32, metric='accuracy')[source]¶
Evaluates a trained probe.
This method evaluates a trained probe on the given data, and supports several standard metrics.
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32
y (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. For classification, 0-indexed class labels for each input token are expected. For regression, a real value per input token is expected. Usually the output of interpretation.utils.create_tensors
idx_to_class (dict, optional) – Class index to name mapping. Usually returned by interpretation.utils.create_tensors. If this mapping is provided, per-class metrics are also computed. Defaults to None.
return_predictions (bool, optional) – If set to True, actual predictions are also returned along with scores for further use. Defaults to False.
source_tokens (list of lists, optional) – List of all sentences, where each is a list of the tokens in that sentence. Usually returned by data.loader.load_data. If provided and return_predictions is True, each prediction will be paired with its original token. Defaults to None.
batch_size (int, optional) – Batch size for the input to the model. Defaults to 32
metric (str, optional) – Metric to use for evaluation scores. For supported metrics see interpretation.metrics
- Returns
scores (dict) – The overall score on the given data with the key __OVERALL__. If idx_to_class mapping is provided, additional keys representing each class and their associated scores are also part of the dictionary.
predictions (list of 3-tuples, optional) – If return_predictions is set to True, this list will contain a 3-tuple for every input sample, representing (source_token, predicted_class, was_predicted_correctly)
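Continuing the training sketch above, a short example of evaluating a trained probe on held-out activations (all data and the class mapping are illustrative stand-ins):

    import numpy as np
    import neurox.interpretation.linear_probe as linear_probe

    # Illustrative data; in practice these come from create_tensors.
    X_train = np.random.rand(500, 768).astype(np.float32)
    y_train = np.random.randint(0, 3, size=500)
    X_dev = np.random.rand(100, 768).astype(np.float32)
    y_dev = np.random.randint(0, 3, size=100)
    idx_to_class = {0: "NOUN", 1: "VERB", 2: "OTHER"}  # illustrative mapping

    probe = linear_probe.train_logistic_regression_probe(X_train, y_train, num_epochs=2)

    scores = linear_probe.evaluate_probe(
        probe, X_dev, y_dev, idx_to_class=idx_to_class, metric="accuracy"
    )
    print(scores["__OVERALL__"])  # overall accuracy; per-class scores are also present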
neurox.interpretation.linear_probe.get_top_neurons(probe, percentage, class_to_idx)[source]¶
Get top neurons from a trained probe.
This method returns the set of all top neurons based on the given percentage. It also returns top neurons per class. All neurons (sorted by weight in ascending order) that account for percentage of the total weight mass are returned. See the given reference for the complete selection algorithm description.
Note
Absolute weight values are used for selection, instead of raw signed values
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
percentage (float) – Real number between 0 and 1, with 0 representing no weight mass and 1 representing the entire weight mass, i.e. all neurons.
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
- Returns
overall_top_neurons (numpy.ndarray) – Numpy array with all top neurons
top_neurons (dict) – Dictionary with top neurons for every class, with the class name as the key and numpy.ndarray of top neurons (for that class) as the value.
Notes
One can expect distributed tasks to have more top neurons than focused tasks. One can also expect complex tasks to have more top neurons than simpler tasks.
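A short sketch of extracting top neurons from a trained probe; the training data and the class mapping are illustrative:

    import numpy as np
    import neurox.interpretation.linear_probe as linear_probe

    X_train = np.random.rand(500, 768).astype(np.float32)  # illustrative
    y_train = np.random.randint(0, 2, size=500)
    class_to_idx = {"NOUN": 0, "OTHER": 1}  # illustrative mapping

    probe = linear_probe.train_logistic_regression_probe(X_train, y_train, num_epochs=2)

    # Neurons accounting for 10% of the weight mass, overall and per class.
    overall_top_neurons, top_neurons = linear_probe.get_top_neurons(
        probe, 0.1, class_to_idx
    )
    print(len(overall_top_neurons), {k: len(v) for k, v in top_neurons.items()})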
neurox.interpretation.linear_probe.get_top_neurons_hard_threshold(probe, fraction, class_to_idx)[source]¶
Get top neurons from a trained probe based on the maximum weight.
This method returns the set of all top neurons based on the given threshold. All neurons that have a weight above threshold * max_weight are considered as top neurons. It also returns top neurons per class.
Note
Absolute weight values are used for selection, instead of raw signed values
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
fraction (float) – Fraction of maximum weight per class to use for selection
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
- Returns
overall_top_neurons (numpy.ndarray) – Numpy array with all top neurons
top_neurons (dict) – Dictionary with top neurons for every class, with the class name as the key and numpy.ndarray of top neurons (for that class) as the value.
neurox.interpretation.linear_probe.get_bottom_neurons(probe, percentage, class_to_idx)[source]¶
Get bottom neurons from a trained probe.
Analogous to interpretation.linear_probe.get_top_neurons. This method returns the set of all bottom neurons based on the given percentage. It also returns bottom neurons per class. All neurons (sorted by weight in ascending order) that account for percentage of the total weight mass are returned. See the given reference for the complete selection algorithm description.
Note
Absolute weight values are used for selection, instead of raw signed values
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
percentage (float) – Real number between 0 and 1, with 0 representing no weight mass and 1 representing the entire weight mass, i.e. all neurons.
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
- Returns
overall_bottom_neurons (numpy.ndarray) – Numpy array with all bottom neurons
bottom_neurons (dict) – Dictionary with bottom neurons for every class, with the class name as the key and numpy.ndarray of bottom neurons (for that class) as the value.
neurox.interpretation.linear_probe.get_random_neurons(probe, probability)[source]¶
Get random neurons from a trained probe.
This method returns a random set of neurons based on the probability. Each neuron is either discarded or included based on a uniform random variable’s value (included if it is less than probability, discarded otherwise).
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
probability (float) – Real number between 0 and 1, with 0 representing no selection and 1 representing selection of all neurons.
- Returns
random_neurons – Numpy array with random neurons
- Return type
numpy.ndarray
neurox.interpretation.linear_probe.get_neuron_ordering(probe, class_to_idx, search_stride=100)[source]¶
Get global ordering of neurons from a trained probe.
This method returns the global ordering of neurons in a model based on the given probe’s weight values. Top neurons are computed at increasing percentages of the weight mass and then accumulated in-order. See given reference for a complete description of the selection algorithm.
For example, if the neuron list at 1% weight mass is [#2, #52, #134], and at 2% weight mass is [#2, #4, #52, #123, #130, #134, #567], the returned ordering will be [#2, #52, #134, #4, #123, #130, #567]. Within each percentage, the ordering of neurons is arbitrary. In this case, the importance of #2, #52 and #134 is not necessarily in that order. The cutoffs between each percentage selection are also returned. Increasing the search_stride will decrease the distance between each cutoff, making the overall ordering more accurate.
Note
Absolute weight values are used for selection, instead of raw signed values
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
search_stride (int, optional) – Defines how many pieces the percent weight mass selection is divided into. Higher leads to a more accurate ordering. Defaults to 100.
- Returns
global_neuron_ordering (numpy.ndarray) – Numpy array of size NUM_NEURONS with neurons in decreasing order of importance.
cutoffs (list) – Indices where each percentage selection begins. All neurons between two cutoff values are arbitrarily ordered.
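A sketch of computing a global neuron ordering from a trained probe (the data and class mapping are the same illustrative stand-ins used in the earlier sketches):

    import numpy as np
    import neurox.interpretation.linear_probe as linear_probe

    X_train = np.random.rand(500, 768).astype(np.float32)  # illustrative
    y_train = np.random.randint(0, 2, size=500)
    class_to_idx = {"NOUN": 0, "OTHER": 1}  # illustrative mapping

    probe = linear_probe.train_logistic_regression_probe(X_train, y_train, num_epochs=2)

    ordering, cutoffs = linear_probe.get_neuron_ordering(
        probe, class_to_idx, search_stride=100
    )
    print(ordering[:10])  # ten most important neurons (order within a cutoff is arbitrary)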
neurox.interpretation.linear_probe.get_neuron_ordering_granular(probe, class_to_idx, granularity=50, search_stride=100)[source]¶
Get global ordering of neurons from a trained probe.
This method is an alternative to interpretation.linear_probe.get_neuron_ordering. It works very similarly to that method, except that instead of adding the neurons from each percentage selection, neurons are added in chunks of granularity neurons.
Note
Absolute weight values are used for selection, instead of raw signed values
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
granularity (int, optional) – Approximate number of neurons in each chunk of selection. Defaults to 50.
search_stride (int, optional) – Defines how many pieces the percent weight mass selection is divided into. Higher leads to a more accurate ordering. Defaults to 100.
- Returns
global_neuron_ordering (numpy.ndarray) – Numpy array of size NUM_NEURONS with neurons in decreasing order of importance.
cutoffs (list) – Indices where each chunk of selection begins. Each chunk will contain approximately granularity neurons. All neurons between two cutoff values (i.e. a chunk) are arbitrarily ordered.
neurox.interpretation.linear_probe.get_fixed_number_of_bottom_neurons(probe, num_bottom_neurons, class_to_idx)[source]¶
Get global bottom neurons.
This method returns a fixed number of bottom neurons from the global ordering computed using interpretation.linear_probe.get_neuron_ordering.
Note
Absolute weight values are used for selection, instead of raw signed values
- Parameters
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
num_bottom_neurons (int) – Number of bottom neurons for selection
class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.
- Returns
global_bottom_neurons – Numpy array of size num_bottom_neurons with bottom neurons using the global ordering
- Return type
numpy.ndarray
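The neuron orderings above can be combined with the ablation module, for example to re-train a probe on only the top-ranked neurons; a sketch under the same illustrative data assumptions as the earlier examples:

    import numpy as np
    import neurox.interpretation.ablation as ablation
    import neurox.interpretation.linear_probe as linear_probe

    X_train = np.random.rand(500, 768).astype(np.float32)  # illustrative
    y_train = np.random.randint(0, 2, size=500)
    class_to_idx = {"NOUN": 0, "OTHER": 1}  # illustrative mapping

    # Full probe and global neuron ordering.
    probe = linear_probe.train_logistic_regression_probe(X_train, y_train, num_epochs=2)
    ordering, cutoffs = linear_probe.get_neuron_ordering(probe, class_to_idx)

    # Keep only the top 100 neurons and train a smaller probe on them.
    top_neurons = ordering[:100]
    X_top = ablation.filter_activations_keep_neurons(X_train, top_neurons)
    probe_top = linear_probe.train_logistic_regression_probe(X_top, y_train, num_epochs=2)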
neurox.interpretation.metrics¶
Module that wraps around several standard metrics
neurox.interpretation.metrics.accuracy(preds, labels)[source]¶
Accuracy.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
accuracy – Accuracy of the model
- Return type
float
neurox.interpretation.metrics.f1(preds, labels)[source]¶
F-Score or F1 score.
Note
The implementation from sklearn.metrics is used to compute the score.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
f1_score – F-Score of the model
- Return type
float
neurox.interpretation.metrics.accuracy_and_f1(preds, labels)[source]¶
Mean of Accuracy and F-Score.
Note
The implementation from sklearn.metrics is used to compute the F-Score.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
acc_f1_mean – Mean of Accuracy and F-Score of the model
- Return type
float
neurox.interpretation.metrics.pearson(preds, labels)[source]¶
Pearson’s correlation coefficient.
Note
The implementation from scipy.stats is used to compute the score.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
pearson_score – Pearson’s correlation coefficient of the model
- Return type
float
neurox.interpretation.metrics.spearman(preds, labels)[source]¶
Spearman correlation coefficient.
Note
The implementation from scipy.stats is used to compute the score.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
spearman_score – Spearman correlation coefficient of the model
- Return type
float
neurox.interpretation.metrics.pearson_and_spearman(preds, labels)[source]¶
Mean of Pearson and Spearman correlation coefficients.
Note
The implementation from scipy.stats is used to compute the scores.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
pearson_spearman_mean – Mean of Pearson and Spearman correlation coefficients of the model
- Return type
float
neurox.interpretation.metrics.matthews_corrcoef(preds, labels)[source]¶
Matthews correlation coefficient.
Note
The implementation from sklearn.metrics is used to compute the score.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
- Returns
mcc_score – Matthews correlation coefficient of the model
- Return type
float
neurox.interpretation.metrics.compute_score(preds, labels, metric)[source]¶
Utility function to compute scores using several metrics.
- Parameters
preds (list or numpy.ndarray) – A list of predictions from a model
labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds
metric (str) – One of accuracy, f1, accuracy_and_f1, pearson, spearman, pearson_and_spearman or matthews_corrcoef.
- Returns
score – Score of the model with the chosen metric
- Return type
float
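A quick sketch of computing scores with compute_score (the predictions and labels below are illustrative):

    import neurox.interpretation.metrics as metrics

    preds = [0, 1, 1, 0, 1]   # illustrative model predictions
    labels = [0, 1, 0, 0, 1]  # illustrative ground truth

    print(metrics.compute_score(preds, labels, "accuracy"))  # 0.8 for this toy data
    print(metrics.compute_score(preds, labels, "f1"))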
neurox.interpretation.probeless¶
Module for the Probeless method.
This module extracts neuron rankings for a label/tag (e.g. verbs) or for an entire property set (e.g. part of speech) without training any probes.
neurox.interpretation.probeless.get_neuron_ordering(X_train, y_train)[source]¶
Returns a list of top neurons w.r.t. the overall task, e.g. POS.
- Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.
- Returns
ranking – List of NUM_NEURONS neuron indices, in decreasing order of importance.
- Return type
list
neurox.interpretation.probeless.get_neuron_ordering_for_tag(X_train, y_train, label2idx, tag)[source]¶
Returns a list of top neurons w.r.t. a tag, e.g. noun.
- Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.
label2idx (dict) – Class name to index mapping. Usually returned by interpretation.utils.create_tensors.
tag (string) – Tag for which rankings are extracted
- Returns
ranking – List of NUM_NEURONS neuron indices, in decreasing order of importance.
- Return type
list
neurox.interpretation.probeless.get_neuron_ordering_for_all_tags(X_train, y_train, idx2label)[source]¶
Returns a dictionary of tags along with top neurons for each tag, and a list of the overall ranking.
- Parameters
X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.
idx2label (dict) – Class index to name mapping. Usually returned by interpretation.utils.create_tensors.
- Returns
overall_ranking (list) – List of NUM_NEURONS neuron indices, in decreasing order of importance.
ranking_per_tag (dict) – Dictionary with top neurons for every class, with the class name as the key and list of neurons as the values.
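A minimal sketch of the probeless rankings on synthetic data (shapes, labels and the tag name are illustrative):

    import numpy as np
    import neurox.interpretation.probeless as probeless

    # Illustrative stand-ins for create_tensors outputs.
    X_train = np.random.rand(500, 768).astype(np.float32)
    y_train = np.random.randint(0, 2, size=500)
    label2idx = {"NOUN": 0, "OTHER": 1}  # illustrative mapping

    # Overall ranking for the task, without training any probe.
    overall_ranking = probeless.get_neuron_ordering(X_train, y_train)

    # Ranking for a single tag.
    noun_ranking = probeless.get_neuron_ordering_for_tag(X_train, y_train, label2idx, "NOUN")
    print(overall_ranking[:10], noun_ranking[:10])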
neurox.interpretation.utils¶
neurox.interpretation.utils.isnotebook()[source]¶
Utility function to detect if the code being run is within a jupyter notebook. Useful, for example, to change progress indicators.
- Returns
isnotebook – True if the function is being called inside a notebook, False otherwise.
- Return type
bool
neurox.interpretation.utils.get_progress_bar()[source]¶
Utility function to get a progress bar depending on the environment the code is running in. A normal text-based progress bar is returned in normal shells, and a notebook widget-based progress bar is returned in jupyter notebooks.
- Returns
progressbar – The appropriate progressbar from the tqdm library.
- Return type
function
neurox.interpretation.utils.batch_generator(X, y, batch_size=32)[source]¶
Generator function to generate batches of data for training/evaluation.
This function takes two tensors representing the activations and labels respectively, and yields batches of parallel data. The last batch may contain fewer than batch_size elements.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors
y (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. For classification, 0-indexed class labels for each input token are expected. For regression, a real value per input token is expected. Usually the output of interpretation.utils.create_tensors
batch_size (int, optional) – Number of samples to return in each call. Defaults to 32.
- Yields
X_batch (numpy.ndarray) – Numpy Matrix of size [batch_size x NUM_NEURONS]. The final batch may have fewer elements than the requested batch_size
y_batch (numpy.ndarray) – Numpy Vector of size [batch_size]. The final batch may have fewer elements than the requested batch_size
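A short sketch of iterating over batches (the data shapes are illustrative):

    import numpy as np
    import neurox.interpretation.utils as utils

    X = np.random.rand(100, 768).astype(np.float32)  # illustrative activations
    y = np.random.randint(0, 3, size=100)            # illustrative labels

    for X_batch, y_batch in utils.batch_generator(X, y, batch_size=32):
        # Batches of 32, 32, 32 and finally 4 samples.
        print(X_batch.shape, y_batch.shape)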
neurox.interpretation.utils.tok2idx(tokens)[source]¶
Utility function to generate unique indices for a set of tokens.
- Parameters
tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data
- Returns
tok2idx_mapping – A dictionary with tokens as keys and a unique index for each token as values
- Return type
dict
neurox.interpretation.utils.idx2tok(srcidx)[source]¶
Utility function to create an inverse mapping from a tok2idx mapping.
- Parameters
tok2idx_mapping (dict) – Token to index mapping, usually the output of interpretation.utils.tok2idx.
- Returns
idx2tok – A dictionary with unique indices as keys and their associated tokens as values
- Return type
dict
neurox.interpretation.utils.count_target_words(tokens)[source]¶
Utility function to count the total number of tokens in a dataset.
- Parameters
tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data
- Returns
count – Total number of tokens in the given tokens structure
- Return type
int
neurox.interpretation.utils.create_tensors(tokens, activations, task_specific_tag, mappings=None, task_type='classification', binarized_tag=None, balance_data=False)[source]¶
Method to pre-process loaded datasets into tensors that can be used to train probes and perform analysis on. The input tokens are represented as a list of sentences, where each sentence is a list of tokens. Each token also has an associated label. All tokens from all sentences are flattened into one dimension in the returned tensors. The returned tensors will thus have total_num_tokens rows.
- Parameters
tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data
activations (list of numpy.ndarray) – List of sentence representations, where each sentence representation is a numpy matrix of shape [num tokens in sentence x concatenated representation size]. Usually returned from data.loader.load_activations
task_specific_tag (str) – Label to assign tokens with unseen labels. This is particularly useful if some labels are never seen during train, but are present in the dev or test set. This is usually set to the majority class in the task.
mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts src2idx and idx2src for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa. Usually returned from a previous call to create_tensors.
task_type (str) – Either “classification” or “regression”, indicating the kind of task that is being probed.
binarized_tag (str, optional) – Tag/Label to create binary data. All other labels in the dataset are changed to OTHER. Defaults to None, in which case the data labels are processed as-is.
balance_data (bool, optional) – Whether the incoming data should be balanced. Data is balanced using utils.balance_binary_class_data for binary data and utils.balance_multi_class_data for multi-class data using undersampling. Defaults to False.
- Returns
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]
y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]
mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts src2idx and idx2src for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa.
Notes
mappings should be created exactly once, and should be reused for subsequent calls. For example, mappings can be created on train data, and then passed during the calls for dev and test data.
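A sketch of the mappings-reuse pattern described in the notes above, assuming tokens and activations have already been loaded with data.loader.load_data and data.loader.load_activations (the helper function, variable names and the "NN" majority tag are illustrative):

    import neurox.interpretation.utils as utils

    def prepare_splits(train_tokens, train_activations, dev_tokens, dev_activations,
                       task_specific_tag="NN"):
        """Create train/dev tensors, creating the mappings once on train data."""
        # First call: mappings are created from the training data.
        X_train, y_train, mappings = utils.create_tensors(
            train_tokens, train_activations, task_specific_tag
        )
        # Later calls: reuse the same mappings so label/token indices stay consistent.
        X_dev, y_dev, mappings = utils.create_tensors(
            dev_tokens, dev_activations, task_specific_tag, mappings=mappings
        )
        return X_train, y_train, X_dev, y_dev, mappings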
neurox.interpretation.utils.print_overall_stats(all_results)[source]¶
Method to pretty print overall results.
Warning
This method was primarily written to process results from internal scripts and pipelines.
- Parameters
all_results (dict) – Dictionary containing the probe, overall scores, scores from selected neurons, neuron ordering and neuron selections at various percentages
neurox.interpretation.utils.print_machine_stats(all_results)[source]¶
Method to print overall results in tsv format.
Warning
This method was primarily written to process results from internal scripts and pipelines.
- Parameters
all_results (dict) – Dictionary containing the probe, overall scores, scores from selected neurons, neuron ordering and neuron selections at various percentages
neurox.interpretation.utils.balance_binary_class_data(X, y)[source]¶
Method to balance binary class data.
Note
The majority class is under-sampled randomly to match the minority class in size.
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually returned from interpretation.utils.create_tensors
y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]. Usually returned from interpretation.utils.create_tensors
- Returns
X_balanced (numpy.ndarray) – Numpy matrix of size [NUM_BALANCED_TOKENS x NUM_NEURONS]
y_balanced (numpy.ndarray) – Numpy vector of size [NUM_BALANCED_TOKENS]
neurox.interpretation.utils.balance_multi_class_data(X, y, num_required_instances=None)[source]¶
Method to balance multi class data.
Note
All classes are under-sampled randomly to match the size of the minority class. If num_required_instances is provided, all classes are sampled proportionally so that the total number of selected examples is approximately num_required_instances (because of rounding proportions).
- Parameters
X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually returned from interpretation.utils.create_tensors
y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]. Usually returned from interpretation.utils.create_tensors
num_required_instances (int, optional) – Total number of required instances. All classes are sampled proportionally.
- Returns
X_balanced (numpy.ndarray) – Numpy matrix of size [NUM_BALANCED_TOKENS x NUM_NEURONS]
y_balanced (numpy.ndarray) – Numpy vector of size [NUM_BALANCED_TOKENS]
neurox.interpretation.utils.load_probe(probe_path)[source]¶
Loads a probe and its associated mappings from probe_path.
Warning
This method is currently not implemented.
- Parameters
probe_path (str) – Path to a pkl object saved by interpretation.utils.save_probe
- Returns
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts src2idx and idx2src for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa.
neurox.interpretation.utils.save_probe(probe_path, probe, mappings)[source]¶
Saves a model and its associated mappings as a pkl object at probe_path.
Warning
This method is currently not implemented.
- Parameters
probe_path (str) – Path to save a pkl object
probe (interpretation.linear_probe.LinearProbe) – Trained probe model
mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts src2idx and idx2src for regression tasks. Each dict represents either the mapping from class labels to indices and source tokens to indices or vice versa.
Module contents: