neurox.interpretation

Submodules:

neurox.interpretation.ablation

Module for ablating neurons using various techniques.

This module provides a set of methods to ablate both layers and individual neurons from a given set.

neurox.interpretation.ablation.keep_specific_neurons(X, neuron_list)[source]

Filter activations so that they only contain specific neurons.

Warning

This function is deprecated and will be removed in future versions. Use interpretation.ablation.filter_activations_keep_neurons instead.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • neuron_list (list or numpy.ndarray) – List of neurons to keep

Returns

filtered_X – Numpy Matrix of size [NUM_TOKENS x len(neuron_list)]

Return type

numpy.ndarray view

neurox.interpretation.ablation.filter_activations_keep_neurons(X, neurons_to_keep)[source]

Filter activations so that they only contain specific neurons.

Note

The returned value is a view, so modifying it will modify the original matrix.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • neurons_to_keep (list or numpy.ndarray) – List of neurons to keep

Returns

filtered_X – Numpy Matrix of size [NUM_TOKENS x len(neurons_to_keep)]

Return type

numpy.ndarray view

neurox.interpretation.ablation.filter_activations_remove_neurons(X, neurons_to_remove)[source]

Filter activations so that they do not contain specific neurons.

Note

The returned value is a view, so modifying it will modify the original matrix.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • neurons_to_remove (list or numpy.ndarray) – List of neurons to remove

Returns

filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS - len(neurons_to_remove)]

Return type

numpy.ndarray view
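Example (a minimal sketch, assuming NeuroX is installed; the toy matrix below stands in for the output of interpretation.utils.create_tensors):

    import numpy as np
    from neurox.interpretation import ablation

    X = np.random.rand(10, 768).astype(np.float32)  # toy activations

    # Keep only neurons 0, 5 and 100; result has shape [10 x 3]
    X_kept = ablation.filter_activations_keep_neurons(X, [0, 5, 100])

    # Remove the same neurons instead; result has shape [10 x 765]
    X_removed = ablation.filter_activations_remove_neurons(X, [0, 5, 100])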

neurox.interpretation.ablation.zero_out_activations_keep_neurons(X, neurons_to_keep)[source]

Mask the activations of all neurons with zero, other than the specified neurons.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • neurons_to_keep (list or numpy.ndarray) – List of neurons to not mask

Returns

filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]

Return type

numpy.ndarray

neurox.interpretation.ablation.zero_out_activations_remove_neurons(X, neurons_to_remove)[source]

Mask specific neuron activations with zero.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • neurons_to_remove (list or numpy.ndarray) – List of neurons to mask

Returns

filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]

Return type

numpy.ndarray
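Example (a minimal sketch, assuming NeuroX is installed; both calls preserve the original [NUM_TOKENS x NUM_NEURONS] shape and only zero out values):

    import numpy as np
    from neurox.interpretation import ablation

    X = np.random.rand(10, 768).astype(np.float32)

    # Zero out everything except neurons 0, 5 and 100
    X_only = ablation.zero_out_activations_keep_neurons(X, [0, 5, 100])

    # Zero out exactly neurons 0, 5 and 100 and keep the rest
    X_without = ablation.zero_out_activations_remove_neurons(X, [0, 5, 100])

    print(X_only.shape, X_without.shape)  # both (10, 768)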

neurox.interpretation.ablation.filter_activations_by_layers(X, layers_to_keep, num_layers, bidirectional_filtering='none')[source]

Filter activations so that they only contain specific layers.

Useful for performing layer-wise analysis.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • layers_to_keep (list or numpy.ndarray) – List of layers to keep. Layers are 0-indexed

  • num_layers (int) – Total number of layers in the original model.

  • bidirectional_filtering (str) – Can be either “none” (Default), “forward” or “backward”. Useful if the model being analyzed is bi-directional and only layers in a certain direction need to be analyzed.

Returns

filtered_X – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS_PER_LAYER * NUM_LAYERS] The second dimension is doubled if the original model is bidirectional and no filtering is done.

Return type

numpy.ndarray

Notes

For bidirectional models, the method assumes that the internal structure is as follows: forward layer 0 neurons, backward layer 0 neurons, forward layer 1 neurons, backward layer 1 neurons, and so on.
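Example (a minimal sketch, assuming NeuroX is installed; the 13-layer, 768-neurons-per-layer shape is purely illustrative):

    import numpy as np
    from neurox.interpretation import ablation

    num_layers, neurons_per_layer = 13, 768
    X = np.random.rand(10, num_layers * neurons_per_layer).astype(np.float32)

    # Keep only layers 0 and 1 (layers are 0-indexed)
    X_layers_01 = ablation.filter_activations_by_layers(
        X, layers_to_keep=[0, 1], num_layers=num_layers
    )
    print(X_layers_01.shape)  # (10, 2 * 768)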

neurox.interpretation.clustering

Module for clustering analysis.

This module contains functions to perform clustering analysis on neuron activations.

neurox.interpretation.clustering.create_correlation_clusters(X, use_abs_correlation=True, clustering_threshold=0.5, method='average')[source]

Create clusters based on neuron activation correlation. All neurons in the same cluster are “highly correlated”, i.e. they fire similarly on similar inputs.

Parameters
  • X (numpy.ndarray) – Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • use_abs_correlation (bool, optional) – Whether to use absolute correlation values. Two neurons that are correlated in the opposite direction may represent the same “knowledge” in a large neural network.

  • clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.

Returns

cluster_labels – List of cluster labels for every neuron

Return type

list

neurox.interpretation.clustering.extract_independent_neurons(X, use_abs_correlation=True, clustering_threshold=0.5)[source]

Extract independent neurons from the given set of neurons.

This method first clusters all of the given neurons, with every cluster representing similar neurons. A single neuron is then picked randomly from every cluster, and this forms the final set of independent neurons that is returned.

Parameters
  • X (numpy.ndarray) – Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • use_abs_correlation (bool, optional) – Whether to use absolute correlation values. Two neurons that are correlated in the opposite direction may represent the same “knowledge” in a large neural network.

  • clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.

Returns

independent_neurons – List of non-redundant independent neurons

Return type

list

neurox.interpretation.clustering.print_clusters(cluster_labels)[source]

Utility function for printing clusters

Parameters

cluster_labels (list) – List of cluster labels for every neuron. Usually the output of interpretation.clustering.create_correlation_clusters.

neurox.interpretation.clustering.scikit_extract_independent_neurons(X, clustering_threshold=0.5)[source]

Alternative implementation of interpretation.clustering.extract_independent_neurons.

This is an alternative implementation of the extract_independent_neurons function using scikit-learn to create the correlation matrix instead of numpy. Should give identical results.

Parameters
  • X (numpy.ndarray) – Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • clustering_threshold (float, optional) – Hyperparameter for clustering. This is used as the threshold to convert hierarchical clusters into flat clusters.

Returns

  • independent_neurons (list) – List of non-redundant independent neurons

  • cluster_labels (list) – List of cluster labels for every neuron
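Example (a minimal sketch of the clustering utilities above, assuming NeuroX is installed; the toy matrix stands in for the output of interpretation.utils.create_tensors):

    import numpy as np
    from neurox.interpretation import clustering

    X = np.random.rand(200, 64).astype(np.float32)  # 200 tokens, 64 neurons

    # Group the 64 neurons into correlation-based clusters and print them
    cluster_labels = clustering.create_correlation_clusters(
        X, use_abs_correlation=True, clustering_threshold=0.5
    )
    clustering.print_clusters(cluster_labels)

    # Keep one (randomly chosen) representative neuron per cluster
    independent_neurons = clustering.extract_independent_neurons(
        X, clustering_threshold=0.5
    )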

neurox.interpretation.linear_probe

Module for layer and neuron level linear-probe based analysis.

This module contains functions to train, evaluate and use a linear probe for both layer-wise and neuron-wise analysis.

class neurox.interpretation.linear_probe.LinearProbe(input_size, num_classes)[source]

Bases: torch.nn.modules.module.Module

Torch model for linear probe

__init__(input_size, num_classes)[source]

Initialize a linear model

forward(x)[source]

Run a forward pass on the model

training: bool

neurox.interpretation.linear_probe.l1_penalty(var)[source]

L1/Lasso regularization penalty

Parameters

var (torch.Variable) – Torch variable representing the weight matrix over which the penalty should be computed

Returns

penalty – Torch variable containing the penalty as a single floating point value

Return type

torch.Variable

neurox.interpretation.linear_probe.l2_penalty(var)[source]

L2/Ridge regularization penalty.

Parameters

var (torch.Variable) – Torch variable representing the weight matrix over which the penalty should be computed

Returns

penalty – Torch variable containing the penalty as a single floating point value

Return type

torch.Variable

Notes

The penalty is derived from the L2-norm, which has a square root. The exact optimization can also be done without the square root, but this makes no difference in the actual output of the optimization because of the scaling factor used along with the penalty.

neurox.interpretation.linear_probe.train_logistic_regression_probe(X_train, y_train, lambda_l1=0, lambda_l2=0, num_epochs=10, batch_size=32, learning_rate=0.001)[source]

Train a logistic regression probe.

This method trains a linear classifier that can be used as a probe to perform neuron analysis. Use this method when the task that is being probed for is a classification task. A logistic regression model is trained with Cross Entropy loss. The optimizer used is Adam with default torch.optim package hyperparameters.

Parameters
  • X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32

  • y_train (numpy.ndarray) – Numpy Vector with 0-indexed class labels for each input token. The size of the vector must be [NUM_TOKENS]. Usually the output of interpretation.utils.create_tensors. Assumes that class labels are continuous from 0 to NUM_CLASSES-1. dtype of the matrix must be np.int

  • lambda_l1 (float, optional) – L1 Penalty weight in the overall loss. Defaults to 0, i.e. no L1 regularization

  • lambda_l2 (float, optional) – L2 Penalty weight in the overall loss. Defaults to 0, i.e. no L2 regularization

  • num_epochs (int, optional) – Number of epochs to train the linear model for. Defaults to 10

  • batch_size (int, optional) – Batch size for the input to the linear model. Defaults to 32

  • learning_rate (float, optional) – Learning rate for optimizing the linear model.

Returns

probe – Trained probe for the given task.

Return type

interpretation.linear_probe.LinearProbe
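Example (a minimal sketch, assuming NeuroX is installed; random data stands in for the output of interpretation.utils.create_tensors):

    import numpy as np
    from neurox.interpretation import linear_probe

    X_train = np.random.rand(500, 768).astype(np.float32)  # must be float32
    y_train = np.random.randint(0, 3, size=500)            # 3 classes, 0-indexed

    # Both L1 and L2 penalties enabled (elastic-net-style regularization)
    probe = linear_probe.train_logistic_regression_probe(
        X_train, y_train, lambda_l1=0.001, lambda_l2=0.001, num_epochs=10
    )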

neurox.interpretation.linear_probe.train_linear_regression_probe(X_train, y_train, lambda_l1=0, lambda_l2=0, num_epochs=10, batch_size=32, learning_rate=0.001)[source]

Train a linear regression probe.

This method trains a linear classifier that can be used as a probe to perform neuron analysis. Use this method when the task that is being probed for is a regression task. A linear regression model is trained with MSE loss. The optimizer used is Adam with default torch.optim package hyperparameters.

Parameters
  • X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32

  • y_train (numpy.ndarray) – Numpy Vector with real-valued labels for each input token. The size of the vector must be [NUM_TOKENS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32

  • lambda_l1 (float, optional) – L1 Penalty weight in the overall loss. Defaults to 0, i.e. no L1 regularization

  • lambda_l2 (float, optional) – L2 Penalty weight in the overall loss. Defaults to 0, i.e. no L2 regularization

  • num_epochs (int, optional) – Number of epochs to train the linear model for. Defaults to 10

  • batch_size (int, optional) – Batch size for the input to the linear model. Defaults to 32

  • learning_rate (float, optional) – Learning rate for optimizing the linear model.

Returns

probe – Trained probe for the given task.

Return type

interpretation.linear_probe.LinearProbe

neurox.interpretation.linear_probe.evaluate_probe(probe, X, y, idx_to_class=None, return_predictions=False, source_tokens=None, batch_size=32, metric='accuracy')[source]

Evaluates a trained probe.

This method evaluates a trained probe on the given data, and supports several standard metrics.

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors. dtype of the matrix must be np.float32

  • y (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. For classification, 0-indexed class labels for each input token are expected. For regression, a real value per input token is expected. Usually the output of interpretation.utils.create_tensors

  • idx_to_class (dict, optional) – Class index to name mapping. Usually returned by interpretation.utils.create_tensors. If this mapping is provided, per-class metrics are also computed. Defaults to None.

  • return_predictions (bool, optional) – If set to True, actual predictions are also returned along with scores for further use. Defaults to False.

  • source_tokens (list of lists, optional) – List of all sentences, where each is a list of the tokens in that sentence. Usually returned by data.loader.load_data. If provided and return_predictions is True, each prediction will be paired with its original token. Defaults to None.

  • batch_size (int, optional) – Batch size for the input to the model. Defaults to 32

  • metric (str, optional) – Metric to use for evaluation scores. Defaults to “accuracy”. For supported metrics, see interpretation.metrics

Returns

  • scores (dict) – The overall score on the given data with the key __OVERALL__. If idx_to_class mapping is provided, additional keys representing each class and their associated scores are also part of the dictionary.

  • predictions (list of 3-tuples, optional) – If return_predictions is set to True, this list will contain a 3-tuple for every input sample, representing (source_token, predicted_class, was_predicted_correctly)
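Example (a minimal sketch, assuming NeuroX is installed; the probe, data and idx_to_class mapping are hypothetical stand-ins for real outputs of train_logistic_regression_probe and interpretation.utils.create_tensors):

    import numpy as np
    from neurox.interpretation import linear_probe

    X = np.random.rand(200, 768).astype(np.float32)
    y = np.random.randint(0, 3, size=200)
    idx_to_class = {0: "NOUN", 1: "VERB", 2: "OTHER"}

    probe = linear_probe.train_logistic_regression_probe(X, y, num_epochs=5)

    scores = linear_probe.evaluate_probe(probe, X, y, idx_to_class=idx_to_class)
    print(scores["__OVERALL__"])   # overall accuracy
    print(scores.get("NOUN"))      # per-class score (idx_to_class was provided)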

neurox.interpretation.linear_probe.get_top_neurons(probe, percentage, class_to_idx)[source]

Get top neurons from a trained probe.

This method returns the set of all top neurons based on the given percentage. It also returns top neurons per class. All neurons (sorted by weight in ascending order) that account for percentage of the total weight mass are returned. See the given reference for the complete selection algorithm description.

Note

Absolute weight values are used for selection, instead of raw signed values

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • percentage (float) – Real number between 0 and 1, with 0 representing no weight mass and 1 representing the entire weight mass, i.e. all neurons.

  • class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.

Returns

  • overall_top_neurons (numpy.ndarray) – Numpy array with all top neurons

  • top_neurons (dict) – Dictionary with top neurons for every class, with the class name as the key and numpy.ndarray of top neurons (for that class) as the value.

Notes

  • One can expect distributed tasks to have more top neurons than focused tasks

  • One can also expect complex tasks to have more top neurons than simpler tasks
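Example (a minimal sketch, assuming NeuroX is installed; the probe and class_to_idx mapping are hypothetical stand-ins for a trained probe and the mapping returned by interpretation.utils.create_tensors):

    import numpy as np
    from neurox.interpretation import linear_probe

    X = np.random.rand(200, 768).astype(np.float32)
    y = np.random.randint(0, 3, size=200)
    class_to_idx = {"NOUN": 0, "VERB": 1, "OTHER": 2}

    probe = linear_probe.train_logistic_regression_probe(X, y, num_epochs=5)

    # Neurons accounting for 10% of the total (absolute) weight mass
    overall_top, top_per_class = linear_probe.get_top_neurons(
        probe, percentage=0.1, class_to_idx=class_to_idx
    )
    print(top_per_class["NOUN"][:10])  # top neurons for the NOUN class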

neurox.interpretation.linear_probe.get_top_neurons_hard_threshold(probe, fraction, class_to_idx)[source]

Get top neurons from a trained probe based on the maximum weight.

This method returns the set of all top neurons based on the given threshold. All neurons that have a weight above threshold * max_weight are considered as top neurons. It also returns top neurons per class.

Note

Absolute weight values are used for selection, instead of raw signed values

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • fraction (float) – Fraction of maximum weight per class to use for selection

  • class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.

Returns

  • overall_top_neurons (numpy.ndarray) – Numpy array with all top neurons

  • top_neurons (dict) – Dictionary with top neurons for every class, with the class name as the key and numpy.ndarray of top neurons (for that class) as the value.

neurox.interpretation.linear_probe.get_bottom_neurons(probe, percentage, class_to_idx)[source]

Get bottom neurons from a trained probe.

Analogous to interpretation.linear_probe.get_top_neurons. This method returns the set of all bottom neurons based on the given percentage. It also returns bottom neurons per class. All neurons (sorted by weight in ascending order) that account for percentage of the total weight mass are returned. See the given reference for the complete selection algorithm description.

Note

Absolute weight values are used for selection, instead of raw signed values

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • percentage (float) – Real number between 0 and 1, with 0 representing no weight mass and 1 representing the entire weight mass, i.e. all neurons.

  • class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.

Returns

  • overall_bottom_neurons (numpy.ndarray) – Numpy array with all bottom neurons

  • bottom_neurons (dict) – Dictionary with bottom neurons for every class, with the class name as the key and numpy.ndarray of bottom neurons (for that class) as the value.

neurox.interpretation.linear_probe.get_random_neurons(probe, probability)[source]

Get random neurons from a trained probe.

This method returns a random set of neurons based on the given probability. Each neuron is either discarded or included based on a uniform random variable’s value (included if it is less than probability, discarded otherwise).

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • probability (float) – Real number between 0 and 1, with 0 representing no selection and 1 representing selection of all neurons.

Returns

random_neurons – Numpy array with random neurons

Return type

numpy.ndarray

neurox.interpretation.linear_probe.get_neuron_ordering(probe, class_to_idx, search_stride=100)[source]

Get global ordering of neurons from a trained probe.

This method returns the global ordering of neurons in a model based on the given probe’s weight values. Top neurons are computed at increasing percentages of the weight mass and then accumulated in-order. See given reference for a complete description of the selection algorithm.

For example, if the neuron list at 1% weight mass is [#2, #52, #134], and at 2% weight mass is [#2, #4, #52, #123, #130, #134, #567], the returned ordering will be [#2, #52, #134, #4, #123, #130, #567]. Within each percentage, the ordering of neurons is arbitrary. In this case, the importance of #2, #52 and #134 is not necessarily in that order. The cutoffs between each percentage selection are also returned. Increasing the search_stride will decrease the distance between each cutoff, making the overall ordering more accurate.

Note

Absolute weight values are used for selection, instead of raw signed values

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.

  • search_stride (int, optional) – Defines how many pieces the percent weight mass selection is divided into. Higher values lead to a more accurate ordering. Defaults to 100.

Returns

  • global_neuron_ordering (numpy.ndarray) – Numpy array of size NUM_NEURONS with neurons in decreasing order of importance.

  • cutoffs (list) – Indices where each percentage selection begins. All neurons between two cutoff values are arbitrarily ordered.
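Example (a minimal sketch, assuming NeuroX is installed; probe and class_to_idx are hypothetical, as in the sketches above):

    import numpy as np
    from neurox.interpretation import linear_probe

    X = np.random.rand(200, 768).astype(np.float32)
    y = np.random.randint(0, 3, size=200)
    class_to_idx = {"NOUN": 0, "VERB": 1, "OTHER": 2}

    probe = linear_probe.train_logistic_regression_probe(X, y, num_epochs=5)

    ordering, cutoffs = linear_probe.get_neuron_ordering(probe, class_to_idx)
    top_100 = ordering[:100]  # 100 globally most important neurons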

neurox.interpretation.linear_probe.get_neuron_ordering_granular(probe, class_to_idx, granularity=50, search_stride=100)[source]

Get global ordering of neurons from a trained probe.

This method is an alternative to interpretation.linear_probe.get_neuron_ordering. It works very similarly to that method, except that instead of adding the neurons from each percentage selection, neurons are added in chunks of granularity neurons.

Note

Absolute weight values are used for selection, instead of raw signed values

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.

  • granularity (int, optional) – Approximate number of neurons in each chunk of selection. Defaults to 50.

  • search_stride (int, optional) – Defines how many pieces the percent weight mass selection is divided into. Higher values lead to a more accurate ordering. Defaults to 100.

Returns

  • global_neuron_ordering (numpy.ndarray) – Numpy array of size NUM_NEURONS with neurons in decreasing order of importance.

  • cutoffs (list) – Indices where each chunk of selection begins. Each chunk will contain approximately granularity neurons. All neurons between two cutoff values (i.e. a chunk) are arbitrarily ordered.

neurox.interpretation.linear_probe.get_fixed_number_of_bottom_neurons(probe, num_bottom_neurons, class_to_idx)[source]

Get global bottom neurons.

This method returns a fixed number of bottom neurons from the global ordering computed using interpretation.linear_probe.get_neuron_ordering.

Note

Absolute weight values are used for selection, instead of raw signed values

Parameters
  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • num_bottom_neurons (int) – Number of bottom neurons for selection

  • class_to_idx (dict) – Class to class index mapping. Usually returned by interpretation.utils.create_tensors.

Returns

global_bottom_neurons – Numpy array of size num_bottom_neurons with bottom neurons using the global ordering

Return type

numpy.ndarray

neurox.interpretation.metrics

Module that wraps around several standard metrics

neurox.interpretation.metrics.accuracy(preds, labels)[source]

Accuracy.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

accuracy – Accuracy of the model

Return type

float

neurox.interpretation.metrics.f1(preds, labels)[source]

F-Score or F1 score.

Note

The implementation from sklearn.metrics is used to compute the score.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

f1_score – F-Score of the model

Return type

float

neurox.interpretation.metrics.accuracy_and_f1(preds, labels)[source]

Mean of Accuracy and F-Score.

Note

The implementation from sklearn.metrics is used to compute the F-Score.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

acc_f1_mean – Mean of Accuracy and F-Score of the model

Return type

float

neurox.interpretation.metrics.pearson(preds, labels)[source]

Pearson’s correlation coefficient

Note

The implementation from scipy.stats is used to compute the score.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

pearson_score – Pearson’s correlation coefficient of the model

Return type

float

neurox.interpretation.metrics.spearman(preds, labels)[source]

Spearman correlation coefficient

Note

The implementation from scipy.stats is used to compute the score.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

spearman_score – Spearman correlation coefficient of the model

Return type

float

neurox.interpretation.metrics.pearson_and_spearman(preds, labels)[source]

Mean of Pearson and Spearman correlation coefficients.

Note

The implementation from scipy.stats is used to compute the scores.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

pearson_spearman_mean – Mean of Pearson and Spearman correlation coefficients of the model

Return type

float

neurox.interpretation.metrics.matthews_corrcoef(preds, labels)[source]

Matthews correlation coefficient

Note

The implementation from sklearn.metrics is used to compute the score.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

Returns

mcc_score – Matthews correlation coefficient of the model

Return type

float

neurox.interpretation.metrics.compute_score(preds, labels, metric)[source]

Utility function to compute scores using several metrics.

Parameters
  • preds (list or numpy.ndarray) – A list of predictions from a model

  • labels (list or numpy.ndarray) – A list of ground truth labels with the same number of elements as preds

  • metric (str) – One of accuracy, f1, accuracy_and_f1, pearson, spearman, pearson_and_spearman or matthews_corrcoef.

Returns

score – Score of the model with the chosen metric

Return type

float
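Example (a minimal sketch, assuming NeuroX is installed):

    import numpy as np
    from neurox.interpretation import metrics

    preds = np.array([0, 1, 1, 0, 1])
    labels = np.array([0, 1, 0, 0, 1])

    print(metrics.accuracy(preds, labels))             # 0.8
    print(metrics.compute_score(preds, labels, "f1"))  # dispatches to metrics.f1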

neurox.interpretation.probeless

Module for Probeless method

This module extracts neuron rankings for a label/tag (e.g. verbs) or for an entire property set (e.g. part of speech) without training any probes.

neurox.interpretation.probeless.get_neuron_ordering(X_train, y_train)[source]

Returns a list of top neurons w.r.t. the overall task, e.g. POS.

Parameters
  • X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.

Returns

ranking – list of NUM_NEURONS neuron indices, in decreasing order of importance.

Return type

list

neurox.interpretation.probeless.get_neuron_ordering_for_tag(X_train, y_train, label2idx, tag)[source]

Returns a list of top neurons w.r.t. a specific tag, e.g. noun.

Parameters
  • X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.

  • label2idx (dict) – Class name to index mapping. Usually returned by interpretation.utils.create_tensors.

  • tag (str) – Tag for which the ranking is extracted

Returns

ranking – list of NUM_NEURONS neuron indices, in decreasing order of importance.

Return type

list

neurox.interpretation.probeless.get_neuron_ordering_for_all_tags(X_train, y_train, idx2label)[source]

Returns a dictionary mapping each tag to its top neurons, along with a list giving the overall ranking.

Parameters
  • X_train (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • y_train (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. Usually the output of interpretation.utils.create_tensors.

  • idx2label (dict) – Class index to name mapping. Usually returned by interpretation.utils.create_tensors.

Returns

  • overall_ranking (list) – list of NUM_NEURONS neuron indices, in decreasing order of importance.

  • ranking_per_tag (dict) – Dictionary with top neurons for every class, with the class name as the key and list of neurons as the values.
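Example (a minimal sketch, assuming NeuroX is installed; random data and the idx2label mapping stand in for the outputs of interpretation.utils.create_tensors):

    import numpy as np
    from neurox.interpretation import probeless

    X_train = np.random.rand(500, 768).astype(np.float32)
    y_train = np.random.randint(0, 3, size=500)
    idx2label = {0: "NOUN", 1: "VERB", 2: "OTHER"}

    # Overall ranking plus one ranking per tag, with no probe training involved
    overall_ranking, ranking_per_tag = probeless.get_neuron_ordering_for_all_tags(
        X_train, y_train, idx2label
    )
    print(overall_ranking[:10])
    print(ranking_per_tag["VERB"][:10])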

neurox.interpretation.utils

neurox.interpretation.utils.isnotebook()[source]

Utility function to detect if the code being run is within a jupyter notebook. Useful, for example, to change progress indicators.

Returns

isnotebook – True if the function is being called inside a notebook, False otherwise.

Return type

bool

neurox.interpretation.utils.get_progress_bar()[source]

Utility function to get a progress bar depending on the environment the code is running in. A normal text-based progress bar is returned in normal shells, and a notebook widget-based progress bar is returned in jupyter notebooks.

Returns

progressbar – The appropriate progressbar from the tqdm library.

Return type

function

neurox.interpretation.utils.batch_generator(X, y, batch_size=32)[source]

Generator function to generate batches of data for training/evaluation.

This function takes two tensors representing the activations and labels respectively, and yields batches of parallel data. The last batch may contain fewer than batch_size elements.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually the output of interpretation.utils.create_tensors

  • y (numpy.ndarray) – Numpy Vector of size [NUM_TOKENS] with class labels for each input token. For classification, 0-indexed class labels for each input token are expected. For regression, a real value per input token is expected. Usually the output of interpretation.utils.create_tensors

  • batch_size (int, optional) – Number of samples to return in each call. Defaults to 32.

Yields
  • X_batch (numpy.ndarray) – Numpy Matrix of size [batch_size x NUM_NEURONS]. The final batch may have fewer elements than the requested batch_size

  • y_batch (numpy.ndarray) – Numpy Vector of size [batch_size]. The final batch may have fewer elements than the requested batch_size
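Example (a minimal sketch, assuming NeuroX is installed):

    import numpy as np
    from neurox.interpretation import utils

    X = np.random.rand(100, 768).astype(np.float32)
    y = np.random.randint(0, 3, size=100)

    for X_batch, y_batch in utils.batch_generator(X, y, batch_size=32):
        # Three batches of 32 followed by a final batch of 4 (100 = 3 * 32 + 4)
        print(X_batch.shape, y_batch.shape)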

neurox.interpretation.utils.tok2idx(tokens)[source]

Utility function to generate unique indices for a set of tokens.

Parameters

tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data

Returns

tok2idx_mapping – A dictionary with tokens as keys and a unique index for each token as values

Return type

dict

neurox.interpretation.utils.idx2tok(srcidx)[source]

Utility function to create an inverse mapping from a tok2idx mapping.

Parameters

srcidx (dict) – Token to index mapping, usually the output of interpretation.utils.tok2idx.

Returns

idx2tok – A dictionary with unique indices as keys and their associated tokens as values

Return type

dict
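Example (a minimal sketch, assuming NeuroX is installed):

    from neurox.interpretation import utils

    sentences = [["The", "cat", "sat"], ["Dogs", "bark"]]

    tok2idx_mapping = utils.tok2idx(sentences)        # token -> unique index
    idx2tok_mapping = utils.idx2tok(tok2idx_mapping)  # unique index -> token

    idx = tok2idx_mapping["cat"]
    assert idx2tok_mapping[idx] == "cat"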

neurox.interpretation.utils.count_target_words(tokens)[source]

Utility function to count the total number of tokens in a dataset.

Parameters

tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data

Returns

count – Total number of tokens in the given tokens structure

Return type

int

neurox.interpretation.utils.create_tensors(tokens, activations, task_specific_tag, mappings=None, task_type='classification', binarized_tag=None, balance_data=False)[source]

Method to pre-process loaded datasets into tensors that can be used to train probes and perform analysis on. The input tokens are represented as a list of sentences, where each sentence is a list of tokens. Each token also has an associated label. All tokens from all sentences are flattened into one dimension in the returned tensors. The returned tensors will thus have total_num_tokens rows.

Parameters
  • tokens (list of lists) – List of sentences, where each sentence is a list of tokens. Usually returned from data.loader.load_data

  • activations (list of numpy.ndarray) – List of sentence representations, where each sentence representation is a numpy matrix of shape [num tokens in sentence x concatenated representation size]. Usually retured from data.loader.load_activations

  • task_specific_tag (str) – Label to assign tokens with unseen labels. This is particularly useful if some labels are never seen during train, but are present in the dev or test set. This is usually set to the majority class in the task.

  • mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict maps either class labels or source tokens to indices, or provides the inverse mapping. Usually returned from a previous call to create_tensors.

  • task_type (str) – Either “classification” or “regression”, indicating the kind of task being probed.

  • binarized_tag (str, optional) – Tag/Label to create binary data. All other labels in the dataset are changed to OTHER. Defaults to None, in which case the data labels are processed as-is.

  • balance_data (bool, optional) – Whether the incoming data should be balanced. Data is balanced by undersampling, using utils.balance_binary_class_data for binary data and utils.balance_multi_class_data for multi-class data. Defaults to False.

Returns

  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]

  • y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]

  • mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict maps either class labels or source tokens to indices, or provides the inverse mapping.

Notes

  • mappings should be created exactly once, and should be reused for subsequent calls

  • For example, mappings can be created on train data and then passed during the calls for dev and test data.

neurox.interpretation.utils.print_overall_stats(all_results)[source]

Method to pretty print overall results.

Warning

This method was primarily written to process results from internal scripts and pipelines.

Parameters

all_results (dict) – Dictionary containing the probe, overall scores, scores from selected neurons, neuron ordering and neuron selections at various percentages

neurox.interpretation.utils.print_machine_stats(all_results)[source]

Method to print overall results in tsv format.

Warning

This method was primarily written to process results from internal scripts and pipelines.

Parameters

all_results (dict) – Dictionary containing the probe, overall scores, scores from selected neurons, neuron ordering and neuron selections at various percentages

neurox.interpretation.utils.balance_binary_class_data(X, y)[source]

Method to balance binary class data.

Note

The majority class is randomly under-sampled to match the size of the minority class.

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually returned from interpretation.utils.create_tensors

  • y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]. Usually returned from interpretation.utils.create_tensors

Returns

  • X_balanced (numpy.ndarray) – Numpy matrix of size [NUM_BALANCED_TOKENS x NUM_NEURONS]

  • y_balanced (numpy.ndarray) – Numpy vector of size [NUM_BALANCED_TOKENS]
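Example (a minimal sketch, assuming NeuroX is installed; a toy imbalanced binary dataset):

    import numpy as np
    from neurox.interpretation import utils

    X = np.random.rand(100, 16).astype(np.float32)
    y = np.array([0] * 80 + [1] * 20)  # 80/20 class imbalance

    X_bal, y_bal = utils.balance_binary_class_data(X, y)
    print(X_bal.shape, np.bincount(y_bal))  # 40 rows, 20 per class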

neurox.interpretation.utils.balance_multi_class_data(X, y, num_required_instances=None)[source]

Method to balance multi class data.

Note

All classes are randomly under-sampled to match the size of the minority class. If num_required_instances is provided, all classes are instead sampled proportionally so that the total number of selected examples is approximately num_required_instances (approximate because of rounding in the proportions).

Parameters
  • X (numpy.ndarray) – Numpy Matrix of size [NUM_TOKENS x NUM_NEURONS]. Usually returned from interpretation.utils.create_tensors

  • y (numpy.ndarray) – Numpy vector of size [NUM_TOKENS]. Usually returned from interpretation.utils.create_tensors

  • num_required_instances (int, optional) – Total number of required instances. All classes are sampled proportionally.

Returns

  • X_balanced (numpy.ndarray) – Numpy matrix of size [NUM_BALANCED_TOKENS x NUM_NEURONS]

  • y_balanced (numpy.ndarray) – Numpy vector of size [NUM_BALANCED_TOKENS]

neurox.interpretation.utils.load_probe(probe_path)[source]

Loads a probe and its associated mappings from probe_path

Warning

This method is currently not implemented.

Parameters

probe_path (str) – Path to a pkl object saved by interpretation.utils.save_probe

Returns

  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict maps either class labels or source tokens to indices, or provides the inverse mapping.

neurox.interpretation.utils.save_probe(probe_path, probe, mappings)[source]

Saves a model and its associated mappings as a pkl object at probe_path

Warning

This method is currently not implemented.

Parameters
  • probe_path (str) – Path to save a pkl object

  • probe (interpretation.linear_probe.LinearProbe) – Trained probe model

  • mappings (list of dicts) – List of four python dicts: label2idx, idx2label, src2idx and idx2src for classification tasks. List of two dicts, src2idx and idx2src, for regression tasks. Each dict maps either class labels or source tokens to indices, or provides the inverse mapping.

Module contents: