Module for corpus-based analysis.

This module contains functions that relate neurons to corpus elements such as words and sentences.

neurox.analysis.corpus.get_top_words(tokens, activations, neuron, num_tokens=0)[source]

Get top activating words for any given neuron.

This method compares the activations of the given neuron across all tokens and extracts the tokens that account for the largest variance for that neuron. It also returns a normalized score for each token, indicating its contribution to the overall variance.

  • tokens (dict) – Dictionary containing at least one list with the key source. Usually returned from data.loader.load_data

  • activations (list of numpy.ndarray) – List of sentence representations, where each sentence representation is a numpy matrix of shape [num tokens in sentence x concatenated representation size]. Usually returned from data.loader.load_activations

  • neuron (int) – Index of the neuron relative to X

  • num_tokens (int, optional) – Number of top tokens to return. Defaults to 0, which returns all tokens with a non-negligible contribution to the variance


top_neurons – List of tuples, where each tuple is a (token, score) element

Return type

list of tuples
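The scoring idea described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name sketch_top_words, the scoring rule (mean absolute z-score per token, normalized by the maximum), and the min_score cutoff of 0.1 are all hypothetical choices made for the example. Only the input shapes follow the parameter descriptions above.

```python
from collections import defaultdict

import numpy as np


def sketch_top_words(tokens, activations, neuron, num_tokens=0, min_score=0.1):
    """Illustrative only: rank tokens by how far one neuron's activation
    deviates from its corpus-wide mean. Not the neurox implementation."""
    # Flatten the sentences into one token list and one activation vector.
    flat_tokens = [tok for sentence in tokens["source"] for tok in sentence]
    values = np.concatenate([sent[:, neuron] for sent in activations])
    assert len(flat_tokens) == len(values)

    # Assumed scoring rule: absolute z-score of every token occurrence.
    std = values.std()
    if std == 0:
        return []  # the neuron is constant, so no token stands out
    z = np.abs((values - values.mean()) / std)

    # Average the scores over all occurrences of the same token.
    sums, counts = defaultdict(float), defaultdict(int)
    for tok, score in zip(flat_tokens, z):
        sums[tok] += score
        counts[tok] += 1
    averaged = {tok: sums[tok] / counts[tok] for tok in sums}

    # Normalize so the strongest token scores 1.0, then sort and filter.
    top = max(averaged.values())
    ranked = sorted(
        ((tok, float(s / top)) for tok, s in averaged.items()),
        key=lambda pair: -pair[1],
    )
    # min_score is a hypothetical cutoff for "negligible" contributions.
    ranked = [pair for pair in ranked if pair[1] > min_score]
    return ranked[:num_tokens] if num_tokens > 0 else ranked


# Two sentences, one neuron; only "cat" makes the neuron fire.
tokens = {"source": [["the", "cat"], ["the", "dog"]]}
activations = [np.array([[0.0], [4.0]]), np.array([[0.0], [0.0]])]
result = sketch_top_words(tokens, activations, neuron=0)
```

In this toy corpus, "cat" is ranked first because it is the only token on which the neuron deviates strongly from its mean.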


neurox.analysis.plotting.plot_accuracies_per_tag(title, **kwargs)[source]
neurox.analysis.plotting.plot_distributedness(title, top_neurons_per_tag)[source]
neurox.analysis.plotting.plot_accuracies(title, overall_acc, top_10_acc, random_10_acc, bottom_10_acc, top_15_acc, random_15_acc, bottom_15_acc, top_20_acc, random_20_acc, bottom_20_acc)[source]


neurox.analysis.visualization.visualize_activations(tokens, activations, darken=2, colors=['#d35f5f', '#00aad4'], text_direction='ltr', char_limit=60, font_size=20, filter_fn=<function <lambda>>)[source]

Visualize activation values for a particular neuron on some text.

This method returns an SVG drawing of the text, with every token’s background color set according to the passed-in activation values (red for negative values and blue for positive).

  • tokens (list of str) – List of tokens over which the activations have been computed. In the rendered image, tokens will be separated by a single space.

  • activations (list of float) – List of activation values, one per token.

  • darken (int, optional) – Number of times to render the red/blue background. Increasing this value will reduce contrast but may make it easier to distinguish between tokens. Defaults to 2

  • colors (list of str, optional) – List of two elements, the first indicating the color of the lowest activation value and the second indicating the color of the highest activation value. Defaults to shades of red and blue respectively

  • text_direction (str, optional) – One of ltr or rtl, indicating if the language being rendered is written left to right or right to left. Defaults to ltr

  • char_limit (int, optional) – Maximum number of characters per line. Defaults to 60

  • font_size (int, optional) – Font size in pixels. Defaults to 20px

  • filter_fn (str or fn, optional) –

    Additional function that modifies the incoming activations. Defaults to None, which keeps the activations as is. If fn is provided, it must accept a list of activations and return a list with exactly the same number of elements. str choices are currently:

    • top_tokens: Only highlights tokens whose activation values are within 80% of the top activating token in a given sentence. Absolute values are used for comparison.
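As a rough sketch of what a custom filter_fn might look like, the function below keeps only activations whose magnitude is close to the largest magnitude in the sentence and sets the rest to 0.0. The name keep_near_top, the ratio parameter, and the reading of "within 80%" as "at least 80% of the peak absolute value" are assumptions made for illustration; the built-in top_tokens choice may behave differently.

```python
def keep_near_top(activations, ratio=0.8):
    """Hypothetical filter: zero out activations that are not near the peak."""
    if not activations:
        return []
    # Compare magnitudes, so strongly negative values count as well.
    peak = max(abs(a) for a in activations)
    threshold = ratio * peak
    # Return a list of the same length; filtered tokens get a neutral 0.0.
    return [a if abs(a) >= threshold else 0.0 for a in activations]


filtered = keep_near_top([0.1, -1.0, 0.85, 0.5])
```

Because the function accepts a list of activations and returns a list of the same length, it satisfies the contract stated for fn above and could in principle be passed as filter_fn=keep_near_top.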


rendered_svg – An SVG object that you can save to a file, convert into a PNG within Python using an external library like Pycairo, or display in a notebook using the display function from the IPython.display module

Return type


class neurox.analysis.visualization.TransformersVisualizer(model_name)[source]

Bases: object

Helper class to visualize sentences using activations from a transformers model.


  • model_name – A transformers model name or path, e.g. bert-base-uncased

  • The loaded model (a transformers model)

  • The loaded tokenizer (a transformers tokenizer)

__call__(tokens, layer, neuron)[source]

An object of this class can be called directly to get the visualized activations


>>> visualizer = TransformersVisualizer('bert-base-uncased')
>>> svg1 = visualizer(["This", "is", "a", "test"], 0, 10)
>>> svg2 = visualizer(["This", "is", "another", "test"], 5, 767)

Load the model and tokenizer

Module contents: