neurox.analysis
Submodules:
neurox.analysis.corpus
Module for corpus-based analysis.
This module contains functions that relate neurons to corpus elements like words and sentences.
neurox.analysis.corpus.get_top_words(tokens, activations, neuron, num_tokens=0)
Get top activating words for any given neuron.
This method compares the activations of the given neuron across all tokens, and extracts tokens that account for the largest variance for that given neuron. It also returns a normalized score for each token, depicting their contribution to the overall variance.
- Parameters
tokens (dict) – Dictionary containing at least one list with the key source. Usually returned from data.loader.load_data
activations (list of numpy.ndarray) – List of sentence representations, where each sentence representation is a numpy matrix of shape [num tokens in sentence x concatenated representation size]. Usually returned from data.loader.load_activations
neuron (int) – Index of the neuron relative to X
num_tokens (int, optional) – Number of top tokens to return. Defaults to 0, which returns all tokens with a non-negligible contribution to the variance
- Returns
top_neurons – List of tuples, where each tuple is a (token, score) element
- Return type
list of tuples
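The description above can be illustrated with a small, self-contained sketch. This is not the library's implementation: the function name sketch_top_words is hypothetical, the scoring rule (absolute z-score of the neuron's activation, averaged per token and divided by the largest score) is an assumption about what "normalized contribution to the variance" could mean, and nested Python lists stand in for the numpy matrices. Unlike the documented behaviour, num_tokens=0 here returns every token without filtering out negligible ones.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sketch_top_words(tokens, activations, neuron, num_tokens=0):
    """Rank tokens by how far one neuron's activation deviates from its mean."""
    # Flatten sentences so tokens and activation values line up one-to-one.
    flat_tokens = [tok for sentence in tokens["source"] for tok in sentence]
    values = [row[neuron] for sentence in activations for row in sentence]
    assert len(flat_tokens) == len(values)

    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant neuron

    # Assumed scoring rule: mean absolute z-score per distinct token.
    sums, counts = defaultdict(float), defaultdict(int)
    for tok, value in zip(flat_tokens, values):
        sums[tok] += abs((value - mu) / sigma)
        counts[tok] += 1
    scores = {tok: sums[tok] / counts[tok] for tok in sums}

    # Normalize so the strongest token has a score of 1.0.
    top = max(scores.values()) or 1.0
    ranked = sorted(
        ((tok, score / top) for tok, score in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:num_tokens] if num_tokens > 0 else ranked


# Toy data: two sentences, two neurons per token.
tokens = {"source": [["the", "cat", "sat"], ["the", "dog", "ran"]]}
activations = [
    [[0.1, 0.0], [0.9, 0.0], [0.2, 0.0]],
    [[0.1, 0.0], [-0.8, 0.0], [0.15, 0.0]],
]
for token, score in sketch_top_words(tokens, activations, neuron=0, num_tokens=2):
    print(token, round(score, 3))
```

The return value has the same shape as the documented one, a list of (token, score) tuples sorted from highest to lowest score.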
neurox.analysis.plotting
neurox.analysis.visualization
neurox.analysis.visualization.visualize_activations(tokens, activations, darken=2, colors=['#d35f5f', '#00aad4'], text_direction='ltr', char_limit=60, font_size=20, filter_fn=<function <lambda>>)
Visualize activation values for a particular neuron on some text.
This method returns an SVG drawing of text with every token’s background color set according to the passed in activation values (red for negative values and blue for positive).
- Parameters
tokens (list of str) – List of tokens over which the activations have been computed. In the rendered image, tokens will be separated by a single space.
activations (list of float) – List of activation values, one per token.
darken (int, optional) – Number of times to render the red/blue background. Increasing this value will reduce contrast but may help in better distinguishing between tokens. Defaults to 2
colors (list of str, optional) – List of two elements, the first indicating the color of the lowest activation value and the second indicating the color of the highest activation value. Defaults to shades of red and blue respectively
text_direction (str, optional) – One of ltr or rtl, indicating whether the language being rendered is written left to right or right to left. Defaults to ltr
char_limit (int, optional) – Maximum number of characters per line. Defaults to 60
font_size (int, optional) – Font size in pixels. Defaults to 20px
filter_fn (str or fn, optional) – Additional function that modifies the incoming activations. Defaults to None, which keeps the activations as is. If fn is provided, it must accept a list of activations and return a list with exactly the same number of elements. str choices are currently:
top_tokens: Only highlights tokens whose activation values are within 80% of the top activating token in a given sentence. Absolute values are used for comparison.
- Returns
rendered_svg – An SVG object that you can save to file, convert into a PNG within Python using an external library like Pycairo, or display in a notebook using display from the module IPython.display
- Return type
svgwrite.Drawing
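The filter_fn contract described above (take a list of activations, return a list of the same length) can be sketched with a custom filter that mimics the documented top_tokens behaviour. The function name keep_top_tokens and its fraction parameter are hypothetical, and two details are assumptions rather than documented facts: "within 80%" is read as an absolute value of at least 0.8 times the largest absolute value, and tokens that fall below the threshold are set to 0.0.

```python
def keep_top_tokens(activations, fraction=0.8):
    """Zero out activations that are far below the strongest one in the sentence."""
    if not activations:
        return []
    # Absolute values are compared, so strong negative activations are kept too.
    peak = max(abs(a) for a in activations)
    threshold = fraction * peak
    # The output has exactly one entry per input entry, as filter_fn requires.
    return [a if abs(a) >= threshold else 0.0 for a in activations]


filtered = keep_top_tokens([0.1, -1.0, 0.85, 0.5])
print(filtered)
```

A function of this shape satisfies the documented requirement for a callable filter_fn, so it should be usable as filter_fn=keep_top_tokens in a call to visualize_activations.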
class neurox.analysis.visualization.TransformersVisualizer(model_name)
Bases: object
Helper class to visualize sentences using activations from a transformers model.
model_name
A transformers model name or path, e.g. bert-base-uncased
- Type
str
model
The loaded model
- Type
transformers model
tokenizer
The loaded tokenizer
- Type
transformers tokenizer
__call__(tokens, layer, neuron)
An object of this class can be called directly to get the visualized activations
Examples
>>> visualizer = TransformersVisualizer('bert-base-uncased')
>>> svg1 = visualizer(["This", "is", "a", "test"], 0, 10)
>>> svg2 = visualizer(["This", "is", "another", "test"], 5, 767)
Module contents: