functionalizer API

The Spark Functionalizer works by successively applying several filters to a circuit representation, comprising cells and the synapses between them, to obtain a realistic set of synapses representing the connectome of a brain region.

In this implementation, a central Functionalizer instance configures the Apache Spark setup, then loads the appropriate cell data, the scientific recipe, and the touches between cells. Internally, the brain circuit is represented by the Circuit class. The touches are processed by a sequence of filters inheriting from the DatasetOperation class, and the result can then be written to disk.
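The successive application of filters amounts to feeding the output of one filter into the next. The following is a minimal sketch in plain Python, assuming touches are modelled as dictionaries with src and dst keys; drop_autapses and keep_first_per_pair are hypothetical stand-ins, not filters shipped with functionalizer, which operate on Spark dataframes through the Circuit class.

```python
from typing import Callable, Dict, List

# A touch is modelled as a plain dict here; the real implementation uses
# Spark dataframes (hypothetical simplification).
Touch = Dict[str, int]
Filter = Callable[[List[Touch]], List[Touch]]


def apply_filters(touches: List[Touch], filters: List[Filter]) -> List[Touch]:
    """Apply each filter to the output of the previous one."""
    for f in filters:
        touches = f(touches)
    return touches


def drop_autapses(touches: List[Touch]) -> List[Touch]:
    # Hypothetical filter: remove touches where a cell touches itself.
    return [t for t in touches if t["src"] != t["dst"]]


def keep_first_per_pair(touches: List[Touch]) -> List[Touch]:
    # Hypothetical filter: keep only the first touch per (src, dst) pair.
    seen = set()
    result = []
    for t in touches:
        key = (t["src"], t["dst"])
        if key not in seen:
            seen.add(key)
            result.append(t)
    return result


touches = [
    {"src": 0, "dst": 0},
    {"src": 0, "dst": 1},
    {"src": 0, "dst": 1},
    {"src": 2, "dst": 1},
]
result = apply_filters(touches, [drop_autapses, keep_first_per_pair])
print(result)  # [{'src': 0, 'dst': 1}, {'src': 2, 'dst': 1}]
```

Each filter only sees the output of its predecessor, so the order of the list matters whenever filters are not independent.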

Entry Point

In most cases, the Circuit is constructed by the Functionalizer class from the parameters supplied by the user. Functionalizer also handles the setup of the Apache Spark infrastructure, including memory settings and storage paths.
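A typical session calls the documented methods in the order load, filter, export. The sketch below assumes an already constructed Functionalizer instance (its constructor options are not listed on this page) and uses placeholder paths, population and nodeset names; a unittest.mock.Mock stands in for the session so the call order can be checked without Spark.

```python
from unittest.mock import Mock


def run_functionalizer(fz, recipe_file, circuit_config, population, nodeset, edges):
    """Load data, run the default filters, and export the touches."""
    # Same population and nodeset on both sides: an assumption that fits
    # a single-population circuit.
    fz.init_data(
        recipe_file=recipe_file,
        circuit_config=circuit_config,
        source=population,
        source_nodeset=nodeset,
        target=population,
        target_nodeset=nodeset,
        edges=edges,
    )
    # No filter list: the default set is used and unchanged checkpoints reused.
    fz.process_filters()
    # Defaults: the session's output directory and file name circuit.parquet.
    fz.export_results()
    return fz


# Mock() accepts any method call and records it, standing in for a session.
session = run_functionalizer(
    Mock(),
    recipe_file="path/to/recipe",
    circuit_config="path/to/circuit_config.json",
    population="neurons",
    nodeset="All",
    edges=["path/to/touches.parquet"],
)
print([name for name, _, _ in session.method_calls])
# ['init_data', 'process_filters', 'export_results']
```

With a real session, the same function would be called with a Functionalizer instance in place of Mock().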

class Functionalizer(**options)

Bases: object

Functionalizer session class.

export_results(output_path=None, order: SortBy = SortBy.POST, filename: str = 'circuit.parquet')

Writes the touches of the circuit to disk.

Parameters:
  • output_path – Overrides the default output directory

  • order – The sort order of the touches in the output

  • filename – Overrides the default output file name

init_data(recipe_file, circuit_config, source, source_nodeset, target, target_nodeset, edges=None)

Initialize all data required.

Loads the necessary cell collections from the source and target parameters and constructs the underlying brain Circuit. The recipe_file is only fully processed once filters are instantiated. Similarly, edge and node data are only fully read once filters are applied.

Parameters:
  • recipe_file – A scientific prescription to be used by the filters on the circuit

  • circuit_config – The basic configuration of the circuit

  • source – The source population name

  • source_nodeset – The source nodeset name

  • target – The target population name

  • target_nodeset – The target nodeset name

  • edges – A list of files containing edges

process_filters(filters=None, overwrite=False)

Filter the circuit.

Uses either the specified filters or a default set, based on the parameters passed to the Functionalizer constructor.

Any filter that writes a checkpoint will be skipped if the sequence of data and filters leading up to said checkpoint did not change. Use the overwrite argument to change this behavior.

Parameters:
  • filters – A list of names of the filters to run. The Filter suffix of the class names should be omitted.

  • overwrite – Whether to overwrite existing checkpoints instead of reusing them
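The checkpoint reuse described for process_filters can be pictured as fingerprinting the input together with the names of all filters up to each checkpoint, then resuming after the latest checkpoint whose fingerprint is unchanged. This is a sketch under assumptions: the in-memory store replaces checkpoints on disk, input_id is a hypothetical identifier for the input data, the filter names are invented, and the real implementation may compute its fingerprints differently.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# (name, function, writes_checkpoint): a hypothetical filter description.
FilterSpec = Tuple[str, Callable[[list], list], bool]


def checkpoint_key(input_id: str, filter_names: List[str]) -> str:
    """Fingerprint of the input data and the filters applied so far."""
    digest = hashlib.sha256(input_id.encode())
    for name in filter_names:
        digest.update(b"\0" + name.encode())
    return digest.hexdigest()


def run_with_checkpoints(input_id: str, data: list, filters: List[FilterSpec],
                         store: Dict[str, list], overwrite: bool = False):
    """Run filters in order, resuming from the latest valid checkpoint.

    `store` is an in-memory stand-in for checkpoints on disk. Returns the
    final data and the names of the filters that were actually executed.
    """
    names = [name for name, _, _ in filters]
    keys = [checkpoint_key(input_id, names[: i + 1]) for i in range(len(filters))]
    start = 0
    if not overwrite:
        # Search backwards for a checkpoint whose history is unchanged.
        for i in reversed(range(len(filters))):
            if filters[i][2] and keys[i] in store:
                data, start = store[keys[i]], i + 1
                break
    executed = []
    for i in range(start, len(filters)):
        name, func, writes_checkpoint = filters[i]
        data = func(data)
        executed.append(name)
        if writes_checkpoint:
            store[keys[i]] = data
    return data, executed


filters = [
    ("Double", lambda xs: [2 * x for x in xs], True),
    ("DropSmall", lambda xs: [x for x in xs if x > 2], False),
]
store: Dict[str, list] = {}
print(run_with_checkpoints("touches-v1", [1, 2, 3], filters, store))
# ([4, 6], ['Double', 'DropSmall'])
print(run_with_checkpoints("touches-v1", [1, 2, 3], filters, store))
# ([4, 6], ['DropSmall'])
```

Changing either the input identifier or any filter name before a checkpoint changes its key, so that checkpoint is no longer reused; overwrite=True forces every filter to run again.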

circuit = None

The circuit containing neuron and touch data.

Type:

property

property output_directory

The directory to save results in.

Type:

property

recipe

The parsed recipe

Type:

property

property touches

The current set of touches as a dataframe, without additional neuron data.

Type:

property

Data Handling

The NodeData class is used to read both nodes and edges from binary storage or Parquet. Nodes are customarily stored in SONATA format based on HDF5, and NodeData will internally cache them in Parquet format for faster future access.

class Circuit(source: NodeData, target: NodeData, touches: EdgeData)

Bases: object

Representation of a circuit.

A simple data container intended to simplify and future-proof the API. Objects of this class hold both the nodes and the edges of the initial brain connectivity.

Access to both node populations is provided via Circuit.source and Circuit.target. Likewise, the current edges can be obtained via Circuit.touches.

The preferred access to the circuit is through Circuit.df. This property provides the synapses of the circuit joined with both neuron populations, giving the full set of properties. Attributes of the source and target neuron populations are prefixed with src_ and dst_, respectively, while the neuron identifiers themselves are the plain columns src and dst.

The Circuit.df property should also be used to update the connectivity.

Parameters:
  • source – the source neuron population

  • target – the target neuron population

  • touches – the synaptic connections
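Conceptually, Circuit.df is a join of the touches with both node tables. The sketch below reproduces the src_/dst_ naming convention with plain dictionaries; the real property joins PySpark dataframes, and the node attribute names and values here are made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_df(touches: List[Row], source: Dict[int, Row], target: Dict[int, Row]) -> List[Row]:
    """Join touches with both node tables, prefixing node attributes."""
    rows = []
    for touch in touches:
        row = dict(touch)  # copy; keeps the plain "src" and "dst" identifiers
        for attr, value in source[touch["src"]].items():
            row["src_" + attr] = value
        for attr, value in target[touch["dst"]].items():
            row["dst_" + attr] = value
        rows.append(row)
    return rows


source = {0: {"mtype": "L5_TPC", "layer": 5}}
target = {1: {"mtype": "L4_SSC", "layer": 4}}
touches = [{"src": 0, "dst": 1, "synapse_id": 7}]
print(build_df(touches, source, target))
# [{'src': 0, 'dst': 1, 'synapse_id': 7, 'src_mtype': 'L5_TPC', 'src_layer': 5, 'dst_mtype': 'L4_SSC', 'dst_layer': 4}]
```

The prefixes keep same-named attributes of the two populations (here mtype and layer) apart in a single row.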

static expand(columns, source, target)

Expand recipe-convention columns to names and data from dataframes.

For each column name in columns, given in the convention of the recipe, returns a tuple with:

  • the recipe names

  • the appropriate source or target name

  • the appropriate source or target name containing indices to library values

  • the library values to be used with the indexed column
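To make the four-element tuple concrete, here is a hypothetical re-creation of its layout. It assumes recipe columns named fromX or toX (for example fromMType), dataframe columns prefixed src_ or dst_, index columns suffixed _i, and plain dictionaries in place of the NodeData populations; none of these conventions is confirmed by this page.

```python
from typing import Dict, List, Sequence, Tuple

Expanded = Tuple[str, str, str, List[str]]


def expand(columns: Sequence[str],
           source: Dict[str, List[str]],
           target: Dict[str, List[str]]) -> List[Expanded]:
    """Map recipe-style column names to dataframe names and library values.

    `source` and `target` map attribute names to library values and stand in
    for the NodeData populations (hypothetical simplification).
    """
    result = []
    for col in columns:
        if col.startswith("from"):
            prefix, attr, library = "src_", col[len("from"):].lower(), source
        elif col.startswith("to"):
            prefix, attr, library = "dst_", col[len("to"):].lower(), target
        else:
            raise ValueError(f"not a recipe column: {col}")
        name = prefix + attr
        # (recipe name, dataframe name, index column, library values)
        result.append((col, name, name + "_i", library[attr]))
    return result


expanded = expand(["fromMType", "toEType"],
                  {"mtype": ["L4_SSC", "L5_TPC"]},
                  {"etype": ["cADpyr", "dNAC"]})
_, _, index_column, library = expanded[0]
row = {index_column: 1}  # an indexed value as it might be stored in a dataframe
print(library[row[index_column]])  # L5_TPC
```

Decoding an indexed column is then a plain lookup into the library values.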

static only_touch_columns(df)

Remove neuron columns from a dataframe.

Parameters:

df – a dataframe to trim
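Since joined neuron attributes carry the src_ and dst_ prefixes described for Circuit.df, one plausible way to trim them is a prefix test on the column names. This is an assumption about the behavior, not the actual implementation, and it operates on a list of names rather than a dataframe; with PySpark, the kept names could be passed to DataFrame.select.

```python
from typing import List

# Assumed prefixes of neuron attributes joined into Circuit.df.
NEURON_PREFIXES = ("src_", "dst_")


def only_touch_columns(columns: List[str]) -> List[str]:
    """Return the column names that do not belong to a joined neuron."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [c for c in columns if not c.startswith(NEURON_PREFIXES)]


print(only_touch_columns(["src", "dst", "synapse_id", "src_mtype", "dst_layer"]))
# ['src', 'dst', 'synapse_id']
```

The identifiers src and dst survive because they lack the trailing underscore.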

build_circuit(touches)

Joins touches with the node tables.

property df

A dataframe representing the circuit.

Type:

property

property input_size

The original input size in bytes.

Type:

property

property metadata

Metadata associated with the connections.

Type:

property

source

The source neuron population.

Type:

property

target

The target neuron population.

Type:

property

property touches

The touches originally used to construct the circuit.

Type:

property

class NodeData(circuit_config: str, population: str, nodeset: str, cache: str)

Bases: object

Neuron data loading facilities.

This class represents a lazily loaded neuron population. After construction, general properties of the neurons can be accessed, such as the unique values present in NodeData.mtype_values, NodeData.etype_values, or NodeData.sclass_values.
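Lazy loading of this kind can be sketched with functools.cached_property: nothing is read at construction, and the first access to a derived property triggers a single load. LazyPopulation and its loader argument are hypothetical; the unique values are sorted here only to make the output deterministic.

```python
from functools import cached_property
from typing import Callable, Dict, List


class LazyPopulation:
    """Hypothetical stand-in for NodeData that reads its rows on first use."""

    def __init__(self, loader: Callable[[], List[Dict[str, str]]]):
        self._loader = loader
        self.loads = 0  # how many times the underlying data was read

    @cached_property
    def rows(self) -> List[Dict[str, str]]:
        self.loads += 1
        return self._loader()

    @cached_property
    def mtype_values(self) -> List[str]:
        # Unique values, analogous to NodeData.mtype_values (sorted here).
        return sorted({row["mtype"] for row in self.rows})


pop = LazyPopulation(lambda: [{"mtype": "L5_TPC"}, {"mtype": "L4_SSC"}, {"mtype": "L5_TPC"}])
print(pop.loads)         # 0
print(pop.mtype_values)  # ['L4_SSC', 'L5_TPC']
print(pop.loads)         # 1
```

Because both properties are cached, later accesses reuse the loaded rows instead of reading the data again.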

property df

The PySpark dataframe with the neuron data.

property population

The population name.

Filtering

A detailed overview of the scientific filter implementations available in functionalizer can be found in Synapse Filters.

class DatasetOperation(recipe, source, target)

Bases: object

Basis for synapse filters.

Every filter should derive from DatasetOperation, which enforces the right signatures for the constructor and the apply() function. Defining a constructor is optional, but it should be used to extract relevant information from the recipe.

The two node populations are passed to the constructor to enable cross-checks between the recipe information and the population properties. If the constructor raises an exception and the _required attribute is set to False, the filter will be skipped.

If a filter adds or removes columns from the dataframe, this should be communicated via the _columns attribute; otherwise the general invocation of the filters will fail, because column consistency is checked.
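The contract described here (recipe parsing in the constructor, an abstract apply(), and skipping optional filters whose constructor fails) can be sketched without Spark. DatasetOperationSketch is a hypothetical stand-in for DatasetOperation, MinimumDistanceFilter and its minimum_distance recipe key are invented for illustration, and a list of dictionaries replaces the Circuit passed to apply().

```python
from abc import ABC, abstractmethod


class DatasetOperationSketch(ABC):
    """Hypothetical stand-in mirroring the documented DatasetOperation contract."""

    _checkpoint = False
    _columns = []
    _reductive = True
    _required = True

    def __init__(self, recipe, source, target):
        # The real base class receives the recipe and both node populations.
        self.recipe = recipe

    @abstractmethod
    def apply(self, circuit):
        """Return the updated touches."""


class MinimumDistanceFilter(DatasetOperationSketch):
    """Hypothetical filter keeping touches at or above a recipe distance."""

    _required = False  # skip this filter if the recipe lacks the setting

    def __init__(self, recipe, source, target):
        super().__init__(recipe, source, target)
        # Raises KeyError if the recipe has no such entry.
        self.minimum = recipe["minimum_distance"]

    def apply(self, circuit):
        return [t for t in circuit if t["distance"] >= self.minimum]


def instantiate(cls, recipe, source=None, target=None):
    """Construct a filter; optional filters whose constructor fails are skipped."""
    try:
        return cls(recipe, source, target)
    except Exception:
        if cls._required:
            raise
        return None


touches = [{"distance": 0.5}, {"distance": 2.0}]
selected = instantiate(MinimumDistanceFilter, {"minimum_distance": 1.0})
print(selected.apply(touches))                 # [{'distance': 2.0}]
print(instantiate(MinimumDistanceFilter, {}))  # None
```

A filter left with the default _required = True would instead propagate the constructor's exception.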

static pathway_functions(columns, counts)

Construct pathway adding functions given columns and value counts.

__call__(circuit)

Apply the operation to circuit.

abstractmethod apply(circuit: Circuit)

Needs actual implementation of the operation.

Takes a Circuit, applies some operations to it, and returns a Spark dataframe representing the updated circuit.

_checkpoint = False

Store the results on disk, allowing the computation to be skipped on subsequent runs.

_checkpoint_buckets = None

Partition the data when checkpointing, which avoids a sort when loading.

_columns = []

A list of columns to be consumed and produced.

Each item should be a tuple of two strings, giving the column consumed/dropped, and the column produced. If no column is dropped, None can be used. Likewise, if a column is only dropped, None can be the second element.

Examples:

(None, "synapse_id")  # will produce the column "synapse_id"
("synapse_id", None)  # will drop the column "synapse_id"
("ham", "spam")       # will produce the column "spam" while also
                      # dropping "ham". If "ham" is not
                      # present, "spam" will not be
                      # added.

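The effect of _columns entries on the set of columns can be predicted with a small helper. This sketch assumes that entries are processed in order and that a consumed column which is absent is silently skipped; the description above only states that skip behavior for a consumed-and-produced pair such as ("ham", "spam"). expected_columns is a hypothetical name.

```python
from typing import Iterable, List, Optional, Tuple

ColumnSpec = Tuple[Optional[str], Optional[str]]


def expected_columns(columns: Iterable[str], specs: List[ColumnSpec]) -> List[str]:
    """Predict the columns after a filter from its _columns entries."""
    result = list(columns)
    for consumed, produced in specs:
        if consumed is not None:
            if consumed not in result:
                # Consumed column absent: nothing is dropped or produced.
                continue
            result.remove(consumed)
        if produced is not None:
            result.append(produced)
    return result


print(expected_columns(["src", "dst", "ham"], [(None, "synapse_id"), ("ham", "spam")]))
# ['src', 'dst', 'synapse_id', 'spam']
print(expected_columns(["src", "dst"], [("ham", "spam")]))
# ['src', 'dst']
```

Comparing such a prediction with the columns a filter actually returns is one way a column consistency check could be carried out.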
_reductive = True

Indicates if the filter is expected to reduce the touch count.

_required = True

If set to False, the filter will be skipped if recipe components are not found.

_visible = False

Determines the visibility of the filter to the user.