functionalizer API¶
The Spark Functionalizer works by successively applying several filters to a circuit representation, comprising synapses and cells, to obtain a realistic set of synapses representing the connectome of a brain region.
In this implementation, a central Functionalizer instance is used
to configure the Apache Spark setup, then load the appropriate cell data,
scientific recipe, and touches between cells. Internally, the brain
circuit is then represented by the Circuit class.
A sequence of filters inheriting from the DatasetOperation class
then processes the touches, which can subsequently be written to disk.
Entry Point¶
For most uses, the Circuit is constructed by the
Functionalizer class based on the user parameters passed to it.
The Functionalizer class also handles the setup of the Apache Spark
infrastructure, including memory settings and storage paths.
- class Functionalizer(**options)[source]¶
Bases: object
Functionalizer session class.
- export_results(output_path=None, order: SortBy = SortBy.POST, filename: str = 'circuit.parquet')[source]¶
Writes the touches of the circuit to disk.
- Parameters:
output_path – Overrides the default output directory
order – The sort order of the touches
filename – Overrides the default output file name
- init_data(recipe_file, circuit_config, source, source_nodeset, target, target_nodeset, edges=None)[source]¶
Initialize all data required.
Will load the necessary cell collections from source and target parameters, and construct the underlying brain Circuit. The recipe_file will only be fully processed once filters are instantiated. Similarly, edge and node data will only be fully read once filters are applied.
- Parameters:
recipe_file – A scientific prescription to be used by the filters on the circuit
circuit_config – The basic configuration of the circuit
source – The source population name
source_nodeset – The source nodeset name
target – The target population name
target_nodeset – The target nodeset name
edges – A list of files containing edges
- process_filters(filters=None, overwrite=False)[source]¶
Filter the circuit.
Uses either the specified filters or a default set, based on the parameters passed to the Functionalizer constructor.
Any filter that writes a checkpoint will be skipped if the sequence of data and filters leading up to said checkpoint did not change. Use the overwrite argument to change this behavior.
- Parameters:
filters – A list of filter names to be run. Any Filter suffix should be omitted.
overwrite – Whether to overwrite existing checkpoints
- circuit = None¶
Circuit containing neuron and touch data
- Type:
property
- property output_directory¶
the directory to save results in.
- Type:
property
- recipe = <functionalizer.core._MockRecipe object>¶
The parsed recipe
- Type:
property
- property touches¶
The current touch set, without additional neuron data, as a dataframe.
- Type:
property
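The three methods documented above are typically called in order: load, filter, export. The following is a minimal sketch of that sequence, not the library's own driver code. To keep it runnable without Spark, it uses a hypothetical RecordingSession stand-in that only mirrors the documented method names and arguments (the order argument of export_results is omitted); the helper run_session, all file names, population names, and filter names are likewise assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingSession:
    """Hypothetical stand-in that records calls; NOT part of functionalizer."""

    calls: list = field(default_factory=list)

    def init_data(self, recipe_file, circuit_config, source, source_nodeset,
                  target, target_nodeset, edges=None):
        self.calls.append(("init_data", recipe_file, circuit_config, edges))

    def process_filters(self, filters=None, overwrite=False):
        self.calls.append(("process_filters", filters, overwrite))

    def export_results(self, output_path=None, filename="circuit.parquet"):
        self.calls.append(("export_results", output_path, filename))


def run_session(session, *, recipe_file, circuit_config, source,
                source_nodeset, target, target_nodeset, edges,
                filters=None, output_path=None):
    """Drive a Functionalizer-like session through load, filter, export."""
    # 1. Load the node populations, the recipe, and the touches.
    session.init_data(recipe_file, circuit_config, source, source_nodeset,
                      target, target_nodeset, edges=edges)
    # 2. Run the requested filters (None selects the default set).
    session.process_filters(filters=filters)
    # 3. Write the remaining touches to disk.
    session.export_results(output_path=output_path)
    return session


# All values below are placeholders chosen for this sketch.
session = run_session(
    RecordingSession(),
    recipe_file="recipe.json",
    circuit_config="circuit_config.json",
    source="All",
    source_nodeset=None,
    target="All",
    target_nodeset=None,
    edges=["touches.parquet"],
    filters=["First", "Second"],  # names given without the "Filter" suffix
    output_path="output",
)
print([name for name, *_ in session.calls])
```

With a real Functionalizer instance in place of the stand-in, the same call order would apply, provided the constructor options are set appropriately for the Spark environment.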
Data Handling¶
The NodeData class is used to read both nodes and edges from
binary storage or Parquet. Nodes are customarily stored in SONATA format
based on HDF5, and NodeData will internally cache them in
Parquet format for faster future access.
- class Circuit(source: NodeData, target: NodeData, touches: EdgeData)[source]¶
Bases: object
Representation of a circuit.
Simple data container to simplify and future-proof the API. Objects of this class will hold both nodes and edges of the initial brain connectivity.
Access to both node populations is provided via Circuit.source and Circuit.target. Likewise, the current edges can be obtained via Circuit.touches.
The preferred access to the circuit is through Circuit.df. This object property provides the synapses of the circuit joined with both neuron populations for a full set of properties. The attributes of the source and target neuron populations are prefixed with src_ and dst_, respectively. The identification of the neurons will be plain src and dst.
The Circuit.df property should also be used to update the connectivity.
- Parameters:
source – the source neuron population
target – the target neuron population
touches – the synaptic connections
- static expand(columns, source, target)[source]¶
Expand recipe-convention columns to names and data from dataframes.
For each column name in columns, given in the convention of the recipe, returns a tuple with:
the recipe names
the appropriate source or target name
the appropriate source or target name containing indices to library values
the library values to be used with the indexed column
- static only_touch_columns(df)[source]¶
Remove neuron columns from a dataframe.
- Parameters:
df – a dataframe to trim
- property df¶
return a dataframe representing the circuit.
- Type:
property
- property input_size¶
the original input size in bytes.
- Type:
property
- property metadata¶
metadata associated with the connections.
- Type:
property
- source¶
the source neuron population
- Type:
property
- target¶
the target neuron population
- Type:
property
- property touches¶
The touches originally used to construct the circuit.
- Type:
property
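The naming convention of Circuit.df described above (src_ and dst_ prefixes for neuron attributes, plain src and dst for the neuron identifiers) can be illustrated with plain column names. The sketch below is not the implementation of Circuit.only_touch_columns; the helper split_columns and the example column names are assumptions made for illustration.

```python
def split_columns(columns):
    """Group column names of a joined circuit dataframe by their origin."""
    groups = {"source": [], "target": [], "touch": []}
    for name in columns:
        if name.startswith("src_"):
            groups["source"].append(name)
        elif name.startswith("dst_"):
            groups["target"].append(name)
        else:
            # Plain "src" and "dst" identify the neurons and stay with the touch.
            groups["touch"].append(name)
    return groups


# Hypothetical column names, following the documented prefix convention.
columns = ["src", "dst", "src_mtype", "dst_mtype", "distance_soma"]
groups = split_columns(columns)
print(groups["touch"])
```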
- class NodeData(circuit_config: str, population: str, nodeset: str, cache: str)[source]¶
Bases: object
Neuron data loading facilities.
This class represents neuron populations, lazily loaded. After construction, general properties of the neurons, such as the unique values present (NodeData.mtype_values, NodeData.etype_values, or NodeData.sclass_values), can be accessed.
- property df¶
The PySpark dataframe with the neuron data.
- property population¶
The population name.
Filtering¶
A detailed overview of the scientific filter implementations available in
functionalizer can be found in Synapse Filters.
- class DatasetOperation(recipe, source, target)[source]¶
Bases: object
Basis for synapse filters.
Every filter should derive from DatasetOperation, which will enforce the right format for the constructor and apply() functions. The former is optional, but should be used to extract relevant information from the recipe.
The two node populations are passed to the constructor to enable cross-checks between the recipe information and the population properties. If the constructor raises an exception and the _required attribute is set to False, the filter will be skipped.
If filters add or remove columns from the dataframe, this should be communicated via the _columns attribute, otherwise the general invocation of the filters will fail, as column consistency is checked.
- static pathway_functions(columns, counts)[source]¶
Construct pathway-adding functions given columns and value counts.
- abstractmethod apply(circuit: Circuit)[source]¶
Needs actual implementation of the operation.
Takes a Circuit, applies some operations to it, and returns a Spark dataframe representing the updated circuit.
- _checkpoint = False¶
Store the results on disk, which allows computation to be skipped on subsequent runs.
- _checkpoint_buckets = None¶
Partition the data when checkpointing, which avoids a sort on load.
- _columns = []¶
A list of columns to be consumed and produced.
Each item should be a tuple of two strings, giving the column consumed/dropped, and the column produced. If no column is dropped, None can be used. Likewise, if a column is only dropped, None can be the second element.
Examples:
(None, "synapse_id")  # will produce the column "synapse_id"
("synapse_id", None)  # will drop the column "synapse_id"
("ham", "spam")       # will produce the column "spam" while also
                      # dropping "ham". If the latter is not
                      # present, the former will not be
                      # added.
- _reductive = True¶
Indicates if the filter is expected to reduce the touch count.
- _required = True¶
If set to False, the filter will be skipped if recipe components are not found.
- _visible = False¶
Determines the visibility of the filter to the user.
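The consume/produce convention of the _columns attribute described above can be made concrete with a small, self-contained sketch. It derives the column names one would expect after a filter has run, following only the rules stated in the examples; the helper expected_columns is hypothetical, and the consistency check actually performed by the library may differ in detail.

```python
def expected_columns(existing, spec):
    """Return the column names expected after a filter with the given spec.

    ``spec`` is a list of (consumed, produced) tuples in the style of the
    ``_columns`` attribute; either element may be None.
    """
    result = list(existing)
    for consumed, produced in spec:
        if consumed is not None:
            if consumed not in result:
                # The consumed column is absent: nothing is dropped and,
                # per the documented example, nothing is produced either.
                continue
            result.remove(consumed)
        if produced is not None and produced not in result:
            result.append(produced)
    return result


# The three documented cases, plus the "missing input" variant of the last.
added = expected_columns(["src", "dst"], [(None, "synapse_id")])
dropped = expected_columns(["src", "dst", "synapse_id"], [("synapse_id", None)])
renamed = expected_columns(["src", "ham"], [("ham", "spam")])
skipped = expected_columns(["src"], [("ham", "spam")])
print(added, dropped, renamed, skipped)
```

Comparing such an expected list against the columns of the dataframe returned by apply() is one way a column consistency check could be phrased.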