Debugging¶
Debug flags¶
For additional output containing information about the parameters for
reduce and cut steps, run Functionalizer with the --debug flag.
Warning
As activating this flag will lead to Functionalizer consuming a lot of memory to gather the required information for the debug output, it is not advised to activate this flag for large executions or by default.
Following a live execution¶
While a Functionalizer process is running, progress can be followed using the
web-interface of the first node executing Functionalizer, given that
${SLURM_JOBID} points to a valid SLURM job:
$ echo "http://$(echo $(sacct -j ${SLURM_JOBID} -n -o nodelist|head -n 1)):4040"
Above command will print the URL of the Apache Spark web interface, which can be used to track some progress.
Similarly, if a Hadoop cluster is instantiated alongside the Spark cluster, using the following will yield a web interface to check on, e.g., disk usage:
$ echo "http://$(echo $(sacct -j ${SLURM_JOBID} -n -o nodelist|head -n 1)):50070
To gauge other resource usage, follow the node list given by
$ sacct -j ${SLURM_JOBID} -n -o nodelist
With the BBP internal system monitoring, CPU utilization and memory usage
can be displayed for each of the nodes listed by the command above.
Use the BB5 System monitoring dashboard directly, and search for the
fully qualified domain name of the node above (including the
_bbp_epfl_ch suffix).
In case the URL has changed, look for the corresponding dashboard in the
BlueBrain Grafana instance.
Post execution analysis¶
Logs of past executions can be analyzed if the logs in a directory called eventlog have been conserved. To find this directory, use the following in the output directory of Functionalizer:
$ find . -name eventlog
$ export LOGDIR=$(find . -name eventlog)
Then an Apache Spark history server can be started as follows:
$ module load functionalizer
$ ${SPARK_HOME}/bin/spark-class \
-Dspark.history.fs.logDirectory=${LOGDIR} \
-Dspark.daemon.memory=30g \
-Dspark.daemon.cores=8 \
org.apache.spark.deploy.history.HistoryServer
The history server then will be active on port 18080 of the machine it was
started on, i.e., if bbpv1 is used, navigate to http://bbpv1:18080.
There will be a list of past executions of Functionalizer, and the history
server will take a small while to process one when opened.
The page of the execution will display job and stage status (a stage