Debugging Nipype Workflows

Throughout Nipype we try to provide meaningful error messages. If you run into an error that does not have a meaningful error message, please let us know so that we can improve error reporting.

Here are some notes that may help with debugging workflows or understanding performance issues.

  1. Always run your workflow first on a single iterable value (e.g. one subject), then gradually increase the complexity of the execution distribution (Linear -> MultiProc -> SGE).

  2. Use the debug config mode. This can be done by setting:

    from nipype import config
    config.enable_debug_mode()
    

    as the first import of your Nipype script. To also enable debug logging, use:

    from nipype import logging
    logging.update_logging(config)
    

    Note

    Turning on debug mode will cause your workflows to rerun, and they will rerun again once debug mode is turned off.

  3. There are several configuration options that can help with debugging. See Configuration File for more details:

    keep_inputs
    remove_unnecessary_outputs
    stop_on_first_crash
    stop_on_first_rerun
    
  4. When running in distributed mode on cluster engines, a node can fail without generating a crash file in the crashdump directory. In such cases, the crash file is stored in the batch directory instead.

  5. All Nipype crashfiles can be inspected with the nipypecli crash utility.

  6. The nipypecli search command allows you to search for regular expressions in the tracebacks of the Nipype crashfiles within a log folder.

  7. Nipype computes a hash of each node's input state to decide whether the node must be rerun. If any input contains strings that are paths to existing files, the hashing mechanism also computes the timestamp or content hash of each of those files. Consequently, a node whose inputs contain huge dictionaries (or lists) of file names can incur serious performance penalties.

  8. For very large processing runs, set 'stop_on_first_crash': 'False' to get the bulk of the processing done, then set 'stop_on_first_crash': 'True' to debug and find the failing cases. Setting 'stop_on_first_crash': 'False' is a reasonable choice when you expect about 90% of the data to process properly.

  9. Sometimes Nipype will hang as if nothing is happening, and pressing Ctrl+C produces a ConcurrentLogHandler error. In that case, remove the pypeline.lock file in your home directory and continue.

  10. On many clusters with shared NFS mounts, files may not be synchronized across cluster nodes before the typical NFS cache timeout expires. When using the PBS/LSF/SGE/Condor plugins in such cases, the workflow may crash because it cannot retrieve a node's result. Increasing job_finished_timeout (in seconds) can help:

    workflow.config['execution']['job_finished_timeout'] = 65
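
The effect of job_finished_timeout in item 10 can be pictured with the simplified sketch below. It assumes, as the setting's name suggests, that after a job is reported finished Nipype waits up to that many seconds for the job's result file to become visible. This is not Nipype's implementation: the helper wait_for_result and its polling interval are hypothetical. It polls for a file until it appears or the deadline passes, so a timeout longer than the NFS cache delay gives a slow-to-synchronize file time to show up.

```python
import os
import time


def wait_for_result(path: str, timeout: float, poll_interval: float = 0.5) -> bool:
    """Return True once `path` exists, or False if `timeout` seconds elapse first.

    Hypothetical helper: on NFS, a file written on a compute node may stay
    invisible to the submitting host until its attribute cache expires, so a
    longer timeout gives the file more time to appear.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep between checks, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```

For example, wait_for_result("/shared/results/node.pklz", timeout=65) would keep checking for about a minute before giving up, which mirrors the value used in item 10.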
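
The debugging options listed in item 3 can also be kept in a configuration file. The sketch below renders one with Python's standard configparser module. It assumes that all four options belong to the [execution] section and that the values shown suit a debugging session; the helper debug_config_text is hypothetical, and where Nipype looks for such a file should be checked in the Configuration File documentation.

```python
import configparser
import io
from typing import Dict, Optional

# Assumed debugging values for the [execution] options listed in item 3.
DEBUG_EXECUTION_OPTIONS: Dict[str, str] = {
    "keep_inputs": "true",                  # keep input files in node working directories
    "remove_unnecessary_outputs": "false",  # keep outputs even if nothing downstream uses them
    "stop_on_first_crash": "true",          # halt at the first failing node
    "stop_on_first_rerun": "false",         # set to "true" to catch unexpected reruns
}


def debug_config_text(options: Optional[Dict[str, str]] = None) -> str:
    """Render an INI-style configuration containing an [execution] section."""
    parser = configparser.ConfigParser()
    parser["execution"] = dict(options or DEBUG_EXECUTION_OPTIONS)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(debug_config_text())
```

The same options can be set from code through the config object imported in item 2, for example config.set('execution', 'stop_on_first_crash', 'true').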
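
Items 4 to 6 mean that crash files can end up in more than one place. The standard-library sketch below walks a whole output tree, so it covers both the crashdump directory and batch directories, and greps crash files for a regular expression, similar in spirit to nipypecli search. It is only an illustration: the function names are hypothetical, it assumes crash files have "crash" in their file name with a .pklz or .txt extension, and it only searches crash files saved as plain text. Pickled .pklz files should still be opened with nipypecli crash.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

CRASH_SUFFIXES = {".pklz", ".txt"}  # assumed crash-file extensions


def find_crash_files(root: str) -> List[Path]:
    """Recursively list files under `root` whose names contain 'crash'."""
    return sorted(
        path
        for path in Path(root).rglob("*crash*")
        if path.is_file() and path.suffix in CRASH_SUFFIXES
    )


def search_crash_tracebacks(root: str, pattern: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for regex matches in plain-text crash files."""
    regex = re.compile(pattern)
    for path in find_crash_files(root):
        if path.suffix != ".txt":
            continue  # pickled crash files need Nipype to load; skip them here
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, lineno, line
```

Because the search starts from a common root rather than a single directory, a crash file written to a batch directory (item 4) is found as well.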
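
To see why item 7 matters, the simplified sketch below mimics the idea of hashing a node's input state. It walks nested dictionaries and lists, and for every string that names an existing file it mixes that file's size and modification time (or, in "content" mode, the file's bytes) into the hash. This is not Nipype's hashing code; the function name and the exact hashing details are assumptions. The point it illustrates is that the cost grows with the number of file paths in the inputs, because each one needs at least a stat call and, in content mode, a full read.

```python
import hashlib
import os
from typing import Any, Tuple


def hash_input_state(value: Any, method: str = "timestamp") -> Tuple[str, int]:
    """Return (hex digest, number of files inspected) for a nested input value."""
    hasher = hashlib.sha256()
    files_inspected = 0

    def visit(item: Any) -> None:
        nonlocal files_inspected
        if isinstance(item, dict):
            for key in sorted(item, key=str):  # stable order for dict keys
                hasher.update(repr(key).encode())
                visit(item[key])
        elif isinstance(item, (list, tuple)):
            for element in item:
                visit(element)
        elif isinstance(item, str) and os.path.isfile(item):
            files_inspected += 1  # every file path costs filesystem I/O
            hasher.update(item.encode())
            if method == "content":
                # Full read in 1 MiB chunks: expensive for many or large files.
                with open(item, "rb") as handle:
                    for chunk in iter(lambda: handle.read(1 << 20), b""):
                        hasher.update(chunk)
            else:
                info = os.stat(item)  # cheaper: metadata only
                hasher.update(f"{info.st_size}:{info.st_mtime}".encode())
        else:
            hasher.update(repr(item).encode())

    visit(value)
    return hasher.hexdigest(), files_inspected
```

With a list of thousands of file names as a single input, every evaluation of this hash touches thousands of files, which is the performance penalty item 7 warns about.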