bripipetools Application Packages

Overview

Application-level packages are those exposed to the user through wrapper scripts and the command line. They are used to perform common, high-level tasks related to pipeline operations and data. Packages are listed roughly in order of dependency hierarchy (i.e., packages listed first depend on subsequently listed packages).

Note

Intended for developers!

The documentation below is effectively a dump of all high-level packages, modules, classes, and methods that are used to run bripipetools. Most users should not need this level of detail, but it provides a starting point for those looking to understand or modify the code.


Package details

dbification package

Manages the collection and annotation of data (e.g., generated by the Genomics Core or produced through bioinformatics processing) for import into GenLIMS. Each module handles the set of data associated with a particular “step” (e.g., a flowcell sequencing run or bioinformatics processing of a batch of samples). The dbification.control module inspects an input path and deploys the appropriate importer class.

control module

Parse arguments to determine and select the appropriate importer class.
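
The dispatch described above can be illustrated with a minimal sketch. The class names, the select_importer function, and the rule used to tell the two kinds of input apart (a path ending in ".txt" is treated as a workflow batch file, anything else as a flowcell run folder) are all assumptions made for illustration; they are not taken from the bripipetools source.

```python
from pathlib import PurePosixPath


class FlowcellRunImporter:
    """Hypothetical stand-in for an importer of flowcell run data."""

    def __init__(self, path):
        self.path = path


class WorkflowBatchImporter:
    """Hypothetical stand-in for an importer of workflow batch data."""

    def __init__(self, path):
        self.path = path


def select_importer(path):
    """Return an importer instance chosen by inspecting the input path.

    Assumed rule (illustration only): a path ending in '.txt' is a
    workflow batch file; any other path is a flowcell run folder.
    """
    if PurePosixPath(path).suffix == ".txt":
        return WorkflowBatchImporter(path)
    return FlowcellRunImporter(path)
```

Keeping the decision in one function means each importer class only needs to know how to handle its own “step”.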

flowcellrun module

Class for importing data from a sequencing run into GenLIMS and the Research DB as new objects.

workflowbatch module

Class for importing data from a processing batch into databases as new objects. Supports both the research database (“genomics…”) and GenLIMS collections.


postprocessing package

Covers a range of operations performed on outputs and other files produced through bioinformatics processing of a batch of samples. For example, the postprocessing.stitching module parses data from individual files of the same type and combines them into a single table covering all samples in a project. Building on this, postprocessing.compiling takes the stitched tables of different types and combines them into one larger table for the project. The postprocessing.cleanup module, by contrast, fixes how files are named and organized on disk.

stitching module

Combine parsed data from a set of batch processing output files and write to a single CSV file.
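
As a rough sketch of the stitching idea, the function below takes already-parsed metrics for each sample and writes one CSV row per sample. The function name, the "sample_id" column, and the "NA" placeholder for missing values are assumptions chosen for illustration, not the module's actual interface.

```python
import csv
import io


def stitch(parsed_by_sample, out_handle):
    """Write one CSV row per sample from a {sample_id: {metric: value}} dict."""
    # Collect metric names in order of first appearance across samples.
    fields = []
    for metrics in parsed_by_sample.values():
        for name in metrics:
            if name not in fields:
                fields.append(name)

    writer = csv.DictWriter(
        out_handle,
        fieldnames=["sample_id"] + fields,
        restval="NA",  # assumed placeholder for metrics a sample lacks
        lineterminator="\n",
    )
    writer.writeheader()
    for sample_id in sorted(parsed_by_sample):
        writer.writerow({"sample_id": sample_id, **parsed_by_sample[sample_id]})


# Usage with an in-memory buffer standing in for the output file.
buffer = io.StringIO()
stitch({"s2": {"reads": 10}, "s1": {"reads": 5, "dup": 0.1}}, buffer)
```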

compiling module

Compile combined/stitched ‘summary’ outputs of different types from batch processing and write to a single CSV file.
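
Compiling can be thought of as joining several stitched tables on a shared sample identifier. The sketch below does this in memory on lists of row dictionaries; the function name and the "sample_id" key are hypothetical and only illustrate the general approach.

```python
def compile_tables(tables, key="sample_id"):
    """Merge several tables (lists of row dicts) into one row per sample."""
    combined = {}
    for rows in tables:
        for row in rows:
            # Start a new merged row for unseen samples, then add columns.
            combined.setdefault(row[key], {key: row[key]}).update(row)
    return [combined[sample] for sample in sorted(combined)]


# Two stitched tables of different types, sharing the sample_id column.
metrics = [{"sample_id": "s1", "reads": "5"}]
qc = [{"sample_id": "s1", "dup": "0.1"}, {"sample_id": "s2", "dup": "0.2"}]
merged = compile_tables([metrics, qc])
```

Samples missing from one of the input tables simply lack those columns in the merged row; a real implementation would need to decide how to fill them before writing the CSV.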

cleanup module

Clean up and organize outputs from a workflow processing batch.


monitoring package

Contains tools for monitoring the status of pipeline steps. Classes and methods here are designed to inspect files on the server and report on various indicators of state (e.g., file existence, access, completion, size, etc.).

workflowbatches module

Monitor the outputs of a workflow processing batch.
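
The kinds of state indicators mentioned above (existence, access, size) can be gathered with the standard library alone. The sketch below is an illustration under that assumption; the function name and the shape of the returned dictionary are hypothetical rather than the package's actual interface.

```python
import os


def check_output(path):
    """Report basic state indicators for a single expected output file."""
    exists = os.path.isfile(path)
    return {
        "exists": exists,
        # Only test read access and size when the file is actually present.
        "readable": exists and os.access(path, os.R_OK),
        "size": os.path.getsize(path) if exists else 0,
    }
```

A batch-level monitor could call such a function for every expected output path and flag samples whose files are missing or empty.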


submission package

Prepares data for batch submission through Globus Galaxy, typically starting from unaligned samples (libraries) from a flowcell run. The submission.batchcreate and submission.batchparameterize modules handle most of the work. The former takes a list of sample paths (or folders containing sample paths) and a workflow template file, and controls the preparation of a batch submit file as well as target folders for batch outputs. The latter sets individual parameter values (mostly input and output file paths) for each sample, which the BatchCreator class then uses to create and write the overall submission instructions. The submission.flowcellsubmit module provides a wrapper around batchcreate, allowing a user to select workflows and generate batch submissions for multiple unaligned projects from a flowcell run.
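
The division of labor between parameterizing samples and creating the batch can be sketched as two small functions. Everything here is an assumption for illustration: the function names, the three columns, the "_result.txt" suffix, and the tab-delimited layout are invented and do not reflect the real batch submit file format or the BatchCreator interface.

```python
import posixpath


def parameterize_samples(sample_paths, output_dir):
    """Set per-sample parameter values (here, just input and output paths)."""
    parameters = []
    for path in sample_paths:
        sample = posixpath.basename(path.rstrip("/"))
        parameters.append({
            "sample": sample,
            "input_path": path,
            # Hypothetical naming scheme for the per-sample output file.
            "output_path": posixpath.join(output_dir, sample + "_result.txt"),
        })
    return parameters


def build_submit_lines(parameters):
    """Assemble tab-delimited submission lines from per-sample parameters."""
    columns = ["sample", "input_path", "output_path"]
    lines = ["\t".join(columns)]
    for row in parameters:
        lines.append("\t".join(row[column] for column in columns))
    return lines
```

Separating the two steps keeps path logic per sample in one place, while the batch-level code only needs to lay out and write the instructions.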

flowcellsubmit module

samplesubmit module

batchcreate module

batchparameterize module