bripipetools Core Packages¶
Overview¶
“Core” packages do most of the heavy lifting and are called by application-level modules to perform various pipeline tasks. Packages are listed roughly in order of dependency hierarchy (i.e., packages listed first depend on packages listed later).
Note
Intended for developers!
The documentation below is effectively a dump of all low-level packages, modules, classes, and methods that are used to run bripipetools. This amount of detail shouldn’t be needed for most users, but provides a starting point for those looking to understand or modify the code.
Package details¶
annotation package¶
Includes critical functionality for identifying, locating, and describing data and results at various points (e.g., data generation, computational processing) in the bioinformatics pipeline. Each “annotator” class, contained in its respective module, is responsible for collecting and/or updating information for a specific object in the GenLIMS database. When possible, details for an object are retrieved directly from the database; for new objects or objects with missing fields, information is compiled, parsed, and formatted (as needed) from files on the server.
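The "database first, files second" behavior described above can be sketched roughly as follows. This is only an illustration, not the actual bripipetools code: the class name `LibraryAnnotator`, the `REQUIRED_FIELDS` tuple, the field names, and the `parse_files` callable are all hypothetical, and a plain dictionary stands in for a GenLIMS collection.

```python
# Hypothetical field names; the real GenLIMS schema may differ.
REQUIRED_FIELDS = ('project_id', 'fastq_path')


class LibraryAnnotator:
    """Illustrative annotator: prefer the database, fall back to files."""

    def __init__(self, library_id, db, parse_files):
        self.library_id = library_id
        self.db = db                    # dict standing in for a database collection
        self.parse_files = parse_files  # callable returning fields parsed from files

    def get_library(self):
        # Start from the stored record if one exists, else from a bare ID.
        record = dict(self.db.get(self.library_id) or {'_id': self.library_id})
        # Consult files on the server only when required fields are missing.
        missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
        if missing:
            parsed = self.parse_files(self.library_id)
            for field in missing:
                if field in parsed:
                    record[field] = parsed[field]
        return record


db = {'lib1': {'_id': 'lib1', 'project_id': 'P1'}}
annotator = LibraryAnnotator(
    'lib1', db,
    lambda lib_id: {'project_id': 'from_file', 'fastq_path': '/data/lib1.fastq.gz'},
)
record = annotator.get_library()
```

In this sketch, values already stored in the database take precedence, so `project_id` keeps its stored value and only `fastq_path` is filled in from the parsed files.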
sequencedlibraries module¶
Classify / provide details for sequenced libraries (outputs of a flowcell sequencing run) and the associated raw data.
flowcellruns module¶
Classify / provide details for objects generated from an Illumina sequencing run performed by the BRI Genomics Core.
processedlibraries module¶
workflowbatches module¶
Classify / provide details for objects generated from a Globus Galaxy workflow processing batch performed by the BRI Bioinformatics Core.
qc package¶
Contains classes and methods for performing post-hoc quality control operations on raw or processed genomics data. Modules are organized according to the specific QC step performed. Unlike the routine quality metrics and information that standard bioinformatics tools provide through processing workflows, modules here are aimed more at identifying problems with sample handling or data generation. As such, outputs from these modules are designated as a special type, ‘validation’, to distinguish them from the QC, metrics, counts, and other output types generated through processing.
sexcheck module¶
Class and methods to perform routine sex check on all processed libraries.
sexverify module¶
Class and methods to perform routine sex check on all processed libraries.
sexpredict module¶
Class and methods to perform routine sex check on all processed libraries.
database package¶
Contains methods for low-level interaction with BRI databases (GenLIMS
and ResDB): connecting to them, retrieving data from them, and
inserting data into them. Under the hood, much of the functionality in
this package relies on the pymongo client library for MongoDB. The
database.operations module provides wrapper functions for
getting/putting objects from/to commonly used database collections,
while database.mapping helps to construct Python model class objects
from database documents. Methods in the database.connection module
manage the database connection, depending on environment and
configuration.
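As a rough illustration of what get/put wrapper functions over a collection might look like, consider the sketch below. The function names `get_documents` and `put_document` are hypothetical and are not taken from database.operations; `db` is assumed to behave like a pymongo `Database` object, and `find` and `update_one` are the standard pymongo collection methods.

```python
def get_documents(db, collection, query):
    """Return all documents in `collection` that match `query` as a list."""
    return list(db[collection].find(query))


def put_document(db, collection, doc):
    """Insert `doc`, or update the stored document that has the same _id."""
    # Leave _id out of the update body; it is used only to match the document.
    fields = {key: value for key, value in doc.items() if key != '_id'}
    db[collection].update_one(
        {'_id': doc['_id']},   # match on the document ID
        {'$set': fields},      # overwrite only the supplied fields
        upsert=True,           # create the document if it does not exist
    )
```

Using `$set` with `upsert=True` means a second "put" for the same ID adds or overwrites fields rather than replacing the whole document; whether the real wrappers behave this way is an assumption.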
connection module¶
Connect to a BRI Mongo database.
operations module¶
Basic operations for BRI Mongo databases.
mapping module¶
bripipetools mapping submodule: methods to map from Mongo documents to model classes.
model package¶
Establishes the underlying data model linking data from bioinformatics
processing pipelines to the GenLIMS/TG3 database. Python class
representations of database objects (documents) are defined in the
model.documents module. These classes include some basic
functionality, mostly related to setting/formatting attributes,
which are eventually fed back into the database as key-value pairs.
However, model classes are also the basic “currency” for several other
modules, where they are used to retrieve, modify, store, and return
data.
Depends on the util and parsing modules.
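To make the idea of a model class concrete, here is a minimal sketch of a document class that sets and formats attributes and then returns them as key-value pairs. The class name `GenericDocument`, its attributes, and the methods `update_attrs` and `to_json` are assumptions chosen for illustration; the actual classes in model.documents may differ.

```python
import datetime


class GenericDocument:
    """Illustrative stand-in for a class representing a database document."""

    def __init__(self, _id, type=None):
        self._id = _id
        self.type = type
        self.date_created = None

    def update_attrs(self, attrs, force=False):
        """Set attributes from a dict; keep existing values unless force=True."""
        for name, value in attrs.items():
            if force or getattr(self, name, None) is None:
                setattr(self, name, value)

    def to_json(self):
        """Return attributes as key-value pairs ready to store as a document."""
        doc = {}
        for name, value in vars(self).items():
            if isinstance(value, datetime.date):
                value = value.isoformat()  # format dates as ISO 8601 strings
            doc[name] = value
        return doc


lib = GenericDocument('lib1234', type='library')
lib.update_attrs({'type': 'sample', 'project_id': 'P1'})
doc = lib.to_json()
```

Here `type` keeps its original value because `force` is not set, while the previously unset `project_id` is added.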
documents module¶
Classes representing documents in the GenLIMS database.
io package¶
Contains class representations of various file types produced through the generation or processing of genomics data. In particular, most of these classes provide methods for reading and parsing raw data from files and storing/returning these data in a more usable format, such as dictionaries or data frames. Each module contains the representation of a file generated by a particular tool or routine; some submodules may handle files from multiple methods within a tool (e.g., Picard). While not explicitly organized as such, modules adhere to a hierarchy based on the “type” of file, where current types include metrics, counts, QC, and validation.
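The read-then-parse pattern described above might look something like the following sketch for a counts file. The class name `CountsFile` and its `data` layout are hypothetical. The file format assumed here is the two-column, tab-delimited output of htseq-count, in which summary counters are reported with a leading `__` (e.g., `__no_feature`).

```python
import csv


class CountsFile:
    """Illustrative reader for a two-column, tab-delimited counts file."""

    def __init__(self, path):
        self.path = path
        self.data = {'raw': None, 'counts': {}, 'summary': {}}

    def _read_file(self):
        # Keep the raw rows so that parsing can be repeated without re-reading.
        with open(self.path, newline='') as handle:
            self.data['raw'] = list(csv.reader(handle, delimiter='\t'))

    def parse(self):
        """Read the file once, then split feature counts from summary rows."""
        if self.data['raw'] is None:
            self._read_file()
        for row in self.data['raw']:
            if len(row) != 2:
                continue  # skip blank or malformed lines
            name, count = row[0], int(row[1])
            # Rows whose name starts with '__' are summary counters.
            target = 'summary' if name.startswith('__') else 'counts'
            self.data[target][name] = count
        return self.data['counts']
```

Calling `CountsFile(path).parse()` returns a dictionary mapping feature names to integer counts; the summary counters remain available under `data['summary']`.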
workflow module¶
Class for reading and parsing Galaxy workflow files.
workflowbatch module¶
Classes for reading, parsing, and writing workflow batch submit files for Globus Galaxy.
picardmetrics module¶
Class for reading and parsing Picard metrics files.
htseqmetrics module¶
Class for reading and parsing htseq metrics files.
tophatstats module¶
Class for reading and parsing Tophat Stats metrics files.
fastqc module¶
Class for reading and parsing FastQC report files.
htseqcounts module¶
Class for reading and parsing htseq counts files.
sexcheck module¶
Class for reading and parsing sex check validation files.
parsing package¶
Slightly more specialized than the methods in the util.strings module,
this package provides functions for parsing and extracting information
from strings
that follow some expected nomenclature. The primary examples of this
information are IDs, names, labels, and other metadata for files and
objects generated either by Illumina technology or the BRI Genomics
Core (via GenLIMS). The parsing.processing module is also designed
to handle specialized strings and labels related to processing
workflows in Globus Galaxy.
Depends on the util module.
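As an example of parsing a string that follows an expected nomenclature, the sketch below extracts the parts of an Illumina flowcell run ID. The function name `parse_flowcell_run_id` and the returned keys are hypothetical, and the pattern assumes the common run folder layout of date, instrument ID, run number, and flowcell position plus flowcell ID, separated by underscores. The example run ID and path are made up for illustration.

```python
import re

# Assumed layout: YYMMDD_<instrument>_<run number>_<A|B><flowcell ID>
RUN_ID_PATTERN = re.compile(
    r'(?P<date>\d{6})_'            # run date, YYMMDD
    r'(?P<instrument>[A-Z0-9]+)_'  # instrument ID
    r'(?P<run_number>\d+)_'        # run counter for the instrument
    r'(?P<position>[AB])'          # flowcell position on the instrument
    r'(?P<flowcell_id>[A-Z0-9]+)'  # flowcell ID
)


def parse_flowcell_run_id(text):
    """Return the parts of the first run ID found in `text`, or None."""
    match = RUN_ID_PATTERN.search(text)
    if match is None:
        return None
    parts = match.groupdict()
    parts['run_number'] = int(parts['run_number'])
    return parts


parts = parse_flowcell_run_id('/data/150615_D00565_0087_AC6VG0ANXX/Unaligned')
```

Because the function searches rather than matches from the start, it works on a bare run ID as well as on a file path that contains one.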
gencore module¶
illumina module¶
processing module¶
util module¶
Includes convenience methods for handling and manipulating
strings (util.strings) and file paths (util.files), as well as
for user interactions via the command line (util.ui). Methods are used
throughout other packages to streamline common operations.
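A string helper of the kind described here could be as simple as converting between naming styles, for instance between snake_case Python attribute names and camelCase document keys. The two functions below are hypothetical examples and are not taken from util.strings.

```python
import re


def to_camel_case(snake_str):
    """Convert a snake_case name such as 'date_created' to camelCase."""
    first, *rest = snake_str.split('_')
    return first + ''.join(word.capitalize() for word in rest)


def to_snake_case(camel_str):
    """Convert a camelCase name such as 'dateCreated' to snake_case."""
    # Insert an underscore before each capital letter except at the start.
    return re.sub(r'(?<!^)(?=[A-Z])', '_', camel_str).lower()
```

These helpers handle simple names only; names with a leading underscore or with runs of capital letters would need extra care.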