`obipcr`: the electronic PCR tool #

Description #

The obipcr program is the successor of ecoPCR. It is an in silico (electronic) PCR program: it searches the input sequences for the regions that a given primer pair would amplify.

Synopsis #

obipcr --forward <string> --max-length|-L <int> --reverse <string>
       [--allowed-mismatches|-e <int>] [--batch-size <int>] [--circular|-c]
       [--compress|-Z] [--debug] [--delta|-D <int>] [--ecopcr] [--embl]
       [--fasta] [--fasta-output] [--fastq] [--fastq-output]
       [--force-one-cpu] [--fragmented] [--genbank] [--help|-h|-?]
       [--input-OBI-header] [--input-json-header] [--json-output]
       [--max-cpu <int>] [--min-length|-l <int>] [--no-order]
       [--no-progressbar] [--only-complete-flanking] [--out|-o <FILENAME>]
       [--output-OBI-header|-O] [--output-json-header]
       [--paired-with <FILENAME>] [--pprof] [--pprof-goroutine <int>]
       [--pprof-mutex <int>] [--skip-empty] [--solexa] [--version] [<args>]

Options #

`obipcr` mandatory options #

--forward <PATTERN>: The forward primer used for the electronic PCR. IUPAC codes can be used in the pattern.
--reverse <PATTERN>: The reverse primer used for the electronic PCR. IUPAC codes can be used in the pattern.
--max-length | -L <INTEGER>: Maximum length of the barcode, primers excluded.
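
As an illustration of what IUPAC codes mean in a primer pattern, the sketch below expands each ambiguity code into the set of nucleotides it stands for, producing an ordinary regular expression. The function name `iupac_to_regex` is hypothetical and is not part of OBITools4. obipcr interprets IUPAC codes itself, so this is only a way to visualise a degenerate primer. It assumes an uppercase primer.

```shell
# Hypothetical helper (not part of OBITools4): expand IUPAC codes to a regex.
iupac_to_regex() {
  local primer=$1 out="" i c
  for ((i = 0; i < ${#primer}; i++)); do
    c=${primer:i:1}
    case $c in
      A|C|G|T) out+=$c ;;        # unambiguous bases are kept as-is
      R) out+='[AG]' ;;
      Y) out+='[CT]' ;;
      S) out+='[CG]' ;;
      W) out+='[AT]' ;;
      K) out+='[GT]' ;;
      M) out+='[AC]' ;;
      B) out+='[CGT]' ;;
      D) out+='[AGT]' ;;
      H) out+='[ACT]' ;;
      V) out+='[ACG]' ;;
      N) out+='[ACGT]' ;;
      *) echo "unknown IUPAC code: $c" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

iupac_to_regex GGRAY   # prints GG[AG]A[CT]
```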

Other `obipcr` specific options #

--allowed-mismatches | -e <INTEGER>: Maximum number of mismatches allowed for each primer (default: 0).
--min-length | -l <INTEGER>: Minimum length of the barcode, primers excluded (default: no minimum length).
--circular | -c : Considers that sequences are circular. (default: sequences are considered linear)
--delta | -D <INTEGER>: Without this option, only the barcode sequences are output, without the priming sites. With this option, the priming sites and their flanking sequences, over a length of delta on each side of the barcode, are added to the output.
--only-complete-flanking: Works in conjunction with --delta. Prints only sequences with full-length flanking sequences (default: prints every sequence regardless of whether the flanking sequences are present).
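
The two flanking options above could be combined as in the following sketch. The wrapper name `pcr_with_flanks` and the delta value of 50 are arbitrary choices made for illustration. Only flags listed in the synopsis are used.

```shell
# Hypothetical wrapper: keeps the priming sites plus 50 flanking nucleotides
# on each side, and drops amplicons whose flanks are incomplete.
# Any extra arguments (input files or directories) are passed to obipcr.
pcr_with_flanks() {
  obipcr -e 3 -L 220 \
         --delta 50 \
         --only-complete-flanking \
         --forward GGGCAATCCTGAGCCAA \
         --reverse CCATTGAGTCTCTGCACCTATC \
         "$@"
}

# Possible usage (the output file name is an assumption):
#   pcr_with_flanks /data/Genbank/Release_261 > Sper01_flanked.fasta
```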

Controlling the input data #

OBITools4 usually detects the input file format automatically, including whether the file is compressed with gzip. In the rare cases where a file is misidentified, the following options force the format and bypass the format identification step.

The file format options #

--fasta: indicates that sequence data is in fasta format.
--fastq: indicates that sequence data is in fastq format.
--embl: indicates that sequence data is in EMBL-ENA flatfile format.
--csv: indicates that sequence data is in CSV format.
--genbank: indicates that sequence data is in GenBank flatfile format.
--ecopcr: indicates that sequence data is in the old ecoPCR tabulated format.
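
When the format has to be forced, the matching flag can be picked by hand or, for the two simplest formats, from the first character of the file: FASTA records start with `>` and FASTQ records with `@`. The helper below is a minimal sketch of that idea. `guess_format_flag` is a hypothetical name, it only handles uncompressed files, and it is not how OBITools4 itself identifies formats.

```shell
# Hypothetical helper: print the obipcr format flag for an uncompressed
# FASTA or FASTQ file, based on the first byte of the file.
guess_format_flag() {
  local first
  first=$(head -c 1 -- "$1") || return 1
  case $first in
    '>') echo "--fasta" ;;
    '@') echo "--fastq" ;;
    *)   echo "cannot guess the format of $1" >&2; return 1 ;;
  esac
}

# Possible usage (reads.txt is an assumed file name):
#   obipcr "$(guess_format_flag reads.txt)" -L 220 \
#          --forward GGGCAATCCTGAGCCAA \
#          --reverse CCATTGAGTCTCTGCACCTATC \
#          reads.txt
```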

Controlling the way OBITools4 formats annotations #

These options only apply to the FASTA and FASTQ formats.

--input-OBI-header: FASTA/FASTQ title line annotations follow the old OBI format.
--input-json-header: FASTA/FASTQ title line annotations follow the JSON format.

Controlling quality score decoding #

This option only applies to the FASTQ format.

--solexa: decodes quality string according to the old Solexa specification. (default: the standard Sanger encoding is used, env: OBISSOLEXA)

Controlling the output data #

--compress | -Z : output is compressed using gzip. (default: false)
--no-order: By default, the OBITools keep the sequences in the output in the same order as in the input, and multiple input files are processed one at a time. With the --no-order option, multiple input files can be opened at the same time and their contents processed in parallel. This usually increases processing speed, but the order of the sequences in the output file is no longer guaranteed. Processing multiple files in parallel may also require more memory.
--fasta-output: writes sequence data in fasta format (default if quality data is not available).
--fastq-output: writes sequence data in fastq format (default if quality data is available).
--json-output: writes sequence data in JSON format.
--out | -o <FILENAME>: filename used for saving the output (default: “-”, the standard output)
--output-OBI-header | -O : writes output FASTA/FASTQ title line annotations in OBI format (default: JSON).
--output-json-header: writes output FASTA/FASTQ title line annotations in JSON format (the default format).
--skip-empty: sequences of length equal to zero are removed from the output (default: false).
--no-progressbar: deactivates progress bar display (default: false).
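
Several of the output options above can be used together. The sketch below writes gzip-compressed output to a named file and gives up input ordering, which is only relevant when several input files are processed. The wrapper name `pcr_unordered_gz` and the output file name in the usage line are assumptions.

```shell
# Hypothetical wrapper: first argument is the output file, the remaining
# arguments are the input files or directories passed to obipcr.
pcr_unordered_gz() {
  local out=$1
  shift
  obipcr -L 220 \
         --forward GGGCAATCCTGAGCCAA \
         --reverse CCATTGAGTCTCTGCACCTATC \
         --no-order \
         --no-progressbar \
         --compress \
         --out "$out" \
         "$@"
}

# Possible usage:
#   pcr_unordered_gz Sper01_obipcr.fasta.gz /data/Genbank/Release_261
```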

General options #

--help | -h|-? : shows this help.
--version: prints the version and exits.
--silent-warning: This option tells obitools to stop displaying warnings. This behaviour can be controlled by setting the OBIWARNINGS environment variable.

--max-cpu <INTEGER>: OBITools can take advantage of your computer’s multi-core architecture by parallelizing the computation across all available CPUs. Computing on more CPUs usually requires more memory to perform the computation. Reducing the number of CPUs used to perform a calculation is also a way to indirectly control the amount of memory used by the process. The number of CPUs used by OBITools can also be controlled by setting the OBIMAXCPU environment variable.
--force-one-cpu: forces the use of a single CPU core for parallel processing (default: false).
--batch-size <INTEGER>: number of sequences per batch for parallel processing (default: 1000, env: OBIBATCHSIZE)

--debug: enables debug mode, by setting log level to debug (default: false, env: OBIDEBUG)
--pprof: enables pprof server. Look at the log for details. (default: false).
--pprof-mutex <INTEGER>: enables profiling of mutex locks. (default: 10, env: OBIPPROFMUTEX)
--pprof-goroutine <INTEGER>: enables the goroutine blocking profile. (default: 6060, env: OBIPPROFGOROUTINE)
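
The CPU and batch-size settings described above for --max-cpu and --batch-size can also be set once for a whole shell session through the OBIMAXCPU and OBIBATCHSIZE environment variables. The values below are arbitrary illustrations, not recommendations.

```shell
# Limit OBITools to 4 CPUs for every command run from this shell
# (same role as passing --max-cpu 4).
export OBIMAXCPU=4

# Use batches of 500 sequences instead of the default 1000
# (same role as passing --batch-size 500).
export OBIBATCHSIZE=500
```

Lowering both values is one way to reduce memory use on a small machine, at the cost of speed.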

Examples #

The minimal obipcr command looks like this:

obipcr -L 220 \
       --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       /data/Genbank/Release_261 \
       > Sper01_obipcr.fasta

It retrieves the sequences from the NCBI GenBank Release 261 database located in the /data/Genbank/Release_261 directory. The output is saved in the file Sper01_obipcr.fasta. The primer pair is specified as GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC using the --forward and --reverse options. These primers correspond to the Sper01 marker. The -L option specifies the maximum length of the barcode excluding the primers, here 220 nucleotides. By default, no mismatches are allowed between the primers and the priming sites.

To allow mismatches between the primers and the priming sites, use the --allowed-mismatches option or its short form -e. In the command below, up to 3 mismatches are allowed per primer, and they can occur anywhere in the primer.

obipcr -e 3 \
       -L 220 \
       --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       /data/Genbank/Release_261 \
       > Sper01_obipcr.fasta

To disallow mismatches at specific positions, add a hash sign (#) after the blocked position. For example, GGGCAATCCTGAGCCAA# disallows mismatches at the last position of the forward primer. Since # also introduces comments in a bash script, a primer containing the # sign must be enclosed in single or double quotes.

obipcr -e 3 \
       -L 220 \
       --forward "GGGCAATCCTGAGCCAA#" \
       --reverse "CCATTGAGTCTCTGCACCTATC" \
       /data/Genbank/Release_261 \
       > Sper01_obipcr.fasta
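
The rule described above (at most a given number of mismatches per primer, and none at a position followed by #) can be sketched as a small bash function. This is only an illustration of the rule, not obipcr's matching algorithm. The name `primer_matches` is hypothetical, and the sketch assumes plain A, C, G, T primers compared against a site of the same length.

```shell
# Hypothetical helper: primer_matches PRIMER SITE MAX_MISMATCHES
# Succeeds if SITE differs from PRIMER at no more than MAX_MISMATCHES
# positions and at no position that is followed by '#' in PRIMER.
primer_matches() {
  local primer=$1 site=$2 max=$3
  local i=0 pos=0 mismatches=0 base blocked
  while ((i < ${#primer})); do
    base=${primer:i:1}
    blocked=0
    if [[ ${primer:i+1:1} == "#" ]]; then
      blocked=1                      # a '#' follows: no mismatch allowed here
    fi
    if [[ $base != "${site:pos:1}" ]]; then
      if ((blocked)); then
        return 1                     # mismatch at a blocked position
      fi
      mismatches=$((mismatches + 1))
    fi
    i=$((i + 1 + blocked))           # skip the '#' marker when present
    pos=$((pos + 1))
  done
  ((mismatches <= max))
}

# One mismatch at the first position: accepted with up to 3 mismatches.
primer_matches "GGGCAATCCTGAGCCAA#" "AGGCAATCCTGAGCCAA" 3 && echo "accepted"
# One mismatch at the blocked last position: rejected.
primer_matches "GGGCAATCCTGAGCCAA#" "GGGCAATCCTGAGCCAT" 3 || echo "rejected"
```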
