obipcr: the electronic PCR tool #

Description #

obipcr performs in silico PCR: it scans a database of DNA sequences for regions bounded by a pair of primers and returns the amplicon sequences. It is the OBITools4 successor to ecoPCR and is the standard tool for assessing which taxa a primer pair can amplify before wet-lab work, or for extracting reference barcodes from annotated sequence databases to build metabarcoding reference libraries.

The command has three mandatory options: a forward primer (--forward), a reverse primer (--reverse), and a maximum amplicon length excluding primers (--max-length). Both primers support IUPAC ambiguity codes and bracketed alternatives (e.g., W, R, [AT]) and can include position anchors: appending # after a base forces an exact match at that position regardless of the mismatch budget (see the complete description of the pattern syntax).
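To illustrate how IUPAC codes enter the mismatch count, the Python sketch below compares a primer with a candidate site of the same length. It is a simplified model and not obipcr's implementation: the `IUPAC` table and the `count_mismatches` helper are names chosen for this example, and the bracket and # parts of the pattern syntax are not handled.

```python
# Simplified model of primer/site comparison with IUPAC ambiguity codes.
# Hypothetical helper names; this is not obipcr's own code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(primer.upper(), site.upper())
    )


print(count_mismatches("GGWCAR", "ggtcag"))  # W allows t, R allows g
print(count_mismatches("GGWCAR", "ggccat"))  # c is not W, t is not R
```

A site would then be accepted when this count does not exceed the value given to --allowed-mismatches.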

The output is a fasta file where each record represents one amplicon found in one input sequence. The record identifier is derived from the source sequence name with the matched coordinates appended in brackets (e.g., NC_035499_sub[53275..53294]). Each amplicon carries additional annotation attributes describing the primer binding: direction, forward_primer, forward_match, forward_error, reverse_primer, reverse_match, and reverse_error. All other metadata present on the original sequence record is preserved.
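For downstream scripts, a title line of this output can be split into its parts with a few lines of Python. The sketch below assumes that every identifier follows the `name_sub[start..end]` shape of the example above and that annotations use the default JSON header format; `parse_header` is a hypothetical helper, not part of OBITools4.

```python
import json
import re

# Title line based on the example identifier above (annotations shortened).
HEADER = ('>NC_035499_sub[53275..53294] '
          '{"direction":"forward","forward_error":0,"reverse_error":0,"taxid":"69721"}')

# Assumption: identifiers always end with "_sub[start..end]".
ID_PATTERN = re.compile(r"^(?P<source>.+)_sub\[(?P<start>\d+)\.\.(?P<end>\d+)\]$")


def parse_header(line: str) -> dict:
    """Split a FASTA title line into source name, coordinates and annotations."""
    identifier, _, annotations = line.lstrip(">").partition(" ")
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier: {identifier}")
    return {
        "source": match["source"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        "annotations": json.loads(annotations) if annotations else {},
    }


record = parse_header(HEADER)
print(record["source"], record["start"], record["end"])
print(record["annotations"]["forward_error"])
```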

graph TD
  A@{ shape: doc, label: "plastid10.fasta.gz" }
  C[obipcr]
  D@{ shape: doc, label: "out_amplicon.fasta" }
  A --> C:::obitools
  C --> D
  classDef obitools fill:#99d57c

The simplest obipcr command requires only the two primers and a maximum amplicon length. Given a file of 10 chloroplast sequences (plastid10.fasta.gz), the following command retrieves all amplicons of the Sper01 marker with no mismatches allowed:

obipcr --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       --max-length 220 \
       --circular \
       --no-progressbar \
       plastid10.fasta.gz \
       > out_amplicon.fasta
📄 out_amplicon.fasta
>NC_035499_sub[53275..53294] {"definition":"Zantedeschia aethiopica chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Zantedeschia aethiopica (calla lily)","taxid":"69721"}
atccttgttttgagaaaaag
>NC_044169_sub[46005..46052] {"definition":"Avena longiglumis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Avena longiglumis","taxid":"4500"}
atccgtgttttgagaggggggttctcgaactagaatacaaaggaaaag
>NC_056278_sub[35930..35980] {"definition":"Echinosophora koreensis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Echinosophora koreensis","taxid":"228658"}
atcctgtttttcgcgaaaccaaagaaaagttcataaagcgagaagaaaaag
>NC_036304_sub[48081..48128] {"definition":"Iodes cirrhosa chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Iodes cirrhosa","taxid":"1705841"}
atcctgtttttccgaaaacaaacaaaggttcagcaagcgaaaacaagg
>NC_049876_sub[52480..52533] {"definition":"Quercus gilva chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Quercus gilva","taxid":"103490"}
atcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaag

By default no mismatches are tolerated. To recover amplicons from more divergent taxa, use --allowed-mismatches (or -e). This value applies independently per primer. Allowing up to 2 mismatches per primer in the same search recovers 10 amplicons instead of 5:

obipcr --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       --max-length 220 \
       --allowed-mismatches 2 \
       --no-progressbar \
       --circular \
       plastid10.fasta.gz \
       > out_mismatches.fasta 
📄 out_mismatches.fasta
>NC_008117_sub[118761..118779] {"definition":"Zygnema circumcarinatum chloroplast, complete genome.","direction":"forward","forward_error":2,"forward_match":"aggcaatcctgaaccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":2,"reverse_match":"tcattgagtctctgcacctatt","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Zygnema circumcarinatum","taxid":"35869"}
attctgttatcactgacag
>NC_030275_sub[47166..47215] {"definition":"Helianthus argophyllus chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":1,"reverse_match":"ccatcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Helianthus argophyllus","taxid":"73275"}
atcacgttttccgaaaacaaacaaaggttcagaaagcgaaaataaaaaag
>NC_035499_sub[53275..53294] {"definition":"Zantedeschia aethiopica chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Zantedeschia aethiopica (calla lily)","taxid":"69721"}
atccttgttttgagaaaaag
>NC_041421_sub[46316..46364] {"definition":"Plantago ovata chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":2,"reverse_match":"ccttcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Plantago ovata (blond plantain)","taxid":"185002"}
atcctgtcttctcaaaatcaaaataaaggttcagaaagcgagaaaaagg
>NC_044169_sub[46005..46052] {"definition":"Avena longiglumis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Avena longiglumis","taxid":"4500"}
atccgtgttttgagaggggggttctcgaactagaatacaaaggaaaag
>NC_056278_sub[35930..35980] {"definition":"Echinosophora koreensis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Echinosophora koreensis","taxid":"228658"}
atcctgtttttcgcgaaaccaaagaaaagttcataaagcgagaagaaaaag
>NC_036300_sub[47963..48004] {"definition":"Emmenopterys henryi voucher XGS20161127 chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":1,"reverse_match":"ccgttgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Emmenopterys henryi","taxid":"86990"}
atcctgttttccgaaaccaaaggttcagaaagtgaaaaaaag
>NC_036304_sub[48081..48128] {"definition":"Iodes cirrhosa chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Iodes cirrhosa","taxid":"1705841"}
atcctgtttttccgaaaacaaacaaaggttcagcaagcgaaaacaagg
>NC_049876_sub[52480..52533] {"definition":"Quercus gilva chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Quercus gilva","taxid":"103490"}
atcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaag
>NC_050043_sub[46723..46773] {"definition":"Fulcaldea stuessyi plastid, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":1,"reverse_match":"ccatcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"plastid Fulcaldea stuessyi","taxid":"1080004"}
atcacgttttccgaaaacaaacaaaggttcagaaagcgaaaataaanaaag
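The per-primer nature of the budget can be checked against the forward_error and reverse_error values of the records above. The following Python sketch contrasts it with a hypothetical shared budget; both function names are invented for this illustration.

```python
# (forward_error, reverse_error) pairs copied from out_mismatches.fasta above.
ERRORS = {
    "NC_008117": (2, 2),
    "NC_030275": (0, 1),
    "NC_035499": (0, 0),
    "NC_041421": (0, 2),
}


def accepted_per_primer(forward_error: int, reverse_error: int, allowed: int) -> bool:
    """Budget applied to each primer separately, as described for -e."""
    return forward_error <= allowed and reverse_error <= allowed


def accepted_global(forward_error: int, reverse_error: int, allowed: int) -> bool:
    """Hypothetical alternative: one budget shared by both primers."""
    return forward_error + reverse_error <= allowed


for name, (fwd, rev) in ERRORS.items():
    print(name, accepted_per_primer(fwd, rev, 2), accepted_global(fwd, rev, 2))
```

The Zygnema circumcarinatum record (NC_008117) carries two mismatches on each primer: it passes the per-primer test with a budget of 2, whereas a shared budget of 2 would reject it.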

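To forbid mismatches on the last three bases of each primer, add a # after each of those bases. With the same budget of 2 mismatches, the Zygnema circumcarinatum amplicon is no longer reported, because its reverse priming site mismatches the final base of the primer; 9 amplicons remain: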

obipcr --forward 'GGGCAATCCTGAGCC#A#A#' \
       --reverse 'CCATTGAGTCTCTGCACCTA#T#C#' \
       --max-length 220 \
       --allowed-mismatches 2 \
       --no-progressbar \
       --circular \
       plastid10.fasta.gz \
       > out_mismatches_no3.fasta 
📄 out_mismatches_no3.fasta
>NC_030275_sub[47166..47215] {"definition":"Helianthus argophyllus chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":1,"reverse_match":"ccatcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Helianthus argophyllus","taxid":"73275"}
atcacgttttccgaaaacaaacaaaggttcagaaagcgaaaataaaaaag
>NC_035499_sub[53275..53294] {"definition":"Zantedeschia aethiopica chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Zantedeschia aethiopica (calla lily)","taxid":"69721"}
atccttgttttgagaaaaag
>NC_041421_sub[46316..46364] {"definition":"Plantago ovata chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":2,"reverse_match":"ccttcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Plantago ovata (blond plantain)","taxid":"185002"}
atcctgtcttctcaaaatcaaaataaaggttcagaaagcgagaaaaagg
>NC_044169_sub[46005..46052] {"definition":"Avena longiglumis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Avena longiglumis","taxid":"4500"}
atccgtgttttgagaggggggttctcgaactagaatacaaaggaaaag
>NC_056278_sub[35930..35980] {"definition":"Echinosophora koreensis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Echinosophora koreensis","taxid":"228658"}
atcctgtttttcgcgaaaccaaagaaaagttcataaagcgagaagaaaaag
>NC_036300_sub[47963..48004] {"definition":"Emmenopterys henryi voucher XGS20161127 chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":1,"reverse_match":"ccgttgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Emmenopterys henryi","taxid":"86990"}
atcctgttttccgaaaccaaaggttcagaaagtgaaaaaaag
>NC_036304_sub[48081..48128] {"definition":"Iodes cirrhosa chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Iodes cirrhosa","taxid":"1705841"}
atcctgtttttccgaaaacaaacaaaggttcagcaagcgaaaacaagg
>NC_049876_sub[52480..52533] {"definition":"Quercus gilva chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"chloroplast Quercus gilva","taxid":"103490"}
atcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaag
>NC_050043_sub[46723..46773] {"definition":"Fulcaldea stuessyi plastid, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCC#A#A#","reverse_error":1,"reverse_match":"ccatcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTA#T#C#","scientific_name":"plastid Fulcaldea stuessyi","taxid":"1080004"}
atcacgttttccgaaaacaaacaaaggttcagaaagcgaaaataaanaaag
Note

Since # is the comment character in bash, it is safest to quote primer sequences that contain it.
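The effect of the # anchors can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not obipcr's matching code: letters are compared literally (no IUPAC handling), and `parse_pattern` and `site_allowed` are hypothetical names.

```python
def parse_pattern(pattern: str):
    """Return (bases, strict) where strict[i] is True if base i is followed by '#'."""
    bases, strict = [], []
    for char in pattern:
        if char == "#":
            if not bases:
                raise ValueError("'#' must follow a base")
            strict[-1] = True
        else:
            bases.append(char.upper())
            strict.append(False)
    return "".join(bases), strict


def site_allowed(pattern: str, site: str, allowed: int) -> bool:
    """Plain-letter comparison: anchored positions must match exactly."""
    bases, strict = parse_pattern(pattern)
    if len(bases) != len(site):
        return False
    mismatches = 0
    for base, is_strict, observed in zip(bases, strict, site.upper()):
        if base != observed:
            if is_strict:
                return False  # a mismatch on an anchored base is never tolerated
            mismatches += 1
    return mismatches <= allowed


REVERSE = "CCATTGAGTCTCTGCACCTA#T#C#"
print(site_allowed(REVERSE, "ccttcgagtctctgcacctatc", 2))  # Plantago ovata site
print(site_allowed(REVERSE, "tcattgagtctctgcacctatt", 2))  # Zygnema site
```

The second site differs from the primer on its last base, which is anchored. Under this model it is rejected, consistent with the Zygnema record being absent from out_mismatches_no3.fasta.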

Synopsis #

obipcr --forward <string> --max-length|-L <int> --reverse <string>
       [--allowed-mismatches|-e <int>] [--batch-mem <string>]
       [--batch-size <int>] [--batch-size-max <int>] [--circular|-c]
       [--compress|-Z] [--csv] [--debug] [--delta|-D <int>] [--ecopcr]
       [--embl] [--fail-on-taxonomy] [--fasta] [--fasta-output] [--fastq]
       [--fastq-output] [--fragmented] [--genbank] [--help|-h|-?]
       [--input-OBI-header] [--input-json-header] [--json-output]
       [--max-cpu <int>] [--min-length|-l <int>] [--no-order]
       [--no-progressbar] [--only-complete-flanking] [--out|-o <FILENAME>]
       [--output-OBI-header|-O] [--output-json-header] [--pprof]
       [--pprof-goroutine <int>] [--pprof-mutex <int>] [--raw-taxid]
       [--silent-warning] [--skip-empty] [--solexa] [--taxonomy|-t <string>]
       [--u-to-t] [--update-taxid] [--version] [--with-leaves] [<args>]

Options #

obipcr mandatory options #

  • --forward <PATTERN>: The forward primer used for the electronic PCR. IUPAC ambiguity codes can be used in the pattern. Append # after a base (in quotes) to force an exact match at that position.
  • --reverse <PATTERN>: The reverse primer used for the electronic PCR. Same syntax as --forward.
  • --max-length | -L <INTEGER>: Maximum length of the amplicon in bases, primers excluded. Amplicons longer than this value are discarded.
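When obipcr is launched from a script, the three mandatory options can be assembled as an argument list, which avoids shell quoting problems with # entirely. The Python sketch below only builds and prints the command; `build_obipcr_command` is a hypothetical helper, and the file name is the example file used on this page.

```python
import shlex


def build_obipcr_command(forward: str, reverse: str, max_length: int, input_file: str):
    """Assemble the argument list for the three mandatory options."""
    return [
        "obipcr",
        "--forward", forward,
        "--reverse", reverse,
        "--max-length", str(max_length),
        input_file,
    ]


args = build_obipcr_command(
    "GGGCAATCCTGAGCC#A#A#", "CCATTGAGTCTCTGCACCTA#T#C#", 220, "plastid10.fasta.gz"
)
# shlex.join quotes the arguments containing '#', so the printed line can be
# pasted into a shell as is.
print(shlex.join(args))
```

Passing `args` as a list to `subprocess.run` (with obipcr available on the PATH) would run the command without involving a shell at all.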

Other obipcr specific options #

  • --allowed-mismatches | -e <INTEGER>: Maximum number of mismatches allowed for each primer independently (default: 0). The mismatch budget is applied per primer, not globally.
  • --min-length | -l <INTEGER>: Minimum length of the amplicon in bases, primers excluded (default: 0, no minimum length filter).
  • --circular | -c : Treat input sequences as circular (e.g., chloroplast or mitochondrial genomes). Allows primers to match across the sequence origin. (default: false)
  • --delta | -D <INTEGER>: Number of additional bases to include on each side of the amplicon beyond the primer-binding sites. Without this option only the barcode sequence between the priming sites is reported. A value of -1 means no flanking extension. (default: -1)
  • --only-complete-flanking: When used with --delta, discard amplicons for which the requested flanking region cannot be fully extracted (e.g., because the amplicon sits too close to a sequence end). (default: false)
  • --fragmented: Fragment long sequences into overlapping windows before searching. Speeds up PCR simulation on very long sequences such as whole chromosomes. Fragments overlap so that amplicons spanning a boundary are not missed. (default: false)
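The idea behind --circular can be pictured with a toy exact search: a site that spans the origin is invisible on the linear string but is found once the beginning of the sequence is appended to its end. This Python sketch shows the principle only; it uses exact matching and says nothing about how obipcr implements circularity.

```python
def find_linear(sequence: str, motif: str) -> int:
    """0-based position of the first exact occurrence, or -1."""
    return sequence.find(motif)


def find_circular(sequence: str, motif: str) -> int:
    """Same search on a circular molecule: the start is appended to the end."""
    if not motif or len(motif) > len(sequence):
        return -1
    wrapped = sequence + sequence[: len(motif) - 1]
    return wrapped.find(motif)


genome = "CCGGTTAA"  # toy circular sequence
print(find_linear(genome, "AACC"))    # the site spans the origin
print(find_circular(genome, "AACC"))  # found at index 6, wrapping around
```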

Taxonomic options #

  • --taxonomy | -t <string>: Path to the taxonomic database.
  • --fail-on-taxonomy: Exit with an error if a taxid referenced in the data is not present in the loaded taxonomy. (default: false)
  • --raw-taxid: Print taxids without supplementary information (taxon name and rank). (default: false)
  • --update-taxid: Automatically update taxids declared as merged into a newer identifier. (default: false)
  • --with-leaves: When taxonomy is extracted from a sequence file, add sequences as leaves of their taxid. (default: false)

Controlling the input data #

OBITools4 generally recognizes the input file format and whether the input file is compressed with gzip. In rare cases a file can be misidentified, so the following options let the user force the format and bypass the format identification step.

The file format options #

  • --fasta: indicates that sequence data is in fasta format.
  • --fastq: indicates that sequence data is in fastq format.
  • --embl: indicates that sequence data is in EMBL-ENA flatfile format.
  • --csv: indicates that sequence data is in CSV format.
  • --genbank: indicates that sequence data is in GenBank flatfile format.
  • --ecopcr: indicates that sequence data is in the old ecoPCR tabulated format.

Controlling the way OBITools4 formats annotations #

These options only apply to the FASTA and FASTQ formats.

  • --input-OBI-header: FASTA/FASTQ title line annotations follow the old OBI format.
  • --input-json-header: FASTA/FASTQ title line annotations follow the JSON format.

Controlling quality score decoding #

This option only applies to the FASTQ format.

  • --solexa: decodes quality string according to the old Solexa specification. (default: the standard Sanger encoding is used, env: OBISSOLEXA)

Controlling the output data #

  • --compress | -Z : output is compressed using gzip. (default: false)
  • --no-order: by default, OBITools preserves the order of sequences between the input and the output, and multiple input files are processed one at a time. With --no-order, multiple input files can be opened at the same time and their contents processed in parallel. This usually increases processing speed, but the order of the sequences in the output file is no longer guaranteed, and processing files in parallel may require more memory.
  • --fasta-output: writes sequence data in fasta format (default if quality data is not available).
  • --fastq-output: writes sequence data in fastq format (default if quality data is available).
  • --json-output: writes sequence data in JSON format.
  • --out | -o <FILENAME>: filename used for saving the output (default: “-”, the standard output)
  • --output-OBI-header | -O : writes output FASTA/FASTQ title line annotations in OBI format (default: JSON).
  • --output-json-header: writes output FASTA/FASTQ title line annotations in JSON format (the default format).
  • --skip-empty: sequences of length equal to zero are removed from the output (default: false).
  • --no-progressbar: deactivates progress bar display (default: false).

General options #

  • --help | -h | -? : shows this help.
  • --version: prints the version and exits.
  • --silent-warning: This option tells obitools to stop displaying warnings. This behaviour can be controlled by setting the OBIWARNINGS environment variable.
  • --max-cpu <INTEGER>: OBITools can take advantage of your computer’s multi-core architecture by parallelizing the computation across all available CPUs. Computing on more CPUs usually requires more memory to perform the computation. Reducing the number of CPUs used to perform a calculation is also a way to indirectly control the amount of memory used by the process. The number of CPUs used by OBITools can also be controlled by setting the OBIMAXCPU environment variable.
  • --force-one-cpu: forces the use of a single CPU core for parallel processing (default: false).
  • --batch-size <INTEGER>: minimum number of sequences per batch for parallel processing (floor, default: 1, env: OBIBATCHSIZE)
  • --batch-size-max <INTEGER>: maximum number of sequences per batch for parallel processing (ceiling, default: 2000, env: OBIBATCHSIZEMAX)
  • --batch-mem <STRING>: maximum memory per batch (e.g. 128K, 64M, 1G; default: 128M; set to 0 to disable, env: OBIBATCHMEM)
  • --debug: enables debug mode, by setting log level to debug (default: false, env: OBIDEBUG)
  • --pprof: enables pprof server. Look at the log for details. (default: false).
  • --pprof-mutex <INTEGER>: enables profiling of mutex lock. (default: 10, env: OBIPPROFMUTEX)
  • --pprof-goroutine <INTEGER>: enables profiling of goroutine blocking profile. (default: 6060, env: OBIPPROFGOROUTINE)

Examples #

Extracting amplicons with flanking context #

The file plastid10.fasta.gz contains 10 chloroplast genome sequences. To extract Sper01 amplicons and include 50 bases of flanking sequence on each side, use --delta. Adding --only-complete-flanking restricts the output to amplicons where the full flanking region is available — useful when you need the flanking context for subsequent alignment or primer design steps:

obipcr --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       --max-length 220 \
       --delta 50 \
       --only-complete-flanking \
       --circular \
       --no-progressbar \
       -o out_extended.fasta \
       plastid10.fasta.gz
📄 out_extended.fasta
>NC_035499_sub[53208..53366] {"definition":"Zantedeschia aethiopica chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Zantedeschia aethiopica (calla lily)","taxid":"69721"}
acctactaagtggtaacttccaaactcagagaaaccctggaataaaaaatgggcaatcct
gagccaaatccttgttttgagaaaaaggataggtgcagagactcaatggaagctgttcta
acgaatggagttgattgcattgcgttggtagctggaatt
>NC_044169_sub[45938..46124] {"definition":"Avena longiglumis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Avena longiglumis","taxid":"4500"}
acctgctaagtggtaacttccaaattcagagaaaccctggaattaaaaaagggcaatcct
gagccaaatccgtgttttgagaggggggttctcgaactagaatacaaaggaaaaggatag
gtgcagagactcaatggaagctgttctaacgaatcgagttaattacgttgtgttgttagt
ggaactc
>NC_056278_sub[35863..36052] {"definition":"Echinosophora koreensis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Echinosophora koreensis","taxid":"228658"}
acttaccaagtgataactttcaaattcagagaaaccctggaattaataatgggcaatcct
gagccaaatcctgtttttcgcgaaaccaaagaaaagttcataaagcgagaagaaaaagga
taggtgcagagactcaatggaagctgttctaacaaatggagttgacgacatttcctttcg
cattagatta
>NC_036304_sub[48014..48200] {"definition":"Iodes cirrhosa chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Iodes cirrhosa","taxid":"1705841"}
tactaagtgataactttcaaattcagagaaaccctggaattaataaaaatgggcaatcct
gagccaaatcctgtttttccgaaaacaaacaaaggttcagcaagcgaaaacaagggatag
gtgcagagactcaatggaagctgttctaacaaatggagttgactgcgttggtagagaaat
ctttcga
>NC_049876_sub[52413..52605] {"definition":"Quercus gilva chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Quercus gilva","taxid":"103490"}
acttaacaagtgataactttcaaattcagagaaaccctggaattaaaaatgggcaatcct
gagccaaatcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaa
ggataggtgcagagactcaatggaagctgttctaacaagtggggttgacttctttacgtt
attacattaacgt
Note

Using -D 0 produces amplicon sequences that include the priming sites without any additional flanking base. Without --delta (the default), amplicon sequences are output without the priming sites.
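The coordinates in the identifiers above are consistent with a simple rule: the barcode interval is widened by the primer length plus the --delta value on each side. The Python sketch below encodes that rule as inferred from the example outputs on this page, not from obipcr's source; `extended_coordinates` is a hypothetical helper, valid for records whose direction is "forward", and it ignores wrap-around on circular sequences.

```python
FORWARD = "GGGCAATCCTGAGCCAA"       # 17 bases
REVERSE = "CCATTGAGTCTCTGCACCTATC"  # 22 bases


def extended_coordinates(start: int, end: int, delta: int):
    """Coordinates expected with --delta, inferred from the examples on this page.

    start and end are the 1-based inclusive barcode coordinates reported by
    default. A negative delta (the default, -1) leaves them unchanged.
    """
    if delta < 0:
        return start, end
    return start - len(FORWARD) - delta, end + len(REVERSE) + delta


print(extended_coordinates(53275, 53294, 50))  # Zantedeschia aethiopica record
print(extended_coordinates(46005, 46052, 50))  # Avena longiglumis record
```

For the Zantedeschia record this gives 53208..53366, the interval shown in out_extended.fasta.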

Filtering amplicons by length #

To keep only amplicons within a specific size range (primers excluded), combine --min-length and --max-length. This is useful when a primer pair may produce multiple amplicons of different sizes and only a subset corresponds to the target barcode:

obipcr --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       --max-length 60 \
       --min-length 25 \
       --circular \
       --no-progressbar \
       plastid10.fasta.gz \
       > out_filtered.fasta
📄 out_filtered.fasta
>NC_044169_sub[46005..46052] {"definition":"Avena longiglumis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Avena longiglumis","taxid":"4500"}
atccgtgttttgagaggggggttctcgaactagaatacaaaggaaaag
>NC_056278_sub[35930..35980] {"definition":"Echinosophora koreensis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Echinosophora koreensis","taxid":"228658"}
atcctgtttttcgcgaaaccaaagaaaagttcataaagcgagaagaaaaag
>NC_036304_sub[48081..48128] {"definition":"Iodes cirrhosa chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Iodes cirrhosa","taxid":"1705841"}
atcctgtttttccgaaaacaaacaaaggttcagcaagcgaaaacaagg
>NC_049876_sub[52480..52533] {"definition":"Quercus gilva chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Quercus gilva","taxid":"103490"}
atcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaag
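The same length window can be reproduced after the fact on an existing output, for instance to try several thresholds without rerunning the search. The Python sketch below is a post-hoc equivalent with inclusive bounds, not the filter obipcr applies internally; `filter_by_length` is a hypothetical helper.

```python
def filter_by_length(records, min_length=0, max_length=None):
    """Keep (identifier, sequence) pairs whose sequence length is in range."""
    kept = []
    for identifier, sequence in records:
        length = len(sequence)
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        kept.append((identifier, sequence))
    return kept


# Three barcodes copied from the first example output on this page.
RECORDS = [
    ("NC_035499_sub[53275..53294]", "atccttgttttgagaaaaag"),
    ("NC_044169_sub[46005..46052]",
     "atccgtgttttgagaggggggttctcgaactagaatacaaaggaaaag"),
    ("NC_049876_sub[52480..52533]",
     "atcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaag"),
]

for identifier, sequence in filter_by_length(RECORDS, min_length=25, max_length=60):
    print(identifier, len(sequence))
```

The 20-base Zantedeschia aethiopica barcode falls below the minimum of 25, which is why that record does not appear in out_filtered.fasta.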

Dealing with memory limits #

For large sequence databases containing very long sequences (e.g., whole chromosomes), enabling --fragmented splits each sequence into overlapping windows before the primer search. This substantially reduces memory use, but because the fragments overlap, the same amplicon may be reported several times in the output.

obipcr --forward GGGCAATCCTGAGCCAA \
       --reverse CCATTGAGTCTCTGCACCTATC \
       --max-length 220 \
       --allowed-mismatches 2 \
       --fragmented \
       --circular \
       --no-progressbar \
       plastid10.fasta.gz \
       > out_fragmented.fasta 
📄 out_fragmented.fasta
>NC_008117_sub[118761..118779] {"definition":"Zygnema circumcarinatum chloroplast, complete genome.","direction":"forward","forward_error":2,"forward_match":"aggcaatcctgaaccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":2,"reverse_match":"tcattgagtctctgcacctatt","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Zygnema circumcarinatum","taxid":"35869"}
attctgttatcactgacag
>NC_030275_sub[47166..47215] {"definition":"Helianthus argophyllus chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":1,"reverse_match":"ccatcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Helianthus argophyllus","taxid":"73275"}
atcacgttttccgaaaacaaacaaaggttcagaaagcgaaaataaaaaag
>NC_035499_sub[53275..53294] {"definition":"Zantedeschia aethiopica chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Zantedeschia aethiopica (calla lily)","taxid":"69721"}
atccttgttttgagaaaaag
>NC_041421_sub[46316..46364] {"definition":"Plantago ovata chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":2,"reverse_match":"ccttcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Plantago ovata (blond plantain)","taxid":"185002"}
atcctgtcttctcaaaatcaaaataaaggttcagaaagcgagaaaaagg
>NC_044169_sub[46005..46052] {"definition":"Avena longiglumis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Avena longiglumis","taxid":"4500"}
atccgtgttttgagaggggggttctcgaactagaatacaaaggaaaag
>NC_056278_sub[35930..35980] {"definition":"Echinosophora koreensis chloroplast, complete sequence.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Echinosophora koreensis","taxid":"228658"}
atcctgtttttcgcgaaaccaaagaaaagttcataaagcgagaagaaaaag
>NC_036300_sub[47963..48004] {"definition":"Emmenopterys henryi voucher XGS20161127 chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":1,"reverse_match":"ccgttgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Emmenopterys henryi","taxid":"86990"}
atcctgttttccgaaaccaaaggttcagaaagtgaaaaaaag
>NC_036304_sub[48081..48128] {"definition":"Iodes cirrhosa chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Iodes cirrhosa","taxid":"1705841"}
atcctgtttttccgaaaacaaacaaaggttcagcaagcgaaaacaagg
>NC_049876_sub[52480..52533] {"definition":"Quercus gilva chloroplast, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":0,"reverse_match":"ccattgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"chloroplast Quercus gilva","taxid":"103490"}
atcctattttacgaaaacaaataagggttcagaagaaagcgagaataaaaaaag
>NC_050043_sub[46723..46773] {"definition":"Fulcaldea stuessyi plastid, complete genome.","direction":"forward","forward_error":0,"forward_match":"gggcaatcctgagccaa","forward_primer":"GGGCAATCCTGAGCCAA","reverse_error":1,"reverse_match":"ccatcgagtctctgcacctatc","reverse_primer":"CCATTGAGTCTCTGCACCTATC","scientific_name":"plastid Fulcaldea stuessyi","taxid":"1080004"}
atcacgttttccgaaaacaaacaaaggttcagaaagcgaaaataaanaaag
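Why overlapping fragments can report a site twice, and how such duplicates could be removed afterwards, can be seen on a toy example. In the Python sketch below the window size, the overlap and the motif are arbitrary illustration values, and the helpers are hypothetical; obipcr's actual fragment sizes and search are not documented here.

```python
def overlapping_windows(length: int, size: int, overlap: int):
    """Yield (start, end) half-open windows covering [0, length)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than the window size")
    start = 0
    while True:
        end = min(start + size, length)
        yield start, end
        if end == length:
            break
        start += size - overlap


def find_all(text: str, motif: str):
    """0-based positions of every occurrence of motif in text."""
    positions, pos = [], text.find(motif)
    while pos != -1:
        positions.append(pos)
        pos = text.find(motif, pos + 1)
    return positions


sequence = "C" * 8 + "GATTACA" + "C" * 12  # toy sequence, motif at position 8
raw_hits = [
    start + pos
    for start, end in overlapping_windows(len(sequence), size=16, overlap=8)
    for pos in find_all(sequence[start:end], "GATTACA")
]
unique_hits = sorted(set(raw_hits))
print(raw_hits)     # the same site is reported by two windows
print(unique_hits)  # duplicates removed by absolute position
```

On a real output, the analogous clean-up would be to keep one record per identifier, assuming that duplicated amplicons carry identical identifiers and sequences.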

Displaying help #

obipcr --help