obitagpcr: split paired-end raw reads per sample #
Description #
The obitagpcr command processes paired-end raw reads from amplicon sequencing experiments and assigns each read pair to its biological sample/PCR replicate. It relies on two successive operations:
- Paired-end assembly: forward (R1) and reverse (R2) reads are merged into a single consensus amplicon using the same overlap-based algorithm as obipairing.
- Demultiplexing: each assembled amplicon is matched against a list of known primers and barcodes (tags) to identify its sample of origin, using the same engine as obimultiplex.
Unlike chaining the obipairing and obimultiplex commands, however, obitagpcr discards the assembled amplicon once the sample has been identified: it outputs the original forward and reverse reads, each annotated with the deduced sample identity.
graph LR
R1["Forward reads
(R1.fastq)"] --> OBT{{obitagpcr}}
R2["Reverse reads
(R2.fastq)"] --> OBT
CSV["NGSFilter CSV
(--tag-list)"] --> OBT
OBT --> OUT_R1["result_R1.fastq"]
OBT --> OUT_R2["result_R2.fastq"]
OBT -. "--unidentified" .-> UNID["unassigned.fastq"]
obitagpcr is an alternative entry point for Illumina paired-end metabarcoding data when further processing is delegated to external tools that require one pair of read files per sample, such as DADA2.
Output files #
Unlike most OBITools4 commands, obitagpcr always produces paired output, so the --out option must be used to name the output files. For example, --out result.fastq produces two files: result_R1.fastq and result_R2.fastq.
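Tools that consume paired files, such as DADA2, generally expect the R1 and R2 files to list the same read pairs in the same order. The following is a minimal consistency check, not part of OBITools4. It assumes uncompressed four-line FASTQ records and Illumina read names ending in /1 or /2, as in the examples on this page. The helper names read_names and pairs_in_sync are hypothetical.

```shell
# Print the read name of every record: the first word of each header line,
# with the trailing /1 or /2 mate suffix removed so that mates compare equal.
read_names() {
  awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }' "$1"
}

# pairs_in_sync PREFIX: compare PREFIX_R1.fastq with PREFIX_R2.fastq.
# Prints a one-line verdict; returns 0 when both files list the same
# read names in the same order, 1 otherwise.
pairs_in_sync() {
  tmp=$(mktemp -d)
  read_names "$1_R1.fastq" > "$tmp/r1"
  read_names "$1_R2.fastq" > "$tmp/r2"
  if cmp -s "$tmp/r1" "$tmp/r2"; then
    echo "OK: $(wc -l < "$tmp/r1" | tr -d ' ') pairs in sync"
    status=0
  else
    echo "MISMATCH between $1_R1.fastq and $1_R2.fastq"
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```

After running obitagpcr with --out result.fastq, call pairs_in_sync result. Because the /1 and /2 suffixes are stripped before comparison, pairs that --reorientate has exchanged between the two files still count as in sync.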
The NGSFilter sample description file #
The --tag-list option takes a CSV file that describes all PCR reactions in the library. The file format is the same as for obimultiplex; see the obimultiplex page for a complete description of the format, the @param configuration options, and the tag-matching algorithms.
The file has two sections: optional @param lines that configure matching
behaviour (primer mismatches, indels, tag-matching algorithm), and a sample
table with at minimum five columns: experiment, sample, sample_tag,
forward_primer, reverse_primer.
Use obitagpcr --template to print an annotated example of the format.
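Before launching a long run, a quick check can confirm that the sample table carries the five mandatory columns. The following is a minimal sketch under these assumptions: the file is comma-separated, and its column header is the first line that is neither blank nor an @param line. check_ngsfilter_header is a hypothetical helper. It only inspects column names, so obitagpcr itself remains the authority on whether the file is valid.

```shell
# check_ngsfilter_header FILE: report whether the sample table header of an
# NGSFilter CSV file contains experiment, sample, sample_tag, forward_primer
# and reverse_primer (in any order). Prints "OK" or the missing column names.
check_ngsfilter_header() {
  awk -F',' '
    /^@param/ || /^[[:space:]]*$/ { next }   # skip configuration and blank lines
    {
      for (i = 1; i <= NF; i++) {
        col = $i
        gsub(/^[[:space:]"]+|[[:space:]"]+$/, "", col)   # trim spaces and quotes
        seen[col] = 1
      }
      n = split("experiment sample sample_tag forward_primer reverse_primer", req, " ")
      missing = ""
      for (i = 1; i <= n; i++)
        if (!(req[i] in seen)) missing = missing " " req[i]
      if (missing == "") print "OK"
      else print "missing columns:" missing
      found = 1
      exit
    }
    END { if (!found) print "no sample table found" }
  ' "$1"
}
```

For example, check_ngsfilter_header wolf_diet_ngsfilter.csv prints OK when all five columns are present.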
Annotations added by obitagpcr #
Each successfully demultiplexed sequence is annotated with the following attributes:
Sample identification
- experiment (example: "wolf_diet"): Experiment name as defined in the NGSFilter file.
- sample (example: "29a_F260619"): Sample (PCR) name as defined in the NGSFilter file.
Amplicon orientation
- obimultiplex_direction (example: "forward" or "reverse"): Because sequencing is not oriented, in some read pairs the forward read starts with the forward primer, while in others it starts with the reverse primer. This annotation records which case applies: "forward" means that the forward primer was found at the beginning of the forward read, "reverse" means that the reverse primer was found at the beginning of the forward read.
Adding the --reorientate flag to the command exchanges the two reads of every pair annotated as "reverse", which amounts to reverse-complementing the read pair as a whole. As a result, every read in the R1 file starts with the forward primer, and every read in the R2 file starts with the reverse primer.
Primer matching
- obimultiplex_forward_match (example: "ttagataccccactatgc"): Forward primer sequence as observed in the read.
- obimultiplex_forward_error (example: 0): Number of mismatches between the forward primer and the read.
- obimultiplex_reverse_match (example: "tagaacaggctcctctag"): Reverse primer sequence as observed in the read.
- obimultiplex_reverse_error (example: 0): Number of mismatches between the reverse primer and the read.
Tag identification
- obimultiplex_forward_tag (example: "gcctcct"): Barcode sequence observed at the forward end of the read.
- obimultiplex_reverse_tag (example: "gcctcct"): Barcode sequence observed at the reverse end of the read.
When paired-end assembly succeeds via overlap alignment, additional attributes from the obipairing step (ali_length, score_norm, identity, mode) are also present in the output, unless suppressed with --without-stat (see obimultiplex for a complete description of the added annotations).
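Because every header carries the sample attribute, reads can be tallied per sample with standard tools. The following is a minimal sketch assuming the default JSON header format (not --output-OBI-header), uncompressed four-line FASTQ records, and sample names without whitespace. reads_per_sample is a hypothetical helper. Only the R1 file needs to be scanned, since the R2 reads carry the same annotations.

```shell
# reads_per_sample FILE: print "sample<TAB>number of reads" for an
# obitagpcr output file, reading the "sample" key of each JSON header.
reads_per_sample() {
  awk 'NR % 4 == 1' "$1" |                  # keep header lines only
    grep -o '"sample":"[^"]*"' |            # extract the sample attribute
    sed 's/^"sample":"//; s/"$//' |         # keep the bare sample name
    sort | uniq -c |                        # count reads per sample
    awk '{ print $2 "\t" $1 }'              # reorder as name, count
}
```

On the basic example below, reads_per_sample out_basic_R1.fastq would report a single sample, 29a_F260619, with 4 reads.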
Synopsis #
obitagpcr --forward-reads|-F <FILENAME_F> --reverse-reads|-R <FILENAME_R>
[--allowed-mismatches|-e <int>] [--batch-mem <string>]
[--batch-size <int>] [--batch-size-max <int>] [--compress|-Z]
[--debug] [--delta|-D <int>] [--ecopcr] [--embl] [--exact-mode]
[--fast-absolute] [--fasta] [--fasta-output] [--fastq]
[--fastq-output] [--gap-penalty|-G <float64>] [--genbank]
[--help|-h|-?] [--input-OBI-header] [--input-json-header]
[--json-output] [--keep-errors] [--max-cpu <int>]
[--min-identity|-X <float64>] [--min-overlap <int>] [--no-order]
[--no-progressbar] [--out|-o <FILENAME>] [--output-OBI-header|-O]
[--output-json-header] [--penalty-scale <float64>] [--pprof]
[--pprof-goroutine <int>] [--pprof-mutex <int>] [--reorientate]
[--silent-warning] [--skip-empty] [--solexa]
[--tag-list|-s <string>] [--template] [--u-to-t]
[--unidentified|-u <string>] [--version] [--with-indels]
[--without-stat|-S] [<args>]
Options #
obitagpcr specific options #
- --forward-reads|-F <FILENAME_F>: Name of the file containing the forward reads.
- --reverse-reads|-R <FILENAME_R>: Name of the file containing the reverse reads.
- --allowed-mismatches|-e <INTEGER>: Number of errors allowed when matching primers. (default: -1)
- --delta|-D <int>: Length added to the overlap detected by the fast heuristic for the precise alignment. (default: 5)
- --exact-mode: Do not run the fast alignment heuristic. (default: false)
- --fast-absolute: Compute the absolute fast score (no effect in exact mode). (default: false)
- --gap-penalty|-G <float64>: Gap penalty, expressed as a multiplicative factor applied to the mismatch score between two nucleotides with a quality of 40. (default: 2.000000)
- --keep-errors: Prints symbol counts. (default: false)
- --min-identity|-X <float64>: Minimum identity between the overlapping regions of the reads for the alignment to be considered. (default: 0.900000)
- --min-overlap <int>: Minimum overlap between the two reads for the alignment to be considered. (default: 20)
- --penalty-scale <float64>: Scale factor applied to the mismatch score and the gap penalty. (default: 1.000000)
- --reorientate: Reverse-complement reads if needed, so that all sequences are stored in the same orientation with respect to the forward and reverse primers. (default: false)
- --tag-list|-s <string>: Name of the NGSFilter file describing the PCRs.
- --template: Print an example CSV configuration file on the standard output. (default: false)
- --unidentified|-u <string>: Name of the file used to store the sequences not assigned to any sample.
- --with-indels: Allow indels when matching primers. (default: false)
- --without-stat|-S: Remove alignment statistics from the produced consensus sequences. (default: false)
Controlling the input data #
OBITools4 generally recognizes the input file format, as well as whether the input file is compressed with GZIP. Some rare files can be misidentified, however, so the following options let the user force the format, bypassing the format identification step.
The file format options #
- --fasta: indicates that sequence data is in fasta format.
- --fastq: indicates that sequence data is in fastq format.
- --embl: indicates that sequence data is in EMBL-ENA flatfile format.
- --csv: indicates that sequence data is in CSV format.
- --genbank: indicates that sequence data is in GenBank flatfile format.
- --ecopcr: indicates that sequence data is in the old ecoPCR tabulated format.
Controlling how OBITools4 formats annotations #
These options only apply to the FASTA and FASTQ formats:
- --input-OBI-header: FASTA/FASTQ title line annotations follow the old OBI format.
- --input-json-header: FASTA/FASTQ title line annotations follow the JSON format.
Controlling quality score decoding #
This option only applies to the FASTQ format:
- --solexa: decodes quality string according to the old Solexa specification. (default: the standard Sanger encoding is used, env: OBISSOLEXA)
Controlling the output data #
- --compress|-Z: output is compressed using gzip. (default: false)
- --no-order: by default, OBITools guarantee that the order of the sequences in the output file matches the input; when multiple files are processed, they are processed one at a time. With --no-order, multiple input files can be opened at the same time and their contents processed in parallel. This usually increases processing speed but does not guarantee the order of the sequences in the output file, and processing multiple files in parallel may require more memory.
- --fasta-output: writes sequence data in fasta format (default if quality data is not available).
- --fastq-output: writes sequence data in fastq format (default if quality data is available).
- --json-output: writes sequence data in JSON format.
- --out|-o <FILENAME>: filename used for saving the output (default: "-", the standard output).
- --output-OBI-header|-O: writes output FASTA/FASTQ title line annotations in OBI format (default: JSON).
- --output-json-header: writes output FASTA/FASTQ title line annotations in JSON format (the default format).
- --skip-empty: sequences of length equal to zero are removed from the output (default: false).
- --no-progressbar: deactivates progress bar display (default: false).
General options #
- --help|-h|-?: shows this help.
- --version: prints the version and exits.
- --silent-warning: tells OBITools to stop displaying warnings. This behaviour can also be controlled by setting the OBIWARNINGS environment variable.
Computation related options #
- --max-cpu <INTEGER>: OBITools can take advantage of your computer's multi-core architecture by parallelizing the computation across all available CPUs. Computing on more CPUs usually requires more memory. Reducing the number of CPUs is therefore also a way to indirectly limit the memory used by the process. The number of CPUs used by OBITools can also be controlled by setting the OBIMAXCPU environment variable.
- --force-one-cpu: forces the use of a single CPU core for parallel processing (default: false).
- --batch-size <INTEGER>: minimum number of sequences per batch for parallel processing (floor, default: 1, env: OBIBATCHSIZE).
- --batch-size-max <INTEGER>: maximum number of sequences per batch for parallel processing (ceiling, default: 2000, env: OBIBATCHSIZEMAX).
- --batch-mem <STRING>: maximum memory per batch (e.g. 128K, 64M, 1G; default: 128M; set to 0 to disable, env: OBIBATCHMEM).
Debug related options #
- --debug: enables debug mode, by setting log level to debug (default: false, env: OBIDEBUG).
- --pprof: enables pprof server. Look at the log for details. (default: false).
- --pprof-mutex <INTEGER>: enables profiling of mutex lock. (default: 10, env: OBIPPROFMUTEX)
- --pprof-goroutine <INTEGER>: enables profiling of goroutine blocking profile. (default: 6060, env: OBIPPROFGOROUTINE)
Examples #
The examples below use the wolf diet 12S metabarcoding dataset from the Illumina OBITools4 cookbook (4 paired-end reads shown; full 20-read files: wolf_F.fastq and wolf_R.fastq).
Basic demultiplexing of a paired-end library #
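Assemble paired reads and assign them to samples using the primer and barcode combinations defined in wolf_diet_ngsfilter.csv. The --out flag creates two files: out_basic_R1.fastq and out_basic_R2.fastq.
The input consists of the four forward reads (wolf_F_4seq.fastq) followed by the four matching reverse reads (wolf_R_4seq.fastq):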
@HELIUM_000100422_612GNAAXX:7:119:14871:19157#0/1
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattattataacaaaatcattcgccagagtactaccggcaatagctcaaaactcaaagaactt
+
CCCCCCCCCCCCCCCCCCCCCCBCCCCB@BCCCCCCCCCCCCCB;CCCACCCCCCCAACA29,?<5899+A=A###################################
@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggactt
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC<CCCCACC;C?CCCC@A;=,B;93:;CC=C;==??#############################
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/1
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccgccactagcttaaaactcaaagaactc
+
CCCCCCCCCCCCCCCCCCCCCCCBBCCC?BCCCCCBC?CCCC@@;AAAAA5C@C@CCC@C>>;C@7CC@C93;31::5<<AA<@########################
@HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/1
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaagctattgccggtagtactctggcgaataattttgttatattaat
+
CCCCCCCCCBCCCCCCCCBCCCCCCCCCCCA=AAA@CCCCCCCCCCC?CACCC?CC@C@CACC?CA=B?0A;AAA6;>3?AC?C?8AAA3<<-8<BAC@22<6?####
@HELIUM_000100422_612GNAAXX:7:119:14871:19157#0/2
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaacctactcccgctacacgtccgccgaataatactgttatcatatt
+
CCCCCCCCCCCAAC@CCBCCCCCCB@C@CCCC@@CBBB6@@CC@AC8CC<C>C@@#####################################################
@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/2
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaagctattgccggtagtactctggcgaacacttttgttatattact
+
CCCCCCCCCCCCCCCCACCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCC=CCCCCCCCCC:=><ACCCCBCCA8;68.69AA?>(>AC@CA3A
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/2
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaagctcttgccggtagtactctggcgcacacttttcttatattact
+
CC@CCCCCBC?CBCCCCCCCCC=CC<CCCCCC9@C?;?<+BB@??85<?>?<<6<:<?43???<2?3;??CA@C552(8<5<>:).(//1//,1'6:375=CCCC@?6
@HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/2
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaatagcttcacactcaaagaactt
+
CCCDCCCCCCCCCDCCCCCCCCCDCCC@CCACCCCCCCCCCCCDCCCCCCCCCDCCCCCBBBBCC=/AAA===>=C<CCC?B9AA;3??7CC@C6CCC8ACCC+AB8A
obitagpcr \
--forward-reads wolf_F_4seq.fastq \
--reverse-reads wolf_R_4seq.fastq \
--tag-list wolf_diet_ngsfilter.csv \
--out out_basic.fastq
The R1 output (out_basic_R1.fastq), with demultiplexing annotations added to each sequence header:
@HELIUM_000100422_612GNAAXX:7:119:14871:19157#0/1 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattattataacaaaatcattcgccagagtactaccggcaatagctcaaaactcaaagaactt
+
CCCCCCCCCCCCCCCCCCCCCCBCCCCB@BCCCCCCCCCCCCCB;CCCACCCCCCCAACA29,?<5899+A=A###################################
@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggactt
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC<CCCCACC;C?CCCC@A;=,B;93:;CC=C;==??#############################
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/1 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccgccactagcttaaaactcaaagaactc
+
CCCCCCCCCCCCCCCCCCCCCCCBBCCC?BCCCCCBC?CCCC@@;AAAAA5C@C@CCC@C>>;C@7CC@C93;31::5<<AA<@########################
@HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/1 {"experiment":"wolf_diet","obimultiplex_direction":"reverse","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaagctattgccggtagtactctggcgaataattttgttatattaat
+
CCCCCCCCCBCCCCCCCCBCCCCCCCCCCCA=AAA@CCCCCCCCCCC?CACCC?CC@C@CACC?CA=B?0A;AAA6;>3?AC?C?8AAA3<<-8<BAC@22<6?####
The R2 output (out_basic_R2.fastq) contains the corresponding reverse reads, carrying identical annotations:
@HELIUM_000100422_612GNAAXX:7:119:14871:19157#0/2 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaacctactcccgctacacgtccgccgaataatactgttatcatatt
+
CCCCCCCCCCCAAC@CCBCCCCCCB@C@CCCC@@CBBB6@@CC@AC8CC<C>C@@#####################################################
@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/2 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaagctattgccggtagtactctggcgaacacttttgttatattact
+
CCCCCCCCCCCCCCCCACCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCC=CCCCCCCCCC:=><ACCCCBCCA8;68.69AA?>(>AC@CA3A
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/2 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctccttagaacaggctcctctagaagggtataaagcaccgccaagtcctttgagttttaagctcttgccggtagtactctggcgcacacttttcttatattact
+
CC@CCCCCBC?CBCCCCCCCCC=CC<CCCCCC9@C?;?<+BB@??85<?>?<<6<:<?43???<2?3;??CA@C552(8<5<>:).(//1//,1'6:375=CCCC@?6
@HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/2 {"experiment":"wolf_diet","obimultiplex_direction":"reverse","obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_mismatches":0,"obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_mismatches":0,"obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaatagcttcacactcaaagaactt
+
CCCDCCCCCCCCCDCCCCCCCCCDCCC@CCACCCCCCCCCCCCDCCCCCCCCCDCCCCCBBBBCC=/AAA===>=C<CCC?B9AA;3??7CC@C6CCC8ACCC+AB8A
Reorientate reads for consistent strand direction #
As explained above, Illumina sequencing is not an oriented process, so read pairs arrive in mixed orientations (obimultiplex_direction: "forward" or "reverse"). The --reorientate flag exchanges the forward and reverse reads of pairs matched in the reverse direction, so that every R1 read starts with the forward primer and every R2 read starts with the reverse primer:
obitagpcr \
--forward-reads wolf_F_4seq.fastq \
--reverse-reads wolf_R_4seq.fastq \
--tag-list wolf_diet_ngsfilter.csv \
--reorientate \
--out out_reorientate.fastq
Preview the first two sequences of the result:
head -n 8 out_reorientate_R1.fastq
@HELIUM_000100422_612GNAAXX:7:119:14871:19157#0/1 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_error":0,"obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_error":0,"obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattattataacaaaatcattcgccagagtactaccggcaatagctcaaaactcaaagaactt
+
CCCCCCCCCCCCCCCCCCCCCCBCCCCB@BCCCCCCCCCCCCCB;CCCACCCCCCCAACA29,?<5899+A=A###################################
@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 {"experiment":"wolf_diet","obimultiplex_direction":"forward","obimultiplex_forward_error":0,"obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_error":0,"obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggactt
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC<CCCCACC;C?CCCC@A;=,B;93:;CC=C;==??#############################
The fourth read pair, which was matched in reverse, has been exchanged: the read originally in the R2 file (note the #0/2 suffix) is now in the R1 file, and vice versa. Its sequence and quality string are copied unchanged:
head -n 16 out_reorientate_R1.fastq | tail -n 4
@HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/2 {"experiment":"wolf_diet","obimultiplex_direction":"reverse","obimultiplex_forward_error":0,"obimultiplex_forward_match":"ttagataccccactatgc","obimultiplex_forward_tag":"gcctcct","obimultiplex_reverse_error":0,"obimultiplex_reverse_match":"tagaacaggctcctctag","obimultiplex_reverse_tag":"gcctcct","sample":"29a_F260619"}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaatagcttcacactcaaagaactt
+
CCCDCCCCCCCCCDCCCCCCCCCDCCC@CCACCCCCCCCCCCCDCCCCCCCCCDCCCCCBBBBCC=/AAA===>=C<CCC?B9AA;3??7CC@C6CCC8ACCC+AB8A
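A rough way to confirm the effect of --reorientate is to check that R1 reads contain the forward primer near their start, and R2 reads the reverse primer. The following sketch uses an exact, case-insensitive substring search within the first 40 bases. That window is an arbitrary choice made to cover the sample tag and a few leading bases. primer_at_start is a hypothetical helper. obitagpcr accepts primers with mismatches, so a count below 100% does not by itself indicate a problem.

```shell
# primer_at_start FASTQ PRIMER [WINDOW]: count reads whose first WINDOW
# bases (default 40) contain PRIMER as an exact, case-insensitive substring.
primer_at_start() {
  awk -v p="$2" -v w="${3:-40}" '
    BEGIN { p = tolower(p) }
    NR % 4 == 2 {                          # sequence line of each record
      total++
      if (index(tolower(substr($0, 1, w)), p) > 0) hit++
    }
    END {
      printf "%d/%d reads have the primer in their first %d bases\n", hit, total, w
    }
  ' "$1"
}
```

Usage on the example above:

primer_at_start out_reorientate_R1.fastq ttagataccccactatgc
primer_at_start out_reorientate_R2.fastq tagaacaggctcctctag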
Capture unassigned reads for quality control #
Reads that fail barcode matching are silently discarded by default. Use
--unidentified to redirect them to a separate file for inspection. Unassigned
reads are annotated with an obimultiplex_error attribute indicating the
rejection reason.
In this dataset all reads are successfully assigned, so the unassigned file is empty:
obitagpcr \
--forward-reads wolf_F_4seq.fastq \
--reverse-reads wolf_R_4seq.fastq \
--tag-list wolf_diet_ngsfilter.csv \
--reorientate \
--unidentified out_unassigned.fastq \
--out out_identified.fastq
# Count unassigned reads (0 in this dataset)
obicount out_unassigned_R1.fastq
entities,n
variants,0
reads,0
symbols,0
For more on diagnosing rejection causes, see the obimultiplex page.
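On real libraries the unassigned file is rarely empty, and summarizing the obimultiplex_error values is a quick first diagnostic. The following is a minimal sketch assuming the attribute is stored as a JSON string in the default JSON headers of uncompressed FASTQ records. The reason strings themselves are whatever OBITools4 writes and are not listed here. error_reasons is a hypothetical helper.

```shell
# error_reasons FILE: print "count<TAB>reason" for each distinct
# obimultiplex_error value, most frequent first.
error_reasons() {
  awk 'NR % 4 == 1' "$1" |                          # header lines only
    grep -o '"obimultiplex_error":"[^"]*"' |        # extract the attribute
    sed 's/^"obimultiplex_error":"//; s/"$//' |     # keep the bare value
    sort | uniq -c | sort -rn |                     # count, largest first
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}
```

For example, error_reasons out_unassigned_R1.fastq prints nothing on this dataset, since the file is empty.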
Allow indels in primer and tag matching #
By default only substitutions are accepted as differences. For sequencers that
produce indel errors (e.g. Oxford Nanopore), add --with-indels to enable full
edit-distance matching of primers. This can also be activated per-primer via
@param,indels,true in the NGSFilter file.
obitagpcr \
--forward-reads wolf_F_4seq.fastq \
--reverse-reads wolf_R_4seq.fastq \
--tag-list wolf_diet_ngsfilter.csv \
--allowed-mismatches 3 \
--with-indels \
--reorientate \
--out out_indels.fastq
The output format is identical to the basic example above. Download a full-size result (20 reads): out_indels_R1.fastq
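Why indels matter can be illustrated with a toy computation. A single deleted base in a primer shifts every downstream position, so a position-by-position count sees many mismatches, while the edit distance counts only a couple of operations. The awk sketch below computes both distances for two strings. It is a simplified illustration, not the matching algorithm used by obimultiplex, which locates primers inside reads rather than comparing fixed windows. primer_distances is a hypothetical helper.

```shell
# primer_distances A B: print a position-by-position mismatch count
# (substitutions only) and the Levenshtein edit distance between A and B.
primer_distances() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    # Substitution-only count; any length difference adds extra mismatches.
    n = (la < lb) ? la : lb
    ham = (la > lb) ? la - lb : lb - la
    for (i = 1; i <= n; i++)
      if (substr(a, i, 1) != substr(b, i, 1)) ham++
    # Levenshtein distance (substitutions, insertions, deletions) computed
    # with the classic dynamic-programming table d[i, j].
    for (i = 0; i <= la; i++) d[i, 0] = i
    for (j = 0; j <= lb; j++) d[0, j] = j
    for (i = 1; i <= la; i++)
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        best = d[i - 1, j - 1] + cost                        # match or substitution
        if (d[i - 1, j] + 1 < best) best = d[i - 1, j] + 1   # deletion
        if (d[i, j - 1] + 1 < best) best = d[i, j - 1] + 1   # insertion
        d[i, j] = best
      }
    printf "hamming=%d edit=%d\n", ham, d[la, lb]
  }'
}
```

In this example the read window lacks one c of the cccc run of the forward primer:

primer_distances ttagataccccactatgc ttagatacccactatgct

This prints hamming=8 edit=2. Under substitution-only matching the window would exceed --allowed-mismatches 3, whereas its edit distance stays within that limit.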
Split output by sample with obidistribute #
Because obitagpcr always writes paired files (_R1 and _R2), it cannot be piped directly into obidistribute. The two steps must be run sequentially: first demultiplex with --out, then distribute each file separately.
obitagpcr \
--forward-reads wolf_F.fastq \
--reverse-reads wolf_R.fastq \
--tag-list wolf_diet_ngsfilter.csv \
--reorientate \
--out demux.fastq
obidistribute \
--classifier sample \
--pattern "sample_%s_R1.fastq" \
demux_R1.fastq
obidistribute \
--classifier sample \
--pattern "sample_%s_R2.fastq" \
demux_R2.fastq
This produces one R1/R2 pair per sample:
- sample_13a_F730603_R1.fastq / sample_13a_F730603_R2.fastq
- sample_15a_F730814_R1.fastq / sample_15a_F730814_R2.fastq
- sample_26a_F040644_R1.fastq / sample_26a_F040644_R2.fastq
- sample_29a_F260619_R1.fastq / sample_29a_F260619_R2.fastq
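Because the R1 and R2 files are distributed by two independent obidistribute runs, it is worth checking that each sample ends up with the same number of reads in both files before passing them to a tool such as DADA2. The following is a minimal sketch assuming the sample_%s_R1.fastq / sample_%s_R2.fastq naming used above and uncompressed four-line FASTQ records. per_sample_counts is a hypothetical helper.

```shell
# per_sample_counts [DIR]: for every sample_*_R1.fastq in DIR (default: .),
# print "sample<TAB>R1 reads<TAB>R2 reads<TAB>ok|CHECK".
per_sample_counts() {
  dir="${1:-.}"
  for r1 in "$dir"/sample_*_R1.fastq; do
    [ -e "$r1" ] || continue              # no match: the glob stays unexpanded
    r2="${r1%_R1.fastq}_R2.fastq"
    name=$(basename "$r1" _R1.fastq)
    name="${name#sample_}"
    n1=$(( $(wc -l < "$r1") / 4 ))        # 4 lines per FASTQ record
    if [ -e "$r2" ]; then n2=$(( $(wc -l < "$r2") / 4 )); else n2=missing; fi
    if [ "$n1" = "$n2" ]; then flag=ok; else flag=CHECK; fi
    printf '%s\t%s\t%s\t%s\n' "$name" "$n1" "$n2" "$flag"
  done
}
```

Run per_sample_counts in the directory holding the distributed files. Any line flagged CHECK points to a sample whose R1 and R2 files disagree.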