obicomplement: reverse complement of sequences #
This page was automatically generated by an AI assistant and has not yet been
reviewed or validated by the OBITools4 development team. It may contain
inaccuracies or incomplete information. Use with caution and refer to the command’s
--help output for authoritative option descriptions.
Description #
obicomplement computes the reverse complement of every sequence in its input.
For each sequence, the nucleotides are reversed and each base is replaced by
its Watson-Crick complement (A↔T, C↔G), yielding the strand that would pair
with the original sequence read in the opposite direction. Ambiguous IUPAC
characters are handled correctly and preserved in the output.
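The transformation described above can be illustrated with a minimal Python sketch. It is not the OBITools4 implementation: the table `IUPAC_COMPLEMENT` and the function `reverse_complement` are hypothetical names introduced here, the mapping follows the standard IUPAC nucleotide codes, and the lowercase output is an assumption based on the example output shown on this page.

```python
# Illustrative sketch only; not the OBITools4 implementation.
# Standard IUPAC nucleotide codes mapped to their complements.
IUPAC_COMPLEMENT = {
    "a": "t", "c": "g", "g": "c", "t": "a",
    "r": "y", "y": "r",  # purine <-> pyrimidine
    "s": "s", "w": "w",  # strong (C/G) and weak (A/T) are self-complementary
    "k": "m", "m": "k",  # keto (G/T) <-> amino (A/C)
    "b": "v", "v": "b",  # not-A <-> not-T
    "d": "h", "h": "d",  # not-C <-> not-G
    "n": "n",            # any base
}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement in lowercase.

    Symbols that are not in the table (gaps, for instance) are kept unchanged.
    """
    lowered = sequence.lower()
    return "".join(IUPAC_COMPLEMENT.get(base, base) for base in reversed(lowered))


print(reverse_complement("ATCGGCTATGCATGCTAGCT"))  # agctagcatgcatagccgat
print(reverse_complement("ATRYN"))                 # nryat
```

The second call shows that an ambiguity code stays an ambiguity code: R (A or G) becomes Y (C or T), and N remains N.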
When quality scores are present (fastq input), they are reversed in lock-step
with the sequence so that each quality value remains associated with its
corresponding base after transformation. This makes obicomplement safe to use
in any pipeline that carries per-base quality information.
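The lock-step reversal of bases and qualities can be sketched as follows. This is an illustration under assumptions, not the tool's code: `reverse_complement_with_quality` is a hypothetical helper, only the four unambiguous bases are handled, and the quality string is treated as an opaque sequence of per-base characters.

```python
# Illustrative sketch only; not the OBITools4 implementation.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement_with_quality(sequence: str, quality: str) -> tuple[str, str]:
    """Reverse-complement the bases and reverse the qualities together."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    # Complement every base, then reverse; the quality string is only reversed.
    return sequence.translate(COMPLEMENT)[::-1], quality[::-1]


seq, qual = reverse_complement_with_quality("AACG", "I5+!")
print(seq, qual)  # CGTT !+5I
```

In the input, the last base G carries the quality `!`. After the transformation its complement C is the first base and `!` is the first quality character, so the pairing between base and score is preserved.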
This operation is commonly needed when sequences were read on the wrong strand,
when a primer is designed on the reverse strand, or when preparing data for
strand-aware downstream tools such as obipairing or obigrep.
graph TD
A@{ shape: doc, label: "sequences.fasta" }
C[obicomplement]
D@{ shape: doc, label: "out_default.fasta" }
A --> C:::obitools
C --> D
classDef obitools fill:#99d57c
The file sequences.fasta contains five sample fasta sequences:
>seq001 basic DNA sequence
ATCGATCGATCGATCGATCG
>seq002 GC-rich sequence
GCGCGCGCGCGCGCGCGCGC
>seq003 AT-rich sequence
ATATATATATATATATATAT
>seq004 palindromic sequence
AATTCCGGAATTCCGGAATT
>seq005 mixed sequence
ATCGGCTATGCATGCTAGCT
To compute the reverse complement of all five sequences:
obicomplement sequences.fasta -o out_default.fasta
>seq001 {"definition":"basic DNA sequence"}
cgatcgatcgatcgatcgat
>seq002 {"definition":"GC-rich sequence"}
gcgcgcgcgcgcgcgcgcgc
>seq003 {"definition":"AT-rich sequence"}
atatatatatatatatatat
>seq004 {"definition":"palindromic sequence"}
aattccggaattccggaatt
>seq005 {"definition":"mixed sequence"}
agctagcatgcatagccgat
Each sequence header description is wrapped in a JSON annotation block, and the sequence itself is written in lowercase with all bases reverse-complemented.
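A downstream script can read such a title line back with a few lines of Python. The sketch below assumes that the title line consists of the identifier, a space, and a single JSON object, as in the output above; `parse_title_line` is a hypothetical helper and not part of OBITools4.

```python
# Illustrative sketch only; assumes "<id> <JSON object>" title lines.
import json


def parse_title_line(line: str) -> tuple[str, dict]:
    """Split a FASTA title line into (identifier, annotations)."""
    body = line.strip().lstrip(">")
    identifier, _, rest = body.partition(" ")
    # A title line without annotations yields an empty dictionary.
    annotations = json.loads(rest) if rest.strip() else {}
    return identifier, annotations


identifier, annotations = parse_title_line('>seq001 {"definition":"basic DNA sequence"}')
print(identifier)                 # seq001
print(annotations["definition"])  # basic DNA sequence
```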
Synopsis #
obicomplement [--batch-mem <string>] [--batch-size <int>]
[--batch-size-max <int>] [--compress|-Z] [--csv] [--debug]
[--ecopcr] [--embl] [--fail-on-taxonomy] [--fasta]
[--fasta-output] [--fastq] [--fastq-output] [--genbank]
[--help|-h|-?] [--input-OBI-header] [--input-json-header]
[--json-output] [--max-cpu <int>] [--no-order]
[--no-progressbar] [--out|-o <FILENAME>]
[--output-OBI-header|-O] [--output-json-header]
[--paired-with <FILENAME>] [--raw-taxid] [--silent-warning]
[--skip-empty] [--solexa] [--taxonomy|-t <string>] [--u-to-t]
[--update-taxid] [--with-leaves] [<args>]
Options #
obicomplement specific options #
--paired-with <FILENAME>: filename containing the paired reads.
Taxonomic options #
--taxonomy|-t <string>: Path to the taxonomic database.
--fail-on-taxonomy: Exit with an error if a taxid found in the data is not a currently valid node in the loaded taxonomy.
--update-taxid: Automatically replace taxids that have been declared merged into a newer node by the taxonomy database.
--raw-taxid: Print taxids in output files without appending the taxon name and rank.
--with-leaves: When the taxonomy is extracted from a sequence file, attach sequences as leaves of their taxid node.
Controlling the input data #
OBITools4 generally recognizes the input file format. It also recognizes whether the input file is compressed using GZIP. But some rare files can be misidentified, so the following options allow the user to force the format, thus bypassing the format identification step.
The file format options #
--fasta: indicates that sequence data is in fasta format.
--fastq: indicates that sequence data is in fastq format.
--embl: indicates that sequence data is in EMBL-ENA flatfile format.
--csv: indicates that sequence data is in CSV format.
--genbank: indicates that sequence data is in GenBank flatfile format.
--ecopcr: indicates that sequence data is in the old ecoPCR tabulated format.
Controlling how OBITools4 formats annotations #
These options only apply to the FASTA and FASTQ formats.
--input-OBI-header: FASTA/FASTQ title line annotations follow the old OBI format.
--input-json-header: FASTA/FASTQ title line annotations follow the JSON format.
Controlling quality score decoding #
This option only applies to the FASTQ format.
--solexa: decodes quality string according to the old Solexa specification. (default: the standard Sanger encoding is used, env: OBISSOLEXA)
Controlling the output data #
--compress|-Z: output is compressed using gzip. (default: false)
--no-order: by default, OBITools preserves the order of the input sequences in the output, and multiple input files are processed one at a time. With --no-order, multiple input files can be opened at the same time and their contents processed in parallel. This usually increases processing speed, but it does not guarantee the order of the sequences in the output file and may require more memory.
--fasta-output: writes sequence data in fasta format (default if quality data is not available).
--fastq-output: writes sequence data in fastq format (default if quality data is available).
--json-output: writes sequence data in JSON format.
--out|-o <FILENAME>: filename used for saving the output (default: “-”, the standard output)
--output-OBI-header|-O: writes output FASTA/FASTQ title line annotations in OBI format (default: JSON).
--output-json-header: writes output FASTA/FASTQ title line annotations in JSON format (the default format).
--skip-empty: sequences of length equal to zero are removed from the output (default: false).
--no-progressbar: deactivates progress bar display (default: false).
General options #
--help|-h|-?: shows this help.
--version: prints the version and exits.
--silent-warning: This option tells obitools to stop displaying warnings. This behaviour can be controlled by setting the OBIWARNINGS environment variable.
Computation related options #
--max-cpu <INTEGER>: OBITools can take advantage of your computer’s multi-core architecture by parallelizing the computation across all available CPUs. Computing on more CPUs usually requires more memory to perform the computation. Reducing the number of CPUs used to perform a calculation is also a way to indirectly control the amount of memory used by the process. The number of CPUs used by OBITools can also be controlled by setting the OBIMAXCPU environment variable.
--force-one-cpu: forces the use of a single CPU core for parallel processing (default: false).
--batch-size <INTEGER>: minimum number of sequences per batch for parallel processing (floor, default: 1, env: OBIBATCHSIZE)
--batch-size-max <INTEGER>: maximum number of sequences per batch for parallel processing (ceiling, default: 2000, env: OBIBATCHSIZEMAX)
--batch-mem <STRING>: maximum memory per batch (e.g. 128K, 64M, 1G; default: 128M; set to 0 to disable, env: OBIBATCHMEM)
Debug related options #
--debug: enables debug mode, by setting log level to debug (default: false, env: OBIDEBUG)
--pprof: enables pprof server. Look at the log for details. (default: false).
--pprof-mutex <INTEGER>: enables profiling of mutex lock. (default: 10, env: OBIPPROFMUTEX)
--pprof-goroutine <INTEGER>: enables profiling of goroutine blocking profile. (default: 6060, env: OBIPPROFGOROUTINE)
Examples #
The file reads.fastq contains five fastq reads with Phred quality scores.
obicomplement reverses both the nucleotide sequence and the quality string so
that each quality value stays aligned with its base after the transformation:
@read001 sequencing read 1
ATCGATCGATCGATCGATCG
+
IIIIIIIIIIIIIIIIIIII
@read002 sequencing read 2
GCGCGCGCGCGCGCGCGCGC
+
IIIIIIIIIIIIIIIIIIII
@read003 sequencing read 3
ATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIII
@read004 sequencing read 4
AATTCCGGAATTCCGGAATT
+
IIIIIIIIIIIIIIIIIIII
@read005 sequencing read 5
ATCGGCTATGCATGCTAGCT
+
IIIIIIIIIIIIIIIIIIII
obicomplement reads.fastq --fastq-output -o out_fastq.fastq
@read001 {"definition":"sequencing read 1"}
cgatcgatcgatcgatcgat
+
IIIIIIIIIIIIIIIIIIII
@read002 {"definition":"sequencing read 2"}
gcgcgcgcgcgcgcgcgcgc
+
IIIIIIIIIIIIIIIIIIII
@read003 {"definition":"sequencing read 3"}
atatatatatatatatatat
+
IIIIIIIIIIIIIIIIIIII
@read004 {"definition":"sequencing read 4"}
aattccggaattccggaatt
+
IIIIIIIIIIIIIIIIIIII
@read005 {"definition":"sequencing read 5"}
agctagcatgcatagccgat
+
IIIIIIIIIIIIIIIIIIII
The file rna_sequences.fasta contains RNA sequences that use Uracil (U) instead
of Thymine (T). The --u-to-t flag converts each U to T before computing the
reverse complement, producing valid DNA output that can be used in DNA-based
downstream analyses:
>rna001 mRNA fragment with uracil
AUGCAUGCAUGCAUGCAUGC
>rna002 coding RNA with uracil
GCAUGCAUGCAUGCAUGCAU
>rna003 polyU RNA sequence
UUUUUUUUUUUUUUUUUUUU
obicomplement --u-to-t rna_sequences.fasta -o out_rna_rc.fasta
>rna001 {"definition":"mRNA fragment with uracil"}
gcatgcatgcatgcatgcat
>rna002 {"definition":"coding RNA with uracil"}
atgcatgcatgcatgcatgc
>rna003 {"definition":"polyU RNA sequence"}
aaaaaaaaaaaaaaaaaaaa
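The effect of --u-to-t in the example above can be reproduced with a short Python sketch. It is an illustration only, not the OBITools4 implementation: `rna_to_dna` and `reverse_complement_rna` are hypothetical helpers, only unambiguous bases are handled, and the lowercase output mirrors what this page shows.

```python
# Illustrative sketch only; not the OBITools4 implementation.
COMPLEMENT = str.maketrans("acgt", "tgca")


def rna_to_dna(sequence: str) -> str:
    """Replace every uracil with thymine, keeping the letter case."""
    return sequence.replace("U", "T").replace("u", "t")


def reverse_complement_rna(sequence: str) -> str:
    """Convert U to T first, then reverse-complement, in lowercase."""
    dna = rna_to_dna(sequence).lower()
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement_rna("AUGCAUGCAUGCAUGCAUGC"))  # gcatgcatgcatgcatgcat
print(reverse_complement_rna("UUUUUUUUUUUUUUUUUUUU"))  # aaaaaaaaaaaaaaaaaaaa
```

Converting before complementing matters in this sketch: a complement table that only knows A, C, G and T would otherwise leave each U untouched.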
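For paired-end data, R1.fastq and R2.fastq contain the forward and reverse
mates respectively. obicomplement processes both files and writes the results
to separate _R1 and _R2 output files. In the output listed after the command,
the forward mates (/1) are reverse-complemented, while the reverse mates (/2)
appear in their original base order and are only lowercased: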
@pair001/1 paired read 1 forward
ATCGATCGATCGATCGATCG
+
IIIIIIIIIIIIIIIIIIII
@pair002/1 paired read 2 forward
GCGCGCGCGCGCGCGCGCGC
+
IIIIIIIIIIIIIIIIIIII
@pair003/1 paired read 3 forward
AATTCCGGAATTCCGGAATT
+
IIIIIIIIIIIIIIIIIIII
@pair001/2 paired read 1 reverse
CGATCGATCGATCGATCGAT
+
IIIIIIIIIIIIIIIIIIII
@pair002/2 paired read 2 reverse
CGCGCGCGCGCGCGCGCGCG
+
IIIIIIIIIIIIIIIIIIII
@pair003/2 paired read 3 reverse
TTAATTGGCCAATTGGCCAA
+
IIIIIIIIIIIIIIIIIIII
obicomplement R1.fastq --paired-with R2.fastq --out out_paired.fastq
@pair001/1 {"definition":"paired read 1 forward"}
cgatcgatcgatcgatcgat
+
IIIIIIIIIIIIIIIIIIII
@pair002/1 {"definition":"paired read 2 forward"}
gcgcgcgcgcgcgcgcgcgc
+
IIIIIIIIIIIIIIIIIIII
@pair003/1 {"definition":"paired read 3 forward"}
aattccggaattccggaatt
+
IIIIIIIIIIIIIIIIIIII
@pair001/2 {"definition":"paired read 1 reverse"}
cgatcgatcgatcgatcgat
+
IIIIIIIIIIIIIIIIIIII
@pair002/2 {"definition":"paired read 2 reverse"}
cgcgcgcgcgcgcgcgcgcg
+
IIIIIIIIIIIIIIIIIIII
@pair003/2 {"definition":"paired read 3 reverse"}
ttaattggccaattggccaa
+
IIIIIIIIIIIIIIIIIIII
obicomplement --help