`obiannotate`: edit sequence annotations #

Description #

obiannotate is a tool for editing the sequence records of a dataset. It allows you to add, delete or modify annotations of sequence records, as well as edit the identifier, definition or sequence itself.

There are two particularly important groups of options in obiannotate. The first group is shared with obigrep and enables the selection of sequences. The second group specifies the changes to be made to the sequence records. In obigrep, the selection options determine which sequences the program will retain in its output. In contrast, every sequence in the input dataset is included in the result produced by obiannotate; however, only the sequences selected by the selection options are modified according to the editing options. Non-selected sequences are transferred to the result without modification.
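The difference between the two commands can be sketched in a few lines of Python. This is only a toy model: the tuple layout, the function names `grep_like` and `annotate_like`, and the exact-match test on the identifier are assumptions made for the illustration, not OBITools4 internals.

```python
# Toy model of "filtering" versus "annotating".
# A record is an (identifier, annotations, sequence) tuple; this layout is
# an assumption for the example, not the OBITools4 representation.

def grep_like(records, selected):
    """Keep only the records for which selected(record) is true."""
    return [rec for rec in records if selected(rec)]

def annotate_like(records, selected, edit):
    """Keep every record; apply edit() only to the selected ones."""
    return [edit(rec) if selected(rec) else rec for rec in records]

records = [
    ("seqA1", {}, "cgatgctgcatgctagtgctagtcgat"),
    ("seqA2", {}, "gtagctagctagctagctagctagctaga"),
]

def is_seqA2(rec):
    return rec[0] == "seqA2"

def add_foo(rec):
    # Return a new record with an extra "foo" tag.
    return (rec[0], {**rec[1], "foo": "bar"}, rec[2])

filtered = grep_like(records, is_seqA2)
annotated = annotate_like(records, is_seqA2, add_foo)
print(len(filtered), len(annotated))
```

The filtered list holds one record, while the annotated list still holds both, with only the selected record carrying the new tag.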

The selection options #

The editing options #

Editing the annotations #

OBITools4 stores the annotations attached to each sequence using a tag/value mechanism. The annotation of a sequence is a set of tags, each associated with a value. Therefore, annotating a sequence means changing this set of tags by adding new tags, deleting others, or changing the value associated with a tag.
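A convenient mental model for this tag/value set is a dictionary. The Python sketch below applies the three kinds of change to a dictionary and renders a title line in the JSON style used by the outputs on this page. The helper `fasta_header` is hypothetical, and sorting the keys is an assumption based on the order seen in those outputs.

```python
import json

def fasta_header(identifier, annotations):
    """Render a FASTA title line with JSON-style annotations."""
    tags = json.dumps(annotations, separators=(",", ":"), sort_keys=True)
    return f">{identifier} {tags}"

annotations = {}
annotations["foo"] = 3        # add a tag
annotations["foo"] = "bar"    # change the value of an existing tag
annotations["count"] = 5      # add a second tag
del annotations["count"]      # delete a tag

print(fasta_header("seqA1", annotations))
```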

Adding annotations #

To add a new tag/value pair to a sequence, obiannotate provides the generic option --set-tag. Consider the following file:

📄 empty.fasta

>seqA1 
cgatgctgcatgctagtgctagtcgat
>seqB1 
tagctagctagctagctagctagctagcta
>seqA2 
gtagctagctagctagctagctagctaga
>seqC1 
cgatgctccatgctagtgctagtcgatga
>seqB2 
cgatggctccatgctagtgctagtcgatga

To add a foo tag with the numeric value 3 to each sequence, the command is:

obiannotate --set-tag foo=3 empty.fasta

>seqA1 {"foo":3}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"foo":3}
tagctagctagctagctagctagctagcta
>seqA2 {"foo":3}
gtagctagctagctagctagctagctaga
>seqC1 {"foo":3}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"foo":3}
cgatggctccatgctagtgctagtcgatga

The argument of the --set-tag option, foo=3, can be decomposed into two parts separated by the equal sign. The left part, foo, is the name of the target tag; the right part is the value to assign to the tag.

The left part is a plain string, whereas the right part is an expression written in the OBITools4 expression language. Here the expression is simply 3, which evaluates to the integer value 3.

To assign a string value to a tag, the right part of the option argument must be a valid OBITools4 expression that evaluates to a string: "bar", with double quotes around the text to be assigned. To prevent the Unix shell (Bash, for example) from interpreting the option argument foo="bar" itself, the whole argument must be protected by single quotes.

obiannotate --set-tag 'foo="bar"' empty.fasta

>seqA1 {"foo":"bar"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"foo":"bar"}
tagctagctagctagctagctagctagcta
>seqA2 {"foo":"bar"}
gtagctagctagctagctagctagctaga
>seqC1 {"foo":"bar"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"foo":"bar"}
cgatggctccatgctagtgctagtcgatga
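The effect of the single quotes can be checked without running obiannotate. The sketch below uses Python's `shlex` module, which approximates POSIX shell word splitting, to show what the program would receive as its argument with and without the protection.

```python
import shlex

# Without single quotes, the shell consumes the double quotes itself.
unquoted = shlex.split('obiannotate --set-tag foo="bar" empty.fasta')

# With single quotes, the double quotes reach the program untouched.
quoted = shlex.split("""obiannotate --set-tag 'foo="bar"' empty.fasta""")

print(unquoted[2])  # foo=bar
print(quoted[2])    # foo="bar"
```

In the first case the expression seen by the program is the bare word bar rather than the string literal "bar", which is why the quoting matters.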

As the right part is an expression, it can be more complex and perform basic computations. In the next example, the foo tag is set to the sequence identifier prefixed by "bar-".

obiannotate --set-tag 'foo="bar-" + sequence.Id()' empty.fasta

>seqA1 {"foo":"bar-seqA1"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"foo":"bar-seqB1"}
tagctagctagctagctagctagctagcta
>seqA2 {"foo":"bar-seqA2"}
gtagctagctagctagctagctagctaga
>seqC1 {"foo":"bar-seqC1"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"foo":"bar-seqB2"}
cgatggctccatgctagtgctagtcgatga

The complete description of the OBITools4 expression language is available here.

All the previous examples tag every sequence in the same way, but you can also use obiannotate to modify the annotations of only a subset of the sequences. As explained in the introduction of this documentation, this is achieved by combining selection and editing options.

For instance, adding a foo tag only to the sequence with the id seqA2 is achieved by combining the selection option -I seqA2 and the editing option --set-tag 'foo="bar"':

obiannotate -I seqA2 --set-tag 'foo="bar"' empty.fasta

>seqA1 
cgatgctgcatgctagtgctagtcgat
>seqB1 
tagctagctagctagctagctagctagcta
>seqA2 {"foo":"bar"}
gtagctagctagctagctagctagctaga
>seqC1 
cgatgctccatgctagtgctagtcgatga
>seqB2 
cgatggctccatgctagtgctagtcgatga

Used with obigrep, the -I seqA2 option would have output only the sequence that obiannotate modified.

obigrep -I seqA2 empty.fasta

>seqA2 
gtagctagctagctagctagctagctaga

Because the selection options are shared between obiannotate and obigrep, a good way to check which sequences will be modified by obiannotate is to test the selection options with obigrep first. Only the sequences present in the obigrep output will be edited by obiannotate.

obigrep -l 30 empty.fasta

>seqB1 
tagctagctagctagctagctagctagcta
>seqB2 
cgatggctccatgctagtgctagtcgatga

obiannotate -l 30 \
            --set-tag 'foo="bar-" + sequence.Id()' \
            empty.fasta

>seqA1 
cgatgctgcatgctagtgctagtcgat
>seqB1 {"foo":"bar-seqB1"}
tagctagctagctagctagctagctagcta
>seqA2 
gtagctagctagctagctagctagctaga
>seqC1 
cgatgctccatgctagtgctagtcgatga
>seqB2 {"foo":"bar-seqB2"}
cgatggctccatgctagtgctagtcgatga
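The selection made by -l 30 can also be verified by hand from the sequence lengths. The short Python sketch below recomputes them for the five sequences of empty.fasta; it only models the "length greater than or equal to the minimum" rule stated for --min-length and does not call OBITools4.

```python
# Sequences copied from empty.fasta.
sequences = {
    "seqA1": "cgatgctgcatgctagtgctagtcgat",
    "seqB1": "tagctagctagctagctagctagctagcta",
    "seqA2": "gtagctagctagctagctagctagctaga",
    "seqC1": "cgatgctccatgctagtgctagtcgatga",
    "seqB2": "cgatggctccatgctagtgctagtcgatga",
}

MIN_LENGTH = 30  # value given to -l in the example

lengths = {name: len(seq) for name, seq in sequences.items()}
selected = [name for name, size in lengths.items() if size >= MIN_LENGTH]

print(lengths)
print(selected)  # ['seqB1', 'seqB2']
```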

Renaming tags #

Renaming tags can be useful for accounting for changes in a pipeline, adapting old datasets to new scripts or saving annotations produced by an *OBITools* command before rerunning it with different parameters. Consider the following fasta file:

📄 five_tags.fasta

>seqA1 {"count":1,"tata":"bar","taxid":"taxon:9606 [Homo sapiens]@species","toto":"titi"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"tata":"bar","taxid":"taxon:63221 [Homo sapiens neanderthalensis]@subspecies","toto":"tata"}
tagctagctagctagctagctagctagcta
>seqA2 {"count":5,"tata":"foo","taxid":"taxon:9605 [Homo]@genus","toto":"tutu"}
gtagctagctagctagctagctagctaga
>seqC1 {"count":15,"tata":"foo","taxid":"taxon:9604 [Hominidae]@family","toto":"foo"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"count":25,"tata":"bar"}
cgatggctccatgctagtgctagtcgatga

Suppose you want to keep the current taxonomic annotation as a reference before running the obitag command, so that you can later compare the new annotation with the old one. You can rename the taxid tag to ref_taxid and then run obitag, which will set a new taxid tag. Note that the argument is written in the order NEW_NAME=OLD_NAME.

obiannotate --rename-tag ref_taxid=taxid five_tags.fasta

>seqA1 {"count":1,"ref_taxid":"taxon:9606 [Homo sapiens]@species","tata":"bar","toto":"titi"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"ref_taxid":"taxon:63221 [Homo sapiens neanderthalensis]@subspecies","tata":"bar","toto":"tata"}
tagctagctagctagctagctagctagcta
>seqA2 {"count":5,"ref_taxid":"taxon:9605 [Homo]@genus","tata":"foo","toto":"tutu"}
gtagctagctagctagctagctagctaga
>seqC1 {"count":15,"ref_taxid":"taxon:9604 [Hominidae]@family","tata":"foo","toto":"foo"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"count":25,"tata":"bar"}
cgatggctccatgctagtgctagtcgatga
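The order of the two names in the argument is easy to get wrong, so here is a minimal Python sketch of the renaming applied to a dictionary of tags. The function `rename_tag` is hypothetical; it only mirrors the behaviour visible in the output above, where a record without the old tag (seqB2) is left as it is.

```python
def rename_tag(annotations, spec):
    """Apply a NEW_NAME=OLD_NAME rename to a copy of the annotations."""
    new_name, old_name = spec.split("=", 1)
    renamed = dict(annotations)
    if old_name in renamed:
        # Move the value from the old key to the new key.
        renamed[new_name] = renamed.pop(old_name)
    return renamed

seqA2 = {"count": 5, "tata": "foo", "taxid": "taxon:9605 [Homo]@genus", "toto": "tutu"}
seqB2 = {"count": 25, "tata": "bar"}

print(rename_tag(seqA2, "ref_taxid=taxid"))
print(rename_tag(seqB2, "ref_taxid=taxid"))
```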

Adding a serial number to each sequence #

It can be useful to add a serial number to each sequence. This can be done by using the obiannotate command with the --number option. This option adds a new tag named seq_number to each sequence; its integer value is incremented from one sequence to the next.

obiannotate --number empty.fasta

>seqA1 {"seq_number":1}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"seq_number":2}
tagctagctagctagctagctagctagcta
>seqA2 {"seq_number":3}
gtagctagctagctagctagctagctaga
>seqC1 {"seq_number":4}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"seq_number":5}
cgatggctccatgctagtgctagtcgatga

--length
--aho-corasick --pattern --pattern-name --pattern-error --allows-indels

--scientific-name

--with-taxon-at-rank <RANK_NAME>

--taxonomic-rank

--taxonomic-path

--raw-taxid

--add-lca-in <SLOT_NAME> --lca-error <#.###>

Deleting annotations #

Three options allow you to delete annotations associated with a sequence. The simplest one is --clear, which removes every annotation associated with a sequence.

Consider the following FASTA sequence file:

📄 five_tags.fasta

>seqA1 {"count":1,"tata":"bar","taxid":"taxon:9606 [Homo sapiens]@species","toto":"titi"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"tata":"bar","taxid":"taxon:63221 [Homo sapiens neanderthalensis]@subspecies","toto":"tata"}
tagctagctagctagctagctagctagcta
>seqA2 {"count":5,"tata":"foo","taxid":"taxon:9605 [Homo]@genus","toto":"tutu"}
gtagctagctagctagctagctagctaga
>seqC1 {"count":15,"tata":"foo","taxid":"taxon:9604 [Hominidae]@family","toto":"foo"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"count":25,"tata":"bar"}
cgatggctccatgctagtgctagtcgatga

The next command removes all the annotations:

obiannotate --clear five_tags.fasta

>seqA1 
cgatgctgcatgctagtgctagtcgat
>seqB1 
tagctagctagctagctagctagctagcta
>seqA2 
gtagctagctagctagctagctagctaga
>seqC1 
cgatgctccatgctagtgctagtcgatga
>seqB2 
cgatggctccatgctagtgctagtcgatga

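If you combine a selection option, here -C 10, which selects all the sequences occurring at most ten times, with the --clear option, annotations are deleted only on the selected sequences. For the other sequences, the annotations are kept. Note that in the output below seqB1, which has no count tag, is selected as well; this is consistent with a missing count being treated as 1.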

obiannotate -C 10 --clear five_tags.fasta

>seqA1 
cgatgctgcatgctagtgctagtcgat
>seqB1 
tagctagctagctagctagctagctagcta
>seqA2 
gtagctagctagctagctagctagctaga
>seqC1 {"count":15,"tata":"foo","taxid":"taxon:9604 [Hominidae]@family","toto":"foo"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"count":25,"tata":"bar"}
cgatggctccatgctagtgctagtcgatga
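The selection applied in this example can be modelled as follows. The sketch assumes that a sequence without a count tag counts as one occurrence, which is what the output above suggests for seqB1; this default is an inference from the example, not a statement taken from the option descriptions. The tag sets are abbreviated to keep the example short.

```python
# Abbreviated annotations from five_tags.fasta (only some tags are kept).
records = {
    "seqA1": {"count": 1, "tata": "bar"},
    "seqB1": {"tata": "bar"},             # no count tag
    "seqA2": {"count": 5, "tata": "foo"},
    "seqC1": {"count": 15, "tata": "foo"},
    "seqB2": {"count": 25, "tata": "bar"},
}

MAX_COUNT = 10  # value given to -C in the example

def clear_if_selected(tags):
    # Assumption: a missing count is treated as 1.
    selected = tags.get("count", 1) <= MAX_COUNT
    return {} if selected else tags

result = {name: clear_if_selected(tags) for name, tags in records.items()}
cleared = [name for name, tags in result.items() if not tags]
print(cleared)  # ['seqA1', 'seqB1', 'seqA2']
```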

It is possible to delete a given tag based on its name using the --delete-tag option. In the following example, the taxid tag is deleted. As the seqB2 sequence does not have a taxid tag, it is not affected.

obiannotate --delete-tag taxid five_tags.fasta

>seqA1 {"count":1,"tata":"bar","toto":"titi"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"tata":"bar","toto":"tata"}
tagctagctagctagctagctagctagcta
>seqA2 {"count":5,"tata":"foo","toto":"tutu"}
gtagctagctagctagctagctagctaga
>seqC1 {"count":15,"tata":"foo","toto":"foo"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"count":25,"tata":"bar"}
cgatggctccatgctagtgctagtcgatga

Several --delete-tag options can be given in a single obiannotate command.

obiannotate --delete-tag taxid \
            --delete-tag count \
            five_tags.fasta

>seqA1 {"tata":"bar","toto":"titi"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"tata":"bar","toto":"tata"}
tagctagctagctagctagctagctagcta
>seqA2 {"tata":"foo","toto":"tutu"}
gtagctagctagctagctagctagctaga
>seqC1 {"tata":"foo","toto":"foo"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"tata":"bar"}
cgatggctccatgctagtgctagtcgatga

The last way to delete annotations is indirect. It relies on the --keep option, which indicates the annotation to be kept. Consequently, all the other tags are deleted.

obiannotate --keep taxid five_tags.fasta

>seqA1 {"taxid":"taxon:9606 [Homo sapiens]@species"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"taxid":"taxon:63221 [Homo sapiens neanderthalensis]@subspecies"}
tagctagctagctagctagctagctagcta
>seqA2 {"taxid":"taxon:9605 [Homo]@genus"}
gtagctagctagctagctagctagctaga
>seqC1 {"taxid":"taxon:9604 [Hominidae]@family"}
cgatgctccatgctagtgctagtcgatga
>seqB2 
cgatggctccatgctagtgctagtcgatga

As with --delete-tag, several --keep options can be provided to keep several annotations.

obiannotate --keep taxid \
            --keep count \
            five_tags.fasta

>seqA1 {"count":1,"taxid":"taxon:9606 [Homo sapiens]@species"}
cgatgctgcatgctagtgctagtcgat
>seqB1 {"taxid":"taxon:63221 [Homo sapiens neanderthalensis]@subspecies"}
tagctagctagctagctagctagctagcta
>seqA2 {"count":5,"taxid":"taxon:9605 [Homo]@genus"}
gtagctagctagctagctagctagctaga
>seqC1 {"count":15,"taxid":"taxon:9604 [Hominidae]@family"}
cgatgctccatgctagtgctagtcgatga
>seqB2 {"count":25}
cgatggctccatgctagtgctagtcgatga
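The two approaches are complementary: --delete-tag names what to remove, --keep names what to retain. A minimal Python sketch of both, using hypothetical helper names and a dictionary as the tag set, could look like this:

```python
def delete_tags(annotations, names):
    """Remove the listed tags; tags that are absent are simply ignored."""
    return {key: value for key, value in annotations.items() if key not in names}

def keep_tags(annotations, names):
    """Retain only the listed tags; everything else is dropped."""
    return {key: value for key, value in annotations.items() if key in names}

seqA1 = {
    "count": 1,
    "tata": "bar",
    "taxid": "taxon:9606 [Homo sapiens]@species",
    "toto": "titi",
}
seqB2 = {"count": 25, "tata": "bar"}

print(delete_tags(seqA1, {"taxid", "count"}))  # tata and toto remain
print(keep_tags(seqA1, {"taxid"}))             # only taxid remains
print(keep_tags(seqB2, {"taxid"}))             # nothing remains
```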

Changing annotation values #

Editing the identifier #

The identifier of a sequence can be updated with the --set-id option (listed as --set-identifier in the synopsis). One of the most useful applications of this option is to replace the long identifiers generated by the sequencer with short ones based on a number incremented from sequence to sequence, such as the one generated by the --number option. To achieve this, you can pipe two obiannotate commands: the first adds the seq_number annotation to the sequences, and the second sets the sequence id from this newly added seq_number tag.

obiannotate --number empty.fasta \
  | obiannotate --set-id 'sprintf("motus_%04d", annotations.seq_number)'

>motus_0001 {"seq_number":1}
cgatgctgcatgctagtgctagtcgat
>motus_0002 {"seq_number":2}
tagctagctagctagctagctagctagcta
>motus_0003 {"seq_number":3}
gtagctagctagctagctagctagctaga
>motus_0004 {"seq_number":4}
cgatgctccatgctagtgctagtcgatga
>motus_0005 {"seq_number":5}
cgatggctccatgctagtgctagtcgatga

The sprintf function of the OBITools4 expression language is used to format sequence identifiers. It requires a format string, in this case "motus_%04d", which describes how the new identifier will be generated. In this case, %04d will be replaced by the second argument of the sprintf() function, annotations.seq_number, which is the number associated with the sequence in the file. d indicates a decimal integer value, and the 4 in front specifies that this number will be padded to 4 digits. The 0 before the 4 indicates that the number will be padded with zeros.

The result of the sprintf function can be seen in the output above. The first sequence is given the identifier motus_0001, the second is given the identifier motus_0002, and so on.
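The same format string can be tried out in Python, whose % operator follows the usual printf conventions for %04d. The sketch assumes that the sprintf function of the expression language behaves in the same way for this format, as the description above indicates; it does not run OBITools4.

```python
old_ids = ["seqA1", "seqB1", "seqA2", "seqC1", "seqB2"]

# Number the sequences from 1, as --number does in the example above,
# then format each number with the same format string.
new_ids = ["motus_%04d" % number for number, _ in enumerate(old_ids, start=1)]

print(new_ids[0], new_ids[-1])  # motus_0001 motus_0005
print("motus_%04d" % 12345)     # motus_12345
```

The last line shows that 4 is a minimum width: numbers with more than four digits are not truncated.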

Editing the definition #

Editing the sequence #

--cut <###:###>
--sequence

Synopsis #

obiannotate [--add-lca-in <SLOT_NAME>] [--aho-corasick <string>]
            [--allows-indels] [--approx-pattern <PATTERN>]...
            [--attribute|-a <KEY=VALUE>]... [--batch-size <int>] [--clear]
            [--compress|-Z] [--csv] [--cut <###:###>] [--debug]
            [--definition|-D <PATTERN>]... [--delete-tag <KEY>]... [--ecopcr]
            [--embl] [--fail-on-taxonomy] [--fasta] [--fasta-output]
            [--fastq] [--fastq-output] [--force-one-cpu] [--genbank]
            [--has-attribute|-A <KEY>]... [--help|-h|-?]
            [--id-list <FILENAME>] [--identifier|-I <PATTERN>]...
            [--ignore-taxon|-i <TAXID>]... [--input-OBI-header]
            [--input-json-header] [--inverse-match|-v] [--json-output]
            [--keep|-k <KEY>]... [--lca-error <#.###>] [--length]
            [--max-count|-C <COUNT>] [--max-cpu <int>]
            [--max-length|-L <LENGTH>] [--min-count|-c <COUNT>]
            [--min-length|-l <LENGTH>] [--no-order] [--no-progressbar]
            [--number] [--only-forward] [--out|-o <FILENAME>]
            [--output-OBI-header|-O] [--output-json-header]
            [--paired-mode <forward|reverse|and|or|andnot|xor>]
            [--pattern <string>] [--pattern-error <int>]
            [--pattern-name <string>] [--pprof] [--pprof-goroutine <int>]
            [--pprof-mutex <int>] [--predicate|-p <EXPRESSION>]...
            [--raw-taxid] [--rename-tag|-R <NEW_NAME=OLD_NAME>]...
            [--require-rank <RANK_NAME>]...
            [--restrict-to-taxon|-r <TAXID>]... [--scientific-name]
            [--sequence|-s <PATTERN>]... [--set-identifier <EXPRESSION>]
            [--set-tag|-S <KEY=EXPRESSION>]... [--silent-warning]
            [--skip-empty] [--solexa] [--taxonomic-path] [--taxonomic-rank]
            [--taxonomy|-t <string>] [--u-to-t] [--update-taxid]
            [--valid-taxid] [--version] [--with-leaves]
            [--with-taxon-at-rank <RANK_NAME>]... [<args>]

Options #

`obiannotate` specific options #

Identifier modification #

--set-identifier <EXPRESSION>: An expression used to assign the new identifier of the sequence.

Attribute modification #

--clear: Clears all attributes associated with the sequence records.
--delete-tag <KEY>: Deletes the attribute named KEY. When this attribute is missing, the sequence record is left unchanged by this option.
--keep | -k <KEY>: Keeps only the attribute named KEY. Several -k options can be combined.
--rename-tag | -R <NEW_NAME=OLD_NAME>: Changes the attribute name OLD_NAME to NEW_NAME. When the attribute named OLD_NAME is missing, the sequence record is left unchanged by this option.
--set-tag | -S <KEY=EXPRESSION>: Creates a new attribute named KEY whose value is computed from EXPRESSION.

--aho-corasick <string>: Adds an aho-corasick attribute with the count of matches of the provided patterns.
--length: Adds attribute with seq_length as a key and sequence length as a value.
--pattern <string>: Adds a pattern attribute containing the pattern, a pattern_match attribute indicating the matched sequence, and a pattern_error attribute indicating the number of differences between the pattern and the matched sequence.
--pattern-name <string>: specifies the name to use as prefix for the attributes reporting the match. (default: “pattern”)

Sequence modification #

--cut <###:###>: A pattern describing how to cut the sequence.

Taxonomy annotation #

--add-lca-in <KEY>: From the taxonomic annotation of the sequence (taxid attribute or merged_taxid attribute), a new attribute named KEY is added with the taxid of the lowest common ancestor corresponding to the current annotation.
--lca-error <#.###>: Error rate tolerated on the taxonomic description during the computation of the lowest common ancestor. At most a fraction lca-error of the taxonomic information can disagree with the estimated LCA. (default: 0.000000)
--scientific-name: Annotates the sequence with its scientific name.

Taxonomy options #

Check taxids against a taxonomy #

OBITools4 commands can load a taxonomy database when processing sequence data. When a taxonomy is loaded, the command checks the validity of the taxids as it processes the data. Three cases can occur:

The taxon is valid.
The taxon is no longer valid, but a new one replaces it.
The taxon is no longer valid, and no new taxid exists to replace it.

In the first case, OBITools4 normalizes the taxid to the form:

    TAXCOD:TAXID [SCIENTIFIC NAME]@RANK

For example, with the NCBI taxonomy, the human taxid looks like:

    taxon:9606 [Homo sapiens]@species

That rewriting does not happen if the --raw-taxid option is set. In that case, only the raw taxid is kept.
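For downstream scripts, a normalized taxid string can be split back into its four parts with a regular expression. The pattern below is an assumption derived only from the form and the example shown above; it is not taken from the OBITools4 source code and may need adjusting for unusual names.

```python
import re

# CODE:TAXID [SCIENTIFIC NAME]@RANK
TAXID_PATTERN = re.compile(
    r"^(?P<code>[^:]+):(?P<taxid>\S+) \[(?P<name>.+)\]@(?P<rank>.+)$"
)

def parse_taxid(text):
    """Return the four fields as a dict, or None if the text does not match."""
    match = TAXID_PATTERN.match(text)
    return match.groupdict() if match else None

print(parse_taxid("taxon:9606 [Homo sapiens]@species"))
print(parse_taxid("9606"))  # a raw taxid does not match the normalized form
```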

In the second case, a warning message is logged to the standard error. If the --update-taxid option is set, the command updates the expired taxid to its new equivalent, and the rules for a valid taxon apply. Otherwise, the old taxid is kept in the result. In the last case, a warning message is also written to the standard error, and the invalid taxid is kept as is. If the --fail-on-taxonomy option is set, the command stops and exits with an error at the first invalid taxid encountered in the input data.

--taxonomy | -t <string>: Path to the taxonomic database.
--raw-taxid: Displays the raw taxid for each displayed taxon. (default: false)
--update-taxid: Makes obitools automatically update taxids that are declared as merged into a newer one (default: false).
--fail-on-taxonomy: Makes obitools fail with an error if a taxid in use is not currently valid (default: false).

--taxonomic-rank: Annotates the sequence with its taxonomic rank.
--taxonomic-path: Annotates the sequence with its taxonomic path.
--with-taxon-at-rank <RANK_NAME>: Adds taxonomic annotation at taxonomic rank RANK_NAME.

Selecting sequence records #

Selection based on the sequence #

Strict matching #

--sequence | -s <PATTERN>: A regular expression pattern used to match the sequence. Only the entries whose sequence matches the pattern are selected. Regular expression patterns are case-insensitive.

Approximate matching #

--approx-pattern <PATTERN>: A DNA pattern used to match the sequence. Only the entries whose sequence matches the pattern are selected. DNA patterns are case-insensitive. They can be matched allowing for errors: mismatches, insertions or deletions.
--allows-indels: allows for indels during DNA pattern matching (see the --approx-pattern option).
--pattern-error <INTEGER>: maximum number of errors allowed when searching for patterns in DNA (default 0, see --approx-pattern option).

Selection based on the sequence identifier #

--identifier | -I <REGEX>: Regular expression pattern to be tested against the sequence identifier. The pattern is case-insensitive.
--id-list <FILENAME>: points to a text file containing the list of sequence record identifiers to be selected. The file contains one identifier per line.
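A file suitable for --id-list can be produced with any tool that writes one identifier per line. The following Python sketch writes such a file and reads it back; the file name ids.txt and the helper names are hypothetical.

```python
import os
import tempfile

def write_id_list(path, identifiers):
    """Write one identifier per line, the layout described for --id-list."""
    with open(path, "w", encoding="utf-8") as handle:
        for identifier in identifiers:
            handle.write(identifier + "\n")

def read_id_list(path):
    """Read the identifiers back, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

workdir = tempfile.mkdtemp()
id_file = os.path.join(workdir, "ids.txt")  # hypothetical file name
write_id_list(id_file, ["seqA1", "seqB2"])
print(read_id_list(id_file))
```

The resulting file could then be given to the --id-list option of obiannotate or obigrep.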

Selection based on the sequence definition #

--definition | -D <REGEX>: Regular expression pattern to be tested against the sequence definition. The pattern is case-insensitive.

Selection based on the sequence properties #

--min-count | -c <COUNT>: selects the sequence records for which the number of occurrences (i.e. the count attribute) is equal to or greater than the defined minimum count.
--max-count | -C <COUNT>: selects the sequence records for which the number of occurrences (i.e. the count attribute) is equal to or smaller than the defined maximum count.
--min-length | -l <LENGTH>: selects the sequence records for which the sequence length is equal to or greater than the defined minimum sequence length.
--max-length | -L <LENGTH>: selects sequence records for which the sequence length is equal to or less than the defined maximum sequence length.

Controlling the input data #

OBITools4 generally recognizes the input file format. It also recognizes whether the input file is compressed using GZIP. But some rare files can be misidentified, so the following options allow the user to force the format, thus bypassing the format identification step.

The file format options #

--fasta: indicates that sequence data is in fasta format.
--fastq: indicates that sequence data is in fastq format.
--embl: indicates that sequence data is in EMBL-ENA flatfile format.
--csv: indicates that sequence data is in CSV format.
--genbank: indicates that sequence data is in GenBank flatfile format.
--ecopcr: indicates that sequence data is in the old ecoPCR tabulated format.

Controlling the way OBITools4 formats annotations #

These options only apply to the FASTA and FASTQ formats.

--input-OBI-header: FASTA/FASTQ title line annotations follow the old OBI format.
--input-json-header: FASTA/FASTQ title line annotations follow the JSON format.

Controlling quality score decoding #

This option only applies to the FASTQ format.

--solexa: decodes quality string according to the old Solexa specification. (default: the standard Sanger encoding is used, env: OBISSOLEXA)

Controlling the output data #

--compress | -Z : output is compressed using gzip. (default: false)
--no-order: by default, OBITools4 ensures that the order of the sequences is the same in the input and in the output. When multiple files are processed, they are processed one at a time. If the --no-order option is added to a command, multiple input files can be opened at the same time and their contents processed in parallel. This usually increases processing speed, but the order of the sequences in the output file is no longer guaranteed. Processing multiple files in parallel may also require more memory.
--fasta-output: writes sequence data in fasta format (default if quality data is not available).
--fastq-output: writes sequence data in fastq format (default if quality data is available).
--json-output: writes sequence data in JSON format.
--out | -o <FILENAME>: filename used for saving the output (default: “-”, the standard output)
--output-OBI-header | -O : writes output FASTA/FASTQ title line annotations in OBI format (default: JSON).
--output-json-header: writes output FASTA/FASTQ title line annotations in JSON format (the default format).
--skip-empty: sequences of length equal to zero are removed from the output (default: false).
--no-progressbar: deactivates progress bar display (default: false).

General options #

--help | -h|-? : shows this help.
--version: prints the version and exits.
--silent-warning: This option tells obitools to stop displaying warnings. This behaviour can be controlled by setting the OBIWARNINGS environment variable.

--max-cpu <INTEGER>: OBITools can take advantage of your computer’s multi-core architecture by parallelizing the computation across all available CPUs. Computing on more CPUs usually requires more memory to perform the computation. Reducing the number of CPUs used to perform a calculation is also a way to indirectly control the amount of memory used by the process. The number of CPUs used by OBITools can also be controlled by setting the OBIMAXCPU environment variable.
--force-one-cpu: forces the use of a single CPU core for parallel processing (default: false).
--batch-size <INTEGER>: number of sequences per batch for parallel processing (default: 1000, env: OBIBATCHSIZE)

--debug: enables debug mode, by setting log level to debug (default: false, env: OBIDEBUG)
--pprof: enables pprof server. Look at the log for details. (default: false).
--pprof-mutex <INTEGER>: enables profiling of mutex lock. (default: 10, env: OBIPPROFMUTEX)
--pprof-goroutine <INTEGER>: enables profiling of goroutine blocking profile. (default: 6060, env: OBIPPROFGOROUTINE)

Examples #

obiannotate --help
