obimicrosat

obimicrosat: looks for microsatellite sequences in a sequence file

Preliminary AI-generated documentation

This page was automatically generated by an AI assistant and has not yet been reviewed or validated by the OBITools4 development team. It may contain inaccuracies or incomplete information. Use with caution and refer to the command’s --help output for authoritative option descriptions.

Description

obimicrosat scans DNA sequences for simple sequence repeats (SSRs), also called microsatellites: tandem repetitions of a short motif (1–6 bp by default). For each sequence containing a qualifying repeat, the command annotates the sequence with the repeat's location, unit sequence, repeat count and flanking regions, then writes it to the output. Sequences without a detected microsatellite are silently discarded.

Detection works in two passes. A first regular expression finds any tandem repeat satisfying the unit-length and repeat-count constraints. The true minimal repeat unit is then determined (for example, a repeat first matched with the unit acac is reduced to ac), and a second scan refines the exact boundaries of the repeat. Finally, the unit is normalized to the lexicographically smallest string among all rotations of the unit and of its reverse complement, so that equivalent loci (e.g. AC, CA, GT and TG repeats) are grouped consistently across samples.
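The two-pass detection and the normalization step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OBITools4 source: the function names (`find_microsat`, `minimal_unit`, `normalize_unit`) are hypothetical, the input is assumed to contain only A/C/G/T, only the first qualifying repeat is returned, and ties between a direct and a reverse rotation are assumed to resolve as `direct` (consistent with the `acgt` example further down).

```python
import re

# Lowercase complement table (the outputs shown on this page are lowercase).
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def minimal_unit(unit: str) -> str:
    """Shortest motif whose tandem repetition rebuilds `unit` ('acac' -> 'ac')."""
    n = len(unit)
    for p in range(1, n + 1):
        if n % p == 0 and unit[:p] * (n // p) == unit:
            return unit[:p]
    return unit


def normalize_unit(unit: str) -> tuple[str, str]:
    """Smallest rotation over the unit and its reverse complement.

    Returns (normalized_unit, orientation); ties are resolved as 'direct'
    (an assumption of this sketch).
    """
    def rotations(u: str) -> list[str]:
        return [u[i:] + u[:i] for i in range(len(u))]

    best_direct = min(rotations(unit))
    best_reverse = min(rotations(reverse_complement(unit)))
    if best_reverse < best_direct:
        return best_reverse, "reverse"
    return best_direct, "direct"


def find_microsat(seq: str, min_unit: int = 1, max_unit: int = 6,
                  min_count: int = 5, min_length: int = 20):
    """Return (start, end, unit) of the first qualifying repeat, or None.

    start/end are 0-based, end-exclusive slice bounds into `seq`.
    """
    seq = seq.lower()
    # Pass 1: the lazy group tries the shortest candidate unit first at each
    # position; the backreference then demands min_count copies in total.
    first = re.compile(r"([acgt]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_count - 1))
    for m in first.finditer(seq):
        unit = minimal_unit(m.group(1))
        if len(unit) < min_unit:
            continue  # e.g. 'aa' matched with min_unit=2 is really a mononucleotide run
        # Pass 2: rescan from the same start with the minimal unit so that the
        # boundaries fall on complete copies of that unit.
        m2 = re.compile("(?:%s)+" % re.escape(unit)).match(seq, m.start())
        start, end = m2.start(), m2.end()
        if (end - start) // len(unit) >= min_count and end - start >= min_length:
            return start, end, unit
    return None
```

With the flanks of seq001 from the example below, `find_microsat(left + "ac" * 16 + right)` returns `(39, 71, "ac")`: adding 1 to the 0-based start gives the documented `microsat_from` of 40, and the exclusive end equals `microsat_to`. `normalize_unit("gt")` returns `("ac", "reverse")`, matching the attributes reported for seq006.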

By default, when the canonical form of a unit requires the reverse complement, the whole sequence is reoriented so that the microsatellite is always reported on the direct strand of the normalized unit. This behaviour can be disabled with --not-reoriented.
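Reorientation amounts to a reverse complement plus a coordinate remap: on a sequence of length n, the 1-based position i becomes n - i + 1, so the start and the end of the repeat swap roles. A hypothetical sketch (`reorient` is an illustrative name, not an OBITools4 function):

```python
# Lowercase complement table (the outputs shown on this page are lowercase).
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq_id: str, seq: str, start: int, end: int):
    """Reverse-complement a record and remap its repeat coordinates.

    `start` and `end` are 1-based and inclusive, like microsat_from and
    microsat_to. Position i maps to len(seq) - i + 1, so the ends swap.
    """
    n = len(seq)
    return seq_id + "_cmp", reverse_complement(seq), n - end + 1, n - start + 1
```

For seq006 (109 bp, GT repeat at positions 40 to 71 on the input strand), this yields positions 39 to 70 on `seq006_cmp`, which matches the `microsat_from` and `microsat_to` values in the outputs below.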

A common use case is identifying polymorphic SSR markers for population genetics, or flagging repeat-rich regions before designing PCR primers.

graph TD
  A@{ shape: doc, label: "sequences.fasta" }
  C[obimicrosat]
  D@{ shape: doc, label: "out_default.fasta" }
  A --> C:::obitools
  C --> D
  classDef obitools fill:#99d57c

Each retained sequence carries the following additional attributes after processing:

Attribute                    Content
---------------------------  -------
microsat                     Full repeat region, as a string
microsat_from                1-based start position of the repeat
microsat_to                  1-based end position of the repeat (inclusive)
microsat_unit                Repeat unit as observed in the (possibly reoriented) sequence
microsat_unit_normalized     Canonical unit: lexicographically smallest rotation of the unit or of its reverse complement
microsat_unit_orientation    direct if the canonical unit derives from the unit on the input strand, reverse if it derives from its reverse complement
microsat_unit_length         Length of the repeat unit (bp)
microsat_unit_count          Number of complete unit repetitions
seq_length                   Total length of the (possibly reoriented) sequence
microsat_left                Flanking sequence to the left of the repeat
microsat_right               Flanking sequence to the right of the repeat

When a sequence is reoriented (reverse-complemented), _cmp is appended to its identifier. Coordinate attributes (microsat_from, microsat_to) always refer to the (possibly reoriented) output sequence.
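Because the coordinates are 1-based and inclusive, the attributes can be cross-checked against the output sequence with ordinary 0-based slicing. A minimal sketch; `attributes_consistent` is a hypothetical helper, and it assumes (as in every record shown on this page) that the two flanks together cover the whole remainder of the sequence:

```python
def attributes_consistent(seq: str, attrs: dict) -> bool:
    """Check the relations implied by the attribute table (1-based, inclusive)."""
    start, end = attrs["microsat_from"], attrs["microsat_to"]
    repeat = attrs["microsat"]
    return (
        attrs["seq_length"] == len(seq)
        and repeat == seq[start - 1:end]               # the repeat region itself
        and attrs["microsat_left"] == seq[:start - 1]  # everything before it
        and attrs["microsat_right"] == seq[end:]       # everything after it
        and repeat.startswith(attrs["microsat_unit"])
        and attrs["microsat_unit_length"] == len(attrs["microsat_unit"])
        # only complete copies are counted, so they cannot exceed the repeat
        and attrs["microsat_unit_count"] * attrs["microsat_unit_length"] <= len(repeat)
    )
```

Shifting `microsat_from` by one position in any of the example records makes the check fail, which is a quick way to catch off-by-one errors when these attributes are consumed by other tools.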

Consider the following fasta input file:

📄 sequences.fasta
>seq001 dinucleotide AC repeat 16x with 40bp non-repetitive flanks
AGTCGAACTTGCATGCCTTCAGGGCAAGTCTAGCTTACGACACACACACACACACACACACACACACACACCGATAGTCATGCAAGTCTTGCGGCATAGATCGTTACCA
>seq002 mononucleotide A repeat 25x with 20bp non-repetitive flanks
AGTCGAACTTGCATGCCTTCAAAAAAAAAAAAAAAAAAAAAAAAAAACGATAGTCATGCAAGTCTTGCGG
>seq003 trinucleotide AGC repeat 11x with 40bp non-repetitive flanks
AGTCGAACTTGCATGCCTTCAGGGCAAGTCTAGCTTACGAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCCGATAGTCATGCAAGTCTTGCGGCATAGATCGTTACCA
>seq004 tetranucleotide ACGT repeat 6x with 20bp non-repetitive flanks
AGTCGAACTTGCATGCCTTCAACGTACGTACGTACGTACGTACGTCGATAGTCATGCAAGTCTTGCGG
>seq005 no microsatellite negative control
AGTCGAACTTGCATGCCTTCAGGGCAAGTCTAGCTTACGCATGCATGATCGATCGTATCGATCGATCG
>seq006 GT repeat 16x with 40bp non-repetitive flanks canonical form is AC
AGTCGAACTTGCATGCCTTCAGGGCAAGTCTAGCTTACGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCGATAGTCATGCAAGTCTTGCGGCATAGATCGTTACCA
>seq007 AC repeat 16x with only 10bp flanks insufficient for primer design
AGTCGAACTTACACACACACACACACACACACACACACACACCGATAGTCA

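Running obimicrosat with default settings detects all SSRs with a 1–6 bp unit repeated at least 5 times and spanning at least 20 bp. Only two of the retained records (seq001 and seq006) are shown in this excerpt; the complete output appears in the Examples section: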

obimicrosat sequences.fasta
>seq001 {"definition":"dinucleotide AC repeat 16x with 40bp non-repetitive flanks","microsat":"acacacacacacacacacacacacacacacac","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacgacacacacacacacacacaca
cacacacacaccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq006_cmp {"definition":"GT repeat 16x with 40bp non-repetitive flanks canonical form is AC","microsat":"acacacacacacacacacacacacacacacac","microsat_from":39,"microsat_left":"tggtaacgatctatgccgcaagacttgcatgactatcg","microsat_right":"cgtaagctagacttgccctgaaggcatgcaagttcgact","microsat_to":70,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"reverse","seq_length":109}
tggtaacgatctatgccgcaagacttgcatgactatcgacacacacacacacacacacac
acacacacaccgtaagctagacttgccctgaaggcatgcaagttcgact

Note that seq006 (a GT repeat) is automatically reverse-complemented so that its canonical unit (ac) is reported on the direct strand; its identifier becomes seq006_cmp and microsat_unit_orientation is set to reverse.

Synopsis

obimicrosat [--min-unit-length|-m INT] [--max-unit-length|-M INT]
            [--min-unit-count INT] [--min-length|-l INT]
            [--min-flank-length|-f INT] [--not-reoriented|-n]
            [<filename>...]

Options

obimicrosat specific options

  • --min-unit-length | -m <INT>: Minimum length in base pairs of the repeated motif. Default: 1. Set it to 2 to exclude mononucleotide repeats, to 3 to exclude mono- and dinucleotide repeats, and so on.
  • --max-unit-length | -M <INT>: Maximum length in base pairs of the repeated motif. Default: 6. Increasing this value allows longer repeat units (minisatellites) to be detected, at the cost of a more complex search pattern.
  • --min-unit-count <INT>: Minimum number of times the motif must be repeated. Default: 5. A value of 5 with a 2 bp unit requires at least 10 bp of pure repeat; combined with the default --min-length of 20 bp, a dinucleotide repeat must in practice span at least 10 units.
  • --min-length | -l <INT>: Minimum total length (in bp) of the repeat region. Default: 20. This filter applies after the unit-count filter and excludes repeats that pass the unit-count threshold but are very short (for example, a run of five identical bases).
  • --min-flank-length | -f <INT>: Minimum length of the flanking sequence on each side of the repeat. Default: 0. Sequences with flanks shorter than this threshold are discarded, which is useful when the output will feed a primer-design step.
  • --not-reoriented | -n : When set, sequences are never reverse-complemented to match the canonical orientation of the repeat unit. The microsatellite is reported as found, in its original orientation. Default: false (reorientation is active).
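The thresholds above can be read as a single keep/discard predicate. The sketch below applies it after the fact to the attributes of the example records; `MicrosatFilter`, `RECORDS` and `kept` are hypothetical names, and the real command applies these limits during detection (for instance, -M bounds the search pattern itself), so this only illustrates how the options combine. On these records it reproduces the selections shown in the Examples section.

```python
from dataclasses import dataclass


@dataclass
class MicrosatFilter:
    """Hypothetical mirror of the obimicrosat thresholds, with the documented defaults."""
    min_unit_length: int = 1    # --min-unit-length / -m
    max_unit_length: int = 6    # --max-unit-length / -M
    min_unit_count: int = 5     # --min-unit-count
    min_length: int = 20        # --min-length / -l
    min_flank_length: int = 0   # --min-flank-length / -f

    def keep(self, unit_length: int, unit_count: int, repeat_length: int,
             left_length: int, right_length: int) -> bool:
        return (self.min_unit_length <= unit_length <= self.max_unit_length
                and unit_count >= self.min_unit_count
                and repeat_length >= self.min_length
                and min(left_length, right_length) >= self.min_flank_length)


# (unit length, unit count, repeat length, left flank, right flank) for the
# example records, read off their attributes in the Examples section.
RECORDS = {
    "seq001": (2, 16, 32, 39, 38),
    "seq002": (1, 27, 27, 20, 23),
    "seq003": (3, 12, 36, 39, 38),
    "seq004": (4, 6, 24, 21, 23),
    "seq006": (2, 16, 32, 38, 39),
    "seq007": (2, 16, 32, 10, 9),
}


def kept(f: MicrosatFilter) -> list[str]:
    """Names of the example records that pass the filter."""
    return sorted(name for name, values in RECORDS.items() if f.keep(*values))


# kept(MicrosatFilter(min_flank_length=30)) -> ['seq001', 'seq003', 'seq006']
```

The same call with `min_unit_count=8, min_length=30` keeps seq001, seq003, seq006 and seq007, as in out_strict.fasta: seq002 fails the 30 bp length even though it has 27 units.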

Controlling the input data

OBITools4 usually detects the input file format automatically, as well as whether the file is GZIP-compressed. Some rare files can nevertheless be misidentified; the following options force the format and bypass the detection step.
The file format options
  • --fasta: indicates that sequence data is in fasta format.
  • --fastq: indicates that sequence data is in fastq format.
  • --embl: indicates that sequence data is in EMBL-ENA flatfile format.
  • --csv: indicates that sequence data is in CSV format.
  • --genbank: indicates that sequence data is in GenBank flatfile format.
  • --ecopcr: indicates that sequence data is in the old ecoPCR tabulated format.
Controlling how OBITools4 interprets annotations
These options only apply to the FASTA and FASTQ formats.
  • --input-OBI-header: FASTA/FASTQ title line annotations follow the old OBI format.
  • --input-json-header: FASTA/FASTQ title line annotations follow the JSON format.
Controlling quality score decoding #
This option only applies to the FASTQ format.
  • --solexa: decodes quality string according to the old Solexa specification. (default: the standard Sanger encoding is used, env: OBISSOLEXA)

Controlling the output data

  • --compress | -Z : output is compressed using gzip. (default: false)
  • --no-order: by default, OBITools preserve the order of the sequences between the input and the output, and multiple input files are processed one at a time. With --no-order, several input files can be opened at once and their contents processed in parallel. This usually increases processing speed, but the order of the sequences in the output is no longer guaranteed, and processing several files in parallel may require more memory.
  • --fasta-output: writes sequence data in fasta format (default if quality data is not available).
  • --fastq-output: writes sequence data in fastq format (default if quality data is available).
  • --json-output: writes sequence data in JSON format.
  • --out | -o <FILENAME>: filename used for saving the output (default: “-”, the standard output)
  • --output-OBI-header | -O : writes output FASTA/FASTQ title line annotations in OBI format (default: JSON).
  • --output-json-header: writes output FASTA/FASTQ title line annotations in JSON format (the default format).
  • --skip-empty: sequences of length equal to zero are removed from the output (default: false).
  • --no-progressbar: deactivates progress bar display (default: false).

General options

  • --help | -h|-? : shows this help.
  • --version: prints the version and exits.
  • --silent-warning: This option tells obitools to stop displaying warnings. This behaviour can be controlled by setting the OBIWARNINGS environment variable.
  • --max-cpu <INTEGER>: OBITools can take advantage of your computer’s multi-core architecture by parallelizing the computation across all available CPUs. Computing on more CPUs usually requires more memory to perform the computation. Reducing the number of CPUs used to perform a calculation is also a way to indirectly control the amount of memory used by the process. The number of CPUs used by OBITools can also be controlled by setting the OBIMAXCPU environment variable.
  • --force-one-cpu: forces the use of a single CPU core for parallel processing (default: false).
  • --batch-size <INTEGER>: minimum number of sequences per batch for parallel processing (floor, default: 1, env: OBIBATCHSIZE)
  • --batch-size-max <INTEGER>: maximum number of sequences per batch for parallel processing (ceiling, default: 2000, env: OBIBATCHSIZEMAX)
  • --batch-mem <STRING>: maximum memory per batch (e.g. 128K, 64M, 1G; default: 128M; set to 0 to disable, env: OBIBATCHMEM)
  • --debug: enables debug mode, by setting log level to debug (default: false, env: OBIDEBUG)
  • --pprof: enables pprof server. Look at the log for details. (default: false).
  • --pprof-mutex <INTEGER>: enables profiling of mutex locks. (default: 10, env: OBIPPROFMUTEX)
  • --pprof-goroutine <INTEGER>: enables profiling of goroutine blocking profile. (default: 6060, env: OBIPPROFGOROUTINE)

Examples

# Detect default microsatellites (unit 1–6 bp, ≥5 repeats, ≥20 bp total)
obimicrosat sequences.fasta > out_default.fasta
📄 out_default.fasta
>seq001 {"definition":"dinucleotide AC repeat 16x with 40bp non-repetitive flanks","microsat":"acacacacacacacacacacacacacacacac","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacgacacacacacacacacacaca
cacacacacaccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq002 {"definition":"mononucleotide A repeat 25x with 20bp non-repetitive flanks","microsat":"aaaaaaaaaaaaaaaaaaaaaaaaaaa","microsat_from":21,"microsat_left":"agtcgaacttgcatgccttc","microsat_right":"cgatagtcatgcaagtcttgcgg","microsat_to":47,"microsat_unit":"a","microsat_unit_count":27,"microsat_unit_length":1,"microsat_unit_normalized":"a","microsat_unit_orientation":"direct","seq_length":70}
agtcgaacttgcatgccttcaaaaaaaaaaaaaaaaaaaaaaaaaaacgatagtcatgca
agtcttgcgg
>seq003 {"definition":"trinucleotide AGC repeat 11x with 40bp non-repetitive flanks","microsat":"agcagcagcagcagcagcagcagcagcagcagcagc","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":75,"microsat_unit":"agc","microsat_unit_count":12,"microsat_unit_length":3,"microsat_unit_normalized":"agc","microsat_unit_orientation":"direct","seq_length":113}
agtcgaacttgcatgccttcagggcaagtctagcttacgagcagcagcagcagcagcagc
agcagcagcagcagccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq004 {"definition":"tetranucleotide ACGT repeat 6x with 20bp non-repetitive flanks","microsat":"acgtacgtacgtacgtacgtacgt","microsat_from":22,"microsat_left":"agtcgaacttgcatgccttca","microsat_right":"cgatagtcatgcaagtcttgcgg","microsat_to":45,"microsat_unit":"acgt","microsat_unit_count":6,"microsat_unit_length":4,"microsat_unit_normalized":"acgt","microsat_unit_orientation":"direct","seq_length":68}
agtcgaacttgcatgccttcaacgtacgtacgtacgtacgtacgtcgatagtcatgcaag
tcttgcgg
>seq006_cmp {"definition":"GT repeat 16x with 40bp non-repetitive flanks canonical form is AC","microsat":"acacacacacacacacacacacacacacacac","microsat_from":39,"microsat_left":"tggtaacgatctatgccgcaagacttgcatgactatcg","microsat_right":"cgtaagctagacttgccctgaaggcatgcaagttcgact","microsat_to":70,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"reverse","seq_length":109}
tggtaacgatctatgccgcaagacttgcatgactatcgacacacacacacacacacacac
acacacacaccgtaagctagacttgccctgaaggcatgcaagttcgact
>seq007 {"definition":"AC repeat 16x with only 10bp flanks insufficient for primer design","microsat":"acacacacacacacacacacacacacacacac","microsat_from":11,"microsat_left":"agtcgaactt","microsat_right":"cgatagtca","microsat_to":42,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":51}
agtcgaacttacacacacacacacacacacacacacacacaccgatagtca
# Restrict to di- and trinucleotide repeats only
obimicrosat -m 2 -M 3 sequences.fasta > out_dinucleotide.fasta
📄 out_dinucleotide.fasta
>seq001 {"definition":"dinucleotide AC repeat 16x with 40bp non-repetitive flanks","microsat":"acacacacacacacacacacacacacacacac","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacgacacacacacacacacacaca
cacacacacaccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq003 {"definition":"trinucleotide AGC repeat 11x with 40bp non-repetitive flanks","microsat":"agcagcagcagcagcagcagcagcagcagcagcagc","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":75,"microsat_unit":"agc","microsat_unit_count":12,"microsat_unit_length":3,"microsat_unit_normalized":"agc","microsat_unit_orientation":"direct","seq_length":113}
agtcgaacttgcatgccttcagggcaagtctagcttacgagcagcagcagcagcagcagc
agcagcagcagcagccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq006_cmp {"definition":"GT repeat 16x with 40bp non-repetitive flanks canonical form is AC","microsat":"acacacacacacacacacacacacacacacac","microsat_from":39,"microsat_left":"tggtaacgatctatgccgcaagacttgcatgactatcg","microsat_right":"cgtaagctagacttgccctgaaggcatgcaagttcgact","microsat_to":70,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"reverse","seq_length":109}
tggtaacgatctatgccgcaagacttgcatgactatcgacacacacacacacacacacac
acacacacaccgtaagctagacttgccctgaaggcatgcaagttcgact
>seq007 {"definition":"AC repeat 16x with only 10bp flanks insufficient for primer design","microsat":"acacacacacacacacacacacacacacacac","microsat_from":11,"microsat_left":"agtcgaactt","microsat_right":"cgatagtca","microsat_to":42,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":51}
agtcgaacttacacacacacacacacacacacacacacacaccgatagtca
# Require at least 30 bp flanking sequence on each side (for primer design)
obimicrosat -f 30 sequences.fasta > out_primer_ready.fasta
📄 out_primer_ready.fasta
>seq001 {"definition":"dinucleotide AC repeat 16x with 40bp non-repetitive flanks","microsat":"acacacacacacacacacacacacacacacac","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacgacacacacacacacacacaca
cacacacacaccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq003 {"definition":"trinucleotide AGC repeat 11x with 40bp non-repetitive flanks","microsat":"agcagcagcagcagcagcagcagcagcagcagcagc","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":75,"microsat_unit":"agc","microsat_unit_count":12,"microsat_unit_length":3,"microsat_unit_normalized":"agc","microsat_unit_orientation":"direct","seq_length":113}
agtcgaacttgcatgccttcagggcaagtctagcttacgagcagcagcagcagcagcagc
agcagcagcagcagccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq006_cmp {"definition":"GT repeat 16x with 40bp non-repetitive flanks canonical form is AC","microsat":"acacacacacacacacacacacacacacacac","microsat_from":39,"microsat_left":"tggtaacgatctatgccgcaagacttgcatgactatcg","microsat_right":"cgtaagctagacttgccctgaaggcatgcaagttcgact","microsat_to":70,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"reverse","seq_length":109}
tggtaacgatctatgccgcaagacttgcatgactatcgacacacacacacacacacacac
acacacacaccgtaagctagacttgccctgaaggcatgcaagttcgact
# Keep sequences in their original orientation (no reverse-complement)
obimicrosat --not-reoriented sequences.fasta > out_no_reorient.fasta
📄 out_no_reorient.fasta
>seq001 {"definition":"dinucleotide AC repeat 16x with 40bp non-repetitive flanks","microsat":"acacacacacacacacacacacacacacacac","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacgacacacacacacacacacaca
cacacacacaccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq002 {"definition":"mononucleotide A repeat 25x with 20bp non-repetitive flanks","microsat":"aaaaaaaaaaaaaaaaaaaaaaaaaaa","microsat_from":21,"microsat_left":"agtcgaacttgcatgccttc","microsat_right":"cgatagtcatgcaagtcttgcgg","microsat_to":47,"microsat_unit":"a","microsat_unit_count":27,"microsat_unit_length":1,"microsat_unit_normalized":"a","microsat_unit_orientation":"direct","seq_length":70}
agtcgaacttgcatgccttcaaaaaaaaaaaaaaaaaaaaaaaaaaacgatagtcatgca
agtcttgcgg
>seq003 {"definition":"trinucleotide AGC repeat 11x with 40bp non-repetitive flanks","microsat":"agcagcagcagcagcagcagcagcagcagcagcagc","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":75,"microsat_unit":"agc","microsat_unit_count":12,"microsat_unit_length":3,"microsat_unit_normalized":"agc","microsat_unit_orientation":"direct","seq_length":113}
agtcgaacttgcatgccttcagggcaagtctagcttacgagcagcagcagcagcagcagc
agcagcagcagcagccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq004 {"definition":"tetranucleotide ACGT repeat 6x with 20bp non-repetitive flanks","microsat":"acgtacgtacgtacgtacgtacgt","microsat_from":22,"microsat_left":"agtcgaacttgcatgccttca","microsat_right":"cgatagtcatgcaagtcttgcgg","microsat_to":45,"microsat_unit":"acgt","microsat_unit_count":6,"microsat_unit_length":4,"microsat_unit_normalized":"acgt","microsat_unit_orientation":"direct","seq_length":68}
agtcgaacttgcatgccttcaacgtacgtacgtacgtacgtacgtcgatagtcatgcaag
tcttgcgg
>seq006 {"definition":"GT repeat 16x with 40bp non-repetitive flanks canonical form is AC","microsat":"gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"gt","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"reverse","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacggtgtgtgtgtgtgtgtgtgtg
tgtgtgtgtgtcgatagtcatgcaagtcttgcggcatagatcgttacca
>seq007 {"definition":"AC repeat 16x with only 10bp flanks insufficient for primer design","microsat":"acacacacacacacacacacacacacacacac","microsat_from":11,"microsat_left":"agtcgaactt","microsat_right":"cgatagtca","microsat_to":42,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":51}
agtcgaacttacacacacacacacacacacacacacacacaccgatagtca
# Require at least 8 repeat units and a minimum repeat length of 30 bp
obimicrosat --min-unit-count 8 -l 30 sequences.fasta > out_strict.fasta
📄 out_strict.fasta
>seq001 {"definition":"dinucleotide AC repeat 16x with 40bp non-repetitive flanks","microsat":"acacacacacacacacacacacacacacacac","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":71,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":109}
agtcgaacttgcatgccttcagggcaagtctagcttacgacacacacacacacacacaca
cacacacacaccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq003 {"definition":"trinucleotide AGC repeat 11x with 40bp non-repetitive flanks","microsat":"agcagcagcagcagcagcagcagcagcagcagcagc","microsat_from":40,"microsat_left":"agtcgaacttgcatgccttcagggcaagtctagcttacg","microsat_right":"cgatagtcatgcaagtcttgcggcatagatcgttacca","microsat_to":75,"microsat_unit":"agc","microsat_unit_count":12,"microsat_unit_length":3,"microsat_unit_normalized":"agc","microsat_unit_orientation":"direct","seq_length":113}
agtcgaacttgcatgccttcagggcaagtctagcttacgagcagcagcagcagcagcagc
agcagcagcagcagccgatagtcatgcaagtcttgcggcatagatcgttacca
>seq006_cmp {"definition":"GT repeat 16x with 40bp non-repetitive flanks canonical form is AC","microsat":"acacacacacacacacacacacacacacacac","microsat_from":39,"microsat_left":"tggtaacgatctatgccgcaagacttgcatgactatcg","microsat_right":"cgtaagctagacttgccctgaaggcatgcaagttcgact","microsat_to":70,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"reverse","seq_length":109}
tggtaacgatctatgccgcaagacttgcatgactatcgacacacacacacacacacacac
acacacacaccgtaagctagacttgccctgaaggcatgcaagttcgact
>seq007 {"definition":"AC repeat 16x with only 10bp flanks insufficient for primer design","microsat":"acacacacacacacacacacacacacacacac","microsat_from":11,"microsat_left":"agtcgaactt","microsat_right":"cgatagtca","microsat_to":42,"microsat_unit":"ac","microsat_unit_count":16,"microsat_unit_length":2,"microsat_unit_normalized":"ac","microsat_unit_orientation":"direct","seq_length":51}
agtcgaacttacacacacacacacacacacacacacacacaccgatagtca
obimicrosat --help
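For downstream analysis, the JSON annotations in the title lines can be read back with a few lines of Python. A minimal sketch, assuming the default JSON title-line format shown in the outputs above (it does not handle --output-OBI-header output); `parse_title`, `read_records` and `count_loci_by_unit` are hypothetical helpers:

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def parse_title(line: str) -> tuple[str, dict]:
    """Split a '>id {json}' title line into the identifier and its annotations."""
    seq_id, _, annotations = line[1:].strip().partition(" ")
    return seq_id, (json.loads(annotations) if annotations else {})


def read_records(handle: TextIO) -> Iterator[tuple[str, dict, str]]:
    """Yield (id, annotations, sequence) per record, joining wrapped sequence lines."""
    seq_id, attrs, chunks = None, {}, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, attrs, "".join(chunks)
            seq_id, attrs = parse_title(line)
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, attrs, "".join(chunks)


def count_loci_by_unit(handle: TextIO) -> Counter:
    """Number of records per normalized repeat unit."""
    return Counter(a["microsat_unit_normalized"] for _, a, _ in read_records(handle))


if __name__ == "__main__":
    demo = io.StringIO('>seq001 {"microsat_unit_normalized":"ac"}\nagtc\nacac\n')
    print(list(read_records(demo)))
```

Opened on out_default.fasta, `count_loci_by_unit` should report three records for `ac` (seq001, seq006_cmp and seq007) and one each for `a`, `agc` and `acgt`, since the normalized unit groups the GT repeat of seq006 with the AC repeats.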