JSON format

The JSON sequence file format

To facilitate data exchange between systems, and to make the data easy to parse in any programming language, OBITools can export sequence data in JSON format. To request JSON output, add the --json-output option to the command line.

Converting FASTA to JSON format

Here is an example of two sequences in FASTA format:

📄 json_example.fasta
>AB061527 {"count":1,"definition":"Sorex unguiculatus mitochondrial NA, complete genome.","family_name":"Soricidae","family_taxid":9376,"genus_name":"Sorex","genus_taxid":9379,"obicleandb_level":"family","obicleandb_trusted":2.2137847111025621e-13,"species_name":"Sorex unguiculatus","species_taxid":62275,"taxid":62275}
ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>AL355887 {"count":2,"definition":"Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.","family_name":"Hominidae","family_taxid":9604,"genus_name":"Homo","genus_taxid":9605,"obicleandb_level":"genus","obicleandb_trusted":0,"species_name":"Homo sapiens","species_taxid":9606,"taxid":9606}
ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac
agcttaaaactcaaaggacctggcagttctttatatccct

This file can be converted to JSON with the obiconvert command:

obiconvert --json-output json_example.fasta > json_example.json

which produces the following JSON output:

📄 json_example.json
[
  {
    "annotations": {
      "count": 1,
      "definition": "Sorex unguiculatus mitochondrial NA, complete genome.",
      "family_name": "Soricidae",
      "family_taxid": "9376",
      "genus_name": "Sorex",
      "genus_taxid": "9379",
      "obicleandb_level": "family",
      "obicleandb_trusted": 2.2137847111025621e-13,
      "species_name": "Sorex unguiculatus",
      "species_taxid": "62275",
      "taxid": "62275"
    },
    "id": "AB061527",
    "sequence": "ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaatagcttaaaactcaaaggacttggcggtgctttatatccct"
  },
  {
    "annotations": {
      "count": 2,
      "definition": "Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.",
      "family_name": "Hominidae",
      "family_taxid": "9604",
      "genus_name": "Homo",
      "genus_taxid": "9605",
      "obicleandb_level": "genus",
      "obicleandb_trusted": 0,
      "species_name": "Homo sapiens",
      "species_taxid": "9606",
      "taxid": "9606"
    },
    "id": "AL355887",
    "sequence": "ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaacagcttaaaactcaaaggacctggcagttctttatatccct"
  }
]
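
As an illustration of how this output can be consumed, here is a minimal Python sketch. It assumes only the layout shown above: a top-level array of objects, each with an `id`, a `sequence`, and an `annotations` object. The `summarize` helper is hypothetical and not part of OBITools, and the embedded records are abbreviated copies of the example.

```python
import json

# Abbreviated copy of the example output (sequences and annotations shortened).
JSON_TEXT = """
[
  {"annotations": {"count": 1, "family_name": "Soricidae", "taxid": "62275"},
   "id": "AB061527",
   "sequence": "ttagccctaaacttaggtatttaatc"},
  {"annotations": {"count": 2, "family_name": "Hominidae", "taxid": "9606"},
   "id": "AL355887",
   "sequence": "ttagccctaaactctagtagttacat"}
]
"""


def summarize(records):
    """Return a list of (id, sequence length, count) tuples, one per record."""
    summary = []
    for record in records:
        annotations = record.get("annotations", {})
        # Treating a missing "count" as 1 is an assumption made for this sketch.
        count = annotations.get("count", 1)
        summary.append((record["id"], len(record["sequence"]), count))
    return summary


records = json.loads(JSON_TEXT)
rows = summarize(records)
for seq_id, length, count in rows:
    print(f"{seq_id}\tlength={length}\tcount={count}")
```

To work on a real file, the `json.loads(JSON_TEXT)` call can be replaced by `json.load(fh)` on a file handle opened on json_example.json.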

Converting FASTQ to JSON format

To convert a FASTQ file such as the following example:

📄 json_example.fastq
@AB061527 {"count":1,"definition":"Sorex unguiculatus mitochondrial NA, complete genome.","family_name":"Soricidae","family_taxid":9376,"genus_name":"Sorex","genus_taxid":9379,"obicleandb_level":"family","species_name":"Sorex unguiculatus","species_taxid":62275,"taxid":62275}
ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat
+
BBFFBFFHHHHHFFFFFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@AL355887 {"count":2,"definition":"Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.","family_name":"Hominidae","family_taxid":9604,"genus_name":"Homo","genus_taxid":9605,"obicleandb_level":"genus","species_name":"Homo sapiens","species_taxid":9606,"taxid":9606}
ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac
+
@BBFFBFFHHHGGGFFFFFHHHJJJHHHHHHHHHJHHHHHHHHHHHHHHHHHHHHHHHHH

use the equivalent command:

obiconvert --json-output json_example.fastq > json_example_qual.json

which produces the following JSON output:

📄 json_example_qual.json
[
  {
    "annotations": {
      "count": 1,
      "definition": "Sorex unguiculatus mitochondrial NA, complete genome.",
      "family_name": "Soricidae",
      "family_taxid": "9376",
      "genus_name": "Sorex",
      "genus_taxid": "9379",
      "obicleandb_level": "family",
      "species_name": "Sorex unguiculatus",
      "species_taxid": "62275",
      "taxid": "62275"
    },
    "id": "AB061527",
    "qualities": "BBFFBFFHHHHHFFFFFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH",
    "sequence": "ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat"
  },
  {
    "annotations": {
      "count": 2,
      "definition": "Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.",
      "family_name": "Hominidae",
      "family_taxid": "9604",
      "genus_name": "Homo",
      "genus_taxid": "9605",
      "obicleandb_level": "genus",
      "species_name": "Homo sapiens",
      "species_taxid": "9606",
      "taxid": "9606"
    },
    "id": "AL355887",
    "qualities": "@BBFFBFFHHHGGGFFFFFHHHJJJHHHHHHHHHJHHHHHHHHHHHHHHHHHHHHHHHHH",
    "sequence": "ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac"
  }
]
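
The `qualities` field is a character string with one character per base. The sketch below shows one way to turn it into numeric scores. It assumes the common Phred+33 ASCII encoding, which this page does not state explicitly; the helper names `phred_scores` and `mean_error_probability` are hypothetical, and the record is truncated to the first ten positions of the first sequence above.

```python
def phred_scores(qualities, offset=33):
    """Convert an ASCII-encoded quality string to a list of integer Phred scores.

    The default offset of 33 is an assumption (Sanger / Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in qualities]


def mean_error_probability(scores):
    """Average per-base error probability, using p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


# First ten positions of record AB061527 from the example output.
record = {
    "id": "AB061527",
    "qualities": "BBFFBFFHHH",
    "sequence": "ttagccctaa",
}

scores = phred_scores(record["qualities"])
print(record["id"], scores)
print("mean error probability:", mean_error_probability(scores))
```

Under this assumption, `B`, `F` and `H` decode to scores of 33, 37 and 39 respectively. If the data used a different offset, only the `offset` argument would need to change.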