The CSV format

The CSV (Comma-Separated Values) flat file format

CSV is a simple text format that is widely used to store and exchange tabular data, for example between spreadsheets and databases. As its name indicates, the values in a row are separated by commas.

Each line of the file corresponds to a record, and every record has the same number of fields. The first row of the file is a header row that contains the field names. A field can itself contain the field delimiter, the comma, provided that the field is enclosed in double quotation marks (").
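
The quoting rule can be checked with Python's standard csv module. The following is a minimal sketch, assuming a small hypothetical table: the column names id, definition, and count, and the table layout itself, are chosen for illustration only and are not produced by any tool described here.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
# The definition of the first record contains a comma, so that field
# is enclosed in double quotation marks.
text = (
    "id,definition,count\n"
    'AB061527,"Sorex unguiculatus mitochondrial NA, complete genome.",1\n'
    "AL355887,Homo sapiens chromosome 14,2\n"
)

# DictReader uses the header row to name the fields of each record.
rows = list(csv.DictReader(io.StringIO(text)))

for row in rows:
    print(row["id"], "|", row["definition"], "|", row["count"])
```

The quoted field is read back as a single value, comma included, while the unquoted fields are split at every comma. Note that the csv module returns all values as strings, so numeric fields such as count must be converted explicitly.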

Here is an example FASTA file containing two sequences:

📄 two_sequences.fasta
>AB061527 {"count":1,"definition":"Sorex unguiculatus mitochondrial NA, complete genome.","family_name":"Soricidae","family_taxid":9376,"genus_name":"Sorex","genus_taxid":9379,"obicleandb_level":"family","obicleandb_trusted":2.2137847111025621e-13,"species_name":"Sorex unguiculatus","species_taxid":62275,"taxid":62275}
ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>AL355887 {"count":2,"definition":"Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.","family_name":"Hominidae","family_taxid":9604,"genus_name":"Homo","genus_taxid":9605,"obicleandb_level":"genus","obicleandb_trusted":0,"species_name":"Homo sapiens","species_taxid":9606,"taxid":9606}
ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac
agcttaaaactcaaaggacctggcagttctttatatccct

The following command counts the records in that file and writes the result in CSV format:

obicount two_sequences.fasta
entities,n
variants,2
reads,3
symbols,200

The same output, rendered as a table with csvlook:

obicount two_sequences.fasta | csvlook
| entities |   n |
| -------- | --- |
| variants |   2 |
| reads    |   3 |
| symbols  | 200 |

Because obicount writes its result as CSV, that result can be plotted directly with uplot:

obicount two_sequences.fasta \
   | uplot barplot -d, -H --xscale log10
                                n
            ┌                                        ┐ 
   variants ┤■■■■ 2.0                                  
      reads ┤■■■■■■■ 3.0                               
    symbols ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 200.0   
            └                                        ┘ 
                             [log10]
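
Beyond command-line tools, the same output can be read by any CSV-aware library. The following is a minimal sketch in Python, assuming the obicount output shown earlier has been captured as text; the variable names obicount_output and counts are illustrative only.

```python
import csv
import io

# Output of `obicount two_sequences.fasta`, copied from the example above.
obicount_output = """entities,n
variants,2
reads,3
symbols,200
"""

# The header row (entities,n) provides the field names of each record.
reader = csv.DictReader(io.StringIO(obicount_output))

# Build a mapping from entity name to its count, converted to an integer.
counts = {row["entities"]: int(row["n"]) for row in reader}

print(counts)
```

In a real pipeline the text would more likely come from standard input than from a literal string, for instance by passing sys.stdin to csv.DictReader in place of the io.StringIO object.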