The CSV (Comma-Separated Values) flat file format
The CSV format is a simple text format widely used to store and exchange tabular data, such as tables exported from spreadsheets or databases. As its name says, the values of a row are separated by commas.
Each line of the file is a record, and all records have the same number of fields. Usually, the first line is a header row that contains the field names. A field can itself contain the delimiter (the comma) if the whole field is enclosed in double quotation marks ("). A double quote inside such a quoted field is written twice ("").
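The quoting rules can be observed with Python's standard `csv` module. It is used here only for illustration; it is not part of OBITools. The records are hypothetical: the first `definition` value contains a comma, and the second contains double quotes.

```python
import csv
import io

# Hypothetical records: a header row, then one field with a comma
# and one field with embedded double quotes.
rows = [
    ["id", "definition"],
    ["seq1", "Homo sapiens, chromosome 14"],
    ["seq2", 'a "quoted" word'],
]

# Write the rows as CSV text. With the default quoting (QUOTE_MINIMAL),
# only fields that need it, such as those containing the delimiter or
# the quote character, are enclosed in double quotes.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text, end="")

# Read the text back: DictReader takes the field names from the header row
# and removes the quoting.
records = list(csv.DictReader(io.StringIO(text)))
print(records[0]["definition"])
```

The printed CSV shows `seq1,"Homo sapiens, chromosome 14"` and `seq2,"a ""quoted"" word"`. Reading it back restores the original values.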
Here is an example FASTA file containing two sequences:
📄 two_sequences.fasta
>AB061527 {"count":1,"definition":"Sorex unguiculatus mitochondrial NA, complete genome.","family_name":"Soricidae","family_taxid":9376,"genus_name":"Sorex","genus_taxid":9379,"obicleandb_level":"family","obicleandb_trusted":2.2137847111025621e-13,"species_name":"Sorex unguiculatus","species_taxid":62275,"taxid":62275}
ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>AL355887 {"count":2,"definition":"Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.","family_name":"Hominidae","family_taxid":9604,"genus_name":"Homo","genus_taxid":9605,"obicleandb_level":"genus","obicleandb_trusted":0,"species_name":"Homo sapiens","species_taxid":9606,"taxid":9606}
ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac
agcttaaaactcaaaggacctggcagttctttatatccct
The following command counts the records and writes the result to standard output in CSV format:
obicount two_sequences.fasta
entities,n
variants,2
reads,3
symbols,200
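As a cross-check, the three values can be recomputed from the FASTA file. The Python sketch below assumes the following definitions: `variants` is the number of records, `reads` is the sum of their `count` annotations, and `symbols` is the total number of nucleotides. It is not the OBITools implementation. The `read_fasta` helper is hypothetical. The file is abbreviated so that its JSON headers keep only the `count` annotation.

```python
import csv
import io
import json

# Abbreviated copy of two_sequences.fasta: only the "count" annotation is kept.
FASTA = """>AB061527 {"count":1}
ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>AL355887 {"count":2}
ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac
agcttaaaactcaaaggacctggcagttctttatatccct
"""

def read_fasta(text):
    """Hypothetical helper: yield (identifier, annotations, sequence) per record.

    The header is assumed to be an identifier optionally followed by a space
    and a JSON object holding the annotations.
    """
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header + ("".join(chunks),)
            ident, _, annot = line[1:].partition(" ")
            header, chunks = (ident, json.loads(annot) if annot else {}), []
        elif line:
            chunks.append(line.strip())  # sequences may span several lines
    if header is not None:
        yield header + ("".join(chunks),)

records = list(read_fasta(FASTA))
variants = len(records)
# Assumption: a record without a "count" annotation stands for one read.
reads = sum(annot.get("count", 1) for _, annot, _ in records)
symbols = sum(len(seq) for _, _, seq in records)

# Write the result in the same two-column CSV layout as obicount.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["entities", "n"])
writer.writerows([["variants", variants], ["reads", reads], ["symbols", symbols]])
print(out.getvalue(), end="")
```

With these definitions, `reads` is 1 + 2 = 3 and `symbols` is 100 + 100 = 200 (each sequence is 60 + 40 nucleotides long). These values match the obicount output.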
The csvlook command from csvkit renders the same CSV output as a more readable table:
obicount two_sequences.fasta | csvlook
| entities | n |
| -------- | --- |
| variants | 2 |
| reads | 3 |
| symbols | 200 |
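Because obicount writes CSV, its result can be piped directly into uplot (the command of the YouPlot tool) to draw a plot in the terminal. In the bar plot below, -d, sets the field delimiter to a comma, -H declares that the first line is a header, and --xscale log10 draws the x axis on a logarithmic scale: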
obicount two_sequences.fasta \
| uplot barplot -d, -H --xscale log10
n
┌ ┐
variants ┤■■■■ 2.0
reads ┤■■■■■■■ 3.0
symbols ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 200.0
└ ┘
[log10]
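The logarithmic scale keeps values as different as 2 and 200 readable on the same plot. The sketch below computes bar lengths proportional to log10(n). It assumes, hypothetically, that the longest bar is 33 characters wide. It then gives 4 and 7 characters for `variants` and `reads`, the same as in the plot above. This is only an illustration of log scaling, not necessarily how uplot computes its bars.

```python
import math

# Values reported by obicount in the example above
counts = {"variants": 2, "reads": 3, "symbols": 200}

# Hypothetical number of characters given to the longest bar
WIDTH = 33

def log_bar_lengths(values, width):
    """Bar lengths proportional to log10(value), the longest bar being `width` long.

    Assumes every value is at least 1 and the largest one is greater than 1
    (log10(1) = 0 would give an empty bar or a division by zero).
    """
    top = math.log10(max(values.values()))
    return {name: round(width * math.log10(v) / top) for name, v in values.items()}

lengths = log_bar_lengths(counts, WIDTH)
for name, n in counts.items():
    print(f"{name:>8} ┤{'■' * lengths[name]} {n:.1f}")
```

With a linear scale and the same width, the `variants` bar would be 33 × 2 / 200 ≈ 0.3 characters long and would not be visible.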