Different strategies were applied to assemble the reads of the individual cells, and later to combine single cells Dsc #2 and #3, in order to make the most of the sequencing data. Assembly statistics were checked with Assemblathon. Because good assembly statistics do not by themselves guarantee that an assembly is optimal, all assemblies were also run through the RAST pipeline to check for misassemblies. The SAGs were assembled with the following strategies:
A. CLC bio
B. SPAdes 2.3
C. SPAdes-n
D. Velvet-SC, k-mer = 37
E. Velvet-SC-n
F. Celera Assembler (CA)
G. Hybrid error correction using CA-assembled Illumina® data to correct long PacBio® reads
H. Velvet assembly using Euler correction, k-mer = 55
I. SPAdes assembly of Illumina® reads only, combined via PCAP with a CA assembly of PacBio® reads corrected with PacBio® reads only
J. Velvet-SC assembly of Illumina® reads only, combined via PCAP with a CA assembly of PacBio® reads corrected with PacBio® reads only
K. Velvet-SC assembly of Illumina® reads only, combined via PCAP with a CA assembly of PacBio® reads corrected with Illumina® reads only
L. SPAdes assembly of Illumina® reads only, combined via PCAP with a CA assembly of PacBio® reads corrected with Illumina® reads only
n = normalization of the Illumina® reads
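Assemblathon-style reports summarize each assembly with contiguity metrics such as the number of contigs, the total assembled length and N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total). The following is a minimal sketch of how these metrics are computed from a FASTA file of contigs. It is not the Assemblathon code itself, and the function names `read_fasta_lengths` and `assembly_stats` are hypothetical.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines.

    Sequences may be wrapped over several lines; blank lines are ignored.
    """
    lengths: List[int] = []
    current = 0
    in_record = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if in_record:
                lengths.append(current)
            current = 0
            in_record = True
        else:
            current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Contig count, total length, longest contig and N50 of an assembly."""
    if not lengths:
        return {"contigs": 0, "total_length": 0, "max_length": 0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # N50: first contig at which the cumulative length reaches half the total.
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_length": total,
        "max_length": ordered[0],
        "n50": n50,
    }
```

For example, three contigs of 10, 5 and 3 bp give a total of 18 bp, and the N50 is 10 bp because the longest contig alone already covers half of the assembly. Comparing these numbers across strategies A to L is only a first filter; as noted above, the assemblies still need to be checked for misassemblies.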
Single cells 2 and 3 were co-assembled because their individual assemblies showed nearly 100% identity at the nucleotide level. At this stage, the assembly of single cell 1 (Dsc1) comprised 0.32 Mb in 126 contigs, and the co-assembly of single cells 2 and 3 (DscP2) comprised 1.38 Mb in 327 contigs.
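The text does not state how the nucleotide identity between the individual assemblies was measured. Under the assumption that it was derived from a pairwise alignment of the two assemblies, one common convention is the number of identical aligned columns divided by the alignment length, with gap columns counted in the denominator. The minimal sketch below uses that convention. The function `percent_identity` and the gap character `-` are illustrative assumptions, not the authors' procedure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Assumed convention: identical non-gap columns / total alignment columns.
    In a pairwise alignment, no column should be a gap in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # Compare case-insensitively so soft-masked (lowercase) bases still match.
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)
```

For instance, `percent_identity("ACGT-ACGTA", "ACGTTACGTA")` has 9 identical columns out of 10 and returns 90.0. Other conventions, such as excluding gap columns from the denominator, yield slightly higher values for the same alignment.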
In 2013, the assembled contigs were submitted to the Integrated Microbial Genomes (IMG, version 4.1) database annotation pipeline and to the Rapid Annotations using Subsystems Technology (RAST, version 4.0) pipeline. Some computationally assigned annotations were corrected manually after inspecting the supporting evidence, orthologs in related genomes, and gene neighborhoods. Pathways were predicted with RAST, IMG and KEGG (Kyoto Encyclopedia of Genes and Genomes). Nucleotide and amino-acid sequences of genes were used as BLAST queries against the NCBI databases.