Prior to assembly, Illumina sequences were quality filtered by adaptive window trimming with a quality threshold of 30, using the script Trim.pl (http://wiki.bioinformatics.ucdavis.edu/index.php/Trim.pl). All reads were screened for adaptor, barcode, primer, and transposon sequences and trimmed as needed using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html). De novo genome assembly was performed with several assembly programs. Sequences generated on the 454 platform were first assembled with Roche’s GS De Novo Assembler v 2.6 ("Newbler") 2 using default parameters. De novo assemblies of 454 reads were also performed with mira 3 using the default settings for normal-quality de novo genome assembly. De novo assembly of subsets of Illumina reads was performed with velvet 4, using an estimated coverage of 1000x, a k-mer size of 21, and a coverage cutoff of 5. Large contigs from Newbler, mira, and velvet were consolidated using Geneious Pro v 5.6.6 (Biomatters, Ltd, http://www.geneious.com) and aligned with progressiveMauve 5 to visualize the relationships among large contigs from the different assemblies and to identify gaps to close. Primers were designed at the ends of contigs using either Geneious Pro or CLC Genomics Workbench v 5.1 (CLCbio, http://www.clcbio.com) to amplify the gaps between contigs. Positive PCR products linking contigs were cleaned with a MinElute PCR Purification Kit (Qiagen) and Sanger sequenced. A nearly complete draft genome from strain JS085 served as the reference genome for the remaining five strains. Both Illumina and 454 reads were mapped to the reference genome with CLC Genomics Workbench. Unmapped reads were then assembled de novo to ensure that novel genomic content in the mapped strains was not overlooked. De novo assembly of 454 and/or Illumina reads for each strain was also performed in CLC Genomics Workbench and compared with the mapped assemblies using progressiveMauve.
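The quality-filtering step can be illustrated with a generic sliding-window trimmer. This is a minimal sketch, not the Trim.pl algorithm: the window length of 5, the Phred+33 quality encoding, and the rule of cutting at the first sub-threshold base inside the first failing window are assumptions chosen for illustration. Only the threshold of 30 comes from the text above.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; offset 33 (Sanger / Illumina 1.8+) is assumed."""
    return [ord(ch) - offset for ch in qual]


def window_trim(seq: str, qual: str, threshold: int = 30,
                window: int = 5, offset: int = 33) -> Tuple[str, str]:
    """Trim the 3' end of a read at the first low-quality window (illustrative rule).

    Windows of `window` bases slide from the 5' end. At the first window whose
    mean Phred score is below `threshold`, the read is cut just before the first
    base in that window scoring below `threshold`.
    """
    scores = phred_scores(qual, offset)
    if not scores:
        return seq, qual
    w = min(window, len(scores))  # shrink the window for very short reads
    for start in range(len(scores) - w + 1):
        win = scores[start:start + w]
        if sum(win) / w < threshold:
            # A window with mean < threshold contains at least one base < threshold.
            cut = start + next(i for i, q in enumerate(win) if q < threshold)
            return seq[:cut], qual[:cut]
    return seq, qual


if __name__ == "__main__":
    # Eight Q40 bases ("I") followed by two Q2 bases ("#").
    print(window_trim("ACGTACGTAC", "IIIIIIII##"))  # ('ACGTACGT', 'IIIIIIII')
```

Cutting at the first low base inside the failing window, rather than at the window start, keeps high-quality bases that happen to share a window with the degraded 3' tail.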
Four of the strains were sequenced with both 454 and Illumina, and two strains were sequenced with Illumina only. The coverage depth of quality-filtered reads ranged from 22X to 50X for 454 and reached up to 3618X for Illumina. Lebetimonas strain JS085 had the highest 454 coverage; its 454 reads were assembled into 33 large contigs with Newbler and 1747 contigs with mira. The 20 largest contigs from each of these assemblies were consolidated into 10 contigs by de novo assembly in Geneious. A further round of assembly in Geneious, combining the 10 consolidated contigs with velvet contigs longer than 10 kbp, reduced the draft genome to 6 contigs. Primers were designed for all possible combinations among the 6 contigs. One gap was closed using Sanger-sequenced positive PCR products. Finally, all 454 and Illumina reads for strain JS085 were mapped to the 5-contig draft genome, and the consensus sequence was used as the final draft genome. The five remaining genomes were assembled by mapping 454 and Illumina reads to the JS085 reference genome in CLC Genomics Workbench. Hybrid de novo assemblies of each strain in CLC Genomics Workbench did not extend contigs or close any of the gaps between the 5 contigs of the draft genomes. Assemblies of unmapped reads produced only short contigs, with no significant similarity detected by nucleotide BLAST 6.
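The number of PCR reactions implied by testing all possible combinations among contigs can be sketched by enumerating contig ends. This assumes, hypothetically, that one outward-facing primer sits at each end of each contig and that contig order and orientation are unknown; the end labels "L" and "R" and the function names are illustrative. Under those assumptions, n contigs give 2n(n - 1) candidate end pairings, i.e. 60 for the 6 JS085 contigs.

```python
from itertools import combinations
from typing import List, Tuple

# A contig end is identified by (contig number, side); "L"/"R" are hypothetical labels.
End = Tuple[int, str]


def contig_ends(n_contigs: int) -> List[End]:
    """List both ends of every contig, numbered 1..n_contigs."""
    return [(c, side) for c in range(1, n_contigs + 1) for side in ("L", "R")]


def gap_primer_pairs(n_contigs: int) -> List[Tuple[End, End]]:
    """Pair every contig end with every end of a different contig.

    Each pair is one candidate PCR reaction that would amplify across a gap
    if those two ends are adjacent in the genome.
    """
    return [
        (a, b)
        for a, b in combinations(contig_ends(n_contigs), 2)
        if a[0] != b[0]  # skip pairing the two ends of the same contig
    ]


if __name__ == "__main__":
    pairs = gap_primer_pairs(6)
    print(len(pairs))  # 2 * 6 * (6 - 1) = 60 candidate reactions
```

The count grows quadratically with contig number, which is one reason to reduce the assembly to as few contigs as possible before attempting PCR-based gap closure.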
BCO-DMO Processing:
- modified parameter names to conform with BCO-DMO naming conventions;
- added hyperlinks;
- removed "m" (meters) from depth column.