Methodology from Fullerton and Moyer (2016). See paper for references cited below.
Sample collection. Subsurface sediments were collected on IODP expedition 331 (Deep Hot Biosphere) from 1 September through 4 October 2010 (Fig. 1). Onboard contamination testing of sites C0015 (126°53′E, 27°47′N; hole B; section 1H-5; 5.6 m bsf) and C0017 (126°55′E, 27°47′N; hole C; section 1H-7; 26.6 m bsf) found no indication of interior-core contamination using fluorescent microspheres (both holes C0015B and C0017C) and perfluorocarbon tracer (hole C0017C only). The sample from hole C0017C was also verified by PCR-generated phylotype comparisons based on 97% similarity to phylotypes obtained from drilling mud at a contamination level of 1% or less (26). Subsamples were aseptically collected from the interiors of whole-round cores and stored in cryovials with 27% (vol/vol) glycerol at -80°C.
Single-cell source. Core depths were chosen from sites C0015 and C0017, which were characterized as weakly oxidized pumiceous gravels with no detected sulfide mineralization and less than 0.1 wt% total organic carbon, total nitrogen, and total sulfur (25). The selected samples for single-cell genomes were from subsurface depths of 5.6 m bsf from hole C0015B and from 26.6 m bsf from hole C0017C. Temperatures were estimated at ~10.5°C for C0015B and ~8.1°C for C0017C at these depths. Details of geochemistry and lithography have been previously described (12, 24, 25).
Single-cell sorting, amplification, sequencing, and annotation. Samples from sites C0015 and C0017 (Fig. 1) were diluted with 1 ml of filter-sterilized artificial seawater (27), making a slurry, and then passed through a 90-μm nylon mesh filter twice and centrifuged at ~500 × g for 2 min to produce a particle-free cell suspension. The suspension was then processed using fluorescence-activated single-cell sorting at the Single Cell Genomics Center (SCGC) at Bigelow Laboratory for Ocean Sciences. Single-cell sorting and multiple displacement amplification (MDA) have been previously described (28). The amplified SSU rRNA gene sequences (27F/907R) were classified using the Ribosomal Database Project (RDP) online classifier (28, 29). Based on their SSU rRNA gene identities, nine Chloroflexi SAGs (of the total 29 unique MDA reactions identified after cell sorting) were chosen for whole-genome sequencing. These SAGs were sequenced and assembled, and contamination was checked by the SCGC, using previously well-described parameters (28, 29). Assembly was done using SPAdes v.3.0.0 (30). All contigs were compared against one another and against the NCBI nt database to check for cross-contamination among SAGs, followed by tetramer principal-component analysis as previously described (31-33). These analyses revealed no contamination. The full name for each of the SAGs was shortened, e.g., Anaerolineales bacterium SCGC AC-711-B22 was shortened to An-B22. Phylogeny was abbreviated as follows: Anaerolineales to An, Dehalococcoidales to De, and Thermoflexales to Th. The assembled genomes were annotated using RAST (34). Gene annotations were compared to NCBI GenBank via BLASTn, and the results can be found in Tables S2 to S4 in the supplemental material.
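The SAG naming convention described above can be illustrated with a minimal sketch. The function name and the parsing rule (first word is the order, and the text after the last hyphen of the final token is the well identifier) are assumptions inferred from the single example given in the text, not a description of how the authors generated the names.

```python
# Order abbreviations as listed in the text.
ORDER_ABBREVIATIONS = {
    "Anaerolineales": "An",
    "Dehalococcoidales": "De",
    "Thermoflexales": "Th",
}


def shorten_sag_name(full_name: str) -> str:
    """Shorten a full SAG name to '<order abbreviation>-<well identifier>'.

    Assumes the layout seen in the text's one example,
    'Anaerolineales bacterium SCGC AC-711-B22' (hypothetical parsing rule).
    """
    parts = full_name.split()
    order = parts[0]                          # e.g. 'Anaerolineales'
    well = parts[-1].rsplit("-", 1)[-1]       # e.g. 'B22' from 'AC-711-B22'
    return f"{ORDER_ABBREVIATIONS[order]}-{well}"


print(shorten_sag_name("Anaerolineales bacterium SCGC AC-711-B22"))  # An-B22
```

A name whose order is not one of the three listed raises a KeyError, which is a deliberate choice in this sketch so that unexpected taxa are not silently mislabeled.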
The Anaerolineales SAGs were compared to the genome of Anaerolinea thermophila UNI-1 (GenBank accession number NC_014960) and the single Thermoflexales SAG to that of Thermoflexus hugenholtzii JAD2 (NCBI BioProject PRJNA195829), as they were determined to be their closest respective relatives. The type strain A. thermophila UNI-1 was isolated from an anaerobic granular sludge reactor treating fried soybean curd manufacturing wastewater in Japan (35), while the type strain T. hugenholtzii JAD2 was isolated from the sediment of Great Boiling Spring in Nevada (36). Both are considered thermophilic, Gram-negative, non-spore-forming, heterotrophic bacteria that grow in multicellular filaments (36, 37).
Phylogenetic analysis. SSU rRNA gene sequences and phylogenetic relatives were aligned using the Silva SINA aligner (38). For the rdhA analysis, amino acids were aligned using ClustalW within Geneious (39, 40). The resulting alignments were manually screened and then used to create a phylogenetic consensus tree using MrBayes within Geneious (41). Parameters included using the HKY85 substitution model, the chain length set at 1,100,000, and a subsampling frequency of 200. Priors were set with an unconstrained branch length. The average nucleotide identity (ANI) was calculated for the SAGs and selected genomes, with the BLAST parameters as previously described (42).
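As a quick check on the MCMC settings given above, the chain length and subsampling frequency together fix how many states are written out. The sketch below is plain arithmetic. It ignores burn-in, which the text does not specify, and it does not count the starting state, which some implementations also record. The function name is hypothetical.

```python
CHAIN_LENGTH = 1_100_000   # chain length from the text
SUBSAMPLE_FREQ = 200       # subsampling frequency from the text


def n_sampled_states(chain_length: int, subsample_freq: int) -> int:
    """Number of states sampled when every `subsample_freq`-th generation is kept.

    Burn-in and the initial state are not accounted for (assumption).
    """
    return chain_length // subsample_freq


print(n_sampled_states(CHAIN_LENGTH, SUBSAMPLE_FREQ))  # 5500
```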
Genome completeness estimates. Genome completeness estimates were determined with BLASTP using predicted amino acid sequences against a set of single-copy core genes (43). To be considered valid, all proteins must have at least 30% identity over at least 30% of the length of the core gene (44). The core gene group is made up of 66 previously established genes belonging to a nonredundant list as examined by gene ontology (GO) annotations (44, 45).
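The completeness criterion above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the function names, and the gene labels in the example are hypothetical, and coverage is taken to mean aligned length divided by core-gene length. Only the two 30% thresholds and the core set size of 66 come from the text.

```python
from typing import Iterable, NamedTuple

N_CORE_GENES = 66     # size of the core gene set, from the text
MIN_IDENTITY = 30.0   # minimum percent identity, from the text
MIN_COVERAGE = 30.0   # minimum percent of core-gene length aligned, from the text


class Hit(NamedTuple):
    """One BLASTP hit of a predicted protein against a core gene (hypothetical record)."""
    core_gene: str
    percent_identity: float
    aligned_length: int
    core_gene_length: int


def is_valid(hit: Hit) -> bool:
    """Apply the identity and coverage thresholds to a single hit."""
    coverage = 100.0 * hit.aligned_length / hit.core_gene_length
    return hit.percent_identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def completeness(hits: Iterable[Hit], n_core: int = N_CORE_GENES) -> float:
    """Percent of core genes with at least one valid hit."""
    found = {h.core_gene for h in hits if is_valid(h)}
    return 100.0 * len(found) / n_core


# Hypothetical hits: geneA and geneB pass, geneC fails the coverage threshold.
example_hits = [
    Hit("geneA", 45.0, 200, 300),
    Hit("geneB", 31.0, 120, 400),
    Hit("geneC", 80.0, 50, 400),
    Hit("geneA", 60.0, 290, 300),  # second hit to geneA is counted only once
]
print(f"{completeness(example_hits):.1f}% complete")
```

Collecting valid hits into a set means each core gene contributes at most once, so multiple hits to the same gene do not inflate the estimate.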
Accession numbers. The SSU rRNA gene sequences obtained from MDA have been submitted to the NCBI GenBank database (accession numbers KT119838 to KT119846). All the SAGs have been made public in the Integrated Microbial Genomes (IMG) database (IMG submission identifiers [IDs] 68650, 69642 to 69645, 69647 to 69649, and 69684).