Whole genome sequence data from bacterial isolates from venting fluids at NW Rota Seamount, collected on R/V Thomas G. Thompson and R/V Kilo Moana cruises TN232 and KM1005 in the Mariana arc of the western Pacific in 2009 and 2010

Website: https://www.bco-dmo.org/dataset/632784
Data Type: Other Field Results
Version: 18 Jan 2016
Version Date: 2016-01-18

Project
» Functional gene diversity and expression in ocean crust microbial communities (NP Functional Gene Div)

Program
» Center for Dark Energy Biosphere Investigations (C-DEBI)

Contributors | Affiliation | Role
Huber, Julie | Marine Biological Laboratory (MBL) | Principal Investigator
Rauch, Shannon | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager


Dataset Description

Whole genome sequence data from bacterial isolates from venting fluids at NW Rota Seamount, collected in 2009 and 2010 on cruises TN232 and KM1005.


Methods & Sampling

Diffuse hydrothermal vent fluids were collected at several vent sites on NW Rota-1 seamount in 2009 and 2010 using the ROV Jason 2 and the hydrothermal fluid and particle sampler. Anaerobic enrichment medium previously used for the isolation of Caminibacter profundus was inoculated with 1 ml of unfiltered diffuse flow fluids and incubated at 55 °C. Isolates were obtained from enrichments with positive microbial growth by three rounds of dilution-to-extinction. The growth of Lebetimonas under varying conditions, including alternative electron donor/acceptor pairs and N2 gas as the sole nitrogen source, was evaluated as described in the Supplementary Material of Meyer & Huber (2014). Growth of Lebetimonas strain JH369 with N2 gas as the sole nitrogen source was evaluated using anaerobic seawater media without yeast extract or ammonia, containing formate and elemental sulfur, with an 80% N2 and 20% CO2 headspace.

Genomic DNA was extracted from pure cultures at log phase using a CTAB extraction. Libraries were prepared using Nextera DNA sample prep kits (Illumina, San Diego, CA, USA) and sequenced by Roche 454 GS FLX Titanium (454 Life Sciences, Branford, CT, USA) and/or using Illumina HiSeq 2000 paired reads (Illumina). For strains sequenced on multiple platforms, the same genomic DNA extraction was used for all library preparations, with the exception of strain JS085. Genomes were assembled using several tools as described in the Supplementary Material of Meyer & Huber (2014).

Related references:
Meyer, J.L. and J.A. Huber. 2014. Strain-level genomic variation in natural populations of Lebetimonas from an erupting deep-sea volcano. ISME Journal. 8:867–880. doi:10.1038/ismej.2013.206


Data Processing Description

Prior to assembly, Illumina sequences were quality filtered using adaptive window trimming with a quality threshold of 30 using the script Trim.pl (http://wiki.bioinformatics.ucdavis.edu/index.php/Trim.pl). All reads were screened for adaptor, barcode, primer, and transposon sequences and trimmed as needed using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html).

De novo genome assembly was performed with several assembly programs. Sequences generated on the 454 platform were first assembled with Roche’s GS De Novo Assembler v 2.6 ("Newbler") using default parameters. De novo assemblies of 454 reads were also performed using mira with the default settings for normal quality de novo genome assembly. De novo assembly of subsets of Illumina reads was performed with velvet, using an estimated coverage of 1000x, a kmer size of 21, and a coverage cutoff of 5. Large contigs from Newbler, mira, and velvet were consolidated using Geneious Pro v 5.6.6 (Biomatters, Ltd, http://www.geneious.com) and aligned with progressiveMauve to visualize the relationship of large contigs from different assemblies and to identify gaps to close. Primers were designed at the ends of contigs using either Geneious Pro or CLC Genomics Workbench v 5.1 (CLCbio, http://www.clcbio.com) to amplify gaps between contigs. Positive PCR amplification products linking contigs were cleaned using a Min-Elute PCR Purification kit (Qiagen) and Sanger sequenced.

A nearly complete draft genome from strain JS085 served as a reference genome for the remaining five strains. Both Illumina and 454 reads were mapped to the reference genome with CLC Genomics Workbench. Unmapped reads were then assembled de novo to ensure that novel genomic content in the mapped strains was not overlooked. De novo assembly of 454 and/or Illumina reads for each strain was also performed in CLC Genomics Workbench and compared to the mapped assemblies using progressiveMauve.
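The quality-filtering step for Illumina reads (window-based trimming at a quality threshold of 30) can be illustrated with a minimal sketch. This is a generic sliding-window trimmer and is not the Trim.pl algorithm itself; the function names, the window size of 5, and the Phred+33 encoding are assumptions made for illustration only.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(char) - 33 for char in quality_string]


def trim_read(qualities, threshold=30, window=5):
    """Return (start, end) of the slice to keep, or (0, 0) if nothing passes.

    Generic sliding-window trimming: the kept region runs from the first
    window whose mean quality reaches the threshold to the last such window.
    The window size is a hypothetical choice, not a value from the dataset.
    """
    n = len(qualities)
    if n < window:
        return (0, 0)

    # Scan from the 5' end for the first window that meets the threshold.
    start = None
    for i in range(n - window + 1):
        if sum(qualities[i:i + window]) / window >= threshold:
            start = i
            break
    if start is None:
        return (0, 0)

    # Scan from the 3' end for the last window that meets the threshold.
    end = n
    for j in range(n, window - 1, -1):
        if sum(qualities[j - window:j]) / window >= threshold:
            end = j
            break
    return (start, end)
```

A read would then be trimmed as `sequence[start:end]`. Because the test is on the window mean, a single low-quality base can survive at either edge of the kept region; a stricter trimmer would add a per-base check.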

Four of the strains were sequenced using both 454 and Illumina, and two strains were sequenced only with Illumina. The sequencing coverage depth of quality-filtered reads ranged from 22X to 50X for 454 and up to 3618X for Illumina. Lebetimonas strain JS085 had the highest coverage of 454 reads and was assembled into 33 large contigs with Newbler and 1747 contigs with mira. The 20 largest contigs from each of these assemblies were consolidated to 10 contigs using de novo assembly in Geneious. An additional round of assembly in Geneious with the 10 consolidated contigs and velvet contigs greater than 10 Kbp further consolidated the draft genome to 6 contigs. Primers were designed for all possible combinations between the 6 contigs. One gap was closed using Sanger-sequenced positive PCR products. Finally, all 454 and Illumina reads for strain JS085 were mapped to the draft genome consisting of 5 contigs, and the resulting consensus was used as the final draft genome. The five remaining genomes were assembled by mapping 454 and Illumina reads to the JS085 reference genome in CLC Genomics Workbench. Hybrid de novo assemblies of each strain in CLC Genomics Workbench did not extend contigs or close gaps between the 5 contigs of the draft genomes. Assemblies of unmapped reads produced only short contigs with no significant similarities using nucleotide BLAST.
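The coverage figures quoted above follow the usual definition of mean depth: total sequenced bases divided by genome length. The sketch below shows that arithmetic, plus the fraction of reads one would keep to bring a very deep library down to a target depth. The function names are hypothetical, and the dataset description does not state how the Illumina read subsets were actually chosen, so the subsampling helper is an assumption rather than the authors' procedure.

```python
def coverage_depth(read_count, mean_read_length, genome_size):
    """Mean coverage depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


def subsample_fraction(actual_coverage, target_coverage):
    """Fraction of reads to keep so that depth drops to roughly the target.

    Capped at 1.0 because reads cannot be added by subsampling.
    """
    if actual_coverage <= 0:
        raise ValueError("actual_coverage must be positive")
    return min(1.0, target_coverage / actual_coverage)
```

For example, with illustrative numbers only, one million 100 bp reads over a 2 Mbp genome give `coverage_depth(1_000_000, 100, 2_000_000)` = 50X, and reducing a 3618X library to about 1000X would keep a fraction of roughly 0.28 of the reads.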

BCO-DMO Processing:
- modified parameter names to conform with BCO-DMO naming conventions;
- added hyperlinks;
- removed "m" (meters) in depth column.



Data Files

File
Lebetimonas_genomes.csv
(Comma Separated Values (.csv), 7.69 KB)
MD5:69335cf3ef035c17b08cae1173d69106
Primary data file for dataset ID 632784
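
The MD5 value listed for the data file can be used to confirm that a downloaded copy is intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names and the local file path in the usage note are assumptions, while the expected digest is the one listed above.

```python
import hashlib

# MD5 digest listed for Lebetimonas_genomes.csv in this dataset record.
EXPECTED_MD5 = "69335cf3ef035c17b08cae1173d69106"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_checksum("Lebetimonas_genomes.csv")` on a local copy (the path is hypothetical) should return `True` if the file matches the version described here. MD5 is adequate for detecting accidental corruption but is not a security guarantee.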


Parameters

Parameter | Description | Units
sequencing_center | Name of sequencing center. | dimensionless
domain | Domain of sample. | dimensionless
phylum | Taxonomic phylum. | dimensionless
class | Taxonomic class. | dimensionless
order | Taxonomic order. | dimensionless
family | Taxonomic family. | dimensionless
genus | Taxonomic genus. | dimensionless
study_name | Name of study. | dimensionless
sample_name | Name/identifier of the sample. | dimensionless
taxon_oid | Taxon identifier (OID). | dimensionless
species | Species identifier. | dimensionless
NCBI_accession_num | NCBI accession number. | dimensionless
accession_url | Hyperlink to NCBI for the accession number. | dimensionless
IMG_genome_ID | IMG database (http://img.jgi.doe.gov/) genome identifier. | dimensionless
NCBI_taxon_ID | NCBI taxon identifier. | dimensionless
IMG_submission_ID | IMG database (http://img.jgi.doe.gov/) submission identifier. | dimensionless
GOLD_study_ID | GOLD database (https://gold.jgi.doe.gov/) study identifier. | dimensionless
GOLD_study_url | Hyperlink to GOLD database (https://gold.jgi.doe.gov/) for the study. | dimensionless
GOLD_project_ID | GOLD database (https://gold.jgi.doe.gov/) project identifier. | dimensionless
GOLD_project_url | Hyperlink to GOLD database (https://gold.jgi.doe.gov/) for the project. | dimensionless
GOLD_analysis_project_ID | GOLD database (https://gold.jgi.doe.gov/) analysis project identifier. | dimensionless
GOLD_analysis_project_url | Hyperlink to GOLD database (https://gold.jgi.doe.gov/) for the analysis project. | dimensionless
GOLD_analysis_project_type | GOLD database (https://gold.jgi.doe.gov/) project type. | dimensionless
gene_model_QC | Gene model QC? (yes/no) | dimensionless
submission_type | Submission type. | dimensionless
strain | Strain. | dimensionless
is_public | Is the dataset public? (yes/no) | dimensionless
high_quality | Is it a high quality dataset? (yes/no) | dimensionless
add_date | ? | dimensionless
biotic_relationships | Description of the biotic relationships. | dimensionless
cell_shape | Description of the cell shape. | dimensionless
contact_email | Contact email address. | dimensionless
contact_name | Contact name. | dimensionless
culture_type | Culture type. | dimensionless
cultured | Cultured? (yes/no) | dimensionless
depth | Depth. | dimensionless
ecosystem | Description of ecosystem. | dimensionless
ecosystem_category | Description of ecosystem category. | dimensionless
ecosystem_subtype | Description of ecosystem sub-type. | dimensionless
ecosystem_type | Description of ecosystem type. | dimensionless
energy_source | Energy source. | dimensionless
GOLD_sequencing_strategy | GOLD database (https://gold.jgi.doe.gov/) sequencing strategy. | dimensionless
gram_staining | Type of gram staining. | dimensionless
habitat | Description of habitat. | dimensionless
isolation | Description of isolation. | dimensionless
lat | Latitude. | decimal degrees
longhurst_code | Longhurst code. | dimensionless
longhurst_descrip | Longhurst description. | dimensionless
lon | Longitude. | decimal degrees
motility | Motility. | dimensionless
O2_requirement | O2 requirements. | dimensionless
project_name | Project name. | dimensionless
relevance | Relevance. | dimensionless
sporulation | Type of sporulation. | dimensionless
temp_range | Description of temperature range. | dimensionless
gene_count | Gene count. | dimensionless
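
The parameter list above can be put to work when reading the data file. The sketch below parses CSV text with Python's standard `csv` module, converts `lat` and `lon` to floats, and maps the yes/no columns to booleans. It assumes the file has a single header row using the parameter names listed above and that yes/no values are spelled "yes" or "no" in any letter case; neither assumption has been checked against the actual file, and `parse_rows` is a hypothetical helper name.

```python
import csv
import io

# Columns documented above as yes/no flags.
YES_NO_COLUMNS = ("gene_model_QC", "is_public", "high_quality", "cultured")


def parse_rows(text):
    """Parse CSV text into a list of dicts with typed lat/lon and flag columns.

    Missing or empty values become None; all other columns stay as strings.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        record = dict(row)
        # Coordinates are documented in decimal degrees.
        for name in ("lat", "lon"):
            value = (record.get(name) or "").strip()
            record[name] = float(value) if value else None
        # Map yes/no strings to booleans; anything else becomes None.
        for name in YES_NO_COLUMNS:
            value = (record.get(name) or "").strip().lower()
            record[name] = {"yes": True, "no": False}.get(value)
        rows.append(record)
    return rows
```

To use it on a local copy, read the file as text and pass the contents in, for example `parse_rows(open("Lebetimonas_genomes.csv", newline="").read())`, where the path is hypothetical. A non-numeric coordinate will raise `ValueError`, which is deliberate so that malformed rows are not silently accepted.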



Instruments

Generic Instrument Name: Automated DNA Sequencer
Dataset-specific Description: Libraries were prepared using Nextera DNA sample prep kits (Illumina, San Diego, CA, USA) and sequenced by Roche 454 GS FLX Titanium (454 Life Sciences, Branford, CT, USA) and/or using Illumina HiSeq 2000 paired reads (Illumina).
Generic Instrument Description: General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step.

Dataset-specific Instrument Name: Jason 2
Generic Instrument Name: ROV Jason
Generic Instrument Description: The Remotely Operated Vehicle (ROV) Jason is operated by the Deep Submergence Laboratory (DSL) at Woods Hole Oceanographic Institution (WHOI). WHOI engineers and scientists designed and built the ROV Jason to give scientists access to the seafloor without leaving the deck of the ship. Jason is a two-body ROV system. A 10-kilometer (6-mile) fiber-optic cable delivers electrical power and commands from the ship through Medea and down to Jason, which then returns data and live video imagery. Medea serves as a shock absorber, buffering Jason from the movements of the ship, while providing lighting and a bird’s eye view of the ROV during seafloor operations. During each dive (deployment of the ROV), Jason pilots and scientists work from a control room on the ship to monitor Jason’s instruments and video while maneuvering the vehicle and optionally performing a variety of sampling activities. Jason is equipped with sonar imagers, water samplers, video and still cameras, and lighting gear. Jason’s manipulator arms collect samples of rock, sediment, or marine life and place them in the vehicle’s basket or on "elevator" platforms that float heavier loads to the surface. More information is available from the operator site at https://ndsf.whoi.edu/jason/



Deployments

TN232

Platform: R/V Thomas G. Thompson
Start Date: 2009-04-03
End Date: 2009-04-17
Description: Data expected from C-DEBI investigator, Julie Huber. Additional cruise information and original data are available from the NSF R2R data catalog.

KM1005

Platform: R/V Kilo Moana
Start Date: 2010-03-16
End Date: 2010-03-30
Description: Data expected from C-DEBI investigator, Julie Huber. Additional cruise information and original data are available from the NSF R2R data catalog.



Project Information

Functional gene diversity and expression in ocean crust microbial communities (NP Functional Gene Div)

Coverage: North Pond


Project description from C-DEBI:
The objective of this project is to determine the diversity, phylogeny, and expression of functional genes involved in carbon, hydrogen, and sulfur cycling in North Pond crustal fluids. These formation fluids are expected to be representative of the ubiquitous cold ocean crust habitat, where reactions between the water and mineral rock surfaces create substrates suitable for sustaining a potentially large reservoir of microbial life. Information regarding crustal microbial communities and the energy sources available for microbial metabolism has been limited by the inaccessibility of samples. IODP Expedition 336 will provide a unique opportunity to access deep subsurface formation fluids from North Pond, including sampling from multiple depth horizons within oceanic crust. My goal is to develop quantitative polymerase chain reaction assays to determine the expression of functional genes in order to increase our understanding of microbial metabolisms in deep subsurface environments.

This project was funded by a C-DEBI Postdoctoral Fellowship to Julie Meyer (formerly at the Marine Biological Laboratory).




Program Information

Center for Dark Energy Biosphere Investigations (C-DEBI)


Coverage: Global


The mission of the Center for Dark Energy Biosphere Investigations (C-DEBI) is to explore life beneath the seafloor and make transformative discoveries that advance science, benefit society, and inspire people of all ages and origins.

C-DEBI provides a framework for a large, multi-disciplinary group of scientists to pursue fundamental questions about life deep in the sub-surface environment of Earth. The fundamental science questions of C-DEBI involve exploration and discovery, uncovering the processes that constrain the sub-surface biosphere below the oceans, and implications for the Earth system. What type of life exists in this deep biosphere, how much, and how is it distributed and dispersed? What are the physical-chemical conditions that promote or limit life? What are the important oxidation-reduction processes and are they unique or important to humankind? How does this biosphere influence global energy and material cycles, particularly the carbon cycle? Finally, can we discern how such life evolved in geological settings beneath the ocean floor, and how this might relate to ideas about the origin of life on our planet?

C-DEBI's scientific goals are pursued with a combination of approaches:
(1) coordinate, integrate, support, and extend the research associated with four major programs—Juan de Fuca Ridge flank (JdF), South Pacific Gyre (SPG), North Pond (NP), and Dorado Outcrop (DO)—and other field sites;
(2) make substantial investments of resources to support field, laboratory, analytical, and modeling studies of the deep subseafloor ecosystems;
(3) facilitate and encourage synthesis and thematic understanding of submarine microbiological processes, through funding of scientific and technical activities, coordination and hosting of meetings and workshops, and support of (mostly junior) researchers and graduate students; and
(4) entrain, educate, inspire, and mentor an interdisciplinary community of researchers and educators, with an emphasis on undergraduate and graduate students and early-career scientists.

Note: Katrina Edwards was a former PI of C-DEBI; James Cowen is a former co-PI.

Data Management:
C-DEBI is committed to ensuring all the data generated are publicly available and deposited in a data repository for long-term storage as stated in their Data Management Plan (PDF) and in compliance with the NSF Ocean Sciences Sample and Data Policy. The data types and products resulting from C-DEBI-supported research include a wide variety of geophysical, geological, geochemical, and biological information, in addition to education and outreach materials, technical documents, and samples. All data and information generated by C-DEBI-supported research projects are required to be made publicly available either following publication of research results or within two (2) years of data generation.

To ensure preservation and dissemination of the diverse data types generated, C-DEBI researchers are working with BCO-DMO Data Managers to make data publicly available online. The partnership with BCO-DMO helps ensure that the C-DEBI data are discoverable and available for reuse. Some C-DEBI data are better served by specialized repositories (NCBI's GenBank for sequence data, for example) and, in those cases, BCO-DMO provides dataset documentation (metadata) that includes links to those external repositories.


