Contributors | Affiliation | Role |
---|---|---|
Rocap, Gabrielle | University of Washington (UW) | Principal Investigator, Contact |
Fuchsman, Clara | University of Washington (UW) | Contact |
Rauch, Shannon | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
Assembled metagenomes collected in the Eastern Tropical North Pacific Oxygen Deficient Zone in April 2012 on cruise TN278.
These data were published in (Fuchsman et al., 2017)
The GenBank BioProject accession number for this data is PRJNA350692. All samples were obtained from station 136 (106.543W, 17.043N; cast 136) or station BB2 (107.148W, 16.527N; cast 141) on cruise TN278 between April 8-9 2012.
Water from all samples was obtained from Niskin bottles on the CTD Rosette. At station 136 we sampled 10 depths including the oxycline and anoxic zones. Two liters of Niskin water were vacuum filtered onto a 0.2 µm SUPOR filter. At station BB2, a nearby station approximately four liters from 100m, 120m, and 150m were each prefiltered through >30 µm filters and subsequently filtered onto 0.2 µm SUPOR filters. All filters were stored at -80C until further processing on shore. The GenBank BioSample accession numbers are SAMN05944799-SAMN05944812.
DNA was extracted from filters using freeze thaw followed by incubation with lysozyme and proteinase K and phenol/chloroform extraction. A Rubicon THRUPLEX kit was used for library prep using 50 ng of DNA per sample. Four libraries were sequenced on an Illumina HiSeq 2500 in rapid mode (25 million 150 bp paired-end reads per sample) at Michigan State. The other 10 libraries were sequenced on an Illumina HiSeq 2500 in high output mode (40-70 million 125 bp paired-end reads per sample) at the University of Utah. Sequences were quality checked, trimmed, and remaining adapter sequences were removed using Trimmomatic (Bolger et al., 2014). Paired reads that overlapped were combined with Flash (Magoc and Salzberg, 2011). FASTQ files of the raw sequence reads are deposited in the Sequence Read Archive under study SRP092212 and the individual file accession numbers SRS1765745-SRS1765758.
Metagenomic sequences from each sample were assembled independently into larger contigs using two approaches. They have been deposited in the Whole Genome Shotgun archive under the accession numbers:
MOXS00000000
MPLT00000000
MPLU00000000
MPLV00000000
MPLW00000000
MPLX00000000
MPLY00000000
MPLZ00000000
MPLZ00000000
MPMB00000000
MPMC00000000
MPMD00000000
MPME00000000
MPMF00000000
First, for de novo assembly we pre-processed reads with the khmer software package (Crusoe et al. 2015), first using normalize-by-median which implements a Digital normalization algorithm (Brown et al., 2012) to reduce high coverage reads to 20x, followed by filter-abund.py to trim reads of kmers with an abundance below 2, and finally we used filter-below-abund.py to trim kmers with counts above 50 (Zhang et al. 2015). We assembled the khmer processed reads with the VELVET (1.2.10) assembler (Zerbino, 2010), using a kmer size of 45. Velvet assemblies have been released as version 1 in Whole Genome Shotgun archive (accession numbers XXXX01000000).
Second, we also generated a second independent de-novo assembly for each sample using MEGAHIT v1.1.2 (Li et al., 2015) in paired end mode with a minimum contig length of 500. MEGAHIT assemblies have been released as version 2 in the Whole Genome Shotgun archive (accession numbers XXXX02000000).
BCO-DMO Processing:
-modified parameter names to conform with BCO-DMO naming conventions (replaced spaces with underscores);
-added URLs linking to NCBI;
-split original "Lat Long" column into two separate columns; removed trailing "N" and "W" and removed degrees sign; changed lon to negative values;
-formatted date to yyyy-mm-dd from dd-APR-yyyy;
-removed "m" (units) from Depth column values and "L" from Water_Volume column.
File |
---|
ETNP_ODZ_metagenomes_Apr2012.csv (Comma Separated Values (.csv), 6.95 KB) MD5:a1f486cf0ae2a52c5ef34a4ea0aaa6a6 Primary data file for dataset ID 733748 |
Parameter | Description | Units |
Cruise | Cruise identifier | unitless |
Station | Station identifier | unitless |
CTD_cast | CTD cast number | unitless |
Lat | Latitude; positive values = North | decimal degrees |
Long | Longitude; positive values = East | decimal degrees |
Date | Date of sampling | unitless |
Depth | Depth of sample collection | meters (m) |
Water_Volume | Water volume collected | liters (L) |
Sample_Name | Sample identifier | unitless |
BioProject_Accession | NCBI BioProject accession number | unitless |
BioSample_Accession_Number | NCBI BioSample accession number | unitless |
Short_Read_Archive_Accession_Number | NCBI Short Read Archive (SRA) accession number | unitless |
Whole_Genome_Shotgun_Accession_Number | Whole Genome Shotgun accession number | unitless |
WGS_Velvet_Assembly | WGS Velvet Assembly | unitless |
WGS_MEGAHIT_Assembly | WGS MEGAHIT Assembly | unitless |
Sample_Processing | Description of sample processing | unitless |
Dataset-specific Instrument Name | Illumina HiSeq 2500 |
Generic Instrument Name | Automated DNA Sequencer |
Dataset-specific Description | Libraries were sequenced on an Illumina HiSeq 2500. |
Generic Instrument Description | General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step. |
Dataset-specific Instrument Name | |
Generic Instrument Name | Niskin bottle |
Dataset-specific Description | Water from all samples was obtained from Niskin bottles on the CTD Rosette. |
Generic Instrument Description | A Niskin bottle (a next generation water sampler based on the Nansen bottle) is a cylindrical, non-metallic water collection device with stoppers at both ends. The bottles can be attached individually on a hydrowire or deployed in 12, 24, or 36 bottle Rosette systems mounted on a frame and combined with a CTD. Niskin bottles are used to collect discrete water samples for a range of measurements including pigments, nutrients, plankton, etc. |
Website | |
Platform | R/V Thomas G. Thompson |
Start Date | 2012-03-17 |
End Date | 2012-04-23 |
Description | Additional cruise data are available from the Rolling Deck to Repository (R2R): http://www.rvdata.us/catalog/TN278 |
Extracted from the NSF award abstract:
Recent advances in genome sequencing have transformed marine microbial ecology. Metagenomic datasets have led to the discovery of new protein families, underscored the importance of the rare biosphere and even allowed the study of intra-population diversity of dominant members of the community such as Prochlorococcus. Although open ocean microbial communities have been repeatedly demonstrated to vary significantly with depth, in terms of both phylogenetic diversity and biogeochemical function, the vast majority of sequence available in public databases is from surface waters. Much less is known about microbial communities below the surface of the ocean, even those at 50-200m. Next generation sequencing technologies offer a dramatic reduction in cost per base compared to Sanger or even pyrosequencing approaches. The very short read lengths from these platforms have not yet been successfully applied to mixed communities. This project is a metagenomic study of the microbial communities throughout the water column in the South Atlantic gyre using the ABI SOLiD platform with the following objectives:
- Construct and sequence paired-end metagenomic libraries from the surface, chlorophyll maximum, deep euphotic zone and twilight zone in the South Atlantic gyre using ABI SOLiD technology.
- Apply information theory based software methods to assemble the short read sequences and reconstruct the phylogenetic diversity and metabolic potential of these communities.
- Compare reconstructed sequences to existing metagenomic data from other parts of the world's oceans and to metaproteomic data from the the same location to inform sampling strategies and methodological choices for a proposed global molecular oceanic survey (Biogeotraces).
This project is appropriate as an EAGER because the genomic potential of deep euphotic zone and twilight zone communities will be very different from those in surface waters. Thus, this dataset will reshape our understanding of oligotrophic community function. Second, the massive advance in data generated at a low cost by short-read sequencing platforms has the potential to transform microbial oceanography by allowing metagenomic analyses to be routinely employed in hypothesis-driven experiments. However, new interdisciplinary tools are required to assemble and interpret this data. Until these next generation bioinformatic methods are established, proposals that take advantage of these technological breakthroughs remain high risk.
Broader Impacts: Metagenomic analyses of microbial communities collected on a GEOTRACES compliant cruise in the South Atlantic will provide a pilot dataset demonstrating the potential of short-read sequencing technologies for a future sectional molecular survey program. A post-doctoral researcher and a graduate student will be trained in the interdisciplinary perspectives required to analyze this sequence data and interpret it in a biogeochemical context.
Funding Source | Award |
---|---|
NSF Division of Ocean Sciences (NSF OCE) |