Contributors | Affiliation | Role |
---|---|---|
Toonen, Robert J. | University of Hawaiʻi at Mānoa (HIMB) | Principal Investigator |
White, Crow | Cal Poly San Luis Obispo | Principal Investigator |
Christie, Mark | Purdue University | Co-Principal Investigator |
Daniels, Benjamin | Oregon State University (OSU) | Scientist |
Davidson, Jean | Cal Poly San Luis Obispo | Scientist |
Freel, Evan | University of Hawaiʻi at Mānoa (HIMB) | Scientist |
Lee, Andy | Purdue University | Scientist |
López, Cataixa | University of Hawaiʻi at Mānoa (HIMB) | Scientist |
Mickle, Audrey | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
Sample collection, DNA extraction, and pooling
Subtidal coastal waters of Southern and Central California, USA, were sampled from small craft using SCUBA. Foot tissue was sampled non-lethally from 46 Kelletia kelletii (urn:lsid:marinespecies.org:taxname:491054) adults (>60 mm shell length; Rosenthal 1970), 23 individuals in 2015 and 23 in 2016, at each of 13 subtidal locations (~15 m depth) across the species’ historical and expanded range. By including only adults, we aimed to represent the local, established population and to avoid the bias in population structure that recent recruitment could introduce if juveniles were included. Collected tissue samples were frozen on dry ice or in liquid nitrogen for transport to California Polytechnic State University (San Luis Obispo, CA) and stored at -80 °C until processed for DNA extraction.
DNA extraction was performed using an optimized version of the ‘salting-out’ protocol developed by Li et al. (2011) and modified by Daniels et al. (2023a), where the full extraction protocol is detailed. Briefly, 30 mg of tissue was lysed with proteinase K and RNase A in a warm water bath. Proteins were then precipitated with ammonium acetate and removed by centrifugation, and DNA was purified from the supernatant via ethanol washes. Finally, precipitated DNA was resuspended in Tris-EDTA (1x TE) buffer and stored at -20 °C until further analyses.
DNA quality was assessed visually on a 1% agarose gel in Tris-acetate-EDTA (TAE) buffer with GelRed (Biotium, Inc) gel stain, referenced to the 200–10,000 bp Hyperladder I (Bioline, Meridian Bioscience). Because 91.3% of all extractions produced high molecular weight bands (>10 kb) with only faint smears from degraded DNA, all specimens were included in the study. Extractions were quantified using the AccuClear Ultra High Sensitivity dsDNA quantification kit (Biotium, Inc) with 3 standards and measured using a SpectraMax M2 microplate reader (Molecular Devices, LLC). Finally, an equimolar amount of DNA from each of the 46 individuals collected at each location was pooled by collection site (population), ensuring that each library had the same number of individuals. In total, 598 individuals belonging to 13 populations spanning approximately 800 km were included in the analysis.
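The pooling arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the authors' worksheet: the helper name, the example concentrations, and the per-individual target mass are assumptions, and "equimolar" is approximated here as equal DNA mass per individual.

```python
# Hypothetical helper: volume to draw from each extraction so that every
# individual contributes the same DNA mass to its site pool.
INDIVIDUALS_PER_SITE = 46  # from the text: 23 in 2015 + 23 in 2016
N_SITES = 13               # from the text


def pooling_volumes_ul(concentrations_ng_per_ul, mass_per_individual_ng):
    """Return the volume (µL) needed from each extraction."""
    if any(c <= 0 for c in concentrations_ng_per_ul):
        raise ValueError("concentrations must be positive")
    return [mass_per_individual_ng / c for c in concentrations_ng_per_ul]


# Total individuals across all pooled libraries (matches the 598 in the text).
total_individuals = INDIVIDUALS_PER_SITE * N_SITES

# Made-up concentrations (ng/µL) and target mass (ng), for illustration only.
example_volumes = pooling_volumes_ul([50.0, 25.0, 100.0], mass_per_individual_ng=20.0)
print(total_individuals, example_volumes)
```

More dilute extractions contribute proportionally larger volumes, so each individual is equally represented in the pool regardless of extraction yield.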
Library preparation and sequencing
Equimolar pooled ezRAD (Toonen et al. 2013) libraries were generated following the detailed protocol of Knapp et al. (2016) for all 13 sites. Briefly, genomic DNA was digested using the isoschizomer restriction enzymes MboI and Sau3AI (New England Biolabs, Ipswich, MA). Digestions were performed in a total volume of 50 µl, containing 25 µl of dsDNA (~1 µg), 5 µl of NEB Cutsmart Buffer (provided with restriction enzymes), 18 µl of HPLC grade water, 1 µl MboI (10 units), and 1 µl Sau3AI (10 units) under the following thermocycler profile: 37 °C for 18 h, then deactivation at 65 °C for 20 min. After digestion, samples were cleaned using Mag-Bind TotalPure NGS beads (Omega Bio-Tek, Norcross, GA) at a 1:1.18 (DNA:beads) ratio to remove fragments < 200 bp. Libraries were prepared for sequencing using the KAPA Hyper Prep DNA kit (Roche Sequencing and Life Science) following a modified version of the manufacturer's protocol (see Knapp et al. 2016). Quality control by a Bioanalyzer and sequencing of the libraries on one lane of an Illumina HiSeq2500 were performed in the DNA Technologies and Expression Analysis Core Laboratory at the University of California (Davis, CA).
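As a quick consistency check on the volumes above, the following Python sketch sums the digestion components and computes the bead volume implied by the 1:1.18 ratio. The function names are hypothetical, and it assumes the entire 50 µl digestion goes into the bead cleanup, which the text does not state explicitly.

```python
# Component volumes (µL) exactly as listed in the text.
DIGEST_MIX_UL = {
    "dsDNA": 25,
    "CutSmart buffer": 5,
    "HPLC-grade water": 18,
    "MboI": 1,
    "Sau3AI": 1,
}


def total_volume_ul(mix):
    """Sum the component volumes of a reaction mix."""
    return sum(mix.values())


def bead_volume_ul(sample_volume_ul, beads_per_dna=1.18):
    """Bead volume for a DNA:beads ratio of 1:beads_per_dna."""
    return sample_volume_ul * beads_per_dna


reaction_volume = total_volume_ul(DIGEST_MIX_UL)
print(reaction_volume, round(bead_volume_ul(reaction_volume), 2))
```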
Data filtering and SNP calling
Libraries were initially trimmed to remove low-quality bases and adapters using dDocent v.2.9.4 (Puritz et al. 2014), obtaining an average of 16,640,463 reads per sample. The population of Monterey had the lowest number of reads, with 4,439,406, while Yellow Banks had the highest number, with 45,795,618 reads. The same pipeline was used to align the reads using BWA (mem algorithm) to the K. kelletii reference genome, which contains 2,107,417,620 base pairs (2.1 Gb) in 46,654 contigs and a complete BUSCO score of 84.1% (Daniels et al. 2023b).
SNPs were identified using FreeBayes (Garrison & Marth 2012) as implemented in dDocent, by calling variants from the merged BAM files produced by the pipeline. The TotalRawSNPs.vcf file contained 18,327,457 shared SNPs with a mean depth of 5.04 and was filtered with VCFtools v. 0.1.16 (Danecek et al. 2011) using the following parameters: --maf 0.05, --minQ 30, and --min-meanDP 20. To address the potential effect of missing data on our results, we generated a series of files using the --max-missing filter implemented in VCFtools, from the most restrictive option (no missing data, --max-missing 1) to a very relaxed one (50% missing data, --max-missing 0.50). Further filters were explored in AssessPool (github.com/ToBoDev/assessPool), a bioinformatic program designed to filter, analyze, and visualize pool-seq data (Freel 2024). Downstream analyses only included loci with --max-missing 0.75 and 30x coverage, which retained reliable SNP calls for RADseq data while avoiding overrepresented loci with higher quality scores (Bentley et al. 2008, Li 2014, Rivera-Colón & Catchen 2021).
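The filtering series described above could be scripted roughly as follows. This Python sketch only builds and prints VCFtools command lines; it is not the authors' script. The output prefixes, the intermediate --max-missing value, and the use of --recode/--recode-INFO-all to write filtered VCFs are assumptions. Only the thresholds and the input file name come from the text.

```python
import shlex


def vcftools_filter_cmd(vcf_in, out_prefix, max_missing,
                        maf=0.05, min_q=30, min_mean_dp=20):
    """Build one VCFtools command with the thresholds named in the text."""
    return [
        "vcftools", "--vcf", vcf_in,
        "--maf", str(maf),
        "--minQ", str(min_q),
        "--min-meanDP", str(min_mean_dp),
        "--max-missing", str(max_missing),
        "--recode", "--recode-INFO-all",  # assumption: write a filtered VCF
        "--out", out_prefix,              # hypothetical output prefix
    ]


# 1.0 = no missing data allowed, 0.5 = up to 50% missing.
# The text gives the two endpoints and 0.75; finer steps are not specified.
MAX_MISSING_SERIES = [1.0, 0.75, 0.5]

commands = [
    vcftools_filter_cmd("TotalRawSNPs.vcf", f"filtered_maxmissing_{mm}", mm)
    for mm in MAX_MISSING_SERIES
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where VCFtools is installed; printing first makes it easy to review the exact thresholds before running.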
Population genetic analyses
The final filtered VCF file was imported into TASSEL v. 5 (Bradbury et al. 2007) to explore population similarities via Principal Component Analysis (PCA). Population genetic differentiation between sites was calculated based on pairwise estimates of FST using PoPoolation2, implemented in AssessPool, which also calculates pairwise significance (p-values) using Fisher’s Exact Test (Kofler et al. 2011). We also compared the matrix of our pairwise estimates of genetic differentiation to those obtained by White et al. (2010), who analyzed between 50 and 92 individuals per population from the Southern California Bight using 9 microsatellite DNA loci. For the collection sites shared between both studies (ANN, COJ, ISV, JAL and YEL), we compared FST values using a Mantel test with 9999 permutations in the vegan R package (v. 2.5-7; Oksanen et al. 2020) to test whether the two sets of estimates were significantly correlated.
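To make the logic of the Mantel comparison concrete, here is a pure-Python stand-in. The study itself used the vegan package in R, so this is a sketch of the general technique (Pearson correlation between the lower triangles of two distance matrices, with significance from joint row/column permutations), not the authors' code. The function names are hypothetical and the 5 x 5 matrices are made-up toy values, not FST estimates from either study.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a, b, permutations=9999, seed=0):
    """One-sided Mantel test: returns (r, p) for symmetric matrices a and b."""
    n = len(a)
    pairs = list(combinations(range(n), 2))
    flat_b = [b[i][j] for i, j in pairs]
    r_obs = pearson([a[i][j] for i, j in pairs], flat_b)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute site labels of matrix a
        flat_a = [a[order[i]][order[j]] for i, j in pairs]
        if pearson(flat_a, flat_b) >= r_obs - 1e-12:
            hits += 1
    # The observed arrangement counts as one permutation.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy distance matrices for five sites (illustrative values only).
toy_a = [
    [0.000, 0.010, 0.020, 0.030, 0.050],
    [0.010, 0.000, 0.015, 0.025, 0.040],
    [0.020, 0.015, 0.000, 0.012, 0.035],
    [0.030, 0.025, 0.012, 0.000, 0.022],
    [0.050, 0.040, 0.035, 0.022, 0.000],
]
toy_b = [[2 * v for v in row] for row in toy_a]
r, p = mantel(toy_a, toy_b)
print(round(r, 3), p)
```

With only five shared sites there are 120 possible label permutations, so the smallest attainable p-value is roughly 1/120 regardless of how many random permutations are drawn.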
- Import "KW_DNA_PooledezRAD.xlsx" into BCO-DMO system with formatting
- Convert DateOfExtraction to YYYY-MM-DD
- Adjust parameter names to remove units
- Convert Lon to negative values (West is negative)
- Export file as "958359_v1_kelletia_kelletii_pooled_rad.csv"
Accepted species identifier confirmed on 2025-04-09.
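The date and longitude conversions listed in the processing notes can be sketched as below. This is not the BCO-DMO processing pipeline: the helper names are hypothetical, and the input date format (month/day/year) is an assumption, since the format used in the original spreadsheet is not stated. Only the column names DateOfExtraction and Lon come from the dataset.

```python
from datetime import datetime


def to_iso_date(text, input_format="%m/%d/%Y"):
    """Convert a date string to YYYY-MM-DD (input format is an assumption)."""
    return datetime.strptime(text.strip(), input_format).strftime("%Y-%m-%d")


def west_negative(lon):
    """Return longitude with West expressed as negative decimal degrees."""
    return -abs(float(lon))


def process_row(row):
    """Apply both conversions to one record, leaving empty cells untouched."""
    out = dict(row)
    if out.get("DateOfExtraction"):
        out["DateOfExtraction"] = to_iso_date(out["DateOfExtraction"])
    if out.get("Lon"):
        out["Lon"] = west_negative(out["Lon"])
    return out


# Made-up record for illustration.
print(process_row({"DateOfExtraction": "3/7/2016", "Lon": "120.5"}))
```

Taking the absolute value before negating keeps the function idempotent, which is safe here because all sampling sites are in California and therefore in the western hemisphere.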
File |
---|
958359_v1_kelletia_kelletii_pooled_rad.csv (Comma Separated Values (.csv), 300.36 KB) MD5:3a7ba3b16820642be9c6ecf1b487e2ad Primary data file for dataset ID 958359, version 1 |
Parameter | Description | Units |
---|---|---|
CLNo | DNA extraction cap label number | unitless |
CapLabel | DNA extraction cap label | unitless |
SiteDescription | Name of Kelletia kelletii tissue sample collection site | unitless |
SiteCode | Code name of Kelletia kelletii tissue sample collection site | unitless |
Lat | Latitude of Kelletia kelletii tissue sample collection site, North is positive | decimal degrees |
Lon | Longitude of Kelletia kelletii tissue sample collection site, West is negative | decimal degrees |
SiteCodeLetter_cap | DNA extraction cap label letter | unitless |
Whelk_ID | Unique sample ID | unitless |
TissueType | Sample tissue type (adult or recruit) | unitless |
Year | Year of Kelletia kelletii tissue sample collection | unitless |
ExtractionCap | DNA extraction cap label number | unitless |
DateOfExtraction | Date DNA extraction was performed | unitless |
PerformedBy | Researcher number who performed DNA extraction | unitless |
OriginalCalculatedConcentration | DNA extraction concentration (ng/ul) if measured | ug/ul |
Elution_TEBuffer | TE elution buffer used in DNA extraction protocol | uL |
DNAYield | DNA extraction yield (ug), if measured | ug |
Measurement | Kelletia kelletii maximum shell length (mm), if measured (recruits only) | mm |
PoolRADseq | Flag if sample used in Pooled RADseq (1=yes, 0=no) | unitless |
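A minimal sketch of how the parameters above might be used: it counts, per SiteCode, the samples flagged with PoolRADseq = 1. The function name is hypothetical and the inline CSV is made-up test data. If the flag matches the methods text, each pooled site would be expected to show 46 individuals, but that is an expectation to check rather than something this sketch guarantees.

```python
import csv
import io
from collections import Counter


def pooled_counts_by_site(csv_file):
    """Count rows with PoolRADseq == 1 for each SiteCode."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        flag = (row.get("PoolRADseq") or "").strip()
        try:
            used = float(flag) == 1.0  # tolerate "1" or "1.0"
        except ValueError:
            used = False               # empty or non-numeric flag
        if used:
            counts[row["SiteCode"]] += 1
    return counts


# Made-up rows for illustration; real use would open the dataset file, e.g.
# open("958359_v1_kelletia_kelletii_pooled_rad.csv", newline="")
toy_csv = io.StringIO(
    "SiteCode,Whelk_ID,PoolRADseq\n"
    "AAA,w1,1\n"
    "AAA,w2,0\n"
    "BBB,w3,1\n"
    "BBB,w4,1\n"
)
print(dict(pooled_counts_by_site(toy_csv)))
```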
Dataset-specific Instrument Name | Illumina HiSeq2500 |
Generic Instrument Name | Automated DNA Sequencer |
Dataset-specific Description | Quality control by a Bioanalyzer and sequencing of the libraries on one lane of an Illumina HiSeq2500 were performed in the DNA Technologies and Expression Analysis Core Laboratory at the University of California (Davis, CA). |
Generic Instrument Description | A DNA sequencer is an instrument that determines the order of deoxynucleotides in deoxyribonucleic acid sequences. |
Dataset-specific Instrument Name | SpectraMax M2 microplate reader (Molecular Devices, LLC) |
Generic Instrument Name | plate reader |
Dataset-specific Description | Extractions were quantified using the AccuClear Ultra High Sensitivity dsDNA quantification kit (Biotium, Inc) with 3 standards and measured using a SpectraMax M2 microplate reader (Molecular Devices, LLC). |
Generic Instrument Description | Plate readers (also known as microplate readers) are laboratory instruments designed to detect biological, chemical or physical events of samples in microtiter plates. They are widely used in research, drug discovery, bioassay validation, quality control and manufacturing processes in the pharmaceutical and biotechnological industry and academic organizations. Sample reactions can be assayed in 6-1536 well format microtiter plates. The most common microplate format used in academic research laboratories or clinical diagnostic laboratories is 96-well (8 by 12 matrix) with a typical reaction volume between 100 and 200 uL per well. Higher density microplates (384- or 1536-well microplates) are typically used for screening applications, when throughput (number of samples per day processed) and assay cost per sample become critical parameters, with a typical assay volume between 5 and 50 µL per well. Common detection modes for microplate assays are absorbance, fluorescence intensity, luminescence, time-resolved fluorescence, and fluorescence polarization. From: http://en.wikipedia.org/wiki/Plate_reader, 2014-09-0-23. |
Dataset-specific Instrument Name | thermocycler |
Generic Instrument Name | Thermal Cycler |
Dataset-specific Description | Briefly, genomic DNA was digested using the isoschizomer restriction enzymes MboI and Sau3AI (New England Biolabs, Ipswich, MA). Digestions were performed in a total volume of 50 µl, containing 25 µl of dsDNA (~1 µg), 5 µl of NEB Cutsmart Buffer (provided with restriction enzymes), 18 µl of HPLC grade water, 1 µl MboI (10 units), and 1 µl Sau3AI (10 units) under the following thermocycler profile: 37 °C for 18 h, then deactivation at 65 °C for 20 min. |
Generic Instrument Description | A thermal cycler or "thermocycler" is a general term for a type of laboratory apparatus, commonly used for performing polymerase chain reaction (PCR), that is capable of repeatedly altering and maintaining specific temperatures for defined periods of time. The device has a thermal block with holes where tubes with the PCR reaction mixtures can be inserted. The cycler then raises and lowers the temperature of the block in discrete, pre-programmed steps. They can also be used to facilitate other temperature-sensitive reactions, including restriction enzyme digestion or rapid diagnostics.
(adapted from http://serc.carleton.edu/microbelife/research_methods/genomics/pcr.html) |
NSF Award Abstract:
Where do young marine fish and shellfish come from? This project aims to improve our understanding of how coastal marine populations are connected in space and time. Coastal populations are replenished through the arrival of minuscule larvae that have been dispersed for weeks to months in the open ocean after spawning at remote sites. The combination of the long dispersal period of marine fish and shellfish larvae and the varying ocean currents results in complex patterns of "connectivity" among populations near and far. Identifying these patterns of connectivity is fundamental to marine science and critical for effective fisheries management and conservation, yet it remains an unresolved component of marine ecology. The study species is currently expanding its biogeographic range up the U.S. west coast. By genetically analyzing individuals from across the species' range, including offspring spawned in the laboratory by experimentally-crossed individuals collected in the field from throughout the species' historical and expanded range, certain genes can serve to differentiate populations along the coast. The team leverages the statistical power of these geographically-informative genes to assign thousands of young collected in the field to the source populations that spawned them (across the species' range and over multiple years). The team then quantifies patterns of connectivity over multiple years, and tests fundamental hypotheses on the spatial scale, temporal variability, biogeographic patterns, and biophysical drivers of population connectivity. The project trains approximately two dozen U.S. university students in molecular ecology and marine science, as well as creating intellectual linkages among Ph.D.-granting and non-Ph.D.-granting universities.
The project also supports further development of a K-12 education program that uses SCUBA diving and videography to teach elementary school students Next Generation Science Standards and train them for careers in science, technology, engineering and mathematics.
Using a kelp forest gastropod and fisheries species (Kellet's whelk, Kelletia kelletii), this project combines genome-wide Restriction site Associated DNA (RAD) loci with transcriptomic loci identified from common-garden laboratory crosses of individuals from the species' historical and expanded range to identify geographically-informative loci that maximize power for individual assignment testing. Leveraging the combined power of these loci, genetic assignment of approximately three thousand recruit samples to 20 putative source populations allows the team to construct three independent years of connectivity matrices and test some of the most fundamental questions in marine ecology, including: 1) Are marine populations open or closed and at what scales? 2) To what degree is the evolutionary pattern of gene flow represented by single versus multiple generations of connectivity events? And, 3) How spatially heterogeneous and temporally variable is population connectivity? Can one year of connectivity data predict anything about the next? Additionally, by focusing on a range-expanding species with common life history traits, the team addresses a number of questions with broad applicability and significant ecological and societal implications: 4) How much is population connectivity influenced by post-recruitment demographic and evolutionary processes? 5) How well-connected are historic- and expanded-range populations? And, of particular relevance to climate change, 6) Are El Nino oceanographic conditions, which are predicted to increase in frequency and intensity this century, driving the poleward range expansion of this coastal marine species? By coupling common-garden experimental crosses to identify maximally-informative transcriptomic loci with genomic RAD analysis of field samples, this project aims to accurately and precisely quantify marine population connectivity in high gene flow species with large population sizes.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Funding Source | Award |
---|---|
NSF Division of Ocean Sciences (NSF OCE) | |
NSF Division of Ocean Sciences (NSF OCE) | |
NSF Division of Ocean Sciences (NSF OCE) | |