Illinois Data Bank Dataset Search Results
Results
published:
2018-03-01
The data set consists of Illumina sequences derived from 48 sediment samples collected in 2015 from Lake Michigan and Lake Superior to inventory the fungal diversity in these two lakes. DNA was extracted from ca. 0.5 g of sediment using MoBio PowerSoil DNA isolation kits following the Earth Microbiome protocol. PCR was completed with the fungal primers ITS1F and fITS7 using the Fluidigm Access Array. The resulting amplicons were sequenced on the Illumina HiSeq 2500 platform with rapid 2 x 250 nt paired-end reads. The enclosed data sets contain the forward read files for both primers, both fixed-header index files, and the associated map files needed for processing in QIIME. In addition, enclosed are two rarefied OTU files used to evaluate fungal diversity. All decimal latitude and decimal longitude coordinates of our collecting sites are also included.
File descriptions:
Great_lakes_Map_coordinates.xlsx = coordinates of sample sites
QIIME Processing ITS1 region: These are the raw files used to process the ITS1 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS1_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS1F_Miller_Fludigm_I1_fixedheader.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS1F_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS1 region.
QIIME Processing ITS2 region: These are the raw files used to process the ITS2 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS2_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS7_Miller_Fludigm_I1_Fixedheaders.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS7_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS2 region.
Resulting OTU Table and OTU table with taxonomy
ITS1 Region
wahl_ITS1_R1_otu_table.csv = File contains Representative OTUs based on ITS1 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS1_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS1 region for all the R1 and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
ITS2 Region
wahl_ITS2_R1_otu_table.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS2_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
Rarefied Illumina dataset for each ITS Region
ITS1_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS1 region.
ITS2_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS2 region.
Column headings:
#SampleID = code including researcher initials and sequential run number
BarcodeSequence =
LinkerPrimerSequence = two sequences used CTTGGTCATTTAGAGGAAGTAA or GTGARTCATCGAATCTTTG
ReversePrimer = two sequences used GCTGCGTTCTTCATCGATGC or TCCTCCGCTTATTGATATGC
run_prefix = initials of run operator
Sample = location code, see thesis figures 1 and 2 for mapped locations and Great_lakes_Map_coordinates.xlsx for exact coordinates.
DepthGroup = S=shallow (50-100 m), MS=mid-shallow (101-150 m), MD=mid-deep (151-200 m), and D=deep (>200 m)
Depth_Meters = Depth in meters
Lake = lake name, Michigan or Superior
Nitrogen %
Carbon %
Date = mm/dd/yyyy
pH = acidity, potential of Hydrogen (pH) scale
SampleDescription = Sample or control
X = sequential run number
OTU ID = Operational taxonomic unit ID
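The DepthGroup codes above can be derived from Depth_Meters. The following is a minimal sketch, not the authors' code: the function name `depth_group` is hypothetical, and treating each upper bound as inclusive (so that a non-integer depth such as 100.5 m falls into MS) is an assumption, since the listed ranges only cover whole meters.

```python
def depth_group(depth_m: float) -> str:
    """Return the DepthGroup code for a sampling depth in meters.

    Ranges follow the column description: S (50-100 m), MS (101-150 m),
    MD (151-200 m), D (>200 m). Upper bounds are treated as inclusive,
    which is an assumption for non-integer depths.
    """
    if depth_m < 50:
        # The column description defines no group shallower than 50 m.
        raise ValueError("depth is below the shallowest documented group")
    if depth_m <= 100:
        return "S"
    if depth_m <= 150:
        return "MS"
    if depth_m <= 200:
        return "MD"
    return "D"
```

For example, `depth_group(75)` gives `"S"` and `depth_group(250)` gives `"D"`.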
keywords:
Illumina; next-generation sequencing; ITS; fungi
published:
2020-02-05
Zahniser, James; Dietrich, Christopher
(2020)
The Delt_Comb.NEX text file contains the original data used in the phylogenetic analyses of Zahniser & Dietrich, 2013 (European Journal of Taxonomy, 45: 1-211). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first nine lines of the file indicate the file type (Nexus), that 152 taxa were analyzed, that a total of 3971 characters were analyzed, the format of the data, and specification for two symbols used in the dataset. There are four datasets separated into blocks, one each for: 28S rDNA gene, Histone H3 gene, morphology, and insertion/deletion characters scored based on the alignment of the 28S rDNA dataset. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the publication using this dataset. A text file, Delt_morph_char.txt, is available here that states the morphological characters and character states that were scored in the Delt_Comb.NEX dataset. The original DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the publication. Chromatogram files for each sequencing read are available from the first author upon request.
keywords:
phylogeny; DNA sequence; morphology; parsimony analysis; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; bayesian analysis
published:
2019-02-26
Neumann, Elizabeth; Comi, Troy; Rubakhin, Stanislav; Sweedler, Jonathan
(2019)
We have recently created an approach for high throughput single cell measurements using matrix assisted laser desorption / ionization mass spectrometry (MALDI MS) (J Am Soc Mass Spectrom. 2017, 28, 1919-1928. doi: 10.1007/s13361-017-1704-1. Chemphyschem. 2018, 19, 1180-1191. doi: 10.1002/cphc.201701364). While chemical detail is obtained on individual cells, it has not been possible to correlate the chemical information with canonical cell types.
Now we combine high-throughput single cell mass spectrometry with immunocytochemistry to determine lipid profiles of two known cell types, astrocytes and neurons from the rodent brain, with the work appearing as “Lipid heterogeneity between astrocytes and neurons revealed with single cell MALDI MS supervised by immunocytochemical classification” (DOI: 10.1002/anie.201812892).
Here we provide the data collected for this study. The dataset provides the raw data and script files for the rodent cerebral cells described in the manuscript.
keywords:
Single cell analysis; mass spectrometry; astrocyte; neuron; lipid analysis
published:
2019-06-12
Miller, Andrew; Raudabaugh, Daniel
(2019)
The data set contains Supplemental data sets for the Manuscript entitled "Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi."
Environmental sampling: Amplification of the nuclear DNA regions (ITS1 and ITS2) was completed using the Fluidigm Access Array, and the resulting amplicons were sequenced on an Illumina MiSeq v2 platform using rapid 2 × 250 nt paired-end reads. Amplicons for the Illumina sequencing run were size selected into <500 nt and >500 nt sub-pools, then remixed (<500 nt : >500 nt) by nM concentration in a 1x:3x proportion. All amplification and sequencing steps were performed at the Roy J. Carver Biotechnology Center at the University of Illinois Urbana-Champaign.
ITS1 region primers consisted of ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3').
ITS2 region primers consisted of fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3').
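The fITS7 primer contains the IUPAC degenerate base R (A or G), so an exact string comparison against a read would miss half of the valid primer variants. The sketch below shows one way to match a degenerate primer at the start of a read. It is an illustration only, not the QIIME workflow used for this dataset, and the helper names `primer_regex` and `strip_primer` are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a primer with degenerate bases into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def strip_primer(read: str, primer: str):
    """Return the read without its leading primer, or None if it is absent."""
    match = primer_regex(primer).match(read)
    return read[match.end():] if match else None
```

With the fITS7 sequence given above, both `GTGAATCATCGAATCTTTG...` and `GTGAGTCATCGAATCTTTG...` are recognized as starting with the primer.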
Supplemental files 1 through 5 contain the raw data files.
Supplemental 1 is the ITS1 Illumina MiSeq forward reads and Supplemental 2 is the corresponding index files.
Supplemental 3 is the ITS2 Illumina MiSeq forward reads and Supplemental 4 is the corresponding index files.
Supplemental 5 is the map file needed to process the forward reads and index files in QIIME.
Supplemental 6 and 7 contain the resulting QIIME 1.9.1 OTU tables along with UNITE, NCBI, and CONSTAX taxonomic assignments in addition to the representative OTU sequence.
Numeric samples within the OTU tables correspond to the following:
1 Brachythecium sp.
2 Usnea cornuta
3 Dicranum sp.
4 Leucodon julaceus
5 Lobaria quercizans
6 Rhizomnium sp.
7 Dicranum sp.
8 Thuidium delicatulum
9 Myelochroa aurulenta
10 Atrichum angustatum
11 Dicranum sp.
12 Hypnum sp.
13 Atrichum angustatum
14 Hypnum sp.
15 Thuidium delicatulum
16 Leucobryum sp.
17 Polytrichum commune
18 Atrichum angustatum
19 Atrichum angustatum
20 Atrichum crispulum
21 Bryaceae
22 Leucobryum sp.
23 Conocephalum conicum
24 Climacium americanum
25 Atrichum angustatum
26 Huperzia serrata
27 Polytrichum commune
28 Diphasiastrum sp.
29 Anomodon attenuatus
30 Bryoandersonia sp.
31 Polytrichum commune
32 Thuidium delicatulum
33 Brachythecium sp.
34 Leucobryum glaucum
35 Bryoandersonia sp.
36 Anomodon attenuatus
37 Pohlia sp.
38 Cinclidium sp.
39 Hylocomium splendens
40 Polytrichum commune
41 negative control
42 Soil
43 Soil
44 Soil
45 Soil
46 Soil
47 Soil
If a sample number is not present within the OTU table, either no sequences were obtained or no sequences passed the quality filtering step in QIIME.
Supplemental 8 contains the Summary of unique species per location.
published:
2019-07-08
Kehoe, Adam K.; Torvik, Vetle I.
(2019)
# Overview
These datasets were created in conjunction with the dissertation "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents," by Adam Kehoe.
The datasets consist of the following:
* twin_not_abstract_matched_complete.tsv: a tab-delimited file consisting of pairs of MEDLINE articles with identical titles, authors and years of publication. This file contains the PMIDs of the duplicate publications, as well as their medical subject headings (MeSH) and three measures of their indexing consistency.
* twin_abstract_matched_complete.tsv: the same as above, except that the MEDLINE articles also have matching abstracts.
* mesh_training_data.csv: a comma-separated file containing the training data for the model discussed in the dissertation.
* mesh_scores.tsv: a tab-delimited file containing a pairwise similarity score based on word embeddings, and MeSH hierarchy relationship.
## Duplicate MEDLINE Publications
Both the twin_not_abstract_matched_complete.tsv and twin_abstract_matched_complete.tsv have the same structure. They have the following columns:
1. pmid_one: the PubMed unique identifier of the first paper
2. pmid_two: the PubMed unique identifier of the second paper
3. mesh_one: A list of medical subject headings (MeSH) from the first paper, delimited by the "|" character
4. mesh_two: a list of medical subject headings from the second paper, delimited by the "|" character
5. hoopers_consistency: The calculation of Hooper's consistency between the MeSH of the first and second paper
6. nonhierarchicalfree: a word embedding based consistency score described in the dissertation
7. hierarchicalfree: a word embedding based consistency score additionally limited by the MeSH hierarchy, described in the dissertation.
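As an illustration of the hoopers_consistency column, the sketch below computes Hooper's measure from two "|"-delimited MeSH lists, assuming the standard definition: the number of terms both records share, divided by the number of distinct terms used by either record. The dissertation may handle details such as qualifiers or major-topic markers differently, and returning 0.0 when both lists are empty is an assumption.

```python
def hoopers_consistency(mesh_one: str, mesh_two: str) -> float:
    """Hooper's consistency between two '|'-delimited MeSH term lists.

    Computed as A / (A + M + N), where A is the number of shared terms and
    M and N are the numbers of terms unique to each list.
    """
    terms_one = {t.strip() for t in mesh_one.split("|") if t.strip()}
    terms_two = {t.strip() for t in mesh_two.split("|") if t.strip()}
    all_terms = terms_one | terms_two
    if not all_terms:
        return 0.0  # assumption: two empty lists are scored as 0
    return len(terms_one & terms_two) / len(all_terms)
```

For two records indexed as `Humans|Mice|Neurons` and `Humans|Neurons|Astrocytes`, two of the four distinct terms are shared, so the score is 0.5.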
## MeSH Training Data
The mesh_training_data.csv file contains the training data for the model discussed in the dissertation. It has the following columns:
1. pmid: the PubMed unique identifier of the paper
2. term: a candidate MeSH term
3. cit_count: the log of the frequency of the term in the citation candidate set
4. total_cit: the log of the total number of the paper's citations
5. citr_count: the log of the frequency of the term in the citations of the paper's citations
6. total_citofcit: the log of the total number of the citations of the paper's citations
7. absim_count: the log of the frequency of the term in the AbSim candidate set
8. total_absim_count: the log of the total number of AbSim records for the paper
9. absimr_count: the log of the frequency of the term in the citations of the AbSim records
10. total_absimr_count: the log of the total number of citations of the AbSim record
11. log_medline_frequency: the log of the frequency of the candidate term in MEDLINE.
12. relevance: a binary indicator (True/False) if the candidate term was assigned to the target paper
## Cosine Similarity
The mesh_scores.tsv file contains a pairwise list of all MeSH terms including their cosine similarity based on the word embedding described in the dissertation. Because the MeSH hierarchy is also used in many of the evaluation measures, the relationship of the term pair is also included. It has the following columns:
1. mesh_one: a string of the first MeSH heading.
2. mesh_two: a string of the second MeSH heading.
3. cosine_similarity: the cosine similarity between the terms
4. relationship_type: a string identifying the relationship type, consisting of none, parent/child, sibling, ancestor and direct (terms are identical, i.e. a direct hierarchy match).
The mesh_model.bin file contains a binary word2vec C format file containing the MeSH term embeddings. It was generated using version 3.7.2 of the Python gensim library (https://radimrehurek.com/gensim/).
For an example of how to load the model file, see https://radimrehurek.com/gensim/models/word2vec.html#usage-examples, specifically the directions for loading the "word2vec C format."
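For reference, the cosine_similarity column corresponds to the usual cosine of the angle between two embedding vectors. The sketch below is a plain-Python version of that formula and is not taken from the dissertation code. The toy vectors used in the example are made up and are not real MeSH embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot(u, v) / (||u|| * ||v||)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Parallel vectors such as `[1, 2, 3]` and `[2, 4, 6]` score 1, and orthogonal vectors score 0. With gensim installed, `KeyedVectors.load_word2vec_format("mesh_model.bin", binary=True)` is the usual call for reading a word2vec C binary file such as mesh_model.bin.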
keywords:
MEDLINE;MeSH;Medical Subject Headings;Indexing
published:
2023-07-10
Harmon-Threatt, Alexandra N.; Anderson, Nicholas L.
(2023)
Bee movement between habitat patches in a naturally fragmented ecosystem depended on species, patch, and matrix variables. Using a mark-recapture methodology in the naturally fragmented Ozark glade ecosystem, we assessed the importance of bee size, nesting biology, the distance between patches (e.g., isolation), and nesting and floral resources in habitat patches and the surrounding matrix on bee movement.
This dataset includes seven data files, three R code files, and a QGIS tool. Three of the data files include information collected at the study sites with regard to bees and matrix and patch characteristics. The other four data files are spatial files used to quantify the characteristics of the forest canopy between the study sites and the edge-to-edge distances between the study sites. R code in the R Markdown file recreates the analysis and data presentation for the associated publication. R script files contain processes for calculating some of the explanatory variables used in the analysis. The QGIS tool can be used as the first step to obtaining average values from a raster file where the cells are large relative to the areas of interest (AOI) that you would like to characterize. The second step is contained in one of the aforementioned R scripts.
Detected effects included: Larger bees were more likely to move between patches. Bee movement was less likely as the distance between patches increased. However, relatively short distances (~50 m) inhibited movement more than our a priori expectations. Bees were unlikely to move away from home patches with abundant and diverse floral and below-ground nesting resources. When home patches were less resource-rich, bee movement depended on the characteristics of the away patch or the matrix. In these cases, bees were more likely to move to away patches with greater below-ground nesting and floral resources. Matrix habitats with more available floral and below-ground nesting resources appear to impede movement to neighboring patches, potentially because they already provide supplemental resources for bees.
keywords:
habitat fragmentation; bees; movement; mark-recapture; nesting resources; floral resources; isolation
published:
2024-07-08
Chong, Jer Pin; Minnaert-Grote, Jamie; Zaya, David N.; Ashley, Mary V.; Coons, Janice; Ramp Neal, Jennifer M.; Molano-Flores, Brenda
(2024)
A population genetics study was conducted on three plant taxa in the genus Physaria that are found on the Kaibab Plateau (Arizona, USA). Physaria kingii subsp. kaibabensis is endemic to the Kaibab Plateau, and is of conservation concern because of its rarity, limited range, and potential threats to its long-term persistence. Additionally, the taxon is a candidate for federal protection under the Endangered Species Act. It was not clear how genetically isolated P. k. subsp. kaibabensis was from Physaria kingii subsp. latifolia, which is a widespread subspecies found throughout the southwestern USA, including on the Kaibab Plateau. Additionally, other authors have suggested that P. k. subsp. kaibabensis may hybridize with Physaria arizonica, a different species that is also widespread and found on and off the Kaibab Plateau. We conducted a population genetics study of all three groups to better determine the conservation status of P. k. subsp. kaibabensis. Genetic data are in the form of nuclear DNA microsatellites for 13 loci (all apparently diploid). Additionally, we have included location information for the collection sites. We collected tissue samples from on and off the Kaibab Plateau. The overall findings are shared in a manuscript being submitted for peer-review.
keywords:
Physaria kingii; Kaibab Plateau; endemism; conservation genetics; rare species biology
published:
2019-08-29
This is the published ortholog set derived from whole genome data used for the analysis of members of the B. tabaci complex of whiteflies. It includes the concatenated alignment and individual gene alignments used for analyses (Link to publication: https://www.mdpi.com/1424-2818/11/9/151).
published:
2025-10-08
Kim, Sang Yeol; Stessman, Dan J.; Wright, David A.; Spalding, Martin H.; Huber, Steven; Ort, Donald
(2025)
Rubisco activase (Rca) facilitates the release of sugar‐phosphate inhibitors from the active sites of Rubisco and thereby plays a central role in initiating and sustaining Rubisco activation. In Arabidopsis, alternative splicing of a single Rca gene results in two Rca isoforms, Rca‐α and Rca‐β. Redox modulation of Rca‐α regulates the function of Rca‐α and Rca‐β acting together to control Rubisco activation. Although Arabidopsis Rca‐α alone less effectively activates Rubisco in vitro, it is not known how CO2 assimilation and plant growth are impacted. Here, we show that two independent transgenic Arabidopsis lines expressing Rca‐α in the absence of Rca‐β (“Rca‐α only” lines) grew more slowly in various light conditions, especially under low light or fluctuating light intensity, and in a short day photoperiod compared to wildtype. Photosynthetic induction was slower in the Rca‐α only lines, and they maintained a lower rate of CO2 assimilation during both photoperiod types. Our findings suggest Rca oligomers composed of Rca‐α only are less effective in initiating and sustaining the activation of Rubisco than when Rca‐β is also present. Currently there are no examples of any plant species that naturally express Rca‐α only but numerous examples of species expressing Rca‐β only. That Rca‐α exists in most plant species, including many C3 and C4 food and bioenergy crops, implies its presence is adaptive under some circumstances.
keywords:
Feedstock Production;Biomass Analytics;Phenomics
published:
2025-10-24
Maitra, Shraddha; Singh, Vijay
(2025)
Sweet sorghum is typically cultivated for the food and fodder market. Recently, sweet sorghum varieties are being metabolically transitioned to enhance energy density by accumulating oil droplets in their vegetative tissues for bioenergy applications. Owing to the high biomass yield of sorghum, the transgenic lines can compete with oil-seed crops for biodiesel yield per unit area. In the initial phase of transgenic development, a high-throughput phenotyping method can bridge the gap between the production pipeline and analysis to improve the efficiency of the process. To meet the requirement, the present study extends the application of time-domain 1H-NMR spectroscopy for rapid quantification and characterization of the total in-situ lipids of sweet sorghum ‘ramada’ to lay the groundwork for analyzing the upcoming large quantity of transgenic samples. NMR technology has been successfully established for analyzing lipid contents of vegetative tissues of non-transgenic variety. The multiexponential analysis of spin-lattice (T1) relaxation spectra obtained from TD-NMR aided the investigation of the dynamics of the free and bound lipid fraction with plant development. The total lipid concentration of bagasse and leaves of non-transgenic sweet sorghum remained unchanged throughout the plant development. Leaves displayed a higher percentage of bound lipids as compared to bagasse. A significant variation in the lipid concentration of juice was observed at the different growth stages with a maximum lipid accumulation of 1.21 ± 0.04% w/w at the boot stage that decreased with further maturity of the plant.
keywords:
Conversion;Biomass Analytics;Lipidomics;Metabolomics
published:
2021-10-15
Atomic oxygen densities in the MLT, averaged over 2002-2018 for 26 14-day periods beginning January 1.
keywords:
SABER data
published:
2025-04-04
Fang, Liri; Salami, Malik Oyewale; Weber, Griffin M.; Torvik, Vetle I.
(2025)
This dataset, uCite, is the union of nine large-scale open-access PubMed citation data sources, with citation pairs separated by reliability. There are 20 files, including the reliable and unreliable citation PMID pairs, non-PMID identifiers to PMID mapping (for DOIs, Lens, MAG, and Semantic Scholar), original PMID pairs from the nine resources, some metadata for PMIDs, duplicate PMIDs, some redirected PMID pairs, and PMC OA Patci citation matching results.
The short description of each data file is listed as follows. A detailed description can be found in the README.txt.
<strong>DATASET DESCRIPTION</strong>
<ol>
<li>PPUB.tsv.gz - tsv format file containing reliable citation pairs of uCite.</li>
<li>PUNR.tsv.gz - tsv format file containing unreliable citation pairs of uCite.</li>
<li>DOI2PMID.tsv.gz - tsv format file containing results mapping DOI to PMID.</li>
<li>LEN2PMID.tsv.gz - tsv format file containing results mapping LensID pairs to PMID pairs.</li>
<li>MAG2PMIDsorted.tsv.gz - tsv format file containing results mapping MAG ID to PMID.</li>
<li>SEM2PMID.tsv.gz - tsv format file containing results mapping Semantic Scholar ID to PMID.</li>
<li>JVNPYA.tsv.gz - tsv format file containing metadata of papers with PMID, journal name, volume, issue, pages, publication year, and first author's last name.</li>
<li>TiLTyAlJVNY.tsv.gz - tsv format file containing metadata of papers.</li>
<li>PMC-OA-patci.tsv.gz - tsv format file containing PubMed Central Open Access subset reference strings extracted by \cite{} processed by Patci.</li>
<li>REDIRECTS.gz - txt file containing unreliable PMID pairs mapped to reliable PMID pairs.</li>
<li>REMAP - file containing pairs of duplicate PubMed records (lhs PMID mapped to rhs PMID).</li>
<li>ami_pair.tsv.gz - tsv format file containing all citation pairs from Aminer (2015 version).</li>
<li>dim_pair.tsv.gz - tsv format file containing all citation pairs from Dimensions.</li>
<li>ice_pair.tsv.gz - tsv format file containing all citation pairs from iCite (April 2019 version, version 1).</li>
<li>len_pair.tsv.gz - tsv format file containing all citation pairs from Lens.org (harvested through Oct 2021).</li>
<li>mag_pair.tsv.gz - tsv format file containing all citation pairs from Microsoft Academic Graph (2015 version).</li>
<li>oci_pair.tsv.gz - tsv format file containing all citation pairs from Open Citations (Nov. 2021 dump, csv version).</li>
<li>pat_pair.tsv.gz - tsv format file containing all citation pairs from Patci (i.e., from "PMC-OA-patci.tsv.gz").</li>
<li>pmc_pair.tsv.gz - tsv format file containing all citation pairs from PubMed Central (harvest through Dec 2018 via e-Utilities).</li>
<li>sem_pair.tsv.gz - tsv format file containing all citation pairs from Semantic Scholar (2019 version).</li>
</ol>
<strong>COLUMN DESCRIPTION</strong>
<strong>FILENAME</strong> : <em>PPUB.tsv.gz, PUNR.tsv.gz</em>
(1) fromPMID - PubMed ID of the citing paper.
(2) toPMID - PubMed ID of the cited paper.
(3) sources - citation sources, in which the citation pairs are identified.
(4) fromYEAR - Publication year of the citing paper.
(5) toYEAR - Publication year of the cited paper.
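A minimal sketch of reading these five columns from a gzip-compressed, tab-separated file follows. It is not part of the dataset: the names `Citation` and `read_citations` are hypothetical, and whether the files begin with a header row is not stated here, so every value is kept as a string and rows without exactly five fields are skipped.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class Citation(NamedTuple):
    """One row of PPUB.tsv.gz or PUNR.tsv.gz, following the column order above."""
    from_pmid: str
    to_pmid: str
    sources: str
    from_year: str
    to_year: str


def read_citations(path_or_file) -> Iterator[Citation]:
    """Yield citation rows from a gzip-compressed TSV file or file object."""
    with gzip.open(path_or_file, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 5:
                continue  # assumption: malformed rows are skipped, not repaired
            yield Citation(*row)
```

Because the generator streams rows one at a time, it can be used on large files without loading them into memory. If the file does have a header line, it will come back as the first `Citation` and should be discarded by the caller.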
<strong>FILENAME</strong> : <em>DOI2PMID.tsv.gz</em>
(1) DOI - Digital Object Identifier of paper records.
(2) PMID - PubMed ID of paper records.
(3) PMID2 - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs.
<strong>FILENAME</strong> : <em>SEMID2PMID.tsv.gz</em>
(1) SemID - Semantic Scholar ID of paper records.
(2) PMID - PubMed ID of paper records.
(3) DOI - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs.
<strong>FILENAME</strong> : <em>JVNPYA.tsv.gz</em>
- Each row refers to a publication record.
(1) PMID - PubMed ID.
(2) journal - Journal name.
(3) volume - Journal volume.
(4) issue - Journal issue.
(5) pages - The first and last page numbers of the publication, separated by '-'; the last page is given without its leading digits.
(6) year - Publication year.
(7) lastname - Last name of the first author.
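The pages column stores the last page without its leading digits. The sketch below restores the full last page, assuming the abbreviation follows the common MEDLINE style in which the omitted leading digits are those of the first page (for example, 1919-28 meaning 1919 to 1928). The function name `expand_pages` is hypothetical, and the actual values should be checked against README.txt.

```python
from typing import Tuple


def expand_pages(pages: str) -> Tuple[str, str]:
    """Return (first_page, last_page) with the last page fully expanded."""
    first, sep, last = pages.partition("-")
    if not sep or not last:
        # Single-page record: first and last page are the same.
        return first, first
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page.
        last = first[: len(first) - len(last)] + last
    return first, last
```

Values are kept as strings so that page labels containing letters pass through unchanged.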
<strong>FILENAME</strong> : <em>TiLTyAlJVNY.tsv.gz</em>
(1) PMID - PubMed ID.
(2) title_tokenized - Paper title after tokenization.
(3) languages - Language that paper is written in.
(4) pub_types - Types of the publication.
(5) length(authors) - String length of author names.
(6) journal - Journal name.
(7) volume - Journal volume.
(8) issue - Journal issue.
(9) year - Publication year of print (not necessarily epub).
<strong>FILENAME</strong> : <em> PMC-OA-patci.tsv.gz</em>
(1) pmcid - PubMed Central identifier.
(2) pos -
(3) fromPMID - PubMed ID of the citing paper.
(4) toPMID - PubMed ID of the cited paper.
(5) SRC - citation sources, in which the citation pairs are identified.
(6) MatchDB - PubMed, ADS, DBLP.
(7) Probability - Matching probability predicted by Patci.
(8) toPMID2 - PubMed ID of the cited paper, extracted from OA xml file
(9) SRC2 - citation sources, in which the citation pairs are identified.
(10) intxt_id -
(11) jounal - First character of the journal name.
(12) same_ref_string - Y if patci and xml reference string match, otherwise N.
(13) DIFF -
(14) bestSRC - Citation sources, in which the citation pairs are identified.
(15) Match - Matching strings annotated by Patci.
<strong>FILENAME</strong> : <em>REDIRECTS.gz</em>
Each row in Redirectis.txt is a string in one of the following formats.
- "REDIRECTED FROM: source PMID_i PMID_j -> PMID_i' PMID_j "
- "REDIRECTED TO: source PMID_i PMID_j -> PMID_i PMID_j' "
Note: source is the names of sources where the PMID_i and PMID_j are from.
<strong>FILENAME</strong> : <em>REMAP</em>
Each row maps one duplicate PubMed record to another (lhs PMID mapped to rhs PMID).
The format of each row is "$REMAP{PMID_i} = PMID_j".
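Rows in this format can be read into a lookup table with a regular expression. The following is a sketch under stated assumptions: PMIDs are taken to be plain digits, a trailing semicolon is tolerated in case the rows are Perl statements, and lines that do not match are ignored. The name `parse_remap` is hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches rows such as: $REMAP{12345} = 67890
# The optional trailing semicolon is an assumption, not documented above.
_REMAP_ROW = re.compile(r"^\$REMAP\{(\d+)\}\s*=\s*(\d+)\s*;?$")


def parse_remap(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {lhs PMID: rhs PMID} dictionary from REMAP rows."""
    mapping: Dict[str, str] = {}
    for line in lines:
        match = _REMAP_ROW.match(line.strip())
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping
```

A PMID can then be normalized with `mapping.get(pmid, pmid)`, which leaves PMIDs without an entry unchanged.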
<strong>FILENAME</strong> : <em>ami_pair.tsv.gz, dim_pair.tsv.gz, ice_pair.tsv.gz, len_pair.tsv.gz, mag_pair.tsv.gz, oci_pair.tsv.gz, pat_pair.tsv.gz, pmc_pair.tsv.gz, sem_pair.tsv.gz</em>
(1) fromPMID - PubMed ID of the citing paper.
(2) toPMID - PubMed ID of the cited paper.
keywords:
Citation data; PubMed; Social Science
published:
2018-06-18
Clark, Lindsay V.; Jin, Xiaoli; Petersen, Karen K.; Anzoua, Kossanou G.; Bagmet, Larissa; Chebukin, Pavel; Deuter, Martin; Dzyubenko, Elena; Dzyubenko, Nicolay; Heo, Kweon; Johnson, Douglas A.; Jørgensen, Uffe; Kjeldsen, Jens B.; Nagano, Hironori; Peng, Junhua; Sabitov, Andrey; Yamada, Toshihiko; Yoo, Ji Hye; Yu, Chang Yeon; Long, Stephen P.; Sacks, Erik J.
(2018)
This repository contains datasets and R scripts that were used in a study of the population structure of Miscanthus sacchariflorus in its native range across East Asia. Notably, genotypes of 764 individuals at 34,605 SNPs, called from reduced-representation DNA sequencing using a non-reference bioinformatics pipeline, are provided. Two similar SNP datasets, used for identifying clonal duplicates and for determining the ancestry of ornamental and hybrid Miscanthus plants identified in previous studies respectively, are also provided. There is also a spreadsheet listing the provenance and ploidy of all individuals along with their plastid (chloroplast) haplotypes. Software output for Structure, Treemix, and DIYABC is also included. See README.txt for more information about individual files. Results of this study are described in a manuscript in revision in Annals of Botany by the same authors, "Population structure of Miscanthus sacchariflorus reveals two major polyploidization events, tetraploid-mediated unidirectional introgression from diploid Miscanthus sinensis, and diversity centered around the Yellow Sea."
keywords:
Miscanthus; restriction site-associated DNA sequencing (RAD-seq); single nucleotide polymorphism (SNP); population genetics; Miscanthus xgiganteus; Miscanthus sacchariflorus; R scripts; germplasm; plastid haplotype
published:
2022-05-13
Yan, Bin; Dietrich, Christopher; Yu, Xiaofei; Dai, Renhuai; Maofa, Yang
(2022)
The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned DNA sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
keywords:
Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published:
2025-12-08
Li, Shuai; Moller, Christopher; Mitchell, Noah G.; Martin, Duncan; Sacks, Erik; Saikia, Sampurna; Labonte, Nicholas R.; Baldwin, Brian S.; Morrison, Jesse; Ferguson, John; Leakey, Andrew; Ainsworth, Elizabeth
(2025)
The leaf economics spectrum (LES) describes multivariate correlations in leaf structural, physiological and chemical traits, originally based on diverse C3 species grown under natural ecosystems. However, the specific contribution of C4 species to the global LES is studied less widely. C4 species have a CO2 concentrating mechanism which drives high rates of photosynthesis and improves resource use efficiency, thus potentially pushing them towards the edge of the LES. Here, we measured foliage morphology, structure, photosynthesis, and nutrient content for hundreds of genotypes of the C4 grass Miscanthus × giganteus grown in two common gardens over two seasons. We show substantial trait variations across M. × giganteus genotypes and robust genotypic trait relationships. Compared to the global LES, M. × giganteus genotypes had higher photosynthetic rates, lower stomatal conductance, and less nitrogen content, indicating greater water and photosynthetic nitrogen use efficiency in the C4 species. Additionally, tetraploid genotypes produced thicker leaves with greater leaf mass per area and lower leaf density than triploid genotypes. By expanding the LES relationships across C3 species to include C4 crops, these findings highlight that M. × giganteus occupies the boundary of the global LES and suggest the potential for ploidy to alter LES traits.
keywords:
Feedstock Production;Biomass Analytics;Field Data
published:
2022-03-01
Cao, Yanghui; Dietrich, Christopher H.; Zahniser, James N.; Dmitriev, Dmitry A.
(2022)
The following files were used to reconstruct the phylogeny of the leafhopper subfamily Deltocephalinae, using IQ-TREE v1.6.12 and ASTRAL v 4.10.5.
<b>1) taxon_sampling.csv:</b> contains the sequencing ids (1st column) and the taxonomic information (2nd column) of each sample. Sequencing ids were used in the alignment files and partition files.
<b>2) concatenated_nt.phy:</b> concatenated nucleotide alignment used for the maximum likelihood analysis of Deltocephalinae by IQ-TREE v1.6.12. The file lists the sequences of 163,365 nucleotide positions from 429 genes in 730 samples. Hyphens are used to represent gaps.
<b>3) concatenated_nt_partition.nex:</b> the partitions for the concatenated nucleotide alignment. The file partitions the 163,365 nucleotide characters into 429 character sets, and defines the best substitution model for each character set.
<b>4) concatenated_aa.phy:</b> concatenated amino acid alignment used for the maximum likelihood analysis of Deltocephalinae by IQ-TREE v1.6.12. The file gives the sequences of 53,969 amino acids from 429 genes in 730 samples. Hyphens are used to represent gaps.
<b>5) concatenated_aa_partition.nex:</b> the partitions for the concatenated amino acid alignment. The file partitions the 53,969 characters into 429 character sets, and defines the best substitution model for each character set.
<b>6) concatenated_nt_106taxa.phy:</b> a reduced concatenated nucleotide alignment representing 107 samples x 86 genes. This alignment is used to estimate the divergence times of Deltocephalinae using MCMCTree in PAML v4.9. The file lists the sequences of 79,239 nucleotide positions from 86 genes in 107 samples. Hyphens are used to represent gaps.
<b>7) concatenated_nt_106taxa_partition.nex:</b> the partitions for the nucleotide alignment concatenated_nt_106taxa.phy. The file partitions the 79,239 nucleotide characters into 86 character sets, and defines the best substitution model for each character set.
<b>8) individual_gene_alignment.zip:</b> contains 429 FAS files, one for each of the partitioned nucleotide character sets in the concatenated_nt_partition.nex file. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5.
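A PHYLIP alignment conventionally opens with a header line giving the number of sequences and the number of aligned positions, so the dimensions quoted above for the two full alignments can be checked after download. The following is a minimal Python sketch of that check; the function names are hypothetical, and it assumes the files use the conventional two-number header.

```python
def read_phylip_header(first_line):
    """Parse the 'n_sequences n_positions' header that opens a PHYLIP alignment."""
    n_sequences, n_positions = first_line.split()[:2]
    return int(n_sequences), int(n_positions)

# Dimensions as stated in the file descriptions above.
EXPECTED = {
    "concatenated_nt.phy": (730, 163365),
    "concatenated_aa.phy": (730, 53969),
}

def matches_description(filename, first_line):
    """True if the header agrees with the dimensions given in the description."""
    return read_phylip_header(first_line) == EXPECTED[filename]

# Usage with a header string; in practice pass the first line of the file.
print(matches_description("concatenated_nt.phy", "730 163365"))
```

To check a downloaded file, read its first line with `open(path).readline()` and pass it to `matches_description`.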
published:
2025-04-23
Gonzalez Mozo, Laura C; Dietrich, Christopher
(2025)
These data files were used for phylogenomic analyses of Darnini and related Membracidae (Hemiptera: Auchenorrhyncha) in the referenced article by Gonzalez-Mozo et al.
- The "mem_50p_alignment.fas" file contains the aligned, concatenated nucleotide sequence data for 51 species and 492 genetic loci included in the phylogenetic analyses ("N" indicates missing data and "-" indicates an alignment gap).
- The file "Table1.rtf" lists the included species, country of origin and GenBank accession number. Species newly sequenced for this study have a Sample ID with prefix "DAR"; previously sequenced species for which data were downloaded from GenBank have "NCBI" indicated in the same column of the table.
- The file "partition_def.txt" lists the 492 genetic loci included in the alignment with their exact positions indicated by the range of numbers given at the end of each line (e.g., locus "uce-1" occupies positions 1-280 in the alignment).
- The substitution model file "mem_50p.model" contains information on the substitution models used in the partitioned maximum likelihood analysis, including the models used for different data partitions and parameter values, as output by the phylogenetic software IQ-TREE.
- Individual tree files in Newick format (plain text) are provided for the phylogeny from concatenated analysis with the best likelihood score ("mem_50p_bestLikelihoodScore"), concatenated likelihood analysis with gene concordance factors ("mem_50p_gcf") and site concordance factors ("mem_50p_scf").
- The tree file from the ASTRAL analysis is "mem_50p_astral".
- The zip archive entitled “IQ-TREE analysis results.zip” includes output from the maximum likelihood analysis of the concatenated nucleotide sequence data, including the following: (1) main output file “mem_50p.iqtree” summarizing model selection, partitioning schemes, likelihood scores, and run parameters; (2) “mem_50p.mldist” including pairwise ML distances between taxa; (3) “mem_50p.best_scheme.nex” with the best partitioning scheme identified by ModelFinder in NEXUS format and (4) “mem_50p.best_scheme”, the RAxML-compatible version of the same file.
- The “Ultrafast bootstrap results.zip” zip archive contains: (1) “mem_50p.ufboot” with the bootstrap replicate trees; (2) “mem_50p.contree” with the majority-rule consensus tree with support values; (3) “mem_50p.splits.nex”, with split support values across the replicates; (4) “mem_50p.log” is the log file.
- The “gene_trees.zip” zip archive contains the individual gene trees as input for subsequent coalescent gene tree analysis in the phylogenetic program ASTRAL.
- The file "DarniniAHE_Character Matrix.csv" contains the data for 6 morphological characters for which the ancestral states were reconstructed using the phylogenetic results from analysis of anchored-hybrid data (see article text for details).
- The file "scriptACRDarnini.txt" contains the commands used to reconstruct ancestral morphological character states using the corHMM 2.8 R package. See the Methods section of the article for more details.
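The layout described for "partition_def.txt" (a locus name followed by a position range at the end of each line) could be read along the following lines. This is a minimal Python sketch rather than the authors' code: the function names are hypothetical, and the "=" separator in the sample line is an assumption, since only the trailing range is described.

```python
import re

# Assumption: each line ends with an inclusive "start-end" range, as in the
# "uce-1 ... 1-280" example; the real file may separate fields differently.
RANGE_AT_END = re.compile(r"(\d+)\s*-\s*(\d+)\s*$")

def parse_partition_line(line):
    """Return (label, start, end) for one line, or None if no trailing range is found."""
    match = RANGE_AT_END.search(line)
    if match is None:
        return None
    label = line[:match.start()].strip(" =,\t")
    return label, int(match.group(1)), int(match.group(2))

def locus_length(start, end):
    """Number of alignment columns covered by an inclusive range."""
    return end - start + 1

label, start, end = parse_partition_line("uce-1 = 1-280")
print(label, start, end, locus_length(start, end))
```

Because the ranges are inclusive, locus "uce-1" at positions 1-280 spans 280 alignment columns.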
keywords:
Insecta; Hemiptera; anchored-hybrid enrichment; phylogeny; treehopper
published:
2021-10-15
Atomic oxygen data from SCIAMACHY, for the MLT, 2002-2012, averaged over 26 periods of 14 days each, beginning January 1.
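Twenty-six 14-day periods cover 364 days of each year. The Python sketch below shows one way to enumerate the period start dates and to map a date to its period; the function names are hypothetical, and how the dataset treats the remaining day or two at the end of the year is not stated in this description, so the sketch simply returns None for those days.

```python
from datetime import date, timedelta

N_PERIODS = 26    # from the dataset description
PERIOD_DAYS = 14  # from the dataset description

def period_starts(year):
    """Start dates of the 26 consecutive 14-day periods beginning January 1."""
    first = date(year, 1, 1)
    return [first + timedelta(days=PERIOD_DAYS * i) for i in range(N_PERIODS)]

def period_index(day):
    """Zero-based period index for a date, or None for days after the 26th period."""
    offset = (day - date(day.year, 1, 1)).days
    index = offset // PERIOD_DAYS
    return index if index < N_PERIODS else None

starts = period_starts(2002)
print(starts[0], starts[-1])
```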
keywords:
SCIAMACHY data
published:
2020-09-07
Chen, Luoye; Blanc-Betes, Elena; Hudiburg, Tara; Hellerstein, Daniel; Wallander, Steven; DeLucia, Evan; Khanna, Madhu
(2020)
This dataset contains BEPAM model code and input data to replicate the results for "Assessing the Returns to Land and Greenhouse Gas Savings from Producing Energy Crops on Conservation Reserve Program Land."
The dataset consists of:
(1) The replication codes and data for the BEPAM model. The code file is named as output_0213-2020_Complete_daycent-agversion-[rental payment level]%_[biomass price].gms. (BEPAM-CRP model-Sep2020.zip)
(2) Simulation results from the BEPAM model (BEPAM_Simulation_Results.csv)
* Item (1) is in GAMS format. Item (2) is in text format.
keywords:
Miscanthus; Switchgrass; soil carbon sequestration; greenhouse gas savings; rental payments; biomass price
published:
2021-01-27
Kwang, Jeffrey S.; Langston, Abigail L.; Parker, Gary
(2021)
*This is the third version of the dataset*. New changes in this 3rd version:
<i>1. replaces simulations where the initial condition consists of a sinusoidal channel with topographic perturbations with simulations where the initial condition consists of a sinusoidal channel without topographic perturbations. These simulations better illustrate the transformation of a nondendritic network into a dendritic one.
2. contains two additional simulations showing how total domain size affects the landscape's dynamism.
3. changes dataset title to reflect the publication's title</i>
This dataset contains data from 18 simulations using a landscape evolution model. A landscape evolution model simulates how uplift and rock incision shape the Earth's (or other planets') surface. To date, most landscape evolution models exhibit "extreme memory" (paper: https://doi.org/10.1029/2019GL083305 and dataset: https://doi.org/10.13012/B2IDB-4484338_V1). Extreme memory in landscape evolution models causes initial conditions to be unrealistically preserved.
This dataset contains simulations from a new landscape evolution model that incorporates a sub-model that allows bedrock channels to erode laterally. With this addition, the landscapes no longer exhibit extreme memory. Initial conditions are erased over time, and the landscapes tend towards a dynamic steady state instead of a static one. The model with lateral erosion is named LEM-wLE (Landscape Evolution Model with Lateral Erosion) and the model without lateral erosion is named LEM-woLE (Landscape Evolution Model without Lateral Erosion).
There are 16 folders in total. Here are the descriptions:
<i>>LEM-woLE_simulations:</i> This folder contains simulations using LEM-woLE. Inside the folder are 5 subfolders containing 100 elevation rasters, 100 drainage area rasters, and 100 plots showing the slope-area relationship. Elevation depicts the height of the landscape, and drainage area represents a contributing area that is upslope. Each folder corresponds to a different initial condition. Driver files and code for these simulations can be found at https://github.com/jeffskwang/LEM-wLE.
<i>>MOVIE_S#_data:</i> There are 13 data folders that contain raster data for 13 simulations using LEM-wLE. Inside each folder are 1000 elevation rasters, 1000 drainage area rasters, and 1000 plots showing the slope-area relationship. Driver files and code for these simulations can be found at https://github.com/jeffskwang/LEM-wLE.
<i>>movies_mp4_format:</i> For each data folder there are 3 movies generated that show elevation (a), drainage area (b), and erosion rates (c). These files are formatted in the mp4 format and are best viewed using VLC media player (https://www.videolan.org/vlc/index.html).
<i>>movies_wmv_format:</i> This folder contains the same movies as the "movies_mp4_format" folder, but they are in a wmv format. These movies can be viewed using Windows media player or other Windows platform movie software.
Here are the captions for the 13 movies:
Movie S1. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.
Movie S2. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Inclined with small, randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.
Movie S3. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Inclined with large, randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.
Movie S4. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: V-shaped valley with randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.
Movie S5. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel with randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.
Movie S6. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 0.25.
Movie S7. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 0.5.
Movie S8. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 0.75.
Movie S9. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.
Movie S10. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 2 open boundaries at the top and bottom of the domain, and 2 closed boundaries on the left and right sides. KL/KV = 1.
Movie S11. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 4 open boundaries. KL/KV = 1.
Movie S12. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 4 open boundaries. KL/KV = 1. Compared to Movie S11, the length of the domain is 50% shorter, decreasing the total domain area.
Movie S13. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 4 open boundaries. KL/KV = 1. Compared to Movie S11, the length of the domain is 50% longer, increasing the total domain area.
The associated publication for this dataset has not yet been published, and we will update this description with a link when it is.
keywords:
landscape evolution; drainage networks; lateral migration; geomorphology
published:
2020-12-03
Lee, Mindy; Applegate, Catherine; Shaffer, Annabelle; Emamaddin, Abrar; Erdman, John; Nakamura, Manabu
(2020)
This small dataset contains raw anthropometric and dietary intake data.
keywords:
Obesity treatment; weight management; high protein; high fiber; nonrestrictive; data visualization; self-empowerment; informed decision making
published:
2021-03-05
Adey, Amaryllis; Larson, Eric
(2021)
Adey_Larson_Behavior.csv: Results of behavioral assays for rusty crayfish Faxonius rusticus collected from six lakes in Vilas County, Wisconsin in summer 2018. Crayfish_ID is an individual crayfish ID or identifier that matches to individuals in Adey_Larson_Isotope. Collection is how organisms were collected (trapped = baited trapping, snorkel = by hand). Lake is the study lake crayfish were collected from. Length is crayfish carapace length in mm. CPUE is crayfish catch-per-unit effort from baited trapping in that lake during summer 2018. Shelter_Occupancy, Exploration, Feeding_Snail, Feeding_Detritus, Feeding_Crayfish, and Aggressiveness are behavioral assay scores for individual crayfish. Shelter_Occupancy is the frequency of observation intervals (12 maximum) in which crayfish were observed in shelter over a 12-hour period. Exploration is the time for crayfish to explore a new area measured in seconds (maximum possible time 1200 seconds or 20 minutes). Feeding_Snail, Feeding_Detritus, and Feeding_Crayfish are the times for crayfish to take a food item (snail, detritus, or snail in the presence of another crayfish) measured in seconds (maximum possible time 1200 seconds or 20 minutes). Aggressiveness is the response to an approach with a novel object scored as a fast retreat (-2), slow retreat (-1), no visible response (0), approach without threat display (1), approach with threat display (2), interaction with closed chelae (3), or interaction with open chelae (4). Three repeated aggressiveness measures were made per individual (Aggresiveness1, Aggresiveness2, Aggresiveness3), which were summed for inclusion in subsequent analyses (Aggresiveness_Sum). More detailed behavioral assay methods can be found in Adey's 2019 Master's thesis.
Adey_Larson_Isotope.csv: Stable isotope (13C, 15N) values for rusty crayfish Faxonius rusticus and snail or mussel primary consumers from six lakes in Vilas County, Wisconsin collected during summer 2018. Crayf is an individual crayfish ID or identifier that matches to the same individual crayfish in Adey_Larson_Behavior. Lake is the study lake. Collection is how organisms were collected (trapped = baited trapping, snorkel = by hand). Sample type indicates whether isotope values are for crayfish, snail, or mussel. d13C and d15N are stable isotope values.
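The summed aggressiveness score described for Adey_Larson_Behavior.csv can be recomputed from the three repeated measures. The Python sketch below illustrates this on an invented two-row sample; the column names are spelled as in the description above and may differ from the actual CSV header, and the function name and sample values are hypothetical.

```python
import csv
import io

# Column names as spelled in the description; the real header may differ.
REPEAT_COLUMNS = ["Aggresiveness1", "Aggresiveness2", "Aggresiveness3"]

def aggressiveness_sum(row):
    """Sum the three repeated aggressiveness scores (each from -2 to 4) for one crayfish."""
    return sum(int(row[name]) for name in REPEAT_COLUMNS)

# Invented sample rows, for illustration only (not taken from the dataset).
sample = io.StringIO(
    "Crayfish_ID,Aggresiveness1,Aggresiveness2,Aggresiveness3\n"
    "X1,-1,0,2\n"
    "X2,4,3,4\n"
)
for row in csv.DictReader(sample):
    print(row["Crayfish_ID"], aggressiveness_sum(row))
```

With three measures each scored from -2 to 4, the sum ranges from -6 to 12.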
keywords:
individual specialization; intraspecific competition; behavior; diet; stable isotopes; crayfish; invasive species; limnology; Faxonius rusticus
published:
2023-01-10
Ruess, Paul; Konar, Megan; Wanders, Niko; Bierkens, Marc
(2023)
Agriculture is the largest user of water in the United States. Yet, we do not understand the spatially resolved sources of irrigation water use by crop. The goal of this study is to estimate crop-specific irrigation water use from surface water withdrawals, total groundwater withdrawals, and nonrenewable groundwater depletion for the Continental United States. Water use by source is provided for 20 crops and crop groups from 2008 to 2020 at the county spatial resolution.
These results present the first national-scale assessment of irrigation by crop, county, water source, and year. In total, there are nearly 2.5 million data points in this dataset (3,142 counties; 13 years; 3 water sources; and 20 crops). This dataset supports the paper by Ruess et al (2023) in Water Resources Research, https://doi.org/10.1029/2022WR032804.
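The "nearly 2.5 million" figure follows from multiplying the four dimensions listed in parentheses. A short Python check of that arithmetic, using only the numbers quoted above (the variable names are illustrative):

```python
# Dimensions as quoted in the dataset description.
counties = 3142
years = len(range(2008, 2021))  # 2008 through 2020 inclusive
water_sources = 3
crops = 20

total = counties * years * water_sources * crops
print(years, total)
```

This gives 13 years and 2,450,760 county-year-source-crop combinations, consistent with the stated total.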
When using, please cite as:
Ruess, P.J., Konar, M., Wanders, N., & Bierkens, M. (2023). Irrigation by crop in the Continental United States from 2008 to 2020, Water Resources Research, 59, e2022WR032804. https://doi.org/10.1029/2022WR032804
keywords:
Water use; irrigation; surface water; groundwater; groundwater depletion; counties; crops; time series
published:
2023-04-06
Warnow, Tandy; Park, Minhyuk
(2023)
This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.
keywords:
sequence length heterogeneity;indelible;computational biology;multiple sequence alignment
published:
2025-07-14
Hossain, Mohammad Tanver; Piorkowski, Dakota; Lowe, Andrew; Eom, Wonsik; Shetty, Abhishek; Tawfick, Sameh; Fudge, Douglas; Ewoldt, Randy
(2025)
Data accompanying the article "Physics of Unraveling and Micromechanics of Hagfish Threads".
Abstract of the article:
Hagfish slime is a unique biological material composed of mucus and protein threads that rapidly deploy into a cohesive network when deployed in seawater. The forces involved in thread deployment and interactions among mucus and threads are key to understanding how hagfish slime rapidly assembles into a cohesive, functional network. Despite extensive interest in its biophysical properties, the mechanical forces governing thread deployment and interaction remain poorly quantified. Here, we present the first direct in situ measurements of the micromechanical forces involved in hagfish slime formation, including mucus mechanical properties, skein peeling force, thread–mucus adhesion, and thread–thread cohesion. Using a custom glass-rod force sensing system, we show that thread deployment initiates when peeling forces exceed a threshold of approximately 6.8 nN. To understand the flow strength required for unraveling, we used a rheo-optic setup to impose controlled shear flow, enabling us to directly observe unraveling dynamics and determine the critical shear rate for unraveling of the skeins, which we then interpreted using an updated peeling-based force balance model. Our results reveal that thread–mucus adhesion dominates over thread–thread adhesion and that deployed threads contribute minimally to bulk shear rheology at constant flow rate. These findings clarify the physics underlying the rapid, flow-triggered assembly of hagfish slime and inform future designs of synthetic deployable fiber–gel systems.
keywords:
supplementary data; hagfish slime; unraveling skeins