Illinois Data Bank Dataset Search Results
Results
published:
2018-10-24
Ugarte, Carmen M.; Wander, Michelle M.
(2018)
This dataset was compiled between 2010 and 2011 from data published in scientific articles evaluating the influence of cropping systems and soil management practices on soil organic carbon. We searched the Thomson Reuters Web of Science database and reviewed the reference sections of key peer-reviewed articles. Articles included in the database presented results from field sites within the continental United States.
keywords:
Cropping systems; soil management; soil organic carbon; soil quality.
published:
2021-04-11
Park, Minhyuk; Zaharias, Paul; Warnow, Tandy
(2021)
This dataset contains the RNASim1000 and Cox1-Het datasets, as well as analyses of RNASim1000, Cox1-Het, and 1000M1(HF).
keywords:
phylogeny estimation; maximum likelihood; RAxML; IQ-TREE; FastTree; cox1; heterotachy; disjoint tree mergers; Tree of Life
published:
2021-12-09
Burnham, Mark; Simon, Sandra; Lee, DK; Kent, Angela; DeLucia, Evan; Yang, Wendy
(2021)
These data were collected in 2018 and 2019 at the University of Illinois Energy Farm (N 40.063607, W 88.206926). During each growing season, bulk and rhizosphere soil were collected from replicate Sorghum bicolor nitrogen use efficiency trial plots at three separate time points (approximately July 1, August 1, and September 1). We measured soil moisture, pH, soil nitrate and ammonium, potential nitrification, potential denitrification, and extracted and sequenced the V4 region of the 16S rRNA gene for microbial community analysis. All microbial sequence data is archived in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (accession number SRP326979, project number PRJNA741261).
keywords:
soil nitrogen; nitrification; nitrogen cycle; sorghum; bioenergy; Center for Advanced Bioenergy and Bioproducts Innovation
published:
2018-12-01
Nelson, Andrew J; Lichiheb, Nebila; Koloutsou-Vakakis, Sotiria; Rood, Mark J.; Heuer, Mark; Myles, LaToya; Joo, Eva; Miller, Jesse; Bernacchi, Carl
(2018)
Ammonia flux measurement data using flux gradient and relaxed eddy accumulation methods, and ancillary environmental data collected during the 2014 corn-growing season in Central Illinois, USA. This Excel file contains two sheets: one README sheet, and one sheet containing all data. These data were used in the development of the manuscript titled "Ammonia Flux Measurements above a Corn Canopy using Relaxed Eddy Accumulation and a Flux Gradient System."
keywords:
Ammonia; Bi-directional Flux; Corn; Relaxed Eddy Accumulation; Flux Gradient; Urease Inhibitor
published:
2018-08-02
Data used to estimate the survival of Swainson's Thrushes crossing the Gulf of Mexico.
keywords:
capture history; thrush; survival
published:
2023-01-01
Cao, Yanghui; Dietrich, Christopher H.; Kits, Joel; Dmitriev, Dmitry A.; Xu, Ye; Huang, Min
(2023)
The following files were used to reconstruct the phylogeny of the leafhopper subfamily Typhlocybinae, using IQ-TREE v1.6.12 and ASTRAL v 4.10.5.
1) Taxon_sampling.csv: contains the sample IDs (1st column) and the taxonomic information (2nd column). Sample IDs were used in the alignment files and partition files.
2) concatenated_nt_complete.phy: a complete concatenated nucleotide dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 154,992 nucleotide positions (intron included) from 665 loci. Hyphens are used to represent gaps.
3) concatenated_nt_complete_partition.nex: the partitioning schemes for concatenated_nt_complete.phy. The file partitions the 154,992 nucleotide characters into 426 character sets, and defines the best substitution model for each character set.
4) concatenated_cds_complete.phy: a complete concatenated coding DNA sequence dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 153,525 nucleotide positions (intron excluded) from 665 loci. Hyphens are used to represent gaps.
5) concatenated_cds_complete_partition.nex: the partitioning schemes for concatenated_cds_complete.phy. The file partitions the 153,525 nucleotide characters into 426 character sets, and defines the best substitution model for each character set.
6) concatenated_nt_reduced.phy: a reduced concatenated nucleotide dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 95,076 nucleotide positions (intron included) from 374 loci. Hyphens are used to represent gaps.
7) concatenated_nt_reduced_partition.nex: the partitioning schemes for concatenated_nt_reduced.phy. The file partitions the 95,076 nucleotide characters into 312 character sets, and defines the best substitution model for each character set.
8) concatenated_aa_complete.phy: a complete concatenated amino acid dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12, corresponding to concatenated_cds_complete.phy. The file lists the sequences of 248 samples with 51,175 amino acid positions from 665 loci. Hyphens are used to represent gaps.
9) concatenated_aa_complete_partition.nex: the partitioning schemes for concatenated_aa_complete.phy. The file partitions the 51,175 amino acid characters into 426 character sets, and defines the best substitution model for each character set.
10) concatenated_aa_reduced.phy: a reduced concatenated amino acid dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12, corresponding to concatenated_nt_reduced.phy. The file lists the sequences of 248 samples with 31,384 amino acid positions from 374 loci. Hyphens are used to represent gaps.
11) concatenated_aa_reduced_partition.nex: the partitioning schemes for concatenated_aa_reduced.phy. The file partitions the 31,384 amino acid characters into 312 character sets, and defines the best substitution model for each character set.
12) Individual_gene_alignment.zip: contains 426 FASTA files, each of which is an alignment for a gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5 based on the consensus trees with a minimum average bootstrap value of 70.
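The position counts listed above allow a quick consistency check: a coding-sequence alignment should have exactly three nucleotide positions per amino acid position. The following is a minimal Python sketch, assuming the .phy files start with a standard PHYLIP header line giving the number of sequences and the number of positions. The helper names `read_phylip_dims` and `codon_consistent` are hypothetical and are not part of the dataset.

```python
def read_phylip_dims(header_line):
    """Parse a PHYLIP header line such as '248 153525' into (n_taxa, n_sites)."""
    fields = header_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<n_taxa> <n_sites>' in the PHYLIP header")
    return int(fields[0]), int(fields[1])


def codon_consistent(cds_sites, aa_sites):
    """Return True if the CDS alignment has exactly three sites per amino acid."""
    return cds_sites == 3 * aa_sites


# Counts as reported in the file descriptions above (complete datasets).
n_taxa, cds_sites = read_phylip_dims("248 153525")
print(n_taxa, codon_consistent(cds_sites, 51175))  # prints: 248 True
```

The same check is not expected to hold between concatenated_nt_reduced.phy and concatenated_aa_reduced.phy, because the reduced nucleotide dataset includes introns.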
keywords:
Auchenorrhyncha, Cicadomorpha, Membracoidea, anchored hybrid enrichment
published:
2022-12-28
Harmon, Gabriel T.; Harmon-Threatt, Alexandra N.; Anderson, Nicholas L.
(2022)
The effect of pesticide contamination on arthropod biomass and diversity in simulated prairie restorations depended on arthropod feeding guild (e.g., predator, herbivore, or pollinator). The pesticides used in this study were the neonicotinoid insecticide clothianidin and the phthalimide fungicide captan. This dataset includes two data files. The first contains information about the study sites ("plots") and pesticide treatments. The second contains information about arthropod biomass and morphospecies richness separated by feeding guild for each month-plot combination. R code in an R Markdown file for the analysis and data presentation in the associated publication is also provided. Detected effects included: predator biomass was 66% lower in plots treated with clothianidin, and this effect persisted across the growing season; the impact on herbivore biomass appeared to be inconsistent, with biomass being 51% lower with clothianidin in June but no detected difference in July or August; herbivore morphospecies richness was 12% lower in plots treated with both clothianidin and captan; pollinators appeared to be unaffected by clothianidin; and pollinator biomass increased by 71% when captan was applied to a plot.
keywords:
Arthropod decline; pesticide; clothianidin; captan; habitat restoration; trophic effects; insects
published:
2026-01-09
Schultz, J Carl; Cao, Mingfeng; Zhao, Huimin
(2026)
Rhodotorula toruloides has been increasingly explored as a host for bioproduction of lipids, fatty acid derivatives and terpenoids. Various genetic tools have been developed, but neither a centromere nor an autonomously replicating sequence (ARS), both necessary elements for stable episomal plasmid maintenance, has yet been reported. In this study, cleavage under targets and release using nuclease (CUT&RUN), a method used for genome-wide mapping of DNA–protein interactions, was used to identify R. toruloides IFO0880 genomic regions associated with the centromeric histone H3 protein Cse4, a marker of centromeric DNA. Fifteen putative centromeres ranging from 8 to 19 kb in length were identified and analyzed, and four were tested for, but did not show, ARS activity. These centromeric sequences contained below average GC content, corresponded to transcriptional cold spots, were primarily nonrepetitive and shared some vestigial transposon-related sequences but otherwise did not show significant sequence conservation. Future efforts to identify an ARS in this yeast can utilize these centromeric DNA sequences to improve the stability of episomal plasmids derived from putative ARS elements.
keywords:
Genome Engineering; Genomics
published:
2019-02-02
The bee visitation data include the percentage of each bee pollinator group recorded in bee bowls and in observations. The data are referenced in the article with the following citation:
Bennett, A.B., Lovell, S.T. 2019. Landscape and local site variables differentially influence pollinators and pollination services in urban agricultural sites. Accepted for publication in: PLOS ONE.
published:
2017-12-04
Zaya, David N.; Leicht-Young, Stacey A.; Pavlovic, Noel; Hetrea, Christopher S.; Ashley, Mary V.
(2017)
Data used for Zaya et al. (2018), published in Invasive Plant Science and Management DOI 10.1017/inp.2017.37, are made available here. There are three spreadsheet files (CSV) available, as well as a text file that has detailed descriptions for each file ("readme.txt"). One spreadsheet file ("prices.csv") gives pricing information, associated with Figure 3 in Zaya et al. (2018). The other two spreadsheet files are associated with the genetic analysis, where one file contains raw data for biallelic microsatellite loci ("genotypes.csv") and the other ("structureResults.csv") contains the results of Bayesian clustering analysis with the program STRUCTURE. The genetic data may be especially useful for future researchers. The genetic data contain the genotypes of the horticultural samples that were the focus of the published article, and also genotypes of nearly 400 wild plants. More information on the location of the wild plant collections can be found in the Supplemental information for Zaya et al. (2015) Biological Invasions 17:2975–2988 DOI 10.1007/s10530-015-0926-z. See "readme.txt" for more information.
keywords:
Horticultural industry; invasive species; microsatellite DNA; mislabeling; molecular testing
published:
2019-07-27
Clark, Lindsay V.; Dwiyanti, Maria Stefanie; Anzoua, Kossonou G.; Brummer, Joe E.; Glowacka, Katarzyna; Hall, Megan; Heo, Kweon; Jin, Xiaoli; Lipka, Alexander E.; Peng, Junhua; Yamada, Toshihiko; Yoo, Ji Hye; Yu, Chang Yeon; Zhao, Hua; Long, Stephen P.; Sacks, Erik J.
(2019)
Genotype calls are provided for a collection of 583 Miscanthus sinensis clones across 1,108,836 loci mapped to version 7 of the Miscanthus sinensis reference genome. Sequence and alignment information for all unique RAD tags is also provided to facilitate cross-referencing to other genomes.
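A downloaded copy of the genotype calls can be checked against the counts stated above (583 clones, 1,108,836 loci). The sketch below assumes the calls are distributed as a tab-delimited VCF text file, as the keywords suggest, with the nine standard fixed columns before the sample columns. The function name `summarize_vcf` and the toy lines are illustrative only.

```python
def summarize_vcf(lines):
    """Count samples and variant records in an iterable of VCF text lines."""
    n_samples = None
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines carry no genotype data
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # Columns CHROM..FORMAT are fixed (9 in total); the rest are samples.
            n_samples = max(len(columns) - 9, 0)
            continue
        if line.strip():
            n_records += 1
    return n_samples, n_records


# Toy input standing in for the real file (two samples, one locus).
toy = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tclone1\tclone2\n",
    "Chr01\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\n",
]
print(summarize_vcf(toy))  # prints: (2, 1)
```

For the real file, the same function can be given an open file handle; the result would be expected to match the sample and locus counts in the description if the assumptions above hold.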
keywords:
variant call format (VCF); sequence alignment/map format (SAM); miscanthus; single nucleotide polymorphism (SNP); restriction site-associated DNA sequencing (RAD-seq); bioenergy; grass
published:
2024-10-08
Mersich, Ina; Bishop, Rebecca; Diaz Yucupicio, Sandra; Nobrega, Ana D.; Austin, Scott; Barger, Anne; Fick, Megan E.; Wilkins, Pamela
(2024)
Acepromazine was administered to healthy adult horses to induce transient anemia secondary to splenic sequestration. Data was collected at baseline (T0), 1 hour (T1) and 12 hours (T2) post acepromazine administration. Data collection included PCV, TP, CBC, fibrinogen, PT, PTT and viscoelastic coagulation profiles (VCM Vet) as well as ultrasonographic measurements of the spleen at all 3 time points.
keywords:
horse; coagulation; viscoelastic testing; anemia; acepromazine
published:
2021-03-15
Stodola, Alison P.; Lydeard, Charles; Lamer, James T.; Douglass, Sarah A.; Cummings, Kevin; Campbell, David
(2021)
Dataset associated with "Hiding in plain sight: genetic confirmation of putative Louisiana Fatmucket Lampsilis hydiana in Illinois" as submitted to Freshwater Mollusk Biology and Conservation by Stodola et al. Images are from cataloged specimens from the Illinois Natural History Survey (INHS) Mollusk Collection in Champaign, Illinois that were used for genetic research. File names indicate the species as confirmed in Stodola et al. (i.e., Lampsilis siliquoidea or Lampsilis hydiana) followed by the INHS Mollusk Collection catalog number, followed by the individual specimen number, followed by shell view (interior or exterior). If no specimen number is noted in the file name, there is only one specimen for that catalog number. For example: Lsiliquoidea_46515_1_2_3_exterior.
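The file naming convention described above can be sketched as a small Python parser. This is an illustration under stated assumptions, not part of the dataset: it assumes fields are separated by underscores, that the last field is the shell view, and that any fields between the catalog number and the view are specimen numbers (so "1_2_3" is read as three specimens). The function name `parse_image_name` is hypothetical.

```python
from pathlib import Path


def parse_image_name(filename):
    """Split an image file name into species, catalog number, specimens and view."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    view = parts[-1]
    if view not in ("interior", "exterior"):
        raise ValueError(f"unexpected shell view: {view!r}")
    return {
        "species": parts[0],
        "catalog_no": parts[1],
        # Empty when the catalog number has a single specimen.
        "specimens": parts[2:-1],
        "view": view,
    }


print(parse_image_name("Lsiliquoidea_46515_1_2_3_exterior"))
```

Images that show both views in one file would not match this pattern and would need separate handling.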
Images were created by photographing specimens on a metric grid in an OrTech Photo-e-Box Plus with a Nikon D610 single lens reflex camera using a 60mm lens. Post-processing of images (cropping, image rotation, and auto contrast) was done in Adobe Photoshop, and the images were saved as TIFF files using no image compression, interleaved pixel order, and IBM PC Byte Order. One additional partial lot, INHS Mollusk Catalog No. 37059 (shown with both interior and exterior view in one image), is included for reference but was not genetically sequenced. A .csv file contains an index of all specimens photographed.
SPECIES: species confirmed using genetic analyses
GENE: cox1 or nad1 mitochondrial gene
ACCESSION: GenBank accession number
INHS CATALOG NO: Illinois Natural History Survey Mollusk Collection Catalog number
WATERBODY: waterbody where specimen was collected
PUTATIVE SPECIES: species determination based on morphological characters prior to genetic analysis
Phylogenetic sequence data (.nex files) were aligned using BioEdit (Hall, T.A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95-98.). Pertinent methodology for the analysis is contained within the manuscript submitted by Stodola et al. to Freshwater Mollusk Biology and Conservation. In these files, "N" is a standard symbol for an unknown base.
keywords:
Lampsilis hydiana; Lampsilis siliquoidea; unionid; Louisiana Fatmucket; Fatmucket; genetic confirmation
published:
2024-01-31
Wang, Xiudan; Dietrich, Christopher; Zhang, Yalin
(2024)
The included files were used to reconstruct the phylogeny of Coelidiinae using combined morphological and molecular data, estimate divergence times and reconstruct ancestral biogeographic areas as described in the manuscript submitted for publication. The file “Coelidiinae_dna_morph_combined.nex” is a text file in standard NEXUS format used by various phylogenetic analysis programs. This file includes the aligned and concatenated nucleotide sequences of five gene regions (mitochondrial COI and 16S, and nuclear 28S D-2, histone H3, histone H2A and wingless) indicated by standard “ACGT” nucleotide symbols with missing data indicated by “?”, and morphological character data as defined in Table S3 used in the analyses. The data partitions are indicated toward the end of the file by ranges of numbers (“charset Subset 1 – 4” for the DNA data and “charset morph” for the morphological characters) followed by commands for the phylogenetic analysis program MrBayes that specify the model settings for each data partition. Detailed data on species included (as rows) in the dataset, including collection localities and GenBank accession numbers, are provided in the Table_S1_Specimen_information.csv file. The file "TablesS2-S4.pdf" lists the primers used for polymerase chain reaction amplification, the list of morphological character definitions, and the morphological character matrix. The file “RASP_Distribution.csv” contains a list of the species included in the phylogenetic dataset (first column) and a code (second column) indicating their distributions as follows: (A) Oriental, (B) Palaearctic, (C) Australian, (D) Afrotropical, (E) Neotropical, and (F) Nearctic. More than one letter indicates that the species occurs in more than one region. The file "infile_for_BEAST.txt" is the input file in XML format used for the molecular divergence time analysis using the program BEAST (Bayesian Evolutionary Analysis by Sampling Trees) as described in the Methods section of the manuscript.
This file includes comments that document the steps of the analysis.
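The distribution codes in “RASP_Distribution.csv” can be expanded into region names with a small lookup. The sketch below is illustrative: it assumes multi-region codes are written as contiguous letters (for example "AB"), and the function name `decode_distribution` is hypothetical.

```python
# Letter codes as defined in the description of RASP_Distribution.csv.
REGIONS = {
    "A": "Oriental",
    "B": "Palaearctic",
    "C": "Australian",
    "D": "Afrotropical",
    "E": "Neotropical",
    "F": "Nearctic",
}


def decode_distribution(code):
    """Translate a code such as 'AB' into a list of region names."""
    # Unknown letters raise KeyError rather than being silently skipped.
    return [REGIONS[letter] for letter in code.strip().upper()]


print(decode_distribution("AB"))  # prints: ['Oriental', 'Palaearctic']
```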
keywords:
leafhopper; phylogeny; DNA sequence; insect; timetree; biogeography
published:
2018-04-05
GBS data from Phaseolus accessions, for a study led by Dr. Glen Hartman, UIUC. The (zipped) fastq file can be processed with the TASSEL GBS pipeline or other pipelines for SNP calling. The related article has been submitted and the methods section describes the data processing in detail.
published:
2020-10-01
Fraterrigo, Jennifer; Rembelski, Mara
(2020)
We measured the effects of fire or drought treatment on plant, microbial and biogeochemical responses in temperate deciduous forests invaded by the annual grass Microstegium vimineum with a history of either frequent fire or fire exclusion.
Please note, on Documentation tab / Experimental or Sampling Design, “15 (XVI)” should be “16 (XVI)”.
keywords:
plant-soil interaction; grass-fire cycle; Microstegium; carbon and nitrogen cycling; microbial decomposers
published:
2025-10-10
Yang, Pan; Cai, Ximing; Leibensperger, Carrie; Khanna, Madhu
(2025)
The success of a bioenergy policy relies largely on the wide adoption of perennial energy crops at the farm scale. This study uses survey data to examine potential adoption decisions by farmers in the U.S. Midwest and the causal effects of various direct and indirect influencing factors, especially heterogeneous preferences of farmers. A Bayesian network (BN) model is developed to delineate the causal relationship between farmers' adoption decisions and the influencing factors. We find a dominating role of economic factors and a non-negligible impact of non-economic factors, such as the perceived environmental benefits and the extent of familiarity with perennial energy crops. To examine the effect of heterogeneity in farmer preferences, we classify the surveyed farmers into four categories based on their attitudes toward the economic, social, and environmental dimensions of perennial energy crops. We identified statistically significant between-group differences in the responses of the four types of farmers to the various influencing factors. Our findings contribute to disentangling the complicated motivations that will influence perennial energy crop adoption decisions and provide implications for more targeted policy development that needs to consider the heterogeneous drivers of farmer decisions about land use.
keywords:
Sustainability; Modeling
published:
2024-09-17
Cao, Yanghui; Dietrich, Christopher H.; Dmitriev, Dmitry A.; Kits, Joel H.; Xue, Qingquan; Zhang, Yalin
(2024)
The following seven zip files are compressed folders containing the input datasets/trees, main output files and the scripts of the related analyses performed in this study.
I. ancestral_microhabitat_reconstruction.zip: contains four files, including two input files (microhabitats.csv, timetree.tre) and a script (simmap_microhabitat.R) for ancestral states reconstruction of microhabitat by make.simmap implemented in the R package phytools v1.5, as well as the main output file (ancestral_microhabitats.csv).
1. ancestral_microhabitats.csv: reconstructed ancestral microhabitats for each node.
2. microhabitats.csv: microhabitats of the studied species.
3. simmap_microhabitat.R: the R script of make.simmap for ancestral microhabitat reconstruction
4. timetree.tre: dated tree used for ancestral state reconstruction for microhabitat and morphological characters
II. ancestral_morphology_reconstruction.zip: contains six files, including an input file (morphology.csv) and a script (simmap_morphology.R) for ancestral states reconstruction of morphology by make.simmap implemented in the R package phytools v1.5, as well as four main output files (forewing_ancestral_state.csv, frontal_sutures_ancestral_state.csv, hind_wing_ancestral_state.csv, ocellus_ancestral_state.csv).
1. forewing_ancestral_state.csv: reconstructed ancestral states of the development of the forewing for each node.
2. frontal_sutures_ancestral_state.csv: reconstructed ancestral states of the development of frontal sutures for each node.
3. hind_wing_ancestral_state.csv: reconstructed ancestral states of the development of the hind wing for each node.
4. morphology.csv: the states of the development of ocellus, forewing, hind wing and frontal sutures for each studied species.
5. ocellus_ancestral_state.csv: reconstructed ancestral states of the development of the ocellus for each node.
6. simmap_morphology.R: the R script of make.simmap for ancestral state reconstruction of morphology
III. biogeographic_reconstruction.zip: contains four files, including three input files (dispersal_probablity.txt, distributions.csv, timetree_noOutgroup.tre) used for a stratified biogeographic analysis by BioGeoBEARS in RASP v4.2 and the main output file (DIVELIKE_result.txt).
1. dispersal_probablity.txt: relative dispersal probabilities among biogeographical regions at different geological epochs.
2. distributions.csv: current distributions of the studied species.
3. DIVELIKE_result.txt: BioGeoBEARS result of ancestral areas based on the DIVELIKE model.
4. timetree_noOutgroup.tre: the dated tree with the outgroup lineage (Eurymelinae) excluded.
IV. coalescent_analysis.zip: contains a folder and two files, including a folder (individual_gene_alignment) of input files used to construct gene trees, an input file (MLtree_BS70.tre) used for the multi-species coalescent analysis by ASTRAL v 4.10.5 and the main output file (coalescent_species_tree.tre).
1. coalescent_species_tree.tre: the species tree generated by the multi-species coalescent analysis with the quartet support, effective number of genes and the local posterior probability indicated.
2. individual_gene_alignment: a folder containing 427 FASTA files, each one represents the nucleotide alignment for a gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12.
3. MLtree_BS70.tre: 165 gene trees with the average SH-aLRT and ultrafast bootstrap values of ≥ 70%. This file was used to estimate the species tree by ASTRAL v 4.10.5.
V. divergence_time_estimation.zip: contains five files, including two input files (treefile_rooted_noBranchLength.tre, treefile_rooted.tre) and two control files (baseml.ctl, mcmctree.ctl) used for divergence time estimation by BASEML and MCMCTREE in PAML v4.9, as well as the main output file (timetree_with95%HPD.tre).
1. baseml.ctl: the control file used for the estimation of substitution rates by BASEML in PAML v4.9.
2. mcmctree.ctl: the control file used for the estimation of divergence times by MCMCTREE in PAML v4.9.
3. timetree_with95%HPD.tre: dated tree with the 95% highest posterior density confidence intervals indicated.
4. treefile_rooted_noBranchLength.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with calibrations for the crown and internal nodes. Branch length and support values were not indicated.
5. treefile_rooted.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with a secondary calibration on the root age. Branch support values were not indicated.
VI. maximum_likelihood_analysis_aa.zip: contains three files, including two input files (concatenated_aa_partition.nex, concatenated_aa.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_aa.tre).
1. concatenated_aa_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_aa.phy. This file partitions the 52,024 amino acid positions into 427 character sets.
2. concatenated_aa.phy: a concatenated amino acid dataset with 52,024 amino acid positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis.
3. MLtree_aa.tre: the maximum likelihood tree based on the concatenated amino acid dataset, with SH-aLRT values and ultrafast bootstrap values indicated.
VII. maximum_likelihood_analysis_nt.zip: contains three files, including two input files (concatenated_nt_partition.nex, concatenated_nt.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_nt.tre).
1. concatenated_nt_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_nt.phy. This file partitions the 156,072 nucleotide positions into 427 character sets.
2. concatenated_nt.phy: a concatenated nucleotide dataset with 156,072 nucleotide positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis as well as divergence time estimation.
3. MLtree_nt.tre: the maximum likelihood tree based on the concatenated nucleotide dataset, with SH-aLRT values and ultrafast bootstrap values indicated.
VIII. Taxon_sampling.csv: contains the sample IDs (1st column) which were used in the alignments and the taxonomic information (2nd to 6th columns).
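A downloaded archive can be compared against the file list given above. The following Python sketch checks one archive (ancestral_microhabitat_reconstruction.zip) using only the standard library. It compares base names only, since whether the archive wraps its files in a top-level folder is not stated. The function name `compare_manifest` is hypothetical, and the demonstration uses an in-memory stand-in rather than the real archive.

```python
import io
import zipfile

# File names as listed for ancestral_microhabitat_reconstruction.zip.
EXPECTED_MICROHABITAT = {
    "ancestral_microhabitats.csv",
    "microhabitats.csv",
    "simmap_microhabitat.R",
    "timetree.tre",
}


def compare_manifest(zip_source, expected):
    """Return (missing, unexpected) file names for a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Keep base names only and skip directory entries.
        found = {
            name.rsplit("/", 1)[-1]
            for name in archive.namelist()
            if not name.endswith("/")
        }
    return sorted(expected - found), sorted(found - expected)


# Stand-in archive with one listed file deliberately left out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    for name in sorted(EXPECTED_MICROHABITAT - {"timetree.tre"}):
        demo.writestr("ancestral_microhabitat_reconstruction/" + name, "")
buffer.seek(0)
print(compare_manifest(buffer, EXPECTED_MICROHABITAT))  # prints: (['timetree.tre'], [])
```

Passing a path to the downloaded zip file instead of the buffer applies the same check to the real archive.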
keywords:
Anchored Hybrid Enrichment, Biogeography, Cicadellidae, Phylogenomics, Treehoppers
published:
2017-12-15
These are the results of an 8-month cohort study in two commercial dairy herds in Northwest Illinois. From each herd, 50 cows were selected at random, stratified over lactations 1 to 3. Serum from these animals was collected every two months and tested for antibodies to Bovine Leukosis Virus, Neospora caninum, and Mycobacterium avium subsp. paratuberculosis. Animals that left the herd during the study were replaced by another animal in the same herd and lactation. At the last sampling, serum neutralization assays were performed for Bovine Herpesvirus type 1 and Bovine Viral Diarrhea virus type 1 and 2. Production data before and after sampling were collected for the entire herd from PCdart.
keywords:
serostatus; dairy; production; cohort
published:
2023-07-05
Njuguna, Joyce; Clark, Lindsay; Lipka, Alexander; Anzoua, Kossonou; Bagmet, Larisa; Chebukin, Pavel; Dwiyanti, Maria; Dzyubenko, Elena; Dzyubenko, Nicolay; Ghimire, Bimal; Jin, Xiaoli; Johnson, Douglas; Kjeldsen, Jens; Nagano, Hironori; Oliveira, Ivone; Peng, Junhua; Petersen, Karen; Sabitov, Andrey; Seong, Eun; Yamada, Toshihiko; Yoo, Ji; Yu, Chang; Zhao, Hu; Munoz, Patricio; Long, Stephen; Sacks, Erik
(2023)
This dataset contains all data used in the paper "Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids". The dataset includes genotypes and phenotypic data from two autotetraploid species, Miscanthus sacchariflorus and Vaccinium corymbosum, that were used for genome-wide association studies and genomic prediction, and the scripts used in the analysis.
In this V2, two files containing the raw data have been added:
"Miscanthus_sacchariflorus_RADSeq.vcf" is the VCF file with the raw SNP calls of the Miscanthus sacchariflorus data used for genotype calling using the 6 genotype calling methods.
"Blueberry_data_read_depths.RData" is an RData file with the read depth data that was used for genotype calling in the Blueberry dataset.
keywords:
Polyploid; allelic dosage; Bayesian genotype-calling; Genome-wide association; Genomic prediction
published:
2024-07-11
Gholamalamdari, Omid; Belmont, Andrew
(2024)
This repository contains the data and computational analysis notebooks that were used in the following manuscript.
For more information on the methods and contributing authors, please refer to the original manuscript.
"Beyond A and B Compartments: how major nuclear locales define nuclear genome organization and function Omid Gholamalamdari et al. 2024"
keywords:
genomic analysis; R markdown; genomic segmentations
published:
2025-09-29
Zhai, Zhiyang; Liu, Hui; Shanklin, John
(2025)
During the transformation of wild-type (WT) Arabidopsis thaliana, a T-DNA containing OLEOSIN-GFP (OLE1-GFP) was inserted by happenstance within the GBSS1 gene, resulting in significant reduction in amylose and increase in leaf oil content in the transgenic line (OG). The synergistic effect on oil accumulation of combining gbss1 with the expression of OLE1-GFP was confirmed by transforming an independent gbss1 mutant (GABI_914G01) with OLE1-GFP. The resulting OLE1-GFP/gbss1 transgenic lines showed higher leaf oil content than the individual OLE1-GFP/WT or single gbss1 mutant lines. Further stacking of the lipogenic factors WRINKLED1, Diacylglycerol O-Acyltransferase (DGAT1), and Cys-OLEOSIN1 (an engineered sesame OLEOSIN1) in OG significantly elevated its oil content in mature leaves to 2.3% of dry weight, which is 15 times higher than that in WT Arabidopsis. Inducible expression of the same lipogenic factors was shown to be an effective strategy for triacylglycerol (TAG) accumulation without incurring growth, development, and yield penalties.
keywords:
Feedstock Production; Biomass Analytics
published:
2017-06-16
Haselhorst, Derek S; Tcheng, David K.; Moreno, J. Enrique; Punyasena, Surangi W.
(2017)
Table S1. Pollen types identified in the BCI and PNSL pollen rain data sets. Pollen types were identified to species when possible and assigned a life form based on descriptions provided in Croat, T.B. (1978). Taxa from BCI and PNSL were assigned a 1 if present in forest census data or a 0 if absent. The relative representation of each taxon has been provided for each extended record and by dry and wet season representation respectively. CA loadings are provided for axes 1 and 2 (Fig. 1).
keywords:
pollen; identifications; abundance; data; BCI; PNSL; Panama
published:
2022-01-01
Cao, Yanghui; Dietrich, Christopher H.
(2022)
The file “Fla.fasta”, comprising 10526 positions, is the concatenated amino acid alignments of 51 orthologues of 182 bacterial strains. It was used for the maximum likelihood and maximum parsimony analyses of Flavobacteriales. Bacterial species names and strains were used as the sequence names, host names of insect endosymbionts were shown in brackets. The file “16S.fasta” is the alignment of 233 bacterial 16S rRNA sequences. It contains 1455 positions and was used for the maximum likelihood analysis of flavobacterial insect endosymbionts. The names of endosymbiont strains were replaced by the name of their hosts. In addition to the species names, National Center for Biotechnology Information (NCBI) accession numbers were also indicated in the sequence names (e.g., sequence “Cicadellidae_Deltocephalinae_Macrostelini_Macrosteles_striifrons_AB795320” is the 16S rRNA of Macrosteles striifrons (Cicadellidae: Deltocephalinae: Macrostelini) with a NCBI accession number AB795320). The file “Sulcia_pep.fasta” is the concatenated amino acid alignments of 131 orthologues of “Candidatus Sulcia muelleri” (Sulcia). It contains 41970 positions and presents 101 Sulcia strains and 3 Blattabacterium strains. This file was used for the maximum likelihood analysis of Sulcia. The file “Sulcia_nucleotide.fasta” is the concatenated nucleotide alignment corresponding to the sequences in “Sulcia_pep.fasta” but also comprises the alignment of 16S rRNA. It has 127339 positions and was used for the maximum likelihood and maximum parsimony analyses of Sulcia. Individual gene alignments (16S rRNA and 131 orthologues of Sulcia and Blattabacterium) are deposited in the compressed file “individual_gene_alignments.zip”, which were used to construct gene trees for multispecies coalescent analysis. The names of Sulcia strains were replaced by the name of their hosts in “Sulcia_pep.fasta”, “Sulcia_nucleotide.fasta” and the files in “individual_gene_alignments.zip”. 
In all the alignment files, gaps are indicated by “-”.
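The sequence naming scheme in “16S.fasta” (taxonomic names followed by an NCBI accession number, joined by underscores) can be split programmatically. This is a minimal sketch that assumes the accession is always the final underscore-separated field and itself contains no underscore, which holds for the example given above but may not hold for every accession format. The function name `split_sequence_name` is hypothetical.

```python
def split_sequence_name(name):
    """Split a sequence name into (taxonomic fields, accession number)."""
    taxon, separator, accession = name.rpartition("_")
    if not separator:
        raise ValueError(f"no accession field found in {name!r}")
    return taxon.split("_"), accession


fields, accession = split_sequence_name(
    "Cicadellidae_Deltocephalinae_Macrostelini_Macrosteles_striifrons_AB795320"
)
print(fields[-2], fields[-1], accession)  # prints: Macrosteles striifrons AB795320
```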
keywords:
endosymbiont, “Candidatus Sulcia muelleri”, Auchenorrhyncha, coevolution