Displaying datasets 401 - 425 of 633 in total

Subject Area

Life Sciences (330)
Social Sciences (135)
Physical Sciences (91)
Technology and Engineering (62)
Uncategorized (14)
Arts and Humanities (1)

Funder

Other (192)
U.S. National Science Foundation (NSF) (187)
U.S. Department of Energy (DOE) (63)
U.S. National Institutes of Health (NIH) (59)
U.S. Department of Agriculture (USDA) (41)
Illinois Department of Natural Resources (IDNR) (17)
U.S. Geological Survey (USGS) (6)
U.S. National Aeronautics and Space Administration (NASA) (5)
Illinois Department of Transportation (IDOT) (4)
U.S. Army (2)

Publication Year

2021 (108)
2022 (108)
2020 (96)
2023 (78)
2019 (72)
2018 (62)
2017 (36)
2024 (36)
2016 (30)
2025 (2)
2009 (1)
2011 (1)
2012 (1)
2014 (1)
2015 (1)

License

CC0 (352)
CC BY (261)
custom (20)
published: 2021-02-25
 
Total nitrogen leaching rates were calculated over the Mississippi-Atchafalaya River Basin (MARB) using an integrated economic-biophysical modeling approach. Land allocation for corn production and total nitrogen application rates were calculated for crop reporting districts using the Biofuel and Environmental Policy Analysis Model (BEPAM) for five RFS2 policy scenarios. These were used as inputs to the Integrated BIosphere Simulator-Agricultural Version (Agro-IBIS) and the Terrestrial Hydrologic Model with Biogeochemistry (THMB) to calculate nitrogen loss. Land allocation and total nitrogen application were simulated for the period 2016-2030 for 303 crop reporting districts (https://www.nass.usda.gov/Data_and_Statistics/County_Data_Files/Frequently_Asked_Questions/county_list.txt). The final 2030 values are reported here; both are stored in CSV files. Land allocation is given in million ha and nitrogen application in million kg. The nitrogen leaching rates were modeled at a spatial resolution of 5' x 5', referenced to the North American Datum of 1983 (NAD83), and stored in NetCDF files. The 30-year average is calculated over the last 30 years of the 45 simulated years. Leaching rates are given in kg-N/ha.
keywords: nitrogen leaching, bioethanol, bioenergy crops
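To make the averaging window concrete: if the 45 simulated years for one grid cell are held as an ordered sequence of annual leaching rates (kg-N/ha), the reported 30-year average is the mean of the final 30 values. This is a minimal sketch; the function name and the plain-list input are assumptions, since the NetCDF variable and dimension names are not documented here.

```python
from statistics import fmean


def trailing_mean(annual_values, window=30, expected_years=45):
    """Mean of the last `window` values of an annual series.

    Assumes `annual_values` is ordered from the first to the last
    simulated year and holds one leaching value (kg-N/ha) per year.
    """
    if len(annual_values) != expected_years:
        raise ValueError(
            f"expected {expected_years} annual values, got {len(annual_values)}"
        )
    # With the defaults (45 years, window of 30), the first 15 years are excluded.
    return fmean(annual_values[-window:])
```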
published: 2021-02-16
 
Data from a census of peer-reviewed papers that discuss nosZ and were published from 2013 to 2019. These data were reported in the manuscript titled "Beyond denitrification: the role of microbial diversity in controlling nitrous oxide reduction and soil nitrous oxide emissions", published in Global Change Biology as an Invited Report.
keywords: atypical nosZ; Clade II nosZ; denitrification; nitrous oxide; N2O reduction; non-denitrifier; nosZ; nosZ-II; nosZ Clade II; soil N2O emissions
published: 2021-02-15
 
The file contains biomass and count data for food items found in the digestive tracts of green-winged teal collected in the Illinois River Valley during spring 2016-2018. The file also contains biomass of food items from core samples taken at the sites where the teal were collected. Together, the consumed and available food data are used to calculate diet selection. The data also contain information on the teal, collection, sites, and other covariates used in the analysis. Lastly, the dataset contains biomass of food items collected in medium (#35) and small (#60) sieves for 2018 core samples.
keywords: Anas crecca; food selection; green-winged teal; Illinois River Valley; moist-soil plants; spring migration; stopover ecology
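The description does not say which selection index was computed from the consumed and available food data. As a hedged illustration only, Ivlev's electivity index, one common choice, compares an item's proportion in consumed biomass (digestive tract) with its proportion in available biomass (core samples). The function names and the dictionary input format are hypothetical and are not taken from the dataset files.

```python
def proportions(biomass_by_item):
    """Convert biomass per food item into proportions of the total."""
    total = sum(biomass_by_item.values())
    if total <= 0:
        raise ValueError("total biomass must be positive")
    return {item: mass / total for item, mass in biomass_by_item.items()}


def ivlev_electivity(consumed, available):
    """Ivlev's E = (r - p) / (r + p) for each food item.

    r: proportion of the item in consumed biomass
    p: proportion of the item in available biomass
    E ranges from -1 (avoided) to +1 (strongly selected).
    Items with zero proportion in both samples are skipped.
    """
    r = proportions(consumed)
    p = proportions(available)
    result = {}
    for item in set(r) | set(p):
        ri, pi = r.get(item, 0.0), p.get(item, 0.0)
        if ri + pi > 0:
            result[item] = (ri - pi) / (ri + pi)
    return result
```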
published: 2021-02-10
 
This dataset consists of microclimatic temperature and vegetation structure maps at 3-meter spatial resolution across Great Smoky Mountains National Park. Included are raster models of sub-canopy, near-surface minimum and maximum temperatures, averaged across the study period, by season, and by month for the growing season months of March through November from 2006 to 2010. Also available are the topographic and vegetation inputs developed for the microclimate models, including LiDAR-derived vegetation height, LiDAR-derived vegetation structure within four height strata, solar insolation, distance to stream, and topographic convergence index (TCI).
keywords: microclimate buffering; forest vegetation structure; temperature; Appalachian Mountains; climate downscaling; understory; LiDAR
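The topographic convergence index listed among the inputs is commonly defined, as in TOPMODEL-style wetness indices, as TCI = ln(a / tan β), where a is the upslope contributing area per unit contour width and β is the local slope. The sketch below assumes that definition; the exact formulation used to build this dataset is not documented here, and the minimum-slope clamp for flat cells is a hypothetical choice.

```python
import math


def topographic_convergence_index(contributing_area_per_width, slope_deg, min_slope_deg=0.1):
    """TCI = ln(a / tan(beta)) for a single cell.

    contributing_area_per_width: upslope contributing area per unit contour width (m).
    slope_deg: local slope in degrees.
    Slopes below min_slope_deg are clamped so flat cells do not divide by tan(0).
    """
    if contributing_area_per_width <= 0:
        raise ValueError("contributing area must be positive")
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(contributing_area_per_width / math.tan(beta))
```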
published: 2020-12-30
 
High-speed X-ray videos of four E. abruptus specimens recorded at the Advanced Photon Source (Argonne National Laboratory) in the summer of 2018, and corresponding position data of landmarks tracked during the motion. See the readme file for more details.
published: 2020-10-01
 
We measured the effects of fire or drought treatment on plant, microbial, and biogeochemical responses in temperate deciduous forests that have a history of either frequent fire or fire exclusion and are invaded by the annual grass Microstegium vimineum. Please note, on Documentation tab / Experimental or Sampling Design, “15 (XVI)” should be “16 (XVI)”.
keywords: plant-soil interaction; grass-fire cycle; Microstegium; carbon and nitrogen cycling; microbial decomposers
published: 2021-02-01
 
These datasets provide the basis of our analysis in the paper "The Potential Impact of a Clean Energy Society On Air Quality". All datasets here are model output from CAM4-chem. All simulations were run to steady state, and only the outputs used in the analysis are archived here.
keywords: clean energy; ozone; particulates
published: 2021-01-25
 
Dataset associated with the Zenzal et al. Oikos submission: Retreat, detour, or advance? Understanding the movements of birds confronting the Gulf of Mexico. https://doi.org/10.1111/oik.07834 Five CSV files were used for analysis and correspond to the following subsections under the “Statistics” heading in the “Materials and Methods” section of the journal article: 1. Departing the Edge = “AIC Analysis.csv” 2. Comparing Retreating to Advancing = “Advance and Retreat Analysis.csv” and “Wind Data at Departure.csv” 3. Food Abundance = “Fruit Data.csv” and “Arthropod Data.csv” <b>Description of variables:</b> Year: the year in which data were collected. Departure: the direction in which an individual departed the Bon Secour National Wildlife Refuge. “North” indicates an individual that departed ≥315° or <45°; “Circum” indicates an individual that departed east (45–134°) or west (225–314°); “Trans” indicates an individual that departed south (135–224°). Age: the age of an individual at capture. Individuals were aged as hatch year (HY) or after hatch year (AHY) according to Pyle (1997; see related article for full citation). Fat: the fat score of an individual at capture. Individuals were scored on a 6-point scale ranging from 0 to 5 following Helms and Drury (1960; see related article for full citation). Species: the standardized four-letter alpha code used to abbreviate the English common names of North American birds. SWTH: Catharus ustulatus; REVI: Vireo olivaceus; INBU: Passerina cyanea; WOTH: Hylocichla mustelina; RTHU: Archilochus colubris. FTM_SD: stopover duration, or number of days between first capture and departure from automated radio telemetry system coverage at the Bon Secour National Wildlife Refuge. TMB_SD: stopover duration, or number of days between first and last detection by automated radio telemetry systems north of Mobile Bay, AL, USA. 
Mean speed north (km/hr): the northbound travel speed of individuals retreating from the Bon Secour National Wildlife Refuge, calculated by determining when signal strength indicated the bird was directly east or west of each automated telemetry system and dividing the assumed straight-line distance between the Refuge systems and those north of Mobile Bay, AL, USA, by the time the individual took to travel it. Mean speed south (km/hr): the southbound travel speed of individuals advancing from north of Mobile Bay, AL, USA, calculated the same way from the time taken to travel between the systems north of Mobile Bay and the Refuge systems. LN_FTM_DEP_TIME: the natural log of departure time from the Bon Secour National Wildlife Refuge, where departure time is the number of hours before or after civil twilight. LN_TMB_DEP_TIME: the natural log of departure time from north of Mobile Bay, AL, USA, with departure time defined as above. Paired_FTM_DEP_TIME: the departure time (number of hours before or after civil twilight) from the Bon Secour National Wildlife Refuge. Paired_TMB_DEP_TIME: the departure time (number of hours before or after civil twilight) from north of Mobile Bay, AL, USA. Wind Direction: the direction from which the wind originated at the Bon Secour National Wildlife Refuge on nights when individuals were departing. “N” indicates winds from the north (≥315° or <45°); “E” indicates winds from the east (45–134°); “W” indicates winds from the west (225–314°); “S” indicates winds from the south (135–224°). Wind Speed (m/s): the wind speed on nights when individuals were departing the Bon Secour National Wildlife Refuge. Group: the direction the bird was traveling under specific wind conditions. 
Northbound individuals traveled north from Bon Secour National Wildlife Refuge. Southbound individuals traveled south from habitats north of Mobile Bay, AL, USA. Fruit: weekly mean number of ripe fruit per meter. Site: the site from which the data were collected. FTM is located within the Bon Secour National Wildlife Refuge. TMB is located within the Jacinto Port Wildlife Management Area. DOY: day of year (e.g., 1 January = 001 … 31 December = 365). Arthropod Biomass: estimated mean arthropod biomass for each sampling period. <b>Note:</b> Empty cells indicate unavailable data where applicable.
keywords: migratory birds; migration; automated telemetry; Gulf of Mexico
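The direction bins shared by “Departure” and “Wind Direction”, and the speed calculation, can be written as small helpers. This is a minimal sketch, assuming bearings in degrees clockwise from north and half-open intervals so that fractional bearings between the published integer bounds (for example 134.5°) are still assigned. The function names and the `distance_km` input are hypothetical; the station-to-station distance is not given in this description.

```python
def direction_sector(bearing_deg):
    """Map a bearing to the N/E/S/W codes used for Wind Direction.

    Bins: N = [315, 360) and [0, 45); E = [45, 135); S = [135, 225); W = [225, 315).
    """
    b = bearing_deg % 360
    if b >= 315 or b < 45:
        return "N"
    if b < 135:
        return "E"
    if b < 225:
        return "S"
    return "W"


def departure_category(bearing_deg):
    """Map a departure bearing to the North / Circum / Trans categories."""
    return {"N": "North", "E": "Circum", "W": "Circum", "S": "Trans"}[
        direction_sector(bearing_deg)
    ]


def mean_speed_kmh(distance_km, travel_hours):
    """Straight-line distance between telemetry stations divided by travel time."""
    if travel_hours <= 0:
        raise ValueError("travel time must be positive")
    return distance_km / travel_hours
```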
published: 2021-01-23
 
Datasets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." They contain data simulated with gene duplication and loss under a variety of conditions.
keywords: gene duplication and loss; species-tree inference
published: 2020-12-02
 
The dataset includes survey results on farmers’ perceptions of marginal land availability and the likelihood of a land pixel being marginal, based on a machine learning model trained on the survey. The two spreadsheet files contain farmer and farm characteristics (marginal_land_survey_data_shared.xlsx) and the existing land use of marginal lands (land_use_info_sharing.xlsx). <b>Note:</b> blank cells in these two spreadsheets indicate missing survey responses. The GeoTIFF file includes two bands: one gives the marginal land likelihood in the Midwestern states (0-1), and the other gives the dominant reason for land marginality (0-5; 0 for farm size, 1 for growing season precipitation, 2 for root zone soil water capacity, 3 for average slope, 4 for growing season mean temperature, and 5 for growing season diurnal temperature range). To read the data, please use GIS software such as ArcGIS or QGIS.
keywords: marginal land; survey
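To summarize the dominant-reason band outside a GIS, its integer codes can be decoded with a lookup table. This is a minimal sketch, assuming the band values have already been read into a flat sequence of integers with a raster library of your choice. The `nodata` handling is an assumption, because the description states neither a nodata value nor which band index holds which layer.

```python
from collections import Counter

# Codes for the "dominant reason" band, as listed in the dataset description.
MARGINALITY_REASONS = {
    0: "farm size",
    1: "growing season precipitation",
    2: "root zone soil water capacity",
    3: "average slope",
    4: "growing season mean temperature",
    5: "growing season diurnal temperature range",
}


def tally_reasons(codes, nodata=None):
    """Count pixels per dominant reason, skipping an optional nodata value."""
    counts = Counter()
    for code in codes:
        if nodata is not None and code == nodata:
            continue
        code = int(code)
        if code not in MARGINALITY_REASONS:
            raise ValueError(f"unexpected reason code: {code}")
        counts[MARGINALITY_REASONS[code]] += 1
    return dict(counts)
```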
published: 2021-01-04
 
This dataset contains the emulated global multi-model urban climate projections under RCP 8.5 and RCP 4.5 used in the article "Global multi-model projections of local urban climates" (https://www.nature.com/articles/s41558-020-00958-8). Details about this dataset and the local urban climate emulator are described in the article. The dataset provides monthly mean projections of urban temperature and urban relative humidity from 26 CMIP5 Earth system models (ESMs) for 2006 to 2100 across the globe. It may be useful to communities working on urban climate change, impacts, vulnerability, risks, and adaptation.
keywords: Urban climate; multi-model climate projections; CMIP; urban warming; heat stress
published: 2020-12-15
 
The dataset consists of results and input data used in the GAMS model for the publication "Repeal of the Clean Power Plan: Social Cost and Distributional Implications". All data are either Excel files or .inc files, which can be read in GAMS or a text editor such as Notepad. The main data sources include agriculture, transportation, and electricity data. Model details can be found in the paper and the GAMS model package.
keywords: carbon abatement; welfare cost; electricity sector; partial equilibrium model
published: 2020-04-22
 
Data on Croatian restaurant allergen disclosures on restaurant websites, online menus, and social media comments.
keywords: restaurant; allergen; disclosure; tourism
published: 2020-12-12
 
Dataset associated with the Jones et al. FE-2019-01175 submission: Does the size and developmental stage of traits at fledging reflect juvenile flight ability among songbirds? Includes CSV files with all of the data used in the analyses and a file describing each column. The flight ability variable in this dataset was derived from fledgling drop tests, examples of which can be found in the related dataset: Jones, Todd M.; Benson, Thomas J.; Ward, Michael P. (2019): Flight Ability of Juvenile Songbirds at Fledgling: Examples of Fledgling Drop Tests. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2044905_V1.
keywords: body condition; fledgling; flight ability; locomotor ability; post-fledging; songbirds; wing development; wing emergence
published: 2020-12-03
 
This small dataset contains raw anthropometric and dietary intake data.
keywords: Obesity treatment; weight management; high protein; high fiber; nonrestrictive; data visualization; self-empowerment; informed decision making
published: 2020-12-01
 
This is the data set from the published manuscript 'Vertebrate scavenger guild composition and utilization of carrion in an East Asian temperate forest' by Inagaki et al.
keywords: Japan; Sika Deer
published: 2020-11-20
 
This dataset explores the effect of the cyanobacterial gene ictB on photosynthesis in sorghum, both under normal greenhouse growing temperatures (32 °C / 25 °C) and during and after an 8-day chilling stress (10 °C / 5 °C). IctB is a cyanobacterial gene of unknown function that was initially thought to be involved in inorganic carbon transport into cells. Although ictB is now known not to be an independently active carbon transporter, it may play a role in passive diffusion of metabolites. The transgene was introduced into sorghum by the lab of Thomas Clemente through Agrobacterium-mediated transformation, alone and in combination with the tomato sedoheptulose-1,7-bisphosphatase (SBPase) gene. Eleven events (six double construct and five single construct ictB) were involved in this study. SBPase was included because previous experiments in C3 species, previous modeling work, and its position at a metabolic branch point indicate that it acts as a control point for photosynthesis. A chilling treatment was included because chilling is one of the most serious ecological factors limiting the range of C4 species. Data include gene expression, metabolomics (at normal growing temperature), SBPase enzyme activity, biomass, and photosynthetic traits at warm temperature and during and after chilling stress. ----------------- EXPLANATORY NOTES FOR ICTB/SBPASE SORGHUM MANUSCRIPT Data are organized into 10 worksheets, corresponding to the 10 tables expected to serve as supplementary material in the final publication. These include data on gene expression, metabolomics (at normal growing temperature), SBPase enzyme activity, biomass, and photosynthetic traits at warm temperature and during and after chilling stress. <i><b>Tables are as follows:</b></i> 1. Event_Code: for Table S1. Event codes for events and constructs. Two constructs were generated for this study, and numerous transgenic “events” (i.e.
independent transformations) were carried out for each construct. A construct is the actual vector introduced into the plants (complete with promoter, gene of interest, marker gene, etc.), while an event is a single successful introduction of the transgene. Events are uniquely labeled with letter and number strings and also with a four-digit number for ease of reference; this table shows which event corresponds to each four-digit number. 2. Photosynthetic_Data: for Table S2. Photosynthetic data at greenhouse growing temperature for ictB single construct, ictB/SBPase double construct, and wild type lines. Five ictB and six ictB/SBPase events were included. Greenhouse growing temperature was approximately 32 °C day and 25 °C night. Photosynthetic parameters were measured using a Licor 6400-XT and included parameters related to carbon dioxide uptake, water loss, and chlorophyll fluorescence. 3. Chilling_Treatment: for Table S3. Photosynthetic response to chilling treatment for ictB single construct and wild type lines. Four ictB events were included. Chilling treatment lasted approximately 8 days and began either 3.5 or 5.5 weeks after the plants were transplanted (chilling was done in two batches). Chilling treatment involved temperatures of 10 °C day / 7 °C night in growth chambers. Photosynthetic parameters were measured with a Licor 6400-XT at several time points during and after the chilling treatment and included parameters related to carbon dioxide uptake, water loss, and chlorophyll fluorescence. 4. SBPase_Activity: for Table S4. SBPase activity in double construct plants. These data measure in vitro substrate-saturated activity of SBPase in desalted extracts from leaf tissue at 25 °C. Units are micromoles of SBP processed per second per m2 of leaf tissue. Five ictB/SBPase events were included. 5. 2014_gene_exp: for Table S5. Gene expression in the 2014 experiment (units of cycle times).
These data measure cycle times to threshold, relative to reference genes, for expression of ictB and SBPase. Six ictB single construct events and five ictB/SBPase double construct events were included. Cycle times to threshold relative to reference genes (ΔCT) are inversely related to the number of transcripts relative to reference genes, as follows: ΔCT = -log2([NictB]/[Nreference]) / [1 + log2(b)], where b = efficiency of replication. 6. 2016_gene_exp: for Table S6. Gene expression in the 2016 experiment (units of cycle times). These data measure cycle times to threshold, relative to reference genes, for expression of ictB and SBPase. Six ictB single construct events and five ictB/SBPase double construct events were included. Cycle times to threshold relative to reference genes (ΔCT) are inversely related to the number of transcripts relative to reference genes, as follows: ΔCT = -log2([NictB]/[Nreference]) / [1 + log2(b)], where b = efficiency of replication. 7. Metabolites: for Table S7. Levels of 267 metabolites in leaf tissue. Four ictB single construct events and four ictB/SBPase double construct events were included in these analyses. Metabolites were measured in methanol-extracted samples by either liquid chromatography / mass spectrometry or gas chromatography / mass spectrometry and were compared between events on a relative basis. Because quantification was relative to wild type rather than absolute, no units are given. 8. Metabolite_F_values: for Table S8. F values for effects of ictB, SBPase (in cases where the model was better with an SBPase effect), and event. These analyses were done for each metabolite included in Table S7 and show effects of the explanatory variables ictB, SBPase, and individual event. 9. Biomass_2020: for Table S9. Biomass and grain yield at harvest for ictB, ictB/SBPase, and wild type sorghum plants in spring 2020. Four ictB/SBPase double construct and four ictB single construct events were included. 10. Biomass_2017: for Table S10.
Biomass and grain yield at harvest in chilled and non-chilled sorghum plants containing the ictB transgene (along with wild type controls) in fall 2017. Four ictB single construct events were included. Chilling treatment involved temperatures of 10 °C day / 7 °C night in growth chambers. <i><b>All variables in the file are explained below:</b></i> o Type (IctB-SBPase and IctB): whether a plant is wild type, single construct (contains only the ictB transgene), or double construct (contains both the ictB and SBPase transgenes). o Code: short labels used to refer to each transgene event for convenience. o Alternate_Code: alternative short labels used to refer to each transgene event for convenience. o Event Number: unique labels for each transgenic event. o Construct Number: labels for each transgenic construct (either the ictB single construct or the ictB/SBPase double construct). o year (i): the year in which the study was conducted (2014, 2016, 2017, or 2020). o transgene or Transgenic: whether the transgene was present. o construct or Type: whether the ictB or the ictB/SBPase construct was present (double, single, wildtype). o temp: leaf temperature during the measurement. o A: carbon assimilation rate, in μmol m-2 s-1. o gs: stomatal conductance, in mol m-2 s-1. o CI: intercellular carbon dioxide concentration, in parts per million or μL L-1. o fvfm: Fv'/Fm' (maximal potential photosystem II quantum yield under light-adapted conditions), dimensionless ratio. o phipsill: ΦPSII (operating photosystem II quantum yield under light-adapted conditions), dimensionless ratio. o qP: photochemical quenching, i.e. ratio of ΦPSII to Fv'/Fm', dimensionless ratio. o iwue: intrinsic water use efficiency, i.e.
ratio of carbon assimilation rate to stomatal conductance, in μmol mol-1. o event: individual transgenic / transformation event. o Vmax: substrate-saturated in vitro activity of the SBPase enzyme, in μmol m-2 s-1. o ID: identification number of the sample. o ΔCT1: difference in cycle times to threshold during the gene expression (quantitative PCR) assay between ictB and the reference gene GAPDH, in cycles. o ΔCT2: difference in cycle times to threshold during the gene expression (quantitative PCR) assay between SBPase and the reference gene GAPDH, in cycles. o GAPDH: cycle times to threshold for the reference gene GAPDH (glyceraldehyde-3-phosphate dehydrogenase). o IctB: cycle times to threshold for the gene of interest ictB. o SBPase: cycle times to threshold for the gene of interest SBPase. o v1 to v267: individual metabolites (each metabolite's name is given in the heading immediately above its label v1, v2, etc.). Variables v268-v272 are total (summed) metabolite levels for particular pathways of interest. o leaf: leaf and stem dry biomass (in grams). o seed: seed head dry biomass (in grams). o biomass: total (leaf, stem, and seed head) dry biomass (in grams). o harvind: ratio of seed head dry biomass to total dry biomass. o treatment (chilled and nonchilled): “Chilled” plants were grown under warm greenhouse conditions (32 °C day / 25 °C night) for 6 or 8 weeks, then moved to chilling temperatures in growth chambers (10 °C day / 7 °C night) for 8 days, and were then returned to greenhouse growing conditions. -----------------
keywords: ictB; SBPase; photosynthesis; sorghum; chilling
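The ΔCT relation and the derived ratios in the variable list (iwue, qP, harvind) can be checked numerically. This is a minimal sketch, assuming a reading of the formula in which b is a fractional replication efficiency (b = 1 for perfect doubling), so each cycle multiplies template by 2b and log2(2b) = 1 + log2(b). The function names are hypothetical and not part of the dataset.

```python
import math


def _check_efficiency(b):
    # 1 + log2(b) must be positive, i.e. each cycle must amplify (2b > 1).
    if b <= 0.5:
        raise ValueError("b must exceed 0.5 for the template to amplify")


def delta_ct(ratio_target_to_reference, b=1.0):
    """ΔCT = -log2(N_target / N_reference) / (1 + log2(b))."""
    _check_efficiency(b)
    return -math.log2(ratio_target_to_reference) / (1.0 + math.log2(b))


def relative_transcripts(dct, b=1.0):
    """Invert the ΔCT relation: N_target / N_reference = 2 ** (-ΔCT * (1 + log2(b)))."""
    _check_efficiency(b)
    return 2.0 ** (-dct * (1.0 + math.log2(b)))


def intrinsic_wue(a_umol, gs_mol):
    """iwue = A / gs; A in μmol m-2 s-1 and gs in mol m-2 s-1 give μmol mol-1."""
    return a_umol / gs_mol


def photochemical_quenching(phi_psii, fv_fm_prime):
    """qP = ΦPSII / (Fv'/Fm')."""
    return phi_psii / fv_fm_prime


def harvest_index(seed_g, leaf_stem_g):
    """harvind = seed head dry biomass / total (leaf, stem, and seed head) dry biomass."""
    return seed_g / (seed_g + leaf_stem_g)
```

With perfect efficiency (b = 1), a ΔCT of 1 corresponds to half as many target transcripts as reference transcripts.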
published: 2020-11-25
 
Video recorded by Louise Barker using a Canon PowerShot camera documents late-season combat behavior in Agkistrodon contortrix. Recorded in Beaufort County, North Carolina, 11.1 km SE of downtown Washington on 21 October 2020.
keywords: Agkistrodon contortrix; combat; mating; reproduction; copperhead; pit viper; Viperidae
published: 2020-12-31
 
This dataset contains the amino acid and nucleotide alignments corresponding to the phylogenetic analyses of South et al. 2020 in Systematic Entomology. This dataset also includes the gene trees that were used as input for coalescent analysis in ASTRAL.
keywords: Plecoptera; stoneflies; phylogeny; insects
published: 2020-11-18
 
These data, obtained from the peer-reviewed literature and a public database, depict the geographic expansion of the black-legged tick (Ixodes scapularis) and human cases of Lyme disease in the midwestern U.S. <b><i>Note</i></b>: An omission from the first version (V1) of the dataset required us to update the data. Specifically, we failed to include the data from the article "Caporale DA, Johnson CM, Millard BJ. 2005 Presence of Borrelia burgdorferi (Spirochaetales: Spirochaetaceae) in Southern Kettle Moraine State Forest, Wisconsin, and characterization of strain W97F51. J. Med. Entomol. 42, 457–472". This omission is corrected in the second version (V2) of the data.
keywords: Lyme disease; Borrelia burgdorferi; Ixodes scapularis; black-legged tick
published: 2020-11-18
 
This is the dataset that accompanies the paper titled "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network", submitted for peer review in August 2020. Please see the GitHub repository for the most up-to-date data after the revision process: https://github.com/dopplerchase/Chase_et_al_2021_NN Authors: Randy J. Chase, Stephen W. Nesbitt and Greg M. McFarquhar Corresponding author: Randy J. Chase (randyjc2@illinois.edu) Here we provide the data used in the manuscript. Please email me if you have specific questions about units etc. 1) DDA/GMM database of scattering properties: base_df_DDA.csv This is the combined dataset from the following papers: Leinonen & Moisseev, 2015; Leinonen & Szyrmer, 2015; Lu et al., 2016; Kuo et al., 2016; Eriksson et al., 2018. The column names are D: maximum dimension in meters, M: particle mass in grams kg, sigma_ku: backscatter cross-section at Ku band in m^2, sigma_ka: backscatter cross-section at Ka band in m^2, sigma_w: backscatter cross-section at W band in m^2. The first column is an index column. 2) Synthetic data used to train and test the neural network: Unrimed_simulation_wholespecturm_train_V2.nc, Unrimed_simulation_wholespecturm_test_V2.nc These were built by randomly combining the PSDs and DDA/GMM particles to form the training and test datasets. 3) Notebook for training the network on the synthetic database with Google Colab (tensorflow): Train_Neural_Network_Chase2020.ipynb This is the notebook used to train the neural network. 4) Trained tensorflow neural network: NN_6by8.h5 This is the HDF5 tensorflow model that resulted from the training. You will need this to run the retrieval. 5) Scalers needed to apply the neural network: scaler_X_V2.pkl, scaler_y_V2.pkl These are the sklearn scalers used in training the neural network. You will need these to scale your data if you wish to run the retrieval. 6) <b>New in this version</b> - Example notebook showing how to run the trained neural network on Ku- and Ka-band observations.
We demonstrate this with the third case in the paper: Run_Chase2021_NN.ipynb 7) <b>New in this version</b> - APR data used to show how to run the neural network retrieval: Chase_2021_NN_APR03Dec2015.nc The data for the analysis of the observations are not provided here because of the size of the radar data. Please see the GHRC website (<a href="https://ghrc.nsstc.nasa.gov/home/">https://ghrc.nsstc.nasa.gov/home/</a>) if you wish to download the radar and in-situ data, or contact me and we can coordinate transferring the exact data files used. The GPM-DPR data are available here: <a href="http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05">http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05</a>
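Items 4 and 5 imply a three-step order for applying the retrieval: scale the inputs with the X scaler, run the network, then map the outputs back with the y scaler. This is a sketch under stated assumptions. It presumes the pickled scalers expose the usual scikit-learn `transform`/`inverse_transform` methods and the model a Keras-style `predict`. It does not know which observed quantities form the input columns, so consult Run_Chase2021_NN.ipynb for the actual feature layout. The helper names are hypothetical.

```python
import pickle


def load_pickle(path):
    """Load a pickled object such as scaler_X_V2.pkl.

    Unpickling scikit-learn objects generally needs a compatible scikit-learn
    version, and pickles should only be loaded from trusted sources.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def run_retrieval(model, scaler_x, scaler_y, features):
    """Scale inputs, predict in scaled space, then undo the output scaling."""
    scaled_inputs = scaler_x.transform(features)
    scaled_outputs = model.predict(scaled_inputs)
    return scaler_y.inverse_transform(scaled_outputs)
```

With TensorFlow installed, `tf.keras.models.load_model("NN_6by8.h5")` should return a model usable as `model`, depending on the TensorFlow/Keras version, and `load_pickle("scaler_X_V2.pkl")` and `load_pickle("scaler_y_V2.pkl")` supply the two scalers.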
published: 2020-11-14
 
Dataset includes temperature data (local average April daily temperatures), first egg dates and reproductive output of Prothonotary Warblers breeding in southernmost Illinois, USA. Also included are arrival dates for warblers returning to breeding grounds from wintering grounds, and global temperature anomaly data for comparison with local temperatures. These data were used in the manuscript entitled "Warmer April Temperatures on Breeding Grounds Promote Earlier Nesting in a Long-Distance Migratory Bird, the Prothonotary Warbler" published in Frontiers in Ecology and Evolution. A rich text file is included with explanations of each variable in the dataset.
keywords: first egg dates; global warming; local temperature effects; long-distance migratory bird; prothonotary warbler; Protonotaria citrea; reproductive output
published: 2020-11-06
 
This dataset contains BAM files and transcripts for the simulated instances generated for the paper 'JUMPER: Discontinuous Transcript Assembly in SARS-CoV-2', submitted to RECOMB 2021. The folder 'bam' contains the simulated BAM files, aligned with STAR; the reads were generated with polyester. Note: near the end of the readme file, please ignore this sentence: 'Those files can be opened by using [name of software].'
keywords: transcript assembly; SARS-CoV-2; discontinuous transcription; coronaviruses
published: 2020-11-05
 
This version 2 dataset contains 34 files in total, including one (1) additional file, "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining 33 files are identical to version 1. Information about the new file and its variables follows: <b>Culture-dependent Isolate table with taxonomic determination and sequence data.csv</b>: culture table with taxonomy assigned from NCBI. A single-direction sequence is included for each isolate if one could be obtained. Sequences are derived from ITS1F-ITS4 PCR amplicons, Sanger-sequenced in one direction using ITS5. The file contains 20 variables, explained below: IsolateNumber: unique number identifying each cultured isolate. Time: season in which the sample was collected. Location: the specific name of the location. Habitat: type of habitat, either stream or peatland. State: US state in which the location is situated. Incubation_pH ID: pH of the medium during isolation of fungal cultures. Genus: phylogenetic genus of the fungal isolate (determined by sequence similarity). Sequence_quality: base call quality of the entire sequence used for BLAST analysis, if known. %_coverage: sequence coverage reported from GenBank. %_ID: sequence similarity reported from GenBank. Life_style: ecological lifestyle, if known. Phylum: phylogenetic phylum as indicated by Index Fungorum. Subphylum: phylogenetic subphylum as indicated by Index Fungorum. Class: phylogenetic class as indicated by Index Fungorum. Subclass: phylogenetic subclass as indicated by Index Fungorum. Order: phylogenetic order as indicated by Index Fungorum. Family: phylogenetic family as indicated by Index Fungorum. ITS5_Sequence: single-direction sequence (primer ITS5) used for sequence similarity matching with blastn. Fasta: sequence with its name in FASTA format, for easy pasting into phylogenetic software. Note: blank cells indicate that data are unavailable or unknown.
keywords: ITS1 forward reads; Illumina; peatlands; streams; bogs; fens
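Because the isolate table has an ITS5_Sequence column and blank cells mean missing data, a multi-FASTA file for downstream tools can be assembled directly from the CSV. This is a minimal sketch, assuming UTF-8 text and the column names listed above. The header format (isolate number plus genus) and the function names are hypothetical, and the supplied Fasta column may already meet your needs.

```python
import csv


def read_isolate_table(path):
    """Read the isolate CSV into a list of dicts keyed by column name."""
    # utf-8-sig also tolerates a byte-order mark written by spreadsheet software.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh))


def isolates_to_fasta(rows, header_fields=("IsolateNumber", "Genus")):
    """Build multi-FASTA text from isolate-table rows.

    Rows with a blank ITS5_Sequence are skipped, since blank cells mean
    no sequence was available. Blank header fields are written as "NA".
    """
    records = []
    for row in rows:
        seq = (row.get("ITS5_Sequence") or "").strip()
        if not seq:
            continue
        header = "_".join(
            (row.get(field) or "NA").strip().replace(" ", "_") for field in header_fields
        )
        records.append(f">{header}\n{seq}\n")
    return "".join(records)
```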
published: 2020-07-15
 
This repository includes previously unpublished scripts and datasets for Chapter 6 of my PhD dissertation, "Supertree-like methods for genome-scale species tree estimation." This chapter is based on the article: Molloy, E.K. and Warnow, T. "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Bioinformatics, In press. https://doi.org/10.1093/bioinformatics/btaa444. The results presented in my PhD dissertation differ from those in the Bioinformatics article because I re-estimated species trees using FastMulRFS and MulRF on the same datasets in the original repository (https://doi.org/10.13012/B2IDB-5721322_V1). To re-estimate species trees, (1) a seed was specified when running MulRF, and (2) a different script (specifically preprocess_multrees_v3.py from https://github.com/ekmolloy/fastmulrfs/releases/tag/v1.2.0) was used to preprocess gene trees (which were then given as input to MulRF and FastMulRFS). Note that this preprocessing script is a faster re-implementation of the original algorithm and also includes a bug fix. Finally, it was brought to my attention that the simulation in the Bioinformatics article differs from prior studies, because I scaled the species tree by 10 generations per year (instead of 0.9 years per generation, which is ~1.1 generations per year). I re-simulated datasets (true-trees-with-one-gen-per-year-psize-10000000.tar.gz and true-trees-with-one-gen-per-year-psize-50000000.tar.gz) using 0.9 years per generation to quantify the impact of this parameter change (see my PhD dissertation or the supplementary materials of the Bioinformatics article for discussion).
keywords: Species tree estimation; gene duplication and loss; statistical consistency; MulRF, FastRFS
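The size of the scaling discrepancy is simple arithmetic. A branch of t years spans t × (generations per year) generations, so 10 generations per year gives 10 / (1/0.9) = 9 times as many generations as 0.9 years per generation. A minimal sketch of that conversion follows; the function and constant names are hypothetical.

```python
def years_to_generations(length_years, generations_per_year):
    """Convert a branch length in years to a length in generations."""
    return length_years * generations_per_year


ARTICLE_RATE = 10.0           # generations per year used in the Bioinformatics article
PRIOR_STUDY_RATE = 1.0 / 0.9  # 0.9 years per generation, about 1.11 generations per year

# How many times more generations the article's scaling assigns to the same branch.
ratio = years_to_generations(1.0, ARTICLE_RATE) / years_to_generations(1.0, PRIOR_STUDY_RATE)
```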