Datasets

published: 2022-10-10
 
Aerial imagery used as input in the manuscript "Deep convolutional neural networks exploit high spatial and temporal resolution aerial imagery to predict key traits in miscanthus". Data were collected over M. sacchariflorus and M. sinensis breeding trials at the Energy Farm, UIUC in 2020. Flights were performed using a DJI M600 mounted with a Micasense Rededge multispectral sensor at 20 m altitude around solar noon. Imagery is available as TIF files by field trial and date (10). The post-processing of raw images into orthophotos was performed in Agisoft Metashape software. Each crop surface model and multispectral orthophoto was stacked into a single raster stack by date and uploaded here. Each raster stack includes 6 layers in the following order: Layer 1 = crop surface model, Layer 2 = Blue, Layer 3 = Green, Layer 4 = Red, Layer 5 = Rededge, and Layer 6 = NIR multispectral bands. Msa raster stacks were resampled to 1.67 cm spatial resolution and Msi raster stacks were resampled to 1.41 cm spatial resolution to ease their integration into further analysis. In file names, 'MMDDYYYY' is the date of data collection, 'MSA' is the M. sacchariflorus trial, 'MSI' is the M. sinensis trial, 'CSM' is the crop surface model layer, and 'MULTSP' is the set of five multispectral bands.
keywords: convolutional neural networks; miscanthus; perennial grasses; bioenergy; field phenotyping; remote sensing; UAV
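The layer order and the 'MMDDYYYY' date token described above can be captured in a small lookup, which may help avoid off-by-one mistakes when indexing bands. This is a minimal sketch, not code from the dataset: the function names are hypothetical, and the date token in the usage note is made up for illustration.

```python
from datetime import date, datetime

# Layer order as listed in the dataset description (1-based layer numbers).
LAYERS = {
    1: "crop surface model",
    2: "Blue",
    3: "Green",
    4: "Red",
    5: "Rededge",
    6: "NIR",
}


def layer_number(name: str) -> int:
    """Return the 1-based layer number for a band name (case-insensitive)."""
    for number, label in LAYERS.items():
        if label.lower() == name.lower():
            return number
    raise KeyError(name)


def parse_collection_date(token: str) -> date:
    """Parse an 'MMDDYYYY' token into a date object."""
    return datetime.strptime(token, "%m%d%Y").date()
```

For example, `layer_number("NIR")` gives the layer to read for the near-infrared band, and `parse_collection_date("07152020")` turns a hypothetical token into a date. Most raster libraries also use 1-based band numbers, but that should be checked against whichever library is used.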
published: 2023-07-10
 
Bee movement between habitat patches in a naturally fragmented ecosystem depended on species, patch, and matrix variables. Using a mark-recapture methodology in the naturally fragmented Ozark glade ecosystem, we assessed the importance of bee size, nesting biology, the distance between patches (e.g., isolation), and nesting and floral resources in habitat patches and the surrounding matrix on bee movement. This dataset includes seven data files, three R code files, and a QGIS tool. Three of the data files include information collected at the study sites with regard to bees and matrix and patch characteristics. The other four data files are spatial files used to quantify the characteristics of the forest canopy between the study sites and the edge-to-edge distances between the study sites. R code in the R Markdown file recreates the analysis and data presentation for the associated publication. R script files contain processes for calculating some of the explanatory variables used in the analysis. The QGIS tool can be used as the first step to obtaining average values from a raster file where the cells are large relative to the areas of interest (AOI) that you would like to characterize. The second step is contained in one of the aforementioned R scripts. Detected effects included: Larger bees were more likely to move between patches. Bee movement was less likely as the distance between patches increased. However, relatively short distances (~50 m) inhibited movement more than our a priori expectations. Bees were unlikely to move away from home patches with abundant and diverse floral and below-ground nesting resources. When home patches were less resource-rich, bee movement depended on the characteristics of the away patch or the matrix. In these cases, bees were more likely to move to away patches with greater below-ground nesting and floral resources. 
Matrix habitats with more available floral and below-ground nesting resources appear to impede movement to neighboring patches, potentially because they already provide supplemental resources for bees.
keywords: habitat fragmentation; bees; movement; mark-recapture; nesting resources; floral resources; isolation
published: 2019-05-16
 
This repository includes scripts and datasets for the paper, "Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge." All data files in this repository are for analyses using the logdet distance matrix computed on the concatenated alignment. Data files for analyses using the average gene-tree internode distance matrix can be downloaded from the Illinois Data Bank (https://doi.org/10.13012/B2IDB-1424746_V1). The latest version of NJMerge can be downloaded from GitHub (https://github.com/ekmolloy/njmerge).
List of Changes:
• Updated timings for NJMerge pipelines to include the time required to estimate distance matrices; this impacted files in the following folder: data.zip
• Replaced "Robinson-Foulds" distance with "Symmetric Difference"; this impacted files in the following folders: tools.zip; data.zip; scripts.zip
• Added some additional information about the java command used to run ASTRAL-III; this impacted files in the following folders: data.zip; astral64-trees.tar.gz (new)
keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published: 2019-05-31
 
The data are provided to illustrate methods in evaluating systematic transactional data reuse in machine learning. A library account-based recommender system was developed using machine learning processing over transactional data of 383,828 transactions (or check-outs) sourced from a large multi-unit research library. The machine learning process utilized the FP-growth algorithm over the subject metadata associated with physical items that were checked-out together in the library. The purpose of this research is to evaluate the results of systematic transactional data reuse in machine learning. The analysis herein contains a large-scale network visualization of 180,441 subject association rules and corresponding node metrics.
keywords: evaluating machine learning; network science; FP-growth; WEKA; Gephi; personalization; recommender systems
published: 2023-12-20
 
Important Note: the raw transient files need to be downloaded through this separate link: https://uofi.box.com/s/oagdxhea1wi8tvfij4robj0z0w8wq7j4. Once downloaded, place the file within the .d folder in the unzipped 20210930_ShortTransient_S3_5 folder to perform the reconstruction step. This deposit contains the minimal datasets needed to run the computational pipeline MEISTER, introduced in the manuscript titled "Integrative Multiscale Biochemical Mapping of the Brain via Deep-Learning-Enhanced High-Throughput Mass Spectrometry". The key steps of our computational pipeline include (1) tissue mass spectrometry imaging (MSI) reconstruction; (2) multimodal image registration and 3D reconstruction; (3) regional analysis; and (4) single-cell and tissue data integration. Detailed protocols to reproduce the results in the manuscript are provided, along with an example data set for learning the protocols. Our computational processing code is implemented mostly in Python, with MATLAB used for image registration.
keywords: deep learning; mass spectrometry; single cells
published: 2024-02-21
 
Data associated with the manuscript "Niche conservatism and spread explain hybridization and introgression between native and invasive fish" by Jordan H. Hartman, Joel B. Corush, Eric R. Larson, Jeremy S. Tiemann, Philip Willink, and Mark A. Davis. For this project, we combined results of ecological niche models (ENMs) and next-generation restriction site-associated DNA sequencing (RADseq) to test theories of niche conservatism and biotic resistance on the success of invasion, hybridization, and extent of introgression between native Western Banded Killifish and non-native Eastern Banded Killifish. This dataset provides the sampling locations and number of Banded Killifish in each population, accession numbers for RADseq from the National Center for Biotechnology Information Sequence Read Archive and the assignment of each Banded Killifish, the habitat associations of each population from the ENMs, and the occurrence points used to build the ENMs.
keywords: Banded Killifish; ecological niche model; Fundulus diaphanus; hybrid swarm; invasive species; Laurentian Great Lakes
published: 2018-09-06
 
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 to the present, with awards continuing through the end of the second XSEDE award in 2021. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2023-06-29
 
This database provides estimates of agricultural and food commodity flows [in both tons and $US] between the US and China for the year 2017. Pairwise information is provided between US states and Chinese provinces, and US counties and Chinese provinces for 7 Standardized Classification of Transported Goods (SCTG) commodity categories. Additionally, crosswalks are provided to match Harmonized System (HS) codes and China's Multi-Regional Input Output (MRIO) commodity sectors to their corresponding SCTG commodity codes. The included SCTG commodities are:
- SCTG 01: live animals and fish
- SCTG 02: cereal grains
- SCTG 03: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 04: animal feed, eggs, honey, and other products of animal origin
- SCTG 05: meat, poultry, fish, seafood, and their preparations
- SCTG 06: milled grain products and preparations, and bakery products
- SCTG 07: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Pandit et al. (2022) in Environmental Research Letters.
keywords: Food flows; High-resolution; County-scale; Bilateral; United States; China
published: 2024-03-25
 
This is the dataset for the manuscript titled "Differing physiological performance of coexisting cool- and warmwater fish species under heatwaves in the Midwestern United States".
keywords: climate change; heat wave; metabolic rate; swimming; predator-prey interaction; thermal tolerance; Sander vitreus; walleye; largemouth bass; species distributions
published: 2024-01-19
 
This data set is related to a SoyFACE experiment conducted in 2004, 2006, 2007, and 2008 with the soybean cultivars Loda and HS93-4118. The experiment looked at how seed elements were affected by elevated CO2 and yield. In this V2, 2 new files were added per journal requirement. In total, there are 5 data files in text format within digrado_et_al_gcb_data_V2 and 1 readme file. The file names are listed below. Details about headers are explained in the readme.txt file.
1. ionomic_data.txt contains the ionomic data (mg/kg) for the two cultivars. The file contains all six technical replicates for each plot. The cultivar, year, treatment, and the plot from which the samples were collected are given for each entry.
2. yield_data.txt contains the yield data for the two cultivars (seed yield in kg/ha, seed yield in bu/a, Protein (%), Oil (%)). The file contains yield data for every plot. The cultivar, year, treatment, and the plot from which the samples were collected are given for each entry.
3. mineral_pro_oil_yield.txt contains the yield per hectare for each mineral (g/ha) along with the yield per hectare for protein and oil (t/ha). This was obtained by multiplying the seed content of each element (minerals, protein, and oil) by the total seed yield. The file contains yield data for every plot. The cultivar, year, treatment, and the plot from which the samples were collected are given for each entry.
4. economic_assessment.txt contains data used to assess the financial impact of altered seed oil content on soybean oil production.
5. meteorological_data.txt contains the meteorological data recorded by a weather station located ~3 km from the experimental site (Willard Airport, Champaign). Data covering the period between May 28 and September 24 were used for 2004; between May 25 and September 24 for 2006; between May 23 and September 17 for 2007; and between June 16 and October 24 for 2008.
keywords: protein; oil; mineral; SoyFACE; nutrient; Glycine max; soybean; yield; CO2; agriculture; climate change
published: 2024-02-27
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. 
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)

Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024
2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024
3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024

Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
published: 2024-03-25
 
Diversity - PubMed dataset
Contact: Apratim Mishra (March 22, 2024)
This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection includes articles retrieved from Author-ity 2018 [1], a total of 228 040 papers and 440 310 authors. The sample of papers is based on the top 40 journals in the dataset, limited to papers with 2-10 authors published between 1990 and 2010, and stratified on paper count per year. Additionally, this dataset is limited to papers where the lead author is affiliated with one of four countries: the US, the UK, Canada, and Australia. Files are encoded with ‘utf-8’.

File1: auids_plos.csv (Important columns defined, 7 in total)
• AUID: a unique ID for each author
• Ethnea: ethnicity prediction
• Genni: gender prediction

File2: pmids_plos.csv (Important columns defined, 33 in total)
• pmid: unique paper ID
• year: Year of paper publication
• no_authors: Author count
• journal: Journal name
• years: first year of publication for every author
• age_bin: Binned age for every author
• Country-temporal: Country of affiliation for every author
• h_index: Journal h-index
• TimeNovelty: Paper Time novelty [2]
• nih_funded: Binary variable indicating NIH funding for any author
• prior_cit_mean: Mean of all authors’ prior citation rate
• Insti_impact_all: All authors’ respective institutions’ citation count
• Insti_impact: Maximum of all institutions’ citation count
• mesh_vals: Top MeSH values for every author for that paper
• outer_mesh_vals: MeSH qualifiers for every author for that paper
• relative_citation_ratio: RCR

The ‘Readme’ includes a description for all columns.
[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
keywords: Diversity; PubMed; Citation
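Because the files are UTF-8 encoded CSVs with named columns, they can be read with Python's standard csv module. The sketch below counts rows per value of the year column. It assumes only that a column named year exists, as listed above; the helper name and the in-memory sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def papers_per_year(csv_file) -> Counter:
    """Count rows per value of the 'year' column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["year"] for row in reader)


# With the real file (the local path is an assumption):
# with open("pmids_plos.csv", encoding="utf-8", newline="") as f:
#     print(papers_per_year(f))

# Tiny in-memory stand-in with made-up rows, only to show the call shape.
sample = io.StringIO("pmid,year,no_authors\n1,1995,3\n2,1995,2\n3,2001,4\n")
counts = papers_per_year(sample)
```

Values come back as strings, so convert with int() before doing arithmetic on years or author counts.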
published: 2024-03-28
 
Read me file for the data repository
*******************************************************************************
This repository has raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". We arrange the data by the figure in which it first appeared. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage in V and conductance in S. All Raman modes are given in cm^-1.
*******************************************************************************
How to use this dataset
All data in this dataset are stored in binary NumPy array format as .npy files. To read a .npy file, use the NumPy module in Python and call np.load(). Example: suppose the filename is example_data.npy. To load it, run the following in a Python program or a Jupyter notebook:
    import numpy as np
    data = np.load("example_data.npy")
The contents of the example file are then stored in the data object.
*******************************************************************************
published: 2016-12-13
 
BAM files for founding strain (MG1655-motile) as well as evolved strains from replicate motility selection experiments in low-viscosity agar plates containing either rich medium (LB) or minimal medium (M63+0.18mM galactose)
published: 2022-03-25
 
This upload includes the 16S.B.ALL dataset in the 100-HF condition (referred to as 16S.B.ALL-100-HF) used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). The 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we required all fragmentary sequences to have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp). In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths within 25% of the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH. More details of the tar.gz file are described in README.txt.
keywords: MAGUS; UPP; Multiple Sequence Alignment; eHMMs
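The note that the lower bound pushes the average fragment length from 100 bp to about 120 bp can be checked with a short simulation. This is a sketch under stated assumptions, not the authors' procedure: it assumes lengths are drawn from a normal distribution and that draws of 50 bp or less are simply redrawn, neither of which is specified in the description. The function name is hypothetical.

```python
import random
import statistics


def sample_fragment_length(rng, mean=100.0, sd=60.0, min_len=50):
    """Draw one fragment length, redrawing until it exceeds min_len.

    Assumption: normal distribution with rejection of short draws.
    """
    while True:
        length = round(rng.gauss(mean, sd))
        if length > min_len:
            return length


rng = random.Random(0)  # fixed seed so the sketch is repeatable
lengths = [sample_fragment_length(rng) for _ in range(20000)]
average = statistics.mean(lengths)
```

Under these assumptions the truncated distribution has a mean a little above 120 bp, which is consistent with the "~120 bp" figure quoted in the description.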
published: 2016-06-06
 
These datasets represent first-time collaborations between first and last authors (with mutually exclusive publication histories) on papers with 2 to 5 authors in years [1988,2009] in PubMed. Each record of each dataset captures aspects of the similarity, nearness, and complementarity between two authors about the paper marking the formation of their collaboration.
published: 2018-05-06
 
This deposit contains all raw data and analysis from the paper "In-cell titration of small solutes controls protein stability and aggregation". Data are organized into several types:
1) analysis*.tar.gz are the analysis scripts and the resulting data for each cell. The numbers correspond to the numbers shown in Fig. S1 of the publication.
2) scripts.tar.gz contains helper scripts to create the dataset in bash format.
3) input.tar.gz contains headers and other information that is fed into bash scripts to create the dataset.
4) All rawData*.tar.gz are tarballs of the data of cells in different solutes in .mat files readable by MATLAB, as follows:
- Each experiment included in the publication is represented by two MATLAB files: (1) a calibration jump under amber illumination (_calib.mat suffix) and (2) a full jump under blue illumination (FRET data)
- Each file contains the following fields:
    coordleft - coordinates of cropped and aligned acceptor channel on the original image
    coordright - coordinates of cropped and aligned donor channel on the original image
    dataleft - a 3D 12-bit integer matrix containing acceptor channel fluorescence for each pixel and time step. Not available in _calib files
    dataright - a 3D 12-bit integer matrix containing donor channel fluorescence for each pixel and time step. This will be mCherry in _calib files and AcGFP in data files.
    frame1 - original image size
    imgstd - cropped dimensions
    numFrames - number of frames in dataleft and dataright
    videos - a structure containing camera data. Specifically, videos.TimeStamp includes the time for each frame.
keywords: Live cell; FRET microscopy; osmotic challenge; intracellular titrations; protein dynamics
published: 2022-05-13
 
The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
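For readers who want to inspect the .phy files without installing phylogenetics software, a PHYLIP alignment can be read with a few lines of Python. The sketch below assumes the relaxed, sequential variant (one taxon per line, name and sequence separated by whitespace). Strict PHYLIP uses fixed 10-character names and some files are interleaved, so check which variant the dataset uses before relying on this. The function name and the two-taxon example are illustrative only.

```python
def parse_phylip(text):
    """Parse relaxed sequential PHYLIP text into {taxon_name: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header line: number of taxa, then number of aligned sites.
    n_taxa, n_chars = (int(x) for x in lines[0].split())
    alignment = {}
    for line in lines[1:1 + n_taxa]:
        name, seq = line.split(None, 1)
        alignment[name] = "".join(seq.split())  # drop spaces inside the row
    # Every row of an alignment must have the declared length.
    for name, seq in alignment.items():
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} sites, got {len(seq)}")
    return alignment


example = """2 8
taxonA ACGTACGT
taxonB ACG-ACGT
"""
aln = parse_phylip(example)
```

The length check is a cheap way to catch a file that is actually interleaved or uses fixed-width names, since either case would usually fail it.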
published: 2022-08-08
 
This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment". The zip file has the following structure (presented as an example):

    salma_paper_datasets/
    |_README.md
    |_10aa/
    |_crw/
    |_homfam/
       |_aat/
       |  |_...
       |_...
    |_het/
       |_5000M2-het/
       |  |_...
       |_5000M3-het/
       ...
    |_rec_res/

Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files]

# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned versions from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: These are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.

# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have them will be included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).
4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have them will be included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have them will be included.

> If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
> If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
> If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.

# Additional file(s)
1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et al. 2010) to search for protein sequences.
keywords: SALMA; MAFFT; alignment; eHMM; sequence length heterogeneity
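The backbone/query rule described above (full-length means a length within 25% of the median) is easy to express in code. The following is an illustrative sketch rather than the authors' script: it assumes the bounds are inclusive and that lengths are measured on unaligned sequences, and the function name and toy sequences are hypothetical.

```python
import statistics


def split_backbone_query(sequences, tolerance=0.25):
    """Split {name: sequence} by whether length is within tolerance of the median.

    Assumption: bounds are inclusive and lengths are of unaligned sequences.
    """
    median_len = statistics.median(len(s) for s in sequences.values())
    low, high = median_len * (1 - tolerance), median_len * (1 + tolerance)
    backbone, queries = {}, {}
    for name, seq in sequences.items():
        target = backbone if low <= len(seq) <= high else queries
        target[name] = seq
    return backbone, queries


# Toy input: the median length is 100, so the accepted range is 75 to 125.
toy = {"a": "A" * 100, "b": "A" * 110, "c": "A" * 95, "d": "A" * 40, "e": "A" * 200}
backbone, queries = split_backbone_query(toy)
```

Here sequences a, b, and c land in the backbone set, while d (too short) and e (too long) become queries, mirroring the split between `backbone.unaln.fasta` and `all-queries.unaln.fasta`.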
published: 2023-03-15
 
This data set is related to the SoyFACE experiments, which are open-air agricultural climate change experiments that have been conducted since 2001. The fumigation experiments take place at the SoyFACE farm and facility in Champaign County, Illinois during the growing season of each year, typically between June and October.
- The "SoyFACE Plot Information 2001 to 2021" file contains information about each year of the SoyFACE experiments, including the fumigation treatment type (CO2, O3, or a combination treatment), the crop species, the plots (also referred to as 'rings' and labeled with numbers between 2 and 31) used in each experiment, important experiment dates, and the target concentration levels or 'setpoints' for CO2 and O3 in each experiment.
- This data set includes files with minute readings of the fumigation levels ("SoyFACE 1-Minute Fumigation Data Files" folder) from the SoyFACE experiments. The "SoyFACE 1-Minute Fumigation Data Files" folder contains sub-folders for each year of the experiments, each of which contains sub-folders for each ring used in that year's experiments. This data set also includes hourly data files for the fumigation experiments ("SoyFACE Hourly Fumigation Data Files" folder) created from the 1-minute files, and hourly ambient/weather data files for each year of the experiments ("Hourly Weather and Ambient Data Files" folder). The ambient CO2 and O3 data are collected at SoyFACE, and the weather data are collected from the SURFRAD and WARM weather stations located near the SoyFACE farm.
- The "Fumigation Target Percentages" file shows how much of the time the CO2 and O3 fumigation levels are within a 10 or 20 percent margin of the target levels when the fumigation system is turned on.
- The "Matlab Files" folder contains custom code (Aspray, E.K.) that was used to clean the "SoyFACE 1-Minute Fumigation Data" files and to generate the "SoyFACE Hourly Fumigation Data" and "Fumigation Target Percentages" files. Code information can be found in the "SoyFACE Hourly Fumigation Data Explanation" file.
- Finally, the " * Explanation" files contain information about the column names, units of measurement, and other pertinent information for each data file.
*NOTE: We have identified some files in the “SoyFACE 1-Minute Fumigation Data Files” folder in our SoyFACE data set submission that were not downloaded properly - the files were present in the folder, but the actual files were empty. V3 ensures that there are no longer any empty files in the data set.
keywords: SoyFACE; agriculture; agricultural; climate; climate change; atmosphere; atmospheric change; CO2; carbon dioxide; O3; ozone; soybean; fumigation; treatment
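The "Fumigation Target Percentages" idea, the share of readings within a 10 or 20 percent margin of the setpoint, can be illustrated in a few lines. This is a hedged sketch only: the dataset's own figures were produced with the authors' MATLAB code, the function name here is hypothetical, and the readings and the 600 ppm setpoint are made-up values. It also assumes the readings passed in were already restricted to times when the fumigation system was on.

```python
def fraction_within_margin(readings, target, margin):
    """Fraction of readings within +/- margin (e.g. 0.10) of the target level."""
    if not readings:
        raise ValueError("no readings")
    tolerance = target * margin
    hits = sum(1 for r in readings if abs(r - target) <= tolerance)
    return hits / len(readings)


# Made-up CO2 readings (ppm) against a hypothetical 600 ppm setpoint.
readings = [598.0, 640.0, 655.0, 700.0, 545.0, 490.0, 610.0, 730.0]
within_10 = fraction_within_margin(readings, 600.0, 0.10)
within_20 = fraction_within_margin(readings, 600.0, 0.20)
```

With these toy values, five of the eight readings fall inside the 10 percent band and seven fall inside the 20 percent band.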
published: 2020-06-03
 
This dataset provides files for use in analysis of human land preference across Australasia, and in a localized analysis of land preference in Laos and Vietnam. All files can be imported into ArcGIS for visualization, and re-analyzed using the open source Maxent species distribution modeling program. CSV files contain known human presence sites for model validation. ASC files contain geographically coded environmental data for mean annual temperature and mean annual precipitation during the Last Glacial Maximum, as well as downward slope data. All ASC files are in the WGS 1984 Mercator map projection for visualization in ArcGIS and can be opened as text files in text editors supporting large file sizes.
keywords: human dispersal; ecological niche modeling; Australasia; Late Pleistocene; land preference
published: 2022-02-14
 
This dataset contains simulation results from the numerical model PartMC-MOSAIC used in the article "Quantifying the effects of mixing state on aerosol optical properties". The article has been submitted to the journal Atmospheric Chemistry and Physics. There are 100 scenario directories in this dataset, numbered 00-99. Each scenario contains 25 NetCDF files of hourly output from the PartMC-MOSAIC simulations, which hold the simulated gas and particle information. The data was produced using version 2.5.0 of PartMC-MOSAIC. Instructions to compile and run PartMC-MOSAIC are available at https://github.com/compdyn/partmc. The chemistry code MOSAIC is available by request from Rahul.Zaveri@pnl.gov. For more details on reproducing the cases, please contact nriemer@illinois.edu and yuyao3@illinois.edu.
keywords: Aerosol mixing state; Aerosol optical properties; Mie calculation; Black Carbon
published: 2023-02-23
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d'État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e. realized or successful coups, unrealized coup attempts, or thwarted conspiracies), the type of actor(s) who initiated the coup (i.e. military, rebels, etc.), as well as the fate of the deposed leader. The current version, Version 2.1.2, adds 6 coup events that occurred in 2022 and updates the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021; it fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. That update also added actor coding for 46 coup events and executive outcomes for 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Changes from the previously released data (v2.0.0) also include: 1. Adding additional events and expanding the period covered to 1945-2022 2. Filling in missing actor information 3. Filling in missing information on the outcomes for the incumbent executive 4. 
Dropping events that were incorrectly coded as coup events <br> <b>Items in this Dataset</b> 1. <i>Cline Center Coup d'État Codebook v.2.1.2 Codebook.pdf</i> - This 16-page document provides a description of the Cline Center Coup d’État Project Dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d’État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. <i>Revised February 2023</i> 2. <i>Coup Data v2.1.2.csv</i> - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 981 observations. <i>Revised February 2023</i> 3. <i>Source Document v2.1.2.pdf</i> - This 315-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. <i>Revised February 2023</i> 4. <i>README.md</i> - This file contains useful information for the user about the dataset. It is a text file written in markdown language. <i>Revised February 2023</i> <br> <b> Citation Guidelines</b> 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2023. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6 2. 
To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2023. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6
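Because values of coup_id such as 00201062021 begin with zeros, a lookup against the source document only works if the identifier is kept as text. The sketch below indexes rows by coup_id with Python's standard csv module, which reads every field as a string. Only the coup_id column name comes from the description above; the `country` column and its value are hypothetical stand-ins used to keep the example self-contained.

```python
import csv
import io


def load_coups(handle):
    """Index CSV rows by coup_id, keeping the ID as a string."""
    return {row["coup_id"]: row for row in csv.DictReader(handle)}


# In-memory stand-in for the real file; the 'country' column is hypothetical.
sample = io.StringIO("coup_id,country\n00201062021,Exampleland\n")
coups = load_coups(sample)

event = coups["00201062021"]
print(event["country"])

# Converting the ID to a number silently drops the leading zeros.
print(str(int("00201062021")))
```

For the actual data, the same function could be given a handle such as `open("Coup Data v2.1.2.csv", newline="", encoding="utf-8")`; the file encoding is an assumption and should be checked against the README.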
published: 2015-12-16
 
This dataset contains the data for PASTA and UPP. PASTA data was used in the following articles: Mirarab, Siavash, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.” Journal of Computational Biology 22, no. 5 (2015): 377–86. doi:10.1089/cmb.2014.0156. Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment.” Edited by Roded Sharan. Research in Computational Molecular Biology, 2014, 177–91. UPP data was used in: Nguyen, Nam-phuong D., Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. “Ultra-Large Alignments Using Phylogeny-Aware Profiles.” Genome Biology 16, no. 1 (December 16, 2015): 124. doi:10.1186/s13059-015-0688-z.
published: 2019-08-13
 
Multiple sequence alignments from concatenated nuclear and mitochondrial genes and resulting phylogenetic tree files of fruit doves and their close relatives. Files include: BEAST input XML file (fruit_dove_beast_input.xml); a maximum clade credibility tree from a BEAST analysis (fruit_dove_beast_mcc.tre); concatenated multiple sequence alignment NEXUS files for the novel dataset (fruit_dove_concatenated_alignment.nex, 76 taxa, 4,277 characters) and the dataset with additional sequences (fruit_dove_plus_cibois_data_concatenated_alignment.nex, 204 taxa, 4,277 characters), both of which contain a MrBayes block including partition information; and 50% majority-rule consensus trees generated from MrBayes analyses, using the NEXUS alignment files as inputs (fruit_dove_mrbayes_consensus.tre, fruit_dove_plus_cibois_data_mrbayes_consensus.tre).
keywords: fruit doves; multiple sequence alignment; phylogeny; Aves: Columbidae