Illinois Data Bank Dataset Search Results

published: 2022-01-01
 
The file “Fla.fasta”, comprising 10526 positions, is the concatenated amino acid alignment of 51 orthologues from 182 bacterial strains. It was used for the maximum likelihood and maximum parsimony analyses of Flavobacteriales. Bacterial species names and strains are used as the sequence names; host names of insect endosymbionts are shown in brackets. The file “16S.fasta” is the alignment of 233 bacterial 16S rRNA sequences. It contains 1455 positions and was used for the maximum likelihood analysis of flavobacterial insect endosymbionts. The names of endosymbiont strains were replaced by the names of their hosts. In addition to the species names, National Center for Biotechnology Information (NCBI) accession numbers are indicated in the sequence names (e.g., sequence “Cicadellidae_Deltocephalinae_Macrostelini_Macrosteles_striifrons_AB795320” is the 16S rRNA of Macrosteles striifrons (Cicadellidae: Deltocephalinae: Macrostelini) with the NCBI accession number AB795320). The file “Sulcia_pep.fasta” is the concatenated amino acid alignment of 131 orthologues of “Candidatus Sulcia muelleri” (Sulcia). It contains 41970 positions and represents 101 Sulcia strains and 3 Blattabacterium strains. This file was used for the maximum likelihood analysis of Sulcia. The file “Sulcia_nucleotide.fasta” is the concatenated nucleotide alignment corresponding to the sequences in “Sulcia_pep.fasta” but also includes the alignment of 16S rRNA. It has 127339 positions and was used for the maximum likelihood and maximum parsimony analyses of Sulcia. Individual gene alignments (16S rRNA and 131 orthologues of Sulcia and Blattabacterium) are deposited in the compressed file “individual_gene_alignments.zip”; these were used to construct gene trees for multispecies coalescent analysis. The names of Sulcia strains were replaced by the names of their hosts in “Sulcia_pep.fasta”, “Sulcia_nucleotide.fasta” and the files in “individual_gene_alignments.zip”.
In all the alignment files, gaps are indicated by “-”.
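The naming convention described for “16S.fasta” can be illustrated with a minimal sketch. It assumes the accession number is always the final underscore-separated token of the sequence name, as in the example given; accessions that themselves contain an underscore would need different handling. The function names are hypothetical and are not part of the dataset.

```python
def split_sequence_name(name):
    """Split a 16S.fasta-style sequence name into taxon parts and accession.

    Assumption: the accession is the last underscore-separated token.
    """
    tokens = name.lstrip(">").strip().split("_")
    return tokens[:-1], tokens[-1]


def gap_fraction(aligned_sequence):
    """Return the fraction of alignment positions that are gaps ("-")."""
    if not aligned_sequence:
        return 0.0
    return aligned_sequence.count("-") / len(aligned_sequence)


taxon_parts, accession = split_sequence_name(
    "Cicadellidae_Deltocephalinae_Macrostelini_Macrosteles_striifrons_AB795320"
)
print(taxon_parts[-2:], accession)
print(gap_fraction("AC--GT-A"))
```

The last two taxon tokens give the genus and species of the host, and the gap fraction is a quick way to spot sequences that are mostly missing from an alignment.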
keywords: endosymbiont, “Candidatus Sulcia muelleri”, Auchenorrhyncha, coevolution
published: 2021-06-14
 
This repository contains the weights for two StyleGAN2 networks trained on two composite T1 and T2 weighted open-source brain MR image datasets, and one StyleGAN2 network trained on the Flickr Face HQ image dataset. Example images sampled from the respective StyleGANs are also included. The datasets themselves are not included in this repository. The weights are stored as `.pkl` files. The code and instructions to load and use the weights can be found at https://github.com/comp-imaging-sci/pic-recon . Additional details and citations can be found in the file "README.md".
keywords: StyleGAN2; Generative adversarial network (GAN); MRI; Medical imaging
published: 2021-08-27
 
The dataset shows all poison frogs (superfamily Dendrobatoidea) in private U.S. collections during 1990–2020. For each species and color morph, there is a date of arrival, the way it arrived in U.S. collections, and detailed notes related to its presence in the pet trade.
keywords: pet trade; amphibians; Dendrobatidae
published: 2022-02-09
 
The data file contains a list of articles and their RCT Tagger prediction scores, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published: 2022-02-09
 
The data file contains a list of articles with PMID information, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
keywords: Cochrane reviews; Randomized controlled trials; RCT; Automation; Systematic reviews
published: 2019-10-19
 
Large, distributed microphone arrays could offer dramatic advantages for audio source separation, spatial audio capture, and human and machine listening applications. This dataset contains acoustic measurements and speech recordings from 10 loudspeakers and 160 microphones spread throughout a large, reverberant conference room. The distributed microphone system contains two types of array: four wearable microphone arrays of 16 sensors each placed near the ears and across the upper body, and twelve tabletop arrays of 8 microphones each in enclosures designed to resemble voice-assistant speakers. The dataset includes recordings of chirps that can be used to measure impulse responses and of speech clips derived from the CSTR VCTK corpus. The speech clips are recorded both individually and as a mixture to support source separation experiments. The uncompressed files are about 13.4 GB.
keywords: microphone arrays; audio source separation; augmented listening; wireless sensor networks
published: 2020-10-28
 
We examined the role of stream flow on environmental DNA (eDNA) concentrations and detectability of an invasive clam (Corbicula fluminea), while also accounting for other abiotic and biotic variables. These data include the eDNA concentrations, quadrat estimates of clam density, and abiotic variables.
keywords: Corbicula; detection probability; eDNA; invasive species; lotic; occupancy modeling
published: 2022-08-20
 
Dataset associated with Jones and Ward BEAS-D-21-00106R2 submission: Parasitic cowbird development up to fledging and subsequent post-fledging survival reflect life history variation found across host species. Includes Excel CSV files and an .inp file with the data used in the nest survival and Brown-headed Cowbird post-fledging analyses, plus a file with descriptions of each column. The CSV file is set up for logistic exposure models in SAS or R, and the .inp file is set up to be loaded into Program MARK for a multi-state, recaptures-only analysis. Species included in the analyses: American Robin, Blue Grosbeak, Brown Thrasher, Blue-winged Warbler, Carolina Chickadee, Chipping Sparrow, Common Yellowthroat, Dickcissel, Eastern Bluebird, Eastern Phoebe, Eastern Towhee, Field Sparrow, Gray Catbird, House Wren, Indigo Bunting, Northern Cardinal, Red-winged Blackbird, Tree Swallow, Yellow-breasted Chat, and Yellow Warbler.
keywords: brood parasitism; cowbird; carryover effects; phenotypic plasticity; post-fledging; songbirds
published: 2022-06-01
 
This dataset contains information for the paper "Changes in neuropeptide prohormone genes among Cetartiodactyla livestock and wild species associated with evolution and domestication", Veterinary Sciences, MDPI. Protein sequences were predicted using GeneWise for 98 neuropeptide prohormone genes from publicly available genomes of 118 Cetartiodactyla species. All predictions (CetartiodactylaSequences2022.zip) were manually verified. Sequences were aligned within each prohormone using MAFFT (MDPImultalign2022.zip includes the multiple sequence alignment of all species available for each prohormone). Phylogenetic gene trees were constructed using PhyML and the species tree was constructed using ASTRAL (MDPItree2022.zip). The data are released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
keywords: prohormone; neuropeptide; Cetartiodactyla; phylogenetics; gene tree; species tree
published: 2023-06-06
 
This dataset is derived from COCI, the OpenCitations Index of Crossref open DOI-to-DOI references (opencitations.net). We have curated it to remove duplicates, self-loops, and parallel edges. These data were copied from the OpenCitations website on May 6, 2023 and subsequently processed to produce a node list and an edge list. Integer_ids have been assigned to the DOIs to reduce memory and storage needs when working with these data. As noted on the OpenCitations website, each record is a citing-cited pair that uses DOIs as persistent identifiers. Reference: Silvio Peroni, David Shotton (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1(1): 428-444. https://doi.org/10.1162/qss_a_00023
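The curation steps described above (dropping repeated pairs and self-loops, then mapping DOIs to integer ids) can be sketched as follows. This is a minimal illustration and not the authors' processing code: the function name is hypothetical, repeated citing-cited pairs are treated as both "duplicates" and "parallel edges", and lowercasing DOIs before comparison is an added assumption.

```python
def build_edge_list(pairs):
    """Turn citing-cited DOI pairs into an integer node list and edge list.

    Assumptions (not from the dataset description): DOIs are compared
    case-insensitively, and ids are assigned in order of first appearance.
    """
    node_ids = {}   # DOI -> integer id
    edges = set()   # set of (citing_id, cited_id); a set drops repeats
    for citing, cited in pairs:
        citing = citing.strip().lower()
        cited = cited.strip().lower()
        if citing == cited:
            continue  # self-loop: a record citing itself
        u = node_ids.setdefault(citing, len(node_ids))
        v = node_ids.setdefault(cited, len(node_ids))
        edges.add((u, v))
    return node_ids, sorted(edges)


# Hypothetical DOIs used only for illustration.
records = [
    ("10.1/a", "10.1/b"),
    ("10.1/A", "10.1/b"),   # repeat of the first pair after normalization
    ("10.1/c", "10.1/c"),   # self-loop
    ("10.1/b", "10.1/a"),   # reverse direction is a distinct edge
]
nodes, edge_list = build_edge_list(records)
print(nodes)
print(edge_list)
```

Storing edges as pairs of small integers rather than pairs of DOI strings is what gives the memory and storage savings mentioned in the description.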
keywords: open citations; bibliometrics; citation network; scientometrics
published: 2020-10-20
 
This dataset includes a total of 501 images of 42 fossil specimens of Striatopollis and 459 specimens of 45 extant species of the tribe Amherstieae-Fabaceae. These images were taken using Airyscan confocal superresolution microscopy at 630X magnification (63x/NA 1.4 oil DIC). The images are in the CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or in ImageJ. More information on how to open CZI files can be found here: https://www.zeiss.com/microscopy/us/products/microscope-software/zen/czi.html#microscope---image-data
keywords: Striatopollis catatumbus; superresolution microscopy; Cenozoic; tropics; Zeiss; CZI; striate pollen
published: 2020-02-05
 
The Delt_Comb.NEX text file contains the original data used in the phylogenetic analyses of Zahniser & Dietrich, 2013 (European Journal of Taxonomy, 45: 1-211). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first nine lines of the file indicate the file type (Nexus), that 152 taxa were analyzed, that a total of 3971 characters were analyzed, the format of the data, and the specification of two symbols used in the dataset. There are four datasets separated into blocks, one each for: 28S rDNA gene, Histone H3 gene, morphology, and insertion/deletion characters scored based on the alignment of the 28S rDNA dataset. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the publication using this dataset. A text file, Delt_morph_char.txt, is available here that states the morphological characters and character states that were scored in the Delt_Comb.NEX dataset. The original DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the publication. Chromatogram files for each sequencing read are available from the first author upon request.
keywords: phylogeny; DNA sequence; morphology; parsimony analysis; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; bayesian analysis
published: 2021-02-24
 
This dataset contains model output from the Community Earth System Model, Version 2 (CESM2; Danabasoglu et al. 2020). These data were used for analysis in Impacts of Large-Scale Soil Moisture Anomalies in Southeastern South America, published in the Journal of Hydrometeorology (DOI: 10.1175/JHM-D-20-0116.1). See this publication for details of the model simulations that created these data.
Four NetCDF (.nc) files are included in this dataset. Two files correspond to the control simulation (FHIST_SP_control) and two files correspond to a simulation with a dry soil moisture anomaly imposed in southeastern South America (FHIST_SP_dry; see the publication mentioned in the preceding paragraph for details on the spatial extent of the imposed anomaly). For each simulation, one file corresponds to output from the atmospheric model (file names with "cam") of CESM2 and the other to the land model (file names with "clm2"). These files are raw CESM output concatenated into a single file for each simulation. All files include data from 1979-01-02 to 2003-12-31 at a daily resolution. The spatial resolution of all files is about 1 degree longitude x 1 degree latitude. Variables included in these files are listed or linked below.
Variables in atmosphere model output:
Vertical velocity (omega)
Convective precipitation
Large-scale precipitation
Surface pressure
Specific humidity
Temperature (atmospheric profile)
Reference temperature (temp. at reference height, 2 meters in this case)
Zonal wind
Meridional wind
Geopotential height
Variables in land model output:
See https://www.cesm.ucar.edu/models/cesm1.2/clm/models/lnd/clm/doc/UsersGuide/history_fields_table_40.xhtml
Note that not all of the variables listed at the above link are included in the land model output files in this dataset.
This material is based upon work supported by the National Science Foundation under Grant No. 1454089. We acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. The CESM project is supported primarily by the National Science Foundation. We thank all the scientists, software engineers, and administrators who contributed to the development of CESM2.
References
Danabasoglu, G., and Coauthors, 2020: The Community Earth System Model Version 2 (CESM2). Journal of Advances in Modeling Earth Systems, 12, e2019MS001916, https://doi.org/10.1029/2019MS001916.
keywords: Climate modeling; atmospheric science; hydrometeorology; hydroclimatology; soil moisture; land-atmosphere interactions
published: 2021-03-23
 
DNN weights used in the evaluation of the ApproxTuner system. Link to paper: https://dl.acm.org/doi/10.1145/3437801.3446108
published: 2022-04-19
 
List of differentially expressed genes in human endometrial stromal cells with knockdown of Basigin (BSG) gene expression during decidualization. The BSG siRNA or negative scrambled control siRNA were transfected into human endometrial stromal cells (HESCs) following the protocol of siLentFect™ Lipid (Bio-Rad, Hercules, CA). Following complete knockdown of BSG in HESCs (72 hours after adding siRNA), HESCs were treated with medium containing estrogen, progesterone and cAMP to induce decidualization. BSG siRNA and negative control scrambled siRNA were added to the cells every four days (day 0, 4) over the course of the decidualization protocol. Total RNA was harvested at day 6 of the decidualization protocol for microarray analysis. Microarray analysis was performed at the University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center. Briefly, 0.2 micrograms of total RNA were labeled using the Agilent two color QuickAmp labeling kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s protocol. The optional spike-in controls were not used. Samples were hybridized to Human Gene Expression 4x44K v2 Microarray (Agilent Technologies, Santa Clara, CA) in an Agilent Hybridization Cassette according to standard protocols. The arrays were then scanned on an Axon GenePix 4000B scanner and the images were quantified using Axon GenePix 6.1. Microarray data pre-processing and statistical analyses were done in R (v3.6.2) using the limma package (3.42.0; Ritchie et al., 2015). Median foreground and median background values from the 4 arrays were read into R and any spots that had been manually flagged (-100 values) were given a weight of zero. The background values were ignored because investigations showed that trying to use them to adjust for background fluorescence added more noise to the data; background was low and even for all arrays, therefore no background correction was done.
The individual Cy5 and Cy3 fluorescence for each array were normalized together using the quantile method 3 (Yang and Thorne, 2003). Agilent's Human Gene Expression 4x44K v2 Microarray has a total of 45,220 probes: 1224 probes for positive controls, 153 for negative controls, 823 labeled “ignore” and 43,118 labeled “cDNA”. The pos+neg+ignore probes were used to ascertain the background level of fluorescence (6, on the log2 scale) and were then discarded. The cDNA probes comprise 34,127 unique 60mer probes, of which 999 probes are spotted 10 times each and the rest one time each. We averaged the replicate probes for those spotted 10 times and then fit a mixed model that had treatment and dye as fixed effects and array pairing as a random effect (Phipson et al., 2016; Smyth et al., 2005). After fitting the model but before False Discovery Rate (FDR) correction (Benjamini and Hochberg, 1995), probes were filtered out by the following criteria: 1) did not have at least 4/8 samples with expression values > 6 (14,105 probes removed), 2) no longer had an assigned Entrez Gene ID in Bioconductor’s HsAgilentDesign026652.db annotation package (v3.2.3; 2,152 probes removed) (Huber et al., 2015), 3) mapped to the same Entrez Gene ID as another probe but had a larger p-value for treatment effect (4,141 probes removed). This left 13,729 probes representing 13,729 unique genes. *Please note that there is a discrepancy between the file and the readme, as this plain text is the actual data file of this dataset.
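The probe-filtering counts above can be checked with a short calculation. This sketch assumes the three filters were applied in sequence to the 34,127 unique cDNA probes, which is how the description reads; the dictionary labels are paraphrases added for illustration.

```python
# Starting point: unique 60mer cDNA probes after averaging replicates.
unique_probes = 34_127

# Probes removed by each filtering criterion, as reported in the description.
removed = {
    "fewer than 4/8 samples with expression > 6": 14_105,
    "no assigned Entrez Gene ID": 2_152,
    "same Entrez Gene ID as another probe, larger p-value": 4_141,
}

remaining = unique_probes - sum(removed.values())
print(remaining)  # 13729, matching the reported 13,729 probes
```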
keywords: Basigin; endometrium; decidualization; human
published: 2016-08-16
 
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the Pfam families used to build the HMMs and BLAST databases. The file structure is: ./X/Y/initial.fasttree ./X/Y/initial.fasta where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Inside each folder are two files: initial.fasta, which is the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta. The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory. [1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
published: 2021-02-01
 
These datasets provide the basis of our analysis in the paper - The Potential Impact of a Clean Energy Society On Air Quality. All datasets here are from the model output (CAM4-chem). All the simulations were run to steady-state and only the outputs used in the analysis are archived here.
keywords: clean energy; ozone; particulates
published: 2021-04-15
 
To generate the bibliographic and survey data to support a data reuse study conducted by several Library faculty and accepted for publication in the Journal of Academic Librarianship, the project team used a series of web-based scripts that called several different endpoints of the Scopus API. The related dataset, "Data for: An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University", contains the survey design and results.
1) getScopus_API_process_dmp_IDB.asp: uses the Search API to query the Scopus database for papers by UIUC authors published in 2015, limited to one of 9 pre-defined Scopus subject areas, and retrieves metadata results sorted from highest to lowest by the number of times the retrieved articles were cited. The URL for the basic searches took the following form: https://api.elsevier.com/content/search/scopus?query=(AFFIL%28(urbana%20OR%20champaign) AND univ*%29) OR (AF-ID(60000745) OR AF-ID(60005290))&apikey=xxxxxx&start=" & nstart & "&count=25&date=2015&view=COMPLETE&sort=citedby-count&subj=PHYS
Here, the variable nstart was incremented by 25 each iteration and 25 records were retrieved in each pass. The subject area was changed (e.g. from PHYS to COMP for computer science) in each of the 9 runs. This script does not use the Scopus API cursor but downloads 25 records at a time for up to 28 times -- or 675 maximum bibliographic records. The project team felt that looking at the 675 most cited articles from UIUC faculty in each of the 9 subject areas was sufficient to gather a robust, representative sample of articles from 2015. These downloaded records were stored in a temporary table that was renamed for each of the 9 subject areas.
2) get_citing_from_surveys_IDB.asp: takes a Scopus article ID (eid) from each of the 49 surveys returned by UIUC authors and retrieves short citing article references, 200 at a time, into a temporary composite table. These citing records contain only one author, no author affiliations, and no author email addresses. This script uses the Scopus API cursor=* feature and is able to download all the citing references of an article 200 records at a time.
3) put_in_all_authors_affil_IDB.asp: adds important data to the short citing records. The script adds all co-authors and their affiliations, the corresponding author, and author email addresses.
4) process_for_final_IDB.asp: creates a relational database table with author, title, and source journal information for each of the citing articles that can be copied as an Excel file for processing by the Qualtrics survey software. This was initially 4,626 citing articles over the 49 UIUC authored articles, but was reduced to 2,041 entries after checking for available email addresses and eliminating duplicates.
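The offset-based paging described for the first script can be sketched in Python. This is an illustration only and is not the original ASP code: the function names are hypothetical, the first offset is assumed to be 0, the query is shortened to the two AF-ID clauses from the URL shown above, and no request is actually sent. The parameter names (`start`, `count`, `date`, `view`, `sort`, `subj`) are taken from that URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.elsevier.com/content/search/scopus"


def page_starts(max_records, page_size=25):
    """Return the start offsets needed to cover max_records (assumes offset 0 first)."""
    return list(range(0, max_records, page_size))


def build_search_url(query, api_key, start, subject, page_size=25):
    """Build one paged search URL; parameters mirror the URL quoted in the description."""
    params = {
        "query": query,
        "apikey": api_key,
        "start": start,
        "count": page_size,
        "date": 2015,
        "view": "COMPLETE",
        "sort": "citedby-count",
        "subj": subject,
    }
    return f"{BASE_URL}?{urlencode(params)}"


query = "AF-ID(60000745) OR AF-ID(60005290)"
urls = [
    build_search_url(query, "xxxxxx", start, "PHYS")
    for start in page_starts(max_records=675)
]
print(len(urls))
print(urls[2])
```

Rerunning the loop with a different `subject` code corresponds to the per-subject-area runs described above.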
keywords: Scopus API; Citing Records; Most Cited Articles
published: 2022-08-29
 
Example scripts and configuration files needed to perform select simulations described in the manuscript "Percolation transition prescribes protein size-specific barrier to passive transport through the nuclear pore complex."
keywords: Nuclear Pore Complex; simulation setup
published: 2023-01-12
 
This dataset was developed as part of a study that examined the correlational relationships between local journal authorship, local and external citation counts, full-text downloads, link-resolver clicks, and four global journal impact factor indices within an all-disciplines journal collection of 12,200 titles and six subject subsets at the University of Illinois at Urbana-Champaign (UIUC) Library. While earlier investigations of the relationships between usage (downloads) and citation metrics have been inconclusive, this study shows strong correlations in the all-disciplines set and most subject subsets. The normalized Eigenfactor was the only global impact factor index that correlated highly with local journal metrics. Some of the identified disciplinary variances among the six subject subsets may be explained by the journal publication aspirations of UIUC researchers. The correlations between authorship and local citations in the six specific subject subsets closely match national department or program rankings. All the raw data used in this analysis are provided in the form of relational database tables with multiple columns, which can be opened using MS Access. Descriptions of the variables can be viewed through "Design View" (right-click the selected table and choose "Design View"). The 2 PDF files provide an overview of the tables included in each MDB file. In addition, the processing scripts and Pearson correlation code are available at https://doi.org/10.13012/B2IDB-0931140_V1.
keywords: Usage and local citation relationships; publication; citation and usage metrics; publication; citation and usage correlation analysis; Pearson correlation analysis
published: 2021-05-14
 
This document contains the Supplemental Materials for Chapter 4: Climate Change Impacts on Agriculture from the report "An Assessment of the Impacts of Climate Change in Illinois" published in 2021.
keywords: Illinois; climate change; agriculture; impacts; adaptation; crop yield; ISAM; econometrics; days suitable for fieldwork
published: 2021-10-13
 
Drainage network analysis is fundamental to understanding the characteristics of surface hydrology. Based on elevation data, drainage network analysis is often used to extract key hydrological features like drainage networks and streamlines. Limited by raster-based data models, conventional drainage network algorithms typically allow water to flow in 4 or 8 directions (surrounding grids) from a raster grid. To resolve this limitation, this paper describes a new vector-based method for drainage network analysis that allows water to flow in any direction around each location. The method is enabled by rapid advances in Light Detection and Ranging (LiDAR) remote sensing and high-performance computing. The drainage network analysis is conducted using a high-density point cloud instead of Digital Elevation Models (DEMs) at coarse resolutions. Our computational experiments show that the vector-based method can better capture water flows without limiting the number of directions due to imprecise DEMs. Our case study applies the method to Rowan County watershed, North Carolina in the US. After comparing the drainage networks and streamlines detected with corresponding reference data from US Geological Survey generated from the Geonet software, we find that the new method performs well in capturing the characteristics of water flows on landscape surfaces in order to form an accurate drainage network. This dataset contains all the code, notebooks, datasets used in the study conducted for the research publication titled " A Vector-Based Method for Drainage Network Analysis Based on LiDAR Data ". 
## What's Inside
A quick explanation of the components:
* `A Vector Approach to Drainage Network Analysis Based on LiDAR Data.ipynb` is a notebook for finding the drainage network based on LiDAR data
* `Picture1.png` is a picture representing the pseudocode of our new algorithm
* `HPC` folder contains code for running the algorithm with sbatch on HPC
** `execute.sh` is a bash script that uses sbatch to conduct large-scale analysis for the algorithm
** `run.sh` is a bash script that calls `execute.sh` for large-scale calculation for the algorithm
** `run.py` includes the code implemented for the algorithm
* `Rowan Creek Data` includes data that are used in the study
** `3_1.las` and `3_2.las` are the LiDAR data files used in the analysis presented in the paper. Users may use these data files to reproduce our results and may replace them with their own LiDAR files to run this method over different areas
** `reference` folder includes reference data from USGS
*** `reference_3_1.tif` and `reference_3_2.tif` are reference data for the drainage system analysis retrieved from USGS.
keywords: CyberGIS; Drainage System Analysis; LiDAR