Displaying datasets 176 - 200 of 397 in total

Subject Area

Life Sciences (221)
Social Sciences (88)
Physical Sciences (53)
Technology and Engineering (34)
Arts and Humanities (1)


Other (107)
U.S. National Science Foundation (NSF) (104)
U.S. National Institutes of Health (NIH) (41)
U.S. Department of Energy (DOE) (35)
U.S. Department of Agriculture (USDA) (21)
Illinois Department of Natural Resources (IDNR) (9)
U.S. National Aeronautics and Space Administration (NASA) (4)
U.S. Geological Survey (USGS) (4)
U.S. Army (1)

Publication Year

2020 (100)
2021 (97)
2019 (73)
2018 (59)
2017 (35)
2016 (30)
2022 (3)


CC0 (227)
CC BY (162)
custom (8)
published: 2018-11-18
This dataset contains experimental measurements used in the paper, "Ultra-sensitivity of Numerical Landscape Evolution Models to their Initial Conditions." (to be submitted). The data is taken from experimental runs in a miniature landscape model named the eXperimental Landscape Evolution (XLE) facility. In this facility, we complete five >24hr runs at 5 minute temporal resolution. Every five minutes, an planform image was capture, and a digital elevation model (DEM) was generated. For each run, images and a corresponding animation of images are documented. In addition,ASCII formatted DEMs along with color hillshade maps were generated. The hillshade map images were also made into an animation. This dataset is associated with the following publication: https://doi.org/10.1029/2019GL083305
keywords: landscape evolution model; digital elevation model; geomorphology
published: 2020-06-02
The text file contains the original data used in the phylogenetic analyses of Xue et al. (2020: Systematic Entomology, in press). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 89 taxa (species) and 2676 characters, indicate that the first 2590 characters are DNA sequence and the last 86 are morphological, that gaps inserted into the DNA sequence alignment and inapplicable morphological characters are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 86 morphological characters. The positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file (Subset1 = 16S gene; Subset2 = 28S gene; Subset3 = COI gene; Subset 4 = Histone H3 and H2A genes). The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf, also available from the journal website. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; 16S rDNA; histone H3; histone H2A; cytochrome oxidase I; Bayesian analysis
published: 2020-06-03
This datasets provide basis of our analysis in the paper - Potential Impacts of Supersonic Aircraft on Stratospheric Ozone and Climate. All datasets here can be categorized into emission data and model output data (WACCM). All the model simulations (background and perturbation) were run to steady-state and only the datasets used in analysis are archived here.
keywords: NetCDF; Supersonic aircraft; Stratospheric ozone; Climate
published: 2020-06-03
This dataset provides files for use in analysis of human land preference across Australasia, and in a localized analysis of land preference in Laos and Vietnam. All files can be imported into ArcGIS for visualization, and re-analyzed using the open source Maxent species distribution modeling program. CSV files contain known human presence sites for model validation. ASC files contain geographically coded environmental data for mean annual temperature and mean annual precipitation during the Last Glacial Maximum, as well as downward slope data. All ASC files are in the WGS 1984 Mercator map projection for visualization in ArcGIS and can be opened as text files in text editors supporting large file sizes.
keywords: human dispersal; ecological niche modeling; Australasia; Late Pleistocene; land preference
published: 2020-05-31
This repository includes a simulated dataset and related scripts used for the paper "Moss: Accurate Single-Nucleotide Variant Calling from Multiple Bulk DNA Tumor Samples".
keywords: Somatic Mutations; Bulk DNA Sequencing; Cancer Genomics
published: 2020-05-30
Original leaf gas exchange and absorptance data used in the Collison et al. (2020) Light, Not Age, Underlies the Q9 Maladaptation of Maize and Miscanthus Photosynthesis to Self-Shading - Frontiers in Plant Science doi: 10.3389/fpls.2020.00783
keywords: C4 photosynthesis; canopy; bioenergy; food security; quantum yield; shade acclimation; photosynthetic light-use efficiency; leaf aging
published: 2019-11-11
This repository includes scripts and datasets for the paper, "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Note: The results from estimating species trees with ASTRID-multi (included in this repository) are *not* included in the FastMulRFS paper. We estimated species trees with ASTRID-multi in the fall of 2019, but ASTRID-multi had an important bug fix in January 2020. Therefore, the ASTRID-multi species trees in this repository should be ignored.
keywords: Species tree estimation; gene duplication and loss; statistical consistency; MulRF, FastRFS
published: 2020-05-17
Models and predictions for submission to TRAC - 2020 Second Workshop on Trolling, Aggression and Cyberbullying Our approach is described in our paper titled: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020). The source code for training this model and more details can be found on our code repository: https://github.com/socialmediaie/TRAC2020 NOTE: These models are retrained for uploading here after our submission so the evaluation measures may be slightly different from the ones reported in the paper.
keywords: Social Media; Trolling; Aggression; Cyberbullying; text classification; natural language processing; deep learning; open source;
published: 2020-05-20
This dataset is a snapshot of the presence and structure of entrepreneurship education in U.S. four-year colleges and universities in 2015, including co-curricular activities and related infrastructure. Public, private not-for-profit and for-profit institutions are included, as are specialized four-year institutions. The dataset provides insight into the presence of entrepreneurship education both within business units and in other units of college campuses. Entrepreneurship is defined broadly, to include small business management and related career-focused options.
keywords: Entrepreneurship education; Small business education; Ewing Marion Kauffman Foundation; csv
published: 2020-05-15
Trained models for multi-task multi-dataset learning for sequence prediction in tweets Tasks include POS, NER, Chunking, and SuperSenseTagging Models were trained using: https://github.com/napsternxg/SocialMediaIE/blob/master/experiments/multitask_multidataset_experiment.py See https://github.com/napsternxg/SocialMediaIE for details.
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning;
published: 2020-05-15
This data has tweets collected in paper Shubhanshu Mishra, Sneha Agarwal, Jinlong Guo, Kirstin Phelps, Johna Picco, and Jana Diesner. 2014. Enthusiasm and support: alternative sentiment classification for social movements on social media. In Proceedings of the 2014 ACM conference on Web science (WebSci '14). ACM, New York, NY, USA, 261-262. DOI: https://doi.org/10.1145/2615569.2615667 The data only contains tweet IDs and the corresponding enthusiasm and support labels by two different annotators.
keywords: Twitter; text classification; enthusiasm; support; social causes; LGBT; Cyberbullying; NFL
published: 2020-05-13
Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the New York Times published between 1945 and 2018. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. <i>"ResTeCo Project NYT Dataset Variable Descriptions.pdf"</i> <ul> <li>A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset and descriptions of all variables. </li> </ul> 2. <i>"resteco-nyt.csv"</i> <ul><li>This file contains the data extracted from terrorism-related media coverage in the New York Times between 1945 and 2018. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 438,373 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. </li> <li> <b>Please note</b> that care should be taken when using "respect-nyt.csv". The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved.</li> </ul> 3. <i>"README.md"</i> <ul><li>This file contains useful information for the user about the dataset. It is a text file written in mark down language</li> </ul> <b>Citation Guidelines</b> 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1
keywords: Terrorism, Text Analytics, News Coverage, Topic Modeling, Sentiment Analysis
published: 2020-05-12
The data provided herein is accelerometer and strain data taken from free vibration response of pre-tensioned, partially submerged steel beam specimens (modulus of elasticity assumed = 29,000 ksi). The specimens were subjected to various levels of pre-tension, and various levels of submersion in water. The purpose of the testing was to quantify the effects of partial submersion on the vibrating frequencies of pretensioned beams. Three specimens were tested, each with different cross section (but identical cross-sectional area). The different cross sections allow investigation of the effects of specimen width as the specimen vibrates through water. The testing procedure was as follows: 1) Apply a specified level of tension in the beam. Measure tension via 3 strain gages. 2) Submerge the specimens to a specified depth of water 3) Excite the beams with either a hammer impact or a pull-and-release method (physically pull the middle of the bar and quickly release) 4) Measure the free vibration of the beam with 2 accelerometers. Schematic drawings of the test setup and the test specimens are provided, as is a picture of the test setup.
keywords: free vibration; beam; partially-submerged; prestressed;
published: 2020-04-02
Automatic and manual counts of black flies captured in Illinois.
keywords: black flies; simuliids; ImageJ; count method
published: 2020-04-22
Nest survival and Fledgling production data for Bell's Vireo and Willow Flycatcher nests.
keywords: Bell's Vireo;Willow Flycatcher;habitat selection;fitness;
published: 2017-12-22
TBP assessment raw data files of pre- and post- motion capture velocity and center of pressure force plate data. Labels are self-explanatory. The .mat files refer to data exported from the force plate for the time-to-stabilization assessments while the .txt files are the data collected for smoothness of gait assessments. These files do not relate to one another and are from separate assessments. Version2's files are the result from using Python code Data_Bank_Cleaner.py on version1's. Please find more information in READ_ME_databank.txt.
keywords: Multiple Sclerosis; Rehabilitation; Balance; Ataxia; Ballet; Dance; Targeted Ballet Program
published: 2020-04-20
Supplemental data sets for the Manuscript entitled "Contribution of fungal and invertebrate communities to mass loss and wood depolymerization in tropical terrestrial and aquatic habitats"
keywords: Coiba Island; wood decomposition; cellulose; hemicellulose; lignin breakdown; aquatic fungi
published: 2020-04-06
Raw measurement data for umbilical remnants (umbilical vein, umbilical arteries and urachus) in support of Equine Veterinary Journal publication "Normal Regression of the Internal Umbilical Remnant Structures in Standardbred Foals."
keywords: equine; umbilicus; ultrasound
published: 2020-04-07
Baseline data from a multi-modal intervention study conducted at the University of Illinois at Urbana-Champaign. Data include results from a cardiorespiratory fitness assessment (maximal oxygen consumption, VO2max), a body composition assessment (Dual-Energy X-ray Absorptiometry, DXA), and Magnetic Resonance Spectroscopy Imaging. Data set includes data from 435 participants, ages 18-44 years.
keywords: Magnetic Resonance Spectroscopy; N-acetyl aspartic acid (NAA); Body Mass Index; cardiorespiratory fitness; body composition
published: 2020-03-14
Data on bank elevations determined from lidar data for the Upper Sangamon River, Illinois, the Mission River, Texas, and the White River in Indiana
keywords: bank elevations, rivers, meandering, lowland
published: 2020-03-13
Data files associated with the assembly of mitochondrial minicircles from five species of parasitic lice. This includes data from four species in the genus Columbicola and from the human louse (Pediculus humanus). The files include FASTA sequences for all five species, reference sequences for read mapping approaches, resulting contigs produced by various assembly approaches, and alignments of human louse minicircles mapped to published sequences of the same species.
keywords: mitochondria; FASTA; nucleotide sequences; alignment; Columbicola; Pediculus
published: 2020-03-08
This dataset inventories the availability of entrepreneurship and small business education, including co-curricular opportunities, in two-year colleges in the United States. The inventory provides a snapshot of activities at more than 1,650 public, not-for-profit, and private for-profit institutions, in 2014.
keywords: Small business education; entrepreneurship education; Kauffman Entrepreneurship Education Inventory; Ewing Marion Kauffman Foundation; Paul J. Magelli
published: 2020-03-03
This second version (V2) provides additional data cleaning compared to V1, additional data collection (mainly to include data from 2019), and more metadata for nodes. Please see NETWORKv2README.txt for more detail.
keywords: citations; retraction; network analysis; Web of Science; Google Scholar; indirect citation
published: 2020-02-27
These data were collected for an experiment examining effects of neonicotinoid (clothianidin) presence on hover fly (Diptera: Syrphidae) behavior. Hover flies of two species (Eristalis arbustorum and Toxomerus marginatus) were offered a choice to feed on artificial flowers laced with sucrose solution that was either contaminated (CLO) or not contaminated (CON) with clothianidin. Two different concentrations of clothianidin in 0.5 M sucrose solution were tested: 2.5 ppb and 150 ppb. We conducted four sets of 10 trials, each trial set examining a different combination of species and clothianidin dose. Across 6 hours of video for each trial we recorded 1) number of visits to each flower that resulted in feeding, and 2) amount of time spent feeding during each visit. We found that while neither species fed significantly longer on either of the solutions, E. arbustorum appeared to avoid flowers with clothianidin particularly at high rates. In the paper, we attribute this avoidance response, partially, to hover fly-visible spectral differences between the two flower choices and discuss potential implications for field and lab-based studies. In the enclosed zip file we have included all data for this project and code scripts from R. * Note: Data folder contains 4 files (instead of 6 as mentioned in Readme): e.tenax_photoreceptors.csv; hoverfly_data_UPDATE.csv; number_visits_UPDATE.csv; and Original 2018 hover fly choice test data_Clem2020.xlsx
keywords: Syrphidae; hoverfly; Eristalis; Toxomerus; Choice Experiment; Neonicotinoid; Clothianidin
published: 2020-02-23
Citation context annotation for papers citing retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) <a href="https://doi.org/10.1016/S0012-3692(08)60339-6">https://doi.org/10.1016/S0012-3692(08)60339-6<a/> ). This is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. "Continued Citation of a Fraudulent Clinical Trial Report, Eleven Years after it was retracted for Falsifying Data" [R&R under review with Scientometrics]. Overall we found 148 citations to the retracted paper from 2006 to 2019, However, this dataset does not include the annotations described in the 2015. in Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomized controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26. In this dataset 70 new and newly found citations are listed: 66 annotated citations and 4 pending citations (non-annotated since we don't have full-text). "New citations" refer to articles published from March 25, 2014 to 2019, found in Google Scholar and Web of Science. "Newly found citations" refer articles published 2006-2013, found in Google Scholar and Web of Science, but not previously covered in Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomised controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26. NOTES: This is Unicode data. Some publication titles & quotes are in non-Latin characters and they may contain commas, quotation marks, etc. FILES/FILE FORMATS Same data in two formats: 2006-2019-new-citation-contexts-to-Matsuyama.csv - Unicode CSV (preservation format only) 2006-2019-new-citation-contexts-to-Matsuyama.xlsx - Excel workbook (preferred format) ROW EXPLANATIONS 70 rows of data - one citing publication per row COLUMN HEADER EXPLANATIONS Note - processing notes Annotation pending - Y or blank Year Published - publication year ID - ID corresponding to the network analysis. See Ye, Di; Schneider, Jodi (2019): Network of First and Second-generation citations to Matsuyama 2005 from Google Scholar and Web of Science. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-1403534_V2">https://doi.org/10.13012/B2IDB-1403534_V2</a> Title - item title (some have non-Latin characters, commas, etc.) Official Translated Title - item title in English, as listed in the publication Machine Translated Title - item title in English, translated by Google Scholar Language - publication language Type - publication type (e.g., bachelor's thesis, blog post, book chapter, clinical guidelines, Cochrane Review, consumer-oriented evidence summary, continuing education journal article, journal article, letter to the editor, magazine article, Master's thesis, patent, Ph.D. thesis, textbook chapter, training module) Book title for book chapters - Only for a book chapter - the book title University for theses - for bachelor's thesis, Master's thesis, Ph.D. thesis - the associated university Pre/Post Retraction - "Pre" for 2006-2008 (means published before the October 2008 retraction notice or in the 2 months afterwards); "Post" for 2009-2019 (considered post-retraction for our analysis) Identifier where relevant - ISBN, Patent ID, PMID (only for items we considered hard to find/identify, e.g. those without a DOI-based URL) URL where available - URL, ideally a DOI-based URL Reference number/style - reference Only in bibliography - Y or blank Acknowledged - If annotated, Y, Not relevant as retraction not published yet, or N (blank otherwise) Positive / "Poor Research" (Negative) - P for positive, N for negative if annotated; blank otherwise Human translated quotations - Y or blank; blank means Google scholar was used to translate quotations for Translated Quotation X Specific/in passing (overall) - Specific if any of the 5 quotations are specific [aggregates Specific / In Passing (Quotation X)] Quotation 1 - First quotation (or blank) (includes non-Latin characters in some cases) Translated Quotation 1 - English translation of "Quotation 1" (or blank) Specific / In Passing (Quotation 1) - Specific if "Quotation 1" refers to methods or results of the Matsuyama paper (or blank) What is referenced from Matsuyama (Quotation 1) - Methods; Results; or Methods and Results - blank if "Quotation 1" not specific, no associated quotation, or not yet annotated Quotation 2 - Second quotation (includes non-Latin characters in some cases) Translated Quotation 2 - English translation of "Quotation 2" Specific / In Passing (Quotation 2) - Specific if "Quotation 2" refers to methods or results of the Matsuyama paper (or blank) What is referenced from Matsuyama (Quotation 2) - Methods; Results; or Methods and Results - blank if "Quotation 2" not specific, no associated quotation, or not yet annotated Quotation 3 - Third quotation (includes non-Latin characters in some cases) Translated Quotation 3 - English translation of "Quotation 3" Specific / In Passing (Quotation 3) - Specific if "Quotation 3" refers to methods or results of the Matsuyama paper (or blank) What is referenced from Matsuyama (Quotation 3) - Methods; Results; or Methods and Results - blank if "Quotation 3" not specific, no associated quotation, or not yet annotated Quotation 4 - Fourth quotation (includes non-Latin characters in some cases) Translated Quotation 4 - English translation of "Quotation 4" Specific / In Passing (Quotation 4) - Specific if "Quotation 4" refers to methods or results of the Matsuyama paper (or blank) What is referenced from Matsuyama (Quotation 4) - Methods; Results; or Methods and Results - blank if "Quotation 4" not specific, no associated quotation, or not yet annotated Quotation 5 - Fifth quotation (includes non-Latin characters in some cases) Translated Quotation 5 - English translation of "Quotation 5" Specific / In Passing (Quotation 5) - Specific if "Quotation 5" refers to methods or results of the Matsuyama paper (or blank) What is referenced from Matsuyama (Quotation 5) - Methods; Results; or Methods and Results - blank if "Quotation 5" not specific, no associated quotation, or not yet annotated Further Notes - additional notes
keywords: citation context annotation, retraction, diffusion of retraction