Dataset Search

Displaying 326 - 350 of 473 in total

Illinois Data Bank Dataset Search Results

Results

published: 2025-06-03

Data for Analysis of Nematode Ventral Nerve Cords Suggests Multiple Instances of Evolutionary Addition and Loss of Neurons

Han, Jaeyeong; Ficca, Alyson; Lanzatella, Marissa; Leang, Kanika; Barnum, Matthew; Boudreaux, Jonathan; Schroeder, Nathan (2025)

This dataset comprises image files used in the analysis for "Analysis of Nematode Ventral Nerve Cords Suggests Multiple Instances of Evolutionary Addition and Loss of Neurons" by Han et al. (bioRxiv, 2025; doi: https://doi.org/10.1101/2025.03.20.644414). It is separated into two folders. The first comprises data using DAPI staining to quantify the number of VNC nuclei in diverse nematodes. The second includes dye-filling data of Mononchus aquaticus.

keywords: C. elegans; Mononchus; neuroanatomy; nematode nervous system; ventral nerve cord; secondary simplification

published: 2025-08-21

Data for "Enhancing 2-Pyrone Synthase Efficiency by High-Throughput Mass-Spectrometric Quantification and In Vitro/In Vivo Catalytic Performance Correlation"

Lu, Yi; Sweedler, Jonathan; Zhou, Shuaizhen; Zhou, Yu (2025)

Engineering efficient biocatalysts is essential for metabolic engineering to produce valuable bioproducts from renewable resources. However, due to the complexity of cellular metabolic networks, it is challenging to translate success in vitro into high performance in cells. To meet such a challenge, an accurate and efficient quantification method is necessary to screen a large set of mutants from complex cell culture and a careful correlation between the catalysis parameters in vitro and performance in cells is required. In this study, we employed a mass-spectrometry based high-throughput quantitative method to screen new mutants of 2-pyrone synthase (2PS) for triacetic acid lactone (TAL) biosynthesis through directed evolution in E. coli. From the process, we discovered two mutants with the highest improvement (46 fold) in titer and the fastest kcat (44 fold) over the wild type 2PS, respectively, among those reported in the literature. A careful examination of the correlation between intracellular substrate concentration, Michaelis-Menten parameters and TAL titer for these two mutants reveals that a fast reaction rate under limiting intracellular substrate concentrations is important for in-cell biocatalysis. Such properties can be tuned by protein engineering and synthetic biology to adopt these engineered proteins for the maximum activities in different intracellular environments.

keywords: catalysis; mass spectrometry; metabolic engineering

published: 2022-04-19

List of differentially expressed genes for "Basigin is necessary for normal decidualization of human uterine stromal cells"

Nowak, Romana; Yang, Shuhong; Li, Kailiang; Bi, Jiajia; Drnevich, Jenny (2022)

List of differentially expressed genes in human endometrial stromal cells with knockdown of Basigin (BSG) gene expression during decidualization. The BSG siRNA or negative scrambled control siRNA were transfected into human endometrial stromal cells (HESCs) following the protocol of siLentFect™ Lipid (Bio-Rad, Hercules, CA). Following complete knockdown of BSG in HESCs (72 hours after adding siRNA), HESCs were treated with medium containing estrogen, progesterone and cAMP to induce decidualization. BSG siRNA and negative control scrambled siRNA were added to the cells every four days (day 0, 4) over the course of the decidualization protocol. Total RNA was harvested at day 6 of the decidualization protocol for microarray analysis. Microarray analysis was performed at the University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center. Briefly, 0.2 micrograms of total RNA were labeled using the Agilent two color QuickAmp labeling kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s protocol. The optional spike-in controls were not used. Samples were hybridized to Human Gene Expression 4x44K v2 Microarray (Agilent Technologies, Santa Clara, CA) in an Agilent Hybridization Cassette according to standard protocols. The arrays were then scanned on an Axon GenePix 4000B scanner and the images were quantified using Axon GenePix 6.1. Microarray data pre-processing and statistical analyses were done in R (v3.6.2) using the limma package (v3.42.0; Ritchie et al., 2015). Median foreground and median background values from the 4 arrays were read into R and any spots that had been manually flagged (-100 values) were given a weight of zero. The background values were ignored because investigations showed that trying to use them to adjust for background fluorescence added more noise to the data; background was low and even for all arrays, therefore no background correction was done. 
The individual Cy5 and Cy3 fluorescence for each array were normalized together using the quantile method 3 (Yang and Thorne, 2003). Agilent's Human Gene Expression 4x44K v2 Microarray has a total of 45,220 probes: 1224 probes for positive controls, 153 for negative controls, 823 labeled “ignore” and 43,118 labeled “cDNA”. The pos+neg+ignore probes were used to ascertain the background level of fluorescence (6, on the log2 scale) and then discarded. The cDNA probes comprise 34,127 unique 60mer probes, of which 999 probes are spotted 10 times each and the rest one time each. We averaged the replicate probes for those spotted 10 times and then fit a mixed model that had treatment and dye as fixed effects and array pairing as a random effect (Phipson et al., 2016; Smyth et al., 2005). After fitting the model but before False Discovery Rate (FDR) correction (Benjamini and Hochberg, 1995), probes were filtered out by the following criteria: 1) did not have at least 4/8 samples with expression values > 6 (14,105 probes removed), 2) no longer had an assigned Entrez Gene ID in Bioconductor’s HsAgilentDesign026652.db annotation package (v3.2.3; 2,152 probes removed) (Huber et al., 2015), 3) mapped to the same Entrez Gene ID as another probe but had a larger p-value for treatment effect (4,141 probes removed). This left 13,729 probes representing 13,729 unique genes. *Please note that there is a discrepancy between the file and the readme, as this plain text is the actual data file of this dataset.
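
The probe bookkeeping in this description can be checked with a few lines of arithmetic. The sketch below only restates the counts quoted above; the variable names are illustrative and are not taken from the dataset or its analysis code.

```python
# Counts quoted in the description above.
unique_probes = 34_127      # unique 60mer cDNA probes
replicated_probes = 999     # probes spotted 10 times each
copies_per_replicate = 10

# Total cDNA spots: replicated probes x 10, plus the singly spotted remainder.
cdna_spots = (replicated_probes * copies_per_replicate
              + (unique_probes - replicated_probes))

# Probes removed by each of the three filtering criteria.
removed = {
    "fewer than 4/8 samples with expression > 6": 14_105,
    "no Entrez Gene ID in annotation package": 2_152,
    "same gene as another probe, larger p-value": 4_141,
}
remaining = unique_probes - sum(removed.values())

print(cdna_spots)   # number of spots labeled "cDNA"
print(remaining)    # probes left after filtering
```

Both results agree with the figures in the text: 43,118 cDNA spots and 13,729 probes after filtering.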

keywords: Basigin; endometrium; decidualization; human

published: 2025-03-19

Data for Implementing Deep Soil and Dynamic Root Uptake in Noah-MP (v4.5): Impact on Amazon Dry-Season Transpiration

Bieri, Carolina A.; Dominguez, Francina; Miguez-Macho, Gonzalo; Fan, Ying (2025)

This repository includes HRLDAS Noah-MP model output generated as part of Bieri et al. (2025) - Implementing deep soil and dynamic root uptake in Noah-MP (v4.5): Impact on Amazon dry-season transpiration. These data are distributed in two different formats: Raw model output files and subsetted files that include data for a specific variable. All files are .nc format (NetCDF) and aggregated into .tar files to facilitate download. Given the size of these datasets, Globus transfer is the best way to download them. Raw model output for four model experiments is available: FD (control), GW, SOIL, and ROOT. See the associated publication for information on the different experiments. These data span an approximately 20 year period from 01 Jun 2000 to 31 Dec 2019. The data have a spatial resolution of 4 km and a temporal frequency of 3 hours. These data are for a domain in the southern Amazon basin (see Figure 1 in the associated publication). Data for each experiment is available as a .tar file which includes 3-hourly NetCDF files. All default Noah-MP output variables are included in each file. As a result, the .tar files are quite large and may take many hours or even days to transfer depending on your network speed and local configurations. These files are named 'noahmp_output_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT). Subsetted model output at a daily temporal resolution for all four model experiments is also available. These .tar files include the following variables: water table depth (ZWT), latent heat flux (LH), sensible heat flux (HFX), soil moisture (SOIL_M), canopy evaporation (ECAN), ground evaporation (EDIR), transpiration (ETRAN), rainfall rate at the surface (QRAIN), and two variables that are specific to the ROOT experiment: ROOTACTIVITY (root activity function) and GWRD (active root water uptake depth). There is one file for each variable within the tarred files. 
These files are named 'noahmp_output_subset_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT). Finally, there is a sample dataset with raw 3-hourly output from the ROOT experiment for one day. The purpose of this sample dataset is to allow users to confirm if these data meet their needs before initiating a full transfer via Globus. This file is named 'noahmp_output_sample_ROOT.tar'. The README.txt file provides information on the Noah-MP output variables in these datasets, among other specifications. Information on HRLDAS Noah-MP and names/definitions of model output variables that are useful in working with these data are available here: http://dx.doi.org/10.5065/ew8g-yr95. Note that some output variables may be listed in this document under a different variable name, so searching for the long name (e.g. 'baseflow' instead of 'QRF') is recommended. Information on additional output variables that were added to the model as part of this study is available here: https://github.com/bieri2/bieri-et-al-2025-EGU-GMD/tree/DynaRoot. Model code, configuration files, and forcing data used to carry out the model simulations are linked in the related resources section.
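
The naming scheme for the archives can be captured in a small helper, which may be convenient when scripting a transfer or checking a download. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the internal layout of each .tar file is not assumed here (the helper simply lists whatever members an archive contains).

```python
import tarfile

# Experiment names as given in the description.
EXPERIMENTS = ("FD", "GW", "SOIL", "ROOT")
SAMPLE_ARCHIVE = "noahmp_output_sample_ROOT.tar"


def raw_archive(exp: str) -> str:
    """File name of the raw 3-hourly output archive for one experiment."""
    return f"noahmp_output_2000_2019_{exp}.tar"


def subset_archive(exp: str) -> str:
    """File name of the daily, per-variable subset archive for one experiment."""
    return f"noahmp_output_subset_2000_2019_{exp}.tar"


def list_members(tar_path: str) -> list[str]:
    """Return the member names of a downloaded archive without extracting it."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()


if __name__ == "__main__":
    for exp in EXPERIMENTS:
        print(raw_archive(exp), subset_archive(exp))
```

Calling `list_members(SAMPLE_ARCHIVE)` on the small sample archive is one way to inspect the file layout before starting a full Globus transfer.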

keywords: Land surface model; NetCDF

published: 2025-10-13

Data for Metabolic Engineering of Rhodotorula toruloides IFO0880 Improves C16 and C18 Fatty Alcohol Production from Synthetic Media

Schultz, J. Carl; Mishra, Shekhar; Gaither, Emily; Mejia, Andrea; Dinh, Hoang V.; Maranas, Costas D.; Zhao, Huimin (2025)

The oleaginous, carotenogenic yeast Rhodotorula toruloides has been increasingly explored as a platform organism for the production of terpenoids and fatty acid derivatives. Fatty alcohols, a fatty acid derivative widely used in the production of detergents and surfactants, can be produced microbially with the expression of a heterologous fatty acyl-CoA reductase. Due to its high lipid production, R. toruloides has high potential for fatty alcohol production, and in this study several metabolic engineering approaches were investigated to improve the titer of this product. Fatty acyl-CoA reductase from Marinobacter aqueolei was co-expressed with SpCas9 in R. toruloides IFO0880 and a panel of gene overexpressions and Cas9-mediated gene deletions were explored to increase the fatty alcohol production. Two overexpression targets (ACL1 and ACC1, improving cytosolic acetyl-CoA and malonyl-CoA production, respectively) and two deletion targets (the acyltransferases DGA1 and LRO1) resulted in significant (1.8 to 4.4-fold) increases to the fatty alcohol titer in culture tubes. Combinatorial exploration of these modifications in bioreactor fermentation culminated in a 3.7 g/L fatty alcohol titer in the LRO1Δ mutant. As LRO1 deletion was not found to be beneficial for fatty alcohol production in other yeasts, a lipidomic comparison of the DGA1 and LRO1 knockout mutants was performed, finding that DGA1 is the primary acyltransferase responsible for triacylglyceride production in R. toruloides, while LRO1 disruption simultaneously improved fatty alcohol production, increased diacylglyceride and triacylglyceride production, and increased glucose consumption. The fatty alcohol titer of fatty acyl-CoA reductase-expressing R. toruloides was significantly improved through the deletion of LRO1, or the deletion of DGA1 combined with overexpression of ACC1 and ACL1. 
Disruption of LRO1 surprisingly increased both lipid and fatty alcohol production, creating a possible avenue for future study of the lipid metabolism of this yeast.

keywords: Conversion; Genome Engineering; Genomics

published: 2016-08-16

HIPPI Dataset

Nguyen, Nam-phuong; Nute, Mike; Mirarab, Siavash; Warnow, Tandy (2016)

This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the PFAM families used to build the HMMs and BLAST databases. The file structure is: ./X/Y/initial.fasttree ./X/Y/initial.fasta where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Inside each folder are two files: initial.fasta, which is the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta. The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory. [1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
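
The path conventions above can be expressed as a short helper. This is a sketch under stated assumptions: the family name `PF00001` is a placeholder, and the fragment lengths are assumed to appear in file names exactly as written in the description (`1`, `0.5`, `0.25`); neither has been checked against the archives.

```python
from pathlib import PurePosixPath

FOLDS = (0, 1, 2, 3)                      # cross-fold sets (Y)
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")   # fragment lengths (Z), assumed spelling


def reference_files(family: str, fold: int) -> tuple[PurePosixPath, PurePosixPath]:
    """Paths of the reference alignment and tree for family X and fold Y."""
    base = PurePosixPath(family) / str(fold)
    return base / "initial.fasta", base / "initial.fasttree"


def query_file(fold: int, fragment_length: str) -> str:
    """Name of the query file for fold Y and fragment length Z."""
    return f"query.{fold}.{fragment_length}.fas"


if __name__ == "__main__":
    # "PF00001" is a hypothetical family name used only for illustration.
    for fold in FOLDS:
        fasta, tree = reference_files("PF00001", fold)
        print(fasta, tree, [query_file(fold, z) for z in FRAGMENT_LENGTHS])
```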

keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam

published: 2021-04-15

Scopus API Scripts for Data Reuse Project

Mischo, William (2021)

To generate the bibliographic and survey data to support a data reuse study conducted by several Library faculty and accepted for publication in the Journal of Academic Librarianship, the project team utilized a series of web-based online scripts that employed several different endpoints from the Scopus API. The related dataset: "Data for: An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University" contains survey design and results. 1) getScopus_API_process_dmp_IDB.asp: used the Search API to query the Scopus database for papers by UIUC authors published in 2015 -- limited to one of 9 pre-defined Scopus subject areas -- and retrieve metadata results sorted highest to lowest by the number of times the retrieved articles were cited. The URL for the basic searches took the following form: https://api.elsevier.com/content/search/scopus?query=(AFFIL%28(urbana%20OR%20champaign) AND univ*%29) OR (AF-ID(60000745) OR AF-ID(60005290))&apikey=xxxxxx&start=" & nstart & "&count=25&date=2015&view=COMPLETE&sort=citedby-count&subj=PHYS Here, the variable nstart was incremented by 25 each iteration and 25 records were retrieved in each pass. The subject area was changed (e.g. from PHYS to COMP for computer science) in each of the 9 runs. This script does not use the Scopus API cursor but downloads 25 records at a time for up to 28 times -- or 675 maximum bibliographic records. The project team felt that looking at the 675 most-cited articles from UIUC faculty in each of the 9 subject areas was sufficient to gather a robust, representative sample of articles from 2015. These downloaded records were stored in a temporary table that was renamed for each of the 9 subject areas. 2) get_citing_from_surveys_IDB.asp: takes a Scopus article ID (eid) from the 49 UIUC author returned surveys and retrieves short citing article references, 200 at a time, into a temporary composite table. 
These citing records contain only one author, no author affiliations, and no author email addresses. This script uses the Scopus API cursor=* feature and is able to download all the citing references of an article 200 records at a time. 3) put_in_all_authors_affil_IDB.asp: adds important data to the short citing records. The script adds all co-authors and their affiliations, the corresponding author, and author email addresses. 4) process_for_final_IDB.asp: creates a relational database table with author, title, and source journal information for each of the citing articles that can be copied as an Excel file for processing by the Qualtrics survey software. This was initially 4,626 citing articles over the 49 UIUC authored articles, but was reduced to 2,041 entries after checking for available email addresses and eliminating duplicates.
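
The offset-based paging described for the first script can be sketched as follows. This is not the project's ASP code: it is a Python illustration that reuses only the URL parameters quoted in the description. The function name is hypothetical, the first offset is assumed to be 0, and the number of pages is left to the caller.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.elsevier.com/content/search/scopus"

# Decoded form of the affiliation query quoted in the description.
QUERY = ("(AFFIL((urbana OR champaign) AND univ*)) "
         "OR (AF-ID(60000745) OR AF-ID(60005290))")


def page_urls(subject: str, api_key: str, pages: int, page_size: int = 25):
    """Yield one search URL per page, advancing `start` by `page_size` each time."""
    for page in range(pages):
        params = {
            "query": QUERY,
            "apikey": api_key,
            "start": page * page_size,   # assumption: first offset is 0
            "count": page_size,
            "date": 2015,
            "view": "COMPLETE",
            "sort": "citedby-count",
            "subj": subject,
        }
        yield f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "xxxxxx" stands in for a real API key, as in the description.
    for url in page_urls("PHYS", "xxxxxx", pages=3):
        print(url)
```

Changing `subject` (for example from `PHYS` to `COMP`) reproduces the per-subject runs described above; no request is sent by this sketch.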

keywords: Scopus API; Citing Records; Most Cited Articles

published: 2025-12-09

Data for "Wild bee response to forest management varies seasonally and is mediated by resource availability"

Chase, Marissa H.; Fraterrigo, Jennifer M.; Charles, Brian; Harmon-Threatt, Alexandra (2025)

The dataset includes bee community data from a study conducted in southern Illinois across three forested public land sites. Bee diversity and abundance data, as well as environmental variables, are included for each plot. Each plot was visited a total of four times.

keywords: wild bees; forest management; resource availability

published: 2021-10-13

A Vector-Based Method for Drainage Network Analysis Based on LiDAR Data

Lyu, Fangzheng; Xu, Zewei; Ma, Xinlin; Wang, Shaohua; Li, Zhiyu; Wang, Shaowen (2021)

Drainage network analysis is fundamental to understanding the characteristics of surface hydrology. Based on elevation data, drainage network analysis is often used to extract key hydrological features like drainage networks and streamlines. Limited by raster-based data models, conventional drainage network algorithms typically allow water to flow in 4 or 8 directions (surrounding grids) from a raster grid. To resolve this limitation, this paper describes a new vector-based method for drainage network analysis that allows water to flow in any direction around each location. The method is enabled by rapid advances in Light Detection and Ranging (LiDAR) remote sensing and high-performance computing. The drainage network analysis is conducted using a high-density point cloud instead of Digital Elevation Models (DEMs) at coarse resolutions. Our computational experiments show that the vector-based method can better capture water flows without limiting the number of directions due to imprecise DEMs. Our case study applies the method to Rowan County watershed, North Carolina in the US. After comparing the drainage networks and streamlines detected with corresponding reference data from US Geological Survey generated from the Geonet software, we find that the new method performs well in capturing the characteristics of water flows on landscape surfaces in order to form an accurate drainage network. This dataset contains all the code, notebooks, and datasets used in the study conducted for the research publication titled "A Vector-Based Method for Drainage Network Analysis Based on LiDAR Data".

## What's Inside

A quick explanation of the components:

* `A Vector Approach to Drainage Network Analysis Based on LiDAR Data.ipynb` is a notebook for finding the drainage network based on LiDAR data
* `Picture1.png` is a picture representing the pseudocode of our new algorithm
* `HPC` folder contains code for running the algorithm with sbatch in HPC
** `execute.sh` is a bash script file that uses sbatch to conduct large-scale analysis for the algorithm
** `run.sh` is a bash script file that calls the script file `execute.sh` for large-scale calculation for the algorithm
** `run.py` includes the code implemented for the algorithm
* `Rowan Creek Data` includes data that are used in the study
** `3_1.las` and `3_2.las` are the LiDAR data files that are used in our analysis presented in the paper. Users may use these data files to reproduce our results and may replace them with their own LiDAR files to run this method over different areas
** `reference` folder includes reference data from USGS
*** `reference_3_1.tif` and `reference_3_2.tif` are reference data for the drainage system analysis retrieved from USGS.
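
For context, the conventional raster restriction mentioned in the abstract (flow limited to 8 neighbouring cells, often called D8) can be illustrated in a few lines. This is a sketch of the conventional approach that the paper contrasts itself with, not the authors' vector-based method. The toy elevation grid is made up, and a unit cell size is assumed.

```python
import math

# Offsets (row, col) to the 8 neighbours of a raster cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_direction(dem, row, col):
    """Return the (d_row, d_col) offset of the steepest downhill neighbour.

    Returns None when no neighbour is lower (a pit or flat cell).
    Assumes square cells of unit size, so diagonal distance is sqrt(2).
    """
    best_offset, best_slope = None, 0.0
    for d_row, d_col in NEIGHBOURS:
        r, c = row + d_row, col + d_col
        if 0 <= r < len(dem) and 0 <= c < len(dem[0]):
            drop = dem[row][col] - dem[r][c]
            slope = drop / math.hypot(d_row, d_col)
            if slope > best_slope:
                best_offset, best_slope = (d_row, d_col), slope
    return best_offset


# Hypothetical 3x3 elevation grid sloping toward the lower-right corner.
toy_dem = [[9, 8, 7],
           [8, 5, 4],
           [7, 4, 1]]

print(d8_direction(toy_dem, 1, 1))  # centre cell drains diagonally
print(d8_direction(toy_dem, 2, 2))  # lowest cell has no downhill neighbour
```

Whatever the true downhill bearing is, the result is always one of eight fixed offsets, which is the limitation a point-cloud-based method avoids.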

keywords: CyberGIS; Drainage System Analysis; LiDAR

published: 2021-10-28

Why the Stall? Using Metabolomics to Define the Lack of Upstream Movement of Invasive Bigheaded Carp in the Illinois River

Suski, Cory; Curtis-Quick, Jocelyn (2021)

Bigheaded carp were collected from the Illinois and Des Plaines Rivers, parts of the Illinois Waterway, from May to November 2018. A total of 93 fish were collected during sampling: 40 females, 41 males, and 12 unsexed fish. GC/MS metabolite profiling analysis detected 180 compounds. Livers from carp at the leading edge had differences in energy use and metabolism, and suppression of protective mechanisms relative to downstream fish; differences were consistent across time. This body of work provides evidence that water quality is linked to carp movement in the Illinois River. As water quality in this region continues to improve, consideration of this impact on carp spread is essential to protect the Great Lakes.

keywords: water quality; metabolites; range expansion; energy; contaminants

published: 2022-06-22

Data for Spatial Accessibility to HIV (Human Immunodeficiency Virus) Testing, Treatment, and Prevention Services in Illinois and Chicago, USA

Kang, Jeon-Young; Farkhad, Bita Fayaz; Chan, Man-pui Sally; Michels, Alexander; Albarracin, Dolores; Wang, Shaowen (2022)

This dataset helps to investigate the Spatial Accessibility to HIV Testing, Treatment, and Prevention Services in Illinois and Chicago, USA. The main components are: population data, healthcare data, GTFS feeds, and road network data. The core components are: 1) `GTFS` which contains GTFS (General Transit Feed Specification, https://gtfs.org/) data which is provided by Chicago Transit Authority (CTA) from Google's GTFS feeds (https://developers.google.com/transit/gtfs). Documentation defines the format and structure of the files that comprise a GTFS dataset: https://developers.google.com/transit/gtfs/reference?csw=1. 2) `HealthCare` contains shapefiles describing HIV healthcare providers in Chicago and Illinois respectively. The services come from Locator.HIV.gov (https://locator.hiv.gov/). 3) `PopData` contains population data for Chicago and Illinois respectively. Data come from The American Community Survey and AIDSVu (https://map.aidsvu.org/map). AIDSVu provides data on PLWH in Chicago at the census tract level for the year 2017 and in the State of Illinois at the county level for the year 2016. The American Community Survey (ACS) provided the number of people aged 15 to 64 at the census tract level for the year 2017 and at the county level for the year 2016. The ACS provides annually updated information on demographic and socio economic characteristics of people and housing in the U.S. 4) `RoadNetwork` contains the road networks for Chicago and Illinois respectively from OpenStreetMap (https://www.openstreetmap.org/copyright) using the Python osmnx package (https://osmnx.readthedocs.io/en/stable/). 
The abstract for our paper is: Accomplishing the goals outlined in “Ending the HIV (Human Immunodeficiency Virus) Epidemic: A Plan for America Initiative” will require properly estimating and increasing access to HIV testing, treatment, and prevention services. In this research, a computational spatial method for estimating access was applied to measure distance to services from all points of a city or state while considering the size of the population in need for services as well as both driving and public transportation. Specifically, this study employed the enhanced two-step floating catchment area (E2SFCA) method to measure spatial accessibility to HIV testing, treatment (i.e., Ryan White HIV/AIDS program), and prevention (i.e., Pre-Exposure Prophylaxis [PrEP]) services. The method considered the spatial location of MSM (Men Who have Sex with Men), PLWH (People Living with HIV), and the general adult population 15-64 depending on what HIV services the U.S. Centers for Disease Control (CDC) recommends for each group. The study delineated service- and population-specific accessibility maps, demonstrating the method’s utility by analyzing data corresponding to the city of Chicago and the state of Illinois. Findings indicated health disparities in the south and the northwest of Chicago and particular areas in Illinois, as well as unique health disparities for public transportation compared to driving. The methodology details and computer code are shared for use in research and public policy.
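
The two steps of the E2SFCA method named in the abstract can be sketched compactly. This is a generic illustration and not the study's code: the travel-time zones and their weights are placeholder values chosen for the example, and the function and variable names are hypothetical.

```python
# Illustrative travel-time zones (minutes) and distance-decay weights.
# These thresholds and weights are assumptions, not values from the study.
ZONES = ((10, 1.0), (20, 0.68), (30, 0.22))


def zone_weight(minutes: float) -> float:
    """Weight for a travel time; 0 beyond the outermost catchment."""
    for limit, weight in ZONES:
        if minutes <= limit:
            return weight
    return 0.0


def e2sfca(supply, population, travel_time):
    """Accessibility score per population location.

    supply[j]         capacity of provider j
    population[i]     people in need at location i
    travel_time[i][j] minutes from location i to provider j
    """
    # Step 1: supply-to-demand ratio for each provider, with demand
    # weighted by how far each population location is from it.
    ratios = []
    for j, capacity in enumerate(supply):
        demand = sum(pop * zone_weight(travel_time[i][j])
                     for i, pop in enumerate(population))
        ratios.append(capacity / demand if demand > 0 else 0.0)

    # Step 2: for each location, sum the weighted ratios of reachable providers.
    return [sum(ratio * zone_weight(travel_time[i][j])
                for j, ratio in enumerate(ratios))
            for i in range(len(population))]


if __name__ == "__main__":
    # One provider, two locations: 5 and 25 minutes away.
    print(e2sfca(supply=[1.0], population=[100, 100], travel_time=[[5], [25]]))
```

Running the step separately with driving and with transit travel times is one way to compare accessibility across transportation modes, as the abstract describes.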

keywords: HIV; spatial accessibility; spatial analysis; public transportation; GIS

published: 2020-11-20

November_2020_Jaikumar_et_al_IctB_Effects_in_Sorghum

Jaikumar, Nikhil; Clemente, Tom; Long, Steve; Ge, Zhengxiang; Changa, Timothy (2020)

This data set explores the effect of the cyanobacterial gene ictB on photosynthesis in sorghum, under both normal greenhouse growing temperatures (32 C / 25 C) and during and after an 8 day chilling stress (10 C / 5 C). IctB is a cyanobacterial gene of unknown function, which was initially thought to be involved in inorganic carbon transport into cells. While ictB is known now not to be an independently active carbon transporter in its own right, it may play a role in passive diffusion of metabolites. This transgene was introduced into sorghum by the lab of Thomas Clemente, through Agrobacterium mediated transformation, alone and in combination with the tomato sedoheptulose-1,7-bisphosphatase (SBPase) gene. Eleven events (six double construct and five single construct ictB) were involved in this study. SBPase was included because some previous experiments in C3 species and some previous modeling work, as well as its position at a metabolic branch point, indicate it plays a role as a control point for photosynthesis. A chilling treatment was included because chilling is one of the most serious ecological factors limiting the range of C4 species. Data includes gene expression, metabolomics (at normal growing temperature), SBPase enzyme activity, biomass and photosynthetic traits at both warm temperature and during and after chilling stress.

-----------------

EXPLANATORY NOTES FOR ICTB/SBPASE SORGHUM MANUSCRIPT

Data are organized into 10 worksheets, representing an expected 10 tables that will serve a supplementary role in the final publication. These include data on gene expression, metabolomics (at normal growing temperature), SBPase enzyme activity, biomass and photosynthetic traits at both warm temperature and during and after chilling stress. Tables are as follows:

1. Event_Code: for Table S1. Event codes for events and constructs. Two constructs were generated for this study, and numerous transgenic “events” (i.e. independent transformations) were carried out for each construct. A construct represents the actual vector which was introduced into the plants (complete with promoter, gene of interest, marker gene, etc.) while an event represents a single successful introduction of the transgene. Events are uniquely labeled with letter and number strings but also with a four-digit number for ease of reference; this table explains which event corresponds to each four-digit number.

2. Photosynthetic_Data: for Table S2. Photosynthetic data at greenhouse growing temperature, for ictB single construct, ictB/SBPase double construct, and wild type lines. Five ictB and six ictB/SBPase events were included. Greenhouse growing temperature was approximately 32 °C day and 25 °C night. Photosynthetic parameters were measured using a Licor 6400-XT, and included parameters related to carbon dioxide uptake, water loss, and chlorophyll fluorescence.

3. Chilling_Treatment: for Table S3. Photosynthetic response to chilling treatment, for ictB single construct and wild type lines. Four ictB events were included. Chilling treatment lasted approximately 8 days and began either 3.5 or 5.5 weeks after transplanting the plants (chilling was done in two batches). Chilling treatment involved temperatures of 10 °C day / 7 °C night in growth chambers. Photosynthetic parameters were measured at several time points during and after the chilling treatment using a Licor 6400-XT, and included parameters related to carbon dioxide uptake, water loss, and chlorophyll fluorescence.

4. SBPase_Activity: for Table S4. SBPase activity in double construct plants. These data measure in vitro substrate-saturated activity of SBPase in desalted extracts from leaf tissues, at 25 °C. Units are micromoles of SBP processed per second per m2 of leaf tissue. Five ictB/SBPase events were included.

5. 2014_gene_exp: for Table S5. Gene expression in 2014 experiment (units of cycle times). These data measure cycle times to threshold, relative to reference genes, for expression of ictB and SBPase. Six ictB single construct events and five ictB/SBPase double construct events were included. Cycle times to threshold relative to reference genes (ΔCT) are inversely related to number of transcripts relative to reference genes, as follows: ΔCT = -log2([NictB]/[Nreference])/[1 + log2b] where b = efficiency of replication.

6. 2016_gene_exp: for Table S5. Gene expression in 2016 experiment (units of cycle times). These data measure cycle times to threshold, relative to reference genes, for expression of ictB and SBPase. Six ictB single construct events and five ictB/SBPase double construct events were included. Cycle times to threshold relative to reference genes (ΔCT) are inversely related to number of transcripts relative to reference genes, as follows: ΔCT = -log2([NictB]/[Nreference])/[1 + log2b] where b = efficiency of replication.

7. Metabolites: for Table S7. Levels of 267 metabolites in leaf tissue. Four ictB single construct events and four ictB/SBPase double construct events were included in these analyses. Metabolites were measured in methanol-extracted samples, either by liquid chromatography / mass spectrometry or by gas chromatography / mass spectrometry, and were compared between events on a relative basis. As quantification was relative to wild type rather than on an absolute basis, no units are included.

8. Metabolite_F_values: for Table S8. F values for effects of ictB, SBPase (in cases where the model was better with a SBPase effect) and event. These analyses are done for each metabolite included in Table S7, and show effects of the explanatory variables ictB, SBPase, and individual event.

9. Biomass_2020: for Table S9. Biomass and grain yield at harvest, for ictB, ictB/SBPase and wild type sorghum plants in spring 2020. Four ictB/SBPase double construct and four ictB single construct events were included.

10. Biomass_2017: for Table S10. Biomass and grain yield at harvest, in chilled and non-chilled sorghum plants containing the ictB transgene (along with wild type controls) in fall 2017. Four ictB single construct events were included. Chilling treatment involved temperatures of 10 °C day / 7 °C night in growth chambers.

All the variables in the file are explained below:

o Type (IctB-SBPase and IctB): this refers to whether a plant is wild type, single construct (contains only the ictB transgene) or double construct (contains both the ictB and SBPase transgenes).
o Code: these codes are shorter labels to refer to each transgene event for the sake of convenience.
o Alternate_Code: these codes are shorter labels to refer to each transgene event for the sake of convenience.
o Event Number: these are unique labels for each transgenic event.
o Construct Number: these are labels for each transgenic construct (either the ictB single construct or the ictB/SBPase double construct).
o year (i): this refers to the year in which the study was conducted (2014, 2016, 2017, or 2020)
o transgene or Transgenic: whether the transgene was present
o construct or Type: whether the ictB or the ictB/SBPase construct was present (double, single, wildtype)
o temp: leaf temperature during the measurement
o A: carbon assimilation rate, in μmol m-2 s-1
o gs: stomatal conductance, in mol m-2 s-1
o CI: intercellular carbon dioxide concentration, in parts per million or μL L-1
o fvfm: FV’/FM’ (maximal potential photosystem II quantum yield under light adapted conditions), dimensionless ratio
o phipsill: ΦPSII (maximal potential photosystem II quantum yield under light adapted conditions), dimensionless ratio
o qP: photochemical quenching, i.e. ratio of ΦPSII to FV’/FM’, dimensionless ratio
o iwue: intrinsic water use efficiency, i.e. ratio of carbon assimilation rate to stomatal conductance, in units of μmol mol-1
o event: individual transgenic / transformation event
o Vmax: substrate-saturated in vitro activity of the SBPase enzyme, in μmol m-2 s-1
o ID: identification number of sample
o ΔCT1: difference in cycle times to threshold during gene expression (quantitative PCR) assay, between ictB and the reference gene GAPDH, in units of cycles
o ΔCT2: difference in cycle times to threshold during gene expression (quantitative PCR) assay, between SBPase and the reference gene GAPDH, in units of cycles
o GAPDH: cycle times to threshold for the reference gene GAPDH (glyceraldehyde phosphate dehydrogenase)
o IctB: cycle times to threshold for the gene of interest ictB
o SBPase: cycle times to threshold for the gene of interest SBPase
o v1 to v267 represent individual metabolites (see the heading immediately above the labels v1, v2, etc.). Variables v268-v272 refer to total (summed) metabolite levels for particular pathways of interest.
o leaf: Leaf and stem dry biomass (in grams)
o seed: Seedhead dry biomass (in grams)
o biomass: Total (leaf, stem + seed head) dry biomass (in grams)
o harvind: ratio of seed head dry biomass to total dry biomass
o treatment (chilled and nonchilled): “Chilled” plants were grown under warm greenhouse conditions (32 °C day / 25 °C night) for 6 or 8 weeks, then switched to chilling temperatures under growth chamber conditions (10 °C day / 7 °C night) for 8 days, and were then returned to greenhouse growing conditions.

-----------------

keywords: ictB; SBPase; photosynthesis; sorghum; chilling

published: 2025-07-25

Pregnancy and chronic wasting disease (CWD) status for female white-tailed deer (Odocoileus virginianus) in northern Illinois, USA between fiscal years 2005 and 2024

Mori, Jameson; Rivera, Nelda; Brown, William; Skinner, Daniel; Schlichting, Peter; Novakofski, Jan; Mateus-Pinilla, Nohra (2025)

This dataset contains the pregnancy status of wild, white-tailed deer (Odocoileus virginianus) from northern Illinois culled as part of the Illinois Department of Natural Resources' chronic wasting disease (CWD) surveillance program. Fiscal years 2005 through 2024 are included. A fiscal year is the time between July 1st of one calendar year and June 30th of the next. Variables in this dataset include the pregnancy status, CWD infection status, age, weight, and day of mortality for each female deer, as well as the deer land cover utility (LCU) score for the TRS, township, or county from which the deer was culled. The deer population density of the county is also included. Data have been anonymized for landowner privacy reasons so that the location and year are not identifiable, but will give the same modeling results by maintaining how the data are grouped. The R code used to conduct the regression modeling is also included.

keywords: cervid; Cervidae; chronic wasting disease; CWD; reproduction; white-tailed deer; Odocoileus virginianus; pregnancy; regression

published: 2019-04-05

Inclusion_Criteria_Annotation

Dong, Xiaoru; Xie, Jingyi; Hoang, Linh (2019)

File Name: Inclusion_Criteria_Annotation.csv Data Preparation: Xiaoru Dong Date of Preparation: 2019-04-04 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: The file contains lists of inclusion criteria of Cochrane Systematic Reviews and the manual annotation results. 5420 inclusion criteria were annotated, out of 7158 inclusion criteria available. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file: - "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews. - "Only RCTs": Manual annotation results, in which "x" means the inclusion criteria are classified as "Only RCTs" and a blank means they are classified as "Others". Notes: 1. "RCT" stands for Randomized Controlled Trial, which, by definition, is "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html]. 2. To reproduce the data related to this file, please get the project code published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided. 3. This datafile (V2) is an updated version of the datafile published at https://doi.org/10.13012/B2IDB-5958960_V1 with some minor spelling mistakes in the data fixed.
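As a rough illustration of the two-column layout described above, the following sketch tallies the annotations with Python's standard csv module. The function name count_annotations and the two sample rows are hypothetical and are not taken from the dataset; only the column headers "Inclusion Criteria" and "Only RCTs" and the "x"/blank convention come from the description.

```python
import csv
import io


def count_annotations(csv_file):
    """Tally rows marked "x" (Only RCTs) versus blank (Others)."""
    counts = {"Only RCTs": 0, "Others": 0}
    for row in csv.DictReader(csv_file):
        # Per the description, "x" marks "Only RCTs"; a blank means "Others".
        mark = (row.get("Only RCTs") or "").strip().lower()
        counts["Only RCTs" if mark == "x" else "Others"] += 1
    return counts


# Hypothetical two-row sample standing in for Inclusion_Criteria_Annotation.csv.
sample = io.StringIO(
    "Inclusion Criteria,Only RCTs\n"
    '"We included randomised controlled trials.",x\n'
    '"We included RCTs and cohort studies.",\n'
)
counts = count_annotations(sample)
print(counts)
```

For the real file, the same function could be called on a handle opened with open("Inclusion_Criteria_Annotation.csv", newline=""); the file's text encoding is not stated in the description, so it may need to be specified explicitly.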

keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews

published: 2024-04-15

Data for Nanoscopic Imaging of Self-Propelled Ultrasmall Catalytic Nanomotors

Lyu, Zhiheng; Lehan, Yao; Zhisheng, Wang; Chang, Qian; Zuochen, Wang; Jiahui, Li; Yufeng, Wang; Qian, Chen (2024)

The dataset contains trajectories of Pt nanoparticles in 1.98 mM NaBH4 and NaCl, tracked under liquid-phase TEM. The coordinates (x, y) of nanoparticles are provided, together with the conversion factor that translates pixel size to actual distance. In the file, ∆t denotes the time interval and NaN indicates the absence of a value when the nanoparticle has not emerged or been tracked. The labeling of nanoparticles in the paper is also noted in the second row of the file.
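A minimal sketch of how such a trajectory could be turned into physical step lengths, assuming the x and y coordinates are in pixels and the conversion factor is a length per pixel. The function name step_lengths, the sample coordinates, and the scale value 0.5 are hypothetical; the actual column layout and conversion factor should be read from the file.

```python
import math


def step_lengths(xs, ys, scale):
    """Distance between consecutive positions, scaled to physical units.

    A step is NaN whenever either endpoint is NaN (particle not tracked).
    """
    steps = []
    for i in range(1, len(xs)):
        dx = (xs[i] - xs[i - 1]) * scale
        dy = (ys[i] - ys[i - 1]) * scale
        # math.hypot propagates NaN, so untracked frames stay marked as NaN.
        steps.append(math.hypot(dx, dy))
    return steps


nan = float("nan")
# Hypothetical pixel coordinates; the particle appears in the second frame.
xs = [nan, 0.0, 3.0, 3.0]
ys = [nan, 0.0, 4.0, 4.0]
steps = step_lengths(xs, ys, scale=0.5)  # 0.5 length units per pixel (assumed)
print(steps)
```

Keeping NaN in the output, rather than dropping those frames, preserves the alignment between each step and its time interval ∆t.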

keywords: nanomotor; liquid-phase TEM

published: 2020-09-02

Second-generation citation context analysis (2010-2019) to retracted paper Matsuyama 2005

Schneider, Jodi; Ye, Di; Hill, Alison (2020)

Citation context annotation. This dataset is a second version (V2) and part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. (2020) "Continued post-retraction citation of a fraudulent clinical trial report, eleven years after it was retracted for falsifying data". Scientometrics. In press, DOI: 10.1007/s11192-020-03631-1 Publications were selected by examining all citations to the retracted paper Matsuyama 2005, and selecting the 35 citing papers, published 2010 to 2019, which do not mention the retraction, but which mention the methods or results of the retracted paper (called "specific" in Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi (2020): Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8150563_V1). The annotated citations are second-generation citations to the retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) https://doi.org/10.1016/S0012-3692(08)60339-6).
OVERALL DATA for VERSION 2 (V2) FILES/FILE FORMATS Same data in two formats: 2010-2019 SG to specific not mentioned FG.csv - Unicode CSV (preservation format only) - same as in V1 2010-2019 SG to specific not mentioned FG.xlsx - Excel workbook (preferred format) - same as in V1 Additional files in V2: 2G-possible-misinformation-analyzed.csv - Unicode CSV (preservation format only) 2G-possible-misinformation-analyzed.xlsx - Excel workbook (preferred format) ABBREVIATIONS: 2G - Refers to the second-generation of Matsuyama FG - Refers to the direct citation of Matsuyama (the one the second-generation item cites) COLUMN HEADER EXPLANATIONS File name: 2G-possible-misinformation-analyzed. Other column headers in this file have the same meaning as explained in V1. The following are additional header explanations: Quote Number - The order of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article") Quote - The text of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article") Translated Quote - English translation of "Quote", automatic translation from Google Scholar Seriousness/Risk - Our assessment of the risk of misinformation and its seriousness 2G topic - Our assessment of the topic of the cited article (the second generation article given in "2G article") 2G section - The section of the citing article (the second generation article given in "2G article") in which the cited article (the first generation article given in "FG in bibliography") was found FG in bib type - The type of article (e.g., review article), referring to the cited article (the first generation article given in "FG in bibliography") FG in bib topic - Our assessment of the topic of the cited article (the first generation article given in "FG in bibliography") FG in bib section - The section of the cited article (the first generation article given in "FG in bibliography") in which the Matsuyama retracted paper was cited

keywords: citation context annotation; retraction; diffusion of retraction; second-generation citation context analysis

published: 2021-05-10

UAV-based multispectral time-series imagery of biomass sorghum - 2019

Varela Quintela, Sebastian; Leakey, Andrew (2021)

UAV-based high-resolution multispectral time-series orthophotos used to understand the relation between growth dynamics, imagery temporal resolution, and end-of-season biomass productivity of biomass sorghum as a bioenergy crop. The sensor is a RedEdge Micasense flown at 40 meters above ground level at the Energy Farm, UIUC, in 2019.

keywords: Unmanned aerial vehicles; High throughput phenotyping; Machine learning; Bioenergy crops

published: 2022-10-14

NEXUS file for morphology-based phylogenetic analysis of Membracoidea (Hemiptera: Cicadellidae)

Dietrich, Christopher; Dmitriev, Dmitry; Takiya, Daniela; Thomas, Michael; Webb, Michael D; Zahniser, James; Zhang, Yalin (2022)

The Membracoidea_morph_data_Final.nex text file contains the original data used in the phylogenetic analyses of Dietrich et al. (Insect Systematics and Diversity, in review). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The complete taxon names corresponding to the 131 genus names listed under “BEGIN TAXA” are listed in Table 1 in the included PDF file “Taxa_and_characters”; the 229 morphological characters (names abbreviated under “BEGIN CHARACTERS”) are fully explained in the list of character descriptions following Table 1 in the same PDF. The data matrix follows “MATRIX” and gives the numerical values of characters for each taxon. Question marks represent missing data. The lists of characters and taxa and details on the methods used for phylogenetic analysis are included in the submitted manuscript.
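To make the matrix layout concrete, here is a minimal sketch that reads the rows following “MATRIX” from a NEXUS text and counts question marks per taxon. It assumes one taxon per line with a single-token name and no polymorphic states in parentheses or braces; the function name read_nexus_matrix and the two-taxon example are hypothetical and do not reproduce the actual file. Dedicated NEXUS-aware software is the safer choice for real analyses.

```python
def read_nexus_matrix(text):
    """Return {taxon: state_string} for the rows between MATRIX and ';'."""
    matrix = {}
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_matrix:
            # Skip everything until the MATRIX command is reached.
            if line.upper().startswith("MATRIX"):
                in_matrix = True
            continue
        if line.startswith(";"):
            break  # a lone semicolon closes the matrix
        parts = line.split(None, 1)
        if len(parts) < 2 or line.startswith("["):
            continue  # blank line, bracketed comment, or name with no states
        taxon, states = parts
        # Append so that interleaved blocks for the same taxon are joined.
        matrix[taxon] = matrix.get(taxon, "") + states.replace(" ", "")
    return matrix


# Hypothetical miniature file; the real matrix has 131 taxa and 229 characters.
example = """#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=2 NCHAR=5;
FORMAT DATATYPE=STANDARD MISSING=? GAP=-;
MATRIX
Taxon_A 01?10
Taxon_B 1??00
;
END;
"""
matrix = read_nexus_matrix(example)
missing = {taxon: states.count("?") for taxon, states in matrix.items()}
print(missing)
```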

keywords: leafhopper; treehopper; evolution; Cretaceous; Eocene

published: 2021-01-25

Data from Retreat, detour, or advance? Understanding the movements of birds confronting the Gulf of Mexico.

Zenzal, T. J. ; Ward, Michael; Diehl, Rob; Buler, Jeffrey; Smolinsky, Jaclyn; Deppe, Jill; Bolus, Rachel; Celis-Murillo, Antonio; Moore, Frank (2021)

Dataset associated with Zenzal et al. Oikos submission: Retreat, detour, or advance? Understanding the movements of birds confronting the Gulf of Mexico. https://doi.org/10.1111/oik.07834 Four CSV files were used for analysis and are related to the following subsections under the “Statistics” heading in the “Materials and Methods” section of the journal article: 1. Departing the Edge = “AIC Analysis.csv” 2. Comparing Retreating to Advancing = “Advance and Retreat Analysis.csv” and “Wind Data at Departure.csv” 3. Food Abundance = “Fruit Data.csv” and “Arthropod Data.csv” Description of variables: Year: the year in which data were collected. Departure: the direction in which an individual departed the Bon Secour National Wildlife Refuge. “North” indicates an individual that departed ≥315° or <45°; “Circum” indicates an individual that departed east (45 – 134°) or west ( 225 – 314°); “Trans” indicates an individual that departed south (135 – 224°). Age: the age of an individual at capture. Individuals were aged as hatch year (HY) or after hatch year (AHY) according to Pyle (1997; see related article for full citation). Fat: the fat score of an individual at capture. Individuals were scored on a 6-point scale ranging from 0-5 following Helms and Drury (1960; see related article for full citation). Species: the standardized four letter alphabetic code used as an abbreviation for English common names of North American Birds. SWTH: Catharus ustulatus; REVI: Vireo olivaceus; INBU: Passerina cyanea; WOTH: Hylocichla mustelina; RTHU: Archilochus colubris. FTM_SD: stopover duration or number of days between first capture and departure from automated radio telemetry system coverage at the Bon Secour National Wildlife Refuge. TMB_SD: stopover duration or number of days between first and last detection from automated radio telemetry systems north of Mobile Bay, AL, USA. 
Mean speed north (km/hr): the northbound travel speed of individuals retreating from the Bon Secour National Wildlife Refuge by determining the time when the signal strength indicated the bird was directly east or west of the automated telemetry system and dividing the amount of time it took for an individual to move in an assumed straight path between the Refuge systems and those north of Mobile Bay, AL, USA. Mean speed south (km/hr): the southbound travel speed of individuals advancing from north of Mobile Bay, AL, USA by determining the time when the signal strength indicated the bird was directly east or west of the automated telemetry system and dividing the amount of time it took for an individual to move in an assumed straight path between the Refuge systems and those north of Mobile Bay, AL, USA. LN_FTM_DEP_TIME: the natural log of departure time from the Bon Secour National Wildlife Refuge. Departure time is defined as the number of hours before or after civil twilight. LN_TMB_DEP_TIME: the natural log of departure time from north of Mobile Bay, AL, USA. Departure time is defined as the number of hours before or after civil twilight. Paired_FTM_DEP_TIME: the departure time or number of hours before or after civil twilight from Bon Secour National Wildlife Refuge. Paired_TMB_DEP_TIME: the departure time or number of hours before or after civil twilight from north of Mobile Bay, AL, USA. Wind Direction: the direction from which the wind originated at the Bon Secour National Wildlife Refuge on nights when individuals were departing. “N” indicates winds from the north (≥315° or <45°); “E” indicates winds from the east (45 – 134°); “W” indicates winds from the west ( 225 – 314°); “S” indicates winds from the south (135 – 224°). Wind Speed (m/s): the wind speed on nights when individuals were departing the Bon Secour National Wildlife Refuge. Group: the direction the bird was traveling under specific wind conditions. 
Northbound individuals traveled north from Bon Secour National Wildlife Refuge. Southbound individuals traveled south from habitats north of Mobile Bay, AL, USA. Fruit: weekly mean number of ripe fruit per meter. Site: the site from which the data were collected. FTM is located within the Bon Secour National Wildlife Refuge. TMB is located within the Jacinto Port Wildlife Management Area. DOY: number indicating day of year (i.e., 1 January = 001….31 December = 365). Arthropod Biomass: estimated mean arthropod biomass from each sampling period. Note: Empty cells indicate unavailable data where applicable.
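The departure categories defined above can be sketched as a small function that maps a compass bearing to "North", "Circum", or "Trans". The function name classify_departure is hypothetical, and treating the stated integer ranges as half-open intervals (for example, 45 up to but not including 135) is an assumption made here so that fractional bearings are also covered.

```python
def classify_departure(bearing_deg):
    """Map a departure bearing in degrees to the dataset's Departure category."""
    b = bearing_deg % 360  # normalize, so 360 is treated as 0
    if b >= 315 or b < 45:
        return "North"   # departed north
    if 135 <= b < 225:
        return "Trans"   # departed south
    return "Circum"      # departed east (45 to 134) or west (225 to 314)


for bearing in (0, 90, 180, 270, 350):
    print(bearing, classify_departure(bearing))
```

The same boundaries apply to the Wind Direction variable, except that east and west are kept as separate codes ("E" and "W") instead of being merged.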

keywords: migratory birds; migration; automated telemetry; Gulf of Mexico

published: 2024-11-27

Honey bee MERFISH data for SpaceExpress paper

Han, Hee-Sun; Schrader, Alex; Lee, JuYeon; Yeo, Seokjin; Traniello, Ian (2024)

Honey bee (Apis mellifera) MERFISH data set prepared by the Han lab, from brains collected by the Robinson lab at UIUC. The dataset comprises ~22 thousand cells and 130 genes, with x,y locations for each cell. A Jupyter notebook file is included as an example of loading the data using Scanpy.

keywords: smFISH; single transcript spatial transcriptomics; Honey bee brain; Apis mellifera; MERFISH

published: 2022-03-23

Data for: The Carbon Footprint of Cold Chain Food Flows in the United States

Wang, Junren; Karakoc, Deniz Berfin; Konar, Megan (2022)

This dataset is an estimate of county-to-county commodity delivery through the cold chain in 2017. For each county pair, the weight [kg] and value [$] of the cold chain flow between origin and destination for SCTG 5 and SCTG 7 commodities are estimated by our model. - SCTG 5 - Meat, poultry, fish, seafood, and their preparations - SCTG 7 - Other prepared foodstuffs, fats, and oils

keywords: food flows; cold chain; county-scale; United States; carbon footprint

published: 2026-01-01

Data for "Environmental DNA metabarcoding for monitoring fish biodiversity in remote lakes"

Iacaruso, Nicholas J.; Myers, Jared T.; Seider, Michael J.; Davis, Mark (2026)

This dataset contains the data related to Chapter 2 of Iacaruso, N. (2026) "EVALUATING ENVIRONMENTAL DNA AS AN EARLY DETECTION METHOD FOR AQUATIC INVASIVE SPECIES". Doctoral Dissertation. University of Illinois Urbana-Champaign. This chapter will also be represented in Iacaruso et al. (2025) "Environmental DNA metabarcoding for monitoring fish biodiversity in remote lakes". North American Journal of Fisheries Management. (Forthcoming). The files contain the eDNA metabarcoding sequences from sampling Isle Royale lakes in 2021 and 2022, species read counts for each eDNA sample, and other information collected at each site.

keywords: eDNA; Fish; Management; Cisco

published: 2020-11-14

Global warming effects on nesting in Prothonotary Warblers

Hoover, Jeffrey; Schelsky, Wendy (2020)

Dataset includes temperature data (local average April daily temperatures), first egg dates and reproductive output of Prothonotary Warblers breeding in southernmost Illinois, USA. Also included are arrival dates for warblers returning to breeding grounds from wintering grounds, and global temperature anomaly data for comparison with local temperatures. These data were used in the manuscript entitled "Warmer April Temperatures on Breeding Grounds Promote Earlier Nesting in a Long-Distance Migratory Bird, the Prothonotary Warbler" published in Frontiers in Ecology and Evolution. A rich text file is included with explanations of each variable in the dataset.

keywords: first egg dates; global warming; local temperature effects; long-distance migratory bird; prothonotary warbler; protonotaria citrea; reproductive output

published: 2020-12-16

Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset

Althaus, Scott; Bajjalieh, Joseph; Jungblut, Marc; Shalmon, Dan; Ghosh, Subhankar; Joshi, Pradnyesh (2020)

Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the Foreign Broadcast Information Service (FBIS) published between 1995 and 2013. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. "ResTeCo Project FBIS Dataset Variable Descriptions.pdf" A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset and descriptions of all variables. 2. "resteco-fbis.csv" This file contains the data extracted from terrorism-related media coverage in the Foreign Broadcast Information Service (FBIS) between 1995 and 2013. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 750,971 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. 
Please note that care should be taken when using "resteco-fbis.csv". The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved. 3. "README.md" This file contains useful information for the user about the dataset. It is a text file written in mark down language Citation Guidelines 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-6360821_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-6360821_V1
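One way to follow the advice about avoiding spreadsheet software is to read the CSV programmatically and keep every field as text until a conversion is known to be safe. The sketch below uses Python's standard csv module. The description does not say whether the large values are long numbers or long text fields; this example assumes a long numeric value, and the sample row and the "value" column are hypothetical (only the "id" column name comes from the description).

```python
import csv
import io

# Hypothetical one-row sample standing in for "resteco-fbis.csv".
sample = io.StringIO("id,value\n12345678901234567,0.25\n")
rows = list(csv.DictReader(sample))

as_text = rows[0]["id"]    # csv returns every field as a string, unchanged
as_int = int(as_text)      # Python integers have arbitrary precision
as_float = float(as_text)  # a 64-bit float cannot hold all 17 digits

print(as_text, as_int, int(as_float))
```

Here the float conversion changes the final digits, which is the kind of silent alteration that can happen when a tool stores large numbers as floating-point values. If the large values are instead very long text fields, the csv module's field size limit (csv.field_size_limit) may need to be raised before reading.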

keywords: Terrorism; Text Analytics; News Coverage; Topic Modeling; Sentiment Analysis

published: 2020-12-16

Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset

Althaus, Scott; Bajjalieh, Joseph; Jungblut, Marc; Shalmon, Dan; Ghosh, Subhankar; Joshi, Pradnyesh (2020)

Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the Summary of World Broadcasts published between 1979 and 2019. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. "ResTeCo Project SWB Dataset Variable Descriptions.pdf" A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset and descriptions of all variables. 2. "resteco-swb.csv" This file contains the data extracted from terrorism-related media coverage in the BBC Summary of World Broadcasts (SWB) between 1979 and 2019. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 438,373 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. Please note that care should be taken when using "resteco-swb.csv". 
The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved. 3. "README.md" This file contains useful information for the user about the dataset. It is a text file written in markdown language Citation Guidelines 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-2128492_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project Summary of World Broadcasts (SWB) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-2128492_V1

keywords: Terrorism; Text Analytics; News Coverage; Topic Modeling; Sentiment Analysis