Dataset Search

Illinois Data Bank Dataset Search Results

Results

published: 2024-03-28

Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process induced Strain

Zhang, Yue; Zhao, Helin; Huang, Siyuan; Hossain, Mohhamad Abir; van der Zande, Arend (2024)

Read me file for the data repository

This repository has raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". We arrange the data following the figure in which it first appeared. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage in V and conductance in S. All Raman modes are given in cm^-1.

How to use this dataset

All data in this dataset are stored in binary NumPy array format as .npy files. To read a .npy file, use the NumPy module for Python and its np.load() command. Example: suppose the filename is example_data.npy. To load it, run the following in a Jupyter notebook or a Python program:

import numpy as np
data = np.load("example_data.npy")

The contents of the example file are then stored in the data object.
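The loading step described in the read-me can be tried end to end with a small round trip. The sketch below is only an illustration: it writes a made-up two-by-two array to a temporary example_data.npy (the array values, the helper name describe_npy, and the temporary directory are assumptions, not part of the dataset) and then loads it back with np.load() to inspect its shape and dtype. The layout of the real files is not described in the listing, so checking shape and dtype first is a reasonable way to find out how the up-sweep and down-sweep data are arranged.

```python
import os
import tempfile

import numpy as np


def describe_npy(path):
    """Load a .npy file and return (array, shape, dtype name)."""
    data = np.load(path)
    return data, data.shape, data.dtype.name


# Hypothetical stand-in for one of the repository's files; the real
# arrays may have a different shape and meaning.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "example_data.npy")
np.save(example_path, np.array([[0.0, 1.0e-6], [1.0, 2.0e-6]]))

data, shape, dtype_name = describe_npy(example_path)
print(shape, dtype_name)
```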

published: 2025-11-06

Data for Photoenzymatic Asymmetric Hydroamination for Chiral Alkyl Amine Synthesis

Harrison, Wesley; Jiang, Guangde; Zhang, Zhengyi; Li, Maolin; Chen, Haoyu; Zhao, Huimin (2025)

Chiral alkyl amines are common structural motifs in pharmaceuticals, natural products, synthetic intermediates, and bioactive molecules. An attractive method to prepare these molecules is the asymmetric radical hydroamination; however, this approach has not been explored with dialkyl amine-derived nitrogen-centered radicals since designing a catalytic system to generate the aminium radical cation, to suppress deleterious side reactions such as α-deprotonation and H atom abstraction, and to facilitate enantioselective hydrogen atom transfer is a formidable task. Herein, we describe the application of photoenzymatic catalysis to generate and harness the aminium radical cation for asymmetric intermolecular hydroamination. In this reaction, the flavin-dependent ene-reductase photocatalytically generates the aminium radical cation from the corresponding hydroxylamine and catalyzes the asymmetric intermolecular hydroamination to furnish the enantioenriched tertiary amine, whereby enantioinduction occurs through enzyme-mediated hydrogen atom transfer. This work highlights the use of photoenzymatic catalysis to generate and control highly reactive radical intermediates for asymmetric synthesis, addressing a long-standing challenge in chemical synthesis.

keywords: Conversion; Bioproducts; Catalysis

published: 2025-12-01

Data for "Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS)"

Park, Minhyuk; Yi, Haotian; Warnow, Tandy; Chacko, George (2025)

This dataset principally consists of four synthetic citation networks that were generated during the preparation of the manuscript Park M, Yi H, Warnow T, and Chacko G (2025). Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS). A preprint is available on Zenodo (below) and the manuscript has been submitted to the MetaRoR platform for review and feedback.

@misc{park_2025_17789558,
  author = {Park, Minhyuk and Yi, Haotian and Warnow, Tandy and Chacko, George},
  title = {Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS)},
  month = dec,
  year = 2025,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17789558},
  url = {https://doi.org/10.5281/zenodo.17789558},
}

The networks contain roughly 14, 76, 161, and 218 million nodes, respectively. Nodelists with attributes and edgelists are provided as gzipped parquet files, along with a copy of the configuration file that was passed to the SASCA-ReS software to generate each network. The software can be accessed at: https://github.com/illinois-or-research-analytics/SASCA-ReS. For example: abm14_config.ini; abm14_edgelist.parquet.gz; and abm14_nodelist.parquet.gz. The column headers in the edgelists and nodelists and the fields in the configuration file are explained in the GitHub repository for SASCA-ReS.

In addition, we provide sj_reccount, a table of real world citation frequencies that is an input to the SASCA-ReS software. The first column (diff) of sj_reccount lists the difference between the publication year of a citing document and the publication year of a cited document. The second column (count) reports the frequency of such citations across the dataset of 77879427 observations, which is derived from the biomedical literature.

Finally, we share data, composite_maverick_disruption.csv, from the mavericks (unconventional citing strategies) experiment reported in the Park et al. (2025) manuscript available at https://zenodo.org/records/17772113. The columns in the composite_maverick_disruption.csv file are:

node_id -> node ID of agents in the various simulations
n_i, n_j, n_k -> terms used to compute disruption per "Wu, L., Wang, D. & Evans, J.A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019). https://doi.org/10.1038/s41586-019-0941-9"
disruption -> the disruption metric of Wu, Wang, and Evans (2019)
type -> maverick type (maximizer, randomnik, or minimizer)
year -> virtual year in the simulation when the maverick was created
alpha -> the alpha parameter of the control agent
pa_weight -> the preferential attachment weight of the control agent phenotype
fit_peak_value -> the fitness value assigned to the control agent
in_degree -> the count of citations accumulated by the maverick or control agent at the end of the simulation
out_degree -> the count of references made by the maverick
tag -> a label for the experiment, e.g. od249_f1 indicates that the mavericks in this experiment made 249 citations and were assigned a fitness value of 1.
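As a rough illustration of how the n_i, n_j, and n_k terms relate to the disruption column, the sketch below uses the standard form of the Wu, Wang, and Evans (2019) index, D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing only the focal paper, n_j counts those citing both the focal paper and its references, and n_k counts those citing only the references. Whether the dataset's disruption column was computed with exactly this expression is an assumption; the function name and the example counts are hypothetical.

```python
def disruption_index(n_i, n_j, n_k):
    """Disruption index D = (n_i - n_j) / (n_i + n_j + n_k).

    n_i: citing papers that cite the focal paper but none of its references
    n_j: citing papers that cite both the focal paper and its references
    n_k: citing papers that cite the references but not the focal paper
    """
    total = n_i + n_j + n_k
    if total == 0:
        raise ValueError("disruption is undefined when all counts are zero")
    return (n_i - n_j) / total


# Hypothetical counts, not taken from composite_maverick_disruption.csv.
print(disruption_index(3, 1, 4))
```

The index ranges from -1 (every citing paper also cites the focal paper's references) to 1 (no citing paper does).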

keywords: synthetic networks; agent based models; SASCA-ReS; citation networks

published: 2018-10-17

Wetland compensation and its impacts on β-diversity

Price, Edward; Spyreas, Greg; Matthews, Jeffrey (2018)

This is the dataset used in the Ecological Applications publication of the same name. This dataset consists of the following files:

Internal.Community.Data.txt
Regional.Community.Data.txt
Site.Attributes.txt
Year.Of.Final.Bio.Monitoring.txt

Internal.Community.Data.txt is a site and plot by species matrix. The column labeled SITE consists of site IDs. The column labeled Plot consists of plot numbers. All other columns represent species relative abundances per plot.

Regional.Community.Data.txt is a site by species matrix of relative abundances. The column labeled site consists of site IDs. All other columns represent species relative abundances per site.

Site.Attributes.txt is a matrix of site attributes. Its columns are:
SITE: site IDs.
Long: longitude in decimal degrees.
Lat: latitude in decimal degrees.
Richness: species richness of sites calculated from Regional Community Data.
NAT_COMP_REST: designation as a randomly selected natural wetland (NAT), compensation wetland (COMP) or reference quality natural wetland (REF).
HQ_LQ_COMP: designation as high quality (HQ), low quality (LQ) or compensation wetland (COMP).
SAMPLING_YEAR_INTERNAL: year in which the data used for analysis of internal β-diversity were gathered.
SAMPLING_YEAR_REGIONAL: year in which the data used for analysis of regional β-diversity were gathered.
TRANSECT_LENGTH: length in meters of the initial sampling transect.
INAI_GRADE: Illinois Natural Areas Inventory grades assigned to each site. Grades range from A for highest quality natural areas to E for lowest quality natural areas.

Year.Of.Final.Bio.Monitoring.txt is a table representing years of final monitoring of compensation wetlands as mandated by the US Army Corps of Engineers. The column labeled Site consists of site IDs. The column labeled YR_FIN_BIO_MON consists of years of final monitoring. Entries of N/A represent dates that could not be located.

More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for data on naturally occurring wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/

keywords: biodiversity; wetlands; wetland mitigation; biotic homogenization; beta diversity

published: 2018-12-04

NEXUS data file for phylogenetic analysis of Evacanthinae (Hemiptera: Cicadellidae)

Wang, Yang; Dietrich, Christopher; Zhang, Yalin (2018)

The text file contains the original data used in the phylogenetic analyses of Wang et al. (2017: Scientific Reports 7:45387). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 81 taxa (species) and 2905 characters, indicate that the first 2805 characters are DNA sequence and the last 100 are morphological, that the data may be interleaved (with data for one species on multiple rows), that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 100 morphological characters. The identity and positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
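To make the header description concrete, the sketch below parses a hypothetical NEXUS header reconstructed from the numbers in this description (81 taxa, 2905 characters, DNA in positions 1-2805, morphology in 2806-2905). The header text itself is an assumption written in MrBayes-style mixed-datatype syntax; the actual file's line layout and keywords may differ. The helper names read_dimensions and read_partitions are illustrative only.

```python
import re

# Hypothetical header reconstructed from the description; not copied
# from the dataset's file.
SAMPLE_HEADER = """#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=81 NCHAR=2905;
FORMAT DATATYPE=mixed(DNA:1-2805,Standard:2806-2905) INTERLEAVE=yes GAP=- MISSING=?;
MATRIX
"""


def read_dimensions(nexus_text):
    """Return (ntax, nchar) from the first DIMENSIONS statement."""
    match = re.search(
        r"DIMENSIONS\s+NTAX\s*=\s*(\d+)\s+NCHAR\s*=\s*(\d+)",
        nexus_text,
        flags=re.IGNORECASE,
    )
    if match is None:
        raise ValueError("no DIMENSIONS statement found")
    return int(match.group(1)), int(match.group(2))


def read_partitions(nexus_text):
    """Return {datatype: (start, end)} from a mixed DATATYPE declaration."""
    found = re.findall(r"(\w+):(\d+)-(\d+)", nexus_text)
    return {name: (int(start), int(end)) for name, start, end in found}


ntax, nchar = read_dimensions(SAMPLE_HEADER)
partitions = read_partitions(SAMPLE_HEADER)
print(ntax, nchar, partitions)
```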

keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; wingless; histone H3; cytochrome oxidase I; bayesian analysis

published: 2025-10-30

Data for Intron-Mediated Enhancement of DIACYLGLYCEROL ACYLTRANSFERASE1 Expression in Energycane Promotes a Step Change for Lipid Accumulation in Vegetative Tissues

Cao, Dang Viet; Luo, Guangbin; Korynta, Shelby; Liu, Hui; Liang, Yuanxue; Shanklin, John; Altpeter, Fredy (2025)

Metabolic engineering for hyperaccumulation of lipids in vegetative tissues is a novel strategy for enhancing energy density and biofuel production from biomass crops. Energycane is a prime feedstock for this approach due to its high biomass production and resilience under marginal conditions. DIACYLGLYCEROL ACYLTRANSFERASE (DGAT) catalyzes the last and only committed step in the biosynthesis of triacylglycerol (TAG) and can be a rate-limiting enzyme for the production of TAG. In this study, we explored the effect of intron-mediated enhancement (IME) on the expression of DGAT1 and resulting accumulation of TAG and total fatty acid (TFA) in leaf and stem tissues of energycane. To maximize lipid accumulation, these evaluations were carried out by co-expressing the lipogenic transcription factor WRINKLED1 (WRI1) and the TAG protect factor oleosin (OLE1). Including an intron in the codon-optimized TmDGAT1 elevated the accumulation of its transcript in leaves by seven times on average based on 5 transgenic lines for each construct. Plants with WRI1 (W), DGAT1 with intron (Di), and OLE1 (O) expression (WDiO) accumulated TAG up to 3.85% of leaf dry weight (DW), a 192-fold increase compared to non-modified energycane (WT) and a 3.8-fold increase compared to the highest accumulation under the intron-less gene combination (WDO). This corresponded to TFA accumulation of up to 8.4% of leaf dry weight, a 2.8-fold or 6.1-fold increase compared to WDO or WT, respectively. Co-expression of WDiO resulted in stem accumulations of TAG up to 1.14% of DW or TFA up to 2.08% of DW that exceeded WT by 57-fold or 12-fold and WDO more than twofold, respectively. Constitutive expression of these lipogenic “push, pull, and protect” factors correlated with biomass reduction. Intron-mediated enhancement (IME) of the expression of DGAT resulted in a step change in lipid accumulation of energycane and confirmed that under our experimental conditions it is rate limiting for lipid accumulation. IME should be applied to other lipogenic factors and metabolic engineering strategies. The findings from this study may be valuable in developing a high biomass feedstock for commercial production of lipids and advanced biofuels.

keywords: Feedstock Production; Lipidomics; Metabolomics

published: 2025-11-19

Data for Adapting C4 Photosynthesis to Atmospheric Change and Increasing Productivity by Elevating Rubisco Content in Sorghum and Sugarcane

Salesse-Smith, Coralie; Adar, Noga; Kannan, Baskaran; Nguyen, Thaibinhduong; Wei, Wei; Guo, Minghao; Ge, Zhengxiang; Altpeter, Fredy; Clemente, Tom; Long, Stephen (2025)

This repository includes data sets and R scripts that were used to perform analysis and produce figures for the following publication: Salesse-Smith, C. E. et al. “Adapting C4 photosynthesis to atmospheric change and increasing productivity by elevating Rubisco content in sorghum and sugarcane.” Proceedings of the National Academy of Sciences 122, e2419943122 (2025) doi:10.1073/pnas.2419943122.

keywords: Feedstock Production; Biomass Analytics; Sorghum; Sugarcane

published: 2017-12-22

Targeted ballet program mitigates ataxia and improves agility in moderate-to-advanced multiple sclerosis

Scheidler, Andrew; Kinnett-Hopkins, Dominique; Learmonth, Yvonne; Motl, Robert; Lopez-Ortiz, Citlali (2017)

TBP assessment raw data files of pre- and post-intervention motion capture velocity and center of pressure force plate data. Labels are self-explanatory. The .mat files refer to data exported from the force plate for the time-to-stabilization assessments, while the .txt files are the data collected for smoothness of gait assessments. These files do not relate to one another and are from separate assessments. The version 2 files are the result of running the Python script Data_Bank_Cleaner.py on the version 1 files. Please find more information in READ_ME_databank.txt.

keywords: Multiple Sclerosis; Rehabilitation; Balance; Ataxia; Ballet; Dance; Targeted Ballet Program

published: 2018-04-23

Author-Linked data for Author-ity 2009

Torvik, Vetle I. (2018)

Provides links to Author-ity 2009, including records from principal investigators (on NIH and NSF grants), inventors on USPTO patents, and students/advisors on ProQuest dissertations. Note that NIH and NSF differ in the type of fields they record and standards used (e.g., institution names). Typically an NSF grant spanning multiple years is associated with one record, while an NIH grant occurs in multiple records, for each fiscal year, sub-projects/supplements, possibly with different principal investigators. The prior probability of match (i.e., that the author exists in Author-ity 2009) varies dramatically across NIH grants, NSF grants, and USPTO patents. The great majority of NIH principal investigators have one or more papers in PubMed but a minority of NSF principal investigators (except in biology) have papers in PubMed, and even fewer USPTO inventors do. This prior probability has been built into the calculation of match probabilities.

The NIH data were downloaded from NIH exporter and the older NIH CRISP files. The dataset has 2,353,387 records, only includes ones with match probability > 0.5, and has the following 12 fields:
1 app_id
2 nih_full_proj_nbr
3 nih_subproj_nbr
4 fiscal_year
5 pi_position
6 nih_pi_names
7 org_name
8 org_city_name
9 org_bodypolitic_code
10 age: number of years since their first paper
11 prob: the match probability to au_id
12 au_id: Author-ity 2009 author ID

The NSF dataset has 262,452 records, only includes ones with match probability > 0.5, and has the following 10 fields:
1 AwardId
2 fiscal_year
3 pi_position
4 PrincipalInvestigators
5 Institution
6 InstitutionCity
7 InstitutionState
8 age: number of years since their first paper
9 prob: the match probability to au_id
10 au_id: Author-ity 2009 author ID

There are two files for USPTO because here we linked disambiguated authors in PubMed (from Author-ity 2009) with disambiguated inventors.

The USPTO linking dataset has 309,720 records, only includes ones with match probability > 0.5, and has the following 3 fields:
1 au_id: Author-ity 2009 author ID
2 inv_id: USPTO inventor ID
3 prob: the match probability of au_id vs inv_id

The disambiguated inventors file (uiuc_uspto.tsv) has 2,736,306 records and has the following 7 fields:
1 inv_id: USPTO inventor ID
2 is_lower
3 is_upper
4 fullnames
5 patents: patent IDs separated by '|'
6 first_app_yr
7 last_app_yr
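Because the patents field of uiuc_uspto.tsv packs several patent IDs into one '|'-separated string, a reader will usually want to split it after parsing each row. The sketch below shows one way to do that with Python's csv module. It assumes the file is tab-separated with no header row and with fields in the order listed above; the sample row (inventor ID, name, patent numbers, years) is invented for illustration and does not come from the dataset.

```python
import csv
import io

# Field order as listed in the dataset description.
FIELDS = [
    "inv_id", "is_lower", "is_upper", "fullnames",
    "patents", "first_app_yr", "last_app_yr",
]

# Hypothetical row; real values and the presence of a header are unknown.
SAMPLE = "inv0001\t0\t1\tDOE, JANE\t1234567|2345678|3456789\t1990\t2001\n"


def parse_inventors(handle):
    """Yield one dict per row, with 'patents' split into a list of IDs."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        record = dict(zip(FIELDS, row))
        record["patents"] = record["patents"].split("|")
        yield record


records = list(parse_inventors(io.StringIO(SAMPLE)))
print(records[0]["inv_id"], len(records[0]["patents"]))
```

For the real file, the same function can be given an open file handle instead of the in-memory sample.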

keywords: PubMed; USPTO; Principal investigator; Name disambiguation

published: 2025-12-15

The Vector Competence and Vectorial Capacity of Aedes Albopictus for Ross River Virus in the United States as a Function of Temperature

Spina, Joseph (2025)

Vector competence and survival data for Aedes albopictus mosquitoes exposed to Ross River virus

keywords: Emerging viruses; vectorial capacity; vector competence; container-breeding mosquitoes; alphavirus; Culicidae

published: 2020-04-07

Dataset for "Body mass and cardiorespiratory fitness are associated with altered brain metabolism"

Larsen, Ryan; Charles, Hillman; Kramer, Arthur; Cohen, Neal; Barbey, Aron (2020)

Baseline data from a multi-modal intervention study conducted at the University of Illinois at Urbana-Champaign. Data include results from a cardiorespiratory fitness assessment (maximal oxygen consumption, VO2max), a body composition assessment (Dual-Energy X-ray Absorptiometry, DXA), and Magnetic Resonance Spectroscopy Imaging. Data set includes data from 435 participants, ages 18-44 years.

keywords: Magnetic Resonance Spectroscopy; N-acetyl aspartic acid (NAA); Body Mass Index; cardiorespiratory fitness; body composition

published: 2021-05-10

Global multi-model projections of urban daily temperatures

Zheng, Zhonghua; Zhao, Lei; Oleson, Keith (2021)

This dataset contains the emulated global multi-model urban daily temperature projections under the RCP 8.5 scenario. The dataset is derived from the study "Large model structural uncertainty in global projections of urban heat waves" (XXXX). Details about this dataset and the local urban climate emulator are described in the article. This dataset documents the global urban daily temperatures of 17 CMIP5 Earth system models for 2006-2015 and 2061-2070. This dataset may be useful for multiple communities regarding urban climate change, heat waves, impacts, vulnerability, risks, and adaptation applications.

keywords: Urban heat waves; CMIP; urban warming; heat stress; urban climate change

published: 2025-10-10

Data for Metabolic Engineering of Low-pH-Tolerant Non-Model Yeast, Issatchenkia orientalis, for Production of Citramalate

Wu, Zong-Yen; Sun, Wan; Shen, Yihui; Pratas, Jimmy; Suthers, Patrick F.; Hsieh, Ping-Hung; Dwaraknath, Sudharsan; Rabinowitz, Joshua D.; Maranas, Costas D.; Shao, Zengyi; Yoshikuni, Yasuo (2025)

Methyl methacrylate (MMA) is an important petrochemical with many applications. However, its manufacture has a large environmental footprint. Combined biological and chemical synthesis (semisynthesis) may be a promising alternative to reduce both cost and environmental impact, but strains that can produce the MMA precursor (citramalate) at low pH are required. A non-conventional yeast, Issatchenkia orientalis, may prove ideal, as it can survive extremely low pH. Here, we demonstrate the engineering of I. orientalis for citramalate production. Using sequence similarity network analysis and subsequent DNA synthesis, we selected a more active citramalate synthase gene (cimA) variant for expression in I. orientalis. We then adapted a piggyBac transposon system for I. orientalis that allowed us to simultaneously explore the effects of different cimA gene copy numbers and integration locations. A batch fermentation showed the genome-integrated-cimA strains produced 2.0 g/L citramalate in 48 h and a yield of up to 7% mol citramalate/mol consumed glucose. These results demonstrate the potential of I. orientalis as a chassis for citramalate production.

keywords: Conversion; Metabolomics

published: 2025-10-17

Data for Solvent-Free Enzymatic Esterification of Free Fatty Acids with Glycerol for Biodiesel Application: Optimized Using the Taguchi Experimental Method

Singh, Ramkrishna; Dien, Bruce S.; Singh, Vijay (2025)

The presence of free fatty acids along with glycerides poses a technical difficulty for biodiesel production. This work used a Taguchi L9 design to optimize a solvent-free enzymatic process for the esterification of oleic acid with glycerol. Under the optimal conditions (reaction temperature of 60°C, enzyme dose of 5 wt%, glycerol:oleic acid molar ratio of 5:1, and reaction time of 3 h), a 75.235 ± 2.19% conversion of oleic acid to esters was achieved. With the addition of molecular sieves, the conversion increased to 86.73% ± 1.09%. However, using the parameters predicted by the Taguchi design (60°C, 5 wt%, 5:1, and 4.5 h), 88.5% ± 1.11% of oleic acid could be converted to ester derivatives. Diglycerides were the major product, and the reaction equilibrium was attained after 4 h. The immobilized enzyme could be used up to seven times with only a 10% reduction in the conversion. Thus, the process can efficiently reduce the free fatty acid content of oil to make it suitable for biodiesel production.
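For readers unfamiliar with a Taguchi L9 design, the sketch below writes out the standard L9 orthogonal array (nine runs, up to four factors at three levels each) and checks its defining property: every pair of columns contains each combination of levels equally often. The mapping of columns to the four factors named in the abstract is an assumption, and the abstract does not state the level values tested, so only level indices 1 to 3 are shown.

```python
from collections import Counter
from itertools import combinations

# Standard L9 (3^4) orthogonal array; entries are level indices 1..3.
L9 = [
    (1, 1, 1, 1),
    (1, 2, 2, 2),
    (1, 3, 3, 3),
    (2, 1, 2, 3),
    (2, 2, 3, 1),
    (2, 3, 1, 2),
    (3, 1, 3, 2),
    (3, 2, 1, 3),
    (3, 3, 2, 1),
]

# Assumed column assignment; the abstract does not specify it.
FACTORS = ["temperature", "enzyme_dose", "molar_ratio", "reaction_time"]


def is_orthogonal(array, levels=3):
    """True if every column pair holds each level pair equally often."""
    expected = len(array) // (levels * levels)
    for a, b in combinations(range(len(array[0])), 2):
        counts = Counter((row[a], row[b]) for row in array)
        if len(counts) != levels * levels:
            return False
        if any(count != expected for count in counts.values()):
            return False
    return True


print(len(L9), "runs for", len(FACTORS), "factors; orthogonal:", is_orthogonal(L9))
```

A full factorial over the same four factors would need 81 runs, which is why the nine-run array is attractive for screening.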

keywords: Conversion; Lipidomics

published: 2025-11-18

Data for Appendix 7 PMID Duplication in the Union List of "Analyzing the consistency of retraction indexing"

McCumber, Corinne; Salami, Malik Oyewale (2025)

This project investigates retraction indexing agreement among data sources: BCI, BIOABS, CCC, Compendex, Crossref, GEOBASE, MEDLINE, PubMed, Retraction Watch, Scopus, and Web of Science Core. Post-retraction citation may be partly due to authors’ and publishers' challenges in systematically identifying retracted publications. To investigate retraction indexing quality, we investigate the agreement in indexing retracted publications between 11 database sources, restricting to their coverage, resulting in a union list of 85,392 unique items. This dataset highlights items that went through a DOI augmentation process to have PubMed added as a source and that have duplicated PMIDs, indicating data quality issues.

keywords: retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS; PMID duplication; identifier granularity

published: 2019-03-05

UIUC Campus Gamma-Ray Radiation Data

Zhao, Jifu (2019)

This dataset contains the raw nuclear background radiation data collected on the engineering campus of the University of Illinois at Urbana-Champaign. It contains three columns, x, y, and counts, which correspond to longitude, latitude, and radiation count rate (counts per second). In addition to the original background radiation data, there are several separate files that contain the simulated radioactive sources. For a more detailed README file, please refer to this documentation: https://www.dropbox.com/s/xjhmeog7fvijml7/README.pdf?dl=0

keywords: Nuclear Radiation

published: 2025-10-10

Data from Response Surface Methodology Guided Adsorption and Recovery of Free Fatty Acids from Oil Using Resin

Singh, Ramkrishna; Dien, Bruce; Singh, Vijay (2025)

The presence of free fatty acids interferes with the conversion of plant oils to biodiesel. Four strong and weak base resins were evaluated for the removal of free fatty acids (FFA) from oil. Amberlite FPA 51 showed the highest adsorption capacity of FFA. A resin concentration above 3% could enable a higher percentage FFA adsorption. The adsorption process fitted a pseudo-first-order kinetic model and achieved equilibrium in approximately 8 h. A full factorial design was used to optimize the resin and FFA concentrations at a fixed temperature (40°C). A ratio of resin to fatty acid concentrations above 1.875 was sufficient for 70% adsorption and the amount adsorbed continued to increase with further added resin. A two-step washing of resin using hexane and ethanol recovered approximately 67.55% ± 4.05% of the initially added fatty acid. The resin that was used was regenerated with 5% NaOH and re-used for a minimum of three consecutive cycles. However, the adsorption capacity diminished to 75% of the initial cycle in cycles 2 and 3. Thus, the work presents a resin-based process for deacidification of oil to reduce fatty acid content of oil for biodiesel production.
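The pseudo-first-order kinetic model mentioned in the abstract is commonly written as q_t = q_e (1 - exp(-k1 t)), where q_t is the amount adsorbed at time t, q_e is the amount adsorbed at equilibrium, and k1 is the rate constant. The sketch below evaluates that expression. The abstract does not report fitted parameters, so the values q_e = 1.0 (normalized) and k1 = 0.6 per hour are hypothetical, chosen only so that the curve is close to its plateau at about 8 h.

```python
import math


def pfo_uptake(t_hours, q_e, k1):
    """Pseudo-first-order uptake: q_t = q_e * (1 - exp(-k1 * t))."""
    return q_e * (1.0 - math.exp(-k1 * t_hours))


# Hypothetical parameters; not fitted values from the study.
Q_E = 1.0   # normalized equilibrium uptake
K1 = 0.6    # rate constant in 1/h

for hours in (0, 1, 2, 4, 8):
    print(hours, round(pfo_uptake(hours, Q_E, K1), 3))
```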

keywords: Conversion; Feedstock Bioprocessing

published: 2018-12-20

Words_Selected_by_Manual_Analysis

Dong, Xiaoru; Xie, Jingyi; Hoang, Linh (2018)

File Name: WordsSelectedByManualAnalysis.csv
Data Preparation: Xiaoru Dong, Linh Hoang
Date of Preparation: 2018-12-14
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: This file contains the list of 407 informative words reselected from the 1655 words by manual analysis. In particular, from the 1655 words obtained through information gain feature selection, we manually read and eliminated the domain-specific words. The remaining words were then selected as the "Manual Analysis Words".
Notes: Although the list of words in this file was selected manually, the data related to it can be reproduced by getting the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and running it following the instructions provided.

keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews

published: 2023-02-10

Integrating multiple data sources improves prediction and inference for upland game occupancy models

Emmet, Robert L.; Benson, Thomas J.; Allen, Maximilian L.; Stodola, Kirk W. (2023)

Data and documentation for the Ornithological Applications manuscript “Integrating multiple data sources improves prediction and inference for upland game bird occupancy models” by Robert L. Emmet, Thomas J. Benson, Maximilian L. Allen, and Kirk W. Stodola. We combined data from the North American Breeding Bird Survey and eBird with a targeted survey (IDNR upland game) to estimate habitat use of northern bobwhite and ring-necked pheasant in Illinois and to document the efficiency and overlap among the various data sources. Data include eBird, USGS Breeding Bird Survey, National Land Cover Database, upland game bird surveys, and stream data.

keywords: data integration; occupancy; avian population modelling; northern bobwhite; Colinus virginianus; ring-necked pheasant; Phasianus colchicus

published: 2023-02-07

Data from: DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS

Willson, James; Tabatabaee, Yasamin; Liu, Baqiao; Warnow, Tandy (2023)

Data sets from "DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS." They contain trees and sequences simulated with gene duplication and loss under a variety of different conditions. Note:
- trees.tar.gz contains the simulated gene-family trees used in our experiments (both true trees from SimPhy as well as trees estimated from alignments).
- alignments.tar.gz contains simulated sequence data used for estimating the gene-family trees.

keywords: evolution; computational biology; bioinformatics; phylogenetics

published: 2025-09-18

Data from Impact of Fractionation Process on the Technical and Economic Viability of Corn Dry Grind Ethanol Process

Kurambhatti, Chinmay V.; Kumar, Deepak; Singh, Vijay (2025)

Use of corn fractionation techniques in the dry grind process increases the number of coproducts, enhances their quality and value, generates feedstock for cellulosic ethanol production and potentially increases the profitability of the dry grind process. The aim of this study is to develop process simulation models for eight different wet and dry corn fractionation techniques recovering germ, pericarp fiber and/or endosperm fiber, and evaluate their techno-economic feasibility at the commercial scale. Ethanol yields for plants processing 1113.11 MT corn/day were 37.2 to 40 million gal for wet fractionation and 37.3 to 31.3 million gal for dry fractionation, compared to 40.2 million gal for the conventional dry grind process. Capital costs were higher for wet fractionation processes ($92.85 to $97.38 million) in comparison to conventional ($83.95 million) and dry fractionation ($83.35 to $84.91 million) processes. Due to the high value of coproducts, ethanol production costs in most fractionation processes ($1.29 to $1.35/gal) were lower than in the conventional ($1.36/gal) process. The internal rate of return for most of the wet (6.88 to 8.58%) and dry fractionation (6.45 to 7.04%) processes was higher than for the conventional (6.39%) process. The wet fractionation process designed for germ and pericarp fiber recovery was the most profitable among the processes.

keywords: Conversion; Feedstock Bioprocessing; Modeling

published: 2025-10-17

Data for Efficient Delivery of a DNA Aptamer-Based Biosensor into Plant Cells for Glucose Sensing through Thiol-Mediated Uptake

Mou, Quanbing; Xue, Xueyi; Ma, Yuan; Banik, Mandira; Garcia, Valeria; Guo, Weijie; Wang, Jiang; Song, Tingjie; Chen, Li-Qing; Lu, Yi (2025)

DNA aptamers have been widely used as biosensors for detecting a variety of targets. Despite decades of success, they have not been applied to monitor any targets in plants, even though plants are a major platform for providing oxygen, food, and sustainable products ranging from energy fuels to chemicals, and high-value products such as pharmaceuticals. A major barrier to progress is a lack of efficient methods to deliver DNA into plant cells. We herein report a thiol-mediated uptake method that more efficiently delivers DNA into Arabidopsis and tobacco leaf cells than another state-of-the-art method, DNA nanostructures. Such a method allowed efficient delivery of a glucose DNA aptamer sensor into Arabidopsis for sensing glucose. This demonstration opens a new avenue to apply DNA aptamer sensors for functional studies of various targets, including metabolites, plant hormones, metal ions, and proteins in plants for a better understanding of the biodistribution and regulation of these species and their functions.

keywords: Conversion;Feedstock Production;Genomics

published: 2025-11-19

Data for Production of a δ-Lactam from Glucose through Integrating Biological and Chemical Catalysis

Kim, Min Soo; Shi, Longyuan; Zhao, Huimin; Huber, George (2025)

We present a new strategy for the production of a δ-lactam from glucose that integrates biological production of triacetic acid lactone (TAL, 4-hydroxy-6-methyl-2H-pyran-2-one) with catalytic transformation of TAL into 6-methylpiperidin-2-one (MPO) through metabolic engineering, isomerization, amination, and catalytic hydrogenation/hydrogenolysis. We developed a sustainable and antibiotic-free fed-batch fermentation using genetically modified Rhodotorula toruloides IFO0880. This process achieved a yield of 2-hydroxy-6-methyl-4H-pyran-4-one (2H4P) at 0.05 g/g of glucose, corresponding to a 9.9 g/L titer. By adjusting the pH of the fermentation broth to 2, 2H4P was quantitatively converted into TAL. The TAL in the fermentation broth was directly converted by aminolysis into 4-hydroxy-6-methylpyridin-2(1H)-one (HMPO), which achieved an 18.5% yield with 94.3% purity. The HMPO yield was lower in the fermentation broth than in a clean feedstock (32.2%), suggesting that the biological impurities are inhibitors in this reaction. Further investigation revealed that lower pH levels and reduced TAL concentrations in the fermentation broth significantly decreased HMPO yields. Subsequently, the precipitated HMPO was filtered and dried and then subjected to the final catalytic conversion in H2O solvent, achieving an MPO yield of 91.8%. This integrated approach demonstrated the direct use of TAL in the filtered aqueous fermentation broth without the need to isolate TAL.

keywords: Conversion;Catalysis;Metabolic Engineering

published: 2018-12-06

NEXUS data file for phylogenetic analysis of Iassinae (Hemiptera: Cicadellidae)

Krishnankutty, Sindhu; Dietrich, Christopher; Dai, Wu; Siddappaji, Madhura (2018)

The text file contains the original DNA sequence data used in the phylogenetic analyses of Krishnankutty et al. (2016: Systematic Entomology 41: 580–595). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The file contains five separate data blocks, one for each character partition (28S, histone H3, 12S, indels, and morphology) for 53 taxa (species). Gaps inserted into the DNA sequence alignment are indicated by a dash, and missing data are indicated by a question mark. The separate "indels1" block includes 40 indels (insertions/deletions) from the 28S sequence alignment re-coded using the modified complex indel coding scheme, as described in the "Materials and methods" of the original publication. The DIMENSIONS statements near the beginning of each block indicate the numbers of taxa (NTax) and characters (NChar). The file contains aligned nucleotide sequence data for 3 gene regions and 40 morphological characters. The file is configured for use with the maximum likelihood-based phylogenetic program GARLI but can also be parsed by any other bioinformatics software that supports the NEXUS format. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supporting pdf file. More details on individual analyses are provided in the original publication.

keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; 12S mtDNA; maximum likelihood
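
The NEXUS conventions described in the abstract above (DIMENSIONS statements giving NTax and NChar, a dash for alignment gaps, a question mark for missing data) can be checked with a few lines of Python. This is a minimal sketch using only the standard library, not a full NEXUS parser: the helper names `read_dimensions` and `count_gaps_and_missing` and the tiny `sample` block are hypothetical and are not taken from the dataset, and the sketch assumes each DIMENSIONS statement ends with a semicolon.

```python
import re


def read_dimensions(nexus_text):
    """Return one (ntax, nchar) pair per DIMENSIONS statement found.

    A value is None when the statement does not give it.
    """
    statement = re.compile(r"dimensions\b([^;]*);", re.IGNORECASE)
    results = []
    for match in statement.finditer(nexus_text):
        body = match.group(1)
        ntax = re.search(r"ntax\s*=\s*(\d+)", body, re.IGNORECASE)
        nchar = re.search(r"nchar\s*=\s*(\d+)", body, re.IGNORECASE)
        results.append((
            int(ntax.group(1)) if ntax else None,
            int(nchar.group(1)) if nchar else None,
        ))
    return results


def count_gaps_and_missing(sequence):
    """Count gap ('-') and missing-data ('?') symbols in one aligned row."""
    return sequence.count("-"), sequence.count("?")


# Hypothetical miniature block for illustration only (not the real data).
sample = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTax=3 NChar=8;
  FORMAT DataType=DNA Gap=- Missing=?;
  MATRIX
    taxon_A ACGT--AC
    taxon_B ACGTTTAC
    taxon_C AC??TTAC
  ;
END;
"""

dimensions = read_dimensions(sample)
gaps_a, missing_a = count_gaps_and_missing("ACGT--AC")
gaps_c, missing_c = count_gaps_and_missing("AC??TTAC")
```

To apply this to the deposited file, read its contents into a string first; comparing the reported NTax against the 53 taxa mentioned in the description is a quick sanity check before running a full analysis.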

published: 2020-11-06

Simulation Data for JUMPER: Discontinuous Transcript Assembly in SARS-CoV-2

Sashittal, Palash; Zhang, Chuanyi; El-Kebir, Mohammed (2020)

This dataset contains BAM files and transcripts for the simulated instances generated for the paper 'JUMPER: Discontinuous Transcript Assembly in SARS-CoV-2', submitted to RECOMB 2021. The folder 'bam' contains the simulated BAM files, aligned using STAR, while the reads were generated using the method polyester. Note: in the readme file, close to the end of the document, please ignore this sentence: 'Those files can be opened by using [name of software].'

keywords: transcript assembly; SARS-CoV-2; discontinuous transcription; coronaviruses