Illinois Data Bank Dataset Search Results
Results
published:
2025-10-01
Wang, Yajie; Huang, Xiaoqiang; Hui, Jingshu; Vo, Lam Tung; Zhao, Huimin
(2025)
There is a growing interest in developing cooperative chemoenzymatic reactions to harness the reactivity of chemical catalysts and the selectivity of enzymes for the synthesis of nonracemic chiral compounds. However, existing chemoenzymatic systems with more than one chemical reaction and one enzymatic reaction working cooperatively are rare. Moreover, the application of oxidoreductases in cooperative chemoenzymatic reactions is limited by the necessity of using expensive and unstable redox equivalents such as nicotinamide cofactors. Here, we report a light-driven cooperative chemoenzymatic system composed of a photoinduced electron transfer reaction (PET) and a photosensitized energy transfer reaction (PEnT) together with an enzymatic reduction in one pot to synthesize chiral building blocks of bioactive compounds. As a proof of concept, ene-reductase was directly regenerated by PET in the absence of external cofactors. Meanwhile, enzymatic reduction worked cooperatively with photocatalyst-catalyzed energy transfer that continuously replenished the reactive isomer from the less reactive one. The whole system stereoconvergently reduced E/Z mixtures of alkenes to the enantiopure products. Additionally, enantioselective enzymatic reduction worked competitively with the photocatalyst-catalyzed racemic background reaction and side reactions to channel the overall electron flow to the single enantiopure product. Such a light-driven cooperative chemoenzymatic system holds great potential for asymmetric synthesis using inexpensive petroleum- or biomass-derived alkenes.
keywords:
Conversion;Catalysis
published:
2021-02-26
Bauder, Javan M; Allen, Maximilian L.
(2021)
These data were used in the survival and cause-specific mortality analyses of translocated nuisance American black bears in Wisconsin published in Animal Conservation (Bauder, J.M., N.M. Roberts, D. Ruid, B. Kohn, and M.L. Allen. Accepted. Lower survival of nuisance American black bears (Ursus americanus) is not due to translocation. Animal Conservation). Included are CSV files containing each bear's capture history and associated covariates, plus metadata for each CSV file. Also included is an example R script showing how to conduct the analyses (this R script is also included as supporting information with the published paper).
keywords:
black bear; survival; translocation; nuisance wildlife management
published:
2025-09-18
Jagtap, Sujit; Bedekar, Ashwini; Liu, Jing-Jing; Jin, Yong-Su; Rao, Christopher V.
(2025)
Sugar alcohols are commonly used as low-calorie sweeteners and can serve as potential building blocks for bio-based chemicals. Previous work has shown that the oleaginous yeast Rhodosporidium toruloides IFO0880 can natively produce arabitol from xylose at relatively high titers, suggesting that it may be a useful host for sugar alcohol production. In this work, we explored whether R. toruloides can produce additional sugar alcohols. Rhodosporidium toruloides is able to produce galactitol from galactose. During growth in nitrogen-rich medium, R. toruloides produced 3.2 ± 0.6 g/L and 8.4 ± 0.8 g/L galactitol from 20 and 40 g/L galactose, respectively. In addition, R. toruloides was able to produce galactitol from galactose at reduced titers during growth in nitrogen-poor medium, which also induces lipid production. These results suggest that R. toruloides can potentially be used for the co-production of lipids and galactitol from galactose. We further characterized the mechanism for galactitol production, including identifying and biochemically characterizing the critical aldose reductase. Intracellular metabolite analysis was also performed to further understand galactose metabolism. Rhodosporidium toruloides has traditionally been used for the production of lipids and lipid-based chemicals. Our work demonstrates that R. toruloides can also produce galactitol, which can be used to produce polymers with applications in medicine and as a precursor for anti-cancer drugs. Collectively, our results further establish that R. toruloides can produce multiple value-added chemicals from a wide range of sugars.
keywords:
Conversion;Genomics;Metabolomics
published:
2022-09-01
Di Giovanni, Alexander; Ward, Michael
(2022)
These data and code are associated with a study on differences in the rate of hatching failure of eggs across 14 free-living grassland and shrubland birds. We used a device to measure the embryonic heart rate of eggs and found there was variation across species related to factors such as nest type and nest safety. This work is to be published in Ornithology.
keywords:
embryonic death; grassland birds; egg mortality; heart rate
published:
2021-10-11
Peng, Jianhao; Ochoa, Idoia
(2021)
This dataset contains the ClonalKinetic dataset that was used in SimiC, along with its intermediate results for comparison. A detailed description can be found in the text files 'clonalKinetics_Example_data_description.txt' and 'ClonalKinetics_filtered.DF_data_description.txt'. The required input data for SimiC consist of:
1. ClonalKinetics_filtered.clustAssign.txt => cluster assignment for each cell.
2. ClonalKinetics_filtered.DF.pickle => filtered scRNAseq matrix.
3. ClonalKinetics_filtered.TFs.pickle => list of driver genes.
The results of running SimiC consist of:
1. ClonalKinetics_filtered_L10.01_L20.01_Ws.pickle => inferred GRNs for each cluster
2. ClonalKinetics_filtered_L10.01_L20.01_AUCs.pickle => regulon activity scores for each cell and each driver gene.
NOTE: The “ClonalKinetics_filtered.rds” file mentioned in “ClonalKinetics_filtered.DF_data_description.txt” is an intermediate file; the authors have put all the processed data in the pickle/txt files described in the filtered data description.
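A minimal Python sketch of loading the three SimiC input files listed above, using only the standard library. The function name `load_simic_inputs` is hypothetical, and the layout of the cluster-assignment file (whitespace-separated integer labels, one per cell) is an assumption that the dataset documentation does not confirm. Because `pickle.load` can execute arbitrary code, unpickle only files from a source you trust.

```python
import pickle
from pathlib import Path


def load_simic_inputs(data_dir, prefix="ClonalKinetics_filtered"):
    """Load cluster labels, the filtered matrix, and the driver-gene list.

    Hypothetical helper. Assumes the .clustAssign.txt file holds
    whitespace-separated integer cluster labels, one per cell.
    """
    data_dir = Path(data_dir)
    # Cluster assignment for each cell (format assumed, see docstring).
    text = (data_dir / f"{prefix}.clustAssign.txt").read_text()
    clusters = [int(token) for token in text.split()]
    # Filtered scRNA-seq matrix; if it was pickled as a pandas DataFrame,
    # pandas must be installed for unpickling to succeed.
    with open(data_dir / f"{prefix}.DF.pickle", "rb") as fh:
        expression = pickle.load(fh)
    # List of driver genes.
    with open(data_dir / f"{prefix}.TFs.pickle", "rb") as fh:
        driver_genes = pickle.load(fh)
    return clusters, expression, driver_genes
```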
keywords:
GRNs;SimiC;RDS;ClonalKinetic
published:
2021-08-12
Ferguson, John; Fernandes, Samuel; Monier, Brandon; Miller, Nathan; Allen, Dylan; Dmitrieva, Anna; Schmuker, Peter; Lozano, Roberto; Valluru, Ravi; Buckler, Edward; Gore, Michael; Brown, Patrick; Spalding, Edgar; Leakey, Andrew
(2021)
This dataset contains the images of a photoperiod-sensitive sorghum accession population used for a GWAS/TWAS study of leaf traits related to water use efficiency in 2016 and 2017.
*Note: New in this second version, JPG images exported from the nms files have been added.
Accessions_2016.zip and Accessions_2017.zip: contain raw images produced by the Optical Topometer (nms files) for all sorghum accessions. Images can be opened with the Nanofocus μsurf analysis extended software (Oberhausen, Germany).
Accessions_2016_jpg.zip and Accessions_2017_jpg.zip: contain JPG images exported from the nms files and used in the machine-learning phenotyping.
keywords:
stomata; segmentation; water use efficiency
published:
2025-09-01
Chronic wasting disease (CWD) surveillance data from Illinois and Wisconsin, USA between the fiscal years 2003 and 2022 (calendar years 2002 and 2021). Data is reported at the township level as defined by the US Public Land Survey System. CWD cases, animals tested for CWD, and the apparent prevalence calculated from these values are given by township and fiscal year. Data has been anonymized by replacing original township names with identification numbers to maintain the privacy of landowners. Variables include Tests, Cases, and nonlinear transformations of Tests and Cases (inverse, square root, and log transformations).
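The derived variables can be sketched as follows. Apparent prevalence is the share of tested animals that were positive (Cases / Tests), not corrected for test sensitivity or specificity. How the dataset handles zero counts in the inverse and log transforms is not stated, so the handling below (`None` for an inverse of zero, `log1p` for the log) is an assumption, and the function name `township_record` is hypothetical.

```python
import math


def township_record(tests: int, cases: int) -> dict:
    """Apparent prevalence plus transformed Tests/Cases for one township-year.

    Zero handling is an assumption: inverse of 0 -> None, log uses log(x + 1).
    """
    if cases > tests:
        raise ValueError("cases cannot exceed animals tested")
    prevalence = cases / tests if tests > 0 else None
    record = {"tests": tests, "cases": cases, "apparent_prevalence": prevalence}
    for name, value in (("tests", tests), ("cases", cases)):
        record[f"inv_{name}"] = 1 / value if value > 0 else None
        record[f"sqrt_{name}"] = math.sqrt(value)
        record[f"log_{name}"] = math.log1p(value)  # log(x + 1) tolerates zeros
    return record
```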
keywords:
chronic wasting disease; cwd; white-tailed deer; deer; cervid; prion; apparent prevalence; prevalence; surveillance
published:
2025-12-15
Xiao, Tianxia; Khan, Artem; Shen, Yihui; Chen, Li; Rabinowitz, Joshua
(2025)
Ethanol and lactate are typical waste products of glucose fermentation. In mammals, glucose is catabolized by glycolysis into circulating lactate, which is broadly used throughout the body as a carbohydrate fuel. Individual cells can both take up and excrete lactate, uncoupling glycolysis from glucose oxidation. Here we show that similar uncoupling occurs in budding yeast batch cultures of Saccharomyces cerevisiae and Issatchenkia orientalis. Even in fermenting S. cerevisiae that is a net producer of ethanol, 13C-ethanol in the medium rapidly enters the cells and is oxidized to acetaldehyde and acetyl-CoA. This is evident in exogenous ethanol being a major source of both cytosolic and mitochondrial acetyl units. 2H-tracing reveals that ethanol is also a major source of both NADH and NADPH high-energy electrons, and this role is augmented under oxidative stress conditions. Thus, uncoupling of glycolysis from the oxidation of glucose-derived carbon via rapidly reversible reactions is a conserved feature of eukaryotic metabolism.
keywords:
Conversion;Metabolomics
published:
2025-10-01
Dai, Tao; Ellebracht, Nathan; Hunter Sellars, Elwin; Aui, Alvina; Hanna, Goldstein; Li, Wenqin; Hellwinckel, Chad; Price, Lydia; Wong, Andrew; Nico, Peter; Basso, Bruno; Robertson, G Philip; Pett-Ridge, Jennifer; Langholtz, Matthew; Baker, Sarah; Pang, Simon; Scown, Corinne
(2025)
Gigatonne-scale atmospheric carbon dioxide removal (CDR), alongside deep emission cuts, is critical to stabilizing the climate. However, some of the most scalable CDR technologies are also the most land intensive. Here, we examine whether adequate land resources exist in the contiguous United States to meet CDR targets when prioritizing grid emissions reduction, food production, and the protection of sensitive ecosystems. We focus on biomass carbon removal and storage (BiCRS) and direct air capture and storage (DACS) and show that suitable lands exceed the expected needs: 37.6 million hectares of land are available for BiCRS, resulting in 0.26 GtCO2 of CDR/year, and 34 million hectares are suitable for wind- and solar-powered DACS, resulting in 4.8 GtCO2 of CDR/year if facilities are co-located with geologic CO2 storage. We identify biomass and energy supply hotspots to meet CDR targets while ensuring land protection and minimizing land competition.
keywords:
carbon; geospatial
published:
2021-04-22
Torvik, Vetle; Smalheiser, Neil
(2021)
Author-ity 2018 dataset
Prepared by Vetle Torvik Apr. 22, 2021
The dataset is based on a snapshot of PubMed taken in December 2018 (NLM's 2018 baseline plus updates throughout 2018). It contains a total of 29.1 million article records and 114.2 million author name instances. Each instance of an author name is uniquely represented by the PMID and the position on the paper (e.g., 10786286_3 is the third author name on PMID 10786286). Thus, each cluster is represented by a collection of author name instances. The instances were first grouped into "blocks" by last name and first name initial (including some close variants), and then each block was separately subjected to clustering. The resulting clusters are provided in two different formats, the first in a file with only IDs and PMIDs, and the second in a file with cluster summaries:
####################
File 1: au2id2018.tsv
####################
Each line corresponds to an author name instance (PMID and Author name position) with an Author ID. It has the following tab-delimited fields:
1. Author ID
2. PMID
3. Author name position
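Because each author name instance is just `PMID_position`, clusters can be rebuilt from au2id2018.tsv with a few lines of Python. This is a hypothetical sketch (function names are illustrative) that assumes the file has no header row; when reading from disk, open the file with `newline=""`, as the `csv` module documentation recommends.

```python
import csv
from collections import defaultdict


def parse_instance_id(instance_id: str) -> tuple[int, int]:
    """Split an instance or Author ID such as '10786286_3' into (PMID, position)."""
    pmid, position = instance_id.rsplit("_", 1)
    return int(pmid), int(position)


def load_clusters(lines) -> dict:
    """Group tab-delimited (Author ID, PMID, position) rows by Author ID.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumes there is no header row.
    """
    clusters = defaultdict(list)
    for author_id, pmid, position in csv.reader(lines, delimiter="\t"):
        clusters[author_id].append((int(pmid), int(position)))
    return dict(clusters)
```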
########################
File 2: authority2018.tsv
#########################
Each line corresponds to a predicted author-individual represented by a cluster of author name instances and a summary of all the corresponding papers and author name variants. Each cluster has a unique Author ID (the PMID of the earliest paper in the cluster and the author name position). The summary has the following tab-delimited fields:
1. Author ID (or cluster ID), e.g., 3797874_1 represents a cluster where 3797874_1 is the earliest author name instance.
2. cluster size (number of author name instances on papers)
3. name variants separated by '|' with counts in parentheses. Each variant has the format lastname_firstname middleinitial, suffix
4. last name variants separated by '|'
5. first name variants separated by '|'
6. middle initial variants separated by '|' ('-' if none)
7. suffix variants separated by '|' ('-' if none)
8. email addresses separated by '|' ('-' if none)
9. ORCIDs separated by '|' ('-' if none). From 2019 ORCID Public Data File https://orcid.org/ and from PubMed XML
10. range of years (e.g., 1997-2009)
11. Top 20 most frequent affiliation words (after stoplisting and tokenizing; some phrases are also made) with counts in parentheses; separated by '|'; ('-' if none)
12. Top 20 most frequent MeSH terms (after stoplisting) with counts in parentheses; separated by '|'; ('-' if none)
13. Journal names with counts in parentheses; separated by '|'
14. Top 20 most frequent title words (after stoplisting and tokenizing) with counts in parentheses; separated by '|'; ('-' if none)
15. Co-author names (lowercased lastname and first/middle initials) with counts in parentheses; separated by '|'; ('-' if none)
16. Author name instances (PMID_auno separated by '|')
17. Grant IDs (after normalization); separated by '|'; ('-' if none given)
18. Total number of times cited. (Citations are based on references harvested from open sources such as PMC).
19. h-index
20. Citation counts (e.g., for h-index): PMIDs by the author that have been cited (with total citation counts in parentheses); separated by '|'
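Several authority2018.tsv fields share a `value(count)` pattern separated by '|'. The sketch below, with hypothetical function names, parses that pattern and recomputes an h-index from field 20 using the standard definition (the largest h such that h papers each have at least h citations). It assumes the counts in field 20 are the per-PMID citation totals described above, so the result could serve as a consistency check against field 19.

```python
import re

# One "value(count)" entry, e.g. "smith_john a(12)". The greedy (.*) makes the
# count the LAST parenthesized integer, so values containing "(" still parse.
_ENTRY = re.compile(r"^(.*)\((\d+)\)$")


def parse_counted(field: str) -> dict[str, int]:
    """Parse a '|'-separated 'value(count)' field; '-' means empty."""
    if field.strip() == "-":
        return {}
    counts = {}
    for entry in field.split("|"):
        match = _ENTRY.match(entry.strip())
        if match:  # silently skip entries that do not fit the assumed pattern
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


def h_index(citation_counts) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    # In descending order, "cites >= rank" holds for ranks 1..h and fails after.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)
```

For example, `h_index(parse_counted("12345(10)|23456(3)|34567(1)").values())` gives 2.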
keywords:
author name disambiguation; PubMed
published:
2021-05-07
The dataset is based on a snapshot of PubMed taken in December 2018 (NLM's 2018 baseline plus updates throughout 2018) and, for ORCIDs, primarily on the 2019 ORCID Public Data File https://orcid.org/.
Matching an ORCID to an individual author name on a PMID is a non-trivial process. Anyone can create an ORCID and claim to have contributed to any published work. Many records claim too many articles, and most claim too few. Even though ORCID records are often (perhaps mostly) populated by author name searches in popular bibliographic databases, there is no confirmation that the person's name is listed on the article. This dataset is the product of mapping ORCIDs to individual author names on PMIDs, even when the ORCID name does not match any author name on the PMID, and when there are multiple (good) candidate author names. The algorithm avoids assigning the ORCID to an article when there are no good candidates and when there are multiple equally good matches. For some ORCIDs that clearly claim too much, it triggers a very strict matching procedure (for ORCIDs that claim too much but where the majority of claims appear correct, e.g., 0000-0002-2788-5457), and it sometimes deletes ORCIDs altogether when all (or nearly all) of their claimed PMIDs appear incorrect. When an individual clearly has multiple ORCIDs, it deletes the least complete of them (e.g., 0000-0002-1651-2428 vs 0000-0001-6258-4628). It should be noted that ORCIDs that claim too much are not necessarily due to nefarious or trolling intentions, even though a few appear so. Certainly many are due to laziness, such as claiming everything with a particular last name. Some cases appear to be due to test engineers (e.g., 0000-0001-7243-8157; 0000-0002-1595-6203), librarians assisting faculty (e.g., 0000-0003-3289-5681), group/laboratory IDs (0000-0003-4234-1746), having contributed to an article in capacities other than authorship, such as an Investigator, an Editor, or part of a Collective (e.g., 0000-0003-2125-4256 as part of the FlyBase Consortium on PMID 22127867), or a "Reply To", in which case the identity of the article and its authors might be conflated.
The NLM has also, in the past, limited the total number of authors indexed. The dataset certainly has errors, but I have taken great care to fix some glaring ones (individuals who claim too much), while still capturing authors who have published under multiple names without explicitly listing them in their ORCID profile. The final dataset provides a "matchscore" that could be used for further clean-up.
Four files:
person.tsv: 7,194,692 rows, including header
1. orcid
2. lastname
3. firstname
4. creditname
5. othernames
6. otherids
7. emails
employment.tsv: 2,884,981 rows, including header
1. orcid
2. putcode
3. role
4. start-date
5. end-date
6. id
7. source
8. dept
9. name
10. city
11. region
12. country
13. affiliation
education.tsv: 3,202,253 rows, including header
1. orcid
2. putcode
3. role
4. start-date
5. end-date
6. id
7. source
8. dept
9. name
10. city
11. region
12. country
13. affiliation
pubmed2orcid.tsv: 13,133,065 rows, including header
1. PMID
2. au_order (author name position on the article)
3. orcid
4. matchscore (see below)
5. source: orcid (2019 ORCID Public Data File https://orcid.org/), pubmed (NLM's distributed XML files), or patci (an earlier version of ORCID with citations processed through the Patci tool)
12,037,375 from orcid; 1,065,892 from PubMed XML; 29,797 from Patci
matchscore:
000: lastname, firstname and middle init match (e.g., Eric T MacKenzie vs
00: lastname, firstname match (e.g., Keith Ward)
0: lastname, firstname reversed match (e.g., Conde Santiago vs Santiago Conde)
1: lastname, first and middle init match (e.g., L. F. Panchenko)
11: lastname and partial firstname match (e.g., Mike Boland vs Michael Boland or Mel Ziman vs Melanie Ziman)
12: lastname and first init match
15: 3-part lastname and firstname match (e.g., David Grahame Hardie vs D Grahame Hardie)
2: lastname match and multipart firstname initial match (e.g., Maria Dolores Suarez Ortega vs M. D. Suarez)
22: partial lastname match and firstname match (e.g., Erika Friedmann vs Erika Friedman)
23: e.g., Antonio Garcia Garcia vs A G Garcia
25: Allan Downie vs J A Downie
26: Oliver Racz vs Oliver Bacz
27: Rita Ostrovskaya vs R U Ostrovskaia
29: Andrew Staehelin vs L A Staehlin
3: M Tronko vs N D Tron'ko
4: Sharon Dent (Also known as Sharon Y.R. Dent; Sharon Y Roth; Sharon Yoder) vs Sharon Yoder
45: Okulov Aleksei vs A B Okulov
48: Maria Del Rosario Garcia De Vicuna Pinedo vs R Garcia-Vicuna
49: Anatoliy Ivashchenko vs A Ivashenko
5: lastname match only (weak match but sometimes captures alternative first name for better subsequent matches); e.g., Bill Hieb vs W F Hieb
6: first name match only (weak match but sometimes captures alternative first name for better subsequent matches); e.g., Maria Borawska vs Maria Koscielak
7: last or first name match on "other names"; e.g., Hromokovska Tetiana (Also known as Gromokovskaia, T. S., Громоковська Тетяна) vs T Gromokovskaia
77: Siva Subramanian vs Kolinjavadi N. Sivasubramanian
88: no name in orcid but match caught by uniqueness of name across paper (at least 90% and 2 more than next most common name)
prefix:
C = ambiguity reduced (possibly eliminated) using city match (e.g., H Yang on PMID 24972200)
I = ambiguity eliminated by excluding investigators (i.e., one author and one or more investigators with that name)
T = ambiguity eliminated using PubMed pos (T for tie-breaker)
W = ambiguity resolved by authority2018
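Because `000`, `00` and `0` are distinct codes, the matchscore column should be read as text (for example with `pandas.read_csv(path, sep="\t", dtype=str)`), never as integers. The hypothetical parser below also assumes that any prefix letters (C, I, T, W) come directly before the numeric code, as in `T11`; the description above does not spell out how prefixes and codes are combined, so check this against the file.

```python
import re

# Assumed layout: zero or more prefix letters followed by the numeric code.
_SCORE = re.compile(r"^([CITW]*)(\d+)$")

PREFIXES = {
    "C": "ambiguity reduced by city match",
    "I": "investigators excluded",
    "T": "PubMed position tie-breaker",
    "W": "resolved by authority2018",
}

WEAK_CODES = {"5", "6"}  # described above as weak (lastname-only / first-name-only)


def parse_matchscore(score: str) -> tuple[list[str], str]:
    """Return (prefix meanings, numeric code kept as a string)."""
    match = _SCORE.match(score.strip())
    if match is None:
        raise ValueError(f"unrecognized matchscore: {score!r}")
    prefixes = [PREFIXES[letter] for letter in match.group(1)]
    return prefixes, match.group(2)
```

A stricter subset could then be selected by dropping rows whose code is in `WEAK_CODES`.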
published:
2024-07-09
Storms, Suzanna; Shisler, Joanna; Nguyen, Thanh H.; Zuckermann, Federico; Lowe, James
(2024)
This dataset includes the RT-PCR results, RT-LAMP results, and the minutes-to-positive ROC curve calculations. It includes data for the synthetic gBlock, cell culture, and clinical sample assays (nasal swabs and nasal wipes). Also included are a list of FDA-approved point-of-care tests for influenza A virus as of 2-16-2024 and the MIQE guidelines.
published:
2024-11-15
BL30K is a synthetic dataset rendered using Blender with ShapeNet data. We split the dataset into six segments, each with approximately 5K videos. The videos are organized in a format similar to DAVIS and YouTubeVOS, so dataloaders for those datasets can be used directly. Each video is 160 frames long, and each frame has a resolution of 768×512. There are 3-5 objects per video, and each object follows a random smooth trajectory; we tried to optimize the trajectories greedily to minimize object intersection (not guaranteed), while occlusions remain possible (as they often are in reality). See [Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (MiVOS), CVPR 2021] for details.
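A sketch of enumerating frames and masks, assuming BL30K follows the DAVIS/YouTubeVOS convention of `JPEGImages/<video>/<frame>.jpg` and `Annotations/<video>/<frame>.png`. That directory layout is an assumption to verify against the downloaded files, and `list_video_frames` is a hypothetical helper, not part of any released dataloader.

```python
from pathlib import Path


def list_video_frames(root) -> dict:
    """Map each video name to a list of (frame_path, mask_path or None).

    Assumed layout: root/JPEGImages/<video>/<frame>.jpg and
    root/Annotations/<video>/<frame>.png with matching file stems.
    """
    root = Path(root)
    videos = {}
    for video_dir in sorted((root / "JPEGImages").iterdir()):
        if not video_dir.is_dir():
            continue
        pairs = []
        for frame in sorted(video_dir.glob("*.jpg")):
            mask = root / "Annotations" / video_dir.name / f"{frame.stem}.png"
            pairs.append((frame, mask if mask.exists() else None))
        videos[video_dir.name] = pairs
    return videos
```

Given the description above, a quick sanity check is that every video yields 160 frame entries.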
published:
2025-10-01
Lyu, Mingkuan; Kong, Linggen; Yang, Zhenglin; Wu, Yuting; McGhee, Claire E.; Lu, Yi
(2025)
DNAzymes have been widely used in many sensing and imaging applications but have rarely been used for genetic engineering since their discovery in 1994, because their substrate scope is mostly limited to single-stranded DNA or RNA, whereas genetic information is stored mostly in double-stranded DNA (dsDNA). To overcome this major limitation, we herein report peptide nucleic acid (PNA)-assisted double-stranded DNA nicking by DNAzymes (PANDA) as the first example to expand DNAzyme activity toward dsDNA. We show that PANDA is programmable in efficiently nicking or causing double-strand breaks on target dsDNA, which mimics protein nucleases and can act as restriction enzymes in molecular cloning. In addition to being much smaller than protein enzymes, PANDA has a higher sequence fidelity compared with CRISPR/Cas under the conditions we tested, demonstrating its potential as a novel alternative tool for genetic engineering and other biochemical applications.
keywords:
Conversion;Genomics;Genome Engineering
published:
2021-07-20
Fu, Yuanxi; Schneider, Jodi
(2021)
This dataset contains data from the extreme-disagreement analysis described in the paper “Aaron M. Cohen, Jodi Schneider, Yuanxi Fu, Marian S. McDonagh, Prerna Das, Arthur W. Holt, Neil R. Smalheiser, 2021, Fifty Ways to Tag your Pubtypes: Multi-Tagger, a Set of Probabilistic Publication Type and Study Design Taggers to Support Biomedical Indexing and Evidence-Based Medicine.” In this analysis, experts on our team carried out an independent formal review and consensus process for extreme disagreements between MEDLINE indexing and model predictive scores. “Extreme disagreements” included two situations: (1) an abstract was MEDLINE-indexed as a publication type but received a low score for this publication type, and (2) an abstract received a high score for a publication type but lacked the corresponding MEDLINE index term. “High predictive score” is defined as the 100 highest-scoring articles, and “low predictive score” as the 100 lowest-scoring articles. Three publication types were analyzed: CASE_CONTROL_STUDY, COHORT_STUDY, and CROSS_SECTIONAL_STUDY. Results were recorded in three Excel workbooks named after the publication types: case_control_study.xlsx, cohort_study.xlsx, and cross_sectional_study.xlsx.
The analysis shows that, when the tagger gave a high predictive score (>0.9) on articles that lacked a corresponding MEDLINE indexing term, independent review suggested that the model assignment was correct in almost all cases: CROSS_SECTIONAL_STUDY (99%), CASE_CONTROL_STUDY (94.9%), and COHORT_STUDY (92.2%). Conversely, when articles received MEDLINE indexing but model predictive scores were very low (<0.1), independent review suggested that the model assignment was correct in the majority of cases: CASE_CONTROL_STUDY (85.4%), COHORT_STUDY (76.3%), and CROSS_SECTIONAL_STUDY (53.6%).
Based on the extreme-disagreement analysis, we identified a number of false positives (FPs) and false negatives (FNs). For case-control studies, there were 5 FPs and 14 FNs. For cohort studies, there were 7 FPs and 22 FNs. For cross-sectional studies, there were 45 FNs and 1 FP. We reviewed and grouped them based on the patterns we noticed, which provide clues for further improving the models. This dataset reports the instances of FPs and FNs along with their categorizations.
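One plausible reading of the selection rule, combined with the score thresholds reported in the results, is sketched below. It is not the paper's exact procedure; `extreme_disagreements` and its record format `(article_id, score, medline_indexed)` are hypothetical.

```python
def extreme_disagreements(records, n=100, high=0.9, low=0.1):
    """Return (high-score articles lacking the MEDLINE term,
    MEDLINE-indexed articles with low scores) for one publication type.

    `records` is an iterable of (article_id, score, medline_indexed) tuples.
    """
    records = list(records)
    # Situation (2): high score but no MEDLINE term; keep the n highest scorers.
    unindexed = sorted(
        (r for r in records if not r[2] and r[1] > high),
        key=lambda r: r[1],
        reverse=True,
    )[:n]
    # Situation (1): MEDLINE-indexed but low score; keep the n lowest scorers.
    indexed = sorted(
        (r for r in records if r[2] and r[1] < low),
        key=lambda r: r[1],
    )[:n]
    return [r[0] for r in unindexed], [r[0] for r in indexed]
```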
keywords:
biomedical informatics; machine learning; evidence based medicine; text mining
published:
2025-06-03
Okyem, Samuel; Trinklein, Timothy; Rubakhin, Stanislav; Sweedler, Jonathan
(2025)
This dataset contains peptide imaging data obtained by matrix-assisted laser desorption/ionization trapped ion mobility spectrometry (MALDI-TIMS) from the central nervous system and selected ganglia of Aplysia californica.
keywords:
Neuropeptides, Isomerization, D-amino acids, MALDI-TIMS
published:
2025-12-08
Maitra, Shraddha; Viswanathan, Mothi Bharath; Park, Kiyoul; Kannan, Baskaran; Cano Alfanar, Sofia; McCoy, Scott M.; Cahoon, Edgar; Altpeter, Fredy; Leakey, Andrew; Singh, Vijay
(2025)
Plant oils are increasingly in demand as renewable feedstocks for biodiesel and biochemicals. Currently, oilseeds are the primary source of plant oils. Although the vegetative tissues of plants express lipid metabolism pathways, they do not hyper-accumulate lipids. Elevated synthesis, storage, and accumulation of lipids in vegetative tissues have been achieved by metabolic engineering of sugarcane to produce “oilcane.” This study evaluates the potential of oilcane as a renewable feedstock for the co-production of lipids and fermentable sugars. Oilcane was grown under favorable climatic and field conditions in Florida (FLOC) as well as during an abbreviated growing season, outside its typical growing region, in Illinois (ILOC). The potential lipid yield of 0.35 tons/ha was projected from the hyperaccumulation of fatty acids in the stored vegetative biomass of FLOC, which is approaching the lipid yield of soybean (0.44 tons/ha). Processing of the vegetative tissues of oilcane recovered 0.20 tons/ha, which represents the recovery of 55% of the total lipids from FLOC. Chemical-free hydrothermal bioprocessing of ILOC and FLOC bagasse and leaves at 180 °C for 10 min prevented the degeneration of in situ plant lipids. This allowed the recovery of lipids at the end of the bioprocess with a major fraction of lipids remaining in the biomass residues after pretreatment and saccharification. Improvements through refined biomass processing, crop management, and metabolic engineering are expected to boost lipid yields and make oilcane a prime feedstock for the production of biodiesel.
keywords:
Conversion;Feedstock Production;Feedstock Bioprocessing;Lipidomics;Metabolomics
published:
2021-05-10
This dataset contains data used in the publication "Institutional Data Repository Development, a Moving Target" submitted to Code4Lib Journal. It is a tabular data file describing attributes of data files in datasets published in the Illinois Data Bank from 2016-04-01 to 2021-04-01.
keywords:
institutional repository
published:
2019-10-03
Choi, Sang Hyun; Rao, Vikyath D.; Gernat, Tim; Hamilton, Adam R.; Robinson, Gene E.; Goldenfeld, Nigel
(2019)
Dataset for F2F events of honeybees. F2F events are defined as face-to-face encounters of two honeybees that are close in distance and facing each other but not connected by the proboscis, thus not engaging in trophallaxis.
The first and second columns contain the unique IDs of the honeybees participating in each F2F event. The third column gives the time at which the F2F event started, and the fourth column the time at which it ended. All times are Unix epoch timestamps in milliseconds.
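A minimal sketch for reading the F2F table, converting the millisecond Unix timestamps to UTC datetimes and computing each encounter's duration. The comma delimiter, the absence of a header row, and the function name `parse_f2f` are assumptions; integer milliseconds are passed to `timedelta` so no precision is lost to float division.

```python
import csv
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_f2f(lines, delimiter=","):
    """Yield (bee_a, bee_b, start, end, duration_seconds) for each F2F row.

    Assumes no header row; pass a different delimiter if the file uses one.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        bee_a, bee_b, start_ms, end_ms = (field.strip() for field in row)
        start = EPOCH + timedelta(milliseconds=int(start_ms))
        end = EPOCH + timedelta(milliseconds=int(end_ms))
        yield bee_a, bee_b, start, end, (end - start).total_seconds()
```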
keywords:
honeybee;face-to-face interaction
published:
2025-10-17
Deewan, Anshu; Liu, Jing-Jing; Jagtap, Sujit Sadashiv; Yun, Eun Ju; Walukiewicz, Hanna E.; Jin, Yong-Su; Rao, Christopher V.
(2025)
Oleaginous yeasts have received significant attention due to their substantial lipid storage capability. The accumulated lipids can be utilized directly or processed into various bioproducts and biofuels. Lipomyces starkeyi is an oleaginous yeast capable of using multiple plant-based sugars, such as glucose, xylose, and cellobiose. It is, however, a relatively unexplored yeast due to limited knowledge about its physiology. In this study, we have evaluated the growth of L. starkeyi on different sugars and performed transcriptomic and metabolomic analyses to understand the underlying mechanisms of sugar metabolism. Principal component analysis showed clear differences resulting from growth on different sugars. We have further reported various metabolic pathways activated during growth on these sugars. We also observed non-specific regulation in L. starkeyi and have updated the gene annotations for the NRRL Y-11557 strain. This analysis provides a foundation for understanding the metabolism of these plant-based sugars and potentially valuable information to guide the metabolic engineering of L. starkeyi to produce bioproducts and biofuels.
keywords:
Conversion;Metabolomics;Transcriptomics
published:
2019-08-29
Nardulli, Peter; Peyton, Buddy; Bajjalieh, Joseph; Singh, Ajay; Martin, Michael; Shalmon, Dan; Althaus, Scott
(2019)
This is part of the Cline Center’s ongoing Social, Political and Economic Event Database (SPEED) project. Each observation represents an event involving civil unrest, repression, or political violence in Sierra Leone, Liberia, and the Philippines (1979-2009). These data were produced in an effort to describe the relationship between exploitation of natural resources and civil conflict, and to identify policy interventions that might address resource-related grievances and mitigate civil strife.
This work is the result of a collaboration between the US Army Corps of Engineers’ Construction Engineer Research Laboratory (ERDC-CERL), the Swedish Defence Research Agency (FOI) and the Cline Center for Advanced Social Research (CCASR). The project team selected case studies focused on nations with a long history of civil conflict, as well as lucrative natural resources.
The Cline Center extracted these events from country-specific articles published in English by the British Broadcasting Corporation (BBC) Summary of World Broadcasts (SWB) from 1979-2008 and the CIA’s Foreign Broadcast Information Service (FBIS) 1999-2004. Articles were selected if they mentioned a country of interest, and were tagged as relevant by a Cline Center-built machine learning-based classification algorithm. Trained analysts extracted nearly 10,000 events from nearly 5,000 documents. The codebook—available in PDF form below—describes the data and production process in greater detail.
keywords:
Cline Center for Advanced Social Research; civil unrest; Social Political Economic Event Dataset (SPEED); political; event data; war; conflict; protest; violence; social; SPEED; Cline Center; Political Science
published:
2025-09-18
Saifuddin, Mustafa; Bhatnagar, Jennifer; Segrè, Daniel; Finzi, Adrien C.
(2025)
Respiration by soil bacteria and fungi is one of the largest fluxes of carbon (C) from the land surface. Although this flux is a direct product of microbial metabolism, controls over metabolism and their responses to global change are a major uncertainty in the global C cycle. Here, we explore an in silico approach to predict bacterial C-use efficiency (CUE) for over 200 species using genome-specific constraint-based metabolic modeling. We find that potential CUE averages 0.62 ± 0.17 with a range of 0.22 to 0.98 across taxa, with phylogenetic structuring at the subphylum level. Potential CUE is negatively correlated with genome size, while taxa with larger genomes are able to access a wider variety of C substrates. Incorporating the range of CUE values reported here into a next-generation model of soil biogeochemistry suggests that these differences in physiology across microbial taxa can feed back on soil-C cycling.
keywords:
Sustainability;Metabolomics;Modeling
published:
2024-10-11
Zinnen, Jack; Barak, Rebecca; Matthews, Jeffrey
(2024)
This is the core data for Influence of ecological characteristics and phylogeny on native plant species’ commercial availability, a manuscript pending publication in Ecological Applications. The data regard ecological characteristics, phenology, and phylogeny of plant species native to the Midwestern United States and how those factors relate to commercial availability.
keywords:
biodiversity; native plant nursery; plant trade; plant vendors; restoration
published:
2023-09-19
Salami, Malik Oyewale; Lee, Jou; Schneider, Jodi
(2023)
We used the following keywords files to identify categories for journals and conferences not in Scopus, for our STI 2023 paper "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science".
The first four text files each contain keywords/content words in the form 'keyword1', 'keyword2', 'keyword3', .... The file name indicates the name of the category:
file1: healthscience_words.txt
file2: lifescience_words.txt
file3: physicalscience_words.txt
file4: socialscience_words.txt
The first four files were generated from a combination of software and manual review in an iterative process in which we:
- Manually reviewed venue titles that we were not able to categorize automatically using the Scopus categorization or by extending it as a resource.
- Iteratively reviewed uncategorized venue titles to manually curate additional keywords as content words indicating that a venue title could be classified in the category healthscience, lifescience, physicalscience, or socialscience. We used English content words and added words we could automatically translate to identify content words. NOTE: Terms with multiple potential meanings, or containing non-English words that did not yield useful automatic translations (e.g., Al-Masāq), were not selected as content words.
The fifth text file is a list of stopwords in the form 'stopword1', 'stopword2', 'stopword3', ...
file5: stopwords.txt
This file contains manually curated stopwords from venue titles, used to handle non-content words such as 'conference' and 'journal'.
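The quoted, comma-separated file format can be read with Python's `ast.literal_eval`, since `'a', 'b', 'c'` is a valid tuple literal. The matcher below is a hypothetical simplification for illustration only: the authors combined software with iterative manual review, whereas this sketch just flags a category when one of its single-word keywords appears among a title's non-stopword tokens, or one of its multi-word keywords appears as a substring of the title.

```python
import ast
import re


def load_word_list(text: str) -> set[str]:
    """Parse a file body of the form 'word1', 'word2', ... into a lowercase set."""
    parsed = ast.literal_eval(text.strip())
    # One quoted word parses to a str; several parse to a tuple of str.
    words = (parsed,) if isinstance(parsed, str) else parsed
    return {w.lower() for w in words}


def categorize(title: str, keywords_by_category: dict, stopwords: set) -> list:
    """Return the sorted categories whose keywords match the venue title."""
    lowered = title.lower()
    tokens = set(re.findall(r"\w+", lowered)) - stopwords  # \w is Unicode-aware
    hits = []
    for category, words in sorted(keywords_by_category.items()):
        # Single words must match a content token; phrases match as substrings.
        if any((w in tokens) if " " not in w else (w in lowered) for w in words):
            hits.append(category)
    return hits
```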
This dataset is a revision of the following dataset:
Version 1: Lee, Jou; Schneider, Jodi: Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign Data Bank.
Changes from Version 1 to Version 2:
- Added one author
- Added a stopwords file that was used in our data preprocessing.
- Thoroughly reviewed each of the 4 keywords lists. In particular, we added UTF-8 terminology, removed some non-content words and misclassified content words, and extensively reviewed non-English keywords.
keywords:
health science keywords; scientometrics; stopwords; field; keywords; life science keywords; physical science keywords; science of science; social science keywords; meta-science; RISRS
published:
2022-07-25
A set of chemical entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; chemical mentions