Illinois Data Bank Dataset Search Results
Results
published:
2022-12-05
Ng, Yee Man Margaret; Taneja, Harsh
(2022)
These are similarity matrices of countries based on different modalities of web use: Alexa website traffic, trending videos on YouTube, and Twitter trends. Each matrix is a month of data aggregated
keywords:
Global Internet Use
published:
2017-03-02
This data was collected between 2004 and 2010 at White River National Wildlife Refuge (WRNWR) and Saint Francis National Forest (SF). It was collected as part of two master’s and one PhD project at Arkansas State University USA studying Swainson’s Warbler habitat use, survival, and body condition.
keywords:
Swainson’s Warbler; Limnothlypis swainsonii; flooding; natural disturbance; apparent survival; body condition
published:
2025-10-14
Jagtap, Sujit Sadashiv; Deewan, Anshu; Liu, Jing-Jing; Walukiewicz, Hanna E.; Yun, Eun Ju; Jin, Yong-Su; Rao, Christopher V.
(2025)
Rhodosporidium toruloides is an oleaginous yeast capable of producing a variety of biofuels and bioproducts from diverse carbon sources. Despite numerous studies showing its promise as a platform microorganism, little is known about its metabolism and physiology. In this work, we investigated the central carbon metabolism in R. toruloides IFO0880 using transcriptomics and metabolomics during growth on glucose, xylose, acetate, or soybean oil. These substrates were chosen because they can be derived from plants. Significant changes in gene expression and metabolite concentrations were observed during growth on these four substrates. We mapped these changes onto the governing metabolic pathways to better understand how R. toruloides reprograms its metabolism to enable growth on these substrates. One notable finding concerns xylose metabolism, where poor expression of xylulokinase induces a bypass leading to arabitol production. Collectively, these results further our understanding of central carbon metabolism in R. toruloides during growth on different substrates. They may also help guide the metabolic engineering and development of better models of metabolism for R. toruloides.
keywords:
Conversion;Metabolomics;Transcriptomics
published:
2020-12-02
Yang, Pan; Cai, Ximing; Khanna, Madhu
(2020)
The dataset includes the survey results about farmers’ perceptions of marginal land availability and the likelihood of a land pixel being marginal based on a machine learning model trained from the survey.
Two spreadsheet files are the farmer and farm characteristics (marginal_land_survey_data_shared.xlsx), and the existing land use of marginal lands (land_use_info_sharing.xlsx).
<b>Note:</b> the blank cells in these two spreadsheets mean missing values in the survey response.
The GeoTiff file includes two bands: one is the marginal land likelihood in the Midwestern states (0-1), and the other is the dominant reason for land marginality (0-5; 0 for farm size, 1 for growing season precipitation, 2 for root zone soil water capacity, 3 for average slope, 4 for growing season mean temperature, and 5 for growing season diurnal range of temperature). To read the data, please use GIS software such as ArcGIS or QGIS.
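The reason codes in the second band can be turned into readable labels with a small lookup table. The following is a minimal Python sketch that assumes the two band values for a pixel have already been read with a GIS tool; it does not open the GeoTiff itself, and the names REASON_LABELS and describe_pixel are hypothetical.

```python
# Labels copied from the dataset description of band 2 (codes 0-5).
REASON_LABELS = {
    0: "farm size",
    1: "growing season precipitation",
    2: "root zone soil water capacity",
    3: "average slope",
    4: "growing season mean temperature",
    5: "growing season diurnal range of temperature",
}


def describe_pixel(likelihood, reason_code):
    """Return a readable summary for one pixel's two band values."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError(f"likelihood outside 0-1: {likelihood}")
    reason = REASON_LABELS[int(reason_code)]
    return f"marginal likelihood {likelihood:.2f}; dominant reason: {reason}"


# Hypothetical pixel values, for illustration only.
print(describe_pixel(0.83, 3))
```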
keywords:
marginal land; survey
published:
2021-06-14
Kelkar, Varun A.; Anastasio, Mark A.
(2021)
This repository contains the weights for two StyleGAN2 networks trained on two composite T1 and T2 weighted open-source brain MR image datasets, and one StyleGAN2 network trained on the Flickr Face HQ image dataset. Example images sampled from the respective StyleGANs are also included.
The datasets themselves are not included in this repository. The weights are stored as `.pkl` files. The code and instructions to load and use the weights can be found at https://github.com/comp-imaging-sci/pic-recon . Additional details and citations can be found in the file "README.md".
keywords:
StyleGAN2; Generative adversarial network (GAN); MRI; Medical imaging
published:
2025-11-03
von Haden, Adam C.; Eddy, William; Burnham, Mark B.; Brzostek, Edward; Yang, Wendy; DeLucia, Evan H.
(2025)
Root exudation is a key process for plant nutrient acquisition, but the controls on root exudation and its relationship to soil C and N processes in agroecosystems are unclear. We hypothesized that root exudation rates would be related to root morphological traits, N fertilization, and soil moisture. We also anticipated that root exudation would be correlated with bulk soil enzyme activity. Root exudation, root traits, and bulk soil extracellular enzyme activity were assessed in maize (Zea mays L.), soybean (Glycine max (L.) Merr.), biomass sorghum (Sorghum bicolor (L.) Moench), giant miscanthus (Miscanthus × giganteus), and switchgrass (Panicum virgatum L.). Measurements were taken in situ during two growing seasons with contrasting precipitation regimes, and N fertilization rate was varied in sorghum during one year. Specific root exudation (per unit root surface area) was negatively related to root diameter and was generally higher in annuals than perennials. Sorghum N fertilization did not affect root exudation rates, and soil moisture regime had no effect on annual root exudation rates within maize, sorghum, and miscanthus. Specific root exudation was negatively related to bulk soil C- and N-degrading soil enzyme activities. Intrinsic plant characteristics appeared more important than environmental variables in controlling in situ root exudation rates. The relationships between root diameter, root exudation, and soil C and N processes link root morphological traits to soil functions and demonstrate the potential tradeoffs among plant nutrient acquisition strategies in agroecosystems.
keywords:
Sustainability;Biomass Analytics;Field Data
published:
2021-10-27
de Jesús Astacio, Luis Miguel; Prabhakara, Kaumudi Hassan; Li, Zeqian; Mickalide, Harry; Kuehn, Seppe
(2021)
The shared dataset consists of 16S sequencing data of microbial communities. Each community is composed of heterotrophic bacteria derived from one of two soil samples and the model alga Chlamydomonas reinhardtii. Each community was placed in a materially closed environment with an initial supply of carbon in the media and subjected to light-dark cycles. The closed microbial ecosystems (CES) survived via carbon cycling. Each CES was subjected to rounds of dilution, after which the community was sequenced (data provided here). The shared dataset allowed us to conclude that CES consistently self-assembled to cycle carbon (data not provided) via conserved metabolic capabilities (data not provided) despite differences in taxonomic composition (data provided).
---------------------------
Naming convention:
[soil sample = A or B][CES replicate = 1,2,3, or 4]_[round number = 1,2,3,or 4]_[reverse read = R or forward read = F]_filt.fastq
Example -- A1_r1_F_filt.fastq means soil sample A, CES replicate 1, end of round1, forward read
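The naming convention above can be unpacked programmatically. The following is a minimal Python sketch, assuming every file name follows the pattern of the example (including the "r" before the round number, which appears in the example but not in the bracketed template). The function name parse_ces_filename and the returned field names are illustrative and not part of the dataset.

```python
import re

# Pattern inferred from the example "A1_r1_F_filt.fastq".
_CES_NAME = re.compile(r"^([AB])([1-4])_r([1-4])_([FR])_filt\.fastq$")


def parse_ces_filename(name):
    """Split a CES file name into its four documented components."""
    match = _CES_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    soil, replicate, round_number, read = match.groups()
    return {
        "soil_sample": soil,
        "ces_replicate": int(replicate),
        "round": int(round_number),
        "read": "forward" if read == "F" else "reverse",
    }


print(parse_ces_filename("A1_r1_F_filt.fastq"))
```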
keywords:
16S seq; .fastq; closed microbial ecosystems; carbon cycling
published:
2025-02-20
Zhou, Xiaoran; Zheng, Heng
(2025)
To gather news articles from the web that discuss the Cochrane Review (DOI: 10.1002/14651858.CD006207.pub6), we retrieved articles on August 1, 2023 using Altmetric.com's Altmetric Explorer. We selected all articles that were written in English, published in the United States, and had a publication date <b>on or after March 10, 2023</b> (according to the "Mention Date" from Altmetric.com). This date is significant because it is when Cochrane issued a statement (https://www.cochrane.org/news/statement-physical-interventions-interrupt-or-reduce-spread-respiratory-viruses-review) about the "misleading interpretation" of the Cochrane Review made by news articles.
A previously published dataset for "Arguing about Controversial Science in the News: Does Epistemic Uncertainty Contribute to Information Disorder?" (DOI: 10.13012/B2IDB-4781172_V1) contains annotation of the news articles published before March 10, 2023. Our dataset annotates the news published on or after March 10, 2023.
The Altmetric_data.csv file describes the selected news articles with both data exported from Altmetric Explorer and data we manually added.
Data exported from Altmetric Explorer:
- Publication date of the news article
- Title of the news article
- Source/publication venue of the news article
- URL
- Country
Data we manually added:
- Whether the article is accessible
- The date we checked the article
- The corresponding ID of the article in MAXQDA
For each article from Altmetric.com, we first tried to use the Web Collector for MAXQDA to download the article from the website and imported it into MAXQDA (version 22.8.0).
We manually extracted direct quotations from the articles using MAXQDA.
We included surrounding words and sentences around direct quotations for context where needed.
We manually added codes and code categories in MAXQDA to identify the individuals (chief editors of the Cochrane Review, government agency representatives, journalists, and other experts such as physicians) or organizations (government agencies, other organizations, and research publications) who were quoted.
The MAXQDA_data.csv file contains excerpts from the news articles that contain the direct quotations we annotated.
For each excerpt, we included the following information:
- MAXQDA ID of the document from which the excerpt originates
- The collection date and source of the document
- The code we assigned to the excerpt
- The code category
- The excerpt itself
keywords:
altmetrics; MAXQDA; masks for COVID-19; scientific controversies; news articles
published:
2022-02-07
Karakoc, Deniz Berfin; Wang, Junren; Konar, Megan
(2022)
This dataset provides estimates of agricultural and food commodity flows [kg] between all county pairs within the United States for the years 2007, 2012, and 2017. The database provides 206.3 million data points, since pairwise information is provided between 3134 counties, for 7 commodity categories, and 3 time periods. The commodity categories correspond to the Standardized Classification of Transported Goods and are:
- SCTG 1: live animals and fish
- SCTG 2: cereal grains
- SCTG 3: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 4: animal feed, eggs, honey, and other products of animal origin
- SCTG 5: meat, poultry, fish, seafood, and their preparations
- SCTG 6: milled grain products and preparations, and bakery products
- SCTG 7: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Karakoc et al. (2022) in Environmental Research Letters.
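The stated total of 206.3 million data points can be checked from the counts given in the description. This short Python sketch assumes that "all county pairs" means every ordered origin-destination pair, including a county paired with itself; that reading is an inference, chosen because it reproduces the stated figure.

```python
counties = 3134
commodity_categories = 7
time_periods = 3  # 2007, 2012, and 2017

# Assumption: ordered pairs, self-pairs included (3134 x 3134).
county_pairs = counties * counties
data_points = county_pairs * commodity_categories * time_periods

print(f"{data_points:,}")  # 206,261,076, i.e. about 206.3 million
```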
keywords:
food flows; high-resolution; county-scale; time-series; United States
published:
2022-07-11
Jeng, Amos; Bosch, Nigel; Perry, Michelle
(2022)
This dataset was developed as part of an online survey study that explores student characteristics that may predict what one finds helpful in replies to requests for help posted to an online college course discussion forum. 223 college students enrolled in an introductory statistics course were surveyed on their sense of belonging to their course community, as well as how helpful they found 20 examples of replies to requests for help posted to a statistics course discussion forum.
keywords:
help-giving; discussion forums; sense of belonging; college student
published:
2021-12-28
Xia, Yushu; Wander, Michelle
(2021)
*Updates for this V3: added a few more records and rearranged the sequence of the tables in order to support our new paper "Evaluation of Indirect and Direct Scoring Methods to Relate Biochemical Soil Quality Indicators to Ecosystem Services" accepted by the Soil Science Society of America Journal.
We summarize peer-reviewed literature reporting associations between three soil quality indicators (SQIs) (β-glucosidase (BG), fluorescein diacetate (FDA) hydrolysis, and permanganate oxidizable carbon (POXC)) and crop yield and greenhouse gas emissions. Peer-reviewed articles published between January 1990 and May 2018 were searched using the Thomson Reuters Web of Science database (Thomson Reuters, Philadelphia, Pennsylvania) and Google Scholar to identify studies reporting results for: “β-glucosidase”, “permanganate oxidizable carbon”, “active carbon”, “readily oxidizable carbon”, or “fluorescein diacetate hydrolysis”, together with one or more of the following: “crop yield”, “productivity”, “greenhouse gas”, “CO2”, “CH4”, or “N2O”.
Meta-data for records include the following descriptor variables and covariates useful for scoring function development: 1) identifying factors for the study site (location, duration of the experiment), 2) soil textural class, pH, and SOC, 3) depth of soil sampling, 4) units used in published works (i.e.: equivalent mass, concentration), 5) SQI abundances and measured ecosystem functions, and 6) summary statistics for correlation between SQIs and functions (yield and greenhouse gas emissions).
*Note: Blank values in tables are considered unreported data.
keywords:
Soil health promoting practices; Soil quality indicators; β-glucosidase; fluorescein diacetate hydrolysis; Permanganate oxidizable carbon; Greenhouse gas emissions; Scoring curves; Soil Management Assessment Framework
published:
2024-04-10
Konar, Megan; Ruess, Paul J.; Wanders, Niko; Bierkens, Marc F.P.
(2024)
This dataset provides estimates of total Irrigation Water Use (IWU) by crop, county, water source, and year for the Continental United States. Total irrigation from Surface Water Withdrawals (SWW), total Groundwater Withdrawals (GWW), and nonrenewable Groundwater Depletion (GWD) is provided for 20 crops and crop groups from 2008 to 2020 at the county spatial resolution.
In total, there are nearly 2.5 million data points in this dataset (3,142 counties; 13 years; 3 water sources; and 20 crops). This dataset supports the paper by Ruess et al. (2024), "Total irrigation by crop in the Continental United States from 2008 to 2020", Scientific Data, doi: 10.1038/s41597-024-03244-w
When using, please cite as:
Ruess, P.J., Konar, M., Wanders, N., and Bierkens, M.F.P. (2024) Total irrigation by crop in the Continental United States from 2008 to 2020, Scientific Data, doi: 10.1038/s41597-024-03244-w
keywords:
water use; irrigation; surface water; groundwater; groundwater depletion; counties; crops; time series
published:
2022-04-21
This dataset was created based on the publicly available microdata from PNS-2019, a national health survey conducted by the Instituto Brasileiro de Geografia e Estatistica (IBGE, Brazilian Institute of Geography and Statistics). IBGE is a federal agency responsible for the official collection of statistical information in Brazil – essentially, the Brazilian census bureau. Data on selected variables focusing on biopsychosocial domains related to pain prevalence, limitations and treatment are available. The Fundação Instituto Oswaldo Cruz has detailed information about the PNS, including questionnaires, survey design, and datasets (www.pns.fiocruz.br). The microdata can be found on the IBGE website (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html?caminho=PNS/2019/Microdados/Dados).
keywords:
back pain; health status disparities; biopsychosocial; Brazil
published:
2025-09-12
Dong, Hongxu; Clark, Lindsay; Lipka, Alexander; Brummer, Joe E.; Głowacka, Katarzyna; Hall, Megan C.; Heo, Kweon; Jin, Xiaoli; Peng, Junhua; Yamada, Toshihiko; Ghimire, Bimal Kumar; Yoo, Ji Hye; Yu, Chang Yeon; Zhao, Hua; Long, Stephen; Sacks, Erik
(2025)
Overwintering ability is an important selection criterion for Miscanthus breeding in temperate regions. Insufficient overwintering ability of the currently leading Miscanthus biomass cultivar, M. ×giganteus (M×g) ‘1993–1780’, in regions where average annual minimum temperatures are −26.1°C (USDA hardiness zone 5) or lower poses a pressing need to develop new cultivars with superior cold tolerance. To facilitate breeding of Miscanthus, this study characterized phenotypic and genetic variation of overwintering ability in an M. sinensis germplasm panel consisting of 564 accessions, evaluated in field trials at three locations in North America and two in Asia. Genome‐wide association (GWA) and genomic prediction analyses were performed. The Korea/N China M. sinensis genetic group is a valuable gene pool for cold tolerance. The Yangtze‐Qinling, Southern Japan, and Northern Japan genetic groups were also potential sources of cold tolerance. A total of 73 marker–trait associations were detected for overwintering ability. Estimated breeding value for overwintering ability based on these 73 markers could explain 55% of the variation for first winter overwintering ability among M. sinensis. Average genomic prediction ability for overwintering ability across 50 fivefold cross‐validations was high (~0.73) after accounting for population structure. Common genomic regions for overwintering ability were detected by GWA analyses and a previous parallel QTL mapping study using three interconnected biparental F1 populations. One QTL on Miscanthus LG 8 encompassed five GWA hits and a known cold‐responsive gene, COR47. The other overwintering ability QTL on Miscanthus LG 11 contained two GWA hits and three known cold stress‐related genes, carboxylesterase 13 (CEX13), WRKY2 transcription factor, and cold shock domain (CSDP1).
Miscanthus accessions collected from high latitude locations with cold winters had higher rates of overwintering, and more alleles for overwintering, than accessions collected from southern locations with mild winters.
keywords:
Feedstock Production;Biomass Analytics;Genomics
published:
2020-06-12
Fu, Yuanxi; Hsiao, Tzu-Kun
(2020)
This is a network of 14 systematic reviews on the salt controversy and their included studies. Each edge in the network represents an inclusion from one systematic review to an article. Systematic reviews were collected from Trinquart (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ).
<b>FILE FORMATS</b>
1) Article_list.csv - Unicode CSV
2) Article_attr.csv - Unicode CSV
3) inclusion_net_edges.csv - Unicode CSV
4) potential_inclusion_link.csv - Unicode CSV
5) systematic_review_inclusion_criteria.csv - Unicode CSV
6) Supplementary Reference List.pdf - PDF
<b>ROW EXPLANATIONS</b>
1) Article_list.csv - Each row describes a systematic review or included article.
2) Article_attr.csv - Each row is the attributes of a systematic review/included article.
3) inclusion_net_edges.csv - Each row represents an inclusion from a systematic review to an article.
4) potential_inclusion_link.csv - Each row shows the available evidence base of a systematic review.
5) systematic_review_inclusion_criteria.csv - Each row is the inclusion criteria of a systematic review.
6) Supplementary Reference List.pdf - Each item is a bibliographic record of a systematic review/included paper.
<b>COLUMN HEADER EXPLANATIONS</b>
<b>1) Article_list.csv:</b>
ID - Numeric ID of a paper
paper assigned ID - ID of the paper from Trinquart et al. (2016)
Type - Systematic review / primary study report
Study Groupings - Groupings for related primary study reports from the same report, from Trinquart et al. (2016) (if applicable, otherwise blank)
Title - Title of the paper
year - Publication year of the paper
Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016)
Doi - DOIs of the paper. (if applicable, otherwise blank)
Retracted (Y/N) - Whether the paper was retracted or withdrawn (Y). Blank if not retracted or withdrawn.
<b>2) Article_attr.csv:</b>
ID - Numeric ID of a paper
year - Publication year
Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016)
Type - Systematic review/ primary study report
<b>3) inclusion_net_edges.csv:</b>
citing_ID - The numeric ID of a systematic review
cited_ID - The numeric ID of the included articles
<b>4) potential_inclusion_link.csv:</b>
This data was translated from the Sankey diagram given in Trinquart et al. (2016) as Web Figure 4. Each row indicates a systematic review and each column indicates a primary study. In the matrix, "p" indicates that a given primary study had been published as of the search date of a given systematic review.
<b>5) systematic_review_inclusion_criteria.csv:</b>
ID - The numeric IDs of systematic reviews
paper assigned ID - ID of the paper from Trinquart et al. (2016)
attitude - Its scientific opinion about the salt controversy from Trinquart et al. (2016)
No. of studies included - Number of articles included in the systematic review
Study design - Study designs to include, per inclusion criteria
population - Populations to include, per inclusion criteria
Exposure/Intervention - Exposures/Interventions to include, per inclusion criteria
outcome - Study outcomes required for inclusion, per inclusion criteria
Language restriction - Report languages to include, per inclusion criteria
follow-up period - Follow-up period required for inclusion, per inclusion criteria
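The two-column layout of inclusion_net_edges.csv (citing_ID, cited_ID) lends itself to a simple adjacency structure. The following is a minimal Python sketch using a few made-up rows in place of the real file. The sample IDs and the helper names load_inclusions and shared_articles are hypothetical, and the IDs are assumed to be integers because the description calls them numeric.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the documented layout of inclusion_net_edges.csv.
SAMPLE_EDGES = """citing_ID,cited_ID
1,15
1,16
2,16
"""


def load_inclusions(handle):
    """Map each systematic review ID to the set of article IDs it includes."""
    included = defaultdict(set)
    for row in csv.DictReader(handle):
        included[int(row["citing_ID"])].add(int(row["cited_ID"]))
    return dict(included)


def shared_articles(included, review_a, review_b):
    """Return the articles included by both systematic reviews."""
    return included.get(review_a, set()) & included.get(review_b, set())


inclusions = load_inclusions(io.StringIO(SAMPLE_EDGES))
print(inclusions)
print(shared_articles(inclusions, 1, 2))
```

To run this against the real data, pass an open file handle for inclusion_net_edges.csv instead of the in-memory sample.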
keywords:
systematic reviews; evidence synthesis; network visualization; tertiary studies
published:
2023-07-27
Feng, Ling; Takiya, Daniela; Krishnankutty, Sindhu; Dietrich, Christopher; Zhang, Yalin
(2023)
The text file contains the original aligned DNA nucleotide sequence data used in the phylogenetic analyses of Feng et al. (in review), comprising the 3 protein-coding genes (histone H3, cytochrome oxidase I and II) and 2 ribosomal genes (28S D8 and 16S). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 257 taxa (species) and 2995 characters (nucleotide positions), indicate that the characters are DNA sequence, that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The remainder of the file contains the aligned nucleotide sequence data for the five genes. Data partitions, representing the individual genes and different codon positions of the protein-coding genes, are indicated by the lines beginning "charset" near the end of the file. Two supplementary tables in the provided PDF file provide additional information on the species in the dataset, including the GenBank accession numbers for the sequence data (Table S1) and the DNA substitution models used for each of the data partitions used for analyses in the phylogenetic analysis program IQ-Tree (version 1.6.8) (Table S3), as described in the Methods section of the paper. The supplemental tables will also be linked to the article upon publication at the journal website.
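The header fields described above (taxon count, character count, data type, gap symbol, and missing-data symbol) can be read with a few regular expressions. The following is a minimal Python sketch. The SAMPLE_HEADER text is an assumed, typical NEXUS header built from the values in the description, not a copy of the actual file, and read_nexus_header is a hypothetical helper name.

```python
import re

# Assumed header layout; the real file's exact lines may differ.
SAMPLE_HEADER = """#NEXUS
begin data;
dimensions ntax=257 nchar=2995;
format datatype=dna gap=- missing=?;
matrix
"""


def read_nexus_header(text):
    """Extract the dimensions and format settings from NEXUS header text."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file")

    def find(pattern):
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"header field not found: {pattern}")
        return match.group(1)

    return {
        "ntax": int(find(r"ntax\s*=\s*(\d+)")),
        "nchar": int(find(r"nchar\s*=\s*(\d+)")),
        "datatype": find(r"datatype\s*=\s*(\w+)").lower(),
        "gap": find(r"gap\s*=\s*(\S)"),
        "missing": find(r"missing\s*=\s*(\S)"),
    }


header = read_nexus_header(SAMPLE_HEADER)
print(header)
```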
keywords:
Insect; leafhopper; dispersal; vicariance; evolution
published:
2025-10-01
Wang, Yajie; Huang, Xiaoqiang; Hui, Jingshu; Vo, Lam Tung; Zhao, Huimin
(2025)
There is a growing interest in developing cooperative chemoenzymatic reactions to harness the reactivity of chemical catalysts and the selectivity of enzymes for the synthesis of nonracemic chiral compounds. However, existing chemoenzymatic systems with more than one chemical reaction and one enzymatic reaction working cooperatively are rare. Moreover, the application of oxidoreductases in cooperative chemoenzymatic reactions is limited by the necessity of using expensive and unstable redox equivalents such as nicotinamide cofactors. Here, we report a light-driven cooperative chemoenzymatic system comprised of a photoinduced electron transfer reaction (PET) and a photosensitized energy transfer reaction (PEnT) with an enzymatic reduction in one-pot to synthesize chiral building blocks of bioactive compounds. As a proof of concept, ene-reductase was directly regenerated by PET in the absence of external cofactors. Meanwhile, enzymatic reduction worked cooperatively with photocatalyst-catalyzed energy transfer that continuously replenished the reactive isomer from the less reactive one. The whole system stereoconvergently reduced E/Z mixtures of alkenes to the enantiopure products. Additionally, enantioselective enzymatic reduction worked competitively with photocatalyst-catalyzed racemic background reaction and side reactions to channel the overall electron flow to the single enantiopure product. Such a light-driven cooperative chemoenzymatic system holds great potential for asymmetric synthesis using inexpensive petroleum or biomass-derived alkenes.
keywords:
Conversion;Catalysis
published:
2021-05-07
The dataset is based on a snapshot of PubMed taken in December 2018 (NLM's baseline 2018 plus updates throughout 2018), and for ORCIDs, primarily, the 2019 ORCID Public Data File https://orcid.org/.
Matching an ORCID to an individual author name on a PMID is a non-trivial process. Anyone can create an ORCID and claim to have contributed to any published work. Many records claim too many articles and most claim too few. Even though ORCID records are (most?) often populated by author name searches in popular bibliographic databases, there is no confirmation that the person's name is listed on the article. This dataset is the product of mapping ORCIDs to individual author names on PMIDs, even when the ORCID name does not match any author name on the PMID, and when there are multiple (good) candidate author names. The algorithm avoids assigning the ORCID to an article when there are no good candidates and when there are multiple equally good matches. For some ORCIDs that clearly claim too much, it triggers a very strict matching procedure (for ORCIDs that claim too much but the majority appear correct, e.g., 0000-0002-2788-5457), and sometimes deletes ORCIDs altogether when all (or nearly all) of its claimed PMIDs appear incorrect. When an individual clearly has multiple ORCIDs it deletes the least complete of them (e.g., 0000-0002-1651-2428 vs 0000-0001-6258-4628). It should be noted that the ORCIDs that claim too much are not necessarily due to nefarious or trolling intentions, even though a few appear so. Certainly many are due to laziness, such as claiming everything with a particular last name. Some cases appear to be due to test engineers (e.g., 0000-0001-7243-8157; 0000-0002-1595-6203), or librarians assisting faculty (e.g., 0000-0003-3289-5681), or group/laboratory IDs (0000-0003-4234-1746), or having contributed to an article in capacities other than authorship such as an Investigator, an Editor, or part of a Collective (e.g., 0000-0003-2125-4256 as part of the FlyBase Consortium on PMID 22127867), or as a "Reply To" in which case the identity of the article and authors might be conflated.
The NLM has, in the past, limited the total number of authors indexed too. The dataset certainly has errors but I have taken great care to fix some glaring ones (individuals who claim too much), while still capturing authors who have published under multiple names and not explicitly listed them in their ORCID profile. The final dataset provides a "matchscore" that could be used for further clean-up.
Four files:
person.tsv: 7,194,692 rows, including header
1. orcid
2. lastname
3. firstname
4. creditname
5. othernames
6. otherids
7. emails
employment.tsv: 2,884,981 rows, including header
1. orcid
2. putcode
3. role
4. start-date
5. end-date
6. id
7. source
8. dept
9. name
10. city
11. region
12. country
13. affiliation
education.tsv: 3,202,253 rows, including header
1. orcid
2. putcode
3. role
4. start-date
5. end-date
6. id
7. source
8. dept
9. name
10. city
11. region
12. country
13. affiliation
pubmed2orcid.tsv: 13,133,065 rows, including header
1. PMID
2. au_order (author name position on the article)
3. orcid
4. matchscore (see below)
5. source: orcid (2019 ORCID Public Data File https://orcid.org/), pubmed (NLM's distributed XML files), or patci (an earlier version of ORCID with citations processed through the Patci tool)
12,037,375 from orcid; 1,065,892 from PubMed XML; 29,797 from Patci
matchscore:
000: lastname, firstname and middle init match (e.g., Eric T MacKenzie vs
00: lastname, firstname match (e.g., Keith Ward)
0: lastname, firstname reversed match (e.g., Conde Santiago vs Santiago Conde)
1: lastname, first and middle init match (e.g., L. F. Panchenko)
11: lastname and partial firstname match (e.g., Mike Boland vs Michael Boland or Mel Ziman vs Melanie Ziman)
12: lastname and first init match
15: 3 part lastname and firstname match (David Grahame Hardie vs D Grahame Hardie)
2: lastname match and multipart firstname initial match (e.g., Maria Dolores Suarez Ortega vs M. D. Suarez)
22: partial lastname match and firstname match (e.g., Erika Friedmann vs Erika Friedman)
23: e.g., Antonio Garcia Garcia vs A G Garcia
25: Allan Downie vs J A Downie
26: Oliver Racz vs Oliver Bacz
27: Rita Ostrovskaya vs R U Ostrovskaia
29: Andrew Staehelin vs L A Staehlin
3: M Tronko vs N D Tron'ko
4: Sharon Dent (Also known as Sharon Y.R. Dent; Sharon Y Roth; Sharon Yoder) vs Sharon Yoder
45: Okulov Aleksei vs A B Okulov
48: Maria Del Rosario Garcia De Vicuna Pinedo vs R Garcia-Vicuna
49: Anatoliy Ivashchenko vs A Ivashenko
5: lastname match only (weak match but sometimes captures alternative first name for better subsequent matches); e.g., Bill Hieb vs W F Hieb
6: first name match only (weak match but sometimes captures alternative first name for better subsequent matches); e.g., Maria Borawska vs Maria Koscielak
7: last or first name match on "other names"; e.g., Hromokovska Tetiana (Also known as Gromokovskaia, T. S., Громоковська Тетяна) vs T Gromokovskaia
77: Siva Subramanian vs Kolinjavadi N. Sivasubramanian
88: no name in orcid but match caught by uniqueness of name across paper (at least 90% and 2 more than next most common name)
prefix:
C = ambiguity reduced (possibly eliminated) using city match (e.g., H Yang on PMID 24972200)
I = ambiguity eliminated by excluding investigators (i.e., one author and one or more investigators with that name)
T = ambiguity eliminated using PubMed pos (T for tie-breaker)
W = ambiguity resolved by authority2018
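Because the codes "0", "00", and "000" denote different match types, matchscore values are safer handled as text than as integers. The following is a minimal Python sketch for separating the optional letter prefix from the numeric code. It assumes the prefix letters are written directly before the digits (for example "C11"), which the description implies but does not state outright. The names split_matchscore, PREFIX_MEANINGS, and WEAK_CODES are hypothetical.

```python
import re

# Short paraphrases of the documented prefix meanings.
PREFIX_MEANINGS = {
    "C": "ambiguity reduced using city match",
    "I": "ambiguity eliminated by excluding investigators",
    "T": "ambiguity eliminated using PubMed position (tie-breaker)",
    "W": "ambiguity resolved by authority2018",
}

# Codes the description explicitly labels as weak matches.
WEAK_CODES = {"5", "6"}

# Assumption: zero or more prefix letters followed by the digits.
_MATCHSCORE = re.compile(r"^([CITW]*)(\d+)$")


def split_matchscore(value):
    """Return (prefix, code), keeping the code as a string so that
    '0', '00', and '000' stay distinct."""
    match = _MATCHSCORE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized matchscore: {value!r}")
    return match.group(1), match.group(2)


for raw in ["000", "C11", "T5"]:
    prefix, code = split_matchscore(raw)
    print(raw, prefix or "(none)", code, "weak" if code in WEAK_CODES else "")
```

When loading pubmed2orcid.tsv with a dataframe library, reading the matchscore column as a string type avoids collapsing the leading zeros.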
published:
2022-01-27
Li, Shuai; Moller, Christopher A.; Mitchell, Noah G.; Lee, DoKyoung; Sacks, Erik J.; Ainsworth, Elizabeth A.
(2022)
Twenty-two genotypes of C4 species grown under ambient and elevated O3 concentrations were studied at the SoyFACE facility (40°02’N, 88°14’W) in 2019. This dataset contains leaf morphology, photosynthesis, and nutrient contents measured at three time points. The results of CO2 response curves are also included.
keywords:
C4, O3, photosynthesis
published:
2025-09-18
Saifuddin, Mustafa; Bhatnagar, Jennifer; Segrè, Daniel; Finzi, Adrien C.
(2025)
Respiration by soil bacteria and fungi is one of the largest fluxes of carbon (C) from the land surface. Although this flux is a direct product of microbial metabolism, controls over metabolism and their responses to global change are a major uncertainty in the global C cycle. Here, we explore an in silico approach to predict bacterial C-use efficiency (CUE) for over 200 species using genome-specific constraint-based metabolic modeling. We find that potential CUE averages 0.62 ± 0.17 with a range of 0.22 to 0.98 across taxa and phylogenetic structuring at the subphylum levels. Potential CUE is negatively correlated with genome size, while taxa with larger genomes are able to access a wider variety of C substrates. Incorporating the range of CUE values reported here into a next-generation model of soil biogeochemistry suggests that these differences in physiology across microbial taxa can feed back on soil-C cycling.
keywords:
Sustainability;Metabolomics;Modeling
published:
2025-01-29
Quiroz, Edwin; Ashley, Mary V.; Zaya, David N.
(2025)
These data record weekly aphid and monarch butterfly (Danaus plexippus) neonate counts on individual milkweed plants in multiple raised garden beds in Chicago during the summers of 2023 and 2024. Relationships between aphid infestation and monarch neonates can be investigated along with weekly trends of monarch oviposition and aphid abundances. All gardens included in this study were on the University of Illinois Chicago campus and within 100 meters of each other. Data are provided on three milkweed species in 2023, and one milkweed species in 2024.
keywords:
Aphis; Myzocallis; Danaus plexippus; urban gardens; Asclepias syriaca; milkweeds
published:
2023-09-19
Salami, Malik Oyewale; Lee, Jou; Schneider, Jodi
(2023)
We used the following keywords files to identify categories for journals and conferences not in Scopus, for our STI 2023 paper "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science".
The first four text files each contain keywords/content words in the form: 'keyword1', 'keyword2', 'keyword3', .... The file title indicates the name of the category:
file1: healthscience_words.txt
file2: lifescience_words.txt
file3: physicalscience_words.txt
file4: socialscience_words.txt
The first four files were generated from a combination of software and manual review in an iterative process in which we:
- Manually reviewed venue titles that we were not able to categorize automatically using the Scopus categorization or by extending it as a resource.
- Iteratively reviewed uncategorized venue titles to manually curate additional keywords as content words indicating a venue title could be classified in the category healthscience, lifescience, physicalscience, or socialscience. We used English content words and added words we could automatically translate to identify content words. NOTE: Terms with multiple potential meanings, or containing non-English words that did not yield useful automatic translations (e.g., Al-Masāq), were not selected as content words.
The fifth text file is a list of stopwords in the form: 'stopword1', 'stopword2', 'stopword3', ...
file5: stopwords.txt
This file contains manually curated stopwords from venue titles to handle non-content words such as 'conference' and 'journal'.
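As a rough illustration of how files in this quoted, comma-separated form might be used, the sketch below parses such lists and matches venue-title tokens against them. The matching rule (exact token match after lowercasing and stopword removal), the function names, and the example keywords 'oncology', 'nursing', 'physics', and 'chemistry' are assumptions for illustration only; they are not taken from the dataset or from the authors' code.

```python
import re

def parse_word_list(text):
    """Extract entries from text of the form 'word1', 'word2', 'word3', ...

    Assumes entries are single-quoted and contain no apostrophes themselves.
    """
    return [w.strip().lower() for w in re.findall(r"'([^']*)'", text)]

def categorize(title, keywords_by_category, stopwords):
    """Return the sorted categories whose keyword list contains a title token."""
    tokens = [t for t in re.findall(r"\w+", title.lower()) if t not in stopwords]
    return sorted(
        category
        for category, words in keywords_by_category.items()
        if any(t in words for t in tokens)
    )

# Hypothetical file contents; the real lists live in the *_words.txt files.
stopwords = set(parse_word_list("'conference', 'journal'"))
keywords_by_category = {
    "healthscience": set(parse_word_list("'oncology', 'nursing'")),
    "physicalscience": set(parse_word_list("'physics', 'chemistry'")),
}

result = categorize("Journal of Oncology Nursing", keywords_by_category, stopwords)
print(result)
```

A title can match more than one category under this rule, which is why the function returns a list instead of a single label.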
This dataset is a revision of the following dataset:
Version 1: Lee, Jou; Schneider, Jodi: Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign Data Bank.
Changes from Version 1 to Version 2:
- Added one author
- Added a stopwords file that was used in our data preprocessing.
- Thoroughly reviewed each of the 4 keywords lists. In particular, we added UTF-8 terminology, removed some non-content words and misclassified content words, and extensively reviewed non-English keywords.
keywords:
health science keywords; scientometrics; stopwords; field; keywords; life science keywords; physical science keywords; science of science; social science keywords; meta-science; RISRS
published:
2021-08-04
Sabrina, Sadia; Lewis, Quinn; Rhoads, Bruce
(2021)
This dataset contains data derived from large-scale particle velocimetry measurements obtained at the confluence of the Saline Branch and an unnamed tributary in Illinois. The data were collected using two cameras positioned about the confluence, one mounted on a cable and the other mounted on a tripod. A description of the content of the files can be found in Description of Files.rtf.
keywords:
confluence; hydrodynamics; LSPIV; flow structure; stagnation
published:
2022-04-19
Nowak, Romana; Yang, Shuhong; Li, Kailiang; Bi, Jiajia; Drnevich, Jenny
(2022)
List of differentially expressed genes in human endometrial stromal cells with knockdown of Basigin (BSG) gene expression during decidualization.
The BSG siRNA or negative scrambled control siRNA were transfected into human endometrial stromal cells (HESCs) following the protocol of siLentFect™ Lipid (Bio-Rad, Hercules, CA). Following complete knockdown of BSG in HESCs (72 hours after adding siRNA), HESCs were treated with medium containing estrogen, progesterone and cAMP to induce decidualization. BSG siRNA and negative control scrambled siRNA were added to the cells every four days (day 0, 4) over the course of the decidualization protocol. Total RNA was harvested at day 6 of the decidualization protocol for microarray analysis. Microarray analysis was performed at the University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center. Briefly, 0.2 micrograms of total RNA were labeled using the Agilent two color QuickAmp labeling kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s protocol. The optional spike-in controls were not used. Samples were hybridized to Human Gene Expression 4x44K v2 Microarray (Agilent Technologies, Santa Clara, CA) in an Agilent Hybridization Cassette according to standard protocols. The arrays were then scanned on an Axon GenePix 4000B scanner and the images were quantified using Axon GenePix 6.1.
Microarray data pre-processing and statistical analyses were done in R (v3.6.2) using the limma package (v3.42.0; Ritchie et al., 2015). Median foreground and median background values from the 4 arrays were read into R and any spots that had been manually flagged (-100 values) were given a weight of zero. The background values were ignored because investigations showed that trying to use them to adjust for background fluorescence added more noise to the data; background was low and even for all arrays, therefore no background correction was done.
The individual Cy5 and Cy3 fluorescence values for each array were normalized together using the quantile method 3 (Yang and Thorne, 2003). Agilent's Human Gene Expression 4x44K v2 Microarray has a total of 45,220 probes: 1,224 probes for positive controls, 153 for negative controls, 823 labeled “ignore” and 43,118 labeled “cDNA”. The pos+neg+ignore probes were used to ascertain the background level of fluorescence (6, on the log2 scale) and were then discarded. The cDNA probes comprise 34,127 unique 60mer probes, of which 999 probes are spotted 10 times each and the rest one time each. We averaged the replicate probes for those spotted 10 times and then fit a mixed model that had treatment and dye as fixed effects and array pairing as a random effect (Phipson et al., 2016; Smyth et al., 2005). After fitting the model but before False Discovery Rate (FDR) correction (Benjamini and Hochberg, 1995), probes were filtered out by the following criteria: 1) did not have at least 4/8 samples with expression values > 6 (14,105 probes removed), 2) no longer had an assigned Entrez Gene ID in Bioconductor’s HsAgilentDesign026652.db annotation package (v3.2.3; 2,152 probes removed) (Huber et al., 2015), 3) mapped to the same Entrez Gene ID as another probe but had a larger p-value for treatment effect (4,141 probes removed). This left 13,729 probes representing 13,729 unique genes.
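The probe bookkeeping in this description can be checked with a few lines of arithmetic. The sketch below only re-derives two of the stated counts from the numbers quoted in the text (the cDNA spot count and the number of probes left after filtering); the variable names are illustrative and it is not part of the authors' R workflow.

```python
# Counts quoted from the dataset description.
unique_cdna_probes = 34_127   # unique 60mer cDNA probes
replicated_probes = 999       # probes spotted 10 times each
copies_per_replicate = 10

# Spots on the array labeled "cDNA": singly spotted probes plus all replicate spots.
cdna_spots = (unique_cdna_probes - replicated_probes) + replicated_probes * copies_per_replicate

# Probes removed by each of the three filters, in the order given in the text.
removed = {
    "fewer than 4/8 samples with expression > 6": 14_105,
    "no assigned Entrez Gene ID": 2_152,
    "same Entrez Gene ID, larger p-value": 4_141,
}
remaining_probes = unique_cdna_probes - sum(removed.values())

print(cdna_spots)        # 43118, matching the 43,118 spots labeled "cDNA"
print(remaining_probes)  # 13729, matching the 13,729 probes retained
```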
*Please note that there is a discrepancy between the file and the readme, as this plain text is the actual data file of this dataset.
keywords:
Basigin; endometrium; decidualization; human
published:
2025-03-19
Bieri, Carolina A.; Dominguez, Francina; Miguez-Macho, Gonzalo; Fan, Ying
(2025)
This repository includes HRLDAS Noah-MP model output generated as part of Bieri et al. (2025) - Implementing deep soil and dynamic root uptake in Noah-MP (v4.5): Impact on Amazon dry-season transpiration.
These data are distributed in two different formats: raw model output files and subsetted files that include data for a specific variable. All files are in .nc (NetCDF) format and aggregated into .tar files to facilitate download. Given the size of these datasets, Globus transfer is the best way to download them.
Raw model output for four model experiments is available: FD (control), GW, SOIL, and ROOT. See the associated publication for information on the different experiments. These data span an approximately 20 year period from 01 Jun 2000 to 31 Dec 2019. The data have a spatial resolution of 4 km and a temporal frequency of 3 hours. These data are for a domain in the southern Amazon basin (see Figure 1 in the associated publication). Data for each experiment is available as a .tar file which includes 3-hourly NetCDF files. All default Noah-MP output variables are included in each file. As a result, the .tar files are quite large and may take many hours or even days to transfer depending on your network speed and local configurations. These files are named 'noahmp_output_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT).
Subsetted model output at a daily temporal resolution for all four model experiments is also available. These .tar files include the following variables: water table depth (ZWT), latent heat flux (LH), sensible heat flux (HFX), soil moisture (SOIL_M), canopy evaporation (ECAN), ground evaporation (EDIR), transpiration (ETRAN), rainfall rate at the surface (QRAIN), and two variables that are specific to the ROOT experiment: ROOTACTIVITY (root activity function) and GWRD (active root water uptake depth). There is one file for each variable within the tarred files. These files are named 'noahmp_output_subset_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT).
Finally, there is a sample dataset with raw 3-hourly output from the ROOT experiment for one day. The purpose of this sample dataset is to allow users to confirm if these data meet their needs before initiating a full transfer via Globus. This file is named 'noahmp_output_sample_ROOT.tar'.
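For planning a transfer, the archive names described above can be generated programmatically, and the number of 3-hourly time steps in the record can be estimated. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the step count assumes that the record covers every day from 01 Jun 2000 through 31 Dec 2019 with eight steps per day, which the description does not state explicitly.

```python
from datetime import date

EXPERIMENTS = ("FD", "GW", "SOIL", "ROOT")
SAMPLE_TAR = "noahmp_output_sample_ROOT.tar"  # one day of raw ROOT output

def tar_name(experiment, subset=False):
    """Build the archive name for an experiment (raw 3-hourly or daily subset)."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    middle = "subset_" if subset else ""
    return f"noahmp_output_{middle}2000_2019_{experiment}.tar"

def count_3hourly_steps(start=date(2000, 6, 1), end=date(2019, 12, 31)):
    """Number of 3-hourly steps, assuming 8 per day on every day, end inclusive."""
    days = (end - start).days + 1
    return days * 8

raw_archives = [tar_name(exp) for exp in EXPERIMENTS]
subset_archives = [tar_name(exp, subset=True) for exp in EXPERIMENTS]
n_steps = count_3hourly_steps()

print(raw_archives)
print(subset_archives)
print(n_steps)
```

If each raw NetCDF file holds a single time step (again an assumption), n_steps also approximates the number of files inside each raw archive, which helps explain the long transfer times mentioned above.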
The README.txt file provides information on the Noah-MP output variables in these datasets, among other specifications.
Information on HRLDAS Noah-MP and names/definitions of model output variables that are useful in working with these data are available here: http://dx.doi.org/10.5065/ew8g-yr95. Note that some output variables may be listed in this document under a different variable name, so searching for the long name (e.g. 'baseflow' instead of 'QRF') is recommended.
Information on additional output variables that were added to the model as part of this study is available here: https://github.com/bieri2/bieri-et-al-2025-EGU-GMD/tree/DynaRoot.
Model code, configuration files, and forcing data used to carry out the model simulations are linked in the related resources section.
keywords:
Land surface model; NetCDF