Dataset Search

Displaying 176 - 200 of 473 in total

Illinois Data Bank Dataset Search Results

Results

published: 2025-08-08

Data for "Human landscape alterations and landcover heterogeneity influence northern raccoon (Procyon lotor) site use intensity"

Remmers, Justin J.; Allen, Maximilian; Green, Austin M. (2025)

Count histories from camera traps and remotely sensed covariate data used in N-mixture modeling to assess the site use intensity of raccoons in Illinois.

published: 2025-02-08

Synthetic Networks For Benchmarking

Anne, Lahari; Park, Minhyuk; Warnow, Tandy; Chacko, George (2025)

The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and clustering (by any algorithm) are used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real-world clusters, after which outlier nodes are added using one of three different options. See Anne et al. (2024): in press, Complex Networks and Applications XIII (preprint: arXiv:2408.13647).
The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/).
Input Networks: The CEN can be downloaded from the Illinois Data Bank: https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
a - the name of the inspirational network, e.g., cit_hepph
b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01
c - the RECCS option used to approximate edge count and connectivity in the real-world network, e.g., v1
Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph.
For SBM generation, we used the graph_tool software (Peixoto, Tiago P. 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14).
Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz).
The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2) and across whether the Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing was applied. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment.
Input Network: CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows: cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv where:
cen - indicates the network was modeled on the Curated Exosome Network (CEN).
resolution - the resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status - either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version - the RECCS version used to generate the synthetic network (v1 or v2).
replicate_id - the specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv: a synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv: a synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
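The two naming schemes described for this dataset can be decoded mechanically. The following is a minimal sketch, not part of the dataset: it assumes the fields are underscore-separated exactly as in the worked examples (cit_hepph_0.01_v1.tsv and cen_0.01_cm_v1_sample_0.tsv), and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ReplicateName(NamedTuple):
    """Fields encoded in a repl_exp.tar.gz file name (names are illustrative)."""
    network: str
    resolution: float
    cm_treated: bool
    reccs_version: str
    replicate_id: int


# Assumed pattern, inferred from the examples in the description.
# "no_cm" is listed first so it is tried before the shorter "cm".
_REPLICATE = re.compile(
    r"^(?P<network>cen)_(?P<resolution>\d+(?:\.\d+)?)_(?P<cm>no_cm|cm)"
    r"_(?P<version>v[12])_sample_(?P<rep>\d+)\.tsv$"
)


def parse_replicate_name(filename: str) -> Optional[ReplicateName]:
    """Decode a replication-experiment file name, or return None if it does not match."""
    match = _REPLICATE.match(filename)
    if match is None:
        return None
    return ReplicateName(
        network=match["network"],
        resolution=float(match["resolution"]),
        cm_treated=(match["cm"] == "cm"),
        reccs_version=match["version"],
        replicate_id=int(match["rep"]),
    )


def parse_network_name(filename: str) -> Tuple[str, float, str]:
    """Decode an a_b_c.tsv or a_b_c.tsv.gz name into (network, resolution, RECCS option)."""
    stem = filename
    for suffix in (".tsv.gz", ".tsv"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    # Split from the right because the network name itself may contain underscores.
    network, resolution, version = stem.rsplit("_", 2)
    return network, float(resolution), version
```

Splitting from the right keeps network names such as cit_hepph intact; a name that does not follow the assumed layout yields None from the first function and a ValueError from the second.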

keywords: Community Detection; Synthetic Networks; Stochastic Block Model (SBM)

published: 2022-07-25

SBKS - Celllines Raw Entity Mentions

Jett, Jacob (2022)

A set of cell-line entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.

keywords: synthetic biology; NERC data; cell-line mentions

published: 2025-04-24

Data for 'The conservatism of prairie pollinators according to experts and empiricism'

Bried, J. T. (2025)

Includes two files (.csv) behind all analyses and results in the paper published with the same title.
1) 'sites.species.counts' is the raw 2018-2022 data from Angella Moorehouse (Illinois Nature Preserves Commission) including her 456 identified pollinator species and her raw counts per site (there may be a few errors of identification or naming, and there will always be name changes over time). Headers in columns F through Q correspond to the remnant-site labels in Figure 1 and Table 1 of the paper. Columns R to AB are the “nonremnant” sites, which have not been uniquely labelled since the specific sites aren't referenced anywhere in the manuscript.
2) 'C.scores' has the 265 species assigned empirical C values (empirical.C) along with the four sets of expert C values and their confidence ranks (low, medium, high), and the Illinois/Indiana conservation ranks (S-ranks), following the methods described in the paper.
Other headers in these files:
- taxa.code: four-letter abbreviation for genus and specific name
- genus: genus name
- species: specific epithet
- common.name: English name
- group: general pollinator taxa group
- empirical.C: empirically estimated conservatism score
- expert#.C: conservatism score assigned by each of four experts
- expert#.conf: expert's confidence in their conservatism score
Blank cells in the site-species abundance matrix indicate species absence (or non-detection).
Blank cells in C.scores.csv indicate missing S-ranks and unassigned C-scores (with associated missing confidence ranks) where experts lacked knowledge or confidence.
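Because blank cells mean different things in the two files, they should be parsed differently. The following is a minimal sketch of that distinction, using only the Python standard library. The sample rows, the taxa codes AAAA and BBBB, and the site column names site_1 and site_2 are invented for illustration; the real files carry additional columns.

```python
import csv
import io
from typing import Dict, Optional


def parse_count(cell: str) -> int:
    """Site-species matrix: a blank cell means absence (or non-detection), so it maps to 0."""
    cell = cell.strip()
    return int(cell) if cell else 0


def parse_score(cell: str) -> Optional[float]:
    """C.scores: a blank cell means the value is missing or unassigned, so it maps to None."""
    cell = cell.strip()
    return float(cell) if cell else None


def read_counts(csv_text: str, id_column: str = "taxa.code") -> Dict[str, Dict[str, int]]:
    """Read a species-by-site table into {taxon: {site: count}}.

    Assumes every column other than id_column holds counts, which is a
    simplification of the real file layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts: Dict[str, Dict[str, int]] = {}
    for row in reader:
        taxon = row.pop(id_column)
        counts[taxon] = {site: parse_count(value) for site, value in row.items()}
    return counts


# Invented two-species, two-site example (not taken from the dataset).
SAMPLE = "taxa.code,site_1,site_2\nAAAA,3,\nBBBB,,12\n"
COUNTS = read_counts(SAMPLE)
```

Treating a blank C score as 0 would silently turn "no expert opinion" into "lowest conservatism", which is why the two parsers are kept separate.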

keywords: ecological conservatism; indicator values; pollinator conservation; prairie ecosystems; protected areas; remnant communities

published: 2022-08-05

5000-het: Dataset of Nucleotide Sequences with a Form of Evolutionary Sequence Length Heterogeneity

Liu, Baqiao; Shen, Chengze; Warnow, Tandy (2022)

Simulated sequences provide a way to evaluate multiple sequence alignment (MSA) methods where the ground truth is exactly known. However, the realism of such simulated conditions often comes under question compared to empirical datasets. In particular, simulated data often does not display heterogeneity in the sequence lengths, a common feature in biological datasets. In order to imitate sequence length heterogeneity, we here present a set of data that are evolved under a mixture model of indel lengths, where indels have an occasional chance of being promoted to long indels (emulating large insertion/deletion events, e.g., domain-level gain/loss). This dataset is otherwise (e.g., in GTR parameters) analogous to the 1000M condition as presented in the SATe paper (doi: 10.1126/science.1171243) but with 5000 sequences and simulated with INDELible (http://abacus.gene.ucl.ac.uk/software/indelible/). For more information, see README.txt. For the INDELible control files, see https://github.com/ThisBioLife/5000M-234-het.

keywords: simulated data; sequence length heterogeneity; multiple sequence alignment

published: 2023-07-28

Data for Genome-wide association and genomic prediction for yield and component traits of Miscanthus sacchariflorus

Njuguna, Joyce; Clark, Lindsay; Lipka, Alexander; Anzoua, Kossonou; Bagmet, Larisa; Chebukin, Pavel; Dwiyanti, Maria; Dzyubenko, Elena; Dzyubenko, Nicolay; Ghimire, Bimal; Jin, Xiaoli; Johnson, Douglas; Nagano, Hironori; Peng, Junhua; Petersen, Karen; Sabitov, Andrey; Seong, Eun; Yamada, Toshihiko; Yoo, Ji; Yu, Chang; Zhao, Hu; Long, Stephen; Sacks, Erik (2023)

The dataset is for a study conducted to understand genome-wide association (GWA) and genomic prediction of biomass yield and 14 yield-component traits in Miscanthus sacchariflorus. We evaluated a diversity panel with 590 accessions of M. sacchariflorus grown across four years in one subtropical and three temperate locations and genotyped with 268,109 single nucleotide polymorphisms (SNPs).

keywords: Miscanthus sacchariflorus; genome-wide association analysis; genomic prediction; bioenergy; biomass

published: 2025-09-30

Data for "Sustainable Potassium Sorbate Production from Triacetic Acid Lactone in Food-Grade Solvents"

Huber, George; Guest, Jeremy; Santiago-Martinez, Leoncio; Bhagwat, Sarang; Kim, Min Soo (2025)

This study advances the production of potassium sorbate (KS) from triacetic acid lactone (TAL) using the food-grade solvents ethanol (EtOH) and isopropyl alcohol (IPA). We have previously demonstrated the route to produce KS from TAL in tetrahydrofuran (THF) as the main solvent, but the use of THF is associated with environmental and health risks, especially for food applications. The process employs a catalytic approach in food-grade solvents and includes three main steps: hydrogenation, etherification and hydrolysis, and ring-opening hydrolysis to produce KS from TAL. In the synthesis of KS from TAL, the use of IPA leads to higher yields and reduced reaction times compared to EtOH. As a result, the overall reaction time in IPA was reduced to 35.7 h, compared to 42.1 h in our previous study using THF and EtOH, while achieving a comparable KS yield of 84% from TAL. The synthesized KS exhibits a trans-2, trans-4 geometrical configuration, identical to that of commercially available KS. Through techno-economic analysis (TEA) and life cycle assessment (LCA), we estimated that full-scale production of KS from sugarcane with the developed process in IPA could achieve a minimum product selling price (MPSP) of $8.27 per kg with a range of $7.06–10.16 per kg [5th–95th percentiles from 6000 Monte Carlo simulations] and a carbon intensity (CI) of 13.7 [9.6–18.6] kg CO2-eq per kg. This study highlights the synthesis of KS from TAL using food-grade solvents, demonstrating improved economic viability and environmental sustainability compared to our previous research (MPSP of $9.68 per kg [$8.47–11.45 per kg] and CI of 16.2 [12.0–21.2] kg CO2-eq per kg), as the total required reaction time decreases while a comparable overall yield of KS from TAL is maintained.
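The comparison with the earlier THF/EtOH process can be expressed as relative reductions. The following is a small arithmetic sketch using only the baseline figures quoted in the description; the percentages it prints are derived here and are not stated by the authors.

```python
def percent_reduction(previous: float, current: float) -> float:
    """Relative reduction of `current` with respect to `previous`, in percent."""
    return 100.0 * (previous - current) / previous


# (previous study, this study) pairs, copied from the description above.
COMPARISONS = {
    "overall reaction time (h)": (42.1, 35.7),
    "MPSP ($ per kg)": (9.68, 8.27),
    "carbon intensity (kg CO2-eq per kg)": (16.2, 13.7),
}

REDUCTIONS = {
    name: percent_reduction(previous, current)
    for name, (previous, current) in COMPARISONS.items()
}

for name, value in REDUCTIONS.items():
    print(f"{name}: {value:.1f}% lower")
```

All three baseline metrics fall by roughly 15%. The reported 5th to 95th percentile ranges overlap between the two studies, so these point differences should be read with those ranges in mind.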

keywords: bioproducts; catalysis

published: 2018-09-26

Pastinaca sativa P450s - CYP71AJ4 variants in New Zealand and North America

Cure, Anne; Calla, Bernarda; Berenbaum, May; Schuler, Mary (2018)

Nucleotide sequences from wild parsnip CYP71AJ4 (angelicin synthase; GenBank EF191021, https://www.ncbi.nlm.nih.gov/nuccore/EF191021) were obtained by Sanger sequencing. Seeds from individual plants from different populations were harvested to obtain corresponding cDNA. The cDNA was cloned and directly sequenced. Amino acid translations were obtained using standard codon usage. Alignments of CYP71AJ4 sequences (involved in angular furanocoumarin biosynthesis) with as the reference sequence. Consistent amino acid variabilities were found between some populations. The relationship between sequencing variability and selective pressure is not yet known.

keywords: Pastinaca sativa; parsnip; furanocoumarins; psoralen

published: 2025-11-19

Data for Optimizing Bioenergy Sorghum Productivity and Nutrient Removal in Illinois: Impact of Nitrogen Fertilization Under Diverse Marginal Conditions

Jang, Chunhwa; Lee, Jung Woo; Namoi, Nictor; Kim, Jinwook; Lee, Moon-Sub; Crozier, Daniel; Yang, Wendy; Rooney, William; Lee, DoKyoung (2025)

Bioenergy sorghum (Sorghum bicolor L. Moench) is a promising crop for contributing to the United States bioenergy supply. However, the varying limitations of the marginal lands targeted for its cultivation present a management challenge. This two-year study aimed to investigate how the limitations associated with marginal cropland impact the effects of nitrogen fertilization on the yield of bioenergy sorghum and the uptake of 11 macro- (N, P, K, Ca, Mg, and S) and micronutrients (Fe, Mn, Zn, Cu, and B). The study contrasted prime cropland in central Illinois (Urbana) with three marginal cropland sites in southern (Ewing) and central Illinois (Fairbury and Pesotum). These marginal cropland sites are characterized by varying limitations, including low soil fertility (P and K limitations), leaching and erosion, and flooding, respectively. Four nitrogen rates (0, 56, 112, and 168 kg N ha−1) were tested under eight environments. The average yields and ranges of sorghum biomass were 20.2 (17.0–23.2) Mg ha−1 in Urbana, 18.1 (13.1–19.8) Mg ha−1 in Ewing, 13.8 (9.0–17.3) Mg ha−1 in Fairbury, and 23.3 (14.6–33.0) Mg ha−1 in Pesotum. Optimal N rates were 56 N in Pesotum and 112 N in Urbana, Ewing, and Fairbury. Tissue macronutrient contents in Urbana were generally higher than in the marginal croplands, while micronutrient contents did not show discernible trends. Increasing N rate generally correlated with the macronutrient removal except in Ewing. Comparable sorghum biomass yields were observed between prime and marginal croplands (averaging 18.3 Mg ha−1), but optimal N rates varied between 56 N and 112 N. This suggests that yield gaps can be narrowed by applying the optimal N rates for the respective locations. However, increased removals of macronutrients, especially P and K, with increasing yields indicate the need to revise fertilizer recommendations, particularly for soils deficient in these nutrients. 
Our study suggests that while sorghum production on marginal cropland is feasible, N management needs to be adapted to the unique limitations associated with various types of marginal cropland.

keywords: Sustainability; Biomass Analytics; Field Data; Nitrogen

published: 2023-10-26

Simulation trajectories for "A DNA turbine powered by a transmembrane potential across a nanopore"

Maffeo, Christopher; Aksimentiev, Aleksei (2023)

Simulation trajectory data and scripts for Nature Nanotechnology manuscript "A DNA turbine powered by a transmembrane potential across a nanopore" that demonstrates a rationally designed nanoscale DNA-origami turbine with three chiral blades that uses a transmembrane electrochemical potential across a nanopore to drive a DNA bundle into sustained unidirectional rotations of up to 10 revolutions/s. Driven by the asymmetric mobility of a DNA duplex, the rotation direction of the turbine is set by its designed chirality and the salinity of the solvent.

keywords: All-atom MD simulation; DNA; nanotechnology; motors and rotors

published: 2024-10-01

Data for Transcriptional responses of detoxification genes to coumaphos in a nontarget species, Galleria mellonella (greater wax moth) (Lepidoptera: Pyralidae), in the beehive environment

Li, Shengyun; Wu, Wen-Yen; Liao, Ling-Hsiu; Berenbaum, May (2024)

This dataset is associated with the manuscript "Transcriptional responses of detoxification genes to coumaphos in a nontarget species, Galleria mellonella (greater wax moth) (Lepidoptera: Pyralidae), in the beehive environment".
This dataset includes 2 Excel files:
1) raw_data_bioassay.xlsx: this file contains the raw data for the waxworm bioassay. There are 2 worksheets within this file:
- LC50: raw data for measuring LC50 in the laboratory and field strains of Galleria mellonella.
- RGR: Relative Growth Rate, raw data for measuring body weight of the field strain of Galleria mellonella.
2) raw-data_RT-qPCR.xlsx: this file contains raw data (Ct value) of RT-qPCR.

keywords: Apis mellifera; cytochrome P450; honey bee; pesticide; waxworm

published: 2022-08-31

Data for Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes

Seyfried, Georgia; Midgley, Meghan; Phillips, Richard; Yang, Wendy (2022)

This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes". Please refer to that publication for details about methodologies used to generate these data and for the experimental design. For this version 2, we added specific gross nitrogen mineralization rates (ugN/gOM/d), microbial biomass carbon (ugC/gdw), microbial biomass nitrogen (ugN/gdw) and microbial biomass C:N ratios to the data set. Additionally, we updated values for gross nitrogen mineralization, microbial NO3 assimilation and microbial NH4 assimilation to reflect slight changes in data processing. Those changes are reflected in "220829_All data_repository.csv". "220829_nitrogen_mineralization_readme.txt" is the updated readme for the new file. The other 2 files, whose names begin with “220426_”, are the older versions and are the same as in V1.

keywords: Nitrogen cycling; Ectomycorrhizal fungi; Arbuscular mycorrhizal fungi; Nitrogen fertilization; Gross mineralization

published: 2024-01-01

Data for Ornate Box Turtle (Terrapene ornata) Emergence

Edmonds, Devin; Bach, Elizabeth; Colton, Andrea; Jaquet, Izabelle; Kessler, Ethan; Dreslik, Michael (2024)

These data were used to make a predictive model of when ornate box turtles (Terrapene ornata) are likely to be above ground and at risk from fire. The data were generated using shell temperatures, soil temperatures at 0.35 m deep from known overwintering sites, and the spring and fall soil temperature inversion dates during 2019–2022 to infer if 26 individual radio-tracked turtles were above or below ground at three sites in Illinois.

keywords: turtle; conservation; controlled burn; fire management; ectotherm; hibernation; brumation; reptile

published: 2024-07-15

Impact Assessment of Climate Change and Afforestation

Li, Peiyuan; Sharma, Ashish; Wuebbles, Donald (2024)

Rising global temperatures and urban heat island effects challenge environmental health and energy systems at the city level, particularly in summer. Increased heatwaves raise energy demand for cooling, stressing power facilities, increasing costs, and risking blackouts. Heat impacts vary across cities due to differences in urban morphology, geography, land use, and land cover, highlighting vulnerable areas needing targeted heat mitigation. Urban tree canopies, a nature-based solution, effectively mitigate heat. Trees provide shade and cooling through evaporation, improving thermal comfort, reducing air conditioning energy consumption, and enhancing climate resilience. This report focused on the ComEd service area in the Chicago Metropolitan Region and assessed the impacts of population growth, urbanization, climate change, and an ambitious plan to plant 1 million trees. The report evaluated planting 1 million trees to quantify regional cooling effects projected for the 2030s. Afforestation locations were selected to avoid interference with existing infrastructure. Key findings include (i) extreme hot hours (>95°F) will increase from 30 to 200 per year, adding 420 Cooling Degree Days (CCD) by the 2030s, (ii) greener areas can be up to 10°F cooler than less vegetated neighborhoods in summer, (iii) tree canopies can create localized cooling, reducing temperatures by 0.7°F and lowering annual CCD by 60 to 65, and (iv) afforestation can reduce the region’s temperature by 0.7°F, saving 400 to 1100 Megawatt hours of daily power usage during summer. Note: The data is available upon request from dpiclimate@uilliois.edu.

keywords: urban heat; cooling degree days; afforestation; tree canopy; Chicago region

published: 2025-03-05

2D Acoustic Numerical Breast Phantoms for Ultrasound Computed Tomography

Li, Fu; Villa, Umberto; Park, Seonyeong; Jeong, Gangwon; Anastasio, Mark A. (2025)

References
- Li, Fu, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. "3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography." IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, no. 1 (2021): 135-146. DOI: 10.1109/TUFFC.2021.3112544
- Li, Fu; Villa, Umberto; Park, Seonyeong; Anastasio, Mark, 2021, "2D Acoustic Numerical Breast Phantoms and USCT Measurement Data", https://doi.org/10.7910/DVN/CUFVKE, Harvard Dataverse, V1
Overview
- This dataset includes 1,089 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for ultrasound computed tomography (USCT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in USCT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as an HDF5 .mat file. The filenames follow this format: {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
File Contents
- Each file contains the following variables:
> "type": Breast type
> "sos": Speed-of-sound map [mm/μs]
> "den": Ambient density map [kg/mm³]
> "att": Acoustic attenuation (power-law prefactor) map [dB/MHzʸ mm]
> "y": power-law exponent
> "label": Tissue label map. Tissue types are denoted using the following labels: water (0), fat (1), skin (2), glandular tissue (29), ligament (88), lesion (200).
- All spatial maps ("sos", "den", "att", and "label") have the same spatial dimensions of 2560 x 2560 pixels, with a pixel size of 0.1 mm x 0.1 mm.
- "sos", "den", and "att" are float32 arrays, and "label" is an 8-bit unsigned integer array.
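The file-name format and the tissue label codes lend themselves to a small lookup helper. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the subject ID consists of digits only, as in the single example D510022534.mat. Reading the arrays themselves would need an HDF5 reader, which is not shown here.

```python
import re
from typing import NamedTuple

# Tissue label codes as listed in the dataset description.
TISSUE_LABELS = {
    0: "water",
    1: "fat",
    2: "skin",
    29: "glandular tissue",
    88: "ligament",
    200: "lesion",
}

# ACR BI-RADS breast composition categories as listed in the description.
BREAST_TYPES = {
    "A": "almost entirely fatty",
    "B": "scattered areas of fibroglandular density",
    "C": "heterogeneously dense",
    "D": "extremely dense",
}

GRID_PIXELS = 2560      # each spatial map is 2560 x 2560 pixels
PIXEL_SIZE_MM = 0.1     # pixel size is 0.1 mm x 0.1 mm


class PhantomFile(NamedTuple):
    breast_type: str
    subject_id: str


# Assumption: the subject ID is purely numeric (true for the one documented example).
_FILENAME = re.compile(r"^(?P<type>[ABCD])(?P<subject>\d+)\.mat$")


def parse_phantom_filename(filename: str) -> PhantomFile:
    """Split a {type}{subject_id}.mat file name into its two parts."""
    match = _FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected phantom file name: {filename!r}")
    return PhantomFile(match["type"], match["subject"])


def field_of_view_mm() -> float:
    """Side length of each map in millimetres (pixels times pixel size)."""
    return GRID_PIXELS * PIXEL_SIZE_MM
```

The subject ID is kept as a string so that any leading zeros survive. The field of view works out to about 256 mm per side.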

keywords: Medical imaging; Ultrasound computed tomography; Numerical phantom

published: 2018-04-19

MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide

Torvik, Vetle I. (2018)

MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. Prepared by Vetle Torvik 2018-04-05
The dataset comes as a single tab-delimited Latin-1 encoded file (only the City column uses non-ASCII characters), and should be about 3.5GB uncompressed.
• How was the dataset created? The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. For information on obtaining PubMed/MEDLINE, and for NLM's data Terms and Conditions, see https://www.nlm.nih.gov/databases/download/pubmed_medline.html
• Affiliations are linked to a particular author on a particular article. Prior to 2014, NLM recorded the affiliation of the first author only. However, MapAffil 2016 covers some PubMed records lacking affiliations that were harvested elsewhere, from PMC (e.g., PMID 22427989), NIH grants (e.g., 1838378), and Microsoft Academic Graph and ADS (e.g. 5833220).
• Affiliations are pre-processed (e.g., transliterated into ASCII from UTF-8 and html) so they may differ (sometimes a lot; see PMID 27487542) from PubMed records.
• All affiliation strings were processed using the MapAffil procedure, to identify and disambiguate the most specific place-name, as described in: Torvik VI. MapAffil: A bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide. D-Lib Magazine 2015; 21 (11/12). 10p
• Look for Fig. 4 in the following article for coverage statistics over time: Palmblad M, Torvik VI. Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services. Tropical medicine and health. 2017 Dec;45(1):33. https://doi.org/10.1186/s41182-017-0073-6 Expect to see big upticks in coverage of PMIDs around 1988 and for non-first authors in 2014.
• The code and back-end data are periodically updated and made available for query by PMID at the Torvik Research Group site (http://abel.ischool.illinois.edu/).
• What is the format of the dataset? The dataset contains 37,406,692 rows. Each row (line) in the file has a unique PMID and author position (e.g., 10786286_3 is the third author name on PMID 10786286), and the following thirteen columns, tab-delimited. All columns are ASCII, except city which contains Latin-1.
1. PMID: positive non-zero integer; int(10) unsigned
2. au_order: positive non-zero integer; smallint(4)
3. lastname: varchar(80)
4. firstname: varchar(80); NLM started including these in 2002 but many have been harvested from outside PubMed
5. year of publication
6. type: EDU, HOS, EDU-HOS, ORG, COM, GOV, MIL, UNK
7. city: varchar(200); typically 'city, state, country' but could include further subdivisions; unresolved ambiguities are concatenated by '|'
8. state: Australia, Canada and USA (which includes territories like PR, GU, AS, and post-codes like AE and AA)
9. country
10. journal
11. lat: at most 3 decimals (only available when city is not a country or state)
12. lon: at most 3 decimals (only available when city is not a country or state)
13. fips: varchar(5); for USA only retrieved by lat-lon query to https://geo.fcc.gov/api/census/block/find
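The thirteen-column layout can be read with the standard library alone. The following is a minimal sketch that makes several assumptions the description does not settle: that the file has no header row, that unavailable lat/lon values appear as empty fields, and that the short column keys used here are acceptable stand-ins for the documented names. The sample line is invented except for the PMID and author position 10786286_3, which the description itself uses as an example.

```python
from typing import Any, Dict, Iterator, Optional

# Short keys for the thirteen documented columns, in file order.
COLUMNS = [
    "PMID", "au_order", "lastname", "firstname", "year", "type", "city",
    "state", "country", "journal", "lat", "lon", "fips",
]


def _optional_float(value: str) -> Optional[float]:
    """lat/lon are only available for city-level matches; empty is assumed to mean missing."""
    return float(value) if value else None


def parse_line(line: str) -> Dict[str, Any]:
    """Split one tab-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row: Dict[str, Any] = dict(zip(COLUMNS, fields))
    row["PMID"] = int(row["PMID"])
    row["au_order"] = int(row["au_order"])
    row["lat"] = _optional_float(row["lat"])
    row["lon"] = _optional_float(row["lon"])
    # Unresolved place-name ambiguities are concatenated by '|'.
    row["city_candidates"] = row["city"].split("|")
    return row


def iter_rows(path: str) -> Iterator[Dict[str, Any]]:
    """Stream the file row by row; at about 3.5GB it should not be loaded whole."""
    with open(path, encoding="latin-1", newline="") as handle:
        for line in handle:
            yield parse_line(line)


# Invented example row (names, journal, and coordinates are placeholders).
SAMPLE_LINE = (
    "10786286\t3\tDoe\tJane\t2000\tEDU\tUrbana, IL, USA\tIL\tUSA\t"
    "Example Journal\t40.110\t-88.207\t17019\n"
)
SAMPLE_ROW = parse_line(SAMPLE_LINE)
```

Opening the file with encoding="latin-1" matters only for the city column, but it is the safe choice for the whole file because ASCII is a subset of Latin-1.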

keywords: PubMed; MEDLINE; Digital Libraries; Bibliographic Databases; Author Affiliations; Geographic Indexing; Place Name Ambiguity; Geoparsing; Geocoding; Toponym Extraction; Toponym Resolution

published: 2025-05-21

Data for A multiplex of connectome trajectories enables several connectivity patterns in parallel

Mostame, Parham; Wirsich, Jonathan; Alderson, Thomas H.; Ridley, Ben; Giraud, Anne-Lise; Carmichael, David W.; Vulliemoz, Serge; Guye, Maxime; Lemieux, Louis; Sadaghiani, Sepideh (2025)

___________________________________SUMMARY
This dataset contains derivative data from concurrent fMRI and scalp EEG recordings used in: Mostame Parham, Wirsich Jonathan, Alderson Thomas H, Ridley Ben, Giraud Anne-Lise, Carmichael David W, Vulliemoz Serge, Guye Maxime, Lemieux Louis, Sadaghiani Sepideh (2024) A multiplex of connectome trajectories enables several connectivity patterns in parallel eLife 13:RP98777. doi: https://doi.org/10.7554/eLife.98777.3
___________________________________RAW DATA
The data were originally published and described as part of other studies (Morillon et al., 2010; Sadaghiani et al., 2012). Briefly, 10 minutes of eyes-closed resting state were analyzed from 26 healthy subjects (average age = 24.39 years; range: 18-31 years; 8 females) with no history of psychiatric or neurological disorders. Informed consent was given by each participant and the study was approved by the local Research Ethics Committee (CPP Ile de France III). fMRI was acquired using a 3T Siemens Tim Trio scanner with a GE-EPI pulse sequence (TR = 2 s; TE = 50 ms; 40 slices; 300 volumes; field of view: 192×192; voxel size: 3×3×3 mm3). Structural T1-weighted scans were acquired using the MPRAGE pulse sequence (176 slices; field of view: 256×256; voxel size: 1×1×1 mm3). 62-channel scalp EEG (Easycap, with an additional EOG and an ECG channel) was recorded using an MR-compatible amplifier (BrainAmp MR, Brain Products) at 5 kHz sampling rate.
___________________________________PREPROCESSING
fMRI and EEG data were preprocessed with standard preprocessing steps as explained in detail elsewhere (Wirsich et al., 2020). In brief, fMRI underwent standard slice-time correction and spatial realignment (SPM12, http://www.fil.ion.ucl.ac.uk/spm/software/spm12).
Structural T1-weighted images were processed using Freesurfer (recon-all, v6.0.0, https://surfer.nmr.mgh.harvard.edu/) in order to perform non-uniformity and intensity correction, skull stripping and gray/white matter segmentation. The cortex was parcellated into 68 regions of the Desikan-Killiany atlas (Desikan et al., 2006). This atlas was chosen because, as an anatomical parcellation, it avoids biases towards one or the other functional data modality. The T1 images of each subject and the Desikan-Killiany atlas were co-registered to the fMRI images (FSL-FLIRT 6.0.2, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki). We extracted signals of no interest, such as the average signals of cerebrospinal fluid (CSF) and white matter, from manually defined regions of interest (ROI, 5 mm sphere, Marsbar Toolbox 0.44, http://marsbar.sourceforge.net) and regressed them out of the BOLD timeseries along with 6 rotation and translation motion parameters and the global gray matter signal (Wirsich et al., 2017a). Then we bandpass-filtered the timeseries at 0.009–0.08 Hz. The average timeseries of each region was then used to calculate connectivity. EEG underwent gradient and cardio-ballistic artifact removal using Brain Vision Analyzer software (Allen et al., 1998, 2000) and was down-sampled to 250 Hz. EEG was projected into source space using the Tikhonov-regularized minimum norm in Brainstorm software (Baillet et al., 2001; Tadel et al., 2011). Source activity was then averaged to the 68 regions of the Desikan-Killiany atlas. Band-limited EEG signals in each canonical frequency band and every atlas region were then used to calculate frequency-specific connectome dynamics. Note that the MEG-ROI-nets toolbox in the OHBA Software Library (OSL; https://ohba-analysis.github.io/osl-docs/) was used to minimize source leakage in the band-limited source-localized EEG data (Colclough et al., 2015).
___________________________________FOLDER STRUCTURE
The dataset includes five separate folders as described below:
1) EEGfMRI_dFC folder: connectome dynamics of scalp data
This folder contains 26 MATLAB (.mat) files, one per subject. Inside each `.mat` is a structure with fields `A`, `B`, and `C`, corresponding to fMRI, amplitude-coupling, and phase-coupling connectome dynamics, respectively. The fMRI data are 3-dimensional (ROI × ROI × timepoints). The EEG data are stored in a 1×5 cell array (Delta, Theta, Alpha, Beta, Gamma), each cell containing a 3-D ROI × ROI × timepoints matrix.
2) EEGfMRI_dFC_SourceOrtho folder: connectome dynamics of source-orthogonalized scalp data
Same format as above, except that EEG connectome dynamics are derived from source-orthogonalized signals. The MEG-ROI-nets toolbox in the OHBA Software Library (OSL; https://ohba-analysis.github.io/osl-docs/) was used to minimize source leakage in the band-limited, source-localized EEG data (Colclough et al., 2015).
3-5) Cross-modal Recurrence Plot (CRP) data
Each subject has an Excel file with five sheets (Delta through Gamma), corresponding to the five frequency bands. Each sheet contains a 2-D CRP matrix (rows = fMRI timepoints, columns = band-limited EEG timepoints).
- Scalp EEG–fMRI CRPs (CRP_EEGfMRI and CRP_EEGfMRI_SourceOrtho folders): two versions (with and without source-orthogonalization), each with 52 Excel files, including amplitude- and phase-coupling CRPs.
- Intracranial EEG–fMRI CRPs (CRP_iEEGfMRI folder): one version, 27 Excel files, containing three cases: amplitude coupling, HRF-convolved amplitude coupling, and phase coupling.
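The layout of the per-subject structures can be summarized as a small lookup. The following is a sketch with hypothetical helper names, using only the standard library. It assumes the ROI dimension equals the 68 atlas regions mentioned under preprocessing, and it uses zero-based band indices, whereas MATLAB cell arrays are one-based. Loading the `.mat` files themselves is not shown.

```python
from typing import Optional

# Field meanings and band order as given in the folder description.
FIELDS = {
    "A": "fMRI",
    "B": "EEG amplitude coupling",
    "C": "EEG phase coupling",
}
BANDS = ("Delta", "Theta", "Alpha", "Beta", "Gamma")

N_ROI = 68  # assumption: one ROI per Desikan-Killiany region


def describe(field: str, band_index: Optional[int] = None) -> str:
    """Describe what one array in a subject's structure holds.

    `band_index` is zero-based here; in MATLAB the same cell would be band_index + 1.
    """
    meaning = FIELDS[field]
    if field == "A":
        if band_index is not None:
            raise ValueError("fMRI dynamics are not split by frequency band")
        return f"{meaning}: ROI x ROI x timepoints"
    if band_index is None:
        raise ValueError("EEG fields need a band index (0-4)")
    return f"{meaning}, {BANDS[band_index]} band: ROI x ROI x timepoints"


def unique_connections(n_roi: int = N_ROI) -> int:
    """Number of distinct region pairs in a symmetric ROI x ROI matrix."""
    return n_roi * (n_roi - 1) // 2


def scan_duration_s(volumes: int = 300, tr_s: float = 2.0) -> float:
    """fMRI run length implied by the acquisition parameters (volumes x TR)."""
    return volumes * tr_s
```

With 300 volumes at TR = 2 s the run lasts 600 s, consistent with the stated 10 minutes of resting state. Under the 68-region assumption, each timepoint carries 2278 distinct connections.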

keywords: Connectome; fMRI-EEG; Intracranial; Multiplex

published: 2025-09-25

Data for Observation of a Dynamic Magneto-chiral Instability in Photoexcited Tellurium

Huang, Yijing; Abboud, Nick (2025)

This repository provides the data and code used to reproduce key plots from the manuscript and to extend discussions that were only briefly covered therein. All MATLAB scripts were developed and tested in MATLAB R2024a. All Python scripts were developed and tested in Python 3.11.2. * NOTE: New in this V3: 1. Two new MATLAB files (ChiralPointGroups.m and THz_current_estimation.m), ChiralPointGroups.pdf (a compiled version of ChiralPointGroups.m), and theoretical model code (theoretical_model.zip) were added. More information can be found in the readme. 2. "publication_data.zip" (in V2) was updated and renamed to "data_and_analysis.zip". 3. The license was changed from CC BY to "Other license". Licensing Terms: Data (all .mat files) are under CC BY and code is released under the MIT license. Therefore, V3 is bound to this new license. V2 is still under CC BY. → Data and analysis code (data_and_analysis.zip): The dataset is organized into five subfolders. Each subfolder corresponds to a unique combination of experimental conditions, including: • Magnetic field orientation (B ∥ c or B ⟂ c) • Scan parameter (magnetic field or temperature) • Pump laser polarization (linear s, linear p, or circular) • Detection polarization (linear s) Each folder contains: • The raw time-domain data files (.mat) • Oscillator parameters extracted via a linear prediction algorithm (.mat) • MATLAB scripts (.m) that generate plots of the raw data, processed fits, and amplified modes. Each script should be run within its corresponding folder to ensure proper loading of the associated data files. Folder summary: 1. B_parallel_c_linear_spump_sprobe_field: B ∥ c, s-polarized pump, s-polarized THz detection, magnetic field dependence 2. B_parallel_c_linear_spump_sprobe_temperature: B ∥ c, s-polarized pump, s-polarized THz detection, temperature dependence 3. B_perp_c_linear_spump_sprobe_field: B ⟂ c, s-polarized pump, s-polarized THz detection, magnetic field dependence 4. B_perp_c_linear_spump_sprobe_temperature: B ⟂ c, s-polarized pump, s-polarized THz detection, temperature dependence 5. B_parallel_c_LCPRCP_pump_sprobe_field: B ∥ c, circularly polarized pump (LCP & RCP), s-polarized THz detection, magnetic field dependence → Theoretical model code (theoretical_model.zip): The Python script depends on the packages “numpy” and “matplotlib”. The script generates a plot of the dispersion relations of the theoretical model introduced in the Main Text. More precisely, it plots the real (red) and imaginary (blue) parts of the frequency (ω) as a function of wavenumber (k) as obtained by solving the characteristic equation, equation (6) of the Supplemental Information, with σ_E and σ_Μ given respectively by equations (3) and (2) of the Main Text. All branches of the dispersion relations are plotted simultaneously. All model parameters are adjustable. The included Mathematica notebook (printout also provided in .pdf format) was used to obtain symbolic expressions for the coefficients of powers of ω appearing in the characteristic determinant. These coefficients were copied directly into the Python function detCoeffs(). → Standalone scripts (not in subfolders): • ChiralPointGroups.m Outputs a table summarizing the 2D matrix representation of σ_Μ in the 11 enantiomorphic point groups. ChiralPointGroups.pdf is a compiled version of the chiral point groups table, identical to the output of ChiralPointGroups.m. • THz_current_estimation.m Estimates the photoinduced THz current in tellurium under magnetic field. The script evaluates a phenomenological resonant contribution to the magnetoelectric coupling (with negligible dependence on NIR polarization), leading to excitation of the s-polarized, B-antisymmetric mode S_odd at ~0.37 THz. These standalone scripts provide additional physical discussion and calculation detail that are intentionally streamlined or omitted from the published manuscript and its supplementary materials for clarity and space.

keywords: magneto-chiral instability; THz emission; THz spectroscopy; nonequilibrium states; emergent phenomena; Weyl semiconductor; tellurium; ultrafast spectroscopy; photoexcitation

published: 2025-03-18

Global News Index and Extracted Features Repository (v.1.3.0)

Cline Center for Advanced Social Research (2025)

The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries. Additional Resources: - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form (https://docs.google.com/forms/d/e/1FAIpQLSf-J937V6I4sMSxQt7gR3SIbUASR26KXxqSurrkBvlF-CIQnQ/viewform?usp=pp_url) so we can determine whether you are eligible for access. - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form (https://forms.gle/6eA2yJUGFMtj5swY7). - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form (https://groups.webservices.illinois.edu/subscribe/154221) to subscribe to it. Citation Guidelines: 1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [codebook], v1.3.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V6 2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [database], v1.3.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V6 *NOTE: V6 is replacing V5 with updated ‘Archer’ documents to reflect changes made to the Archer system.
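
The description above mentions that Archer accepts Lucene/Solr query syntax through its ‘raw query’ option. The following is a minimal sketch of how such a query string might be assembled, combining a quoted phrase with a date-range clause. The field names `text` and `publication_date` are hypothetical placeholders, not documented Global News Index fields; the actual field names should be taken from the GNI codebook.

```python
import re

# Characters that carry special meaning in Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene special characters in user-supplied text."""
    return _SPECIAL.sub(r'\\\1', term)


def build_raw_query(phrase: str, text_field: str, year: int, date_field: str) -> str:
    """Build a phrase query restricted to one calendar year.

    text_field and date_field are assumptions supplied by the caller;
    they are not taken from the Global News Index documentation.
    """
    phrase_clause = f'{text_field}:"{escape_term(phrase)}"'
    # Square brackets denote an inclusive range in Lucene syntax.
    date_clause = (
        f'{date_field}:[{year}-01-01T00:00:00Z TO {year}-12-31T23:59:59Z]'
    )
    return f'{phrase_clause} AND {date_clause}'


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(build_raw_query("climate change", "text", 2020, "publication_date"))
```

Escaping user-supplied text before embedding it keeps characters such as `:` or `(` from being interpreted as query operators.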

published: 2024-01-19

Soybean seed quality response to eCO2 data files

Digrado, Anthony; Montes, Christopher; Baxter, Ivan; Ainsworth, Elizabeth (2024)

This data set is related to a SoyFACE experiment conducted in 2004, 2006, 2007, and 2008 with the soybean cultivars Loda and HS93-4118. The experiment looked at how seed elements were affected by elevated CO2 and yield. In this V2, 2 new files were added per journal requirement. In total, there are 5 data files in text format within the digrado_et_al_gcb_data_V2 and 1 readme file. The file names are listed below. Details about headers are explained in the readme.txt file. 1. ionomic_data.txt file contains the ionomic data (mg/kg) for the two cultivars. The file contains all six technical replicates for each plot. The cultivar, year, treatment, and the plot from which the samples were collected are given for each entry. 2. yield_data.txt file contains the yield data for the two cultivars (seed yield in kg/ha, seed yield in bu/a, Protein (%), Oil (%)). The file contains yield data for every plot. The cultivar, year, treatment, and the plot from which the samples were collected are given for each entry. 3. mineral_pro_oil_yield.txt file contains the yield per hectare for each mineral (g/ha) along with the yield per hectare for protein and oil (t/ha). This was obtained by multiplying the seed content of each element (minerals, protein, and oil) by the total seed yield. The file contains yield data for every plot. The cultivar, year, treatment, and the plot from which the samples were collected are given for each entry. 4. economic_assessment.txt file contains data used to assess the financial impact of altered seed oil content on soybean oil production. 5. meteorological_data.txt file contains the meteorological data recorded by a weather station located ~3 km from the experimental site (Willard Airport, Champaign). Data covering the period between May 28 and September 24 were used for 2004; between May 25 and September 24 for 2006; between May 23 and September 17 for 2007; and between June 16 and October 24 for 2008.
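
The per-hectare values in mineral_pro_oil_yield.txt are described as seed content multiplied by total seed yield. A minimal sketch of that arithmetic, with the unit conversions made explicit, is shown below. The function names and the input numbers are illustrative assumptions, not values or code from the dataset, and the sketch assumes concentrations and seed yield are expressed on the same mass basis.

```python
import math


def mineral_yield_g_per_ha(conc_mg_per_kg: float, seed_yield_kg_per_ha: float) -> float:
    """Mineral yield per hectare.

    mg/kg * kg/ha gives mg/ha; dividing by 1000 converts mg to g.
    """
    return conc_mg_per_kg * seed_yield_kg_per_ha / 1000.0


def fraction_yield_t_per_ha(percent: float, seed_yield_kg_per_ha: float) -> float:
    """Protein or oil yield per hectare.

    (percent / 100) * kg/ha gives kg/ha; dividing by 1000 converts kg to t.
    """
    return percent / 100.0 * seed_yield_kg_per_ha / 1000.0


if __name__ == "__main__":
    # Made-up inputs: 40 mg/kg of a mineral, 36 % protein, 3500 kg/ha seed yield.
    zn = mineral_yield_g_per_ha(40.0, 3500.0)
    protein = fraction_yield_t_per_ha(36.0, 3500.0)
    assert math.isclose(zn, 140.0)      # 140 g/ha
    assert math.isclose(protein, 1.26)  # 1.26 t/ha
    print(f"mineral: {zn:.1f} g/ha, protein: {protein:.2f} t/ha")
```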

keywords: protein; oil; mineral; SoyFACE; nutrient; Glycine max; soybean; yield; CO2; agriculture; climate change

published: 2025-05-21

Pollen of Podocarpus (Podocarpaceae) II: Airyscan confocal superresolution images

Punyasena, Surangi W.; Adaime, Marc-Elie; Jaramillo, Carlos (2025)

This dataset includes a total of 16 images of 2 extant species of Podocarpus (Podocarpaceae) and 23 images of fossil specimens of the morphogenus Podocarpidites. The images were taken using a Zeiss LSM 880 microscope with Airyscan confocal superresolution at 630x magnification (63x/NA 1.4 oil DIC). The images are in the original CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or open microscopy software, such as ImageJ. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html] For Podocarpus (modern specimens): Each folder is labelled by genus and contains all images corresponding to that genus. Detailed information about the folders, files, and specimens can be found in the Excel file "METADATA_Podocarpus_extant.csv". This file includes metadata on: species, slide ID, collection, folder name, file name, and notes. Images are of pollen grains from slides in the Florida Museum of Natural History collections. For Podocarpidites (fossil specimens): Each image is named after the sample from which it was derived. Detailed information about the specimens can be found in the Excel file "METADATA_ Podocarpidites_fossil.csv". This file includes metadata on: the fossil type (Taxon), the slide and sample name (Slide Info), the location of the sample locality (Country, Latitude, Longitude), the age of the sample (Min age, Max age), the location of the specimen on the sample slide (England Finder coordinates), and the image file name. Images are of fossil pollen from slides in Smithsonian Tropical Research Institute collections. Please cite this dataset and listed publications when using these images.

keywords: optical superresolution microscopy; Zeiss Airyscan; CZI images; conifer; saccate pollen; Podocarpus; Podocarpidites

published: 2018-05-21

Geometric analysis of magnetic dimensionality

Karigerasi, Manohar H.; Wagner, Lucas K.; Shoemaker, Daniel P. (2018)

This dataset contains bonding networks and tolerance ranges for geometric magnetic dimensionality. The data can be searched in the html frontend above, code obtained at the GitHub repository, or the raw data can be downloaded as csv below. The csv data contains the results of 42520 compounds (unique icsd_code) from ICSD FindIt v3.5.0. The csv is semicolon-delimited since some fields contain multiple comma-separated values.
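
Because the csv is semicolon-delimited and some fields hold several comma-separated values, a default comma-based reader would split rows incorrectly. The sketch below shows one way to read such a file with Python's standard `csv` module. Only `icsd_code` is named in the description; the other column names and all sample values are hypothetical.

```python
import csv
import io


def read_semicolon_csv(handle, multi_value_columns):
    """Read a semicolon-delimited file and split the listed columns on commas."""
    reader = csv.DictReader(handle, delimiter=";")
    rows = []
    for row in reader:
        for col in multi_value_columns:
            raw = row.get(col) or ""
            # Turn "2,3" into ["2", "3"]; an empty field becomes an empty list.
            row[col] = [v.strip() for v in raw.split(",") if v.strip()]
        rows.append(row)
    return rows


# Tiny in-memory stand-in for the real file; "formula" and "dimensions"
# are made-up column names used only for illustration.
sample = io.StringIO(
    "icsd_code;formula;dimensions\n"
    "12345;Fe2O3;2,3\n"
    "67890;MnO;3\n"
)
rows = read_semicolon_csv(sample, ["dimensions"])

if __name__ == "__main__":
    for r in rows:
        print(r["icsd_code"], r["formula"], r["dimensions"])
```

For the real download, the same function could be called on a file opened with `open(path, newline="")`.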

keywords: materials science; physics; magnetism; crystallography

published: 2024-03-25

Data for "Differing physiological performance of coexisting cool- and warmwater fish species under heatwaves in the Midwestern United States"

Suski, Cory; Dai, Qihong (2024)

This is the dataset for the manuscript titled, "Differing physiological performance of coexisting cool- and warmwater fish species under heatwaves in the Midwestern United States"

keywords: climate change; heat wave; metabolic rate; swimming; predator-prey interaction; thermal tolerance; Sander vitreus; walleye; largemouth bass; species distributions

published: 2024-07-09

Data matrices for "Missing Data and Model Selection in Phylogenomics: A Re-Evaluation of Cicadomorpha (Hemiptera: Auchenorrhyncha) Superfamily Level Relationships Under Site-Heterogeneous Models"

Yan, Bin; Dietrich, Christopher; Yu, Xiaofei; Jiang, Yan; Dai, Renhuai; Du, Shiyu; Cai, Chenyang; Yang, Maofa; Zhang, Feng (2024)

The included files are the alignments of DNA or amino acid sequences used for phylogenetic analyses of Auchenorrhyncha (Insecta: Hemiptera) in the manuscript by Yan et al. submitted to the journal “Systematic Entomology.” The files are plain text in either FASTA (.fa or .fas suffix) or PHYLIP (.phy suffix) format. Matrix0 is the set of all loci after multiple sequence alignment and trimming. Matrix1 consists of loci having 75% average bootstrap support and 80% taxon completeness. Matrix2 consists of loci having 75% average bootstrap support and 95% completeness. Matrix2_nt12 is the same as Matrix2 but with third codon positions excluded. More details on how the datasets were compiled are provided in the Methods section of the manuscript file, also included as a PDF. Supplemental figures for the submitted manuscript are also provided as a PDF for additional information.
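
Two of the operations described above, excluding third codon positions (as in Matrix2_nt12) and measuring taxon completeness of a locus, can be illustrated with a short sketch. This is not the authors' pipeline: the parser is a minimal FASTA reader, the alignment is assumed to start at codon position 1, and completeness is assumed here to mean the fraction of expected taxa that have at least one non-gap character in the locus.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def drop_third_positions(seq):
    """Keep codon positions 1 and 2; assumes the sequence starts in frame."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def taxon_completeness(alignment, expected_taxa):
    """Fraction of expected taxa with at least one non-gap character."""
    present = sum(
        1 for taxon in expected_taxa
        if set(alignment.get(taxon, "")) - set("-?")
    )
    return present / len(expected_taxa)


# Made-up example locus; taxon names and sequences are illustrative only.
example = """>taxonA
ATGGCC
TTT
>taxonB
ATG---TTT
"""
aln = parse_fasta(example)
nt12 = {name: drop_third_positions(seq) for name, seq in aln.items()}
completeness = taxon_completeness(aln, ["taxonA", "taxonB", "taxonC"])

if __name__ == "__main__":
    print(nt12)
    print(f"taxon completeness: {completeness:.2f}")
```

Under these assumptions, a locus would pass an 80% completeness filter only if `taxon_completeness` returns at least 0.8 for the full taxon list.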

keywords: Insecta; Phylogeny; DNA sequence; Evolution

published: 2018-03-08

Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016

Imker, Heidi (2018)

This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.

keywords: databases; research infrastructure; sustainability; data sharing; molecular biology; bioinformatics; bibliometrics