Illinois Data Bank Dataset Search Results
Results
published:
2023-07-05
Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal; Lischwe Mueller, Natalie
(2023)
The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below.
This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis or are PSRs. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. They are unused PSRs. If visualized with the inclusion network, they will appear as isolated nodes.
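The construction described above can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the authors' code: the column names (report_id, report_type, srr_id, psr_id) are hypothetical and may not match the headers of report_list.csv or inclusion_net_edges.csv.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for report_list.csv and inclusion_net_edges.csv.
# All column names and IDs here are hypothetical.
REPORTS_CSV = """report_id,report_type
1,SRR
2,SRR
3,PSR
4,PSR
5,PSR
"""
EDGES_CSV = """srr_id,psr_id
1,3
2,3
2,4
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_inclusion_network(report_rows, edge_rows):
    """Return (adjacency, unused_psrs) for the directed SRR -> PSR network."""
    node_type = {r["report_id"]: r["report_type"] for r in report_rows}
    adjacency = defaultdict(set)
    included = set()
    for edge in edge_rows:
        srr, psr = edge["srr_id"], edge["psr_id"]
        # The network is bipartite: every edge must run from an SRR to a PSR.
        if node_type[srr] != "SRR" or node_type[psr] != "PSR":
            raise ValueError(f"edge {srr} -> {psr} is not SRR -> PSR")
        adjacency[srr].add(psr)
        included.add(psr)
    # Unused PSRs are PSR nodes with no incoming edge (isolated nodes).
    unused = sorted(
        n for n, t in node_type.items() if t == "PSR" and n not in included
    )
    return dict(adjacency), unused


adjacency, unused_psrs = build_inclusion_network(
    load_rows(REPORTS_CSV), load_rows(EDGES_CSV)
)
print(adjacency)
print(unused_psrs)
```

In the toy data, report 5 is a PSR that no SRR includes, so it is reported as unused.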
We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports.
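One possible way to derive a co-author network from report-author pairs is sketched below. It assumes each row of the author file links one report to one disambiguated author; the identifiers are hypothetical, and the layout of salt_cont_author.csv may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (report_id, author_id) pairs standing in for rows of
# salt_cont_author.csv; the real column layout may differ.
AUTHORSHIPS = [
    ("r1", "a1"), ("r1", "a2"),
    ("r2", "a2"), ("r2", "a3"),
    ("r3", "a1"), ("r3", "a2"),
]


def coauthor_edges(authorships):
    """Count, for each pair of authors, how many reports they share."""
    authors_by_report = {}
    for report_id, author_id in authorships:
        authors_by_report.setdefault(report_id, set()).add(author_id)
    weights = Counter()
    for authors in authors_by_report.values():
        # Sorting keeps each undirected pair in one canonical order.
        for pair in combinations(sorted(authors), 2):
            weights[pair] += 1
    return weights


edges = coauthor_edges(AUTHORSHIPS)
print(dict(edges))
```

Here the edge weight is the number of shared reports; other definitions (for example, linking reports that share an author) are equally plausible readings of "co-author network".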
We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016).
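The idea behind a potential inclusion link can be illustrated as a date comparison. This is only a sketch under the assumption that publication dates and search dates are available as full calendar dates; the dates below are invented for illustration.

```python
from datetime import date


def potentially_eligible(psr_pub_date, srr_search_date):
    """A PSR is potentially eligible if it was published as of the search date."""
    return psr_pub_date <= srr_search_date


# Invented example dates, not values from the dataset.
search_date = date(2010, 6, 30)
print(potentially_eligible(date(2009, 1, 15), search_date))
print(potentially_eligible(date(2011, 3, 1), search_date))
```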
UPDATES IN THIS VERSION COMPARED TO V2 (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V2)
- We added a new column "pub_date" to report_list.csv
- We corrected mistakes in supplementary_reference_list.pdf for report #28 and report #80. The author of report #28 is not Salisbury D but Khaw, K.-T., & Barrett-Connor, E. Report #80 was mistakenly mixed up with report #81.
keywords:
systematic reviews; evidence synthesis; network analysis; public health; salt controversy;
published:
2025-03-12
Jeong, Gangwon; Villa, Umberto; Park, Seonyeong; Anastasio, Mark A.
(2025)
References
- Jeong, Gangwon, Umberto Villa, and Mark A. Anastasio. "Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints." Photoacoustics (2025): 100700.
- Park, Seonyeong, et al. "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer." Journal of biomedical optics 28.6 (2023): 066002-066002.
Overview
- This dataset includes 80 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for photoacoustic computed tomography (PACT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in PACT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as a .mat file. The filenames follow this format: {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
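The naming rule above can be checked with a small parser. This is a sketch that assumes the subject ID consists only of digits, as in the example filename.

```python
import re

# One breast-type letter (A-D), then a numeric subject ID, then ".mat".
# Treating the ID as digits-only is an assumption based on the example.
FILENAME_PATTERN = re.compile(r"^([ABCD])(\d+)\.mat$")


def parse_filename(filename):
    """Return (breast_type, subject_id), or raise ValueError if malformed."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return match.group(1), match.group(2)


print(parse_filename("D510022534.mat"))
```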
File Contents
- Each file contains the following variables:
> "type": Breast type
> "p0": Initial pressure distribution [Pa]
> "sos": Speed-of-sound map [mm/μs]
> "att": Acoustic attenuation (power-law prefactor) map [dB/ MHzʸ mm]
> "y": power-law exponent
> "pressure_lossless": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, under the assumption of a lossless medium (corresponding to Studies I, II, and III).
> "pressure_lossy": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, incorporating a power-law acoustic absorption model to account for medium losses (corresponding to Study IV).
* The pressure data were simulated using a ring-array transducer that consists of 512 receiving elements uniformly distributed along a ring with a radius of 72 mm.
* Note: These pressure data are noiseless simulations. In Studies II–IV of the referenced paper, additive Gaussian i.i.d. noise was added to the measurement data. Users may add similar noise to the provided data as needed for their own studies.
- In Study I, all spatial maps (e.g., sos) have dimensions of 512 × 512 pixels, with a pixel size of 0.32 mm × 0.32 mm.
- In Study II and Study III, all spatial maps (sos) have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
- In Study IV, both the sos and att maps have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
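The geometry stated above can be sanity-checked numerically. The sketch below computes the side length of each grid and the positions of 512 elements on a 72 mm ring. The ring being centered on the grid and the starting angle of zero are assumptions, not details given in this description.

```python
import math

N_ELEMENTS = 512
RING_RADIUS_MM = 72.0


def field_of_view_mm(n_pixels, pixel_size_mm):
    """Side length of a square grid in millimetres."""
    return n_pixels * pixel_size_mm


def ring_element_positions(n_elements, radius_mm):
    """(x, y) positions of elements spaced uniformly on a ring.

    The ring is assumed to be centered at the origin with the first
    element at angle zero; both choices are illustrative assumptions.
    """
    step = 2.0 * math.pi / n_elements
    return [
        (radius_mm * math.cos(k * step), radius_mm * math.sin(k * step))
        for k in range(n_elements)
    ]


fov_coarse = field_of_view_mm(512, 0.32)   # Study I grid
fov_fine = field_of_view_mm(1024, 0.16)    # Study II-IV grid
positions = ring_element_positions(N_ELEMENTS, RING_RADIUS_MM)

print(fov_coarse, fov_fine)
print(len(positions))
```

Both grids span the same physical extent of about 163.84 mm per side, which is larger than the 144 mm ring diameter.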
keywords:
Medical imaging; Photoacoustic computed tomography; Numerical phantom; Joint reconstruction
published:
2025-10-10
Clark, Teresa J.; Schwender, Jorg
(2025)
Upregulation of triacylglycerols (TAGs) in vegetative plant tissues such as leaves has the potential to drastically increase the energy density and biomass yield of bioenergy crops. In this context, constraint-based analysis has the promise to improve metabolic engineering strategies. Here we present a core metabolism model for the C4 biomass crop Sorghum bicolor (iTJC1414) along with a minimal model for photosynthetic CO2 assimilation, sucrose and TAG biosynthesis in C3 plants. Extending iTJC1414 to a four-cell diel model we simulate C4 photosynthesis in mature leaves with the principal photo-assimilatory product being replaced by TAG produced at different levels. Independent of specific pathways and per unit carbon assimilated, energy content and biosynthetic demands in reducing equivalents are about 1.3 to 1.4 times higher for TAG than for sucrose. For plant generic pathways, ATP- and NADPH-demands per CO2 assimilated are higher by 1.3- and 1.5-fold, respectively. If the photosynthetic supply in ATP and NADPH in iTJC1414 is adjusted to be balanced for sucrose as the sole photo-assimilatory product, overproduction of TAG is predicted to cause a substantial surplus in photosynthetic ATP. This means that if TAG synthesis was the sole photo-assimilatory process, there could be an energy imbalance that might impede the process. Adjusting iTJC1414 to a photo-assimilatory rate that approximates field conditions, we predict possible daily rates of TAG accumulation, dependent on varying ratios of carbon partitioning between exported assimilates and accumulated oil droplets (TAG, oleosin) and in dependence of activation of futile cycles of TAG synthesis and degradation. We find that, based on the capacity of leaves for photosynthetic synthesis of exported assimilates, mature leaves should be able to reach a 20% level of TAG per dry weight within one month if only 5% of the photosynthetic net assimilation can be allocated into oil droplets. 
From this we conclude that high TAG levels should be achievable if TAG synthesis is induced only during a final phase of the plant life cycle.
keywords:
Feedstock Production;Modeling
published:
2019-12-03
These are the alignments of transcriptome data used for the analysis of members of Heteroptera. This dataset is analyzed in "Deep instability in the phylogenetic backbone of Heteroptera is only partly overcome by transcriptome-based phylogenomics" published in Insect Systematics and Diversity.
keywords:
Heteroptera; Hemiptera; Phylogenomics; transcriptome
published:
2023-03-27
Littlefield, Alexander; Xie, Dajie; Richards, Corey; Ocier, Christian; Gao, Haibo; Messinger, Jonah; Ju, Lawrence; Gao, Jingxing; Edwards, Lonna; Braun, Paul; Goddard, Lynford
(2023)
This dataset contains the full data used in the paper titled "Enabling High Precision Gradient Index Control in Subsurface Multiphoton Lithography," available at https://doi.org/10.1021/acsphotonics.2c01950 .
The data used for Table 1 can be found in the dataset for the related Figure 8.
Some supplemental figures' data can be found in the main figures data:
Figure S2's data is contained in Figure 6.
Figure S4 and Table S1 data is derived from Figure 6.
Figure S9 is derived from Figure 7.
Figure S10 is contained in Figure 7.
Figure S12 is derived from Figure 6 and the Python code prism-fringe-analysis.
Figures without a data file named after them do not have any data affiliated with them and are purely graphical representations.
published:
2021-02-18
Wang, Shaowen; Lyu, Fangzheng; Wang, Shaohua; Catlet, Charles; Padmanabhan, Anand; Soltani, Kiumars
(2021)
Increasingly pervasive location-aware sensors interconnected with rapidly advancing wireless network services are motivating the development of near-real-time urban analytics. This development has revealed both tremendous challenges and opportunities for scientific innovation and discovery. However, state-of-the-art urban discovery and innovation are not well equipped to resolve the challenges of such analytics, which in turn limits new research questions from being asked and answered. Specifically, commonly used urban analytics capabilities are typically designed to handle, process, and analyze static datasets that can be treated as map layers and are consequently ill-equipped in (a) resolving the volume and velocity of urban big data; (b) meeting the computing requirements for processing, analyzing, and visualizing these datasets; and (c) providing concurrent online access to such analytics. To tackle these challenges, we have developed a novel cyberGIS framework that includes computationally reproducible approaches to streaming urban analytics. This framework is based on CyberGIS-Jupyter, through integration of cyberGIS and real-time urban sensing, for achieving capabilities that have previously been unavailable toward helping cities solve challenging urban informatics problems.
The files included in this dataset function as follows:
1) Spatial_interpolation.ipynb is a Python-based Jupyter notebook that enables users to conduct spatial interpolation with AoT data;
2) Urban_Informatics.ipynb is a Jupyter notebook that helps to explore the AoT dataset;
3) chicago-complete.weekly.2019-09-30-to-2019-10-06.tar includes all the high-frequency urban sensing data from AoT sensors from 2019 September 30th to 2019 October 6th collected in Chicago, US;
4) sensors.csv is a processed dataset including information about the temperature in Chicago, and it is used in Spatial_interpolation.ipynb.
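As an illustration of what spatial interpolation of sensor temperatures can look like, the sketch below uses inverse distance weighting (IDW). The method used in Spatial_interpolation.ipynb is not stated in this description, so IDW is an assumed stand-in, and the sample points are invented planar coordinates rather than values from sensors.csv.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: iterable of (sx, sy, value) in planar coordinates.
    Real sensor locations given as latitude/longitude would need to be
    projected first; that step is omitted here.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # Query point coincides with a sensor: return its reading.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Invented readings: two sensors two units apart.
toy_sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(toy_sensors, 1.0, 0.0))
```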
keywords:
CyberGIS; Urban informatics; Array of Things
published:
2019-05-20
Lao, Yuyang; Schiffer, Peter
(2019)
This is the experimental data of tetris artificial spin ice. The islands are made of Permalloy and measure 170 nm by 470 nm by 2.5 nm. The systems are measured near room temperature, at a temperature where the islands are fluctuating. The data is recorded as photoemission electron microscopy intensity. More details about the dataset can be found in the files Note.txt and Tetris_data_list.xlsx.
Note:
Two files, named bl11_teris600_033 and bl11_tetris600_2_135, are not recorded in the Excel sheet because they were corrupted during the measurement. Any data that is not recorded in the Excel sheet is either corrupted or of low quality.
For files *_028 to *_049, tetris is spelled with the “t”, while in the raw data folder it is spelled without the “t” (teris). This is a typo. Throughout the dataset, tetris and teris have the same meaning.
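When matching file names against the Excel sheet, the two spellings can be normalized first. The helper below is a hypothetical convenience, not part of the dataset.

```python
def normalize_name(name):
    """Map the misspelling 'teris' to 'tetris'.

    'tetris' does not contain the substring 'teris', so names that are
    already spelled correctly pass through unchanged.
    """
    return name.replace("teris", "tetris")


print(normalize_name("bl11_teris600_033"))
print(normalize_name("bl11_tetris600_2_135"))
```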
keywords:
artificial spin ice
published:
2020-10-15
Khanna, Madhu; Wang, Weiwei; Wang, Michael
(2020)
This dataset consists of various input data that are used in the GAMS model. All the data are in .inc format, which can be read within GAMS or Notepad. Main data sources include: acreage data (acre), crop budget data ($/acre), crop yield data (e.g., bushel/acre), and soil carbon sequestration data (kg CO2/ha/yr). Model details can be found in "Assessing the Additional Carbon Savings with Biofuel" and in the GAMS model package.
## File Description
(1) GAMS Model.zip: This includes all the input files and scripts for running the model
(2) Table*.csv: These files include the data from the tables in the manuscript
(3) Figure2_3_4.csv: This contains the data used to create the figures in the manuscript
(4) BaselineResults.csv: This includes a summary of the model results.
(5) SensitivityResults_*.csv: Model results from the various sensitivity analyses performed
(6) LUC_emission.csv: land use change emissions by crop reporting district for changes of pasturelands to annual crops.
keywords:
Biogenic carbon intensity; Corn ethanol; Economic model; Dynamic optimization; Anticipated baseline approach; Life cycle carbon intensity
published:
2022-02-11
Hoang, Khanh Linh; Schneider, Jodi; Kansara, Yogeshwar
(2022)
The data contains a list of articles given low scores by the RCT Tagger and an error analysis of them, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
The change made in V3 is that the data is divided into two parts:
- Error Analysis of 44 Low Scoring Articles with MEDLINE RCT Publication Type.
- Error Analysis of 244 Low Scoring Articles without MEDLINE RCT Publication Type.
keywords:
Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published:
2024-03-25
Xia, Yushu; Kwon, Hoyoung; Wander, Michelle
(2024)
The accompanying study is published under the title "Estimating soil N2O emissions induced by organic and inorganic fertilizer inputs using a Tier-2, regression-based meta-analytic approach for U.S. agricultural lands" in Science of the Total Environment. The study is authored by Dr. Yushu Xia, Dr. Hoyoung Kwon, and Dr. Michelle Wander. The DOI for this study is https://doi.org/10.1016/j.scitotenv.2024.171930 .
keywords:
soil; nitrous oxide; agriculture; fertilizers; meta-analysis
published:
2025-07-30
Skorupa, A. J.; Bried, J. T.
(2025)
This dataset includes three data files for linking species' climate sensitivity, trait combinations, and listing status. It contains species occurrence data within Hydrologic Unit Code 12 (HUC12) watersheds, along with trait information and Rarity and Climate Sensitivity (RCS) index scores for lotic caddisflies, stoneflies, mussels, dragonflies, and crayfish across all Midwest Climate Adaptation Science Center states: Minnesota, Iowa, Missouri, Wisconsin, Illinois, Indiana, Michigan, and Ohio. For mussels, the geographic scope is expanded to include all Midwest Regional Species of Greatest Conservation Need (RSGCN) states—North Dakota, South Dakota, Nebraska, Kansas, and Kentucky. However, occurrence data for mussels is not included due to data-sharing agreements. Metadata are included with each data file. Please refer to the associated manuscript for original data sources, trait references, and details on the RCS index calculation.
keywords:
climate sensitivity; conservation status; traits; aquatic invertebrates; Midwest
published:
2019-08-05
Skinner, Rachel; Dietrich, Christopher; Walden, Kimberly; Gordon, Eric; Sweet, Andrew; Podsiadlowski, Lars; Petersen, Malte; Simon, Chris; Takiya, Daniela; Johnson, Kevin
(2019)
The data in this directory corresponds to:
Skinner, R.K., Dietrich, C.H., Walden, K.K.O., Gordon, E., Sweet, A.D., Podsiadlowski, L., Petersen, M., Simon, C., Takiya, D.M., and Johnson, K.P.
Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict.
Systematic Entomology.
Correspondence should be directed to: Rachel K. Skinner, rskinn2@illinois.edu
If you use these data, please cite our paper in Systematic Entomology.
The following files can be found in this dataset:
Amino_acid_concatenated_alignment.phy: the amino acid alignment used in this analysis in phylip format.
Amino_acid_raxml_partitions.txt (for reference only): the partitions for the amino acid alignment, but a partitioned amino acid analysis was not performed in this study.
Amino_acid_concatenated_tree.newick: the best maximum likelihood tree with bootstrap values in newick format.
ASTRAL_input_gene_trees.tre: the concatenated gene tree input file for ASTRAL
README_pie_charts.md: explains the scripts and data needed to recreate the pie charts figure from our paper. There is also another
Corresponds to the following files:
ASTRAL_species_tree_EN_only.newick: the species tree with only effective number (EN) annotation
ASTRAL_species_tree_pp1_only.newick: the species tree with only the posterior probability 1 (main topology) annotation
ASTRAL_species_tree_q1_only.newick: the species tree with only the quartet scores for the main topology (q1)
ASTRAL_species_tree_q2_only.newick: the species tree with only the quartet scores for the first alternative topology (q2)
ASTRAL_species_tree_q3_only.newick: the species tree with only the quartet scores for the second alternative topology (q3)
print_node_key_files.py: script needed to create the following files:
node_keys.key: text file with node IDs and topologies
complete_q_scores.key: text file with node IDs multiplied q scores
EN_node_vals.key: text file with node IDs and EN values
create_pie_charts_tree.py: script needed to visualize the tree with pie charts, pp1, and EN values plotted at nodes
ASTRAL_species_tree_full_annotation.newick: the species tree with full annotation from the ASTRAL analysis.
NOTE: It may be more useful to examine individual value files if you want to visualize the tree,
e.g., in figtree, since the full annotations are extensive and can make viewing difficult.
Complete_NT_concatenated_alignment.phy: the nucleotide alignment that includes unmodified third codon positions. The alignment is in phylip format.
Complete_NT_raxml_partitions.txt: the raxml-style partition file of the nucleotide partitions
Complete_NT_concatenated_tree.newick: the best maximum likelihood tree from the concatenated complete analysis NT with bootstrap values in newick format
Complete_NT_partitioned_tree.newick: the best maximum likelihood tree from the partitioned complete NT analysis with bootstrap values in newick format
Degeneracy_coded_nt_concatenated_alignment.phy: the degeneracy coded nucleotide alignment in phylip format
Degeneracy_coded_nt_raxml_partitions.txt: the raxml-style partition file for the degeneracy coded nucleotide alignment
Degeneracy_coded_nt_concatenated_tree.newick: the best maximum likelihood tree from the degeneracy-coded concatenated analysis with bootstrap values in newick format
Degeneracy_coded_nt_partitioned_tree.newick: the best maximum likelihood tree from the degeneracy-coded partitioned analysis with bootstrap values in newick format
count_ingroup_taxa.py: script that counts the number of ingroup and/or outgroup taxa present in an alignment
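Counting ingroup and outgroup taxa in a PHYLIP alignment can be sketched as follows. This is not the authors' count_ingroup_taxa.py. It assumes a sequential PHYLIP layout with one "name sequence" pair per line, and the taxon names and outgroup set are hypothetical.

```python
# Toy sequential PHYLIP alignment; names and sequences are invented.
PHYLIP_TEXT = """4 8
Taxon_A ACGTACGT
Taxon_B ACGTAC-T
Out_1 ACGAACGT
Out_2 ACGAACGA
"""


def count_taxa(phylip_text, outgroup_names):
    """Return ingroup/outgroup counts for a sequential PHYLIP alignment."""
    lines = [line for line in phylip_text.splitlines() if line.strip()]
    # The header line holds the number of taxa and the alignment length.
    n_taxa, _n_chars = (int(token) for token in lines[0].split())
    names = [line.split()[0] for line in lines[1:1 + n_taxa]]
    if len(names) != n_taxa:
        raise ValueError("header taxon count does not match the data lines")
    n_outgroup = sum(1 for name in names if name in outgroup_names)
    return {"ingroup": n_taxa - n_outgroup, "outgroup": n_outgroup}


counts = count_taxa(PHYLIP_TEXT, outgroup_names={"Out_1", "Out_2"})
print(counts)
```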
keywords:
Auchenorrhyncha; Hemiptera; alignment; trees
published:
2025-10-10
Dong, Chang; Shi, Zhuwei; Huang, Lei; Zhao, Huimin; Xu, Zhinan; Lian, Jiazhang
(2025)
The mitochondrion is generally considered the most promising subcellular organelle for compartmentalization engineering. Much progress has been made in reconstituting whole metabolic pathways in the mitochondria of yeast to harness the precursor pools (i.e., pyruvate and acetyl-CoA), bypass competing pathways, and minimize transportation limitations. However, only a few mitochondrial targeting sequences (MTSs) have been characterized (i.e., MTS of COX4), limiting the application of compartmentalization engineering for multigene biosynthetic pathways in the mitochondria of yeast. In the present study, based on the mitochondrial proteome, a total of 20 MTSs were cloned, and the efficiency of these MTSs in targeting heterologous proteins, including the Escherichia coli FabI and enhanced green fluorescence protein (EGFP), into the mitochondria was evaluated by growth complementation and confocal microscopy. After systematic characterization, six well-performing MTSs were chosen for the colocalization of complete biosynthetic pathways into the mitochondria. As proof of concept, the full α-santalene biosynthetic pathway, consisting of 10 expression cassettes capable of converting acetyl-CoA to α-santalene, was compartmentalized into the mitochondria, leading to a 3.7-fold improvement in the production of α-santalene. The newly characterized MTSs should contribute to the expanded metabolic engineering and synthetic biology toolbox for yeast mitochondrial compartmentalization engineering.
keywords:
Conversion;Metabolic Engineering
published:
2020-11-05
Miller, Andrew; Raudabaugh, Daniel
(2020)
This version 2 dataset contains 34 files in total with one (1) additional file, called "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining files (33) are identical to version 1. The following is the information about the new file and its variables:
Culture-dependent Isolate table with taxonomic determination and sequence data.csv: Culture table with assigned taxonomy from NCBI. A single-direction sequence for each isolate is included if one could be obtained. The sequence is derived from ITS1F-ITS4 PCR amplicons, with Sanger sequencing in one direction using ITS5. The file contains 20 variables, explained below:
IsolateNumber: unique number identifying each cultured isolate
Time: season in which the sample was collected
Location: the specific name of the location
Habitat: type of habitat : either stream or peatland
State: state in the USA in which the specific location is located
Incubation_pH ID: pH of the medium during isolation of fungal cultures
Genus: phylogenetic genus of the fungal isolates (determined by sequence similarity)
Sequence_quality: base call quality of the entire sequence used for blast analysis, if known
%_coverage: sequence coverage reported from GenBank
%_ID: sequence similarity reported from GenBank
Life_style : ecological life style if known
Phylum: phylogenetic phylum as indicated by Index Fungorum
Subphylum: phylogenetic subphylum as indicated by Index Fungorum
Class: phylogenetic class as indicated by Index Fungorum
Subclass: phylogenetic subclass as indicated by Index Fungorum
Order: phylogenetic order as indicated by Index Fungorum
Family: phylogenetic Family as indicated by Index Fungorum
ITS5_Sequence: single direction sequence used for sequence similarity match using blastn. Primer ITS5
Fasta: sequence with nomenclature in a fasta format for easy cut and paste into phylogenetic software
Note: blank cells mean no data is available or unknown.
keywords:
ITS1 forward reads; Illumina; peatlands; streams; bogs; fens
published:
2019-05-07
Detmer, Thomas; Wahl, David
(2019)
Data set of trophic cascade in mesocosm experiments for zooplankton (biomass and body size) and phytoplankton (chlorophyll a concentration) caused by Bluegill, as well as zooplankton production in those same treatment groups. Zooplankton were collected by tube sampler and phytoplankton were collected through grab samples.
keywords:
Trophic cascades; size-selective predation; compensatory mechanisms; biomanipulation; invasive fish; Daphnia; Moina
published:
2020-02-12
Price, Edward; Spyreas, Greg; Matthews, Jeffrey
(2020)
This is the dataset used in the Landscape Ecology publication of the same name. This dataset consists of the following files:
NWCA_Int_Veg.txt
NWCA_Reg_Veg.txt
NWCA_Site_Attributes.txt
NWCA_Int_Veg.txt is a site and plot by species matrix. The column labeled SITES consists of site IDs. The column labeled Plots consists of plot ID numbers. All other columns represent species abundances (estimates of percent cover, summed across five plots).
NWCA_Reg_Veg.txt is a site by species matrix of species abundances. The column labeled SITES consists of site IDs. All other columns represent species abundances (estimates of percent cover within individual plots).
NWCA_Site_Attributes.txt is a matrix of site attributes. The column labeled SITES consists of site IDs. The column labeled AA_CENTER_LAT consists of latitudinal coordinates for the Assessment Area center point in decimal degrees. The column labeled AA_CENTER_LONG consists of longitudinal coordinates for the Assessment Area center point in decimal degrees. The column REFPLUS_NWCA represents disturbance gradient classes including MIN (minimally disturbed), L (least disturbed), I (intermediate), and M (most disturbed). The column REFPLUS_NWCA2 represents revised disturbance gradient classes based on protocols described in the article. These revised classes were used for analysis. The column labeled STRESS_HEAVYMETAL represents heavy metal stressor classes, used to ascertain which wetlands were missing soil data. Classes in the STRESS_HEAVYMETAL column include Low, Moderate, High, and Missing. Sites with Missing STRESS_HEAVYMETAL classes were removed from analysis.
More information about this dataset: All of the data used in this analysis was gathered from the National Wetlands Condition Assessment. Wetland surveys were conducted from 4/4/2011 to 11/2/2011. The entire National Wetlands Condition Assessment Dataset, which includes 3640 unique taxonomic identities of plants, can be found at: https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys
keywords:
Anthropogenic disturbance; β-Diversity; Biotic homogenization; Phalaris arundinacea; reed canary grass; Wetlands
published:
2024-12-05
Salami, Malik Oyewale; McCumber, Corinne
(2024)
This project investigates retraction indexing agreement among data sources: BCI, BIOABS, CCC, Compendex, Crossref, GEOBASE, MEDLINE, PubMed, Retraction Watch, Scopus, and Web of Science Core. Post-retraction citation may be partly due to authors’ and publishers' challenges in systematically identifying retracted publications. To investigate retraction indexing quality, we investigate the agreement in indexing retracted publications between 11 database sources, restricting to their coverage, resulting in a union list of 85,392 unique items. We also discuss common errors in indexing retracted publications. Our results reveal low retraction indexing agreement scores, indicating that databases widely disagree on indexing retracted publications they cover, leading to a lack of consistency in what publications are identified as retracted. Our findings highlight the need for clear and standard practices in the curation and management of retracted publications.
Pipeline code to get the result files can be found in the GitHub repository https://github.com/infoqualitylab/retraction-indexing-agreement in the ‘src’ folder containing iPython notebooks.
The ‘unionlist_completed-ria_2024-07-09.csv’ file has been redacted to remove proprietary data, as noted below in README.txt. Among our sources, data is openly available only for Crossref, PubMed, and Retraction Watch.
FILE FORMATS:
1) unionlist_completed-ria_2024-07-09.csv - UTF-8 CSV file
2) README.txt - text file
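The notion of a union list across sources can be illustrated with sets. This sketch is not the project's pipeline (see the GitHub repository for that). The identifiers are invented, and keying items by lowercase DOI is an assumption.

```python
# Invented identifiers for three of the openly available sources.
SOURCES = {
    "Crossref": {"10.1000/A1", "10.1000/b2"},
    "PubMed": {"10.1000/b2", "10.1000/c3"},
    "Retraction Watch": {"10.1000/a1", "10.1000/c3", "10.1000/d4"},
}


def build_union_list(sources):
    """Map each unique item to the set of sources that index it as retracted."""
    union = {}
    for source_name, items in sources.items():
        for item in items:
            # Lowercasing is an assumed normalization so that the same DOI
            # written with different casing is counted once.
            union.setdefault(item.lower(), set()).add(source_name)
    return union


union_list = build_union_list(SOURCES)
for item in sorted(union_list):
    print(item, sorted(union_list[item]))
```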
keywords:
retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS
published:
2018-03-01
The data set consists of Illumina sequences derived from 48 sediment samples, collected in 2015 from Lake Michigan and Lake Superior for the purpose of inventorying the fungal diversity in these two lakes. DNA was extracted from ca. 0.5 g of sediment using the MoBio PowerSoil DNA isolation kits following the Earth Microbiome protocol. PCR was completed with the fungal primers ITS1F and fITS7 using the Fluidigm Access Array. The resulting amplicons were sequenced using the Illumina HiSeq 2500 platform with rapid 2 x 250 nt paired-end reads. The enclosed data sets contain the forward read files for both primers, both fixed-header index files, and the associated map files needed for processing in QIIME. In addition, enclosed are two rarefied OTU files used to evaluate fungal diversity. All decimal latitude and decimal longitude coordinates of our collecting sites are also included.
File descriptions:
Great_lakes_Map_coordinates.xlsx = coordinates of sample sites
QIIME Processing ITS1 region: These are the raw files used to process the ITS1 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS1_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS1F_Miller_Fludigm_I1_fixedheader.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS1F_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS1 region.
QIIME Processing ITS2 region: These are the raw files used to process the ITS2 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS2_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS7_Miller_Fludigm_I1_Fixedheaders.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS7_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS2 region.
Resulting OTU Table and OTU table with taxonomy
ITS1 Region
wahl_ITS1_R1_otu_table.csv = File contains Representative OTUs based on ITS1 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS1_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS1 region for all the R1 and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
ITS2 Region
wahl_ITS2_R1_otu_table.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS2_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
Rarefied Illumina dataset for each ITS Region
ITS1_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS1 region.
ITS2_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS2 region.
Column headings:
#SampleID = code including researcher initials and sequential run number
BarcodeSequence =
LinkerPrimerSequence = two sequences used CTTGGTCATTTAGAGGAAGTAA or GTGARTCATCGAATCTTTG
ReversePrimer = two sequences used GCTGCGTTCTTCATCGATGC or TCCTCCGCTTATTGATATGC
run_prefix = initials of run operator
Sample = location code, see thesis figures 1 and 2 for mapped locations and Great_lakes_Map_coordinates.xlsx for exact coordinates.
DepthGroup = S=shallow (50-100 m), MS=mid-shallow (101-150 m), MD=mid-deep (151-200 m), and D=deep (>200 m)
Depth_Meters = Depth in meters
Lake = lake name, Michigan or Superior
Nitrogen %
Carbon %
Date = mm/dd/yyyy
pH = acidity, potential of Hydrogen (pH) scale
SampleDescription = Sample or control
X = sequential run number
OTU ID = Operational taxonomic unit ID
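The DepthGroup codes above can be reproduced from Depth_Meters with a small helper. This is a minimal sketch, not the authors' code: the function name `depth_group` is hypothetical, and the handling of depths below 50 m or between the listed integer bounds (for example 100.5 m) is an assumption.

```python
from typing import Optional


def depth_group(depth_m: float) -> Optional[str]:
    """Map a depth in meters to the DepthGroup code listed in the column headings."""
    if depth_m < 50:
        # Assumption: depths shallower than 50 m fall outside the listed groups.
        return None
    if depth_m <= 100:
        return "S"   # shallow (50-100 m)
    if depth_m <= 150:
        return "MS"  # mid-shallow (101-150 m)
    if depth_m <= 200:
        return "MD"  # mid-deep (151-200 m)
    return "D"       # deep (>200 m)
```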
keywords:
Illumina; next-generation sequencing; ITS; fungi
published:
2020-02-05
Zahniser, James; Dietrich, Christopher
(2020)
The Delt_Comb.NEX text file contains the original data used in the phylogenetic analyses of Zahniser & Dietrich, 2013 (European Journal of Taxonomy, 45: 1-211). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first nine lines of the file indicate the file type (Nexus), that 152 taxa were analyzed, that a total of 3971 characters were analyzed, the format of the data, and the specification of two symbols used in the dataset. There are four datasets separated into blocks, one each for: 28S rDNA gene, Histone H3 gene, morphology, and insertion/deletion characters scored based on the alignment of the 28S rDNA dataset. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the publication using this dataset. A text file, Delt_morph_char.txt, is available here that lists the morphological characters and character states that were scored in the Delt_Comb.NEX dataset. The original DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the publication. Chromatogram files for each sequencing read are available from the first author upon request.
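As a quick sanity check before loading the file into phylogenetic software, the taxon and character counts can be read from the NEXUS DIMENSIONS statement. The sketch below uses only the Python standard library; the function name `read_nexus_dimensions` and the `sample_header` text are hypothetical illustrations, and the actual header layout of Delt_Comb.NEX may differ.

```python
import re


def read_nexus_dimensions(text: str) -> dict:
    """Return the key=value integers from the first DIMENSIONS statement, lowercased."""
    match = re.search(r"dimensions\s+([^;]*);", text, flags=re.IGNORECASE)
    if match is None:
        return {}
    pairs = re.findall(r"(\w+)\s*=\s*(\d+)", match.group(1))
    return {key.lower(): int(value) for key, value in pairs}


# Hypothetical header written for illustration; not copied from Delt_Comb.NEX.
sample_header = """#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=152 NCHAR=3971;
"""

dimensions = read_nexus_dimensions(sample_header)
```

For the real file, the same function could be applied to the text returned by reading Delt_Comb.NEX and compared against the 152 taxa and 3971 characters stated above.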
keywords:
phylogeny; DNA sequence; morphology; parsimony analysis; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; bayesian analysis
published:
2019-02-26
Neumann, Elizabeth; Comi, Troy; Rubakhin, Stanislav; Sweedler, Jonathan
(2019)
We have recently created an approach for high-throughput single cell measurements using matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS) (J Am Soc Mass Spectrom. 2017, 28, 1919-1928. doi: 10.1007/s13361-017-1704-1. Chemphyschem. 2018, 19, 1180-1191. doi: 10.1002/cphc.201701364). While chemical detail is obtained on individual cells, it has not been possible to correlate the chemical information with canonical cell types.
Now we combine high-throughput single cell mass spectrometry with immunocytochemistry to determine lipid profiles of two known cell types, astrocytes and neurons from the rodent brain, with the work appearing as “Lipid heterogeneity between astrocytes and neurons revealed with single cell MALDI MS supervised by immunocytochemical classification” (DOI: 10.1002/anie.201812892).
Here we provide the data collected for this study. The dataset provides the raw data and script files for the rodent cerebral cells described in the manuscript.
keywords:
Single cell analysis; mass spectrometry; astrocyte; neuron; lipid analysis
published:
2019-06-12
Miller, Andrew; Raudabaugh, Daniel
(2019)
The data set contains supplemental data sets for the manuscript entitled "Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi."
Environmental sampling: Amplification of nuclear DNA regions (ITS1 and ITS2) was completed using the Fluidigm Access Array, and the resulting amplicons were sequenced on an Illumina MiSeq v2 platform using rapid 2 × 250 nt paired-end reads. For the Illumina sequencing run, amplicons were size-selected into <500 nt and >500 nt sub-pools, then remixed (<500 nt : >500 nt) by nM concentration in a 1x:3x proportion. All amplification and sequencing steps were performed at the Roy J. Carver Biotechnology Center at the University of Illinois Urbana-Champaign.
ITS1 region primers consisted of ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3').
ITS2 region primers consisted of fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3').
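The fITS7 primer contains the degenerate IUPAC code R (A or G), so a plain string comparison will miss some reads that begin with the primer. The following is a minimal sketch of degenerate primer matching using the standard IUPAC nucleotide codes; the function name `starts_with_primer` is hypothetical and this is not part of the authors' QIIME workflow.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read: str, primer: str) -> bool:
    """Return True if the read begins with a sequence compatible with the primer."""
    if len(read) < len(primer):
        return False
    # zip stops at the end of the primer, so only the read prefix is compared.
    return all(
        base in IUPAC[code]
        for base, code in zip(read.upper(), primer.upper())
    )


FITS7 = "GTGARTCATCGAATCTTTG"
```

With this helper, reads starting with either GTGAA... or GTGAG... are accepted for fITS7, while a read with C at the degenerate position is rejected.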
Supplemental files 1 through 5 contain the raw data files.
Supplemental 1 is the ITS1 Illumina MiSeq forward reads and Supplemental 2 is the corresponding index files.
Supplemental 3 is the ITS2 Illumina MiSeq forward reads and Supplemental 4 is the corresponding index files.
Supplemental 5 is the map file needed to process the forward reads and index files in QIIME.
Supplemental 6 and 7 contain the resulting QIIME 1.9.1 OTU tables along with UNITE, NCBI, and CONSTAX taxonomic assignments in addition to the representative OTU sequence.
Numeric samples within the OTU tables correspond to the following:
1 Brachythecium sp.
2 Usnea cornuta
3 Dicranum sp.
4 Leucodon julaceus
5 Lobaria quercizans
6 Rhizomnium sp.
7 Dicranum sp.
8 Thuidium delicatulum
9 Myelochroa aurulenta
10 Atrichum angustatum
11 Dicranum sp.
12 Hypnum sp.
13 Atrichum angustatum
14 Hypnum sp.
15 Thuidium delicatulum
16 Leucobryum sp.
17 Polytrichum commune
18 Atrichum angustatum
19 Atrichum angustatum
20 Atrichum crispulum
21 Bryaceae
22 Leucobryum sp.
23 Conocephalum conicum
24 Climacium americanum
25 Atrichum angustatum
26 Huperzia serrata
27 Polytrichum commune
28 Diphasiastrum sp.
29 Anomodon attenuatus
30 Bryoandersonia sp.
31 Polytrichum commune
32 Thuidium delicatulum
33 Brachythecium sp.
34 Leucobryum glaucum
35 Bryoandersonia sp.
36 Anomodon attenuatus
37 Pohlia sp.
38 Cinclidium sp.
39 Hylocomium splendens
40 Polytrichum commune
41 negative control
42 Soil
43 Soil
44 Soil
45 Soil
46 Soil
47 Soil
If a sample number is not present within the OTU table, either no sequences were obtained or no sequences passed the quality filtering step in QIIME.
Supplemental 8 contains the Summary of unique species per location.
published:
2019-07-08
Kehoe, Adam K.; Torvik, Vetle I.
(2019)
# Overview
These datasets were created in conjunction with the dissertation "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents," by Adam Kehoe.
The datasets consist of the following:
* twin_not_abstract_matched_complete.tsv: a tab-delimited file consisting of pairs of MEDLINE articles with identical titles, authors and years of publication. This file contains the PMIDs of the duplicate publications, as well as their medical subject headings (MeSH) and three measures of their indexing consistency.
* twin_abstract_matched_complete.tsv: the same as above, except that the MEDLINE articles also have matching abstracts.
* mesh_training_data.csv: a comma-separated file containing the training data for the model discussed in the dissertation.
* mesh_scores.tsv: a tab-delimited file containing a pairwise similarity score based on word embeddings, and MeSH hierarchy relationship.
## Duplicate MEDLINE Publications
Both the twin_not_abstract_matched_complete.tsv and twin_abstract_matched_complete.tsv have the same structure. They have the following columns:
1. pmid_one: the PubMed unique identifier of the first paper
2. pmid_two: the PubMed unique identifier of the second paper
3. mesh_one: A list of medical subject headings (MeSH) from the first paper, delimited by the "|" character
4. mesh_two: a list of medical subject headings from the second paper, delimited by the "|" character
5. hoopers_consistency: The calculation of Hooper's consistency between the MeSH of the first and second paper
6. nonhierarchicalfree: a word embedding based consistency score described in the dissertation
7. hierarchicalfree: a word embedding based consistency score additionally limited by the MeSH hierarchy, described in the dissertation.
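Hooper's consistency is commonly defined as A / (A + M + N), where A is the number of terms both indexers assigned and M and N are the numbers of terms assigned by only one of them. The sketch below computes that quantity from two "|"-delimited MeSH strings such as those in mesh_one and mesh_two. The function name is hypothetical, and the dissertation's exact computation (for example, its treatment of subheadings or major-topic markers) may differ.

```python
def hoopers_consistency(mesh_one: str, mesh_two: str) -> float:
    """Shared terms divided by all distinct terms across the two "|"-delimited lists."""
    terms_one = {term.strip() for term in mesh_one.split("|") if term.strip()}
    terms_two = {term.strip() for term in mesh_two.split("|") if term.strip()}
    total = len(terms_one | terms_two)
    if total == 0:
        # Assumption: two empty lists are scored 0.0 rather than raising an error.
        return 0.0
    return len(terms_one & terms_two) / total
```

For example, the lists "Humans|Mice|Neoplasms" and "Humans|Neoplasms|Animals" share two of four distinct terms, giving a consistency of 0.5.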
## MeSH Training Data
The mesh_training_data.csv file contains the training data for the model discussed in the dissertation. It has the following columns:
1. pmid: the PubMed unique identifier of the paper
2. term: a candidate MeSH term
3. cit_count: the log of the frequency of the term in the citation candidate set
4. total_cit: the log of the total number of the paper's citations
5. citr_count: the log of the frequency of the term in the citations of the paper's citations
6. total_citofcit: the log of the total number of the citations of the paper's citations
7. absim_count: the log of the frequency of the term in the AbSim candidate set
8. total_absim_count: the log of the total number of AbSim records for the paper
9. absimr_count: the log of the frequency of the term in the citations of the AbSim records
10. total_absimr_count: the log of the total number of citations of the AbSim record
11. log_medline_frequency: the log of the frequency of the candidate term in MEDLINE.
12. relevance: a binary indicator (True/False) of whether the candidate term was assigned to the target paper
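The column layout above can be read into feature vectors and labels with the standard library. This is a sketch under stated assumptions: it assumes the file has a header row using the column names listed above, the helper name `parse_training_rows` is hypothetical, and the sample row contains made-up values for illustration only.

```python
import csv
import io

# Numeric feature columns, in the order listed above.
FEATURE_COLUMNS = [
    "cit_count", "total_cit", "citr_count", "total_citofcit",
    "absim_count", "total_absim_count", "absimr_count",
    "total_absimr_count", "log_medline_frequency",
]


def parse_training_rows(handle):
    """Yield (pmid, term, features, relevance) tuples from an open CSV handle."""
    for row in csv.DictReader(handle):
        features = [float(row[name]) for name in FEATURE_COLUMNS]
        relevance = row["relevance"].strip().lower() == "true"
        yield row["pmid"], row["term"], features, relevance


# Made-up row for illustration; not taken from mesh_training_data.csv.
sample = io.StringIO(
    "pmid,term," + ",".join(FEATURE_COLUMNS) + ",relevance\n"
    "12345,Humans,1.5,2.0,0.5,3.0,1.0,2.5,0.0,1.0,9.2,True\n"
)
rows = list(parse_training_rows(sample))
```

To read the real file, the `io.StringIO` sample would be replaced by `open("mesh_training_data.csv", newline="")`.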
## Cosine Similarity
The mesh_scores.tsv file contains a pairwise list of all MeSH terms including their cosine similarity based on the word embedding described in the dissertation. Because the MeSH hierarchy is also used in many of the evaluation measures, the relationship of the term pair is also included. It has the following columns:
1. mesh_one: a string of the first MeSH heading.
2. mesh_two: a string of the second MeSH heading.
3. cosine_similarity: the cosine similarity between the terms
4. relationship_type: a string identifying the relationship type, consisting of none, parent/child, sibling, ancestor and direct (terms are identical, i.e. a direct hierarchy match).
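For reference, the cosine similarity of two embedding vectors is their dot product divided by the product of their lengths. The following is a minimal pure-Python sketch of that formula; the function name is hypothetical, and returning 0.0 for a zero-length vector is an assumption rather than something stated in the dissertation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: similarity with a zero vector is reported as 0.0.
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, regardless of their magnitudes.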
The mesh_model.bin file is a binary word2vec C format file containing the MeSH term embeddings. It was generated using version 3.7.2 of the Python gensim library (https://radimrehurek.com/gensim/).
For an example of how to load the model file, see https://radimrehurek.com/gensim/models/word2vec.html#usage-examples, specifically the directions for loading the "word2vec C format."
keywords:
MEDLINE;MeSH;Medical Subject Headings;Indexing
published:
2023-07-10
Harmon-Threatt, Alexandra N.; Anderson, Nicholas L.
(2023)
Bee movement between habitat patches in a naturally fragmented ecosystem depended on species, patch, and matrix variables. Using a mark-recapture methodology in the naturally fragmented Ozark glade ecosystem, we assessed the importance of bee size, nesting biology, the distance between patches (e.g., isolation), and nesting and floral resources in habitat patches and the surrounding matrix on bee movement.
This dataset includes seven data files, three R code files, and a QGIS tool. Three of the data files include information collected at the study sites with regard to bees and matrix and patch characteristics. The other four data files are spatial files used to quantify the characteristics of the forest canopy between the study sites and the edge-to-edge distances between the study sites. R code in the R Markdown file recreates the analysis and data presentation for the associated publication. R script files contain processes for calculating some of the explanatory variables used in the analysis. The QGIS tool can be used as the first step to obtaining average values from a raster file where the cells are large relative to the areas of interest (AOI) that you would like to characterize. The second step is contained in one of the aforementioned R scripts.
Detected effects included: Larger bees were more likely to move between patches. Bee movement was less likely as the distance between patches increased. However, relatively short distances (~50 m) inhibited movement more than our a priori expectations. Bees were unlikely to move away from home patches with abundant and diverse floral and below-ground nesting resources. When home patches were less resource-rich, bee movement depended on the characteristics of the away patch or the matrix. In these cases, bees were more likely to move to away patches with greater below-ground nesting and floral resources. Matrix habitats with more available floral and below-ground nesting resources appear to impede movement to neighboring patches, potentially because they already provide supplemental resources for bees.
keywords:
habitat fragmentation; bees; movement; mark-recapture; nesting resources; floral resources; isolation
published:
2024-07-08
Chong, Jer Pin; Minnaert-Grote, Jamie; Zaya, David N.; Ashley, Mary V.; Coons, Janice; Ramp Neal, Jennifer M.; Molano-Flores, Brenda
(2024)
A population genetics study was conducted on three plant taxa in the genus Physaria that are found on the Kaibab Plateau (Arizona, USA). Physaria kingii subsp. kaibabensis is endemic to the Kaibab Plateau, and is of conservation concern because of its rarity, limited range, and potential threats to its long-term persistence. Additionally, the taxon is a candidate for federal protection under the Endangered Species Act. It was not clear how genetically isolated P. k. subsp. kaibabensis was from Physaria kingii subsp. latifolia, which is a widespread subspecies found throughout the southwestern USA, including on the Kaibab Plateau. Additionally, other authors have suggested that P. k. subsp. kaibabensis may hybridize with Physaria arizonica, a different species that is also widespread and found on and off the Kaibab Plateau. We conducted a population genetics study of all three groups to better determine the conservation status of P. k. subsp. kaibabensis. Genetic data are in the form of nuclear DNA microsatellites for 13 loci (all apparently diploid). Additionally, we have included location information for the collection sites. We collected tissue samples from on and off the Kaibab Plateau. The overall findings are shared in a manuscript being submitted for peer-review.
keywords:
Physaria kingii; Kaibab Plateau; endemism; conservation genetics; rare species biology
published:
2019-08-29
This is the published ortholog set derived from whole genome data used for the analysis of members of the B. tabaci complex of whiteflies. It includes the concatenated alignment and individual gene alignments used for analyses (Link to publication: https://www.mdpi.com/1424-2818/11/9/151).