Illinois Data Bank Dataset Search Results
Results
published:
2024-05-30
Lyu, Fangzheng; Zhou, Lixuanwu; Park, Jinwoo; Baig, Furqan; Wang, Shaowen
(2024)
This dataset contains all the data used in the study conducted for the research publication titled "Mapping dynamic human sentiments of heat exposure with location-based social media data". The paper develops a cyberGIS framework to analyze and visualize human sentiments of heat exposure dynamically, based on near-real-time location-based social media (LBSM) data. Large volumes of low-cost LBSM data, together with a content analysis algorithm based on natural language processing, are used effectively to generate heat exposure maps from human sentiments on social media.
## What’s inside - A quick explanation of the components of the zip file
* US folder includes the shapefile for the United States, with county as the spatial unit
* Census_tract folder includes the shapefile for Cook County, with census tract as the spatial unit
* data/data.txt includes instructions for retrieving the sample data from either Keeling or figshare
* geo/data20000.txt is the heat dictionary created in this paper; please refer to the corresponding publication for the data creation process
The Jupyter notebook and code accompanying this publication can be found at: https://github.com/cybergis/real_time_heat_exposure_with_LBSMD
keywords:
CyberGIS; Heat Exposure; Location-based Social Media Data; Urban Heat
published:
2024-02-08
Martinez, Carlos; Pena, Gisselle; Wells, Kaylee K.
(2024)
This dataset contains transcribed entries from the "Prairie Directory of North America" (Adelman and Schwartz 2013) for the Tallgrass, Mixed Grass, and Shortgrass prairie regions of the United States. We identified the historical spatial extent of these three prairie regions using Ricketts et al. (1999), Olson et al. (2001), and Dixon et al. (2014) and selected the counties lying entirely or partially within these boundaries from the USDA Forest Service (2022) file. The resulting lists of counties are included as separate files. The dataset contains information on publicly accessible grasslands and prairies in these regions, including acreage and amenities such as hunting access, restrooms, parking, and trails.
keywords:
grasslands; prairies; prairie directory of north america; site amenities; site attributes
published:
2021-05-12
Clem, Scott; Harmon-Threatt, Alexandra
(2021)
These are the data sets associated with our publication "Field borders provide winter refuge for beneficial predators and parasitoids: a case study on organic farms." For this project, we compared the communities of overwintering arthropod natural enemies in organic cultivated fields and wildflower-strip field borders at five different sites in central Illinois.
Abstract:
Semi-natural field borders are frequently used in midwestern U.S. sustainable agriculture. These habitats are meant to help diversify otherwise monocultural landscapes and provide them with ecosystem services, including biological control. Predatory and parasitic arthropods (i.e., potential natural enemies) often flourish in these habitats and may move into crops to help control pests. However, the capacity of semi-natural field borders to provide overwintering refuge for these arthropods is poorly understood. In this study, we used soil emergence tents to characterize potential natural enemy communities (i.e., predacious beetles, wasps, spiders, and other arthropods) overwintering in cultivated organic crop fields and adjacent field borders. We found greater abundance and species richness, and a distinct community composition, of predatory and parasitic arthropods in field borders compared with arable crop fields, which were generally poorly suited as overwintering habitat. Furthermore, potential natural enemies tended to be positively associated with forb cover and negatively associated with grass cover, suggesting that grassy field borders with less forb cover are less well suited as winter refugia. These results demonstrate that semi-natural habitats like field borders may act as a source of many natural enemies from year to year and are important for conserving arthropod diversity in agricultural landscapes.
keywords:
Natural enemy; wildflower strips; conservation biological control; semi-natural habitat; field border; organic farming
published:
2020-08-25
Allan, Brian; Fredericks, Lisa
(2020)
The Allan Lab has published a Fluidigm pipeline online at https://github.com/HPCBio/allan-fluidigm-pipeline.
The repository includes a tutorial for running the pipeline but does not yet include test datasets.
This tarball, hosted at the Illinois Data Bank, is the dataset that completes the GitHub tutorial.
It includes input files (a custom database of tick pathogens and Fluidigm raw reads) and output files (tables of samples with taxonomic classifications).
keywords:
custom database of tick pathogens; fluidigm pipeline; fluidigm paired reads; fluidigm tutorial
published:
2020-12-01
This is the data set from the published manuscript 'Vertebrate scavenger guild composition and utilization of carrion in an East Asian temperate forest' by Inagaki et al.
keywords:
Japan; Sika Deer
published:
2021-10-15
Jianhao, Peng; Idoia, Ochoa
(2021)
This is the synthetic expression file (5 states, 5,000 cells) that we used to validate SimiC, a single-cell gene regulatory network (GRN) inference method with similarity constraints. Ground-truth GRNs are stored as NumPy arrays, and the expression profiles of all states combined are stored in a pandas DataFrame, saved as Pickle files.
keywords:
Numpy array; GRNs; Pandas DataFrame;
published:
2020-06-01
Hoover, Jeffrey P; Davros, Nicole M; Schelsky, Wendy; Brawn, Jeffry D
(2020)
Dataset associated with the Hoover et al. AUK-19-093 submission: "Local conspecific density does not influence reproductive output in a secondary cavity-nesting songbird." An Excel CSV file containing all of the data used in the analyses.
Description of variables
YEARS: year
ORDINAL_DATE: day of the year, where 1 January = 1, …, 31 December = 365
SITE: acronym for each study site
BOX: unique nest box identifier on each study site
TREAT: designates whether the nest box was in a high- or low-density nest box area within each study site
ACTUAL_NO_NEIGHBORS: number of pairs of warblers using a nest box within 200 m of a given pair’s nest box
CLUTCH_SIZE: number of warbler eggs in nest at the onset of incubation
PROWN: number of warbler nestlings once eggs have hatched
PROWF: number of warbler nestlings that fledged out of the nest box
HATCH_SUCCESS: proportion of eggs in the nest that hatched
FLEDG_SUCCESS: proportion of the nestlings that fledged from the nest box
HATCH_SUCCESS2: binary category where “0” indicates some hatching failure and “1” indicates no hatching failure
FLEDG_SUCCESS2: binary category where “0” indicates some nestling failure (i.e., nestling death) and “1” indicates no nestling failure
BHCO_PARASIT2: binary category where “0” indicates no cowbird parasitism, and “1” indicates there was cowbird parasitism
BHCOE: number of cowbird eggs in clutch
BHCOF: number of cowbird nestlings that fledged from the nest
PAIRID: unique number identifying a male and female warbler that are together at a nest box; the number stays the same in a subsequent nesting attempt or year if the same male and female are together again
FEMALE_ID: unique identifier for each female, representing her leg band combination. Each letter represents a band; letters before the hyphen are on the right leg and letters after the hyphen are on the left leg
FEM_AGE: binary category where “0” indicates a 1-year-old bird and “1” indicates a >1-year-old bird
FEMALE_BREEDING_ATTEMPT: breeding attempt number within a given year, where “1” indicates the first attempt, “2” the second, and so on
SECOND_ATTEMPT: for any female that fledged a brood in a given year, binary category where “0” indicates that she did not attempt a second brood that year and “1” indicates that she did
F_TOT_PROWF: total reproductive output (number of warbler fledglings produced) for a given female in a given year
MALE_ID: unique identifier for each male, representing his leg band combination. Each letter represents a band; letters before the hyphen are on the right leg and letters after the hyphen are on the left leg
MALE_AGE2: binary category where “0” indicates a 1-year-old bird and “1” indicates a >1-year-old bird
Provisioning_rate: total number of food provisions per nestling per hour by male and female warbler combined
BROOD_MASS: average nestling mass (g) for the brood
BROOD_TARSUS: average nestling tarsus length (mm) for the brood
Brood_condition: unit-less index of nestling condition that uses the residuals of the BROOD_MASS/BROOD_TARSUS relationship
A period (“.”) indicates that data were not collected or not available, or that the individual nest or female did not qualify for a category assignment.
An empty cell represents no data available for this particular cell.
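A minimal Python sketch of reading this CSV with the missing-value codes above, converting calendar dates to ORDINAL_DATE, and computing a residual condition index. It assumes Brood_condition is the residual from an ordinary least-squares regression of BROOD_MASS on BROOD_TARSUS; the exact model the authors fitted (including any transformation) is not stated here, and the helper names are hypothetical.

```python
import csv
import io
from datetime import date

MISSING_CODES = {".", ""}  # period or empty cell, per the notes above


def parse_value(raw):
    """Map missing-value codes to None, numeric strings to float, keep other text."""
    if raw is None:
        return None
    raw = raw.strip()
    if raw in MISSING_CODES:
        return None
    try:
        return float(raw)  # note: numeric-looking IDs also become floats
    except ValueError:
        return raw  # identifiers such as SITE or FEMALE_ID


def load_rows(text):
    """Parse CSV text into dicts, one per row, with missing values as None."""
    return [{key: parse_value(val) for key, val in row.items()}
            for row in csv.DictReader(io.StringIO(text))]


def ordinal_date(day):
    """Day of the year, with 1 January = 1."""
    return day.timetuple().tm_yday


def residual_condition(mass, tarsus):
    """Residuals of an OLS fit of mass on tarsus (None where either value is missing)."""
    pairs = [(t, m) for t, m in zip(tarsus, mass) if t is not None and m is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete broods")
    mean_t = sum(t for t, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in pairs)
    slope = sxy / sxx
    intercept = mean_m - slope * mean_t
    return [None if t is None or m is None else m - (intercept + slope * t)
            for t, m in zip(tarsus, mass)]
```

Keeping "." and empty cells as None (rather than 0) matters here, because a period can also mean that a nest or female did not qualify for a category.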
keywords:
conspecific density; density dependence; food limitation; hatching success; nestling body condition; nestling provisioning; Prothonotary Warbler; reproductive output
published:
2023-01-12
Mischo, William; Schlembach, Mary C.; Cabada, Elisandro
(2023)
This dataset was developed as part of a study that examined the correlational relationships between local journal authorship, local and external citation counts, full-text downloads, link-resolver clicks, and four global journal impact factor indices within an all-disciplines journal collection of 12,200 titles and six subject subsets at the University of Illinois at Urbana-Champaign (UIUC) Library. While earlier investigations of the relationships between usage (downloads) and citation metrics have been inconclusive, this study shows strong correlations in the all-disciplines set and most subject subsets. The normalized Eigenfactor was the only global impact factor index that correlated highly with local journal metrics. Some of the identified disciplinary variances among the six subject subsets may be explained by the journal publication aspirations of UIUC researchers. The correlations between authorship and local citations in the six specific subject subsets closely match national department or program rankings.
All the raw data used in this analysis are provided as relational database tables with multiple columns and can be opened in Microsoft Access. Variable descriptions can be viewed in "Design View" (right-click the selected table and choose "Design View"). The two PDF files provide an overview of the tables included in each MDB file.
In addition, the processing scripts and Pearson correlation code are available at <a href="https://doi.org/10.13012/B2IDB-0931140_V1">https://doi.org/10.13012/B2IDB-0931140_V1</a>.
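The linked code is the authors' own. As a generic illustration only (not the authors' script), the Pearson correlation coefficient between two paired journal-level metrics, such as full-text downloads and local citation counts, can be computed as follows; on Python 3.10+ `statistics.correlation` returns the same value.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variance sums; the 1/(n-1) factors cancel in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```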
keywords:
Usage and local citation relationships; publication; citation and usage metrics; publication; citation and usage correlation analysis; Pearson correlation analysis
published:
2022-10-13
Xue, Qingquan; Dietrich, Christopher H.; Zhang, Yalin
(2022)
The text file contains the original DNA nucleotide sequence data used in the phylogenetic analyses of Xue et al. (in review), comprising the 13 protein-coding genes and 2 ribosomal gene subunits of the mitochondrial genome. The file is marked up in the standard NEXUS format used by many phylogenetic analysis software packages, so it can be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify it as NEXUS and indicate that it contains data for 30 taxa (species) and 13078 characters, that the characters are DNA sequence data, that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The positions of the data partitions are given in the mrbayes block of commands for the phylogenetic program MrBayes (version 3.2.6), which begins near the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program; these are explained in the Methods section of the submitted manuscript. Two supplementary tables in the provided PDF file give additional information on the species in the dataset, including the GenBank accession numbers for the sequence data (Table S1) and the DNA substitution models used for each of the individual mitochondrial genes and for different codon positions of the protein-coding genes in the MrBayes and IQ-Tree (version 1.6.8) analyses (Table S2). Full citations for references listed in Table S1 can be found by searching GenBank using the corresponding accession number. The supplemental tables will also be linked to the article upon publication at the journal website.
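A minimal sketch for reading the header values described above (number of taxa, number of characters, data type, gap symbol, missing-data symbol) from the DIMENSIONS and FORMAT commands of a NEXUS file. It assumes standard `KEY=value` syntax before the MATRIX keyword; the function name is hypothetical, and dedicated libraries such as Biopython's NEXUS support are more robust for full parsing.

```python
import re

# Header fields and the regular expressions that capture their values.
_HEADER_PATTERNS = {
    "ntax": r"\bntax\s*=\s*(\d+)",
    "nchar": r"\bnchar\s*=\s*(\d+)",
    "datatype": r"\bdatatype\s*=\s*(\w+)",
    "gap": r"\bgap\s*=\s*(\S)",
    "missing": r"\bmissing\s*=\s*(\S)",
}


def parse_nexus_header(text):
    """Return a dict of header values found before the MATRIX keyword."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file")
    # Only search the header so sequence data cannot produce false matches.
    header = re.split(r"\bmatrix\b", text, maxsplit=1, flags=re.IGNORECASE)[0]
    info = {}
    for key, pattern in _HEADER_PATTERNS.items():
        match = re.search(pattern, header, flags=re.IGNORECASE)
        if match:
            value = match.group(1)
            info[key] = int(value) if key in ("ntax", "nchar") else value
    return info
```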
keywords:
Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published:
2023-07-05
Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal; Lischwe Mueller, Natalie
(2023)
The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below.
This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not every reference cited in an SRR is a PSR included in its evidence synthesis. Based on which PSRs are included in which SRRs, we can construct the inclusion network: a bipartite network with two types of nodes, one representing SRRs and the other representing PSRs. If an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. These unused PSRs appear as isolated nodes when the inclusion network is visualized.
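A sketch of building the inclusion network from the two CSV files and finding the unused PSRs. The column names (`srr_id`, `psr_id`, `report_id`, `report_type`) and the type label `PSR` are hypothetical placeholders; check the headers of inclusion_net_edges.csv and report_list.csv and pass the real names as arguments.

```python
import csv
import io
from collections import defaultdict


def build_inclusion_network(edges_csv, reports_csv, src="srr_id", dst="psr_id",
                            id_col="report_id", type_col="report_type", psr_label="PSR"):
    """Return (SRR -> set of included PSRs, set of PSRs never included).

    Column names and psr_label are hypothetical defaults.
    """
    types = {row[id_col]: row[type_col]
             for row in csv.DictReader(io.StringIO(reports_csv))}
    included = defaultdict(set)
    for row in csv.DictReader(io.StringIO(edges_csv)):
        included[row[src]].add(row[dst])  # directed edge SRR -> PSR
    psrs = {rid for rid, kind in types.items() if kind == psr_label}
    used = set().union(*included.values())
    return dict(included), psrs - used
```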
We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports.
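One way to derive a co-author network is to link authors who share a report. The sketch below counts, for each unordered author pair, how many of the reports they co-authored; the column names `report_id` and `author_id` are hypothetical stand-ins for whatever salt_cont_author.csv actually uses.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def coauthor_edges(author_csv, report_col="report_id", author_col="author_id"):
    """Return a Counter mapping (author_a, author_b) pairs to shared-report counts."""
    authors_by_report = defaultdict(set)
    for row in csv.DictReader(io.StringIO(author_csv)):
        authors_by_report[row[report_col]].add(row[author_col])
    weights = Counter()
    for authors in authors_by_report.values():
        # Sorting makes each pair's key independent of row order.
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return weights
```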
We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016).
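The idea behind potential_inclusion_link.csv can be sketched as a date comparison: a PSR is potentially eligible for an SRR if it had been published by the SRR's search date. This sketch assumes both dates are available as `datetime.date` values and treats publication on the search date itself as eligible; both are assumptions, and the function name and example IDs are hypothetical.

```python
from datetime import date


def potential_inclusion_links(psr_pub_dates, srr_search_dates):
    """Return sorted (srr, psr) pairs where the PSR was published on or before the SRR search date.

    Both arguments map report IDs to datetime.date values.
    """
    return sorted(
        (srr, psr)
        for srr, searched in srr_search_dates.items()
        for psr, published in psr_pub_dates.items()
        if published <= searched
    )


# Hypothetical IDs and dates, for illustration only.
example = potential_inclusion_links(
    {"psr_a": date(2001, 5, 1), "psr_b": date(2010, 1, 1)},
    {"srr_1": date(2005, 1, 1)},
)
print(example)  # [('srr_1', 'psr_a')]
```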
<b>UPDATES IN THIS VERSION COMPARED TO V2</b> (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V2)
- We added a new column "pub_date" to report_list.csv
- We corrected mistakes in supplementary_reference_list.pdf for report #28 and report #80. The authors of report #28 are Khaw, K.-T., & Barrett-Connor, E., not Salisbury D. Report #80 had been mistakenly mixed up with report #81.
keywords:
systematic reviews; evidence synthesis; network analysis; public health; salt controversy;
published:
2025-03-12
Jeong, Gangwon; Villa, Umberto; Park, Seonyeong; Anastasio, Mark A.
(2025)
References
- Jeong, Gangwon, Umberto Villa, and Mark A. Anastasio. "Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints." Photoacoustics (2025): 100700.
- Park, Seonyeong, et al. "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer." Journal of Biomedical Optics 28.6 (2023): 066002.
Overview
- This dataset includes 80 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for photoacoustic computed tomography (PACT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in PACT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as a .mat file. The filenames follow the format {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D) and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type and "510022534" is the sample ID.
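A small sketch that splits a file name into breast type and sample ID following the {type}{subject_id}.mat pattern. It assumes the type is a single uppercase letter from A to D and that the ID consists of word characters; the function name is hypothetical.

```python
import re

# {type}{subject_id}.mat: one breast-type letter (A-D), then the sample ID.
PHANTOM_NAME = re.compile(r"^(?P<type>[ABCD])(?P<subject_id>\w+)\.mat$")


def parse_phantom_filename(name):
    """Split a file name such as 'D510022534.mat' into (type, subject_id)."""
    match = PHANTOM_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.group("type"), match.group("subject_id")


print(parse_phantom_filename("D510022534.mat"))  # ('D', '510022534')
```

To read the variables themselves, `scipy.io.loadmat` handles MAT-files up to version 7.2; version 7.3 files are HDF5-based and need a reader such as h5py. Which version was used for this dataset is not stated.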
File Contents
- Each file contains the following variables:
> "type": Breast type
> "p0": Initial pressure distribution [Pa]
> "sos": Speed-of-sound map [mm/μs]
> "att": Acoustic attenuation (power-law prefactor) map [dB/(MHzʸ·mm)]
> "y": Power-law exponent
> "pressure_lossless": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, under the assumption of a lossless medium (corresponding to Studies I, II, and III).
> "pressure_lossy": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, incorporating a power-law acoustic absorption model to account for medium losses (corresponding to Study IV).
* The pressure data were simulated using a ring-array transducer that consists of 512 receiving elements uniformly distributed along a ring with a radius of 72 mm.
* Note: These pressure data are noiseless simulations. In Studies II–IV of the referenced paper, additive i.i.d. Gaussian noise was added to the measurement data. Users may add similar noise to the provided data as needed for their own studies.
- In Study I, all spatial maps (e.g., sos) have dimensions of 512 × 512 pixels, with a pixel size of 0.32 mm × 0.32 mm.
- In Study II and Study III, all spatial maps (sos) have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
- In Study IV, both the sos and att maps have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
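A NumPy sketch of the acquisition geometry and grids described above: 512 elements evenly spaced on a 72 mm radius ring, pixel-center coordinates for the 512 × 512 (0.32 mm) and 1024 × 1024 (0.16 mm) grids, and additive i.i.d. Gaussian noise for the noiseless pressure data. The angle of the first element, the grid being centered on the ring center, and the noise standard deviation are assumptions, not values from the dataset. Both grids span 163.84 mm per side (512 × 0.32 = 1024 × 0.16 = 163.84).

```python
import numpy as np

N_ELEMENTS = 512        # receiving elements (from the dataset description)
RING_RADIUS_MM = 72.0   # ring radius in mm (from the dataset description)


def ring_element_positions(n=N_ELEMENTS, radius_mm=RING_RADIUS_MM, start_angle=0.0):
    """(n, 2) array of element x, y positions in mm, evenly spaced on a circle.

    start_angle (radians) for the first element is an assumption.
    """
    angles = start_angle + 2.0 * np.pi * np.arange(n) / n
    return np.column_stack((radius_mm * np.cos(angles), radius_mm * np.sin(angles)))


def pixel_centers_mm(n_pixels, pixel_mm):
    """1-D pixel-center coordinates in mm, assuming the grid is centered on the ring center."""
    return (np.arange(n_pixels) - (n_pixels - 1) / 2.0) * pixel_mm


def add_iid_gaussian_noise(pressure, sigma, seed=None):
    """Return pressure plus zero-mean i.i.d. Gaussian noise with standard deviation sigma.

    sigma is a user choice; the noise level used in the referenced paper is not given here.
    """
    rng = np.random.default_rng(seed)
    return pressure + rng.normal(0.0, sigma, size=np.shape(pressure))


x_study1 = pixel_centers_mm(512, 0.32)    # Study I grid
x_study2 = pixel_centers_mm(1024, 0.16)   # Studies II-IV grid
```

Passing a fixed `seed` makes a noisy realization reproducible, which helps when comparing reconstructions across studies.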
keywords:
Medical imaging; Photoacoustic computed tomography; Numerical phantom; Joint reconstruction
published:
2025-10-10
Clark, Teresa J.; Schwender, Jorg
(2025)
Upregulation of triacylglycerols (TAGs) in vegetative plant tissues such as leaves has the potential to drastically increase the energy density and biomass yield of bioenergy crops. In this context, constraint-based analysis promises to improve metabolic engineering strategies. Here we present a core metabolism model for the C4 biomass crop Sorghum bicolor (iTJC1414), along with a minimal model for photosynthetic CO2 assimilation and sucrose and TAG biosynthesis in C3 plants. Extending iTJC1414 to a four-cell diel model, we simulate C4 photosynthesis in mature leaves with the principal photo-assimilatory product replaced by TAG produced at different levels. Independent of specific pathways and per unit of carbon assimilated, energy content and biosynthetic demands for reducing equivalents are about 1.3 to 1.4 times higher for TAG than for sucrose. For plant generic pathways, ATP and NADPH demands per CO2 assimilated are higher by 1.3- and 1.5-fold, respectively. If the photosynthetic supply of ATP and NADPH in iTJC1414 is adjusted to be balanced for sucrose as the sole photo-assimilatory product, overproduction of TAG is predicted to cause a substantial surplus of photosynthetic ATP. This means that if TAG synthesis were the sole photo-assimilatory process, an energy imbalance could arise that might impede the process. Adjusting iTJC1414 to a photo-assimilatory rate that approximates field conditions, we predict possible daily rates of TAG accumulation depending on the ratio of carbon partitioning between exported assimilates and accumulated oil droplets (TAG, oleosin) and on the activation of futile cycles of TAG synthesis and degradation. We find that, based on the capacity of leaves for photosynthetic synthesis of exported assimilates, mature leaves should be able to reach a TAG level of 20% of dry weight within one month if only 5% of the photosynthetic net assimilation can be allocated to oil droplets.
From this we conclude that high TAG levels should be achievable if TAG synthesis is induced only during a final phase of the plant life cycle.
keywords:
Feedstock Production; Modeling
published:
2019-12-03
These are the alignments of transcriptome data used for the analysis of members of Heteroptera. This dataset is analyzed in "Deep instability in the phylogenetic backbone of Heteroptera is only partly overcome by transcriptome-based phylogenomics" published in Insect Systematics and Diversity.
keywords:
Heteroptera; Hemiptera; Phylogenomics; transcriptome
published:
2023-03-27
Littlefield, Alexander; Xie, Dajie; Richards, Corey; Ocier, Christian; Gao, Haibo; Messinger, Jonah; Ju, Lawrence; Gao, Jingxing; Edwards, Lonna; Braun, Paul; Goddard, Lynford
(2023)
This dataset contains the full data used in the paper titled "Enabling High Precision Gradient Index Control in Subsurface Multiphoton Lithography," available at https://doi.org/10.1021/acsphotonics.2c01950 .
The data used for Table 1 can be found in the dataset for the related Figure 8.
Some supplemental figures' data can be found in the main figures' data:
Figure S2's data are contained in Figure 6.
The data for Figure S4 and Table S1 are derived from Figure 6.
Figure S9 is derived from Figure 7.
Figure S10 is contained in Figure 7.
Figure S12 is derived from Figure 6 and the Python code prism-fringe-analysis.
Figures without a data file named after them have no associated data and are purely graphical representations.
published:
2021-02-18
Wang, Shaowen; Lyu, Fangzheng; Wang, Shaohua; Catlet, Charles; Padmanabhan, Anand; Soltani, Kiumars
(2021)
Increasingly pervasive location-aware sensors interconnected with rapidly advancing wireless network services are motivating the development of near-real-time urban analytics. This development has revealed both tremendous challenges and opportunities for scientific innovation and discovery. However, state-of-the-art urban discovery and innovation are not well equipped to resolve the challenges of such analytics, which in turn prevents new research questions from being asked and answered. Specifically, commonly used urban analytics capabilities are typically designed to handle, process, and analyze static datasets that can be treated as map layers, and are consequently ill-equipped for (a) handling the volume and velocity of urban big data; (b) meeting the computing requirements for processing, analyzing, and visualizing these datasets; and (c) providing concurrent online access to such analytics. To tackle these challenges, we have developed a novel cyberGIS framework that includes computationally reproducible approaches to streaming urban analytics. This framework is based on CyberGIS-Jupyter and integrates cyberGIS with real-time urban sensing to achieve previously unavailable capabilities that help cities solve challenging urban informatics problems.
The files included in this dataset are as follows:
1) Spatial_interpolation.ipynb is a Python-based Jupyter notebook that enables users to conduct spatial interpolation with Array of Things (AoT) data;
2) Urban_Informatics.ipynb is a Jupyter notebook for exploring the AoT dataset;
3) chicago-complete.weekly.2019-09-30-to-2019-10-06.tar includes all the high-frequency urban sensing data collected by AoT sensors in Chicago, US, from September 30 to October 6, 2019;
4) sensors.csv is a processed dataset with temperature information for Chicago, used in Spatial_interpolation.ipynb.
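The notebook's interpolation method is not described here. As one common choice (not necessarily the one used in Spatial_interpolation.ipynb), a minimal inverse distance weighting (IDW) sketch for estimating temperature at an unsampled point from sensor readings such as those in sensors.csv is shown below. The `(x, y, value)` sample layout and the power parameter are assumptions, and coordinates should be projected (e.g., meters) rather than raw latitude and longitude.

```python
import math


def idw_interpolate(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: iterable of (x, y, value) tuples in a projected coordinate system.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # query point coincides with a sensor
        w = 1.0 / d ** power  # closer sensors get larger weights
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no samples given")
    return num / den
```

A larger `power` makes the estimate more local; `power=2.0` is a conventional default rather than a value taken from this dataset.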
keywords:
CyberGIS; Urban informatics; Array of Things
published:
2019-05-20
Lao, Yuyang; Schiffer, Peter
(2019)
This is the experimental data for tetris artificial spin ice. The islands are made of Permalloy and measure 170 nm by 470 nm by 2.5 nm. The systems were measured around room temperature, where the islands fluctuate. The data are recorded as photoemission electron microscopy intensity. More details about the dataset can be found in the files Note.txt and Tetris_data_list.xlsx.
Note:
Two files, named bl11_teris600_033 and bl11_tetris600_2_135, are not recorded in the Excel sheet because they were corrupted during the measurement. Any data not recorded in the Excel sheet are either corrupted or of low quality.
For files *_028 to *_049, “tetris” is spelled with the “t”, while in the raw data folder it is spelled without it (“teris”). This is a typo: throughout the dataset, “tetris” and “teris” have the same meaning.
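Because of this spelling difference, matching Excel entries to raw file names needs a normalization step. A minimal sketch, assuming "teris" never appears in a file name with any other meaning; the function names are hypothetical.

```python
def canonical_name(name):
    """Return the file name with the 'teris' misspelling replaced by 'tetris'."""
    # 'tetris' itself does not contain the substring 'teris', so it is left unchanged.
    return name.replace("teris", "tetris")


def same_measurement(excel_name, raw_name):
    """True if two names refer to the same measurement once spelling is normalized."""
    return canonical_name(excel_name) == canonical_name(raw_name)
```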
keywords:
artificial spin ice
published:
2022-02-11
Hoang, Khanh Linh; Schneider, Jodi; Kansara, Yogeshwar
(2022)
The data contain a list of articles given low scores by the RCT Tagger, together with an error analysis of them; these were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
The change made in this V3 is that the data are divided into two parts:
- Error Analysis of 44 Low Scoring Articles with MEDLINE RCT Publication Type.
- Error Analysis of 244 Low Scoring Articles without MEDLINE RCT Publication Type.
keywords:
Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published:
2024-03-25
Xia, Yushu; Kwon, Hoyoung; Wander, Michelle
(2024)
The accompanying study is published under the title "Estimating soil N2O emissions induced by organic and inorganic fertilizer inputs using a Tier-2, regression-based meta-analytic approach for U.S. agricultural lands" in Science of the Total Environment. The study is authored by Dr. Yushu Xia, Dr. Hoyoung Kwon, and Dr. Michelle Wander. The DOI for this study is <a href="https://doi.org/10.1016/j.scitotenv.2024.171930">https://doi.org/10.1016/j.scitotenv.2024.171930</a>.
keywords:
soil; nitrous oxide; agriculture; fertilizers; meta-analysis
published:
2025-07-30
Skorupa, A. J.; Bried, J. T.
(2025)
This dataset includes three data files linking species' climate sensitivity, trait combinations, and listing status. It contains species occurrence data within Hydrologic Unit Code 12 (HUC12) watersheds, along with trait information and Rarity and Climate Sensitivity (RCS) index scores, for lotic caddisflies, stoneflies, mussels, dragonflies, and crayfish across all Midwest Climate Adaptation Science Center states: Minnesota, Iowa, Missouri, Wisconsin, Illinois, Indiana, Michigan, and Ohio. For mussels, the geographic scope is expanded to include all Midwest Regional Species of Greatest Conservation Need (RSGCN) states (North Dakota, South Dakota, Nebraska, Kansas, and Kentucky). However, occurrence data for mussels are not included due to data-sharing agreements. Metadata are included with each data file. Please refer to the associated manuscript for original data sources, trait references, and details on the RCS index calculation.
keywords:
climate sensitivity; conservation status; traits; aquatic invertebrates; Midwest
published:
2019-08-05
Skinner, Rachel; Dietrich, Christopher; Walden, Kimberly; Gordon, Eric; Sweet, Andrew; Podsiadlowski, Lars; Petersen, Malte; Simon, Chris; Takiya, Daniela; Johnson, Kevin
(2019)
The data in this directory corresponds to:
Skinner, R.K., Dietrich, C.H., Walden, K.K.O., Gordon, E., Sweet, A.D., Podsiadlowski, L., Petersen, M., Simon, C., Takiya, D.M., and Johnson, K.P.
Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict.
Systematic Entomology.
Correspondence should be directed to: Rachel K. Skinner, rskinn2@illinois.edu
If you use these data, please cite our paper in Systematic Entomology.
The following files can be found in this dataset:
Amino_acid_concatenated_alignment.phy: the amino acid alignment used in this analysis in phylip format.
Amino_acid_raxml_partitions.txt (for reference only): the partitions for the amino acid alignment; note that a partitioned amino acid analysis was not performed in this study.
Amino_acid_concatenated_tree.newick: the best maximum likelihood tree with bootstrap values in newick format.
ASTRAL_input_gene_trees.tre: the concatenated gene tree input file for ASTRAL
README_pie_charts.md: explains the scripts and data needed to recreate the pie-chart figure from our paper. There is also another
Corresponds to the following files:
ASTRAL_species_tree_EN_only.newick: the species tree with only effective number (EN) annotation
ASTRAL_species_tree_pp1_only.newick: the species tree with only the posterior probability 1 (main topology) annotation
ASTRAL_species_tree_q1_only.newick: the species tree with only the quartet scores for the main topology (q1)
ASTRAL_species_tree_q2_only.newick: the species tree with only the quartet scores for the first alternative topology (q2)
ASTRAL_species_tree_q3_only.newick: the species tree with only the quartet scores for the second alternative topology (q3)
print_node_key_files.py: script needed to create the following files:
node_keys.key: text file with node IDs and topologies
complete_q_scores.key: text file with node IDs and multiplied q scores
EN_node_vals.key: text file with node IDs and EN values
create_pie_charts_tree.py: script needed to visualize the tree with pie charts, pp1, and EN values plotted at nodes
ASTRAL_species_tree_full_annotation.newick: the species tree with full annotation from the ASTRAL analysis.
NOTE: It may be more useful to examine the individual value files if you want to visualize the tree (e.g., in FigTree), since the full annotations are extensive and can make viewing difficult.
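The per-node values stored in the ASTRAL annotation files (q1, q2, q3, pp1, EN) can also be extracted programmatically, for example to prepare pie-chart input without opening the tree in a viewer. The Python sketch below is not the authors' print_node_key_files.py or create_pie_charts_tree.py. It assumes that each annotated node label is a bracketed, semicolon-separated list of key=value pairs (e.g. `[q1=0.6;q2=0.3;q3=0.1;EN=50.0]`), and the function names `parse_annotations` and `quartet_fractions` are hypothetical.

```python
import re

# Assumption: each annotated node label contains one bracketed block of
# semicolon-separated key=value pairs, e.g. [q1=0.6;q2=0.3;q3=0.1;EN=50.0].
ANNOTATION = re.compile(r"\[([^\[\]]*)\]")


def parse_annotations(newick: str) -> list[dict[str, float]]:
    """Return one dict of numeric annotation values per bracketed label, in file order."""
    nodes = []
    for block in ANNOTATION.findall(newick):
        values = {}
        for pair in block.split(";"):
            key, sep, raw = pair.partition("=")
            if not sep:
                continue  # skip empty or malformed fields
            try:
                values[key.strip()] = float(raw)
            except ValueError:
                pass  # skip non-numeric values
        nodes.append(values)
    return nodes


def quartet_fractions(node: dict[str, float]) -> tuple[float, float, float]:
    """Rescale q1, q2, q3 to sum to 1 (e.g. as pie-chart slices); zeros if all missing."""
    q1, q2, q3 = (node.get(k, 0.0) for k in ("q1", "q2", "q3"))
    total = q1 + q2 + q3
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (q1 / total, q2 / total, q3 / total)
```

Applied to the text of ASTRAL_species_tree_full_annotation.newick, `parse_annotations` would return one dictionary per annotated node in the order the labels appear; matching those nodes to the IDs in node_keys.key is left to the authors' scripts.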
Complete_NT_concatenated_alignment.phy: the nucleotide alignment that includes unmodified third codon positions. The alignment is in phylip format.
Complete_NT_raxml_partitions.txt: the raxml-style partition file of the nucleotide partitions
Complete_NT_concatenated_tree.newick: the best maximum likelihood tree from the concatenated complete NT analysis with bootstrap values in newick format
Complete_NT_partitioned_tree.newick: the best maximum likelihood tree from the partitioned complete NT analysis with bootstrap values in newick format
Degeneracy_coded_nt_concatenated_alignment.phy: the degeneracy coded nucleotide alignment in phylip format
Degeneracy_coded_nt_raxml_partitions.txt: the raxml-style partition file for the degeneracy coded nucleotide alignment
Degeneracy_coded_nt_concatenated_tree.newick: the best maximum likelihood tree from the degeneracy-coded concatenated analysis with bootstrap values in newick format
Degeneracy_coded_nt_partitioned_tree.newick: the best maximum likelihood tree from the degeneracy-coded partitioned analysis with bootstrap values in newick format
count_ingroup_taxa.py: script that counts the number of ingroup and/or outgroup taxa present in an alignment
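For a quick taxon count outside the provided count_ingroup_taxa.py, a minimal Python sketch follows. It is a hypothetical illustration, not the authors' script: it assumes a relaxed, sequential PHYLIP file (a `<ntax> <nchar>` header, then one `<name> <sequence>` line per taxon; interleaved files are not handled), and the outgroup names must be supplied by the user.

```python
from typing import Iterable


def read_phylip_taxa(lines: Iterable[str]) -> list[str]:
    """Return taxon names from a relaxed, sequential PHYLIP alignment.

    Assumes the first non-empty line is "<ntax> <nchar>" and that each
    following non-empty line is "<name> <sequence>" (one taxon per line).
    """
    stripped = (line.strip() for line in lines)
    nonblank = (line for line in stripped if line)  # skip blank lines
    ntax = int(next(nonblank).split()[0])           # header: number of taxa first
    names = []
    for line in nonblank:
        names.append(line.split()[0])
        if len(names) == ntax:
            break
    if len(names) != ntax:
        raise ValueError(f"expected {ntax} taxa, found {len(names)}")
    return names


def count_ingroup_outgroup(names: list[str], outgroup: Iterable[str]) -> tuple[int, int]:
    """Return (ingroup, outgroup) counts; any taxon not listed as outgroup is ingroup."""
    out = set(outgroup)
    n_out = sum(1 for name in names if name in out)
    return len(names) - n_out, n_out
```

For example, passing an open handle for Amino_acid_concatenated_alignment.phy to `read_phylip_taxa` would list its taxon names, provided the file follows the assumed sequential layout.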
keywords:
Auchenorrhyncha; Hemiptera; alignment; trees
published:
2025-10-10
Dong, Chang; Shi, Zhuwei; Huang, Lei; Zhao, Huimin; Xu, Zhinan; Lian, Jiazhang
(2025)
The mitochondrion is generally considered the most promising subcellular organelle for compartmentalization engineering. Much progress has been made in reconstituting whole metabolic pathways in the mitochondria of yeast to harness the precursor pools (i.e., pyruvate and acetyl-CoA), bypass competing pathways, and minimize transportation limitations. However, only a few mitochondrial targeting sequences (MTSs) have been characterized (e.g., the MTS of COX4), limiting the application of compartmentalization engineering to multigene biosynthetic pathways in yeast mitochondria. In the present study, a total of 20 MTSs were cloned based on the mitochondrial proteome, and their efficiency in targeting heterologous proteins, including Escherichia coli FabI and enhanced green fluorescent protein (EGFP), into the mitochondria was evaluated by growth complementation and confocal microscopy. After systematic characterization, six well-performing MTSs were chosen for colocalizing complete biosynthetic pathways in the mitochondria. As a proof of concept, the full α-santalene biosynthetic pathway, consisting of 10 expression cassettes that convert acetyl-CoA to α-santalene, was compartmentalized into the mitochondria, leading to a 3.7-fold improvement in α-santalene production. The newly characterized MTSs should expand the metabolic engineering and synthetic biology toolbox for yeast mitochondrial compartmentalization engineering.
keywords:
Conversion; Metabolic Engineering
published:
2020-11-05
Miller, Andrew; Raudabaugh, Daniel
(2020)
This version 2 dataset contains 34 files in total with one (1) additional file, called "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining files (33) are identical to version 1. The following is the information about the new file and its variables:
Culture-dependent Isolate table with taxonomic determination and sequence data.csv: Culture table with taxonomy assigned from NCBI. A single-direction sequence is included for each isolate for which one could be obtained. Sequences are derived from ITS1F-ITS4 PCR amplicons, Sanger-sequenced in one direction using ITS5. The file contains 20 variables, explained below:
IsolateNumber: unique number identifying each cultured isolate
Time: season in which the sample was collected
Location: the specific name of the location
Habitat: type of habitat, either stream or peatland
State: U.S. state in which the location is situated
Incubation_pH ID: pH of the medium during isolation of fungal cultures
Genus: phylogenetic genus of the fungal isolates (determined by sequence similarity)
Sequence_quality: base call quality of the entire sequence used for blast analysis, if known
%_coverage: sequence coverage reported from GenBank
%_ID: sequence similarity reported from GenBank
Life_style: ecological lifestyle, if known
Phylum: phylogenetic phylum as indicated by Index Fungorum
Subphylum: phylogenetic subphylum as indicated by Index Fungorum
Class: phylogenetic class as indicated by Index Fungorum
Subclass: phylogenetic subclass as indicated by Index Fungorum
Order: phylogenetic order as indicated by Index Fungorum
Family: phylogenetic family as indicated by Index Fungorum
ITS5_Sequence: single-direction sequence (primer ITS5) used for sequence similarity matching with blastn
Fasta: sequence with nomenclature in a fasta format for easy cut and paste into phylogenetic software
Note: blank cells mean no data is available or unknown.
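Because blank cells mean that no data is available, it can help to turn them into explicit missing values when loading the table. The Python sketch below assumes the CSV header uses the variable names exactly as listed above; the helper names `load_isolates`, `fasta_records`, and `sequenced_by_habitat` are hypothetical. It collects the ready-made records from the Fasta column and counts sequenced isolates per habitat.

```python
import csv
from collections import Counter
from typing import Optional, TextIO

Row = dict[str, Optional[str]]


def load_isolates(handle: TextIO) -> list[Row]:
    """Read the isolate table; blank cells become None ("no data or unknown")."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({key: (value.strip() or None) if value is not None else None
                     for key, value in raw.items()})
    return rows


def fasta_records(rows: list[Row]) -> list[str]:
    """Return the non-blank Fasta cells, one FASTA record per sequenced isolate."""
    return [row["Fasta"] for row in rows if row.get("Fasta")]


def sequenced_by_habitat(rows: list[Row]) -> Counter:
    """Count isolates that have an ITS5 sequence, grouped by Habitat (stream/peatland)."""
    return Counter(row.get("Habitat") for row in rows if row.get("ITS5_Sequence"))
```

When reading the actual file, open it with `newline=""`, as the csv module documentation recommends, so that quoted multi-line cells such as FASTA records are parsed correctly; the UTF-8 encoding is an assumption.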
keywords:
ITS1 forward reads; Illumina; peatlands; streams; bogs; fens
published:
2019-05-07
Detmer, Thomas; Wahl, David
(2019)
Dataset from mesocosm experiments on a trophic cascade caused by Bluegill, covering zooplankton (biomass and body size) and phytoplankton (chlorophyll a concentration), as well as zooplankton production in the same treatment groups. Zooplankton were collected with a tube sampler and phytoplankton through grab samples.
keywords:
Trophic cascades; size-selective predation; compensatory mechanisms; biomanipulation; invasive fish; Daphnia; Moina
published:
2020-02-12
Price, Edward; Spyreas, Greg; Matthews, Jeffrey
(2020)
This is the dataset used in the Landscape Ecology publication of the same name. This dataset consists of the following files:
NWCA_Int_Veg.txt
NWCA_Reg_Veg.txt
NWCA_Site_Attributes.txt
NWCA_Int_Veg.txt is a site-and-plot by species matrix. The column labeled SITES contains site IDs, and the column labeled Plots contains plot ID numbers. All other columns represent species abundances (estimates of percent cover, summed across five plots).
NWCA_Reg_Veg.txt is a site by species matrix of species abundances. The column labeled SITES contains site IDs. All other columns represent species abundances (estimates of percent cover within individual plots).
NWCA_Site_Attributes.txt is a matrix of site attributes. The column labeled SITES contains site IDs. The column labeled AA_CENTER_LAT contains the latitude of the Assessment Area center point in decimal degrees, and the column labeled AA_CENTER_LONG contains its longitude in decimal degrees. The column REFPLUS_NWCA gives disturbance gradient classes: MIN (minimally disturbed), L (least disturbed), I (intermediate), and M (most disturbed). The column REFPLUS_NWCA2 gives revised disturbance gradient classes based on protocols described in the article; these revised classes were used for analysis. The column labeled STRESS_HEAVYMETAL gives heavy metal stressor classes (Low, Moderate, High, and Missing), used to determine which wetlands were missing soil data. Sites with a Missing STRESS_HEAVYMETAL class were removed from analysis.
More information about this dataset: All of the data used in this analysis was gathered from the National Wetlands Condition Assessment. Wetland surveys were conducted from 4/4/2011 to 11/2/2011. The entire National Wetlands Condition Assessment Dataset, which includes 3640 unique taxonomic identities of plants, can be found at: https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys
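The site-exclusion rule described above (dropping sites whose STRESS_HEAVYMETAL class is Missing) and the grouping by revised disturbance class can be sketched in Python as follows. This is an illustration under assumptions, not the authors' analysis code: the .txt files are assumed to be tab-delimited with a header row, the class label is assumed to appear literally as "Missing", and the function names `analysis_sites` and `keep_sites` are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable, TextIO


def analysis_sites(handle: TextIO, delimiter: str = "\t") -> dict[str, list[str]]:
    """Group site IDs by revised disturbance class (REFPLUS_NWCA2),
    skipping sites whose STRESS_HEAVYMETAL class is "Missing".

    Assumption: tab-delimited text with a header row.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["STRESS_HEAVYMETAL"].strip() == "Missing":
            continue  # no soil data: excluded from analysis
        groups[row["REFPLUS_NWCA2"].strip()].append(row["SITES"].strip())
    return dict(groups)


def keep_sites(handle: TextIO, sites: Iterable[str],
               delimiter: str = "\t") -> list[dict[str, str]]:
    """Return rows of a species matrix (e.g. NWCA_Reg_Veg.txt) whose SITES value is retained."""
    wanted = set(sites)
    return [row for row in csv.DictReader(handle, delimiter=delimiter)
            if row["SITES"].strip() in wanted]
```

Filtering the species matrices through the same site list keeps the vegetation data aligned with the attribute table; if the files turn out to use another delimiter, only the `delimiter` argument needs to change.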
keywords:
Anthropogenic disturbance; β-Diversity; Biotic homogenization; Phalaris arundinacea; reed canary grass; Wetlands
published:
2024-12-05
Salami, Malik Oyewale; McCumber, Corinne
(2024)
This project investigates retraction indexing agreement among data sources: BCI, BIOABS, CCC, Compendex, Crossref, GEOBASE, MEDLINE, PubMed, Retraction Watch, Scopus, and Web of Science Core. Post-retraction citation may be partly due to authors' and publishers' challenges in systematically identifying retracted publications. To assess retraction indexing quality, we examine the agreement in indexing retracted publications among 11 database sources, restricted to each source's coverage, resulting in a union list of 85,392 unique items. We also discuss common errors in indexing retracted publications. Our results reveal low retraction indexing agreement scores, indicating that databases widely disagree on indexing the retracted publications they cover, which leads to inconsistency in which publications are identified as retracted. Our findings highlight the need for clear and standard practices in the curation and management of retracted publications.
Pipeline code to produce the result files can be found in the ‘src’ folder, which contains IPython notebooks, of the GitHub repository:
https://github.com/infoqualitylab/retraction-indexing-agreement
The ‘unionlist_completed-ria_2024-07-09.csv’ file has been redacted to remove proprietary data, as noted below in README.txt. Among our sources, data is openly available only for Crossref, PubMed, and Retraction Watch.
FILE FORMATS:
1) unionlist_completed-ria_2024-07-09.csv - UTF-8 CSV file
2) README.txt - text file
keywords:
retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS