Illinois Data Bank
University Library, University of Illinois at Urbana-Champaign
Displaying datasets 101 - 125 of 478 in total
Life Sciences (254)
Social Sciences (114)
Physical Sciences (68)
Technology and Engineering (38)
Arts and Humanities (1)
U.S. National Science Foundation (NSF) (139)
U.S. National Institutes of Health (NIH) (49)
U.S. Department of Energy (DOE) (42)
U.S. Department of Agriculture (USDA) (23)
Illinois Department of Natural Resources (IDNR) (10)
U.S. National Aeronautics and Space Administration (NASA) (5)
U.S. Geological Survey (USGS) (5)
Illinois Department of Transportation (IDOT) (1)
U.S. Army (1)
CC BY (186)
Lyu, Fangzheng; Xu, Zewei; Ma, Xinlin; Wang, Shaohua; Li, Zhiyu; Wang, Shaowen (2021): A Vector-Based Method for Drainage Network Analysis Based on LiDAR Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6359717_V1
Drainage network analysis is fundamental to understanding the characteristics of surface hydrology. Based on elevation data, drainage network analysis is often used to extract key hydrological features like drainage networks and streamlines. Limited by raster-based data models, conventional drainage network algorithms typically allow water to flow in 4 or 8 directions (surrounding grids) from a raster grid. To resolve this limitation, this paper describes a new vector-based method for drainage network analysis that allows water to flow in any direction around each location. The method is enabled by rapid advances in Light Detection and Ranging (LiDAR) remote sensing and high-performance computing. The drainage network analysis is conducted using a high-density point cloud instead of Digital Elevation Models (DEMs) at coarse resolutions. Our computational experiments show that the vector-based method can better capture water flows without limiting the number of directions due to imprecise DEMs. Our case study applies the method to Rowan County watershed, North Carolina in the US. After comparing the drainage networks and streamlines detected with corresponding reference data from US Geological Survey generated from the Geonet software, we find that the new method performs well in capturing the characteristics of water flows on landscape surfaces in order to form an accurate drainage network. This dataset contains all the code, notebooks, and datasets used in the study conducted for the research publication titled "A Vector-Based Method for Drainage Network Analysis Based on LiDAR Data".
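To make the raster limitation described above concrete, the following is a minimal sketch of the conventional 8-direction rule (commonly called D8), not the authors' vector-based method. The function name `d8_direction`, the `cell_size` parameter, and the toy elevation grid are illustrative assumptions and do not come from the dataset.

```python
import math

# Offsets (d_row, d_col) to the 8 cells surrounding a raster cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_direction(dem, row, col, cell_size=1.0):
    """Return the offset of the steepest downslope neighbour.

    Returns None when no neighbour is lower (a pit or a flat cell).
    Water can only leave the cell along one of 8 fixed directions,
    which is the restriction the abstract refers to.
    """
    best_offset, best_slope = None, 0.0
    for d_row, d_col in NEIGHBOURS:
        r, c = row + d_row, col + d_col
        if 0 <= r < len(dem) and 0 <= c < len(dem[0]):
            # Diagonal neighbours are farther away than orthogonal ones.
            distance = cell_size * math.hypot(d_row, d_col)
            slope = (dem[row][col] - dem[r][c]) / distance
            if slope > best_slope:
                best_offset, best_slope = (d_row, d_col), slope
    return best_offset


# Hypothetical 3 x 3 elevation grid (values in metres).
dem = [
    [5.0, 5.0, 5.0],
    [5.0, 4.0, 3.5],
    [5.0, 3.0, 1.0],
]

print(d8_direction(dem, 1, 1))  # direction chosen for the centre cell
print(d8_direction(dem, 2, 2))  # lowest cell: no downslope neighbour
```

Because the result is always one of eight offsets, the flow direction is quantized to multiples of 45 degrees regardless of the true terrain gradient.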
## What's Inside
A quick explanation of the components:
* `A Vector Approach to Drainage Network Analysis Based on LiDAR Data.ipynb` is a notebook for finding the drainage network based on LiDAR data
* `Picture1.png` is a picture representing the pseudocode of our new algorithm
* `HPC` folder contains code for running the algorithm with sbatch on HPC
** `execute.sh` is a bash script file that uses sbatch to conduct large-scale analysis for the algorithm
** `run.sh` is a bash script file that calls the script file `execute.sh` for large-scale calculation for the algorithm
** `run.py` includes the code implemented for the algorithm
* `Rowan Creek Data` includes data that are used in the study
** `3_1.las` and `3_2.las` are the LiDAR data files that are used in our analysis presented in the paper. Users may use these data files to reproduce our results and may replace them with their own LiDAR files to run this method over different areas
** `reference` folder includes reference data from USGS
*** `reference_3_1.tif` and `reference_3_2.tif` are reference data for the drainage system analysis retrieved from USGS.
CyberGIS; Drainage System Analysis; LiDAR
Detmer, Thomas (2021): Temperature, dissolved oxygen, and Secchi depth of Illinois Reservoirs. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1187851_V1
This data set describes temperature, dissolved oxygen, and Secchi depth in 1-m interval profiles at the deepest point in 10 Illinois reservoirs between the years 1995 and 2016.
Water temperature; dissolved oxygen; secchi depth; climate change
Peng, Jianhao; Ochoa, Idoia (2021): ClonalKinetic Data and Intermediate Results of SimiC. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3975180_V1
This dataset contains the ClonalKinetic dataset that was used in SimiC and its intermediate results for comparison. A detailed description can be found in the text files 'clonalKinetics_Example_data_description.txt' and 'ClonalKinetics_filtered.DF_data_description.txt'.
The required input data for SimiC contain:
1. ClonalKinetics_filtered.clustAssign.txt => cluster assignment for each cell.
2. ClonalKinetics_filtered.DF.pickle => filtered scRNAseq matrix.
3. ClonalKinetics_filtered.TFs.pickle => list of driver genes.
The results after running SimiC contain:
1. ClonalKinetics_filtered_L10.01_L20.01_Ws.pickle => inferred GRNs for each cluster.
2. ClonalKinetics_filtered_L10.01_L20.01_AUCs.pickle => regulon activity scores for each cell and each driver gene.
NOTE: The “ClonalKinetics_filtered.rds” file mentioned in “ClonalKinetics_filtered.DF_data_description.txt” is an intermediate file; the authors have put all the processed data in the pickle/txt files as described in the filtered data text.
Wang, Justin; Curtis, Jeffrey H; Riemer, Nicole; West, Matthew (2021): Data from: Learning coagulation processes with combinatorially-invariant neural networks. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3904737_V1
This dataset contains all the necessary information to recreate the study presented in the paper entitled "Learning coagulation processes with combinatorially-invariant neural networks". This consists of (1) the aggregated output files used for machine learning, (2) the machine learning codes used to learn the presented models, (3) the PartMC model source code that was used to generate the simulation data and (4) the Python scripts used to construct the scenario library for training and testing simulations. This data was used to investigate a method (combinatorially-invariant neural network) for learning the aerosol process of coagulation. This data may be useful for application of other methods.
Machine learning; Atmospheric chemistry; Particle-resolved modeling; Coagulation; Atmospheric Science
Stern, Jessica; Herman, Brook D.; Matthews, Jeffrey (2021): Data from determining vegetation metric robustness to environmental and methodological variables. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0938556_V1
We studied vegetation metric robustness to environmental (season, interannual, and regional) and methodological (observer) variables, as well as adequate sample size for vegetation metrics across four regions of the United States.
coefficients of conservatism; floristic quality assessment; restoration; vegetation metric
Clark, Lindsay V.; Mays, Wittney; Lipka, Alexander E.; Sacks, Erik J. (2021): Dataset for evaluating the Hind/He statistic in polyRAD. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4814898_V1
All of the files in this dataset pertain to the evaluation of a novel statistic, Hind/He, for distinguishing Mendelian loci from paralogs. They are derived from a RAD-seq genotyping dataset of diploid and tetraploid Miscanthus sacchariflorus.
Zaharias, Paul; Grosshauser, Martin; Warnow, Tandy (2021): Data from "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8921156_V1
This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling" accepted for RECOMB2021 and submitted to Journal of Computational Biology. Each zipped file contains a README.
deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation
planned publication date: 2022-08-20
Jones, Todd; Ward, Michael (2022): Jones and Ward BEAS-D-21-00106R2. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4619552_V1
Dataset associated with Jones and Ward BEAS-D-21-00106R2 submission: Parasitic cowbird development up to fledging and subsequent post-fledging survival reflect life history variation found across host species. Excel CSV files and .inp file with data used in nest survival and Brown-headed Cowbird post-fledging analyses and file with descriptions of each column. The CSV file is set up for logistic exposure models in SAS or R and the .inp file is set up to be uploaded into program MARK for multi-state recaptures-only analysis. Species included in the analyses: American Robin, Blue Grosbeak, Brown Thrasher, Blue-winged Warbler, Carolina Chickadee, Chipping Sparrow, Common Yellowthroat, Dickcissel, Eastern Bluebird, Eastern Phoebe, Eastern Towhee, Field Sparrow, Gray Catbird, House Wren, Indigo Bunting, Northern Cardinal, Red-winged Blackbird, Tree Swallow, Yellow-breasted Chat, and Yellow Warbler.
brood parasitism; cowbird; carryover effects; phenotypic plasticity; post-fledging; songbirds
von Haden, Adam C.; DeLucia, Evan H.; Yang, Wendy; Burnham, Mark (2021): Maize and Sorghum Establishment and Yield following Pre-Emergence Waterlogging. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8293871_V1
In 2020, early-season extreme precipitation events that caused ponding occurred in central Illinois following the planting of Sorghum bicolor (L.) Moench and Zea mays L. Following the first rainfall event, 50 m transects were established to assess the waterlogging effects on seedling emergence and crop yields. Soil moisture, emergence, stem and tiller count, LAI, and yield were measured at various points in the season along these transects.
Sorghum; Maize; Emergence; Yield; LAI
Felix, Hanau; Hannes, Rost; Ochoa, Idoia (2021): mspack-data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1396774_V2
This data set contains mass spectrometry data used for the publication "mspack: efficient lossless and lossy mass spectrometry data compression".
mass-spectrometry data; compression; proteomics
Long, Stephen Patrick; Acevedo-Siaca, Liana Gabriella (2021): Data for publication "Evaluating natural variation, heritability, and genetic advance of photosynthetic traits in rice (Oryza sativa)". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3427028_V1
1. Rice H2 - Destructive Harvest - These data are for the destructive harvest (above-ground biomass) of 30 diverse indica rice genotypes that were grown to evaluate natural variation as well as the heritability of photosynthesis-related traits. Traits measured include: plant height, leaf area, plant fresh and dry weights, and tiller number.
2. Rice H2 - ACi Response Summary - These data characterize the response of CO2 uptake to change in intercellular CO2 concentration in 30 diverse indica rice genotypes. These measurements were taken to evaluate natural variation and the heritability of photosynthesis-related traits in rice.
3. Rice H2 - Survey Style Gas Exchange Measurements - These data document steady-state survey style gas exchange measurements in 30 diverse indica rice genotypes. These measurements were taken to evaluate natural variation and the heritability of photosynthesis-related traits in rice.
photosynthesis, photosynthetic capacity, natural variation, heritability, food security, rice
Ferguson, John; Fernandes, Samuel; Monier, Brandon; Miller, Nathan; Allen, Dylan; Dmitrieva, Anna; Schmuker, Peter; Lozano, Roberto; Valluru, Ravi; Buckler, Edward; Gore, Michael; Brown, Patrick; Spalding, Edgar; Leakey, Andrew (2021): Machine learning enabled phenotyping for GWAS and TWAS of WUE traits in 869 field-grown sorghum accessions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5565022_V2
This dataset contains the images of a photoperiod sensitive sorghum accession population used for a GWAS/TWAS study of leaf traits related to water use efficiency in 2016 and 2017.
Note: new in this second version is that JPG images outputted from the nms files were added.
Accessions_2016.zip and Accessions_2017.zip: contain raw images produced by Optical Topometer (nms files) for all sorghum accessions. Images can be opened with Nanofocus μsurf analysis extended software (Oberhausen, Germany).
Accessions_2016_jpg.zip and Accessions_2017_jpg.zip: contain jpg images outputted from the nms files and used in the machine learning phenotyping.
stomata; segmentation; water use efficiency
Lotspeich-Yadao, Michael (2021): State of Illinois - Common Spatial Geodatabase for the Social Sciences. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4857915_V1
This geodatabase serves two purposes: 1) to provide State of Illinois agencies with a fast resource for the preparation of maps and figures that require the use of shape or line files from federal agencies, the State of Illinois, or the City of Chicago, and 2) as a start for social scientists interested in exploring how geographic information systems (whether this is data visualization or geographically weighted regression) can bring new meaning to the interpretation of their data. All layer files included are relevant to the State of Illinois. Sources for this geodatabase include the U.S. Census Bureau, U.S. Geological Survey, City of Chicago, Chicago Public Schools, Chicago Transit Authority, Regional Transportation Authority, and Bureau of Transportation Statistics.
State of Illinois; City of Chicago; Chicago Public Schools; GIS; Statistical tabulation areas; hydrography
Sabrina, Sadia; Lewis, Quinn; Rhoads, Bruce (2021): Data on Confluence Hydrodynamics from Large-scale Particle Velocimetry. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1079505_V1
This dataset contains data derived from large-scale particle velocimetry measurements obtained at the confluence of the Saline Branch and an unnamed tributary in Illinois. The data were collected using two cameras positioned about the confluence, one mounted on a cable and the other mounted on a tripod. A description of the content of the files can be found in Description of Files.rtf.
confluence; hydrodynamics; LSPIV; flow structure; stagnation
Proescholdt, Randi (2021): RISRS Retraction Review - Field Variation Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2070560_V1
This data comes from a scoping review associated with the project called Reducing the Inadvertent Spread of Retracted Science. The data summarizes the fields that have been explored by existing research on retraction, a list of studies comparing retraction in different fields, and a list of studies focused on retraction of COVID-19 articles.
retraction; fields; disciplines; research integrity
Miller, Jim; Czesny, Sergiusz; Dai, Qihong; Ellis, James; Iverson, Louis; Matthews, Jeff; Roswell, Charlie; Suski, Cory; Taft, John; Ward, Mike (2021): An Assessment of the Impacts of Climate Change in Illinois, Chapter 6: Climate Change Impacts on Ecosystems, Supplement 6.1: Scientific and Common Species Names. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9049988_V1
Please cite as: Jim Miller, Sergiusz Czesny, Qihong Dai, James Ellis, Louis Iverson, Jeff Matthews, Charles Roswell, Cory Suski, John Taft, and Mike Ward. 2021. “Climate Change Impacts on Ecosystems: Scientific and Common Species Names”.
Scientific names; Common names; Illinois species
Zuckermann, Federico (2021): Bacillus-based direct-fed microbial reduces the pathogenic synergy of a co-infection with Salmonella enterica serovar Choleraesuis and porcine reproductive and respiratory syndrome virus. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0439780_V1
Raw data and its analysis collected from a trial designed to test the impact of providing a Bacillus-based direct-fed microbial (DFM) on the syndrome resulting from orally infecting pigs with either Salmonella enterica serotype Choleraesuis (S. Choleraesuis) alone, or in combination with an intranasal challenge, three days later, with porcine reproductive and respiratory syndrome virus (PRRSV).
Shen, Chengze; Zaharias, Paul; Warnow, Tandy (2021): MAGUS+eHMMs: Improved Multiple Sequence Alignment Accuracy for Fragmentary Sequences. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2419626_V1
This dataset contains 1) the cleaned version of 11 CRW datasets, 2) RNASim10k dataset in high fragmentation and 3) three CRW datasets (16S.3, 16S.T, 16S.B.ALL) in high fragmentation.
MAGUS; UPP; Multiple Sequence Alignment; PASTA; eHMMs
Gramig, Benjamin; Khanna, Madhu; Jain, Atul (2021): An Assessment of the Impacts of Climate Change in Illinois, Chapter 4: Climate Change Impacts on Agriculture, Supplemental Materials. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8285949_V1
This document contains the Supplemental Materials for Chapter 4: Climate Change Impacts on Agriculture from the report "An Assessment of the Impacts of Climate Change in Illinois" published in 2021.
Illinois; climate change; agriculture; impacts; adaptation; crop yield; ISAM; econometrics; days suitable for fieldwork
Iverson, Louis (2021): An Assessment of the Impacts of Climate Change in Illinois, Chapter 6: Climate Change Impacts on Ecosystems, Supplemental Forest Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3459813_V1
Supplemental Forest Data for Chapter 6: Climate Change Impacts on Ecosystems in "An Assessment of the Impacts of Climate Change in Illinois"
Hsiao, Tzu-Kun; Schneider, Jodi (2021): Dataset for "Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8255619_V2
This dataset includes five files. Descriptions of the files are given as follows:

FILENAME: PubMed_retracted_publication_full_v3.tsv
- Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT]).
- Except for the information in the "cited_by" column, all the data is from PubMed.
- PMIDs in the "cited_by" column that meet either of the two conditions below have been excluded from analyses:
  - PMIDs of the citing papers are from retraction notices (i.e., those in the “retraction_notice_PMID.csv” file).
  - Citing paper and the cited retracted paper have the same PMID.
ROW EXPLANATIONS
- Each row is a retracted paper. There are 7,813 retracted papers.
COLUMN HEADER EXPLANATIONS
1) PMID - PubMed ID
2) Title - Paper title
3) Authors - Author names
4) Citation - Bibliographic information of the paper
5) First Author - First author's name
6) Journal/Book - Publication name
7) Publication Year
8) Create Date - The date the record was added to the PubMed database
9) PMCID - PubMed Central ID (if applicable, otherwise blank)
10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank)
11) DOI - Digital object identifier (if applicable, otherwise blank)
12) retracted_in - Information of retraction notice (given by PubMed)
13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank)
14) cited_by - PMIDs of the citing papers (if applicable, otherwise blank). Data collected from iCite.
15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank)

FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv
- This file contains citation contexts (i.e., citing sentences) where the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles.
- This is part of the data from: Hsiao, T.-K., & Torvik, V. I. (manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis.
- Citation contexts that meet either of the two conditions below have been excluded from analyses:
  - PMIDs of the citing papers are from retraction notices (i.e., those in the “retraction_notice_PMID.csv” file).
  - Citing paper and the cited retracted paper have the same PMID.
ROW EXPLANATIONS
- Each row is a citation context associated with one retracted paper that's cited.
- In the manuscript, we count each citation context once, even if it cites multiple retracted papers.
COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) year - Publication year of the citing paper
4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = tables and table/figure captions)
5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussions/Conclusion, NoIMRaD = not identified)
6) sentence_id - The ID of the citation context in a given location. For location information, please see column 4. The first sentence in the location gets the ID 1, and subsequent sentences are numbered consecutively.
7) total_sentences - Total number of sentences in a given location
8) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
9) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
10) citation - The citation context
11) progression - Position of a citation context by centile within the citing paper.
12) retracted_yr - Retraction year of the retracted paper
13) post_retraction - 0 = not post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction.

FILENAME: 724_knowingly_post_retraction_cit.csv (updated)
- The 724 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv".
- Two citation contexts from retraction notices have been excluded from analyses.
ROW EXPLANATIONS
- Each row is a citation context.
COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) pub_type - Publication type collected from the metadata in the PMCOA XML files.
4) pub_type2 - Specific article types. Please see the manuscript for explanations.
5) year - Publication year of the citing paper
6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, table_or_figure_caption = tables and table/figure captions)
7) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
8) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
9) citation - The citation context
10) retracted_yr - Retraction year of the retracted paper
11) cit_purpose - Purpose of citing the retracted paper. This is from human annotations. Please see the manuscript for further information about annotation.
12) longer_context - An extended version of the citation context (if applicable, otherwise blank). Manually pulled from the full texts in the process of annotation.

FILENAME: Annotation manual.pdf
- The manual for annotating the citation purposes in column 11) of the 724_knowingly_post_retraction_cit.tsv.

FILENAME: retraction_notice_PMID.csv (new file added for this version)
- A list of 8,346 PMIDs of retraction notices indexed in PubMed (retrieved on August 20, 2020, searched with the query "retraction of publication" [PT]).
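The two exclusion conditions and the post-retraction flag described above could be expressed roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, PMIDs are treated as plain strings, and the way the "cited_by" values and the retraction-notice list are parsed from the files is left out because the file layout is not specified here.

```python
def eligible_citing_pmids(retracted_pmid, cited_by, notice_pmids):
    """Drop citing PMIDs that the description says were excluded.

    A citing PMID is excluded when it belongs to a retraction notice
    or when it equals the PMID of the retracted paper itself.
    """
    return [
        pmid for pmid in cited_by
        if pmid not in notice_pmids and pmid != retracted_pmid
    ]


def post_retraction_flag(citing_year, retracted_year):
    """Return 1 for a citation made after the calendar year of retraction, else 0."""
    return 1 if citing_year > retracted_year else 0


# Hypothetical identifiers, for illustration only.
notice_pmids = {"900"}
kept = eligible_citing_pmids("100", ["100", "200", "900", "300"], notice_pmids)
print(kept)
print(post_retraction_flag(2015, 2014), post_retraction_flag(2014, 2014))
```

Under this reading, a citation published in the same calendar year as the retraction is not counted as post-retraction.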
citation context; in-text citation; citation to retracted papers; retraction
Rozansky, Zachary; Larson, Eric; Taylor, Christopher (2021): Data for “Invasive virile crayfish (Faxonius virilis Hagen, 1870) hybridizes with native spothanded crayfish (Faxonius punctimanus Creaser, 1933) in the Current River watershed of Missouri, U.S.”. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7683513_V1
This dataset contains 1 CSV file: RozanskyLarsonTaylorMsat.csv which contains microsatellite fragment lengths for Virile and Spothanded Crayfish from the Current River watershed of Missouri, U.S., and complementary data, including assignments to species by phenotype and COI sequence data, GenBank accession numbers for COI sequence data, study sites with dates of collection and geographic coordinates, and Illinois Natural History Survey (INHS) Crustacean Collection lots where specimens are stored.
invasive species; hybridization; crayfishes; streams; freshwater; Cambaridae; virile crayfish; spothanded crayfish; Missouri; Current River; Ozark National Scenic Riverways
Szydlowski, Daniel; Daniels, Melissa; Larson, Eric (2021): Data for Do rusty crayfish (Faxonius rusticus) invasions affect water clarity in north temperate lakes?. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4293962_V1
Data associated with the manuscript "Do rusty crayfish invasions affect water clarity in north temperate lakes?" by Daniel K. Szydlowski, Melissa K. Daniels, and Eric R. Larson.
chlorophyll a; crayfish; Faxonius rusticus; invasive species; lakes; LandSat; remote sensing; rusty crayfish; Secchi disc; water clarity
Fu, Yuanxi; Schneider, Jodi (2021): Dataset for Fifty Ways to Tag your Pubtypes: Multi-Tagger, a Set of Probabilistic Publication Type and Study Design Taggers to Support Biomedical Indexing and Evidence-Based Medicine. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4635945_V1
This dataset contains data from extreme-disagreement analysis described in paper “Aaron M. Cohen, Jodi Schneider, Yuanxi Fu, Marian S. McDonagh, Prerna Das, Arthur W. Holt, Neil R. Smalheiser, 2021, Fifty Ways to Tag your Pubtypes: Multi-Tagger, a Set of Probabilistic Publication Type and Study Design Taggers to Support Biomedical Indexing and Evidence-Based Medicine.” In this analysis, our team experts carried out an independent formal review and consensus process for extreme disagreements between MEDLINE indexing and model predictive scores. “Extreme disagreements” included two situations: (1) an abstract was MEDLINE indexed as a publication type but received low scores for this publication type, and (2) an abstract received high scores for a publication type but lacked the corresponding MEDLINE index term. “High predictive score” is defined as the top 100 high-scoring, and “low predictive score” is defined as the bottom 100 low-scoring. Three publication types were analyzed, which are CASE_CONTROL_STUDY, COHORT_STUDY, and CROSS_SECTIONAL_STUDY. Results were recorded in three Excel workbooks, named after the publication types: case_control_study.xlsx, cohort_study.xlsx, and cross_sectional_study.xlsx. The analysis shows that, when the tagger gave a high predictive score (>0.9) on articles that lacked a corresponding MEDLINE indexing term, independent review suggested that the model assignment was correct in almost all cases (CROSS_SECTIONAL_STUDY (99%), CASE_CONTROL_STUDY (94.9%), and COHORT STUDY (92.2%)). Conversely, when articles received MEDLINE indexing but model predictive scores were very low (<0.1), independent review suggested that the model assignment was correct in the majority of cases: CASE_CONTROL_STUDY (85.4%), COHORT STUDY (76.3%), and CROSS_SECTIONAL_STUDY (53.6%). Based on the extreme disagreement analysis, we identified a number of false-positives (FPs) and false-negatives (FNs). For case control study, there were 5 FPs and 14 FNs. 
For cohort study, there were 7 FPs and 22 FNs. For cross-sectional study, there were 1 FP and 45 FNs. We reviewed and grouped them based on patterns noticed, providing clues for further improving the models. This dataset reports the instances of FPs and FNs along with their categorizations.
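The selection of "extreme disagreements" described above can be sketched as follows. The record layout `(abstract_id, score, medline_indexed)` and the function name are assumptions made for illustration, and ranking within the indexed and non-indexed subsets separately is one reading of the description, not a confirmed detail of the authors' procedure.

```python
def extreme_disagreements(records, n=100):
    """Pick the two groups of abstracts sent for independent review.

    records: iterable of (abstract_id, score, medline_indexed) tuples,
    where score is the model's predictive score for one publication type.
    Returns (low_scored_indexed, high_scored_unindexed), each at most n long.
    """
    records = list(records)
    indexed = [r for r in records if r[2]]
    unindexed = [r for r in records if not r[2]]
    # Situation 1: MEDLINE indexed, but among the lowest-scoring abstracts.
    low_scored_indexed = sorted(indexed, key=lambda r: r[1])[:n]
    # Situation 2: not MEDLINE indexed, but among the highest-scoring abstracts.
    high_scored_unindexed = sorted(unindexed, key=lambda r: r[1], reverse=True)[:n]
    return low_scored_indexed, high_scored_unindexed


# Hypothetical scores, for illustration only.
sample = [
    ("a1", 0.05, True),
    ("a2", 0.97, True),
    ("a3", 0.95, False),
    ("a4", 0.02, False),
]
low, high = extreme_disagreements(sample, n=1)
print(low, high)
```

With `n=100` this mirrors the "bottom 100 low-scoring" and "top 100 high-scoring" cut-offs named in the description.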
biomedical informatics; machine learning; evidence based medicine; text mining
Castro, Daniel; Sweedler, Jonathan (2021): High-Throughput Single-Organelle Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5949772_V1
The dataset contains the high-throughput matrix-assisted laser desorption/ionization mass spectrometry XML files for the atrial gland and red hemiduct of Aplysia californica.
Dense-core vesicle; High-throughput; Mass Spectrometry; MALDI; Organelle; Image-Guided; Atrial gland; red hemiduct; Lucent Vesicle