Illinois Data Bank Dataset Search Results


published: 2024-12-10
 
MMAudio pretrained large-44k-v2 model. This model can be used with the open-source codebase https://github.com/hkchengrex/MMAudio
published: 2024-12-07
 
MMAudio pretrained models. These models can be used with the open-source codebase https://github.com/hkchengrex/MMAudio
published: 2024-11-19
 
This project investigates retraction indexing agreement among four data sources: Crossref, Retraction Watch, Scopus, and Web of Science. Using data as of July 2024, it reassesses the April 2023 union list of Schneider et al. (2023): https://doi.org/10.55835/6441e5cae04dbe5586d06a5f. As of April 2023, over 1 in 5 of the 49,924 DOIs indexed as retracted in at least one of Crossref, Retraction Watch, Scopus, and Web of Science had discrepancies in retraction indexing among these sources (Schneider et al., 2023). Here, we determine what changed in 15 months. Pipeline code to generate the results files can be found in the GitHub repository https://github.com/infoqualitylab/retraction-indexing-agreement, in the iPython notebook 'MET-STI2024_Reassessment_of_retraction_indexing_agreement.ipynb'. Some files have been redacted to remove proprietary data, as noted in README.txt. Among our sources, data is openly available only for Crossref and Retraction Watch.
FILE FORMATS:
1) unionlist_completed_2023-09-03-crws-ressess.csv - UTF-8 CSV file
2) unionlist_completed-ria_2024-07-09-crws-ressess.csv - UTF-8 CSV file
3) unionlist-15months-period_sankey.png - Portable Network Graphics (PNG) file
4) unionlist_ria_proportion_comparison.png - Portable Network Graphics (PNG) file
5) README.txt - text file
FILE DESCRIPTION: Descriptions of the files can be found in README.txt.
keywords: retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS
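A minimal Python sketch of how the "discrepancy" share described above could be computed from a union list. It assumes each DOI row carries one boolean flag per source (the column names `crossref`, `retraction_watch`, `scopus`, and `wos` are hypothetical, not taken from the released CSV files) and treats a DOI as discrepant when at least one source flags it as retracted and at least one does not. This simplification ignores whether the non-flagging source covers the item at all, and the example rows use made-up DOIs.

```python
from typing import Iterable, Mapping

# Hypothetical column names; check README.txt for the actual headers.
SOURCES = ("crossref", "retraction_watch", "scopus", "wos")


def is_discrepant(row: Mapping[str, object]) -> bool:
    """True when some sources flag the DOI as retracted and others do not."""
    flags = [bool(row[source]) for source in SOURCES]
    return any(flags) and not all(flags)


def discrepancy_share(rows: Iterable[Mapping[str, object]]) -> float:
    """Share of union-list DOIs (flagged by at least one source) with a discrepancy."""
    union = [row for row in rows if any(bool(row[source]) for source in SOURCES)]
    if not union:
        return 0.0
    return sum(is_discrepant(row) for row in union) / len(union)


# Made-up example rows: two of the five DOIs are flagged by only some sources.
example = [
    {"doi": "10.0000/a", "crossref": True, "retraction_watch": True, "scopus": True, "wos": True},
    {"doi": "10.0000/b", "crossref": True, "retraction_watch": True, "scopus": False, "wos": True},
    {"doi": "10.0000/c", "crossref": False, "retraction_watch": True, "scopus": True, "wos": True},
    {"doi": "10.0000/d", "crossref": True, "retraction_watch": True, "scopus": True, "wos": True},
    {"doi": "10.0000/e", "crossref": True, "retraction_watch": True, "scopus": True, "wos": True},
]
print(discrepancy_share(example))  # 0.4
```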
published: 2024-10-31
 
School buses transport 20 million students annually in the US, and the fleet is currently undergoing electrification. With Vehicle-to-Building (V2B) technology, electric school buses (ESBs) can supply energy to school buildings during power outages, ensuring continued operation and safety. This study assesses the resilience of secondary schools during outages when ESB fleets serve as backup power, across various US climate regions. The findings indicate that the current ESB fleets in representative cities across different US climate regions are insufficient to meet the power demands of an entire school or even its HVAC system. However, we estimated the number of ESBs required to support a school's power needs, and we showed that using V2B technology significantly reduces carbon emissions compared to backup diesel generators. While adjusting HVAC setpoints and installing solar panels had limited impact on school resilience, gathering students in classrooms during outages significantly improved resilience in our case study in Houston, Texas. Given the ongoing electrification of school buses, it is essential for schools to complement ESBs with stationary batteries and other backup power sources, such as solar and/or diesel generators, to effectively address prolonged outages. Planning the deployment of direct current (DC) fast and Level 2 chargers can reduce infrastructure costs while maintaining the resilience benefits of ESBs. This dataset includes the simulation process and results of this study.
keywords: Electric school bus; Power outages; Vehicle-to-Building technology; Carbon emission reduction; Backup power source
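The question of how many buses a school would need can be illustrated with a simple energy balance: energy demand (average load times outage duration) divided by the energy each bus can deliver. This is a hypothetical back-of-the-envelope sketch, not the simulation used in the study; the usable-fraction and V2B-efficiency defaults, as well as the example load and battery size, are assumptions chosen only for illustration.

```python
import math


def buses_required(load_kw: float, outage_hours: float, battery_kwh: float,
                   usable_fraction: float = 0.8, v2b_efficiency: float = 0.9) -> int:
    """Smallest number of fully charged buses whose deliverable energy covers the outage.

    demand (kWh)  = average building load (kW) * outage duration (h)
    per bus (kWh) = battery capacity * usable fraction * V2B conversion efficiency
    The default fraction and efficiency are illustrative assumptions.
    """
    if min(load_kw, outage_hours, battery_kwh, usable_fraction, v2b_efficiency) <= 0:
        raise ValueError("all inputs must be positive")
    demand_kwh = load_kw * outage_hours
    per_bus_kwh = battery_kwh * usable_fraction * v2b_efficiency
    return math.ceil(demand_kwh / per_bus_kwh)


# Hypothetical school: 200 kW average load, 24 h outage, 150 kWh bus batteries.
print(buses_required(load_kw=200, outage_hours=24, battery_kwh=150))  # 45
```

With these assumed numbers, the school needs 4,800 kWh over the outage, and each bus delivers about 108 kWh, so 45 buses are required.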
published: 2024-12-05
 
This project investigates retraction indexing agreement among 11 data sources: BCI, BIOABS, CCC, Compendex, Crossref, GEOBASE, MEDLINE, PubMed, Retraction Watch, Scopus, and Web of Science Core. Post-retraction citation may be partly due to the challenges authors and publishers face in systematically identifying retracted publications. To assess retraction indexing quality, we investigate the agreement in indexing retracted publications among the 11 database sources, restricting comparisons to the sources' coverage, which results in a union list of 85,392 unique items. We also discuss common errors in indexing retracted publications. Our results reveal low retraction indexing agreement scores, indicating that databases widely disagree on which of the publications they cover are retracted, leading to inconsistency in what publications are identified as retracted. Our findings highlight the need for clear, standard practices in the curation and management of retracted publications. Pipeline code to generate the result files can be found in the GitHub repository https://github.com/infoqualitylab/retraction-indexing-agreement, in the 'src' folder containing iPython notebooks. The 'unionlist_completed-ria_2024-07-09.csv' file has been redacted to remove proprietary data, as noted in README.txt. Among our sources, data is openly available only for Crossref, PubMed, and Retraction Watch.
FILE FORMATS:
1) unionlist_completed-ria_2024-07-09.csv - UTF-8 CSV file
2) README.txt - text file
keywords: retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS
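One way to score agreement between two sources while restricting to their coverage is sketched below: compare only items that both sources cover, then take the Jaccard overlap of the items each flags as retracted. This is an assumed, illustrative measure, not necessarily the agreement score used in the study, and the item identifiers are made up.

```python
from typing import Set


def pairwise_agreement(retracted_a: Set[str], retracted_b: Set[str],
                       covered_a: Set[str], covered_b: Set[str]) -> float:
    """Jaccard overlap of two sources' retraction flags within their shared coverage."""
    shared = covered_a & covered_b            # items indexed by both sources
    flagged_a = retracted_a & shared          # A's retraction flags, restricted to shared coverage
    flagged_b = retracted_b & shared
    either = flagged_a | flagged_b
    if not either:
        return 1.0  # neither source flags anything it shares: no disagreement to measure
    return len(flagged_a & flagged_b) / len(either)


covered_a = {"d1", "d2", "d3", "d4"}
covered_b = {"d2", "d3", "d4", "d5"}
score = pairwise_agreement({"d1", "d2", "d3"}, {"d2", "d4", "d5"}, covered_a, covered_b)
print(round(score, 3))  # 0.333
```

Here d1 and d5 drop out because only one source covers each; of the remaining flagged items (d2, d3, d4), only d2 is flagged by both sources.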
published: 2024-12-05
 
Data consists of RNA expression, tuber mass, photosynthetic capacity and diurnal CO2 assimilation calculations, potato tuber nutrient content, photorespiratory metabolite analysis, and meteorological data supporting the increased yield and thermotolerance observed in potato plants with an introduced photorespiratory bypass. Data was collected between 2019 and 2024 at the University of Illinois at Urbana-Champaign, IL, USA.
keywords: Photorespiratory bypass; photosynthesis; photorespiration; food security; potato
published: 2024-12-01
 
Healthy mares were kept at pasture for 3 weeks, stabled for 5 weeks, then returned to pasture, and a final sample was collected 6 weeks later. Samples were collected weekly: gastric fluid by double-tube nasogastric intubation and aspiration, and feces by rectal palpation. Microbial DNA was isolated using the QIAamp PowerFecal Pro DNA kit. Full-length 16S, ITS, and partial 23S rRNA gene libraries were created using the Shoreline Complete ID kit.
published: 2024-11-27
 
Honey bee (Apis mellifera) MERFISH dataset prepared by the Han lab from brains collected by the Robinson lab at UIUC. The dataset comprises ~22 thousand cells and 130 genes, with x,y locations for each cell. A Jupyter notebook is included as an example of loading the data with Scanpy.
keywords: smFISH; single transcript spatial transcriptomics; Honey bee brain; Apis mellifera; MERFISH
published: 2020-09-02
 
Citation context annotation. This dataset is the second version (V2) and part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn (2020), "Continued post-retraction citation of a fraudulent clinical trial report, eleven years after it was retracted for falsifying data", Scientometrics, in press, DOI: 10.1007/s11192-020-03631-1. Publications were selected by examining all citations to the retracted paper Matsuyama 2005 and selecting the 35 citing papers, published 2010 to 2019, that do not mention the retraction but do mention the methods or results of the retracted paper (called "specific" in Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi (2020): Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8150563_V1). The annotated citations are second-generation citations to the retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) https://doi.org/10.1016/S0012-3692(08)60339-6).
OVERALL DATA FOR VERSION 2 (V2)
FILES/FILE FORMATS
Same data in two formats:
2010-2019 SG to specific not mentioned FG.csv - Unicode CSV (preservation format only); same as in V1
2010-2019 SG to specific not mentioned FG.xlsx - Excel workbook (preferred format); same as in V1
Additional files in V2:
2G-possible-misinformation-analyzed.csv - Unicode CSV (preservation format only)
2G-possible-misinformation-analyzed.xlsx - Excel workbook (preferred format)
ABBREVIATIONS:
2G - Refers to second-generation citations of Matsuyama 2005
FG - Refers to the first-generation (direct) citation of Matsuyama 2005, i.e., the article that the second-generation item cites
COLUMN HEADER EXPLANATIONS
File name: 2G-possible-misinformation-analyzed. Other column headers in this file have the same meaning as explained in V1. The additional headers are:
Quote Number - The order of the quote (citation context citing the first-generation article given in "FG in bibliography") in the second-generation article (given in "2G article")
Quote - The text of the quote (citation context citing the first-generation article given in "FG in bibliography") in the second-generation article (given in "2G article")
Translated Quote - English translation of "Quote", automatically translated via Google Scholar
Seriousness/Risk - Our assessment of the risk of misinformation and its seriousness
2G topic - Our assessment of the topic of the citing article (the second-generation article given in "2G article")
2G section - The section of the citing article (the second-generation article given in "2G article") in which the cited article (the first-generation article given in "FG in bibliography") was found
FG in bib type - The type of the cited article (the first-generation article given in "FG in bibliography"), e.g., review article
FG in bib topic - Our assessment of the topic of the cited article (the first-generation article given in "FG in bibliography")
FG in bib section - The section of the cited article (the first-generation article given in "FG in bibliography") in which the retracted Matsuyama paper was cited
keywords: citation context annotation; retraction; diffusion of retraction; second-generation citation context analysis
suppressed by curator
 
published: 2024-11-18
 
This dataset supports the implementation described in the manuscript "Breaking the Barrier of Human-Annotated Training Data for Machine-Learning-Aided Biological Research Using Aerial Imagery." It consists of UAV aerial imagery used to execute the code available at https://github.com/pixelvar79/GAN-Flowering-Detection-paper. For detailed information on dataset usage and instructions on implementing the code to reproduce the study, please refer to the GitHub repository.
keywords: Plant phenotyping; generative and adversarial learning; phenotyping; UAV; UAS; drone
published: 2024-10-18
 
Exhaustive species inventory of a suburban wetland complex in northeast Ohio (Cuyahoga County).
keywords: floristic survey; wetland complex; comprehensive species list
published: 2024-10-16
 
School testing data were provided by Shield Illinois (ShieldIL), which conducted weekly in-school testing on behalf of the Illinois Department of Public Health (IDPH) for all participating schools in the state except Chicago Public Schools. The populations and proportions of students and employees in the studied school districts are taken from the Elementary/Secondary Information System (ElSi) database.
keywords: COVID-19; school testing
published: 2023-07-05
 
The salt controversy is the public health debate about whether population-level salt reduction is beneficial. This dataset covers 82 publications, comprising 14 systematic review reports (SRRs) and 68 primary study reports (PSRs), addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: the reports and their opinion classification (for, against, and inconclusive) were taken from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. (2016), not the other types of publications, and it adds the information noted below. This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not every reference cited in an SRR is a PSR included in its evidence synthesis. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. If an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset; these are unused PSRs.
If the inclusion network is visualized, they will appear as isolated nodes. We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection) that combines the Scopus API with manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports. We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible for inclusion in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. The file potential_inclusion_link.csv records whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in that SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) that preserves the inclusion relationships identified by Trinquart et al. (2016).
UPDATES IN THIS VERSION COMPARED TO V2 (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V2):
- We added a new column "pub_date" to report_list.csv.
- We corrected mistakes in supplementary_reference_list.pdf for report #28 and report #80. The authors of report #28 are Khaw, K.-T., & Barrett-Connor, E., not Salisbury D. Report #80 was mistakenly mixed up with report #81.
keywords: systematic reviews; evidence synthesis; network analysis; public health; salt controversy
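The inclusion network and the unused-PSR check described above can be sketched with Python's standard library. The column names (`report_id` and `report_type` in report_list.csv; `source` and `target` in inclusion_net_edges.csv) are hypothetical placeholders, so check the README for the real headers; the in-memory CSV text stands in for the actual files.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, TextIO, Tuple


def load_inclusion_network(report_csv: TextIO, edge_csv: TextIO
                           ) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return (node id -> report type, SRR id -> set of included PSR ids)."""
    node_type = {row["report_id"]: row["report_type"] for row in csv.DictReader(report_csv)}
    includes: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(edge_csv):
        includes[row["source"]].add(row["target"])  # directed edge SRR -> PSR
    return node_type, includes


def unused_psrs(node_type: Dict[str, str], includes: Dict[str, Set[str]]) -> List[str]:
    """PSRs that no SRR includes; these show up as isolated nodes."""
    included = set().union(*includes.values())
    return sorted(node for node, kind in node_type.items()
                  if kind == "PSR" and node not in included)


# Hypothetical stand-ins for report_list.csv and inclusion_net_edges.csv.
reports = io.StringIO("report_id,report_type\n1,SRR\n2,PSR\n3,PSR\n4,PSR\n")
edges = io.StringIO("source,target\n1,2\n1,3\n")
types, net = load_inclusion_network(reports, edges)
print(unused_psrs(types, net))  # ['4']
```

Report 4 is a PSR that no SRR includes, so it would be drawn as an isolated node.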
published: 2023-09-21
 
The relationship between physical activity and mental health, especially depression, is one of the most studied topics in exercise science and kinesiology. Although there is strong consensus that regular physical activity improves mental health and reduces depressive symptoms, some researchers debate the mechanisms involved in this relationship, as well as the limitations and definitions used in such studies. Meta-analyses and systematic reviews continue to examine the strength of the association between physical activity and depressive symptoms in order to improve exercise prescription as a treatment, or combined treatment, for depression. This dataset covers 27 review articles (systematic reviews, meta-analyses, or both) and 365 primary study articles addressing the relationship between physical activity and depressive symptoms. Primary study articles were manually extracted from the review articles. We used a custom-made workflow (Fu, Yuanxi. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection) that combines the Scopus API with manual work to extract and disambiguate authorship information for the 392 articles. The author information file (author_list.csv) is the product of this workflow and can be used to compute the co-author network of the 392 articles. This dataset can be used to construct the inclusion network and the co-author network of the 27 review articles and 365 primary study articles. A primary study article is "included" in a review article if it is considered in the review article's evidence synthesis. Each included primary study article is cited in the review article, but not every reference cited in a review article is a primary study article included in its evidence synthesis.
The inclusion network is a bipartite network with two types of nodes: one type represents review articles, and the other represents primary study articles. If a review article includes a primary study article, there is a directed edge from the review article node to the primary study article node. The attribute file (article_list.csv) includes attributes of the 392 articles, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Collectively, this dataset reflects the patterns of evidence production and use within the exercise science and kinesiology research community that investigates the relationship between physical activity and depressive symptoms.
FILE FORMATS
1. article_list.csv - Unicode CSV
2. author_list.csv - Unicode CSV
3. Chinese_author_name_reference.csv - Unicode CSV
4. inclusion_net_edges.csv - Unicode CSV
5. review_article_details.csv - Unicode CSV
6. supplementary_reference_list.pdf - PDF
7. README.txt - text file
8. systematic_review_inclusion_criteria.csv - Unicode CSV
UPDATES IN THIS VERSION COMPARED TO V3 (Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V3):
- We added a new file, systematic_review_inclusion_criteria.csv.
keywords: systematic reviews; meta-analyses; evidence synthesis; network visualization; tertiary studies; physical activity; depressive symptoms; exercise; review articles
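A co-author network of the kind author_list.csv supports can be built by linking every pair of authors who share an article and weighting each edge by the number of shared articles. This sketch assumes one row per author per article, with hypothetical columns `article_id` and `author_id` (check README.txt for the actual layout); the sample rows are made up.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Set, TextIO


def coauthor_edges(author_csv: TextIO) -> Counter:
    """Undirected co-author edges (sorted author pair) -> number of shared articles."""
    authors_by_article: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(author_csv):
        authors_by_article[row["article_id"]].add(row["author_id"])
    weights: Counter = Counter()
    for authors in authors_by_article.values():
        # Sorting makes (a, b) a canonical key for the undirected pair.
        for pair in combinations(sorted(authors), 2):
            weights[pair] += 1
    return weights


# Made-up rows: A and B co-author both R1 and P1; C appears only on R1.
sample = io.StringIO(
    "article_id,author_id\n"
    "R1,A\nR1,B\nR1,C\n"
    "P1,A\nP1,B\n"
)
edges = coauthor_edges(sample)
print(edges[("A", "B")])  # 2
```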
published: 2024-11-01
 
This dataset includes data on soil nitrous oxide fluxes, soil properties, and climate presented in the manuscript "A conceptual model explaining spatial variation in soil nitrous oxide emissions in agricultural fields," published in Communications Earth & Environment. Please refer to that publication for details about the methodologies used to generate these data and about the experimental design.
keywords: soil nitrous oxide emissions; gross nitrous oxide production; gross nitrous oxide consumption; N2O; denitrification; maize; cannon model
published: 2024-11-07
 
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations of Willoughby et al., 2014: Patrick H. Willoughby, Matthew J. Jansma, and Thomas R. Hoye (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042. We added the DOIs of the citing publications to a Zotero collection. Then we exported all 286 DOIs in two formats: a .csv file (data export) and an .rtf file (bibliography). Willoughby2014_286citing_publications.csv is a Zotero data export of the citing publications. Willoughby2014_286citing_publications.rtf is a bibliography of the citing publications, using a variation of the American Psychological Association style (7th edition) with full names instead of initials. To create Willoughby2014_citation_contexts.csv, HZ manually extracted the paragraphs that contain a citation marker of Willoughby et al., 2014. We refer to these paragraphs as the citation contexts of Willoughby et al., 2014. Manual extraction started with the 286 citing publications but excluded 2 publications that are not in English, with DOIs 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201. The silver standard aimed to triage the citing publications of Willoughby et al., 2014 that are at risk of propagating unreliability due to a code glitch in a computational chemistry protocol introduced in Willoughby et al., 2014. The silver standard was created stepwise. First, one chemistry expert (YF) manually annotated the corpus of 284 citing publications in English, using their full text and citation contexts. She manually categorized each publication as either at risk or not at risk of propagating unreliability, with a rationale justifying each category. Then we selected a representative sample of citation contexts to be double annotated.
To do this, MJS turned the full dataset of citation contexts (Willoughby2014_citation_contexts.csv) into word embeddings, clustered them by similarity using BERTopic's HDBSCAN clustering, and selected representative citation contexts based on the cluster centroids. Next, the second chemistry expert (EV) annotated the 77 publications associated with these citation contexts, considering the full text as well as the citation contexts. double_annotated_subset_77_before_reconciliation.csv provides EV's and YF's annotations before reconciliation. To create the silver standard, YF, EV, and JS discussed the differences and reconciled most of them. YF and EV had principled reasons for disagreeing on 9 publications; for these, YF updated the annotations to create the silver standard we use for evaluation in the remainder of our JCDL 2024 paper (silver_standard.csv). Inter_Annotator_Agreement.xlsx indicates the publications where the two annotators made opposite decisions and calculates inter-annotator agreement both before and after reconciliation. double_annotated_subset_77_before_reconciliation.csv provides EV's and YF's annotations after reconciliation, including the application of the reconciliation policy.
keywords: unreliable cited sources; knowledge maintenance; citations; scientific digital libraries; scholarly publications; reproducibility; unreliability propagation; citation contexts
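The description does not say which agreement statistic Inter_Annotator_Agreement.xlsx uses. As an illustration only, Cohen's kappa, a common choice for two annotators assigning categorical labels, can be computed as below; the labels ("risk"/"no risk") and the ten example decisions are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label if labeling independently.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined, report 1.0
    return (observed - expected) / (1 - expected)


# Made-up decisions: the annotators agree on 8 of 10 items.
yf = ["risk"] * 4 + ["no risk"] * 6
ev = ["risk"] * 3 + ["no risk"] * 6 + ["risk"]
print(round(cohens_kappa(yf, ev), 3))  # 0.583
```

Raw agreement here is 0.8, but both annotators label 40% of items "risk", so chance agreement is 0.52 and kappa drops to about 0.58.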
published: 2018-04-19
 
MapAffil 2016 dataset: PubMed author affiliations mapped to cities and their geocodes worldwide. Prepared by Vetle Torvik 2018-04-05. The dataset comes as a single tab-delimited, Latin-1-encoded file (only the City column uses non-ASCII characters) and should be about 3.5GB uncompressed.
• How was the dataset created? The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October 2016. See https://www.nlm.nih.gov/databases/download/pubmed_medline.html for information on obtaining PubMed/MEDLINE data and for NLM's data Terms and Conditions.
• Affiliations are linked to a particular author on a particular article. Prior to 2014, NLM recorded the affiliation of the first author only. However, MapAffil 2016 covers some PubMed records lacking affiliations that were harvested elsewhere: from PMC (e.g., PMID 22427989), NIH grants (e.g., 1838378), and Microsoft Academic Graph and ADS (e.g., 5833220).
• Affiliations are pre-processed (e.g., transliterated into ASCII from UTF-8 and HTML), so they may differ (sometimes a lot; see PMID 27487542) from PubMed records.
• All affiliation strings were processed using the MapAffil procedure to identify and disambiguate the most specific place-name, as described in: Torvik VI. MapAffil: A bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide. D-Lib Magazine 2015; 21 (11/12). 10p
• See Fig. 4 (https://doi.org/10.1186/s41182-017-0073-6) in the following article for coverage statistics over time: Palmblad M, Torvik VI. Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services. Tropical medicine and health. 2017 Dec;45(1):33. Expect to see big upticks in coverage of PMIDs around 1988 and for non-first authors in 2014.
• The code and back-end data are periodically updated and made available for query by PMID at the Torvik Research Group site (http://abel.ischool.illinois.edu/).
• What is the format of the dataset? The dataset contains 37,406,692 rows. Each row (line) in the file has a unique PMID and author position (e.g., 10786286_3 is the third author name on PMID 10786286) and the following thirteen tab-delimited columns. All columns are ASCII, except city, which contains Latin-1.
1. PMID: positive non-zero integer; int(10) unsigned
2. au_order: positive non-zero integer; smallint(4)
3. lastname: varchar(80)
4. firstname: varchar(80); NLM started including these in 2002, but many have been harvested from outside PubMed
5. year of publication
6. type: EDU, HOS, EDU-HOS, ORG, COM, GOV, MIL, UNK
7. city: varchar(200); typically 'city, state, country' but could include further subdivisions; unresolved ambiguities are concatenated by '|'
8. state: Australia, Canada and USA (which includes territories like PR, GU, AS, and post-codes like AE and AA)
9. country
10. journal
11. lat: at most 3 decimals (only available when city is not a country or state)
12. lon: at most 3 decimals (only available when city is not a country or state)
13. fips: varchar(5); USA only; retrieved by lat-lon query to https://geo.fcc.gov/api/census/block/find
keywords: PubMed, MEDLINE, Digital Libraries, Bibliographic Databases; Author Affiliations; Geographic Indexing; Place Name Ambiguity; Geoparsing; Geocoding; Toponym Extraction; Toponym Resolution
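Because the 2016 file has a fixed thirteen-column, tab-delimited layout in Latin-1, each line can be parsed with the standard library alone. This sketch assumes there is no header row and that lat/lon are empty strings when unavailable (neither is stated explicitly above). The keys are the column names listed above, lowercased, and the sample line is invented.

```python
from typing import Dict, Iterator, Optional

# Column order from the description above (names lowercased for use as keys).
COLUMNS = ("pmid", "au_order", "lastname", "firstname", "year", "type",
           "city", "state", "country", "journal", "lat", "lon", "fips")


def _to_float(value: str) -> Optional[float]:
    # Assumption: lat/lon are blank when the city resolves only to a country or state.
    return float(value) if value.strip() else None


def parse_line(line: str) -> Dict[str, object]:
    """Parse one tab-delimited MapAffil 2016 line into a dict."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    fields = dict(zip(COLUMNS, values))
    row: Dict[str, object] = dict(fields)
    row["key"] = f"{fields['pmid']}_{fields['au_order']}"  # e.g. "10786286_3"
    # Unresolved place-name ambiguities are concatenated with '|'.
    row["cities"] = fields["city"].split("|") if fields["city"] else []
    row["lat"] = _to_float(fields["lat"])
    row["lon"] = _to_float(fields["lon"])
    return row


def iter_rows(path: str) -> Iterator[Dict[str, object]]:
    """Stream rows from the ~3.5GB file without loading it all into memory."""
    with open(path, encoding="latin-1") as fh:
        for line in fh:
            if line.strip():
                yield parse_line(line)


sample = "10786286\t3\tSmith\tJane\t2000\tEDU\tUrbana, IL, USA\tIL\tUSA\tSome Journal\t40.111\t-88.207\t17019\n"
row = parse_line(sample)
print(row["key"], row["cities"], row["lat"])  # 10786286_3 ['Urbana, IL, USA'] 40.111
```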
published: 2021-05-07
 
Prepared by Vetle Torvik 2021-05-07. The dataset comes as a single tab-delimited, Latin-1-encoded file (only the City column uses non-ASCII characters).
• How was the dataset created? The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in December 2018 (NLM's 2018 baseline plus updates throughout 2018). Affiliations are linked to a particular author on a particular article. Prior to 2014, NLM recorded the affiliation of the first author only. However, MapAffil 2018 covers some PubMed records lacking affiliations that were harvested elsewhere: from PMC (e.g., PMID 22427989), NIH grants (e.g., 1838378), and Microsoft Academic Graph and ADS (e.g., 5833220). Affiliations are pre-processed (e.g., transliterated into ASCII from UTF-8 and HTML), so they may differ (sometimes a lot; see PMID 27487542) from PubMed records. All affiliation strings were processed using the MapAffil procedure to identify and disambiguate the most specific place-name, as described in: Torvik VI. MapAffil: A bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide. D-Lib Magazine 2015; 21 (11/12). 10p
• See Fig. 4 in the following article for coverage statistics over time: Palmblad, M., Torvik, V.I. Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services. Trop Med Health 45, 33 (2017). https://doi.org/10.1186/s41182-017-0073-6. Expect to see big upticks in coverage of PMIDs around 1988 and for non-first authors in 2014.
• The code and back-end data are periodically updated and made available for query by PMID at http://abel.ischool.illinois.edu/cgi-bin/mapaffil/search.py
• What is the format of the dataset? The dataset contains 52,931,957 rows (plus a header row). Each row (line) in the file has a unique PMID and author order and contains the following eighteen tab-delimited columns.
All columns are ASCII, except city, which contains Latin-1.
1. PMID: positive non-zero integer; int(10) unsigned
2. au_order: positive non-zero integer; smallint(4)
3. lastname: varchar(80)
4. firstname: varchar(80); NLM started including these in 2002, but many have been harvested from outside PubMed
5. initial_2: middle name initial
6. orcid: from the 2019 ORCID Public Data File https://orcid.org/ and from PubMed XML
7. year: year of the publication
8. journal: name of the journal in which the publication appeared
9. affiliation: the author's affiliation
10. disciplines: extracted from departments, divisions, schools, laboratories, centers, etc. that occur in at least 100 unique affiliations across the dataset, some with standardization (e.g., 1770799), English translations (e.g., 2314876), or spelling corrections (e.g., 1291843)
11. grid: inferred using a high-recall technique focused on educational institutions (but, for experimental purposes, includes a few select hospitals, national institutes/centers, international companies, governmental agencies, and 200+ other IDs [RINGGOLD, Wikidata, ISNI, VIAF, http] for institutions not in GRID). Based on the 2019 GRID version https://www.grid.ac/
12. type: EDU, HOS, EDU-HOS, ORG, COM, GOV, MIL, UNK
13. city: varchar(200); typically 'city, state, country' but could include further subdivisions; unresolved ambiguities are concatenated by '|'
14. state: Australia, Canada and USA (which includes territories like PR, GU, AS, and post-codes like AE and AA)
15. country
16. lat: at most 3 decimals (only available when city is not a country or state)
17. lon: at most 3 decimals (only available when city is not a country or state)
18. fips: varchar(5); USA only; retrieved by lat-lon query to https://geo.fcc.gov/api/census/block/find
keywords: PubMed, MEDLINE, Digital Libraries, Bibliographic Databases; Author Affiliations; Geographic Indexing; Place Name Ambiguity; Geoparsing; Geocoding; Toponym Extraction; Toponym Resolution; institution name disambiguation
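Since the 2018 file includes a header row, csv.DictReader can address columns by name. The sketch below computes, per publication year, the share of author-article rows that carry an ORCID. It assumes the header uses the column names listed above (`year`, `orcid`) and that missing values are empty strings. Quoting is disabled on the assumption that quote characters in affiliation strings are literal text. The tiny sample, with a placeholder ORCID, is invented; for the real file, open it with `encoding="latin-1", newline=""`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def orcid_coverage_by_year(fh: TextIO) -> Dict[str, float]:
    """Share of author-article rows with a non-empty ORCID, keyed by year."""
    # QUOTE_NONE: treat any quote characters in the data as ordinary text.
    reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
    totals: Dict[str, int] = defaultdict(int)
    with_orcid: Dict[str, int] = defaultdict(int)
    for row in reader:
        year = row["year"]
        totals[year] += 1
        if row["orcid"].strip():
            with_orcid[year] += 1
    return {year: with_orcid[year] / totals[year] for year in sorted(totals)}


# Invented sample with only a few of the eighteen columns.
sample = io.StringIO(
    "PMID\tau_order\torcid\tyear\n"
    "1\t1\t0000-0000-0000-0000\t2018\n"
    "1\t2\t\t2018\n"
    "2\t1\t\t2017\n"
)
result = orcid_coverage_by_year(sample)
print(result)  # {'2017': 0.0, '2018': 0.5}
```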
published: 2024-07-08
 
A population genetics study was conducted on three plant taxa in the genus Physaria that are found on the Kaibab Plateau (Arizona, USA). Physaria kingii subsp. kaibabensis is endemic to the Kaibab Plateau and is of conservation concern because of its rarity, limited range, and potential threats to its long-term persistence. The taxon is also a candidate for federal protection under the Endangered Species Act. It was not clear how genetically isolated P. k. subsp. kaibabensis was from Physaria kingii subsp. latifolia, a widespread subspecies found throughout the southwestern USA, including on the Kaibab Plateau. Other authors have also suggested that P. k. subsp. kaibabensis may hybridize with Physaria arizonica, a different species that is also widespread and found both on and off the Kaibab Plateau. We conducted a population genetics study of all three groups to better determine the conservation status of P. k. subsp. kaibabensis. Genetic data are nuclear DNA microsatellites for 13 loci (all apparently diploid). We have also included location information for the collection sites. We collected tissue samples both on and off the Kaibab Plateau. The overall findings are reported in a manuscript being submitted for peer review.
keywords: Physaria kingii; Kaibab Plateau; endemism; conservation genetics; rare species biology
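As an illustration of how diploid microsatellite genotypes like these are commonly summarized (not a description of the authors' analysis), observed heterozygosity (Ho) is the fraction of heterozygous individuals at a locus, and expected heterozygosity is He = 1 - sum(p_i^2), where p_i is the frequency of allele i. The sketch uses He without a sample-size correction, assumes each genotype is a pair of allele sizes with no missing data, and uses invented allele sizes.

```python
from collections import Counter
from typing import Sequence, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) at one diploid locus


def heterozygosity(genotypes: Sequence[Genotype]) -> Tuple[float, float]:
    """Observed (Ho) and expected (He = 1 - sum of squared allele frequencies) heterozygosity."""
    if not genotypes:
        raise ValueError("no genotypes")
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(alleles.values())
    he = 1.0 - sum((count / total) ** 2 for count in alleles.values())
    return ho, he


# Invented genotypes for four individuals at one locus.
genotypes = [(150, 150), (150, 154), (154, 158), (150, 158)]
print(heterozygosity(genotypes))  # (0.75, 0.625)
```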
published: 2024-11-15
 
This page contains the data for the manuscript "Vacuolating cytotoxin A interactions with the host cell surface". The manuscript is currently in preparation.
keywords: Steven R Blanke; Vacuolating cytotoxin A; VacA; Helicobacter pylori; protein binding; sphingomyelin; cell surface
published: 2024-11-15
 
BL30K is a synthetic dataset rendered in Blender using ShapeNet data. The dataset is divided into six segments, each with approximately 5K videos. The videos are organized in a format similar to DAVIS and YouTubeVOS, so dataloaders for those datasets can be used directly. Each video is 160 frames long, and each frame has a resolution of 768×512. There are 3–5 objects per video, and each object follows a random smooth trajectory. We tried to optimize the trajectories greedily to minimize object intersection, although this is not guaranteed, and occlusions are still possible (as they frequently are in real videos). See [Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (MiVOS), CVPR 2021] for details.
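The figures above imply about 6 × 5K ≈ 30K videos (hence the name) and roughly 30K × 160 ≈ 4.8 million frames. Because the layout follows DAVIS/YouTubeVOS conventions, a loader mainly needs to pair each frame image with its mask. The sketch below assumes a DAVIS-style `JPEGImages/<video>/` and `Annotations/<video>/` layout with JPEG frames and PNG masks that share zero-padded file stems; those folder names, the extensions, and the function names are assumptions, not documented BL30K paths, so check them against a downloaded segment.

```python
from pathlib import Path


def pair_by_stem(images: list[str], masks: list[str]) -> list[tuple[str, str]]:
    """Pair frame images with masks sharing a file stem, in frame order.

    Zero-padded stems (e.g. 00000, 00001) sort correctly as strings.
    Raises ValueError if any frame has no matching mask.
    """
    mask_by_stem = {Path(m).stem: m for m in masks}
    pairs = []
    for image in sorted(images, key=lambda p: Path(p).stem):
        stem = Path(image).stem
        if stem not in mask_by_stem:
            raise ValueError(f"no mask for frame {stem}")
        pairs.append((image, mask_by_stem[stem]))
    return pairs


def video_pairs(root: Path, video: str) -> list[tuple[str, str]]:
    """List (image, mask) pairs for one video under an assumed DAVIS-style root."""
    images = [str(p) for p in (root / "JPEGImages" / video).glob("*.jpg")]
    masks = [str(p) for p in (root / "Annotations" / video).glob("*.png")]
    return pair_by_stem(images, masks)
```

Keeping the pairing logic separate from the directory scan makes it easy to adapt if a segment turns out to use different folder names.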
published: 2024-11-14
 
These data represent the raw data from the paper “The invasion of Japanese hop (Humulus japonicus) in a restored floodplain forest” published in Invasive Plant Science and Management by Annie H. Huang and Jeffrey W. Matthews.
keywords: invasive plants; restored wetlands
published: 2024-11-14
 
These data are Facebook and Twitter posts that SCOPES and healthfeedback.org (HF) identified as misinformation. We independently pulled social media data using Brandwatch’s (previously Crimson Hexagon) historical Twitter database and CrowdTangle, a public insights tool owned and operated by Facebook. Each of these databases stores only publicly tagged posts, and both have been used as Twitter and Facebook data sources in previous academic research (see, for example, Yun, Pamuksuz, and Duff 2019; Jernigan and Rushman 2014). The search period was January 1, 2020, to March 31, 2021. The original misinformation links were screenshots of posts or memes, links to native Facebook, Twitter, or Reddit posts, and links to articles/websites containing misinformation. These links were passed through CrowdTangle to verify that they were not labeled, yielding a dataset of posts that shared unlabeled misinformation links. We found 12,184 instances of HF’s COVID-19 misinformation links being shared on Twitter versus 6,388 instances of the same links being shared on Facebook.
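As a quick check of the relative reach of the two platforms, the counts above give the following ratio and share. Twitter shares outnumber Facebook shares by roughly 1.9 to 1, and Twitter accounts for about 66% of the 18,572 instances found across the two platforms.

```latex
\frac{12{,}184}{6{,}388} \approx 1.91,
\qquad
\frac{12{,}184}{12{,}184 + 6{,}388} = \frac{12{,}184}{18{,}572} \approx 0.656
```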
keywords: Covid-19; Facebook; Twitter; Social Media; Misinformation; Labelling
published: 2024-11-13
 
These datasets are from four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments on cathode nanoparticles in different states. The raw 4D-STEM datasets were collected with TEM Imaging & Analysis software (FEI) and saved as SER files, which can be opened and viewed in MATLAB using our analysis software package, imToolBox, available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected with DigitalMicrograph software and saved as DM4 files, which can be opened and viewed in DigitalMicrograph or with our analysis code available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion" (2024).
The 4D-STEM data include four example datasets for cathode nanoparticles collected in the pristine and discharged states. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP.ser"
2. Pristine 200 °C heated nanoparticle: "Pristine H200-NP.ser"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP.ser"
4. 200 °C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP.ser"
The EELS data include six example datasets for cathode nanoparticles collected in different states (in "EELS datasets.zip"), as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP EELS.zip"
2. Pristine 200 °C heated nanoparticle: "Prisitne H200-NP EELS.zip"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP EELS.zip"
4. Untreated nanoparticle after first charge in Zn-ion batteries: "Charged U-NP EELS.zip"
5. 200 °C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP EELS.zip"
6. 200 °C heated nanoparticle after first charge in Zn-ion batteries: "Charged H200-NP EELS.zip"
Details of the software package and code that can be used to analyze the 4D-STEM and EELS datasets are available at https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update this record to link these datasets to the paper.
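The two file lists above can be kept as a small lookup table, which makes it easy to see which sample/state combinations have both 4D-STEM and EELS data. This is a hypothetical convenience sketch: `files_for` and `paired_conditions` are illustrative names, the sample codes U and H200 are taken from the file names, and the file names are copied exactly as listed (including "Prisitne").

```python
# File names copied verbatim from the listing; keys are (sample, state).
# Sample codes: "U" = untreated, "H200" = heated at 200 °C.
FOUR_D_STEM = {
    ("U", "Pristine"): "Pristine U-NP.ser",
    ("H200", "Pristine"): "Pristine H200-NP.ser",
    ("U", "Discharged"): "Discharged U-NP.ser",
    ("H200", "Discharged"): "Discharged H200-NP.ser",
}

# These six archives are packaged inside "EELS datasets.zip".
EELS = {
    ("U", "Pristine"): "Pristine U-NP EELS.zip",
    ("H200", "Pristine"): "Prisitne H200-NP EELS.zip",
    ("U", "Discharged"): "Discharged U-NP EELS.zip",
    ("U", "Charged"): "Charged U-NP EELS.zip",
    ("H200", "Discharged"): "Discharged H200-NP EELS.zip",
    ("H200", "Charged"): "Charged H200-NP EELS.zip",
}


def files_for(sample: str, state: str) -> dict[str, str]:
    """Return the available files for one sample/state, keyed by technique."""
    found = {}
    if (sample, state) in FOUR_D_STEM:
        found["4D-STEM"] = FOUR_D_STEM[(sample, state)]
    if (sample, state) in EELS:
        found["EELS"] = EELS[(sample, state)]
    return found


def paired_conditions() -> list[tuple[str, str]]:
    """Sample/state combinations that have both 4D-STEM and EELS data."""
    return sorted(FOUR_D_STEM.keys() & EELS.keys())
```

For example, `files_for("U", "Charged")` returns only the EELS archive, because no 4D-STEM dataset is listed for the charged states.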
keywords: 4D-STEM; EELS; defects; strain; cathode; nanoparticle; energy storage
published: 2024-11-12
 
This is the dataset for the article "Pollinator seed mixes are phenologically dissimilar to prairie remnants," a manuscript pending publication in Restoration Ecology. It contains the core phenology data for prairie remnants and pollinator seed mixes used in the main analyses. Additional data associated with the manuscript are intended to be published as a supplement in the journal.
keywords: native plants; ecological restoration; tallgrass prairie; native plant materials