published: 2020-03-03
This second version (V2) provides additional data cleaning compared to V1, additional data collection (mainly to include data from 2019), and more metadata for nodes. Please see NETWORKv2README.txt for more detail.
keywords: citations; retraction; network analysis; Web of Science; Google Scholar; indirect citation
published: 2020-04-07
Baseline data from a multi-modal intervention study conducted at the University of Illinois at Urbana-Champaign. Data include results from a cardiorespiratory fitness assessment (maximal oxygen consumption, VO2max), a body composition assessment (Dual-Energy X-ray Absorptiometry, DXA), and Magnetic Resonance Spectroscopy Imaging. Data set includes data from 435 participants, ages 18-44 years.
keywords: Magnetic Resonance Spectroscopy; N-acetyl aspartic acid (NAA); Body Mass Index; cardiorespiratory fitness; body composition
published: 2020-05-04
The Cline Center Historical Phoenix Event Data covers the period 1945-2019 and includes 8.2 million events extracted from 21.2 million news stories. These data were produced using the state-of-the-art PETRARCH-2 software to analyze content from the New York Times (1945-2018), the BBC Monitoring's Summary of World Broadcasts (1979-2019), the Wall Street Journal (1945-2005), and the Central Intelligence Agency’s Foreign Broadcast Information Service (1995-2004). The dataset documents the agents, locations, and issues at stake in a wide variety of conflict, cooperation, and communicative events in the Conflict and Mediation Event Observations (CAMEO) ontology. The Cline Center produced these data with the generous support of Linowes Fellow and Faculty Affiliate Prof. Dov Cohen and help from our academic and private sector collaborators in the Open Event Data Alliance (OEDA). For details on the CAMEO framework, see: Schrodt, Philip A., Omür Yilmaz, Deborah J. Gerner, and Dennis Hermreck. "The CAMEO (conflict and mediation event observations) actor coding framework." In 2008 Annual Meeting of the International Studies Association. 2008. http://eventdata.parusanalytics.com/papers.dir/APSA.2005.pdf Gerner, D.J., Schrodt, P.A. and Yilmaz, O., 2012. Conflict and mediation event observations (CAMEO) Codebook. http://eventdata.parusanalytics.com/cameo.dir/CAMEO.Ethnic.Groups.zip For more information about PETRARCH and OEDA, see: http://openeventdata.org/
keywords: OEDA; Open Event Data Alliance (OEDA); Cline Center; Cline Center for Advanced Social Research; civil unrest; petrarch; phoenix event data; violence; protest; political; conflict; political science
published: 2020-08-21
# WikiCSSH

If you are using WikiCSSH please cite the following:

> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. “WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia.” In Workshop on Scientific Knowledge Graphs (SKG 2020). https://skg.kmi.open.ac.uk/SKG2020/papers/HAN_et_al_SKG_2020.pdf

> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. "WikiCSSH - Computer Science Subject Headings from Wikipedia". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0424970_V1

Download the WikiCSSH files from: https://doi.org/10.13012/B2IDB-0424970_V1

More details about the WikiCSSH project can be found at: https://github.com/uiuc-ischool-scanr/WikiCSSH

This folder contains the following files:

* WikiCSSH_categories.csv - Categories in WikiCSSH
* WikiCSSH_category_links.csv - Links between categories in WikiCSSH
* Wikicssh_core_categories.csv - Core categories as mentioned in the paper
* WikiCSSH_category_links_all.csv - Links between categories in WikiCSSH (includes a dummy category called <ROOT> which is parent of isolates and top level categories)
* WikiCSSH_category2page.csv - Links between Wikipedia pages and Wikipedia Categories in WikiCSSH
* WikiCSSH_page2redirect.csv - Links between Wikipedia pages and Wikipedia page redirects in WikiCSSH

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <a href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</a> or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
keywords: wikipedia; computer science
published: 2020-09-02
Citation context annotation. This dataset is a second version (V2) and part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. (2020) "Continued post-retraction citation of a fraudulent clinical trial report, eleven years after it was retracted for falsifying data". Scientometrics. In press, DOI: 10.1007/s11192-020-03631-1 Publications were selected by examining all citations to the retracted paper Matsuyama 2005, and selecting the 35 citing papers, published 2010 to 2019, which do not mention the retraction, but which mention the methods or results of the retracted paper (called "specific" in Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi (2020): Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-8150563_V1">https://doi.org/10.13012/B2IDB-8150563_V1</a> ). The annotated citations are second-generation citations to the retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) <a href="https://doi.org/10.1016/S0012-3692(08)60339-6">https://doi.org/10.1016/S0012-3692(08)60339-6</a> ).
<b>OVERALL DATA for VERSION 2 (V2)</b>

FILES/FILE FORMATS

Same data in two formats:
2010-2019 SG to specific not mentioned FG.csv - Unicode CSV (preservation format only) - same as in V1
2010-2019 SG to specific not mentioned FG.xlsx - Excel workbook (preferred format) - same as in V1

Additional files in V2:
2G-possible-misinformation-analyzed.csv - Unicode CSV (preservation format only)
2G-possible-misinformation-analyzed.xlsx - Excel workbook (preferred format)

<b>ABBREVIATIONS:</b>
2G - Refers to the second-generation citation of Matsuyama
FG - Refers to the direct citation of Matsuyama (the one the second-generation item cites)

<b>COLUMN HEADER EXPLANATIONS</b>
File name: 2G-possible-misinformation-analyzed. Other column headers in this file have the same meaning as explained in V1. The following are additional header explanations:
Quote Number - The order of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Quote - The text of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Translated Quote - English translation of "Quote", automatic translation from Google Scholar
Seriousness/Risk - Our assessment of the risk of misinformation and its seriousness
2G topic - Our assessment of the topic of the cited article (the second generation article given in "2G article")
2G section - The section of the citing article (the second generation article given in "2G article") in which the cited article (the first generation article given in "FG in bibliography") was found
FG in bib type - The type of article (e.g., review article), referring to the cited article (the first generation article given in "FG in bibliography")
FG in bib topic - Our assessment of the topic of the cited article (the first generation article given in "FG in bibliography")
FG in bib section - The section of the cited article (the first generation article given in "FG in bibliography") in which the Matsuyama retracted paper was cited
keywords: citation context annotation; retraction; diffusion of retraction; second-generation citation context analysis
published: 2018-03-01
The data set consists of Illumina sequences derived from 48 sediment samples, collected in 2015 from Lake Michigan and Lake Superior for the purpose of inventorying the fungal diversity in these two lakes. DNA was extracted from ca. 0.5 g of sediment using the MoBio PowerSoil DNA isolation kits following the Earth Microbiome protocol. PCR was completed with the fungal primers ITS1F and fITS7 using the Fluidigm Access Array. The resulting amplicons were sequenced using the Illumina HiSeq 2500 platform with rapid 2 x 250nt paired-end reads. The enclosed data sets contain the forward read files for both primers, both fixed-header index files, and the associated map files needed for processing in QIIME. In addition, enclosed are two rarefied OTU files used to evaluate fungal diversity. All decimal latitude and decimal longitude coordinates of our collecting sites are also included.

File descriptions:
Great_lakes_Map_coordinates.xlsx = coordinates of sample sites

QIIME Processing ITS1 region: These are the raw files used to process the ITS1 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS1_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS1F_Miller_Fludigm_I1_fixedheader.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS1F_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS1 region.

QIIME Processing ITS2 region: These are the raw files used to process the ITS2 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS2_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS7_Miller_Fludigm_I1_Fixedheaders.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS7_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS2 region.

Resulting OTU Table and OTU table with taxonomy

ITS1 Region
wahl_ITS1_R1_otu_table.csv = File contains Representative OTUs based on ITS1 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS1_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS1 region for all the R1 data and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev

ITS2 Region
wahl_ITS2_R1_otu_table.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS2_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev

Rarefied Illumina dataset for each ITS Region
ITS1_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS1 region.
ITS2_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS2 region.

Column headings:
#SampleID = code including researcher initials and sequential run number
BarcodeSequence =
LinkerPrimerSequence = two sequences used CTTGGTCATTTAGAGGAAGTAA or GTGARTCATCGAATCTTTG
ReversePrimer = two sequences used GCTGCGTTCTTCATCGATGC or TCCTCCGCTTATTGATATGC
run_prefix = initials of run operator
Sample = location code, see thesis figures 1 and 2 for mapped locations and Great_lakes_Map_coordinates.xlsx for exact coordinates.
DepthGroup = S=shallow (50-100 m), MS=mid-shallow (101-150 m), MD=mid-deep (151-200 m), and D=deep (>200 m)
Depth_Meters = Depth in meters
Lake = lake name, Michigan or Superior
Nitrogen %
Carbon %
Date = mm/dd/yyyy
pH = acidity, potential of Hydrogen (pH) scale
SampleDescription = Sample or control
X = sequential run number
OTU ID = Operational taxonomic unit ID
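The DepthGroup bins listed above can be written as a small lookup, which may help when checking the DepthGroup column against Depth_Meters. This is a minimal sketch, not the authors' code: the function name `depth_group` is hypothetical, bin upper bounds are assumed to be inclusive, and depths shallower than 50 m (which the description does not assign to any bin) are rejected.

```python
def depth_group(depth_m: float) -> str:
    """Map a depth in meters to the DepthGroup code described above.

    Assumption: upper bounds are inclusive (100 m is still "S").
    Depths under 50 m are not covered by the description, so they raise.
    """
    if depth_m < 50:
        raise ValueError(f"depth {depth_m} m is below the shallowest bin (50 m)")
    if depth_m <= 100:
        return "S"   # shallow, 50-100 m
    if depth_m <= 150:
        return "MS"  # mid-shallow, 101-150 m
    if depth_m <= 200:
        return "MD"  # mid-deep, 151-200 m
    return "D"       # deep, >200 m


if __name__ == "__main__":
    for depth in (75, 120, 200, 260):
        print(depth, depth_group(depth))
```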
keywords: Illumina; next-generation sequencing; ITS; fungi
published: 2020-02-12
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 to the present, with awards continuing through the end of the second XSEDE award in 2021. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2018-04-19
Prepared by Vetle Torvik 2018-04-15

The dataset comes as a single tab-delimited ASCII-encoded file, and should be about 717MB uncompressed.

• How was the dataset created? First and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicities and gender, including:

Ethnea+Genni, as described in:
<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927</i>
<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i>

EthnicSeer: http://singularity.ist.psu.edu/ethnicity
<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>

SexMachine 0.1.1: <a href="https://pypi.python.org/pypi/SexMachine/">https://pypi.org/project/SexMachine</a>

First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.

• The code and back-end data are periodically updated and made available for query at <a href="http://abel.ischool.illinois.edu">Torvik Research Group</a>

• What is the format of the dataset? The dataset contains 9,300,182 rows and 10 columns:
1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)
2. name: full name (used as input to EthnicSeer)
3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX
4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
5. lastname: used as input for Ethnea+Genni
6. firstname: used as input for Ethnea+Genni
7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictions), or TOOSHORT (if both first and last name are too short)
8. Genni: predicted gender; 'F', 'M', or '-'
9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male
10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
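Because the file is large (about 717MB, 9,300,182 rows), it is usually easier to stream it row by row than to load it whole. The sketch below is one way to do that with Python's standard `csv` module. It assumes the file has no header row and that the ten columns appear in the order listed above; neither is confirmed by the description. The sample row is invented purely for illustration and is not taken from the dataset.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple

# Column order as listed in the dataset description (assumed to match the file).
COLUMNS = [
    "auid", "name", "EthnicSeer", "prop", "lastname",
    "firstname", "Ethnea", "Genni", "SexMac", "SSNgender",
]


def split_auid(auid: str) -> Tuple[int, int]:
    """Split an auid of the form PMID_authorposition into two integers."""
    pmid, position = auid.rsplit("_", 1)
    return int(pmid), int(position)


def read_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per well-formed row of the tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip malformed lines rather than guess at their layout
        yield dict(zip(COLUMNS, fields))


if __name__ == "__main__":
    # Invented example row, for illustration only.
    sample = "12345_2\tJane Doe\tENG\t0.91\tDoe\tJane\tENGLISH\tF\tfemale\tF\n"
    for row in read_rows(io.StringIO(sample)):
        print(split_auid(row["auid"]), row["Ethnea"], row["Genni"])
```

For the real file, `read_rows(open(path, encoding="ascii", newline=""))` would be the equivalent call, with `path` pointing at the downloaded file.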
keywords: Androgyny; Bibliometrics; Data mining; Search engine; Gender; Semantic orientation; Temporal prediction; Textual markers
published: 2018-08-06
This annotation study compared RobotReviewer's data extraction to that of three novice data extractors, using six included articles synthesized in one Cochrane review: Bailey E, Worthington HV, van Wijk A, Yates JM, Coulthard P, Afzal Z. Ibuprofen and/or paracetamol (acetaminophen) for pain relief after surgical removal of lower wisdom teeth. Cochrane Database Syst Rev. 2013; CD004624; doi:10.1002/14651858.CD004624.pub2 The goal was to assess the relative advantage of RobotReviewer's data extraction with respect to quality.
keywords: RobotReviewer; annotation; information extraction; data extraction; systematic review automation; systematic reviewing
published: 2019-02-26
We have recently created an approach for high throughput single cell measurements using matrix assisted laser desorption / ionization mass spectrometry (MALDI MS) (J Am Soc Mass Spectrom. 2017, 28, 1919-1928. doi: 10.1007/s13361-017-1704-1. Chemphyschem. 2018, 19, 1180-1191. doi: 10.1002/cphc.201701364). While chemical detail is obtained on individual cells, it has not been possible to correlate the chemical information with canonical cell types. Now we combine high-throughput single cell mass spectrometry with immunocytochemistry to determine lipid profiles of two known cell types, astrocytes and neurons from the rodent brain, with the work appearing as “Lipid heterogeneity between astrocytes and neurons revealed with single cell MALDI MS supervised by immunocytochemical classification” (DOI: 10.1002/anie.201812892). Here we provide the data collected for this study. The dataset provides the raw data and script files for the rodent cerebral cells described in the manuscript.
keywords: Single cell analysis; mass spectrometry; astrocyte; neuron; lipid analysis
published: 2019-02-07
This dataset contains all data used in the two studies included in "PICAN-PI..." by Nute, et al, other than the original raw sequences. That includes: 1) Supplementary information for the Manuscript, including all the graphics that were created, 2) 16S Reference Alignment, Phylogeny and Taxonomic Annotation used by SEPP, and 3) Data used in the manuscript as input for the graphics generation (namely, SEPP outputs and sequence multiplicities).
keywords: microbiome; data visualization; graphics; phylogenetics; 16S
published: 2019-03-06
This dataset is provided to support the statements in Tarokh, A., and R.Y. Makhnenko. 2019. Remarks on the solid and bulk responses of fluid-filled porous rock, Geophysics. The unjacketed bulk modulus is a poroelastic parameter that can be directly measured in a laboratory test under a loading that keeps the difference between the mean stress and the pore pressure constant. For a monomineralic rock, the measurement of the unjacketed bulk modulus is often skipped because it is assumed to be equal to the bulk modulus of the solid phase. To examine this assumption, we tested porous sandstones (Berea and Dunnville) and limestones (Apulian and Indiana) mainly composed of quartz and calcite, respectively, under the unjacketed condition. The presence of microscale inhomogeneities, in the form of non-connected (occluded) pores, was shown to cause a considerable difference between the unjacketed bulk modulus and the bulk modulus of the solid phase. Furthermore, we found the unjacketed bulk modulus to be independent of the unjacketed pressure and Terzaghi effective pressure and therefore a constant.
keywords: Poroelasticity; anisotropic solid skeleton; unjacketed bulk modulus; non-connected porosity
published: 2019-05-01
This dataset contains scripts and data developed as a part of the research manuscript titled “Spatial and Temporal Allocation of Ammonia Emissions from Fertilizer Application Important for Air Quality Predictions in U.S. Corn Belt”. This includes (1) Spatial and temporal factors for ammonia emissions from agricultural fertilizer usage developed using the hybrid ISS-DNDC method for the Midwest U.S., (2) CAMx job scripts and outputs of predictions of ambient ammonia and total and speciated PM2.5, (3) Observation data used to statistically evaluate CAMx predictions, and (4) MATLAB programs developed to pair CAMx predictions with ground-based observation data in space and time.
keywords: Air quality; Ammonia; Emissions; PM2.5; CAMx; DNDC; spatial resolution; Midwest U.S.
published: 2019-05-31
This dataset includes all data presented in the manuscript entitled: "Dynamic controls on field-scale soil nitrous oxide hot spots and hot moments across a microtopographic gradient"
keywords: denitrification; depressions; microtopography; nitrous oxide; soil oxygen; soil temperature
published: 2019-06-22
keywords: conspecific attraction; fruit-eating bird; Hawaiian flora; playback experiment; seed dispersal; social information; Zosterops japonicus
published: 2018-01-11
Dataset includes structure and values of a causal model for Training Quality in nuclear power plants. Each entry refers to a piece of evidence supporting causality of the Training Quality causal model. Includes bibliographic information, context-specific text from the reference, and three weighted values: (M1) credibility of reference, (M2) causality determined by the author, and (M3) analyst's confidence level. The weight metadata (M1, M2, and M3) are based on probability language from <a href="https://www.ipcc.ch/ipccreports/tar/vol4/english/index.htm" style="text-decoration: none" >Intergovernmental Panel on Climate Change (IPCC), Climate Change 2001: Synthesis Report</a>. The language can be found in the “Summary for Policymakers” section, in the PDF format.

Weight Metadata:
LowerBound_Probability, UpperBound_Probability, Qualitative Language
0.99, 1, Virtually Certain
0.9, 0.99, Very Likely
0.66, 0.9, Likely
0.33, 0.66, Medium Likelihood
0.1, 0.33, Unlikely
0.01, 0.1, Very Unlikely
0, 0.01, Extremely Unlikely
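The weight metadata above amount to a lookup from a probability to a qualitative label. A minimal sketch of that lookup follows. The function name `qualitative_language` is hypothetical, and because adjacent bands share a boundary value, the sketch assumes that a value on a boundary belongs to the higher band (so 0.9 is "Very Likely"); the source does not say which way boundaries are resolved.

```python
# (lower bound, upper bound, label), copied from the weight metadata above,
# ordered from the highest band to the lowest.
BANDS = [
    (0.99, 1.0, "Virtually Certain"),
    (0.9, 0.99, "Very Likely"),
    (0.66, 0.9, "Likely"),
    (0.33, 0.66, "Medium Likelihood"),
    (0.1, 0.33, "Unlikely"),
    (0.01, 0.1, "Very Unlikely"),
    (0.0, 0.01, "Extremely Unlikely"),
]


def qualitative_language(probability: float) -> str:
    """Return the qualitative label for a probability in [0, 1].

    Assumption: a value equal to a shared boundary falls in the higher band.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    for lower, _upper, label in BANDS:
        if probability >= lower:
            return label
    raise AssertionError("unreachable: the lowest band starts at 0")


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.5, 0.05, 0.0):
        print(p, qualitative_language(p))
```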
keywords: Data-Theoretic; Training; Organization; Probabilistic Risk Assessment; Training Quality; Causal Model; DT-BASE; Bayesian Belief Network; Bayesian Network; Theory-Building
published: 2018-01-03
Concatenated sequence alignment, phylogenetic analysis files, and relevant software parameter files from a cophylogenetic study of Brueelia-complex lice and their avian hosts. The sequence alignment file includes a list of character blocks for each gene alignment and the parameters used for the MrBayes phylogenetic analysis.

1) Files from the MrBayes analyses:
a) a file with 100 random post-burnin trees (50% burnin) used in the cophylogenetic analysis - analysisrandom100_trees_brueelia.tre
b) a majority rule consensus tree - treeconsensus_tree_brueelia.tre
c) a maximum clade credibility tree - mcc_tree_brueelia.tre
The tree tips are labeled with louse voucher names, and can be referenced in Supplementary Table 1 of the associated publication.

2) Files related to a BEAST analysis with COI data:
a) the XML file used as input for the BEAST run, including model parameters, MCMC chain length, and priors - beast_parameters_coi_brueelia.xml
b) a file with 100 random post-burnin trees (10% burnin) from the BEAST posterior distribution of trees; used in OTU analysis - beast_100random_trees_brueelia.tre
c) an ultrametric maximum clade credibility tree - mcc_tree_beast_brueelia.tre

3) A maximum clade credibility tree of Brueelia-complex host species generated from a distribution of trees downloaded from https://birdtree.org/subsets/ - mcc_tree_brueelia_hosts.tre

4) Concatenated sequence alignment - concatenated_alignment_brueelia.nex
keywords: bird lice; Brueelia-complex; passerines; multiple sequence alignment; phylogenetic tree; Bayesian phylogenetic analysis; MrBayes; BEAST
published: 2018-03-08
This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.
keywords: databases; research infrastructure; sustainability; data sharing; molecular biology; bioinformatics; bibliometrics
published: 2018-03-28
Bibliotelemetry data are provided in support of the evaluation of Internet of Things (IoT) middleware within library collections. IoT infrastructure within the physical library environment is the basis for an integrative, hybrid approach to digital resource recommenders. The IoT infrastructure provides mobile, dynamic wayfinding support for items in the collection, which includes features for location-based recommendations. A modular evaluation and analysis herein clarified the nature of users’ requests for recommendations based on their location, and describes subject areas of the library for which users request recommendations. The modular mobile design allowed for deep exploration of bibliographic identifiers as they appeared throughout the global module system, serving to provide context to the searching and browsing data that are the focus of this study.
keywords: internet of things; IoT; academic libraries; bibliographic classification
published: 2018-04-23
Conceptual novelty analysis data based on PubMed Medical Subject Headings
----------------------------------------------------------------------
Created by Shubhanshu Mishra, and Vetle I. Torvik on April 16th, 2018

## Introduction

This is a dataset created as part of the publication titled: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : the magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra. It contains final data generated as part of our experiments based on MEDLINE 2015 baseline and MeSH tree from 2015.

The dataset is distributed in the form of the following tab separated text files:

* PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
  - PMID: PubMed ID
  - Year: year of publication
  - TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
  - VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
  - PairTimeNovelty: time novelty score of the paper based on pair of concepts (see paper)
  - PairVolumeNovelty: volume novelty score of the paper based on pair of concepts (see paper)
* mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
  - MeshTerm: Name of the MeSH term
  - Year: year
  - AbsVal: Total publications with that MeSH term in the given year
  - TimeNovelty: age (in years since first publication) of MeSH term in the given year
  - VolumeNovelty: age (in number of papers since first publication) of MeSH term in the given year
* meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each pair of MeSH terms for all years
  - Mesh1: Name of the first MeSH term (alphabetically sorted)
  - Mesh2: Name of the second MeSH term (alphabetically sorted)
  - Year: year
  - AbsVal: Total publications with that MeSH pair in the given year
  - TimeNovelty: age (in years since first publication) of MeSH pair in the given year
  - VolumeNovelty: age (in number of papers since first publication) of MeSH pair in the given year
* README.txt file

## Dataset creation

This dataset was constructed using multiple datasets described in the following locations:

* MEDLINE 2015 baseline: <a href="https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html">https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html</a>
* MeSH tree 2015: <a href="ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/">ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/</a>
* Source code provided at: <a href="https://github.com/napsternxg/Novelty">https://github.com/napsternxg/Novelty</a>

Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check <a href="https://www.nlm.nih.gov/databases/download/pubmed_medline.html">here</a> for information on obtaining PubMed/MEDLINE data and for NLM's data Terms and Conditions.

Additional data-related updates can be found at: <a href="http://abel.ischool.illinois.edu">Torvik Research Group</a>

## Acknowledgments

This work was made possible in part with funding to VIT from <a href="https://projectreporter.nih.gov/project_info_description.cfm?aid=8475017&icde=18058490">NIH grant P01AG039347</a> and <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1348742">NSF grant 1348742</a>. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License

Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at <a href="https://github.com/napsternxg/Novelty">https://github.com/napsternxg/Novelty</a>
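With more than 22 million rows in PubMed2015_NoveltyData.tsv, a streaming pass is a practical way to summarize the scores. The sketch below averages TimeNovelty per publication year using only the standard library. It assumes the file starts with a header row carrying the column names listed in the description, and that score fields parse as numbers; rows where they do not are skipped. The sample values are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def mean_time_novelty_by_year(handle: TextIO) -> Dict[str, float]:
    """Stream a tab-separated novelty file and average TimeNovelty per Year.

    Assumption: the first line is a header with the column names from the
    dataset description (PMID, Year, TimeNovelty, ...).
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            value = float(row["TimeNovelty"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        year = row["Year"]
        totals[year] += value
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in totals}


if __name__ == "__main__":
    # Invented rows, for illustration only.
    sample = (
        "PMID\tYear\tTimeNovelty\tVolumeNovelty\tPairTimeNovelty\tPairVolumeNovelty\n"
        "1\t2000\t1.0\t10\t0\t5\n"
        "2\t2000\t2.0\t20\t1\t7\n"
        "3\t2001\t4.0\t40\t2\t9\n"
    )
    print(mean_time_novelty_by_year(io.StringIO(sample)))
```

The same pattern applies to mesh_scores.tsv by changing the column names, again assuming a header row is present.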
keywords: Conceptual novelty; bibliometrics; PubMed; MEDLINE; MeSH; Medical Subject Headings; Analysis
published: 2018-05-21
This dataset contains bonding networks and tolerance ranges for geometric magnetic dimensionality. The data can be searched in the html frontend above, the code can be obtained from the GitHub repository, and the raw data can be downloaded as csv below. The csv data contains the results for 42520 compounds (unique icsd_code) from ICSD FindIt v3.5.0. The csv is semicolon-delimited since some fields contain multiple comma-separated values.
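A semicolon-delimited file whose fields may themselves hold comma-separated values needs two levels of splitting: semicolons between columns, commas within selected columns. The sketch below shows one way to do that. Only `icsd_code` is named in the description; the column `bond_elements` and the sample row are hypothetical stand-ins, and a header row is assumed.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO


def read_compounds(
    handle: TextIO, multi_value_columns: Iterable[str]
) -> Iterator[Dict[str, object]]:
    """Yield rows of a semicolon-delimited csv, splitting chosen columns on commas.

    Assumption: the file has a header row; which columns are multi-valued
    must be supplied by the caller.
    """
    multi = list(multi_value_columns)
    for row in csv.DictReader(handle, delimiter=";"):
        record: Dict[str, object] = dict(row)
        for column in multi:
            raw = row.get(column) or ""
            record[column] = [part.strip() for part in raw.split(",") if part.strip()]
        yield record


if __name__ == "__main__":
    # Hypothetical column name and values, for illustration only.
    sample = "icsd_code;bond_elements\n1234;Fe, O\n"
    for record in read_compounds(io.StringIO(sample), ["bond_elements"]):
        print(record["icsd_code"], record["bond_elements"])
```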
keywords: materials science; physics; magnetism; crystallography
published: 2018-04-05
GBS data from Phaseolus accessions, for a study led by Dr. Glen Hartman, UIUC. The (zipped) fastq file can be processed with the TASSEL GBS pipeline or other pipelines for SNP calling. The related article has been submitted and the methods section describes the data processing in detail.
published: 2018-06-06
DNDC scripts and outputs that were generated as a part of the research publication 'Evaluation of DeNitrification DeComposition Model for Estimating Ammonia Fluxes from Chemical Fertilizer Application'.
keywords: DNDC; REA; ammonia emissions; fertilizers; uncertainty analysis