Illinois Data Bank Dataset Search Results

published: 2024-07-30
 
This file contains the white-tailed deer (Odocoileus virginianus) land cover utility score (deer LCU score) datasets for every TRS (township, range, and section), township, and county in Illinois, USA. The file is an Excel spreadsheet with a metadata sheet, separate sheets for the deer LCU scores for each spatial level, and a sheet with the data required to replicate how the deer LCU score approach was validated. The deer LCU score is a unitless value, with larger scores corresponding to a spatial unit with more and/or better deer habitat.
keywords: habitat; white-tailed deer; deer; Odocoileus virginianus; land cover; land classification; landscape; habitat suitability index; ecology; environment
published: 2025-02-03
 
The data and code provided in this dataset can be used to generate plots that show the results of the linear prediction algorithm and the amplified modes, supporting the key argument of the manuscript. The dataset is divided into five subfolders, each corresponding to one combination of external condition (magnetic field B, temperature), scan parameter (temperature, magnetic field B), pump laser polarization (linear s, linear p, and circular), and sample orientation (B parallel to c axis, B perpendicular to c axis):
1) B parallel to c axis, linear pump polarization in s, linear THz emission polarization in s, field dependence (B_parallel_c_linear_spump_sprobe_field).
2) B parallel to c axis, linear pump polarization in s, linear THz emission polarization in s, temperature dependence (B_parallel_c_linear_spump_sprobe_temperature).
3) B perpendicular to c axis, linear pump polarization in s, linear THz emission polarization in s, field dependence (B_perp_c_linear_spump_sprobe_field).
4) B perpendicular to c axis, linear pump polarization in s, linear THz emission polarization in s, temperature dependence (B_perp_c_linear_spump_sprobe_temperature).
5) B parallel to c axis, circular pump polarization (left circularly polarized LCP and right circularly polarized RCP), linear THz emission polarization in s, field dependence (B_parallel_c_LCPRCP_pump_sprobe_field).
Each folder contains the raw data (.mat), the oscillator parameters obtained through the linear prediction algorithm (.mat), and the plot-generating code (.m). The code plots the raw data, the fit to the processed data, and the amplified modes. The code is written in MATLAB R2024a; the working directory for each script should be the subfolder that contains it.
keywords: magneto-chiral instability; THz emission; THz spectroscopy; nonequilibrium states; emergent phenomena; Weyl semiconductor; tellurium; ultrafast spectroscopy; photoexcitation
published: 2016-05-19
 
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
keywords: taxi;transportation;New York City;GPS
published: 2025-02-14
 
This dataset includes the original data (including photographs as .jpg files and sound recordings as .wav files) and detailed descriptions of workflows for analyses of acoustic and morphometric data for the Neoaliturus tenellus (beet leafhopper) species complex. Files needed for different parts of the two analytical workflows are included in the "Acoustics.zip" and "PCA.zip" archives. The "Folder Structure.png" file contains a diagram of the folder structure of the two archives. Each archive contains a "ReadMe" file with instructions for repeating the analyses. File and folder names that include the two-letter abbreviations TB, TD, TN and TP refer to four different putative species (operational taxonomic units, or OTUs) of the Neoaliturus tenellus complex.
keywords: Hemiptera; Cicadellidae; integrative taxonomy; courtship; morphology
published: 2025-02-07
 
This dataset contains raw data on plasma glucose, insulin, c-peptide, GLP-1, and FGF21 collected as part of a study of alcohol pharmacokinetics in women who underwent metabolic surgery.
keywords: Excel; Alcohol and metabolic surgery; glucose; insulin; c-peptide; glp-1; fgf21
published: 2024-03-27
 
To gather news articles from the web that discuss the Cochrane Review, we used Altmetric Explorer from Altmetric.com and retrieved articles on August 1, 2023. We selected all articles that were written in English, published in the United States, and had a publication date prior to March 10, 2023 (according to the “Mention Date” on Altmetric.com). This date is significant as it is when Cochrane issued a statement about the "misleading interpretation" of the Cochrane Review. The collection of news articles is presented in the Altmetric_data.csv file. The dataset contains the following data that we exported from Altmetric Explorer:
- Publication date of the news article
- Title of the news article
- Source/publication venue of the news article
- URL
- Country
We manually checked and added the following information:
- Whether the article still exists
- Whether the article is accessible
- Whether the article is from the original source
We assigned MAXQDA IDs to the news articles. News articles were assigned the same ID when they were (a) identical or (b) in the case of Article 207, closely paraphrased, paragraph by paragraph. Inaccessible items were assigned a MAXQDA ID based on their "Mention Title". For each article from Altmetric.com, we first tried to use the Web Collector for MAXQDA to download the article from the website and imported it into MAXQDA (version 22.7.0). If an article could not be retrieved using the Web Collector, we either downloaded the .html file or, in the case of Article 128, retrieved it from the NewsBank database through the University of Illinois Library. We then manually extracted direct quotations from the articles using MAXQDA. We included surrounding words and sentences, and in one case, a news agency’s commentary, around direct quotations for context where needed. The quotations (with context) are the positions in our analysis. We also identified who was quoted. We excluded quotations when we could not identify who or what was being quoted. We annotated quotations with codes representing groups (government agencies, other organizations, and research publications) and individuals (authors of the Cochrane Review, government agency representatives, journalists, and other experts such as epidemiologists). The MAXQDA_data.csv file contains excerpts from the news articles that contain the direct quotations we identified. For each excerpt, we included the following information:
- MAXQDA ID of the document from which the excerpt originates;
- The collection date and source of the document;
- The code with which the excerpt is annotated;
- The code category;
- The excerpt itself.
keywords: altmetrics; MAXQDA; polylogue analysis; masks for COVID-19; scientific controversies; news articles
published: 2022-05-13
 
The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
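As a quick sanity check before loading one of the .phy matrices, the PHYLIP header can be read on its own. The sketch below assumes only the usual PHYLIP convention that the first non-empty line gives the number of taxa followed by the number of aligned positions; the function name and the two-taxon sample are made up for illustration and are not taken from the dataset.

```python
def read_phylip_header(text):
    """Return (n_taxa, n_sites) from the first non-empty line of PHYLIP text."""
    for line in text.splitlines():
        if line.strip():
            n_taxa, n_sites = line.split()[:2]
            return int(n_taxa), int(n_sites)
    raise ValueError("no PHYLIP header line found")


# Made-up two-taxon alignment, used only to exercise the function.
sample = " 2 8\nTaxonA    ACGTACGT\nTaxonB    ACGTAC-T\n"
print(read_phylip_header(sample))
```

Comparing the reported dimensions with the matrix sizes given in the paper's Methods is one simple way to confirm that a file downloaded intact.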
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published: 2022-10-14
 
The Membracoidea_morph_data_Final.nex text file contains the original data used in the phylogenetic analyses of Dietrich et al. (Insect Systematics and Diversity, in review). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The complete taxon names corresponding to the 131 genus names listed under “BEGIN TAXA” are listed in Table 1 in the included PDF file “Taxa_and_characters”; the 229 morphological characters (names abbreviated under “BEGIN CHARACTERS”) are fully explained in the list of character descriptions following Table 1 in the same PDF. The data matrix follows “MATRIX” and gives the numerical values of characters for each taxon. Question marks represent missing data. The lists of characters and taxa and details on the methods used for phylogenetic analysis are included in the submitted manuscript.
keywords: leafhopper; treehopper; evolution; Cretaceous; Eocene
published: 2024-04-05
 
The following files include specimen information, DNA sequence data, and additional information on the analyses used to reconstruct the phylogeny of the leafhopper genus Neoaliturus as described in the Methods section of the original paper:
1. Taxon_sampling.csv: contains data on the individual specimens from which DNA was extracted, including sample code, taxon name, collection data (locality, date and name of collector) and museum unique identifier.
2. Alignments.zip: a ZIP archive containing 432 separate FASTA files representing the aligned nucleotide sequences of individual gene loci used in the analysis.
3. Concatenated_Matrix.fa: a FASTA file containing the concatenated individual gene alignments used for the maximum likelihood analysis in IQ-TREE.
4. Genes_and_Loci.rtf: identifies the individual genes and loci used in the analysis. The partition name is the same as the name of the individual alignment file in the zipped Alignments folder.
5. Partitions_best_scheme.nex: a text file in the standard NEXUS format that indicates the names of the individual data partitions and their locations in the concatenated matrix, and also indicates the substitution model for each partition.
6. (New in this version 2) Scripts & Description.zip: includes 8 custom shell or perl scripts used to assemble the DNA sequence data by performing reciprocal blast searches between the reference sequences and assemblies for each sample, extracting the best sequences based on the blast searches, screening the hits for each locus and keeping only the best result, and generating the nucleotide sequence dataset for the predicted orthologues (see the file description.txt for details).
7. (New in this version 2) Full_genetic_distances_matrix.csv: shows the genetic distances between pairs of samples in the dataset (proportion of nucleotides that differ between samples).
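The distance described for Full_genetic_distances_matrix.csv (the proportion of nucleotides that differ between two samples) can be illustrated with a minimal sketch. This is not the authors' script: the function name is hypothetical, and skipping sites where either sequence has a gap, "?" or "N" is an assumption, since the description does not say how such sites were treated.

```python
def p_distance(seq_a, seq_b, missing=frozenset("-?N")):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence carries a gap or ambiguity symbol are
    skipped (an assumption; the dataset description does not specify this).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        return float("nan")  # no comparable sites
    return differing / compared


# Made-up sequences: one difference over eight compared sites.
print(p_distance("ACGTACGT", "ACGTTCGT"))
```

Applying such a function to every pair of rows in the concatenated alignment would yield a symmetric matrix comparable in shape to the one in the CSV file, though the values may differ if the authors handled gaps differently.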
keywords: leafhopper; phylogeny; anchored-hybrid-enrichment; DNA sequence; insect
published: 2025-02-08
 
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and clustering (by any algorithm) is used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real world clusters after which outlier nodes are added using one of three different options. See Anne et al. (2024): in press Complex Networks and Applications XIII (preprint : arXiv:2408.13647). The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/). Input Networks: The CEN can be downloaded from the Illinois Data Bank: https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where a - name of inspirational network, e.g., cit_hepph b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01 c- the RECCS option used to approximate edge count and connectivity in the real world network, e.g., v1 Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (P. Peixoto, Tiago 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14) Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). 
The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2), with and without the Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment. Input Network: CEN. Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows: cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv where:
cen: indicates the network was modeled on the Curated Exosome Network (CEN).
resolution: the resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status: either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version: the RECCS version used to generate the synthetic network (v1 or v2).
replicate_id: the specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv: a synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv: a synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
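The two naming schemes can be decoded programmatically. The sketch below is a hedged illustration: the regular expressions are inferred from the example names given in the description (cit_hepph_0.01_v1.tsv, cen_0.01_cm_v1_sample_0.tsv, cen_0.01_no_cm_v2_sample_1.tsv), and the function and field names are hypothetical rather than part of the dataset.

```python
import re

# a_b_c.tsv(.gz): network name, Leiden-CPM resolution, RECCS version.
MAIN_PATTERN = re.compile(
    r"^(?P<network>.+)_(?P<resolution>\d*\.?\d+)_(?P<reccs_version>v[12])"
    r"\.tsv(?:\.gz)?$"
)
# Replication experiment names inside repl_exp.tar.gz.
REPL_PATTERN = re.compile(
    r"^cen_(?P<resolution>\d*\.?\d+)_(?P<cm_status>no_cm|cm)"
    r"_(?P<reccs_version>v[12])_sample_(?P<replicate_id>\d+)\.tsv$"
)


def parse_network_filename(name):
    """Split a synthetic-network file name into its labelled parts."""
    match = REPL_PATTERN.match(name) or MAIN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    fields = match.groupdict()
    fields["resolution"] = float(fields["resolution"])
    if "replicate_id" in fields:
        fields["replicate_id"] = int(fields["replicate_id"])
    return fields


print(parse_network_filename("cen_0.01_no_cm_v2_sample_1.tsv"))
print(parse_network_filename("cit_hepph_0.01_v1.tsv"))
```

Trying the replication pattern first keeps the two schemes from being confused, since only the replication files carry the cm_status and sample fields.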
keywords: Community Detection; Synthetic Networks; Stochastic Block Model (SBM);
published: 2024-09-17
 
The following seven zip files are compressed folders containing the input datasets/trees, main output files and the scripts of the related analyses performed in this study. I. ancestral_microhabitat_reconstruction.zip: contains four files, including two input files (microhabitats.csv, timetree.tre) and a script (simmap_microhabitat.R) for ancestral state reconstruction of microhabitat by make.simmap implemented in the R package phytools v1.5, as well as the main output file (ancestral_microhabitats.csv). 1. ancestral_microhabitats.csv: reconstructed ancestral microhabitats for each node. 2. microhabitats.csv: microhabitats of the studied species. 3. simmap_microhabitat.R: the R script of make.simmap for ancestral microhabitat reconstruction. 4. timetree.tre: dated tree used for ancestral state reconstruction for microhabitat and morphological characters. II. ancestral_morphology_reconstruction.zip: contains six files, including an input file (morphology.csv) and a script (simmap_morphology.R) for ancestral state reconstruction of morphology by make.simmap implemented in the R package phytools v1.5, as well as four main output files (forewing_ancestral_state.csv, frontal_sutures_ancestral_state.csv, hind_wing_ancestral_state.csv, ocellus_ancestral_state.csv). 1. forewing_ancestral_state.csv: reconstructed ancestral states of the development of the forewing for each node. 2. frontal_sutures_ancestral_state.csv: reconstructed ancestral states of the development of frontal sutures for each node. 3. hind_wing_ancestral_state.csv: reconstructed ancestral states of the development of the hind wing for each node. 4. morphology.csv: the states of the development of ocellus, forewing, hind wing and frontal sutures for each studied species. 5. ocellus_ancestral_state.csv: reconstructed ancestral states of the development of the ocellus for each node. 6. simmap_morphology.R: the R script of make.simmap for ancestral state reconstruction of morphology. III. 
biogeographic_reconstruction.zip: contains four files, including three input files (dispersal_probablity.txt, distributions.csv, timetree_noOutgroup.tre) used for a stratified biogeographic analysis by BioGeoBEARS in RASP v4.2 and the main output file (DIVELIKE_result.txt). 1. dispersal_probablity.txt: relative dispersal probabilities among biogeographical regions at different geological epochs. 2. distributions.csv: current distributions of the studied species. 3. DIVELIKE_result.txt: BioGeoBEARS result of ancestral areas based on the DIVELIKE model. 4. timetree_noOutgroup.tre: the dated tree with the outgroup lineage (Eurymelinae) excluded. IV. coalescent_analysis.zip: contains a folder and two files, including a folder (individual_gene_alignment) of input files used to construct gene trees, an input file (MLtree_BS70.tre) used for the multi-species coalescent analysis by ASTRAL v 4.10.5 and the main output file (coalescent_species_tree.tre). 1. coalescent_species_tree.tre: the species tree generated by the multi-species coalescent analysis with the quartet support, effective number of genes and the local posterior probability indicated. 2. individual_gene_alignment: a folder containing 427 FASTA files, each one represents the nucleotide alignment for a gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12. 3. MLtree_BS70.tre: 165 gene trees with the average SH-aLRT and ultrafast bootstrap values of ≥ 70%. This file was used to estimate the species tree by ASTRAL v 4.10.5. V. divergence_time_estimation.zip: contains five files, including two input files (treefile_rooted_noBranchLength.tre, treefile_rooted.tre) and two control files (baseml.ctl, mcmctree.ctl) used for divergence time estimation by BASEML and MCMCTREE in PAML v4.9, as well as the main output file (timetree_with95%HPD.tre). 1. baseml.ctl: the control file used for the estimation of substitution rates by BASEML in PAML v4.9. 2. 
mcmctree.ctl: the control file used for the estimation of divergence times by MCMCTREE in PAML v4.9. 3. timetree_with95%HPD.tre: dated tree with the 95% highest posterior density confidence intervals indicated. 4. treefile_rooted_noBranchLength.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with calibrations for the crown and internal nodes. Branch length and support values were not indicated. 5. treefile_rooted.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with a secondary calibration on the root age. Branch support values were not indicated. VI. maximum_likelihood_analysis_aa.zip: contains three files, including two input files (concatenated_aa_partition.nex, concatenated_aa.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_aa.tre). 1. concatenated_aa_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_aa.phy. This file partitions the 52,024 amino acid positions into 427 character sets. 2. concatenated_aa.phy: a concatenated amino acid dataset with 52,024 amino acid positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis. 3. MLtree_aa.tre: the maximum likelihood tree based on the concatenated amino acid dataset, with SH-aLRT values and ultrafast bootstrap values indicated. VII. maximum_likelihood_analysis_nt.zip: contains three files, including two input files (concatenated_nt_partition.nex, concatenated_nt.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_nt.tre). 1. concatenated_nt_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_nt.phy. This file partitions the 156,072 nucleotide positions into 427 character sets. 2. concatenated_nt.phy: a concatenated nucleotide dataset with 156,072 nucleotide positions. Hyphens are used to represent gaps. 
This dataset was used for the maximum likelihood analysis as well as divergence time estimation. 3. MLtree_nt.tre: the maximum likelihood tree based on the concatenated nucleotide dataset, with SH-aLRT values and ultrafast bootstrap values indicated. VIII. Taxon_sampling.csv: contains the sample IDs (1st column) which were used in the alignments and the taxonomic information (2nd to 6th columns).
keywords: Anchored Hybrid Enrichment, Biogeography, Cicadellidae, Phylogenomics, Treehoppers
published: 2025-01-06
 
The complete data for the publication "RNA helicase MOV10 suppresses fear memory and dendritic arborization and regulates microtubule dynamics in hippocampal neurons," excluding sequencing data deposited in GEO, is provided here.
keywords: MOV10; NUMA1; hippocampal neurons; behavior; cytoskeleton; tiff; czi; dv; mp4; mpg; ndpi; csv; xlsx; R
published: 2025-01-17
 
This is the data set for a publication titled, "Coupling carbon dioxide gas within a bubble curtain enhances its effectiveness to deter fish." The current study sought to quantify whether adding carbon dioxide gas (CO2) to a bubble curtain would enhance its efficacy to block fish. For this, a choice tank was outfitted with bubble curtains infused with either compressed air alone, or with two different concentrations of CO2 [30 or 100 mg/L]. Passage rates and position of common carp (an invasive Cyprinid) and black bullhead (a native Ictalurid) exposed to these treatments were compared. The data set consists of data from each of the experiments performed during the study.
keywords: invasive species; multimodal barriers; deterrents; biodiversity; species range; distribution
published: 2025-02-07
 
These data represent the raw data from the paper “Influence of light availability and water depth on competition between Phalaris arundinacea and herbaceous vines” published in Wetlands by Annie H. Huang and Jeffrey W. Matthews. The data are archived in one file: Huang&Matthews_mesocosm_data_archive. This file includes raw data collected during a greenhouse experiment described in the paper.
published: 2025-02-07
 
Incoherent scatter radar datasets collected during the September 2016 campaign at Arecibo have been deposited in this databank. The lag products of the ISR data are stored as lag profile matrices with 5 minutes of integration time. The data is organized in a Python dictionary format, with each file containing 12 lag profile matrices representing one hour of observation. A sample Python script is provided to illustrate its usage.
published: 2025-02-06
 
Data from a study on the behavior of blue-winged and golden-winged warblers. We investigated vocalizations and how the species recognize each other. The dataset includes banding data, behavioral data from a playback study, and song data.
keywords: warblers; songs; species recognition
published: 2021-05-07
 
Prepared by Vetle Torvik 2021-05-07
The dataset comes as a single tab-delimited Latin-1 encoded file (only the City column uses non-ASCII characters).
• How was the dataset created? The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in December 2018 (NLM's baseline 2018 plus updates throughout 2018). Affiliations are linked to a particular author on a particular article. Prior to 2014, NLM recorded the affiliation of the first author only. However, MapAffil 2018 covers some PubMed records lacking affiliations that were harvested elsewhere, from PMC (e.g., PMID 22427989), NIH grants (e.g., 1838378), and Microsoft Academic Graph and ADS (e.g. 5833220). Affiliations are pre-processed (e.g., transliterated into ASCII from UTF-8 and html) so they may differ (sometimes a lot; see PMID 27487542) from PubMed records. All affiliation strings were processed using the MapAffil procedure, to identify and disambiguate the most specific place-name, as described in: Torvik VI. MapAffil: A bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide. D-Lib Magazine 2015; 21 (11/12). 10p
• Look for Fig. 4 in the following article for coverage statistics over time: Palmblad, M., Torvik, V.I. Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services. Trop Med Health 45, 33 (2017). https://doi.org/10.1186/s41182-017-0073-6 Expect to see big upticks in coverage of PMIDs around 1988 and for non-first authors in 2014.
• The code and back-end data are periodically updated and made available for query by PMID at http://abel.ischool.illinois.edu/cgi-bin/mapaffil/search.py
• What is the format of the dataset? The dataset contains 52,931,957 rows (plus a header row). Each row (line) in the file has a unique PMID and author order, and contains the following eighteen columns, tab-delimited. All columns are ASCII, except city which contains Latin-1.
1. PMID: positive non-zero integer; int(10) unsigned
2. au_order: positive non-zero integer; smallint(4)
3. lastname: varchar(80)
4. firstname: varchar(80); NLM started including these in 2002 but many have been harvested from outside PubMed
5. initial_2: middle name initial
6. orcid: From 2019 ORCID Public Data File https://orcid.org/ and from PubMed XML
7. year: year of the publication
8. journal: name of the journal in which the publication appeared
9. affiliation: author's affiliation
10. disciplines: extracted from departments, divisions, schools, laboratories, centers, etc. that occur on at least 100 unique affiliations across the dataset, some with standardization (e.g., 1770799), English translations (e.g., 2314876), or spelling corrections (e.g., 1291843)
11. grid: inferred using a high-recall technique focused on educational institutions (but, for experimental purposes, includes a few select hospitals, national institutes/centers, international companies, governmental agencies, and 200+ other IDs [RINGGOLD, Wikidata, ISNI, VIAF, http] for institutions not in GRID). Based on 2019 GRID version https://www.grid.ac/
12. type: EDU, HOS, EDU-HOS, ORG, COM, GOV, MIL, UNK
13. city: varchar(200); typically 'city, state, country' but could include further subdivisions; unresolved ambiguities are concatenated by '|'
14. state: Australia, Canada and USA (which includes territories like PR, GU, AS, and post-codes like AE and AA)
15. country
16. lat: at most 3 decimals (only available when city is not a country or state)
17. lon: at most 3 decimals (only available when city is not a country or state)
18. fips: varchar(5); for USA only retrieved by lat-lon query to https://geo.fcc.gov/api/census/block/find
keywords: PubMed, MEDLINE, Digital Libraries, Bibliographic Databases; Author Affiliations; Geographic Indexing; Place Name Ambiguity; Geoparsing; Geocoding; Toponym Extraction; Toponym Resolution; institution name disambiguation
published: 2021-04-22
 
Author-ity 2018 dataset
Prepared by Vetle Torvik Apr. 22, 2021
The dataset is based on a snapshot of PubMed taken in December 2018 (NLM's baseline 2018 plus updates throughout 2018). It covers a total of 29.1 million Article records and 114.2 million author name instances. Each instance of an author name is uniquely represented by the PMID and the position on the paper (e.g., 10786286_3 is the third author name on PMID 10786286). Thus, each cluster is represented by a collection of author name instances. The instances were first grouped into "blocks" by last name and first name initial (including some close variants), and then each block was separately subjected to clustering. The resulting clusters are provided in two different formats, the first in a file with only IDs and PMIDs, and the second in a file with cluster summaries:
#################### File 1: au2id2018.tsv ####################
Each line corresponds to an author name instance (PMID and Author name position) with an Author ID. It has the following tab-delimited fields:
1. Author ID
2. PMID
3. Author name position
######################## File 2: authority2018.tsv #########################
Each line corresponds to a predicted author-individual represented by a cluster of author name instances and a summary of all the corresponding papers and author name variants. Each cluster has a unique Author ID (the PMID of the earliest paper in the cluster and the author name position). The summary has the following tab-delimited fields:
1. Author ID (or cluster ID) e.g., 3797874_1 represents a cluster where 3797874_1 is the earliest author name instance.
2. cluster size (number of author name instances on papers)
3. name variants separated by '|' with counts in parentheses. Each variant has the format lastname_firstname middleinitial, suffix
4. last name variants separated by '|'
5. first name variants separated by '|'
6. middle initial variants separated by '|' ('-' if none)
7. suffix variants separated by '|' ('-' if none)
8. email addresses separated by '|' ('-' if none)
9. ORCIDs separated by '|' ('-' if none). From 2019 ORCID Public Data File https://orcid.org/ and from PubMed XML
10. range of years (e.g., 1997-2009)
11. Top 20 most frequent affiliation words (after stoplisting and tokenizing; some phrases are also made) with counts in parentheses; separated by '|'; ('-' if none)
12. Top 20 most frequent MeSH (after stoplisting) with counts in parentheses; separated by '|'; ('-' if none)
13. Journal names with counts in parentheses (separated by '|')
14. Top 20 most frequent title words (after stoplisting and tokenizing) with counts in parentheses; separated by '|'; ('-' if none)
15. Co-author names (lowercased lastname and first/middle initials) with counts in parentheses; separated by '|'; ('-' if none)
16. Author name instances (PMID_auno separated by '|')
17. Grant IDs (after normalization; '-' if none given; separated by '|')
18. Total number of times cited. (Citations are based on references harvested from open sources such as PMC).
19. h-index
20. Citation counts (e.g., for h-index): PMIDs by the author that have been cited (with total citation counts in parentheses); separated by '|'
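The identifier scheme described above (PMID and author position joined by an underscore, with File 1 mapping each instance to an Author ID) lends itself to a short sketch. The function names are hypothetical, the sample lines are made up for illustration, and the code assumes au2id2018.tsv has no header row, which the description does not state.

```python
from collections import defaultdict


def parse_instance_id(instance_id):
    """Split an author name instance such as '10786286_3' into (PMID, position)."""
    pmid, position = instance_id.split("_")
    return int(pmid), int(position)


def group_instances_by_author(lines):
    """Group (PMID, position) pairs by Author ID from File 1 style lines.

    Each line is expected to hold three tab-delimited fields:
    Author ID, PMID, author name position.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        author_id, pmid, position = line.split("\t")
        clusters[author_id].append((int(pmid), int(position)))
    return dict(clusters)


# Made-up lines in the File 1 layout; the pairing is illustrative only.
sample_lines = [
    "3797874_1\t3797874\t1\n",
    "3797874_1\t10786286\t3\n",
]
print(parse_instance_id("10786286_3"))
print(group_instances_by_author(sample_lines))
```

Because the Author ID is itself the earliest instance in the cluster, parse_instance_id can also be applied to the keys of the grouped result to recover that earliest PMID.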
keywords: author name disambiguation; PubMed
published: 2024-10-10
 
Diversity - PubMed dataset
Contact: Apratim Mishra (Oct, 2024)
This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection includes articles retrieved from Author-ity 2018 [1], comprising 907 024 papers and 1 316 838 authors, and expands on the V1 dataset. The sample consists of articles from the top 40 journals in the dataset, limited to articles with 2-12 authors, published between 1991 and 2014, of article type "journal type", and written in English. Files are gzip-compressed and tab-separated. V3 includes the correct author count for the included papers (pmids) and updated results with no NaNs.

File1: auids_plos_3.csv.gz (Important columns defined, 5 in total)
• AUID: a unique ID for each author
• Genni: gender prediction
• Ethnea: ethnicity prediction

File2: pmids_plos_3.csv.gz (Important columns defined)
• pmid: unique paper ID
• auid: all unique auids (author-name unique identification)
• year: Year of paper publication
• no_authors: Author count
• journal: Journal name
• years: first year of publication for every author
• Country-temporal: Country of affiliation for every author
• h_index: Journal h-index
• TimeNovelty: Paper Time novelty [2]
• nih_funded: Binary variable indicating funding for any author
• prior_cit_mean: Mean of all authors’ prior citation rate
• Insti_impact: All unique institutions’ citation rate
• mesh_vals: Top MeSH values for every author of that paper
• relative_citation_ratio: RCR

The ‘Readme’ includes a description for all columns.

[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
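Because the files are described as gzip-compressed and tab-separated despite the .csv.gz extension, a reader has to set the delimiter explicitly. The following is a minimal sketch using only the Python standard library. The helper names (read_gzipped_tsv, demo_rows) and the sample values are hypothetical; only the column names pmid, year, and no_authors come from the description above, and the real files may need a different encoding.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_gzipped_tsv(path):
    """Return the rows of a gzip-compressed, tab-separated file as dicts."""
    # "rt" decompresses to text; newline="" is the mode the csv module expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def demo_rows():
    """Write a tiny synthetic file (made-up values) and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_pmids.csv.gz"
        with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t")
            writer.writerow(["pmid", "year", "no_authors"])
            writer.writerow(["1", "1999", "3"])
            writer.writerow(["2", "2005", "12"])
        return read_gzipped_tsv(path)


if __name__ == "__main__":
    for row in demo_rows():
        print(row["pmid"], row["year"], row["no_authors"])
```

All values are returned as strings. For files of this size (over a million author rows), iterating over the reader instead of building a list would keep memory use lower.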
keywords: Diversity; PubMed; Citation
suppressed by curator
 
published: 2024-09-28
 
Per the authors' request, the data files for this dataset are now suppressed. Please visit this new dataset for the complete and updated data files: Huang, Yijing; Fahad, Mahmood (2025): Data for Observation of a Magneto-chiral Instability in Photoexcited Tellurium. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-1409842_V1
====================
The data and code provided in this dataset can be used to generate key plots in the manuscript. It is divided into four subfolders (B parallel/perpendicular to the tellurium c axis and field/temperature dependence), each containing the raw data (saved in .mat format), the oscillator parameters obtained through linear prediction (saved in .mat format), and the plot-generating code (.m files). The code was written using MATLAB R2024a. To run the code, go to each folder, and run the .m file in that folder, which generates two plots.
published: 2025-01-31
 
Title: Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae
Authors: Surangi W. Punyasena, Ingrid Romero, Michael A. Urban
Subject: Biological sciences
Keywords: Malvaceae; superresolution microscopy; Zeiss; Bombacacidites; Neotropics; CZI
Funder: NSF-DBI Advances in Bioinformatics (NSF-DBI-1262561)
Corresponding Creator: Surangi W. Punyasena

This dataset includes a total of 430 images of extant specimens of the Malvaceae, with a focus on species that are or have been included within the subfamily Bombacoideae. There are 27 genera included within 26 folders. Each folder is named by genus and contains all the images that correspond to that genus. Note that the genus _Matisia_ is included with _Quararibea_ as detailed in the metadata READ ME file. The specimens imaged are from the palynological collections of the Swedish Museum of Natural History and Smithsonian Tropical Research Institute, and herbarium specimens from the Smithsonian Herbarium National Museum. The optical superresolution microscopy images were taken using a Zeiss LSM 880 with Airyscan at 630X magnification (63x/NA 1.4 oil DIC). The images are in the original CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or in ImageJ/FIJI. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/en/products/software/zeiss-zen/czi-image-file-format.html]

Image metadata and file organization are described in the CSV file "METADATA_Malvaceae_Bombacoideae_modern-species.csv". The column headings are:
• Folder: The folder in which the image file is found
• Subfamily: The current subfamily determination based on the literature. Note that _Pentaplaris_ and _Septotheca_ have not been assigned a subfamily.
• Genus: Genus name
• Species: Species name
• Accepted name: Accepted species name, updated from the literature
• Slide name: Species name as denoted on the herbarium slide
• Collection: Source of the herbarium slide: Sweden National Museum of Natural History or the Smithsonian Tropical Research Institute
• File name: File name using the species name denoted on the herbarium slide
• Slide ID/Herbarium ID: Specimen collection number

Please cite this dataset as: Punyasena, Surangi W.; Romero, Ingrid; Urban, Michael A. (2025): Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2968712_V1
keywords: Malvaceae; superresolution microscopy; Zeiss; Bombacoideae; Neotropics; CZI
published: 2025-01-30
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.

Version 2.2.0 adds 94 additional coup events. 66 of these came from examining Powell and Thyne’s “discarded” events, and 28 were added to the data set in the normal annual review of potential new coup events. This version also updates the coding of events in Brazil in 1945 and the Congo in 1968.

Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy.

Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.

Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.

Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0.
A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.

Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)

Version 1.0.0 was released in 2013. This version consolidated coup data taken from the following sources:
• The Center for Systemic Peace (Marshall and Marshall, 2007)
• The World Handbook of Political and Social Indicators (Taylor and Jodice, 1983)
• Coup d’État: A Practical Handbook (Luttwak, 1979)
• The Cline Center’s Social, Political and Economic Event Database (SPEED) Project (Nardulli, Althaus and Hayes, 2015)
• Government Change in Authoritarian Regimes – 2010 Update (Svolik and Akcinaroglu, 2006)

Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.2.0 Codebook.pdf - This 17-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised January 2025
2. Coup Data v2.2.0.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1094 observations. Revised January 2025
3. Source Document v2.2.0.pdf - This 347-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised January 2025
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised January 2025

Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2025. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. January 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Michael Martin, Sam Alahi, Norah Fadell, and Maddie Jeralds. 2025. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. January 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
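The coup_id value quoted in the version notes (00201062021) begins with zeros, so a tool that parses the column as a number would no longer match the identifiers used in the source document. The following is a minimal sketch, using only the Python standard library, of indexing rows by coup_id while keeping it as text. The function name index_by_coup_id, the "note" column, and the sample row are hypothetical; only the coup_id variable name and the example ID come from the description above.

```python
import csv
import io


def index_by_coup_id(csv_file):
    """Map each coup_id (kept as text) to its full row."""
    # csv.DictReader never converts types, so leading zeros are preserved.
    return {row["coup_id"]: row for row in csv.DictReader(csv_file)}


# Synthetic stand-in for the real CSV file; "note" is a made-up column.
SAMPLE = "coup_id,note\n00201062021,example row\n"

events = index_by_coup_id(io.StringIO(SAMPLE))

# The lookup succeeds only with the ID exactly as written, zeros included.
print("00201062021" in events)
print(str(int("00201062021")) in events)
```

With the real file, the same function could be passed an open file handle, for example one opened with newline="" as the csv module recommends.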
published: 2025-01-30
 
This dataset contains the research data for the manuscript "A Framework of Simulating Structural Sediment Perimeter Barriers using VFSMOD".
keywords: sediment control
published: 2025-01-29
 
These data record weekly aphid and monarch butterfly (Danaus plexippus) neonate counts on individual milkweed plants in multiple raised garden beds in Chicago during the summers of 2023 and 2024. Relationships between aphid infestation and monarch neonates can be investigated, along with weekly trends in monarch oviposition and aphid abundance. All gardens included in this study were on the University of Illinois Chicago campus and within 100 meters of one another. Data are provided on three milkweed species in 2023 and one milkweed species in 2024.
keywords: Aphis; Myzocallis; Danaus plexippus; urban gardens; Asclepias syriaca; milkweeds