Illinois Data Bank Dataset Search Results


published: 2025-03-20
 
This dataset contains white-tailed deer (Odocoileus virginianus) land cover utility score (deer LCU score) data for every TRS (township, range, and section), township-range, and county in Illinois, USA, based on annual National Land Cover Database (NLCD) data released for all years between 2000 and 2023. LCU data is provided in CSV files for each spatial scale, with TRS data split into 2 CSV files due to size limits. Rasters (TIF) showing all deer habitat in Illinois are also provided to show the location, quality, and quantity of deer habitat. A metadata file is also included for additional information.
keywords: habitat; white-tailed deer; deer; Odocoileus virginianus; land cover; land classification; landscape; habitat suitability index; ecology; environment
published: 2025-03-19
 
This repository includes HRLDAS Noah-MP model output generated as part of Bieri et al. (2025) - Implementing deep soil and dynamic root uptake in Noah-MP (v4.5): Impact on Amazon dry-season transpiration. These data are distributed in two different formats: Raw model output files and subsetted files that include data for a specific variable. All files are .nc format (NetCDF) and aggregated into .tar files to facilitate download. Given the size of these datasets, Globus transfer is the best way to download them. Raw model output for four model experiments is available: FD (control), GW, SOIL, and ROOT. See the associated publication for information on the different experiments. These data span an approximately 20 year period from 01 Jun 2000 to 31 Dec 2019. The data have a spatial resolution of 4 km and a temporal frequency of 3 hours. These data are for a domain in the southern Amazon basin (see Figure 1 in the associated publication). Data for each experiment is available as a .tar file which includes 3-hourly NetCDF files. All default Noah-MP output variables are included in each file. As a result, the .tar files are quite large and may take many hours or even days to transfer depending on your network speed and local configurations. These files are named 'noahmp_output_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT). Subsetted model output at a daily temporal resolution for all four model experiments is also available. These .tar files include the following variables: water table depth (ZWT), latent heat flux (LH), sensible heat flux (HFX), soil moisture (SOIL_M), canopy evaporation (ECAN), ground evaporation (EDIR), transpiration (ETRAN), rainfall rate at the surface (QRAIN), and two variables that are specific to the ROOT experiment: ROOTACTIVITY (root activity function) and GWRD (active root water uptake depth). There is one file for each variable within the tarred files. 
These files are named 'noahmp_output_subset_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT). Finally, there is a sample dataset with raw 3-hourly output from the ROOT experiment for one day. The purpose of this sample dataset is to allow users to confirm if these data meet their needs before initiating a full transfer via Globus. This file is named 'noahmp_output_sample_ROOT.tar'. The README.txt file provides information on the Noah-MP output variables in these datasets, among other specifications. Information on HRLDAS Noah-MP and names/definitions of model output variables that are useful in working with these data are available here: http://dx.doi.org/10.5065/ew8g-yr95. Note that some output variables may be listed in this document under a different variable name, so searching for the long name (e.g. 'baseflow' instead of 'QRF') is recommended. Information on additional output variables that were added to the model as part of this study is available here: https://github.com/bieri2/bieri-et-al-2025-EGU-GMD/tree/DynaRoot. Model code, configuration files, and forcing data used to carry out the model simulations are linked in the related resources section.
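The archive naming scheme described above can be sketched in Python as follows. This is only an illustration: the helper name `archive_names` is hypothetical, and only the file-name patterns and the four experiment labels are taken from the description.

```python
# Experiment labels as listed in the dataset description.
EXPERIMENTS = ("FD", "GW", "SOIL", "ROOT")

def archive_names(experiment: str) -> dict:
    """Return the raw and subsetted .tar names for one experiment.

    Hypothetical helper; it only fills the EXP placeholder in the
    file-name patterns quoted in the description.
    """
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return {
        "raw": f"noahmp_output_2000_2019_{experiment}.tar",
        "subset": f"noahmp_output_subset_2000_2019_{experiment}.tar",
    }

for exp in EXPERIMENTS:
    print(exp, archive_names(exp))
```

Building the names up front makes it easier to request only the archives that are needed, which matters given the transfer times mentioned in the description.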
keywords: Land surface model; NetCDF
published: 2025-03-18
 
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries. Additional Resources: - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form (https://docs.google.com/forms/d/e/1FAIpQLSf-J937V6I4sMSxQt7gR3SIbUASR26KXxqSurrkBvlF-CIQnQ/viewform?usp=pp_url) so we can determine whether you are eligible for access. - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form (https://forms.gle/6eA2yJUGFMtj5swY7). - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form (https://groups.webservices.illinois.edu/subscribe/154221) to subscribe to it.
<b>Citation Guidelines:</b> 1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [codebook], v1.3.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V6 2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [database], v1.3.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V6 *NOTE: V6 is replacing V5 with updated ‘Archer’ documents to reflect changes made to the Archer system.
published: 2025-03-17
 
A mechanistic functional structural plant model. The .gsz file includes a parameterised maize and soybean model to be used in the GroIMP software (https://grogra.de/). The current model is parameterised for maize cultivar DKC63-21RIB and soybean cultivar AG36X6 for the 2019 growing season in Champaign, IL, USA.
keywords: Functional structural plant model; intercropping; plant architecture; maize; soybean
published: 2025-03-14
 
Hype - PubMed dataset Prepared by Apratim Mishra This dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences. The candidate hype words are 35 in count: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful’. This is version 3 of the dataset. Added new file - WSD_hype.tsv File 1: hype_dataset_final.tsv Primary dataset. It has the following columns: 1. PMID: represents unique article ID in PubMed 2. Year: Year of publication 3. Hype_word: Candidate hype word, such as ‘novel.’ 4. Sentence: Sentence in abstract containing the hype word. 5. Hype_percentile: Abstract relative position of hype word. 6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location. 7. Introduction: The ‘I’ component of the hype word based on IMRaD 8. Methods: The ‘M’ component of the hype word based on IMRaD 9. Results: The ‘R’ component of the hype word based on IMRaD 10. Discussion: The ‘D’ component of the hype word based on IMRaD File 2: hype_removed_phrases_final.tsv Secondary dataset with same columns as File 1. Hype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately. 
Removed phrases: 1. Major: histocompatibility, component, protein, metabolite, complex, surgery 2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid 3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment 4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values 5. Essential: medium, features, properties, opportunities, oil 6. Unique: model, amino 7. Robust: regression 8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information 9. Outstanding: questions, issues, question, questions, challenge, problems, problem, remains 10. Remarkable: properties 11. Definite: radiotherapy, surgery File 3: WSD_hype.tsv Includes hype-based disambiguation for candidate words targeted for WSD (Word sense disambiguation)
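Because one article (PMID) can appear in several rows, a first step when working with the primary file is often to group rows by PMID. The sketch below does this with the standard library. The sample rows are invented for illustration, and the header spellings are assumed to match the column list in the description; check them against the actual hype_dataset_final.tsv.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for hype_dataset_final.tsv. The rows are invented,
# and the header names are an assumption based on the column list.
SAMPLE_TSV = (
    "PMID\tYear\tHype_word\tSentence\tHype_percentile\tHype_value\n"
    "111\t2001\tnovel\tWe report a novel finding.\t0.10\t0.8\n"
    "111\t2001\trobust\tThe effect was robust.\t0.90\t0.6\n"
    "222\t2010\tmajor\tThis is a major advance.\t0.95\t0.7\n"
)

def hype_words_by_pmid(handle):
    """Group candidate hype words by article, since one PMID can span several rows."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["PMID"]].append(row["Hype_word"])
    return dict(grouped)

groups = hype_words_by_pmid(io.StringIO(SAMPLE_TSV))
print(groups)
```

For the real file, the same function can be called on `open("hype_dataset_final.tsv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.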
keywords: Hype; PubMed; Abstracts; Biomedicine
published: 2025-03-13
 
ALMA Band 4 and 7 observations of the dust continuum in the Class 0 protostellar system L1448 IRS3B. We include the selfcal script, imaging scripts, fits files, and the python scripts for the figures in the paper.
keywords: ALMA; Band 4; Band 6; polarization; L1448 IRS3B
published: 2024-05-07
 
This dataset builds on an existing dataset which captures artists’ demographics who are represented by top tier galleries in the 2016–2017 New York art season (Case-Leal, 2017, https://web.archive.org/web/20170617002654/http://www.havenforthedispossessed.org/) with a census of reviews and catalogs about those exhibitions to assess proportionality of media coverage across race and gender. The readme file explains variables, collection, relationship between the datasets, and an example of how the Case-Leal dataset was transformed. The ArticleDataset.csv provides all articles with citation information as well as artist, artistic identity characteristic, and gallery. The ExhibitionCatalog.csv provides exhibition catalog citation information for each identified artist.
keywords: diversity and inclusion; diversity audit; contemporary art; art exhibitions; art exhibition reviews; exhibition catalogs; magazines; newspapers; demographics
published: 2025-03-05
 
These data files were used for phylogenomic analyses of Darnini and related Membracidae (Hemiptera: Auchenorrhyncha) in the referenced article by Gonzalez-Mozo et al. The "mem_50p_alignment.fasta" file contains the aligned, concatenated nucleotide sequence data for 52 species and 494 genetic loci included in the phylogenetic analyses ("N" indicates missing data and "-" indicates an alignment gap). The file "Table 1.rtf" lists the included species, country of origin, and GenBank accession number. Species newly sequenced for this study have a Sample ID with prefix "DAR"; previously sequenced species for which data were downloaded from GenBank have "NCBI" indicated in the same column of the table. The file "partition_def_50p.txt" lists the 494 genetic loci included in the alignment, with their exact positions indicated by the range of numbers given at the end of each line (e.g., locus "uce-1" occupies positions 1-280 in the alignment).
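The partition ranges can be used to cut a single locus out of the concatenated alignment. The following sketch assumes that each partition line ends with a 1-based, inclusive `start-end` range, as the "uce-1" example suggests. The exact line layout shown here is a guess, and the function names are hypothetical.

```python
import re

# Matches a trailing "start-end" range, optionally followed by a semicolon.
RANGE_AT_END = re.compile(r"(\d+)\s*-\s*(\d+)\s*;?\s*$")

def parse_range(line: str) -> tuple:
    """Return the (start, end) positions given at the end of a partition line."""
    match = RANGE_AT_END.search(line)
    if match is None:
        raise ValueError(f"no position range found in: {line!r}")
    return int(match.group(1)), int(match.group(2))

def extract_locus(sequence: str, start: int, end: int) -> str:
    """Convert 1-based inclusive positions to a Python slice."""
    return sequence[start - 1:end]

# Assumed line layout; only the locus name and the range come from the description.
line = "DNA, uce-1 = 1-280"
start, end = parse_range(line)

# Toy aligned sequence: 280 sites for the first locus, then 20 more sites.
aligned = "A" * 280 + "C" * 20
locus = extract_locus(aligned, start, end)
print(start, end, len(locus))
```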
keywords: Insecta; Hemiptera; anchored-hybrid enrichment; phylogeny; treehopper
published: 2025-03-12
 
Environmental DNA metabarcoding data for fish communities at 50 sites in the Tennessee River watershed of northern Alabama, United States collected in summer 2018 used in the calculation of an Index of Biotic Integrity for biological monitoring
keywords: Alabama; biological monitoring; environmental DNA; fish; Index of Biotic Integrity; water quality
published: 2025-03-12
 
References - Jeong, Gangwon, Umberto Villa, and Mark A. Anastasio. "Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints." Photoacoustics (2025): 100700. - Park, Seonyeong, et al. "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer." Journal of biomedical optics 28.6 (2023): 066002-066002. Overview - This dataset includes 80 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for photoacoustic computed tomography (PACT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in PACT studies are described in the publication cited above. - The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories: > Type A - The breast is almost entirely fatty > Type B - There are scattered areas of fibroglandular density in the breast > Type C - The breast is heterogeneously dense > Type D - The breast is extremely dense - Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom. File Name Format - Each data file is stored as a .mat file. The filenames follow this format: {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
File Contents - Each file contains the following variables: > "type": Breast type > "p0": Initial pressure distribution [Pa] > "sos": Speed-of-sound map [mm/μs] > "att": Acoustic attenuation (power-law prefactor) map [dB/ MHzʸ mm] > "y": power-law exponent > "pressure_lossless": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, under the assumption of a lossless medium (corresponding to Studies I, II, and III). > "pressure_lossy": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, incorporating a power-law acoustic absorption model to account for medium losses (corresponding to Study IV). * The pressure data were simulated using a ring-array transducer that consists of 512 receiving elements uniformly distributed along a ring with a radius of 72 mm. * Note: These pressure data are noiseless simulations. In Studies II–IV of the referenced paper, additive i.i.d. Gaussian noise was added to the measurement data. Users may add similar noise to the provided data as needed for their own studies. - In Study I, all spatial maps (e.g., sos) have dimensions of 512 × 512 pixels, with a pixel size of 0.32 mm × 0.32 mm. - In Study II and Study III, all spatial maps (sos) have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm. - In Study IV, both the sos and att maps have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
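The file-name convention above ({type}{subject_id}.mat) can be decoded with a short regular expression. This is a minimal sketch: the function name `parse_phantom_name` is hypothetical, and the category wording is copied from the description.

```python
import re

# One letter A-D followed by a numeric sample ID and the .mat extension.
MAT_NAME = re.compile(r"^([ABCD])(\d+)\.mat$")

# ACR BI-RADS breast composition categories as worded in the description.
BREAST_TYPES = {
    "A": "almost entirely fatty",
    "B": "scattered areas of fibroglandular density",
    "C": "heterogeneously dense",
    "D": "extremely dense",
}

def parse_phantom_name(filename: str) -> tuple:
    """Split a phantom file name into (breast type, sample ID)."""
    match = MAT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.group(1), match.group(2)

breast_type, subject_id = parse_phantom_name("D510022534.mat")
print(breast_type, subject_id, BREAST_TYPES[breast_type])
```

The sample ID is kept as a string so that any leading zeros survive.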
keywords: Medical imaging; Photoacoustic computed tomography; Numerical phantom; Joint reconstruction
published: 2025-03-05
 
References - Li, Fu, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. "3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography." IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, no. 1 (2021): 135-146. DOI: 10.1109/TUFFC.2021.3112544 - Li, Fu; Villa, Umberto; Park, Seonyeong; Anastasio, Mark, 2021, "2D Acoustic Numerical Breast Phantoms and USCT Measurement Data", https://doi.org/10.7910/DVN/CUFVKE, Harvard Dataverse, V1 Overview - This dataset includes 1,089 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for ultrasound computed tomography (USCT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in USCT studies are described in the publication cited above. - The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories: > Type A - The breast is almost entirely fatty > Type B - There are scattered areas of fibroglandular density in the breast > Type C - The breast is heterogeneously dense > Type D - The breast is extremely dense - Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom. File Name Format - Each data file is stored as an HDF5 .mat file. The filenames follow this format: {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
File Contents - Each file contains the following variables: > "type": Breast type > "sos": Speed-of-sound map [mm/μs] > "den": Ambient density map [kg/mm³] > "att": Acoustic attenuation (power-law prefactor) map [dB/ MHzʸ mm] > "y": power-law exponent > "label": Tissue label map. Tissue types are denoted using the following labels: water (0), fat (1), skin (2), glandular tissue (29), ligament (88), lesion (200). - All spatial maps ("sos", "den", "att", and "label") have the same spatial dimensions of 2560 x 2560 pixels, with a pixel size of 0.1 mm x 0.1 mm. - "sos", "den", and "att" are float32 arrays, and "label" is an 8-bit unsigned integer array.
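The tissue label codes and the grid geometry listed above lend themselves to a small lookup. The sketch below works on a toy label map held as nested Python lists. Loading the real "label" array from the HDF5 .mat files is outside its scope, and the function name `tissue_pixel_counts` is hypothetical.

```python
from collections import Counter

# Label codes as listed in the description.
TISSUE_LABELS = {
    0: "water",
    1: "fat",
    2: "skin",
    29: "glandular tissue",
    88: "ligament",
    200: "lesion",
}

# Grid geometry from the description: 2560 x 2560 pixels at 0.1 mm per pixel.
PIXELS_PER_SIDE = 2560
PIXEL_SIZE_MM = 0.1
FIELD_OF_VIEW_MM = PIXELS_PER_SIDE * PIXEL_SIZE_MM  # side length of the square map

def tissue_pixel_counts(label_map):
    """Count pixels per tissue name in a 2D grid of integer labels."""
    counts = Counter()
    for row in label_map:
        for value in row:
            counts[TISSUE_LABELS[int(value)]] += 1
    return dict(counts)

# Toy 2 x 3 label map standing in for the real 2560 x 2560 array.
toy_map = [[0, 0, 1], [2, 29, 200]]
counts = tissue_pixel_counts(toy_map)
print(counts, FIELD_OF_VIEW_MM)
```

Multiplying a pixel count by the pixel area (0.01 mm² here) gives the corresponding tissue area in the slice.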
keywords: Medical imaging; Ultrasound computed tomography; Numerical phantom
planned publication date: 2025-06-30
 
Includes two files (.csv) behind all analyses and results in the paper published with the same title. <b>1) 'sites.species.counts'</b> is the raw 2018-2022 data from Angella Moorehouse (Illinois Nature Preserves Commission) including her 456 identified pollinator species and her raw counts per site (there may be a few errors of identification or naming, and there will always be name changes over time). Headers in columns F through Q correspond to the remnant-site labels in Figure 1 and Table 1 of the paper. Columns R to AB are the “nonremnant” sites, which have not been uniquely labelled since the specific sites aren't referenced anywhere in the manuscript. <b>2) 'C.scores'</b> has the 265 species assigned empirical C values (empirical.C) along with the four sets of expert C values and their confidence ranks (low, medium, high), and the Illinois/Indiana conservation ranks (S-ranks). Other headers in these files: - taxa.code: four-letter abbreviation for genus and specific name - genus: genus name - species: specific epithet - common.name: English name - group: general pollinator taxa group - empirical.C: empirically estimated conservatism score - expert#.C: conservatism score assigned by each of four experts - expert#.conf: expert's confidence in their conservatism score. Blank cells in the site-species abundance matrix indicate species absence (or non-detection). Blank cells in C.scores.csv indicate missing S-ranks and unassigned C-scores (with associated missing confidence ranks) where experts lacked knowledge or confidence.
keywords: ecological conservatism; indicator values; pollinator conservation; prairie ecosystems; protected areas; remnant communities
published: 2025-02-23
 
Dataset with numerical routines and laboratory testing data associated with the manuscript: Bondarenko, N., Podladchikov, Y., Williams‐Stroud, S., & Makhnenko, R. (2025). Stratigraphy‐induced localization of microseismicity during CO2 injection in Illinois Basin. Journal of Geophysical Research: Solid Earth, 130, e2024JB029526. https://doi.org/10.1029/2024JB029526
keywords: Illinois Basin Decatur Project; Induced Seismicity; GPU; Numerical modeling
published: 2025-02-20
 
To gather news articles from the web that discuss the Cochrane Review (DOI: 10.1002/14651858.CD006207.pub6), we used Altmetric.com's Altmetric Explorer and retrieved articles on August 1, 2023. We selected all articles that were written in English, published in the United States, and had a publication date <b>on or after March 10, 2023</b> (according to the "Mention Date" from Altmetric.com). This date is significant as it is when Cochrane issued a statement (https://www.cochrane.org/news/statement-physical-interventions-interrupt-or-reduce-spread-respiratory-viruses-review) about the "misleading interpretation" of the Cochrane Review made by news articles. A previously published dataset for "Arguing about Controversial Science in the News: Does Epistemic Uncertainty Contribute to Information Disorder?" (DOI: 10.13012/B2IDB-4781172_V1) contains annotation of the news articles published before March 10, 2023. Our dataset annotates the news published on or after March 10, 2023. The Altmetric_data.csv describes the selected news articles with both data exported from Altmetric Explorer and data we manually added. Data exported from Altmetric Explorer: - Publication date of the news article - Title of the news article - Source/publication venue of the news article - URL - Country Data we manually added: - Whether the article is accessible - The date we checked the article - The corresponding ID of the article in MAXQDA For each article from Altmetric.com, we first tried to use the Web Collector for MAXQDA to download the article from the website and imported it into MAXQDA (version 22.8.0). We manually extracted direct quotations from the articles using MAXQDA. We included surrounding words and sentences around direct quotations for context where needed.
We manually added codes and code categories in MAXQDA to identify the individuals (chief editors of the Cochrane Review, government agency representatives, journalists, and other experts such as physicians) or organizations (government agencies, other organizations, and research publications) who were quoted. The MAXQDA_data.csv file contains excerpts from the news articles that contain the direct quotations we annotated. For each excerpt, we included the following information: - MAXQDA ID of the document from which the excerpt originates - The collection date and source of the document - The code we assigned to the excerpt - The code category - The excerpt itself
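The date rule that separates this dataset from the earlier one (mention date on or after March 10, 2023) can be expressed as a simple comparison. This is a sketch only: the function name is hypothetical and the example dates are invented. How the "Mention Date" strings are formatted in Altmetric_data.csv is not stated, so parsing them is left out.

```python
from datetime import date

# Date of the Cochrane statement, used as the cutoff in the description.
CUTOFF = date(2023, 3, 10)

def in_post_statement_dataset(mention_date: date) -> bool:
    """True if an article falls in this dataset (on or after the cutoff)."""
    return mention_date >= CUTOFF

# Invented example dates, one on each side of the cutoff plus the boundary.
examples = [date(2023, 3, 9), date(2023, 3, 10), date(2023, 7, 1)]
flags = [in_post_statement_dataset(d) for d in examples]
print(flags)
```

The comparison is inclusive, so an article mentioned on March 10, 2023 itself belongs to this dataset rather than the earlier one.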
keywords: altmetrics; MAXQDA; masks for COVID-19; scientific controversies; news articles
published: 2025-02-03
 
The data and code provided in this dataset can be used to generate plots that show the results of the linear prediction algorithm and the amplified modes, supporting the key argument of the manuscript. It is divided into five subfolders, each corresponding to one combination of external condition (magnetic field B, temperature), scan parameter (temperature, magnetic field B), pump laser polarization (linear s, linear p, and circular), and sample orientation (B parallel to c axis, B perpendicular to c axis): 1) B parallel to c axis, linear pump polarization in s, linear THz emission polarization in s, field dependence (B_parallel_c_linear_spump_sprobe_field). 2) B parallel to c axis, linear pump polarization in s, linear THz emission polarization in s, temperature dependence (B_parallel_c_linear_spump_sprobe_temperature). 3) B perpendicular to c axis, linear pump polarization in s, linear THz emission polarization in s, field dependence (B_perp_c_linear_spump_sprobe_field). 4) B perpendicular to c axis, linear pump polarization in s, linear THz emission polarization in s, temperature dependence (B_perp_c_linear_spump_sprobe_temperature). 5) B parallel to c axis, circular pump polarization (left circularly polarized LCP and right circularly polarized RCP), linear THz emission polarization in s, field dependence (B_parallel_c_LCPRCP_pump_sprobe_field). Each folder contains the raw data (.mat), the oscillator parameters obtained through the linear prediction algorithm (.mat), and the plot-generating code (.m). The code plots the raw data, the fit to the processed data, and the amplified modes. The code is written in MATLAB R2024a; the working directory for each script should be the subfolder that contains it.
keywords: magneto-chiral instability; THz emission; THz spectroscopy; nonequilibrium states; emergent phenomena; Weyl semiconductor; tellurium; ultrafast spectroscopy; photoexcitation
published: 2016-05-19
 
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
keywords: taxi;transportation;New York City;GPS
published: 2025-02-14
 
This dataset includes the original data (including photographs as .jpg files and sound recordings as .wav files) and detailed descriptions of workflows for analyses of acoustic and morphometric data for the Neoaliturus tenellus (beet leafhopper) species complex. Files needed for different parts of the two analytical workflows are included in the "Acoustics.zip" and "PCA.zip" archives. The "Folder Structure.png" file contains a diagram of the folder structure of the two archives. Each archive contains a "ReadMe" file with instructions for repeating the analyses. File and folder names including the two-letter abbreviations TB, TD, TN and TP refer to four different putative species (operational taxonomic units, or OTUs) of the Neoaliturus tenellus complex.
keywords: Hemiptera; Cicadellidae; integrative taxonomy; courtship; morphology
published: 2025-02-07
 
This dataset contains raw data of plasma glucose, insulin, c-peptide, GLP-1, and FGF21 collected as part of a study of alcohol pharmacokinetics in women who underwent metabolic surgery.
keywords: Excel; Alcohol and metabolic surgery; glucose; insulin; c-peptide; glp-1; fgf21
published: 2024-03-27
 
To gather news articles from the web that discuss the Cochrane Review, we used Altmetric Explorer from Altmetric.com and retrieved articles on August 1, 2023. We selected all articles that were written in English, published in the United States, and had a publication date <b>prior to March 10, 2023</b> (according to the “Mention Date” on Altmetric.com). This date is significant as it is when Cochrane issued a statement about the "misleading interpretation" of the Cochrane Review. The collection of news articles is presented in the Altmetric_data.csv file. The dataset contains the following data that we exported from Altmetric Explorer: - Publication date of the news article - Title of the news article - Source/publication venue of the news article - URL - Country We manually checked and added the following information: - Whether the article still exists - Whether the article is accessible - Whether the article is from the original source We assigned MAXQDA IDs to the news articles. News articles were assigned the same ID when they were (a) identical or (b) in the case of Article 207, closely paraphrased, paragraph by paragraph. Inaccessible items were assigned a MAXQDA ID based on their "Mention Title". For each article from Altmetric.com, we first tried to use the Web Collector for MAXQDA to download the article from the website and imported it into MAXQDA (version 22.7.0). If an article could not be retrieved using the Web Collector, we either downloaded the .html file or in the case of Article 128, retrieved it from the NewsBank database through the University of Illinois Library. We then manually extracted direct quotations from the articles using MAXQDA. We included surrounding words and sentences, and in one case, a news agency’s commentary, around direct quotations for context where needed. The quotations (with context) are the positions in our analysis. We also identified who was quoted. 
We excluded quotations when we could not identify who or what was being quoted. We annotated quotations with codes representing groups (government agencies, other organizations, and research publications) and individuals (authors of the Cochrane Review, government agency representatives, journalists, and other experts such as epidemiologists). The MAXQDA_data.csv file contains excerpts from the news articles that contain the direct quotations we identified. For each excerpt, we included the following information: - MAXQDA ID of the document from which the excerpt originates; - The collection date and source of the document; - The code with which the excerpt is annotated; - The code category; - The excerpt itself.
keywords: altmetrics; MAXQDA; polylogue analysis; masks for COVID-19; scientific controversies; news articles
published: 2022-05-13
 
The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned DNA sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published: 2022-10-14
 
The Membracoidea_morph_data_Final.nex text file contains the original data used in the phylogenetic analyses of Dietrich et al. (Insect Systematics and Diversity, in review). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The complete taxon names corresponding to the 131 genus names listed under “BEGIN TAXA” are listed in Table 1 in the included PDF file “Taxa_and_characters”; the 229 morphological characters (names abbreviated under “BEGIN CHARACTERS”) are fully explained in the list of character descriptions following Table 1 in the same PDF. The data matrix follows “MATRIX” and gives the numerical values of characters for each taxon. Question marks represent missing data. The lists of characters and taxa and details on the methods used for phylogenetic analysis are included in the submitted manuscript.
keywords: leafhopper; treehopper; evolution; Cretaceous; Eocene
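Because question marks mark missing data in the matrix, a quick per-taxon tally can show how complete each row is. The sketch below is a minimal example that assumes one taxon per line after a line reading MATRIX, taxon names without spaces, and no bracketed polymorphic states; the toy block and genus labels are invented and the real file may be laid out differently.

```python
def missing_data_fraction(nexus_text):
    """Return {taxon: fraction of characters coded '?'} for a simple MATRIX block."""
    in_matrix = False
    fractions = {}
    for raw_line in nexus_text.splitlines():
        line = raw_line.strip()
        if not in_matrix:
            # Skip everything until the MATRIX keyword on its own line.
            if line.upper() == "MATRIX":
                in_matrix = True
            continue
        if line.startswith(";"):
            break  # a semicolon closes the matrix
        if not line:
            continue
        name, states = line.split(None, 1)
        states = "".join(states.split())
        fractions[name] = states.count("?") / len(states)
    return fractions


# Invented toy block, for illustration only.
toy_nexus = """#NEXUS
BEGIN CHARACTERS;
DIMENSIONS NCHAR=4;
FORMAT DATATYPE=STANDARD MISSING=?;
MATRIX
Genus_a 01?1
Genus_b ??10
;
END;
"""
fractions = missing_data_fraction(toy_nexus)
```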
published: 2024-04-05
 
The following files include specimen information, DNA sequence data, and additional information on the analyses used to reconstruct the phylogeny of the leafhopper genus Neoaliturus as described in the Methods section of the original paper:
1. Taxon_sampling.csv: contains data on the individual specimens from which DNA was extracted, including sample code, taxon name, collection data (locality, date and name of collector) and museum unique identifier.
2. Alignments.zip: a ZIP archive containing 432 separate FASTA files representing the aligned nucleotide sequences of individual gene loci used in the analysis.
3. Concatenated_Matrix.fa: a FASTA file containing the concatenated individual gene alignments used for the maximum likelihood analysis in IQ-TREE.
4. Genes_and_Loci.rtf: identifies the individual genes and loci used in the analysis. The partition name is the same as the name of the individual alignment file in the zipped Alignments folder.
5. Partitions_best_scheme.nex: a text file in the standard NEXUS format that indicates the names of the individual data partitions and their locations in the concatenated matrix, and also indicates the substitution model for each partition.
6. (New in this version 2) Scripts & Description.zip: includes 8 custom shell or Perl scripts used to assemble the DNA sequence data by performing reciprocal BLAST searches between the reference sequences and assemblies for each sample, extracting the best sequences based on the BLAST searches, screening the hits for each locus and keeping only the best result, and generating the nucleotide sequence dataset for the predicted orthologues (see the file description.txt for details).
7. (New in this version 2) Full_genetic_distances_matrix.csv: shows the genetic distances between pairs of samples in the dataset (proportion of nucleotides that differ between samples).
keywords: leafhopper; phylogeny; anchored-hybrid-enrichment; DNA sequence; insect
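The genetic distance described for Full_genetic_distances_matrix.csv, the proportion of nucleotides that differ between two samples, can be sketched as follows. How gaps and ambiguous bases were handled is not stated, so skipping any site where either sequence has "-", "?" or "N" is an assumption made here for illustration, as are the function name and toy sequences.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of compared aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or ambiguity in either sequence are ignored.
        if base_a in skip_chars or base_b in skip_chars:
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        return float("nan")  # no comparable sites
    return differing / compared


# Invented toy sequences: one mismatch over four compared sites.
distance = p_distance("ACGT", "ACGA")
```

With a different convention, for example counting gaps as a fifth state, the values would change, so the published matrix should be treated as the reference.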
published: 2025-02-08
 
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and clustering (by any algorithm) are used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real-world clusters, after which outlier nodes are added using one of three different options. See Anne et al. (2024): in press, Complex Networks and Applications XIII (preprint: arXiv:2408.13647). The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/).
Input Networks: The CEN can be downloaded from the Illinois Data Bank: https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
a - the name of the inspirational network, e.g., cit_hepph
b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01
c - the RECCS option used to approximate edge count and connectivity in the real-world network, e.g., v1
Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (Peixoto, Tiago P. 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14).
Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2) and with or without Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment.
Input Network: CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows: cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv where
cen - indicates the network was modeled on the Curated Exosome Network (CEN).
resolution - the resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status - either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version - the RECCS version used to generate the synthetic network (v1 or v2).
replicate_id - the specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv - a synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv - a synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
keywords: Community Detection; Synthetic Networks; Stochastic Block Model (SBM)
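The two file naming schemes described for this dataset can be decoded mechanically. The sketch below is one possible reading: the regular expressions are inferred from the example file names in the description (such as cit_hepph_0.01_v1.tsv and cen_0.01_no_cm_v2_sample_1.tsv), and the function name and returned keys are hypothetical, not part of the dataset.

```python
import re

# Replication-experiment files, pattern inferred from the examples:
# cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv
REPLICATE_RE = re.compile(
    r"^cen_(?P<resolution>\d+(?:\.\d+)?)_(?P<cm_status>no_cm|cm)"
    r"_(?P<reccs_version>v[12])_sample_(?P<replicate_id>\d+)\.tsv$"
)
# Main synthetic networks: a_b_c.tsv, optionally gzip-compressed.
MAIN_RE = re.compile(
    r"^(?P<network>.+)_(?P<resolution>\d+(?:\.\d+)?)"
    r"_(?P<reccs_version>v[12])\.tsv(?:\.gz)?$"
)


def parse_network_filename(filename):
    """Return the naming fields as a dict, or None if neither scheme matches."""
    match = REPLICATE_RE.match(filename)
    if match:
        fields = match.groupdict()
        fields["network"] = "cen"
        fields["replicate_id"] = int(fields["replicate_id"])
        return fields
    match = MAIN_RE.match(filename)
    if match:
        return match.groupdict()
    return None


main_fields = parse_network_filename("cit_hepph_0.01_v1.tsv")
replicate_fields = parse_network_filename("cen_0.01_no_cm_v2_sample_1.tsv")
```

The resolution is kept as a string so that values such as 0.01 round-trip exactly when rebuilding file names.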
published: 2024-09-17
 
The following seven zip files are compressed folders containing the input datasets/trees, main output files and the scripts of the related analyses performed in this study.
I. ancestral_microhabitat_reconstruction.zip: contains four files, including two input files (microhabitats.csv, timetree.tre) and a script (simmap_microhabitat.R) for ancestral state reconstruction of microhabitat by make.simmap implemented in the R package phytools v1.5, as well as the main output file (ancestral_microhabitats.csv).
1. ancestral_microhabitats.csv: reconstructed ancestral microhabitats for each node.
2. microhabitats.csv: microhabitats of the studied species.
3. simmap_microhabitat.R: the R script of make.simmap for ancestral microhabitat reconstruction.
4. timetree.tre: dated tree used for ancestral state reconstruction for microhabitat and morphological characters.
II. ancestral_morphology_reconstruction.zip: contains six files, including an input file (morphology.csv) and a script (simmap_morphology.R) for ancestral state reconstruction of morphology by make.simmap implemented in the R package phytools v1.5, as well as four main output files (forewing_ancestral_state.csv, frontal_sutures_ancestral_state.csv, hind_wing_ancestral_state.csv, ocellus_ancestral_state.csv).
1. forewing_ancestral_state.csv: reconstructed ancestral states of the development of the forewing for each node.
2. frontal_sutures_ancestral_state.csv: reconstructed ancestral states of the development of frontal sutures for each node.
3. hind_wing_ancestral_state.csv: reconstructed ancestral states of the development of the hind wing for each node.
4. morphology.csv: the states of the development of ocellus, forewing, hind wing and frontal sutures for each studied species.
5. ocellus_ancestral_state.csv: reconstructed ancestral states of the development of the ocellus for each node.
6. simmap_morphology.R: the R script of make.simmap for ancestral state reconstruction of morphology.
III. biogeographic_reconstruction.zip: contains four files, including three input files (dispersal_probablity.txt, distributions.csv, timetree_noOutgroup.tre) used for a stratified biogeographic analysis by BioGeoBEARS in RASP v4.2 and the main output file (DIVELIKE_result.txt).
1. dispersal_probablity.txt: relative dispersal probabilities among biogeographical regions at different geological epochs.
2. distributions.csv: current distributions of the studied species.
3. DIVELIKE_result.txt: BioGeoBEARS result of ancestral areas based on the DIVELIKE model.
4. timetree_noOutgroup.tre: the dated tree with the outgroup lineage (Eurymelinae) excluded.
IV. coalescent_analysis.zip: contains a folder and two files, including a folder (individual_gene_alignment) of input files used to construct gene trees, an input file (MLtree_BS70.tre) used for the multi-species coalescent analysis by ASTRAL v4.10.5 and the main output file (coalescent_species_tree.tre).
1. coalescent_species_tree.tre: the species tree generated by the multi-species coalescent analysis with the quartet support, effective number of genes and the local posterior probability indicated.
2. individual_gene_alignment: a folder containing 427 FASTA files, each representing the nucleotide alignment for a gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12.
3. MLtree_BS70.tre: 165 gene trees with the average SH-aLRT and ultrafast bootstrap values of ≥ 70%. This file was used to estimate the species tree by ASTRAL v4.10.5.
V. divergence_time_estimation.zip: contains five files, including two input files (treefile_rooted_noBranchLength.tre, treefile_rooted.tre) and two control files (baseml.ctl, mcmctree.ctl) used for divergence time estimation by BASEML and MCMCTREE in PAML v4.9, as well as the main output file (timetree_with95%HPD.tre).
1. baseml.ctl: the control file used for the estimation of substitution rates by BASEML in PAML v4.9.
2. mcmctree.ctl: the control file used for the estimation of divergence times by MCMCTREE in PAML v4.9.
3. timetree_with95%HPD.tre: dated tree with the 95% highest posterior density confidence intervals indicated.
4. treefile_rooted_noBranchLength.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with calibrations for the crown and internal nodes. Branch lengths and support values are not indicated.
5. treefile_rooted.tre: the maximum likelihood tree based on the concatenated nucleotide dataset with a secondary calibration on the root age. Branch support values are not indicated.
VI. maximum_likelihood_analysis_aa.zip: contains three files, including two input files (concatenated_aa_partition.nex, concatenated_aa.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_aa.tre).
1. concatenated_aa_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_aa.phy. This file partitions the 52,024 amino acid positions into 427 character sets.
2. concatenated_aa.phy: a concatenated amino acid dataset with 52,024 amino acid positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis.
3. MLtree_aa.tre: the maximum likelihood tree based on the concatenated amino acid dataset, with SH-aLRT values and ultrafast bootstrap values indicated.
VII. maximum_likelihood_analysis_nt.zip: contains three files, including two input files (concatenated_nt_partition.nex, concatenated_nt.phy) used for the maximum likelihood analysis by IQ-TREE v1.6.12 and the main output file (MLtree_nt.tre).
1. concatenated_nt_partition.nex: the partitioning schemes for the maximum likelihood analysis using concatenated_nt.phy. This file partitions the 156,072 nucleotide positions into 427 character sets.
2. concatenated_nt.phy: a concatenated nucleotide dataset with 156,072 nucleotide positions. Hyphens are used to represent gaps. This dataset was used for the maximum likelihood analysis as well as divergence time estimation.
3. MLtree_nt.tre: the maximum likelihood tree based on the concatenated nucleotide dataset, with SH-aLRT values and ultrafast bootstrap values indicated.
VIII. Taxon_sampling.csv: contains the sample IDs (1st column) which were used in the alignments and the taxonomic information (2nd to 6th columns).
keywords: Anchored Hybrid Enrichment; Biogeography; Cicadellidae; Phylogenomics; Treehoppers
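A simple sanity check on partition files like those described above is to confirm that the character sets tile the alignment without gaps or overlaps. The sketch below assumes each set is written as a single contiguous range in the form `charset name = start-end;`, which is a common layout for IQ-TREE partition files but is not confirmed for these files; the toy partition text is invented.

```python
import re

# Assumed layout: "charset <name> = <start>-<end>;" with one range per set.
CHARSET_RE = re.compile(r"charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE)


def read_charsets(nexus_text):
    """Return a list of (name, start, end) tuples with 1-based inclusive ranges."""
    return [(name, int(start), int(end))
            for name, start, end in CHARSET_RE.findall(nexus_text)]


def charsets_tile_alignment(charsets, n_sites):
    """True if the ranges cover positions 1..n_sites exactly once."""
    expected_start = 1
    for _name, start, end in sorted(charsets, key=lambda item: item[1]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == n_sites + 1


# Invented toy partition block, for illustration only.
toy_partitions = """#nexus
begin sets;
  charset gene1 = 1-300;
  charset gene2 = 301-750;
end;
"""
toy_charsets = read_charsets(toy_partitions)
tiles = charsets_tile_alignment(toy_charsets, 750)

# The stated alignment lengths are consistent with three nucleotides per codon.
codon_check = 3 * 52024 == 156072
```

For the real files, the same check would be run with 52,024 sites for the amino acid partitions and 156,072 sites for the nucleotide partitions, and the number of parsed sets compared against the stated 427.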