Illinois Data Bank Dataset Search Results


published: 2025-04-25
 
This is an Excel file containing data about the physical environments of four Brazilian schools and the average minutes per day of physical activity and sedentary behavior exhibited by schoolchildren during school hours.
keywords: school environment; physical activity
published: 2025-04-25
 
Zika virus (ZIKV) infection has been linked to neurological disorders such as microcephaly in children. Cases of Guillain-Barré Syndrome (GBS), a peripheral nervous system (PNS) disorder, have been reported in adults with ZIKV infection. These ZIKV-related GBS cases often exhibit atypical clinical features compared to classic GBS, including central nervous system (CNS) involvement. This dataset comprises two patient groups and a healthy control group. The first patient group includes adults with confirmed ZIKV infection, presenting both PNS-related GBS symptoms and CNS manifestations. The second group consists of adults with GBS but without ZIKV infection. The final group includes healthy, unaffected individuals.
keywords: Zika virus; Guillain-Barré Syndrome; adults; neuroimaging; central nervous system;
published: 2020-05-04
 
The Cline Center Historical Phoenix Event Data covers the period 1945-2019 and includes 8.2 million events extracted from 21.2 million news stories. This data was produced using the state-of-the-art PETRARCH-2 software to analyze content from the New York Times (1945-2018), the BBC Monitoring's Summary of World Broadcasts (1979-2019), the Wall Street Journal (1945-2005), and the Central Intelligence Agency’s Foreign Broadcast Information Service (1995-2004). It documents the agents, locations, and issues at stake in a wide variety of conflict, cooperation and communicative events in the Conflict and Mediation Event Observations (CAMEO) ontology. The Cline Center produced these data with the generous support of Linowes Fellow and Faculty Affiliate Prof. Dov Cohen and help from our academic and private sector collaborators in the Open Event Data Alliance (OEDA). For details on the CAMEO framework, see: Schrodt, Philip A., Omür Yilmaz, Deborah J. Gerner, and Dennis Hermreck. "The CAMEO (conflict and mediation event observations) actor coding framework." In 2008 Annual Meeting of the International Studies Association. 2008. http://eventdata.parusanalytics.com/papers.dir/APSA.2005.pdf Gerner, D.J., Schrodt, P.A. and Yilmaz, O., 2012. Conflict and mediation event observations (CAMEO) Codebook. http://eventdata.parusanalytics.com/cameo.dir/CAMEO.Ethnic.Groups.zip For more information about PETRARCH and OEDA, see: http://openeventdata.org/
keywords: OEDA; Open Event Data Alliance (OEDA); Cline Center; Cline Center for Advanced Social Research; civil unrest; petrarch; phoenix event data; violence; protest; political; conflict; political science
published: 2025-04-24
 
Includes two files (.csv) behind all analyses and results in the paper published with the same title.
<b>1) 'sites.species.counts'</b> is the raw 2018-2022 data from Angella Moorehouse (Illinois Nature Preserves Commission) including her 456 identified pollinator species and her raw counts per site (there may be a few errors of identification or naming, and there will always be name changes over time). Headers in columns F through Q correspond to the remnant-site labels in Figure 1 and Table 1 of the paper. Columns R to AB are the “nonremnant” sites, which have not been uniquely labelled since the specific sites aren't referenced anywhere in the manuscript.
<b>2) 'C.scores'</b> has the 265 species assigned empirical C values (empirical.C) along with the four sets of expert C values and their confidence ranks (low, medium, high), and the Illinois/Indiana conservation ranks (S-ranks).
Other headers in these files:
- taxa.code: four-letter abbreviation for genus and specific name
- genus: genus name
- species: specific epithet
- common.name: English name
- group: general pollinator taxa group
- empirical.C: empirically estimated conservatism score
- expert#.C: conservatism score assigned by each of four experts
- expert#.conf: expert's confidence in their conservatism score
Blank cells in the site-species abundance matrix indicate species absence (or non-detection).
Blank cells in C.scores.csv indicate missing S-ranks and unassigned C-scores (with associated missing confidence ranks) where experts lacked knowledge or confidence.
keywords: ecological conservatism; indicator values; pollinator conservation; prairie ecosystems; protected areas; remnant communities
published: 2025-04-24
 
These are the datasets underlying the figures in the manuscript "Methods of active surveillance for hard ticks and associated tick-borne pathogens of public health importance in the contiguous United States: A Comprehensive Systematic Review". The review considered only publications reporting on active tick or tick-borne pathogen surveillance in the contiguous United States published between 1944 and 2018. For the purposes of this review, we were only concerned with studies of Ixodidae (hard ticks) and/or studies of tick-borne pathogens (in humans, animals, or hard ticks) of public health importance to humans. Study designs included cross-sectional, serological, epidemiological, ecological, or observational studies. Only peer-reviewed publications published in the English language were included. Studies were excluded if they focused on a tick that is not a vector of a human pathogen or on a pathogen that does not cause disease in humans, if the tick or tick-borne pathogen findings were incidental, or if they did not include quantitative surveillance data. For the purpose of this study, we defined surveillance data as information on ticks or pathogens provided through active sampling in natural areas; it should be noted that this does not match the strict definition used by the CDC, which requires sustained sampling efforts across time. Studies were also excluded if they: explored regions other than the contiguous US; focused on treatment, vaccine, or therapeutics development and/or diagnostics of human disease; focused on tick or pathogen genetics; focused on experimental studies with ticks or hosts; were tick control and/or management studies; performed only passive surveillance; were review articles; were not peer reviewed; were in a language other than English; the full text was not available; and if the disease was not a risk to the general public. 
In addition, for articles which reported data that had previously been published, we only included previously unreported information collected by the authors, and we referenced the specific period of collection for these data to ensure we were not double-recording data. Due to publication delays, we also performed a non-systematic review of the literature of articles published between 2019 – 2023 on tick and tickborne pathogen surveillance methods conducted in the contiguous United States. Keyword search was performed in PubMed Central and Web of Science Core Collection databases. The search algorithm keywords included tick(s), Amblyomma, Dermacentor, Ixodes, Rhipicephalus, Acari Ixodidea, tick host(s), Lyme disease, Rocky Mountain Spotted Fever, Spotted Fever Group, Rickettsiosis, Ehrlichiosis, Anaplasmosis, Borreliosis, Tularemia, Babesiosis, tick-borne pathogen, Powassan, Heartland, Bourbon, Colorado tick fever, Pacific Coast tick fever, tick surveillance, surveillance, (sero)epidemiology, prevalence, distribution, ecology, United States. The search algorithm utilized is provided as follows: TI= ((ticks OR Ixodes OR Amblyomma OR Dermacentor OR Rhipicephalus OR "Acari Ixodidi" OR "tick hosts" OR "tick host") OR ("Lyme Disease" OR "Rocky Mountain Spotted Fever" OR "Spotted Fever Group" OR Rickettsiosis OR Rickettsial OR Ehrlichiosis OR Anaplasmosis OR Borreliosis OR Tularemia OR Babesiosis OR Borrelia OR Ehrlichia OR Anaplasma OR Rickettsia OR Babesia OR "tick-borne pathogen" OR "tick borne pathogen")) AND TS= ("tick surveillance" OR surveillance OR epidemiology OR seroepidemiology OR ecology) AND CU=("United States of America" OR "USA" OR "United States" OR United-States). These datasets are the collated data underlying the figures in the manuscript. For more details, please see the publication. 
The following are explanations for variables used in all the CSV files:
- Tick: Species of tick collected
- Tick_Method: Method of collecting ticks
- Pathogen: Species of pathogen tested for
- Path_Method: Method of testing for pathogens
- Decade: Decade of publication
- n: Number of publications
- STATE: state in which study was conducted
- COUNTY: county in which study was conducted
- 1944 - 2018 (Was surveillance performed?): was there at least one publication included with a publication date within the 1944-2018 period in this geographic region?
- 2019 - 2023 (Was surveillance performed?): was there at least one publication included with a publication date within the 2019-2023 period in this geographic region?
keywords: ticks; systematic review; surveillance
published: 2025-04-23
 
These data files were used for phylogenomic analyses of Darnini and related Membracidae (Hemiptera: Auchenorrhyncha) in the referenced article by Gonzalez-Mozo et al. - The "mem_50p_alignment.fas" file contains the aligned, concatenated nucleotide sequence data for 51 species and 492 genetic loci included in the phylogenetic analyses ("N" indicates missing data and "-" indicates an alignment gap). - The file "Table1.rtf" lists the included species, country of origin and genbank accession number. Species newly sequenced for this study have a Sample ID with prefix "DAR"; previously sequenced species for which data were downloaded from genbank have "NCBI" indicated in the same column of the table. - The file "partition_def.txt" lists the 492 genetic loci included in the alignment with their exact positions indicated by the range of numbers given at the end of each line (e.g., locus "uce-1" occupies positions 1-280 in the alignment). - The substitution model file "mem_50p.model" contains information on the substitution models used in the partitioned maximum likelihood analysis, including the models used for different data partitions and parameter values, as output by the phylogenetic software IQ-TREE. - Individual tree files in Newick format (plain text) are provided for the phylogeny from concatenated analysis with the best likelihood score ("mem_50p_bestLikelihoodScore"), concatenated likelihood analysis with gene concordance factors ("mem_50p_gcf") and site concordance factors ("mem_50p_scf"). - The tree file from the ASTRAL analysis is "mem_50p_astral". 
- The zip archive entitled “IQ-TREE analysis results.zip” includes output from the maximum likelihood analysis of the concatenated nucleotide sequence data, including the following: (1) main output file “mem_50p.iqtree” summarizing model selection, partitioning schemes, likelihood scores, and run parameters; (2) “mem_50p.mldist” including pairwise ML distances between taxa; (3) “mem_50p.best_scheme.nex” with the best partitioning scheme identified by ModelFinder in NEXUS format and (4) “mem_50p.best_scheme”, the RAxML-compatible version of the same file. - The “Ultrafast bootstrap results.zip” zip archive contains: (1) “mem_50p.ufboot” with the bootstrap replicate trees; (2) “mem_50p.contree” with the majority-rule consensus tree with support values; (3) “mem_50p.splits.nex”, with split support values across the replicates; (4) “mem_50p.log”, the log file. - The “gene_trees.zip” zip archive contains the individual gene trees as input for subsequent coalescent gene tree analysis in the phylogenetic program ASTRAL. - The file "DarniniAHE_Character Matrix.csv" contains the data for 6 morphological characters for which the ancestral states were reconstructed using the phylogenetic results from analysis of anchored-hybrid data (see article text for details). - The file "scriptACRDarnini.txt" contains the commands used to reconstruct ancestral morphological character states using the corHMM 2.8 R package. See the Methods section of the article for more details.
keywords: Insecta; Hemiptera; anchored-hybrid enrichment; phylogeny; treehopper
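The description of "partition_def.txt" above says each line ends with the range of alignment positions for one locus (e.g., "uce-1" at positions 1-280). A minimal Python sketch of reading such ranges follows. The sample lines and their exact layout are hypothetical, since the file's real format is not shown here; only the trailing "start-end" range is taken from the description.

```python
import re

# Hypothetical example lines; the real layout of partition_def.txt may differ.
# The only property assumed from the description is a trailing "start-end" range.
SAMPLE_LINES = [
    "DNA, uce-1 = 1-280",
    "DNA, uce-2 = 281-655",
]

# Matches the "start-end" range that closes a line.
RANGE_AT_END = re.compile(r"(\d+)\s*-\s*(\d+)\s*$")


def parse_range(line):
    """Return (start, end) as integers from the range at the end of a line."""
    match = RANGE_AT_END.search(line)
    if match is None:
        raise ValueError(f"no position range at end of line: {line!r}")
    return int(match.group(1)), int(match.group(2))


def locus_length(line):
    """Number of alignment columns covered by the locus (inclusive range)."""
    start, end = parse_range(line)
    return end - start + 1


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(line, "->", parse_range(line), "length", locus_length(line))
```

Positions are treated as 1-based and inclusive, which matches the "1-280" example; subtract 1 from the start before slicing a Python string.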
published: 2025-02-03
 
The data and code in this dataset can be used to generate plots that show the results of the linear prediction algorithm and the amplified modes, supporting the key argument of the manuscript. The dataset is divided into five subfolders, each corresponding to one combination of external condition (magnetic field B, temperature), scan parameter (temperature, magnetic field B), pump laser polarization (linear s, linear p, and circular), and sample orientation (B parallel to c axis, B perpendicular to c axis):
1) B parallel to c axis, linear pump polarization in s, linear THz emission polarization in s, field dependence (B_parallel_c_linear_spump_sprobe_field).
2) B parallel to c axis, linear pump polarization in s, linear THz emission polarization in s, temperature dependence (B_parallel_c_linear_spump_sprobe_temperature).
3) B perpendicular to c axis, linear pump polarization in s, linear THz emission polarization in s, field dependence (B_perp_c_linear_spump_sprobe_field).
4) B perpendicular to c axis, linear pump polarization in s, linear THz emission polarization in s, temperature dependence (B_perp_c_linear_spump_sprobe_temperature).
5) B parallel to c axis, circular pump polarization (left circularly polarized LCP and right circularly polarized RCP), linear THz emission polarization in s, field dependence (B_parallel_c_LCPRCP_pump_sprobe_field).
Each folder contains the raw data (.mat), the oscillator parameters obtained through the linear prediction algorithm (.mat), and the plot-generating code (.m). The code plots the raw data, the fit to the processed data, and the amplified modes. The code was written in MATLAB R2024a; the working directory for each script should be the subfolder that contains it.
keywords: magneto-chiral instability; THz emission; THz spectroscopy; nonequilibrium states; emergent phenomena; Weyl semiconductor; tellurium; ultrafast spectroscopy; photoexcitation
planned publication date: 2025-05-15
 
Coagulation testing (VCM Vet™) was performed on 57 horses with acute abdominal pain at admission to the University of Illinois Veterinary Teaching Hospital. Additional clinical data were recorded retrospectively. ROC analysis was performed to determine the optimal number of abnormal coagulation parameters for coagulopathy diagnosis based on survival. General linear regression (GLM) and random forest (RF) classification models were developed to predict short-term survival. A training cohort of 40 horses was used for model development, and model performance was determined using the remaining 17 horses.
keywords: horse; coagulation; colic; abdominal pain; survival; machine learning; blood clotting; viscoelastic testing
published: 2025-04-21
 
#Overview
These are reference packages for the TIPP3 software for abundance profiling and/or species detection from metagenomic reads (e.g., Illumina, PacBio, Nanopore, etc.). Different refpkg versions are listed.
TIPP3 software: https://github.com/c5shen/TIPP3
#Changelog
V1.2 (`tipp3-refpkg-1-2.zip`)
>>Fixed old typos in the file mapping text.
>>Added new files `taxonomy/species_to_marker.tsv` for new function `run_tipp3.py detection [...parameters]`. Please use the latest release of the TIPP3 software for this new function.
V1 (`tipp3-refpkg.zip`)
>>Initial release of the TIPP3 reference package.
#Usage
1. unzip the file to a local directory (will get a folder named "tipp3-refpkg").
2. use with TIPP3 software: `run_tipp3.py -r [path/to/tipp3-refpkg] [other parameters]`
keywords: TIPP3; abundance profile; reference database; taxonomic identification
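The two usage steps above (unzip, then pass the folder to `run_tipp3.py -r`) could be scripted roughly as follows. This is a sketch only: the helper names `extract_refpkg` and `build_command` are illustrative and not part of TIPP3, and no TIPP3 parameters beyond the `-r` option quoted in the description are assumed.

```python
import zipfile
from pathlib import Path


def extract_refpkg(zip_path, dest_dir):
    """Unzip the reference package and return the expected folder path.

    The folder name "tipp3-refpkg" is the one stated in the usage notes.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest / "tipp3-refpkg"


def build_command(refpkg_dir, other_parameters=()):
    """Assemble the argument list for run_tipp3.py.

    Only the "-r" option comes from the dataset description; anything in
    other_parameters must be supplied by the user from the TIPP3 docs.
    """
    return ["run_tipp3.py", "-r", str(refpkg_dir), *other_parameters]


if __name__ == "__main__":
    # Shows the command without running it; the path is a placeholder.
    print(" ".join(build_command("path/to/tipp3-refpkg")))
```

The resulting list can be handed to `subprocess.run` once the TIPP3 software itself is installed and the remaining parameters are filled in.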
published: 2025-04-17
 
This dataset includes analysis code used to analyze the data involved with swapping photons between superconducting qubits in separate modules through a superconducting coaxial cable bus. The dataset includes Python code to model and plot the data, CAD designs of the modules that hold the superconducting qubits, and high-frequency simulation software files to model the electric fields of the superconducting circuits.
keywords: superconducting qubits; quantum information; modular architecture
published: 2024-07-15
 
Rising global temperatures and urban heat island effects challenge environmental health and energy systems at the city level, particularly in summer. Increased heatwaves raise energy demand for cooling, stressing power facilities, increasing costs, and risking blackouts. Heat impacts vary across cities due to differences in urban morphology, geography, land use, and land cover, highlighting vulnerable areas needing targeted heat mitigation. Urban tree canopies, a nature-based solution, effectively mitigate heat. Trees provide shade and cooling through evaporation, improving thermal comfort, reducing air conditioning energy consumption, and enhancing climate resilience. This report focused on the ComEd service area in the Chicago Metropolitan Region and assessed the impacts of population growth, urbanization, climate change, and an ambitious plan to plant 1 million trees. The report evaluated planting 1 million trees to quantify regional cooling effects projected for the 2030s. Afforestation locations were selected to avoid interference with existing infrastructure. Key findings include (i) extreme hot hours (>95°F) will increase from 30 to 200 per year, adding 420 Cooling Degree Days (CCD) by the 2030s, (ii) greener areas can be up to 10°F cooler than less vegetated neighborhoods in summer, (iii) tree canopies can create localized cooling, reducing temperatures by 0.7°F and lowering annual CCD by 60 to 65, and (iv) afforestation can reduce the region’s temperature by 0.7°F, saving 400 to 1100 Megawatt hours of daily power usage during summer. <b>Note: The data is available upon request from <a href="mailto:dpiclimate@uilliois.edu">dpiclimate@uilliois.edu</a>.</b>
keywords: urban heat; cooling degree days; afforestation; tree canopy; Chicago region
published: 2025-04-15
 
Data for the invertebrate analysis in chapter 2 of Jacob Ridgway's thesis: "Neonicotinoids and Fungicides Alter Soil Invertebrate Abundance and Richness Within Restored Prairie"
keywords: Thesis; Soil Invertebrate; Pesticides
published: 2025-04-14
 
This dataset builds on an existing dataset that captures the demographics of artists represented by top-tier galleries in the 2016–2017 New York art season (Case-Leal, 2017, https://web.archive.org/web/20170617002654/http://www.havenforthedispossessed.org/), adding a census of reviews and catalogs about those exhibitions to assess the proportionality of media coverage across race and gender. The readme file explains variables, collection, the relationship between the datasets, and an example of how the Case-Leal dataset was transformed. The ArticleDataset.csv provides all articles with citation information as well as artist, artistic identity characteristic, and gallery. The ExhibitionCatalog.csv provides exhibition catalog citation information for each identified artist.
New in this V2:
- In V1, ArticleDataset.csv had both data on the articles published and all of the exhibitions, which was misleading. In V2 I separated these so that ArticleDataset only has articles, and AllSoloShows has all shows, including those that had no articles written about them in the publications reviewed.
- Upon closer review I noticed approximately 10 out of the 133 articles had incorrect information in variable "Publication content type: art or general" and/or "Publication Carrier type: web or library?" so I updated V2.
- Upon closer review I noticed there were 3 instances of artists who had two solo shows apiece: in addition to Meleko Mokgosi and Carrie Mae Weems, which I had already noted in V1, there was also Roxy Paine. I had not noticed this because only one of Paine's two shows had been written about. This brings the total number of shows to 117 (which was 116 in V1).
- Upon closer review I removed one row from ExhibitionCatalogs.csv, as the item I had listed did not meet the parameters.
keywords: diversity and inclusion; diversity audit; contemporary art; art exhibitions; art exhibition reviews; exhibition catalogs; magazines; newspapers; demographics
published: 2025-04-04
 
This dataset, uCite, is the union of nine large-scale open-access PubMed citation datasets, separated by reliability. There are 20 files, including the reliable and unreliable citation PMID pairs, non-PMID identifiers to PMID mapping (for DOIs, Lens, MAG, and Semantic Scholar), original PMID pairs from the nine resources, some metadata for PMIDs, duplicate PMIDs, some redirected PMID pairs, and PMC OA Patci citation matching results. A short description of each data file is listed below. A detailed description can be found in the README.txt. <strong>DATASET DESCRIPTION</strong> <ol> <li>PPUB.tsv.gz - tsv format file containing reliable citation pairs uCite.</li> <li>PUNR.tsv.gz - tsv format file containing unreliable citation pairs uCite.</li> <li>DOI2PMID.tsv.gz - tsv format file containing results mapping DOI to PMID. </li> <li> LEN2PMID.tsv.gz - tsv format file containing results mapping LensID pairs to PMID pairs. </li> <li> MAG2PMIDsorted.tsv.gz - tsv format file containing results mapping MAG ID to PMID. </li> <li>SEM2PMID.tsv.gz - tsv format file containing results mapping Semantic Scholar ID to PMID. </li> <li>JVNPYA.tsv.gz - tsv format file containing metadata of papers with PMID, journal name, volume, issue, pages, publication year, and first author's last name. </li> <li>TiLTyAlJVNY.tsv.gz - tsv format file containing metadata of papers. </li> <li> PMC-OA-patci.tsv.gz - tsv format file containing PubMed Central Open Access subset reference strings extracted by \cite{} processed by Patci.</li> <li>REDIRECTS.gz - txt file containing unreliable PMID pairs mapped to reliable PMID pairs. </li> <li>REMAP - file containing pairs of duplicate PubMed records (lhs PMID mapped to rhs PMID).</li> <li> ami_pair.tsv.gz - tsv format file containing all citation pairs from Aminer (2015 version). </li> <li> dim_pair.tsv.gz - tsv format file containing all citation pairs from Dimensions. 
</li> <li> ice_pair.tsv.gz - tsv format file containing all citation pairs from iCite (April 2019 version, version 1). </li> <li> len_pair.tsv.gz - tsv format file containing all citation pairs from Lens.org (harvested through Oct 2021). </li> <li>mag_pair.tsv.gz - tsv format file containing all citation pairs from Microsoft Academic Graph (2015 version). </li> <li> oci_pair.tsv.gz - tsv format file containing all citation pairs from Open Citations (Nov. 2021 dump, csv version ). </li> <li> pat_pair.tsv.gz - tsv format file containing all citation pairs from Patci (i.e., from "PMC-OA-patci.tsv.gz"). </li> <li> pmc_pair.tsv.gz - tsv format file containing all citation pairs from PubMed Central (harvest through Dec 2018 via e-Utilities).</li> <li> sem_pair.tsv.gz - tsv format file containing all citation pairs from Semantic Scholar (2019 version) . </li> </ol> <strong>COLUMN DESCRIPTION</strong> <strong>FILENAME</strong> : <em>PPUB.tsv.gz, PUNR.tsv.gz</em> (1) fromPMID - PubMed ID of the citing paper. (2) toPMID - PubMed ID of the cited paper. (3) sources - citation sources, in which the citation pairs are identified. (4) fromYEAR - Publication year of the citing paper. (5) toYEAR - Publication year of the cited paper. <strong>FILENAME</strong> : <em>DOI2PMID.tsv.gz</em> (1) DOI - Semantic Scholar ID of paper records. (2) PMID - PubMed ID of paper records. (3) PMID2 - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs. <strong>FILENAME</strong> : <em>SEMID2PMID.tsv.gz</em> (1) SemID - Semantic Scholar ID of paper records. (2) PMID - PubMed ID of paper records. (3) DOI - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs. <strong>FILENAME</strong> : <em>JVNPYA.tsv.gz</em> - Each row refers to a publication record. (1) PMID - PubMed ID. (2) journal - Journal name. (3) volume - Journal volume. (4) issue - Journal issue. 
(5) pages - The first page and last page (without leading digits) number of the publication separated by '-'. (6) year - Publication year. (7) lastname - Last name of the first author. <strong>FILENAME</strong> : <em>TiLTyAlJVNY.tsv.gz</em> (1) PMID - PubMed ID. (2) title_tokenized - Paper title after tokenization. (3) languages - Language that paper is written in. (4) pub_types - Types of the publication. (5) length(authors) - String length of author names. (6) journal -Journal name . (7) volume - Journal volume . (8) issue - Journal issue. (9) year - Publication year of print (not necessary epub). <strong>FILENAME</strong> : <em> PMC-OA-patci.tsv.gz</em> (1) pmcid - PubMed Central identifier. (2) pos - (3) fromPMID - PubMed ID of the citing paper. (4) toPMID - PubMed ID of the cited paper. (5) SRC - citation sources, in which the citation pairs are identified. (6) MatchDB - PubMed, ADS, DBLP. (7) Probability - Matching probability predicted by Patci. (8) toPMID2 - PubMed ID of the cited paper, extracted from OA xml file (9) SRC2 - citation sources, in which the citation pairs are identified. (10) intxt_id - (11) jounal - First character of the journal name. (12) same_ref_string - Y if patci and xml reference string match, otherwise N. (13) DIFF - (14) bestSRC - Citation sources, in which the citation pairs are identified. (15) Match - Matching strings annotated by Patci. <strong>FILENAME</strong> : <em>REDIRECTS.gz</em> Each row in Redirectis.txt is a string sequence in the same format as follows. - "REDIRECTED FROM: source PMID_i PMID_j -> PMID_i' PMID_j " - "REDIRECTED TO: source PMID_i PMID_j -> PMID_i PMID_j' " Note: source is the names of sources where the PMID_i and PMID_j are from. <strong>FILENAME</strong> : <em>REMAP</em> Each row is remapping unreliable PMID pairs mapped to reliable PMID pairs. The format of each row is "$REMAP{PMID_i} = PMID_j". 
<strong>FILENAME</strong> : <em>ami_pair.tsv.gz, dim_pair.tsv.gz, ice_pair.tsv.gz, len_pair.tsv.gz, mag_pair.tsv.gz, oci_pair.tsv.gz, pat_pair.tsv.gz,pmc_pair.tsv.gz, sem_pair.tsv.gz</em> (1) fromPMID - PubMed ID of the citing paper. (2) toPMID - PubMed ID of the cited paper.
keywords: Citation data; PubMed; Social Science;
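The REMAP file is described above as rows of the form "$REMAP{PMID_i} = PMID_j", mapping a duplicate PubMed record (left-hand side) to another record (right-hand side). A minimal Python sketch of parsing that row format and applying it to a citation pair is given below. The PMID values are made up for illustration, and tolerating an optional trailing semicolon is an assumption rather than something the description states.

```python
import re

# Row format quoted in the description: "$REMAP{PMID_i} = PMID_j".
# The optional trailing ";" is an assumption, not stated in the description.
REMAP_LINE = re.compile(r"^\$REMAP\{(\d+)\}\s*=\s*(\d+)\s*;?\s*$")


def parse_remap(lines):
    """Build a {duplicate_pmid: target_pmid} dictionary from REMAP rows."""
    mapping = {}
    for line in lines:
        match = REMAP_LINE.match(line.strip())
        if match:  # rows that do not fit the pattern are skipped
            mapping[int(match.group(1))] = int(match.group(2))
    return mapping


def remap_pair(pair, mapping):
    """Replace either PMID of a (fromPMID, toPMID) pair if it has a remap."""
    return tuple(mapping.get(pmid, pmid) for pmid in pair)


if __name__ == "__main__":
    # Made-up PMIDs, for illustration only.
    sample_rows = ["$REMAP{111} = 222", "$REMAP{333} = 444"]
    mapping = parse_remap(sample_rows)
    print(mapping)
    print(remap_pair((111, 999), mapping))
```

The `(fromPMID, toPMID)` tuple mirrors the first two columns listed for the citation-pair files, so the same helper could be applied row by row to any of them.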
published: 2025-04-05
 
This data set includes information on mixing metric values and distances to determine the average length scale, rates and variability of mixing downstream of 43 river confluences for 150 mixing events. The file "pmx_all data.csv" contains confluence names, the number of events per confluence site, and Pmx values measured at various actual and dimensionless downstream distances. The file "pmx_binned data.csv" provides mean Pmx values within 0.5-unit dimensionless distance bins.
keywords: river; mixing; confluences; remote sensing
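The binned file is described as holding mean Pmx values within 0.5-unit dimensionless distance bins. The Python sketch below shows one way such bin means could be computed from (distance, Pmx) pairs. The input values are invented, and labeling each bin by its lower edge (with that edge inclusive) is an assumption; the dataset's own binning convention may differ.

```python
from collections import defaultdict

BIN_WIDTH = 0.5  # bin width in dimensionless distance units, per the description


def bin_means(records, width=BIN_WIDTH):
    """Average Pmx within fixed-width distance bins.

    records: iterable of (dimensionless_distance, pmx) pairs.
    Returns {bin_lower_edge: mean_pmx}, ordered by bin edge.
    Assumption: bins are [edge, edge + width), labeled by their lower edge.
    """
    totals = defaultdict(lambda: [0.0, 0])  # edge -> [sum of pmx, count]
    for distance, pmx in records:
        edge = (distance // width) * width
        totals[edge][0] += pmx
        totals[edge][1] += 1
    return {edge: s / n for edge, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    # Invented values, for illustration only.
    sample = [(0.2, 0.1), (0.4, 0.3), (0.7, 0.5), (1.2, 0.9)]
    for edge, mean in bin_means(sample).items():
        print(f"bin starting at {edge}: mean Pmx = {mean:.3f}")
```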
published: 2025-04-02
 
This dataset contains Raman spectra, each acquired from an individual living cell entrapped within a soft or stiff gelatin methacrylate hydrogel or from a cell-free region of the hydrogel sample. Spectra were acquired from the following cell types: Madin-Darby Canine Kidney cell (MDCK); Chinese hamster ovary cell (CHO-K1); transfected CHO-K1 cell that expressed the SNAP-tag and HaloTag reporter proteins fused to an organelle-specific protein (CHO-T); human monocyte-like cell (THP-1); inactive macrophage-like (M0-like); active anti-inflammatory macrophage-like (M2-like); and pro/anti-inflammatory macrophage-like (M1/M2-like). These spectra are useful for identifying whether the hydrogel matrix obscures the Raman spectral signatures that are characteristic of each of these cell types.
keywords: Raman spectroscopy; 3D cell culture; single-cell spectrum; hydrogel scaffold; collagen scaffold; macrophage spectra; macrophage differentiation; THP-1 line; noninvasive phenotype identification; vibrational spectroscopy
published: 2016-05-19
 
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. The script also contains additional documentation and contact information that may help if you have trouble accessing the contents of the zip files.
keywords: taxi; transportation; New York City; GPS
published: 2020-08-22
 
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster consisting of 15 heterogeneous nodes. The dataset is not sampled and is from selected types of requests in each benchmark, i.e., compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated in FIRM's tracing system. The dataset is split by the microservice component in which the performance anomaly is located (as the file names suggest). Each file is in CSV format, with fields separated by commas. Each line consists of the tracing ID and the duration (in 10^(-3) ms) of each component. Execution paths are specified in `execution_paths.txt` in each directory.
keywords: Microservices; Tracing; Performance
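Each line of the tracing files is described as a tracing ID followed by per-component durations in units of 10^(-3) ms. The Python sketch below reads lines of that shape and converts the durations to milliseconds. The sample rows, the ID strings, and the absence of a header row are assumptions made for illustration.

```python
import csv
import io

# Invented rows in the described shape: tracing ID, then one duration per
# component in units of 10^(-3) ms. No header row is assumed.
SAMPLE = "trace-001,1500,250,4000\ntrace-002,900,100,2500\n"


def read_traces(text):
    """Return {tracing_id: [duration_ms, ...]} from comma-separated text."""
    traces = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        trace_id, *raw_durations = row
        # One unit is 10^(-3) ms, so dividing by 1000 gives milliseconds.
        traces[trace_id] = [float(value) / 1000.0 for value in raw_durations]
    return traces


if __name__ == "__main__":
    for trace_id, durations_ms in read_traces(SAMPLE).items():
        print(trace_id, durations_ms)
```

Which column belongs to which component is not encoded in the rows themselves; per the description, the execution paths in `execution_paths.txt` are the place to look for that ordering.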
published: 2025-03-19
 
This repository includes HRLDAS Noah-MP model output generated as part of Bieri et al. (2025) - Implementing deep soil and dynamic root uptake in Noah-MP (v4.5): Impact on Amazon dry-season transpiration. These data are distributed in two different formats: raw model output files and subsetted files that include data for a specific variable. All files are .nc format (NetCDF) and aggregated into .tar files to facilitate download. Given the size of these datasets, Globus transfer is the best way to download them. Raw model output for four model experiments is available: FD (control), GW, SOIL, and ROOT. See the associated publication for information on the different experiments. These data span an approximately 20-year period from 01 Jun 2000 to 31 Dec 2019. The data have a spatial resolution of 4 km and a temporal frequency of 3 hours. These data are for a domain in the southern Amazon basin (see Figure 1 in the associated publication). Data for each experiment are available as a .tar file which includes 3-hourly NetCDF files. All default Noah-MP output variables are included in each file. As a result, the .tar files are quite large and may take many hours or even days to transfer depending on your network speed and local configurations. These files are named 'noahmp_output_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT). Subsetted model output at a daily temporal resolution for all four model experiments is also available. These .tar files include the following variables: water table depth (ZWT), latent heat flux (LH), sensible heat flux (HFX), soil moisture (SOIL_M), canopy evaporation (ECAN), ground evaporation (EDIR), transpiration (ETRAN), rainfall rate at the surface (QRAIN), and two variables that are specific to the ROOT experiment: ROOTACTIVITY (root activity function) and GWRD (active root water uptake depth). There is one file for each variable within the tarred files.
These files are named 'noahmp_output_subset_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT). Finally, there is a sample dataset with raw 3-hourly output from the ROOT experiment for one day. The purpose of this sample dataset is to allow users to confirm whether these data meet their needs before initiating a full transfer via Globus. This file is named 'noahmp_output_sample_ROOT.tar'. The README.txt file provides information on the Noah-MP output variables in these datasets, among other specifications. Information on HRLDAS Noah-MP and names/definitions of model output variables that are useful in working with these data is available here: http://dx.doi.org/10.5065/ew8g-yr95. Note that some output variables may be listed in this document under a different variable name, so searching for the long name (e.g. 'baseflow' instead of 'QRF') is recommended. Information on additional output variables that were added to the model as part of this study is available here: https://github.com/bieri2/bieri-et-al-2025-EGU-GMD/tree/DynaRoot. Model code, configuration files, and forcing data used to carry out the model simulations are linked in the related resources section.
keywords: Land surface model; NetCDF
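The archive naming convention in the Noah-MP entry above can be sketched in a few lines of Python. This is an illustration only, not part of the dataset: the helper names `archive_name` and `list_netcdf_members` are hypothetical, and the `.nc` suffix check assumes the member files carry that extension, as the description suggests.

```python
import tarfile

# Experiment labels as listed in the dataset description.
EXPERIMENTS = ("FD", "GW", "SOIL", "ROOT")


def archive_name(experiment: str, subset: bool = False) -> str:
    """Return the .tar name the description gives for one experiment.

    Hypothetical helper: it only reproduces the documented patterns
    'noahmp_output_2000_2019_EXP.tar' (raw, 3-hourly) and
    'noahmp_output_subset_2000_2019_EXP.tar' (daily subset).
    """
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    middle = "subset_" if subset else ""
    return f"noahmp_output_{middle}2000_2019_{experiment}.tar"


def list_netcdf_members(tar_path: str) -> list[str]:
    """List the .nc members of a downloaded archive without extracting it.

    Assumes member names end in '.nc'; adjust the filter if they do not.
    """
    with tarfile.open(tar_path) as archive:
        return [name for name in archive.getnames() if name.endswith(".nc")]
```

The one-day sample archive, 'noahmp_output_sample_ROOT.tar', does not follow either pattern and would need to be referenced by its literal name.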
published: 2025-04-01
 
ICoastalDB, which was developed using Microsoft structured query language (SQL) Server, consists of water quality and related data in the Illinois coastal zone that were collected by various organizations. The information in the dataset includes, but is not limited to, sample data type, method of data sampling, location, time and date of sampling and data units.
keywords: Illinois Coastal Zone; Water Quality Data
published: 2025-03-28
 
8-bit RGB realizations of a stochastic image model (SIM) of the **kinds** of things seen in fluorescence microscopy of biological samples. Note that no attempt was made to model a particular tissue, sample, or microscope. Distinct image features are seen in each color channel. The first public mention of these SIMs is in "Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure" by Frank Brooks and Rucha Deshpande. The manuscript is available on arXiv and has been submitted for publication.
keywords: image models; fluorescence microscopy; training data; image-to-image translation; generative model evaluation
published: 2025-03-20
 
This dataset contains white-tailed deer (Odocoileus virginianus) land cover utility score (deer LCU score) data for every TRS (township, range, and section), township-range, and county in Illinois, USA, based on annual National Land Cover Database (NLCD) data released for all years between 2000 and 2023. LCU data is provided in CSV files for each spatial scale, with TRS data split into 2 CSV files due to size limits. Rasters (TIF) showing all deer habitat in Illinois are also provided to show the location, quality, and quantity of deer habitat. A metadata file is also included for additional information.
keywords: habitat; white-tailed deer; deer; Odocoileus virginianus; land cover; land classification; landscape; habitat suitability index; ecology; environment
published: 2025-03-18
 
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.
Additional Resources:
- Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form (https://docs.google.com/forms/d/e/1FAIpQLSf-J937V6I4sMSxQt7gR3SIbUASR26KXxqSurrkBvlF-CIQnQ/viewform?usp=pp_url) so we can determine whether you are eligible for access.
- Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form (https://forms.gle/6eA2yJUGFMtj5swY7).
- The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form (https://groups.webservices.illinois.edu/subscribe/154221) to subscribe to it.
Citation Guidelines:
1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [codebook], v1.3.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V6
2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [database], v1.3.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V6
*NOTE: V6 is replacing V5 with updated ‘Archer’ documents to reflect changes made to the Archer system.