Illinois Data Bank
University Library, University of Illinois at Urbana-Champaign
Displaying datasets 126 - 150 of 576 in total
Subject Area
Life Sciences (308)
Social Sciences (128)
Physical Sciences (84)
Technology and Engineering (51)
Uncategorized (4)
Arts and Humanities (1)
Funder
U.S. National Science Foundation (NSF) (173)
Other (168)
U.S. Department of Energy (DOE) (60)
U.S. National Institutes of Health (NIH) (53)
U.S. Department of Agriculture (USDA) (32)
Illinois Department of Natural Resources (IDNR) (13)
U.S. Geological Survey (USGS) (6)
U.S. National Aeronautics and Space Administration (NASA) (5)
Illinois Department of Transportation (IDOT) (3)
U.S. Army (2)
Publication Year
2021 (108)
2022 (108)
2020 (96)
2019 (72)
2023 (65)
2018 (59)
2017 (35)
2016 (30)
2024 (3)
License
CC0 (323)
CC BY (236)
custom (17)
published: 2022-03-25
Shen, Chengze; Park, Minhyuk; Warnow, Tandy (2022): The 16S.B.ALL dataset in 100-HF condition. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6604429_V1
This upload includes the 16S.B.ALL dataset in the 100-HF condition (referred to as 16S.B.ALL-100-HF) used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). The 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we required all fragmentary sequences to have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp). In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths within 25% of the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH. More details of the tar.gz file are described in README.txt.
keywords:
MAGUS;UPP;Multiple Sequence Alignment;eHMMs
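The "100-HF" fragmentation described in the record above can be illustrated with a minimal sketch. It assumes that fragment lengths are drawn from a normal distribution (mean 100 bp, standard deviation 60 bp) and redrawn until they exceed 50 bp, and that each fragment is a contiguous slice taken at a uniformly random position. The function name `make_fragment` and these sampling choices are hypothetical; the dataset's README.txt and the WITCH paper are the authority on the actual procedure.

```python
import random


def make_fragment(seq, mean_len=100.0, sd_len=60.0, min_len=50, rng=None):
    """Return one random contiguous fragment of seq (illustrative only).

    Assumption: the target length is normally distributed and is redrawn
    until it is strictly greater than min_len and no longer than seq.
    """
    if rng is None:
        rng = random.Random()
    if len(seq) <= min_len:
        # Hypothetical choice: sequences too short to fragment are kept whole.
        return seq
    while True:
        target = int(round(rng.gauss(mean_len, sd_len)))
        if min_len < target <= len(seq):
            break
    # Pick a uniformly random start so the fragment fits inside seq.
    start = rng.randint(0, len(seq) - target)
    return seq[start:start + target]


if __name__ == "__main__":
    rng = random.Random(42)
    query = "ACGT" * 375  # a 1,500 bp placeholder sequence
    lengths = [len(make_fragment(query, rng=rng)) for _ in range(10000)]
    print(sum(lengths) / len(lengths))
```

Because draws at or below 50 bp are rejected under this assumption, the mean of the retained lengths ends up above 100 bp, which is in line with the description's remark that the final average is slightly higher than the nominal value.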
published: 2022-08-06
Madhavan, Vidya; Aishwarya, Anuva (2022): Data for Spin-selective tunneling from nanowires of the candidate topological Kondo insulator SmB6. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9971603_V1
This dataset consists of all the files and codes that are part of the manuscript (main text and supplement) titled "Spin-selective tunneling from nanowires of the candidate topological Kondo insulator SmB6". For detailed information on the individual files refer to the specific readme files.
keywords:
Topology; Kondo Insulator; Spin; Scanning tunneling microscopy; antiferromagnetism
published: 2022-08-06
Carson, Dawn; Kopsco, Heather; Gronemeyer, Peg; Mateus-Pinilla, Nohra; Smith, Genee; Sandstrom, Emma; Smith, Rebecca (2022): Knowledge, attitudes, and practices of Illinois medical professionals related to ticks and tick-borne disease. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0685545_V1
An online knowledge, attitudes, and practices survey on ticks and tick-borne diseases (TBD) was distributed to medical professionals in Illinois from summer 2020 to fall 2021. These are the raw data associated with that survey and the survey questions used. Age, gender, and county of practice have been removed to prevent identification of respondents. We have added calculated values (columns 165 to end), including: the tick knowledge score, TBD knowledge score, and total knowledge score, which are the sums of the correct answers in each category, and score percent, which is the proportion of correct answers in each category; region, which is determined from the county of practice; TBD relevant practice, which separates the practice variable into TBD primary, secondary, and non-responders; and several variables which group categories.
keywords:
ticks; medicine; tick-borne disease; survey
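The calculated knowledge scores described in the record above amount to simple counting. The sketch below assumes each respondent's answers have already been graded as correct or incorrect per category; the function name `knowledge_scores` and the category labels are hypothetical and do not correspond to column names in the deposited file.

```python
def knowledge_scores(graded):
    """Compute per-category and total scores for one respondent.

    graded maps a category name to a list of booleans, where True means
    the question was answered correctly (hypothetical input layout).
    """
    scores = {}
    for category, answers in graded.items():
        n_correct = sum(answers)
        scores[category] = {
            "score": n_correct,                     # number of correct answers
            "percent": n_correct / len(answers),    # proportion correct
        }
    total_correct = sum(s["score"] for s in scores.values())
    total_questions = sum(len(a) for a in graded.values())
    scores["total"] = {
        "score": total_correct,
        "percent": total_correct / total_questions,
    }
    return scores


if __name__ == "__main__":
    example = {"tick": [True, False, True, True], "tbd": [True, True]}
    print(knowledge_scores(example))
```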
published: 2022-08-05
Hunninck, Louis; O'Keefe, Joy (2022): Bat activity and diversity in agricultural landscapes in Illinois, USA. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7792566_V1
This data set documents bat activity (counts per detector-night per phonic group) and bat diversity (number of bat species per detector-night) in relation to distance to the nearest forested corridor in a row crop agriculture dominated landscape and in relation to relative crop pest abundance. This data set was used to assess if bats were homogeneously distributed over a near-uninterrupted agricultural landscape and to assess the importance of forested corridors and the presence of pest species on their distribution across the landscape. Data was collected with 50 AudioMoth bat detectors along 10 transects, with each transect having 5 detectors. The transects started at a forest corridor and extended out for 4 km into uninterrupted row crop agriculture. Pest abundance was extrapolated from data collected in the same county during the same time as the study. Potentially important weather covariates were extracted from the nearest operational weather station.
keywords:
bats; bat activity; biodiversity; agricultural pest
published: 2022-08-01
Shearer, David; Beilke, Elizabeth (2022): Data for Playing it by ear: gregarious sparrows recognize and respond to isolated wingbeat sounds and predator-based cues. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6676149_V1
Datasets that accompany the Shearer and Beilke (2022) publication (Title: Playing it by ear: gregarious sparrows recognize and respond to isolated wingbeat sounds and predator-based cues; Journal: Animal Cognition).
keywords:
Vigilance; auditory detection; predator detection; predator-prey interaction; antipredator behavior
published: 2022-07-25
Jett, Jacob (2022): SBKS - Species Ambiguous Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1194770_V1
Related to the raw entity mentions, this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the NCBI's taxonomy identifier system.
keywords:
synthetic biology; NERC data; species mentions; ambiguous entities
published: 2022-07-25
Jett, Jacob (2022): SBKS - Species Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4950847_V1
A set of species entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; species mentions
published: 2022-07-25
Jett, Jacob (2022): SBKS - Species - Cleaned & Grounded Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8323975_V1
This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4950847_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the NCBI's taxonomy database.
keywords:
synthetic biology; NERC data; species mentions; cleaned data; NCBI TaxonID
published: 2022-07-25
Jett, Jacob (2022): SBKS - Species Noisy Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7146216_V1
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4950847_V1) and collects entity mentions that were manually determined to be noisy, non-species entities.
keywords:
synthetic biology; NERC data; species mentions; noisy entities
published: 2022-07-25
Jett, Jacob (2022): SBKS - Species Not Found Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5491578_V1
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4950847_V1) for species entities and represents those that were determined to be species (i.e., were not noisy entities) but for which no corresponding concept could be found in the NCBI taxonomy database.
keywords:
synthetic biology; NERC data; species mentions; not found entities
published: 2022-07-25
Jett, Jacob (2022): SBKS - Chemical Ambiguous Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2910468_V1
Related to the raw entity mentions (https://doi.org/10.13012/B2IDB-4163883_V1), this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the ChEBI ontology.
keywords:
synthetic biology; NERC data; chemical mentions; ambiguous entities
published: 2022-07-25
Jett, Jacob (2022): SBKS - Chemical Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4163883_V1
A set of chemical entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; chemical mentions
published: 2022-07-25
Jett, Jacob (2022): SBKS - Celllines Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8851803_V1
A set of cell-line entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; cell-line mentions
published: 2022-07-25
Jett, Jacob (2022): SBKS - Chemical - Cleaned & Grounded Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3396059_V1
This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the ChEBI ontology.
keywords:
synthetic biology; NERC data; chemical mentions; cleaned data; ChEBI ontology
published: 2022-07-25
Jett, Jacob (2022): SBKS - Chemical Noisy Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7228767_V1
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1) and collects entity mentions that were manually determined to be noisy, non-chemical entities.
keywords:
synthetic biology; NERC data; chemical mentions; noisy entities
published: 2022-07-25
Jett, Jacob (2022): SBKS - Chemical Not Found Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4570128_V1
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4163883_V1) for chemical entities and represents those that were determined to be chemicals (i.e., were not noisy entities) but for which no corresponding concept could be found in the ChEBI ontology.
keywords:
synthetic biology; NERC data; chemical mentions; not found entities
published: 2022-07-25
Jett, Jacob (2022): SBKS - Genes Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3887275_V1
A set of gene and gene-related entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; gene mentions
published: 2021-05-10
Fallaw, Colleen (2021): Data for Institutional Data Repository Development, a Moving Target. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7291801_V1
This dataset contains data used in the publication "Institutional Data Repository Development, a Moving Target", submitted to Code4Lib Journal. It is a tabular data file describing attributes of data files in datasets published in the Illinois Data Bank from 2016-04-01 to 2021-04-01.
keywords:
institutional repository
published: 2022-07-11
Jeng, Amos; Bosch, Nigel; Perry, Michelle (2022): Data for: Sense of Belonging Predicts Perceived Helpfulness in Online Peer Help-Giving Interactions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2872989_V1
This dataset was developed as part of an online survey study that explores student characteristics that may predict what one finds helpful in replies to requests for help posted to an online college course discussion forum. 223 college students enrolled in an introductory statistics course were surveyed on their sense of belonging to their course community, as well as how helpful they found 20 examples of replies to requests for help posted to a statistics course discussion forum.
keywords:
help-giving; discussion forums; sense of belonging; college student
published: 2022-07-19
Parmar, Dharmeshkumar; Jia, Jin; Shrout, Joshua; Sweedler, Jonathan; Bohn, Paul (2022): Effect of Micro-patterned Mucin on Quinolone and Rhamnolipid Profiles of Mucoid Pseudomonas aeruginosa under Antibiotic Stress. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0382919_V1
#### Details of Pseudomonas aeruginosa biofilm dataset ####
----------------*Folder Structure*-------------------------------------
This dataset contains peak intensity tables extracted from mass spectrometry imaging (MSI) data using the tools SCiLS and MSI reader. There are 2 folders in "MSI-Data-Paeruginosa-biofilms-UIUC-DP-JVS-July2022.zip"; each folder contains 3 sub-folders, as listed below.
1. PellicleBiofilms-and-Supernatant [Pellicle biofilms collected from the air-liquid interface and spent supernatant medium after a 96 h incubation period]: (1) Full-Scan-Data-96h; (2) MSMS-data-from-C7-Quinolones-96h; and (3) MSMS-data-from-C9-Quinolones-96h
2. StaticBiofilms [Static biofilms grown on a mucin surface]: (1) Full-Scan-Data; (2) MSMS-data-from-C7-Quinolones; and (3) MSMS-data-from-C9-Quinolones
----------------*File name*----------------------------------------------
Sample information is included in the file names for easy identification and processing. Attributes covered in file names are explained in the example below.
*Example file name "Rep1-Stat-FRD1-mPat-48-FS"*
~ Each unit of information is separated by "-"
~ Unit 1 - "Rep1" - Biological replicate (Rep1, Rep2, and Rep3)
~ Unit 2 - "Stat" - Sample type (Stat = Static biofilm, Pel = Pellicle biofilm, Sup = Supernatant)
~ Unit 3 - "FRD1" - Strain (FRD1 = Mucoid strain, PAO1C = Non-mucoid strain)
~ Unit 4 - "mPat" - Type of mucin surface used (mPat = patterned mucin surface, mUni = uniform mucin surface)
~ Unit 5 - "48" - Sample time point (hours = 48, 72, 96)
~ Unit 6 - "FS" - Scan type used in MSI (FS = high resolution full-scan, 260 = targeted MS/MS of C7 quinolones (m/z 260), 288 = targeted MS/MS of C9 quinolones (m/z 288))
----------------*File structure*------------------------------------------
All MSI data has been exported to CSV format. Each CSV file contains information about scan number, coordinates (x,y,z), m/z values, extraction window (absolute), and corresponding intensities in the form of a matrix.
----------------*End of Information*--------------------------------------
keywords:
mass spectrometry imaging (MSI); biofilm; antibiotic resistance; Pseudomonas aeruginosa; quorum sensing; rhamnolipids
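The file-name convention described in the record above lends itself to a small parser. The sketch below is an illustration only: it assumes every file name carries exactly the six "-"-separated units shown in the example, optionally followed by a ".csv" extension. The names `SampleInfo` and `parse_sample_name` are hypothetical, and files in the archive that deviate from the example pattern would be rejected.

```python
from typing import NamedTuple

# Code tables copied from the file-name description in the record above.
SAMPLE_TYPES = {"Stat": "Static biofilm", "Pel": "Pellicle biofilm", "Sup": "Supernatant"}
STRAINS = {"FRD1": "Mucoid strain", "PAO1C": "Non-mucoid strain"}
SURFACES = {"mPat": "Patterned mucin surface", "mUni": "Uniform mucin surface"}
SCAN_TYPES = {
    "FS": "High-resolution full scan",
    "260": "Targeted MS/MS of C7 quinolones (m/z 260)",
    "288": "Targeted MS/MS of C9 quinolones (m/z 288)",
}


class SampleInfo(NamedTuple):
    replicate: int
    sample_type: str
    strain: str
    surface: str
    hours: int
    scan_type: str


def parse_sample_name(name: str) -> SampleInfo:
    """Split a name such as 'Rep1-Stat-FRD1-mPat-48-FS' into its six units."""
    # Assumption: an optional '.csv' extension may follow the six units.
    stem = name[:-4] if name.lower().endswith(".csv") else name
    units = stem.split("-")
    if len(units) != 6:
        raise ValueError(f"expected 6 units separated by '-', got {len(units)}: {name!r}")
    rep, sample, strain, surface, hours, scan = units
    if not (rep.startswith("Rep") and rep[3:].isdigit()):
        raise ValueError(f"unrecognized replicate unit {rep!r} in {name!r}")
    try:
        return SampleInfo(
            replicate=int(rep[3:]),
            sample_type=SAMPLE_TYPES[sample],
            strain=STRAINS[strain],
            surface=SURFACES[surface],
            hours=int(hours),
            scan_type=SCAN_TYPES[scan],
        )
    except KeyError as exc:
        raise ValueError(f"unknown code {exc} in {name!r}") from None


if __name__ == "__main__":
    print(parse_sample_name("Rep1-Stat-FRD1-mPat-48-FS"))
```

Returning a named tuple keeps the decoded attributes easy to filter on, for example when grouping the CSV tables by strain or time point.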
published: 2022-06-20
Jiang, Ming; Dubnicek, Ryan; Worthey, Glen; Underwood, Ted; Downie, J. Stephen (2022): A Prototype Gutenberg-HathiTrust Sentence-level Parallel Corpus. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1685085_V1
This is a sentence-level parallel corpus in support of research on OCR quality. The source data come from: (1) Project Gutenberg for human-proofread "clean" sentences; and (2) HathiTrust Digital Library for the paired sentences with OCR errors. In total, this corpus contains 167,079 sentence pairs from 189 sampled books in four domains (i.e., agriculture, fiction, social science, world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to sentence texts, this corpus also provides the location (i.e., sentence and chapter index) of each sentence in the Gutenberg volume it belongs to.
keywords:
sentence-level parallel corpus; optical character recognition; OCR errors; Project Gutenberg; HathiTrust Digital Library; digital libraries; digital humanities
published: 2022-06-22
Kang, Jeon-Young; Farkhad, Bita Fayaz; Chan, Man-pui Sally; Michels, Alexander; Albarracin, Dolores; Wang, Shaowen (2022): Data for Spatial Accessibility to HIV (Human Immunodeficiency Virus) Testing, Treatment, and Prevention Services in Illinois and Chicago, USA. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9096476_V1
This dataset supports the investigation of spatial accessibility to HIV testing, treatment, and prevention services in Illinois and Chicago, USA. The main components are population data, healthcare data, GTFS feeds, and road network data:
1) `GTFS` contains GTFS (General Transit Feed Specification, https://gtfs.org/) data provided by the Chicago Transit Authority (CTA) from Google's GTFS feeds (https://developers.google.com/transit/gtfs). Documentation defining the format and structure of the files that comprise a GTFS dataset is available at https://developers.google.com/transit/gtfs/reference?csw=1.
2) `HealthCare` contains shapefiles describing HIV healthcare providers in Chicago and Illinois respectively. The services come from Locator.HIV.gov (https://locator.hiv.gov/).
3) `PopData` contains population data for Chicago and Illinois respectively. Data come from the American Community Survey (ACS) and AIDSVu (https://map.aidsvu.org/map). AIDSVu provides data on PLWH in Chicago at the census tract level for the year 2017 and in the State of Illinois at the county level for the year 2016. The ACS provided the number of people aged 15 to 64 at the census tract level for the year 2017 and at the county level for the year 2016. The ACS provides annually updated information on demographic and socioeconomic characteristics of people and housing in the U.S.
4) `RoadNetwork` contains the road networks for Chicago and Illinois respectively, obtained from OpenStreetMap (https://www.openstreetmap.org/copyright) using the Python osmnx package (https://osmnx.readthedocs.io/en/stable/).
The abstract for our paper is: Accomplishing the goals outlined in “Ending the HIV (Human Immunodeficiency Virus) Epidemic: A Plan for America Initiative” will require properly estimating and increasing access to HIV testing, treatment, and prevention services. In this research, a computational spatial method for estimating access was applied to measure distance to services from all points of a city or state while considering the size of the population in need of services as well as both driving and public transportation. Specifically, this study employed the enhanced two-step floating catchment area (E2SFCA) method to measure spatial accessibility to HIV testing, treatment (i.e., Ryan White HIV/AIDS program), and prevention (i.e., Pre-Exposure Prophylaxis [PrEP]) services. The method considered the spatial location of MSM (Men Who have Sex with Men), PLWH (People Living with HIV), and the general adult population 15-64 depending on what HIV services the U.S. Centers for Disease Control (CDC) recommends for each group. The study delineated service- and population-specific accessibility maps, demonstrating the method’s utility by analyzing data corresponding to the city of Chicago and the state of Illinois. Findings indicated health disparities in the south and the northwest of Chicago and particular areas in Illinois, as well as unique health disparities for public transportation compared to driving. The methodology details and computer code are shared for use in research and public policy.
keywords:
HIV;spatial accessibility;spatial analysis;public transportation;GIS
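The enhanced two-step floating catchment area (E2SFCA) method named in the abstract above can be sketched in a few lines. This is a generic illustration of the technique, not the authors' shared code: the travel-time zone limits and weights in `zone_weight`, the function names, and the list-based input layout are all hypothetical. Step 1 computes a supply-to-demand ratio for each provider over the distance-weighted population it can reach; step 2 sums those ratios, again distance-weighted, at each population location.

```python
def zone_weight(travel_min, zones=((10, 1.0), (20, 0.68), (30, 0.22))):
    """Step-wise distance decay: weight by travel-time zone, 0 beyond the last zone.

    The zone limits (minutes) and weights are illustrative assumptions.
    """
    for upper_limit, weight in zones:
        if travel_min <= upper_limit:
            return weight
    return 0.0


def e2sfca(population, supply, travel_time, weight=zone_weight):
    """Return one accessibility score per population location.

    population[i]     : people in need at location i
    supply[j]         : capacity of provider j
    travel_time[i][j] : travel time in minutes from location i to provider j
    """
    # Step 1: supply-to-demand ratio of each provider over its weighted catchment.
    ratios = []
    for j, capacity in enumerate(supply):
        demand = sum(p * weight(travel_time[i][j]) for i, p in enumerate(population))
        ratios.append(capacity / demand if demand > 0 else 0.0)
    # Step 2: sum the weighted ratios of all providers reachable from each location.
    return [
        sum(r * weight(travel_time[i][j]) for j, r in enumerate(ratios))
        for i in range(len(population))
    ]


if __name__ == "__main__":
    scores = e2sfca(
        population=[100, 200],
        supply=[3],
        travel_time=[[5], [25]],
    )
    print(scores)
```

Separate travel-time matrices for driving and public transportation could be passed to the same function to compare the two modes, as the study does; how those matrices are built from the GTFS and road network data is outside this sketch.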
published: 2022-07-10
Winogradoff, David; Chou, Han-Yi; Maffeo, Christopher; Aksimentiev, Aleksei (2022): Trajectory files for "Percolation transition prescribes protein size-specific barrier to passive transport through the nuclear pore complex.". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5581194_V1
keywords:
Nuclear pore complex; system files; trajectory files
published: 2022-07-08
Rahlin, Anastasia; Saunders, Sarah; Beilke, Stephanie (2022): Spatial drivers of wetland bird occupancy within an urbanized matrix in the Upper Midwestern United States. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1575830_V1
The dataset for the manuscript "Spatial drivers of wetland bird occupancy within an urbanized matrix in the Upper Midwestern United States" contains occupancy data for ten wetland bird species used in single-species occupancy models at four spatial scales and four wetland habitat types. Data were collected from 2017 to 2019 in NE Illinois and NW Indiana. The dataset includes wetland bird occupancy data, habitat parameter values for each survey location, and R code used to run analyses.
keywords:
wetland birds; occupancy; emergent wetland; urbanization; Great Lakes region
published: 2022-05-20
Haselhorst, Derek; Moreno, J. Enrique; Tcheng, David K.; Punyasena, Surangi W. (2022): Images and annotated counts for aerial pollen samples from the Barro Colorado Island megaplot, Panama (1994 – 2010). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2176715_V1
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords:
aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology