Illinois Data Bank
Displaying 376 - 400 of 812 in total
Subject Area
Funder
Publication Year
License
Illinois Data Bank Dataset Search Results

Dataset Search Results

published: 2020-02-12
 
This dataset contains the results of a three month audit of housing advertisements. It accompanies the 2020 ICWSM paper "Auditing Race and Gender Discrimination in Online Housing Markets". It covers data collected between Dec 7, 2018 and March 19, 2019. There are two json files in the dataset: The first contains a list of json objects representing advertisements separated by newlines. Each object includes the date and time it was collected, the image and title (if collected) of the ad, the page on which it was displayed, and the training treatment it received. The second file is a list of json objects representing a visit to a housing lister separated by newlines. Each object contains the url, training treatment applied, the location searched, and the metadata of the top sites scraped. This metadata includes location, price, and number of rooms. The dataset also includes the raw images of ads collected in order to code them by interest and targeting. These were captured by selenium and named using a perceptive hash to de-duplicate images.
keywords: algorithmic audit; advertisement audit;
published: 2021-10-27
 
Shared dataset consists of 16S sequencing data of microbial communities. Each community is composed of heterotrophic bacteria derived from one of two soil samples and the model algae Chlamydomonas reinhardtii. Each comunity was placed in a materially closed environment with an initial supply of carbon in the media and subjected to light-dark cycles. The closed microbial ecosystems (CES) survived via carbon cycling. Each CES was subjected to rounds of dilution, after which the community was sequenced (data provided here). The shared dataset allowed us to conclude that CES consistently self-assembled to cycle carbon (data not provided) via conserved metabolic capabilites (data not provided) dispite differences in taxonomic composition (data provided). --------------------------- Naming convention: [soil sample = A or B][CES replicate = 1,2,3, or 4]_[round number = 1,2,3,or 4]_[reverse read = R or forward read = F]_filt.fastq Example -- A1_r1_F_filt.fastq means soil sample A, CES replicate 1, end of round1, forward read
keywords: 16S seq; .fastq; closed microbial ecosystems; carbon cycling
published: 2018-07-29
 
This repository includes scripts, datasets, and supplementary materials for the study, "NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge. ***When downloading datasets, please note that the following errors.*** In README.txt, lines 37 and 38 should read: + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre Note that the file names (fasttree-exon.tre and fasttree-intron.tre) are swapped. In tools.zip, the compare_trees.py and the compare_tree_lists.py scripts incorrectly refer to the "symmetric difference error rate" as the "Robinson-Foulds error rate". Because the normalized symmetric difference and the normalized Robinson-Foulds distance are equal for binary trees, this does not impact the species tree error rates reported in the study. This could impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the normalized symmetric difference is always greater than or equal to the normalized Robinson-Foulds distance, so the gene tree error rates reported in the study are more conservative. In njmerge-supplementary-materials.pdf, the alpha parameter shown in Supplementary Table S2 is actually the divisor D, which is used to compute alpha for each gene as follows. 1. For each gene, a random value X between 0 and 1 is drawn from a uniform distribution. 2. Alpha is computed as -log(X) / D, where D is 4.2 for exons, 1.0 for UCEs, and 0.4 for introns (as stated in Table S2). Note that because the mean of the uniform distribution (between 0 and 1) is 0.5, the mean alpha value is -log(0.5) / 4.2 = 0.16 for exons, -log(0.5) / 1.0 = 0.69 for UCEs, and -log(0.5) / 0.4 = 1.73 for introns.
keywords: phylogenomics; species trees; incomplete lineage sorting; divide-and-conquer
published: 2024-01-04
 
This data set includes all of data related to stretchable TFTs based on 2D heterostructures including optical images of TFTs, Raman and Photoluminescence characteristics data, Transport measurement data, and AFM topography data. Abstract Two-dimensional (2D) materials are outstanding candidates for stretchable electronics, but a significant challenge is their heterogeneous integration into stretchable geometries on soft substrates. Here, we demonstrate a strategy for stretchable thin film transistors (2D S-TFT) based on wrinkled heterostructures on elastomer substrates where 2D materials formed the gate, source, drain, and channel, and characterized them with Raman spectroscopy and transport measurements.
keywords: 2D materials; 2D heterstructures; Stretchable electronics; transistors; buckling engineering
published: 2014-10-29
 
This dataset provides the data for Nguyen, Nam-phuong, et al. "TIPP: taxonomic identification and phylogenetic profiling." Bioinformatics 30.24 (2014): 3548-3555.
published: 2023-03-08
 
A stochastic domination analysis model was developed to examine the effect that emerging carbon markets can have on the spatially varying returns and risk profiles of bioenergy crops relative to conventional crops. The code is written in MATLAB, and includes the calculated output. See the README file for instructions to run the code.
keywords: bioenergy crops; economic modeling; stochastic domination analysis model;
published: 2018-12-20
 
File Name: Inclusion_Criteria_Annotation.csv Data Preparation: Xiaoru Dong Date of Preparation: 2018-12-14 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: The file contains lists of inclusion criteria of Cochrane Systematic Reviews and the manual annotation results. 5420 inclusion criteria were annotated, out of 7158 inclusion criteria available. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file: - "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews. - "Only RCTs": Manual Annotation results. In which, "x" means the inclusion criteria is classified as "Only RCTs". Blank means that the inclusion criteria is classified as "Others". Notes: 1. "RCT" stands for Randomized Controlled Trial, which, in definition, is "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html]. 2. In order to reproduce the relevant data to this, please get the code of the project published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instruction provided.
keywords: Inclusion criteria, Randomized controlled trials, Machine learning, Systematic reviews
published: 2019-03-25
 
This dataset contains genotypic and phenotypic data, R scripts, and the results of analysis pertaining to a multi-location field trial of Miscanthus sinensis. Genome-wide association and genomic prediction were performed for biomass yield and 14 yield-component traits across six field trial locations in Asia and North America, using 46,177 single-nucleotide polymorphism (SNP) markers mined from restriction site-associated DNA sequencing (RAD-seq) and 568 M. sinensis accessions. Genomic regions and candidate genes were identified that can be used for breeding improved varieties of M. sinensis, which in turn will be used to generate new M. xgiganteus clones for biomass.
keywords: miscanthus; genotyping-by-sequencing (GBS); genome-wide association studies (GWAS); genomic selection
published: 2024-02-15
 
Dataset includes the dataset for estimating bat density from acoustic data and the R code. The data support a publication by Meredith L. Hoggatt, Clarissa A. Starbuck, and Joy M. O'Keefe entitled Acoustic monitoring yields informative bat population density estimates.
keywords: acoustics; bats; monitoring; population density; random encounter model
published: 2019-02-22
 
This dataset includes measurements taken during the experiments on patterns of alluvial cover over bedrock. The dataset includes an hour worth of timelapse images taken every 10s for eight different experimental conditions. It also includes the instantaneous water surface elevations measured with eTapes at a frequency of 10Hz for each experiment. The 'Read me Data.txt' file explains in more detail the contents of the dataset.
keywords: bedrock; erosion; alluvial; meandering; alluvial cover; sinuosity; flume; experiments; abrasion;
published: 2024-02-16
 
Sample data from one typical phantom test and one deidentified shunt patient test (shown in Fig. 8 of the MRM paper), with the corresponding analysis code for the Shunt-FENSI technique. For the MRM paper “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”
keywords: Shunt-FENSI; MRM; Hydrocephalus; VP Shunt; Flow Quantification; Pediatric Neurosurgery; Pulse Sequence; Signal Simulation
published: 2016-12-20
 
Scripts and example data for AIDData (aiddata.org) processing in support of forthcoming Nakamura dissertation. This dataset includes two sets of scripts and example data files from an aiddata.org data dump. Fuller documentation about the functionality for these scripts is within the readme file. Additional background information and description of usage will be in the forthcoming Nakamura dissertation (link will be added when available). Data originally supplied by Nakamura. Python code and this readme file created by Wickes. Data included within this deposit are examples to demonstrate execution. Roughly, there are two python scripts in here: keyword_search.py, designed to assist in finding records matching specific keywords, and matching_tool.ipynb, designed to assist in detection of which records are and are not contained within a keyword results file and an aiddata project data file.
keywords: aiddata; natural resources
published: 2021-11-05
 
This data set contains survey results from a 2021 survey of University of Illinois University Library employees conducted as part of the Becoming A Trans Inclusive Library Project to evaluate the awareness of University of Illinois faculty, staff, and student employees regarding transgender identities, and to assess the professional development needs of library employees to better serve trans and gender non-conforming patrons. The survey instrument is available in the IDEALS repository: http://hdl.handle.net/2142/110080.
keywords: transgender awareness, academic library, gender identity awareness, professional development opportunities
published: 2021-09-17
 
We studied vegetation metric robustness to environmental (season, interannual, and regional) and methodological (observer) variables, as well as adequate sample size for vegetation metrics across four regions of the United States.
keywords: coefficients of conservatism; floristic quality assessment; restoration; vegetation metric;
published: 2022-11-11
 
This dataset is for characterizing chemical short-range-ordering in CrCoNi medium entropy alloys. It has three sub-folders: 1. code, 2. sample WQ, 3. sample HT. The software needed to run the files is Gatan Microscopy Suite® (GMS). Please follow the instruction on this page to install the DM3 GMS: <a href="https://www.gatan.com/installation-instructions#Step1">https://www.gatan.com/installation-instructions#Step1</a> 1. Code folder contains three DM scripts to be installed in Gatan DigitalMicrograph software to analyze scanning electron nanobeam diffraction (SEND) dataset: Cepstrum.s: need [EF-SEND_sampleWQ_cropped_aligned.dm3] in Sample WQ and the average image from [EF-SEND_sampleWQ_cropped_aligned.dm3]. Same for Sample HT folder. log_BraggRemoval.s: same as above. Patterson.s: Need refined diffuse patterns in Sample HT folder. 2. Sample WQ and 3. Sample HT folders both contain the SEND data (.ser) and the binned SEND data (.dm3) as well as our calculated strain maps as the strain measurement reference. The Sample WQ folder additionally has atomic resolution STEM images; the Sample HT folder additionally has three refined diffuse patterns as references for diffraction data processing. * Only .ser file is needed to perform the strain measurement using imToolBox as listed in the manuscript. .emi file contains the meta data of the microscope, which can be opened together with .ser file using FEI TIA software.
keywords: Medium entropy alloy; CrCoNi; chemical short-range-ordering; CSRO; TEM
published: 2022-11-09
 
This dataset includes the blue water intensity by sector (41 industries and service sectors) for provinces in China, economic and virtual water network flow for China in 2017, and the corresponding network properties for these two networks.
keywords: Economic network; Virtual water; Supply chains; Network analysis; Multilayer; MRIO
published: 2023-04-02
 
Use of cellulosic biofuels from non-feedstocks are modeled using the BEPAM (Biofuel and Environmental Policy Analysis Model) model to quantifying the uncertainties about induced land use change effects, net greenhouse gas saving potential, and economic costs. The code is in GAMS, general algebraic modeling language. NOTE: Column 3 is titled "BAU" in "merged_BAU.gdx", "merged_RFS.gdx", and "merged_CEM.gdx", but contains "RFS" data in "merged_RFS.gdx" and "CEM" data in "merged_CEM.gdx".
keywords: cellulosic biomass; BEPAM; economic modeling
published: 2016-12-19
 
Files in this dataset represent an investigation into use of the Library mobile app Minrva during the months of May 2015 through December 2015. During this time interval 45,975 API hits were recorded by the Minrva web server. The dataset included herein is an analysis of the following: 1) a delineation of API hits to mobile app modules use in the Minrva app by month, 2) a general analysis of Minrva app downloads to module use, and 3) the annotated data file providing associations from API hits to specific modules used, organized by month (May 2015 – December 2015).
keywords: API analysis; log analysis; Minrva Mobile App
published: 2021-07-15
 
The dataset contains the high-throughput matrix-assisted laser desorption/ionization mass spectrometry XmL files for the atrial gland and red hemiduct of Aplysia californica.
keywords: Dense-core vesicle; High-throughput; Mass Spectrometry; MALDI; Organelle; Image-Guided; Atrial gland; red hemiduct; Lucent Vesicle
published: 2023-03-28
 
Sentences and citation contexts identified from the PubMed Central open access articles ---------------------------------------------------------------------- The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles. <b>Files</b>: • A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A. • B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B. • C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C. • D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D. • E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E. • F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F. • G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G. • H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H. • I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I. • J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J. • K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K. • L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L. • M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M. • N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N. • O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O. • P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1). • P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2). • Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q. • R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R. • S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S. • T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T. • UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V. • W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W. • XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z. Each row in the file is a sentence/citation context and contains the following columns: • pmcid: PMCID of the article • pmid: PMID of the article. If an article does not have a PMID, the value is NONE.  • location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.  • IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.  • sentence_id: The ID of the citation context/sentence in the article component • total_sentences: The number of sentences in the article component.  • intxt_id: The ID of the citation. • intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-". • intxt_pmid_source: The sources where the intxt_pmid can be identified. Xml represents that the PMID is only identified from the XML file; xml,pmc represents that the PMID is not only from the XML file, but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".  • intxt_mark: The citation marker associated with the inline citation. • best_id: The best source link ID (e.g., PMID) of the citation. • best_source: The sources that confirm the best ID. • best_id_diff: The comparison result between the best_id column and the intxt_pmid column. • citation: A citation context. If no citation is found in a sentence, the value is the sentence.  • progression: Text progression of the citation context/sentence.  <b>Supplementary Files</b> • PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal IntxtCit.tsv files as the best_id column. Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns: • pmcid: PMCID of the citing article. • pos: The citation's position in the reference list. • fromPMID: PMID of the citing article. • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci. • SRC: The sources that confirm the toPMID. • MatchDB: The origin bibliographic database of the toPMID. • Probability: The match probability of the toPMID. • toPMID2: PMID of the citation (as tagged in the XML file). • SRC2: The sources that confirm the toPMID2. • intxt_id: The ID of the citation. • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files. • same_ref_string: Whether the citation string appears in the reference list more than once. • DIFF: The comparison result between the toPMID column and the toPMID2 column. • bestID: The best source link ID (e.g., PMID) of the citation. • bestSRC: The sources that confirm the best ID. • Match: Matching result produced by Patci. [1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885 • intxt_cit_license_fromPMC.tsv – This file contains the CC licensing information for each article. The licensing information is from PMC's file lists [2], retrieved on June 19, 2020, and March 9, 2023. It should be noted that the license information for 189,855 PMCIDs is <b>NO-CC CODE</b> in the file lists, and 521 PMCIDs are absent in the file lists. The absence of CC licensing information does not indicate that the article lacks a CC license. For example, PMCID: 6156294 (<b>NO-CC CODE</b>) and PMCID: 6118074 (absent in the PMC's file lists) are under CC-BY licenses according to their PDF versions of articles. The intxt_cit_license_fromPMC.tsv file has two columns: • pmcid: PMCID of the article. • license: The article’s CC license information provided in PMC’s file lists. The value is nan when an article is not present in the PMC’s file lists. [2] https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/ • Supplementary_File_1.zip – This file contains the code for generating the dataset.
keywords: citation context; in-text citation; inline citation; bibliometrics; science of science
published: 2023-08-11
 
This dataset contains leaf photosynthetic and biochemical traits, plant biomass, and yield in five C3 crops (chickpea, rice, snap bean, soybean, wheat) and four C4 crops (sorghum, maize, Miscanthus × giganteus, switchgrass) grown under ambient and elevated O3 concentration ([O3]) in the field at free-air O3 concentration enrichment (O3-FACE) facilities over the past 20 years.
keywords: C3 and C4 crops; elevated O3; FACE; photosynthesis; yield
published: 2024-01-30
 
The data files are for the paper entitled: Melting of the charge density wave by generation of pairs of topological defects in UTe2 to be published in Nature Physics. The data was obtained on a 300 mK custom designed Unisoku scanning tunneling microscope using the Nanonis module. All the data files have been named based on the Figure numbers that they represent.
keywords: superconductivity; triplet; topology; heavy fermion; Kondo; magnetic field; charge density wave
published: 2023-08-02
 
This dataset was developed as part of an online survey study that investigates how phatic expressions—comments that are social rather than informative in nature—influence the perceived helpfulness of online peer help-giving replies in an asynchronous college course discussion forum. During the study, undergraduate students (N = 320) rated and described the helpfulness of examples of replies to online requests for help, both with and without four types of phatic expressions: greeting/parting tokens, other-oriented comments, self-oriented comments, and neutral comments.
keywords: help-giving; phatic expression; discussion forum; online learning; engagement
Research Data Service Illinois Data Bank
Access and Use Policies Web Privacy Notice Contact Us