Illinois Data Bank Dataset Search Results
published:
2021-04-15
To generate the bibliographic and survey data supporting a data reuse study conducted by several Library faculty and accepted for publication in the Journal of Academic Librarianship, the project team used a series of web-based scripts that called several different endpoints of the Scopus API. The related dataset, "Data for: An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University", contains the survey design and results. <br />
1) <b>getScopus_API_process_dmp_IDB.asp</b>: used the Scopus Search API to query the Scopus database for papers by UIUC authors published in 2015, limited to one of 9 predefined Scopus subject areas, and retrieved metadata sorted from highest to lowest by the number of times each article was cited. The URL for the basic searches took the following form: https://api.elsevier.com/content/search/scopus?query=(AFFIL%28(urbana%20OR%20champaign) AND univ*%29) OR (AF-ID(60000745) OR AF-ID(60005290))&apikey=xxxxxx&start=" & nstart & "&count=25&date=2015&view=COMPLETE&sort=citedby-count&subj=PHYS<br />
Here, the variable nstart was incremented by 25 on each iteration, and 25 records were retrieved in each pass. The subject-area code was changed (e.g., from PHYS to COMP for computer science) for each of the 9 runs. This script does not use the Scopus API cursor; instead it downloads 25 records at a time for up to 28 passes, or a maximum of 675 bibliographic records. The project team judged that the 675 most-cited articles by UIUC faculty in each of the 9 subject areas would be sufficient to gather a robust, representative sample of articles from 2015. The downloaded records were stored in a temporary table that was renamed for each of the 9 subject areas. <br />
2) <b>get_citing_from_surveys_IDB.asp</b>: takes the Scopus article ID (eid) of each article from the 49 surveys returned by UIUC authors and retrieves short citing-article references, 200 at a time, into a temporary composite table. These citing records contain only one author, no author affiliations, and no author email addresses. This script uses the Scopus API cursor=* feature and can therefore download all of an article's citing references, 200 records at a time. <br />
3) <b>put_in_all_authors_affil_IDB.asp</b>: enriches the short citing records by adding all co-authors and their affiliations, the corresponding author, and author email addresses. <br />
4) <b>process_for_final_IDB.asp</b>: creates a relational database table with author, title, and source journal information for each citing article; the table can be exported to Excel for processing by the Qualtrics survey software. This table initially held 4,626 citing articles across the 49 UIUC-authored articles, but was reduced to 2,041 entries after checking for available email addresses and eliminating duplicates.
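The two retrieval patterns described in steps 1 and 2 (offset paging with `start`/`count`, and cursor paging that begins with `cursor=*`) can be sketched in Python as below. This is not the project's ASP code: the function names are hypothetical, the API key is a placeholder, the 675-record cap is taken from the text, and the response layout (`search-results`, `entry`, `cursor`/`@next`, and an `error` entry marking an empty result set) is an assumption about the Scopus JSON format. The cursor loop takes a caller-supplied `fetch` function, so no network request is made here.

```python
"""Sketch of the two Scopus paging patterns described above (assumptions noted inline)."""
from typing import Callable, Iterator
from urllib.parse import urlencode

SEARCH_URL = "https://api.elsevier.com/content/search/scopus"
# Decoded form of the affiliation query shown in the URL above.
UIUC_QUERY = ("(AFFIL((urbana OR champaign) AND univ*)) "
              "OR (AF-ID(60000745) OR AF-ID(60005290))")


def offset_page_urls(subject: str, api_key: str,
                     max_records: int = 675, count: int = 25) -> list:
    """Script 1 pattern: advance `start` by `count` until max_records is covered."""
    urls = []
    for start in range(0, max_records, count):
        params = {
            "query": UIUC_QUERY,
            "apikey": api_key,
            "start": start,
            "count": count,
            "date": "2015",
            "view": "COMPLETE",
            "sort": "citedby-count",
            "subj": subject,
        }
        urls.append(f"{SEARCH_URL}?{urlencode(params)}")
    return urls


def iter_cursor_pages(fetch: Callable[[str], dict]) -> Iterator[list]:
    """Script 2 pattern: start with cursor "*" and follow the next-cursor token.

    `fetch` is a hypothetical callable that requests one page (e.g. count=200)
    for a given cursor value and returns the decoded JSON response.
    """
    cursor = "*"
    while True:
        results = fetch(cursor).get("search-results", {})
        entries = results.get("entry", [])
        # Assumed end-of-results signal: no entries, or a single error entry.
        if not entries or "error" in entries[0]:
            return
        yield entries
        next_cursor = results.get("cursor", {}).get("@next")
        if not next_cursor or next_cursor == cursor:
            return
        cursor = next_cursor
```

Injecting `fetch` keeps the paging logic separate from HTTP details (headers, rate limiting, retries), which would need to follow Elsevier's own API documentation.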
keywords:
Scopus API; Citing Records; Most Cited Articles
published:
2022-06-22
Kang, Jeon-Young; Farkhad, Bita Fayaz; Chan, Man-pui Sally; Michels, Alexander; Albarracin, Dolores; Wang, Shaowen
(2022)
This dataset supports investigation of spatial accessibility to HIV testing, treatment, and prevention services in Illinois and Chicago, USA.
The dataset has four main components: population data, healthcare data, GTFS feeds, and road network data. They are organized as follows:
1) `GTFS` contains GTFS (<a href="https://gtfs.org/">General Transit Feed Specification</a>) data provided by the Chicago Transit Authority (CTA) through <a href="https://developers.google.com/transit/gtfs">Google's GTFS feeds</a>. The format and structure of the files that make up a GTFS dataset are documented at <a href="https://developers.google.com/transit/gtfs/reference?csw=1">https://developers.google.com/transit/gtfs/reference?csw=1</a>.
2) `HealthCare` contains shapefiles describing HIV healthcare providers in Chicago and in Illinois. The service locations come from <a href="https://locator.hiv.gov/">Locator.HIV.gov</a>.
3) `PopData` contains population data for Chicago and for Illinois. Data come from the American Community Survey and <a href="https://map.aidsvu.org/map">AIDSVu</a>. AIDSVu provides data on people living with HIV (PLWH) in Chicago at the census tract level for 2017 and in the State of Illinois at the county level for 2016. The American Community Survey (ACS) provided the number of people aged 15 to 64 at the census tract level for 2017 and at the county level for 2016. The ACS provides annually updated information on the demographic and socioeconomic characteristics of people and housing in the U.S.
4) `RoadNetwork` contains the road networks for Chicago and Illinois respectively from <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> using the Python <a href="https://osmnx.readthedocs.io/en/stable/">osmnx</a> package.
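As an illustration of working with the `GTFS` folder, the following Python sketch reads stop coordinates from a GTFS `stops.txt` file using only the standard library. The field names `stop_id`, `stop_lat`, and `stop_lon` come from the GTFS reference; the function name is hypothetical, and whether `stops.txt` sits directly inside the `GTFS` folder is an assumption.

```python
import csv
import io


def load_stop_coordinates(stops_txt: str) -> dict:
    """Map stop_id -> (stop_lat, stop_lon) from the text of a GTFS stops.txt file."""
    coords = {}
    for row in csv.DictReader(io.StringIO(stops_txt)):
        lat, lon = row.get("stop_lat"), row.get("stop_lon")
        # Coordinates may be blank for some location types (e.g. generic nodes).
        if lat and lon:
            coords[row["stop_id"]] = (float(lat), float(lon))
    return coords
```

In use, the file would be read with `open("GTFS/stops.txt", encoding="utf-8-sig")` inside a `with` block and its contents passed to the function; the `utf-8-sig` encoding tolerates a byte-order mark if one is present.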
<b>The abstract for our paper is:</b>
Accomplishing the goals outlined in “Ending the HIV (Human Immunodeficiency Virus) Epidemic: A Plan for America Initiative” will require properly estimating and increasing access to HIV testing, treatment, and prevention services. In this research, a computational spatial method for estimating access was applied to measure distance to services from all points of a city or state while considering the size of the population in need of services as well as both driving and public transportation. Specifically, this study employed the enhanced two-step floating catchment area (E2SFCA) method to measure spatial accessibility to HIV testing, treatment (i.e., Ryan White HIV/AIDS program), and prevention (i.e., Pre-Exposure Prophylaxis [PrEP]) services. The method considered the spatial location of MSM (Men Who have Sex with Men), PLWH (People Living with HIV), and the general adult population aged 15-64, depending on which HIV services the U.S. Centers for Disease Control and Prevention (CDC) recommends for each group. The study delineated service- and population-specific accessibility maps, demonstrating the method’s utility by analyzing data corresponding to the city of Chicago and the state of Illinois. Findings indicated health disparities in the south and the northwest of Chicago and particular areas in Illinois, as well as unique health disparities for public transportation compared to driving. The methodology details and computer code are shared for use in research and public policy.
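The two steps of the E2SFCA method can be sketched as follows. Step 1 computes, for each provider j, a supply-to-demand ratio R_j = S_j / Σ_k P_k W(t_kj) over population locations k within the catchment; step 2 sums the weighted ratios reachable from each location i, A_i = Σ_j R_j W(t_ij). This is a generic minimal sketch, not the authors' shared code: the function name is hypothetical, and the travel-time zones and weights are placeholders rather than the values used in the paper. In practice `demand` would hold MSM, PLWH, or ages 15-64 counts, and `travel_time` would come from driving or transit network analysis.

```python
def e2sfca(supply, demand, travel_time, zones):
    """Enhanced two-step floating catchment area accessibility (minimal sketch).

    supply:      {provider_id: capacity S_j}
    demand:      {location_id: population P_k}
    travel_time: {(location_id, provider_id): minutes}; missing pairs = unreachable
    zones:       [(upper_bound_minutes, weight), ...] in ascending order
                 (hypothetical placeholder zones, not the paper's values)
    """
    def weight(minutes):
        # Distance-decay weight of the first zone containing this travel time;
        # anything beyond the last zone is outside the catchment.
        for upper, w in zones:
            if minutes <= upper:
                return w
        return 0.0

    # Step 1: provider-to-population ratio R_j over the provider's catchment.
    ratio = {}
    for j, capacity in supply.items():
        weighted_pop = sum(
            pop * weight(travel_time[(k, j)])
            for k, pop in demand.items()
            if (k, j) in travel_time
        )
        ratio[j] = capacity / weighted_pop if weighted_pop > 0 else 0.0

    # Step 2: accessibility A_i sums the weighted ratios reachable from i.
    return {
        i: sum(ratio[j] * weight(travel_time[(i, j)])
               for j in supply if (i, j) in travel_time)
        for i in demand
    }
```

Because each provider's capacity is spread over the weighted population it serves, Σ_i A_i P_i equals the total capacity of all providers that reach at least one population location, which is a convenient sanity check.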
keywords:
HIV;spatial accessibility;spatial analysis;public transportation;GIS
published:
2025-10-02
Jin, Yong-Su; Rao, Christopher; Ye, Quanhui; Oh, Hyunjoon; Tohidifar, Payman; Koh, Hyun Gi
(2025)
For economic and sustainable biomanufacturing, the oleaginous yeast Rhodotorula toruloides has emerged as a promising platform for producing biofuels, pharmaceuticals, and other valuable chemicals. However, genetic manipulation of R. toruloides has been limited by its high GC content and the lack of a replicating plasmid, necessitating gene integration into the genome of the yeast. To address these challenges, we developed the RT-EZ (R. toruloides Efficient Zipper) toolkit, a versatile tool based on Golden Gate assembly, designed to streamline R. toruloides engineering with improved efficiency and flexibility. The RT-EZ toolkit simplifies vector construction by incorporating new features such as bidirectional promoters and 2A peptides, color-based screening using RFP, and sequences optimized for both Agrobacterium tumefaciens-mediated transformation (ATMT) and easy linearization, enabling straightforward selection and transformation. Notably, the RT-EZ kit can be used to construct an expression cassette with four different genes in one assembly reaction, significantly improving vector construction speed and efficiency. The utility of the RT-EZ toolkit was demonstrated through the successful synthesis of arachidonic acid in R. toruloides by coexpressing fatty acid elongases and desaturases. This result underscores the potential of the RT-EZ toolkit to advance synthetic biology in R. toruloides, providing a streamlined method for addressing genetic engineering challenges in the yeast.
keywords:
gene editing; genome engineering
published:
2020-11-20
Jaikumar, Nikhil; Clemente, Tom; Long, Steve; Ge, Zhengxiang; Changa, Timothy
(2020)
This dataset explores the effect of the cyanobacterial gene ictB on photosynthesis in sorghum, both under normal greenhouse growing temperatures (32 °C / 25 °C) and during and after an 8-day chilling stress (10 °C / 5 °C). IctB is a cyanobacterial gene of unknown function that was initially thought to be involved in inorganic carbon transport into cells. Although ictB is now known not to be an independently active carbon transporter, it may play a role in the passive diffusion of metabolites. The transgene was introduced into sorghum by the lab of Thomas Clemente through Agrobacterium-mediated transformation, both alone and in combination with the tomato sedoheptulose-1,7-bisphosphatase (SBPase) gene. Eleven events (six double-construct and five single-construct ictB) were included in this study. SBPase was included because previous experiments in C3 species and previous modeling work, as well as its position at a metabolic branch point, indicate that it acts as a control point for photosynthesis. A chilling treatment was included because chilling is one of the most serious ecological factors limiting the range of C4 species.
Data includes gene expression, metabolomics (at normal growing temperature), SBPase enzyme activity, biomass and photosynthetic traits at both warm temperature and during and after chilling stress.
-----------------
EXPLANATORY NOTES FOR ICTB/SBPASE SORGHUM MANUSCRIPT
Data are organized into 10 worksheets, corresponding to the 10 tables expected to serve as supplementary material in the final publication. These include data on gene expression, metabolomics (at normal growing temperature), SBPase enzyme activity, biomass and photosynthetic traits at both warm temperature and during and after chilling stress.
<i><b>Tables are as follows:</b></i>
1. Event_Code: for Table S1. Event codes for events and constructs. Two constructs were generated for this study, and numerous transgenic “events” (i.e., independent transformations) were carried out for each construct. A construct represents the actual vector that was introduced into the plants (complete with promoter, gene of interest, marker gene, etc.), while an event represents a single successful introduction of the transgene. Events are uniquely labeled with letter and number strings and also with a four-digit number for ease of reference; this table explains which event corresponds to each four-digit number.
2. Photosynthetic_Data: for Table S2. Photosynthetic data at greenhouse growing temperature, for ictB single construct, ictB/SBPase double construct, and wild type lines. Five ictB and six ictB/SBPase events were included. Greenhouse growing temperature was approximately 32 °C day and 25 °C night. Photosynthetic parameters were measured using a LI-COR LI-6400XT and included parameters related to carbon dioxide uptake, water loss, and chlorophyll fluorescence.
3. Chilling_Treatment: for Table S3. Photosynthetic response to chilling treatment, for ictB single construct and wild type lines. Four ictB events were included. The chilling treatment lasted approximately 8 days and began either 3.5 or 5.5 weeks after the plants were transplanted (chilling was done in two batches). Chilling treatment involved temperatures of 10 °C day / 7 °C night in growth chambers. Photosynthetic parameters were measured at several time points during and after the chilling treatment using a LI-COR LI-6400XT, and included parameters related to carbon dioxide uptake, water loss, and chlorophyll fluorescence.
4. SBPase_Activity: for Table S4. SBPase activity in double construct plants. These data measure in vitro substrate-saturated activity of SBPase in desalted extracts from leaf tissues, at 25 °C. Units are micromoles of SBP processed per second per m2 of leaf tissue. Five ictB/SBPase events were included.
5. 2014_gene_exp: for Table S5. Gene expression in 2014 experiment (units of cycle times). These data measure cycle times to threshold, relative to reference genes, for expression of ictB and SBPase. Six ictB single construct events and five ictB/SBPase double construct events were included. Cycle times to threshold relative to reference genes (ΔCT) are inversely related to number of transcripts relative to reference genes, as follows:
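ΔC<sub>T</sub> = -log<sub>2</sub>([N<sub>ictB</sub>]/[N<sub>reference</sub>]) / (1 + log<sub>2</sub> b), where b is the amplification efficiency (b = 1 corresponds to perfect doubling in each cycle).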
6. 2016_gene_exp: for Table S6. Gene expression in 2016 experiment (units of cycle times). These data measure cycle times to threshold, relative to reference genes, for expression of ictB and SBPase. Six ictB single construct events and five ictB/SBPase double construct events were included. Cycle times to threshold relative to reference genes (ΔCT) are inversely related to number of transcripts relative to reference genes, as follows:
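ΔC<sub>T</sub> = -log<sub>2</sub>([N<sub>ictB</sub>]/[N<sub>reference</sub>]) / (1 + log<sub>2</sub> b), where b is the amplification efficiency (b = 1 corresponds to perfect doubling in each cycle).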
7. Metabolites: for Table S7. Levels of 267 metabolites in leaf tissue. Four ictB single construct events and four ictB/SBPase double construct events were included in these analyses. Metabolites were measured in methanol-extracted samples, either by liquid chromatography / mass spectrometry or by gas chromatography / mass spectrometry, and were compared between events on a relative basis. As quantification was relative to wild type rather than on an absolute basis, no units are included.
8. Metabolite_F_values: for Table S8. F values for effects of ictB, SBPase (in cases where the model was better with an SBPase effect), and event. These analyses were done for each metabolite included in Table S7, and show the effects of the explanatory variables ictB, SBPase, and individual event.
9. Biomass_2020: for Table S9. Biomass and grain yield at harvest, for ictB, ictB/SBPase, and wild type sorghum plants in spring 2020. Four ictB/SBPase double construct and four ictB single construct events were included.
10. Biomass_2017: for Table S10. Biomass and grain yield at harvest, in chilled and non-chilled sorghum plants containing the ictB transgene (along with wild type controls) in fall 2017. Four ictB single construct events were included. Chilling treatment involved temperatures of 10 °C day / 7 °C night in growth chambers.
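The ΔCT relationship given for the 2014 and 2016 gene expression tables can be inverted to recover relative transcript abundance. The sketch below assumes ΔCT is computed as target-gene CT minus reference-gene (GAPDH) CT, as in the ΔCT1 and ΔCT2 columns, and treats b as the amplification efficiency in that formula; the function names are hypothetical.

```python
import math


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCT as in the ΔCT1/ΔCT2 columns: target-gene CT minus GAPDH CT."""
    return ct_target - ct_reference


def transcript_ratio(dct: float, b: float = 1.0) -> float:
    """Invert ΔCT = -log2(N_target/N_ref) / (1 + log2 b) to get N_target/N_ref.

    Rearranging gives N_target/N_ref = 2 ** (-ΔCT * (1 + log2 b)) = (2b) ** -ΔCT,
    so a larger ΔCT means fewer target transcripts relative to the reference.
    """
    return 2.0 ** (-dct * (1.0 + math.log2(b)))
```

With perfect efficiency (b = 1), a ΔCT of 5 cycles corresponds to a target-to-reference transcript ratio of 2^-5.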
<i><b>All the variables in the file are explained below:</b></i>
o Type (IctB-SBPase and IctB). This refers to whether a plant is wild type, single construct (contains only the ictB transgene) or double construct (contains both the ictB and SBPase transgenes).
o Code: these codes are shorter labels to refer to each transgene event for the sake of convenience.
o Alternate_Code: an alternative short label for each transgene event, also provided for convenience.
o Event Number: these are unique labels for each transgenic event.
o Construct Number: these are labels for each transgenic construct (either the ictB single construct or the ictB/SBPase double construct).
o year (i): this refers to the year in which the study was conducted (2014, 2016, 2017, or 2020)
o transgene or Transgenic: whether the transgene was present
o construct or Type: whether the ictB or the ictB/SBPase construct was present (double, single, wildtype)
o temp: leaf temperature during the measurement
o A: carbon assimilation rate, in μmol m-2 s-1
o gs: stomatal conductance, in mol m-2 s-1
o CI: intercellular carbon dioxide concentration, in parts per million or μL L-1
o fvfm:FV’/FM’ (maximal potential photosystem II quantum yield under light adapted conditions), dimensionless ratio
o phipsill: ΦPSII (operating, or effective, photosystem II quantum yield under light adapted conditions), dimensionless ratio
o qP: photochemical quenching, i.e. ratio of ΦPSII to FV’/FM’, dimensionless ratio
o iwue: intrinsic water use efficiency, i.e. ratio of carbon assimilation rate to stomatal conductance, in units of μmol mol-1
o event: individual transgenic / transformation event
o Vmax: substrate-saturated in vitro activity of the SBPase enzyme, in μmol m-2 s-1
o ID: identification number of sample
o ΔCT1: difference in cycle times to threshold during gene expression (quantitative PCR) assay, between ictB and the reference gene GAPDH, in units of cycles
o ΔCT2: difference in cycle times to threshold during gene expression (quantitative PCR) assay, between SBPase and the reference gene GAPDH, in units of cycles
o GAPDH: cycle times to threshold for the reference gene GAPDH (glyceraldehyde-3-phosphate dehydrogenase)
o IctB: cycle times to threshold for the gene of interest ictB
o SBPase: cycle times to threshold for the gene of interest SBPase
o v1 to v267 represent individual metabolites (each metabolite's name is given in the heading immediately above the labels v1, v2, etc.). Variables v268-v272 refer to total (summed) metabolite levels for particular pathways of interest.
o leaf: Leaf and stem dry biomass (in grams)
o seed: Seedhead dry biomass (in grams)
o biomass: Total (leaf, stem, and seed head) dry biomass (in grams)
o harvind: ratio of seed head dry biomass to total dry biomass
o treatment (chilled and nonchilled): “Chilled” plants were grown under warm greenhouse conditions (32 °C day / 25 °C night) for 6 or 8 weeks, then moved to chilling temperatures under growth chamber conditions (10 °C day / 7 °C night) for 8 days, and were then returned to greenhouse growing conditions.
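Several of the variables above are simple ratios of other columns. A minimal sketch computing iwue, qP, and harvind from the A, gs, phipsill (ΦPSII), fvfm, seed, and biomass values, assuming each is a plain number for a single plant or measurement; the function name is hypothetical.

```python
def derived_traits(A: float, gs: float, phipsii: float, fvfm: float,
                   seed: float, biomass: float) -> dict:
    """Recompute the ratio variables defined in the list above."""
    return {
        # Intrinsic water use efficiency: A (μmol m-2 s-1) / gs (mol m-2 s-1) -> μmol mol-1
        "iwue": A / gs,
        # Photochemical quenching: ΦPSII relative to FV’/FM’, dimensionless
        "qP": phipsii / fvfm,
        # Harvest index: seed head dry biomass over total dry biomass
        "harvind": seed / biomass,
    }
```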
-----------------
keywords:
ictB; SBPase; photosynthesis; sorghum; chilling
published:
2025-07-25
Mori, Jameson; Rivera, Nelda; Brown, William; Skinner, Daniel; Schlichting, Peter; Novakofski, Jan; Mateus-Pinilla, Nohra
(2025)
This dataset contains the pregnancy status of wild white-tailed deer (Odocoileus virginianus) from northern Illinois that were culled as part of the Illinois Department of Natural Resources' chronic wasting disease (CWD) surveillance program. Fiscal years 2005 through 2024 are included; a fiscal year runs from July 1 of one calendar year to June 30 of the next. Variables in this dataset include the pregnancy status, CWD infection status, age, weight, and day of mortality for each female deer, as well as the deer land cover utility (LCU) score for the TRS (township, range, and section), township, or county from which the deer was culled. The deer population density of the county is also included. For landowner privacy, the data have been anonymized so that location and year are not identifiable; because the grouping structure of the data is preserved, they still give the same modeling results. The R code used to conduct the regression modeling is also included.
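Assigning a mortality date to a July-to-June fiscal year can be sketched as below. The description does not say how fiscal years are labeled; this sketch assumes each is named by the calendar year in which it ends (so July 1, 2004 falls in fiscal year 2005). The function name is hypothetical.

```python
from datetime import date


def fiscal_year(d: date) -> int:
    """Label a date with the July 1 to June 30 fiscal year that contains it.

    Assumes fiscal years are named by the calendar year in which they end.
    """
    return d.year + 1 if d.month >= 7 else d.year
```

Under this convention, the 2005 through 2024 range would span July 1, 2004 to June 30, 2024.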
keywords:
cervid; Cervidae; chronic wasting disease; CWD; reproduction; white-tailed deer; Odocoileus virginianus; pregnancy; regression
published:
2019-04-05
Dong, Xiaoru; Xie, Jingyi; Hoang, Linh
(2019)
File Name: Inclusion_Criteria_Annotation.csv
Data Preparation: Xiaoru Dong
Date of Preparation: 2019-04-04
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains lists of inclusion criteria from Cochrane Systematic Reviews and the manual annotation results. Of the 7,158 inclusion criteria available, 5,420 were annotated. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file:
- "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews.
- "Only RCTs": Manual annotation results. An "x" means the inclusion criteria are classified as "Only RCTs"; a blank means they are classified as "Others".
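Tallying the two annotation classes from this two-column layout can be sketched with the standard library. The column names come from the description above; the function name is hypothetical, and the sketch assumes any cell that is not "x" (ignoring case and whitespace) means "Others".

```python
import csv
import io
from collections import Counter


def count_labels(csv_text: str) -> Counter:
    """Tally "Only RCTs" vs "Others" from the text of the annotation CSV."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        mark = (row.get("Only RCTs") or "").strip().lower()
        counts["Only RCTs" if mark == "x" else "Others"] += 1
    return counts
```

The file itself could be read with `open("Inclusion_Criteria_Annotation.csv", encoding="utf-8-sig", newline="")`; the encoding is an assumption that also tolerates a byte-order mark.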
Notes:
1. "RCT" stands for Randomized Controlled Trial, defined as "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html].
2. To reproduce the related data, obtain the project code from GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
3. This data file (V2) is an updated version of the data file published at https://doi.org/10.13012/B2IDB-5958960_V1, with some minor spelling mistakes in the data fixed.
keywords:
Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
published:
2020-09-02
Schneider, Jodi; Ye, Di; Hill, Alison
(2020)
Citation context annotation. This dataset is a second version (V2) and part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. (2020) "Continued post-retraction citation of a fraudulent clinical trial report, eleven years after it was retracted for falsifying data". Scientometrics. In press, DOI: 10.1007/s11192-020-03631-1
Publications were selected by examining all citations to the retracted paper Matsuyama 2005, and selecting the 35 citing papers, published 2010 to 2019, which do not mention the retraction, but which mention the methods or results of the retracted paper (called "specific" in Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi (2020): Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-8150563_V1">https://doi.org/10.13012/B2IDB-8150563_V1</a> ). The annotated citations are second-generation citations to the retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) https://doi.org/10.1016/S0012-3692(08)60339-6).
<b>OVERALL DATA for VERSION 2 (V2)</b>
FILES/FILE FORMATS
Same data in two formats:
2010-2019 SG to specific not mentioned FG.csv - Unicode CSV (preservation format only) - same as in V1
2010-2019 SG to specific not mentioned FG.xlsx - Excel workbook (preferred format) - same as in V1
Additional files in V2:
2G-possible-misinformation-analyzed.csv - Unicode CSV (preservation format only)
2G-possible-misinformation-analyzed.xlsx - Excel workbook (preferred format)
<b>ABBREVIATIONS: </b>
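2G - Refers to a second-generation citation of Matsuyama 2005, i.e., an article that cites a first-generation (FG) article
FG - Refers to a first-generation (direct) citation of Matsuyama 2005, i.e., the article that the second-generation item cites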
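The first- and second-generation relationship can be sketched as a walk over a citation map. This is an illustrative sketch, not the authors' selection procedure: `cites` maps each article ID to the set of IDs it cites, and the optional `first_gen` argument lets a curated FG set (such as the 35 papers described above) replace the default "everything that cites the target directly". All names are hypothetical.

```python
def second_generation_pairs(cites, target, first_gen=None):
    """Return sorted (2G article, FG article) pairs for a target article.

    cites: {article_id: set of article_ids it cites}
    first_gen: optional explicit FG set; by default, every article citing `target`.
    """
    if first_gen is None:
        first_gen = {art for art, refs in cites.items() if target in refs}
    return sorted(
        (art, fg)
        for art, refs in cites.items()
        for fg in refs
        if fg in first_gen
    )
```

Each pair corresponds to one "2G article" / "FG in bibliography" combination in the spreadsheet's terms.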
<b>COLUMN HEADER EXPLANATIONS </b>
File name: 2G-possible-misinformation-analyzed. Other column headers in this file have the same meaning as explained in V1. The following are additional header explanations:
Quote Number - The order of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Quote - The text of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Translated Quote - English translation of "Quote", automatic translation from Google Scholar
Seriousness/Risk - Our assessment of the risk of misinformation and its seriousness
2G topic - Our assessment of the topic of the citing article (the second generation article given in "2G article")
2G section - The section of the citing article (the second generation article given in "2G article") in which the cited article (the first generation article given in "FG in bibliography") was found
FG in bib type - The type of article (e.g., review article), referring to the cited article (the first generation article given in "FG in bibliography")
FG in bib topic - Our assessment of the topic of the cited article (the first generation article given in "FG in bibliography")
FG in bib section - The section of the cited article (the first generation article given in "FG in bibliography") in which the Matsuyama retracted paper was cited
keywords:
citation context annotation; retraction; diffusion of retraction; second-generation citation context analysis
published:
2026-01-15
Huang, Xiaoqiang; Jiang, Guangde; Harrison, Wesley; Wang, Binju; Zhao, Huimin
(2026)
Exploiting nature’s catalysts for non-natural transformations that are inaccessible to chemocatalysis is highly desirable but challenging. On the one hand, the widespread nicotinamide-dependent oxidoreductases have not been used for bimolecular cross-couplings induced by single-electron transfer; on the other, catalytic asymmetric radical conjugate addition to terminal alkenes remains a challenge owing to a strong racemic background reaction and unselective termination of prochiral radical species. Here we report a chemomimetic biocatalytic approach for constructing alpha-carbonyl stereocentres via an unnatural intermolecular conjugate addition of N-(acyloxy)phthalimide-derived radicals to acceptor-substituted terminal alkenes, combining visible-light excitation with nicotinamide-dependent ketoreductases (KREDs). Guided by the protein crystal structure, we engineered KREDs through a semi-rational mutagenesis strategy to improve reaction outcomes using a small, high-quality variant library. Mechanistic investigations combining wet-lab experiments, crystallographic studies, and computational simulations demonstrate that the repurposed biocatalyst can suppress the racemic background reaction and unselective side reactions, yielding enantioselectivity that is challenging to achieve by chemocatalysis.
keywords:
Catalysis
published:
2021-02-28
Ghosh, Sudipta; Riemer, Nicole; Giuliani, Graziano; Giorgi, Filippo; Ganguly, Dilip; Dey, Sagnik
(2021)
This dataset contains the RegCM4 simulations used in the article "Implementation of dynamic ageing of carbonaceous aerosols in regional climate model RegCM". It was used to investigate the impact of a new ageing parameterisation scheme implemented in the regional climate model RegCM4. The dataset contains two sets of simulations, Expt_fix and Expt_dyn, consisting of the seasonal mean and daily mean values of the variables used to create the visualizations in this study. The Expt_fix and Expt_dyn datasets contain 34 and 38 NetCDF files, respectively. The CERES_vs_2expts_new.mat file compares the CERES surface shortwave downward flux with the corresponding outputs from the two experiments under clear-sky and all-sky conditions.
--------------------------------------------------
The following information about the dataset was generated on 2021-01-08 by SUDIPTA GHOSH
<b>GENERAL INFORMATION</b>
<i>1. Date of data collection (single date, range, approximate date):</i> 2019-01-01 to 2019-12-31
<i>2. Geographic location of data collection:</i> Urbana-Champaign, Illinois, USA
<i>3. Information about funding sources that supported the collection of the data:</i> This work is supported by the MoEFCC under the NCAP-COALESCE project [Grant No. 14/10/2014-CC]. The first author acknowledges the DST-INSPIRE fellowship [IF150055] and the Fulbright-Kalam Climate Doctoral fellowship. N. R. acknowledges funding from NSF AGS-1254428 and DOE grant DE-SC0019192. The Department of Science and Technology Funds for Improvement of Science and Technology infrastructure in universities and higher educational institutions (DST-FIST) grant (SR/FST/ESII-016/2014) is acknowledged for computing support.
<b>DATA & FILE OVERVIEW</b>
<i>1. File List:</i> The Expt_fix and Expt_dyn datasets contain the analysed seasonal means and daily means of the variables used to create the visualizations in this study. They contain 34 and 38 NetCDF files, respectively.
<i>2. Relationship between files, if important:</i> NA
<i>3. Additional related data collected that was not included in the current data package:</i> No
<b>METHODOLOGICAL INFORMATION</b>
<i>1. Description of methods used for collection/generation of data: </i>
The RegCM4 model code is freely available online from <a href="http://gforge.ictp.it/gf/project/regcm/">http://gforge.ictp.it/gf/project/regcm/</a>.
The anthropogenic aerosol emissions used in the simulations are taken from the IIASA inventory. The data can be accessed online at the <a href="http://clima-dods.ictp.it/regcm4/">http://clima-dods.ictp.it/regcm4/</a> website.
TRMM observed precipitation data can be accessed from the <a href="https://giovanni.gsfc.nasa.gov/giovanni/">https://giovanni.gsfc.nasa.gov/giovanni/</a> website.
CRU temperature data is available at <a href="https://crudata.uea.ac.uk/cru/data/hrg/">https://crudata.uea.ac.uk/cru/data/hrg/</a>.
CERES satellite surface shortwave downward fluxes are available at <a href="https://ceres.larc.nasa.gov/data/">https://ceres.larc.nasa.gov/data/</a> website.
Input files for the RegCM4 model are archived in <a href="http://clima-dods.ictp.it/regcm4/">http://clima-dods.ictp.it/regcm4/</a> website.
This dataset contains the RegCM4 simulations used in the article "Implementation of dynamic ageing of carbonaceous aerosols in regional climate model RegCM". The output data consist of two sets of simulations, Expt_fix and Expt_dyn. The dataset contains only the analysed seasonal means and daily means of the variables used to create the visualizations in this study; Expt_fix and Expt_dyn contain 34 and 38 NetCDF files, respectively. The dataset was used to investigate the impact of a new ageing parameterisation scheme implemented in the regional climate model RegCM4.
<i>2. Methods for processing the data:</i> Seasonal mean and daily average values were extracted from 6-hourly model output.
<i>3. Instrument- or software-specific information needed to interpret the data:</i> CDO 1.7.1, GrADS 2.0.a9, MATLAB R2016b
<i>4. Standards and calibration information, if appropriate:</i> NA
<i>5. Environmental/experimental conditions:</i> NA
<i>6. Describe any quality-assurance procedures performed on the data:</i> NA
<i>7. People involved with sample collection, processing, analysis and/or submission:</i> Sudipta Ghosh, Nicole Riemer, Graziano Giuliani, Filippo Giorgi, Dilip Ganguly, Sagnik Dey
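The averaging described in item 2 of the methods can be illustrated with a small pure-Python sketch: daily means from 6-hourly samples, and seasonal means for the winter JF (January-February) and monsoon JJAS (June-September) seasons used in this dataset. The authors list CDO among their tools; this sketch only illustrates the arithmetic, not their actual pipeline, and the function names and the (datetime, value) input layout are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# Months in each season named in this README.
SEASON_MONTHS = {"JF": {1, 2}, "JJAS": {6, 7, 8, 9}}


def daily_means(series):
    """Average (datetime, value) samples, e.g. 6-hourly output, by calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in series:
        by_day[timestamp.date()].append(value)
    return {day: fmean(values) for day, values in sorted(by_day.items())}


def seasonal_mean(series, season):
    """Average every sample whose month falls in the named season."""
    months = SEASON_MONTHS[season]
    return fmean(value for timestamp, value in series if timestamp.month in months)


if __name__ == "__main__":
    # Four 6-hourly samples on a single day average to one daily value.
    six_hourly = [(datetime(2019, 1, 1, h), v)
                  for h, v in zip((0, 6, 12, 18), (1.0, 2.0, 3.0, 4.0))]
    print(daily_means(six_hourly))
```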
<b>DATA-SPECIFIC INFORMATION FOR: Expt_fix_data.tar.gz</b>
<i>1. Number of variables:</i> 29
<i>2. Number of cases/rows:</i> NA
<i>3. Variable List:</i> Mass concentration (kg m-3) of BC, BC_HB, BC_HL, OC, OC_HB, OC_HL; columnar burden (mg m-2) of BC, BC_HL, BC_HB, OC; dry deposition flux (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to washout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to rainout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; AOD (unitless); precipitation (kg m-2 s-1); temperature (K); v-wind (m s-1); u-wind (m s-1); surface shortwave downward flux (W m-2); shortwave radiative forcing at the surface and top of atmosphere (W m-2)
<b>DATA-SPECIFIC INFORMATION FOR: Expt_dyn_data.tar.gz</b>
<i>1. Number of variables:</i> 30
<i>2. Number of cases/rows:</i> NA
<i>3. Variable List:</i> Mass concentration (kg m-3) of BC, BC_HB, BC_HL, OC, OC_HB, OC_HL; columnar burden (mg m-2) of BC, BC_HL, BC_HB, OC; dry deposition flux (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to washout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to rainout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; AOD (unitless); precipitation (kg m-2 s-1); temperature (K); v-wind (m s-1); u-wind (m s-1); surface shortwave downward flux (W m-2); shortwave radiative forcing at the surface and top of atmosphere (W m-2); ageingscale (s-1)
<b>DATA-SPECIFIC INFORMATION FOR: CERES_vs_2expts_new.mat</b>
<i>1. Number of variables:</i> 12
<i>2. Number of cases/rows:</i> NA
<i>3. Variable List:</i> Surface shortwave downward flux for clear-sky conditions (W m-2) for CERES, Expt_fix, and Expt_dyn (for the winter JF and monsoon JJAS seasons); surface shortwave downward flux for all-sky conditions (W m-2) for CERES, Expt_fix, and Expt_dyn (for the winter JF and monsoon JJAS seasons).
<b>NOTE:</b> The following information applies to all three files:
<i>Missing data codes:</i> NA
<i>Specialized formats or other abbreviations used:</i> NA
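The variable lists above report some quantities in SI units (kg m-3, kg m-2 s-1) that are often converted for reporting. A minimal sketch of two common conversions, assuming the precipitation flux refers to liquid water with a density of 1000 kg m-3 (so 1 kg m-2 corresponds to a 1 mm layer); the function names are hypothetical and not part of the dataset.

```python
SECONDS_PER_DAY = 86_400
MICROGRAMS_PER_KG = 1e9

def precip_to_mm_per_day(flux_kg_m2_s: float) -> float:
    """Convert a precipitation flux in kg m-2 s-1 to mm day-1."""
    # Assumes liquid water at 1000 kg m-3: 1 kg spread over 1 m^2 is a 1 mm layer,
    # so kg m-2 s-1 is numerically equal to mm s-1; scale seconds up to days.
    return flux_kg_m2_s * SECONDS_PER_DAY

def mass_conc_to_ug_m3(conc_kg_m3: float) -> float:
    """Convert a mass concentration in kg m-3 to micrograms per cubic meter."""
    return conc_kg_m3 * MICROGRAMS_PER_KG
```

For example, a flux of 1e-5 kg m-2 s-1 corresponds to 0.864 mm day-1, and a BC concentration of 2e-9 kg m-3 corresponds to 2 µg m-3.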
keywords:
Carbonaceous aerosols; ageing parameterisation scheme; regional climate model; NetCDF
published:
2024-03-06
OKeefe, Joy; Bennett, Andrew
(2024)
These data are the result of analyses of the gut metagenome of North American bats, based on 18S and 16S rRNA barcode genes targeting gut microorganisms. These files are phyloseq import files created with the DADA2 R package. Each barcode gene is uploaded separately as the four files required to build a phyloseq object. For each barcode gene, the files include amplicon sequence variant (ASV) sequences; sequence tables (seqtab), which connect individual samples to the ASVs; taxonomy tables (taxtab), which identify the taxa present as determined by a Bayesian RDP classifier; and rooted phylogenetic trees for the ASVs. Additionally, we have included a "sample_data" file, which is necessary for sorting samples across all four sequence analysis data sets by study and species. Some sample information that could identify the location of endangered species has been restricted. Multiple studies are represented in the data, and each can be accessed using standard methods in phyloseq (e.g., for a study of bats, parasites, and gut microbiome dysregulation by Bennett, Suski, and OKeefe 2024 [in prep March 2024], study-specific data can be accessed using the Study variable "DYSBIOMICS"). File names include reference to the primer set used to generate them (18S primer sets: G3, G4, G6; 16S primer set: 341F3_806R5).
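In phyloseq itself, study-specific samples are usually selected with `subset_samples()` (for example on `Study == "DYSBIOMICS"`). For readers working outside R, the Python sketch below shows the same idea on plain data structures: keep the samples whose Study value matches, then drop ASVs that no longer have any reads. It assumes the sample_data has been exported to CSV with a sample-ID column named `SampleID` and that the sequence table has been read into a nested dict; both the column name and these structures are hypothetical, not the published file layout.

```python
import csv
import io

def samples_in_study(sample_data_csv: str, study: str, id_column: str = "SampleID") -> set:
    """Return the IDs of samples whose Study column equals `study`."""
    reader = csv.DictReader(io.StringIO(sample_data_csv))
    return {row[id_column] for row in reader if row.get("Study") == study}

def subset_seqtab(seqtab: dict, keep: set) -> dict:
    """Keep only the selected samples and drop ASVs with zero remaining reads.

    `seqtab` maps sample ID -> {ASV sequence: read count} (hypothetical layout).
    """
    kept = {s: counts for s, counts in seqtab.items() if s in keep}
    # Total reads per ASV across the retained samples only.
    totals = {}
    for counts in kept.values():
        for asv, n in counts.items():
            totals[asv] = totals.get(asv, 0) + n
    return {
        s: {asv: n for asv, n in counts.items() if totals[asv] > 0}
        for s, counts in kept.items()
    }
```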
keywords:
metagenomics
published:
2021-05-01
Cheng, Ti-Chung; Li, Tiffany Wenting; Karahalios, Karrie; Sundaram, Hari
(2021)
This is the first version of the dataset.
This dataset contains anonymized data collected during the experiments described in the publication “I can show what I really like.”: Eliciting Preferences via Quadratic Voting, which is to appear in April 2021.
Once the publication link is public, we will provide an update here.
These data were collected through our open-source online systems, available at <a href="https://github.com/a2975667/QV-app">experiment 1</a> and <a href="https://github.com/a2975667/QV-buyback">experiment 2</a>.
There are two folders in this dataset. The first folder (exp1_data) contains data collected during experiment 1; the second folder (exp2_data) contains data collected during experiment 2.
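The core rule of quadratic voting is that casting n votes on a single option costs n² credits, so each additional vote on the same option costs more than the previous one. A minimal sketch, assuming votes may be positive or negative and that a ballot is valid when its total cost does not exceed a credit budget; the function names and the 100-credit budget in the usage note are illustrative only, and the actual experiment settings are in the paper and the QV-app and QV-buyback repositories.

```python
def qv_cost(votes: dict) -> int:
    """Total credits spent: each option costs the square of its vote count."""
    return sum(v * v for v in votes.values())

def marginal_cost(n: int) -> int:
    """Extra credits needed to go from n-1 to n votes on one option (n >= 1)."""
    return n * n - (n - 1) * (n - 1)  # simplifies to 2n - 1

def is_valid_ballot(votes: dict, budget: int) -> bool:
    """A ballot is affordable if its quadratic cost fits within the budget."""
    return qv_cost(votes) <= budget
```

With a 100-credit budget, `{"a": 3, "b": -2}` costs 13 credits and is valid, while `{"a": 10, "b": 1}` costs 101 and is not.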
keywords:
Quadratic Voting; Likert scale; Empirical studies; Collective decision-making
published:
2019-10-15
Choi, Sang Hyun; Rao, Vikyath; Gernat, Tim; Hamilton, Adam; Robinson, Gene; Goldenfeld, Nigel
(2019)
Filtered trophallaxis interactions for two honeybee colonies, each containing 800 worker bees and one queen. Each colony consists of bees that were administered a juvenile hormone analog, a vehicle treatment, or a sham treatment to determine the effect of colony perturbation on the duration of trophallaxis interactions. Columns one and two give the unique identifiers of the two bees involved in each trophallaxis exchange, and columns three and four give the Unix timestamps (in milliseconds) of the beginning and end of the interaction, respectively.<br /><b>Note</b>: the queen interactions were omitted from the uploaded dataset for reasons described in the submitted manuscript. Bees that performed poorly were also omitted from the final dataset.
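Because columns three and four hold start and end times in milliseconds, each interaction's duration is simply their difference. A minimal sketch, assuming comma-separated rows with no header line (adjust `delimiter`, or skip a header, if the files differ); the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

def interaction_durations(text: str, delimiter: str = ","):
    """Yield (bee_a, bee_b, duration_seconds) for each interaction row."""
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        bee_a, bee_b, start_ms, end_ms = row[:4]
        yield bee_a, bee_b, (int(end_ms) - int(start_ms)) / 1000.0

def total_time_per_bee(text: str, delimiter: str = ",") -> dict:
    """Sum trophallaxis time (seconds) for every bee, counting both partners."""
    totals = defaultdict(float)
    for bee_a, bee_b, duration in interaction_durations(text, delimiter):
        totals[bee_a] += duration
        totals[bee_b] += duration
    return dict(totals)
```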
keywords:
honey bee; trophallaxis; social network
published:
2020-09-25
This repository contains the datasets and corresponding results for the paper "MAGUS: Multiple Sequence Alignment using Graph Clustering".
The Datasets.zip archive contains the ROSE, BAliBASE, Gutell, and RNASim datasets used in our experiments.
The Results.zip archive contains the outputs of running our methods against these datasets.
Datasets used:
ROSE: 10 simulated nucleotide model conditions from the SATe paper, each with 20 replicates, and with 1000 sequences per replicate.
The ROSE datasets were originally taken from <a href="https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i">https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i</a>
RNASim: This is a collection of simulated nucleotide datasets that were generated under a model of evolution that reflects selection due to RNA structural constraints. We sampled 20 subsets of 1,000 sequences each, as well as 10 subsets of 10,000 sequences each, by randomly sampling from the original million-sequence RNASim dataset.
Gutell: 16S.M, 16S.3, 16S.T, 16S.B.ALL: Four biological nucleotide datasets from the Comparative RNA Web (CRW) Site with cleaned reference alignments from SATe. Since PASTA is restricted to datasets without sequence length heterogeneity, these were modified to remove sequences that deviate by more than 20% from the median length. The scrubbed datasets range from 740 to 24,246 sequences. The pre-screened 16S datasets were taken from <a href="https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s">https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s</a>
BAliBASE: We use eight BAliBASE amino acid datasets used in the PASTA paper. As above, we remove outlier sequences, which leaves us with sizes ranging from 195 to 732 sequences. The pre-screened BAliBASE datasets were taken from <a href="https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp">https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp</a>
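The Gutell and BAliBASE preprocessing removes sequences whose length deviates by more than 20% from the median length. A minimal sketch of that rule, assuming lengths are measured on ungapped sequences and that a sequence exactly 20% away from the median is kept; the original preprocessing may differ in both details, and the function name is hypothetical.

```python
from statistics import median

def filter_length_outliers(seqs: dict, tolerance: float = 0.20) -> dict:
    """Drop sequences whose ungapped length differs from the median by more than `tolerance`.

    `seqs` maps sequence name -> sequence string; '-' gap characters are
    ignored when measuring length.
    """
    lengths = {name: len(s.replace("-", "")) for name, s in seqs.items()}
    m = median(lengths.values())
    return {
        name: s for name, s in seqs.items()
        if abs(lengths[name] - m) <= tolerance * m
    }
```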
published:
2025-02-07
Huang, Annie H.; Matthews, Jeffrey W.
(2025)
These data represent the raw data from the paper “Influence of light availability and water depth on competition between Phalaris arundinacea and herbaceous vines” published in Wetlands by Annie H. Huang and Jeffrey W. Matthews. The data are archived in one file: Huang&Matthews_mesocosm_data_archive. This file includes raw data collected during a greenhouse experiment described in the paper.
published:
2025-09-17
Kamara, Shasta; Glomb, Jackson; Suski, Cory
(2025)
Data were generated from juvenile paddlefish acclimated to one of three temperatures (13.0°C, 17.5°C, or 22.0°C) for two weeks. Fish were then subjected to one of two experiments: simulated angling, with physiological parameters (stress hormones, lactate, glucose, ions, and oxygen transport parameters) evaluated in plasma or whole blood, or critical thermal maximum tests. The dataset includes physiological parameters, water quality and temperature data, and morphometric data generated from these experiments and fish.
keywords:
Sport fish, critical thermal maximum, exercise, recovery, conservation, fisheries, management
published:
2018-07-28
Hoang, Linh; Schneider, Jodi
(2018)
This dataset presents a citation analysis and citation context analysis used in Linh Hoang, Frank Scannapieco, Linh Cao, Yingjun Guan, Yi-Yun Cheng, and Jodi Schneider. Evaluating an automatic data extraction tool based on the theory of diffusion of innovation. Under submission. We identified the papers that directly describe or evaluate RobotReviewer from the list of publications on the RobotReviewer website <http://www.robotreviewer.net/publications>, resulting in 6 papers grouped into 5 studies (we collapsed a conference paper and a journal paper with the same title and authors into one study). We found 59 citing papers, combining results from Google Scholar on June 5, 2018 and from Scopus on June 23, 2018. We extracted the citation context around each citation to the RobotReviewer papers and categorized these quotes into emergent themes.
keywords:
RobotReviewer; citation analysis; citation context analysis
published:
2023-06-01
Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica
(2023)
This dataset contains four real-world sub-datasets with data embedded into Poincare ball models: Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST, and mini-ImageNet. Each sub-dataset has two corresponding files: a data file and a file of pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and code (https://github.com/thupchnsky/PoincareLinearClassification) for more details.
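Points in these sub-datasets lie inside a Poincaré ball, where distances are hyperbolic rather than Euclidean. For orientation, the sketch below implements the standard distance on the unit Poincaré ball (curvature -1), d(x, y) = arcosh(1 + 2‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²))). The paper and the linked repository may use a different curvature or scaling, so treat this as a generic sketch rather than the authors' code; the function name is hypothetical.

```python
import numpy as np

def poincare_distance(x, y) -> float:
    """Hyperbolic distance between two points strictly inside the unit Poincaré ball."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    if denom <= 0:
        raise ValueError("both points must lie strictly inside the unit ball")
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))
```

As a check, the distance from the origin to (0.5, 0) is ln 3 ≈ 1.0986, which equals 2·artanh(0.5).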
keywords:
Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine
published:
2020-02-01
Williams, Benjamin R.; Benson, Thomas J.
(2020)
These data describe habitat use, habitat availability, landscape-level influences, and daily movement of dabbling ducks in the Wabash River Valley of southeastern Illinois and southwestern Indiana. The dataset contains triangulated locations of individual ducks, habitat assignments for those locations, flood survey data used to determine water availability, and randomly generated points used to assess landscape-level questions.
keywords:
waterfowl; ducks; dabbling; mallard; teal; habitat
published:
2022-10-13
Xue, Qingquan; Dietrich, Christopher H.; Zhang, Yalin
(2022)
The text file contains the original DNA nucleotide sequence data used in the phylogenetic analyses of Xue et al. (in review), comprising the 13 protein-coding genes and 2 ribosomal gene subunits of the mitochondrial genome. The text file is marked up according to the standard NEXUS format commonly used by phylogenetic analysis software packages, so it can be parsed automatically by any program that recognizes NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS and indicate that the file contains data for 30 taxa (species) and 13078 characters, that the characters are DNA sequence, that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes (version 3.2.6) near the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the Methods section of the submitted manuscript. Two supplementary tables in the provided PDF file give additional information on the species in the dataset, including the GenBank accession numbers for the sequence data (Table S1) and the DNA substitution models used for each of the individual mitochondrial genes and for different codon positions of the protein-coding genes in the MrBayes and IQ-TREE (version 1.6.8) analyses (Table S2). Full citations for references listed in Table S1 can be found by searching GenBank using the corresponding accession number. The supplemental tables will also be linked to the article upon publication at the journal website.
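Phylogenetics programs (and libraries such as Biopython) parse NEXUS files directly. As a lightweight check of the header values described above, the standard-library sketch below reads the DIMENSIONS and FORMAT commands. It assumes each command ends with a semicolon and does not strip NEXUS comments in square brackets; the function name is hypothetical.

```python
import re

def read_nexus_header(text: str) -> dict:
    """Extract ntax/nchar from DIMENSIONS and datatype/gap/missing from FORMAT."""
    info = {}
    dims = re.search(r"dimensions\s+([^;]*);", text, re.IGNORECASE)
    if dims:
        for key in ("ntax", "nchar"):
            m = re.search(rf"{key}\s*=\s*(\d+)", dims.group(1), re.IGNORECASE)
            if m:
                info[key] = int(m.group(1))
    fmt = re.search(r"format\s+([^;]*);", text, re.IGNORECASE)
    if fmt:
        for key in ("datatype", "gap", "missing"):
            # \S+ captures symbols such as "-" and "?" as well as words like "DNA".
            m = re.search(rf"{key}\s*=\s*(\S+)", fmt.group(1), re.IGNORECASE)
            if m:
                info[key] = m.group(1)
    return info
```

For this file, the result should report 30 taxa, 13078 characters, DNA data, "-" for gaps, and "?" for missing data.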
keywords:
Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published:
2023-07-05
Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal; Lischwe Mueller, Natalie
(2023)
The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below.
This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis or PSRs. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. They are unused PSRs. If visualized with the inclusion network, they will appear as isolated nodes.
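The inclusion network is a directed bipartite graph with edges from SRRs to PSRs, and unused PSRs are PSR nodes with no incoming edge. A minimal standard-library sketch, assuming inclusion_net_edges.csv has a header with the SRR and PSR identifiers in columns named `srr_id` and `psr_id`; these column names are hypothetical, so check the actual header before use.

```python
import csv
import io
from collections import defaultdict

def build_inclusion_network(edges_csv: str, srr_col: str = "srr_id", psr_col: str = "psr_id") -> dict:
    """Map each SRR to the set of PSRs it includes (directed SRR -> PSR edges)."""
    network = defaultdict(set)
    for row in csv.DictReader(io.StringIO(edges_csv)):
        network[row[srr_col]].add(row[psr_col])
    return dict(network)

def unused_psrs(all_psr_ids, network: dict) -> list:
    """PSRs with no incoming edge, i.e. isolated nodes in the inclusion network."""
    included = set().union(*network.values())
    return sorted(set(all_psr_ids) - included)
```

The full list of PSR identifiers would come from report_list.csv; applied to the published files, the result should contain the 11 unused PSRs noted above.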
We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports.
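A co-author network links two authors whenever they share a report. A minimal sketch, assuming salt_cont_author.csv has one row per report-author pair with columns named `report_id` and `author_id` (hypothetical names; check the file header); each edge weight counts the reports the two authors share.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

def coauthor_edges(author_csv: str, report_col: str = "report_id", author_col: str = "author_id") -> Counter:
    """Return {(author_a, author_b): number of shared reports}, with a < b."""
    authors_by_report = defaultdict(set)
    for row in csv.DictReader(io.StringIO(author_csv)):
        authors_by_report[row[report_col]].add(row[author_col])
    edges = Counter()
    for authors in authors_by_report.values():
        # Every unordered pair of authors on the same report gets one more shared report.
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1
    return edges
```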
We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016).
<b>UPDATES IN THIS VERSION COMPARED TO V2</b> (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V2)
- We added a new column "pub_date" to report_list.csv
- We corrected mistakes in supplementary_reference_list.pdf for reports #28 and #80. The authors of report #28 are Khaw, K.-T., & Barrett-Connor, E., not Salisbury D. Report #80 had mistakenly been mixed up with report #81.
keywords:
systematic reviews; evidence synthesis; network analysis; public health; salt controversy;
published:
2025-03-12
Jeong, Gangwon; Villa, Umberto; Park, Seonyeong; Anastasio, Mark A.
(2025)
References
- Jeong, Gangwon, Umberto Villa, and Mark A. Anastasio. "Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints." Photoacoustics (2025): 100700.
- Park, Seonyeong, et al. "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer." Journal of Biomedical Optics 28.6 (2023): 066002.
Overview
- This dataset includes 80 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for photoacoustic computed tomography (PACT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in PACT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as a .mat file. File names follow the format {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D) and {subject_id} is a unique identifier assigned to each sample. For example, in the file name D510022534.mat, "D" is the breast type and "510022534" is the sample ID.
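The naming scheme can be parsed mechanically. A minimal sketch, assuming every file name is exactly one breast-type letter followed by the sample ID and the .mat extension; the function name is hypothetical.

```python
import re
from pathlib import Path

# One breast-type letter (A-D), then the sample ID, then ".mat".
_PHANTOM_NAME = re.compile(r"^([ABCD])(.+)\.mat$")

def parse_phantom_filename(path: str) -> tuple:
    """Return (breast_type, subject_id) from a file name such as 'D510022534.mat'."""
    match = _PHANTOM_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return match.group(1), match.group(2)
```

The file contents themselves can typically be read with SciPy's `scipy.io.loadmat`, provided the files are not saved in MATLAB v7.3 (HDF5) format, which `loadmat` does not support.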
File Contents
- Each file contains the following variables:
> "type": Breast type
> "p0": Initial pressure distribution [Pa]
> "sos": Speed-of-sound map [mm/μs]
> "att": Acoustic attenuation (power-law prefactor) map [dB/(MHzʸ·mm)]
> "y": Power-law exponent
> "pressure_lossless": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, under the assumption of a lossless medium (corresponding to Studies I, II, and III).
> "pressure_lossy": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, incorporating a power-law acoustic absorption model to account for medium losses (corresponding to Study IV).
* The pressure data were simulated using a ring-array transducer that consists of 512 receiving elements uniformly distributed along a ring with a radius of 72 mm.
* Note: These pressure data are noiseless simulations. In Studies II–IV of the referenced paper, additive i.i.d. Gaussian noise was added to the measurement data. Users may add similar noise to the provided data as needed for their own studies.
- In Study I, all spatial maps (e.g., sos) have dimensions of 512 × 512 pixels, with a pixel size of 0.32 mm × 0.32 mm.
- In Study II and Study III, all spatial maps (e.g., sos) have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
- In Study IV, both the sos and att maps have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
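Two details above lend themselves to a quick sketch: the ring-array geometry (512 elements uniformly spaced on a circle of 72 mm radius) and the option of adding i.i.d. Gaussian noise to the noiseless pressure data. The starting angle of element 0, the noise standard deviation, and the function names are assumptions; the noise level used in the paper is not stated here.

```python
import numpy as np

RING_RADIUS_MM = 72.0
N_ELEMENTS = 512

def ring_element_positions(n: int = N_ELEMENTS, radius: float = RING_RADIUS_MM) -> np.ndarray:
    """(n, 2) array of element (x, y) positions in mm, uniformly spaced on a ring.

    Element 0 is placed at angle 0; the real array's starting angle is an assumption.
    """
    theta = 2.0 * np.pi * np.arange(n) / n
    return np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)

def add_gaussian_noise(pressure: np.ndarray, sigma: float, seed=None) -> np.ndarray:
    """Return a copy of `pressure` with i.i.d. zero-mean Gaussian noise of std `sigma`."""
    rng = np.random.default_rng(seed)
    return pressure + rng.normal(0.0, sigma, size=np.shape(pressure))

def grid_extent_mm(n_pixels: int, pixel_mm: float) -> float:
    """Side length of a square grid in mm."""
    return n_pixels * pixel_mm
```

Both grid configurations cover the same field of view, 512 × 0.32 mm = 1024 × 0.16 mm = 163.84 mm per side, which encloses the 144 mm diameter ring.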
keywords:
Medical imaging; Photoacoustic computed tomography; Numerical phantom; Joint reconstruction
published:
2022-03-09
Rapti, Zoi; Rivera Quinones, Vanessa; Stewart Merrill, Tara
(2022)
MATLAB files for the analysis of an ODE model of disease transmission. The code can be used to find equilibrium points, study transient dynamics, evaluate the basic reproductive number (R0), and simulate the model when parameters depend on the independent variables. In addition, the code can be used to perform a local sensitivity analysis of R0 with respect to the model parameters.
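Local sensitivity of R0 is commonly summarized by the normalized forward sensitivity index Υ_p = (∂R0/∂p)·(p/R0), the approximate percentage change in R0 per 1% change in parameter p. The sketch below estimates it with a central finite difference. It is written in Python rather than MATLAB, uses a hypothetical toy R0 = beta/gamma purely for illustration (the model and R0 expression in these files are not reproduced here), and assumes this index form; the files may compute sensitivity differently.

```python
def normalized_sensitivity(r0_func, params: dict, name: str, rel_step: float = 1e-6) -> float:
    """Estimate (dR0/dp) * p / R0 for parameter `name` by central differences."""
    p0 = params[name]
    h = rel_step * abs(p0) if p0 != 0 else rel_step
    up = dict(params, **{name: p0 + h})
    down = dict(params, **{name: p0 - h})
    d_r0 = (r0_func(up) - r0_func(down)) / (2.0 * h)
    return d_r0 * p0 / r0_func(params)

def toy_r0(p: dict) -> float:
    """Hypothetical example only: R0 = beta / gamma."""
    return p["beta"] / p["gamma"]
```

For the toy R0, the index is +1 for beta and −1 for gamma: a 1% increase in beta raises R0 by about 1%, and a 1% increase in gamma lowers it by about 1%.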
published:
2025-10-10
Clark, Teresa J.; Schwender, Jorg
(2025)
Upregulation of triacylglycerols (TAGs) in vegetative plant tissues such as leaves has the potential to drastically increase the energy density and biomass yield of bioenergy crops. In this context, constraint-based analysis holds promise for improving metabolic engineering strategies. Here we present a core metabolism model for the C4 biomass crop Sorghum bicolor (iTJC1414) along with a minimal model for photosynthetic CO2 assimilation, sucrose and TAG biosynthesis in C3 plants. Extending iTJC1414 to a four-cell diel model, we simulate C4 photosynthesis in mature leaves with the principal photo-assimilatory product being replaced by TAG produced at different levels. Independent of specific pathways and per unit carbon assimilated, energy content and biosynthetic demands in reducing equivalents are about 1.3 to 1.4 times higher for TAG than for sucrose. For plant generic pathways, ATP and NADPH demands per CO2 assimilated are higher by 1.3- and 1.5-fold, respectively. If the photosynthetic supply in ATP and NADPH in iTJC1414 is adjusted to be balanced for sucrose as the sole photo-assimilatory product, overproduction of TAG is predicted to cause a substantial surplus in photosynthetic ATP. This means that if TAG synthesis were the sole photo-assimilatory process, there could be an energy imbalance that might impede the process. Adjusting iTJC1414 to a photo-assimilatory rate that approximates field conditions, we predict possible daily rates of TAG accumulation, depending on varying ratios of carbon partitioning between exported assimilates and accumulated oil droplets (TAG, oleosin) and on the activation of futile cycles of TAG synthesis and degradation. We find that, based on the capacity of leaves for photosynthetic synthesis of exported assimilates, mature leaves should be able to reach a 20% level of TAG per dry weight within one month if only 5% of the photosynthetic net assimilation can be allocated into oil droplets.
From this we conclude that high TAG levels should be achievable if TAG synthesis is induced only during a final phase of the plant life cycle.
keywords:
Feedstock Production; Modeling
published:
2021-11-23
Riemer, Nicole; Yao, Yu; Dawson, Matthew; Dabdub, Donald
(2021)
This dataset contains simulation results from PartMC-MOSAIC-CAPRAM used in the article “Evaluating the impacts of cloud processing on resuspended aerosol particles after cloud evaporation using a particle-resolved model”.
In this V2, there are eight folders: one for the urban plume simulation, which provides the initial particle population for cloud processing; four for the four simulated cloud cycles; two for the coagulation cases; and one for the low-polluted case. The urban plume folder contains 25 NetCDF files, output hourly from the PartMC-MOSAIC simulations, containing the gas and particle information. Within the four cloud-cycle folders, there are 25 subdirectories that contain the cloud processing results for the aerosol population from the urban plume environment. Each subdirectory contains 31 NetCDF files, output every minute from the PartMC-MOSAIC-CAPRAM simulations, containing aerosol and gas information after aqueous chemistry. The two coagulation folders cover the cases considering Brownian coagulation and sedimentation coalescence; each contains 93 NetCDF files, produced by repeating the 30-minute simulations three times to account for the randomness of coagulation. The low-polluted case folder includes the simulated cloud processing results for 25 urban plume cases with lower aerosol number concentrations. This dataset was used to investigate the effects of cloud processing on aerosol mixing state and CCN properties.
keywords:
cloud process; coagulation; aqueous chemistry; aerosol mixing state; CCN
published:
2020-11-05
Miller, Andrew; Raudabaugh, Daniel
(2020)
This version 2 dataset contains 34 files in total, including one additional file called "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining 33 files are identical to version 1. Information about the new file and its variables follows:
<b>Culture-dependent Isolate table with taxonomic determination and sequence data.csv</b>: Culture table with taxonomy assigned from NCBI. A single-direction sequence for each isolate is included if one could be obtained. Sequences are derived from ITS1F-ITS4 PCR amplicons, with Sanger sequencing in one direction using ITS5. The file contains 20 variables, explained below:
IsolateNumber: unique number identifying each cultured isolate
Time: season in which the sample was collected
Location: the specific name of the location
Habitat: type of habitat, either stream or peatland
State: state in the USA in which the specific location is located
Incubation_pH ID: pH of the medium during isolation of fungal cultures
Genus: phylogenetic genus of the fungal isolates (determined by sequence similarity)
Sequence_quality: base call quality of the entire sequence used for blast analysis, if known
%_coverage: sequence coverage reported from GenBank
%_ID: sequence similarity reported from GenBank
Life_style: ecological life style, if known
Phylum: phylogenetic phylum as indicated by Index Fungorum
Subphylum: phylogenetic subphylum as indicated by Index Fungorum
Class: phylogenetic class as indicated by Index Fungorum
Subclass: phylogenetic subclass as indicated by Index Fungorum
Order: phylogenetic order as indicated by Index Fungorum
Family: phylogenetic family as indicated by Index Fungorum
ITS5_Sequence: single-direction sequence (primer ITS5) used for sequence similarity matching with blastn
Fasta: sequence with nomenclature in a fasta format for easy cut and paste into phylogenetic software
Note: blank cells mean no data is available or unknown.
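The Fasta column is intended for pasting into phylogenetic software. A minimal standard-library sketch that collects those cells into a single FASTA text, skipping blank cells, with an optional %_ID cutoff. It assumes each Fasta cell already holds a complete record (header line plus sequence) and that %_ID is a plain number, possibly followed by "%"; the function name is hypothetical, and the 97 in the usage note is an illustrative cutoff, not one used by the authors.

```python
import csv
import io
from typing import Optional

def fasta_from_isolate_table(csv_text: str, min_identity: Optional[float] = None) -> str:
    """Concatenate the Fasta column into one FASTA string, skipping blank cells.

    If `min_identity` is given, isolates whose %_ID is blank or below it are skipped.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        fasta = (row.get("Fasta") or "").strip()
        if not fasta:
            continue  # blank cells mean no data is available
        if min_identity is not None:
            identity = (row.get("%_ID") or "").strip().rstrip("%")
            if not identity or float(identity) < min_identity:
                continue
        records.append(fasta)
    return "".join(record + "\n" for record in records)
```

For example, `fasta_from_isolate_table(text, min_identity=97)` would keep only isolates whose reported GenBank similarity is at least 97%.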
keywords:
ITS1 forward reads; Illumina; peatlands; streams; bogs; fens