Dataset Search


Illinois Data Bank Dataset Search Results

Results

published: 2019-02-26

Data for: Lipid heterogeneity between astrocytes and neurons revealed with single cell MALDI MS supervised by immunocytochemical classification

Neumann, Elizabeth; Comi, Troy; Rubakhin, Stanislav; Sweedler, Jonathan (2019)

We recently developed an approach for high-throughput single-cell measurements using matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS) (J Am Soc Mass Spectrom. 2017, 28, 1919-1928. doi: 10.1007/s13361-017-1704-1. Chemphyschem. 2018, 19, 1180-1191. doi: 10.1002/cphc.201701364). While chemical detail is obtained on individual cells, it has not been possible to correlate the chemical information with canonical cell types. Now we combine high-throughput single-cell mass spectrometry with immunocytochemistry to determine the lipid profiles of two known cell types from the rodent brain, astrocytes and neurons; the work appears as “Lipid heterogeneity between astrocytes and neurons revealed with single cell MALDI MS supervised by immunocytochemical classification” (DOI: 10.1002/anie.201812892). Here we provide the data collected for this study: the raw data and script files for the rodent cerebral cells described in the manuscript.

keywords: Single cell analysis; mass spectrometry; astrocyte; neuron; lipid analysis

published: 2019-06-12

Supplemental data sets for Raudabaugh et al., Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi

Miller, Andrew; Raudabaugh, Daniel (2019)

The data set contains supplemental data sets for the manuscript entitled "Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi."

Environmental sampling: Amplification of the nuclear DNA regions ITS1 and ITS2 was performed using the Fluidigm Access Array, and the resulting amplicons were sequenced on Illumina MiSeq v2 runs using rapid 2 × 250 nt paired-end reads. Amplicons for the Illumina sequencing run were size-selected into <500 nt and >500 nt sub-pools, then remixed by nM concentration in a 1x:3x (<500 nt : >500 nt) proportion. All amplification and sequencing steps were performed at the Roy J. Carver Biotechnology Center at the University of Illinois Urbana-Champaign. ITS1 region primers consisted of ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3'). ITS2 region primers consisted of fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3').

Supplemental files 1 through 5 contain the raw data files. Supplemental 1 contains the ITS1 Illumina MiSeq forward reads and Supplemental 2 the corresponding index files. Supplemental 3 contains the ITS2 Illumina MiSeq forward reads and Supplemental 4 the corresponding index files. Supplemental 5 is the map file needed to process the forward reads and index files in QIIME. Supplementals 6 and 7 contain the resulting QIIME 1.9.1 OTU tables, along with UNITE, NCBI, and CONSTAX taxonomic assignments and the representative OTU sequences. Numeric samples within the OTU tables correspond to the following:

1. Brachythecium sp.
2. Usnea cornuta
3. Dicranum sp.
4. Leucodon julaceus
5. Lobaria quercizans
6. Rhizomnium sp.
7. Dicranum sp.
8. Thuidium delicatulum
9. Myelochroa aurulenta
10. Atrichum angustatum
11. Dicranum sp.
12. Hypnum sp.
13. Atrichum angustatum
14. Hypnum sp.
15. Thuidium delicatulum
16. Leucobryum sp.
17. Polytrichum commune
18. Atrichum angustatum
19. Atrichum angustatum
20. Atrichum crispulum
21. Bryaceae
22. Leucobryum sp.
23. Conocephalum conicum
24. Climacium americanum
25. Atrichum angustatum
26. Huperzia serrata
27. Polytrichum commune
28. Diphasiastrum sp.
29. Anomodon attenuatus
30. Bryoandersonia sp.
31. Polytrichum commune
32. Thuidium delicatulum
33. Brachythecium sp.
34. Leucobryum glaucum
35. Bryoandersonia sp.
36. Anomodon attenuatus
37. Pohlia sp.
38. Cinclidium sp.
39. Hylocomium splendens
40. Polytrichum commune
41. negative control
42. Soil
43. Soil
44. Soil
45. Soil
46. Soil
47. Soil

If a sample number is not present within an OTU table, either no sequences were obtained or no sequences passed the quality filtering step in QIIME. Supplemental 8 contains the summary of unique species per location.
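Because an absent sample number carries meaning (no sequences, or none passed quality filtering), it can help to list the missing numbers programmatically. The sketch below assumes the OTU tables are QIIME 1 "classic" tab-delimited text with a header line beginning `#OTU ID` whose sample columns are the numeric IDs 1-47; the function names are hypothetical and the real files should be checked first.

```python
# Sketch: list sample numbers (1-47) that have no column in a QIIME 1.x
# classic OTU table. ASSUMPTION: tab-delimited text with a "#OTU ID" header
# row; non-numeric columns such as "taxonomy" are ignored.
import csv
import io


def sample_columns(otu_table_text: str) -> set[int]:
    """Return the numeric sample IDs found in the '#OTU ID' header row."""
    for row in csv.reader(io.StringIO(otu_table_text), delimiter="\t"):
        if row and row[0] == "#OTU ID":
            return {int(col) for col in row[1:] if col.strip().isdigit()}
    raise ValueError("no '#OTU ID' header line found")


def missing_samples(otu_table_text: str, expected: range = range(1, 48)) -> list[int]:
    """Sample numbers absent from the table (no sequences, or none passed QC)."""
    present = sample_columns(otu_table_text)
    return [n for n in expected if n not in present]
```

The returned numbers can then be looked up in the sample list above to see which hosts or soil samples dropped out.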

published: 2019-07-29

Data from TRACTION: Fast non-parametric improvement of estimated gene trees

Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2019)

Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.

keywords: Gene tree correction; horizontal gene transfer; incomplete lineage sorting

published: 2020-06-03

Ecological niche models of Late Pleistocene human land preference: an Australasian test case

Zachwieja, Alexandra (2020)

This dataset provides files for use in analysis of human land preference across Australasia, and in a localized analysis of land preference in Laos and Vietnam. All files can be imported into ArcGIS for visualization, and re-analyzed using the open source Maxent species distribution modeling program. CSV files contain known human presence sites for model validation. ASC files contain geographically coded environmental data for mean annual temperature and mean annual precipitation during the Last Glacial Maximum, as well as downward slope data. All ASC files are in the WGS 1984 Mercator map projection for visualization in ArcGIS and can be opened as text files in text editors supporting large file sizes.

keywords: human dispersal; ecological niche modeling; Australasia; Late Pleistocene; land preference
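Since the ASC files can be opened as plain text, they can also be read without GIS software. The following is a minimal sketch assuming the common ESRI ASCII grid layout (a header of `ncols`, `nrows`, `xllcorner`/`xllcenter`, `yllcorner`/`yllcenter`, `cellsize`, optional `NODATA_value`, then rows listed north to south); the header of each file should be checked against this assumption.

```python
# Sketch: parse an ESRI ASCII grid (.asc) into a header dict and a 2-D list.
# ASSUMPTION: standard ESRI layout with "key value" header lines followed by
# whitespace-separated cell values, first data row = northernmost row.


def read_asc(text: str):
    lines = text.splitlines()
    header = {}
    i = 0
    # Header lines are "key value" pairs whose key starts with a letter.
    while i < len(lines) and lines[i].strip() and lines[i].strip()[0].isalpha():
        key, value = lines[i].split()[:2]
        header[key.lower()] = float(value)
        i += 1
    nodata = header.get("nodata_value")
    values = [float(v) for line in lines[i:] for v in line.split()]
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")
    # Reshape into rows and replace NODATA cells with None.
    grid = [
        [None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
        for r in range(nrows)
    ]
    return header, grid
```

For the very large files mentioned above, reading line by line instead of loading the whole text would reduce memory use; the parsing logic stays the same.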

published: 2023-09-01

Farmers’ knowledge, attitudes, and prevention practices regarding ticks and tickborne diseases in Illinois

Chakraborty, Sulagna; Steckler, Teresa; Gronemeyer, Peg; Mateus-Pinilla, Nohra; Smith, Rebecca (2023)

An online and paper knowledge, attitudes, and practices (KAP) survey on ticks and tick-borne diseases (TBD) was distributed to farmers in Illinois from summer 2020 to spring 2022 (the paper version is titled Final Draft Farmer KAP_v.SoftCopy_Revised.docx). These are the raw data associated with that survey and the survey questions used (FarmerTickKAPdata.csv; the data dictionary is in Data Description.docx). We have added calculated values (column 286 to the end; the calculation code is in FarmerKAPvariableCalculation.R), including the tick knowledge score, TBD knowledge score, and total knowledge score, each of which is the number of correct answers in its category, and the score percents, which are the proportions of correct answers in each category.

keywords: ticks; survey; tick-borne disease; farmer
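The score definitions can be illustrated with a short sketch. The question keys, answer key, and category grouping below are hypothetical (the real items are in Data Description.docx), and the authoritative calculation remains FarmerKAPvariableCalculation.R; here the proportion is reported as a percentage.

```python
# Sketch of the knowledge-score definitions: score = number of correct answers
# in a category; percent = 100 * proportion of that category answered correctly.
# HYPOTHETICAL answer key and question IDs, for illustration only.

ANSWER_KEY = {
    "tick": {"q1": "yes", "q2": "no", "q3": "yes"},
    "tbd": {"q4": "no", "q5": "yes"},
}


def knowledge_scores(response: dict) -> dict:
    scores = {}
    total_correct = total_questions = 0
    for category, key in ANSWER_KEY.items():
        # Unanswered questions (missing keys) count as incorrect.
        correct = sum(response.get(q) == answer for q, answer in key.items())
        scores[f"{category}_score"] = correct
        scores[f"{category}_percent"] = 100 * correct / len(key)
        total_correct += correct
        total_questions += len(key)
    scores["total_score"] = total_correct
    scores["total_percent"] = 100 * total_correct / total_questions
    return scores
```

Whether unanswered items count as incorrect or are excluded is an assumption here; the R script should be consulted for the treatment used in the dataset.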

published: 2021-06-17

Model output from the Weather Research and Forecasting model with water vapor tracers over Amazon and La Plata river basins

Dominguez, Francina; Yang, Zhao (2021)

Model output dataset (6-hourly) from Weather Research and Forecasting (WRF) model simulations over South America, with the added capability of water vapor tracers to track moisture that originates over the Amazon and La Plata river basins. The simulations were performed for the period 2003-2013 at 20-km horizontal resolution, fully coupled with the Noah-MP land surface model. A limited number of the original output variables, sufficient for reproducing the analyses in papers that cite this dataset, are included here. The attached wrfout_southamerica_readme.txt contains detailed information about the file format and variables. For the complete model dataset, contact francina@illinois.edu.

keywords: WRF; Amazon; La Plata; South America; Numerical tracers
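When checking that a set of 6-hourly files is complete, it can help to enumerate the expected output times in WRF's `Times` string format (`YYYY-MM-DD_HH:MM:SS`). This sketch assumes output from 2003-01-01 00 UTC through 2013-12-31 18 UTC; the actual start, end, and file splitting are documented in wrfout_southamerica_readme.txt.

```python
# Sketch: enumerate expected 6-hourly WRF output times as "Times" strings.
# ASSUMPTION: the run spans 2003-01-01_00:00:00 to 2013-12-31_18:00:00.
from datetime import datetime, timedelta


def wrf_times(start: datetime, end: datetime, step_hours: int = 6) -> list[str]:
    times = []
    t = start
    while t <= end:  # inclusive of the final step
        times.append(t.strftime("%Y-%m-%d_%H:%M:%S"))
        t += timedelta(hours=step_hours)
    return times


expected = wrf_times(datetime(2003, 1, 1, 0), datetime(2013, 12, 31, 18))
```

Under that assumption there are 4,018 days (11 years including three leap days) and therefore 16,072 six-hourly steps.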

published: 2019-07-08

Datasets from "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents"

Kehoe, Adam K.; Torvik, Vetle I. (2019)

# Overview

These datasets were created in conjunction with the dissertation "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents," by Adam Kehoe. The datasets consist of the following:

* twin_not_abstract_matched_complete.tsv: a tab-delimited file of pairs of MEDLINE articles with identical titles, authors, and years of publication. This file contains the PMIDs of the duplicate publications, their Medical Subject Headings (MeSH), and three measures of their indexing consistency.
* twin_abstract_matched_complete.tsv: the same as above, except that the MEDLINE articles also have matching abstracts.
* mesh_training_data.csv: a comma-separated file containing the training data for the model discussed in the dissertation.
* mesh_scores.tsv: a tab-delimited file containing pairwise similarity scores between MeSH terms based on word embeddings, along with each pair's MeSH hierarchy relationship.

## Duplicate MEDLINE Publications

Both twin_not_abstract_matched_complete.tsv and twin_abstract_matched_complete.tsv have the same structure, with the following columns:

1. pmid_one: the PubMed unique identifier of the first paper
2. pmid_two: the PubMed unique identifier of the second paper
3. mesh_one: a list of Medical Subject Headings (MeSH) from the first paper, delimited by the "|" character
4. mesh_two: a list of Medical Subject Headings from the second paper, delimited by the "|" character
5. hoopers_consistency: Hooper's consistency between the MeSH of the first and second paper
6. nonhierarchicalfree: a word-embedding-based consistency score described in the dissertation
7. hierarchicalfree: a word-embedding-based consistency score additionally limited by the MeSH hierarchy, described in the dissertation

## MeSH Training Data

The mesh_training_data.csv file contains the training data for the model discussed in the dissertation. It has the following columns:

1. pmid: the PubMed unique identifier of the paper
2. term: a candidate MeSH term
3. cit_count: the log of the frequency of the term in the citation candidate set
4. total_cit: the log of the total number of the paper's citations
5. citr_count: the log of the frequency of the term in the citations of the paper's citations
6. total_citofcit: the log of the total number of citations of the paper's citations
7. absim_count: the log of the frequency of the term in the AbSim candidate set
8. total_absim_count: the log of the total number of AbSim records for the paper
9. absimr_count: the log of the frequency of the term in the citations of the AbSim records
10. total_absimr_count: the log of the total number of citations of the AbSim records
11. log_medline_frequency: the log of the frequency of the candidate term in MEDLINE
12. relevance: a binary indicator (True/False) of whether the candidate term was assigned to the target paper

## Cosine Similarity

The mesh_scores.tsv file contains a pairwise list of all MeSH terms with their cosine similarity based on the word embedding described in the dissertation. Because the MeSH hierarchy is also used in many of the evaluation measures, the relationship of each term pair is also included. It has the following columns:

1. mesh_one: a string of the first MeSH heading
2. mesh_two: a string of the second MeSH heading
3. cosine_similarity: the cosine similarity between the terms
4. relationship_type: a string identifying the relationship type: none, parent/child, sibling, ancestor, or direct (the terms are identical, i.e., a direct hierarchy match)

The mesh_model.bin file is a binary word2vec C format file containing the MeSH term embeddings. It was generated using version 3.7.2 of the Python gensim library (https://radimrehurek.com/gensim/). For an example of how to load the model file, see https://radimrehurek.com/gensim/models/word2vec.html#usage-examples, specifically the directions for loading the "word2vec C format."

keywords: MEDLINE;MeSH;Medical Subject Headings;Indexing
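The hoopers_consistency and cosine_similarity columns can be sanity-checked by recomputing them. The sketch below uses the textbook definition of Hooper's consistency, C = A / (A + M + N), where A counts terms assigned to both records and M and N count terms unique to each; the dissertation's embedding-based scores (nonhierarchicalfree, hierarchicalfree) are not reproduced. Function names are illustrative.

```python
# Sketch: Hooper's consistency between two "|"-delimited MeSH lists, and plain
# cosine similarity between two embedding vectors.
import math


def hoopers_consistency(mesh_one: str, mesh_two: str) -> float:
    a = {t.strip() for t in mesh_one.split("|") if t.strip()}
    b = {t.strip() for t in mesh_two.split("|") if t.strip()}
    common = len(a & b)                      # A: terms on both records
    only_a, only_b = len(a - b), len(b - a)  # M and N: terms on one record only
    denom = common + only_a + only_b
    if denom == 0:
        raise ValueError("both MeSH lists are empty")
    return common / denom


def cosine_similarity(u, v) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm
```

The term vectors themselves could be obtained from mesh_model.bin with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`, following the gensim documentation linked above.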

published: 2023-07-10

Data for Bee movement between natural fragments is rare despite differences in species, patch, and matrix variables

Harmon-Threatt, Alexandra N.; Anderson, Nicholas L. (2023)

Bee movement between habitat patches in a naturally fragmented ecosystem depended on species, patch, and matrix variables. Using a mark-recapture methodology in the naturally fragmented Ozark glade ecosystem, we assessed the importance of bee size, nesting biology, the distance between patches (i.e., isolation), and nesting and floral resources in habitat patches and the surrounding matrix on bee movement. This dataset includes seven data files, three R code files, and a QGIS tool. Three of the data files contain information collected at the study sites on bees and on matrix and patch characteristics. The other four data files are spatial files used to quantify the characteristics of the forest canopy between the study sites and the edge-to-edge distances between the study sites. R code in the R Markdown file recreates the analysis and data presentation for the associated publication. The R script files contain processes for calculating some of the explanatory variables used in the analysis. The QGIS tool can be used as the first step in obtaining average values from a raster file whose cells are large relative to the areas of interest (AOIs) to be characterized. The second step is contained in one of the aforementioned R scripts. Detected effects included the following. Larger bees were more likely to move between patches. Bee movement was less likely as the distance between patches increased; however, relatively short distances (~50 m) inhibited movement more than we expected a priori. Bees were unlikely to move away from home patches with abundant and diverse floral and below-ground nesting resources. When home patches were less resource-rich, bee movement depended on the characteristics of the away patch or the matrix. In these cases, bees were more likely to move to away patches with greater below-ground nesting and floral resources. Matrix habitats with more available floral and below-ground nesting resources appear to impede movement to neighboring patches, potentially because they already provide supplemental resources for bees.

keywords: habitat fragmentation; bees; movement; mark-recapture; nesting resources; floral resources; isolation
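When raster cells are large relative to an AOI, a simple cell lookup is too coarse; one common alternative is an area-weighted mean, in which each overlapping cell contributes in proportion to the area it shares with the AOI. This is a sketch of that general technique, not the published QGIS tool and R script pair. It assumes an axis-aligned rectangular AOI in the raster's coordinate system, square cells, and row 0 as the southernmost row.

```python
# Sketch: area-weighted mean of raster cells overlapping a rectangular AOI.
# ASSUMPTIONS: square cells of side `cell`, grid origin (x0, y0) at the
# south-west corner, values[r][c] with r = 0 as the southern row, None = NODATA.


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def area_weighted_mean(values, x0: float, y0: float, cell: float, aoi) -> float:
    xmin, ymin, xmax, ymax = aoi
    total = total_w = 0.0
    for r, row in enumerate(values):
        for c, v in enumerate(row):
            if v is None:  # skip NODATA cells
                continue
            w = (overlap(xmin, xmax, x0 + c * cell, x0 + (c + 1) * cell)
                 * overlap(ymin, ymax, y0 + r * cell, y0 + (r + 1) * cell))
            total += w * v
            total_w += w
    if total_w == 0:
        raise ValueError("AOI does not overlap any valid cell")
    return total / total_w
```

Irregular AOI polygons would need a polygon-cell intersection instead of the rectangle overlap used here.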

published: 2024-07-08

Microsatellite genotypes and locations for three Physaria taxa on and near the Kaibab Plateau, Arizona, USA

Chong, Jer Pin; Minnaert-Grote, Jamie; Zaya, David N.; Ashley, Mary V.; Coons, Janice; Ramp Neal, Jennifer M.; Molano-Flores, Brenda (2024)

A population genetics study was conducted on three plant taxa in the genus Physaria that are found on the Kaibab Plateau (Arizona, USA). Physaria kingii subsp. kaibabensis is endemic to the Kaibab Plateau and is of conservation concern because of its rarity, limited range, and potential threats to its long-term persistence. The taxon is also a candidate for federal protection under the Endangered Species Act. It was not clear how genetically isolated P. k. subsp. kaibabensis was from Physaria kingii subsp. latifolia, a widespread subspecies found throughout the southwestern USA, including on the Kaibab Plateau. Other authors have also suggested that P. k. subsp. kaibabensis may hybridize with Physaria arizonica, a different species that is also widespread and found on and off the Kaibab Plateau. We conducted a population genetics study of all three groups to better determine the conservation status of P. k. subsp. kaibabensis. Genetic data are in the form of nuclear DNA microsatellites for 13 loci (all apparently diploid). We have also included location information for the collection sites. We collected tissue samples from on and off the Kaibab Plateau. The overall findings are reported in a manuscript being submitted for peer review.

keywords: Physaria kingii; Kaibab Plateau; endemism; conservation genetics; rare species biology
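Because the loci are treated as diploid, standard per-locus summaries such as observed heterozygosity (Ho, the share of heterozygous individuals) and expected heterozygosity (He = 1 - Σ p_i², from allele frequencies p_i) apply directly. This sketch assumes genotypes have already been parsed into allele-size pairs with None for missing data; the dataset's actual column layout may differ, and no small-sample correction is applied.

```python
# Sketch: observed (Ho) and expected (He) heterozygosity at one diploid
# microsatellite locus. ASSUMPTION: genotypes are (allele1, allele2) tuples
# of allele sizes; None marks a missing genotype.
from collections import Counter


def heterozygosity(genotypes):
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotyped individuals")
    ho = sum(a != b for a, b in called) / len(called)
    counts = Counter(allele for g in called for allele in g)
    n_alleles = 2 * len(called)
    he = 1 - sum((n / n_alleles) ** 2 for n in counts.values())
    return ho, he
```

Comparing these values across the three taxa, locus by locus, is one way to get a first look at the data before formal structure or hybridization analyses.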

published: 2018-03-01

Linking landscape composition to predator-specific nest predation requires examining multiple landscape scales

Chiavacci, Scott J.; Benson, Thomas J.; Ward, Michael P. (2018)

Data were used to analyze patterns in predator-specific nest predation on shrubland birds in Illinois as related to landscape composition at multiple landscape scales. Data were used in a Journal of Applied Ecology research paper of the same name. Data were collected between 2011 and 2014 at sites in east-central and northeastern Illinois, USA as part of a Ph.D. research project on the relationship between avian nest predation and landscape characteristics, and how nest predation affects adult and nestling bird behavior.

keywords: nest predation; avian ecology; land cover; landscape composition; landscape scale; nest camera; nest survival; predator-specific mortality; scale-dependence; scrubland; shrub-nesting bird

published: 2019-08-29

Bemisia tabaci ortholog set

de Moya, Robert (2019)

This is the published ortholog set derived from whole genome data used for the analysis of members of the B. tabaci complex of whiteflies. It includes the concatenated alignment and individual gene alignments used for analyses (Link to publication: https://www.mdpi.com/1424-2818/11/9/151).
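A concatenated alignment like the one included here is typically built by joining per-gene alignments taxon by taxon and recording each gene's column range for partitioned analyses. The sketch below illustrates that step under the assumption that each gene alignment is already parsed into a `{taxon: sequence}` mapping; taxa missing from a gene are padded with gaps. It is not the authors' pipeline.

```python
# Sketch: build a supermatrix from per-gene alignments and record partitions.
# ASSUMPTION: alignments = {gene_name: {taxon: aligned_sequence}}.


def concatenate(alignments: dict[str, dict[str, str]]):
    taxa = sorted({t for aln in alignments.values() for t in aln})
    matrix = {t: [] for t in taxa}
    partitions = []
    start = 1  # 1-based column numbering, as in most partition files
    for gene, aln in alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        length = lengths.pop()
        for t in taxa:
            # Pad taxa absent from this gene with gaps.
            matrix[t].append(aln.get(t, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions
```

The partition tuples map directly onto the gene boundaries needed when analyzing the concatenated alignment with per-gene models.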

published: 2024-10-12

Data for "Strain rate controls alignment in growing bacterial monolayers"

Langeslay, Blake; Juarez, Gabriel (2024)

Simulation data used to generate plots in the associated paper ("Strain rate controls alignment in growing bacterial monolayers").

published: 2025-10-08

Data from Arabidopsis Plants Expressing Only the Redox-Regulated Rca-α Isoform Have Constrained Photosynthesis and Plant Growth

Kim, Sang Yeol; Stessman, Dan J.; Wright, David A.; Spalding, Martin H.; Huber, Steven; Ort, Donald (2025)

Rubisco activase (Rca) facilitates the release of sugar‐phosphate inhibitors from the active sites of Rubisco and thereby plays a central role in initiating and sustaining Rubisco activation. In Arabidopsis, alternative splicing of a single Rca gene results in two Rca isoforms, Rca‐α and Rca‐β. Redox modulation of Rca‐α regulates the function of Rca‐α and Rca‐β acting together to control Rubisco activation. Although Arabidopsis Rca‐α alone activates Rubisco less effectively in vitro, it is not known how CO2 assimilation and plant growth are impacted. Here, we show that two independent transgenic Arabidopsis lines expressing Rca‐α in the absence of Rca‐β (“Rca‐α only” lines) grew more slowly than wildtype in various light conditions, especially under low or fluctuating light intensity, and in a short-day photoperiod. Photosynthetic induction was slower in the Rca‐α only lines, and they maintained a lower rate of CO2 assimilation during both photoperiod types. Our findings suggest Rca oligomers composed of Rca‐α only are less effective in initiating and sustaining the activation of Rubisco than when Rca‐β is also present. Currently there are no examples of any plant species that naturally express Rca‐α only but numerous examples of species expressing Rca‐β only. That Rca‐α exists in most plant species, including many C3 and C4 food and bioenergy crops, implies its presence is adaptive under some circumstances.

keywords: Feedstock Production;Biomass Analytics;Phenomics

published: 2025-10-24

Data for Application of Time-Domain 1H NMR for Investigating Dynamics of Vegetative Lipids in Bioenergy Crops at Different Developmental Stages

Maitra, Shraddha; Singh, Vijay (2025)

Sweet sorghum is typically cultivated for the food and fodder markets. Recently, sweet sorghum varieties are being metabolically engineered to increase energy density by accumulating oil droplets in their vegetative tissues for bioenergy applications. Owing to the high biomass yield of sorghum, the transgenic lines can compete with oilseed crops in biodiesel yield per unit area. In the initial phase of transgenic development, a high-throughput phenotyping method can bridge the gap between the production pipeline and analysis, improving the efficiency of the process. To meet this need, the present study extends the application of time-domain 1H NMR (TD-NMR) spectroscopy to the rapid quantification and characterization of the total in situ lipids of sweet sorghum ‘ramada’, laying the groundwork for analyzing the large number of transgenic samples to come. NMR was successfully established for analyzing the lipid content of vegetative tissues of the non-transgenic variety. Multiexponential analysis of spin-lattice (T1) relaxation spectra obtained from TD-NMR enabled investigation of the dynamics of the free and bound lipid fractions during plant development. The total lipid concentration of bagasse and leaves of non-transgenic sweet sorghum remained unchanged throughout plant development. Leaves displayed a higher percentage of bound lipids than bagasse. The lipid concentration of the juice varied significantly across growth stages, with a maximum of 1.21 ± 0.04% w/w at the boot stage that decreased as the plant matured further.

keywords: Conversion;Biomass Analytics;Lipidomics;Metabolomics
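A multiexponential T1 analysis treats the relaxation signal as a sum of components, each with its own amplitude and time constant, so that the relative amplitudes give the fraction of signal in each pool (here, free versus bound lipid). The form below is a sketch assuming a saturation-recovery acquisition; for inversion recovery each factor would instead be (1 - 2e^{-τ/T_{1,i}}). The pulse sequence, the number of components n, and which component corresponds to bound lipid are assumptions, not details stated in this dataset description.

```latex
\begin{aligned}
S(\tau) &= \sum_{i=1}^{n} A_i \left(1 - e^{-\tau / T_{1,i}}\right), \\
f_i &= \frac{A_i}{\sum_{j=1}^{n} A_j}, \qquad i = 1, \dots, n
\end{aligned}
```

With n = 2, the fitted amplitudes A_1 and A_2 yield the free and bound lipid fractions f_1 and f_2, which sum to 1.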

published: 2019-11-12

Data for: How do moral values differ in tweets on social movements?

Rezapour, Rezvaneh (2019)

We are sharing the tweet IDs of four social movements: #BlackLivesMatter, #WhiteLivesMatter, #AllLivesMatter, and #BlueLivesMatter. The tweets were collected between May 1, 2015 and May 30, 2017. We restricted the location to the United States and extracted only original tweets, excluding retweets. Recommended citations for the data: Rezapour, R. (2019). Data for: How do Moral Values Differ in Tweets on Social Movements?. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9614170_V1 and Rezapour, R., Ferronato, P., and Diesner, J. (2019). How do moral values differ in tweets on social movements?. In 2019 Computer Supported Cooperative Work and Social Computing Companion Publication (CSCW’19 Companion), Austin, TX.

keywords: Twitter; social movements; black lives matter; blue lives matter; all lives matter; white lives matter
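Since only tweet IDs are shared, it can be useful to confirm that an ID falls inside the stated collection window without retrieving the tweet. Twitter's snowflake IDs (used since late 2010) store the creation time, in milliseconds since the Twitter epoch 1288834974657, in the bits above the low 22 bits. The helper names below are illustrative.

```python
# Sketch: decode a tweet's creation time from its snowflake ID.
# Layout: (id >> 22) = milliseconds since the Twitter epoch 1288834974657.
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657


def tweet_time_ms(tweet_id: int) -> int:
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def tweet_datetime(tweet_id: int) -> datetime:
    return datetime.fromtimestamp(tweet_time_ms(tweet_id) / 1000, tz=timezone.utc)


def in_window(tweet_id: int, start: datetime, end: datetime) -> bool:
    """True if the tweet was created within [start, end] (timezone-aware)."""
    return start <= tweet_datetime(tweet_id) <= end
```

For example, `in_window(tid, datetime(2015, 5, 1, tzinfo=timezone.utc), datetime(2017, 5, 31, tzinfo=timezone.utc))` checks an ID against the collection period described above.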

published: 2020-10-01

No choice mating trials and two choice mating trials in the polymorphic tortoise beetle, Chelymorpha alternans

Strickland, Lynette (2020)

These datasets come from experiments performed to assess whether color pattern phenotypes of the polymorphic tortoise beetle, Chelymorpha alternans, mate randomly with one another, and whether there are any reproductive differences between assortative and disassortative pairings.

keywords: mate choice; color polymorphisms; random mating

published: 2021-10-15

SABER Intra-annual Data

Swenson, Gary (2021)

SABER atomic oxygen densities in the mesosphere and lower thermosphere (MLT), averaged over 2002-2018 for 26 consecutive 14-day periods beginning January 1.

keywords: SABER data
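To compare other data with these averages, a date must be assigned to one of the 26 fourteen-day periods. Since 26 × 14 = 364 days, December 31 (and December 30 in leap years) falls after period 26. How the dataset treats those leftover days is not stated, so this sketch simply returns None for them; that choice is an assumption.

```python
# Sketch: map a date to its 1-based 14-day averaging period (1-26), with
# periods starting on January 1. Days after day 364 of the year return None
# (ASSUMPTION: the dataset's handling of those days is unknown).
from datetime import date


def period_index(d: date):
    day_of_year = d.timetuple().tm_yday  # 1-based day of year
    index = (day_of_year - 1) // 14 + 1
    return index if index <= 26 else None
```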

published: 2025-04-04

uCite: The union of nine large-scale public PubMed citation datasets with reliability labels

Fang, Liri; Salami, Malik Oyewale; Weber, Griffin M.; Torvik, Vetle I. (2025)

This dataset, uCite, is the union of nine large-scale open-access PubMed citation datasets, separated by reliability. There are 20 files, including the reliable and unreliable citation PMID pairs, mappings from non-PMID identifiers (DOIs, Lens, MAG, and Semantic Scholar) to PMIDs, the original PMID pairs from the nine resources, some metadata for PMIDs, duplicate PMIDs, some redirected PMID pairs, and PMC OA Patci citation matching results. A short description of each data file follows; a detailed description can be found in README.txt.

DATASET DESCRIPTION

1. PPUB.tsv.gz - tsv format file containing the reliable citation pairs of uCite.
2. PUNR.tsv.gz - tsv format file containing the unreliable citation pairs of uCite.
3. DOI2PMID.tsv.gz - tsv format file containing results mapping DOI to PMID.
4. LEN2PMID.tsv.gz - tsv format file containing results mapping LensID pairs to PMID pairs.
5. MAG2PMIDsorted.tsv.gz - tsv format file containing results mapping MAG ID to PMID.
6. SEM2PMID.tsv.gz - tsv format file containing results mapping Semantic Scholar ID to PMID.
7. JVNPYA.tsv.gz - tsv format file containing paper metadata: PMID, journal name, volume, issue, pages, publication year, and first author's last name.
8. TiLTyAlJVNY.tsv.gz - tsv format file containing paper metadata.
9. PMC-OA-patci.tsv.gz - tsv format file containing PubMed Central Open Access subset reference strings extracted by \cite{} and processed by Patci.
10. REDIRECTS.gz - txt file containing unreliable PMID pairs mapped to reliable PMID pairs.
11. REMAP - file containing pairs of duplicate PubMed records (lhs PMID mapped to rhs PMID).
12. ami_pair.tsv.gz - tsv format file containing all citation pairs from Aminer (2015 version).
13. dim_pair.tsv.gz - tsv format file containing all citation pairs from Dimensions.
14. ice_pair.tsv.gz - tsv format file containing all citation pairs from iCite (April 2019 version, version 1).
15. len_pair.tsv.gz - tsv format file containing all citation pairs from Lens.org (harvested through Oct 2021).
16. mag_pair.tsv.gz - tsv format file containing all citation pairs from Microsoft Academic Graph (2015 version).
17. oci_pair.tsv.gz - tsv format file containing all citation pairs from Open Citations (Nov. 2021 dump, csv version).
18. pat_pair.tsv.gz - tsv format file containing all citation pairs from Patci (i.e., from "PMC-OA-patci.tsv.gz").
19. pmc_pair.tsv.gz - tsv format file containing all citation pairs from PubMed Central (harvested through Dec 2018 via E-utilities).
20. sem_pair.tsv.gz - tsv format file containing all citation pairs from Semantic Scholar (2019 version).

COLUMN DESCRIPTION

FILENAME : PPUB.tsv.gz, PUNR.tsv.gz
(1) fromPMID - PubMed ID of the citing paper.
(2) toPMID - PubMed ID of the cited paper.
(3) sources - citation sources in which the citation pair was identified.
(4) fromYEAR - Publication year of the citing paper.
(5) toYEAR - Publication year of the cited paper.

FILENAME : DOI2PMID.tsv.gz
(1) DOI - Digital Object Identifier of paper records.
(2) PMID - PubMed ID of paper records.
(3) PMID2 - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs.

FILENAME : SEMID2PMID.tsv.gz
(1) SemID - Semantic Scholar ID of paper records.
(2) PMID - PubMed ID of paper records.
(3) DOI - Digital Object Identifier of paper records, “-” if the paper doesn't have a DOI.

FILENAME : JVNPYA.tsv.gz - Each row refers to a publication record.
(1) PMID - PubMed ID.
(2) journal - Journal name.
(3) volume - Journal volume.
(4) issue - Journal issue.
(5) pages - First and last page numbers of the publication (last page without leading digits), separated by '-'.
(6) year - Publication year.
(7) lastname - Last name of the first author.

FILENAME : TiLTyAlJVNY.tsv.gz
(1) PMID - PubMed ID.
(2) title_tokenized - Paper title after tokenization.
(3) languages - Language(s) the paper is written in.
(4) pub_types - Publication types.
(5) length(authors) - String length of author names.
(6) journal - Journal name.
(7) volume - Journal volume.
(8) issue - Journal issue.
(9) year - Publication year of the print version (not necessarily the epub year).

FILENAME : PMC-OA-patci.tsv.gz
(1) pmcid - PubMed Central identifier.
(2) pos -
(3) fromPMID - PubMed ID of the citing paper.
(4) toPMID - PubMed ID of the cited paper.
(5) SRC - citation sources in which the citation pair was identified.
(6) MatchDB - Matched database: PubMed, ADS, or DBLP.
(7) Probability - Matching probability predicted by Patci.
(8) toPMID2 - PubMed ID of the cited paper, extracted from the OA XML file.
(9) SRC2 - citation sources in which the citation pair was identified.
(10) intxt_id -
(11) jounal - First character of the journal name.
(12) same_ref_string - Y if the Patci and XML reference strings match, otherwise N.
(13) DIFF -
(14) bestSRC - citation sources in which the citation pair was identified.
(15) Match - Matching strings annotated by Patci.

FILENAME : REDIRECTS.gz
Each row in the REDIRECTS file is a string in one of the following formats:
- "REDIRECTED FROM: source PMID_i PMID_j -> PMID_i' PMID_j "
- "REDIRECTED TO: source PMID_i PMID_j -> PMID_i PMID_j' "
Note: source is the name(s) of the sources from which PMID_i and PMID_j come.

FILENAME : REMAP
Each row maps one PMID of a duplicate pair (lhs) to the other (rhs). The format of each row is "$REMAP{PMID_i} = PMID_j".

FILENAME : ami_pair.tsv.gz, dim_pair.tsv.gz, ice_pair.tsv.gz, len_pair.tsv.gz, mag_pair.tsv.gz, oci_pair.tsv.gz, pat_pair.tsv.gz, pmc_pair.tsv.gz, sem_pair.tsv.gz
(1) fromPMID - PubMed ID of the citing paper.
(2) toPMID - PubMed ID of the cited paper.

keywords: Citation data; PubMed; Social Science;
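A typical first step with uCite is to stream citation pairs from one of the gzipped TSV files and normalize duplicate PMIDs using REMAP. This sketch assumes the pair files have a header row naming `fromPMID` and `toPMID`, and that REMAP is plain text with one `$REMAP{PMID_i} = PMID_j` entry per row; both assumptions should be confirmed against README.txt.

```python
# Sketch: read citation pairs from a gzipped uCite TSV and apply REMAP.
# ASSUMPTIONS: header row with fromPMID/toPMID columns; REMAP rows look like
# "$REMAP{PMID_i} = PMID_j" (a trailing ";" is tolerated).
import csv
import gzip
import re

REMAP_LINE = re.compile(r"\$REMAP\{(\d+)\}\s*=\s*(\d+)")


def parse_remap(lines) -> dict[int, int]:
    remap = {}
    for line in lines:
        m = REMAP_LINE.search(line)
        if m:
            remap[int(m.group(1))] = int(m.group(2))
    return remap


def read_pairs(path: str, remap=None):
    """Yield (fromPMID, toPMID) integer pairs, remapping duplicate PMIDs."""
    remap = remap or {}
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            src, dst = int(row["fromPMID"]), int(row["toPMID"])
            yield remap.get(src, src), remap.get(dst, dst)
```

Streaming with a generator keeps memory use low for the large pair files; the `sources`, `fromYEAR`, and `toYEAR` columns of PPUB and PUNR remain available in `row` if needed.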

published: 2018-06-18

Population genetic structure of Miscanthus sacchariflorus

Clark, Lindsay V.; Jin, Xiaoli; Petersen, Karen K.; Anzoua, Kossanou G.; Bagmet, Larissa; Chebukin, Pavel; Deuter, Martin; Dzyubenko, Elena; Dzyubenko, Nicolay; Heo, Kweon; Johnson, Douglas A.; Jørgensen, Uffe; Kjeldsen, Jens B.; Nagano, Hironori; Peng, Junhua; Sabitov, Andrey; Yamada, Toshihiko; Yoo, Ji Hye; Yu, Chang Yeon; Long, Stephen P.; Sacks, Erik J. (2018)

This repository contains datasets and R scripts that were used in a study of the population structure of Miscanthus sacchariflorus in its native range across East Asia. Notably, genotypes of 764 individuals at 34,605 SNPs, called from reduced-representation DNA sequencing using a non-reference bioinformatics pipeline, are provided. Two similar SNP datasets are also provided: one used for identifying clonal duplicates, and one used for determining the ancestry of ornamental and hybrid Miscanthus plants identified in previous studies. There is also a spreadsheet listing the provenance and ploidy of all individuals along with their plastid (chloroplast) haplotypes. Software output for Structure, Treemix, and DIYABC is also included. See README.txt for more information about individual files. Results of this study are described in a manuscript in revision in Annals of Botany by the same authors, "Population structure of Miscanthus sacchariflorus reveals two major polyploidization events, tetraploid-mediated unidirectional introgression from diploid Miscanthus sinensis, and diversity centered around the Yellow Sea."

keywords: Miscanthus; restriction site-associated DNA sequencing (RAD-seq); single nucleotide polymorphism (SNP); population genetics; Miscanthus × giganteus; Miscanthus sacchariflorus; R scripts; germplasm; plastid haplotype
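Identifying clonal duplicates from SNP data generally means flagging sample pairs whose genotypes are nearly identical at the SNPs both samples were called at. This sketch is hypothetical: it codes genotypes as alternate-allele fractions in [0, 1] so that diploid and tetraploid individuals share one scale, uses None for missing calls, and applies an illustrative 0.05 threshold. The study's own procedure is in its R scripts.

```python
# Sketch: flag putative clonal duplicates by mean absolute genotype difference
# over SNPs called in both samples. HYPOTHETICAL coding and threshold.
from itertools import combinations


def genetic_distance(g1, g2) -> float:
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return float("nan")
    return sum(abs(a - b) for a, b in shared) / len(shared)


def clonal_pairs(genotypes: dict[str, list], threshold: float = 0.05):
    pairs = []
    for (s1, g1), (s2, g2) in combinations(sorted(genotypes.items()), 2):
        d = genetic_distance(g1, g2)
        # NaN (no shared SNPs) compares False, so such pairs are skipped.
        if d <= threshold:
            pairs.append((s1, s2, d))
    return pairs
```

In practice the threshold would be set from the distance distribution among known technical replicates rather than fixed in advance.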

published: 2020-05-15

Trained models for multi-task multi-dataset learning for sequence prediction in tweets - Old Experiments

Mishra, Shubhanshu (2020)

Trained models for multi-task multi-dataset learning for sequence prediction in tweets. Tasks include POS, NER, Chunking, and SuperSenseTagging. Models were trained using https://github.com/napsternxg/SocialMediaIE/blob/master/experiments/multitask_multidataset_experiment.py. See https://github.com/napsternxg/SocialMediaIE for details.

keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning;

published: 2022-05-13

Data files for phylogenetic analysis of Typhlocybinae (Hemiptera: Cicadellidae)

Yan, Bin; Dietrich, Christopher; Yu, Xiaofei; Dai, Renhuai; Maofa, Yang (2022)

The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files containing aligned sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of the first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software packages. For more information on data file content, see the included "readme" files.

keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
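Because the .phy matrices in the Typhlocybinae dataset above are distributed in PHYLIP format, the following minimal Python sketch shows one way such a file could be loaded into a name-to-sequence dictionary. It assumes the relaxed sequential variant (a header line giving the number of taxa and sites, then one line per taxon with the name, whitespace, and the full sequence); strict PHYLIP with fixed 10-character names and interleaved layouts need different handling, so the included readme files should be checked for the actual layout. The function name `read_sequential_phylip` is hypothetical.

```python
def read_sequential_phylip(text):
    """Parse a relaxed, sequential PHYLIP alignment (hypothetical helper).

    Assumes one taxon per line: a name, whitespace, then the sequence.
    Spaces inside the sequence are ignored.
    """
    # Skip blank lines so trailing newlines do not matter.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header line: number of taxa, number of sites.
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    alignment = {}
    for line in lines[1:1 + n_taxa]:
        name, seq = line.split(None, 1)
        seq = "".join(seq.split())  # drop any internal spacing
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        alignment[name] = seq
    # Catches truncated files and duplicated taxon names.
    if len(alignment) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, got {len(alignment)}")
    return alignment


example = """3 8
taxonA ACGT-ACG
taxonB ACGTTACG
taxonC AC-TTACG
"""
aln = read_sequential_phylip(example)
```

Checking the declared dimensions against the parsed rows is a cheap way to detect a mismatch between the assumed layout and the real file before any downstream analysis.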

published: 2025-12-08

Data for The Leaf Economics Spectrum of Triploid and Tetraploid C4 Grass Miscanthus x giganteus

Li, Shuai; Moller, Christopher; Mitchell, Noah G.; Martin, Duncan; Sacks, Erik; Saikia, Sampurna; Labonte, Nicholas R.; Baldwin, Brian S.; Morrison, Jesse; Ferguson, John; Leakey, Andrew; Ainsworth, Elizabeth (2025)

The leaf economics spectrum (LES) describes multivariate correlations in leaf structural, physiological and chemical traits, originally based on diverse C3 species grown in natural ecosystems. However, the specific contribution of C4 species to the global LES has been less widely studied. C4 species have a CO2-concentrating mechanism which drives high rates of photosynthesis and improves resource use efficiency, potentially pushing them towards the edge of the LES. Here, we measured foliage morphology, structure, photosynthesis, and nutrient content for hundreds of genotypes of the C4 grass Miscanthus × giganteus grown in two common gardens over two seasons. We show substantial trait variation across M. × giganteus genotypes and robust genotypic trait relationships. Compared to the global LES, M. × giganteus genotypes had higher photosynthetic rates, lower stomatal conductance, and lower nitrogen content, indicating greater water and photosynthetic nitrogen use efficiency in the C4 species. Additionally, tetraploid genotypes produced thicker leaves with greater leaf mass per area and lower leaf density than triploid genotypes. By expanding the LES relationships across C3 species to include C4 crops, these findings highlight that M. × giganteus occupies the boundary of the global LES and suggest the potential for ploidy to alter LES traits.

keywords: Feedstock Production; Biomass Analytics; Field Data

published: 2019-03-13

Spatial Conservation and Investment Portfolios to Manage Climate-Related Risk

Ando, Amy; Fraterrigo, Jennifer; Guntenspergen, Glenn; Howlader, Aparna; Mallory, Mindy; Olker, Jennifer; Stickley, Samuel (2019)

keywords: climate change; conservation; diversification; environmental investments; MPT; portfolio; risk; uncertainty

published: 2025-09-15

Data from Sugar Production from Bioenergy Sorghum by Using Pilot-Scale Continuous Hydrothermal Pretreatment Combined with Disk Refining

Cheng, Ming-Hsun; Dien, Bruce; Lee, D. K.; Singh, Vijay (2025)

Chemical-free pretreatments are attracting increased interest because they generate fewer inhibitors in hydrolysates. In this study, pilot-scale continuous hydrothermal (PCH) pretreatment followed by disk refining was evaluated and compared to laboratory-scale batch hot water (LHW) pretreatment. Bioenergy sorghum bagasse (BSB) was pretreated at 160-190 °C for 10 min with and without subsequent disk milling. Hydrothermal pretreatment and disk milling synergistically improved glucose and xylose release by 10-20% compared to hydrothermal pretreatment alone. Maximum glucose and xylose yields of 82.55% and 70.78%, respectively, were achieved when BSB was pretreated at 190 °C and 180 °C followed by disk milling. LHW-pretreated BSB had 5-15% higher sugar yields than PCH-pretreated BSB under all pretreatment conditions. Changes in surface area were also measured: PCH pretreatment combined with disk milling increased BSB surface area by 31.80-106.93%, more than was observed with LHW pretreatment.

keywords: Conversion; Sustainability; Genomics; Hydrolysate

published: 2017-08-11

Magnetotransport measurements of connected kagome artificial spin ice in armchair and zigzag configurations

Schiffer, Peter; Le, Brian L. (2017)

This dataset contains transport data for connected kagome artificial spin ice networks composed of permalloy nanowires. The data reproduce those shown in Appendix B of the dissertation titled "Magnetotransport of Connected Artificial Spin Ice". Field sweeps with the magnetic field applied in-plane were performed in 5-degree increments for kagome artificial spin ice in both the armchair and zigzag orientations.

keywords: Magnetotransport; artificial spin ice; nanowires