
Datasets

published: 2020-05-17

Mishra, Sudhanshu; Prasad, Shivangi; Mishra, Shubhanshu (2020): Trained models for Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8882752_V1

Models and predictions for our submission to TRAC-2020, the Second Workshop on Trolling, Aggression and Cyberbullying. Our approach is described in our paper: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020). The source code for training these models and more details can be found in our code repository: https://github.com/socialmediaie/TRAC2020

NOTE: These models were retrained for uploading here after our submission, so the evaluation measures may differ slightly from the ones reported in the paper.

keywords: Social Media; Trolling; Aggression; Cyberbullying; text classification; natural language processing; deep learning; open source;

published: 2023-06-01

Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica (2023): Embedded dataset in Poincare Balls. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6901251_V1

This dataset contains four real-world sub-datasets with data embedded into Poincare ball models: Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST and mini-ImageNet. Each sub-dataset has two corresponding files: one is the data file, and the other contains the pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and code (https://github.com/thupchnsky/PoincareLinearClassification) for more details.

keywords: Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine

published: 2022-08-31

Seyfried, Georgia; Midgley, Meghan; Phillips, Richard; Yang, Wendy (2022): Data for Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5586647_V2

This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript "Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes". Please refer to that publication for details about the methodologies used to generate these data and for the experimental design. For this version 2, we added specific gross nitrogen mineralization rates (ugN/gOM/d), microbial biomass carbon (ugC/gdw), microbial biomass nitrogen (ugN/gdw) and microbial biomass C:N ratios to the data set. Additionally, we updated values for gross nitrogen mineralization, microbial NO3 assimilation and microbial NH4 assimilation to reflect slight changes in data processing. Those changes are reflected in "220829_All data_repository.csv". "220829_nitrogen_mineralization_readme.txt" is the updated readme for the new file. The other 2 files, whose names begin with "220426_", are the older versions and are the same as in V1.

keywords: Nitrogen cycling; Ectomycorrhizal fungi; Arbuscular mycorrhizal fungi; Nitrogen fertilization; Gross mineralization

published: 2023-07-01

Tonks, Adam; Hwang, Jeongwoo (2023): Data for the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6626437_V1

This is the data used in the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States". Code from the GitHub repository https://github.com/adtonks/precip_extremes can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.8104252

This dataset is derived from NOAA-CIRES-DOE 20th Century Reanalysis V3. The NOAA-CIRES-DOE Twentieth Century Reanalysis Project version 3 used resources of the National Energy Research Scientific Computing Center managed by Lawrence Berkeley National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and used resources of NOAA's Remotely Deployed High Performance Computing Systems.

keywords: spatiotemporal; CONUS; United States; precipitation; extremes; flooding

published: 2022-05-20

Haselhorst, Derek; Moreno, J. Enrique; Tcheng, David K.; Punyasena, Surangi W. (2022): Images and annotated counts for aerial pollen samples from the Barro Colorado Island megaplot, Panama (1994 – 2010). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2176715_V1

This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.

keywords: aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology

published: 2020-08-01

Horna Munoz, Daniel; Constantinescu, George; Rhoads, Bruce; Lewis, Quinn; Sukhodolov, Alexander (2020): Confluence Density Effects Simulation Database. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6257171_V1

This data set shows how density effects have an important influence on mixing at a small river confluence. The data consist of results of simulations using a detached eddy simulation model.

keywords: confluence; flow dynamics; density effects

published: 2011-09-20

Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-puhong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San (2011): Data for SuperFine, DACTAL, and BeeTLe. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2952208_V1

This page provides the data for the SuperFine, DACTAL, and BeeTLe publications.
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.

published: 2019-12-20

Wang, Yu; Burgess, Steven J.; de Becker, Elsa; Long, Stephen P. (2019): Data and code for: Photosynthesis in the fleeting shadows: An overlooked opportunity for increasing crop productivity?. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9453481_V1

This dynamic photosynthesis model of a soybean canopy was developed by Yu Wang (yuwangcn@illinois.edu), IGB, University of Illinois. For more details, please see the following publication: Yu Wang, Steven J. Burgess, Elsa de Becker, Stephen P. Long. Photosynthesis in the fleeting shadows: An overlooked opportunity for increasing crop productivity? The Plant Journal.

keywords: Matlab; Soybean canopy; photosynthesis model

published: 2020-03-13

Sweet, Andrew; Johnson, Kevin; Cameron, Stephen (2020): Data from: Mitochondrial genomes of Columbicola feather lice are highly fragmented, indicating repeated evolution of minicircle-type genomes in parasitic lice. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2211060_V2

Data files associated with the assembly of mitochondrial minicircles from five species of parasitic lice. This includes data from four species in the genus Columbicola and from the human louse (Pediculus humanus). The files include FASTA sequences for all five species, reference sequences for read mapping approaches, resulting contigs produced by various assembly approaches, and alignments of human louse minicircles mapped to published sequences of the same species.

keywords: mitochondria; FASTA; nucleotide sequences; alignment; Columbicola; Pediculus

published: 2021-09-06

Vargas, Fabio (2021): Mesospheric gravity wave activity estimated via airglow imagery, multistatic meteor radar, and SABER data taken during the SIMONe–2018 campaign. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8585682_V1

Airglow images and meteor radar data used in the paper "Mesospheric gravity wave activity estimated via airglow imagery, multistatic meteor radar, and SABER data taken during the SIMONe–2018 campaign".

keywords: airglow; meteor radar; gravity waves; momentum flux;

published: 2021-10-15

Jianhao, Peng; Idoia, Ochoa (2021): Synthetic datasets for SimiC. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4996748_V1

This is the synthetic expression file (5 states, 5000 cells) that we used to validate SimiC, a single-cell gene regulatory network (GRN) inference method with similarity constraints. Ground-truth GRNs are stored in NumPy array format, and the expression profiles of all states combined are stored as a pandas DataFrame in Pickle file format.

keywords: Numpy array; GRNs; Pandas DataFrame;

published: 2016-05-16

Imker, Heidi (2016): Phylogenetic Analysis of the NRPS AmbE Condensation Domains for the L-2-amino-4-methoxy-trans-3-butenoic acid (AMB) Biosynthetic Pathway in Pseudomonas aeruginosa. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4602893_V1

This dataset contains the protein sequences and trees used to compare Non-Ribosomal Peptide Synthetase (NRPS) condensation domains in the AMB gene cluster and was used to create figure S1 in Rojas et al. 2015. Instead of having to collect representative sequences independently, this set of condensation domain sequences may serve as a quick reference set for coarse classification of condensation domains.

keywords: NRPS; biosynthetic gene cluster; antimetabolite; Pseudomonas; oxyvinylglycine; secondary metabolite; thiotemplate; toxin

published: 2020-08-25

Allan, Brian; Fredericks, Lisa (2020): AllanLab fluidigm pipeline test dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0095812_V1

The Allan Lab has published a Fluidigm pipeline online at https://github.com/HPCBio/allan-fluidigm-pipeline. The repository includes a tutorial for running the pipeline, but it does not yet include test datasets. This tarball, hosted at the Illinois Data Bank, is the dataset that completes the GitHub tutorial. It includes inputs (a custom database of tick pathogens and Fluidigm raw reads) and output files (tables of samples with taxonomic classifications).

keywords: custom database of tick pathogens; fluidigm pipeline; fluidigm paired reads; fluidigm tutorial

published: 2019-09-17

Fraebel, David T.; Kuehn, Seppe (2019): Sequencing data for migration rate selection experiments (0.2% agar, 1mM sugar). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2128477_V1

BAM files for evolved strains from migration rate selection experiments conducted in low-viscosity (0.2% w/v) agar plates containing M63 minimal medium with 1mM of mannose, melibiose, N-acetylglucosamine or galactose.

published: 2023-01-12

Mischo, William; Schlembach, Mary C. (2023): Processing and Pearson Correlation Scripts for the C&RL Article on the Relationships between Publication, Citation, and Usage Metrics at the University of Illinois at Urbana-Champaign Library. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0931140_V1

These processing and Pearson correlational scripts were developed to support the study that examined the correlational relationships between local journal authorship, local and external citation counts, full-text downloads, link-resolver clicks, and four global journal impact factor indices within an all-disciplines journal collection of 12,200 titles and six subject subsets at the University of Illinois at Urbana-Champaign (UIUC) Library. This study shows strong correlations in the all-disciplines set and most subject subsets. Special processing scripts and web site dashboards were created, including Pearson correlational analysis scripts for reading values from relational databases and displaying tabular results. The raw data used in this analysis, in the form of relational database tables with multiple columns, is available at https://doi.org/10.13012/B2IDB-6810203_V1.
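
For readers unfamiliar with the statistic, the following is a minimal Python sketch of a Pearson correlation computation. It is not taken from the deposited scripts; the function name `pearson_r` and the sample counts are hypothetical and serve only to illustrate the calculation.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-journal counts, invented for illustration only.
downloads = [120, 340, 90, 410, 275]
citations = [14, 40, 9, 52, 30]
print(f"r = {pearson_r(downloads, citations):.3f}")
```

Values of r near 1 or -1 indicate a strong linear relationship between the two metrics; values near 0 indicate little or none.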

keywords: Pearson Correlation Analysis Scripts; Journal Publication; Citation and Usage Data; University of Illinois at Urbana-Champaign Scholarly Communication

published: 2022-09-29

Levine, Nathaniel (2022): 3DIFICE: A Synthetic Dataset for Training Computer Vision Algorithms to Recognize Earthquake Damage to Reinforced Concrete Structures. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6415287_V1

3DIFICE: 3-dimensional Damage Imposed on Frame structures for Investigating Computer vision-based Evaluation methods

This dataset contains 1,396 synthetic images and label maps with various types of earthquake damage imposed on reinforced concrete frame structures. Damage includes cracking, spalling, exposed transverse rebar, and exposed longitudinal rebar. Each image has an associated label map that can be used for training machine learning algorithms to recognize the various types of damage.

keywords: computer vision; earthquake engineering; structural health monitoring; civil engineering; structural engineering;

published: 2019-09-25

Wong, Tony; Hughes, A; Tokuda, K; Indebetouw, R; Onishi, T; Bandurski, J. B.; Chen, C. H. R.; Fukui, Y; Glover, S. C. O.; Klessen, R. S.; Pineda, J. L.; Roman-Duval, J.; Sewilo, M.; Wojciechowski, E.; Zahorecz, S. (2019): Data for: Relations Between Molecular Cloud Structure Sizes and Line Widths in the Large Magellanic Cloud. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7090706_V1

12CO and 13CO maps for six molecular clouds in the Large Magellanic Cloud, obtained with the Atacama Large Millimeter/submillimeter Array (ALMA). See the associated article in the Astrophysical Journal, and README files within each ZIP archive. Please cite the article if you use these data.

keywords: Radio astronomy

published: 2018-06-20

Lao, Yuyang; Caravelli, Francesco; Sheikh, Mohammed; Sklenar, Joseph; Gardeazabal, Daniel; Watts, Justin D.; Albrecht, Alan M.; Scholl, Andreas; Dahmen, Karin; Nisoli, Cristiano; Schiffer, Peter (2018): Data from: Classical Topological Order in the Kinetics of Artificial Spin Ice. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0598724_V1

The dataset includes the data used in the study of Classical Topological Order in the Kinetics of Artificial Spin Ice. This includes the photoemission electron microscopy intensity measurement of artificial spin ice at different temperatures as a function of time. The data includes the raw data, the metadata, and the data cookbook. Please refer to the data cookbook for more information. Note: vertex_population.xlsx file in the meta_data_code folder can be disregarded.

keywords: artificial spin ice; PEEM; topological order

published: 2019-05-20

Lao, Yuyang; Schiffer, Peter (2019): Tetris artificial spin ice kinetics. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0779814_V1

This is the experimental data for tetris artificial spin ice. The islands are made of Permalloy and measure 170 nm by 470 nm by 2.5 nm. The systems are measured at a temperature where the islands are fluctuating around room temperature. The data are recorded as photoemission electron microscopy intensity. More details about the dataset can be found in the files Note.txt and Tetris_data_list.xlsx.

Note: Two files, named bl11_teris600_033 and bl11_tetris600_2_135, are not recorded in the Excel sheet because they were corrupted during the measurement. Any data not recorded in the Excel sheet is either corrupted or of low quality. For files *_028 to *_049, "tetris" is spelled with the "t", while in the raw data folder it is spelled without the "t" ("teris"). This is a typo; throughout the dataset, "tetris" and "teris" have the same meaning.

keywords: artificial spin ice

published: 2019-07-04

Sashittal, Palash; El-Kebir, Mohammed (2019): SharpTNI Results. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9734610_V1

Results generated using SharpTNI on data collected from the 2014 Ebola outbreak in Sierra Leone.

published: 2019-08-05

Skinner, Rachel; Dietrich, Christopher; Walden, Kimberly; Gordon, Eric; Sweet, Andrew; Podsiadlowski, Lars; Petersen, Malte; Simon, Chris; Takiya, Daniela; Johnson, Kevin (2019): Data for Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1461292_V1

The data in this directory correspond to: Skinner, R.K., Dietrich, C.H., Walden, K.K.O., Gordon, E., Sweet, A.D., Podsiadlowski, L., Petersen, M., Simon, C., Takiya, D.M., and Johnson, K.P. Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict. Systematic Entomology.

Correspondence should be directed to: Rachel K. Skinner, rskinn2@illinois.edu

If you use these data, please cite our paper in Systematic Entomology.

The following files can be found in this dataset:
- Amino_acid_concatenated_alignment.phy: the amino acid alignment used in this analysis in phylip format.
- Amino_acid_raxml_partitions.txt (for reference only): the partitions for the amino acid alignment, but a partitioned amino acid analysis was not performed in this study.
- Amino_acid_concatenated_tree.newick: the best maximum likelihood tree with bootstrap values in newick format.
- ASTRAL_input_gene_trees.tre: the concatenated gene tree input file for ASTRAL.
- README_pie_charts.md: explains the scripts and data needed to recreate the pie charts figure from our paper. Corresponds to the following files:
- ASTRAL_species_tree_EN_only.newick: the species tree with only effective number (EN) annotation.
- ASTRAL_species_tree_pp1_only.newick: the species tree with only the posterior probability 1 (main topology) annotation.
- ASTRAL_species_tree_q1_only.newick: the species tree with only the quartet scores for the main topology (q1).
- ASTRAL_species_tree_q2_only.newick: the species tree with only the quartet scores for the first alternative topology (q2).
- ASTRAL_species_tree_q3_only.newick: the species tree with only the quartet scores for the second alternative topology (q3).
- print_node_key_files.py: script needed to create the following files:
  - node_keys.key: text file with node IDs and topologies.
  - complete_q_scores.key: text file with node IDs and multiplied q scores.
  - EN_node_vals.key: text file with node IDs and EN values.
- create_pie_charts_tree.py: script needed to visualize the tree with pie charts, pp1, and EN values plotted at nodes.
- ASTRAL_species_tree_full_annotation.newick: the species tree with full annotation from the ASTRAL analysis. NOTE: It may be more useful to examine individual value files if you want to visualize the tree, e.g., in figtree, since the full annotations are extensive and can make viewing difficult.
- Complete_NT_concatenated_alignment.phy: the nucleotide alignment that includes unmodified third codon positions. The alignment is in phylip format.
- Complete_NT_raxml_partitions.txt: the raxml-style partition file of the nucleotide partitions.
- Complete_NT_concatenated_tree.newick: the best maximum likelihood tree from the concatenated complete NT analysis with bootstrap values in newick format.
- Complete_NT_partitioned_tree.newick: the best maximum likelihood tree from the partitioned complete NT analysis with bootstrap values in newick format.
- Degeneracy_coded_nt_concatenated_alignment.phy: the degeneracy coded nucleotide alignment in phylip format.
- Degeneracy_coded_nt_raxml_partitions.txt: the raxml-style partition file for the degeneracy coded nucleotide alignment.
- Degeneracy_coded_nt_concatenated_tree.newick: the best maximum likelihood tree from the degeneracy-coded concatenated analysis with bootstrap values in newick format.
- Degeneracy_coded_nt_partitioned_tree.newick: the best maximum likelihood tree from the degeneracy-coded partitioned analysis with bootstrap values in newick format.
- count_ingroup_taxa.py: script that counts the number of ingroup and/or outgroup taxa present in an alignment.
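
As a rough illustration of what counting ingroup and outgroup taxa in a phylip alignment involves, here is a minimal Python sketch. It is not the deposited count_ingroup_taxa.py script, whose behavior is not described here. It assumes a relaxed, sequential phylip layout (a header line with the taxon and site counts, then one line per taxon starting with its name); the function names, taxon names, and the outgroup set are hypothetical.

```python
def read_phylip_taxa(text):
    """Return (declared_taxa, declared_sites, names) from relaxed PHYLIP text."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The header holds two integers: number of taxa and number of sites.
    n_taxa, n_sites = (int(tok) for tok in lines[0].split()[:2])
    # Assumption: each of the next n_taxa lines starts with the taxon name.
    names = [line.split()[0] for line in lines[1:1 + n_taxa]]
    return n_taxa, n_sites, names


def count_groups(names, outgroup):
    """Count taxa outside and inside a user-supplied set of outgroup names."""
    n_out = sum(1 for name in names if name in outgroup)
    return len(names) - n_out, n_out


# Made-up toy alignment; taxon names are placeholders.
example = """4 8
taxon_A  ACGTACGT
taxon_B  ACGTACGA
taxon_C  ACGTTCGT
taxon_D  ACGAACGT
"""

n_taxa, n_sites, names = read_phylip_taxa(example)
n_ingroup, n_outgroup = count_groups(names, outgroup={"taxon_D"})
print(f"{n_ingroup} ingroup taxa, {n_outgroup} outgroup taxa")
```

To apply the sketch to one of the .phy files, read the file contents into a string and pass it to `read_phylip_taxa`, after checking that the file follows the layout assumed above.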

keywords: Auchenorrhyncha; Hemiptera; alignment; trees

published: 2019-12-03

de Moya, Robert (2019): Heteroptera Transcriptome Set. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7784896_V1

These are the alignments of transcriptome data used for the analysis of members of Heteroptera. This dataset is analyzed in "Deep instability in the phylogenetic backbone of Heteroptera is only partly overcome by transcriptome-based phylogenomics" published in Insect Systematics and Diversity.

keywords: Heteroptera; Hemiptera; Phylogenomics; transcriptome

published: 2020-01-20

Zhang, Jun; Wuebbles, Donald; Kinnison, Douglas; Saiz López, Alfonso (2020): Data for: Revising the Ozone Depletion Potentials for Short-Lived Chemicals such as CF3I and CH3I. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5952573_V1

These datasets provide the basis of our analysis in the paper "Revising the Ozone Depletion Potentials for Short-Lived Chemicals such as CF3I and CH3I". All datasets here are model output (CAM4-chem). All the simulations (background and perturbation) were run to steady state, and only the last-year outputs used in the analysis are archived here.

keywords: Illinois Data Bank; NetCDF; Ozone Depletion Potential; CF3I and CH3I

published: 2020-11-05

Miller, Andrew; Raudabaugh, Daniel (2020): Data from Species Distribution, Phylogenetic Structure, and Functional Roles of Detritius Inhabiting Fungi Across Contrasting Aquatic Environments. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6862941_V2

This version 2 dataset contains 34 files in total, with one (1) additional file called "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining files (33) are identical to version 1. The following is the information about the new file and its variables:

Culture-dependent Isolate table with taxonomic determination and sequence data.csv: Culture table with assigned taxonomy from NCBI. A single-direction sequence for each isolate is included if one could be obtained. The sequence is derived from ITS1F-ITS4 PCR amplicons, with Sanger sequencing in one direction using ITS5. The file contains 20 variables, explained below:
- IsolateNumber: unique number identifying each isolate cultured
- Time: season in which the sample was collected
- Location: the specific name of the location
- Habitat: type of habitat, either stream or peatland
- State: state in the USA in which the specific location is located
- Incubation_pH ID: pH of the medium during isolation of fungal cultures
- Genus: phylogenetic genus of the fungal isolates (determined by sequence similarity)
- Sequence_quality: base call quality of the entire sequence used for blast analysis, if known
- %_coverage: sequence coverage reported from GenBank
- %_ID: sequence similarity reported from GenBank
- Life_style: ecological life style, if known
- Phylum: phylogenetic phylum as indicated by Index Fungorum
- Subphylum: phylogenetic subphylum as indicated by Index Fungorum
- Class: phylogenetic class as indicated by Index Fungorum
- Subclass: phylogenetic subclass as indicated by Index Fungorum
- Order: phylogenetic order as indicated by Index Fungorum
- Family: phylogenetic family as indicated by Index Fungorum
- ITS5_Sequence: single-direction sequence used for sequence similarity match using blastn. Primer ITS5
- Fasta: sequence with nomenclature in a fasta format for easy cut and paste into phylogenetic software

Note: blank cells mean no data is available or unknown.
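
The following minimal Python sketch shows one way a table with these columns could be summarized, here by counting isolates per genus within each habitat. It is an illustration only: the function name `genus_counts_by_habitat` and the sample rows are made up, the sample uses just three of the listed columns, and the real file's exact header spelling should be checked before use.

```python
import csv
import io
from collections import Counter


def genus_counts_by_habitat(csv_file):
    """Count isolates per genus within each habitat, skipping blank genus cells."""
    reader = csv.DictReader(csv_file)
    counts = {}
    for row in reader:
        habitat = row["Habitat"].strip()
        genus = row["Genus"].strip()
        if not genus:
            # Blank cells mean no data is available or unknown.
            continue
        counts.setdefault(habitat, Counter())[genus] += 1
    return counts


# Made-up rows for illustration; genus names are placeholders.
sample = io.StringIO(
    "IsolateNumber,Habitat,Genus\n"
    "1,stream,GenusA\n"
    "2,peatland,GenusB\n"
    "3,stream,GenusA\n"
    "4,peatland,\n"
)

counts = genus_counts_by_habitat(sample)
for habitat, genera in counts.items():
    print(habitat, dict(genera))
```

For the deposited CSV, the same function can be given a file object opened with `open(path, newline="")` in place of the in-memory sample.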

keywords: ITS1 forward reads; Illumina; peatlands; streams; bogs; fens

published: 2019-05-10

Pradhan, Dikshant; Jensen, Paul (2019): Pradhan 2019 Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3352362_V1

Data necessary for production of figures presented in "Efficient enzyme coupling algorithms identify functional pathways in genome-scale metabolic models" by Pradhan et al.

keywords: Efficient enzyme coupling algorithms identify functional pathways in genome-scale metabolic models;
