Illinois Data Bank
Displaying 501 - 525 of 738 in total
Subject Area
Life Sciences (395)
Social Sciences (139)
Physical Sciences (106)
Technology and Engineering (68)
Uncategorized
Arts and Humanities (1)
Funder
Other (227)
U.S. National Science Foundation (NSF) (211)
U.S. Department of Energy (DOE) (76)
U.S. National Institutes of Health (NIH) (73)
U.S. Department of Agriculture (USDA) (50)
Illinois Department of Natural Resources (IDNR) (20)
U.S. Geological Survey (USGS) (7)
U.S. National Aeronautics and Space Administration (NASA) (6)
Illinois Department of Transportation (IDOT) (4)
U.S. Army (2)
Publication Year
2024 (110)
2021 (108)
2022 (106)
2020 (96)
2023 (75)
2019 (72)
2018 (61)
2025 (39)
2017 (36)
2016 (30)
2009 (1)
2011 (1)
2012 (1)
2014 (1)
2015 (1)
License
CC0 (403)
CC BY (312)
custom (23)
Illinois Data Bank Dataset Search Results
published: 2022-05-20
Haselhorst, Derek; Moreno, J. Enrique; Tcheng, David K.; Punyasena, Surangi W. (2022): Images and annotated counts for aerial pollen samples from the Barro Colorado Island megaplot, Panama (1994 – 2010). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2176715_V1
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords:
aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology
published: 2011-09-20
Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-Phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San (2011): Data for SuperFine, DACTAL, and BeeTLe. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2952208_V1
This page provides the data for the SuperFine, DACTAL, and BeeTLe publications:
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic Biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published: 2019-12-20
Wang, Yu; Burgess, Steven J.; de Becker, Elsa; Long, Stephen P. (2019): Data and code for: Photosynthesis in the fleeting shadows: An overlooked opportunity for increasing crop productivity?. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9453481_V1
This dynamic photosynthesis model of the soybean canopy was developed by Yu Wang (yuwangcn@illinois.edu), IGB, University of Illinois. For more details, see the following publication: Yu Wang, Steven J. Burgess, Elsa de Becker, Stephen P. Long. Photosynthesis in the fleeting shadows: An overlooked opportunity for increasing crop productivity? The Plant Journal.
keywords:
Matlab; Soybean canopy; photosynthesis model
published: 2020-03-13
Sweet, Andrew; Johnson, Kevin; Cameron, Stephen (2020): Data from: Mitochondrial genomes of Columbicola feather lice are highly fragmented, indicating repeated evolution of minicircle-type genomes in parasitic lice. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2211060_V2
Data files associated with the assembly of mitochondrial minicircles from five species of parasitic lice. This includes data from four species in the genus Columbicola and from the human louse (Pediculus humanus). The files include FASTA sequences for all five species, reference sequences for read mapping approaches, resulting contigs produced by various assembly approaches, and alignments of human louse minicircles mapped to published sequences of the same species.
keywords:
mitochondria; FASTA; nucleotide sequences; alignment; Columbicola; Pediculus
published: 2021-09-06
Vargas, Fabio (2021): Mesospheric gravity wave activity estimated via airglow imagery, multistatic meteor radar, and SABER data taken during the SIMONe–2018 campaign. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8585682_V1
Airglow images and meteor radar data used in the paper "Mesospheric gravity wave activity estimated via airglow imagery, multistatic meteor radar, and SABER data taken during the SIMONe–2018 campaign".
keywords:
airglow; meteor radar; gravity waves; momentum flux
published: 2021-10-15
Jianhao, Peng; Idoia, Ochoa (2021): Synthetic datasets for SimiC. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4996748_V1
This is the synthetic expression file (5 states, 5,000 cells) that we used to validate SimiC, a single-cell gene regulatory network (GRN) inference method with similarity constraints. Ground-truth GRNs are stored as NumPy arrays, and the expression profiles of all states combined are stored as a pandas DataFrame serialized to pickle files.
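A minimal sketch of reading files of this kind, assuming pandas and NumPy are installed. The file names, column names, and array shapes below are hypothetical stand-ins written to a temporary folder so the sketch runs without the dataset; the record above does not state the actual file names. Pickle files can execute code when loaded, so open only files from a source you trust.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def load_expression(path):
    """Read a pickled pandas DataFrame of expression profiles."""
    return pd.read_pickle(path)


def load_grns(path):
    """Read a pickled NumPy array of ground-truth GRN weights."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Build tiny stand-in files (hypothetical names and shapes) and read them back.
workdir = Path(tempfile.mkdtemp())
expr_path = workdir / "expression_demo.pickle"
grn_path = workdir / "grn_demo.pickle"

demo_frame = pd.DataFrame(np.zeros((4, 3)), columns=["gene_a", "gene_b", "gene_c"])
demo_frame.to_pickle(expr_path)
with open(grn_path, "wb") as handle:
    pickle.dump(np.eye(3), handle)

expression = load_expression(expr_path)
grns = load_grns(grn_path)
print(expression.shape, grns.shape)
```

With the real files, replace the stand-in paths with the paths of the downloaded pickle files and inspect `expression.columns` and `grns.shape` before relying on any particular layout.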
keywords:
Numpy array; GRNs; Pandas DataFrame
published: 2016-05-16
Imker, Heidi (2016): Phylogenetic Analysis of the NRPS AmbE Condensation Domains for the L-2-amino-4-methoxy-trans-3-butenoic acid (AMB) Biosynthetic Pathway in Pseudomonas aeruginosa. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4602893_V1
This dataset contains the protein sequences and trees used to compare Non-Ribosomal Peptide Synthetase (NRPS) condensation domains in the AMB gene cluster and was used to create figure S1 in Rojas et al. 2015. Instead of having to collect representative sequences independently, this set of condensation domain sequences may serve as a quick reference set for coarse classification of condensation domains.
keywords:
NRPS; biosynthetic gene cluster; antimetabolite; Pseudomonas; oxyvinylglycine; secondary metabolite; thiotemplate; toxin
published: 2019-09-17
Fraebel, David T.; Kuehn, Seppe (2019): Sequencing data for migration rate selection experiments (0.2% agar, 1mM sugar). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2128477_V1
BAM files for evolved strains from migration rate selection experiments conducted in low-viscosity (0.2% w/v) agar plates containing M63 minimal medium with 1 mM of mannose, melibiose, N-acetylglucosamine, or galactose.
published: 2018-06-20
Lao, Yuyang; Caravelli, Francesco; Sheikh, Mohammed; Sklenar, Joseph; Gardeazabal, Daniel; Watts, Justin D.; Albrecht, Alan M.; Scholl, Andreas; Dahmen, Karin; Nisoli, Cristiano; Schiffer, Peter (2018): Data from: Classical Topological Order in the Kinetics of Artificial Spin Ice. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0598724_V1
The dataset includes the data used in the study of Classical Topological Order in the Kinetics of Artificial Spin Ice. This includes the photoemission electron microscopy intensity measurement of artificial spin ice at different temperatures as a function of time. The data includes the raw data, the metadata, and the data cookbook. Please refer to the data cookbook for more information. Note: vertex_population.xlsx file in the meta_data_code folder can be disregarded.
keywords:
artificial spin ice; PEEM; topological order
published: 2019-05-20
Lao, Yuyang; Schiffer, Peter (2019): Tetris artificial spin ice kinetics. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0779814_V1
This is the experimental data for tetris artificial spin ice. The islands are made of Permalloy and measure 170 nm by 470 nm by 2.5 nm. The systems are measured at a temperature where the islands are fluctuating around room temperature. The data is recorded as photoemission electron microscopy intensity. More details about the dataset can be found in the files Note.txt and Tetris_data_list.xlsx.
Note: Two files, named bl11_teris600_033 and bl11_tetris600_2_135, are not recorded in the Excel sheet because they were corrupted during the measurement. Any data not recorded in the Excel sheet is either corrupted or of low quality. In files *_028 to *_049, tetris is spelled with the second “t”, while in the raw data folder it is spelled without it (teris). This is a typo; throughout the dataset, tetris and teris have the same meaning.
keywords:
artificial spin ice
published: 2019-07-04
Sashittal, Palash; El-Kebir, Mohammed (2019): SharpTNI Results. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9734610_V1
Results generated using SharpTNI on data collected from the 2014 Ebola outbreak in Sierra Leone.
published: 2019-08-05
Skinner, Rachel; Dietrich, Christopher; Walden, Kimberly; Gordon, Eric; Sweet, Andrew; Podsiadlowski, Lars; Petersen, Malte; Simon, Chris; Takiya, Daniela; Johnson, Kevin (2019): Data for Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1461292_V1
The data in this directory correspond to: Skinner, R.K., Dietrich, C.H., Walden, K.K.O., Gordon, E., Sweet, A.D., Podsiadlowski, L., Petersen, M., Simon, C., Takiya, D.M., and Johnson, K.P. Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict. Systematic Entomology.
Correspondence should be directed to: Rachel K. Skinner, rskinn2@illinois.edu
If you use these data, please cite our paper in Systematic Entomology.
The following files can be found in this dataset:
- Amino_acid_concatenated_alignment.phy: the amino acid alignment used in this analysis, in phylip format.
- Amino_acid_raxml_partitions.txt (for reference only): the partitions for the amino acid alignment; a partitioned amino acid analysis was not performed in this study.
- Amino_acid_concatenated_tree.newick: the best maximum likelihood tree with bootstrap values, in newick format.
- ASTRAL_input_gene_trees.tre: the concatenated gene tree input file for ASTRAL.
- README_pie_charts.md: explains the scripts and data needed to recreate the pie charts figure from our paper. It corresponds to the following files:
  - ASTRAL_species_tree_EN_only.newick: the species tree with only the effective number (EN) annotation.
  - ASTRAL_species_tree_pp1_only.newick: the species tree with only the posterior probability 1 (main topology) annotation.
  - ASTRAL_species_tree_q1_only.newick: the species tree with only the quartet scores for the main topology (q1).
  - ASTRAL_species_tree_q2_only.newick: the species tree with only the quartet scores for the first alternative topology (q2).
  - ASTRAL_species_tree_q3_only.newick: the species tree with only the quartet scores for the second alternative topology (q3).
  - print_node_key_files.py: script needed to create the following files:
    - node_keys.key: text file with node IDs and topologies.
    - complete_q_scores.key: text file with node IDs and multiplied q scores.
    - EN_node_vals.key: text file with node IDs and EN values.
  - create_pie_charts_tree.py: script needed to visualize the tree with pie charts, pp1, and EN values plotted at nodes.
- ASTRAL_species_tree_full_annotation.newick: the species tree with full annotation from the ASTRAL analysis. NOTE: It may be more useful to examine the individual value files if you want to visualize the tree, e.g., in figtree, since the full annotations are extensive and can make viewing difficult.
- Complete_NT_concatenated_alignment.phy: the nucleotide alignment that includes unmodified third codon positions, in phylip format.
- Complete_NT_raxml_partitions.txt: the raxml-style partition file of the nucleotide partitions.
- Complete_NT_concatenated_tree.newick: the best maximum likelihood tree from the concatenated complete NT analysis with bootstrap values, in newick format.
- Complete_NT_partitioned_tree.newick: the best maximum likelihood tree from the partitioned complete NT analysis with bootstrap values, in newick format.
- Degeneracy_coded_nt_concatenated_alignment.phy: the degeneracy-coded nucleotide alignment, in phylip format.
- Degeneracy_coded_nt_raxml_partitions.txt: the raxml-style partition file for the degeneracy-coded nucleotide alignment.
- Degeneracy_coded_nt_concatenated_tree.newick: the best maximum likelihood tree from the degeneracy-coded concatenated analysis with bootstrap values, in newick format.
- Degeneracy_coded_nt_partitioned_tree.newick: the best maximum likelihood tree from the degeneracy-coded partitioned analysis with bootstrap values, in newick format.
- count_ingroup_taxa.py: script that counts the number of ingroup and/or outgroup taxa present in an alignment.
keywords:
Auchenorrhyncha; Hemiptera; alignment; trees
published: 2019-12-03
de Moya, Robert (2019): Heteroptera Transcriptome Set. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7784896_V1
These are the alignments of transcriptome data used for the analysis of members of Heteroptera. This dataset is analyzed in "Deep instability in the phylogenetic backbone of Heteroptera is only partly overcome by transcriptome-based phylogenomics" published in Insect Systematics and Diversity.
keywords:
Heteroptera; Hemiptera; Phylogenomics; transcriptome
published: 2020-01-20
Zhang, Jun; Wuebbles, Donald; Kinnison, Douglas; Saiz López, Alfonso (2020): Data for: Revising the Ozone Depletion Potentials for Short-Lived Chemicals such as CF3I and CH3I. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5952573_V1
These datasets provide the basis of our analysis in the paper "Revising the Ozone Depletion Potentials for Short-Lived Chemicals such as CF3I and CH3I". All datasets here are model output (CAM4-chem). All simulations (background and perturbation) were run to steady state, and only the last-year outputs used in the analysis are archived here.
keywords:
Illinois Data Bank; NetCDF; Ozone Depletion Potential; CF3I and CH3I
published: 2020-11-05
Miller, Andrew; Raudabaugh, Daniel (2020): Data from Species Distribution, Phylogenetic Structure, and Functional Roles of Detritius Inhabiting Fungi Across Contrasting Aquatic Environments.. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6862941_V2
This version 2 dataset contains 34 files in total, including one (1) additional file called "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining 33 files are identical to version 1. The new file and its variables are described below.
Culture-dependent Isolate table with taxonomic determination and sequence data.csv: Culture table with assigned taxonomy from NCBI. A single-direction sequence for each isolate is included if one could be obtained. Sequences are derived from ITS1F-ITS4 PCR amplicons, with Sanger sequencing in one direction using ITS5. The file contains 20 variables, explained below:
- IsolateNumber: unique number identifying each isolate cultured
- Time: season in which the sample was collected
- Location: the specific name of the location
- Habitat: type of habitat, either stream or peatland
- State: state in the USA in which the specific location is located
- Incubation_pH ID: pH of the medium during isolation of fungal cultures
- Genus: phylogenetic genus of the fungal isolates (determined by sequence similarity)
- Sequence_quality: base call quality of the entire sequence used for blast analysis, if known
- %_coverage: sequence coverage reported from GenBank
- %_ID: sequence similarity reported from GenBank
- Life_style: ecological life style, if known
- Phylum: phylogenetic phylum as indicated by Index Fungorum
- Subphylum: phylogenetic subphylum as indicated by Index Fungorum
- Class: phylogenetic class as indicated by Index Fungorum
- Subclass: phylogenetic subclass as indicated by Index Fungorum
- Order: phylogenetic order as indicated by Index Fungorum
- Family: phylogenetic family as indicated by Index Fungorum
- ITS5_Sequence: single-direction sequence used for the sequence similarity match using blastn (primer ITS5)
- Fasta: sequence with nomenclature in fasta format for easy cut and paste into phylogenetic software
Note: blank cells mean no data is available or unknown.
keywords:
ITS1 forward reads; Illumina; peatlands; streams; bogs; fens
published: 2019-05-10
Pradhan, Dikshant; Jensen, Paul (2019): Pradhan 2019 Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3352362_V1
Data necessary for production of figures presented in "Efficient enzyme coupling algorithms identify functional pathways in genome-scale metabolic models" by Pradhan et al.
keywords:
Efficient enzyme coupling algorithms identify functional pathways in genome-scale metabolic models
published: 2019-12-03
de Moya, Robert (2019): Feather Louse Orthology set. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0440388_V1
This is the data set associated with the manuscript titled "Extensive host-switching of avian feather lice following the Cretaceous-Paleogene mass extinction event." Included are the gene alignments used for phylogenetic analyses and the cophylogenetic input files.
keywords:
phylogenomics, cophylogenetics, feather lice, birds
published: 2012-07-01
Mirarab, Siavash; Nguyen, Nam-Phuong; Warnow, Tandy (2012): Data for SEPP: SATé-Enabled Phylogenetic Placement. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9316702_V1
This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.
published: 2019-06-12
Miller, Andrew; Raudabaugh, Daniel (2019): Supplemental data sets for Raudabaugh et al., Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1530363_V1
The data set contains supplemental data sets for the manuscript entitled "Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi."
Environmental sampling: Amplification of nuclear DNA regions (ITS1 and ITS2) was completed using the Fluidigm Access Array, and the resulting amplicons were sequenced on an Illumina MiSeq v2 platform using rapid 2 × 250 nt paired-end reads. For the Illumina sequencing run, amplicons were size selected into <500 nt and >500 nt sub-pools, then remixed (<500 nt : >500 nt) by nM concentration in a 1x:3x proportion. All amplification and sequencing steps were performed at the Roy J. Carver Biotechnology Center at the University of Illinois Urbana-Champaign. ITS1 region primers consisted of ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3'). ITS2 region primers consisted of fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3').
Supplemental files 1 through 5 contain the raw data files. Supplemental 1 contains the ITS1 Illumina MiSeq forward reads and Supplemental 2 the corresponding index files. Supplemental 3 contains the ITS2 Illumina MiSeq forward reads and Supplemental 4 the corresponding index files. Supplemental 5 is the map file needed to process the forward reads and index files in QIIME. Supplemental 6 and 7 contain the resulting QIIME 1.9.1 OTU tables along with UNITE, NCBI, and CONSTAX taxonomic assignments, in addition to the representative OTU sequence.
Numeric samples within the OTU tables correspond to the following:
1 Brachythecium sp.
2 Usnea cornuta
3 Dicranum sp.
4 Leucodon julaceus
5 Lobaria quercizans
6 Rhizomnium sp.
7 Dicranum sp.
8 Thuidium delicatulum
9 Myelochroa aurulenta
10 Atrichum angustatum
11 Dicranum sp.
12 Hypnum sp.
13 Atrichum angustatum
14 Hypnum sp.
15 Thuidium delicatulum
16 Leucobryum sp.
17 Polytrichum commune
18 Atrichum angustatum
19 Atrichum angustatum
20 Atrichum crispulum
21 Bryaceae
22 Leucobryum sp.
23 Conocephalum conicum
24 Climacium americanum
25 Atrichum angustatum
26 Huperzia serrata
27 Polytrichum commune
28 Diphasiastrum sp.
29 Anomodon attenuatus
30 Bryoandersonia sp.
31 Polytrichum commune
32 Thuidium delicatulum
33 Brachythecium sp.
34 Leucobryum glaucum
35 Bryoandersonia sp.
36 Anomodon attenuatus
37 Pohlia sp.
38 Cinclidium sp.
39 Hylocomium splendens
40 Polytrichum commune
41 negative control
42 Soil
43 Soil
44 Soil
45 Soil
46 Soil
47 Soil
If a sample number is not present within the OTU table, either no sequences were obtained or no sequences passed the quality filtering step in QIIME. Supplemental 8 contains the summary of unique species per location.
published: 2019-07-29
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2019): Data from TRACTION: Fast non-parametric improvement of estimated gene trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1747658_V1
Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.
keywords:
Gene tree correction; horizontal gene transfer; incomplete lineage sorting
published: 2019-08-30
Allen, Maximilian (2019): Wisconsin Bobcat Harvest Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2501832_V1
This dataset includes the data from an analysis of bobcat harvest data, with particular focus on the relationship between catch-per-unit-effort and population size. The data relate to bobcat trapper and hunter harvest metrics from Wisconsin and include two RDS files, which can be opened in R using the readRDS() function.
keywords:
bobcat; catch-per-unit-effort; CPUE; harvest; Lynx rufus; wildlife management; trapper; hunter
published: 2017-12-22
Scheidler, Andrew; Kinnett-Hopkins, Dominique; Learmonth, Yvonne; Motl, Robert; Lopez-Ortiz, Citlali (2017): Targeted ballet program mitigates ataxia and improves agility in moderate-to-advanced multiple sclerosis. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6858418_V2
TBP assessment raw data files of pre- and post- motion capture velocity and center of pressure force plate data. Labels are self-explanatory. The .mat files contain data exported from the force plate for the time-to-stabilization assessments, while the .txt files contain the data collected for the smoothness of gait assessments. These files do not relate to one another and are from separate assessments. The version 2 files are the result of running the Python script Data_Bank_Cleaner.py on the version 1 files. Please find more information in READ_ME_databank.txt.
keywords:
Multiple Sclerosis; Rehabilitation; Balance; Ataxia; Ballet; Dance; Targeted Ballet Program
published: 2019-07-08
Kehoe, Adam K.; Torvik, Vetle I. (2019): Datasets from "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8020612_V1
# Overview

These datasets were created in conjunction with the dissertation "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents," by Adam Kehoe. The datasets consist of the following:

* twin_not_abstract_matched_complete.tsv: a tab-delimited file consisting of pairs of MEDLINE articles with identical titles, authors and years of publication. This file contains the PMIDs of the duplicate publications, as well as their medical subject headings (MeSH) and three measures of their indexing consistency.
* twin_abstract_matched_complete.tsv: the same as above, except that the MEDLINE articles also have matching abstracts.
* mesh_training_data.csv: a comma-separated file containing the training data for the model discussed in the dissertation.
* mesh_scores.tsv: a tab-delimited file containing a pairwise similarity score based on word embeddings, and MeSH hierarchy relationship.

## Duplicate MEDLINE Publications

Both twin_not_abstract_matched_complete.tsv and twin_abstract_matched_complete.tsv have the same structure. They have the following columns:

1. pmid_one: the PubMed unique identifier of the first paper
2. pmid_two: the PubMed unique identifier of the second paper
3. mesh_one: a list of medical subject headings (MeSH) from the first paper, delimited by the "|" character
4. mesh_two: a list of medical subject headings from the second paper, delimited by the "|" character
5. hoopers_consistency: the calculation of Hooper's consistency between the MeSH of the first and second paper
6. nonhierarchicalfree: a word embedding based consistency score described in the dissertation
7. hierarchicalfree: a word embedding based consistency score additionally limited by the MeSH hierarchy, described in the dissertation

## MeSH Training Data

The mesh_training_data.csv file contains the training data for the model discussed in the dissertation. It has the following columns:

1. pmid: the PubMed unique identifier of the paper
2. term: a candidate MeSH term
3. cit_count: the log of the frequency of the term in the citation candidate set
4. total_cit: the log of the total number of the paper's citations
5. citr_count: the log of the frequency of the term in the citations of the paper's citations
6. total_citofcit: the log of the total number of the citations of the paper's citations
7. absim_count: the log of the frequency of the term in the AbSim candidate set
8. total_absim_count: the log of the total number of AbSim records for the paper
9. absimr_count: the log of the frequency of the term in the citations of the AbSim records
10. total_absimr_count: the log of the total number of citations of the AbSim record
11. log_medline_frequency: the log of the frequency of the candidate term in MEDLINE
12. relevance: a binary indicator (True/False) of whether the candidate term was assigned to the target paper

## Cosine Similarity

The mesh_scores.tsv file contains a pairwise list of all MeSH terms including their cosine similarity based on the word embedding described in the dissertation. Because the MeSH hierarchy is also used in many of the evaluation measures, the relationship of the term pair is also included. It has the following columns:

1. mesh_one: a string of the first MeSH heading
2. mesh_two: a string of the second MeSH heading
3. cosine_similarity: the cosine similarity between the terms
4. relationship_type: a string identifying the relationship type, consisting of none, parent/child, sibling, ancestor and direct (terms are identical, i.e. a direct hierarchy match)

The mesh_model.bin file is a binary word2vec C format file containing the MeSH term embeddings. It was generated using version 3.7.2 of the Python gensim library (https://radimrehurek.com/gensim/). For an example of how to load the model file, see https://radimrehurek.com/gensim/models/word2vec.html#usage-examples, specifically the directions for loading the "word2vec C format."
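
The hoopers_consistency column can be illustrated with a minimal sketch that uses only the Python standard library. It applies the standard definition of Hooper's measure (terms shared by both indexings divided by all distinct terms used by either) to two "|"-delimited MeSH fields. The example headings are made up, and it is an assumption that the dataset's column was computed in exactly this way, for instance with respect to subheadings or major-topic markers.

```python
def parse_mesh(field):
    """Split a '|'-delimited MeSH field into a set of headings."""
    return {term.strip() for term in field.split("|") if term.strip()}


def hoopers_consistency(mesh_one, mesh_two):
    """Hooper's consistency: shared terms / all distinct terms from either list."""
    first, second = parse_mesh(mesh_one), parse_mesh(mesh_two)
    union = first | second
    if not union:
        # Two empty indexings: no terms to compare, so report 0.0 by convention.
        return 0.0
    return len(first & second) / len(union)


# Hypothetical row shaped like the twin_*_matched_complete.tsv files.
row = {
    "mesh_one": "Humans|Neoplasms|Female",
    "mesh_two": "Humans|Neoplasms|Male|Adult",
}
score = hoopers_consistency(row["mesh_one"], row["mesh_two"])
print(round(score, 2))  # 2 shared headings out of 5 distinct ones -> 0.4
```

Recomputing the score this way for a few rows and comparing it with the stored hoopers_consistency value is a quick check of whether the assumption holds for the files as distributed.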
keywords:
MEDLINE;MeSH;Medical Subject Headings;Indexing
published: 2019-08-29
de Moya, Robert (2019): Bemisia tabaci ortholog set. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5333299_V1
This is the published ortholog set derived from whole genome data used for the analysis of members of the B. tabaci complex of whiteflies. It includes the concatenated alignment and individual gene alignments used for analyses (Link to publication: https://www.mdpi.com/1424-2818/11/9/151).
published: 2020-10-01
Strickland, Lynette (2020): No choice mating trials and two choice mating trials in the polymorphic tortoise beetle, Chelymorpha alternans. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8972634_V1
These datasets come from mating trials performed to assess whether color pattern phenotypes of the polymorphic tortoise beetle, Chelymorpha alternans, mate randomly with one another, and whether there are any reproductive differences between assortative and disassortative pairings.
keywords:
mate choice, color polymorphisms, random mating