Illinois Data Bank Dataset Search Results
published:
2017-06-16
Haselhorst, Derek S.; Tcheng, David K.; Moreno, J. Enrique; Punyasena, Surangi W.
(2017)
Table S1. Pollen types identified in the BCI and PNSL pollen rain data sets. Pollen types were identified to species when possible and assigned a life form based on descriptions provided in Croat, T.B. (1978). Taxa from BCI and PNSL were assigned a 1 if present in forest census data or a 0 if absent. The relative representation of each taxon is provided for each extended record and separately for the dry and wet seasons. CA loadings are provided for axes 1 and 2 (Fig. 1).
keywords:
pollen; identifications; abundance; data; BCI; PNSL; Panama
published:
2018-04-23
Mishra, Shubhanshu; Torvik, Vetle I.
(2018)
Conceptual novelty analysis data based on PubMed Medical Subject Headings
----------------------------------------------------------------------
Created by Shubhanshu Mishra and Vetle I. Torvik on April 16, 2018
## Introduction
This dataset was created as part of the publication: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib Magazine: The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra.
It contains the final data generated in our experiments, based on the MEDLINE 2015 baseline and the 2015 MeSH tree.
The dataset is distributed as the following tab-separated text files:
* PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
- PMID: PubMed ID
- Year: year of publication
- TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
- VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
- PairTimeNovelty: time novelty score of the paper based on pairs of concepts (see paper)
- PairVolumeNovelty: volume novelty score of the paper based on pairs of concepts (see paper)
* mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
- MeshTerm: Name of the MeSH term
- Year: year
- AbsVal: Total number of publications with that MeSH term in the given year
- TimeNovelty: age (in years since first publication) of the MeSH term in the given year
- VolumeNovelty: age (in number of papers since first publication) of the MeSH term in the given year
* meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each pair of MeSH terms for all years. The file contains 6 columns, as follows:
- Mesh1: Name of the first MeSH term (alphabetically sorted)
- Mesh2: Name of the second MeSH term (alphabetically sorted)
- Year: year
- AbsVal: Total number of publications with that MeSH pair in the given year
- TimeNovelty: age (in years since first publication) of the MeSH pair in the given year
- VolumeNovelty: age (in number of papers since first publication) of the MeSH pair in the given year
* README.txt file
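The file meshpair_scores.txt.gz expands to roughly 36 GB, so streaming it is usually easier than loading it whole. The Python sketch below filters the pair profiles for one MeSH term, reading one line at a time. It rests on three assumptions:

- the file is plain tab-separated text;
- the columns follow the order listed above;
- any header row begins with `Mesh1`.

The function names are illustrative. They are not part of the dataset or its source code.

```python
import csv
import gzip
from typing import Iterable, Iterator

# Column order as documented for meshpair_scores.txt.gz. Whether the file has a
# header row is an assumption, so a row starting with "Mesh1" is skipped if present.
PAIR_COLUMNS = ["Mesh1", "Mesh2", "Year", "AbsVal", "TimeNovelty", "VolumeNovelty"]


def iter_pair_profiles(lines: Iterable[str], term: str) -> Iterator[dict]:
    """Yield the yearly profiles of every MeSH pair that contains `term`."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != len(PAIR_COLUMNS) or row[0] == "Mesh1":
            continue  # skip blank or malformed lines and a possible header
        record = dict(zip(PAIR_COLUMNS, row))
        if term in (record["Mesh1"], record["Mesh2"]):
            record["Year"] = int(record["Year"])
            for key in ("AbsVal", "TimeNovelty", "VolumeNovelty"):
                record[key] = float(record[key])
            yield record


def stream_pair_profiles(path: str, term: str) -> Iterator[dict]:
    # gzip.open in text mode decompresses on the fly, so the ~36 GB of
    # uncompressed text is never held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from iter_pair_profiles(handle, term)
```

For example, `for rec in stream_pair_profiles("meshpair_scores.txt.gz", "Humans"): ...` visits only the rows whose pair includes that term.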
## Dataset creation
This dataset was constructed using multiple datasets described in the following locations:
* MEDLINE 2015 baseline: <a href="https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html">https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html</a>
* MeSH tree 2015: <a href="ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/">ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/</a>
* Source code provided at: <a href="https://github.com/napsternxg/Novelty">https://github.com/napsternxg/Novelty</a>
Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016.
See <a href="https://www.nlm.nih.gov/databases/download/pubmed_medline.html">here</a> for information on obtaining PubMed/MEDLINE data and on NLM's data Terms and Conditions.
Additional data-related updates can be found at: <a href="http://abel.ischool.illinois.edu">Torvik Research Group</a>
## Acknowledgments
This work was made possible in part with funding to VIT from <a href="https://projectreporter.nih.gov/project_info_description.cfm?aid=8475017&icde=18058490">NIH grant P01AG039347</a> and <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1348742">NSF grant 1348742</a>. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
## License
Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License.
Permissions beyond the scope of this license may be available at <a href="https://github.com/napsternxg/Novelty">https://github.com/napsternxg/Novelty</a>
keywords:
Conceptual novelty; bibliometrics; PubMed; MEDLINE; MeSH; Medical Subject Headings; Analysis
published:
2022-01-01
Cao, Yanghui; Dietrich, Christopher H.
(2022)
The file “Fla.fasta” (10,526 positions) contains the concatenated amino acid alignments of 51 orthologues from 182 bacterial strains. It was used for the maximum likelihood and maximum parsimony analyses of Flavobacteriales. Bacterial species and strain names were used as sequence names, and the host names of insect endosymbionts are shown in brackets.
The file “16S.fasta” (1,455 positions) is the alignment of 233 bacterial 16S rRNA sequences. It was used for the maximum likelihood analysis of flavobacterial insect endosymbionts. The names of endosymbiont strains were replaced by the names of their hosts. In addition to the species names, National Center for Biotechnology Information (NCBI) accession numbers are included in the sequence names. For example, sequence “Cicadellidae_Deltocephalinae_Macrostelini_Macrosteles_striifrons_AB795320” is the 16S rRNA of Macrosteles striifrons (Cicadellidae: Deltocephalinae: Macrostelini), with NCBI accession number AB795320.
The file “Sulcia_pep.fasta” (41,970 positions) contains the concatenated amino acid alignments of 131 orthologues of “Candidatus Sulcia muelleri” (Sulcia). It includes 101 Sulcia strains and 3 Blattabacterium strains and was used for the maximum likelihood analysis of Sulcia. The file “Sulcia_nucleotide.fasta” (127,339 positions) is the concatenated nucleotide alignment corresponding to the sequences in “Sulcia_pep.fasta”, with the 16S rRNA alignment added. It was used for the maximum likelihood and maximum parsimony analyses of Sulcia.
Individual gene alignments (16S rRNA and the 131 orthologues of Sulcia and Blattabacterium) are deposited in the compressed file “individual_gene_alignments.zip”. These were used to construct gene trees for the multispecies coalescent analysis. In “Sulcia_pep.fasta”, “Sulcia_nucleotide.fasta”, and the files in “individual_gene_alignments.zip”, the names of Sulcia strains were replaced by the names of their hosts.
In all the alignment files, gaps are indicated by “-”.
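A quick consistency check on any of these alignments has two parts: confirm that every sequence has the same number of positions, and measure how much of the matrix is gaps. The minimal Python sketch below does both with a hand-written FASTA reader. The helper names are hypothetical. `accession_from_name` assumes, based on the example name above, that the NCBI accession in “16S.fasta” is always the last underscore-separated token.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: aligned sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequences may wrap over several lines
    return {key: "".join(parts) for key, parts in chunks.items()}


def alignment_summary(records: Dict[str, str]) -> dict:
    """Check that all sequences are equally long and report the gap ('-') fraction."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one alignment length, found {sorted(lengths)}")
    (positions,) = lengths
    gaps = sum(seq.count("-") for seq in records.values())
    return {
        "sequences": len(records),
        "positions": positions,
        "gap_fraction": gaps / (positions * len(records)),
    }


def accession_from_name(name: str) -> str:
    # Assumption: in "16S.fasta" the NCBI accession is the last "_"-separated token.
    return name.rsplit("_", 1)[-1]
```

To run it, use `with open("16S.fasta") as fh: summary = alignment_summary(read_fasta(fh))`. If the file matches the description above, this should report 233 sequences and 1,455 positions.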
keywords:
endosymbiont, “Candidatus Sulcia muelleri”, Auchenorrhyncha, coevolution
published:
2024-08-24
Jones, Todd; Llamas, Alfredo; Phillips, Jennifer
(2024)
Dataset associated with the Jones et al. GCB-23-1273.R1 submission: "Phenotypic signatures of urbanization? Resident, but not migratory, songbird eye size varies with urban-associated light pollution levels." Includes two files: an Excel CSV file with all of the data used in the analyses, and a file describing each column.
keywords:
body size; demographics; eye size; phenotypic divergence; songbirds; sensory pollution; urbanization
published:
2023-12-18
Edmonds, Devin; Adamovicz, Laura; Allender, Matthew; Colton, Andrea; Nyboer, Randy; Dreslik, Michael
(2023)
We conducted long-term capture-mark-recapture surveys on two isolated ornate box turtle (Terrapene ornata) populations in northern Illinois, USA. This dataset provides the capture history strings and additional demographic information used for estimating population vital rates with robust design capture-mark-recapture models. The vital rates were then used in a stage-based population projection matrix model for each population.
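In a stage-based projection matrix model, the asymptotic population growth rate λ is the dominant eigenvalue of the matrix built from the estimated vital rates. The Python sketch below finds λ by power iteration. The two-stage matrix in the usage note is invented purely for illustration. The dataset's actual stage structure and vital-rate estimates are not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def growth_rate(A: Matrix, max_iter: int = 10_000, tol: float = 1e-12) -> float:
    """Dominant eigenvalue (lambda) of a non-negative stage-based projection matrix.

    Power iteration: repeatedly project a stage vector one time step forward;
    the ratio of total abundance after/before a step converges to lambda.
    """
    n = len(A)
    v = [1.0 / n] * n  # arbitrary positive starting stage distribution (sums to 1)
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0:
            return 0.0  # the projected population is empty after one step
        new_lam = total  # v sums to 1, so sum(w) / sum(v) == total
        v = [x / total for x in w]  # renormalize so v keeps summing to 1
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```

Consider the hypothetical juvenile/adult matrix `[[0.0, 0.2], [0.1, 0.95]]`, with adult fecundity 0.2, juvenile-to-adult transition 0.1, and adult survival 0.95. For this matrix, `growth_rate` returns about 0.97, which indicates a slowly declining population. A value of λ > 1 would indicate growth.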
keywords:
demography; capture-mark-recapture; vital rates; conservation; wildlife ecology
published:
2011-09-20
Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San
(2011)
This page provides the data for SuperFine, DACTAL, and BeeTLe publications.
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic Biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing SuperFine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published:
2017-12-14
Hepler, Katherine C.
(2017)
keywords:
uranium harvesting from seawater; Geospatial analysis; adsorbent performance; NPRE 412
published:
2017-11-14
Miller, Martin; Chung, Soon-Jo; Hutchinson, Seth
(2017)
If you use this dataset, please cite the IJRR data paper (bibtex is below).
We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device. Together these provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a 44-minute canoe trip up and down the river, covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments. It is unique in that it was recorded on a river, which presents its own set of challenges and constraints, described in the paper. The data are divided into subsets, which can be downloaded individually.
Video previews are available on YouTube:
https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw
The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset.
Images
======
Raw
---
The raw images are stored in BMP format in the cam0 and cam1 directories. They are Bayer-pattern images that must be debayered and undistorted before use. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 image, so the focal lengths and principal point coordinates must be scaled by 0.5 before use. The distortion coefficients remain the same for the scaled images. The camera-to-IMU transformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera.
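The intrinsics scaling described above amounts to multiplying fx, fy, cx, and cy by 0.5 and building the usual 3x3 pinhole camera matrix. The Python sketch below assumes the four `intrinsics` values have already been read from camchain-imucam.yaml (for example, with a YAML parser) in the order fx, fy, cx, cy. The function name is hypothetical.

```python
from typing import List, Sequence


def scaled_camera_matrix(intrinsics: Sequence[float], scale: float = 0.5) -> List[List[float]]:
    """Build a 3x3 pinhole camera matrix from [fx, fy, cx, cy], scaled for a resized image.

    The intrinsics describe a 1600x1200 image while the raw images are 800x600,
    hence the default scale of 0.5. Distortion coefficients are left untouched
    because they stay the same for the scaled image.
    """
    fx, fy, cx, cy = (value * scale for value in intrinsics)
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]
```

After debayering, this matrix and the unchanged `distortion_coeffs` can be passed to an undistortion routine such as OpenCV's `cv2.undistort`. Convert the matrix to a NumPy array first.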
Rectified
---------
Stereo-rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images, except that they are in PNG format. The params.yaml file contains the projection and rotation matrices needed to use these images. Unlike the parameters for the raw images, these parameters do not need to be scaled for resolution.
params.yml
----------
The stereo rectification parameters. R0, R1, P0, P1, and Q correspond to the outputs of the OpenCV stereoRectify function (R1, R2, P1, P2, and Q). The only difference is that indices 1 and 2 are replaced by 0 and 1, respectively.
R0: The rectifying rotation matrix of the left camera.
R1: The rectifying rotation matrix of the right camera.
P0: The projection matrix of the left camera.
P1: The projection matrix of the right camera.
Q: The disparity-to-depth mapping matrix.
T_cam_imu: Transformation matrix that maps a point in the IMU frame to the left camera frame.
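The Q matrix converts a disparity value into a 3D point in the rectified left camera frame: [X, Y, Z, W]^T = Q·[u, v, d, 1]^T, and the point is (X/W, Y/W, Z/W). The Python sketch below applies that relation to a single pixel. Real Q values should come from params.yml. `example_q` is a hypothetical helper that builds a Q for an ideal rig (equal principal points, baseline along x), included only so the arithmetic can be checked.

```python
from typing import List, Sequence, Tuple


def reproject_pixel(Q: Sequence[Sequence[float]], u: float, v: float,
                    disparity: float) -> Tuple[float, float, float]:
    """Map a rectified left-image pixel (u, v) with disparity d to a 3D point.

    Uses OpenCV's convention [X, Y, Z, W]^T = Q @ [u, v, d, 1]^T and returns
    (X/W, Y/W, Z/W) in the rectified left camera frame.
    """
    vec = (u, v, disparity, 1.0)
    X, Y, Z, W = (sum(row[k] * vec[k] for k in range(4)) for row in Q)
    if W == 0:
        raise ValueError("zero disparity maps to a point at infinity")
    return X / W, Y / W, Z / W


def example_q(f: float, cx: float, cy: float, baseline: float) -> List[List[float]]:
    # Hypothetical Q for an ideal rig (identical principal points, Tx = -baseline),
    # built only for illustration; real values come from params.yml.
    Tx = -baseline
    return [
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, -1.0 / Tx, 0.0],
    ]
```

As a worked example, take f = 400 px, cx = 400, cy = 300, and a 0.1 m baseline. A disparity of 8 px at pixel (480, 300) then maps to (1.0, 0.0, 5.0) m, which matches Z = f·B/d. For whole disparity maps, OpenCV's `cv2.reprojectImageTo3D` performs the same operation.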
camchain-imucam.yaml
--------------------
The camera intrinsic and extrinsic parameters, and the camera-to-IMU transformation, for use with the raw images.
T_cam_imu: Transformation matrix that maps a point in the IMU frame to the camera frame.
distortion_coeffs: Lens distortion coefficients for the radial-tangential model.
intrinsics: focal length x, focal length y, principal point x, principal point y
resolution: resolution of calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled.
T_cn_cnm1: Transformation matrix from the right camera to the left camera.
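The T_ matrices above are 4x4 homogeneous transforms. The Python sketch below does two things. First, it applies a transform to a 3D point, for example mapping an IMU-frame point into the camera frame with T_cam_imu. Second, it inverts a rigid transform so you can map points back the other way. It assumes each matrix is stored as four rows of four numbers with the last row [0, 0, 0, 1]. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix4 = List[List[float]]


def transform_point(T: Sequence[Sequence[float]], p: Sequence[float]) -> Tuple[float, float, float]:
    """Apply a 4x4 homogeneous transform T to the 3D point p."""
    hom = (p[0], p[1], p[2], 1.0)  # homogeneous coordinates
    x, y, z = (sum(T[r][c] * hom[c] for c in range(4)) for r in range(3))
    return x, y, z


def invert_rigid(T: Sequence[Sequence[float]]) -> Matrix4:
    """Invert a rigid transform [R t; 0 1] as [R^T  -R^T t; 0 1]."""
    R_t = [[T[c][r] for c in range(3)] for r in range(3)]  # transpose of the rotation block
    t = [T[r][3] for r in range(3)]
    t_inv = [-sum(R_t[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [R_t[r] + [t_inv[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
```

The inverse uses R^T rather than a general matrix inverse. This works because the rotation block of a rigid transform is orthonormal.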
Sensors
-------
Each message type in name.csv is described below.
###rawimus###
time # GPS time in seconds
message name # rawimus
acceleration_z # m/s^2 IMU uses right-forward-up coordinates
-acceleration_y # m/s^2
acceleration_x # m/s^2
angular_rate_z # rad/s IMU uses right-forward-up coordinates
-angular_rate_y # rad/s
angular_rate_x # rad/s
###IMG###
time # GPS time in seconds
message name # IMG
left image filename
right image filename
###inspvas###
time # GPS time in seconds
message name # inspvas
latitude
longitude
altitude # ellipsoidal height WGS84 in meters
north velocity # m/s
east velocity # m/s
up velocity # m/s
roll # right hand rotation about y axis in degrees
pitch # right hand rotation about x axis in degrees
azimuth # left hand rotation about z axis in degrees clockwise from north
###inscovs###
time # GPS time in seconds
message name # inscovs
position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2
attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2
velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2
###bestutm###
time # GPS time in seconds
message name # bestutm
utm zone # numerical zone
utm character # alphabetical zone
northing # m
easting # m
height # m above mean sea level
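The message layouts above can be turned into a small dispatcher that splits name.csv by message type. The Python sketch below assumes comma-separated rows whose first two columns are the GPS time and the message name, as listed above. The field labels are hypothetical names transcribed from those descriptions. Values are left as strings for the caller to convert.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical labels transcribed from the message descriptions; rawimus keeps the
# recorded column order (z, -y, x) instead of reordering or flipping signs.
COV = [f"{kind}_cov_{i}" for kind in ("position", "attitude", "velocity") for i in range(9)]
FIELDS: Dict[str, List[str]] = {
    "rawimus": ["acceleration_z", "neg_acceleration_y", "acceleration_x",
                "angular_rate_z", "neg_angular_rate_y", "angular_rate_x"],
    "IMG": ["left_image", "right_image"],
    "inspvas": ["latitude", "longitude", "altitude", "north_velocity", "east_velocity",
                "up_velocity", "roll", "pitch", "azimuth"],
    "inscovs": COV,  # 3 blocks of 9 covariance values (xx ... zz)
    "bestutm": ["utm_zone", "utm_character", "northing", "easting", "height"],
}


def parse_messages(lines: Iterable[str]) -> List[dict]:
    """Split name.csv rows by message type; column 1 is GPS time, column 2 the message name."""
    messages = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        name = row[1].strip()
        labels = FIELDS.get(name)
        if labels is None:
            continue  # unknown message type: skip rather than guess its layout
        record = {"time": float(row[0]), "message": name}
        record.update(zip(labels, (value.strip() for value in row[2:])))
        messages.append(record)
    return messages
```

The rawimus columns stay in their recorded order (z, -y, x). This avoids silently changing the sign or axis convention, and any conversion to another frame is left to the user.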
Camera logs
-----------
The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by:
unused: The first column is all 1s and can be ignored.
software frame number: This number increments at the end of every iteration of the software loop.
camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value. However, if the two counters advance by different amounts between the first and last rows, frames may have been dropped.
camera timestamp: This is the camera's internal timestamp of the frame capture, in units of 100 milliseconds.
PC timestamp: This is the PC time of arrival of the image.
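The dropped-frame check described above compares how far each counter advanced over the whole log. The Python sketch below assumes each line holds the five columns in the order listed, separated by whitespace or commas, with integer-valued frame numbers. The helper names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_cam_log(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (software frame number, camera frame number) pairs from name.cam0/name.cam1."""
    rows = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if len(fields) >= 3:  # unused, software frame, camera frame, ...
            rows.append((int(float(fields[1])), int(float(fields[2]))))
    return rows


def frames_possibly_dropped(rows: List[Tuple[int, int]]) -> int:
    """Camera-counter span minus software-counter span over the whole log."""
    (sw_first, cam_first), (sw_last, cam_last) = rows[0], rows[-1]
    return (cam_last - cam_first) - (sw_last - sw_first)
```

A result of 0 is consistent with no dropped frames. A positive value means the camera counter advanced further than the software loop, which suggests that some triggered frames were not recorded.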
name.kml
--------
The KML file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory.
name.unicsv
-----------
This is a CSV file of the GPS trajectory in UTM coordinates that can be read by GPSBabel, a program for manipulating GPS paths.
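As a rough check against the 2.7 km round trip quoted above, the trajectory length can be summed from consecutive UTM positions. The Python sketch below assumes two things: the (easting, northing) pairs have already been extracted from name.unicsv or from the bestutm messages, and all points fall within one UTM zone. The function name is illustrative.

```python
import math
from typing import Iterable, Optional, Tuple


def path_length_m(points: Iterable[Tuple[float, float]]) -> float:
    """Sum straight-line distances between consecutive (easting, northing) points, in metres.

    Only meaningful if every point lies in the same UTM zone (assumed here).
    """
    total = 0.0
    previous: Optional[Tuple[float, float]] = None
    for easting, northing in points:
        if previous is not None:
            total += math.hypot(easting - previous[0], northing - previous[1])
        previous = (easting, northing)
    return total
```

GPS noise adds small zig-zags between fixes, so a sum over raw fixes tends to slightly overestimate the distance actually travelled.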
@article{doi:10.1177/0278364917751842,
author = {Martin Miller and Soon-Jo Chung and Seth Hutchinson},
title ={The Visual–Inertial Canoe Dataset},
journal = {The International Journal of Robotics Research},
volume = {37},
number = {1},
pages = {13-20},
year = {2018},
doi = {10.1177/0278364917751842},
URL = {https://doi.org/10.1177/0278364917751842},
eprint = {https://doi.org/10.1177/0278364917751842}
}
keywords:
slam;sangamon;river;illinois;canoe;gps;imu;stereo;monocular;vision;inertial
published:
2019-09-17
Mishra, Shubhanshu
(2019)
Trained models for multi-task, multi-dataset learning for text classification in tweets.
Classification tasks include sentiment prediction, abusive content detection, sarcasm detection, and veridicality prediction.
Models were trained using: <a href="https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py">https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py</a>
See <a href="https://github.com/socialmediaie/SocialMediaIE">https://github.com/socialmediaie/SocialMediaIE</a> and <a href="https://socialmediaie.github.io">https://socialmediaie.github.io</a> for details.
If you are using this data, please also cite the related article:
Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords:
twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning; sentiment; sarcasm; abusive content
published:
2020-08-21
Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana
(2020)
# WikiCSSH
If you are using WikiCSSH please cite the following:
> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. “WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia.” In Workshop on Scientific Knowledge Graphs (SKG 2020). https://skg.kmi.open.ac.uk/SKG2020/papers/HAN_et_al_SKG_2020.pdf
> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. "WikiCSSH - Computer Science Subject Headings from Wikipedia". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0424970_V1
Download the WikiCSSH files from: https://doi.org/10.13012/B2IDB-0424970_V1
More details about the WikiCSSH project can be found at: https://github.com/uiuc-ischool-scanr/WikiCSSH
This folder contains the following files:
WikiCSSH_categories.csv - Categories in WikiCSSH
WikiCSSH_category_links.csv - Links between categories in WikiCSSH
Wikicssh_core_categories.csv - Core categories as mentioned in the paper
WikiCSSH_category_links_all.csv - Links between categories in WikiCSSH (includes a dummy category called <ROOT> which is parent of isolates and top level categories)
WikiCSSH_category2page.csv - Links between Wikipedia pages and Wikipedia Categories in WikiCSSH
WikiCSSH_page2redirect.csv - Links between Wikipedia pages and Wikipedia page redirects in WikiCSSH
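The category link files describe a hierarchy. A breadth-first search from the dummy `<ROOT>` category in WikiCSSH_category_links_all.csv therefore gives each category's depth. The Python sketch below assumes each link row after the header is a (parent, child) pair. Check the actual column order in the CSV before relying on it. The function names are illustrative.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Sequence


def load_links(rows: Iterable[Sequence[str]]) -> Dict[str, List[str]]:
    """Map each parent category to its child categories (rows assumed to be parent, child)."""
    children: Dict[str, List[str]] = defaultdict(list)
    for parent, child in rows:
        children[parent].append(child)
    return children


def depths_from_root(children: Dict[str, List[str]], root: str = "<ROOT>") -> Dict[str, int]:
    """Breadth-first search giving each reachable category's depth below the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:  # also guards against cycles in the category graph
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth
```

Rows can be supplied with Python's `csv.reader` over the opened file, after skipping the header line.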
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <a href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</a> or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
keywords:
wikipedia; computer science
published:
2022-03-19
McCoy, Annette; Secor, Erica; Roady, Patrick; Gray, Sarah; Klein, Julie; Gutierrez-Nibeyro, Santiago
(2022)
Raw arthroscopic scores, histologic scores, cytokine measurements, and performance data for the study cohort described in the accompanying publication.
keywords:
horse; metatarsophalangeal joint; arthroscopy; exercise; developmental orthopedic disease
published:
2016-06-23
This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (https://search.datacite.org/ui) in December 2015. Metadata records for items with a resourceType of dataset were collected, for a total of 1,647,949 records.
This dataset contains three files:
1) readme.txt: A readme file.
2) version-results.csv: A CSV file containing three columns: DOI, DOI prefix, and version text contents
3) version-counts.csv: A CSV file containing counts for unique version text content values.
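The tallies in version-counts.csv can in principle be reproduced from version-results.csv. The Python sketch below counts each distinct version value. It assumes a header row and the documented column order (DOI, DOI prefix, version text). The original counts do not say whether whitespace was trimmed, as this sketch does.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def count_versions(rows: Iterable[Sequence[str]]) -> List[Tuple[str, int]]:
    """Return (version text, count) pairs, most frequent first."""
    it = iter(rows)
    next(it, None)  # skip the header row (assumed present)
    counts = Counter(row[2].strip() for row in it if len(row) >= 3)
    return counts.most_common()
```

Rows can come from `csv.reader` over version-results.csv. The csv module handles any quoted commas inside the version text.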
keywords:
datacite;metadata;version values;repository data
published:
2024-10-10
Mishra, Apratim; Lee, Haejin; Jeoung, Sullam; Torvik, Vetle; Diesner, Jana
(2024)
Diversity - PubMed dataset
Contact: Apratim Mishra (Oct, 2024)
This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. It is an expanded version of V1 and comprises 907,024 papers and 1,316,838 authors retrieved from Author-ity 2018 [1]. The sample is restricted to articles that meet all of the following criteria: from the top 40 journals in the dataset, with 2–12 authors, published between 1991 and 2014, of article type "journal type", and written in English. Files are gzip-compressed and tab-separated. V3 includes corrected author counts for the included papers (pmids) and updated results with no NaN values.
################################################
File1: auids_plos_3.csv.gz (5 columns in total; important columns defined below)
• AUID: a unique ID for each author
• Genni: gender prediction
• Ethnea: ethnicity prediction
#################################################
File2: pmids_plos_3.csv.gz (Important columns defined)
• pmid: unique paper identifier (PubMed ID)
• auid: all unique author IDs (AUIDs; author-name unique identifiers) of the paper's authors
• year: Year of paper publication
• no_authors: Author count
• journal: Journal name
• years: first year of publication for every author
• Country-temporal: Country of affiliation for every author
• h_index: Journal h-index
• TimeNovelty: Paper Time novelty [2]
• nih_funded: Binary variable indicating whether any author received NIH funding
• prior_cit_mean: Mean of all authors’ prior citation rate
• Insti_impact: All unique institutions’ citation rate
• mesh_vals: Top MeSH values for every author of that paper
• relative_citation_ratio: Relative citation ratio (RCR)
The README describes all columns.
[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
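The files are gzip-compressed and tab-separated despite the .csv extension, so they can be streamed with Python's `gzip` and `csv` modules. The sketch below re-applies the stated sample criteria (2–12 authors, 1991–2014) as a filter on pmids_plos_3.csv.gz. It assumes a header row with the column names listed above. The function names are hypothetical.

```python
import csv
import gzip
from typing import Iterable, Iterator


def filter_sample(lines: Iterable[str], min_authors: int = 2, max_authors: int = 12,
                  first_year: int = 1991, last_year: int = 2014) -> Iterator[dict]:
    """Yield rows of pmids_plos_3 that satisfy the stated author-count and year criteria."""
    for row in csv.DictReader(lines, delimiter="\t"):
        n_authors = int(float(row["no_authors"]))
        year = int(float(row["year"]))
        if min_authors <= n_authors <= max_authors and first_year <= year <= last_year:
            yield row


def stream_sample(path: str = "pmids_plos_3.csv.gz") -> Iterator[dict]:
    # Despite the .csv extension, the file is gzip-compressed and tab-separated.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from filter_sample(handle)
```

Running this filter over the full file is a quick way to confirm that every row already meets the documented criteria.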
keywords:
Diversity; PubMed; Citation
published:
2025-07-21
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W.
(2025)
This dataset includes image stacks, annotated counts, and ground-truth masks from two high-resolution sediment cores extracted from Laguna Pallcacocha, El Cajas National Park, in the Ecuadorian Andes, by Moy et al. (2002) and Hagemans et al. (2021). The first core (PAL 1999, from Moy et al. (2002)) extends through the Holocene (11,600 cal yr BP to present). The PAL 1999 domain contains a total of 900 annotated image stacks and masks. The second core (PAL IV, from Hagemans et al. (2021)) captures the 20th century. The PAL IV domain contains 2,986 annotated image stacks and masks.
Different microscopes and annotation tools were used to image and annotate each core, and naming conventions and file formats differ accordingly. We therefore organized the data separately for the PAL 1999 and PAL IV domains. The three-letter codes used to label the pollen annotations are listed in the file “Pollen_Identification_Codes.xlsx”.
Both domain directories contain:
• Image stacks organized by subdirectory
• Annotations within each image stack directory, containing specimen identifications using a three letter code and coordinates defining bounding boxes or circles
• Ground-truth distance-transform masks for each image stack
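A distance-transform mask assigns each pixel inside an object its distance to the nearest background pixel, so values peak near grain centres. The Python sketch below computes a city-block (4-connected) distance transform with two raster passes. This is a swapped-in illustration. The metric, normalization, and exact construction of the masks in this dataset are not described here and may differ. For a Euclidean version, SciPy provides `scipy.ndimage.distance_transform_edt`.

```python
from typing import List

Grid = List[List[int]]


def city_block_distance_transform(mask: Grid) -> Grid:
    """4-connected step distance from each foreground pixel to the nearest background pixel.

    mask: 1 = pollen (foreground), 0 = background; background pixels get 0.
    If the mask has no background at all, foreground pixels keep the sentinel value.
    """
    rows, cols = len(mask), len(mask[0])
    inf = rows + cols  # larger than any city-block distance inside the grid
    dist = [[inf if mask[r][c] else 0 for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if dist[r][c]:
                if r > 0:
                    dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
                if c > 0:
                    dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if dist[r][c]:
                if r < rows - 1:
                    dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
                if c < cols - 1:
                    dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist
```

Two passes are enough for the city-block metric because any shortest 4-connected path can be split into moves covered by the forward sweep and moves covered by the backward sweep.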
The zip file "bestValModel_encoder.paramOnly.zip" is the trained pollen detection model produced from the images and annotations in this dataset.
Please cite this dataset as:
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W. (2025): Slide scans, annotated pollen counts, and trained pollen detection models for fossil pollen samples from Laguna Pallcacocha, El Cajas National Park, Ecuador. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4207757_V1
Please also include citations of the original publications from which these data are taken:
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” bioRxiv, January 1, 2025. https://doi.org/10.1101/2025.01.05.631390.
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” Paleobiology, 2025 [in press].
Feng, J. T. (2023). Open-world deep learning applied to pollen detection (MS thesis, University of Illinois at Urbana-Champaign). https://hdl.handle.net/2142/120168
keywords:
continual learning; deep learning; domain gaps; open-world; palynology; pollen grain detection; taxonomic bias
published:
2025-01-30
Raw data associated with PMID: 38925247
published:
2025-01-30
Zhang, Yufan; Bhattarai, Rabin
(2025)
This dataset contains the research data for the manuscript "A Framework of Simulating Structural Sediment Perimeter Barriers using VFSMOD."
keywords:
sediment control
published:
2017-12-14
Objectives: This study follows up on previous work that began examining data deposited in an institutional repository. It extends the earlier study by addressing the following research questions: (1) What is the file composition of datasets ingested into the University of Illinois at Urbana-Champaign campus repository? Are datasets more likely to be single-file or multiple-file items? (2) What usage data are associated with these datasets? Which items are most popular?
Methods: The dataset records collected in this study were identified by filtering item types categorized as "data" or "dataset" using the advanced search function in IDEALS. The returned search results were recorded in an Excel spreadsheet with the following fields for each item: Handle identifier, date ingested, file formats, composition code, and download count from the item's statistics report. The Handle identifier is the dataset record's persistent identifier. The composition code categorizes items as single-file or multiple-file deposits. Date available is the date the dataset record was published in the campus repository. Download statistics were collected via a website link for each dataset record and indicate the number of times the record has been downloaded. The collected data were then used to evaluate the datasets deposited into IDEALS.
Results: A total of 522 datasets were identified for analysis, covering the period from January 2007 to August 2016. The study revealed two influxes of deposits, one in 2008–2009 and one in 2014. During the first period, the Illinois Department of Agriculture deposited a large number of PDFs, whereas in 2014 the Rare Books and Manuscript Library deposited Microsoft Excel files. Single-file datasets clearly dominate the deposits in the campus repository. The total download count for all datasets was 139,663, and the average number of downloads per month per file across all datasets was 3.2.
Conclusion: Academic librarians, repository managers, and research data services staff can use these results to anticipate the nature of research data that may be deposited in institutional repositories (IRs). With increased awareness, content recruitment, and improvements, IRs can provide a viable cyberinfrastructure for researchers to deposit data, but much can also be learned from the data already deposited. Awareness of these trends can help librarians facilitate discussions with researchers about research data deposits and better tailor their services to short-term and long-term research needs.
keywords:
research data; research statistics; institutional repositories; academic libraries
published:
2022-06-01
Southey, Bruce; Rodriguez-Zas, Sandra L.
(2022)
This dataset contains information for the paper "Changes in neuropeptide prohormone genes among Cetartiodactyla livestock and wild species associated with evolution and domestication," Veterinary Sciences, MDPI. Protein sequences for 98 neuropeptide prohormone genes were predicted using GeneWise from the publicly available genomes of 118 Cetartiodactyla species. All predictions (CetartiodactylaSequences2022.zip) were manually verified. Sequences were aligned within each prohormone using MAFFT (MDPImultalign2022.zip includes the multiple sequence alignment of all species available for each prohormone). Phylogenetic gene trees were constructed using PhyML, and the species tree was constructed using ASTRAL (MDPItree2022.zip). The data are released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
keywords:
prohormone; neuropeptide; Cetartiodactyla; phylogenetics; gene tree; species tree
published:
2025-09-23
Zhao, Huimin; Chen, Li-Qing; Martin, Teresa; Xue, Xueyi; Singh, Nilmani; Tan, Shi-I; Boob, Aashutosh
(2025)
Mitochondria play a key role in energy production and metabolism, making them a promising target for metabolic engineering and disease treatment. However, despite the known influence of passenger proteins on localization efficiency, only a few protein-localization tags have been characterized for mitochondrial targeting. To address this limitation, we leverage a Variational Autoencoder to design novel mitochondrial targeting sequences. In silico analysis reveals that a high fraction of the generated peptides (90.14%) are functional and possess features important for mitochondrial targeting. We characterize artificial peptides in four eukaryotic organisms and, as a proof-of-concept, demonstrate their utility in increasing 3-hydroxypropionic acid titers through pathway compartmentalization and improving 5-aminolevulinate synthase delivery by 1.62-fold and 4.76-fold, respectively. Moreover, we employ latent space interpolation to shed light on the evolutionary origins of dual-targeting sequences. Overall, our work demonstrates the potential of generative artificial intelligence for both fundamental research and practical applications in mitochondrial biology.
keywords:
AI/ML; metabolic engineering; modeling; software
published:
2017-06-16
Haselhorst, Derek S.; Tcheng, David K.; Moreno, J. Enrique; Punyasena, Surangi W.
(2017)
Table S2. Raw pollen counts and climatic data for each seasonal sampling period. Climatic data reflect the average daily conditions observed over the period during which samples were collected (°C/day, mm/day, MJ/m²/day). Lycopodium counts and the counts for each pollen taxon reflect the aggregated pollen sum from four sampling heights.
keywords:
pollen; count; climate; data; BCI; PNSL; Panama
published:
2020-12-07
Tian, Yuan; Smith-Bolton, Rachel
(2020)
This page contains the data for the publication "Regulation of growth and cell fate during tissue regeneration by the two SWI/SNF chromatin-remodeling complexes of Drosophila" published in Genetics, 2020
published:
2020-11-25
Barker, Louise; Gaulke, Sarah M.; Chace, Jordyn Z.; Davis, Mark A.; Niemiller, Matthew L.; Taylor, Steven J.; Schuett, Gordon W.
(2020)
Video recorded by Louise Barker using a Canon PowerShot camera, documenting late-season combat behavior in Agkistrodon contortrix. Recorded in Beaufort County, North Carolina, 11.1 km SE of downtown Washington, on 21 October 2020.
keywords:
Agkistrodon contortrix; combat; mating; reproduction; copperhead; pit viper; Viperidae
published:
2017-06-01
List of Chinese students who received a Ph.D. in Chemistry between 1905 and 1964, based on two books compiling doctoral dissertations by Chinese students in the United States. Includes discipline, university, advisor, year the degree was awarded, birth and/or death dates, and dissertation title. Accompanies Chapter 5: History of the Modern Chemistry Doctoral Program in Mainland China by Vera V. Mainz, published in "Igniting the Chemical Ring of Fire: Historical Evolution of the Chemical Communities in the Countries of the Pacific Rim", Seth Rasmussen, Editor. Published by World Scientific. Expected publication 2017.
keywords:
Chinese; graduate student; dissertation; university; advisor; chemistry; engineering; materials science
published:
2017-06-16
Haselhorst, Derek S.; Tcheng, David K.; Moreno, J. Enrique; Punyasena, Surangi W.
(2017)
Table S3. Mean slope response for each predictive model used in the ecoinformatic analysis. Mean responses are provided for each seasonal and annual pollen data set analyzed from BCI and PNSL and are summarized by life form. Calculated p-values are provided for each model.
keywords:
pollen; response; climate; ecoinformatics; BCI; PNSL; Panama
published:
2017-09-26
Gramig, Benjamin M.; Widmar, Nicole
(2017)
This file contains the supplemental appendix for the article "Farmer Preferences for Agricultural Soil Carbon Sequestration Schemes" published in Applied Economic Perspectives and Policy (accepted 2017).
keywords:
appendix; carbon sequestration; tillage; choice experiment