Datasets (67)

Funder

Other (13)
U.S. National Science Foundation (NSF) (12)
U.S. Department of Energy (DOE) (6)
U.S. National Institutes of Health (NIH) (3)
Illinois Department of Natural Resources (IDNR) (2)
U.S. Department of Agriculture (USDA) (2)
U.S. Geological Survey (USGS) (1)

License

CC0 (41)
CC BY (24)
custom (2)
published: 2017-12-15
 
These are the results of an 8-month cohort study in two commercial dairy herds in northwest Illinois. From each herd, 50 cows were selected at random, stratified over lactations 1 to 3. Serum from these animals was collected every two months and tested for antibodies to Bovine Leukosis Virus, Neospora caninum, and Mycobacterium avium subsp. paratuberculosis. Animals that left the herd during the study were replaced by another animal from the same herd and lactation. At the last sampling, serum neutralization assays were performed for Bovine Herpesvirus type 1 and Bovine Viral Diarrhea virus types 1 and 2. Production data from before and after sampling were collected for the entire herd from PCdart.
keywords: serostatus;dairy;production;cohort
published: 2017-12-15
 
Dataset includes the structure and values of a causal model for Training Quality in nuclear power plants. Each entry refers to a piece of evidence supporting causality in the Training Quality causal model. Entries include bibliographic information, context-specific text from the reference, and three weighted values: (M1) credibility of the reference, (M2) causality as determined by the author, and (M3) the analyst's confidence level. The weight metadata (M1, M2, and M3) are based on the probability language from the Intergovernmental Panel on Climate Change (IPCC), Climate Change 2001: Synthesis Report. The URL to the report: https://www.ipcc.ch/ipccreports/tar/vol4/english/index.htm. The language first appears in the “Summary for Policymakers” section of the PDF.
Weight metadata (LowerBound_Probability, UpperBound_Probability, Qualitative Language):
0.99, 1, Virtually Certain
0.9, 0.99, Very Likely
0.66, 0.9, Likely
0.33, 0.66, Medium Likelihood
0.1, 0.33, Unlikely
0.01, 0.1, Very Unlikely
0, 0.01, Extremely Unlikely
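The weight metadata amount to a lookup from a probability to a qualitative label. The sketch below shows one way to apply the bands listed above. Adjacent bands share endpoints (for example 0.9 closes "Likely" and opens "Very Likely"), and the source does not say which band a shared endpoint belongs to; assigning it to the higher band is an assumption of this sketch, as is the function name `qualitative_label`.

```python
# Likelihood bands transcribed from the weight metadata:
# (lower bound, upper bound, qualitative language), highest band first.
IPCC_BANDS = [
    (0.99, 1.00, "Virtually Certain"),
    (0.90, 0.99, "Very Likely"),
    (0.66, 0.90, "Likely"),
    (0.33, 0.66, "Medium Likelihood"),
    (0.10, 0.33, "Unlikely"),
    (0.01, 0.10, "Very Unlikely"),
    (0.00, 0.01, "Extremely Unlikely"),
]


def qualitative_label(p: float) -> str:
    """Return the qualitative label for a probability p in [0, 1].

    Bands are checked from highest to lowest, so a shared endpoint is
    assigned to the higher band (an assumption, not stated in the source).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    for lower, upper, label in IPCC_BANDS:
        if lower <= p <= upper:
            return label
    raise AssertionError("unreachable: the bands cover [0, 1]")
```

For example, `qualitative_label(0.5)` falls in the 0.33 to 0.66 band and returns "Medium Likelihood".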
keywords: Data-Theoretic; Training; Organization; Probabilistic Risk Assessment; Training Quality; Causal Model; DT-BASE; Bayesian Belief Network; Bayesian Network; Theory-Building
published: 2017-12-14
 
Objectives: This study follows up on previous work that began examining data deposited in an institutional repository. The work here extends the earlier study by answering the following research questions: (1) What is the file composition of datasets ingested into the (institution blinded for review) campus repository? Are datasets more likely to be single-file or multiple-file items? (2) What usage data are associated with these datasets? Which items are most popular? Methods: The dataset records collected in this study were identified by filtering item types categorized as "data" or "dataset" using the advanced search function in (IR blinded for review). Returned search results were collected in an Excel spreadsheet that included the Handle identifier, date ingested, file formats, composition code, and the download count from the item's statistics report. The Handle identifier is the dataset record's persistent identifier. Composition codes categorize items as single-file or multiple-file deposits. Date available is the date the dataset record was published in the campus repository. Download statistics were collected via a website link for each dataset record and indicate the number of times the dataset record has been downloaded. Once collected, the data were used to evaluate the datasets deposited into (IR blinded for review). Results: A total of 522 datasets were identified for analysis, covering the period between January 2007 and August 2016. This study revealed two influxes, one in 2008-2009 and one in 2014. During the first, a large number of PDFs were deposited by the Illinois Department of Agriculture, whereas in 2014 Microsoft Excel files were deposited by the Rare Books and Manuscript Library. Single-file datasets clearly dominate the deposits in the campus repository.
The total download count for all datasets was 139,663, and the average number of downloads per month per file across all datasets was 3.2. Conclusion: Academic librarians, repository managers, and research data services staff can use the results presented here to anticipate the nature of research data that may be deposited within institutional repositories. With increased awareness, content recruitment, and improvements, IRs can provide a viable cyberinfrastructure for researchers to deposit data, but much can be learned from the data already deposited. Awareness of trends can help librarians facilitate discussions with researchers about research data deposits as well as better tailor their services to address short-term and long-term research needs.
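The study reports a composition code per record and an average of downloads per month per file, but it does not give the exact formula. The sketch below is one plausible reading, assuming each record carries a file count, a download count, and the number of months it has been available; the `DatasetRecord` layout and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    # Hypothetical record layout loosely mirroring the spreadsheet columns.
    handle: str
    n_files: int
    downloads: int
    months_available: float  # months between "date available" and data collection


def composition_code(record: DatasetRecord) -> str:
    """Classify a deposit as a single-file or multiple-file item."""
    return "single" if record.n_files == 1 else "multiple"


def mean_downloads_per_month_per_file(records: list) -> float:
    """Average over records of downloads / (months available * number of files).

    This normalization is an assumed reading of "downloads per month per file".
    """
    rates = [r.downloads / (r.months_available * r.n_files) for r in records]
    return sum(rates) / len(rates)
```

With one single-file record downloaded 120 times over 10 months and one four-file record downloaded 80 times over 5 months, the per-record rates are 12 and 4, so the average is 8.0.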
keywords: research data; research statistics; institutional repositories; academic libraries
published: 2017-11-14
 
If you use this dataset, please cite the IJRR data paper using the above citation info. We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes, covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it is recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The data is divided into subsets, which can be downloaded individually. Video previews are available on YouTube: https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw

The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset.

Images
======

Raw
---
The raw images are stored in the cam0 and cam1 directories in bmp format. They are Bayer-pattern images that need to be debayered and undistorted before they are used. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 resolution image, so the focal length and center pixel coordinates must be scaled by 0.5 before they are used. The distortion coefficients remain the same even for the scaled images. The camera-to-IMU transformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera.

Rectified
---------
Stereo rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images, except that they are in png format. The params.yaml file contains the projection and rotation matrices necessary to use these images.
Unlike the parameters for the raw images, these parameters do not need to be scaled.

params.yml
----------
The stereo rectification parameters. R0, R1, P0, P1, and Q correspond to the outputs of the OpenCV stereoRectify function, except that 1s and 2s are replaced by 0s and 1s, respectively.
R0: The rectifying rotation matrix of the left camera.
R1: The rectifying rotation matrix of the right camera.
P0: The projection matrix of the left camera.
P1: The projection matrix of the right camera.
Q: Disparity-to-depth mapping matrix.
T_cam_imu: Transformation matrix for a point in the IMU frame to the left camera frame.

camchain-imucam.yaml
--------------------
The camera intrinsic and extrinsic parameters and the camera-to-IMU transformation, usable with the raw images.
T_cam_imu: Transformation matrix for a point in the IMU frame to the camera frame.
distortion_coeffs: Lens distortion coefficients using the radial-tangential model.
intrinsics: focal length x, focal length y, principal point x, principal point y.
resolution: Resolution of the calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled.
T_cn_cnm1: Transformation matrix from the right camera to the left camera.
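Two calibration steps described above can be sketched in a few lines of plain Python: halving the 1600x1200 intrinsics for the raw 800x600 images, and using Q to turn a rectified pixel and its disparity into a 3D point. The reprojection follows the OpenCV convention that `cv2.reprojectImageTo3D` applies to a whole disparity map, [X, Y, Z, W]^T = Q [x, y, d, 1]^T. The function names are illustrative, and this sketch assumes the matrices have already been loaded from the YAML files as nested lists. The sign and units of disparity for this dataset are not documented here, so check them against P0 and P1 before relying on the result.

```python
def scale_intrinsics(intrinsics, scale=0.5):
    """Scale [fx, fy, cx, cy] from the 1600x1200 calibration.

    The README says to scale by 0.5 for the raw 800x600 images; the
    distortion coefficients are left unchanged and are not handled here.
    """
    fx, fy, cx, cy = intrinsics
    return [fx * scale, fy * scale, cx * scale, cy * scale]


def reproject_pixel(Q, x, y, disparity):
    """Map a rectified pixel (x, y) with disparity d to a 3D point using Q.

    OpenCV convention: [X, Y, Z, W]^T = Q @ [x, y, d, 1]^T,
    and the 3D point is (X/W, Y/W, Z/W).
    """
    vec = [x, y, disparity, 1.0]
    X, Y, Z, W = (sum(q * v for q, v in zip(row, vec)) for row in Q)
    if W == 0:
        raise ValueError("W == 0: point at infinity")
    return X / W, Y / W, Z / W
```

In practice a full disparity map would go through `cv2.reprojectImageTo3D`; the per-pixel version is only meant to make the role of Q explicit.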
Sensors
-------
Here, each message in name.csv is described.

###rawimus###
time # GPS time in seconds
message name # rawimus
acceleration_z # m/s^2 IMU uses right-forward-up coordinates
-acceleration_y # m/s^2
acceleration_x # m/s^2
angular_rate_z # rad/s IMU uses right-forward-up coordinates
-angular_rate_y # rad/s
angular_rate_x # rad/s

###IMG###
time # GPS time in seconds
message name # IMG
left image filename
right image filename

###inspvas###
time # GPS time in seconds
message name # inspvas
latitude
longitude
altitude # ellipsoidal height WGS84 in meters
north velocity # m/s
east velocity # m/s
up velocity # m/s
roll # right hand rotation about y axis in degrees
pitch # right hand rotation about x axis in degrees
azimuth # left hand rotation about z axis in degrees clockwise from north

###inscovs###
time # GPS time in seconds
message name # inscovs
position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2
attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2
velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2

###bestutm###
time # GPS time in seconds
message name # bestutm
utm zone # numerical zone
utm character # alphabetical zone
northing # m
easting # m
height # m above mean sea level

Camera logs
-----------
The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by:
unused: The first column is all 1s and can be ignored.
software frame number: This number increments at the end of every iteration of the software loop.
camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value, but if the difference between the initial and final values is not the same for both, frames may have been dropped.
camera timestamp: This is the camera's internal timestamp of the frame capture in units of 100 milliseconds.
PC timestamp: This is the PC time of arrival of the image.

name.kml
--------
The kml file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory.

name.unicsv
-----------
This is a csv file of the GPS trajectory in UTM coordinates that can be read by gpsbabel, software for manipulating GPS paths.
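The camera-log README gives a simple dropped-frame heuristic: compare how far the software and camera frame counters advanced between the first and last rows. A minimal sketch, assuming each log has already been parsed into tuples in the documented column order (unused, software frame, camera frame, camera timestamp, PC timestamp). The function name is illustrative, and the file delimiter is not specified in the README, so parsing is left out.

```python
def frames_possibly_dropped(rows):
    """Return True if the software and camera frame counters advanced by
    different amounts between the first and last rows of a camera log.

    Each row is (unused, software_frame, camera_frame, camera_timestamp,
    pc_timestamp), as described for name.cam0 and name.cam1.
    """
    first, last = rows[0], rows[-1]
    software_span = last[1] - first[1]
    camera_span = last[2] - first[2]
    return software_span != camera_span
```

The counters may start at different values, which is why only the spans are compared. A mismatch suggests, but does not prove, that frames were dropped.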
keywords: slam;sangamon;river;illinois;canoe;gps;imu;stereo;monocular;vision;inertial
published: 2017-12-12
 
This dataset includes both meteorological and oceanographic data collected at stations CSI03, CSI06, and CSI09 near the Gulf of Mexico by the LSU WAVCIS (Waves-Current-Surge Information System) lab. Visualizations from the associated data analysis are saved in separate directories.
keywords: WAVCIS; Gulf of Mexico; Meteorology; Oceanography
published: 2017-12-04
 
Data used for Zaya et al. (2018), published in Invasive Plant Science and Management DOI 10.1017/inp.2017.37, are made available here. There are three spreadsheet files (CSV) available, as well as a text file that has detailed descriptions for each file ("readme.txt"). One spreadsheet file ("prices.csv") gives pricing information, associated with Figure 3 in Zaya et al. (2018). The other two spreadsheet files are associated with the genetic analysis, where one file contains raw data for biallelic microsatellite loci ("genotypes.csv") and the other ("structureResults.csv") contains the results of Bayesian clustering analysis with the program STRUCTURE. The genetic data may be especially useful for future researchers. The genetic data contain the genotypes of the horticultural samples that were the focus of the published article, and also genotypes of nearly 400 wild plants. More information on the location of the wild plant collections can be found in the Supplemental information for Zaya et al. (2015) Biological Invasions 17:2975–2988 DOI 10.1007/s10530-015-0926-z. See "readme.txt" for more information.
keywords: Horticultural industry; invasive species; microsatellite DNA; mislabeling; molecular testing
planned publication date: 2018-03-01
 
Data were used to analyze patterns in predator-specific nest predation on shrubland birds in Illinois as related to landscape composition at multiple landscape scales. Data were used in a Journal of Applied Ecology research paper of the same name. Data were collected between 2011 and 2014 at sites in east-central and northeastern Illinois, USA as part of a Ph.D. research project on the relationship between avian nest predation and landscape characteristics, and how nest predation affects adult and nestling bird behavior.
keywords: nest predation; avian ecology; land cover; landscape composition; landscape scale; nest camera; nest survival; predator-specific mortality; scale-dependence; scrubland; shrub-nesting bird
planned publication date: 2018-03-01
 
The data set consists of Illumina sequences derived from 48 sediment samples collected in 2015 from Lake Michigan and Lake Superior to inventory the fungal diversity of these two lakes. DNA was extracted from ca. 0.5 g of sediment using MoBio PowerSoil DNA isolation kits following the Earth Microbiome protocol. PCR was completed with the fungal primers ITS1F and fITS7 using the Fluidigm Access Array. The resulting amplicons were sequenced on the Illumina HiSeq 2500 platform in rapid mode with 2 x 250 nt paired-end reads. The enclosed data sets contain the forward read files for both primers, both fixed-header index files, and the associated map files needed for processing in QIIME. Also enclosed are two rarefied OTU files used to evaluate fungal diversity, along with the decimal latitude and longitude coordinates of all collecting sites.
File descriptions:
Great_lakes_Map_coordinates.xlsx = coordinates of sample sites
QIIME Processing ITS1 region: These are the raw files used to process the ITS1 Illumina reads in QIIME. ***Only forward reads were processed.
GL_ITS1_HW_mapFile_meta.txt = the map file used in QIIME.
ITS1F_Miller_Fludigm_I1_fixedheader.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME.
ITS1F_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS1 region.
QIIME Processing ITS2 region: These are the raw files used to process the ITS2 Illumina reads in QIIME. ***Only forward reads were processed.
GL_ITS2_HW_mapFile_meta.txt = the map file used in QIIME.
ITS7_Miller_Fludigm_I1_Fixedheaders.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME.
ITS7_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS2 region.
Resulting OTU table and OTU table with taxonomy
ITS1 Region
wahl_ITS1_R1_otu_table.csv = Representative OTUs based on the ITS1 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS1_R1_otu_table_w_tax.csv = Representative OTUs based on the ITS1 region for all the R1 data and the number of each OTU found in each sample, along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
ITS2 Region
wahl_ITS2_R1_otu_table.csv = Representative OTUs based on the ITS2 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS2_R1_otu_table_w_tax.csv = Representative OTUs based on the ITS2 region for all the R1 data and the number of each OTU found in each sample, along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
Rarefied Illumina dataset for each ITS region
ITS1_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for the ITS1 region.
ITS2_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for the ITS2 region.
Column headings:
#SampleID = code including researcher initials and sequential run number
BarcodeSequence =
LinkerPrimerSequence = two sequences used: CTTGGTCATTTAGAGGAAGTAA or GTGARTCATCGAATCTTTG
ReversePrimer = two sequences used: GCTGCGTTCTTCATCGATGC or TCCTCCGCTTATTGATATGC
run_prefix = initials of run operator
Sample = location code; see thesis figures 1 and 2 for mapped locations and Great_lakes_Map_coordinates.xlsx for exact coordinates.
DepthGroup = S = shallow (50-100 m), MS = mid-shallow (101-150 m), MD = mid-deep (151-200 m), and D = deep (>200 m)
Depth_Meters = depth in meters
Lake = lake name, Michigan or Superior
Nitrogen %
Carbon %
Date = mm/dd/yyyy
pH = acidity, potential of hydrogen (pH) scale
SampleDescription = sample or control
X = sequential run number
OTU ID = operational taxonomic unit ID
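The file names ITS1_R1_nosing_rare_5000.csv and ITS2_R1_nosing_rare_5000.csv suggest rarefaction to 5,000 reads per sample (and, presumably, removal of singletons), but that reading of the names is an assumption. QIIME provides its own rarefaction tools; the sketch below is not QIIME's code, only an illustration of the technique: subsampling one sample's reads without replacement to a fixed depth. The function name `rarefy` and the dictionary input are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    otu_counts maps OTU ID -> read count. Samples with fewer than `depth`
    reads are rejected, as they would normally be dropped before rarefaction.
    """
    total = sum(otu_counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    # Expand counts into a pool of individual reads labelled by OTU ID.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    return dict(Counter(drawn))
```

Passing a fixed `seed` makes a rarefied table reproducible; OTUs that happen not to be drawn simply disappear from the result.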
keywords: Illumina; next-generation sequencing; ITS; fungi
published: 2017-12-01
 
This dataset contains all the numerical results (digital elevation models) presented in the paper "Landscape evolution models using the stream power incision model show unrealistic behavior when m/n equals 0.5." The paper can be found at: http://www.earth-surf-dynam-discuss.net/esurf-2017-15/ The paper has been accepted, but the most up-to-date version may not be available at the link above. If so, please contact Jeffrey Kwang at jeffskwang@gmail.com to obtain the most up-to-date manuscript.
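For orientation, the stream power incision model named in the paper title is usually written as below, with erosion rate E, erodibility K, drainage area A, channel slope S, elevation z, and uplift rate U (symbol choices here are the conventional ones, not taken from this dataset). Setting the elevation change to zero gives the steady-state slope, which for m/n = 0.5 scales as the inverse square root of drainage area.

```latex
\begin{aligned}
E &= K A^{m} S^{n} \\
\frac{\partial z}{\partial t} &= U - K A^{m} S^{n} \\
S_{\mathrm{ss}} &= \left(\frac{U}{K}\right)^{1/n} A^{-m/n}
\quad\Longrightarrow\quad
S_{\mathrm{ss}} \propto A^{-1/2} \ \text{when } m/n = 0.5
\end{aligned}
```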
keywords: landscape evolution models; digital elevation model
published: 2017-11-29
 
This dataset contains genotypic and phenotypic data, R scripts, and the results of analysis pertaining to a multi-location field trial of Miscanthus sinensis. Genome-wide association and genomic prediction were performed for biomass yield and 14 yield-component traits across six field trial locations in Asia and North America, using 46,177 single-nucleotide polymorphism (SNP) markers mined from restriction site-associated DNA sequencing (RAD-seq) and 568 M. sinensis accessions. Genomic regions and candidate genes were identified that can be used for breeding improved varieties of M. sinensis, which in turn will be used to generate new M. ×giganteus clones for biomass.
keywords: miscanthus; genotyping-by-sequencing (GBS); genome-wide association studies (GWAS); genomic selection
published: 2016-08-18
 
Copyright Review Management System renewals by year, data from Table 2 of the article "How Large is the ‘Public Domain’? A comparative Analysis of Ringer’s 1961 Copyright Renewal Study and HathiTrust CRMS Data."
keywords: copyright; copyright renewals; HathiTrust
published: 2017-11-15
 
Monthly water withdrawal records (total pumpage and per-capita consumption) for the City of Austin, Texas (2000-2014). Data were provided by Austin Water Utility.
keywords: Water use; Water conservation
published: 2017-03-02
 
This data was collected between 2004 and 2010 at White River National Wildlife Refuge (WRNWR) and Saint Francis National Forest (SF). It was collected as part of two master’s projects and one PhD project at Arkansas State University, USA, studying Swainson’s Warbler habitat use, survival, and body condition.
keywords: Swainson’s Warbler; Limnothlypis swainsonii; flooding; natural disturbance; apparent survival; body condition
published: 2017-10-11
 
The International Registry of Reproductive Pathology Database is part of pioneering work done by Dr. Kenneth McEntee to comprehensively document thousands of disease case studies. His large and comprehensive collection of case reports and physical samples was complemented by the development of the International Registry of Reproductive Pathology Database in the 1980s. The original FoxPro database files and a migrated Access version were completed by the College of Veterinary Medicine in 2016. CSV files from the Access version were completed by the University of Illinois Library in 2017.
keywords: Animal Pathology; Databases; Veterinary Medicine
published: 2017-10-10
 
This dataset contains ground motion data for Newmark Structural Engineering Laboratory (NSEL) Report Series 048, "Modification of ground motions for use in Central North America: Southern Illinois surface ground motions for structural analysis". The data are 20 individual ground motion time history records developed at each of the 10 sites (for a total of 200 ground motions). These accompanying ground motions are developed following the detailed procedure presented in Kozak et al. [2017].
keywords: earthquake engineering; ground motion records; southern Illinois seismic hazard; dynamic structural analysis; conditional mean spectrum
published: 2017-09-28
 
This is the dataset used in the Journal of Ecology publication of the same name. The file BH.veg.data.csv contains a site-by-species matrix of species relative abundance (percent cover across all sampling quadrats within a site). The Year column refers to sampling periods: Year 1 is the first set of samples, taken between 1997 and 2000; Year 2 the second set, taken between 2002 and 2005; Year 3 the third set, taken between 2007 and 2010; and Year 4 the fourth set, taken between 2012 and 2015. All sites met the Critical Trends Assessment Program (CTAP) size criteria of being at least 2 ha in size with a minimum of 500 m² of suitable sampling area. The file BH.site.location.csv contains the Public Land Survey System ranges and townships in which the sites were located. All sites were located within the U.S. state of Illinois. More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source of the data on the wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
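Biotic homogenization is commonly assessed by asking whether sites become more similar to one another over time. The description does not name the similarity index used in the paper, so this sketch substitutes the Jaccard index on presence/absence purely as an illustration. It assumes one sampling period of BH.veg.data.csv has been read into a mapping from site to {species: percent cover}; the function names and that layout are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of species."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_similarity(site_cover):
    """Mean Jaccard similarity over all pairs of sites in one sampling period.

    site_cover maps site ID -> {species: percent cover}; a species counts as
    present at a site when its cover is greater than zero.
    """
    present = {
        site: {sp for sp, cover in row.items() if cover > 0}
        for site, row in site_cover.items()
    }
    pairs = list(combinations(present.values(), 2))
    if not pairs:
        raise ValueError("need at least two sites")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Computing this value separately for Years 1 to 4 and checking whether it rises would be one simple indicator of homogenization; an abundance-based index would use the percent-cover values instead of discarding them.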
keywords: biodiversity; biotic homogenization; invasive species; Phalaris arundinacea; plant population and community dynamics; similarity index; wetlands
published: 2017-09-26
 
This file contains the supplemental appendix for the article "Farmer Preferences for Agricultural Soil Carbon Sequestration Schemes" published in Applied Economic Perspectives and Policy (accepted 2017).
keywords: appendix; carbon sequestration; tillage; choice experiment
published: 2017-02-21
 
GBS data from diverse sorghum lines. Project funded by DOE, ARPA-E, and startup funds to PJ Brown.
published: 2017-02-21
 
GBS data from biparental sorghum populations provided by Dr. Bill Rooney, TAMU. Data produced and analyzed by Pradeep Hirannaiah to study recombination in sorghum. Funding for this study was provided by the Sorghum Checkoff.
published: 2017-02-23
 
GBS data from diverse sorghum lines. Project funded by DOE, ARPA-E, and startup funds to PJ Brown.
published: 2017-09-08
 
This dataset includes transport and MFM data of brickwork artificial spin ice composed of permalloy, reproducing the data in the article "Magnetic response of brickwork artificial spin ice". The transport data represent the magnetic response of connected brickwork artificial spin ice, and the MFM data show how both connected and disconnected brickwork artificial spin ice respond to external magnetic fields. SEM images of typical samples are included; each nanowire leg (island) is approximately 660 nm long, 140 nm wide, and 40 nm thick. For the transport measurements, each sample was measured in a longitudinal and a transverse geometry using a standard 4-wire technique. Red curves are the 2500 Oe to -2500 Oe sweeps and blue curves are the -2500 Oe to 2500 Oe sweeps. Each plot was saved in PDF format.
keywords: Magnetotransport
published: 2017-09-06
 
Spire angle data for sinistral whelks of the family Busyconidae. The data focus on spire angles, with some data on total shell length. Locality information is present for all modern specimens.
keywords: lightning whelk; sinistral whelk; spire angle; sourcing; Busycon; Cahokia; Spiro
published: 2017-06-15
 
Datasets used in the study, "Optimal completion of incomplete gene trees in polynomial time using OCTAL," presented at WABI 2017.
keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees
published: 2016-12-18
 
This dataset contains the numerical simulation data from a computational study of cold-front-related hydrodynamics in the Wax Lake delta. The numerical model used is ECOM-si.
keywords: Wax Lake delta; Hydrodynamics; Cold front
published: 2016-12-12
 
This dataset contains a topographic LIDAR survey (saved in “waxlake-lidar.img”) conducted over the Wax Lake delta, between longitudes −91.5848 and −91.292 degrees and latitudes 29.3647 and 29.6466 degrees. Unlike other elevation data, positive values in this LIDAR data indicate land elevation, while zero values indicate riverbed without specifying the water depth.
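Because zero marks riverbed rather than an elevation of zero, the grid should be classified before computing statistics such as mean land height. A minimal sketch, assuming the raster in waxlake-lidar.img has already been read into a list of rows (GDAL, for example, can read ERDAS Imagine .img files). Negative values are not described in the dataset, so this sketch labels them "undefined"; the function names are illustrative.

```python
def classify_lidar_cell(value):
    """Interpret one cell of the Wax Lake delta LIDAR grid.

    Per the dataset description, positive values are land elevation and
    zero marks riverbed with no water depth recorded.
    """
    if value > 0:
        return "land"
    if value == 0:
        return "riverbed"
    return "undefined"  # negative values are not described; treat as unknown


def land_fraction(grid):
    """Fraction of cells in a 2D grid (list of rows) classified as land."""
    cells = [classify_lidar_cell(v) for row in grid for v in row]
    return cells.count("land") / len(cells)
```

Any nodata value stored in the raster would need to be masked before this step; the description does not state one.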
keywords: LIDAR; Wax Lake delta
published: 2016-12-12
 
This dataset contains field measurements of water depth at the Wax Lake delta conducted in late 2012.
keywords: Wax Lake delta; Bathymetry
published: 2016-12-12
 
This dataset contains field measurements of water depth at the Wax Lake delta taken on 2012-12-01.
keywords: Wax Lake delta; Bathymetry
published: 2016-12-12
 
This dataset includes data on the Wax Lake delta from four public agencies: NGDC, USGS, NDBC, and NOAA CO-OPS. In addition to the original data, the processed data underlying the analysis figures are also shared.
keywords: Wax Lake delta; NOAA CO-OPS; NGDC; USGS; NDBC
published: 2016-12-12
 
This dataset contains field measurements of currents at two stations (Big Hogs Bayou and Delta1) in the Wax Lake delta in November 2012 and February 2013.
keywords: Wax Lake delta; Currents
published: 2017-08-11
 
Enclosed in this dataset are transport data of kagome connected artificial spin ice networks composed of permalloy nanowires. The data herein are reproductions of the data in Appendix B of the dissertation titled "Magnetotransport of Connected Artificial Spin Ice". Field sweeps with the magnetic field applied in-plane were performed in 5-degree increments for armchair-orientation and zigzag-orientation kagome artificial spin ice.
keywords: Magnetotransport; artificial spin ice; nanowires
published: 2017-07-29
 
This dataset contains the PartMC-MOSAIC simulations used in the article “Plume-exit modeling to determine cloud condensation nuclei activity of aerosols from residential biofuel combustion”. The data is organized as a set of folders, each folder representing a different scenario modeled. Each folder contains a series of NetCDF files, which are the output of the PartMC-MOSAIC simulation. They contain information on particle and gas properties, both of the biofuel burning plume and background. Input files for PartMC-MOSAIC are also included. This dataset was used during the open review process at Atmospheric Chemistry and Physics (ACP) and supports both the discussion paper and final article.
keywords: CCN; cloud condensation nuclei; activation; supersaturation; biofuel
published: 2017-06-28
 
Raw data files from the TBP assessments, containing pre- and post-program motion capture velocity data and force plate center-of-pressure data. Labels are self-explanatory. The .mat files contain data exported from the force plate for the time-to-stabilization assessments, while the .txt files contain the data collected for the smoothness-of-gait assessments. These two sets of files are from separate assessments and do not relate to one another. Note: some .txt files contain wide white space between the header and the data.
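Because some .txt files have wide white space between the header and the data, a reader that skips blank lines and splits on arbitrary whitespace avoids parsing problems. The sketch below assumes the first non-blank line is the header and every later non-blank line is a row of numbers; the actual column layout is not documented here, and the function name is hypothetical. The header is returned as a raw string because column names may themselves contain spaces.

```python
def read_gait_txt(path):
    """Read a whitespace-delimited .txt export, skipping blank lines.

    Returns (header_line, rows), where rows is a list of float lists.
    """
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    header, data = lines[0], lines[1:]
    return header, [[float(x) for x in line.split()] for line in data]
```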
keywords: Multiple Sclerosis; Rehabilitation; Balance; Ataxia; Ballet; Dance; Targeted Ballet Program
published: 2017-06-16
 
Table S3. Mean slope response for each predictive model used in the ecoinformatic analysis. Mean responses are provided for each seasonal and annual pollen data set analyzed from BCI and PNSL and are summarized by life form. Calculated p-values are provided for each model.
keywords: pollen; response; climate; ecoinformatics; BCI; PNSL; Panama
published: 2017-06-16
 
Table S2. Raw pollen counts and climatic data for each seasonal sampling period. Climatic data reflect the average daily conditions observed over the period in which samples were collected (°C/day, mm/day, MJ/m²/day). Lycopodium counts and counts for each pollen taxon reflect the aggregated pollen sum from four sampling heights.
keywords: pollen; count; climate; data; BCI; PNSL; Panama
published: 2017-06-16
 
Table S1. Pollen types identified in the BCI and PNSL pollen rain data sets. Pollen types were identified to species when possible and assigned a life form based on descriptions provided in Croat, T.B. (1978). Taxa from BCI and PNSL were assigned a 1 if present in forest census data or a 0 if absent. The relative representation of each taxon has been provided for each extended record and by dry and wet season representation respectively. CA loadings are provided for axes 1 and 2 (Fig. 1).
keywords: pollen; identifications; abundance; data; BCI; PNSL; Panama
published: 2017-06-01
 
List of Chinese students receiving a Ph.D. in chemistry between 1905 and 1964, based on two books compiling doctoral dissertations by Chinese students in the United States. Includes discipline, university, advisor, year the degree was awarded, birth and/or death date, and dissertation title. Accompanies Chapter 5: History of the Modern Chemistry Doctoral Program in Mainland China by Vera V. Mainz, published in "Igniting the Chemical Ring of Fire: Historical Evolution of the Chemical Communities in the Countries of the Pacific Rim", Seth Rasmussen, Editor. Published by World Scientific. Expected publication 2017.
keywords: Chinese; graduate student; dissertation; university; advisor; chemistry; engineering; materials science
published: 2017-05-01
 
Indianapolis Int'l Airport to Urbana:
Sampling Rate: 2 Hz
Total Travel Time: 5901534 ms or 98.4 minutes
Number of Data Points: 11805
Distance Traveled: 124 miles via I-74
Device used: Samsung Galaxy S6
Date Recorded: 2016-11-27
Parameters Recorded:
* ACCELEROMETER X (m/s²)
* ACCELEROMETER Y (m/s²)
* ACCELEROMETER Z (m/s²)
* GRAVITY X (m/s²)
* GRAVITY Y (m/s²)
* GRAVITY Z (m/s²)
* LINEAR ACCELERATION X (m/s²)
* LINEAR ACCELERATION Y (m/s²)
* LINEAR ACCELERATION Z (m/s²)
* GYROSCOPE X (rad/s)
* GYROSCOPE Y (rad/s)
* GYROSCOPE Z (rad/s)
* LIGHT (lux)
* MAGNETIC FIELD X (microT)
* MAGNETIC FIELD Y (microT)
* MAGNETIC FIELD Z (microT)
* ORIENTATION Z (azimuth °)
* ORIENTATION X (pitch °)
* ORIENTATION Y (roll °)
* PROXIMITY (i)
* ATMOSPHERIC PRESSURE (hPa)
* SOUND LEVEL (dB)
* LOCATION Latitude
* LOCATION Longitude
* LOCATION Altitude (m)
* LOCATION Altitude-google (m)
* LOCATION Altitude-atmospheric pressure (m)
* LOCATION Speed (kph)
* LOCATION Accuracy (m)
* LOCATION ORIENTATION (°)
* Satellites in range
* GPS NMEA
* Time since start in ms
* Current time in YYYY-MO-DD HH-MI-SS_SSS format
Quality Notes: There are some things to note about the quality of this data set that you may want to consider while preprocessing. This dataset was taken continuously as a single trip; no stop was made for gas along the way, making this a very long continuous dataset. It starts in the parking lot of the Indianapolis International Airport and continues directly to a gas station on Lincoln Avenue in Urbana, IL. There are a couple of parts of the trip where the phone's orientation had to be changed because my navigation cut out. These times are easy to account for based on changes in Orientation X/Y/Z. I would also advise cutting out the first couple hundred points, or the points leading up to highway speed. The phone was mounted in the cupholder in the front seat of the car.
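Two preprocessing steps follow from the notes above: parsing the "Current time" column (YYYY-MO-DD HH-MI-SS_SSS, with milliseconds after the underscore) and dropping the points leading up to highway speed. A minimal sketch, assuming rows have been loaded as dicts; the `speed_kph` key stands in for the LOCATION Speed (kph) column and is hypothetical, as is the 80 kph threshold, which is an arbitrary example rather than a value from the dataset.

```python
from datetime import datetime


def parse_current_time(stamp):
    """Parse a 'Current time' value such as '2016-11-27 14-03-22_517'.

    %f accepts the three millisecond digits and pads them to microseconds.
    """
    return datetime.strptime(stamp, "%Y-%m-%d %H-%M-%S_%f")


def trim_until_speed(rows, min_speed_kph=80.0):
    """Drop leading rows until the vehicle first reaches min_speed_kph.

    rows are dicts with a 'speed_kph' key (hypothetical name for the
    LOCATION Speed (kph) column). Returns [] if the threshold is never reached.
    """
    for i, row in enumerate(rows):
        if row["speed_kph"] >= min_speed_kph:
            return rows[i:]
    return []
```

Only the start of the trip is trimmed; later slow-downs are kept, since the notes only recommend removing the lead-up to highway speed.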
keywords: smartphone; sensor; driving; accelerometer; gyroscope; magnetometer; gps; nmea; barometer; satellite
published: 2017-05-31
 
This dataset includes maternal and early-life antigen treatments for male zebra finches, along with data on beak coloration, measures of song complexity for each male, and female responses to treated males.
Male beak color and song metadata:
* MATID = Maternal identity
* MATTRT = Maternal antigen treatment prior to egg laying (KLH = keyhole limpet hemocyanin, LPS = lipopolysaccharide, PBS = phosphate buffered saline)
* YGTRT = Young antigen treatment post-hatch (KLH = keyhole limpet hemocyanin, LPS = lipopolysaccharide, PBS = phosphate buffered saline)
* NESTBANDNUM = Nestling band number
* Haptoglobin = Haptoglobin levels at day 28 (mg/ml)
* Mean TE = Mean number of total elements in that male's song
* TE (z) = Z-transformed total elements
* Mean UE = Mean number of unique elements in the song
* UE (z) = Z-transformed unique elements
* mean phrases = Mean number of song phrases
* Phrases (z) = Z-transformed song phrases
* Mean D = Mean song duration in seconds
* D (z) = Z-transformed song duration
* B2 standard = Beak brightness, standardized so that lower values reflect less bright beaks
* B2 (z) = Z-transformed brightness
* S1R standard = Beak saturation at high wavelengths, standardized so that lower values reflect less red beaks
* S1R (z) = Z-transformed S1R
* S1U standard = Beak saturation at low wavelengths, standardized so that lower values reflect less red beaks
* S1U (z) = Z-transformed S1U
* H4B standard = Beak hue, standardized so that lower values reflect less red beaks
* H4B (z) = Z-transformed H4B
Female choice metadata:
* Control Bird = PBS, denoting that all control males received phosphate buffered saline
* Treatment Bird = Treatment the male received (keyhole limpet hemocyanin (KLH) or lipopolysaccharide (LPS))
* Beak Wipes Control = Number of beak wipes the female performed when on the control male side
* Beak Wipes Treatment = Number of beak wipes the female performed when on the treatment male side
* Hops Control = Number of hops the female performed when on the control male side
* Hops Treatment = Number of hops the female performed when on the treatment male side
* Time Spent Near Control = Amount of time (sec) the female spent on the control male side
* Time Spent Near Treatment = Amount of time (sec) the female spent on the treatment male side
keywords: early-life; stress; immune response; phenotypic correlation; sexual signal; zebra finch;birdsongs; acoustic signals; beak coloration; mate selection
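The "(z)" columns are z-transformed versions of the corresponding raw measures. A minimal Python sketch of a standard z-transform follows; it assumes the transform was taken across all males and uses the sample standard deviation, neither of which the metadata state.

```python
from statistics import mean, stdev


def z_transform(values):
    """Return (x - mean) / sd for each value; needs at least two values."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1); the authors' exact choice is not stated
    return [(v - m) / s for v in values]
```

Using statistics.pstdev instead would give the population-SD variant; the two differ only by a constant factor, so rankings of males are unaffected.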
published: 2016-06-23
 
This dataset contains hourly traffic speed estimates for individual links of the New York City road network for the years 2010-2013, derived from New York City taxi trip data.
keywords: traffic estimates; traffic conditions; New York City
published: 2017-03-07
 
This is a sample 5-minute video of an E. coli bacterium swimming in a microfluidic chamber, along with supplementary code files for use with the MATLAB code available at https://github.com/dfraebel/CellTracking
published: 2016-05-19
 
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip record includes the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also include fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request to the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. The script contains additional documentation and contact information that may help if you have trouble accessing the contents of the zip files.
keywords: taxi;transportation;New York City;GPS
published: 2016-06-23
 
This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (https://search.datacite.org/ui) during December 2015. Metadata records for items with a resourceType of dataset were collected, for a total of 1,647,949 records. This dataset contains three files:
1) readme.txt: A readme file.
2) version-results.csv: A CSV file containing three columns: DOI, DOI prefix, and version text contents.
3) version-counts.csv: A CSV file containing counts for unique version text content values.
keywords: datacite;metadata;version values;repository data
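version-counts.csv summarizes the third column of version-results.csv. A minimal Python sketch of that kind of tally follows; it assumes version-results.csv has a single header row and that values are counted by exact text match. The helper names count_version_values and count_from_file are hypothetical, and the deposit's own counting procedure may differ.

```python
import csv
from collections import Counter


def count_version_values(rows, has_header=True):
    """Tally exact version text values from (DOI, DOI prefix, version) rows."""
    it = iter(rows)
    if has_header:
        next(it, None)  # skip the assumed header row
    counts = Counter()
    for row in it:
        if len(row) >= 3:  # ignore malformed rows with fewer than three columns
            counts[row[2]] += 1
    return counts


def count_from_file(path):
    """Apply the tally to a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_version_values(csv.reader(f))
```

For example, count_from_file("version-results.csv").most_common(10) would list the ten most frequent version strings.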
published: 2016-05-26
 
This dataset includes survey responses collected during 2015 from academic libraries with library publishing services. Each institution answered questions about its use of user studies or information about readers to shape digital publication design, formats, and interfaces. The survey data were supplemented with institutional categories to facilitate comparison across institution types.
keywords: academic libraries; publishing; user experience; user studies
published: 2017-03-08
 
This dataset documents the early embryogenesis and post-embryonic development of the soybean cyst nematode.
keywords: Soybean cyst nematode; Embryogenesis; Post-embryonic development
published: 2017-02-28
 
Leesburg, VA to Indianapolis, Indiana:
Sampling Rate: 0.1 Hz
Total Travel Time: 31100007 ms or 518 minutes or 8.6 hours
Distance Traveled: 570 miles via I-70
Number of Data Points: 3112
Device used: Samsung Galaxy S4
Date Recorded: 2017-01-15
Parameters Recorded:
* ACCELEROMETER X (m/s²)
* ACCELEROMETER Y (m/s²)
* ACCELEROMETER Z (m/s²)
* GRAVITY X (m/s²)
* GRAVITY Y (m/s²)
* GRAVITY Z (m/s²)
* LINEAR ACCELERATION X (m/s²)
* LINEAR ACCELERATION Y (m/s²)
* LINEAR ACCELERATION Z (m/s²)
* GYROSCOPE X (rad/s)
* GYROSCOPE Y (rad/s)
* GYROSCOPE Z (rad/s)
* LIGHT (lux)
* MAGNETIC FIELD X (microT)
* MAGNETIC FIELD Y (microT)
* MAGNETIC FIELD Z (microT)
* ORIENTATION Z (azimuth °)
* ORIENTATION X (pitch °)
* ORIENTATION Y (roll °)
* PROXIMITY (i)
* ATMOSPHERIC PRESSURE (hPa)
* Relative Humidity (%)
* Temperature (F)
* SOUND LEVEL (dB)
* LOCATION Latitude
* LOCATION Longitude
* LOCATION Altitude (m)
* LOCATION Altitude-google (m)
* LOCATION Altitude-atmospheric pressure (m)
* LOCATION Speed (kph)
* LOCATION Accuracy (m)
* LOCATION ORIENTATION (°)
* Satellites in range
* GPS NMEA
* Time since start in ms
* Current time in YYYY-MO-DD HH-MI-SS_SSS format
Quality Notes: There are some things to note about the quality of this dataset that you may want to consider during preprocessing. This dataset was recorded continuously but includes multiple refueling stops (data recording did not stop during them). These stops can be removed by filtering out all records with a speed of 0. The mount for this dataset was fairly stable, as shown by the consistent orientation angles throughout the dataset; the phone was mounted tightly between two seats in the back of the vehicle. Unfortunately, the sampling rate for this dataset was set fairly low, at one sample every ten seconds.
keywords: smartphone; sensor; driving; accelerometer; gyroscope; magnetometer; gps; nmea; barometer; satellite; temperature; humidity
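Removing the refueling stops by discarding zero-speed records can be sketched in Python as follows. This is an illustrative sketch that assumes rows are read as dictionaries keyed by the parameter names above (for example with csv.DictReader). Note that it also drops any zero-speed samples recorded in stopped traffic, not only refueling stops.

```python
# Column name taken from the parameter list; the exact CSV header text is an assumption.
SPEED_COL = "LOCATION Speed (kph)"


def drop_stationary(rows, speed_col=SPEED_COL):
    """Remove rows whose speed is exactly 0; rows with a missing or unparseable speed are kept."""
    kept = []
    for row in rows:
        try:
            if float(row[speed_col]) == 0.0:
                continue  # stationary sample, e.g. during a refueling stop
        except (KeyError, TypeError, ValueError):
            pass  # speed unknown: keep the row rather than silently dropping it
        kept.append(row)
    return kept
```

Keeping rows with an unknown speed is a conservative choice; a stricter filter could drop them as well.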
published: 2016-12-20
 
Scripts and example data for AidData (aiddata.org) processing in support of the forthcoming Nakamura dissertation. This dataset includes two sets of scripts and example data files from an aiddata.org data dump. Fuller documentation of the scripts' functionality is in the readme file. Additional background information and a description of usage will be in the forthcoming Nakamura dissertation (link will be added when available). Data originally supplied by Nakamura. Python code and this readme file created by Wickes. The data included in this deposit are examples to demonstrate execution. There are two main Python tools here: keyword_search.py, a script designed to assist in finding records matching specific keywords, and matching_tool.ipynb, a Jupyter notebook designed to assist in detecting which records are and are not contained within a keyword results file and an aiddata project data file.
keywords: aiddata; natural resources
published: 2016-12-19
 
Files in this dataset represent an investigation into use of the Library mobile app Minrva from May 2015 through December 2015. During this interval, the Minrva web server recorded 45,975 API hits. The dataset includes the following: 1) a monthly breakdown of API hits by the Minrva app modules used, 2) a general analysis relating Minrva app downloads to module use, and 3) the annotated data file associating API hits with the specific modules used, organized by month (May 2015 – December 2015).
keywords: API analysis; log analysis; Minrva Mobile App
published: 2016-12-13
 
BAM files for the founding strain (MG1655-motile) as well as for evolved strains from replicate motility selection experiments in low-viscosity agar plates containing either rich medium (LB) or minimal medium (M63 + 0.18 mM galactose).
published: 2016-12-02
 
This dataset counts the geocoded tweets captured through the Twitter Streaming API in rectangular geographic bounding boxes around the metropolitan statistical areas (MSAs) defined for 49 American cities, during a four-week period in 2012 (between April and June). More information on MSA definitions: https://www.census.gov/population/metro/
keywords: human dynamics; social media; urban informatics; pace of life; Twitter; ecological correlation; individual behavior
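Counting geocoded tweets per rectangular bounding box reduces to a point-in-rectangle test. The Python sketch below is illustrative only: it assumes boxes are given as (min_lon, min_lat, max_lon, max_lat) and tweets as (lon, lat) pairs, the box coordinates in the test are made up rather than actual MSA bounds, it does not handle boxes that cross the antimeridian, and it is not necessarily how this dataset's counts were produced.

```python
from collections import Counter


def in_bbox(lon, lat, box):
    """True if (lon, lat) lies inside box = (min_lon, min_lat, max_lon, max_lat), edges included."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def count_tweets(points, boxes):
    """Count (lon, lat) points per named box; a point inside overlapping boxes counts toward each."""
    counts = Counter({name: 0 for name in boxes})  # start every box at zero
    for lon, lat in points:
        for name, box in boxes.items():
            if in_bbox(lon, lat, box):
                counts[name] += 1
    return counts
```

Whether a tweet falling in two overlapping boxes should count for both cities is a design decision; this sketch counts it for each.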
published: 2016-11-30
 
This is the dataset used in the BioScience publication of the same name.
More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for the data on natural areas in this study. More information on the program and data requests can be obtained by visiting the program webpage: Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
These spatial datasets were used for analyses:
* Illinois Natural History Survey. 2003. Illinois GAP analysis land cover classification 1999-2000, 1:100 000 Scale, Raster Digital Data, Version 2.0. Champaign, IL, USA.
* Illinois State Geological Survey. 1995. Illinois Landcover Thematic Map Coverage Map 1991-1995. Champaign, IL, USA.
* Illinois State Geological Survey. 2001. Illinois Landcover Thematic Map Coverage Map 1999-2000. Champaign, IL, USA.
* USDA National Agricultural Statistics Service Cropland Data Layer. 1999-2015. Published crop-specific data layer [Online]. Available at https://nassgeodata.gmu.edu/CropScape/. USDA-NASS, Washington, DC.
Information on agricultural practices and landcover changes was derived from the following U.S. Department of Agriculture (USDA) resources:
* USDA Economic Research Service. 2016. Adoption of Genetically Engineered Crops in the U.S. Available at http://www.ers.usda.gov/data-products/. USDA-ERS, Washington, DC.
* USDA Natural Resources Conservation Service. 2015. Summary Report: 2012 National Resources Inventory. https://www.nrcs.usda.gov/Internet/FSE_DOCUMENTS/nrcseprd396218.pdf. USDA-NRCS, Washington, DC, and Center for Survey Statistics and Methodology, Iowa State University, Ames, Iowa.
keywords: Milkweed; Monarch Butterfly; CTAP Critical Trends Assessment Program; BioScience
published: 2016-11-28
 
These datasets show the topography and relief of the Precambrian surface of the Cratonic Platform of the United States.
keywords: precambrian; geology; relief; elevation
published: 2016-06-06
 
These datasets represent first-time collaborations between first and last authors (with mutually exclusive publication histories) on PubMed papers with 2 to 5 authors published between 1988 and 2009, inclusive. Each record in each dataset captures aspects of the similarity, nearness, and complementarity between the two authors with respect to the paper that marks the formation of their collaboration.
published: 2016-08-16
 
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the Pfam families used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder contains two files: initial.fasta, the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta. The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.
[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
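The directory layout and query file naming described above can be expressed as small path helpers. This is a Python sketch under stated assumptions: the family folder name used in the test ("PF00001") is a hypothetical example, and the fragment lengths are assumed to appear in file names exactly as written (1, 0.5, 0.25). The helper names are illustrative, not part of the archive.

```python
from pathlib import PurePosixPath

CROSS_FOLDS = (0, 1, 2, 3)
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")  # assumed to appear in file names exactly as written


def fold_files(root, family, fold):
    """Return (alignment, tree) paths for Pfam family X and cross-fold set Y under root."""
    base = PurePosixPath(root) / family / str(fold)
    return base / "initial.fasta", base / "initial.fasttree"


def query_file(splits_dir, fold, fragment):
    """Return the query file path query.Y.Z.fas for cross-fold Y and fragment length Z."""
    return PurePosixPath(splits_dir) / f"query.{fold}.{fragment}.fas"


def all_query_files(splits_dir="splits"):
    """List every expected query file: 4 cross-folds x 3 fragment lengths."""
    return [query_file(splits_dir, y, z) for y in CROSS_FOLDS for z in FRAGMENT_LENGTHS]
```

PurePosixPath only builds path strings and never touches the file system, so the helpers can be checked before the archives are extracted.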
published: 2016-08-02
 
These data are the result of a multi-step process aimed at enriching BIBFRAME RDF with linked data. The process takes an initial MARC XML file and transforms it to BIBFRAME RDF/XML; four separate Python scripts, corresponding to the BIBFRAME 1.0 model classes (Work, Instance, Annotation, and Authority), are then run over the BIBFRAME RDF/XML output. The input and outputs of each step are included in this dataset. Input file types include CSV, MARC XML, and master RDF/XML files. The CSV files contain bibliographic identifiers for e-books; from them, a set of MARC XML files is generated. The MARC XML files are then used to produce the master RDF file set. The main outputs of the enrichment code are BIBFRAME linked data in the form of Annotation RDF, Instance RDF, Work RDF, and Authority RDF.
keywords: BIBFRAME; Schema.org; linked data; discovery; MARC; MARCXML; RDF
published: 2016-07-22
 
Datasets and R scripts relating to the manuscript "Ecological characteristics and in situ genetic associations for yield-component traits of wild Miscanthus from eastern Russia," published in Annals of Botany (doi:10.1093/aob/mcw137). Field data, including collection locations, physical and ecological information for each location, and plant phenotypes relating to biomass, are included. Genetic data in this repository include single nucleotide polymorphisms (SNPs) derived from restriction site-associated DNA sequencing (RAD-seq), as well as plastid microsatellites. A file is also included listing the DNA sequences of all RAD-seq markers generated to date by the Sacks lab, including those from this publication.
keywords: Miscanthus sacchariflorus; Miscanthus sinensis; Russia; germplasm; RAD-seq; SNP
published: 2016-05-16
 
This dataset contains the protein sequences and trees used to compare NRPS condensation domains in the AMB gene cluster; it was used to create Figure S1 in Rojas et al. 2015. Rather than collecting representative sequences independently, researchers may use this set of condensation domain sequences as a quick reference set for coarse classification of condensation domains.
keywords: condensation domain; NRPS; biosynthetic gene cluster; antimetabolite; Pseudomonas; oxyvinylglycine; secondary metabolite; thiotemplate; toxin
published: 2016-06-23
 
This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (http://search.datacite.org/ui) during December 2015. Metadata records for items with a resourceType of dataset were collected, for a total of 1,647,949 records. This dataset contains four files:
1) readme.txt: A readme file.
2) language-results.csv: A CSV file containing three columns: DOI, DOI prefix, and language text contents.
3) language-counts.csv: A CSV file containing counts for unique language text content values.
4) language-grouped-counts.txt: A text file containing the results of manually grouping these language codes.
keywords: datacite;metadata;language codes;repository data