published: 2019-10-27
This dataset accompanies the paper "STREETS: A Novel Camera Network Dataset for Traffic Flow" at Neural Information Processing Systems (NeurIPS) 2019. Included are:
* Over four million still images from publicly accessible cameras in Lake County, IL. The images were collected across 2.5 months in 2018 and 2019.
* Directed graphs describing the camera network structure in two communities in Lake County.
* Documented non-recurring traffic incidents in Lake County coinciding with the 2018 data.
* Traffic counts for each day of images in the dataset. These counts track the volume of traffic in each community.
* Other annotations and files useful for computer vision systems.
Refer to the accompanying "readme.txt" or "readme.pdf" for further details.
keywords: camera network; suburban vehicular traffic; roadways; computer vision
published: 2017-11-14
If you use this dataset, please cite the IJRR data paper (bibtex is below).

We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it is recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The data is divided into subsets, which can be downloaded individually. Video previews are available on YouTube: https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw

The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset.

Images
======

Raw
---
The raw images are stored in the cam0 and cam1 directories in bmp format. They are bayered images that need to be debayered and undistorted before they are used. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 resolution image, so the focal length and center pixel coordinates must be scaled by 0.5 before they are used with the 800x600 raw images. The distortion coefficients remain the same even for the scaled images. The camera to IMU transformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera.

Rectified
---------
Stereo rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images except that they are in png format. The params.yaml file contains the projection and rotation matrices necessary to use these images.
These parameters do not need to be scaled, unlike those for the raw images.

params.yml
----------
The stereo rectification parameters. R0, R1, P0, P1, and Q correspond to the outputs of the OpenCV stereoRectify function except that 1s and 2s are replaced by 0s and 1s, respectively.
R0: The rectifying rotation matrix of the left camera.
R1: The rectifying rotation matrix of the right camera.
P0: The projection matrix of the left camera.
P1: The projection matrix of the right camera.
Q: Disparity-to-depth mapping matrix.
T_cam_imu: Transformation matrix for a point in the IMU frame to the left camera frame.

camchain-imucam.yaml
--------------------
The camera intrinsic and extrinsic parameters and the camera to IMU transformation usable with the raw images.
T_cam_imu: Transformation matrix for a point in the IMU frame to the camera frame.
distortion_coeffs: lens distortion coefficients using the radial tangential model.
intrinsics: focal length x, focal length y, principal point x, principal point y
resolution: resolution of calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled.
T_cn_cnm1: Transformation matrix from the right camera to the left camera.
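The intrinsics scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the dataset: the function name and the sample numbers are hypothetical, and the real values should be read from camchain-imucam.yaml.

```python
def scale_intrinsics(intrinsics, scale=0.5):
    """Scale [fx, fy, cx, cy] by a uniform image scale factor.

    The distortion coefficients are deliberately left out: per the README,
    they do not change when the image is scaled.
    """
    fx, fy, cx, cy = intrinsics
    return [fx * scale, fy * scale, cx * scale, cy * scale]


# Hypothetical calibration values at 1600x1200, for illustration only.
calib_intrinsics = [1000.0, 1000.0, 800.0, 600.0]

# Intrinsics usable with the 800x600 raw images.
raw_intrinsics = scale_intrinsics(calib_intrinsics)
print(raw_intrinsics)
```

The rectified images need no such step, since the parameters shipped with them already match their resolution.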
Sensors
-------
Here, each message in name.csv is described.

###rawimus###
time # GPS time in seconds
message name # rawimus
acceleration_z # m/s^2 IMU uses right-forward-up coordinates
-acceleration_y # m/s^2
acceleration_x # m/s^2
angular_rate_z # rad/s IMU uses right-forward-up coordinates
-angular_rate_y # rad/s
angular_rate_x # rad/s

###IMG###
time # GPS time in seconds
message name # IMG
left image filename
right image filename

###inspvas###
time # GPS time in seconds
message name # inspvas
latitude
longitude
altitude # ellipsoidal height WGS84 in meters
north velocity # m/s
east velocity # m/s
up velocity # m/s
roll # right hand rotation about y axis in degrees
pitch # right hand rotation about x axis in degrees
azimuth # left hand rotation about z axis in degrees clockwise from north

###inscovs###
time # GPS time in seconds
message name # inscovs
position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2
attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2
velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2

###bestutm###
time # GPS time in seconds
message name # bestutm
utm zone # numerical zone
utm character # alphabetical zone
northing # m
easting # m
height # m above mean sea level

Camera logs
-----------
The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by:
unused: The first column is all 1s and can be ignored.
software frame number: This number increments at the end of every iteration of the software loop.
camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value, but if the difference between the initial and final values is not the same, it suggests that frames may have been dropped.
camera timestamp: This is the camera's internal timestamp of the frame capture in units of 100 milliseconds.
PC timestamp: This is the PC time of arrival of the image.

name.kml
--------
The kml file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory.

name.unicsv
-----------
This is a csv file of the GPS trajectory in UTM coordinates that can be read by gpsbabel, software for manipulating GPS paths.

@article{doi:10.1177/0278364917751842,
  author = {Martin Miller and Soon-Jo Chung and Seth Hutchinson},
  title = {The Visual–Inertial Canoe Dataset},
  journal = {The International Journal of Robotics Research},
  volume = {37},
  number = {1},
  pages = {13-20},
  year = {2018},
  doi = {10.1177/0278364917751842},
  URL = {https://doi.org/10.1177/0278364917751842},
  eprint = {https://doi.org/10.1177/0278364917751842}
}
keywords: slam;sangamon;river;illinois;canoe;gps;imu;stereo;monocular;vision;inertial
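The dropped-frame check described for the camera logs of the canoe dataset can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name is hypothetical, the two lists are assumed to have been parsed already from the software and camera frame number columns of a name.cam0 or name.cam1 file, and counter wraparound is not handled.

```python
def frames_possibly_dropped(software_frames, camera_frames):
    """Compare the first-to-last span of the two frame counters.

    The counters may start at different values, so only the differences
    between their initial and final values are compared.
    """
    software_span = software_frames[-1] - software_frames[0]
    camera_span = camera_frames[-1] - camera_frames[0]
    return software_span != camera_span


# Made-up frame numbers: the camera counter advanced one step further
# than the software counter, which suggests a dropped frame.
print(frames_possibly_dropped([10, 11, 12], [100, 101, 103]))
```

A mismatch only suggests that frames may have been dropped; it does not say where in the sequence the drop occurred.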
published: 2024-01-01
These data were used to make a predictive model of when ornate box turtles (Terrapene ornata) are likely to be above ground and at risk from fire. The data were generated using shell temperatures, soil temperatures at 0.35 m deep from known overwintering sites, and the spring and fall soil temperature inversion dates during 2019–2022 to infer if 26 individual radio-tracked turtles were above or below ground at three sites in Illinois.
keywords: turtle; conservation; controlled burn; fire management; ectotherm; hibernation; brumation; reptile
published: 2024-01-01
Supplementary data tables for the dissertation "Hybridization dynamics and population genomics of a Manacus hybrid zone." This work focuses on the dynamics of hybridization over time in two species of tropical birds, the golden-collared manakin (Manacus vitellinus) and white-collared manakin (Manacus candei), comparing data from historical museum samples and contemporary wild-caught birds. Table A1 contains the sample metadata for the Manacus restriction site-associated DNA sequencing dataset used in the dissertation, with associated NCBI Biosample Accession numbers, Smithsonian Museum of Natural History number (where applicable), sample IDs, sampling site locations, and, for each sample, the year it was taken, age, and sex. Table A6 contains phenotypic measurements of male plumage traits of manakins used in cline analyses to assess hybrid zone movement over time in historical and contemporary datasets, including beard length (mm), epaulet width (mm), tail length (mm), collar color (nm), and belly color (nm). Table A7 contains a summary of male plumage measurements across the hybrid zone. Table C1 contains a list of annotated protein coding genes in candidate regions of interest in Manacus genomes, identified using outlier regions of genomic divergence, linkage disequilibrium, and enrichment of parental private alleles.
keywords: csv; manacus; manakin; genomics; dissertation
published: 2024-01-01
Contains scattering data obtained for (TaSe4)2I at the Advanced Photon Source at Argonne National Laboratory. Beamline 6ID-D was used with a beam energy of 64.8 keV in a transmission geometry. Data was obtained at temperatures between 28 and 300 K. See the readme.txt file for more information.
keywords: X-ray diffraction
published: 2023-12-18
Data in this publication were used to examine the effects of habitat and landscape-level covariates on occupancy and interannual dynamics and the effects of environmental factors on detection of Black-billed Cuckoos and Yellow-billed Cuckoos. Data were collected during 2019-2020 in northern Illinois, USA. Procedures were approved by the Illinois Institutional Animal Care and Use Committee (IACUC), protocol no. 19086.
keywords: Black-billed Cuckoo; habitat use; multi-scale; occupancy dynamics; turnover; Yellow-billed Cuckoo
published: 2023-12-15
This page contains the data for the publication "Regenerative growth is constrained by brain tumor to ensure proper patterning in Drosophila" published in PLOS Genetics in 2023.
published: 2023-12-13
Corbicula spp. are among the most prolific aquatic invasive species in the world and can have negative effects on aquatic ecosystems. We performed qualitative field surveys, examined literature accounts and natural history museum holdings, and accessed citizen science data sources to document the distribution of Corbicula in Mexico and shared drainages. Through 26 publications (N = 127 records), 312 museum holdings, and 446 iNaturalist records, we documented 885 records pertaining to Corbicula in Mexico and shared drainages. The first record of the species in Mexico was in 1969, and it has since been reported from 26 of the 32 Mexican states and most of the major river basins throughout the country. However, we suggest Corbicula is more prevalent in Mexico than we report in this work, as it is often undersampled and underreported.
keywords: Corbicula; exotic species; invasive species; Asian Clams; Bivalvia; freshwater systems
published: 2023-12-06
This dataset accompanies an article published in the journal Bioacoustics: "Tradeoffs in sound quality and cost for passive acoustic devices", https://doi.org/10.1080/09524622.2023.2290715. The dataset contains measurements for acoustic call files for free-flying bats simultaneously recorded on both Audiomoth and Anabat Swift passive acoustic recording devices in a conservation area in northeastern Missouri, USA. We paired calls from the two devices and compared indicators of recording quality measured in a proprietary program (Bat Call Identification Software). The dataset also contains a file enumerating the proportions of calls classified as low frequency, mid frequency, or Myotis (three phonic groups) for each type of recording device. The data were used to compare the quality and sensitivity of the two devices. The scripts for modeling procedures and figures are included in the dataset.
keywords: Bats; echolocation; passive acoustic monitoring; sensors
published: 2023-10-26
Simulation trajectory data and scripts for Nature Nanotechnology manuscript "A DNA turbine powered by a transmembrane potential across a nanopore" that demonstrates a rationally designed nanoscale DNA-origami turbine with three chiral blades that uses a transmembrane electrochemical potential across a nanopore to drive a DNA bundle into sustained unidirectional rotations of up to 10 revolutions/s. Driven by the asymmetric mobility of a DNA duplex, the rotation direction of the turbine is set by its designed chirality and the salinity of the solvent.
keywords: All-atom MD simulation; DNA; nanotechnology; motors and rotors
published: 2023-10-22
HGT+ILS datasets from Davidson, R., Vachaspati, P., Mirarab, S., & Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC genomics, 16(10), 1-12. Contains model species trees, true and estimated gene trees, and simulated alignments.
keywords: evolution; computational biology; bioinformatics; phylogenetics
published: 2023-10-16
This dataset provides microhabitat and environmental variables collected in the habitat of the poison frog Mantella baroni from 155 1-meter square quadrats in Vohimana Reserve along forest valleys, on slopes, and on ridgelines. We also provide data from photographic capture-recapture surveys used for estimating abundance.
keywords: occupancy; abundance; amphibian; Madagascar; microhabitat; capture-recapture
published: 2019-07-11
We studied the effect of windstorm disturbance on forest invasive plants in southern Illinois. This data includes raw data on plant abundance at survey points, compiled data used in statistical analyses, and spatial data for surveyed plots and units. This file package also includes a readme.doc file that describes the data in detail, including attribute descriptions.
keywords: tornado, blowdowns, derecho, invasive plants, Shawnee National Forest, southern Illinois
published: 2023-09-21
The relationship between physical activity and mental health, especially depression, is one of the most studied topics in the field of exercise science and kinesiology. Although there is strong consensus that regular physical activity improves mental health and reduces depressive symptoms, some debate the mechanisms involved in this relationship as well as the limitations and definitions used in such studies. Meta-analyses and systematic reviews continue to examine the strength of the association between physical activity and depressive symptoms for the purpose of improving exercise prescription as treatment or combined treatment for depression.

This dataset covers 27 review articles (either systematic review, meta-analysis, or both) and 365 primary study articles addressing the relationship between physical activity and depressive symptoms. Primary study articles are manually extracted from the review articles. We used a custom-made workflow (Fu, Yuanxi. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 392 reports. The author information file (author_list.csv) is the product of this workflow and can be used to compute the co-author network of the 392 articles.

This dataset can be used to construct the inclusion network and the co-author network of the 27 review articles and 365 primary study articles. A primary study article is "included" in a review article if it is considered in the review article's evidence synthesis. Each included primary study article is cited in the review article, but not all references cited in a review article are included in the evidence synthesis or are primary study articles.

The inclusion network is a bipartite network with two types of nodes: one type represents review articles, and the other represents primary study articles. In an inclusion network, if a review article includes a primary study article, there is a directed edge from the review article node to the primary study article node. The attribute file (article_list.csv) includes attributes of the 392 articles, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Collectively, this dataset reflects the evidence production and use patterns within the exercise science and kinesiology scientific community, investigating the relationship between physical activity and depressive symptoms.

FILE FORMATS
1. article_list.csv - Unicode CSV
2. author_list.csv - Unicode CSV
3. Chinese_author_name_reference.csv - Unicode CSV
4. inclusion_net_edges.csv - Unicode CSV
5. review_article_details.csv - Unicode CSV
6. supplementary_reference_list.pdf - PDF
7. README.txt - text file
8. systematic_review_inclusion_criteria.csv - Unicode CSV

UPDATES IN THIS VERSION COMPARED TO V3 (Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V3)
- We added a new file systematic_review_inclusion_criteria.csv.
keywords: systematic reviews; meta-analyses; evidence synthesis; network visualization; tertiary studies; physical activity; depressive symptoms; exercise; review articles
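The inclusion network described in the record above can be built from an edge list with the Python standard library alone. This is a minimal sketch under stated assumptions: the column names `review` and `primary` and the sample rows are hypothetical stand-ins, so check the dataset's README.txt for the actual header of inclusion_net_edges.csv before adapting it.

```python
import csv
import io
from collections import defaultdict


def build_inclusion_network(edges):
    """Build a directed bipartite network from (review_id, primary_id) pairs.

    Every edge points from a review article to a primary study it includes.
    """
    includes = defaultdict(set)     # review article -> included primary studies
    included_by = defaultdict(set)  # primary study -> reviews that include it
    for review_id, primary_id in edges:
        includes[review_id].add(primary_id)
        included_by[primary_id].add(review_id)
    return includes, included_by


# Hypothetical edge list standing in for inclusion_net_edges.csv.
sample = io.StringIO("review,primary\nR1,P1\nR1,P2\nR2,P2\n")
edges = [(row["review"], row["primary"]) for row in csv.DictReader(sample)]

includes, included_by = build_inclusion_network(edges)

# Primary studies that appear in more than one review's evidence synthesis.
shared = {p for p, reviews in included_by.items() if len(reviews) > 1}
print(sorted(shared))
```

Keeping both directions of the mapping makes it cheap to ask which primary studies are shared across reviews, which is one of the patterns an inclusion network is meant to expose.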
published: 2023-09-20
Dataset includes bee trait information and species abundance information for bees collected at 29 forest plots in southern Illinois, USA. Plots are located within three public land sites. Environmental data were also collected for each of the 29 plots.
keywords: wild bees; forest management; functional traits
published: 2023-09-19
We used the following keywords files to identify categories for journals and conferences not in Scopus, for our STI 2023 paper "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science".

The first four text files each contain keywords/content words in the form: 'keyword1', 'keyword2', 'keyword3', .... The file title indicates the name of the category:
file1: healthscience_words.txt
file2: lifescience_words.txt
file3: physicalscience_words.txt
file4: socialscience_words.txt

The first four files were generated from a combination of software and manual review in an iterative process in which we:
- Manually reviewed venue titles that we were not able to categorize automatically using the Scopus categorization or by extending it as a resource.
- Iteratively reviewed uncategorized venue titles to manually curate additional keywords as content words indicating a venue title could be classified in the category healthscience, lifescience, physicalscience, or socialscience. We used English content words and added words we could automatically translate to identify content words.
NOTE: Terminology that has multiple potential meanings or contains non-English words that did not yield useful automatic translations (e.g., Al-Masāq) was not selected as content words.

The fifth text file is a list of stopwords in the form: 'stopword1', 'stopword2', 'stopword3', ...
file5: stopwords.txt
This file contains manually curated stopwords from venue titles to handle non-content words like 'conference' and 'journal,' etc.

This dataset is a revision of the following dataset:
Version 1: Lee, Jou; Schneider, Jodi: Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign Data Bank.

Changes from Version 1 to Version 2:
- Added one author
- Added a stopwords file that was used in our data preprocessing.
- Thoroughly reviewed each of the 4 keywords lists. In particular, we added UTF-8 terminology, removed some non-content words and misclassified content words, and extensively reviewed non-English keywords.
keywords: health science keywords; scientometrics; stopwords; field; keywords; life science keywords; physical science keywords; science of science; social science keywords; meta-science; RISRS
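The quoted-list format of the keyword and stopword files above is easy to parse. The following Python sketch shows one possible way to apply such lists to a venue title. It is an assumption-laden illustration, not the authors' preprocessing code: the function names, the token-matching rule, and the sample words are all hypothetical, and the regular expression would mis-handle keywords that themselves contain an apostrophe.

```python
import re


def parse_quoted_list(text):
    """Extract the items from text of the form 'word1', 'word2', ..."""
    return re.findall(r"'([^']*)'", text)


def categorize(title, category_keywords, stopwords):
    """Return the sorted categories whose keywords match a token of the title."""
    tokens = [t for t in re.findall(r"\w+", title.lower()) if t not in stopwords]
    return sorted(
        category
        for category, words in category_keywords.items()
        if any(token in words for token in tokens)
    )


# Made-up file contents, for illustration only.
category_keywords = {
    "healthscience": set(parse_quoted_list("'nursing', 'clinical'")),
    "physicalscience": set(parse_quoted_list("'physics', 'chemistry'")),
}
stopwords = set(parse_quoted_list("'journal', 'conference', 'of'"))

print(categorize("Journal of Clinical Nursing", category_keywords, stopwords))
```

Removing stopwords first keeps generic venue words such as 'journal' from ever counting as evidence for a category.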
published: 2022-08-08
This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".

The zip file has the following structure (presented as an example):

salma_paper_datasets/
|_README.md
|_10aa/
|_crw/
|_homfam/
    |_aat/
    |   |_...
    |_...
|_het/
    |_5000M2-het/
    |   |_...
    |_5000M3-het/
    ...
|_rec_res/

Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files]

# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: There are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.

# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).
4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.

>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.

# Additional file(s)
1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et al. 2010) to search for protein sequences.
keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity
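The backbone/query definition used in the EMMA datasets above (sequence length within 25% of the median length) can be sketched as below. This is an illustrative Python sketch, not the pipeline used to produce the files: the function name is hypothetical, and treating the 25% boundary as inclusive is an assumption, since the description does not say how ties are handled.

```python
from statistics import median


def split_backbone_queries(seqs, tolerance=0.25):
    """Split sequences into backbone (full-length) and query sets.

    seqs maps sequence names to unaligned sequences. A sequence is treated
    as backbone when its length is within `tolerance` of the median length.
    """
    med = median(len(s) for s in seqs.values())
    backbone, queries = {}, {}
    for name, seq in seqs.items():
        if abs(len(seq) - med) <= tolerance * med:  # inclusive bound (assumed)
            backbone[name] = seq
        else:
            queries[name] = seq
    return backbone, queries


# Toy sequences with lengths 100, 104, 96, 40, and 200 (median 100).
seqs = {"a": "A" * 100, "b": "A" * 104, "c": "A" * 96, "d": "A" * 40, "e": "A" * 200}
backbone, queries = split_backbone_queries(seqs)
print(sorted(backbone), sorted(queries))
```

Both very short and very long sequences end up as queries, which matches the description of queries as anything that is not full-length.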
published: 2023-09-01
An online and paper knowledge, attitudes, and practices (KAP) survey on ticks and tick-borne diseases (TBD) was distributed to farmers in Illinois from summer 2020 to spring 2022 (paper version titled Final Draft Farmer KAP_v.SoftCopy_Revised.docx). These are the raw data associated with that survey and the survey questions used (FarmerTickKAPdata.csv, data dictionary in Data Description.docx). We have added calculated values (columns 286 to end, code for calculation in FarmerKAPvariableCalculation.R), including the tick knowledge score, TBD knowledge score, and total knowledge score, each of which is the number of correct answers in its category, and the score percents, each of which is the proportion of correct answers in its category.
keywords: ticks; survey; tick-borne disease; farmer
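The knowledge score and score percent described for the farmer survey above amount to a count and a proportion. The Python sketch below shows the arithmetic only. It is not a port of FarmerKAPvariableCalculation.R: the function name, question IDs, and answers are hypothetical, and using the number of questions in the category as the denominator is an assumption.

```python
def knowledge_score(responses, answer_key):
    """Return (score, score_percent) for one respondent in one category.

    score is the number of correct answers; score_percent is the proportion
    of the category's questions answered correctly. Unanswered questions
    count as incorrect (an assumption).
    """
    score = sum(1 for q, correct in answer_key.items() if responses.get(q) == correct)
    return score, score / len(answer_key)


# Made-up answer key and responses, for illustration only.
tick_key = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
responses = {"q1": "a", "q2": "x", "q3": "c"}

print(knowledge_score(responses, tick_key))
```

A total knowledge score would then be the sum of the per-category scores.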
published: 2023-08-24
This data set includes all data related to strain-resilient FETs based on 2D heterostructures, including optical images of FETs, Raman characteristics data, transport measurement data, and AFM topography data.
keywords: 2D materials; Stretchable electronics
published: 2023-08-11
This dataset contains leaf photosynthetic and biochemical traits, plant biomass, and yield in five C3 crops (chickpea, rice, snap bean, soybean, wheat) and four C4 crops (sorghum, maize, Miscanthus × giganteus, switchgrass) grown under ambient and elevated O3 concentration ([O3]) in the field at free-air O3 concentration enrichment (O3-FACE) facilities over the past 20 years.
keywords: C3 and C4 crops; elevated O3; FACE; photosynthesis; yield
published: 2023-08-04
Data are provided that are relevant to the rare plant Phlox pilosa ssp. sangamonensis, or Sangamon phlox, and other members of the genus that occur in its native range. Sangamon phlox is a state-endangered subspecies that is only known to occur in two Illinois counties. Data provided come from all known Sangamon phlox populations, which we estimate as 10 separate populations. Data include genetic data from DNA microsatellite loci (allele sizes and basic summaries), flowering population size estimates, rates of fruit set, and rates of seed set. Additionally, genetic data (from microsatellites) are provided for Phlox divaricata ssp. laphamii (three populations), Phlox pilosa ssp. pilosa (two populations), and Phlox pilosa ssp. fulgida (two populations).
keywords: Phlox; conservation genetics; microsatellites; endemism; rare plants
published: 2023-08-03
This file contains the delta 15N values for leaf material collected from Cyathea rojasiana tree ferns before and after fertilization using ammonium-15N chloride solution to determine whether 15N uptake is possible from senescent leaves. Details of the experiment are provided in the online supplement to the published paper. Briefly, in February 2022 we selected three mature C. rojasiana individuals 1-1.5 m in height that had leaves rooted in the soil and one new developing (but unexpanded) leaf. For each fern, two plastic pots (10 x 10 x 12 cm) were filled with a 50:50 mixture of washed river sand and soil from the Chorro watershed. For each pot, one senescent leaf that was rooted in the soil was carefully excavated and its roots transplanted into the pot. Pots were then fertilized by adding 30 ml of a 0.02 M solution of ammonium-15N chloride (98% 15N; Sigma-Aldrich 299251; St Louis, MO) to yield a target concentration of 2 µg 15N cm-3 of soil. After fertilization, pots were carefully enclosed within thick plastic bags and sealed around the senescent leaf rachis to prevent any leaching of 15N from the pot to the surrounding soil. At the time of N fertilization, pinnae of the youngest fully expanded leaf were collected from each fern. One pinna was collected from the base of the leaf and one from the distal end of the leaf. In March 2022, after 28 days, the roots were removed from the pots and two additional leaf pinnae were sampled from each fern: one from the base and one from the distal end of the youngest (now fully expanded) leaf. Leaf samples were dried for 72 hours at 60 °C, and leaf lamina tissue was then finely ground with a bead beater. The delta 15N for each leaf sample was determined at the University of Illinois, Urbana-Champaign using a Thermo Delta V Advantage IRMS run in combination with a Costech 4010 Elemental Analyzer. Samples were run in continuous flow relative to laboratory standards that were calibrated with USGS 40, 41, and NBS 19 reference materials.
keywords: 15N; Cyathea rojasiana; N fertilization; montane forest
published: 2023-07-14
Data for Post-retraction citation: A review of scholarly research on the spread of retracted science
Schneider, Jodi; Das, Susmita; Léveillé, Jacqueline; Proescholdt, Randi
Contact: Jodi Schneider jodi@illinois.edu & jschneider@pobox.com

**********
OVERVIEW
**********
This dataset provides further analysis for an ongoing literature review about post-retraction citation. This ongoing work extends a poster presented as: Jodi Schneider, Jacqueline Léveillé, Randi Proescholdt, Susmita Das, and The RISRS Team. Characterization of Publications on Post-Retraction Citation of Retracted Articles. Presented at the Ninth International Congress on Peer Review and Scientific Publication, September 8-10, 2022 hybrid in Chicago. https://hdl.handle.net/2142/114477 (now also in https://peerreviewcongress.org/abstract/characterization-of-publications-on-post-retraction-citation-of-retracted-articles/ )

Items as of the poster version are listed in the bibliography 92-PRC-items.pdf. Note that following the poster, we made several changes to the dataset (see changes-since-PRC-poster.txt). For both the poster dataset and the current dataset, 5 items have 2 categories (see 5-items-have-2-categories.txt).

Articles were selected from the Empirical Retraction Lit bibliography (https://infoqualitylab.org/projects/risrs2020/bibliography/ and https://doi.org/10.5281/zenodo.5498474 ). The current dataset includes 92 items; 91 items were selected from the 386 total items in Empirical Retraction Lit bibliography version v.2.15.0 (July 2021); 1 item was added because it is the final form publication of a grouping of 2 items from the bibliography: Yang (2022) Do retraction practices work effectively? Evidence from citations of psychological retracted articles http://doi.org/10.1177/01655515221097623

Items were classified into 7 topics; 2 of the 7 topics have been analyzed to date.

**********************
OVERVIEW OF ANALYSIS
**********************
DATA ANALYZED: 2 of the 7 topics have been analyzed to date:
field-based case studies (n = 20)
author-focused case studies of 1 or several authors with many retracted publications (n = 15)

FUTURE DATA TO BE ANALYZED, NOT YET COVERED: 5 of the 7 topics have not yet been analyzed as of this release:
database-focused analyses (n = 33)
paper-focused case studies of 1 to 125 selected papers (n = 15)
studies of retracted publications cited in review literature (n = 8)
geographic case studies (n = 4)
studies selecting retracted publications by method (n = 2)

**************
FILE LISTING
**************
------------------
BIBLIOGRAPHY
------------------
92-PRC-items.pdf

------------------
TEXT FILES
------------------
README.txt
5-items-have-2-categories.txt
changes-since-PRC-poster.txt

------------------
CODEBOOKS
------------------
Codebook for authors.docx
Codebook for authors.pdf
Codebook for field.docx
Codebook for field.pdf
Codebook for KEY.docx
Codebook for KEY.pdf

------------------
SPREADSHEETS
------------------
field.csv
field.xlsx
multipleauthors.csv
multipleauthors.xlsx
multipleauthors-not-named.csv
multipleauthors-not-named.xlsx
singleauthors.csv
singleauthors.xlsx

***************************
DESCRIPTION OF FILE TYPES
***************************
BIBLIOGRAPHY (92-PRC-items.pdf) presents the items, as of the poster version. This has minor differences from the current data set. Consult changes-since-PRC-poster.txt for details on the differences.

TEXT FILES provide notes for additional context. These files end in .txt.

CODEBOOKS describe the data we collected. The same data is provided in both Word (.docx) and PDF format. There is one general codebook that is referred to in the other codebooks: Codebook for KEY lists fields assigned (e.g., for a journal or conference). Note that this is distinct from the overall analysis in the Empirical Retraction Lit bibliography of fields analyzed; for that analysis see Proescholdt, Randi (2021): RISRS Retraction Review - Field Variation Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2070560_V1
Other codebooks document specific information we entered on each column of a spreadsheet.

SPREADSHEETS present the data collected. The same data is provided in both Excel (.xlsx) and CSV format. Each data row describes a publication or item (e.g., thesis, poster, preprint). For column header explanations, see the associated codebook.

*****************************
DETAILS ON THE SPREADSHEETS
*****************************
field-based case studies
CODEBOOK: Codebook for field
--REFERS TO: Codebook for KEY
DATA SHEET: field
--REFERS TO: Codebook for KEY
--NUMBER OF DATA ROWS: 20
NOTE: Each data row describes a publication/item.
--NUMBER OF PUBLICATION GROUPINGS: 17
--GROUPED PUBLICATIONS: Rubbo (2019) - 2 items, Yang (2022) - 3 items

author-focused case studies of 1 or several authors with many retracted publications
CODEBOOK: Codebook for authors
--REFERS TO: Codebook for KEY
DATA SHEET 1: singleauthors (n = 9)
--NUMBER OF DATA ROWS: 9
--NUMBER OF PUBLICATION GROUPINGS: 9
DATA SHEET 2: multipleauthors (n = 5)
--NUMBER OF DATA ROWS: 5
--NUMBER OF PUBLICATION GROUPINGS: 5
DATA SHEET 3: multipleauthors-not-named (n = 1)
--NUMBER OF DATA ROWS: 1
--NUMBER OF PUBLICATION GROUPINGS: 1

*********************************
CRediT <http://credit.niso.org>
*********************************
Susmita Das: Conceptualization, Data curation, Investigation, Methodology
Jacqueline Léveillé: Data curation, Investigation
Randi Proescholdt: Conceptualization, Data curation, Investigation, Methodology
Jodi Schneider: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Supervision
keywords: retraction; citation of retracted publications; post-retraction citation; data extraction for scoping reviews; data extraction for literature reviews
published: 2022-12-31
Trajectory data for Nature Nanotechnology manuscript "DNA double helix, a tiny electromotor" that demonstrates how an electric field applied along the helical axis of a DNA or RNA molecule will generate an electroosmotic flow that causes the duplex to spin about that axis, much like a turbine.
keywords: All-atom MD simulation; DNA; nanotechnology; motors and rotors