Illinois Data Bank Dataset Search Results
published:
2023-12-18
Edmonds, Devin; Adamovicz, Laura; Allender, Matthew; Colton, Andrea; Nyboer, Randy; Dreslik, Michael
(2023)
We conducted long-term capture-mark-recapture surveys on two isolated ornate box turtle (Terrapene ornata) populations in northern Illinois, USA. This dataset provides the capture history strings and additional demographic information used for estimating population vital rates with robust design capture-mark-recapture models. The vital rates were then used in a stage-based population projection matrix model for each population.
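A stage-based projection matrix projects stage abundances one time step forward. Its dominant eigenvalue (λ) is the asymptotic population growth rate. The sketch below computes λ with NumPy for a hypothetical two-stage (juvenile, adult) matrix. The stage structure and numbers are illustrative assumptions, not vital rates from this dataset.

```python
import numpy as np

def population_growth_rate(A):
    """Return the dominant eigenvalue (lambda) of a stage-based projection matrix A."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    # For a non-negative, primitive projection matrix the dominant eigenvalue is real and positive.
    dominant = eigenvalues[np.argmax(np.abs(eigenvalues))]
    return float(dominant.real)

# Hypothetical two-stage matrix, not values from this dataset:
# row 0 = adult fecundity; row 1 = juvenile-to-adult transition and adult survival.
A = [[0.0, 1.5],
     [0.5, 0.8]]
print(population_growth_rate(A))
```

Under the assumed rates, λ > 1 indicates a growing population and λ < 1 a declining one.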
keywords:
demography; capture-mark-recapture; vital rates; conservation; wildlife ecology
published:
2011-09-20
Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San
(2011)
This page provides the data for SuperFine, DACTAL, and BeeTLe publications.
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic Biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing SuperFine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published:
2017-12-14
Hepler, Katherine C.
(2017)
keywords:
uranium harvesting from seawater; Geospatial analysis; adsorbent performance; NPRE 412
published:
2017-11-14
Miller, Martin; Chung, Soon-Jo; Hutchinson, Seth
(2017)
If you use this dataset, please cite the IJRR data paper (bibtex is below).
We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes, covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it is recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The data is divided into subsets, which can be downloaded individually.
Video previews are available on YouTube:
https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw
The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset.
Images
======
Raw
---
The raw images are stored in the cam0 and cam1 directories in BMP format. They are Bayer-pattern images that must be debayered and undistorted before use. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 resolution image, so the focal length and principal point coordinates must be scaled by 0.5 before they are used with the 800x600 raw images. The distortion coefficients remain the same even for the scaled images. The camera-to-IMU transformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera.
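The intrinsics in camchain-imucam.yaml refer to the 1600x1200 calibration resolution. A camera matrix for the 800x600 raw images can therefore be built by scaling the focal lengths and the principal point. This is a minimal sketch in plain Python. The function name and the (width, height) argument order are illustrative assumptions.

```python
def scaled_camera_matrix(intrinsics, calib_resolution, image_resolution=(800, 600)):
    """Build a 3x3 pinhole camera matrix K for images smaller than the calibration resolution.

    intrinsics: [fx, fy, cx, cy], the order listed for camchain-imucam.yaml
    calib_resolution, image_resolution: (width, height) in pixels
    """
    sx = image_resolution[0] / calib_resolution[0]  # 0.5 for 1600 -> 800
    sy = image_resolution[1] / calib_resolution[1]  # 0.5 for 1200 -> 600
    fx, fy, cx, cy = intrinsics
    # Focal lengths and principal point scale with the image; distortion coefficients do not.
    return [[fx * sx, 0.0, cx * sx],
            [0.0, fy * sy, cy * sy],
            [0.0, 0.0, 1.0]]
```

Pass the distortion coefficients through unchanged. The images must also be debayered before undistortion, for example with OpenCV's cv2.cvtColor and a Bayer conversion code. The README does not state the Bayer pattern order, so which conversion code is correct is an assumption to verify on a sample image.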
Rectified
---------
Stereo-rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images, except that they are in PNG format. The params.yaml file contains the projection and rotation matrices needed to use these images. Unlike the raw-image parameters, these parameters do not need to be scaled.
params.yml
----------
The stereo rectification parameters. R0, R1, P0, P1, and Q correspond to the outputs of the OpenCV stereoRectify function (R1, R2, P1, P2, and Q), except that the suffixes 1 and 2 are replaced by 0 and 1, respectively.
R0: The rectifying rotation matrix of the left camera.
R1: The rectifying rotation matrix of the right camera.
P0: The projection matrix of the left camera.
P1: The projection matrix of the right camera.
Q: Disparity to depth mapping matrix
T_cam_imu: Transformation matrix for a point in the IMU frame to the left camera frame.
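Q follows the OpenCV stereoRectify convention. A rectified pixel with disparity d can therefore be mapped to a 3D point the same way OpenCV's reprojectImageTo3D does: multiply [x, y, d, 1] by Q and divide by the fourth component. The sketch below uses NumPy. The example Q is hypothetical (f = 500 px, principal point (400, 300), baseline 0.1 m, equal principal points in both rectified images) and is not a value from params.yml.

```python
import numpy as np

def reproject_pixel(Q, x, y, disparity):
    """Map rectified left-image pixel (x, y) with a given disparity to a 3D point.

    [X, Y, Z, W]^T = Q @ [x, y, d, 1]^T; the point is (X/W, Y/W, Z/W)
    in the rectified left-camera frame.
    """
    X, Y, Z, W = np.asarray(Q, dtype=float) @ np.array([x, y, disparity, 1.0])
    return np.array([X, Y, Z]) / W

# Hypothetical Q in the OpenCV layout. Tx is the x translation of the right
# camera (negative), so Q[3, 2] = -1 / Tx is positive.
f, cx, cy, Tx = 500.0, 400.0, 300.0, -0.1
Q_example = np.array([[1.0, 0.0, 0.0, -cx],
                      [0.0, 1.0, 0.0, -cy],
                      [0.0, 0.0, 0.0, f],
                      [0.0, 0.0, -1.0 / Tx, 0.0]])
point = reproject_pixel(Q_example, 400.0, 300.0, 50.0)
```

For this example, depth equals f * B / d = 500 * 0.1 / 50 = 1 m, which is the Z component of point.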
camchain-imucam.yaml
--------------------
The camera intrinsic and extrinsic parameters and the camera to IMU transformation usable with the raw images.
T_cam_imu: Transformation matrix for a point in the IMU frame to the camera frame.
distortion_coeffs: lens distortion coefficients using the radial tangential model.
intrinsics: focal length x, focal length y, principal point x, principal point y
resolution: resolution of calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled.
T_cn_cnm1: Transformation matrix from the right camera to the left camera.
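T_cam_imu is a homogeneous transform. A point measured in the IMU frame is moved into the camera frame by appending a 1 and multiplying. Direction-only quantities such as angular rates use just the rotation block. This is a minimal NumPy sketch with a hypothetical example transform.

```python
import numpy as np

def imu_point_to_camera(T_cam_imu, p_imu):
    """Express a 3D point given in the IMU frame in the camera frame: p_cam = T_cam_imu @ [p_imu, 1]."""
    T = np.asarray(T_cam_imu, dtype=float)
    p_h = np.append(np.asarray(p_imu, dtype=float), 1.0)  # homogeneous coordinates
    return (T @ p_h)[:3]

def imu_vector_to_camera(T_cam_imu, v_imu):
    """Rotate a free vector (e.g., an angular rate) from the IMU frame into the camera frame.

    Translation does not affect free vectors, so only the 3x3 rotation block is applied.
    """
    R = np.asarray(T_cam_imu, dtype=float)[:3, :3]
    return R @ np.asarray(v_imu, dtype=float)

# Hypothetical transform for illustration: identity rotation, 0.1 m offset along x.
T_example = np.eye(4)
T_example[0, 3] = 0.1
```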
Sensors
-------
Each message type in name.csv is described below.
###rawimus###
time # GPS time in seconds
message name # rawimus
acceleration_z # m/s^2 IMU uses right-forward-up coordinates
-acceleration_y # m/s^2
acceleration_x # m/s^2
angular_rate_z # rad/s IMU uses right-forward-up coordinates
-angular_rate_y # rad/s
angular_rate_x # rad/s
###IMG###
time # GPS time in seconds
message name # IMG
left image filename
right image filename
###inspvas###
time # GPS time in seconds
message name # inspvas
latitude
longitude
altitude # ellipsoidal height WGS84 in meters
north velocity # m/s
east velocity # m/s
up velocity # m/s
roll # right hand rotation about y axis in degrees
pitch # right hand rotation about x axis in degrees
azimuth # left hand rotation about z axis in degrees clockwise from north
###inscovs###
time # GPS time in seconds
message name # inscovs
position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2
attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2
velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2
###bestutm###
time # GPS time in seconds
message name # bestutm
utm zone # numerical zone
utm character # alphabetical zone
northing # m
easting # m
height # m above mean sea level
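The message layouts above can be read with a small dispatcher keyed on the second column. This sketch assumes that name.csv is comma-separated, has no header row, and lists the columns in exactly the order given above. The field names are illustrative and are not taken from the files. Because the rawimus y components are logged negated, the parser flips their sign back.

```python
import csv
import io

# Field names per message type, in the documented column order (after time and message name).
FIELDS = {
    "rawimus": ["acceleration_z", "neg_acceleration_y", "acceleration_x",
                "angular_rate_z", "neg_angular_rate_y", "angular_rate_x"],
    "IMG": ["left_image", "right_image"],
    "inspvas": ["latitude", "longitude", "altitude", "north_velocity", "east_velocity",
                "up_velocity", "roll", "pitch", "azimuth"],
    "bestutm": ["utm_zone", "utm_character", "northing", "easting", "height"],
}
TEXT_FIELDS = {"left_image", "right_image", "utm_character"}  # kept as strings


def parse_row(row):
    """Turn one name.csv row (a list of strings) into a dict keyed by field name."""
    values = [v.strip() for v in row]
    record = {"time": float(values[0]), "message": values[1]}
    rest = values[2:]
    if record["message"] == "inscovs":
        # 27 numbers: position, attitude, and velocity covariances, 9 values (xx..zz) each.
        nums = [float(v) for v in rest]
        for i, key in enumerate(["position_cov", "attitude_cov", "velocity_cov"]):
            record[key] = nums[9 * i:9 * (i + 1)]
        return record
    for key, text in zip(FIELDS[record["message"]], rest):
        record[key] = text if key in TEXT_FIELDS else float(text)
    if record["message"] == "rawimus":
        # The log stores -y; negate to recover the y components as named in the README.
        record["acceleration_y"] = -record.pop("neg_acceleration_y")
        record["angular_rate_y"] = -record.pop("neg_angular_rate_y")
    return record


def parse_messages(text):
    """Parse every non-empty row of a name.csv file's contents."""
    return [parse_row(row) for row in csv.reader(io.StringIO(text)) if row]
```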
Camera logs
-----------
The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by:
unused: The first column is all 1s and can be ignored.
software frame number: This number increments at the end of every iteration of the software loop.
camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value. However, if the two counters advance by different amounts between the first and last rows, frames may have been dropped.
camera timestamp: This is the camera's internal timestamp of the frame capture in units of 100 milliseconds.
PC timestamp: This is the PC time of arrival of the image.
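The dropped-frame check described for the camera logs compares how far each counter advances. This sketch assumes whitespace-separated columns in the documented order. It also reports rows where the camera counter jumps by more than one. That is an additional heuristic, not a check described in the README.

```python
def check_dropped_frames(lines):
    """Compare software and camera frame counters from a name.cam0 or name.cam1 log.

    Returns (software_span, camera_span, gap_rows); unequal spans suggest dropped frames.
    Assumption: whitespace-separated columns in the documented order
    (unused, software frame, camera frame, camera timestamp, PC timestamp).
    """
    rows = [line.split() for line in lines if line.strip()]
    software = [int(float(r[1])) for r in rows]
    camera = [int(float(r[2])) for r in rows]
    software_span = software[-1] - software[0]
    camera_span = camera[-1] - camera[0]
    # Extra heuristic (not from the README): row indices where the camera counter skips ahead.
    gap_rows = [i for i in range(1, len(camera)) if camera[i] - camera[i - 1] > 1]
    return software_span, camera_span, gap_rows
```

An open file object can be passed directly, since iterating over it yields lines.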
name.kml
--------
The kml file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory.
name.unicsv
-----------
This is a CSV file of the GPS trajectory in UTM coordinates that can be read by GPSBabel, software for manipulating GPS paths.
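The trajectory length can be checked against the reported 2.7 km round trip by summing distances between consecutive UTM positions, taken from name.unicsv or from the bestutm messages. The sketch assumes the points have already been extracted as (easting, northing) pairs in metres within a single UTM zone. The unicsv column layout is not described here.

```python
import math

def path_length(points):
    """Sum straight-line distances (m) between consecutive (easting, northing) points.

    Assumes every point lies in the same UTM zone, so planar distances are meaningful.
    """
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Summing planar segments ignores height differences between fixes.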
@article{doi:10.1177/0278364917751842,
author = {Martin Miller and Soon-Jo Chung and Seth Hutchinson},
title ={The Visual–Inertial Canoe Dataset},
journal = {The International Journal of Robotics Research},
volume = {37},
number = {1},
pages = {13-20},
year = {2018},
doi = {10.1177/0278364917751842},
URL = {https://doi.org/10.1177/0278364917751842},
eprint = {https://doi.org/10.1177/0278364917751842}
}
keywords:
slam;sangamon;river;illinois;canoe;gps;imu;stereo;monocular;vision;inertial
published:
2019-09-17
Mishra, Shubhanshu
(2019)
Trained models for multi-task multi-dataset learning for text classification in tweets.
Classification tasks include sentiment prediction, abusive content, sarcasm, and veridicality.
Models were trained using: <a href="https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py">https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py</a>
See <a href="https://github.com/socialmediaie/SocialMediaIE">https://github.com/socialmediaie/SocialMediaIE</a> and <a href="https://socialmediaie.github.io">https://socialmediaie.github.io</a> for details.
If you are using this data, please also cite the related article:
Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords:
twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning; sentiment; sarcasm; abusive content;
published:
2020-08-21
Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana
(2020)
# WikiCSSH
If you are using WikiCSSH please cite the following:
> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. “WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia.” In Workshop on Scientific Knowledge Graphs (SKG 2020). https://skg.kmi.open.ac.uk/SKG2020/papers/HAN_et_al_SKG_2020.pdf
> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. "WikiCSSH - Computer Science Subject Headings from Wikipedia". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0424970_V1
Download the WikiCSSH files from: https://doi.org/10.13012/B2IDB-0424970_V1
More details about the WikiCSSH project can be found at: https://github.com/uiuc-ischool-scanr/WikiCSSH
This folder contains the following files:
WikiCSSH_categories.csv - Categories in WikiCSSH
WikiCSSH_category_links.csv - Links between categories in WikiCSSH
Wikicssh_core_categories.csv - Core categories as mentioned in the paper
WikiCSSH_category_links_all.csv - Links between categories in WikiCSSH (includes a dummy category called <ROOT> which is parent of isolates and top level categories)
WikiCSSH_category2page.csv - Links between Wikipedia pages and Wikipedia Categories in WikiCSSH
WikiCSSH_page2redirect.csv - Links between Wikipedia pages and Wikipedia page redirects in WikiCSSH
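WikiCSSH_category_links_all.csv connects every top-level category and isolate to the dummy <ROOT> node. A breadth-first search from <ROOT> therefore assigns each category its depth in the hierarchy. This sketch takes (parent, child) pairs. Which CSV columns hold the parent and the child is an assumption to check against the file header.

```python
from collections import defaultdict, deque

def category_depths(links, root="<ROOT>"):
    """Breadth-first depth of every category reachable from the dummy root.

    links: iterable of (parent, child) pairs, e.g. rows of WikiCSSH_category_links_all.csv.
    """
    children = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # first visit is the shortest path in an unweighted graph
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth
```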
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <a href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</a> or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
keywords:
wikipedia; computer science;
published:
2022-03-19
McCoy, Annette; Secor, Erica; Roady, Patrick; Gray, Sarah; Klein, Julie; Gutierrez-Nibeyro, Santiago
(2022)
Raw arthroscopic scores, histologic scores, cytokine measurements, and performance data for the study cohort described in the accompanying publication.
keywords:
horse; metatarsophalangeal joint; arthroscopy; exercise; developmental orthopedic disease
published:
2016-06-23
This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (https://search.datacite.org/ui) during December 2015. Metadata records for items with a resourceType of dataset were collected. 1,647,949 total records were collected.
This dataset contains three files:
1) readme.txt: A readme file.
2) version-results.csv: A CSV file containing three columns: DOI, DOI prefix, and version text contents
3) version-counts.csv: A CSV file containing counts for unique version text content values.
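In principle, version-counts.csv can be reproduced from version-results.csv by tallying the third column. This sketch assumes the rows have already been read as (DOI, DOI prefix, version) triples, for example with csv.reader after skipping any header row. It counts values exactly as written, because the README does not say whether the published counts normalized whitespace or case.

```python
from collections import Counter

def count_versions(rows):
    """Tally unique version text values from (DOI, DOI prefix, version) rows."""
    return Counter(version for _doi, _prefix, version in rows)
```

Calling most_common() on the result lists the most frequent version strings first.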
keywords:
datacite;metadata;version values;repository data
published:
2024-10-10
Mishra, Apratim; Lee, Haejin; Jeoung, Sullam; Torvik, Vetle; Diesner, Jana
(2024)
Diversity - PubMed dataset
Contact: Apratim Mishra (Oct, 2024)
This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection, an expanded version of V1, comprises 907,024 papers and 1,316,838 authors drawn from Author-ity 2018 [1]. The articles come from the top 40 journals in the dataset and are limited to English-language articles of type "journal type" with 2 to 12 authors, published between 1991 and 2014. Files are gzip-compressed and tab-separated. V3 corrects the author count for the included papers (pmids) and updates the results so that they contain no NaNs.
################################################
File1: auids_plos_3.csv.gz (Important columns defined, 5 in total)
• AUID: a unique ID for each author
• Genni: gender prediction
• Ethnea: ethnicity prediction
#################################################
File2: pmids_plos_3.csv.gz (Important columns defined)
• pmid: unique paper identifier (PubMed ID)
• auid: all unique auids (author-name unique identification)
• year: Year of paper publication
• no_authors: Author count
• journal: Journal name
• years: first year of publication for every author
• Country-temporal: Country of affiliation for every author
• h_index: Journal h-index
• TimeNovelty: Paper Time novelty [2]
• nih_funded: Binary variable indicating funding for any author
• prior_cit_mean: Mean of all authors’ prior citation rate
• Insti_impact: All unique institutions’ citation rate
• mesh_vals: Top MeSH values for every author of that paper
• relative_citation_ratio: Relative Citation Ratio (RCR)
The ‘Readme’ includes a description for all columns.
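The files are gzip-compressed and tab-separated despite the .csv.gz extension, so Python's gzip and csv modules can read them directly. This sketch assumes UTF-8 text and a header row containing the column names listed above (e.g., Genni). Both are assumptions to verify against the Readme.

```python
import csv
import gzip
from collections import Counter

def read_tsv_gz(path):
    """Read a gzip-compressed, tab-separated file (despite the .csv.gz name) into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def gender_counts(author_rows):
    """Tally the Genni gender predictions in rows read from auids_plos_3.csv.gz."""
    return Counter(row["Genni"] for row in author_rows)
```

For example, gender_counts(read_tsv_gz("auids_plos_3.csv.gz")) tallies gender predictions across all authors. The full file is large, so streaming the rows instead of building a list may be preferable.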
[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
keywords:
Diversity; PubMed; Citation
published:
2025-07-21
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W.
(2025)
This dataset includes image stacks, annotated counts, and ground-truth masks from two high-resolution sediment cores extracted from Laguna Pallcacocha, in El Cajas National Park, Ecuadorian Andes by Moy et al. (2002) and Hagemans et al. (2021). The first core (PAL 1999, from Moy et al. (2002)) extends through the Holocene (11,600 cal. yr. BP - present). There are a total of 900 annotated image stacks and masks in the PAL 1999 domain. The second core (PAL IV, from Hagemans et al. (2021)) captures the 20th century. There are 2986 annotated image stacks and masks in the PAL IV domain.
Different microscopes and annotation tools were used to image and annotate each core, and there are corresponding differences in naming conventions and file formats. Thus, we organized our data separately for the PAL 1999 and the PAL IV domains. The three-letter codes used to label our pollen annotations are in the file: “Pollen_Identification_Codes.xlsx”.
Both domain directories contain:
• Image stacks organized by subdirectory
• Annotations within each image stack directory, containing specimen identifications using a three letter code and coordinates defining bounding boxes or circles
• Ground-truth distance-transform masks for each image stack
The zip file "bestValModel_encoder.paramOnly.zip" is the trained pollen detection model produced from the images and annotations in this dataset.
Please cite this dataset as:
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W. (2025): Slide scans, annotated pollen counts, and trained pollen detection models for fossil pollen samples from Laguna Pallcacocha, El Cajas National Park, Ecuador. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4207757_V1
Please also include citations of the original publications from which these data are taken:
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” bioRxiv, January 1, 2025. https://doi.org/10.1101/2025.01.05.631390.
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” Paleobiology, 2025 [in press].
Feng, J. T. (2023). Open-world deep learning applied to pollen detection (MS thesis, University of Illinois at Urbana-Champaign). https://hdl.handle.net/2142/120168
keywords:
continual learning; deep learning; domain gaps; open-world; palynology; pollen grain detection; taxonomic bias
published:
2025-01-30
Raw data associated with PMID: 38925247
published:
2025-01-30
Zhang, Yufan; Bhattarai, Rabin
(2025)
This is the research data for the manuscript "A Framework of Simulating Structural Sediment Perimeter Barriers using VFSMOD".
keywords:
sediment control
published:
2017-12-14
Objectives: This study follows up on previous work that began examining data deposited in an institutional repository. It extends the earlier study by addressing the following research questions: (1) What is the file composition of datasets ingested into the University of Illinois at Urbana-Champaign campus repository? Are datasets more likely to be single-file or multiple-file items? (2) What usage data are associated with these datasets? Which items are most popular?
Methods: The dataset records collected in this study were identified by filtering item types categorized as "data" or "dataset" using the advanced search function in IDEALS. The returned search results were collected in an Excel spreadsheet that records, for each item, the Handle identifier, date ingested, file formats, composition code, and download count from the item's statistics report. The Handle identifier represents the dataset record's persistent identifier. Composition represents codes that categorize items as single-file or multiple-file deposits. Date available represents the date the dataset record was published in the campus repository. Download statistics were collected via a website link for each dataset record and indicate the number of times the dataset record has been downloaded. Once collected, the data were used to evaluate datasets deposited into IDEALS.
Results: A total of 522 datasets were identified for analysis, covering the period between January 2007 and August 2016. This study revealed two influxes, one in 2008-2009 and one in 2014. During the first period, the Illinois Department of Agriculture deposited a large number of PDFs. In 2014, by contrast, the Rare Books and Manuscript Library deposited Microsoft Excel files. Single-file datasets clearly dominate the deposits in the campus repository. The total download count for all datasets was 139,663, and the average number of downloads per month per file across all datasets was 3.2.
Conclusion: Academic librarians, repository managers, and research data services staff can use the results presented here to anticipate the nature of research data that may be deposited within institutional repositories. With increased awareness, content recruitment, and improvements, IRs can provide a viable cyberinfrastructure for researchers to deposit data, but much can be learned from the data already deposited. Awareness of trends can help librarians facilitate discussions with researchers about research data deposits and better tailor their services to short-term and long-term research needs.
keywords:
research data; research statistics; institutional repositories; academic libraries
published:
2022-06-01
Southey, Bruce; Rodriguez-Zas, Sandra L.
(2022)
This dataset contains information for the paper "Changes in neuropeptide prohormone genes among Cetartiodactyla livestock and wild species associated with evolution and domestication," Veterinary Sciences, MDPI. Protein sequences were predicted using GeneWise for 98 neuropeptide prohormone genes from publicly available genomes of 118 Cetartiodactyla species. All predictions (CetartiodactylaSequences2022.zip) were manually verified. Sequences were aligned within each prohormone using MAFFT (MDPImultalign2022.zip includes the multiple sequence alignment of all species available for each prohormone). Phylogenetic gene trees were constructed using PhyML, and the species tree was constructed using ASTRAL (MDPItree2022.zip). The data is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
keywords:
prohormone; neuropeptide; Cetartiodactyla; phylogenetics; gene tree; species tree
published:
2025-09-23
Zhao, Huimin; Chen, Li-Qing; Martin, Teresa; Xue, Xueyi; Singh, Nilmani; Tan, Shi-I; Boob, Aashutosh
(2025)
Mitochondria play a key role in energy production and metabolism, making them a promising target for metabolic engineering and disease treatment. However, despite the known influence of passenger proteins on localization efficiency, only a few protein-localization tags have been characterized for mitochondrial targeting. To address this limitation, we leverage a Variational Autoencoder to design novel mitochondrial targeting sequences. In silico analysis reveals that a high fraction of the generated peptides (90.14%) are functional and possess features important for mitochondrial targeting. We characterize artificial peptides in four eukaryotic organisms and, as a proof-of-concept, demonstrate their utility in increasing 3-hydroxypropionic acid titers through pathway compartmentalization and improving 5-aminolevulinate synthase delivery by 1.62-fold and 4.76-fold, respectively. Moreover, we employ latent space interpolation to shed light on the evolutionary origins of dual-targeting sequences. Overall, our work demonstrates the potential of generative artificial intelligence for both fundamental research and practical applications in mitochondrial biology.
keywords:
AI/ML; metabolic engineering; modeling; software
published:
2017-06-16
Haselhorst, Derek S.; Tcheng, David K.; Moreno, J. Enrique ; Punyasena, Surangi W.
(2017)
Table S2. Raw pollen counts and climatic data for each seasonal sampling period. Climatic data reflects the average daily conditions observed over the duration samples were collected (˚C/day, mm/day, MJ/m2/day). Lycopodium counts and counts for each pollen taxon reflect the aggregated pollen sum from four sampling heights.
keywords:
pollen; count; climate; data; BCI; PNSL; Panama
published:
2020-12-07
Tian, Yuan; Smith-Bolton, Rachel
(2020)
This page contains the data for the publication "Regulation of growth and cell fate during tissue regeneration by the two SWI/SNF chromatin-remodeling complexes of Drosophila" published in Genetics, 2020
published:
2020-11-25
Barker, Louise; Gaulke, Sarah M.; Chace, Jordyn Z.; Davis, Mark A.; Niemiller, Matthew L.; Taylor, Steven J.; Schuett, Gordon W.
(2020)
Video recorded by Louise Barker using a Canon PowerShot camera documents late-season combat behavior in Agkistrodon contortrix. Recorded in Beaufort County, North Carolina, 11.1 km SE of downtown Washington on 21 October 2020.
keywords:
Agkistrodon contortrix; combat; mating; reproduction; copperhead; pit viper; Viperidae;
published:
2025-01-30
Peyton, Buddy; Bajjalieh, Joseph; Martin, Michael; Alahi, Sam; Fadell, Norah; Jeralds, Maddie
(2025)
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. Only a limited number of datasets are available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). To facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), and the fate of the deposed leader.
Version 2.2.0 adds 94 coup events: 66 came from examining Powell and Thyne’s “discarded” events, and 28 were added during the normal annual review of potential new coup events. This version also updates the coding of events in Brazil in 1945 and the Congo in 1968.
Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy.
Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.
Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.
Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Version 1.0.0 was released in 2013. This version consolidated coup data taken from the following sources:
• The Center for Systemic Peace (Marshall and Marshall, 2007)
• The World Handbook of Political and Social Indicators (Taylor and Jodice, 1983)
• Coup d’État: A Practical Handbook (Luttwak, 1979)
• The Cline Center’s Social, Political and Economic Event Database (SPEED) Project (Nardulli, Althaus and Hayes, 2015)
• Government Change in Authoritarian Regimes – 2010 Update (Svolik and Akcinaroglu, 2006)
<br>
<b>Items in this Dataset</b>
1. <i>Cline Center Coup d'État Codebook v.2.2.0 Codebook.pdf</i> - This 17-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. <i>Revised January 2025</i>
2. <i>Coup Data v2.2.0.csv</i> - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1094 observations. <i>Revised January 2025</i>
3. <i>Source Document v2.2.0.pdf</i> - This 347-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. <i>Revised January 2025</i>
4. <i>README.md</i> - This file contains useful information for the user about the dataset. It is a text file written in markdown language. <i>Revised January 2025</i>
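Values of coup_id such as 00201062021 begin with zeros. They should be read as text rather than numbers so that they still match the entries in the Source Document. The following is a minimal sketch using Python's csv module, which never converts types. The second column in the test row is hypothetical.

```python
import csv

def load_coups(lines):
    """Index Coup Data CSV rows by coup_id, keeping every value as a string.

    lines: any iterable of text lines, e.g. an open file. Because csv performs no type
    conversion, IDs such as "00201062021" keep their leading zeros.
    """
    return {row["coup_id"]: row for row in csv.DictReader(lines)}
```

With pandas, the equivalent is pd.read_csv(path, dtype={"coup_id": str}). The file encoding is not stated, so pass an encoding explicitly if the default fails.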
<br>
<b> Citation Guidelines</b>
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation:
Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2025. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. January 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access):
Peyton, Buddy, Joseph Bajjalieh, Michael Martin, Sam Alahi, Norah Fadell, and Maddie Jeralds. 2025. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. January 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
published:
2017-06-01
List of Chinese students receiving a Ph.D. in chemistry between 1905 and 1964, based on two books compiling doctoral dissertations by Chinese students in the United States. Includes discipline, university, advisor, year the degree was awarded, birth and/or death dates, and dissertation title. Accompanies Chapter 5: History of the Modern Chemistry Doctoral Program in Mainland China by Vera V. Mainz, published in "Igniting the Chemical Ring of Fire: Historical Evolution of the Chemical Communities in the Countries of the Pacific Rim", Seth Rasmussen, Editor. Published by World Scientific. Expected publication 2017.
keywords:
Chinese; graduate student; dissertation; university; advisor; chemistry; engineering; materials science
published:
2017-06-16
Haselhorst, Derek S.; Tcheng, David K.; Moreno, J. Enrique ; Punyasena, Surangi W.
(2017)
Table S3. Mean slope response for each predictive model used in the ecoinformatic analysis. Mean responses are provided for each seasonal and annual pollen data set analyzed from BCI and PNSL and are summarized by life form. Calculated p-values are provided for each model.
keywords:
pollen; response; climate; ecoinformatics; BCI; PNSL; Panama
published:
2017-09-26
Gramig, Benjamin M.; Widmar, Nicole
(2017)
This file contains the supplemental appendix for the article "Farmer Preferences for Agricultural Soil Carbon Sequestration Schemes" published in Applied Economic Policy and Perspectives (accepted 2017).
keywords:
appendix; carbon sequestration; tillage; choice experiment
published:
2018-04-19
Prepared by Vetle Torvik 2018-04-15
The dataset comes as a single tab-delimited, ASCII-encoded file and should be about 717 MB uncompressed.
• How was the dataset created?
First and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicity and gender, including
Ethnea+Genni as described in:
<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.
http://hdl.handle.net/2142/88927</i>
<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i>
EthnicSeer: http://singularity.ist.psu.edu/ethnicity
<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>
SexMachine 0.1.1: <a href="https://pypi.python.org/pypi/SexMachine/">https://pypi.org/project/SexMachine</a>
First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.
• The code and back-end data are periodically updated and made available for query at <a href ="http://abel.ischool.illinois.edu">Torvik Research Group</a>
• What is the format of the dataset?
The dataset contains 9,300,182 rows and 10 columns
1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)
2. name: full name used as input to EthnicSeer
3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX
4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
5. lastname: used as input for Ethnea+Genni
6. firstname: used as input for Ethnea+Genni
7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if there are no one or two dominant predictions), or TOOSHORT (if both first and last name are too short)
8. Genni: predicted gender; 'F', 'M', or '-'
9. SexMac: predicted gender based on a third-party Python program (default settings except case_sensitive=False): female, mostly_female, andy, mostly_male, male
10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
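Each row can be split on tabs and mapped onto the ten columns above. This sketch also derives the PMID and author position from auid, and it splits two-ethnicity Ethnea values such as SLAV-ENGLISH. It assumes there is no header row; skip the first line if the file turns out to have one. The sample values used with it are hypothetical.

```python
COLUMNS = ["auid", "name", "EthnicSeer", "prop", "lastname", "firstname",
           "Ethnea", "Genni", "SexMac", "SSNgender"]

def parse_record(line):
    """Split one tab-delimited row into a dict and derive a few convenience fields."""
    record = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    # auid is PMID_authorposition, e.g. "12345678_2" -> PMID "12345678", position 2.
    pmid, position = record["auid"].rsplit("_", 1)
    record["pmid"] = pmid
    record["author_position"] = int(position)
    record["prop"] = float(record["prop"])  # EthnicSeer confidence in [0, 1]
    # Ethnea holds one label, two labels joined by "-", or UNKNOWN / TOOSHORT.
    ethnea = record["Ethnea"]
    record["ethnea_labels"] = [] if ethnea in ("UNKNOWN", "TOOSHORT") else ethnea.split("-")
    return record
```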
keywords:
Androgyny; Bibliometrics; Data mining; Search engine; Gender; Semantic orientation; Temporal prediction; Textual markers
published:
2018-12-14
Stein Kenfield, Ayla
(2018)
Spreadsheet with data about whether or not the indicated institutional repository website provides metadata documentation. See readme file for more information.
keywords:
institutional repositories; metadata; best practices; metadata documentation
published:
2020-12-15
Khanna, Madhu; Chen, Xiaoguang; Wang, Weiwei; Oliver, Anthony
(2020)
The dataset consists of results and various input data used in the GAMS model for the publication "Repeal of the Clean Power Plan: Social Cost and Distributional Implications". All the data are either Excel files or in the .inc format, which can be read within GAMS or Notepad. The main data sources include agriculture, transportation, and electricity data. Model details can be found in the paper and the GAMS model package.
keywords:
carbon abatement; welfare cost; electricity sector; partial equilibrium model