Displaying 101 - 125 of 139 in total
Subject Area
Funder
Publication Year
License
Illinois Data Bank Dataset Search Results

Dataset Search Results

published: 2023-09-19
 
We used the following keywords files to identify categories for journals and conferences not in Scopus, for our STI 2023 paper "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science". The first four text files each contains keywords/content words in the form: 'keyword1', 'keyword2', 'keyword3', .... The file title indicates the name of the category: file1: healthscience_words.txt file2: lifescience_words.txt file3: physicalscience_words.txt file4: socialscience_words.txt The first four files were generated from a combination of software and manual review in an iterative process in which we: - Manually reviewed venue titles were not able to automatically categorize using the Scopus categorization or extending it as a resource. - Iteratively reviewed uncategorized venue titles to manually curate additional keywords as content words indicating a venue title could be classified in the category healthscience, lifescience, physicalscience, or socialscience. We used English content words and added words we could automatically translate to identify content words. NOTE: Terminology with multiple potential meanings or contain non-English words that did not yield useful automatic translations e.g., (e.g., Al-Masāq) were not selected as content words. The fifth text file is a list of stopwords in the form: 'stopword1', 'stopword2, 'stopword3', ... file5: stopwords.txt This file contains manually curated stopwords from venue titles to handle non-content words like 'conference' and 'journal,' etc. This dataset is a revision of the following dataset: Version 1: Lee, Jou; Schneider, Jodi: Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign Data Bank. Changes from Version 1 to Version 2: - Added one author - Added a stopwords file that was used in our data preprocessing. - Thoroughly reviewed each of the 4 keywords lists. In particular, we added UTF-8 terminology, removed some non-content words and misclassified content words, and extensively reviewed non-English keywords.
keywords: health science keywords; scientometrics; stopwords; field; keywords; life science keywords; physical science keywords; science of science; social science keywords; meta-science; RISRS
published: 2019-01-07
 
Vendor transcription of the Catalogue of Copyright Entries, Part 1, Group 1, Books: New Series, Volume 29 for the Year 1932. This file contains all of the entries from the indicated volume.
keywords: copyright; Catalogue of Copyright Entries; Copyright Office
published: 2016-05-26
 
This data set includes survey responses collected during 2015 from academic libraries with library publishing services. Each institution responded to questions related to its use of user studies or information about readers in order to shape digital publication design, formats, and interfaces. Survey data was supplemented with institutional categories to facilitate comparison across institutional types.
keywords: academic libraries; publishing; user experience; user studies
published: 2020-08-21
 
# WikiCSSH If you are using WikiCSSH please cite the following: > Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. “WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia.” In Workshop on Scientific Knowledge Graphs (SKG 2020). https://skg.kmi.open.ac.uk/SKG2020/papers/HAN_et_al_SKG_2020.pdf > Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. "WikiCSSH - Computer Science Subject Headings from Wikipedia". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0424970_V1 Download the WikiCSSH files from: https://doi.org/10.13012/B2IDB-0424970_V1 More details about the WikiCSSH project can be found at: https://github.com/uiuc-ischool-scanr/WikiCSSH This folder contains the following files: WikiCSSH_categories.csv - Categories in WikiCSSH WikiCSSH_category_links.csv - Links between categories in WikiCSSH Wikicssh_core_categories.csv - Core categories as mentioned in the paper WikiCSSH_category_links_all.csv - Links between categories in WikiCSSH (includes a dummy category called <ROOT> which is parent of isolates and top level categories) WikiCSSH_category2page.csv - Links between Wikipedia pages and Wikipedia Categories in WikiCSSH WikiCSSH_page2redirect.csv - Links between Wikipedia pages and Wikipedia page redirects in WikiCSSH This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <a href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</a> or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
keywords: wikipedia; computer science;
published: 2018-04-19
 
Prepared by Vetle Torvik 2018-04-15 The dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed. &bull; How was the dataset created? First and last names of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including Ethnea+Genni as described in: <i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927</i> <i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i> EthnicSeer: http://singularity.ist.psu.edu/ethnicity <i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i> SexMachine 0.1.1: <a href="https://pypi.python.org/pypi/SexMachine/">https://pypi.org/project/SexMachine</a> First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases. &bull; The code and back-end data is periodically updated and made available for query at <a href ="http://abel.ischool.illinois.edu">Torvik Research Group</a> &bull; What is the format of the dataset? The dataset contains 9,300,182 rows and 10 columns 1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition) 2. name: full name used as input to EthnicSeer) 3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX 4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction 5. lastname: used as input for Ethnea+Genni 6. firstname: used as input for Ethnea+Genni 7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short) 8. Genni: predicted gender; 'F', 'M', or '-' 9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male) 10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
keywords: Androgyny; Bibliometrics; Data mining; Search engine; Gender; Semantic orientation; Temporal prediction; Textual markers
published: 2018-08-06
 
This annotation study compared RobotReviewer's data extraction to that of three novice data extractors, using six included articles synthesized in one Cochrane review: Bailey E, Worthington HV, van Wijk A, Yates JM, Coulthard P, Afzal Z. Ibuprofen and/or paracetamol (acetaminophen) for pain relief after surgical removal of lower wisdom teeth. Cochrane Database Syst Rev. 2013; CD004624; doi:10.1002/14651858.CD004624.pub2 The goal was to assess the relative advantage of RobotReviewer's data extraction with respect to quality.
keywords: RobotReviewer; annotation; information extraction; data extraction; systematic review automation; systematic reviewing;
published: 2018-03-28
 
Bibliotelemetry data are provided in support of the evaluation of Internet of Things (IoT) middleware within library collections. IoT infrastructure within the physical library environment is the basis for an integrative, hybrid approach to digital resource recommenders. The IoT infrastructure provides mobile, dynamic wayfinding support for items in the collection, which includes features for location-based recommendations. A modular evaluation and analysis herein clarified the nature of users’ requests for recommendations based on their location, and describes subject areas of the library for which users request recommendations. The modular mobile design allowed for deep exploration of bibliographic identifiers as they appeared throughout the global module system, serving to provide context to the searching and browsing data that are the focus of this study.
keywords: internet of things; IoT; academic libraries; bibliographic classification
published: 2018-04-23
 
Conceptual novelty analysis data based on PubMed Medical Subject Headings ---------------------------------------------------------------------- Created by Shubhanshu Mishra, and Vetle I. Torvik on April 16th, 2018 ## Introduction This is a dataset created as part of the publication titled: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : the magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra. It contains final data generated as part of our experiments based on MEDLINE 2015 baseline and MeSH tree from 2015. The dataset is distributed in the form of the following tab separated text files: * PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follow: - PMID: PubMed ID - Year: year of publication - TimeNovelty: time novelty score of the paper based on individual concepts (see paper) - VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper) - PairTimeNovelty: time novelty score of the paper based on pair of concepts (see paper) - PairVolumeNovelty: volume novelty score of the paper based on pair of concepts (see paper) * mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follow: - MeshTerm: Name of the MeSH term - Year: year - AbsVal: Total publications with that MeSH term in the given year - TimeNovelty: age (in years since first publication) of MeSH term in the given year - VolumeNovelty: : age (in number of papers since first publication) of MeSH term in the given year * meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each MeSH term for all years - Mesh1: Name of the first MeSH term (alphabetically sorted) - Mesh2: Name of the second MeSH term (alphabetically sorted) - Year: year - AbsVal: Total publications with that MeSH pair in the given year - TimeNovelty: age (in years since first publication) of MeSH pair in the given year - VolumeNovelty: : age (in number of papers since first publication) of MeSH pair in the given year * README.txt file ## Dataset creation This dataset was constructed using multiple datasets described in the following locations: * MEDLINE 2015 baseline: <a href="https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html">https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html</a> * MeSH tree 2015: <a href="ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/">ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/</a> * Source code provided at: <a href="https://github.com/napsternxg/Novelty">https://github.com/napsternxg/Novelty</a> Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check <a href="https://www.nlm.nih.gov/databases/download/pubmed_medline.html">here </a>for information to get PubMed/MEDLINE, and NLMs data Terms and Conditions: Additional data related updates can be found at: <a href="http://abel.ischool.illinois.edu">Torvik Research Group</a> ## Acknowledgments This work was made possible in part with funding to VIT from <a href="https://projectreporter.nih.gov/project_info_description.cfm?aid=8475017&icde=18058490">NIH grant P01AG039347 </a> and <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1348742">NSF grant 1348742 </a>. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. ## License Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at <a href="https://github.com/napsternxg/Novelty">https://github.com/napsternxg/Novelty</a>
keywords: Conceptual novelty; bibliometrics; PubMed; MEDLINE; MeSH; Medical Subject Headings; Analysis;
published: 2022-12-05
 
These are similarity matrices of countries based on dfferent modalities of web use. Alexa website traffic, trending vidoes on Youtube and Twitter trends. Each matrix is a month of data aggregated
keywords: Global Internet Use
published: 2022-10-04
 
One of the newest types of multimedia involves body-connected interfaces, usually termed haptics. Haptics may use stylus-based tactile interfaces, glove-based systems, handheld controllers, balance boards, or other custom-designed body-computer interfaces. How well do these interfaces help students learn Science, Technology, Engineering, and Mathematics (STEM)? We conducted an updated review of learning STEM with haptics, applying meta-analytic techniques to 21 published articles reporting on 53 effects for factual, inferential, procedural, and transfer STEM learning. This deposit includes the data extracted from those articles and comprises the raw data used in the meta-analytic analyses.
keywords: Computer-based learning; haptic interfaces; meta-analysis
published: 2022-07-11
 
This dataset was developed as part of an online survey study that explores student characteristics that may predict what one finds helpful in replies to requests for help posted to an online college course discussion forum. 223 college students enrolled in an introductory statistics course were surveyed on their sense of belonging to their course community, as well as how helpful they found 20 examples of replies to requests for help posted to a statistics course discussion forum.
keywords: help-giving; discussion forums; sense of belonging; college student
published: 2020-12-16
 
Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the Foreign Broadcast Information Service (FBIS) published between 1995 and 2013. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. "ResTeCo Project FBIS Dataset Variable Descriptions.pdf" A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset and descriptions of all variables. 2. "resteco-fbis.csv" This file contains the data extracted from terrorism-related media coverage in the Foreign Broadcast Information Service (FBIS) between 1995 and 2013. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 750,971 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. Please note that care should be taken when using "resteco-fbis.csv". The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved. 3. "README.md" This file contains useful information for the user about the dataset. It is a text file written in mark down language Citation Guidelines 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-6360821_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project Foreign Broadcast Information Service (FBIS) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-6360821_V1
keywords: Terrorism, Text Analytics, News Coverage, Topic Modeling, Sentiment Analysis
published: 2020-12-16
 
Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the Summary of World Broadcasts published between 1979 and 2019. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. "ResTeCo Project SWB Dataset Variable Descriptions.pdf" A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset and descriptions of all variables. 2. "resteco-swb.csv" This file contains the data extracted from terrorism-related media coverage in the BBC Summary of World Broadcasts (SWB) between 1979 and 2019. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 438,373 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. Please note that care should be taken when using "resteco-swb.csv". The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved. 3. "README.md" This file contains useful information for the user about the dataset. It is a text file written in markdown language Citation Guidelines 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project BBC Summary of World Broadcasts (SWB) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-2128492_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project Summary of World Broadcasts (SWB) Dataset. Cline Center for Advanced Social Research. December 16. University of Illinois Urbana-Champaign. doi: https://doi.org/10.13012/B2IDB-2128492_V1
keywords: Terrorism, Text Analytics, News Coverage, Topic Modeling, Sentiment Analysis
published: 2020-10-11
 
This dataset contains the publication record of 6429 computer science researchers collected from the Microsoft Academic dataset provided through their Knowledge Service API (http://bit.ly/microsoft-data).
published: 2020-08-10
 
These are text files downloaded from the Web of Science for the bibliographic analyses found in Zinnen et al. (2020) in Applied Vegetation Science. They represent the papers and reference lists from six expert-based indicator systems: Floristic Quality Assessment, hemeroby, naturalness indicator values (& social behaviors), Ellenberg indicator values, grassland utilization values, and urbanity indicator values. To examine data, download VOSviewer and see instructrions from van Eck & Waltman (2019) for how to upload data. Although we used bibliographic coupling, there are a number of other interesting bibliographic analyses you can use with these data (e.g., visualizing citations between journals from this set of documents). Note: There are two caveats to note about these data and Supplements 1 & 2 associated with our paper. First, there are some overlapping papers in these text files (i.e., raw data). When added individually, the papers sum to more than the numbers we give. However, when combined VOSviewer recognizes these as repeats, and matches the numbers we list in S1 and the manuscript. Second, we labelled the downloaded papers in S2 with their respective systems. In some cases, the labels do not completely match our counts listed in S1 and raw data. This is because some of these papers use another system, but were not captured in our systematic literature search (e.g., a paper may have used hemeroby, but was not picked up by WoS, so this paper is not listed as one of the 52 hemeroby papers).
keywords: Web of Science; bibliographic analyses; vegetation; VOSviewer
published: 2020-06-12
 
This is a network of 14 systematic reviews on the salt controversy and their included studies. Each edge in the network represents an inclusion from one systematic review to an article. Systematic reviews were collected from Trinquart (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ). <b>FILE FORMATS</b> 1) Article_list.csv - Unicode CSV 2) Article_attr.csv - Unicode CSV 3) inclusion_net_edges.csv - Unicode CSV 4) potential_inclusion_link.csv - Unicode CSV 5) systematic_review_inclusion_criteria.csv - Unicode CSV 6) Supplementary Reference List.pdf - PDF <b>ROW EXPLANATIONS</b> 1) Article_list.csv - Each row describes a systematic review or included article. 2) Article_attr.csv - Each row is the attributes of a systematic review/included article. 3) inclusion_net_edges.csv - Each row represents an inclusion from a systematic review to an article. 4) potential_inclusion_link.csv - Each row shows the available evidence base of a systematic review. 5) systematic_review_inclusion_criteria.csv - Each row is the inclusion criteria of a systematic review. 6) Supplementary Reference List.pdf - Each item is a bibliographic record of a systematic review/included paper. <b>COLUMN HEADER EXPLANATIONS</b> <b>1) Article_list.csv:</b> ID - Numeric ID of a paper paper assigned ID - ID of the paper from Trinquart et al. (2016) Type - Systematic review / primary study report Study Groupings - Groupings for related primary study reports from the same report, from Trinquart et al. (2016) (if applicable, otherwise blank) Title - Title of the paper year - Publication year of the paper Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016) Doi - DOIs of the paper. (if applicable, otherwise blank) Retracted (Y/N) - Whether the paper was retracted or withdrawn (Y). Blank if not retracted or withdrawn. <b>2) Article_attr.csv:</b> ID - Numeric ID of a paper year - Publication year Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016) Type - Systematic review/ primary study report <b>3) inclusion_net_edges.csv:</b> citing_ID - The numeric ID of a systematic review cited_ID - The numeric ID of the included articles <b>4) potential_inclusion_link.csv:</b> This data was translated from the Sankey diagram given in Trinquart et al. (2016) as Web Figure 4. Each row indicates a systematic review and each column indicates a primary study. In the matrix, "p" indicates that a given primary study had been published as of the search date of a given systematic review. <b>5)systematic_review_inclusion_criteria.csv:</b> ID - The numeric IDs of systematic reviews paper assigned ID - ID of the paper from Trinquart et al. (2016) attitude - Its scientific opinion about the salt controversy from Trinquart et al. (2016) No. of studies included - Number of articles included in the systematic review Study design - Study designs to include, per inclusion criteria population - Populations to include, per inclusion criteria Exposure/Intervention - Exposures/Interventions to include, per inclusion criteria outcome - Study outcomes required for inclusion, per inclusion criteria Language restriction - Report languages to include, per inclusion criteria follow-up period - Follow-up period required for inclusion, per inclusion criteria
keywords: systematic reviews; evidence synthesis; network visualization; tertiary studies
published: 2020-05-20
 
This dataset is a snapshot of the presence and structure of entrepreneurship education in U.S. four-year colleges and universities in 2015, including co-curricular activities and related infrastructure. Public, private not-for-profit and for-profit institutions are included, as are specialized four-year institutions. The dataset provides insight into the presence of entrepreneurship education both within business units and in other units of college campuses. Entrepreneurship is defined broadly, to include small business management and related career-focused options.
keywords: Entrepreneurship education; Small business education; Ewing Marion Kauffman Foundation; csv
published: 2020-03-08
 
This dataset inventories the availability of entrepreneurship and small business education, including co-curricular opportunities, in two-year colleges in the United States. The inventory provides a snapshot of activities at more than 1,650 public, not-for-profit, and private for-profit institutions, in 2014.
keywords: Small business education; entrepreneurship education; Kauffman Entrepreneurship Education Inventory; Ewing Marion Kauffman Foundation; Paul J. Magelli
published: 2019-11-12
 
We are sharing the tweet IDs of four social movements: #BlackLivesMatter, #WhiteLivesMatter, #AllLivesMatter, and #BlueLivesMatter movements. The tweets are collected between May 1st, 2015 and May 30, 2017. We eliminated the location to the United States and focused on extracting the original tweets, excluding the retweets. Recommended citations for the data: Rezapour, R. (2019). Data for: How do Moral Values Differ in Tweets on Social Movements?. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9614170_V1 and Rezapour, R., Ferronato, P., and Diesner, J. (2019). How do moral values differ in tweets on social movements?. In 2019 Computer Supported Cooperative Work and Social Computing Companion Publication (CSCW’19 Companion), Austin, TX.
keywords: Twitter; social movements; black lives matter; blue lives matter; all lives matter; white lives matter
published: 2019-09-06
 
This is a dataset of 1101 comments from The New York Times (May 1, 2015-August 31, 2015) that contains a mention of the stemmed words vaccine or vaxx.
keywords: vaccine;online comments
published: 2018-12-31
 
Sixty undergraduate STEM lecture classes were observed across 14 departments at the University of Illinois Urbana-Champaign in 2015 and 2016. We selected the classes to observe using purposive sampling techniques with the objectives of (1) collecting classroom observations that were representative of the STEM courses offered; (2) conducting observations on non-test, typical class days; and (3) comparing these classroom observations using the Class Observation Protocol for Undergraduate STEM (COPUS) to record the presence and frequency of active learning practices utilized by Community of Practice (CoP) and non-CoP instructors. Decimal values are the result of combined observations. All COPUS codes listed are from Smith (2013) "The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize STEM Classroom Practices" paper. For more information on the data collection process, see "Evidence that communities of practice are associated with active learning in large STEM lectures" by Tomkin et. al. (2019) in the International Journal of STEM Education.
keywords: COPUS, Community of Practice
published: 2019-04-05
 
File Name: Inclusion_Criteria_Annotation.csv Data Preparation: Xiaoru Dong Date of Preparation: 2019-04-04 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: The file contains lists of inclusion criteria of Cochrane Systematic Reviews and the manual annotation results. 5420 inclusion criteria were annotated, out of 7158 inclusion criteria available. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file: - "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews. - "Only RCTs": Manual Annotation results. In which, "x" means the inclusion criteria is classified as "Only RCTs". Blank means that the inclusion criteria is classified as "Others". Notes: 1. "RCT" stands for Randomized Controlled Trial, which, in definition, is "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html]. 2. In order to reproduce the relevant data to this, please get the code of the project published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instruction provided. 3. This datafile (V2) is a updated version of the datafile published at https://doi.org/10.13012/B2IDB-5958960_V1 with some minor spelling mistakes in the data fixed.
keywords: Inclusion criteri; Randomized controlled trials; Machine learning; Systematic reviews
published: 2018-12-20
 
File Name: AllWords.csv Data Preparation: Xiaoru Dong, Linh Hoang Date of Preparation: 2018-12-12 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: The file contains lists of all words (all features) from the bag-of-words feature extraction. Notes: In order to reproduce the data in this file, please get the code of the project published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instruction provided.
keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
published: 2018-12-20
 
File Name: Error_Analysis.xslx Data Preparation: Xiaoru Dong Date of Preparation: 2018-12-12 Data Contributions: Xiaoru Dong, Linh Hoang, Jingyi Xie, Jodi Schneider Data Source: The classification prediction results of prediction in testing data set Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews Description: The file contains lists of the wrong and correct prediction of inclusion criteria of Cochrane Systematic Reviews from the testing data set and the length (number of words) of the inclusion criteria. Notes: In order to reproduce the relevant data to this, please get the code of the project published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instruction provided.
keywords: Inclusion criteria, Randomized controlled trials, Machine learning, Systematic reviews
published: 2018-12-14
 
Spreadsheet with data about whether or not the indicated institutional repository website provides metadata documentation. See readme file for more information.
keywords: institutional repositories; metadata; best practices; metadata documentation