Displaying Dataset 1 - 25 of 71 in total

published: 2020-10-11
 
This dataset contains the publication record of 6429 computer science researchers collected from the Microsoft Academic dataset provided through their Knowledge Service API (http://bit.ly/microsoft-data).
published: 2020-09-27
 
This dataset contains the R code used to produce the figures in the manuscript titled "Understanding the multifaceted geospatial software ecosystem: a survey approach". The raw survey data used to populate these charts cannot be shared due to the survey consent agreement.
keywords: R; figures; geospatial software
published: 2020-09-02
 
Citation context annotation. This dataset is a second version (V2) and part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. (2020) "Continued post-retraction citation of a fraudulent clinical trial report, eleven years after it was retracted for falsifying data". Scientometrics. In press, DOI: 10.1007/s11192-020-03631-1
Publications were selected by examining all citations to the retracted paper Matsuyama 2005, and selecting the 35 citing papers, published 2010 to 2019, which do not mention the retraction, but which mention the methods or results of the retracted paper (called "specific" in Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi (2020): Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-8150563_V1">https://doi.org/10.13012/B2IDB-8150563_V1</a> ). The annotated citations are second-generation citations to the retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) <a href="https://doi.org/10.1016/S0012-3692(08)60339-6">https://doi.org/10.1016/S0012-3692(08)60339-6</a> ).
<b>OVERALL DATA for VERSION 2 (V2)</b>
FILES/FILE FORMATS
Same data in two formats:
2010-2019 SG to specific not mentioned FG.csv - Unicode CSV (preservation format only) - same as in V1
2010-2019 SG to specific not mentioned FG.xlsx - Excel workbook (preferred format) - same as in V1
Additional files in V2:
2G-possible-misinformation-analyzed.csv - Unicode CSV (preservation format only)
2G-possible-misinformation-analyzed.xlsx - Excel workbook (preferred format)
<b>ABBREVIATIONS: </b>
2G - Refers to the second-generation citation of Matsuyama
FG - Refers to the direct citation of Matsuyama (the one the second-generation item cites)
<b>COLUMN HEADER EXPLANATIONS </b>
File name: 2G-possible-misinformation-analyzed. Other column headers in this file have the same meaning as explained in V1. The following are additional header explanations:
Quote Number - The order of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Quote - The text of the quote (citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Translated Quote - English translation of "Quote", automatically translated by Google Scholar
Seriousness/Risk - Our assessment of the risk of misinformation and its seriousness
2G topic - Our assessment of the topic of the cited article (the second generation article given in "2G article")
2G section - The section of the citing article (the second generation article given in "2G article") in which the cited article (the first generation article given in "FG in bibliography") was found
FG in bib type - The type of article (e.g., review article), referring to the cited article (the first generation article given in "FG in bibliography")
FG in bib topic - Our assessment of the topic of the cited article (the first generation article given in "FG in bibliography")
FG in bib section - The section of the cited article (the first generation article given in "FG in bibliography") in which the Matsuyama retracted paper was cited
keywords: citation context annotation; retraction; diffusion of retraction; second-generation citation context analysis
published: 2020-08-21
 
# WikiCSSH

If you are using WikiCSSH please cite the following:

> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. “WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia.” In Workshop on Scientific Knowledge Graphs (SKG 2020). https://skg.kmi.open.ac.uk/SKG2020/papers/HAN_et_al_SKG_2020.pdf

> Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. "WikiCSSH - Computer Science Subject Headings from Wikipedia". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0424970_V1

Download the WikiCSSH files from: https://doi.org/10.13012/B2IDB-0424970_V1

More details about the WikiCSSH project can be found at: https://github.com/uiuc-ischool-scanr/WikiCSSH

This folder contains the following files:

- WikiCSSH_categories.csv - Categories in WikiCSSH
- WikiCSSH_category_links.csv - Links between categories in WikiCSSH
- Wikicssh_core_categories.csv - Core categories as mentioned in the paper
- WikiCSSH_category_links_all.csv - Links between categories in WikiCSSH (includes a dummy category called <ROOT> which is parent of isolates and top level categories)
- WikiCSSH_category2page.csv - Links between Wikipedia pages and Wikipedia Categories in WikiCSSH
- WikiCSSH_page2redirect.csv - Links between Wikipedia pages and Wikipedia page redirects in WikiCSSH

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <a href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</a> or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
keywords: wikipedia; computer science;
published: 2020-08-18
 
These data and code enable replication of the findings and robustness checks in "No buzz for bees: Media coverage of pollinator decline," published in Proceedings of the National Academy of Sciences of the United States of America (2020). In this paper, we find that although widespread declines in insect biomass and diversity are increasing concern within the scientific community, it remains unclear whether attention to pollinator declines has also increased within information sources serving the general public. Examining patterns of journalistic attention to the pollinator population crisis can also inform efforts to raise awareness about the importance of declines of insect species providing ecosystem services beyond pollination. We used the Global News Index developed by the Cline Center for Advanced Social Research at the University of Illinois at Urbana-Champaign to track news attention to pollinator topics in nearly 25 million news items published by two American national newspapers and four international wire services over the past four decades. We provide a link to documentation of the Global News Index in the "relationships with articles, code, o. We found vanishingly low levels of attention to pollinator population topics relative to coverage of climate change, which we use as a comparison topic. In the most recent subset of ~10 million stories published from 2007 to 2019, 1.39% (137,086 stories) refer to climate change/global warming, while only 0.02% (1,780) refer to pollinator populations in all contexts and just 0.007% (679) refer to pollinator declines. Substantial increases in news attention were detectable only in U.S. national newspapers. We also find that while climate change stories appear primarily in newspaper “front sections”, pollinator population stories remain largely marginalized in “science” and “back section” reports. 
At the same time, news reports about pollinator populations increasingly link the issue to climate change, which might ultimately help raise public awareness to effect needed policy changes.
keywords: News Coverage; Text Analytics; Insects; Pollinator; Cline Center; Cline Center for Advanced Social Research; political; social; political science; Global News Index; Archer; news; mass communication; journalism
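The shares quoted in the description above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: it derives an implied corpus size from the climate change figures (the description gives the total only as "~10 million") and recomputes the two pollinator shares from it. The variable and function names are illustrative and do not come from the replication code.

```python
# Counts and the 1.39% share are taken from the dataset description.
climate_stories = 137_086
climate_share_pct = 1.39

# Implied size of the 2007-2019 subset (assumption: the 1.39% figure
# is the climate count divided by the subset size, then rounded).
implied_total = climate_stories / (climate_share_pct / 100)


def share_pct(count: int, total: float) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


pollinator_all_pct = share_pct(1_780, implied_total)
pollinator_decline_pct = share_pct(679, implied_total)

print(f"implied total stories: {implied_total:,.0f}")
print(f"pollinator populations: {pollinator_all_pct:.3f}%")
print(f"pollinator declines: {pollinator_decline_pct:.4f}%")
```

Under that assumption the recomputed shares round to the 0.02% and 0.007% reported in the description.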
published: 2020-08-10
 
These are text files downloaded from the Web of Science for the bibliographic analyses found in Zinnen et al. (2020) in Applied Vegetation Science. They represent the papers and reference lists from six expert-based indicator systems: Floristic Quality Assessment, hemeroby, naturalness indicator values (& social behaviors), Ellenberg indicator values, grassland utilization values, and urbanity indicator values. To examine the data, download VOSviewer and see the instructions from van Eck & Waltman (2019) for how to upload data. Although we used bibliographic coupling, there are a number of other interesting bibliographic analyses you can run with these data (e.g., visualizing citations between journals from this set of documents). Note: There are two caveats about these data and Supplements 1 & 2 associated with our paper. First, there are some overlapping papers in these text files (i.e., raw data). When added individually, the papers sum to more than the numbers we give. However, when the files are combined, VOSviewer recognizes these as repeats, and the totals match the numbers we list in S1 and the manuscript. Second, we labelled the downloaded papers in S2 with their respective systems. In some cases, the labels do not completely match our counts listed in S1 and raw data. This is because some of these papers use another system, but were not captured in our systematic literature search (e.g., a paper may have used hemeroby, but was not picked up by WoS, so this paper is not listed as one of the 52 hemeroby papers).
keywords: Web of Science; bibliographic analyses; vegetation; VOSviewer
published: 2020-07-16
 
Dataset to be used for the SocialMediaIE tutorial.
keywords: social media; deep learning; natural language processing
published: 2020-02-12
 
This dataset contains the results of a three-month audit of housing advertisements. It accompanies the 2020 ICWSM paper "Auditing Race and Gender Discrimination in Online Housing Markets". It covers data collected between Dec 7, 2018 and March 19, 2019. There are two JSON files in the dataset. The first contains a list of JSON objects, separated by newlines, each representing an advertisement. Each object includes the date and time it was collected, the image and title (if collected) of the ad, the page on which it was displayed, and the training treatment it received. The second file is a list of JSON objects, separated by newlines, each representing a visit to a housing lister. Each object contains the URL, the training treatment applied, the location searched, and the metadata of the top sites scraped. This metadata includes location, price, and number of rooms. The dataset also includes the raw images of ads collected in order to code them by interest and targeting. These were captured by Selenium and named using a perceptual hash to de-duplicate images.
keywords: algorithmic audit; advertisement audit;
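Because both files store one JSON object per line rather than a single JSON array, they need to be parsed line by line. The following is a minimal sketch of that pattern; the field names in the sample (`collected_at`, `title`) are hypothetical, since the description does not list the actual keys.

```python
import json
from typing import Iterable, Iterator


def iter_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample lines; the real key names are not given in the description.
sample = [
    '{"collected_at": "2018-12-07T10:00:00", "title": "Apartment for rent"}',
    "",
    '{"collected_at": "2019-03-19T18:30:00", "title": null}',
]
records = list(iter_json_lines(sample))
print(len(records), "records parsed")
```

With a real file, the same function can be fed an open file handle, for example `with open(path, encoding="utf-8") as f: ads = list(iter_json_lines(f))`, where `path` is whichever of the two JSON files is being read.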
published: 2020-06-19
 
This dataset includes data pulled from the World Bank (2009), the World Values Survey wave 6, and Transparency International (2009). The data were used to measure perceptions of expertise from individuals in nations that are recipients of development aid as measured by the World Bank.
keywords: World Values Survey; World Bank; expertise; development
published: 2020-06-12
 
This is a network of 14 systematic reviews on the salt controversy and their included studies. Each edge in the network represents an inclusion from one systematic review to an article. Systematic reviews were collected from Trinquart (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ).
<b>FILE FORMATS</b>
1) Article_list.csv - Unicode CSV
2) Article_attr.csv - Unicode CSV
3) inclusion_net_edges.csv - Unicode CSV
4) potential_inclusion_link.csv - Unicode CSV
5) systematic_review_inclusion_criteria.csv - Unicode CSV
6) Supplementary Reference List.pdf - PDF
<b>ROW EXPLANATIONS</b>
1) Article_list.csv - Each row describes a systematic review or included article.
2) Article_attr.csv - Each row is the attributes of a systematic review/included article.
3) inclusion_net_edges.csv - Each row represents an inclusion from a systematic review to an article.
4) potential_inclusion_link.csv - Each row shows the available evidence base of a systematic review.
5) systematic_review_inclusion_criteria.csv - Each row is the inclusion criteria of a systematic review.
6) Supplementary Reference List.pdf - Each item is a bibliographic record of a systematic review/included paper.
<b>COLUMN HEADER EXPLANATIONS</b>
<b>1) Article_list.csv:</b>
ID - Numeric ID of a paper
paper assigned ID - ID of the paper from Trinquart et al. (2016)
Type - Systematic review / primary study report
Study Groupings - Groupings for related primary study reports from the same report, from Trinquart et al. (2016) (if applicable, otherwise blank)
Title - Title of the paper
year - Publication year of the paper
Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016)
Doi - DOIs of the paper. (if applicable, otherwise blank)
Retracted (Y/N) - Whether the paper was retracted or withdrawn (Y). Blank if not retracted or withdrawn.
<b>2) Article_attr.csv:</b>
ID - Numeric ID of a paper
year - Publication year
Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016)
Type - Systematic review/ primary study report
<b>3) inclusion_net_edges.csv:</b>
citing_ID - The numeric ID of a systematic review
cited_ID - The numeric ID of the included articles
<b>4) potential_inclusion_link.csv:</b>
This data was translated from the Sankey diagram given in Trinquart et al. (2016) as Web Figure 4. Each row indicates a systematic review and each column indicates a primary study. In the matrix, "p" indicates that a given primary study had been published as of the search date of a given systematic review.
<b>5) systematic_review_inclusion_criteria.csv:</b>
ID - The numeric IDs of systematic reviews
paper assigned ID - ID of the paper from Trinquart et al. (2016)
attitude - Its scientific opinion about the salt controversy from Trinquart et al. (2016)
No. of studies included - Number of articles included in the systematic review
Study design - Study designs to include, per inclusion criteria
population - Populations to include, per inclusion criteria
Exposure/Intervention - Exposures/Interventions to include, per inclusion criteria
outcome - Study outcomes required for inclusion, per inclusion criteria
Language restriction - Report languages to include, per inclusion criteria
follow-up period - Follow-up period required for inclusion, per inclusion criteria
keywords: systematic reviews; evidence synthesis; network visualization; tertiary studies
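As an illustration of how the edge list in inclusion_net_edges.csv might be used, the sketch below groups included articles by systematic review using only the two documented columns, `citing_ID` and `cited_ID`. The sample rows are made up, and the function name `load_inclusions` is hypothetical rather than part of the dataset.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set


def load_inclusions(csv_file) -> Dict[str, Set[str]]:
    """Map each systematic review ID to the set of article IDs it includes."""
    included = defaultdict(set)
    for row in csv.DictReader(csv_file):
        included[row["citing_ID"]].add(row["cited_ID"])
    return dict(included)


# Made-up rows standing in for inclusion_net_edges.csv.
sample = io.StringIO("citing_ID,cited_ID\n1,15\n1,16\n2,16\n")
inclusions = load_inclusions(sample)

# Articles included by both review 1 and review 2 in the sample.
shared = inclusions["1"] & inclusions["2"]
print(sorted(shared))
```

Keeping the IDs as strings avoids any assumption about their numeric format; they can be converted with `int()` if the analysis needs it.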
published: 2020-05-17
 
Models and predictions for submission to TRAC - 2020 Second Workshop on Trolling, Aggression and Cyberbullying.
Our approach is described in our paper titled: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020).
The source code for training this model and more details can be found on our code repository: https://github.com/socialmediaie/TRAC2020
NOTE: These models are retrained for uploading here after our submission so the evaluation measures may be slightly different from the ones reported in the paper.
keywords: Social Media; Trolling; Aggression; Cyberbullying; text classification; natural language processing; deep learning; open source;
published: 2020-05-20
 
This dataset is a snapshot of the presence and structure of entrepreneurship education in U.S. four-year colleges and universities in 2015, including co-curricular activities and related infrastructure. Public, private not-for-profit and for-profit institutions are included, as are specialized four-year institutions. The dataset provides insight into the presence of entrepreneurship education both within business units and in other units of college campuses. Entrepreneurship is defined broadly, to include small business management and related career-focused options.
keywords: Entrepreneurship education; Small business education; Ewing Marion Kauffman Foundation; csv
published: 2020-05-15
 
Trained models for multi-task multi-dataset learning for sequence prediction in tweets.
Tasks include POS, NER, Chunking, and SuperSenseTagging.
Models were trained using: https://github.com/napsternxg/SocialMediaIE/blob/master/experiments/multitask_multidataset_experiment.py
See https://github.com/napsternxg/SocialMediaIE for details.
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning;
published: 2020-05-15
 
This dataset contains the tweets collected for the paper: Shubhanshu Mishra, Sneha Agarwal, Jinlong Guo, Kirstin Phelps, Johna Picco, and Jana Diesner. 2014. Enthusiasm and support: alternative sentiment classification for social movements on social media. In Proceedings of the 2014 ACM conference on Web science (WebSci '14). ACM, New York, NY, USA, 261-262. DOI: https://doi.org/10.1145/2615569.2615667
The data only contains tweet IDs and the corresponding enthusiasm and support labels by two different annotators.
keywords: Twitter; text classification; enthusiasm; support; social causes; LGBT; Cyberbullying; NFL
published: 2020-05-13
 
Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo project hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations.
This particular dataset contains information extracted from terrorism-related stories in the New York Times published between 1945 and 2018. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned.
This dataset contains 3 files:
1. <i>"ResTeCo Project NYT Dataset Variable Descriptions.pdf"</i> <ul> <li>A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset and descriptions of all variables. </li> </ul>
2. <i>"resteco-nyt.csv"</i> <ul><li>This file contains the data extracted from terrorism-related media coverage in the New York Times between 1945 and 2018. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 438,373 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. </li> <li> <b>Please note</b> that care should be taken when using "resteco-nyt.csv". The file may not be suitable for use in a spreadsheet program like Excel, as some of the values are quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. Users are encouraged to work with the data in a statistical package or language such as Stata, R, or Python so that the structure and quality of the data are preserved.</li> </ul>
3. <i>"README.md"</i> <ul><li>This file contains useful information for the user about the dataset. It is a text file written in Markdown.</li> </ul>
<b>Citation Guidelines</b>
1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1
2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1
keywords: Terrorism, Text Analytics, News Coverage, Topic Modeling, Sentiment Analysis
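The advice above to read the CSV outside a spreadsheet program can be followed with Python's standard `csv` module, which keeps every value as a string and therefore does not round long numbers or cut long fields on load. The sketch below also checks that the documented `id` variable is unique. The description does not say which values are large, so the sample uses long numeric ids purely as an assumption, and the `headline` column is made up.

```python
import csv
import io


def check_unique_ids(csv_file, id_column: str = "id") -> int:
    """Return the number of rows, raising ValueError on a repeated id.

    csv.DictReader returns every value as a string, so nothing is
    converted to a floating-point number while the file is read.
    """
    seen = set()
    for row in csv.DictReader(csv_file):
        value = row[id_column]
        if value in seen:
            raise ValueError(f"duplicate id: {value}")
        seen.add(value)
    return len(seen)


# Made-up rows; only the "id" column name comes from the description.
sample = io.StringIO(
    "id,headline\n"
    "12345678901234567890,first\n"
    "12345678901234567891,second\n"
)
n_rows = check_unique_ids(sample)
print(n_rows, "rows with unique ids")
```

For the real file, open it with `open(path, newline="", encoding="utf-8")`. If a single field is longer than the module's default limit, `csv.field_size_limit()` can be called with a larger value before reading.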
published: 2020-05-11
 
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries. <b>Additional Resources:</b> - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, you can fill out the <a href="https://forms.gle/oaUWRSSCkqKxyY5T7"><b>Archer User Information Form</b></a>. - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the <a href="https://forms.gle/6eA2yJUGFMtj5swY7"><b>Archer User Feedback Form</b></a>. - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to <a href="https://groups.webservices.illinois.edu/subscribe/123172"><b>subscribe to Archer Users Group</b></a>. <b>Citation Guidelines:</b> 1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [codebook]. 
Champaign, IL: University of Illinois. doi:10.13012/B2IDB-5649852_V1 2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [database]. Champaign, IL: University of Illinois. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V1
keywords: Cline Center; Cline Center for Advanced Social Research; political; social; political science; Global News Index; Archer; news; mass communication; journalism;
published: 2020-05-04
 
The Cline Center Historical Phoenix Event Data covers the period 1945-2019 and includes 8.2 million events extracted from 21.2 million news stories. This data was produced using the state-of-the-art PETRARCH-2 software to analyze content from the New York Times (1945-2018), the BBC Monitoring's Summary of World Broadcasts (1979-2019), the Wall Street Journal (1945-2005), and the Central Intelligence Agency’s Foreign Broadcast Information Service (1995-2004). It documents the agents, locations, and issues at stake in a wide variety of conflict, cooperation and communicative events in the Conflict and Mediation Event Observations (CAMEO) ontology. The Cline Center produced these data with the generous support of Linowes Fellow and Faculty Affiliate Prof. Dov Cohen and help from our academic and private sector collaborators in the Open Event Data Alliance (OEDA). For details on the CAMEO framework, see: Schrodt, Philip A., Omür Yilmaz, Deborah J. Gerner, and Dennis Hermreck. "The CAMEO (conflict and mediation event observations) actor coding framework." In 2008 Annual Meeting of the International Studies Association. 2008. http://eventdata.parusanalytics.com/papers.dir/APSA.2005.pdf Gerner, D.J., Schrodt, P.A. and Yilmaz, O., 2012. Conflict and mediation event observations (CAMEO) Codebook. http://eventdata.parusanalytics.com/cameo.dir/CAMEO.Ethnic.Groups.zip For more information about PETRARCH and OEDA, see: http://openeventdata.org/
keywords: OEDA; Open Event Data Alliance (OEDA); Cline Center; Cline Center for Advanced Social Research; civil unrest; petrarch; phoenix event data; violence; protest; political; conflict; political science
published: 2020-03-08
 
This dataset inventories the availability of entrepreneurship and small business education, including co-curricular opportunities, in two-year colleges in the United States. The inventory provides a snapshot of activities at more than 1,650 public, not-for-profit, and private for-profit institutions, in 2014.
keywords: Small business education; entrepreneurship education; Kauffman Entrepreneurship Education Inventory; Ewing Marion Kauffman Foundation; Paul J. Magelli
published: 2020-03-03
 
This second version (V2) provides additional data cleaning compared to V1, additional data collection (mainly to include data from 2019), and more metadata for nodes. Please see NETWORKv2README.txt for more detail.
keywords: citations; retraction; network analysis; Web of Science; Google Scholar; indirect citation
published: 2020-02-23
 
Citation context annotation for papers citing retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) <a href="https://doi.org/10.1016/S0012-3692(08)60339-6">https://doi.org/10.1016/S0012-3692(08)60339-6</a> ). This is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. "Continued Citation of a Fraudulent Clinical Trial Report, Eleven Years after it was retracted for Falsifying Data" [R&R under review with Scientometrics]. Overall we found 148 citations to the retracted paper from 2006 to 2019. However, this dataset does not include the annotations described in the 2015 paper by Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomized controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26. In this dataset 70 new and newly found citations are listed: 66 annotated citations and 4 pending citations (non-annotated since we don't have full-text). "New citations" refer to articles published from March 25, 2014 to 2019, found in Google Scholar and Web of Science. "Newly found citations" refer to articles published 2006-2013, found in Google Scholar and Web of Science, but not previously covered in Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomised controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26. NOTES: This is Unicode data. Some publication titles & quotes are in non-Latin characters and they may contain commas, quotation marks, etc. 
FILES/FILE FORMATS Same data in two formats: 2006-2019-new-citation-contexts-to-Matsuyama.csv - Unicode CSV (preservation format only) 2006-2019-new-citation-contexts-to-Matsuyama.xlsx - Excel workbook (preferred format) ROW EXPLANATIONS 70 rows of data - one citing publication per row COLUMN HEADER EXPLANATIONS Note - processing notes Annotation pending - Y or blank Year Published - publication year ID - ID corresponding to the network analysis. See Ye, Di; Schneider, Jodi (2019): Network of First and Second-generation citations to Matsuyama 2005 from Google Scholar and Web of Science. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-1403534_V2">https://doi.org/10.13012/B2IDB-1403534_V2</a> Title - item title (some have non-Latin characters, commas, etc.) Official Translated Title - item title in English, as listed in the publication Machine Translated Title - item title in English, translated by Google Scholar Language - publication language Type - publication type (e.g., bachelor's thesis, blog post, book chapter, clinical guidelines, Cochrane Review, consumer-oriented evidence summary, continuing education journal article, journal article, letter to the editor, magazine article, Master's thesis, patent, Ph.D. thesis, textbook chapter, training module) Book title for book chapters - Only for a book chapter - the book title University for theses - for bachelor's thesis, Master's thesis, Ph.D. thesis - the associated university Pre/Post Retraction - "Pre" for 2006-2008 (means published before the October 2008 retraction notice or in the 2 months afterwards); "Post" for 2009-2019 (considered post-retraction for our analysis) Identifier where relevant - ISBN, Patent ID, PMID (only for items we considered hard to find/identify, e.g. 
those without a DOI-based URL)
URL where available - URL, ideally a DOI-based URL
Reference number/style - reference
Only in bibliography - Y or blank
Acknowledged - If annotated, Y, Not relevant as retraction not published yet, or N (blank otherwise)
Positive / "Poor Research" (Negative) - P for positive, N for negative if annotated; blank otherwise
Human translated quotations - Y or blank; blank means Google scholar was used to translate quotations for Translated Quotation X
Specific/in passing (overall) - Specific if any of the 5 quotations are specific [aggregates Specific / In Passing (Quotation X)]
Quotation 1 - First quotation (or blank) (includes non-Latin characters in some cases)
Translated Quotation 1 - English translation of "Quotation 1" (or blank)
Specific / In Passing (Quotation 1) - Specific if "Quotation 1" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 1) - Methods; Results; or Methods and Results - blank if "Quotation 1" not specific, no associated quotation, or not yet annotated
Quotation 2 - Second quotation (includes non-Latin characters in some cases)
Translated Quotation 2 - English translation of "Quotation 2"
Specific / In Passing (Quotation 2) - Specific if "Quotation 2" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 2) - Methods; Results; or Methods and Results - blank if "Quotation 2" not specific, no associated quotation, or not yet annotated
Quotation 3 - Third quotation (includes non-Latin characters in some cases)
Translated Quotation 3 - English translation of "Quotation 3"
Specific / In Passing (Quotation 3) - Specific if "Quotation 3" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 3) - Methods; Results; or Methods and Results - blank if "Quotation 3" not specific, no associated quotation, or not yet annotated
Quotation 4 - Fourth quotation (includes non-Latin characters in some cases)
Translated Quotation 4 - English translation of "Quotation 4"
Specific / In Passing (Quotation 4) - Specific if "Quotation 4" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 4) - Methods; Results; or Methods and Results - blank if "Quotation 4" not specific, no associated quotation, or not yet annotated
Quotation 5 - Fifth quotation (includes non-Latin characters in some cases)
Translated Quotation 5 - English translation of "Quotation 5"
Specific / In Passing (Quotation 5) - Specific if "Quotation 5" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 5) - Methods; Results; or Methods and Results - blank if "Quotation 5" not specific, no associated quotation, or not yet annotated
Further Notes - additional notes
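The aggregation rule for "Specific/in passing (overall)" can be illustrated with a minimal Python sketch. It assumes each annotated citation is available as a dictionary keyed by the column names listed above (for example, a row from `csv.DictReader`). The function name `overall_label` is hypothetical, and the "In Passing" fallback for rows whose annotated quotations are all non-specific is an assumption; the description only states when the overall value is "Specific".

```python
# Column names for the five per-quotation labels, as listed in the description.
QUOTATION_COLUMNS = [
    f"Specific / In Passing (Quotation {i})" for i in range(1, 6)
]


def overall_label(row):
    """Aggregate the five per-quotation labels into one overall label.

    Returns "Specific" if any quotation is labelled Specific.
    Assumption (not stated in the description): returns "In Passing" when
    at least one quotation is annotated but none is Specific, and an empty
    string when nothing is annotated.
    """
    labels = [(row.get(col) or "").strip() for col in QUOTATION_COLUMNS]
    if any(label.lower() == "specific" for label in labels):
        return "Specific"
    if any(labels):
        return "In Passing"  # assumed fallback value
    return ""
```

Comparing the computed value with the "Specific/in passing (overall)" column in the file is one way to check that the per-quotation annotations and the overall label agree.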
keywords: citation context annotation; retraction; diffusion of retraction
published: 2020-02-12
 
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 to the present, with awards continuing through the end of the second XSEDE award in 2021. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2019-12-22
 
Dataset providing calculation of a Competition Index (CI) for Late Pleistocene carnivore guilds in Laos and Vietnam and their relationship to humans. Prey mass spectra, prey focus masses, and prey class raw data can be used to calculate the CI following Hemmer (2004). Mass estimates were calculated for each species following Van Valkenburgh (1990). Full citations to methodological papers are included as relationships with other resources.
keywords: competition; Southeast Asia; carnivores; humans
published: 2019-10-16
 
Human annotations of randomly selected judged documents from the AP 88-89, Robust 2004, WT10g, and GOV2 TREC collections. Seven annotators were asked to read documents in their entirety and then select up to ten terms they felt best represented the main topic(s) of the document. Terms were chosen from among a set sampled from the document in question and from related documents.
keywords: TREC; information retrieval; document topicality; document description
published: 2019-11-12
 
We are sharing the tweet IDs of four social movements: #BlackLivesMatter, #WhiteLivesMatter, #AllLivesMatter, and #BlueLivesMatter. The tweets were collected between May 1, 2015 and May 30, 2017. We restricted the location to the United States and extracted only the original tweets, excluding retweets. Recommended citations for the data: Rezapour, R. (2019). Data for: How do Moral Values Differ in Tweets on Social Movements?. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9614170_V1 and Rezapour, R., Ferronato, P., and Diesner, J. (2019). How do moral values differ in tweets on social movements?. In 2019 Computer Supported Cooperative Work and Social Computing Companion Publication (CSCW’19 Companion), Austin, TX.
keywords: Twitter; social movements; black lives matter; blue lives matter; all lives matter; white lives matter
published: 2019-09-17
 
Trained models for multi-task multi-dataset learning for text classification as well as sequence tagging in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridictality. Sequence tagging tasks include POS, NER, Chunking, and SuperSenseTagging. Models were trained using: <a href="https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification_tagging.py">https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification_tagging.py</a> See <a href="https://github.com/socialmediaie/SocialMediaIE">https://github.com/socialmediaie/SocialMediaIE</a> and <a href="https://socialmediaie.github.io">https://socialmediaie.github.io</a> for details. If you are using this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning; classification; sequence tagging