Version  DOI                        Comment  Publication Date
1        10.13012/B2IDB-9087546_V1           2018-04-19
2.64 KB File
717 MB File

Contact the Research Data Service for help interpreting this log.

update: {"nested_updated_at"=>[Thu, 18 Apr 2024 18:23:34.356021000 UTC +00:00, Thu, 18 Apr 2024 18:23:34.418889000 UTC +00:00]} 2024-04-18T18:23:34Z
update: {"nested_updated_at"=>[Mon, 08 Aug 2022 16:45:04.247294000 UTC +00:00, Thu, 18 Apr 2024 18:23:34.356021000 UTC +00:00]} 2024-04-18T18:23:34Z
update: {"nested_updated_at"=>[nil, Mon, 08 Aug 2022 16:45:04.247294000 UTC +00:00]} 2024-01-03T18:23:19Z
update: {"description"=>["Prepared by Vetle Torvik 2018-04-15\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: <a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. 
EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n", "Prepared by Vetle Torvik 2018-04-15\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and last names of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. 
doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: <a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n"]} 2018-12-05T21:47:04Z
update: {"keywords"=>["Androgyny; Bibliometrics; Data mining; Earch engine; Gender; Semantic orientation; Temporal prediction; Textual markers", "Androgyny; Bibliometrics; Data mining; Search engine; Gender; Semantic orientation; Temporal prediction; Textual markers"]} 2018-12-04T17:56:06Z
update: {"description"=>["Prepared by Vetle Torvik April 5, 2018\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: <a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. 
EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n", "Prepared by Vetle Torvik 2018-04-15\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. 
doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: <a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n"]} 2018-04-23T19:33:56Z
update: {"description"=>["Prepared by Vetle Torvik April 5, 2018\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: < a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine/</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. 
EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n", "Prepared by Vetle Torvik April 5, 2018\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. 
doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: <a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n"]} 2018-04-23T19:33:07Z
update: {"description"=>["prepared by Vetle Torvik April 5, 2018\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\nHow was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\nTorvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927\r\n\r\nSmith BN, Singh M, Torvik VI (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 199-208). JCDL '13. Indianapolis, IN, USA.\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada\r\n\r\nSexMachine 0.1.1: https://pypi.python.org/pypi/SexMachine/\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\nThe code and back-end data is periodically updated and made available for query here\r\nhttp://abel.ischool.illinois.edu\r\n\r\nWhat is the format of the dataset?\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n2. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n3. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n4. lastname: used as input for Ethnea+Genni\r\n5. 
firstname: used as input for Ethnea+Genni\r\n6. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n7. Genni: predicted gender; 'F', 'M', or '-'\r\n8. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n9. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n", "Prepared by Vetle Torvik April 5, 2018\r\n\r\nThe dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.\r\n\r\n&bull; How was the dataset created?\r\nFirst and lastnames of authors in the Author-ity 2009 dataset was processed through several tools to predict ethnicities and gender, including\r\n\r\nEthnea+Genni as described in:\r\n\r\n<i>Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA.\r\nhttp://hdl.handle.net/2142/88927</i>\r\n\r\n<i>Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720</i>\r\n\r\nEthnicSeer: http://singularity.ist.psu.edu/ethnicity\r\n<i>Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. 
Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada</i>\r\n\r\nSexMachine 0.1.1: < a href=\"https://pypi.python.org/pypi/SexMachine/\">https://pypi.org/project/SexMachine/</a>\r\n\r\nFirst names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.\r\n\r\n&bull; The code and back-end data is periodically updated and made available for query at <a href =\"http://abel.ischool.illinois.edu\">Torvik Research Group</a>\r\n\r\n&bull; What is the format of the dataset?\r\nThe dataset contains 9,300,182 rows and 10 columns\r\n1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)\r\n2. name: full name used as input to EthnicSeer)\r\n3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX\r\n4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction\r\n5. lastname: used as input for Ethnea+Genni\r\n6. firstname: used as input for Ethnea+Genni\r\n7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictons), or TOOSHORT (if both first and last name are too short)\r\n8. Genni: predicted gender; 'F', 'M', or '-'\r\n9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male)\r\n10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'\r\n"], "keywords"=>["", "Androgyny; Bibliometrics; Data mining; Earch engine; Gender; Semantic orientation; Temporal prediction; Textual markers"], "version_comment"=>[nil, ""], "subject"=>["", "Social Sciences"]} 2018-04-23T19:30:14Z
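The most recent description recorded in this log specifies a single tab-delimited ASCII file with 9,300,182 rows and ten columns in a fixed order. The Python sketch below shows one way to stream that file row by row, assuming the column order given in the description. Several points are assumptions, not statements from the dataset authors: whether the file has a header row (left to the caller), that fields contain no CSV-style quoting, and that both parts of `auid` (PMID_authorposition) are integers. The helper names `read_records`, `split_auid` and `ethnea_components` are hypothetical and not part of the dataset or any published tool.

```python
import csv
from typing import Iterable, Iterator, List, Tuple

# Column order follows items 1-10 of the dataset description.
COLUMNS = [
    "auid", "name", "EthnicSeer", "prop", "lastname",
    "firstname", "Ethnea", "Genni", "SexMac", "SSNgender",
]


def read_records(lines: Iterable[str], has_header: bool = False) -> Iterator[dict]:
    """Stream rows of the tab-delimited file as dicts keyed by COLUMNS.

    Assumption: the description does not say whether a header row exists,
    so the caller chooses via has_header. Rows are yielded one at a time,
    so the ~717 MB file is never held in memory all at once.
    """
    # QUOTE_NONE: treat any quote characters inside names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # discard the header line
    for row in reader:
        if not row:
            continue  # skip blank lines, e.g. a trailing newline
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row!r}")
        record = dict(zip(COLUMNS, row))
        # prop is the EthnicSeer confidence, a decimal between 0 and 1.
        record["prop"] = float(record["prop"])
        yield record


def split_auid(auid: str) -> Tuple[int, int]:
    """Split an auid of the form PMID_authorposition into its two parts.

    Assumption: both parts are plain integers.
    """
    pmid, position = auid.rsplit("_", 1)
    return int(pmid), int(position)


def ethnea_components(label: str) -> List[str]:
    """Return the one or two ethnicities in an Ethnea label.

    Two-ethnicity labels are hyphenated (e.g. SLAV-ENGLISH); the special
    values UNKNOWN and TOOSHORT carry no ethnicity and give an empty list.
    """
    if label in ("UNKNOWN", "TOOSHORT"):
        return []
    return label.split("-")
```

To use it on the downloaded file, open it with `open(path, encoding="ascii", newline="")` (ASCII per the description; `path` is wherever the file was saved) and pass the handle to `read_records`; `collections.Counter(r["Ethnea"] for r in records)` would then tally the Ethnea labels. A generator is used rather than loading everything into a list because of the file's size.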