Illinois Data Bank Dataset Search Results
published:
2025-05-21
Mostame, Parham; Wirsich, Jonathan; Alderson, Thomas H.; Ridley, Ben; Giraud, Anne-Lise; Carmichael, David W.; Vulliemoz, Serge; Guye, Maxime; Lemieux, Louis; Sadaghiani, Sepideh
(2025)
___________________________________SUMMARY
This dataset contains derivative data from concurrent fMRI and scalp EEG recordings used in:
Mostame Parham, Wirsich Jonathan, Alderson Thomas H, Ridley Ben, Giraud Anne-Lise, Carmichael David W, Vulliemoz Serge, Guye Maxime, Lemieux Louis, Sadaghiani Sepideh (2024) A multiplex of connectome trajectories enables several connectivity patterns in parallel. eLife 13:RP98777. doi: https://doi.org/10.7554/eLife.98777.3
___________________________________RAW DATA
The data were originally published and described as part of other studies (Morillon et al., 2010; Sadaghiani et al., 2012). Briefly, 10 minutes of eyes-closed resting state were analyzed from 26 healthy subjects (average age = 24.39 years; range: 18-31 years; 8 females) with no history of psychiatric or neurological disorders. Each participant gave informed consent, and the study was approved by the local Research Ethics Committee (CPP Ile de France III). fMRI was acquired using a 3T Siemens Tim Trio scanner with a GE-EPI pulse sequence (TR = 2 s; TE = 50 ms; 40 slices; 300 volumes; field of view: 192×192; voxel size: 3×3×3 mm³). Structural T1-weighted scans were acquired using the MPRAGE pulse sequence (176 slices; field of view: 256×256; voxel size: 1×1×1 mm³). 62-channel scalp EEG (Easycap, with an additional EOG and an ECG channel) was recorded using an MR-compatible amplifier (BrainAmp MR, Brain Products) at a 5 kHz sampling rate.
___________________________________PREPROCESSING
fMRI and EEG data were preprocessed with standard steps, as explained in detail elsewhere (Wirsich et al., 2020). In brief, fMRI underwent slice-time correction and spatial realignment (SPM12, http://www.fil.ion.ucl.ac.uk/spm/software/spm12). Structural T1-weighted images were processed using Freesurfer (recon-all, v6.0.0, https://surfer.nmr.mgh.harvard.edu/) to perform non-uniformity and intensity correction, skull stripping, and gray/white matter segmentation. The cortex was parcellated into the 68 regions of the Desikan-Killiany atlas (Desikan et al., 2006). This atlas was chosen because, as an anatomical parcellation, it avoids biases toward either functional data modality. The T1 images of each subject and the Desikan-Killiany atlas were co-registered to the fMRI images (FSL-FLIRT 6.0.2, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki). We extracted signals of no interest, such as the average cerebrospinal fluid (CSF) and white matter signals from manually defined regions of interest (ROI, 5 mm sphere, Marsbar Toolbox 0.44, http://marsbar.sourceforge.net), and regressed them out of the BOLD timeseries along with the 6 motion parameters (3 rotations, 3 translations) and the global gray matter signal (Wirsich et al., 2017a). We then band-pass filtered the timeseries at 0.009–0.08 Hz. The average timeseries of each region was then used to calculate connectivity.
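The nuisance regression and band-pass step above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' SPM/Marsbar pipeline: the ideal FFT-based band-pass filter, the hypothetical function names `clean_bold` and `static_connectivity`, and the use of Pearson correlation for connectivity are choices made here; only the 0.009–0.08 Hz band and TR = 2 s come from the description.

```python
import numpy as np

TR = 2.0  # seconds, from the acquisition parameters (TR = 2 s)


def clean_bold(bold, nuisance, tr=TR, band=(0.009, 0.08)):
    """Regress nuisance signals out of regional BOLD series, then band-pass.

    bold:     (timepoints, regions) array of regional BOLD time series
    nuisance: (timepoints, k) array of confound regressors, e.g. CSF and
              white-matter signals, 6 motion parameters, global gray matter
    """
    n_t = bold.shape[0]
    # Design matrix: intercept plus nuisance regressors.
    X = np.column_stack([np.ones(n_t), nuisance])
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
    resid = bold - X @ beta

    # Ideal band-pass in the frequency domain (an assumption: the original
    # filter design is not specified). Bins outside the band are zeroed.
    freqs = np.fft.rfftfreq(n_t, d=tr)
    spectrum = np.fft.rfft(resid, axis=0)
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0
    return np.fft.irfft(spectrum, n=n_t, axis=0)


def static_connectivity(ts):
    """Pearson correlation between all pairs of regional time series."""
    return np.corrcoef(ts, rowvar=False)
```

With 300 volumes at TR = 2 s the frequency resolution is 1/600 Hz, so the retained band spans roughly bins 6 to 48. A smoother filter (e.g. a zero-phase Butterworth) would avoid the ringing an ideal filter can introduce.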
EEG underwent gradient and cardioballistic artifact removal using Brain Vision Analyzer software (Allen et al., 1998, 2000) and was down-sampled to 250 Hz. EEG was projected into source space using Tikhonov-regularized minimum-norm estimation in Brainstorm software (Baillet et al., 2001; Tadel et al., 2011). Source activity was then averaged within each of the 68 regions of the Desikan-Killiany atlas. Band-limited EEG signals in each canonical frequency band and every atlas region were then used to calculate frequency-specific connectome dynamics. For the source-orthogonalized version of the data (folder 2 below), the MEG-ROI-nets toolbox in the OHBA Software Library (OSL; https://ohba-analysis.github.io/osl-docs/) was used to minimize source leakage in the band-limited, source-localized EEG data (Colclough et al., 2015).
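A minimal NumPy sketch of how band-limited regional EEG signals (time × regions, sampled at 250 Hz) can be turned into amplitude- and phase-coupling connectomes. The specific measures shown (envelope correlation and the phase-locking value), the FFT-based Hilbert transform, and the sliding-window parameters are assumptions for illustration; the measures actually used for this dataset are described in the eLife paper cited above.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal along axis 0 (time), as in scipy.signal.hilbert."""
    n = x.shape[0]
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.fft.ifft(np.fft.fft(x, axis=0) * h[:, None], axis=0)


def amplitude_coupling(band_ts):
    """Correlation of amplitude envelopes between regions (AEC-style)."""
    envelope = np.abs(analytic_signal(band_ts))
    return np.corrcoef(envelope, rowvar=False)


def phase_coupling(band_ts):
    """Phase-locking value |mean_t exp(i(phi_a - phi_b))| for every region pair."""
    z = np.exp(1j * np.angle(analytic_signal(band_ts)))  # unit phasors
    return np.abs(z.T @ z.conj()) / z.shape[0]


def windowed(measure, band_ts, win, step):
    """Apply a connectivity measure in sliding windows -> ROI x ROI x windows."""
    starts = range(0, band_ts.shape[0] - win + 1, step)
    return np.stack([measure(band_ts[s:s + win]) for s in starts], axis=-1)
```

For example, `windowed(phase_coupling, alpha_ts, win=500, step=500)` would produce one 2-s connectome per fMRI TR; aligning windows to TRs is a hypothetical choice here, not a documented property of the dataset.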
___________________________________FOLDER STRUCTURE
The dataset includes five separate folders as described below:
1) EEGfMRI_dFC folder: connectome dynamics of scalp data
This folder contains 26 MATLAB (.mat) files, one per subject. Each `.mat` file holds a structure with fields `A`, `B`, and `C`, corresponding to fMRI, amplitude-coupling, and phase-coupling connectome dynamics, respectively. The fMRI data are a 3-D ROI × ROI × timepoints array. The EEG fields are each stored as a 1×5 cell array (Delta, Theta, Alpha, Beta, Gamma), each cell containing a 3-D ROI × ROI × timepoints matrix.
2) EEGfMRI_dFC_SourceOrtho folder: connectome dynamics of source-orthogonalized scalp data
Same format as above, except that EEG connectome dynamics are derived from source-orthogonalized signals. The MEG-ROI-nets toolbox in the OHBA Software Library (OSL; https://ohba-analysis.github.io/osl-docs/) was used to minimize source leakage in the band-limited, source-localized EEG data (Colclough et al., 2015).
3-5) Cross-modal Recurrence Plot (CRP) data
Each subject has an Excel file with five sheets (Delta through Gamma), corresponding to the five frequency bands. Each sheet contains a 2-D CRP matrix (rows = fMRI timepoints, columns = band-limited EEG timepoints).
- Scalp EEG–fMRI CRPs (CRP_EEGfMRI and CRP_EEGfMRI_SourceOrtho folders): two versions (without and with source-orthogonalization), each containing 52 Excel files that cover amplitude- and phase-coupling CRPs.
- Intracranial EEG–fMRI CRPs (CRP_iEEGfMRI folder): one version with 27 Excel files covering three cases: amplitude coupling, HRF-convolved amplitude coupling, and phase coupling.
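The CRP sheets pair each fMRI timepoint (row) with each band-limited EEG timepoint (column). One plausible way to build such a matrix from two ROI × ROI × timepoints arrays, like those in the dFC folders, is to correlate the vectorized upper triangles of the connectomes. The sketch below does that with NumPy; the function names are hypothetical and the similarity measure the authors actually used may differ.

```python
import numpy as np


def edge_series(dfc):
    """ROI x ROI x T connectome dynamics -> T x E matrix of upper-triangle edges."""
    rows, cols = np.triu_indices(dfc.shape[0], k=1)
    return dfc[rows, cols, :].T


def cross_recurrence(dfc_fmri, dfc_eeg):
    """Pearson correlation of every fMRI connectome with every EEG connectome.

    Returns an (fMRI timepoints x EEG timepoints) matrix, the same layout
    as the CRP sheets (rows = fMRI, columns = EEG).
    """
    a = edge_series(dfc_fmri)
    b = edge_series(dfc_eeg)
    # z-score each connectome across its edges; the mean product is then r.
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return a @ b.T / a.shape[1]
```

In practice the subject arrays might be read with `scipy.io.loadmat`, and all five band sheets of a CRP workbook can be read at once with `pandas.read_excel(path, sheet_name=None)`, which returns a dictionary of DataFrames keyed by sheet name.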
keywords:
Connectome; fMRI-EEG; Intracranial; Multiplex
published:
2025-01-31
Punyasena, Surangi W.; Romero, Ingrid; Urban, Michael A.
(2025)
Title: Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae
Authors: Surangi W. Punyasena, Ingrid Romero, Michael A. Urban
Subject: Biological sciences
Keywords: Malvaceae; superresolution microscopy; Zeiss; Bombacacidites; Neotropics; CZI
Funder: NSF-DBI Advances in Bioinformatics (NSF-DBI-1262561)
Corresponding Creator: Surangi W. Punyasena
This dataset includes a total of 430 images of extant specimens of the Malvaceae, with a focus on species that are or have been included within the subfamily Bombacoideae. There are 27 genera included within 26 folders. Each folder is named by genus and contains all the images that correspond to that genus. Note that the genus _Matisia_ is included with _Quararibea_ as detailed in the metadata READ ME file.
The specimens imaged are from the palynological collections of the Swedish Museum of Natural History and Smithsonian Tropical Research Institute, and herbarium specimens from the Smithsonian Herbarium National Museum.
The optical superresolution microscopy images were taken using a Zeiss LSM 880 with Airyscan at 630X magnification (63x/NA 1.4 oil DIC). The images are in the original CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or in ImageJ/FIJI. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/en/products/software/zeiss-zen/czi-image-file-format.html]
Image metadata and file organization are described in the CSV file "METADATA_Malvaceae_Bombacoideae_modern-species.csv". The column headings are:
Folder - The folder in which the image file is found
Subfamily - The current subfamily determination based on the literature. Note that _Pentaplaris_ and _Septotheca_ have not been assigned a subfamily.
Genus - Genus name
Species - Species name
Accepted name - Accepted species name, updated from the literature
Slide name - Species name as denoted on the herbarium slide
Collection - Source of the herbarium slide: Swedish Museum of Natural History or the Smithsonian Tropical Research Institute
File name - File name using the species name denoted on the herbarium slide
Slide ID/Herbarium ID - Specimen collection number
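A small standard-library sketch for checking the genus folders against the metadata CSV. It assumes the CSV header uses the column names exactly as listed above (`Folder`, `File name`, ...), that the file is UTF-8 encoded, and that the `File name` values match the names on disk (whether they include the `.czi` extension is not stated). The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path


def images_by_folder(csv_text):
    """Map each genus folder to the image file names listed for it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Folder"].strip()].append(row["File name"].strip())
    return dict(groups)


def missing_files(root, csv_text):
    """List (folder, file name) pairs from the metadata not found under root."""
    root = Path(root)
    return [
        (folder, name)
        for folder, names in images_by_folder(csv_text).items()
        for name in names
        if not (root / folder / name).exists()
    ]
```

Typical use would be `missing_files("path/to/dataset", Path("METADATA_Malvaceae_Bombacoideae_modern-species.csv").read_text(encoding="utf-8-sig"))`; the `utf-8-sig` codec also tolerates a byte-order mark if one is present.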
Please cite this dataset as:
Punyasena, Surangi W.; Romero, Ingrid; Urban, Michael A. (2025): Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2968712_V1
keywords:
Malvaceae; superresolution microscopy; Zeiss; Bombacoideae; Neotropics; CZI
published:
2020-05-31
Zhang, Chuanyi; El-Kebir, Mohammed; Ochoa, Idoia
(2020)
This repository includes a simulated dataset and related scripts used for the paper "Moss: Accurate Single-Nucleotide Variant Calling from Multiple Bulk DNA Tumor Samples".
keywords:
Somatic Mutations; Bulk DNA Sequencing; Cancer Genomics
published:
2020-02-23
Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi
(2020)
Citation context annotation for papers citing retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) <a href="https://doi.org/10.1016/S0012-3692(08)60339-6">https://doi.org/10.1016/S0012-3692(08)60339-6</a>). This is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. "Continued Citation of a Fraudulent Clinical Trial Report, Eleven Years after it was retracted for Falsifying Data" [R&R under review with Scientometrics].
Overall, we found 148 citations to the retracted paper from 2006 to 2019. However, this dataset does not include the annotations described in the 2015 paper by Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill, "Persistent citation of the only published randomised controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26.
In this dataset, 70 new and newly found citations are listed: 66 annotated citations and 4 pending citations (not annotated because we do not have the full text).
"New citations" refer to articles published from March 25, 2014 to 2019, found in Google Scholar and Web of Science.
"Newly found citations" refer to articles published 2006-2013, found in Google Scholar and Web of Science, but not previously covered in Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomised controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26.
NOTES:
This is Unicode data. Some publication titles & quotes are in non-Latin characters and they may contain commas, quotation marks, etc.
FILES/FILE FORMATS
Same data in two formats:
2006-2019-new-citation-contexts-to-Matsuyama.csv - Unicode CSV (preservation format only)
2006-2019-new-citation-contexts-to-Matsuyama.xlsx - Excel workbook (preferred format)
ROW EXPLANATIONS
70 rows of data - one citing publication per row
COLUMN HEADER EXPLANATIONS
Note - processing notes
Annotation pending - Y or blank
Year Published - publication year
ID - ID corresponding to the network analysis. See Ye, Di; Schneider, Jodi (2019): Network of First and Second-generation citations to Matsuyama 2005 from Google Scholar and Web of Science. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-1403534_V2">https://doi.org/10.13012/B2IDB-1403534_V2</a>
Title - item title (some have non-Latin characters, commas, etc.)
Official Translated Title - item title in English, as listed in the publication
Machine Translated Title - item title in English, translated by Google Scholar
Language - publication language
Type - publication type (e.g., bachelor's thesis, blog post, book chapter, clinical guidelines, Cochrane Review, consumer-oriented evidence summary, continuing education journal article, journal article, letter to the editor, magazine article, Master's thesis, patent, Ph.D. thesis, textbook chapter, training module)
Book title for book chapters - Only for a book chapter - the book title
University for theses - for bachelor's thesis, Master's thesis, Ph.D. thesis - the associated university
Pre/Post Retraction - "Pre" for 2006-2008 (i.e., published before the October 2008 retraction notice or within the 2 months afterward); "Post" for 2009-2019 (considered post-retraction for our analysis)
Identifier where relevant - ISBN, Patent ID, PMID (only for items we considered hard to find/identify, e.g. those without a DOI-based URL)
URL where available - URL, ideally a DOI-based URL
Reference number/style - reference
Only in bibliography - Y or blank
Acknowledged - If annotated: Y, N, or "Not relevant as retraction not published yet"; blank otherwise
Positive / "Poor Research" (Negative) - P for positive, N for negative if annotated; blank otherwise
Human translated quotations - Y or blank; blank means Google Scholar was used to translate quotations for Translated Quotation X
Specific/in passing (overall) - Specific if any of the 5 quotations are specific [aggregates Specific / In Passing (Quotation X)]
Quotation 1 - First quotation (or blank) (includes non-Latin characters in some cases)
Translated Quotation 1 - English translation of "Quotation 1" (or blank)
Specific / In Passing (Quotation 1) - Specific if "Quotation 1" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 1) - Methods; Results; or Methods and Results - blank if "Quotation 1" not specific, no associated quotation, or not yet annotated
Quotation 2 - Second quotation (includes non-Latin characters in some cases)
Translated Quotation 2 - English translation of "Quotation 2"
Specific / In Passing (Quotation 2) - Specific if "Quotation 2" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 2) - Methods; Results; or Methods and Results - blank if "Quotation 2" not specific, no associated quotation, or not yet annotated
Quotation 3 - Third quotation (includes non-Latin characters in some cases)
Translated Quotation 3 - English translation of "Quotation 3"
Specific / In Passing (Quotation 3) - Specific if "Quotation 3" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 3) - Methods; Results; or Methods and Results - blank if "Quotation 3" not specific, no associated quotation, or not yet annotated
Quotation 4 - Fourth quotation (includes non-Latin characters in some cases)
Translated Quotation 4 - English translation of "Quotation 4"
Specific / In Passing (Quotation 4) - Specific if "Quotation 4" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 4) - Methods; Results; or Methods and Results - blank if "Quotation 4" not specific, no associated quotation, or not yet annotated
Quotation 5 - Fifth quotation (includes non-Latin characters in some cases)
Translated Quotation 5 - English translation of "Quotation 5"
Specific / In Passing (Quotation 5) - Specific if "Quotation 5" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 5) - Methods; Results; or Methods and Results - blank if "Quotation 5" not specific, no associated quotation, or not yet annotated
Further Notes - additional notes
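A standard-library sketch that applies the Pre/Post rule and the overall Specific/In Passing aggregation described above to the CSV export. It assumes the CSV headers match the column names listed above and that per-quotation cells contain "Specific" or "In Passing"; the function names and the "In Passing" label are assumptions, not documented values.

```python
import csv
import io
from collections import Counter


def pre_or_post(year):
    """Apply the dataset's rule: 2006-2008 -> "Pre", 2009-2019 -> "Post"."""
    year = int(year)
    if 2006 <= year <= 2008:
        return "Pre"
    if 2009 <= year <= 2019:
        return "Post"
    raise ValueError(f"year {year} is outside 2006-2019")


def overall_specific(row):
    """Recompute the overall label: Specific if any of Quotations 1-5 is Specific."""
    labels = [row.get(f"Specific / In Passing (Quotation {i})", "").strip().lower()
              for i in range(1, 6)]
    if "specific" in labels:
        return "Specific"
    return "In Passing" if any(labels) else ""


def summarize(csv_text):
    """Count annotated citations by (Pre/Post, overall Specific/In Passing)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (pre_or_post(r["Year Published"]), overall_specific(r))
        for r in rows
        if r["Annotation pending"].strip() != "Y"
    )
```

Because titles and quotations contain non-Latin characters, commas and quotation marks, the file should be opened with an explicit Unicode encoding and `newline=""` so the `csv` module handles quoted fields correctly; the Excel workbook remains the preferred format.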
keywords:
citation context annotation, retraction, diffusion of retraction
published:
2023-03-28
Hsiao, Tzu-Kun; Torvik, Vetle
(2023)
Sentences and citation contexts identified from the PubMed Central open access articles
----------------------------------------------------------------------
The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019.
The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.
<b>Files</b>:
• A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A.
• B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B.
• C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C.
• D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D.
• E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E.
• F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F.
• G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G.
• H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H.
• I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I.
• J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J.
• K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K.
• L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L.
• M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M.
• N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N.
• O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O.
• P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1).
• P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2).
• Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q.
• R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R.
• S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S.
• T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T.
• UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V.
• W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W.
• XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z.
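Because the files are split by the first letter of the journal title, a helper like the hypothetical `intxtcit_file` below can pick the file(s) to open for a given journal. How P titles are divided between the two parts, and how titles beginning with digits or non-Latin characters are filed, is not documented here, so the sketch returns both P files and rejects those other cases. Treating the first letter case-insensitively is also an assumption.

```python
def intxtcit_file(journal_title):
    """Return the *_journal_IntxtCit.tsv file name(s) for a journal title."""
    first = journal_title.strip()[:1].upper()
    if not (first.isascii() and first.isalpha()):
        raise ValueError(f"no documented file for {journal_title!r}")
    if first == "P":  # P is split into two parts; the split rule is not documented
        return ["P_p1_journal_IntxtCit.tsv", "P_p2_journal_IntxtCit.tsv"]
    for group in ("UV", "XYZ"):  # letters that share a single file
        if first in group:
            return [f"{group}_journal_IntxtCit.tsv"]
    return [f"{first}_journal_IntxtCit.tsv"]
```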
Each row in the file is a sentence/citation context and contains the following columns:
• pmcid: PMCID of the article
• pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
• location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
• IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
• sentence_id: The ID of the citation context/sentence in the article component
• total_sentences: The number of sentences in the article component.
• intxt_id: The ID of the citation.
• intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
• intxt_pmid_source: The sources in which the intxt_pmid is identified. "xml" indicates that the PMID is identified only from the XML file; "xml,pmc" indicates that the PMID is found both in the XML file and in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
• intxt_mark: The citation marker associated with the inline citation.
• best_id: The best source link ID (e.g., PMID) of the citation.
• best_source: The sources that confirm the best ID.
• best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
• citation: A citation context. If no citation is found in a sentence, the value is the sentence.
• progression: Text progression of the citation context/sentence.
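With roughly 720 million rows in total, the files are best streamed rather than loaded whole. The sketch below assumes each file starts with a header row using the column names above and is UTF-8 encoded; it counts rows that carry an XML-tagged PMID (intxt_pmid other than "-") per IMRaD section. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def pmid_tagged_by_section(lines):
    """Count rows whose citation has an XML-tagged PMID, per IMRaD section.

    `lines` can be an open file, so each large TSV is streamed row by row.
    Rows with intxt_pmid == "-" (no tagged PMID) are skipped.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["intxt_pmid"] != "-":
            counts[row["IMRaD"]] += 1
    return counts
```

A call would look like `with open("A_journal_IntxtCit.tsv", encoding="utf-8", newline="") as fh: pmid_tagged_by_section(fh)`. `csv.QUOTE_NONE` keeps quotation marks inside sentences from being treated as field quoting; if a very long row exceeds the default field size, `csv.field_size_limit` can be raised.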
<b>Supplementary Files</b>
• PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column.
Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
• pmcid: PMCID of the citing article.
• pos: The citation's position in the reference list.
• fromPMID: PMID of the citing article.
• toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
• SRC: The sources that confirm the toPMID.
• MatchDB: The origin bibliographic database of the toPMID.
• Probability: The match probability of the toPMID.
• toPMID2: PMID of the citation (as tagged in the XML file).
• SRC2: The sources that confirm the toPMID2.
• intxt_id: The ID of the citation.
• journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
• same_ref_string: Whether the citation string appears in the reference list more than once.
• DIFF: The comparison result between the toPMID column and the toPMID2 column.
• bestID: The best source link ID (e.g., PMID) of the citation.
• bestSRC: The sources that confirm the best ID.
• Match: Matching result produced by Patci.
[1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
• intxt_cit_license_fromPMC.tsv – This file contains the CC licensing information for each article. The licensing information is from PMC's file lists [2], retrieved on June 19, 2020, and March 9, 2023. It should be noted that the license information for 189,855 PMCIDs is <b>NO-CC CODE</b> in the file lists, and 521 PMCIDs are absent in the file lists. The absence of CC licensing information does not indicate that the article lacks a CC license. For example, PMCID: 6156294 (<b>NO-CC CODE</b>) and PMCID: 6118074 (absent in the PMC's file lists) are under CC-BY licenses according to their PDF versions of articles.
The intxt_cit_license_fromPMC.tsv file has two columns:
• pmcid: PMCID of the article.
• license: The article’s CC license information provided in PMC’s file lists. The value is nan when an article is not present in the PMC’s file lists.
[2] https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
• Supplementary_File_1.zip – This file contains the code for generating the dataset.
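The two supplementary tables can be turned into lookups with the standard library. This sketch assumes both files begin with a header row naming the columns listed above and are UTF-8 encoded; the `min_probability` threshold on Patci's match probability is a hypothetical, user-chosen filter, and the function names are illustrative only. As noted above, neither a missing license nor "NO-CC CODE" means an article lacks a CC license.

```python
import csv
import gzip
import io
import math


def best_id_lookup(source, min_probability=0.0):
    """Map (pmcid, intxt_id) -> bestID from PMC-OA-patci.tsv.gz.

    `source` is a path or a binary file object; rows whose Patci match
    Probability is below min_probability are skipped.
    """
    lookup = {}
    with gzip.open(source, "rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            try:
                prob = float(row["Probability"])
            except (TypeError, ValueError):
                prob = 0.0  # missing or non-numeric probability
            if math.isnan(prob):
                prob = 0.0
            if prob >= min_probability:
                lookup[(row["pmcid"], row["intxt_id"])] = row["bestID"]
    return lookup


def license_lookup(lines):
    """Map pmcid -> license from intxt_cit_license_fromPMC.tsv.

    "nan" (article absent from PMC's file lists) becomes None.
    """
    out = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        value = row["license"].strip()
        out[row["pmcid"]] = None if value in ("", "nan") else value
    return out
```

Keying on (pmcid, intxt_id) mirrors the intxt_id column shared by both the patci file and the *_journal_IntxtCit.tsv files.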
keywords:
citation context; in-text citation; inline citation; bibliometrics; science of science
published:
2024-06-27
Han, Hee-Sun; Schrader, Alex; Lee, JuYeon
(2024)
U-2 OS MERFISH dataset prepared by the Han lab at UIUC based on procedures developed in Moffitt et al. Proc. Natl. Acad. Sci. USA 113 (39), 11046–11051.
The data comprise ~2 million spots from 130 genes, each with x, y, z location, cell assignment, and correction status.
keywords:
smFISH; single transcript spatial transcriptomics; U-2 OS; Cancer cell line; MERFISH
published:
2022-12-11
The data are original electron micrographs from the lab of the late Dr. Burt Endo of the USDA. These data were digitized from photographic prints and glass plate negatives at 600 DPI as 16 bit TIFF files. This fourth version added 6 new ZIP files from the Endo data collection. "Endo folder database.xlsx" is updated to reflect the addition. Information in "Readme_FileNameFormatting.docx" remains the same as in V3.
keywords:
Heterodera glycines; Meloidogyne incognita; Burt Endo; nematode
published:
2025-06-05
Guan, Yingjun; Fang, Liri
(2025)
There are two files in this dataset.
File1: AffiNorm
AffiNorm contains 1,001 rows, including one header row, randomly sampled from MapAffil 2018 Dataset ([**https://doi.org/10.13012/B2IDB-2556310_V1**](https://databank.illinois.edu/datasets/IDB-2556310)). Each row in the file corresponds to a particular author on a particular PubMed record, and contains the following 26 columns, comma-delimited. All columns are ASCII, except city which contains Latin-1.
COLUMN DESCRIPTION
1. PMID: the PubMed identifier. int.
2. ORDER: the position of the author. int.
3. YEAR - The year of publication. int(4), eg: 1975.
4. affiliation - affiliation string of the author. eg: Department of Pathology, University of Chicago, Illinois 60637.
5. annotation_type: the number of institutions annotated, denoted by S, M, O, or Z, where "S" (Single) indicates 1 institution was annotated; "M" (Multiple) indicates more than one institution was annotated; "O" (Out of Vocabulary or None) indicates no institution was annotated, but an institution was apparently mentioned; "Z" indicates no institution was mentioned.
6. Institution: the standard name(s) of the annotated institution(s), according to ROR. if "S" (single institution), it is saved as a string, eg: University of Chicago; if "M", it is saved as a string that looks like a python list, eg: ['Public Health Laboratory Service'; 'Centre for Applied Microbiology and Research']; if "O" or "Z", then blank.
7. inst_type: the type of institution, according to ROR. the potential values are: education, funder, healthcare, company, archive, nonprofit, government, facility, other. An institution may have more than one type, eg: ['Education', 'Funder']
8. type_edu: TRUE if the inst_type contains "Education"; FALSE otherwise.
9. RORid: ROR identifier(s), eg: https://ror.org/05hs6h993. when multiple, the order corresponds to institution (column 6)
10. RORid_label: the standard name(s) of the annotated institution(s) according to ROR; same as Institution (column 6).
11. GRIDid: GRID identifier(s). eg: grid.170205.1
12. GRIDid_label: the standard name(s) of the annotated institution(s) according to GRID. eg: University of Chicago.
13. WikiDataid: WikiData identifier(s). eg: Q131252
14. WikiDataid_label: the standard name(s) of the annotated institution(s) according to WikiData. eg: University of Chicago
15. synonyms: a comma-separated list of variant names from InsVar (file 2), stored as a string. eg: University of Chicago, Chicago University, U of C, UChicago, uchicago.edu, U Chicago, ...
16. MapAffil-grid: GRID from the MapAffil 2018 Dataset.
17. MapAffil-grid_label: The standard name of institution from MapAffil 2018 Dataset.
18. judge_mapA: TRUE if GRIDid (column 11) contains MapAffil-grid (column 16); FALSE otherwise.
19. MapAffiltemporal-grid: GRID from the temporal version of MapAffil, http://abel.ischool.illinois.edu/data/MapAffilTempo2018.tsv.gz
20. MapAffiltemporal-grid_label: The standard name of institution from MapAffilTemporal 2018 Dataset.
21. judge_mapT: TRUE if GRIDid (column 11) contains MapAffiltemporal-grid (column 19); FALSE otherwise.
22. RORapi_query_id: ROR from ROR api tool (query endpoint)
23. RORapi_query_id_label: The standard name of institution from ROR api tool (query endpoint). format in string.
24. judge_rorapi_affiliation: TRUE if RORid (column 9) contains RORapi_query_id (column 22); FALSE otherwise.
25. rorapi_affiliation_id: ROR from ROR api tool (affiliation endpoint).
26. judge_rorapi_affiliation: TRUE if RORid (column 9) contains rorapi_affiliation_id (column 25); FALSE otherwise.
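The list-valued columns mix three forms: a plain string for "S" rows, a Python-style list such as ['Education', 'Funder'], and the semicolon-separated list shown for column 6. The standard-library sketch below normalizes them and computes how often each TRUE/FALSE judge column is TRUE. The function names are hypothetical, and the parsing rules are inferred from the examples above rather than from a documented grammar.

```python
import ast
import csv
import io


def parse_list_cell(cell):
    """Turn an Institution / inst_type / RORid cell into a Python list.

    Blank cells ("O" and "Z" rows) give an empty list.
    """
    cell = cell.strip()
    if not cell:
        return []
    if not (cell.startswith("[") and cell.endswith("]")):
        return [cell]  # single institution stored as a plain string
    try:
        return [str(v) for v in ast.literal_eval(cell)]  # ['A', 'B'] form
    except (ValueError, SyntaxError):
        # Fallback for the semicolon-separated form, e.g. ['A'; 'B']
        return [p.strip().strip("'\"") for p in cell[1:-1].split(";")]


def judge_rates(csv_text, columns=("judge_mapA", "judge_mapT")):
    """Fraction of rows where each TRUE/FALSE judge column is TRUE."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {c: sum(r[c].strip().upper() == "TRUE" for r in rows) / len(rows)
            for c in columns}
```

Since the file is ASCII except for Latin-1 text, opening it with `encoding="latin-1"` reads every byte safely. If the header really repeats `judge_rorapi_affiliation` (columns 24 and 26), `csv.DictReader` keeps only the last occurrence; reading with `csv.reader` and positional indices avoids losing column 24.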
File 2: insVar.json
InsVar is a supplementary dataset for AffiNorm, which includes the institution ID and its redirected aliases from Wikipedia. The institution ID list is from GRID, and the redirected aliases are from the wiki API, for example: https://en.wikipedia.org/wiki/Special:WhatLinksHere?target=University+of+Illinois+Urbana-Champaign&namespace=&hidetrans=1&hidelinks=1&limit=100
InsVar is stored as a JSON object (a Python dictionary once loaded): each key is a GRID identifier, for example "grid.1001.0" (Australian National University), and each value is a list of redirected alias strings.
{"grid.1001.0": ["ANU", "ANU College", "ANU College of Arts and Social Sciences", "ANU College of Asia and the Pacific", "ANU Union", "ANUSA", "Asia Pacific Week", "Australia National University", "Australian Forestry School", "the Australian National University", ...], "grid.1002.3": ...}
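A common use of InsVar is the reverse direction: looking up which GRID identifiers an affiliation string might refer to. The sketch below inverts the dictionary into a case-insensitive alias index; sets are used because one alias could belong to several institutions. The function names and the second GRID ID in the usage example are made up for illustration.

```python
import json
from collections import defaultdict


def alias_index(insvar):
    """Invert InsVar (GRID ID -> aliases) into casefolded alias -> GRID IDs."""
    index = defaultdict(set)
    for grid_id, aliases in insvar.items():
        for alias in aliases:
            index[alias.casefold()].add(grid_id)
    return index


def lookup(index, name):
    """GRID IDs whose alias list contains `name` (case-insensitive)."""
    return sorted(index.get(name.casefold(), ()))
```

Loading the real file would be `insvar = json.load(open("insVar.json", encoding="utf-8"))`, after which `lookup(alias_index(insvar), "ANU")` returns every GRID ID listing that alias.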
keywords:
PubMed; MEDLINE; Digital Libraries; Bibliographic Databases; Institution Names; Author Affiliations; Institution Name Ambiguity; Authority files
published:
2020-06-12
Fu, Yuanxi; Hsiao, Tzu-Kun
(2020)
This is a network of 14 systematic reviews on the salt controversy and their included studies. Each edge in the network represents an inclusion from one systematic review to an article. Systematic reviews were collected from Trinquart (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ).
<b>FILE FORMATS</b>
1) Article_list.csv - Unicode CSV
2) Article_attr.csv - Unicode CSV
3) inclusion_net_edges.csv - Unicode CSV
4) potential_inclusion_link.csv - Unicode CSV
5) systematic_review_inclusion_criteria.csv - Unicode CSV
6) Supplementary Reference List.pdf - PDF
<b>ROW EXPLANATIONS</b>
1) Article_list.csv - Each row describes a systematic review or included article.
2) Article_attr.csv - Each row is the attributes of a systematic review/included article.
3) inclusion_net_edges.csv - Each row represents an inclusion from a systematic review to an article.
4) potential_inclusion_link.csv - Each row shows the available evidence base of a systematic review.
5) systematic_review_inclusion_criteria.csv - Each row is the inclusion criteria of a systematic review.
6) Supplementary Reference List.pdf - Each item is a bibliographic record of a systematic review/included paper.
<b>COLUMN HEADER EXPLANATIONS</b>
<b>1) Article_list.csv:</b>
ID - Numeric ID of a paper
paper assigned ID - ID of the paper from Trinquart et al. (2016)
Type - Systematic review / primary study report
Study Groupings - Groupings of related primary study reports from the same study, from Trinquart et al. (2016) (if applicable, otherwise blank)
Title - Title of the paper
year - Publication year of the paper
Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016)
Doi - DOIs of the paper. (if applicable, otherwise blank)
Retracted (Y/N) - Whether the paper was retracted or withdrawn (Y). Blank if not retracted or withdrawn.
<b>2) Article_attr.csv:</b>
ID - Numeric ID of a paper
year - Publication year
Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016)
Type - Systematic review/ primary study report
<b>3) inclusion_net_edges.csv:</b>
citing_ID - The numeric ID of a systematic review
cited_ID - The numeric ID of the included articles
<b>4) potential_inclusion_link.csv:</b>
This data was translated from the Sankey diagram given in Trinquart et al. (2016) as Web Figure 4. Each row indicates a systematic review and each column indicates a primary study. In the matrix, "p" indicates that a given primary study had been published as of the search date of a given systematic review.
<b>5) systematic_review_inclusion_criteria.csv:</b>
ID - The numeric IDs of systematic reviews
paper assigned ID - ID of the paper from Trinquart et al. (2016)
attitude - Scientific opinion of the systematic review about the salt controversy, from Trinquart et al. (2016)
No. of studies included - Number of articles included in the systematic review
Study design - Study designs to include, per inclusion criteria
population - Populations to include, per inclusion criteria
Exposure/Intervention - Exposures/Interventions to include, per inclusion criteria
outcome - Study outcomes required for inclusion, per inclusion criteria
Language restriction - Report languages to include, per inclusion criteria
follow-up period - Follow-up period required for inclusion, per inclusion criteria
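The inclusion edge list and the availability matrix can be combined, for example, to see what share of the primary studies published by each review's search date that review actually included. The Python below is a minimal sketch under stated assumptions: inclusion_net_edges.csv is read with the citing_ID and cited_ID headers described above, while the layout of potential_inclusion_link.csv (a header row of primary-study IDs, review IDs in the first column, and "p" in the cells) is assumed, as is the premise that both files use the same IDs. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

def load_inclusions(edge_file):
    """Map each systematic review ID to the set of article IDs it includes."""
    included = defaultdict(set)
    for row in csv.DictReader(edge_file):
        included[row["citing_ID"].strip()].add(row["cited_ID"].strip())
    return included

def load_potential(matrix_file):
    """Map each review ID to the primary-study IDs marked "p" (assumed layout)."""
    reader = csv.reader(matrix_file)
    header = next(reader)
    study_ids = [h.strip() for h in header[1:]]  # first header cell is a label
    potential = {}
    for row in reader:
        if not row:
            continue
        potential[row[0].strip()] = {
            sid for sid, cell in zip(study_ids, row[1:]) if cell.strip().lower() == "p"
        }
    return potential

def inclusion_share(included, potential):
    """Fraction of potentially eligible studies that each review included."""
    shares = {}
    for review_id, eligible in potential.items():
        if eligible:
            hits = len(included.get(review_id, set()) & eligible)
            shares[review_id] = hits / len(eligible)
    return shares

# Toy data in the assumed layout (not taken from the dataset).
edges = io.StringIO("citing_ID,cited_ID\n1,10\n1,11\n2,10\n")
matrix = io.StringIO("review,10,11,12\n1,p,p,p\n2,p,,\n")
print(inclusion_share(load_inclusions(edges), load_potential(matrix)))
```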
keywords:
systematic reviews; evidence synthesis; network visualization; tertiary studies
published:
2023-07-27
Feng, Ling; Takiya, Daniela; Krishnankutty, Sindhu; Dietrich, Christopher; Zhang, Yalin
(2023)
The text file contains the original aligned DNA nucleotide sequence data used in the phylogenetic analyses of Feng et al. (in review), comprising 3 protein-coding genes (histone H3, cytochrome oxidase I and II) and 2 ribosomal genes (28S D8 and 16S). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages, and it can be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify it as NEXUS and specify that it contains data for 257 taxa (species) and 2995 characters (nucleotide positions), that the characters are DNA sequence, that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The remainder of the file contains the aligned nucleotide sequence data for the five genes. Data partitions, representing the individual genes and the different codon positions of the protein-coding genes, are indicated by the lines beginning "charset" near the end of the file. Two supplementary tables in the provided PDF file give additional information on the species in the dataset, including the GenBank accession numbers for the sequence data (Table S1) and the DNA substitution models used for each data partition in the phylogenetic analysis program IQ-Tree (version 1.6.8) (Table S3), as described in the Methods section of the paper. The supplemental tables will also be linked to the article upon publication at the journal website.
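Because the header follows standard NEXUS syntax (a DIMENSIONS command with NTAX and NCHAR, a FORMAT command with DATATYPE, GAP and MISSING, and CHARSET definitions), the values described above can be checked with a few regular expressions. This is a minimal Python sketch, not a full NEXUS parser; the sample text and the charset name part1 are hypothetical and do not reproduce the actual file.

```python
import re

def nexus_header(text):
    """Return NTAX, NCHAR, DATATYPE, GAP, MISSING and CHARSET definitions."""

    def grab(key):
        # KEY=value, where the value ends at whitespace or ';'
        m = re.search(rf"\b{key}\s*=\s*(\S+?)\s*[;\s]", text, re.IGNORECASE)
        return m.group(1) if m else None

    charsets = re.findall(
        r"^\s*charset\s+(\S+)\s*=\s*([^;]+);", text, re.IGNORECASE | re.MULTILINE
    )
    return {
        "ntax": int(grab("ntax")),
        "nchar": int(grab("nchar")),
        "datatype": grab("datatype"),
        "gap": grab("gap"),
        "missing": grab("missing"),
        "charsets": {name: spec.strip() for name, spec in charsets},
    }

# Hypothetical fragment in the layout described above.
sample = """#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=257 NCHAR=2995;
FORMAT DATATYPE=DNA GAP=- MISSING=?;
MATRIX
;
END;
BEGIN SETS;
charset part1 = 1-100;
END;
"""
print(nexus_header(sample))
```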
keywords:
Insect; leafhopper; dispersal; vicariance; evolution
published:
2021-05-07
The dataset is based on a snapshot of PubMed taken in December 2018 (NLM's baseline 2018 plus updates throughout 2018) and, for ORCIDs, primarily on the 2019 ORCID Public Data File https://orcid.org/.
Matching an ORCID to an individual author name on a PMID is a non-trivial process. Anyone can create an ORCID and claim to have contributed to any published work. Many records claim too many articles and most claim too few. Even though ORCID records are often (perhaps most often) populated by author name searches in popular bibliographic databases, there is no confirmation that the person's name is listed on the article. This dataset is the product of mapping ORCIDs to individual author names on PMIDs, even when the ORCID name does not match any author name on the PMID, and when there are multiple (good) candidate author names. The algorithm avoids assigning the ORCID to an article when there are no good candidates and when there are multiple equally good matches. For some ORCIDs that clearly claim too much, it triggers a very strict matching procedure (for ORCIDs that claim too much but where the majority of claims appear correct, e.g., 0000-0002-2788-5457), and it sometimes deletes ORCIDs altogether when all (or nearly all) of their claimed PMIDs appear incorrect. When an individual clearly has multiple ORCIDs, it deletes the least complete of them (e.g., 0000-0002-1651-2428 vs 0000-0001-6258-4628). It should be noted that ORCIDs that claim too much are not necessarily due to nefarious or trolling intentions, even though a few appear so. Certainly many are due to laziness, such as claiming everything with a particular last name. Some cases appear to be due to test engineers (e.g., 0000-0001-7243-8157; 0000-0002-1595-6203), librarians assisting faculty (e.g., 0000-0003-3289-5681), group/laboratory IDs (0000-0003-4234-1746), contributions to an article in capacities other than authorship, such as an Investigator, an Editor, or part of a Collective (e.g., 0000-0003-2125-4256 as part of the FlyBase Consortium on PMID 22127867), or a "Reply To", in which case the identity of the article and its authors might be conflated.
The NLM has also, in the past, limited the total number of authors indexed. The dataset certainly has errors, but I have taken great care to fix some glaring ones (individuals who claim too much), while still capturing authors who have published under multiple names and not explicitly listed them in their ORCID profile. The final dataset provides a "matchscore" that could be used for further clean-up.
Four files:
person.tsv: 7,194,692 rows, including header
1. orcid
2. lastname
3. firstname
4. creditname
5. othernames
6. otherids
7. emails
employment.tsv: 2,884,981 rows, including header
1. orcid
2. putcode
3. role
4. start-date
5. end-date
6. id
7. source
8. dept
9. name
10. city
11. region
12. country
13. affiliation
education.tsv: 3,202,253 rows, including header
1. orcid
2. putcode
3. role
4. start-date
5. end-date
6. id
7. source
8. dept
9. name
10. city
11. region
12. country
13. affiliation
pubmed2orcid.tsv: 13,133,065 rows, including header
1. PMID
2. au_order (author name position on the article)
3. orcid
4. matchscore (see below)
5. source: orcid (2019 ORCID Public Data File https://orcid.org/), pubmed (NLMs distributed XML files), or patci (an earlier version of ORCID with citations processed through the Patci tool)
12,037,375 from orcid; 1,065,892 from PubMed XML; 29,797 from Patci
matchscore:
000: lastname, firstname and middle init match (e.g., Eric T MacKenzie vs
00: lastname, firstname match (e.g., Keith Ward)
0: lastname, firstname reversed match (e.g., Conde Santiago vs Santiago Conde)
1: lastname, first and middle init match (e.g., L. F. Panchenko)
11: lastname and partial firstname match (e.g., Mike Boland vs Michael Boland or Mel Ziman vs Melanie Ziman)
12: lastname and first init match
15: 3 part lastname and firstname match (David Grahame Hardie vs D Grahame Hardie)
2: lastname match and multipart firstname initial match (e.g., Maria Dolores Suarez Ortega vs M. D. Suarez)
22: partial lastname match and firstname match (e.g., Erika Friedmann vs Erika Friedman)
23: e.g., Antonio Garcia Garcia vs A G Garcia
25: Allan Downie vs J A Downie
26: Oliver Racz vs Oliver Bacz
27: Rita Ostrovskaya vs R U Ostrovskaia
29: Andrew Staehelin vs L A Staehlin
3: M Tronko vs N D Tron'ko
4: Sharon Dent (Also known as Sharon Y.R. Dent; Sharon Y Roth; Sharon Yoder) vs Sharon Yoder
45: Okulov Aleksei vs A B Okulov
48: Maria Del Rosario Garcia De Vicuna Pinedo vs R Garcia-Vicuna
49: Anatoliy Ivashchenko vs A Ivashenko
5: lastname match only (weak match, but sometimes captures an alternative first name for better subsequent matches); e.g., Bill Hieb vs W F Hieb
6: first name match only (weak match, but sometimes captures an alternative last name for better subsequent matches); e.g., Maria Borawska vs Maria Koscielak
7: last or first name match on "other names"; e.g., Hromokovska Tetiana (Also known as Gromokovskaia, T. S., Громоковська Тетяна) vs T Gromokovskaia
77: Siva Subramanian vs Kolinjavadi N. Sivasubramanian
88: no name in ORCID but match caught by uniqueness of name across paper (at least 90% and 2 more than next most common name)
prefix:
C = ambiguity reduced (possibly eliminated) using city match (e.g., H Yang on PMID 24972200)
I = ambiguity eliminated by excluding investigators (i.e., one author and one or more investigators with that name)
T = ambiguity eliminated using PubMed pos (T for tie-breaker)
W = ambiguity resolved by authority2018
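When filtering pubmed2orcid.tsv, a matchscore can be split into its prefix letters and its numeric code. In this minimal Python sketch, the layout of a prefixed score (letters written directly before the code, e.g. "T12"), the presence of exactly the five columns listed above, and the decision to drop only codes 5 and 6 (the two described as weak) are assumptions; the ORCIDs in the sample are placeholders.

```python
import csv
import io
import re

WEAK_CODES = {"5", "6"}  # the two codes described above as weak matches

def split_matchscore(score):
    """Split e.g. 'T12' into (['T'], '12').

    Assumption: prefix letters (C, I, T, W) come directly before the digits.
    Digits are kept as a string so that '0', '00' and '000' stay distinct.
    """
    m = re.fullmatch(r"([A-Z]*)(\d+)", score.strip())
    if m is None:
        raise ValueError(f"unrecognised matchscore: {score!r}")
    return list(m.group(1)), m.group(2)

def strong_links(tsv_file):
    """Yield (PMID, au_order, orcid) for rows whose code is not a weak match."""
    reader = csv.reader(tsv_file, delimiter="\t")
    next(reader)  # skip the header row
    for pmid, au_order, orcid, score, _source in reader:
        _prefixes, code = split_matchscore(score)
        if code not in WEAK_CODES:
            yield pmid, int(au_order), orcid

rows = (
    "PMID\tau_order\torcid\tmatchscore\tsource\n"
    "1\t1\t0000-0000-0000-0001\t00\torcid\n"
    "2\t3\t0000-0000-0000-0002\tC5\tpubmed\n"
)
print(list(strong_links(io.StringIO(rows))))
```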
published:
2025-12-01
Mori, Jameson; Zilinger, Amber; Neumann, Julia; Pentrak, Martin; Paton, Tim; Novakofski, Jan; Mateus-Pinilla, Nohra
(2025)
This dataset contains measurements of the following soil components from soil samples collected in northern Illinois between 2023 and 2024. The same data are offered in two file formats (Excel spreadsheet and CSV):
1. Soil clay minerals (illite, kaolinite, chlorite, and smectite)
2. pH
3. Other soil elements: aluminum (Al), arsenic (As), barium (Ba), balance (Bal, the unquantified remainder), calcium (Ca), cadmium (Cd), chlorine (Cl), cobalt (Co), chromium (Cr), copper (Cu), iron (Fe), magnesium (Mg), manganese (Mn), mercury (Hg), molybdenum (Mo), niobium (Nb), nickel (Ni), potassium (K), phosphorus (P), lead (Pb), palladium (Pd), rubidium (Rb), silver (Ag), sulfur (S), thorium (Th), titanium (Ti), uranium (U), vanadium (V), yttrium (Y), zinc (Zn), and zirconium (Zr)
Samples were collected along public roads within the right-of-way. X-ray diffraction was used to quantify soil clay components, while the remaining components listed in item 3 were measured using a Niton XL5 Plus handheld X-ray fluorescence (XRF) analyzer. pH was measured using a Yinmik YK-S01 Digital Soil pH Tester. Samples were collected as part of a project funded by the United States Department of Agriculture Animal and Plant Health Inspection Service (USDA-APHIS) to examine the role of soil characteristics in chronic wasting disease (CWD) persistence in northern Illinois, USA.
keywords:
CWD; chronic wasting disease; soil; clay; pH; mineral; environmental transmission; X-ray diffraction
published:
2025-10-17
Deewan, Anshu; Liu, Jing-Jing; Jagtap, Sujit Sadashiv; Yun, Eun Ju; Walukiewicz, Hanna E.; Jin, Yong-Su; Rao, Christopher V.
(2025)
Oleaginous yeasts have received significant attention due to their substantial lipid storage capability. The accumulated lipids can be utilized directly or processed into various bioproducts and biofuels. Lipomyces starkeyi is an oleaginous yeast capable of using multiple plant-based sugars, such as glucose, xylose, and cellobiose. It is, however, a relatively unexplored yeast due to limited knowledge about its physiology. In this study, we have evaluated the growth of L. starkeyi on different sugars and performed transcriptomic and metabolomic analyses to understand the underlying mechanisms of sugar metabolism. Principal component analysis showed clear differences resulting from growth on different sugars. We have further reported various metabolic pathways activated during growth on these sugars. We also observed non-specific regulation in L. starkeyi and have updated the gene annotations for the NRRL Y-11557 strain. This analysis provides a foundation for understanding the metabolism of these plant-based sugars and potentially valuable information to guide the metabolic engineering of L. starkeyi to produce bioproducts and biofuels.
keywords:
Conversion; Metabolomics; Transcriptomics
published:
2021-11-16
Prada, Cecilia M.; Turner, Benjamin L.; Dalling, James W.
(2021)
Data from a field experiment at El Velo, Chiriqui, Republic of Panama. The data contain information about functional traits of seedlings growing in different treatments, including forest type, nitrogen addition, and organic matter.
keywords:
Mycorrhiza; nitrogen; oak forest; Panama; plant-soil feedbacks; seedling growth
published:
2016-08-16
Nguyen, Nam-phuong; Nute, Mike; Mirarab, Siavash; Warnow, Tandy
(2016)
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the Pfam families used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder contains two files: initial.fasta, the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 maximum-likelihood (ML) tree estimated on initial.fasta.
The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are in query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.
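To locate the files for a given family, fold, and fragment length programmatically, the layout above can be encoded in a few helper functions. A minimal Python sketch, assuming both archives were extracted to local roots (pfam_root and query_root are hypothetical names), that the splits directory sits directly under query_root, and that fragment lengths appear in file names exactly as written (1, 0.5, 0.25); PF00001 is only a placeholder family name.

```python
from pathlib import Path

FOLDS = (0, 1, 2, 3)
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")  # as written in query.Y.Z.fas names (assumed)

def reference_files(pfam_root, family, fold):
    """Reference alignment and FastTree-2 tree for one Pfam family and fold."""
    folder = Path(pfam_root) / family / str(fold)
    return folder / "initial.fasta", folder / "initial.fasttree"

def query_file(query_root, fold, fragment):
    """Query sequences for cross-fold set `fold` at fragment length `fragment`."""
    if fold not in FOLDS or fragment not in FRAGMENT_LENGTHS:
        raise ValueError("unknown fold or fragment length")
    return Path(query_root) / "splits" / f"query.{fold}.{fragment}.fas"

def parse_fasta(lines):
    """Minimal FASTA reader: {header line without '>': sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}
```

For example, `reference_files("pfam", "PF00001", 2)` yields the paths `pfam/PF00001/2/initial.fasta` and `pfam/PF00001/2/initial.fasttree`, and an opened alignment file can be passed straight to `parse_fasta`.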
[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
keywords:
HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
published:
2024-04-15
Lyu, Zhiheng; Lehan, Yao; Zhisheng, Wang; Chang, Qian; Zuochen, Wang; Jiahui, Li; Yufeng, Wang; Qian, Chen
(2024)
The dataset contains trajectories of Pt nanoparticles in 1.98 mM NaBH4 and NaCl, tracked under liquid-phase TEM. The coordinates (x, y) of nanoparticles are provided, together with the conversion factor that translates pixel size to actual distance. In the file, ∆t denotes the time interval and NaN indicates the absence of a value when the nanoparticle has not emerged or been tracked. The labeling of nanoparticles in the paper is also noted in the second row of the file.
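Converting pixel coordinates to physical units only requires the conversion factor and ∆t from the file. As an illustration, the minimal Python sketch below computes a time-averaged mean squared displacement (MSD) for one trajectory and skips frames marked NaN. The units (nm per pixel, seconds), the function name msd, and passing a trajectory as a list of (x, y) tuples are assumptions; the file's column layout is not described here.

```python
import math

def msd(track_px, nm_per_px, dt_s, max_lag):
    """Time-averaged MSD for one trajectory.

    track_px: list of (x, y) pixel coordinates; NaN marks frames where the
    particle had not emerged or was not tracked. Pairs involving NaN are skipped.
    Returns [(lag time in s, MSD in nm^2 or None if no valid pairs), ...].
    """
    pts = [(x * nm_per_px, y * nm_per_px) for x, y in track_px]
    result = []
    for lag in range(1, max_lag + 1):
        squared = [
            (x1 - x0) ** 2 + (y1 - y0) ** 2
            for (x0, y0), (x1, y1) in zip(pts, pts[lag:])
            if not any(math.isnan(v) for v in (x0, y0, x1, y1))
        ]
        result.append((lag * dt_s, sum(squared) / len(squared) if squared else None))
    return result

track = [(math.nan, math.nan), (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(msd(track, nm_per_px=2.0, dt_s=0.5, max_lag=2))  # [(0.5, 4.0), (1.0, 8.0)]
```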
keywords:
nanomotor; liquid-phase TEM
published:
2022-10-14
Dietrich, Christopher; Dmitriev, Dmitry; Takiya, Daniela; Thomas, Michael; Webb, Michael D; Zahniser, James; Zhang, Yalin
(2022)
The Membracoidea_morph_data_Final.nex text file contains the original data used in the phylogenetic analyses of Dietrich et al. (Insect Systematics and Diversity, in review). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages, and it can be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The complete taxon names corresponding to the 131 genus names listed under “BEGIN TAXA” are given in Table 1 of the included PDF file “Taxa_and_characters”; the 229 morphological characters (names abbreviated under “BEGIN CHARACTERS”) are fully explained in the list of character descriptions following Table 1 in the same PDF. The data matrix follows “MATRIX” and gives the numerical values of characters for each taxon. Question marks represent missing data. The lists of characters and taxa and details on the methods used for phylogenetic analysis are included in the submitted manuscript.
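Because question marks denote missing data in the MATRIX block, the completeness of each taxon can be summarized directly. A minimal Python sketch, assuming a non-interleaved matrix with one taxon per line (an unquoted name followed by its character states) and counting a polymorphic coding in parentheses or braces, e.g. (01) or {01}, as one character; the two example taxa are invented.

```python
import re

# One character state per token: a polymorphism in () or {}, or a single symbol.
TOKEN = re.compile(r"\([^)]*\)|\{[^}]*\}|\S")

def missing_fraction(matrix_lines, nchar):
    """Return {taxon: share of characters coded '?'} for a non-interleaved matrix."""
    shares = {}
    for line in matrix_lines:
        line = line.strip()
        if not line or line == ";":
            continue  # skip blank lines and the terminating semicolon
        name, states = line.split(None, 1)
        tokens = TOKEN.findall(states)
        if len(tokens) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(tokens)}")
        shares[name] = tokens.count("?") / nchar
    return shares

example = ["Genus_a 01?1(01)", "Genus_b ??10-", ";"]
print(missing_fraction(example, nchar=5))  # {'Genus_a': 0.2, 'Genus_b': 0.4}
```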
keywords:
leafhopper; treehopper; evolution; Cretaceous; Eocene
published:
2024-11-27
Han, Hee-Sun; Schrader, Alex; Lee, JuYeon; Yeo, Seokjin; Traniello, Ian
(2024)
Honey bee (Apis mellifera) MERFISH dataset prepared by the Han lab from brains collected by the Robinson lab at UIUC. The dataset comprises ~22 thousand cells and 130 genes, with x, y locations for each cell. A Jupyter notebook is included as an example of loading the data using Scanpy.
keywords:
smFISH; single transcript spatial transcriptomics; Honey bee brain; Apis mellifera; MERFISH
published:
2023-07-05
Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal; Lischwe Mueller, Natalie
(2023)
The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications, namely 14 systematic review reports (SRRs) and 68 primary study reports (PSRs), addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: the reports and their opinion classification (for, against, and inconclusive) were taken from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. (2016), not the other types of publications, and it adds the additional information described below.
This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not every reference cited in an SRR is a PSR included in its evidence synthesis. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. These unused PSRs appear as isolated nodes when the inclusion network is visualized.
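The construction described above can be sketched in Python: collect the SRR and PSR node sets, add a directed edge for each inclusion, and report PSRs with no incoming edge as unused. All column names here (ID and type in report_list.csv; citing_ID and cited_ID in inclusion_net_edges.csv) and the type labels SRR/PSR are assumptions for illustration; check the actual headers before use.

```python
import csv
import io

def build_inclusion_network(report_rows, edge_rows):
    """Return (srrs, psrs, edges, unused_psrs) for the bipartite inclusion network.

    report_rows: dicts with hypothetical keys 'ID' and 'type' ('SRR' or 'PSR').
    edge_rows: dicts with hypothetical keys 'citing_ID' (SRR) and 'cited_ID' (PSR).
    """
    srrs = {r["ID"] for r in report_rows if r["type"] == "SRR"}
    psrs = {r["ID"] for r in report_rows if r["type"] == "PSR"}
    edges = {(e["citing_ID"], e["cited_ID"]) for e in edge_rows}
    # In a bipartite inclusion network every edge runs from an SRR to a PSR.
    bad = [e for e in edges if e[0] not in srrs or e[1] not in psrs]
    if bad:
        raise ValueError(f"edges not of the form SRR -> PSR: {bad}")
    included = {psr for _, psr in edges}
    unused = psrs - included  # isolated nodes when visualized
    return srrs, psrs, edges, unused

reports = list(csv.DictReader(io.StringIO("ID,type\n1,SRR\n2,PSR\n3,PSR\n")))
edge_list = list(csv.DictReader(io.StringIO("citing_ID,cited_ID\n1,2\n")))
_, _, _, unused = build_inclusion_network(reports, edge_list)
print(unused)  # {'3'}
```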
We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports.
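Once authors are disambiguated, a co-author network links every pair of authors who share at least one report. A minimal Python sketch, assuming hypothetical columns report_id and author_id in salt_cont_author.csv (the real headers are not listed here) and weighting each edge by the number of shared reports.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

def coauthor_edges(rows):
    """Weighted co-author edges: {(author_a, author_b): number of shared reports}."""
    authors_by_report = defaultdict(set)
    for row in rows:
        authors_by_report[row["report_id"]].add(row["author_id"])
    weights = Counter()
    for authors in authors_by_report.values():
        # Sorting gives each undirected pair one canonical orientation.
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)

sample = io.StringIO("report_id,author_id\nR1,A\nR1,B\nR2,A\nR2,B\nR2,C\n")
print(coauthor_edges(csv.DictReader(sample)))  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
```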
We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016).
<b>UPDATES IN THIS VERSION COMPARED TO V2</b> (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V2)
- We added a new column "pub_date" to report_list.csv
- We corrected mistakes in supplementary_reference_list.pdf for report #28 and report #80. The authors of report #28 are Khaw, K.-T., & Barrett-Connor, E., not Salisbury D. Report #80 had been mistakenly mixed up with report #81.
keywords:
systematic reviews; evidence synthesis; network analysis; public health; salt controversy
published:
2025-03-12
Jeong, Gangwon; Villa, Umberto; Park, Seonyeong; Anastasio, Mark A.
(2025)
References
- Jeong, Gangwon, Umberto Villa, and Mark A. Anastasio. "Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints." Photoacoustics (2025): 100700.
- Park, Seonyeong, et al. "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer." Journal of biomedical optics 28.6 (2023): 066002-066002.
Overview
- This dataset includes 80 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for photoacoustic computed tomography (PACT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in PACT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as a .mat file. The filenames follow the format {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D) and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type and "510022534" is the sample ID.
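The naming convention can be validated and decoded with a short regular expression. A minimal Python sketch, assuming the sample ID is purely numeric as in the example above; it does not read the file contents, which would require a MATLAB-file reader (not shown).

```python
import re
from pathlib import Path

NAME = re.compile(r"([ABCD])(\d+)\.mat")  # breast type letter + numeric sample ID

def parse_phantom_name(path):
    """Return (breast_type, subject_id) from a name like 'D510022534.mat'."""
    m = NAME.fullmatch(Path(path).name)
    if m is None:
        raise ValueError(f"unexpected file name: {path}")
    return m.group(1), m.group(2)

print(parse_phantom_name("D510022534.mat"))  # ('D', '510022534')
```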
File Contents
- Each file contains the following variables:
> "type": Breast type
> "p0": Initial pressure distribution [Pa]
> "sos": Speed-of-sound map [mm/μs]
> "att": Acoustic attenuation (power-law prefactor) map [dB/(MHzʸ·mm)]
> "y": power-law exponent
> "pressure_lossless": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, under the assumption of a lossless medium (corresponding to Studies I, II, and III).
> "pressure_lossy": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, incorporating a power-law acoustic absorption model to account for medium losses (corresponding to Study IV).
* The pressure data were simulated using a ring-array transducer that consists of 512 receiving elements uniformly distributed along a ring with a radius of 72 mm.
* Note: These pressure data are noiseless simulations. In Studies II–IV of the referenced paper, additive i.i.d. Gaussian noise was added to the measurement data. Users may add similar noise to the provided data as needed for their own studies.
- In Study I, all spatial maps (e.g., sos) have dimensions of 512 × 512 pixels, with a pixel size of 0.32 mm × 0.32 mm.
- In Study II and Study III, all spatial maps (sos) have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
- In Study IV, both the sos and att maps have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
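A quick arithmetic check shows that both grid settings span the same square field of view, 512 × 0.32 mm = 1024 × 0.16 mm = 163.84 mm per side, so a transducer ring of radius 72 mm (diameter 144 mm), if centered on the grid, lies inside it. The Python snippet below reproduces the check and converts a pixel index to a physical coordinate; placing the origin at the grid center is an assumption, not something stated in the dataset description.

```python
GRIDS = {  # study: (pixels per side, pixel size in mm), from the description above
    "I": (512, 0.32),
    "II": (1024, 0.16),
    "III": (1024, 0.16),
    "IV": (1024, 0.16),
}
RING_RADIUS_MM = 72.0

def field_of_view_mm(study):
    """Side length of the square grid in mm."""
    n, dx = GRIDS[study]
    return n * dx

def pixel_to_mm(index, study):
    """Center of pixel `index` along one axis, assuming the origin is the grid center."""
    n, dx = GRIDS[study]
    return (index - (n - 1) / 2) * dx

for study in GRIDS:
    fov = field_of_view_mm(study)
    print(study, round(fov, 2), "ring fits:", fov > 2 * RING_RADIUS_MM)
```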
keywords:
Medical imaging; Photoacoustic computed tomography; Numerical phantom; Joint reconstruction
published:
2019-07-04
Sashittal, Palash; El-Kebir, Mohammed
(2019)
Results generated using SharpTNI on data collected from the 2014 Ebola outbreak in Sierra Leone.
published:
2019-06-12
Miller, Andrew; Raudabaugh, Daniel
(2019)
The data set contains Supplemental data sets for the Manuscript entitled "Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi."
Environmental sampling: Amplification of nuclear DNA regions (ITS1 and ITS2) was completed using the Fluidigm Access Array, and the resulting amplicons were sequenced on Illumina MiSeq v2 runs using rapid 2 × 250 nt paired-end reads. For the Illumina sequencing runs, amplicons were size-selected into <500 nt and >500 nt sub-pools, then remixed by nM concentration at a 1:3 ratio (<500 nt : >500 nt). All amplification and sequencing steps were performed at the Roy J. Carver Biotechnology Center at the University of Illinois Urbana-Champaign.
ITS1 region primers consisted of ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3').
ITS2 region primers consisted of fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3').
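The fITS7 primer contains the IUPAC degenerate base R (A or G). One simple way to locate such primers in reads is to expand each IUPAC code into a regular-expression character class, as in the minimal Python sketch below. It is an illustration only, not the processing used for this dataset (reads were processed in QIIME), and it searches only the forward strand with exact matching.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

PRIMERS = {  # 5'->3' sequences as listed above
    "ITS1F": "CTTGGTCATTTAGAGGAAGTAA",
    "ITS2": "GCTGCGTTCTTCATCGATGC",
    "fITS7": "GTGARTCATCGAATCTTTG",
    "ITS4": "TCCTCCGCTTATTGATATGC",
}

def primer_regex(primer):
    """Compile a regex matching every concrete sequence the degenerate primer allows."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def find_primer(read, primer):
    """Return the 0-based start of the first exact match in `read`, or -1."""
    m = primer_regex(primer).search(read.upper())
    return m.start() if m else -1

print(find_primer("NNGTGAGTCATCGAATCTTTGAA", PRIMERS["fITS7"]))  # 2
```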
Supplemental files 1 through 5 contain the raw data files.
Supplemental 1 is the ITS1 Illumina MiSeq forward reads and Supplemental 2 is the corresponding index files.
Supplemental 3 is the ITS2 Illumina MiSeq forward reads and Supplemental 4 is the corresponding index files.
Supplemental 5 is the map file needed to process the forward reads and index files in QIIME.
Supplemental 6 and 7 contain the resulting QIIME 1.9.1 OTU tables, along with UNITE, NCBI, and CONSTAX taxonomic assignments, in addition to the representative OTU sequences.
Numeric samples within the OTU tables correspond to the following:
1 Brachythecium sp.
2 Usnea cornuta
3 Dicranum sp.
4 Leucodon julaceus
5 Lobaria quercizans
6 Rhizomnium sp.
7 Dicranum sp.
8 Thuidium delicatulum
9 Myelochroa aurulenta
10 Atrichum angustatum
11 Dicranum sp.
12 Hypnum sp.
13 Atrichum angustatum
14 Hypnum sp.
15 Thuidium delicatulum
16 Leucobryum sp.
17 Polytrichum commune
18 Atrichum angustatum
19 Atrichum angustatum
20 Atrichum crispulum
21 Bryaceae
22 Leucobryum sp.
23 Conocephalum conicum
24 Climacium americanum
25 Atrichum angustatum
26 Huperzia serrata
27 Polytrichum commune
28 Diphasiastrum sp.
29 Anomodon attenuatus
30 Bryoandersonia sp.
31 Polytrichum commune
32 Thuidium delicatulum
33 Brachythecium sp.
34 Leucobryum glaucum
35 Bryoandersonia sp.
36 Anomodon attenuatus
37 Pohlia sp.
38 Cinclidium sp.
39 Hylocomium splendens
40 Polytrichum commune
41 negative control
42 Soil
43 Soil
44 Soil
45 Soil
46 Soil
47 Soil
If a sample number is not present within the OTU table, either no sequences were obtained or no sequences passed the quality filtering step in QIIME.
Supplemental 8 contains the summary of unique species per location.
published:
2021-10-15
Atomic oxygen densities in the mesosphere and lower thermosphere (MLT), averaged over 2002-2018 for twenty-six 14-day periods beginning January 1.
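Twenty-six 14-day periods starting January 1 cover 26 × 14 = 364 days, so the last day of the year (the last two in leap years) falls outside them. The minimal Python sketch below maps a date to its 1-based period number; how the dataset handles the leftover days is not stated, so the sketch returns None for them rather than guessing.

```python
from datetime import date

PERIOD_DAYS = 14
N_PERIODS = 26

def period_of(d):
    """1-based 14-day period containing date d, or None for days after period 26."""
    day_of_year = d.timetuple().tm_yday  # 1 for January 1
    index = (day_of_year - 1) // PERIOD_DAYS + 1
    return index if index <= N_PERIODS else None

print(period_of(date(2010, 1, 1)), period_of(date(2010, 1, 15)), period_of(date(2010, 12, 31)))  # 1 2 None
```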
keywords:
SABER data
published:
2025-04-04
Fang, Liri; Salami, Malik Oyewale; Weber, Griffin M.; Torvik, Vetle I.
(2025)
This dataset, uCite, is the union of nine large-scale open-access PubMed citation datasets, separated by reliability. There are 20 files, including the reliable and unreliable citation PMID pairs, non-PMID identifier to PMID mappings (for DOIs, Lens, MAG, and Semantic Scholar), the original PMID pairs from the nine resources, some metadata for PMIDs, duplicate PMIDs, some redirected PMID pairs, and PMC OA Patci citation matching results.
The short description of each data file is listed as follows. A detailed description can be found in the README.txt.
<strong>DATASET DESCRIPTION</strong>
<ol>
<li>PPUB.tsv.gz - tsv format file containing reliable citation pairs of uCite.</li>
<li>PUNR.tsv.gz - tsv format file containing unreliable citation pairs of uCite.</li>
<li>DOI2PMID.tsv.gz - tsv format file containing results mapping DOI to PMID. </li>
<li> LEN2PMID.tsv.gz - tsv format file containing results mapping LensID pairs to PMID pairs. </li>
<li> MAG2PMIDsorted.tsv.gz - tsv format file containing results mapping MAG ID to PMID. </li>
<li>SEM2PMID.tsv.gz - tsv format file containing results mapping Semantic Scholar ID to PMID. </li>
<li>JVNPYA.tsv.gz - tsv format file containing metadata of papers with PMID, journal name, volume, issue, pages, publication year, and first author's last name. </li>
<li>TiLTyAlJVNY.tsv.gz - tsv format file containing metadata of papers. </li>
<li> PMC-OA-patci.tsv.gz - tsv format file containing PubMed Central Open Access subset reference strings extracted by \cite{} processed by Patci.</li>
<li>REDIRECTS.gz - txt file containing unreliable PMID pairs mapped to reliable PMID pairs. </li>
<li>REMAP - file containing pairs of duplicate PubMed records (lhs PMID mapped to rhs PMID).</li>
<li> ami_pair.tsv.gz - tsv format file containing all citation pairs from Aminer (2015 version). </li>
<li> dim_pair.tsv.gz - tsv format file containing all citation pairs from Dimensions. </li>
<li> ice_pair.tsv.gz - tsv format file containing all citation pairs from iCite (April 2019 version, version 1). </li>
<li> len_pair.tsv.gz - tsv format file containing all citation pairs from Lens.org (harvested through Oct 2021). </li>
<li>mag_pair.tsv.gz - tsv format file containing all citation pairs from Microsoft Academic Graph (2015 version). </li>
<li> oci_pair.tsv.gz - tsv format file containing all citation pairs from OpenCitations (Nov. 2021 dump, csv version). </li>
<li> pat_pair.tsv.gz - tsv format file containing all citation pairs from Patci (i.e., from "PMC-OA-patci.tsv.gz"). </li>
<li> pmc_pair.tsv.gz - tsv format file containing all citation pairs from PubMed Central (harvested through Dec 2018 via E-utilities).</li>
<li> sem_pair.tsv.gz - tsv format file containing all citation pairs from Semantic Scholar (2019 version). </li>
</ol>
<strong>COLUMN DESCRIPTION</strong>
<strong>FILENAME</strong> : <em>PPUB.tsv.gz, PUNR.tsv.gz</em>
(1) fromPMID - PubMed ID of the citing paper.
(2) toPMID - PubMed ID of the cited paper.
(3) sources - citation sources, in which the citation pairs are identified.
(4) fromYEAR - Publication year of the citing paper.
(5) toYEAR - Publication year of the cited paper.
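As an example of working with the citation-pair files, the Python sketch below streams a gzip-compressed TSV and tallies how many citations each citing paper makes (out-degree) and receives (in-degree). It assumes the columns appear in the order listed above and that a header row, if any, can be recognized by a non-numeric first field; both assumptions should be checked against README.txt.

```python
import csv
import gzip
from collections import Counter

def degrees_from_rows(rows):
    """Count out-degree (fromPMID) and in-degree (toPMID) over citation rows."""
    out_deg, in_deg = Counter(), Counter()
    for row in rows:
        if len(row) < 2 or not row[0].isdigit():
            continue  # skip blank/short lines and a header row, if present
        out_deg[row[0]] += 1  # fromPMID cites ...
        in_deg[row[1]] += 1   # ... toPMID
    return out_deg, in_deg

def citation_degrees(path):
    """Read a gzip-compressed TSV such as PPUB.tsv.gz or PUNR.tsv.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return degrees_from_rows(reader)
```

For example, `out_deg, in_deg = citation_degrees("PPUB.tsv.gz")` followed by `in_deg.most_common(10)` lists the ten most-cited PMIDs among the reliable pairs.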
<strong>FILENAME</strong> : <em>DOI2PMID.tsv.gz</em>
(1) DOI - Digital Object Identifier of paper records.
(2) PMID - PubMed ID of paper records.
(3) PMID2 - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs.
<strong>FILENAME</strong> : <em>SEM2PMID.tsv.gz</em>
(1) SemID - Semantic Scholar ID of paper records.
(2) PMID - PubMed ID of paper records.
(3) DOI - Digital Object Identifier of paper records, “-” if the paper doesn't have DOIs.
<strong>FILENAME</strong> : <em>JVNPYA.tsv.gz</em>
- Each row refers to a publication record.
(1) PMID - PubMed ID.
(2) journal - Journal name.
(3) volume - Journal volume.
(4) issue - Journal issue.
(5) pages - The first page and last page (without leading digits) number of the publication separated by '-'.
(6) year - Publication year.
(7) lastname - Last name of the first author.
<strong>FILENAME</strong> : <em>TiLTyAlJVNY.tsv.gz</em>
(1) PMID - PubMed ID.
(2) title_tokenized - Paper title after tokenization.
(3) languages - Language that paper is written in.
(4) pub_types - Types of the publication.
(5) length(authors) - String length of author names.
(6) journal - Journal name.
(7) volume - Journal volume.
(8) issue - Journal issue.
(9) year - Publication year of print (not necessarily the electronic publication year).
<strong>FILENAME</strong> : <em> PMC-OA-patci.tsv.gz</em>
(1) pmcid - PubMed Central identifier.
(2) pos -
(3) fromPMID - PubMed ID of the citing paper.
(4) toPMID - PubMed ID of the cited paper.
(5) SRC - citation sources, in which the citation pairs are identified.
(6) MatchDB - PubMed, ADS, DBLP.
(7) Probability - Matching probability predicted by Patci.
(8) toPMID2 - PubMed ID of the cited paper, extracted from OA xml file
(9) SRC2 - citation sources, in which the citation pairs are identified.
(10) intxt_id -
(11) jounal - First character of the journal name.
(12) same_ref_string - Y if patci and xml reference string match, otherwise N.
(13) DIFF -
(14) bestSRC - Citation sources, in which the citation pairs are identified.
(15) Match - Matching strings annotated by Patci.
<strong>FILENAME</strong> : <em>REDIRECTS.gz</em>
Each row in REDIRECTS.gz is a string in one of the following formats:
- "REDIRECTED FROM: source PMID_i PMID_j -> PMID_i' PMID_j "
- "REDIRECTED TO: source PMID_i PMID_j -> PMID_i PMID_j' "
Note: source is the name of the source(s) from which PMID_i and PMID_j come.
<strong>FILENAME</strong> : <em>REMAP</em>
Each row maps the PMID of a duplicate PubMed record (left-hand side) to the PMID it is remapped to (right-hand side).
The format of each row is "$REMAP{PMID_i} = PMID_j".
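The REMAP lines use Perl hash-assignment syntax, so they can be read with a regular expression and applied to citation pairs. A minimal Python sketch, assuming every line follows the pattern shown above (an optional trailing semicolon is tolerated) and that chained remappings should be followed to a PMID that is not itself remapped; the function names and sample PMIDs are hypothetical.

```python
import re

REMAP_LINE = re.compile(r"\$REMAP\{(\d+)\}\s*=\s*(\d+)\s*;?\s*$")

def load_remap(lines):
    """Build {old PMID: new PMID} from lines like '$REMAP{123} = 456'."""
    remap = {}
    for line in lines:
        m = REMAP_LINE.match(line.strip())
        if m:
            remap[m.group(1)] = m.group(2)
    return remap

def resolve(pmid, remap):
    """Follow remappings to a final PMID, stopping if a cycle is detected."""
    seen = set()
    while pmid in remap and pmid not in seen:
        seen.add(pmid)
        pmid = remap[pmid]
    return pmid

def remap_pair(pair, remap):
    """Apply the remapping to both ends of a (fromPMID, toPMID) citation pair."""
    from_pmid, to_pmid = pair
    return resolve(from_pmid, remap), resolve(to_pmid, remap)

remap = load_remap(["$REMAP{111} = 222", "$REMAP{222} = 333;"])
print(remap_pair(("111", "999"), remap))  # ('333', '999')
```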
<strong>FILENAME</strong> : <em>ami_pair.tsv.gz, dim_pair.tsv.gz, ice_pair.tsv.gz, len_pair.tsv.gz, mag_pair.tsv.gz, oci_pair.tsv.gz, pat_pair.tsv.gz, pmc_pair.tsv.gz, sem_pair.tsv.gz</em>
(1) fromPMID - PubMed ID of the citing paper.
(2) toPMID - PubMed ID of the cited paper.
keywords:
Citation data; PubMed; Social Science
published:
2021-10-15
Atomic oxygen data from SCIAMACHY for the mesosphere and lower thermosphere (MLT), 2002-2012, averaged over twenty-six 14-day periods beginning January 1.
keywords:
SCIAMACHY data