Illinois Data Bank

Slide scans, annotated pollen counts, and trained pollen detection models for fossil pollen samples from Laguna Pallcacocha, El Cajas National Park, Ecuador

This dataset includes image stacks, annotated counts, and ground-truth masks from two high-resolution sediment cores extracted from Laguna Pallcacocha, in El Cajas National Park in the Ecuadorian Andes, by Moy et al. (2002) and Hagemans et al. (2021). The first core (PAL 1999, from Moy et al. (2002)) extends through the Holocene (11,600 cal. yr BP to present). The PAL 1999 domain contains 900 annotated image stacks and masks. The second core (PAL IV, from Hagemans et al. (2021)) captures the 20th century. The PAL IV domain contains 2,986 annotated image stacks and masks.

Different microscopes and annotation tools were used to image and annotate each core, so naming conventions and file formats differ between the two. We therefore organized the data separately for the PAL 1999 and PAL IV domains. The three-letter codes used to label our pollen annotations are listed in the file “Pollen_Identification_Codes.xlsx”.

Both domain directories contain:
• Image stacks organized by subdirectory
• Annotations within each image stack directory, containing specimen identifications using a three-letter code and coordinates defining bounding boxes or circles
• Ground-truth distance-transform masks for each image stack
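
As a rough illustration of what a distance-transform mask encodes, the sketch below builds one for a single circular annotation: each pixel inside the circle stores its distance to the circle's edge, and pixels outside store 0. This is only an assumption-laden example; the dataset description does not state how the masks were generated or how they are stored, and the function name, parameters, and grid size here are hypothetical.

```python
import math

def disk_distance_mask(height, width, cx, cy, radius):
    """Build a height x width grid for one circular annotation.

    Pixels inside the circle hold their Euclidean distance to the
    circle's edge (radius minus distance to the center); pixels on or
    outside the circle hold 0. Hypothetical helper, not the authors' code.
    """
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dist_to_center = math.hypot(x - cx, y - cy)
            row.append(max(0.0, radius - dist_to_center))
        mask.append(row)
    return mask

# Example: a 9 x 9 grid with a circle of radius 3 centered at (4, 4).
mask = disk_distance_mask(9, 9, cx=4, cy=4, radius=3)
```

In a mask of this kind the values peak at the center of each grain and fall to zero at its boundary, so local maxima can serve as candidate detection points.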

The zip file "bestValModel_encoder.paramOnly.zip" is the trained pollen detection model produced from the images and annotations in this dataset.

Please cite this dataset as:

Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W. (2025): Slide scans, annotated pollen counts, and trained pollen detection models for fossil pollen samples from Laguna Pallcacocha, El Cajas National Park, Ecuador. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4207757_V1

Please also include citations of the original publications from which these data are taken:

Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” bioRxiv, January 1, 2025. https://doi.org/10.1101/2025.01.05.631390.

Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” Paleobiology, 2025 [in press].

Feng, J. T. (2023). Open-world deep learning applied to pollen detection (MS thesis, University of Illinois at Urbana-Champaign). https://hdl.handle.net/2142/120168

Life Sciences
continual learning; deep learning; domain gaps; open-world; palynology; pollen grain detection; taxonomic bias
CC0
University of Illinois Campus Research Board - Grant: RB22079
University of Illinois School of Integrative Biology Francis M. and Harlie M. Clark Research Support Grant
Dutch Research Council (NWO) - Grant: 824.14.018
University of Macau - Grant: SRG2023-00044-FST
Surangi W. Punyasena
Version: 1 | DOI: 10.13012/B2IDB-4207757_V1 | Comment: (none) | Publication Date: 2025-07-21


Contact the Research Data Service for help interpreting this log.

Dataset update: {"publication_state"=>["metadata embargo", "released"]} 2025-07-21T13:00:08Z
Dataset update: description revised (the citation "Feng et al. (2025). Deep learning techniques for pollen detection in the open world." was replaced with the bioRxiv and Paleobiology citations of “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning”) 2025-07-20T12:48:21Z
Dataset update: {"release_date"=>[Sun, 31 Aug 2025, Mon, 21 Jul 2025]} 2025-07-20T12:42:34Z