Automatic identification of diatoms using deep learning to improve ecological diagnosis of aquatic environments
- Publication Date
- Dec 13, 2023
- Source
- Hal-Diderot
- Language
- English
- License
- Unknown
Abstract
Diatoms are unicellular algae found in all aquatic environments. They are highly sensitive to changes in water quality and habitat conditions, which makes them useful for bioindication: in France, the Biological Diatom Index (BDI) has been used routinely since 2000 to assess the ecological quality of watercourses within the framework of the European Water Framework Directive. Traditionally, assessing water quality and ecosystem health through diatoms is a meticulous and time-consuming process: working from natural samples, experts in diatom taxonomy identify individuals to the species level under an optical microscope. This manual identification has its limits. The quality of the microscopy equipment, the analyst's level of expertise, and the inherent subjectivity of human judgment all contribute to variability in the identification results. This work aims to reduce that variability by automating diatom identification. The first objective is to develop a tool that detects diatoms in microscope images and distinguishes them from the many other objects present. The second objective is to classify the detected diatoms down to the species level, with an associated level of confidence. The main contribution of this research is an end-to-end pipeline for automated diatom identification based on deep learning. Deep learning has demonstrated remarkable capabilities in tasks requiring complex pattern recognition and classification, which makes it well suited to the nuanced process of diatom identification. By relying on deep neural networks, the pipeline streamlines and accelerates identification, providing a more efficient alternative to manual classification. However, the adoption of deep learning is not without its own challenges.
A fundamental requirement is access to large, properly annotated datasets to train these neural networks effectively. To address this, a method for generating synthetic datasets is proposed. By carefully constructing artificial data that preserve the characteristics of real diatom images, this method supplements the available training data and improves the network's ability to generalize to new diatom samples. Another challenge for automated classification arises from similarities between different diatom species (inter-species similarity) and variations within a single species (intra-species variability). This work explores strategies and techniques that account for these complexities, improving the system's ability to discern subtle differences and classify accurately. Finally, a new method for quantifying uncertainty in deep classifiers is developed; it improves the reliability of the classification process, in particular by making it possible to detect out-of-distribution images. To demonstrate the practicality of the developed tool, its application to biomonitoring is presented by calculating the BDI on a large number of representative samples from the Rhine-Meuse basin.
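The abstract does not describe how the new uncertainty method works, so the following is only an illustrative sketch of the general idea of rejecting uncertain predictions. It uses a common generic baseline, the normalized entropy of the softmax output, which is not claimed to be the method developed in this work. The function names, the taxon labels, and the `max_entropy` threshold are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; assumes at least two classes.

    0 means all probability mass is on one class, 1 means a uniform
    distribution over the classes.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def classify_or_reject(logits, labels, max_entropy=0.5):
    """Return (label, uncertainty), or (None, uncertainty) if too uncertain.

    `max_entropy` is an assumed, hypothetical threshold; in practice it would
    be tuned on validation data.
    """
    probs = softmax(logits)
    uncertainty = normalized_entropy(probs)
    if uncertainty > max_entropy:
        return None, uncertainty
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], uncertainty


# Hypothetical placeholder labels, not real taxa from the thesis.
LABELS = ["taxon_A", "taxon_B", "taxon_C"]
confident = classify_or_reject([10.0, 0.0, 0.0], LABELS)
ambiguous = classify_or_reject([1.0, 1.0, 1.0], LABELS)
```

In this sketch, an image whose scores are concentrated on one class is labelled, while an image with nearly flat scores is returned as `None` so that it can be set aside for expert review. Softmax entropy is known to be an imperfect signal for out-of-distribution inputs, which is one reason dedicated uncertainty methods are studied.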