Our workflow standardizes the process of implementing deep learning (DL) use cases for electron microscopy (EM). It streamlines training, testing, and inference through a PyTorch-based playground with a Jupyter-notebook-based interface that is easy for EM experts to use, and DL experts can easily contribute their own use cases using our template. This approach gives electron microscopists a single, user-friendly implementation to become more familiar with deep learning, while simplifying the development process for DL specialists.
Icon credits: image processing icons by BomSymbols, AI brain icons by Eklip Studio, evaluation icons by justicon, and inference icons by Freepik (all via Flaticon).
In the following, we will introduce the steps of the workflow to EM specialists as well as DL specialists. Please open the corresponding tab when reading.
In deep learning for electron microscopy (EM), the process of creating and optimizing models to address specific challenges within EM is known as development. This process is structured around three key steps: data preparation, model training and validation, and model evaluation.
These steps ensure deep learning models are effectively adapted for EM tasks, providing solutions specific to your lab's requirements.
Data preparation is a critical aspect of the deep learning pipeline. Recognizing that expertise in data collection and annotation primarily resides within EM labs, our workflow is designed to provide guidance for EM researchers to develop their own datasets in collaboration with DL experts.
While each use case in our workflow focuses on a primary task (e.g., counting objects in EM images), the workflow is flexible enough to allow you to swap the application area (e.g., quantifying mitochondria in EM images) without needing to modify the code—only the data needs to be replaced.
Data acquisition is the first step in creating a dataset. This involves gathering raw EM images, typically from various imaging modalities such as TEM, STEM, or SEM. As an EM expert, your role is to collect diverse and well-balanced datasets that cover a range of features relevant to the task. DL experts will support and guide you through the process if needed by providing the necessary information within their use case.
Annotating EM data is often the most time-consuming part of dataset preparation. Our workflow enables EM researchers to annotate their data using the Computer Vision Annotation Tool (CVAT), a user-friendly tool that simplifies this process.
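As a rough illustration of how such annotations might later be read in, the sketch below parses a COCO-format JSON file (one of CVAT's export options) and groups the annotations per image. This is a minimal sketch, not part of the playground's code, and the file name in the usage comment is hypothetical.

```python
import json
from collections import defaultdict

def load_coco_annotations(path):
    """Read a COCO-format JSON file (one of CVAT's export options) and
    return a mapping from image file name to its list of annotations."""
    with open(path) as f:
        coco = json.load(f)

    # Map internal image ids to file names, then group annotations by file name.
    images = {img["id"]: img["file_name"] for img in coco["images"]}
    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[images[ann["image_id"]]].append(ann)
    return dict(per_image)

# Example usage (hypothetical file name):
# annotations = load_coco_annotations("annotations/instances_default.json")
# print(len(annotations), "annotated images")
```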
Data preprocessing is essential for preparing your dataset for model training. It includes several important steps:
Correctly structuring your data allows you to adapt the use case application based on the provided training data. To do this, you will need to follow the data structure defined by the DL expert. For simplicity, DL experts are encouraged to organize the dataset into a single folder for training, validation, and testing when submitting their work to the playground, as data splitting can be handled during runtime. This approach simplifies the data structuring process for EM experts. Details will be documented within each use case individually.
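As an illustration of how a single data folder can be split at runtime, here is a minimal sketch; the folder name, file pattern, and split ratios are placeholders, and each use case documents its own conventions.

```python
import random
from pathlib import Path

def split_dataset(data_dir, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Split all image files found in one folder into train/val/test subsets
    at runtime, so no manual folder reorganisation is needed."""
    files = sorted(Path(data_dir).glob("*.tif"))  # adjust the pattern to your file type
    random.Random(seed).shuffle(files)

    n_val = int(len(files) * val_fraction)
    n_test = int(len(files) * test_fraction)
    return {
        "val": files[:n_val],
        "test": files[n_val:n_val + n_test],
        "train": files[n_val + n_test:],
    }

# Hypothetical usage:
# splits = split_dataset("data/my_use_case")
# print({name: len(items) for name, items in splits.items()})
```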
During training, the model learns patterns from the data by adjusting its internal parameters (weights) based on the input-output relationships. This process is guided by a loss function, which measures the error between predicted and true values (labels/annotations). The model is iteratively updated to minimize this error. Validation, on the other hand, involves evaluating the model’s performance on a separate set of data (the validation set) that it hasn't seen during training. This helps to check how well the model generalizes to new, unseen data and aids in detecting issues such as overfitting. The training and validation processes together ensure that the model is well-suited for the task at hand and can deliver reliable results in real-world applications.
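The following is a minimal PyTorch sketch of this training/validation cycle, not the playground's actual implementation; the loss function and data loaders are placeholders chosen for illustration.

```python
import torch
from torch import nn

def train_and_validate(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """Minimal training/validation loop: the training loss drives the weight
    updates, while the validation loss only monitors generalisation."""
    model = model.to(device)
    criterion = nn.MSELoss()  # placeholder loss: error between prediction and label
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()      # compute gradients
            optimizer.step()     # update weights
            train_loss += loss.item()

        model.eval()
        val_loss = 0.0
        with torch.no_grad():    # no weight updates on validation data
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                val_loss += criterion(model(images), labels).item()

        print(f"epoch {epoch + 1}: "
              f"train loss {train_loss / len(train_loader):.4f}, "
              f"val loss {val_loss / len(val_loader):.4f}")
```

A rising validation loss while the training loss keeps falling is the typical sign of overfitting mentioned above.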
Hyperparameter tuning is the process of selecting the best values for parameters that influence the model's performance but cannot be optimized during training. We offer an automated search to simplify this process. DL experts define a default search space for those who prefer not to engage with the technical details. If you are more experienced or willing to learn about the process, you can modify the search space as needed without code changes, simply by filling in a form. The DL expert provides explanations of the influence of each tunable parameter within the use case.
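As a simplified illustration of what such an automated search does behind the scenes, the sketch below performs a random search over a hypothetical search space. The tunable parameters, their ranges, and the search strategy are illustrative assumptions; the real ones are defined per use case.

```python
import random

# Illustrative search space; the actual tunable parameters and ranges are
# defined by the DL expert for each use case.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [4, 8, 16],
    "weight_decay": [0.0, 1e-5, 1e-4],
}

def sample_configuration(space, seed=None):
    """Draw one random configuration from the search space."""
    rng = random.Random(seed)
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(train_fn, space, n_trials=10):
    """Run n_trials short training runs ("sweeps") and keep the
    configuration with the lowest validation loss."""
    best_config, best_val_loss = None, float("inf")
    for trial in range(n_trials):
        config = sample_configuration(space, seed=trial)
        val_loss = train_fn(config)  # short training run returning validation loss
        if val_loss < best_val_loss:
            best_config, best_val_loss = config, val_loss
    return best_config, best_val_loss
```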
During model training, performance is continuously monitored through logging. This provides EM researchers with valuable insights into the training process, helping identify issues like overfitting or data biases, and learning about the process of model training.
For each execution of the full notebook, logs are saved in a dedicated directory (logs/data-current-datetime/). Within this directory, a subfolder is created for each training run: there can be multiple folders called Sweep_idx, containing the logs for each sweep run of the hyperparameter tuning; one subfolder TrainingRun, containing the logs of the full model training; and one subfolder Evaluate, containing the logging results of the evaluation. Each subfolder may contain the following logs:
| Attribute | Explanation | Directory |
|---|---|---|
| Hyperparameters | A record of the training hyperparameters. | logs/data-current-datetime/subfolder/hyperparameter.json |
| Model Checkpoints | Snapshots of the model at various stages, enabling you to resume training or use the best model for inference. | logs/data-current-datetime/subfolder/checkpoints |
| Training/Validation Loss Curves | Graphs showing the model's training progress over time. For a better understanding of training curves, see this guide. | logs/data-current-datetime/subfolder/plots |
| Qualitative Visualizations | Sample images alongside model predictions to assess visual accuracy. | logs/data-current-datetime/subfolder/samples (validation) or logs/data-current-datetime/Evaluate/samples (test) |
| Test Metrics | Quantitative performance measures on unseen test data. | logs/data-current-datetime/Evaluate/test_results.txt |
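For illustration, a training run could populate such a run directory roughly as follows. This is a sketch with hypothetical helper names, not the playground's actual logging code; only the directory and file names follow the layout described above.

```python
import json
from datetime import datetime
from pathlib import Path

import matplotlib.pyplot as plt
import torch

def create_log_dir(root="logs"):
    """Create a run directory named after the current date and time."""
    run_dir = Path(root) / datetime.now().strftime("%Y-%m-%d_%H-%M-%S") / "TrainingRun"
    (run_dir / "checkpoints").mkdir(parents=True, exist_ok=True)
    (run_dir / "plots").mkdir(parents=True, exist_ok=True)
    return run_dir

def log_run(run_dir, hyperparameters, model, train_losses, val_losses):
    """Persist hyperparameters, a model checkpoint, and the loss curves."""
    with open(run_dir / "hyperparameter.json", "w") as f:
        json.dump(hyperparameters, f, indent=2)

    torch.save(model.state_dict(), run_dir / "checkpoints" / "last.pt")

    plt.plot(train_losses, label="train loss")
    plt.plot(val_losses, label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(run_dir / "plots" / "loss_curves.png")
    plt.close()
```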
Model evaluation assesses whether the trained model meets the desired criteria and is ready for deployment or requires further refinement.
Typically, the most recent model is selected for evaluation, but you can also evaluate other models by providing the checkpoint path.
To help you assess the model's performance, the DL expert determines the specific evaluation metrics based on the model's goals. Each metric used is explained within the use case by the DL expert to provide a better understanding of the aspects of evaluation; this also helps EM experts identify possible shortcomings of the evaluation.
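As an example of what such an evaluation step might look like, the sketch below loads a checkpoint and computes a mean absolute error, a plausible metric for a counting task. The metric, paths, and function names are illustrative assumptions, not the playground's actual evaluation code.

```python
import torch

def evaluate(model, checkpoint_path, test_loader, results_file="test_results.txt", device="cpu"):
    """Load a trained checkpoint and report a simple test metric
    (mean absolute error, e.g. for a counting task)."""
    model.load_state_dict(torch.load(checkpoint_path, map_location=device))
    model = model.to(device)
    model.eval()

    total_error, n_samples = 0.0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images)
            total_error += (predictions - labels).abs().sum().item()
            n_samples += labels.numel()

    mae = total_error / n_samples
    # Write the result in the same spirit as the Evaluate/test_results.txt log.
    with open(results_file, "w") as f:
        f.write(f"Mean absolute error on test set: {mae:.4f}\n")
    return mae
```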
Please note that the DeepEM Playground provides a tool to bridge the gap between DL and EM experts, fostering improved research in this interdisciplinary field and demonstrating the power of deep learning. However, none of the trained models are flawless and should always be used with human oversight. We do not take responsibility for any irresponsible usage of the models or their predictions.
Inference is the process of using a thoroughly trained and tested deep learning model to make predictions on new, unseen data. It is the step in which the model finally supports the analysis of your EM data.
To perform inference with a trained model, you first need to define the data you want the model to make predictions on. This could be a specific set of EM data that you wish to analyze. The data can be provided as a single file for prediction or as a folder containing multiple files, allowing for automatic processing of all the included data. This flexibility ensures the model can handle both individual cases and larger datasets efficiently.
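To illustrate this flexibility, the sketch below accepts either a single file or a folder and runs a trained model on every image it finds. The supported file types and the preprocessing are assumptions that would be adapted to the specific use case; this is not the playground's actual inference code.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image

def collect_inputs(path):
    """Accept either a single image file or a folder of images."""
    path = Path(path)
    if path.is_dir():
        return sorted(p for p in path.iterdir()
                      if p.suffix.lower() in {".png", ".jpg", ".tif"})
    return [path]

def predict(model, path, device="cpu"):
    """Run the trained model on one file or on every file in a folder."""
    model = model.to(device).eval()
    results = {}
    with torch.no_grad():
        for image_path in collect_inputs(path):
            # Load as grayscale and scale to [0, 1]; adapt preprocessing to your use case.
            image = Image.open(image_path).convert("L")
            tensor = torch.from_numpy(np.array(image, dtype=np.float32) / 255.0)
            tensor = tensor.unsqueeze(0).unsqueeze(0)  # add channel and batch dimensions
            results[image_path.name] = model(tensor.to(device))
    return results

# Hypothetical usage:
# predictions = predict(trained_model, "new_data/")          # folder of images
# prediction = predict(trained_model, "new_data/cell.png")   # single file
```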
The EM specialist is responsible for selecting a previously trained model for inference. It is essential that the model has been evaluated thoroughly according to the evaluation criteria provided by the DL expert and that the evaluation results are promising. Only models with strong performance, as indicated by the evaluation metrics, should be used for making predictions, to ensure reliable and accurate results.
The model will be used to make predictions on the provided data. However, it's important to remember that no trained model is perfect, and human oversight remains essential. The results generated by the model should always be carefully checked for plausibility to ensure accuracy and reliability.