Our workflow standardizes the process of implementing deep learning (DL) use cases for electron microscopy (EM). It streamlines training, testing, and inference through a PyTorch-based playground with a Jupyter-notebook-based interface that is easy for EM experts to use, and DL experts can easily contribute their own use cases using our template. This approach gives electron microscopists a single, user-friendly implementation to become more familiar with deep learning, while simplifying the development process for DL specialists.
Icon credits: image processing icons by BomSymbols, AI brain icons by Eklip Studio, evaluation icons by justicon, and inference icons by Freepik (all via Flaticon).
In the following, we will introduce the steps of the workflow to EM specialists as well as DL specialists. Please open the corresponding tab when reading.
In deep learning for electron microscopy (EM), the process of creating and optimizing models to address specific challenges within EM is known as development. This process is structured around three key steps: data preparation, model training and validation, and model evaluation.
These steps ensure deep learning models are effectively adapted for EM tasks, providing solutions specific to your lab's requirements.
Data preparation is a critical aspect of the deep learning pipeline. Recognizing that expertise in data collection and annotation primarily resides within EM labs, our workflow is designed to provide guidance for EM researchers to develop their own datasets in collaboration with DL experts.
While each use case in our workflow focuses on a primary task (e.g., counting objects in EM images), the workflow is flexible enough to allow you to swap the application area (e.g., quantifying mitochondria in EM images) without needing to modify the code—only the data needs to be replaced.
Data acquisition is the first step in creating a dataset. This involves gathering raw EM images, typically from various imaging modalities such as TEM, STEM, or SEM. As an EM expert, your role is to collect diverse and well-balanced datasets that cover a range of features relevant to the task. DL experts will support and guide you through the process if needed by providing the necessary information within their use case.
Annotating EM data is often the most time-consuming part of dataset preparation. Our workflow enables EM researchers to annotate their data using the Computer Vision Annotation Tool (CVAT), a user-friendly tool that simplifies this process.
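As a rough illustration of how such annotations might later be read in, the sketch below parses a COCO-format JSON file (one of CVAT's export options) and groups the annotations per image. This is a minimal sketch, not part of the playground's code, and the file name in the usage comment is hypothetical.

```python
import json
from collections import defaultdict

def load_coco_annotations(path):
    """Read a COCO-format JSON file (one of CVAT's export options) and
    return a mapping from image file name to its list of annotations."""
    with open(path) as f:
        coco = json.load(f)

    # Map internal image ids to file names, then group annotations by file name.
    images = {img["id"]: img["file_name"] for img in coco["images"]}
    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[images[ann["image_id"]]].append(ann)
    return dict(per_image)

# Example usage (hypothetical file name):
# annotations = load_coco_annotations("annotations/instances_default.json")
# print(len(annotations), "annotated images")
```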
Data preprocessing is essential for preparing your dataset for model training. It includes several important steps:
Correctly structuring your data allows you to adapt the use case application based on the provided training data. To do this, you will need to follow the data structure defined by the DL expert. For simplicity, DL experts are encouraged to organize the dataset into a single folder for training, validation, and testing when submitting their work to the playground, as data splitting can be handled during runtime. This approach simplifies the data structuring process for EM experts. Details will be documented within each use case individually.
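As an illustration of how a single data folder can be split at runtime, here is a minimal sketch; the folder name, file pattern, and split ratios are placeholders, and each use case documents its own conventions.

```python
import random
from pathlib import Path

def split_dataset(data_dir, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Split all image files found in one folder into train/val/test subsets
    at runtime, so no manual folder reorganisation is needed."""
    files = sorted(Path(data_dir).glob("*.tif"))  # adjust the pattern to your file type
    random.Random(seed).shuffle(files)

    n_val = int(len(files) * val_fraction)
    n_test = int(len(files) * test_fraction)
    return {
        "val": files[:n_val],
        "test": files[n_val:n_val + n_test],
        "train": files[n_val + n_test:],
    }

# Hypothetical usage:
# splits = split_dataset("data/my_use_case")
# print({name: len(items) for name, items in splits.items()})
```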
During training, the model learns patterns from the data by adjusting its internal parameters (weights) based on the input-output relationships. This process is guided by a loss function, which measures the error between predicted and true values (labels/annotations). The model is iteratively updated to minimize this error. Validation, on the other hand, involves evaluating the model’s performance on a separate set of data (the validation set) that it hasn't seen during training. This helps to check how well the model generalizes to new, unseen data and aids in detecting issues such as overfitting. The training and validation processes together ensure that the model is well-suited for the task at hand and can deliver reliable results in real-world applications.
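The following is a minimal PyTorch sketch of this training/validation cycle, not the playground's actual implementation; the loss function and data loaders are placeholders chosen for illustration.

```python
import torch
from torch import nn

def train_and_validate(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """Minimal training/validation loop: the training loss drives the weight
    updates, while the validation loss only monitors generalisation."""
    model = model.to(device)
    criterion = nn.MSELoss()  # placeholder loss: error between prediction and label
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()      # compute gradients
            optimizer.step()     # update weights
            train_loss += loss.item()

        model.eval()
        val_loss = 0.0
        with torch.no_grad():    # no weight updates on validation data
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                val_loss += criterion(model(images), labels).item()

        print(f"epoch {epoch + 1}: "
              f"train loss {train_loss / len(train_loader):.4f}, "
              f"val loss {val_loss / len(val_loader):.4f}")
```

A rising validation loss while the training loss keeps falling is the typical sign of overfitting mentioned above.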
Hyperparameter tuning is the process of selecting the best values for parameters that influence the model's performance but cannot be optimized during training. We offer an automated search to simplify this process. DL experts define a default search space for those who prefer not to engage with the technical details. If you are more experienced or willing to learn about the process, you can modify the search space as needed without code changes, simply by filling in a form. The DL expert provides explanations of the influence of each tunable parameter within the use case.
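As a simplified illustration of what such an automated search does behind the scenes, the sketch below performs a random search over a hypothetical search space. The tunable parameters, their ranges, and the search strategy are illustrative assumptions; the real ones are defined per use case.

```python
import random

# Illustrative search space; the actual tunable parameters and ranges are
# defined by the DL expert for each use case.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [4, 8, 16],
    "weight_decay": [0.0, 1e-5, 1e-4],
}

def sample_configuration(space, seed=None):
    """Draw one random configuration from the search space."""
    rng = random.Random(seed)
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(train_fn, space, n_trials=10):
    """Run n_trials short training runs ("sweeps") and keep the
    configuration with the lowest validation loss."""
    best_config, best_val_loss = None, float("inf")
    for trial in range(n_trials):
        config = sample_configuration(space, seed=trial)
        val_loss = train_fn(config)  # short training run returning validation loss
        if val_loss < best_val_loss:
            best_config, best_val_loss = config, val_loss
    return best_config, best_val_loss
```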
During model training, performance is continuously monitored through logging. This provides EM researchers with valuable insights into the training process, helping identify issues like overfitting or data biases, and learning about the process of model training.
For each execution of the full notebook, logs are saved in a dedicated directory (logs/data-current-datetime/). Within this directory, a subfolder is created for each training run: there can be multiple folders called Sweep_idx, containing the logs for each sweep run of the hyperparameter tuning; one subfolder TrainingRun, containing the logs of the full model training; and one subfolder Evaluate, containing the logging results of the evaluation. Each subfolder may contain the following logs:
| Attribute | Explanation | Directory |
|---|---|---|
| Hyperparameters | A record of the training hyperparameters. | logs/data-current-datetime/subfolder/hyperparameter.json |
| Model Checkpoints | Snapshots of the model at various stages, enabling you to resume training or use the best model for inference. | logs/data-current-datetime/subfolder/checkpoints |
| Training/Validation Loss Curves | Graphs showing the model's training progress over time. For a better understanding of training curves, see this guide. | logs/data-current-datetime/subfolder/plots |
| Qualitative Visualizations | Sample images alongside model predictions to assess visual accuracy. | logs/data-current-datetime/subfolder/samples (validation) or logs/data-current-datetime/Evaluate/samples (test) |
| Test Metrics | Quantitative performance measures on unseen test data. | logs/data-current-datetime/Evaluate/test_results.txt |
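For illustration, a training run could populate such a run directory roughly as follows. This is a sketch with hypothetical helper names, not the playground's actual logging code; only the directory and file names follow the layout described above.

```python
import json
from datetime import datetime
from pathlib import Path

import matplotlib.pyplot as plt
import torch

def create_log_dir(root="logs"):
    """Create a run directory named after the current date and time."""
    run_dir = Path(root) / datetime.now().strftime("%Y-%m-%d_%H-%M-%S") / "TrainingRun"
    (run_dir / "checkpoints").mkdir(parents=True, exist_ok=True)
    (run_dir / "plots").mkdir(parents=True, exist_ok=True)
    return run_dir

def log_run(run_dir, hyperparameters, model, train_losses, val_losses):
    """Persist hyperparameters, a model checkpoint, and the loss curves."""
    with open(run_dir / "hyperparameter.json", "w") as f:
        json.dump(hyperparameters, f, indent=2)

    torch.save(model.state_dict(), run_dir / "checkpoints" / "last.pt")

    plt.plot(train_losses, label="train loss")
    plt.plot(val_losses, label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(run_dir / "plots" / "loss_curves.png")
    plt.close()
```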
Model evaluation assesses whether the trained model meets the desired criteria and is ready for deployment or requires further refinement.
Typically, the most recent model is selected for evaluation, but you can also evaluate other models by providing the checkpoint path.
To help you assess the model's performance, the DL expert determines the specific evaluation metrics based on the model's goals. Each metric used is explained within the use case by the DL expert to provide a better understanding of the aspects of evaluation; this also helps EM experts identify possible shortcomings of the evaluation.
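As an example of what such an evaluation step might look like, the sketch below loads a checkpoint and computes a mean absolute error, a plausible metric for a counting task. The metric, paths, and function names are illustrative assumptions, not the playground's actual evaluation code.

```python
import torch

def evaluate(model, checkpoint_path, test_loader, results_file="test_results.txt", device="cpu"):
    """Load a trained checkpoint and report a simple test metric
    (mean absolute error, e.g. for a counting task)."""
    model.load_state_dict(torch.load(checkpoint_path, map_location=device))
    model = model.to(device)
    model.eval()

    total_error, n_samples = 0.0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images)
            total_error += (predictions - labels).abs().sum().item()
            n_samples += labels.numel()

    mae = total_error / n_samples
    # Write the result in the same spirit as the Evaluate/test_results.txt log.
    with open(results_file, "w") as f:
        f.write(f"Mean absolute error on test set: {mae:.4f}\n")
    return mae
```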
Please note that the DeepEM Playground provides a tool to bridge the gap between DL and EM experts, fostering improved research in this interdisciplinary field and demonstrating the power of deep learning. However, none of the trained models are flawless and should always be used with human oversight. We do not take responsibility for any irresponsible usage of the models or their predictions.
Inference is the process of using a thoroughly trained and tested deep learning model to make predictions on new, unseen data. It is the step in which the model finally supports the analysis of your EM data.
To perform inference with a trained model, you first need to define the data you want the model to make predictions on. This could be a specific set of EM data that you wish to analyze. The data can be provided as a single file for prediction or as a folder containing multiple files, allowing for automatic processing of all the included data. This flexibility ensures the model can handle both individual cases and larger datasets efficiently.
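To illustrate this flexibility, the sketch below accepts either a single file or a folder and runs a trained model on every image it finds. The supported file types and the preprocessing are assumptions that would be adapted to the specific use case; this is not the playground's actual inference code.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image

def collect_inputs(path):
    """Accept either a single image file or a folder of images."""
    path = Path(path)
    if path.is_dir():
        return sorted(p for p in path.iterdir()
                      if p.suffix.lower() in {".png", ".jpg", ".tif"})
    return [path]

def predict(model, path, device="cpu"):
    """Run the trained model on one file or on every file in a folder."""
    model = model.to(device).eval()
    results = {}
    with torch.no_grad():
        for image_path in collect_inputs(path):
            # Load as grayscale and scale to [0, 1]; adapt preprocessing to your use case.
            image = Image.open(image_path).convert("L")
            tensor = torch.from_numpy(np.array(image, dtype=np.float32) / 255.0)
            tensor = tensor.unsqueeze(0).unsqueeze(0)  # add channel and batch dimensions
            results[image_path.name] = model(tensor.to(device))
    return results

# Hypothetical usage:
# predictions = predict(trained_model, "new_data/")          # folder of images
# prediction = predict(trained_model, "new_data/cell.png")   # single file
```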
The EM specialist is responsible for selecting a previously trained model for inference. It is essential that the model has been evaluated thoroughly according to the evaluation criteria provided by the DL expert and that the evaluation results are promising. Only models with strong performance, as indicated by the evaluation metrics, should be used for making predictions, to ensure reliable and accurate results.
The model will be used to make predictions on the provided data. However, it's important to remember that no trained model is perfect, and human oversight remains essential. The results generated by the model should always be carefully checked for plausibility to ensure accuracy and reliability.