REMBI - Recommended Metadata for Biological Images – metadata guidelines for bioimaging data

Author(s)	Julia Jakiela, Laura Cooper, Katarzyna Kamieniecka, Krzysztof Poterlowicz

Overview
Questions:

What is REMBI and why should I use it?

What information should be included when collecting bioimage data?

Objectives:

Organise bioimage metadata

Find out what REMBI is and why it is useful

Categorise what metadata belongs to each of the submodules of REMBI

Gather the metadata for an example bioimage dataset

Requirements:

Hands-on: FAIR in a nutshell

Hands-on: FAIR data management solutions

Hands-on: FAIR Bioimage Metadata

Time estimation: 15 minutes

Published: Sep 13, 2023

Last modification: Oct 15, 2024

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00361

Revision: 6

Metadata guidelines for bioimaging data

REMBI (Recommended Metadata for Biological Images) was proposed as a set of draft metadata guidelines to begin addressing the needs of diverse communities within light and electron microscopy. The guidelines are currently in draft form to encourage discussion within the community, but they already provide a useful guide to what metadata should be gathered to make your image data FAIR. They divide the metadata requirements into eight modules, each of which is further split into attributes. That may seem like a daunting task, but it is also exciting news for the community! To find out more, have a look at the REMBI article.

Question

In the REMBI paper, the authors consider three potential user groups who require different metadata. Find out what these three groups are and what metadata each of them requires.

The three user groups identified are biologists, imaging scientists and computer-vision researchers.

A research biologist may be interested in the biological sample that has been imaged to compare it to similar samples that they are working with.

An imaging scientist may be interested in how the image was acquired so they can improve upon current image acquisition techniques.

A computer-vision researcher may be interested in annotated ground-truth segmentations that can be obtained from the image, so they can develop faster and more accurate algorithms.

If you’re an instructor leading this training, you might ask people to work in small groups for this exercise and encourage discussion. Ask group members to share which of the user groups they identify with and what metadata they would want.

Categories of metadata

REMBI organises metadata into eight modules:

study
study component
biosample
specimen
image acquisition
image data
image correlation
analysed data

Within each module, there are attributes that should be included to make the published data FAIR. We will explore all the modules and attributes suggested by REMBI and we’ll show some examples as well.

Study

The first module of REMBI metadata describes the Study and should include:

Study type
Study description
General dataset information

Study type

Ideally, the study type will be a term from an ontology. You can look up the main subject of your study using a tool like the Ontology Lookup Service (OLS) to find a suitable ontology term. This will help others to see where your study sits within the wider research area.

Comment: Example

Study type	Regulation of mitotic cell division

Study description

The Study Description gives a brief account of the project. It should include the title of the study, a short description, and any related publication details such as authors, title and DOI. If you are gathering metadata before publication, you can fill in the publication details later, or enter a draft title or the name of the journal you plan to submit to. It’s still a good idea to include the field now, so you don’t forget to complete it.

Comment: Example

Study description
Title	Imaging mitotic cells
Description	Visualising HeLa cells using confocal microscopy
Publication details	TBC

General dataset information

This should include the information that applies to all the data in the project, such as the names of contributors, the repository where the data is or will be stored, the licence under which you intend to make the data available, and any schema you are using to structure your metadata. Recording these up front helps to keep all collaborators on the same page. Any other general information about the study can be included here, but keep it broad: more detailed information belongs in the other modules.

Comment: Example

General Dataset Information
Contributors	Alice and Bob
Repository	BioImage Archive
Licenses	CC-BY
Schemas	DataCite Metadata
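
As a rough illustration, the Study module from the examples above could be kept as a small machine-readable record. This is only a sketch: REMBI does not prescribe a file format, and the key names and the helper `missing_study_fields` are hypothetical choices made for this example.

```python
import json

# Hypothetical key names; REMBI does not prescribe a format or key spelling.
study = {
    "study_type": "Regulation of mitotic cell division",
    "study_description": {
        "title": "Imaging mitotic cells",
        "description": "Visualising HeLa cells using confocal microscopy",
        "publication_details": "TBC",  # fill in once the paper is out
    },
    "general_dataset_information": {
        "contributors": ["Alice", "Bob"],
        "repository": "BioImage Archive",
        "licence": "CC-BY",
        "schemas": ["DataCite Metadata"],
    },
}


def missing_study_fields(record):
    """Return the top-level Study attributes that are absent or empty."""
    required = ("study_type", "study_description", "general_dataset_information")
    return [name for name in required if not record.get(name)]


# Serialise the record so it can be saved next to the data.
study_json = json.dumps(study, indent=2)
```

Keeping the record as JSON (or a similar plain-text format) makes it easy to share with collaborators and to check automatically before submission.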

Study component

A study component can be thought of as an experiment, both the physical experiment and subsequent data analysis, or a series of experiments that have been conducted with the same aim in mind.

The associated metadata should describe the imaging method used and include a description of the image dataset. The REMBI guidelines store high-level metadata in the study component and then divide the more detailed metadata into other modules.

Within the Study component we include the Imaging Method, which should describe the techniques used to acquire the raw data. This could be one or multiple methods, each of which should be a term from a relevant ontology. For confocal microscopy data, we can take the term from the Biological Imaging Methods Ontology, although the same term is also present in a number of other ontologies.

The description of the study component should include an overview of what was imaged as well as any processed data that is created during analysis.

Comment: Example

Imaging Method	Confocal Microscopy
Study Component Description	Images of cells and segmented binary masks

You can either store this metadata in the same file as your study metadata or create a new file for each study component. A separate file can sit in the same place as your study metadata, or you can create a subdirectory structure.
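
One possible layout, sketched below, gives each study component its own subdirectory containing a metadata file. The directory name `confocal_imaging`, the file name `metadata.json` and the function `write_component_metadata` are all assumptions made for illustration, not part of REMBI.

```python
import json
import tempfile
from pathlib import Path


def write_component_metadata(project_dir, components):
    """Write one metadata.json per study component into its own subdirectory."""
    written = []
    for name, metadata in components.items():
        component_dir = Path(project_dir) / name
        component_dir.mkdir(parents=True, exist_ok=True)
        target = component_dir / "metadata.json"
        target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
        written.append(target)
    return written


# Hypothetical component name; the values come from the example above.
components = {
    "confocal_imaging": {
        "imaging_method": "Confocal Microscopy",
        "study_component_description": "Images of cells and segmented binary masks",
    },
}

# A temporary directory stands in for the real project folder.
with tempfile.TemporaryDirectory() as project_dir:
    paths = write_component_metadata(project_dir, components)
    relative_paths = [p.relative_to(project_dir).as_posix() for p in paths]
    reloaded = json.loads(paths[0].read_text(encoding="utf-8"))
```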

Biosample

The first thing you need for the biosample metadata is an Identity. This is a code that you assign to each sample you are describing, which will link this metadata to the physical sample. Then, state what the biological entity is, which should come from a relevant ontology. Use a taxonomy to name the organism. Next, describe the variables in your experiment. The REMBI guidelines split the variables into three types:

intrinsic - describe an innate trait of the biosample, such as a genetic alteration
extrinsic - describe something you added to the sample, for example, a reagent
experimental - things that you intentionally vary, like time

You can leave out some of the variables if they are not part of your experiment.

Comment: Example

Identity	CM001
Biological entity	JURKAT E-6.1 cell
Organism	Homo sapiens
Intrinsic variable	Jurkat E6.1 transfected with emerald-VAMP7
Extrinsic variable	Aspirin
Experimental variables	Dose response of aspirin
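
The biosample attributes map naturally onto a small record type in which the three kinds of variable are optional. The sketch below assumes a Python dataclass with hypothetical field names; the second sample, `CM002`, is invented purely to show a record with no variables.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Biosample:
    identity: str           # code linking the metadata to the physical sample
    biological_entity: str  # ideally a term from a relevant ontology
    organism: str           # named using a taxonomy
    # Variables default to empty lists because they may not apply to every experiment.
    intrinsic_variables: list = field(default_factory=list)
    extrinsic_variables: list = field(default_factory=list)
    experimental_variables: list = field(default_factory=list)


sample = Biosample(
    identity="CM001",
    biological_entity="JURKAT E-6.1 cell",
    organism="Homo sapiens",
    intrinsic_variables=["Jurkat E6.1 transfected with emerald-VAMP7"],
    extrinsic_variables=["Aspirin"],
    experimental_variables=["Dose response of aspirin"],
)

# Hypothetical second sample with no variables recorded.
untreated = Biosample("CM002", "JURKAT E-6.1 cell", "Homo sapiens")

record = asdict(sample)  # plain dict, ready to be written to a file
```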

Specimen

The specimen metadata should include:

the experimental status (control or test)
the location within the biosample, such as a coordinate or a particular well in a plate
how the sample was prepared
how the signal is being generated
the content and biological entities of different channels.

Include enough information so that someone with experience in the field could reproduce a sample by following the information you provided. Assume they would know typical techniques and name them using terms from an ontology if possible. Only include lots of detail if you are describing a novel technique.

Comment: Example

Experimental status	Control
Location within biosample	Plasma membrane within 100 nm of coverslip (TIRF)
Preparation method	Cos-7 cells cultured in DMEM medium, and then plated on #1 coverslips and imaged live in L-15 medium
Signal/contrast mechanism	fluorescent proteins
Channel – content	Green: eGFP, Red: mCherry
Channel – biological entity	Green: EGFR, Red: Src

Image acquisition

Here you should include all the information about the instrument you used and how it was set up. Like with the specimen metadata, describe this information as though you are speaking to someone who already knows how to use a similar instrument. What would they need to know to produce the same image data?

Check with your facility manager whether they have any guidelines on what details need to be recorded for your particular instrument. Make sure that the parameters you record can actually be used by someone else, even if they don’t have exactly the same instrument or setup. For example, don’t record only that you used a certain percentage of laser power: this doesn’t tell the reader how much power was used unless you also provide the total power of the laser. If the instrument software has automatically generated a metadata file, remember to save it. Depending on its content, this may be sufficient.
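
To make the laser power point concrete, here is a minimal sketch of the arithmetic. The 20 mW maximum and the 5 % setting are made-up numbers, and the result is only the nominal output power; the power reaching the sample also depends on the optical path.

```python
def nominal_laser_power_mw(max_power_mw, percent_setting):
    """Convert a percentage setting into nominal output power in milliwatts."""
    if not 0 <= percent_setting <= 100:
        raise ValueError("percent_setting must be between 0 and 100")
    return max_power_mw * percent_setting / 100


# Hypothetical values: a 20 mW laser run at 5 % gives 1 mW nominal power.
power_mw = nominal_laser_power_mw(20.0, 5.0)
```

Recording both the percentage and the total power (or the computed value) lets someone with a different laser reproduce the same illumination.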

Start with the details of the equipment for the Instrument Attributes. If this is commercial equipment, include the make and model, a short description of what type of instrument it is and details about its configuration. If the instrument is bespoke, you will need to include more details. Next, you should include image acquisition parameters. These relate to how the instrument was set up for the particular experiment. Some of these may be captured automatically by the instrument’s software, so make things easy for yourself and check if a file is generated and what’s in it. If a file is generated, then you only need to manually record anything that is missing from the file.

Comment: Example

Instrument attributes	Olympus FV3000, laser point scanning confocal, 500-550 nm filter, 37 °C chamber.
Image acquisition parameters
Objective	20x
Excitation Wavelength	488 nm
PMT gain	500 V
Pixel dwell time	2 µs
Confocal aperture	200 µm
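
If the instrument software writes a metadata file, a short check can show which acquisition parameters still need to be recorded by hand. The sketch below assumes the file has already been parsed into a dictionary; the key names and the contents of `auto_captured` are hypothetical.

```python
# Hypothetical parameter names, based on the example table above.
REQUIRED_PARAMETERS = (
    "objective",
    "excitation_wavelength_nm",
    "pmt_gain_v",
    "pixel_dwell_time_us",
    "confocal_aperture_um",
)


def parameters_to_record_manually(auto_captured, required=REQUIRED_PARAMETERS):
    """Return required parameters that the instrument file did not capture."""
    return [name for name in required if auto_captured.get(name) in (None, "")]


# Pretend the instrument software captured only these three values.
auto_captured = {
    "objective": "20x",
    "excitation_wavelength_nm": 488,
    "pixel_dwell_time_us": 2,
}

todo = parameters_to_record_manually(auto_captured)
```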

To help you collect this information for your own data, have a look at the resources provided by your institution. For example, the University of Warwick has webpages describing the metadata that needs to be collected for some of its microscopes.

Image data

In this section, you record the information related to all the images you have: not only the primary or raw images, but also any processed images, such as binary masks showing the resulting segmentation.

You need to say what format the images are in and if they have undergone any compression, the dimensions of the images, and what the physical size of the pixel or voxel is, including the units. Most of this information you should be able to get from the metadata or header of the image files.

Next, you need to state the physical size of the image or magnification, calculated from the pixel or voxel size and the dimension extents. Give any information related to how the channels are represented. For processed images, you need to provide the methods used for processing.

Finally, say whether you have used contrast inversion: do the bright features in the image correspond to areas of high signal, or is it the other way around?

Comment: Example

Type	Primary Image, Segmentation
Format and compression	Primary: .oir (Olympus), Segmentation: .tiff
Dimension extents	x: 512, y: 512, z: 25
Size description	153.6 x 153.6 x 2.5 µm
Pixel/Voxel size description	0.3 x 0.3 x 0.1 µm
Image processing method	Fiji: Median filter (3 pixel kernel), Otsu threshold
Contrast inversion	No
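
The physical size of an image follows directly from the dimension extents and the pixel or voxel size, so it is worth computing rather than typing by hand. The sketch below uses the extents (512, 512, 25) and voxel sizes (0.3, 0.3, 0.1 µm) from the example; the function name and dictionary layout are assumptions.

```python
def physical_size_um(extents, voxel_size_um):
    """Multiply each dimension extent by its voxel size (in micrometres)."""
    return {
        axis: round(extents[axis] * voxel_size_um[axis], 6)  # rounding hides float noise
        for axis in extents
    }


extents = {"x": 512, "y": 512, "z": 25}
voxel_size_um = {"x": 0.3, "y": 0.3, "z": 0.1}

size_um = physical_size_um(extents, voxel_size_um)
```

With these inputs the stack is 512 × 0.3 = 153.6 µm wide in x and y, and 25 × 0.1 = 2.5 µm deep in z.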

Image correlation

If you have used different imaging modalities with the same sample, this part of the metadata should describe how the images relate to one another. You could use this section to describe generally the relationship between images. In the example below, images from different modalities have been aligned.

Comment: Example

Spatial and temporal alignment	Manual
Fiducials used	Soil grains
Transformation matrix	See file: Transforms.csv
Size description	153.6 x 153.6 x 25 µm
Related images and relationship	Primary XCT: Data/XCT; Primary XRF: Data/XRF; Processed XRF: Data/Transformed_XRF
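
Storing the transformation matrix in a file means others can re-apply the alignment. The sketch below assumes, purely for illustration, that the file holds a 3 x 3 homogeneous matrix for a 2D transform, one row per line; the actual layout of Transforms.csv is not specified here, and the matrix values are invented.

```python
import csv
import io


def load_affine(csv_text):
    """Parse a 3x3 matrix from CSV text (assumed layout: one row per line)."""
    rows = [[float(value) for value in row] for row in csv.reader(io.StringIO(csv_text)) if row]
    if len(rows) != 3 or any(len(row) != 3 for row in rows):
        raise ValueError("expected a 3x3 matrix")
    return rows


def apply_affine(matrix, x, y):
    """Map a point (x, y) through a homogeneous 2D transformation matrix."""
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return (xh / w, yh / w)


# Invented example: a pure translation by (+10, -5).
example_csv = "1,0,10\n0,1,-5\n0,0,1\n"
matrix = load_affine(example_csv)
mapped = apply_affine(matrix, 100.0, 200.0)
```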

Analysed data

This section should not include metadata for any image data, including processed images, as that should have been covered in the Image Data section. Instead, it should describe the analysis results you have, such as measurements. Have you done some numerical analysis or some phenotyping or something else? There is no need to describe the methods in great detail if they are already described in the relevant publication.

Comment: Example

Analysis results type	Speed of cell division
Data used for analysis	Preprocessed images, Cell tracks
Analysis method	Track cell lineage: BayesianTracker (btrack) with configuration track_config.json; Measure speed: Numerical analysis in Python

Final notes

For more examples, check out the REMBI Supplementary Information, available as either a PDF or a spreadsheet.

At first glance, collecting all that metadata might seem like quite a stretch! But don’t get discouraged: following these guidelines will ensure better communication between scientists and will make your research FAIR (Findable, Accessible, Interoperable, Reusable). In the era of big data, when we are surrounded by so many resources, it’s crucial to develop good data management habits and share them with others, and so contribute together to the development of science.

Key points

REMBI provides useful guidelines for bioimaging metadata that can help with the unification and FAIRification of data.

Citing this Tutorial

Julia Jakiela, Laura Cooper, Katarzyna Kamieniecka, Krzysztof Poterlowicz, REMBI - Recommended Metadata for Biological Images – metadata guidelines for bioimaging data (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/fair/tutorials/bioimage-REMBI/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{fair-bioimage-REMBI,
author = "Julia Jakiela and Laura Cooper and Katarzyna Kamieniecka and Krzysztof Poterlowicz",
	title = "REMBI - Recommended Metadata for Biological Images – metadata guidelines for bioimaging data (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/fair/tutorials/bioimage-REMBI/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Funding

These individuals or organisations provided funding support for the development of this resource

DASH UK

This Fellowship was funded through the ELIXIR-UK DaSH project as part of the UKRI Innovation Scholars: Data Science Training in Health and Bioscience call (DaSH). (MR/V038966/1). The project aims to embed Research Data Management (RDM) know-how into UK universities and institutes by producing and delivering training in FAIR data stewardship using ELIXIR-UK knowledge and resources.

ELIXIR Europe


