Pangeo ecosystem 101 for everyone

Published: Feb 18, 2022
Last Updated: Mar 1, 2022

Pangeo in a nutshell

A community platform for Big Data geoscience

Funders

NSF, EarthCube, NASA, Gordon and Betty Moore Foundation (Moore logo by Gordon and Betty Moore Foundation, own work, public domain)

Motivations

Several crises are building in the geoscience community:

- Big Data: datasets are growing too rapidly and legacy software tools for scientific analysis can’t handle them. This is a major obstacle to scientific progress.
- Technology Gap: a growing gap between the technological sophistication of industry solutions (high) and scientific software (low).
- Reproducibility: a fragmentation of software tools and environments renders most geoscience research effectively unreproducible and prone to failure.

Goals

Pangeo aims to address these challenges through a unified, collaborative effort.

The mission of Pangeo is to cultivate an ecosystem in which the next generation of open-source analysis tools for ocean, atmosphere and climate science can be developed, distributed, and sustained. These tools must be scalable in order to meet the current and future challenges of big data, and these solutions should leverage the existing expertise outside of the geoscience community.

The Pangeo Approach

[Figure: Pangeo approach. Source: Pangeo 2.0 by Ryan Abernathey, December 22, 2020.]

The Pangeo Software ecosystem

[Figure: Pangeo approach. Source: Pangeo Tutorial - Ocean Sciences 2020 by Ryan Abernathey, February 17, 2020.]

Pangeo Galaxy Tools

A growing number of tools are available to non-Python programmers

Xarray Galaxy tools:

Our objective is to bridge the gap between disciplines and to add tools on demand in support of cross-disciplinary research.

How to cite and support Pangeo

Learn more

Thank you!

This material is the result of collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network tutorial content is licensed under the Creative Commons Attribution 4.0 International License.