All Learning Resources

  • DataONE Data Management Module 06: Data Protection and Backups

    There are several important elements to digital preservation, including data protection, backup and archiving. In this lesson, these concepts are introduced and best practices are highlighted with case study examples of how things can go wrong. By exploring the logistical, technical and policy implications of data preservation, participants will, by the end of the module, be able to identify their preservation needs and be ready to implement good data preservation practices. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • DataONE Data Management Module 07: Metadata

    What is metadata? Metadata is data (or documentation) that describes and provides context for data, and it is everywhere around us. Metadata allows us to understand the details of a dataset, including where it was collected, how it was collected, what gaps in the data mean, what the units of measurement are, who collected the data, how it should be attributed, etc. By creating and providing good descriptive metadata for our own data, we enable others to efficiently discover and use the data products from our research. This lesson explores the importance of metadata to data authors, users of the data and organizations, and highlights the utility of metadata. It provides an overview of the different metadata standards that exist and the core elements common to them, guides users in selecting a metadata standard to work with, and introduces the best practices needed for writing a high-quality metadata record.
    This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise, handout, and supporting data files.

  • DataONE Data Management Module 08: Data Citation

    Data citation is a key practice that supports the recognition of data creation as a primary research output rather than as a mere byproduct of research. Providing reliable access to research data should be a routine practice, similar to the practice of linking researchers to bibliographic references. After completing this lesson, participants should be able to define data citation and describe its benefits; identify the roles of various actors in supporting data citation; recognize common metadata elements and persistent data locators, and describe the process for obtaining a persistent locator; and summarize best practices for supporting data citation. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • DataONE Data Management Module 09: Analysis and Workflows

    Understanding the types, processes, and frameworks of workflows and analyses helps researchers learn more about a piece of research: how it was created and what it may be used for. This lesson uses a subset of data analysis types to introduce reproducibility, iterative analysis, documentation, provenance and different types of processes. It describes in more detail the benefits of documenting and establishing informal (conceptual) and formal (executable) workflows. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • DataONE Data Management Module 10: Legal and Policy Issues

    Conversations regarding research data often intersect with questions related to ethical, legal, and policy issues for managing research data. This lesson will define copyrights, licenses, and waivers, discuss ownership and intellectual property, and describe some reasons for data restriction. After completing this lesson, participants will be able to identify ethical, legal, and policy considerations that surround the use and management of research data. The 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • NASA Earthdata Webinar Series

    Monthly webinars on discovery of and access to NASA Earth science data sets, services and tools. Webinars from 2013 to the present are archived on YouTube. Presenters are experts in different domains within NASA's Earth science research areas and are usually affiliated with NASA data centers and/or data archives. Titles for the current year's webinars are listed on the main page, and titles for each year are also listed on a separate page for that year. These webinars are available to assist those wishing to learn or teach how to obtain and view these data.

  • NASA Earthdata Video Tutorials

    Short video tutorials on topics related to available NASA EOSDIS data products, various types of data discovery, data access, and data tool demonstrations such as the Panoply tool for creating line plots. The videos are accessible on YouTube from the listing on the main webinars and tutorials page. These tutorials are available to assist those wishing to learn or teach how to obtain and view these data.

  • Transform and visualize data in R using the packages tidyr, dplyr and ggplot2: An EDI VTC Tutorial.

    The two tutorials, presented by Susanne Grossman-Clarke, demonstrate how to tidy data in R with the package “tidyr” and transform data with the package “dplyr”. These transformations prepare the data for visualization with the package “ggplot2”, both for data analysis and for scientific publications; examples of both were shown.

  • Introduction to code versioning and collaboration with Git and GitHub: An EDI VTC Tutorial.

    This tutorial is an introduction to code versioning and collaboration with Git and GitHub.  Tutorial goals are to help you:  

    • Understand basic Git concepts and terminology.
    • Apply concepts as Git commands to track versioning of a developing file.
    • Create a GitHub repository and push local content to it.
    • Clone a GitHub repository to the local workspace to begin developing.
    • Be inspired to incorporate Git and GitHub into your workflow.

    There are a number of exercises within the tutorial to help you apply the concepts learned.  
    Follow-up questions can be directed via email to Colin Smith and Susanne Grossman-Clarke.

  • 23 (research data) Things

    23 (research data) Things is self-directed learning for anybody who wants to know more about research data. Anyone can do 23 (research data) Things at any time. Do them all, do some, or cherry-pick the Things you need or want to know about. Do them on your own, or form a group and share the learning. The program is intended to be flexible, adaptable and fun!

    Each of the 23 Things offers a variety of learning opportunities with activities at three levels of complexity: ‘Getting started’, ‘Learn more’ and ‘Challenge me’. All resources used in the program are online and free to use.

  • Introduction to Data Documentation - DISL Data Management Metadata Training Webinar Series - Part 1

    Introduction to data documentation (metadata) for science datasets. Includes basic concepts about metadata and a few words about data accessibility. The video is about 23 minutes long.

  • NSIDC DAAC Data Recipes

    A collection of tutorials, called "data recipes", that describe how to use Earth science data from NASA's National Snow and Ice Data Center (NSIDC) with easily available tools and commonly used formats for Earth science data. These tutorials are available to assist those wishing to learn or teach how to obtain and view these data.

  • Why Cite Data?

    This video explains what data citation is and why it's important. It also discusses what digital object identifiers (DOIs) are and how they are used.

  • MANTRA Research Data Management Training

    MANTRA is a free, online, non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It was designed for postgraduate students, early career researchers, and information professionals. It is freely available on the web for anyone to explore on their own.

    Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.

    There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:

    • Understand the nature of research data in a variety of disciplinary settings
    • Create a data management plan and apply it from the start to the finish of your research project
    • Name, organise, and version your data files effectively
    • Gain familiarity with different kinds of data formats and know how and when to transform your data
    • Document your data well for yourself and others, learn about metadata standards and cite data properly
    • Know how to store and transport your data safely and securely (backup and encryption)
    • Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
    • Understand the benefits of sharing, preserving and licensing data for re-use
    • Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS

  • OntoSoft Tutorial: A distributed semantic registry for scientific software

    An overview of the OntoSoft project, an intelligent system to assist scientists in making their software more discoverable and reusable.

    For more information on the OntoSoft project, go to

  • An overview of the EDI data repository and data portal

    The Environmental Data Initiative (EDI) data repository is a metadata-driven archive for environmental and ecological research data described with the Ecological Metadata Language (EML). This webinar provides an overview of the PASTA software used by the repository and demonstrates the essentials of uploading a data package to the repository through the EDI Data Portal.

  • FAIR Self-Assessment Tool

    The FAIR Data Principles are a set of guiding principles for making data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). Using this tool, you will be able to assess the 'FAIRness' of a dataset and determine how to enhance its FAIRness (where applicable).

    This self-assessment tool has been designed primarily for data librarians and IT staff, but it could also be used by software engineers developing FAIR data tools and services, and by researchers, provided they have assistance from research support staff.

    You will be asked questions related to the principles underpinning Findable, Accessible, Interoperable and Reusable. Once you have answered all the questions in each section you will be given a ‘green bar’ indicator based on your answers in that section, and when all sections are completed, an overall 'FAIRness' indicator is provided.

  • Webinar: Jupyter as a Gateway for Scientific Collaboration and Education

    Project Jupyter, which evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism, and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing, building an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, in a live iteration with ideas and data assisted by the computer.

    The Jupyter Notebook, a system that allows users to compose rich documents that combine narrative text and mathematics together with live code and the output of computations in any format compatible with a web browser (plots, animations, audio, video, etc.), provides a foundation for scientific collaboration. The next generation of the Jupyter web interface, JupyterLab, will combine in a single user interface not only the notebook but multiple other tools to access Jupyter services and remote computational resources and data.  A flexible and responsive UI allows the user to mix Notebooks, terminals, text editors, graphical consoles and more, presenting in a single, unified environment the tools needed to work with a remote environment.  Furthermore, the entire design is extensible and based on plugins that interoperate via open APIs, making it possible to design new plugins tailored to specific types of data or user needs.

    JupyterHub enables Jupyter Notebook and JupyterLab to be used by groups of users for research collaboration and education. We believe JupyterHub provides a foundation on which to build modern scientific gateways that support a wide range of user scenarios, from interactive data exploration in high-level languages like Python, Julia or R, to the education of researchers and students whose work relies on traditional HPC resources.

    The presenter discusses the benefits and applications of Jupyter Notebooks.

    Scroll to the bottom of the page to view the webinar. Presentation slides are also available on the same page. 

  • Seismic Data Quality Assurance Using IRIS MUSTANG Metrics

    Seismic data quality assurance involves reviewing data in order to identify and resolve problems that limit the use of the data, a time-consuming task for large data volumes! Additionally, no two analysts review seismic data in quite the same way. Recognizing this, IRIS developed the MUSTANG automated seismic data quality metrics system to provide data quality measurements for all data archived at IRIS Data Services. Knowing how to use MUSTANG metrics can help users quickly distinguish usable data from problematic data, and the system is flexible enough for each user to adapt it to their own working style.
    This tutorial presents strategies for using MUSTANG metrics to optimize your own data quality review. Many of the examples in this tutorial illustrate approaches used by the IRIS Data Services Quality Assurance (QA) staff.

  • R Class for Seismologists

    The IRIS Data Management Center (DMC) archives and distributes data to support the seismological research community. The class described here introduces DMC staff and other seismologists to the R statistical programming language and its use with seismological data available from DMC web services. It demonstrates the capabilities of the seismicRoll, IRISSeismic and IRISMustangMetrics packages developed as part of the MUSTANG project.
    Class materials are broken up into nine separate lessons that assume some coding experience but not necessarily any familiarity with R. Lessons are presented in sequential order and assume the student already has R and RStudio installed on their computer. Self-learners new to R should expect to spend about 20-30 hours completing the course. The target audience for these materials is IRIS DMC employees or graduate students with a degree in the natural sciences and some experience using scientific software such as MATLAB or Python.

  • Research Lifecycle at University of Central Florida

    Short video discussing the "Research Lifecycle at University of Central Florida," a useful diagram for understanding the typical flow of a research project.

  • Research Data Management Community Training

    Good research data management (RDM) is of great importance for high-quality research. Implementing professional research data management from the start helps to avoid problems in the data creation and curation phases. Topics include:

    • Definition(s) of RDM
    • Benefits and Advantages of RDM
    • Research Data Life-Cycle
    • Structure and components of RDM
    • Stakeholders
    • Recommended literature

  • Access Policies and Usage Regulations: Licenses

    This webinar on licensing and policy explains why it is important for research data to be provided with licenses. Topics include:

    • Benefits of sharing research data
    • Challenges
    • Types of licenses
    • Data ownership and reuse
    • Using creative commons in archiving research data

    During the workshop, participants will acquire a basic knowledge of data licensing.

  • Best Practices for Biomedical Research Data Management

    This course presents approximately 20 hours of content aimed at a broad audience on recommended practices facilitating the discoverability, access, integrity, reuse value, privacy, security, and long-term preservation of biomedical research data.

    Each of the nine modules is dedicated to a specific component of data management best practices and includes video lectures, presentation slides, readings & resources, research teaching cases, interactive activities, and concept quizzes.

    Background Statement:
    Biomedical research today must be not only rigorous, innovative and insightful, but also organized and reproducible. With more capacity to create and store data comes the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers are requiring publication of relevant data to promote open science and reproducibility of research.

    To meet these requirements and evolving trends, researchers and information professionals will need the data management and curation knowledge and skills to support the access, reuse and preservation of data.

    This course is designed to address present and future data management needs.

    Best Practices for Biomedical Research Data Management serves as an introductory course for information professionals and scientific researchers to the field of scientific data management.

    In this course, learners will explore relationships between libraries and stakeholders seeking support for managing their research data. 

  • Introduction to R

    Learn the basics of reproducible workflows in R using a USGS National Water Information System (NWIS) dataset.

    Instructions for accessing the dataset are provided within the tutorial.

  • ISRIC Spring School

    The ISRIC Spring School aims to introduce participants to world soils, soil databases, software for soil data analysis and visualisation, digital soil mapping and soil web services through two 5-day courses run in parallel. Target audiences for the Spring School include soil and environmental scientists involved in (digital) soil mapping and soil information production at regional, national and continental scales; soil experts and professionals in natural resources management and planning; and soil science students at MSc and PhD level. Example courses include "World Soils and their Assessment" (WSA) and "Hands-on Global Soil Information Facilities" (GSIF). Data management topics are covered within the courses.

  • FAIRification Process

    The FAIR Data Principles apply to metadata, data, and supporting infrastructure (e.g., search engines). Most of the requirements for findability and accessibility can be achieved at the metadata level. Interoperability and reuse require more effort at the data level. The scheme discussed in the resource depicts the FAIRification process adopted by GO FAIR, focusing on data, but also indicating the required work for metadata.

    For information about GO FAIR Metrics, go to and for GO FAIR Metadata, go to .


  • Clean your taxonomy data with the taxonomyCleanr R package

    Taxonomic data can be messy and challenging to work with. Incorrect spelling and the use of common names, unaccepted names, and synonyms contribute to ambiguity about what a taxon actually is. The taxonomyCleanr R package helps you resolve taxonomic data to a taxonomic authority, get accepted names and taxonomic serial numbers, and create metadata for your taxa in the Ecological Metadata Language (EML) format.

  • Postgres, EML and R in a data management workflow

    Metadata storage and creation of Ecological Metadata Language (EML) can be a challenge for people and organizations who want to archive their data. A workflow was developed to combine efficient EML record generation (using the package developed by the R community) with centrally-controlled metadata in a relational database. The webinar has two components: 1) a demonstration of metadata storage and management using a relational database, and 2) discussion of an example EML file generation workflow using pre-defined R functions.


  • Experimental Design Assistant

    The Experimental Design Assistant (EDA) is a free web-based tool developed by the NC3Rs. It guides researchers through the design and analysis of in vivo experiments. The EDA allows users to build a stepwise visual representation of their experiment, providing feedback and dedicated support for randomization, blinding and sample size calculation. This demonstration provides an introduction to the tool and guidance on getting started. Ultimately, the use of a tool such as the EDA will lead to carefully designed experiments that yield robust and reproducible data using the minimum number of animals consistent with scientific objectives.

  • Open Science and Innovation

    This course helps you to understand open business models and responsible research and innovation (RRI) and illustrates how these can foster innovation. By the end of the course, you will:

    • Understand key concepts and values of open business models and responsible research and innovation
    • Know how to plan your innovation activities
    • Be able to use Creative Commons licenses in business
    • Understand new technology transfer policies in line with the ethos of Open Science
    • Know how to get things to market faster

  • Data Management using NEON Small Mammal Data

    Undergraduate STEM students are graduating into professions that require them to manage and work with data at many points of a data management lifecycle. Within ecology, students are presented not only with many opportunities to collect data themselves but increasingly to access and use public data collected by others. This activity introduces the basic concept of data management from the field through to data analysis. The accompanying presentation materials mention the importance of considering long-term data storage and data analysis using public data.

    Content page:

  • Coffee and Code: Write Once Use Everywhere (Pandoc)

    Pandoc is a document processing program that runs on multiple operating systems (Mac, Windows, Linux) and can read and write a wide variety of file formats. In many respects, Pandoc can be thought of as a universal translator for documents. This workshop focuses on a subset of input and output document types, just scratching the surface of the transformations made possible by Pandoc.

    Click 00-Overview.ipynb on the provided GitHub page or go directly to the overview, here:

  • Coffee and Code: Introduction to Version Control

    This is a tutorial about version control, also known as revision control, a method for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.

    Also see Advanced Version Control, here:

  • Open Licensing

    Licensing your research outputs is an important part of practicing Open Science. After completing this course, you will:

    • Know what licenses are, how they work, and how to apply them 
    • Understand how different types of licenses can affect research output reuse
    • Know how to select the appropriate license for your research

  • Singularity User Guide

    Singularity is a container solution created out of necessity for scientific and application-driven workloads.

    Over the past decade and a half, virtualization has gone from an engineering toy to a global infrastructure necessity, and the enabling technologies have flourished. Most recently, we have seen the introduction of the latest spin on virtualization: “containers”.

    Many scientists, especially those in the high performance computing (HPC) community, could benefit greatly from container technology, but they need a feature set that differs somewhat from that available with current container technology. This need drove the creation of Singularity and the articulation of its four primary functions:

    • Mobility of compute
    • Reproducibility
    • User freedom
    • Support for existing traditional HPC resources

    This user guide introduces Singularity, a free, cross-platform and open-source computer program that performs operating-system-level virtualization, also known as containerization.

  • Make EML with R and share on GitHub

    Introduction to the Ecological Metadata Language (EML). Topics include:

    • Use R to build EML for a mock dataset
    • Validate EML and write to file
    • Install Git and configure it to track file versions in RStudio
    • Set up GitHub account and repository
    • Push local content to GitHub for sharing and collaboration

    Access the rendered version of this tutorial here:

  • Tutorial: DataCite Linking

    This tutorial walks users through the simple process of creating a workflow in the OpenMinTeD platform that allows them to extract links to DataCite (mainly citations to datasets) from scientific publications.

  • Florilege, a new database of habitats and phenotypes of food microbe flora

    This tutorial explains how to use the “Habitat-Phenotype Relation Extractor for Microbes” application available from the OpenMinTeD platform. It also explains the scientific issues it addresses, and how the results of the text and data mining (TDM) process can be queried and exploited by researchers through the Florilège application.

    In recent years, developments in molecular technologies have led to an exponential growth of experimental data and publications, many of which are open but accessible only separately. It is therefore now crucial for researchers to have bioinformatics infrastructures at their disposal that provide unified access to both data and related scientific articles. With the right text mining infrastructures and tools, application developers and data managers can rapidly access and process textual data, link them with other data and make the results available for scientists.

    The text-mining process behind Florilege has been set up by INRA using the OpenMinTeD environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type, such as “habitat” or “taxon”.

    Sections of the tutorial:
    1. Biological motivation of the Florilege database
    2. Florilège Use-Case on OpenMinTeD (includes a description of how to access the Habitat-Phenotype Relation Extractor for Microbes application)
    3. Florilege backstage: how is it built?
    4. Florilège description
    5. How to use Florilege?


  • Best Practice in Open Research

    This course introduces some practical steps toward making your research more open. We begin by exploring the practical implications of open research, and the benefits it can deliver for research integrity and public trust, as well as the benefits you will accrue in your own work. After a short elaboration of some useful rules of thumb, we move quickly on to some more practical steps towards meeting contemporary best practice in open research and introduce some useful discipline-specific resources. Upon completing this course, you will:

    • Understand the practical implications of taking a more open approach to research
    • Be prepared to meet expectations relating to openness from funders, publishers, and peers 
    • Be able to reap the benefits of working openly
    • Have an understanding of the guiding principles to follow when building openness into your research workflow
    • Know about some useful tools and resources to help you embed Open Science into your research practices