All Learning Resources

  • OSTI Lecture 3: Data, Code and Content Licensing

    The following video is an original recording from the OSTI pilot initiative. Part 3 of the Graduate Training in Open Science series entitled "Data, Code and Content Licensing", the seminar introduces the theme of licensing, outlines the advantages of this approach, and takes students through the main steps of implementation.

    The Open Science Training Initiative provides a series of lectures in open science, data management, licensing and reproducibility, for use with graduate students and postdoctoral researchers. The lectures can be used individually as one-off information lectures in aspects of open science or can be integrated into existing course provision at your institution as a lecture series, forming part of a hands-on exercise in producing a coherent research story.
    The raw materials have already been released online in the GitHub repository.

  • OSTI Lecture 4: Data Management Planning

    What is a Data Management Plan? Why do we need them and how do they relate to our day-to-day research work? This short lecture, Part 4 of the Graduate Training in Open Science series, introduces the concept of the DMP, set within the context of the scope and scale of data produced in modern scientific research, and breaks the how-to process down into short-, medium- and long-term project management stages. Please note that, as a result of student feedback and the analysis undertaken for the Post-Pilot Report, this lecture will be offered as a mini-workshop in the release of the official material.

    The Open Science Training Initiative provides a series of lectures in open science, data management, licensing and reproducibility, for use with graduate students and postdoctoral researchers. The lectures can be used individually as one-off information lectures in aspects of open science or can be integrated into existing course provision at your institution as a lecture series, forming part of a hands-on exercise in producing a coherent research story.
    The raw materials have already been released online in the GitHub repository.

  • Introduction To Databases And WoSIS

    The module introduces databases and general soil database design, outlines problems of data standardization and harmonization, and concludes with practical sessions on these issues.
    -Introduction to relational and spatial databases
    -Introduction to soil data modeling in databases
    -World Soil Information Service (WoSIS) database structure
    -Practical for data access query and manipulation
    -Enable users to understand database usage, design, and access
    -Enable users to understand how soil data is modeled in databases

  • Data: Changing Paradigms for Data Management, Publication and Sharing #1

    In this webinar, part of the Research Data Information Integration series, Professor William Michener provides a historical overview of data management and data sharing, focusing on lessons learned from past and emerging large ecological and environmental research programs. He reviews some of the current impediments to data management, publication, and sharing, and discusses solutions to these challenges, including tools that support management of data throughout the data lifecycle from planning through analysis. He also explores new approaches to publishing and sharing data, such as the Dryad digital repository and DataONE, and glimpses a future where informatics can better enable science, highlighting activities underway to change the scientific culture (e.g., altmetrics, semantic annotation, and provenance tracking).


  • Web Tutorials - Using the GLOBE Website

    The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international science and education program that provides students and the public worldwide with the opportunity to participate in data collection and the scientific process and contribute meaningfully to our understanding of the Earth system and global environment.
    The tutorials on this page help you understand and work with the various parts of the GLOBE website. Each area is listed on the left; click on an item to see a video and/or to download step-by-step guides.
    Tutorials cover the main areas of the site, including:
    -GLOBE Data User Guide, to help scientists and researchers understand, access, and use available GLOBE data.
    -GLOBE Data Fundamentals, a webinar that allows one to understand how to access GLOBE data and includes a demonstration of the various tools.
    -The main GLOBE website, which allows you to engage in The GLOBE Program, as well as to collaborate and access training and educational resources.
    -The Data Entry System, which allows you to enter scientific data that school organizations have collected.
    -The Visualization and Data Retrieval System, which provides tools for interacting with and retrieving scientific data that has been entered by trained community members that belong to school organizations.

  • Excel for Chemical Engineers

    This great Excel series is a perennial favorite. Learn about custom functions using VBA, pivot tables, macros, indirect references and more.
    The content includes:
    -Excelling with Excel #1 – Custom Functions Using VBA
    -Excelling with Excel #2 – Pivot Tables
    -Excelling with Excel #3 – Macros
    -Excelling with Excel #4 – Indirect References

  • Risk Analysis Screening Tool (RAST) and Chemical Hazard Engineering Fundamentals (CHEF)

    The Risk Analysis Screening Tool (RAST) is a free, downloadable Excel workbook that is used to help identify hazard scenarios and evaluate process risk within a single program. The user inputs information about the chemical, equipment, or unit operation parameters, process conditions, and facility layout. The program then suggests potentially hazardous scenarios and estimates the worst-case consequences based on user input. The tool helps engineers manage changes to a process, or evaluate and potentially reduce the hazards of a process at the design stage. Attendees will learn what the tool can do and begin to evaluate their own processes.
    In addition to RAST, a companion information package, the Chemical Hazard Engineering Fundamentals (CHEF) documentation, describes in detail the theoretical basis of the methods, techniques, and assumptions which are used in RAST for the different hazard evaluation and risk analysis steps.

    Table of contents:
    -RAST Overview
    -CHEF Overview
    -Case Studies
    -Terms and Conditions
    -Download and Install
    -RAST User and CHEF Manuals
    -Frequently Asked Questions (FAQs)
    -RAST Development History

  • What we wish we had learned in Graduate School - a data management training roadmap for graduate students

    This Road Map is a dynamic guide to help graduate students wade through the ocean of data resources. This guide will help identify what data management practices graduate students should be considering at different graduate school milestones.

    Data management training for graduate students is a very important but often undervalued area of graduate school education. Many graduate students will go on to become professionals who use, produce, and/or manage data that have tremendous benefits for both the research community and society. However, our personal experiences as graduate students show that data lifecycle and data management training are not part of the core curriculum in graduate school. As Earth Science Information Partners (ESIP) Community Fellows, we understand that data management is a critical skill in Earth science, and we all wish we had had an opportunity to integrate it into our graduate school experience from the beginning. To address the lack of formal data management training in graduate education, we convened a working session during the 2020 ESIP Summer Meeting called “What we wish we had learned in Graduate School?” The session was initially planned as a working session for early career professionals to share resources and lessons learned during our own graduate school experiences. It sparked broad interest from the Earth science data community and attracted participants across different career stages and with different levels of expertise. The outcome of the session has been summarized as a roadmap that follows the DataONE Data Lifecycle. This roadmap projects the data lifecycle onto the traditional graduate school timeline and highlights the benefits and resources of data management training for each component of the data lifecycle. The roadmap will be distributed via ESIP and continued as part of the ESIP Community Program to promote data management training for graduate students in Earth sciences and beyond.

    Also available as a webinar from DataONE: 

  • Text Mining in Python through the HTRC Feature Reader

    In this lesson, we introduce the HTRC Feature Reader, a library for working with the HTRC Extracted Features dataset using the Python programming language. The HTRC Feature Reader is structured to support work using popular data science libraries, particularly Pandas. Pandas provides simple structures for holding data and powerful ways to interact with it. The HTRC Feature Reader uses these data structures, so learning how to use it will also cover general data analysis skills in Python.
    We introduce a toolkit for working with the 13.6 million volume Extracted Features Dataset from the HathiTrust Research Center. You will learn how to peer at the words and trends of any book in the collection, while developing broadly useful Python data analysis skills.
    Today, you’ll learn:
    -How to work with notebooks, an interactive environment for data science in Python;
    -Methods to read and visualize text data for millions of books with the HTRC Feature Reader; and
    -Data malleability, the skills to select, slice, and summarize extracted features data using the flexible “Data Frame” structure.

  • Exploring and Analyzing Network Data with Python

    This lesson introduces network metrics and how to draw conclusions from them when working with humanities data. You will learn how to use the NetworkX Python package to produce and work with these network statistics.
    In this tutorial, you will learn:
    -To use the NetworkX package for working with network data in Python
    -To analyze humanities network data to find:
      -Network structure and path lengths,
      -Important or central nodes,
      -Communities and subgroups.

    This tutorial assumes that you have:
    -A basic familiarity with networks and/or have read “From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources” by Martin Düring here on Programming Historian;
    -Installed Python 3, not the Python 2 that is installed natively in Unix-based operating systems such as Macs (If you need assistance installing Python 3, check out the Hitchhiker’s Guide to Python); and
    -Installed the pip package installer.
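
    Two of the metrics named in this record, path lengths and central nodes, can be illustrated with a minimal sketch in plain Python. This is not the NetworkX API and not the lesson's own code: the toy network, the names in it, and the helper functions are all hypothetical, chosen only to show what the metrics measure. Community detection is left out.

```python
from collections import deque

# Hypothetical toy network (an assumption for illustration): undirected ties.
edges = [("Ann", "Ben"), ("Ben", "Cat"), ("Ben", "Dan"), ("Dan", "Eve")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbours (undirected graph)."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def degree_centrality(adjacency):
    """Share of the other nodes each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


adjacency = build_adjacency(edges)
path_ann_eve = shortest_path_length(adjacency, "Ann", "Eve")
centrality = degree_centrality(adjacency)
most_central = max(centrality, key=centrality.get)

print(path_ann_eve)   # 3 (Ann -> Ben -> Dan -> Eve)
print(most_central)   # Ben
```

    In practice a network library would replace these helpers; the sketch only makes the definitions concrete.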

  • Getting Started With Topic Modeling And MALLET

    In this lesson, you will first learn what topic modeling is and why you might want to employ it in your research. You will then learn how to install and work with the MALLET natural language processing toolkit to do so. Using MALLET involves modifying an environment variable (essentially, setting up a shortcut so that your computer always knows where to find the MALLET program) and working with the command line (i.e., typing in commands manually, rather than clicking on icons or menus). We will run the topic modeler on some example files, and look at the kinds of outputs that MALLET produces. This will give us a good idea of how it can be used on a corpus of texts to identify topics found in the documents without reading them individually.

  • Introducción a Topic Modeling y MALLET

    In this lesson, you will first learn what topic modeling is and why you might want to use it in your research. You will then learn how to install and work with MALLET, a natural language processing (NLP) toolkit used for this kind of analysis. MALLET requires modifying an environment variable (that is, setting up a shortcut so that the computer always knows where to find the MALLET program) and working with the command line (that is, typing commands manually instead of clicking on icons or menus).

  • Making Research Data Available

    There is a growing awareness of the importance of research data. Elsevier is committed to encouraging and supporting researchers who want to store, share, discover and reuse data. To this end, Elsevier has set up several initiatives that allow authors to make their data available when they publish with Elsevier. The webinars in the collection (located on the bottom half of the web page) cover:

    • Ways for researchers to store, share, discover, and use data
    • How to create a good research data management plan  
    • Data Citation: How can you as a researcher benefit from citing data? 

  • Hivebench Electronic Lab Notebook

    The time it takes to prepare, analyze and share experimental results can seem prohibitive, especially in the current, highly competitive world of biological research. However, not only is data sharing mandated by certain funding and governmental bodies, it also has distinct advantages for research quality and impact. Good laboratory practices recommend that all researchers use electronic lab notebooks (ELN) to save their results. This resource includes numerous short video demonstrations of Hivebench:

    • Start using Hivebench, the full demo
    • Creating a Hivebench account
    • Managing protocols & methods
    • Storing experimental findings in a notebook
    • Managing research data
    • Doing research on iPhone and iPad
    • Editing experiments
    • Collaborating with colleagues
    • Searching for results
    • Staying up to date with the newsfeed
    • Planning experiments with the calendar
    • Using open science protocols
    • Mendeley Data Export
    • Managing inventory of reagents
    • Signing and counter signing experiments
    • Archiving notebooks
    • How to keep data alive when researchers move on? Organizing data, methods, and protocols.

  • Remote Sensing for Monitoring Land Degradation and Sustainable Cities Sustainable Development Goals (SDGs) [Advanced]

    The Sustainable Development Goals (SDGs) are an urgent call for action by countries to preserve our oceans and forests, reduce inequality, and spur economic growth. The land management SDGs call for consistent tracking of land cover metrics. These metrics include productivity, land cover, soil carbon, urban expansion, and more. This webinar series will highlight a tool that uses NASA Earth Observations to track land degradation and urban development that meet the appropriate SDG targets. 

    SDGs 11 and 15 relate to sustainable urbanization and land use and cover change. SDG 11 aims to "make cities and human settlements inclusive, safe, resilient, and sustainable." SDG 15 aims to "combat desertification, drought, and floods, and strive to achieve a land degradation neutral world." To assess progress towards these goals, indicators have been established, many of which can be monitored using remote sensing. 

    In this training, attendees will learn to use Trends.Earth, a freely available QGIS plugin created by Conservation International (CI), and will hear from special guest speakers from the United Nations Convention to Combat Desertification (UNCCD) and UN Habitat. Trends.Earth allows users to plot time series of key land change indicators. Attendees will learn to produce maps and figures to support monitoring and reporting on land degradation, improvement, and urbanization for SDG indicators 15.3.1 and 11.3.1. Each part of the webinar series will feature a presentation, a hands-on exercise, and time for the speaker to answer live questions.

    Learning Objectives: By the end of this training, attendees will: 

    • Become familiar with SDG Indicators 15.3.1 and 11.3.1
    • Understand the basics of how to compute sub-indicators of SDG 15.3.1, such as productivity, land cover, and soil carbon
    • Understand how to use the Trends.Earth Urban Mapper web interface
    • Learn the basics of the Trends.Earth toolkit, including:
      • Plotting time series
      • Downloading data
      • Using default or custom data for productivity, land cover, and soil organic carbon
      • Calculating SDG 15.3.1 spatial layers and a summary table
      • Calculating urban change metrics
      • Creating urban change summary tables

    Course Format: This training has been developed in partnership with Conservation International, United Nations Convention to Combat Desertification (UNCCD), and UN Habitat. 

    • Three, 1.5-hour sessions that include lectures, hands-on exercises, and a question and answer session
    • The first session will be broadcast in English, and the second session will contain the same content, broadcast in Spanish (see separate record for Spanish version at: 


    Each of the three parts includes links to the recordings, presentation slides, exercises, and question-and-answer transcripts.

  • Agency Requirements: NSF Data Management Plans

    This training module is part of the Federation of Earth Science Information Partners (ESIP Federation) Data Management for Scientists Short Course. The subject of this module is "NSF Data Management Plans". The module was authored by Ruth Duerr from the National Snow and Ice Data Center in Boulder, Colorado. Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    If you’ve done any proposal writing for the National Science Foundation (NSF), you know that NSF now requires that all proposals be accompanied by a data management plan that can be no longer than two pages. The data management plans are expected to respond to NSF’s existing policy on the dissemination and sharing of research results. You can find a description of this policy in the NSF Award and Administration Guide, to which we provide a link later in this module. In addition, we should note that NSF’s proposal submission system, FastLane, will not accept a proposal that does not have a data management plan attached as a supplementary document.

    Individual directorates may have specific guidance for data management plans. For example, the Ocean Sciences Division specifies that data be available within two years after acquisition. Specifications for some individual directorates may provide a list of places where you must archive your data and what you should do if none of the archives in the list can take your data. They may also have additional requirements for both annual and final reporting beyond the general case requirements from NSF.  In addition, individual solicitations may have program specific guidelines to which you need to pay attention.  This module is available in both presentation slide and video formats.

  • Intro to Data Management

    This guide will provide general information about data management, including an overview of Data Management Plans (DMPs), file naming conventions, documentation, security, backup, publication, and preservation. We have included the CMU data life cycle to put the pieces in context in the Data 101 section.
    The CMU Libraries provide research data management resources offering guidance on data management, planning, and sharing for researchers, faculty, and students.

  • Content-based Identifiers for Iterative Forecasts: A Proposal

    Iterative forecasts pose particular challenges for archival data storage and retrieval. In an iterative forecast, data about the past and present must be downloaded and fed into an algorithm that will output a forecast data product. Previous forecasts must also be scored against the realized values in the latest observations. Content-based identifiers provide a convenient way to consistently identify inputs, outputs, and associated scripts. These identifiers are:
    (1) location-agnostic – they don’t depend on a URL or other location-based authority (like DOI)
    (2) reproducible – the same data file always has the same identifier
    (3) frictionless – cheap and easy to generate with widely available software, with no authentication or network connection required
    (4) sticky – the identifier cannot become unstuck or separated from the content
    (5) compatible – most existing infrastructure, including DataONE, can quite readily use these identifiers.

    In this webinar, the speaker will illustrate an example iterative forecasting workflow. In the process, he will highlight some newly developed R packages for making this easier.
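
    The "reproducible" and "location-agnostic" properties listed above can be illustrated with a minimal sketch that derives an identifier from a file's bytes alone. This is not the speaker's R tooling: the choice of SHA-256, the `hash://sha256/` prefix, the helper names, and the sample forecast files are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def content_id(path, chunk_size=65536):
    """Return an identifier computed only from the file's bytes (SHA-256 assumed)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # The "hash://sha256/" prefix is an assumed naming convention.
    return "hash://sha256/" + digest.hexdigest()


def write_temp(data):
    """Hypothetical helper: write bytes to a new temporary file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Two copies of the same forecast stored at different locations, plus a revision.
forecast_a = write_temp(b"date,value\n2020-01-01,3.2\n")
forecast_copy = write_temp(b"date,value\n2020-01-01,3.2\n")
forecast_b = write_temp(b"date,value\n2020-01-01,3.3\n")

id_a = content_id(forecast_a)
id_copy = content_id(forecast_copy)
id_b = content_id(forecast_b)

print(id_a == id_copy)  # True: same bytes, same identifier, whatever the path
print(id_a == id_b)     # False: changed content gets a new identifier

for path in (forecast_a, forecast_copy, forecast_b):
    os.remove(path)
```

    Because the identifier depends only on content, it can be recomputed offline by anyone holding the file, which is what makes it usable for matching forecast inputs, outputs, and scripts across archives.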

  • Supporting Researchers in Discovering Data Repositories

    How do researchers go about identifying a repository to preserve their data? Do they have all the information they need to make an informed decision? Are there resources available to help?
    There is a myriad of repositories available to support data preservation, and they differ across multiple axes. So which one is right for your data? The answer is largely ‘it depends’. But this can be frustrating to a new researcher looking to publish data for the first time. What questions need to be asked to untangle these dependencies, and where can a researcher go for answers?
    Conversations and sessions at domain conferences have consistently suggested that researchers need more support in navigating the landscape of data repositories. With support from ESIP Funding Friday, we sought to provide it. In this webinar, we will introduce a resource under development that aims to serve as a gateway for information about repository selection. With links to existing resources, games, and outreach materials, we aim to facilitate the discovery of data repositories, and we welcome contributions to increase the value of this resource.

  • A FAIR Afternoon: On FAIR Data Stewardship for Technology Hotel (/ETH4) beneficiaries

    FAIR data awareness event for Enabling Technology Hotels 4ed. One of the aims of the Enabling Technologies Hotels programme is to promote the application of the FAIR data principles in research data stewardship, data integration, methods, and standards. This relates to the objective of the National Plan Open Science that research data be made suitable for reuse.

    With this FAIR data training, ZonMw and DTL aim to help researchers (hotel guests and managers) who have obtained a grant in the 4th round of the programme to apply FAIR data management in their research.

  • RDM Onboarding Checklist

    Research Data Management is essential for responsible research and should be introduced when starting a new project or joining a new lab. Managing data across a project and/or a team allows for accurate communication about that project. This session will review the important steps for onboarding new employees/trainees to a lab or new projects. The key takeaway from this session will be how to incorporate these steps within your individual project or lab environment. While the principles are general, these documents focus on Harvard policies and resources. Internal and external links have been provided throughout the document as supplementary resources, including a glossary of terms.

    There are two checklists:
    The RDM Onboarding Checklist: Comprehensive Version serves as a general, research data management-focused guide to employee/trainee onboarding as they join a new lab or begin new projects (follow one or both of these as they apply to your situation). This comprehensive version is provided as an initial introduction to the onboarding process and to the breadth of available resources; it is intended to be reviewed first, before using the abridged version.
    The RDM Onboarding Checklist: Abridged Version serves as a condensed version of the comprehensive checklist. This version is intended to be used as an actionable checklist, employed after reviewing the onboarding processes and resources provided in the comprehensive checklist.

    Learning Objectives:

    • Become familiar with the research data lifecycle
    • Understand the details and requirements at each stage of data management onboarding
    • Engage with best practices to enhance your current and future research
    • Receive resources and contacts for future help

  • Workshop On Data Management Plans For Linguistic Research

    The rising tide of data management and sharing requirements from funding agencies, publishers, and institutions has created a new set of pressures for researchers who are already stretched for time and funds. While it can feel like yet another set of painful hurdles, the process of creating a Data Management Plan (DMP) can be a surprisingly useful exercise, especially when done early in a project’s lifecycle. Good data management, practiced throughout one’s career, can save time, money, and frustration, while ultimately helping increase the impact of research.
    This 1-day workshop will involve lecture and discussion around concepts of data management throughout the data lifecycle (from data creation, storage, and analysis to data sharing, archiving, and reusing), as well as related issues such as intellectual property, copyright, open access, data citation, attribution, and metrics. Participants will learn about data management best practices and useful tools while engaging in activities designed to produce a DMP similar to those desired by the NSF Behavioral and Cognitive Sciences Division (for example, Linguistics, Documenting Endangered Languages), as well as other federal agencies such as NEH.