All Learning Resources
GeoBuilder - How to Share My Session
A brief tutorial that shows how to share a GeoBuilder session. The GeoBuilder tool provides a wizard-type interface that guides users through several steps for loading, selecting, configuring, and analyzing geo-referenced tabular data. To use the accompanying MyGeoHub tutorial exercises, go to https://mygeohub.org/resources/1219. Note that the number before each step is the timestamp in the YouTube video showing how that step is done. Also note that the video does not contain audio content.
For more information about GeoBuilder, go to https://mygeohub.org/resources/geobuilder.
Introduction to Lidar
This self-paced, online training introduces several fundamental concepts of lidar and demonstrates how high-accuracy lidar-derived elevation data support natural resource and emergency management applications in the coastal zone. After completing this training, participants will be able to:
- Define lidar
- Select different types of elevation data for specific coastal applications
- Describe how lidar are collected
- Identify the important characteristics of lidar data
- Distinguish between different lidar data products
- Recognize aspects of data quality that impact data usability
- Locate sources of lidar data
- Discover additional information and educational resources
Note: requires Flash Plugin.
Introduction to Lidar
This course provides an overview of Lidar technology; the data collection workflow; data products, formats, and metadata; Lidar and vegetation; QA/QC, artifacts, and issues to keep in mind; and DEM generation from Lidar point cloud data.
The focus of this workshop is on data management and analysis for genomics research, including best practices for the organization of bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality and perform variant calling, and connecting to and using cloud computing.
- Project organization and management
- Introduction to the command line
- Data wrangling and processing
- Introduction to cloud computing for genomics
- Data analysis and visualization in R *beta*
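The workshop performs sequence-quality analysis with command-line tools; as a rough illustration of what such a check computes, here is a hedged Python sketch (the file name and the assumption of Phred+33 encoding are illustrative, not part of the workshop materials):

```python
# Minimal sketch: mean Phred quality per read from a FASTQ file.
# Illustrates the kind of per-read quality summary the workshop's
# command-line lessons cover; assumes Phred+33 quality encoding.

def mean_read_qualities(fastq_path):
    """Yield (read_id, mean Phred quality) for each record in a FASTQ file."""
    qualities = []
    with open(fastq_path) as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                break                      # end of file
            fh.readline()                  # sequence line (unused here)
            fh.readline()                  # '+' separator line
            qual = fh.readline().rstrip()  # quality string
            # Phred+33 encoding: score = ord(char) - 33
            scores = [ord(c) - 33 for c in qual]
            qualities.append((header[1:], sum(scores) / len(scores)))
    return qualities
```

In practice the workshop uses established tools for this step; the sketch only shows the underlying arithmetic.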
This workshop uses a tabular ecology dataset from the Portal Project Teaching Database and teaches data cleaning, management, analysis, and visualization. There are no prerequisites, and the materials assume no prior knowledge about the tools. We use a single dataset throughout the workshop to model the data management and analysis workflow that a researcher would use.
- Data Organization in Spreadsheets
- Data Cleaning with OpenRefine
- Data Management with SQL
- Data Analysis and Visualization in R
- Data Analysis and Visualization in Python
The Ecology workshop can be taught using R or Python as the base language.
Portal Project Teaching Dataset: the Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It provides a real-world example of life-history, population, and ecological data, with sufficient complexity to teach many aspects of data analysis and management, but with many complexities removed to allow students to focus on the core ideas and skills being taught.
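As a hedged sketch of the clean-then-summarize workflow the workshop models, the following assumes a hypothetical local copy of the survey table (`surveys.csv`) with at least `species_id` and `weight` columns:

```python
# Illustrative only: the workshop's own lessons use their packaged R/Python
# code; this pandas sketch just mirrors the clean -> analyze steps.
import pandas as pd

def mean_weight_by_species(path="surveys.csv"):
    """Mean weight per species, ignoring records with missing weight."""
    surveys = pd.read_csv(path)
    clean = surveys.dropna(subset=["weight"])            # data cleaning
    # Analysis step; visualization would follow, e.g. .plot(kind="bar")
    return clean.groupby("species_id")["weight"].mean()
```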
The Agriculture Open Data Package
The third GODAN Capacity Development Working Group webinar, supported by GODAN Action, focused on the Agriculture Open Data Package (AgPack).
In 2016 GODAN, ODI, the Open Data Charter and OD4D developed the Agricultural Open Data Package (AgPack) to help governments to realize impact with open data in the agriculture sector and food security. Details at http://www.agpack.info
During the webinar, the speakers outlined examples and use cases of governments using open data in support of their agricultural sector and food security. They also discussed the different roles a government can take on to facilitate such development, and how open data can support government policy objectives on agriculture and food security.
Publishing Open Data from an Organisational Point of View
The second GODAN Capacity Building webinar was on “Publishing open data from an organisational point of view” and was led by GODAN Action colleagues from the Open Data Institute in London.
This webinar focused on key aspects:
- Why publish open data
- What benefit can publishing open data bring
- Why licenses are the most important aspect of publishing open data
- How to start with publishing open data
GODAN Working Group on Capacity Development
The first webinar organized by the GODAN (Global Open Data for Agriculture & Nutrition) Working Group on Capacity Development gave an overview of GODAN, its objectives and how people can get involved. The webinar also provided information on the purpose of the GODAN Working Group on Capacity Development and explained how to join and get involved in the activities.
Curriculum on Open Data and Research Data Management in Agriculture and Nutrition
This paper details the curriculum for the Open Data Management in Agriculture and Nutrition e-learning course, including background to the course, course design, target audiences, and lesson objectives and outcomes.
This free online course aims to strengthen the capacity of data producers and data consumers to manage and use open data in agriculture and nutrition. A key aim is for the course to be used widely within agricultural and nutrition knowledge networks in different institutions. The course also aims to raise awareness of different types of data formats and uses, and to highlight how important it is for data to be reliable, accessible and transparent.
The course is delivered through Moodle e-learning platform. Course units include:
Unit 1: Open data principles
Unit 2: Using open data
Unit 3: Making data open
Unit 4: Sharing open data
Unit 5: IPR and Licensing
By the end of the course, participants will be able to:
- Understand the principles and benefits of open data
- Understand ethics and responsible use of data
- Identify the steps to advocate for open data policies
- Understand how and where to find open data
- Apply techniques to data analysis and visualisation
- Recognise the necessary steps to set up an open data repository
- Define the FAIR data principles
- Understand the basics of copyright and database rights
- Apply open licenses to data
The course is open to infomediaries (including ICT workers, technologists, journalists, communication officers, librarians, and extensionists); policy makers, administrators, and project managers; and researchers, academics, and scientists working in the areas of agriculture, nutrition, weather and climate, and land data.
New England Collaborative Data Management Curriculum
NECDMC is an instructional tool for teaching data management best practices to undergraduates, graduate students, and researchers in the health sciences, sciences, and engineering disciplines. Each of the curriculum’s seven online instructional modules aligns with the National Science Foundation’s data management plan recommendations and addresses universal data management challenges. Included in the curriculum is a collection of actual research cases that provides a discipline-specific context for the content of the instructional modules. These cases come from a range of research settings such as clinical research, biomedical labs, an engineering project, and a qualitative behavioral health study. Additional research cases will be added to the collection on an ongoing basis. Each of the modules can be taught as a stand-alone class or as part of a series of classes. Instructors are welcome to customize the content of the instructional modules to meet the learning needs of their students and the policies and resources at their institutions.
Imaging and Analyzing Southern California’s Active Faults with High-Resolution Lidar Topography
Over the past 5+ years, many of Southern California’s active faults have been scanned with airborne lidar through various community and PI-data collection efforts (e.g., the B4 Project, EarthScope, and the post-El Mayor–Cucapah earthquake). All of these community datasets are publicly available (via OpenTopography: https://www.opentopography.org) and powerfully depict the effect of repeated slip along these active faults as well as surface processes in a range of climatic regimes. These datasets are of great interest to the Southern California Earthquake Center (SCEC) research and greater academic communities and have already yielded important new insights into earthquake processes in southern California.
This is a short course on LiDAR technology, data processing, and analysis techniques. The foci of the course are fault trace and geomorphic mapping applications, integration with other geospatial data, and data visualization and analysis approaches. Course materials include slide presentations, video demonstrations, and text-based software application tutorials.
GODAN Webinar Series
A series of webinars organised by the GODAN Working Group on Capacity Development in collaboration with CTA. The Global Open Data for Agriculture and Nutrition (GODAN) supports the proactive sharing of open data to make information about agriculture and nutrition available, accessible and usable to deal with the urgent challenge of ensuring world food security. A core principle behind GODAN is that a solution to Zero Hunger lies within existing, but often unavailable, agriculture and nutrition data. At the GODAN Summit in September 2016, GODAN launched a new Working Group on Capacity Development. More info here: https://www.godan.info/news/leveraging-power-webinars-support-open-data-...
Sustaining Science Gateways—Finding your "best fit" model
Digital projects – science gateways, data repositories, educational websites, and others – have a few things in common. They can deliver a great deal of value to users by sharing sophisticated tools, large data sets, or access to computing capacity widely among those in the academic sector who really need them to advance their work. But they share something else in common, too: they are devilishly hard to run in a way that permits ongoing growth and expansion.
In this webinar, Nancy Maron, a lead instructor in the Science Gateways Bootcamp, introduces participants to the key elements of sustainability planning – the building blocks for developing Science Gateways that have the best chance for ongoing growth.
The webinar will introduce sustainability models and share some key tactics for identifying the models that are most likely to work for your gateway. We will touch upon funding models, the competitive environment, and audience assessment, to show how these need to be considered in tandem with any plan.
Working in the R Ecosystem: Building Applications & Content for Your Gateway
The R programming language first appeared on the scene in the 1990s as an open source environment for statistical modeling and data analysis. Throughout the last decade, interest in the language has grown alongside researchers' abilities to collect and store larger amounts of data. Today, scientific and business decisions increasingly rely on the interpretation of this data. New libraries for processing data and communicating results are being debuted in ways that break down traditional language silos. Technologies like interactive documents, HTML-based applications, and RESTful APIs have exposed capability gaps between R's interfaces for numerical analysis libraries and its built-in ability for graphical display. In this webinar, Derrick Kearney will survey several R libraries that are helping people bridge the gap between their R-based analysis and the numerous ways people are representing results today, all of which can be published on your science gateway, thus extending your research impact to others in a reproducible way.
Webinar: National Data Service (NDS) Labs Workbench
The growing size and complexity of high-value scientific datasets are pushing the boundaries of traditional models of data access and discovery. Many large datasets are only accessible through the systems on which they were created or require specialized software or computational resources for re-use. In response to this growing need, the National Data Service (NDS) consortium is developing the Labs Workbench platform, a scalable, web-based system intended to support turn-key deployment of encapsulated data management and analysis tools to support exploratory analysis and development on cloud resources that are physically "near" the data and associated high-performance computing (HPC) systems. The Labs Workbench may complement existing science gateways by enabling exploratory analysis of data and the ability for users to deploy and share their own tools. The Labs Workbench platform has also been used to support a variety of training and workshop environments.
This webinar includes a demonstration of the Labs Workbench platform and a discussion of several key use cases. A presentation of findings from the recent Workshop on Container Based Analysis Environments for Research Data Access and Computing further highlights compatibilities between science gateways and interactive analysis platforms such as Labs Workbench.
Facing the data challenge: Developing data policy and services
Overview of research data management (RDM), who is responsible for RDM, the components of a research data service, and policy and research activity roadmap development in compliance with Engineering and Physical Sciences Research Council (EPSRC) funding expectations in the UK.
Research Data Management and Sharing MOOC
This course will provide learners with an introduction to research data management and sharing. After completing this course, learners will understand the diversity of data and their management needs across the research data lifecycle, be able to identify the components of good data management plans and be familiar with best practices for working with data including the organization, documentation, and storage and security of data. Learners will also understand the impetus and importance of archiving and sharing data as well as how to assess the trustworthiness of repositories.
Note: The course is free to access. However, if you pay for the course, you will have access to all of the features and content you need to earn a Course Certificate from Coursera. If you complete the course successfully, your electronic Certificate will be added to your Coursera Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. Note that the Course Certificate does not represent official academic credit from the partner institution offering the course.
Also, note that the course is offered on a regular basis. For information about the next enrollment, go to the provided URL.
GeoMapApp Video Tutorials
The video tutorials are available from the home page under the general topics listed below, and also on the GeoMapApp YouTube channel at: https://www.youtube.com/user/GeoMapApp. The tutorials demonstrate how to perform common tasks with GeoMapApp. Full information on the functions is available at the provided web address. General topics include:
- Import Your Own Data
- Analyze Data
- Working with Gridded Data
- Available Data and examples
- Portals (including, for example, Ocean Floor Drilling, Multibeam Swath Bathymetry Data, Seismic Data, and Earthquake Data)
- In-Depth Webinars
GeoMapApp is an earth science exploration and visualization application that is continually being expanded as part of the Marine Geoscience Data System (MGDS) at the Lamont-Doherty Earth Observatory of Columbia University. The application provides direct access to the Global Multi-Resolution Topography (GMRT) compilation that hosts high resolution (~100 m node spacing) bathymetry from multibeam data for ocean areas and ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) and NED (National Elevation Dataset) topography datasets for the global land masses.
Do-It-Yourself Research Data Management Training Kit for Librarians
Online training materials designed for small groups of librarians who wish to gain confidence in and understanding of research data management. The DIY Training Kit is designed to contain everything needed to complete a similar training course on your own (in small groups) and is based on open educational materials. The materials have been enhanced with Data Curation Profiles and reflective questions based on the experience of academic librarians who have taken the course.
The training kit includes:
- Promotional slides for the RDM Training Kit
- Training schedule
- Research Data MANTRA online course by EDINA and Data Library, University of Edinburgh
- Reflective writing questions
- Selected group exercises (with answers) from UK Data Archive, University of Essex - Managing and sharing data: Training resources. September, 2011 (PDF). Complete RDM Resources Training Pack available:
- Podcasts for short talks by the original Edinburgh speakers if running course without ‘live’ speakers (Windows or Quicktime versions).
- Presentation files (pptx) if learners decide to take turns presenting each topic.
- Evaluation forms
- Independent study assignment: Interview with a researcher, based on Data Curation Profile, from D2C2, Purdue University Libraries and Boston University Libraries.
DataONE Data Management Module 03: Data Management Planning
Data management planning is the starting point in the data life cycle. Creating a formal document that outlines what you will do with the data during and after the completion of research helps to ensure that the data is safe for current and future use. This 30-40 minute lesson describes the benefits of a data management plan (DMP), outlines the components of a DMP, details tools for creating a DMP, provides NSF DMP information, and demonstrates the use of an example DMP and includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 04: Data Entry and Manipulation
When entering data, common goals include: creating data sets that are valid, have gone through an established process to ensure quality, are organized, and reusable. This lesson outlines best practices for creating data files. It will detail options for data entry and integration, and provide examples of processes used for data cleaning, organization and manipulation and includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise, handout, and supporting data files.
DataONE Data Management Module 05: Data Quality Control and Assurance
Quality assurance and quality control are phrases used to describe activities that prevent errors from entering or staying in a data set. These activities ensure the quality of the data before it is collected, entered, or analyzed, as well as actively monitoring and maintaining the quality of data throughout the study. In this lesson, we define and provide examples of quality assurance, quality control, data contamination and types of errors that may be found in data sets. After completing this lesson, participants will be able to describe best practices in quality assurance and quality control and relate them to different phases of data collection and entry. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 06: Data Protection and Backups
There are several important elements to digital preservation, including data protection, backup and archiving. In this lesson, these concepts are introduced and best practices are highlighted with case study examples of how things can go wrong. Exploring the logistical, technical and policy implications of data preservation, participants will be able to identify their preservation needs and be ready to implement good data preservation practices by the end of the module. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 07: Metadata
What is metadata? Metadata is data (or documentation) that describes and provides context for data and it is everywhere around us. Metadata allows us to understand the details of a dataset, including: where it was collected, how it was collected, what gaps in the data mean, what the units of measurement are, who collected the data, how it should be attributed etc. By creating and providing good descriptive metadata for our own data, we enable others to efficiently discover and use the data products from our research. This lesson explores the importance of metadata to data authors, users of the data and organizations, and highlights the utility of metadata. It provides an overview of the different metadata standards that exist, and the core elements that are consistent across them; guiding users in selecting a metadata standard to work with and introduces the best practices needed for writing a high quality metadata record.
This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise, handout, and supporting data files.
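To make the elements the lesson lists concrete (who collected the data, where and how, units, what gaps mean, attribution), here is a hedged sketch of a minimal metadata record; the field names and values are hypothetical and do not follow a specific standard such as EML:

```python
# Illustrative only: a toy metadata record covering the descriptive
# elements named in the lesson; not a real dataset or standard schema.
metadata = {
    "title": "Stream temperature, Site 4, 2020-2021",   # what the data are
    "creator": "J. Smith",                              # who collected them
    "spatial_coverage": "44.05 N, 121.31 W",            # where
    "collection_method": "logger, 15-minute interval",  # how
    "units": {"temperature": "degrees Celsius"},        # units of measurement
    "missing_value_code": "-9999",                      # what gaps mean
    "attribution": "Cite as Smith (2021)",              # how to attribute
}
```

A real record in a standard such as EML would express the same information in structured XML, which is what enables the efficient discovery the lesson describes.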
DataONE Data Management Module 08: Data Citation
Data citation is a key practice that supports the recognition of data creation as a primary research output rather than as a mere byproduct of research. Providing reliable access to research data should be a routine practice, similar to the practice of linking researchers to bibliographic references. After completing this lesson, participants should be able to define data citation and describe its benefits; to identify the roles of various actors in supporting data citation; to recognize common metadata elements and persistent data locators and describe the process for obtaining one, and to summarize best practices for supporting data citation. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 09: Analysis and Workflows
Understanding the types, processes, and frameworks of workflows and analyses is helpful for researchers seeking to understand more about research, how it was created, and what it may be used for. This lesson uses a subset of data analysis types to introduce reproducibility, iterative analysis, documentation, provenance and different types of processes. Described in more detail are the benefits of documenting and establishing informal (conceptual) and formal (executable) workflows. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 10: Legal and Policy Issues
Conversations regarding research data often intersect with questions related to ethical, legal, and policy issues for managing research data. This lesson will define copyrights, licenses, and waivers, discuss ownership and intellectual property, and describe some reasons for data restriction. After completing this lesson, participants will be able to identify ethical, legal, and policy considerations that surround the use and management of research data. The 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
NASA Earthdata Webinar Series
Monthly webinars on discovery and access to NASA Earth science data sets, services and tools. Webinars are archived on YouTube from 2013 to the present. Presenters are experts in different domains within NASA's Earth science research areas and are usually affiliated with NASA data centers and/or data archives. Specific titles for the current year's webinars can be found from the main page, but can also be found from separate pages for each year. These webinars are available to assist those wishing to learn or teach how to obtain and view these data.
NASA Earthdata Video Tutorials
Short video tutorials on topics related to available NASA EOSDIS data products, various types of data discovery, data access, and data tool demonstrations such as the Panoply tool for creating line plots. Videos accessible on YouTube from listing on main webinars and tutorials page. These tutorials are available to assist those wishing to learn or teach how to obtain and view these data.
Transform and visualize data in R using the packages tidyr, dplyr and ggplot2: An EDI VTC Tutorial.
The two tutorials, presented by Susanne Grossman-Clarke, demonstrate how to tidy data in R with the package “tidyr” and transform data using the package “dplyr”. The goal of these data transformations is to support data visualization with the package “ggplot2” for data analysis and scientific publications, examples of which were shown.
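The tidy-transform-visualize pattern the tutorial demonstrates in R can be sketched in pandas for illustration; this is a hedged analogue, not the tutorial's own code, and the example table is invented:

```python
# Illustrative analogue of tidyr/dplyr steps using pandas.
import pandas as pd

# Wide table: one column per year (an "untidy" layout)
wide = pd.DataFrame({
    "site": ["A", "B"],
    "2019": [10, 20],
    "2020": [30, 40],
})

# Tidy step (tidyr pivoting analogue): one observation per row
tidy = wide.melt(id_vars="site", var_name="year", value_name="count")

# Transform step (dplyr group_by + summarise analogue)
totals = tidy.groupby("year")["count"].sum()
# Visualize step: ggplot2 in R; e.g. totals.plot(kind="bar") in pandas
```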
Introduction to code versioning and collaboration with Git and GitHub: An EDI VTC Tutorial.
This tutorial is an introduction to code versioning and collaboration with Git and GitHub. Tutorial goals are to help you:
- Understand basic Git concepts and terminology.
- Apply concepts as Git commands to track versioning of a developing file.
- Create a GitHub repository and push local content to it.
- Clone a GitHub repository to the local workspace to begin developing.
- Inspire you to incorporate Git and GitHub into your workflow.
There are a number of exercises within the tutorial to help you apply the concepts learned.
Follow-up questions can be directed via email to Colin Smith (firstname.lastname@example.org) and Susanne Grossman-Clarke (email@example.com).
23 (research data) Things
23 (research data) Things is self-directed learning for anybody who wants to know more about research data. Anyone can do 23 (research data) Things at any time. Do them all, do some, cherry-pick the Things you need or want to know about. Do them on your own, or get together a Group and share the learning. The program is intended to be flexible, adaptable and fun!
Each of the 23 Things offers a variety of learning opportunities with activities at three levels of complexity: ‘Getting started’, ‘Learn more’ and ‘Challenge me’. All resources used in the program are online and free to use.
Introduction to Data Documentation - DISL Data Management Metadata Training Webinar Series - Part 1
Introduction to data documentation (metadata) for science datasets. Includes basic concepts about metadata and a few words about data accessibility. Video is about 23 minutes.
NSIDC DAAC Data Recipes
A collection of tutorials, called "data recipes" that describe how to use Earth science data from NASA's National Snow and Ice Data Center (NSIDC) using easily available tools and commonly used formats for Earth science data. These tutorials are available to assist those wishing to learn or teach how to obtain and view these data.
Why Cite Data?
This video explains what data citation is and why it's important. It also discusses what digital object identifiers (DOIs) are and how they are used.
MANTRA Research Data Management Training
MANTRA is a free, online non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It has been crafted for the use of post-graduate students, early career researchers, and also information professionals. It is freely available on the web for anyone to explore on their own.
Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.
There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:
Understand the nature of research data in a variety of disciplinary settings
Create a data management plan and apply it from the start to the finish of your research project
Name, organise, and version your data files effectively
Gain familiarity with different kinds of data formats and know how and when to transform your data
Document your data well for yourself and others, learn about metadata standards and cite data properly
Know how to store and transport your data safely and securely (backup and encryption)
Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
Understand the benefits of sharing, preserving and licensing data for re-use
Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS
OntoSoft Tutorial: A distributed semantic registry for scientific software
An overview of the EDI data repository and data portal
The Environmental Data Initiative (EDI) data repository is a metadata-driven archive for environmental and ecological research data described by the Ecological Metadata Language (EML). This webinar will provide an overview of the PASTA software used by the repository and demonstrate the essentials of uploading a data package to the repository through the EDI Data Portal.
FAIR Self-Assessment Tool
The FAIR Data Principles are a set of guiding principles to make data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). Using this tool you will be able to assess the 'FAIRness' of a dataset and determine how to enhance its FAIRness (where applicable).
This self-assessment tool has been designed predominantly for data librarians and IT staff but could be used by software engineers developing FAIR Data tools and services, and researchers provided they have assistance from research support staff.
You will be asked questions related to the principles underpinning Findable, Accessible, Interoperable and Reusable. Once you have answered all the questions in each section you will be given a ‘green bar’ indicator based on your answers in that section, and when all sections are completed, an overall 'FAIRness' indicator is provided.
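The section-by-section scoring described above could work along these lines; this is a hypothetical sketch only, since the real tool's questions, weighting, and thresholds are not documented here:

```python
# Hypothetical scoring sketch: per-section answers mapped to a traffic-light
# indicator, then reported per FAIR section. Thresholds are invented.

def section_indicator(answers):
    """Map a section's yes(1)/no(0) answers to a simple indicator."""
    score = sum(answers) / len(answers)
    if score >= 0.75:
        return "green"
    if score >= 0.4:
        return "amber"
    return "red"

def fairness_report(sections):
    """One indicator per FAIR section (Findable, Accessible, ...)."""
    return {name: section_indicator(ans) for name, ans in sections.items()}

report = fairness_report({
    "Findable": [1, 1, 1, 0],
    "Accessible": [1, 0, 0, 0],
    "Interoperable": [1, 1, 0, 0],
    "Reusable": [1, 1, 1, 1],
})
```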
Webinar: Jupyter as a Gateway for Scientific Collaboration and Education
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism, and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing, building an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, in a live iteration with ideas and data assisted by the computer.
The Jupyter Notebook, a system that allows users to compose rich documents that combine narrative text and mathematics together with live code and the output of computations in any format compatible with a web browser (plots, animations, audio, video, etc.), provides a foundation for scientific collaboration. The next generation of the Jupyter web interface, JupyterLab, will combine in a single user interface not only the notebook but multiple other tools to access Jupyter services and remote computational resources and data. A flexible and responsive UI allows the user to mix Notebooks, terminals, text editors, graphical consoles and more, presenting in a single, unified environment the tools needed to work with a remote environment. Furthermore, the entire design is extensible and based on plugins that interoperate via open APIs, making it possible to design new plugins tailored to specific types of data or user needs.
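The notebook documents described above are stored as JSON files interleaving narrative and code cells with their saved outputs; a minimal sketch of that structure (following the nbformat 4 schema, with invented cell contents) looks like:

```python
import json

# Minimal notebook document: markdown cells for narrative, code cells with
# their stored outputs. Field names follow the nbformat 4 JSON schema;
# the cell contents are invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Analysis notes\n", "Narrative text lives here."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print(1 + 1)"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["2\n"]}]},
    ],
}

serialized = json.dumps(notebook, indent=1)  # an .ipynb file is just this JSON
```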
JupyterHub enables Jupyter Notebook and JupyterLab to be used by groups of users for research collaboration and education. We believe JupyterHub provides a foundation on which to build modern scientific gateways that support a wide range of user scenarios, from interactive data exploration in high-level languages like Python, Julia or R, to the education of researchers and students whose work relies on traditional HPC resources.
The presenter discusses the benefits and applications of Jupyter Notebooks.
Scroll to the bottom of the page to view the webinar. Presentation slides are also available on the same page.