All Learning Resources
DataONE Data Management Module 05: Data Quality Control and Assurance
Quality assurance and quality control are phrases used to describe activities that prevent errors from entering or staying in a data set. These activities ensure the quality of the data before it is collected, entered, or analyzed, and actively monitor and maintain the quality of data throughout the study. In this lesson, we define and provide examples of quality assurance, quality control, data contamination and types of errors that may be found in data sets. After completing this lesson, participants will be able to describe best practices in quality assurance and quality control and relate them to different phases of data collection and entry. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 06: Data Protection and Backups
There are several important elements to digital preservation, including data protection, backup and archiving. In this lesson, these concepts are introduced and best practices are highlighted with case study examples of how things can go wrong. Exploring the logistical, technical and policy implications of data preservation, participants will be able to identify their preservation needs and be ready to implement good data preservation practices by the end of the module. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 07: Metadata
What is metadata? Metadata is data (or documentation) that describes and provides context for data, and it is everywhere around us. Metadata allows us to understand the details of a dataset, including where it was collected, how it was collected, what gaps in the data mean, what the units of measurement are, who collected the data, and how it should be attributed. By creating and providing good descriptive metadata for our own data, we enable others to efficiently discover and use the data products from our research. This lesson explores the importance of metadata to data authors, users of the data and organizations, and highlights the utility of metadata. It provides an overview of the different metadata standards that exist and the core elements that are consistent across them, guides users in selecting a metadata standard to work with, and introduces the best practices needed for writing a high quality metadata record.
This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise, handout, and supporting data files.
DataONE Data Management Module 08: Data Citation
Data citation is a key practice that supports the recognition of data creation as a primary research output rather than as a mere byproduct of research. Providing reliable access to research data should be a routine practice, similar to the practice of linking researchers to bibliographic references. After completing this lesson, participants should be able to define data citation and describe its benefits; to identify the roles of various actors in supporting data citation; to recognize common metadata elements and persistent data locators and describe the process for obtaining one, and to summarize best practices for supporting data citation. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 09: Analysis and Workflows
Understanding the types, processes, and frameworks of workflows and analyses is helpful for researchers seeking to understand more about research, how it was created, and what it may be used for. This lesson uses a subset of data analysis types to introduce reproducibility, iterative analysis, documentation, provenance and different types of processes. Described in more detail are the benefits of documenting and establishing informal (conceptual) and formal (executable) workflows. This 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
DataONE Data Management Module 10: Legal and Policy Issues
Conversations regarding research data often intersect with questions related to ethical, legal, and policy issues for managing research data. This lesson will define copyrights, licenses, and waivers, discuss ownership and intellectual property, and describe some reasons for data restriction. After completing this lesson, participants will be able to identify ethical, legal, and policy considerations that surround the use and management of research data. The 30-40 minute lesson includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.
Introduction to Data Documentation - DISL Data Management Metadata Training Webinar Series - Part 1
Introduction to data documentation (metadata) for science datasets. Includes basic concepts about metadata and a few words about data accessibility. Video is about 23 minutes.
An overview of the EDI data repository and data portal
The Environmental Data Initiative (EDI) data repository is a metadata-driven archive for environmental and ecological research data described by the Ecological Metadata Language (EML). This webinar will provide an overview of the PASTA software used by the repository and demonstrate the essentials of uploading a data package to the repository through the EDI Data Portal.
FAIR Self-Assessment Tool
The FAIR Data Principles are a set of guiding principles for making data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). Using this tool you will be able to assess the 'FAIRness' of a dataset and determine how to enhance its FAIRness (where applicable).
This self-assessment tool has been designed predominantly for data librarians and IT staff but could be used by software engineers developing FAIR Data tools and services, and researchers provided they have assistance from research support staff.
You will be asked questions related to the principles underpinning Findable, Accessible, Interoperable and Reusable. Once you have answered all the questions in each section you will be given a ‘green bar’ indicator based on your answers in that section, and when all sections are completed, an overall 'FAIRness' indicator is provided.
Webinar: Jupyter as a Gateway for Scientific Collaboration and Education
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism, and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing, building an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, in a live iteration with ideas and data assisted by the computer.
The Jupyter Notebook, a system that allows users to compose rich documents that combine narrative text and mathematics together with live code and the output of computations in any format compatible with a web browser (plots, animations, audio, video, etc.), provides a foundation for scientific collaboration. The next generation of the Jupyter web interface, JupyterLab, will combine in a single user interface not only the notebook but multiple other tools to access Jupyter services and remote computational resources and data. A flexible and responsive UI allows the user to mix Notebooks, terminals, text editors, graphical consoles and more, presenting in a single, unified environment the tools needed to work with a remote environment. Furthermore, the entire design is extensible and based on plugins that interoperate via open APIs, making it possible to design new plugins tailored to specific types of data or user needs.
JupyterHub enables Jupyter Notebook and JupyterLab to be used by groups of users for research collaboration and education. We believe JupyterHub provides a foundation on which to build modern scientific gateways that support a wide range of user scenarios, from interactive data exploration in high-level languages like Python, Julia or R, to the education of researchers and students whose work relies on traditional HPC resources.
The presenter discusses the benefits and applications of Jupyter Notebooks.
Scroll to the bottom of the page to view the webinar. Presentation slides are also available on the same page.
R Class for Seismologists
The IRIS Data Management Center (DMC) archives and distributes data to support the seismological research community. The class described here introduces DMC and other seismologists to the R statistical programming language and its use with seismological data available from DMC web services. The capabilities of the seismicRoll, IRISSeismic and IRISMustangMetrics packages developed as part of the MUSTANG project will be demonstrated.
Class materials are broken up into nine separate lessons that assume some experience coding but not necessarily any familiarity with R. Lessons are presented in sequential order and assume the student already has R and RStudio installed on their computer. Autodidacts new to R should take about 20-30 hrs to complete the course. The target audience for these materials consists of IRIS DMC employees or graduate students with a degree in the natural sciences and some experience using scientific software such as MATLAB or Python.
Research Lifecycle at University of Central Florida
Short video discussing the "Research Lifecycle at University of Central Florida," a useful diagram for understanding the typical flow of a research project.
Introduction to R
Learn the basics of reproducible workflows in R using a USGS National Water Information System (NWIS) dataset.
Instructions for accessing the dataset are provided within the tutorial.
Clean your taxonomy data with the taxonomyCleanr R package
Taxonomic data can be messy and challenging to work with. Incorrect spelling, the use of common names, unaccepted names, and synonyms, contribute to ambiguity in what a taxon actually is. The taxonomyCleanr R package helps you resolve taxonomic data to a taxonomic authority, get accepted names and taxonomic serial numbers, as well as create metadata for your taxa in the Ecological Metadata Language (EML) format.
Postgres, EML and R in a data management workflow
Metadata storage and creation of Ecological Metadata Language (EML) can be a challenge for people and organizations who want to archive their data. A workflow was developed to combine efficient EML record generation (using the package developed by the R community) with centrally-controlled metadata in a relational database. The webinar has two components: 1) a demonstration of metadata storage and management using a relational database, and 2) discussion of an example EML file generation workflow using pre-defined R functions.
Experimental Design Assistant
The Experimental Design Assistant (EDA) (https://eda.nc3rs.org.uk) is a free web-based tool that was developed by the NC3Rs (https://www.nc3rs.org.uk). It guides researchers through the design and analysis of in vivo experiments. The EDA allows users to build a stepwise visual representation of their experiment, providing feedback and dedicated support for randomization, blinding and sample size calculation. This demonstration will provide an introduction to the tool and provide guidance on getting started. Ultimately, the use of a tool such as the EDA will lead to carefully designed experiments that yield robust and reproducible data using the minimum number of animals consistent with scientific objectives.
Open Science and Innovation
This course helps you to understand open business models and responsible research and innovation (RRI) and illustrates how these can foster innovation. By the end of the course, you will:
- Understand key concepts and values of open business models and responsible research and innovation
- Know how to plan your innovation activities
- Be able to use Creative Commons licenses in business
- Understand new technology transfer policies with the ethos of Open Science
- Learn how to get things to market faster
Open Licensing
Licensing your research outputs is an important part of practicing Open Science. After completing this course, you will:
- Know what licenses are, how they work, and how to apply them
- Understand how different types of licenses can affect research output reuse
- Know how to select the appropriate license for your research
Tutorial: DataCite Linking
This tutorial walks users through the simple process of creating a workflow in the OpenMinTeD platform that allows them to extract links to DataCite (https://www.datacite.org) - mainly citations to datasets - from scientific publications.
Florilege, a new database of habitats and phenotypes of food microbe flora
This tutorial explains how to use the “Habitat-Phenotype Relation Extractor for Microbes” application available from the OpenMinTeD platform. It also explains the scientific issues it addresses, and how the results of the TDM process can be queried and exploited by researchers through the Florilège application.
In recent years, developments in molecular technologies have led to an exponential growth of experimental data and publications, many of which are open but accessible only separately. It is therefore crucial for researchers to have bioinformatics infrastructures at their disposal that provide unified access to both data and related scientific articles. With the right text mining infrastructures and tools, application developers and data managers can rapidly access and process textual data, link them with other data and make the results available for scientists.
The text-mining process behind Florilege has been set up by INRA using the OpenMinTeD environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type, like “habitat” or “taxon”.
Sections of the tutorial:
1. Biological motivation of the Florilege database
2. Florilège Use-Case on OpenMinTeD (includes a description of how to access the Habitat-Phenotype Relation Extractor for Microbes application)
3. Florilege backstage: how is it built?
4. Florilège description
5. How to use Florilege?
Best Practice in Open Research
This course introduces some practical steps toward making your research more open. We begin by exploring the practical implications of open research, and the benefits it can deliver for research integrity and public trust, as well as benefits you will accrue in your own work. After a short elaboration of some useful rules of thumb, we move quickly onto some more practical steps towards meeting contemporary best practice in open research and introduce some useful discipline-specific resources. Upon completing this course, you will:
- Understand the practical implications of taking a more open approach to research
- Be prepared to meet expectations relating to openness from funders, publishers, and peers
- Be able to reap the benefits of working openly
- Have an understanding of the guiding principles to follow when building openness into your research workflow
- Know about some useful tools and resources to help you embed Open Science into your research practices
Managing and Sharing Research Data
Data-driven research is becoming increasingly common in a wide range of academic disciplines, from Archaeology to Zoology, and spanning Arts and Science subject areas alike. To support good research, we need to ensure that researchers have access to good data. Upon completing this course, you will:
- Understand which data you can make open and which need to be protected
- Know how to go about writing a data management plan
- Understand the FAIR principles
- Be able to select which data to keep and find an appropriate repository for them
- Learn tips on how to get maximum impact from your research data
GeoNode for Developers Workshop
GeoNode is a web-based application and platform for developing geospatial information systems (GIS) and for deploying spatial data infrastructures (SDI). It is designed to be extended and modified and can be integrated into existing platforms.
This workshop covers the following topics:
- GeoNode in development mode, how to
- The geonode-project to customize GeoNode
- Change the look and feel of the application
- Add your own app
- Add your own models, view, and logic
- Build your own APIs
- Add a third party app
- Deploy your customized GeoNode
To access geonode-project on GitHub, go to https://github.com/GeoNode/geonode-project .
Science Impact of Sustained Cyberinfrastructure: The Pegasus Example
This talk is the first in a series of NSF's Office of Advanced Cyberinfrastructure (OAC) webinars. Dr. Deelman describes the challenges of developing and sustaining cyberinfrastructure capabilities that have impact on scientific discovery and that innovate in the changing cyberinfrastructure landscape. The recent multi-messenger observation triggered by LIGO and VIRGO’s first detection of gravitational waves produced by colliding neutron stars is a clear display of the increasing impact of dependable research cyberinfrastructure (CI) on scientific discovery.
Today’s cyberinfrastructure (hardware, software, and workforce) underpins the entire scientific workflow, from data collection at instruments, through complex analysis, to simulation, visualization, and analytics. The Pegasus project is an example of a cyberinfrastructure effort that enables LIGO and other communities to accomplish their scientific goals. In addition, it delivers robust automation capabilities to researchers at the Southern California Earthquake Center (SCEC) studying seismic phenomena, to astronomers seeking to understand the structure of the universe, to material scientists developing new drug delivery methods, and to students seeking to understand human population migration.
Environmental Data Initiative Five Phases of Data Publishing Webinar - Make metadata with the EML assembly line
High-quality structured metadata is essential to the persistence and reuse of ecological data; however, creating such metadata requires substantial technical expertise and effort. To accelerate the production of metadata in the Ecological Metadata Language (EML), we’ve created the EMLassemblyline R code package. Assembly line operators supply the data and information about the data, then the machinery auto-extracts additional content and translates it all to EML. In this webinar, the presenter will provide an overview of the assembly line, how to operate it, and a brief demonstration of its use on an example dataset.
This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.
ISRIC Spring School
The ISRIC Spring School aims to introduce participants to world soils, soil databases, software for soil data analysis and visualisation, digital soil mapping and soil-web services through two 5-day courses run in parallel. Target audiences for the Spring School include soil and environmental scientists involved in (digital) soil mapping and soil information production at regional, national and continental scales; soil experts and professionals in natural resources management and planning; and soil science students at MSc and PhD level. Example courses include "World Soils and their Assessment (WSA)" and "Hands-on Global Soil Information Facilities (GSIF)". Data management topics are included within the course topics.
Hands-on Intro to SQL (Structured Query Language)
This workshop will teach the basics of working with and querying structured data in a database environment. This workshop uses the SQLite plugin for Firefox. The data used is a time-series for a small mammal community in southern Arizona in the southern United States. This is part of a project studying the effects of rodents and ants on the plant community that has been running for almost 40 years. The rodents are sampled on a series of 24 plots, with different experimental manipulations controlling which rodents are allowed to access which plots.
DATUM for Health: Research data management training for health studies
The DATUM for Health training programme covers both generic and discipline-specific issues, focusing on the management of qualitative, unstructured data, and is suitable for students at any stage of their PhD. It aims to provide students with the knowledge to manage their research data at every stage in the data lifecycle, from creation to final storage or destruction. They learn how to use their data more effectively and efficiently, how to store and destroy it securely, and how to make it available to a wider audience to increase its use, value and impact.
The programme comprises:
Overview: programme aims and scope, design, outline content and materials, recommendations
Session 1: Introduction to research data management
Session 2: Data curation lifecycle
Session 3: Problems and practical strategies and solutions
For each session the materials comprise PPT slides, notes for tutors and handouts.
Datatree - Data Training Engaging End-users
*Requires signing up for a free account*
A free online course with all you need to know for research data management, along with ways to engage and share data with business, policymakers, media and the wider public.
The self-paced course will take 15 to 20 hours to complete in eight structured modules. The course is packed with video, quizzes and real-life examples of data management, along with plenty of additional background information.
The materials will be available for structured learning, but also to dip in for immediate problem solving.
Data Management Expert Guide
This guide is written for social science researchers who are in an early stage of practising research data management. With this guide, CESSDA wants to contribute to professionalism in data management and increase the value of research data.
If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours; however, you can also hop on and off at any time.
Diversity Workbench (DWB) in 15 Steps
Introduction and demonstration of the Diversity Workbench (DWB), a "virtual research environment for multiple scientific purposes with regard to management and analysis of life and environmental sciences data. The framework is appropriate to store different kinds of bio- and geodiversity data, taxonomies, terminologies, and facilitates the processing of ecological, molecular biological, observational, collection and taxonomic data" (DWB).
For detailed information about DWB, go to https://diversityworkbench.net/Portal/Diversity_Workbench.
CESSDA Expert Tour Guide on Data Management
Target audience and mission:
This tour guide was written for social science researchers who are in an early stage of practising research data management. With this tour guide, CESSDA wants to contribute to increased professionalism in data management and to improving the value of research data.
Overview:
If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours. You can also just hop on and off.
During your travels, you will come across the following recurring topics:
Adapt Your DMP
European Diversity
Expert Tips
Tour Operators
Current chapters include the following topics: Plan; Organise & Document; Process; Store; Protect; Archive & Publish. Other chapters may be added over time.
Plan, a chapter of the CESSDA Expert Tour Guide on Data Management
This introductory chapter features a brief introduction to research data management and data management planning.
Before we get you started on making your own Data Management Plan (DMP), we will guide you through the concepts which provide the basic knowledge for the rest of your travels. Research data, social science data and FAIR data are some of the concepts you will pass by.
After completing your travels through this chapter you should be:
Familiar with concepts such as (sensitive) personal data and FAIR principles;
Aware of what data management and a data management plan (DMP) are and why they are important;
Familiar with the content elements that make up a DMP;
Able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
Organise & Document, a chapter of the CESSDA Expert Tour Guide on Data Management
In this chapter, we provide you with tips and tricks on how to properly organise and document your data and metadata.
We begin with discussing good practices in designing an appropriate data file structure, file naming and organising your data within suitable folder structures. You will find out how the way you organise your data facilitates orientation in the data file, contributes to understanding the information contained and helps to prevent errors and misinterpretations.
In addition, we will focus on an appropriate documentation of your data. Development of rich metadata is required by FAIR data principles and any other current standards promoting data sharing.
After completing your travels through this chapter on organising and documenting your data you should:
Be aware of the elements which are important in setting up an appropriate structure and organisation of your data for intended research work and data sharing;
Have an overview of best practices in file naming and organising your data files in a well-structured and unambiguous folder structure;
Understand how comprehensive data documentation and metadata increases the chance your data are correctly understood and discovered;
Be aware of common metadata standards and their value;
Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
Process, a chapter of the CESSDA Expert Tour on Data Management
In this chapter we focus on data operations needed to prepare your data files for analysis and data sharing. Throughout the different phases of your project, your data files will be edited numerous times. During this process it is crucial to maintain the authenticity of research information contained in the data and prevent it from loss or deterioration.
However, we will start with the topics of data entry and coding as the first steps of your work with your data files. Finally, you will learn about the importance of a comprehensive approach to data quality.
After completing your travels through this chapter you should:
Be familiar with strategies to minimise errors during the processes of data entry and data coding;
Understand why the choice of file format should be planned carefully;
Be able to manage the integrity and authenticity of your data during the research process;
Understand the importance of a systematic approach to data quality;
Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
Store, a chapter of the CESSDA Expert Tour on Data Management
The data that you collect, organise, prepare, and analyse to answer your research questions, and the documentation describing it are the lifeblood of your research. Put bluntly: without data, there is no research. It is therefore essential that you take adequate measures to protect your data against accidental loss and against unauthorised manipulation.
Particularly when collecting (sensitive) personal data it is necessary to ensure that these data can only be accessed by those authorized to do so. In this chapter, you will learn more about measures to help you address these threats.
After completing your travels through this chapter you should:
Be familiar with strategies to minimise errors during the processes of data entry and data coding;
Understand why the choice of file format should be planned carefully;
Be able to manage the integrity and authenticity of your data during the research process;
Understand the importance of a systematic approach to data quality;
Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
Protect, a chapter of the CESSDA Expert Tour on Data Management
This part of the tour guide focuses on key legal and ethical considerations in creating shareable data.
We begin with clarifying the different legal requirements of Member States, and the impact of the upcoming General Data Protection Regulation (GDPR) on research data management. Subsequently, we will show you how sharing personal data can often be accomplished by using a combination of obtaining informed consent, data anonymisation and regulating data access. Also, the supporting role of ethical review in managing your legal and ethical obligations is highlighted.
After completing your trips around this chapter you should:
Be aware of your legal and ethical obligations towards participants and be informed of the different legal requirements of Member States;
Understand how protecting your data well protects you against violating laws and promises made to participants;
Understand the impact of the upcoming General Data Protection Regulation (GDPR; European Union, 2016);
Understand how a combination of informed consent, anonymisation and access controls allows you to create shareable personal data;
Be able to define what elements should be integrated into a consent form;
Be able to apply anonymisation techniques to your data;
Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
Archive & Publish, a chapter of the CESSDA Expert Tour on Data Management
High-quality data have the potential to be reused in many ways. Archiving and publishing your data properly will enable both your future self as well as future others to get the most out of your data.
In this chapter, we venture into the landscape of research data archiving and publication. We will guide you in making an informed decision on where to archive and publish your data in such a way that others can properly access, understand, use and cite them.
After completing your travels through this chapter you should:
Understand the difference between data archiving and data publishing;
Be aware of the benefits of data publishing;
Be able to differentiate between different data publication services (data journal, self-archiving, a data repository);
Be able to select a data repository which fits your research data's needs;
Be aware of ways to promote your research data publication;
Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
Research Data Management Hands on Workshop
Description: This project includes material designed for teaching a 1.5 hour research data management workshop. It involves a case study that requires workshop participants to navigate messy data to identify the data that corresponds with the data represented in a figure from an article. Workshop attendees are then required to modify the messy data to follow research data management best practices.
Penn State Online: Introduction to GIS modeling and Python
This unit is Lesson 1 of the online course GEOG 485: GIS Programming and Software Development at Penn State University's College of Earth and Mineral Sciences.
As with GEOG 483 and GEOG 484, the lessons in this course are project-based with key concepts embedded within. However, because of the nature of computer programming, there is no way this course can follow the step-by-step instruction design of the previous courses. You will probably find the course to be more challenging than the others.