FAIR Data Principles

  • Clean your taxonomy data with the taxonomyCleanr R package

    Taxonomic data can be messy and challenging to work with. Misspellings, common names, unaccepted names, and synonyms all contribute to ambiguity about what a taxon actually is. The taxonomyCleanr R package helps you resolve taxonomic data against a taxonomic authority, retrieve accepted names and taxonomic serial numbers, and create metadata for your taxa in the Ecological Metadata Language (EML) format.
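
    As an illustration, a minimal sketch of such a cleanup in R. Function names follow the taxonomyCleanr documentation (resolve_sci_taxa(), make_taxonomicCoverage()); the arguments and the resolver's output columns are indicative rather than definitive:

      library(taxonomyCleanr)

      raw_taxa <- c("Poa pratensis", "Poa pratensis L.", "poa pratensiss")

      # Resolve names against a taxonomic authority (3 = ITIS) to obtain
      # accepted names and taxonomic serial numbers.
      resolved <- resolve_sci_taxa(x = raw_taxa, data.sources = 3)

      # Create taxonomicCoverage EML metadata for the resolved taxa
      # (column names below are assumptions about the resolver output).
      make_taxonomicCoverage(taxa.clean   = resolved$taxa_clean,
                             authority    = resolved$authority,
                             authority.id = resolved$authority_id,
                             path = ".")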

  • Postgres, EML and R in a data management workflow

    Metadata storage and creation of Ecological Metadata Language (EML) records can be a challenge for people and organizations who want to archive their data. A workflow was developed that combines efficient EML record generation (using a package developed by the R community) with centrally controlled metadata in a relational database. The webinar has two components: 1) a demonstration of metadata storage and management using a relational database, and 2) discussion of an example EML file generation workflow using pre-defined R functions.
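
    A minimal sketch of the database half of such a workflow, using the DBI and RPostgres packages; the table, columns, and connection details are hypothetical:

      library(DBI)

      # Connect to the central metadata database.
      con <- dbConnect(RPostgres::Postgres(),
                       dbname = "metadata_db", host = "localhost",
                       user = "reader", password = Sys.getenv("PGPASSWORD"))

      # Pull centrally controlled metadata for one dataset; these rows
      # would then feed the EML generation functions.
      meta <- dbGetQuery(con,
        "SELECT title, abstract, contact_name, contact_email
           FROM dataset_metadata
          WHERE dataset_id = $1",
        params = list(42L))

      dbDisconnect(con)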


  • Experimental Design Assistant

    The Experimental Design Assistant (EDA) (https://eda.nc3rs.org.uk) is a free web-based tool developed by the NC3Rs (https://www.nc3rs.org.uk). It guides researchers through the design and analysis of in vivo experiments. The EDA allows users to build a stepwise visual representation of their experiment, providing feedback and dedicated support for randomization, blinding, and sample size calculation. This demonstration introduces the tool and offers guidance on getting started. Ultimately, the use of a tool such as the EDA will lead to carefully designed experiments that yield robust and reproducible data using the minimum number of animals consistent with scientific objectives.

  • Open Science and Innovation

    This course helps you to understand open business models and responsible research and innovation (RRI) and illustrates how these can foster innovation. By the end of the course, you will:

    • Understand key concepts and values of open business models and responsible research and innovation
    • Know how to plan your innovation activities
    • Be able to use Creative Commons licenses in business
    • Understand new technology transfer policies aligned with the ethos of Open Science
    • Learn how to get things to market faster
  • Data Management using NEON Small Mammal Data

    Undergraduate STEM students are graduating into professions that require them to manage and work with data at many points of a data management lifecycle. Within ecology, students are presented not only with many opportunities to collect data themselves but increasingly to access and use public data collected by others. This activity introduces the basic concept of data management from the field through to data analysis. The accompanying presentation materials mention the importance of considering long-term data storage and data analysis using public data.

    Content page: https://github.com/NEONScience/NEON-Data-Skills/blob/master/tutorials/te...
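
    As a concrete starting point, NEON small mammal data can be pulled straight into R with the neonUtilities package; the product ID below is our assumption of the relevant box-trapping product, so check it against the NEON data catalog:

      library(neonUtilities)

      # Download small mammal box trapping data for one site and year.
      mam <- loadByProduct(dpID = "DP1.10072.001",  # assumed product ID
                           site = "HARV",
                           startdate = "2019-01", enddate = "2019-12",
                           check.size = FALSE)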

  • Open Licensing

    Licensing your research outputs is an important part of practicing Open Science. After completing this course, you will:

    • Know what licenses are, how they work, and how to apply them 
    • Understand how different types of licenses can affect research output reuse
    • Know how to select the appropriate license for your research 
  • Make EML with R and share on GitHub

    Introduction to the Ecological Metadata Language (EML). Topics include:

    • Use R to build EML for a mock dataset (a condensed sketch follows the tutorial link below)
    • Validate EML and write to file
    • Install Git and configure to track file versioning in RStudio
    • Set up GitHub account and repository
    • Push local content to GitHub for sharing and collaboration

    Access the rendered version of this tutorial here: https://cdn.rawgit.com/EDIorg/tutorials/2002b911/make_eml_with_r/make_em...
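
    A condensed sketch of the build-validate-share loop above, using the rOpenSci EML package and the gert package for Git; the titles, names, and paths are placeholders, and the push assumes you are inside a Git repository with a GitHub remote already configured:

      library(EML)
      library(gert)

      # Build a minimal EML document for a mock dataset.
      my_eml <- list(dataset = list(
        title   = "Mock dataset",
        creator = list(individualName = list(givenName = "Jane",
                                             surName   = "Doe"))))

      # Write the record to file, then validate it; validation reports
      # any required elements still missing from the sketch.
      write_eml(my_eml, "mock_dataset.xml")
      eml_validate("mock_dataset.xml")

      # Track and share the file via Git/GitHub.
      git_add("mock_dataset.xml")
      git_commit("Add EML for mock dataset")
      git_push()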

  • Tutorial: DataCite Linking

    This tutorial walks users through the simple process of creating a workflow in the OpenMinTeD platform that allows them to extract links to DataCite (https://www.datacite.org) - mainly citations to datasets - from scientific publications.

  • Florilege, a new database of habitats and phenotypes of food microbe flora

    This tutorial explains how to use the “Habitat-Phenotype Relation Extractor for Microbes” application available from the OpenMinTeD platform. It also explains the scientific issues the application addresses and how the results of the text and data mining (TDM) process can be queried and exploited by researchers through the Florilège application.

    In recent years, developments in molecular technologies have led to an exponential growth of experimental data and publications, many of which are open yet only accessible separately. It is therefore now crucial for researchers to have bioinformatics infrastructures at their disposal that offer unified access to both data and the related scientific articles. With the right text mining infrastructures and tools, application developers and data managers can rapidly access and process textual data, link them with other data, and make the results available to scientists.

    The text-mining process behind Florilège was set up by INRA using the OpenMinTeD environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type, such as “habitat” or “taxon”.

    Sections of the tutorial:
    1. Biological motivation for the Florilège database
    2. Florilège use case on OpenMinTeD (includes a description of how to access the Habitat-Phenotype Relation Extractor for Microbes application)
    3. Florilège backstage: how is it built?
    4. Florilège description
    5. How to use Florilège?


  • Best Practice in Open Research

    This course introduces some practical steps toward making your research more open. We begin by exploring the practical implications of open research and the benefits it can deliver for research integrity and public trust, as well as the benefits you will accrue in your own work. After a short elaboration of some useful rules of thumb, we move on to practical steps for meeting contemporary best practice in open research and introduce some useful discipline-specific resources. Upon completing this course, you will:

    • Understand the practical implications of taking a more open approach to research
    • Be prepared to meet expectations relating to openness from funders, publishers, and peers 
    • Be able to reap the benefits of working openly
    • Have an understanding of the guiding principles to follow when building openness into your research workflow
    • Know about some useful tools and resources to help you embed Open Science into your research practices
  • Managing and Sharing Research Data

    Data-driven research is becoming increasingly common in a wide range of academic disciplines, from Archaeology to Zoology, and spanning Arts and Science subject areas alike. To support good research, we need to ensure that researchers have access to good data. Upon completing this course, you will:

    • Understand which data you can make open and which need to be protected
    • Know how to go about writing a data management plan
    • Understand the FAIR principles
    • Be able to select which data to keep and find an appropriate repository for them
    • Learn tips on how to get maximum impact from your research data
  • GeoNode for Developers Workshop

    GeoNode is a web-based application and platform for developing geospatial information systems (GIS) and for deploying spatial data infrastructures (SDI). It is designed to be extended and modified and can be integrated into existing platforms.
    This workshop covers the following topics:

    • How to run GeoNode in development mode
    • Using the geonode-project template to customize GeoNode
    • Change the look and feel of the application
    • Add your own app
    • Add your own models, views, and logic
    • Build your own APIs
    • Add a third party app
    • Deploy your customized GeoNode


    To access geonode-project on GitHub, go to https://github.com/GeoNode/geonode-project.


  • Environmental Data Initiative Five Phases of Data Publishing Webinar - What are metadata and structured metadata?

    Metadata are essential to understanding a dataset. The talk covers:

    • How structured metadata are used to document, discover, and analyze ecological datasets.
    • Tips on creating quality metadata content.
    • An introduction to Ecological Metadata Language (EML), the metadata language used by the Environmental Data Initiative. EML is written in XML, a general-purpose mechanism for describing hierarchical information, so some general XML features and how they apply to EML are covered.
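
    Because EML is structured, records can be parsed programmatically; a tiny sketch using the rOpenSci EML package (the file name is a placeholder):

      library(EML)

      # Read a structured EML record and pull out fields of interest.
      eml <- read_eml("dataset.xml")
      eml$dataset$title
      eml$dataset$creator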

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.


  • Environmental Data Initiative Five Phases of Data Publishing Webinar - Make metadata with the EML assembly line

    High-quality structured metadata is essential to the persistence and reuse of ecological data; however, creating such metadata requires substantial technical expertise and effort. To accelerate the production of metadata in the Ecological Metadata Language (EML), we’ve created the EMLassemblyline R code package. Assembly line operators supply the data and information about the data, then the machinery auto-extracts additional content and translates it all to EML. In this webinar, the presenter will provide an overview of the assembly line, how to operate it, and a brief demonstration of its use on an example dataset.
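
    A minimal sketch of the assembly-line workflow, with function names as documented in the EMLassemblyline README; the paths, file names, and identifiers are placeholders, and make_eml() requires further arguments (coverage, contacts, and so on) in practice:

      library(EMLassemblyline)

      # Step 1: create template files for the operator to fill in.
      template_core_metadata(path = "metadata_templates", license = "CC0")
      template_table_attributes(path = "metadata_templates",
                                data.path = "data_objects",
                                data.table = "samples.csv")

      # Step 2: the machinery auto-extracts content and translates
      # everything into an EML record.
      make_eml(path = "metadata_templates",
               data.path = "data_objects",
               eml.path = "eml",
               dataset.title = "Example dataset",
               data.table = "samples.csv",
               user.id = "my_user_id",
               user.domain = "EDI",
               package.id = "edi.0.1")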

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.


  • Environmental Data Initiative Five Phases of Data Publishing Webinar - Creating "clean" data for archiving

    Not all data are easy to use, and some are nearly impossible to use effectively. This presentation lays out the principles and some best practices for creating data that will be easy to document and use. It identifies many of the pitfalls in data preparation and formatting that cause problems further down the line, and shows how to avoid them.
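
    As a flavor of one such practice (our illustration, not taken from the presentation), reshaping a wide table with values coded in the column headers into one observation per row:

      library(tidyr)

      # A "messy" table: one column per year, years as column headers.
      wide <- data.frame(site = c("A", "B"),
                         `2019` = c(3.1, 2.7),
                         `2020` = c(3.4, 2.9),
                         check.names = FALSE)

      # Tidy form: one observation per row, explicit variable columns.
      long <- pivot_longer(wide, cols = c("2019", "2020"),
                           names_to = "year", values_to = "biomass")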

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the second phase of data publishing, cleaning data. For more guidance from EDI on data cleaning, also see "How to clean and format data using Excel, OpenRefine, and Excel," located here: https://www.youtube.com/watch?v=tRk01ytRXjE.

  • Environmental Data Initiative Five Phases of Data Publishing Webinar - How to clean and format data using Excel, OpenRefine, and Excel

    This webinar provides an overview of some of the tools available for formatting and cleaning data, guidance on tool suitability and limitations, and an example dataset and instructions for working with those tools.

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the second phase of data publishing, cleaning data.

    For more guidance from EDI on data cleaning, also see "Creating 'clean' data for archiving," located here: https://www.youtube.com/watch?v=gW_-XTwJ1OA.

  • A FAIR afternoon: on FAIR data stewardship for Technology Hotel (ETH4) beneficiaries

    A FAIR data awareness event for the fourth round of the Enabling Technologies Hotels programme. One of the aims of the programme is to promote the application of the FAIR data principles in research data stewardship, data integration, methods, and standards. This relates to the National Plan Open Science objective that research data be made suitable for reuse.
    With this FAIR data training, ZonMw and DTL aim to help researchers (hotel guests and managers) who have obtained a grant in the fourth round of the programme to apply FAIR data management in their research.

  • Data Management Expert Guide

    This guide is written for social science researchers who are in an early stage of practising research data management. With this guide, CESSDA wants to contribute to professionalism in data management and increase the value of research data.

    If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours; however, you can also hop on and off at any time.

  • CESSDA Expert Tour Guide on Data Management

    Target audience and mission:
    This tour guide was written for social science researchers who are in an early stage of practising research data management. With this tour guide, CESSDA wants to contribute to increased professionalism in data management and to improving the value of research data.
    Overview:
    If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours; you can also just hop on and off.
    During your travels, you will come across the following recurring topics:
    • Adapt Your DMP
    • European Diversity
    • Expert Tips
    • Tour Operators
    Current chapters cover the following topics: Plan; Organise & Document; Process; Store; Protect; Archive & Publish. Other chapters may be added over time.

  • Research Rigor & Reproducibility: Understanding the Data Lifecycle for Research Success

    This course provides recommended practices for facilitating the discoverability, access, integrity, and reuse value of your research data. The modules have been selected from a larger Canvas course, "Best Practices for Biomedical Research Data Management" (https://www.canvas.net/browse/harvard-medical/courses/biomed-research-da...).

    Biomedical research today must be not only rigorous, innovative, and insightful; it also has to be organized and reproducible. With a greater capacity to create and store data comes the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers now require publication of relevant data to promote open science and the reproducibility of research.

    In this course, students will learn how to identify and address current workflow challenges throughout the research life cycle. By understanding best practices for managing your data throughout a project, you will succeed in making your research ready to publish, share, and interpret, and ready for others to use. Course materials include video lectures, presentation slides, readings and resources, research case studies, interactive activities, and concept quizzes.

  • Best Practices for Biomedical Research Data Management

    This course presents approximately 20 hours of content aimed at a broad audience on recommended practices facilitating the discoverability, access, integrity, reuse value, privacy, security, and long-term preservation of biomedical research data.

    Each of the nine modules is dedicated to a specific component of data management best practices and includes video lectures, presentation slides, readings & resources, research teaching cases, interactive activities, and concept quizzes.

    Background Statement:
    Biomedical research today must be not only rigorous, innovative, and insightful; it also has to be organized and reproducible. With a greater capacity to create and store data comes the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers now require publication of relevant data to promote open science and the reproducibility of research.

    To meet these requirements and evolving trends, researchers and information professionals need data management and curation knowledge and skills that support the access, reuse, and preservation of data.

    This course is designed to address present and future data management needs.

    Best Practices for Biomedical Research Data Management serves as an introduction to the field of scientific data management for information professionals and scientific researchers. The course is also offered through Canvas, at: https://www.canvas.net/browse/harvard-medical/courses/biomed-research-da...

    In this course, learners will explore relationships between libraries and stakeholders seeking support for managing their research data. 

  • DataONE Data Management Module 02: Data Sharing

    When first sharing research data, researchers often raise questions about the value, benefits, and mechanisms for sharing. Many stakeholders and interested parties, such as funding agencies, communities, other researchers, or members of the public may be interested in research, results and related data. This 30-40 minute lesson addresses data sharing in the context of the data life cycle, the value of sharing data, concerns about sharing data, and methods and best practices for sharing data and includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • DataONE Data Management Module 01: Why Data Management

    As rapidly changing technology enables researchers to collect large, complex datasets with relative ease, the need to effectively manage these data increases in kind. This is the first lesson in a series of education modules intended to provide a broad overview of various topics related to research data management. This 30-40 minute module covers trends in data collection, storage and loss, the importance and benefits of data management, and an introduction to the data life cycle and includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • FAIR Webinar Series

    This webinar series explores each of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) in depth, with practical case studies from a range of disciplines, Australian and international perspectives, and resources to support the uptake of the FAIR principles.

    The FAIR data principles were drafted by the FORCE11 group in 2015. The principles have since received worldwide recognition as a useful framework for thinking about sharing data in a way that will enable maximum use and reuse. A seminal article describing the FAIR principles can be found at: https://www.nature.com/articles/sdata201618.

    This series is of interest to those who create, manage, connect, and publish research data at institutions:
    - researchers and research teams who need to ensure their data are reusable and publishable
    - data managers and researchers
    - librarians, data managers, and repository managers
    - IT staff who need to connect institutional research data, HR, and other IT systems

  • Introduction, FAIR Principles and Management Plans

    A presentation on FAIR Data and Software and on FAIR Principles and Management Plans, given during a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018.

  • Coffee and Code: Reproducibility and Communication

    This workshop provides an introduction to reproducibility and communication of research using notebooks based on RStudio and Jupyter Notebooks. The development of effective documentation and accessible, reusable methods in scientific analysis can contribute significantly to the reproducibility and understanding of a research activity. The integration of executable code with blocks of narrative content within notebook systems such as those provided in the RStudio and Jupyter Notebook (and Lab) software environments provides a streamlined way to bring these minimum components (data, metadata, code, and software) into a package that can be easily shared with others for review and reuse; a minimal notebook sketch follows the list below.

    This workshop will provide:  

    • A high-level introduction to the notebook interfaces provided for R and Python through the RStudio and Jupyter Notebook environments.
    • An introduction to Markdown as a language supported by both systems for adding narrative content to notebooks
    • Sample notebooks illustrating structure, content, and output options
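
    As a flavor of the pattern on the R side, a minimal R Markdown document mixing Markdown narrative with an executable chunk (our sketch, not taken from the workshop materials; the data file is a placeholder):

      ---
      title: "Reproducible analysis"
      output: html_document
      ---

      ## Data summary

      The chunk below loads the data and reports a summary; the
      narrative and the code travel together in one file.

      ```{r summary-chunk}
      data <- read.csv("observations.csv")  # placeholder file name
      summary(data$value)
      ```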

    On the master page for this resource, the Reproducibility and Communication Using Notebooks .ipynb file provides more information about what is covered in this workshop.

  • Coffee and Code: Introduction to Version Control

    This is a tutorial about version control, also known as revision control, a method for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.

    Also see Advanced Version Control, here: https://github.com/unmrds/cc-version-control/blob/master/03-advanced-ver...

  • Coffee and Code: Advanced Version Control

    Learn advanced version control practices for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.  

    This tutorial builds on concepts taught in "Introduction to Version Control," found here: https://github.com/unmrds/cc-version-control/blob/master/01-version-cont....

    Git Repository for this Workshop: https://github.com/unmrds/cc-version-control

  • MANTRA Research Data Management Training

    MANTRA is a free, online, non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It has been crafted for post-graduate students, early-career researchers, and information professionals, and is freely available on the web for anyone to explore.

    Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.

    There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:

    • Understand the nature of research data in a variety of disciplinary settings
    • Create a data management plan and apply it from the start to the finish of your research project
    • Name, organise, and version your data files effectively
    • Gain familiarity with different kinds of data formats and know how and when to transform your data
    • Document your data well for yourself and others, learn about metadata standards, and cite data properly
    • Know how to store and transport your data safely and securely (backup and encryption)
    • Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
    • Understand the benefits of sharing, preserving, and licensing data for re-use
    • Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS

  • Research Data Management and Open Data

    A presentation given during the Julius Symposium 2017 on Open Science, focusing in particular on open and/or FAIR data. Examples are given from medical and health research data.

  • Train the Trainer Workshop: How do I create a course in research data management?

    Presentations and exercises from a train-the-trainer workshop on how to create a course in research data management, given at the International Digital Curation Conference 2018 in Barcelona.

  • Developing Data Management Education, Support, and Training

    These presentations were part of an invited guest lecture on data management for CISE graduate students in the CAP 5108: Research Methods for Human-centered Computing course at the University of Florida (UF) on April 12, 2018. Graduate students were introduced to the DCC Checklist for a Data Management Plan, the OAIS model (CESSDA adaptation), ORCiD, institutional repositories (IR), high-performance computing (HPC) storage options at UF, data lifecycle models (USGS and UNSW), data publication guides (Beckles, 2018), and reproducibility guidelines (ACM SIGMOD 2017/2018). This was the first guest lecture on data management for UF computer & information science & engineering (CISE) graduate students in CAP 5108: Research Methods for Human-centered Computing - https://www.cise.ufl.edu/class/cap5108sp18/. A draft of a reproducibility template is provided in version 3 of the guest lecture.

  • Coffee and Code: Write Once Use Everywhere (Pandoc)

    Pandoc (http://pandoc.org) is a document processing program that runs on multiple operating systems (Mac, Windows, Linux) and can read and write a wide variety of file formats. In many respects, Pandoc can be thought of as a universal translator for documents. This workshop focuses on a subset of input and output document types, just scratching the surface of the transformations made possible by Pandoc.

    Click 00-Overview.ipynb on the provided GitHub page or go directly to the overview, here:
    https://github.com/unmrds/cc-pandoc/blob/master/00-Overview.ipynb
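
    Although the workshop drives Pandoc from the command line, the same binary can also be invoked from R through rmarkdown::pandoc_convert(); a small sketch (file names are placeholders):

      library(rmarkdown)

      # Convert a Markdown file to Word by calling the system's
      # pandoc binary.
      pandoc_convert(input = "notes.md",
                     to = "docx",
                     output = "notes.docx")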

  • Seismic Data Quality Assurance Using IRIS MUSTANG Metrics

    Seismic data quality assurance involves reviewing data in order to identify and resolve problems that limit the use of the data – a time-consuming task for large data volumes! Additionally, no two analysts review seismic data in quite the same way. Recognizing this, IRIS developed the MUSTANG automated seismic data quality metrics system to provide data quality measurements for all data archived at IRIS Data Services. Knowing how to leverage MUSTANG metrics can help users quickly discriminate between usable and problematic data, and the system is flexible enough for each user to adapt it to their own working style.
    This tutorial presents strategies for using MUSTANG metrics to optimize your own data quality review. Many of the examples in this tutorial illustrate approaches used by the IRIS Data Services Quality Assurance (QA) staff.
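
    MUSTANG metrics are exposed as web services, so they can also be queried directly from R; a sketch using httr against the measurements service (the parameter names reflect our reading of the MUSTANG web-service documentation and should be verified there):

      library(httr)

      # Query the MUSTANG measurements service for one metric/station.
      resp <- GET("https://service.iris.edu/mustang/measurements/1/query",
                  query = list(metric = "percent_availability",
                               net = "IU", sta = "ANMO",
                               format = "text"))
      cat(content(resp, as = "text", encoding = "UTF-8"))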

  • Singularity User Guide

    Singularity is a container solution created by necessity for scientific and application-driven workloads.
    Over the past decade and a half, virtualization has gone from an engineering toy to a global infrastructure necessity, and the evolution of enabling technologies has flourished. Most recently, we have seen the introduction of the latest spin on virtualization: "containers".
    Many scientists, especially those involved with the high-performance computing (HPC) community, could benefit greatly from using container technology, but they need a feature set that differs somewhat from that available with current container technology. This necessity drove the creation of Singularity and articulated its four primary functions:

    • Mobility of compute
    • Reproducibility
    • User freedom
    • Support on existing traditional HPC 


    This user guide introduces Singularity, a free, cross-platform, open-source program that performs operating-system-level virtualization, also known as containerization.

  • Why Cite Data?

    This video explains what data citation is and why it's important. It also discusses what digital object identifiers (DOIs) are and how they are used.