FAIR Data Principles

  • Webinar: National Data Service (NDS) Labs Workbench

    The growing size and complexity of high-value scientific datasets are pushing the boundaries of traditional models of data access and discovery. Many large datasets are only accessible through the systems on which they were created or require specialized software or computational resources for re-use. In response to this growing need, the National Data Service (NDS) consortium is developing the Labs Workbench platform, a scalable, web-based system intended to support turn-key deployment of encapsulated data management and analysis tools for exploratory analysis and development on cloud resources that are physically "near" the data and associated high-performance computing (HPC) systems. The Labs Workbench may complement existing science gateways by enabling exploratory analysis of data and the ability for users to deploy and share their own tools. The Labs Workbench platform has also been used to support a variety of training and workshop environments.

    This webinar includes a demonstration of the Labs Workbench platform and a discussion of several key use cases. A presentation of findings from the recent Workshop on Container Based Analysis Environments for Research Data Access and Computing further highlights compatibilities between science gateways and interactive analysis platforms such as Labs Workbench.
     

  • Research Data Management and Sharing MOOC

    This course will provide learners with an introduction to research data management and sharing. After completing this course, learners will understand the diversity of data and their management needs across the research data lifecycle, be able to identify the components of good data management plans and be familiar with best practices for working with data including the organization, documentation, and storage and security of data. Learners will also understand the impetus and importance of archiving and sharing data as well as how to assess the trustworthiness of repositories.

    Note: The course is free to access. However, if you pay for the course, you will have access to all of the features and content you need to earn a Course Certificate from Coursera. If you complete the course successfully, your electronic Certificate will be added to your Coursera Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. Note that the Course Certificate does not represent official academic credit from the partner institution offering the course.
    Also, note that the course is offered on a regular basis. For information about the next enrollment, go to the provided URL.

     
  • Introduction to Data Documentation - DISL Data Management Metadata Training Webinar Series - Part 1

    Introduction to data documentation (metadata) for science datasets. Includes basic concepts about metadata and a few words about data accessibility. Video is about 23 minutes.

  • An overview of the EDI data repository and data portal

    The Environmental Data Initiative (EDI) data repository is a metadata-driven archive for environmental and ecological research data described by the Ecological Metadata Language (EML). This webinar will provide an overview of the PASTA software used by the repository and demonstrate the essentials of uploading a data package to the repository through the EDI Data Portal. 

  • FAIR Self-Assessment Tool

    The FAIR Data Principles are a set of guiding principles to make data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). Using this tool you will be able to assess the 'FAIRness' of a dataset and determine how to enhance its FAIRness (where applicable).

    This self-assessment tool has been designed predominantly for data librarians and IT staff but could be used by software engineers developing FAIR Data tools and services, and researchers provided they have assistance from research support staff.

    You will be asked questions related to the principles underpinning Findable, Accessible, Interoperable and Reusable. Once you have answered all the questions in each section you will be given a ‘green bar’ indicator based on your answers in that section, and when all sections are completed, an overall 'FAIRness' indicator is provided.

  • Clean your taxonomy data with the taxonomyCleanr R package

    Taxonomic data can be messy and challenging to work with. Incorrect spelling, the use of common names, unaccepted names, and synonyms, contribute to ambiguity in what a taxon actually is. The taxonomyCleanr R package helps you resolve taxonomic data to a taxonomic authority, get accepted names and taxonomic serial numbers, as well as create metadata for your taxa in the Ecological Metadata Language (EML) format.
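The core idea the package automates can be sketched as follows. This is an illustrative Python sketch of resolving messy taxon names against an authority list, not the taxonomyCleanr API; the lookup table and the identifier are mock values.

```python
# Illustrative sketch (not the taxonomyCleanr API): resolve messy taxon names
# against a small mock authority list. Real tools query authorities such as
# ITIS and return accepted names plus taxonomic serial numbers.
AUTHORITY = {
    # normalized name -> (accepted name, mock identifier)
    "oncorhynchus mykiss": ("Oncorhynchus mykiss", 12345),
    "salmo gairdneri": ("Oncorhynchus mykiss", 12345),  # synonym -> accepted name
}

def resolve_taxon(raw_name):
    """Normalize whitespace and case, then look the name up in the authority."""
    key = " ".join(raw_name.strip().lower().split())
    return AUTHORITY.get(key)  # None if the name cannot be resolved

print(resolve_taxon("  Salmo  gairdneri "))  # synonym resolves to the accepted name
```

Unresolvable names return `None`, which is where manual curation would take over.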

  • Postgres, EML and R in a data management workflow

    Metadata storage and creation of Ecological Metadata Language (EML) can be a challenge for people and organizations who want to archive their data. A workflow was developed to combine efficient EML record generation (using the package developed by the R community) with centrally-controlled metadata in a relational database. The webinar has two components: 1) a demonstration of metadata storage and management using a relational database, and 2) discussion of an example EML file generation workflow using pre-defined R functions.

     

  • Open Science and Innovation

    This course helps you to understand open business models and responsible research and innovation (RRI) and illustrates how these can foster innovation. By the end of the course, you will:

    • Understand key concepts and values of open business models and responsible research and innovation
    • Know how to plan your innovation activities
    • Be able to use Creative Commons licenses in business
    • Understand new technology transfer policies with the ethos of Open Science
    • Learn how to get things to market faster
  • Open Licensing

    Licensing your research outputs is an important part of practicing Open Science. After completing this course, you will:

    • Know what licenses are, how they work, and how to apply them 
    • Understand how different types of licenses can affect research output reuse
    • Know how to select the appropriate license for your research 
  • Make EML with R and share on GitHub

    Introduction to the Ecological Metadata Language (EML). Topics include:

    • Use R to build EML for a mock dataset
    • Validate EML and write to file
    • Install Git and configure to track file versioning in RStudio
    • Set up GitHub account and repository
    • Push local content to GitHub for sharing and collaboration

    Access the rendered version of this tutorial here: https://cdn.rawgit.com/EDIorg/tutorials/2002b911/make_eml_with_r/make_em...

  • Tutorial: DataCite Linking

    This tutorial walks users through the simple process of creating a workflow in the OpenMinTeD platform that allows them to extract links to DataCite (https://www.datacite.org) - mainly citations to datasets - from scientific publications.

  • Florilege, a new database of habitats and phenotypes of food microbe flora

    This tutorial explains how to use the “Habitat-Phenotype Relation Extractor for Microbes” application available from the OpenMinTeD platform. It also explains the scientific issues it addresses, and how the results of the TDM process can be queried and exploited by researchers through the Florilège application.  

    In recent years, developments in molecular technologies have led to an exponential growth of experimental data and publications, many of which are open but accessible only separately. It is therefore crucial for researchers to have bioinformatics infrastructures at their disposal that provide unified access to both data and related scientific articles. With the right text mining infrastructures and tools, application developers and data managers can rapidly access and process textual data, link them with other data, and make the results available for scientists.

    The text-mining process behind Florilège has been set up by INRA using the OpenMinTeD environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type, such as “habitat” or “taxon”.
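The typed-term step can be illustrated with a toy sketch. This is not INRA's actual pipeline (which uses far richer linguistic resources); it is a minimal dictionary-based tagger, with a made-up lexicon, showing what "identify a word group and assign it a type" means in practice.

```python
# Toy sketch of typed-term extraction (not INRA's actual pipeline): match
# known terms in a sentence and assign each a type such as "habitat" or "taxon".
LEXICON = {
    "cheese": "habitat",
    "raw milk": "habitat",
    "lactococcus lactis": "taxon",
}

def extract_entities(sentence):
    """Return (term, type) pairs for every lexicon term found in the sentence."""
    s = sentence.lower()
    return [(term, etype) for term, etype in LEXICON.items() if term in s]

hits = extract_entities("Lactococcus lactis is commonly isolated from raw milk.")
```

A real system would add tokenization, morphological variants, and disambiguation, but the output shape (term plus assigned type) is the same.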

    Sections of the tutorial:
    1. Biological motivation of the Florilège database
    2. Florilège use case on OpenMinTeD (includes a description of how to access the Habitat-Phenotype Relation Extractor for Microbes application)
    3. Florilège backstage: how is it built?
    4. Florilège description
    5. How to use Florilège?

     

  • Managing and Sharing Research Data

    Data-driven research is becoming increasingly common in a wide range of academic disciplines, from Archaeology to Zoology, and spanning Arts and Science subject areas alike. To support good research, we need to ensure that researchers have access to good data. Upon completing this course, you will:

    • Understand which data you can make open and which need to be protected
    • Know how to go about writing a data management plan
    • Understand the FAIR principles
    • Be able to select which data to keep and find an appropriate repository for them
    • Learn tips on how to get maximum impact from your research data
  • Environmental Data Initiative Five Phases of Data Publishing Webinar - Make metadata with the EML assembly line

    High-quality structured metadata is essential to the persistence and reuse of ecological data; however, creating such metadata requires substantial technical expertise and effort. To accelerate the production of metadata in the Ecological Metadata Language (EML), we’ve created the EMLassemblyline R code package. Assembly line operators supply the data and information about the data, then the machinery auto-extracts additional content and translates it all to EML. In this webinar, the presenter will provide an overview of the assembly line, how to operate it, and a brief demonstration of its use on an example dataset.

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing: describing data.

     

  • Data Management Expert Guide

    This guide is written for social science researchers who are in an early stage of practising research data management. With this guide, CESSDA wants to contribute to professionalism in data management and increase the value of research data.

    If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours, however you can also hop on and off at any time.

  • CESSDA Expert Tour Guide on Data Management

    Target audience and mission:
    This tour guide was written for social science researchers who are in an early stage of practising research data management. With this tour guide, CESSDA wants to contribute to increased professionalism in data management and to improving the value of research data.
    Overview:
    If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours. You can also just hop on and off.
    During your travels, you will come across the following recurring topics:
    Adapt Your DMP
    European Diversity
    Expert Tips
    Tour Operators
    Current chapters include the following topics:  Plan; Organise & Document; Process; Store; Protect;  Archive & Publish.  Other chapters may be added over time.

  • Best Practices for Biomedical Research Data Management

    This course presents approximately 20 hours of content aimed at a broad audience on recommended practices facilitating the discoverability, access, integrity, reuse value, privacy, security, and long-term preservation of biomedical research data.

    Each of the nine modules is dedicated to a specific component of data management best practices and includes video lectures, presentation slides, readings & resources, research teaching cases, interactive activities, and concept quizzes.

    Background Statement:
    Biomedical research today is not only rigorous, innovative and insightful, it also has to be organized and reproducible. With more capacity to create and store data, there is the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers are requiring publication of relevant data to promote open science and reproducibility of research.

    In order to meet these requirements and evolving trends, researchers and information professionals will need the data management and curation knowledge and skills to support the access, reuse and preservation of data.

    This course is designed to address present and future data management needs.

    Best Practices for Biomedical Research Data Management serves as an introductory course for information professionals and scientific researchers to the field of scientific data management. The course is also offered via Canvas, at: https://www.canvas.net/browse/harvard-medical/courses/biomed-research-da... .

    In this course, learners will explore relationships between libraries and stakeholders seeking support for managing their research data. 

  • FAIR Webinar Series

    This webinar series explores each of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) in depth, with practical case studies from a range of disciplines, Australian and international perspectives, and resources to support the uptake of FAIR principles.

    The FAIR data principles were drafted by the FORCE11 group in 2015. The principles have since received worldwide recognition as a useful framework for thinking about sharing data in a way that will enable maximum use and reuse.  A seminal article describing the FAIR principles can also be found at:  https://www.nature.com/articles/sdata201618.

    This series is of interest to those who work with creating, managing, connecting and publishing research data at institutions:
    - researchers and research teams who need to ensure their data is reusable and publishable
    - librarians, data managers and repository managers
    - IT staff who need to connect institutional research data, HR and other IT systems

  • Coffee and Code: Introduction to Version Control

    This is a tutorial about version control, also known as revision control, a method for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.

    Also see Advanced Version Control, here: https://github.com/unmrds/cc-version-control/blob/master/03-advanced-ver...
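The workflow the tutorial teaches can be previewed in a few commands. This is a minimal, generic Git sketch (assuming `git` is installed; the file and commit names are made up for illustration): initialize a repository, commit a file, then see what changed.

```shell
# Minimal Git workflow: track a file, commit it, then inspect a later change.
mkdir demo-repo && cd demo-repo
git init -q                              # start tracking this directory
echo "first draft" > notes.txt
git add notes.txt                        # stage the file for the next commit
git -c user.email=demo@example.com -c user.name=demo commit -q -m "Add notes"
echo "revised draft" > notes.txt         # edit the tracked file
git diff --stat                          # show what changed since the last commit
```

Every change committed this way is recoverable later, which is the core benefit the tutorial develops.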

  • Coffee and Code: Advanced Version Control

    Learn advanced version control practices for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.  

    This tutorial builds on concepts taught in "Introduction to Version Control," found here: https://github.com/unmrds/cc-version-control/blob/master/01-version-cont....

    Git Repository for this Workshop: https://github.com/unmrds/cc-version-control

  • MANTRA Research Data Management Training

    MANTRA is a free, online non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It has been crafted for the use of post-graduate students, early career researchers, and also information professionals. It is freely available on the web for anyone to explore on their own.

    Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.

    There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:

    Understand the nature of research data in a variety of disciplinary settings
    Create a data management plan and apply it from the start to the finish of your research project
    Name, organise, and version your data files effectively
    Gain familiarity with different kinds of data formats and know how and when to transform your data
    Document your data well for yourself and others, learn about metadata standards and cite data properly
    Know how to store and transport your data safely and securely (backup and encryption)
    Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
    Understand the benefits of sharing, preserving and licensing data for re-use
    Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS

  • Coffee and Code: Write Once Use Everywhere (Pandoc)

    Pandoc at http://pandoc.org  is a document processing program that runs on multiple operating systems (Mac, Windows, Linux) and can read and write a wide variety of file formats. In many respects, Pandoc can be thought of as a universal translator for documents. This workshop focuses on a subset of input and output document types, just scratching the surface of the transformations made possible by Pandoc.

    Click 00-Overview.ipynb on the provided GitHub page or go directly to the overview, here:
    https://github.com/unmrds/cc-pandoc/blob/master/00-Overview.ipynb

  • 2018 NOAA Environmental Data Management Workshop (EDMW)

    The EDMW 2018 theme is "Improving Discovery, Access, Usability, and Archiving of NOAA Data for Societal Benefit." The workshop builds on past work by providing training, highlighting progress, identifying issues, fostering discussions, and determining where new technologies can be applied for management of environmental data and information at NOAA. All NOAA employees and contractors are welcome, including data producers, data managers, metadata developers, archivists, researchers, grant issuers, policy developers, program managers, and others.  Links to recordings of the sessions plus presentation slides are available.
    Some key topic areas include:

    • Big Earth Data Initiative (BEDI)
    • Data Governance
    • NCEI's Emerging Data Management System
    • Data Visualization
    • Data Lifecycle Highlights
    • Data Archiving
    • Data Integration
    • Metadata Authoring, Management & Evolution
    • NOAA Institutional Repository
    • Video Data Management & Access Solutions
    • Unified Access Format (UAF), ERDDAP & netCDF
    • Improving Data Discovery & Access to Data 
    • Arctic & Antarctic Data Access
  • Introduction to Data Management

    This short course on data management is designed for graduate students in the engineering disciplines who seek to prepare themselves as “data information literate” scientists in the digital research environment. Detailed videos and writing activities will help you prepare for the specific and long-term needs of managing your research data. Experts in digital curation will describe current sharing expectations of federal funding agencies (like NSF and NIH) and give advice on how to ethically share and preserve research data for long-term access and reuse.

    Students will get out of this course:

    • Seven web-based lessons that you can watch anytime online or download to your device.
    • A Data Management Plan (DMP) template with tips on how to complete each section. Your completed DMP can be used in grant applications or put into practice as a protocol for handling data individually or within your research group or lab. 
    • Feedback and consultation on your completed DMP by research data curators in your field.

    Module topics include: 
    1. Introduction to Data Management
    2. Data to be Managed
    3. Organization and Documentation
    4. Data Access and Ownership
    5. Data Sharing and Re-use
    6. Preservation Techniques
    7. Complete Your DMP.

  • Manage, Improve and Open Up your Research and Data

    This module will look at emerging trends and best practice in data management, quality assessment and IPR issues.

    We will look at policies regarding data management and their implementation, particularly in the framework of a Research Infrastructure.
    By the end of this module, you should be able to:

    • Understand and describe the FAIR Principles and what they are used for
    • Understand and describe what a Data Management Plan is, and how they are used
    • Understand and explain what Open Data, Open Access and Open Science means for researchers
    • Describe best practices around data management
    • Understand and explain how Research Infrastructures interact with and inform policy on issues around data management

    PARTHENOS training provides modules and resources in digital humanities and research infrastructures with the goal of strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology and related fields.  Activities designed to meet this goal will address and provide common solutions to the definition and implementation of joint policies and solutions for the humanities and linguistic data lifecycle, taking into account the specific needs of the sector, including the provision of joint training activities and modules on topics related to understanding research infrastructures and managing, improving and opening up research and data for both learners and trainers.

    More information about the PARTHENOS project can be found at:  http://www.parthenos-project.eu/about-the-project-2.  Other training modules created by PARTHENOS can be found at:  http://training.parthenos-project.eu/training-modules/.

  • OntoSoft Tutorial: A distributed semantic registry for scientific software

    An overview of the OntoSoft project, an intelligent system to assist scientists in making their software more discoverable and reusable.

    For more information on the OntoSoft project, go to https://imcr.ontosoft.org/.

  • ANDS Guide to Persistent Identifiers: Awareness Level

    A persistent identifier (PID) is a long-lasting reference to a resource. That resource might be a publication, dataset or person. Equally it could be a scientific sample, funding body, set of geographical coordinates, unpublished report or piece of software. Whatever it is, the primary purpose of the PID is to provide the information required to reliably identify, verify and locate it. A PID may be connected to a set of metadata describing an item rather than to the item itself.
    The contents of this page are:
    What is a persistent identifier?
    Why do we need persistent identifiers?
    How do persistent identifiers work?
    What needs to be done, by whom?
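One widely used PID type is the DOI, which becomes actionable by prepending the standard doi.org resolver to the identifier string. A minimal Python sketch, using the DOI of the FAIR principles article as the example:

```python
# A DOI is one common persistent identifier. Prepending the standard
# https://doi.org/ resolver turns a bare DOI into a resolvable URL.
def doi_to_url(doi):
    """Turn a bare DOI string into a resolvable URL."""
    return "https://doi.org/" + doi.strip()

# DOI of the FAIR principles article (Wilkinson et al., 2016)
print(doi_to_url("10.1038/sdata.2016.18"))
# → https://doi.org/10.1038/sdata.2016.18
```

The resolver indirection is what makes the identifier persistent: the DOI stays fixed while the landing page behind it can move.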

    Other ANDS Guides are available at the working level and expert level from this page.

  • ANDS Guides to Persistent Identifiers: Working Level

    This module is to familiarize researchers and administrators with persistent identifiers as they apply to research. It gives an overview of the various issues involved with ensuring identifiers provide ongoing access to research products. The issues are both technical and policy; this module focuses on policy issues. 
    This guide goes through the same issues as the ANDS guide Persistent identifiers: awareness level, but in more detail. The introductory module is not a prerequisite for this module.
    The contents of this page are:
    Why persistent identifiers?
    What is an Identifier?
    Data and Identifier life cycles
    What is Identifier Resolution?
    Technologies
    Responsibilities
    Policies

    Other ANDS Guides on this topic at the awareness level and expert level can be found from this page.

  • ANDS Guides to Persistent identifiers: Expert Level

    This module aims to provide research administrators and technical staff with a thorough understanding of the issues involved in setting up a persistent identifier infrastructure. It provides an overview of the types of possible identifier services, including core services and value-added services. It offers a comprehensive review of the policy issues that are involved in setting up persistent identifiers. Finally, a glossary captures the underlying concepts on which the policies and services are based.

    Other ANDS Guides on this topic are available for the awareness level and the working level from this page.

  • Findability of Research Data and Software Through PIDs and FAIR Repositories

    This presentation, "Findability of Research Data and Software Through PIDs and FAIR Repositories", is one of 9 webinars on topics related to FAIR Data and Software offered at a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018.  Presentation slides are also available in addition to the recorded presentation.
    Other topics in the series include:
    - Introduction, FAIR Principles and Management Plans
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability through Community Standards, Tidy Data Formats and R Functions, their Documentation, Packaging, and Unit-Testing
    - Reusability:  Data Licensing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Accessibility Through Git, Python Functions and Their Documentation

    This presentation, "Accessibility Through Git, Python Functions and Their Documentation", is one of 9 webinars on topics related to FAIR Data and Software offered at a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018.  Presentation slides are also available in addition to the recorded presentation.
    The presentation covers:
    - The definition and role of accessibility
    - Version control & project management with Git(Hub)
    - Accessible software & comprehensible code
    - Functions in Python & R

    Other topics in the series include:
    - Introduction, FAIR Principles and Management Plans
    - Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability through Community Standards, Tidy Data Formats and R Functions, their Documentation, Packaging, and Unit-Testing
    - Reusability:  Data Licensing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • FAIR Data and Software - Summary

    This presentation, a summary of FAIR Data and Software, is one of 9 webinars on topics related to FAIR Data and Software offered at a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018.  Presentation slides are also available in addition to the recorded presentation.
     
    Other topics in the series include:
    - Introduction, FAIR Principles and Management Plans
    - Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing
    - Reusability: Data Licensing
    - Reusability: Software Licensing
    - Reusability:  Software Publication

    URL locations for the other modules in the webinar can be found at the URL above.

  • Formal Ontologies: A Complete Novice's Guide

    This module is specifically aimed at those who are not yet familiar with ontologies as a means of research data management, and will take you through some of the main features of ontologies, and the reasons for using them.  If you’d like to take a step back to a very basic introduction to knowledge representation systems, you could have a look at the brief summary we have given in the ‘Introduction to Research Infrastructures Module’ before starting.
    By the end of this module, participants should be able to:
    - Understand what we mean by ‘Data Heterogeneity’, and how it affects knowledge representation
    - Understand and explain the basic concept of an ontology
    - Understand and explain how ontologies are used to curate and share research data

    PARTHENOS training provides modules and resources in digital humanities and research infrastructures with the goal of strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology and related fields.  Activities designed to meet this goal will address and provide common solutions to the definition and implementation of joint policies and solutions for the humanities and linguistic data lifecycle, considering the specific needs of the sector including the provision of joint training activities and modules on topics related to understanding research infrastructures and managing, improving and opening up research and data for both learners and trainers.
    More information about the PARTHENOS project can be found at:  http://www.parthenos-project.eu/about-the-project-2.
      Other training modules created by PARTHENOS can be found at:  http://training.parthenos-project.eu/training-modules/.
     

  • PARTHENOS E-Humanities and E-Heritage Webinar Series

    The PARTHENOS eHumanities and eHeritage Webinar Series provides a lens through which a more nuanced understanding of the role of Digital Humanities and Cultural Heritage research infrastructures in research can be obtained.  Participants of the PARTHENOS Webinar Series will delve into a number of topics, technologies, and methods that are connected with an “infrastructural way” of engaging with data and conducting humanities research.

    Topics include: theoretical and practical reflections on digital and analogue research infrastructures; opportunities and challenges of eHumanities and eResearch; finding, working and contributing to Research Infrastructure collections; standards; FAIR principles; ontologies; tools and Virtual Research Environments (VREs), and; new publication and dissemination types.  

    Slides and video recordings are available on the "Wrap Up & Materials" page of each webinar's individual listing, linked from this series landing page.

    Learning Objectives: 
    Each webinar of the PARTHENOS Webinar Series has an individual focus and can be followed independently.  Participants who follow the whole series will gain a complete overview on the role and value of Digital Humanities and Cultural Heritage Research Infrastructures for research, and will be able to identify Research Infrastructures especially valuable for their research and data.
     

  • STL Data Curation Primer

    An STL file stores information about 3D models. It is commonly used for printing 3D objects. The STL format approximates 3D surfaces of a solid model with oriented triangles (facets) of different size and shape (aspect ratio) in order to achieve a representation suitable for viewing or reproduction using digital fabrication. This format describes only the surface geometry of a three-dimensional object without any representation of color, texture, or other common model attributes. These files are usually generated as an end product of a 3D modeling or spatial capture process. The purpose of this primer is to guide a data curator through the curation process for STL files.
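The facet-based structure described above is easy to see in the ASCII variant of the format. The sketch below builds a minimal one-triangle ASCII STL in Python and counts its facet records, a quick sanity check a curator might run; the file content is a made-up example, not from the primer.

```python
# A minimal ASCII STL: one triangular facet with a normal vector and three
# vertices. The format carries only surface geometry (no color or texture).
stl_text = """solid demo
facet normal 0 0 1
  outer loop
    vertex 0 0 0
    vertex 1 0 0
    vertex 0 1 0
  endloop
endfacet
endsolid demo
"""

def count_facets(text):
    """Count triangle facets in an ASCII STL file."""
    return sum(1 for line in text.splitlines()
               if line.strip().startswith("facet "))

print(count_facets(stl_text))  # → 1
```

Note that most real-world STL files are binary rather than ASCII, so a curation workflow would first check which variant it is handling.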

    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #2 held at Johns Hopkins University on April 17-18, 2019.
    The full set of Data Curation Primers can be found at: https://conservancy.umn.edu/handle/11299/202810
    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers
     

  • GeoJSON Data Curation Primer

    GeoJSON is a geospatial data interchange format for encoding vector geographical data structures, such as point, line, and polygon geometries, as well as their non-spatial attributes. The purpose of this primer is to guide a data curator through the curation process for GeoJSON files. Key questions for curation review:
    ● Are coordinates listed in the following format: [longitude, latitude, elevation]?
    ● Can the file be opened in a text editor and viewed in QGIS?
    ● Does the file pass validation?
    ● Are there associated metadata/README.md files?
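    Several of these checks can be scripted. The sketch below (a hypothetical single-point Feature; field names follow the GeoJSON specification, RFC 7946) confirms that a file opens as plain text, parses as JSON, and lists coordinates in [longitude, latitude] order with values in range:

    ```python
    import json

    # Hypothetical example Feature for illustration.
    raw = '''{
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-93.2650, 44.9778]},
      "properties": {"name": "Minneapolis"}
    }'''

    feature = json.loads(raw)          # GeoJSON is plain text, so json.loads suffices
    coords = feature["geometry"]["coordinates"]
    lon, lat = coords[0], coords[1]    # GeoJSON order is [longitude, latitude]

    # Basic range checks a curator might run before a full validator pass:
    assert feature["type"] == "Feature"
    assert -180.0 <= lon <= 180.0
    assert -90.0 <= lat <= 90.0
    ```

    These checks do not replace a full schema validator, but they catch the common [latitude, longitude] order mistake cheaply.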
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #2 held at Johns Hopkins University on April 17-18, 2019.
    The full set of Data Curation Primers can be found at: https://conservancy.umn.edu/handle/11299/202810
    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers
     

  • Data Management using NEON Small Mammal Data

    Undergraduate STEM students are graduating into professions that require them to manage and work with data at many points of a data management lifecycle. Within ecology, students are presented not only with many opportunities to collect data themselves but increasingly to access and use public data collected by others. This activity introduces the basic concept of data management from the field through to data analysis. The accompanying presentation materials mention the importance of considering long-term data storage and data analysis using public data.

    Content page: https://github.com/NEONScience/NEON-Data-Skills/blob/master/tutorials/te...

  • Environmental Data Initiative Five Phases of Data Publishing Webinar - What are metadata and structured metadata?

    Metadata are essential to understanding a dataset. The talk covers:

    • How structured metadata are used to document, discover, and analyze ecological datasets.
    • Tips on creating quality metadata content.
    • An introduction to the metadata language used by the Environmental Data Initiative, Ecological Metadata Language (EML). EML is written in XML, a general purpose mechanism for describing hierarchical information, so some general XML features and how these apply to EML are covered.
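    The value of XML's hierarchical structure can be seen in a small sketch. The fragment below is drastically simplified for illustration and is not a schema-valid EML document (real EML requires namespaces, package identifiers, and many required elements); it only shows how nesting lets tools address specific metadata fields by path rather than scraping free text:

    ```python
    import xml.etree.ElementTree as ET

    # Illustrative fragment in the spirit of EML -- element names are simplified.
    doc = """<dataset>
      <title>Small mammal trapping, 2019</title>
      <creator>
        <individualName>
          <surName>Example</surName>
        </individualName>
      </creator>
    </dataset>"""

    root = ET.fromstring(doc)
    # Structured metadata: each field is addressable by its path in the hierarchy.
    title = root.findtext("title")
    surname = root.findtext("creator/individualName/surName")
    assert title == "Small mammal trapping, 2019"
    ```

    It is this machine-addressability that makes structured metadata usable for discovery and analysis, not just human-readable documentation.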

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.

     

  • Environmental Data Initiative Five Phases of Data Publishing Webinar - Creating "clean" data for archiving

    Not all data are easy to use, and some are nearly impossible to use effectively. This presentation lays out the principles and some best practices for creating data that will be easy to document and use. It will identify many of the pitfalls in data preparation and formatting that will cause problems further down the line and how to avoid them.
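    Two of the pitfalls commonly covered under this heading are inconsistent categorical codes and ad hoc missing-value sentinels. The sketch below uses a hypothetical messy table and an assumed set of sentinel codes (not from the presentation) to show the kind of normalization that makes data easier to document and reuse:

    ```python
    import csv
    import io

    # Hypothetical messy input: mixed missing-value codes and inconsistent case.
    raw = """site,species,count
    A,peromyscus,4
    a,Peromyscus,NA
    B,peromyscus,-999
    """

    MISSING = {"", "NA", "N/A", "-999"}  # assumed sentinel codes for this sketch

    cleaned = []
    for row in csv.DictReader(io.StringIO(raw.replace("\n    ", "\n"))):
        cleaned.append({
            "site": row["site"].strip().upper(),       # one canonical code per site
            "species": row["species"].strip().lower(), # consistent capitalization
            "count": None if row["count"].strip() in MISSING else int(row["count"]),
        })

    assert cleaned[1]["count"] is None and cleaned[2]["count"] is None
    assert {r["site"] for r in cleaned} == {"A", "B"}
    ```

    Documenting choices like these (which codes mean "missing", how categories are spelled) in the accompanying metadata is what keeps the cleaned dataset easy to use downstream.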

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the second phase of data publishing, cleaning data. For more guidance from EDI on data cleaning, also see "How to clean and format data using Excel, OpenRefine, and Excel," located here: https://www.youtube.com/watch?v=tRk01ytRXjE.

  • Environmental Data Initiative Five Phases of Data Publishing Webinar - How to clean and format data using Excel, OpenRefine, and Excel

    This webinar provides an overview of some of the tools available for formatting and cleaning data, guidance on tool suitability and limitations, and an example dataset and instructions for working with those tools.

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the second phase of data publishing, cleaning data.

    For more guidance from EDI on data cleaning, also see "Creating 'clean' data for archiving," located here: https://www.youtube.com/watch?v=gW_-XTwJ1OA.