FAIR Data Principles
Simplifying the Reuse and Interoperability of Hydrologic Data Sets and Models with Semantic Metadata that is Human-Readable & Machine-Actionable
This slide set discusses the big, generic problem facing geoscientists today that stems from lack of interoperability across a huge number of heterogeneous resources, and how to solve it. Practical solutions to tame the inherent heterogeneity involve the collection of standardized, "deep-description" metadata for resources that are then wrapped with standardized APIs that provide callers with access to both the data and the metadata.
Working in the R Ecosystem: Building Applications & Content for Your Gateway
The R programming language first appeared on the scene in the 1990s as an open source environment for statistical modeling and data analysis. Throughout the last decade, interest in the language has grown alongside researchers' ability to collect and store larger amounts of data. Today, scientific and business decisions increasingly rely on the interpretation of this data. New libraries for processing data and communicating results are being debuted in ways that break down traditional language silos. Technologies like interactive documents, HTML-based applications, and RESTful APIs have exposed capability gaps between R's interfaces for numerical analysis libraries and its built-in ability for graphical display. In this webinar, Derrick Kearney will survey several R libraries that are helping people bridge the gap between their R-based analysis and the numerous ways people are representing results today, all of which can be published on your science gateway, thus extending your research impact to others in a reproducible way.
Webinar: National Data Service (NDS) Labs Workbench
The growing size and complexity of high-value scientific datasets are pushing the boundaries of traditional models of data access and discovery. Many large datasets are only accessible through the systems on which they were created or require specialized software or computational resources for re-use. In response to this growing need, the National Data Service (NDS) consortium is developing the Labs Workbench platform, a scalable, web-based system intended to support turn-key deployment of encapsulated data management and analysis tools to support exploratory analysis and development on cloud resources that are physically "near" the data and associated high-performance computing (HPC) systems. The Labs Workbench may complement existing science gateways by enabling exploratory analysis of data and the ability for users to deploy and share their own tools. The Labs Workbench platform has also been used to support a variety of training and workshop environments.
This webinar includes a demonstration of the Labs Workbench platform and a discussion of several key use cases. A presentation of findings from the recent Workshop on Container Based Analysis Environments for Research Data Access and Computing further highlights compatibilities between science gateways and interactive analysis platforms such as Labs Workbench.
23 (research data) Things
23 (research data) Things is self-directed learning for anybody who wants to know more about research data. Anyone can do 23 (research data) Things at any time. Do them all, do some, or cherry-pick the Things you need or want to know about. Do them on your own, or get a group together and share the learning. The program is intended to be flexible, adaptable and fun!
Each of the 23 Things offers a variety of learning opportunities with activities at three levels of complexity: ‘Getting started’, ‘Learn more’ and ‘Challenge me’. All resources used in the program are online and free to use.
FAIR Self-Assessment Tool
The FAIR Data Principles are a set of guiding principles for making data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). Using this tool you will be able to assess the 'FAIRness' of a dataset and determine how to enhance its FAIRness (where applicable).
This self-assessment tool has been designed predominantly for data librarians and IT staff but could be used by software engineers developing FAIR Data tools and services, and researchers provided they have assistance from research support staff.
You will be asked questions related to the principles underpinning Findable, Accessible, Interoperable and Reusable. Once you have answered all the questions in each section you will be given a ‘green bar’ indicator based on your answers in that section, and when all sections are completed, an overall 'FAIRness' indicator is provided.
Webinar: Jupyter as a Gateway for Scientific Collaboration and Education
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism, and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing, building an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, in a live iteration with ideas and data assisted by the computer.
The Jupyter Notebook, a system that allows users to compose rich documents that combine narrative text and mathematics together with live code and the output of computations in any format compatible with a web browser (plots, animations, audio, video, etc.), provides a foundation for scientific collaboration. The next generation of the Jupyter web interface, JupyterLab, will combine in a single user interface not only the notebook but multiple other tools to access Jupyter services and remote computational resources and data. A flexible and responsive UI allows the user to mix Notebooks, terminals, text editors, graphical consoles and more, presenting in a single, unified environment the tools needed to work with a remote environment. Furthermore, the entire design is extensible and based on plugins that interoperate via open APIs, making it possible to design new plugins tailored to specific types of data or user needs.
JupyterHub enables Jupyter Notebook and JupyterLab to be used by groups of users for research collaboration and education. We believe JupyterHub provides a foundation on which to build modern scientific gateways that support a wide range of user scenarios, from interactive data exploration in high-level languages like Python, Julia or R, to the education of researchers and students whose work relies on traditional HPC resources.
The presenter discusses the benefits and applications of Jupyter Notebooks.
Scroll to the bottom of the page to view the webinar. Presentation slides are also available on the same page.
Access Policies and Usage Regulations: Licenses
The webinar about licensing and policy will look into why it is important that research data are provided with licenses.
- Benefits of sharing research data
- Types of licenses
- Data ownership and reuse
- Using Creative Commons in archiving research data
During the workshop, participants will acquire a basic knowledge of data licensing.
Postgres, EML and R in a data management workflow
Metadata storage and creation of Ecological Metadata Language (EML) can be a challenge for people and organizations who want to archive their data. A workflow was developed to combine efficient EML record generation (using the package developed by the R community) with centrally-controlled metadata in a relational database. The webinar has two components: 1) a demonstration of metadata storage and management using a relational database, and 2) discussion of an example EML file generation workflow using pre-defined R functions.
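As a rough illustration of the idea behind such a workflow, the sketch below keeps dataset metadata in relational tables and generates an XML record from them. It is written in Python rather than R, uses an in-memory SQLite database as a stand-in for Postgres, and is not the workflow presented in the webinar. The table layout, the `build_eml` function, and the sample values are all hypothetical, and the output is a simplified EML-like fragment that has not been validated against the EML schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_eml(conn, dataset_id):
    """Build a simplified, EML-like XML string for one dataset row."""
    (title,) = conn.execute(
        "SELECT title FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    creators = conn.execute(
        "SELECT given_name, sur_name FROM creator "
        "WHERE dataset_id = ? ORDER BY id",
        (dataset_id,),
    ).fetchall()

    # Hypothetical package identifier; real identifiers come from the repository.
    root = ET.Element("eml", {"packageId": f"example.{dataset_id}.1"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    for given, sur in creators:
        creator = ET.SubElement(dataset, "creator")
        name = ET.SubElement(creator, "individualName")
        ET.SubElement(name, "givenName").text = given
        ET.SubElement(name, "surName").text = sur
    return ET.tostring(root, encoding="unicode")


# In-memory SQLite stands in for the central database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE creator (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER REFERENCES dataset(id),
        given_name TEXT,
        sur_name TEXT
    );
    INSERT INTO dataset VALUES (1, 'Example stream temperature records');
    INSERT INTO creator VALUES (1, 1, 'Ada', 'Example');
    """
)

eml_xml = build_eml(conn, 1)
print(eml_xml)
```

Keeping the metadata in tables means a correction (for example, a creator's name) is made once in the database, and every affected record can be regenerated from it.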
Data Management using NEON Small Mammal Data
Undergraduate STEM students are graduating into professions that require them to manage and work with data at many points of a data management lifecycle. Within ecology, students are presented not only with many opportunities to collect data themselves but increasingly to access and use public data collected by others. This activity introduces the basic concept of data management from the field through to data analysis. The accompanying presentation materials mention the importance of considering long-term data storage and data analysis using public data.
Licensing your research outputs is an important part of practicing Open Science. After completing this course, you will:
- Know what licenses are, how they work, and how to apply them
- Understand how different types of licenses can affect research output reuse
- Know how to select the appropriate license for your research
Florilege, a new database of habitats and phenotypes of food microbe flora
This tutorial explains how to use the “Habitat-Phenotype Relation Extractor for Microbes” application available from the OpenMinTeD platform. It also explains the scientific issues it addresses, and how the results of the TDM process can be queried and exploited by researchers through the Florilège application.
In recent years, developments in molecular technologies have led to an exponential growth of experimental data and publications, many of which are open but accessible only separately. Therefore, it is now crucial for researchers to have bioinformatics infrastructures at their disposal that offer unified access to both data and related scientific articles. With the right text mining infrastructures and tools, application developers and data managers can rapidly access and process textual data, link them with other data and make the results available for scientists.
The text-mining process behind Florilege has been set up by INRA using the OpenMinTeD environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type, like “habitat” or “taxon”.
Sections of the tutorial:
1. Biological motivation of the Florilege database
2. Florilège Use-Case on OpenMinTeD (includes a description of how to access the Habitat-Phenotype Relation Extractor for Microbes application)
3. Florilege backstage: how is it built?
4. Florilège description
5. How to use Florilege?
Managing and Sharing Research Data
Data-driven research is becoming increasingly common in a wide range of academic disciplines, from Archaeology to Zoology, and spanning Arts and Science subject areas alike. To support good research, we need to ensure that researchers have access to good data. Upon completing this course, you will:
- Understand which data you can make open and which need to be protected
- Know how to go about writing a data management plan
- Understand the FAIR principles
- Be able to select which data to keep and find an appropriate repository for them
- Learn tips on how to get maximum impact from your research data
Environmental Data Initiative Five Phases of Data Publishing Webinar - What are metadata and structured metadata?
Metadata are essential to understanding a dataset. The talk covers:
- How structured metadata are used to document, discover, and analyze ecological datasets.
- Tips on creating quality metadata content.
- An introduction to the metadata language used by the Environmental Data Initiative, Ecological Metadata Language (EML). EML is written in XML, a general purpose mechanism for describing hierarchical information, so some general XML features and how these apply to EML are covered.
This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.
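To make the point about XML's hierarchical structure concrete, here is a minimal Python sketch that parses a small, hand-written EML-like fragment and prints its nesting. The sample document, its values, and the `outline` helper are assumptions made for illustration; the fragment is heavily simplified and is not a schema-valid EML record.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified EML-like fragment (illustrative values only).
SAMPLE = """\
<eml packageId="example.1.1">
  <dataset>
    <title>Example plant cover survey</title>
    <keywordSet>
      <keyword>vegetation</keyword>
      <keyword>cover</keyword>
    </keywordSet>
  </dataset>
</eml>
"""


def outline(element, depth=0):
    """Return indented lines showing each element's tag and its text, if any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))

# Attributes sit on an element; repeated child elements form a list.
package_id = root.get("packageId")
keywords = [k.text for k in root.iter("keyword")]
print(package_id, keywords)
```

The same three building blocks shown here (nested elements, attributes, and repeated elements) are what structured metadata tools rely on when they search or validate a record.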
Environmental Data Initiative Five Phases of Data Publishing Webinar - Make metadata with the EML assembly line
High-quality structured metadata is essential to the persistence and reuse of ecological data; however, creating such metadata requires substantial technical expertise and effort. To accelerate the production of metadata in the Ecological Metadata Language (EML), we’ve created the EMLassemblyline R code package. Assembly line operators supply the data and information about the data, then the machinery auto-extracts additional content and translates it all to EML. In this webinar, the presenter will provide an overview of the assembly line, how to operate it, and a brief demonstration of its use on an example dataset.
This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.
Environmental Data Initiative Five Phases of Data Publishing Webinar - Creating "clean" data for archiving
Not all data are easy to use, and some are nearly impossible to use effectively. This presentation lays out the principles and some best practices for creating data that will be easy to document and use. It will identify many of the pitfalls in data preparation and formatting that will cause problems further down the line and how to avoid them.
This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the second phase of data publishing, cleaning data. For more guidance from EDI on data cleaning, also see "How to clean and format data using Excel, OpenRefine, and Excel," located here: https://www.youtube.com/watch?v=tRk01ytRXjE.
Data Management Expert Guide
This guide is written for social science researchers who are in an early stage of practising research data management. With this guide, CESSDA wants to contribute to professionalism in data management and increase the value of research data.
If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours; however, you can also hop on and off at any time.
CESSDA Expert Tour Guide on Data Management
Target audience and mission:
This tour guide was written for social science researchers who are in an early stage of practising research data management. With this tour guide, CESSDA wants to contribute to increased professionalism in data management and to improving the value of research data.
If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours. You can also just hop on and off.
During your travels, you will come across the following recurring topics:
Adapt Your DMP
Current chapters include the following topics: Plan; Organise & Document; Process; Store; Protect; Archive & Publish. Other chapters may be added over time.
Research Rigor & Reproducibility: Understanding the Data Lifecycle for Research Success
This course provides recommended practices for facilitating the discoverability, access, integrity, and reuse value of your research data. The modules have been selected from a larger Canvas course, "Best Practices for Biomedical Research Data Management" (https://www.canvas.net/browse/harvard-medical/courses/biomed-research-da...).
Biomedical research today must be not only rigorous, innovative and insightful but also organized and reproducible. With more capacity to create and store data, there is the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers are requiring publication of relevant data to promote open science and reproducibility of research.
In this course, students will learn how to identify and address current workflow challenges throughout the research life cycle. By understanding best practices for managing your data throughout a project, you will succeed in making your research ready to publish, share, interpret, and be used by others. Course materials include video lectures, presentation slides, readings and resources, research case studies, interactive activities and concept quizzes.
FAIR Webinar Series
This webinar series explores each of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) in depth, with practical case studies from a range of disciplines, Australian and international perspectives, and resources to support the uptake of FAIR principles.
The FAIR data principles were drafted by the FORCE11 group in 2015. The principles have since received worldwide recognition as a useful framework for thinking about sharing data in a way that will enable maximum use and reuse. A seminal article describing the FAIR principles can also be found at: https://www.nature.com/articles/sdata201618.
This series is of interest to those who work with creating, managing, connecting and publishing research data at institutions:
- researchers and research teams who need to ensure their data is reusable and publishable
- data managers and researchers
- librarians, data managers and repository managers
- IT staff who need to connect institutional research data, HR and other IT systems
Coffee and Code: Introduction to Version Control
This is a tutorial about version control, also known as revision control, a method for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.
Also see Advanced Version Control, here: https://github.com/unmrds/cc-version-control/blob/master/03-advanced-ver...
Coffee and Code: Advanced Version Control
Learn advanced version control practices for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.
This tutorial builds on concepts taught in "Introduction to Version Control," found here: https://github.com/unmrds/cc-version-control/blob/master/01-version-cont....
Git Repository for this Workshop: https://github.com/unmrds/cc-version-control
MANTRA Research Data Management Training
MANTRA is a free, online non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It has been crafted for the use of post-graduate students, early career researchers, and also information professionals. It is freely available on the web for anyone to explore on their own.
Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.
There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:
- Understand the nature of research data in a variety of disciplinary settings
- Create a data management plan and apply it from the start to the finish of your research project
- Name, organise, and version your data files effectively
- Gain familiarity with different kinds of data formats and know how and when to transform your data
- Document your data well for yourself and others, learn about metadata standards and cite data properly
- Know how to store and transport your data safely and securely (backup and encryption)
- Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
- Understand the benefits of sharing, preserving and licensing data for re-use
- Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS
Research Data Management and Open Data
This was a presentation during the Julius Symposium 2017 on Open Science and in particular on Open data and/or FAIR data. Examples are given of medical and health research data.
Coffee and Code: Write Once Use Everywhere (Pandoc)
Pandoc at http://pandoc.org is a document processing program that runs on multiple operating systems (Mac, Windows, Linux) and can read and write a wide variety of file formats. In many respects, Pandoc can be thought of as a universal translator for documents. This workshop focuses on a subset of input and output document types, just scratching the surface of the transformations made possible by Pandoc.
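The "universal translator" idea can be sketched as one source file converted to several targets. The Python snippet below is an illustrative assumption, not material from the workshop: the file names and the helper functions `pandoc_command` and `convert` are hypothetical. It relies only on Pandoc's `-o` (output file) and `-t` (output format) options, and it builds the commands without running them, so it works even where Pandoc is not installed.

```python
import shutil
import subprocess


def pandoc_command(source, target, to_format=None):
    """Build the argument list for converting `source` into `target`."""
    cmd = ["pandoc", source, "-o", target]
    if to_format:
        # Without -t, Pandoc infers the output format from the file extension.
        cmd += ["-t", to_format]
    return cmd


def convert(source, target, to_format=None):
    """Run the conversion if Pandoc is on the PATH; return False otherwise."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(source, target, to_format), check=True)
    return True


# One hypothetical Markdown source, three output formats.
commands = [
    pandoc_command("notes.md", "notes.html"),
    pandoc_command("notes.md", "notes.docx"),
    pandoc_command("notes.md", "notes.tex", to_format="latex"),
]
for cmd in commands:
    print(" ".join(cmd))
```

Calling `convert("notes.md", "notes.html")` would perform the first conversion on a machine where Pandoc is installed and the input file exists.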
Click 00-Overview.ipynb on the provided GitHub page or go directly to the overview, here:
Singularity User Guide
Singularity is a container solution created by necessity for scientific and application driven workloads.
Over the past decade and a half, virtualization has gone from an engineering toy to a global infrastructure necessity and the evolution of enabling technologies has flourished. Most recently, we have seen the introduction of the latest spin on virtualization… “containers”.
Many scientists, especially those involved with the high-performance computing (HPC) community, could benefit greatly from container technology, but they need a feature set that differs somewhat from that available with current container technology. This need drove the creation of Singularity and defined its four primary functions:
- Mobility of compute
- User freedom
- Support on existing traditional HPC
This user guide introduces Singularity, a free, cross-platform and open-source computer program that performs operating-system-level virtualization also known as containerization.