All Learning Resources

  • Tutorial: DataCite Linking

    This tutorial walks users through the simple process of creating a workflow in the OpenMinTeD platform that allows them to extract links to DataCite, mainly citations to datasets, from scientific publications.

  • Florilege, a new database of habitats and phenotypes of food microbe flora

    This tutorial explains how to use the “Habitat-Phenotype Relation Extractor for Microbes” application available from the OpenMinTeD platform. It also explains the scientific issues it addresses, and how the results of the TDM process can be queried and exploited by researchers through the Florilège application.  

    In recent years, developments in molecular technologies have led to an exponential growth of experimental data and publications, many of which are open but accessible only separately. It is therefore crucial for researchers to have bioinformatics infrastructures at their disposal that offer unified access to both data and related scientific articles. With the right text mining infrastructures and tools, application developers and data managers can rapidly access and process textual data, link them with other data and make the results available to scientists.

    The text-mining process behind Florilege has been set up by INRA using the OpenMinTeD environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type, such as “habitat” or “taxon”.

    Sections of the tutorial:
    1. Biological motivation of the Florilege database
    2. Florilège Use-Case on OpenMinTeD (includes a description of how to access the Habitat-Phenotype Relation Extractor for Microbes application)
    3. Florilege backstage: how is it built?
    4. Florilège description
    5. How to use Florilege?


  • Best Practice in Open Research

    This course introduces some practical steps toward making your research more open. We begin by exploring the practical implications of open research, and the benefits it can deliver for research integrity and public trust, as well as benefits you will accrue in your own work. After a short elaboration of some useful rules of thumb, we move quickly onto some more practical steps towards meeting contemporary best practice in open research and introduce some useful discipline-specific resources. Upon completing this course, you will:

    • Understand the practical implications of taking a more open approach to research
    • Be prepared to meet expectations relating to openness from funders, publishers, and peers 
    • Be able to reap the benefits of working openly
    • Have an understanding of the guiding principles to follow when building openness into your research workflow
    • Know about some useful tools and resources to help you embed Open Science into your research practices
  • Managing and Sharing Research Data

    Data-driven research is becoming increasingly common in a wide range of academic disciplines, from Archaeology to Zoology, and spanning Arts and Science subject areas alike. To support good research, we need to ensure that researchers have access to good data. Upon completing this course, you will:

    • Understand which data you can make open and which need to be protected
    • Know how to go about writing a data management plan
    • Understand the FAIR principles
    • Be able to select which data to keep and find an appropriate repository for them
    • Learn tips on how to get maximum impact from your research data
  • GeoNode for Developers Workshop

    GeoNode is a web-based application and platform for developing geospatial information systems (GIS) and for deploying spatial data infrastructures (SDI). It is designed to be extended and modified and can be integrated into existing platforms.
    This workshop covers the following topics:

    • GeoNode in development mode, how to
    • The geonode-project to customize GeoNode
    • Change the look and feel of the application
    • Add your own app
    • Add your own models, view, and logic
    • Build your own APIs
    • Add a third party app
    • Deploy your customized GeoNode

    The geonode-project is available on GitHub.


  • Science Impact of Sustained Cyberinfrastructure: The Pegasus Example

    This talk is the first in a series of NSF's Office of Advanced Cyberinfrastructure (OAC) webinars. Dr. Deelman describes the challenges of developing and sustaining cyberinfrastructure capabilities that have impact on scientific discovery and that innovate in the changing cyberinfrastructure landscape. The recent multi-messenger observation triggered by LIGO and VIRGO’s first detection of gravitational waves produced by colliding neutron stars is a clear display of the increasing impact of dependable research cyberinfrastructure (CI) on scientific discovery.

    Today’s cyberinfrastructure (hardware, software, and workforce) underpins the entire scientific workflow, from data collection at instruments, through complex analysis, to simulation, visualization, and analytics. The Pegasus project is an example of a cyberinfrastructure effort that enables LIGO and other communities to accomplish their scientific goals. In addition, it delivers robust automation capabilities to researchers at the Southern California Earthquake Center (SCEC) studying seismic phenomena, to astronomers seeking to understand the structure of the universe, to material scientists developing new drug delivery methods, and to students seeking to understand human population migration.

  • Environmental Data Initiative Five Phases of Data Publishing Webinar - Make metadata with the EML assembly line

    High-quality structured metadata is essential to the persistence and reuse of ecological data; however, creating such metadata requires substantial technical expertise and effort. To accelerate the production of metadata in the Ecological Metadata Language (EML), we’ve created the EMLassemblyline R code package. Assembly line operators supply the data and information about the data, then the machinery auto-extracts additional content and translates it all to EML. In this webinar, the presenter will provide an overview of the assembly line, how to operate it, and a brief demonstration of its use on an example dataset.

    This video in the Environmental Data Initiative (EDI) "Five Phases of Data Publishing" tutorial series covers the third phase of data publishing, describing.


  • ISRIC Spring School

    The ISRIC Spring School aims to introduce participants to world soils, soil databases, software for soil data analysis and visualisation, digital soil mapping and soil-web services through two 5-day courses run in parallel. Target audiences for the Spring School include soil and environmental scientists involved in (digital) soil mapping and soil information production at regional, national and continental scales; soil experts and professionals in natural resources management and planning; and soil science students at MSc and PhD level. Example courses include "World Soils and their Assessment" (WSA) and "Hands-on Global Soil Information Facilities" (GSIF). Data management topics are included within the course topics.

  • Hands-on Intro to SQL (Structured Query Language)

    This workshop will teach the basics of working with and querying structured data in a database environment. This workshop uses the SQLite plugin for Firefox.  The data used is a time-series for a small mammal community in southern Arizona in the southern United States. This is part of a project studying the effects of rodents and ants on the plant community that has been running for almost 40 years. The rodents are sampled on a series of 24 plots, with different experimental manipulations controlling which rodents are allowed to access which plots.
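    As a rough illustration of the kind of query such a workshop covers, the sketch below uses Python's built-in sqlite3 module rather than the Firefox plugin. The table name `surveys`, its columns, and the sample rows are hypothetical and are not taken from the workshop's dataset.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE surveys (record_id INTEGER PRIMARY KEY, year INTEGER, "
    "plot_id INTEGER, species_id TEXT, weight REAL)"
)
conn.executemany(
    "INSERT INTO surveys (year, plot_id, species_id, weight) VALUES (?, ?, ?, ?)",
    [
        (1990, 1, "DM", 40.0),
        (1990, 2, "DM", 44.0),
        (1990, 2, "PF", 8.0),
        (1991, 1, "PF", 7.0),
    ],
)


def mean_weight_by_species(connection):
    """Return (species_id, record count, mean weight) rows, sorted by species."""
    query = (
        "SELECT species_id, COUNT(*), AVG(weight) "
        "FROM surveys GROUP BY species_id ORDER BY species_id"
    )
    return connection.execute(query).fetchall()


for row in mean_weight_by_species(conn):
    print(row)
```

    Grouping and aggregating in the query itself, instead of in application code, is the usual reason for keeping such data in a database.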

  • DATUM for Health: Research data management training for health studies

    The DATUM for Health training programme covers both generic and discipline-specific issues, focusing on the management of qualitative, unstructured data, and is suitable for students at any stage of their PhD. It aims to provide students with the knowledge to manage their research data at every stage in the data lifecycle, from creation to final storage or destruction. They learn how to use their data more effectively and efficiently, how to store and destroy it securely, and how to make it available to a wider audience to increase its use, value and impact.

    The programme comprises:

    Overview: programme aims and scope, design, outline content and materials, recommendations 
    Session 1: Introduction to research data management
    Session 2: Data curation lifecycle
    Session 3: Problems and practical strategies and solutions

    For each session the materials comprise PPT slides, notes for tutors and handouts.

  • Datatree - Data Training Engaging End-users

    *Requires signing up for a free account*

    A free online course with all you need to know for research data management, along with ways to engage and share data with business, policymakers, media and the wider public.

    The self-paced course will take 15 to 20 hours to complete in eight structured modules. The course is packed with video, quizzes and real-life examples of data management, along with plenty of additional background information.

    The materials will be available for structured learning, but also to dip into for immediate problem solving.

  • Data Management Expert Guide

    This guide is written for social science researchers who are in an early stage of practising research data management. With this guide, CESSDA wants to contribute to professionalism in data management and increase the value of research data.

    If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours; however, you can also hop on and off at any time.

  • Diversity Workbench (DWB) in 15 Steps

    Introduction and demonstration of the Diversity Workbench (DWB), a "virtual research environment for multiple scientific purposes with regard to management and analysis of life and environmental sciences data. The framework is appropriate to store different kinds of bio- and geodiversity data, taxonomies, terminologies, and facilitates the processing of ecological, molecular biological, observational, collection and taxonomic data" (DWB).

  • CESSDA Expert Tour Guide on Data Management

    Target audience and mission:
    This tour guide was written for social science researchers who are in an early stage of practising research data management. With this tour guide, CESSDA wants to contribute to increased professionalism in data management and to improving the value of research data.
    If you follow the guide, you will travel through the research data lifecycle from planning, organising, documenting, processing, storing and protecting your data to sharing and publishing them. Taking the whole roundtrip will take you approximately 15 hours. You can also just hop on and off.
    During your travels, you will come across the following recurring topics:
    Adapt Your DMP
    European Diversity
    Expert Tips
    Tour Operators
    Current chapters include the following topics:  Plan; Organise & Document; Process; Store; Protect;  Archive & Publish.  Other chapters may be added over time.

  • Plan, a chapter of the CESSDA Expert Tour Guide on Data Management

    This introductory chapter features a brief introduction to research data management and data management planning.
    Before we get you started on making your own Data Management Plan (DMP), we will guide you through the concepts which provide the basic knowledge for the rest of your travels. Research data, social science data and FAIR data are some of the concepts you will pass by.
    After completing your travels through this chapter you should be:
    Familiar with concepts such as (sensitive) personal data and FAIR principles;
    Aware of what data management and a data management plan (DMP) are and why they are important;
    Familiar with the content elements that make up a DMP;
    Able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.

  • Organise & Document, a chapter of the CESSDA Expert Tour Guide on Data Management

    In this chapter, we provide you with tips and tricks on how to properly organise and document your data and metadata.
    We begin by discussing good practices in designing an appropriate data file structure, file naming and organising your data within suitable folder structures. You will find out how the way you organise your data facilitates orientation in the data file, contributes to understanding the information contained and helps to prevent errors and misinterpretations.
    In addition, we will focus on an appropriate documentation of your data. Development of rich metadata is required by FAIR data principles and any other current standards promoting data sharing.
    After completing your travels through this chapter on organising and documenting your data you should:
    Be aware of the elements which are important in setting up an appropriate structure and organisation of your data for intended research work and data sharing;
    Have an overview of best practices in file naming and organising your data files in a well-structured and unambiguous folder structure;
    Understand how comprehensive data documentation and metadata increase the chance your data are correctly understood and discovered;
    Be aware of common metadata standards and their value;
    Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
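    The file-naming advice can be made concrete with a small helper. This is a minimal sketch of one possible convention (project, description, ISO date, zero-padded version); the function name `build_file_name` and the convention itself are assumptions for illustration, not rules prescribed by the guide.

```python
import re
from datetime import date


def build_file_name(project, description, day, version, extension):
    """Assemble a name such as 'proj_interview-notes_2020-03-01_v02.csv'."""

    def slug(text):
        # Lower-case the text and replace runs of other characters with hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_{}_{}_v{:02d}.{}".format(
        slug(project), slug(description), day.isoformat(), version, extension
    )


print(build_file_name("Survey 2020", "Interview notes", date(2020, 3, 1), 2, "csv"))
```

    Putting the date in ISO format and padding the version number means that an alphabetical listing of the folder also sorts the files chronologically and by version.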

  • Process, a chapter of the CESSDA Expert Tour on Data Management

    In this chapter we focus on data operations needed to prepare your data files for analysis and data sharing. Throughout the different phases of your project, your data files will be edited numerous times. During this process it is crucial to maintain the authenticity of research information contained in the data and prevent it from loss or deterioration.
    However, we will start with the topics of data entry and coding as the first steps of your work with your data files. Finally, you will learn about the importance of a comprehensive approach to data quality.
    After completing your travels through this chapter you should:
    Be familiar with strategies to minimise errors during the processes of data entry and data coding;
    Understand why the choice of file format should be planned carefully;
    Be able to manage the integrity and authenticity of your data during the research process;
    Understand the importance of a systematic approach to data quality;
    Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
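    One common way to notice unintended changes to a data file is to record a checksum and compare it later. The sketch below is an illustration under that assumption; the chapter description does not name a specific technique, and the helper `file_checksum`, the file name and its contents are made up.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as folder:
    data_file = os.path.join(folder, "responses.csv")
    with open(data_file, "w") as handle:
        handle.write("id,score\n1,4.5\n")
    recorded = file_checksum(data_file)

    # Simulate a later edit, then compare against the recorded checksum.
    with open(data_file, "a") as handle:
        handle.write("2,3.0\n")
    changed = file_checksum(data_file) != recorded

print("file changed since checksum was recorded:", changed)
```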

  • Store, a chapter of the CESSDA Expert Tour on Data Management

    The data that you collect, organise, prepare, and analyse to answer your research questions, and the documentation describing it are the lifeblood of your research. Put bluntly: without data, there is no research. It is therefore essential that you take adequate measures to protect your data against accidental loss and against unauthorised manipulation.
    Particularly when collecting (sensitive) personal data it is necessary to ensure that these data can only be accessed by those authorized to do so. In this chapter, you will learn more about measures to help you address these threats.
    After completing your travels through this chapter you should:
    Be familiar with strategies to minimise errors during the processes of data entry and data coding;
    Understand why the choice of file format should be planned carefully;
    Be able to manage the integrity and authenticity of your data during the research process;
    Understand the importance of a systematic approach to data quality;
    Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.

  • Protect, a chapter of the CESSDA Expert Tour on Data Management

    This part of the tour guide focuses on key legal and ethical considerations in creating shareable data.
    We begin by clarifying the different legal requirements of Member States, and the impact of the upcoming General Data Protection Regulation (GDPR) on research data management. Subsequently, we will show you how sharing personal data can often be accomplished by using a combination of obtaining informed consent, data anonymisation and regulating data access. Also, the supporting role of ethical review in managing your legal and ethical obligations is highlighted.
    After completing your trips around this chapter you should:
    Be aware of your legal and ethical obligations towards participants and be informed of the different legal requirements of Member States;
    Understand how protecting your data well protects you against violating laws and promises made to participants;
    Understand the impact of the upcoming General Data Protection Regulation (GDPR; European Union, 2016);
    Understand how a combination of informed consent, anonymisation and access controls allows you to create shareable personal data;
    Be able to define what elements should be integrated into a consent form;
    Be able to apply anonymisation techniques to your data;
    Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.
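    To give a flavour of what applying anonymisation techniques can involve, the sketch below drops direct identifiers and generalises an exact age into a band. The field names, the ten-year band width and the helper `anonymise_record` are hypothetical, and the example is not a complete anonymisation procedure.

```python
DIRECT_IDENTIFIERS = {"name", "email"}  # hypothetical field names


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return "{}-{}".format(lower, lower + width - 1)


def anonymise_record(record):
    """Return a copy without direct identifiers and with age generalised."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned


participant = {
    "name": "A. Example",
    "email": "a@example.org",
    "age": 34,
    "region": "North",
}
print(anonymise_record(participant))
```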

  • Archive & Publish, a chapter of the CESSDA Expert Tour on Data Management

    High-quality data have the potential to be reused in many ways. Archiving and publishing your data properly will enable both your future self as well as future others to get the most out of your data.
    In this chapter, we venture into the landscape of research data archiving and publication. We will guide you in making an informed decision on where to archive and publish your data in such a way that others can properly access, understand, use and cite them.
    After completing your travels through this chapter you should:
    Understand the difference between data archiving and data publishing;
    Be aware of the benefits of data publishing;
    Be able to differentiate between different data publication services (data journal, self-archiving, a data repository);
    Be able to select a data repository which fits your research data's needs;
    Be aware of ways to promote your research data publication;
    Be able to answer the DMP questions which are listed at the end of this chapter and adapt your own DMP.

  • Research Data Management Hands on Workshop

    Description: This project includes material designed for teaching a 1.5-hour research data management workshop. It involves a case study in which workshop participants navigate messy data to identify the data that correspond to a figure from an article. Attendees then modify the messy data to follow research data management best practices.

  • Penn State Online: Introduction to GIS modeling and Python

    This unit is Lesson 1 of the online course GEOG 485: GIS Programming and Software Development at Penn State University's College of Earth and Mineral Sciences.
    As with GEOG 483 and GEOG 484, the lessons in this course are project-based with key concepts embedded within. However, because of the nature of computer programming, there is no way this course can follow the step-by-step instruction design of the previous courses. You will probably find the course to be more challenging than the others.

  • University of California Libraries: Research Data Matters

    What is research data and why is managing your data important? Where can you get help with research data management? In this introductory video, three University of California researchers address these questions from their own experience and explain the impact of good data management practices.  Researchers interviewed include Professor Christine Borgman, Professor Rick Prelinger, and Professor Marjorie Katz.  

  • Introduction to Python GIS for Data Science

    This module on Python and GIS is from a part-time data science course offered by General Assembly during Summer 2015. The module provides a quick introduction to Python and how it relates to GIS.

  • Research Data Management: Practical Data Management

    A series of modules and video tutorials describing research data management best practices. 
    Module 1: Where to start - data planning

    1.1 Data Life Cycle & Searching for Data (5:59 minutes)
    1.3 File Naming (3:39 minutes)
    1.4 ReadMe Files, Library Support, Checklist (4:29 minutes)

    Module 2: Description, storage, archiving

    2.1 Data Description (2:16 minutes)
    2.2 Workflow Documentation & Metadata Standards (4:36 minutes)
    2.3 Storage & Backups (2:48 minutes)
    2.4 Archiving: How (2:50 minutes)
    2.5 Archiving: Where (3:57 minutes)

    Module 3: Publishing, sharing, visibility 

    3.1 What is Data Publishing? (4:50 minutes)
    3.2 What and Where to Publish? (1:47 minutes)
    3.3 Data Licenses (1:51 minutes)
    3.4 Citing and DOIs (1:09 minutes)
    3.5 ORCID (2:04 minutes)
    3.6 Altmetrics (2:15 minutes)

  • Research Rigor & Reproducibility: Understanding the Data Lifecycle for Research Success

    This course provides recommended practices for facilitating the discoverability, access, integrity, and reuse value of your research data. The modules have been selected from a larger Canvas course, "Best Practices for Biomedical Research Data Management".

    Biomedical research today is not only rigorous, innovative and insightful, it also has to be organized and reproducible. With more capacity to create and store data, there is the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers are requiring publication of relevant data to promote open science and reproducibility of research.

    In this course, students will learn how to identify and address current workflow challenges throughout the research life cycle. By understanding best practices for managing your data throughout a project, you will succeed in making your research ready to publish, share, interpret, and be used by others.  Course materials include video lectures, presentation slides, readings and resources, research case studies, interactive activities and concept quizzes.  

  • Best Practices for Biomedical Research Data Management

    This course presents approximately 20 hours of content aimed at a broad audience on recommended practices facilitating the discoverability, access, integrity, reuse value, privacy, security, and long-term preservation of biomedical research data.

    Each of the nine modules is dedicated to a specific component of data management best practices and includes video lectures, presentation slides, readings & resources, research teaching cases, interactive activities, and concept quizzes.

    Background Statement:
    Biomedical research today is not only rigorous, innovative and insightful, it also has to be organized and reproducible. With more capacity to create and store data, there is the challenge of making data discoverable, understandable, and reusable. Many funding agencies and journal publishers are requiring publication of relevant data to promote open science and reproducibility of research.

    To meet these requirements and evolving trends, researchers and information professionals will need the data management and curation knowledge and skills to support the access, reuse and preservation of data.

    This course is designed to address present and future data management needs.

    Best Practices for Biomedical Research Data Management serves as an introductory course for information professionals and scientific researchers to the field of scientific data management. The course is also offered by Canvas Instruction.

    In this course, learners will explore relationships between libraries and stakeholders seeking support for managing their research data. 

  • EUDAT Research Data Management

    This site provides several videos on research data management, including why it's important, metadata, archives, and other topics.

    The EUDAT training programme is delivered through a multiple channel approach and includes:
    eTraining components delivered via the EUDAT website: a selection of presentations, documents and informative video tutorials clustered by topic and level of required skills targeting all EUDAT stakeholders.

    Ad-hoc workshops organised together with research communities and infrastructures to illustrate how to integrate EUDAT services in their research data management infrastructure. Mainly designed for research communities, infrastructures and data centres, they usually include pragmatic hands-on sessions. Interested in a EUDAT workshop for your research community? Contact us.

    One-hour webinars delivered via the EUDAT website focusing on different research data management components and how EUDAT contributes to solving research data management challenges.

  • DataONE Data Management Module 02: Data Sharing

    When first sharing research data, researchers often raise questions about the value, benefits, and mechanisms for sharing. Many stakeholders and interested parties, such as funding agencies, communities, other researchers, or members of the public may be interested in research, results and related data. This 30-40 minute lesson addresses data sharing in the context of the data life cycle, the value of sharing data, concerns about sharing data, and methods and best practices for sharing data and includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • DataONE Data Management Module 01: Why Data Management

    As rapidly changing technology enables researchers to collect large, complex datasets with relative ease, the need to effectively manage these data increases in kind. This is the first lesson in a series of education modules intended to provide a broad overview of various topics related to research data management. This 30-40 minute module covers trends in data collection, storage and loss, the importance and benefits of data management, and an introduction to the data life cycle and includes a downloadable presentation (PPT or PDF) with supporting hands-on exercise and handout.

  • Introduction to R

    In this introduction to R, you will master the basics of this beautiful open source language, including factors, lists and data frames. With the knowledge gained in this course, you will be ready to undertake your first very own data analysis. Topics include: an introduction to basics, vectors, matrices, factors, lists and data frames. Approximately 62 exercises are included.

  • Intro to Python for Data Science

    Python is a general-purpose programming language that is becoming more and more popular for doing data science. Companies worldwide are using Python to harvest insights from their data and get a competitive edge. Unlike any other Python tutorial, this course focuses on Python specifically for data science. In our Intro to Python class, you will learn about powerful ways to store and manipulate data as well as cool data science tools to start your own analyses.  Topics covered include:  Python basics, Python lists, functions and packages, and NumPy, an array package for Python.

  • Using, learning, teaching, and programming with the Paleobiology Database

    The Paleobiology Database is a public database of paleontological data that anyone can use, maintained by an international non-governmental group of paleontologists. You can explore the data online in the Navigator, which lets you filter fossil occurrences by time, space, and taxonomy, and displays their modern and paleogeographic locations; or you can download the data to your own computer to do your own analyses. The educational resources offered by the Paleobiology Database include:
    - Presentations including lectures and slide shows to introduce you to the PBDB
    - Web apps that provide a variety of online interfaces for exploring PBDB data via the API
    - Mobile apps for iOS and Android that provide new views of the PBDB's data via the API
    - Lesson plans and teaching activities using the Paleobiology Database
    - Tutorials on how to get and use data from the website, and on how to contribute data to the database, viewable on YouTube
    - Libraries and functions for interacting with PBDB data via R
    - Documentation, code examples, and issue reporting for the PBDB API
    - Other Paleobiology Database related external resources including a link to the Paleobiology GitHub repository

  • Intermediate R

    The intermediate R course is the logical next stop on your journey in the R programming language. In this R training you will learn about conditional statements, loops and functions to power your own R scripts. Next, you can make your R code more efficient and readable using the apply functions. Finally, the utilities chapter gets you up to speed with regular expressions in the R programming language, data structure manipulations and times and dates. This R tutorial will allow you to learn R and take the next step in advancing your overall knowledge and capabilities while programming in R.

  • Introduction to SAGA GIS Software

    A quick introduction to the System for Automated Geoscientific Analyses (SAGA) GIS software, which is an open source Geographic Information System software package. SAGA GIS has been designed for an easy and effective implementation of spatial algorithms and offers a comprehensive, growing set of geoscientific methods. A data management module is included in the software.

  • ESRI Academy: Data Management

    ESRI, the creator of ArcMap and other Geographic Information Systems (GIS) software products, provides a large number of training courses on topics that include Data Management as well as other skills such as the use of GIS and Python programming. The types of training materials include tutorials, videos, web courses, instructor-led courses, training seminars, learning plans (including one that leads to 6 courses on the Fundamentals of Data Management) and story maps. Some training materials are available online while others are on location; some are free, and some have an associated fee. Each course provides a certificate once it is completed.

  • Data Carpentry Geospatial Workshop

    This workshop is designed both to teach general geospatial concepts and to build capacity in using the "R" programming language for data management. The learner will find out how to use "R" with geospatial data, particularly geospatial raster and vector data. The workshop lessons include:
    - Introduction to Geospatial Concepts to help the learner understand data structures and common storage and transfer formats for spatial data. The goal of this lesson is to provide an introduction to core geospatial data concepts. It is intended as a pre-requisite for the R for Raster and Vector Data lesson for learners who have no prior experience working with geospatial data.

    - Introduction to R for Geospatial Data to help the learner import data into R, calculate summary statistics, and create publication-quality graphics by providing an introduction to the R programming language.

    - Introduction to Geospatial Raster and Vector Data with R in which the learner will open, work with, and plot vector and raster-format spatial data in R.   This lesson provides a more in-depth introduction to visualization (focusing on geospatial data), and working with data structures unique to geospatial data.  It assumes that learners are already familiar with both geospatial data concepts and the core concepts of R.

  • Open Access Post-Graduate Teaching Materials in Managing Research Data in Archaeology

    Looking after digital data is central to good research. We all know of horror stories of people losing or deleting their entire dissertation just weeks prior to a deadline! But even before this happens, good practice in looking after research data from the beginning to the end of a project makes work and life a lot less stressful. Defined in the widest sense, digital data includes all files created or manipulated on a computer (text, images, spreadsheets, databases, etc). With publishing and archiving of research increasingly online we all have a responsibility to ensure the long-term preservation of archaeological data, while at the same time being aware of issues of sensitive data, intellectual property rights, open access, and freedom of information.
    The DataTrain teaching materials have been designed to familiarise post-graduate students with good practice in looking after their research data. A central tenet is the importance of thinking about this in conjunction with the projected outputs and publication of research projects. The eight presentations, followed by group discussion and written exercises, follow the lifecycle of digital data from pre-project planning, data creation, data management, publication, long-term preservation and lastly to issues of the re-use of digital data. At the same time the course follows the career path of researchers from post-graduate research students, through post-doctoral research projects, to larger collaborative and inter-disciplinary projects.
    The teaching material is targeted at co-ordinators of Core Research Skills courses for first year post-graduate research students in archaeology. The material is open access and you are invited to re-use and amend the content as best suits the requirements of your university department. The complete course is designed to run either as a four-hour half-day workshop, or as two 2-hour classes. Alternatively, individual modules can be slotted into existing data management and core research skills teaching.

  • The BD2K Guide to the Fundamentals of Data Science Series

    The Big Data to Knowledge (BD2K) Initiative presents this virtual lecture series on the data science underlying modern biomedical research. Since its beginning in September 2016, the webinar series has consisted of presentations from experts across the country covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” in biomedicine. The webinar series provides essential training at an introductory, overview level. All video presentations from the seminar series are streamed for live viewing, recorded, and posted online for future viewing and reference. These videos are also indexed as part of TCC’s Educational Resource Discovery Index (ERuDIte). This webinar series is a collaboration between the TCC, the NIH Office of the Associate Director for Data Science, and the BD2K Centers Coordination Center (BD2KCCC).

    View all archived videos on our YouTube channel: 

  • ETD+ Toolkit: Training Students to manage ETD+ research outputs

    The ETD+ Toolkit is a Google Drive Open Curriculum package that offers an approach to improving student and faculty research output management. Focusing on the Electronic Thesis and Dissertation (ETD) as a mile-marker in a student’s research trajectory, it provides in-time advice to students and faculty about avoiding common digital loss scenarios for the ETD and all of its affiliated files.

    The ETD+ Toolkit provides free introductory training resources on crucial data curation and digital longevity techniques. It has been designed as a training series to help students and faculty identify and offset risks and threats to their digital research footprints.

    What it is:
    An open set of six modules and evaluation instruments that prepare students to create, store, and maintain their research outputs on durable devices and in durable formats. Each is designed to stand alone; they may also be used as a series.

    What each module includes:
    Each module includes Learning Objectives, a one-page Handout, a Guidance Brief, a Slideshow with full presenter notes, and an evaluation Survey. Each module is released under a CC-BY license and all elements are openly editable to make reuse as easy as possible.