All Learning Resources

  • Data Management Plan - Data Management Guides

    A collection of online data management guides, data management planning tools, guidelines from funding agencies, and data management plan examples for researchers and librarians. This page also links to various courses and tutorials on research data management for health science librarians and researchers.

  • Memcached Tutorial

    Memcached is an open-source, high-performance, distributed memory object caching system intended to speed up dynamic web applications by reducing database load. It is an in-memory key-value store for small chunks of arbitrary data (strings, objects) produced by database calls, API calls, or page rendering. This tutorial provides a basic understanding of all the relevant concepts of Memcached needed to create and deploy a highly scalable, performance-oriented system.
    The key features of Memcached are as follows −
    -It is open source.
    -The Memcached server is a big hash table.
    -It significantly reduces the database load.
    -It is well suited to websites with a high database load.
    -It is distributed under a Berkeley Software Distribution (BSD) license.
    -It is a client-server application over TCP or UDP.
    Memcached is not −
    -a persistent data store
    -a database
    -a large object cache
    -fault-tolerant or highly available
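    The cache-aside pattern these features describe can be sketched in plain Python. This is a minimal in-process stand-in, not real Memcached usage: a production client talks to a memcached server over TCP/UDP, and `slow_db_query` here is a hypothetical placeholder for a database call.

```python
import time

cache = {}  # stands in for the Memcached server's in-memory hash table

def slow_db_query(user_id):
    # Hypothetical stand-in for an expensive database call.
    time.sleep(0.01)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                 # cache hit: the database is never touched
        return cache[key]
    value = slow_db_query(user_id)   # cache miss: query the database...
    cache[key] = value               # ...and populate the cache for next time
    return value

first = get_user(42)   # miss: goes to the "database"
second = get_user(42)  # hit: served from memory
print(first == second)
```

    Because Memcached is not a persistent store, a real application must treat every lookup as potentially missing, exactly as `get_user` does here.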

  • Data Structure and Algorithms Tutorial

    Data Structures are the programmatic way of storing data so that it can be used efficiently. Almost every enterprise application uses various types of data structures in one way or another. This tutorial will give you a solid understanding of the data structures needed to grasp the complexity of enterprise-level applications and the need for algorithms and data structures.
    As applications are getting complex and data-rich, there are three common problems that applications face nowadays.
    -Data Search − Consider a store's inventory of 1 million items. If the application has to look up an item, it may have to scan all 1 million entries every time, slowing down the search. As data grows, searches become slower.
    -Processor speed − Processor speed, although very high, becomes a limiting factor as data grows to billions of records.
    -Multiple requests − With thousands of users searching for data simultaneously on a web server, even a fast server can fail while searching the data.
    To solve these problems, data structures come to the rescue. Data can be organized in a data structure in such a way that not every item needs to be examined, and the required data can be found almost instantly.
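    The search problem above can be illustrated with a short Python sketch (the inventory and item IDs are hypothetical): a linear scan examines items one by one, O(n) in the worst case, while keeping the data sorted lets a binary search find the same item in O(log n) steps.

```python
import bisect

# Hypothetical inventory of 100,000 sorted item IDs.
inventory = [f"item-{i:06d}" for i in range(100_000)]

def linear_search(items, target):
    # Examines elements one by one: O(n) in the worst case.
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(items, target):
    # Halves the search space at each step on sorted data: O(log n).
    i = bisect.bisect_left(items, target)
    if i < len(items) and items[i] == target:
        return i
    return -1

target = "item-099999"  # the last item: worst case for a linear scan
assert linear_search(inventory, target) == binary_search(inventory, target) == 99_999
```

    The linear scan makes 100,000 comparisons here; the binary search needs about 17.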

  • Introduction to Light Detection and Ranging (LiDAR) – Explore Point Clouds and Work with LiDAR Raster Data in R

    The tutorials in this series introduce Light Detection and Ranging (LiDAR). Concepts covered include how LiDAR data are collected, LiDAR as gridded raster data, and an introduction to digital models derived from LiDAR data (Canopy Height Models (CHM), Digital Surface Models (DSM), and Digital Terrain Models (DTM)). The series introduces the concepts through videos, graphical examples, and text. It continues with visualization of LiDAR-derived raster data using free, open-source tools, including R.
    Data used in this series are from the National Ecological Observatory Network (NEON) and are in .las, GeoTIFF and .csv formats.
    Series Objectives
    After completing the series you will:
    -Know what LiDAR data are
    -Understand key attributes of LiDAR data
    -Know what LiDAR-derived DTM, DSM, and CHM digital models are
    -Be able to visualize LiDAR-derived data in .las format
    -Be able to create a Canopy Height Model in R
    -Be able to create an interactive map of LiDAR-derived data.

  • Data Activity: Visualize Precipitation Data in R to Better Understand the 2013 Colorado Floods

    Several factors contributed to the extreme flooding that occurred in Boulder, Colorado in 2013. In this data activity, we explore and visualize precipitation (rainfall) data collected by the National Weather Service's Cooperative Observer Program. The tutorial is part of the Data Activities that can be used with the Quantifying The Drivers and Impacts of Natural Disturbance Events Teaching Module.
    Learning Objectives

    After completing this tutorial, you will be able to:
    -Download precipitation data from NOAA's National Centers for Environmental Information.
    -Plot precipitation data in R.
    -Publish & share an interactive plot of the data using Plotly.
    -Subset data by date (if completing Additional Resources code).
    -Set a No Data Value to NA in R (if completing Additional Resources code).

  • Introduction To Working With Time Series Data In Text Formats In R

    The tutorials in this series cover how to open, work with, and plot tabular time-series data in R. Additional topics include working with time and date classes (e.g., POSIXct, POSIXlt, and Date), subsetting time-series data by date and time, and creating faceted or tiled sets of plots. Data used in this series are from the NEON Harvard Forest Field Site and are in .csv file format.

    Series Objectives:
    After completing the series, you will:

    Time Series 00-
    -Be able to open a .csv file in R using read.csv and understand why we are using that file type.
    -Understand how to work with data stored in different columns within a data.frame in R.
    -Understand how to examine R object structures and data classes.
    -Be able to convert dates, stored as a character class, into an R date class.
    -Know how to create a quick plot of a time-series data set using qplot.

    Time Series 01-
    -Know how to import a .csv file and examine the structure of the related R object.
    -Use a metadata file to better understand the content of a dataset.
    -Understand the importance of including metadata details in your R script.
    -Know what an EML file is.

    Time Series 02-
    -Understand various date-time classes and data structures in R.
    -Understand what the POSIXct and POSIXlt data classes are and why POSIXct may be preferred for some tasks.
    -Be able to convert a column containing date-time information in character format to a date-time R class.
    -Be able to convert a date-time column to different date-time classes.
    -Learn how to write out a date-time class object in different ways (month-day, month-day-year, etc.).

    Time Series 03-
    -Be able to subset data by date.
    -Know how to search for NA or missing data values.
    -Understand different possibilities for how to deal with missing data.

    Time Series 04-
    -Know several ways to manipulate data using functions in the dplyr package in R.
    -Be able to use the group-by, summarize, and mutate functions.
    -Write and understand R code with pipes for cleaner, more efficient coding.
    -Use the year function from the lubridate package to extract the year from a date-time class variable.

    Time Series 05-
    -Be able to create basic time-series plots using ggplot in R.
    -Understand the syntax of ggplot and know how to find out more about the package.
    -Be able to plot data using scatter and bar plots.

    Time Series 06-
    -Know how to use facets in the ggplot2 package.
    -Be able to combine different types of data into one plot layout.

    Time Series Culmination Activity-
    -Have applied ggplot2 and dplyr skills to a new data set.
    -Learn how to set min/max axis values in ggplot to align data on multiple plots.

  • Agriculture Data Management Software Introduction

    This course provides a quick intro to getting started with precision ag software.
    Topics include: 
    - 1.1 SMS Download and Install
    - 1.2 Upgrading SMS
    - 1.3 Importing SMS Files
    - 1.4 Getting Familiar with SMS

    Terms of use for this course can be found at: 

  • Yield Monitor Data Post-Processing or "Cleaning"

    In this course, you will learn how to clean agricultural yield data with free tools to make more reliable interpretations and agronomic decisions. You will use your own ag data, get experience with precision ag software, and learn about new agriculture technologies. By enrolling in the course, you can visit the contents, which are divided into 2 parts:
    1 - SMS Advanced: Software Management
    2 - Yield Data Post-Processing

    Terms of use for this course can be found at: 

  • Conducting On-Farm Research With Ag Technology

    In this course, you will learn how to use your ag technologies to conduct on-farm research in your operation.
    Topics include:
    - 1.1 Designing On-Farm Research Experiments and Creating Prescriptions
    - 1.2 Tutorial and Tutorial Walkthrough Video
    - 1.3 Additional Help Videos
    - 1.4 Get CCA Credits for Designing On-Farm Research Experiments and Creating Prescriptions

    Terms of use for this course can be found at: 


  • Data Management for Hybrid and Multi-Cloud: A Four-Step Journey

    Enterprises such as Intuit, Macy’s, and others are out-innovating their competitors and dramatically growing their business with hybrid and multi-cloud database deployments.
    Data management is the hardest part of systems delivery, especially when making the transition to the cloud. But done correctly, the right data strategy can result in a multitude of benefits, including:
    · Data portability across on-premises and cloud
    · Reduced data silos
    · Resiliency and availability
    · Security
    · Capacity planning and growth
    Join Robin Schumacher, Chief Product Officer at DataStax, as he explores best practices for defining and implementing data management strategies for the cloud. Schumacher will outline a four-step journey that will take you from your first deployment in the cloud through to true intercloud implementation. He will also walk through a real-world use case where a major retailer has evolved through the four phases over a period of four years and is now benefitting from a highly resilient multi-cloud deployment.
  • Workshop On Data Management Plans For Linguistic Research

    The rising tide of data management and sharing requirements from funding agencies, publishers, and institutions has created a new set of pressures for researchers who are already stretched for time and funds. While it can feel like yet another set of painful hurdles, in reality, the process of creating a Data Management Plan (DMP) can be a surprisingly useful exercise, especially when done early in a project’s lifecycle. Good data management, practiced throughout one’s career, can save time, money, and frustration, while ultimately helping increase the impact of research.
    This 1-day workshop will involve lecture and discussion around concepts of data management throughout the data lifecycle (from data creation, storage, and analysis to data sharing, archiving, and reusing), as well as related issues such as intellectual property, copyright, open access, data citation, attribution, and metrics. Participants will learn about data management best practices and useful tools while engaging in activities designed to produce a DMP similar to those desired by the NSF Behavioral and Cognitive Sciences Division (for example, Linguistics, Documenting Endangered Languages), as well as other federal agencies such as NEH.
  • 3 Data Management Best Practices

    Develop awareness, through examples and lab exercises, of the challenges associated with managing scientific experimental data. Learn best practices for managing data across the entire research lifecycle, from conceiving of an experiment to sharing resulting data and analysis.
    Learning Objectives:  
    Students will develop skills in the following areas:
    -Know how to organize data to better find it later,
    -Follow best practices for storage and backup to protect files from loss,
    -Document data so that anyone can follow what you did,
    -Properly cite and follow license conditions when using another researcher’s data.
    Table of Contents:
    3.1 Overview
    3.2 File Organization
    3.3 Storage and Backup
    3.4 Data Quality Control
    3.5 Documentation
    3.6 Data Sharing
    3.7 Assignment

  • Document Your Code with Jupyter Notebooks

    This series teaches you to use the Jupyter Notebook file format to document code and efficiently publish code results & outputs.

    Series Objectives
    After completing the series, you will be able to:
    -Document & Publish Your Workflow

    •      Explain why documenting and publishing one's code is important.
    •      Describe tools that enable easy publishing of code & output, such as the Jupyter Notebook application.

    -Introduction to Using Jupyter Notebooks

    •      Know how to create a notebook using the Jupyter application.
    •      Be able to write a script with text and code chunks.

  • Key Issues In Reusing Data

    Participants will hear about the key issues in secondary analysis as a method. The introductory session will briefly cover the pros and cons of reusing data and the importance of learning about the origins of your data. Quantitative and qualitative secondary analysis will be discussed with examples, and issues of context, sampling, and ethics will be raised. This session is conceptual in focus rather than a hands-on, practical introduction to data.

    In order to follow the content of the webinar attendees should already be familiar with the basic methods of qualitative or quantitative data research.

  • Tidy-ing Your Data: Simple Steps for Reproducible Research

    This webinar introduces the idea of relational data modeling and how it can be used as a principle for organizing data, even without a database management system. 

    Data management is a fundamental skill to any researcher, but implementing data management practices is often reserved for when data get "big" or "complex." In reality, all data can benefit from good management and organization practices, regardless of the size or complexity of the data. This webinar introduces the concept of relational data modeling and how it can be used as a principle for organizing data, even without a database management system. Using a relational data model creates a “tidy” data structure, which makes data easier to search, filter, document, and analyze.
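    As a small illustration (with hypothetical measurements), the same table can be held in a "wide" layout or reshaped into the tidy "long" layout the webinar advocates, where each row is one observation:

```python
# A "wide" table: one column per year (hypothetical data).
wide = [
    {"site": "A", "2019": 10, "2020": 12},
    {"site": "B", "2019": 7, "2020": 9},
]

# Tidy ("long") form: one row per observation, one column per variable.
tidy = [
    {"site": row["site"], "year": year, "value": row[year]}
    for row in wide
    for year in ("2019", "2020")
]

# Tidy rows are easy to search and filter with ordinary tools.
only_2020 = [r for r in tidy if r["year"] == "2020"]
print(only_2020)
```

    In the wide layout, selecting "all 2020 values" means knowing which column to read; in the tidy layout it is an ordinary filter on the `year` variable.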

  • Data Curation Primers: Expanding The Community Curation Toolkit

    The Data Curation Network presents a new resource to add to your data curation toolkit. “Data curation primers” are a concise, actionable resource to assist data curators in adding value to a dataset. These evolving documents detail a specific subject, disciplinary area or curation task, and can be used as a reference or jump-start to curating research data.
    The first set of these primers, authored by teams of experts, has just been published and covers the following data types/formats: Microsoft Excel; Microsoft Access; Geodatabases; netCDF; Jupyter Notebooks; SPSS; and websites. 
    Data curation primers were the direct output of an IMLS-funded workshop series hosted by the partners in the Data Curation Network (DCN) and leverage the expertise of data curators nation-wide. Attend this webinar to get an update on the DCN, a little history about the DCN Education initiative, a demo of the newest releases of the curation primers, and some ideas of how you can incorporate this resource into your workflows and share your own expertise.
    Learning outcomes:
    1- Increase understanding of data curation practices and tools in various disciplines, data types, and formats.
    2- Share expertise and enhance curation capacity nationwide.
    3- Meet like-minded colleagues who are interested in building and extending curation practices.

  • Developing, Packaging and Sharing Reproducible Research Objects: The Whole Tale Approach

    It has been recognized for some time now that sharing data is critical when publishing research findings.  However, it remains challenging (if not impossible) for many potential users of a research object to deal with the installation of complex software dependencies and the appropriate parameterization and execution of multiple scripts contained in often complicated, nested research objects. A key objective of the NSF-funded Whole Tale project is to make the development, sharing, and reuse of reproducible research objects more seamless, both for creators and users of "tales". Tales can be seen as a special kind of research object that bundles data and metadata, but also code and the runtime execution environment necessary to reproduce the computational aspects of a research paper. In particular, a human-centered narrative, e.g., in the form of a Jupyter (or RStudio) notebook can be used as the central element of research tales that interleave scientific explanations, code, and visualizations. By making it easier to develop, package, share, and execute tales, different types of users are supported by the Whole Tale approach: researchers can easily combine data from different sources (e.g., DataONE member nodes), analyze and visualize the data, and then bundle up their research products---together with the software environment used to generate the products---and share the resulting tales. Peers can use the shared tales, e.g., as part of a review process (associated with a scientific publication) or make it the basis for their own research. In this webinar, we will illustrate the Whole Tale approach using a number of simple examples and demonstrations.

  • Data Discovery

    Data discovery is a crucial stage in the research process, especially in the social sciences and humanities as many valuable studies have originated from secondary data.
    Speakers will take participants on a tour through five core elements of the discovery process. These elements are 1) identification of the purpose of the specific data use intended, 2) finding an appropriate data resource, 3) setting up a search query, 4) selecting the data and finally 5) evaluation of the data quality.
    The guest speaker, sociologist Kristýna Bašná from the Institute of Sociology, Czech Academy of Sciences, will present her personal experience with discovering data in the context of her research on civic culture and democracy.

    Material from the webinar available includes:

    Video Recording on YouTube
    Slides from the webinar
    User Guide (PDF, 375 KB) - finding and accessing data from national social science data services
    Case studies (PDF, 213 KB)
    Also see the related resource at:

  • Cleaning Data in SQL

    Real-world data is almost always messy. As a data scientist, data analyst, or even a developer, if you need to discover facts about data, it is vital to ensure that the data is tidy enough for doing that. There is actually a well-rounded definition of tidy data, and many resources are available on the topic.
    In this tutorial, you will be practicing some of the most common data cleaning techniques in SQL. You will create your own dummy dataset, but the techniques can be applied to real-world data (of the tabular form) as well. The contents of this tutorial are as follows:
    -Different data types and their messy values
    -Problems that can arise from messy numbers
    -Cleaning numeric values
    -Messy strings
    -Cleaning string values
    -Messy data values and cleaning them
    -Duplications and removing them

    Note that you should already know how to write basic SQL queries in PostgreSQL (the RDBMS used in this tutorial).
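    A minimal sketch of the trim/cast/deduplicate techniques listed above, using SQLite through Python's sqlite3 module rather than the PostgreSQL setup the tutorial assumes (the table and its messy values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount TEXT)")
# Messy values: stray whitespace, numbers stored as text,
# an unparseable entry, and a duplicated row.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, " 100 "), (2, "250"), (2, "250"), (3, "n/a")])

# Trim whitespace, cast text to a numeric type, NULL out unparseable
# values, and drop duplicate rows with DISTINCT.
rows = conn.execute("""
    SELECT DISTINCT id,
           CASE WHEN TRIM(amount) GLOB '[0-9]*'
                THEN CAST(TRIM(amount) AS REAL)
                ELSE NULL
           END AS amount_clean
    FROM sales
    ORDER BY id
""").fetchall()
print(rows)
```

    The same CASE/TRIM/CAST/DISTINCT pattern carries over to PostgreSQL with only minor syntax changes (e.g., a regular-expression match instead of GLOB).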

  • Getting Started With Labfolder

    Labfolder is an electronic lab notebook that enables researchers to record findings and make new discoveries. By reinventing the traditional paper lab notebook, our productivity & collaboration platform makes it easier to create, find, share, discuss & validate research data as a team.  This Getting Started guide will help you learn what Labfolder is and how to use it for data entry, protocols and standard operating procedures, data retrieval, collaboration, inventory and data security tasks.  

    Labfolder is free for a team of up to 3 scientists in academia;  other pricing is available for larger teams in academia, for business and industry, and for classroom use.  


  • Data Visualization with Power BI

    Power BI is a cloud-based business analytics service from Microsoft that enables anyone to visualize and analyze data with speed and efficiency. It is a powerful and flexible tool for connecting with and analyzing a wide variety of data; many businesses even consider it indispensable for data-science-related work. Power BI’s ease of use comes from its drag-and-drop interface, which makes tasks like sorting, comparing, and analyzing data quick and easy. Power BI is also compatible with multiple data sources, including Excel, SQL Server, and cloud-based data repositories, which makes it an excellent choice for data scientists.
    This tutorial will cover the following topics:
    Overview of Power BI
    Advantages of using Power BI
    PowerBI Desktop
    Getting Started
    Transforming Data
    Report Creation
    Building a Dashboard
    Power BI’s integration with R & Python
    Saving and Publishing

    Terms of use can be found at:

  • Principal Component Analysis (PCA) in Python

    Principal Component Analysis (PCA) is a linear dimensionality reduction technique that can be utilized for extracting information from a high-dimensional space by projecting it into a lower-dimensional subspace. It tries to preserve the essential parts that carry more of the data's variation and to remove the non-essential parts with less variation.
    In this tutorial, you will learn about PCA and how it can be leveraged to extract information from the data without any supervision using two popular datasets: Breast Cancer and CIFAR-10.

    According to Wikipedia, PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
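    The orthogonal-transformation idea can be made concrete with a small standard-library sketch: for 2-D data the covariance matrix is 2x2, so its leading eigenvector (the first principal component) has a closed form. The data points here are hypothetical, scattered roughly along the line y = 2x:

```python
import math

# Hypothetical 2-D dataset, roughly along the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]

n = len(points)
mean_x = sum(x for x, y in points) / n
mean_y = sum(y for x, y in points) / n

# Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
a = sum((x - mean_x) ** 2 for x, y in points) / (n - 1)
b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
c = sum((y - mean_y) ** 2 for x, y in points) / (n - 1)

# Largest eigenvalue of a symmetric 2x2 matrix (closed form).
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# Its eigenvector is the first principal component (assumes b != 0).
vx, vy = b, lam - a
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)
print(pc1)  # points roughly along (1, 2) / sqrt(5)
```

    Real analyses would use a library routine (e.g., an SVD) rather than this closed form, which only works for two dimensions, but the principle is the same: the first component is the direction of greatest variance.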

  • Python Data Type Conversion Tutorial

    In this Python tutorial, you'll tackle implicit and explicit data type conversion of primitive and non-primitive data structures with the help of code examples!

    Every value in Python has a data type. Data types are a classification of data that tells the compiler or the interpreter how you want to use the data. The type defines the operations that can be done on the data and the structure in which you want the data to be stored. In data science, you will often need to change the type of data, so that it becomes easier to use and work with.
    This tutorial will tackle some of the important and most frequently used data structures, and you will learn to change their types to suit your needs. More specifically, you will learn:
    -Implicit and Explicit Data Type Conversion
    -Primitive versus Non-primitive Data Structures
    -Integer and Float Conversions
    -Data Type Conversion with Strings
    -Conversion to Tuples and Lists
    -Binary, Octal, and Hexadecimal Integers in Python

    Python has many data types. You must have already seen and worked with some of them. You have integers and float to deal with numerical values, boolean (bool) to deal with true/false values and strings to work with alphanumeric characters. You can make use of lists, tuples, dictionary, and sets that are data structures where you can store a collection of values.
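    A few of the conversions listed above, sketched as a quick Python session:

```python
# Implicit conversion: Python promotes int to float in mixed arithmetic.
total = 3 + 0.5
print(type(total))          # float

# Explicit conversion with built-in constructors.
print(int("42") + 1)        # 43
print(float(7))             # 7.0
print(str(3.14))            # '3.14'

# Lists and tuples convert into each other.
print(tuple([1, 2, 3]))     # (1, 2, 3)
print(list((1, 2, 3)))      # [1, 2, 3]

# Binary, octal, and hexadecimal integers.
print(0b1010, 0o12, 0xA)    # all equal 10
print(bin(10), oct(10), hex(10))
```

    Note that conversions like `int("n/a")` raise a ValueError rather than converting implicitly, which is one reason the explicit/implicit distinction matters in practice.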

  • Reading and Importing Excel Files into R

    This R tutorial on reading and importing Excel files into R will help you understand how to read and import spreadsheet files using both base R and dedicated packages.

    This tutorial on reading and importing Excel files into R will give an overview of some of the options that exist to import Excel files and spreadsheets of different extensions to R. Both basic commands in R and dedicated packages are covered. At the same time, some of the most common problems that you can face when loading Excel files and spreadsheets into R will be addressed.

    Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing, and storing data in tables and has widespread use in many different application fields all over the world. It should come as no surprise, then, that R has implemented several ways to read, write, and manipulate Excel files (and spreadsheets in general).

  • SQL Tutorial: How to Write Better Queries

    In this tutorial, you will learn about anti-patterns, execution plans, time complexity, query tuning, and optimization in SQL. You’ll start with a short overview of the importance of learning SQL for jobs in data science. Next, you’ll learn how SQL queries are processed and executed so that you can appreciate the importance of writing high-quality queries: more specifically, you’ll see that a query is parsed, rewritten, optimized, and finally evaluated. You’ll also learn about the set-based versus the procedural approach to querying, and you’ll briefly cover time complexity and big O notation to get an idea of the time complexity of an execution plan before you execute your query. Lastly, you'll get some pointers on how to tune your query further.
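    Execution plans are easy to inspect yourself. The sketch below uses SQLite through Python's sqlite3 module (the table and index names are hypothetical); PostgreSQL offers the analogous EXPLAIN statement. It shows the planner switching from a full table scan to an index lookup once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

query = "SELECT * FROM users WHERE email = ?"

# Without an index, the planner must scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query,
                           ("user500@example.com",)).fetchall()
print(plan_before)  # the plan's detail column mentions a SCAN

conn.execute("CREATE INDEX idx_email ON users (email)")

# With an index, the planner can seek directly to the matching row.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query,
                          ("user500@example.com",)).fetchall()
print(plan_after)  # the detail column now mentions the index
```

    Reading the plan before running a query is exactly how you estimate its time complexity in practice: a SCAN is O(n), an index SEARCH is roughly O(log n).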

    Terms for use of this information can be found at:

  • Introduction to code versioning and collaboration with Git and GitHub: An EDI VTC Tutorial.

    This tutorial is an introduction to code versioning and collaboration with Git and GitHub.  Tutorial goals are to help you:  

    • Understand basic Git concepts and terminology.
    • Apply concepts as Git commands to track versioning of a developing file.
    • Create a GitHub repository and push local content to it.
    • Clone a GitHub repository to the local workspace to begin developing.
    • Be inspired to incorporate Git and GitHub into your workflow.

    There are a number of exercises within the tutorial to help you apply the concepts learned.  
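    As a taste of the commands involved, the local part of the workflow might look like the following shell sketch (the repository name and commit details are hypothetical; the GitHub steps appear only as comments because they require an account and a remote URL):

```shell
# Run in a throwaway directory.
cd "$(mktemp -d)"
git init -q my-analysis && cd my-analysis   # create a local repository

echo "print('hello')" > script.py
git add script.py                           # stage the new file
git -c user.name="You" -c user.email="you@example.com" \
    commit -q -m "Add analysis script"      # record a version

git log --oneline                           # one line per recorded version

# Pushing to GitHub would then be:
#   git remote add origin <repository URL>
#   git push -u origin main
```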
    Follow-up questions can be directed via email to Colin Smith ([email protected]) and Susanne Grossman-Clarke ([email protected]).

  • Transform and visualize data in R using the packages tidyr, dplyr and ggplot2: An EDI VTC Tutorial.

    The two tutorials, presented by Susanne Grossman-Clarke, demonstrate how to tidy data in R with the package “tidyr” and transform data using the package “dplyr”. The goal of those data transformations is to support data visualization with the package “ggplot2” for data analysis and scientific publications of which examples were shown.

  • Large-Scale Multi-view Data Analysis

    Multi-view data are extensively available nowadays, arising from various types of features, viewpoints, and sensors. For example, the most popular commercial depth sensor, Kinect, uses both visible-light and near-infrared sensors for depth estimation; automatic driving uses both visual and radar sensors to produce real-time 3D information on the road; and face analysis algorithms prefer face images from different views for high-fidelity reconstruction and recognition. All of these tend to facilitate better data representation in different application scenarios. Essentially, multiple features attempt to uncover the knowledge within each view to support the final task, since each view preserves both shared and private information. This becomes increasingly common in the era of “Big Data”, where data are large-scale, subject to corruption, generated from multiple sources, and complex in structure. While these problems have attracted substantial research attention recently, a systematic overview of multi-view learning for Big Data analysis has never been given. In the face of big data and challenging real-world applications, we summarize and go through the most recent multi-view learning techniques appropriate to different data-driven problems. Specifically, our tutorial covers most multi-view data representation approaches, centered around two major applications of Big Data: multi-view clustering and multi-view classification. In addition, it discusses current and upcoming challenges. This should benefit the community in both industry and academia, from literature review to future directions.
    This tutorial, available in PDF format, is one of nine tutorials from the 2018 IEEE International Conference on Big Data in Seattle, WA; you can reach the others at IEEE Big Data 2018 Tutorials.
  • When Metadata Collides: Lessons on Combining Records from Multiple Repository Systems

    Institution X likes to use Dublin Core and enjoys occasionally storing coordinates in the dc:rights field along with normal rights statements. Institution Y prefers PBCore and dabbles in storing LCSH subject strings as a type of corporation. What happens when the time comes for these two institutions to put their data in a shared environment? These are the issues the Boston Public Library has been facing building a statewide digital repository for Massachusetts made up of items from dozens of organizations that each have their own way of doing metadata. This talk is on the Digital Commonwealth initiative and our role as a DPLA hub, lessons learned while dealing with other institutions' data, and how we manage a repository system that contains actual digitized objects alongside metadata-only harvested records. In addition, a portion of this talk is on breaking the conventional library wisdom of "dumbing down" data to the lowest common denominator in a shared context. Instead, we go in the opposite direction: we make what we take in much richer and more discoverable by linking terms to controlled vocabularies, parsing subjects for geographic information, parsing potential dates from various fields into a standard format, and more.

    This presentation was part of Open Repositories 2014, Helsinki, Finland, June 9-13, 2014;  General Track, 24x7 Presentations 
    The slides are available in PDF format at:

  • Research Data Management In The Arts and Humanities

    In recent times the principal focus for research data management protagonists has been upon scientific data, due perhaps to a combination of conspicuous Government or funder declarations with a bias towards the sciences and the very public consciousness of examples of 'big data', notably the output from CERN's Large Hadron Collider.

    That is not to say that developments in the management of Arts and Humanities data have been absent, merely occluded. We aim to take some steps towards rectifying this situation with RDMF10, which will examine what it is about Arts and Humanities data that may require a different kind of handling to that given to other disciplines, how the needs for support, advocacy, training, and infrastructure are being supplied and, consequently, what are the strengths and weaknesses of the current arrangements for data curation and sharing.

    The broad aims of the event were:

    -To examine aspects of Arts and Humanities data that may require a different kind of handling to that given to other disciplines;
    -To discuss how needs for support, advocacy, training, and infrastructure are being described and met;
    -And consequently, to assess the strengths and weaknesses of the current arrangements for Arts and Humanities data curation and sharing, and brainstorm ways forward.

    Presentations include:  
    - Introduction - Martin Donnelly, DCC
    - Keynote 1: "What’s so different about Arts and Humanities data?" - Professor David De Roure, Director, Oxford e-Research Centre
    - Keynote 2: "Err, what do I do with this? Exploring infrastructure requirements for visual arts researchers" - Leigh Garrett, Director, Visual Arts Data Service
    - Researcher support and development requirements - Simon Willmoth, Director of Research Management and Administration, University of the Arts London
    - Advocacy and outreach - Stephanie Meece, Scholarly Communications Librarian, University of the Arts London
    - A researcher's view on Arts and Humanities data management/sharing (with a focus on infrastructure needs and wants) - Dr Julianne Nyhan, Lecturer in Digital Information Studies, University College London
    - Data and the Sonic Art Research Unit - Professor Paul Whitty and Dr Felicity Ford, Oxford Brookes University
    - Institutional case study: Research data management in the humanities: A non-Procrustean infrastructure - Sally Rumsey, Janet McKnight and Dr James A. J. Wilson, University of Oxford
    - Linking institutional, national and international infrastructures - Sally Chambers, DARIAH


  • Principles Of Data Curation And Preservation

    Session outline:
    -Definitions of data curation and preservation
    -Stages of data curation and preservation
    -Top tips for data curation and preservation
    -Real-life examples of data curation and preservation
    -Key references to find out more about data curation
    Learning outcomes:
    -Recognize the basic principles of the appropriate curation and preservation of research data
    -Outline the essentials of good data management practice
    -Identify the reasons for good data management

  • Archiving The Artist: An Introduction To Digital Curation

    Learning outcomes:
    -Identify the advantages and challenges of digitizing art archives
    -Recognize the basic principles of digitization and digital curation
    -Compare and evaluate digitized collections in the visual arts
    -Apply the knowledge acquired in this session to begin to develop your own digital collection

  • DBMS Tutorial

    This DBMS tutorial provides basic and advanced database concepts. A database management system (DBMS) is software used to manage a database. The tutorial covers all major DBMS topics, including the introduction, the ER model, keys, the relational model, join operations, SQL, functional dependency, transactions, concurrency control, etc.

    A DBMS allows users to perform the following tasks:
    Data Definition: creating, modifying, and removing the definitions that organize the data in the database.
    Data Updating: inserting, modifying, and deleting the actual data in the database.
    Data Retrieval: retrieving data from the database for applications to use for various purposes.
    User Administration: registering and monitoring users, maintaining data integrity, enforcing data security, handling concurrency control, monitoring performance, and recovering information corrupted by unexpected failures.
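    The first three tasks can be sketched with Python's built-in sqlite3 module (user administration is omitted because SQLite is an embedded engine with no user accounts). The table and data here are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data Definition: create the structures that organize the data
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

# Data Updating: insert and modify the actual rows
cur.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("SICP", 1985))
cur.execute("UPDATE books SET year = ? WHERE title = ?", (1996, "SICP"))

# Data Retrieval: query the stored data back out for applications
row = cur.execute("SELECT title, year FROM books").fetchone()
print(row)  # ('SICP', 1996)

conn.close()
```

    In a client-server DBMS such as PostgreSQL, user administration would be carried out with statements like CREATE USER and GRANT.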
    Subtopics covered under this tutorial include:

    What is a Database
    Types of Databases
    What is RDBMS
    DBMS vs File System
    DBMS Architecture
    Three schema Architecture
    Data Models
    Data model schema
    Data Independence
    DBMS Language
    ACID Properties in DBMS
    Data modeling

  • SAR for Landcover Applications [Advanced]

    This webinar series will build on the knowledge and skills previously developed in ARSET SAR training. Presentations and demonstrations will focus on agriculture and flood applications. Participants will learn to characterize floods with Google Earth Engine. Participants will also learn to analyze synthetic aperture radar (SAR) for agricultural applications, including retrieving soil moisture and identifying crop types.

    Learning Objectives: By the end of this training, attendees will be able to: 

    1. analyze SAR data in Google Earth Engine
    2. generate soil moisture analyses
    3. identify different types of crops   

    Course Format: 

    • This webinar series will consist of two, two-hour parts
    • Each part will include a presentation on the theory of the topic followed by a demonstration and exercise for attendees. 
    • This training is also available in Spanish. Please visit the Spanish page for more information.
    • A certificate of completion will also be available to participants who attend all sessions and complete the homework assignment, which will be based on the webinar sessions. Note: certificates of completion only indicate that the attendee participated in all aspects of the training; they do not imply proficiency in the subject matter, nor should they be seen as a professional certification.

    Prerequisites are not required for this training, but attendees who have not completed them may not be adequately prepared for its pace.

    Part One: Monitoring Flood Extent with Google Earth Engine
    This session will focus on the use of Google Earth Engine (GEE) to generate flood extent products using SAR images from Sentinel-1. The first third of the session will cover the basic principles of radar remote sensing related to flooded vegetation. The remaining time in the session will be dedicated to a demonstration on how to use GEE to generate flood extent products with Sentinel-1.
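    The demonstration itself uses GEE, but the core idea behind SAR flood mapping (smooth open water reflects the radar signal away from the sensor, so flooded pixels return very low backscatter) can be sketched with NumPy alone. The threshold value below is illustrative, not the one used in the webinar.

```python
import numpy as np

def flood_mask(backscatter_db, threshold_db=-17.0):
    """Flag pixels as open water where SAR backscatter (in dB) falls
    below a threshold. Flooded areas appear dark in Sentinel-1 imagery
    because smooth water scatters the signal away from the sensor."""
    return backscatter_db < threshold_db

# A toy 2x3 "scene" of VV backscatter values in dB
scene = np.array([[-20.0, -8.5, -19.2],
                  [-7.1, -21.3, -9.0]])
mask = flood_mask(scene)
print(int(mask.sum()))  # 3 pixels flagged as flooded
```

    In the GEE exercise, the same comparison runs server-side over full Sentinel-1 scenes rather than on an in-memory array.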
    Part Two: Exploiting SAR to Monitor Agriculture
    Featuring guest speaker Dr. Heather McNairn, from Agriculture and Agri-Food Canada, this session will focus on using SAR to monitor different agriculture-related topics, building on the skills learned in the SAR agriculture session from 2018. The first part of the session will cover the basics of radar remote sensing as related to agriculture. The remainder of the session will focus on the use of SAR to retrieve soil moisture, identify crop types, and map land cover.

    Each of the two parts includes links to the recordings, presentation slides, and Question & Answer Transcripts.

  • Forest Mapping and Monitoring with SAR Data [Advanced]

    Measurements of forest cover and change are vital to understanding the global carbon cycle and the contribution of forests to carbon sequestration. Many nations are engaged in international agreements, such as the Reducing Emissions from Deforestation and Degradation (REDD+) initiative, which includes tracking annual deforestation rates and developing early warning systems of forest loss. Remote sensing data are integral to data collection for these metrics; however, the use of optical remote sensing for monitoring forest health can be challenging in tropical, cloud-prone regions.
    Radar remote sensing overcomes these challenges through its ability to “see” the surface through clouds, day or night. In addition, the radar signal can penetrate the vegetation canopy and provide information about structure and density. Although the capabilities and benefits of SAR data for forest mapping and monitoring are well known, the data remain underutilized operationally due to their complexity and a shortage of user-friendly tutorials.

    This advanced webinar series will introduce participants to 1) SAR time-series analysis of forest change using Google Earth Engine (GEE), 2) land cover classification with radar and optical data in GEE, 3) mapping mangroves with SAR, and 4) forest stand height estimation with SAR. Each training will include a theoretical portion describing the use of SAR for land cover mapping as related to the focus of the session, followed by a demonstration showing participants how to access, download, and analyze SAR data for forest mapping and monitoring. These demonstrations will use freely available, open-source data and software.

    Learning Objectives: By the end of this training, attendees will be able to:

    • Interpret radar data for forest mapping
    • Understand how radar data can be applied to land cover mapping
    • Become familiar with open source tools used to analyze radar data
    • Conduct a land cover classification with radar and optical data
    • Map mangrove forests with radar data
    • Understand how forest stand height can be mapped using radar data
    • Apply SAR time-series analysis to map forest change
    • Learn about upcoming radar missions at NASA

    Course Format: 

    • Four parts with sessions offered in English and Spanish
    • Four exercises
    • One Google Form homework

    Prerequisites: Attendees who have not completed the following may not be prepared for the pace of this training:

    Part 1: Time Series Analysis of Forest Change

    • Introduction to analysis and interpretation of SAR data for forest mapping
    • Exercise: Time Series of Forest Change using GEE
    • Q&A
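    One common technique for SAR change detection of the kind covered in Part 1 is the log-ratio of backscatter between two dates. The NumPy sketch below illustrates the idea under that assumption; it is not necessarily the method used in the GEE exercise.

```python
import numpy as np

def log_ratio_change(before, after, threshold_db=2.0):
    """Detect change between two SAR backscatter images (linear power
    units) via the log-ratio: |10*log10(after/before)| above a dB
    threshold flags a changed pixel. Forest loss typically shows a
    drop in backscatter."""
    ratio_db = 10.0 * np.log10(after / before)
    return np.abs(ratio_db) > threshold_db

before = np.array([0.20, 0.18, 0.21])  # intact forest
after = np.array([0.19, 0.05, 0.20])   # middle pixel cleared
mask = log_ratio_change(before, after)
print(mask.tolist())  # [False, True, False]
```

    A full time-series analysis extends this pairwise comparison across many acquisition dates to separate persistent change from seasonal variation.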

    Part 2: Land Cover Classification with Radar and Optical Data

    • Review of the unique attributes of radar and optical data as related to forest mapping and how they can be complementary
    • Classification algorithms and improvements with optical imagery
    • Exercise: Land Cover Classification with Radar and Optical using GEE
    • Q&A

    Part 3: Mangrove Mapping

    • Introduction to analysis and interpretation of SAR data for mangrove mapping
    • Exercise: Mapping Mangroves with the Sentinel Toolbox
    • Q&A

    Part 4: Forest Stand Height (with Guest Speaker Paul Siqueira)

    • Introduction to the use of SAR data for mapping forest stand height
    • Applications and looking forward to NISAR 2022
    • Demo: Estimating Forest Stand Height
    • Q&A

    Each of the four parts includes links to the recordings, presentation slides, exercises, and Question & Answer Transcripts.

  • Mapeo y Monitoreo de los Bosques con Datos SAR [Avanzado]

    This advanced training will cover the following topics: 1) forest change analysis with multi-temporal SAR data using Google Earth Engine (GEE); 2) land cover classification with SAR and optical data using GEE; 3) mangrove mapping with SAR; and 4) forest stand height estimation using SAR. Each session will include a theoretical portion describing the use of SAR for land cover mapping relevant to the session's focus, followed by a demonstration of how to access, download, and analyze SAR data for forest mapping and monitoring. These demonstrations use freely available, open-source data and software.

    Learning Objectives:

    By the end of this training, participants will be able to:
    • Interpret radar data for forest mapping
    • Understand how radar data can be applied to land cover mapping
    • Become familiar with open-source tools used to analyze radar data
    • Conduct a land cover classification with radar and optical data
    • Map mangroves with radar data
    • Understand how forest stand height can be mapped with radar data
    • Apply SAR time-series analysis to map forest change
    • Learn about upcoming NASA radar missions

    Course Format:

    • Four parts with sessions offered in English and Spanish
    • Four exercises
    • One Google Form homework assignment
    • A certificate of completion will be available to participants who attend all sessions and complete the homework, which will be based on the webinar sessions. Note: certificates of completion only indicate that the holder participated in all aspects of the training; they do not imply proficiency in the subject matter, nor should they be seen as a professional certification.

    Complete Fundamentals of Remote Sensing, Introduction to Synthetic Aperture Radar, and SAR and its Applications for Land Cover, or have equivalent experience. Attendees who do not complete the prerequisites may not be sufficiently prepared for the pace of the training.
    Software instructions
    You can follow the demonstrations using Google Earth Engine (Parts One and Two) and the Sentinel-1 Toolbox (Part Three). Recordings of each part will be available on YouTube within 24 hours of each demonstration so that you can review them at your own pace.

    Part One: Forest Change Analysis with Multi-Temporal SAR Data
    • Introduction to the analysis and interpretation of SAR data for forest mapping
    • Exercise: Multi-temporal SAR data for forest change analysis using GEE
    • Q&A session

    Part Two: Land Cover Classification with SAR and Optical Data
    • Review of the characteristics of SAR and optical data relevant to forest mapping and how they complement one another
    • Classification algorithms for optical imagery
    • Exercise: Land cover classification with SAR and optical data using GEE
    • Q&A session

    Part Three: Mangrove Mapping
    • Introduction to the analysis and interpretation of SAR data for mangrove mapping
    • Exercise: Mangrove mapping with the Sentinel Toolbox
    • Q&A session

    Part Four: Forest Stand Height Estimation with SAR (Guest Speaker Dr. Paul Siqueira)
    • Introduction to the use of SAR data for estimating forest stand height
    • Applications and looking ahead to NISAR in 2022
    • Demo: Estimating forest stand height
    • Q&A session

  • SQL for Data Analysis – Tutorial for Beginners – ep1

    SQL is simple and easy to understand. Thus it can be used not only by engineers, developers, and data analysts/scientists, but by anyone willing to invest a few days in learning and practicing it.
    The author has created this SQL series to be the most practical and most hands-on SQL tutorial for aspiring Data Analysts and Data Scientists. It will start from the very beginning, so if you have never touched coding/programming/querying, that won’t be an issue!

  • Webinar: Introduction to QAMyData ‘health-check’ tool for numeric data

    This webinar is an introduction to the new QAMyData tool for health-checking your numeric data, launched in November 2019.
    The tool uses automated methods to detect and report on some of the most common problems found in survey or numeric data, such as missingness, duplication, outliers, and direct identifiers. The open-source tool helps data creators and users quality-assess a numeric data file using a comprehensive list of ‘tests’, classified into types: file, metadata, data integrity, and direct identifiers. Popular file formats can be tested, including SPSS, Stata, SAS, and CSV. The test configuration feature allows you to create your own unique Data Quality Profile, which can play a useful role in your ‘FAIR’ data checking.
    The webinar describes the tests included in the tool, how to configure them to meet your own quality thresholds, and how to download the software from the project's GitHub page. The presenters will also show a teaching exercise using messy data that can help promote data management skills.
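    As a rough illustration of the kinds of checks such a tool automates (not QAMyData's actual tests, thresholds, or output format), a small Python sketch over one numeric column:

```python
from statistics import mean, stdev

def health_check(column):
    """Report QAMyData-style issues in a numeric column: missing
    values, duplicate values, and outliers (here, values more than
    3 standard deviations from the mean)."""
    present = [v for v in column if v is not None]
    report = {"missing": len(column) - len(present),
              "duplicates": len(present) - len(set(present))}
    mu, sigma = mean(present), stdev(present)
    report["outliers"] = sum(1 for v in present if abs(v - mu) > 3 * sigma)
    return report

# One missing cell, heavy duplication, and a single wild value
report = health_check([10.0] * 20 + [None, 1000.0])
print(report)  # {'missing': 1, 'duplicates': 19, 'outliers': 1}
```

    The real tool runs many such tests per file and lets you set your own pass/fail thresholds in a configuration file.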

  • Introduction to Data Mining

    All great learning opportunities are built on a solid foundation. This data mining fundamentals series is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running.   This page provides a playlist of 24 video sessions on the basics of data mining.  Topics in the introductory session include:
    – Data and Data Types
    – Data Quality
    – Data Preprocessing
    – Similarity and Dissimilarity
    – Data Exploration and Visualization
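    As a taste of the "Similarity and Dissimilarity" topic above, cosine similarity is one standard measure covered in introductory data mining; a minimal Python sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 for vectors
    pointing the same way, 0.0 for orthogonal (unrelated) ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_orthogonal = cosine_similarity([1, 0], [0, 1])  # unrelated vectors
sim_parallel = cosine_similarity([2, 1], [4, 2])    # same direction
print(round(sim_orthogonal, 3), round(sim_parallel, 3))  # 0.0 1.0
```

    Other measures introduced in the series, such as Euclidean distance, follow the same pattern: a single number summarizing how alike two data objects are.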

  • Effectively Managing And Sharing Research Data In Spreadsheets

    Regardless of discipline studied or methods used, it is likely that researchers use some sort of spreadsheet application (e.g. Excel, Google Sheets) to investigate, manipulate, or share research data. Though spreadsheets are easy to exchange with other researchers, difficulties can arise if sharing and preservation are not adequately considered. Join JHU Data Management Services for our online training course, “Effectively Managing and Sharing Research Data in Spreadsheets”. NOTE: You will be required to log in to Blackboard to use the resources, but can do so as a guest.

    We will cover spreadsheet practices that can:
    1-Increase the possibility that research data contained in spreadsheets can be reused by others (and you) in the future, and
    2-Help to reduce the chance of error when using spreadsheets for data acquisition.

    -Introduction (7 mins lecture and 1 min demo)
    -Prior to Data Collection (7 mins lecture)
    -Track Versions and Provenance (5 mins lecture and 15 mins demos)
    -Data Quality Control (10 mins lecture and 9 mins demos)
    -Documenting Data, Values, and Variables (9 mins lecture and 14 mins demos)
    -Data Sharing for Spreadsheets (9 mins lecture and 6 mins demos)
    -Using Other People’s Spreadsheet Data (8 mins lecture and 35 mins demos)
    -Resources (4 mins lecture and 53 mins demos)