All Learning Resources

  • Responsible Data Use: Copyright and Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Copyright and Data".  The module was authored by Matthew Mayernik from the National Center for Atmospheric Research.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will focus on copyright law and associated procedures related to data. We don’t often think about data as having intellectual properties as a book or a movie would, but there are some important intellectual property issues to understand about data, especially involving copyright.

    We will first talk about what is and is not copyrightable in the United States with respect to data. Copyright laws can vary greatly from country to country and jurisdiction to jurisdiction around the world, so we want to emphasize that our discussion in this module will focus upon copyright and data in the United States.

    We will also talk about open copyright license options and how they apply to data, and how copyright can be used or deliberately waived in order to make data more open and easier to access and use.  We will also discuss how it’s possible to use non-legal means for establishing community-based norms to address some of these issues.

  • Data Management Plans: Why Create a Data Management Plan?

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Why Create a Data Management Plan?"  The module was authored by Ruth Duerr from The Ronin Institute.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).
    In this module, we’ll very briefly review what a Data Management Plan is, followed by a discussion of our top three reasons for creating a data management plan.  These reasons are:

    First, proper data management planning should make your work easier and possibly even cheaper than if you had handled your data in an ad hoc fashion throughout your project; second, handling your data properly and documenting them well can actually improve your standing with your users and, most importantly, with your colleagues; and last, and perhaps least, your funding agency says that you must.  Hopefully, by the end of this module you will be convinced that while funding agency requirements may be the stick making you create data management plans now, creating the plans has actually been in your best interest all along.  This module is available in both presentation slide and video formats.

  • Data Management Plans: Elements of a Data Management Plan

     
    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Elements of a Data Management Plan".  The module was authored by Ruth Duerr from The Ronin Institute.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In the module “Why Create a Data Management Plan?” we learned that it is just as important to plan how you will manage your data as it is to plan the rest of your research activities.  Given that, your next question is likely to be “Well, what should be in a data management plan?”  Answering that question, at a high level, is the purpose of this module.  In the following slide we give a brief description of each of the components or elements of a plan.  This module is available in both presentation slide and video formats.

  • Elements of a Data Management Plan: Identifying the materials to be created

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Identifying the Materials to be Created".  The module was authored by Ruth Duerr from The Ronin Institute.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA). 

    As discussed in the module “Data Management Plans: Elements of a Data Management Plan”, your data management plan needs to discuss the type or types of data that will be produced over the course of your research.  This module is available in both presentation slide and video formats.

  • Elements of a Data Management Plan: Organization and standards

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Organization and Standards".  The module was authored by Ruth Duerr from The Ronin Institute.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    The purpose of this module is to introduce you to the range of topics related to data organization and standards that may be important for you to address in your data management plan.  We say “may be important” because just as every research project is different, so every data management plan needs to be different to accommodate the particular needs of that project.  This means that some topics on data organization and standards will be more important to think about and describe in your data management plan than others.  Which topics are more important will depend on the size, scale, complexity and other details of your project that differentiate what you are doing from what others are doing.  So… no, unfortunately, there isn’t a one-size-fits-all data management plan!

  • Managing Your Data: Assign Descriptive File Names

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Assign Descriptive File Names."  This module was authored by Robert Cook from the Oak Ridge National Laboratory.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module we’re going to talk about how to construct unique file names that can be readily identified and found. The file names should reflect the contents of each file and include enough information so you can uniquely identify the file. The practices have been written for data files, but we think you’ll see that these practices can also be used for other types of files, such as documents, spreadsheets, presentations and even your pictures.  The goal is to have you open a directory on your computer, be able to readily identify the contents of each file and get to the file that you need.  This module is available in both presentation slide and video formats.

  • Managing Your Data: Backing Up Your Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Backing Up Your Data."  This module was authored by Robert Cook from the Oak Ridge National Laboratory.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

  • Data Formats: Choosing and Adopting Community Accepted Standards

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Choosing and Adopting Community Accepted Standards".  The module was authored by Curt Tilmes from the National Aeronautics and Space Administration (NASA).  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  In this module, we’re going to talk about formats for your data, and provide some guidelines for choosing and adopting community accepted standards for data formats.  This module is available in both presentation slide and video formats.

  • Creating Documentation and Metadata: Introduction to Metadata and Metadata Standards

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Introduction to Metadata and Metadata Standards".  The module was authored by Lynn Yarmey from the National Snow and Ice Data Center.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will be talking about metadata. To give you an overview of what this module will cover, we will be discussing terminology related to metadata, look at examples of metadata, bring metadata standards into the conversation and offer some suggestions on how to start implementing some of what we talk about in your lab. We will also give you a few pointers for more information.  This module is available in both presentation slide and video formats.

  • Creating Documentation and Metadata: Metadata for Discovery

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Metadata for Discovery."  This module was authored by Lola Olsen from the National Aeronautics & Space Administration, NASA, and Tyler Stevens, NASA Contractor for Wyle Information Systems at the Global Change Master Directory, and the Goddard Space Flight Center.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will provide an introduction to discovery level metadata, talk about key categories of this metadata, and show several examples.  This module is available in both presentation slide and video formats.

  • Working with Your Archive: Broadening Your User Community

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Broadening Your User Community".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will be discussing the relevance that broadening your user community has to data management as well as its advantages to you.  We will talk about ways to assess the current state of your data users, uses and gaps, then ways to develop a plan to broaden your user community.  Finally, we will discuss methods for broadening your user community initially and on an ongoing basis. 
    This module is available in both presentation slide and video formats.

  • Local Data Management: Advertising Your Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Advertising your data."  This module was authored by Nancy Hoebelheinrich from Knowledge Motifs LLC.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.  

    What we will be talking about in this presentation is why and how to advertise your data. When we talk about advertising your data we are thinking of techniques that go beyond word of mouth or informal notice. We’ll discuss the fact that funding agencies and institutions may actually require that you get word out about your published data, and that advertising your data can jump-start your career.
     
    We’ll also talk about different ways of advertising your data.  One of the obvious methods is by submitting descriptive information about your data to various destinations such as catalogs, but there are an ever-increasing number of other methods that can be used as well.  We will cover a couple of those methods in this brief introduction.  We’ll also encourage you to seek the help of the data center or data archive where you’re storing your data to help you advertise your data.  
     

  • Advertising your data: Agency requirements for submitting metadata

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Agency Requirements for Submitting Metadata."  This module was authored by Nancy Hoebelheinrich from Knowledge Motifs LLC.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    As an overview, we will touch on the following topics in this module:

    - How the agencies persuade you to make metadata available about your data by submission or publication. Metadata can be defined as descriptive information about your data of the type that is usually found in search portals. 
    - Specifically, we will be talking about the National Science Foundation's (NSF's) required Data Management Plan, which motivates you to make metadata available.
    - The National Aeronautics and Space Administration's (NASA's) Data and Information Policy, which encourages you to make metadata available.
    - The National Oceanic and Atmospheric Administration's (NOAA's) Administrative Order 212-15, which directs you to make metadata available.
    - How the practice of submitting or publishing metadata is promoted by the E-Government Act of 2002 (44 U.S.C. 3602).
    Other topics we’ll discuss include the timeliness of the metadata submission, and some dissemination tools and techniques that can help you make your metadata available to the public. This module is available in both presentation slide and video formats.

  • Advertising your data: Using data portals and metadata registries

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Using Data Portals and Metadata Registries." This module was authored by Nancy Hoebelheinrich from Knowledge Motifs LLC. Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will be talking about some common gateways that people often use to find your data: data portals and metadata registries.  We will describe what a data portal is and contrast it to a metadata registry.  We will explain some key information that you will need to know to submit information about your data, and some options for submitting the information to these gateways.  In addition, we will discuss some relevant data portals and metadata registries for science data. This module is available in both presentation slide and video formats.
     

  • Using Data Portals and Metadata Registries: Submitting Metadata to the GCMD

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Submitting Metadata to the GCMD."  This module was authored by Lola Olsen from the National Aeronautics & Space Administration, NASA, and Tyler Stevens, NASA Contractor for Wyle Information Systems at the Global Change Master Directory, and the Goddard Space Flight Center.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will provide an introduction to the Global Change Master Directory, known as the GCMD; describe how to submit a metadata record for your data to the GCMD using the docBUILDER metadata authoring tool; and provide an overview of docBUILDER functionality.  This module is available in both presentation slide and video formats.

  • Local Data Management: Providing Access to Your Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Providing Access to Data". The module was authored by Matthew Mayernik from the National Center for Atmospheric Research. Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will talk about how you can provide access to your data. We’ll argue that data should be openly available and explain why that is important, then discuss funding agency requirements for making data available and accessible, with a focus upon United States Government agencies. We’ll ask the question, who has responsibility for providing access to your data? Despite the diagram’s indication on this slide, it is individuals who need to take responsibility for providing access to their own data in various ways. To help you follow through on that responsibility, we’ll talk generally about the challenges involved in making data accessible. This module is available in both presentation slide and video formats.

  • Providing Access to Your Data: Determining Your Audience

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Determining Your Audience".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    When you think about providing access to your data, it’s important to think not only about the audiences that could use the data themselves, but also about those who could use the data products and services generated from them.  The users of the data could be those currently interested in your data as well as future users of the data. Keep in mind that there could be several audiences for your data as they move through the entire life cycle. The audiences might reflect various user demographics or various purposes for using the data, and can certainly change over time.

    Determining the audiences that use your data can help you identify their needs, and inform the development of your data to meet those needs. Development of the data might include the creation of data products and services that you provide to assist users in using your data.  Knowledge of the audiences for your data will help you identify the various products and services that you might offer to current or new user communities, and also help verify that the needs of your users are being met.

    Efforts to determine the audiences for your data should continue throughout the entire data life cycle so that you can improve the user experience. Awareness of the initial users of your data can inform the data development process, as well as your plans for disseminating the data and for providing stewardship to manage the data over time. Observations about later users of your data can inform potential improvements for your data, so you can better serve both your current and your future users.

  • Providing Access to Your Data: Access Mechanisms

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Access Mechanisms".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we plan to give you some background and context for the topic and describe its relevance to data management.  We’d like to introduce you to a way to think about the parties who can provide access to your data, and some of the mechanisms that might be used.  We’ll discuss some community considerations and resource considerations for access and, finally, describe some access mechanisms that can be offered by data centers.

  • Providing Access to Your Data: Tracking Data Usage

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Tracking Data Usage".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will give you some background and context for this topic, and then describe its relevance to data management.  We’ll discuss what data usage can tell you about your data and where you can find usage information.  We’ll also briefly discuss the advantages of tracking data citations.

  • Providing Access to Your Data: Handling Sensitive Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Handling Sensitive Data".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will tell you what sensitive data is and provide some background information about it.  We will discuss why it is important that you identify and manage sensitive data, particularly for science.  We’ll also talk about some important issues to discuss with your archive about managing the sensitive data.

  • Providing Access to Your Data: Rights

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Rights".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will first provide some background and context on the topic of rights, discuss the relevance of rights to data management and then describe some options that you have for assigning rights, with examples. 

  • Earth Lab Free Online Courses, Tutorials and Tools

    Welcome to Earth Data Science. This site contains open tutorials and course materials covering topics including data integration, GIS and data-intensive science. Explore our 312 earth data science lessons that will help you learn how to work with data in the R and Python programming languages.  You can also earn a professional Certificate in Earth Data Analytics at the University of Colorado, Boulder.

    Online courses available include, for example:
    - Online Earth Data Science Courses
    - Use Data for Earth and Environmental Science in Open Source Python Textbook
    - Scientist’s Guide to Plotting Data in Python Textbook
    - Earth Analytics Bootcamp Course
    - Intro to Earth Data Science Textbook
    - Earth Analytics Python Course
    - Earth Analytics R Course

    Earth Analytics Workshops include, for example:
    - Get Started With GIS in Open Source Python Tools
    - Setup the Earth Analytics Python Environment On Your Computer
    - Introduction to Clean Coding and the tidyverse in R

    Recent Tutorials include: 
    - Visualizing hourly traffic crime data for Denver, Colorado using R, dplyr, and ggplot
    - Calculating the area of polygons in Google Earth Engine
    - Introduction to the Google Earth Engine Python API
     

  • Open Source Software for Preprocessing GIS Data for Hydrological Models

    The information available from this web page covers a number of topics, courses, video and web tutorials related to Open Source Software for Preprocessing GIS Data for Hydrological Models. Courses include QGIS for Hydrological Applications, Using GDAL for preprocessing, Python 3 Tutorial, and Field surveys with QGIS, Mergin and Input.
    The courses, video tutorials and webinars are designed for professionals (engineers and scientists) active in the water/environmental sector, especially those involved in planning and management of water systems as well as numerical modelling. Prerequisites are a basic knowledge of computing and water-related topics.
    After these courses, you will be able to:
    - Understand the basic concepts of GIS: raster, vector, projections, geospatial analysis
    - Use a GIS for thematic mapping, basic data processing and editing, basic geoprocessing and analysis, and DEM processing and catchment delineation
    - Find open source software and open data

  • UNIX Tutorial for Beginners

    A beginner’s guide to the Unix and Linux operating systems. Eight simple tutorials cover the basics of UNIX / Linux commands.  Other Unix resources are listed on the home page as well.

  • The Challenge of Big Data for the Social Sciences

     
    The ubiquity of "big data" about social, political and economic phenomena has the potential to transform the way we approach social science. In this talk, Professor Benoit outlines the challenges and opportunities to social sciences caused by the rise of big data, with applications and examples. He discusses the rise of the field of data science, and whether this is a threat or a blessing for the traditional social scientific model and its ability to help us better understand society.

     

  • The Theory, Practice and Limits of Big Data for the Social Sciences

    Martin Hilbert delivered this talk on May 1, 2015 at the Institute for Social Sciences conference series Leading Research in the Social Sciences Today.  This video presents The Theory, Practice and Limits of Big Data for the Social Sciences. Dr. Hilbert talks about storage, information and growth, and the concept of a digital footprint and data, using examples to clarify the content.

  • Steps in a Digital Preservation Workflow

    Workflows are the way people get work done, and can be illustrated in a diagram or checklist as a series of steps that need to be completed sequentially. A workflow can involve anything from documentation to tasks and data being moved from one location to the next.

    This presentation will outline generic considerations and processes for building and managing a digital preservation workflow. It will consider the workflow within the larger context of a digital content life cycle, which runs from information creation through to ongoing discovery and access. It will focus upon generalized steps institutions can use to acquire, preserve and serve content. The presentation will describe distinct workflow stages in conjunction with sample procedures, policies, tools and services, stressing the dynamic nature of workflows over time, including the use of modular components and ongoing work to enhance automation and cope with issues of scale.

    In this video, the presenter covers the following topics:
    -Introduction to workflow in a digital preservation context
    -Outline of how to conceptualize a workflow
    -Variables that influence the design and execution of a workflow
    -Consideration of some existing models, architectures and tools

  • Google Earth Engine Tutorials

    These tutorials provide an introduction to using the Google Earth Engine JavaScript API for advanced geospatial analysis. The tutorials assume no programming background, although they do assume a willingness to learn some JavaScript programming. Use the links on this page to get started on the tutorials, or use the menus on the left to jump to a section of interest. In addition, there are 10 video tutorials from lectures or hands-on trainings conducted at the Earth Engine Users' Summit. View the videos after completing the self-paced tutorials.  Finally, the Earth Engine developer community has created additional tutorials on topics deemed important by the community.  These can be found under the Community Tutorials and include topics such as Combining Feature Collections, Customizing Base Map Styles and others.

    The topics for API tutorials are:
    -Introduction to JavaScript for Earth Engine
    -Introduction to the Earth Engine JavaScript API
    -Introduction to Global Forest Change datasets
    -Introduction to the JRC Global Surface Water dataset
    -Introduction to Earth Engine (condensed)
    -Classification
    -Hands-on Intermediate Training
    -Arrays and Matrices
    -Time Series Analysis
    -Tables and Vectors
    -Importing and Exporting
    -Earth Engine and the Google Cloud Platform
    -Google Maps API
    -Publishing and Storytelling
     

  • Introduction to Data Management for Undergraduate Students: Data Management Overview

    This library guide covers the basics and best practices for data management for individuals who are new to the research and data-collecting process.  Topics included in this guide are:
    - Data Management Overview
    - Data Documentation
    - Data Preservation
    - Filenaming Conventions
    - Data Backup 

  • Virginia (VA) Data Management Boot Camp 2016

    Institutions throughout Virginia have partnered since 2013 to present an annual Data Management Boot Camp. This web site includes presentation slides on various topics, including Organizing Data, Documentation and Metadata, Data Ownership, Sharing Data, Finding Data, and a DMPTool presentation. Other resources include links to TEDx Talks, datasets and exercises associated with various topics, and tools such as OpenRefine, R and RStudio, and DMPonline. Recordings from previous years are also linked.

  • Ag Data Commons Monthly Webinar Series

    Each month the Ag Data Commons offers a webinar, with topics ranging from introductions for new users to topics with a data management or curation focus. We also leave time for organized question and answer periods. To join any of the upcoming webinars, you can email [email protected] and we will send you the join information. You can also check the news section for the next webinar's connection information. Upcoming webinars are listed on the Ag Data Commons News Page at https://data.nal.usda.gov/news, complete with details about the webinar subject and connection information. Please note that each meeting number will be different.
    Topics include: 
    Making Data Machine Readable
    Creating a Data Management Plan
    Data Dictionaries
    Data-Literature Linking in the Ag Data Commons
    Data Science & Agriculture
    Introduction to GeoData

  • Python Developer’s Guide

    This guide is a comprehensive resource for contributing to Python, for both new and experienced contributors. These instructions cover how to get a working copy of the source code and a compiled version of the CPython interpreter (CPython is the reference implementation of Python). It also gives an overview of the directory structure of the CPython source code. There are 32 step-by-step guide sections available from this web page.
     

  • Data Carpentry Ecology Workshop

    Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis and visualization.

    The workshop can be taught using R or Python as the base language.

    Overview of the lessons:

    Data organization in spreadsheets
    Data cleaning with OpenRefine
    Introduction to R or Python
    Data analysis and visualization in R or Python
    SQL for data management

  • CMU Intro to Database Systems Course

    This course is focused on the design and implementation of database management systems. Topics include data models (relational, document, key/value), storage models (n-ary, decomposition), query languages (SQL, stored procedures), storage architectures (heaps, log-structured), indexing (order preserving trees, hash tables), transaction processing (ACID, concurrency control), recovery (logging, checkpoints), query processing (joins, sorting, aggregation, optimization), and parallel architectures (multi-core, distributed). Case studies on open-source and commercial database systems will be used to illustrate these techniques and trade-offs. The course is appropriate for students with strong systems programming skills. There are 26 videos associated with this course, which was originally offered in Fall 2018 as Course 15-445/645 at Carnegie Mellon University.

  • OMOP Common Data Model and Extract, Transform & Load Tutorial

    In this tutorial you will learn about the details of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and how to apply it to Extract, Transform & Load (ETL) data. The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format. This tutorial also demonstrates best practices for converting data into a data model.
    Topics covered within this tutorial include:
    -What is OMOP/OHDSI?
    -OMOP Common Data Model (CDM): Why and How
    -How to retrieve data from OMOP CDM
    -Setting up and performing an Extract, Transform & Load process into the CDM
    -Using WhiteRabbit and Rabbit-In-A-Hat to Build an ETL
    -Testing and Quality Assurance

    Included with the video presentation of the tutorial are:
    Tutorial slides
    CDM_QUERY_EXAMPLES.sql
    CDM_QUERY_EXAMPLES_EXTRAS.sql
    OHDSI-in-a-box
    TUTORIAL_ScanReport.xlsx

    The OHDSI Common Data Model and Extract, Transform & Load Tutorial took place on September 24, 2016, during the 2016 OHDSI Symposium. Recordings were made possible by the generous support of Johnson & Johnson, the JKTG Foundation, and Pfizer.

  • OMOP Common Data Model and Standardized Vocabularies

    This workshop is for data holders who want to apply OHDSI’s data standards to their own observational datasets and researchers who want to be aware of OHDSI’s data standards, so they can leverage data in OMOP CDM format for their own research purposes.

    Topics covered within this tutorial include:
    -Introductions and Ground Rules Foundational
     • History of OMOP
     • Why and How
     • Birth of OHDSI
    -Introduction to OMOP Common Data Model OHDSI Community
    -Example of Remote Study
    -VM Overview
    -Ancestors & Descendants
    -How does it work for Drugs
    -SQL Examples
    -History of the model
    -In-depth discussion of model
    -Era discussion
    -Real-World Scenario
    -ETL Pitfalls
    -Leveraging OHDSI Tools
    -OHDSI Community

    After the tutorial, you will know:
    1. The history of OMOP and OHDSI
    2. How the Standardized Vocabularies work
    3. How to find codes and concepts
    4. How to navigate the concept hierarchy
    5. The OMOP Common Data Model (CDM)
    6. How to use the OMOP CDM 

    Included with the video presentation of the tutorial are:
    Tutorial slides

  • SPSS Data Curation Primer

    This data curation primer primarily discusses .sav and .por files. SPSS Statistics (.sav): Data files saved in IBM SPSS Statistics format. Portable (.por): Portable format that can be read by other versions of IBM SPSS Statistics and versions on other operating systems.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of Contents:
    -Description of Format
    -Example Data
    -Start the Conversation: Broad Questions and Clarifications on Research Data
    -Key Questions
    -Key Clarifications
    -Applicable Metadata Standards, Recommended Elements, and Readme File
    -Tutorials
    -Software
    -Preservation Actions
    -FAIR Principles & SPSS
    -Format Use
    -Documentation of Curation Process
    -Appendix A: Other SPSS File Formats
    -Appendix B: Project Level or Study Level Metadata
    -Appendix C: DDI Metadata
    -Appendix D: Dictionary Schema
    -Bibliography

    Other Data Curation Primers can be found at: https://conservancy.umn.edu/handle/11299/202810. Interactive primers are available for download and for creating derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Microsoft Excel Data Curation Primer

    Microsoft Excel’s widespread adoption in the corporate sector is well known, but the application has also found use in many areas of scholarship. Despite the ubiquity of tabular data in CSV (comma-separated values) format, and the availability of many tools and analysis platforms that operate on CSV files, Microsoft Excel continues to be used widely in the natural sciences and social sciences. As a consequence, Excel files are routinely deposited in data repositories and curators are likely to encounter them.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of contents:
    -Description of format
    -Overview
    -Characteristics
    -Typical purposes and functions
    -What to look for
    -Problems opening the file
    -Content problems
    -Software for viewing or analyzing data
    -Preservation actions
    -Excel CURATE checklist
    -Appendix: Creating a data dictionary
    -References

    More information about the collection of Data Curation Primers can be found at: http://hdl.handle.net/11299/202810.

    Interactive primers are available for download and for creating derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Jupyter Notebooks: A Primer for Data Curators

    Jupyter Notebooks are composite digital objects used to develop, share, view, and execute interspersed, interlinked, and interactive documentation, equations, visualizations, and code. Researchers seeking to deposit software, in this case Jupyter Notebooks, in repositories do so with the expectation that repositories will provide documentation explaining “what you can deposit, the supported file formats for deposits, what metadata you may need to provide, how to provide this metadata and what happens after you make your deposit” (Jackson, 2018a). This expectation is not necessarily met by repositories that currently accept software deposits and complex objects like Jupyter Notebooks. This guide is meant to both inform curatorial practices around Jupyter Notebooks, and support the development of resources that meet researchers’ expectations to ensure long-term availability of software in curated archival repositories. Guidance provided by Jisc and the Software Sustainability Institute outlines three different kinds of software deposits: a minimal deposit, a runnable deposit, and a comprehensive deposit (Jackson, 2018b). This primer follows this same conceptual framework in dealing with Jupyter Notebooks, which even in their static, non-executable form, can be used to document how scientific research was carried out or be used as teaching models among many other use cases.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.

    The full set of Data Curation Primers can be found at: https://conservancy.umn.edu/handle/11299/202810.

    Interactive primers are available for download and for creating derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Microsoft Access Data Curation Primer

    This primer assumes a conceptual familiarity with relational databases (and associated terminology) and a basic level of experience with Microsoft Access.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.

    More information about the collection of Data Curation Primers can be found at: http://hdl.handle.net/11299/202810.

    Interactive primers are available for download and for creating derivatives at: https://github.com/DataCurationNetwork/data-primers.