All Learning Resources

  • A Toolbox for Curating and Archiving Research Software for Data Management Specialists

    This page presents a set of tools, resources, approaches, and questions that help researchers or research data management specialists address potential knowledge gaps in providing software archiving and/or preservation services as a companion to data services. Click each topic to learn more about software sharing and archiving.
  • 36 Tutorials To Excel At MS Excel Spreadsheets

    The internet is filled with free and inexpensive classes, so it makes sense that you can find a wide range of Microsoft Excel tutorials to guide you through the process. What's cool about these tutorials is that combining many of them often gives you a more in-depth look at Excel than a regular college course would.
    Contents:
    - Why You Should Learn Excel
    - Excel Basics Tutorials
    - Advanced Excel Mathematics Tutorials
    - Excel Database Tutorials
    - MS Excel Functions Tutorials
    - Excel Graphing Tutorials
    - Excel Printing Tutorials
    - Other Small Business Resources

  • Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehouse Architecture

    This tutorial on data warehouse concepts will tell you everything you need to know to perform data warehousing and business intelligence. The data warehouse concepts explained in this video are:
    1. What Is Data Warehousing?
    2. Data Warehousing Concepts
    3. OLAP (On-Line Analytical Processing)
    4. Types of OLAP Cubes
    5. Dimensions, Facts & Measures
    6. Data Warehouse Schema
    Check our complete Data Warehousing & Business Intelligence playlist here.
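
    Dimensions, facts, and measures (item 5 above) can be illustrated with a minimal Python sketch; the table and column names here are hypothetical examples, not taken from the video:

```python
from collections import defaultdict

# Hypothetical fact table: each row holds dimension values (region, product)
# and a numeric measure (sales).
facts = [
    {"region": "East", "product": "Widget", "sales": 120.0},
    {"region": "East", "product": "Gadget", "sales": 80.0},
    {"region": "West", "product": "Widget", "sales": 200.0},
]

def rollup(facts, dimension, measure):
    """Aggregate a measure along one dimension (a one-dimensional cube slice)."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row[measure]
    return dict(totals)

sales_by_region = rollup(facts, "region", "sales")
```

    A real OLAP engine does this kind of aggregation across many dimensions at once; the sketch only shows the underlying idea of summing a measure grouped by a dimension.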

  • A Complete Guide To Math And Statistics For Data Science

    Math and Statistics for Data Science are essential because these disciplines form the basic foundation of all Machine Learning algorithms. In fact, mathematics is behind everything around us, from shapes, patterns, and colors to the count of petals in a flower. Although a good understanding of programming languages, Machine Learning algorithms, and a data-driven approach is necessary to become a Data Scientist, Data Science isn't all about these fields.
    In this blog post, you will understand the importance of Math and Statistics for Data Science and how they can be used to build Machine Learning models. Here's a list of topics the author covers in this Math and Statistics for Data Science blog:

    - Introduction to Statistics
    - Terminologies in Statistics
    - Categories in Statistics
    - Understanding Descriptive Analysis
    - Descriptive Statistics In R
    - Understanding Inferential Analysis
    - Inferential Statistics In R
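
    The blog works its descriptive statistics examples in R; the same ideas can be sketched with Python's standard library (the sample values below are made up for illustration):

```python
import statistics

# Hypothetical sample: petal counts, echoing the flower example above.
petals = [5, 5, 6, 5, 8, 5, 6]

mean = statistics.mean(petals)       # central tendency
median = statistics.median(petals)   # middle value of the sorted sample
mode = statistics.mode(petals)       # most frequent value
stdev = statistics.stdev(petals)     # sample standard deviation (spread)
```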

  • ASU Library Data Management Tutorials

    The contents include:
    - Introduction to Research Data Management tutorial: This tutorial provides an overview of the importance of your data management plan and best practices to help you manage research data.
    - Writing a Research Data Management Plan tutorial: This tutorial discusses the steps for writing a detailed and effective research data management plan.
    - Use the DMPTool to Write a Plan tutorial: This tutorial gives a basic overview of using the DMPTool to help individuals create a data management plan.
  • A Complete Python Tutorial to Learn Data Science from Scratch

    In this tutorial, you will learn data science using Python from scratch. It will also help you learn basic data analysis methods using Python, and you will be able to enhance your knowledge of machine learning algorithms.
    Table of Contents
    1-Basics of Python for Data Analysis

    • Why learn Python for data analysis?
    • Python 2.7 v/s 3.4
    • How to install Python?
    • Running a few simple programs in Python

    2-Python libraries and data structures

    • Python Data Structures
    • Python Iteration and Conditional Constructs
    • Python Libraries

    3-Exploratory analysis in Python using Pandas

    • Introduction to series and data frames
    • Analytics Vidhya dataset- Loan Prediction Problem

    4-Data Munging in Python using Pandas
    5-Building a Predictive Model in Python

    • Logistic Regression
    • Decision Tree
    • Random Forest
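
    The data structures and iteration topics in section 2 can be sketched in a few lines of plain Python; the loan figures and field names below are made up, loosely echoing the loan prediction dataset mentioned in section 3:

```python
# Core structures from section 2: lists, dicts, iteration, conditionals.
loan_amounts = [1200, 3500, 800, 2400]            # a list of numbers
applicant = {"name": "A. Rao", "income": 52000}   # a dict (hypothetical fields)

# Iteration with a conditional construct: flag loans above a threshold.
large_loans = []
for amount in loan_amounts:
    if amount > 2000:
        large_loans.append(amount)

# The same logic as an idiomatic list comprehension.
large_loans_v2 = [a for a in loan_amounts if a > 2000]
```
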
  • ORNL DAAC Learning Resources

    This page provides a variety of resources for creating, submitting, and using ORNL DAAC data. We provide guidance on preparing a data management plan, properly formatting data for long-term archiving, various access tools and services, and using and analyzing data in popular software packages. Resource types include tutorials, code, webinars, help pages, classroom exercises, and workshops. New resources are added as they are made available.

  • Ocean Teacher Global Academy

    The OceanTeacher Global Academy (OTGA) provides a comprehensive internet-based training platform that supports classroom training, blended training, and online (distance) learning. OTGA aims to build equitable capacity related to ocean research, observations, and services in all IOC Member States by delivering training courses on a range of topics addressing the priority areas of the UN Decade of Ocean Science for Sustainable Development and the 2030 Agenda and its SDGs, as well as supporting the implementation of the IOC Capacity Development Strategy.

  • Structuring and Documenting a USGS Public Data Release

    This tutorial is designed to help scientists think about the best way to structure and document their USGS public data releases. The ultimate goal is to present data in a logical and organized manner that enables users to quickly understand the data. The first part of the tutorial describes the general considerations for structuring and documenting a data release, regardless of the platform being used to distribute the data. The second part of the tutorial describes how these general considerations can be implemented in ScienceBase. The tutorial is designed for USGS researchers, data managers, and collaborators, but some of the content may be useful for non-USGS researchers who need some tips for structuring and documenting their data for public distribution.

  • Python for Data Management


    This training webinar for Python is part of a technical webinar series created by the USGS Core Science Analytics, Synthesis, and Library section to improve data managers' and scientists' skills in using Python to perform basic data management tasks.

    Who: These training events are intended for a wide array of users, ranging from those with little or no experience with Python to others who may be familiar with the language but are interested in learning techniques for automating file manipulation, batch generation of metadata, and other data management related tasks.

    Requirements: This series will be taught using Jupyter Notebook and the Python bundle that ships with the new USGS Metadata Wizard 2.x tool.

    Topics include:
    - Working with Local Files
    - Batch Metadata Handling
    - Using the USGS ScienceBase Platform with PySB
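
    The "Working with Local Files" topic above typically means batch-processing files on disk. A minimal standard-library sketch of that idea (the file names and columns are hypothetical, and this is not code from the webinar):

```python
import csv
import tempfile
from pathlib import Path

# Work in a throwaway directory so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
(root / "site_a.csv").write_text("site,value\na,1\n")
(root / "site_b.csv").write_text("site,value\nb,2\n")
(root / "notes.txt").write_text("not data\n")

# Batch over every CSV file in the directory and collect its rows.
rows = []
for path in sorted(root.glob("*.csv")):
    with path.open(newline="") as f:
        rows.extend(csv.DictReader(f))

values = [r["value"] for r in rows]
```

    The same pattern (glob a directory, loop, process each file) underlies most batch metadata and file-manipulation tasks the series covers.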

  • Data Management Planning Part 1: overview and a USGS program experience

    Emily Fort of the USGS presents an introduction to data management planning and a USGS program experience.

  • Data Management Planning Part 2: theory and practice in research data management

    Steve Tessler and Stan Smith present an example of a data management planning strategy for USGS science centers.

  • Data Collection Part 1: How to avoid a spreadsheet mess - Lessons learned from an ecologist

    Most scientists have experienced the disappointment of opening an old data file and not fully understanding the contents. During data collection, we frequently optimize ease and efficiency of data entry, producing files that are not well formatted or described for longer term uses, perhaps assuming in the moment that the details of our experiments and observations would be impossible to forget. We can make the best of our sometimes embarrassing data management errors by using them as ‘teachable moments’, opening our dusty file drawers to explore the most common errors, and some quick fixes to improve day-to-day approaches to data.

  • Data Collection Part 2: Relational databases - Getting the foundation right

  • Data Sharing and Management within a Large-Scale, Heterogeneous Sensor Network using the CUAHSI Hydrologic Information System

    Hydrology researchers are collecting data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that require infrastructure for data storage, management, and sharing. Managing streaming sensor data is challenging, especially in networks with large numbers of sites and sensors. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters rely on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. It also depends on the ability of researchers to share and access the data in usable formats.

    In this presentation I will describe tools that have been developed for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track equipment, deployments, calibrations, and other events related to monitoring site maintenance, and to link this information to the observational data being collected, which is imperative in ensuring the quality of sensor-based data products.

    I will present these tools in the context of a data management and publication workflow case study for the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) network of aquatic and terrestrial sensors. iUTAH researchers have developed and deployed an ecohydrologic observatory to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The GAMUT Network measures aspects of water inputs, outputs, and quality along a mountain-to-urban gradient in three watersheds that share common water sources (winter-derived precipitation) but differ in the human and biophysical nature of land-use transitions. GAMUT includes sensors at aquatic and terrestrial sites for real-time monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality.

    I will present the overall workflow we have developed, our use of existing software tools from the CUAHSI Hydrologic Information System, and new software tools that we have deployed for both managing the sensor infrastructure and for storing, managing, and sharing the sensor data.

  • Metadata: Standards, tools and recommended techniques

  • Monitoring Resources: web tools promoting documentation, data discovery and collaboration

    The presentation focuses on the USGS/Pacific Northwest Aquatic Monitoring Partnership's (PNAMP) Monitoring Resources toolset.

  • How high performance computing is changing the game for scientists, and how to get involved

  • Best practices for preparing data to share and preserve

    Scientists spend considerable time conducting field studies and experiments, analyzing the data collected, and writing research papers, but an often overlooked activity is effectively managing the resulting data. The goal of this webinar is to provide guidance on fundamental data management practices that investigators should perform during the course of data collection to improve the usability of their data sets. Topics covered will include data structure, quality control, and data documentation. In addition, I will briefly discuss data curation practices that archives perform to ensure that data can be discovered and used in the future. By following these practices, data will be less prone to error, more efficiently structured for analysis, and more readily understandable for any future questions that they might help address.
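
    One of the quality-control practices the webinar names can be sketched in a few lines of Python; the variable names, sentinel value, and valid range below are illustrative assumptions, not taken from the webinar:

```python
# Minimal quality-control sketch: flag out-of-range values in one column.
records = [
    {"site": "A", "temp_c": 21.4},
    {"site": "B", "temp_c": -999.0},   # a common missing-value sentinel
    {"site": "C", "temp_c": 18.9},
]

# Plausible physical bounds for air temperature in Celsius (assumed).
VALID_RANGE = (-60.0, 60.0)

# Keep records whose measurement falls outside the valid range for review.
flagged = [r for r in records
           if not (VALID_RANGE[0] <= r["temp_c"] <= VALID_RANGE[1])]
```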

  • Data citation and you: Where things stand today

  • Open data and the USGS Science Data Catalog

  • Open Data Management in Agriculture and Nutrition Online Course

    This free online course aims to strengthen the capacity of data producers and data consumers to manage and use open data in agriculture and nutrition. One of the main learning objectives is for the course to be used widely within agricultural and nutrition knowledge networks in different institutions. The course also aims to raise awareness of different types of data formats and uses, and to highlight how important it is for data to be reliable, accessible, and transparent.
    The course is delivered through the Moodle e-learning platform. Course units include:

    Unit 1:  Open data principles (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 2:  Using open data (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 3:  Making data open (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 4:  Sharing open data (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 5:  IPR and Licensing (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)

    By the end of the course, participants will be able to:
    - Understand the principles and benefits of open data
    - Understand ethics and responsible use of data
    - Identify the steps to advocate for open data policies
    - Understand how and where to find open data
    - Apply techniques to data analysis and visualisation
    - Recognise the necessary steps to set up an open data repository
    - Define the FAIR data principles
    - Understand the basics of copyright and database rights
    - Apply open licenses to data

    The course is open to infomediaries (including ICT workers, technologists, journalists, communication officers, librarians, and extensionists); policy makers, administrators, and project managers; and researchers, academics, and scientists working in the areas of agriculture, nutrition, weather and climate, and land data.

  • Dendro Open-Source Dropbox

    Dendro is a collaborative file storage and description platform designed to support users in collecting and describing data, with its roots in research data management. It is not intended to replace existing research data repositories; rather, it is placed before the moment of deposit in a data repository. Dendro is open source and fully built on Linked Open Data, and is designed to help researchers describe their datasets. Whenever researchers want to publish a dataset, they can export it to repositories such as CKAN, DSpace, Invenio, or EUDAT's B2SHARE.

    It is designed to support the work of research groups with collaborative features such as:

    File metadata versioning
    Permissions management
    Editing and rollback
    Public/Private/Metadata Only project visibility

    You start by creating a “Project”, which is like a Dropbox shared folder. Projects can be private (completely invisible to non-collaborators), metadata-only (only metadata is visible, but data is not), or public (everyone can read both data and metadata). Project members can then upload files and folders and describe those resources using domain-specific and generic metadata, so the platform can suit a broad spectrum of data description needs. The contents of some files that contain data (Excel and CSV, for example) are automatically extracted, as is text from others (PDF, Word, TXT, etc.), to assist discovery.

    Dendro provides a flexible data description framework built on Linked Open Data at its core (backed by a triple store), scalable file storage for handling big files, BagIt-represented backups, authentication with ORCID, and sharing to practically any repository platform.

    Further information about Dendro can be found in its GitHub repository at https://github.com/feup-infolab/dendro. Documentation and descriptions of Dendro in other languages are available from the primary URL home page.

  • RDM Onboarding Checklist

    Research Data Management is essential for responsible research and should be introduced when starting a new project or joining a new lab. Managing data across a project and/or a team allows for accurate communication about that project. This session will review the important steps for onboarding new employees/trainees to a lab or new projects. The key takeaway from this session will be how to incorporate these steps within your individual project or lab environment. While the principles are general, these documents focus on Harvard policies and resources. Internal and external links have been provided throughout the document as supplementary resources, including a glossary of terms.

    There are two checklists, as follows:
    The RDM Onboarding Checklist: Comprehensive Version serves as a general, research data management-focused guide to employee/trainee onboarding as they join a new lab or begin new projects (follow one or both checklists as they apply to your situation). It is provided as an initial introduction to the onboarding process and to the breadth of available resources, and is intended to be reviewed first.
    The RDM Onboarding Checklist: Abridged Version is a condensed version of the comprehensive checklist. It is intended to be used as an actionable checklist, employed after reviewing the onboarding processes and resources provided in the comprehensive version.

     Learning Objectives:

    • Become familiar with the research data lifecycle
    • Understand the details and requirements at each stage of data management onboarding
    • Engage with best practices to enhance your current and future research
    • Receive resources and contacts for future help

  • Open Access Post-Graduate Teaching Materials in Managing Research Data in Archaeology

    Looking after digital data is central to good research. We all know horror stories of people losing or deleting their entire dissertation just weeks before a deadline! But even before that happens, good practice in looking after research data from the beginning to the end of a project makes work and life a lot less stressful. Defined in the widest sense, digital data includes all files created or manipulated on a computer (text, images, spreadsheets, databases, etc.). With publishing and archiving of research increasingly online, we all have a responsibility to ensure the long-term preservation of archaeological data, while at the same time being aware of issues of sensitive data, intellectual property rights, open access, and freedom of information.
    The DataTrain teaching materials have been designed to familiarise post-graduate students with good practice in looking after their research data. A central tenet is the importance of thinking about this in conjunction with the projected outputs and publication of research projects. The eight presentations, followed by group discussion and written exercises, follow the lifecycle of digital data from pre-project planning through data creation, data management, publication, and long-term preservation, and lastly to issues of the re-use of digital data. At the same time the course follows the career path of researchers from post-graduate research students, through post-doctoral research projects, to larger collaborative and inter-disciplinary projects.
    The teaching material is targeted at co-ordinators of Core Research Skills courses for first-year post-graduate research students in archaeology. The material is open access and you are invited to re-use and amend the content as best suits the requirements of your university department. The complete course is designed to run either as a four-hour half-day workshop or as 2 x 2-hour classes. Alternatively, individual modules can be slotted into existing data management and core research skills teaching.