USGS Science Support Framework
USGS Data Management Training Modules – the Value of Data Management
This is one of six interactive modules created to help researchers, data stewards, managers and the public gain an understanding of the value of data management in science and provide best practices to perform good data management within their organization. In this module, you will learn how to: 1. Describe the various roles and responsibilities of data management. 2. Explain how data management relates to everyday work and the greater good. 3. Motivate (with examples) why data management is valuable. These basic lessons will provide the foundation for understanding why good data management is worth pursuing.
USGS Data Management Training Modules – Planning for Data Management
This is one of six interactive modules created to help researchers, data stewards, managers and the public gain an understanding of the value of data management in science and provide best practices to perform good data management within their organization. In this module, we will provide an overview of data management plans. First, we will define and describe Data Management Plans, or DMPs. We will then explain the benefits of creating a DMP. Finally, we will provide instructions on how to prepare a DMP, including covering key components common to most DMPs.
USGS Data Management Training Modules – Best Practices for Preparing Science Data to Share
This is one of six interactive modules created to help researchers, data stewards, managers and the public gain an understanding of the value of data management in science and provide best practices to perform good data management within their organization. In this module, you’ll learn:
The importance of maintaining well-managed science data
Nine fundamental practices scientists should implement when preparing data to share
Associated best practices for each data management habit
USGS Data Management Training Modules – Science Data Lifecycle
This is one of six interactive modules created to help researchers, data stewards, managers and the public gain an understanding of the value of data management in science and provide best practices to perform good data management within their organization. By the end of this module, you should be able to answer the following questions… What is a science data lifecycle? Why is a science data lifecycle important and useful? What are the elements of the USGS science data lifecycle, and how are they connected? What are the different roles and responsibilities? Where do you go if you need more information?
USGS Data Management Training Modules – Planning for Data Management Part II
This is one of six interactive modules created to help researchers, data stewards, managers and the public gain an understanding of the value of data management in science and provide best practices to perform good data management within their organization. By the end of this course you should know the difference between data management plans and project plans; you should know how to use the DMPTool to create a data management plan; and you should understand the basic information that should go into a data management plan.
Structuring and Documenting a USGS Public Data Release
This tutorial is designed to help scientists think about the best way to structure and document their USGS public data releases. The ultimate goal is to present data in a logical and organized manner that enables users to quickly understand the data. The first part of the tutorial describes the general considerations for structuring and documenting a data release, regardless of the platform being used to distribute the data. The second part of the tutorial describes how these general considerations can be implemented in ScienceBase. The tutorial is designed for USGS researchers, data managers, and collaborators, but some of the content may be useful for non-USGS researchers who need some tips for structuring and documenting their data for public distribution.
Data Management Planning Part 1: overview and a USGS program experience
Emily Fort of the USGS presents an introduction to data management planning and a USGS program experience.
Data Management Planning Part 2: theory and practice in research data management
Steve Tessler and Stan Smith present an example of a data management planning strategy for USGS science centers.
Data Collection Part 1: How to avoid a spreadsheet mess - Lessons learned from an ecologist
Most scientists have experienced the disappointment of opening an old data file and not fully understanding the contents. During data collection, we frequently optimize ease and efficiency of data entry, producing files that are not well formatted or described for longer term uses, perhaps assuming in the moment that the details of our experiments and observations would be impossible to forget. We can make the best of our sometimes embarrassing data management errors by using them as ‘teachable moments’, opening our dusty file drawers to explore the most common errors, and some quick fixes to improve day-to-day approaches to data.
Data Collection Part 2: Relational databases - Getting the foundation right
Data Sharing and Management within a Large-Scale, Heterogeneous Sensor Network using the CUAHSI Hydrologic Information System
Hydrology researchers are collecting data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that require infrastructure for data storage, management, and sharing. Managing streaming sensor data is challenging, especially in large networks with large numbers of sites and sensors. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters rely on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. They also depend on the ability of researchers to share and access the data in usable formats. In this presentation I will describe tools that have been developed for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track equipment, deployments, calibrations, and other events related to monitoring site maintenance and to link this information to the observational data that they are collecting, which is imperative in ensuring the quality of sensor-based data products. I will present these tools in the context of a data management and publication workflow case study for the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) network of aquatic and terrestrial sensors. iUTAH researchers have developed and deployed an ecohydrologic observatory to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The GAMUT Network measures aspects of water inputs, outputs, and quality along a mountain-to-urban gradient in three watersheds that share common water sources (winter-derived precipitation) but differ in the human and biophysical nature of land-use transitions. GAMUT includes sensors at aquatic and terrestrial sites for real-time monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality.
I will present the overall workflow we have developed, our use of existing software tools from the CUAHSI Hydrologic Information System, and new software tools that we have deployed for both managing the sensor infrastructure and for storing, managing, and sharing the sensor data.
Metadata: Standards, tools and recommended techniques
How high performance computing is changing the game for scientists, and how to get involved
Best practices for preparing data to share and preserve
Scientists spend considerable time conducting field studies and experiments, analyzing the data collected, and writing research papers, but an often-overlooked activity is effectively managing the resulting data. The goal of this webinar is to provide guidance on fundamental data management practices that investigators should perform during the course of data collection to improve the usability of their data sets. Topics covered will include data structure, quality control, and data documentation. In addition, I will briefly discuss data curation practices that archives use to ensure that data can be discovered and used in the future. By following these practices, data will be less prone to error, more efficiently structured for analysis, and more readily understandable for any future questions that they might help address.
Data citation and you: Where things stand today
Open data and the USGS Science Data Catalog
USGS Data Templates Overview
Creating Data Templates for data collection, data storage, and metadata saves time and increases consistency. Utilizing form validation increases data entry reliability.
- Why use data templates?
- Templates During Data Entry - how to design data validating templates
- After Data Entry - ensuring accurate data entry
- Data Storage and Metadata
- Best Practices
- Data Templates
- Long-term Storage
- Tools for creating data templates
  - Google Forms
  - Microsoft Excel
  - Microsoft Access
  - OpenOffice - Calc
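As an illustration of the validation idea described above, here is a minimal Python sketch of checking entered data against a template. It is not taken from the USGS materials: the column names (`site_id`, `sample_date`, `water_temp_c`), the temperature range, and the function names are all hypothetical and chosen only for the example. It uses only the Python standard library.

```python
import csv
import io
from datetime import date


def check_site_id(value):
    """Required field: must not be blank."""
    return value != ""


def check_date(value):
    """Required field: must be an ISO date such as 2021-06-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def check_temp(value):
    """Optional field: if present, must be a number in a plausible range."""
    if value == "":
        return True
    try:
        return -5.0 <= float(value) <= 45.0  # hypothetical limits
    except ValueError:
        return False


# Hypothetical template: one validation rule per expected column.
TEMPLATE = {
    "site_id": check_site_id,
    "sample_date": check_date,
    "water_temp_c": check_temp,
}


def validate_rows(csv_text):
    """Return a list of (row_number, column, value) for every failing cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in TEMPLATE if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing template columns: {missing}")
    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, check in TEMPLATE.items():
            value = (row[column] or "").strip()
            if not check(value):
                problems.append((row_number, column, value))
    return problems


sample = (
    "site_id,sample_date,water_temp_c\n"
    "A1,2021-06-01,12.5\n"
    ",2021-06-02,99\n"
    "B2,06/03/2021,\n"
)
problems = validate_rows(sample)
for row_number, column, value in problems:
    print(f"row {row_number}: bad {column!r} value {value!r}")
```

Keeping the rules in one dictionary means the same template can drive both the check performed during entry and a later audit of stored files.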
Training Materials for Data Management in Reclamation
This document (downloadable from this landing page) provides supplementary educational materials focused upon US Bureau of Reclamation (USBR) approaches to data management that use and expand upon a number of USGS training modules on data management. The USBR supplementary materials include:
- A discussion of the Reclamation data lifecycle
- A Reclamation data management plan template
- Examples of Reclamation data management best practice
- Lessons learned from various USBR data management efforts.
Python for Data Management
This training webinar for Python is part of a technical webinar series created by the USGS Core Science Analytics, Synthesis, and Library section to improve data managers’ and scientists’ skills in using Python to perform basic data management tasks.
Who: These training events are intended for a wide array of users, ranging from those with little or no experience with Python to others who may be familiar with the language but are interested in learning techniques for automating file manipulation, batch generation of metadata, and other data management related tasks.
Requirements: This series will be taught using Jupyter Notebook and the Python bundle that ships with the new USGS Metadata Wizard 2.x tool.
- Working with Local Files
- Batch Metadata Handling
- Using the USGS ScienceBase Platform with PySB
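To give a sense of the kind of task the first two topics address, the following is a minimal sketch of batch file handling in Python. It is not drawn from the webinar notebooks and does not use the Metadata Wizard or PySB; the function names, the `*.csv` pattern, and the `inventory.json` output file are assumptions made for illustration. It relies only on the standard library.

```python
import hashlib
import json
from pathlib import Path


def describe_file(path):
    """Build a small descriptive record for one file."""
    data = path.read_bytes()
    return {
        "name": path.name,
        "size_bytes": len(data),
        # A checksum lets later users confirm the file has not changed.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def build_inventory(folder, pattern="*.csv"):
    """Describe every file in `folder` that matches `pattern`, sorted by name."""
    folder = Path(folder)
    return [describe_file(p) for p in sorted(folder.glob(pattern)) if p.is_file()]


def write_inventory(folder, output_name="inventory.json", pattern="*.csv"):
    """Write the inventory as JSON next to the data files and return its path."""
    records = build_inventory(folder, pattern)
    out_path = Path(folder) / output_name
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_path
```

Calling `write_inventory("path/to/data")` on a folder of CSV files would produce one JSON record per file; a fuller workflow might merge such records into a formal metadata standard, which this sketch does not attempt.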
USGS Data Management Training Modules – Metadata for Research Data
This is one of six interactive modules created to help researchers, data stewards, managers and the public gain an understanding of the value of data management in science and provide best practices to perform good data management within their organization. This module covers metadata for research data. The USGS Data Management Training modules were funded by the USGS Community for Data Integration and the USGS Office of Organizational and Employee Development's Technology Enabled Learning Program in collaboration with Bureau of Land Management, California Digital Library, and Oak Ridge National Laboratory. Special thanks to Jeffrey Morisette, Dept. of the Interior North Central Climate Science Center; Janice Gordon, USGS Core Science Analytics, Synthesis, and Libraries; National Indian Programs Training Center; and Keith Kirk, USGS Office of Science Quality Information.