All Learning Resources
Intro to Python for Data Science
Python is a general-purpose programming language that is becoming more and more popular for doing data science. Companies worldwide are using Python to harvest insights from their data and get a competitive edge. Unlike any other Python tutorial, this course focuses on Python specifically for data science. In our Intro to Python class, you will learn about powerful ways to store and manipulate data as well as cool data science tools to start your own analyses. Topics covered include: Python basics, Python lists, functions and packages, and NumPy, an array package for Python.
Using, learning, teaching, and programming with the Paleobiology Database
The Paleobiology Database (PBDB) is a public database of paleontological data that anyone can use, maintained by an international non-governmental group of paleontologists. You can explore the data online in the Navigator, which lets you filter fossil occurrences by time, space, and taxonomy, and displays their modern and paleogeographic locations; or you can download the data to your own computer to do your own analyses. The educational resources offered by the Paleobiology Database include:
- Presentations including lectures and slide shows to introduce you to the PBDB
- Web apps that provide a variety of online interfaces for exploring PBDB data via the API
- Mobile apps for iOS and Android that provide new views of the PBDB's data via the API
- Lesson plans and teaching activities using the Paleobiology Database
- Tutorials on how to get and use data from the website, and on how to contribute data to the database, viewable on YouTube
- Libraries and functions for interacting with PBDB data via R
- Documentation, code examples, and issue reporting for the PBDB API
- Other external resources related to the Paleobiology Database, including a link to the Paleobiology GitHub repository
For more information about the Paleobiology Database, see: https://paleobiodb.org/#/faq
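Filtering occurrences by time, space, and taxonomy before downloading can be sketched as building a request URL. This is a minimal sketch, not the PBDB's documented recipe: the endpoint path (data1.2/occs/list.csv) and the parameter names (base_name, interval, show, lngmin, lngmax, latmin, latmax) are assumptions based on version 1.2 of the PBDB data service and should be checked against the API documentation listed above. The function name occurrence_url and the example values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for occurrence downloads (PBDB data service v1.2);
# verify against the PBDB API documentation before relying on it.
BASE_URL = "https://paleobiodb.org/data1.2/occs/list.csv"


def occurrence_url(base_name, interval, bbox=None):
    """Build a download URL filtered by taxonomy, time, and (optionally) space.

    base_name: taxon name whose occurrences (including subtaxa) are wanted
    interval:  geologic time interval name
    bbox:      optional (lngmin, latmin, lngmax, latmax) bounding box
    """
    # Parameter names are assumptions, see the lead-in above.
    params = {"base_name": base_name, "interval": interval, "show": "coords"}
    if bbox is not None:
        lngmin, latmin, lngmax, latmax = bbox
        params.update(lngmin=lngmin, lngmax=lngmax, latmin=latmin, latmax=latmax)
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical example: canids from the Miocene within a rough
    # bounding box around North America.
    print(occurrence_url("Canidae", "Miocene", bbox=(-130, 20, -60, 60)))
```

The sketch only constructs the URL; fetching it (for example in a browser or with a download tool) is left to the reader so that the example runs without network access.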
The intermediate R course is the logical next stop on your journey in the R programming language. In this R training you will learn about conditional statements, loops and functions to power your own R scripts. Next, you can make your R code more efficient and readable using the apply functions. Finally, the utilities chapter gets you up to speed with regular expressions in the R programming language, data structure manipulations and times and dates. This R tutorial will allow you to learn R and take the next step in advancing your overall knowledge and capabilities while programming in R.
Hivebench Electronic Lab Notebook
The time it takes to prepare, analyze and share experimental results can seem prohibitive, especially in the current, highly competitive world of biological research. However, not only is data sharing mandated by certain funding and governmental bodies, it also has distinct advantages for research quality and impact. Good laboratory practices recommend that all researchers use electronic lab notebooks (ELNs) to save their results. This resource includes numerous short video demonstrations of Hivebench:
- Start using Hivebench, the full demo
- Creating a Hivebench account
- Managing protocols & methods
- Storing experimental findings in a notebook
- Managing research data
- Doing research on iPhone and iPad
- Editing experiments
- Collaborating with colleagues
- Searching for results
- Staying up to date with the newsfeed
- Planning experiments with the calendar
- Using open science protocols
- Mendeley Data Export
- Managing inventory of reagents
- Signing and counter signing experiments
- Archiving notebooks
- How to keep data alive when researchers move on? Organizing data, methods, and protocols.
Introduction to Data Management for Undergraduate Students: Data Management Overview
This library guide covers the basics and best practices for data management for individuals who are new to the research and data-collecting process. Topics included in this guide are:
- Data Management Overview
- Data Documentation
- Data Preservation
- Filenaming Conventions
- Data Backup
Introduction to SAGA GIS Software
A quick introduction to the System for Automated Geoscientific Analyses (SAGA) GIS software, an open source Geographic Information System software package. SAGA GIS has been designed for an easy and effective implementation of spatial algorithms and offers a comprehensive, growing set of geoscientific methods. A data management module is included in the software.
ESRI Academy: Data Management
ESRI, the creator of ArcMap and other Geographic Information Systems (GIS) software products, provides a large number of training courses on topics that include data management, the use of GIS, and Python programming. The types of training materials include tutorials, videos, web courses, instructor-led courses, training seminars, learning plans (including one that leads to 6 courses on the Fundamentals of Data Management) and story maps. Some training materials are available online while others are on location; some are free, and some have an associated fee. Each course provides a certificate once it is completed.
Data Carpentry Geospatial Workshop
This workshop is designed both to teach general geospatial concepts and to build capacity in using the R programming language for data management. The learner will find out how to use R with geospatial data, particularly geospatial raster and vector data. The workshop lessons include:
- Introduction to Geospatial Concepts to help the learner understand data structures and common storage and transfer formats for spatial data. The goal of this lesson is to provide an introduction to core geospatial data concepts. It is intended as a pre-requisite for the R for Raster and Vector Data lesson for learners who have no prior experience working with geospatial data.
- Introduction to R for Geospatial Data to help the learner import data into R, calculate summary statistics, and create publication-quality graphics by providing an introduction to the R programming language.
- Introduction to Geospatial Raster and Vector Data with R in which the learner will open, work with, and plot vector and raster-format spatial data in R. This lesson provides a more in-depth introduction to visualization (focusing on geospatial data), and working with data structures unique to geospatial data. It assumes that learners are already familiar with both geospatial data concepts and the core concepts of R.
Data Rescue: Packaging, Curation, Ingest, and Discovery
Data Conservancy was introduced to Data Rescue Boulder through our long-time partner Ruth Duerr of Ronin Institute. Through our conversations, we recognized that Data Rescue Boulder has a need to process a large number of rescued data sets and store them in more permanent homes. We also recognized that Data Conservancy, along with Open Science Framework, has the software infrastructure to support such activities and bring a selective subset of the rescued data into our own institutional repository. We chose the subset of data based on a selection from one of the Johns Hopkins University faculty members.
This video shows one of the pathways through which data could be brought into a Fedora-backed institutional repository using our tools and platforms.
Data Conservancy screencast demonstrating integration between the Data Conservancy Packaging Tool, the Fedora repository, and the Open Science Framework. Resources referenced throughout the screencast are linked below.
DC Package Tool GUI
DC Package Ingest
- Package Ingest release page
- Fedora API Extension Architecture Home, GitHub, and Docker-based demo
- API-X funding provided by IMLS grant #LG-70-16-0076-16
Fedora OSF Storage Provider
(under development as of April 2017)
Open Access Post-Graduate Teaching Materials in Managing Research Data in Archaeology
Looking after digital data is central to good research. We all know of horror stories of people losing or deleting their entire dissertation just weeks prior to a deadline! But even before this happens, good practice in looking after research data from the beginning to the end of a project makes work and life a lot less stressful. Defined in the widest sense, digital data includes all files created or manipulated on a computer (text, images, spreadsheets, databases, etc). With publishing and archiving of research increasingly online we all have a responsibility to ensure the long-term preservation of archaeological data, while at the same time being aware of issues of sensitive data, intellectual property rights, open access, and freedom of information.
The DataTrain teaching materials have been designed to familiarise post-graduate students with good practice in looking after their research data. A central tenet is the importance of thinking about this in conjunction with the projected outputs and publication of research projects. The eight presentations, followed by group discussion and written exercises, follow the lifecycle of digital data from pre-project planning, data creation, data management, publication, long-term preservation and lastly to issues of the re-use of digital data. At the same time the course follows the career path of researchers from post-graduate research students, through post-doctoral research projects, to larger collaborative and inter-disciplinary projects.
The teaching material is targeted at co-ordinators of Core Research Skills courses for first year post-graduate research students in archaeology. The material is open access and you are invited to re-use and amend the content as best suits the requirements of your university department. The complete course is designed to run either as a four-hour half-day workshop or as two two-hour classes. Alternatively, individual modules can be slotted into existing data management and core research skills teaching.
TraD: Training for Data management at UEL
TraD aims to embed good practice in data management (DM) at UEL by developing disciplinary training material for postgraduate curricula, training opportunities for research staff and a learning module for library support staff.
Adapting existing resources in Psychology and creating new teaching material in Computer Science, we will pilot training for research and taught postgraduates and adopt DM as a topic in the relevant curricula. A generic training workshop aimed at PGRs and academic staff will provide a grounding in DM: this will be delivered as part of the Graduate School’s Researcher Development Programme, and will be suitable for adaptation in disciplinary settings. Finally, we will create an online learning course aimed at subject librarians and others in DM support roles at UEL. This will help equip those who support researchers to understand their respective roles, responsibilities and the technology involved in managing research data.
The project will conclude with an event organised with the Library and Information Research Group (LIRG) to share the project’s experience with a national audience, and this will build on communications work throughout the project to enhance DM skills at UEL and to share our results with the wider JISC community.
The BD2K Guide to the Fundamentals of Data Science Series
The Big Data to Knowledge (BD2K) Initiative presents this virtual lecture series on the data science underlying modern biomedical research. Since its beginning in September 2016, the webinar series has consisted of presentations from experts across the country covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” in biomedicine. The webinar series provides essential training suitable for individuals at an introductory overview level. All video presentations from the seminar series are streamed for live viewing, recorded, and posted online for future viewing and reference. These videos are also indexed as part of the Educational Resource Discovery Index (ERuDIte) of the BD2K Training Coordinating Center (TCC). This webinar series is a collaboration between the TCC, the NIH Office of the Associate Director for Data Science, and BD2K Centers Coordination Center (BD2KCCC).
View all archived videos on our YouTube channel:
ETD+ Toolkit: Training Students to manage ETD+ research outputs
The ETD+ Toolkit is an open curriculum package, hosted on Google Drive, for improving how students and faculty manage their research outputs. Focusing on the Electronic Thesis and Dissertation (ETD) as a mile-marker in a student’s research trajectory, it provides timely advice to students and faculty about avoiding common digital loss scenarios for the ETD and all of its affiliated files.
The ETD+ Toolkit provides free introductory training resources on crucial data curation and digital longevity techniques. It has been designed as a training series to help students and faculty identify and offset risks and threats to their digital research footprints.
What it is:
An open set of six modules and evaluation instruments that prepare students to create, store, and maintain their research outputs on durable devices and in durable formats. Each is designed to stand alone; they may also be used as a series.
What each module includes:
Each module includes Learning Objectives, a one-page Handout, a Guidance Brief, a Slideshow with full presenter notes, and an evaluation Survey. Each module is released under a CC-BY license and all elements are openly editable to make reuse as easy as possible.
Open Access to Publications in Horizon 2020 (May 2017)
This webinar is part of the OpenAIRE Spring Webinars 2017.
It dealt with the Open Access (OA) mandate in Horizon 2020 (H2020), what is expected of projects with regard to the OA policies in H2020, and how OpenAIRE can help.
Webinar led by Eloy Rodrigues and Pedro Príncipe (UMinho)
Webinar presentation: https://www.slideshare.net/OpenAIRE_eu/openaire-webinar-open-access-to-publications-in-horizon-2020-may-2017
Webinar recordings: https://webinars.eifl.net/2017-05-29_OpenAIRE_H2020_OAtopublications/index.html
Last updated on 30 December 2017.
The Horizon 2020 Open Research Data Pilot: Introduction to the Requirements of the Open Research Data Pilot
This course provides an introduction to the European Commission's Open Research Data Pilot in Horizon 2020. It includes two sections: Introduction to the Requirements of the Open Research Data Pilot and How to Comply with the Requirements of the Open Research Data Pilot. Each section may include videos, presentation slides, demonstrations, associated readings, and quizzes which can be found at the URL to the home page for this course.
- Understand what is required of participants in the Horizon 2020 Open Research Data pilot
- Learn about the concepts of open data, metadata, licensing and repositories
- Identify key resources and services that can help you to comply with requirements
- Undertake short tests to check your understanding
DMP Bingo - the good, the bad, the ugly (v.2)
Updated to v2 on 2016-11-10
An activity for teaching research data management and data management plans. The bingo cards have both "good" and "bad" DMP attributes which can be used for discussion.
2. A set of 20 different bingo cards in 2 formats ("ready to print" PDF file / editable Excel file).
3. A zip file containing 7 DMPs
All DMPs included in this file-set have had identifying information deleted or changed. Institutions may be identifiable but no individuals.
'Good Enough' Research Data Management: A Brief Guide for Busy People
This brief guide presents a set of good data management practices that researchers can adopt, regardless of their data management skills and levels of expertise.
De bonnes pratiques en gestion des données de recherche: Un guide sommaire pour gens occupés (French version of the 'Good Enough' RDM)
This short guide presents a set of good practices that researchers can adopt, regardless of their skills or level of expertise.
How to Make a Data Dictionary
A data dictionary is critical to making your research more reproducible because it allows others to understand your data. The purpose of a data dictionary is to explain what all the variable names and values in your spreadsheet really mean. This guide gives examples and instruction on how to assemble a data dictionary.
Smithsonian Libraries: Describing Your Project: Citation Metadata
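As a minimal sketch of the idea, a data dictionary can be kept as one entry per spreadsheet column, recording what the variable means, its type and units, and what any coded values stand for. The spreadsheet contents, column names, field names, and helper functions below are all hypothetical and invented for illustration; they are not taken from the guide itself.

```python
import csv
import io

# Hypothetical spreadsheet contents, invented for illustration.
SAMPLE_CSV = """site_id,temp_c,veg_class
A1,12.5,1
A2,14.0,2
B1,,3
"""

# Hypothetical data dictionary: one entry per column of the spreadsheet.
DATA_DICTIONARY = {
    "site_id": {
        "description": "Sampling site identifier",
        "type": "text",
        "units": None,
        "allowed_values": None,
    },
    "temp_c": {
        "description": "Air temperature at time of sampling; blank means not recorded",
        "type": "decimal number",
        "units": "degrees Celsius",
        "allowed_values": None,
    },
    "veg_class": {
        "description": "Dominant vegetation class at the site",
        "type": "integer code",
        "units": None,
        "allowed_values": {1: "grassland", 2: "shrubland", 3: "forest"},
    },
}


def undocumented_columns(csv_text, dictionary):
    """Return the columns in the CSV header that the dictionary does not describe."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in dictionary]


def write_dictionary(dictionary):
    """Render the dictionary as CSV text, one row per variable, for sharing."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["variable", "description", "type", "units", "allowed_values"])
    for name, info in dictionary.items():
        allowed = info["allowed_values"]
        allowed_text = "; ".join(f"{k}={v}" for k, v in allowed.items()) if allowed else ""
        writer.writerow([name, info["description"], info["type"], info["units"] or "", allowed_text])
    return out.getvalue()


if __name__ == "__main__":
    print("Undocumented columns:", undocumented_columns(SAMPLE_CSV, DATA_DICTIONARY))
    print(write_dictionary(DATA_DICTIONARY))
```

Saving the rendered dictionary alongside the data file is one simple way to keep the two together; checking the header against the dictionary helps catch columns that were added later without documentation.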
Smithsonian Libraries Metadata Guide.
The overall description for your project could be referred to as project metadata, citation metadata, a data record, a metadata record, or a dataset record. The information supplied in the project description should be sufficient to enable you and others to find and properly cite your data.
A metadata record gives the basic who, what, where, and when of the data. It is a high level description that others can use to cite your data. It may be submitted with a dataset as a separate file when deposited in a repository, or displayed in the repository with data entered into a form.
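The who, what, where, and when of a metadata record can be sketched as a small structured record saved as a separate file next to the dataset. This is an illustrative sketch only: the field names and all values below are hypothetical and do not follow any particular metadata standard or the Smithsonian guide's own template.

```python
import json

# Hypothetical minimal field set covering who, what, where, and when.
REQUIRED_FIELDS = ("creator", "title", "location", "date_collected")

# Hypothetical record, invented for illustration.
record = {
    "creator": "Example Researcher",            # who
    "title": "Example stream temperature survey",  # what
    "location": "Example Creek watershed",      # where
    "date_collected": "2017-06-01/2017-08-31",  # when (a date range)
    "description": "Hourly temperature readings from three logger sites.",
}


def missing_fields(rec):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


def to_json(rec):
    """Serialize the record as JSON text suitable for saving beside the data."""
    return json.dumps(rec, indent=2, sort_keys=True)


if __name__ == "__main__":
    print("Missing fields:", missing_fields(record))
    print(to_json(record))
```

A repository would normally collect the same information through its own form or schema; the check for missing fields simply mirrors the idea that a record should be complete enough for others to find and cite the data.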
Metacat Administrator's Guide
Metacat is a repository for data and metadata (documentation about data) that helps scientists find, understand and effectively use data sets they manage or that have been created by others. Thousands of data sets are currently documented in a standardized way and stored in Metacat systems, providing the scientific community with a broad range of science data that, because the data are well and consistently described, can be easily searched, compared, merged, or used in other ways.
This Metacat Administrator's Guide includes instruction on the following topics:
Chapter 1: Introduction
Chapter 2: Contributors
Chapter 3: License
Chapter 4: Downloading and installing Metacat
Chapter 5: Configuring Metacat
Chapter 6: DataONE Member Node Support
Chapter 7: Accessing and submitting Metadata and data
Chapter 8: Metacat indexing
Chapter 9: Modifying and creating themes
Chapter 10: Metacat authentication mechanism
Chapter 11: Metacat's use of Geoserver
Chapter 12: Replication
Chapter 13: Harvester and harvest list editor
Chapter 14: OAI protocol for metadata harvesting
Chapter 15: Event logging
Chapter 16: Enabling web searches: sitemaps
Chapter 17: Appendix: Metacat properties
Chapter 18: Appendix: Development issues
QGIS - for Absolute Beginners
This video is a complete rundown of the basics in QGIS, a free GIS software package designed as an alternative to ArcMap.
QGIS is a user-friendly Open Source Geographic Information System (GIS) licensed under the GNU General Public License. QGIS is an official project of the Open Source Geospatial Foundation (OSGeo). It runs on Linux, Unix, Mac OSX, Windows and Android and supports numerous vector, raster, and database formats and functionalities.
QGIS Training Manual
A training manual written by the QGIS Development Team. It includes instruction on the basic use of the QGIS interface, applied applications, and other basic operations. Topics include: general tools, QGIS GUI, working with projections, raster and vector data, managing data sources and integration with GRASS GIS. Examples are given of working with GPS and OGC data. A list of plugins is also included.
QGIS aims to be a user-friendly GIS, providing common functions and features. The initial goal of the project was to provide a GIS data viewer. QGIS has reached the point in its evolution where it is being used by many for their daily GIS data-viewing needs. QGIS supports a number of raster and vector data formats, with new format support easily added using the plugin architecture.
Training Materials for Data Management in Reclamation
This document (downloadable from this landing page) provides supplementary educational materials focused upon US Bureau of Reclamation (USBR) approaches to data management that use and expand upon a number of USGS training modules on data management. The USBR supplementary materials include:
- A discussion of the Reclamation data lifecycle
- A Reclamation data management plan template
- Examples of Reclamation data management best practice
- Lessons learned from various USBR data management efforts.
Introduction to Data Management Plans
Video presentation and slides introducing the concept of Data Management Plans (DMPs), given by Dr. Andrew Stephenson at the Research Resource Forum at Northwestern University in 2016. Dr. Stephenson is Distinguished Professor of Biology and Associate Dean for Research and Graduate Education in the Eberly College of Science at Penn State. As an active researcher, he has generated and collected data for many years and served on many a panel reviewing grant proposals. From his perspective, data management plans make good sense. In the following video, he describes the elements of a DMP and why they are important. The video presentation is available at: https://www.youtube.com/watch?v=uHyDzt6E3qU
This presentation is part of a Data Management Plan Tutorial prepared by the Penn State University Libraries and contains the following modules:
- Introduction to Data Management Plans
- Why Do You Need a Data Management Plan?
- Components of a Typical Plan
- Tools and Other Resources for Data Management Planning
- Part 1: Data and Data Collection
- Part 2: Documenting the Data
- Part 3: Policies for Data Sharing and Access
- Part 4: Reuse and Redistribution of Data
- Part 5: Long-Term Preservation and Archiving of Data
- Next Steps to Take
The entire Data Management Plan tutorial can be found at: https://www.e-education.psu.edu/dmpt