All Learning Resources
How to motivate researcher engagement?
Presentation about Data Stewardship at TU Delft and the Data Championship at Cambridge University, given at the Dutch LCRDM (Landelijk Coördinatiepunt Research Data Management) Data Steward meeting on 1 December 2017. Topics covered include suggestions by data stewards about how to approach and persuade researchers to engage in data management and stewardship activities.
CURATE! The Digital Curator Game
The CURATE game is designed to be used as an exercise that prompts players to put themselves into digital project scenarios in order to address issues and challenges that arise when institutions engage with digital curation and preservation.
Developed as a means to highlight the importance of training in digital curation among practitioners and managers working in libraries, museums and cultural heritage institutes, the game has been used as a self-assessment tool, a team-building exercise and a training tool for early career students.
The CURATE game package includes:
- Welcome to CURATE Presentation
- Game Board (PDF)
- Game Cards (PDF)
- About the Game (PDF)
- Rules (PDF)
- Record Sheet & Closing Questions (PDF)
- Frequently Asked Questions (DOC)
Research Data Management Community Training
Good research data management is of great importance for high-quality research. Implementing professional research data management from the start helps to avoid problems in the data creation and curation phases.
- Definition(s) of RDM
- Benefits and Advantages of RDM
- Research Data Life-Cycle
- Structure and components of RDM
- Recommended literature
Access Policies and Usage Regulations: Licenses
The webinar about licensing and policy will look into why it is important that research data are provided with licenses.
- Benefits of sharing research data
- Types of licenses
- Data ownership and reuse
- Using Creative Commons licenses in archiving research data
During the workshop, participants will acquire a basic knowledge of data licensing.
Coffee and Code: R & RStudio
What is R?
R is an [Open Source](https://opensource.org) programming language designed specifically for data analysis and visualization. It consists of the [core R system](https://cran.r-project.org) and a collection of (currently) over [13,000 packages](http://cran.cnr.berkeley.edu) that provide specialized data manipulation, analysis, and visualization capabilities. R is an implementation of the *S* statistical language developed in the mid-1970s at Bell Labs; development of R itself began in the early 1990s, with a stable beta version available by 2000. R has been under continuous development for over 25 years and has hit major development [milestones](https://en.wikipedia.org/wiki/R_(programming_language)) over that time.
R syntax is relatively straightforward and is based on a core principle of providing reasonable default values for many functions, and allowing a lot of flexibility and power through the use of optional parameters.
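R itself is out of scope here, but the design principle transfers across languages. A minimal Python sketch (all names are illustrative, not part of R or any course material) of a function that gives a reasonable answer with no extra arguments, while optional parameters unlock more specialized behavior:

```python
def summarize(values, center="mean", trim=0.0):
    """Return a central tendency of `values`.

    The defaults give a sensible answer with no extra arguments;
    optional parameters add flexibility, mirroring R's convention
    of reasonable defaults plus optional tuning parameters.
    """
    data = sorted(values)
    if trim > 0.0:                       # optionally trim extremes
        k = int(len(data) * trim)
        if k:
            data = data[k:len(data) - k]
    if center == "median":
        mid = len(data) // 2
        if len(data) % 2:
            return data[mid]
        return (data[mid - 1] + data[mid]) / 2
    return sum(data) / len(data)         # default: plain mean

print(summarize([1, 2, 3, 100]))                    # default usage -> 26.5
print(summarize([1, 2, 3, 100], center="median"))   # optional parameter -> 2.5
```

The same call works unadorned for the common case, and grows extra arguments only when you need them.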
Train the Trainer Workshop: How do I create a course in research data management?
Presentations and exercises from a train-the-trainer workshop on how to create a course in research data management, given at the International Digital Curation Conference 2018 in Barcelona.
Data and Software Skills Training for Librarians
Library Carpentry is an open education volunteer network and lesson organization dedicated to teaching librarians data and software skills. The goal is to help librarians better engage with constituents and improve how they do their work. This presentation introduces how Library Carpentry formed in 2015, how it evolved into a global community of library professionals, and how it will continue as a future sibling of the Carpentries, an umbrella organization of distinct lesson organizations such as Data Carpentry and Software Carpentry. We’ll cover existing collaborative lesson development, curricula coverage, workshop activities, and the global instructor community. We’ll then talk about future coordinating activities led by the UC system to align and prepare for a merger with Data Carpentry and Software Carpentry.
Workshop: Research Data Management in a Nutshell
The workshop Research Data Management in a Nutshell was part of the Doctoral Day of the Albertus Magnus Graduate Center (AMGC) at the University of Cologne on January 18, 2018.
The workshop was intended as a brief, interactive introduction to RDM for beginning doctoral students.
This lesson assumes no prior experience with the tools covered in the workshop. However, learners are expected to have some familiarity with biological concepts, including nucleotide abbreviations and the concept of genomic variation within a population.
Workshop Overview. Workshop materials include a recommendation for a dataset to be used with the lesson materials.
Project organization and management:
Learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI sequence read archive (SRA) database.
Introduction to the command line:
Learn to navigate your file system, create, copy, move, and remove files and directories, and automate repetitive tasks using scripts and wildcards.
Data wrangling and processing:
Use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation.
Introduction to cloud computing for genomics:
Learn how to work with Amazon AWS cloud computing and how to transfer data between your local computer and cloud resources.
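The workshop itself teaches these steps with dedicated command-line tools; as a toy illustration only (a Python sketch, not the workshop's actual tooling), the quality-control step boils down to scoring each read and filtering on a threshold. The standard Phred+33 FASTQ quality encoding is assumed:

```python
def mean_phred_quality(quality_string):
    """Mean Phred quality score of one read (Phred+33 ASCII encoding)."""
    scores = [ord(ch) - 33 for ch in quality_string]
    return sum(scores) / len(scores)

def filter_reads(reads, min_quality=20):
    """Keep only (sequence, quality) pairs whose mean quality meets the threshold."""
    return [(seq, qual) for seq, qual in reads
            if mean_phred_quality(qual) >= min_quality]

reads = [
    ("ACGT", "IIII"),   # ord('I') - 33 = 40 -> high quality
    ("ACGT", "!!!!"),   # ord('!') - 33 = 0  -> low quality
]
print(filter_reads(reads))   # only the high-quality read survives
```

Real pipelines add trimming, per-position statistics, and adapter removal, but the core idea (score, threshold, filter) is the same.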
This workshop uses a tabular ecology dataset from the Portal Project Teaching Database and teaches data cleaning, management, analysis, and visualization. There are no pre-requisites, and the materials assume no prior knowledge about the tools. We use a single dataset throughout the workshop to model the data management and analysis workflow that a researcher would use.
- Data Organization in Spreadsheets
- Data Cleaning with OpenRefine
- Data Management with SQL
- Data Analysis and Visualization in R
- Data Analysis and Visualization in Python
The Ecology workshop can be taught using R or Python as the base language.
Portal Project Teaching Dataset: the Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It provides a real-world example of life-history, population, and ecological data, with sufficient complexity to teach many aspects of data analysis and management, but with many complexities removed to allow students to focus on the core ideas and skills being taught.
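A minimal sketch of the kind of data cleaning the workshop teaches, using only Python's standard library on a hypothetical, tiny stand-in for a Portal-style surveys table (the column names and values below are invented for illustration, not taken from the actual teaching database):

```python
import csv
import io

# A made-up three-row sample in the spirit of the Portal surveys data.
raw = """record_id,species_id,hindfoot_length,weight
1,NL,32,
2,DM,37,40
3,,36,48
"""

def load_clean(text):
    """Read rows and drop records missing a species ID or a weight."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return [r for r in rows if r["species_id"] and r["weight"]]

clean = load_clean(raw)
print(len(clean))              # 1 of 3 rows survives
print(clean[0]["record_id"])   # the complete record, id 2
```

The workshop performs this kind of step in OpenRefine, SQL, R, or Python; the logic (identify incomplete records, filter them out, keep a reproducible script) carries across all of those tools.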
Planning for Software Reproducibility and Reuse
Many research projects depend on the development of scripts or other software to collect data, perform analyses or simulations, and visualize results. Working in a way that makes it easier for your future self and others to understand and re-use your code means that more time can be dedicated to the research itself, rather than troubleshooting hard-to-understand code, resulting in more effective research. In addition, by following some simple best practices around code sharing, the visibility and impact of your research can be increased. In this introductory session, you will:
- learn about best practices for writing, documenting (Documentation), and organizing code (Organization & Automation),
- understand the benefits of using version control (Version Control & Quality Assurance),
- learn about how code can be linked to research results and why (Context & Credit),
- understand why it is important to make your code publishable and citable and how to do so (Context & Credit),
- learn about intellectual property issues (Licensing),
- learn about how and why your software can be preserved over time (Archiving).
ScienceBase as a Platform for Data Release
This video tutorial provides information about using ScienceBase as a platform for data release. We will describe the data release workflow and demonstrate, step-by-step, how to complete a data release in ScienceBase.
Introduction to GRASS GIS
GRASS GIS, commonly referred to as GRASS (Geographic Resources Analysis Support System), is a free and open source Geographic Information System (GIS) software suite used for geospatial data management and analysis, image processing, graphics and maps production, spatial modeling, and visualization. GRASS GIS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. It is a founding member of the Open Source Geospatial Foundation (OSGeo).
This training includes an introduction to raster and vector analysis, image processing, water flow modeling, Lidar data import and analysis, solar radiation analysis, shaded relief, network analysis using four interfaces, and Python scripting. The training uses GRASS GIS 7.0 and a GRASS GIS NC SPM sample dataset. The GitHub repository can be found at: https://github.com/ncsu-geoforall-lab/grass-intro-workshop.
Getting Started with Labfolder
Labfolder is an electronic lab notebook that enables researchers to record findings and make new discoveries. By reinventing the traditional paper lab notebook, our productivity & collaboration platform makes it easier to create, find, share, discuss & validate research data as a team. This Getting Started guide will help you learn what Labfolder is and how to use it for data entry, protocols and standard operating procedures, data retrieval, collaboration, inventory and data security tasks.
Labfolder is free for a team of up to 3 scientists in academia; other pricing is available for larger teams in academia, for business and industry, and for classroom use.
What is Open Science?
This introductory course will help you to understand what open science is and why it is something you should care about. You'll get to grips with the expectations of research funders and will learn how practising aspects of open science can benefit your career progression. Upon completing this course, you will:
- understand what Open Science means and why you should care about it
- be aware of some of the different ways to go about making your own research more open over the research lifecycle
- understand why funding bodies are in support of Open Science and what their basic requirements are
- be aware of the potential benefits of practising Open Science
It is important to remember that Open Science is not different from traditional science. It just means that you carry out your research in a more transparent and collaborative way. Open Science applies to all research disciplines. While Open Science is the most commonly used term, you may also hear people talking about Open Scholarship or Open Research in the Arts and Humanities.
Research Data Management Guide
This guide can assist you in effectively managing, sharing, and preserving your research data. It provides information and guidance for all aspects of the data lifecycle, from creating data management plans during the proposal phase to sharing and publishing your data at the conclusion of your project. This guide is not specific to any particular funder, discipline, or type of data. The guide also features data management stories and examples, both good and bad, that would be useful to research data management instructors or other service providers.
Dendro Open-Source Dropbox
Dendro is a collaborative file storage and description platform designed to support users in collecting and describing data, with its roots in research data management. It is not intended to replace existing research data repositories, since it operates before the moment of deposit in a data repository. The Dendro platform is open source and fully built on Linked Open Data. Whenever researchers want to publish a dataset, they can export it to repositories such as CKAN, DSpace, Invenio, or EUDAT's B2SHARE.
It is designed to support the work of research groups with collaborative features such as:
- File metadata versioning
- Editing and rollback
- Public/private/metadata-only project visibility
You start by creating a “Project”, which is like a Dropbox shared folder. Projects can be private (completely invisible to non-collaborators), metadata-only (only metadata is visible but data is not), or public (everyone can read both data and metadata). Project members can then upload files and folders and describe those resources using domain-specific and generic metadata, suiting a broad spectrum of data description needs. The contents of some data files (Excel or CSV, for example) are automatically extracted, as is text from other formats (PDF, Word, TXT, etc.), to assist discovery.
Dendro provides a flexible data description framework with Linked Open Data at its core (backed by a triple store), scalable file storage for handling big files, BagIt-represented backups, authentication with ORCID, and sharing to practically any repository platform.
Further information about Dendro can be found in its GitHub repository at: https://github.com/feup-infolab/dendro. Documentation and descriptions of Dendro in other languages are available from the project's home page.
B2SAFE and B2STAGE Training Module
This training module provides you with instructions on how to deploy B2SAFE and B2STAGE with iRODS4. It also shows you how to use these services. Moreover, the module provides hands-on training on Persistent Identifiers, more specifically Handle v8 and the corresponding B2HANDLE python library.
B2SAFE is a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on their research data across multiple administrative domains in a trustworthy manner.
B2STAGE is a reliable, efficient, light-weight and easy-to-use service to transfer research data sets between EUDAT storage resources and high-performance computing (HPC) workspaces.
Please consult the user documentation on the services for a general introduction, if needed, before following the contents of this Git repository. This training material is aimed at two types of trainees: those who want to learn how to use the EUDAT B2SAFE and B2STAGE services, and those who want to deploy and integrate these services. Following the full, in-depth tutorial will show you how the components of a service are combined, enabling you to extend the integration of services at the low level (technology level rather than API level). Following just the "use" part of the training will familiarise you with the APIs of the services, but not with the underlying technology and its wiring.
Training on using the E2O WCI Data Portal
Course Content: This course will offer an introduction to the eartH2Observe Water Cycle Integrator (WCI) Data Portal available at https://wci.earth2observe.eu/
Course Objectives: Training the users on how to navigate through the E2O WCI portal: navigate around the map, select indicators by searching, perform some analysis on the selected indicators, download data, and other WCI functionalities.
Why is this topic interesting? This training can increase use of the WCI, build capacity, and further the dissemination of all the available data and tools. Upon completing this training, users will be able to use the portal and its functionalities efficiently.
The course includes 3 lessons:
Lesson 1: GISportal - An Introduction
Lesson 2: GISportal - External Data and Collaboration
Lesson 3: GISportal - Docker Version
The Earth2Observe (E2O) Water Cycle Integrator (WCI) portal takes data that you select and plots it on a map to help you analyse, export and share it.
The WCI portal is an open source project built by Plymouth Marine Laboratory's Remote Sensing Group. The portal builds on the development of several other EU funded projects, past and present, that PML have involvement in. You can find the code on GitHub at https://github.com/earth2observe-pml/GISportal
Data Management Guidelines
The guidelines available from this web page cover a number of topics related to research data management. The guidelines are targeted at researchers wishing to submit data to the Finnish Social Science Data Archive, but may be helpful to other social scientists interested in research data management practices. Note that the guidelines refer to the situation in Finland and may not be applicable in other countries due to differences in legislation and research infrastructure.
High level topics (or chapters) covered include:
- Data management planning (the data, rights, confidentiality and data security, file formats and programs, documentation on data processing and content, lifecycle, data management plan models)
- Copyrights and agreements
- Processing quantitative data files
- Processing qualitative data files
- Anonymisation and personal data including policies related to ethical review of human sciences
- Data description and metadata
- Physical data storage
The guidelines are also available in FSD's Guidelines in DMPTuuli, a data management planning tool for Finnish research organisations. It provides templates and guidance for making a data management plan (DMP).
Bio-Linux 8
Bio-Linux 8 is a powerful, free bioinformatics workstation platform that can be installed on anything from a laptop to a large server, or run as a virtual machine. Bio-Linux 8 adds more than 250 bioinformatics packages to an Ubuntu Linux 14.04 LTS base, providing around 50 graphical applications and several hundred command line tools. The Galaxy environment for browser-based data analysis and workflow construction is also incorporated in Bio-Linux 8.
Bio-Linux 8 comes with a tutorial document suitable for complete beginners to Linux, though some basic bioinformatics knowledge (e.g. what is a read, assembly, feature, translation) is assumed. The tutorial comprises a general introduction to the Linux system and a set of exercises exploring specific bioinformatics tools. You can find the latest version of the tutorial via the Bio-Linux documentation icon on the desktop. There is also a copy online at: http://nebc.nerc.ac.uk/downloads/courses/Bio-Linux/bl8_latest.pdf. Allow yourself around 2 days to work through this, depending on your previous experience. Other, taught courses can be found on the Bio-Linux Training web page.
Bio-Linux 8 represents the continued commitment of NERC to maintain the platform, and comes with many updated and additional tools and libraries. With this release we support pre-prepared VM images for use with VirtualBox, VMWare or Parallels. Virtualised Bio-Linux will power the EOS Cloud, which is in development for launch in 2015.
OMOP Common Data Model and Extract, Transform & Load Tutorial
In this tutorial you will learn about the details of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and how to apply it when you Extract, Transform & Load (ETL) data. The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format. In this tutorial, you can also observe best practices for converting data into a data model.
Topics covered within this tutorial include:
- What is OMOP/OHDSI?
- OMOP Common Data Model (CDM): why and how
- How to retrieve data from the OMOP CDM
- Setup and performance of an Extract, Transform & Load process into the CDM
- Using WhiteRabbit and Rabbit-In-A-Hat to build an ETL
- Testing and quality assurance
A video presentation of the tutorial is included. The OHDSI Common Data Model and Extract, Transform & Load Tutorial took place on September 24, 2016, during the 2016 OHDSI Symposium. Recordings were made possible by the generous support of Johnson & Johnson, the JKTG Foundation, and Pfizer.
Introduction to HydroShare
HydroShare is an online, collaborative system for sharing and publishing a broad set of hydrologic data types, models, and code. It enables people to collaborate seamlessly in a high performance computing environment, thereby enhancing research, education, and application of hydrologic knowledge. HydroShare is being developed by a team from the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) supported by National Science Foundation awards ACI-1148453 and ACI-1148090.
The introduction to HydroShare includes a Getting Started guide, a Frequently Asked Questions guide, and a number of videos on topics such as:
- Collaborative Data and Model Sharing using HydroShare
- Delineate Watersheds and Perform Hydrologic Terrain Analysis with HydroShare and CyberGIS
- Share, Publish and Execute your SWAT Models with HydroShare and SWATShare
CUAHSI is an organization representing more than 130 U.S. universities and international water-science-related organizations and is sponsored by the National Science Foundation to provide infrastructure and services to advance the development of hydrologic science and education in the United States.
The Realities of Research Data Management
The Realities of Research Data Management is a four-part series that explores how research universities are addressing the challenge of managing research data throughout the research lifecycle. In this series, we examine the context, influences, and choices higher education institutions face in building or acquiring RDM capacity—in other words, the infrastructure, services, and other resources needed to support emerging data management practices. Our findings are based on case studies of four institutions: University of Edinburgh (UK), the University of Illinois at Urbana-Champaign (US), Monash University (Australia) and Wageningen University & Research (the Netherlands), in four very different national contexts.
- Part One of the series: A Tour of the Research Data Management (RDM) Service Space, found at: https://www.oclc.org/research/publications/2017/oclcresearch-rdm-part-on....
- Part Two of the series: Scoping the University RDM Service Bundle at: https://www.oclc.org/research/publications/2017/oclcresearch-rdm-part-tw....
- Part Three of the series: Incentives for building University RDM Services at: https://www.oclc.org/research/publications/2017/oclcresearch-rdm-part-th....
- Part Four of the series: Sourcing and Scaling RDM Services at: https://www.oclc.org/research/publications/2017/oclcresearch-rdm-part-fo....
In addition, supplemental material has been provided, including in-depth profiles of each collaborating institution's RDM service space and a "Works in Progress Webinar: Policy Realities in Research Data Management" with an accompanying three-part Planning Guide, at: https://www.oclc.org/research/publications/2017/oclcresearch-rdm-institu....
OMOP Common Data Model and Standardized Vocabularies
This workshop is for data holders who want to apply OHDSI’s data standards to their own observational datasets and researchers who want to be aware of OHDSI’s data standards, so they can leverage data in OMOP CDM format for their own research purposes.
Topics covered within this tutorial include:
- Introductions and ground rules
- Foundational topics:
  • History of OMOP
  • Why and how
  • Birth of OHDSI
- Introduction to the OMOP Common Data Model and the OHDSI community
- Example of a remote study
- Ancestors & descendants
- How it works for drugs
- History of the model
- In-depth discussion of the model
- Real-world scenario
- ETL pitfalls
- Leveraging OHDSI tools
After the Tutorials, you will know:
1. History of OMOP, OHDSI
2. How Standardized Vocabulary works
3. How to find codes and Concepts
4. How to navigate the concept hierarchy
5. The OMOP Common Data Model (CDM)
6. How to use the OMOP CDM
A video presentation of the tutorial is included.
Data Management for the Humanities
The guidelines available from this web page cover a number of topics related to data management. Many of the resources and information found in this guide have been adapted from the UK Data Archive and the DH Curation Guide. The guidelines are targeted at researchers wishing to submit data to a social science research data archive, and would be useful to new data curators and data librarians in the Arts & Humanities as well. Each section has useful references for further study, if desired.
What You Will Find in This Guide:
- How to Document and Format your Data
- Examples of Data Management Plans (DMP) and Data Curation Profiles (DCP)
- Tools to Help You Create DMPs and DCPs
- California Digital Library Data Repositories
- Where to Get Help on Campus
- A List of Federal Funding Agencies and Their Data Management Requirements
- A Description of Data Curation for the Humanities and What Makes Humanities Data Unique
- Information on Data Representation
- Resources on Data Description Standards
Intro to SQL for Data Science
The role of a data scientist is to turn raw data into actionable insights. Much of the world's raw data—from electronic medical records to customer transaction histories—lives in organized collections of tables called relational databases. Therefore, to be an effective data scientist, you must know how to wrangle and extract data from these databases using a language called SQL (pronounced ess-que-ell, or sequel). This course teaches you everything you need to know to begin working with databases today!
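As a small taste of the kind of wrangling such a course covers, here is a minimal, self-contained example using Python's built-in sqlite3 module; the table, columns, and data are invented for illustration and are not part of the course:

```python
import sqlite3

# Build a throwaway in-memory database with a toy "orders" table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("ada", 10.0), ("ada", 15.0), ("grace", 7.5)])

# Aggregate: total spend per customer, largest first.
rows = con.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)   # [('ada', 25.0), ('grace', 7.5)]
```

The same SELECT / GROUP BY / ORDER BY pattern works against any relational database; only the connection setup changes.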
ResearchVault is a secure computing environment where scientists and collaborators can conduct research on restricted and confidential data.
ResearchVault (also known as ResVault) is designed to act as a workstation that is secure and pre-approved with the capacity for large-scale data storage and computation. Researchers can:
- Securely store restricted data such as:
  - electronic protected health information (ePHI) (HIPAA)
  - export-controlled data (ITAR/EAR)
  - student data (FERPA)
  - controlled unclassified information (CUI)
  - intellectual property data (IP)
- Store and work with larger data sets than is possible on a regular workstation
- Perform work on stored data sets with familiar software tools running on virtual machines located in the UF data center
- Concurrently run more programs than on a regular workstation
- Display work results on a graphical interface that is securely transmitted to remote devices such as desktops, laptops, or iPads
- Work collaboratively with other researchers on the same data sets using different workstations
The system is modeled on a bank vault, where you receive:
- An individual deposit box with secure storage for valuables
- Privacy from other users and bank staff
- A secure area within the vault to privately access your valuables
ResVault is available to University of Florida faculty and students. People not associated with UF can be sponsored by faculty at UF. Training materials available from the ResVault home page include an introductory / overview video, and a recording of a training session on the research administration for restricted data that was given on Oct 12, 2018. The recording is available from the UF Media Website. It describes how the requirement for using special IT infrastructure is handled and how the right environment for each project is determined, as well as the training requirements for project participants.
Managing Creative Arts Research Data
This post-graduate teaching module for creative arts disciplines is focused on making data and digital documentation that is highly usable and has maximum impact. The module content is particularly well suited for inclusion within MA programmes dealing with ephemeral art forms such as dance, music, visual art, theatre or media design. Learning is self-directed. MCARD-ExcersiceV1.0.pdf is an optional, summative assessment exercise.
This module, funded as part of the wider JISC Managing Research Data programme through the Curating Artistic Research Output (CAiRO) Project, offers data management knowledge tailored to the special requirements of the creative arts researcher who is producing non-standard (i.e. non-textual) research outputs. The module aims to develop the skills required by arts researchers to effectively self-archive and then disseminate data produced through research activities. The module can also help researchers to better understand data management issues and then communicate their needs to third parties, such as institutional repositories, in order to negotiate appropriate levels of service.
Downloadable resources associated with this module include a zip file containing module content as stand-alone .html files, a PDF of the optional, summative exercise, and a PDF version of the introduction. Topics include:
Unit 1: Introducing art as research data
Unit 2: Creating art as research data
Unit 3: Managing art as research data
Unit 4: Delivering art as research data
Each unit has a suggested order (accessible via the navigation on the left of each page) and additional ‘Focus on’ content which further illustrates topics covered in the main body. Module content can be accessed directly online at: https://www.webarchive.org.uk/wayback/archive/20111001142018/http://www....
Best Practices in Data Collection and Management Workshop
Ever need to help a researcher share and archive their research data? Would you know how to advise them on managing their data so it can be easily shared and re-used? This workshop will cover best practices for collecting and organizing research data related to the goal of data preservation and sharing. We will focus on best practices and tips for collecting data, including file naming, documentation/metadata, quality control, and versioning, as well as access and control/security, backup and storage, and licensing. We will discuss the library’s role in data management, and the opportunities and challenges around supporting data sharing efforts. Through case studies we will explore a typical research data scenario and propose solutions and services by the library and institutional partners. Finally, we discuss methods to stay up to date with data management related topics.
This workshop was presented at the NN/LM MAR Research Data Management Symposium, Doing It Your Way: Approaches to Research Data Management for Libraries. PowerPoint slides are available for download; files include a biophysics case study.
Terms of Access: There is one restricted file in this dataset which may be used; however, you are asked not to share the mock lab notebook, which is completely fictitious. Users may request access to files.
pyunicorn
pyunicorn (Unified Complex Network and RecurreNce analysis toolbox) is a fully object-oriented Python package for the advanced analysis and modeling of complex networks. Beyond the standard measures of complex network theory, such as degree, betweenness, and clustering coefficient, it provides some uncommon but interesting statistics, such as Newman's random walk betweenness. pyunicorn features novel node-weighted (node splitting invariant) network statistics as well as measures designed for analyzing networks of interacting/interdependent networks.
Moreover, pyunicorn makes it easy to construct networks from uni- and multivariate time series data (functional (climate) networks and recurrence networks). This involves linear and nonlinear measures of time series analysis for constructing functional networks from multivariate data, as well as modern techniques of nonlinear analysis of single time series, such as recurrence quantification analysis (RQA) and recurrence network analysis. Other introductory information about pyunicorn can be found at: http://www.pik-potsdam.de/~donges/pyunicorn/index.html.
Tutorials for pyunicorn are designed to be self-explanatory. Besides being online, the tutorials are also available as IPython notebooks. For further details about the classes and methods used, please refer to the API documentation at: http://www.pik-potsdam.de/~donges/pyunicorn/api_doc.html.
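As a conceptual illustration only (a pure-Python toy, not pyunicorn's actual API): the structure underlying recurrence network analysis is the recurrence matrix, which marks which pairs of time points in a series lie close to each other. Treating that matrix as a network adjacency matrix is what turns a time series into a recurrence network:

```python
def recurrence_matrix(series, threshold):
    """R[i][j] = 1 when |x_i - x_j| <= threshold (1-D toy case).

    In recurrence network analysis this binary matrix is read as the
    adjacency matrix of a network whose nodes are time points.
    """
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= threshold else 0
             for j in range(n)] for i in range(n)]

x = [0.0, 0.1, 1.0, 0.15]
R = recurrence_matrix(x, threshold=0.2)
# Points 0, 1 and 3 are mutually close; point 2 recurs only with itself.
print(R[0])   # [1, 1, 0, 1]
print(R[2])   # [0, 0, 1, 0]
```

pyunicorn generalizes this idea to multi-dimensional phase-space embeddings and then computes network measures (degree, clustering, and so on) on the result.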
The Agriculture Open Data Package
The third GODAN Capacity Development Working Group webinar, supported by GODAN Action, focused on the Agriculture Open Data Package (AgPack).
In 2016, GODAN, the ODI, the Open Data Charter, and OD4D developed the Agricultural Open Data Package (AgPack) to help governments realize impact with open data in the agriculture sector and food security.
During the webinar, the speakers outlined examples and use cases of governments using open data to support their agricultural sector and food security. They also discussed the different roles a government can take on to facilitate such development, and how open data can support government policy objectives on agriculture and food security.
E-Infrastructures and Data Management Toolkit
This online toolkit provides training and educational resources for data discovery, management, and curation across the globe, in support of an international collaborative effort to enable open access to scientific data. Tools within the toolkit include:
- DDOMP Researcher Guide, with resources and tips for creating a successful DDOMP (data management plan)
- Data Management Training including webinars, courses, certifications, and literature on data management topics
- Best Practices & Standards which provide guidelines for effective data management.
Video tutorials about each of these tools are available at: https://www.youtube.com/watch?v=2qQeDCB3XhU&list=PLq4USJIxTB6TYUgkJ0OX3W...
Other capacity-building tools include a Data Skills Curricula Framework to enhance information management skills for data-intensive science, developed by the Belmont Forum's e-Infrastructures and Data Management (e-I&DM) Project to improve data literacy, security, and sharing in data-intensive, transdisciplinary global change research. More information about the curricula framework, including a full report and an outline of courses important for researchers doing data-intensive research, can be found at: https://www.belmontforum.org/resources/outline-data-skills-curricula-fra... .
Introduction to Research Data Management - half-day course (Oxford)
Teaching resources for a half-day course for researchers (including postgraduate research students), giving a general overview of some major research data management topics. Included are a slideshow with presenter's notes, a key resources hand-out, and two other hand-outs for use in a practical data management planning exercise. These course materials are part of a set of resources created by the JISC Managing Research Data programme-funded DaMaRO Project at the University of Oxford. The original version of the course includes some Oxford-specific material, so delocalized versions (which omit this material) of the slideshow and the key resources hand-out are also provided.
Introduction to Humanities Research Data Management
Reusable, machine-readable data are one pillar of Open Science (Open Scholarship). Serving this data-reuse aspect requires researchers to carefully document their methods and to take good care of their research data. Due to this paradigm shift, activities and issues around planning, organizing, storing, and sharing data and other research results and products play an increasing role for Humanities and Heritage researchers. Therefore, during two workshop sessions, participants will dive into a number of topics, technologies, and methods connected with Humanities Research Data Management. Participants will acquire knowledge and skills that will enable them to draft their own executable research data management plan, supporting the production of reusable, machine-readable data, a key prerequisite for conducting effective and sustainable projects. Topics covered include theoretical reflections on the role of data within humanities research and cultural heritage studies, opportunities and challenges of eHumanities and eResearch, implementing the FAIR principles and relevant standards, and the basics of data management.
Learning outcomes: Participants of this workshop will gain an overview of issues related to Humanities Research Data Management and learn about relevant tools and information resources. Through a hands-on session, they will be equipped to draft the nucleus of their own Research Data Management Plan.
Research data management training modules in Social Anthropology (Cambridge)
Looking after digital data is central to good research. We all know horror stories of people losing or deleting their entire dissertation just weeks before a deadline. Even before this happens, good practice in looking after research data from the beginning to the end of a project makes work and life a lot less stressful. Defined in the widest sense, digital data includes all files created or manipulated on a computer (text, images, spreadsheets, databases, etc.). With the publishing and archiving of research increasingly being online, we all have a responsibility to ensure the long-term preservation of our research data, while at the same time being aware of issues of sensitive data, intellectual property rights, open access, and freedom of information. The DataTrain teaching materials have been designed to familiarise post-graduate students with good practice in looking after their research data. A central tenet is the importance of thinking about this in conjunction with the projected outputs and publication of research projects. This teaching package is focussed on data management for Social Anthropology.
For each of the three modules of the course, notes and PowerPoint presentations are available, as well as a survey model, a list of useful software, and a list of references and web-based resources as handouts. Topics include the process of fieldwork, the kinds of data collected, and the methods for their collection. Other topics relate to the organisation of data, including basic information on file management, practical demonstrations of software tools, and back-up techniques.
Course materials are available in a downloadable zip file.
Library Carpentry OpenRefine
This Library Carpentry lesson introduces librarians to working with data in OpenRefine. By the end of the lesson, you will understand what OpenRefine does and how to use it to work with data files. This lesson is a supplement to https://github.com/LibraryCarpentry/lc-open-refine/tree/v2019.06.1.
RDM Training (Herts) Module 3: Safeguarding Data
Discussing the benefits and risks of storage media, back-up systems, sharing data across the internet, and general security, this training module encourages secure storage and sharing of data during the working stage of a research project. Slides and training notes are included in this pack as one collection, but can be divided into four sections: storage solutions, keeping it safe (back-up), sharing, and security. A lesson plan is included in the zip package of module files.
This module is 3 of 4. The topics of the other modules are:
Module 1: Project Planning: http://zenodo.org/record/28026
Module 2: Getting Started: http://zenodo.org/record/28027
Module 4: Finishing Touches: http://zenodo.org/record/28029
RDM Training (Herts) Module 2: Getting Started
This module highlights research data management issues that should be addressed when starting a project: choosing file structures and naming conventions, file versioning, metadata and documentation, software choices, and best practice for programming. Considering these details before data collection ensures that the data are well managed and organised, and require fewer transformations when being prepared for publication. Slides and training notes are included in this pack as one collection, but can be divided into five sections: filing systems, metadata, software, documentation, and coding.
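To give a flavour of the kind of naming convention such a module might recommend (an illustrative assumption, not taken from the Herts materials), a sortable, self-documenting file name combining project, description, date, and version can be generated programmatically:

```python
from datetime import date

def versioned_name(project, description, version, ext, when=None):
    """Compose a file name of the assumed form
    project_description_YYYY-MM-DD_vNN.ext:
    ISO dates sort chronologically, zero-padded versions sort
    numerically, and spaces are replaced for cross-platform safety."""
    when = when or date.today()
    stem = "_".join([project,
                     description.replace(" ", "-"),
                     when.isoformat(),
                     f"v{version:02d}"])
    return f"{stem}.{ext}"

print(versioned_name("rdm", "survey responses", 3, "csv",
                     when=date(2015, 7, 1)))
# → rdm_survey-responses_2015-07-01_v03.csv
```

The key design point is that every component of the name is machine-sortable, so a directory listing doubles as a rough version history.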
This module is 2 of 4. The topics of the other modules are:
Module 1: Project Planning: http://zenodo.org/record/28026
Module 3: Safeguarding Data: http://zenodo.org/record/28028
Module 4: Finishing Touches: http://zenodo.org/record/28029
RDM Training (Herts) Module 4: Finishing Touches
At the end of a research project, the science is published and the data should be preserved. This final RDM module includes advice on where to publish the science outcomes and the supporting data as well as how to select the data, anonymise it, and choose the right archive for your data. Slides and training notes are included in this pack in one collection, but can be divided into four sections: publication, preserving data, anonymisation, and archiving data.
This module is 4 of 4. The topics of the other modules are:
Module 1: Project Planning: http://zenodo.org/record/28026
Module 2: Getting Started: http://zenodo.org/record/28027
Module 3: Safeguarding Data: http://zenodo.org/record/28028