All Learning Resources

  • Python for Data Management

    This training webinar on Python is part of a technical webinar series created by the USGS Core Science Analytics, Synthesis, and Library section to improve data managers' and scientists' skills in using Python to perform basic data management tasks.

    Who: These training events are intended for a wide array of users, ranging from those with little or no experience with Python to others who may be familiar with the language but are interested in learning techniques for automating file manipulation, batch generation of metadata, and other data management related tasks.

    Requirements: This series will be taught using Jupyter Notebook and the Python bundle that ships with the new USGS Metadata Wizard 2.x tool.

    Topics include:
    - Working with Local Files
    - Batch Metadata Handling
    - Using the USGS ScienceBase Platform with PySB
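
    As a taste of the "Working with Local Files" topic, here is a minimal Python sketch of the kind of batch file handling the series covers. It is illustrative rather than taken from the course materials, and the directory name is hypothetical:

      from pathlib import Path

      # Walk a (hypothetical) project directory and report every CSV file
      # and its size: a typical small data-management automation task.
      data_dir = Path("project_data")
      for csv_file in sorted(data_dir.rglob("*.csv")):
          size_kb = csv_file.stat().st_size / 1024
          print(f"{csv_file}: {size_kb:.1f} KB")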

  • Why should you worry about good data management practices?

    To prepare data for archiving, they must be organized into well-formatted, described, and documented datasets. Benefits of good data management include:

    Short-term:
    - Spend less time doing data management and more time doing research
    - Easier to prepare and use data for yourself
    - Collaborators can readily understand and use data files

    Long-term (data publication):
    - Scientists outside your project can find, understand, and use your data to address broad questions
    - You get credit for archived data products and their use in other papers
    - Sponsors protect their investment

    This page provides an overview of data management planning and preparation. It offers practical methods to successfully share and archive your data at the ORNL DAAC. Topics include: Best Practices for Data Management, Writing Data Management Plans (including examples of data management plans), and How-tos and Resources.

  • FAIR Webinar Series

    This webinar series explores each of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) in depth, with practical case studies from a range of disciplines, Australian and international perspectives, and resources to support the uptake of the FAIR principles.

    The FAIR data principles were drafted by the FORCE11 group in 2015. The principles have since received worldwide recognition as a useful framework for thinking about sharing data in a way that will enable maximum use and reuse. A seminal article describing the FAIR principles can be found at: https://www.nature.com/articles/sdata201618.

    This series is of interest to those who create, manage, connect, and publish research data at institutions:
    - researchers and research teams who need to ensure their data is reusable and publishable
    - data managers, librarians, and repository managers
    - IT staff who need to connect institutional research data, HR, and other IT systems

  • Tools for Version Control of Research Data

    Research data tend to change over time (they get expanded, corrected, cleaned, etc.). Version control is the management of changes to data or documents. This talk addresses why version control is a crucial component of research data management and introduces software tools that are available for this purpose. This workshop was part of the conference Connecting Data for Research, held at VU University Amsterdam.

  • Top 10 FAIR Data & Software Things

    The Top 10 FAIR Data & Software Global Sprint was held online over the course of two days (29-30 November 2018). Participants from around the world were invited to develop brief guides (stand-alone, self-paced training materials), called "Things", that can be used by the research community to understand FAIR in different contexts and as starting points for conversations around FAIR. The idea for "Top 10 Data Things" stems from initial work done at the Australian Research Data Commons, or ARDC (formerly known as the Australian National Data Service).

    The Global Sprint was organised by Library Carpentry, the Australian Research Data Commons, and the Research Data Alliance Libraries for Research Data Interest Group in collaboration with FOSTER Open Science, OpenAIRE, RDA Europe, the Data Management Training Clearinghouse, California Digital Library, Dryad, AARNet, the Center for Digital Scholarship at Leiden University, and DANS. Anyone could join the Sprint, and roughly 25 groups/individuals participated from the Netherlands, Germany, Australia, the United States, Hungary, Norway, Italy, and Belgium. See the full list of registered Sprinters.

    Sprinters worked from a primer that was provided in advance, together with an online ARDC webinar introducing FAIR and the Sprint, titled "Ready, Set, Go! Join the Top 10 FAIR Data Things Global Sprint." Groups/individuals developed their Things in Google Docs, which could be accessed and edited by all participants. The Sprinters also used a Zoom channel, provided by ARDC, for online calls and coordination, and a Gitter channel, provided by Library Carpentry, to chat with each other throughout the two days. In addition, participants used the Twitter hashtag #Top10FAIR to communicate with the broader community, sometimes including images of the day.

    Participants greeted each other throughout the Sprint and created an overall welcoming environment. As the Sprint shifted across time zones, participants had a chance to catch up. The Zoom and Gitter channels were a way for many to connect over FAIR but also to discuss other topics. A number of participants did not know what to expect from a Library Carpentry/Carpentries-like event, but found a welcoming environment where everyone could participate.

  • Guidelines for Effective Data Management Plans

    Data Management Plans
    Federal funding agencies are increasingly recommending or requiring formal data management plans with all grant applications. To help researchers meet those requirements, ICPSR offers these guidelines. Based on our Data Management Plan Web site, this document contains a framework, example data management plans, links to other resources, and a bibliography of related publications. ICPSR also hosts a blog on data management plans.

    Topics include:
    - Framework for Creating a Data Management Plan
    - Data Management Plan Resources & Examples
    - Resources for Development
    - Templates and Tools
    - Guidance on Funder Requirements
    - Good Practice Guidance

    We hope you find this information helpful as you craft a data management plan. Please contact us at [email protected] with any comments or suggestions.

  • Data Management Plan - Data Management Guides

    A collection of online data management guides, data management planning tools, guidelines from funding agencies, and data management plan examples for researchers and librarians. This page also contains a link to various courses and tutorials on research data management for health science librarians and researchers at: https://nnlm.gov/data/courses-and-workshops.

  • Columbia Research Data Management Tutorials and Templates

    The ReaDI Program has created several tutorials and templates to aid in the management of data during the collection phase of research and in preparing for publication. Tutorial topics include: Good Laboratory Notebook Practices, Laboratory Notebook Checklist, Best Practices for Data Management When Using Instrumentation, and Guidelines on the Organization of Samples in a Laboratory. Downloadable templates are available on related topics, such as data-to-figure map templates.

    The Research and Data Integrity (ReaDI) program is designed to enhance data management and research integrity at Columbia University. The ReaDI program provides resources, outreach and consultation services to researchers at all stages in their careers. Many of the resources are applicable to researchers at any institution.

  • How-to Guides to Managing a Research Project

    These guides are designed to mirror the lifecycle of your research project. They provide support at its various stages.  Topics include:
    - Creating & analysing data
    - Choosing file formats
    - Data discovery & re-use
    - Storing & preserving data
    - Sharing data
    - Handling sensitive & personal information
    - Planning ahead for Data Management
    - Software sustainability, preservation and sharing.

  • Data Management Guidelines

    The guidelines available from this web page cover a number of topics related to research data management. The guidelines are targeted at researchers wishing to submit data to the Finnish Social Science Data Archive, but may be helpful to other social scientists interested in research data management practices, with the understanding that the guidelines refer to the situation in Finland and may not be applicable in other countries due to differences in legislation and research infrastructure.
    High-level topics (or chapters) covered include:
    - Data management planning (the data, rights, confidentiality and data security, file formats and programs, documentation on data processing and content, lifecycle, data management plan models)
    - Copyrights and agreements
    - Processing quantitative data files
    - Processing qualitative data files
    - Anonymisation and personal data, including policies related to ethical review in the human sciences
    - Data description and metadata
    - Physical data storage
    - Examples 
    The guidelines are also available in FSD's Guidelines in DMPTuuli, a data management planning tool for Finnish research organisations. It provides templates and guidance for making a data management plan (DMP).

  • USGS Data Management Plans

    The resources in this section will help you understand how to develop your DMP. The checklist outlines the minimum USGS requirements. The FAQ and the DMP Writing Best Practices list below will help you understand other important considerations when developing your own DMP. To help standardize or provide guidance on DMPs, a science center or funding source may choose to document its own data management strategy. A number of templates and examples are provided. This page also includes resources related to the overall research data lifecycle that will help put data management plans in the context of the research being done. Information is also provided on what the U.S. Geological Survey Manual requires.

  • Research data management training modules in Archaeology (Cambridge)

    Looking after digital data is central to good research. We all know horror stories of people losing or deleting their entire dissertation just weeks before a deadline! But even before this happens, good practice in looking after research data from the beginning to the end of a project makes work and life a lot less stressful. Defined in the widest sense, digital data includes all files created or manipulated on a computer (text, images, spreadsheets, databases, etc.). With publishing and archiving of research increasingly online, we all have a responsibility to ensure the long-term preservation of archaeological data, while at the same time being aware of issues of sensitive data, intellectual property rights, open access, and freedom of information.

    The DataTrain teaching materials have been designed to familiarise post-graduate students with good practice in looking after their research data. A central tenet is the importance of thinking about this in conjunction with the projected outputs and publication of research projects. The eight presentations, followed by group discussion and written exercises, follow the lifecycle of digital data from pre-project planning, data creation, data management, publication, and long-term preservation, through to issues of the re-use of digital data. At the same time, the course follows the career path of researchers from post-graduate research students, through post-doctoral research projects, to larger collaborative and inter-disciplinary projects.

    The teaching material is targeted at co-ordinators of Core Research Skills courses for first-year post-graduate research students in archaeology. The material is open access, and you are invited to re-use and amend the content as best suits the requirements of your university department. The complete course is designed to run either as a four-hour half-day workshop or as 2 x 2-hour classes. Alternatively, individual modules can be slotted into existing data management and core research skills teaching.

  • Ten Simple Rules for Creating a Good Data Management Plan

    Research papers and data products are key outcomes of the science enterprise. Governmental, nongovernmental, and private foundation sponsors of research are increasingly recognizing the value of research data. As a result, most funders now require that sufficiently detailed data management plans be submitted as part of a research proposal. A data management plan (DMP) is a document that describes how you will treat your data during a project and what happens with the data after the project ends.  Such plans typically cover all or portions of the data life cycle—from data discovery, collection, and organization (e.g., spreadsheets, databases), through quality assurance/quality control, documentation (e.g., data types, laboratory methods) and use of the data, to data preservation and sharing with others (e.g., data policies and dissemination approaches). The article also includes a downloadable image that illustrates the relationship between hypothetical research and data life cycles and highlights the links to the rules presented in this paper.

  • Research Data Services Guides in Support of Data Management

    Research Data Services is a collaboration between the University of Iowa Libraries, the Office of the Vice President of Research and Economic Development, Information Technology Services, and other campus offices, to support researchers' data management needs. The  guides that are part of these Services include answers to key questions, but may also include short videos on the following topics:
    - Data Management Plans
    - Data Organization and Documentation
    - Data Repositories
    - Datasets
    - Other University of Iowa services and resources, as well as external tools, websites, and repositories that may be useful

  • Photogrammetry Workshop UNM GEM Lab

    This course provides an introduction to photogrammetry, with a full set of data to use in building a Digital Elevation Model using Agisoft PhotoScan. The course uses a GitHub repository to grow the workshop into a full-featured course on the applications of modern remote sensing and photogrammetry techniques in the environmental sciences and geosciences.

  • Coffee and Code: Reproducibility and Communication

    This workshop provides an introduction to reproducibility and communication of research using notebooks based on RStudio and Jupyter Notebooks. The development of effective documentation and accessible, reusable methods in scientific analysis can make a significant contribution to the reproducibility and understanding of a research activity. The integration of executable code with blocks of narrative content within notebook systems such as those provided in RStudio and the Jupyter Notebook (and Lab) software environments provides a streamlined way to bring these minimum components (data, metadata, code, and software) into a package that can be easily shared with others for review and reuse; a minimal notebook sketch follows the workshop outline below.

    This workshop will provide:  

    • A high-level introduction to the notebook interfaces provided for R and Python through the RStudio and Jupyter Notebook environments.
    • An introduction to Markdown as a language supported by both systems for adding narrative content to notebooks
    • Sample notebooks illustrating structure, content, and output options
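
    For example, a notebook pairing narrative and executable cells can be generated programmatically with the nbformat library. This is a minimal sketch under that assumption, not part of the workshop materials:

      import nbformat
      from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

      # Pair narrative content (Markdown) with executable code in one document.
      nb = new_notebook(cells=[
          new_markdown_cell("# Analysis notes\nWhy the step below is run."),
          new_code_cell("import statistics\nprint(statistics.mean([1, 2, 3]))"),
      ])

      # The resulting file opens in Jupyter and can be rendered with nbconvert.
      nbformat.write(nb, "reproducible_example.ipynb")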

     From the master page for this resource, the Reproducibility and Communication Using Notebooks ipynb file provides more information about what is covered in this workshop.  

  • Coffee and Code: NoSQL

    Introduction to NoSQL

    In previous sessions we have looked at use cases for relational database management systems (RDBMS), which predominantly make use of SQL. Today's session provides an overview of NoSQL databases. NoSQL can be understood to mean "no SQL" or, alternatively, "not only SQL." NoSQL databases are non-relational, which in the simplest terms means they are not made up of tables.

    Topics we will cover include:

    • Differences between SQL and NoSQL databases
    • Types of NoSQL databases and their use cases
    • Document database basics with MongoDB
    • Graph database basics with Neo4j
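
    To make the "not made up of tables" point concrete, here is a minimal sketch using pymongo. The database and collection names are hypothetical, and a local MongoDB server is assumed to be running; note that the two documents do not share a schema:

      from pymongo import MongoClient

      # Connect to a local MongoDB instance (assumes mongod is running).
      client = MongoClient("mongodb://localhost:27017/")
      collection = client["demo_db"]["datasets"]  # hypothetical names

      # Unlike rows in a relational table, documents can differ in structure.
      collection.insert_one({"title": "Stream gauge readings", "format": "csv"})
      collection.insert_one({"title": "Interview transcripts", "language": "en"})

      # Query by any field present in the documents.
      for doc in collection.find({"format": "csv"}):
          print(doc["title"])
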
  • Coffee and Code: Introduction to Version Control

    This is a tutorial about version control, also known as revision control, a method for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.
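
    As a minimal illustration of the idea (a sketch using the GitPython library, not taken from the tutorial itself; the directory and file names are hypothetical), successive versions of a data file can be recorded as commits:

      from pathlib import Path
      from git import Repo  # GitPython, one of several wrappers around git

      # Initialize a repository and record two versions of a data file.
      repo = Repo.init("vc_demo")  # hypothetical directory, created if absent
      data_file = Path("vc_demo/results.csv")

      data_file.write_text("sample,value\nA,1\n")
      repo.index.add(["results.csv"])
      repo.index.commit("Initial data export")

      data_file.write_text("sample,value\nA,1\nB,2\n")
      repo.index.add(["results.csv"])
      repo.index.commit("Add sample B")

      # Each commit is a recoverable snapshot, which is the core idea of version control.
      print([c.message for c in repo.iter_commits()])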

    Also see Advanced Version Control, here: https://github.com/unmrds/cc-version-control/blob/master/03-advanced-ver...

  • Coffee and Code: Advanced Version Control

    Learn advanced version control practices for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents.  

    This tutorial builds on concepts taught in "Introduction to Version Control," found here: https://github.com/unmrds/cc-version-control/blob/master/01-version-cont....

    Git Repository for this Workshop: https://github.com/unmrds/cc-version-control

  • Coffee and Code: Introduction to Database Design

    In this session, we are going to dig a little deeper into databases as representations of systems and processes. A database with a single table may not feel or function much differently from a spreadsheet. Much of the benefit of using databases results from designing them as models of complex systems in ways that spreadsheets just can't do:

    • Inventory control and billing
    • Human resources
    • Blogging platforms
    • Ecosystems

    There will be some more advanced SQL statements this time, though we will still be using SQLite. Concepts which will be discussed and implemented in our code include the following (a minimal SQLite sketch follows the list):

    • Entities and attributes
    • Keys
    • Relationships
    • Normalization
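
    Here is that sketch, using Python's built-in sqlite3 module with invented example tables: two entities with primary keys, related through a foreign key, and reassembled with a join, something a single spreadsheet cannot model:

      import sqlite3

      conn = sqlite3.connect(":memory:")  # throwaway database for illustration
      cur = conn.cursor()

      # Two entities; employee references department (a one-to-many relationship).
      cur.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
      cur.execute("""CREATE TABLE employee (
                         id INTEGER PRIMARY KEY,
                         name TEXT NOT NULL,
                         department_id INTEGER NOT NULL REFERENCES department(id))""")

      cur.execute("INSERT INTO department (name) VALUES ('Human Resources')")
      cur.executemany("INSERT INTO employee (name, department_id) VALUES (?, 1)",
                      [("Ada",), ("Grace",)])

      # A join reassembles the model across tables.
      for row in cur.execute("SELECT e.name, d.name FROM employee e "
                             "JOIN department d ON e.department_id = d.id"):
          print(row)
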
  • Open Data Management in Agriculture and Nutrition Online Course

    This free online course aims to strengthen the capacity of data producers and data consumers to manage and use open data in agriculture and nutrition. One of the main learning objectives is for the course to be used widely within agricultural and nutrition knowledge networks, in different institutions. The course also aims to raise awareness of different types of data formats and uses, and to highlight how important it is for data to be reliable, accessible and transparent.
    The course is delivered through the Moodle e-learning platform. Course units include:

    Unit 1:  Open data principles (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 2:  Using open data (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 3:  Making data open (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 4:  Sharing open data (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)
    Unit 5:  IPR and Licensing (http://aims.fao.org/online-courses/open-data-management-agriculture-and-...)

    By the end of the course, participants will be able to:
    - Understand the principles and benefits of open data
    - Understand ethics and responsible use of data
    - Identify the steps to advocate for open data policies
    - Understand how and where to find open data
    - Apply techniques to data analysis and visualisation
    - Recognise the necessary steps to set up an open data repository
    - Define the FAIR data principles
    - Understand the basics of copyright and database rights
    - Apply open licenses to data
    The course is open to infomediaries (including ICT workers and technologists), journalists, communication officers, librarians and extensionists, policy makers, administrators and project managers, and researchers, academics and scientists working in the areas of agriculture, nutrition, weather and climate, and land data.

  • Getting Started with Data Management & DMPTool

    Data management plans are critical for compliance on most sponsored projects and will save you time and resources throughout your project. The DMPTool is an online tool to help you write a data management plan using templates with specific funder requirements.

  • Introduction to Data Management

    A quick guide to data management provided by Sponsored Projects at the University of Nevada, Reno, including discussion of the advantages of managing data. Emphasis is placed on the use of the DMPTool. Topics include:
    Why Manage Data?
    About Data Management Plans
    Creating a Data Management Plan
    Sample Data Management Plans
    NSF Data Management Plan FAQs
    NIH Data Plans 

  • Ocean Health Index Toolbox Training

    The single biggest motivation of the Ocean Health Index is to use science to inform marine management, and not just any science: the best available science, data, methods, and tools. OHI assessments use collaborative open software so that they are transparent and reproducible; we call this software the OHI Toolbox.

    Openness is an important part of how we work; we describe how and why in Lowndes et al. 2017, Nature Ecology & Evolution: Our path to better science in less time using open data science tools. Using the OHI Toolbox requires coding and using data science software; you can learn this in OHI’s Intro to Open Data Science training book.

    This Toolbox Training book will train you to prepare for and use the OHI Toolbox. It can be used as a curriculum or guide for teaching workshops, or for self-paced learning.

    Chapters include:
    Introduction
    Planning and Gathering Data
    OHI Planner
    OHI Toolbox
    Toolbox Ecosystem
    Preparing data
    Calculations:  basic workflow
    Pressures and Resilience
    Calculations
    Reporting
    Communication: OHI+ websites

  • Intro to Data Management

    This guide will provide general information about data management, including an overview of Data Management Plans (DMPs), file naming conventions, documentation, security, backup, publication, and preservation. We have included the CMU data life cycle to put the pieces in context in the Data 101 section.
    The CMU Libraries provides research data management resources for guidance on data management, planning, and sharing for researchers, faculty, and students.

  • MANTRA Research Data Management Training

    MANTRA is a free, online non-assessed course with guidelines to help you understand and reflect on how to manage the digital data you collect throughout your research. It has been crafted for the use of post-graduate students, early career researchers, and also information professionals. It is freely available on the web for anyone to explore on their own.

    Through a series of interactive online units you will learn about terminology, key concepts, and best practice in research data management.

    There are eight online units in this course and one set of offline (downloadable) data handling tutorials that will help you:

    Understand the nature of research data in a variety of disciplinary settings
    Create a data management plan and apply it from the start to the finish of your research project
    Name, organise, and version your data files effectively
    Gain familiarity with different kinds of data formats and know how and when to transform your data
    Document your data well for yourself and others, learn about metadata standards and cite data properly
    Know how to store and transport your data safely and securely (backup and encryption)
    Understand legal and ethical requirements for managing data about human subjects; manage intellectual property rights
    Understand the benefits of sharing, preserving and licensing data for re-use
    Improve your data handling skills in one of four software environments: R, SPSS, NVivo, or ArcGIS

  • RDMRose Learning Materials

    RDMRose was a JISC-funded project to produce and teach professional development learning materials in Research Data Management (RDM) tailored for information professionals. The SlideShare presentations and documents include an overview of RDM, research in higher education, looking at research data, the research data lifecycle, data management plans, research data services, metadata, and data citation.

    RDMRose developed and adapted learning materials about RDM to meet the specific needs of liaison librarians in university libraries, both for practitioners' CPD and for embedding into the postgraduate taught curriculum. Its deliverables included open educational resource materials suitable for learning in multiple modes, including face-to-face and self-directed learning.

    Session topics include:
    Introductions, RDM, and the Role of LIS
    The Nature of Research and the Need for RDM
    The digital curation lifecycle
    Key Institutions and Projects in RDM
    What is data?
    Managing data
    Case Studies of Research Projects
    Case Study: Institutional Context, and Conclusions

  • Research Data Management and Open Data

    This was a presentation during the Julius Symposium 2017 on Open Science, in particular on open data and FAIR data. Examples are given from medical and health research data.

  • The Service Oriented Toolkit for Research Data Management

    The Service Oriented Toolkit for Research Data Management project was co-funded by the JISC Managing Research Data Programme 2011-2013 and The University of Hertfordshire. The project focused on the realisation of practical benefits for operationalising an institutional approach to good practice in RDM. The objectives of the project were to audit current best practice, develop technology demonstrators with the assistance of leading UH research groups, and then reflect these developments back into the wider internal and external research community via a toolkit of services and guidance. The overall aim was to contribute to the efficacy and quality of research data plans, and establish and cement good data management practice in line with local and national policy.

    The toolkit offers blog entries, survey results and analysis, case studies, reviews of services, test data, services, artifacts such as research project file plans, workflow recommendations, datasets, presentations, example data management plans, and training on topics such as data encryption. Information is also provided on best-practice assessments in Astronomy, Physics, Maths, Robotics, and the Atmospheric Sciences, based on formal and informal interviews with researchers.

  • 'Good Enough' Research Data Management: A Brief Guide for Busy People

    This brief guide presents a set of good data management practices that researchers can adopt, regardless of their data management skills and levels of expertise.

  • De bonnes pratiques en gestion des données de recherche: Un guide sommaire pour gens occupés (French version of the 'Good Enough' RDM)

    This brief guide presents a set of good practices that researchers can adopt, regardless of their skills or level of expertise.

  • Coffee and Code: Content Platform

    UNM RDS Content Platform for the Coffee & Code Workshop Series

    This repository contains the needed code to replicate the presentation and playground environments used for the UNM Research Data Services (RDS) Coffee & Code workshop series. The materials in this repository leverage Docker as a platform for developing and deploying portable containers that support individual applications. In the case of the Coffee & Code instruction platform, the applications that are integrated into the system include:

    • Jupyter Notebooks as a presentation, demonstration, and experimentation environment (based on the datascience-notebook container with the addition of Pandoc and LaTeX)
    • A web-based RStudio environment (based on rocker/rstudio with the addition of the R packages dplyr, ggplot2, and ggrepel)
    • Installed tools within the Jupyter Notebook platform include:
      • Git
      • Pandoc & LaTeX
      • BASH shell
      • Python
      • R
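
    As an illustration of how such containers can be launched programmatically, here is a minimal sketch using the Docker SDK for Python; it assumes a running Docker daemon and is not part of the workshop repository:

      import docker  # Docker SDK for Python

      client = docker.from_env()

      # Start the Jupyter data-science image the platform builds on,
      # mapping the notebook server's port 8888 to the host.
      container = client.containers.run(
          "jupyter/datascience-notebook",
          detach=True,
          ports={"8888/tcp": 8888},
      )
      print(container.short_id)

      # When finished: container.stop(); container.remove()
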
  • Digital Preservation Workshop Module 5: Post Submission

    Module 5 of The Digital Preservation Network's Digital Preservation Workflow Curriculum examines the relationship of the content holders to the preservation service on an ongoing basis following the submission of content. Regardless of whether the preservation environment is internal to the organization, an external service providing organization, or a collaborative consortium, ongoing preservation is a shared responsibility. This module will lay out the various roles, responsibilities, and tasks, and the service framework that will support the establishment of a sustainable preservation program.


  • Digital Preservation Workshop Module 6: Sustainability

    Module 6 of The Digital Preservation Network's Digital Preservation Workflow Curriculum introduces some of the standards and best practices for digital preservation program assessment, tools and activities for performing assessments, and developing a business case for digital preservation. Finally, the module provides practical next steps for applying knowledge gained through the workshop. 


  • Digital Preservation Workshop Module 1: Programmatic Digital Preservation

    Module 1 of The Digital Preservation Network's Digital Preservation Curriculum provides an overview of the workshop contents and approach. It begins with a discussion of the goal of the workshop — providing participants with the capacity to ensure valuable content is stored in a managed environment over the long-term, and enact digital preservation programs at their organizations — and provides an opportunity for participants to discuss what this might look like within different organizational contexts. Participants will look at the factors involved in operationalizing a digital preservation program, and the pathway that content travels along as it nears a long-term storage environment. This module introduces the problem-solving and decision-making framework that will run throughout all subsequent modules.


  • Digital Preservation Workshop Module 2: Selection

    Module 2 of The Digital Preservation Network's Digital Preservation Workflow Curriculum introduces the concept of selection for digital preservation and how understanding an organization's collections landscape can help with planning, selection, and prioritization of digital content for preservation. Lectures will discuss planning, offer criteria, and introduce tools to track and document collections and evaluate their readiness to prioritize submission to a digital preservation service. Participants will consider factors such as legal status, "done-ness" (when is an asset ready to be preserved?), and roles and responsibilities for decision making. They will be asked to look at how the sources of content (whether from digitization or born-digital) affect decision making, and will apply what they have learned through discussions and a case study on evaluation.


  • Digital Preservation Workshop Module 4: Submission and Ingest

    Module 4 of the Digital Preservation Network's Digital Preservation Workflow Curriculum introduces the concept of transferring submission packages to preservation environments. It underscores the importance of logging transfer, upload, and verification events during ingest for the establishment (or continuation) of an audit trail that will track digital assets throughout their life in the preservation environment. The lecture will provide an overview of best practices for submission and the capture of information produced by the related events. Participants will gain experience with tools that support package transfer and will upload submission packages into a local environment and a cloud or preservation service.


  • Digital Preservation Workshop Module 3: Preparing for Submission

    Module 3 of The Digital Preservation Network's Digital Preservation Workflow Workshop Curriculum focuses on preparing content for submission to a long-term storage service, whether in-house or external to the organization. It will emphasize requisite tasks such as understanding and conforming to submission requirements, local file management prior to submission, and tracking asset status. This module will explore common challenges encountered during this stage in the workflow, such as determining how and when to capture metadata, deciding what is “good enough” to submit, dealing with different content sources (e.g., born-digital vs. digitized), and work through ways of resolving these. A case study will be used to provide participants with experience creating a plan for this stage. A hands-on exercise creating a preservation package according to the specifications of a long-term storage service will expose participants to common tools and approaches for compliance with requirements. It will conclude with a discussion of how the processes reviewed during this module can be implemented in a program that will support all organizational content regardless of type, source, or owner.
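
    One widely used way to build such a submission package is the BagIt format. The sketch below uses the bagit-python library with a hypothetical content directory; the workshop's own tool choices may differ:

      import bagit  # bagit-python, a common tool for creating preservation packages

      # Package a (hypothetical) directory of content into a BagIt bag:
      # payload files are checksummed and metadata is recorded in bag-info.txt.
      bag = bagit.make_bag("submission_content", {"Contact-Name": "A. Archivist"})

      # Verify the package before transferring it to the storage service.
      print(bag.is_valid())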


  • Digital Preservation Workflow Curriculum Development

    The purpose of this workshop curriculum is to provide attendees with:
    A. An understanding of the goals, processes, and responsibilities involved in the creation of a digital preservation program
    B. Problem-solving and decision-making skills to enable ongoing, collaborative digital preservation throughout technological, organizational, and content changes.

    This workshop will equip participants with a set of skills and knowledge that will enable them to enact programmatic digital preservation within their organization. It is focused on equipping organizations with the capability to implement and manage a digital preservation program. The workshop modules present the requirements of a digital preservation ecosystem from two parallel viewpoints: 1) governance and program management, including the creation of a unified strategy and the need for cross-organizational coordination, balancing the competing priorities of innovation and maintenance, and 2) asset management, including the selection and submission of content to a managed preservation environment, and ongoing post-submission responsibilities.

    Module topics include:
    1. Enabling Programmatic Digital Preservation
    2. Selection
    3. Preparing for Submission
    4. Submission and Ingest
    5. Post-Submission
    6. Sustainability

  • Coffee and Code: Natural Language Processing with Python

    GitHub repository for this workshop: https://github.com/unmrds/cc-nlp

    The processing and analysis of natural languages is a core requirement for extracting structured information from spoken, signed, or written language, and for feeding that information into systems or processes that generate insights from, or responses to, the provided language data. Because natural languages evolved rather than being designed for a specific purpose, they pose significant challenges for the development of automated systems.
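
    As a small taste of what such processing looks like in Python, here is a sketch using the NLTK library (one common choice; the workshop repository above contains the actual materials). Tokenization, splitting text into sentences and words, is usually the first step in an NLP pipeline:

      import nltk
      from nltk.tokenize import sent_tokenize, word_tokenize

      nltk.download("punkt", quiet=True)      # tokenizer models (older NLTK)
      nltk.download("punkt_tab", quiet=True)  # tokenizer models (newer NLTK)

      text = ("Natural languages evolved rather than being designed. "
              "That makes automated analysis hard.")

      # Split raw text into sentences, then each sentence into word tokens.
      for sentence in sent_tokenize(text):
          print(word_tokenize(sentence))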

    Natural Language Processing - the class of activities in which language analysis, interpretation, and generation play key roles - is used in many disciplines, as demonstrated by this random sample of recent papers using NLP to address very different research problems:

    "Unsupervised entity and relation extraction from clinical records in Italian" (1)
    "Candyflipping and Other Combinations: Identifying Drug–Drug Combinations from an Online Forum" (2)
    "How Can Linguistics Help to Structure a Multidisciplinary Neo Domain such as Exobiology?" (3)
    "Bag of meta-words: A novel method to represent document for the sentiment classification" (4)
    "Information Needs and Communication Gaps between Citizens and Local Governments Online during Natural Disasters" (5)
    "Mining the Web for New Words: Semi-Automatic Neologism Identification with the NeoCrawler" (6)
    "Distributed language representation for authorship attribution" (7)
    "Toward a computational history of universities: Evaluating text mining methods for interdisciplinarity detection from PhD dissertation abstracts" (8)
    "Ecological momentary interventions for depression and anxiety" (9)