All Learning Resources

  • GeoDatabase (.gdb) Data Curation Primer

    The geodatabase is a container for geospatial datasets that can also provide relational functionality between the files. Although the term geodatabase can be used more widely, this primer describes the ArcGIS geodatabase designed by Esri.
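
    For a quick sense of what a curation review of a file geodatabase can look like in practice, here is a minimal sketch using the Python fiona and geopandas libraries (which read file geodatabases through GDAL's OpenFileGDB driver); the path and layer contents are hypothetical, not taken from the primer:

    ```python
    import fiona
    import geopandas as gpd

    GDB_PATH = "survey_data.gdb"  # hypothetical geodatabase received from a researcher

    # List the feature classes (layers) stored inside the geodatabase container
    layers = fiona.listlayers(GDB_PATH)
    print("Layers:", layers)

    # Load one layer and inspect its schema, coordinate reference system, and extent
    gdf = gpd.read_file(GDB_PATH, layer=layers[0])
    print(gdf.dtypes)
    print("CRS:", gdf.crs)
    print("Bounding box:", gdf.total_bounds)
    ```
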
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of Contents:
    1. Description of format
    2. Examples of geodatabase datasets
    3. Key questions
    4. Instructions for resources to use in the curation review of geodatabase files
    5. Metadata
    6. Preservation actions
    7. Bibliography
    Appendix 1: Future Primer Directions

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Tutorial for using the netCDF Data Curation Primer

    This document is a supplemental primer to the main IMLS Data Curation Format Profile - netCDF primer (http://hdl.handle.net/2027.42/145724). Within this primer, the NCAR Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40 km Reanalysis dataset from the Research Data Archive (RDA) at the National Center for Atmospheric Research (NCAR) is used to demonstrate how to assess a netCDF-based dataset according to the main primer's instructions. In particular, Panoply, a curation review tool recommended by the main primer, is used to examine the dataset in order to help answer the questions outlined in the "Key Questions for Curation Review" section of the main primer.
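
    Panoply is a point-and-click browser; for curators who prefer to script the same checks, here is a minimal sketch using the netCDF4 Python library to list a file's dimensions, variables, and attributes. The file name is hypothetical and this is not part of the primer's own workflow:

    ```python
    from netCDF4 import Dataset

    # Hypothetical file name; any netCDF file downloaded from the RDA works the same way
    ds = Dataset("cfdda_reanalysis_sample.nc")

    print("Global attributes:", ds.ncattrs())
    print("Dimensions:", {name: len(dim) for name, dim in ds.dimensions.items()})

    # For each variable, report its dimensions and units for comparison against the documentation
    for name, var in ds.variables.items():
        units = getattr(var, "units", "(no units attribute)")
        print(f"{name}: dimensions={var.dimensions}, units={units}")

    ds.close()
    ```
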
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.
     

  • Persistent Identifiers: Current Features and Future Properties (Webinar)

    This webinar is for people who already know what persistent identifiers are but want to learn much more about what can actually be done with them. In other words, what services are being built on top of identifier systems that could be useful to the digital preservation community? It covers topics such as party identification, interoperability, and (metadata) services such as multiple resolution. It then explains more about the next generation of resolvers and work on extensions, such as specification of the URN r-component semantics.

  • ANDS Guide to Persistent Identifiers: Awareness Level

    A persistent identifier (PID) is a long-lasting reference to a resource. That resource might be a publication, dataset or person. Equally it could be a scientific sample, funding body, set of geographical coordinates, unpublished report or piece of software. Whatever it is, the primary purpose of the PID is to provide the information required to reliably identify, verify and locate it. A PID may be connected to a set of metadata describing an item rather than to the item itself.
    The contents of this page are:
     What is a persistent identifier?
    Why do we need persistent identifiers?
    How do persistent identifiers work?
    What needs to be done, by whom?

    Other ANDS Guides are available at the working level and expert level from this page.

  • ANDS Guides to Persistent Identifiers: Working Level

    This module familiarizes researchers and administrators with persistent identifiers as they apply to research. It gives an overview of the various issues involved in ensuring that identifiers provide ongoing access to research products. The issues are both technical and policy-related; this module focuses on policy issues.
    This guide goes through the same issues as the ANDS guide Persistent identifiers: awareness level, but in more detail. The introductory module is not a prerequisite for this module.
    The contents of this page are:
    Why persistent identifiers?
    What is an Identifier?
    Data and Identifier life cycles
    What is Identifier Resolution?
    Technologies
    Responsibilities
    Policies

    Other ANDS Guides on this topic at the awareness level and expert level can be found from this page.

  • ANDS Guides to Persistent identifiers: Expert Level

    This module aims to provide research administrators and technical staff with a thorough understanding of the issues involved in setting up a persistent identifier infrastructure. It provides an overview of the types of possible identifier services, including core services and value-added services. It offers a comprehensive review of the policy issues that are involved in setting up persistent identifiers. Finally, a glossary captures the underlying concepts on which the policies and services are based.

    Other ANDS Guides on this topic are available for the awareness level and the working level from this page.

  • Wordpress.com (hosted) Data Curation Primer

    WordPress.com is the hosted version of the open-source WordPress.org software (https://en.support.wordpress.com/com-vs-org/; https://dailypost.wordpress.com/2013/11/14/com-or-org/) offering a free online publishing platform with optional features, plans, and custom domains available for an additional cost (https://wordpress.com/about/). This primer will focus exclusively on the WordPress.com free site export and archiving process. In the future, additional primers and/or additions to this primer may be beneficial in order to cover the variations with WordPress.com Business Plan sites and WordPress.org software. 
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of Contents:
    1. Description of format
    2. Examples
    3. Sample data set citations
    4. Key questions to ask yourself
    5. Key clarifications to get from researcher
    6. Applicable metadata standard, core elements, and readme requirements
    7. Resources for reviewing data
    8. Software for viewing or analyzing data
    9. Preservation actions
    10. What to look for to make sure this file meets FAIR principles
    11. Ways in which fields may use this format
    12. Unresolved issues/further questions [for example, tracking the provenance of data creation, level of detail in a dataset]
    13. Documentation of curation process: What to capture from the curation process
    14. Appendix A - filetype CURATED checklist

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Identifying and Linking Physical Samples with Data using IGSNs - PIDs Short Bites #2

    This webinar was the second in the PIDs Short Bites webinar series examining persistent identifiers and their use in research. This webinar:
    1) introduced the IGSN, outlining its structure, use, application, and availability for Australian researchers and research institutions;
    2) discussed the international symposium "Linking Environmental Data and Samples".
     
    Slides available: https://www.slideshare.net/AustralianNationalDataService/identifying-and...
     

  • Linking Data and Publications - the Scholix Initiative - PIDs Short Bites #3

    This webinar was the third in the PID Short Bites webinar series examining persistent identifiers and their use in research. This webinar provides an introduction and overview of the Scholix (SCHOlarly LInk eXchange) initiative: a high-level interoperability framework aimed at increasing and facilitating exchange of information about the links between data and scholarly literature, as well as between data. The framework is a global community and multi-stakeholder driven effort involving journal publishers, data centers, and global service providers.

  • DOIs to Support Citation of Grey Literature - PIDs Short Bites #1

    This webinar was the first in the PIDs Short Bites webinar series examining persistent identifiers and their use in research. It begins with a brief introduction to the use of persistent identifiers in research, followed by an outline of how UNSW has approached supporting discovery and citation of grey literature. Grey literature materials are often important parts of the scholarly record that can contribute to research impact, and thus there is a need to make them discoverable and citable. Accompanying workflows meet the needs of researchers or administrators who produce grey literature on a regular and ongoing basis.
    You can find the slides at:
    https://zenodo.org/record/165620#.XbMzV5pKiUk
     
     https://www.slideshare.net/AustralianNationalDataService/pids-for-resear...
     

  • RAID, a PID for Projects - PIDs Short Bites #4

     
    This webinar is the fourth in the PIDs Short Bites webinar series examining persistent identifiers and their use in research. It covers the Research Activity Identifier (RAiD), which addresses issues surrounding research data management planning and processes.
    You can find the slides here:
    https://www.slideshare.net/AustralianNationalDataService/andrew-janke-ra...
    https://www.slideshare.net/AustralianNationalDataService/siobhann-mccaff...

  • Introduction to Statistics for Social Sciences: Lecture 2

    This video is one of a 3-lecture series that introduces students to statistics for social science research. The lectures support the textbook "REVEL for Elementary Statistics in Social Science" by J. Levin, J.A. Fox, and D.R. Forde. This video covers the measures of central tendency and variability, topics included in Chapters 3 and 4 of the Levin, Fox, and Forde text. The REVEL book contains a balanced overview of statistical analysis in the social sciences, providing coverage of both theoretical concepts and step-by-step computational techniques. Throughout this best-selling text, authors Jack Levin, James Alan Fox, and David R. Forde make statistics accessible to all readers, particularly those without a strong background in mathematics. Jessica Bishop-Royse, the instructor of the video course, has divided the book's chapters into 3 lectures and presents examples to clarify the contents.

    Access to Lecture 1 and  Lecture 3

  • Introduction to Statistics for Social Sciences: Lecture 3

    This video is one of a 3-lecture series that introduces students to statistics for social science research. The lectures support the textbook "REVEL for Elementary Statistics in Social Science" by J. Levin, J.A. Fox, and D.R. Forde. This video lecture covers probability and normal distributions, topics included in Chapter 5 of the Levin, Fox, and Forde text. The REVEL book contains a balanced overview of statistical analysis in the social sciences, providing coverage of both theoretical concepts and step-by-step computational techniques. Throughout this best-selling text, authors Jack Levin, James Alan Fox, and David R. Forde make statistics accessible to all readers, particularly those without a strong background in mathematics. Jessica Bishop-Royse, the instructor of the video course, has divided the book's chapters into 3 lectures and presents examples to clarify the contents.

    Access to Lecture 1 and Lecture 2

  • Introduction to Statistics for Social Sciences: Lecture 1

    This video is one of a 3-lecture series that introduces students to statistics for social science research. The lectures support the textbook "REVEL for Elementary Statistics in Social Science" by J. Levin, J.A. Fox, and D.R. Forde. This video lecture covers the research process and organizing and viewing data, topics covered in Chapters 1 and 2 of the Levin, Fox, and Forde text. The REVEL book contains a balanced overview of statistical analysis in the social sciences, providing coverage of both theoretical concepts and step-by-step computational techniques. Throughout this best-selling text, authors Jack Levin, James Alan Fox, and David R. Forde make statistics accessible to all readers, particularly those without a strong background in mathematics. Jessica Bishop-Royse, the instructor of the video course, has divided the book's chapters into 3 lectures and presents examples to clarify the contents.
    Access to Lecture 2 and Lecture 3.
     
  • Teaching and Learning with ICPSR

    These resources were created especially for undergraduate faculty and students. While any of ICPSR's data and tools can be used in the classroom, the ones provided here make it easy for instructors to set up data-driven learning experiences. The materials can be used as the basis for assignments, as an in-class or study exercise, for lecture content, or any other way you see fit. All resources are provided under a Creative Commons (attribution) License.

    A number of data-driven learning guides are provided; these are standardized exercises that introduce (or reinforce) key concepts in the social sciences by guiding students through a series of questions and related data analyses. Analyses are preset so students can focus on content rather than the mechanics of data analysis. To assist instructors with selection, guides are also categorized by the most sophisticated statistical test presented in the exercise.

    In addition, exercise modules made up of sequenced activities are provided. While assignments may be created using a few of the exercises in a set, the full package must be used to meet the stated learning objectives for each. Exercise sets are often appropriate for research methods courses and for more substantively focused courses.

    Established in 1962, the Inter-university Consortium for Political and Social Research (ICPSR) provides leadership and training in data access, curation, and methods of analysis for a diverse and expanding social science research community. The ICPSR data archive is unparalleled in its depth and breadth; its data holdings encompass a range of disciplines, including political science, sociology, demography, economics, history, education, gerontology, criminal justice, public health, foreign policy, health and medical care, child care research, law, and substance abuse. ICPSR also hosts several sponsored projects focusing on specific disciplines or topics. Social scientists in all fields are encouraged to archive their data at ICPSR.

    ICPSR also provides guidelines related to curation of social science data. Specific data curation guidelines on data quality, access, preservation, confidentiality, and citation are available as videos and other resources at: http://www.icpsr.umich.edu/web/pages/datamanagement/index.html.

  • PID Platform

    The platform is designed to help people understand what persistent identifiers are, why they exist, what they're used for, and how to use them. It's split into several sections, each aimed at different stakeholder groups. The PID Platform was developed by Project THOR, a 30-month project funded by the European Commission under the Horizon 2020 programme. It aimed to establish seamless integration between articles, data, and researchers across the research lifecycle. The project created a wealth of open resources and fostered a sustainable international e-infrastructure. The result was reduced duplication, economies of scale, richer research services, and opportunities for innovation. The work of the THOR project has been continued by the FREYA Project. Find out more about the FREYA Project at: https://www.project-freya.eu/en.

    The PID platform is one product of THOR.  The best place to start is to choose one of the introductions to the stakeholder groups:
    -Introduction for integrators at:  https://project-thor.readme.io/v2.0/docs/introduction-for-integrators
    -Introduction for policy makers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-policy-makers
    -Introduction for publishers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-publishers
    -Introduction for researchers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-researchers
    -Introduction for librarians and repository managers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-librarians-and...

    Other resources produced by THOR including webinar presentations, posters, etc., can be found from the Getting Started link.

  • Big Data Hadoop Tutorial for Beginners: Learn in 7 Days!

    Big Data is the latest buzzword in the IT industry. Apache's Hadoop is a leading Big Data platform used by IT giants such as Yahoo, Facebook, and Google. This step-by-step free course is geared toward making you a Hadoop expert. The online guide is designed for beginners, but knowledge of Java and Linux will help (a minimal word-count sketch illustrating the MapReduce model follows the topic list below). NOTE: The tutorial pages feature ads.
    You can find these contents from this page:
      -Introduction to BIG DATA: What is, Types, Characteristics & Example
      -What is Hadoop? Introduction, Architecture, Ecosystem, Components
      -How to Install Hadoop with Step by Step Configuration on Ubuntu
      -HDFS Tutorial: Architecture, Read & Write Operation using Java API
      -What is MapReduce? How it Works - Hadoop MapReduce Tutorial
      -Hadoop & MapReduce Examples: Create your First Program
      -Hadoop MapReduce Join & Counter with Example
     -Apache Sqoop Tutorial: What is, Architecture, Example
     -Apache Flume Tutorial: What is, Architecture & Twitter Example
     -Hadoop Pig Tutorial: What is, Architecture, Example
     -Apache Oozie Tutorial: What is, Workflow, Example - Hadoop
     -Big Data Testing Tutorial: What is, Strategy, how to test Hadoop
     -Hadoop & MapReduce Interview Questions & Answers
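
    To give a concrete feel for the MapReduce programming model covered in the tutorials above, here is a minimal word-count sketch in the style of Hadoop Streaming, which lets the map and reduce steps be written as ordinary scripts reading from standard input. The use of Python and the file names are illustrative assumptions, not part of the tutorial itself:

    ```python
    #!/usr/bin/env python3
    # mapper.py -- the "map" step: emit a (word, 1) pair for every word seen
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")
    ```

    ```python
    #!/usr/bin/env python3
    # reducer.py -- the "reduce" step: Hadoop sorts mapper output by key,
    # so identical words arrive together and can be summed in one pass
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)

    if current_word is not None:
        print(f"{current_word}\t{current_count}")
    ```

    The same pipeline can be tested locally with `cat input.txt | python3 mapper.py | sort | python3 reducer.py` before submitting it to a cluster via the Hadoop Streaming jar.
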
  • An Introduction to Humanities Data Curation

    This webpage is a compilation of articles that address aspects of data curation in the digital humanities. Its goal is to direct readers to trusted resources, with enough context from expert editors and other members of the research community to indicate how these resources might help them with their own data curation challenges.
    Each article provides a short introduction to a topic and a list of linked resources. Structuring articles in this way acknowledges the many excellent resources that already exist to provide guidance on subjects relevant to curation such as data formats, legal policies, description, and more.
    The table of contents:
    -An Introduction to Humanities Data Curation
    -Classics, “Digital Classics” and Issues for Data Curation
    -Data Representation
    -Digital Collections and Aggregations
    -Policy, Practice, and Law
    -Standards

  • Data Management In The Arts and Humanities

    This presentation responds to a few tricky questions that people providing services to arts and humanities researchers often bring to the Digital Curation Centre (DCC) in the UK. Martin Donnelly of the Digital Curation Centre discusses topics including:
    - A brief introduction about DCC: The Digital Curation Centre (DCC) is an internationally-recognized center of expertise in digital curation with a focus on building capability and skills for research data management. The DCC provides expert advice and practical help to research organizations wanting to store, manage, protect, and share digital research data.
    - What is data and what do we mean by research data management?
    - What are the scientific methods, and why are they different in the arts and humanities?
    - What are the strengths and weaknesses of data in the arts and humanities?
    - Archiving issues around the arts and humanities

  • Findability of Research Data and Software Through PIDs and FAIR Repositories

    This presentation, "Findability of Research Data and Software Through PIDs and FAIR Repositories," is one of 9 webinars on topics related to FAIR Data and Software that were offered at a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018. Presentation slides are also available in addition to the recorded presentation.
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability through Community Standards, Tidy Data Formats and R Functions, their Documentation, Packaging, and Unit-Testing
    - Reusability:  Data Licensing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Accessibility Through Git, Python Functions and Their Documentation

    This presentation " Accessibility Through Git, Python Functions and Their Documentation" is one of 9 webinars on topics related to FAIR Data and Software that was offered at a Carpentries-based Workshop in Hannover, Germany, Jul 9-13 2018.  Presentation slides are also available in addition to the recorded presentation.
    In this presentation they Talk about:
    - The definitions and role of Accessibility
    - Version control & project management with Git(Hub)
    - Accessible software & comprehensible code
    - Functions in Python & R

    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    - Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability through Community Standards, Tidy Data Formats and R Functions, their Documentation, Packaging, and Unit-Testing
    - Reusability:  Data Licensing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Interoperability Through Python Modules, Unit-Testing and Continuous Integration

    This presentation " Interoperability Through Python Modules, Unit-Testing and Continuous Integration" is one of 9 webinars on topics related to FAIR Data and Software that was offered at a Carpentries-based Workshop in Hannover, Germany, Jul 9-13 2018.  Presentation slides are also available in addition to the recorded presentation.
     
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    -Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Reusability through Community Standards, Tidy Data Formats and R Functions, their Documentation, Packaging, and Unit-Testing
    - Reusability:  Data Licensing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing

    This presentation, "Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing," is one of 9 webinars on topics related to FAIR Data and Software that were offered at a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018. Presentation slides are also available in addition to the recorded presentation.
     
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    -Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability:  Data Licensing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Reusability: Data Licensing

    This presentation "Reusability: Data Licensing" is one of 9 webinars on topics related to FAIR Data and Software that was offered at a Carpentries-based Workshop in Hannover, Germany, Jul 9-13 2018.  Presentation slides are also available in addition to the recorded presentation.
     
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    -Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing
    - Reusability:  Software Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Reusability: Software Licensing

    This presentation " Reusability: Software Licensing" is one of 9 webinars on topics related to FAIR Data and Software that was offered at a Carpentries-based Workshop in Hannover, Germany, Jul 9-13 2018.  Presentation slides are also available in addition to the recorded presentation.
     
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    -Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing
    - Reusability: Data Licensing
    - Reusability:  Software Publication
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • Reusability: Software Publication

    This presentation " Reusability: Software Publication" is one of 9 webinars on topics related to FAIR Data and Software that was offered at a Carpentries-based Workshop in Hannover, Germany, Jul 9-13 2018.  Presentation slides are also available in addition to the recorded presentation.
     
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    -Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing
    - Reusability: Data Licensing
    - Reusability: Software Licensing
    - FAIR Data and Software - Summary
     
    URL locations for the other modules in the webinar can be found at the URL above.

  • FAIR Data and Software - Summary

    This presentation, "FAIR Data and Software - Summary," is one of 9 webinars on topics related to FAIR Data and Software that were offered at a Carpentries-based workshop in Hannover, Germany, July 9-13, 2018. Presentation slides are also available in addition to the recorded presentation.
     
    Other topics included in the series include:
    - Introduction, FAIR Principles and Management Plans
    -Findability of Research Data and Software Through PIDs and FAIR Repositories
    - Accessibility through Git, Python Functions and Their Documentation
    - Interoperability through Python Modules, Unit-Testing and Continuous Integration
    - Reusability Through Community-Standards, Tidy Data Formats and R Functions, Their Documentation, Packaging and Unit-Testing
    - Reusability: Data Licensing
    - Reusability: Software Licensing
    - Reusability:  Software Publication

    URL locations for the other modules in the webinar can be found at the URL above.

  • Formal Ontologies: A Complete Novice's Guide

    This module is specifically aimed at those who are not yet familiar with ontologies as a means of research data management, and will take you through some of the main features of ontologies, and the reasons for using them.  If you’d like to take a step back to a very basic introduction to knowledge representation systems, you could have a look at the brief summary we have given in the ‘Introduction to Research Infrastructures Module’ before starting.
    By the end of this module, participants should be able to:
    -Understand what we mean by ‘Data Heterogeneity’, and how it affects knowledge representation
    -Understand and explain the basic concept of an ontology
    -Understand and explain how ontologies are used to curate and share research data

    PARTHENOS training provides modules and resources in digital humanities and research infrastructures, with the goal of strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology, and related fields. Activities designed to meet this goal address and provide common solutions for the definition and implementation of joint policies and solutions for the humanities and linguistic data lifecycle, taking into account the specific needs of the sector. These include joint training activities and modules, for both learners and trainers, on topics related to understanding research infrastructures and managing, improving, and opening up research and data.
    More information about the PARTHENOS project can be found at:  http://www.parthenos-project.eu/about-the-project-2.
      Other training modules created by PARTHENOS can be found at:  http://training.parthenos-project.eu/training-modules/.
     

  • PARTHENOS E-Humanities and E-Heritage Webinar Series

    The PARTHENOS eHumanities and eHeritage Webinar Series provides a lens through which a more nuanced understanding of the role of Digital Humanities and Cultural Heritage research infrastructures in research can be obtained.  Participants of the PARTHENOS Webinar Series will delve into a number of topics, technologies, and methods that are connected with an “infrastructural way” of engaging with data and conducting humanities research.

    Topics include: theoretical and practical reflections on digital and analogue research infrastructures; opportunities and challenges of eHumanities and eResearch; finding, working with, and contributing to Research Infrastructure collections; standards; FAIR principles; ontologies; tools and Virtual Research Environments (VREs); and new publication and dissemination types.

    Slides and video recordings of the webinars can be found from the "Wrap Up & Materials" pages at the landing page for each webinar's separate listing/linking that can be found on this series landing page.  

    Learning Objectives: 
    Each webinar of the PARTHENOS Webinar Series has an individual focus and can be followed independently.  Participants who follow the whole series will gain a complete overview on the role and value of Digital Humanities and Cultural Heritage Research Infrastructures for research, and will be able to identify Research Infrastructures especially valuable for their research and data.
     

  • Analyzing Documents with TF-IDF

    This lesson focuses on a core natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). You may have heard about tf-idf in the context of topic modeling, machine learning, or other approaches to text analysis. Tf-idf comes up a lot in published work because it’s both a corpus exploration method and a pre-processing step for many other text-mining measures and models.

    Looking closely at tf-idf will leave you with an immediately applicable text analysis method. This lesson will also introduce you to some of the questions and concepts of computationally oriented text analysis. Namely, this lesson addresses how you can isolate a document’s most important words from the kinds of words that tend to be highly frequent across a set of documents in that language. In addition to tf-idf, there are a number of computational methods for determining which words or phrases characterize a set of documents, and I highly recommend Ted Underwood’s 2011 blog post as a supplement.
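
    As a minimal illustration of the idea, the sketch below computes tf-idf scores for a toy corpus with scikit-learn's TfidfVectorizer (one common implementation) and prints the highest-weighted terms per document. The corpus is invented and the scoring options are defaults, so the output will differ from the lesson's own walkthrough:

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy corpus: in the lesson you would load real documents instead
    docs = [
        "the ambassador wrote to the king about trade",
        "the king raised taxes on grain and trade",
        "the astronomer measured the transit of venus",
    ]

    vectorizer = TfidfVectorizer()          # default settings; the lesson discusses tuning choices
    tfidf = vectorizer.fit_transform(docs)  # rows = documents, columns = terms
    terms = vectorizer.get_feature_names_out()

    # For each document, show the three terms with the highest tf-idf weight;
    # words common to every document (like "the") receive low weights
    for i, row in enumerate(tfidf.toarray()):
        top = sorted(zip(terms, row), key=lambda pair: pair[1], reverse=True)[:3]
        print(f"doc {i}:", [(term, round(score, 3)) for term, score in top])
    ```
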

    Suggested Prior Skills
    - Prior familiarity with Python or a similar programming language. Code for this lesson is written in Python 3.6, but you can run tf-idf in several different versions of Python, using one of several packages, or in various other programming languages. The precise level of code literacy or familiarity recommended is hard to estimate, but you will want to be comfortable with basic types and operations. To get the most out of this lesson, it is recommended that you work your way through something like Codecademy's "Introduction to Python" course, or that you complete some of the introductory Python lessons on the Programming Historian.
    - In lieu of the above recommendation, you should review Python's basic types (string, integer, float, list, tuple, dictionary), working with variables, writing loops in Python, and working with object classes/instances.
    - Experience with Excel or an equivalent spreadsheet application if you wish to examine the linked spreadsheet files. You can also use the pandas library in Python to view the CSVs.

  • Temporal Network Analysis with R

    This tutorial introduces methods for visualizing and analyzing temporal networks using several libraries written for the statistical programming language R. With the rate at which network analysis is developing, there will soon be more user-friendly ways to produce similar visualizations and analyses, as well as entirely new metrics of interest. For these reasons, this tutorial focuses as much on the principles behind creating, visualizing, and analyzing temporal networks (the “why”) as it does on the particular technical means by which we achieve these goals (the “how”). It also highlights some of the unhappy oversimplifications that historians may have to make when preparing their data for temporal network analysis, an area where our discipline may actually suggest new directions for temporal network analysis research.

    One of the most basic forms of historical argument is to identify, describe, and analyze changes in a phenomenon or set of phenomena as they occur over a period of time. The premise of this tutorial is that when historians study networks, we should, insofar as it is possible, also be acknowledging and investigating how networks change over time.

    Lesson Goals
    In this tutorial you will learn:
    -The types of data necessary to model a temporal network
    -How to visualize a temporal network using the NDTV package in R
    -How to quantify and visualize some important network-level and node-level metrics that describe temporal networks using the TSNA package in R.

    Prerequisites:
    This tutorial assumes that you have:
    - a basic familiarity with static network visualization and analysis, which you can get from excellent tutorials on the Programming Historian such as From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources and Exploring and Analyzing Network Data with Python
    - RStudio with R version 3.0 or higher
    - A basic understanding of how R can be used to modify data. You may want to review the excellent tutorial on R Basics with Tabular Data found at:  https://programminghistorian.org/en/lessons/r-basics-with-tabular-data.

  • File Naming Convention Worksheet

    This worksheet walks researchers through the process of creating a file naming convention for a group of files. This process includes: choosing metadata, encoding and ordering the metadata, adding version information, and properly formatting the file names. Two versions of the worksheet are available: a Caltech Library branded version (PDF) and a generic editable version (MS Word).
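
    As a small illustration of the kind of convention the worksheet helps you design, the sketch below assembles file names from ordered metadata pieces plus a date and version number. The specific fields, their order, and the format are hypothetical choices, not prescriptions from the worksheet:

    ```python
    from datetime import date

    def build_filename(project, site, measurement, collected, version, ext):
        """Assemble a name like project_site_measurement_YYYYMMDD_vNN.ext."""
        return f"{project}_{site}_{measurement}_{collected:%Y%m%d}_v{version:02d}.{ext}"

    # Example: third revision of a temperature file from a hypothetical field site
    print(build_filename("reefwatch", "site04", "temp", date(2020, 6, 15), 3, "csv"))
    # -> reefwatch_site04_temp_20200615_v03.csv
    ```
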

  • Data Science Training Camp at Woods Hole Oceanographic Institution: Syllabus and slide presentations in 2020

    With data and software increasingly recognized as scholarly research products, and aiming towards open science and reproducibility, it is imperative for today's oceanographers to learn foundational practices and skills for data management and research computing, as well as practices specific to the ocean sciences. This educational package was developed as a data science training camp for graduate students and professionals in the ocean sciences and implemented at the Woods Hole Oceanographic Institution (WHOI) in 2019 and 2020. Here we provide materials for the 2020 camp.  Contents of this package include the syllabus and slide presentations for each of the four modules:
    1 "Good enough practices in scientific computing,"
    2 Data management,
    3 Software development and research computing,
    and 4 Best practices in the ocean sciences.
    The 3rd module is split into two parts. We also include a poster presented at the 2020 Ocean Science Meeting, which has some results from pre- and post-surveys.
     

  • Project Close-Out Checklist for Research Data

    The close-out checklist describes a range of activities for helping ensure that research data are properly managed at the end of a project or at researcher departure. Activities include: making stewardship decisions, preparing files for archiving, sharing data, and setting aside important files in a "FINAL" folder. Two versions of the checklist are available: a Caltech Library branded version (PDF) and a generic editable version (MS Word).

  • Efficient BIM Data Management & Quality Control of Revit Projects

    This AGACAD webinar provides guidance for speeding up building design, facility management, and BIM data analysis in Revit projects. The contents include:
    • Manage BIM data in your Revit model and set LOD
    • Review, change & easily update BIM Data in your Revit projects
    • Find and modify any element parameters in BIM model with ease
    • Use formulas to make your own data tables
    • Insert elements into your project using various predefined rules
    • Set up and control LOD requirements based on standards, specifications, or framework agreed upon by the IPD team
    • Ensure that BIM models fit the agreed standards.

  • Top 5 Workflows for Precise BIM Data Management

    Do you need to easily rename families in your Revit project to match standards? Do you find it hard to edit and control revisions within Revit? Do you need accurate quantity take-off information from your Revit model? Do you need to edit parameter information more easily than a Revit schedule allows? Tired of assigning view templates and managing view properties manually? Review this webcast as we cover these examples and more, utilizing a powerful Revit add-on application from Ideate Software called Ideate BIMLink. It offers precise, fast, and easy data management of your BIM information.

  • Visualizing Data with Bokeh and Pandas

    The ability to load raw data, sample it, and then visually explore and present it is a valuable skill across disciplines. In this tutorial, you will learn how to do this in Python by using the Bokeh and Pandas libraries. Specifically, we will work through visualizing and exploring aspects of WWII bombing runs conducted by the Allied powers, using the THOR dataset (Theater History of Operations Reports).
    At the end of the lesson you will be able to:
    -Load tabular CSV data
    -Perform basic data manipulation, such as aggregating and sub-sampling raw data
    -Visualize quantitative, categorical, and geographic data for web display
    -Add varying types of interactivity to your visualizations
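
    As a rough sketch of the kind of workflow the lesson builds up (loading a CSV with Pandas, aggregating, and plotting with Bokeh), here is a minimal example. The file and column names follow the THOR dataset as an assumption and may differ from the lesson's exact code:

    ```python
    import pandas as pd
    from bokeh.plotting import figure, output_file, show

    # Assumed column names from the THOR dataset: MSNDATE (mission date) and TOTAL_TONS
    df = pd.read_csv("thor_wwii.csv", parse_dates=["MSNDATE"])

    # Aggregate: total tons of munitions dropped per month
    monthly = (df.groupby(pd.Grouper(key="MSNDATE", freq="M"))["TOTAL_TONS"]
                 .sum()
                 .reset_index())

    # Visualize for web display: Bokeh writes an interactive HTML file
    output_file("tons_by_month.html")
    p = figure(x_axis_type="datetime", title="Munitions dropped per month",
               x_axis_label="Date", y_axis_label="Tons")
    p.line(monthly["MSNDATE"], monthly["TOTAL_TONS"], line_width=2)
    show(p)
    ```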

    Prerequisites
    -This tutorial can be completed using any operating system. It requires Python 3 and a web browser. You may use any text editor to write your code.
    -This tutorial assumes that you have a basic knowledge of the Python language and its associated data structures, particularly lists.
    -If you work in Python 2, you will need to create a virtual environment for Python 3, and even if you work in Python 3, creating a virtual environment for this tutorial is good practice.

  • Introduction To MySQL With R

    MySQL is a relational database used to store and query information. This lesson will use the R language to provide a tutorial and examples to:
    -Set up and connect to a table in MySQL.
    -Store records to the table.
    -Query the table.
    In this tutorial you will make a database of newspaper stories that contain words from a search of a newspaper archive. The program will store the title, date published, and URL of each story in a database. You will use another program to query the database and look for historically significant patterns. Sample data will be provided from the Welsh Newspapers Online newspaper archive. You are working toward having a list of stories you can query for information. At the end of the lesson, you will run a query to generate a graph of the number of newspaper stories in the database to see if there is a pattern that is significant.

    To do this lesson you will need a computer where you have permission to install software such as R and RStudio, if you are not running them already. In addition to programming in R, you will install some components of a database system called MySQL, which works on Windows, Mac, and Linux.

    Some knowledge of installing software as well as organizing data into fields is helpful for this lesson which is of medium difficulty.
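
    The lesson itself works in R, but the underlying store-and-query pattern is language-agnostic. As a rough illustration only, here is a sketch of the same pattern in Python using the pymysql driver; the database, table, and column names are hypothetical and the credentials are placeholders:

    ```python
    import pymysql  # third-party MySQL driver for Python; the lesson uses R instead

    conn = pymysql.connect(host="localhost", user="newspaper_user",
                           password="change_me", database="newspaper_search_results")
    try:
        with conn.cursor() as cur:
            # Store records: one row per newspaper story found by the search
            cur.execute("""CREATE TABLE IF NOT EXISTS stories (
                               id INT AUTO_INCREMENT PRIMARY KEY,
                               story_title VARCHAR(255),
                               story_date_published DATE,
                               story_url VARCHAR(512))""")
            cur.execute("INSERT INTO stories (story_title, story_date_published, story_url) "
                        "VALUES (%s, %s, %s)",
                        ("Example headline", "1914-08-05", "https://example.org/story"))
        conn.commit()

        with conn.cursor() as cur:
            # Query the table: count stories per year to look for patterns over time
            cur.execute("SELECT YEAR(story_date_published), COUNT(*) FROM stories "
                        "GROUP BY YEAR(story_date_published) ORDER BY 1")
            for year, n in cur.fetchall():
                print(year, n)
    finally:
        conn.close()
    ```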

  • Dealing with Big Data and Network Analysis Using Neo4j

    In this lesson, you will learn how to use a graph database to store and analyze complex networked information. Networks are all around us. Social scientists use networks to better understand how people are connected. This information can be used to understand how things like rumors or even communicable diseases can spread throughout a community of people.
    This tutorial will focus on the Neo4j graph database and the Cypher query language that comes with it.
    -Neo4j is a free, open-source graph database written in Java that is available for all major computing platforms.
    -Cypher is the query language for the Neo4j database that is designed to insert and select information from the database.
    By the end of this lesson you will be able to construct, analyze, and visualize networks based on big — or just inconveniently large — data. The final section of this lesson contains code and data to illustrate the key points of this lesson.
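
    To give a feel for how Cypher statements are issued against Neo4j, here is a minimal sketch using the official Neo4j Python driver; the connection details, labels, and properties are hypothetical, and the lesson's own examples are more elaborate:

    ```python
    from neo4j import GraphDatabase

    # Placeholder connection details for a local Neo4j instance
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "change_me"))

    with driver.session() as session:
        # Insert two nodes and a relationship using Cypher
        session.run(
            "MERGE (a:Person {name: $a}) "
            "MERGE (b:Person {name: $b}) "
            "MERGE (a)-[:KNOWS]->(b)",
            a="Ada", b="Charles",
        )

        # Select information back out: whom does Ada know?
        result = session.run(
            "MATCH (:Person {name: $name})-[:KNOWS]->(friend) RETURN friend.name AS friend",
            name="Ada",
        )
        for record in result:
            print(record["friend"])

    driver.close()
    ```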

  • Remote Sensing for Monitoring Land Degradation and Sustainable Cities Sustainable Development Goals (SDGs)

    The Sustainable Development Goals (SDGs) are an urgent call for action by countries to preserve our oceans and forests, reduce inequality, and spur economic growth. The land management SDGs call for consistent tracking of land cover metrics. These metrics include productivity, land cover, soil carbon, urban expansion, and more. This webinar series highlights a tool that uses NASA Earth observations to track land degradation and urban development against the relevant SDG targets.

    SDGs 11 and 15 relate to sustainable urbanization and land use and cover change. SDG 11 aims to "make cities and human settlements inclusive, safe, resilient, and sustainable." SDG 15 aims to "combat desertification, drought, and floods, and strive to achieve a land degradation neutral world." To assess progress towards these goals, indicators have been established, many of which can be monitored using remote sensing. 

    In this training, attendees will learn to use a freely-available QGIS plugin, Trends.Earth, created by Conservation International (CI) and have special guest speakers from the United Nations Convention to Combat Desertification (UNCCD) and UN Habitat. Trends.Earth allows users to plot time series of key land change indicators. Attendees will learn to produce maps and figures to support monitoring and reporting on land degradation, improvement, and urbanization for SDG indicators 15.3.1 and 11.3.1. Each part of the webinar series will feature a presentation, hands-on exercise, and time for the speaker to answer live questions. 

    Learning Objectives: By the end of this training, attendees will: 

    • Become familiar with SDG Indicators 15.3.1 and 11.3.1
    • Understand the basics of how to compute sub-indicators of SDG 15.3.1 such as productivity, land cover, and soil carbon
    • Understand how to use the Trends.Earth Urban Mapper web interface
    • Learn the basics of the Trends.Earth toolkit, including:
      • Plotting time series
      • Downloading data
      • Using default or custom data for productivity, land cover, and soil organic carbon
      • Calculating SDG 15.3.1 spatial layers and a summary table
      • Calculating urban change metrics
      • Creating urban change summary tables

    Course Format: This training has been developed in partnership with Conservation International, United Nations Convention to Combat Desertification (UNCCD), and UN Habitat. 

    • Three, 1.5-hour sessions that include lectures, hands-on exercises, and a question and answer session
    • The first session will be broadcast in English, and the second session will contain the same content, broadcast in Spanish (see the separate record for the Spanish version at: https://dmtclearinghouse.esipfed.org/node/10935).

    ​Prerequisites: 

    Each of the three parts includes links to the recordings, presentation slides, exercises, and Question & Answer transcripts.