All Learning Resources

  • Local Data Management: Providing Access to Your Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course. The subject of this module is "Providing Access to Data". The module was authored by Matthew Mayernik from the National Center for Atmospheric Research. Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

    In this module, we will talk about how you can provide access to your data. We’ll argue that data should be openly available and explain why that is important, then discuss funding agency requirements for making data available and accessible, with a focus on United States government agencies. We’ll ask who has responsibility for providing access to your data; ultimately, it is individuals who need to take responsibility for providing access to their own data in various ways. To help you follow through on that responsibility, we’ll talk generally about the challenges involved in making data accessible. This module is available in both presentation slide and video formats.

  • Providing Access to Your Data: Determining Your Audience

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Determining Your Audience".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    When you think about providing access to your data, it’s important to think not only about the audiences that could use the data itself, but also about those who could use the data products and services generated from it.  The users of the data could be those currently interested in your data as well as future users of the data. Keep in mind that there could be several audiences for your data as they move through the entire life cycle. The audiences might reflect various user demographics or various purposes for using the data, and can certainly change over time.

    Determining the audiences that use your data can help you identify their needs, and inform the development of your data to meet those needs. Development of the data might include the creation of data products and services that you provide to assist users in using your data.  Knowledge of the audiences for your data will help you identify the various products and services that you might offer to current or new user communities, and also help verify that the needs of your users are being met.

    Efforts to determine the audiences for your data should continue throughout the entire data life cycle so that you can improve the user experience. Awareness of the initial users of your data can inform the data development process, as well as your plans for disseminating the data and for providing stewardship to manage the data over time. Observations about later users of your data can inform potential improvements for your data, so you can better serve both your current and your future users.

  • Providing Access to Your Data: Access Mechanisms

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Access Mechanisms".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we plan to give you some background and context for the topic and describe its relevance to data management.  We’d like to introduce you to a way to think about the parties who can provide access to your data, and some of the mechanisms that might be used.  We’ll discuss some community considerations and resource considerations for access and, finally, describe some access mechanisms that can be offered by data centers.

  • Providing Access to Your Data: Tracking Data Usage

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Tracking Data Usage".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will give you some background and context for this topic, and then describe its relevance to data management.  We’ll discuss what data usage can tell you about your data and where you can find usage information.  We’ll also briefly discuss the advantages of tracking data citations.

  • Providing Access to Your Data: Handling Sensitive Data

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Handling Sensitive Data".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will tell you what sensitive data is and provide some background information about it.  We will discuss why it is important that you identify and manage sensitive data, particularly for science.  We’ll also talk about some important issues to discuss with your archive about managing sensitive data.

  • Providing Access to Your Data: Rights

    This training module is part of the Federation of Earth Science Information Partners (or ESIP Federation's) Data Management for Scientists Short Course.  The subject of this module is "Rights".  The module was authored by Robert R. Downs from the NASA Socioeconomic Data and Applications Center which is operated by CIESIN – the Center for International Earth Science Information Network at Columbia University.  Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).  This module is available in both presentation slide and video formats.

    In this module, we will first provide some background and context on the topic of rights, discuss the relevance of rights to data management and then describe some options that you have for assigning rights, with examples. 

  • Earth Lab Free Online Courses, Tutorials and Tools

    Welcome to Earth Data Science. This site contains open tutorials and course materials covering topics including data integration, GIS, and data-intensive science. Explore our 312 earth data science lessons that will help you learn how to work with data in the R and Python programming languages.  You can also earn a professional Certificate in Earth Data Analytics at the University of Colorado Boulder.

    Online courses available include, for example:
    - Online Earth Data Science Courses
    - Use Data for Earth and Environmental Science in Open Source Python Textbook
    - Scientist’s Guide to Plotting Data in Python Textbook
    - Earth Analytics Bootcamp Course
    - Intro to Earth Data Science Textbook
    - Earth Analytics Python Course
    - Earth Analytics R Course

    Earth Analytics Workshops include, for example:
    - Get Started With GIS in Open Source Python Tools
    - Setup the Earth Analytics Python Environment On Your Computer
    - Introduction to Clean Coding and the tidyverse in R

    Recent Tutorials include: 
    - Visualizing hourly traffic crime data for Denver, Colorado using R, dplyr, and ggplot
    - Calculating the area of polygons in Google Earth Engine
    - Introduction to the Google Earth Engine Python API
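
    The last tutorial listed above introduces the Google Earth Engine Python API. As a flavor of what it covers, here is a minimal sketch in Python; it assumes the earthengine-api package is installed and an authenticated Earth Engine account (recent versions may also require a Cloud project argument to ee.Initialize()), and the asset ID and region below are purely illustrative:

      import ee

      ee.Authenticate()  # one-time, browser-based login
      ee.Initialize()    # start the Earth Engine session

      # Load a public dataset (the SRTM digital elevation model).
      dem = ee.Image('USGS/SRTMGL1_003')
      print(dem.getInfo()['bands'])  # band metadata: name, data type, etc.

      # Server-side computation: mean elevation over a small region.
      region = ee.Geometry.Rectangle([-105.4, 39.9, -105.1, 40.1])
      stats = dem.reduceRegion(ee.Reducer.mean(), region, scale=90)
      print(stats.getInfo())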
     

  • Open Source Software for Preprocessing GIS Data for Hydrological Models

    The information available from this web page covers a number of topics, courses, and video and web tutorials related to Open Source Software for Preprocessing GIS Data for Hydrological Models. Courses include QGIS for Hydrological Applications, Using GDAL for preprocessing, Python 3 Tutorial, and Field surveys with QGIS, Mergin and Input.
    The courses, video tutorials and webinars are designed for professionals (engineers and scientists) active in the water/environmental sector, especially those involved in planning and management of water systems as well as numerical modelling. Pre-requisites are a basic knowledge of computing and water related topics.
    After these courses, you will be able to:
    - Understand the basic concepts of GIS: raster, vector, projections, geospatial analysis
    - Use a GIS for:
      - Thematic mapping
      - Basic data processing and editing
      - Basic geoprocessing and analysis
      - DEM processing and catchment delineation
    - Find open source software and open data

  • UNIX Tutorial for Beginners

    A beginner’s guide to the Unix and Linux operating systems: eight simple tutorials covering the basics of UNIX / Linux commands.  Other Unix resources are listed on the home page as well.

  • The Challenge of Big Data for the Social Sciences

     
    The ubiquity of "big data" about social, political and economic phenomena has the potential to transform the way we approach social science. In this talk, Professor Benoit outlines the challenges and opportunities that the rise of big data presents for the social sciences, with applications and examples. He discusses the rise of the field of data science, and whether this is a threat or a blessing for the traditional social scientific model and its ability to help us better understand society.

     

  • The Theory, Practice and Limits of Big Data for the Social Sciences

    Martin Hilbert delivered this talk on May 1, 2015 at the Institute for Social Sciences conference series Leading Research in the Social Sciences Today.  This video presents The Theory, Practice and Limits of Big Data for the Social Sciences. Dr. Hilbert talks about storage, information and growth, the concept of a digital footprint, and data, using examples to clarify the content.

  • Steps in a Digital Preservation Workflow

    Workflows are the way people get work done, and can be illustrated as a series of steps that need to be completed sequentially in a diagram or checklist. A workflow can involve anything from documentation to tasks and data being moved from one location to the next. This presentation will outline generic considerations and processes for building and managing a digital preservation workflow. It will consider the workflow within the larger context of a digital content life cycle, which runs from information creation through to ongoing discovery and access. It will focus upon generalized steps institutions can use to acquire, preserve and serve content. The presentation will describe distinct workflow stages in conjunction with sample procedures, policies, tools and services, stressing the dynamic nature of workflows over time, including the use of modular components and ongoing work to enhance automation and cope with issues of scale.

    In this video, the presenter covers the following topics:
    - Introduction to workflow in a digital preservation context
    - An outline of how to conceptualize a workflow
    - Variables that influence the design and execution of a workflow
    - Consideration of some existing models, architectures and tools

  • Google Earth Engine Tutorials

    These tutorials provide an introduction to using the Google Earth Engine JavaScript API for advanced geospatial analysis. The tutorials assume no programming background, although they do assume a willingness to learn some JavaScript programming. The links on this page can be used to get started on the tutorials, or use the menus on the left to jump to a section of interest. In addition, there are 10 video tutorials from lectures or hands-on trainings conducted at the Earth Engine Users' Summit. View the videos after completing the self-paced tutorials.  Finally, the Earth Engine developer community has created additional tutorials on topics deemed important by the community.  These can be found under the Community Tutorials and include topics such as Combining Feature Collections, Customizing Base Map Styles, and others.

    The topics for API tutorials are:
    -Introduction to JavaScript for Earth Engine
    -Introduction to the Earth Engine JavaScript API
    -Introduction to Global Forest Change datasets
    -Introduction to the JRC Global Surface Water dataset
    -Introduction to Earth Engine (condensed)
    -Classification
    -Hands-on Intermediate Training
    -Arrays and Matrices
    -Time Series Analysis
    -Tables and Vectors
    -Importing and Exporting
    -Earth Engine and the Google Cloud Platform
    -Google Maps API
    -Publishing and Storytelling
     

  • Introduction to Data Management for Undergraduate Students: Data Management Overview

    This library guide covers the basics and best practices for data management for individuals who are new to the research and data-collecting process.  Topics included in this guide are:
    - Data Management Overview
    - Data Documentation
    - Data Preservation
    - Filenaming Conventions
    - Data Backup 

  • Virginia (VA) Data Management Boot Camp 2016

    Institutions throughout Virginia have partnered since 2013 to present an annual Data Management Boot Camp.  Included at this web site are presentation slides on various topics including Organizing Data, Documentation and Metadata, Data Ownership, Sharing Data, Finding Data, and a DMP Tool Presentation.  Other resources include links to TEDx Talks, datasets and exercises associated with various topics, and tools such as OpenRefine, R & RStudio, and DMP Online.  Recordings from previous years are also linked.

  • Ag Data Commons Monthly Webinar Series

    Each month the Ag Data Commons offers a webinar with topics ranging from an introduction for new users to topics with a data management or curation focus. We also leave time for organized question and answer periods. To join any of the upcoming webinars, you can email [email protected] and we will send you the join information. You can also check the news section for the next webinar's connection information. Upcoming webinars are listed on the Ag Data Commons News Page at https://data.nal.usda.gov/news, complete with details about the webinar subject and connection information. Please note each meeting number will be different.
    Topics include: 
    Making Data Machine Readable
    Creating a Data Management Plan
    Data Dictionaries
    Data-Literature Linking in the Ag Data Commons
    Data Science & Agriculture
    Introduction to GeoData

  • Python Developer’s Guide

    This guide is a comprehensive resource for contributing to Python – for both new and experienced contributors. These instructions cover how to get a working copy of the source code and a compiled version of the CPython interpreter (CPython is the reference implementation of Python). The guide also gives an overview of the directory structure of the CPython source code. A step-by-step guide with 32 sections is available from this web page.
     

  • Data Carpentry Ecology Workshop

    Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis and visualization.

    The workshop can be taught using R or Python as the base language.

    Overview of the lessons:

    Data organization in spreadsheets
    Data cleaning with OpenRefine
    Introduction to R or Python
    Data analysis and visualization in R or Python
    SQL for data management
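
    When the workshop is taught with Python, the analysis and visualization lessons center on the pandas library. As a rough sketch of the kind of workflow taught (the file name and column names below are illustrative stand-ins for the workshop's tabular ecology dataset):

      import pandas as pd

      # "surveys.csv" and its columns are placeholders for the workshop's
      # ecology survey data.
      surveys = pd.read_csv("surveys.csv")

      # Data cleaning: drop records with missing weight measurements.
      surveys = surveys.dropna(subset=["weight"])

      # Split-apply-combine: mean weight per species.
      mean_weight = surveys.groupby("species_id")["weight"].mean()
      print(mean_weight.head())

      # Quick visualization (requires matplotlib).
      mean_weight.plot(kind="bar")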

  • CMU Intro to Database Systems Course

    This course focuses on the design and implementation of database management systems. Topics include data models (relational, document, key/value), storage models (n-ary, decomposition), query languages (SQL, stored procedures), storage architectures (heaps, log-structured), indexing (order-preserving trees, hash tables), transaction processing (ACID, concurrency control), recovery (logging, checkpoints), query processing (joins, sorting, aggregation, optimization), and parallel architectures (multi-core, distributed). Case studies on open-source and commercial database systems are used to illustrate these techniques and trade-offs. The course is appropriate for students with strong systems programming skills.  There are 26 videos associated with this course, which was originally offered in Fall 2018 as Course 15-445/645 at Carnegie Mellon University.

  • OMOP Common Data Model and Extract, Transform & Load Tutorial

    In this tutorial you will learn about the details of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and how to apply it to Extract, Transform & Load (ETL) data.  The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format.  The tutorial also covers best practices for converting data into the data model.
    Topics covered within this tutorial include:
    - What is OMOP/OHDSI?
    - OMOP Common Data Model (CDM): Why and How
    - How to retrieve data from the OMOP CDM
    - Setting up and performing an Extract, Transform and Load process into the CDM
    - Using WhiteRabbit and Rabbit-In-A-Hat to Build an ETL
    - Testing and Quality Assurance
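
    To make the "How to retrieve data from the OMOP CDM" topic concrete, here is a minimal, self-contained sketch using Python's built-in sqlite3 module. The person and condition_occurrence tables are standard CDM tables, but only a small subset of their columns is shown, and the rows are synthetic, for illustration only:

      import sqlite3

      conn = sqlite3.connect(":memory:")
      # Tiny subset of two standard OMOP CDM tables, with synthetic rows.
      conn.executescript("""
          CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER);
          CREATE TABLE condition_occurrence (
              person_id INTEGER,
              condition_concept_id INTEGER,
              condition_start_date TEXT
          );
          INSERT INTO person VALUES (1, 1970), (2, 1985);
          -- 201826 is the concept ID commonly used in OHDSI examples
          -- for type 2 diabetes mellitus.
          INSERT INTO condition_occurrence VALUES
              (1, 201826, '2016-03-01'),
              (2, 201826, '2016-05-12');
      """)

      # The kind of query the tutorial covers: persons with a given
      # standard concept, joined back to demographics.
      rows = conn.execute("""
          SELECT p.year_of_birth, COUNT(*) AS n
          FROM condition_occurrence co
          JOIN person p ON p.person_id = co.person_id
          WHERE co.condition_concept_id = 201826
          GROUP BY p.year_of_birth
      """).fetchall()
      print(rows)  # one row per birth year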

    Included with the video presentation of the tutorial are:
    Tutorial slides
    CDM_QUERY_EXAMPLES.sql
    CDM_QUERY_EXAMPLES_EXTRAS.sql
    OHDSI-in-a-box
    TUTORIAL_ScanReport.xlsx

    The OHDSI Common Data Model and Extract, Transform & Load Tutorial took place on September 24th, 2016 during the 2016 OHDSI Symposium. Recordings were made possible by the generous support of Johnson & Johnson, the JKTG Foundation, and Pfizer.

  • OMOP Common Data Model and Standardized Vocabularies

    This workshop is for data holders who want to apply OHDSI’s data standards to their own observational datasets and researchers who want to be aware of OHDSI’s data standards, so they can leverage data in OMOP CDM format for their own research purposes.

    Topics covered within this tutorial include:
    - Introductions and Ground Rules
    - Foundational:
      • History of OMOP
      • Why and How
      • Birth of OHDSI
    - Introduction to the OMOP Common Data Model
    - Example of a Remote Study
    - VM Overview
    - Ancestors & Descendants
    - How does it work for Drugs
    - SQL Examples
    - History of the model
    - In-depth discussion of the model
    - Era discussion
    - Real-World Scenario
    - ETL Pitfalls
    - Leveraging OHDSI Tools
    - OHDSI Community

    After the tutorial, you will know:
    1. History of OMOP, OHDSI
    2. How Standardized Vocabulary works
    3. How to find codes and Concepts
    4. How to navigate the concept hierarchy
    5. The OMOP Common Data Model (CDM)
    6. How to use the OMOP CDM 

    Included with the video presentation of the tutorial are:
    Tutorial slides
     

  • Microsoft Excel Data Curation Primer

    Microsoft Excel’s widespread adoption in the corporate sector is well known, but the application has also found use in many areas of scholarship. Despite the ubiquity of tabular data in CSV (comma-separated values) format, and the availability of many tools and analysis platforms that operate on CSV files, Microsoft Excel continues to be used widely in the natural sciences and social sciences. As a consequence, Excel files are routinely deposited in data repositories and curators are likely to encounter them.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of contents:
    - Description of format
    - Overview
    - Characteristics
    - Typical purposes and functions
    - What to look for
    - Problems opening the file
    - Content problems
    - Software for viewing or analyzing data
    - Preservation actions
    - Excel CURATE checklist
    - Appendix: Creating a data dictionary
    - References
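
    As one example of the kind of preservation action such a checklist might recommend, here is a minimal sketch of exporting each worksheet to CSV with pandas, so the tabular content outlives the .xlsx container (assuming the openpyxl engine is installed; the file name is illustrative, and this deliberately ignores Excel-specific content such as formulas and formatting, which the primer discusses):

      import pandas as pd

      # "deposit.xlsx" is a placeholder for an Excel file under curation.
      workbook = pd.ExcelFile("deposit.xlsx")

      # Export every worksheet to its own CSV file for preservation.
      for sheet in workbook.sheet_names:
          df = workbook.parse(sheet)
          df.to_csv(f"deposit_{sheet}.csv", index=False)
          print(f"{sheet}: {len(df)} rows exported")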

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.
     

  • Jupyter Notebooks: A Primer for Data Curators

    Jupyter Notebooks are composite digital objects used to develop, share, view, and execute interspersed, interlinked, and interactive documentation, equations, visualizations, and code. Researchers seeking to deposit software, in this case Jupyter Notebooks, in repositories do so with the expectation that repositories will provide documentation explaining “what you can deposit, the supported file formats for deposits, what metadata you may need to provide, how to provide this metadata and what happens after you make your deposit” (Jackson, 2018a). This expectation is not necessarily met by repositories that currently accept software deposits and complex objects like Jupyter Notebooks. This guide is meant to both inform curatorial practices around Jupyter Notebooks, and support the development of resources that meet researchers’ expectations to ensure long-term availability of software in curated archival repositories. Guidance provided by Jisc and the Software Sustainability Institute outlines three different kinds of software deposits: a minimal deposit, a runnable deposit, and a comprehensive deposit (Jackson, 2018b). This primer follows this same conceptual framework in dealing with Jupyter Notebooks, which even in their static, non-executable form, can be used to document how scientific research was carried out or be used as teaching models among many other use cases.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.

    The full set of Data Curation Primers can be found at:  https://conservancy.umn.edu/handle/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.
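
    For a sense of what curation review of a notebook can look like in practice, here is a minimal sketch using the nbformat library (the reference reader/writer for the notebook format) to inspect the metadata a curator needs, such as the kernel and language a notebook depends on; the file name is illustrative:

      from collections import Counter

      import nbformat

      # "deposit.ipynb" is a placeholder for a notebook under curation review.
      nb = nbformat.read("deposit.ipynb", as_version=4)

      # Kernel and language metadata: needed to re-execute the notebook later.
      print(nb.metadata.get("kernelspec", {}))
      print(nb.metadata.get("language_info", {}))

      # Count cell types; code cells imply software dependencies to document.
      print(Counter(cell.cell_type for cell in nb.cells))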

  • Microsoft Access Data Curation Primer

    This primer assumes a conceptual familiarity with relational databases (and associated terminology) and a basic level of experience with Microsoft Access.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • GeoDatabase (.gdb) Data Curation Primer

    The geodatabase is a container for geospatial datasets that can also provide relational functionality between the files. Although the term geodatabase can be used more widely, this primer describes the ArcGIS geodatabase designed by Esri.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of Contents:
    1. Description of format
    2. Examples of geodatabase datasets
    3. Key questions
    4. Instructions for resources to use in the curation review of geodatabase files
    5. Metadata
    6. Preservation actions
    7. Bibliography
    Appendix 1: Future Primer Directions

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Tutorial for using the netCDF Data Curation Primer

    This document is a supplemental primer to the main IMLS Data Curation Format Profile - netCDF primer (http://hdl.handle.net/2027.42/145724). Within this primer, the NCAR Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40 km Reanalysis dataset from the Research Data Archive (RDA) at the National Center for Atmospheric Research (NCAR) is used to demonstrate how to assess a netCDF-based dataset according to the main primer’s instructions. In particular, Panoply, a curation review tool recommended by the main primer, is used to examine the dataset in order to help answer the questions outlined in the “Key Questions for Curation Review” section of the main primer.
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.
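
    Panoply is a point-and-click viewer, but the same review questions (what dimensions, variables, units, and global attributes does the file carry?) can also be answered programmatically. A minimal sketch with the netCDF4 Python library, where the file name is illustrative:

      from netCDF4 import Dataset

      # "deposit.nc" is a placeholder for a netCDF file under curation review.
      with Dataset("deposit.nc") as ds:
          # Global attributes often carry the dataset-level documentation.
          for attr in ds.ncattrs():
              print(f"{attr}: {getattr(ds, attr)}")

          # Dimensions and variables answer "what is in this file?"
          print({name: len(dim) for name, dim in ds.dimensions.items()})
          for name, var in ds.variables.items():
              units = getattr(var, "units", "no units attribute")
              print(f"{name}: shape={var.shape}, units={units}")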
     

  • ANDS Guide to Persistent Identifiers: Awareness Level

    A persistent identifier (PID) is a long-lasting reference to a resource. That resource might be a publication, dataset or person. Equally it could be a scientific sample, funding body, set of geographical coordinates, unpublished report or piece of software. Whatever it is, the primary purpose of the PID is to provide the information required to reliably identify, verify and locate it. A PID may be connected to a set of metadata describing an item rather than to the item itself.
    The contents of this page are:
     What is a persistent identifier?
    Why do we need persistent identifiers?
    How do persistent identifiers work?
    What needs to be done, by whom?
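
    To make "How do persistent identifiers work?" concrete: a DOI, one common type of PID, resolves through a central resolver that redirects to the item's current location, so the identifier stays stable even when the landing page moves. A minimal sketch using the Python requests library; the DOI below is a placeholder, not a real identifier:

      import requests

      # Placeholder DOI for illustration; substitute a real one to try this.
      doi = "10.1234/example-dataset"

      # The doi.org resolver redirects to wherever the item currently lives.
      response = requests.get(f"https://doi.org/{doi}", allow_redirects=True)

      print("resolved to:", response.url)      # current landing page
      print("status:", response.status_code)   # 200 if resolution succeeded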

    Other ANDS Guides are available at the working level and expert level from this page.

  • ANDS Guides to Persistent Identifiers: Working Level

    This module is intended to familiarize researchers and administrators with persistent identifiers as they apply to research. It gives an overview of the various issues involved in ensuring identifiers provide ongoing access to research products. The issues are both technical and policy-related; this module focuses on policy issues.
    This guide goes through the same issues as the ANDS guide Persistent identifiers: awareness level, but in more detail. The introductory module is not a prerequisite for this module.
    The contents of this page are:
    Why persistent identifiers?
    What is an Identifier?
    Data and Identifier life cycles
    What is Identifier Resolution?
    Technologies
    Responsibilities
    Policies

    Other ANDS Guides on this topic at the awareness level and expert level can be found from this page.

  • ANDS Guides to Persistent Identifiers: Expert Level

    This module aims to provide research administrators and technical staff with a thorough understanding of the issues involved in setting up a persistent identifier infrastructure. It provides an overview of the types of possible identifier services, including core services and value-added services. It offers a comprehensive review of the policy issues that are involved in setting up persistent identifiers. Finally, a glossary captures the underlying concepts on which the policies and services are based.

    Other ANDS Guides on this topic are available for the awareness level and the working level from this page.

  • Wordpress.com (hosted) Data Curation Primer

    WordPress.com is the hosted version of the open-source WordPress.org software (https://en.support.wordpress.com/com-vs-org/; https://dailypost.wordpress.com/2013/11/14/com-or-org/) offering a free online publishing platform with optional features, plans, and custom domains available for an additional cost (https://wordpress.com/about/). This primer will focus exclusively on the WordPress.com free site export and archiving process. In the future, additional primers and/or additions to this primer may be beneficial in order to cover the variations with WordPress.com Business Plan sites and WordPress.org software. 
    This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018.
    Table of Contents:
    1. Description of format
    2. Examples
    3. Sample data set citations
    4. Key questions to ask yourself
    5. Key clarifications to get from researcher
    6. Applicable metadata standard, core elements, and readme requirements
    7. Resources for reviewing data
    8. Software for viewing or analyzing data
    9. Preservation actions
    10. What to look for to make sure this file meets FAIR principles
    11. Ways in which fields may use this format
    12. Unresolved issues/further questions [for example, tracking the provenance of data creation, level of detail in a dataset]
    13. Documentation of curation process: what to capture from the curation process
    14. Appendix A - filetype CURATED checklist

    More information about the collection of Data Curation Primers can be found at:  http://hdl.handle.net/11299/202810.

    Interactive primers available for download and derivatives at: https://github.com/DataCurationNetwork/data-primers.

  • Identifying and Linking Physical Samples with Data using IGSNs - PIDs Short Bites #2

    This webinar is the second in the PIDs Short Bites webinar series examining persistent identifiers and their use in research. This webinar:
    1) introduced the IGSN, outlining its structure, use, application and availability for Australian researchers and research institutions
    2) discussed the international symposium "Linking Environmental Data and Samples".
     
    Slides available: https://www.slideshare.net/AustralianNationalDataService/identifying-and...
     

  • Linking Data and Publications - the Scholix Initiative - PIDs Short Bites #3

    This webinar was the third in the PID Short Bites webinar series examining persistent identifiers and their use in research. This webinar provides an introduction and overview of the Scholix (SCHOlarly LInk eXchange) initiative: a high-level interoperability framework aimed at increasing and facilitating the exchange of information about the links between data and scholarly literature, as well as between datasets. The framework is a global, community- and multi-stakeholder-driven effort involving journal publishers, data centers, and global service providers.

  • DOIs to Support Citation of Grey Literature - PIDs Short Bites #1

    This webinar was the first in the PIDs Short Bites webinar series examining persistent identifiers and their use in research. It begins with a brief introduction to the use of persistent identifiers in research, followed by an outline of how UNSW has approached supporting discovery and citation of grey literature. Grey literature materials are often important parts of the scholarly record which can contribute to research impact, and thus there is a need to make them discoverable and citable. Accompanying workflows meet the needs of researchers or administrators who produce grey literature on a regular and ongoing basis.
    You can find the slides at:
    https://zenodo.org/record/165620#.XbMzV5pKiUk
     
     https://www.slideshare.net/AustralianNationalDataService/pids-for-resear...
     

  • RAID, a PID for Projects - PIDs Short Bites #4

     
    This webinar is the fourth in the PID Short Bites webinar series and covers the Research Activity Identifier (RAiD), a PID that addresses issues surrounding research data management planning and processes.
    You can find the slides here:
    https://www.slideshare.net/AustralianNationalDataService/andrew-janke-ra...
    https://www.slideshare.net/AustralianNationalDataService/siobhann-mccaff...

  • Introduction to Statistics for Social Sciences: Lecture 2

    This video is one of a 3-lecture series that introduces students to statistics for social science research.  The lectures support the textbook "REVEL for Elementary Statistics in Social Science" by J. Levin, J.A. Fox, and D.R. Forde.  This video covers the measures of central tendency and variability, topics included in Chapters 3 and 4 of the Levin, Fox, and Forde text. The REVEL book contains a balanced overview of statistical analysis in the social sciences, providing coverage of both theoretical concepts and step-by-step computational techniques. Throughout this best-selling text, authors Jack Levin, James Alan Fox, and David R. Forde make statistics accessible to all readers, particularly those without a strong background in mathematics. Jessica Bishop-Royse, the instructor of the video course, has divided the book's chapters into 3 lectures and presents examples to clarify the contents.

    Access to Lecture 1 and Lecture 3

  • Introduction to Statistics for Social Sciences: Lecture 3

    This video is one of a 3-lecture series that introduces students to statistics for social science research.  The lectures support the textbook "REVEL for Elementary Statistics in Social Science" by J. Levin, J.A. Fox, and D.R. Forde.  This video lecture covers probability and normal distributions, topics included in Chapter 5 of the Levin, Fox, and Forde text. The REVEL book contains a balanced overview of statistical analysis in the social sciences, providing coverage of both theoretical concepts and step-by-step computational techniques. Throughout this best-selling text, authors Jack Levin, James Alan Fox, and David R. Forde make statistics accessible to all readers, particularly those without a strong background in mathematics. Jessica Bishop-Royse, the instructor of the video course, has divided the book's chapters into 3 lectures and presents examples to clarify the contents.

    Access to Lecture 1 and Lecture 2

  • Introduction to Statistics for Social Sciences: Lecture 1

    This video is one of a 3-lecture series that introduces students to statistics for social science research.  The lectures support the textbook "REVEL for Elementary Statistics in Social Science" by J. Levin, J.A. Fox, and D.R. Forde.  This video lecture covers the research process, and organizing and viewing data, topics covered in Chapters 1 and 2 of the Levin, Fox, and Forde text. The REVEL book contains a balanced overview of statistical analysis in the social sciences, providing coverage of both theoretical concepts and step-by-step computational techniques. Throughout this best-selling text, authors Jack Levin, James Alan Fox, and David R. Forde make statistics accessible to all readers, particularly those without a strong background in mathematics. Jessica Bishop-Royse, the instructor of the video course, has divided the book's chapters into 3 lectures and presents examples to clarify the contents.
    Access to Lecture 2 and Lecture 3.
     
  • Teaching and Learning with ICPSR

    These resources were created especially for undergraduate faculty and students. While any of ICPSR's data and tools can be used in the classroom, the ones provided here make it easy for instructors to set up data-driven learning experiences. The materials can be used as the basis for assignments, as an in-class or study exercise, for lecture content, or any other way you see fit. All resources are provided under a Creative Commons (attribution) License.

    A number of data-driven learning guides are provided; these are standardized exercises that introduce (or reinforce) key concepts in the social sciences by guiding students through a series of questions and related data analyses. Analyses are preset so students can focus on content rather than the mechanics of data analysis. To assist instructors with selection, guides are also categorized by the most sophisticated statistical test presented in the exercise.

    In addition, there are exercise modules made up of sequenced activities. While assignments may be created using a few of the exercises in a set, the full package must be used to meet the stated learning objectives for each. Exercise Sets are often appropriate for Research Methods courses and more substantively focused courses.

    Established in 1962, the Inter-university Consortium for Political and Social Research (ICPSR) provides leadership and training in data access, curation, and methods of analysis for a diverse and expanding social science research community. The ICPSR data archive is unparalleled in its depth and breadth; its data holdings encompass a range of disciplines, including political science, sociology, demography, economics, history, education, gerontology, criminal justice, public health, foreign policy, health and medical care, child care research, law, and substance abuse. ICPSR also hosts several sponsored projects focusing on specific disciplines or topics. Social scientists in all fields are encouraged to archive their data at ICPSR.

    ICPSR also provides guidelines related to the curation of social science data.  Specific data curation guidelines on data quality, access, preservation, confidentiality and citation are available as videos and other resources at http://www.icpsr.umich.edu/web/pages/datamanagement/index.html.

  • PID Platform

    The platform is designed to help people understand what persistent identifiers are, why they exist, what they're used for and how to use them. It's split into several sections, each aimed at different stakeholder groups.  The PID Platform was developed by Project THOR, a 30-month project funded by the European Commission under the Horizon 2020 programme. THOR aimed to establish seamless integration between articles, data, and researchers across the research lifecycle. The project created a wealth of open resources and fostered a sustainable international e-infrastructure. The result was reduced duplication, economies of scale, richer research services, and opportunities for innovation.  The work of the THOR project has been continued by the FREYA Project; find out more about FREYA at https://www.project-freya.eu/en.

    The PID platform is one product of THOR.  The best place to start is to choose one of the introductions to the stakeholder groups:
    -Introduction for integrators at:  https://project-thor.readme.io/v2.0/docs/introduction-for-integrators
    -Introduction for policy makers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-policy-makers
    -Introduction for publishers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-publishers
    -Introduction for researchers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-researchers
    -Introduction for librarians and repository managers at:  https://project-thor.readme.io/v2.0/docs/introduction-for-librarians-and...

    Other resources produced by THOR including webinar presentations, posters, etc., can be found from the Getting Started link.

  • Big Data Hadoop Tutorial for Beginners: Learn in 7 Days!

    Big Data is the latest buzzword in the IT industry. Apache’s Hadoop is a leading Big Data platform used by IT giants Yahoo, Facebook & Google. This free, step-by-step course is geared to make you a Hadoop expert. The online guide is designed for beginners, but knowledge of Java and Linux will help.  NOTE: The tutorial pages feature ads.  A minimal word-count sketch in the MapReduce style follows the contents list below.
    You can find these contents on this page:
    -Introduction to BIG DATA: What is, Types, Characteristics & Example
    -What is Hadoop? Introduction, Architecture, Ecosystem, Components
    -How to Install Hadoop with Step by Step Configuration on Ubuntu
    -HDFS Tutorial: Architecture, Read & Write Operation using Java API
    -What is MapReduce? How it Works - Hadoop MapReduce Tutorial
    -Hadoop & MapReduce Examples: Create your First Program
    -Hadoop MapReduce Join & Counter with Example
    -Apache Sqoop Tutorial: What is, Architecture, Example
    -Apache Flume Tutorial: What is, Architecture & Twitter Example
    -Hadoop Pig Tutorial: What is, Architecture, Example
    -Apache Oozie Tutorial: What is, Workflow, Example - Hadoop
    -Big Data Testing Tutorial: What is, Strategy, how to test Hadoop
    -Hadoop & MapReduce Interview Questions & Answers
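
    As promised above, here is a minimal word-count sketch in the style of Hadoop Streaming, where the mapper and reducer are ordinary scripts that read stdin and write stdout; this version simulates the map, shuffle/sort, and reduce phases locally in plain Python (the sample text is illustrative):

      import itertools

      def mapper(lines):
          # Map phase: emit (word, 1) for every word, like a streaming
          # mapper printing "word\t1" lines.
          for line in lines:
              for word in line.split():
                  yield (word.lower(), 1)

      def reducer(pairs):
          # Reduce phase: pairs arrive sorted by key (Hadoop's shuffle/sort),
          # so consecutive identical words can be summed with groupby.
          for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
              yield (word, sum(count for _, count in group))

      if __name__ == "__main__":
          text = ["big data is the latest buzzword",
                  "hadoop is a leading big data platform"]
          shuffled = sorted(mapper(text))  # stand-in for shuffle/sort
          for word, total in reducer(shuffled):
              print(f"{word}\t{total}")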