All Learning Resources
Getting Started With Labfolder
Labfolder is an electronic lab notebook that enables researchers to record findings and make new discoveries. By reinventing the traditional paper lab notebook, our productivity & collaboration platform makes it easier to create, find, share, discuss & validate research data as a team. This Getting Started guide will help you learn what Labfolder is and how to use it for data entry, protocols and standard operating procedures, data retrieval, collaboration, inventory and data security tasks.
Labfolder is free for a team of up to 3 scientists in academia; other pricing is available for larger teams in academia, for business and industry, and for classroom use.
Data Visualization with Power BI
Power BI is a cloud-based business analytics service from Microsoft that enables anyone to visualize and analyze data quickly and efficiently. It is a powerful and flexible tool for connecting to and analyzing a wide variety of data, and many businesses consider it indispensable for data-science-related work. Much of Power BI’s ease of use comes from its drag-and-drop interface, which makes tasks such as sorting, comparing, and analyzing data quick and easy. Power BI is also compatible with multiple data sources, including Excel, SQL Server, and cloud-based data repositories, which makes it an excellent choice for data scientists.
This tutorial will cover the following topics:
Overview of Power BI
Advantages of using Power BI
Building a Dashboard
Power BI’s integration with R & Python
Saving and Publishing
Principal Component Analysis (PCA) in Python
Principal Component Analysis (PCA) is a linear dimensionality reduction technique that extracts information from a high-dimensional space by projecting it onto a lower-dimensional subspace. It tries to preserve the directions along which the data vary most and to discard the directions with little variation.
In this tutorial, you will learn about PCA and how it can be leveraged to extract information from the data without any supervision using two popular datasets: Breast Cancer and CIFAR-10.
According to Wikipedia, PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
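The projection described above can be sketched in NumPy by centering the data and taking a singular value decomposition (SVD). This is a minimal illustration on synthetic data rather than the Breast Cancer or CIFAR-10 datasets, and it is not the tutorial's own code (which may rely on a library such as scikit-learn); the function name `pca` and the generated data are assumptions for the example.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project the rows of X onto the top principal components using SVD."""
    # Center each feature so the components describe variation around the mean
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, sorted by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance captured by each retained component
    variance = S**2 / (X.shape[0] - 1)
    explained_ratio = variance[:n_components] / variance.sum()
    # Orthogonal projection onto the lower-dimensional subspace
    scores = Xc @ components.T
    return scores, components, explained_ratio


rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Synthetic 3-D data that varies mostly along the direction (1, 2, 0)
X = np.column_stack([t, 2 * t, rng.normal(scale=0.1, size=200)])
scores, components, ratio = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
print(ratio)
```

Because the rows of `Vt` are orthonormal, the resulting scores are linearly uncorrelated, which matches the definition quoted above.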
Python Data Type Conversion Tutorial
In this Python tutorial, you'll tackle implicit and explicit data type conversion of primitive and non-primitive data structures with the help of code examples!
Every value in Python has a data type. Data types are a classification of data that tells the compiler or the interpreter how you want to use the data. The type defines the operations that can be done on the data and the structure in which you want the data to be stored. In data science, you will often need to change the type of data, so that it becomes easier to use and work with.
This tutorial will tackle some of the important and most frequently used data structures, and you will learn to change their types to suit your needs. More specifically, you will learn:
-Implicit and Explicit Data Type Conversion
-Primitive versus Non-primitive Data Structures
-Integer and Float Conversions
-Data Type Conversion with Strings
-Conversion to Tuples and Lists
-Binary, Octal, and Hexadecimal Integers in Python
Python has many data types, and you have probably already worked with some of them: integers and floats for numerical values, Booleans (bool) for true/false values, and strings for text (alphanumeric characters). Lists, tuples, dictionaries, and sets are data structures that store collections of values.
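A short sketch of several of the conversions listed above, using only Python built-ins (`int`, `float`, `str`, `list`, `tuple`, `set`, `dict`, `bin`, `oct`, `hex`). The variable names are illustrative, and the tutorial's own examples may differ.

```python
# Implicit conversion: Python promotes int to float in mixed arithmetic
total = 3 + 1.5
print(type(total).__name__, total)  # float 4.5

# Implicit conversion does not happen between str and int
try:
    "3" + 4
except TypeError:
    print("str and int cannot be added without explicit conversion")

# Explicit conversion with built-in constructors
as_int = int(7.9)         # truncates toward zero: 7
as_float = float("2.25")  # parses a numeric string: 2.25
as_str = str(42)          # "42"
summed = int("3") + 4     # 7 after explicit conversion

# Non-primitive conversions between collections
letters = list("abc")               # ['a', 'b', 'c']
as_tuple = tuple(letters)           # ('a', 'b', 'c')
unique = set([1, 1, 2])             # {1, 2}
pairs = dict([("x", 1), ("y", 2)])  # {'x': 1, 'y': 2}

# Binary, octal, and hexadecimal representations of integers
n = 255
print(bin(n), oct(n), hex(n))  # 0b11111111 0o377 0xff
print(int("ff", 16), int("0b1010", 2), int("17", 8))  # 255 10 15
```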
Reading and Importing Excel Files into R
This R tutorial will help you understand how to read and import spreadsheet files using base R and dedicated packages.
This tutorial on reading and importing Excel files into R will give an overview of some of the options that exist to import Excel files and spreadsheets of different extensions to R. Both basic commands in R and dedicated packages are covered. At the same time, some of the most common problems that you can face when loading Excel files and spreadsheets into R will be addressed.
Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing, and storing data in tables, and it is widely used in many application fields around the world. It should come as no surprise, then, that R offers several ways to read, write, and manipulate Excel files (and spreadsheets in general).
SQL Tutorial: How to Write Better Queries
In this tutorial, you will learn about anti-patterns, execution plans, time complexity, query tuning, and optimization in SQL. You’ll start with a short overview of why learning SQL matters for jobs in data science. Next, you’ll learn how SQL queries are processed and executed, so that you can understand why writing high-quality queries is important: more specifically, you’ll see that a query is parsed, rewritten, optimized, and finally evaluated. You’ll also learn about the set-based versus the procedural approach to querying. You’ll then look briefly at time complexity and big O notation, to get an idea of the time complexity of an execution plan before you execute your query. Lastly, you’ll get some pointers on how to tune your query further.
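One way to inspect an execution plan, and to see how an index changes it, is SQLite's `EXPLAIN QUERY PLAN`, used here through Python's built-in `sqlite3` module. This is a sketch under the assumption that SQLite is a reasonable stand-in: the tutorial may target another database, plan text differs between engines and SQLite versions, and the `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(10_000)],
)

query = "SELECT SUM(amount) FROM orders WHERE customer = ?"


def plan(sql, params=()):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the description
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(query, ("c7",))  # expect a full-table SCAN
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query, ("c7",))   # expect a SEARCH using the new index
print(before)
print(after)
```

Without the index, the planner must scan every row (roughly O(n)); with it, SQLite can search the index B-tree (roughly O(log n)) and then read only the matching rows.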
Terms for use of this information can be found at: https://www.datacamp.com/terms-of-use
Introduction to code versioning and collaboration with Git and GitHub: An EDI VTC Tutorial.
This tutorial is an introduction to code versioning and collaboration with Git and GitHub. Tutorial goals are to help you:
- Understand basic Git concepts and terminology.
- Apply concepts as Git commands to track versioning of a developing file.
- Create a GitHub repository and push local content to it.
- Clone a GitHub repository to the local workspace to begin developing.
- Inspire you to incorporate Git and GitHub into your workflow.
There are a number of exercises within the tutorial to help you apply the concepts learned.
Follow-up questions can be directed via email to Colin Smith and Susanne Grossman-Clarke.
Transform and visualize data in R using the packages tidyr, dplyr and ggplot2: An EDI VTC Tutorial.
The two tutorials, presented by Susanne Grossman-Clarke, demonstrate how to tidy data in R with the package “tidyr” and transform data with the package “dplyr”. These transformations prepare data for visualization with the package “ggplot2”, both for data analysis and for scientific publications; examples of both were shown.
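The tidy-then-summarize idea behind these tutorials can be illustrated without R. The sketch below reshapes a hypothetical wide table into long form (conceptually what `tidyr::pivot_longer()` does) and then groups and averages it (conceptually `dplyr::group_by()` followed by `summarise()`), using only the Python standard library. It is an analogy, not a port of those packages, and the data are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical "wide" table: one column per year
wide = [
    {"site": "A", "2019": 4.0, "2020": 5.0},
    {"site": "B", "2019": 6.0, "2020": 8.0},
]

# Tidy ("long") form: one row per site-year observation
long_rows = [
    {"site": row["site"], "year": int(year), "value": value}
    for row in wide
    for year, value in row.items()
    if year != "site"
]

# Group by year and summarise each group with its mean
by_year = defaultdict(list)
for r in long_rows:
    by_year[r["year"]].append(r["value"])
summary = {year: mean(vals) for year, vals in sorted(by_year.items())}
print(summary)  # {2019: 5.0, 2020: 6.5}
```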
Large-Scale Multi-view Data Analysis
Multi-view data are widely available nowadays, arising from various types of features, different viewpoints, and different sensors. For example, the most popular commercial depth sensor, Kinect, uses both visible-light and near-infrared sensors for depth estimation; autonomous driving uses both visual and radar sensors to produce real-time 3D information about the road; and face analysis algorithms prefer face images from different views for high-fidelity reconstruction and recognition. All of these tend to yield better data representations in their respective application scenarios. Essentially, multiple features attempt to uncover different knowledge within each view to facilitate the final task, since each view preserves both shared and private information. This becomes increasingly common in the era of “Big Data”, where data are large-scale, subject to corruption, generated from multiple sources, and have complex structures. Although these problems have recently attracted substantial research attention, a systematic overview of multi-view learning for Big Data analysis has never been given. In the face of big data and challenging real-world applications, we summarize and review the most recent multi-view learning techniques appropriate to different data-driven problems. Specifically, our tutorial covers most multi-view data representation approaches, centered on two major Big Data applications: multi-view clustering and multi-view classification. It also discusses current and upcoming challenges, which should benefit the community in both industry and academia, from literature review to future directions. This tutorial, available in PDF format, is one of nine tutorials from the 2018 IEEE International Conference on Big Data in Seattle, WA; you can find the others at IEEE Big Data 2018 Tutorials.
When Metadata Collides: Lessons on Combining Records from Multiple Repository Systems
Institution X likes to use Dublin Core and enjoys occasionally storing coordinates in the dc:rights field along with normal rights statements. Institution Y prefers PBCore and dabbles in storing LCSH subject strings as a type of corporation. What happens when the time comes for these two institutions to put their data in a shared environment? These are the issues the Boston Public Library has faced while building a statewide digital repository for Massachusetts, made up of items from dozens of organizations that each have their own way of doing metadata. This talk covers the Digital Commonwealth initiative and our role as a DPLA hub, lessons learned while dealing with other institutions' data, and how we manage a repository system that contains actual digitized objects alongside metadata-only harvested records. In addition, part of the talk is about breaking with the conventional library wisdom of "dumbing down" data to the lowest common denominator in a shared context. Instead, we go in the opposite direction: we make what we take in much richer and more discoverable by linking terms to controlled vocabularies, parsing subjects for geographic information, parsing potential dates from various fields into a standard format, and more.
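The talk does not show its code, but one step it mentions, parsing potential dates from various fields into a standard format, can be sketched in plain Python. The `normalize_date` helper and its list of accepted patterns are hypothetical, and month names are assumed to be in English.

```python
import re
from datetime import datetime

# Hypothetical (input format, output format) pairs, tried in order
PATTERNS = [
    ("%Y-%m-%d", "%Y-%m-%d"),    # 1923-03-05
    ("%B %d, %Y", "%Y-%m-%d"),   # March 5, 1923
    ("%d %B %Y", "%Y-%m-%d"),    # 5 March 1923
    ("%B %Y", "%Y-%m"),          # March 1923
]


def normalize_date(raw):
    """Return an ISO 8601 style date (YYYY, YYYY-MM, or YYYY-MM-DD), or None."""
    s = raw.strip()
    for fmt_in, fmt_out in PATTERNS:
        try:
            return datetime.strptime(s, fmt_in).strftime(fmt_out)
        except ValueError:
            continue
    # Fall back to a plausible four-digit year, e.g. "ca. 1920" or "[1935?]"
    m = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", s)
    return m.group(1) if m else None


for value in ["March 5, 1923", "5 March 1923", "March 1923", "ca. 1920", "undated"]:
    print(value, "->", normalize_date(value))
```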
This presentation was part of Open Repositories 2014, Helsinki, Finland, June 9-13, 2014; General Track, 24x7 Presentations
The slides are available in PDF format at: https://www.doria.fi/bitstream/handle/10024/97750/metadata_presentation.pdf?sequence=2&isAllowed=y
Research Data Management In The Arts and Humanities
In recent times the principal focus for research data management protagonists has been upon scientific data, due perhaps to a combination of conspicuous Government or funder declarations with a bias towards the sciences and the very public consciousness of examples of 'big data', notably the output from CERN's Large Hadron Collider.
That is not to say that developments in the management of Arts and Humanities data have been absent, merely occluded. We aim to take some steps towards rectifying this situation with RDMF10, which will examine what it is about Arts and Humanities data that may require a different kind of handling to that given to other disciplines, how the needs for support, advocacy, training, and infrastructure are being supplied and, consequently, what are the strengths and weaknesses of the current arrangements for data curation and sharing.
The broad aims of the event were:
-To examine aspects of Arts and Humanities data that may require a different kind of handling to that given to other disciplines;
-To discuss how needs for support, advocacy, training, and infrastructure are being described and met;
-And consequently, to assess the strengths and weaknesses of the current arrangements for Arts and Humanities data curation and sharing, and brainstorm ways forward.
- Introduction - Martin Donnelly, DCC
- Keynote 1: "What’s so different about Arts and Humanities data?" - Professor David De Roure, Director, Oxford e-Research Centre
- Keynote 2: "Err, what do I do with this? Exploring infrastructure requirements for visual arts researchers" - Leigh Garrett, Director, Visual Arts Data Service
- Researcher support and development requirements - Simon Willmoth, Director of Research Management and Administration, University of the Arts London
- Advocacy and outreach - Stephanie Meece, Scholarly Communications Librarian, University of the Arts London
- A researcher's view on Arts and Humanities data management/sharing (with a focus on infrastructure needs and wants) - Dr Julianne Nyhan, Lecturer in Digital Information Studies, University College London
- Data and the Sonic Art Research Unit - Professor Paul Whitty and Dr Felicity Ford, Oxford Brookes University
- Institutional case study: Research data management in the humanities: A non-Procrustean infrastructure - Sally Rumsey, Janet McKnight and Dr James A. J. Wilson, University of Oxford
- Linking institutional, national and international infrastructures - Sally Chambers, DARIAH
Principles Of Data Curation And Preservation
-Definitions of data curation and preservation
-Stages of data curation and preservation
-Top tips for data curation and preservation
-Real-life examples of data curation and preservation
-Key references to find out more about data curation
-Recognize the basic principles of appropriately curating and preserving research data
-Outline the essentials of good data management practice
-Identify the reasons for good data management
Archiving The Artist: An Introduction To Digital Curation
-Identify the advantages and challenges of digitizing art archives
-Recognize the basic principles of digitization and digital curation
-Compare and evaluate digitized collections in the visual arts
-Apply the knowledge acquired in this session to begin to develop your own digital collection
This DBMS tutorial covers basic and advanced database concepts. A database management system (DBMS) is software used to manage a database. The tutorial includes all major DBMS topics, such as an introduction, the ER model, keys, the relational model, join operations, SQL, functional dependency, transactions, and concurrency control.
A DBMS allows users to perform the following tasks:
Data Definition: creating, modifying, and removing the definitions that determine how data are organized in the database.
Data Updating: inserting, modifying, and deleting the actual data in the database.
Data Retrieval: retrieving data from the database so that applications can use it for various purposes.
User Administration: registering and monitoring users, maintaining data integrity, enforcing data security, handling concurrency control, monitoring performance, and recovering information corrupted by unexpected failures.
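The first three tasks map directly onto SQL statements. The sketch below uses SQLite through Python's `sqlite3` module as an assumed example engine, with a hypothetical `student` table; user administration is omitted because SQLite, unlike server-based DBMSs, has no built-in user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table, then alter its structure
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE student ADD COLUMN year INTEGER")

# Data updating: insert, modify, and delete rows
conn.executemany(
    "INSERT INTO student (name, year) VALUES (?, ?)",
    [("Ada", 1), ("Alan", 2), ("Grace", 3)],
)
conn.execute("UPDATE student SET year = year + 1 WHERE name = ?", ("Ada",))
conn.execute("DELETE FROM student WHERE name = ?", ("Alan",))
conn.commit()

# Data retrieval: query the current contents
rows = conn.execute("SELECT name, year FROM student ORDER BY name").fetchall()
print(rows)  # [('Ada', 2), ('Grace', 3)]
```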
Subtopics covered under this tutorial include:
What is a Database
Types of Databases
What is RDBMS
DBMS vs RDBMS
DBMS vs File System
Three schema Architecture
Data model schema
ACID Properties in DBMS
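Atomicity, the "A" in ACID, means that a transaction either completes entirely or has no effect. A minimal sketch, assuming SQLite via Python's `sqlite3` module and a hypothetical `account` table: a transfer that would violate a `CHECK` constraint is rolled back as a whole, including the credit that had already been applied within the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    # 'with conn' commits on success and rolls back if an exception escapes
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))


transfer("alice", "bob", 30)  # succeeds
try:
    transfer("alice", "bob", 500)  # debit violates CHECK, so both updates are undone
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 70, 'bob': 80}
```

The `CHECK` constraint also illustrates consistency: the database refuses any state with a negative balance.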
SAR for Landcover Applications [Advanced]
This webinar series will build on the knowledge and skills previously developed in ARSET SAR training. Presentations and demonstrations will focus on agriculture and flood applications. Participants will learn to characterize floods with Google Earth Engine. Participants will also learn to analyze synthetic aperture radar (SAR) for agricultural applications, including retrieving soil moisture and identifying crop types.
Learning Objectives: By the end of this training, attendees will be able to:
- analyze SAR data in Google Earth Engine
- generate soil moisture analyses
- identify different types of crops
- This webinar series will consist of two, two-hour parts
- Each part will include a presentation on the theory of the topic followed by a demonstration and exercise for attendees.
- This training is also available in Spanish. Please visit the Spanish page for more information.
- A certificate of completion will also be available to participants who attend all sessions and complete the homework assignment, which will be based on the webinar sessions. Note: certificates of completion only indicate that the attendee participated in all aspects of the training; they do not imply proficiency in the subject matter, nor should they be seen as a professional certification.
Prerequisites are not required for this training, but attendees that do not complete them may not be adequately prepared for the pace of the training.
- Introduction to Synthetic Aperture Radar
- Advanced Webinar: Radar Remote Sensing for Land, Water, and Disaster Applications
Part One: Monitoring Flood Extent with Google Earth Engine
This session will focus on the use of Google Earth Engine (GEE) to generate flood extent products using SAR images from Sentinel-1. The first third of the session will cover the basic principles of radar remote sensing related to flooded vegetation. The remaining time in the session will be dedicated to a demonstration on how to use GEE to generate flood extent products with Sentinel-1.
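A common starting point for SAR flood mapping is that calm open water reflects the radar signal away from the sensor and appears dark (low backscatter). The NumPy sketch below flags pixels that are dark after an event and have darkened relative to a pre-event image. It is not the GEE workflow from the session; the thresholds (`water_db=-18.0`, `drop_db=3.0`) and the tiny arrays are illustrative assumptions. Flooded vegetation often shows the opposite effect (brighter returns from double-bounce scattering), which this simple rule does not capture.

```python
import numpy as np


def to_db(sigma0_linear):
    """Convert linear backscatter (sigma0) to decibels."""
    return 10.0 * np.log10(sigma0_linear)


def flood_mask(pre_db, post_db, water_db=-18.0, drop_db=3.0):
    """Flag pixels that are dark after the event and darker than before.

    water_db and drop_db are illustrative thresholds, not values from the training.
    """
    dark_after = post_db < water_db
    darkened = (pre_db - post_db) > drop_db  # excludes permanent water
    return dark_after & darkened


# Hypothetical 2x3 pre- and post-event scenes in linear power units
pre = np.array([[0.10, 0.08, 0.005], [0.12, 0.09, 0.004]])
post = np.array([[0.01, 0.07, 0.005], [0.008, 0.09, 0.004]])
mask = flood_mask(to_db(pre), to_db(post))
print(mask)
```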
Part Two: Exploiting SAR to Monitor Agriculture
Featuring guest speaker Dr. Heather McNairn, from Agriculture and Agri-Food Canada, this session will focus on using SAR to monitor different agriculture-related topics, building on the skills learned in the SAR agriculture session from 2018. The first part of the session will cover the basics of radar remote sensing as related to agriculture. The remainder of the session will focus on the use of SAR to retrieve soil moisture, identify crop types, and map land cover.
Each of the two parts includes links to the recordings, presentation slides, and Question & Answer transcripts.
Forest Mapping and Monitoring with SAR Data [Advanced]
Measurements of forest cover and change are vital to understanding the global carbon cycle and the contribution of forests to carbon sequestration. Many nations are engaged in international agreements, such as the Reducing Emissions from Deforestation and Degradation (REDD+) initiative, which includes tracking annual deforestation rates and developing early warning systems for forest loss. Remote sensing data are integral to collecting these metrics; however, using optical remote sensing to monitor forest health can be challenging in tropical, cloud-prone regions.
Radar remote sensing overcomes these challenges because of its ability to “see” the surface through clouds, day or night. In addition, the radar signal can penetrate the vegetation canopy and provide information about structure and density. Although the capabilities and benefits of SAR data for forest mapping and monitoring are known, SAR is underutilized operationally because of data complexity and a shortage of user-friendly tutorials.
This advanced webinar series will introduce participants to 1.) SAR time series analysis of forest change using Google Earth Engine (GEE), 2.) land cover classification with radar and optical data in GEE, 3.) mapping mangroves with SAR, and 4.) forest stand height estimation with SAR. Each training will include a theoretical portion describing the use of SAR for land cover mapping as related to the focus of the session, followed by a demonstration that will show participants how to access, download, and analyze SAR data for forest mapping and monitoring. These demonstrations will use freely available, open-source data and software.
Learning Objectives: By the end of this training, attendees will be able to:
- Interpret radar data for forest mapping
- Understand how radar data can be applied to land cover mapping
- Become familiar with open source tools used to analyze radar data
- Conduct a land cover classification with radar and optical data
- Map mangrove forests with radar data
- Understand how forest stand height can be mapped using radar data
- Apply SAR time-series analysis to map forest change
- Learn about upcoming radar missions at NASA
- Four parts with sessions offered in English and Spanish
- Four exercises
- One Google Form homework
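Prerequisites: Attendees who have not completed the following, or who lack equivalent experience, may not be prepared for the pace of this training: Fundamentals of Remote Sensing, Introduction to Synthetic Aperture Radar, and SAR for Landcover Applications.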
Part 1: Time Series Analysis of Forest Change
- Introduction to analysis and interpretation of SAR data for forest mapping
- Exercise: Time Series of Forest Change using GEE
Part 2: Land Cover Classification with Radar and Optical Data
- Review of the unique attributes of radar and optical data as related to forest mapping and how they can be complementary
- Classification algorithms and improvements with optical imagery
- Exercise: Land Cover Classification with Radar and Optical using GEE
Part 3: Mangrove Mapping
- Introduction to analysis and interpretation of SAR data for mangrove mapping
- Exercise: Mapping Mangroves with the Sentinel Toolbox
Part 4: Forest Stand Height (with Guest Speaker Paul Siqueira)
- Introduction to the use of SAR data for mapping forest stand height
- Applications and looking forward to NISAR 2022
- Demo: Estimating Forest Stand Height
Each of the four parts includes links to the recordings, presentation slides, exercises, and Question & Answer transcripts.
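The time-series idea from Part 1 can be sketched as a comparison of mean backscatter between a "before" and an "after" period. This NumPy example assumes a stack of co-registered images already in dB, and it assumes forest loss shows up as a drop in backscatter, which is typical for cross-polarized channels but not guaranteed. The function name, the `-3.0` dB threshold, and the sample values are hypothetical; it is not the GEE exercise itself.

```python
import numpy as np


def forest_loss_mask(stack_db, split, threshold_db=-3.0):
    """Flag pixels whose mean backscatter drops between two periods.

    stack_db: array of shape (time, rows, cols) in dB.
    split: index separating the 'before' images from the 'after' images.
    """
    before = stack_db[:split].mean(axis=0)
    after = stack_db[split:].mean(axis=0)
    # A difference of dB means is a log-ratio of geometric-mean intensities
    change = after - before
    return change < threshold_db, change


# Hypothetical 4-date stack over a 1x2 area: pixel 0 stable, pixel 1 cleared
stack = np.array([
    [[-12.0, -12.0]],
    [[-12.5, -11.5]],
    [[-12.0, -17.0]],
    [[-11.5, -17.5]],
])
loss, change = forest_loss_mask(stack, split=2)
print(loss)    # [[False  True]]
print(change)  # [[ 0.5 -5.5]]
```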
Forest Mapping and Monitoring with SAR Data [Advanced] (Spanish edition)
This advanced training will cover the following topics: 1) analysis of forest change with multi-temporal SAR data using Google Earth Engine (GEE); 2) land cover classification with SAR and optical data using GEE; 3) mangrove mapping with SAR; and 4) forest height estimation using SAR. Each session will include a theoretical portion describing the use of SAR for land cover mapping relevant to the session's focus, followed by a demonstration of how to access, download, and analyze SAR data for forest mapping and monitoring. These demonstrations use freely available, open-source data and software.
Learning Objectives:
By the end of this training, participants will be able to:
- Interpret radar data for forest mapping
- Understand how radar data can be applied to land cover mapping
- Become familiar with open-source tools for analyzing radar data
- Perform a land cover classification with radar and optical data
- Map mangroves with radar data
- Understand how forest stand height can be mapped with radar data
- Apply SAR time-series analysis to map forest change
- Learn about upcoming NASA radar missions
Course Format:
- Four parts with sessions available in English and Spanish
- Four exercises
- One Google Form homework assignment
- A certificate of completion will be available to participants who attend all sessions and complete the homework, which will be based on the webinar sessions. Note: certificates of completion only indicate that the holder participated in all aspects of the training; they do not imply proficiency in the subject matter, nor should they be seen as a professional certification.
Complete Fundamentals of Remote Sensing, Introduction to Synthetic Aperture Radar, and SAR for Landcover Applications, or have equivalent experience. Participants who do not complete the prerequisites may not be adequately prepared for the pace of the training.
You can follow the demonstrations using the software listed below. Recordings of each part will be available on YouTube within 24 hours of each demonstration so that you can review them at your own pace.
Part One: SAR for Flood Mapping Using Google Earth Engine
Google Earth Engine
Part Two: Interferometric SAR for Landslide Observation
Part Three: Generating a Digital Elevation Model (DEM)
For both parts, the presenters will use the Sentinel-1 Toolbox
Part One: Analysis of Forest Change with Multi-Temporal SAR Data
• Introduction to the analysis and interpretation of SAR data for forest mapping
• Exercise: Multi-temporal SAR data for forest change analysis using GEE
• Question and answer session
Part Two: Land Cover Classification with SAR and Optical Data
• Review of the characteristics of SAR and optical data relevant to forest mapping and how they can complement each other
• Algorithms for classification with optical imagery
• Exercise: Land cover classification with SAR and optical data using GEE
• Question and answer session
Part Three: Mangrove Mapping
• Introduction to the analysis and interpretation of SAR data for mangrove mapping
• Exercise: Mapping mangroves with the Sentinel Toolbox
• Question and answer session
Part Four: Forest Height Estimation with SAR (Guest Presenter Dr. Paul Siqueira)
• Introduction to the use of SAR data for estimating forest height
• Applications and looking ahead to NISAR in 2022
• Demo: Estimating forest height
• Question and answer session
SQL for Data Analysis – Tutorial for Beginners – ep1
SQL is simple and easy to understand. Thus, not just engineers, developers or data analysts/scientists can use it, but anyone who is willing to invest a few days into learning and practicing it.
The author has created this SQL series to be the most practical and most hands-on SQL tutorial for aspiring Data Analysts and Data Scientists. It will start from the very beginning, so if you have never touched coding/programming/querying, that won’t be an issue!
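Much beginner SQL analysis combines filtering, grouping, aggregation, and sorting in a single `SELECT`. The sketch below runs such a query against an in-memory SQLite database through Python's `sqlite3` module; the `sales` table and its rows are made up for illustration, and the series itself may use a different database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("north", "tea", 10), ("north", "coffee", 5),
    ("south", "tea", 7), ("south", "tea", 8), ("south", "coffee", 12),
    ("east", "tea", 2),
])

# WHERE filters rows, GROUP BY forms groups, HAVING filters groups,
# and ORDER BY sorts the aggregated result
query = """
SELECT region, SUM(units) AS total_units, COUNT(*) AS n_rows
FROM sales
WHERE product = 'tea'
GROUP BY region
HAVING SUM(units) > 5
ORDER BY total_units DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('south', 15, 2), ('north', 10, 1)]
```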
Webinar: Introduction to QAMyData ‘health-check’ tool for numeric data
This webinar is an introduction to the new QAMyData tool for health-checking your numeric data, launched in November 2019.
The tool uses automated methods to detect and report on some of the most common problems found in survey and other numeric data, such as missingness, duplication, outliers, and direct identifiers. The open-source tool helps data creators and users quality-assess a numeric data file using a comprehensive list of ‘tests’, classified into four types: file, metadata, data integrity, and direct identifiers. Popular file formats can be tested, including SPSS, Stata, SAS, and CSV. The test configuration feature lets you create your own Data Quality Profile, which can play a useful role in your ‘FAIR’ data checking.
The webinar will describe the tests included in the tool, how to configure them to meet your own quality thresholds, and how to download the software from our GitHub page. The presenters will also show their teaching exercise, which uses messy data to help promote data management skills.
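QAMyData's own tests and configuration format are not reproduced here. As a hypothetical illustration of three of the checks named above (missingness, duplication, and outliers), the plain-Python sketch below runs on a made-up list of records; the 1.5*IQR outlier rule is a common convention chosen for the example, not necessarily the tool's rule.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical survey extract; None marks a missing value
rows = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 29, "income": None},
    {"id": 3, "age": None, "income": 61000},
    {"id": 4, "age": 41, "income": 58000},
    {"id": 4, "age": 41, "income": 58000},  # exact duplicate record
    {"id": 5, "age": 250, "income": 49000},  # implausible value
    {"id": 6, "age": 38, "income": 55000},
]
columns = ["id", "age", "income"]


def missing_counts(rows, columns):
    """Count missing (None) values per column."""
    return {c: sum(r[c] is None for r in rows) for c in columns}


def duplicate_count(rows, columns):
    """Count rows that repeat an earlier row exactly."""
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    return sum(n - 1 for n in counts.values() if n > 1)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing ones."""
    vals = [v for v in values if v is not None]
    q1, _, q3 = quantiles(vals, n=4, method="inclusive")
    spread = q3 - q1
    return [v for v in vals if v < q1 - k * spread or v > q3 + k * spread]


print(missing_counts(rows, columns))           # {'id': 0, 'age': 1, 'income': 1}
print(duplicate_count(rows, columns))          # 1
print(iqr_outliers([r["age"] for r in rows]))  # [250]
```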
Introduction to Data Mining
All great learning opportunities are built on a solid foundation. This data mining fundamentals series is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running. This page provides a playlist of 24 video sessions on the basics of data mining. Topics in the introductory session include:
– Data and Data Types
– Data Quality
– Data Preprocessing
– Similarity and Dissimilarity
– Data Exploration and Visualization
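For the "Similarity and Dissimilarity" topic, three widely used measures are Euclidean distance, cosine similarity, and the Jaccard index. A minimal plain-Python sketch; the function names are illustrative, and the series may define or implement these measures differently. A similarity in [0, 1] can be turned into a dissimilarity as 1 minus the similarity.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors (a dissimilarity)."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def jaccard_similarity(s, t):
    """Size of the intersection over size of the union of two sets."""
    s, t = set(s), set(t)
    union = s | t
    return len(s & t) / len(union) if union else 1.0


print(euclidean_distance((0, 0), (3, 4)))                    # 5.0
print(cosine_similarity((1, 0), (0, 1)))                     # 0.0
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```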
Effectively Managing And Sharing Research Data In Spreadsheets
Regardless of the discipline studied or methods used, researchers are likely to use some sort of spreadsheet application (e.g., Excel, Google Sheets) to investigate, manipulate, or share research data. Although spreadsheets are easy to exchange with other researchers, difficulties can arise if sharing and preservation are not adequately considered. Join JHU Data Management Services for our online training course, “Effectively Managing and Sharing Research Data in Spreadsheets”. NOTE: You will be required to log in to Blackboard to use the resources, but you can do so as a guest.
We will cover spreadsheet practices that can:
1-Increase the likelihood that research data contained in spreadsheets can be reused by others (and by you) in the future, and
2-Help to reduce the chance of error when using spreadsheets for data acquisition.
-Introduction (7 mins lecture and 1 min demo)
-Prior to Data Collection (7 mins lecture)
-Track Versions and Provenance (5 mins lecture and 15 mins demos)
-Data Quality Control (10 mins lecture and 9 mins demos)
-Documenting Data, Values, and Variables (9 mins lecture and 14 mins demos)
-Data Sharing for Spreadsheets (9 mins lecture and 6 mins demos)
-Using Other People’s Spreadsheet Data (8 mins lecture and 35 mins demos)
-Resources (4 mins lecture and 53 mins demos)
A Toolbox for Curating and Archiving Research Software for Data Management Specialists
This page presents a set of tools, resources, approaches, and questions that allow researchers or research data management specialists to address potential knowledge gaps in providing software archiving and/or preservation services as a companion to data services. Click each topic to learn more about software sharing and archiving.
36 Tutorials To Excel At MS Excel Spreadsheets
Considering the internet is filled with free and inexpensive classes, it makes sense that you can find a wide range of Microsoft Excel tutorials to guide you through the process. What’s cool about these tutorials is that combining many of them together often gives you a more in-depth look into Excel than a regular college course would.
- Why You Should Learn Excel
- Excel Basics Tutorials
- Advanced Excel Mathematics Tutorials
- Excel Database Tutorials
- MS Excel Functions Tutorials
- Excel Graphing Tutorials
- Excel Printing Tutorials
- Other Small Business Resources
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehouse Architecture
This tutorial on data warehouse concepts will tell you everything you need to know about data warehousing and business intelligence. The data warehouse concepts explained in this video are:
1. What Is Data Warehousing?
2. Data Warehousing Concepts:
3. OLAP (On-Line Analytical Processing)
4. Types of OLAP Cubes
5. Dimensions, Facts & Measures
6. Data Warehouse Schema
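A typical data warehouse schema is the star schema: a central fact table holds measures (here, `revenue`) plus foreign keys into dimension tables (here, date and product). The sketch below builds a toy star schema in SQLite and performs a roll-up, an OLAP operation that aggregates a measure to a coarser level of a dimension (quarter up to year). SQLite is not an OLAP engine and is used only as an assumed, readily available stand-in; all table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_id INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);
INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2), (3, 2024, 1);
INSERT INTO dim_product VALUES (10, 'books'), (20, 'games');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 50.0), (2, 20, 80.0), (3, 20, 120.0);
""")

# Roll-up: join the fact table to its dimensions and aggregate by year and category
rollup = conn.execute("""
SELECT d.year, p.category, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.year, p.category
ORDER BY d.year, p.category
""").fetchall()
print(rollup)  # [(2023, 'books', 150.0), (2023, 'games', 80.0), (2024, 'games', 120.0)]
```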
Check our complete Data Warehousing & Business Intelligence playlist here.
A Complete Guide To Math And Statistics For Data Science
Math and statistics are essential for data science because these disciplines form the foundation of all machine learning algorithms. In fact, mathematics is behind everything around us, from shapes, patterns, and colors to the number of petals in a flower. Although a good understanding of programming languages, machine learning algorithms, and a data-driven approach is necessary to become a data scientist, data science isn’t only about these fields.
In this blog post, you will understand the importance of math and statistics for data science and how they can be used to build machine learning models. Here’s a list of the topics the author will cover in this Math and Statistics for Data Science blog:
-Introduction to Statistics
-Terminologies in Statistics
-Categories in Statistics
-Understanding Descriptive Analysis
-Descriptive Statistics In R
-Understanding Inferential Analysis
-Inferential Statistics In R
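The post uses R, but the split between descriptive and inferential statistics can be illustrated with Python's standard `statistics` module. In this sketch the sample data are made up, and the 95% confidence interval uses a normal (z) approximation; for a sample this small, a t-based interval would be more appropriate.

```python
import math
from statistics import NormalDist, mean, median, mode, pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: summarise the sample itself
summary = {
    "mean": mean(data),
    "median": median(data),
    "mode": mode(data),
    "pstdev": pstdev(data),  # population standard deviation (divides by n)
    "stdev": stdev(data),    # sample standard deviation (divides by n - 1)
}

# Inferential statistics: approximate 95% confidence interval for the population mean
z = NormalDist().inv_cdf(0.975)          # about 1.96
se = stdev(data) / math.sqrt(len(data))  # standard error of the mean
ci = (mean(data) - z * se, mean(data) + z * se)
print(summary)
print(ci)
```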
ASU Library Data Management Tutorials
The contents include:
-Introduction to Research Data Management tutorial: This tutorial provides an overview of the importance of your data management plan and best practices to help you manage research data.
-Writing a Research Data Management Plan tutorial: This tutorial discusses the steps for writing a detailed and effective research data management plan.
-Use the DMPTool to Write a Plan tutorial: This tutorial gives a basic overview of using the DMPTool to help individuals create a data management plan.
A Complete Python Tutorial to Learn Data Science from Scratch
In this tutorial, you will learn data science using Python from scratch. It will also help you learn basic data analysis methods in Python and enhance your knowledge of machine learning algorithms.
Table of Contents
1-Basics of Python for Data Analysis
- Why learn Python for data analysis?
- Python 2.7 v/s 3.4
- How to install Python?
- Running a few simple programs in Python
2-Python libraries and data structures
- Python Data Structures
- Python Iteration and Conditional Constructs
- Python Libraries
3-Exploratory analysis in Python using Pandas
- Introduction to series and data frames
- Analytics Vidhya dataset: Loan Prediction Problem
4-Data Munging in Python using Pandas
5-Building a Predictive Model in Python
- Logistic Regression
- Decision Tree
- Random Forest