DATA SCIENCE

HOMESCHOOL CONNECTIONS ONLINE

Course Description

This course is a two-semester exploration of the many topics associated with data science. Data is available almost everywhere: in agriculture, the medical fields, cybersecurity, manufacturing, and many other industries, and in organizations ranging from the small family business to big-data corporations like Google. Working with that data lets people find correlations, visualize the data in a variety of charts and plots, identify outliers that depart from the larger dataset or its trends, and predict future outcomes based on variable inputs. These are just some of the ways data helps people draw valuable insights from otherwise chaotic and disconnected pieces of information.

Because data science can be applied in so many working environments, its study is no longer limited to those interested in a career in Information Technology (IT). Data science is becoming one of the fastest-growing professional careers because it can find a “home” in so many industries.

 

Prerequisites
  • A working knowledge of algebra, including polynomial equations, algebraic reasoning, and problem-solving, is recommended.

  • An understanding of matrix mathematics and statistics is helpful but NOT required – they will be discussed in the lectures.

  • Previous computer programming experience. Python is preferred, but other programming languages are acceptable. Computer Programming 101 and/or Introduction to Computer Science (both available as recorded courses through Unlimited Access) would provide sufficient prerequisite experience. Much of the analysis will be done with Python programs.

  • General familiarity with computers, including the ability to open applications, use menu-driven commands, and type on a keyboard, so that the lessons can focus on the programming assignments and related data-science topics.

Course Outline

Topics are subject to minor changes. Topics will be interspersed throughout the lectures and will span multiple weeks.

PART 1 (Fall Semester)

  • Data Science

    • What is it?

    • Who uses it?

    • Workflows and methodologies used by data scientists

  • Python programming for data science

    • The development environment (Anaconda, Jupyter Notebook, and Spyder)

    • Review of Python programming fundamentals and Python data types (variables, lists, dictionaries, etc.)

    • Python functions and some of the Python modules we will be using (pandas, NumPy, scikit-learn, and more)

  • Data Analysis

    • Exploring data sets of various types (sales data, website visitor logs, user profile data, etc.)

    • Cleaning "dirty" datasets

    • Review of (or introduction to) statistical math methods

    • Data visualization in Python and spreadsheet applications

PART 2 (Spring Semester)

  • Data Modeling

    • Data classification

    • Linear Regression

    • Logistic Regression

    • Bias–variance tradeoff

    • ...and more

  • Machine Learning

    • Natural language processing (text mining)

    • Decision Trees

  • Getting data from external website APIs

  • Data Analysis

    • Exploring data sets of various types (sports data, traffic data, feedback reviews, etc.)

    • Working with relational datasets (one-to-one, one-to-many, and many-to-many)

    • Review of (or introduction to) statistical math methods

    • Data visualization in Python and spreadsheet applications

Course Materials

All course materials will be provided by the professor. Software to install: Anaconda (https://www.anaconda.com) with the Python 2.7 version (NOT the Python 3.x version), which is available for Windows, Mac, and Linux operating systems. Within Anaconda, make sure that the Jupyter Notebook and Spyder applications are installed. The open-source Anaconda Distribution bundles Python with many of the data-science packages used in this course, which makes it an easy way to get started with Python data science and machine learning.

Homework

Homework includes computer-generated quizzes, at-home analytical exercises, and applying the course methodologies to topics of personal interest. Spreadsheet applications such as Microsoft Excel and/or OpenOffice (https://www.openoffice.org) may also be used. Students can expect to spend 2-6 hours studying outside of class, depending on their proficiency with Python programming and their previous familiarity with algebra, matrix mathematics, and statistics. If some of the math is new, students will naturally need extra time to learn it before they can program it effectively.

© 2024 by Domenico Ruggiero a.k.a. "The Software Maestro"