**Introduction to the basics of modern computer science and data science**

**Course Description**

This course is intended for students with no previous experience in computer programming but with a basic knowledge of how to operate a computer (a laptop or PC is required). It serves as a high-school-level introduction to the basics of modern computer science and data science.

The techniques taught in data science and programming are used in countless fields of study. After taking this course, students will not only have an edge over their peers in related university courses, but also have the tools to start building their own programming project portfolio early on, which is crucial for future internship and long-term employment opportunities. To support learning, students will be assigned two graded projects per week (one for CS, one for DS).

The computer science (CS) portion will focus on basic programming skills and concepts that can serve as a foundation for any future education in computer science. Creativity and problem-solving skills will be emphasised. By the end of the course, students will have made a basic calculator, a to-do list, and a budgeting app. The data science (DS) portion will focus on the basics of data management in R, as well as simple machine learning algorithms. By the end of the course, students will have made a regression algorithm they can use on any dataset. This practical, project-oriented approach will allow students to see and experience for themselves the real-life applications of their code.


**Timeline**

A class-by-class breakdown of all the topics that we intend to cover, subject to change.

*Week 1*

**Class 1 (DS)**: Introduction to the field of data science (understanding what it can be used for), including the role of data in the modern world. Show publicly accessible databases and discuss how they may be used. Discuss inherent issues that can be present in data and the ease of lying with data, and introduce sampling methods to avoid such pitfalls.

*Project*

From a list of provided datasets, choose one and answer:

- Where and how the data was collected
- Potential uses for the data by researchers
- Real-world questions that you think could be answered with this data
- Potential issues or biases inherent to the data

**Class 2 (CS)**: Discussion of the role of computer science and programming in the modern world. Introduction to the coding environment, JavaScript syntax, and what it means to write code. We will introduce the two main p5.js functions: setup and draw. If there is time, we will begin discussing how the p5.js canvas works (RGB colours and pixel coordinates), and possibly how to draw on it in code.

*Project: Syntax and errors*

Students will be given a document with various lines of code containing errors, and will be asked to circle the errors and explain how to fix them. Some lines will be incomplete, and students will be asked to fill in the code with the correct syntax. The goal is for students to practise the mechanics of writing code and searching for errors.

If we get to explaining coordinates and colours in class, the homework will also include questions on these concepts.

*Week 2*

**Class 1 (DS)**: Review of key points from last class with a Kahoot. Introduction to RStudio and reading data into the programming environment using R. Discuss different file formats. Illustrate some features of datasets. Discuss the difference between training and testing data, and how to deal with data imbalance between the training and testing sets.

*Project*

Case study with COVID-19 data. Load the data into R from the internet. Split the data into a training and testing set. Comment on the proportion of data. Repeat the process but with another dataset, this time importing it from the internet directly.

**Class 2 (CS)**: Review of last class with an introduction to drawing in p5.js. Students will be given some time in class to practise making various shapes. Then we will cover basic user interaction in p5.js and how it can be combined with the functions introduced so far. Students will be introduced to a more formal definition of functions and will have the chance to make their own functions to enhance their programs. If there’s time, we will begin introducing basic data types such as strings, numbers and booleans.

*Project: Interactive art*

Get comfortable writing code and practise creativity! Students will use what we learned in class along with any other resources to create some unique digital art. They will incorporate user input by having the program respond to the mouse or keys.

*Week 3*

**Class 1 (DS)**: Review of loading in data. Data manipulation: introduction to tidy data, why it is important, and how to manipulate data into being tidy.

*Project*

Tidy a dataset! The students will be provided with a custom dataset made by us, and we want the student to A) load the dataset into the programming environment, then B) tidy it with small amounts of guidance. We will ask for specific subsections of the dataset and expect the student to be able to get there with code. We will emphasise the interpretation of the data, asking the students to think critically about the quality of the data and the biases that may or may not be present.

**Class 2 (CS)**: Review of functions from last class and a showcase of some visualisations (either made by the students last week or examples of what is possible). Introduction to data types: strings, numbers and booleans. Operating on strings and numbers using functions. We will then learn how to use variables to store and use data, and how to operate on variables.

*Project: Coding exercises*

Students will be given a series of common basic programming exercises to practise working with variables and designing functions to solve given problems. Examples of exercises include:

- Average calculator: Write a program that asks the user to input marks for 4 courses, and then displays the average mark. Warning: Use brackets for order of operations!
- Pop quiz! Ask the user 5 simple math problems. After the user enters their answers, display the problems, the correct answer, and the user’s answers all on the screen. The correct answer should be calculated by your program (not hard-coded).
- Write a program that will first ask the user for the radius of a circle, then continuously draw a circle of that size at the user’s mouse position on screen. Display the area of the circle in text in the middle of the circle.
  - Bonus: Make the circle’s radius increase by 5 each time the user clicks.
- Pig-latin translator: Ask the user for a sentence and translate it into pig latin.

Exercises that students struggle with will be reviewed in depth at the beginning of the next class, and I will show my solution for each one. If the students feel comfortable with the concepts, I will show extensions of each of these programs which they may be able to achieve in the future.

*Week 4*

**Class 1 (DS)**: Review tidying data through various guided exercises. Introduce the basics of standard deviation, mean/median, and probability. Introduce the Monty Hall problem. Learn how to get and interpret summary statistics (utilising different functions for different objectives) from a data table.

*Project*

Load a given dataset into R. Split into training and testing data. Students will be asked to retrieve various statistics from the data and answer questions analysing them, thinking in terms of basic statistics.

**Class 2 (CS)**: Review of last week’s homework. Introduction to the control flow of programs and how to manipulate it through conditional statements (if, else if, else). Subsequent introduction to boolean logic and how to use it in code.

*Project: Calculator*

Combine variables and conditional statements with user interaction by making a simple calculator program. The exact way it works is up to the student, whether they want to use browser prompts, buttons on the screen or a keyboard. Once they get a basic calculator working, they should come up with 5 ways to improve it or make it more complex. They will research these improvements and see if they can make any of them on their own using the skills covered so far. If not, we will cover much more later on, and hopefully they’ll come back and upgrade their calculators!

*Week 5*

**Class 1 (DS)**: Review basic statistics and how to find summary statistics. Data visualisation: discuss purposes and uses, and learn different ways to visualise data using R.

*Project*

Students will first load in a custom dataset and tidy it following our rough guidance. They will then produce a visualisation with code and argue for its appropriateness, comparing and contrasting it with other kinds of visualisation for the specific kind of data provided. We will then provide short-answer questions, where the student is given different visualisations for some data and must produce a short analysis of their appropriateness, comparing and contrasting them. We will also ask the student to analyse the visualisations by commenting on biases, spread, and visible issues or interesting features of the data.

**Class 2 (CS)**: Introduction to arrays and loops (another way to change control flow), with an introduction to basic traversal algorithms such as linear search and finding the largest number in an array. Both for loops and while loops will be covered, exploring different examples for each one. We will go over practice questions in class to make sure students understand the way arrays and loops work.

*Project: To-do list*

Students will create a basic to-do list that lets the user add and remove tasks using buttons. This requires an array to keep track of the tasks as they are added and removed, and a loop to show the tasks on screen.
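As a rough illustration of the two traversal algorithms named for this class, here is a minimal sketch in plain JavaScript (no p5.js needed). The function names `linearSearch` and `findLargest` and the sample marks are our own assumptions for illustration, not fixed course material. One function uses a for loop and the other a while loop, so both loop styles appear.

```javascript
// Linear search: check each element in turn until the target is found.
// Returns the index of the first match, or -1 if the target is absent.
function linearSearch(items, target) {
  for (let i = 0; i < items.length; i++) {
    if (items[i] === target) {
      return i;
    }
  }
  return -1;
}

// Find the largest number by remembering the biggest value seen so far.
// Assumes the array contains at least one number.
function findLargest(numbers) {
  let largest = numbers[0];
  let i = 1;
  while (i < numbers.length) {
    if (numbers[i] > largest) {
      largest = numbers[i];
    }
    i++;
  }
  return largest;
}

// Made-up sample data for illustration.
const marks = [72, 88, 95, 61];
console.log(linearSearch(marks, 95)); // 2
console.log(findLargest(marks));      // 95
```

Both functions visit each element at most once, which is the traversal pattern the to-do list project also relies on when showing tasks on screen.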

*Week 6*

**Class 1 (DS)**: Review data visualisations. Introduction to what it means for machines to learn. Introduction to linear regression. Discuss the advantages and disadvantages of linear regression. Discuss overfitting and underfitting, touching on the bias-variance tradeoff. See implementation in R with a case study.

*Project*

Students will implement a linear regression algorithm for predicting house prices from square footage, and then for life span from pounds overweight. Illustrative plots will be required.
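The class itself works in R. Purely to show the idea behind simple linear regression in the same language as the CS classes, here is a minimal JavaScript sketch of an ordinary least-squares fit with one predictor. The function names `fitLine` and `predict` and the house-price numbers are invented for illustration; they are not the course dataset or the course's R code.

```javascript
// Fit the line y = intercept + slope * x by ordinary least squares.
// xs and ys are arrays of numbers with the same length.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;

  let sxy = 0; // sum of (x - meanX) * (y - meanY)
  let sxx = 0; // sum of (x - meanX) squared
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return { slope, intercept };
}

// Use a fitted model to predict y for a new x.
function predict(model, x) {
  return model.intercept + model.slope * x;
}

// Made-up data: square footage and price (in thousands).
const squareFeet = [1000, 1500, 2000, 2500];
const prices = [200, 300, 400, 500];

const model = fitLine(squareFeet, prices);
console.log(predict(model, 1800)); // roughly 360 for this made-up data
```

The made-up points lie exactly on a line, so the fit is perfect here. Real data will scatter around the fitted line, which is where the discussion of underfitting and overfitting comes in.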

**Class 2 (CS)**: Introduction to algorithms: what they are and what they are used for. Explanation of 2 types of algorithms: searching and sorting. In class, we will discuss how selection sort, insertion sort, and bubble sort work. Then we will practise transferring algorithms to code by turning our explanation of selection sort into code together in class. We will then show one way in which this algorithm can be visualised in code to make it more intuitive.

*Project: Visualising algorithms*

Students will implement and visualise either insertion sort or bubble sort. They will first write the algorithm in one step (one function), then break it down so that each step is seen as a separate frame. Students will be encouraged to experiment and be creative with ways to visualise the algorithm.
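For reference, the selection sort written in class might look something like the sketch below. This is one possible version in plain JavaScript, written as a single function and without any visualisation. The function name `selectionSort` and the choice to return a sorted copy are our assumptions, and the version written in class may differ.

```javascript
// Selection sort: repeatedly find the smallest remaining value
// and swap it into the next position of the sorted part.
function selectionSort(values) {
  const sorted = values.slice(); // work on a copy so the input is unchanged

  for (let i = 0; i < sorted.length - 1; i++) {
    // Find the index of the smallest value from position i onwards.
    let minIndex = i;
    for (let j = i + 1; j < sorted.length; j++) {
      if (sorted[j] < sorted[minIndex]) {
        minIndex = j;
      }
    }

    // Swap it into position i if it is not already there.
    if (minIndex !== i) {
      const temp = sorted[i];
      sorted[i] = sorted[minIndex];
      sorted[minIndex] = temp;
    }
  }

  return sorted;
}

console.log(selectionSort([5, 2, 9, 1, 7])); // [ 1, 2, 5, 7, 9 ]
```

Each pass of the outer loop is a natural "step" to draw as one frame, which is the same breakdown the project asks for with insertion sort or bubble sort.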

*Week 7*

**Class 1 (DS)**: Review linear regression. Introduction to KNN regression. Compare and contrast these approaches in regression analysis. See implementation in R with a case study.

*Project*

Students will implement a KNN regression algorithm to answer two predictive questions from the same datasets. Visualisations will be required, as well as an analytic comparison between linear and KNN regression in a provided case study.
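As with linear regression, the class implementation is in R. The JavaScript sketch below only illustrates the core idea of KNN regression with a single predictor: predict a value by averaging the outcomes of the k closest training points. The function name `knnPredict` and the numbers are invented for illustration.

```javascript
// KNN regression with one predictor: average the y values of the
// k training points whose x is closest to the query point.
function knnPredict(trainX, trainY, k, queryX) {
  // Pair each training point's distance to the query with its y value,
  // then order the pairs from nearest to farthest.
  const byDistance = trainX
    .map((x, i) => ({ distance: Math.abs(x - queryX), y: trainY[i] }))
    .sort((a, b) => a.distance - b.distance);

  // Keep the k nearest neighbours and average their y values.
  const nearest = byDistance.slice(0, k);
  const total = nearest.reduce((sum, point) => sum + point.y, 0);
  return total / nearest.length;
}

// Made-up data: square footage and price (in thousands).
const areas = [1000, 1500, 2000, 2500];
const knownPrices = [200, 300, 400, 500];

// The two nearest houses to 1800 sq ft are 2000 and 1500 sq ft,
// so the prediction is the mean of 400 and 300.
console.log(knnPredict(areas, knownPrices, 2, 1800)); // 350
```

Unlike linear regression, nothing is fitted in advance: every prediction looks at the training data directly, and the choice of k controls how smooth the predictions are.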

**Class 2 (CS)**: Learn about different data types and how to make your own using classes. We will go over how to give classes properties and methods and how to make new objects (instances). This very basic introduction to object-oriented programming leads to a discussion about decomposition: how to approach problems by breaking them down. This week’s project will be presented in class to allow students to practise decomposition and planning code before they begin.

*Project: Streaming services*

Students will simulate streaming services with movies and TV series, each with different monthly costs, and people who are subscribed to different services. The program must be made using a variety of classes, which we will discuss in class. Students are encouraged to visualise the program in whatever way they choose, although the only requirement is to build the functionality in code.
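To give a sense of what properties, methods and instances look like in JavaScript, here is a small sketch. It deliberately uses a different example from the project: a hypothetical `Student` class that stores marks and reports their average. The class and its method names are made up for illustration.

```javascript
// A class describes a new kind of object: its properties (data)
// and its methods (things it can do).
class Student {
  constructor(name) {
    this.name = name;  // property: the student's name
    this.marks = [];   // property: the marks recorded so far
  }

  // Method: record a new mark.
  addMark(mark) {
    this.marks.push(mark);
  }

  // Method: average of all marks, or 0 if there are none yet.
  average() {
    if (this.marks.length === 0) {
      return 0;
    }
    let total = 0;
    for (const mark of this.marks) {
      total += mark;
    }
    return total / this.marks.length;
  }
}

// Each call to "new" makes a separate instance with its own data.
const ada = new Student("Ada");
ada.addMark(80);
ada.addMark(90);

const alan = new Student("Alan");
alan.addMark(70);

console.log(ada.name, ada.average());   // Ada 85
console.log(alan.name, alan.average()); // Alan 70
```

Deciding which classes a program needs, and which properties and methods belong to each, is the decomposition step students practise before starting the project.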

*Week 8*

**Class 1 (DS)**: Review linear and KNN regression. Introduction to decision trees and their implementation. Concluding remarks on the state of AI, with a brief qualitative look at tools like ChatGPT and a demonstration of how reinforcement learning works (with many illustrations). Talk about the future of AI.

*Project*

Implement two decision trees for two different case studies. The first: Is an email spam or not spam? The second: Is a tumour malignant or benign based on its features?

**Class 2 (CS)**: Where to go from here? First we will review objects and touch on a few more advanced ways to use them, such as inheritance. In class we will mention and showcase other slightly more advanced programming concepts such as recursion and graphs, using real-life examples like fractals and social media networks to explain the concepts. We will discuss applications of computer science, from web development to engineering to the development of AI. We will wrap up the course by discussing how what the students have learned in this course, along with other basic programming concepts, can help them in their day-to-day lives even if they choose not to study STEM further.

*Project: Budgeting app*

Students will design a simple program to help keep track of their finances. The user should be able to add multiple sources of income and expenses, and adjust their savings goal. A basic user interface will be provided so that the students will actually be able to use this app in their lives, and the students are encouraged to expand upon the interface to better suit their needs.