Introduction to Health Data Science

Spring 2025

This is the course page for PUBH 1142

Residential Course in GWSPH 300A on Tuesdays 9:30 AM to 12:00 PM

Welcome to PUBH 1142, your undergraduate introduction to the field of Health Data Science. My name is Jay and I am an Associate Professor in the Department of Biostatistics and Bioinformatics.

My hope is that you love this course and see just how empowering data science can be.

Watch Blackboard for content and up-to-date information. The teaching assistants will be introduced shortly.

Course Description

Health Data Science (HDS) involves the collection, processing, analysis, and interpretation of various types of health data, including electronic health records, medical imaging, genomics data, and public health data.

This course commences by defining HDS and the health data scientist in the context of a world where data are used in a secure, private, and ethical manner to generate reproducible results and insights.

Introduction to HDS is a discovery of health data sources and the use of computational thinking and computational tools to capture, manage, and analyze the data, before then presenting the insights gained from data.

The discovery concludes with the fundamentals of modern biostatistical and machine learning concepts.

Course Competencies

After this semester you will have established the following skills.

  • Evaluate the role of data science with respect to personal and public health
  • Evaluate and analyze different roles in health data science
  • Identify and analyze ethical issues pertaining to health data
  • Identify and analyze privacy concerns pertaining to health data
  • Evaluate the types of data used in health data science
  • Identify sources of personal and public health data
  • Apply methods of data capture and analyses in spreadsheet files
  • Apply methods of data capture and retrieval from databases
  • Install and use a computer language for data management and analyses
  • Use modern integrated development environments
  • Use versioning software for collaborative data science
  • Perform mathematical calculations using software
  • Use logical arguments to control the flow of execution of simple code
  • Apply common built-in computer language functions to perform tasks
  • Generate computer language functions to perform tasks
  • Filter data for analysis based on criteria
  • Select subsets of data using indexing
  • Synthesize summaries of data in order to understand the information in the data
  • Generate commonly used data visualizations
  • Synthesize a hypothesis
  • Synthesize a model and evaluate the results from a model
Course Material

There is no prescribed textbook for PUBH 1142. Change in data science is too rapid for textbook development.

The course material consists of a set of Jupyter notebooks, written in the Python language. These notebooks will be made available on Blackboard. You can also find them on THIS page.

Technology requirements

Students must have access to a dedicated desktop computer or laptop with access to the internet. Data from open public health sources as well as data from research projects in the Department of Biostatistics and Bioinformatics will be used throughout the course.

Access to the internet will be required to interact with these data. Students should have knowledge of working with Blackboard, which will serve as the main source of announcements and distribution of resources, including the submission of assignments.

All students must create a free GitHub account and sign up for GitHub Copilot, which is free when using a George Washington University email address. Homework assignments and the final examination must be uploaded to your GitHub repository.

Complete the following steps at least one week before class.

  1. Sign up for a free GitHub account HERE.
  2. Sign up for a free GitHub Copilot account HERE, but only after you have already created a GitHub account. Click on the Join GitHub Education button. Use your university email address. You will have to provide proof that you are a student.

This course requires writing Python code. We will use GitHub Copilot as a generative artificial intelligence model to write the code.

Grading

Eight of the notebooks contain homework assignments. These must be completed at home after class and be submitted via GitHub before the start of the following class. The dates are indicated below under Important Dates and Times. Every day that an assignment is late will result in a 5% reduction in the grade, up to 3 days, at which time the student will receive 0 credit for the assignment.
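The late policy above can be sketched in Python. This is only an illustration under one reading of the policy: the function name `late_adjusted_score` is hypothetical, each late day is assumed to remove 5 percentage points, and an assignment more than 3 days late is assumed to earn 0 credit.

```python
def late_adjusted_score(score: float, days_late: int) -> float:
    """Return an assignment score (0-100) after the late penalty.

    Hypothetical helper. Assumptions: 5 percentage points are deducted
    per day late, and more than 3 days late earns no credit.
    """
    if days_late < 0:
        raise ValueError("days_late cannot be negative")
    if days_late > 3:
        # Assumption: past the 3-day window the assignment earns 0 credit
        return 0.0
    # Deduct 5 points per late day, never dropping below zero
    return max(0.0, score - 5.0 * days_late)


for days in range(5):
    print(days, late_adjusted_score(90, days))
```

For a score of 90, this reading gives 90, 85, 80, and 75 for zero to three days late, and 0 after that.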

The final exam is in the form of a project created as a Jupyter notebook. It must contain insights from a data set of your interest and choice. The project requirements will become clear during the course. The final date for submission of the project is Friday, May 3 at midnight.

The following components make up the final grade.

  1. Homework assignments 65%
  2. Class participation 5%
  3. Final exam 30%

Standard School of Public Health Letter Grades will be used.

  1. 93% - 100% A
  2. 90% - 92% A-
  3. 87% - 89% B+
  4. 83% - 86% B
  5. 80% - 82% B-
  6. 77% - 79% C+
  7. 73% - 76% C
  8. 70% - 72% C-
  9. 67% - 69% D+
  10. 63% - 66% D
  11. 60% - 62% D-
  12. 59% and lower F
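
As a rough sketch of how the component weights and the letter grade bands combine, the Python below computes a weighted percentage and looks up the letter grade. The names `WEIGHTS`, `LETTER_CUTOFFS`, `final_percentage`, and `letter_grade` are hypothetical. Two assumptions are made: the B- band starts at 80%, and the final percentage is rounded to the nearest whole percent before the lookup.

```python
# Component weights taken from the grading breakdown above
WEIGHTS = {"homework": 0.65, "participation": 0.05, "final_exam": 0.30}

# Lower bound of each letter grade band, highest first.
# Assumption: the B- band starts at 80%.
LETTER_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def final_percentage(scores: dict) -> float:
    """Weighted average of the component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage: float) -> str:
    """Map a percentage to a letter grade using the bands above."""
    # Assumption: round to a whole percent first
    # (Python rounds exact halves to the nearest even number)
    rounded = round(percentage)
    for cutoff, letter in LETTER_CUTOFFS:
        if rounded >= cutoff:
            return letter
    return "F"


scores = {"homework": 90, "participation": 100, "final_exam": 85}
total = final_percentage(scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these example scores the weighted total is 0.65 × 90 + 0.05 × 100 + 0.30 × 85 = 89, which falls in the B+ band.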
Workload

This is a 3-credit course and requires a minimum workload of 112.5 hours. Each week includes a double residential lecture. In addition to attending these lectures, students are expected to spend a minimum of five hours per week in independent study, working on assignments, preparing for quizzes, and preparing for the final exam.
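
The workload figure can be checked with a little arithmetic. The sketch below shows one way the numbers are consistent. It assumes a 15-week semester (the number of Tuesdays listed in the Weekly Schedule, including the week without class) and a 2.5-hour lecture each week. Both values are assumptions made for illustration, not an official breakdown.

```python
WEEKS_IN_SEMESTER = 15           # assumption: Tuesdays listed in the Weekly Schedule
LECTURE_HOURS_PER_WEEK = 2.5     # assumption: one double lecture of 2.5 hours
INDEPENDENT_HOURS_PER_WEEK = 5   # minimum independent study stated above

# Total hours = weeks x (lecture hours + independent study hours)
weekly_hours = LECTURE_HOURS_PER_WEEK + INDEPENDENT_HOURS_PER_WEEK
total_hours = WEEKS_IN_SEMESTER * weekly_hours

print(weekly_hours)  # 7.5
print(total_hours)   # 112.5
```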

Important Dates and Times

| Action | Date | Time |
| --- | --- | --- |
| Submit assignment from week 3 | February 4 | 9:30 AM |
| Submit assignment from week 4 | February 11 | 9:30 AM |
| Submit assignment from week 6 | February 25 | 9:30 AM |
| Submit assignment from week 7 | March 4 | 9:30 AM |
| Submit assignment from week 8 | March 18 | 9:30 AM |
| Submit assignment from week 9 | April 1 | 9:30 AM |
| Submit assignment from week 10 | April 8 | 9:30 AM |
| Submit assignment from week 11 | April 15 | 9:30 AM |
| Final project | May 3 | 12:00 AM |

Weekly Schedule

| Date | Topics |
| --- | --- |
| Tuesday, January 14 | Introduction; Install Python; Install Visual Studio Code; Definitions and scope of Health Data Science |
| Tuesday, January 21 | Sources of Health Data |
| Tuesday, January 28 | Ethics; Privacy; Security |
| Tuesday, February 4 | Submit homework assignment from previous week; Spreadsheet software; Databases |
| Tuesday, February 11 | Submit homework assignment from previous week; Computer languages |
| Tuesday, February 18 | Basic coding |
| Tuesday, February 25 | Submit homework assignment from previous week; Functions |
| Tuesday, March 4 | Submit homework assignment from previous week; Data wrangling |
| Tuesday, March 11 | No class |
| Tuesday, March 18 | Submit homework assignment from previous week; Understanding the information in tabular data |
| Tuesday, March 25 | Project |
| Tuesday, April 1 | Submit homework assignment from previous week; Data visualization |
| Tuesday, April 8 | Submit homework assignment from previous week; Inference |
| Tuesday, April 15 | Submit homework assignment from previous week; Project |
| Tuesday, April 22 | Machine learning |

Class Policies

All homework assignments and the final project are to be completed in conformance with The George Washington University Code of Academic Integrity. Each student must complete graded assessments on their own and answer all questions in their own words. Collaboration with other students is encouraged for the non-graded assignments (i.e., practice problems). Students registered for this course will be held to the highest standards of academic integrity. Written work submitted by a student must be the product of his/her own efforts. Plagiarism, cheating and other forms of academic dishonesty, including dishonesty involving computer technology, are strictly prohibited. 

Although attendance at lectures is not mandatory, it is strongly recommended that you make every effort to attend both sessions each week, physically and mentally. Experience has shown that attendance at both types of sessions proves beneficial for homework assignments and the final exam project. Please come to all lecture and application sessions prepared to ask questions relevant to the topics being discussed.

Outside of class, you are encouraged to post general questions directly to the Discussions section on Blackboard. Dr. Klopper will post answers in a timely manner. If the question is relevant only to an individual or is of a personal nature, send an email directly to Dr. Klopper (juanklopper@gwu.edu).

All written assignments must be submitted at the beginning of the session on the date due. If you are unable to attend a lecture or application session, you must make prior arrangements to submit your assignment in another manner. Please contact Dr. Klopper via e-mail BEFORE the assignment due date to arrange a drop-off time and location. Failure to do so will result in a grade of 0 on the assignment.

There will be no alternate date for submission of the final project, which serves as the final examination, so please plan accordingly. Any student who experiences significant family or personal illness or emergency after the final withdrawal date and is unable to complete course work should ask the instructor for an incomplete for the course. Each case will be managed on an individual basis.