Project description

Introduction

TL;DR: Ask a question you’re curious about and answer it with a dataset of your choice. This is your project in a nutshell.

May be too long, but please do read

The project for this class will consist of analysis and visualization on a dataset of your own choosing. The dataset may already exist, or you may collect your own data using a survey or by conducting an experiment. You can choose the data based on your teams’ interests or based on work in other courses or research projects. The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like) and apply them to a novel dataset in a meaningful way. You can choose also start with any dataset you want from the datasets released in 2023 as part of this project: https://github.com/rfordatascience/tidytuesday/tree/master/data/2023#readme.

The goal is not to do an exhaustive data analysis i.e., do not calculate every statistic and procedure you have learned for every variable, but rather let me know that you are proficient at asking meaningful questions and answering them with results of data analysis and visualization, that you are proficient in using R/RStudio, and that you are proficient at interpreting and presenting the results. Focus on methods that help you begin to answer your research questions. You do not have to apply every visualization procedure we learned. Also, critique your own methods and provide suggestions for improving your analysis. Issues pertaining to the reliability and validity of your data, and appropriateness of the statistical analysis should be discussed here.

The project is very open ended. You should create some kind of compelling visualization(s) of this data in R (via ggplot2). There is no limit on what tools or packages you may use but sticking to packages we learned in class is required. You do not need to visualize all of the data at once. A single high-quality visualization will receive a much higher grade than a large number of poor-quality visualizations. Also pay attention to your presentation. Neatness, coherency, and clarity will count. All analyses must be done in RStudio (or other IDE), using R, and all components of the project must be reproducible (with the exception of the presentation).

You will work on the project with your lab teams.

The four milestones for the final project are

  1. Milestone 1 - Working collaboratively
  2. Milestone 2 - Proposals, with three dataset ideas
  3. Milestone 3 - Peer review, on another team’s project
  4. Milestone 4 - Presentation with slides and a reproducible project writeup of your analysis, with a draft along the way.

Submission of these deliverables will happen on GitHub and feedback will be provided as GitHub issues that you need to engage with and close. The collection of the documents in your GitHub repo will create a webpage for your project. To create the webpage go to the Build tab in RStudio, and click on Render Website.

This is intentionally vague – part of the challenge is to design a project that showcases best your interests and strengths.

One requirement is that your project should feature some element that you had to learn on your own. This could be a package you use that we didn’t teach in class (e.g., a package for building 3D visualizations) or a workflow (e.g., making a package) or anything else. If you’re not sure if your “new” thing counts, just ask!

Ideas

Your first task is to come up with a goal for your project. Here are a few ideas to help you get started thinking:

  • Present and visualize a technical topic in statistics or mathematics, e.g., Gradient descent, quadrature, autoregressive (AR) models, etc.
  • Build a Shiny app that that has an Instagram-like user interface for applying filters, except not filters but themes for ggplots.
  • Create an R package that provides functionality for a set of ggplot2 themes and/or color palettes.1
  • Build a generative art system.
  • Do a deep dive into accessibility for data visualization and build a lesson plan for creating accessible visualizations with ggplot2, Quarto, and generally within the R ecosystem.
  • Create an interactive and/or animated spatio-temporal visualization on a topic of interest to you, e.g., redistricting, COVID-19, voter suppression, etc.
  • Recreate art pieces with ggplot2.
  • Make a data visualization telling a story and convert it to an illustration, presenting both the computational and artistic piece side by side.
  • Build a dashboard.

And, of course, your project can be a about visualizing a dataset of interest to you (similar to your first project). The only rule about this dataset is that it can’t be from TidyTuesday. Beyond that, it should be something truly of interest to your team, and a dataset that allows for a deep exploration.

Milestone 1 - Working collaboratively

For the first milestone of your project you’ll practice a collaborative Git workflow with your team members. Instructions are outlined in Milestone 1: Working collaboratively. Each team member taking part in the collaborative working activity will get 5 points towards their project.

Milestone 2 - Proposal

There are two main purposes of the project proposal:

  • To help you think about the project early, so you can get a head start on finding data, reading relevant literature, thinking about the questions you wish to answer, etc.
  • To ensure that the data you wish to analyze, methods you plan to use, and the scope of your analysis are feasible and will allow you to be successful for this project.

Instructions and grading criteria for this milestone are outlined in Milestone 2: Proposal.

Milestone 3 - Peer review

Critically reviewing others’ work is a crucial part of the scientific process, and INFO 526 is no exception. You will be assigned two teams to review. This feedback is intended to help you create a high quality final project, as well as give you experience reading and constructively critiquing the work of others.

Instructions and grading criteria for this milestone are outlined in Milestone 3: Peer review.

Milestone 4 - Writeup and presentation

Instructions and grading criteria for this milestone are outlined in Milestone 4: Writeup + presentation.

Other

Reproducibility + organization

  • All required files are provided. Quarto files render without issues and reproduce the necessary outputs. If building a package, the checks pass.
  • If there’s a dataset, it’s provided in a data folder, a codebook is provided, and a local copy of the data file is used where needed.
  • Documents are well structured and easy to follow. No extraneous materials.
  • All issues are closed, mostly with specific commits addressing them.

Teamwork

You will be asked to fill out a survey where you rate the contribution and teamwork of each team member by assigning a contribution percentage for each team member. Filling out the survey is a prerequisite for getting credit on the team member evaluation. If you are suggesting that an individual did less than half the expected contribution given your team size (e.g., for a team of four students, if a student contributed less than 12.5% of the total effort), please provide some explanation. If any individual gets an average peer score indicating that this was the case, their grade will be assessed accordingly and penalties may apply beyond the teamwork component of the grade.

If you have concerns with the teamwork and/or contribution from any team members, please email me by the project presentation deadline. You only need to email me if you have concerns. Otherwise, I will assume everyone on the team equally contributed and will receive full credit for the teamwork portion of the grade.

Grading

The grade breakdown is as follows:

Total 110 pts
M1: Working collaboratively 5 pts
M2: Project proposal 10 pts
M3: Peer review 5 pts
M4: Write-up 40 pts
M4: Slides + presentation 25 pts
Reproducibility + organization 5 pts
Teamwork 20 pts

Grading summary

Grading of the project will take into account the following:

  • Content - What is the quality of research and/or policy question and relevancy of data to those questions?
  • Correctness - Are statistical procedures carried out and explained correctly?
  • Writing and Presentation - What is the quality of the statistical presentation, writing, and explanations?
  • Creativity and Critical Thought - Is the project carefully thought out? Are the limitations carefully considered? Does it appear that time and effort went into the planning and implementation of the project?

A general breakdown of scoring is as follows:

  • 90%-100%: Outstanding effort. Student understands how to apply all statistical concepts, can put the results into a cogent argument, can identify weaknesses in the argument, and can clearly communicate the results to others.
  • 80%-89%: Good effort. Student understands most of the concepts, puts together an adequate argument, identifies some weaknesses of their argument, and communicates most results clearly to others.
  • 70%-79%: Passing effort. Student has misunderstanding of concepts in several areas, has some trouble putting results together in a cogent argument, and communication of results is sometimes unclear.
  • 60%-69%: Struggling effort. Student is making some effort, but has misunderstanding of many concepts and is unable to put together a cogent argument. Communication of results is unclear.
  • Below 60%: Student is not making a sufficient effort.

Late work policy

There is no late work accepted on this project. Be sure to turn in your work early to avoid any technological mishaps.

Guidelines

Please use the project repository that has been created for your team to complete your project. Everything should be done reproducibly. This means that I should be able to clone your repo and reproduce everything you’ve submitted as part of your project.

All results presented must have corresponding code. If you do calculations by hand instead of using R and then report the results from the calculations, you will not receive credit for those calculations. Any answers/results given without the corresponding R code that generated the result will not be considered. For example, if you’re reporting the number of observations in your dataset, don’t just write the number manually, use inline R code to calculate the number. All code reported in your final project document should work properly. Please do not include any extraneous code or code which produces error messages. Code which produces certain warnings and messages is acceptable, as long as you understand what the warnings mean. In such cases you can add warning: false and message: false in the relevant R chunks. Warnings that signal lifecycle changes (e.g., a function is deprecated and there’s a newer/better function out there) should generally be addressed by updating your code, not just by hiding the warning.

Tips

  • We hope some of you will take the challenge to be adventurous and learn some new skills as part of this project. We’re happy to support you along the way, so don’t hesitate to ask as many questions as needed!

  • You’re working in the same repo as your teammates now, so merge conflicts will happen, issues will arise, and that’s fine! Commit and push often, and ask questions when stuck.

  • Review the marking guidelines below and ask questions if any of the expectations are unclear.

  • Make sure each team member is contributing, both in terms of quality and quantity of contribution (we will be reviewing commits from different team members).

  • Set aside time to work together and apart (physically).

  • Code:

    • In your presentation your code should be hidden (echo: false) so that your slides are neat and easy to read. However your document should include all your code such that if I re-render your Quarto file I should be able to obtain the results you presented. Exception: If you want to highlight something specific about a piece of code, you’re welcomed to show that portion.

    • In your write-up your code should be visible.

  • Teamwork: You are to complete the assignment as a team. All team members are expected to contribute equally to the completion of this assignment and team evaluations will be given at its completion - anyone judged to not have sufficiently contributed to the final product will have their grade penalized. While different team members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment work.

  • When you’re done, review the documents on GitHub to make sure you’re happy with the final state of your work. Then go get some rest!

Footnotes

  1. Never built an R package before? See https://r-pkgs.org/whole-game.html for a great introduction.↩︎