A Comparison of Machine Learning Code Quality in Python Scripts and Jupyter Notebooks

Nov 1, 2023

Kyle Adams

Aleksei Vilkomir

Mark Hills

Abstract

Jupyter notebooks are currently one of the most popular environments for Python development, especially in domains such as data science. Existing studies have shown that notebooks may promote bad coding habits, leading to poor code quality and difficulty replicating notebook results. In this paper, we compare the code quality of Python machine learning code found in Jupyter notebooks with that of machine learning code found in regular Python scripts. The goal of this work is to better understand how machine learning code written in Jupyter notebooks differs both from machine learning code written as scripts and from the larger body of Python code, with the aim of creating tools that better support both data science students and practitioners.

Type

Conference paper

Publication

Papers of the 37th Annual CCSC Southeastern Conference (CCSC-SE 2023)

Last updated on Nov 1, 2023

Authors

Mark Hills

Associate Professor