A Comparison of Machine Learning Code Quality in Python Scripts and Jupyter Notebooks

Abstract

Jupyter notebooks are currently one of the most popular environments for Python development, especially in domains such as data science. Existing studies have shown that notebooks may promote bad coding habits, leading to poor code quality and challenges with replicating notebook results. In this paper, we compare the code quality of Python machine learning code found in Jupyter notebooks to that found in regular Python scripts. The goal of this work is to better understand how the machine learning code created in Jupyter notebooks differs both from machine learning code provided in scripts and from the larger body of Python code, with the aim of creating tools to better support both data science students and practitioners.

Publication
Papers of the 37th Annual CCSC Southeastern Conference (CCSC-SE 2023)
Mark Hills
Associate Professor

My research interests include programming languages, program analysis, and software engineering.