California Mathematics Framework: Data Science

Chapter 5 of the first draft defines data science as follows:

Data Science is the process of uncovering the stories hidden within data. It involves formulating questions, and collecting, cleaning, wrangling, analyzing, and visualizing data (that is often huge and complex) to uncover patterns and trends and communicate them to others. [lines 77 – 80]

The authors elaborate with their definition of data scientists.

Professional data scientists draw upon mathematics, statistics, and computer science, and think critically about the qualitative features of a data set to find meaning and communicate the results of their inquiries. Data scientists work together to address uncertainty in data while avoiding bias (Finzer, 2013). [lines 80 – 83]

Describing data science as a “process of uncovering stories hidden within data” may sound comforting, but it is meaningless. Data science is a complex analysis procedure for discovering specific truths associated with well-formulated questions. It is a multi-disciplinary approach to collecting, analyzing, interpreting, and communicating the truths that have been discovered. The very process of formulating questions requires expertise in the area of interest (e.g., in physics, biology, economics, and even sociology). As William Finzer, Senior Scientist at KCP Technologies, notes in his paper “The Data Science Education Dilemma”, substantive expertise gives a data scientist an understanding of the disciplinary context for a data set, without which choosing a valid analysis methodology will be difficult or impossible. Take a minute to reflect on the fuller passage from the same paper.

First, there is the disciplined, quantitative thinking found in mathematics and statistics. From statistics comes an understanding of variability and experience using statistical tools to work with data. Second, substantive expertise gives a data scientist an understanding of the disciplinary context for a data set without which choosing a valid analysis methodology will be difficult or impossible. Finally, computing and data skills which, combined with creative problem solving abilities, allow one to see inside the machine and to visualize the structure of the data.

Relative to the observations I made in “California Mathematics Framework: The Teacher”, how is the teacher to be trained for California certification to teach data science? Data science education programs are still evolving because the academic discipline itself is still being defined. Data science is not an established discipline.

Since William Finzer referenced “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics” by William S. Cleveland, Statistics Research, Bell Labs, it might be helpful to reflect on Mr. Cleveland’s insights.

This document describes a plan to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field will be called “data science.”

This appears to be a call to define a new discipline, data science. Since Cleveland’s paper was published in 2001, it seems data science began to emerge approximately 20 years ago. How “established” is this discipline?

Consider the following passage from The Data Science Design Manual by Steven Skiena of Stony Brook University.

What is data science? Like any emerging field, it hasn’t been completely defined yet, but you know enough about it to be interested or else you wouldn’t be reading this book.

An “emerging field”! Professor Skiena’s book was published in 2017, 16 years after the Cleveland paper. Skiena continues:

I think of data science as lying at the intersection of computer science, statistics, and substantive application domains.

That sounds a lot like William Finzer’s description. The Finzer paper was published in 2013. It seems that progress is steady but slow. Like the tortoise, it will finish the race when it finishes the race, and not a moment before.

Trying to rush the development of this program is futile. And it cannot be implemented as a course of study in PK – 12 until the discipline is well-defined.

Just a quick historical note.

When I worked in industry, I worked with data. I needed to understand the physical process that generated the data as well as the statistical methods necessary to analyze the data. My on-the-job training led me to buy The Statistical Analysis of Experimental Data by John Mandel. Mandel wrote this book while employed at the National Bureau of Standards. The copyright is 1964.

To say that measurement is at the heart of modern science is to utter a commonplace. But it is no commonplace to assert that the ever increasing importance of measurements of the utmost precision in science and technology has created, or rather reaffirmed, the need for a systematic science of data analysis. I am sorry to say that such a science does not exist.

Mandel noted that data analysis at that time was mostly a combination of intuition and experience that needed to be transformed into a systematic body of knowledge with its own principles and working rules. Statistical principles of inference appear to constitute a good starting point for such an enterprise.

Statistical data analysis is well-established in industry and academia. It provides a solid foundation for quantitative literacy in general education. And it can provide a great foundation for PK – 12 students in all academic pathways.

But the only way it can be successfully included in the PK – 12 mathematics curriculum is if the teachers have mastered statistical data analysis. Content knowledge is essential! Educational psychology informs pedagogy; it is not mathematics course content.

As always, “Good night and good luck.”
