IADSS Articles Published at HDSR
We published two articles in Harvard Data Science Review, the first in a series of works presenting the IADSS Knowledge Framework for Data Science and Analytics and mapping skills to the most common industry roles. The foundational body of knowledge in these articles can support both industry and academia as a reference for designing data science curricula and for developing measurement and assessment tools.
Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards
This article is the first in a series authored by the Initiative for Analytics and Data Science Standards (IADSS) to present a framework for the knowledge and skills required in data science and support building a measurement and assessment methodology for analytics and data science professionals.
The article aims to contribute to the ongoing discussion of data science knowledge and skills in industry. Our aim is to build a framework for the knowledge and skills required in data science and to support the development of a measurement and assessment methodology.
As the industry is racing to harness the power of data, demand for data science professionals is growing at an increasing rate. However, almost every organization has a unique way of defining roles in data science and associated skills and knowledge. This has resulted in a confusing industry landscape for employers, academic and training institutions, and existing and aspiring data science professionals. We aim to review both the history of data science and the emergence of data science as a profession in the industry, followed by a classification of knowledge and skills commonly associated with data science professionals, pointing to a lack of detailed and consistent treatment of the topic.
The demand for analytics and data science skills parallels the growth of interest and investment in data science. However, the explosive growth surrounding data science has left in its wake a state of confusion regarding the basic definitions of the tools, methods, skills, and roles associated with this discipline. The definition of data science itself is shrouded in uncertainty. Although "data scientist" has emerged as a job title, every industry, function, and business appears to be looking for its own definition of the role. Universities have responded to the demand for data scientists by creating schools, institutes, and centers and establishing degree programs for relevant disciplines. But the lack of a common vocabulary in data science hampers the design and implementation of professional standards and of the related measurement instruments needed to assess the abilities and competencies of data science professionals.
One of the biggest concerns is the significant cost of this confusion: employers interview or hire the wrong candidates, and aspiring data scientists have to deal with unclear knowledge and skill requirements in job descriptions. A second area of concern is "self-training." Our concern is that future data scientists are being trained outside of an academic environment, which can compromise both the quality and the consistency of the training.
We are confident that if the growth is managed well, data science will blossom into a rich space where academia will equip the future workforce with the right skills, employers will better understand how to define roles and assess candidates, and candidates will have higher quality training and well-understood expectations of the roles they are applying for.
Defining Data Science
We (IADSS) establish a working definition to guide the rest of the discussion. This working definition builds on a broad goal: connecting data to the achievement of goals. We take the practice of data science in industry as our starting point. We believe this goal is shared across data science's application domains, including its use in scientific discovery. Our working definition of data science is: "Using data to achieve specified goals by designing or applying computational methods for inference or prediction."
In this broad view, data science implies a set of activities that, in today's view, can be considered trans-disciplinary. Yet we imply that data scientists deliver, to some degree, against three main challenges: their daily work combines computational, scientific, and organizational goals. This combination of goals implies a combination of skills and knowledge traditionally found in different professions.
Many studies before ours have discussed the knowledge, skills, and abilities associated with data science. While some studies highlight a mapping between academic fields of study and the knowledge required in data science practice, others have sought to build a set of required skills directly from job descriptions or from the reported tasks and activities of practitioners.
We drew several conclusions after reviewing this large and diverse body of literature. There appears to be no consensus on what depth and breadth of engineering knowledge is required of data scientists. Some works go as far as to connect data scientists to advanced software engineering practice and application development. Others imply elementary knowledge of computing and just enough skills to perform data-related tasks. Similarly, we observe that the breadth of quantitative disciplines associated with data science is highly varied. There appears to be no comprehensive prior study that thoroughly focuses on the skills required in analytics and data science. Many sources define the discipline through knowledge areas and, implicitly, through the accompanying cognitive skills. Moreover, reflecting the data scientist's role in driving decision making, interpersonal skills for communication and collaboration are mentioned frequently.
Today, the term data science mostly refers to a role conceived within "big tech," one that is characterized by the rapid application of varied and advanced quantitative skills to very large and disparate data stores. These applications require a unique combination of skills found in expert engineers and research scientists. In turn, these professionals are expected to interface with other stakeholders in the business, in increasingly agile organizations. These needs give rise to the modern data science profession.
In our view, what ultimately defines today's data science professionals is the need to embrace a growing responsibility to connect data to decisions and products, and a set of challenges that transcend the boundaries of traditional industrial roles, skills, and fields of academic study.
Exactly what topics and technologies are worthy of study for professionals in the broader analytics and data science fields? Our literature review found varied answers and conclusions. We organized our findings into a proposed hierarchy of knowledge, emphasizing subjects that are widely agreed upon among existing studies, giving a detailed view of topics complete with the required background knowledge. Our foundational body of knowledge, to be revised by further discussions and research findings, can be used as a reference in designing data science curricula as well as in developing measurement and assessment tools.
How Can We Train Data Scientists When We Can’t Agree on Who They Are?
As the demand for data science talent has exploded, so have the efforts to train data science professionals. There are many programs and formats for training in data science, ranging from short online courses to full-time undergraduate degree programs. But the question is: How can we train data scientists when we can't agree on who they are?
As there is not yet an agreed-upon definition of who data scientists are and which skills and knowledge they need to have, designing programs or developing curricula is challenging. At the same time, organizations in industry are often unable to clearly articulate what they expect of data science talent, which in turn makes hiring, managing, and developing data professionals mostly inefficient and ineffective.
Although "data scientist" has emerged as a job title, every industry, function, and business appears to be looking for its own definition of the role. Universities have responded to the demand for data scientists by creating schools, institutes, and centers and establishing degree programs for relevant disciplines. These efforts suffer from the same source of confusion: the undisputed multidisciplinary nature of data science.
We strongly embrace the thinking that data science is an umbrella term and consists of activities more complex than a single professional, the data scientist, can perform. This line of thinking quickly brings us to a "data science team" made up of individuals with clear role definitions and specialties who collectively meet the skill and knowledge requirements of the organization.
Distinct roles in the industry with distinct sets of skills and knowledge might also be appropriate for training programs to adopt. While it would certainly be useful for a data science student to understand the entire data science lifecycle to start with, it would be more effective to then pick and choose certain areas of specialty to gain a level of expertise that would be useful to employers as they recruit data science talent. This would enable them to match the right person to the right job with potentially a quicker transition into the role.
To better understand the student perspective and develop a potential proposal to make comparing programs easier, we ran two limited-scale research studies in the weeks leading up to our KDD workshop. In one study, we asked more than 150 recent graduates of data science training programs, ranging from boot camps to degree programs, about their experience before, during, and after training. In the second study, we asked program administrators to map their curriculum to the IADSS Knowledge Framework and mark each knowledge area with the level of coverage in the program. We believe this exercise of "normalizing" curricula against a standardized framework of knowledge areas would make it much easier to understand the focus areas of different programs and could benefit students and curriculum designers as well as organizations looking to hire graduates from these training programs.
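The idea of normalizing curricula against a shared framework can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the knowledge area names, the four-level coverage scale, and the two example programs below are hypothetical placeholders, not the actual areas of the IADSS Knowledge Framework or data from the study.

```python
from typing import Dict

# Hypothetical coverage scale; the study's actual levels are not given here.
COVERAGE_LEVELS = {"none": 0, "introductory": 1, "intermediate": 2, "advanced": 3}

# Hypothetical knowledge areas standing in for the framework's real ones.
KNOWLEDGE_AREAS = ["statistics", "machine_learning", "programming", "data_management"]


def normalize_curriculum(reported: Dict[str, str]) -> Dict[str, int]:
    """Map a program's self-reported coverage onto every framework area.

    Areas the program does not mention are recorded as "none", so every
    normalized curriculum has the same keys and can be compared directly.
    """
    unknown = set(reported) - set(KNOWLEDGE_AREAS)
    if unknown:
        raise ValueError(f"unknown knowledge areas: {sorted(unknown)}")
    return {
        area: COVERAGE_LEVELS[reported.get(area, "none")]
        for area in KNOWLEDGE_AREAS
    }


def coverage_gap(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Per-area difference in coverage level (program a minus program b)."""
    return {area: a[area] - b[area] for area in KNOWLEDGE_AREAS}


# Two made-up programs, for illustration only.
bootcamp = normalize_curriculum(
    {"programming": "advanced", "machine_learning": "intermediate"}
)
degree = normalize_curriculum(
    {
        "statistics": "advanced",
        "machine_learning": "advanced",
        "programming": "intermediate",
        "data_management": "introductory",
    }
)
gap = coverage_gap(degree, bootcamp)
```

Because every program is expressed over the same set of areas, a positive value in `gap` marks an area the first program covers in more depth, and a negative value marks the reverse.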
Efforts to educate the data scientists of the future will certainly evolve with the field over the coming years, and communication and collaboration between academia and industry will be important. We also believe that standardizing industry roles and training curricula is a critical enabler of the goal of meeting the growing global need for data science professionals effectively and efficiently.