Part 2 of a Data Science at OVO blog series! Part 1 is here.
This is perhaps a trick question as the answer depends on where you work, or who you are talking to. But this article might help you understand OVO's take on the topic.
A bit of history
The confusion around roles is not that surprising - even the term ‘data scientist’ is fairly new. When some of my current colleagues at OVO and Kaluza and I started doing this work, it wasn’t known as Data Science - we were typically called Data Analysts.
Google Trends shows how interest in the term has changed over time, and this agrees with similar trends elsewhere, for example in the frequency of job posts:
I recall my dad, who works in IT, sending me Harvard Business Review articles about the new ‘data rock stars’ and getting caught up in the Big Data hype. So naturally, there’s going to be a bit of variation in terminology, until the industry settles down a bit and we move on to the next big wave.
Where we are now
Anyway - where we are now is that the term is used differently in different places, so it can be challenging as an analyst or data scientist to navigate career paths, expectations, and learning and development opportunities.
The good news is that the names for the fundamental skills required for either role are somewhat more universally agreed on, so let's use those as a reference and see how we get on.
I’d also like to write about the importance of building a balanced team, heavily in agreement with articles such as this one authored by Google’s Cassie K, but that is probably another post. The main thing I want to emphasise for now is that one role is not intrinsically ‘better’ than the other.
Back to ‘history’ aka 5-10 years ago. For a while there was a fashion to describe Data Scientists as ‘unicorns’. Venn diagrams such as this permeated the blogosphere:
The fabled 'unicorn' data scientist might have all these attributes. But an analyst would have some; indeed, a mathematician would have some and a software engineer would have some too. So, let's unpack those dimensions and add a few more that are relevant to delivering value in a business and being accountable and ethical.
Analyst and Data Scientist skills
Data mining & visualisation
This is all about exploratory data analysis, trend identification, hypothesis forming, and storytelling. An expert in these skills is able to present data as information, i.e. with meaning. They rapidly find trends and anomalies in data. Their visuals are intuitive, and the outcomes, upshots and themes of the data they’re presenting are easily understood by any audience.
Analytical & Creative thinking
This relates to the correct interpretation of verbal, written and numerical information. An analytical thinker is able to deal with ambiguity, independently engaging in tasks requiring interpretation of complex and often vague sets of information. They identify gaps in information and can make assumptions to continue analysis, and also proactively seek a wide range of sources of information.
And a creative thinker constructs new ideas and approaches, resulting in proactive, future focused projects and new opportunities.
Statistics & OR
As well as having an understanding of statistical and operational research techniques, this skill relates to reasoning under uncertainty, rigour and the selection of the right methods for the right problem. Skills here support model assessment and evaluation, validation of variables used and justification of the appropriate technology.
My well-educated OR colleagues assure me that OR is science in the sense of scientific laws and provable mathematical results, rather than experimental science (which is where we might put machine learning, black-box techniques etc.). So the ability to conduct experiments and build models which are provable, rigorous and explainable is what is key here.
Machine learning & AI
So much has been written about machine learning and artificial intelligence elsewhere on the interweb. I'll summarise it as the implementation of algorithms which learn from experience and which can scale to make accurate inferences from diverse data sets.
Engineering & Productionisation
Experts in this area can build models which operate and improve in (near) real time, and they have an understanding of how to make them valid and robust in production. They have an appreciation of the full software development lifecycle including testing, containerisation, version control, CI/CD, and they know what is missing from this list.
With more experience this could extend to technical design and appropriate systems architecture for analytical models.
Results focus
This relates to consideration of value in the context of the business, to ensure delivery of the right solution at the right time. It's hugely important that an analyst or data scientist can critically evaluate the likely value of their work outcomes before they start.
This feeds into the 80/20 rule, also called the Pareto principle - or more broadly an understanding of the appropriate effort to deploy to reach the desired outcome.
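To make the 80/20 idea concrete, here is a minimal Python sketch that checks how concentrated value is across a set of candidate pieces of work. The function name `share_of_items_for_target` and the example numbers are hypothetical and purely illustrative; they are not OVO data or an OVO method.

```python
def share_of_items_for_target(values, target=0.8):
    """Return the fraction of items needed to reach `target` of the total value.

    `values` is a list of non-negative estimated values, one per item
    (hypothetical numbers, e.g. the expected benefit of each piece of work).
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)  # most valuable items first
    total = sum(ordered)
    running = 0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target * total:
            return count / len(ordered)
    return 1.0


# Hypothetical estimated values for eight candidate projects.
estimated_values = [50, 35, 5, 4, 3, 1, 1, 1]
share = share_of_items_for_target(estimated_values)
print(f"{share:.0%} of projects deliver at least 80% of the estimated value")
```

If a small share of the items covers most of the value, that is a hint about where the effort is best spent; if the value is spread evenly, the 80/20 framing is less useful for that problem.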
Domain expertise
Similar to results focus, domain expertise enables speed and relevance of delivery. Domain knowledge typically comes from experience and research, rather than skill. The skill lies in being able to get up to speed quickly in a new domain, having attention to detail, and educating others.
In the context of an energy supplier and technology company, which I am writing from, domain expertise might relate to product A/B testing, customer segmentation, energy trading and demand forecasting, speech analytics or contact forecasting.
Ethics & Legal
Not being Cambridge Analytica.
So what’s the difference between an analyst and a data scientist?
Again, I’d emphasise that the answer to this will vary depending on who you talk to, and there’s overlap in the skills. So, inspired by Sean McClure’s answer to this Quora post, here’s my two pence on the difference in the definitions:
I imagine this will date quickly - there are already more job titles being used for folk specialising in one or more of these areas - such as Machine Learning Engineer, ML researcher, Computational Statistician, … but I'd like to think the same skills framework can be used as a basis for explaining the difference between those, too. At least for a while.
What do you think? Do you agree on these ratings, and the basis of skills? Have you seen the benefit of having a balance of these skills, and these roles, in your teams? Do you think it's feasible for an individual to get a ‘10’ on everything? What will you learn next?