This headline may seem a bit odd to you. After all, if you’re a data scientist in 2019, you’re already marketable. Since data science has a huge impact on today’s businesses, the demand for DS experts is growing. At the moment I’m writing this, there are 144,527 data science jobs on LinkedIn alone.
The most in-demand data science skills of 2019
The following chart represents the skills employers are seeking from data science engineers in 2019:
For this analysis, we looked at 300 Data Science vacancies from Stack Overflow, AngelList, and similar websites. Some terms may appear more than once within a single job listing.
Note: Bear in mind, this research represents the preferences of the employers, rather than the data science engineers themselves.
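The counting described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the script used for this article: the term list, the sample listings, and the function names `count_mentions` and `count_listings` are all hypothetical. It shows why the two ways of counting differ when a term appears more than once in one listing.

```python
import re
from collections import Counter

# Hypothetical term list and sample listings (NOT the data behind the chart).
TERMS = ["Python", "AWS", "Docker", "Tableau"]
SAMPLE_LISTINGS = [
    "Python developer with AWS experience. Python 3 and Docker required.",
    "Data analyst: Tableau dashboards, some Python.",
    "ML engineer familiar with AWS and Docker.",
]


def _pattern(term):
    # Match the term as a whole word, so "AWS" does not match inside "laws".
    return r"(?<!\w)" + re.escape(term) + r"(?!\w)"


def count_mentions(listings, terms):
    """Count every mention of each term, including repeats within one listing."""
    counts = Counter({term: 0 for term in terms})
    for text in listings:
        for term in terms:
            counts[term] += len(re.findall(_pattern(term), text, flags=re.IGNORECASE))
    return counts


def count_listings(listings, terms):
    """Count how many listings mention each term at least once."""
    counts = Counter({term: 0 for term in terms})
    for text in listings:
        for term in terms:
            if re.search(_pattern(term), text, flags=re.IGNORECASE):
                counts[term] += 1
    return counts


if __name__ == "__main__":
    print(count_mentions(SAMPLE_LISTINGS, TERMS))
    print(count_listings(SAMPLE_LISTINGS, TERMS))
```

In this sample, "Python" is mentioned three times but appears in only two listings, which is the kind of gap the remark about repeated terms refers to.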
Key takeaways and Data Science trends
Obviously, Data Science is more about fundamental knowledge than frameworks and libraries, yet there are still some trends and technologies worth noting.
Real-time data processing
As sensors, mobile devices, and IoT (18) become more widespread, companies want to get more insights from real-time data processing. As a result, stream analytics platforms such as Apache Flink (21) are popular among some employers.
Data Visualization
The ability to process data and extract valuable insights from it is vital. However, Data Visualization (55) is no less important a skill for any data scientist. It’s crucial that you can present the outcomes of your work in a format that any team member or customer can understand. As for data visualization tools, employers prefer Tableau (54).
In the vacancies, we encountered terms such as AWS (86), Docker (36), and Kubernetes (24). Hence, the general trends in the software development industry apply to the Data Science field, too.
What experts say
The technologies in this rating are on par. However, in Data Science, some things are just as important as coding. One is the ability to glean insights from “data output” such as final data sets and trends, to visualize them, and to tell the story with that data. Another is the ability to present the findings in a manner that is understandable. Know your audience: if they are Ph.D.s, talk to them in an appropriate manner, but if they’re from the C-suite, they won’t care about programming, only results and ROI.
The snapshot data is useful for seeing the current state of the market, but it doesn’t represent the trends, so it’s hard to plan for the future based on the snapshot alone. I would say that the usage of R will continue to steadily decline (the same can be said about MATLAB), while the popularity of Python among data scientists will keep rising. Hadoop and Big Data are on the list because the industry has some inertia: Hadoop will disappear (no one seriously invests in it anymore) and big data is no longer a hot trend. Whether one has to invest time in learning Scala is unclear: Google officially supports Kotlin (also a JVM language), which is simpler to learn, while Scala has a steep learning curve. I’m also skeptical about the future of TensorFlow: academia has already switched to PyTorch, and academia’s influence is stronger in data science than in other industries. (The opinions are mine and might not represent Gartner’s views.)
PyTorch is the driving force of reinforcement learning, with mathematical operations on CUDA tensors running on GPUs. It is also a stronger framework for natively parallelizing code across multiple GPUs at the same time, unlike TensorFlow, which requires you to assign each operation to a device. PyTorch also builds dynamic graphs, which are efficient for recurrent neural networks. Like Theano, TensorFlow produces static graphs, and it is more complicated to learn than the Torch-based PyTorch. TensorFlow has the larger community of developers and researchers. PyTorch will gain more momentum once it has machine learning dashboard visualization tools such as TensorBoard. PyTorch is more Pythonic in terms of debugging and works with data visualization libraries such as matplotlib and seaborn. Most Python debugging tools can be used to debug PyTorch as well. TensorFlow comes with its own debugging tool, tfdbg.
Chief Data Scientist, Accenture, winner of Top 50 Tech Leader Awards.
I think of data science “jobs” differently than data science “careers.” Job listings offer insights into specific skills the market needs now, but for a career, one of the most important skills I’ve seen is the ability to learn. Data science is a fast-moving field, and you need to be able to easily pick up new techniques, tools, and domain knowledge if you’re going to succeed over the long term. Do that by challenging yourself and avoiding getting too comfortable.
Data Science is a fast-evolving and complicated industry, where general knowledge matters as much as experience with particular technologies. I hope this article gives you valuable insights into which skills of both kinds you need to stay marketable in 2019. Good luck!