Data Scientists Use Cloud Tools to Unlock the Value of Data for Businesses

Data scientists help organizations shift from relying on instinct and experience to using data for new, transformative insights. And, although the role of “data scientist” was not even identified as a profession until a decade ago, for the past two years recruiting site Glassdoor has cited data scientist as the highest-ranked job based upon salary, job satisfaction, and the number of job openings.

Data scientists are in great demand because of the sheer volume of data that organizations now handle, due in part to the explosion of data streams enabled by the cloud. About 20 per cent of that volume is structured data of the kind businesses have historically collected, but the other 80 per cent is unstructured data, which comes in the form of emails, social media posts, images, or videos and can be much harder to collect, manage, and analyze.

Additionally, recent survey data highlights cloud growth in several areas, which means data scientists will need to grapple with new workloads from AI, analytics, and IoT devices. Access to data in the cloud is critical for today’s data scientists, who need a centralized platform that is accessible to all teams, especially data science teams.

As digital transformation drives more companies and industries around the world to the cloud, there is a growing need to capture and manage both new and legacy data. As long as data scientists have easy access to this data, they already have the skills to analyze the growing volumes with cloud technology and turn information into insights that can transform businesses and industries. The problem is that there just aren’t enough data scientists to handle current demand, let alone future demand. The market, meanwhile, keeps growing: according to the “Worldwide Semiannual Big Data and Analytics Spending Guide” from International Data Corp., global revenues for big data and business analytics will grow from $130 billion in 2016 to more than $203 billion in 2020.

Most organizations hire data scientists to develop algorithms and build machine learning models, which is typically the part of the job they enjoy most. But according to a report from CrowdFlower, most companies are subject to an “80/20 rule”: data scientists spend 80 per cent of their time finding, cleansing, and organizing data, leaving only 20 per cent for actually analyzing it.

For these reasons, organizations need to adopt new cloud services and technology that give data scientists the tools to rapidly find and organize growing volumes of data. That leaves them more time to focus on where their skills are most valuable: analyzing and working with the increasing volume of data sets generated by everything from sensors to devices and users.

These services can include tools that automate and simplify data discovery, curation, and governance, as well as intelligent search capabilities that help data scientists find the data they need. Metadata, such as tags, comments, and quality metrics, can help them decide more quickly whether a data set will be useful. Integrated data governance gives data scientists confidence that the models and results they produce from data sets are used responsibly by others in the organization.
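
To make the role of metadata concrete, here is a minimal Python sketch of how tags and a quality metric might be used to shortlist data sets. The catalog structure, the field names, and the 0-to-1 quality scale are all assumptions made for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record; the field names are illustrative only."""
    name: str
    tags: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)
    quality_score: float = 0.0  # assumed scale: 0.0 (poor) to 1.0 (excellent)


def find_candidates(catalog, required_tags, min_quality=0.8):
    """Return entries that carry every required tag and meet the quality bar,
    ordered from highest to lowest quality score."""
    required = set(required_tags)
    matches = [
        entry for entry in catalog
        if required <= entry.tags and entry.quality_score >= min_quality
    ]
    return sorted(matches, key=lambda entry: entry.quality_score, reverse=True)


# Made-up catalog entries, purely for demonstration.
catalog = [
    DatasetEntry("sensor_readings_2017", {"iot", "sensor"}, ["Gaps in March"], 0.65),
    DatasetEntry("sensor_readings_clean", {"iot", "sensor", "curated"}, [], 0.93),
    DatasetEntry("support_emails", {"email", "unstructured"}, [], 0.88),
]

shortlist = find_candidates(catalog, {"iot", "sensor"})
print([entry.name for entry in shortlist])
```

With the default quality bar, only the curated sensor data set makes the shortlist; lowering `min_quality` would bring the lower-scored one back into view along with its comment about missing data.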

The goal is to give data scientists the time needed to build and train multiple models simultaneously, rather than being limited to working on one model at a time. This approach spreads out the risk of analytics projects, encouraging experimentation that yields breakthroughs, instead of focusing resources on a single approach that could be a dead end.

The cloud is the foundation of such a strategy: it lets data scientists easily save, access, and extend models, so they can use existing assets as templates for new projects. This practice, called “transfer learning,” lets them avoid starting from scratch every time; it focuses on preserving the knowledge gained while solving one problem and applying it to a different but related problem.
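
The idea can be shown with a deliberately tiny Python sketch. It fits a straight line to one problem, then reuses the learned parameters as the starting point for a related problem that has only three data points, and compares that against starting from zero. This is a toy form of transfer learning (warm-starting, or fine-tuning, a saved model); the data, the learning rate, and the step counts are invented for illustration.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, w=0.0, b=0.0, steps=20, lr=0.1):
    """Fit y = w * x + b by gradient descent, starting from the given w and b."""
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Source problem: plenty of data, trained for many steps.
xs_a = [i / 10 for i in range(11)]
ys_a = [2.0 * x + 1.0 for x in xs_a]
w_a, b_a = fit(xs_a, ys_a, steps=2000)

# Related target problem: slightly different line, only three points.
xs_b = [0.0, 0.5, 1.0]
ys_b = [2.2 * x + 0.8 for x in xs_b]

# Same small training budget, two different starting points.
w_warm, b_warm = fit(xs_b, ys_b, w=w_a, b=b_a, steps=20)  # reuse saved model
w_cold, b_cold = fit(xs_b, ys_b, steps=20)                # start from scratch

warm_mse = mse(w_warm, b_warm, xs_b, ys_b)
cold_mse = mse(w_cold, b_cold, xs_b, ys_b)
print(f"warm start error: {warm_mse:.5f}, cold start error: {cold_mse:.5f}")
```

Given the same 20 training steps, the model that starts from the saved parameters ends with a lower error on the new problem than the one that starts from scratch, which is the practical payoff of keeping earlier models available for reuse.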

Disruptive technology is now available to eliminate the 80/20 rule, giving data scientists the tools to reclaim much of the time they currently spend discovering and cleansing data. Instead, they can produce innovative work that gives organizations a competitive advantage and helps those organizations transform their businesses and industries.

Simo Vujovic is the Director of Cloud and Cognitive at IBM Canada.