Colin Fraser is a Data Scientist at TELUS, where he focuses on business customer analytics. He is also the Lead Educator for BrainStation’s Data Science course. We sat down with Colin to learn more about his role as a Data Scientist, and to uncover how companies like TELUS use data to drive decision-making.
The field of Data Science is still relatively new; how did you get involved in the industry?
I started out as a quantitative analyst at a futures trading firm, which used to be one of the only kinds of jobs where you could get paid to do the kind of math and science that I wanted to do. I enjoyed the work very much, but the hours were long and there is a fair amount of pressure and stress that goes along with that line of work. At the same time, I started reading about Data Science, which used many of the same kinds of tools I was using in quant finance but applied them to a broader class of business problems, and I started looking around to see if I could make a transition into that type of job. Turned out I could, and it’s been great!
What’s the best part for you about a career in Data Science?
Data Science is ultimately about knowledge creation. It is about figuring out how we can make use of a great big mess of data that our organization collects and turn it into knowledge which informs the operations of the business going forward. Along the way, we make use of state-of-the-art techniques and tools from computer science and statistics. I guess being on the front lines of this sort of knowledge creation activity, making new discoveries about how our customers behave or the impacts of our actions, is the most exciting part of being in data science for me.
What kind of tools do you work with on a day-to-day basis?
One of our most important tools is RStudio Server, which is a great development environment for working with R on a server (and there’s a fully functional free version!). For crunching big data we use Hadoop-based tools like Hive, and for crunching smaller data our main tool is R, although I make use of Python frequently for Natural Language Processing applications. We also use a tool called h2o.ai for doing machine learning, and Tableau for some types of data visualizations.
What does a typical day look like for you and your team at TELUS?
Typically I’ll start working at about 6 in the morning so that I can get at least a few uninterrupted hours of coding in before it’s time for meetings with our various stakeholders. Waking up that early is not a requirement for a career in Data Science, but I find that it is a nice way to make time for deep uninterrupted focus. My team works closely with marketing and sales channels, so a great deal of my time during normal working hours is spent coordinating with them to make sure that we are building data science tools that will be useful. Hence, I spend a lot more time than you might expect in conference calls, meetings, and doing presentations to showcase our work and help turn models into action.
How do you use data to drive decision-making at TELUS?
My team focuses on identifying marketing and customer experience opportunities for business customers. By looking at data on a customer’s billing history, product profile, usage patterns, and other pieces of information such as the business type and size, we are able to make smart recommendations about what kinds of products customers might be able to use. We are also able to identify which customers might be at risk of leaving us for a competitor, or which might have a low level of customer satisfaction. These systems and models drive marketing and customer satisfaction campaigns to continually improve our level of service to our customers.
What are your favorite examples of Data Science in action? Netflix suggested movies, Amazon suggested purchases, etc.
Some of my favorite examples come from epidemiology. In 2008, Google launched Google Flu Trends, which was an effort to predict flu outbreaks in real time using local search terms. The idea was that an uptick in searches within an area for terms like “fever and sore throat” could indicate a flu outbreak in that area. Although it initially looked like a success, the model that they used ended up making some pretty far-off predictions for the 2012-2013 flu season, and Google eventually shuttered the project. The basic idea was good, however, and it inspired many other researchers to build similar tools that have been more successful; for instance, flu-prediction.com uses a combination of Twitter’s firehose of data and IBM Watson to make predictions about flu outbreaks in the United States, and many organizations run Data Science competitions to see who can build the best flu prediction model. My view is that we’ve only seen a sliver of the power of these kinds of tools for public health, and I’m excited for the future of the intersection between computer science, statistics, and epidemiology.
Why is learning Data Science applicable to everyone, not just those looking to become Data Scientists?
As organizations grow in sophistication with respect to their data collection operations, it is inevitable that they will want to find a way to make use of the vast seas of data generated by their operations. To do that, you need to understand what is possible. Data Science seeks to provide a unified framework for talking about the art of the possible: that is, for figuring out what kinds of questions can be answered by data, and what needs to be done in order to answer those questions. Even if you’re not ultimately the person who will be crunching the numbers, it is extremely useful to be able to frame your business problem as a Data Science question and think about how Data Science might solve it, and to be able to say things like “alright, so we can set that up as a supervised learning problem and we can use this dataset as a training set.” That will allow you to communicate effectively with Data Science professionals, avoid going down dead ends, and ultimately make smarter decisions about what to do with your data. For that reason, I believe that anyone in a position to have input on a Data Science related project would be extremely well served to have a basic familiarity with the ideas and language of Data Science, even if they don’t intend to completely get into the weeds or make it their primary job function.
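To make the phrase "supervised learning problem" with a "training set" concrete, here is a minimal sketch in plain Python. It is an illustration only, not TELUS's method: the customer features (monthly spend, support calls), the churn labels, and the function names are all hypothetical, and the nearest-centroid rule is simply one of the easiest classifiers to write out by hand.

```python
from statistics import mean

# Hypothetical training set: each row pairs a customer's features
# (monthly_spend, support_calls) with a known outcome (did they churn?).
TRAINING_SET = [
    ((120.0, 1), False),
    ((95.0, 0), False),
    ((110.0, 2), False),
    ((40.0, 6), True),
    ((55.0, 5), True),
    ((35.0, 7), True),
]


def fit_centroids(training_set):
    """Learn one 'average customer' (centroid) per label."""
    rows_by_label = {}
    for features, label in training_set:
        rows_by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new customer."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(TRAINING_SET)
print(predict(centroids, (50.0, 6)))   # low spend, many support calls
print(predict(centroids, (115.0, 1)))  # high spend, few support calls
```

The point is the framing rather than the algorithm: labeled past examples go in, a rule comes out, and the rule is applied to customers whose outcome is not yet known. A real project would also scale the features and evaluate the model on data it has not seen.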
What’s your best advice for those who want to transition into Data Science?
A lot of job postings out there list advanced degrees as requirements for Data Science positions. Sometimes that’s a red line, but often it is not. Some of the best data scientists I’ve met have been motivated self-learners who have pursued less traditional avenues for learning about this new field. What’s important for hiring managers is that you can demonstrate mastery of the subject in some way, and increasingly it is understood that this demonstration need not be through traditional channels. That means that the best thing you can do is to immerse yourself in the subject and demonstrate publicly that you’re doing so. Online and part-time courses are a great way to show that you’re willing to learn, and putting the code that you end up writing as part of the courses up on something like GitHub is a nice way to get started building a portfolio.
Aside from that, participation in things like Kaggle machine learning competitions is a great way to show that you are engaged with the Data Science community, as well as to show off your chops as an aspiring data scientist.
Finally, a well-executed project that you pull off on your own can be a great way to demonstrate your abilities and impress potential hiring managers. Pick something that you’re really interested in, ask a question about it, and try to answer that question with data. Document your journey and make it into a blog.
When was a moment when YOU received a piece of advice that changed your career?
The best piece of career advice came from well before I was in Data Science, about ten years ago when I was doing business-to-business sales. The advice came from my Sales Manager and it was this: don’t be afraid to talk to anybody if you know you have something that will help them. At the time, the advice was about selling our products to C-level executives, but it has stuck with me throughout my career transition into Data Science. Data Science is ultimately about finding problems and solving them, and if you can use Data Science to solve someone’s problem, no matter who it is, they’ll want to hear what you have to say.
How do you think Data Science will change in the next 5 years?
One of the most important changes in Data Science today is the increasing prevalence of so-called Big Data. Big Data refers to data which takes up too much space to work with on a single computer, so special tools have to be used to put multiple computers to work on the same problem at the same time. For now, there are some Data Scientists who specialize in Big Data problems and others who don’t, and that’s fine, but I can see that in the next five years these tools will become more and more indispensable.
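The idea of spreading one problem across several computers can be sketched in a few lines of Python. This is a simplified, single-machine illustration under stated assumptions: each chunk stands in for the share of data held by one computer, the usage figures are made up, and the function names are hypothetical rather than taken from any Big Data tool.

```python
def split_into_chunks(values, n_chunks):
    """Divide the data into roughly equal pieces, one per (pretend) machine."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_summary(chunk):
    """Work done independently on each machine: a local sum and count."""
    return sum(chunk), len(chunk)


def combine(summaries):
    """Merge the small per-machine summaries into one overall average."""
    total = sum(chunk_sum for chunk_sum, _ in summaries)
    count = sum(chunk_count for _, chunk_count in summaries)
    return total / count


# Made-up usage figures standing in for a dataset too large for one machine.
usage = list(range(1, 101))
chunks = split_into_chunks(usage, 4)
summaries = [partial_summary(chunk) for chunk in chunks]
print(combine(summaries))  # 50.5
```

No single step ever needs the whole dataset at once: each piece is summarized where it lives, and only the small summaries are brought together.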
Inspired? Learn how to use data to drive decision-making in your role. Learn more about our upcoming Data Science course.