Colin Fraser is a Data Scientist at TELUS, where he focuses on business customer analytics, and the Lead Educator for BrainStation’s Data Science course. We sat down with Colin to learn more about his role as a Data Scientist and to uncover how companies like TELUS use data to drive decision-making.
I started out as a quantitative analyst at a futures trading firm, which used to be one of the only kinds of jobs where you could get paid to do the kind of math and science that I wanted to do. I enjoyed the work very much, but the hours were long, and a fair amount of pressure and stress goes along with that line of work. At the same time, I started reading about Data Science, which used many of the same kinds of tools I was using in quant finance but applied them to a broader class of business problems, and I started looking around to see if I could make a transition into that type of job. It turned out I could, and it’s been great!
Data Science is ultimately about knowledge creation. It is about figuring out how to take the great big mess of data that our organization collects and turn it into knowledge that informs the operations of the business going forward. Along the way, we make use of state-of-the-art techniques and tools from computer science and statistics. Being on the front lines of this sort of knowledge creation, making new discoveries about how our customers behave or about the impact of our actions, is the most exciting part of being in Data Science for me.
One of our most important tools is RStudio Server, which is a great development environment for working with R on a server (and there’s a fully functional free version!). For crunching big data we use Hadoop-based tools like Hive, and for crunching smaller data our main tool is R, although I make use of Python frequently for Natural Language Processing applications. We also use a tool called h2o.ai for doing machine learning, and Tableau for some types of data visualizations.
Typically I’ll start working at about 6 in the morning so that I can get at least a few uninterrupted hours of coding in before it’s time for meetings with our various stakeholders. Waking up that early is not a requirement for a career in Data Science, but I find that it is a nice way to make time for deep, uninterrupted focus. My team works closely with marketing and sales channels, so a great deal of my time during normal working hours is spent coordinating with them to make sure that we are building Data Science tools that will be useful. As a result, I spend a lot more time than you might expect on conference calls, in meetings, and giving presentations to showcase our work and help turn models into action.
My team focuses on identifying marketing and customer experience opportunities for business customers. By looking at data on a customer’s billing history, product profile, usage patterns, and other pieces of information such as the business type and size, we are able to make smart recommendations about which kinds of products a customer might be able to use. We are also able to identify which customers might be at risk of leaving us for a competitor, or which might have a low level of customer satisfaction. These systems and models drive marketing and customer satisfaction campaigns to continually improve our level of service to our customers.
Some of my favorite examples come from epidemiology. In 2008, Google launched Google Flu Trends, which was an effort to predict flu outbreaks in real time using local search terms. The idea was that an uptick in searches within an area for terms like “fever and sore throat” could indicate a flu outbreak in that area. Although it initially looked like a success, the model ended up making some pretty far-off predictions for the 2012-2013 flu season, and Google eventually shuttered the project. The basic idea was good, however, and it inspired many other researchers to build similar tools that have been more successful. For instance, flu-prediction.com uses a combination of Twitter’s firehose of data and IBM Watson to make predictions about flu outbreaks in the United States, and many organizations run Data Science competitions to see who can build the best flu prediction model. My view is that we’ve only seen a sliver of the power of these kinds of tools for public health, and I’m excited for the future of the intersection between computer science, statistics, and epidemiology.
As organizations grow in sophistication with respect to their data collection, it is inevitable that they will want to find a way to make use of the vast seas of data generated by their operations. To do that, you need to understand what is possible. Data Science seeks to provide a unified framework for talking about the art of the possible: that is, figuring out what kinds of questions can be answered by data, and what needs to be done in order to answer those questions. Even if you’re not ultimately the person who will be crunching the numbers, it is extremely useful to be able to frame your business problem as a Data Science question and think about how Data Science might solve it, and to be able to say things like “alright, so we can set that up as a supervised learning problem and we can use this dataset as a training set.” That will allow you to communicate effectively with Data Science professionals, avoid going down dead ends, and ultimately make smarter decisions about what to do with your data. For that reason, I believe that anyone in a position to have input on a Data Science-related project would be extremely well served by a basic familiarity with the ideas and language of Data Science, even if they don’t intend to get completely into the weeds or make it their primary job function.
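To make the "supervised learning problem" and "training set" framing concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than anything described in the interview: the two features (monthly spend and number of support calls), the churn labels, the toy numbers, and the choice of a nearest-centroid classifier as the learning method.

```python
from statistics import mean

# Hypothetical training set: each row pairs a feature vector
# (monthly_spend, support_calls) with a known label (did the customer churn?).
TRAINING_SET = [
    ((120.0, 0), False),
    ((95.0, 1), False),
    ((110.0, 0), False),
    ((40.0, 5), True),
    ((55.0, 4), True),
    ((35.0, 6), True),
]


def fit_centroids(rows):
    """Learn one average feature vector (centroid) per label."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance).

    Features are left unscaled to keep the sketch short; real work would
    normally put them on comparable scales first.
    """
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: dist2(centroids[label]))


centroids = fit_centroids(TRAINING_SET)
print(predict(centroids, (45.0, 5)))   # a low-spend, high-contact customer
print(predict(centroids, (115.0, 1)))  # a high-spend, low-contact customer
```

The point is the shape of the problem rather than the model: labeled historical examples go in, and a rule for labeling new, unseen customers comes out.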
A lot of job postings out there list advanced degrees as requirements for Data Science positions. Sometimes that’s a red line, but often it is not. Some of the best data scientists I’ve met have been motivated self-learners who have pursued less traditional avenues for learning about this new field. What’s important for hiring managers is that you can demonstrate mastery of the subject in some way, and increasingly it is understood that this demonstration need not come through traditional channels. That means that the best thing you can do is to immerse yourself in the subject and demonstrate publicly that you’re doing so. Online and part-time courses are a great way to show that you’re willing to learn, and putting the code that you write for those courses on something like GitHub is a nice way to start building a portfolio.
Aside from that, participation in things like Kaggle machine learning competitions is a great way to show that you are engaged with the Data Science community, as well as to show off your chops as an aspiring data scientist.
Finally, a well-executed project that you pull off on your own can be a great way to demonstrate your abilities and impress potential hiring managers. Pick something that you’re really interested in, ask a question about it, and try to answer that question with data. Document your journey and make it into a blog.
The best piece of career advice I’ve received came well before I was in Data Science, about ten years ago when I was doing business-to-business sales. The advice came from my Sales Manager, and it was this: don’t be afraid to talk to anybody if you know you have something that will help them. At the time, the advice was about selling our products to C-level executives, but it has stuck with me throughout my career transition into Data Science. Data Science is ultimately about finding problems and solving them, and if you can use Data Science to solve someone’s problem, no matter who it is, they’ll want to hear what you have to say.
One of the most important changes in Data Science today is the increasing prevalence of so-called Big Data. Big Data refers to data that takes up too much space to work with on a single computer, so special tools have to be used to recruit multiple computers at the same time to work on Big Data problems. Today, some Data Scientists specialize in Big Data problems and others don’t, and that’s fine for now, but I can see these tools becoming more and more indispensable over the next five years.
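The idea of splitting one job across several workers and then combining their partial results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads on a single machine stand in for separate computers, the sample lines are made up, and the function names are hypothetical. It is not how Hadoop or Hive work internally, only the same split, process, and combine pattern at toy scale.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(lines):
    """Map step: one worker counts the words in its own slice of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(items, n_chunks):
    """Split a list into roughly n_chunks equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def distributed_word_count(lines, n_workers=3):
    """Fan the chunks out to workers, then merge their partial counts."""
    chunks = split_into_chunks(lines, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Reduce step: add each worker's partial result into the total.
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return total


# Made-up sample data standing in for a much larger dataset.
LINES = [
    "invoice paid late",
    "invoice paid",
    "support call about invoice",
    "support call",
]
totals = distributed_word_count(LINES)
print(totals["invoice"], totals["support"])
```

Because each chunk is processed independently, the same structure carries over when the workers are separate machines and the chunks are too large for any one of them to hold the full dataset.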
Inspired? Learn how to use data to drive decision-making in your role. Learn more about our upcoming Data Science course.