The Cloud Blog

5 Skills Every Data Scientist Will Need For Their Job in 2018 - Lava Protocols

Written by Admin | Mar 28, 2018 9:50:43 AM

by Devon Hopkins

From Reddit to the New York Times, data scientists are in hot demand. Many want to break into the field, but the available career advice can be overwhelming: Which coding languages should you know? Do you have to be an expert in machine learning? Is it better to beef up on your technical skills or to nail down your design principles?

To answer these questions (and more), CARTO’s newest addition to the Data & Research team, Wenfei Xu, is here to tell us about the journey that ultimately led her to CARTO. Below, she shares her five secrets for success.

1. A foundation in statistics helps you understand large datasets

At the University of Chicago, Xu studied economics, including simple and multiple linear regression, time series econometrics, and forecast modeling. Though she’s no longer trying to model the price of fixed income instruments, she still relies on her undergrad training to guide her data analysis.

For example, Xu is currently working on a project to better understand how people use public parks in New York City through mobile phone data. Working with terabytes of data (like the latitude and longitude where the app was opened, the timestamp, the length of use), she was able to choose the right distribution method (logarithmic) to evaluate the data.
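One plausible reading of the "logarithmic" choice above is a log transform applied to heavily skewed values such as session lengths. The sketch below illustrates that idea only under this assumption: the numbers are made up, the helper names are hypothetical, and it uses the Python standard library rather than whatever tooling the parks project actually relied on.

```python
import math
import statistics

# Hypothetical session lengths in seconds (invented values, not project data).
# A few very long sessions drag the mean far above the median.
session_seconds = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def log_transform(values):
    """Return the natural log of each value (values must be > 0)."""
    return [math.log(v) for v in values]


def mean_median_gap(values):
    """Distance between mean and median, in units of the standard deviation.

    A rough skewness indicator: near 0 for symmetric data, larger when a
    long tail pulls the mean away from the median.
    """
    gap = abs(statistics.mean(values) - statistics.median(values))
    return gap / statistics.stdev(values)


raw_gap = mean_median_gap(session_seconds)
log_gap = mean_median_gap(log_transform(session_seconds))

print(f"raw gap: {raw_gap:.3f}")
print(f"log gap: {log_gap:.3f}")
```

On data like this, the gap shrinks sharply after the transform, which is the usual reason to work on a log scale before summarizing or modeling long-tailed measurements.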

2. Study design principles to better understand how to visualize data

After three years working in finance, Xu decided to go back to school and study architecture and urban planning. While at MIT, she learned how to use design principles to prioritize certain information. Xu worked on a project last month about the different communities living in Williamsburg, Brooklyn. As a former resident of the neighborhood, she wanted to depict the co-habitation of those communities using visual design principles. In her analysis, she used pick-up and drop-off data from New York City taxis to identify different sub-groups.

She found that a substantial number of people, nicknamed “partiers,” take taxis from Lower Manhattan and other parts of Brooklyn to Williamsburg, generally pretty late at night and on the weekend. To make that message stick out, she used a simple black-and-white map with street boundaries and overlaid it with bright, primary colors for each of the pick-ups and drop-offs.
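The "partiers" group described above could be isolated with a simple filter on drop-off location, time of day, and day of week. The following is a minimal sketch, not Xu's actual method: the records are invented, the field names (`pickup_zone`, `dropoff_zone`, `pickup_time`) are hypothetical, and the cut-offs for "late at night" and "weekend" are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records (invented for illustration; not real taxi data).
trips = [
    {"pickup_zone": "Lower Manhattan", "dropoff_zone": "Williamsburg",
     "pickup_time": datetime(2016, 6, 4, 23, 30)},   # Saturday, late night
    {"pickup_zone": "Bushwick", "dropoff_zone": "Williamsburg",
     "pickup_time": datetime(2016, 6, 5, 1, 15)},    # Sunday, after midnight
    {"pickup_zone": "Midtown", "dropoff_zone": "Williamsburg",
     "pickup_time": datetime(2016, 6, 7, 9, 0)},     # Tuesday morning
    {"pickup_zone": "Williamsburg", "dropoff_zone": "Lower Manhattan",
     "pickup_time": datetime(2016, 6, 4, 23, 45)},   # leaving, not arriving
    {"pickup_zone": "Lower Manhattan", "dropoff_zone": "Williamsburg",
     "pickup_time": datetime(2016, 6, 4, 14, 0)},    # Saturday afternoon
]


def is_late_night(ts):
    """Assumed definition: 10 pm up to (but not including) 4 am."""
    return ts.hour >= 22 or ts.hour < 4


def is_weekend(ts):
    """Assumed definition: Saturday (5) or Sunday (6)."""
    return ts.weekday() >= 5


def is_partier_trip(trip):
    """Trip that ends in Williamsburg, starts elsewhere, late on a weekend."""
    return (
        trip["dropoff_zone"] == "Williamsburg"
        and trip["pickup_zone"] != "Williamsburg"
        and is_late_night(trip["pickup_time"])
        and is_weekend(trip["pickup_time"])
    )


# Count where the matching trips started.
partier_origins = Counter(
    trip["pickup_zone"] for trip in trips if is_partier_trip(trip)
)
print(partier_origins)
```

At the scale of real taxi data, the same filter would more likely be written with Pandas or Dask, both of which appear in the tools list at the end of this post.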

3. Practice telling a story with your data

According to Xu, the best statistical models and sharpest design principles should ultimately come together to tell a narrative. When she created the Williamsburg taxi map, Xu discovered that there were 75 potential communities she could include, far too many for one map. But because she had a clear story in mind (that demographically disparate communities co-exist in Williamsburg, often in overlapping space), she was able to support her argument by whittling down the options to the five best groups, even if they weren’t necessarily the largest.

4. Find a community or group to bounce ideas off of

Wenfei says she’s lucky because CARTO sits “at the intersection of industry and academia,” meaning she has access to the best minds in both. She can also count on her coworkers for help. For example, many of the parallel processing and visualization tools she’s currently using were introduced to her (or made) by her colleagues. If you don’t have your own data science squad yet, you’re in luck. Wenfei writes a newsletter that you can join.

5. Pay attention to these tools, concepts, and programming languages

Hard skills are also important, especially when it comes to landing your data scientist dream job. Wenfei recommends the following:

Concepts to know

  • a solid foundation in statistics
  • hypothesis testing
  • linear regressions
  • machine learning
  • visualization principles
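As a small illustration of the "linear regressions" item above, here is a sketch of a simple (one-variable) least-squares fit written with only the Python standard library. The data points and the function name `fit_line` are made up for the example; in practice a library such as NumPy or scikit-learn would usually do this work.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope}, intercept={intercept}")
```

Writing the fit out by hand once makes it easier to interpret what library routines return later on.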

Skills to have

  • Python or R
  • spatial analysis
  • cloud computing and distributed computing methods
  • database skills such as PostgreSQL

Tools to be familiar with

  • the IPython/Jupyter environment
  • Matplotlib
  • Pandas
  • NumPy
  • Dask
  • Bokeh
  • scikit-learn

