    KDnuggets » News » 2020 » Jan » Opinions » I wanna be a data scientist, but… how? (20:n03)

    I wanna be a data scientist, but… how?


    It’s easy to say "I wanna be a data scientist," but... where do you start? How long does it take to become attractive to companies? Do you need a Master’s degree? Do you need to know every mathematical concept ever derived? The journey might be long, but follow this plan to help you keep moving forward toward your career goal.

    By , Data Scientist.



    If you start searching online for the skills required for a data scientist position, the easiest thing to do is panic, unless your motivation is real. Data science covers so many things that it can be overwhelming, much like the Moscow Metro map.

    And that was the map in 2013, before they opened 40 new stops.

    Of course, from the companies’ point of view… what do you ask for when you want a profile that can deal with every point in the picture above? Well… everything that fits in the job description field (I guess it has a limit, but I’m not sure at all):

    “… I have also seen an infographic on the internet that will save us the task of looking for 40 requirements out there… And hey! If we are lucky with this, we can also cover the position of Data Engineer and even that of an Architect, and we’ll get a 3 for 1.”

    Don’t panic! All those skills are the ones that add up between the two.

    From the point of view of an applicant with a lot of experience: it’s not about fitting the description of a unicorn, much less doing so overnight. In the last 2 years, I had the opportunity to interview several people for data scientist positions, and some of the candidates did not cover half of the skills that are usually required on LinkedIn for this position… despite having worked as such for years! Be very careful with the requirements described in job offers, which are more dangerous than the leaflets that come with medicines :)

    And then… how will I know which skills I need?

    Some months ago I had time to read a lot of articles with experiences and advice on the subject (written by people who got the job by studying on their own), and I looked for a pattern to see where my steps should go. The journey could be more or less long, but some things always appeared:

    • Have a good foundation in algebra, calculus, probability, and statistics (the maths that we swallow in the first 2 years of any engineering degree).
    • Python or R as a programming language, and their corresponding libraries for Data Science.
    • Knowledge of SQL to query databases (with joins and those things, not difficult).
    • Obtaining data from different sources (API queries, web scraping, …).
    • Cleaning and preprocessing of data (and the famous feature engineering).
    • Machine Learning (algorithms, modeling, evaluation, optimization, etc).
    • Deep Learning, Reinforcement Learning, Natural Language Processing, Computer Vision, …


    • Creation of visualizations to explain the results.
    • Storytelling
    • Formulation of questions and preparation/testing of hypotheses.
    • Domain knowledge.

    For me, it’s enough… although surely more than one reader will miss something in there. What is clear is that, in order to draw up a plan that will take us to the goal, we won’t need many more things, in the same way that we don’t need to be the fastest in order to finish a race. It’s simply about acquiring the theoretical and practical knowledge that allows us to perform the tasks that clearly belong to that role, regardless of the cloud platform that the companies use, their version control system, their degree of automation, etc. Well… in fact, those additional skills (and the so-called soft skills) are what will differentiate our profile from others and the key that can give us the job of our dreams; but first, let’s go for the basics, right?
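As a taste of one of those basics, here is a minimal sketch of the kind of SQL join mentioned in the list, run from Python with the standard-library `sqlite3` module. The `customers` and `orders` tables and their rows are made up purely for illustration; they are assumptions, not data from any real project.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Luis'), (3, 'Marta');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 10.0);
""")

# LEFT JOIN keeps customers that have no orders; COALESCE turns their
# missing total (NULL) into 0.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for name, n_orders, total in rows:
    print(name, n_orders, total)
conn.close()
```

Swapping `LEFT JOIN` for `JOIN` would drop the customer without orders, which is the sort of difference worth being able to explain in an interview.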

    The following diagram simplifies the previous list. This is a world fed mainly by developers and mathematicians, since they already have a pillar and a half (or two) of the three that support Data Science. Still, we should not underestimate the domain where it is applied: there are many use cases of Machine Learning in all sectors, but evidently a bank is not the same as a hospital, and knowledge of a specific field will always help us to better understand the data and to ask the right questions to obtain valuable answers.

    I have seen a cute unicorn.

    In my case, I have the luck and the advantage of coming from engineering, where I learned programming and an absurd amount of mathematics. I have been working as a developer, analyst and even architect for many years, and lately I’ve been very close to the data. I also have a (practical) Master’s in Big Data and Business Analytics. And even for me, there is a long path to follow (or several).

    Hey! Do I need a master’s degree?

    I’d tell you that it is not necessary. There is enough quality information on the Internet to match and exceed the knowledge and skills that a master’s degree can provide, even the most practical and complete of them all.

    But companies are looking for people with additional and certified training…

    It’s true; that’s something frequently asked for (in my opinion we have a serious problem of degreetitis). But the most important thing is to show your knowledge, not your titles. A technical interviewer will value what you really know above everything else; you’ll simply have to convince them that you are the right person for the position.


    The most important thing is always to have a plan

    Knowing how you are going to organize yourself is the key to achieving your goals, whatever the goal is. That’s why it’s worth taking your time to draw up a plan and writing it down in as much detail as possible…

    We could not miss a cat.

    Sounds good to me. But… I have no idea where to start!

    Right now, I’ll explain the itinerary that I would choose for myself if I started from scratch…

    1. Choose between Python or R. My advice: if you already have experience with one of them, stay with that. If not, choose Python (you’ll never think you made a bad decision, I promise). Set up your local environment or jump to the as Goku.
    2. Obviously you’ll need to know of the language in order to start writing code. You can gradually expand your knowledge, for example, with a web full of short . Meanwhile, it’s good that you keep an eye on , , or , since they will be your day-to-day tools.
    3. The next thing would be to know the basics of the most used libraries for Data Science. In the case of Python, we have, for example, NumPy, pandas, Matplotlib or scikit-learn. For R, we have dplyr, tidyr, ggplot2, knitr, caret, dmlc, or mlr. I recommend you to follow that covers all of them, or read about each one, and of course: write code while you learn.
    4. Machine Learning! There are a lot of introductory courses, books, and resources at your fingertips. We have ’s course in to learn the fundamentals. If you chose Python, I recommend you the course in . There are a couple more courses with excellent reviews on Udemy (, ), and a highly recommended (which goes a little further). There are also a number of specialized websites where you can find . The options are almost endless!
    5. Deep Learning, Reinforcement Learning, and company. Again there is a lot of free (or almost free) information. We have a specialization in , again from Andrew Ng, a very complete course in , the by the creator of Keras, and another with good reviews. We will start by getting a global view of everything that this “field” encompasses, with its practical application, and it will be in our hands to choose the branch in which we want to continue… deepening.
    6. Participate in competitions (start once you’ve reached point 4!). Kaggle is a place where you can find a lot of user-friendly datasets to practice and test yourself against other data scientists. This last part is the best of Kaggle, since you can find out how good your model really is, and whether you’ve messed up because of a little mistake you made (something you’ll never find out easily with real-world problems). Moreover, there’s a lot of information available in kernels and forums to learn and improve day-to-day. Choose an open competition or an old dataset you may like, and… play!
    7. Most important tip: do projects. Pose a problem and try to solve it. Find a topic that motivates you or that is related to a sector where you want to work. Use real-world but also build them from scratch by getting the information from where it’s located. Create your own data flow. Learn to clean and preprocess any kind of messy data. Choose the most suitable algorithms, compare models, and optimize their parameters. Try to tell a story accompanying the results, and decorate it with stunning visualizations that you’ll create (or you saw out ). Try something new in every project you start: try to automate processes, consider how you would take a notebook to a production environment, …. This is where a world of possibilities opens up!
    8. Prepare for job interviews. If you get to this point successfully, for sure, you can defend yourself well… or not? Let’s see… What experience do you have in real projects? Which SQL query would you write to extract this information from that database? Do you know Docker and Kubernetes? What about Spark? Have you administered a Hadoop cluster? Have you used Elastic? What experience do you have with Kafka? OK… there are countless technologies we haven’t touched (so far) which may come up at a certain point. But I consider them add-ons you’ll need (or not), with the sole objective of passing the interview for a position where additional knowledge is needed (or not xD). Don’t think too much about this, and never use it as an excuse to postpone your first interview: you’ve already learned a lot of things, which by the way were far more important and complicated.
      As a tip: if you see that some requirement is repeated a lot in job offers that you like… maybe you should keep an eye on it. If you go to an interview and fail a question… take note, go home, and strengthen your knowledge on that subject.
      Doing interviews is part of the journey; the most important thing is that you accept this from the beginning and learn from it!
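The advice in step 7 to compare models and optimize their parameters can be sketched in a few lines. What follows is a minimal illustration in plain Python, not a recommended toolchain: the toy dataset, the helper names (`make_data`, `knn_predict`, `cross_val_accuracy`) and the candidate values of `k` are all assumptions made up for this example. In a real project you would more likely reach for scikit-learn utilities such as `cross_val_score` or `GridSearchCV`.

```python
import random
from collections import Counter


def make_data(n, seed=0):
    """Toy two-class dataset: class 0 is centered at (0, 0), class 1 at (3, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Average validation accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        valid = data[i::folds]  # every row whose index satisfies j % folds == i
        train = [row for j, row in enumerate(data) if j % folds != i]
        hits = sum(knn_predict(train, p, k) == y for p, y in valid)
        scores.append(hits / len(valid))
    return sum(scores) / folds


data = make_data(200)
# Try a few values of the hyperparameter k and keep the best-scoring one.
results = {k: cross_val_accuracy(data, k) for k in (1, 5, 15)}
best_k = max(results, key=results.get)
print(results, best_k)
```

The point is the workflow rather than the model: every candidate is scored on data it was not trained on, and the choice of parameter is driven by that score.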

    It’s always great to know our limitations so that we can fix them. Following the previous plan, we’ll realize on the fly if we need more time on something. For example, it’s possible that at some point we’ll have to reinforce our knowledge of statistics because we don’t understand concepts we see repeated over and over again. Or maybe it’s necessary to put more focus on the programming part. Do not fear: we’ll see …


    And to execute a plan, we’ll have to follow some tactics

    All right, but… when will I know that I’m ready to move from one point to another? What degree of depth and knowledge will be required in each subject to get the job?

    Good question. In fact, everything is heading towards point 7 (do projects!), so the ideal would be to get there as quickly as possible, with enough knowledge to defend yourself in a good part of the flow of a typical data project.

    Well, let’s consider a tactic that will help us to optimize the trip. These are some of the key points for me:

    • Learn by doing: The best way to learn something is to put it into practice. Spend most of your time writing code. And I’ll say it again: do projects!
    • Organize your agenda. Try to spend some time learning each day. Set small milestones with realistic deadlines (you can use a if that helps you) and try to meet them. Check what you could accomplish and what you couldn’t. Don’t get overwhelmed, but do not relax either :)
    • Learn as if you had to teach: Take notes, make summaries, draw diagrams… very good! But you don’t really understand something unless you can explain it to your grandmother. And that’s the reason why I decided to start this blog :) You can follow the .

    • Take the top-down approach. With a bottom-up approach, we would follow the classic flow of learning: first learn all the small pieces before you can reach the whole. An example of this approach would be to choose an Algebra course, another one for Calculus and one more for Probability and Statistics, with the sole purpose of being able to face the Machine Learning algorithms. With a top-down approach, we’ll simply try to learn Machine Learning, scratching the surface of (or going deeper into) the mathematical part when necessary. This way we won’t lose our motivation, our focus, or our time on something that may be irrelevant. Did you learn what offside is before playing your first soccer game?
    • Be resourceful: there are a and tools (click on the link!). As important as having a solid knowledge is the ability to quickly locate what we don’t know or don’t remember.
    • Upload your projects to GitHub. Those will be your credentials to apply for the job you want. If you don’t have paid experience, you’ll need to prove experience with your own projects. On the Internet, you can find a lot of ideas or papers, and you can also try to solve a real problem or a concern from your day-to-day.

    • Don’t let yourself be drowned by the amount of information published daily. There are many people doing very interesting and innovative things, but you need to focus on acquiring the base that will enable you to become a data scientist; you can ignore what is published every minute on Twitter.
    • Prepare the interviews thoroughly. There’s a lot of information on this (lists of typical questions, tips to improve your CV, …) and even mentors if you need extra help. By the way: it is essential that you are able to explain what you did in your data projects.
    • Remain up to date once the goal is reached. Subscribe to the most relevant and , follow the data on Twitter or LinkedIn, participate in forums, attend meetups, try to be Gold in a Kaggle competition, or simply expand your skills.

    Final tip: the journey is long, so don’t face it as a speed race, but as a half marathon. Be constant and follow your plan, but pace yourself. Surely there will come a time when you think of giving up, but that’s also part of the process. In the end, as in all long distances, the key is to keep moving forward :)

    Reposted with permission.

