There is a good chance that if you are reading this, you do not really know what Data Science is. You may never have heard of it at all. Let me do my best to explain what Data Science is and why it is worth considering as a college major.
“Data Science is a mix of Computer Science and math.” That is what my dad told a curious 17-year-old me who did not have a major in mind. In practice, that definition is fairly accurate. More formally, Data Science is the science and process of extracting insight from data. This includes gathering and cleaning the data as well as analyzing it and reporting the results. Let’s break this down step by step.
Gathering the Data
Gathering data is pretty straightforward. A very common task across all industries is using past data to predict future data, for example, predicting the stock market. To make an informed prediction about the price of a stock, it makes sense that the first step is to gather all the information there is to know about that stock. If I am going to try to predict the price of Tesla stock, I need data about the history of the Tesla stock price. Gathering this data requires knowing where to find it as well as the technical skill to actually go out and get it.
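To make this step a little more concrete, here is a minimal Python sketch of what "getting" the data can look like once you have found it. It assumes the price history arrives as CSV text with `date` and `close` columns. The column names, the `load_prices` helper, and every value in the sample are made up for illustration; they are not real Tesla prices, and a real project would read from a file or a data provider instead of a string.

```python
import csv
import io

# Hypothetical sample data: made-up values, NOT real Tesla prices.
RAW_CSV = """date,close
2024-01-02,100.0
2024-01-03,102.5
2024-01-04,
2024-01-05,101.0
"""

def load_prices(text):
    """Parse CSV text into a list of (date, close) pairs, both kept as strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["date"], row["close"]) for row in reader]

rows = load_prices(RAW_CSV)
print(len(rows), "rows loaded")
```

Notice that one row has no closing price. Gathering only collects what is there; dealing with gaps like that belongs to the next step.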
Cleaning the Data
Now that you have Tesla stock price data, the next logical step is to prepare it so that it is useful for your task. Is this variable useful? Are all of the values reasonable? Are there any missing values? Are there variables that do not exist yet that I need to create? These are all questions a Data Scientist has to answer when it is time to “clean” the data. This step requires not only technical skill but also industry knowledge. It also happens to be the most important step in the data lifecycle. “Garbage in, garbage out” is a common phrase in the data world. It essentially means that if the data itself is bad, the result will also be bad, no matter how good everything else is.
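The questions above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the records are made-up values (not real prices), a "reasonable" price is taken to mean any positive number, and the new variable we create is the change from the previous kept row. The helper names `clean` and `add_daily_change` are hypothetical, and real cleaning rules depend heavily on the industry and the task.

```python
# Hypothetical raw records: made-up values, NOT real prices.
raw = [
    ("2024-01-02", "100.0"),
    ("2024-01-03", "102.5"),
    ("2024-01-04", ""),       # missing value
    ("2024-01-05", "-5.0"),   # not a reasonable price
    ("2024-01-08", "101.0"),
]

def clean(records):
    """Keep only rows whose price parses as a positive number."""
    cleaned = []
    for date, close in records:
        try:
            price = float(close)
        except ValueError:
            continue  # skip missing or non-numeric values
        if price > 0:
            cleaned.append((date, price))
    return cleaned

def add_daily_change(records):
    """Create a new variable: the change from the previous kept row."""
    out = []
    previous = None
    for date, price in records:
        change = None if previous is None else price - previous
        out.append((date, price, change))
        previous = price
    return out

prices = add_daily_change(clean(raw))
for row in prices:
    print(row)
```

Dropping bad rows is just one possible choice. Depending on the task, a Data Scientist might fill in a missing value or investigate a strange one instead of discarding it.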
Analyzing the Data
Now that we have gathered and cleaned the data, it is time to do the fun part: analyze the data. This is where the common (mis)understanding of Data Science comes in. Data Scientists often use math and computer science techniques, in the form of statistical analysis and machine learning models, to complete this step. Here, we actually predict the price of the Tesla stock.
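As a toy example of what "statistical analysis" can mean, the sketch below fits a straight line to a handful of past prices using ordinary least squares and extends that line one day forward. The prices are made up, the `fit_line` helper is a hypothetical name, and a straight-line trend is far too simple to predict a real stock. It is here only to show the shape of the idea: learn a pattern from past data, then apply it to the future.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical cleaned data: day number and closing price (made-up values).
days = [0, 1, 2, 3, 4]
closes = [100.0, 102.0, 104.0, 106.0, 108.0]

slope, intercept = fit_line(days, closes)

# Extend the fitted trend one day past the data we have.
next_day = 5
prediction = slope * next_day + intercept
print(f"Predicted close for day {next_day}: {prediction:.2f}")
```

Real models usually use many more variables than the day number, and a large part of the analysis is checking how far a prediction like this can be trusted.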
Reporting the Data
Finally, now that we have our predicted Tesla stock price, it is time to show the results to our clients. We use data visualization techniques to effectively explain the process we used to arrive at our conclusion. This usually involves designing graphs and plots that support the data analysis. This step is one of the most underrated parts of Data Science. Oftentimes, great analysis can be ruined by poor visualizations and presentation. Think of it as the marketing of the Data Science world. If your products are good but your advertisements are bad, chances are you will not be making many sales.
Final Thoughts
Data Science is much more complex than just predicting a stock. Anything that you can think of that involves data can likely use Data Science. It is one of the fastest-growing industries, with a booming job market. With the amount of data collected worldwide growing at an exponential rate, the need for data scientists who can effectively analyze all of this data will not only grow but become essential in every industry.
One of the things I love the most about Data Science is the flexibility. Not only can you be a Data Scientist in any industry, but there are many different specializations to choose from. As I outlined above, the four steps in the data lifecycle are very different from one another and require different skill sets. In my time as a Data Scientist, I have worked on each step in the data lifecycle. It keeps things new and exciting, and there are always new challenges on the horizon.
If you like math and computer science but maybe are not sure about the prospect of grinding code all day, or want to be one of the pioneers of a new and exciting field, Data Science might just be for you.