Data Science and Intuitive Mathematics


Sitting at the crossroads of mathematics, statistics, and computer science, the emerging field of data science (ranked by many as the top career in the US) can seem daunting to those still developing strong technical skills. At the same time, a host of dynamic and highly efficient libraries give coders the power to treat complex areas like machine learning as a black box.

There is, however, a middle ground: an intuitive understanding of mathematics that makes that box somewhat less opaque. Moreover, it suggests an end-goal for students learning mathematics at all levels: that there is something important to be gained from studying math quite aside from the computations and the proofs.

Example I – Calculus and Optimization

Machine learning is concerned, generally speaking, with the task of teaching computers to learn from data without constant human oversight. In practice, the computer is programmed with a vague model of learning, and fills in the specifics in accordance with the data it is given. And how does the program know when it has succeeded or failed? The scientist provides a function that measures the explanatory power of the computer’s learning, and charges the computer with optimizing it. Then the program runs and runs until it arrives at a conclusion which provides the most explanatory power.

One of the many challenges in implementing this strategy is the problem of realizing when one has stumbled on an optimal solution. Should the computer decide on the best hypothesis only after considering all the alternatives? In practice this is impossible – such programs may take millions of years to run! Here, calculus provides an answer – at an optimal solution the derivative of the learning function (assuming it is smooth) will be zero. Thus, all we need to do is start with a poor solution and tinker with it, step by step, until the derivative shrinks to zero. So far, so good. However, there is still a problem with this approach. Can you guess what it might be?

The answer is: although the derivative is zero at the optimal solution, having a zero derivative does not mean your solution is optimal. Such a solution may only be locally optimal, i.e. better than only the nearby alternatives.

[Figure: the optimization function, showing a solution that is only locally optimal]

Indeed, what can often happen is that your program gets “trapped” in a local optimum, never realizing that although your strategy is better than similar alternatives, it may be far inferior to vastly different ones. In practice, dealing with this problem is tricky and case-specific. Someone interested in such a data problem does not necessarily need to have a mastery of complex optimization techniques. But they do need to understand why this approach has a tendency to produce sub-optimal results and how that relates to the shape of the data and the optimization function, as drawn above. Such an intuitive appreciation can inform them as to when, if necessary, they need to reach for a more sophisticated code module.
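To make the trap concrete, here is a minimal sketch of the "tinker until the derivative is zero" idea, usually called gradient descent. The function f, the starting points, the learning rate, and the step count are all illustrative assumptions chosen for this example, not anything a real learning problem would hand you.

```python
# Illustrative function with two valleys (an assumption for this sketch):
# a shallow one near x = 1.13 and a deeper one near x = -1.30.
def f(x):
    return x**4 - 3 * x**2 + x


def df(x):
    # Derivative of f, worked out by hand.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, learning_rate=0.01, steps=1000):
    # Repeatedly nudge x downhill until the derivative is (nearly) zero.
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


x_right = gradient_descent(2.0)   # start on the right-hand slope
x_left = gradient_descent(-2.0)   # start on the left-hand slope

for name, x in [("start at  2.0", x_right), ("start at -2.0", x_left)]:
    print(f"{name}: x = {x:.4f}, f(x) = {f(x):.4f}, derivative = {df(x):.2e}")
```

Both runs stop where the derivative is essentially zero, yet they land in different places: the run started at 2.0 settles near x = 1.13, while the run started at -2.0 finds the deeper valley near x = -1.30. Nothing in the first run signals that a better answer exists elsewhere, which is exactly the local-optimum problem described above.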

Example II – Linear Algebra and Dimensionality Reduction

Another major challenge in machine learning is dealing with the problem of having too many features. For example, customers’ decisions may be affected by only a small handful of variables, but the company collected information on many more. Or perhaps some variables are so closely related that recording all of them is unnecessary. An abundance of variables makes computation difficult and obscures the underlying meaning present in the data. One common approach to get around this is dimensionality reduction, which amounts to the judicious throwing away of unnecessary variables. This is often accomplished via Principal Component Analysis, or PCA. A foray into the mathematics of PCA requires an understanding of linear algebra, which, while ideal, is not something one can always assume.

So, here is an intuitive explanation. Put simply, PCA isolates the most important variables by computing how much explanatory power would be lost if we forgot about them. Suppose that I am analyzing a sector of the stock market and companies A, B, and C are all performing virtually identically. I could ignore any two of them and lose almost no information from my data set.

[Figure: a collection of data points with two directions, U1 and U2, drawn through it]

The picture above is a more common example. The collection of points represents my data set; U1 and U2 are two distinct combinations of the underlying variables. If I had to choose one of U1 or U2 to record and the other to discard, clearly U2 would have to go, since this combination does not vary as much over the whole data set and is less useful in distinguishing my data points. In practice there may be hundreds or thousands of variables, and since we cannot visualize things as we just did, we use linear algebra and statistics to automate the process.
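The automated version of that picture can be sketched in a few lines with NumPy. The data below is synthetic and invented for this example (two variables that mostly move together), and the names u1 and u2 simply mirror the labels in the figure; this is one standard way to compute PCA, via the eigenvectors of the covariance matrix, not the only one.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): two variables driven by
# one shared factor, plus a little independent noise in each.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
x = shared + 0.1 * rng.normal(size=n)
y = shared + 0.1 * rng.normal(size=n)
data = np.column_stack([x, y])

# Center the data, then find the directions of greatest variance.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # returned in ascending order

# Reorder so the direction with the most variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

u1, u2 = eigenvectors[:, 0], eigenvectors[:, 1]
explained = eigenvalues / eigenvalues.sum()

# Keep only the U1 coordinate of every point and discard U2.
reduced = centered @ u1

print("U1 direction:", u1)
print("U2 direction:", u2)
print("Share of variance along U1 and U2:", explained)
```

Because the two variables move almost in lockstep, nearly all of the variation lies along U1, so storing the single U1 coordinate per point loses very little.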

The advantage of having such an intuitive understanding, even without all the technical details, is that it helps one appreciate any subtleties that may develop. This explanation makes it quite clear that the process of PCA is very sensitive to the choice of units used. Measuring one company in US dollars and another in rubles would throw off the scales and produce poor results, and a scientist who realizes this is less likely to accidentally overlook it.
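The effect of mismatched units is easy to reproduce. In the sketch below, the prices are synthetic, the exchange rate RUB_PER_USD is a made-up round number, and first_component is a hypothetical helper written for this example. Standardizing each column before running PCA is shown as one common remedy; it is an assumption of this sketch rather than something prescribed above.

```python
import numpy as np


def first_component(data):
    # Hypothetical helper: direction of greatest variance in the data.
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    _, eigenvectors = np.linalg.eigh(cov)  # ascending eigenvalue order
    return eigenvectors[:, -1]


# Synthetic prices (an assumption): two companies following the same market.
rng = np.random.default_rng(1)
n = 1000
market = rng.normal(size=n)
company_a_usd = 100 + 5 * market + 2 * rng.normal(size=n)
company_b_usd = 100 + 5 * market + 2 * rng.normal(size=n)

RUB_PER_USD = 90.0  # made-up exchange rate, for illustration only

# Company A recorded in dollars, company B recorded in rubles.
mixed_units = np.column_stack([company_a_usd, company_b_usd * RUB_PER_USD])
u1_mixed = first_component(mixed_units)

# One common remedy: rescale every column to mean 0 and standard deviation 1.
standardized = (mixed_units - mixed_units.mean(axis=0)) / mixed_units.std(axis=0)
u1_standardized = first_component(standardized)

print("U1 with mixed units: ", u1_mixed)
print("U1 after standardizing:", u1_standardized)
```

With mixed units, the first component points almost entirely along the ruble column, simply because its numbers are larger. After standardizing, both companies receive equal weight, which matches the fact that they behave alike.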

Mathematics as Intuition

The purpose of the above two examples is to demonstrate that underlying every technical aspect of mathematics is an intuition that drives it. This intuition is often more accessible than the associated formalism, and in practice the former can sometimes be substituted for the latter. Simply having the right picture in mind can transform a black box into a tool to be wielded creatively.
