Utility-based data marketplaces

In "Computer Mediated Transactions," Hal Varian offers an insightful look at how and why innovation has accelerated so rapidly on the internet. The piece traces the historical development of the internet starting in the 1990s, but it also makes some prescient predictions about the future. Given that it was written in 2010, I found the ‘Deployment of Applications’ section particularly compelling in light of the developments that have taken place since. Most notably, Varian states “in the future it is likely that there will be a number of cloud computing vendors that will offer computing on a utility-based model. This production model dramatically reduces the entry costs of offering online services and will likely lead to a significant increase in businesses that provide such specialized services (Armbrust et al. 2009)” (Varian, 2010). Although a number of the established cloud service providers (Google, Amazon, etc.) have made efforts in this space, I believe that Snowflake is perhaps the best example of the computing future described by Varian.

Snowflake was founded as a private company in 2012 and went public in 2020 in one of the most notable technology IPOs to date. Snowflake offers cloud-based data storage and analytics, a model that has become known as ‘data warehouse as a service.’ What is perhaps most interesting about Snowflake’s capabilities and business model is the decoupling of storage from compute. Customers pay next to nothing to store their data in Snowflake, and are charged on a consumption basis as they run queries on that data. More importantly, Snowflake has created a data marketplace in which customers can share (or sell) data with one another, allowing small and large businesses alike to join datasets both internally and externally. In an age where data has become a strategic differentiator for nearly every business, the ability to democratize data access through shared infrastructure is quite compelling. However, the question remains whether shared infrastructure and utility-based pricing will in fact lead to a more democratic data ecosystem.

I agree with Varian's assertion that utility-based production models are an exciting future; however, I question the implication that this will be a net benefit to data consumers. In the past decade, we have seen news and communications democratized on the internet through businesses like Facebook and Twitter. However, companies like Facebook have struggled to offer a democratic communications utility while also bearing responsibility for what is shared on their platforms (often to the detriment of consumers). I wonder to what extent this offers a cautionary tale for data marketplaces like Snowflake. In the near term, businesses are likely to see lower costs of data analysis and easier access to data they may not have been able to query before. But if Snowflake were to grow the way Facebook did, at what point would it begin to lose control of, or insight into, what types of data are shared and with whom? More importantly, if we believe data computing is in fact a utility, to what extent do we want such a utility completely controlled by one (or a select few) private companies?

