Utility-based data marketplaces

In "Computer Mediated Transactions," Hal Varian examines how and why innovation has accelerated so rapidly on the internet. The piece traces the historical development of the internet starting in the 1990s, but it also makes some prescient predictions about the future. Given that it was written in 2010, I found the ‘Deployment of Applications’ section particularly compelling in light of the developments that have taken place since. Most notably, Varian states “in the future it is likely that there will be a number of cloud computing vendors that will offer computing on a utility-based model. This production model dramatically reduces the entry costs of offering online services and will likely lead to a significant increase in businesses that provide such specialized services (Armbrust et al. 2009)” (Varian, 2010). Although a number of the established cloud service providers (Google, Amazon, etc.) have made efforts in this space, I believe that Snowflake is perhaps the best example of the computing future Varian describes.

Snowflake was founded in 2012 and went public in 2020 in one of the largest software IPOs on record. Snowflake offers cloud-based data storage and analytics, a model that has become known as ‘data warehouse as a service.’ What is perhaps most interesting about Snowflake’s capabilities and business model is the decoupling of storage from compute. Customers pay next to nothing to store their data on Snowflake servers and are charged on a consumption basis as they run queries on that data. More importantly, Snowflake has created a data marketplace in which customers can share (or sell) data with one another, allowing small and large businesses alike to join datasets from both internal and external sources. In an age where data has become a strategic differentiator for nearly every business, the ability to democratize data access through shared infrastructure is quite compelling. However, the question remains whether shared infrastructure and utility-based pricing will in fact lead to a more democratic data ecosystem.

I agree with Varian’s assertion that utility-based production models are an exciting future; however, I question the implication that this will be a net benefit to data consumers. In the past decade, we have seen news and communications democratized on the internet through businesses like Facebook and Twitter. Yet companies like Facebook have struggled to offer a democratic communications utility while also bearing responsibility for what is shared on their platforms (often to the detriment of consumers). I wonder to what extent this offers a cautionary tale for data marketplaces like Snowflake. In the near term, businesses are likely to see lower costs of data analysis and easier access to data they could not previously query. But if Snowflake were to grow the way Facebook did, at what point would it begin to lose control of, or insight into, what types of data are shared and with whom? More importantly, if we believe data computing is in fact a utility, to what extent do we want such a utility completely controlled by one (or a select few) private companies?