This month we are reviewing the book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz.
I’ve had this book in my queue for a while now, and it seemed like it fit well (if somewhat loosely) with last month’s book review of The Mom Test. Both books are about how people lie to us, and to themselves, and how we can uncover the truth.
The Mom Test focuses on asking better questions. Everybody Lies focuses on using better data.
I see these two as different sides of the same coin. In order to truly understand people, their problems, and ultimately our society, we have to use qualitative and quantitative data effectively. We have to cut through the noise and the lies we tell ourselves and each other, and get to the truth.
This is true in most aspects of our lives. To make better public policy, we have to understand reality. To be better parents, we have to have good information. And since we’re very interested in good product development and experiences, we need to truly understand our users and our customers, cutting through fluff to get to the real problems.
So what were some of the key lessons from Everybody Lies? Let’s dive in.
Overview
As the title states, Everybody Lies is focused on data from the internet and what it can tell us about human behavior. Stephens-Davidowitz is a former Google data scientist with a PhD in economics from Harvard, and he draws on both fields, along with data from Google, Facebook, and PornHub, to surface insights we couldn’t get a decade ago.
Out of the gate, the author starts with the supposition that we can’t know what people are truly thinking. Our private thoughts truly are private. And they tend to stay that way. And when we interact with others, we tend to lie or misrepresent ourselves for a variety of reasons. We all want to put on our best face. It may even be because we’re lying to ourselves, and have every intention of only eating candy once a day. But when the temptation arises, we fall short.
That’s where big data can fill gaps. Big data, as defined by the author, is a loose term that simply describes the sheer volume and diversity of data that modern technology allows us to acquire.
The four powers of this modern big data are its:
Novelty
Honesty
Abundance
Controlled Experiments
So how can we apply all of this to what we do?
3 Key Points
Honesty
Information is only as useful as it is honest. That is the difficulty we often face in customer interviews, surveys, and other information gathering: we are all dishonest to varying degrees. We all suffer from social desirability bias, so even on anonymous surveys, we may not be entirely truthful because we want to portray our best selves, even if it is only to ourselves.
But as Everybody Lies points out, there are certain types of data that don’t lie, or at least are much more honest. The book focuses on Google searches, which have little incentive to be dishonest. We can search for anything without feeling judged or worrying about what others think.
This leads to interesting, and sometimes disheartening, results. The author points out that despite the progress the United States has made regarding racism, the data tell a different story. While this book was published a few years ago, we can see that many of the problems we had hoped were on the mend in America have only been exacerbated.
But what about in software development?
The book shares an interesting story about the Netflix recommendation engine. And this story perfectly described my Netflix rental habits before streaming.
In the early days of Netflix, you actually got DVDs by mail (hard to believe, but we lived in crazy times back then). You would select a queue of movies that you’d like to watch next, and when you returned one DVD, you’d decide which one you’d like to receive next. Most of us would fill our queue with aspirational movies, like documentaries and award-winning films that no one remembers. But when it came time to actually get the DVD, we’d pop the action movie or the comedy up to the top, above all the documentaries.
Netflix saw this behavior. It saw what we said we wanted to watch (The Civil War documentary) and what we were actually watching (Top Gun). And it created its recommendation engine based on what users were actually doing, not what they said they wanted to do.
This is one of the critical lessons of data (and user research): we have to base our understanding on what people (our users and customers and potential customers) are actually doing, not what they say they are doing or would like to do. Because we all say we will watch documentaries, but when Friday night rolls around, we’ll probably choose Top Gun.
Zooming In
The next key takeaway is that data allows us to zoom in on specifics in a way that we haven’t been able to before.
One of the most fascinating studies from the book for me came from the author aggregating data on baseball team fans. He found that the team a fan supports often correlates with which team won the World Series (the baseball championship in the U.S.) when that fan was around 8 years old. So kids were picking their favorite team, and sticking with it, based on who was most popular and winning around the time they were 8.
That is fascinating. When we’re born and the timing of events both matter to our tastes years down the road. Taking a broad data set and then narrowing in on specific parts allows us to understand things we couldn’t have otherwise understood.
It may not be enough to understand that “lots of people are fans of the New York Yankees”, but zooming in on why helps us get a clearer picture. Segmentation can yield similar results for us. Understanding specific user groups can help us better understand why certain things work and certain things don’t.
In a use case for one application we created, we worked on better understanding specific user segments based on their job roles. As we broke apart the specific roles, we realized specific users were doing very different things. This helped us not only sort our data better, but also focus our development efforts in a more meaningful way. In our case, it was a combination of data and user interviews that helped us garner these insights, but zooming in on the right segmentation will yield interesting results.
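As a rough illustration of zooming in, here is a minimal Python sketch that counts actions overall and then per job role. The events, role names, and action names are all made up for the example; they are not data from our application or from the book.

```python
from collections import Counter, defaultdict

# Hypothetical usage events as (job_role, action) pairs.
# Invented purely for illustration.
events = [
    ("manager", "view_report"),
    ("manager", "view_report"),
    ("manager", "export_csv"),
    ("technician", "log_work_order"),
    ("technician", "log_work_order"),
    ("technician", "view_report"),
]


def actions_by_segment(events):
    """Count how often each action occurs within each segment."""
    counts = defaultdict(Counter)
    for segment, action in events:
        counts[segment][action] += 1
    return counts


def top_action(counts, segment):
    """Return the most frequent action for one segment."""
    return counts[segment].most_common(1)[0][0]


# The aggregate view hides the differences between roles...
overall = Counter(action for _, action in events)

# ...while the per-role view shows each group doing something different.
by_role = actions_by_segment(events)

print("Overall:", overall.most_common(1)[0][0])
for role in by_role:
    print(role, "->", top_action(by_role, role))
```

In this toy data the overall top action matches what managers do most, while technicians mostly do something else entirely. That is the kind of difference an aggregate number can hide.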
Experiments
Finally, data allows us to create or review natural experiments in a way that hasn’t been possible previously.
One case study really stood out to me from the book.
We can now better understand incentives and patient outcomes in medical treatment. Medicare reimbursement formulas often change for arbitrary reasons, and this provides an interesting natural experiment. Are doctors incentivized to adjust what tests they order based on reimbursements? Do the tests impact outcomes? The answers, according to the book, are that yes, incentives influence doctors even when the tests have no impact on patient outcomes.
So how can we use experiments?
Most of us in product development are familiar with A/B testing. It used to be difficult to do, but is now so easy and so ubiquitous that we’d be crazy not to A/B test our software.
But are we testing?
In my experience, not frequently enough. Many large companies like Facebook and Google are heavily invested in A/B testing, to the point of absurdity in some cases, while many of our companies aren’t invested at all. We assume our features will work best as designed, so we don’t test alternatives. But we should. We need to understand not only what works best for our businesses, but what creates the best experiences for our users, and that comes from testing and iterating. Often the results will surprise us. We may not think money can influence doctors when it won’t help their patients, but the data say otherwise. So what other interesting things will you discover as you experiment?
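For anyone who hasn’t run an A/B test before, here is a minimal Python sketch of one common way to compare two variants: a two-sided two-proportion z-test on conversion counts. This is my own illustration, not a method from the book, and the function name and the visitor and conversion numbers are hypothetical. A real test also needs a sample size and stopping rule decided in advance.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test.

    conv_a, conv_b: number of conversions in variants A and B
    n_a, n_b: number of visitors shown variants A and B
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("Cannot test: pooled conversion rate is 0 or 1.")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 100 of 1,000 visitors converted on A, 130 of 1,000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone, but it says nothing about whether the difference is large enough to matter. That part is still a judgment call.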
Conclusions
At this point, most of us understand big data. It doesn’t even need quotes around it. We know what it is and why it is important. Most of our products are busy incorporating it or trying to incorporate more data.
Using data is critical to understanding our world, our users, and ourselves. But it can’t tell the entire story. It can point us in the right direction, tell us things we may not have known or guessed, and give us insight we may have overlooked. But, as Everybody Lies suggests, we still need the human element to make sense of all the data we have. We are humans first, and data can never tell that complete story.
So we can use data to help paint an honest picture, but it can’t and shouldn’t replace insights and interactions. It can and should supplement. It can help inform our decisions, zoom in on problems, spot trends, and change broad behavior. Ultimately, it can help us change the world. But we have to ensure we’re changing it for the better.
Other Good Links
Dystopia Is Upon Us. Are You Ready? (article) - Speaking of data, and its potential misuses… “Jules Polonetsky, founder of the Future of Privacy Forum, an advocacy group that develops privacy protection for ethical business practices, warns that there is a risk for everything—including what you do, hear, and see—to be tracked and analyzed if governments don’t set boundaries on the types of data being collected and how it’s used. He advises that ‘we need a national privacy law that will set a baseline for responsible uses of data.’”
Microsoft and Nvidia build largest ever AI to mimic human language (article) - Of course, we can use vast amounts of data for good things, and very interesting things. Take, for example, the latest human language AI… “Microsoft and chip manufacturer Nvidia have created a vast artificial intelligence that can mimic human language more convincingly than ever before. But the cost and time involved in creating the neural network has called into question whether such AIs can continue to scale up. The new neural network, known as the Megatron-Turing Natural Language Generation (MT-NLG) has 530 billion parameters, more than tripling the scale of OpenAI’s groundbreaking GPT-3 neural network that was considered the state-of-the-art up until now.”
MVPs, Product Launches, and Learning New Skills - A Conversation with Jason Sherman (podcast) - Jason Sherman - an entrepreneur, film-maker, and futurist - joined me to talk about validating ideas through minimum viable products (MVPs), creating successful launches, and constant learning to build new skills and iterate on your ideas. It was another superb conversation.