Benartzi’s (and Lehrer’s) The Smarter Screen: Surprising Ways to Influence and Improve Online Behavior
The replication crisis has ruined my ability to relax while reading a book built on social psychology foundations. The rolling sequence of interesting but small-sample and possibly non-replicable findings leaves me somewhat on edge. Shlomo Benartzi’s (with Jonah Lehrer) The Smarter Screen: Surprising Ways to Influence and Improve Online Behavior (2015) is one such case.
Sure, I accept there is a non-zero probability that a 30 millisecond exposure to the Apple logo could make someone more creative than exposure to the IBM logo. Closing a menu after making my choice might make me more satisfied by giving me closure. Reading something in Comic Sans might lead me to think about it in a different way. But on net, most of these interesting results won’t hold up. Which? I don’t know.
That said, like a Malcolm Gladwell book, The Smarter Screen does have some interesting points and directed me to plenty of interesting material elsewhere. Just don’t bet your house on the parade of results being right.
The central thesis in The Smarter Screen is that since so many of our decisions are now made on screens, we should invest more time in designing these screens for better decision making. Agreed.
I saw Benartzi present about screen decision-making a few years ago, when he highlighted how some biases play out differently on screens compared to other mediums. For example, he suggested that defaults were less sticky on screens (we are quick to un-check the pre-checked box). While that particular example didn’t appear in The Smarter Screen, other examples followed a similar theme.
As a start, we read much faster on screens. Benartzi gives the example of a test with a written instruction at the front not to answer the questions that follow. Experimental subjects failed at more than double the rate when on a computer (up from around 20% to 46%), skipping over the instruction and answering questions they should not have answered.
People are also more truthful on screens. For instance, people report more health problems and drug use to screens. Men report fewer sexual partners, women more. We order pizza closer to our preferences (no embarrassment about those idiosyncratic tastes).
Screens can also exacerbate biases as the digital format allows for more extreme environments, such as massive ranges of products. The thousands of each type of pen on Amazon or the maze of healthcare plans on HealthCare.gov are typically not seen in stores or in hard copy.
The choice overload experienced on screens is a theme through the book, with many of Benartzi’s suggestions focused on making the choice manageable. Use categories to break up the choice. Use tournaments where small sets of comparisons are presented and the winners face off against each other (do you need to assume transitivity of preferences for this to work?). All sound suggestions worth trying.
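To make the transitivity question concrete, here is a minimal sketch of a choice tournament. It is my own illustration, not anything from the book: the `tournament` function, the `pick` callback standing in for the user's choice within each small group, and the pen scores are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def tournament(options: Sequence[T],
               pick: Callable[[Sequence[T]], T],
               group_size: int = 4) -> T:
    """Narrow a large set of options through rounds of small comparisons.

    `pick` stands in for the user choosing a favourite from one small group.
    """
    if not options:
        raise ValueError("need at least one option")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    remaining: List[T] = list(options)
    while len(remaining) > 1:
        # Split the survivors into small groups and keep one winner per group.
        groups = [remaining[i:i + group_size]
                  for i in range(0, len(remaining), group_size)]
        remaining = [pick(group) for group in groups]
    return remaining[0]


# Case 1: transitive preferences (a hypothetical utility score per pen).
pen_scores: Dict[str, int] = {"A": 3, "B": 7, "C": 5, "D": 9, "E": 1}


def pick_by_score(group: Sequence[str]) -> str:
    return max(group, key=lambda pen: pen_scores[pen])


best_pen = tournament(list(pen_scores), pick_by_score, group_size=2)

# Case 2: cyclic (intransitive) preferences, rock-paper-scissors style.
beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def pick_cyclic(group: Sequence[str]) -> str:
    if len(group) == 1:
        return group[0]
    first, second = group
    return first if (first, second) in beats else second


winner_one = tournament(["rock", "paper", "scissors"], pick_cyclic, group_size=2)
winner_two = tournament(["paper", "scissors", "rock"], pick_cyclic, group_size=2)
```

With transitive preferences the tournament returns the option the chooser would have picked from the full list. With cyclic preferences the winner depends on how the options happen to be grouped, so the result is an artefact of the bracket.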
One interesting complaint of Benartzi’s is about Amazon’s massive range. They have over 1,000 black roller-ball pens! An academic critiquing one of the world’s largest companies built on offering massive choice (and with a reputation for A/B testing) should be somewhat circumspect. Maybe Amazon could be even bigger? (Interestingly, after critiquing Amazon for not allowing “closure” and reducing satisfaction by suggesting similar products after purchase, Benartzi suggests Amazon already knows about this issue.)
The material on choice overload reflects Benartzi’s habit through the book of giving a relatively uncritical discussion of his preferred underlying literature. Common examples such as the jam experiment are trotted out, with no mention of the failed replications or the meta-analysis showing a mean effect of zero from changing the number of choices. Benartzi’s message that we need to test these ideas covers him to a degree, but a more sceptical reporting of the literature would have been helpful.
Some other sections have a similar shallowness. The material on subliminal advertising ignores the debates around it. Some of the cited studies have all the hallmarks of a spurious result, with multiple comparisons and effects only under specific conditions. For example, people are more likely to buy Mountain Dew if the Mountain Dew ad played at 10 times speed is preceded by an ad for a dissimilar product like a Honda. There is no effect when an ad for a (similar) Hummer is played first. Really?
Or take disfluency and the study by Adam Alter and friends. Forty students were exposed to two versions of the cognitive reflection task. A typical question in the cognitive reflection task is the following:
A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
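For anyone who has not seen the question before, the intuitive answer of 10 cents is wrong. Writing the price of the ball as $b$ (my notation, not the book's), the two conditions give:

```latex
b + (b + 1.00) = 1.10 \;\implies\; 2b = 0.10 \;\implies\; b = 0.05
```

So the ball costs 5 cents and the bat costs $1.05.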
The two versions differed in that one used a small light grey font that made the questions hard to read. Those exposed to the harder-to-read questions achieved higher scores. Exciting stuff.
But 16 replications involving a total of around 7,000 people found nothing (Terry Burnham discusses these replications in more detail here). Here’s how Benartzi deals with the replications:
It’s worth pointing out, however, that not every study looking at disfluent fonts gets similar results. For reasons that remain unclear, many experiments have found little to no effect when counterintuitive math problems, such as those in the CRT, are printed in hard-to-read letters. While people take longer to answer the questions, this extra time doesn’t lead to higher scores. Clearly, more research is needed.
What is Benartzi’s benchmark for accepting that a cute experimental result hasn’t stood up to further examination and that we can move on to more promising research? Sixteen studies involving 7,000 people in total showing no effect, one study with 40 people showing a result. The jury is still out?
One feeling I had at the end of the book was that the proposed solutions were “small”. Behavioural scientists are often criticised for proposing small solutions, which is generally unfair given the low cost of many of the interventions. The return on investment can be massive. But the absence of new big ideas at the close of the book raised the question (at least for me) of where the next big result will come from.
Benartzi was, of course, at the centre of one of the greatest triumphs in the application of behavioural science - the Save More Tomorrow plan he developed with Richard Thaler. Many of the other large successful applications of behavioural science rely on the same mechanism, defaults.
So when Benartzi’s closing idea is to create an app for smartphones to increase retirement saving, it feels slightly underwhelming. The app would digitally alter portraits of the user to make them look old and help them relate to their future self. The app would make saving effortless through pre-filled information and the like. Just click a button. But you first have to get people to download it. What is the marginal effect on these people already motivated enough to download the app? (Although here is some tentative evidence that at least among certain cohorts this effect is above zero.)
Other random thoughts:
One important thread through the book is the gap between identifying behaviours we want to change and changing them. Feedback is simply not enough. Think of a bathroom scale. It is cheap, available, accurate, and most people have a good idea of their weight. Bathroom scales haven’t stopped the increase in obesity.
Benartzi discusses the potential of query theory, which proposes that people arrive at decisions by asking themselves a series of internal questions. How can we shape decisions by posing the questions externally?
Benartzi references a study in which 255 students received an annual corporate report. One report was aesthetically pleasing, the other less attractive. Despite both reports containing the same information, the students gave a higher valuation for the company with the attractive report (more than double). Benartzi suggests the valuations should have been the same, but I am not sure. In the same way that wasteful advertising can be a signal that the brand has money and will stick around, the attractive report provides a signal about the company. If a company doesn’t have the resources to make its report look decent, how much should you trust the data and claims in it?
Does The Smarter Screen capture a short period where screens have their current level of importance? Think of ordering a pizza. Ten years ago we might have phoned, been given an estimated time of delivery and then waited. Today we can order our pizza on our smartphone, then watch it move through the process of construction, cooking and delivery. Shortly (if you’re not already doing this), you’ll simply order your pizza through your Alexa.
Benartzi discusses how we could test people through a series of gambles to determine their loss aversion score. When people later face decisions, an app with knowledge of their level of loss aversion could help guide their decision. I have a lot of doubt about the ability to get a specific, stable and useful measure of loss aversion for a particular person, and am a fan of the approach of Greg Davies to the bigger question of how we should consider attitudes to risk and short-term behavioural responses.
In the pointers at the end of one of the chapters, Benartzi asks “Are you trusting my advice too much? While there is a lot of research to back up my recommendations, it is equally important to test the actual user experience and choice quality and adjust the design accordingly.” Fair point!