Limiting disinformation on social platforms: Why incentives are the bottleneck

Combating disinformation is a challenging problem. It is crucial to ask when a profit-motivated social platform is incentivized to solve it.


Platforms such as Facebook possess the technology and expertise to detect and filter disinformation effectively. However, a social giant like Facebook is less eager to deploy such technology if it means filtering content that drives user engagement, and therefore time spent, on the platform.

This outcome is only natural when 98% of your $86B revenue comes from showing ads to users; content-curation algorithms will tend to optimize for engagement rather than for the veracity of content. And indeed, Facebook’s proprietary algorithms that maximize engagement also prioritize content that is divisive, misleading, and steeped in conspiracy.

Disinformation prevention is therefore much less a problem of technological obstacles than of misaligned incentives. This is especially true because, while platforms such as Facebook, Twitter, and YouTube provide a convenient avenue for sharing content, it is society that bears the cost when the unmitigated spread of disinformation leads to harm.

Predicting whether a specific piece of content will cause harm is extremely hard. Moreover, filtering content is costly: it comprises both the direct costs of implementing sophisticated algorithms and the indirect costs of potentially lost user engagement. Hence, all else being equal, social platforms face no compelling incentive to proactively filter disinformation.


To elaborate, suppose technology were not a limiting factor. Assume instead that social platforms own a perfectly accurate classification model, one that can flag potentially harmful false content with 100% accuracy. Given misaligned incentives, could we believe a platform’s promise to use that model to promptly filter disinformation?

Or, perhaps more pertinently, is there any way to verify that they uphold such a promise without access to their proprietary model or its underlying data? One option, of course, is to do nothing: assume goodwill on the part of the platforms and let market forces dictate when they filter disinformation using their perfect model.

But the incentive problem suggests that platforms cannot be counted on to self-police content, despite their claims to the contrary. Their only reason to filter content is to avoid backlash from the public or the popular press, or to preempt regulation. Recall, for example, what happened earlier this year on Capitol Hill: Facebook and Twitter scrambled to ban certain hashtags, phrases, and problematic groups only after the mayhem had started.

Of course, in hindsight it’s easier to gauge what content is problematic or harmful. But should social platforms, with the technology, expertise and troves of data at their disposal, be afforded the same leeway?

Nevertheless, even with a perfect classification model, incentives remain misaligned: unless platforms fully incur the total costs of any resulting societal damage, they will systematically under-filter disinformation.
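
To make that claim concrete, here is a toy back-of-the-envelope sketch in my own notation, not a formal model: suppose filtering a given item costs the platform c (implementation plus lost engagement), the item causes expected societal harm h if it spreads, and the platform internalizes only a fraction of that harm.

```latex
% c     : platform's cost of filtering one item (implementation + lost engagement)
% h     : expected societal harm if the item spreads unchecked
% alpha : share of that harm the platform actually internalizes, with alpha < 1

% The platform filters only when its internalized harm exceeds its cost,
% whereas society prefers filtering whenever the total harm exceeds the cost:
\[
\text{platform filters iff } \alpha h > c,
\qquad
\text{society wants filtering iff } h > c.
\]

% Every item in the gap below goes unfiltered even though filtering it is
% socially optimal -- systematic under-filtering:
\[
\alpha h < c \le h .
\]
```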

So naturally you might think a solution is to simply regulate social platforms using a so-called strict-liability standard: the platform is responsible for all harm caused by content that it hosts. The problem is that in practice we can’t; ascribing total responsibility to platforms for individual acts of violence is not only difficult, but might also seem unfair in hindsight.

Plus, who decides which platform is more liable for the violence on Capitol Hill? According to Sheryl Sandberg, after all, the riots were organized on other platforms, not on Facebook.

My recent work formalizes these incentive issues and offers key insights for the regulation of social platforms to control the harm from disinformation. Some of the defining features of this domain are not too dissimilar from those of environmental regulation.

Consider the prevention of oil spills as an example. It is difficult to predict exactly whether an oil spill (harm from disinformation) will occur. And if a spill (Pizzagate, or the Capitol Hill riots) does occur, determining what caused it is an involved process. Strict liability does not always work: what if a tanker spilled oil because of adverse weather conditions (individual acts of violence) and not because it was faulty?


A solution to this problem is to regulate firms using a so-called negligence standard: a regulatory authority specifies a standard of care that all firms must adhere to before they can transport any oil, in order to avoid penalties for negligence. And to ensure compliance with this level of care, the authority can expend some resources to monitor oil firms at random times.
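
As a rough sketch of the compliance logic behind such a negligence standard (again my own toy notation, not the formal analysis): let k be a firm’s cost of meeting the specified level of care, p the probability that the regulator audits it, and F the penalty if an audit finds it negligent.

```latex
% A profit-motivated firm meets the care standard only when the expected
% penalty for being caught negligent outweighs the cost of care:
\[
k \;\le\; p \cdot F .
\]
% The regulator can trade off audit frequency p against penalty size F,
% but only relative to a standard of care it can actually write down and check.
```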

With disinformation and social platforms, however, specifying such a care standard is not as straightforward as outlining the conditions that render a tanker fit to transport oil. Obviously, if we had access to the platforms’ proprietary models and their underlying data, monitoring their effort would become significantly easier; in that case, we would simply regulate using the same negligence standard and be done. If only it were that simple.

Yet, while specifying a care standard for disinformation is tough, there does exist a crude public notion of which content ought to be filtered from social platforms. The task is painless for items containing explicit nudity, child pornography, abuse, or any form of graphic violence. The same holds, I would argue, for entities such as “The Proud Boys,” “QAnon,” or anything anti-vax.

What complicates the issue further, interestingly, is that even if a regulator can publicly specify an explicit level of care for filtering false content, the very explicitness of that standard lets disinformation authors learn to bypass it.

Why? The sheer rate and scale at which users generate and share content mean that disinformation is constantly evolving, as new falsehoods spring up and new groups are targeted. In the machine-learning literature, this problem falls under the umbrella of performative prediction.
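
For readers who want the formal framing, here is the standard formulation from that literature; nothing in it is specific to my analysis. Classical supervised learning minimizes risk over a fixed data distribution, whereas performative prediction lets the distribution react to the model that gets deployed.

```latex
% Classical risk minimization over a fixed distribution D of content:
\[
\min_{\theta}\; \mathbb{E}_{z \sim \mathcal{D}} \big[ \ell(z;\theta) \big].
\]
% Performative prediction: the distribution depends on the deployed model,
\[
\min_{\theta}\; \mathbb{E}_{z \sim \mathcal{D}(\theta)} \big[ \ell(z;\theta) \big],
\]
% where D(theta) captures, for example, disinformation authors adapting
% their content to slip past whatever filter theta encodes.
```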

The public care standard, as a result, is likely to become increasingly lax over time, because the regulator, unlike the platforms, lacks the data and expertise to update it. Furthermore, because the platforms’ models are proprietary, regulatory monitoring will always prove ineffectual at flagging new, evolved forms of disinformation.

In fact, I demonstrate that because platforms are not necessarily incentivized to proactively filter disinformation, regulatory monitoring only prompts them to filter content at the publicly specified level of care, and no more; that is, platforms do the bare minimum needed to avoid penalties under this care standard.

Clearly, this scheme is not good enough. In an ideal world, social platforms would constantly filter new forms of disinformation instead of merely meeting the laxer regulatory care standard.

If you have read this far and are somewhat convinced by the narrative I’ve laid out, there is one way to make this scheme work: the regulatory threat of an overreaction, or a stringent public backlash, should disinformation lead to harm.

I’ve described how platforms are quick to filter content once harm has occurred. In all likelihood, they do so to evade liability or future regulation, as the public is less likely to tolerate content from entities responsible for the harm. Effectively, this amounts to a tightening of the regulator-specified standard of care that platforms must follow.

If platforms believe the standard of care will become so strict in response to a harmful event that compliance could mean filtering even benign content, they can be incentivized to proactively filter disinformation in order to avoid risking that scenario.

This result is one of the main takeaways of my formal analysis. Absent this threat of overreaction, there is no way to incentivize or verify social platforms’ prompt filtering of disinformation using only a negligence standard based on a publicly specified level of care.
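
In the same toy notation as before, and only as an illustration of the intuition rather than the formal result: write c_p for the platform’s cost of proactively filtering evolving disinformation, q for the probability that harm occurs if it does not, and c_o for the cost of complying with the much stricter post-harm standard, which would sweep up benign, engagement-driving content as well.

```latex
% Under a credible threat of overreaction, proactive filtering becomes
% privately optimal (roughly) when its cost falls below the expected cost
% of the post-harm crackdown it helps avoid:
\[
c_p \;<\; q \cdot c_o .
\]
% The more costly the threatened overreaction c_o, the smaller the harm
% probability q needed before the platform filters on its own.
```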

Ultimately, the outcome is not ideal, because regulatory overreaction risks shading into censorship. Still, given the misaligned incentives at hand, the threat of such an overreaction should not be taken off the table as an avenue for controlling disinformation under a negligence standard.

Advancing the technological means for quickly detecting false content is crucial to limiting the harm from disinformation. But these efforts will remain futile if social platforms’ engagement-centric incentives continue to limit their adoption and early use.
