We should be learning lessons from real disinformation, not AI hype and debunked images.
A couple of weeks ago I wrote a quick post in response to concerns, raised by Gary Marcus, that ChatGPT could become “a machine gun of disinformation”. At the time, AI-generated images of Donald Trump being dragged away in handcuffs were circulating on the internet. This prompted some (over-hasty, in my view) claims that generative AI would lead to an unstoppable wave of fakery and uncertainty – a “machine gun of disinformation”. My view was: well, we were already in that world pre-AI (scroll through this account to see examples of the sort of tactics which predate AI).
Since then, three other stories have strengthened that view – and one story has made me pause.
The first story was a picture of the Pope in a massive white Balenciaga-style puffer coat going viral. It’s a great photo. Many people, myself included, were initially taken in by it. But, as revealed by Chris Stokel-Walker, as part of his ongoing (inhumanly productive) run of tech reportage, it was actually generated using Midjourney by a guy who’d taken magic mushrooms.
This image led to similar commentary about the dangers of AI disinformation - which prompted similar annoyance in me. Like the Trump pictures, it was quickly revealed to be a fake. Also - the idea the Pope owns a massive Balenciaga-style puffer coat is low-stakes, pretty random, and of little relevance to most people’s lives. It’s hardly surprising people didn’t scrutinise it; it’s perhaps more surprising, and potentially heartening, that it was still revealed to be AI-generated.
All this reminded me a bit of the old tale (associated with the famous image below) of WWII engineers studying damage on planes which returned to base and reinforcing the bits with the most damage; until, the story goes, a statistician named Abraham Wald pointed out that they were learning only from planes which had survived their damage. The real lesson was to reinforce the bits where the returning planes had not been hit.
Anyway, back to the stories I read this week. The next two on my list are a little different. The first was an excellent FT long read on scientific ‘paper mills’ which churn out articles using faked data (including doctored images). The writers spoke to specialist fraud detectors employed by some of the more scrupulous journals; but noted that many journals rely solely on the traditional, and flawed, system of peer review by (often unpaid) academics. As many of these fake articles were in biomedical sciences, there is a real risk of medical research being compromised. The second was an astonishing story in the New Statesman’s latest Long Read, on threats to Taiwan, which had completely passed me by:
In September 2018, the Chinese embassy evacuated 1,044 tourists, including 32 Taiwanese, from the Kansai Airport of Japan after it was struck by a powerful typhoon. Several Chinese state-owned media outlets covered the story, emphasizing that some Taiwanese tourists who identified as Chinese were allowed to get on the bus sent by the Chinese embassy. The story was immediately shared by a China-based online user and reported by Taiwanese media, engendering a lot of controversy in Taiwan. The article attacked the Taiwanese government for not evacuating its citizens as soon as possible, and a Taiwanese diplomat in Japan even committed suicide over the mounting criticism. However, the story was ultimately revealed to be a fake news report originating from a Chinese online media outlet, Guancha Syndicate, and shared by state propaganda organs, such as Xinhua News Agency.
Two stories about fakeries which had real-world consequences before they were uncovered; neither of which used AI. These are the stories we should be learning from, not being distracted by everything AI-related.
The final story, the one that made me pause, was about the Great Cascadia earthquake and tsunami that hit the Pacific Northwest in 2001. But before I get to this story and why I responded differently, let me just briefly take a step back and outline my broader position.
It ain't what you think, it's the way that you think it
Discussions of AI risk often talk of two camps: people who think AI is going to cause major social impacts, including bad ones; and people who think this is overblown. From what I wrote above, I sound like I’m in the latter camp. So let me be clear: I think either camp could be correct. We don’t know yet.
The important thing is not to claim what will happen; it’s to try and predict how things could happen, and thereby how they could be addressed.
I’ll accept it’s possible that there may be some new issues with generative AI, e.g.:
Maybe producing fake information will become a more fun, memetic experience (see how creating Pope / Balenciaga images has now become a bit of meme), prompting people to get involved in disinformation who previously would not have. But then again, if something had achieved ‘meme-fake’ status we’d probably know to be more cautious about it.
Perhaps fake news will be harder to remove from Bing Chat Search than it is from more traditional web search (see stories about false bribery and sexual harassment claims – similar things could and did happen before AI, but this is more about the AI hallucinating things than about bad human intentions). But also, I suspect it would be harder for a bad actor to reliably get a specific fake story into the output in the first place. Plus, anyone straightforwardly trusting outputs of a chat search tool is probably a lost cause anyway.
Maybe cyber trolls currently paid to spray disinformation will instead be paid to support cyberattacks, while generative AI takes their old jobs. But then the risk isn't the disinformation, it's the other stuff.
The point of all the above: I have considered paths by which AI might change situations for the worse, and explored how those assumptions might be false (sort of an anti-hero version of a 'theory of change').
Like the WWII planes, this is helped when we accurately understand both the past and the present. There is, ironically, a lot of misinformation about misinformation. In 2019 Demos wrote a very good paper arguing that “The widely-held focus on ‘fake news’ is myopic… Much of the information shared during information operations is not ‘fake’, but the selective amplification of reputable, mainstream media stories to fit an agenda”. This view is fortunately now more mainstream, but a lot of newcomers to counter-disinformation still tend to focus on fakery. Another example is the perception that Kremlin information operations to turn people against Ukraine have failed. That may be largely true in the West; but as research I did with CASM showed, this misses that a lot of the operations were actually targeted against UN members in the Global South, who later abstained from condemning Russia. Plus, as I keep mentioning, there’s still lots of (I think unwarranted) strong confidence that social media information operations made Brexit and Trump happen (it may have done; but I think it’s far from clear and the argument risks discrediting real political concerns).
So on the one hand, warnings of generative AI as ‘machine guns of disinformation’ may serve a useful role of scaring people into actually trying to do something. But on the other, I don’t think this “criti-hype” is actually helping us understand new risks, but instead mixes up AI and information risks in muddling ways.
We could learn more from the less AI-driven but more consequential fake news stories I described above. There, false information was created more strategically, initially targeted at fooling gatekeepers (academic journals, newspapers) and forming part of a wider network of information which added to its credibility. This, I think, is an area where generative AI might have more leverage opportunities – as a sniper rifle, not a machine gun – which I’m sure many bad actors are already considering. For that, let’s look back at the last of my stories.
The Great Cascadia Earthquake of 2001
A series of images were posted on reddit about the Great Cascadia earthquake and tsunami that hit the Pacific Northwest in 2001. They were knitted together into a whole story, including pictures of George W. Bush meeting the mayor of Tacoma amidst the earthquake. Like the Pope image, a lot of internet-savvy people were initially taken in; even an online AI image detection tool hosted on Hugging Face labelled the images as real. Again, though, it was fairly quickly revealed that these were fake.
The original reddit thread hosts some interesting discussions, including one user arguing “If future historians can only find an image on random people's social media posts, they'll just assume it's fake… A lot of major news articles are based on things that could be trivially fabricated in theory, like records or testimony from a whistleblower. It comes down to trusting that the major news publications will do their due diligence in reporting those stories.” But the comment which seems to have really hit a nerve, being highly upvoted and widely quoted, was “People in 2100 won't know which parts of history were real”.
This, for me, is the bigger risk of generative AI: that expert assessment – one of the mechanisms which can (somewhat) defend against disinformation – could become harder.
Someone could quickly generate a network of related fake photos (the fact that image generators by default produce multiple images may make this easier), then do a bit of quick photoshop to remove the obvious AI signals (e.g. weird hands), and send them to a newspaper. The newspaper is unsure, and requests more photos; an additional roster can be knocked up very quickly. Related audio and text can also be produced to give the story even greater depth (again, AI can help smooth over obvious ‘tells’ like non-standard English). Lots of outlets reject the pictures from this unknown photographer (despite their professional-looking profile, also AI-generated); but one or two get excited and print them. But they’ve all been sent marginally different pictures, so when they print them there are now multiple different images of ‘the event’ in circulation; adding greater credibility.
This isn’t a machine gun on social media; this is a sniper rifle designed to get a story into reliable outlets, and thereby become credible.
Even outside this extreme example – and I’ll admit I don’t know if papers already have better credibility mechanisms than I give them credit for – there’s just a risk that the whole process of expert assessment becomes harder. It becomes harder for experts to instantly dismiss fakes, or be able to say ‘it’s just a single source’. Open-Source Intelligence Analysts, who often rely on links between images, may now have more fake networks of images to sift through (ironically the Trump images were produced - for entertainment rather than as disinformation - by Eliot Higgins, one of the biggest modern heroes of this movement). It’ll be easy, maybe even fun, for people to create images which are variations on ongoing real events, making it harder to unpick what’s actually unfolding. In sum: I’m less worried about impacts of AI for ‘normal’ people, who can be sufficiently taken in by existing disinformation machine-gun methods (not because they’re stupid; going through life fact-checking believable stuff which isn’t in one’s area of specialism is hard work). I'm worried about undermining traditional mechanisms that we turn to in times of uncertainty, not necessarily because they'll be completely deceived but because their work will become more complicated.
Again, I may be wrong in that prediction (when researching I found a thesis from 1997 worrying about the impact of digital photography on the news environment). But I think that I’ve laid out a credible path to risk, considering what actual new issues AI might create, which operates very differently from a machine gun of disinformation and would require different defenses.
So I think everyone concerned about information, but particularly people with influence, needs to think on two separate tracks.
The first is to consider hazards to the information environment which are already here, and not allow AI to become the centre of that conversation.
The second is to consider where AI could actually have more leverage to change things, and how that might happen; as I’ve argued, I think that’s more likely in the careful and targeted operations, not widespread machine-gunning.
We can combine these tracks when useful; for instance, panic over generative AI might catalyse defensive action which should already have been happening (e.g. better critical thinking education, including for journalists and politicians). But that needs to be done strategically, not because the Pope doesn’t actually own that puffy Balenciaga jacket. Until someone 3-D prints it and sends it to the Vatican by drone, of course. Then technology really would be creating reality.
Fun Fact About: Dating Data
A friend of mine requested a fun fact about “dating data” to mark Valentine’s Day. I thought this was a great idea, then forgot to do that. Sorry friend (though the Ones and Tooze podcast did a whole series on the economics of love, so they got that instead). But I did actually already have a great fact about dating data ready to go, so you get that belatedly. This is from the book Dataclysm by Christian Rudder, founder of dating site OKCupid (info re-found here).
OKCupid tracked how many keys people pressed when typing their first message to a new contact, and compared that to the length of the final message. The result was the graph below. Lots of keys typed but a short final message – the bottom-right half of the graph – suggested that people were typing, deleting, and editing a lot. A more 1:1 relationship between the two – the dotted diagonal line – suggested people were simply typing and sending.
BUT – what about all those messages at the top left? That would mean people were sending more letters than they had actually typed. In particular, what are those bright vertical lines at the left? You may be able to guess, so the answer is below the graph.
Answer: The keys people were pressing were CTRL-V (or CMD-V on a Mac), to paste a previously copied message. They then may have pressed more keys to edit this longer message (hence the big cloud between the diagonal and the vertical lines), or just left it as it was, perhaps pasting the person’s name into it (hence the bright vertical lines).
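The logic of the graph can be sketched as a tiny classifier. To be clear, the data and thresholds below are entirely hypothetical illustrations of the regions described above, not OKCupid's actual method:

```python
# Classify a message by comparing keys pressed to final message length,
# mirroring the regions of the keystroke graph. Thresholds are hypothetical.

def classify(keys_pressed: int, message_length: int) -> str:
    ratio = keys_pressed / message_length
    if ratio < 1:        # fewer keys than characters: must have pasted
        return "pasted"
    if ratio > 1.5:      # far more keys than characters: heavy deleting/editing
        return "typed and heavily edited"
    return "typed straight through"

print(classify(keys_pressed=2, message_length=400))    # Ctrl-V paste
print(classify(keys_pressed=900, message_length=300))  # lots of editing
print(classify(keys_pressed=310, message_length=300))  # roughly 1:1
```

The first case is the bright vertical lines (two keys, four hundred characters sent); the second is the cloud below the diagonal; the third sits on the diagonal itself.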
The results: copy-pasted messages were apparently 25% less likely to get a response. But because pasting was so much quicker, a person could send many more of them and thereby get more replies overall.
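The trade-off is simple expected-value arithmetic. Only the 25% penalty comes from Dataclysm; the base response rate and the timing numbers below are hypothetical illustrations:

```python
# Expected replies per hour: bespoke typing vs copy-pasting.
# Only the 25% response penalty is from Dataclysm; other numbers are made up.

def replies_per_hour(seconds_per_message: float, response_rate: float) -> float:
    """Messages sent in an hour multiplied by the chance each gets a reply."""
    messages_sent = 3600 / seconds_per_message
    return messages_sent * response_rate

BASE_RATE = 0.30  # hypothetical response rate for a bespoke message

typed = replies_per_hour(seconds_per_message=300, response_rate=BASE_RATE)
pasted = replies_per_hour(seconds_per_message=30, response_rate=BASE_RATE * 0.75)

print(f"typed:  {typed:.1f} replies/hour")   # 12 messages x 0.30  = 3.6
print(f"pasted: {pasted:.1f} replies/hour")  # 120 messages x 0.225 = 27.0
```

Even with a quarter of the appeal knocked off each message, sending ten times as many wins comfortably, which is presumably why the behaviour showed up in the data at all.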
And yes, I’m sure some of you are now thinking about how ChatGPT affects all this. My view: ChatGPT might help people who are bad at messaging to get more dates. But unless they’re also going to take a ChatGPT-enabled earpiece on the date, there’s a limit to the method.
ChatGPT for language practice: As I’ve been quite down on ChatGPT recently – which is, for all its faults, a very impressive piece of technology – I want to recommend a cool use of it for language learners. If you go to ChatGPT on Chrome, and install the Voice Control for ChatGPT plugin, you can talk aloud to ChatGPT and it replies. It’s considerably less personable than an actual speaking partner, but it’s also much more available whenever you have spare time. Plus – a huge advantage in my view – you also end up with a transcription of the conversation, which is very helpful for learning. (And you can also then check it with DeepL or another translation tool; as any readers of this piece should know, never trust ChatGPT to be accurate 😉)
General Tech News: Readers of this newsletter have probably picked up that there’s quite a lot of tech news happening (to put it mildly). If you feel you need other guides beyond this newsletter my top picks are Platformer, Tech Policy Press, and Stratechery. All come with associated podcasts if that's your preferred medium.
Podcast: I’ve recommended Capitalisnt before, a (broadly) finance and economics podcast presented by the excellent duo of Luigi Zingales and Bethany McLean. They recently did an episode on The Twitter Files that escalated into a superb elaboration of the problems of tribalism today.
Finally, a museum. I have recently, finally, found a long-term flat in Berlin. In order to furnish it, I’ve been wandering around vintage stores. I rarely buy anything – that stuff is heavy – but I do enjoy the experience. I know I’m not alone in that. So if you want an extreme version of that experience – or just find wandering around shops purely for entertainment a bit odd – I recommend the free Sir John Soane museum in Holborn, London. The place is basically the house of a wealthy 19th century hoarder, with a bare minimum of order imposed. The colonial overtones are a little uncomfortable, but the experience of Just Stuff is memorable and enjoyable. Me and my friend wandered around pretending it was a property we were intending to buy; you don’t need to do that, but it’s recommended.