Jun 8, 2021

Sideways Looks #5: Using Machines to Tell Stories, also Cats in Boxes

Updated: Aug 1, 2021

Hello you,

Sideways Looks #5 for you. This week we’re watching films and ‘topic modelling’ UK election manifestos. Also learning more about cats and boxes.

I hope you all have good weekends. I’m getting vaccinated on Sunday, which I’m very grateful for. As the UK is vaccinating by age group, many of my friends and I are getting ours within days of each other – which is a fun shared excitement and also means we can swap stories of side-effects from different vaccines. If you haven’t had yours yet, I hope it’s coming soon.

Oliver

Thought for the Week: Horror, Comedy, and UK Election Manifestos

What’s the opposite of ‘film buff’? Whatever it is, I’m one of those. I reckon I’ve seen fewer than 100 films, and am mildly phobic of cinemas. But even I know basic film genres.

So let’s do a thought experiment. Imagine someone who is even more ignorant of films than me, entirely unaware of genres. Now imagine you got them to watch hours and hours of films. Our watcher would start to notice patterns. Many films have two people falling in love. Other films have people shooting each other and driving cars fast. Others have violent monsters. The watcher starts to create labels: ‘romance’, ‘action’, ‘horror’, and so on.*

After finishing their mega-marathon they stumble into the light, and discover – to their pleasant surprise – that their labels match those already used by film experts.

Or maybe not. Maybe they come up with Netflix-style labels, like ‘whimsical’, ‘gritty’, or ‘heartfelt’ (these are all real Netflix labels, as is ‘dysfunctional family’). They’re not what the experts have been using, but they still usefully describe different films. This time, both our film watcher and the experts are pleasantly surprised.

Or maybe – perhaps out of spite, personal strangeness, or disorientation from hours of films – our intrepid watcher emerges with a classification system based on the dominant colour in each film. ‘What did you think of The English Patient?’ they are asked.
‘It was’ (checks notes) ‘very yellow’, they reply. ‘But not as yellow as Yellow’, they add. Everyone is surprised, and not pleasantly.

This is an extended metaphor for ‘topic modelling’, a process where you chuck text through a machine to see what it thinks the main themes (or ‘topics’) are. I did some of this in my PhD (see chapter 4 for cool diagrams).

This week I discovered The Manifesto Project, which digitizes election manifestos from across the world. So I had a go at topic modelling UK manifestos to see what would happen. At this point I was hoping to give you an interactive tool for exploring the results yourself. But I’ve struggled with that (turns out Google Data Studio doesn’t know that 1983 is a year), so you’ll have to wait. In the meantime, here’s a funky diagram:

I’m not going to explain it fully here, sorry. But if you can read the words, you’ll see the grey cloud is mostly education-related words, the green cloud health-related, the red cloud economics, and so on. So the machine has separated out recognisable areas of political campaigning. Clever machine. (It’s called Iramuteq, created and made freely available by Pierre Ratinaud.)

BUT. The same text through a different machine (called MALLET) produces different, weirder results. Crime and education are mushed together; another topic is mostly vague words like ‘care’, ‘social’, and ‘rights’. Thinking back to our film analogy – is MALLET producing Netflix-like labels, surprising but sensible distinctions? Or is this more like labelling films by colour, a characterisation which isn’t wrong but doesn’t add anything? And is it bad that I prefer Iramuteq’s results because they give me what I expected to see, somewhat defeating the point of using the machine?

All this harks back to the theme of last week’s newsletter: balancing deep understanding vs. getting things done. Topic modelling is weirdly addictive: you tweak the inputs and see what clusters pop out this time. (Iramuteq also makes a fun noise when it’s finished, which adds to that.) But I should sit down and really understand what’s going on, and how to use what the machines are giving me. The problem: these are complicated machines. And using multiple machines, while good for rigour, multiplies that workload. So maybe I should just accept what they give me.

This dilemma – how to use machines to understand the world – is a broader problem beyond topic modelling, and may be getting worse. Turning complicated reality into understandable stories is already a challenge of modern communications (as we’ve seen a lot during Covid). Machines can help us tell stories, by reading more than we ever could and picking out patterns we may never see.
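For the curious, here is a toy sketch of what one of these machines does under the bonnet. MALLET’s topic modelling is based on latent Dirichlet allocation (LDA); the Python below is a minimal, from-scratch LDA using collapsed Gibbs sampling, not MALLET’s actual code. The function names, the three made-up mini ‘manifestos’, and the settings (alpha, beta, 200 iterations, two topics) are all illustrative assumptions. The idea: give every word a random topic, then repeatedly reassign each word to a topic that is both common in its document and already fond of that word, until clusters settle.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA topic model with collapsed Gibbs sampling.

    docs: list of documents, each a list of word strings.
    Returns (doc_topic, topic_word_counts): for each document, its estimated
    topic proportions (summing to 1), and for each topic, a word -> count map.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [defaultdict(int) for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start by giving every word token a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts...
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1
                # ...then resample its topic, favouring topics that are common
                # in this document AND already use this word a lot.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1

    # Smoothed topic proportions per document: each document is a mix of topics.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    return doc_topic, topic_word_counts


def top_words(topic_word_counts, n=3):
    """The n most frequent words in each topic (ties broken alphabetically)."""
    result = []
    for counts in topic_word_counts:
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        result.append([w for w, c in ranked if c > 0])
    return result


# Hypothetical mini "manifestos", made up for illustration.
docs = [
    "schools teachers pupils schools exams".split(),
    "hospitals nurses doctors hospitals patients".split(),
    "schools pupils teachers hospitals nurses".split(),
]
theta, topic_words = lda_gibbs(docs, num_topics=2)
for doc_id, props in enumerate(theta):
    print(doc_id, [round(p, 2) for p in props])
print(top_words(topic_words))
```

Each document comes out as a mix of topics rather than a single label, much like the Shaun of the Dead percentages in the footnote below. On data this small, changing the seed or the number of topics can shift the clusters, which gives a flavour of why different runs and different machines can disagree.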
But as these machines become more intelligent, their results are moving towards being fundamentally unexplainable. The price may be stories that even the storyteller does not fully understand. But at least Netflix will probably still have labels for them.

* = If they were very diligent, they might even assign percentages to films which don’t neatly fall into one bucket – so Shaun of the Dead gets 60% comedy, 30% horror, 8% action, 2% romance, etc. This is closer to actual topic modelling: how much something is ‘like’ a certain label, rather than exact matches.

Fun fact about: Cats in Boxes

There’s a distinguished tradition of scientists thinking about cats in boxes. The ‘Schrödinger’s cat’ thought experiment was designed to criticise a popular interpretation of quantum physics, but instead ended up as a popular way of teaching quantum physics. (Text explainer here, video explainers here and here – if you don’t understand it, don’t worry. It’s weird.)

More recently a team of scientists led by Gabriella E. Smith have done a real experiment on how cats react to imaginary boxes. It is well known that cats like to sit inside boxes. The experiment suggests that cats also like sitting inside optical illusions that look like boxes. (Paper here, summary here.)

An interesting side-note: this experiment sourced data from volunteers who set up illusory boxes and recorded their cats’ responses, following instructions sent to them by the scientists. This is called ‘Citizen Science’, and has been used very well in fields from astronomy to zoology (and has itself been the subject of lots of great social research). It seems this approach had some problems due to unhelpfulness, and unexpectedly this came from the humans rather than the cats. The experimenters noted “a weakness of this study was… the small dataset. The most likely cause of this was significant owner participation attrition” – i.e., people giving up.
I wonder if the wide reporting of this experiment might encourage more people to get involved and stay involved. Given this experiment may have been prompted by a Twitter trend (#CatSquare), that would be a fun positive circle: from media, to science, to media, to science again.

Recommendations

Andy Sandford, of the consultancy ‘We Are Lean and Agile’, hosted a very interesting interview with Matt Prosser, Chief Executive of Dorset Council, on digital leadership in local government. I found it a really good blend of big-picture thinking (how to futureproof; how to blend technologies, skills, and mindsets effectively) with practical problem solving (refusing to use paper so everyone got into the habit of doing everything digitally).

A recommendation that I have resisted for years: Pod Save The World. It’s a foreign policy podcast hosted by two former Obama staffers. I resisted it, alongside many other American political podcasts, because I felt they were often just howls of rage against Trump. Justifiable, but not an ideal use of limited media consumption time. But now I’ve picked it up and found it dotted with interesting insights from former insiders about challenges faced at the top levels of government. On a recent episode Ben Rhodes interviewed Samantha Power; both have also written books (The World As It Is, The Education of an Idealist) and appeared in a documentary (The Final Year), which I’d thoroughly recommend.

Harking back to Sideways Looks #1, this Harvard Business Review article argued for more considered and less pace-driven product development. A fun little nugget about Amazon: “CEO Jeff Bezos often called himself the ‘chief slowdown officer,’ and he got involved when he thought teams were moving quickly into coding without clearly defining the customer problem.”

As this week’s Sideways Look has (I hope) been a little more technical but still fun, I’ll round off by recommending a technical-but-fun Twitter account.
Chloe Condon is a former musical theatre actor who learned to code and now works for Microsoft. Her account mixes GIF-laden depictions of developer life, mentoring advice, and all-round good internet content. A joyful follow.

Oliver Marsh

social research on digital life
