The 404 Media Podcast (Premium Feed)

from 404 Media

BuzzFeed's AI Ads Are a Disaster

You last listened November 20, 2024

Episode Notes

/

Transcript

We start this week with Emanuel's story on how AI-powered ads on Buzzfeed are recommending people buy things like a hat worn by a person who died by suicide. After the break, Joseph talks about an unprecedented leak out of phone forensics tech Graykey. In the subscribers-only section, Sam tells us about HarperCollins' AI deal and how MIT Press is exploring one too.

YouTube version: https://youtu.be/wiRE9lIBzyc
Joseph:

Hello, and welcome to the 404 Media Podcast, where we bring you unparalleled access to hidden worlds both online and IRL. 404 Media is a journalist-founded company and needs your support. To subscribe, go to 404media.co, where you'll also get bonus content every single week. Subscribers also get access to additional episodes where we respond to their best comments.

Joseph:

Gain access to that content at 404media.co. I'm your host, Joseph. And with me are 404 Media cofounders, Sam Cole

Sam:

Hello.

Joseph:

Emanuel Maiberg Hello. And Jason Koebler.

Jason:

Hi.

Joseph:

I'm not gonna lie. I went on complete autopilot while reading out that intro. And then you start having the thoughts of, wait, I'm still talking, but did I do the line correctly? And then, anyway, the time has passed. You did it great.

Sam:

That's perfect.

Jason:

When you've been out, I've tried to remember what you say, and it's hard to remember.

Joseph:

That's why I literally read it.

Sam:

In one episode, I was just like, hey, what's up? And didn't even intro anyone or the podcast.

Jason:

Wait. You've been reading it? I thought you did it from memory, to be totally honest.

Joseph:

No. No. I read it every single time. There's a Google Doc.

Jason:

Got it.

Joseph:

Not that impressive. Let's start with a story from Emanuel: AI-Powered BuzzFeed Ads Suggest You Buy Hat of Man Who Died by Suicide. Before we get into what went wrong, Emanuel, which is obviously quite clear in the headline, but what is this AI-based advertisement system that BuzzFeed is using?

Emanuel:

So I wanna explain this by first talking about affiliate links, because I think most listeners are probably familiar with this. Affiliate links are when you go to a website, you read a story, and whether the story is a review of a product, a bunch of products, or isn't a review of any kind but happens to mention a few products, you can click on a link that takes you to an Amazon store page or some other online retailer. And if you end up clicking those links or buying an item via those links, then Amazon makes a sale and the publisher that included the affiliate link gets a very small cut of that sale. And this is something that's become popular, like, an effective way for publishers to monetize in recent years. Very common.

Emanuel:

I'm sure people have seen it on BuzzFeed, at Vice where we used to work, other websites. What this company called Trendii, which is based in Australia,

Joseph:

which they spell t r e n d i i, which is Right. An interesting spelling.

Emanuel:

Yeah. You gotta do it for the SEO. You know? So this Trendii company, they are basically taking the same scheme, but trying to do it for images rather than text. So you'll read a story.

Emanuel:

The story will have images in it. They have an AI model that recognizes objects in images, and it tries to match the, let's say, shirt or shoes in an image to a participating Trendii retailer, and you can click on a button on the image that says shop this image. It will bring up a bunch of products that look like the items in the image and send you to the relevant store in order to buy them. And in the same way, if you make a purchase, and sometimes if you just click, it's enough for the publisher to monetize that content.
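The pipeline Emanuel describes — detect an item in an editorial image, then match it against a participating retailer's catalog — can be sketched as a nearest-neighbor search over image embeddings. Everything below is hypothetical (the catalog, the vectors, the function names); Trendii hasn't published how its system actually works. A minimal sketch:

```python
import math

# Hypothetical catalog of participating-retailer products, each paired with
# a made-up feature vector from some image-embedding model.
CATALOG = [
    ("Allbirds black tee", [0.9, 0.1, 0.0]),
    ("Blue puffer jacket", [0.1, 0.9, 0.2]),
    ("Knit beanie",        [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def shop_this_image(detected_item_vec, catalog=CATALOG):
    """Return catalog products ranked by similarity to the detected item.

    Note the failure mode from the episode: this always returns the closest
    *participating* product, even when nothing in the catalog is actually
    the thing in the photo.
    """
    ranked = sorted(catalog,
                    key=lambda p: cosine(detected_item_vec, p[1]),
                    reverse=True)
    return [name for name, _ in ranked]

# A vector for, say, a tuxedo-print T-shirt: no real match in the catalog,
# but the nearest participating product is still recommended.
print(shop_this_image([0.8, 0.3, 0.1])[0])
```

The design choice that produces the absurd results in the story falls out directly: ranking never abstains, so a Challenger crew photo or a syphilis illustration still yields *some* "closest" product.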

Joseph:

Yeah. And, I mean, that sounds harmless enough. Right? I can imagine that would have been very useful for BuzzFeed if they had that technology when they did the dress article. You know, is it blue and gold or whatever?

Joseph:

Everybody's trying to buy that dress. That being said, this technology is obviously from this year, 2024, and it's been applied to articles that were written a lot earlier. Right? Like, it's not just new stuff that's coming out. It's applied to articles that have been on the website for some time.

Joseph:

Is that right? Yeah.

Emanuel:

Yeah. So, you know, in the past couple of years, a bunch of new media companies have totally collapsed. Vice, which we survived, is one of them. BuzzFeed is another one. I used to read BuzzFeed all the time.

Emanuel:

They had, like, a very good, vibrant newsroom, did breaking news, investigative features, just, like, really well written, award winning articles. And the way that I found out about this story, the fact that they even implemented this Trendii technology, is a 404 Media reader who was researching this condition I've never heard of called empty nose syndrome. And, basically, this is a condition where you feel like your airways are obstructed even though medically they are not. It just feels that way. It kinda feels like you're drowning.

Emanuel:

It's this really awful condition, which in some rare cases has caused people to take their own lives because they, like, couldn't deal with the symptoms. And he was researching this, and he landed on a really well written BuzzFeed article about empty nose syndrome. And that article opens with this really sad anecdote about one of these people that took their own life because of this condition. And the article includes a very, you know, sad image of this guy with his young nephew, I think it is. And the shop this image button appeared on that photograph and suggested that he buy a beanie, or something that looked like the beanie that he was wearing in that photograph.

Emanuel:

And he flagged that to me, and I went poking around, learned more about this Trendii company, checked out some other sites that they work with, but really digging into BuzzFeed, because it has so many articles and such a huge back catalog, to see how it was implemented. And the short version is that it was implemented basically everywhere, which resulted in some, like, really inappropriate monetization of horrible images.

Joseph:

Yeah. I don't think this is what Trendii intended. I mean, we'll get to their statement in a minute. Right? But I guess it's, well, to be charitable, an unforeseen consequence.

Joseph:

Right? So when you were going through the BuzzFeed archive and seeing how else Trendii had sort of been, you know, implemented on some of these articles, or what Trendii was doing, what were some of the other examples? I think there was one from the Challenger team. Right? What was that one, and what were some of the other ones?

Emanuel:

Yeah. So the, I would say, least appropriate implementation here is, like, a classic type of BuzzFeed article. It's titled 17 creepy, disturbing, and terrifying things I learned about this month that I really, really, really, really cannot keep to myself.

Joseph:

Classic BuzzFeed.

Emanuel:

Right. A great BuzzFeed article. And this is actually from, like, the post BuzzFeed News collapse.

Joseph:

Right.

Emanuel:

And it's just, like, a listicle with a bunch of, like, disturbing factoids that are, like, ripped from the news or Wikipedia articles and stuff like this. And, yeah, there were some really unfortunate attempts to monetize images of the famous 1986 Challenger disaster, where the space shuttle exploded shortly after launch and all the astronauts died. So there was an image of that crew, and the shop this image button tried to monetize their blue uniforms and match them to, like, blue puffer jackets that people can buy. My favorite one, which is, I would say, less tragic and more funny, is this really early medical illustration of some of the worst symptoms of syphilis, which can cause, like, this rot in your face. And the Trendii shop this image button matched the color of the faces to, like, a shade of MAC lipstick.

Emanuel:

So it's like, if you wanna look like you're about to die from syphilis, shop this lipstick.

Joseph:

Yeah. And it's only $63.25 as well, according to the picture.

Emanuel:

In Australian dollars, actually, which is another thing I wanna highlight here: the company is Australian, and I contacted BuzzFeed. And BuzzFeed was, I would say, like, apologetic, and maybe horrified, but also they were like, actually, this is BuzzFeed Australia, which has spun out into its own company, which is something that happens in the media. I don't know why, actually, but it's

Joseph:

We even had that with Vice. Right?

Emanuel:

Yeah. It happened to Vice. I think Gizmodo

Jason:

also runs Yeah.

Emanuel:

Gizmodo has its own, like, Australian thing. Some of the gaming sites that I used to work at had the same arrangement, and that happened to BuzzFeed not too long ago. So they were like, we're actually not on top of this. It's its own company. They implemented it in Australia.

Emanuel:

The ads are geofenced to Australia, which is why I had trouble finding them when I was looking around.

Joseph:

But what did you do? Did you have to use a VPN or something?

Emanuel:

We used a VPN, and also the original tipster, who had the ads active when he was browsing, just, like, started sending me more and more articles. And then we verified them by sending them directly to BuzzFeed and Trendii, which confirmed that, yeah, they were implemented kind of indiscriminately across the site.

Joseph:

Yeah. Obviously, the 404 Media reader saw it and sent it to you. You've obviously seen it. Have any of the, like, normal BuzzFeed readers spotted this? I think some left some comments.

Joseph:

Right?

Emanuel:

Yeah. So this is interesting, because I wasn't able to see the ads in the US. But what I was able to see is comments on those articles from Australian readers, which is how I was able to tell that the ads appeared in some other inappropriate places. So BuzzFeed had this article, for example, on how different celebrities were reacting, or not, to the war in Gaza. And one of the commenters was like, wow.

Emanuel:

It's really inappropriate that I'm looking at an image of this completely bombed out street and a kid walking through it, and the shop this image button is like, how would you like to cop the look of this, you know, Palestinian refugee? So, yeah, in some cases I didn't see the ads themselves, but I saw commenters as far back as a year ago being like, this is a fucked up ad.

Joseph:

This is a little bit speculative, or more of an open question, and I think this would just be more of our opinion, really. But, like, did this happen potentially because Trendii doesn't see BuzzFeed as sort of a news outlet anymore, even though I don't think that's entirely fair? And same as you, Emanuel, I thought they actually had the best investigations team in the business for quite a while before it got gutted. But, again, a little bit speculative.

Joseph:

But do you think Trendii sees this as just, like, a content platform rather than a website that's, as you say, gonna have coverage about, like, the war in Gaza or something like that? Or even if we don't know if that's how Trendii feels, is that why it's happening here? It's like a tool for content, not a tool for news. Do you see what I mean?

Emanuel:

Totally. Yeah. That's a good question. I think it says two things. One is, it is not clear, and I asked both BuzzFeed and Trendii specifically.

Emanuel:

Like, as the publisher, if you choose to have a relationship with Trendii, how are you choosing, like, how do you manage where this appears or not? And neither one of them wanted to comment on it, which I would take, along with the fact that it was kind of appearing everywhere, to mean that they really weren't controlling for it at all. But that's speculation. I think it also says something about, like, what BuzzFeed is at this point.

Emanuel:

It's like this weird artifact of a once great newsroom. And now, like, I'm sorry, I know that there's people who work there, but it's just, like, a lowest common denominator content factory that just, like, publishes everything and, like, a ton of celebrity news. So when I was searching the Internet for other sites that work with Trendii, I came upon this site, I don't know how to pronounce it, DMARGE, maybe, d m a r g e, and it's, like, a bunch of celebrity news, and what people are wearing on the red carpet and stuff like this.

Emanuel:

And in that context, the Trendii implementation, I mean, say what you will, but it's, like, inoffensive. It's sort of, like, synergistic. Right? It's like you're reading an article about what celebrities are wearing, and then the AI tells you, like, hey, do you wanna buy something like this?

Emanuel:

Like, click on that. And BuzzFeed has a ton of articles like that where that implementation makes sense, but then it also has, like, this back catalog of very serious, sometimes dark, sometimes tragic journalism where it doesn't fit at all. And at this point, you know, BuzzFeed is just something that they're trying to, like, squeeze every dollar out of, and they threw this AI product on it, and this is what we get.

Jason:

Yeah. I mean, that's what I was gonna say. It doesn't do news anymore at all, to my knowledge, and it's like there's zero respect there for the work that people did, or respect for the archives or the stories that they told. Like, this is a publication that won a Pulitzer and did really, really incredible work, and squeezing every penny out of every corner of that website is, like, what they're doing now. Like, this seems to me like not even a side effect of what they're trying to do, but, like, surely someone either forgot to turn this off on news articles or there's no way of turning it off on news articles. The archives are, like, something to be ransacked for this company. It's not something to be, like, respected in any way.

Joseph:

Yeah. I mean, like, I'm not trying to do Trendii's or BuzzFeed's job for them. But theoretically, you know, when we go and publish an article, we can tag it a certain way. Right? It's news, or maybe it's a subject, like AI or whatever.

Joseph:

Presumably, you could tag all of your news articles as news and tell Trendii, hey, please don't run fucking clothes ads on this very horrible news story. Let me just read out Trendii CEO Aaron Wolf's statement just so we get that out there as well. Quote, unfortunately, this was an oversight, an accident, and obviously not what Trendii is intended to do. We have accidentally appeared on images which are clearly not right, and our intention is to continue to evolve our product so it may avoid circumstances like this happening in the future. We truly hope we have not caused any offense to the audience of BuzzFeed.
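Joseph's tagging suggestion is a pretty simple check in practice: gate the widget on article tags so it never renders on news or sensitive stories. This is a hypothetical sketch — the tag names and the function are made up for illustration, not anything BuzzFeed or Trendii actually exposes:

```python
# Tags that should suppress the affiliate widget (hypothetical list).
EXCLUDED_TAGS = {"news", "investigations", "obituary", "tragedy"}

def should_show_shopping_widget(article_tags):
    """Render the 'shop this image' widget only when the article carries
    no excluded tag (case-insensitive comparison)."""
    return not (set(t.lower() for t in article_tags) & EXCLUDED_TAGS)

print(should_show_shopping_widget(["celebrity", "red-carpet"]))  # True
print(should_show_shopping_widget(["News", "health"]))           # False
```

A deny-list like this only works if the archive is actually tagged consistently, which, per the episode, is exactly what neither company would confirm they do.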

Joseph:

And, as they say, Trendii is all about positive, happy experiences for consumers and better advertising without commoditizing consumer personal data. End quote. Kind of an interesting thing at the end, where it's like, well, we're not using your data from browsing. We're just lifting a look from somebody who has syphilis and recommending makeup that way. I guess just the last question, Emanuel: you mentioned one site that you saw was, like, celebrity news. Have you seen it anywhere else?

Joseph:

And do you see other media companies trying to do something like this? I just mean, could you see that happening?

Emanuel:

Yeah. It has a bunch of very notable partners, and I would say most of those partners make a lot more sense. So you have Vogue and Marie Claire and other fashion brands. I believe PopSugar was in there. I'm not sure all of those are active, but they did work with them at some point, and that seems, I think, fine.

Emanuel:

I just wanna add one more thing, and that is, because we are talking about an AI product, I wanna make clear that, like, it basically doesn't work. Like, okay, so I'm looking at an image. I'm on this DMARGE website right now, where the shop this image button is implemented, and it's a picture of Ryan Gosling from, I don't know.

Emanuel:

This looks like the early 2000s, and he's wearing a T shirt that has, like, the tuxedo look, like, printed on it. He's not wearing a tuxedo. He's wearing a tuxedo T shirt. And the shop this image button is recommending that I buy a black Allbirds T shirt. And what I mean by it doesn't work is, and I did the same thing,

Emanuel:

I was looking at, like, an image of Kanye, who is always wearing, like, these really crazy, striking designer clothes, and it recommends that you buy stuff in the image, but it's not what he's actually wearing. It's just doing its best to match something from a participating retailer. Right? Because the reason I'm seeing an Allbirds T shirt is because Allbirds decided to make a deal with Trendii and advertise its T shirts this way. So it's not showing you the actual product.

Emanuel:

It's showing you something that looks like the product. And it's like, if you wanna dress like Kanye, you're not gonna be able to do it by following whatever Trendii recommends. It's just, like, sort of putting something in front of you that looks like it. You're not actually shopping the image. You know what I mean?

Joseph:

Yeah. Which I think makes it, again, I know you said it's not clear if it's, like, active with Vogue or whatever, but they say it was. And it's like, oh, you can get, like, you know, a runway look or something that might not be readily available, and, like, then it's not gonna work either, necessarily. I mean, again, I don't know if that's a use case at all.

Joseph:

But yeah.

Emanuel:

Exactly. Yeah.

Joseph:

Yeah. Alright. Let's leave that there. And when we come back, we're gonna talk about an unprecedented leak out of the phone forensics tech Graykey. We'll be right back after this.

Joseph:

Alright. And we are back. This is one I wrote, called Leaked Documents Show What Phones Secretive Tech Graykey Can Unlock. And before Jason asks me a few questions, I do just wanna stress: this is really complicated.

Joseph:

It's really, really difficult. It was really hard for me to get my head around. Jason and Emanuel very carefully edited it, caught stuff that I missed. And there are a lot of questions remaining because, you know, this is a super secretive company, Graykey, and Magnet Forensics, the company that now owns it. But there's some pretty interesting stuff.

Joseph:

So what we got is two spreadsheets, and they are a granular list of what iPhone and Android devices Graykey is able to retrieve data from. And before I throw it to Jason, I'll say that the top line is basically that Graykey is only able to retrieve, quote, partial, end quote, data from all modern iPhones, that's the iPhone 12 up to the 16, that run iOS 18 or 18.0.1. We don't quite know about 18.1, which is the current most recent version of iOS, released on October 28. The documents look like they're from October, but just before October 28. So we don't have that bit.
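The spreadsheets Joseph describes are essentially a lookup table from (device model, OS version) to an extraction level. A hypothetical reconstruction of how such a table might be queried — the "partial" entries reflect the reporting (iPhone 12 through 16 on iOS 18 / 18.0.1); the table structure and function are illustrative, not the leaked documents' actual format:

```python
# Hypothetical capability table keyed on (model, iOS version).
# "partial" rows mirror the reported top line; everything else is made up.
CAPABILITIES = {
    ("iPhone 12",         "18.0"):   "partial",
    ("iPhone 12",         "18.0.1"): "partial",
    ("iPhone 16 Pro Max", "18.0"):   "partial",
    ("iPhone 16 Pro Max", "18.0.1"): "partial",
}

def extraction_level(model, os_version):
    """Return what data Graykey could reportedly pull for this combination.

    Returns 'unknown' for combinations not in the documents, such as
    iOS 18.1, which was released after they appear to have been made.
    """
    return CAPABILITIES.get((model, os_version), "unknown")

print(extraction_level("iPhone 16 Pro Max", "18.0.1"))  # partial
print(extraction_level("iPhone 16 Pro Max", "18.1"))    # unknown
```

This also captures the caveat Joseph raises later: "partial" alone doesn't tell you whether the device was in an AFU or BFU state, so a real table would likely need that as a third dimension.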

Jason:

Okay. So, Joseph, what is the Graykey?

Joseph:

Graykey is a very small device that can sit on a law enforcement officer's desk, or I think there's a mobile version, or maybe they've integrated that in there as well. But it really came on the scene in around 2016, 2017, 2018, like, around then, post San Bernardino. And everybody will remember that it was really, really hard for the FBI to get into the San Bernardino iPhone, big court case. Azimuth Security eventually hacked the device. We actually touched on that last week.

Joseph:

In the wake of that, this company called Grayshift launched this tiny little product called the Graykey, and it was something like $15,000 to $30,000. The price has actually changed a little bit over the years, and there's sort of an annual subscription as well, with a number of unlocks, that sort of thing. But Forbes, Tom Brewster over at Forbes, he first revealed the existence of it. And it basically sent shockwaves through the law enforcement and forensic communities. Right?

Joseph:

After all these years of iPhones being uncrackable, it now appeared that, hey, here's a box, and if you give them tens of thousands of dollars, cops will be able to get in there. And it has a little USB-C cable on the front, or a Lightning one, depending on what you're trying to do. You plug it in. And it does two things. Right?

Joseph:

The first is that it will try to get the passcode, and maybe it will do that through brute forcing, or maybe it will extract a keychain, and maybe there'll be clues in there. I've seen some other documents where it'll extract information like messages, and it will then try to find clues on other passwords or other PINs. Maybe because, oh, I found something that looks like a birthday, and now maybe that could be a PIN code or something like that. So you have the trying to break into the phone, which is one of the most important parts.

Joseph:

And then you have, well, it's also extracting, or, as Emanuel very fairly said when he edited it, it retrieves the data from the device and then presents it to the law enforcement officer. It's not just a big file. You go

Jason:

It's like an interface that'll be, like, here are messages from a specific app, or here are photos, like, stuff like that. Right? It's a back end that you can kinda, like, browse through.

Joseph:

Yeah. It makes it much, much, much easier to dig through images of iPhone and now Android devices. And, yeah, as I said, it started with iPhones, but then a few years ago, they branched into Android, because, of course, its main competitor is Cellebrite, which I'm sure everybody's familiar with.

Jason:

I was gonna ask about that, actually. I don't even know the answer, or if you know the answer, but, like, by and large, are Cellebrite and Graykey, like, more or less the same thing? As in, like, does one offer things that the other does not, to our knowledge, or are they, like, direct competitors offering, like, essentially overlapping products?

Joseph:

Yeah. I would say they're basically direct competitors. You then have other ones that are a bit smaller, like Elcomsoft, I think a Russian company. And they do look into mobile forensics as well, so there's that sort of thing. You then have these other companies that are more just focused, sort of, on the visualization side.

Joseph:

Like, oh, you already have the data, and they will interpret it for you. But Graykey and Cellebrite, it's supposed to be an all in one solution. Right? You buy the box or you buy the service. You plug the phone in.

Joseph:

You unlock it, or get what data you can, and then interpret it. I think a slight difference with Cellebrite is that they have this advanced program, at least this was the case a few years ago, and the leaked Cellebrite documents we got touched on this a little bit. But in some cases, you send the phone to Cellebrite, and then they do it. And I think that's in part to look after their capabilities.

Joseph:

You know? Hey. If it's on the box, even if that's phoning home, there's a world in which someone could get hold of that and try to reverse engineer it. Right? That could be one reason for it.

Joseph:

The other is just that when it comes to mobile phone forensics, there are so many variables that it could be better for someone to do it internally rather than with the tool in the cops' forensics lab or whatever. And by that, I mean more stuff like the screen is broken, or the battery got hot and expanded and, like, completely screwed up the phone, or something like that. And that's when you send it to Cellebrite or someone to look over.

Jason:

Yeah. Okay. So I remember Graykey came on the scene, like, a few years after Cellebrite, and it was this pretty mysterious, like, upstart in the industry. As far as I know, there have been, like, very few leaks about Graykey. Like, we've learned about them through, like, FOIA documents and sometimes court cases and things like that.

Jason:

But in terms of, like, big leaks about how it works and its capabilities and what phones it can unlock, so on and so forth, I'm not aware of any others. There may be a few at some point, but this is one of the biggest ones, I think, that has ever happened. So what did you get, and what do the documents show?

Joseph:

Yeah. I think it's, as I said, unprecedented. We've had similar leaks from Cellebrite, the ones we've reported on, the ones that other people have obtained. And then even way back at Motherboard, you know, we covered the hack of Cellebrite as well, where I think I got, like, 500 gigabytes of data from Cellebrite and did what we could there. But, yes, Graykey, it doesn't leak very often. They treat their material very, very carefully, where if you do a Freedom of Information request for emails, they don't send this sort of material, at least ordinarily.

Joseph:

As, like, an attachment to an email, you know. And I've FOIA'd tons of material about Graykey, and I've never seen something like this. And this was a leak that somebody gave to me. It wasn't through a FOIA.

Joseph:

But as for what they actually show: basically, if you update your iPhone, you're probably pretty good. Again, I'm looking at the table now, which shows Graykey's capabilities against iOS 18 and 18.0.1. And everything from the iPhone 12 through to the 16 Pro Max says, if you're running either 18 or 18.0.1, it can only get partial data. We don't know whether that is after first unlock or before first unlock, AFU, BFU. AFU being the owner has already unlocked the device at least once since it was powered on, and that can make it a little bit easier.

Joseph:

Or BFU is obviously the opposite of that, and that hasn't been the case. We don't know. But I think a massive takeaway is simply that the spreadsheet does not say full. It's the fact that Graykey just cannot get full data from a modern iPhone. I think that's the main takeaway.

Joseph:

When you look at Android, it is obviously way more varied. There are, I don't know, a squillion different Android phones.

Jason:

A squillion?

Joseph:

Squillion. I mean I

Jason:

don't think I've heard that one before.

Joseph:

Maybe fact check me on that one. But, you know, they're all made by different OEMs, different manufacturers, all these different forks of Android. Maybe you have a Samsung, maybe you have a Google, some phone made from somewhere else in the world as well. And the Google Pixel devices, according to the spreadsheet, can have partial data extracted if they are in an AFU state.

Joseph:

The rest is a massive mix of all data, no data. It's a hodgepodge.

Jason:

I feel like we talked about AFU, BFU last week, but I think we should do a very quick reminder of what before first unlock and after first unlock are, for people just joining us.

Joseph:

Yeah. So I touched on it. But to spell it out a little bit more: BFU, before first unlock, that would be if, let's say, the phone is off, the iPhone is off for whatever reason. The police officer turns it on to forensically extract it. And that obviously means that the actual owner of the device has not entered the passcode, which would decrypt a lot of the information on there.

Joseph:

That's a BFU state, before first unlock. AFU is the opposite of that, and that's gonna be when the user has unlocked the device at some point in time. That is especially important for a number of different reasons. I mean, let's say police officers raid somebody's apartment.

Joseph:

I'm just gonna say a drug trafficker for the sake of example. And they're on their phone, and the police want to preserve evidence. The police will probably try to grab that phone while it's on, so it hasn't been turned off by the user, because that would probably be in the AFU state and then easier to unlock. Now let's say, I don't know, tragically, a child goes missing, and they had a phone on them, and it runs out of battery, and it powers down.

Joseph:

That would then be a BFU state, and then that could be more difficult. There are sort of, like, real world consequences to how and when the phone is seized, which would dictate whether it can be unlocked or not, essentially.

Jason:

I wanna hear you go through increasingly tragic hypotheticals in your mind. Okay. So there's been kind of a lot of leaks lately, and you've gotten, I think, pretty much all of them in this world, where, you know, last week you had the new iOS feature about the, sort of, reboot after a phone's been idle for a while. A few months ago, you had a Cellebrite leak. Now you have this Graykey leak.

Jason:

What is sort of, like, the current state of play for iPhone hacking by law enforcement? And, I guess, what is the status quo here? Like, it seems to me like it's a super uneven, like, playing field, more or less. I don't know if playing field is the right word, but it's an uneven situation where some phones they can break into, like, really quickly, really easily with these tools. Others,

Joseph:

many members of law enforcement wouldn't want that. They would want a more permanent solution. That's, of course, what we had with Apple versus FBI, where, really, they weren't trying to get access to one phone. They were trying to develop a capability that would ensure access for future cases as well. Right?

Joseph:

And then Azimuth hacks the phone, and then Graykey comes along. And it seems to be like there's this cycle where Apple will do some sort of security upgrade. You know, I mentioned USB Restricted Mode, which, a few years back, meant that you couldn't plug the phone into a computer and get data from it, essentially, because the port would just turn into a charging port rather than a data one. That comes along, and, you know, it feels like the sky is falling down for forensic investigators.

Joseph:

And then Graykey and/or Cellebrite find workarounds, and then it continues. And then we get, as you say, this really interesting iOS 18 reboot, where if the phone has been left idle for three full days, or, you know, turning onto the fourth day, it will reboot and go to a BFU state. Now investigators have to find a way to deal with that as well. That is just how this goes. It's a constant cycle.

Joseph:

They find exploits. There are workarounds, that sort of thing. That's not to say it'll be like this forever. In the same way, the custom encrypted phones of criminals became such a massive headache for cops that the FBI decided to run its own encrypted phone company. It came to a head there.

Joseph:

I think it's absolutely possible that if it becomes too hard to get data out of iPhones or Androids or any devices, and there's a really, really, really important case, of course, it could come up again where the FBI is like, we need a more permanent solution. And it's not just a US thing. Right? The UK has demanded technical capability access as well. Australia does some things as well.

Joseph:

Europe is part of the discussion too. But at least right now, it looks like Graykey and Cellebrite, they're a little bit behind. And then a couple of months later, they catch up, at least in some way.

Emanuel:

It seems to me, speaking only about the iPhone, like for a few years now, basically, the takeaway from all of these leaks is: if you keep updating your phone as soon as the security updates drop, you're probably okay. And if that is the case, I am wondering, what do you think the value of a Graykey is to law enforcement? Is it that they know that enough phones that they're trying to get into are not updated, or are they paying for those short points in the cycle where Graykey is ahead of Apple?

Joseph:

I think they're paying for those windows. And and we don't know, like, the iOS 18 adoption rate or the 18.1 adoption rate, or we know that Tim Cook, Tim Apple, said that it is going very, very quickly and people are moving up to it. So there are those windows. I do think that, yes, even though they can only get partial data from iPhones, that's not no data. So there is something there.

Joseph:

And it's what we were talking about earlier, where it can be visualized, it can be mapped with other data. They can still perform investigations. And obviously, I don't know if this is fully fair because I'm not a police officer. But I would say the value proposition has probably gone down if it's not unlocking the phones, which isn't to say it's worthless, but maybe the value has gone down a little bit.

Jason:

You know what? That that raises something that we didn't talk about last week, but kind of plays with your story last week about the new iOS update and the phone rebooting, which is, like, cops sometimes confiscate phones, and they might not break into it right away or they might not try to break into it right away. And I wonder if these are often sitting in evidence, like, storage lockers, for quite some time until GrayKey or Cellebrite is able to make a new exploit or is able to update whatever they're doing. And then there's, like, a a frenzy of, you know, unlocking a bunch of phones. And I wonder if that is I guess, like, when we talked about this last week, I was like, oh, well, they're just gonna do it right away now.

Jason:

But in some cases, they might not be able to unlock these phones right away. They might be waiting for that window where, you know, there there is a period where they're able to unlock these phones before Apple gets a new update. And I don't know. Have you have you thought about that? Do you even understand what I'm saying?

Jason:

I feel like I'm rambling a little bit.

Joseph:

No. No. No. Yeah. I get it.

Joseph:

Because, yeah, I've spoken to people, even for my book, when I was talking to people who, like, had to get data from PGP BlackBerries that were used by criminals, you would sometimes open a device and you would see a certain sticker and you'd go, well, that's a Phantom Secure device. I'm not getting anything off that. You basically throw it in the trash because, like, well, what's the point? I can't get anything from it. But if it was another sort of phone, you would leave it there and it would be, like, on the conveyor belt or in storage, and you would come back to it later.

Joseph:

And cops absolutely do that. They're just waiting for, oh, okay. Well, GrayKey just needs to update to the latest version. When we get that, we'll be able to get into that device and that'll be great. And I think that is why the iOS 18 reboot timer is such a big deal for law enforcement or, you know, other people trying to get into the data, like thieves, potentially.

Joseph:

But I do think the main context in which it's introduced is law enforcement. No longer can they just have a phone waiting there for GrayKey to hurry up and push an update. It's like, we have 4 days and then we're screwed, basically. So I imagine GrayKey is going to try to find a way somehow, and Cellebrite, to keep the phone in that non-reboot stage. Maybe there's a way in which the timer going down, there's a way to fuck with that or something.

Joseph:

I mean, I have no idea, but that's gonna be what they're gonna try try to do, I imagine.
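The evidence-locker workflow discussed here can be put in code. This is a hypothetical sketch, not any vendor's API: before the reboot timer, police could wait indefinitely for a forensics tool to gain support; now the exploit has to land inside an assumed 3-day window.

```python
from datetime import datetime, timedelta

# Assumed inactivity window from the episode's description of iOS 18.
REBOOT_WINDOW = timedelta(days=3)

def exploit_arrives_in_time(seized_at: datetime, exploit_ready_at: datetime) -> bool:
    # Old workflow: shelve the phone until the vendor ships support, however long it takes.
    # With the reboot timer, that only works if support lands before the phone drops to BFU.
    return exploit_ready_at - seized_at <= REBOOT_WINDOW
```

Under this model, a tool update two days after seizure still catches the phone in the more extractable state; a two-week wait doesn't.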

Jason:

Just need an artificial finger touching the screen sometimes and moving it around.

Joseph:

Yeah. Like, just

Jason:

a a mouse jiggler.

Joseph:

Yeah. Exactly. Or or just a guy. You know? Just a guy touching all the phones constantly.

Jason:

Phone toucher.

Joseph:

Yeah. Alright. We will leave that there. If you are listening to the free version of the podcast, I'll now play us out. But if you are a paying 404 Media subscriber, we're gonna talk a ton about AI and the publishing industry and books.

Joseph:

Sam has a couple of really interesting stories about that. You can subscribe and gain access to that content at 404media.co. We'll be right back after this. Alright. And we are back for the subscribers only section.

Joseph:

As I mentioned, both of these are written by Sam. The first one I presume this is the order we're doing them in. Correct me if I'm wrong, Sam. But HarperCollins confirms it has a deal to sell authors' work to AI company. I mean, first of all, what has HarperCollins said here, Sam?

Sam:

Yeah. I mean, the order in which we're doing things that Joe is referring to is kinda weird, because this is a story that I, that I wrote in the middle of trying to get comments as part of a different story, which we're gonna talk about in a minute, but it became its own mini scoop. But

Joseph:

The web the website's busy today. We've published, like, 5 articles or something. We're

Sam:

we're we're we be blogging, as usual. So yeah. So this story, just to back up before we talk about, like, what HarperCollins said in particular: I found this story because, you know, like, begrudgingly, I will admit that I found this story through looking at Bluesky. Jason probably enjoys that.

Joseph:

Sam, you need to give up the Bluesky. Hey. Not because I don't appreciate the bit, but you're gonna get so tired

Sam:

I know.

Joseph:

Of doing the bit.

Sam:

I will never get tired of being a hater, first of all. So so

Jason:

I'm sick of it. Someone was mean about one of our stories on Bluesky, and I'm seething.

Sam:

Yes. So it's begun. Jason hates it already. Hours,

Jason:

since we started this.

Sam:

Yeah. And who could have expected it?

Joseph:

Can I say something super quick before we get to it? I've been moaning about this to, at least, Jason. But being being on Bluesky for a while, the conversation has been more intelligent or nuanced. I've definitely noticed, when more people have come over, I'm getting the normal stupid replies. Like, oh, this isn't news.

Joseph:

Oh, I knew this. You're so so smart. It is news. That's why I'm fucking writing about it. But anyway, I think that is just what happens when more people go to a social network.

Joseph:

Okay. Rant over.

Sam:

Yeah. You're telling me that, like, pedantic dumbasses are online. Like, that's crazy.

Joseph:

It's crazy.

Sam:

Yeah. Breaking news. So yeah. So with this, this is a story that kinda came about because I was in the middle of writing this other story about MIT Press. And then on Friday, an author who wrote a children's book, his name is Daniel Kibblesmith.

Sam:

He published a book called Santa's Husband with HarperCollins, which sounds like a really good book, honestly. But he posted screenshots on Bluesky kind of, like, without any context, but you kind of could infer that they were from his agent. And his agent was telling him that HarperCollins had approached them saying, you know, do you want to opt in to this AI data training deal? And if you do want to, then you get $2,500, nonnegotiable, as compensation. And this guy was, the author was pissed, because, obviously, that's not enough money, and just everything is becoming AI, LLM fodder these days.

Sam:

So some, like, different, like, blogs reported out, you know, that this happened on Bluesky, but no one had really confirmed from HarperCollins that that was the case or that that was even what was happening. So I emailed them, and I was like, I wanna include this in this other story, so let me email HarperCollins and make sure we have this straight. And they replied, like, right away, to my shock. And they said that,

Joseph:

which, by

Sam:

the way, HarperCollins is, like, one of the big five. It's, like, the major publishers globally, so it's a really big one, if people aren't familiar with, like, the publishing world. But HarperCollins replied to me and said that they had reached an agreement with an artificial intelligence technology company, but didn't name what company, to allow the limited use of select nonfiction backlist titles for training AI models to improve model quality and performance. And, you know, the statement goes on to say, like, they respect the views of authors, and they gave them the chance to opt in to the agreement or to pass on the opportunity. They called it an opportunity.

Sam:

So, yeah, I sent a couple follow-up questions, because they mentioned that there was kind of a limited scope and guardrails around what input would be contributed to this training data, but they didn't reply. So, yeah, that's kind of the gist of that. It's, you know, it's pretty short, but it's also an incremental part of, like, this bigger trend of publishers striking these deals with places like Microsoft and, you know, big tech companies to train LLMs. They get paid by a company like Microsoft to then give their books to the AI machine.

Joseph:

I mean, can you just elaborate a little bit on that broader context? You have the HarperCollins news here, and the other one we're gonna talk about is MIT Press, but we'll get to that in a minute. What is some of that broader context? Have any of the other publishers done this? Because, obviously, there's a lot of there's, like, the news side that we look at a lot.

Joseph:

Right? And there's, I don't know, there's the OpenAI lawsuit with New York Times and Yeah. All of that as well. What's some of the publisher context?

Sam:

Yeah. So a couple other publishers are making these deals and basically not telling authors, as far as anyone can tell, and are selling authors' works to these companies, you know, just without getting permission, let alone compensating them at all or giving them the chance to opt in like HarperCollins did. So, yeah, just a few that I heard about or found through the process of reporting out these two stories. I guess earlier in the summer, Taylor & Francis, which is a big one, owns Routledge. They sold their authors' research as part of a $10,000,000 deal with Microsoft, and then authors found out through word-of-mouth, which is crazy.

Sam:

It's a huge deal.

Joseph:

And then all of the authors got part of that $10,000,000. Right?

Sam:

Totally. Yeah. Definitely. They got $10,000,000 each.

Sam:

Yeah. No. They got nothing. And then there were a couple others, like Oxford University Press and Wiley.

Sam:

They both have data deals in the works. You know, they say they're actively working with that's what Oxford said, actively working with companies developing LLMs to improve research outcomes and champion the vital role that researchers have in an AI enabled world, which I think is funny, because they're selling out their researchers to respect researchers.

Joseph:

Yeah. It's also interesting because Oxford is obviously gonna be a slightly more academic context than a HarperCollins or something. Right? Which I guess brings us to this next story, which is AI companies are trying to get MIT Press books. So what happened here, Sam?

Joseph:

There was an email sent out by MIT Press.

Sam:

Yeah. So earlier this month, MIT Press emailed all of its authors with basically a call for input or for opinions. The email was very, like, transparent, which was nice to see. They started off by saying, you know, we're being approached by several AI companies and data brokers about using text from our publications, so works that you have authored, as training data for generative AI tools in exchange for payment. So, you know, it's like, if you're an author, you're just, like, an academic somewhere and you wrote an MIT Press book, you may be like, what are they talking about?

Sam:

And they kind of explain it a little bit, and they say, you know, we're we're being approached by these big companies to sell your work to them, basically. And they

Joseph:

So so at least they haven't gone ahead yet, which is different to the other ones. Right?

Sam:

No. They haven't. And they say in the email they haven't entered any deals, and, you know, they are approaching authors before they do make any decisions about deals. And they say, you know, what you say to us will help us make these decisions and consider these deals. Yeah.

Sam:

They I mean, the full email is long, and you can read it on on our website. But,

Joseph:

That's the point of an article rather than a podcast.

Sam:

Yeah. I mean, I could read it, but it's gonna take time. It's way long. But, you know, they then link to a Google form, and the Google form has a bunch of questions in it. The questions are like, you know, do you think that the books that you have written should be used to train generative AI, share your perspectives on academic publishers like MIT Press entering into licensing agreements.

Sam:

You know, do you would you want to be able to opt in to one of these kind of agreements? Would you like to be given the choice? Those kind of things. So there's a handful of questions. And I emailed around to a couple different authors.

Sam:

Specifically, I tried to contact ones that had, like, tech books, because I was like, they would probably be primed to know what's going on here. And, yeah, I mean, the people, surprisingly or not surprisingly, were like, wow. This is kind of scary. This is not great. Don't love this.

Sam:

But also, the director of MIT Press told me that the responses have kind of been all over the place. Like, some people are saying, you know, we're concerned about these systems and the way that they cannibalize. She said people have said they're concerned about cannibalizing their own work and the publishing economy, which I think is a very strong statement. And then on the other hand, some authors are saying they want to have their work trained on. Considering how fast these systems are growing, they want it to be in the training data so that the outputs are informed by them, which I think is kind of an interesting perspective.

Sam:

But I will read it's Aram Sinnreich. Aram Sinnreich's quote from this article. He wrote the book, with Jesse Gilbert, called The Secret Life of Data. It came out in April. And I thought his quote was really spot on, so I'm just gonna read it verbatim.

Sam:

He said: I told them that the problem with using our work to train LLMs isn't that individual authors deserve to be compensated or given due credit for their work being churned through the AI grinder. Rather, it's a structural problem in which the labor of working scholars en masse is being used to feed the profits of insatiably greedy tech elites, effecting a massive upward transfer of wealth while simultaneously undermining the role of expertise and the value of individual perspective in the production of knowledge, which has widespread civic and cultural consequences. Which I thought was kind of an interesting I mean, this is something we've talked about lots on this podcast and also in our articles. But to put it so kind of, like, succinctly, I thought was interesting. It's not so much, like, I want credit where it's due type of problem. It's that.

Sam:

But it's also just, like, structurally, we're talking about, like, academics and scholars and, like, full time professors writing books on the side and probably not getting paid a ton of money. I don't know what MIT Press pays, like, on average, but I've heard not huge, life changing sums of money, regardless. But it's more like the structural issue where, you know, you're working full time, you're a scholar, you're an expert in your field. You publish a book because it's a labor of love, and then the people making the actual money off of that are gutting, you know, that industry that you're working in for their own profit, to churn it out in the form of a chatbot. So, yeah, I think it's very kind of, I don't know, illustrative of, like, this whole entire problem of AI training without permission.

Joseph:

For sure. I think that point's, really salient as well. Jason, were you gonna say something?

Jason:

I was just gonna say this: the $2,500 that HarperCollins is offering to authors, which seems to be a flat rate regardless of anything, is I find it to be, like, a really kind of interesting number, I guess, because it's, like, not enough to change anyone's life, but there's, like, so much data being sucked up by AI that, like, $2,500, not nothing, but it feels like it's, like, just enough to make people, like, kind of feel like they probably should say yes, whereas, like, in the long run, you're kind of building this, like, horrible machine. And I guess I'm just wondering not to talk about, you know, is $2,500 a lot of money or not? It totally depends on the context, but I think that for, like, unlimited access to probably what is, like, your life's work to be fed into this machine, I find it to be, like, a very, like, difficult number to wrap my head around. I just haven't seen how much money people are getting for AI from, like, any of these deals, and this is the first time that I've ever seen a number, I guess.

Jason:

Yeah. Have we seen any other numbers? I I don't I can't think of any

Emanuel:

There was one this week or last week that came out, but it was for a publisher. I think it was Meredith is getting paid $16,000,000 by, maybe, OpenAI for training on its entire catalog. That's, like, a notable number that I've seen. Jason, to what you said, though, I think this whole question is more complicated now than it was, like, a year or 2 ago. It's like, if somebody came to us a year ago and was like, I wanna give you $10,000,000 to train my AI model on 404 Media, I'd be like, go ahead.

Emanuel:

You know what I mean? It's just like, I don't think that, like, an LLM that's using our articles to train, in the aggregate, to, like, speak English well is, like, gonna hurt us. But now with products like Perplexity and Gemini, if the AI model is essentially just, like, sucking the valuable information from our articles and presenting it to people who are searching for that information, then at that point, we are competing with the model. And at that point, it's like, I don't think at any price or it would have to be, like, an extraordinary price in order for us to sell our work. But if I'm, like, a novelist, if I'm writing fiction for HarperCollins and you wanna train on my novel and you're gonna give me $2,500, it's like, I don't know.

Emanuel:

I don't feel like a novelist is gonna compete with an AI model. It's just not in the same yeah.

Jason:

I mean, but these, like, these models are shitting out books, including novels, I guess. But I do think that the kind of, like, one to one aspect of, like, oh, did my book turn like, is it being turned against me in some way, is really hard to know. Also, obviously, like, the economics of books are all over the place, where you have a few hits and people make a ton of money off of them, and then you have, like, many, many, many books that don't earn back their advances. And I don't know. Like, there's only 2 of us on this podcast who have written books, so I can't speak to it, but I think I would have a real hard time giving away my my book for $2,500 to an AI scraper.

Joseph:

On the number, I have to be careful because, obviously, I presume there are confidentiality agreements in place. I would have read them before I signed. But I'll just say that I've sold my book for translation in different languages. And I'm not gonna say it's that $2,500. I'm just gonna say, let's say it's in that very, very broad ballpark, right, of how much a translation costs.

Joseph:

And in that case, the book is sold and it's translated into one language for one market, and that's it. And that seems you know, it can be a fair trade off. You know, oh, it's a market that wouldn't have been accessible before, that sort of thing. It's what you said, Jason, where you're selling the nonfiction book or the novel or whatever, like, forever, probably, to this AI system. And I simultaneously agree with you.

Joseph:

The $2,500 is just around the mark where people could, like, I'm sorry, I think it was Emanuel who said that. But that's just around the mark where it'd be like, yeah, I could take that. But it's just when you introduce the idea of perpetuity that it changes the calculation a little bit. You know?

Joseph:

And I haven't been given an offer like this. I guess I'll just ask you, Sam. Do you know if well, who's your publisher, if you don't mind saying, Sam? I think it's public information. And do you know their stance on AI?

Joseph:

Or

Sam:

Yeah. I don't. My publisher when I started it still is, but my publisher is Workman, which is, like, a small indie press. Not tiny, but, like, it's an independent. It's not one of the big five.

Sam:

And they're great. But then they got bought halfway through my process of publishing, like, halfway through, before they launched the thing, by Hachette, which is one of the huge ones. It's one of the ones that's, like, suing the Internet Archive. Like, it's, like, a big one, and that transition process was a pain in the ass. And, you know, it's definitely kind of I feel like it's changed things a little bit on my end, you know, from going from, like, this indie publisher to, like, they're now owned by some huge, huge publisher.

Sam:

But, yeah, I don't know their, I don't know their stance. I know you were looking up the Hachette, like, UK

Joseph:

Yeah.

Sam:

site. So they have, like, an AI disclaimer or whatever.

Joseph:

Yeah. My, my publisher is also Hachette. It's actually PublicAffairs, which is an imprint of them. Right? And I'm not gonna read this whole thing because it's kind of kind of complicated.

Joseph:

But their position on AI, according to the UK website: we encourage responsible experimentation with AI for operational uses and recognize the benefits of remaining curious and embracing technology. Then they talk for several paragraphs. And they basically don't indicate that they have some sort of deal, but then at the end they just say, we recognize that this is a fast moving area and will continue to be guided by industry standards and the needs of key creative rights holders. To me, that read and again, you know, reading between the lines is just my opinion. But it sounds like they're waiting to figure out what everybody else is doing when they say industry standards.

Joseph:

I mean, like, well, if everybody else is doing it, we'll do it too. And just to be clear, I wouldn't be super jazzed if my book was sold to this without my consent or without monetary compensation. I guess just the last question is I don't know this, but for you, Sam, I think you've searched the dataset which includes books which have been used to train AI, and yours wasn't in there. Is that right?

Sam:

Yeah. Mine's not in there, and yours isn't either. It's called Books3. It's a big dataset of pirated books that was kinda at the center of these lawsuits last year, where, I guess, the allegations were that Meta and Bloomberg and, I think, another AI company were using Books3 to train their LLMs. And then The Atlantic published kind of a database, a little tool, so you could, like, put in your name or put in, like, any author's name and see if your book was in the dataset and, therefore, part of this training data, which I think is also very interesting.

Sam:

Books3 is also part of an even bigger dataset called The Pile, and that's kind of even more pirated books. So, yeah, there's all these kind of, like, lawsuits that are I mean, that one's ongoing. There have been a couple others. Like, Anthropic is being sued by a handful of authors; Anthropic's accused of using the same datasets to train its LLMs, and Anthropic makes Claude, which is a very popular chatbot. Yeah. And then the director of MIT Press acknowledged that, you know, with all of this, we're waiting to see how these legal decisions shake out and whether or not training like this on copyrighted content is considered fair use.

Sam:

Like, all this is still very much, like, legal gray area, which I think is something that the tech companies are very much taking advantage of, scooping up what they can while they can and making it too big of a problem to litigate against. And then, you know, from there, it's everyone's doing it. So, yeah, we'll see.

Joseph:

Yeah. We'll see. I guess we'll keep an eye on that. We'll see if our books get sold or pirated into these systems or whatever. I'm sure it's inevitable.

Joseph:

Well, I don't want it to be inevitable. I'm sure it'll happen, which is not exactly one and the same thing. But yeah. Alright. We will see.

Joseph:

And with that, I will play us out. As a reminder, 404 Media is journalist founded and supported by subscribers. If you wish to subscribe to 404 Media and directly support our work, please go to 404media.co. You'll get unlimited access to our articles and an ad free version of this podcast. You also get to listen to the subscribers only section, where we talk about a bonus story each week.

Joseph:

This podcast is made in partnership with Kaleidoscope. Another way to support us is by leaving a 5 star rating and review for the podcast. That stuff really helps us out. This has been 404 Media. We will see you again next week.