from 404 Media
Hello, and welcome to the 404 Media Podcast, where we bring you unparalleled access to hidden worlds both online and IRL. 404 Media is a journalist-founded company and needs your support. To subscribe, go to 404media.co. As well as bonus content every single week, subscribers also get access to additional episodes where we respond to their best comments. Gain access to that content at 404media.co.
Joseph Cox:I'm your host, Joseph. And with me are 404 Media cofounders Sam Cole ("Hey"), Emanuel Maiberg ("Hello"), and Jason Koebler.
Joseph Cox:Hello. Hello. So real quick. Hopefully, you're hearing this podcast in time. You should be because we publish this to subscribers Tuesday evening, and then Wednesday morning, free subscribers get it.
Joseph Cox:But on Wednesday, the eighteenth, that's tomorrow or today, depending on where you're listening, at 1PM EST, we're going to be having our latest FOIA forum. This is a livestreamed event of an hour, realistically two hours, we usually go over, where we're going to explain to you how to pry records from the government using freedom of information requests and public records requests. Specifically, we're gonna be talking about a story that Emanuel and Jason did a while back about a company called Massive Blue, who were making these AI personas for cops that pose as college protesters, really, really wild stuff. So if you wanna learn how we did that and how you can replicate those requests, please become a paid subscriber. Or if you already are one, keep an eye out for an email with a link to the livestream.
Joseph Cox:We've tried to plug it in a lot of places, you know? I'll put a link in the show notes here as well, and we also try to put it at the top of the emails as well. And beyond that, Jason, I think you wanted to talk about merch as well?
Jason Koebler:Yeah. We have merch back in stock. Our 404 code tank tops were incredibly, incredibly popular, so thank you all for ordering them. I ordered a bunch more. We have them in every size.
Jason Koebler:So if you wanted those, you can go to 404media.co and then click merch, and you'll see them there. And, also, if you preordered them, that means that your preorder is going out very, very soon. So thank you for the patience there. Should we get into it? Because I believe I'm asking you some questions, Joe.
Joseph Cox:Sounds good.
Jason Koebler:Yeah. So the first story we're talking about this week is airlines don't want you to know they sold your flight data to DHS. This is a really wild story. I didn't know about this at all that this was happening. Where did you first find the story?
Joseph Cox:So on May 1, I noticed that Immigration and Customs Enforcement, ICE, had a new contract in these government procurement databases. Basically, what I do is I have a shortcut on my desktop, and I'll click it, and every so often I'll just check to see the latest contracts ICE has with the government. Or, you know, I've done it for Customs and Border Protection and other agencies as well. It's just sort of what we're covering at the moment. But on May 1, I saw that ICE entered some sort of contract with Airlines Reporting Corporation.
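For listeners who want to replicate that kind of contract monitoring, here is a minimal sketch of the idea: poll a public procurement search API for recent awards tied to an agency and diff the results against what you have already seen. It is written against USAspending.gov's search endpoint, but treat the exact URL, payload fields, and agency name below as assumptions to verify against the current API documentation, not a definitive recipe.

```python
# Minimal sketch: check for recent ICE contract awards via the USAspending.gov
# search API and print any award IDs we have not seen before.
# The endpoint, payload keys, and agency name are assumptions to confirm
# against the current USAspending API docs.
import json

import requests

SEARCH_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"

payload = {
    "filters": {
        "agencies": [
            {"type": "awarding", "tier": "subtier",
             "name": "U.S. Immigration and Customs Enforcement"}
        ],
        "time_period": [{"start_date": "2025-04-01", "end_date": "2025-05-31"}],
        "award_type_codes": ["A", "B", "C", "D"],  # contract-type awards
    },
    "fields": ["Award ID", "Recipient Name", "Start Date", "Award Amount"],
    "limit": 50,
    "page": 1,
}

resp = requests.post(SEARCH_URL, json=payload, timeout=30)
resp.raise_for_status()

seen_award_ids = set()  # in practice, load this from disk between runs
for award in resp.json().get("results", []):
    award_id = award.get("Award ID")
    if award_id not in seen_award_ids:
        seen_award_ids.add(award_id)
        print(json.dumps(award, indent=2))  # a new contract worth a closer look
```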
Joseph Cox:And I'm like, well, that sounds interesting. What the hell is that? And I filed a FOIA, and then I looked for other agencies that had deals with Airlines Reporting Corporation, and we'll get into what those were as well. But the main one that this story is about is Customs and Border Protection, CBP. I filed those FOIAs.
Joseph Cox:Then ICE actually released some more documents about this purchase of data, and The Lever actually reported that about a week or so after. And now what we have are these documents I got from CBP, and they lay out in much more detail the sort of data DHS is buying, the use cases for it, and, I'm sure we'll get into the most important thing, which you highlighted when you were editing the article: the fact that the airlines were basically trying to cover it up. Right? I feel like that stood out to you.
Jason Koebler:It really did stand out to me, because I don't know exactly the language that they used. I should have the story up, which I don't. My computer exploded while we were recording this podcast, so I lost my tabs, but they are back up now. But you have it up, so why don't you read it?
Joseph Cox:Yeah. So one part of the documents, and this is the contract between Airlines Reporting Corporation, ARC, and CBP, tells the agency to, quote, not publicly identify vendor or its employees individually or collectively as the source of the reports unless the customer is compelled to do so by a valid court order or subpoena and gives ARC immediate notice of same. In other words, you're not allowed to reveal where this airline data came from that you were using to generate internal reports or whatever else you're gonna make with the data.
Jason Koebler:Right. So this is a part of a travel intelligence program through, as you said, ARC. And as I understand it, this is like a company slash entity that was spun up by most major airlines in The United States for the purposes of selling customer data, more or less. Like, it is a data broker that is owned by American airlines. And by American airlines, I mean United States based airlines, including American Airlines.
Joseph Cox:And American Airlines. Yeah. Literally, yes. Yeah. So they make this data broker, and the way it works is that when you book a flight with a travel agent, like maybe that's online or maybe you go to a physical one, there has to be some sort of conduit between the travel agent and you and the airline.
Joseph Cox:And ARC sits in the middle of that transaction, and it's able to get this data. I mean, it provides a legitimate service there. It routes this information. It allows these bookings to take place. But on the side, what ARC does is it develops products based on that data.
Joseph Cox:So maybe they can see, oh, wow, the number of flights went up after COVID or something. That's just a hypothetical, but there's all of these sorts of trends and that sort of thing. But what they're also doing, you know, according to these documents we got and the ones published by ICE, is that ARC has a side hustle, basically, of selling this data to the government as well. And you mentioned some of the airlines.
Joseph Cox:I mean, there are the ones on ARC's board. I'll just double check them. Yes. They have representatives from Delta, Southwest, United, American Airlines, Alaska Airlines, and JetBlue, and then you have Lufthansa and Air France from Europe as well, and Canada's Air Canada. And there's a little bit of a discrepancy in the documents we got.
Joseph Cox:One says eight major US airlines own it, and then another one says nine. I think one probably joined at some point. We just frame it as at least eight airlines owning this data broker.
Jason Koebler:Right. And when you say travel agents, I mean, obviously, you were like, if you go to a travel agent, that would be a conduit. But you're talking about sites like Expedia, for example. Yeah. You know, like, really widely used websites.
Jason Koebler:I just wanna stress that this is not affecting only people who are going to a specific travel agent. It's third party booking services, of which there are many.
Joseph Cox:Yeah. That probably would have actually been a better way to phrase it: third party booking services. Yeah. It's not just obscure brick and mortar travel agents in your neighborhood or something like that. It's massively popular sites like Expedia that this data is essentially being harvested from.
Joseph Cox:Interestingly, in, I think it was the ICE documents, or maybe it was the Customs and Border Protection ones we got, DHS says ARC does not contain data if somebody books a flight directly with an airline, which is kind of interesting because you go, well, it's with the airlines. Won't they just sell it? No. Because ARC is not in the middle of that transaction. You're going straight to the airline.
Joseph Cox:You're booking with JetBlue or United or whatever. That doesn't end up in ARC's really big dataset of billions and billions of records. And I guess I should say that's passenger names, the credit card used, which I found really interesting. You can search by credit card. And then, of course, the flight itineraries, so you know where someone has been, where they maybe are gonna fly that day, or, something I find really interesting, you kind of know what they're gonna do in the future, which isn't really the case with a lot of data we cover, like location or whatever.
Joseph Cox:It is predicting and showing where someone is going to be at a later date, which is pretty novel.
Jason Koebler:Yeah. Yeah. I guess I'm curious, like, do we know what law enforcement does with this type of data? Because some of the responses I saw to this article, and I don't think they're good responses, were like, well, you have to show your ID when you get to an airport, and therefore, like, DHS will know that you're going to be there. I also assume there's, like, some sort of roster or something.
Jason Koebler:I actually don't understand exactly how this works, and I'd be curious to either read more about it if someone's already reported this or to do more reporting on it. Like, how does DHS know which people are going to be at an airport on any given day? And I would imagine that this is one of the ways. Right? As you said, they can then predict who is going to be where and at what times, because they have this sort of, like, future data.
Joseph Cox:Yeah. And, I mean, I think an important thing to remember is that DHS is not a monolith. Right? Like, TSA is going to know who is in an airport at that time because you're showing your ID to the TSA agent, and you're literally right in front of them. Like, you're yourself, basically.
Joseph Cox:Right? And they're gonna have access to other data along the way there. Other parts of DHS can get this data, and potentially in other ways as well. But, again, it's not like a one size fits all solution. The reason that Customs and Border Protection says it's buying this data is that it's for the Office of Professional Responsibility, OPR, which is basically like its internal watchdog.
Joseph Cox:It's internal affairs: if somebody in Customs and Border Protection is doing something corrupt or criminal or whatever, this internal affairs unit can, and is supposed to, investigate them. And when I got a statement finally from Customs and Border Protection about this, they said it's just used for that. It is just used for that division or unit to investigate those sorts of people. And that's all well and good. Some people may even say that that's a legitimate and a good use case, but we couldn't have that conversation until now, because we published it and because we found out.
Joseph Cox:And the airlines were trying to cover it up in the first place, you know. Like, it's really about the sale rather than the use.
Jason Koebler:Well, there's that, which I think we should talk a little bit more about. But then DHS is not the only agency that has bought this sort of data. Like, ARC has deals with other agencies as well. Right?
Joseph Cox:Yeah. So again, when I first saw the ICE deal, I did a bunch of FOIAs, and we're still waiting for the vast majority. But beyond Customs and Border Protection, there's the Secret Service, the SEC, DEA, the Air Force, the US Marshals Service, TSA, funnily enough, and ATF, the Bureau of Alcohol, Tobacco, Firearms and Explosives. Now, I don't know. Maybe the SEC is using it for a very different reason to DEA.
Joseph Cox:You would imagine so because those agencies have completely different mandates, but we don't know specifically what they're using it for yet. And that's why we have all of these freedom of information requests out. And again, maybe it comes back and they're using it for fairly innocuous purposes. Maybe some are using it for much more interesting ones, but the sale is happening in the first place. And, you know, because the data is being sold, there isn't really a legal mechanism there.
Joseph Cox:They're just buying access to it.
Jason Koebler:Right. And, I mean, what really stood out to me again is that it's happening through this third party. It's happening through this umbrella corporation, you know, ARC, again, Airlines Reporting Corporation, which no one has ever heard of because they have an extremely low profile. And then, again, no one has heard of it because in its contract, it says, don't say where the data came from. And that's one of my favorite things to FOIA, and it's a really great thing to FOIA if people are listening to this and are interested in it: a lot of times when companies sign contracts with the government, the company will try to put a nondisclosure agreement into the contract, but that nondisclosure agreement itself is subject to FOIA just because of the way that, you know, FOIA works, and that is a public record.
Jason Koebler:It's, you know, it's taxpayer money that's being used to purchase this, and therefore, it should be available. And so this wasn't a specific nondisclosure agreement, but it was a, you know, a section of the contract that said, hey, don't say that we signed this contract. Don't say that the airlines were the source of the data. And it's an example of these companies, these airlines, sort of, like, double dipping.
Jason Koebler:Like, they're getting into, it's just them finding other ways to monetize other than just, like, selling you access to a flight. They're figuring out, like, okay, well, now we have this huge information database about who is flying, where they're flying, what credit cards they're using, that sort of thing. How can we further monetize this? And I think that is a conversation worth having.
Joseph Cox:Yeah. And I think that's why so many people were pissed off at this. You're already paying for a flight where you're gonna be crammed into some economy seat with no leg room. You're gonna have to pay extra for a bag. You have to pay for Wi Fi or something.
Joseph Cox:And then on top of all of that, we're also gonna sell your flight data to the government, and I don't think people are particularly happy about that. You mentioned the nondisclosure agreements, and it reminds me of when we covered a lot of location data being sold to the government. That is, ordinary apps installed on your phone sending location data off to a company, and then they sell it directly, or it gets sold to somebody else who then sells it to the US government, including Customs and Border Protection, funnily enough. You go through the sort of contracts related to that, and there was one for a tool called Locate X made by Babel Street, I think. And there was sort of an addendum in there where it said you cannot use this information in court, and, like, you can't reveal this information.
Joseph Cox:It's supposed to just be used for, like, leads and tips and intelligence. And it reminded me of that, basically, where you have these government agencies buying data, and then there may be no transparency or accountability about where that data came from or how it's being used, by design. And I guess that also leads to, and I feel it's obvious, but almost to stress it: this isn't being done with a warrant. I don't think you necessarily need a warrant to get flight data ordinarily, but this isn't just talking about one or two flights.
Joseph Cox:It's talking about Customs and Border Protection, and potentially these other agencies, buying bulk access to billions of flight records, which they can then search through basically at their own whim. You know, I didn't see anything in the contract that says you can only use this for national security, you can only use this for combating terrorism, or something like that. I didn't see any disclaimers like that in the contract. So at least theoretically, until we get more information, it's kind of up to the customer to do what they want with this information.
Joseph Cox:And we see that when law enforcement agencies buy data, because that's exactly why they're buying it. They wanna be able to do what they want without the legal processes in place. Right?
Jason Koebler:This came up in the context of some of our Flock reporting, which I'll just, like, quickly run through. But a few commenters on our website were saying, well, why don't the cops need to get a warrant to search, you know, for license plate data or whatever? And the argument that one would make is, like, you don't have an expectation of privacy when you're in public. There is nothing stopping a cop from standing on a corner and writing down the license plate of everyone that drives by. But what our laws were, like, not really written for was the automation of these sorts of things and the privatization of it, and also the fact that it's done at scale and in, like, a historical way.
Jason Koebler:And so there is a really interesting lawsuit in Virginia about Flock, about whether, you know, the automation of this type of technology does change the calculus as to whether cops need a warrant or not, and we're gonna be following that. But, basically, it's like, you can stand on the corner and look at a license plate, but can you stand on every corner of every street at the same time with an automated camera, take a picture, log that into a database, you know, make a historic record of where a specific car has gone, and do that all over the entire country all at once? And that's a little bit like what we're talking about here, where as private companies get more and more into surveillance, they are deploying technologies, and they're doing things that are allowing for, like, really big, large scale mass surveillance. And then because the cops are buying access to these databases, the cops feel like they don't need a warrant, because the police themselves are not the ones who are doing the surveilling. They are, like, buying access to a commercial product, and then the commercial company is the one that's actually doing the surveillance.
Jason Koebler:And I think that's a little bit like what's happening here and like what we've seen over and over again with, you know, social media monitoring companies, with, you know, data brokers in general. And I do think it's a big flaw in our privacy laws and something that, you know, we need to talk more about.
Joseph Cox:Yeah. Absolutely.
Jason Koebler:I guess last thing on this is, you know, future reporting, future FOIAs on this. Like, what are you looking into next here?
Joseph Cox:Yeah. It's really just waiting to get back those contracts from those other government agencies. Also looking into whether local police have access. One part, when I originally wrote this story, it was focused on the fact that the contract says Customs and Border Protection is using this data in part to support state and local police, which is obviously very interesting. We rewrote it when you edited it to bring, basically, the cover up higher up into the story, but I find that very, very interesting.
Joseph Cox:Do local police have access to this? I mean, I think that would be crazy, but I've I've seen some pretty wild things over the last few years. So there's that. There may be more emails about it and that sort of thing. And, yeah, just who has access to this this data on a on a wider scale, really.
Joseph Cox:Alright. Should we leave that there?
Jason Koebler:Yeah. Let's leave that there. When we come back
Joseph Cox:I beat you.
Jason Koebler:After the break, we will talk about AI bots that are scraping museum websites, open libraries, archives, etcetera. It's a story by Emmanuel. We'll be right back after this.
Joseph Cox:Alright. And we are back. As Jason said, this is one written by Emmanuel and the headline is AI scraping bots are breaking open libraries, archives, and museums. Emmanuel, this is based on a survey just to lay the groundwork. Who made this survey, and what did it find at a high level?
Joseph Cox:And then we'll get into some of the really interesting specifics.
Emanuel Maiberg:So this was written by Michael Weinberg, who works at NYU at something called the GLAM-E Lab, and that is something that NYU and the University of Exeter work on together, and it is basically an organization that helps small libraries, galleries, archives, and museums take their collections, digitize them in some way, and make them available for everyone online for free.
Joseph Cox:So we don't know what organizations were surveyed exactly. Right? It's like an anonymous survey. Is that right?
Emanuel Maiberg:Yeah. So we have heard anecdotally, and it's something that we've reported on before, that AI scrapers, which are these bots that kind of troll across the Internet, look for valuable training data, and then hoover it up so they can train AI models, are flooding all these open resources with too much traffic, more than they can handle, and taking them offline in some cases. And you hear about that happening at this library or that museum or this collection, but this is the first attempt by someone to, like, quantify the problem and see how widespread it is. And the bottom line is that it is very widespread, but there are some limitations that Weinberg acknowledges in the study. First of all, he invited as many organizations as possible to participate.
Emanuel Maiberg:Only 43 of them participated, and it's possible that they are self selecting in some fashion. But
Joseph Cox:What do you mean by that?
Emanuel Maiberg:By self selecting? It's possible that some museum or some library saw the request and was, you know, befuddled by it, and they were like, what are you talking about? We don't have this problem. And didn't respond.
Emanuel Maiberg:And, obviously, if the library did experience something, they did, you know, they chose to participate. Right. So those 43 respondents, which we could talk, you know, in some more detail about the data, but they're anonymous, a, so they can speak more freely about what they're seeing and share some more private data and analytics about, like, how much traffic they're getting and what is knocking them offline. And, I would say most importantly, and I think this would probably be familiar to you from security reporting, they don't want to be too specific about who they are and what they're doing to stop the AI scrapers, so the scrapers don't learn about the countermeasures and then can better circumvent them. So, yeah, that's kind of who's involved in this and why they're anonymous.
Joseph Cox:Yeah. Do we have any idea what sort of scrapers or specific scrapers we're talking about? Like, are we talking about ChatGPT or anything like that, or do the libraries not know, and that's part of the problem? You know what I mean?
Emanuel Maiberg:There's also an attribution problem. Right? It's hard to say for sure. The report doesn't name the specific scrapers, but we do know from experience that Anthropic, sorry, not Anthropic, Perplexity, for example, has in the past been caught ignoring robots.txt, which is this file that site owners can put on their website to tell bots not to scrape it. And in the past, this was sort of an accepted norm that was respected, but increasingly, as this training data is becoming more valuable, it's ignored, and, like, Perplexity is one that has repeatedly ignored robots.txt.
Emanuel Maiberg:Others self identify. Whether they ignore the robots.txt file or not, they self identify what the bot is. And other times, the organizations can make a guess based on the IP ranges that are hitting them. They're like, oh, these IP ranges are clearly coming from Alibaba. So it's safe to assume that Alibaba is scraping this website for AI training data based on the behavior, but it's hard to say for sure. It's possible that somebody is using Alibaba infrastructure, but it's actually a different company.
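To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard library parser. The rules and crawler name below are illustrative, not taken from any real site; the point, as Emanuel says, is that honoring the file is entirely voluntary, so a scraper that ignores it simply never performs this check.

```python
# Minimal sketch of how robots.txt signals work, using Python's built-in parser.
# "GPTBot" is used as an example of a self-identified crawler name; the rules
# here are made up for illustration.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching; a scraper that ignores
# robots.txt never makes this call at all.
print(rp.can_fetch("GPTBot", "https://example-archive.org/collection/item/1"))        # False
print(rp.can_fetch("SomeOtherBot", "https://example-archive.org/collection/item/1"))  # True
```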
Joseph Cox:Yeah. So what's some of the impact? You say that, like, some get knocked offline or maybe take themselves offline? Somebody here is quoted comparing it to a DDoS attack. Like, what's some of the concrete impact that these scrapers are having on, like, these open databases and archives, and kind of ruining it for everybody?
Emanuel Maiberg:Yeah. So one interesting thing about the report is that in the vast majority of cases, the only reason that an organization knows this is even happening is because their services are degraded to some noticeable degree. The site slows down. It's not accessible at all. This was just a coincidence, but last week, I think on Friday, University of North Carolina Chapel Hill, which is a big university, a research university, has a very robust kind of online library full of books and papers, and it's something that students use, teachers use, just the public can use.
Emanuel Maiberg:And they found out that this was happening to them because nobody could access it, which is very disruptive to the organization and the students and the teachers and all of that. And they have a big IT department, and they solved it by deploying some new kind of firewall that, again, they don't want to talk about in too much detail, so people don't learn how to circumvent it. But that's sort of a typical example of how people know that it's a problem. The impact is on these resources that exist for the public. And the whole goal of these organizations, and of the GLAM-E Lab, where Weinberg works, is to make this cultural heritage, as he calls it, available to as many people as possible. That's the mission of the organization.
Emanuel Maiberg:It's like, oh, there's a little museum in France that has, like, a bunch of manuscripts that you can go see if you visit it, but wouldn't it be great if they just digitized everything and made it available online? And it's like, yes, that would be great, but then that opens them up to these scrapers, and all that data is very valuable now. So the impact is that the public no longer has access because it's being hoovered up so aggressively by all these different AI companies.
Joseph Cox:Yeah. Like, people want this data to be accessed by the public for the reasons you just laid out, but it seems that the trade off is: you make it publicly accessible, you get swarmed by all of these bots, which are gonna degrade the archive and, you know, potentially knock it offline or whatever. Is there basically nothing to be done about that trade off? Like, is it sort of, I mean, this is a bad way to put it, but, the cost of doing business? Because it's not a business, but you see what I'm getting at?
Joseph Cox:Like, is there just nothing to be done or what?
Emanuel Maiberg:So there are things that people can do. The response to the story has been very interesting because I feel like I've heard from a bunch of other institutions, which I don't know if they're included in this survey because it's anonymous, but judging by their response, I think they were not. So the problem is, like, again, demonstrably widespread. And people have kind of been telling me interesting things about what they're doing and what their solutions are, and I hope to have a story in the next couple of weeks about some interesting solutions. I want Jason actually to talk about some solutions that he reported on.
Emanuel Maiberg:Cloudflare has a thing, and there are kind of these funny solutions to trick the scrapers. But I also want to talk about this tension. I'm going to make a tortured analogy, but it's something that I talked to Weinberg about. In 2023, this book came out called The Art Thief. Have any of you guys heard of this at all?
Emanuel Maiberg:Really great book, nonfiction. It's about one of the most prolific art thieves in in history. He worked in the early two thousands in Germany and France, and he stole more than 200 pieces. And the way that he did this is he didn't steal, like, gigantic famous pieces. He wasn't going after the Mona Lisa.
Emanuel Maiberg:He just went to these small regional museums in the countryside and, like, stole tons of small pieces. Altogether, they were worth, like, I don't know, $2,000,000,000 or something like that, and it's a really fascinating story about why he did it, how he did it, what happened when he got caught, and all of that. But one of the lessons of the story is that the author talked to, like, the owners of these regional museums, and they explained that when something gets stolen from one of these museums, the damage isn't only that the piece is gone. It's that it breaks the social contract of how these museums operate. Right?
Emanuel Maiberg:It's like, these museums might have a security guard, they might have security cameras, but it's not like Ocean's Eleven. Right? There aren't, like, lasers and heat sensors keeping the pieces safe. The social contract is: this art, this history, these texts are part of our collective cultural heritage, and we're putting in the work to make it available to the public because it belongs to everybody. And the public, in return, kind of agrees to be respectful and not fuck with it.
Emanuel Maiberg:And when somebody steals something, they break the social contract, and that forces the museums to lock everything down and make it less accessible. And this is kind of what is happening online as well. So one thing that people can do, right, the people who manage these collections: they can have people log in. They can have captchas. They can have all kinds of, like, friction that would make it hard, if not impossible, for an AI scraper to get all the information, but would require a little bit more from human users as well.
Emanuel Maiberg:And the maintainers of these collections are very reluctant to do that, because the entire point of doing this, like the entire point of digitization and the GLAM-E Lab and all this stuff, is to make it as accessible as possible. Right? It's a very benevolent mission that these people have. So there's that issue. Like, they're reluctant to do it because they wanna make it so available.
Emanuel Maiberg:And then the other thing is that, and this is something that Weinberg really emphasized, and he focuses on small and medium sized organizations, but he says even in, like, a big organization, once they digitize something, there's maybe one person who is responsible for keeping that stuff online and functional. It might be someone's job on top of a totally different job that they already do. It might be a volunteer. And any change or update that you force them to introduce is very, very difficult to implement, if not impossible. Right?
Emanuel Maiberg:It's like, if you go to one of these organizations, you go to one of these small museums, and you're like, hey, we need to implement a captcha, we need to implement a login, we need to implement something like that, they're like, well, we're just gonna take this offline, because we can't do this at all. Right?
Emanuel Maiberg:It's just, like, impossible for us to put in the work, on top of what we already did to digitize it, so we're just not gonna have it at all. Jason, do you remember the Cloudflare thing?
Jason Koebler:Yeah. Yeah. So, I mean, this is, it's very grim for the reasons that you just said, because a lot of these organizations are probably, like, barely financially solvent, depending on who they are and what they are, and it's expensive to keep this sort of thing up. And then that's to say nothing about the status of, like, the actual things that are being scraped. I assume some of them are in the public domain by now if they're, like, really old.
Jason Koebler:A lot of them probably are not. But we found that, you know, AI companies don't really care. I did write a long time ago about different types of mazes that have been deployed. There was one that was like a DIY open source one by a specific programmer. We probably talked about it maybe six months ago, or maybe a little longer than that.
Jason Koebler:He called it an AI tar pit, and it was just like an infinitely generating website that a human being would click off of pretty much immediately, but that an AI scraper would scrape over and over and over again, kind of indefinitely. For something like a museum to spin this up, it's like, part of the point of an AI tar pit is to waste a scraper's time by creating an infinite number of pages, which doesn't really do anything to help the museum, because that uses a lot of their own bandwidth, because they are allowing the scraper to hit it. They're just hitting, like, nonsense over and over and over again. But Cloudflare, the gigantic Internet infrastructure company, released something very similar to this. It's a similar design, and that is something you can put in front of it now.
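Here is a minimal sketch of the tar pit idea Jason describes, not the specific tool he wrote about or Cloudflare's version: every generated page links only to more generated pages, so an indiscriminate scraper keeps crawling while a human clicks away almost immediately. Flask and the route names are illustrative assumptions.

```python
# Minimal tar pit sketch: every page under /maze/ links to freshly generated
# pages that do not exist until they are requested, so a crawler that follows
# links indefinitely just keeps going. Note Jason's caveat: serving these
# pages still costs the host its own bandwidth.
import random
import string

from flask import Flask

app = Flask(__name__)

def random_slug(length=8):
    return "".join(random.choices(string.ascii_lowercase, k=length))

@app.route("/maze/")
@app.route("/maze/<path:slug>")
def maze(slug=""):
    # Each response contains a handful of links to pages that will themselves
    # be generated on demand, forming an endless crawl surface.
    links = "".join(
        f'<li><a href="/maze/{random_slug()}">{random_slug()}</a></li>'
        for _ in range(5)
    )
    return f"<html><body><p>Index of records</p><ul>{links}</ul></body></html>"

if __name__ == "__main__":
    app.run()
```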
Jason Koebler:You can also, as you said, you know, put up a login wall; sort of depending on the scraper, they may or may not want to try to get past that. And, you know, that's something that we did to preserve the cultural works here at 404 Media: put them behind a login wall sometimes. And I think that that's helped. I'll probably talk about this more later, but I went to a journalism business conference two weeks ago for 404 Media to talk about this, and a lot of big news outlets were there talking about how they are trying to protect their own sites from AI scrapers. And I believe it was the Daily Mail that was there, and they gave a presentation about the fact that you sort of need to stop these scrapers very early on and also catch them in the act, more or less, so that you can then go to the company and say, like, we know that you are scraping this when you should not be.
Jason Koebler:And for the Daily Mail, it was for the purposes of trying to strike a deal with OpenAI or with these different companies. Like, say, hey, I know you're trying to steal this stuff. We have stopped you with our login wall or with our, you know, robots.txt or the various things that they're doing.
Jason Koebler:Like, let's strike a deal here. But one of the points that their business person was making was, like, once this stuff is scraped, you kinda, like, lose a lot of your leverage unless you're willing to sue them. And that can be really expensive, and we don't even know if it's gonna be successful. There's tons of lawsuits out there right now that are still ongoing and that, you know, we have been following and will continue to follow.
Jason Koebler:But for something like a museum, it's interesting because I bet their collections don't change all that often. They're getting hit by these scrapers. The value has already been, like, extracted, probably, in many cases. But the way that these scrapers work, they're probably coming back over and over and over again and hitting them over and over and over again, even though they've already gotten what they want, which is, like, really frustrating.
Emanuel Maiberg:Just to illustrate that: the UNC thing that happened last week, the IT people were explaining that, like, the information is easy to get, but they have a search engine, and what the bots were doing was just, like, spamming it with different search terms. So it's, like, an incredibly inefficient way of extracting the data. You know? It's like, if it was just an agreed upon, hey, can we please have your data? We'll pay this much for it. The organization could maybe benefit from it. And then also, you won't have to, like, DDoS the library in order to get it.
Jason Koebler:I mean, or if it's like, okay, scrape us, like, once a year or once every six months. Don't scrape us, like, constantly. I mean, ideally, scrape us not at all, but it's like, please don't scrape us daily. And then the other thing is just, like, there's constantly new bots that are doing this, associated with new companies.
Jason Koebler:Companies that already exist are creating new bots to scrape for different purposes. And so it's not like you can just protect against, you know, OpenAI's scraper. You need to protect against all the different types of scrapers that different companies might be running. Remember, like, what the names are, keep up to date with what they're called, so on and so forth, and figure out how to block all that traffic. And it's, like, extremely not trivial.
Jason Koebler:There's a couple different products that have been released to try to automate this, but it is still, like, it's a permission structure that's, like, really fucked up, because it's opt out, not opt in, and you don't even know, like, what you're opting out of, because there's constantly new ones that you have to think of, and there's, like, new strategies that the AI companies are using to circumvent robots.txt, because they don't care for the most part.
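As a small illustration of why that opt-out model is so brittle, here is a sketch of user-agent blocking: it only catches crawlers you already know about, so the list has to be updated by hand every time a new bot appears, and a scraper that lies about its user agent sails through anyway. The bot names are commonly published crawler identifiers, but the list is illustrative and nowhere near complete.

```python
# Minimal sketch of user-agent blocking. The names are examples of published
# crawler identifiers; keeping this list current is exactly the never-ending
# chore described above, and it does nothing against bots that spoof their
# user agent.
BLOCKED_UA_SUBSTRINGS = [
    "GPTBot",         # OpenAI's crawler
    "CCBot",          # Common Crawl
    "PerplexityBot",  # Perplexity
    # ...every new crawler has to be added by hand as it appears
]

def is_blocked(user_agent: str) -> bool:
    ua = (user_agent or "").lower()
    return any(bot.lower() in ua for bot in BLOCKED_UA_SUBSTRINGS)

# A self-identified bot gets caught; a brand-new or disguised scraper does not.
print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))  # True
print(is_blocked("TotallyNewScraper/0.1"))                                             # False
```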
Joseph Cox:Yeah. And opting out is, as you say, not straightforward. You have to, like, fight that through technical or potentially legal means. Alright. That was fascinating.
Joseph Cox:I'm definitely interested to hear what else we find about that. If you are listening to the free version of the podcast, I'll now play us out. But if you are a paying 404 Media subscriber, we're gonna talk about the frankly casual surveillance relationship between ICE and local cops, and that's according to internal emails that Jason got. You can subscribe and gain access to that content at 404media.co. We'll be right back after this.
Joseph Cox:Alright. We are back in the subscribers only section. Jason, this is one you wrote. The headline is emails reveal the casual surveillance alliance between ICE and local police. Before we get into the specifics of each department and what they were saying, that sort of thing, just: what are these emails broadly, and how did you get them?
Joseph Cox:Where did they come from?
Jason Koebler:Yeah. They are FOIA'd emails from the Medford, Oregon Police Department. I believe Medford is, like, maybe the seventh largest city in Oregon. So it's not a small town, but it is definitely not a big city. It's, like, a pretty small city that seems to have, like, a really big interest in surveillance technology.
Jason Koebler:And it was FOIA'd by this group called Information for Public Use. They are an Oregon based anonymous group that does a lot of FOIA requests and then releases the documents. And so they sent me the documents. I verified them. I did extra reporting on them, etcetera, etcetera.
Jason Koebler:And what these specific documents are is, like, an email thread about the Southern Oregon Analyst Group, which is a pretty, like, informal group of cops.
Joseph Cox:It's a big group chat.
Jason Koebler:A big group chat, but it's on email. It's a big email thread. It's a reply all email thread, is what I would call it. And, basically, it was set up by someone at the Medford Police Department for professional development. And it was basically like, hello.
Jason Koebler:I do surveillance. You guys do surveillance for your respective agencies. Let's talk about all of the different tools that we have access to, how we use them, and also, if I have access to something that you don't have access to, like, maybe I can either teach you how to use it in some cases, or I can, like, run specific searches for you on a specific tool. And so there was that. And then, crucially, like, on this email chain is not just, like, 10 different local police departments, but also ICE's Homeland Security Investigations, someone who is based in Southern Oregon for ICE, so a federal ICE agent with HSI, and someone with the FBI was on it as well.
Jason Koebler:And in this case, it was like, we've done a lot of reporting about Flock in particular and how ICE has gotten access to Flock despite not having a contract for it. And it's done that through local police, by asking them to run lookups for them. And this is an email chain that sort of shows how that happens. It's like a microcosm of something that is definitely happening all over the country. I don't know if it's happening in these, like, analyst group email chains or anything like that, but it is definitely happening, where ICE is getting, like, pretty informal access to surveillance tools by asking local police.
Jason Koebler:And some of these emails show exactly, like, how an HSI agent asks for a lookup.
Joseph Cox:Yeah. It reminds me somewhat of, and I think I've mentioned this before, this big email list used by police who do forensic extraction of mobile phones. So bypassing passwords, breaking into phones they have physical access to, that sort of thing. A lot of legitimate use cases for that, but also some, you know, definitely room for abuses, and of course, it feeds the exploit market as well. But that is broadly a big email chain, basically, yeah, where people can exchange tips, that sort of thing.
Joseph Cox:This seems a lot more active in that, as you say, you might have HSI or whoever say, please do this lookup, or local police offering to the FBI, etcetera, hey, I can do this for you. And, I mean, kind of putting aside whether you need a warrant for an automatic license plate reader lookup or not, because we spoke about that earlier, kind of regardless of that: there's no process here, because they're just, like, sending an email being like, hey, mate. Could you do this for me?
Joseph Cox:Which was I think the craziest thing to me personally.
Jason Koebler:Yeah. And it was interesting because it goes both ways. So there is one example where, you know, the HSI agent says, hey, like, I was told to contact you, and here's a quote. It's the name of a detective.
Jason Koebler:I didn't include names in this because I don't think they're actually relevant. These are all, like, local police, and it could happen, it's happening, all over, so I didn't want to just, like, include their names in this case. But it says, detective redacted asked me to contact you and request an LPR check, which is a license plate reader check, on two vehicles. Please see details below. Case number, blah blah blah, vehicles, and then it's the license plates, and then it says date range 2021.
Jason Koebler:Thanks. And then it's signed by an HSI agent. And the local police officer is like, sure thing, and runs it and sends them the information. And then, crucially, I actually didn't include this in the article just because it was, like, a lot of detail, but they say, like, let me know if you wanna go back even further. Like, let me know if you want me to, like, do an additional search, if you want me to, like, change the timeline for it.
Jason Koebler:That actually is in the article. I did include it. Yeah. And then a little while later, like, I think it was like a year later, that same local police officer emailed that same HSI agent and said, hey, can you actually run a set of plates for me from the border crossing system?
Jason Koebler:And so what that is, is DHS has license plate readers at the border, so they have a record of every car that has crossed into and out of The United States at specific border checkpoints. And they said, can you, quote, run plates through the border crossing system? And then the ICE person responded and said, yes. I can do that. Let me know what you need, and I'll take a look.
Jason Koebler:And then they did, and there were actually no hits in this case. But this is not like a, hello, I have talked to a court. I've gotten a warrant for this. I'm, like, going through some sort of established data sharing protocol.
Jason Koebler:It's very much a "gonna send my buddy an email and ask them for a favor" situation.
Joseph Cox:Or even, like, some sort of login portal, because I know federal law enforcement do this sometimes, where you log in and there's data there that you can access, kinda like those fusion centers in a way. It's like, here it is in one place, and it's been approved. It's being stored securely. We invite people to do that. It's just, I scratch your back, you scratch mine. I mean, that's what it sounds like to me. What were some of the tools? Sorry. I feel like you're gonna say something.
Jason Koebler:Yeah. No. So those are the examples of, sort of, like, how license plate reader lookups happen; those were two specific license plate reader lookups in these documents. And we know from our reporting that this happens thousands of times all throughout the country. That's, like, been a lot of our reporting over the last few weeks.
Jason Koebler:But then, like, as part of this larger group of local, state, and federal law enforcement, they basically had, like, an intro thread where it's like, oh, let's all get to know each other. And so here's an example of, like, one of the emails. It was someone who had just started as a crime analyst for the Josephine County Sheriff's Office on the Josephine Marijuana Enforcement Team. Josephine County is in Oregon. The marijuana enforcement team looks for, like, illegal marijuana farms in Oregon.
Jason Koebler:And they're like, I used to work for the military doing analysis on ISIS activity. Then I went to PayPal to be an open source intel analyst, and then Chevron. And then I worked for Pinkerton, which is, notably, you know, like, a very famous private security company that's done a lot of, like, labor union spying and things like this. And then they say, I've been working for the cops for six months. Some of the tools I use are Flock; TLO, which is a credit report lookup situation that Joseph's written a lot about; LeadsOnline, which is like a data aggregation service; WSIN, which is the Western States Information Network, which is like a formal data sharing platform like the one you were just talking about, Joseph.
Jason Koebler:VIN decoding, which is like car ownership records, and then sock puppet social media accounts. So he's talking about
Joseph Cox:That was wild. That was really wild. Yeah.
Jason Koebler:So he's talking about how he's created, like, fake social media accounts to do social media monitoring, which is really wild. Then someone else chimes in, and they're just like, I have access to Flock, evidence.com, Carfax for cops, TLO. Others are like, I have access to all these camera systems, blah blah blah. My favorite, if you can have a favorite email of something like this, someone said: even though we don't have cameras in our city, I would love any opportunity to search for something through Flock. I have much to learn when sneaking around on social media and collecting accurate reports from what is inputted by our department.
Jason Koebler:So this was like a new analyst who was like, hey. If you, like, have access to Flock, I would like to learn how to use it, or I would like to search for it. Like, it's pretty wild. And, yeah, it was just like a bunch of cops. Well, some of them are civilians, which is something that the police department got mad at me about when I was talking to them.
Jason Koebler:So I was careful to not say that they're cops. They're civilian analysts for police departments. So they collect surveillance, and then they give that surveillance to cops. But they are, like, employed by the police department. What? I don't see any difference here, really.
Joseph Cox:No. I mean, I guess
Jason Koebler:They're not, they, like, can't arrest you on their own, I think, is what the difference is. They haven't been through the, like, police academy training; they don't have, like, a gun and a badge. They work in an office, and they do surveillance, and then they pass that surveillance to police.
Joseph Cox:Yeah. I mean, I don't know. Like, it depends how you phrase it in the copy. I have to go double check, but, yeah, an employee of the police department or something like that. But
Jason Koebler:There's no, yeah, I mean, there's no difference here, and I was careful not to call them cops, any individual person a cop. They're intelligence analysts, is what they call themselves. But, yeah, it's just, like, pretty mind blowing to see how many different types of surveillance tools even small police departments have access to. It's pretty shocking to see how casually they're talking about it, and then also how they're just kind of, like, offering it up to each other.
Jason Koebler:And I talked to someone at ACLU Oregon, and they're like, this is not supposed to happen. It's not supposed to happen in Oregon especially, because, like in Illinois, like in California, like in some other states that we've mentioned for some of our Flock reporting, there are very specific information sharing laws and ways that information should be shared between agencies, and then also for what purposes. So, you know, they shouldn't be able to do this for immigration, because Oregon is a sanctuary state. They shouldn't be able to do this for abortion, things like that. And we don't know if they're doing it for abortion or not in Oregon, but, basically, like, there are restrictions on this type of data sharing in Oregon, at least on paper.
Jason Koebler:But when you have stuff like this happening so informally, you know, on email chains, and then also they were doing, like, meetups where they're going to lunch together, things like that, it's, like, really hard to keep track of what's happening and whether all of the regulations and laws are being followed.
Joseph Cox:Yeah. Well, that answered my last question about why it matters, and I think you put it really well. I guess we'll keep an eye out for emails from other groups that may be doing this sort of thing as well. But how about we leave it there and I'll play us out. As a reminder, 404 Media is journalist founded and supported by subscribers.
Joseph Cox:If you do wish to subscribe to 404 Media and directly support our work, please go to 404media.co. You'll get unlimited access to our articles and an ad free version of this podcast. You'll also get to listen to the subscribers only section, where we talk about a bonus story each week. This podcast is made in partnership with Kaleidoscope. Another way to support us is by leaving a five star rating and review for the podcast.
Joseph Cox:That stuff really helps us out. Here is one of those, from Spell Checker: The 404 team does a fantastic job at everything they do. Independent journalism for the win. I feel like I may have read that one before.
Joseph Cox:I'm sorry if I did. This has been 404 Media. We'll see you again next week.