By: Tomas Apodaca
Hello, readers!
I’m Tomas Apodaca, a journalism engineer at The Markup and CalMatters. It’s my job to write software and analyze data and algorithms in service of our investigative journalism.
In the year since I joined The Markup, I’ve been thinking about how artificial intelligence can be used in the newsroom—or really, whether generative AI in particular can be ethically used in the newsroom at all. The journalism industry’s relationship with AI is complicated. News organizations like the Associated Press have licensed their content to OpenAI, while others, like the New York Times, are suing it. Just last week, California lawmakers scrapped a proposal that would’ve forced Google to pay news organizations for using their journalism, in favor of a deal that includes some funding for journalism and, specifically, for a “National AI Accelerator.”
I’m an AI skeptic, in part because I learned about generative AI in 2021 from a paper that questioned whether the benefits of ever-larger language models were worth their costs. Since then, I’ve learned so much more about the ethical and environmental toll that AI takes:
- Models produce text that is untethered from reality, a phenomenon known as “hallucinating”
- Also, nobody knows how to stop the “hallucinations”
- In pursuit of training data, the most highly valued companies in the world are racing to slurp up the entire searchable internet (most of which is copyrighted) and then hiding their theft behind a series of exclusive content deals
- That slurry of content is full of biases and bigotry from every murky corner of the web
- Generative AI models trained on that diet spew out harmful medical advice and corny stereotypes; they incorrectly label students’ papers as AI-generated and rank résumés lower based on the applicant’s name
- Speaking of copyrighted content, people making a living in the creative industries are losing jobs to programs trained on their own work
- Original creative work is being ripped off or drowned out by AI “slop”
- Workers are paid poorly for everything from labeling training data to pretending to be AI themselves
- Some of the biggest names in AI have no compunctions about partnering with authoritarian governments or espousing philosophies that value the lives of people in the far future over those of people who are struggling to live now
- A single query to a large language model can consume 3 to 10 times as much energy as a simple web search
- Tech giants are fast-tracking data centers to host the models, at the cost of millions of gallons of fresh water a week
- This accelerated consumption is having an outsized impact on communities that are already at an environmental disadvantage
- It’s hard to enjoy using a tool when you’re aware of its surveillance, policing, and military applications
… and that’s just the tip of a very dirty iceberg.
Unfortunately for me (and for the world), I’m convinced that generative AI can also be extremely useful.
This year I have been working out how I can use AI in a way that gets around some of these issues, with limited success. The credo I’ve settled on is “Run locally and verify.”
There are specialized models small enough to run on personal devices, and tools like LM Studio, Ollama, and llm make them easy to download, run, and tinker with. Now I can run experiments, extract text from PDFs, analyze large datasets, ask coding questions, and transcribe audio on my 3-year-old laptop’s processors. The local models aren’t as fast or capable as the hosted options, but they get the job done.
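To make “run locally” concrete, here’s a minimal sketch in Python of what a query to a local model can look like. It assumes Ollama is installed and serving on its default port, with a small model already pulled; the model name, the prompt, and the `ask_local_model` helper are illustrative placeholders, not my actual pipeline.

```python
# Minimal sketch: query a model running entirely on this machine via
# Ollama's local REST API. Assumes Ollama is serving on its default
# port (11434) and that a small model has been pulled beforehand,
# e.g. `ollama pull llama3.2`. Model name and prompt are placeholders.
import requests

def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    """Send one prompt to the local model and return its reply as text."""
    resp = requests.post(
        "http://localhost:11434/api/generate",  # nothing leaves the laptop
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # small models on laptop CPUs can take a while
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Per the credo, the answer still gets checked by a human.
    print(ask_local_model("In one sentence, what is optical character recognition?"))
```

Because the request goes to localhost, neither the prompt nor the response ever touches a hosted service.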
Critically, and in accordance with The Markup’s AI ethics policy, I check the results for accuracy and will always disclose when I’ve used AI in the production of a story.
It’s an approach that doesn’t address many of my concerns, but at least with this setup, I don’t have to worry about dumping water on the ground every time I hit “Enter.”
I wish I could dismiss AI the way I dismiss cryptocurrency or the metaverse: hyped-up technologies with limited practical applications (unless you’re extorting money or fancy wearing bulky headgear to talk to an avatar of your boss).
AI is overhyped, but even if the industry bubble pops tomorrow, the technology’s genuine utility means it’s not likely to disappear from our lives. That’s why, despite my misgivings, I needed to figure out how to use these tools in a way that minimized their harms, at least so that I could report on them responsibly.
Here are a few things that convinced me that AI (generative and beyond) has real-world utility:
- AI’s spectacular public failures reveal newsworthy things about the values of the people who build and use it
- A software engineer working with the right models can solve coding problems more quickly and take on more challenging tasks in unfamiliar languages
- AI models are improving automated transcription of human speech and translation between languages (see the sketch after this list)
- People have been using machine learning tools to analyze data for decades; modern large language models are easier to use for the same type of work, and are more flexible about their inputs (for example, reporters can use the Washington Post’s new Haystacker tool to search and analyze data across text, images, and video)
- Non-generative machine learning tools like computer vision are getting better; the New York Times used computer vision to reveal that Israel dropped hundreds of 2,000-pound bombs in civilian areas of Gaza
- Machine learning has proven useful in the life sciences, including for analyzing field recordings to support rainforest preservation and forecasting bird migrations
- Also, it can be terribly fun to report on the embarrassing things chatbots say
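Since I mentioned transcription above: local speech-to-text is simple enough to sketch, too. This is a rough illustration rather than my exact workflow; it assumes the open-source openai-whisper package and ffmpeg are installed, and the checkpoint name and filename are placeholders.

```python
# Rough sketch of local transcription with the open-source Whisper package
# (pip install openai-whisper; requires ffmpeg). No audio leaves the machine.
import whisper

model = whisper.load_model("base")  # "base" is small enough for a laptop CPU

# "interview.mp3" is a placeholder path to a local recording.
result = model.transcribe("interview.mp3")
print(result["text"])  # still needs a human pass against the recording
```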
I know that this technology is disruptive. I’m not happy with how it’s hurting workers and guzzling key environmental resources, and I’m trying my best to mitigate the downsides. But if I’m being honest with myself, it doesn’t feel like I’m doing enough. As an industry, we need to put ethical considerations first whenever we think about using generative AI.
How are you thinking about these issues? How is AI affecting your work? What are you doing in response? If you’re struggling with how to use AI or another technology ethically and have found effective workarounds, I’d love to hear about them. Tell me about it at tomas@themarkup.org.
Thanks for reading,
Tomas Apodaca
Journalism Engineer
The Markup / CalMatters
This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.