
What’s in OpenAI’s Custom GPT Store for Journalists?

  • Writer: Lily
  • May 29, 2024
  • 7 min read

Updated: Jul 18, 2024



Professionals in the news industry are beginning to wonder how generative AI will change the way they do their jobs. Some fear their work may disappear because of AI, while others believe it won’t help them at all. As a student journalist at Northwestern University who is also double majoring in computer science, I don’t fear AI completely replacing me (for now), but I do believe that leveraging the right tools can help me work more efficiently. It always seems like I don’t have enough time to write and publish all the story ideas I have. So I wondered: how could generative AI speed up my work?


OpenAI recently launched the GPT store, which contains specialized versions of ChatGPT built by both OpenAI and other users to help with specific tasks. Some journalists have started developing these custom GPTs to support specific tasks they perform. In hopes of finding more of these kinds of tools to help me research, write, report, and illustrate my work more efficiently, I systematically explored what’s on offer there to support journalists. This post walks through that process: I first describe how I collected a broad set of custom GPTs that may be relevant to journalism, and then I explain how I rated and ranked them for journalistic tasks.

Collecting GPTs

Since the GPT store doesn’t publish a complete list of all GPTs available to the public, I decided to collect them through the platform’s search functionality. To do this programmatically, I executed search requests against the GPT store using hundreds of keywords relating to a wide range of journalism tasks and activities.

To create a starting point for these keywords, I prompted ChatGPT (see footnote [1]). For the prompt, I plugged in the 30 tasks and 19 work activities found on O*NET under the “News Analysts, Reporters, and Journalists” occupation page. O*NET, developed for the U.S. Department of Labor, provides a comprehensive and detailed database of occupational information, offering reliable and current data on job roles and responsibilities. This prompt produced 40 keywords, but to find an even more comprehensive list of GPTs, I wanted to further expand my base of search terms.


To do this, I again used ChatGPT to perform a keyword expansion on each of my first 40 keywords (see footnote [2]).

After this keyword expansion step, I had 586 journalism-related keywords and keyphrases. I did some quick trimming to eliminate terms that were obviously not journalistic and wouldn’t produce the types of GPTs that I sought. After this filtering, I had 552 terms left.
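Both prompting steps follow the same pattern. As a rough illustration, here is a minimal sketch of how that kind of keyword prompt could be issued with the OpenAI Python client; the exact prompt wording is in footnotes [1] and [2], so the instruction text and the onet_items entries below are only illustrative stand-ins.

```python
# A minimal sketch of the keyword-generation prompt issued through the OpenAI
# Python client. The real prompts are in footnotes [1] and [2]; the wording
# and the onet_items entries here are illustrative stand-ins.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

onet_items = [
    # Illustrative entries; the full list of 30 tasks and 19 work activities
    # comes from the O*NET "News Analysts, Reporters, and Journalists" page.
    "Write commentaries, columns, or scripts.",
    "Gather information and develop story ideas from leads and tips.",
]

prompt = (
    "Here are tasks and work activities performed by journalists. For each, "
    "suggest short search keywords someone might use to find an AI tool that "
    "helps with it:\n\n" + "\n".join(f"- {item}" for item in onet_items)
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# One keyword or keyphrase per line of the reply, stripped of list markers.
keywords = [line.lstrip("-• ").strip()
            for line in response.choices[0].message.content.splitlines()
            if line.strip()]
```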

To collect relevant GPTs, I ran a search request using cURL (a command line utility) with each keyword (for a template of the request see footnote [3]). Each request returned a JSON object that contained information about the first 10 GPTs matching the keyword (it seems to be a limitation of the search functionality that it can only return up to 10 items per search). From these requests, I was able to collect the name, description and unique ID of the GPT, as well as example conversation starters (these are meant to help users understand how they could use the GPT), and the number of conversations (an indicator of GPT use). After performing these requests for each keyword and filtering out duplicates based on ID number, I had a corpus of 3,749 GPTs.
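To give a concrete picture of this collection step, here is a sketch in Python rather than raw cURL. The search URL and the JSON field names below are placeholders standing in for the request template in footnote [3], not the store’s actual API shape.

```python
# A sketch of the collection loop. SEARCH_URL and the JSON field names are
# placeholders standing in for the request template in footnote [3].
import requests

SEARCH_URL = "https://example.com/gpt-store/search"  # placeholder endpoint

def search_gpts(keyword):
    """Return up to 10 GPT records matching a single keyword."""
    resp = requests.get(SEARCH_URL, params={"q": keyword}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("items", [])  # field name is an assumption

corpus = {}  # keyed by GPT ID so duplicates across keywords collapse
for keyword in keywords:  # the 552 search terms left after expansion and trimming
    for item in search_gpts(keyword):
        corpus[item["id"]] = {
            "name": item.get("name"),
            "description": item.get("description"),
            "conversation_starters": item.get("conversation_starters", []),
            "num_conversations": item.get("num_conversations", 0),
        }

print(len(corpus))  # 3,749 unique GPTs in my run
```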


For efficiency in downstream analysis, I decided to filter out GPTs that had fewer than 100 conversations, focusing the corpus on those GPTs that had more traction and usage. After doing this, I was left with 693 GPTs.
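In code, that cut is a one-line filter over the corpus dictionary from the sketch above (the conversation-count field name is the same assumption as before):

```python
# Keep only GPTs with at least 100 conversations.
popular = {gpt_id: info for gpt_id, info in corpus.items()
           if info["num_conversations"] >= 100}
print(len(popular))  # 693 GPTs remained
```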

Rating and Ranking GPTs

With a long list of journalism-related GPTs, I needed a system for deciding which ones could be the most useful. Ideally, I would test them all manually and write reviews, but that wasn’t a feasible starting point for almost 700 models. So I decided to use GPT-4 via the OpenAI API to rate and rank the GPTs I had collected and then manually test the highest-ranked models.

To rate them, I used GPT-4 to assess the utility of each GPT for each of the same 30 journalistic tasks from O*NET, on a scale from 1 (not useful) to 4 (very useful) (see footnote [4]).
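Here is a rough sketch of what one of those rating calls could look like with the OpenAI Python client. The actual rating prompt is given in footnote [4], so the wording below is only a stand-in, and the sketch builds on the popular dictionary from earlier.

```python
# A sketch of rating one GPT against one task with GPT-4. The real rating
# prompt is in footnote [4]; this wording is illustrative only.
from openai import OpenAI

client = OpenAI()

def rate_gpt(gpt_info, task):
    prompt = (
        "On a scale from 1 (not useful) to 4 (very useful), rate how useful "
        "the following custom GPT would be for this journalism task.\n\n"
        f"Task: {task}\n"
        f"GPT name: {gpt_info['name']}\n"
        f"GPT description: {gpt_info['description']}\n\n"
        "Answer with a single number."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the reply starts with the digit of the rating.
    return int(response.choices[0].message.content.strip()[0])

# onet_tasks here is the list of the 30 O*NET tasks used for rating (a subset
# of the items fed into the keyword prompt earlier).
ratings = {gpt_id: [rate_gpt(info, task) for task in onet_tasks]
           for gpt_id, info in popular.items()}
```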


With these ratings, I calculated some summary statistics for each GPT to get a better idea of what the results meant. I averaged each GPT’s ratings across tasks and also tallied the distribution of 4s, 3s, 2s, and 1s for each GPT. I thought this would give me an idea of which GPTs would be the most useful for my purposes, because I could sort by the highest average rating and also look for GPTs with a large number of high ratings.
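Building on the ratings dictionary from the sketch above, those summaries boil down to a few lines:

```python
# Per-GPT summary: mean rating across tasks and a tally of each score.
from collections import Counter
from statistics import mean

summary = {
    gpt_id: {"average": mean(scores), "counts": Counter(scores)}
    for gpt_id, scores in ratings.items()
}

# GPT IDs sorted by average rating, highest first.
top_overall = sorted(summary, key=lambda g: summary[g]["average"], reverse=True)
```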


However, I realized this might not tell the entire story. Instead of judging the GPTs only by their overall scores, I also decided to see whether certain GPTs were rated highly in particular specialties or clusters of tasks while ranking lower overall. To do this, I manually divided the 30 tasks into four subgroups (Writing and Editing, Reporting and Investigating, Broadcast and Multimedia, and Communication and Engagement). Then, using my already generated ratings, I calculated averages for each subgroup. This allowed me to find GPTs that excelled at a more focused set of journalism tasks, rather than only GPTs that were helpful across the entire set.
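In code, that looks something like the sketch below. The task-to-subgroup index assignments are hypothetical placeholders; my actual manual grouping of the 30 tasks isn’t reproduced here.

```python
# Subgroup averages. The index lists are hypothetical placeholders, not my
# actual manual grouping of the 30 O*NET tasks.
from statistics import mean

subgroups = {
    "Writing and Editing": [0, 1, 2],
    "Reporting and Investigating": [3, 4, 5],
    "Broadcast and Multimedia": [6, 7, 8],
    "Communication and Engagement": [9, 10, 11],
}

subgroup_averages = {
    gpt_id: {name: mean(scores[i] for i in idxs)
             for name, idxs in subgroups.items()}
    for gpt_id, scores in ratings.items()
}
```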


Testing GPTs

Based on the ratings and rankings I created, the next step was to manually test some of the highest-ranked GPTs. I wanted to see if they were actually useful in the story creation process. To create my test set I took the four overall highest-ranked GPTs, and the two highest-ranked from each task subgroup, resulting in 12 GPTs.
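Expressed against the summaries above, building the test set looks like this (in my case the overall and subgroup winners didn’t overlap, which is how I ended up with exactly 12):

```python
# Test set: the top 4 GPTs overall plus the top 2 from each of the 4 subgroups.
test_set = set(top_overall[:4])
for name in subgroups:
    ranked = sorted(subgroup_averages,
                    key=lambda g: subgroup_averages[g][name], reverse=True)
    test_set.update(ranked[:2])
# Up to 12 GPTs; fewer if a GPT tops both the overall and a subgroup list.
```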

Then, to test the selected GPTs, I decided to use them in the way I would as a journalist and record my observations. While not entirely systematic, as a first cut at assessment, this was the most flexible way to see how the GPTs might be useful in my workflow.

I spent 45 minutes experimenting with each GPT. For the four GPTs that got the highest overall scores, I tried performing tasks that could benefit me throughout all aspects of my work. For the more specialized models, I focused on aspects of journalism that were more targeted and pertinent to the described subset of tasks they could assist with.


Takeaways

The most obvious takeaway from this experience was the difference between using the GPTs with high overall rankings for a broad range of tasks and using specialized GPTs for a narrower range of tasks. For example, Journalist Pro, a GPT that was supposed to be able to assist with a wide range of tasks, gave very bland and, in many cases, unhelpful results. On the other hand, Fact-checking, a GPT specifically designed to identify incorrect statements and to distinguish opinions from facts in writing, successfully verified that my writing was grounded in fact and spotted sentences to either double-check or change.

This trend continued across all of the GPTs I tested: the general GPTs gave disappointing and basic results, while the specialized GPTs were useful in solving their smaller tasks. Even when I attempted to have the more general GPTs perform the smaller tasks that the specialized GPTs excelled in, they let me down.


Legal Eye, the second-highest overall ranked model, is a GPT described as helping with research, sourcing, and writing. However, I found it ineffective: it failed to find pertinent articles for my ideas, suggested interview subjects that didn’t make sense for my angle, and, when provided with information, wrote a subpar story that lacked the legal and political analysis it claimed it would give.

Improve my Writing, my highest-ranked writing and editing GPT, did a fantastic job of finding grammatical errors and correcting sentence structure. A feature I particularly enjoyed was how it could transform an entire piece by converting the tone or writing style into a completely different one. For example, I gave the model a piece I had written that was informative and serious and asked it to rework the story for a younger audience, and it did an impressive job. This model, while narrower in scope than Legal Eye, did a more thorough job of the tasks I had it perform.


Video Maker by Lucas AI, my highest-ranked broadcast and multimedia GPT, did a great job of creating scripts and making informative videos with interesting visuals. It converted written pieces into reasonable videos. However, these videos were not production level: the narration was monotone and the images were sometimes mismatched to the script. It was a good tool for creating a baseline video, but additional editing would be needed before publishing.


Conclusion

This project gave me some new ideas that I think can be useful to other journalists thinking about using custom GPTs.

My advice would be to find the biggest pain points in your process and then research specialized GPTs that could solve those issues. For example, if you find yourself spinning your wheels on identifying useful interviewees for your story, dig around for GPTs that specialize in finding people or doing article research. Don’t assume a general GPT aimed at journalism can help you. Research GPT, a custom-built GPT designed to help with complex research questions by suggesting experts to speak to and articles to read, may be of use in this situation. If it isn’t, a simple search for “source finder” and adjacent keywords in the GPT store will surface GPTs built specifically for the task at hand. From there, play around with each one, as they will have small differences that could influence your work. Also, a GPT’s ratings and number of conversations can be an indicator of which models are better developed and more thoroughly tested.

In the future, it could also make sense to develop a more rigorous set of evaluation criteria for each GPT. Would you want in-depth reviews of custom GPTs for the specific use cases and tasks you have? Let me know!


