Podcasts and AI

Using GenAI to do Public Humanities

At the intersection of artificial intelligence and public scholarship, my project explores both the possibilities and perils of using AI to make complex research more accessible. Taking as my case study a paper on simulating ventilator allocation protocols—a topic that gained urgent relevance during the COVID-19 pandemic—I set out to examine how large language models might help bridge the gap between specialized academic knowledge and public understanding. Using a suite of AI tools, I attempted to automatically generate readable summaries at different comprehension levels (from grade school to college) and even produce a short podcast script explaining the research findings.

This experiment serves multiple purposes: testing the current capabilities of AI in translating technical content for lay audiences, understanding the ethical implications of automated research communication, and exploring how humanities scholars might thoughtfully integrate these emerging tools into public engagement work. While the technical aspects of ventilator allocation protocols might seem far removed from traditional humanities concerns, the underlying questions—how do we make difficult decisions about resource allocation? how do we communicate complex ethical tradeoffs to the public?—sit squarely within the humanities’ mission to foster informed public dialogue about pressing social issues. By documenting both my successes and failures in this AI-assisted translation process, I hope to contribute to broader conversations about the future of public scholarship in an age of artificial intelligence.

Use Case 1: Lay Summaries of Academic Research

A popular modality for public humanities work is long- and short-form journalism – mostly through opinion pieces. I am, admittedly, not a very good opinion writer – but I am fascinated by the “translational” aspects of GenAI. And so I engaged in a couple of experiments to see whether GenAI could be used to do public humanities. I started with NotebookLM, which has received a fair bit of hype.

Think of NotebookLM as a very quick and very capable undergraduate research assistant. With it, you can search through, summarize, and ask questions about a set of sources. It is surprisingly accurate, and much less likely to hallucinate findings or quotations than a general-purpose LLM, because its answers are grounded in – it “attends” closely to – the specific sources you have added to the notebook.

Using NotebookLM is surprisingly straightforward.

  1. Log in to notebooklm.google.com using a Google account.
  2. Create a new notebook. Once you do so, you will be brought to the main interface. Here you can add sources to a list (left-hand column), or chat/query with the assistant (bottom).
  3. Add sources. You can upload PDFs or audio files, provide hyperlinks to text or video, or paste text.
  4. Query the sources. You can ask natural language questions that:
       • Ask for summaries of all sources
       • Ask for specific quotations that support or discuss a particular theme or claim
       • Ask for structured data – e.g., how many people were enrolled in the study? What was the average survival rate under each ventilator protocol?

The output of these queries is typically detailed and accurate. Neat!
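
NotebookLM itself is strictly point-and-click, but the underlying pattern – “answer only from these sources” – can be approximated in a few lines of code against a general LLM API. The sketch below is a hypothetical stand-in rather than NotebookLM’s own machinery: the file name, the question, and the model string are placeholders, and it assumes the pypdf and anthropic Python packages (plus an API key in the environment) are available.

```python
# Hypothetical sketch: a scripted stand-in for NotebookLM-style grounded Q&A.
# The file name, question, and model id below are placeholders, not anything
# from the original experiment.
from pypdf import PdfReader
import anthropic


def ask_about_paper(pdf_path: str, question: str) -> str:
    # Pull the raw text out of the PDF so the model answers from the source,
    # not from its general training data.
    reader = PdfReader(pdf_path)
    paper_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    prompt = (
        "Answer the question using ONLY the article below. Quote the article "
        "where possible, and say so if the answer is not in the text.\n\n"
        f"ARTICLE:\n{paper_text}\n\nQUESTION: {question}"
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # substitute whatever model is current
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


print(ask_about_paper("ventilator_study.pdf",
                      "How many admissions were included in the simulation?"))
```

Stuffing the article text into the prompt and telling the model to answer only from it is what keeps fabricated quotations down – the same grounding that makes NotebookLM comparatively trustworthy.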

It can also generate convincing lay summaries of articles. For instance, if we ask it to “generate a 200-word lay summary, targeted at readers with a college degree”, we get the following:

A recent study in PLOS ONE used a simulation to investigate ethical trade-offs in ventilator allocation during public health emergencies. The researchers compared six different protocols, including age-based, SOFA-based, and comorbidity-based approaches, using data from over 3,700 adult hospital admissions requiring ventilation in a New York hospital system. The simulation looked at the number of lives and life-years saved, as well as potential racial disparities.

The study found that age-sensitive protocols performed best, saving more lives and life-years than other methods. Surprisingly, the researchers did not find a trade-off between saving the most lives and saving the most life-years. Additionally, while some protocols resulted in significant racial disparities, especially the age-based and Colorado protocols, these protocols also saved more lives in Black and Hispanic populations. The New York protocol, while resulting in more equal survival rates across racial groups, saved fewer lives overall. The study highlights the complexity of ethical decision-making in crisis standards of care, and emphasizes the importance of empirical analysis in evaluating different approaches.

If we ask it to rewrite that summary for “an audience with a high-school education”, we get:

A study in PLOS ONE used computer simulation to test different ways to decide who gets a ventilator during a health crisis when there are not enough for everyone. Researchers looked at six different plans, including ones that prioritized people based on age, how sick they were, or other health problems. They used data from over 3,700 patients who needed ventilators in a New York hospital. The simulation calculated how many lives and years of life each plan could save, and also looked for unfair differences between racial groups.

The study found that plans that considered age worked best, saving the most lives and years of life. Surprisingly, saving the most lives also meant saving the most life-years. Some plans caused significant racial inequality. The age-based and Colorado plans saved more lives for Black and Hispanic people but favored them less. The New York plan was more equal but saved fewer lives overall.

The study also showed that how much difference each plan made depended on how scarce ventilators were. The plans that seemed best in general worked much better when ventilators were moderately scarce compared to when they were very scarce or not scarce. The researchers suggest that having more levels of priority might help a plan work better. The results emphasize the importance of careful testing and analysis when deciding how to handle a health crisis.

You can even ask it to produce summaries for “children completing grade school”:

A study used a computer game to see which ways of sharing ventilators during a health crisis were the best. The game tested six different plans for sharing ventilators, using information from a real hospital.

The study found that plans that considered age worked the best for saving lives and years of life. It was interesting that saving the most lives also meant saving the most life-years. Some plans caused unfair differences between races. For example, plans that favored people based on their age saved more lives for Black and Hispanic people but favored White people less.

A plan that didn’t consider age was more equal, but saved fewer lives overall. The researchers learned that it’s important to test plans carefully to make sure they are fair and work well.

This flexibility is pretty astonishing, but the tool is often cheerfully overconfident. This last summary doesn’t exactly succeed at making things accessible: few grade schools are teaching the concept of a “life-year” to students. It also makes controversial value judgements (e.g., about “unfair” differences between races) that aren’t argued for in the paper. Be warned.
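
For what it’s worth, the multi-level summary experiment above doesn’t require NotebookLM at all. Here is a rough, hypothetical sketch of how one might script it against a general LLM API: the audience labels mirror the prompts above, the model string is a placeholder, and `paper_text` would come from the PDF extraction shown earlier.

```python
# Hypothetical sketch of the multi-level lay-summary experiment, run outside
# NotebookLM against a general LLM API. Audience labels and word count are
# illustrative; paper_text is the extracted article text.
import anthropic

AUDIENCES = [
    "readers with a college degree",
    "an audience with a high-school education",
    "children completing grade school",
]


def lay_summaries(paper_text: str) -> dict[str, str]:
    client = anthropic.Anthropic()
    summaries = {}
    for audience in AUDIENCES:
        prompt = (
            f"Write a 200-word lay summary of the article below, targeted at {audience}. "
            "Do not introduce claims or value judgements that are not in the article.\n\n"
            f"ARTICLE:\n{paper_text}"
        )
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # placeholder model id
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        summaries[audience] = response.content[0].text
    return summaries
```

Note that even an explicit instruction not to introduce value judgements is no guarantee; you still have to read the output against the paper.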

Use Case 2: Auto-Generated Podcasts

One of the things that has created a lot of buzz around NotebookLM is its ability to auto-generate a 10-minute podcast discussing the material in your notebook. These are, frankly, astonishing, and well worth playing around with: generate a podcast from one of your own research articles and see how accurate and compelling it is.

To generate one of these, select the “notebook guide” from the bottom right of the chat window, and click “Generate Audio Overview”. It may take up to a minute to generate the podcast, but then you should be able to listen.

Here is an example using NotebookLM:

<https://notebooklm.google.com/notebook/e8a9c604-82b2-4635-8818-0d26e67a417d>

The quality of the podcast is surprising! In the example above, it correctly identifies some of the complex comparisons to be made between picking the appropriate ventilator allocation protocol and other public health policy successes (seatbelts, vaccination), and gets the precise numerical details correct.
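
There is, as far as I know, no public API for Audio Overviews – you click the button and wait. If you want a do-it-yourself approximation of the script-writing step, though, a general LLM can draft a two-host dialogue from the paper text. The sketch below is hypothetical and is not how NotebookLM works internally; you would still need a text-to-speech service to turn the script into audio.

```python
# Hypothetical sketch: draft a two-host podcast script from extracted paper text.
# This is a rough do-it-yourself alternative, not NotebookLM's own pipeline.
import anthropic


def podcast_script(paper_text: str) -> str:
    prompt = (
        "Write a roughly 10-minute, two-host podcast script (HOST A and HOST B) "
        "that explains the article below to a general audience. Keep every "
        "factual claim traceable to the article, and flag uncertainty where "
        "the authors do.\n\n"
        f"ARTICLE:\n{paper_text}"
    )
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model id
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    # Feeding the returned script into a text-to-speech service would give you audio.
    return response.content[0].text
```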

So what should academics think about all this?

The process is deceptively simple: feed a scientific paper into NotebookLM (or ChatGPT or Claude’s “Projects”), and with a well-crafted prompt, receive back a structured summary: research questions, methodology, key findings, and limitations. Venture into the humanities and social sciences and things get a bit trickier, but typically the main claim and structure of the argument can be retrieved with a simple request and a click of a button. More sophisticated users are homebrewing automated pipelines that can process entire folders of PDFs, generating annotated bibliographies and literature reviews at a scale that would have seemed impossible just two years ago. You can even experiment with “chain-of-thought” prompting techniques that ask these AI systems to explicitly reason through papers step by step, allegedly producing deeper and more nuanced analyses than simple summarization.
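
Those homebrewed pipelines can be surprisingly small. The sketch below is a hypothetical illustration rather than any particular scholar’s setup: it walks a folder of PDFs and asks the model for a structured, step-by-step summary of each, using the same placeholder model id and packages as the earlier sketches.

```python
# Hypothetical sketch of a folder-scale summarization pipeline (illustrative only).
# Assumes the pypdf and anthropic packages; the folder path and model id are placeholders.
from pathlib import Path
from pypdf import PdfReader
import anthropic

client = anthropic.Anthropic()


def summarize_folder(folder: str) -> dict[str, str]:
    summaries = {}
    for pdf_path in Path(folder).glob("*.pdf"):
        reader = PdfReader(str(pdf_path))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        # Very long papers may need to be truncated or chunked to fit the context window.
        prompt = (
            "Summarize this paper as: research question, methodology, key findings, "
            "and limitations. Work step by step through the paper before writing "
            "the summary.\n\n" + text
        )
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # placeholder model id
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        summaries[pdf_path.name] = response.content[0].text
    return summaries


# e.g. summarize_folder("papers/") -> {filename: structured summary}
```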

The promised benefits are, at first glance, compelling. By delegating the time-consuming task of initial reading and summarization to AI, researchers claim they can survey their fields more comprehensively, staying current with the torrent of new publications while focusing their human attention on synthesis and original thought. Proponents argue that AI assistance could democratize access to complex academic literature, helping scholars from non-English speaking backgrounds parse dense technical writing. For graduate students and early-career researchers, these tools promise to ease the overwhelming process of conducting literature reviews and identifying relevant prior work.

Yet, there are problems with the frictionless future offered by these tools. The machine-generated summaries themselves betray a concerning cognitive style, one that flattens nuanced arguments into sanitized bullet points and neat taxonomies. Much like Edward Tufte’s critique of PowerPoint’s “cognitive style,” which he argued degraded complex information into oversimplified hierarchies, LLM summaries risk imposing their own reductive template on scholarly thought. These AI systems excel at identifying main points and key findings, but struggle to capture the subtle interplay of ideas, the productive tensions and generative contradictions that often constitute a work’s most valuable insights. Imagine reading Deleuze literally, but not seriously. The result is a kind of intellectual compression, where the rich texture of academic discourse is smoothed into frictionless, consumable chunks.

This is not only a fussy concern about “research quality”; it also raises questions about how we train the next generation of scholars. Traditionally, the grunt work of research—combing through archives, synthesizing literature reviews, sitting with difficult texts—has served as an apprenticeship in close reading and critical thinking. When undergraduate research assistants can simply feed source material into language models, we risk depriving them of this crucial developmental stage. The ability to deeply engage with challenging works, to trace the contours of an argument across hundreds of pages, to sit in productive confusion until understanding emerges—these skills cannot be outsourced to algorithms without impoverishing our collective intellectual capacity.

Perhaps most importantly, auto-summarization of articles accelerates the industrialization of academia. In the pressure cooker of the modern university, where publication frequency serves as a proxy for intellectual merit, large language models promise a seductive shortcut: instantaneous research summaries generated with a few keystrokes. While this artificial assistance might help researchers keep pace with the ever-mounting expectations of university administrators and tenure committees, it threatens to further entrench the industrial logic of academic “productivity”—a mindset that reduces scholarship to measurable outputs while neglecting the slow, necessary work of genuine intellectual discovery.

Coda

A confession: the immediately preceding section was largely drafted using Claude.ai. In particular, I fed the following prompt into Claude.ai and hit Enter.

“Imagine you are writing a short, journalistic piece that offers a critique of academics using LLMs to auto-generate research summaries. This could be targeted at readers of the New York Times magazine, the Baffler, or the New Yorker. Can you generate three paragraphs that focus on the following points: 1) Implications for research “productivity” and concomitant demands on researchers to “produce” scholarly work more quickly. 2) Concerns about the “cognitive style” of LLM produced summaries / bullet point lists. If possible reference the Edward Tufte’s work on the “Cognitive Style of Powerpoint”. 3) Pedagogical concerns about how we should go about training fledgling researchers. Will there be less need for undergrad researchers to sit with whole books to help us, less incentive to learn the skill of deep / close reading?”

That generated the last three paragraphs, while the first two were generated with a follow-up prompt:

Write two quick introductory paragraphs that lay out (1) the process of LLM-assisted summarization, and (2) the promised benefits.

I then lightly edited the text – “Imagine reading Deleuze literally, but not seriously” is an insert – and removed some fabricated quotes (!!) and speculative claims about what other academics are doing that I couldn’t verify.

You probably find it ironic – perhaps a little reprehensible – that I’d use an LLM to generate a critique of AI-generated summaries. To be honest, I’m not entirely sure how I feel about the use of these tools either. What I’ve hoped to at least demonstrate, however, is that they can (and will) be used to quickly and easily create accessible explanations of academic research. Indeed, we are going to have to figure out what kinds of uses we are willing to tolerate, celebrate, and/or malign.