AI Is Already Crushing the News Industry

When tech companies first rolled out generative-AI products, some critics immediately feared a media collapse. Every bit of writing, imagery, and video became suspect. But for news publishers and journalists, another calamity was on the horizon.

Chatbots have proved adept at keeping users locked into conversations. They do so by answering every question, often through summarizing articles from news publishers. Suddenly, fewer people are traveling outside the generative-AI sites—a development that poses an existential threat to the media, and to the livelihood of journalists everywhere.

According to one comprehensive study, Google’s AI Overviews—a feature that summarizes web pages above the site’s usual search results—has already reduced traffic to outside websites by more than 34 percent. The CEO of Dotdash Meredith, which publishes People, Better Homes & Gardens, and Food & Wine, recently said the company is preparing for a possible “Google Zero” scenario. Some have speculated that traffic drops resulting from chatbots were part of the reason outlets such as Business Insider and the Daily Dot have recently had layoffs. “Business Insider was built for an internet that doesn’t exist anymore,” one former staffer recently told the media reporter Oliver Darcy.

Not all publishers are at equal risk: Those that primarily rely on general-interest readers who come in from search engines and social media may be in worse shape than specialized publishers with dedicated subscribers. Yet no one is totally safe. Released in May 2024, AI Overviews joins ChatGPT, Claude, Grok, Perplexity, and other AI-powered products that, combined, have replaced search for more than 25 percent of Americans, according to one study. Companies train chatbots on huge amounts of stolen books and articles, as my previous reporting has shown, and scrape news articles to generate responses with up-to-date information. Large language models also train on copious materials in the public domain—but much of what is most useful to these models, particularly as users seek real-time information from chatbots, is news that exists behind a paywall. Publishers are creating the value, but AI companies are intercepting their audiences, subscription fees, and ad revenue.

I asked Anthropic, xAI, Perplexity, Google, and OpenAI about this problem. Anthropic and xAI did not respond. Perplexity did not directly comment on the issue. Google argued that it was sending “higher-quality” traffic to publisher websites, meaning that users purportedly spend more time on the sites once they click over, but declined to offer any data in support of this claim. OpenAI referred me to an article showing that ChatGPT is sending more traffic to websites overall than it did previously, but the raw numbers are fairly modest. The BBC, for example, reportedly received 118,000 visits from ChatGPT in April, but that’s practically nothing relative to the hundreds of millions of visitors it receives each month. The article also shows that traffic from ChatGPT has in fact declined for some publishers.

Over the past few months, I’ve spoken with several news publishers, all of whom see AI as a near-term existential threat to their business. Rich Caccappolo, the vice chair of media at the company that publishes the Daily Mail—the U.K.’s largest newspaper by circulation—told me that all publishers “can see that Overviews are going to unravel the traffic that they get from search, undermining a key foundational pillar of the digital-revenue model.” AI companies have claimed that chatbots will continue to send readers to news publishers, but have not cited evidence to support this claim. I asked Caccappolo if he thought AI-generated answers could put his company out of business. “That is absolutely the fear,” he told me. “And my concern is it’s not going to happen in three or five years—I joke it’s going to happen next Tuesday.”

Book publishers, especially those of nonfiction and textbooks, also told me they anticipate a massive decrease in sales, as chatbots can both summarize their books and give detailed explanations of their contents. Publishers have tried to fight back, but my conversations revealed how much the deck is stacked against them. The world is changing fast, perhaps irrevocably. The institutions that comprise our country’s free press are fighting for their survival.

Publishers have been responding in two ways. First: legal action. At least 12 lawsuits involving more than 20 publishers have been filed against AI companies. Their outcomes are far from certain, and the cases might be decided only after irreparable damage has been done.

The second response is to make deals with AI companies, allowing their products to summarize articles or train on editorial content. Some publishers, such as The Atlantic, are pursuing both strategies (the company has a corporate partnership with OpenAI and is suing Cohere). At least 72 licensing deals have been made between publishers and AI companies in the past two years. But figuring out how to approach these deals is no easy task. Caccappolo told me he has “felt a tremendous imbalance at the negotiating table”—a sentiment shared by others I spoke with. One problem is that there is no standard price for training an LLM on a book or an article. The AI companies know what kinds of content they want, and having already demonstrated an ability and a willingness to take it without paying, they have extraordinary leverage when it comes to negotiating. I’ve learned that books have sometimes been licensed for only a couple hundred dollars each, and that a publisher that asks too much may be turned down, only for tech companies to take their material anyway.

Another issue is that different content appears to have different value for different LLMs. The digital-media company Ziff Davis has studied web-based AI training data sets and observed that content from “high-authority” sources, such as major newspapers and magazines, appears more desirable to AI companies than blog and social-media posts. (Ziff Davis is suing OpenAI for training on its articles without paying a licensing fee.) Researchers at Microsoft have also written publicly about “the importance of high-quality data” and have suggested that textbook-style content may be particularly desirable.

But beyond a few specific studies like these, there is little insight into what kind of content most improves an LLM, leaving a lot of unanswered questions. Are biographies more or less important than histories? Does high-quality fiction matter? Are old books worth anything? Amy Brand, the director and publisher of the MIT Press, told me that “a solution that promises to help determine the fair value of specific human-authored content within the active marketplace for LLM training data would be hugely beneficial.”

A publisher’s negotiating power is also limited by the degree to which it can stop an AI company from using its work without consent. There’s no surefire way to keep AI companies from scraping news websites; even the Robots Exclusion Protocol, the standard opt-out method available to news publishers, is easily circumvented. Because AI companies generally keep their training data a secret, and because there is no easy way for publishers to check which chatbots are summarizing their articles, publishers have difficulty figuring out which AI companies they might sue or try to strike a deal with. Some experts, such as Tim O’Reilly, have suggested that laws should require the disclosure of copyrighted training data, but no existing legislation requires companies to reveal specific authors or publishers that have been used for AI training material.

Of course, all of this raises a question. AI companies seem to have taken publishers’ content already. Why would they pay for it now, especially because some of these companies have argued in court that training LLMs on copyrighted books and articles is fair use?

Perhaps the deals are simply hedges against an unfavorable ruling in court. If AI companies are prevented from training on copyrighted work for free, then organizations that have existing deals with publishers might be ahead of their competition. Publisher deals are also a means of settling without litigation—which may be a more desirable path for publishers who are risk-averse or otherwise uncertain. But the legal scholar James Grimmelmann told me that AI companies could also respond to complaints like Ziff Davis’s by arguing that the deals involve more than training on a publisher’s content: They may also include access to cleaner versions of articles, ongoing access to a daily or real-time feed, or a release from liability for their chatbot’s plagiarism. Tech companies could argue that the money exchanged in these deals is exclusively for the nonlicensing elements, so they aren’t paying for training material. It’s worth noting that tech companies almost always refer to these deals as partnerships, not licensing deals, likely for this reason.

Regardless, the modest income from these arrangements is not going to save publishers: Even a good deal, one publisher told me, won’t come anywhere near recouping the revenue lost from decreased readership. Publishers that can figure out how to survive the generative-AI assault may need to invent different business models and find new streams of revenue. There may be viable strategies, but none of the publishers I spoke with has a clear idea of what they are.

Publishers have become accustomed to technological threats over the past two decades, perhaps most notably the loss of ad revenue to Facebook and Google, a company that was recently found to have an illegal monopoly in online advertising (though the company has said it will appeal the ruling). But the rise of generative AI may spell doom for the Fourth Estate: With AI, the tech industry even deprives publishers of an audience.

In the event of publisher mass extinction, some journalists will be able to endure. The so-called creator economy shows that it’s possible to provide high-quality news and information through Substack, YouTube, and even TikTok. But not all reporters can simply move to these platforms. Investigative journalism that exposes corruption and malfeasance by powerful people and companies comes with a serious risk of legal repercussions, and requires resources—such as time and money—that tend to be in short supply for freelancers.

If news publishers start going out of business, won’t AI companies suffer too? Their chatbots need access to journalism to answer questions about the world. Doesn’t the tech industry have an interest in the survival of newspapers and magazines?

In fact, there are signs that AI companies believe publishers are no longer needed. In December, at The New York Times’ DealBook Summit, OpenAI CEO Sam Altman was asked how writers should feel about their work being used for AI training. “I think we do need a new deal, standard, protocol, whatever you want to call it, for how creators are going to get rewarded.” He described an “opt-in” regime where an author could receive “micropayments” when their name, likeness, and style were used. But this could not be further from OpenAI’s current practice, in which products are already being used to imitate the styles of artists and writers, without compensation or even an effective opt-out.

Google CEO Sundar Pichai was also asked about writer compensation at the DealBook Summit. He suggested that a market solution would emerge, possibly one that wouldn’t involve publishers in the long run. This is typical. As in other industries they’ve “disrupted,” Silicon Valley moguls seem to perceive old, established institutions as middlemen to be removed for greater efficiency. Uber enticed drivers to work for it, crushed the traditional taxi industry, and now controls salaries, benefits, and workloads algorithmically. This has meant greater convenience for consumers, just as AI arguably does—but it has also proved ruinous for many people who were once able to earn a living wage from professional driving. Pichai seemed to envision a future that may have a similar consequence for journalists. “There’ll be a marketplace in the future, I think—there’ll be creators who will create for AI,” he said. “People will figure it out.”

The End of Publishing as We Know It