史蒂夫说 (Steve Says)

Expand the boundaries of consciousness with me.

**Picture Limitless Creativity at Your Fingertips, by Kevin Kelly**

Original article: https://www.wired.com/story/picture-limitless-creativity-ai-image-generators/


Picture Limitless Creativity at Your Fingertips
By Kevin Kelly

PICTURE LEE UNKRICH, one of Pixar’s most distinguished animators, as a seventh grader. He’s staring at an image of a train locomotive on the screen of his school’s first computer. Wow, he thinks. Some of the magic wears off, however, when Lee learns that the image had not appeared simply by asking for “a picture of a train.” Instead, it had to be painstakingly coded and rendered—by hard-working humans.

Now picture Lee 43 years later, stumbling onto DALL-E, an artificial intelligence that generates original works of art based on human-supplied prompts that can literally be as simple as “a picture of a train.” As he types in words to create image after image, the wow is back. Only this time, it doesn’t go away. “It feels like a miracle,” he says. “When the results appeared, my breath was taken away and tears welled in my eyes. It’s that magical.”

Our machines have crossed a threshold. All our lives, we have been reassured that computers were incapable of being truly creative. Yet, suddenly, millions of people are now using a new breed of AIs to generate stunning, never-before-seen pictures. Most of these users are not, like Lee Unkrich, professional artists, and that’s the point: They do not have to be. Not everyone can write, direct, and edit an Oscar winner like Toy Story 3 or Coco, but everyone can launch an AI image generator and type in an idea. What appears on the screen is astounding in its realism and depth of detail. Thus the universal response: Wow. On four services alone—Midjourney, Stable Diffusion, Artbreeder, and DALL-E—humans working with AIs now cocreate more than 20 million images every day. With a paintbrush in hand, artificial intelligence has become an engine of wow.

Because these surprise-generating AIs have learned their art from billions of pictures made by humans, their output hovers around what we expect pictures to look like. But because they are an alien AI, fundamentally mysterious even to their creators, they restructure the new pictures in a way no human is likely to think of, filling in details most of us wouldn’t have the artistry to imagine, let alone the skills to execute. They can also be instructed to generate more variations of something we like, in whatever style we want—in seconds. This, ultimately, is their most powerful advantage: They can make new things that are relatable and comprehensible but, at the same time, completely unexpected.

So unexpected are these new AI-generated images, in fact, that—in the silent awe immediately following the wow—another thought occurs to just about everyone who has encountered them: Human-made art must now be over. Who can compete with the speed, cheapness, scale, and, yes, wild creativity of these machines? Is art yet another human pursuit we must yield to robots? And the next obvious question: If computers can be creative, what else can they do that we were told they could not?

I have spent the past six months using AIs to create thousands of striking images, often losing a night’s sleep in the unending quest to find just one more beauty hidden in the code. And after interviewing the creators, power users, and other early adopters of these generators, I can make a very clear prediction: Generative AI will alter how we design just about everything. Oh, and not a single human artist will lose their job because of this new technology.

IT IS NO exaggeration to call images generated with the help of AI cocreations. The sobering secret of this new power is that the best applications of it are the result not of typing in a single prompt but of very long conversations between humans and machines. Progress for each image comes from many, many iterations, back-and-forths, detours, and hours, sometimes days, of teamwork—all on the back of years of advancements in machine learning.

AI image generators were born from the marriage of two separate technologies. One was a historical line of deep learning neural nets that could generate coherent realistic images, and the other was a natural language model that could serve as an interface to the image engine. The two were combined into a language-driven image generator. Researchers scraped the internet for all images that had adjacent text, such as captions, and used billions of these examples to connect visual forms to words, and words to forms. With this new combination, human users could enter a string of words—the prompt—that described the image they sought, and the prompt would generate an image based on those words.
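
That words-to-forms bridge can be made concrete. The sketch below scores how well candidate captions match an image, using OpenAI's open source CLIP model through the Hugging Face transformers library; CLIP is my example of such a bridge, since the article names no specific model, and the image file name is hypothetical.

```python
# Sketch of a words-to-forms bridge: OpenAI's open source CLIP model,
# via the Hugging Face transformers library (my example; the article
# names no specific model). CLIP learned from huge numbers of
# image-caption pairs and scores how well text matches an image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("train.png")  # hypothetical local image file
captions = ["a picture of a train", "a bowl of fruit", "a portrait"]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

for caption, p in zip(captions, probs[0]):
    print(f"{p:.3f}  {caption}")  # best-matching caption scores highest
```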

Scientists now at Google invented the diffusion computational models that are at the core of image generators today, but the company has been so concerned about what people might do with them that it still has not opened its own experimental generators, Imagen and Parti, to the public. (Only employees can try them, and with tight guidelines on what can be requested.) It is no coincidence, then, that the three most popular platforms for image generators right now are three startups with no legacy to protect. Midjourney is a bootstrapping startup launched by David Holz, who based the generator in an emerging community of artists. The interface to the AI is a noisy Discord server; all the work and prompts were made public from the start. DALL-E is a second-gen product of the nonprofit OpenAI, funded by Elon Musk and others. Stable Diffusion appeared on the scene in August 2022, created by Emad Mostaque, a European entrepreneur. It’s an open source project, with the added benefit that anyone can download its software and run it locally on their own desktop. More than the others, Stable Diffusion has unleashed AI image generators into the wild.
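
"Anyone can download its software and run it locally" is not hyperbole. Here is a minimal sketch of doing exactly that, assuming the open source Hugging Face diffusers library and the publicly released Stable Diffusion v1.5 weights; the library choice, model ID, and GPU are my assumptions, not details from the article.

```python
# Minimal sketch of running Stable Diffusion locally, assuming the
# open source Hugging Face diffusers library and the public v1.5
# weights. Model ID, dtype, and GPU are assumptions, not details
# from the article.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # runs on an ordinary consumer GPU

# The prompt really can be as simple as the one that amazed Lee Unkrich.
image = pipe("a picture of a train").images[0]
image.save("train.png")
```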

Why are so many people so excited to play with these AIs? Many images are being created for the same reason that humans have always made most art: because the images are pretty and we want to look at them. Like flames in a campfire, the light patterns are mesmerizing. They never repeat themselves; they surprise, again and again. They depict scenes no one has witnessed before or can even imagine, and they are expertly composed. It’s a similar pleasure to exploring a video game world, or paging through an art book. There is a real beauty to their creativity, and we stare much in the way we might appreciate a great art show at a museum. In fact, viewing a parade of generated images is very much like visiting a personal museum—but in this case, the walls are full of art we ask for. And the perpetual novelty and surprise of the next image hardly wanes. Users may share the gems they discover, but my guess is that 99 percent of the 20 million images currently generated each day will only ever be viewed by a single human—their cocreator.

Like any art, the images can also be healing. People spend time making strange AI pictures for the same reason they might paint on Sundays, or scribble in a journal, or shoot a video. They use the media to work out something in their own lives, something that can’t be said otherwise. I’ve seen images depicting what animal heaven might look like, created in response to the death of a beloved dog. Many images explore the representation of intangible, spiritual realms, presumably as a way to think about them. “A huge portion of the entire usage is basically art therapy,” Holz, the Midjourney creator, tells me. “The images are not really aesthetically appealing in a universal sense but are appealing, in a very deep way, within the context of what’s going on in people’s lives.” The machines can be used to generate fantasies of all types. While the hosted services prohibit porn and gore, anything goes on the desktop versions, as it might in Photoshop.

AI-generated pictures can be utilitarian too. Say you are presenting a report on the possibility of recycling hospital plastic waste into construction materials and you want an image of a house made out of test tubes. You could search stock photo markets for a usable image made by a human artist. But a unique assignment like this rarely yields a preexisting picture, and even if found, its copyright status could be dubious or expensive. It is cheaper, faster, and probably far more appropriate to generate a unique, personalized image for your report in a few minutes that you can then insert into your slides, newsletter, or blog—and the copyright ownership is yours (for now). I have been using these generators myself to cocreate images for my own slide presentations.

In an informal poll of power users, I found that only about 40 percent of their time is spent seeking utilitarian images. Most AI images are used in places where there were no images previously. They usually do not replace an image created by a human artist. They may be created, for example, to illustrate a text-only newsletter by someone without artistic talent themselves, or the time and budget to hire someone. Just as mechanical photography did not kill human illustrations a century ago, but rather significantly expanded the places in which images appeared, so too do AI image generators open up possibilities for more art, not less. We’ll begin to see contextually generated images predominantly in spaces that are currently blank, like emails, text messages, blogs, books, and social media.

This new art resides somewhere between painting and photography. It lives in a possibility space as large as painting and drawing—as huge as human imagination. But you move through the space like a photographer, hunting for discoveries. Tweaking your prompts, you may arrive at a spot no one has visited before, so you explore this area slowly, taking snapshots as you step through. The territory might be a subject, or a mood, or a style, and it might be worth returning to. The art is in the craft of finding a new area and setting yourself up there, exercising good taste and the keen eye of curation in what you capture. When photography first appeared, it seemed as if all the photographer had to do was push the button. Likewise, it seems that all a person has to do for a glorious AI image is push the button. In both cases, you get an image. But to get a great one—a truly artistic one—well, that’s another matter.

ACCESSIBLE AI IMAGE generators are not even a year old, but already it is evident that some people are much better at creating AI images than others. Although they’re using the same programs, those who have accumulated thousands of hours with the algorithms can magically produce images that are many times better than the average person’s. The images by these masters have a striking coherence and visual boldness that is normally overwhelmed by the flood of details the AIs tend to produce. That is because this is a team sport: The human artist and the machine artist are a duet. And it requires not just experience but also lots of hours and work to produce something useful. It is as if there is a slider bar on the AI: At one end is Maximum Surprise, and at the other end Maximum Obedience. It is very easy to get the AI to surprise you. (And that is often all we ask of it.) But it is very difficult to get the AI to obey you. As Mario Klingemann, who makes his living selling NFTs of his AI-generated artwork, says, “If you have a very specific image in mind, it always feels like you are up against a forcefield.” Commands like “shade this area,” “enhance this part,” and “tone it down” are obeyed reluctantly. The AIs have to be persuaded.
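
The slider bar is a metaphor, but the open source stack does expose one literal knob that behaves much like it: the guidance scale, which sets how strictly the sampler follows the prompt. A sketch under the same diffusers assumptions as above; the values are illustrative.

```python
# The surprise/obedience slider, approximately: guidance_scale in the
# open source diffusers stack. Low values let the model wander; high
# values force it to follow the prompt. Values are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse in a storm, oil painting"
loose = pipe(prompt, guidance_scale=3.0).images[0]    # toward Maximum Surprise
strict = pipe(prompt, guidance_scale=15.0).images[0]  # toward Maximum Obedience
```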

Current versions of DALL-E, Stable Diffusion, and Midjourney limit prompts to about the length of a long tweet. Any longer and the words muddle together; the image turns to mush. That means that behind every fabulous image lies a short magic spell that summons it. It begins with the first incantation. How you say it matters. Your immediate results materialize in a grid of four to nine images. From that batch of pictures, you variate and mutate offspring images. Now you have a brood. If they look promising, begin to tweak the spell to nudge it in new directions as it births more generations of images. Multiply the group again and again as you search for the most compelling composition. Do not despair if it takes dozens of generations. Think like the AI; what does it like to hear? Whisper instructions that have worked in the past, and add them to the prompt. Repeat. Change the word order to see whether it likes that. Remember to be specific. Replicate until you have amassed a whole tribe of images that seem to have good bones and potential. Now cull out all but a select few. Be merciless. Begin outpainting the most promising images. That means asking the AI to extend the image out in certain directions beyond the current borders. Erase those portions that are not working. Suggest replacements to be done by the AI with more incantations (called inpainting). If the AI is not comprehending your hints, try spells used by others. When the AI has gone as far as it can, migrate the image to Photoshop for final tailoring. Present it as if you have done nothing, even though it is not uncommon for a distinctive image to require 50 steps.
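
In the Stable Diffusion flavor of this ritual, much of the loop can be written down. A sketch, again assuming the diffusers library; the spell, the seeds, and the mask file are all invented for illustration.

```python
# Sketch of the iterate-and-cull loop, assuming the diffusers library.
# The spell, seeds, and mask file are invented for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

spell = "a ruined cathedral overgrown with glowing moss, dusk"

# First incantation: a small grid of candidates, one per seed.
brood = []
for seed in range(4):
    g = torch.Generator("cuda").manual_seed(seed)
    brood.append(pipe(spell, generator=g).images[0])

# Cull by eye, then tweak the spell and breed the next generation
# from the seed of the most promising image (here, seed 2).
spell += ", volumetric light, highly detailed"
g = torch.Generator("cuda").manual_seed(2)
keeper = pipe(spell, generator=g).images[0]

# Inpainting: a separate pipeline regenerates only the region marked
# white in a hand-drawn mask image (the mask file is hypothetical).
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
mask = Image.open("mask.png")
keeper = inpaint(prompt=spell + ", aurora sky",
                 image=keeper, mask_image=mask).images[0]
keeper.save("keeper.png")  # final tailoring happens in Photoshop
```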

Behind this new magecraft is the art of prompting. Each artist or designer develops a way of persuading an AI to yield its best by evolving their prompts. Let’s call these new artists AI whisperers, or prompt artists, or promptors. The promptors work almost as directors, guiding the work of their alien collaborators toward a unified vision. The convoluted process required to tease a first-rate picture out of an AI is quickly emerging as a fine-art skill. Almost daily, new tools arrive to make prompting easier, better. PromptBase is a market for promptors to sell prompts that create simple images such as emoticons, logos, icons, avatars, and game weapons. It’s like clip art, but instead of selling the art, they sell the prompt that generates the art. And unlike fixed clip art, it is easy to alter and tweak the art to fit your needs, and you can extract multiple versions again and again. Most of these prompts sell for a couple bucks, which is a fair price, given how much trouble it is to hone a prompt on your own.

Above-average prompts not only include the subject but also describe the lighting, the point of view, the emotion evoked, the color palette, the degree of abstraction, and perhaps a reference picture to imitate. As with other artistic skills, there are now courses and guidebooks to train the budding promptor in the finer points of prompting. One fan of DALL-E 2, Guy Parsons, put together a free Prompt Book, jammed with tips on how to go beyond the wow and get images you can actually use. One example: If your prompt includes specific terms such as “Sigma 75 mm camera lens,” Parsons says, then the AI doesn’t just create that specific look made by the lens; “it more broadly alludes to ‘the kind of photo where the lens appears in the description,’” which tends to be more professional and therefore yields higher-quality images. It’s this kind of multilevel mastery that produces spectacular results.
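
Spelled out as structure rather than prose, an above-average prompt is just those fields joined together. A hypothetical helper (the field names and example values are mine, not from the Prompt Book or any particular tool):

```python
# Hypothetical helper showing the anatomy of an above-average prompt.
# The field names and example values are my own, not from the Prompt
# Book or any particular tool.
def build_prompt(subject, lighting, viewpoint, mood, palette, style):
    return ", ".join([subject, lighting, viewpoint, mood, palette, style])

prompt = build_prompt(
    subject="a house built from glass test tubes",
    lighting="soft overcast light",
    viewpoint="low-angle wide shot, Sigma 75mm camera lens",
    mood="quiet, hopeful",
    palette="muted greens and steel blue",
    style="architectural photography, highly detailed",
)
print(prompt)
```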

For technical reasons, even if you repeat the exact same prompt, you are unlikely to get the same image. There is a randomly generated seed for each image, without which it is statistically impossible to replicate. Additionally, the same prompt given to different AI engines produces different images—Midjourney’s are more painterly, while DALL-E is optimized for photographic realism. Still, not every promptor wishes to share their secrets. The natural reaction upon seeing a particularly brilliant image is to ask, “What spell did you use?” What was the prompt? Robyn Miller, cocreator of the legendary game Myst and a pioneering digital artist, has been posting an AI-generated image every day. “When people ask me what prompt I used,” he says, “I have been surprised that I don’t want to tell them. There is an art to this, and that has also surprised me.” Klingemann is famous for not sharing his prompts. “I believe all images already exist,” he says. “You don’t make them, you find them. If you get somewhere by clever prompting, I do not see why I want to invite everybody else there.”
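
The seed is also the escape hatch: pin it, and the same engine with the same settings should repaint the same picture. A sketch under the same diffusers assumptions as the earlier ones; the seed and prompt are illustrative.

```python
# Why identical prompts rarely repeat: each run draws a fresh random
# seed. Pinning the seed (same model, same settings, same hardware)
# should reproduce the image. Assumes the same diffusers setup as in
# the earlier sketches; the seed and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a fox in a snowstorm, woodcut"
a = pipe(prompt, generator=torch.Generator("cuda").manual_seed(1234)).images[0]
b = pipe(prompt, generator=torch.Generator("cuda").manual_seed(1234)).images[0]
# a and b should match; omit the generator and every run differs.
```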

It seems obvious to me that promptors are making true art. What is a consummate movie director—like Hitchcock, like Kurosawa—but a promptor of actors, actions, scenes, ideas? Good image-generator promptors are engaged in a similar craft, and it is no stretch for them to try and sell their creations in art galleries or enter them into art contests. This summer, Jason Allen won first place in the digital art category at the Colorado State Fair Fine Art competition for a large, space-opera-themed canvas that was signed “Jason Allen via Midjourney.” It’s a pretty cool picture that would’ve taken some effort to make no matter what tools were used. Usually images in the digital art category are created using Photoshop and Blender-type tools that enable the artist to dip into libraries of digitized objects, textures, and parts, which are then collaged together to form the scene. They are not drawn; these digital images are unapologetically technological assemblages. Collages are a venerable art form, and using AI to breed a collage is a natural evolution. If a 3D-rendered collage is art, then a Midjourney picture is art. As Allen told Vice, “I have been exploring a special prompt. I have created hundreds of images using it, and after many weeks of fine-tuning and curating my gens, I chose my top 3 and had them printed on canvas.”

Of course, Allen’s blue ribbon set off alarm bells. To some critics, this was a sign of the end times, the end of art, the end of human artists. Predictable lamentations ensued, with many pointing out how unfair it felt for struggling artists. The AIs are not only going to take over and kill us all—they are, apparently, going to make the world’s best art while doing so.

AT ITS BIRTH, every new technology ignites a Tech Panic Cycle. There are seven phases:

  • Don’t bother me with this nonsense. It will never work.
  • OK, it is happening, but it’s dangerous, ’cause it doesn’t work well.
  • Wait, it works too well. We need to hobble it. Do something!
  • This stuff is so powerful that it’s not fair to those without access to it.
  • Now it’s everywhere, and there is no way to escape it. Not fair.
  • I am going to give it up. For a month.
  • Let’s focus on the real problem—which is the next current thing.

Today, in the case of AI image generators, an emerging band of very tech-savvy artists and photographers are working out of a Level 3 panic. In a reactive, third-person, hypothetical way, they fear other people (but never themselves) might lose their jobs. Getty Images, the premier agency selling stock photos and illustrations for design and editorial use, has already banned AI-generated images; certain artists who post their work on DeviantArt have demanded a similar ban. There are well-intentioned demands to identify AI art with a label and to segregate it from “real” art.

Beyond that, some artists want assurances that their own work not be used to train the AIs. But this is typical of Level 3 panic—in that it is, at best, misguided. The algorithms are exposed to 6 billion images with attendant text. If you are not an influential artist, removing your work makes zero difference. A generated picture will look exactly the same with or without your work in the training set. But even if you are an influential artist, removing your images still won’t matter. Because your style has affected the work of others—the definition of influence—your influence will remain even if your images are removed. Imagine if we removed all of Van Gogh’s pictures from the training set. The style of Van Gogh would still be embedded in the vast ocean of images created by those who have imitated or been influenced by him.

Styles are summoned via prompts, as in: “in the style of Van Gogh.” Some unhappy artists would rather their names be censored and not permitted to be used as a prompt. So even if their influence can’t be removed, you can’t reach it because their name is off-limits. As we know from all previous attempts at censoring, these kinds of speech bans are easy to work around; you can misspell a name, or simply describe the style in words. I found, for example, that I could generate detailed black-and-white natural landscape photographs with majestic lighting and prominent foregrounds—without ever using Ansel Adams’ name.

There is another motivation for an artist to remove themselves. They might fear that a big corporation will make money off of their work, and their contribution won’t be compensated. But we don’t compensate human artists for their influence on other human artists. Take David Hockney, one of the highest-paid living artists. Hockney often acknowledges the great influence other living artists have on his work. As a society, we don’t expect him (or others) to write checks to his influences, even though he could. It’s a stretch to think AIs should pay their influencers. The “tax” that successful artists pay for their success is their unpaid influence on the success of others.

What’s more, lines of influence are famously blurred, ephemeral, and imprecise. We are all influenced by everything around us, to degrees we are not aware of and certainly can’t quantify. When we write a memo or snap a picture with our phone, to what extent have we been influenced—directly or indirectly—by Ernest Hemingway or Dorothea Lange? It’s impossible to unravel our influences when we create something. It is likewise impossible to unravel the strands of influence in the AI image universe. We could theoretically construct a system to pay money earned by the AI to artists in the training set, but we’d have to recognize that this credit would be made arbitrarily (unfairly) and that the actual compensatory amounts per artist in a pool of 6 billion shares would be so trivial as to be nonsensical.
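
The triviality is easy to check. Every figure below is invented except the 6 billion, which comes from the paragraph above; the point survives any plausible choice of numbers.

```python
# Back-of-envelope check, with invented numbers. Only the 6 billion
# figure comes from the article; the pool size is hypothetical.
pool = 100_000_000         # hypothetical annual royalty pool, dollars
images = 6_000_000_000     # images in the training set
per_image = pool / images

print(f"${per_image:.5f} per training image per year")   # $0.01667
print(f"${per_image * 1_000:.2f} per year for an artist "
      f"with 1,000 images in the set")                    # $16.67
```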

In the coming years, the computational engine inside an AI image generator will continue to expand and improve until it becomes a central node in whatever we do visually. It will have literally seen everything and know all styles, and it will paint, imagine, and generate just about anything we need. It will become a visual search engine, and a visual encyclopedia with which to understand images, and the primary tool we use with our most important sense, our sight. Right now, every neural net algorithm running deep in the AIs relies on massive amounts of data—thus the billions of images needed to train it. But in the next decade, we’ll have operational AI that relies on far fewer examples to learn, perhaps as few as 10,000. We’ll teach even more powerful AI image generators how to paint by showing them thousands of carefully curated, highly selected images of existing art, and when this point comes, artists of all backgrounds will be fighting one another to be included in the training set. If an artist is in the main pool, their influence will be shared and felt by all, while those not included must overcome the primary obstacle for any artist: not piracy, but obscurity.

AS SOON AS 2D generative algorithms were born, experimenters rushed to figure out what was next. Jensen Huang, the ambitious cofounder of Nvidia, believes the next generation of chips will generate 3D worlds for the metaverse—“the next computing platform,” as he calls it. In a single week this past September, three novel text-to-3D/video image generators were announced: GET3D (Nvidia), Make-A-Video (Meta), and DreamFusion (Google). The expansion is happening faster than I can write. Amazing as frameable 2D pictures produced by AI are, outsourcing their creation is not going to radically change the world. We are already at peak 2D. The genuine superpower being released by AI image generators will be in producing 3D images and video.

A future prompt for a 3D engine might look something like this: “Create the messy bedroom of a teenager, with posters on the wall, an unmade bed, and afternoon sunlight streaming through closed blinds.” And in seconds, a fully realized room is born, the closet door open and all the dirty clothes on the floor—in full 3D. Then, tell the AI: “Make a 1970s kitchen with refrigerator magnets and all the cereal boxes in the pantry. In full volumetric detail. One that you could walk through. Or that could be photographed in a video.” Games crammed with alternatively rendered worlds and full-length movies decked out with costumes and sets have eternally been out of reach for individual artists, who remain under the power of large dollars. AI could make games, metaverses, and movies as quick to produce as novels, paintings, and songs. Pixar films in an instant! Once millions of amateurs are churning out billions of movies and endless metaverses at home, they will hatch entirely new media genres—virtual tourism, spatial memes—with their own native geniuses. And when big dollars and professionals are equipped with these new tools, we’ll see masterpieces at a level of complexity never seen before.

But even the vast universes of 3D worlds and video are not vast enough to contain the disruption that AI image generators have initiated. DALL-E, Midjourney, and Stable Diffusion are just the first versions of generative machines of all types. Their prime function, pattern recognition, is almost a reflex for human brains, something we accomplish without conscious thinking. It is at the core of almost everything we do. Our thinking is more complex than just pattern recognition, of course; dozens of cognitive functions animate our brain. But this single type of cognition, synthesized in machines (and the only cognition we have synthesized so far), has taken us further than we first thought—and will probably continue to advance further than we now think.

When an AI notices a pattern, it stores it in a compressed way. Round objects are placed in a “roundness” direction, red objects in another direction for “redness,” and so on. Maybe it notices “treeness” and “foodness” too. It abstracts out billions of directions, or patterns. Upon reflection—or training—it notices that the overlap of these four qualities produces “appleness,” yet another direction. Furthermore, it links all these noticed directions with word patterns, which can also share overlapping qualities. So when a human requests a picture of an apple via the word “apple,” the AI paints an image with those four (or more) qualities. It is not assembling bits of existing pictures; rather, it is “imagining” a new picture with the appropriate qualities. It sort of remembers a picture that does not exist but could.
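
A toy numerical version of those "directions" makes the idea concrete. Real models learn this geometry from data across thousands of dimensions; the four hand-built axes below are purely illustrative.

```python
# Toy model of "directions" in an embedding space. Real models learn
# thousands of dimensions from data; these four hand-built axes are
# purely illustrative.
import numpy as np

axes = {"roundness": 0, "redness": 1, "treeness": 2, "foodness": 3}

def concept(**weights):
    """Build a unit vector from named quality weights."""
    v = np.zeros(len(axes))
    for name, w in weights.items():
        v[axes[name]] = w
    return v / np.linalg.norm(v)

appleness = concept(roundness=1, redness=1, treeness=1, foodness=1)
wheel = concept(roundness=1, redness=0.3)

# Cosine similarity of unit vectors: "apple" overlaps all four
# qualities; "wheel" is round and a bit red, but not tree or food.
print(appleness @ appleness)  # close to 1.0
print(appleness @ wheel)      # noticeably lower (about 0.62)
```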

This same technique can be used—in fact, is already being used, in very early forms—to find new drugs. The AI is trained on a database of all the molecules we know to be active medicines, noticing patterns in their chemical structures. Then the AI is asked to “remember” or imagine molecules we have never thought of that seem to be similar to the molecules that work. Wonderfully, some of them actually do work, just as an AI image of a requested imaginary fruit can look remarkably like a fruit. This is the real transformation, and soon enough, the same technique will be used to help design automobiles, draft laws, write code, compose soundtracks, assemble worlds to entertain and instruct, and cocreate the stuff we do as work. We should take to heart the lessons we’ve learned so far from AI image generators because there will soon be more pattern-seeking AIs in all realms of life. The panic cycle we presently face is simply a good rehearsal for the coming shift.

What we know about AI generators so far is that they work best as partners. The nightmare of a rogue AI taking over is just not happening. That vision is fundamentally a misreading of history. In the past, technology has rarely directly displaced humans from work they wanted to do. For instance, the automatic generation of pictures by a machine—called a camera—was feared in the 1800s because it would surely put portrait painters out of business. But the historian Hans Rooseboom could find only a single portrait painter from that time who felt unemployed by photography. (Photography actually inspired a resurgence of painting later in that century.) Closer to our time, we might have expected professional occupations in photography to fall as the smartphone swallowed the world and everybody became a photographer—with 95 million uploads to Instagram a day and counting. Yet the number of photography professionals in the US has been slowly rising, from 160,000 in 2002 (before camera phones) to 230,000 in 2021.

Instead of fearing AI, we are better served thinking about what it teaches us. And the most important thing AI image generators teach us is this: Creativity is not some supernatural force. It is something that can be synthesized, amplified, and manipulated. It turns out that we didn’t need to achieve intelligence in order to hatch creativity. Creativity is more elemental than we thought. It is independent of consciousness. We can generate creativity in something as dumb as a deep learning neural net. Massive data plus pattern recognition algorithms seems sufficient to engineer a process that will surprise and aid us without ceasing.

Scholars of creativity refer to something called Uppercase Creativity. Uppercase Creativity is the stunning, field-changing, world-altering rearrangement that a major breakthrough brings. Think special relativity, the discovery of DNA, or Picasso’s Guernica. Uppercase Creativity goes beyond the merely new. It is special, and it is rare. It touches us humans in a profound way, far beyond what an alien AI can fathom.

To connect with a human deeply will always require a Creative human in the loop. This high creativity, however, should not be confused with the creativity that most human artists, designers, and inventors produce day to day. Mundane, ordinary, lowercase creativity is what we get with a great new logo design or a cool book cover, a nifty digital wearable or the latest must-have fashion, or the set design for our favorite sci-fi serial. Most human art, past and present, is lowercase. And lowercase creativity is exactly what the AI generators deliver.

But this is huge. For the first time in history, humans can conjure up everyday acts of creativity on demand, in real time, at scale, for cheap. Synthetic creativity is a commodity now. Ancient philosophers will turn in their graves, but it turns out that to make creativity—to generate something new—all you need is the right code. We can insert it into tiny devices that are presently inert, or we can apply creativity to large statistical models, or embed creativity in drug discovery routines. What else can we use synthetic creativity for? We may feel a little bit like medieval peasants who are being asked, “What would you do if you had the power of 250 horses at your fingertips?” We dunno. It’s an extraordinary gift. What we do know is we now have easy engines of creativity, which we can aim into stale corners that have never seen novelty, innovation, or the wow of creative change. Against the background of everything that breaks down, this superpower can help us extend the wow indefinitely. Used properly, we can make a small dent in the universe.
