Exactly one year ago, my book, ‘Blueberry & the Bear and Other Stories,’ made its debut. It holds the distinction of being the world’s first collection of bedtime stories authored using ChatGPT, with illustrations generated by DALL·E 2. In practice, I collaborated with a Large Language Model as my co-author and a Diffusion Model as my illustrator.
When I first introduced ‘Blueberry & the Bear’ to the world, it was met with a mixture of awe and confusion – a reaction that persists in various forms even today, and one that has greeted the broader introduction of AI across many sectors. Such reactions are completely understandable given the groundbreaking nature of recent AI advancements. For the first time in history, domains traditionally dominated by human intellect and creativity are being enhanced and challenged by artificial intelligence. Whether referred to as machines, programs, or AI entities, these innovations have shown remarkable capabilities. They can now engage in meaningful conversations, answer complex questions, speak to us in realistic voices, and create strikingly realistic images and videos from simple prompts. This represents a monumental shift into a new era of interaction between humans and machines.
My fascination with Artificial Intelligence began a few years ago, but in truth, it was a natural extension of my longstanding awe of and curiosity about digital technology—a passion that took root in my childhood. This technological interest has always been intertwined with my love for the humanistic domain, including literature, art, and music. I’ve always perceived technology as an integral part of culture, not merely as an external force. This perspective might be considered unorthodox by some. I view artificial intelligence not just as technology being developed and introduced into the world, but also as a potent, culture-shaping force. Its influence, I believe, is often significantly underestimated.
Over the past year, I’ve observed the formidable force of AI creating ripples that could soon swell into tsunamis of change. The progress in foundational model science, their practical applications, and the efforts to regulate them have made 2023 a year brimming with excitement and innovation. Additionally, advancements in areas tangentially related to AI, like 3D scanning and robotics, have further fueled this dynamic environment.
The pace of advancements in AI can be best described as ‘exponential.’ This year alone, we’ve seen progress that seemed like a distant dream just a few years back. For those immersed in the field, such strides are more a confirmation than a surprise. Consider, for example, the synergy between corporate entities like the OpenAI-Microsoft partnership initiated in 2016, where Microsoft’s investment in state-of-the-art data centers for AI model training has been reciprocated with access to these advanced models. This has led to a swift integration of OpenAI’s models into Microsoft’s entire product lineup in less than a year.
However, an equally remarkable story unfolds in the open-source realm. The open-source community has played a pivotal role in the democratization of AI technology. Breakthroughs from these groups have made it possible to run LLMs and diffusion models on local PCs or through cloud APIs, bringing sophisticated AI tools to a wider audience. This open-source revolution in AI not only accelerates innovation but also raises important questions about the advantages and challenges of widespread access to these powerful technologies.
Reflecting on the evolution of AI’s prominence: at a tech conference in late November 2022, AI was barely a topic of conversation. Fast forward to Microsoft’s Ignite 2023, and the narrative has shifted dramatically, focusing on Copilots and the vision for AI assistants. This change underscores the rapid integration of AI into mainstream technology and the significant contributions of both corporate and open-source initiatives in shaping the future of AI.
The most significant transformation, however, lies in our communication methods and, intriguingly, with whom we communicate. For the first time in history, we’re engaging in genuine conversations with non-human entities. Just a few months after ChatGPT’s global impact, we’ve progressed to having low-latency voice chats with AI agents like ChatGPT and Pi, mirroring real conversations. I’ve experienced this firsthand, calling both ChatGPT and Pi while driving, and the resemblance to speaking with a real person is striking. These interactions allow for learning about new topics or brainstorming ideas, often yielding surprisingly insightful feedback.
Despite this, I remain acutely aware that I’m not conversing with a human. I understand the underlying process: my voice is recorded, transcribed by a voice recognition model, and this text is then input into a Large Language Model (LLM) like ChatGPT, which generates a response. This response is converted back into audio by a voice generation model, creating an audio reply that sounds convincingly human. With some of the latest low-latency models and APIs, this entire process occurs almost in real time.
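The three-stage pipeline described above can be sketched in code. This is a minimal illustration, not any vendor’s actual API: the stage functions below are placeholder stubs standing in for a real speech-to-text model, LLM, and text-to-speech model.

```python
# Sketch of a voice-chat pipeline: audio in -> text -> reply -> audio out.
# All three stage functions are hypothetical stubs; a real system would
# call a speech-recognition model, an LLM, and a voice-generation model.

def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text turns the recorded voice into a transcript."""
    return audio.decode("utf-8")  # stub: pretend the audio is already text


def generate_reply(prompt: str) -> str:
    """Stage 2: the Large Language Model produces a text response."""
    return f"Reply to: {prompt}"  # stub


def synthesize(text: str) -> bytes:
    """Stage 3: text-to-speech renders the reply as human-sounding audio."""
    return text.encode("utf-8")  # stub


def voice_chat_turn(audio_in: bytes) -> bytes:
    """One conversational turn: audio in, audio out.

    End-to-end delay is the sum of the three stages, which is why
    low-latency models at every stage are what make the exchange
    feel like real-time conversation.
    """
    transcript = transcribe(audio_in)
    reply_text = generate_reply(transcript)
    return synthesize(reply_text)
```

Chaining the stages this way makes the latency budget explicit: shaving time off any one stage (or streaming partial results between them) directly shortens the pause the caller hears.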
However, even with this understanding, I sometimes find myself momentarily enthralled, as if I were interacting with a human. Our psychological mechanisms are wired for human interaction, and they require recalibration to adapt to this new form of communication with AI entities.
As I expressed in a newspaper article back in March, ‘we are no longer going to be able to trust what we read, see, or hear on our gadgets.’ My firsthand experiences with all the major image and video generation tools, witnessing their use and rapid improvement, have solidified this belief. Image generation models have achieved levels of realism that are nothing short of astonishing. Although AI in video generation is somewhat behind, it has recently made significant advancements, especially in face-swapping and animating avatars.
I’m generally cautious about using the word ‘revolutionary,’ but the upcoming changes in this domain are bound to feel just like that. Be it ‘metahumans,’ ‘synthetic people,’ or ‘AI agents,’ we’re on the brink of an explosion in the number of AI-powered digital entities that look and sound eerily human. The way we, as ‘real’ humans, choose to interact with these entities will be instrumental in shaping our future.
The effectiveness of the newly agreed-upon EU AI Act, along with other international regulations, in safeguarding us from the more sinister applications of AI technology remains to be seen. A quasi-religious divide is emerging between those advocating for the rapid development of full-on Artificial General Intelligence (AGI) and those who are either cautious about the brisk pace of development or outright warning of its potential to spell doom for our species.
Following the drama surrounding the OpenAI CEO or the sharp exchanges between leading AI experts can be quite the spectacle. But beyond the entertainment value, there’s a profound reality: this technology is poised to fundamentally change us. At a deep philosophical level, we are being confronted by our creations. Throughout history, we have conjured gods, often with mixed outcomes. Today, we are at it again, but this time, our creations have the potential to be far more tangible and seemingly real than the bearded figures of ancient holy texts.
As we stand at this crossroads, the decisions we make and the paths we choose in embracing and regulating AI will shape not only our technological landscape but also our very understanding of human identity and creativity.
Sous-chef – AI played a more prominent role in generating the text content, but still under close human guidance.