Here is the newest episode of the “Blueberry Thoughts” podcast. Listen to it on your favorite podcasting app or right here on this page. Enjoy!
Hello and welcome back to Blueberry Thoughts, your favorite spot at the intersection of human creativity and artificial intelligence! I am Ivor J. Burks, your AI-enhanced host. In the previous episode, we mentioned Ted Chiang’s New Yorker article, “Will AI Become the New McKinsey?”. In it, he compared the burgeoning generative AI to the firm McKinsey, suggesting a potential trajectory of these AIs becoming ‘capital’s willing executioners’, prioritizing company goals at the cost of employee welfare and societal balance.
In another engaging article, published in The New Yorker in February 2023 under the title “ChatGPT Is a Blurry JPEG of the Web”, Chiang draws an intriguing analogy between AI language models like ChatGPT and lossy digital compression, exemplified by JPEG images.
Chiang recounts an anecdote involving a modern Xerox photocopier, which uses lossy compression, and an error it made when copying the room-area figures on a floor plan. Because the copier’s compression could not tell similar-looking digits apart, it stored a single copy of one digit and reused it elsewhere, producing a subtly wrong photocopy. This incident serves as a metaphor for how ChatGPT operates: it holds a “compressed” rendition of the internet’s text, retaining the bulk of the information but discarding some of the nuances.
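To make the “lossy” idea concrete, here is a toy sketch in Python. This is not the copier’s actual compression algorithm, just an illustration of the general principle: when compression quantizes values too aggressively, two nearby numbers can collapse into one, and the distinction is lost forever.

```python
def lossy_round(value, step):
    """Quantize a number to the nearest multiple of `step`.

    This is a deliberately crude stand-in for lossy compression:
    values closer together than `step` become indistinguishable.
    """
    return round(value / step) * step

# Three room areas from a hypothetical floor plan, in square meters.
areas = [14.13, 17.42, 17.43]

# After aggressive quantization, the last two areas collapse into one value.
compressed = [lossy_round(a, 0.5) for a in areas]
print(compressed)  # the two distinct 17.4x values both become 17.5
```

Just as in Chiang’s anecdote, the “copy” looks plausible at a glance, but details that fell below the compression threshold are simply gone.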
Chiang suggests that we consider AI models as a form of text compressor. These models identify statistical regularities in the text but often struggle with comprehending the core concepts. Case in point, ChatGPT’s frequent stumbling over arithmetic calculations involving larger numbers.
Through the lens of Chiang’s analogy, the quirks of AI models, sometimes seen as ‘hallucinations’, become comparable to compression artifacts. The real understanding comes from comparing these “compressed” outputs against the original data or our own knowledge.
In this episode, we’ll strive to use everyday language that’s easily understood. In doing so, we might refer to Large Language Models as ‘understanding,’ ‘learning,’ or ‘having conversations.’ It’s crucial, however, to remember that at their core, these models are computer programs, without any genuine human-level understanding. They interpret and generate text based on patterns and statistical probabilities, not on genuine comprehension or thought. This is why some people refer to the current wave of artificial intelligence as ‘applied statistics.’ So, while we’ll use relatable language to describe how LLMs work, let’s bear in mind the inherently statistical nature of these models.
We’ll try to demystify how GPT models work and touch upon the history of generative AI and the technological advances that led us here. Whether it’s commercial giants or open-source novices, LLMs have been the talk of the town in 2023. As always, the full transcript of this episode and sources are available on our blog at blueberrythoughts.com. Welcome!
Our journey begins in the mid-20th century, with the term ‘Artificial Intelligence’ first coined by John McCarthy for the Dartmouth conference in 1956. However, the idea of intelligent machines can be traced back to Alan Turing, who in 1950 proposed the famous Turing Test as a way to assess machine intelligence. Visionaries like Vannevar Bush and J.C.R. Licklider also contributed to a vision of a future where humans and intelligent machines coexist. Fast forward to the 1980s, when neural networks enjoyed a major revival.
Pioneered by the ‘Godfathers of AI,’ Geoff Hinton, Yoshua Bengio, and Yann LeCun, deep learning, a subfield of AI focusing on algorithms based on artificial neural networks, thrust us into the modern AI era. Their research allowed machines to understand and generate human-like text, paving the way for the birth of Large Language Models.
The AI world has seen significant innovation and progress, particularly in Generative AI and Large Language Models, over the last decade. Three primary factors have driven these advances: increased computing power, the availability of large datasets, and considerable strides in machine learning algorithms.
The late Gordon Moore, who passed away in 2023, famously predicted that computing capabilities would double roughly every two years. This tenet held sway for several decades, driving technological growth and innovation. But keep in mind that Moore’s law already describes exponential growth, and recent AI has been outpacing even that: by some estimates, the compute used to train state-of-the-art AI models has doubled every six months or so in recent years, far outstripping Moore’s curve.
The surging computing power has enabled AI to efficiently manage intricate calculations and immense volumes of data. Simultaneously, the rapid digitization of our world has led to an avalanche of data accessible for training AI models. Furthermore, machine learning, the backbone of AI, has undergone several ground-breaking developments, especially with the creation of transformer-based models such as the GPT series.
We now have AI models like GPT-4 and diffusion models such as Stable Diffusion, capable of impressive generative tasks – from writing a news article or a poem to composing music, painting pictures, and even carrying on human-like conversations. But what enables this? Let’s delve deeper into the workings of ChatGPT, the most famous LLM-based chatbot to date.
ChatGPT communicates like a human using the principles of natural language processing (NLP). Imagine NLP as the AI’s language teacher, enabling it to understand and generate responses to our conversations. At the core of ChatGPT is a foundational model known as GPT (Generative Pre-trained Transformer), a type of Large Language Model (LLM).
LLMs are essentially vast, well-read brains that learn from mountains of text data. The more they read, the better they understand the patterns and quirks of our language, enabling them to generate new, meaningful sentences or comprehend complex queries.
OpenAI has released a series of increasingly large and capable GPT models over the years. GPT-1 had a respectable 117 million parameters, while GPT-2, famously referred to as the “AI that was too dangerous to release,” was equipped with 1.5 billion parameters. GPT-3 took it even further with an astonishing 175 billion parameters!
In the gap between GPT-3 and GPT-4, OpenAI revealed an intermediate model, GPT-3.5-Turbo. This variant didn’t significantly boost the parameter count, staying roughly in GPT-3’s league. Yet, it made considerable strides in enhancing its capabilities.
GPT-3.5-Turbo embodied major advancements in training techniques and abilities, signifying a notable leap in fine-tuning AI models. This showed us that performance improvements don’t hinge solely on increasing the model’s size. It marked a vital stepping stone towards GPT-4’s staggering capabilities.
The current model, GPT-4, which is accessible to paid subscribers of ChatGPT and reportedly powers Microsoft’s Bing AI, takes us to new heights, with a parameter count rumored to be in the trillions! Interestingly, there’s speculation in tech circles that GPT-4 might not be one single massive model but a coordinated ensemble of smaller models, each honed for a particular task and working in unison – an arrangement often called a mixture of experts.
This growth in model size emphasizes the enormous computational power and complexity required to emulate human-like text. But as we’ll later explore in this episode, size is not the only gauge of a model’s prowess.
Now, let’s delve deeper. Machine learning fuels this AI brain. Like how we learned to identify different breeds and sizes of dogs in our childhood, machine learning educates AI models using abundant text data, teaching them to comprehend the intricacies of language.
This AI brain’s neurons are arranged in interconnected networks loosely modeled on our own brains: artificial neural networks. During training, these networks repeatedly predict the next word in a sentence from the preceding ones, learning from their errors through a method called back-propagation.
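To see what “predicting the next word from the preceding ones” means, here is a deliberately simplified sketch. Real LLMs use deep neural networks trained with back-propagation; this toy uses plain word-pair counts purely to illustrate the prediction task itself. The tiny corpus is made up.

```python
from collections import Counter, defaultdict

# A made-up miniature "training corpus".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = follows[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("the"))  # → ('cat', 0.5): "cat" follows "the" in 2 of 4 cases
```

A real language model does the same job, but over tens of thousands of tokens of context, with learned weights instead of raw counts.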
Then enters the game-changer – the Transformer. This supercharged neural network excels at understanding context in language. It considers not just adjacent words, but the entire sentence, paragraph, or even the whole text to grasp a single word’s meaning. For example, when encountering the word “hungry” in “Even though I already ate dinner, I’m still hungry,” it takes into account that you’ve dined but still want to eat more.
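The mechanism behind this context-awareness is called attention. As a rough sketch (with made-up two-dimensional “embeddings” standing in for real learned vectors), scaled dot-product attention scores every word in the context against the word being processed, turns those scores into weights, and blends the context accordingly:

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each key against the query (higher = more relevant context).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to the attention weights.
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return blended, weights

# Toy vectors: the first key points the same way as the query,
# so it receives the largest attention weight.
keys = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
blended, weights = attention([1.0, 0.0], keys, keys)
print(weights)
```

In a Transformer, this weighting happens in many “heads” and layers at once, which is how “hungry” can end up attending to “already ate dinner” several words away.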
Within this extensive AI cosmos, ‘tokens’ are fundamental language units. Tokens symbolize pieces of text, facilitating the efficient encoding of information. Depending on the language and context, these tokens can range from a single character to a full word.
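Real tokenizers, such as byte-pair encoding, learn their vocabularies from data; the following toy sketch uses a tiny hand-picked vocabulary just to show how one word can split into several sub-word tokens, falling back to single characters when nothing longer matches:

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until one matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# A made-up vocabulary for illustration only.
vocab = {"token", "iz", "ation", "the", "cat"}
print(tokenize("tokenization", vocab))  # → ['token', 'iz', 'ation']
```

This is why token counts don’t map one-to-one onto word counts: a common word may be a single token, while a rare word splits into several pieces.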
But how does ChatGPT master the art of conversation? Its training involves two robust stages. Initially, supervised fine-tuning occurs, where human trainers play both the user and the AI assistant, perfecting the AI’s responses. Following this is reinforcement learning from human feedback (RLHF), where trainers rank alternative responses to train a reward model, which then steers ChatGPT towards generating highly rated responses.
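The ranking stage is commonly formalized with a Bradley-Terry style preference model: the probability that a human prefers response A over response B depends on the difference between the reward model’s scores for the two. A minimal sketch, with made-up reward values:

```python
import math

def preference_prob(reward_a, reward_b):
    """Bradley-Terry model: probability a human prefers response A over B,
    given the reward model's scalar scores for each response."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# A response scored 2.0 is preferred over one scored 0.0 about 88% of the time.
print(preference_prob(2.0, 0.0))
```

Training the reward model means nudging its scores so these predicted preferences match the rankings the human trainers actually gave.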
Despite being a cutting-edge language model, ChatGPT, like all AI models, has its peculiarities. A notable one is the occasional hallucination: output that sounds confident and fluent but is factually wrong or outright nonsensical. Fear not, we’ve got strategies to sidestep such blunders:
Be clear: Aim for directness. The less ambiguity, the better the response.
Set boundaries: use delimiters, such as quotes or tags, to mark off parts of your prompt, reducing potential confusion.
Brevity is key: Attempt to compact your query into a single, focused sentence. Consider this your “ChatGPT tweet.”
Establish guidelines: Employ condition checking to prevent ChatGPT from creating fiction when uncertain about the answer.
Example-driven: Incorporating examples in your prompts can guide the chatbot to respond as desired.
Sequential approach: If you need ChatGPT to perform a specific task, present the steps in your prompt—think of it as a recipe.
Persistence pays: Don’t worry if the first try isn’t perfect. Enhancing your prompts improves responses. Practice might not make perfect, but it certainly aids! Employing these techniques will enhance your chances of avoiding hallucinations and boosting the reliability of ChatGPT’s responses.
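As a purely hypothetical illustration (the article text and wording here are made up), several of the tips above, clear instructions, delimiters, a condition for uncertainty, and a worked example, might combine into a single prompt template like this:

```python
# A hypothetical prompt template combining several of the tips above.
prompt = """Summarize the article between the <article> tags in one sentence.
If the article does not mention a release date, reply "No date given" rather than guessing.

Example:
<article>GPT-4 launched in March 2023 with improved reasoning.</article>
Summary: GPT-4, released in March 2023, improved on its predecessor's reasoning.

<article>{article_text}</article>
Summary:"""

filled = prompt.format(article_text="Orca is a small language model from Microsoft.")
print(filled)
```

The delimiters keep the model from confusing instructions with content, the condition discourages invented facts, and the example shows the desired output format.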
While ChatGPT’s base version is already impressive, an enhanced, paid variant packs even more punch. It includes the more powerful GPT-4 model as well as an array of plugins designed to support ChatGPT in correctly executing tasks, like one that connects to Wolfram Alpha for complex calculations.
The perks don’t stop there! The premium version also offered internet browsing, enabling ChatGPT to scour the web for answers. However, this feature was recently deactivated due to concerns raised by content creators and media companies about potential paywall circumvention.
The rising significance of Large Language Models has triggered an array of commercial LLM services from tech veterans and ambitious startups alike. In parallel with cloud service providers’ role in offering scalable computing resources, these LLM services present robust and scalable language processing solutions.
After OpenAI’s limited 2020 release of GPT-3, early adopters AI21 and Cohere brought their own LLM services to market in August and November 2021, respectively. However, it was the late-November 2022 launch of ChatGPT that acted as the catalyst for a wave of new entrants.
Feeling the competitive heat from OpenAI and Microsoft, Google made a somewhat reluctant foray into the field with “Bard” in March 2023. Anthropic, an AI-safety-focused startup, made waves the same month with the unveiling of “Claude.”
Joining this growing cadre of LLM services was Inflection AI with its June 2023 launch of “Inflection-1,” offering yet another compelling choice for businesses.
Meanwhile, the open-source community has not been idle: several newcomers have emerged rapidly over the past few months. In the final segment of this episode, we will delve into the vibrant world of open-source language models. Stick around, we’ll return after a short break!
In this final segment, we’re shifting our gaze from the giants to the underdogs—from LLMs to SLMs. No, not “super” language models, but the dark horses: “small” language models! 2023 has been quite the whirlwind in the AI sphere: the tech behemoths’ monopoly over the large language model market has quickly dissolved amid the ascent of the open-source “little ones”.
Tech giants had fortified their fortresses, or as we insiders say, “moats”, with extensive training data, model weights, and steep costs of training and operating large LLMs, deterring potential competition. But a gust of change began stirring from the open-source domain.
Open-source LLMs now present a compelling counter to the exclusivity of big tech. It’s like David found a method to outsmart Goliath. Industry leaders like OpenAI kept their secrets under lock and key, but the open-source community broke down barriers with leaner, more agile LLMs that could compete with the best.
Although BLOOM, a European initiative, still falls under the Large Language Model category with a staggering 176 billion parameters in its largest form, most newer models are better classified as SLMs. The scene burst into life following the leak and subsequent release of Meta AI’s LLaMA model, which was quickly adapted into several derivative language models. These open-source models, named after South American camelids like Alpaca, Vicuna, and Guanaco, proved that smaller models, given the right training, can perform impressively. Other initiatives like Falcon have been available for commercial use from the outset, offering tangible alternatives for companies eager to incorporate the technology into their operations. A notable feature of these models is that many have been fine-tuned on responses generated by GPT-4 or ChatGPT, essentially learning from their more robust counterparts.
These language models aim to be affordable, efficient, and customizable, thus appealing to a broad audience. Methods like low-rank adaptation (LoRA) help manage costs. For the AI aficionados out there, it’s akin to owning a Lamborghini on a Toyota budget. Steps have been taken to make these models accessible by optimizing them for less powerful hardware, with some even being operable on a smartphone. The advent of these “personal” language models opens up a promising avenue to tackle privacy concerns, as data processing happens directly on the user’s device, reducing the risk of sharing sensitive information with external servers.
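To give a flavor of the idea behind low-rank adaptation, here is a minimal hand-rolled sketch with toy numbers and no training loop. Rather than updating a full frozen weight matrix W, LoRA learns two small matrices B and A whose product forms a low-rank update, drastically shrinking the number of trainable parameters:

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(W, A, B, x, alpha=1.0):
    """Apply the adapted weights (W + alpha * B @ A) to input vector x.

    W stays frozen; only the small factors B (d_out x r) and A (r x d_in)
    would be trained. In real LoRA, B starts at zero so training begins
    exactly at the pretrained behavior; it is nonzero here for illustration.
    """
    delta = matmul(B, A)
    return [sum((w + alpha * d) * xi for w, d, xi in zip(w_row, d_row, x))
            for w_row, d_row in zip(W, delta)]

# Toy frozen 2x2 weights and a rank-1 update (4 weights replaced by 2 + 2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.5]]
A = [[1.0, 1.0]]
print(lora_forward(W, A, B, [1.0, 2.0]))  # → [2.5, 3.5]
```

The savings scale dramatically: for a 4096 by 4096 layer, a rank-8 update trains roughly 65 thousand parameters instead of almost 17 million.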
Big tech firms were caught off guard, akin to a surprise jab. Even Google, in a leaked memo, had to concede the threat posed by these affordable, innovative models. The open-source community has done for AI what it’s been doing successfully for the web—democratizing access and unveiling thrilling new possibilities.
Microsoft also recently played a wildcard, echoing Apple’s “think different” strategy. They revealed Orca, a Small Language Model (SLM), that’s akin to a compact car with the horsepower of a pickup truck. Unlike the heavyweights like GPT-4, Orca is compact, efficient, and yet, astonishingly potent. It’s like we’ve transitioned from “bigger is better” to championing the “small but mighty”.
What sets Orca apart is its capability to mimic not just the patterns of LLMs, but also their reasoning process. It’s akin to having a pocket-sized dictionary that provides word meanings and also elucidates the nuances, context, and usage. Orca’s capacity to outperform its size without consuming the resources of larger models is making waves, if the buzz is to be believed.
In wrapping up today’s episode, we’ve journeyed through the inner workings of large language models, from their humble beginnings to the formidable capabilities of GPT-4, and the intriguing rise of the small, agile, and potent open-source models. It’s clear that the AI world is as diverse as it is dynamic, with the pendulum swinging from the era of ‘bigger is better’ to the era of ‘small but mighty.’ As we’ve seen, AI is no longer confined to the corridors of tech giants. The open-source realm has swung the doors wide open, offering a spectrum of cost-effective, efficient, and versatile models.
As we navigate the captivating realm of AI, it’s paramount to comprehend the intricacies of this technology. It’s this understanding that gives us the ability to fully leverage its potential while also grasping the valid criticism related to the ethical and societal challenges that come with it. That’s precisely why we’re dedicating this episode, and the next, to unpacking the inner workings of AI before we dive back into the exploration of broader implications.
So, what’s next in our AI odyssey? We’re broadening our horizons beyond the world of language and diving into a more visual segment of generative AI. Our forthcoming episode, titled “A Diffusion Confusion – Painting Pictures with AI,” will explore image generating models. We will also glance at how AI models are used to generate videos and 3D models, even taking a brief detour into the domain of multimodal models.
From creating dreamy, impressionistic artwork to generating videos and rendering three-dimensional worlds that only exist in our imagination, generative AI is painting a colorful canvas of possibilities. We’ll decode the technology, tackle the confusion, and lay out the exciting panorama of this realm in our next episode of Blueberry Thoughts. Until then, I’m Ivor J. Burks. Stay curious.