Is it time to talk about GPT-5? — The problem with transformers

GPT-5 is on the horizon and promises to shake up the industry. But are more parameters needed to create a more powerful model?

In the ever-evolving world of software development, artificial intelligence (AI) has emerged as a game changer. Its potential to revolutionize industries and drive business growth has caught the attention of CEOs, CFOs, and investors. As technology continues to advance at an unprecedented rate, a question arises: can AI be improved with raw power alone? In this article, we will explore the possibilities and implications of empowering AI by increasing its computational capabilities.

AI has evolved at an incredible rate, from early chatbots like Eliza to modern machine learning algorithms, and this rapid progression has been greatly supported by AI development services. AI is now capable of matching and even surpassing human intelligence in many areas. However, this potential comes at a great cost: more powerful AIs require more energy, as well as more computational capacity.

By adding more processing power to AI systems, engineers can unlock new levels of performance and achieve breakthrough results. This can be achieved through various means, such as utilizing high-performance computing clusters or leveraging cloud-based infrastructure.

Take GPT-3 and its family of models as an example. Where large language models (LLMs) are concerned, the standard yardstick for a model's capabilities seems to be its parameter count: the higher the number, the more powerful the AI. And while yes, size matters, parameters are not everything, and at some point we will face the engineering problem of requiring more processing power than we can provide.

Before we delve deeper, I want to draw a parallel with a subject that is close to my heart: video games and consoles. See, I'm a child of the 80s; I was there for the great console wars of the 90s. "Genesis does what Nintendon't" and all that jazz. At some point, consoles stopped marketing their sound capabilities or the quality of their colors and instead started talking about bits.

In essence, the more bits, the more powerful the console, and everyone was chasing the bigger number. This led companies to create some truly crazy architectures. It didn't matter how insane the hardware was, as long as they could market it as having more bits than the competition (ahem, Atari Jaguar).

This continued for a long time. Sega left the console market, Sony conquered the world with the PlayStation, Microsoft entered the competition with the Xbox, and at the heart of each generation, we still had the bits. In the PS2 era, we also started talking about polygons and teraflops; once again, it was all about the big numbers.

And then came the era of the PS3 and Xbox 360. Ah, the promise of realistic graphics, immersive sound, and more. Now it wasn't about bits; it was about how many polygons on screen, frames per second, storage capacity; once again, it was about the biggest number.

While the two console manufacturers faced off, a small alternative quietly appeared on the market: Nintendo's Wii. The Wii was a toy compared to the beasts that Sony and Microsoft brought to market, but Nintendo was smart. They targeted the casual audience, those who were not intoxicated by large numbers. The end result speaks for itself: during this console generation, the PS3 sold 80 million units, the Xbox 360 sold 84 million, and the Wii? 101 million units.

The little underdog conquered the market and all it took was a little creativity and ingenuity.

What do my ramblings have to do with the AI arms race? Well, as we will see, there is a very strong reason to be wary of bigger models, and it's not because they're going to take over the world.

Why do we want bigger models?

So what are the advantages of putting our models on bigger, more powerful hardware? Just as software developers can work miracles with a box of energy drinks, more RAM and more processing power give our models a boost that expands their computational possibilities.

Powering AI with more computing power means giving it greater resources to process data faster and more efficiently, whether through high-performance computing clusters or cloud-based infrastructure. By supercharging their AI systems, organizations can reach new levels of performance and achieve breakthrough results.

A significant advantage of empowering AI with greater computational capabilities, aided by machine learning services, is its ability to analyze large data sets in real time. With access to immense computing power, AI algorithms can quickly identify patterns and trends that might otherwise go unnoticed. This enables CEOs and CFOs to make faster, more informed decisions based on accurate insights derived from complex data sets.

Additionally, more powerful AI systems, including AI for software testing, have the potential to process complex patterns in data sets more effectively, leading to highly accurate predictions that help investors make informed decisions. With increased computing power, organizations can build predictive analytics models that provide valuable insights into market trends, customer behavior, and investment opportunities.

Ultimately, empowered AI has the ability to automate repetitive tasks at scale while maintaining accuracy and reducing operational costs for businesses. With greater computing power, organizations can deploy advanced automation solutions that streamline processes across multiple departments, such as finance, operations or customer service.

And this is all common sense, right? More power means more processing capacity, which translates into larger models and faster, more accurate results. However, while the potential benefits of boosting AI with more computing power are significant, there are several related issues that need to be considered:

  • Ethical considerations: As AI becomes more powerful, ethical concerns around invasion of privacy or biased decision-making may arise. Organizations must ensure transparency and accountability when implementing AI-enabled solutions to maintain trust and avoid potential pitfalls.
  • Environmental impact: Increasing computing power requires more energy consumption, which can have environmental implications. It is crucial that organizations balance the benefits of AI enablement with sustainable practices and explore ways to minimize their carbon footprint.

The problem with simply throwing more power at our models is that it's a bit like the dark side of the Force in Star Wars (I'm a geek...). Yes, it's a faster path to power, but it comes at a cost that may not be evident until it's too late.

Transformer Models: A Revolutionary Approach to AI

Just to build a little tension first, let's talk about transformer models and why they are so important to modern computing and machine learning. Let's explore the transformative power of transformers (pun intended) and their implications for business.

Transformer models are a type of deep learning architecture that uses self-attention mechanisms to efficiently process sequential data. In fact, attention is so important that the paper that introduced them was titled “Attention Is All You Need.”

To simplify a very complex subject, unlike traditional recurrent neural networks (RNNs) or convolutional neural networks (CNNs), transformers can capture long-range dependencies in data without relying on sequential processing. In other words, imagine you have a box full of photographs and you want to organize them chronologically.

One method would be to stack the photos and then look at each one in order, sorting them based on their relationship to their nearest neighbors. This could definitely work, but it has a major problem: you're never paying attention to the entire stack of photos, only a few at a time.

The second approach, the one reminiscent of transformers, involves laying out all the photos on the floor and looking at them all at once, figuring out which photos are closest to which based on colors, styles, content, and so on. See the difference? This approach pays attention to the whole context rather than analyzing sequentially.

This innovation paved the way for notable advances in natural language processing (NLP) tasks such as machine translation, sentiment analysis, and question answering.

An important advantage of transformer models is their ability to understand complex linguistic structures with exceptional accuracy. By leveraging self-attention mechanisms, these models can analyze relationships between words or phrases within a sentence more effectively than previous approaches.
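
To make the mechanism a little more concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation from “Attention Is All You Need.” It's a toy: the weight matrices are random stand-ins for what a trained model would learn, and there is a single attention head, but the math itself is the real thing.

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention (toy version).

    X: array of shape (seq_len, d_model), one embedding per token.
    The projection matrices are random here; training would learn them.
    """
    d_model = X.shape[1]
    rng = np.random.default_rng(0)
    W_q = rng.normal(size=(d_model, d_model))  # query projection
    W_k = rng.normal(size=(d_model, d_model))  # key projection
    W_v = rng.normal(size=(d_model, d_model))  # value projection

    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Every token scores its relevance to every other token at once:
    # the "all the photos on the floor" step, with no sequential scan.
    scores = Q @ K.T / np.sqrt(d_model)

    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output vector is a context-aware blend of all value vectors.
    return weights @ V

# Four "tokens" with 8-dimensional embeddings:
tokens = np.random.default_rng(1).normal(size=(4, 8))
print(self_attention(tokens).shape)  # (4, 8)
```

Note how nothing in the computation walks the sequence left to right; every token attends to every other token in one matrix multiplication, which is also what makes transformers so parallelizable on modern hardware.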

It's pretty simple when you put it like that, right? Context is everything in language, and transformers can be “aware” of more information than just a few words at a time, so they have more to go on when predicting the next word in a sentence. In other applications, such as sentiment analysis, that same context lets them identify sentiment towards a topic and even tell whether a comment is sarcastic.

Machine translation has always been a challenging task due to linguistic nuances and cultural differences between languages. However, transformer models have significantly improved translation quality by modeling global dependencies between words, rather than relying solely on local context as traditional approaches do. This innovation empowers companies operating globally with more accurate translations for their products, services, and marketing materials.

The Dark Side of Power: The Challenges of Scaling Transformer Models

While transformer models have revolutionized the field of AI and brought significant advances in language understanding, scaling these models to handle larger data sets and more complex tasks presents its own set of challenges.

Firstly, transformers are resource-intensive. As they grow in size and complexity, they require substantial computing resources to train and deploy effectively. Training large-scale transformer models requires high-performance computing clusters or cloud-based infrastructure with specialized hardware such as graphics processing units (GPUs) or tensor processing units (TPUs). This increased demand for computing power can pose financial constraints for organizations without adequate resources.

Just look at OpenAI and its GPT models. No one can deny how incredible these models are, but they come at a cost. The models run in data centers that would make old mainframes look like laptops by comparison. In fact, you can download any of the open source LLMs out there, try running it on your computer, and watch your RAM cry out in pain as the model gobbles it up.

And most open models are small compared to GPT-3.5 in terms of parameters. For example, Llama (Meta's LLM) and its open source cousins have somewhere around 40 billion parameters. Compare this to GPT-3's 175 billion parameters. And although OpenAI has chosen not to disclose how many parameters GPT-4 has, rumors put it at around 1 trillion.
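
Some back-of-the-envelope arithmetic makes those numbers tangible. The sketch below only counts the memory needed to hold the weights themselves at different numeric precisions; training adds gradients, optimizer state, and activations on top. The parameter counts are the public and rumored figures cited above.

```python
# Rough memory footprint of model weights alone: parameters x bytes each.
models = {
    "Llama-class open model": 40e9,   # ~40 billion parameters
    "GPT-3": 175e9,                   # 175 billion parameters
    "GPT-4 (rumored)": 1e12,          # ~1 trillion parameters
}

bytes_per_param = {"float32": 4, "float16": 2, "int8": 1}

for name, n_params in models.items():
    for dtype, nbytes in bytes_per_param.items():
        gib = n_params * nbytes / 2**30
        print(f"{name:24s} {dtype:8s} ~{gib:8,.0f} GiB")
```

Even at one byte per parameter, the rumored GPT-4 would need close to a terabyte of memory just to exist, which is why these models live in data centers rather than on laptops.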

Just to put it in perspective, Sam Altman, CEO of OpenAI, told the press that training GPT-4 cost around 100 million dollars. And bear in mind that this model uses data that had already been collected and pre-processed for the earlier models.

Scaling transformer models often requires access to large amounts of labeled training data. While some domains may have readily available datasets, others may require extensive efforts to manually collect or annotate data. Furthermore, ensuring the quality and diversity of training data is crucial to avoid bias or distorted representations in the model.

Recently, a class action lawsuit was filed against OpenAI for lack of transparency in data collection. Similar complaints have been raised by the EU. The theory is that, just as you can't make an omelet without breaking a few eggs, you can't build a trillion-parameter model without scooping up data indiscriminately along the way.

Larger transformer models tend to have a greater number of parameters, making them more difficult to optimize during training. Fine-tuning hyperparameters and optimizing model architectures become increasingly complex tasks as scale grows. Organizations must invest time and expertise in fine-tuning these parameters to achieve optimal performance, avoiding overfitting or underfitting issues.
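
To illustrate why this gets expensive, here is a hedged sketch of a plain random search over a hypothetical hyperparameter space. The specific knobs and the `evaluate` stub are illustrative, not anyone's actual recipe; the point is that each trial stands in for a full training run, and at large-model scale a single trial can cost days of compute.

```python
import random

# Hypothetical search space; real sweeps tune many more knobs.
search_space = {
    "learning_rate": [3e-5, 1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
    "warmup_steps": [500, 1000, 2000],
    "dropout": [0.0, 0.1, 0.2],
}

def evaluate(config):
    # Stub objective. In reality this is a full training run followed
    # by validation-set evaluation, i.e., the expensive part.
    return random.random()

best_score, best_config = float("-inf"), None
for _ in range(20):  # 20 trials; at large-model scale you can't afford many more
    config = {key: random.choice(values) for key, values in search_space.items()}
    score = evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config)
```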

Deploying scaled-up transformer models in production environments can be a difficult task due to their resource requirements and potential compatibility issues with existing infrastructure or software systems. Organizations need robust deployment strategies that ensure efficient utilization of computing resources while maintaining scalability and reliability.
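
What a “deployment strategy” means in practice depends on the stack, but the shape is usually the same: load the model once, keep it resident, and expose it behind a service boundary. Here is a minimal sketch assuming a FastAPI plus Hugging Face transformers stack; nothing in the article prescribes these tools, and the tiny distilgpt2 model is purely a stand-in.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load once at startup so every request reuses the same weights in
# memory; reloading per request would waste scarce RAM/GPU memory.
generator = pipeline("text-generation", model="distilgpt2")

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 50

@app.post("/generate")
def generate(prompt: Prompt):
    result = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": result[0]["generated_text"]}

# Run with, e.g.: uvicorn server:app --port 8000
# (assuming this file is named server.py)
```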

Open source strikes back

Competition in the world of AI has long been seen as a battleground between tech titans like Google and OpenAI. However, an unexpected competitor is quickly emerging: the open source community. A leaked memo from a Google engineer argues that open source has the potential to overshadow both Google and OpenAI in the race for AI dominance.

A significant advantage of open source platforms is the power of collaborative innovation. With the leak of Meta's capable base model, the open source community took a quantum leap. Individuals and research institutions around the world have rapidly developed improvements and modifications, some surpassing the developments of Google and OpenAI.

Thanks to its decentralized, open-to-all nature, the open source community has produced a wide and high-impact range of ideas and solutions. The models created by this community iterate on and improve existing solutions, something Google and OpenAI would do well to consider in their strategies.

Interestingly, the engineer in question also points to the fact that these open source models are being built with accessibility in mind. In contrast to the juggernaut that is GPT-4, some of these models produce impressive results and can be run on a powerful laptop. We can summarize their opinion on LLMs in five main points:

  1. Lack of flexibility and speed: Development of large models is slow, and it is difficult to make iterative improvements to them quickly. This hampers the pace of innovation and prevents quick reactions to new data sets and tasks.
  2. Costly retraining: Whenever a new application or idea emerges, large models often need to be retrained from scratch. This throws away not only the pre-training but also any improvements made on top of it. In the open source world, where such improvements accumulate quickly, full retraining becomes prohibitively expensive.
  3. Impediment to innovation: While large models may initially offer superior capabilities, their size and complexity can stifle rapid experimentation and innovation. The pace of improvement of smaller, rapidly iterated models in the open source community far exceeds that of larger models, and their best versions are already largely indistinguishable from large models like ChatGPT. Therefore, the focus on large models puts companies like Google at a disadvantage.
  4. Data scaling laws: Large models also depend heavily on the quantity of data rather than its quality. However, many open source projects are now training on small, highly curated datasets, which potentially challenges conventional wisdom about data scaling laws in machine learning.
  5. Restricted accessibility: Large models often require substantial computational resources, which limits their accessibility to a wider range of developers and researchers. This impedes the democratization of AI, a fundamental advantage of the open source community.

In other words, smaller models allow for faster iterations and, consequently, faster development. This is one of those cases where we can safely say that less is more. The experiments that the open source community is doing with these models are incredible and, as we mentioned in the fourth point, they are basically questioning many assumptions we have made so far about machine learning.
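
That accessibility point is easy to demonstrate. The hedged sketch below loads a small open model in 8-bit precision with Hugging Face transformers and the bitsandbytes library, cutting the memory needed per parameter to roughly a quarter of full precision. The model name is just an illustration; substitute whatever open model your hardware can hold.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openlm-research/open_llama_7b"  # illustrative ~7B open model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers on GPU/CPU as capacity allows
    load_in_8bit=True,   # ~1 byte per parameter instead of 4
)

inputs = tokenizer("Less is more because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A 7-billion-parameter model in 8-bit needs on the order of 7 GB of memory, which is laptop territory. That, not raw scale, is the open source community's bet.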

I started with a video game analogy and will end with one. In an interview with Yoshinori Kitase, director of the incredible Final Fantasy VI, the Japanese developer was asked about the climate and culture of game development in the 90s. Unsurprisingly, Kitase admitted that it was a pain.

Having to fit an epic tale with graphics, dialogue, music, and even cutscenes into a mere 8 megabytes of storage seems impossible by today's standards. But Kitase actually spoke quite favorably about the experience. For him, such constraints forced the team to think creatively, to shape and reshape their vision until they could fit it into those 8 megabytes.

The open source community seems to embody this spirit. Lacking the resources of the tech giants, they took on the task of creating and developing models that could run on a potato. And in the process, they showed us that more parameters are just one path to building a powerful language model.

If you liked this article, check out one of our other articles on AI.

  • 3 Tips for Finding a Great AI Development Partner
  • 8 FinTech Trends: From Open Banking to Web3 – A BairesDev White Paper
  • How to get everyone in your company on board with AI
  • How artificial intelligence can help with data privacy
  • The New Talent Challenge Series: How AI Can Boost Your Recruiting Efforts

Source: BairesDev
