
Buckle Up for a Dialogue with ChatGPT – An AI Perspective

Carolina Bessega, Innovation Lead, Office of the CTO. Published 20 Dec 2024

In an earlier blog, my colleague and renowned Wi-Fi expert, David Coleman, posed various questions to ChatGPT. These questions ranged from inquiries about the impact of Wi-Fi on society, to a challenge to craft a compelling movie script. Despite the diverse nature of the questions, ChatGPT was able to address each one successfully. Notably, in response to the final question, ChatGPT acknowledged that it could not replace David as a thought leader in the field. But I already knew that 😉

Language is a powerful tool that enables us to express our thoughts, ideas, and emotions. It has played a critical role in the development of human civilization, allowing us to build complex societies, create art and literature, and transmit knowledge across generations. It is truly a defining feature of what it means to be human.

In business, language is the foundation of all knowledge sharing and development. Communication is at the core of everything from programming languages to customer support interactions. It’s no surprise that some of the largest artificial intelligence (AI) research groups are investing heavily in developing better language models to enable more efficient communication between AI and humans.

For decades, the ability to communicate through language has been regarded as a sign of intelligence. As we continue to advance the capabilities of AI, the question arises: If an AI can hold a conversation with a human, does that make it intelligent?

The recent release of ChatGPT has sparked increased interest in the technology community regarding the capabilities of AI language models. In this blog, I will delve deeper into the philosophical debate surrounding AI and consciousness by analyzing the Chinese Room thought experiment, first introduced by philosopher John Searle in 1980. I will then follow with an overview of how AI models like ChatGPT work.

The Chinese Room Thought Experiment

In the realm of artificial intelligence, one of the central questions has been how to determine if a machine truly possesses intelligence. The Turing test, proposed in 1950, has been widely accepted as a benchmark for measuring a machine’s ability to simulate human intelligence. However, in 1980, philosopher John Searle introduced the Chinese Room thought experiment to challenge the notion that passing the Turing test equates to understanding.

The Chinese Room thought experiment illustrates a scenario where a person who does not speak Chinese is placed in a closed room, with a box of Chinese symbols and a manual for answering questions in Chinese. Someone outside the room, who speaks Chinese but is unaware of who is in the room or what materials they possess, sends questions through a narrow slot in the door. The person inside the room then consults the manual to respond with the appropriate symbols.

Searle argues that in this scenario, the person outside the room, who is getting the correct answers, would believe that the person inside the room understands Chinese, just as a machine that passes the Turing test would be believed to possess intelligence. However, in both cases, the individual or machine is simply manipulating symbols without truly understanding their meaning.

ChatGPT (and other language models) can process and “understand” human language. They can understand natural language input, interpret text, and generate responses. Although that understanding is based on patterns and rules that have been learned, it does not reflect an intrinsic understanding of the world or human experience.

The understanding AI models possess is limited to the information and knowledge contained in their training data. For now, AI models do not have their own beliefs, desires, goals, or motivations. In a sense, they are like the person inside the Chinese Room.

While the Chinese Room thought experiment might not be universally accepted in the AI and cognitive science community, it still serves as a valuable reminder of the complexity and nuance involved in understanding true intelligence.

As an AI practitioner, I firmly believe that while AI advancements are impressive, we are still far from achieving true human-like intelligence. At first glance, ChatGPT may seem to pass the Turing test, but after interacting for a few minutes, it becomes clear that there are still elements missing for it to be mistaken for a human. However, this does not detract from the fact that it can and will become a valuable tool in various applications. For now, please remember it is experimental, so always take the answers with a grain of salt and trust your human instincts and knowledge.

Let’s dive into the fun part. How does this model work?

AI Fundamentals Behind Language Models

A language model is a machine learning model designed to predict the probability of the next word(s) in a given sequence of text. In simpler terms, it is trained to understand the likelihood of a word occurring based on the previous words in the sequence.

To train a language model, a large dataset of text is provided to the model, and it is trained to predict the next word in a sequence given the previous words. This makes language models particularly useful for tasks like translation, text summarization, question-answering, etc.
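To make "predicting the next word" concrete, here is a minimal sketch of the simplest possible language model: one that counts which word follows which. It is nowhere near what GPT does internally, and the toy corpus and function names are invented for illustration, but the training signal (previous words in, next word out) has the same shape.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follower_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for previous_word, next_word in zip(words, words[1:]):
            follower_counts[previous_word][next_word] += 1
    return follower_counts


def next_word_probabilities(model, previous_word):
    """Turn the raw follower counts into a probability distribution."""
    counts = model.get(previous_word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word never appeared with a follower
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    "the apple is red",
    "the apple is sweet",
    "the car is red",
    "the sky is blue",
]

model = train_bigram_model(toy_corpus)
probabilities = next_word_probabilities(model, "is")
print(probabilities)
```

In this toy corpus, "red" follows "is" in two of the four sentences, so the model assigns it a probability of 0.5. A real language model conditions on far more than one previous word, which is where the mechanisms described later come in.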

One example of a popular language model is the Generative Pre-trained Transformer (GPT). It is a transformer-based model that uses self-supervised pre-training on a large text dataset to generate highly coherent and realistic text.

Wait! What is self-supervised learning? It is a type of machine learning in which the model learns from a dataset without labels provided by a human. In supervised learning, a human needs to provide the desired outcome as part of the training data. In self-supervised learning, the input-output pairs used for training are generated automatically from the dataset itself.

For example, in the case of a model designed to classify emails as spam or not spam, supervised learning would require providing samples of emails that a human has previously classified. On the other hand, unsupervised learning could be used to segment customers into groups based on their purchase history, with the model extracting patterns and grouping similar customers without explicit guidance. Self-supervised learning sits between the two: no human labels are needed, but the model still learns from input-output pairs, which are built automatically from the data itself.

The self-supervised training of GPT models uses next-word prediction (also called causal language modeling): the model sees the beginning of a text and is trained to predict the token that follows. A closely related technique, used by models such as BERT, is masked language modeling (MLM), also known as the Cloze task. In MLM, a portion of the input text is replaced with a special token (such as “[MASK]”). The model is trained to predict the original token, which might be a word or group of words, based on the context. Both techniques allow the model to learn about the relationships between words in a sentence and how they are used in context, which is crucial for language processing tasks such as translation or text summarization.

For example, in the sentence “The [MASK] is red”, the model is trained to predict the missing word “apple”. But it could also be “car” or “book”. Keep reading to find out which mechanism lets the model use the surrounding context to choose between them.

There are multiple variations of MLM training, like permuting the order of words or predicting a contiguous part of a sentence. The key is that these training strategies help the model learn about the relationships between the words in a sentence and how they are used in context.
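The masking step described above can be sketched in a few lines. This is only an illustration of how training pairs can be produced from raw text without human labels; the function name, the example sentence, and the 15% masking fraction (a commonly cited default for BERT-style training) are assumptions of this sketch, not details taken from any specific model's code.

```python
import random

MASK_TOKEN = "[MASK]"


def make_masked_example(sentence, rng, mask_fraction=0.15):
    """Hide a random subset of words and remember what was hidden.

    Returns (masked_words, targets), where targets maps each masked
    position to the original word the model should learn to predict.
    """
    words = sentence.split()
    # Always mask at least one word so every sentence gives a training signal.
    number_to_mask = max(1, round(len(words) * mask_fraction))
    masked_positions = rng.sample(range(len(words)), number_to_mask)

    masked_words = list(words)
    targets = {}
    for position in masked_positions:
        targets[position] = words[position]
        masked_words[position] = MASK_TOKEN
    return masked_words, targets


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original_sentence = "The apple is red like the one that fell on Newton"
masked_words, targets = make_masked_example(original_sentence, rng)

print(" ".join(masked_words))  # the input the model would see
print(targets)                 # the answers it would be trained to recover
```

The important point is that both the input (the masked sentence) and the expected output (the hidden words) come from the same unlabelled text, which is what makes the approach self-supervised.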

A really important advantage of self-supervised pre-training is that it allows using very large datasets, like the entire Wikipedia corpus or all the available digital books. The model learns the language structure and the meaning of words without the need for human annotation (an enormous task that blocks many projects).

ChatGPT is based on the GPT architecture, which is a type of language model developed by OpenAI. Most modern language models, like GPT and BERT, are based on the transformer architecture, which was introduced by Google researchers in 2017 in their paper “Attention Is All You Need”.

The GPT model uses a decoder-only variant of the transformer architecture. Related variants, such as Transformer-XL, add a technique called recurrence that allows the model to maintain context over longer sequences of text.

Let’s dig a little bit deeper into the general transformer architecture. Transformer-based models are a specific category of neural network architecture. The critical components of this architecture are:

  • Self-attention mechanism: It weighs the importance of each part of the input when making predictions. This allows the model to understand the relationships between the words in a sentence and how they are used in context. Remember our “The [MASK] is red” example, where the language model needs to know that the correct substitution is “apple” instead of “car” or “book”? A model trained with masking will not only look at that sentence but will also analyze the text before and after it, bidirectionally, and weigh the essential pieces. In this example, the full context was “The [MASK] is red. Like the one that fell on Isaac Newton’s head.” And now it is clear that the only correct answer is “apple”.
  • Encoder and decoder: The transformer model is typically composed of an encoder and a decoder. The encoder takes the input and generates a set of representations, called the “context”, that captures the meaning of the input. The decoder then uses this context to generate the output. This is why the model is called generative: it produces new text from the encoded version of the input.
  • Multi-Head Attention (MHA): This is a specific implementation of the self-attention mechanism, where multiple attention mechanisms (the heads) are applied in parallel. This allows the model to attend to different parts of the input sequence at the same time, so it can learn and use different relationships between elements.
  • Positional Encoding: Adds positional information to the words in a sentence, because word order changes meaning: “The dog bit the mailman” is not the same as “The mailman bit the dog”.
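Two of the components above, self-attention and positional encoding, are small enough to sketch in plain Python. The attention function below is scaled dot-product attention with the learned weight matrices left out, and the positional signal is the sinusoidal scheme from “Attention Is All You Need”. The three word vectors are made up for illustration, so treat this as a simplified picture under those assumptions rather than how any production model is implemented.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Real transformers first multiply the input by learned weight
    matrices; that step is omitted to keep the sketch short.
    """
    dimension = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dimension)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all the input vectors.
        outputs.append([
            sum(weight * vector[i] for weight, vector in zip(weights, vectors))
            for i in range(dimension)
        ])
    return outputs


def positional_encoding(position, dimension):
    """Sinusoidal position signal: sine on even indices, cosine on odd."""
    return [
        math.sin(position / 10000 ** (i / dimension)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dimension))
        for i in range(dimension)
    ]


# Hypothetical 4-dimensional word vectors, invented for illustration.
embeddings = {
    "dog": [1.0, 0.0, 0.5, 0.0],
    "bit": [0.0, 1.0, 0.0, 0.5],
    "mailman": [0.5, 0.5, 1.0, 0.0],
}


def encode(words, use_positions):
    """Look up each word, optionally add its position signal, then attend."""
    vectors = []
    for position, word in enumerate(words):
        vector = list(embeddings[word])
        if use_positions:
            signal = positional_encoding(position, len(vector))
            vector = [v + s for v, s in zip(vector, signal)]
        vectors.append(vector)
    return self_attention(vectors)


sentence = ["dog", "bit", "mailman"]
swapped = ["mailman", "bit", "dog"]

# Representation of "dog" in each word order, without position information.
dog_plain_a = encode(sentence, use_positions=False)[0]
dog_plain_b = encode(swapped, use_positions=False)[2]

# The same comparison with position information added.
dog_positioned_a = encode(sentence, use_positions=True)[0]
dog_positioned_b = encode(swapped, use_positions=True)[2]

print(dog_plain_a, dog_plain_b)
print(dog_positioned_a, dog_positioned_b)
```

Without the position signal, “dog” ends up with the same representation whether it bites or gets bitten, because attention on its own ignores word order. Once positions are added, the two representations differ, which is exactly the job of positional encoding.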

Finally, I would like to mention that language models can be fine-tuned for your specific use case using either supervised learning or reinforcement learning. That way, you can leverage their pre-trained understanding of the language while getting answers specific to your case.

In the case of ChatGPT, it has been (and continues to be) fine-tuned using reinforcement learning from human feedback.

A few thoughts to remember:

  • A language model allows a machine to generate text, such as translations, summaries, and answers.
  • A language model can be pre-trained on a large amount of data without labels and then fine-tuned on a smaller, task-specific dataset with labelled examples.
  • The current state of the art in language models is transformer-based models that use self-attention mechanisms.

The capabilities of machine learning models like ChatGPT are awe-inspiring, with a wide range of potential applications and high levels of accuracy. I highly recommend giving ChatGPT a try for yourself – not only is it a fun experience, but you may also find ways in which it can assist you. As an example, I utilized ChatGPT to generate a conclusion for this article as seen below:

“In conclusion, the recent release of ChatGPT has reignited the debate surrounding the capabilities of AI language models and their understanding of human language. The Chinese Room thought experiment, proposed by philosopher John Searle in 1980, highlights the limitations of AI models in truly understanding language and the world. Despite this, ChatGPT and other language models have the ability to process and generate responses based on patterns and rules learned from their training data. However, it is important to remember that these models currently lack the beliefs, desires, goals, and motivations that are inherent to human intelligence. As AI practitioners, it is important to continue to critically examine the capabilities and limitations of these models while also exploring new ways to enhance their understanding and capabilities.”
