ChatGPT, developed by OpenAI and launched in November 2022, isn’t the only large language model that has received lots of attention lately, but it’s by far the most widely known. A previous blog post that included a glossary of AI terms offered this brief definition:
Large Language Model (LLM): LLMs are a class of AI models that excel at natural language understanding and generation, like ChatGPT. They have numerous applications in text generation, language translation and more.
You may have read over the past year that GPT-4 (the model behind the paid version of ChatGPT) has been able to pass many difficult exams. Here are just a few:
- GPT-4 scored in the 90th percentile on the bar exam with a score of 298 out of 400.
- GPT-4 aced the SAT reading and writing section with a score of 710 out of 800, which puts it in the 93rd percentile of test-takers.
- GPT-4 scored in the 99th to 100th percentile on the 2020 USA Biology Olympiad Semifinal Exam.
- GPT-4 has passed a host of Advanced Placement examinations, exams for college-level courses taken by high school students that are administered by the College Board.
This is all amazing (and somewhat scary). Yet somehow, GPT-4 still struggles with high school math exams. The AMC 10 is an exam administered to high school students that covers various mathematical subjects (algebra, geometry, etc.). According to OpenAI, GPT-4 scored a 30 on the AMC 10 exam, placing it between the 6th and 12th percentiles.
How is this possible, and why can’t OpenAI just tweak GPT-4 so it can score in the top 90+ percentile for high school math exams?
Here’s the secret that probably surprises most people: No one on earth fully understands the inner workings of large language models. Researchers are working to gain a better understanding, but this is a slow process that will take years — perhaps decades — to complete.
ChatGPT, like other LLMs, is often considered a "black box" in terms of understanding its internal decision-making processes. The model's architecture, especially in the case of deep learning models like GPT, involves complex computations and numerous parameters, making it challenging to interpret how the model arrives at specific answers.
While data scientists and researchers can have a high-level understanding of the model architecture and the training process, the detailed workings and the specific reasons behind individual predictions are not easily interpretable. This lack of interpretability is sometimes referred to as the "black box" nature of deep neural networks.
Several factors contribute to the black box nature of models like ChatGPT:
- Complexity: Deep neural networks, especially those with millions or billions of parameters, have complex architectures that make it difficult to trace how input information is processed at each layer.
- High Dimensionality: The input space for language models is high-dimensional, involving a vast number of possible word combinations and sequences. Understanding the contribution of each input component to the output can be challenging.
- Nonlinearity: Neural networks use activation functions and nonlinear transformations, introducing complexities in the relationships between input and output.
- Large-Scale Training: Models like ChatGPT are trained on massive datasets, and the interactions between parameters during training are intricate. Extracting meaningful explanations for individual predictions becomes challenging in such scenarios.
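The nonlinearity point in particular is easy to demonstrate. Here is a minimal sketch, using a toy two-input network with made-up weights (not taken from any real model): because of the nonlinear activations, the effect of both inputs together is not the sum of their individual effects, so even in a seven-parameter network you cannot cleanly attribute the output to each input. GPT-4-class models have many billions of interacting parameters.

```python
import math

# Toy 2-input, 2-hidden-unit, 1-output network with tanh activations.
# Weights are arbitrary illustrative values, not from any real model.
W1 = [[2.0, -1.0], [1.5, 1.0]]   # hidden-layer weights
W2 = [1.0, -2.0]                 # output-layer weights

def forward(x1, x2):
    # Hidden layer applies a nonlinear tanh to each weighted sum of inputs.
    h = [math.tanh(W1[i][0] * x1 + W1[i][1] * x2) for i in range(2)]
    # Output is a weighted sum of the hidden activations.
    return sum(W2[i] * h[i] for i in range(2))

base = forward(0.0, 0.0)               # network output with no input signal
only_x1 = forward(1.0, 0.0) - base     # effect of x1 alone
only_x2 = forward(0.0, 1.0) - base     # effect of x2 alone
joint = forward(1.0, 1.0) - base       # effect of both inputs together

# Nonlinearity means the joint effect differs from the sum of the
# individual effects -- inputs interact rather than add up.
print(only_x1 + only_x2, joint)
```

Scaled up to billions of parameters and thousands of input tokens, these interactions are exactly what makes tracing an individual prediction so hard.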
Efforts are underway in the field of explainable AI to improve the interpretability of models like ChatGPT. Some methods aim to provide post hoc explanations by highlighting relevant portions of input text that influenced the model's decision. However, achieving full transparency and interpretability in deep learning models remains an active area of research.
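One common family of post hoc methods is occlusion: remove each piece of the input in turn and measure how much the output changes. The sketch below applies that idea to a deliberately trivial stand-in for a model (a hypothetical word-counting sentiment score, invented here for illustration); real explainability tools apply the same remove-and-remeasure loop to actual model outputs.

```python
# Occlusion-based attribution sketch. The score() function is a toy
# stand-in for a model's output, NOT a real sentiment model.

POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}

def score(words):
    # Hypothetical "model": +1 per positive word, -1 per negative word.
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def occlusion_attribution(sentence):
    words = sentence.lower().split()
    full = score(words)
    # A word's attribution is how much the score drops when it is removed.
    return {w: full - score(words[:i] + words[i + 1:])
            for i, w in enumerate(words)}

attributions = occlusion_attribution("the movie was great not terrible")
# "great" is attributed +1, "terrible" -1, and neutral words 0
```

For a genuine LLM, each occlusion requires a full forward pass, and attributions shift with context, which is one reason these explanations remain approximate rather than a true window into the model.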
In practical terms, while users, including data scientists, can have a general understanding of how ChatGPT is designed and trained, predicting a specific answer with certainty, or pinning down the precise rationale behind a given response, is often impossible due to the black box nature of the model. It's important for users to be aware of this limitation and exercise caution when relying on the model for critical decisions.