why AI can't pass this test

Answer in Progress
31 Aug 2023 · 18:34

TLDR: The video explores the intelligence of AI by pitting it against a human in a series of tests designed to measure different aspects of intelligence. AI performs impressively in some areas, such as deriving information and understanding, but falls short in reasoning and learning from experience. The video highlights the current limitations of AI: it excels at tasks covered by its training data but struggles with novel problems and adaptability, which points to a need for further development in few-shot learning and generalization.


  • 🧠 The concept of intelligence is multifaceted, involving the ability to derive information, learn from experience, adapt to the environment, and utilize thought and reason correctly.
  • 📚 AI has made significant advancements in a short span of time, with models like ChatGPT showcasing impressive capabilities in various fields, including cooking and academic achievements.
  • 🌐 The rapid improvement in AI models is attributed to better algorithms, high-quality data, fine-tuning with human input, and increased computing power.
  • 💰 The commercial potential of AI is vast, with estimates suggesting a $4.4 trillion opportunity, despite concerns about the environmental and social impact of AI development.
  • 📝 To assess AI's intelligence, a series of tests was conducted, including the reading comprehension portion of the law school admission test, the abstraction and reasoning challenge, TruthfulQA, the massive multitask language understanding benchmark, and an IQ test.
  • 🤖 AI's performance on tests revealed strengths in knowledge-based tasks and weaknesses in novel problem-solving, highlighting the difference between memorization and true understanding.
  • 📊 The AI's scores were significantly higher in adaptability, information derivation, and understanding, but much lower in reasoning and learning from experience.
  • 🔍 The inconsistency in AI's performance suggests that its intelligence is domain-specific and that it struggles with tasks that diverge from its training data.
  • 🚀 Addressing AI's limitations requires a shift in focus towards models that can generalize and adapt to new situations from only a handful of examples, an ability known as few-shot learning.
  • 🌟 AI's current capabilities are valuable for specific applications but indicate that there is still a long road ahead for AI to achieve human-like general intelligence.

Q & A

  • What was Sabrina's initial sentiment towards AI advancements?

    -Sabrina was frustrated. Because AI like ChatGPT seemed to be outperforming humanity, she felt as though she was being compared to more successful classmates.

  • What did Sabrina use to test the intelligence of AI?

    -Sabrina used a variety of tests: the reading comprehension portion of the law school admission test, the abstraction and reasoning challenge, TruthfulQA, the massive multitask language understanding benchmark, and an IQ test.

  • How has AI improved in just three years, as mentioned in the script?

    -In three years, AI has improved significantly, going from generating incomplete mac and cheese recipes to teaching complex tasks like making pasta from scratch.

  • What potential issues were raised about the development of AI?

    -The script raises concerns about the data used for training AI, which might be stolen, the exploitative labor used for polishing AI, and the environmental impact, such as the use of valuable resources like drinking water.

  • What was the outcome of Sabrina's tests on AI?

    -The AI performed exceptionally well on tests involving background information and knowledge but poorly on tests requiring novel problem-solving and few-shot learning.

  • What is the significance of the long tail problem in AI?

    -The long tail problem refers to AI's difficulty in handling novel situations that were not part of its training data. This makes it challenging to trust AI in high-stakes scenarios where unexpected problems may arise.

  • What does the script suggest about the future development of AI?

    -The script suggests that future AI development should focus on addressing the lopsidedness in AI's abilities, improving few-shot learning, and creating more balanced and generalized intelligence.

  • How did Sabrina overcome her challenges in creating an AI for the project?

    -Sabrina overcame her challenges by seeking help from a Fiverr freelancer, Thomas, who provided expertise in building the AI model and addressing the project's specific needs.

  • What was the role of Fiverr in the video?

    -Fiverr sponsored the video and provided the platform for Sabrina to find expert freelancers who helped her develop and understand the AI model used in the project.

  • What is the main takeaway from the video about AI intelligence?

    -The main takeaway is that AI exhibits a range of intelligence, excelling in specific areas based on its training data but struggling with novel and unexpected problems, indicating that AI still has a long way to go before it can truly outthink humans.



🤖 The Quest for AI Intelligence

This paragraph introduces the concept of artificial intelligence and its rapid development, comparing it to human intelligence and posing the question of whether AI is as smart as it seems. The narrator shares their experience of challenging AI with various tests to determine its intelligence. The discussion touches on the impressive achievements of AI, such as graduating from MIT and passing the bar, and contrasts it with the previous limitations of AI, like generating incomplete recipes. The improvements in AI are attributed to better models, high-quality data, fine-tuning, and increased computing power. However, ethical concerns are raised about the data sources, labor practices, and environmental impact of AI development. The goal of the video is set to explore the true intelligence of AI and compare it with human intelligence.


🧠 Defining Intelligence and AI's Capabilities

The second paragraph delves into the definition of intelligence as understood by psychologists, emphasizing the ability to learn from experience, adapt to the environment, and utilize thought and reason. It highlights the challenges in measuring intelligence and the various tests chosen to evaluate AI, such as the law school admission test for reading comprehension, the abstraction and reasoning challenge for learning from experience, and the massive multitask language understanding benchmark for understanding. The paragraph also discusses the decision to collaborate with a Fiverr freelancer for building and training the AI model, emphasizing the importance of human input in AI projects. The freelancer's contribution in building a model that fits the project's needs, budget, and timeline is highlighted, along with the features added for specific requirements.


📝 The AI and Human Intelligence Test

This paragraph describes the process of conducting tests on both the AI and the narrator, covering a wide range of subjects and difficulty levels. It discusses the challenges faced during the test creation and the subsequent realization that the tests need to be marked. The narrator's friends help in grading the tests, leading to a humbling experience for the narrator as they discover their own limitations in certain areas of intelligence. The AI's performance is compared to the narrator's, revealing that while the AI excels in some areas, such as adaptability and deriving information, it performs poorly in reasoning and learning from experience, raising questions about the nature of AI intelligence.


🤔 Unraveling the Mystery of AI's Inconsistency

The final paragraph explores the inconsistencies in AI's test performance, questioning the reasons behind its varying levels of success in different areas of intelligence. It discusses the role of training data in AI's performance and the possibility that AI's achievements might be more about memorization than true understanding. The concept of few-shot learning in AI is introduced, emphasizing the difficulty AI faces in solving novel problems compared to humans. The paragraph also touches on the long tail problem in AI, where the system's inability to predict responses to new situations can lead to trust issues, especially in high-stakes scenarios. It concludes by acknowledging the current limitations of AI and its potential areas of valuable application, while also highlighting the importance of continued research and development to address the gaps in AI intelligence.




💡Intelligence

Intelligence, as discussed in the video, refers to the ability to derive information, learn from experience, adapt to the environment, understand, and correctly utilize thought and reason. It is the central theme of the video, exploring whether AI possesses a level of intelligence comparable to or surpassing that of humans. The video uses various tests to evaluate different aspects of intelligence, such as adaptability, reasoning, and learning from experience, to compare human and AI capabilities.


💡AI

AI, or Artificial Intelligence, is the simulation of human intelligence in machines that are programmed to think and learn like humans. In the context of the video, AI is portrayed as a rapidly evolving technology that has made significant strides in areas such as language understanding, problem-solving, and data processing. The video explores the current capabilities of AI, focusing on the ChatGPT model and its performance in a series of tests designed to measure its intelligence against human standards.

💡ChatGPT

ChatGPT is a specific AI model mentioned in the video that has been trained on a vast amount of data and is capable of generating human-like text based on the input it receives. It is used as a representative example of AI's capabilities and is tested against the human host to determine its level of intelligence. The video discusses ChatGPT's impressive achievements, such as graduating from MIT and passing the bar, as well as its limitations when faced with novel or unexpected challenges.


💡Testing

Testing, in the context of the video, refers to the series of challenges and assessments used to measure and compare the intelligence of AI and humans. These tests cover various aspects of intelligence, including reading comprehension, abstraction and reasoning, societal adaptability, understanding across multiple subjects, and IQ. The video uses these tests to explore the strengths and weaknesses of AI in comparison to human cognitive abilities.


💡Fiverr

Fiverr is a freelance services marketplace mentioned in the video as a sponsor. It is highlighted as a platform where one can find experts in various fields, including AI model development and content generation. In the narrative, the host uses Fiverr to get assistance in building and training an AI model for the purpose of the intelligence tests, showcasing the platform's role in facilitating AI-related projects and collaborations.

💡Long Tail Problem

The Long Tail Problem, as discussed in the video, refers to the challenge of handling rare or unusual situations that an AI model may not have been trained for. It illustrates the limitations of AI when it comes to few-shot learning or generalizing from limited examples. The video uses this concept to explain the AI's poor performance on tests that require novel problem-solving, highlighting the current gaps in AI's ability to adapt and learn from new experiences outside of its training data.

💡Few-shot Learning

Few-shot Learning is a concept in AI where a model is expected to learn and perform well on new tasks with only a limited number of examples. The video points out that even the most advanced AI models struggle with this aspect of learning, as they are typically trained on large datasets and excel at tasks they have seen before. This is contrasted with human ability to quickly adapt and understand new situations with minimal exposure, highlighting a key difference between AI and human intelligence.
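The idea of learning a task from a few demonstrations can be sketched in code. The toy example below is only an illustration under stated assumptions: the grid format loosely imitates the abstraction and reasoning challenge, and the names `CANDIDATES` and `infer_rule` and the four candidate transformations are hypothetical, not taken from the video or from the benchmark. The solver can only pick a rule from a fixed, hand-written list, which also shows the limitation described above: a task whose rule lies outside that list cannot be solved.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]


def identity(grid: Grid) -> Grid:
    return [list(row) for row in grid]


def flip_horizontal(grid: Grid) -> Grid:
    # Mirror each row left to right.
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    # Reverse the order of the rows.
    return [list(row) for row in reversed(grid)]


def transpose(grid: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*grid)]


# Hypothetical, hand-picked hypothesis space (an assumption for this sketch).
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(demos: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every demonstration."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in demos):
            return name
    return None  # No known rule explains the demonstrations.


# Two demonstrations (the "few shots"), then one unseen test input.
demos = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
]
rule = infer_rule(demos)
test_input = [[5, 0, 0], [0, 6, 0]]
prediction = CANDIDATES[rule](test_input) if rule else None
print(rule, prediction)

# A task whose rule ("add 1 to every cell") is outside the candidate list.
novel_demos = [([[1, 2]], [[2, 3]])]
print(infer_rule(novel_demos))
```

With these demonstrations the sketch selects `flip_horizontal` and applies it to the test grid, while the second task returns `None` because none of the candidate rules explains it.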

💡Fluid and Crystallized Intelligence

Fluid and Crystallized Intelligence are two types of intelligence as defined in psychology. Fluid intelligence refers to the ability to reason and solve problems in novel situations, while crystallized intelligence is the accumulation of knowledge, facts, and skills acquired throughout life. In the video, these concepts are used to analyze the AI's performance on different tests. The AI demonstrates strong crystallized intelligence by recalling information, but its fluid intelligence is questioned as it struggles with novel and abstract reasoning tasks.


💡Sponsorship

Sponsorship in the context of the video refers to the financial or other forms of support provided by an entity for the creation and distribution of content. Here, Fiverr is mentioned as the sponsor of the video, which implies that they have provided resources or funding to assist in the production of the video. Sponsorship is often acknowledged in media content to give credit to the supporting parties and sometimes to promote their services or products, as seen with the provided link and discount code.


💡Mensa

Mensa is an international high IQ society that aims to foster and encourage intellectual exchange among its members. In the video, an IQ test designed to qualify individuals for Mensa is used as one of the benchmarks to test the AI's reasoning capabilities. The mention of Mensa serves to emphasize the level of difficulty and the prestige associated with the test, as well as to illustrate the AI's performance in comparison to human standards of intelligence.


💡TruthfulQA

TruthfulQA is a question set designed to capture common misconceptions, used in the video as a measure of an individual's or AI's ability to adapt to societal norms and understand widely accepted truths. This test is part of the experiment to gauge the AI's intelligence in comparison to human intelligence, particularly in the context of recognizing and correcting common errors in belief or knowledge.


AI is often touted as being incredibly intelligent, but this video challenges that notion by pitting AI against human intelligence in a series of tests.

The AI in question, ChatGPT, has an impressive resume including graduating from MIT, passing the bar, and even qualifying for a US medical license.

Despite the AI's impressive credentials, the video questions whether it is truly intelligent or just a product of high-quality data and advanced computing power.

The video introduces a series of tests designed to measure different aspects of intelligence, such as deriving information, learning from experience, adapting to the environment, understanding, and reasoning.

One of the tests used is the law school admission test's reading comprehension portion, which requires deriving the best answer from a passage.

The abstraction and reasoning challenge is used to measure learning from experience, involving completing tasks based on a few demonstrations.

To measure environmental adaptation, the video uses TruthfulQA, a question set designed to capture common misconceptions.

The massive multitask language understanding benchmark is used to measure understanding, with questions spanning 57 subjects and varying difficulties.

An IQ test that can qualify one for Mensa is used to measure thought and reason in the AI versus human intelligence challenge.

The video discusses the ethical concerns surrounding AI development, including the potential exploitation and environmental impact.

The creator of the video collaborates with a Fiverr freelancer to build and train an AI model for the project, highlighting the importance of human input in AI success.

The results of the tests reveal that AI excels in areas of knowledge and recall but struggles with novel problems and reasoning.

The video concludes that AI is both more and less intelligent than humans, with strengths in specific areas and significant room for improvement in others.

The AI's performance on the tests raises questions about the nature of intelligence and the current focus of AI development on memorization over true understanding.

The video calls for a more balanced approach to AI development, focusing on few-shot learning and the ability to handle novel situations.

The video serves as a reminder that while AI has many valuable applications, it is not yet capable of outthinking or replacing human intelligence entirely.