A New Era of NovelAI Begins Now
TLDR
The video introduces a new era for NovelAI: painting is being added to image generation, and Cleo, an AI model developed in-house, is unveiled. Trained on 1.5 trillion tokens, Cleo has a Lambada score of 73%, which the team says beats every similarly sized model, and an 8192-token context. Although still experimental, Cleo demonstrates NovelAI's ability to train large language models. She is available first to Opus subscribers and will reach all users in two weeks. The team thanks the community for its patience and teases more developments to come.
Takeaways
- 🎨 New feature: Painting to image generation is being introduced, allowing users to modify and color images.
- 🐶 A cute dog is used as a visual distraction during the announcement.
- 📅 Upcoming release: The official release of the painting feature is scheduled for Thursday, two days from the announcement.
- 🚀 Module updates: Sigurd and Euterpe are receiving new V2 modules, which are complete replacements for the original versions.
- 🍳 Team effort: The team has been working hard, with various metaphorical 'delicacies' representing their work.
- 🌟 Introduction of Cleo: Cleo is a custom-made model developed in-house, trained from scratch with a custom tokenizer and dataset.
- 📚 Extensive training: Cleo was trained on 1.5 trillion tokens, providing a strong general knowledge base.
- 🏆 Performance: Cleo achieved a Lambada score of 73%, surpassing other models of similar size.
- 🔍 Contextual understanding: Cleo features an 8192-token context, so she can take more preceding text into account when generating.
- 🔢 Parameter size: Despite its capabilities, Cleo is compact with only 3 billion parameters.
- 🧑‍🔬 Proof of concept: Cleo serves as a proof of concept for the team's ability to train large language models.
- 🔧 Testing phase: Cleo is still experimental and will be initially available to Opus subscribers for testing.
- 📈 Future plans: The team is already training larger models and has more exciting developments planned for the year.
Q & A
What is the main announcement regarding image generation?
-The main announcement is that painting is coming to image generation, accessible through the image-to-image interface.
What are the new modules being introduced for Sigurd and Andrew?
-Sigurd and Euterpe are getting brand-new V2 modules, which are complete replacements for the original modules.
What is significant about the model Cleo?
-Cleo is the first model created entirely in-house, built from scratch with a custom tokenizer, a custom 6-terabyte pre-training dataset, a custom pre-trained model, and a custom fine-tune. She is designed to excel at storytelling.
How many tokens of data has Cleo been trained on?
-Cleo has been trained on 1.5 trillion tokens of data.
What is Cleo's Lambada score and how does it compare to other models?
-Cleo's Lambada score is 73 percent, higher than that of any other similarly sized model.
What is the token context length of Cleo?
-Cleo features an 8192 token context length.
How many parameters does Cleo have?
-Cleo has 3 billion parameters.
What is the current status of Cleo in terms of availability?
-Cleo is still somewhat experimental and is being tested by Opus subscribers. Other users should expect to get access to Cleo in two weeks.
What does the team have planned for the future?
-The team has begun training much larger models and has more exciting developments planned for the year.
How does the team feel about the patience and support of their audience?
-The team is grateful for the patience and support of their audience and is committed to keeping them engaged with new developments.
What is the significance of the painting feature being introduced?
-The painting feature allows users to add a creative touch to their images by coloring and replacing elements within the generated image.
What does the team mean by 'cooking real hard' in the context of their work?
-The phrase 'cooking real hard' is a metaphor for the team's intensive efforts and hard work in developing and refining their AI models.
Outlines
🎨 Introducing Painting to Image Gen and New AI Modules
The video opens with a casual greeting and a playful announcement that, in a departure from text generation, painting is coming to image generation, letting users interactively color and modify their images. The presenter also announces new V2 modules for Sigurd and Euterpe, emphasizing that these are complete replacements rather than simple updates. The presenter then introduces Cleo, the first model built entirely in-house, trained from the ground up with a custom tokenizer, a 6-terabyte pre-training dataset, and a custom fine-tune. Trained on 1.5 trillion tokens, Cleo has strong general knowledge and a Lambada score of 73 percent, which reached 74 percent during fine-tuning. As the team's first model with an 8192-token context, packed into a compact 3-billion-parameter model, Cleo serves as a proof of concept for the team's ability to train large language models. Although still experimental, she is available to Opus subscribers for testing, with a wider release planned two weeks later.
🚀 Upcoming Developments and Subscriber Previews
The video concludes by teasing more features to come and notes that Opus subscribers will get early access to these developments. It ends on a high note with energetic background music, reinforcing the team's forward-looking spirit.
Keywords
💡NovelAI
💡Image Generation
💡Text Generation
💡Modules
💡Tokenizer
💡Pre-trained Data Set
💡Fine Tune
💡Parameter Count
💡Lambada Score
💡Token Context
💡Proof of Concept
💡Opus Subscribers
Highlights
Introduction of painting to image generation, a new feature not related to text generation.
The image-to-image interface lets users paint colors onto an image and replace elements within it.
The painting feature will be officially released in two days, on Thursday.
Sigurd and Euterpe are receiving brand-new V2 modules.
Cleo, the first custom made model created in-house, is introduced.
Cleo has been trained from scratch with a custom tokenizer and a 6-terabyte pre-training dataset.
Cleo is trained on 1.5 trillion tokens, offering better general knowledge.
Cleo achieves a Lambada score of 73 percent, surpassing similarly sized models.
During fine-tuning, Cleo reached a Lambada score of 74 percent.
Cleo features an 8192 token context and is packaged in a 3 billion parameter model.
Cleo is a proof of concept model, signifying the capability to train large language models.
The training process for Cleo has been finalized, with dataset issues addressed and the process smoothed out.
Larger models are already in training following the success with Cleo.
Opus subscribers will have first access to Cleo while final adjustments are made.
General availability of Cleo for all users is expected in two weeks.
The team expresses gratitude for patience and support, with more exciting developments planned for the year.
Opus subscribers will be the first to experiment with Cleo.