Data Machina #250

Llama 3: A Watershed AI Moment? I reckon the release of Llama 3 is perhaps one of the most important moments in AI development so far. The Llama 3 stable is already giving birth to all sorts of amazing animals and model derivatives. You can expect Llama 3 to unleash the mother of all battles against closed AI models like GPT-4.

Meta AI just posted: “Our largest Llama 3 models are over 400B parameters. And they are still being trained.” The upcoming Llama-400B will change the playing field for independent researchers, small AI startups, solo AI developers, and enterprise AI apps alike. For now, The Zuck and Yann LeCun are the bastions of “open AI.”

Quick Llama 3 Summary:

  • A family of SOTA, open models available in both 8B & 70B parameter sizes, in pre-trained base and instruction-tuned versions

  • License. Open, but not fully open-source in the Apache 2.0 sense: free for research and commercial applications, with limitations. Read the Llama 3 license here.

  • Open models and weights upon request. Get them here.

  • Trained on 24K GPUs and 15+ trillion tokens. Massive for models of this size.

  • Context window expanded to 8,192 tokens. Many expected at least 128K.

  • New tokeniser with a 128K-token vocabulary, built on top of OpenAI’s tiktoken (see the quick sketch after this list).

  • Meta AI’s official blog post: Introducing Meta Llama 3: The most capable openly available LLM to date

  • Nathan’s great overview of all the tech details: Llama 3: Scaling open LLMs to AGI
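
If you want to poke at the new tokeniser directly, here is a minimal sketch using the Hugging Face transformers AutoTokenizer. It assumes you have accepted the Llama 3 license and been granted access to the gated meta-llama repo on the Hub; the sample string is just an illustration.

```python
# pip install transformers
# Assumes access to the gated meta-llama repo on the Hugging Face Hub.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(len(tok))  # roughly a 128K-entry vocabulary
print(tok.tokenize("Data Machina #250: Llama 3 is out!"))  # how the new BPE splits text
```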

Run Llama 3 with the Meta AI intelligent assistant. Llama 3 has been integrated into Meta AI. Try it for chat, coding tasks, and problem solving here. It also runs inside Facebook, WhatsApp and Instagram. If you’re not in the US, try a VPN.

Easily deploy Llama 3 on cloud AI stacks. Using Hugging Face Deploy, you can now deploy Llama 3 on Azure ML, Google Vertex, Amazon SageMaker or Hugging Face hosting. Check out the one-click deploy option on the Hugging Face Meta-Llama-3-8B model page.
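
If you prefer doing this from code rather than the model-page UI, here is a minimal sketch using huggingface_hub’s create_inference_endpoint. The endpoint name, vendor, region, instance size and instance type below are illustrative assumptions; check the Inference Endpoints catalogue for what your account can actually provision.

```python
# pip install huggingface_hub
from huggingface_hub import create_inference_endpoint

# Sketch only: the instance/accelerator/region values are assumptions, not a tested config.
endpoint = create_inference_endpoint(
    "llama-3-8b-instruct-demo",                       # hypothetical endpoint name
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",                               # assumed single-GPU tier
    instance_type="nvidia-a10g",                      # assumed to be enough for the 8B model
)

endpoint.wait()                                       # block until the endpoint is up
print(endpoint.client.text_generation("Hello, Llama 3!", max_new_tokens=64))
```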

Run Llama 3 at blazing speed and super low cost.

Run Llama 3-Instruct-8B GGUF for efficient chat. GGUF is a binary format optimised for quick loading and saving of models. The Llama 3 instruction-tuned models are optimised for dialogue and outperform most open-source chat models. Get Meta-Llama-3-8B-Instruct-GGUF here. Thanks to the great @nousresearch and @ggerganov.
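
For a quick local chat sketch, the llama-cpp-python bindings can load the GGUF file directly. The filename below is just an example of a downloaded quantised file; point model_path at whichever GGUF you grabbed.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",  # example filename for a downloaded GGUF
    n_ctx=8192,        # Llama 3's full context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one sentence, what is the GGUF format?"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```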

Run Llama 3 on Apple silicon devices. You can now run any Llama 3 model quantised to 4-bit or 8-bit on your local Apple silicon device using the Apple MLX framework. Thanks to the awesome @Prince_Canuma.
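
A minimal sketch with the mlx-lm package is below. The 4-bit community conversion named in load() is an assumption; swap in whichever MLX-format Llama 3 repo you actually use.

```python
# pip install mlx-lm   (Apple silicon only)
from mlx_lm import load, generate

# Assumption: a 4-bit MLX conversion published under mlx-community.
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

prompt = "Explain in one paragraph why 4-bit quantisation helps on a laptop."
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```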

See how Llama 3 was jailbroken. The Meta AI team says they spent a lot of time safeguarding and red-teaming Llama 3. Well, I’m not so sure about that, because, inevitably, lots of jailbreaks are starting to pop up. Check out A Trivial Jailbreak Against Llama 3 or Jailbreaking Llama 3 for education purposes.

Have a nice week.

  1. AI Agentic Design Patterns: Multi-Agent Collaboration

  2. Decomposing Predictions by Modeling Model Computation

  3. Open-sourcing Idefics2: A Powerful 8B Vision-Language Model

  4. Cookbook & Tutorials for the Google Gemini Models

  5. [tutorial] Overview of LM Model Alignment Methods (77 slides)

  6. AI Model Compression: A Deep Guide to Quantisation

  7. A Great Reading List on [Modern] Machine Learning

  8. My Thoughts on AI Agents: Looping vs. Planning

  9. Stanford 2024 AI Index Report (pdf, 500 pages)

  10. [amazing] MSR VASA-1: Lifelike Audio-Driven Talking Faces in Real Time

  1. torchtune – A PyTorch Lib for LLM Finetuning + Recipes

  2. DSPY: Not Your Average Prompt Engineering

  3. [opensource] DeepMind Penzai: Build, Edit & Visualise Neural Nets

  1. [free] MIT Lectures: Learning Deep Representations

  2. [free course] Quantization Fundamentals with Hugging Face

  3. Efficient, Large-scale Clustering & Visualisation for NLP and Vision

  1. Stanford STORM: Writing Wikipedia-like Articles from Scratch with LLMs

  2. DeepMind: The Limits of Token Prediction & Many-shot In-context Learning

  3. Mini-Gemini: Enhancing Multi-Modality in Vision-Language Models (repo, etc)

  1. Scaling AI Models Like You Mean It

  2. Architecture & Design Principles for LLMOps

  3. Orchestrating Online ML Model Training with Airflow

  1. YouTube-Commons – 15M Transcripts, 2M Videos

  2. HQ-Edit: A HQ Dataset for Instruction-based Image Editing

  3. COCONut – 383K Images, 5.1M Human-verified Segmentations

Enjoyed this post? Tell your friends about Data Machina. Thanks for reading.

Tips? Suggestions? Feedback? Email Carlos.

Curated by @ds_ldn in the middle of the night.
