Data Machina #250

Llama 3: A Watershed AI Moment? I reckon the release of Llama 3 is perhaps one of the most important moments in AI development so far. The Llama 3 stable is already giving birth to all sorts of amazing animals and model derivatives. You can expect Llama 3 to unleash the mother of all battles against closed AI models like GPT-4.

Meta AI just posted: “Our largest Llama 3 models are over 400B parameters. And they are still being trained.” The upcoming Llama-400B will change the playing field for independent researchers, small AI startups, solo AI developers, and enterprise AI apps alike. For now, The Zuck and Yann LeCun are the bastions of “open AI.”

Quick Llama 3 Summary:

  • A family of SOTA, open models available in both 8B & 70B parameter sizes, in pre-trained base and instruction-tuned versions

  • License. Open, but not fully open-source in the Apache 2.0 sense: free for research and commercial applications, with limitations. Read the Llama 3 license here.

  • Open models and weights upon request. Get them here.

  • Trained on 24K GPUs and 15+ trillion tokens. Massive for models of this size.

  • Context window expanded to 8,192 tokens. Many expected at least 128K.

  • New tokeniser with a 128K-token vocabulary, built on top of OpenAI’s tiktoken (see the quick sketch after this list).

  • Meta AI’s official blog post: Introducing Meta Llama 3: The most capable openly available LLM to date

  • Nathan’s great overview of all the tech details: Llama 3: Scaling open LLMs to AGI
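
If you want to poke at the new tokeniser directly, here is a minimal sketch using the Hugging Face transformers AutoTokenizer. It assumes you have accepted the Llama 3 license and been granted access to the gated meta-llama repo on the Hub; the sample string is just an illustration.

```python
# pip install transformers
# Assumes access to the gated meta-llama repo on the Hugging Face Hub.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(len(tok))  # roughly a 128K-entry vocabulary
print(tok.tokenize("Data Machina #250: Llama 3 is out!"))  # how the new BPE splits text
```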

Run Llama 3 with the Meta AI intelligent assistant. Llama 3 has been integrated into Meta AI. Try it for chat, coding tasks, and problem solving here. It also runs inside Facebook, WhatsApp and Instagram. If you’re not in the US, try a VPN.

Easily deploy Llama 3 on cloud AI stacks. Using Hugging Face Deploy, you can now deploy Llama 3 on Azure ML, Google Vertex, Amazon SageMaker or Hugging Face hosting. Check out the one-click deploy option on the Hugging Face Meta-Llama-3-8B model page.
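
If you prefer doing this from code rather than the model-page UI, here is a minimal sketch using huggingface_hub’s create_inference_endpoint. The endpoint name, vendor, region, instance size and instance type below are illustrative assumptions; check the Inference Endpoints catalogue for what your account can actually provision.

```python
# pip install huggingface_hub
from huggingface_hub import create_inference_endpoint

# Sketch only: the instance/accelerator/region values are assumptions, not a tested config.
endpoint = create_inference_endpoint(
    "llama-3-8b-instruct-demo",                       # hypothetical endpoint name
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",                               # assumed single-GPU tier
    instance_type="nvidia-a10g",                      # assumed to be enough for the 8B model
)

endpoint.wait()                                       # block until the endpoint is up
print(endpoint.client.text_generation("Hello, Llama 3!", max_new_tokens=64))
```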

Run Llama 3 at blazing speed and super low cost.

Run Llama 3-Instruct-8B GGUF for efficient chat. GGUF is a binary format optimised for quick loading and saving of models. The Llama 3 instruction-tuned models are optimised for dialogue and outperform most open-source chat models. Get Meta-Llama-3-8B-Instruct-GGUF here. Thanks to the great @nousresearch and @ggerganov.
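
For a quick local chat sketch, the llama-cpp-python bindings can load the GGUF file directly. The filename below is just an example of a downloaded quantised file; point model_path at whichever GGUF you grabbed.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",  # example filename for a downloaded GGUF
    n_ctx=8192,        # Llama 3's full context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one sentence, what is the GGUF format?"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```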

Run Llama 3 on Apple silicon devices. You can now run any Llama 3 model quantised to 4-bit or 8-bit on your local Apple silicon device using the Apple MLX framework. Thanks to the awesome @Prince_Canuma.
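
A minimal sketch with the mlx-lm package is below. The 4-bit community conversion named in load() is an assumption; swap in whichever MLX-format Llama 3 repo you actually use.

```python
# pip install mlx-lm   (Apple silicon only)
from mlx_lm import load, generate

# Assumption: a 4-bit MLX conversion published under mlx-community.
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

prompt = "Explain in one paragraph why 4-bit quantisation helps on a laptop."
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```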

See how Llama 3 was jailbroken. The Meta AI team says they spent a lot of time safeguarding and red-teaming Llama 3. Well, I’m not so sure about that, because, inevitably, lots of jailbreaks are starting to pop up. Check out A Trivial Jailbreak Against Llama 3 or Jailbreaking Llama 3 for education purposes.

Have a nice week.

  1. AI Agentic Design Patterns: Multi-Agent Collaboration

  2. Decomposing Predictions by Modeling Model Computation

  3. Open-sourcing Idefics2: A Powerful 8B Vision-Language Model

  4. Cookbook & Tutorials for the Google Gemini Models

  5. [tutorial] Overview of LM Model Alignment Methods (77 slides)

  6. AI Model Compression: A Deep Guide to Quantisation

  7. A Great Reading List on [Modern] Machine Learning

  8. My Thoughts on AI Agents: Looping vs. Planning

  9. Stanford 2024 AI Index Report (pdf, 500 pages)

  10. [amazing] MSR VASA-1: Lifelike Audio-Driven Talking Faces in Real Time

  1. torchtune – A PyTorch Lib for LLM Finetuning + Recipes

  2. DSPY: Not Your Average Prompt Engineering

  3. [opensource] DeepMind Penzai: Build, Edit & Visualise Neural Nets

  1. [free] MIT Lectures: Learning Deep Representations

  2. [free course] Quantization Fundamentals with Hugging Face

  3. Efficient, Large-scale Clustering & Visualisation for NLP and Vision

  1. Stanford STORM: Writing Wikipedia-like Articles from Scratch with LLMs

  2. DeepMind: The Limits of Token Prediction & Many-shot In-context Learning

  3. Mini-Gemini: Enhancing Multi-Modality in Vision-Language Models (repo, etc)

  1. Scaling AI Models Like You Mean It

  2. Architecture & Design Principles for LLMOps

  3. Orchestrating Online ML Model Training with Airflow

  1. YouTube-Commons – 15M Transcripts, 2M Videos

  2. HQ-Edit: A HQ Dataset for Instruction-based Image Editing

  3. COCONut – 383K Images, 5.1M Human-verified Segmentations

Enjoyed this post? Tell your friends about Data Machina. Thanks for reading.

Tips? Suggestions? Feedback? Email Carlos.

Curated by @ds_ldn in the middle of the night.
