Data Machina #254 – Data Machina

On the State of AI Coding Agents. How could we start using AI to migrate years of messy, flimsy legacy code to a modern stack? … Perhaps an AI Code Migration Agent ???

We’re doing AI chat & espresso at Level 39, One Canada Square. James, a veteran CTO with all the scars, is asking these rather funny, rhetorical questions. There is a deep silence in the room, pensive faces all around. Everyone is staring through the massive windows overlooking the City skyline as the sun sets. We wonder in perplexity, in the very philosophical and information-theory sense, whether or not AI Coding Agents are fully ready for such tasks in prod and, if so, when…

Are AI Coding agents any good at solving real-world coding issues autonomously? The team at Princeton Language & Intelligence (PL&I) has come up with SWE-bench, a benchmark for evaluating AI coding agents (paper, code, benchmark). It turns out that current AI Agents are not achieving very good scores in this benchmark yet.

The PL&I team also open-sourced SWE-agent, an agent that turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories. Check out the video below for a good hands-on deep dive.

Amazon Q. The SWE-bench leaderboard is constantly changing, but it seems that Amazon Q Developer Agent is leading the pack for now. Amazon Q is a closed model and not very popular in the AI community. It solved only 13.8% of the 2,294 tasks. Not a lot, really! Here is a vid with a deep dive on Amazon Q.
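To put that percentage in perspective, here is a quick back-of-the-envelope sketch in Python. It only uses the two figures quoted above (13.8% and 2,294 tasks); the helper name `resolved_count` is made up for illustration, and the result is an approximation because the quoted rate is rounded.

```python
def resolved_count(rate_percent: float, total_tasks: int) -> int:
    """Approximate number of tasks solved, given a resolve rate in percent."""
    # Hypothetical helper: converts a rounded leaderboard percentage
    # back into an approximate task count.
    return round(rate_percent / 100 * total_tasks)


total = 2294                      # tasks in the benchmark, as quoted above
solved = resolved_count(13.8, total)
unsolved = total - solved

print(f"solved: ~{solved}, unsolved: ~{unsolved}")
```

In other words, roughly 317 issues fixed and close to 2,000 still unsolved.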

Devin the 1st Autonomous AI Engineer? In March, Cognition Labs announced Devin, the world’s first fully autonomous AI software engineer. Cognition Labs claimed they were setting a new state of the art on the SWE-bench coding benchmark. Devin went viral but then people in the AI community exposed some tricks used in Devin’s demo. Watch the video below to understand the good, the bad and the ugly of Devin.

Open source AI community to the rescue: OpenDevin. This started as a small side project and has quickly become one of the most popular AI Coding agent projects. OpenDevin agents collaborate with human developers to write code, fix bugs, and ship features. It is probably one of the best open source AI software engineers for developing apps. OpenDevin now reports 21% on SWE-bench Lite, a smaller subset of the benchmark, the highest score there. Check out the overview below:

Devika, MIT licensed. Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Check out the vid below.

Blackbox AI Coding Agents IDE. This is a pretty amazing, and still free, AI Coding Agents IDE that comes packed with agents specialised in 30+ development languages. The agents can perform natural language to code, chat to code, image to code, plus many s/w engineering tasks like bug fixing, unit testing, code translation, API integration, code docs, code optimisation… Apparently millions of devs use it. Check out Blackbox AI’s playground, agents and features here.

GPT-Engineer. Another popular open source AI engineer, gpt-engineer lets you: 1) specify software in natural language, 2) sit back and watch as an AI writes and executes the code, and 3) ask the AI to implement improvements. I’m not sure, but it seems this project is starting to fall behind other similar projects. In this video Arjan asks: “Is GPT Engineer Actually Useful?”

ChatDev Virtual Software Company. This is an amazing OSS virtual software company that operates through various intelligent agents holding different roles, including CEO, CPO, CTO, programmer, reviewer, tester, art designer… These agents form a multi-agent organisational structure and are united by a mission to “revolutionise the digital world through programming.” The agents within ChatDev collaborate by participating in specialised functional seminars, covering tasks such as designing, coding, testing, and documenting. Check out the repo and paper: ChatDev Multi-Agent Collaborative Software Development.

Other less ambitious AI Coding Agents for specific s/w engineering tasks.

  • PR-Agent automates the review and analysis of pull requests, and generates feedback and suggestions.

  • What The Diff automatically writes pull request descriptions, sends out summarised notifications to non-technical stakeholders in the loop, and helps you to refactor minor issues during the review.

  • Cover Agent automatically generates qualified tests to enhance existing test suites and help increase code coverage efficiently.

Have a nice week.

  1. An Evolving Sixth Sense for AI

  2. How to Build Terrible AI Systems

  3. Successful LMs Evals and 7 Mistakes

  4. Teaching Neural Nets to Make Neural Nets & Auto LoRA

  5. On Fine-tuning: Can a LLM Really Learn New Things?

  6. The KAN Revolution Arriving at AI: Death of DL?

  7. An Overview of the new OSS Falcon 2 Family of Models

  8. Perplexica – An OSS AI Search Engine Alt to Perplexity AI

  9. CogVLM2 – An OSS VL Model with Chat Skills that Beats GPT-4V

  10. Meta Tutorials: How to Run Llama-3 on Linux, Windows & MacOS

  1. KHOJ v1.12 – An OSS App that Creates Personal AI Agents

  2. Building a Multi-Tool AI Agent with Databricks DBRX & DSPy

  3. Implementing Llama-3 from Scratch, One Tensor and MatMul at a Time

  1. [tutorial] Diffusion Models in Image & Vision

  2. Extracting Millions of Interpretable Features from AI Models in Prod

  3. [tutorial] PyCon US2024 The Fundamentals of Modern DL with PyTorch

  1. Is Sora a World Simulator? A Survey on General World Models and Beyond

  2. Pandora: On-the-fly World Model VideoGen with NL (paper, code, demo)

  3. An Atari RL Agent Trained in a Diffusion World Model (paper, code, game)

  1. How to Test for Topics & Embeddings Drift

  2. OSS Netflix Metaflow v 2.11 – Easily Build & Manage AI/ML Projects

  3. Breaking Down Workflow Orchestration and Pipeline Authoring in MLOps

  1. An Awesome Collection of Resources on Synthetic Data

  2. BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

  3. Curating Custom Datasets for LLM Training with OSS NVIDIA NeMo Curator

Enjoyed this post? Tell your friends about Data Machina. Thanks for reading.

Tips? Suggestions? Feedback? email Carlos

Curated by @ds_ldn in the middle of the night.
