Marek Rosa – dev blog: LTM Benchmark: Improvements and new reports

At GoodAI, we are committed to developing agents that are capable of continual and life-long learning. As part of our efforts, we have previously open-sourced the GoodAI LTM Benchmark, a suite of tests aimed to evaluate the Long-Term Memory (LTM) abilities of any conversational agent. In this benchmark, all tasks take place as part of one single very long conversation between the agent and our virtual tester. The benchmark interleaves information and probing questions from different tasks, albeit taking special care of weaving them together into a natural conversation.

LTM = Long-Term Memory

As a direct consequence of our research in agents with LTM, the GoodAI LTM Benchmark is in constant evolution. To us it represents an invaluable tool for evaluating our agents and validating our hypotheses. Additionally, it helps us characterize the ways in which the distinct agents fail and therefore it provides us goals to aim for. In the GoodAI LTM team we regard the GoodAI LTM Benchmark as a moving goal post, and by introducing new tasks and features we are continuously pushing that goal post away, because what is a goal post worth if it is easy to reach?

Marek Rosa – dev blog: LTM Benchmark: Improvements and new reports

New features

With every new feature, we try to make the GoodAI LTM Benchmark not only more and more challenging, but also more realistic. The thing about benchmarking LTM is that you need your tests to be long, very long. So you either introduce a ton of dummy interactions for the sole sake of filling up the conversation, and accept that all those tokens are wasted resources, or you start interleaving the tasks and weave them into a seamless and natural conversation (like we do). We are always doing our best to minimize the amount of wasted tokens, whilst keeping the conversation natural and making sure that the agent can follow along.

For more details, continue to GoodAI Blog Post

Thank you for reading this blog!

Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI

For more news:
GoodAI Discord: https://discord.gg/Pfzs7WWJwf
Space Engineers: www.SpaceEngineersGame.com
Keen Software House: www.keenswh.com
VRAGE Engine: www.keenswh.com/vrage/
GoodAI: www.GoodAI.com
Personal Blog: blog.marekrosa.org

Personal bio:

Marek Rosa is the founder and CEO of GoodAI, a general artificial intelligence R&D company, and Keen Software House, an independent game development studio, started in 2010, and best known for its best-seller Space Engineers (over 5 million copies sold). Space Engineers has the 4th largest Workshop on Steam with over 500K mods, ships, stations, worlds, and more!

Marek has been interested in game development and artificial intelligence since childhood. He started his career as a programmer and later transitioned to a leadership role. After the success of Keen Software House titles, Marek was able to fund GoodAI in 2014 with a $10 Million personal investment.

Both companies now have over 100 engineers, researchers, artists, and game developers.

Marek’s primary focus includes Space Engineers, the VRAGE3 engine, the AI People game, long-term memory systems (LTM), an LLM-powered personal assistant with LTM named Charlie Mnemonic, and the Groundstation.

GoodAI’s mission is to develop AGI – as fast as possible – to help humanity and understand the universe. One of the commercial stepping stones is the “AI People” game, which features LLM-driven AI NPCs. These NPCs are grounded in the game world, interacting dynamically with the game environment and with other NPCs, and they possess long-term memory and developing personalities. GoodAI also works on autonomous agents that can self-improve and solve any task that a human can.

Marek Rosa – dev blog: LTM Benchmark: Improvements and new reports

New features

10 AI Events to Check in Fall & Winter 2021

Introductory time-series forecasting with torch

Does GPT-4 Pass the Turing Test?

Vapi Secures $20M Series A to Redefine Enterprise AI Voice Agents

Electronic health records (EHR) management with AI

Related articles

AI for the board game Diplomacy

10 AI Events to Check in Fall & Winter 2021

Introductory time-series forecasting with torch

Does GPT-4 Pass the Turing Test?

Latest news

AI for the board game Diplomacy

10 AI Events to Check in Fall & Winter 2021

Introductory time-series forecasting with torch

Popular news

AI for the board game Diplomacy

10 AI Events to Check in Fall & Winter 2021

Introductory time-series forecasting with torch