r/MachineLearning

[D] We audited LoCoMo: 6.4% of the answer key...

[Projects are still submitting new scores on LoCoMo as of March 2026.](https://github.com/snap-research/locomo/issues/34) We audited it and found 6.4% of the answer key is wrong, and the LLM judge accepts up to 63% of intentionally wrong answers. LongMemEval-S is often raised as an alternative, but each question's corpus fits entirely in modern context windows, making it more of a context window test than a memory test. Here's what we found.

## LoCoMo

LoCoMo ([Maharana et al., ACL 2024](https://
Read original on reddit.com
41
4

2 comments

techfan42 · 1h ago

This is a really insightful piece. The data backs up what I've been seeing in the industry.

devops_sam · 45m ago

Agreed. Would love to see a follow-up with more recent numbers.

curious_reader · 2h ago

I'm not sure the conclusion holds for smaller teams. Would be interesting to see this broken down by company size.
