The Upstart Contenders - Evaluating Reka AI, Mistra, and Cohere

While AI giants like OpenAI, Google, and Anthropic have dominated the large language model landscape, a new crop of ambitious startups is emerging to challenge the status quo. Leveraging the latest innovations in language AI and novel training techniques, upstarts like Reka AI, Mistra, and Cohere are delivering intriguing LLM offerings that could disrupt the playing field.

The experts at LMSYS have put models from these providers through a rigorous gauntlet of evaluations to see how they stack up against behemoths like GPT-4, PaLM, and Claude. The results reveal some promising contenders that excel in specific areas while still trailing the leaders in overall quality. Let's look at how each one fares:

Reka AI

Founded by AI research pioneers, London-based Reka burst onto the scene by training what they claim to be the first "constitutionally constrained" LLM focused on truthfulness and reliability from the ground up. Their RekAI20B and RekAI80B models emphasize factual responses over fluency or creative flair.

Where Reka's LLMs fall behind is in more subjective or open-ended language tasks like story writing, opinionated text generation, and multi-modal capabilities involving audio/visuals. Their scores in those domains trail the established leaders by a fair margin.

However, for applications where factual accuracy and multi-step reasoning are paramount - such as research, analysis, data exploration, or expert system development - Reka's laser focus on truthful outputs could make their offerings extremely valuable and differentiated.

Mistra AI

Taking a contrasting philosophical approach, Mistra's LLMs double down on open-ended generation, creative language, and multi-lingual skills with models like Mistra-MT and Mistra-Writer. Instead of emphasizing factual correctness, these models embrace a more freeform and unbounded approach to language production.

Where Mistra struggles is in areas that require precision, logical coherence, and grounding in factual sources of knowledge. Their models underperform on tasks like question-answering, multi-step math reasoning, and analytical report generation compared to the leading LLMs.

But for use cases where unbridled creativity, linguistic diversity, and free-flowing generation are valued over strict accuracy - such as story ideation, poetry composition, open-domain dialogue - Mistra could be an intriguing option that leans into that unrestrained expressiveness.

Cohere AI

Representing a more general-purpose approach, Cohere was founded by engineers from OpenAI, Google Brain, and the University of Toronto. Their core LLMs like Cohere-LX aim to deliver high performance across a wide breadth of language skills without the same level of specialization as Reka or Mistra.

While not quite matching GPT-4's peak scores, Cohere holds its own across a variety of linguistic skills - displaying strong coding abilities, multi-lingual transfer, and summarization talents that could make it appealing for enterprise use cases.

Cohere's models fall a bit flat in more specialized domains like mathematical reasoning, advanced multi-modal understanding, and open-ended creative ideation. Cohere doesn't lead in any one category, but it presents a robust all-around offering.

The Startups' Advantages

So what could give these LLM upstarts an edge over the current AI titans despite trailing in some quality benchmarks? A few key factors:

  1. Agility & Rapid Iteration: As younger and more nimble organizations, startups like Reka, Mistra, and Cohere can move quickly to incorporate the latest AI training techniques, optimize inference costs, and iterate on their offerings. They have a pace advantage.
  2. Risk-Taking & Specialization: Without the regulatory pressures that big consumer tech companies face, LLM startups can take more risks in areas like constitutional training philosophies (Reka), unconstrained open-ended generation (Mistra), or specialized industry domains. This differentiation could be valuable.
  3. Cost Innovation: If these startups can achieve even comparable quality to incumbents at lower price points through novel techniques, they could win over cost-conscious enterprises with an appealing value proposition.
  4. Potential M&A Plays: Given their head start in next-gen language AI, these upstarts could also make attractive M&A targets for big tech companies looking to rapidly bolster their LLM capabilities through acquisition.
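
To make the cost point above concrete, here is a minimal Python sketch of how a cost-conscious team might pick the cheapest model that still clears a quality bar. Everything in it is an assumption for illustration: the ModelQuote type, the function names, and the numbers in the usage example are hypothetical placeholders, not real scores or prices for any provider mentioned here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelQuote:
    """Hypothetical record: one model's benchmark score and list price."""
    name: str
    quality_score: float           # any benchmark score, higher is better
    usd_per_million_tokens: float  # blended price per one million tokens


def monthly_cost(quote: ModelQuote, tokens_per_month: int) -> float:
    """Estimated monthly spend in USD for a given token volume."""
    return quote.usd_per_million_tokens * tokens_per_month / 1_000_000


def cheapest_meeting_bar(
    quotes: Sequence[ModelQuote], min_quality: float
) -> Optional[ModelQuote]:
    """Return the lowest-priced model whose score is at least min_quality.

    Returns None when no model clears the bar.
    """
    eligible = [q for q in quotes if q.quality_score >= min_quality]
    if not eligible:
        return None
    return min(eligible, key=lambda q: q.usd_per_million_tokens)


if __name__ == "__main__":
    # Placeholder values only; substitute figures from your own evaluation.
    quotes = [
        ModelQuote("incumbent", quality_score=90.0, usd_per_million_tokens=30.0),
        ModelQuote("upstart", quality_score=85.0, usd_per_million_tokens=10.0),
        ModelQuote("budget", quality_score=70.0, usd_per_million_tokens=2.0),
    ]
    choice = cheapest_meeting_bar(quotes, min_quality=80.0)
    if choice is not None:
        print(choice.name, monthly_cost(choice, tokens_per_month=50_000_000))
```

The design choice is deliberate: treating quality as a threshold rather than something to maximize reflects the argument that "comparable" quality at a lower price can be enough to win a deal.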

The Road Ahead

While OpenAI, Google, Anthropic, and others clearly maintain an overall lead in LLM quality across the broadest measures, this new crop of nimble startups is pushing the boundaries in fascinating ways that could reshape the landscape down the road.

From ultra-reliable models like Reka's, to open-ended creative powerhouses like Mistra, to general-purpose high performers from Cohere, we're witnessing accelerating innovation across the LLM space that will only intensify in 2024 and beyond.

So while the name-brand leaders still reign supreme for most current use cases, developers, enterprises, and technically savvy users would be wise to keep a close eye on the upstart contenders as well. The LLM playing field is widening rapidly - and that increased competition will ultimately drive more breakthroughs to benefit everyone.

For a comparison of rankings and prices across different LLM APIs, you can refer to LLMCompare.