Hiring Software Engineers in a ChatGPT World

Hiring good software engineers has always been a challenge. While large language models (LLMs) have made many routine tasks easier, they have made hiring much harder. When the effort required to spoof an application has become so minimal, how can we distinguish authentic candidates from AI-generated ones? And, importantly, how do we do this without spending all of our time on hiring?

Crafting a Winning AI Strategy: Critical Questions for Executives

In today's rapidly evolving business landscape, AI isn't just a buzzword; it's a game-changer. As an executive, you're likely facing the challenge of creating an AI strategy that drives real value for your organization. Maybe you’re at the stage where many different departments are piloting AI projects, and you’re wondering how they all add up. Or maybe some pilots have yielded results, and you’re wondering what infrastructure you could put in place to accelerate AI’s impact.

At Hop Labs, we've guided numerous clients through this process, and we've identified two levels of questions that can serve as the pillars of a robust AI strategy.

Four Truths About AI Strategies

AI strategy is a hot topic these days, and everyone’s scrambling to come up with one. But where do you start? 

At Hop, we have a comprehensive process for developing an AI strategy that we work through with our clients, but before we even get started, it's helpful to consider some foundational truths underlying our approach. These are some things we believe about AI strategies that are not necessarily widely understood.

Unproductive Claims about AI in 2024

I'm all for reasonable disagreements, but I find a lot of the current conversation around generative AI relatively unproductive. Every keynote speech at every conference I've been to this year has repeated some trite phrases that might make for a good sound bite but don't hold up to much critical consideration. To move the conversation forward usefully, this article points out some of the phrases that people use almost axiomatically but that I don't think are actually true.

How Does the Agile Manifesto Apply to Research Engineering?

Applying novel research methods to production systems can be messy: you have to experiment, try things out, change tactics, and abandon early attempts. This can result in tools that don't interoperate, duplicated infrastructure, a confusing backlog of tasks, and more.

Anybody who's been around software development in the past two decades is familiar with the standard approach for not getting buried by these kinds of challenges: Agile methodology, which increases the rate of iteration and builds flexibility into the process. At Hop, much of the software engineering work we do is in support of clients' machine learning research projects. The principles in the Manifesto for Agile Software Development are still relevant to engineers like us, but they benefit from a second look. In this article, we examine some of the twelve principles laid out in the manifesto and reflect on them in light of our experience working on a broad range of research-oriented projects.

Online Connections Are a Remote Substitute for Real Life

Know anyone who spends a lot of time in front of a screen? Our whole team does. Since you’re reading this online, there’s a good chance you do too. How much effort do you put toward balancing that with in-person social time? Technology offers so many benefits in our day-to-day living, including social connection opportunities that wouldn’t be possible without it. However, awareness of its limitations and downsides, as well as our fundamental need for real-life connection, is key to staying healthy.

Beyond Prompt Engineering: The Toolkit for Getting LLMs to Do What You Want, Part 2

Among the approaches for guiding the behavior of LLMs in applications, prompt engineering, fine-tuning, and LLM chaining garner the lion’s share of attention, and for good reason: they don’t require extremely deep technical expertise, and they support fast iteration cycles.

However, they don’t encompass the full scope of techniques that can or will be brought to bear in the creation of LLM applications in the coming years. In this post, we cover three more tools, ranging from techniques that are de rigueur for complex LLM applications to speculative ones that may not be production-ready for some time yet.

Beyond Prompt Engineering: The Toolkit for Getting LLMs to Do What You Want, Part 1

When creating LLM applications, people correctly place a lot of emphasis on the foundation model – the model underpinning an LLM app sets a cap on the reasoning ability of the system, and because LLM calls tend to dominate the per-interaction costs of serving an LLM application, the choice of foundation model sets the baseline for the marginal cost and latency of the whole system.

However, unless you’re trying to make a mirror of the ChatGPT or Claude website, you’ll want to modify the behavior of that underlying model in some way: you’ll want it to provide certain types of information, refrain from touching certain topics, and respond in a certain style and format. In this article and the next, we’ll discuss techniques for achieving that behavior modification, from the well-trodden to the exploratory.

Hear Me Out: The Potential of Low-Latency Voice AI

Picture this: two users, same exact need – to get advice on a health issue. User 1 opens up a text interface. Types in their symptoms, medical history, the works. Maybe they're a little embarrassed, but hey, no one's watching. They take their time, make sure they don't leave anything out. The AI comes back with a detailed response. User 1 reads it once, twice, a few times. Lets it sink in. They highlight the key points, the action items. They feel informed, empowered. They've got a plan.

Now User 2, they go for voice. They start explaining their symptoms, and the AI jumps in with clarifying questions. It's a back-and-forth, a real conversation. User 2 feels heard, understood. The AI shares its advice. User 2 listens intently. It's like the AI is right there in the room with them, guiding them. The inflection, the pauses, it all lands differently. User 2 feels cared for, supported.

Same need, two very different experiences. All because of the interface.

Leashing Your LLM: Practical and Cost-Saving Tips for Staying on Topic

The general nature of LLMs makes them inherently powerful but notoriously difficult to control. When building an LLM-based product or interface that is exposed to users, a key challenge is limiting the scope of interaction to your business domain and intended use case. This remains an “unsolved” problem in practice, mostly because modern LLMs are still prone to disregarding instructions and hallucinating (i.e., producing factually inaccurate output). As a consequence, operators must defend against unintended and potentially risky interactions. That can be difficult, because the ecosystem and tools for this problem are relatively nascent. Few (if any) commercial or open-source software packages offer out-of-the-box solutions that are accurate, simple, and affordable. We know, because our team has investigated many of these solutions, including AWS Bedrock Guardrails, NVIDIA NeMo Guardrails, and others.

The Most Important Uses for LLMs Aren’t Chatbots

Since the release of ChatGPT in late 2022, AI has received large and increasing amounts of attention and investment. We believe this is entirely warranted – AI in various forms is poised to change the way that businesses work. But one consequence of the ChatGPT release being the catalyst for this wave of attention is that people equate AI with large language models (LLMs), and they equate LLMs with chatbots.

We love chatbots – ChatGPT and others in its class are amazing tools – but, as an AI consultancy with a long history of projects in the space before the current mania, we’re sensitive to the conflation of LLMs and chatbots. Many of the most exciting potential uses for LLMs have little to do with the chatbot interface, and we think those should get more attention.

Engineer Better Research Results From a Solid Workbench

Treating the process of your work as being as important as the result will improve the quality of your results. All of the most successful projects that I’ve seen share a common factor: they are a delight to work on. When your workspace is organized, your tools are sharp, and the goals are clear, it’s easier to stay in a flow state and to do your best work. Projects that are mired in tedium, lack a good feedback loop, and don’t have a solid pattern of delivery can easily get into trouble; without enough institutional momentum to make up for the poor engineering environment, they can fail. A lot of focus gets put on building the right thing for customers, and rightfully so, but it’s important to remember that before we can ship anything, we first have to build our workbench. Whether we do that haphazardly or intentionally can have an enormous impact on the quality of our results.

Evaluating the Evaluators: LLM Assessments in Practice

While an afternoon can be enough to get an LLM app demo working, it can take much longer to characterize and curtail unexpected LLM behavior. The process of making an LLM app reliable is mostly trial and error, involving spot-checking by the developer, reviews by product owners, and auto-evaluation. Auto-evaluation was introduced with the GPTScore paper in 2023, and by now people appreciate the need to evaluate this middle layer of LLM evaluators. At Hop, we’ve spent much of the past year working with auto-evaluation and feel that there’s a rich set of design decisions that aren’t regularly discussed. Here are some of the things we’ve been thinking about along the way.
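One common way to evaluate an auto-evaluator, offered here as an assumption rather than as Hop's method, is to compare its verdicts against a small set of human spot-check labels on the same LLM outputs. The sketch below computes raw agreement and Cohen's kappa (agreement corrected for chance); the function name `agreement_report` and the "pass"/"fail" labels are hypothetical.

```python
from collections import Counter


def agreement_report(human: list[str], auto: list[str]) -> dict[str, float]:
    """Compare an auto-evaluator's verdicts with human spot-check labels (hypothetical helper)."""
    if len(human) != len(auto) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Observed agreement: fraction of items where human and auto-evaluator gave the same label.
    p_o = sum(h == a for h, a in zip(human, auto)) / n
    # Chance agreement, estimated from each rater's marginal label frequencies.
    h_counts, a_counts = Counter(human), Counter(auto)
    p_e = sum(h_counts[label] * a_counts[label] for label in h_counts) / (n * n)
    # Cohen's kappa: 1.0 is perfect agreement, 0.0 is no better than chance.
    # If both raters always use the same single label, p_e == 1 and kappa is undefined; report 1.0.
    kappa = 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
    return {"observed_agreement": p_o, "cohens_kappa": kappa}


if __name__ == "__main__":
    human = ["pass", "pass", "fail", "pass", "fail", "pass"]
    auto = ["pass", "fail", "fail", "pass", "fail", "pass"]
    print(agreement_report(human, auto))
```

For these six made-up items the raters agree on five (about 0.833), but because chance agreement is 0.5, kappa comes out to about 0.667. A high raw agreement can hide a weak evaluator when one label dominates, which is why a chance-corrected measure is worth reporting alongside it.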

Could You Be Talking to an AI Doctor?

Think back to your last telehealth visit with a doctor. Perhaps your kid had a persistently high fever, or you had worrying chest pain. Are you sure you were interacting with a human? What makes you sure? Perhaps the doctor listened attentively to your symptoms, asked pertinent questions, and even picked up on subtle cues in your language that hinted at the severity of your condition. 

Testing Research Code: Sometimes Worth It

Machine learning researchers often don’t write tests for their code. They’re not software engineers, and their code needs only to train a model or prove out an experiment. Plus, their code changes rapidly, and in that context it’s hard to write tests that don’t immediately need to be rewritten. However, at Hop, we’ve found that adding certain kinds of tests can actually accelerate research and increase confidence in results by improving code quality and encouraging reuse.

Why Most LLM App POCs Fail

LLMs aren’t yet widely used as an architectural component in production, and the core issue is reliability. What I think limits the success of most teams building LLM-powered applications is not knowing how to engage with the reliability challenge in a structured and productive manner. In our projects at Hop, we’ve developed a relatively uncommon perspective on how to engage with this challenge effectively.

Machine Learning Is About Statistics After All: A Series of Vignettes, Part 1

Over the past decade, we’ve seen traditional statistics become less and less important in data science. It’s now possible to train complicated models while understanding very little about how they work. There’s a widespread attitude among practitioners that it’s enough to know how to code up architectures in PyTorch and fix obscure bugs, and that the math is someone else’s problem. We at Hop put ML models into production, and we’re here to tell you that the math is not someone else’s problem.

Code Quality for Research

I view research (and especially applied research of the type that Hop does) as a type of multi-armed bandit problem: one that balances trying new approaches (exploration) against building on successful ones (exploitation). The conversation about code quality and technical debt is usually muddled these days, but it becomes easier to think about once you articulate where on the exploration/exploitation spectrum you currently are.
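To make the bandit framing concrete, here is a minimal epsilon-greedy sketch. It illustrates the exploration/exploitation trade-off in general and is not a model of Hop's process: the arms stand in for hypothetical research directions with assumed success rates, and `epsilon` (a name chosen for this sketch) is the fraction of steps spent exploring at random.

```python
import random


def epsilon_greedy(true_means: list[float], steps: int, epsilon: float, seed: int = 0) -> list[int]:
    """Simulate an epsilon-greedy agent on Bernoulli arms and return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # how many times each arm has been tried
    estimates = [0.0] * k     # running mean reward observed for each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: try any arm uniformly at random
        else:
            # Exploit: pick the arm with the best estimate so far (ties go to the lowest index).
            arm = max(range(k), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the arm's mean reward estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # assumed success rates, for illustration only
    print("pure exploitation:", epsilon_greedy(arms, 1000, epsilon=0.0))
    print("some exploration: ", epsilon_greedy(arms, 1000, epsilon=0.1))
```

With `epsilon=0.0` the agent never explores and keeps pulling the first arm it tried, even though another arm pays off far more often. A small positive `epsilon` lets it discover the better arm and then spend most of its steps there.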