Building Trust with LLMs: Balancing Product with Potential

SUMMARY

  • Harvard Business Publishing (HBP) recognized the potential reward of leveraging LLMs for Harvard Business Review’s rich archive of content, but was also aware of the potential brand risk if they performed poorly.

  • HBP engaged Hop as a guide to navigate differing internal perspectives about how to approach generative AI, and to bridge the gap between their minimal GenAI experience and a customer-facing application that would protect and boost their brand.

  • As reliability was crucial, our approach toward building an LLM-powered chatbot invested heavily in rigorous measurement and evaluation.

  • Our efforts resulted in a smooth deployment to subscribers on HBR.org.

  • HBP is now exploring additional ways to leverage GenAI, and will be able to reuse and extend the modular and open-source-powered infrastructure we created.


THE COMPANY

Harvard Business Publishing (HBP) is a well-established company at the forefront of business knowledge, known for its flagship media brand Harvard Business Review (HBR), case studies, and leadership development solutions. Early in the period when large language models (LLMs) started seeing widespread commercial use, HBP recognized that LLMs stood poised to transform the publishing industry. However, the potential reward of leveraging LLMs also came with a significant brand risk if they performed poorly.

HBP was in search of a guide to help navigate differing internal perspectives about how to approach generative AI, and to bridge the gap between their minimal GenAI experience and a customer-facing application that would protect and boost their brand. HBP engaged Hop in that capacity, to develop an LLM-powered chatbot application that would leverage HBR’s rich archive of content to answer customers’ questions about leadership and management.

THE CHALLENGE

HBP faced several organizational, technical, and reputation-related challenges in the implementation of this LLM application. It was clear to many in the company that there were opportunities to use LLMs in HBP products, but perspectives conflicted and direction was unclear. They didn’t yet know what was possible from a product perspective, how to get started, or what they needed to know to implement successfully – they were facing true uncertainty.

Multiplying this risk was the overwhelming amount of change in the technology, which made it difficult to know which investments would last, and which products would continue to be relevant through the course of the development cycle. There were a number of vendors pitching interfaces for LLM applications, but it was unclear which vendors provided real value and which were just hype. They needed trusted partners to navigate this space, and to help upskill and expand their own internal team.

HBP has a trusted, high-value brand – one that stands to lose a lot if an ineffective, or worse, toxic, chatbot were associated with its name. Not only did they have to chart a course through rapidly changing waters, they had to do it in a way that avoided any potential for a negative brand experience.

THE APPROACH

In the space of LLM applications, a proof of concept can be built over a weekend, but making that POC reliable takes much more work. Unlike startups, which fail fast in search of product-market fit, trusted, high-value brands depend on reliability.

Reliability can only be achieved through rigorous measurement and evaluation, so our approach invested heavily in the evaluation process. Our team worked intensively with the editorial and product teams at HBP to develop a rubric for application output, then built out the infrastructure to run experiments on each component of the system.
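A rubric-driven evaluation of this kind can be sketched in a few lines of code. The criteria, weights, and names below are purely illustrative – HBP’s actual editorial rubric is not reproduced here – but the shape is typical: each graded output earns points per criterion, and experiments are compared by their average normalized score.

```python
from dataclasses import dataclass

# Hypothetical rubric: criterion -> maximum points. The real rubric was
# developed with HBP's editorial and product teams; these are stand-ins.
RUBRIC = {
    "faithful_to_source": 3,   # grounded in the retrieved HBR content
    "on_brand_tone": 2,        # voice consistent with editorial standards
    "directly_answers": 2,     # addresses the question actually asked
}

@dataclass
class Judgment:
    """One reviewer's grades for one application output."""
    scores: dict  # criterion -> points awarded (capped at the rubric max)

def rubric_score(judgment: Judgment) -> float:
    """Normalized score in [0, 1] across all rubric criteria."""
    total = sum(RUBRIC.values())
    earned = sum(min(judgment.scores.get(c, 0), max_pts)
                 for c, max_pts in RUBRIC.items())
    return earned / total

def evaluate_run(judgments: list) -> float:
    """Average rubric score over a batch of graded outputs,
    used to compare one experimental configuration against another."""
    return sum(rubric_score(j) for j in judgments) / len(judgments)
```

With scoring fixed up front like this, any change to the system – a new model, a revised prompt, a different retrieval strategy – can be judged on the same scale.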

We expected the technology to evolve rapidly during our development period, and we mitigated this risk by architecting the system to be able to replace models as necessary. With a modular design, we were able to prioritize open-source resources and choose LLMs solely according to their performance on quality, latency, and cost.
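In practice, a model-swappable architecture like this usually comes down to programming against a narrow interface. The sketch below (names are illustrative, not Hop’s actual code) shows the idea: application logic depends only on the interface, so replacing the underlying LLM is a configuration change rather than a rewrite.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Minimal interface every candidate LLM backend must satisfy."""
    def complete(self, prompt: str) -> str: ...

class EchoModel:
    """Stand-in backend for tests; a real backend would call an LLM API
    (hosted or open-source) behind the same complete() method."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def answer_question(model: ChatModel, question: str) -> str:
    # The application never imports a specific vendor SDK directly,
    # so models can be compared and swapped on quality, latency, and cost.
    return model.complete(question)
```

Because every backend is interchangeable behind `ChatModel`, the same evaluation suite can be run against each candidate before committing to a switch.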

As anticipated, new generations of LLMs emerged three times during the project. Our approach allowed us to quickly determine the added value of switching, and then implement the switch seamlessly. Interestingly, we found that, for one of the new generations, “upgrading” the LLM decreased the quality of our application, reinforcing the fact that evaluations must be done in context.

In the final stage of the initial development process, we prepared for a brand-safe rollout, combining extensive in-house testing with a structured red-teaming exercise, then a pilot program and a gradual rollout to HBP customers. To be fully ready, we worked with HBP’s communications, legal, and executive teams to develop an incident response plan in the event of anything going awry.

Hop’s approach spanned the entire process of implementation, from automated content ingestion pipelines connected to HBP’s own data infrastructure to a serverless implementation of the application hosted in HBP’s cloud infrastructure. We also developed self-service tools for HBP editorial staff to experiment with prompt engineering in the context of the application, and fostered the institutional knowledge required to make judgment calls about potential prompts. Our team shepherded the product through a brand-safe launch and responded to early user feedback to develop additional features.

THE RESULTS

Hop’s engagement with HBP resulted in the smooth deployment of an LLM application to HBR subscribers, who can access the tool on HBR.org. It’s also being incorporated into a new enterprise learning product. HBP customers now have one more way to surface and interact with HBP content, and HBP has a new window into the needs of their customers.

HBP is much better informed about the potential and limitations of GenAI. They are already using the institutional knowledge developed during this project to explore alternate places to introduce GenAI, as well as multiple ways to reuse and extend the same infrastructure (since it’s modular and uses open-source components) for other use cases.

For more on this project, from the HBP perspective, check out this podcast.

Looking to leverage LLMs to boost your brand? Contact us to learn how Hop can help.