Stefan Krawczyk | Sep, 2023: Unveiling the Fascinating LLMOps: Hamilton’s Production Prompt Engineering Patterns

Introduction:

It is important to have a system in place to track prompts and monitor changes, such as a version control system or a change-tracking tool. Keeping track of prompt changes helps ensure that any issues or inconsistencies can be quickly identified and resolved. It is also important to have a way to manage and update prompts, whether by dynamically loading them at runtime or retrieving them from an external system, so that you can iterate on prompts quickly without redeploying the entire application.

Full Article

An Overview of Iterating on Prompts with Hamilton

The importance of the prompts used with large language models (LLMs) cannot be overstated. Even small changes to a prompt can significantly change a model’s output. As your product evolves, so too must your prompts, and with new LLMs constantly being developed and released, prompts often need to be adjusted to match. To keep this iteration fast without compromising production quality, you need an established iteration pattern. In this post, we’ll explore best practices for managing prompts using Hamilton, an open-source micro-orchestration framework. We’ll draw analogies to MLOps patterns and discuss trade-offs along the way. While the focus is on Hamilton, the high-level takeaways apply even if you don’t use this specific tool.

Before we dive in, here are a few things to note:

– I am one of the co-creators of Hamilton.
– If you’re not familiar with Hamilton, you can find more information at the bottom of this post.
– This article doesn’t cover “context management.” Instead, it focuses on the nuts and bolts of iterating on prompts and building a production-grade prompt-management story around them.
– We’ll assume that the prompts are being used in an online web service setting.
– Our Hamilton PDF summarizer example will serve as a reference point throughout this discussion.
– Our credibility in this domain stems from our experience building self-service data/MLOps tools, particularly for Stitch Fix’s team of over 100 data scientists.

Prompts and LLM APIs: An Analogy

When it comes to “Ops” practices, both LLMOps and MLOps are relatively new concepts. While MLOps has gained some traction, neither practice is as widely adopted as DevOps. DevOps focuses on shipping code to production, while MLOps deals with shipping code and data artifacts (such as statistical models) to production. So where does LLMOps fit in? Personally, I believe LLMOps aligns more closely with MLOps. Here’s why:

1. LLM workflows consist primarily of code.
2. The LLM exposed by an API is a data artifact whose behavior you tune through prompts, much as you tune an ML model through hyper-parameters.

Given this perspective, it’s clear that versioning the LLM API and prompts together is crucial for solid production practices. Just as you would validate that an ML model behaves correctly when its hyper-parameters change, the same principle applies to LLMs and prompts. It’s important to distinguish changes to the surrounding LLM workflow from changes to the LLM API + prompts combination: the LLM itself (whether self-hosted or accessed via an API) is generally static, so in practice it is a modified prompt that effectively creates a new model artifact.
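To make that coupling concrete, here is a minimal sketch (the class, field names, and version scheme are illustrative assumptions, not part of Hamilton or any LLM API) of treating the model identifier and prompt template as one versioned artifact:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptedModel:
    """Treat the (LLM, prompt) pair as a single versioned artifact."""
    model_name: str      # pinned model identifier, e.g. a specific API model version
    prompt_version: str  # e.g. a git SHA or semantic version of the template
    prompt_template: str


# Bumping either the model or the prompt produces a new artifact that should be
# validated, rolled out, and rolled back as a unit.
SUMMARIZER_V12 = PromptedModel(
    model_name="gpt-3.5-turbo-0613",
    prompt_version="v12",
    prompt_template="Summarize the following text:\n\n{chunked_text}",
)
```

Whether this lives in code or in an external system, the point is the same: a prompt change gets the same treatment as swapping in a new model.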

Two Approaches to Managing Prompts

There are two main approaches to handling prompts:

1. Treat prompts as dynamic runtime variables: In this approach, the prompt template used is not static but can be updated during deployment.
2. Treat prompts as code: In this approach, the prompt template remains static and predetermined for each deployment.

The choice between these approaches impacts the complexity of managing the components involved. Let’s explore each in the context of Hamilton, starting with a brief sketch of the prompts-as-code option below before going deeper on dynamic loading.
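Here is a minimal sketch of what approach 2, prompts as code, looks like in Hamilton: the templates are plain functions in the versioned code module (the function and module names mirror the PDF summarizer example; the prompt text itself is a placeholder):

```python
# summarization_shortened.py: prompts as code. Templates live in the module
# and are reviewed, versioned, and deployed with the rest of the application.


def summarize_chunk_of_text_prompt() -> str:
    """Static prompt template for summarizing a single chunk of text."""
    return "SOME PROMPT FOR {chunked_text}"


def summarize_text_from_summaries_prompt() -> str:
    """Static prompt template for combining chunk summaries."""
    return "SOME PROMPT {summarized_chunks} ... {user_query}"
```

With this approach, changing a prompt is a code change: it goes through review and CI, and rolling back the deployment rolls back the prompt. In the dynamic approach described next, these functions are instead removed from the module so that the prompts become runtime inputs or are supplied by separate loader functions.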

Dynamically Pass/Load Prompts

Prompts are essentially strings, making them easy to pass around in most programming languages. The idea behind dynamically passing or loading prompts is to abstract your code so that the required prompts can be specified at runtime. Essentially, you “load” or “reload” prompt templates whenever an updated version is available.

To draw an analogy to MLOps, this approach is similar to auto-reloading the ML model artifact (e.g., a pkl file) whenever a new model becomes available.
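As a rough illustration of that analogy (a hypothetical sketch; the registry path and polling policy are assumptions, not anything Hamilton prescribes), an ML service might check a registry location on each request and swap in the newest pickled model without a redeploy. Dynamically loaded prompts follow the same shape, with a template string in place of the pkl file:

```python
import os
import pickle
import threading

MODEL_PATH = "/mnt/model-registry/summarizer/latest.pkl"  # hypothetical registry location

_model = None
_model_mtime = 0.0
_lock = threading.Lock()


def get_model():
    """Reload the pickled model artifact whenever a newer file appears."""
    global _model, _model_mtime
    mtime = os.path.getmtime(MODEL_PATH)
    with _lock:
        if _model is None or mtime > _model_mtime:
            with open(MODEL_PATH, "rb") as f:
                _model = pickle.load(f)
            _model_mtime = mtime
    return _model
```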

The benefit of this approach is the ability to quickly roll out new prompts without needing to redeploy your application. However, it also comes with increased operational burden. Consider the following challenges:

– Monitoring changes: It may be unclear to those monitoring your application when a prompt change occurs and if it has propagated through the system. For instance, if a new prompt is pushed and the LLM begins returning more tokens per request, causing latency to spike, it can be confusing for the monitoring team.
– Rollback semantics: If something goes wrong, rolling back to a previous deployment may not be straightforward without knowledge of another system. Simply rolling back the code alone won’t address the prompt changes.
– Managing system understanding: It’s important to have robust monitoring in place to track which prompts were used. This information is crucial when customer service or support teams raise tickets for investigation.
– Prompt management: You’ll need to establish and monitor a system to handle and store your prompts. This system will require additional maintenance and management efforts separate from your code-serving infrastructure.
– Synchronizing changes: Updating and deploying your code and prompts will require coordination. If changes need to be made to both systems, ensuring they align can become a significant operational overhead.

In the Hamilton workflow, the code for our PDF summarizer would include the required prompt inputs for each function. This way, you can inject the prompts at request time or dynamically load them from an external system:

```python
from hamilton import driver
import summarization_shortened

# Create the driver
dr = driver.Builder().with_modules(summarization_shortened).build()

# Pull prompts from somewhere
summarize_chunk_of_text_prompt = """SOME PROMPT FOR {chunked_text}"""
summarize_text_from_summaries_prompt = """SOME PROMPT {summarized_chunks} ... {user_query}"""

# Execute, passing in the prompts at request time
result = dr.execute(
    ["summarized_text"],
    inputs={
        "summarize_chunk_of_text_prompt": summarize_chunk_of_text_prompt,
        "summarize_text_from_summaries_prompt": summarize_text_from_summaries_prompt,
    },
)
```

Alternatively, you can modify your code to dynamically load prompts by adding functions that retrieve prompts from an external system:

```python
# prompt_template_loaders.py
def summarize_chunk_of_text_prompt(db_client: Client, other_args: str) -> str:
    # Pseudo code to retrieve the latest prompt from the database.
    # `Client` stands in for whatever database / prompt-store client you use.
    _prompt = db_client.query("get latest prompt X from DB", other_args)
    return _prompt


def summarize_text_from_summaries_prompt(db_client: Client, another_arg: str) -> str:
    # Pseudo code to retrieve the latest prompt from the database.
    _prompt = db_client.query("get latest prompt Y from DB", another_arg)
    return _prompt
```

You can then modify the driver code to use these prompt retrieval functions:

```python
from hamilton import driver
import prompt_template_loaders  # Load this module to provide the prompt inputs
import summarization_shortened

# Create the driver
dr = (
    driver.Builder()
    .with_modules(prompt_template_loaders, summarization_shortened)
    .build()
)

# Execute without passing prompts as inputs; they are supplied by the loader functions
result = dr.execute(
    ["summarized_text"],
    inputs={},  # No prompts needed; in real code you would pass loader arguments (e.g. db_client) here
)
```
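If querying the external prompt store on every request is too slow or too expensive, one common refinement (a sketch; `Client`, `db_client`, and the TTL value are placeholders, consistent with the pseudo code above) is to cache the loaded template for a short interval so that updates still propagate within a bounded delay:

```python
# prompt_template_loaders.py: variant with a short-lived cache
import time
from typing import Callable, Dict, Tuple

_CACHE: Dict[str, Tuple[float, str]] = {}
_TTL_SECONDS = 60  # how stale a cached prompt is allowed to be


def _cached(key: str, loader: Callable[[], str]) -> str:
    """Return a cached prompt, refreshing it via `loader` once the TTL expires."""
    now = time.time()
    entry = _CACHE.get(key)
    if entry is None or now - entry[0] > _TTL_SECONDS:
        _CACHE[key] = (now, loader())
    return _CACHE[key][1]


def summarize_chunk_of_text_prompt(db_client: Client, other_args: str) -> str:
    # Pseudo code: hit the database at most once per TTL window.
    return _cached(
        "summarize_chunk_of_text_prompt",
        lambda: db_client.query("get latest prompt X from DB", other_args),
    )
```

The trade-off is that a bad prompt also lingers for up to the TTL after you fix it upstream, so the monitoring and rollback concerns above still apply.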

Monitoring and Logging Prompt Usage

To keep track of prompt usage, you can employ various logging and monitoring techniques. Here are a few options:

– Log results of execution: After running Hamilton, you can log the relevant information to a system of your choice. For example:

```python
result = dr.execute(
    ["summarized_text", "summarize_chunk_of_text_prompt"],  # ... and any other desired outputs
    inputs={},  # No prompts needed in this version
)
my_log_system(result)  # Send relevant data to a system you own for safekeeping
```

– Log specific prompts: To log a specific prompt used, you can include a logging statement within the prompt retrieval function. For example:

```python
import logging

logger = logging.getLogger(__name__)


def summarize_text_from_summaries_prompt(db_client: Client, another_arg: str) -> str:
    # Pseudo code to retrieve the latest prompt from the database
    _prompt = db_client.query("get latest prompt Y from DB", another_arg)
    logger.info(f"Prompt used is [{_prompt}]")
    return _prompt
```

Extending Hamilton to Include Logging

Hamilton lets you capture information from the functions it executes, so you can extend your dataflow to include prompt logging. For instance, with the logging version of summarize_text_from_summaries_prompt above placed in prompt_template_loaders.py, the driver code stays the same:

```python
from hamilton import driver
import prompt_template_loaders  # Loader functions, now with logging (see above)
import summarization_shortened
import logging

logging.basicConfig(level=logging.INFO)  # Ensure the prompt log lines are emitted

# Create the driver
dr = (
    driver.Builder()
    .with_modules(prompt_template_loaders, summarization_shortened)
    .build()
)

# Execute without passing prompts as inputs; the loader functions fetch and log them
result = dr.execute(
    ["summarized_text"],
    inputs={},  # No prompts needed; in real code you would pass loader arguments (e.g. db_client) here
)
```

With these approaches, you can effectively monitor prompt usage and gain insights into the prompts applied during each execution of your code.
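Putting these pieces together, here is one way the hypothetical my_log_system from earlier might be fleshed out (the logger name and JSON-lines format are assumptions for illustration): a structured record per execution that ties the prompt used to the output produced, which is what makes the support-ticket investigations mentioned above tractable:

```python
import json
import logging

audit_logger = logging.getLogger("prompt_audit")


def my_log_system(result: dict) -> None:
    """Emit one structured record linking the prompt used to the output produced.

    In practice you would also attach a request or trace ID so tickets can be
    traced back to the exact prompt that served them.
    """
    audit_logger.info(json.dumps({
        "prompt": result.get("summarize_chunk_of_text_prompt"),
        "summarized_text": result.get("summarized_text"),
    }))
```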

Conclusion

Managing prompts in a production setting requires careful consideration and a structured workflow. By implementing the practices outlined here, utilizing tools like Hamilton, and leveraging MLOps principles, you can iterate on prompts effectively and streamline your production process. Remember to monitor prompt changes, establish rollback strategies, and maintain robust systems for prompt management. With a well-structured approach, you can confidently evolve your prompts and maximize the potential of your LLMs.

Summary

“An Overview of Production-Grade Prompt Iteration with Hamilton”

This article discusses the importance of evolving prompts in a production context when working with large language models (LLMs). It highlights the need to iterate on prompts as LLMs change and provides best practices for managing prompts using Hamilton, an open-source micro-orchestration framework. The article also draws analogies to MLOps patterns and discusses trade-offs. Overall, it provides valuable insights for efficiently iterating on prompts in a production environment.



LLMOps: Production Prompt Engineering Patterns with Hamilton – FAQs

Frequently Asked Questions

1. What is LLMOps?

LLMOps refers to the operational practices for shipping and running applications built on large language models (LLMs): managing the code, the LLM API, and the prompts together in production. It is closely related to MLOps, with the LLM + prompt combination playing the role of the model artifact.

2. How does LLMOps benefit production prompt engineering?

Treating prompts with the same rigor as code and model artifacts means prompt changes are versioned, validated, monitored, and easy to roll back. That lets you iterate on prompts quickly without compromising the quality of what you ship to production.

3. What is Hamilton?

Hamilton is an open-source micro-orchestration framework for expressing dataflows in Python. In this article it is used to structure LLM workflows, such as the PDF summarizer example, so that prompts can either be injected at runtime or defined as code.

4. How can I implement LLMOps with Hamilton?

First decide whether to treat prompts as dynamic runtime variables or as code. With Hamilton, the dynamic approach means passing prompts as inputs to dr.execute() or loading them via dedicated loader functions; the prompts-as-code approach means defining the templates directly in your code modules so they are versioned and deployed with the rest of the application. Hamilton’s documentation and examples walk through these patterns.

5. Are there any specific requirements to use LLMOps with Hamilton?

Hamilton is a Python framework, so you need a Python environment and basic familiarity with Python. If you choose the dynamic-loading approach, you also need an external system (for example, a database or a version-controlled store) to hold and version your prompts.

6. Can LLMOps be used in any industry?

Yes. The patterns described here are not industry-specific; any organization shipping LLM-powered features to production can apply them.

7. Is LLMOps suitable for both small-scale and large-scale systems?

Yes. Treating prompts as code is the simpler option and works well for small systems, while dynamic prompt loading adds operational overhead (monitoring, rollback, synchronization) that larger teams are usually better equipped to absorb.

8. Can LLMOps and Hamilton help improve system reliability?

Yes. Versioning the LLM + prompt combination, logging which prompt was used for each request, and having clear rollback semantics all make it easier to detect, diagnose, and recover from bad prompt changes.

9. Are there any success stories of implementing LLMOps with Hamilton?

The patterns here draw on the authors’ experience building self-service data/MLOps tooling, including for Stitch Fix’s team of over 100 data scientists, applied to LLM workflows such as the open-source PDF summarizer example.

10. How can I get started with LLMOps and Hamilton?

Start with the Hamilton documentation and open-source examples (such as the PDF summarizer referenced in this article), then pick the prompt-management approach, dynamic loading or prompts-as-code, that fits your operational constraints.