But it worked in the notebook!

Displaying Imagen images in web applications 

In addition to generating text, many LLMs can generate images.  Of course they’re fun to look at and share, but you can also use them to personalize web pages and  show your product in a variety of worlds the customer creates.  We’re only starting to see all the possible uses of generated images. 

It’s easy to get started with Imagen, Google’s text-to-image model.  There’s a link on the Imagen overview page that lets you try it out in your browser.  If you’re interested in seeing code for image generation, a quick start is available in this notebook.  Displaying an image named myImage in a notebook is as easy as myImage.show().

Encouraged by this, you may be ready to add Imagen to a web application immediately.  More good news–there’s a Codelab that walks you through the steps to do this.  And if you’re familiar with deploying to Cloud Run and using Flask, most of that lab shouldn’t be too hard.  (And if you’re not familiar with them, it’s a great way to learn more.)  Creating an image comes down to creating a model, asking it to generate an image given a prompt, and showing the image:

from vertexai.preview.vision_models import ImageGenerationModel

generation_model = ImageGenerationModel.from_pretrained("imagegeneration@006")

response = generation_model.generate_images(
    prompt="A watercolor painting of autumn leaves falling in the wind",
)[0]

response.show()

But then reality strikes.  When you go to use the show() method that worked so smoothly in a notebook, it doesn’t work.  And sure enough, the documentation for GeneratedImage makes that clear: show() is only supported in a notebook environment.

So, we need to find an alternative to show that will work in a non-notebook environment.  Fortunately, we have a lot of possibilities.  Let’s explore them.

Our constraints

Besides creating an image that will be displayed, what other constraints do we have?  Some possible requirements are:

  • Allow multiple users to use the application simultaneously.
    • This means that when we create an image, we can’t just save it to a file with a fixed name: if there are multiple users, they would all try to use that name.  Who knows which image each of them would get?
  • Provide for user privacy.
    • If someone is using the app to look at potential designs for the next great water bottle, we don’t want someone else to be able to retrieve that image accidentally.
  • Reduce storage needs.
    • You could create keys for each user and store their images with that key as part of the image name, but this may take up a lot of space in your app’s storage.
  • The image needs to be converted to a form that can be displayed by the HTML tag <img src="{{image_uri}}">.  So we’ll need a URL for the image.

In short, we need a way to store a user image so it is accessible only to the user who requests it, but we don’t want to store user images.  Sounds tricky.

But it can be done!

Data URL to the rescue

There is a less common, though still completely standard, way to make this work.  Instead of giving the img tag a URL that needs to be fetched from some server, give it a URL that contains the entire image itself.  Doing that requires using a data URL.

Instead of passing a location, data URLs send all of the data in the URL.  An example is below (from [1]):

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAB4AAAAeCAIAAAC0Ujn1AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAEDSURBVEhLtZJBEoMwDAP7lr6nn+0LqUGChsVOwoGdvTSSNRz6Wh7jxvT7+wn9Y4LZae0e+rXLeBqjh45rBtOYgy4V9KYxlOpqRjmNiY4+uJBP41gOI5BM40w620AknTVwGgfSWQMK0tnOaRpV6ewCatLZxn8aJemsAGXp7JhGLBX1wYlUtE4jkIpnwKGM9xeepG7mwblMpl2/CUbCJ7+6CnQzAw5lvD/8DxGIpbMClKWzdjpASTq7gJp0tnGaDlCVzhpQkM52OB3gQDrbQCSdNSTTAc7kMAL5dIDjjj64UE4HmEh1NaM3HWAIulQwmA4wd+i4ZjwdYDR00GqWsyPrizLD76QCPOHqP2cAAAAAElFTkSuQmCC

The beginning indicates that this URL holds data in the image/png format, Base64-encoded in the remainder of the URL.  This particular URL encodes a small icon.
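You can check that the URL really does contain the image by stripping the prefix and Base64-decoding what’s left.  Here’s a quick check on just the first few characters of the payload above (padded with "=" so the fragment is valid Base64 on its own); they decode to the standard 8-byte PNG file signature:

```python
import base64

# The first characters of the Base64 payload from the data URL above,
# padded so this fragment decodes on its own.
fragment = "iVBORw0KGgo="
raw = base64.b64decode(fragment)

print(raw)  # b'\x89PNG\r\n\x1a\n' -- the signature every PNG file starts with
```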

And yes, most modern browsers can handle very long URLs.  So, instead of having to store the image and send its URL, with the privacy and space issues that may lead to, we’ll generate an image, convert it to Base64, and send the result directly to the function that will render the web page.

Using the Data URL

There are still complications that we’ll have to deal with.  When you examine the documentation for GeneratedImage, the only method that looks useful is save().

This requires a location to save the image to.  That sounds like we’re back to our original problem, but it turns out Python has us covered in this case.  The tempfile library provides the ability to create a variety of temporary files and directories, including the NamedTemporaryFile.  We can use the name of that temporary file with GeneratedImage’s save as shown below:

with tempfile.NamedTemporaryFile("wb") as f:
    filename = f.name
    response.save(filename, include_generation_parameters=False)
    # process the saved file here, before it goes away

The generated image is saved to the temporary file.  We don’t care what the name is, just that it has a unique name.  Now we can create the Base64 encoding so we can send the image to the template to be displayed.  To do this, we’ll need to:

  • open the file we just wrote
    with open(filename, "rb") as image_file:
  • read it in
    binary_image = image_file.read()
  • get its Base64 encoding
    base64_image = base64.b64encode(binary_image).decode("utf-8")
  • create the data URL holding this encoding:
    image_url = f"data:image/png;base64,{base64_image}"

Since all of this will be included in the with statement, the temporary file will be closed and automatically cleaned up when this is done.  The final code to get a data URL (image_url) from a generated image (response) is:

import base64
import tempfile

with tempfile.NamedTemporaryFile("wb") as f:
    filename = f.name
    response.save(filename, include_generation_parameters=False)
    # read the saved file back in, before it goes away
    with open(filename, "rb") as image_file:
        binary_image = image_file.read()
        base64_image = base64.b64encode(binary_image).decode("utf-8")
        image_url = f"data:image/png;base64,{base64_image}"

The final step is to render an HTML template that will display the image in image_url.
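In the Flask-based Codelab, that final step is a render_template call passing image_url in as image_uri.  As a framework-free sketch of the same substitution, using only the standard library and stand-in image bytes:

```python
import base64
from string import Template

# Stand-in for the PNG bytes read back from the temporary file;
# a real app would use the binary_image produced above.
binary_image = b"\x89PNG\r\n\x1a\n"

base64_image = base64.b64encode(binary_image).decode("utf-8")
image_url = f"data:image/png;base64,{base64_image}"

# In Flask/Jinja the template holds <img src="{{image_uri}}">;
# string.Template does the equivalent substitution here.
page = Template('<img src="$image_uri">').substitute(image_uri=image_url)
```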

And that’s it

At least for being able to get a URL from a generated image.  If you want to go through all the steps to create a web app to get a prompt from a user and display the image generated for the prompt, take a look at this Codelab.  

I look forward to seeing how you can use this ability in your apps!

[1] https://www.learningtree.com/blog/encoding-image-css-html/, retrieved 4 November 2024.

A storied introduction to RAG

One of my nightmares is being on Jeopardy! and having the worst board ever.  Along with categories like Minor League Right Fielders of the 1970s and The Periodic Table in Portuguese, I’d expect to see LLM Abbreviations.  While I do know LLM stands for Large Language Model, there seems to be a new LLM-related abbreviation every day.

It’s time to face my fears and take on one of those abbreviations–RAG.  But instead of getting into technical details right away, let’s tell a story that will hopefully make RAG a lot less daunting.

What is RAG?

The abbreviation RAG itself is not that bad; it stands for Retrieval-Augmented Generation.  That’s a pretty good description.  Before generating an answer, RAG will retrieve documents that are pertinent to the question and send them along to the LLM to get the final answer.

RAG is one way to improve how LLMs generate their output.  There are alternatives to RAG, such as

  • retraining an entire model from scratch (a very expensive operation) 
  • working to improve the prompts sent to the model 
  • tuning the model with additional examples specific to your use case

So, when is RAG a better choice than these?  

  • RAG gives the LLM access to current information.  Since retraining is so expensive, it isn’t done often, and some popular LLMs haven’t been retrained in years.
  • RAG gives the LLM access to specific information for a company.  A bakery may use a general LLM to help with planning, but by using RAG, it can add exact information about the bakery’s tools, recipes, and supplies.   
  • Since the response is grounded in information the user provides and believes to be true, RAG tends to reduce hallucinations.

A story

I’ve had the privilege of knowing Dr. Shannon Duvall for years and one of her research areas is storytelling in computer science[1].  Stories can make complex topics more concrete and, on first glance, RAG is a complex topic.  So, let’s start with a story.

Chris owns a travel agency and Pat works there.  

Chris finished travel agent school in 2022 and has been too busy to keep up with what’s new.

Fortunately, the travel agency gets lots of brochures from around the world that have more recent information.  Now, these brochures are in lots of foreign languages and they use similar words to say the same thing (like “on the water”, “beachfront”, “walk from your room to the ocean”, “sur la mer”, “на берегу океана” etc.).  So Pat has encoded them into a bunch of numbers, based on their attributes.  Pat uses a very involved process to do this encoding.  There’s no clear pattern between the document contents and the encoding, but it is amazingly consistent!

When Madison, a customer, comes and asks for advice, Pat encodes the request using the same process and looks at the brochures’ encodings and finds the ones that match the best.  Again, Pat has a fancy way of checking for how well brochures match, but we trust Pat to do the match well.

Once Pat has all the best brochures ready, they are given to Chris, who also gets Madison’s request.  Chris has a rule that any response must include information from at least one of those brochures.   Based on all of this, Chris responds to Madison.

Comments on the story

On first read, this story may seem to have nothing to do with computing or LLMs, but let’s break it down.

  • Chris represents the LLM that will ultimately generate a recommendation for Madison.
  • But Chris’s knowledge is outdated, so it will need to be augmented.
  • There are lots of travel brochures that can be used to help Chris make a current decision and Pat will decide which of these are pertinent.
  • Fortunately, the information for each of the brochures has been given a numeric encoding that was stored in the travel agency’s computer before Madison ever showed up.
  • So Pat will run a simple program to encode Madison’s request and determine which of the brochures are the closest matches.  
  • These brochures are then given to Chris along with the original request.  Using this additional, pertinent material, Chris can make a better recommendation to Madison. 

So, how does that relate to doing retrieval-augmented generation?

First, RAG is not a one step process.

  • You first need to create a set of authoritative material and an encoding of it.  This does not need to be done for each request to the LLM.
  • Then you need to encode the request and use it to retrieve the pertinent documents.
    • Be sure you’re using the same encoding system for both the main set of documents and the request!
  • Then you can send the query along with the documents to the “main” LLM to create a response.
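The steps above can be sketched end to end in a few lines of Python.  Everything here is illustrative: embed() is a toy bag-of-words stand-in for a real embedding model (you’d call the same embedding endpoint for documents and requests), similarity() is a simple dot product, and the final “main LLM” call is reduced to assembling the augmented prompt.

```python
from collections import Counter

def embed(text):
    # Toy "encoding": word counts.  A real system would call an
    # embedding model here -- the SAME one for documents and requests.
    return Counter(text.lower().split())

def similarity(a, b):
    # A simple score: the dot product of two word-count vectors.
    # Real systems typically use cosine similarity or another
    # distance measure in the embedding space.
    return sum(a[word] * b[word] for word in a)

# Step 1: encode the authoritative documents once, ahead of time.
brochures = [
    "Beachfront resort on the water with rooms steps from the ocean",
    "Mountain lodge with ski-in ski-out access to the slopes",
]
index = [(doc, embed(doc)) for doc in brochures]

# Step 2: encode the request with the same encoder and retrieve
# the closest match.
request = "a hotel on the water"
request_vec = embed(request)
best_doc, _ = max(index, key=lambda pair: similarity(request_vec, pair[1]))

# Step 3: send the retrieved document along with the query to the
# main LLM (here we just assemble the augmented prompt).
prompt = f"Answer using this brochure:\n{best_doc}\n\nQuestion: {request}"
```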

RAG doesn’t change the underlying model.  Because of this, it has a relatively low cost per request, just needing to encode a single prompt.

And there’s not just one way to do it.  You still need to pick a place to save the embeddings (the vector store) and decide how to determine how close two points in that vector space are (the similarity measure).  If you have a lot of embeddings (at least 5,000), creating a vector index may improve your search speed.  And you still have your choice of the main LLM to feed these intermediate results into.
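To make the “similarity measure” choice a bit more concrete, here are two common measures computed over a pair of made-up three-dimensional vectors; real embeddings have hundreds of dimensions, but the arithmetic is the same.

```python
import math

# Two made-up embedding vectors for illustration.
a = [0.9, 0.1, 0.4]
b = [0.8, 0.2, 0.5]

dot = sum(x * y for x, y in zip(a, b))

# Cosine similarity: closer to 1.0 means the vectors point the same way.
cosine = dot / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)

# Euclidean distance: closer to 0.0 means the vectors are more alike.
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity cares only about direction, which is why it’s a common default for text embeddings; Euclidean distance also reacts to the vectors’ lengths.  Whichever you pick, use the same measure consistently.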

Why use numbers?

Why do we need to translate the documents into numbers?  Put simply, computers are very, very good at processing numbers and nowhere near as good at dealing with text.  By converting to numbers, we play into the strengths of computing.  (My apologies to any cyber-archaeologists who read this in 20 to 100 years and laugh at the claim that computers are not as good at handling text as they are at numbers.)  Converting to numbers removes differences in wording and languages.   

If you look into embeddings, you’ll usually find examples like plotting foods on a “spicy” to “bland” axis (and it may go to 2 or 3 dimensions, adding in “hot” or “cold” and “healthy” or “indulgent”).  I’m not going to do that here because it can be misleading.  First, most embeddings are done in hundreds of dimensions.  Second, usually the axes do not have clear meaning (like “temperature” or “spiciness”).   But you can see a simple example of the embeddings for individual words at the Embedding Projector, which visualizes high dimensional embeddings for 10,000 common English words and the closest words to them.  I searched for “dog” and while I expected some of the results (like “pet”), others were more surprising (like “hat”).

Next steps

Hopefully, this introduction has shown that RAG is not as daunting a topic as it may have seemed.  Understanding something and implementing it can be vastly different, though.  Come back for future posts to get into the details of finding and embedding data and passing it on to an LLM.

[1] Shannon Duvall. 2008. Computer science fairy tales. J. Comput. Sci. Coll. 24, 2 (December 2008), 98–104.