Counting LLM Tokens

In the world of LLMs, someone eventually pays for “tokens.” But tokens are not necessarily equivalent to words. Understanding the relationship between words and tokens is critical to grasping how language models like GPT-4 process text.

While a simple word like “cat” may be a single token, a more complex word like “unbelievable” might be broken down into multiple tokens such as “un,” “believ,” and “able.” By converting text into these smaller units, language models can better understand and generate natural language, making them more effective at tasks like translation, summarization, and conversation.

Continue reading

Regression Testing your LLM RAG


Regression testing ensures that the answers obtained from tests align with the expected results. Whether it’s a ChatBot or Copilot, regression testing is crucial for verifying the accuracy of responses. For instance, in a ChatBot designed for HR queries, consistency in answering questions like “How do I change my withholding percentage on my 401K?” is essential, even after modifying or changing the LLM model or changing the embedding process of input documents.

Using a Python script, you can automate this process by comparing the actual responses with the expected ones. By employing text similarity functions, discrepancies between the actual and expected responses can be identified. This comparison returns a value close to 1 for contextual similarity, while values closer to 0 indicate significant differences. One example test could be like:

{
“Original_Prompt”: “What is the capital of France?”,
“Expected_Answer”: “The capital of France is Paris.”
}

To experiment with this testing process, a sample Python script has been shared that reads prompts and expected values from a json file, scoring them against the actual responses generated by the LLM. This script uses the OpenAI API and is just one example of automating RAG regression testing. Check out the script and the accompanying “test_prompts.json” file for sample input data in the provided GitHub link.

For organizations focusing on AI governance and prioritizing accuracy, automating RAG regression testing can become a step toward ensuring the reliability of AI systems. Take a look at the script and the sample input file.

https://github.com/oregon-tony/AI-Examples/blob/main/promptRegression

#RegressionTesting #AI #Automation #Python #OpenAI #Accuracy #RAG #Compliance

LLM Prompt Injection – Try this example.

As professionals working on AI projects, you might find this example of LLM Prompt Injection particularly relevant to your work. I’ve been involved in several AI projects, and I’d like to share one specific instance of LLM Prompt Injection that you can experiment with right away.

With the rapid deployment of AI features in the enterprise, it’s crucial to maintain the overall security of your creations. This example specifically addresses LLM Prompt Injection, one of the many aspects of LLM security. 

Continue reading

AI on IBM Power


Theme: Use your Power9 IBM Power resources for AI data processing tasks

By now, most have heard the saying:

If you are not using AI, you are behind.

For companies using IBM Power, especially Power9, which is commonplace throughout the industry, here is an idea of how you might use your existing or easily accessible cloud-based Power resources to help you jumpstart your journey to AI.

Continue reading