Regression Testing your LLM RAG


Regression testing verifies that the answers your system produces still match the expected results after something changes. Whether you are building a chatbot or a copilot, regression testing is crucial for verifying the accuracy of responses. For instance, an HR chatbot must answer questions like “How do I change my withholding percentage on my 401(k)?” consistently, even after you modify or replace the LLM or change how the input documents are embedded.

You can automate this process with a Python script that compares the actual responses with the expected ones. A text similarity function scores each pair to surface discrepancies: a value close to 1 indicates the two answers are contextually similar, while values closer to 0 indicate significant differences. A single test case might look like this:

{
  "Original_Prompt": "What is the capital of France?",
  "Expected_Answer": "The capital of France is Paris."
}
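To make the scoring idea concrete, here is a minimal sketch of one possible similarity function. It is an assumption, not the function used in the author’s script: it computes a bag-of-words cosine similarity with the standard library, so it measures word overlap rather than true contextual meaning. An embedding-based similarity would capture meaning better; this sketch only shows how scores near 1 and near 0 arise.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: 1.0 for identical word counts, 0.0 for no shared words."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the words the two answers have in common.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


expected = "The capital of France is Paris."
print(round(cosine_similarity(expected, "Paris is the capital of France."), 2))  # → 1.0
print(round(cosine_similarity(expected, "I don't know."), 2))  # → 0.0
```

Note that the reordered answer still scores 1.0 because word order is ignored, which is one reason a semantic (embedding-based) scorer is usually preferred for real regression suites.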

To experiment with this process, I’ve shared a sample Python script that reads prompts and expected answers from a JSON file and scores them against the actual responses generated by the LLM. The script uses the OpenAI API and is just one way to automate RAG regression testing. The script and the accompanying “test_prompts.json” sample input file are available at the GitHub link below.
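The workflow described above can be sketched as follows. This is not the author’s script (that one uses the OpenAI API and lives in the GitHub repository); it is a minimal sketch under several assumptions: the JSON file holds an array of objects with the same Original_Prompt and Expected_Answer keys as the example test case, `ask_llm` is a hypothetical placeholder for whatever calls your RAG application, difflib’s character-level ratio stands in for a proper semantic similarity function, and the 0.8 threshold is an arbitrary example value.

```python
import json
from difflib import SequenceMatcher
from typing import Callable


def lexical_similarity(a: str, b: str) -> float:
    """Stand-in scorer: character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def load_cases(path: str) -> list[dict]:
    """Load test cases, assumed to be a JSON array of {"Original_Prompt", "Expected_Answer"} objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def run_regression(
    cases: list[dict],
    ask_llm: Callable[[str], str],
    score: Callable[[str, str], float] = lexical_similarity,
    threshold: float = 0.8,  # hypothetical pass mark; tune it for your own data
) -> list[dict]:
    """Ask the LLM each prompt, score the answer against the expected one, and flag drift."""
    results = []
    for case in cases:
        actual = ask_llm(case["Original_Prompt"])
        similarity = score(case["Expected_Answer"], actual)
        results.append({
            "prompt": case["Original_Prompt"],
            "score": round(similarity, 3),
            "passed": similarity >= threshold,
        })
    return results


if __name__ == "__main__":
    # Hypothetical stub standing in for a real LLM/RAG call.
    def fake_llm(prompt: str) -> str:
        return "The capital of France is Paris."

    # In real use: cases = load_cases("test_prompts.json")
    cases = [{"Original_Prompt": "What is the capital of France?",
              "Expected_Answer": "The capital of France is Paris."}]
    for r in run_regression(cases, fake_llm):
        status = "PASS" if r["passed"] else "FAIL"
        print(f"{status}  {r['score']:.3f}  {r['prompt']}")
```

Because the scorer is passed in as a parameter, you can swap difflib for an embedding-based similarity without changing the loop; the failing cases then become the review list after each model or embedding change.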

For organizations focused on AI governance and accuracy, automating RAG regression testing is a practical step toward ensuring the reliability of AI systems.

https://github.com/oregon-tony/AI-Examples/blob/main/promptRegression

#RegressionTesting #AI #Automation #Python #OpenAI #Accuracy #RAG #Compliance

LLM Prompt Injection – Try this example.

If you work on AI projects, you might find this example of LLM Prompt Injection particularly relevant. I’ve been involved in several AI projects, and I’d like to share one specific instance of LLM Prompt Injection that you can experiment with right away.

With the rapid deployment of AI features in the enterprise, it’s crucial to maintain the overall security of what you build. This example addresses LLM Prompt Injection, one of the many aspects of LLM security.

Continue reading

AI on IBM Power


Theme: Use your IBM Power resources, such as Power9 systems, for AI data processing tasks

By now, most of us have heard the saying:

If you are not using AI, you are behind.

If your company runs IBM Power, especially Power9, which is common throughout the industry, here is one idea for using your existing or easily accessible cloud-based Power resources to jumpstart your journey to AI.

Continue reading

MGM Resorts Ransomware Attack: Disaster Recovery as a Malware Defense

This article was authored by me and posted on my company’s website. Please read the full article there.

MGM Resorts reported an active ransomware incident starting on September 11th, and as of September 17th, it had not fully recovered. Rumor has it that the company did not pay the ransom and is instead “recovering” its systems.

It makes you wonder: if a company like MGM Resorts, with all of its resources, is struggling with a ransomware attack, what does that mean for companies operating on a smaller scale? After all, cybercriminals attack companies of all sizes.

I previously wrote about using the cloud to test and perfect your malware defenses. The main point is that the cloud can provide a safe, live sandbox environment for testing your preventive measures without the risk of actual contamination.

Why didn’t MGM switch to its Disaster Recovery (DR) system? You would think it would have a mirror of its production systems and could “switch over” in such events. Most DR systems are designed to switch over in minutes or hours, not days, and certainly not never. There are a couple of possibilities. One is that the attack also impacted its DR system. The other is that its DR model did not include shared components essential to its overall operation, though that seems unlikely.

Continue to the full article at this link.

The Rise of the Super Cloud and What it Means for Specialized Workloads

This article was authored by me and posted on my company’s website. Please read the full article there.

First came “the cloud,” and IT embraced and consumed it. For many companies, this evolved into hybrid cloud, driven by business needs such as meeting regulatory and data sovereignty requirements, leveraging already-paid-for on-premises technology investments, and achieving low latency, especially when communicating with legacy architectures.

Then came “multi-cloud,” as described by VMware and others. Whereas “the cloud” usually means using the services of a single cloud provider, which most of us have done, “multi-cloud” describes using multiple cloud providers’ services in a heterogeneous way. More complex than a single cloud, multi-cloud helps organizations that need to pick and choose services from various cloud vendors or that require high-end redundancy. Today, 61% of businesses use one or two clouds and are considered to be “multi-cloud.” The drawback of multi-cloud is that each cloud operates in a more isolated operational model, leaving the customer to integrate them. The need for specialized skill sets, greater complexity, and increased security risks are often cited as its main challenges.

Continue to the full article at this link.

Leveraging the cloud to accelerate mergers, acquisitions, divestitures

This article was authored by me and posted on my company’s website. Please read the full article there.

When companies engage in mergers, acquisitions, and divestitures, a lengthy due diligence process usually takes place. At some point in that process, the “technological compatibility” of the organizations involved is considered. What happens when the merging companies have entirely different IT architectures?

Continue to the full article at this link.

How to “Float” on the Multi-Cloud.

There is a lot of talk about “multi-cloud,” but trying to achieve that level of cloud diversity might be challenging for many organizations. If you are starting out in the cloud, instead of building cloud-specific expertise across multiple cloud providers, try to “float” across multiple clouds as much as possible. Here is how.

First off, “What is Multi-cloud?”

Continue reading

Southwest needs a lift-and-shift to the multi-cloud, then refactor.

Most of us have heard about the crisis Southwest Airlines had over the holidays. Most articles cite “problems related to legacy systems…” and “outdated scheduling software called SkySolver.” And, of course, there will be a huge financial impact as the airline tries to make things right with its customers.

Most likely, the CEO, CFO, COO, CIO, and CTO of Southwest are receiving many calls and emails from vendors offering to “Let us fix it. We will convert everything to be cloud native…” This path sounds like the old saying, “No one ever got fired for buying IBM…” Southwest has a stated multi-cloud strategy, but legacy applications like SkySolver were obviously not priority cloud-native candidates. Though there will be pressure from investors, the industry, and the press to convert legacy applications like SkySolver to cloud-native, I would not initially recommend this approach.

Continue reading

Don’t have a Cloud Strategy? You should still migrate one app to the Cloud.

“Cloud computing is in its beginning stages and will only continue to grow, Amazon Web Services CEO Adam Selipsky told CNBC’s Jim Cramer on Tuesday.” (June 28th, 2022)

Even if you don’t currently have a comprehensive cloud strategy, for whatever reason, it is still worth doing a cloud “proof of concept” for at least one significant application in your portfolio.

Here are a few of the top reasons companies give for not moving to the cloud. There could be many others, but these are the most common.

  1. Costs (the cloud service itself, plus implied costs like network connectivity)
  2. Antiquated applications based on mainframe or mid-range servers
  3. Security
  4. “If it isn’t broken, don’t fix it.”

Continue reading