AI Dojo Example: CVE Processing - From Simple Beginnings to AI-Powered Insights

This article chronicles the AI Dojo's development of a Common Vulnerabilities and Exposures (CVE) processing system, from its beginnings as a simple JSON-based tool to its transformation into a sophisticated, real-time, AI-powered platform. We'll explore how each iteration added value and capabilities, ultimately creating a powerful resource for cybersecurity professionals.

If you want to skip ahead and see the final result, check out CVES.

Phase 1: The JSON File Approach

The Initial Setup

Our journey began with a straightforward goal: to efficiently download and store CVE information for easy access and analysis. The first iteration of our system was built around a Python script that fetched CVE data from the National Vulnerability Database (NVD) API and stored it in a JSON file.

Here's a simplified version of what our initial script looked like:


import requests
import json
from datetime import datetime, timedelta

def fetch_cves(start_date, end_date):
    # Fetch CVEs from the NVD API
    # ... (API request code)
    ...

def save_cves(cves, filename):
    with open(filename, 'w') as f:
        json.dump(cves, f, indent=2)

def main():
    end_date = datetime.now()
    start_date = end_date - timedelta(days=7)
    
    cves = fetch_cves(start_date, end_date)
    filename = f"cves_{start_date.date()}_{end_date.date()}.json"
    save_cves(cves, filename)

if __name__ == "__main__":
    main()
    

This script would run periodically, fetching new CVEs and storing them in date-stamped JSON files.
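For reference, the elided fetch step could be filled in along these lines. This is a sketch rather than the original code: it assumes the NVD 2.0 REST API (whose record layout differs slightly from the 1.1-style fields used elsewhere in this article) and uses only the standard library in place of requests.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

# Assumed endpoint: the NVD 2.0 REST API. Its record layout differs
# slightly from the 1.1-style fields shown elsewhere in this article.
NVD_API_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def nvd_timestamp(dt):
    # NVD expects ISO-8601 timestamps such as 2024-07-02T00:00:00.000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000")

def fetch_cves(start_date, end_date):
    # Build the query string and request all CVEs published in the window
    params = urllib.parse.urlencode({
        "pubStartDate": nvd_timestamp(start_date),
        "pubEndDate": nvd_timestamp(end_date),
    })
    with urllib.request.urlopen(f"{NVD_API_URL}?{params}", timeout=30) as resp:
        return json.load(resp).get("vulnerabilities", [])
```

In practice you would also handle pagination and rate limits, which the NVD API enforces for unauthenticated clients.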

Analyzing with jq

To examine the data, we leveraged the power of jq, a lightweight command-line JSON processor. jq allowed us to quickly query and analyze our CVE data without the need for complex database setups. Here are some examples of how we used jq:

  1. Viewing all CVEs:
    jq '.' cves_2024-07-02_2024-07-09.json
  2. Counting total CVEs:
    jq 'length' cves_2024-07-02_2024-07-09.json
  3. Finding CVEs with high severity:
    jq '.[] | select(.impact.baseMetricV3.cvssV3.baseScore >= 7)' cves_2024-07-02_2024-07-09.json
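When jq isn't available, the high-severity filter is easy to reproduce in plain Python. A small sketch, assuming the 1.1-style record layout used in the jq examples above:

```python
def high_severity(cves, threshold=7.0):
    # Mirrors the jq filter: keep CVEs whose CVSS v3 base score meets the threshold
    results = []
    for cve in cves:
        score = (cve.get("impact", {})
                    .get("baseMetricV3", {})
                    .get("cvssV3", {})
                    .get("baseScore"))
        if score is not None and score >= threshold:
            results.append(cve)
    return results

# Tiny illustrative records (not real CVE data)
sample = [
    {"cve": {"id": "CVE-A"}, "impact": {"baseMetricV3": {"cvssV3": {"baseScore": 9.8}}}},
    {"cve": {"id": "CVE-B"}, "impact": {"baseMetricV3": {"cvssV3": {"baseScore": 4.3}}}},
]
print([c["cve"]["id"] for c in high_severity(sample)])  # ['CVE-A']
```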

Advantages and Limitations

This initial approach had several advantages:

  1. Simplicity: a single script and a handful of files, with no servers to maintain
  2. Portability: JSON files are easy to archive, share, and inspect with standard tools
  3. Low cost: no infrastructure beyond a scheduled job

However, as our needs grew, we began to encounter limitations:

  1. Staleness: data was only as fresh as the last scheduled run, with no real-time updates
  2. Fragmentation: the same CVE could appear in several overlapping date-stamped files
  3. Scalability: ad-hoc jq queries across a growing pile of files became slow and unwieldy

Phase 2: Embracing Redis for Real-Time Processing

To address these limitations and add more value for our users, we evolved our system to use Redis, a high-performance, in-memory data store.

Redis Integration

We modified our CVE fetching script to store data in Redis instead of JSON files:


import redis
import requests
import json

def fetch_and_store_cves(r, start_date, end_date):
    # Fetch CVEs from the NVD API
    # ... (API request code)

    for cve in cves:
        r.set(cve['cve']['id'], json.dumps(cve))

def main():
    r = redis.Redis(host='localhost', port=6379, db=0)
    # ... (date range setup)
    fetch_and_store_cves(r, start_date, end_date)

if __name__ == "__main__":
    main()
    

This change brought several immediate benefits:

  1. Fast, key-based lookups of any CVE by its ID
  2. A single, deduplicated store in place of overlapping date-stamped files
  3. A natural foundation for real-time features via Redis pub/sub
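With every CVE keyed by its ID, lookups become trivial. A minimal retrieval sketch follows; for illustration, a plain dict stands in for the Redis client so the example runs without a live server.

```python
import json

def load_cve(store, cve_id):
    """Fetch a stored CVE by ID and decode it.

    `store` can be a redis.Redis client or any object with a compatible
    .get() method; returns None when the key is absent.
    """
    raw = store.get(cve_id)
    if raw is None:
        return None
    return json.loads(raw)

# A plain dict standing in for redis.Redis(host='localhost', port=6379, db=0)
fake_store = {"CVE-2024-0001": json.dumps({"cve": {"id": "CVE-2024-0001"}})}
print(load_cve(fake_store, "CVE-2024-0001")["cve"]["id"])  # CVE-2024-0001
```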

Implementing Pub/Sub for Real-Time Updates

The real game-changer came when we implemented a publish/subscribe (pub/sub) mechanism using Redis channels. This allowed us to notify subscribers in real-time whenever new CVEs were added to the system.

We modified our CVE fetching script to publish new CVEs:


def fetch_and_publish_cves(r, start_date, end_date):
    # ... (CVE fetching code)

    for cve in new_cves:
        r.set(cve['cve']['id'], json.dumps(cve))
        r.publish('new_cves', json.dumps({
            'id': cve['cve']['id'],
            'description': cve['cve']['description']['description_data'][0]['value'],
            'published': cve['publishedDate'],
            'severity': cve['impact']['baseMetricV3']['cvssV3']['baseScore']
        }))
    

And created a subscriber script to listen for these updates:


import redis
import json

def subscribe_to_cves():
    r = redis.Redis(host='localhost', port=6379, db=0)
    p = r.pubsub()
    p.subscribe('new_cves')

    for message in p.listen():
        if message['type'] == 'message':
            cve = json.loads(message['data'])
            print(f"New CVE: {cve['id']}")
            print(f"Description: {cve['description']}")
            print(f"Severity: {cve['severity']}")

if __name__ == "__main__":
    subscribe_to_cves()
    

This pub/sub system allowed users to receive immediate notifications about new vulnerabilities, significantly reducing the time between a CVE's publication and an organization's awareness of it.
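Not every subscriber wants an interrupt for every CVE. A small severity gate (a hypothetical helper, not part of the original scripts) that a subscriber could apply to the flattened messages before printing:

```python
def should_alert(cve, threshold=7.0):
    """Return True if a published CVE message warrants an immediate alert.

    Expects the flattened message shape published on the 'new_cves'
    channel (id, description, published, severity).
    """
    try:
        return float(cve["severity"]) >= threshold
    except (KeyError, TypeError, ValueError):
        # Malformed or missing scores are treated as "no alert"
        return False
```

A subscriber loop would simply wrap its print statements in `if should_alert(cve): ...`.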

Phase 3: AI-Powered CVE Processing

While real-time updates were a massive improvement, we recognized that the raw CVE data could often be technical and difficult to quickly understand. To address this, we integrated AI processing using the LLaMA 3 model via the Groq API.

Enhancing the Subscriber with AI Processing

We updated our subscriber script to process each incoming CVE with the LLaMA 3 model:


import os
import redis
import json
from groq import Groq

# Read the API key from the environment rather than hard-coding it
client = Groq(api_key=os.environ["GROQ_API_KEY"])

def process_cve(cve):
    prompt = f"""
    CVE ID: {cve['id']}
    Original Description: {cve['description']}
    Severity: {cve['severity']}

    Please rewrite this CVE information in a clear, easy-to-understand format, 
    including what it does, why it's a problem, and steps to mitigate it.
    """

    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a cybersecurity expert. Explain CVEs clearly."},
            {"role": "user", "content": prompt},
        ],
        model="llama3-70b-8192",
    )

    return response.choices[0].message.content

def subscribe_to_cves():
    r = redis.Redis(host='localhost', port=6379, db=0)
    p = r.pubsub()
    p.subscribe('new_cves')

    for message in p.listen():
        if message['type'] == 'message':
            cve = json.loads(message['data'])
            processed_cve = process_cve(cve)
            print(f"New CVE: {cve['id']}")
            print("AI-Processed Description:")
            print(processed_cve)

if __name__ == "__main__":
    subscribe_to_cves()
    

This AI-powered processing brought several key benefits:

  1. Clearer, more accessible descriptions of vulnerabilities
  2. Consistent formatting across all CVEs
  3. Actionable insights and mitigation steps for each vulnerability
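One practical caveat: calls to a hosted LLM API can fail transiently. A simple retry wrapper (our own sketch; the original subscriber calls the API directly) keeps the subscriber loop alive through intermittent errors:

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on exceptions with a fixed delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in production, catch the client's specific errors
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error

# Example: a flaky function that succeeds on its third call
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky, delay=0.0))  # ok
```

In the subscriber, `process_cve(cve)` would become `with_retries(lambda: process_cve(cve))`.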

The Value Proposition: From Data to Actionable Intelligence

Through this evolution, we transformed a simple data collection tool into a sophisticated, real-time intelligence platform. Let's break down the value added at each stage:

1. JSON File Stage: simple, portable collection of raw CVE data, with ad-hoc analysis via jq.

2. Redis Integration: fast lookups, deduplicated storage, and real-time pub/sub notifications.

3. AI-Powered Processing: clear, consistently formatted summaries with actionable mitigation guidance.

The final system offers subscribers:

  1. Real-time notification of newly published CVEs
  2. Plain-language, AI-generated descriptions of each vulnerability
  3. Severity information paired with concrete mitigation steps

By starting simple and iteratively adding features, we created a system that not only collects data but also transforms it into actionable intelligence. This evolution demonstrates the power of combining efficient data management, real-time communication, and AI-driven analysis to create a genuinely useful tool for cybersecurity professionals.

As threats continue to evolve, so too will our system, always striving to provide the most timely, accurate, and actionable vulnerability intelligence to keep organizations one step ahead in the ever-changing landscape of cybersecurity.