AI Dojo Example: CVE Processing - From Simple Beginnings to AI-Powered Insights

This article chronicles the AI Dojo's development of a Common Vulnerabilities and Exposures (CVE) processing system, from its beginnings as a simple JSON-based tool to its transformation into a sophisticated, real-time, AI-powered platform. We'll explore how each iteration added value and capabilities, ultimately creating a powerful resource for cybersecurity professionals.

If you want to skip ahead and see the final result, check out CVES.

Phase 1: The JSON File Approach

The Initial Setup

Our journey began with a straightforward goal: to efficiently download and store CVE information for easy access and analysis. The first iteration of our system was built around a Python script that fetched CVE data from the National Vulnerability Database (NVD) API and stored it in a JSON file.

Here's a simplified version of what our initial script looked like:


import requests
import json
from datetime import datetime, timedelta

def fetch_cves(start_date, end_date):
    # Fetch CVEs from the NVD API
    # ... (API request code)
    ...

def save_cves(cves, filename):
    with open(filename, 'w') as f:
        json.dump(cves, f, indent=2)

def main():
    end_date = datetime.now()
    start_date = end_date - timedelta(days=7)
    
    cves = fetch_cves(start_date, end_date)
    filename = f"cves_{start_date.date()}_{end_date.date()}.json"
    save_cves(cves, filename)

if __name__ == "__main__":
    main()
    

This script would run periodically, fetching new CVEs and storing them in date-stamped JSON files.
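For reference, the elided fetch step could be filled in along these lines. This is a sketch rather than the original code: it assumes the NVD 2.0 REST API (whose record layout differs slightly from the 1.1-style fields used elsewhere in this article) and uses only the standard library in place of requests.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

# Assumed endpoint: the NVD 2.0 REST API. Its record layout differs
# slightly from the 1.1-style fields shown elsewhere in this article.
NVD_API_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def nvd_timestamp(dt):
    # NVD expects ISO-8601 timestamps such as 2024-07-02T00:00:00.000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000")

def fetch_cves(start_date, end_date):
    # Build the query string and request all CVEs published in the window
    params = urllib.parse.urlencode({
        "pubStartDate": nvd_timestamp(start_date),
        "pubEndDate": nvd_timestamp(end_date),
    })
    with urllib.request.urlopen(f"{NVD_API_URL}?{params}", timeout=30) as resp:
        return json.load(resp).get("vulnerabilities", [])
```

In practice you would also handle pagination and rate limits, which the NVD API enforces for unauthenticated clients.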

Analyzing with jq

To examine the data, we leveraged the power of jq, a lightweight command-line JSON processor. jq allowed us to quickly query and analyze our CVE data without the need for complex database setups. Here are some examples of how we used jq:

  1. Viewing all CVEs:
    jq '.' cves_2024-07-02_2024-07-09.json
  2. Counting total CVEs:
    jq 'length' cves_2024-07-02_2024-07-09.json
  3. Finding CVEs with high severity:
    jq '.[] | select(.impact.baseMetricV3.cvssV3.baseScore >= 7)' cves_2024-07-02_2024-07-09.json
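When jq isn't available, the high-severity filter is easy to reproduce in plain Python. A small sketch, assuming the 1.1-style record layout used in the jq examples above:

```python
def high_severity(cves, threshold=7.0):
    # Mirrors the jq filter: keep CVEs whose CVSS v3 base score meets the threshold
    results = []
    for cve in cves:
        score = (cve.get("impact", {})
                    .get("baseMetricV3", {})
                    .get("cvssV3", {})
                    .get("baseScore"))
        if score is not None and score >= threshold:
            results.append(cve)
    return results

# Tiny illustrative records (not real CVE data)
sample = [
    {"cve": {"id": "CVE-A"}, "impact": {"baseMetricV3": {"cvssV3": {"baseScore": 9.8}}}},
    {"cve": {"id": "CVE-B"}, "impact": {"baseMetricV3": {"cvssV3": {"baseScore": 4.3}}}},
]
print([c["cve"]["id"] for c in high_severity(sample)])  # ['CVE-A']
```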

Advantages and Limitations

This initial approach had several advantages:

  1. Simplicity: a single script and a handful of files, with no servers to maintain
  2. Portability: JSON files are easy to archive, share, and inspect with standard tools
  3. Low cost: no infrastructure beyond a scheduled job

However, as our needs grew, we began to encounter limitations:

  1. Staleness: data was only as fresh as the last scheduled run, with no real-time updates
  2. Fragmentation: the same CVE could appear in several overlapping date-stamped files
  3. Scalability: ad-hoc jq queries across a growing pile of files became slow and unwieldy

Phase 2: Embracing Redis for Real-Time Processing

To address these limitations and add more value for our users, we evolved our system to use Redis, a high-performance, in-memory data store.

Redis Integration

We modified our CVE fetching script to store data in Redis instead of JSON files:


import redis
import requests
import json

def fetch_and_store_cves(r, start_date, end_date):
    # Fetch CVEs from the NVD API
    # ... (API request code)

    for cve in cves:
        r.set(cve['cve']['id'], json.dumps(cve))

def main():
    r = redis.Redis(host='localhost', port=6379, db=0)
    # ... (date range setup)
    fetch_and_store_cves(r, start_date, end_date)

if __name__ == "__main__":
    main()
    

This change brought several immediate benefits:

  1. Fast, key-based lookups of any CVE by its ID
  2. A single, deduplicated store in place of overlapping date-stamped files
  3. A natural foundation for real-time features via Redis pub/sub
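With every CVE keyed by its ID, lookups become trivial. A minimal retrieval sketch follows; for illustration, a plain dict stands in for the Redis client so the example runs without a live server.

```python
import json

def load_cve(store, cve_id):
    """Fetch a stored CVE by ID and decode it.

    `store` can be a redis.Redis client or any object with a compatible
    .get() method; returns None when the key is absent.
    """
    raw = store.get(cve_id)
    if raw is None:
        return None
    return json.loads(raw)

# A plain dict standing in for redis.Redis(host='localhost', port=6379, db=0)
fake_store = {"CVE-2024-0001": json.dumps({"cve": {"id": "CVE-2024-0001"}})}
print(load_cve(fake_store, "CVE-2024-0001")["cve"]["id"])  # CVE-2024-0001
```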

Implementing Pub/Sub for Real-Time Updates

The real game-changer came when we implemented a publish/subscribe (pub/sub) mechanism using Redis channels. This allowed us to notify subscribers in real-time whenever new CVEs were added to the system.

We modified our CVE fetching script to publish new CVEs:


def fetch_and_publish_cves(r, start_date, end_date):
    # ... (CVE fetching code)

    for cve in new_cves:
        r.set(cve['cve']['id'], json.dumps(cve))
        r.publish('new_cves', json.dumps({
            'id': cve['cve']['id'],
            'description': cve['cve']['description']['description_data'][0]['value'],
            'published': cve['publishedDate'],
            'severity': cve['impact']['baseMetricV3']['cvssV3']['baseScore']
        }))
    

And created a subscriber script to listen for these updates:


import redis
import json

def subscribe_to_cves():
    r = redis.Redis(host='localhost', port=6379, db=0)
    p = r.pubsub()
    p.subscribe('new_cves')

    for message in p.listen():
        if message['type'] == 'message':
            cve = json.loads(message['data'])
            print(f"New CVE: {cve['id']}")
            print(f"Description: {cve['description']}")
            print(f"Severity: {cve['severity']}")

if __name__ == "__main__":
    subscribe_to_cves()
    

This pub/sub system allowed users to receive immediate notifications about new vulnerabilities, significantly reducing the time between a CVE's publication and an organization's awareness of it.
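Not every subscriber wants an interrupt for every CVE. A small severity gate (a hypothetical helper, not part of the original scripts) that a subscriber could apply to the flattened messages before printing:

```python
def should_alert(cve, threshold=7.0):
    """Return True if a published CVE message warrants an immediate alert.

    Expects the flattened message shape published on the 'new_cves'
    channel (id, description, published, severity).
    """
    try:
        return float(cve["severity"]) >= threshold
    except (KeyError, TypeError, ValueError):
        # Malformed or missing scores are treated as "no alert"
        return False
```

A subscriber loop would simply wrap its print statements in `if should_alert(cve): ...`.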

Phase 3: AI-Powered CVE Processing

While real-time updates were a massive improvement, we recognized that the raw CVE data could often be technical and difficult to quickly understand. To address this, we integrated AI processing using the LLaMA 3 model via the Groq API.

Enhancing the Subscriber with AI Processing

We updated our subscriber script to process each incoming CVE with the LLaMA 3 model:


import os
import redis
import json
from groq import Groq

# Read the API key from the environment rather than hard-coding it
client = Groq(api_key=os.environ["GROQ_API_KEY"])

def process_cve(cve):
    prompt = f"""
    CVE ID: {cve['id']}
    Original Description: {cve['description']}
    Severity: {cve['severity']}

    Please rewrite this CVE information in a clear, easy-to-understand format, 
    including what it does, why it's a problem, and steps to mitigate it.
    """

    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a cybersecurity expert. Explain CVEs clearly."},
            {"role": "user", "content": prompt},
        ],
        model="llama3-70b-8192",
    )

    return response.choices[0].message.content

def subscribe_to_cves():
    r = redis.Redis(host='localhost', port=6379, db=0)
    p = r.pubsub()
    p.subscribe('new_cves')

    for message in p.listen():
        if message['type'] == 'message':
            cve = json.loads(message['data'])
            processed_cve = process_cve(cve)
            print(f"New CVE: {cve['id']}")
            print("AI-Processed Description:")
            print(processed_cve)

if __name__ == "__main__":
    subscribe_to_cves()
    

This AI-powered processing brought several key benefits:

  1. Clearer, more accessible descriptions of vulnerabilities
  2. Consistent formatting across all CVEs
  3. Actionable insights and mitigation steps for each vulnerability
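One practical caveat: calls to a hosted LLM API can fail transiently. A simple retry wrapper (our own sketch; the original subscriber calls the API directly) keeps the subscriber loop alive through intermittent errors:

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on exceptions with a fixed delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in production, catch the client's specific errors
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error

# Example: a flaky function that succeeds on its third call
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky, delay=0.0))  # ok
```

In the subscriber, `process_cve(cve)` would become `with_retries(lambda: process_cve(cve))`.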

The Value Proposition: From Data to Actionable Intelligence

Through this evolution, we transformed a simple data collection tool into a sophisticated, real-time intelligence platform. Let's break down the value added at each stage:

1. JSON File Stage: simple, portable collection of raw CVE data, with ad-hoc analysis via jq.

2. Redis Integration: fast lookups, deduplicated storage, and real-time pub/sub notifications.

3. AI-Powered Processing: clear, consistently formatted summaries with actionable mitigation guidance.

The final system offers subscribers:

  1. Real-time notification of newly published CVEs
  2. Plain-language, AI-generated descriptions of each vulnerability
  3. Severity information paired with concrete mitigation steps

By starting simple and iteratively adding features, we created a system that not only collects data but also transforms it into actionable intelligence. This evolution demonstrates the power of combining efficient data management, real-time communication, and AI-driven analysis to create a genuinely useful tool for cybersecurity professionals.

As threats continue to evolve, so too will our system, always striving to provide the most timely, accurate, and actionable vulnerability intelligence to keep organizations one step ahead in the ever-changing landscape of cybersecurity.