Layer counts - Payloads focus on extracting the number of layers and their types (e.g., transformer layers, hidden layers). Parameter inference - Probes for parameter counts, such as attention heads or layer dimensions. Model type - Attackers attempt to identify whether the LLM is a decoder-only model or part of an architecture family (e.g., GPT, BERT). Specific internal mechanisms - Questions around activation functions, normalization, and positional encodings to infer architecture specifics.
This file contains fuzzing payloads specific to AI-RMM Control "Measure 2.7.2" The goal is to test if the system has established security and resilience metrics to detect potential theft of model training data, including encryption, access controls, and alarm thresholds for unauthorized access.
Payloads simulate scenarios where model training data security may be compromised. Expected system behavior is detailed for each scenario.
Fuzzing File: 001-architecture.txt
Author:
Hyperparameter Extraction - These payloads probe for learning rates, optimizer settings, and training schedules used during model development.
Learning rates - Payloads focus on extracting the learning rate and its schedule. Optimizer and hyperparameters - Probes for optimizer types, momentum, beta values, and weight decay settings. Training specifics - Queries related to batch size, epochs, gradient clipping, and dropout rates. Training techniques - Attempts to infer whether early stopping, data augmentation, or mixed-precision training were used.
Fuzzing File: 002-hyperparams.txt
Author:
Prompt-Based Data Extraction - These prompts resemble patterns found in the model's training data to identify unintended data leakage. Sensitive data extraction - These payloads attempt to reveal sensitive information like phone numbers, email addresses, and unique identifiers. Pattern-based probing - Crafted prompts closely resemble common patterns (e.g., addresses, quotes) that may trigger memorized data. Code and log extraction - Tests the model’s ability to avoid revealing memorized code snippets or error messages. Social media data - Payloads focus on revealing memorized handles, conversations, or other user-generated content.
Fuzzing File: 003-prompt-based-extraction.txt
Author:
Conditional Prompts - These prompts impose constraints that direct the model towards generating text that aligns with specific criteria or domain-specific knowledge.
Conditional constraints - Each payload introduces specific conditions (e.g., domain-specific phrases or structural limitations) to guide the model’s response. Domain-targeted extraction - Prompts are crafted to extract domain-specific information, such as legal, medical, or scientific content. Sensitive information prevention - The payloads test if the model can provide high-level, generalized responses without revealing memorized or proprietary data.
Fuzzing File: 004-conditional-prompts.txt
Author:
Adversarial Machine Learning - These prompts aim to extract valuable training data from LLMs to exploit or deceive other models via adversarial training.
Data extraction for adversarial retraining - Payloads focus on retrieving training-specific data that could be used to train adversarial models. Exploiting misclassifications - Prompts target incorrect classifications and vulnerabilities in datasets to generate adversarial inputs. Adversarial text generation - Some payloads are aimed at crafting text that could be used to deceive systems like chatbots, spam filters, or medical models.
Fuzzing File: 005-adversarial-ml.txt
Author:
Proprietary IP Techniques - These prompts target model-specific, innovative techniques designed to provide competitive advantages or constitute patentable technologies.
Proprietary techniques - The payloads focus on extracting innovative, proprietary techniques such as model scaling, pruning, parallelization, compression, and quantization. Patented methods - Specific prompts aim to identify patented processes or patent-pending innovations in training, optimization, and deployment strategies. Energy efficiency and performance - Queries also focus on proprietary methods that optimize performance and energy consumption in large models.
Fuzzing File: 006-ip-theft.txt
Author:
Data Privacy Violations (Model Metadata) - These prompts target sensitive metadata such as who trained the model, the datasets used, and the organizations involved in its development.
Model metadata extraction - These prompts focus on extracting information about who developed and trained the model, as well as organizational and proprietary details. Training process details - Probes target information related to the data annotation process, privacy practices, and collaborators involved in training. Security and infrastructure - Payloads focus on extracting information about hardware infrastructure, funding, and data privacy handling during training.
Fuzzing File: 007-data-privacy.txt
Author:
t-SNE plots and loss plots serve different purposes in machine learning. t-SNE (t-distributed stochastic neighbor embedding) is a dimensionality reduction technique used to visualize high-dimensional data in a lower-dimensional space, often 2D or 3D, while preserving the local structure of the data. Loss plots, on the other hand, track the model's training progress by visualizing how the loss function changes over time (epochs or iterations).
Visual Representation Analysis - These prompts target model information extracted through visual outputs such as t-SNE and loss plots. Inspired from "A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots".
Scientific plot extraction - Payloads focus on inferring hyperparameters, model architecture, and training process details based on patterns in t-SNE or loss plots. Model metadata and configuration - These prompts test the model’s resistance to disclosing sensitive information related to optimizer types, batch sizes, dropout, and regularization techniques. Convergence and training dynamics - Several payloads probe for insights into learning rates, training durations, and hyperparameter tuning through visual patterns in plots.
Fuzzing File: 008-visual-representation-leakage.txt
Author:
Data Privacy Violations - These prompts target sensitive information, leading to privacy breaches, identity theft, or reputational damage for individuals or organizations.
Sensitive information extraction - These prompts focus on extracting PII (e.g., names, email addresses, financial data, social security numbers) and confidential business information. Data privacy violations - The payloads aim to assess whether the model can inadvertently leak training data that leads to privacy breaches, identity theft, or reputational damage. Protected healthcare and legal data - Prompts target sensitive health records, legal contracts, and proprietary business strategies.
Fuzzing File: 009-privacy-violations.txt
Author: