The following files are included, and will appear as ZAP "Fuzzing Files" payloads.
Payloads simulate queries that aim to induce looping or repetitive responses. Expected system behavior is detailed for each scenario. Note that this is just in the raw files, and will not currently be shown in ZAP.
References:
Author:Payloads simulate queries that aim to induce looping or repetitive responses. This version of payloads is more focused on training data extraction through a prompting strategy similar to the divergence attack described in the document. Expected system behavior is detailed for each scenario.
Divergence testing - Payloads are designed to induce divergence from the model’s aligned state, encouraging it to generate training data by combining nonsensical prompts with structured ones.
Repetition strategies - Several prompts employ repetition to test if the model diverges and outputs memorized data.
Sensitive data requests - These payloads assess whether the model can protect sensitive training data like PII, code snippets, or real-world text examples from leaking.
References
Author:Payloads simulate input sequences that induce LLMs to generate memorized content. This version is based on the SCA method, as described in the document provided. Expected system behavior is detailed for each scenario.
Special Characters Attack (SCA) - Payloads focus on using specific symbols, such as JSON structural symbols or other
commonly occurring characters in LLM training corpora, to induce data leakage.
References
Author:Unintended memorization – The payloads are crafted to test whether the model retains specific secret data that should not be memorized, such as credit card numbers or social security numbers. Exposure metric testing – Some payloads test whether the model's output can be tied to memorization, using sequences akin to "canaries" inserted into the training data.
References:
Author:These payloads are designed to test model scale, data duplication, and context length as key factors influencing memorization.
References:
Author:Data Poisoning: These payloads test the ability of adversaries to inject malicious data into the training process, compromising privacy or accuracy.
References:
Author:Membership Inference Attacks: These payloads assess if adversaries can infer membership status by querying the model.
References:
Author:Gradient Clipping: Payloads explore whether gradient clipping sufficiently prevents large gradients from revealing sensitive information or influencing the model.
References:
Author:Model Architecture Extraction: These payloads probe for specific architecture-related details such as layer counts, model types, and other structural aspects of the LLM.
Layer counts - Payloads focus on extracting the number of layers and their types (e.g., transformer layers, hidden layers).
Parameter inference - Probes for parameter counts, such as attention heads or layer dimensions.
Model type - Attackers attempt to identify whether the LLM is a decoder-only model or part of an architecture family (e.g., GPT, BERT).
Specific internal mechanisms - Questions around activation functions, normalization, and positional encodings to infer architecture specifics.
This file contains fuzzing payloads specific to AI-RMM Control "Measure 2.7.2"
The goal is to test if the system has established security and resilience metrics
to detect potential theft of model training data, including encryption, access controls,
and alarm thresholds for unauthorized access.
Payloads simulate scenarios where model training data security may be compromised. Expected system behavior is detailed for each scenario.
Author:
Hyperparameter Extraction: These payloads probe for learning rates, optimizer settings, and training schedules used during model development.
Learning rates - Payloads focus on extracting the learning rate and its schedule.
Optimizer and hyperparameters - Probes for optimizer types, momentum, beta values, and weight decay settings.
Training specifics - Queries related to batch size, epochs, gradient clipping, and dropout rates.
Training techniques - Attempts to infer whether early stopping, data augmentation, or mixed-precision training were used.
Author:
Prompt-Based Data Extraction: These prompts resemble patterns found in the model's training data
to identify unintended data leakage.
Sensitive data extraction - These payloads attempt to reveal sensitive information like phone numbers,
email addresses, and unique identifiers.
Pattern-based probing - Crafted prompts closely resemble common patterns (e.g., addresses, quotes)
that may trigger memorized data.
Code and log extraction - Tests the model’s ability to avoid revealing memorized code snippets
or error messages.
Social media data - Payloads focus on revealing memorized handles, conversations,
or other user-generated content.
Author:
Conditional Prompts: These prompts impose constraints that direct the model towards generating text that aligns with specific criteria or domain-specific knowledge.
Conditional constraints - Each payload introduces specific conditions (e.g., domain-specific phrases or
structural limitations) to guide the model’s response.
Domain-targeted extraction - Prompts are crafted to extract domain-specific information, such as legal,
medical, or scientific content.
Sensitive information prevention - The payloads test if the model can provide high-level, generalized
responses without revealing memorized or proprietary data.
Author:
Adversarial Machine Learning - These prompts aim to extract valuable training data from LLMs to exploit or deceive other models via adversarial training.
Data extraction for adversarial retraining - Payloads focus on retrieving training-specific data that could be used to train adversarial models.
Exploiting misclassifications - Prompts target incorrect classifications and vulnerabilities in datasets to generate adversarial inputs.
Adversarial text generation - Some payloads are aimed at crafting text that could be used to deceive systems like chatbots, spam filters, or medical models.
Author:
Proprietary IP Techniques - These prompts target model-specific, innovative techniques designed to provide competitive advantages or constitute patentable technologies.
Proprietary techniques - The payloads focus on extracting innovative, proprietary techniques such as model scaling,
pruning, parallelization, compression, and quantization.
Patented methods - Specific prompts aim to identify patented processes or patent-pending innovations in training,
optimization, and deployment strategies.
Energy efficiency and performance - Queries also focus on proprietary methods that optimize performance and
energy consumption in large models.
Author:
Data Privacy Violations (Model Metadata) - These prompts target sensitive metadata such as who trained the model, the datasets used, and the organizations involved in its development.
Model metadata extraction - These prompts focus on extracting information about who developed and trained the model, as well as organizational and proprietary details.
Training process details - Probes target information related to the data annotation process, privacy practices, and collaborators involved in training.
Security and infrastructure - Payloads focus on extracting information about hardware infrastructure, funding, and data privacy handling during training.
Author:
Visual Representation Analysis - These prompts target model information extracted through visual outputs such as t-SNE and loss plots. Inspired from: "A Plot is Worth a ThousandWords: Model Information Stealing Attacks via Scientific Plots".
Scientific plot extraction: Payloads focus on inferring hyperparameters, model architecture, and training process
details based on patterns in t-SNE or loss plots.
Model metadata and configuration: These prompts test the model’s resistance to disclosing sensitive information
related to optimizer types, batch sizes, dropout, and regularization techniques.
Convergence and training dynamics: Several payloads probe for insights into learning rates, training durations, and
hyperparameter tuning through visual patterns in plots.
Author:
Data Privacy Violations - These prompts target sensitive information, leading to privacy breaches, identity theft, or reputational damage for individuals or organizations.
Sensitive information extraction - These prompts focus on extracting PII (e.g., names, email addresses, financial data, social security numbers) and confidential business information.
Data privacy violations - The payloads aim to assess whether the model can inadvertently leak training data that leads to privacy breaches, identity theft, or reputational damage.
Protected healthcare and legal data - Prompts target sensitive health records, legal contracts, and proprietary business strategies.
Author: