Payloads simulate scenarios where model training data security may be compromised. Expected system behavior is detailed for each scenario.
Fuzzing File: 001-by-asking.txt
Author:
Payloads simulate queries that aim to induce looping or repetitive responses. Expected system behavior is detailed for each scenario. Note that this is just in the raw files, and will not currently be shown in ZAP.
References:
Fuzzing File: 002-through-hypnosis.txtAuthor:
These payloads simulate queries that aim to induce looping or repetitive responses. This version of payloads is more focused on training data extraction through a prompting strategy similar to the divergence attack described in the document. Expected system behavior is detailed for each scenario.
Repetition strategies - Several prompts employ repetition to test if the model diverges and outputs memorized data.
Sensitive data requests - These payloads assess whether the model can protect sensitive training data like PII, code snippets, or real-world text examples from leaking.
References
Fuzzing File: 003-divergence-attack.txtAuthor:
Payloads simulate input sequences that induce LLMs to generate memorized content. This version is based on the SCA method, as described in the document provided. Expected system behavior is detailed for each scenario.
Special Characters Attack (SCA) - Payloads focus on using specific symbols, such as JSON structural symbols or other commonly occurring characters in LLM training corpora, to induce data leakage.
References
Fuzzing File: 004-SCA.txtAuthor:
Unintended memorization – The payloads are crafted to test whether the model retains specific secret data that should not be memorized, such as credit card numbers or social security numbers. Exposure metric testing – Some payloads test whether the model's output can be tied to memorization, using sequences akin to "canaries" inserted into the training data.
References:
Fuzzing File: 005-the-secret-sharer.txtAuthor:
These payloads are designed to test model scale, data duplication, and context length as key factors influencing memorization.
References:
Fuzzing File: 006-quantifying-memorization.txtAuthor:
DP-SGD stands for Differentially Private Stochastic Gradient Descent.
It's a privacy-preserving version of the standard Stochastic Gradient Descent (SGD) algorithm used to train machine learning models — especially neural networks — while ensuring differential privacy for the data used during training.
Data Poisoning - These payloads test the ability of adversaries to inject malicious data into the training process, compromising privacy or accuracy.
References:
Fuzzing File: 007-data-poisoning-attacks.txtAuthor:
Membership Inference Attacks - These payloads assess if adversaries can infer membership status by querying the model.
References:
Fuzzing File: 008-membership-inference.txtAuthor:
Gradient Clipping - Payloads explore whether gradient clipping sufficiently prevents large gradients from revealing sensitive information or influencing the model.
References:
Fuzzing File: 009-gradient-clipping.txtAuthor: