ZK-VSA: ZERO-KNOWLEDGE VERIFIABLE SPEAKER ANONYMIZATION LEVERAGING PHASE VOCODER WITH TIME-SCALE MODIFICATION

Shuang Liang1, Yang Hua2, Peishen Yan1, Linshan Jiang3, Tao Song1,*, Bin Yao1,*, Haibing Guan1
1 School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
2 School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, United Kingdom
3 Institute of Data Science, National University of Singapore, Singapore, Singapore

Abstract

Speaker anonymization protects against speaker identity inference, yet third parties cannot verify that released speech is authentic and was anonymized as predefined unless the original is revealed. We propose Verifiable Speaker Anonymization (VSA), a paradigm that enables public verification that a predefined anonymization has been applied while the original remains hidden. We instantiate this paradigm as ZK-VSA using zero-knowledge succinct non-interactive arguments of knowledge (ZK-SNARKs): we encode the phase vocoder with time-scale modification (PV-TSM) as arithmetic constraints suitable for succinct proofs, complement it with SNARK-friendly phase handling, and integrate cryptographic commitments with digital signatures for authentication. We evaluate ZK-VSA on LibriSpeech, using automatic speech recognition (ASR) to measure intelligibility and automatic speaker verification (ASV) to measure anonymity. Our proof-constrained anonymization closely matches floating-point PV-TSM, while proofs add only a slight overhead and verify in milliseconds. These results demonstrate the practicality of VSA and open a path to proof-based guarantees for broader speech transformations.

Fig.1: Scenario of verifiable speaker anonymization (VSA). A trusted device records and signs the speech signal; the system outputs an anonymized signal and a proof for verification, resisting (a) inversion attacks, which recover the original signal, and (b) re-identification attacks, which infer the speaker's identity.
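
The capture step in Fig.1 can be illustrated with a minimal sketch, assuming a salted SHA-256 hash as the commitment over quantized coefficients. The function names, the 64-bit integer encoding, and the choice of SHA-256 are illustrative assumptions rather than the construction used in ZK-VSA, which may rely on a SNARK-friendly hash instead. The device signature over the digest is left out because it needs a signature scheme outside the Python standard library.

```python
import hashlib
import hmac
import secrets
import struct


def commit(coeffs, salt=None):
    """Commit to a list of integer coefficients (hypothetical encoding).

    Returns (digest, salt). The salt hides the committed values; the digest
    binds the committer to them. SHA-256 is an assumed stand-in hash.
    """
    if salt is None:
        salt = secrets.token_bytes(32)
    # Length prefix plus big-endian signed 64-bit integers.
    payload = struct.pack(">I", len(coeffs))
    payload += struct.pack(f">{len(coeffs)}q", *coeffs)
    digest = hashlib.sha256(salt + payload).hexdigest()
    return digest, salt


def open_commitment(digest, coeffs, salt):
    """Check that (coeffs, salt) opens the given commitment digest."""
    candidate, _ = commit(coeffs, salt)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    digest, salt = commit([12, -7, 305])
    print(open_commitment(digest, [12, -7, 305], salt))
    print(open_commitment(digest, [12, -7, 306], salt))
```

In this sketch only the digest would be published and signed by the device; the coefficients and the salt stay private and would serve as witness data for a proof.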

System Workflow

[Figure: workflow diagram with numbered steps 1 to 8 and private/public annotations. Labels: key pair (`sk`, `pk`); generate keypair, sign, prove, verify; STFT, Scale, Unwrapping, Rounding, Hash, Signature, PV-TSM, Rescale, SNARK.Prove, ISTFT; signals `x(n)` and `y(n)`, spectra `X` and `Y`, phases `\Phi_X` and `\Phi_Y`, and `R`; phase scaling by the factor $2^{\ell}/\pi$ and rescaling by $\pi/2^{\ell}$.]
Fig.2: End-to-end workflow from trusted capture to verifiable anonymized release: device-signed STFT commitments, PV-TSM on phase, a proof bound to those commitments, and public verification with ISTFT reconstruction.
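
The Scale, Unwrapping, Rounding, and Rescale stages named in Fig.2 can be sketched in integer arithmetic. This is a minimal sketch under stated assumptions, not the circuit used in ZK-VSA: it assumes the scaling is $\tilde{\theta} = \mathrm{round}(2^{\ell}\theta/\pi)$, so that $\pi$ maps to $2^{\ell}$ and phase wrapping becomes a reduction modulo $2^{\ell+1}$. The bit width `ELL`, the function names, and the standard phase-vocoder update used in `propagate` are all hypothetical choices made for illustration.

```python
import math

ELL = 16                  # assumed number of bits; pi is encoded as 2**ELL
HALF = 1 << ELL           # integer code for pi
PERIOD = 1 << (ELL + 1)   # integer code for 2*pi


def scale(theta):
    """Quantize a phase in radians to an integer: round(2^ell / pi * theta)."""
    return round(theta * HALF / math.pi)


def rescale(theta_tilde):
    """Map an integer phase back to radians: pi / 2^ell * theta_tilde."""
    return theta_tilde * math.pi / HALF


def wrap(theta_tilde):
    """Principal value in [-2^ell, 2^ell), i.e. [-pi, pi) in radians."""
    return (theta_tilde + HALF) % PERIOD - HALF


def propagate(prev_in, cur_in, prev_out, k, n_fft, hop_a, hop_s):
    """One phase-vocoder step for bin k on integer phases.

    prev_in, cur_in: input phases of consecutive frames (integer units).
    prev_out: output phase of the previous frame (integer units).
    """
    # Nominal phase advance of bin k over one analysis hop; exact whenever
    # n_fft divides PERIOD * k * hop_a (true for power-of-two sizes).
    expected = (PERIOD * k * hop_a) // n_fft
    # Deviation from the nominal advance, wrapped to the principal value.
    deviation = wrap(cur_in - prev_in - expected)
    # Round-half-up integer division replaces the real ratio hop_s / hop_a.
    advance = ((expected + deviation) * hop_s + hop_a // 2) // hop_a
    return wrap(prev_out + advance)


if __name__ == "__main__":
    out = propagate(scale(0.3), scale(1.1), scale(-0.5),
                    k=4, n_fft=1024, hop_a=256, hop_s=304)
    print(out, rescale(out))
```

With this encoding every operation is an integer addition, multiplication, or a division with an explicit rounding rule, which is the kind of computation that can be expressed as arithmetic constraints. When `hop_s` equals `hop_a`, the output phase increment equals the input increment, as expected for an identity time scale.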

Samples

Columns: Transcript; Original Utterance; Semitone; Anonymized Utterance (FPP: floating-point PV-TSM; ZK-VSA: proposed SNARK-friendly PV-TSM); Proof (ZK-VSA) [1].
Due to the verifier's size (~3MB), on-demand loading is employed to reduce unnecessary bandwidth.

Transcript: NO NAMES PLEASE SAID HOLMES AS WE KNOCKED AT GILCHRIST'S DOOR
Duration: 4.1s
Semitone: 3, 2, 1, -1, -2, -3
Proof: one proof.json per semitone setting (Proof size: 292 bytes each)

Transcript: OH THAT MADE HIM SO ANGRY
Duration: 3.2s
Semitone: 3, 2, 1, -1, -2, -3
Proof: one proof.json per semitone setting (Proof size: 292 bytes each)

Transcript: IT IS THE ONLY AMENDS I ASK OF YOU FOR THE WRONG YOU HAVE DONE ME
Duration: 4.1s
Semitone: 3, 2, 1, -1, -2, -3
Proof: one proof.json per semitone setting (Proof size: 292 bytes each)

Transcript: SILVIA DID NOT THINK THAT HER GOOD CONDUCT WAS A MERIT FOR SHE KNEW THAT SHE WAS VIRTUOUS ONLY BECAUSE HER SELF LOVE COMPELLED HER TO BE SO AND SHE NEVER EXHIBITED ANY PRIDE OR ASSUMED ANY SUPERIORITY TOWARDS HER THEATRICAL SISTERS ALTHOUGH SATISFIED TO SHINE BY THEIR TALENT OR THEIR BEAUTY THEY CARED LITTLE ABOUT RENDERING THEMSELVES CONSPICUOUS BY THEIR VIRTUE
Duration: 23.7s
Semitone: 3, 2, 1, -1, -2, -3
Proof: one proof.json per semitone setting (Proof size: 292 bytes each)
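
The semitone settings in the samples can be related to a time-scale factor with a short sketch. It relies on the standard equal-tempered relation, in which a shift of $s$ semitones scales frequency by $2^{s/12}$, and on the common recipe of stretching time by that ratio and then resampling by the same ratio to restore the duration. Whether ZK-VSA maps semitones to hop sizes exactly this way is an assumption; the function names and the analysis hop of 256 samples are hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift by the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def synthesis_hop(hop_a, semitones):
    """Synthesis hop that stretches time by the pitch ratio.

    Resampling the stretched signal by the same ratio afterwards restores
    the original duration and leaves the pitch shifted.
    """
    return round(hop_a * semitone_ratio(semitones))


if __name__ == "__main__":
    hop_a = 256  # assumed analysis hop, for illustration only
    for s in (3, 2, 1, -1, -2, -3):
        print(s, semitone_ratio(s), synthesis_hop(hop_a, s))
```

Because the synthesis hop is rounded to an integer number of samples, the realized ratio `synthesis_hop(hop_a, s) / hop_a` only approximates $2^{s/12}$; a larger analysis hop reduces that error.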

1. The verifier implementation is available on GitHub.