Novel constructions for Proof-of-Replication

**Work in Progress**: This is a work in progress. For comments and suggestions contact us at [research@protocol.ai](mailto:research@protocol.ai)

We as Protocol Labs actively support these areas of research with grants, bounties and direct collaborations. We plan to fund research related to these open problems. Reach out if you want to work on or are working on these problems.

---

# *Proof-of-Replication*

> *Proof-of-Replication* (PoRep) is a new kind of *Proof-of-Storage*, that can be used to show that some data *D* has been replicated to its own uniquely dedicated physical storage. This document presents Proof-of-Replication and its related open problems.

## Introduction

This document presents *Proof-of-Replication* (PoRep), motivates its use cases, provides a definition and outlines the list of related open problems. For a more formal presentation of PoRep schemes, please refer to our [Technical Report on *Proof-of-Replication*](https://filecoin.io/proof-of-replication.pdf).

### Motivation

**Centralized Cloud**. The current cloud infrastructure operates in a centralized setting: the cloud provider is trusted by the client to offer their service correctly, for example storing data, or perform computation. The centralized setting makes it difficult for a client to spot malfunctioning and malicious behaviors. This leads to a cloud infrastructure where only *trusted* providers participate.

**Untrusted Cloud**.  In the past decade there has been extensive work on delegating computation and storage in an *untrusted* cloud setting. If the cloud provider operations were to be *verifiable*, then clients would not need to trust the providers. The intuition is that a provider must generate a proof for their the service that a client can verify. In a naive scenario, a client could store the data on a cloud storage provider and challenge the provider to serve the entire data back in order to *verify* that the storage provider is still storing the data. However, this straw-man solution would be impractical for standard cloud usage. Practical solutions rely on cryptographic assumptions where the storage provider generates a short cryptographic proof that the client can cheaply verify. There are several cryptographic proof systems for proving storage, we refer in particular *Proof-of-Retrievability* (PoR) and *Proof-of-Data-Possession* (PDP).

**Decentralized Storage Networks**. Decentralized Storage Networks are peer-to-peer systems where nodes (either altruistically or with incentives) store data of other peers. Similar to the untrusted cloud setting, client nodes verify cryptographic proofs rather than relying on trust to ensure that other nodes are storing their data.

**Proof-of-Replication**. Current schemes only guarantee that the storage provider had the file at the time the proof was generated. However, this does not guarantee that the file was stored through time and that some storage was dedicated to that particular file. In addition, in systems where providers are rewarded proportionally to their storage (e.g. the mining reward in Filecoin), current existing schemes cannot be used, since storage providers can lie about their storage. Proof-of-Replication is a novel proof system, where storage providers cannot lie about storing files, since they would have to store a unique physical permutation (that we refer to as replica) for each file or copy they claim to store.

**Note**: For a more detailed use-cases for Proof-of-Replication, please refer to our [Technical Report](https://filecoin.io/proof-of-replication.pdf) (Section 1.3).

### Problems with current PoR and PDP schemes

State-of-the-art *Proof-of-Retrievability* (PoR) and *Proof-of-Data-Possession* schemes used today in verifiable cloud storage and in decentralized storage networks, only guarantee that a provider had possession of the data at the time of the challenge/response interaction and they are subject to the following three attacks:

- **Sybil Attack**: Malicious storage providers could pretend to store (and get paid for) more copies than the ones physically stored by creating multiple Sybil identities, but storing the data only *once*.
- **Outsourcing Attack**: Malicious storage providers could commit to store more data than the amount they can physically store, relying on quickly fetching data from other storage providers.
- **Generation Attack**: Malicious storage providers could claim to be storing a large amount of data which they are instead efficiently generating on-demand using a small program. If the program is smaller than the purportedly stored data, this inflates the malicious storage provider’s likelihood of winning a block reward in Filecoin, which is proportional to the storage provider’s storage currently in use.

**Note**: For a more detailed survey of the different Proofs-of-Storage, please refer to our [Technical Report](https://filecoin.io/proof-of-replication.pdf) (Section 1.2)

## Problem Definition

*Proof-of-Replication* allows a storage provider to convince a user that some data *D* has been replicated to its own unique storage. This definition is stricter than PoR and PDP, since it forces the prover to store a unique copy of the file.

### Definition

*Proof-of-Replication* enables a prover *P* to convince a verifier *V* that *P* is storing a replica *R*, a physically independent copy of some data *D*, unique to *P*. The scheme is defined by a tuple of polynomial time algorithms (_Setup_, _Prove_, _Verify_). The assumption is that generation of a replica after _Setup_  must be difficult (if not impossible) to generate.

- *Setup*: On setup, either a party or both (depending on the scheme) generate a unique permutation of the original data _D_, which we call replica _R_.
- _Prove_: On receiving a challenge, the prover must generate a proof that it is in possession of the replica and that it was derived from data _D_. The prover must only be able to respond to the challenge successfully if it is in possession of the replica, since would be difficult (if not impossible) to generate a replica that can be used to generate the proof at this stage
- _Verify_: On receiving the proof, the verifier checks the validity of the proof and accepts or rejects the proof. 

**Note**: For a formal definition of Proof-of-Replication, please refer to our [Technical Report](https://filecoin.io/proof-of-replication.pdf) (Section 2).

### Time-bounded Proof-of-Replication (Visual)

**Timing assumption**. *Time-bounded Proof-of-Replication* are constructions of PoRep with timing assumptions. The assumption is that generation of the replica (hence the _Setup_) takes some time _t_ that is substantially larger than the time it takes to produce a proof (hence _time(Prove)_) and the round-trip time (_RTT_) for sending a challenge and receiving a proof.

**Distinguishing Malicious provers**. A malicious prover that does not have _R_, must obtain it (or generate it), before the _Prove_ step. A verifier can distinguish an honest prover from a malicious prover, since the malicious one will take too long to answer the challenge. A verifier will reject if receiving the proof from the prover takes longer than a timeout (bounded between proving time and sealing time).

The following figure shows the different steps of the Proof of Replication on a timeline. We refer to the the process of generating a replica as "sealing".

![](https://protocol.ai/img/blog/proof-of-rep.svg)

**Note**: For a formal definition of Time-bounded Proof-of-Replication, please refer to our [Technical Report](https://filecoin.io/proof-of-replication.pdf) (Section 2.1).

### Beyond Time-bounded Proof-of-Replication

There exist other *Proof-of-Replication* schemes that do not rely on timing assumption but instead of trust assumption. For example, the replica could be generated via a multi-party computation with the assumption that all the involved parties will not collude in the future.


## Open Problems

While there are some candidate constructions of the Proof-of-Replication, there are still improvements for making such constructions more practical. An ideal PoRep construction should have as many of the following properties as possible.

### Desirable properties

- **Transparency**: There is no such extra information that a prover can use to generate valid PoRep proofs without storing the replica itself during _Prove_.
- **Practical Slow Permutation**: If _Prove_ time takes _time(Prove)_ and round-trip time is _RTT_, then the theoretical lower bound for _time(Setup)_ to be secure is _RTT+time(Prove)_. A slow permutation is practical if the time gap between _time(Setup)_ and _RTT+time(Prove)_ is small without breaking security. The smaller the time gap, the more practical the permutation.
- **Short Verification Time**: The verification time is short if it is constant in the security parameter O(λ), for a small constant, or sublinear in the size of the data. (Note: For a reasonable data size, e.g. 10GB, a constant verification might concretely take longer than a sublinear verification)
- **Short Proofs**: The proofs are short if they are constant in the security parameter O(λ), for a small constant, or sublinear in the size of the data (Note: For a reasonable data size, e.g. 10GB, a constant proof size might be concretely larger than a sublinear proof size)
- **Non-parallelizable Permutation**: The permutation should not be parallelizable, a prover should not get any advantage by adding computational power. This is important for the Filecoin network, miners should get their returns proportional to their storage rather than their power. (Note: we allow for exploration of *Rational Proof-of-Replication*, where parallelization is allowed, however it is more expensive to generate proofs).
- **Fast Parallelizable Decoding**: The algorithm for retrieving the original data from the replica can be parallelized.
- **No Size Expansion**: The size of the replica should be equivalent (or a constant) to the size of the original data.

### Concrete Open Problems

1. **New PoRep constructions**: Are there constructions of a *transparent* PoRep with *practically slow permutation*, *fast verification time*, *short proofs* and *parallelizable decoding*? (see current [Proof-of-Replication construction](https://www.youtube.com/watch?v=8_9ONpyRZEI)). In addition, are there radically different constructions that achieve the same goals?
2. **PoRep without timing assumption**: Are there other PoRep schemes that are not time-bounded? If so, are there PoRep that do not require a trusted party to generate the replica? Would they work in the Filecoin setting?
3. **Aggregating Proofs**: In the setting of Filecoin, we require miners to repeatedly report valid Proof-of-Replication. Is there a practical way to aggregate and or sequentially compose multiple proof of replication (maintaining short proof size and fast verification)? (For more details, see Proof-of-Spacetime https://github.com/protocol/research/issues/5)

## Useful Readings
- Filecoin Whitepaper ([paper](https://filecoin.io/filecoin.pdf))
- Proof of Replication ([paper](https://filecoin.io/proof-of-replication.pdf))
- Filecoin Proof of Replication motivation ([video](https://youtu.be/vyRZBeMtkrA?t=1793))
- Proof of Replication Construction ([video](https://www.youtube.com/watch?v=8_9ONpyRZEI), [slides](https://cyber.stanford.edu/sites/default/files/bpase_porep.pptx))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Novel constructions for Proof-of-Replication #4

Proof-of-Replication

Introduction

Motivation

Problems with current PoR and PDP schemes

Problem Definition

Definition

Time-bounded Proof-of-Replication (Visual)

Beyond Time-bounded Proof-of-Replication

Open Problems

Desirable properties

Concrete Open Problems

Useful Readings

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development