.TH "duperemove" "8" "March 2014" "Version 0.04"
.SH "NAME"
duperemove \- Find duplicate extents and optionally submit them for deduplication
.SH "SYNOPSIS"
\fBduperemove\fR \fI[options]\fR \fIfiles...\fR
.SH "DESCRIPTION"
.PP
\fBduperemove\fR is a simple tool for finding duplicated extents and
submitting them for deduplication. When given a list of files, it
hashes their contents on a block-by-block basis and compares those
hashes to each other, finding and categorizing extents that match each
other. When given the \fB\-d\fR option, \fBduperemove\fR submits those
extents for deduplication using the btrfs extent-same ioctl
(\fBBTRFS_IOC_FILE_EXTENT_SAME\fR).
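.PP
As an illustration of the hash-and-compare idea only, the C sketch below splits an in-memory buffer into fixed-size blocks, hashes each one, sorts the hashes so that equal blocks end up adjacent, and counts blocks that repeat an earlier one. It is not duperemove's implementation: the FNV-1a hash, the \fBblock_hash\fR structure and the \fBhash_blocks\fR and \fBcount_duplicate_blocks\fR helpers are hypothetical, and a fuller version would also merge runs of consecutive matching blocks into extents.
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in hash (64-bit FNV-1a), not necessarily duperemove's. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hypothetical record: one block's hash and its byte offset in the buffer. */
struct block_hash {
    uint64_t hash;
    size_t offset;
};

/* Order by hash, then by offset, so equal hashes end up adjacent. */
static int cmp_block_hash(const void *a, const void *b)
{
    const struct block_hash *x = a, *y = b;
    if (x->hash != y->hash)
        return x->hash < y->hash ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/* Hash every full block of buf into out (which needs room for
 * len / block_size entries) and sort the result. A trailing partial
 * block is ignored. Returns the number of blocks hashed. */
static size_t hash_blocks(const unsigned char *buf, size_t len,
                          size_t block_size, struct block_hash *out)
{
    assert(block_size > 0);
    size_t n = len / block_size;
    for (size_t i = 0; i < n; i++) {
        out[i].offset = i * block_size;
        out[i].hash = fnv1a64(buf + out[i].offset, block_size);
    }
    qsort(out, n, sizeof(*out), cmp_block_hash);
    return n;
}

/* Count blocks that duplicate the block sorted just before them. Because
 * hashes can collide, each adjacent match is confirmed with memcmp. */
static size_t count_duplicate_blocks(const unsigned char *buf, size_t block_size,
                                     const struct block_hash *h, size_t n)
{
    size_t dups = 0;
    for (size_t i = 1; i < n; i++)
        if (h[i].hash == h[i - 1].hash &&
            memcmp(buf + h[i].offset, buf + h[i - 1].offset, block_size) == 0)
            dups++;
    return dups;
}
```
.fi
.PP
With a block size of 4 bytes, a buffer holding the blocks AAAA, AAAA, BBBB, AAAA yields four hashes, and \fBcount_duplicate_blocks\fR reports 2, because two of the three AAAA blocks repeat an earlier one.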
.SH "GENERAL"
Duperemove has two major modes of operation, one of which is a subset
of the other.
.SS "Readonly / Non-deduplicating Mode"
When run without \fB\-d\fR (the default), duperemove prints one or
more tables of matching extents that it has determined would be ideal
candidates for deduplication. Readonly mode is therefore useful for
previewing what duperemove would do when run with \fB\-d\fR. The output
could also be used by other software to submit the extents for
deduplication at a later time.
.PP
It is important to note that this mode does not print \fBall\fR
instances of matching extents, only those it would consider for
deduplication.
.PP
Generally, duperemove does not concern itself with the underlying
representation of the extents it processes. Some of them could be
compressed, undergoing I/O, or even already deduplicated. In dedupe
mode the kernel handles those details, so duperemove does not try to
replicate that work.
.SS "Deduping Mode"
This mode works like readonly mode, except that the duplicated
extents found in the "read, hash, and compare" step are actually
submitted for deduplication. An estimate of the total amount of data
deduplicated is printed after the operation completes. This estimate
is calculated by comparing the total number of shared bytes in each
file before and after the dedupe.
.SH "OPTIONS"
\fIfiles\fR can be a list of regular files and directories, or a
hyphen (-) to read the list from standard input.
If a directory is specified, all regular files within it will also be
scanned.
.TP
\fB\-r\fR
Enable recursive directory traversal.
.TP
\fB\-d\fR
Deduplicate the results. This only works on \fIbtrfs\fR.
.TP
\fB\-A\fR
Open files readonly when deduping. This is primarily for use by
privileged users on readonly snapshots.
.TP
\fB\-b\fR \fIsize\fR
Use the specified block size. The default is \fB128K\fR.
.TP
\fB\-h\fR
Print numbers in human-readable format.
.TP
\fB\-v\fR
Be verbose.
.TP
\fB\--io-threads=N\fR
Use N threads for I/O. These threads are used by the file hashing and
dedupe stages. The default is detected automatically from the number
of host CPUs.
.TP
\fB\--read-hashes=hashfile\fR
Read hashes from a hashfile. A file list is not required with this
option. Dedupe is possible only if duperemove is run from the same
base directory as is stored in the hash file, because duperemove must
be able to find the files.
.TP
\fB\--write-hashes=hashfile\fR
Write hashes to a hashfile. The hashfile can be read back later with
\fB--read-hashes\fR and used for deduplication.
.TP
\fB\--lookup-extents=[yes|no]\fR
Defaults to no. While checksumming a file, duperemove can optionally
look up file extent state to see whether a given file block is already
shared. This information can later be used to optimize the search for
duplicate extents. There are some caveats to this; see below.
.IP
On btrfs, extents that have been snapshotted are reported as shared,
as more than one inode points to them. A deduped extent is also
reported as shared for the same reason. Internally, duperemove cannot
yet distinguish between the two. If \fB--lookup-extents\fR is turned
on, duperemove will consider a shared extent to have already been
deduped. On a snapshotted filesystem this might cause all or most of
the extents to be skipped for dedupe.
.IP
If you are not making snapshots on the filesystem you are deduping,
this option will allow duperemove to make better decisions about
which extents to dedupe.
.IP
A future version of duperemove will remove this restriction, allowing
this option to default to on.
.TP
\fB\-?, --help\fR
Print help text.
.TP
\fB\--hash-threads=N\fR
Deprecated, see \fB--io-threads\fR above.
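.PP
The extent-state lookup behind \fB--lookup-extents\fR can be approximated with the FIEMAP ioctl, which marks extents that have more than one owner with \fBFIEMAP_EXTENT_SHARED\fR. The C sketch below assumes that mechanism; the helper \fBrange_has_shared_extent\fR is hypothetical, and duperemove's own lookup may differ. As noted above, on btrfs the flag does not tell a snapshotted extent from a deduped one.
.nf
```c
#include <assert.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hypothetical helper. Returns 1 if any extent mapped in [start, start + len)
 * of fd carries FIEMAP_EXTENT_SHARED, 0 if none does, or -1 if FIEMAP fails
 * or is unsupported. Only the first MAX_EXTENTS extents are examined. */
static int range_has_shared_extent(int fd, uint64_t start, uint64_t len)
{
    enum { MAX_EXTENTS = 32 };
    struct fiemap *fm;
    int shared = -1;

    assert(len > 0);
    fm = calloc(1, sizeof(*fm) + MAX_EXTENTS * sizeof(struct fiemap_extent));
    if (!fm)
        return -1;
    fm->fm_start = start;
    fm->fm_length = len;
    fm->fm_flags = FIEMAP_FLAG_SYNC; /* sync file data before mapping */
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
        shared = 0;
        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++)
            if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_SHARED)
                shared = 1;
    }
    free(fm);
    return shared;
}
```
.fi
.PP
For brevity the sketch examines at most 32 extents; a complete version would repeat the call, advancing \fBfm_start\fR past the last returned extent, until an extent flagged \fBFIEMAP_EXTENT_LAST\fR is seen.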
.SH "FAQ"
See the \fBFAQ.md\fR file, which should have been included with your duperemove package.
.SH "NOTES"
Deduplication is currently only supported by the \fIbtrfs\fR filesystem.
.SH "SEE ALSO"
.BR filesystems (5),
.BR btrfs (8)