util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
When a file supporting DAX is used as a vNVDIMM backend, additionally
mmap it with the MAP_SYNC flag. This ensures that file system metadata
is synced on each guest write to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Currently, there are the following possible use cases:

1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a DAX-supporting file.
    - MAP_SYNC will be active.
   b: backend is not a DAX-supporting file.
    - mmap with MAP_SYNC fails; QEMU prints a warning and retries
      without the flag, so MAP_SYNC is ignored.

2. All other cases:
   - MAP_SYNC is never passed to mmap(2).
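
The cases above reduce to one condition on the two backend options. A
minimal sketch of that decision, not the code from this patch:
pmem_extra_mmap_flags() and check_cases() are hypothetical names, and the
fallback #define values are assumed to match the generic Linux UAPI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values assumed from the Linux UAPI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: extra mmap() flags to request for a backend.
 * MAP_SYNC is only requested when the mapping is both shared and pmem.
 */
static int pmem_extra_mmap_flags(bool shared, bool is_pmem)
{
    if (shared && is_pmem) {
        return MAP_SYNC | MAP_SHARED_VALIDATE;
    }
    return 0;
}

/* Walks the use cases listed in the commit message. */
static void check_cases(void)
{
    /* Case 1: pmem=on and share=on, so MAP_SYNC is requested. */
    assert(pmem_extra_mmap_flags(true, true) ==
           (MAP_SYNC | MAP_SHARED_VALIDATE));
    /* Case 2: every other combination requests no extra flags. */
    assert(pmem_extra_mmap_flags(true, false) == 0);
    assert(pmem_extra_mmap_flags(false, true) == 0);
    assert(pmem_extra_mmap_flags(false, false) == 0);
}
```

Whether the request succeeds (case 1a versus 1b) is only known once mmap()
has been called, which is why the patch needs a retry path.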

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
[ehabkost: Rebased patch to latest code on master]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Tested-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190422004849.26463-2-richardw.yang@linux.intel.com>
[ehabkost: squashed documentation patch]
Message-Id: <20190422004849.26463-3-richardw.yang@linux.intel.com>
[ehabkost: documentation fixup]
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Zhang Yi authored and ehabkost committed Apr 25, 2019
1 parent 8cf108c commit 119906a
Showing 3 changed files with 64 additions and 4 deletions.
22 changes: 19 additions & 3 deletions docs/nvdimm.txt
@@ -144,9 +144,25 @@ Guest Data Persistence
 ----------------------

 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backends that can guarantee guest write persistence are:
+
+A. DAX device (e.g., /dev/dax0.0) or
+B. DAX file (on a file system mounted with the dax option)
+
+When using B (a file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied, i.e. if either 'pmem' or 'share'
+is not set, if the backend file does not support DAX, or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are ignored if not satisfied. Currently, no way is
+provided to test for them.
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.

 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
5 changes: 5 additions & 0 deletions qemu-options.hx
@@ -4233,6 +4233,11 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, QEMU will map the backend file with the MAP_SYNC flag, which ensures
+that the file metadata is in sync for @option{mem-path} in case of a host
+crash or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and the filesystem of @option{mem-path} mounted
+with the DAX option.

 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}

41 changes: 40 additions & 1 deletion util/mmap-alloc.c
@@ -10,6 +10,13 @@
  * later. See the COPYING file in the top-level directory.
  */

+#ifdef CONFIG_LINUX
+#include <linux/mman.h>
+#else /* !CONFIG_LINUX */
+#define MAP_SYNC 0x0
+#define MAP_SHARED_VALIDATE 0x0
+#endif /* CONFIG_LINUX */
+
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
@@ -82,6 +89,7 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     int flags;
+    int map_sync_flags = 0;
     int guardfd;
     size_t offset;
     size_t pagesize;
@@ -132,9 +140,40 @@ void *qemu_ram_mmap(int fd,
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;

-    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
+               flags | map_sync_flags, fd, 0);
+
+    if (ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link, *file_name;
+            int len;
+            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            file_name = g_malloc0(PATH_MAX);
+            len = readlink(proc_link, file_name, PATH_MAX - 1);
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+         * we will remove these flags to handle compatibility.
+         */
+        ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
+                   flags, fd, 0);
+    }

     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
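
The try-then-retry pattern used in qemu_ram_mmap() can be shown outside QEMU.
The following is a minimal standalone sketch, not the QEMU implementation: it
omits the guard region and MAP_FIXED handling, map_file_try_sync() is a
hypothetical helper name, and the fallback #define values are assumed to match
the generic Linux UAPI headers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values assumed from the Linux UAPI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: map the first `size` bytes of `path` shared and
 * writable. MAP_SYNC is requested first; if that mapping fails, the
 * mapping is retried as a plain MAP_SHARED one. *synced reports whether
 * the MAP_SYNC mapping succeeded. Returns MAP_FAILED on error.
 */
static void *map_file_try_sync(const char *path, size_t size, bool *synced)
{
    void *ptr;
    int fd;

    assert(path != NULL && synced != NULL);
    *synced = false;

    fd = open(path, O_RDWR);
    if (fd < 0) {
        return MAP_FAILED;
    }

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (ptr != MAP_FAILED) {
        *synced = true;
    } else {
        if (errno == EOPNOTSUPP) {
            fprintf(stderr, "warning: MAP_SYNC not supported for %s, "
                    "mapping without it\n", path);
        }
        /* Retry without MAP_SYNC, as the patch does for compatibility. */
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    close(fd); /* an established mapping stays valid after close() */
    return ptr;
}
```

On a file that is not on a DAX mount, the first mmap() is expected to fail and
the helper falls back to the plain shared mapping with *synced left false.
MAP_SHARED_VALIDATE is what makes the kernel reject unsupported flags instead
of silently ignoring them, so the caller can tell the two cases apart.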
