Use memfd_create(2) for sharable file descriptor allocation (v6d-io#1293)

What do these changes do?
-------------------------

- Use `memfd_create()` on Linux, verified both on Docker and on
Kubernetes (between pods)
- Fixes the `vineyard-dev` docker image

Related issue number
--------------------

Fixes v6d-io#1292

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
sighingnow authored Apr 10, 2023
1 parent e6885b6 commit dfb5809
Showing 9 changed files with 61 additions and 45 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/build-test-graph.yml
@@ -19,8 +19,6 @@ on:
paths:
- modules/graph
pull_request:
branches:
- "*"
paths:
- modules/graph

11 changes: 11 additions & 0 deletions docker/Makefile
@@ -28,6 +28,10 @@ WHEEL_IMAGE := vineyard-wheel
WHEEL_MANIFEST_TAG := $(WHEEL_VERSION)_$(WHEEL_PYTHON)
WHEEL_TAG := $(WHEEL_MANIFEST_TAG)_$(PLATFORM)

DEV_REGISTRY := $(REGISTRY)
DEV_IMAGE := vineyard-dev
DEV_TAG := latest_$(PLATFORM)

PYTHON_DEV_REGISTRY := $(REGISTRY)
PYTHON_DEV_IMAGE := vineyard-python-dev
PYTHON_DEV_TAG := latest_$(PLATFORM)
@@ -90,6 +94,13 @@ python-wheel:
--platform linux/$(ARCH)
.PHONY: python-wheel

# build dev image
build-dev:
	docker build ./dev/ \
		-f ./dev/Dockerfile.dev \
		-t $(DEV_REGISTRY)/$(DEV_IMAGE):$(DEV_TAG)
.PHONY: build-dev

# build python-dev image
build-python-dev:
	docker build ../ \
2 changes: 1 addition & 1 deletion docker/dev/Dockerfile.dev
@@ -1,4 +1,4 @@
-FROM ubuntu:20.04
+FROM ubuntu:22.04

RUN chmod 1777 /tmp

8 changes: 5 additions & 3 deletions docker/dev/build_scripts/install-arrow.sh
@@ -1,18 +1,20 @@
#!/bin/bash

set -ex
set -o pipefail

export DEBIAN_FRONTEND=noninteractive
export DEBCONF_NONINTERACTIVE_SEEN=true

# install apache-arrow
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
apt update
-apt install -y libarrow-dev=6.0.1-1 libparquet-dev=6.0.1-1 libarrow-python-dev=6.0.1-1
+apt install -y libarrow-dev=10.0.1-1 libparquet-dev=10.0.1-1

# install pyarrow from scratch
-sudo pip3 install --no-binary pyarrow pyarrow==6.0.1
+sudo pip3 install --no-binary pyarrow pyarrow==10.0.1

# apt-get cleanup
apt-get autoclean
rm -rf ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb

3 changes: 3 additions & 0 deletions docker/dev/build_scripts/install-deps.sh
@@ -1,5 +1,8 @@
#!/bin/bash

set -ex
set -o pipefail

export DEBIAN_FRONTEND=noninteractive
export DEBCONF_NONINTERACTIVE_SEEN=true

4 changes: 3 additions & 1 deletion docker/dev/build_scripts/install-miscs.sh
@@ -1,10 +1,12 @@
#!/bin/bash

set -ex
set -o pipefail

# install python packages for codegen, and io adaptors
sudo pip3 install -U "Pygments>=2.4.1"
sudo pip3 install -r build_scripts/requirements.txt

# install clang-format
sudo curl -L https://github.com/muttleyxd/clang-tools-static-binaries/releases/download/master-22538c65/clang-format-8_linux-amd64 --output /usr/bin/clang-format
sudo chmod +x /usr/bin/clang-format

24 changes: 14 additions & 10 deletions docker/dev/build_scripts/requirements.txt
@@ -1,34 +1,38 @@
argcomplete
black
black>=22.3.0
breathe
click
docutils==0.16
etcd-distro
flake8
flake8>=4.0.1
flake8-comprehensions
flake8-logging-format
furo # sphinx theme
isort
isort>=5.10.1
jinja2>=3.0.0
libclang
linkify-it-py
makefun
myst-parser>=0.13.0
nbsphinx
numpy>=1.18.5
parsec
pandas<1.0.0; python_version<"3.6"
pandas<1.2.0; python_version<"3.7"
pandas>=1.0.0; python_version>="3.7"
pandas<1.2.0; python_version<"3.7"
pickle5; python_version<="3.7"
pygments>=2.4.1
psutil
pyarrow
pygments>=2.4.1
pytest
pytest-benchmark
pytest-datafiles
pytest-timeout
setuptools
shared-memory38; python_version<="3.7"
sortedcontainers
sphinx>=3.0.2
sphinx>=3.0.2,<6
sphinx-copybutton
sphinx-panels
sphinxemoji
sphinxext-opengraph
sphinx-panels
treelib
wheel

27 changes: 0 additions & 27 deletions src/server/memory/allocator.cc
@@ -51,33 +51,6 @@ int64_t BulkAllocator::footprint_limit_ = 0;
int64_t BulkAllocator::allocated_ = 0;

void* BulkAllocator::Init(const size_t size, std::string const& allocator) {
#if __linux__
// Seems that mac os doesn't respect the sysctl configuration.
//
// Vineyard can actually use more shared memory than the sysctl result reports,
// so the warning is disabled on macOS to avoid inaccurate warnings.
int64_t shmmax = get_maximum_shared_memory();
LOG(INFO) << "shmax: " << shmmax;
if (shmmax < static_cast<float>(size)) {
LOG(WARNING)
<< "The 'size' is greater than the maximum shared memory size ("
<< shmmax << ")" << std::endl;
LOG(WARNING)
<< " If you are inside a Docker container, please pass the argument "
"'--shm-size' when 'docker run'."
<< std::endl;

// try remount
//
// shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
std::string options = "size=" + std::to_string(size);
int flags = MS_REMOUNT | MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_RELATIME;
if (mount("shm", "/dev/shm", "tmpfs", flags, options.c_str()) != 0) {
VLOG(2) << "Failed to remount: " << strerror(errno);
}
}
#endif

if (allocator == "dlmalloc") {
use_mimalloc_ = false;
return DLmallocAllocator::Init(size);
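The block removed above compared the requested size against the maximum shared memory and tried to remount `/dev/shm` when it was too small. The implementation of `get_maximum_shared_memory()` is not shown in this diff, so the following is only a sketch of one way to inspect the capacity of a mounted filesystem such as `/dev/shm`, using the POSIX `statvfs` call. The helper name `filesystem_capacity` is hypothetical and is not part of vineyard.

```cpp
#include <sys/statvfs.h>  // statvfs (POSIX)

#include <cstdint>

// Hypothetical helper (not a vineyard function): returns the total capacity,
// in bytes, of the filesystem that contains `path`, or -1 if statvfs fails.
int64_t filesystem_capacity(const char* path) {
  struct statvfs st;
  if (statvfs(path, &st) != 0) {
    return -1;
  }
  // f_blocks is counted in units of f_frsize (the fragment size).
  return static_cast<int64_t>(st.f_blocks) * static_cast<int64_t>(st.f_frsize);
}
```

Calling `filesystem_capacity("/dev/shm")` inside a container shows the tmpfs size that `--shm-size` controls, which is the limit the removed warning referred to.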
25 changes: 24 additions & 1 deletion src/server/memory/malloc.cc
@@ -84,11 +84,28 @@ int create_buffer(int64_t size, bool memory) {

#else // _WIN32

/**
* Notes [memfd_create vs. mkstemp]
*
* Use memfd_create(2) on Linux if available. Note that with `memfd_create`,
* the shared memory usage is accounted to the process that owns the file
* descriptor, i.e., to the process that creates the blob (which triggers the
* **anonymous** physical memory allocation), which is a bit counterintuitive
* (as the shared memory usage cannot be observed on the vineyardd process),
* but a natural fit for the use cases of vineyard, including on Kubernetes
* where the memory limit is applied at the pod level and we want to limit
* the memory usage of the compute process.
*
* The `memfd_create` syscall resolves the SIGBUS errors as well.
*
* [1]: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
*/

// directory where to create the memory-backed file
#ifdef __linux__
std::string file_template;
if (memory) {
-file_template = "/dev/shm/vineyard-bulk-XXXXXX";
+file_template = "vineyard-bulk-XXXXXX";
} else {
file_template = "/tmp/vineyard-bulk-XXXXXX";
}
@@ -98,16 +115,22 @@ int create_buffer(int64_t size, bool memory) {
#endif
std::vector<char> file_name(file_template.begin(), file_template.end());
file_name.push_back('\0');
#ifdef __linux__ // see also: Notes [memfd_create vs. mkstemp]
fd = memfd_create(&file_name[0], 0);
#else
fd = mkstemp(&file_name[0]);
#endif
if (fd < 0) {
LOG(ERROR) << "create_buffer failed to open file " << &file_name[0];
return -1;
}
// Immediately unlink the file so we do not leave traces in the system.
#ifndef __linux__ // see also: Notes [memfd_create vs. mkstemp]
if (unlink(&file_name[0]) != 0) {
LOG(ERROR) << "failed to unlink file " << &file_name[0];
return -1;
}
#endif
if (true) {
// Increase the size of the file to the desired size. This seems not to be
// needed for files that are backed by the huge page fs, see also
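The `memfd_create` path in the hunk above can be illustrated in isolation. This is a minimal sketch, assuming Linux with a glibc that exposes `memfd_create` (2.27 or newer). It is not the vineyard implementation: the helper names `create_anonymous_buffer` and `mappings_share_pages` and the name `"example-bulk"` are hypothetical. It creates an anonymous memory-backed file, sizes it with `ftruncate`, and maps it twice with `MAP_SHARED` to show that both mappings see the same pages.

```cpp
#include <sys/mman.h>  // memfd_create, mmap, munmap
#include <unistd.h>    // ftruncate, close

#include <cstddef>
#include <cstring>

// Hypothetical helper (not a vineyard function): creates an anonymous,
// memory-backed file of `size` bytes and returns its descriptor, or -1.
int create_anonymous_buffer(std::size_t size) {
  // The name is only a debugging label; it is not a path in any filesystem,
  // so there is nothing to unlink afterwards.
  int fd = memfd_create("example-bulk", 0);
  if (fd < 0) {
    return -1;
  }
  // A new memfd has length zero; grow it to the requested size.
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Maps the descriptor twice and checks that a write through the first
// mapping is visible through the second one.
bool mappings_share_pages(int fd, std::size_t size) {
  void* a = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (a == MAP_FAILED) {
    return false;
  }
  void* b = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (b == MAP_FAILED) {
    munmap(a, size);
    return false;
  }
  std::memcpy(a, "hello", 6);
  bool same = std::memcmp(b, "hello", 6) == 0;
  munmap(a, size);
  munmap(b, size);
  return same;
}
```

Because the descriptor is not tied to a path under `/dev/shm`, the `mkstemp` plus `unlink` step is unnecessary on this code path, which matches the `#ifndef __linux__` guard added around `unlink` in the diff. A descriptor like this can also be handed to another process, for example over a Unix domain socket with `SCM_RIGHTS`, and mapped there in the same way.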
