Quality-of-Implementation R
- Specialization
- Performance and Scalability
+
- Forward Progress
- Composability
- Corner Cases
@@ -1822,6 +1823,106 @@
RCU thus provides a range of tools to allow updaters to strike the
required tradeoff between latency, flexibility and CPU overhead.
+
+
+
+In theory, delaying grace-period completion and callback invocation
+is harmless.
+In practice, not only are memory sizes finite but also callbacks sometimes
+do wakeups, and sufficiently deferred wakeups can be difficult
+to distinguish from system hangs.
+Therefore, RCU must provide a number of mechanisms to promote forward
+progress.
+
+
+These mechanisms are not foolproof, nor can they be.
+For one simple example, an infinite loop in an RCU read-side critical
+section must by definition prevent later grace periods from ever completing.
+For a more involved example, consider a 64-CPU system built with
+CONFIG_RCU_NOCB_CPU=y and booted with rcu_nocbs=1-63,
+where CPUs 1 through 63 spin in tight loops that invoke
+call_rcu().
+Even if these tight loops also contain calls to cond_resched()
+(thus allowing grace periods to complete), CPU 0 simply will
+not be able to invoke callbacks as fast as the other 63 CPUs can
+register them, at least not until the system runs out of memory.
+In both of these examples, the Spiderman principle applies: With great
+power comes great responsibility.
+However, short of this level of abuse, RCU is required to
+ensure timely completion of grace periods and timely invocation of
+callbacks.
+
+
+RCU takes the following steps to encourage timely completion of
+grace periods:
+
+
+- If a grace period fails to complete within 100 milliseconds,
+ RCU causes future invocations of cond_resched() on
+ the holdout CPUs to provide an RCU quiescent state.
+ RCU also causes those CPUs' need_resched() invocations
+ to return true, but only after the corresponding CPU's
+ next scheduling-clock.
+
- CPUs mentioned in the nohz_full kernel boot parameter
+ can run indefinitely in the kernel without scheduling-clock
+ interrupts, which defeats the above need_resched()
+ strategem.
+ RCU will therefore invoke resched_cpu() on any
+ nohz_full CPUs still holding out after
+ 109 milliseconds.
+
- In kernels built with CONFIG_RCU_BOOST=y, if a given
+ task that has been preempted within an RCU read-side critical
+ section is holding out for more than 500 milliseconds,
+ RCU will resort to priority boosting.
+
- If a CPU is still holding out 10 seconds into the grace
+ period, RCU will invoke resched_cpu() on it regardless
+ of its nohz_full state.
+
+
+
+The above values are defaults for systems running with HZ=1000.
+They will vary as the value of HZ varies, and can also be
+changed using the relevant Kconfig options and kernel boot parameters.
+RCU currently does not do much sanity checking of these
+parameters, so please use caution when changing them.
+Note that these forward-progress measures are provided only for RCU,
+not for
+SRCU or
+Tasks RCU.
+
+
+RCU takes the following steps in call_rcu() to encourage timely
+invocation of callbacks when any given non-rcu_nocbs CPU has
+10,000 callbacks, or has 10,000 more callbacks than it had the last time
+encouragement was provided:
+
+
+- Starts a grace period, if one is not already in progress.
+
- Forces immediate checking for quiescent states, rather than
+ waiting for three milliseconds to have elapsed since the
+ beginning of the grace period.
+
- Immediately tags the CPU's callbacks with their grace period
+ completion numbers, rather than waiting for the RCU_SOFTIRQ
+ handler to get around to it.
+
- Lifts callback-execution batch limits, which speeds up callback
+ invocation at the expense of degrading realtime response.
+
+
+
+Again, these are default values when running at HZ=1000,
+and can be overridden.
+Again, these forward-progress measures are provided only for RCU,
+not for
+SRCU or
+Tasks RCU.
+Even for RCU, callback-invocation forward progress for rcu_nocbs
+CPUs is much less well-developed, in part because workloads benefiting
+from rcu_nocbs CPUs tend to invoke call_rcu()
+relatively infrequently.
+If workloads emerge that need both rcu_nocbs CPUs and high
+call_rcu() invocation rates, then additional forward-progress
+work will be required.
+
@@ -2272,7 +2373,7 @@
Furthermore, NMI handlers can be interrupted by what appear to RCU
to be normal interrupts.
One way that this can happen is for code that directly invokes
-rcu_irq_enter() and rcu_irq_exit() to be called
+rcu_irq_enter() and rcu_irq_exit() to be called
from an NMI handler.
This astonishing fact of life prompted the current code structure,
which has rcu_irq_enter() invoking rcu_nmi_enter()
@@ -2294,7 +2395,7 @@
Unfortunately, there is no way to cancel an RCU callback;
once you invoke call_rcu(), the callback function is
-going to eventually be invoked, unless the system goes down first.
+eventually going to be invoked, unless the system goes down first.
Because it is normally considered socially irresponsible to crash the system
in response to a module unload request, we need some other way
to deal with in-flight RCU callbacks.
@@ -3233,6 +3334,11 @@
originating call_rcu() instance, though probably not
in production kernels.
+
+Additional work may be required to provide reasonable forward-progress
+guarantees under heavy load for grace periods and for callback
+invocation.
+
From 2d0350a8f0e6eb5494141c61c5c749b5155df33d Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)"
Date: Fri, 21 Sep 2018 18:31:53 -0400
Subject: [PATCH 012/112] doc: Clarify RCU data-structure comment about
rcu_tree fanout
RCU Data-Structures document describes a trick to test RCU with small
number of CPUs but with a taller tree. It wasn't immediately clear how
the document arrived at 16 CPUs which also requires setting the
FANOUT_LEAF to 2 instead of the default of 16. This commit therefore
provides the needed clarification.
Signed-off-by: Joel Fernandes (Google)
Cc:
Signed-off-by: Paul E. McKenney
---
.../RCU/Design/Data-Structures/Data-Structures.html | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
index 1d2051c0c3fc16..476b1ac38e4c2a 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
@@ -127,9 +127,11 @@
RCU currently permits up to a four-level tree, which on a 64-bit system
accommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for
32-bit systems.
-On the other hand, you can set CONFIG_RCU_FANOUT to be
-as small as 2 if you wish, which would permit only 16 CPUs, which
-is useful for testing.
+On the other hand, you can set both CONFIG_RCU_FANOUT and
+CONFIG_RCU_FANOUT_LEAF to be as small as 2, which would result
+in a 16-CPU test using a 4-level tree.
+This can be useful for testing large-system capabilities on small test
+machines.
This multi-level combining tree allows us to get most of the
performance and scalability
From dd944caa8173a19c702076471aae17a2d793ebeb Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)"
Date: Sat, 22 Sep 2018 19:41:27 -0400
Subject: [PATCH 013/112] doc: Remove rcu_preempt_state reference in stallwarn
Consolidation of RCU-bh, RCU-preempt, and RCU-sched into one RCU flavor
to rule them all resulted in the removal of rcu_preempt_state. However,
stallwarn.txt still mentions rcu_preempt_state. This commit therefore
Updates stallwarn documentation accordingly.
Signed-off-by: Joel Fernandes (Google)
Cc:
Signed-off-by: Paul E. McKenney
---
Documentation/RCU/stallwarn.txt | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 491043fd976f8b..b01bcafc64aa02 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -176,9 +176,8 @@ causing stalls, and that the stall was affecting RCU-sched. This message
will normally be followed by stack dumps for each CPU. Please note that
PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
the tasks will be indicated by PID, for example, "P3421". It is even
-possible for a rcu_preempt_state stall to be caused by both CPUs -and-
-tasks, in which case the offending CPUs and tasks will all be called
-out in the list.
+possible for an rcu_state stall to be caused by both CPUs -and- tasks,
+in which case the offending CPUs and tasks will all be called out in the list.
CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
the RCU core for the past three grace periods. In contrast, CPU 16's "(0
From 8f15c682ac5a778feb8e343f9057b89beb40d85b Mon Sep 17 00:00:00 2001
From: Connor Shu
Date: Wed, 22 Aug 2018 14:16:46 -0700
Subject: [PATCH 014/112] rcutorture: Automatically create initrd directory
The rcutorture scripts currently expect the user to create the
tools/testing/selftests/rcutorture/initrd directory. Should the user
fail to do this, the kernel build will fail with obscure and confusing
error messages. This commit therefore adds explicit checks for the
tools/testing/selftests/rcutorture/initrd directory, and if not present,
creates one on systems on which dracut is installed. If this directory
could not be created, a less obscure error message is emitted and the
test is aborted.
Suggested-by: Thomas Gleixner
Signed-off-by: Connor Shu
[ paulmck: Adapt the script to fit into the rcutorture framework and
severely abbreviate the initrd/init script. ]
Signed-off-by: Paul E. McKenney
---
tools/testing/selftests/rcutorture/bin/kvm.sh | 8 +++
.../selftests/rcutorture/bin/mkinitrd.sh | 60 +++++++++++++++++++
2 files changed, 68 insertions(+)
create mode 100755 tools/testing/selftests/rcutorture/bin/mkinitrd.sh
diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh
index 5a7a62d76a50b9..19864f1cb27a42 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm.sh
@@ -194,6 +194,14 @@ do
shift
done
+if test -z "$TORTURE_INITRD" || tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+then
+ :
+else
+ echo No initrd and unable to create one, aborting test >&2
+ exit 1
+fi
+
CONFIGFRAG=${KVM}/configs/${TORTURE_SUITE}; export CONFIGFRAG
if test -z "$configs"
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
new file mode 100755
index 00000000000000..ae773760f39696
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Create an initrd directory if one does not already exist.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, you can access it online at
+# http://www.gnu.org/licenses/gpl-2.0.html.
+#
+# Copyright (C) IBM Corporation, 2013
+#
+# Author: Connor Shu
+
+D=tools/testing/selftests/rcutorture
+
+# Prerequisite checks
+[ -z "$D" ] && echo >&2 "No argument supplied" && exit 1
+if [ ! -d "$D" ]; then
+ echo >&2 "$D does not exist: Malformed kernel source tree?"
+ exit 1
+fi
+if [ -d "$D/initrd" ]; then
+ echo "$D/initrd already exists, no need to create it"
+ exit 0
+fi
+
+T=${TMPDIR-/tmp}/mkinitrd.sh.$$
+trap 'rm -rf $T' 0 2
+mkdir $T
+
+cat > $T/init << '__EOF___'
+#!/bin/sh
+while :
+do
+ sleep 1000000
+done
+__EOF___
+
+# Try using dracut to create initrd
+command -v dracut >/dev/null 2>&1 || { echo >&2 "Dracut not installed"; exit 1; }
+echo Creating $D/initrd using dracut.
+
+# Filesystem creation
+dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img
+cd $D
+mkdir initrd
+cd initrd
+zcat $T/initramfs.img | cpio -id
+cp $T/init init
+echo Done creating $D/initrd using dracut
+exit 0
From 38e630424ba304dbe07ae52aa78d1ed6d38d9f75 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Thu, 23 Aug 2018 10:48:18 -0700
Subject: [PATCH 015/112] rcutorture: Add initrd support for systems lacking
dracut
The support for creating initrd directories using dracut is a great
improvement over having to always hand-create them, it is a bit annoying
to have to install some otherwise irrelevant package just to be able to
run rcutorture. This commit therefore adds support for creating initrd
directories on systems innocent of dracut. You do need gcc, but then
again you need that to build the kernel (or to build llvm) in any case.
The idea is to create an initrd directory containing nothing but a
statically linked binary having a for-loop over a long-term sleep().
The result is a Linux kernel with almost no userspace: even the
time-honored /dev, /lib, /tmp, and /usr directories are gone. In fact,
the only directory present is "/", but only because I don't know how to
get rid of it, at least short of not having an initrd in the first place.
Although statically linked binaries are much maligned, and rightly so,
their disadvantages seem to be irrelevant for this particular use case.
From https://www.akkadia.org/drepper/no_static_linking.html:
1. Fixes are difficult to apply to hordes of widely scattered
statically linked binaries. But in this case, there is only one
binary, but there would otherwise be no fewer than four libraries.
2. Security measures like local address randomization cannot be used.
Prudence prevents me from asserting that it is impossible to
base a remote attack on a networking-free rcutorture instance.
Nevertheless, bonus points to the first person who comes up with
such an attack!
3. More efficient use of physical memory. Not in this case, given
that libc is 1.8MB and the statically linked binary "only" 800K.
4. Features such as locales, name service switch (NSS),
internationalized domain names (IDN) tool, and so on require
dynamic linking. Bonus points to the first person coming up
with a valid rcutorture use case requiring these features in
its initrd.
5. Accidental violations of (L)GPL. Actually, this change actually
helps -avoid- such violations by reducing the temptation to
pass around tarballs of rcutorture-ready initrd directories.
After all, the rcutorture scripts automatically create an initrd
directory for you, so why bother with the tarballs?
6. Tools and hacks like ltrace, LD_PRELOAD, LD_PROFILE, and LD_AUDIT
don't work. Again, bonus points to the first person coming up
with a valid rcutorture use case requiring these features in
its initrd.
Nevertheless, the script will use dracut if available, and will create the
statically linked binary only when dracut are missing. Those preferring
the smaller initrd directory resulting from the statically linked binary
(like me) are free to hand-edit mkinitrd.sh to remove the code using
dracut. ;-)
Signed-off-by: Paul E. McKenney
---
.../selftests/rcutorture/bin/mkinitrd.sh | 40 ++++++--
.../selftests/rcutorture/doc/initrd.txt | 99 +++----------------
2 files changed, 45 insertions(+), 94 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index ae773760f39696..87a87ffeaa85a1 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -46,15 +46,41 @@ done
__EOF___
# Try using dracut to create initrd
-command -v dracut >/dev/null 2>&1 || { echo >&2 "Dracut not installed"; exit 1; }
-echo Creating $D/initrd using dracut.
+if command -v dracut >/dev/null 2>&1
+then
+ echo Creating $D/initrd using dracut.
+ # Filesystem creation
+ dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img
+ cd $D
+ mkdir initrd
+ cd initrd
+ zcat $T/initramfs.img | cpio -id
+ cp $T/init init
+ chmod +x init
+ echo Done creating $D/initrd using dracut
+ exit 0
+fi
-# Filesystem creation
-dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img
+# No dracut, so create a C-language initrd/init program and statically
+# link it. This results in a very small initrd, but might be a bit less
+# future-proof than dracut.
+echo "Could not find dracut, attempting C initrd"
cd $D
mkdir initrd
cd initrd
-zcat $T/initramfs.img | cpio -id
-cp $T/init init
-echo Done creating $D/initrd using dracut
+cat > init.c << '___EOF___'
+#include
+
+int main(int argc, int argv[])
+{
+ for (;;)
+ sleep(1000*1000*1000); /* One gigasecond is ~30 years. */
+ return 0;
+}
+___EOF___
+gcc -static -Os -o init init.c
+strip init
+rm init.c
+echo "Done creating a statically linked C-language initrd"
+
exit 0
diff --git a/tools/testing/selftests/rcutorture/doc/initrd.txt b/tools/testing/selftests/rcutorture/doc/initrd.txt
index 833f826d6ec2c9..933b4fd1232715 100644
--- a/tools/testing/selftests/rcutorture/doc/initrd.txt
+++ b/tools/testing/selftests/rcutorture/doc/initrd.txt
@@ -1,9 +1,12 @@
-This document describes one way to create the initrd directory hierarchy
-in order to allow an initrd to be built into your kernel. The trick
-here is to steal the initrd file used on your Linux laptop, Ubuntu in
-this case. There are probably much better ways of doing this.
+The rcutorture scripting tools automatically create the needed initrd
+directory using dracut. Failing that, this tool will create an initrd
+containing a single statically linked binary named "init" that loops
+over a very long sleep() call. In both cases, this creation is done
+by tools/testing/selftests/rcutorture/bin/mkinitrd.sh.
-That said, here are the commands:
+However, if you are attempting to run rcutorture on a system that does
+not have dracut installed, and if you don't like the notion of static
+linking, you might wish to press an existing initrd into service:
------------------------------------------------------------------------
cd tools/testing/selftests/rcutorture
@@ -11,22 +14,7 @@ zcat /initrd.img > /tmp/initrd.img.zcat
mkdir initrd
cd initrd
cpio -id < /tmp/initrd.img.zcat
-------------------------------------------------------------------------
-
-Another way to create an initramfs image is using "dracut"[1], which is
-available on many distros, however the initramfs dracut generates is a cpio
-archive with another cpio archive in it, so an extra step is needed to create
-the initrd directory hierarchy.
-
-Here are the commands to create a initrd directory for rcutorture using
-dracut:
-
-------------------------------------------------------------------------
-dracut --no-hostonly --no-hostonly-cmdline --module "base bash shutdown" /tmp/initramfs.img
-cd tools/testing/selftests/rcutorture
-mkdir initrd
-cd initrd
-/usr/lib/dracut/skipcpio /tmp/initramfs.img | zcat | cpio -id < /tmp/initramfs.img
+# Manually verify that initrd contains needed binaries and libraries.
------------------------------------------------------------------------
Interestingly enough, if you are running rcutorture, you don't really
@@ -39,75 +27,12 @@ with 0755 mode.
------------------------------------------------------------------------
#!/bin/sh
-[ -d /dev ] || mkdir -m 0755 /dev
-[ -d /root ] || mkdir -m 0700 /root
-[ -d /sys ] || mkdir /sys
-[ -d /proc ] || mkdir /proc
-[ -d /tmp ] || mkdir /tmp
-mkdir -p /var/lock
-mount -t sysfs -o nodev,noexec,nosuid sysfs /sys
-mount -t proc -o nodev,noexec,nosuid proc /proc
-# Some things don't work properly without /etc/mtab.
-ln -sf /proc/mounts /etc/mtab
-
-# Note that this only becomes /dev on the real filesystem if udev's scripts
-# are used; which they will be, but it's worth pointing out
-if ! mount -t devtmpfs -o mode=0755 udev /dev; then
- echo "W: devtmpfs not available, falling back to tmpfs for /dev"
- mount -t tmpfs -o mode=0755 udev /dev
- [ -e /dev/console ] || mknod --mode=600 /dev/console c 5 1
- [ -e /dev/kmsg ] || mknod --mode=644 /dev/kmsg c 1 11
- [ -e /dev/null ] || mknod --mode=666 /dev/null c 1 3
-fi
-
-mkdir /dev/pts
-mount -t devpts -o noexec,nosuid,gid=5,mode=0620 devpts /dev/pts || true
-mount -t tmpfs -o "nosuid,size=20%,mode=0755" tmpfs /run
-mkdir /run/initramfs
-# compatibility symlink for the pre-oneiric locations
-ln -s /run/initramfs /dev/.initramfs
-
-# Export relevant variables
-export ROOT=
-export ROOTDELAY=
-export ROOTFLAGS=
-export ROOTFSTYPE=
-export IP=
-export BOOT=
-export BOOTIF=
-export UBIMTD=
-export break=
-export init=/sbin/init
-export quiet=n
-export readonly=y
-export rootmnt=/root
-export debug=
-export panic=
-export blacklist=
-export resume=
-export resume_offset=
-export recovery=
-
-for i in /sys/devices/system/cpu/cpu*/online
-do
- case $i in
- '/sys/devices/system/cpu/cpu0/online')
- ;;
- '/sys/devices/system/cpu/cpu*/online')
- ;;
- *)
- echo 1 > $i
- ;;
- esac
-done
-
while :
do
sleep 10
done
------------------------------------------------------------------------
-References:
-[1]: https://dracut.wiki.kernel.org/index.php/Main_Page
-[2]: http://blog.elastocloud.org/2015/06/rapid-linux-kernel-devtest-with-qemu.html
-[3]: https://www.centos.org/forums/viewtopic.php?t=51621
+This approach also allows most of the binaries and libraries in the
+initrd filesystem to be dispensed with, which can save significant
+space in rcutorture's "res" directory.
From 229ab0cb5be3bfbac5947df7240f6905470ca413 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Thu, 23 Aug 2018 15:23:23 -0700
Subject: [PATCH 016/112] rcutorture: Make initrd/init execute in userspace
Currently, the initrd/init script and executable remain blocked almost
all the time. However, it is necessary to test nohz_full userspace
execution, which both variants of initrd/init fail to do. This commit
therefore causes initrd/init to spend about a millisecond per second
executing in userspace.
Reported-by: Josh Triplett
Signed-off-by: Paul E. McKenney
---
.../selftests/rcutorture/bin/mkinitrd.sh | 43 +++++++++++++++++--
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index 87a87ffeaa85a1..b48c504edfe12d 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -39,9 +39,22 @@ mkdir $T
cat > $T/init << '__EOF___'
#!/bin/sh
+# Run in userspace a few milliseconds every second. This helps to
+# exercise the NO_HZ_FULL portions of RCU.
while :
do
- sleep 1000000
+ q=
+ for i in \
+ a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
+ a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
+ a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
+ a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
+ a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
+ a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
+ do
+ q="$q $i"
+ done
+ sleep 1
done
__EOF___
@@ -70,15 +83,37 @@ mkdir initrd
cd initrd
cat > init.c << '___EOF___'
#include
+#include
+
+volatile unsigned long delaycount;
int main(int argc, int argv[])
{
- for (;;)
- sleep(1000*1000*1000); /* One gigasecond is ~30 years. */
+ int i;
+ struct timeval tv;
+ struct timeval tvb;
+
+ for (;;) {
+ sleep(1);
+ /* Need some userspace time. */
+ if (gettimeofday(&tvb, NULL))
+ continue;
+ do {
+ for (i = 0; i < 1000 * 100; i++)
+ delaycount = i * i;
+ if (gettimeofday(&tv, NULL))
+ break;
+ tv.tv_sec -= tvb.tv_sec;
+ if (tv.tv_sec > 1)
+ break;
+ tv.tv_usec += tv.tv_sec * 1000 * 1000;
+ tv.tv_usec -= tvb.tv_usec;
+ } while (tv.tv_usec < 1000);
+ }
return 0;
}
___EOF___
-gcc -static -Os -o init init.c
+cc -static -Os -o init init.c
strip init
rm init.c
echo "Done creating a statically linked C-language initrd"
From 70e9f504774b35aacd7b43d873b51ec5260e58ad Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Thu, 6 Sep 2018 10:26:07 -0700
Subject: [PATCH 017/112] rcutorture: Add cross-compile capability to initrd.sh
This adds the CROSS_COMPILE environment to the initrd.sh script's
gcc command to enable cross compilation.
Reported-by: Willy Tarreau
Signed-off-by: Paul E. McKenney
---
tools/testing/selftests/rcutorture/bin/mkinitrd.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index b48c504edfe12d..70661457e3d64b 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -113,7 +113,7 @@ int main(int argc, int argv[])
return 0;
}
___EOF___
-cc -static -Os -o init init.c
+${CROSS_COMPILE}gcc -static -Os -o init init.c
strip init
rm init.c
echo "Done creating a statically linked C-language initrd"
From 18d7bf8ed3a1628ee653d3abde051703642ecd60 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Sun, 9 Sep 2018 11:41:10 +0200
Subject: [PATCH 018/112] rcutorture: Always strip using the cross-compiler
Strip using -s on the compiler command line instead of calling the "strip"
utility as the latter isn't necessarily compatible with the target arch.
Signed-off-by: Willy Tarreau
Signed-off-by: Paul E. McKenney
---
tools/testing/selftests/rcutorture/bin/mkinitrd.sh | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index 70661457e3d64b..dbb6f016028177 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -113,8 +113,7 @@ int main(int argc, int argv[])
return 0;
}
___EOF___
-${CROSS_COMPILE}gcc -static -Os -o init init.c
-strip init
+${CROSS_COMPILE}gcc -s -static -Os -o init init.c
rm init.c
echo "Done creating a statically linked C-language initrd"
From 825fa4cdfb10d8cbf784ebdadd6d5d93130a0cb5 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Sun, 9 Sep 2018 11:46:48 +0200
Subject: [PATCH 019/112] rcutorture: Check initrd/init instead of initrd only
If the build fails, we can end up with an empty initrd directory which
prevents the build script from operating again. Better rely on the
resulting init executable instead.
Signed-off-by: Willy Tarreau
Signed-off-by: Paul E. McKenney
---
tools/testing/selftests/rcutorture/bin/mkinitrd.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index dbb6f016028177..56a56ea06983dd 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -28,8 +28,8 @@ if [ ! -d "$D" ]; then
echo >&2 "$D does not exist: Malformed kernel source tree?"
exit 1
fi
-if [ -d "$D/initrd" ]; then
- echo "$D/initrd already exists, no need to create it"
+if [ -s "$D/initrd/init" ]; then
+ echo "$D/initrd/init already exists, no need to create it"
exit 0
fi
@@ -65,7 +65,7 @@ then
# Filesystem creation
dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img
cd $D
- mkdir initrd
+ mkdir -p initrd
cd initrd
zcat $T/initramfs.img | cpio -id
cp $T/init init
@@ -79,7 +79,7 @@ fi
# future-proof than dracut.
echo "Could not find dracut, attempting C initrd"
cd $D
-mkdir initrd
+mkdir -p initrd
cd initrd
cat > init.c << '___EOF___'
#include
From 66b6f755ad45d354c5b74abd258f67aa8b40b3c7 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Sun, 9 Sep 2018 13:26:04 +0200
Subject: [PATCH 020/112] rcutorture: Import a copy of nolibc
This is a definition of the most common syscalls needed in minimalist
init executables, allowing to statically build them with no external
dependencies. It is sufficient in its current form to build rcutorture's
init on x86_64, i386, arm, and arm64. Others have not been ported or
tested. Updates may be found here :
http://git.formilux.org/?p=people/willy/nolibc.git
Signed-off-by: Willy Tarreau
Signed-off-by: Paul E. McKenney
---
.../testing/selftests/rcutorture/bin/nolibc.h | 2197 +++++++++++++++++
1 file changed, 2197 insertions(+)
create mode 100644 tools/testing/selftests/rcutorture/bin/nolibc.h
diff --git a/tools/testing/selftests/rcutorture/bin/nolibc.h b/tools/testing/selftests/rcutorture/bin/nolibc.h
new file mode 100644
index 00000000000000..f98f5b92d3eb31
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/bin/nolibc.h
@@ -0,0 +1,2197 @@
+/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
+/* nolibc.h
+ * Copyright (C) 2017-2018 Willy Tarreau
+ */
+
+/* some archs (at least aarch64) don't expose the regular syscalls anymore by
+ * default, either because they have an "_at" replacement, or because there are
+ * more modern alternatives. For now we'd rather still use them.
+ */
+#define __ARCH_WANT_SYSCALL_NO_AT
+#define __ARCH_WANT_SYSCALL_NO_FLAGS
+#define __ARCH_WANT_SYSCALL_DEPRECATED
+
+#include
+#include
+#include
+#include
+#include
+
+#define NOLIBC
+
+/* Build a static executable this way :
+ * $ gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
+ * -static -include nolibc.h -lgcc -o hello hello.c
+ *
+ * Useful calling convention table found here :
+ * http://man7.org/linux/man-pages/man2/syscall.2.html
+ *
+ * This doc is even better :
+ * https://w3challs.com/syscalls/
+ */
+
+
+/* this way it will be removed if unused */
+static int errno;
+
+#ifndef NOLIBC_IGNORE_ERRNO
+#define SET_ERRNO(v) do { errno = (v); } while (0)
+#else
+#define SET_ERRNO(v) do { } while (0)
+#endif
+
+/* errno codes all ensure that they will not conflict with a valid pointer
+ * because they all correspond to the highest addressable memry page.
+ */
+#define MAX_ERRNO 4095
+
+/* Declare a few quite common macros and types that usually are in stdlib.h,
+ * stdint.h, ctype.h, unistd.h and a few other common locations.
+ */
+
+#define NULL ((void *)0)
+
+/* stdint types */
+typedef unsigned char uint8_t;
+typedef signed char int8_t;
+typedef unsigned short uint16_t;
+typedef signed short int16_t;
+typedef unsigned int uint32_t;
+typedef signed int int32_t;
+typedef unsigned long long uint64_t;
+typedef signed long long int64_t;
+typedef unsigned long size_t;
+typedef signed long ssize_t;
+typedef unsigned long uintptr_t;
+typedef signed long intptr_t;
+typedef signed long ptrdiff_t;
+
+/* for stat() */
+typedef unsigned int dev_t;
+typedef unsigned long ino_t;
+typedef unsigned int mode_t;
+typedef signed int pid_t;
+typedef unsigned int uid_t;
+typedef unsigned int gid_t;
+typedef unsigned long nlink_t;
+typedef signed long off_t;
+typedef signed long blksize_t;
+typedef signed long blkcnt_t;
+typedef signed long time_t;
+
+/* for poll() */
+struct pollfd {
+ int fd;
+ short int events;
+ short int revents;
+};
+
+/* for select() */
+struct timeval {
+ long tv_sec;
+ long tv_usec;
+};
+
+/* for pselect() */
+struct timespec {
+ long tv_sec;
+ long tv_nsec;
+};
+
+/* for gettimeofday() */
+struct timezone {
+ int tz_minuteswest;
+ int tz_dsttime;
+};
+
+/* for getdents64() */
+struct linux_dirent64 {
+ uint64_t d_ino;
+ int64_t d_off;
+ unsigned short d_reclen;
+ unsigned char d_type;
+ char d_name[];
+};
+
+/* commonly an fd_set represents 256 FDs */
+#define FD_SETSIZE 256
+typedef struct { uint32_t fd32[FD_SETSIZE/32]; } fd_set;
+
+/* needed by wait4() */
+struct rusage {
+ struct timeval ru_utime;
+ struct timeval ru_stime;
+ long ru_maxrss;
+ long ru_ixrss;
+ long ru_idrss;
+ long ru_isrss;
+ long ru_minflt;
+ long ru_majflt;
+ long ru_nswap;
+ long ru_inblock;
+ long ru_oublock;
+ long ru_msgsnd;
+ long ru_msgrcv;
+ long ru_nsignals;
+ long ru_nvcsw;
+ long ru_nivcsw;
+};
+
+/* stat flags (WARNING, octal here) */
+#define S_IFDIR 0040000
+#define S_IFCHR 0020000
+#define S_IFBLK 0060000
+#define S_IFREG 0100000
+#define S_IFIFO 0010000
+#define S_IFLNK 0120000
+#define S_IFSOCK 0140000
+#define S_IFMT 0170000
+
+#define S_ISDIR(mode) (((mode) & S_IFDIR) == S_IFDIR)
+#define S_ISCHR(mode) (((mode) & S_IFCHR) == S_IFCHR)
+#define S_ISBLK(mode) (((mode) & S_IFBLK) == S_IFBLK)
+#define S_ISREG(mode) (((mode) & S_IFREG) == S_IFREG)
+#define S_ISFIFO(mode) (((mode) & S_IFIFO) == S_IFIFO)
+#define S_ISLNK(mode) (((mode) & S_IFLNK) == S_IFLNK)
+#define S_ISSOCK(mode) (((mode) & S_IFSOCK) == S_IFSOCK)
+
+#define DT_UNKNOWN 0
+#define DT_FIFO 1
+#define DT_CHR 2
+#define DT_DIR 4
+#define DT_BLK 6
+#define DT_REG 8
+#define DT_LNK 10
+#define DT_SOCK 12
+
+/* all the *at functions */
+#ifndef AT_FDWCD
+#define AT_FDCWD -100
+#endif
+
+/* lseek */
+#define SEEK_SET 0
+#define SEEK_CUR 1
+#define SEEK_END 2
+
+/* reboot */
+#define LINUX_REBOOT_MAGIC1 0xfee1dead
+#define LINUX_REBOOT_MAGIC2 0x28121969
+#define LINUX_REBOOT_CMD_HALT 0xcdef0123
+#define LINUX_REBOOT_CMD_POWER_OFF 0x4321fedc
+#define LINUX_REBOOT_CMD_RESTART 0x01234567
+#define LINUX_REBOOT_CMD_SW_SUSPEND 0xd000fce2
+
+
+/* The format of the struct as returned by the libc to the application, which
+ * significantly differs from the format returned by the stat() syscall flavours.
+ */
+struct stat {
+ dev_t st_dev; /* ID of device containing file */
+ ino_t st_ino; /* inode number */
+ mode_t st_mode; /* protection */
+ nlink_t st_nlink; /* number of hard links */
+ uid_t st_uid; /* user ID of owner */
+ gid_t st_gid; /* group ID of owner */
+ dev_t st_rdev; /* device ID (if special file) */
+ off_t st_size; /* total size, in bytes */
+ blksize_t st_blksize; /* blocksize for file system I/O */
+ blkcnt_t st_blocks; /* number of 512B blocks allocated */
+ time_t st_atime; /* time of last access */
+ time_t st_mtime; /* time of last modification */
+ time_t st_ctime; /* time of last status change */
+};
+
+#define WEXITSTATUS(status) (((status) & 0xff00) >> 8)
+#define WIFEXITED(status) (((status) & 0x7f) == 0)
+
+
+/* Below comes the architecture-specific code. For each architecture, we have
+ * the syscall declarations and the _start code definition. This is the only
+ * global part. On all architectures the kernel puts everything in the stack
+ * before jumping to _start just above us, without any return address (_start
+ * is not a function but an entry pint). So at the stack pointer we find argc.
+ * Then argv[] begins, and ends at the first NULL. Then we have envp which
+ * starts and ends with a NULL as well. So envp=argv+argc+1.
+ */
+
+#if defined(__x86_64__)
+/* Syscalls for x86_64 :
+ * - registers are 64-bit
+ * - syscall number is passed in rax
+ * - arguments are in rdi, rsi, rdx, r10, r8, r9 respectively
+ * - the system call is performed by calling the syscall instruction
+ * - syscall return comes in rax
+ * - rcx and r8..r11 may be clobbered, others are preserved.
+ * - the arguments are cast to long and assigned into the target registers
+ * which are then simply passed as registers to the asm code, so that we
+ * don't have to experience issues with register constraints.
+ * - the syscall number is always specified last in order to allow to force
+ * some registers before (gcc refuses a %-register at the last position).
+ */
+
+#define my_syscall0(num) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret) \
+ : "0"(_num) \
+ : "rcx", "r8", "r9", "r10", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall1(num, arg1) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), \
+ "0"(_num) \
+ : "rcx", "r8", "r9", "r10", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), "r"(_arg2), \
+ "0"(_num) \
+ : "rcx", "r8", "r9", "r10", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall3(num, arg1, arg2, arg3) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ register long _arg3 asm("rdx") = (long)(arg3); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), \
+ "0"(_num) \
+ : "rcx", "r8", "r9", "r10", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall4(num, arg1, arg2, arg3, arg4) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ register long _arg3 asm("rdx") = (long)(arg3); \
+ register long _arg4 asm("r10") = (long)(arg4); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret), "=r"(_arg4) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
+ "0"(_num) \
+ : "rcx", "r8", "r9", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ register long _arg3 asm("rdx") = (long)(arg3); \
+ register long _arg4 asm("r10") = (long)(arg4); \
+ register long _arg5 asm("r8") = (long)(arg5); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret), "=r"(_arg4), "=r"(_arg5) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+ "0"(_num) \
+ : "rcx", "r9", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \
+({ \
+ long _ret; \
+ register long _num asm("rax") = (num); \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ register long _arg3 asm("rdx") = (long)(arg3); \
+ register long _arg4 asm("r10") = (long)(arg4); \
+ register long _arg5 asm("r8") = (long)(arg5); \
+ register long _arg6 asm("r9") = (long)(arg6); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a" (_ret), "=r"(_arg4), "=r"(_arg5) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+ "r"(_arg6), "0"(_num) \
+ : "rcx", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+/* startup code */
+asm(".section .text\n"
+ ".global _start\n"
+ "_start:\n"
+ "pop %rdi\n" // argc (first arg, %rdi)
+ "mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
+ "lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
+ "and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned when
+ "sub $8, %rsp\n" // entering the callee
+ "call main\n" // main() returns the status code, we'll exit with it.
+ "movzb %al, %rdi\n" // retrieve exit code from 8 lower bits
+ "mov $60, %rax\n" // NR_exit == 60
+ "syscall\n" // really exit
+ "hlt\n" // ensure it does not return
+ "");
+
+/* fcntl / open */
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+#define O_CREAT 0x40
+#define O_EXCL 0x80
+#define O_NOCTTY 0x100
+#define O_TRUNC 0x200
+#define O_APPEND 0x400
+#define O_NONBLOCK 0x800
+#define O_DIRECTORY 0x10000
+
+/* The struct returned by the stat() syscall, equivalent to stat64(). The
+ * syscall returns 116 bytes and stops in the middle of __unused.
+ */
+struct sys_stat_struct {
+ unsigned long st_dev;
+ unsigned long st_ino;
+ unsigned long st_nlink;
+ unsigned int st_mode;
+ unsigned int st_uid;
+
+ unsigned int st_gid;
+ unsigned int __pad0;
+ unsigned long st_rdev;
+ long st_size;
+ long st_blksize;
+
+ long st_blocks;
+ unsigned long st_atime;
+ unsigned long st_atime_nsec;
+ unsigned long st_mtime;
+
+ unsigned long st_mtime_nsec;
+ unsigned long st_ctime;
+ unsigned long st_ctime_nsec;
+ long __unused[3];
+};
+
+#elif defined(__i386__) || defined(__i486__) || defined(__i586__) || defined(__i686__)
+/* Syscalls for i386 :
+ * - mostly similar to x86_64
+ * - registers are 32-bit
+ * - syscall number is passed in eax
+ * - arguments are in ebx, ecx, edx, esi, edi, ebp respectively
+ * - all registers are preserved (except eax of course)
+ * - the system call is performed by calling int $0x80
+ * - syscall return comes in eax
+ * - the arguments are cast to long and assigned into the target registers
+ * which are then simply passed as registers to the asm code, so that we
+ * don't have to experience issues with register constraints.
+ * - the syscall number is always specified last in order to allow to force
+ * some registers before (gcc refuses a %-register at the last position).
+ *
+ * Also, i386 supports the old_select syscall if newselect is not available
+ */
+#define __ARCH_WANT_SYS_OLD_SELECT
+
+#define my_syscall0(num) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = (num); \
+ \
+ asm volatile ( \
+ "int $0x80\n" \
+ : "=a" (_ret) \
+ : "0"(_num) \
+ : "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall1(num, arg1) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = (num); \
+ register long _arg1 asm("ebx") = (long)(arg1); \
+ \
+ asm volatile ( \
+ "int $0x80\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), \
+ "0"(_num) \
+ : "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = (num); \
+ register long _arg1 asm("ebx") = (long)(arg1); \
+ register long _arg2 asm("ecx") = (long)(arg2); \
+ \
+ asm volatile ( \
+ "int $0x80\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), "r"(_arg2), \
+ "0"(_num) \
+ : "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall3(num, arg1, arg2, arg3) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = (num); \
+ register long _arg1 asm("ebx") = (long)(arg1); \
+ register long _arg2 asm("ecx") = (long)(arg2); \
+ register long _arg3 asm("edx") = (long)(arg3); \
+ \
+ asm volatile ( \
+ "int $0x80\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), \
+ "0"(_num) \
+ : "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall4(num, arg1, arg2, arg3, arg4) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = (num); \
+ register long _arg1 asm("ebx") = (long)(arg1); \
+ register long _arg2 asm("ecx") = (long)(arg2); \
+ register long _arg3 asm("edx") = (long)(arg3); \
+ register long _arg4 asm("esi") = (long)(arg4); \
+ \
+ asm volatile ( \
+ "int $0x80\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
+ "0"(_num) \
+ : "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = (num); \
+ register long _arg1 asm("ebx") = (long)(arg1); \
+ register long _arg2 asm("ecx") = (long)(arg2); \
+ register long _arg3 asm("edx") = (long)(arg3); \
+ register long _arg4 asm("esi") = (long)(arg4); \
+ register long _arg5 asm("edi") = (long)(arg5); \
+ \
+ asm volatile ( \
+ "int $0x80\n" \
+ : "=a" (_ret) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+ "0"(_num) \
+ : "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+/* startup code */
+asm(".section .text\n"
+ ".global _start\n"
+ "_start:\n"
+ "pop %eax\n" // argc (first arg, %eax)
+ "mov %esp, %ebx\n" // argv[] (second arg, %ebx)
+ "lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
+ "and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned when
+ "push %ecx\n" // push all registers on the stack so that we
+ "push %ebx\n" // support both regparm and plain stack modes
+ "push %eax\n"
+ "call main\n" // main() returns the status code in %eax
+ "movzbl %al, %ebx\n" // retrieve exit code from lower 8 bits
+ "movl $1, %eax\n" // NR_exit == 1
+ "int $0x80\n" // exit now
+ "hlt\n" // ensure it does not
+ "");
+
+/* fcntl / open */
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+#define O_CREAT 0x40
+#define O_EXCL 0x80
+#define O_NOCTTY 0x100
+#define O_TRUNC 0x200
+#define O_APPEND 0x400
+#define O_NONBLOCK 0x800
+#define O_DIRECTORY 0x10000
+
+/* The struct returned by the stat() syscall, 32-bit only, the syscall returns
+ * exactly 56 bytes (stops before the unused array).
+ */
+struct sys_stat_struct {
+ unsigned long st_dev;
+ unsigned long st_ino;
+ unsigned short st_mode;
+ unsigned short st_nlink;
+ unsigned short st_uid;
+ unsigned short st_gid;
+
+ unsigned long st_rdev;
+ unsigned long st_size;
+ unsigned long st_blksize;
+ unsigned long st_blocks;
+
+ unsigned long st_atime;
+ unsigned long st_atime_nsec;
+ unsigned long st_mtime;
+ unsigned long st_mtime_nsec;
+
+ unsigned long st_ctime;
+ unsigned long st_ctime_nsec;
+ unsigned long __unused[2];
+};
+
+#elif defined(__ARM_EABI__)
+/* Syscalls for ARM in ARM or Thumb modes :
+ * - registers are 32-bit
+ * - stack is 8-byte aligned
+ * ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4127.html)
+ * - syscall number is passed in r7
+ * - arguments are in r0, r1, r2, r3, r4, r5
+ * - the system call is performed by calling svc #0
+ * - syscall return comes in r0.
+ * - only lr is clobbered.
+ * - the arguments are cast to long and assigned into the target registers
+ * which are then simply passed as registers to the asm code, so that we
+ * don't have to experience issues with register constraints.
+ * - the syscall number is always specified last in order to allow to force
+ * some registers before (gcc refuses a %-register at the last position).
+ *
+ * Also, ARM supports the old_select syscall if newselect is not available
+ */
+#define __ARCH_WANT_SYS_OLD_SELECT
+
+#define my_syscall0(num) \
+({ \
+ register long _num asm("r7") = (num); \
+ register long _arg1 asm("r0"); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_num) \
+ : "memory", "cc", "lr" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall1(num, arg1) \
+({ \
+ register long _num asm("r7") = (num); \
+ register long _arg1 asm("r0") = (long)(arg1); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), \
+ "r"(_num) \
+ : "memory", "cc", "lr" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num asm("r7") = (num); \
+ register long _arg1 asm("r0") = (long)(arg1); \
+ register long _arg2 asm("r1") = (long)(arg2); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_num) \
+ : "memory", "cc", "lr" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall3(num, arg1, arg2, arg3) \
+({ \
+ register long _num asm("r7") = (num); \
+ register long _arg1 asm("r0") = (long)(arg1); \
+ register long _arg2 asm("r1") = (long)(arg2); \
+ register long _arg3 asm("r2") = (long)(arg3); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), \
+ "r"(_num) \
+ : "memory", "cc", "lr" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall4(num, arg1, arg2, arg3, arg4) \
+({ \
+ register long _num asm("r7") = (num); \
+ register long _arg1 asm("r0") = (long)(arg1); \
+ register long _arg2 asm("r1") = (long)(arg2); \
+ register long _arg3 asm("r2") = (long)(arg3); \
+ register long _arg4 asm("r3") = (long)(arg4); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
+ "r"(_num) \
+ : "memory", "cc", "lr" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
+({ \
+ register long _num asm("r7") = (num); \
+ register long _arg1 asm("r0") = (long)(arg1); \
+ register long _arg2 asm("r1") = (long)(arg2); \
+ register long _arg3 asm("r2") = (long)(arg3); \
+ register long _arg4 asm("r3") = (long)(arg4); \
+ register long _arg5 asm("r4") = (long)(arg5); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r" (_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+ "r"(_num) \
+ : "memory", "cc", "lr" \
+ ); \
+ _arg1; \
+})
+
+/* startup code */
+asm(".section .text\n"
+ ".global _start\n"
+ "_start:\n"
+#if defined(__THUMBEB__) || defined(__THUMBEL__)
+ /* We enter here in 32-bit mode but if some previous functions were in
+ * 16-bit mode, the assembler cannot know, so we need to tell it we're in
+ * 32-bit now, then switch to 16-bit (is there a better way to do it than
+ * adding 1 by hand ?) and tell the asm we're now in 16-bit mode so that
+ * it generates correct instructions. Note that we do not support thumb1.
+ */
+ ".code 32\n"
+ "add r0, pc, #1\n"
+ "bx r0\n"
+ ".code 16\n"
+#endif
+ "pop {%r0}\n" // argc was in the stack
+ "mov %r1, %sp\n" // argv = sp
+ "add %r2, %r1, %r0, lsl #2\n" // envp = argv + 4*argc ...
+ "add %r2, %r2, $4\n" // ... + 4
+ "and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the
+ "mov %sp, %r3\n" // callee, an bl doesn't push (lr=pc)
+ "bl main\n" // main() returns the status code, we'll exit with it.
+ "and %r0, %r0, $0xff\n" // limit exit code to 8 bits
+ "movs r7, $1\n" // NR_exit == 1
+ "svc $0x00\n"
+ "");
+
+/* fcntl / open */
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+#define O_CREAT 0x40
+#define O_EXCL 0x80
+#define O_NOCTTY 0x100
+#define O_TRUNC 0x200
+#define O_APPEND 0x400
+#define O_NONBLOCK 0x800
+#define O_DIRECTORY 0x4000
+
+/* The struct returned by the stat() syscall, 32-bit only, the syscall returns
+ * exactly 56 bytes (stops before the unused array). In big endian, the format
+ * differs as devices are returned as short only.
+ */
+struct sys_stat_struct {
+#if defined(__ARMEB__)
+ unsigned short st_dev;
+ unsigned short __pad1;
+#else
+ unsigned long st_dev;
+#endif
+ unsigned long st_ino;
+ unsigned short st_mode;
+ unsigned short st_nlink;
+ unsigned short st_uid;
+ unsigned short st_gid;
+#if defined(__ARMEB__)
+ unsigned short st_rdev;
+ unsigned short __pad2;
+#else
+ unsigned long st_rdev;
+#endif
+ unsigned long st_size;
+ unsigned long st_blksize;
+ unsigned long st_blocks;
+ unsigned long st_atime;
+ unsigned long st_atime_nsec;
+ unsigned long st_mtime;
+ unsigned long st_mtime_nsec;
+ unsigned long st_ctime;
+ unsigned long st_ctime_nsec;
+ unsigned long __unused[2];
+};
+
+#elif defined(__aarch64__)
+/* Syscalls for AARCH64 :
+ * - registers are 64-bit
+ * - stack is 16-byte aligned
+ * - syscall number is passed in x8
+ * - arguments are in x0, x1, x2, x3, x4, x5
+ * - the system call is performed by calling svc 0
+ * - syscall return comes in x0.
+ * - the arguments are cast to long and assigned into the target registers
+ * which are then simply passed as registers to the asm code, so that we
+ * don't have to experience issues with register constraints.
+ *
+ * On aarch64, select() is not implemented so we have to use pselect6().
+ */
+#define __ARCH_WANT_SYS_PSELECT6
+
+#define my_syscall0(num) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0"); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall1(num, arg1) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0") = (long)(arg1); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), \
+ "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0") = (long)(arg1); \
+ register long _arg2 asm("x1") = (long)(arg2); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall3(num, arg1, arg2, arg3) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0") = (long)(arg1); \
+ register long _arg2 asm("x1") = (long)(arg2); \
+ register long _arg3 asm("x2") = (long)(arg3); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), \
+ "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall4(num, arg1, arg2, arg3, arg4) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0") = (long)(arg1); \
+ register long _arg2 asm("x1") = (long)(arg2); \
+ register long _arg3 asm("x2") = (long)(arg3); \
+ register long _arg4 asm("x3") = (long)(arg4); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
+ "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0") = (long)(arg1); \
+ register long _arg2 asm("x1") = (long)(arg2); \
+ register long _arg3 asm("x2") = (long)(arg3); \
+ register long _arg4 asm("x3") = (long)(arg4); \
+ register long _arg5 asm("x4") = (long)(arg5); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r" (_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+ "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \
+({ \
+ register long _num asm("x8") = (num); \
+ register long _arg1 asm("x0") = (long)(arg1); \
+ register long _arg2 asm("x1") = (long)(arg2); \
+ register long _arg3 asm("x2") = (long)(arg3); \
+ register long _arg4 asm("x3") = (long)(arg4); \
+ register long _arg5 asm("x4") = (long)(arg5); \
+ register long _arg6 asm("x5") = (long)(arg6); \
+ \
+ asm volatile ( \
+ "svc #0\n" \
+ : "=r" (_arg1) \
+ : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+ "r"(_arg6), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+/* startup code */
+asm(".section .text\n"
+ ".global _start\n"
+ "_start:\n"
+ "ldr x0, [sp]\n" // argc (x0) was in the stack
+ "add x1, sp, 8\n" // argv (x1) = sp
+ "lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
+ "add x2, x2, 8\n" // + 8 (skip null)
+ "add x2, x2, x1\n" // + argv
+ "and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
+ "bl main\n" // main() returns the status code, we'll exit with it.
+ "and x0, x0, 0xff\n" // limit exit code to 8 bits
+ "mov x8, 93\n" // NR_exit == 93
+ "svc #0\n"
+ "");
+
+/* fcntl / open */
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+#define O_CREAT 0x40
+#define O_EXCL 0x80
+#define O_NOCTTY 0x100
+#define O_TRUNC 0x200
+#define O_APPEND 0x400
+#define O_NONBLOCK 0x800
+#define O_DIRECTORY 0x4000
+
+/* The struct returned by the newfstatat() syscall. Differs slightly from the
+ * x86_64's stat one by field ordering, so be careful.
+ */
+struct sys_stat_struct {
+ unsigned long st_dev;
+ unsigned long st_ino;
+ unsigned int st_mode;
+ unsigned int st_nlink;
+ unsigned int st_uid;
+ unsigned int st_gid;
+
+ unsigned long st_rdev;
+ unsigned long __pad1;
+ long st_size;
+ int st_blksize;
+ int __pad2;
+
+ long st_blocks;
+ long st_atime;
+ unsigned long st_atime_nsec;
+ long st_mtime;
+
+ unsigned long st_mtime_nsec;
+ long st_ctime;
+ unsigned long st_ctime_nsec;
+ unsigned int __unused[2];
+};
+
+#elif defined(__mips__) && defined(_ABIO32)
+/* Syscalls for MIPS ABI O32 :
+ * - WARNING! there's always a delayed slot!
+ * - WARNING again, the syntax is different, registers take a '$' and numbers
+ * do not.
+ * - registers are 32-bit
+ * - stack is 8-byte aligned
+ * - syscall number is passed in v0 (starts at 0xfa0).
+ * - arguments are in a0, a1, a2, a3, then the stack. The caller needs to
+ * leave some room in the stack for the callee to save a0..a3 if needed.
+ * - Many registers are clobbered, in fact only a0..a2 and s0..s8 are
+ * preserved. See: https://www.linux-mips.org/wiki/Syscall as well as
+ * scall32-o32.S in the kernel sources.
+ * - the system call is performed by calling "syscall"
+ * - syscall return comes in v0, and register a3 needs to be checked to know
+ * if an error occured, in which case errno is in v0.
+ * - the arguments are cast to long and assigned into the target registers
+ * which are then simply passed as registers to the asm code, so that we
+ * don't have to experience issues with register constraints.
+ */
+
+#define my_syscall0(num) \
+({ \
+ register long _num asm("v0") = (num); \
+ register long _arg4 asm("a3"); \
+ \
+ asm volatile ( \
+ "addiu $sp, $sp, -32\n" \
+ "syscall\n" \
+ "addiu $sp, $sp, 32\n" \
+ : "=r"(_num), "=r"(_arg4) \
+ : "r"(_num) \
+ : "memory", "cc", "at", "v1", "hi", "lo", \
+ \
+ ); \
+ _arg4 ? -_num : _num; \
+})
+
+#define my_syscall1(num, arg1) \
+({ \
+ register long _num asm("v0") = (num); \
+ register long _arg1 asm("a0") = (long)(arg1); \
+ register long _arg4 asm("a3"); \
+ \
+ asm volatile ( \
+ "addiu $sp, $sp, -32\n" \
+ "syscall\n" \
+ "addiu $sp, $sp, 32\n" \
+ : "=r"(_num), "=r"(_arg4) \
+ : "0"(_num), \
+ "r"(_arg1) \
+ : "memory", "cc", "at", "v1", "hi", "lo", \
+ \
+ ); \
+ _arg4 ? -_num : _num; \
+})
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num asm("v0") = (num); \
+ register long _arg1 asm("a0") = (long)(arg1); \
+ register long _arg2 asm("a1") = (long)(arg2); \
+ register long _arg4 asm("a3"); \
+ \
+ asm volatile ( \
+ "addiu $sp, $sp, -32\n" \
+ "syscall\n" \
+ "addiu $sp, $sp, 32\n" \
+ : "=r"(_num), "=r"(_arg4) \
+ : "0"(_num), \
+ "r"(_arg1), "r"(_arg2) \
+ : "memory", "cc", "at", "v1", "hi", "lo", \
+ \
+ ); \
+ _arg4 ? -_num : _num; \
+})
+
+#define my_syscall3(num, arg1, arg2, arg3) \
+({ \
+ register long _num asm("v0") = (num); \
+ register long _arg1 asm("a0") = (long)(arg1); \
+ register long _arg2 asm("a1") = (long)(arg2); \
+ register long _arg3 asm("a2") = (long)(arg3); \
+ register long _arg4 asm("a3"); \
+ \
+ asm volatile ( \
+ "addiu $sp, $sp, -32\n" \
+ "syscall\n" \
+ "addiu $sp, $sp, 32\n" \
+ : "=r"(_num), "=r"(_arg4) \
+ : "0"(_num), \
+ "r"(_arg1), "r"(_arg2), "r"(_arg3) \
+ : "memory", "cc", "at", "v1", "hi", "lo", \
+ \
+ ); \
+ _arg4 ? -_num : _num; \
+})
+
+#define my_syscall4(num, arg1, arg2, arg3, arg4) \
+({ \
+ register long _num asm("v0") = (num); \
+ register long _arg1 asm("a0") = (long)(arg1); \
+ register long _arg2 asm("a1") = (long)(arg2); \
+ register long _arg3 asm("a2") = (long)(arg3); \
+ register long _arg4 asm("a3") = (long)(arg4); \
+ \
+ asm volatile ( \
+ "addiu $sp, $sp, -32\n" \
+ "syscall\n" \
+ "addiu $sp, $sp, 32\n" \
+ : "=r" (_num), "=r"(_arg4) \
+ : "0"(_num), \
+ "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4) \
+ : "memory", "cc", "at", "v1", "hi", "lo", \
+ \
+ ); \
+ _arg4 ? -_num : _num; \
+})
+
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
+({ \
+ register long _num asm("v0") = (num); \
+ register long _arg1 asm("a0") = (long)(arg1); \
+ register long _arg2 asm("a1") = (long)(arg2); \
+ register long _arg3 asm("a2") = (long)(arg3); \
+ register long _arg4 asm("a3") = (long)(arg4); \
+ register long _arg5 = (long)(arg5); \
+ \
+ asm volatile ( \
+ "addiu $sp, $sp, -32\n" \
+ "sw %7, 16($sp)\n" \
+ "syscall\n " \
+ "addiu $sp, $sp, 32\n" \
+ : "=r" (_num), "=r"(_arg4) \
+ : "0"(_num), \
+ "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5) \
+ : "memory", "cc", "at", "v1", "hi", "lo", \
+ \
+ ); \
+ _arg4 ? -_num : _num; \
+})
+
+/* startup code, note that it's called __start on MIPS */
+asm(".section .text\n"
+ ".set nomips16\n"
+ ".global __start\n"
+ ".set noreorder\n"
+ ".option pic0\n"
+ ".ent __start\n"
+ "__start:\n"
+ "lw $a0,($sp)\n" // argc was in the stack
+ "addiu $a1, $sp, 4\n" // argv = sp + 4
+ "sll $a2, $a0, 2\n" // a2 = argc * 4
+ "add $a2, $a2, $a1\n" // envp = argv + 4*argc ...
+ "addiu $a2, $a2, 4\n" // ... + 4
+ "li $t0, -8\n"
+ "and $sp, $sp, $t0\n" // sp must be 8-byte aligned
+ "addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
+ "jal main\n" // main() returns the status code, we'll exit with it.
+ "nop\n" // delayed slot
+ "and $a0, $v0, 0xff\n" // limit exit code to 8 bits
+ "li $v0, 4001\n" // NR_exit == 4001
+ "syscall\n"
+ ".end __start\n"
+ "");
+
+/* fcntl / open */
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+#define O_APPEND 0x0008
+#define O_NONBLOCK 0x0080
+#define O_CREAT 0x0100
+#define O_TRUNC 0x0200
+#define O_EXCL 0x0400
+#define O_NOCTTY 0x0800
+#define O_DIRECTORY 0x10000
+
+/* The struct returned by the stat() syscall. 88 bytes are returned by the
+ * syscall.
+ */
+struct sys_stat_struct {
+ unsigned int st_dev;
+ long st_pad1[3];
+ unsigned long st_ino;
+ unsigned int st_mode;
+ unsigned int st_nlink;
+ unsigned int st_uid;
+ unsigned int st_gid;
+ unsigned int st_rdev;
+ long st_pad2[2];
+ long st_size;
+ long st_pad3;
+ long st_atime;
+ long st_atime_nsec;
+ long st_mtime;
+ long st_mtime_nsec;
+ long st_ctime;
+ long st_ctime_nsec;
+ long st_blksize;
+ long st_blocks;
+ long st_pad4[14];
+};
+
+#endif
+
+
+/* Below are the C functions used to declare the raw syscalls. They try to be
+ * architecture-agnostic, and return either a success or -errno. Declaring them
+ * static will lead to them being inlined in most cases, but it's still possible
+ * to reference them by a pointer if needed.
+ */
+static __attribute__((unused))
+void *sys_brk(void *addr)
+{
+ return (void *)my_syscall1(__NR_brk, addr);
+}
+
+static __attribute__((noreturn,unused))
+void sys_exit(int status)
+{
+ my_syscall1(__NR_exit, status & 255);
+ while(1); // shut the "noreturn" warnings.
+}
+
+static __attribute__((unused))
+int sys_chdir(const char *path)
+{
+ return my_syscall1(__NR_chdir, path);
+}
+
+static __attribute__((unused))
+int sys_chmod(const char *path, mode_t mode)
+{
+#ifdef __NR_fchmodat
+ return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
+#else
+ return my_syscall2(__NR_chmod, path, mode);
+#endif
+}
+
+static __attribute__((unused))
+int sys_chown(const char *path, uid_t owner, gid_t group)
+{
+#ifdef __NR_fchownat
+ return my_syscall5(__NR_fchownat, AT_FDCWD, path, owner, group, 0);
+#else
+ return my_syscall3(__NR_chown, path, owner, group);
+#endif
+}
+
+static __attribute__((unused))
+int sys_chroot(const char *path)
+{
+ return my_syscall1(__NR_chroot, path);
+}
+
+static __attribute__((unused))
+int sys_close(int fd)
+{
+ return my_syscall1(__NR_close, fd);
+}
+
+static __attribute__((unused))
+int sys_dup(int fd)
+{
+ return my_syscall1(__NR_dup, fd);
+}
+
+static __attribute__((unused))
+int sys_dup2(int old, int new)
+{
+ return my_syscall2(__NR_dup2, old, new);
+}
+
+static __attribute__((unused))
+int sys_execve(const char *filename, char *const argv[], char *const envp[])
+{
+ return my_syscall3(__NR_execve, filename, argv, envp);
+}
+
+static __attribute__((unused))
+pid_t sys_fork(void)
+{
+ return my_syscall0(__NR_fork);
+}
+
+static __attribute__((unused))
+int sys_fsync(int fd)
+{
+ return my_syscall1(__NR_fsync, fd);
+}
+
+static __attribute__((unused))
+int sys_getdents64(int fd, struct linux_dirent64 *dirp, int count)
+{
+ return my_syscall3(__NR_getdents64, fd, dirp, count);
+}
+
+static __attribute__((unused))
+pid_t sys_getpgrp(void)
+{
+ return my_syscall0(__NR_getpgrp);
+}
+
+static __attribute__((unused))
+pid_t sys_getpid(void)
+{
+ return my_syscall0(__NR_getpid);
+}
+
+static __attribute__((unused))
+int sys_gettimeofday(struct timeval *tv, struct timezone *tz)
+{
+ return my_syscall2(__NR_gettimeofday, tv, tz);
+}
+
+static __attribute__((unused))
+int sys_ioctl(int fd, unsigned long req, void *value)
+{
+ return my_syscall3(__NR_ioctl, fd, req, value);
+}
+
+static __attribute__((unused))
+int sys_kill(pid_t pid, int signal)
+{
+ return my_syscall2(__NR_kill, pid, signal);
+}
+
+static __attribute__((unused))
+int sys_link(const char *old, const char *new)
+{
+#ifdef __NR_linkat
+ return my_syscall5(__NR_linkat, AT_FDCWD, old, AT_FDCWD, new, 0);
+#else
+ return my_syscall2(__NR_link, old, new);
+#endif
+}
+
+static __attribute__((unused))
+off_t sys_lseek(int fd, off_t offset, int whence)
+{
+ return my_syscall3(__NR_lseek, fd, offset, whence);
+}
+
+static __attribute__((unused))
+int sys_mkdir(const char *path, mode_t mode)
+{
+#ifdef __NR_mkdirat
+ return my_syscall3(__NR_mkdirat, AT_FDCWD, path, mode);
+#else
+ return my_syscall2(__NR_mkdir, path, mode);
+#endif
+}
+
+static __attribute__((unused))
+long sys_mknod(const char *path, mode_t mode, dev_t dev)
+{
+#ifdef __NR_mknodat
+ return my_syscall4(__NR_mknodat, AT_FDCWD, path, mode, dev);
+#else
+ return my_syscall3(__NR_mknod, path, mode, dev);
+#endif
+}
+
+static __attribute__((unused))
+int sys_mount(const char *src, const char *tgt, const char *fst,
+ unsigned long flags, const void *data)
+{
+ return my_syscall5(__NR_mount, src, tgt, fst, flags, data);
+}
+
+static __attribute__((unused))
+int sys_open(const char *path, int flags, mode_t mode)
+{
+#ifdef __NR_openat
+ return my_syscall4(__NR_openat, AT_FDCWD, path, flags, mode);
+#else
+ return my_syscall3(__NR_open, path, flags, mode);
+#endif
+}
+
+static __attribute__((unused))
+int sys_pivot_root(const char *new, const char *old)
+{
+ return my_syscall2(__NR_pivot_root, new, old);
+}
+
+static __attribute__((unused))
+int sys_poll(struct pollfd *fds, int nfds, int timeout)
+{
+ return my_syscall3(__NR_poll, fds, nfds, timeout);
+}
+
+static __attribute__((unused))
+ssize_t sys_read(int fd, void *buf, size_t count)
+{
+ return my_syscall3(__NR_read, fd, buf, count);
+}
+
+static __attribute__((unused))
+ssize_t sys_reboot(int magic1, int magic2, int cmd, void *arg)
+{
+ return my_syscall4(__NR_reboot, magic1, magic2, cmd, arg);
+}
+
+static __attribute__((unused))
+int sys_sched_yield(void)
+{
+ return my_syscall0(__NR_sched_yield);
+}
+
+static __attribute__((unused))
+int sys_select(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds, struct timeval *timeout)
+{
+#if defined(__ARCH_WANT_SYS_OLD_SELECT) && !defined(__NR__newselect)
+ struct sel_arg_struct {
+ unsigned long n;
+ fd_set *r, *w, *e;
+ struct timeval *t;
+ } arg = { .n = nfds, .r = rfds, .w = wfds, .e = efds, .t = timeout };
+ return my_syscall1(__NR_select, &arg);
+#elif defined(__ARCH_WANT_SYS_PSELECT6) && defined(__NR_pselect6)
+ struct timespec t;
+
+ if (timeout) {
+ t.tv_sec = timeout->tv_sec;
+ t.tv_nsec = timeout->tv_usec * 1000;
+ }
+ return my_syscall6(__NR_pselect6, nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL);
+#else
+#ifndef __NR__newselect
+#define __NR__newselect __NR_select
+#endif
+ return my_syscall5(__NR__newselect, nfds, rfds, wfds, efds, timeout);
+#endif
+}
+
+static __attribute__((unused))
+int sys_setpgid(pid_t pid, pid_t pgid)
+{
+ return my_syscall2(__NR_setpgid, pid, pgid);
+}
+
+static __attribute__((unused))
+pid_t sys_setsid(void)
+{
+ return my_syscall0(__NR_setsid);
+}
+
+static __attribute__((unused))
+int sys_stat(const char *path, struct stat *buf)
+{
+ struct sys_stat_struct stat;
+ long ret;
+
+#ifdef __NR_newfstatat
+ /* only solution for arm64 */
+ ret = my_syscall4(__NR_newfstatat, AT_FDCWD, path, &stat, 0);
+#else
+ ret = my_syscall2(__NR_stat, path, &stat);
+#endif
+ buf->st_dev = stat.st_dev;
+ buf->st_ino = stat.st_ino;
+ buf->st_mode = stat.st_mode;
+ buf->st_nlink = stat.st_nlink;
+ buf->st_uid = stat.st_uid;
+ buf->st_gid = stat.st_gid;
+ buf->st_rdev = stat.st_rdev;
+ buf->st_size = stat.st_size;
+ buf->st_blksize = stat.st_blksize;
+ buf->st_blocks = stat.st_blocks;
+ buf->st_atime = stat.st_atime;
+ buf->st_mtime = stat.st_mtime;
+ buf->st_ctime = stat.st_ctime;
+ return ret;
+}
+
+
+static __attribute__((unused))
+int sys_symlink(const char *old, const char *new)
+{
+#ifdef __NR_symlinkat
+ return my_syscall3(__NR_symlinkat, old, AT_FDCWD, new);
+#else
+ return my_syscall2(__NR_symlink, old, new);
+#endif
+}
+
+static __attribute__((unused))
+mode_t sys_umask(mode_t mode)
+{
+ return my_syscall1(__NR_umask, mode);
+}
+
+static __attribute__((unused))
+int sys_umount2(const char *path, int flags)
+{
+ return my_syscall2(__NR_umount2, path, flags);
+}
+
+static __attribute__((unused))
+int sys_unlink(const char *path)
+{
+#ifdef __NR_unlinkat
+ return my_syscall3(__NR_unlinkat, AT_FDCWD, path, 0);
+#else
+ return my_syscall1(__NR_unlink, path);
+#endif
+}
+
+static __attribute__((unused))
+pid_t sys_wait4(pid_t pid, int *status, int options, struct rusage *rusage)
+{
+ return my_syscall4(__NR_wait4, pid, status, options, rusage);
+}
+
+static __attribute__((unused))
+pid_t sys_waitpid(pid_t pid, int *status, int options)
+{
+ return sys_wait4(pid, status, options, 0);
+}
+
+static __attribute__((unused))
+pid_t sys_wait(int *status)
+{
+ return sys_waitpid(-1, status, 0);
+}
+
+static __attribute__((unused))
+ssize_t sys_write(int fd, const void *buf, size_t count)
+{
+ return my_syscall3(__NR_write, fd, buf, count);
+}
+
+
+/* Below are the libc-compatible syscalls which return x or -1 and set errno.
+ * They rely on the functions above. Similarly they're marked static so that it
+ * is possible to assign pointers to them if needed.
+ */
+
+static __attribute__((unused))
+int brk(void *addr)
+{
+ void *ret = sys_brk(addr);
+
+ if (!ret) {
+ SET_ERRNO(ENOMEM);
+ return -1;
+ }
+ return 0;
+}
+
+static __attribute__((noreturn,unused))
+void exit(int status)
+{
+ sys_exit(status);
+}
+
+static __attribute__((unused))
+int chdir(const char *path)
+{
+ int ret = sys_chdir(path);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int chmod(const char *path, mode_t mode)
+{
+ int ret = sys_chmod(path, mode);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int chown(const char *path, uid_t owner, gid_t group)
+{
+ int ret = sys_chown(path, owner, group);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int chroot(const char *path)
+{
+ int ret = sys_chroot(path);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int close(int fd)
+{
+ int ret = sys_close(fd);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int dup2(int old, int new)
+{
+ int ret = sys_dup2(old, new);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int execve(const char *filename, char *const argv[], char *const envp[])
+{
+ int ret = sys_execve(filename, argv, envp);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t fork(void)
+{
+ pid_t ret = sys_fork();
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int fsync(int fd)
+{
+ int ret = sys_fsync(fd);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int getdents64(int fd, struct linux_dirent64 *dirp, int count)
+{
+ int ret = sys_getdents64(fd, dirp, count);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t getpgrp(void)
+{
+ pid_t ret = sys_getpgrp();
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t getpid(void)
+{
+ pid_t ret = sys_getpid();
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int gettimeofday(struct timeval *tv, struct timezone *tz)
+{
+ int ret = sys_gettimeofday(tv, tz);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int ioctl(int fd, unsigned long req, void *value)
+{
+ int ret = sys_ioctl(fd, req, value);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int kill(pid_t pid, int signal)
+{
+ int ret = sys_kill(pid, signal);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int link(const char *old, const char *new)
+{
+ int ret = sys_link(old, new);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+off_t lseek(int fd, off_t offset, int whence)
+{
+ off_t ret = sys_lseek(fd, offset, whence);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int mkdir(const char *path, mode_t mode)
+{
+ int ret = sys_mkdir(path, mode);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int mknod(const char *path, mode_t mode, dev_t dev)
+{
+ int ret = sys_mknod(path, mode, dev);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int mount(const char *src, const char *tgt,
+ const char *fst, unsigned long flags,
+ const void *data)
+{
+ int ret = sys_mount(src, tgt, fst, flags, data);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int open(const char *path, int flags, mode_t mode)
+{
+ int ret = sys_open(path, flags, mode);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int pivot_root(const char *new, const char *old)
+{
+ int ret = sys_pivot_root(new, old);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int poll(struct pollfd *fds, int nfds, int timeout)
+{
+ int ret = sys_poll(fds, nfds, timeout);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+ssize_t read(int fd, void *buf, size_t count)
+{
+ ssize_t ret = sys_read(fd, buf, count);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int reboot(int cmd)
+{
+ int ret = sys_reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, cmd, 0);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+void *sbrk(intptr_t inc)
+{
+ void *ret;
+
+ /* first call to find current end */
+ if ((ret = sys_brk(0)) && (sys_brk(ret + inc) == ret + inc))
+ return ret + inc;
+
+ SET_ERRNO(ENOMEM);
+ return (void *)-1;
+}
+
+static __attribute__((unused))
+int sched_yield(void)
+{
+ int ret = sys_sched_yield();
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int select(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds, struct timeval *timeout)
+{
+ int ret = sys_select(nfds, rfds, wfds, efds, timeout);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int setpgid(pid_t pid, pid_t pgid)
+{
+ int ret = sys_setpgid(pid, pgid);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t setsid(void)
+{
+ pid_t ret = sys_setsid();
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+unsigned int sleep(unsigned int seconds)
+{
+ struct timeval my_timeval = { seconds, 0 };
+
+ if (sys_select(0, 0, 0, 0, &my_timeval) < 0)
+ return my_timeval.tv_sec + !!my_timeval.tv_usec;
+ else
+ return 0;
+}
+
+static __attribute__((unused))
+int stat(const char *path, struct stat *buf)
+{
+ int ret = sys_stat(path, buf);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int symlink(const char *old, const char *new)
+{
+ int ret = sys_symlink(old, new);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int tcsetpgrp(int fd, pid_t pid)
+{
+ return ioctl(fd, TIOCSPGRP, &pid);
+}
+
+static __attribute__((unused))
+mode_t umask(mode_t mode)
+{
+ return sys_umask(mode);
+}
+
+static __attribute__((unused))
+int umount2(const char *path, int flags)
+{
+ int ret = sys_umount2(path, flags);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+int unlink(const char *path)
+{
+ int ret = sys_unlink(path);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage)
+{
+ pid_t ret = sys_wait4(pid, status, options, rusage);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t waitpid(pid_t pid, int *status, int options)
+{
+ pid_t ret = sys_waitpid(pid, status, options);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+pid_t wait(int *status)
+{
+ pid_t ret = sys_wait(status);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+ssize_t write(int fd, const void *buf, size_t count)
+{
+ ssize_t ret = sys_write(fd, buf, count);
+
+ if (ret < 0) {
+ SET_ERRNO(-ret);
+ ret = -1;
+ }
+ return ret;
+}
+
+/* some size-optimized reimplementations of a few common str* and mem*
+ * functions. They're marked static, except memcpy() and raise() which are used
+ * by libgcc on ARM, so they are marked weak instead in order not to cause an
+ * error when building a program made of multiple files (not recommended).
+ */
+
+static __attribute__((unused))
+void *memmove(void *dst, const void *src, size_t len)
+{
+ ssize_t pos = (dst <= src) ? -1 : (long)len;
+ void *ret = dst;
+
+ while (len--) {
+ pos += (dst <= src) ? 1 : -1;
+ ((char *)dst)[pos] = ((char *)src)[pos];
+ }
+ return ret;
+}
+
+static __attribute__((unused))
+void *memset(void *dst, int b, size_t len)
+{
+ char *p = dst;
+
+ while (len--)
+ *(p++) = b;
+ return dst;
+}
+
+static __attribute__((unused))
+int memcmp(const void *s1, const void *s2, size_t n)
+{
+ size_t ofs = 0;
+ char c1 = 0;
+
+ while (ofs < n && !(c1 = ((char *)s1)[ofs] - ((char *)s2)[ofs])) {
+ ofs++;
+ }
+ return c1;
+}
+
+static __attribute__((unused))
+char *strcpy(char *dst, const char *src)
+{
+ char *ret = dst;
+
+ while ((*dst++ = *src++));
+ return ret;
+}
+
+static __attribute__((unused))
+char *strchr(const char *s, int c)
+{
+ while (*s) {
+ if (*s == (char)c)
+ return (char *)s;
+ s++;
+ }
+ return NULL;
+}
+
+static __attribute__((unused))
+char *strrchr(const char *s, int c)
+{
+ const char *ret = NULL;
+
+ while (*s) {
+ if (*s == (char)c)
+ ret = s;
+ s++;
+ }
+ return (char *)ret;
+}
+
+static __attribute__((unused))
+size_t nolibc_strlen(const char *str)
+{
+ size_t len;
+
+ for (len = 0; str[len]; len++);
+ return len;
+}
+
+#define strlen(str) ({ \
+ __builtin_constant_p((str)) ? \
+ __builtin_strlen((str)) : \
+ nolibc_strlen((str)); \
+})
+
+static __attribute__((unused))
+int isdigit(int c)
+{
+ return (unsigned int)(c - '0') <= 9;
+}
+
+static __attribute__((unused))
+long atol(const char *s)
+{
+ unsigned long ret = 0;
+ unsigned long d;
+ int neg = 0;
+
+ if (*s == '-') {
+ neg = 1;
+ s++;
+ }
+
+ while (1) {
+ d = (*s++) - '0';
+ if (d > 9)
+ break;
+ ret *= 10;
+ ret += d;
+ }
+
+ return neg ? -ret : ret;
+}
+
+static __attribute__((unused))
+int atoi(const char *s)
+{
+ return atol(s);
+}
+
+static __attribute__((unused))
+const char *ltoa(long in)
+{
+ /* large enough for -9223372036854775808 */
+ static char buffer[21];
+ char *pos = buffer + sizeof(buffer) - 1;
+ int neg = in < 0;
+ unsigned long n = neg ? -in : in;
+
+ *pos-- = '\0';
+ do {
+ *pos-- = '0' + n % 10;
+ n /= 10;
+ if (pos < buffer)
+ return pos + 1;
+ } while (n);
+
+ if (neg)
+ *pos-- = '-';
+ return pos + 1;
+}
+
+__attribute__((weak,unused))
+void *memcpy(void *dst, const void *src, size_t len)
+{
+ return memmove(dst, src, len);
+}
+
+/* needed by libgcc for divide by zero */
+__attribute__((weak,unused))
+int raise(int signal)
+{
+ return kill(getpid(), signal);
+}
+
+/* Here come a few helper functions */
+
+static __attribute__((unused))
+void FD_ZERO(fd_set *set)
+{
+ memset(set, 0, sizeof(*set));
+}
+
+static __attribute__((unused))
+void FD_SET(int fd, fd_set *set)
+{
+ if (fd < 0 || fd >= FD_SETSIZE)
+ return;
+ set->fd32[fd / 32] |= 1 << (fd & 31);
+}
+
+/* WARNING, it only deals with the 4096 first majors and 256 first minors */
+static __attribute__((unused))
+dev_t makedev(unsigned int major, unsigned int minor)
+{
+ return ((major & 0xfff) << 8) | (minor & 0xff);
+}
From b94ec36896dafc0a12106b1536fe87f99e9a0c5d Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Sun, 9 Sep 2018 13:33:02 +0200
Subject: [PATCH 021/112] rcutorture: Make use of nolibc when available
This reduces the size of the init executable from ~800 kB to ~800 bytes
on x86_64. This is only implemented for x86_64, i386, arm and arm64.
Others not tested.
Signed-off-by: Willy Tarreau
Signed-off-by: Paul E. McKenney
---
.../selftests/rcutorture/bin/mkinitrd.sh | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index 56a56ea06983dd..da298394daa244 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -82,8 +82,10 @@ cd $D
mkdir -p initrd
cd initrd
cat > init.c << '___EOF___'
+#ifndef NOLIBC
#include
#include
+#endif
volatile unsigned long delaycount;
@@ -113,7 +115,21 @@ int main(int argc, int argv[])
return 0;
}
___EOF___
-${CROSS_COMPILE}gcc -s -static -Os -o init init.c
+
+# build using nolibc on supported archs (smaller executable) and fall
+# back to regular glibc on other ones.
+if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
+ "||__ARM_EABI__||__aarch64__\nyes\n#endif" \
+ | ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
+ | grep -q '^yes'; then
+ # architecture supported by nolibc
+ ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
+ -nostdlib -include ../bin/nolibc.h -lgcc -s -static -Os \
+ -o init init.c
+else
+ ${CROSS_COMPILE}gcc -s -static -Os -o init init.c
+fi
+
rm init.c
echo "Done creating a statically linked C-language initrd"
From 868f7a09a4f385e5167fc0ff9ec4c3f817897f3a Mon Sep 17 00:00:00 2001
From: Lance Roy
Date: Thu, 4 Oct 2018 23:45:38 -0700
Subject: [PATCH 022/112] x86/PCI: Replace spin_is_locked() with lockdep
lockdep_assert_held() is better suited to checking locking requirements,
since it only checks if the current thread holds the lock regardless of
whether someone else does. This is also a step towards possibly removing
spin_is_locked().
Signed-off-by: Lance Roy
Cc: Bjorn Helgaas
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc:
Cc:
Signed-off-by: Paul E. McKenney
---
arch/x86/pci/i386.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 8cd66152cdb0e8..9df652d3d9275b 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -59,7 +59,7 @@ static struct pcibios_fwaddrmap *pcibios_fwaddrmap_lookup(struct pci_dev *dev)
{
struct pcibios_fwaddrmap *map;
- WARN_ON_SMP(!spin_is_locked(&pcibios_fwaddrmap_lock));
+ lockdep_assert_held(&pcibios_fwaddrmap_lock);
list_for_each_entry(map, &pcibios_fwaddrmappings, list)
if (map->dev == dev)
From f3e763c3e544b73ae5c4a3842cedb9ff6ca37715 Mon Sep 17 00:00:00 2001
From: Randy Dunlap
Date: Mon, 3 Sep 2018 12:45:45 -0700
Subject: [PATCH 023/112] srcu: Fix kernel-doc missing notation
Fix kernel-doc warnings for missing parameter descriptions:
../include/linux/srcu.h:175: warning: Function parameter or member 'p' not described in 'srcu_dereference_notrace'
../include/linux/srcu.h:175: warning: Function parameter or member 'sp' not described in 'srcu_dereference_notrace'
Fixes: 0b764a6e4e19d ("srcu: Add notrace variant of srcu_dereference")
Signed-off-by: Randy Dunlap
Cc: Lai Jiangshan
Cc: "Paul E. McKenney"
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Mathieu Desnoyers
Cc: Joel Fernandes (Google)
Signed-off-by: Paul E. McKenney
---
include/linux/srcu.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 67135d4a8a30e3..ebd5f1511690f1 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -171,6 +171,9 @@ static inline int srcu_read_lock_held(const struct srcu_struct *sp)
/**
* srcu_dereference_notrace - no tracing and no lockdep calls from here
+ * @p: the pointer to fetch and protect for later dereferencing
+ * @sp: pointer to the srcu_struct, which is used to check that we
+ * really are in an SRCU read-side critical section.
*/
#define srcu_dereference_notrace(p, sp) srcu_dereference_check((p), (sp), 1)
From 0607ba8403c4cdb253f8c5200ecf654dfb7790cc Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Wed, 25 Apr 2018 13:01:25 -0700
Subject: [PATCH 024/112] srcu: Prevent __call_srcu() counter wrap with
read-side critical section
Ever since cdf7abc4610a ("srcu: Allow use of Tiny/Tree SRCU from
both process and interrupt context"), it has been permissible
to use SRCU read-side critical sections in interrupt context.
This allows __call_srcu() to use SRCU read-side critical sections to
prevent a new SRCU grace period from ending before the call to either
srcu_funnel_gp_start() or srcu_funnel_exp_start completes, thus preventing
SRCU grace-period counter overflow during that time.
Note that this does not permit removal of the counter-wrap checks in
srcu_gp_end(). These check are necessary to handle the case where
a given CPU does not interact at all with SRCU for an extended time
period.
This commit therefore adds an SRCU read-side critical section to
__call_srcu() in order to prevent grace period counter wrap during
the funnel-locking process.
Signed-off-by: Paul E. McKenney
---
kernel/rcu/srcutree.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index a8846ed7f35298..60f3236beaf721 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -858,6 +858,7 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
rcu_callback_t func, bool do_norm)
{
unsigned long flags;
+ int idx;
bool needexp = false;
bool needgp = false;
unsigned long s;
@@ -871,6 +872,7 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
return;
}
rhp->func = func;
+ idx = srcu_read_lock(sp);
local_irq_save(flags);
sdp = this_cpu_ptr(sp->sda);
spin_lock_rcu_node(sdp);
@@ -892,6 +894,7 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
srcu_funnel_gp_start(sp, sdp, s, do_norm);
else if (needexp)
srcu_funnel_exp_start(sp, sdp->mynode, s);
+ srcu_read_unlock(sp, idx);
}
/**
From 9cac83a57e99e4692315b7a91a81bab787961d97 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Tue, 11 Sep 2018 08:57:48 -0700
Subject: [PATCH 025/112] rcu: Stop expedited grace periods from relying on
stop-machine
The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
to prevent the cpu_online_mask from changing. However, this relies on
the stop-machine mechanism in the CPU-hotplug offline code, which is not
desirable (it would be good to someday remove the stop-machine mechanism).
This commit therefore instead uses the relevant leaf rcu_node structure's
->ffmask, which has a bit set for all CPUs that are fully functional.
A given CPU's bit is cleared very early during offline processing by
rcutree_offline_cpu() and set very late during online processing by
rcutree_online_cpu(). Therefore, if a CPU's bit is set in this mask, and
preemption is disabled, we have to be before the synchronize_sched() in
the CPU-hotplug offline code, which means that the CPU is guaranteed to be
workqueue-ready throughout the duration of the enclosing preempt_disable()
region of code.
This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for
this leaf rcu_node structure are offline, which is an acceptable difference
in behavior.
Reported-by: Sebastian Andrzej Siewior
Signed-off-by: Paul E. McKenney
---
kernel/rcu/tree_exp.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8d18c1014e2beb..e669ccf3751b1b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -450,10 +450,12 @@ static void sync_rcu_exp_select_cpus(smp_call_func_t func)
}
INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
preempt_disable();
- cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
+ cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
/* If all offline, queue the work on an unbound CPU. */
- if (unlikely(cpu > rnp->grphi))
+ if (unlikely(cpu > rnp->grphi - rnp->grplo))
cpu = WORK_CPU_UNBOUND;
+ else
+ cpu += rnp->grplo;
queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
preempt_enable();
rnp->exp_need_flush = true;
From 9213784b48f8ba666b4695ca3f5d34f583daab83 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Mon, 22 Oct 2018 08:26:00 -0700
Subject: [PATCH 026/112] rcu: Eliminate BUG_ON() for kernel/rcu/tree_plugin.h
The tree_plugin.h file has a number of calls to BUG_ON(), which panics
the kernel, which is not a good strategy for devices (like embedded)
that don't have a way to capture console output. This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().
Reported-by: Linus Torvalds
Signed-off-by: Paul E. McKenney
[ paulmck: Fix typo: s/rcuo/rcub/. ]
---
kernel/rcu/tree_plugin.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 05915e53633657..adda608874a869 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1464,7 +1464,8 @@ static void __init rcu_spawn_boost_kthreads(void)
for_each_possible_cpu(cpu)
per_cpu(rcu_cpu_has_work, cpu) = 0;
- BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
+ if (WARN_ONCE(smpboot_register_percpu_thread(&rcu_cpu_thread_spec), "%s: Could not start rcub kthread, OOM is now expected behavior\n", __func__))
+ return;
rcu_for_each_leaf_node(rnp)
(void)rcu_spawn_one_boost_kthread(rnp);
}
@@ -2322,7 +2323,8 @@ static int rcu_nocb_kthread(void *arg)
tail = rdp->nocb_follower_tail;
rdp->nocb_follower_tail = &rdp->nocb_follower_head;
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
- BUG_ON(!list);
+ if (WARN_ON_ONCE(!list))
+ continue;
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeNonEmpty"));
/* Each pass through the following loop invokes a callback. */
@@ -2495,7 +2497,8 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
/* Spawn the kthread for this CPU. */
t = kthread_run(rcu_nocb_kthread, rdp_spawn,
"rcuo%c/%d", rcu_state.abbr, cpu);
- BUG_ON(IS_ERR(t));
+ if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo kthread, OOM is now expected behavior\n", __func__))
+ return;
WRITE_ONCE(rdp_spawn->nocb_kthread, t);
}
From f0ad56e876cdd67730065274625edbcfe0cca278 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Mon, 22 Oct 2018 08:33:06 -0700
Subject: [PATCH 027/112] rcu: Eliminate BUG_ON() for kernel/rcu/update.c
The update.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output. This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().
Reported-by: Linus Torvalds
Signed-off-by: Paul E. McKenney
---
kernel/rcu/update.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f203b94f6b5be1..cb26de25658aa5 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -822,7 +822,8 @@ static int __init rcu_spawn_tasks_kthread(void)
struct task_struct *t;
t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
- BUG_ON(IS_ERR(t));
+ if (WARN_ONCE(IS_ERR(t), "%s: Could not start Tasks-RCU grace-period kthread, OOM is now expected behavior\n", __func__))
+ return 0;
smp_mb(); /* Ensure others see full kthread. */
WRITE_ONCE(rcu_tasks_kthread_ptr, t);
return 0;
From 5cc379a42acd7104747077db7aaf4b01115ee484 Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)"
Date: Tue, 25 Sep 2018 11:25:57 -0700
Subject: [PATCH 028/112] doc: Update information about resched_cpu
Since commit fced9c8cfe6b ("rcu: Avoid resched_cpu() when rescheduling
the current CPU"), resched_cpu is not directly called from
sync_sched_exp_handler. Update the documentation about the same.
Signed-off-by: Joel Fernandes (Google)
Cc:
Signed-off-by: Paul E. McKenney
---
.../Expedited-Grace-Periods/Expedited-Grace-Periods.html | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
index e62c7c34a369ee..8e4f873b979ffd 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
@@ -160,9 +160,9 @@
The rcu_data Structure
-
- The rcu_dynticks Structure
The rcu_head Structure
@@ -174,16 +172,8 @@ Data-Structure Relationships
RCU must handle dyntick-idle CPUs specially
because RCU would otherwise wake up each CPU on every grace period,
which would defeat the whole purpose of CONFIG_NO_HZ_IDLE.
-RCU uses the rcu_dynticks structure to track
-which CPUs are in dyntick idle mode, as shown below:
-
-
-
-
However, if a CPU is in dyntick-idle mode, it is in that mode
-for all flavors of RCU.
-Therefore, a single rcu_dynticks structure is allocated per
-CPU, and all of a given CPU's rcu_data structures share
-that rcu_dynticks, as shown in the figure.
+RCU uses the dynticks related fields in the rcu_data structure
+to track which CPUs are in dyntick idle mode.
Kernels built with CONFIG_PREEMPT_RCU support
rcu_preempt in addition to rcu_sched and rcu_bh, as shown below:
@@ -216,9 +206,6 @@
Data-Structure Relationships
Each rcu_node structure has a spinlock.
The fields in rcu_data are private to the corresponding
CPU, although a few can be read and written by other CPUs.
- Similarly, the fields in rcu_dynticks are private
- to the corresponding CPU, although a few can be read by
- other CPUs.
It is important to note that different data structures can have
@@ -274,11 +261,6 @@
Data-Structure Relationships
access to this information from the corresponding CPU.
Finally, this structure records past dyntick-idle state
for the corresponding CPU and also tracks statistics.
- rcu_dynticks:
- This per-CPU structure tracks the current dyntick-idle
- state for the corresponding CPU.
- Unlike the other three structures, the rcu_dynticks
- structure is not replicated per RCU flavor.
rcu_head:
This structure represents RCU callbacks, and is the
only structure allocated and managed by RCU users.
@@ -289,8 +271,8 @@ Data-Structure Relationships
If all you wanted from this article was a general notion of how
RCU's data structures are related, you are done.
Otherwise, each of the following sections give more details on
-the rcu_state, rcu_node, rcu_data,
-and rcu_dynticks data structures.
+the rcu_state, rcu_node and rcu_data data
+structures.
The rcu_state Structure
@@ -1017,30 +999,19 @@ Connection to Other Data Structures
1 int cpu;
- 2 struct rcu_state *rsp;
- 3 struct rcu_node *mynode;
- 4 struct rcu_dynticks *dynticks;
- 5 unsigned long grpmask;
- 6 bool beenonline;
+ 2 struct rcu_node *mynode;
+ 3 unsigned long grpmask;
+ 4 bool beenonline;
The ->cpu field contains the number of the
-corresponding CPU, the ->rsp pointer references
-the corresponding rcu_state structure (and is most frequently
-used to locate the name of the corresponding flavor of RCU for tracing),
-and the ->mynode field references the corresponding
-rcu_node structure.
+corresponding CPU and the ->mynode field references the
+corresponding rcu_node structure.
The ->mynode is used to propagate quiescent states
up the combining tree.
-
The ->dynticks pointer references the
-rcu_dynticks structure corresponding to this
-CPU.
-Recall that a single per-CPU instance of the rcu_dynticks
-structure is shared among all flavors of RCU.
-These first four fields are constant and therefore require not
-synchronization.
+These two fields are constant and therefore do not require synchronization.
-
The ->grpmask field indicates the bit in
+
The ->grpmask field indicates the bit in
the ->mynode->qsmask corresponding to this
rcu_data structure, and is also used when propagating
quiescent states.
@@ -1181,26 +1152,22 @@
Dyntick-Idle Handling
count the number of times this CPU is determined to be in
dyntick-idle state, and is used for tracing and debugging purposes.
-
-The rcu_dynticks Structure
-
-The rcu_dynticks maintains the per-CPU dyntick-idle state
-for the corresponding CPU.
-Unlike the other structures, rcu_dynticks is not
-replicated over the different flavors of RCU.
-The fields in this structure may be accessed only from the corresponding
-CPU (and from tracing) unless otherwise stated.
-Its fields are as follows:
+
+This portion of the rcu_data structure is declared as follows:
1 long dynticks_nesting;
2 long dynticks_nmi_nesting;
3 atomic_t dynticks;
4 bool rcu_need_heavy_qs;
- 5 unsigned long rcu_qs_ctr;
- 6 bool rcu_urgent_qs;
+ 5 bool rcu_urgent_qs;
+These fields in the rcu_data structure maintain the per-CPU dyntick-idle
+state for the corresponding CPU.
+The fields may be accessed only from the corresponding CPU (and from tracing)
+unless otherwise stated.
+
The ->dynticks_nesting field counts the
nesting depth of process execution, so that in normal circumstances
this counter has value zero or one.
@@ -1242,19 +1209,12 @@
This flag is checked by RCU's context-switch and cond_resched()
code, which provide a momentary idle sojourn in response.
-