Merge pull request #28 from gmbeard/feature/performance-improvements

feat(performance) Multiple performance improvements and other fixes
gmbeard · Dec 9, 2023 · 67b6057
2 parents 92b09ab + 52e6959
Showing 43 changed files with 974 additions and 254 deletions.
diff --git a/.gitignore b/.gitignore
@@ -3,3 +3,4 @@ build/
 .private/
 experimental/
 dist/
+report/
diff --git a/.versioning/changes/DyxXCHexJS.minor.md b/.versioning/changes/DyxXCHexJS.minor.md
@@ -0,0 +1 @@
+Improves encoding performance by using a dedicated thread
diff --git a/.versioning/changes/S9GbUeHXdB.minor.md b/.versioning/changes/S9GbUeHXdB.minor.md
@@ -1 +1 @@
-Adds a build-time option to enable timing metric collection for audio & video
+Adds a build-time option to enable frame time metric collection
diff --git a/.versioning/changes/UaFV3jPgO3.minor.md b/.versioning/changes/UaFV3jPgO3.minor.md
@@ -0,0 +1 @@
+Reduces frame "jitter" when capturing certain games in X11
diff --git a/.versioning/changes/wekMseGz7z.patch.md b/.versioning/changes/wekMseGz7z.patch.md
@@ -0,0 +1 @@
+Fixes a memory leak in H/W encoding pipeline
diff --git a/README.md b/README.md
@@ -9,6 +9,7 @@ Typical screen capture utilities copy the framebuffer data between host and GPU
 - [Building from source](#building-from-source)
 - [Installing](#installing)
 - [Alternative projects](#some-alternative-projects)
+- [FAQ](doc/faq.md)
 
 #### Example - Cyberpunk 2077
 [![Cyberpunk 2077](http://i3.ytimg.com/vi/frXGxrdgTLY/hqdefault.jpg)](https://www.youtube.com/watch?v=frXGxrdgTLY)
@@ -26,21 +27,15 @@ The resulting media/container type is determined by the extension of `<OUTPUT FI
 
 If no `OPTIONS` are specified then *Shadow Cast* will pick some sensible defaults for the audio/video encoders and sample/frame rates, but these can be changed by specifying the following `OPTIONS` on the command line...
 
-- `-A <AUDIO ENCODER>` - All options available to `ffmpeg` should work here. Defaults to `libopus`
-- `-V <VIDEO ENCODER>` - Available options are `h264_nvenc` and `hevc_nvenc`. Defaults to `hevc_nvenc`
-- `-f <FRAME RATE>` - Values from `20` to `60` are accepted. Defaults to `60`
-- `-s <SAMPLE RATE>` - Defaults to `48000` (_NOTE: Some encoders will only support certain sample rates. Shadow Cast will display an error if your chosen sample rate isn't supported_)
+| Option                    | Description   |
+|---------                  |------------   |
+| `-A <AUDIO ENCODER>`      | Audio encoder. All options available to `ffmpeg` should work here. Defaults to `libopus` |
+| `-V <VIDEO ENCODER>`      | Video encoder. Available options are `h264_nvenc` and `hevc_nvenc`. defaults to `hevc_nvenc` |
+| `-f <FRAMES PER SECOND>`  | Capture FPS. values from `20` to `70` are accepted. defaults to `60`  |
+| `-s <SAMPLE RATE>`        | Audio sample rate. Defaults to `48000` (_NOTE: Some encoders will only support certain sample rates. Shadow Cast will display an error if your chosen sample rate isn't supported_) |
 
 Ctrl+C / SIGINT will stop the capture session and finalize the output media.
 
-### Help. I'm getting the following error
-
-#### `ERROR: Couldn't create NvFBC instance`
-*Shadow Cast* uses the *NvFBC* facility to provide efficient, low-latency framebuffer capture on X11. By default, NVIDIA disables this on most (if not all) of its consumer-level GPUs. However, there are two ways around this restriction...
-
-- You can find a utility to patch your NVIDIA drivers in the [keylase/nvidia-patch](https://github.com/keylase/nvidia-patch) GitHub repo.
-- You can obtain a "key" to unlock this feature at runtime. The key can be set at runtime via the `SHADOW_CAST_NVFBC_KEY=<BASE64 ENCODED KEY>` environment variable. I use this method but I'm not sure how "official" it is, so no keys are provided in this repo. Feel free to message me about this.
-
 ### Requirements
 - FFMpeg (libav)
 - NVIDIA GPU, supporting NVENC and NvFBC

diff --git a/doc/faq.md b/doc/faq.md
@@ -0,0 +1,24 @@
+### Q. Help, I'm getting the following error
+
+```
+ERROR: Couldn't create NvFBC instance
+```
+
+#### A.
+*Shadow Cast* uses the *NvFBC* facility to provide efficient, low-latency framebuffer capture on X11. By default, NVIDIA disables this on most (if not all) of its consumer-level GPUs. However, there are two ways around this restriction...
+
+- You can find a utility to patch your NVIDIA drivers in the [keylase/nvidia-patch](https://github.com/keylase/nvidia-patch) GitHub repo.
+- You can obtain a "key" to unlock this feature at runtime. The key can be set at runtime via the `SHADOW_CAST_NVFBC_KEY=<BASE64 ENCODED KEY>` environment variable. I use this method but I'm not sure how "official" it is, so no keys are provided in this repo. Feel free to message me about this.
+
+### Q. I'm capturing gameplay footage on X11 and my game is "laggy"
+
+#### A.
+In some games, the NVIDIA Capture library (*NvFBC*) appears to interact poorly with v-sync if your refresh rate matches the capture FPS. Try disabling v-sync. Another option you can try is to set the following environment variable when capturing...
+
+```
+$ SHADOW_CAST_STRICT_FPS=0 shadow-cast ...
+```
+
+Please note, however, using this option will scale the output video's frame rate to match NvFBC's closest match (e.g. `62.5` in the case of a 60fps capture).
+
+I haven't quite worked out the definitive cause of this "lagginess", but I suspect it is because NvFBC only accepts integer millisecond values as its sampling rate, so cannot exactly match *Shadow Cast*'s frame rate. For example, 60fps would require a fractional sampling frequency of `16.666` milliseconds, and NvFBC only allows either `16` or `17` milliseconds.
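
Note on the FAQ arithmetic above: the `62.5` figure follows if the ideal frame interval is truncated to whole milliseconds (`1000 / 60` gives `16`, and `1000 / 16` gives `62.5`). The sketch below only illustrates that arithmetic. `SamplingInterval` and `truncated_interval` are hypothetical names, not part of *Shadow Cast* or NvFBC, and truncation (rather than rounding) is an assumption inferred from the `62.5` example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type, not part of Shadow Cast or NvFBC.
struct SamplingInterval
{
    std::uint32_t interval_ms; // whole-millisecond sampling interval
    double effective_fps;      // frame rate that interval actually yields
};

// Assumption: the ideal interval (1000 / fps) is truncated to whole
// milliseconds, which matches the FAQ's "62.5 for a 60fps capture".
inline auto truncated_interval(std::uint32_t target_fps) -> SamplingInterval
{
    assert(target_fps > 0 && target_fps <= 1000);
    auto const ms = 1000u / target_fps; // integer division: 60 fps -> 16 ms
    return { ms, 1000.0 / ms };         // 16 ms -> 62.5 fps
}
```

A target of `50` divides evenly (`20` ms), so under this assumption the effective rate equals the requested one. A target of `60` does not, which is the mismatch the FAQ answer suspects.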
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
@@ -35,17 +35,20 @@ add_library(shadow-cast-obj
     services/audio_service.cpp
     services/context.cpp
     services/drm_video_service.cpp
+    services/encoder.cpp
+    services/encoder_service.cpp
     services/metrics_service.cpp
     services/readiness.cpp
     services/service.cpp
     services/service_registry.cpp
     services/signal_service.cpp
     services/video_service.cpp
 
+    utils/base64.cpp
     utils/cmd_line.cpp
     utils/contracts.cpp
-    utils/base64.cpp
     utils/elapsed.cpp
+    utils/frame_time.cpp
     utils/result.cpp
 
     error.cpp

diff --git a/src/av/codec.cpp b/src/av/codec.cpp
@@ -18,7 +18,7 @@ auto create_video_encoder(std::string const& encoder_name,
                           CUcontext cuda_ctx,
                           AVBufferPool* pool,
                           VideoOutputSize size,
-                          std::uint32_t fps,
+                          FrameTime const& ft,
                           AVPixelFormat pixel_format) -> sc::CodecContextPtr
 {
     sc::BorrowedPtr<AVCodec const> video_encoder { avcodec_find_encoder_by_name(
@@ -30,10 +30,10 @@ auto create_video_encoder(std::string const& encoder_name,
     sc::CodecContextPtr video_encoder_context { avcodec_alloc_context3(
         video_encoder.get()) };
     video_encoder_context->codec_id = video_encoder->id;
-    video_encoder_context->time_base.num = 1;
-    video_encoder_context->time_base.den = fps;
-    video_encoder_context->framerate.num = fps;
-    video_encoder_context->framerate.den = 1;
+    auto const timebase = ft.per_second_ratio();
+    video_encoder_context->time_base = timebase;
+    video_encoder_context->framerate.num = timebase.den;
+    video_encoder_context->framerate.den = timebase.num;
     video_encoder_context->sample_aspect_ratio.num = 0;
     video_encoder_context->sample_aspect_ratio.den = 0;
     video_encoder_context->max_b_frames = 0;
@@ -69,9 +69,6 @@ auto create_video_encoder(std::string const& encoder_name,
     hw_frame_context->height = video_encoder_context->height;
     hw_frame_context->sw_format = pixel_format;
     hw_frame_context->format = video_encoder_context->pix_fmt;
-    hw_frame_context->device_ref = av_buffer_ref(device_ctx.get());
-    hw_frame_context->device_ctx =
-        reinterpret_cast<AVHWDeviceContext*>(device_ctx->data);
 
     hw_frame_context->pool = pool;
     hw_frame_context->initial_pool_size = 1;
@@ -81,11 +78,7 @@ auto create_video_encoder(std::string const& encoder_name,
                                    av_error_to_string(ret) };
     }
 
-    /* TODO: Are we doing this correctly? It seems to be the
-     * source of memory leak...
-     */
-    video_encoder_context->hw_device_ctx = device_ctx.release();
-    video_encoder_context->hw_frames_ctx = frame_context.release();
+    video_encoder_context->hw_frames_ctx = av_buffer_ref(frame_context.get());
 
     AVDictionary* options = nullptr;
     av_dict_set_int(&options, "qp", 21, 0);
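
Note on the `time_base` / `framerate` hunk above: the new code sets `framerate` to the reciprocal of the time base returned by `ft.per_second_ratio()`. `FrameTime` itself is not shown in this diff, so the following is only a sketch of that reciprocal relationship. It assumes a frame duration held in nanoseconds and uses a stand-in `Rational` instead of FFmpeg's `AVRational`; `seconds_per_frame` and `reciprocal` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Stand-in for FFmpeg's AVRational so the sketch builds without libav.
struct Rational
{
    std::int64_t num;
    std::int64_t den;
};

// Hypothetical: express a frame duration (assumed to be in nanoseconds)
// as a reduced fraction of one second, i.e. a time base.
inline auto seconds_per_frame(std::int64_t frame_time_ns) -> Rational
{
    assert(frame_time_ns > 0);
    constexpr std::int64_t ns_per_second = 1'000'000'000;
    auto const g = std::gcd(frame_time_ns, ns_per_second);
    return { frame_time_ns / g, ns_per_second / g };
}

// The frame rate is the reciprocal of the time base: swap num and den.
inline auto reciprocal(Rational r) -> Rational
{
    return { r.den, r.num };
}
```

With a 16 ms frame time, `seconds_per_frame(16'000'000)` reduces to `2/125` and its reciprocal is `125/2`, i.e. 62.5 frames per second. This is the kind of non-integer rate a plain `std::uint32_t fps` parameter could not represent.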

diff --git a/src/av/codec.hpp b/src/av/codec.hpp
@@ -25,7 +25,7 @@ auto create_video_encoder(std::string const& encoder_name,
                           CUcontext cuda_ctx,
                           AVBufferPool* pool,
                           VideoOutputSize size,
-                          std::uint32_t fps,
+                          FrameTime const& ft,
                           AVPixelFormat pixel_format) -> sc::CodecContextPtr;
 } // namespace sc
 

diff --git a/src/handlers/audio_chunk_writer.cpp b/src/handlers/audio_chunk_writer.cpp
@@ -3,6 +3,7 @@
 #include "av/sample_format.hpp"
 #include "config.hpp"
 #include "error.hpp"
+#include "services/encoder.hpp"
 #include <algorithm>
 #include <cassert>
 #include <memory>
@@ -11,15 +12,14 @@
 namespace sc
 {
 
-ChunkWriter::ChunkWriter(AVFormatContext* format_context,
-                         AVCodecContext* codec_context,
-                         AVStream* stream) noexcept
-    : format_context_ { format_context }
-    , codec_context_ { codec_context }
+ChunkWriter::ChunkWriter(AVCodecContext* codec_context,
+                         AVStream* stream,
+                         Encoder encoder) noexcept
+    : codec_context_ { codec_context }
     , stream_ { stream }
+    , encoder_ { encoder }
     , frame_ { av_frame_alloc() }
     , total_samples_written_ { 0 }
-    , packet_ { av_packet_alloc() }
 {
 }
 
@@ -32,7 +32,9 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
     auto const sample_size = sc::sample_format_size(sample_format);
     auto const interleaved = sc::is_interleaved_format(sample_format);
 
-    sc::BorrowedPtr<AVFrame> frame = frame_.get();
+    auto encoder_frame =
+        encoder_.prepare_frame(codec_context_.get(), stream_.get());
+    auto* frame = encoder_frame->frame.get();
 
     frame->nb_samples = chunk.sample_count;
     frame->format = codec_context_->sample_fmt;
@@ -46,9 +48,7 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
     frame->pts = total_samples_written_;
     total_samples_written_ += frame->nb_samples;
 
-    sc::initialize_writable_buffer(frame.get());
-
-    AVFrameUnrefGuard unref_guard { frame };
+    sc::initialize_writable_buffer(frame);
 
     auto n = 0;
     for (auto const& channel_buffer : chunk.channel_buffers()) {
@@ -63,11 +63,7 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
         std::copy(begin(source), end(source), begin(target));
     }
 
-    send_frame(frame.get(),
-               codec_context_.get(),
-               format_context_.get(),
-               stream_.get(),
-               packet_.get());
+    encoder_.write_frame(std::move(encoder_frame));
 }
 
 } // namespace sc
diff --git a/src/handlers/audio_chunk_writer.hpp b/src/handlers/audio_chunk_writer.hpp
@@ -2,26 +2,26 @@
 #define SHADOW_CAST_HANDLERS_AUDIO_CHUNK_WRITER_HPP_INCLUDED
 
 #include "av.hpp"
+#include "services/encoder.hpp"
 #include "utils/borrowed_ptr.hpp"
 
 namespace sc
 {
 
 struct ChunkWriter
 {
-    explicit ChunkWriter(AVFormatContext* format_context,
-                         AVCodecContext* codec_context,
-                         AVStream* stream) noexcept;
+    explicit ChunkWriter(AVCodecContext* codec_context,
+                         AVStream* stream,
+                         Encoder encoder) noexcept;
 
     auto operator()(MediaChunk const& chunk) -> void;
 
 private:
-    BorrowedPtr<AVFormatContext> format_context_;
     BorrowedPtr<AVCodecContext> codec_context_;
     BorrowedPtr<AVStream> stream_;
+    Encoder encoder_;
     FramePtr frame_;
     std::size_t total_samples_written_ { 0 };
-    PacketPtr packet_;
 };
 
 } // namespace sc

diff --git a/src/handlers/drm_video_frame_writer.cpp b/src/handlers/drm_video_frame_writer.cpp
@@ -1,38 +1,45 @@
 #include "handlers/drm_video_frame_writer.hpp"
+#include "services/encoder.hpp"
 #include "utils/elapsed.hpp"
 
 namespace sc
 {
 
-DRMVideoFrameWriter::DRMVideoFrameWriter(AVFormatContext* fmt_context,
-                                         AVCodecContext* codec_context,
-                                         AVStream* stream)
-    : format_context_ { fmt_context }
-    , codec_context_ { codec_context }
+DRMVideoFrameWriter::DRMVideoFrameWriter(AVCodecContext* codec_context,
+                                         AVStream* stream,
+                                         Encoder encoder)
+    : codec_context_ { codec_context }
     , stream_ { stream }
-    , frame_ { av_frame_alloc() }
-    , packet_ { av_packet_alloc() }
+    , encoder_ { encoder }
 {
-    frame_->format = codec_context_->pix_fmt;
-    frame_->width = codec_context_->width;
-    frame_->height = codec_context_->height;
-    frame_->color_range = codec_context_->color_range;
-    frame_->color_primaries = codec_context_->color_primaries;
-    frame_->color_trc = codec_context_->color_trc;
-    frame_->colorspace = codec_context_->colorspace;
-    frame_->chroma_location = codec_context_->chroma_sample_location;
-    if (auto const r = av_hwframe_get_buffer(
-            codec_context_->hw_frames_ctx, frame_.get(), 0);
-        r < 0)
-        throw std::runtime_error { "Failed to get H/W frame buffer" };
 }
 
-auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
+auto DRMVideoFrameWriter::operator()(CUarray data,
+                                     NvCuda const& cuda,
+                                     std::uint64_t frame_time) -> void
 {
     SC_EXPECT(data);
-    SC_EXPECT(frame_->linesize[0]);
-    SC_EXPECT(frame_->height);
-    SC_EXPECT(frame_->data[0]);
+
+    auto encoder_frame =
+        encoder_.prepare_frame(codec_context_.get(), stream_.get());
+    auto* frame = encoder_frame->frame.get();
+
+    frame->format = codec_context_->pix_fmt;
+    frame->width = codec_context_->width;
+    frame->height = codec_context_->height;
+    frame->color_range = codec_context_->color_range;
+    frame->color_primaries = codec_context_->color_primaries;
+    frame->color_trc = codec_context_->color_trc;
+    frame->colorspace = codec_context_->colorspace;
+    frame->chroma_location = codec_context_->chroma_sample_location;
+    if (auto const r =
+            av_hwframe_get_buffer(codec_context_->hw_frames_ctx, frame, 0);
+        r < 0)
+        throw std::runtime_error { "Failed to get H/W frame buffer" };
+
+    SC_EXPECT(frame->linesize[0]);
+    SC_EXPECT(frame->height);
+    SC_EXPECT(frame->data[0]);
 
     CUDA_MEMCPY2D memcpy_struct {};
 
@@ -43,10 +50,10 @@ auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
     memcpy_struct.dstY = 0;
     memcpy_struct.dstMemoryType = CU_MEMORYTYPE_DEVICE;
     memcpy_struct.srcArray = data;
-    memcpy_struct.dstDevice = reinterpret_cast<CUdeviceptr>(frame_->data[0]);
-    memcpy_struct.dstPitch = frame_->linesize[0];
-    memcpy_struct.WidthInBytes = frame_->linesize[0];
-    memcpy_struct.Height = frame_->height;
+    memcpy_struct.dstDevice = reinterpret_cast<CUdeviceptr>(frame->data[0]);
+    memcpy_struct.dstPitch = frame->linesize[0];
+    memcpy_struct.WidthInBytes = frame->linesize[0];
+    memcpy_struct.Height = frame->height;
 
     if (auto const r = cuda.cuMemcpy2D_v2(&memcpy_struct); r != CUDA_SUCCESS) {
         char const* err = "unknown";
@@ -57,13 +64,9 @@ auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
         };
     }
 
-    frame_->pts = frame_number_++;
+    frame->pts = frame_time * frame_number_++;
 
-    send_frame(frame_.get(),
-               codec_context_.get(),
-               format_context_.get(),
-               stream_.get(),
-               packet_.get());
+    encoder_.write_frame(std::move(encoder_frame));
 }
 
 } // namespace sc
diff --git a/src/handlers/drm_video_frame_writer.hpp b/src/handlers/drm_video_frame_writer.hpp
@@ -3,24 +3,24 @@
 
 #include "av.hpp"
 #include "nvidia.hpp"
+#include "services/encoder.hpp"
+#include <cstdint>
 
 namespace sc
 {
 struct DRMVideoFrameWriter
 {
-    DRMVideoFrameWriter(AVFormatContext* fmt_context,
-                        AVCodecContext* codec_context,
-                        AVStream* stream);
+    DRMVideoFrameWriter(AVCodecContext* codec_context,
+                        AVStream* stream,
+                        Encoder encoder);
 
-    auto operator()(CUarray, NvCuda const&) -> void;
+    auto operator()(CUarray, NvCuda const&, std::uint64_t) -> void;
 
 private:
-    BorrowedPtr<AVFormatContext> format_context_;
     BorrowedPtr<AVCodecContext> codec_context_;
     BorrowedPtr<AVStream> stream_;
-    FramePtr frame_;
+    Encoder encoder_;
     std::size_t frame_number_ { 0 };
-    PacketPtr packet_;
 };
 
 } // namespace sc