feat(performance) Multiple performance improvements and other fixes #28

Merged 1 commit on Dec 9, 2023
feat(performance) Multiple performance improvements and other fixes
Performance:
- Adds a dedicated thread for dequeueing and writing encoded packets
- Uses the native (BGRA) colourspace for NvFBC capture
  Using the native colourspace means NvFBC doesn't have to perform any
  conversion. In theory, this should improve capture performance.
- Allows NvFBC to wait up to a maximum time for a new frame

Fixes:
- Adds an eventfd for cancelling contexts without timers
  Contexts without timers require this to be cancellable with
  `request_stop()`.
- Fixes a memory leak in H/W frame context initialization

Other:
- Adds metrics for video and audio encoding frame times
- Adds a performance report generator script
- Allows the NvFBC adjusted FPS to be enabled/disabled with the ENV var
  `SHADOW_CAST_STRICT_FPS`. Defaults to `1`.
- Adds an FAQ
gmbeard committed Dec 9, 2023
commit 52e69591add6a65a538614379cd6c2a3c4c92240
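A note on the eventfd fix in the message above: an event loop that blocks without a timer has nothing to wake it when `request_stop()` is called, so a dedicated file descriptor can act as the wake-up signal. The following is a minimal, Linux-only sketch of that idea and not Shadow Cast's actual implementation. `StopEvent`, `Wake` and `wait_for_work` are hypothetical names, and the use of `poll` is an assumption; the real context may multiplex its descriptors differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Hypothetical wrapper around an eventfd used purely as a "stop" signal.
class StopEvent
{
public:
    StopEvent()
        : fd_ { eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK) }
    {
        assert(fd_ >= 0);
    }
    ~StopEvent() { close(fd_); }
    StopEvent(StopEvent const&) = delete;
    auto operator=(StopEvent const&) -> StopEvent& = delete;

    // Makes the descriptor readable, waking anything polling on it.
    // The counter is never drained here, so the stop request is sticky.
    auto request_stop() const -> void
    {
        std::uint64_t const one = 1;
        auto const n = write(fd_, &one, sizeof(one));
        static_cast<void>(n);
    }

    auto fd() const -> int { return fd_; }

private:
    int fd_;
};

enum class Wake { work, stopped, timed_out };

// Blocks until `work_fd` is readable, a stop was requested, or the timeout
// (in milliseconds, -1 meaning "no timer") expires.
auto wait_for_work(int work_fd, StopEvent const& stop, int timeout_ms) -> Wake
{
    pollfd fds[2] = { { stop.fd(), POLLIN, 0 }, { work_fd, POLLIN, 0 } };
    int r = 0;
    do {
        r = poll(fds, 2, timeout_ms);
    } while (r < 0 && errno == EINTR);

    if (r < 0 || (fds[0].revents & POLLIN))
        return Wake::stopped; // stop takes priority over pending work
    if (r == 0)
        return Wake::timed_out;
    return Wake::work;
}
```

With `timeout_ms` set to `-1` the loop has no timer at all, which is the case the commit message describes: without the extra descriptor, only new work could ever end the wait.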
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -3,3 +3,4 @@ build/
.private/
experimental/
dist/
+report/
1 change: 1 addition & 0 deletions .versioning/changes/DyxXCHexJS.minor.md
@@ -0,0 +1 @@
Improves encoding performance by using a dedicated thread
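To illustrate the dedicated-thread change described by this entry, here is a minimal producer/consumer sketch in which capture code enqueues packets and a single worker thread performs the slow writes. It is a sketch under assumptions, not the PR's `Encoder` or encoder service: `Packet`, `PacketWriter`, `enqueue` and `stop` are hypothetical names, and the `sink` callback stands in for whatever actually writes to the output container.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for an encoded packet; not an FFmpeg type.
struct Packet
{
    int stream_index;
    long pts;
};

// One worker thread dequeues packets and hands them to `sink` in FIFO order.
class PacketWriter
{
public:
    explicit PacketWriter(std::function<void(Packet const&)> sink)
        : sink_ { std::move(sink) }
        , worker_ { [this] { run(); } }
    {
    }
    ~PacketWriter() { stop(); }
    PacketWriter(PacketWriter const&) = delete;
    auto operator=(PacketWriter const&) -> PacketWriter& = delete;

    // Called by producers; only touches the queue, never the sink.
    auto enqueue(Packet p) -> void
    {
        {
            std::lock_guard<std::mutex> lock { mutex_ };
            assert(!stopping_);
            queue_.push_back(p);
        }
        cv_.notify_one();
    }

    // Lets the worker drain what is queued, then joins it. Idempotent.
    auto stop() -> void
    {
        {
            std::lock_guard<std::mutex> lock { mutex_ };
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable())
            worker_.join();
    }

private:
    auto run() -> void
    {
        std::unique_lock<std::mutex> lock { mutex_ };
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty())
                return; // stop requested and nothing left to write
            Packet const p = queue_.front();
            queue_.pop_front();
            lock.unlock();
            sink_(p); // slow I/O happens outside the lock
            lock.lock();
        }
    }

    std::function<void(Packet const&)> sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Packet> queue_;
    bool stopping_ { false };
    std::thread worker_; // declared last so it starts after the other members
};
```

The design point is that the producer only pays for a short lock and a queue push, so a stalled write delays the worker rather than the capture loop.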
2 changes: 1 addition & 1 deletion .versioning/changes/S9GbUeHXdB.minor.md
@@ -1 +1 @@
-Adds a build-time option to enable timing metric collection for audio & video
+Adds a build-time option to enable frame time metric collection
1 change: 1 addition & 0 deletions .versioning/changes/UaFV3jPgO3.minor.md
@@ -0,0 +1 @@
Reduces frame "jitter" when capturing certain games in X11
1 change: 1 addition & 0 deletions .versioning/changes/wekMseGz7z.patch.md
@@ -0,0 +1 @@
Fixes a memory leak in H/W encoding pipeline
19 changes: 7 additions & 12 deletions README.md
@@ -9,6 +9,7 @@ Typical screen capture utilities copy the framebuffer data between host and GPU
- [Building from source](#building-from-source)
- [Installing](#installing)
- [Alternative projects](#some-alternative-projects)
+- [FAQ](doc/faq.md)

#### Example - Cyberpunk 2077
[![Cyberpunk 2077](http://i3.ytimg.com/vi/frXGxrdgTLY/hqdefault.jpg)](https://www.youtube.com/watch?v=frXGxrdgTLY)
@@ -26,21 +27,15 @@ The resulting media/container type is determined by the extension of `<OUTPUT FI

If no `OPTIONS` are specified then *Shadow Cast* will pick some sensible defaults for the audio/video encoders and sample/frame rates, but these can be changed by specifying the following `OPTIONS` on the command line...

-- `-A <AUDIO ENCODER>` - All options available to `ffmpeg` should work here. Defaults to `libopus`
-- `-V <VIDEO ENCODER>` - Available options are `h264_nvenc` and `hevc_nvenc`. Defaults to `hevc_nvenc`
-- `-f <FRAME RATE>` - Values from `20` to `60` are accepted. Defaults to `60`
-- `-s <SAMPLE RATE>` - Defaults to `48000` (_NOTE: Some encoders will only support certain sample rates. Shadow Cast will display an error if your chosen sample rate isn't supported_)
+| Option | Description |
+|--------- |------------ |
+| `-A <AUDIO ENCODER>` | Audio encoder. All options available to `ffmpeg` should work here. Defaults to `libopus` |
+| `-V <VIDEO ENCODER>` | Video encoder. Available options are `h264_nvenc` and `hevc_nvenc`. Defaults to `hevc_nvenc` |
+| `-f <FRAMES PER SECOND>` | Capture FPS. Values from `20` to `70` are accepted. Defaults to `60` |
+| `-s <SAMPLE RATE>` | Audio sample rate. Defaults to `48000` (_NOTE: Some encoders will only support certain sample rates. Shadow Cast will display an error if your chosen sample rate isn't supported_) |

Ctrl+C / SIGINT will stop the capture session and finalize the output media.

-### Help. I'm getting the following error
-
-#### `ERROR: Couldn't create NvFBC instance`
-*Shadow Cast* uses the *NvFBC* facility to provide efficient, low-latency framebuffer capture on X11. By default, NVIDIA disables this on most (if not all) of its consumer-level GPUs. However, there are two ways around this restriction...
-
-- You can find a utility to patch your NVIDIA drivers in the [keylase/nvidia-patch](https://github.com/keylase/nvidia-patch) GitHub repo.
-- You can obtain a "key" to unlock this feature at runtime. The key can be set at runtime via the `SHADOW_CAST_NVFBC_KEY=<BASE64 ENCODED KEY>` environment variable. I use this method but I'm not sure how "official" it is, so no keys are provided in this repo. Feel free to message me about this.
-
### Requirements
- FFMpeg (libav)
- NVIDIA GPU, supporting NVENC and NvFBC
24 changes: 24 additions & 0 deletions doc/faq.md
@@ -0,0 +1,24 @@
### Q. Help, I'm getting the following error

```
ERROR: Couldn't create NvFBC instance
```

#### A.
*Shadow Cast* uses the *NvFBC* facility to provide efficient, low-latency framebuffer capture on X11. By default, NVIDIA disables this on most (if not all) of its consumer-level GPUs. However, there are two ways around this restriction...

- You can find a utility to patch your NVIDIA drivers in the [keylase/nvidia-patch](https://github.com/keylase/nvidia-patch) GitHub repo.
- You can obtain a "key" to unlock this feature at runtime. The key can be set at runtime via the `SHADOW_CAST_NVFBC_KEY=<BASE64 ENCODED KEY>` environment variable. I use this method but I'm not sure how "official" it is, so no keys are provided in this repo. Feel free to message me about this.

### Q. I'm capturing gameplay footage on X11 and my game is "laggy"

#### A.
In some games, the NVIDIA Capture library (*NvFBC*) appears to interact poorly with v-sync if your refresh rate matches the capture FPS. Try disabling v-sync. Another option you can try is to set the following environment variable when capturing...

```
$ SHADOW_CAST_STRICT_FPS=0 shadow-cast ...
```

Please note, however, that this option changes the output video's frame rate to NvFBC's closest supported rate (e.g. `62.5` fps in the case of a 60fps capture).

I haven't worked out the definitive cause of this "lagginess", but I suspect it is because NvFBC only accepts integer millisecond values for its sampling interval, so it cannot exactly match *Shadow Cast*'s frame rate. For example, 60fps would require a fractional sampling interval of `16.666` milliseconds, and NvFBC only allows either `16` or `17` milliseconds.
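The arithmetic behind those numbers can be sketched as follows. This is only an illustration of the integer-millisecond constraint described above; `truncated_interval_ms` and `effective_fps` are hypothetical helpers, and truncation (rather than rounding) is an assumption chosen because it reproduces the `62.5` figure quoted in this answer.

```cpp
#include <cassert>
#include <cstdint>

// Whole-millisecond sampling interval for a requested frame rate,
// assuming the fractional part is simply dropped.
auto truncated_interval_ms(std::uint32_t fps) -> std::uint32_t
{
    assert(fps > 0 && fps <= 1000);
    return 1000u / fps; // integer division: 60 fps gives 16, not 16.666
}

// Frame rate actually produced by sampling every `interval_ms` milliseconds.
auto effective_fps(std::uint32_t interval_ms) -> double
{
    assert(interval_ms > 0);
    return 1000.0 / interval_ms;
}
```

For a 60fps request the ideal interval is 1000 / 60 = 16.666 ms. Sampling every `16` ms gives 1000 / 16 = 62.5 frames per second, while `17` ms gives roughly 58.8, so neither candidate lines up with a 60 Hz refresh.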
5 changes: 4 additions & 1 deletion src/CMakeLists.txt
@@ -35,17 +35,20 @@ add_library(shadow-cast-obj
services/audio_service.cpp
services/context.cpp
services/drm_video_service.cpp
services/encoder.cpp
services/encoder_service.cpp
services/metrics_service.cpp
services/readiness.cpp
services/service.cpp
services/service_registry.cpp
services/signal_service.cpp
services/video_service.cpp

utils/base64.cpp
utils/cmd_line.cpp
utils/contracts.cpp
utils/base64.cpp
utils/elapsed.cpp
utils/frame_time.cpp
utils/result.cpp

error.cpp
19 changes: 6 additions & 13 deletions src/av/codec.cpp
@@ -18,7 +18,7 @@ auto create_video_encoder(std::string const& encoder_name,
CUcontext cuda_ctx,
AVBufferPool* pool,
VideoOutputSize size,
-std::uint32_t fps,
+FrameTime const& ft,
AVPixelFormat pixel_format) -> sc::CodecContextPtr
{
sc::BorrowedPtr<AVCodec const> video_encoder { avcodec_find_encoder_by_name(
@@ -30,10 +30,10 @@ auto create_video_encoder(std::string const& encoder_name,
sc::CodecContextPtr video_encoder_context { avcodec_alloc_context3(
video_encoder.get()) };
video_encoder_context->codec_id = video_encoder->id;
-video_encoder_context->time_base.num = 1;
-video_encoder_context->time_base.den = fps;
-video_encoder_context->framerate.num = fps;
-video_encoder_context->framerate.den = 1;
+auto const timebase = ft.per_second_ratio();
+video_encoder_context->time_base = timebase;
+video_encoder_context->framerate.num = timebase.den;
+video_encoder_context->framerate.den = timebase.num;
video_encoder_context->sample_aspect_ratio.num = 0;
video_encoder_context->sample_aspect_ratio.den = 0;
video_encoder_context->max_b_frames = 0;
@@ -69,9 +69,6 @@ auto create_video_encoder(std::string const& encoder_name,
hw_frame_context->height = video_encoder_context->height;
hw_frame_context->sw_format = pixel_format;
hw_frame_context->format = video_encoder_context->pix_fmt;
-hw_frame_context->device_ref = av_buffer_ref(device_ctx.get());
-hw_frame_context->device_ctx =
-reinterpret_cast<AVHWDeviceContext*>(device_ctx->data);

hw_frame_context->pool = pool;
hw_frame_context->initial_pool_size = 1;
@@ -81,11 +78,7 @@ auto create_video_encoder(std::string const& encoder_name,
av_error_to_string(ret) };
}

-/* TODO: Are we doing this correctly? It seems to be the
-* source of memory leak...
-*/
-video_encoder_context->hw_device_ctx = device_ctx.release();
-video_encoder_context->hw_frames_ctx = frame_context.release();
+video_encoder_context->hw_frames_ctx = av_buffer_ref(frame_context.get());

AVDictionary* options = nullptr;
av_dict_set_int(&options, "qp", 21, 0);
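One plausible reading of the leak fixed in this file is a counted reference being overwritten before it was released. The toy model below illustrates that failure mode only. It is not FFmpeg's `AVBufferRef` API: `Buffer`, `Frames`, `ref`, `unref`, `alloc_frames` and `free_frames` are all hypothetical, and it assumes the frames context already holds its own device reference after allocation (the allocation call is not visible in these hunks).

```cpp
#include <cassert>

// Toy reference-counted object; `freed` stands in for the real deallocation.
struct Buffer
{
    int refs { 1 };
    bool freed { false };
};

// Takes an additional counted reference.
auto ref(Buffer* b) -> Buffer*
{
    assert(b && !b->freed);
    ++b->refs;
    return b;
}

// Drops one reference and clears the caller's pointer.
auto unref(Buffer*& b) -> void
{
    if (!b)
        return;
    if (--b->refs == 0)
        b->freed = true;
    b = nullptr;
}

// Toy frames context that owns exactly one reference to its device.
struct Frames
{
    Buffer* device { nullptr };
};

// Allocation takes the frames context's own reference to the device.
auto alloc_frames(Buffer* device) -> Frames
{
    return Frames { ref(device) };
}

// Teardown releases only the one reference the context knows about.
auto free_frames(Frames& f) -> void
{
    unref(f.device);
}
```

In this model, writing `frames.device = ref(device)` after `alloc_frames(device)` raises the count a second time while teardown still releases it once, so the count can never reach zero. Leaving the field alone after allocation keeps acquisitions and releases balanced.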
2 changes: 1 addition & 1 deletion src/av/codec.hpp
@@ -25,7 +25,7 @@ auto create_video_encoder(std::string const& encoder_name,
CUcontext cuda_ctx,
AVBufferPool* pool,
VideoOutputSize size,
-std::uint32_t fps,
+FrameTime const& ft,
AVPixelFormat pixel_format) -> sc::CodecContextPtr;
} // namespace sc
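The move from an integer `fps` to a `FrameTime` means the encoder's time base no longer has to be `1/fps`; `codec.cpp` now sets the frame rate to the reciprocal of whatever ratio `per_second_ratio()` returns. The sketch below shows that relationship with a hypothetical `Rational` type (mirroring the `num`/`den` fields used in the diff) and an assumed frame interval given in whole milliseconds. How `FrameTime` stores its value is not shown in this PR excerpt.

```cpp
#include <cassert>

// Hypothetical stand-in for a num/den ratio such as FFmpeg's AVRational.
struct Rational
{
    int num;
    int den;
};

auto greatest_common_divisor(int a, int b) -> int
{
    while (b != 0) {
        int const t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Seconds per frame for a whole-millisecond interval, in lowest terms.
auto time_base_from_interval_ms(int interval_ms) -> Rational
{
    assert(interval_ms > 0);
    int const g = greatest_common_divisor(interval_ms, 1000);
    return Rational { interval_ms / g, 1000 / g };
}

// The frame rate is the reciprocal of the time base.
auto invert(Rational r) -> Rational
{
    return Rational { r.den, r.num };
}
```

A 16 ms interval gives a time base of 2/125 seconds and therefore a frame rate of 125/2 = 62.5 fps, a value the previous `std::uint32_t fps` parameter could not represent.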

26 changes: 11 additions & 15 deletions src/handlers/audio_chunk_writer.cpp
@@ -3,6 +3,7 @@
#include "av/sample_format.hpp"
#include "config.hpp"
#include "error.hpp"
+#include "services/encoder.hpp"
#include <algorithm>
#include <cassert>
#include <memory>
@@ -11,15 +12,14 @@
namespace sc
{

-ChunkWriter::ChunkWriter(AVFormatContext* format_context,
-AVCodecContext* codec_context,
-AVStream* stream) noexcept
-: format_context_ { format_context }
-, codec_context_ { codec_context }
+ChunkWriter::ChunkWriter(AVCodecContext* codec_context,
+AVStream* stream,
+Encoder encoder) noexcept
+: codec_context_ { codec_context }
, stream_ { stream }
+, encoder_ { encoder }
, frame_ { av_frame_alloc() }
, total_samples_written_ { 0 }
-, packet_ { av_packet_alloc() }
{
}

@@ -32,7 +32,9 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
auto const sample_size = sc::sample_format_size(sample_format);
auto const interleaved = sc::is_interleaved_format(sample_format);

-sc::BorrowedPtr<AVFrame> frame = frame_.get();
+auto encoder_frame =
+encoder_.prepare_frame(codec_context_.get(), stream_.get());
+auto* frame = encoder_frame->frame.get();

frame->nb_samples = chunk.sample_count;
frame->format = codec_context_->sample_fmt;
@@ -46,9 +48,7 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
frame->pts = total_samples_written_;
total_samples_written_ += frame->nb_samples;

-sc::initialize_writable_buffer(frame.get());
-
-AVFrameUnrefGuard unref_guard { frame };
+sc::initialize_writable_buffer(frame);

auto n = 0;
for (auto const& channel_buffer : chunk.channel_buffers()) {
@@ -63,11 +63,7 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
std::copy(begin(source), end(source), begin(target));
}

-send_frame(frame.get(),
-codec_context_.get(),
-format_context_.get(),
-stream_.get(),
-packet_.get());
+encoder_.write_frame(std::move(encoder_frame));
}

} // namespace sc
10 changes: 5 additions & 5 deletions src/handlers/audio_chunk_writer.hpp
@@ -2,26 +2,26 @@
#define SHADOW_CAST_HANDLERS_AUDIO_CHUNK_WRITER_HPP_INCLUDED

#include "av.hpp"
+#include "services/encoder.hpp"
#include "utils/borrowed_ptr.hpp"

namespace sc
{

struct ChunkWriter
{
-explicit ChunkWriter(AVFormatContext* format_context,
-AVCodecContext* codec_context,
-AVStream* stream) noexcept;
+explicit ChunkWriter(AVCodecContext* codec_context,
+AVStream* stream,
+Encoder encoder) noexcept;

auto operator()(MediaChunk const& chunk) -> void;

private:
-BorrowedPtr<AVFormatContext> format_context_;
BorrowedPtr<AVCodecContext> codec_context_;
BorrowedPtr<AVStream> stream_;
+Encoder encoder_;
FramePtr frame_;
std::size_t total_samples_written_ { 0 };
-PacketPtr packet_;
};

} // namespace sc
69 changes: 36 additions & 33 deletions src/handlers/drm_video_frame_writer.cpp
@@ -1,38 +1,45 @@
#include "handlers/drm_video_frame_writer.hpp"
+#include "services/encoder.hpp"
+#include "utils/elapsed.hpp"

namespace sc
{

-DRMVideoFrameWriter::DRMVideoFrameWriter(AVFormatContext* fmt_context,
-AVCodecContext* codec_context,
-AVStream* stream)
-: format_context_ { fmt_context }
-, codec_context_ { codec_context }
+DRMVideoFrameWriter::DRMVideoFrameWriter(AVCodecContext* codec_context,
+AVStream* stream,
+Encoder encoder)
+: codec_context_ { codec_context }
, stream_ { stream }
-, frame_ { av_frame_alloc() }
-, packet_ { av_packet_alloc() }
+, encoder_ { encoder }
{
-frame_->format = codec_context_->pix_fmt;
-frame_->width = codec_context_->width;
-frame_->height = codec_context_->height;
-frame_->color_range = codec_context_->color_range;
-frame_->color_primaries = codec_context_->color_primaries;
-frame_->color_trc = codec_context_->color_trc;
-frame_->colorspace = codec_context_->colorspace;
-frame_->chroma_location = codec_context_->chroma_sample_location;
-if (auto const r = av_hwframe_get_buffer(
-codec_context_->hw_frames_ctx, frame_.get(), 0);
-r < 0)
-throw std::runtime_error { "Failed to get H/W frame buffer" };
}

-auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
+auto DRMVideoFrameWriter::operator()(CUarray data,
+NvCuda const& cuda,
+std::uint64_t frame_time) -> void
{
SC_EXPECT(data);
-SC_EXPECT(frame_->linesize[0]);
-SC_EXPECT(frame_->height);
-SC_EXPECT(frame_->data[0]);

+auto encoder_frame =
+encoder_.prepare_frame(codec_context_.get(), stream_.get());
+auto* frame = encoder_frame->frame.get();
+
+frame->format = codec_context_->pix_fmt;
+frame->width = codec_context_->width;
+frame->height = codec_context_->height;
+frame->color_range = codec_context_->color_range;
+frame->color_primaries = codec_context_->color_primaries;
+frame->color_trc = codec_context_->color_trc;
+frame->colorspace = codec_context_->colorspace;
+frame->chroma_location = codec_context_->chroma_sample_location;
+if (auto const r =
+av_hwframe_get_buffer(codec_context_->hw_frames_ctx, frame, 0);
+r < 0)
+throw std::runtime_error { "Failed to get H/W frame buffer" };
+
+SC_EXPECT(frame->linesize[0]);
+SC_EXPECT(frame->height);
+SC_EXPECT(frame->data[0]);

CUDA_MEMCPY2D memcpy_struct {};

@@ -43,10 +50,10 @@ auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
memcpy_struct.dstY = 0;
memcpy_struct.dstMemoryType = CU_MEMORYTYPE_DEVICE;
memcpy_struct.srcArray = data;
-memcpy_struct.dstDevice = reinterpret_cast<CUdeviceptr>(frame_->data[0]);
-memcpy_struct.dstPitch = frame_->linesize[0];
-memcpy_struct.WidthInBytes = frame_->linesize[0];
-memcpy_struct.Height = frame_->height;
+memcpy_struct.dstDevice = reinterpret_cast<CUdeviceptr>(frame->data[0]);
+memcpy_struct.dstPitch = frame->linesize[0];
+memcpy_struct.WidthInBytes = frame->linesize[0];
+memcpy_struct.Height = frame->height;

if (auto const r = cuda.cuMemcpy2D_v2(&memcpy_struct); r != CUDA_SUCCESS) {
char const* err = "unknown";
@@ -57,13 +64,9 @@ auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
};
}

-frame_->pts = frame_number_++;
+frame->pts = frame_time * frame_number_++;

-send_frame(frame_.get(),
-codec_context_.get(),
-format_context_.get(),
-stream_.get(),
-packet_.get());
+encoder_.write_frame(std::move(encoder_frame));
}

} // namespace sc
14 changes: 7 additions & 7 deletions src/handlers/drm_video_frame_writer.hpp
@@ -3,24 +3,24 @@

#include "av.hpp"
#include "nvidia.hpp"
+#include "services/encoder.hpp"
+#include <cstdint>

namespace sc
{
struct DRMVideoFrameWriter
{
-DRMVideoFrameWriter(AVFormatContext* fmt_context,
-AVCodecContext* codec_context,
-AVStream* stream);
+DRMVideoFrameWriter(AVCodecContext* codec_context,
+AVStream* stream,
+Encoder encoder);

-auto operator()(CUarray, NvCuda const&) -> void;
+auto operator()(CUarray, NvCuda const&, std::uint64_t) -> void;

private:
-BorrowedPtr<AVFormatContext> format_context_;
BorrowedPtr<AVCodecContext> codec_context_;
BorrowedPtr<AVStream> stream_;
-FramePtr frame_;
+Encoder encoder_;
std::size_t frame_number_ { 0 };
-PacketPtr packet_;
};

} // namespace sc