Merge pull request #28 from gmbeard/feature/performance-improvements
feat(performance) Multiple performance improvements and other fixes
gmbeard authored Dec 9, 2023
2 parents 92b09ab + 52e6959 commit 67b6057
Showing 43 changed files with 974 additions and 254 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ build/
.private/
experimental/
dist/
report/
1 change: 1 addition & 0 deletions .versioning/changes/DyxXCHexJS.minor.md
@@ -0,0 +1 @@
Improves encoding performance by using a dedicated thread
2 changes: 1 addition & 1 deletion .versioning/changes/S9GbUeHXdB.minor.md
@@ -1 +1 @@
Adds a build-time option to enable timing metric collection for audio & video
Adds a build-time option to enable frame time metric collection
1 change: 1 addition & 0 deletions .versioning/changes/UaFV3jPgO3.minor.md
@@ -0,0 +1 @@
Reduces frame "jitter" when capturing certain games in X11
1 change: 1 addition & 0 deletions .versioning/changes/wekMseGz7z.patch.md
@@ -0,0 +1 @@
Fixes a memory leak in H/W encoding pipeline
19 changes: 7 additions & 12 deletions README.md
@@ -9,6 +9,7 @@ Typical screen capture utilities copy the framebuffer data between host and GPU
- [Building from source](#building-from-source)
- [Installing](#installing)
- [Alternative projects](#some-alternative-projects)
- [FAQ](doc/faq.md)

#### Example - Cyberpunk 2077
[![Cyberpunk 2077](http://i3.ytimg.com/vi/frXGxrdgTLY/hqdefault.jpg)](https://www.youtube.com/watch?v=frXGxrdgTLY)
@@ -26,21 +27,15 @@ The resulting media/container type is determined by the extension of `<OUTPUT FI

If no `OPTIONS` are specified then *Shadow Cast* will pick some sensible defaults for the audio/video encoders and sample/frame rates, but these can be changed by specifying the following `OPTIONS` on the command line...

- `-A <AUDIO ENCODER>` - All options available to `ffmpeg` should work here. Defaults to `libopus`
- `-V <VIDEO ENCODER>` - Available options are `h264_nvenc` and `hevc_nvenc`. Defaults to `hevc_nvenc`
- `-f <FRAME RATE>` - Values from `20` to `60` are accepted. Defaults to `60`
- `-s <SAMPLE RATE>` - Defaults to `48000` (_NOTE: Some encoders will only support certain sample rates. Shadow Cast will display an error if your chosen sample rate isn't supported_)
| Option | Description |
|--------- |------------ |
| `-A <AUDIO ENCODER>` | Audio encoder. All options available to `ffmpeg` should work here. Defaults to `libopus` |
| `-V <VIDEO ENCODER>` | Video encoder. Available options are `h264_nvenc` and `hevc_nvenc`. Defaults to `hevc_nvenc` |
| `-f <FRAMES PER SECOND>` | Capture FPS. Values from `20` to `70` are accepted. Defaults to `60` |
| `-s <SAMPLE RATE>` | Audio sample rate. Defaults to `48000` (_NOTE: Some encoders will only support certain sample rates. Shadow Cast will display an error if your chosen sample rate isn't supported_) |

Ctrl+C / SIGINT will stop the capture session and finalize the output media.

### Help. I'm getting the following error

#### `ERROR: Couldn't create NvFBC instance`
*Shadow Cast* uses the *NvFBC* facility to provide efficient, low-latency framebuffer capture on X11. By default, NVIDIA disables this on most (if not all) of its consumer-level GPUs. However, there are two ways around this restriction...

- You can find a utility to patch your NVIDIA drivers in the [keylase/nvidia-patch](https://github.com/keylase/nvidia-patch) GitHub repo.
- You can obtain a "key" to unlock this feature at runtime. The key can be set at runtime via the `SHADOW_CAST_NVFBC_KEY=<BASE64 ENCODED KEY>` environment variable. I use this method but I'm not sure how "official" it is, so no keys are provided in this repo. Feel free to message me about this.

### Requirements
- FFmpeg (libav)
- NVIDIA GPU, supporting NVENC and NvFBC
24 changes: 24 additions & 0 deletions doc/faq.md
@@ -0,0 +1,24 @@
### Q. Help, I'm getting the following error

```
ERROR: Couldn't create NvFBC instance
```

#### A.
*Shadow Cast* uses the *NvFBC* facility to provide efficient, low-latency framebuffer capture on X11. By default, NVIDIA disables this on most (if not all) of its consumer-level GPUs. However, there are two ways around this restriction...

- You can find a utility to patch your NVIDIA drivers in the [keylase/nvidia-patch](https://github.com/keylase/nvidia-patch) GitHub repo.
- You can obtain a "key" to unlock this feature at runtime. The key can be set at runtime via the `SHADOW_CAST_NVFBC_KEY=<BASE64 ENCODED KEY>` environment variable. I use this method but I'm not sure how "official" it is, so no keys are provided in this repo. Feel free to message me about this.

### Q. I'm capturing gameplay footage on X11 and my game is "laggy"

#### A.
In some games, the NVIDIA Capture library (*NvFBC*) appears to interact poorly with v-sync if your refresh rate matches the capture FPS. Try disabling v-sync. Another option you can try is to set the following environment variable when capturing...

```
$ SHADOW_CAST_STRICT_FPS=0 shadow-cast ...
```

Note, however, that this option changes the output video's frame rate to the closest rate NvFBC can sample at (e.g. `62.5` fps in the case of a 60fps capture).

I haven't worked out the definitive cause of this "lagginess", but I suspect it is because NvFBC only accepts whole-millisecond values for its sampling interval, so it cannot exactly match *Shadow Cast*'s frame rate. For example, 60fps requires a fractional sampling interval of about `16.667` milliseconds, but NvFBC only allows either `16` or `17` milliseconds.
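
The arithmetic behind the `16` / `17` millisecond example can be sketched as below. This is only an illustration: `IntervalChoice`, `bracketing_intervals` and `effective_fps` are hypothetical names that do not exist in *Shadow Cast* or in NvFBC, and the sketch assumes nothing about NvFBC beyond the whole-millisecond restriction described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of Shadow Cast or NvFBC): the two
// whole-millisecond sampling intervals that bracket the ideal frame
// interval for a requested frame rate.
struct IntervalChoice
{
    std::uint32_t lower_ms;
    std::uint32_t upper_ms;
};

// For 60fps the ideal interval is 1000 / 60 = 16.666... ms, so the only
// whole-millisecond candidates are 16 ms and 17 ms.
constexpr auto bracketing_intervals(std::uint32_t fps) -> IntervalChoice
{
    assert(fps > 0);
    std::uint32_t const lower = 1000 / fps;
    // If 1000 divides evenly, the ideal interval is already a whole number.
    std::uint32_t const upper = (1000 % fps == 0) ? lower : lower + 1;
    return IntervalChoice { lower, upper };
}

// Frame rate that actually results from sampling every `interval_ms`.
constexpr auto effective_fps(std::uint32_t interval_ms) -> double
{
    return 1000.0 / interval_ms;
}
```

Calling `bracketing_intervals(60)` yields `16` and `17`, and `effective_fps(16)` is `62.5`, which is the scaled frame rate mentioned above. A frame rate such as 50fps divides evenly (20 ms), so both bounds coincide and no mismatch arises.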
5 changes: 4 additions & 1 deletion src/CMakeLists.txt
@@ -35,17 +35,20 @@ add_library(shadow-cast-obj
services/audio_service.cpp
services/context.cpp
services/drm_video_service.cpp
services/encoder.cpp
services/encoder_service.cpp
services/metrics_service.cpp
services/readiness.cpp
services/service.cpp
services/service_registry.cpp
services/signal_service.cpp
services/video_service.cpp

utils/base64.cpp
utils/cmd_line.cpp
utils/contracts.cpp
utils/base64.cpp
utils/elapsed.cpp
utils/frame_time.cpp
utils/result.cpp

error.cpp
19 changes: 6 additions & 13 deletions src/av/codec.cpp
@@ -18,7 +18,7 @@ auto create_video_encoder(std::string const& encoder_name,
CUcontext cuda_ctx,
AVBufferPool* pool,
VideoOutputSize size,
std::uint32_t fps,
FrameTime const& ft,
AVPixelFormat pixel_format) -> sc::CodecContextPtr
{
sc::BorrowedPtr<AVCodec const> video_encoder { avcodec_find_encoder_by_name(
@@ -30,10 +30,10 @@ auto create_video_encoder(std::string const& encoder_name,
sc::CodecContextPtr video_encoder_context { avcodec_alloc_context3(
video_encoder.get()) };
video_encoder_context->codec_id = video_encoder->id;
video_encoder_context->time_base.num = 1;
video_encoder_context->time_base.den = fps;
video_encoder_context->framerate.num = fps;
video_encoder_context->framerate.den = 1;
auto const timebase = ft.per_second_ratio();
video_encoder_context->time_base = timebase;
video_encoder_context->framerate.num = timebase.den;
video_encoder_context->framerate.den = timebase.num;
video_encoder_context->sample_aspect_ratio.num = 0;
video_encoder_context->sample_aspect_ratio.den = 0;
video_encoder_context->max_b_frames = 0;
@@ -69,9 +69,6 @@ auto create_video_encoder(std::string const& encoder_name,
hw_frame_context->height = video_encoder_context->height;
hw_frame_context->sw_format = pixel_format;
hw_frame_context->format = video_encoder_context->pix_fmt;
hw_frame_context->device_ref = av_buffer_ref(device_ctx.get());
hw_frame_context->device_ctx =
reinterpret_cast<AVHWDeviceContext*>(device_ctx->data);

hw_frame_context->pool = pool;
hw_frame_context->initial_pool_size = 1;
@@ -81,11 +78,7 @@ auto create_video_encoder(std::string const& encoder_name,
av_error_to_string(ret) };
}

/* TODO: Are we doing this correctly? It seems to be the
* source of memory leak...
*/
video_encoder_context->hw_device_ctx = device_ctx.release();
video_encoder_context->hw_frames_ctx = frame_context.release();
video_encoder_context->hw_frames_ctx = av_buffer_ref(frame_context.get());

AVDictionary* options = nullptr;
av_dict_set_int(&options, "qp", 21, 0);
2 changes: 1 addition & 1 deletion src/av/codec.hpp
@@ -25,7 +25,7 @@ auto create_video_encoder(std::string const& encoder_name,
CUcontext cuda_ctx,
AVBufferPool* pool,
VideoOutputSize size,
std::uint32_t fps,
FrameTime const& ft,
AVPixelFormat pixel_format) -> sc::CodecContextPtr;
} // namespace sc

26 changes: 11 additions & 15 deletions src/handlers/audio_chunk_writer.cpp
@@ -3,6 +3,7 @@
#include "av/sample_format.hpp"
#include "config.hpp"
#include "error.hpp"
#include "services/encoder.hpp"
#include <algorithm>
#include <cassert>
#include <memory>
@@ -11,15 +12,14 @@
namespace sc
{

ChunkWriter::ChunkWriter(AVFormatContext* format_context,
AVCodecContext* codec_context,
AVStream* stream) noexcept
: format_context_ { format_context }
, codec_context_ { codec_context }
ChunkWriter::ChunkWriter(AVCodecContext* codec_context,
AVStream* stream,
Encoder encoder) noexcept
: codec_context_ { codec_context }
, stream_ { stream }
, encoder_ { encoder }
, frame_ { av_frame_alloc() }
, total_samples_written_ { 0 }
, packet_ { av_packet_alloc() }
{
}

@@ -32,7 +32,9 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
auto const sample_size = sc::sample_format_size(sample_format);
auto const interleaved = sc::is_interleaved_format(sample_format);

sc::BorrowedPtr<AVFrame> frame = frame_.get();
auto encoder_frame =
encoder_.prepare_frame(codec_context_.get(), stream_.get());
auto* frame = encoder_frame->frame.get();

frame->nb_samples = chunk.sample_count;
frame->format = codec_context_->sample_fmt;
@@ -46,9 +48,7 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
frame->pts = total_samples_written_;
total_samples_written_ += frame->nb_samples;

sc::initialize_writable_buffer(frame.get());

AVFrameUnrefGuard unref_guard { frame };
sc::initialize_writable_buffer(frame);

auto n = 0;
for (auto const& channel_buffer : chunk.channel_buffers()) {
@@ -63,11 +63,7 @@ auto ChunkWriter::operator()(MediaChunk const& chunk) -> void
std::copy(begin(source), end(source), begin(target));
}

send_frame(frame.get(),
codec_context_.get(),
format_context_.get(),
stream_.get(),
packet_.get());
encoder_.write_frame(std::move(encoder_frame));
}

} // namespace sc
10 changes: 5 additions & 5 deletions src/handlers/audio_chunk_writer.hpp
@@ -2,26 +2,26 @@
#define SHADOW_CAST_HANDLERS_AUDIO_CHUNK_WRITER_HPP_INCLUDED

#include "av.hpp"
#include "services/encoder.hpp"
#include "utils/borrowed_ptr.hpp"

namespace sc
{

struct ChunkWriter
{
explicit ChunkWriter(AVFormatContext* format_context,
AVCodecContext* codec_context,
AVStream* stream) noexcept;
explicit ChunkWriter(AVCodecContext* codec_context,
AVStream* stream,
Encoder encoder) noexcept;

auto operator()(MediaChunk const& chunk) -> void;

private:
BorrowedPtr<AVFormatContext> format_context_;
BorrowedPtr<AVCodecContext> codec_context_;
BorrowedPtr<AVStream> stream_;
Encoder encoder_;
FramePtr frame_;
std::size_t total_samples_written_ { 0 };
PacketPtr packet_;
};

} // namespace sc
69 changes: 36 additions & 33 deletions src/handlers/drm_video_frame_writer.cpp
@@ -1,38 +1,45 @@
#include "handlers/drm_video_frame_writer.hpp"
#include "services/encoder.hpp"
#include "utils/elapsed.hpp"

namespace sc
{

DRMVideoFrameWriter::DRMVideoFrameWriter(AVFormatContext* fmt_context,
AVCodecContext* codec_context,
AVStream* stream)
: format_context_ { fmt_context }
, codec_context_ { codec_context }
DRMVideoFrameWriter::DRMVideoFrameWriter(AVCodecContext* codec_context,
AVStream* stream,
Encoder encoder)
: codec_context_ { codec_context }
, stream_ { stream }
, frame_ { av_frame_alloc() }
, packet_ { av_packet_alloc() }
, encoder_ { encoder }
{
frame_->format = codec_context_->pix_fmt;
frame_->width = codec_context_->width;
frame_->height = codec_context_->height;
frame_->color_range = codec_context_->color_range;
frame_->color_primaries = codec_context_->color_primaries;
frame_->color_trc = codec_context_->color_trc;
frame_->colorspace = codec_context_->colorspace;
frame_->chroma_location = codec_context_->chroma_sample_location;
if (auto const r = av_hwframe_get_buffer(
codec_context_->hw_frames_ctx, frame_.get(), 0);
r < 0)
throw std::runtime_error { "Failed to get H/W frame buffer" };
}

auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
auto DRMVideoFrameWriter::operator()(CUarray data,
NvCuda const& cuda,
std::uint64_t frame_time) -> void
{
SC_EXPECT(data);
SC_EXPECT(frame_->linesize[0]);
SC_EXPECT(frame_->height);
SC_EXPECT(frame_->data[0]);

auto encoder_frame =
encoder_.prepare_frame(codec_context_.get(), stream_.get());
auto* frame = encoder_frame->frame.get();

frame->format = codec_context_->pix_fmt;
frame->width = codec_context_->width;
frame->height = codec_context_->height;
frame->color_range = codec_context_->color_range;
frame->color_primaries = codec_context_->color_primaries;
frame->color_trc = codec_context_->color_trc;
frame->colorspace = codec_context_->colorspace;
frame->chroma_location = codec_context_->chroma_sample_location;
if (auto const r =
av_hwframe_get_buffer(codec_context_->hw_frames_ctx, frame, 0);
r < 0)
throw std::runtime_error { "Failed to get H/W frame buffer" };

SC_EXPECT(frame->linesize[0]);
SC_EXPECT(frame->height);
SC_EXPECT(frame->data[0]);

CUDA_MEMCPY2D memcpy_struct {};

@@ -43,10 +50,10 @@ auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
memcpy_struct.dstY = 0;
memcpy_struct.dstMemoryType = CU_MEMORYTYPE_DEVICE;
memcpy_struct.srcArray = data;
memcpy_struct.dstDevice = reinterpret_cast<CUdeviceptr>(frame_->data[0]);
memcpy_struct.dstPitch = frame_->linesize[0];
memcpy_struct.WidthInBytes = frame_->linesize[0];
memcpy_struct.Height = frame_->height;
memcpy_struct.dstDevice = reinterpret_cast<CUdeviceptr>(frame->data[0]);
memcpy_struct.dstPitch = frame->linesize[0];
memcpy_struct.WidthInBytes = frame->linesize[0];
memcpy_struct.Height = frame->height;

if (auto const r = cuda.cuMemcpy2D_v2(&memcpy_struct); r != CUDA_SUCCESS) {
char const* err = "unknown";
@@ -57,13 +64,9 @@ auto DRMVideoFrameWriter::operator()(CUarray data, NvCuda const& cuda) -> void
};
}

frame_->pts = frame_number_++;
frame->pts = frame_time * frame_number_++;

send_frame(frame_.get(),
codec_context_.get(),
format_context_.get(),
stream_.get(),
packet_.get());
encoder_.write_frame(std::move(encoder_frame));
}

} // namespace sc
14 changes: 7 additions & 7 deletions src/handlers/drm_video_frame_writer.hpp
@@ -3,24 +3,24 @@

#include "av.hpp"
#include "nvidia.hpp"
#include "services/encoder.hpp"
#include <cstdint>

namespace sc
{
struct DRMVideoFrameWriter
{
DRMVideoFrameWriter(AVFormatContext* fmt_context,
AVCodecContext* codec_context,
AVStream* stream);
DRMVideoFrameWriter(AVCodecContext* codec_context,
AVStream* stream,
Encoder encoder);

auto operator()(CUarray, NvCuda const&) -> void;
auto operator()(CUarray, NvCuda const&, std::uint64_t) -> void;

private:
BorrowedPtr<AVFormatContext> format_context_;
BorrowedPtr<AVCodecContext> codec_context_;
BorrowedPtr<AVStream> stream_;
FramePtr frame_;
Encoder encoder_;
std::size_t frame_number_ { 0 };
PacketPtr packet_;
};

} // namespace sc