Removing internal_subset leaks memory #1784

Closed
@stevecheckoway

Description

Removing an HTML document's internal_subset leaks memory.

What's the output from nokogiri -v?

# Nokogiri (1.7.0.1)
    ---
    warnings: []
    nokogiri: 1.7.0.1
    ruby:
      version: 2.4.4
      platform: x86_64-darwin17
      description: ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-darwin17]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/steve/programming/nokogumbo/vendor/bundle/ruby/2.4.0/gems/nokogiri-1.7.0.1/ports/x86_64-apple-darwin17.4.0/libxml2/2.9.4"
      libxslt_path: "/Users/steve/programming/nokogumbo/vendor/bundle/ruby/2.4.0/gems/nokogiri-1.7.0.1/ports/x86_64-apple-darwin17.4.0/libxslt/1.1.29"
      libxml2_patches: []
      libxslt_patches: []
      compiled: 2.9.4
      loaded: 2.9.4

Can you provide a self-contained script that reproduces what you're seeing?

#!/usr/bin/env ruby
# encoding: utf-8
require 'nokogiri'

1_000_000.times do |i|
  doc = Nokogiri::HTML::Document.new
  doc.internal_subset.remove
end

Here are the two leaks. Both allocations come from xmlStrdup calls inside xmlCreateIntSubset, reached through htmlNewDoc.

Leak: 0x7fddedc00250  size=16  zone: DefaultMallocZone_0x10cdff000
        Call stack: [thread 0x7fff98b0b380]: | 0x7fff60324015 (libdyld.dylib) start | 0x10cab9f3b (ruby) main | 0x10cb0d71d (libruby.2.4.dylib) ruby_run_node | 0x10cb0d7ec (libruby.2.4.dylib) ruby_exec_internal | 0x10cc02df4 (libruby.2.4.dylib) vm_exec | 0x10cbf7158 (libruby.2.4.dylib) vm_exec_core | 0x10cc06729 (libruby.2.4.dylib) vm_call_cfunc | 0x10cb52dbb (libruby.2.4.dylib) int_dotimes | 0x10cbff937 (libruby.2.4.dylib) rb_yield_1 | 0x10cc0c0a2 (libruby.2.4.dylib) invoke_block_from_c_splattable | 0x10cc02df4 (libruby.2.4.dylib) vm_exec | 0x10cbf7781 (libruby.2.4.dylib) vm_exec_core | 0x10cc06729 (libruby.2.4.dylib) vm_call_cfunc | 0x10cf24e2a (nokogiri.bundle) new | 0x10cfb4e99 (nokogiri.bundle) htmlNewDoc | 0x10cfb4c5d (nokogiri.bundle) htmlNewDocNoDtD | 0x10cf818c7 (nokogiri.bundle) xmlCreateIntSubset | 0x10d016e6a (nokogiri.bundle) xmlStrdup | 0x10d016d8d (nokogiri.bundle) xmlStrndup | 0x10cb22de2 (libruby.2.4.dylib) objspace_xmalloc0 | 0x7fff604cc4c7 (libsystem_malloc.dylib) malloc | 0x7fff604cd1e1 (libsystem_malloc.dylib) malloc_zone_malloc

Leak: 0x7fddedc00450  size=48  zone: DefaultMallocZone_0x10cdff000
        Call stack: [thread 0x7fff98b0b380]: | 0x7fff60324015 (libdyld.dylib) start | 0x10cab9f3b (ruby) main | 0x10cb0d71d (libruby.2.4.dylib) ruby_run_node | 0x10cb0d7ec (libruby.2.4.dylib) ruby_exec_internal | 0x10cc02df4 (libruby.2.4.dylib) vm_exec | 0x10cbf7158 (libruby.2.4.dylib) vm_exec_core | 0x10cc06729 (libruby.2.4.dylib) vm_call_cfunc | 0x10cb52dbb (libruby.2.4.dylib) int_dotimes | 0x10cbff937 (libruby.2.4.dylib) rb_yield_1 | 0x10cc0c0a2 (libruby.2.4.dylib) invoke_block_from_c_splattable | 0x10cc02df4 (libruby.2.4.dylib) vm_exec | 0x10cbf7781 (libruby.2.4.dylib) vm_exec_core | 0x10cc06729 (libruby.2.4.dylib) vm_call_cfunc | 0x10cf24e2a (nokogiri.bundle) new | 0x10cfb4e99 (nokogiri.bundle) htmlNewDoc | 0x10cfb4c5d (nokogiri.bundle) htmlNewDocNoDtD | 0x10cf819aa (nokogiri.bundle) xmlCreateIntSubset | 0x10d016e6a (nokogiri.bundle) xmlStrdup | 0x10d016d8d (nokogiri.bundle) xmlStrndup | 0x10cb22de2 (libruby.2.4.dylib) objspace_xmalloc0 | 0x7fff604cc4c7 (libsystem_malloc.dylib) malloc | 0x7fff604cd1e1 (libsystem_malloc.dylib) malloc_zone_malloc

This is causing problems for users of Nokogumbo (see rubys/nokogumbo#20).
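
For anyone who wants to confirm the growth without a leak-checking tool, here is a minimal sketch that measures how much the process's resident set size changes while the reproduction loop runs. The helper names `rss_kb` and `rss_growth_kb` are hypothetical (they are not part of Nokogiri), and the iteration count is an arbitrary choice. RSS is a coarse signal, so treat the number as an indication rather than an exact leak size.

```ruby
# Resident set size of this process in kilobytes.
# Reads /proc on Linux and falls back to `ps` elsewhere (e.g. macOS).
def rss_kb
  if File.readable?('/proc/self/status')
    File.read('/proc/self/status')[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

# Hypothetical helper: runs the block `iterations` times and returns the
# change in RSS (kB) between the start and the end of the run.
def rss_growth_kb(iterations)
  GC.start
  before = rss_kb
  iterations.times { yield }
  GC.start
  rss_kb - before
end

begin
  require 'nokogiri'
rescue LoadError
  warn 'nokogiri is not installed; skipping the measurement'
else
  # Same loop body as the reproduction script above.
  growth = rss_growth_kb(100_000) do
    doc = Nokogiri::HTML::Document.new
    doc.internal_subset.remove
  end
  puts "RSS changed by about #{growth} kB over 100,000 iterations"
end
```

Running the same measurement with the `doc.internal_subset.remove` line commented out gives a baseline to compare against.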
