EFF Reduce the size of shared objects of the C-extensions generated by Cython #27767
Description
Context
scikit-learn uses C-extensions in critical parts of its implementation via Cython.
Each C-extension is built from one or several Cython translation units (a .pyx file with an optional .pxd companion file).
In scikit-learn, each C-extension build consists of a single Cython translation unit, which is transpiled to a C or C++ translation unit and then compiled to a shared object file.
The resulting C or C++ translation unit contains the code translated from Cython, plus a large preamble and epilogue of macros, functions, structs, and global variables such as virtual tables, the Python module definition, etc.
For instance, while sklearn/utils/_heap.pyx consists of fewer than 100 lines for a single function, the resulting sklearn/utils/heap.c file consists of more than 3500 lines, most of which are the preamble and epilogue injected by Cython:
Content of the generated sklearn/utils/heap.c
▾ macros
-CYTHON_ABI
-CYTHON_ASSUME_SAFE_MACROS
-CYTHON_ASSUME_SAFE_MACROS
-CYTHON_ASSUME_SAFE_MACROS
-CYTHON_ASSUME_SAFE_MACROS
-CYTHON_AVOID_BORROWED_REFS
-CYTHON_AVOID_BORROWED_REFS
-CYTHON_AVOID_BORROWED_REFS
-CYTHON_AVOID_BORROWED_REFS
-CYTHON_COMPILING_IN_CPYTHON
-CYTHON_COMPILING_IN_CPYTHON
-CYTHON_COMPILING_IN_CPYTHON
-CYTHON_COMPILING_IN_CPYTHON
-CYTHON_COMPILING_IN_NOGIL
-CYTHON_COMPILING_IN_NOGIL
-CYTHON_COMPILING_IN_NOGIL
-CYTHON_COMPILING_IN_NOGIL
-CYTHON_COMPILING_IN_PYPY
-CYTHON_COMPILING_IN_PYPY
-CYTHON_COMPILING_IN_PYPY
-CYTHON_COMPILING_IN_PYPY
-CYTHON_COMPILING_IN_PYSTON
-CYTHON_COMPILING_IN_PYSTON
-CYTHON_COMPILING_IN_PYSTON
-CYTHON_COMPILING_IN_PYSTON
-CYTHON_FALLTHROUGH
-CYTHON_FALLTHROUGH
-CYTHON_FALLTHROUGH
-CYTHON_FALLTHROUGH
-CYTHON_FALLTHROUGH
-CYTHON_FALLTHROUGH
-CYTHON_FAST_PYCALL
-CYTHON_FAST_PYCALL
-CYTHON_FAST_PYCALL
-CYTHON_FAST_PYCALL
-CYTHON_FAST_PYCCALL
-CYTHON_FAST_THREAD_STATE
-CYTHON_FAST_THREAD_STATE
-CYTHON_FAST_THREAD_STATE
-CYTHON_FAST_THREAD_STATE
-CYTHON_FAST_THREAD_STATE
-CYTHON_FORMAT_SSIZE_T
-CYTHON_FUTURE_DIVISION
-CYTHON_HEX_VERSION
-CYTHON_INLINE
-CYTHON_INLINE
-CYTHON_INLINE
-CYTHON_INLINE
-CYTHON_INLINE
-CYTHON_MAYBE_UNUSED_VAR(x)
-CYTHON_NCP_UNUSED
-CYTHON_NCP_UNUSED
-CYTHON_PEP393_ENABLED
-CYTHON_PEP393_ENABLED
-CYTHON_PEP489_MULTI_PHASE_INIT
-CYTHON_PEP489_MULTI_PHASE_INIT
-CYTHON_PEP489_MULTI_PHASE_INIT
-CYTHON_PEP489_MULTI_PHASE_INIT
-CYTHON_PEP489_MULTI_PHASE_INIT
-CYTHON_REFNANNY
-CYTHON_RESTRICT
-CYTHON_RESTRICT
-CYTHON_RESTRICT
-CYTHON_RESTRICT
-CYTHON_SMALL_CODE
-CYTHON_SMALL_CODE
-CYTHON_SMALL_CODE
-CYTHON_UNPACK_METHODS
-CYTHON_UNPACK_METHODS
-CYTHON_UNPACK_METHODS
-CYTHON_UNPACK_METHODS
-CYTHON_UNUSED
-CYTHON_UNUSED
-CYTHON_UNUSED
-CYTHON_UNUSED
-CYTHON_UPDATE_DESCRIPTOR_DOC
-CYTHON_UPDATE_DESCRIPTOR_DOC
-CYTHON_UPDATE_DESCRIPTOR_DOC
-CYTHON_USE_ASYNC_SLOTS
-CYTHON_USE_ASYNC_SLOTS
-CYTHON_USE_ASYNC_SLOTS
-CYTHON_USE_ASYNC_SLOTS
-CYTHON_USE_ASYNC_SLOTS
-CYTHON_USE_ASYNC_SLOTS
-CYTHON_USE_DICT_VERSIONS
-CYTHON_USE_DICT_VERSIONS
-CYTHON_USE_DICT_VERSIONS
-CYTHON_USE_DICT_VERSIONS
-CYTHON_USE_EXC_INFO_STACK
-CYTHON_USE_EXC_INFO_STACK
-CYTHON_USE_EXC_INFO_STACK
-CYTHON_USE_EXC_INFO_STACK
-CYTHON_USE_EXC_INFO_STACK
-CYTHON_USE_PYLIST_INTERNALS
-CYTHON_USE_PYLIST_INTERNALS
-CYTHON_USE_PYLIST_INTERNALS
-CYTHON_USE_PYLIST_INTERNALS
-CYTHON_USE_PYLONG_INTERNALS
-CYTHON_USE_PYLONG_INTERNALS
-CYTHON_USE_PYLONG_INTERNALS
-CYTHON_USE_PYLONG_INTERNALS
-CYTHON_USE_PYLONG_INTERNALS
-CYTHON_USE_PYTYPE_LOOKUP
-CYTHON_USE_PYTYPE_LOOKUP
-CYTHON_USE_PYTYPE_LOOKUP
-CYTHON_USE_PYTYPE_LOOKUP
-CYTHON_USE_PYTYPE_LOOKUP
-CYTHON_USE_TP_FINALIZE
-CYTHON_USE_TP_FINALIZE
-CYTHON_USE_TP_FINALIZE
-CYTHON_USE_TP_FINALIZE
-CYTHON_USE_TYPE_SLOTS
-CYTHON_USE_TYPE_SLOTS
-CYTHON_USE_TYPE_SLOTS
-CYTHON_USE_TYPE_SLOTS
-CYTHON_USE_UNICODE_INTERNALS
-CYTHON_USE_UNICODE_INTERNALS
-CYTHON_USE_UNICODE_INTERNALS
-CYTHON_USE_UNICODE_INTERNALS
-CYTHON_USE_UNICODE_WRITER
-CYTHON_USE_UNICODE_WRITER
-CYTHON_USE_UNICODE_WRITER
-CYTHON_USE_UNICODE_WRITER
-CYTHON_USE_UNICODE_WRITER
-CYTHON_WITHOUT_ASSERTIONS
-DL_EXPORT(t)
-DL_IMPORT(t)
-HAVE_LONG_LONG
-METH_FASTCALL
-METH_STACKLESS
-PY_LONG_LONG
-PY_SSIZE_T_CLEAN
-PyBaseString_Type
-PyBoolObject
-PyByteArray_Check(obj)
-PyIntObject
-PyInt_AS_LONG
-PyInt_AsLong
-PyInt_AsSsize_t
-PyInt_AsUnsignedLongLongMask
-PyInt_AsUnsignedLongMask
-PyInt_Check(op)
-PyInt_CheckExact(op)
-PyInt_FromLong
-PyInt_FromSize_t
-PyInt_FromSsize_t
-PyInt_FromString
-PyInt_FromUnicode
-PyInt_Type
-PyMem_RawFree(p)
-PyMem_RawMalloc(n)
-PyMem_RawRealloc(p,n)
-PyNumber_Int
-PyObject_ASCII(o)
-PyObject_Format(obj,fmt)
-PyObject_Free(p)
-PyObject_Malloc(s)
-PyObject_Realloc(p)
-PyObject_Unicode
-PySet_CheckExact(obj)
-PyStringObject
-PyString_Check
-PyString_CheckExact
-PyString_Type
-PyUnicode_1BYTE_KIND
-PyUnicode_2BYTE_KIND
-PyUnicode_4BYTE_KIND
-PyUnicode_Contains(u,s)
-PyUnicode_InternFromString(s)
-Py_BUILD_CORE
-Py_HUGE_VAL
-Py_TPFLAGS_CHECKTYPES
-Py_TPFLAGS_HAVE_FINALIZE
-Py_TPFLAGS_HAVE_INDEX
-Py_TPFLAGS_HAVE_NEWBUFFER
-Py_tss_NEEDS_INIT
-_USE_MATH_DEFINES
-__PYX_BUILD_PY_SSIZE_T
-__PYX_COMMA
-__PYX_DEFAULT_STRING_ENCODING
-__PYX_DEFAULT_STRING_ENCODING_IS_ASCII
-__PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT
-__PYX_DEFAULT_STRING_ENCODING_IS_UTF8
-__PYX_DICT_VERSION_INIT
-__PYX_ERR(f_index,lineno,Ln_error)
-__PYX_EXTERN_C
-__PYX_EXTERN_C
-__PYX_GET_DICT_VERSION(dict)
-__PYX_GET_DICT_VERSION(dict)
-__PYX_HAVE_API__sklearn__utils___heap
-__PYX_HAVE__sklearn__utils___heap
-__PYX_MARK_ERR_POS(f_index,lineno)
-__PYX_NAN()
-__PYX_PY_DICT_LOOKUP_IF_MODIFIED(VAR,DICT,LOOKUP)
-__PYX_PY_DICT_LOOKUP_IF_MODIFIED(VAR,DICT,LOOKUP)
-__PYX_UPDATE_DICT_CACHE(dict,value,cache_var,version_var)
-__PYX_UPDATE_DICT_CACHE(dict,value,cache_var,version_var)
-__PYX_VERIFY_RETURN_INT(target_type,func_type,func_value)
-__PYX_VERIFY_RETURN_INT_EXC(target_type,func_type,func_value)
-__PYX__VERIFY_RETURN_INT(target_type,func_type,func_value,exc)
-__Pyx_BUILTIN_MODULE_NAME
-__Pyx_BUILTIN_MODULE_NAME
-__Pyx_CLEAR(r)
-__Pyx_CLineForTraceback(tstate,c_line)
-__Pyx_DECREF(r)
-__Pyx_DECREF(r)
-__Pyx_DECREF_SET(r,v)
-__Pyx_DefaultClassType
-__Pyx_DefaultClassType
-__Pyx_DefaultClassType
-__Pyx_ErrFetch(type,value,tb)
-__Pyx_ErrFetch(type,value,tb)
-__Pyx_ErrFetchInState(tstate,type,value,tb)
-__Pyx_ErrFetchWithState(type,value,tb)
-__Pyx_ErrFetchWithState(type,value,tb)
-__Pyx_ErrRestore(type,value,tb)
-__Pyx_ErrRestore(type,value,tb)
-__Pyx_ErrRestoreInState(tstate,type,value,tb)
-__Pyx_ErrRestoreWithState(type,value,tb)
-__Pyx_ErrRestoreWithState(type,value,tb)
-__Pyx_GIVEREF(r)
-__Pyx_GIVEREF(r)
-__Pyx_GOTREF(r)
-__Pyx_GOTREF(r)
-__Pyx_HAS_GCC_DIAGNOSTIC
-__Pyx_INCREF(r)
-__Pyx_INCREF(r)
-__Pyx_MODULE_NAME
-__Pyx_NewRef(obj)
-__Pyx_Owned_Py_None(b)
-__Pyx_PyAsyncMethodsStruct
-__Pyx_PyBaseString_Check(obj)
-__Pyx_PyBaseString_Check(obj)
-__Pyx_PyBaseString_CheckExact(obj)
-__Pyx_PyBaseString_CheckExact(obj)
-__Pyx_PyByteArray_FromCString(s)
-__Pyx_PyByteArray_FromString(s)
-__Pyx_PyByteArray_FromStringAndSize(s,l)
-__Pyx_PyBytes_AsSString(s)
-__Pyx_PyBytes_AsString(s)
-__Pyx_PyBytes_AsUString(s)
-__Pyx_PyBytes_AsWritableSString(s)
-__Pyx_PyBytes_AsWritableString(s)
-__Pyx_PyBytes_AsWritableUString(s)
-__Pyx_PyBytes_FromCString(s)
-__Pyx_PyBytes_FromString
-__Pyx_PyBytes_FromStringAndSize
-__Pyx_PyCFunctionFast
-__Pyx_PyCFunctionFastWithKeywords
-__Pyx_PyCode_HasFreeVars(co)
-__Pyx_PyCode_HasFreeVars(co)
-__Pyx_PyCode_New(a,k,l,s,f,code,c,n,v,fv,cell,fn,name,fline,lnos)
-__Pyx_PyCode_New(a,k,l,s,f,code,c,n,v,fv,cell,fn,name,fline,lnos)
-__Pyx_PyDict_GetItemStr(dict,name)
-__Pyx_PyDict_GetItemStr(dict,name)
-__Pyx_PyDict_NewPresized(n)
-__Pyx_PyDict_NewPresized(n)
-__Pyx_PyErr_Clear()
-__Pyx_PyErr_Clear()
-__Pyx_PyErr_GivenExceptionMatches(err,type)
-__Pyx_PyErr_GivenExceptionMatches2(err,type1,type2)
-__Pyx_PyErr_Occurred()
-__Pyx_PyErr_Occurred()
-__Pyx_PyErr_SetNone(exc)
-__Pyx_PyErr_SetNone(exc)
-__Pyx_PyErr_SetNone(exc)
-__Pyx_PyException_Check(obj)
-__Pyx_PyFastCFunction_Check(func)
-__Pyx_PyFastCFunction_Check(func)
-__Pyx_PyFrame_SetLineNumber(frame,lineno)
-__Pyx_PyFrame_SetLineNumber(frame,lineno)
-__Pyx_PyInt_AsHash_t
-__Pyx_PyInt_AsHash_t
-__Pyx_PyInt_FromHash_t
-__Pyx_PyInt_FromHash_t
-__Pyx_PyMODINIT_FUNC
-__Pyx_PyMODINIT_FUNC
-__Pyx_PyMODINIT_FUNC
-__Pyx_PyMODINIT_FUNC
-__Pyx_PyMODINIT_FUNC
-__Pyx_PyMethod_New(func,self,klass)
-__Pyx_PyMethod_New(func,self,klass)
-__Pyx_PyNumber_Divide(x,y)
-__Pyx_PyNumber_Divide(x,y)
-__Pyx_PyNumber_Float(x)
-__Pyx_PyNumber_InPlaceDivide(x,y)
-__Pyx_PyNumber_InPlaceDivide(x,y)
-__Pyx_PyNumber_Int(x)
-__Pyx_PyNumber_Int(x)
-__Pyx_PyObject_AsSString(s)
-__Pyx_PyObject_AsUString(s)
-__Pyx_PyObject_AsWritableSString(s)
-__Pyx_PyObject_AsWritableString(s)
-__Pyx_PyObject_AsWritableUString(s)
-__Pyx_PyObject_FromCString(s)
-__Pyx_PyObject_FromString
-__Pyx_PyObject_FromStringAndSize
-__Pyx_PyObject_GC_IsFinalized(o)
-__Pyx_PyObject_GC_IsFinalized(o)
-__Pyx_PyObject_GetAttrStr(o,n)
-__Pyx_PySequence_SIZE(seq)
-__Pyx_PySequence_SIZE(seq)
-__Pyx_PySequence_Tuple(obj)
-__Pyx_PyStr_FromCString(s)
-__Pyx_PyStr_FromString
-__Pyx_PyStr_FromString
-__Pyx_PyStr_FromStringAndSize
-__Pyx_PyStr_FromStringAndSize
-__Pyx_PyString_Format(a,b)
-__Pyx_PyString_Format(a,b)
-__Pyx_PyString_FormatSafe(a,b)
-__Pyx_PyThreadState_Current
-__Pyx_PyThreadState_Current
-__Pyx_PyThreadState_Current
-__Pyx_PyThreadState_Current
-__Pyx_PyThreadState_assign
-__Pyx_PyThreadState_assign
-__Pyx_PyThreadState_declare
-__Pyx_PyThreadState_declare
-__Pyx_PyType_AsAsync(obj)
-__Pyx_PyType_AsAsync(obj)
-__Pyx_PyType_AsAsync(obj)
-__Pyx_PyUnicode_AsUnicode
-__Pyx_PyUnicode_Concat(a,b)
-__Pyx_PyUnicode_Concat(a,b)
-__Pyx_PyUnicode_ConcatSafe(a,b)
-__Pyx_PyUnicode_ConcatSafe(a,b)
-__Pyx_PyUnicode_DATA(u)
-__Pyx_PyUnicode_DATA(u)
-__Pyx_PyUnicode_FormatSafe(a,b)
-__Pyx_PyUnicode_FromCString(s)
-__Pyx_PyUnicode_FromStringAndSize(c_str,size)
-__Pyx_PyUnicode_FromStringAndSize(c_str,size)
-__Pyx_PyUnicode_FromUnicode(u)
-__Pyx_PyUnicode_FromUnicodeAndLength
-__Pyx_PyUnicode_GET_LENGTH(u)
-__Pyx_PyUnicode_GET_LENGTH(u)
-__Pyx_PyUnicode_IS_TRUE(u)
-__Pyx_PyUnicode_IS_TRUE(u)
-__Pyx_PyUnicode_IS_TRUE(u)
-__Pyx_PyUnicode_IS_TRUE(u)
-__Pyx_PyUnicode_KIND(u)
-__Pyx_PyUnicode_KIND(u)
-__Pyx_PyUnicode_MAX_CHAR_VALUE(u)
-__Pyx_PyUnicode_MAX_CHAR_VALUE(u)
-__Pyx_PyUnicode_READ(k,d,i)
-__Pyx_PyUnicode_READ(k,d,i)
-__Pyx_PyUnicode_READY(op)
-__Pyx_PyUnicode_READY(op)
-__Pyx_PyUnicode_READY(op)
-__Pyx_PyUnicode_READ_CHAR(u,i)
-__Pyx_PyUnicode_READ_CHAR(u,i)
-__Pyx_PyUnicode_WRITE(k,d,i,ch)
-__Pyx_PyUnicode_WRITE(k,d,i,ch)
-__Pyx_RefNannyDeclarations
-__Pyx_RefNannyDeclarations
-__Pyx_RefNannyFinishContext()
-__Pyx_RefNannyFinishContext()
-__Pyx_RefNannySetupContext(name,acquire_gil)
-__Pyx_RefNannySetupContext(name,acquire_gil)
-__Pyx_RefNannySetupContext(name,acquire_gil)
-__Pyx_SET_REFCNT(obj,refcnt)
-__Pyx_SET_REFCNT(obj,refcnt)
-__Pyx_SET_SIZE(obj,size)
-__Pyx_SET_SIZE(obj,size)
-__Pyx_TypeCheck(obj,type)
-__Pyx_TypeCheck(obj,type)
-__Pyx_XCLEAR(r)
-__Pyx_XDECREF(r)
-__Pyx_XDECREF(r)
-__Pyx_XDECREF_SET(r,v)
-__Pyx_XGIVEREF(r)
-__Pyx_XGIVEREF(r)
-__Pyx_XGOTREF(r)
-__Pyx_XGOTREF(r)
-__Pyx_XINCREF(r)
-__Pyx_XINCREF(r)
-__Pyx_fits_Py_ssize_t(v,type,is_signed)
-__Pyx_long_cast(x)
-__Pyx_sst_abs(value)
-__Pyx_sst_abs(value)
-__Pyx_sst_abs(value)
-__Pyx_sst_abs(value)
-__Pyx_sst_abs(value)
-__Pyx_sst_abs(value)
-__Pyx_sst_abs(value)
-__Pyx_truncl
-__Pyx_truncl
-__Pyx_uchar_cast(c)
-__Pyx_void_to_None(void_result)
-__cdecl
-__fastcall
-__has_attribute(x)
-__has_cpp_attribute(x)
-__pyx_PyFloat_AsDouble(x)
-__pyx_PyFloat_AsDouble(x)
-__pyx_PyFloat_AsFloat(x)
-__stdcall
-likely(x)
-likely(x)
-offsetof(type,member)
-unlikely(x)
-unlikely(x)
▾ prototypes
-__Pyx_AddTraceback(const char * funcname,int c_line,int py_line,const char * filename)
-__Pyx_CLineForTraceback(PyThreadState * tstate,int c_line)
-__Pyx_ErrFetchInState(PyThreadState * tstate,PyObject ** type,PyObject ** value,PyObject ** tb)
-__Pyx_ErrRestoreInState(PyThreadState * tstate,PyObject * type,PyObject * value,PyObject * tb)
-__Pyx_ExportFunction(const char * name,void (* f)(void),const char * sig)
-__Pyx_InitStrings(__Pyx_StringTabEntry * t)
-__Pyx_IsSubtype(PyTypeObject * a,PyTypeObject * b)
-__Pyx_PyBool_FromLong(long b)
-__Pyx_PyErr_GivenExceptionMatches(PyObject * err,PyObject * type)
-__Pyx_PyErr_GivenExceptionMatches2(PyObject * err,PyObject * type1,PyObject * type2)
-__Pyx_PyIndex_AsHash_t(PyObject *)
-__Pyx_PyIndex_AsSsize_t(PyObject *)
-__Pyx_PyInt_As_int(PyObject *)
-__Pyx_PyInt_As_long(PyObject *)
-__Pyx_PyInt_FromSize_t(size_t)
-__Pyx_PyInt_From_long(long value)
-__Pyx_PyNumber_IntOrLong(PyObject * x)
-__Pyx_PyObject_AsString(PyObject *)
-__Pyx_PyObject_AsStringAndSize(PyObject *,Py_ssize_t * length)
-__Pyx_PyObject_GetAttrStr(PyObject * obj,PyObject * attr_name)
-__Pyx_PyObject_IsTrue(PyObject *)
-__Pyx_PyObject_IsTrueAndDecref(PyObject *)
-__Pyx_PyUnicode_FromString(const char *)
-__Pyx_RefNannyImportAPI(const char * modname)
-__Pyx_check_binary_version(void)
-__Pyx_get_object_dict_version(PyObject * obj)
-__Pyx_get_tp_dict_version(PyObject * obj)
-__Pyx_modinit_function_export_code(void)
-__Pyx_modinit_function_import_code(void)
-__Pyx_modinit_global_init_code(void)
-__Pyx_modinit_type_import_code(void)
-__Pyx_modinit_type_init_code(void)
-__Pyx_modinit_variable_export_code(void)
-__Pyx_modinit_variable_import_code(void)
-__Pyx_object_dict_version_matches(PyObject * obj,PY_UINT64_T tp_dict_version,PY_UINT64_T obj_dict_version)
-__pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry * entries,int count,int code_line)
-__pyx_find_code_object(int code_line)
-__pyx_fuse_0__pyx_f_7sklearn_5utils_5_heap_heap_push(float *,__pyx_t_7sklearn_5utils_9_typedefs_intp_t *,__pyx_t_7sklearn_5utils_9_typedefs_intp_t,float,__pyx_t_7sklearn_5utils_9_typedefs_intp_t)
-__pyx_fuse_1__pyx_f_7sklearn_5utils_5_heap_heap_push(double *,__pyx_t_7sklearn_5utils_9_typedefs_intp_t *,__pyx_t_7sklearn_5utils_9_typedefs_intp_t,double,__pyx_t_7sklearn_5utils_9_typedefs_intp_t)
-__pyx_insert_code_object(int code_line,PyCodeObject * code_object)
-__pyx_pymod_create(PyObject * spec,PyModuleDef * def)
-__pyx_pymod_exec__heap(PyObject * module)
-init_heap(void)
▾-__anonf7ac09720103 : enum
[enumerators]
+__pyx_check_sizeof_voidp
▾ typedefs
-Py_hash_t
-Py_tss_t
-__Pyx_CodeObjectCacheEntry
-__Pyx_PyAsyncMethodsStruct
-__Pyx_PyCFunctionFast
-__Pyx_PyCFunctionFastWithKeywords
-__Pyx_RefNannyAPIStruct
-__Pyx_StringTabEntry
-__pyx_t_7sklearn_5utils_9_typedefs_float32_t
-__pyx_t_7sklearn_5utils_9_typedefs_float64_t
-__pyx_t_7sklearn_5utils_9_typedefs_int32_t
-__pyx_t_7sklearn_5utils_9_typedefs_int64_t
-__pyx_t_7sklearn_5utils_9_typedefs_intp_t
-__pyx_t_7sklearn_5utils_9_typedefs_uint32_t
-__pyx_t_7sklearn_5utils_9_typedefs_uint64_t
-__pyx_t_7sklearn_5utils_9_typedefs_uint8_t
-uint32_t
-uint32_t
-uint8_t
-uint8_t
▾-__Pyx_CodeObjectCache : struct
[members]
+count
+entries
+max_count
▾-__anonf7ac09720208 : struct
[members]
+am_aiter
+am_anext
+am_await
▾-__anonf7ac09720308 : struct
[members]
+encoding
+intern
+is_str
+is_unicode
+n
+p
+s
▾-__anonf7ac09720408 : struct
[members]
+DECREF
+FinishContext
+GIVEREF
+GOTREF
+INCREF
+SetupContext
▾-__anonf7ac09720508 : struct
[members]
+code_line
+code_object
-__anonf7ac0972060a : union
▾ variables
-__PYX_DEFAULT_STRING_ENCODING
-__Pyx_RefNanny
-__Pyx_sys_getdefaultencoding_not_ascii
-__pyx_b
-__pyx_cfilenm
-__pyx_clineno
-__pyx_code_cache
-__pyx_cython_runtime
-__pyx_d
-__pyx_empty_bytes
-__pyx_empty_tuple
-__pyx_empty_unicode
-__pyx_f
-__pyx_filename
-__pyx_k_cline_in_traceback
-__pyx_k_main
-__pyx_k_name
-__pyx_k_test
-__pyx_lineno
-__pyx_m
-__pyx_methods
__pyx_module_is_main_sklearn__utils___heap
-__pyx_moduledef
-__pyx_moduledef_slots
-__pyx_n_s_cline_in_traceback
-__pyx_n_s_main
-__pyx_n_s_name
-__pyx_n_s_test
-__pyx_string_tab
▾ functions
CYTHON_MAYBE_UNUSED_VAR(const T &)
-PyThread_tss_alloc(void)
-PyThread_tss_create(Py_tss_t * key)
-PyThread_tss_delete(Py_tss_t * key)
-PyThread_tss_free(Py_tss_t * key)
-PyThread_tss_get(Py_tss_t * key)
-PyThread_tss_is_created(Py_tss_t * key)
-PyThread_tss_set(Py_tss_t * key,void * value)
-__PYX_NAN()
-__Pyx_AddTraceback(const char * funcname,int c_line,int py_line,const char * filename)
-__Pyx_CLineForTraceback(CYTHON_UNUSED PyThreadState * tstate,int c_line)
-__Pyx_CreateCodeObjectForTraceback(const char * funcname,int c_line,int py_line,const char * filename)
-__Pyx_ErrFetchInState(PyThreadState * tstate,PyObject ** type,PyObject ** value,PyObject ** tb)
-__Pyx_ErrRestoreInState(PyThreadState * tstate,PyObject * type,PyObject * value,PyObject * tb)
▾-__Pyx_ExportFunction(const char * name,void (* f)(void),const char * sig)
-__Pyx_InBases(PyTypeObject * a,PyTypeObject * b)
-__Pyx_InitCachedBuiltins(void)
-__Pyx_InitCachedConstants(void)
-__Pyx_InitGlobals(void)
-__Pyx_InitStrings(__Pyx_StringTabEntry * t)
-__Pyx_IsSubtype(PyTypeObject * a,PyTypeObject * b)
-__Pyx_PyBool_FromLong(long b)
-__Pyx_PyCode_New(int a,int k,int l,int s,int f,PyObject * code,PyObject * c,PyObject * n,PyObject * v,PyObject * fv,PyObject * cell,PyObject * fn,PyObject * name,int fline,PyObject * lnos)
-__Pyx_PyErr_GivenExceptionMatches(PyObject * err,PyObject * exc_type)
-__Pyx_PyErr_GivenExceptionMatches2(PyObject * err,PyObject * exc_type1,PyObject * exc_type2)
-__Pyx_PyErr_GivenExceptionMatchesTuple(PyObject * exc_type,PyObject * tuple)
-__Pyx_PyIndex_AsHash_t(PyObject * o)
-__Pyx_PyIndex_AsSsize_t(PyObject * b)
-__Pyx_PyInt_As_int(PyObject * x)
-__Pyx_PyInt_As_long(PyObject * x)
-__Pyx_PyInt_FromSize_t(size_t ival)
-__Pyx_PyInt_From_long(long value)
-__Pyx_PyNumber_IntOrLong(PyObject * x)
-__Pyx_PyNumber_IntOrLongWrongResultType(PyObject * result,const char * type_name)
-__Pyx_PyObject_AsString(PyObject * o)
-__Pyx_PyObject_AsStringAndSize(PyObject * o,Py_ssize_t * length)
-__Pyx_PyObject_GetAttrStr(PyObject * obj,PyObject * attr_name)
-__Pyx_PyObject_IsTrue(PyObject * x)
-__Pyx_PyObject_IsTrueAndDecref(PyObject * x)
-__Pyx_PyUnicode_AsStringAndSize(PyObject * o,Py_ssize_t * length)
-__Pyx_PyUnicode_AsStringAndSize(PyObject * o,Py_ssize_t * length)
-__Pyx_PyUnicode_FromString(const char * c_str)
-__Pyx_Py_UNICODE_strlen(const Py_UNICODE * u)
-__Pyx_RefNannyImportAPI(const char * modname)
-__Pyx_check_binary_version(void)
-__Pyx_get_object_dict_version(PyObject * obj)
-__Pyx_get_tp_dict_version(PyObject * obj)
-__Pyx_init_sys_getdefaultencoding_params(void)
-__Pyx_init_sys_getdefaultencoding_params(void)
-__Pyx_inner_PyErr_GivenExceptionMatches2(PyObject * err,PyObject * exc_type1,PyObject * exc_type2)
-__Pyx_inner_PyErr_GivenExceptionMatches2(PyObject * err,PyObject * exc_type1,PyObject * exc_type2)
-__Pyx_is_valid_index(Py_ssize_t i,Py_ssize_t limit)
-__Pyx_modinit_function_export_code(void)
-__Pyx_modinit_function_import_code(void)
-__Pyx_modinit_global_init_code(void)
-__Pyx_modinit_type_import_code(void)
-__Pyx_modinit_type_init_code(void)
-__Pyx_modinit_variable_export_code(void)
-__Pyx_modinit_variable_import_code(void)
-__Pyx_object_dict_version_matches(PyObject * obj,PY_UINT64_T tp_dict_version,PY_UINT64_T obj_dict_version)
-__Pyx_pretend_to_initialize(void * ptr)
-__pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry * entries,int count,int code_line)
-__pyx_find_code_object(int code_line)
-__pyx_fuse_0__pyx_f_7sklearn_5utils_5_heap_heap_push(float * __pyx_v_values,__pyx_t_7sklearn_5utils_9_typedefs_intp_t * __pyx_v_indices,__pyx_t_7sklearn_5utils_9_typedefs_intp_t __pyx_v_size,float __pyx_v_val,__pyx_t_7sklearn_5utils_9_typedefs_intp_t __pyx_v_val_idx)
-__pyx_fuse_1__pyx_f_7sklearn_5utils_5_heap_heap_push(double * __pyx_v_values,__pyx_t_7sklearn_5utils_9_typedefs_intp_t * __pyx_v_indices,__pyx_t_7sklearn_5utils_9_typedefs_intp_t __pyx_v_size,double __pyx_v_val,__pyx_t_7sklearn_5utils_9_typedefs_intp_t __pyx_v_val_idx)
-__pyx_insert_code_object(int code_line,PyCodeObject * code_object)
init_heap(void)
Problem
Currently, the uncompressed size of scikit-learn is around 48.8MB, 20MB of which are shared object files. As reported by @rth in pyodide/pyodide#4289, while shared object files are already heavily optimized for Emscripten, they still account for most of the size of scikit-learn on this stack.
Extensions' shared object sizes on Linux
find . -name \*.so -exec du -h {} \; | sort -h --reverse
2,1M ./sklearn/_loss/_loss.cpython-312-x86_64-linux-gnu.so
712K ./sklearn/utils/sparsefuncs_fast.cpython-312-x86_64-linux-gnu.so
672K ./sklearn/neighbors/_kd_tree.cpython-312-x86_64-linux-gnu.so
672K ./sklearn/neighbors/_ball_tree.cpython-312-x86_64-linux-gnu.so
624K ./sklearn/tree/_tree.cpython-312-x86_64-linux-gnu.so
608K ./sklearn/metrics/_dist_metrics.cpython-312-x86_64-linux-gnu.so
520K ./sklearn/datasets/_svmlight_format_fast.cpython-312-x86_64-linux-gnu.so
504K ./sklearn/preprocessing/_target_encoder_fast.cpython-312-x86_64-linux-gnu.so
476K ./sklearn/svm/_libsvm.cpython-312-x86_64-linux-gnu.so
448K ./sklearn/svm/_libsvm_sparse.cpython-312-x86_64-linux-gnu.so
448K ./sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.cpython-312-x86_64-linux-gnu.so
440K ./sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.cpython-312-x86_64-linux-gnu.so
440K ./sklearn/linear_model/_cd_fast.cpython-312-x86_64-linux-gnu.so
404K ./sklearn/cluster/_k_means_elkan.cpython-312-x86_64-linux-gnu.so
396K ./sklearn/utils/_cython_blas.cpython-312-x86_64-linux-gnu.so
396K ./sklearn/cluster/_k_means_common.cpython-312-x86_64-linux-gnu.so
384K ./sklearn/preprocessing/_csr_polynomial_expansion.cpython-312-x86_64-linux-gnu.so
352K ./sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.cpython-312-x86_64-linux-gnu.so
340K ./sklearn/tree/_splitter.cpython-312-x86_64-linux-gnu.so
340K ./sklearn/cluster/_hdbscan/_tree.cpython-312-x86_64-linux-gnu.so
324K ./sklearn/metrics/_pairwise_distances_reduction/_argkmin.cpython-312-x86_64-linux-gnu.so
324K ./sklearn/linear_model/_sgd_fast.cpython-312-x86_64-linux-gnu.so
312K ./sklearn/tree/_criterion.cpython-312-x86_64-linux-gnu.so
312K ./sklearn/cluster/_k_means_lloyd.cpython-312-x86_64-linux-gnu.so
308K ./sklearn/ensemble/_hist_gradient_boosting/splitting.cpython-312-x86_64-linux-gnu.so
308K ./sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so
304K ./sklearn/cluster/_hierarchical_fast.cpython-312-x86_64-linux-gnu.so
292K ./sklearn/metrics/_pairwise_distances_reduction/_base.cpython-312-x86_64-linux-gnu.so
280K ./sklearn/linear_model/_sag_fast.cpython-312-x86_64-linux-gnu.so
276K ./sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-312-x86_64-linux-gnu.so
272K ./sklearn/svm/_liblinear.cpython-312-x86_64-linux-gnu.so
268K ./sklearn/utils/_seq_dataset.cpython-312-x86_64-linux-gnu.so
264K ./sklearn/neighbors/_quad_tree.cpython-312-x86_64-linux-gnu.so
260K ./sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.cpython-312-x86_64-linux-gnu.so
256K ./sklearn/_isotonic.cpython-312-x86_64-linux-gnu.so
252K ./sklearn/cluster/_k_means_minibatch.cpython-312-x86_64-linux-gnu.so
248K ./sklearn/utils/_fast_dict.cpython-312-x86_64-linux-gnu.so
248K ./sklearn/tree/_utils.cpython-312-x86_64-linux-gnu.so
248K ./sklearn/metrics/_pairwise_fast.cpython-312-x86_64-linux-gnu.so
244K ./sklearn/decomposition/_online_lda_fast.cpython-312-x86_64-linux-gnu.so
240K ./sklearn/metrics/_pairwise_distances_reduction/_argkmin_classmode.cpython-312-x86_64-linux-gnu.so
232K ./sklearn/utils/_typedefs.cpython-312-x86_64-linux-gnu.so
232K ./sklearn/utils/_isfinite.cpython-312-x86_64-linux-gnu.so
220K ./sklearn/utils/arrayfuncs.cpython-312-x86_64-linux-gnu.so
220K ./sklearn/ensemble/_gradient_boosting.cpython-312-x86_64-linux-gnu.so
216K ./sklearn/ensemble/_hist_gradient_boosting/utils.cpython-312-x86_64-linux-gnu.so
212K ./sklearn/utils/_random.cpython-312-x86_64-linux-gnu.so
212K ./sklearn/utils/murmurhash.cpython-312-x86_64-linux-gnu.so
212K ./sklearn/ensemble/_hist_gradient_boosting/_predictor.cpython-312-x86_64-linux-gnu.so
212K ./sklearn/decomposition/_cdnmf_fast.cpython-312-x86_64-linux-gnu.so
212K ./sklearn/cluster/_hdbscan/_linkage.cpython-312-x86_64-linux-gnu.so
204K ./sklearn/metrics/cluster/_expected_mutual_info_fast.cpython-312-x86_64-linux-gnu.so
204K ./sklearn/manifold/_barnes_hut_tsne.cpython-312-x86_64-linux-gnu.so
192K ./sklearn/utils/_weight_vector.cpython-312-x86_64-linux-gnu.so
184K ./sklearn/manifold/_utils.cpython-312-x86_64-linux-gnu.so
184K ./sklearn/ensemble/_hist_gradient_boosting/_gradient_boosting.cpython-312-x86_64-linux-gnu.so
184K ./sklearn/cluster/_dbscan_inner.cpython-312-x86_64-linux-gnu.so
180K ./sklearn/ensemble/_hist_gradient_boosting/_bitset.cpython-312-x86_64-linux-gnu.so
180K ./sklearn/ensemble/_hist_gradient_boosting/_binning.cpython-312-x86_64-linux-gnu.so
128K ./sklearn/utils/_vector_sentinel.cpython-312-x86_64-linux-gnu.so
112K ./sklearn/ensemble/_hist_gradient_boosting/common.cpython-312-x86_64-linux-gnu.so
80K ./sklearn/feature_extraction/_hashing_fast.cpython-312-x86_64-linux-gnu.so
48K ./sklearn/utils/_openmp_helpers.cpython-312-x86_64-linux-gnu.so
32K ./sklearn/svm/_newrand.cpython-312-x86_64-linux-gnu.so
28K ./sklearn/utils/_sorting.cpython-312-x86_64-linux-gnu.so
28K ./sklearn/neighbors/_partition_nodes.cpython-312-x86_64-linux-gnu.so
24K ./sklearn/utils/_heap.cpython-312-x86_64-linux-gnu.so
24K ./sklearn/__check_build/_check_build.cpython-312-x86_64-linux-gnu.so
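The measurement above can also be reproduced from Python, which makes it easy to total the sizes. The following is a minimal sketch under assumptions: the helper names shared_object_sizes and total_size are hypothetical, and it reads apparent file sizes (st_size), which can differ slightly from the block-based figures printed by du.

```python
from pathlib import Path


def shared_object_sizes(root, pattern="*.so"):
    """Return (size_in_bytes, path) pairs for files under root, largest first."""
    entries = [
        (path.stat().st_size, path)
        for path in Path(root).rglob(pattern)
        if path.is_file()
    ]
    # Sort on the size only, so that paths are never compared with each other.
    return sorted(entries, key=lambda entry: entry[0], reverse=True)


def total_size(root, pattern="*.so"):
    """Return the summed size in bytes of all matching files under root."""
    return sum(size for size, _ in shared_object_sizes(root, pattern))


if __name__ == "__main__":
    # Run from the root of a built scikit-learn checkout (assumption).
    for size, path in shared_object_sizes("."):
        print(f"{size / 1024:8.0f}K {path}")
    print(f"total: {total_size('.') / 1024 ** 2:.1f}M")
```

Running the same script before and after a change to the build flags gives a single total to compare, rather than a per-file listing.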
Possible solutions
Strip all symbols and optimize for size
This can be done by adding -Wl,--strip-all to extra_link_args and -Os -g0 to extra_compile_args.
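As an illustration, the flags could be collected in one place and passed to each extension definition. This is only a sketch, not scikit-learn's actual build configuration: the helper name size_optimized_build_args is hypothetical, and the platform checks are assumptions (the flags are GCC/Clang-style, and --strip-all is a GNU ld option, so it is only added on Linux here).

```python
import sys


def size_optimized_build_args(platform=sys.platform):
    """Return keyword arguments intended for setuptools.Extension.

    Hypothetical helper: the flags are the ones named above; the
    platform handling is an assumption made for this sketch.
    """
    if platform == "win32":
        # MSVC does not understand GCC-style flags; leave the defaults.
        return {"extra_compile_args": [], "extra_link_args": []}

    compile_args = ["-Os", "-g0"]  # optimize for size, no debug info
    link_args = []
    if platform.startswith("linux"):
        # Ask the linker to drop all symbol information from the output.
        link_args.append("-Wl,--strip-all")
    return {"extra_compile_args": compile_args, "extra_link_args": link_args}


if __name__ == "__main__":
    print(size_optimized_build_args("linux"))
    # With setuptools installed, the result can be unpacked into an
    # extension definition, for example:
    #   from setuptools import Extension
    #   Extension("sklearn.utils._heap", ["sklearn/utils/_heap.pyx"],
    #             **size_optimized_build_args())
```

Keeping the flags behind a single function makes it straightforward to enable them only for release or Pyodide builds, if stripped binaries turn out to hinder debugging.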
In practice, this can significantly shrink shared objects (up to nearly 50% size reduction):
Extensions' shared object sizes on Linux after stripping all symbols and optimizing for size
find . -name \*.so -exec du -h {} \; | sort -h --reverse
1,2M sklearn/_loss/_loss.cpython-312-x86_64-linux-gnu.so
560K sklearn/tree/_tree.cpython-312-x86_64-linux-gnu.so
468K sklearn/neighbors/_kd_tree.cpython-312-x86_64-linux-gnu.so
468K sklearn/neighbors/_ball_tree.cpython-312-x86_64-linux-gnu.so
460K sklearn/utils/sparsefuncs_fast.cpython-312-x86_64-linux-gnu.so
460K sklearn/metrics/_dist_metrics.cpython-312-x86_64-linux-gnu.so
380K sklearn/datasets/_svmlight_format_fast.cpython-312-x86_64-linux-gnu.so
376K sklearn/svm/_libsvm.cpython-312-x86_64-linux-gnu.so
352K sklearn/svm/_libsvm_sparse.cpython-312-x86_64-linux-gnu.so
336K sklearn/tree/_splitter.cpython-312-x86_64-linux-gnu.so
320K sklearn/preprocessing/_target_encoder_fast.cpython-312-x86_64-linux-gnu.so
320K sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.cpython-312-x86_64-linux-gnu.so
316K sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.cpython-312-x86_64-linux-gnu.so
312K sklearn/utils/_cython_blas.cpython-312-x86_64-linux-gnu.so
308K sklearn/linear_model/_cd_fast.cpython-312-x86_64-linux-gnu.so
300K sklearn/tree/_criterion.cpython-312-x86_64-linux-gnu.so
280K sklearn/preprocessing/_csr_polynomial_expansion.cpython-312-x86_64-linux-gnu.so
280K sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.cpython-312-x86_64-linux-gnu.so
280K sklearn/cluster/_k_means_elkan.cpython-312-x86_64-linux-gnu.so
276K sklearn/cluster/_k_means_common.cpython-312-x86_64-linux-gnu.so
252K sklearn/linear_model/_sgd_fast.cpython-312-x86_64-linux-gnu.so
248K sklearn/cluster/_hdbscan/_tree.cpython-312-x86_64-linux-gnu.so
244K sklearn/metrics/_pairwise_distances_reduction/_argkmin.cpython-312-x86_64-linux-gnu.so
240K sklearn/tree/_utils.cpython-312-x86_64-linux-gnu.so
240K sklearn/ensemble/_hist_gradient_boosting/splitting.cpython-312-x86_64-linux-gnu.so
236K sklearn/cluster/_hierarchical_fast.cpython-312-x86_64-linux-gnu.so
236K sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so
232K sklearn/svm/_liblinear.cpython-312-x86_64-linux-gnu.so
228K sklearn/cluster/_k_means_lloyd.cpython-312-x86_64-linux-gnu.so
224K sklearn/metrics/_pairwise_distances_reduction/_base.cpython-312-x86_64-linux-gnu.so
212K sklearn/neighbors/_quad_tree.cpython-312-x86_64-linux-gnu.so
208K sklearn/linear_model/_sag_fast.cpython-312-x86_64-linux-gnu.so
204K sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-312-x86_64-linux-gnu.so
200K sklearn/utils/_seq_dataset.cpython-312-x86_64-linux-gnu.so
196K sklearn/utils/_fast_dict.cpython-312-x86_64-linux-gnu.so
196K sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.cpython-312-x86_64-linux-gnu.so
196K sklearn/_isotonic.cpython-312-x86_64-linux-gnu.so
196K sklearn/decomposition/_online_lda_fast.cpython-312-x86_64-linux-gnu.so
196K sklearn/cluster/_k_means_minibatch.cpython-312-x86_64-linux-gnu.so
192K sklearn/utils/_isfinite.cpython-312-x86_64-linux-gnu.so
192K sklearn/metrics/_pairwise_fast.cpython-312-x86_64-linux-gnu.so
192K sklearn/metrics/_pairwise_distances_reduction/_argkmin_classmode.cpython-312-x86_64-linux-gnu.so
184K sklearn/utils/_typedefs.cpython-312-x86_64-linux-gnu.so
180K sklearn/utils/arrayfuncs.cpython-312-x86_64-linux-gnu.so
180K sklearn/ensemble/_hist_gradient_boosting/utils.cpython-312-x86_64-linux-gnu.so
180K sklearn/ensemble/_gradient_boosting.cpython-312-x86_64-linux-gnu.so
172K sklearn/utils/_random.cpython-312-x86_64-linux-gnu.so
168K sklearn/utils/murmurhash.cpython-312-x86_64-linux-gnu.so
168K sklearn/ensemble/_hist_gradient_boosting/_predictor.cpython-312-x86_64-linux-gnu.so
168K sklearn/decomposition/_cdnmf_fast.cpython-312-x86_64-linux-gnu.so
168K sklearn/cluster/_hdbscan/_linkage.cpython-312-x86_64-linux-gnu.so
160K sklearn/metrics/cluster/_expected_mutual_info_fast.cpython-312-x86_64-linux-gnu.so
160K sklearn/manifold/_barnes_hut_tsne.cpython-312-x86_64-linux-gnu.so
152K sklearn/utils/_weight_vector.cpython-312-x86_64-linux-gnu.so
152K sklearn/manifold/_utils.cpython-312-x86_64-linux-gnu.so
152K sklearn/ensemble/_hist_gradient_boosting/_gradient_boosting.cpython-312-x86_64-linux-gnu.so
152K sklearn/cluster/_dbscan_inner.cpython-312-x86_64-linux-gnu.so
148K sklearn/ensemble/_hist_gradient_boosting/_bitset.cpython-312-x86_64-linux-gnu.so
144K sklearn/ensemble/_hist_gradient_boosting/_binning.cpython-312-x86_64-linux-gnu.so
112K sklearn/utils/_vector_sentinel.cpython-312-x86_64-linux-gnu.so
96K sklearn/ensemble/_hist_gradient_boosting/common.cpython-312-x86_64-linux-gnu.so
68K sklearn/feature_extraction/_hashing_fast.cpython-312-x86_64-linux-gnu.so
44K sklearn/utils/_openmp_helpers.cpython-312-x86_64-linux-gnu.so
32K sklearn/svm/_newrand.cpython-312-x86_64-linux-gnu.so
28K sklearn/utils/_sorting.cpython-312-x86_64-linux-gnu.so
28K sklearn/neighbors/_partition_nodes.cpython-312-x86_64-linux-gnu.so
24K sklearn/utils/_heap.cpython-312-x86_64-linux-gnu.so
24K sklearn/__check_build/_check_build.cpython-312-x86_64-linux-gnu.so
Group several translation units within C extensions (and use interprocedural optimization)
The goal is to reuse symbols that are currently duplicated across shared objects and to perform optimizations across several translation units (such as function inlining).
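To get a rough idea of how much is duplicated, one could list the symbols defined in each (unstripped) shared object and count those that appear in more than one file. The sketch below assumes GNU nm is available on PATH; the helper names defined_symbols and duplicated_symbols are hypothetical, and the example mapping is made up for illustration rather than measured.

```python
import subprocess
from collections import Counter


def defined_symbols(path):
    """Return the set of symbol names defined in an object file.

    Assumes GNU `nm` is on PATH and that the file has not been stripped.
    """
    output = subprocess.run(
        ["nm", "--defined-only", str(path)],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # Each line looks like "<address> <type> <name>"; keep only the name.
    return {line.split()[-1] for line in output.splitlines() if line.strip()}


def duplicated_symbols(symbols_by_object):
    """Map each symbol defined in several objects to its number of definitions."""
    counts = Counter(
        name for names in symbols_by_object.values() for name in set(names)
    )
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up input for illustration; real input would come from
    # defined_symbols() applied to each shared object.
    example = {
        "_heap.so": {"__Pyx_AddTraceback", "__Pyx_InitStrings", "heap_push"},
        "_sorting.so": {"__Pyx_AddTraceback", "__Pyx_InitStrings", "sort"},
    }
    print(sorted(duplicated_symbols(example)))
```

Symbols that show up in many objects are candidates for sharing if several translation units end up in the same extension.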