[Core]Fix the segfault #17772

Merged
merged 3 commits into from
Aug 13, 2021

Conversation

rkooo567
Contributor

Why are these changes needed?

(raylet, ip=172.31.33.163) *** SIGSEGV received at time=1628728663 on cpu 22 ***
(raylet, ip=172.31.33.163) PC: @     0x55aaecaaa0ac  (unknown)  plasma::PlasmaStore::IsObjectSpillable()
(raylet, ip=172.31.33.163)     @     0x7f1721ae4980  1119658160  (unknown)
(raylet, ip=172.31.33.163)     @     0x55aaeca3cbdc        240  ray::raylet::LocalObjectManager::SpillObjectsOfSize()
(raylet, ip=172.31.33.163)     @     0x55aaeca3cfef         64  ray::raylet::LocalObjectManager::SpillObjectUptoMaxThroughput()
(raylet, ip=172.31.33.163)     @     0x55aaeccf0bb6        112  boost::asio::detail::completion_handler<>::do_complete()
(raylet, ip=172.31.33.163)     @     0x55aaed0f62c8        112  boost::asio::detail::scheduler::do_run_one()
(raylet, ip=172.31.33.163)     @     0x55aaed0f76a1        160  boost::asio::detail::scheduler::run()
(raylet, ip=172.31.33.163)     @     0x55aaed0f9630         64  boost::asio::io_context::run()
(raylet, ip=172.31.33.163)     @     0x55aaec94af0a       1088  main
(raylet, ip=172.31.33.163)     @     0x7f1720bbbbf7  (unknown)  __libc_start_main
(raylet, ip=172.31.33.163)     @ 0x41d589495541f689  (unknown)  (unknown)
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299: *** SIGSEGV received at time=1628728663 on cpu 22 ***
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299: PC: @     0x55aaecaaa0ac  (unknown)  plasma::PlasmaStore::IsObjectSpillable()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x7f1721ae4980  1119658160  (unknown)
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaeca3cbdc        240  ray::raylet::LocalObjectManager::SpillObjectsOfSize()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaeca3cfef         64  ray::raylet::LocalObjectManager::SpillObjectUptoMaxThroughput()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaeccf0bb6        112  boost::asio::detail::completion_handler<>::do_complete()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaed0f62c8        112  boost::asio::detail::scheduler::do_run_one()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaed0f76a1        160  boost::asio::detail::scheduler::run()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaed0f9630         64  boost::asio::io_context::run()
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x55aaec94af0a       1088  main
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @     0x7f1720bbbbf7  (unknown)  __libc_start_main
(raylet, ip=172.31.33.163) [2021-08-11 17:37:43,363 E 291 291] logging.cc:299:     @ 0x41d589495541f689  (unknown)  (unknown)
2021-08-11 17:38:12,111     WARNING worker.py:1215 -- The node with node id: 735bff24910fccb93c9a5238e5efdfdb4e5a3665ca6a7a6a6102a157 and ip: 172.31.33.163 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.

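GetObject can return nullptr, so its result should always be checked before it is dereferenced. The raylet segfault above, in plasma::PlasmaStore::IsObjectSpillable(), comes from a missing check of this kind.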
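
For illustration only, here is a minimal, self-contained sketch of that kind of guard. It is not the code from this PR: `LocalObject`, `ObjectStore`, `Put`, `Evict`, and the spillability rule (sealed with exactly one reference) are hypothetical stand-ins, and only the lookup-then-check pattern is the point.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an object entry; not Ray's actual type.
struct LocalObject {
  int64_t ref_count = 0;
  bool sealed = false;
};

// Hypothetical store that models only the lookup-then-inspect pattern.
class ObjectStore {
 public:
  void Put(const std::string &id, int64_t ref_count, bool sealed) {
    auto obj = std::make_unique<LocalObject>();
    obj->ref_count = ref_count;
    obj->sealed = sealed;
    objects_[id] = std::move(obj);
  }

  void Evict(const std::string &id) { objects_.erase(id); }

  // Returns nullptr when the object is not in the store.
  const LocalObject *GetObject(const std::string &id) const {
    auto it = objects_.find(id);
    return it == objects_.end() ? nullptr : it->second.get();
  }

  // Assumed rule for this sketch: spillable if sealed with exactly one reference.
  bool IsObjectSpillable(const std::string &id) const {
    const LocalObject *obj = GetObject(id);
    if (obj == nullptr) {
      // Guard: without this, obj->sealed would dereference a null pointer.
      return false;
    }
    return obj->sealed && obj->ref_count == 1;
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<LocalObject>> objects_;
};
```

In this sketch an object that is evicted between being queued for spilling and being inspected is reported as not spillable instead of crashing the process.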

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@rkooo567 rkooo567 assigned scv119 and richardliaw and unassigned richardliaw Aug 12, 2021
@scv119 scv119 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 13, 2021
@rkooo567
Contributor Author

The docker build failure should be unrelated.

@rkooo567 rkooo567 merged commit 21635b3 into ray-project:master Aug 13, 2021
Bam4d pushed a commit to Bam4d/ray that referenced this pull request Aug 13, 2021
krfricke pushed a commit that referenced this pull request Aug 13, 2021
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.

3 participants