Add Some More GPU documentation #401

Merged: 100 commits, merged on Apr 12, 2017

Changes from 1 commit

Commits (100)
4810c79
add dummy gpu solver code
huanzhang12 Feb 10, 2017
e41ba15
initial GPU code
huanzhang12 Feb 12, 2017
6dde565
fix crash bug
huanzhang12 Feb 12, 2017
2dce7d1
first working version
huanzhang12 Feb 12, 2017
146b2dd
use asynchronous copy
huanzhang12 Feb 12, 2017
1f39a03
use a better kernel for root
huanzhang12 Feb 13, 2017
435674d
parallel read histogram
huanzhang12 Feb 13, 2017
22f478a
sparse features now works, but no acceleration, compute on CPU
huanzhang12 Feb 13, 2017
cfd77ae
compute sparse feature on CPU simultaneously
huanzhang12 Feb 13, 2017
40c3212
fix big bug; add gpu selection; add kernel selection
huanzhang12 Feb 14, 2017
c3398c9
better debugging
huanzhang12 Feb 14, 2017
76a13c7
clean up
huanzhang12 Feb 15, 2017
2dc4555
add feature scatter
huanzhang12 Feb 15, 2017
d4c1c01
Add sparse_threshold control
huanzhang12 Feb 15, 2017
97da274
fix a bug in feature scatter
huanzhang12 Feb 15, 2017
a96ca80
clean up debug
huanzhang12 Feb 15, 2017
9be6438
temporarily add OpenCL kernels for k=64,256
huanzhang12 Feb 27, 2017
cbef453
fix up CMakeList and definition USE_GPU
huanzhang12 Feb 27, 2017
4d08152
add OpenCL kernels as string literals
huanzhang12 Feb 28, 2017
624d405
Add boost.compute as a submodule
huanzhang12 Feb 28, 2017
11b241f
add boost dependency into CMakeList
huanzhang12 Feb 28, 2017
5142f19
fix opencl pragma
huanzhang12 Feb 28, 2017
508b48c
use pinned memory for histogram
huanzhang12 Feb 28, 2017
1a63b99
use pinned buffer for gradients and hessians
huanzhang12 Mar 1, 2017
e2166b1
better debugging message
huanzhang12 Mar 1, 2017
3b24e33
add double precision support on GPU
huanzhang12 Mar 9, 2017
e7336ee
fix boost version in CMakeList
huanzhang12 Mar 9, 2017
b29fec7
Add a README
huanzhang12 Mar 9, 2017
97fed3e
reconstruct GPU initialization code for ResetTrainingData
huanzhang12 Mar 12, 2017
164dbd1
move data to GPU in parallel
huanzhang12 Mar 12, 2017
c1c605e
fix a bug during feature copy
huanzhang12 Mar 13, 2017
c5ab1ae
update gpu kernels
huanzhang12 Mar 13, 2017
947629a
update gpu code
huanzhang12 Mar 15, 2017
105b0dd
initial port to LightGBM v2
huanzhang12 Mar 19, 2017
ba2c0a3
speedup GPU data loading process
huanzhang12 Mar 21, 2017
a6cb794
Add 4-bit bin support to GPU
huanzhang12 Mar 22, 2017
ed929cb
re-add sparse_threshold parameter
huanzhang12 Mar 23, 2017
2cd3d85
remove kMaxNumWorkgroups and allows an unlimited number of features
huanzhang12 Mar 23, 2017
4d2758f
add feature mask support for skipping unused features
huanzhang12 Mar 24, 2017
62bc04e
enable kernel cache
huanzhang12 Mar 24, 2017
e4dd344
use GPU kernels without feature masks when all features are used
huanzhang12 Mar 24, 2017
61b09a3
README.
Mar 25, 2017
da20fc0
README.
Mar 25, 2017
2d43e36
update README
huanzhang12 Mar 25, 2017
9602cd7
update to v2
huanzhang12 Mar 25, 2017
cd52bb0
fix typos (#349)
wxchan Mar 17, 2017
be91a98
change compile to gcc on Apple as default
chivee Mar 18, 2017
8f1d05e
clean vscode related file
chivee Mar 19, 2017
411383f
refine api of constructing from sampling data.
guolinke Mar 21, 2017
487660e
fix bug in the last commit.
guolinke Mar 21, 2017
882f420
more efficient algorithm to sample k from n.
guolinke Mar 22, 2017
7d0f338
fix bug in filter bin
guolinke Mar 22, 2017
0b44817
change to boost from average output.
guolinke Mar 22, 2017
85a3ba4
fix tests.
guolinke Mar 22, 2017
f615ba0
only stop training when all classes are finished in multi-class.
guolinke Mar 23, 2017
fbed3ca
limit the max tree output. change hessian in multi-class objective.
guolinke Mar 24, 2017
8eb961b
robust tree model loading.
guolinke Mar 24, 2017
10cd85f
fix test.
guolinke Mar 24, 2017
e57ec49
convert the probabilities to raw score in boost_from_average of class…
guolinke Mar 24, 2017
39965a0
fix the average label for binary classification.
guolinke Mar 24, 2017
8ac77dc
Add boost_from_average to docs (#354)
Laurae2 Mar 24, 2017
25f6268
don't use "ConvertToRawScore" for self-defined objective function.
guolinke Mar 24, 2017
bf3dfb6
boost_from_average seems doesn't work well in binary classification. …
guolinke Mar 24, 2017
22df883
For a better jump link (#355)
JayveeHe Mar 25, 2017
9f4d2f0
add FitByExistingTree.
guolinke Mar 25, 2017
f54ac4d
adapt GPU tree learner for FitByExistingTree
huanzhang12 Mar 26, 2017
59c473b
avoid NaN output.
guolinke Mar 26, 2017
a0549d1
update boost.compute
huanzhang12 Mar 26, 2017
5e945d2
fix typos (#361)
zhangyafeikimi Mar 26, 2017
3891cdb
fix broken links (#359)
wxchan Mar 26, 2017
48b4d9d
update README
huanzhang12 Mar 27, 2017
7248e58
disable GPU acceleration by default
huanzhang12 Mar 27, 2017
56fe2cc
fix image url
huanzhang12 Mar 27, 2017
1c51775
cleanup debug macro
huanzhang12 Mar 27, 2017
78ae386
Initial GPU acceleration
huanzhang12 Mar 27, 2017
2690181
Merge remote-tracking branch 'gpudev/master'
huanzhang12 Mar 27, 2017
f3573d5
remove old README
huanzhang12 Mar 27, 2017
12e5b82
do not save sparse_threshold_ in FeatureGroup
huanzhang12 Mar 27, 2017
1159854
add details for new GPU settings
huanzhang12 Mar 27, 2017
c719ead
ignore submodule when doing pep8 check
huanzhang12 Mar 27, 2017
15c97b4
allocate workspace for at least one thread during building Feature4
huanzhang12 Mar 27, 2017
cb35a02
move sparse_threshold to class Dataset
huanzhang12 Mar 28, 2017
a039a3a
remove duplicated code in GPUTreeLearner::Split
huanzhang12 Mar 29, 2017
35ab97f
Remove duplicated code in FindBestThresholds and BeforeFindBestSplit
huanzhang12 Mar 29, 2017
28c1715
do not rebuild ordered gradients and hessians for sparse features
huanzhang12 Mar 29, 2017
2af1860
support feature groups in GPUTreeLearner
huanzhang12 Apr 4, 2017
475cf8c
Merge remote-tracking branch 'upstream/master'
huanzhang12 Apr 5, 2017
4d5d957
Initial parallel learners with GPU support
huanzhang12 Apr 5, 2017
4b44173
add option device, cleanup code
huanzhang12 Apr 5, 2017
b948c1f
clean up FindBestThresholds; add some omp parallel
huanzhang12 Apr 6, 2017
50f7da1
Merge remote-tracking branch 'upstream/master'
huanzhang12 Apr 7, 2017
3a16753
Merge remote-tracking branch 'upstream/master'
huanzhang12 Apr 7, 2017
2b0514e
constant hessian optimization for GPU
huanzhang12 Apr 8, 2017
e72d8cd
Fix GPUTreeLearner crash when there is zero feature
huanzhang12 Apr 9, 2017
a68ae52
use np.testing.assert_almost_equal() to compare lists of floats in tests
huanzhang12 Apr 9, 2017
2ac5103
travis for GPU
huanzhang12 Apr 9, 2017
edb30a6
Merge remote-tracking branch 'upstream/master'
huanzhang12 Apr 9, 2017
0c5eb15
Merge remote-tracking branch 'upstream/master'
huanzhang12 Apr 9, 2017
b121443
Merge remote-tracking branch 'upstream/master'
huanzhang12 Apr 11, 2017
74bc952
add tutorial and more GPU docs
huanzhang12 Apr 12, 2017
clean up FindBestThresholds; add some omp parallel
huanzhang12 committed Apr 7, 2017
commit b948c1ff8b7dec89d0d14a418aa3c9df6c3aaed2
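
The core of this commit is adding OpenMP parallel-for directives to the per-row loops that fill the GPU input buffers and feature masks. Below is a minimal, self-contained sketch of that pattern (not LightGBM code; the function and variable names are illustrative): each iteration writes only its own output element, so a statically scheduled loop needs no locking, and the pragma is simply ignored when OpenMP is disabled.

#include <cstdint>
#include <vector>

// Sketch of the per-row parallel fill pattern used in AllocateGPUMemory().
void FillRowBuffer(const std::vector<uint8_t>& bins, std::vector<uint8_t>* out) {
  out->resize(bins.size());
  const int num_data = static_cast<int>(bins.size());
#pragma omp parallel for schedule(static)
  for (int j = 0; j < num_data; ++j) {
    // stand-in for the bin-multiplier/offset computation done in the real code
    (*out)[j] = bins[j];
  }
}
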
160 changes: 53 additions & 107 deletions src/treelearner/gpu_tree_learner.cpp
@@ -454,6 +454,7 @@ void GPUTreeLearner::AllocateGPUMemory() {
BinIterator* bin_iter = train_data_->FeatureGroupIterator(dense_ind[i]);
if (dynamic_cast<Dense4bitsBinIterator*>(bin_iter) != 0) {
Dense4bitsBinIterator iter = *static_cast<Dense4bitsBinIterator*>(bin_iter);
#pragma omp parallel for schedule(static)
for (int j = 0; j < num_data_; ++j) {
host4[j].s[i >> 1] |= ((iter.RawGet(j) * device_bin_mults_[copied_feature4 * dword_features_ + i]
+ ((j+i) & (device_bin_mults_[copied_feature4 * dword_features_ + i] - 1)))
@@ -468,13 +469,15 @@ void GPUTreeLearner::AllocateGPUMemory() {
BinIterator* bin_iter = train_data_->FeatureGroupIterator(dense_ind[i]);
if (dynamic_cast<DenseBinIterator<uint8_t>*>(bin_iter) != 0) {
DenseBinIterator<uint8_t> iter = *static_cast<DenseBinIterator<uint8_t>*>(bin_iter);
#pragma omp parallel for schedule(static)
for (int j = 0; j < num_data_; ++j) {
host4[j].s[i] = iter.RawGet(j) * device_bin_mults_[copied_feature4 * dword_features_ + i]
+ ((j+i) & (device_bin_mults_[copied_feature4 * dword_features_ + i] - 1));
}
}
else if (dynamic_cast<Dense4bitsBinIterator*>(bin_iter) != 0) {
Dense4bitsBinIterator iter = *static_cast<Dense4bitsBinIterator*>(bin_iter);
#pragma omp parallel for schedule(static)
for (int j = 0; j < num_data_; ++j) {
host4[j].s[i] = iter.RawGet(j) * device_bin_mults_[copied_feature4 * dword_features_ + i]
+ ((j+i) & (device_bin_mults_[copied_feature4 * dword_features_ + i] - 1));
@@ -490,6 +493,7 @@ void GPUTreeLearner::AllocateGPUMemory() {
}
// fill the leftover features
if (dword_features_ == 8) {
#pragma omp parallel for schedule(static)
for (int j = 0; j < num_data_; ++j) {
for (i = k; i < dword_features_; ++i) {
// fill this empty feature with some "random" value
@@ -498,6 +502,7 @@ void GPUTreeLearner::AllocateGPUMemory() {
}
}
else if (dword_features_ == 4) {
#pragma omp parallel for schedule(static)
for (int j = 0; j < num_data_; ++j) {
for (i = k; i < dword_features_; ++i) {
// fill this empty feature with some "random" value
@@ -784,7 +789,6 @@ bool GPUTreeLearner::ConstructGPUHistogramsAsync(
if (num_data <= 0) {
return false;
}


// copy data indices if it is not null
if (data_indices != nullptr && num_data != num_data_) {
Expand Down Expand Up @@ -818,13 +822,15 @@ bool GPUTreeLearner::ConstructGPUHistogramsAsync(
}
// convert indices in is_feature_used to feature-group indices
std::vector<int8_t> is_feature_group_used(num_feature_groups_, 0);
#pragma omp parallel for schedule(static,1024) if (num_features_ >= 2048)
for (int i = 0; i < num_features_; ++i) {
if(is_feature_used[i]) {
is_feature_group_used[train_data_->Feature2Group(i)] = 1;
}
}
// construct the feature masks for dense feature-groups
int used_dense_feature_groups = 0;
#pragma omp parallel for schedule(static,1024) reduction(+:used_dense_feature_groups) if (num_dense_feature_groups_ >= 2048)
for (int i = 0; i < num_dense_feature_groups_; ++i) {
if (is_feature_group_used[dense_feature_group_map_[i]]) {
feature_masks_[i] = 1;
@@ -835,6 +841,10 @@ bool GPUTreeLearner::ConstructGPUHistogramsAsync(
}
}
bool use_all_features = used_dense_feature_groups == num_dense_feature_groups_;
// if no feature group is used, just return and do not use GPU
if (used_dense_feature_groups == 0) {
return false;
}
#if GPU_DEBUG >= 1
printf("feature masks:\n");
for (unsigned int i = 0; i < feature_masks_.size(); ++i) {
Expand All @@ -843,6 +853,8 @@ bool GPUTreeLearner::ConstructGPUHistogramsAsync(
printf("\n");
printf("%d feature groups, %d used, %d\n", num_dense_feature_groups_, used_dense_feature_groups, use_all_features);
#endif
// if not all feature groups are used, we need to transfer the feature mask to GPU
// otherwise, we will use a specialized GPU kernel with all feature groups enabled
if (!use_all_features) {
queue_.enqueue_write_buffer(device_feature_masks_, 0, num_dense_feature4_ * dword_features_, ptr_pinned_feature_masks_);
}
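
The feature-mask loop above combines schedule(static,1024), a reduction on the used-group counter, and an if clause so a parallel region is only spawned when there are enough feature groups to pay for the thread start-up cost. A minimal sketch of that guarded parallel reduction (illustrative names, not LightGBM code):

#include <cstdint>
#include <vector>

// Count used feature groups with a guarded OpenMP reduction: serial when n is
// small, chunked static schedule and a safe per-thread accumulation otherwise.
int CountUsedGroups(const std::vector<int8_t>& is_group_used) {
  const int n = static_cast<int>(is_group_used.size());
  int used = 0;
#pragma omp parallel for schedule(static, 1024) reduction(+:used) if (n >= 2048)
  for (int i = 0; i < n; ++i) {
    if (is_group_used[i]) {
      used += 1;
    }
  }
  return used;
}
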
@@ -854,6 +866,7 @@ bool GPUTreeLearner::ConstructGPUHistogramsAsync(
void GPUTreeLearner::ConstructHistograms(const std::vector<int8_t>& is_feature_used, bool use_subtract) {
std::vector<int8_t> is_sparse_feature_used(num_features_, 0);
std::vector<int8_t> is_dense_feature_used(num_features_, 0);
#pragma omp parallel for schedule(static)
for (int feature_index = 0; feature_index < num_features_; ++feature_index) {
if (!is_feature_used_[feature_index]) continue;
if (!is_feature_used[feature_index]) continue;
@@ -866,7 +879,8 @@ void GPUTreeLearner::ConstructHistograms(const std::vector<int8_t>& is_feature_u
}
// construct smaller leaf
HistogramBinEntry* ptr_smaller_leaf_hist_data = smaller_leaf_histogram_array_[0].RawData() - 1;
bool use_gpu = ConstructGPUHistogramsAsync(is_feature_used,
// ConstructGPUHistogramsAsync will return true if there are available feature groups dispatched to the GPU
bool is_gpu_used = ConstructGPUHistogramsAsync(is_feature_used,
nullptr, smaller_leaf_splits_->num_data_in_leaf(),
nullptr, nullptr,
nullptr, nullptr);
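
As the comment above notes, ConstructGPUHistogramsAsync() now reports whether any work was actually dispatched to the GPU (it returns false when no dense feature group is used), so the later wait is skipped when nothing is in flight. A minimal, self-contained sketch of that overlap pattern, using std::async as a stand-in for the OpenCL command queue (names and types here are illustrative only, not LightGBM code):

#include <future>
#include <numeric>
#include <vector>

// Launch the "device" part asynchronously, do the CPU-side sparse work while it
// runs, and wait for the device only if work was really dispatched.
double BuildHistogramsOverlapped(const std::vector<double>& dense_grads,
                                 const std::vector<double>& sparse_grads) {
  const bool gpu_used = !dense_grads.empty();  // mirrors the is_gpu_used flag above
  std::future<double> device_part;
  if (gpu_used) {
    device_part = std::async(std::launch::async, [&dense_grads] {
      // stand-in for the GPU histogram kernel
      return std::accumulate(dense_grads.begin(), dense_grads.end(), 0.0);
    });
  }
  // CPU-side sparse-feature histograms overlap with the asynchronous device work.
  double result = std::accumulate(sparse_grads.begin(), sparse_grads.end(), 0.0);
  if (gpu_used) {
    result += device_part.get();  // blocks only when device work is in flight
  }
  return result;
}
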
@@ -878,8 +892,8 @@ void GPUTreeLearner::ConstructHistograms(const std::vector<int8_t>& is_feature_u
ordered_bins_, gradients_, hessians_,
ordered_gradients_.data(), ordered_hessians_.data(),
ptr_smaller_leaf_hist_data);
// wait for GPU to finish
if (use_gpu) {
// wait for GPU to finish, only if GPU is actually used
if (is_gpu_used) {
if (tree_config_->gpu_use_dp) {
// use double precision
WaitAndGetHistograms<HistogramBinEntry>(ptr_smaller_leaf_hist_data, is_feature_used);
@@ -890,10 +904,37 @@ void GPUTreeLearner::ConstructHistograms(const std::vector<int8_t>& is_feature_u
}
}

// Compare GPU histogram with CPU histogram, useful for debugging GPU code problems
// #define GPU_DEBUG_COMPARE
#ifdef GPU_DEBUG_COMPARE
for (int i = 0; i < num_dense_feature_groups_; ++i) {
if (!feature_masks_[i])
continue;
int dense_feature_group_index = dense_feature_group_map_[i];
size_t size = train_data_->FeatureGroupNumBin(dense_feature_group_index);
HistogramBinEntry* ptr_smaller_leaf_hist_data = smaller_leaf_histogram_array_[0].RawData() - 1;
HistogramBinEntry* current_histogram = ptr_smaller_leaf_hist_data + train_data_->GroupBinBoundary(dense_feature_group_index);
HistogramBinEntry* gpu_histogram = new HistogramBinEntry[size];
data_size_t num_data = smaller_leaf_splits_->num_data_in_leaf();
printf("Comparing histogram for feature %d size %d, %lu bins\n", dense_feature_group_index, num_data, size);
std::copy(current_histogram, current_histogram + size, gpu_histogram);
std::memset(current_histogram, 0, train_data_->FeatureGroupNumBin(dense_feature_group_index) * sizeof(HistogramBinEntry));
train_data_->FeatureGroupBin(dense_feature_group_index)->ConstructHistogram(
num_data != num_data_ ? smaller_leaf_splits_->data_indices() : nullptr,
num_data,
num_data != num_data_ ? ordered_gradients_.data() : gradients_,
num_data != num_data_ ? ordered_hessians_.data() : hessians_,
current_histogram);
CompareHistograms(gpu_histogram, current_histogram, size, dense_feature_group_index);
std::copy(gpu_histogram, gpu_histogram + size, current_histogram);
delete [] gpu_histogram;
}
#endif

if (larger_leaf_histogram_array_ != nullptr && !use_subtract) {
// construct larger leaf
HistogramBinEntry* ptr_larger_leaf_hist_data = larger_leaf_histogram_array_[0].RawData() - 1;
use_gpu = ConstructGPUHistogramsAsync(is_feature_used,
is_gpu_used = ConstructGPUHistogramsAsync(is_feature_used,
larger_leaf_splits_->data_indices(), larger_leaf_splits_->num_data_in_leaf(),
gradients_, hessians_,
ordered_gradients_.data(), ordered_hessians_.data());
Expand All @@ -905,8 +946,8 @@ void GPUTreeLearner::ConstructHistograms(const std::vector<int8_t>& is_feature_u
ordered_bins_, gradients_, hessians_,
ordered_gradients_.data(), ordered_hessians_.data(),
ptr_larger_leaf_hist_data);
// wait for GPU to finish
if (use_gpu) {
// wait for GPU to finish, only if GPU is actually used
if (is_gpu_used) {
if (tree_config_->gpu_use_dp) {
// use double precision
WaitAndGetHistograms<HistogramBinEntry>(ptr_larger_leaf_hist_data, is_feature_used);
@@ -920,119 +961,24 @@ void GPUTreeLearner::ConstructHistograms(const std::vector<int8_t>& is_feature_u
}

void GPUTreeLearner::FindBestThresholds() {
std::vector<int8_t> is_feature_used(num_features_, 0);
#pragma omp parallel for schedule(static)
SerialTreeLearner::FindBestThresholds();

#if GPU_DEBUG >= 3
for (int feature_index = 0; feature_index < num_features_; ++feature_index) {
if (!is_feature_used_[feature_index]) continue;
if (parent_leaf_histogram_array_ != nullptr
if (parent_leaf_histogram_array_ != nullptr
&& !parent_leaf_histogram_array_[feature_index].is_splittable()) {
smaller_leaf_histogram_array_[feature_index].set_is_splittable(false);
continue;
}
is_feature_used[feature_index] = 1;
}
bool use_subtract = true;
if (parent_leaf_histogram_array_ == nullptr) {
use_subtract = false;
}

ConstructHistograms(is_feature_used, use_subtract);

// Compare GPU histogram with CPU histogram, useful for debugging GPU code problems
// #define GPU_DEBUG_COMPARE
#ifdef GPU_DEBUG_COMPARE
for (int i = 0; i < num_dense_feature_groups_; ++i) {
if (!feature_masks_[i])
continue;
int dense_feature_group_index = dense_feature_group_map_[i];
size_t size = train_data_->FeatureGroupNumBin(dense_feature_group_index);
HistogramBinEntry* ptr_smaller_leaf_hist_data = smaller_leaf_histogram_array_[0].RawData() - 1;
HistogramBinEntry* current_histogram = ptr_smaller_leaf_hist_data + train_data_->GroupBinBoundary(dense_feature_group_index);
HistogramBinEntry* gpu_histogram = new HistogramBinEntry[size];
data_size_t num_data = smaller_leaf_splits_->num_data_in_leaf();
printf("Comparing histogram for feature %d size %d, %lu bins\n", dense_feature_group_index, num_data, size);
std::copy(current_histogram, current_histogram + size, gpu_histogram);
std::memset(current_histogram, 0, train_data_->FeatureGroupNumBin(dense_feature_group_index) * sizeof(HistogramBinEntry));
train_data_->FeatureGroupBin(dense_feature_group_index)->ConstructHistogram(
num_data != num_data_ ? smaller_leaf_splits_->data_indices() : nullptr,
num_data,
num_data != num_data_ ? ordered_gradients_.data() : gradients_,
num_data != num_data_ ? ordered_hessians_.data() : hessians_,
current_histogram);
CompareHistograms(gpu_histogram, current_histogram, size, dense_feature_group_index);
std::copy(gpu_histogram, gpu_histogram + size, current_histogram);
delete [] gpu_histogram;
}
#endif

std::vector<SplitInfo> smaller_best(num_threads_);
std::vector<SplitInfo> larger_best(num_threads_);
// find splits
#pragma omp parallel for schedule(static)
for (int feature_index = 0; feature_index < num_features_; ++feature_index) {
if (!is_feature_used[feature_index]) { continue; }
const int tid = omp_get_thread_num();
SplitInfo smaller_split;
#if GPU_DEBUG >= 3
size_t bin_size = train_data_->FeatureNumBin(feature_index) + 1;
printf("feature %d smaller leaf (before fix):\n", feature_index);
PrintHistograms(smaller_leaf_histogram_array_[feature_index].RawData() - 1, bin_size);
#endif

train_data_->FixHistogram(feature_index,
smaller_leaf_splits_->sum_gradients(), smaller_leaf_splits_->sum_hessians(),
smaller_leaf_splits_->num_data_in_leaf(),
smaller_leaf_histogram_array_[feature_index].RawData());

#if GPU_DEBUG >= 3
printf("feature %d smaller leaf:\n", feature_index);
PrintHistograms(smaller_leaf_histogram_array_[feature_index].RawData() - 1, bin_size);
#endif
smaller_leaf_histogram_array_[feature_index].FindBestThreshold(
smaller_leaf_splits_->sum_gradients(),
smaller_leaf_splits_->sum_hessians(),
smaller_leaf_splits_->num_data_in_leaf(),
&smaller_split);
if (smaller_split.gain > smaller_best[tid].gain) {
smaller_best[tid] = smaller_split;
smaller_best[tid].feature = train_data_->RealFeatureIndex(feature_index);
}
// only has root leaf
if (larger_leaf_splits_ == nullptr || larger_leaf_splits_->LeafIndex() < 0) { continue; }

if (use_subtract) {
larger_leaf_histogram_array_[feature_index].Subtract(smaller_leaf_histogram_array_[feature_index]);
} else {
train_data_->FixHistogram(feature_index, larger_leaf_splits_->sum_gradients(), larger_leaf_splits_->sum_hessians(),
larger_leaf_splits_->num_data_in_leaf(),
larger_leaf_histogram_array_[feature_index].RawData());
}
#if GPU_DEBUG >= 4
printf("feature %d larger leaf:\n", feature_index);
PrintHistograms(larger_leaf_histogram_array_[feature_index].RawData() - 1, bin_size);
#endif
SplitInfo larger_split;
// find best threshold for larger child
larger_leaf_histogram_array_[feature_index].FindBestThreshold(
larger_leaf_splits_->sum_gradients(),
larger_leaf_splits_->sum_hessians(),
larger_leaf_splits_->num_data_in_leaf(),
&larger_split);
if (larger_split.gain > larger_best[tid].gain) {
larger_best[tid] = larger_split;
larger_best[tid].feature = train_data_->RealFeatureIndex(feature_index);
}
}

auto smaller_best_idx = ArrayArgs<SplitInfo>::ArgMax(smaller_best);
int leaf = smaller_leaf_splits_->LeafIndex();
best_split_per_leaf_[leaf] = smaller_best[smaller_best_idx];

if (larger_leaf_splits_ != nullptr && larger_leaf_splits_->LeafIndex() >= 0) {
leaf = larger_leaf_splits_->LeafIndex();
auto larger_best_idx = ArrayArgs<SplitInfo>::ArgMax(larger_best);
best_split_per_leaf_[leaf] = larger_best[larger_best_idx];
}
#endif
}

void GPUTreeLearner::Split(Tree* tree, int best_Leaf, int* left_leaf, int* right_leaf) {
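The interleaved hunk above removes most of GPUTreeLearner::FindBestThresholds() in favor of the shared serial implementation; only a GPU_DEBUG histogram dump remains. Stripped of the removed lines, the new override has roughly this shape (a readable reconstruction with the debug body abbreviated, not a verbatim copy of the committed code):

void GPUTreeLearner::FindBestThresholds() {
  // Histogram construction (including the GPU path) already happened in
  // ConstructHistograms(); the threshold search itself is now shared code.
  SerialTreeLearner::FindBestThresholds();

#if GPU_DEBUG >= 3
  // Debug-only dump of the smaller leaf's per-feature histograms (abbreviated).
#endif
}
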
3 changes: 0 additions & 3 deletions src/treelearner/gpu_tree_learner.h
@@ -25,10 +25,7 @@
// Use Boost.Compute on-disk kernel cache
#define BOOST_COMPUTE_USE_OFFLINE_CACHE
#include <boost/compute/core.hpp>
#include <boost/compute/memory/local_buffer.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/functional/math.hpp>
#include <boost/align/aligned_allocator.hpp>


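
For the header change above: BOOST_COMPUTE_USE_OFFLINE_CACHE must be defined before any Boost.Compute header is included so that compiled OpenCL kernels are cached on disk between runs, and three Boost.Compute includes that are no longer needed are dropped. A minimal stand-alone sketch of that setup (not LightGBM code; device selection here is the library default rather than LightGBM's gpu_platform_id/gpu_device_id handling):

// Define the offline kernel cache switch before including Boost.Compute.
#define BOOST_COMPUTE_USE_OFFLINE_CACHE
#include <boost/compute/core.hpp>

#include <iostream>

int main() {
  namespace compute = boost::compute;
  compute::device device = compute::system::default_device();  // first available device
  compute::context context(device);
  compute::command_queue queue(context, device);
  std::cout << "OpenCL device: " << device.name() << std::endl;
  return 0;
}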