
[feature](ES catalog)Support auto detect available nodes in node_discovery mode #46646

Open: qidaye wants to merge 2 commits into master

qidaye
Contributor

@qidaye qidaye commented Jan 8, 2025

What problem does this PR solve?

Add nodes_discovery support to the ES catalog for automatic node discovery.

When nodes_discovery is enabled, Doris automatically detects the available ES nodes and uses them to build scan ranges.
When it is disabled, Doris uses only the seed hosts, as before.
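
A minimal sketch of the host selection described above, not the code in this PR. The class `ScanHostSelector`, the method `hostsForScanRange`, and the host URLs are hypothetical names used for illustration. Falling back to the seed hosts when discovery has not found any node yet is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a class from this PR.
class ScanHostSelector {
    /**
     * Picks the hosts used to build scan ranges.
     * Discovery on and at least one detected node: use the detected nodes.
     * Otherwise (discovery off, or nothing detected yet): use the seed hosts.
     */
    static List<String> hostsForScanRange(boolean nodesDiscovery,
                                          List<String> seedHosts,
                                          List<String> discoveredNodes) {
        if (nodesDiscovery && discoveredNodes != null && !discoveredNodes.isEmpty()) {
            return new ArrayList<>(discoveredNodes);
        }
        return new ArrayList<>(seedHosts);
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList("http://es-seed:9200");
        List<String> found = Arrays.asList("http://es-1:9200", "http://es-2:9200");
        // prints [http://es-1:9200, http://es-2:9200]
        System.out.println(hostsForScanRange(true, seeds, found));
        // prints [http://es-seed:9200]
        System.out.println(hostsForScanRange(false, seeds, found));
    }
}
```

Returning a copy keeps the caller from mutating the list that a background discovery task may replace later.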

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 8, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@qidaye
Contributor Author

qidaye commented Jan 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32414 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 85a257040de2d81267bd88cab78acad05d6ff1b9, data reload: false

------ Round 1 ----------------------------------
q1	17590	6149	6058	6058
q2	2045	304	171	171
q3	10421	1250	764	764
q4	10292	858	424	424
q5	8949	2165	1950	1950
q6	209	187	149	149
q7	886	759	601	601
q8	9243	1367	1158	1158
q9	5163	4852	4907	4852
q10	6745	2284	1856	1856
q11	483	274	261	261
q12	339	358	221	221
q13	17758	3701	3032	3032
q14	240	253	217	217
q15	562	500	492	492
q16	620	602	599	599
q17	570	850	316	316
q18	6986	6325	6398	6325
q19	1222	999	570	570
q20	305	310	185	185
q21	2822	2149	1907	1907
q22	365	334	306	306
Total cold run time: 103815 ms
Total hot run time: 32414 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6240	6205	6218	6205
q2	230	333	233	233
q3	2336	2718	2344	2344
q4	1398	1824	1342	1342
q5	4234	4681	4756	4681
q6	189	177	137	137
q7	2104	1907	1828	1828
q8	2659	2787	2714	2714
q9	7272	7216	7290	7216
q10	3099	3349	2849	2849
q11	590	508	489	489
q12	674	737	632	632
q13	3437	3884	3179	3179
q14	299	321	276	276
q15	574	524	508	508
q16	654	707	651	651
q17	1240	1759	1246	1246
q18	7789	7367	7457	7367
q19	821	1144	1118	1118
q20	2022	1988	1890	1890
q21	5822	5375	4835	4835
q22	609	598	618	598
Total cold run time: 54292 ms
Total hot run time: 52338 ms
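
The bot does not label its columns. Judging from the numbers above, each row appears to be: query name, cold run, two hot runs, and the faster of the two hot runs, with the totals being the sums of the cold column and of the best-hot column. The sketch below recomputes the totals under that inferred layout; `BenchTotals` and `totals` are hypothetical names, and only three Round 1 rows are used to keep it short.

```java
import java.util.Arrays;

// Hypothetical helper; the column meaning is inferred from the data, not documented by the bot.
class BenchTotals {
    /** Returns {coldTotal, hotTotal} for rows shaped like "name cold hot1 hot2 best". */
    static long[] totals(String[] rows) {
        long cold = 0;
        long hot = 0;
        for (String row : rows) {
            String[] f = row.trim().split("\\s+");
            cold += Long.parseLong(f[1]);
            // The best hot run is the minimum of the two hot runs.
            hot += Math.min(Long.parseLong(f[2]), Long.parseLong(f[3]));
        }
        return new long[] {cold, hot};
    }

    public static void main(String[] args) {
        String[] rows = {
            "q1\t17590\t6149\t6058\t6058",
            "q2\t2045\t304\t171\t171",
            "q9\t5163\t4852\t4907\t4852",
        };
        // prints [24798, 11081]
        System.out.println(Arrays.toString(totals(rows)));
    }
}
```

Summing all 22 Round 1 rows the same way gives 103815 ms cold and 32414 ms hot, which matches the totals the bot reports.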

@qidaye qidaye force-pushed the es_catalog_node_auto_activation branch from 85a2570 to e082af9 Compare January 8, 2025 14:19
@qidaye
Contributor Author

qidaye commented Jan 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32498 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e082af9ffcfcac1e69b2e77fc933d07104556b97, data reload: false

------ Round 1 ----------------------------------
q1	17632	6158	6051	6051
q2	2048	309	163	163
q3	10424	1240	742	742
q4	10237	887	442	442
q5	7975	2149	1959	1959
q6	211	183	147	147
q7	901	743	601	601
q8	9233	1351	1158	1158
q9	5211	4802	4857	4802
q10	6745	2301	1858	1858
q11	491	274	262	262
q12	345	359	220	220
q13	17745	3696	3041	3041
q14	232	224	213	213
q15	542	502	493	493
q16	636	608	603	603
q17	571	856	324	324
q18	6821	6461	6397	6397
q19	1599	971	553	553
q20	308	315	186	186
q21	2933	2154	1984	1984
q22	363	337	299	299
Total cold run time: 103203 ms
Total hot run time: 32498 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6260	6240	6243	6240
q2	238	327	233	233
q3	2230	2667	2312	2312
q4	1406	1823	1370	1370
q5	4346	4758	4722	4722
q6	186	183	141	141
q7	2084	1949	1837	1837
q8	2670	2865	2690	2690
q9	7270	7220	7296	7220
q10	3078	3315	2850	2850
q11	582	498	494	494
q12	640	714	574	574
q13	3486	3922	3269	3269
q14	276	321	293	293
q15	569	516	503	503
q16	662	691	654	654
q17	1252	1776	1270	1270
q18	7611	7470	7270	7270
q19	858	1163	1154	1154
q20	1991	2000	1971	1971
q21	5772	5345	5045	5045
q22	617	604	594	594
Total cold run time: 54084 ms
Total hot run time: 52706 ms

@doris-robot

TPC-DS: Total hot run time: 195030 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e082af9ffcfcac1e69b2e77fc933d07104556b97, data reload: false

query1	1318	950	920	920
query2	6465	2374	2397	2374
query3	11004	4505	4639	4505
query4	36174	23586	23811	23586
query5	4162	625	479	479
query6	292	206	194	194
query7	4007	498	309	309
query8	312	254	239	239
query9	9421	2738	2732	2732
query10	443	316	256	256
query11	16202	15210	15164	15164
query12	159	110	105	105
query13	1576	558	400	400
query14	9334	6816	6883	6816
query15	255	221	200	200
query16	7842	622	449	449
query17	1563	743	627	627
query18	2095	410	304	304
query19	206	174	176	174
query20	125	140	116	116
query21	205	124	112	112
query22	4450	4373	4514	4373
query23	34366	33050	33090	33050
query24	7547	2367	2414	2367
query25	495	448	422	422
query26	808	279	150	150
query27	2799	476	338	338
query28	5725	2517	2494	2494
query29	586	545	436	436
query30	212	182	159	159
query31	971	875	786	786
query32	71	66	59	59
query33	472	354	296	296
query34	801	853	534	534
query35	797	859	766	766
query36	1003	1071	947	947
query37	115	99	81	81
query38	4065	4175	4317	4175
query39	1558	1473	1465	1465
query40	209	121	108	108
query41	48	43	42	42
query42	127	101	108	101
query43	537	531	505	505
query44	1379	846	850	846
query45	185	177	171	171
query46	926	1069	681	681
query47	1883	1870	1827	1827
query48	395	407	350	350
query49	736	486	391	391
query50	656	667	417	417
query51	7207	7022	6902	6902
query52	101	105	93	93
query53	236	259	184	184
query54	507	491	419	419
query55	83	84	83	83
query56	258	256	244	244
query57	1210	1175	1112	1112
query58	244	252	238	238
query59	3273	3270	3049	3049
query60	287	268	265	265
query61	111	103	109	103
query62	852	789	769	769
query63	239	191	188	188
query64	3121	1069	659	659
query65	3345	3259	3212	3212
query66	768	411	306	306
query67	16501	15620	15388	15388
query68	8774	714	511	511
query69	496	291	254	254
query70	1223	1055	1152	1055
query71	441	280	253	253
query72	6508	3837	3886	3837
query73	655	758	365	365
query74	10187	8820	9000	8820
query75	4801	3158	2638	2638
query76	4832	1189	786	786
query77	852	359	275	275
query78	10092	9916	9392	9392
query79	3124	805	598	598
query80	678	515	440	440
query81	487	269	233	233
query82	208	161	124	124
query83	205	165	145	145
query84	281	158	74	74
query85	755	359	306	306
query86	360	299	273	273
query87	4405	4519	4429	4429
query88	3373	2228	2198	2198
query89	419	324	289	289
query90	2115	189	187	187
query91	131	133	103	103
query92	69	54	50	50
query93	2270	843	591	591
query94	656	395	287	287
query95	332	263	258	258
query96	493	616	284	284
query97	2827	2909	2807	2807
query98	227	212	192	192
query99	1644	1518	1364	1364
Total cold run time: 299345 ms
Total hot run time: 195030 ms

@doris-robot

ClickBench: Total hot run time: 31.75 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e082af9ffcfcac1e69b2e77fc933d07104556b97, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.07
query4	1.60	0.10	0.11
query5	0.41	0.42	0.42
query6	1.16	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.50	0.51
query10	0.55	0.57	0.53
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.62	0.59
query14	2.73	2.84	2.82
query15	0.90	0.82	0.82
query16	0.37	0.38	0.36
query17	1.05	1.05	1.04
query18	0.23	0.20	0.20
query19	1.87	1.78	1.98
query20	0.01	0.01	0.02
query21	15.36	0.87	0.58
query22	0.75	0.83	0.68
query23	15.28	1.34	0.60
query24	2.69	1.45	1.37
query25	0.18	0.09	0.10
query26	0.25	0.15	0.14
query27	0.06	0.06	0.05
query28	14.47	1.55	1.04
query29	12.59	3.92	3.28
query30	0.25	0.08	0.07
query31	2.81	0.59	0.37
query32	3.23	0.55	0.46
query33	3.17	3.07	3.12
query34	16.95	5.12	4.47
query35	4.48	4.47	4.50
query36	0.64	0.49	0.49
query37	0.11	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.14	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 106.46 s
Total hot run time: 31.75 s

@qidaye
Contributor Author

qidaye commented Jan 9, 2025

run buildall

@morningman
Contributor

This pull request introduces several enhancements and new features related to Elasticsearch (ES) integration in the project. The most significant changes include the addition of EsNodeDiscovery for automatic node detection, updates to EsExternalCatalog to manage available nodes, and modifications to the CatalogMgr and related classes to support these new features.

Enhancements to Elasticsearch Integration:

  • Automatic Node Discovery:

    • Added EsNodeDiscovery class for periodic detection of available ES nodes. (fe/fe-core/src/main/java/org/apache/doris/datasource/es/EsNodeDiscovery.java)
    • Integrated EsNodeDiscovery with InternalCatalog to manage ES node information. (fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java, fe/fe-core/src/main/java/org/apache/doris/datasource/InternalCatalog.java)
  • Catalog Management:

    • Updated CatalogMgr to register and deregister ES catalogs with EsNodeDiscovery. (fe/fe-core/src/main/java/org/apache/doris/datasource/CatalogMgr.java)
  • Elasticsearch Catalog Enhancements:

    • Modified EsExternalCatalog to track and update available nodes information. (fe/fe-core/src/main/java/org/apache/doris/datasource/es/EsExternalCatalog.java)
    • Enhanced EsRestClient to fetch a list of HTTP nodes. (fe/fe-core/src/main/java/org/apache/doris/datasource/es/EsRestClient.java)
  • ES Table and Search Context:

    • Updated EsTable to include available nodes information. (fe/fe-core/src/main/java/org/apache/doris/catalog/EsTable.java)
    • Modified SearchContext to utilize available nodes information during query execution. (fe/fe-core/src/main/java/org/apache/doris/datasource/es/SearchContext.java, fe/fe-core/src/main/java/org/apache/doris/datasource/es/PartitionPhase.java)
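
The register, deregister, and periodic-detection flow listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the EsNodeDiscovery class from this PR: `NodeDiscoverySketch`, its methods, and the `Supplier`-based fetcher (standing in for a REST call that lists HTTP nodes) are all hypothetical. Keeping the previous node list when a refresh fails or returns nothing is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch; names and behavior are assumptions, not the PR's implementation.
class NodeDiscoverySketch {
    // catalog name -> function that asks the cluster for its current HTTP nodes
    private final Map<String, Supplier<List<String>>> fetchers = new HashMap<>();
    // catalog name -> last successfully fetched node list
    private final Map<String, List<String>> availableNodes = new HashMap<>();

    synchronized void register(String catalog, Supplier<List<String>> fetcher) {
        fetchers.put(catalog, fetcher);
    }

    synchronized void deregister(String catalog) {
        fetchers.remove(catalog);
        availableNodes.remove(catalog);
    }

    /** One discovery pass over all registered catalogs. */
    synchronized void refreshOnce() {
        for (Map.Entry<String, Supplier<List<String>>> e : fetchers.entrySet()) {
            try {
                List<String> nodes = e.getValue().get();
                if (nodes != null && !nodes.isEmpty()) {
                    availableNodes.put(e.getKey(), new ArrayList<>(nodes));
                }
            } catch (RuntimeException ex) {
                // Fetch failed: keep the previous node list for this catalog.
            }
        }
    }

    synchronized List<String> nodesOf(String catalog) {
        List<String> nodes = availableNodes.get(catalog);
        return nodes == null ? Collections.<String>emptyList() : new ArrayList<>(nodes);
    }

    /** Runs refreshOnce periodically; the caller must shut the returned executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::refreshOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Separating `refreshOnce` from the scheduler lets a unit test drive a discovery pass directly instead of waiting on a timer.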

Dependency Updates:

  • Test Dependency:
    • Added mockito-core as a test dependency in the pom.xml file. (fe/fe-core/pom.xml)

Code Quality:

  • Checkstyle Suppression:
    • Added suppression for static import checks in EsExternalCatalogTest. (fe/check/checkstyle/suppressions.xml)

@doris-robot

TPC-H: Total hot run time: 33298 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 77e393b5c563c5b1e9cd0fcaf630be0c5b25b783, data reload: false

------ Round 1 ----------------------------------
q1	17586	7317	6071	6071
q2	2092	317	174	174
q3	10573	1241	766	766
q4	10203	900	449	449
q5	7540	2265	2050	2050
q6	221	179	147	147
q7	925	789	598	598
q8	9242	1527	1363	1363
q9	5220	4959	5038	4959
q10	6840	2302	1874	1874
q11	497	286	262	262
q12	353	395	226	226
q13	17747	3778	3104	3104
q14	235	235	213	213
q15	566	508	507	507
q16	641	640	586	586
q17	596	909	343	343
q18	7147	6483	6517	6483
q19	2261	1034	586	586
q20	312	341	192	192
q21	2992	2221	2037	2037
q22	367	347	308	308
Total cold run time: 104156 ms
Total hot run time: 33298 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6459	6362	6441	6362
q2	248	331	233	233
q3	2277	2659	2350	2350
q4	1443	1804	1333	1333
q5	4348	4828	4928	4828
q6	210	182	146	146
q7	2121	2012	1811	1811
q8	2680	2896	2814	2814
q9	7265	7321	7259	7259
q10	3025	3268	2843	2843
q11	612	518	496	496
q12	667	779	663	663
q13	3581	3969	3326	3326
q14	284	299	266	266
q15	584	508	508	508
q16	656	675	641	641
q17	1271	1789	1279	1279
q18	7831	7527	7348	7348
q19	862	1237	1242	1237
q20	2035	2057	1963	1963
q21	5653	5200	5073	5073
q22	634	636	579	579
Total cold run time: 54746 ms
Total hot run time: 53358 ms

@doris-robot

TPC-DS: Total hot run time: 196311 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 77e393b5c563c5b1e9cd0fcaf630be0c5b25b783, data reload: false

query1	1322	977	917	917
query2	6325	2501	2365	2365
query3	10993	4837	4927	4837
query4	32782	23968	23516	23516
query5	4252	608	473	473
query6	282	204	183	183
query7	3979	491	316	316
query8	307	241	244	241
query9	9587	2715	2719	2715
query10	452	315	248	248
query11	17936	15474	14990	14990
query12	161	108	109	108
query13	1561	547	406	406
query14	10052	7757	7393	7393
query15	245	199	196	196
query16	7976	640	480	480
query17	1561	800	630	630
query18	2136	419	308	308
query19	220	202	160	160
query20	117	116	114	114
query21	214	133	103	103
query22	4550	4720	4590	4590
query23	33984	33531	33373	33373
query24	6264	2257	2388	2257
query25	488	470	393	393
query26	738	272	153	153
query27	2026	473	332	332
query28	5100	2529	2484	2484
query29	663	546	438	438
query30	211	180	153	153
query31	983	914	811	811
query32	83	64	56	56
query33	483	340	307	307
query34	766	854	521	521
query35	792	835	730	730
query36	1000	1030	994	994
query37	128	98	78	78
query38	4235	4202	4268	4202
query39	1534	1452	1449	1449
query40	200	112	97	97
query41	51	54	49	49
query42	125	110	105	105
query43	524	536	494	494
query44	1366	861	845	845
query45	181	175	174	174
query46	887	1069	661	661
query47	1916	1916	1887	1887
query48	391	401	323	323
query49	721	490	383	383
query50	645	680	399	399
query51	7120	7065	6909	6909
query52	112	101	96	96
query53	235	266	185	185
query54	483	485	410	410
query55	86	88	78	78
query56	240	255	242	242
query57	1249	1204	1159	1159
query58	234	231	240	231
query59	3063	3173	3132	3132
query60	279	283	250	250
query61	112	114	110	110
query62	864	812	731	731
query63	256	189	190	189
query64	3419	1012	680	680
query65	3303	3267	3270	3267
query66	766	405	299	299
query67	16178	15764	15482	15482
query68	8708	696	546	546
query69	464	286	248	248
query70	1216	1160	1123	1123
query71	434	278	254	254
query72	6466	3939	3854	3854
query73	647	739	359	359
query74	10009	9118	8767	8767
query75	4131	3150	2673	2673
query76	3582	1190	752	752
query77	765	366	274	274
query78	10098	9977	9618	9618
query79	3297	849	599	599
query80	649	541	434	434
query81	499	274	219	219
query82	551	155	126	126
query83	161	157	143	143
query84	240	91	71	71
query85	793	350	295	295
query86	394	306	293	293
query87	4552	4394	4230	4230
query88	4524	2185	2187	2185
query89	404	330	293	293
query90	1742	183	187	183
query91	127	131	105	105
query92	68	56	51	51
query93	1893	890	539	539
query94	659	403	273	273
query95	336	306	246	246
query96	479	618	287	287
query97	2834	2961	2828	2828
query98	224	197	196	196
query99	1398	1467	1360	1360
Total cold run time: 293838 ms
Total hot run time: 196311 ms

@doris-robot

ClickBench: Total hot run time: 31.21 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 77e393b5c563c5b1e9cd0fcaf630be0c5b25b783, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.10	0.11
query5	0.42	0.43	0.39
query6	1.16	0.65	0.66
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.50	0.49
query10	0.54	0.56	0.54
query11	0.15	0.10	0.10
query12	0.14	0.10	0.11
query13	0.61	0.59	0.59
query14	2.84	2.85	2.73
query15	0.89	0.83	0.83
query16	0.39	0.37	0.40
query17	1.06	1.02	0.99
query18	0.23	0.21	0.20
query19	1.87	1.89	2.02
query20	0.01	0.00	0.01
query21	15.35	0.87	0.58
query22	0.76	0.69	0.60
query23	15.47	1.37	0.58
query24	2.92	1.20	0.97
query25	0.20	0.23	0.11
query26	0.26	0.15	0.14
query27	0.08	0.06	0.06
query28	13.91	1.59	1.04
query29	12.58	3.96	3.28
query30	0.24	0.09	0.06
query31	2.82	0.59	0.37
query32	3.22	0.55	0.46
query33	3.04	3.11	3.18
query34	16.83	5.04	4.46
query35	4.47	4.45	4.45
query36	0.64	0.48	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 106.2 s
Total hot run time: 31.21 s


4 participants