devel
-----
* Fix implicit capture of views in the context of a JS transaction.
* Fix a crash caused by returning a result produced by the ANALYZER function.
* Update the Web UI's list of built-in AQL functions for proper syntax
highlighting in the query editor.
* Bug-fix: in the case of very rare network issues, there was a chance that
an AQL query could get stuck during a cleanup and after a commit.
This would cause the client to receive a timeout and the Coordinator to
block a Scheduler thread. This situation is now resolved and the thread
will not be blocked anymore. We also added log messages for the case that the
query could not be cleaned up successfully, which would leave locks on shards
behind.
* Switched to GCC 10 as the default compiler and use Sandy Bridge as the
default required architecture (Linux, macOS binaries).
* Fix an assertion failure that occurred when restoring view definitions from
a cluster into a single server.
* Added new ArangoSearch analyzer type "stopwords".
* Fix error message in case of index unique constraint violations. They were
lacking the actual error message (i.e. "unique constraint violated") and
only showed the index details. The issue was introduced only in devel in Feb.
* Removed old metrics in the new v2 metrics API. Those metric endpoints were
identical to the sum value of histograms.
* Allow process-specific logfile names.
This change allows replacing '$PID' with the current process id in the
`--log.output` and `--audit.output` startup parameters.
This way it is easier to write process-specific logfiles.
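As an illustration only, the substitution can be pictured with the following
Python sketch. The helper name `expand_pid` and the example path are
hypothetical; arangod's actual implementation is not shown in this changelog.

```python
import os


def expand_pid(template, pid=None):
    """Replace every '$PID' placeholder with a process id (hypothetical helper)."""
    if pid is None:
        pid = os.getpid()  # default to the id of the current process
    return template.replace("$PID", str(pid))


# Example value as it might be passed to --log.output (the path is made up).
print(expand_pid("file:///var/log/arangod.$PID.log", pid=4711))
```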
* Backport a bugfix from upstream RocksDB for opening encrypted files with
small sizes. Without the bugfix, the server may run into assertion failures
during recovery.
* Fix duplicate leaving of V8 contexts when returning streaming cursors.
The `exitContext` call done on query shutdown could previously try to exit
the V8 context multiple times, which would cause undefined behavior. Now
we are tracking if we already left the context to prevent duplicate invocation.
* In a cluster, do not create the collections `_statistics`, `_statistics15` and
`_statisticsRaw` on DB servers. These collections should only be created by the
coordinator, and should translate into 2 shards each on DB servers. But there
shouldn't be shards named `_statistics*` on DB servers.
* Fixed two bogus messages about hotbackup restore:
- Coordinators unconditionally logged the message "Got a hotbackup restore
event, getting new cluster-wide unique IDs..." on shutdown. This was not
necessarily related to a hotbackup restore.
- DB servers unconditionally logged the message "Strange, we could not
unregister the hotbackup restore callback." on shutdown, although this was
meaningless.
* Rename "save" return attribute to "dst" in AQL functions `DATE_UTCTOLOCAL` and
`DATE_LOCALTOUTC`.
* Fix potentially undefined behavior when creating a CalculationTransactionContext
for an ArangoSearch analyzer. An uninitialized struct member was passed as an
argument to its base class. This potentially had no observable effects, but
should be fixed.
* Retry a cluster internal network request if the connection comes from the
pool and turns out to be stale (connection immediately closed). This fixes
some spurious errors after a hotbackup restore.
* Fix progress reporting for arangoimport with large files on Windows.
Previously, progress was only reported for the first 2GB of data due to an
int overflow.
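The 2GB boundary is consistent with a byte counter held in a signed 32-bit
integer. That is an assumption here, as the entry only says "int overflow".
A small Python sketch of how such a counter wraps around:

```python
import ctypes


def as_int32(value):
    """Interpret value as a signed 32-bit integer, wrapping on overflow."""
    return ctypes.c_int32(value).value


two_gb = 2 * 1024 ** 3       # 2147483648 bytes
print(as_int32(two_gb - 1))  # largest byte count that still fits
print(as_int32(two_gb))      # one byte more wraps to a negative number
```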
* Log the actual signal instead of "control-c" and also include the process id
of the process that sent the signal.
* Fixed GitHub issue #13665: Improve index selection when there are multiple
candidate indexes.
* When dropping a collection or an index with a large number of documents, the
key range for the collection/index in RocksDB gets compacted. Previously, the
compaction was running in foreground and thus would block the deletion operations.
Now, the compaction is running in background, so that the deletion operations
can return earlier.
The maximum number of compaction jobs that are executed in background can be
configured using the new startup parameter `--rocksdb.max-parallel-compactions`,
which defaults to 2.
* Put Sync/LatestID into hotbackup and restore it on hotbackup restore
if it is in the backup. This helps with unique key generation after
a hotbackup is restored to a young cluster.
* Fixed a bug in the index count optimization that double-counted documents
when using array expansions in the fields definition.
* Don't store selectivity estimate values for newly created system collections.
Not storing the estimates has a benefit especially for the `_statistics`
system collections, which are written to periodically even on otherwise
idle servers. In this particular case, the actual statistics data was way
smaller than the writes caused by the index estimate values, causing a
disproportional overhead just for maintaining the selectivity estimates.
The change now turns off the selectivity estimates for indexes in all newly
created system collections, and for new user-defined indexes of type
"persistent", "hash" or "skiplist", there is now an attribute "estimates"
which can be set to `false` to disable the selectivity estimates for the index.
The attribute is optional. Not setting it will lead to the index being
created with selectivity estimates, so this is a downwards-compatible change
for user-defined indexes.
* Added startup option `--query.global-memory-limit` to set a limit on the
combined estimated memory usage of all AQL queries (in bytes).
If this option has a value of `0`, then no memory limit is in place.
This is also the default value and the same behavior as in previous versions
of ArangoDB.
Setting the option to a value greater than zero will mean that the total memory
usage of all AQL queries will be limited approximately to the configured value.
The limit is enforced by each server in a cluster independently, i.e. it can
be set separately for coordinators, DB servers etc. The memory usage of a
query that runs on multiple servers in parallel is not summed up, but tracked
separately on each server.
If a memory allocation in a query would lead to the violation of the configured
global memory limit, then the query is aborted with error code 32 ("resource
limit exceeded").
The global memory limit is approximate, in the same fashion as the per-query
limit provided by the option `--query.memory-limit` is. Some operations,
namely calls to AQL functions and their intermediate results, are currently
not properly tracked.
If both `--query.global-memory-limit` and `--query.memory-limit` are set,
the former must be set at least as high as the latter.
To reduce the cost of globally tracking the memory usage of AQL queries, the
global memory usage counter is only updated in steps of 32 kb, making
this also the minimum granularity of the global memory usage figure.
In the same fashion, the granularity of the peak memory usage counter inside
each query was also adjusted to steps of 32 kb.
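The following Python sketch shows one way such coarse-grained tracking could
work. It is not the arangod implementation: the class names, the round-up
policy and the use of an exception are assumptions made for illustration.
Only the 32 kb step size, the meaning of a limit of `0` and error code 32
are taken from the description above.

```python
CHUNK = 32 * 1024  # the global counter only moves in steps of 32 kb


class ResourceLimitExceeded(Exception):
    code = 32  # "resource limit exceeded"


class GlobalMemoryTracker:
    def __init__(self, limit=0):
        self.limit = limit  # 0 means that no global limit is in place
        self.used = 0       # always a multiple of CHUNK

    def adjust(self, delta):
        new_used = self.used + delta
        if delta > 0 and self.limit > 0 and new_used > self.limit:
            raise ResourceLimitExceeded("resource limit exceeded")
        self.used = new_used


class QueryMemoryTracker:
    def __init__(self, global_tracker):
        self.global_tracker = global_tracker
        self.local = 0      # exact usage of this query in bytes
        self.published = 0  # amount already reported to the global tracker

    def allocate(self, nbytes):
        new_local = self.local + nbytes
        # Round up to the next multiple of CHUNK (assumed policy).
        target = -(-new_local // CHUNK) * CHUNK
        if target != self.published:
            # Only touch the shared counter when a chunk boundary is crossed.
            self.global_tracker.adjust(target - self.published)
            self.published = target
        self.local = new_local

    def release(self, nbytes):
        self.allocate(-nbytes)
```

In this sketch most allocations never touch the shared counter, which is the
cost reduction the coarse granularity is meant to achieve.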
* Added startup option `--query.memory-limit-override` to control whether
individual AQL queries can increase their memory limit via the `memoryLimit`
query option. This is the default, so a query that increases its memory limit
is allowed to use more memory.
The new option `--query.memory-limit-override` allows turning this behavior
off, so that individual queries can only lower their maximum allowed memory
usage.
* Added metric `arangodb_aql_global_memory_usage` to expose the total amount
of memory (in steps of 32 kb) that is currently in use by all AQL queries.
* Added metric `arangodb_aql_global_memory_limit` to expose the memory limit
from startup option `--query.global-memory-limit`.
* Allow setting path to the timezone information via the `TZ_DATA` environment
variable, in the same fashion as the currently existing `ICU_DATA` environment
variable. The `TZ_DATA` variable is useful in environments that start arangod
from some unusual locations, when it can't find its `tzdata` directory
automatically.
* Fixed a bug in query cost estimation when a NoResults node occurred in a spliced
subquery. This could lead to a server crash.
* Fix slower-than-necessary arangoimport behavior:
arangoimport has a built-in rate limiter, which can be useful for importing
data with a somewhat constant rate. However, it is enabled by default and
limits imports to 1MB per second. These settings are not useful.
This change turns the rate limiting off by default, and sets the default
chunk size to 8MB (up from 1MB) as well. This means that arangoimport will
send larger batches to the server by default. The already existing `--batch-size`
option can be used to control the maximum size of each batch.
The new parameter `--auto-rate-limit` can now be used to toggle rate limiting.
It defaults to off, whereas previously rate limiting was enabled by default
unless `--batch-size` was specified when arangoimport was invoked.
* The cluster dashboard charts in the web UI are now more readable during the
initialization phase. Additionally, the number of agents is now displayed
there as well. An agent failure, if there is one, will also appear there.
* Added more useful information during the SmartGraph creation in the web UI
in case the current database is a OneShard database.
* Add support for building with Zen 3 CPU when optimizing for the local
architecture.
* The web UI's node overview now also displays agent information (cluster only).
* The statistics view in the web UI now provides more system specific
information in case the Metrics API is enabled. Different statistics may
be visible depending on the operating system.
* Added metrics documentation snippets and infrastructure for that.
* Added a new cluster distribution view to the web UI. The view includes general
details about the cluster-wide distribution as well as more detailed
shard distribution specific information.
* Follower primaries respond with
TRI_ERROR_CLUSTER_SHARD_FOLLOWER_REFUSES_OPERATION
to any read request. This fixes an incorrect 404 response seen in chaos
tests.
* Fixed GitHub issue #13632: Query Fails on Upsert with Replace_nth.
* Reasonably harden MoveShard against invalid VelocyPack input.
* Removed older reference to VelocyPackDumper.
* Added `--documents-per-batch` option to arangoexport.
This option allows controlling the number of documents to be returned by each
server-side batch. It can be used to limit the number of documents per batch
when exporting collections with large documents.
* Added a new metrics view to the web UI. This view can be used in a clustered
environment as well as in a single instance. Metrics are displayed either in
a tabular format or as plain text (Prometheus Text-based format).
Additionally, the metrics can be downloaded there.
* Added a new maintenance mode tab to the web UI in cluster mode.
The new tab shows the current state of the cluster supervision maintenance
and allows enabling/disabling the maintenance mode from there. The tab will
only be visible in the `_system` database. The required privileges for
displaying the maintenance mode status and/or changing it are the same as for
using the REST APIs for the maintenance mode.
* Fixed a problem that coordinators would vanish from the UI and the Health
API if one switched the agency Supervision into maintenance mode and
left that maintenance mode on for more than 24h.
* Fixed a bug in the web interface that displayed the error "Not authorized to
execute this request" when trying to create an index in the web interface in a
database other than `_system` with a user that does not have any access
permissions for the `_system` database.
The previously displayed error message actually came from an internal
request made by the web interface, but it did not affect the actual index
creation.
* Added ability to display Coordinator and DBServer logs from inside the Web UI
in a clustered environment when privileges are sufficient.
Additionally, displayed log entries can now be downloaded from the web UI in
single server and in cluster mode.
* The Web UI's info view of a collection now displays additional properties and
statistics (e.g. RocksDB related figures, sharding information and more).
* Improve progress reporting for shard synchronization in the web UI.
The UI will now show how many shards are actively syncing data, and will
provide a better progress indicator, especially if there is more than one
follower for a shard.
* Fixed issue BTS-309: The Graph API (Gharial) did not respond with the correct
HTTP status code when validating edges. It now responds with 400 (Bad Request)
as documented and a new, more precise error code (1947) and message if a vertex
collection referenced in the _from or _to attribute is not part of the graph.
* Added `--shard` option to arangodump, so that dumps can be restricted to one or
multiple shards only.
* Enable statistics in web UI in non-`_system` databases in cluster mode.
In cluster mode, the web UI dashboard did not display statistics properly
when not being logged in to the `_system` database. For all other databases
than `_system`, no statistics were displayed but just some "No data..."
placeholders.
Statistics for non-`_system` databases were not properly displayed since
3.7.6 due to an internal change in the statistics processing.
In addition, a new startup option `--server.statistics-all-databases`
controls whether cluster statistics are displayed in the web interface for
all databases (if the option is set to `true`) or just for the system
database (if the option is set to `false`).
The default value for the option is `true`, meaning statistics will be
displayed in the web interface for all databases.
* Add optional hostname logging to log messages.
Whether or not the hostname is added to each log message can be controlled via
the new startup option `--log.hostname`. Its default value is the empty string,
meaning no hostname will be added to log messages.
Setting the option to an arbitrary string value will make this string be logged
in front of each regular log message, and inside the `hostname` attribute in
case of JSON-based logging. Setting the option to a value of `auto` will use
the hostname as returned by `gethostbyname`.
* Added logging of elapsed time of ArangoSearch commit/consolidation/cleanup
jobs.
* Added list-repeat AIR primitive that creates a list containing n copies of the input value.
* Updated OpenSSL to 1.1.1j and OpenLDAP to 2.4.57.
* Prevent arangosh from trying to connect after every executed command.
This fixes the case when arangosh is started with default options, but no
server is running on localhost:8529. In this particular case, arangosh will
try to connect on startup and after every executed shell command. The
connect attempts all fail and time out after 300ms.
In this case we now don't try to reconnect after every command.
* Added 'custom-query' testcase to arangobench to allow execution of custom
queries.
This also adds the options `--custom-query` and `--custom-query-file` for
arangobench.
* Addition to the internal Refactoring of K_PATHS feature: K_PATHS queries are
now being executed on the new refactored graph engine in a clustered
environment. This change should not have any visible effect on users.
* Reduce memory footprint of agency Store in Node class.
* Cleanup old hotbackup transfer jobs in agency.
* On Windows create a minidump in case of an unhandled SEH exception for
post-mortem debugging.
* Add JWT secret support for arangodump and arangorestore, i.e. they now also
provide the command-line options `--server.ask-jwt-secret` and
`--server.jwt-secret-keyfile` with the same meanings as in arangosh.
* Add optional hyperlink to program option sections for information purposes,
and add optional sub-headlines to program options for better grouping.
These changes will be visible only when using `--help`.
* For Windows builds, remove the defines `_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS`
and `_ENABLE_ATOMIC_ALIGNMENT_FIX` that were needed to build Boost components
with MSVC in older versions of Boost and MSVC.
Both of these defines are obsolete nowadays.
* Database initial sync considers document count on leader for
estimating timeouts when over 1 million docs on leader.
* EE only bugfix: On DisjointSmartGraphs that are used in an anonymous way,
there was a chance that the query could fail if non-disjoint collections
were part of the query. Named DisjointSmartGraphs were safe from this bug.
Example:
DisjointSmartGraph (graph) on vertices -edges-> vertices
Query:
WITH vertices, unrelated
FOR out IN 1 OUTBOUND "v/1:1" edges
FOR u IN unrelated
RETURN [out, u]
The "unrelated" collection was pulled into the DisjointSmartGraph, causing
the AQL setup to create erroneous state.
This is now fixed and the above query works.
This query:
WITH vertices, unrelated
FOR out IN 1 OUTBOUND "v/1:1" GRAPH "graph"
FOR u IN unrelated
RETURN [out, u]
was not affected by this bug.
* Fixed issue BTS-268: fix a flaky Foxx self-heal procedure.
* Fixed issue DEVSUP-720: Within an AQL query, the "COLLECT WITH COUNT INTO"
statement could lead to a wrong count output when used in combination with
an index which has been created with an array index attribute.
* Fixed issue #13117: Aardvark: Weird cursor offsets in query editor.
Disabled font ligatures for Ace editor in Web UI to avoid rare display issue.
* Fixed ES-784 regression related to encryption cipher propagation to
ArangoSearch data.
* Improved the wording for sharding options displayed in the web interface.
Instead of offering `flexible` and `single`, now use the more intuitive
`Sharded` and `OneShard` options, and update the help text for them.
* Make all AQL cursors return compact result arrays.
* Fix profiling of AQL queries with the `silent` and `stream` options set in
combination. Using the `silent` option makes a query execute, but discard all
its results instantly. This led to some confusion in streaming queries, which
can return the first query results once they are available, but don't
necessarily execute the full query.
Now, `silent` correctly discards all results even in streaming queries, but
this has the effect that a streaming query will likely be executed completely
when the `silent` option is set. This is not the default however, and the
`silent` option is normally not set. There is no change for streaming queries
if the `silent` option is not set.
As a side-effect of this change, this makes profiling (i.e. using
`db._profileQuery(...)`) work for streaming queries as well. Previously,
profiling a streaming query could have led to some internal errors, and even
query results being returned, even though profiling a query should not return
any query results.
* Make dropping of indexes in cluster retry in case of precondition failed.
When dropping an index of a collection in the cluster, the operation could
fail with a "precondition failed" error in case there were simultaneous
index creation or drop actions running for the same collection. The error
was returned properly internally, but got lost at the point when
`<collection>.dropIndex()` simply converted any error to just `false`.
We can't make `dropIndex()` throw an exception for any error, because that
would affect downwards-compatibility. But in case there is a simultaneous
change to the collection indexes, we can just retry our own operation and
check if it succeeds then. This is what `dropIndex()` will do now.
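The retry behavior described above can be sketched roughly as follows. This
is a Python illustration, not the actual `dropIndex()` code; the exception
class, the `drop_once` callback and the attempt limit are all hypothetical.

```python
class PreconditionFailed(Exception):
    """Stands in for the "precondition failed" error (hypothetical class)."""


def drop_index_with_retry(drop_once, max_attempts=5):
    """Call drop_once() until it succeeds, retrying on PreconditionFailed.

    Returns True on success and False on any other outcome, mirroring the
    boolean result of dropIndex().
    """
    for _ in range(max_attempts):
        try:
            drop_once()
            return True
        except PreconditionFailed:
            continue      # a concurrent index change interfered, try again
        except Exception:
            return False  # any other error is still reported as False
    return False          # gave up after max_attempts (assumed limit)
```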
* Try to raise file descriptors limit in local start scripts (in `scripts/`
directory - used for development only).
* Fix error reporting in the reloadTLS route.
* Fix potential undefined behavior when iterating over connected nodes in an
execution plan and calling callbacks for each of the nodes: if the callbacks
modified the list of connected nodes of the current node that they were called
from, this could lead to potentially undefined behavior due to iterator
invalidation. The issue occurred when using a debug STL via `_GLIBCXX_DEBUG`.
* Fixed replication bug in MerkleTree sync protocol, which could lead to
data corruption. The visible effect was that shards could no longer get
in sync since the counts would not match after sync, even after a recount.
This corruption only happened if there were large amounts of differences
(at least 65537) and the destination side had newer revisions for some
keys than the source side.
* Fixed a RocksDB bug which could lead to an assertion failure when compiling
with STL debug mode -D_GLIBCXX_DEBUG.
* Fixed a rare internal buffer overflow around ridBuffers.
* Issue #13141: The `move-filters-into-enumerate` optimization, when applied to
an EnumerateCollectionNode (i.e. full collection scan), did not do regular
checks for the query being killed during the filtering of documents, resulting
in the maxRuntime option and manual kill of a query not working in a timely manner.
* Simplify the DistributeExecutor and avoid implicit modification of its input
variable. Previously the DistributeExecutor could update the input variable
in-place, leading to unexpected results (see #13509).
The modification logic has now been moved into three new _internal_ AQL
functions (MAKE_DISTRIBUTE_INPUT, MAKE_DISTRIBUTE_INPUT_WITH_KEY_CREATION,
and MAKE_DISTRIBUTE_GRAPH_INPUT) and an additional calculation node with a
corresponding function call will be introduced if we need to prepare the input
data for the distribute node.
* Added new REST APIs for retrieving the sharding distribution:
- GET `/_api/database/shardDistribution` will return the number of
collections, shards, leaders and followers for the database it is run
inside. The request can optionally be restricted to include data from
only a single DB server, by passing the `DBserver` URL parameter.
This API can only be used on coordinators.
- GET `/_admin/cluster/shardDistribution` will return global statistics
on the current shard distribution, showing the total number of databases,
collections, shards, leaders and followers for the entire cluster.
The results can optionally be restricted to include data from only a
single DB server, by passing the `DBserver` URL parameter.
By setting the `details` URL parameter, the response will not contain
aggregates, but instead one entry per available database will be returned.
This API can only be used in the `_system` database of coordinators, and
requires admin user privileges.
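A minimal Python sketch of how the request URLs for these two endpoints could
be assembled. The helper name, the coordinator address, the server id used in
the example and the value `true` for the `details` parameter are assumptions;
the paths and parameter names are the ones listed above.

```python
from urllib.parse import quote, urlencode


def shard_distribution_url(base, database=None, db_server=None, details=False):
    """Build the URL for a shard distribution request (hypothetical helper).

    With a database name, the per-database API is addressed. Without one,
    the cluster-wide API in the `_system` database is used.
    """
    if database is None:
        path = "/_db/_system/_admin/cluster/shardDistribution"
    else:
        path = "/_db/" + quote(database, safe="") + "/_api/database/shardDistribution"
    params = {}
    if db_server is not None:
        params["DBserver"] = db_server  # restrict to a single DB server
    if details and database is None:
        params["details"] = "true"      # assumed value, cluster-wide API only
    query = "?" + urlencode(params) if params else ""
    return base.rstrip("/") + path + query


# The coordinator address and server id below are made up.
print(shard_distribution_url("http://localhost:8529", database="mydb"))
print(shard_distribution_url("http://localhost:8529", db_server="PRMR-1", details=True))
```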
* Decrease the size of serialized index estimates, by introducing a
compressed serialization format. The compressed format uses the previous
uncompressed format internally, compresses it, and stores the compressed
data instead. This makes serialized index estimates a lot smaller, which
in turn decreases the size of I/O operations for index maintenance.
* Do not create index estimator objects for proxy collection objects on
coordinators and DB servers. Proxy objects are created on coordinators and
DB servers for all shards, and they also make index objects available. In
order to reduce the memory usage by these objects, we don't create any
index estimator objects for indexes in those proxy objects. Index estimators
usually take several KB of memory each, so not creating them will pay out
for higher numbers of collections/shards.
* More improvements for logging:
- Added new REST API endpoint GET `/_admin/log/entries` to return log entries
in a more intuitive format, putting each log entry with all its properties
into an object. The API response is an array with all log message objects
that match the search criteria.
This is an extension to the already existing API endpoint GET `/_admin/log`,
which returned log messages fragmented into 5 separate arrays.
The already existing API endpoint GET `/_admin/log` for retrieving log
messages is now deprecated, although it will stay available for some time.
- Truncation of log messages now takes JSON format into account, so that
the truncation of oversized JSON log messages still keeps a valid JSON
structure even after the truncation.
- The maximum size of in-memory log messages was doubled from 256 to 512
chars, so that longer parts of each log message can be preserved now.
* Improvements for logging. This adds the following startup options to arangod:
- `--log.max-entry-length`: controls the maximum line length for individual
log messages that are written into normal logfiles by arangod (note: this
does not include audit log messages).
Any log messages longer than the specified value will be truncated and the
suffix '...' will be added to them. The purpose of this parameter is to
shorten long log messages in case there is not a lot of space for logfiles,
and to keep rogue log messages from overusing resources.
The default value is 128 MB, which is very high and should effectively
mean downwards-compatibility with previous arangod versions, which did not
restrict the maximum size of log messages.
- `--audit.max-entry-length`: controls the maximum line length for individual
audit log messages that are written into audit logs by arangod. Any audit
log messages longer than the specified value will be truncated and the
suffix '...' will be added to them.
The default value is 128 MB, which is very high and should effectively
mean downwards-compatibility with previous arangod versions, which did not
restrict the maximum size of log messages.
- `--log.in-memory-level`: controls which log messages are preserved in
memory (in case `--log.in-memory` is set to `true`). The default value is
`info`, meaning all log messages of types `info`, `warning`, `error` and
`fatal` will be stored by an instance in memory (this was also the behavior
in previous versions of ArangoDB). By setting this option to `warning`,
only `warning`, `error` and `fatal` log messages will be preserved in memory,
and by setting the option to `error` only error and fatal messages will be kept.
This option is useful because the number of in-memory log messages is
limited to the latest 2048 messages, and these slots are by default shared
between informational, warning and error messages.
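Why a stricter in-memory level helps can be seen in the following Python
sketch of a bounded in-memory log. The class and function names and the
sample messages are made up for illustration; only the 2048 slot limit and
the level names come from the description above.

```python
from collections import deque

# Lower number means more severe.
LEVELS = {"fatal": 0, "error": 1, "warning": 2, "info": 3, "debug": 4}


class InMemoryLog:
    def __init__(self, level="info", capacity=2048):
        self.threshold = LEVELS[level]
        self.entries = deque(maxlen=capacity)  # oldest entries fall out first

    def add(self, level, message):
        # Keep only messages at least as severe as the configured level.
        if LEVELS[level] <= self.threshold:
            self.entries.append((level, message))


def replay(log):
    """Log one error followed by many informational messages."""
    log.add("error", "disk almost full")
    for i in range(3000):
        log.add("info", "heartbeat %d" % i)
    return [level for level, _ in log.entries]


# With level "info" the error is pushed out by later info messages,
# with level "warning" it is still available.
print("error" in replay(InMemoryLog(level="info")))
print("error" in replay(InMemoryLog(level="warning")))
```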
* Honor the value of startup option `--log.api-enabled` when set to `false`.
The desired behavior in this case is to turn off the REST API for logging,
but was not implemented. The default value for the option is `true`, so the
REST API is enabled. This behavior did not change, and neither did the
behavior when setting the option to a value of `jwt` (meaning the REST API
for logging is only available for superusers with a valid JWT token).
* Split the update operations for the _fishbowl system collection with Foxx
apps into separate insert/replace and remove operations. This makes the
overall update not atomic, but as removes are unlikely here, we can now get
away with a simple multi-document insert-replace operation instead of a
truncate and an exclusive transaction, which was used before.
* Fix `/_admin/cluster/removeServer` API.
This often returned HTTP 500 with an error message "Need open Array" due to
an internal error when setting up agency preconditions.
* Remove logging startup options `--log.api-enabled` and `--log.keep-logrotate`
for all client tools (arangosh, arangodump, arangorestore etc.), as these
options are only meaningful for arangod.
* Fixed BTS-284: upgrading from 3.6 to 3.7 in cluster environment.
Moved upgrade ArangoSearch links task to later step as it needs cluster
connection. Removed misleading error log records for failed ArangoSearch index
creation during upgrade phase.
* Extend the "move-calculations-up" optimizer rule so that it can move
calculations out of subqueries into the outer query.
* Don't allocate ahead-of-time memory for striped PRNG array in arangod,
but instead use thread-local PRNG instances. Not only does this save a
few megabytes of memory, but it also avoids potential (but unlikely)
sharing of the same PRNG instance by multiple threads.
* Remove undocumented CMake variable `USE_BACKTRACE`, and remove define
`ARANGODB_ENABLE_BACKTRACE`. Both were turned off by default before, and
when turned on allowed producing backtraces from within the executable in
case debug symbols were available, working and the build was also compiled
with `USE_MAINTAINER_MODE=On`. Some code in this context was obviously
unreachable, so now it has all been removed.
To log a backtrace from within arangod, it is now possible to call
`CrashHandler::logBacktrace()`, which will log a backtrace of the calling
thread to the arangod log. This is restricted to Linux builds only.
* Fix warnings about suggest-override which can break builds when warnings
are treated as errors.
* Turn off option `--server.export-read-write-metrics` for now, until there
is certainty about the runtime overhead it introduces.
* Fixed issue #12543: Unused Foxx service config can not be discarded.
* Fixed issue #12363: Foxx HTTP API upgrade/replace always enables
development mode.
* Remove unsafe query option `inspectSimplePlans`. This option previously
defaulted to `true`, and turning it off could make particular queries fail.
The option was ignored in the cluster previously, and turning it off only
had an effect in single server, where it made very simple queries (queries
not containing any FOR loops) skip the optimizer's complete pipeline as a
performance optimization. However, the optimization was only
possible for a very small number of queries and even had adverse effects,
so it is now removed entirely.
* On Linux and MacOS, require at least 8192 usable file descriptors at startup.
If fewer file descriptors are available to the arangod process, then the
startup is automatically aborted.
Even the chosen minimum value of 8192 will often not be high enough to
store considerable amounts of data. However, no higher value was chosen
in order to not make too many existing small installations fail at startup
after upgrading.
The required number of file descriptors can be configured using the startup
option `--server.descriptors-minimum`. It defaults to 8192, but it can be
increased to ensure that arangod can make use of a sufficiently high number
of files. Setting `--server.descriptors-minimum` to a value of `0` will
make the startup require only an absolute minimum limit of 1024 file
descriptors, effectively disabling the change.
Such low values should only be used to bypass the file descriptors check
in case of an emergency, but this is not recommended for production.
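The check described above can be approximated outside of arangod. The following Python sketch is not ArangoDB code: the function names are hypothetical, and treating every configured value below 1024 like `0` is an assumption.

```python
import resource

ABSOLUTE_MINIMUM = 1024  # floor that applies even when the option is set to 0
DEFAULT_MINIMUM = 8192   # default of --server.descriptors-minimum


def required_descriptors(configured_minimum: int = DEFAULT_MINIMUM) -> int:
    """Number of usable file descriptors needed to pass the startup check.

    Assumption: values below the absolute minimum (including 0) fall back
    to 1024.
    """
    return max(configured_minimum, ABSOLUTE_MINIMUM)


def startup_allowed(usable: int, configured_minimum: int = DEFAULT_MINIMUM) -> bool:
    """True if `usable` descriptors satisfy the configured minimum."""
    return usable >= required_descriptors(configured_minimum)


def current_soft_limit() -> int:
    """Soft RLIMIT_NOFILE of this process (Linux and macOS only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return 2**31 - 1  # treat "unlimited" as a very large number
    return soft
```

Calling `startup_allowed(current_soft_limit())` before launching the server gives an early hint whether `ulimit -n` needs to be raised.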
* Added metric `arangodb_transactions_expired` to track the total number
of expired and then garbage-collected transactions.
* Allow toggling the document read/write counters and histograms via the
new startup option `--server.export-read-write-metrics false`. This
option defaults to `true`, so these metrics will be exposed by default.
* Upgraded bundled version of libunwind to v1.5.
* Added startup option `--javascript.tasks` to allow turning off JavaScript
tasks if not needed. The default value for this option is `true`, meaning
JavaScript tasks are available as before.
However, with this option they can be turned off by admins to limit the
amount of JavaScript user code that is executed.
* Only instantiate a striped PRNG instance for the arangod server, but not
for any of the client tools (e.g. arangosh, arangodump, arangorestore).
The client tools do not use the striped PRNG, so we can save a few MBs of
memory for allocating the striped PRNG instance there, plus some CPU time
for initializing it.
* Improve shard synchronization protocol by only transferring the required
parts of the inventory from leader to follower. Previously, for each shard
the entire inventory was exchanged, which included all shards of the
respective database with all their details.
In addition, save 3 cluster-internal requests per shard in the initial shard
synchronization protocol by reusing already existing information in the
different steps of the replication process.
* Added metric `arangodb_scheduler_low_prio_queue_last_dequeue_time` that
provides the time (in milliseconds) it took for the most recent low priority
scheduler queue item to bubble up to the queue's head. This metric can be
used to estimate the queuing time for incoming requests.
The metric will be updated probabilistically when a request is pulled from
the scheduler queue, and may remain at its previous value for a while if
only few requests are coming in or remain permanently at its previous value
if no further requests are incoming at all.
* Allow {USER} placeholder string also in `--ldap.search-filter`.
* Fix agency restart with mismatching compaction and log indexes.
* Added metrics for document read and write operations:
- `arangodb_document_writes`: Total number of document write operations
(successful and failed) not performed by synchronous replication.
- `arangodb_document_writes_replication`: Total number of document write
operations (successful and failed) by cluster synchronous replication.
- `arangodb_collection_truncates`: Total number of collection truncate
operations (successful and failed) not performed by cluster synchronous
replication.
- `arangodb_collection_truncates_replication`: Total number of collection
truncate operations (successful and failed) by synchronous replication.
- `arangodb_document_read_time`: Execution time histogram of all document
primary key read operations (successful and failed) [s]. Note: this
does not include secondary index lookups, range scans and full collection
scans.
- `arangodb_document_insert_time`: Execution time histogram of all
document insert operations (successful and failed) [s].
- `arangodb_document_replace_time`: Execution time histogram of all
document replace operations (successful and failed) [s].
- `arangodb_document_remove_time`: Execution time histogram of all
document remove operations (successful and failed) [s].
- `arangodb_document_update_time`: Execution time histogram of all
document update operations (successful and failed) [s].
- `arangodb_collection_truncate_time`: Execution time histogram of all
collection truncate operations (successful and failed) [s].
The timer metrics are turned off by default, and can be enabled by setting
the startup option `--server.export-read-write-metrics true`.
* Fixed some wrong behavior in single document updates. If the option
ignoreRevs=false was given and the precondition _rev was given in the body
but the _key was given in the URL path, then the rev was wrongly taken
as 0, rather than using the one from the document body.
* Improved logging for error 1489 ("a shard leader refuses to perform a
replication operation"). The log message will now provide the database and
shard name plus the differing information about the shard leader.
* Make `padded` and `autoincrement` key generators export their `lastValue`
values, so that they are available in dumps and can be restored elsewhere
from a dump.
* Fix decoding of values in `padded` key generator when restoring from a dump.
* Fixed error reporting for hotbackup restore from dbservers back to
coordinators. This could for example swallow out of disk errors during
hotbackup restore.
* Fixed rare objectId conflict for indexes.
* Fix for OASIS-409. Fixed indexing _id attribute at recovery.
* Add shard-parallelism to arangodump when dumping collections with multiple
shards.
Previously, arangodump could execute a dump concurrently on different
collections, but it did not parallelize the dump for multiple shards of the
same collection.
This change should speed up dumping of collections with multiple shards.
When dumping multiple shards of the same collection concurrently, parallelism
is still limited by all these threads needing to serialize their chunks into
the same (shared) output file.
* Add option `--envelope` for arangodump, to control if each dumped document
should be wrapped into a small JSON envelope (e.g.
`{"type":2300,"data":{...}}`). This JSON envelope is not necessary anymore
since ArangoDB 3.8, so omitting it can produce smaller (and slightly faster)
dumps.
Restoring a dump without these JSON envelopes is handled automatically by
ArangoDB 3.8 and higher. Restoring a dump without these JSON envelopes into
previous versions (pre 3.8) however is not supported. Thus the option should
only be used if the client tools (arangodump, arangorestore) and the arangod
server are all using v3.8 or higher, and the dumps will never be restored into
earlier versions.
The default value for this option is `true`, meaning the JSON wrappers will
be stored as part of the dump. This is compatible with all previous versions.
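To illustrate the difference between the two dump formats, here is a small Python sketch that reads one line of a dump file and returns the document with or without envelope. The function name and the detection heuristic (an object with exactly the keys `type` and `data`) are assumptions made for this example and do not describe how arangorestore detects the format.

```python
import json


def unwrap_dump_line(line: str) -> dict:
    """Return the document stored in one line of a dump file.

    Heuristic (assumption): a line is treated as an envelope if it is an
    object with exactly the keys "type" and "data".
    """
    record = json.loads(line)
    if set(record) == {"type", "data"} and isinstance(record["data"], dict):
        return record["data"]
    return record
```

A document that itself consists of only the attributes `type` and `data` would be misclassified by this heuristic, which is one reason to treat it as an illustration only.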
* Fix some issues with key generators not properly taking into account the
`allowUserKeys` attribute when in a cluster.
* Make AQL optimizer rule "splice-subqueries" mandatory, in the sense that it
cannot be disabled anymore. As a side effect of this change, there will be no
query execution plans created by 3.8 that contain execution nodes of type
`SubqueryNode`. `SubqueryNode`s will only be used during query planning and
optimization, but at the end of the query optimization phase will all have
been replaced with nodes of types `SubqueryStartNode` and `SubqueryEndNode`.
The code to execute non-spliced subqueries remains in place so that 3.8 can
still execute queries planned on a 3.7 instance with the "splice-subqueries"
optimizer rule intentionally turned off. The code for executing non-spliced
subqueries can be removed in 3.9.
* Normalize user-provided input/output directory names in arangoimport,
arangoexport, arangodump and arangorestore before splitting them into path
components, in the sense that now both forward and backward slashes can be
used on Windows, even interchangeably.
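The idea of normalizing before splitting can be sketched in a few lines of Python. This is a hypothetical helper, not the code used by the client tools, and it only makes sense for Windows-style paths, because a backslash is a legal file name character on POSIX systems.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a Windows-style path that may mix both slash types."""
    normalized = path.replace("\\", "/")  # unify the separators first
    return [part for part in normalized.split("/") if part]
```

With this, `C:\dumps/2021\users` and `C:/dumps/2021/users` yield the same path components.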
* Added the following bit handling functions to AQL:
- BIT_AND(array): and-combined result
- BIT_OR(array): or-combined result
- BIT_XOR(array): xor-combined result
- BIT_NEGATE(value, bits): bitwise negation of `value`, with a mask of
`bits` length
- BIT_TEST(value, index): test if bit at position `index` is set in `value`
(indexes are 0-based)
- BIT_POPCOUNT(value): return number of bits set in `value`
- BIT_SHIFT_LEFT(value, shift, bits): bitwise shift-left of `value` by
`shift` bits, with a mask of `bits` length
- BIT_SHIFT_RIGHT(value, shift, bits): bitwise shift-right of `value` by
`shift` bits, with a mask of `bits` length
- BIT_CONSTRUCT(array): construct a number with bits set at the positions
given in the array
- BIT_DECONSTRUCT(value): deconstruct a number into an array of its individual
set bits
- BIT_TO_STRING(value): create a bitstring representation from numeric `value`
- BIT_FROM_STRING(value): parse a bitstring representation into a number
`BIT_AND`, `BIT_OR` and `BIT_XOR` are also available as aggregate functions
for usage inside COLLECT AGGREGATE.
All above bit operations support unsigned integer values with up to 32 bits.
Using values outside the supported range will make any of these bit functions
return `null` and register a warning.
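The semantics of some of these functions can be emulated in Python as follows. This is only a sketch based on the descriptions above, not ArangoDB code: the Python function names are made up, `None` stands in for AQL's `null`, and no warning is registered.

```python
from typing import List, Optional

MAX_VALUE = 2**32 - 1  # largest supported unsigned 32 bit value


def _valid(value) -> bool:
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= MAX_VALUE


def bit_negate(value: int, bits: int) -> Optional[int]:
    """Bitwise negation of `value`, masked to `bits` bits."""
    if not _valid(value) or not 0 <= bits <= 32:
        return None
    return ~value & ((1 << bits) - 1)


def bit_test(value: int, index: int) -> Optional[bool]:
    """True if the bit at 0-based position `index` is set."""
    if not _valid(value) or not 0 <= index <= 31:
        return None
    return bool((value >> index) & 1)


def bit_popcount(value: int) -> Optional[int]:
    """Number of bits set in `value`."""
    return bin(value).count("1") if _valid(value) else None


def bit_shift_left(value: int, shift: int, bits: int) -> Optional[int]:
    """Shift left by `shift` bits, masked to `bits` bits."""
    if not _valid(value) or not 0 <= shift <= 32 or not 0 <= bits <= 32:
        return None
    return (value << shift) & ((1 << bits) - 1)


def bit_construct(positions: List[int]) -> Optional[int]:
    """Number with the bits at the given 0-based positions set."""
    result = 0
    for position in positions:
        if not 0 <= position <= 31:
            return None
        result |= 1 << position
    return result


def bit_deconstruct(value: int) -> Optional[List[int]]:
    """Positions of all bits set in `value`, in ascending order."""
    if not _valid(value):
        return None
    return [i for i in range(32) if (value >> i) & 1]
```

For example, `bit_construct([1, 2, 3])` yields 14, and `bit_deconstruct(14)` yields `[1, 2, 3]` again.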
* Add binary (base 2) and hexadecimal (base 16) integer literals to AQL.
These literals can be used where regular (base 10) integer literals can be used.
The prefix for binary integer literals is `0b`, e.g. `0b10101110`.
The prefix for hexadecimal integer literals is `0x`, e.g. `0xabcdef02`.
Binary and hexadecimal integer literals can only be used for unsigned integers.
The maximum supported value is `(2 ^ 32) - 1`, i.e. `0xffffffff` (hexadecimal)
or `0b11111111111111111111111111111111` (binary).
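The value range of these literals can be checked with a short Python sketch. The parser below is hypothetical and only mirrors the rules stated above (prefixes `0b` and `0x`, unsigned values up to `(2 ^ 32) - 1`). It is not the AQL parser, and accepting upper-case hexadecimal digits is an assumption.

```python
MAX_LITERAL = 2**32 - 1  # 0xffffffff


def parse_prefixed_literal(text: str) -> int:
    """Parse a `0b...` or `0x...` literal into an unsigned integer."""
    if text.startswith("0b"):
        base, digits = 2, "01"
    elif text.startswith("0x"):
        base, digits = 16, "0123456789abcdefABCDEF"
    else:
        raise ValueError("literal must start with 0b or 0x")
    body = text[2:]
    if not body or any(char not in digits for char in body):
        raise ValueError("invalid digits in literal")
    value = int(body, base)
    if value > MAX_LITERAL:
        raise ValueError("literal exceeds the 32 bit unsigned range")
    return value
```

The two examples from above evaluate to 174 (`0b10101110`) and 2882400002 (`0xabcdef02`).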
* AQL query execution plan register usage optimization.
This is a performance optimization that may positively affect some AQL
queries that use a lot of variables that are only needed in certain
parts of the query.
The positive effect will come from saving registers, which directly
translates to saving columns in AqlItemBlocks.
Previously, the number of registers that were planned for each depth
level of the query never decreased when going from one level to the
next. Even though unused registers were recycled since 3.7, this did
not lead to unused registers being completely dismantled.
Now there is an extra step at the end of the register planning that
keeps track of the actually used registers on each depth, and that
will shrink the number of registers for the depth to the id of the
maximum register. This is done for each depth separately.
Unneeded registers on the right hand side of the maximum used register
are now discarded. Unused registers on the left hand side of the maximum
used register id are not discarded, because we still need to guarantee
that registers from depths above stay in the same slot when starting
a new depth.
* Added metric `arangodb_aql_current_query` to track the number of currently
executing AQL queries.
* Updated arangosync to 1.2.2.
* Fix a bug in the agency Supervision which could lead to removeFollower
jobs constantly being created and immediately stopped again.
* Limit additional replicas in failover cases to +2.
* Print a version mismatch (major/minor version difference) between the
arangosh version and the remote arangod version at arangosh startup.
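A major/minor comparison of this kind can be sketched as follows. The Python helper names are hypothetical and the code is not taken from arangosh. It assumes version strings of the form `major.minor.patch`, optionally with a suffix.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as "3.8.0"."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def version_mismatch(client: str, server: str) -> bool:
    """True if client and server differ in major or minor version."""
    return major_minor(client) != major_minor(server)
```

Patch-level differences, e.g. 3.8.1 versus 3.8.0, do not count as a mismatch in this sketch.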
* Internal refactoring of K_PATH feature, with the goal to have all graph
algorithms on the same framework. This change should not have any visible
effect on users.
* Fixed an endless busy loop which could happen if a coordinator tries to
roll back a database creation, but the database has already been dropped
by other means.
* Removed server-side JavaScript object `ArangoClusterComm`, so it cannot be
used from inside JavaScript operations or Foxx.
The `ArangoClusterComm` object was previously used inside a few internal
JavaScript operations, but was not part of the public APIs.
* Restrict access to functions inside JavaScript objects `ArangoAgency` and
`ArangoAgent` to JavaScript code that is running in privileged mode, i.e.
via the server's emergency console, the `/_admin/execute` API (if turned on)
or internal bootstrap scripts.
* Added startup option `--javascript.transactions` to allow turning off JavaScript
transactions if not needed. The default value for this option is `true`, meaning
JavaScript transactions are available as before.
However, with this option they can be turned off by admins to limit the amount
of JavaScript user code that is executed.
* Introduce a default memory limit for AQL queries, to prevent rogue queries from
consuming the entire memory available to an arangod instance.
The limit is introduced via changing the default value of the option `--query.memory-limit`
from previously `0` (meaning: no limit) to a dynamically calculated value.
The per-query memory limit defaults are now:
Available memory: 0 (0MiB) Limit: 0 unlimited, %mem: n/a
Available memory: 134217728 (128MiB) Limit: 33554432 (32MiB), %mem: 25.0
Available memory: 268435456 (256MiB) Limit: 67108864 (64MiB), %mem: 25.0
Available memory: 536870912 (512MiB) Limit: 201326592 (192MiB), %mem: 37.5
Available memory: 805306368 (768MiB) Limit: 402653184 (384MiB), %mem: 50.0
Available memory: 1073741824 (1024MiB) Limit: 603979776 (576MiB), %mem: 56.2
Available memory: 2147483648 (2048MiB) Limit: 1288490189 (1228MiB), %mem: 60.0
Available memory: 4294967296 (4096MiB) Limit: 2576980377 (2457MiB), %mem: 60.0
Available memory: 8589934592 (8192MiB) Limit: 5153960755 (4915MiB), %mem: 60.0
Available memory: 17179869184 (16384MiB) Limit: 10307921511 (9830MiB), %mem: 60.0
Available memory: 25769803776 (24576MiB) Limit: 15461882265 (14745MiB), %mem: 60.0
Available memory: 34359738368 (32768MiB) Limit: 20615843021 (19660MiB), %mem: 60.0
Available memory: 42949672960 (40960MiB) Limit: 25769803776 (24576MiB), %mem: 60.0
Available memory: 68719476736 (65536MiB) Limit: 41231686041 (39321MiB), %mem: 60.0
Available memory: 103079215104 (98304MiB) Limit: 61847529063 (58982MiB), %mem: 60.0
Available memory: 137438953472 (131072MiB) Limit: 82463372083 (78643MiB), %mem: 60.0
Available memory: 274877906944 (262144MiB) Limit: 164926744167 (157286MiB), %mem: 60.0
Available memory: 549755813888 (524288MiB) Limit: 329853488333 (314572MiB), %mem: 60.0
As previously, a memory limit value of `0` means no limitation.
The limit values are per AQL query, so they may still be too high in case queries
run in parallel. The defaults are intentionally high in order to not stop any valid,
previously working queries from succeeding.
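The values in the table above are consistent with a simple rule: reserve 20% of the available memory (but at least 256MiB), take 75% of the remainder, and never go below 25% of the available memory. The following Python sketch implements this rule. The rule is inferred from the table, the function name is hypothetical, and the actual arangod implementation may differ, for example in rounding.

```python
MIB = 1024 * 1024


def default_query_memory_limit(available: int) -> int:
    """Per-query default limit in bytes for `available` bytes of memory."""
    if available == 0:
        # Amount of memory unknown: no limit.
        return 0
    # Assumed reserve: 20% of the memory, but at least 256 MiB.
    reserve = max(256 * MIB, available // 5)
    # 75% of what remains after the reserve (clamped at zero).
    dynamic = max(available - reserve, 0) * 3 // 4
    # Never go below 25% of the available memory.
    return max(dynamic, available // 4)
```

For 512MiB of available memory this yields 192MiB (37.5%), and from 2048MiB upwards the result settles at 60% of the available memory, as in the table.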
* Added startup option `--audit.queue` to control audit logging queuing
behavior (Enterprise Edition only):
The option controls whether audit log messages are submitted to a queue
and written to disk in batches or if they should be written to disk directly
without being queued.
Queueing audit log entries may be beneficial for latency, but can lead to
unqueued messages being lost in case of a power loss or crash. Setting
this option to `false` mimics the behavior from 3.7 and before, where
audit log messages were not queued but written in a blocking fashion.
* Fixed some situations of
[...]
SUBQUERY
FILTER
LIMIT
[...]
in AQL queries, yielding incorrect responses. A distributed
state within the subquery was not reset correctly.
This could also lead to "shrink" errors of AQL item blocks,
or much higher query runtimes.
Fixes:
- BTS-252
- ES-687
- github issue: #13099
- github issue: #13124
- github issue: #13147
- github issue: #13305
- DEVSUP-665
* Added metric `arangodb_server_statistics_cpu_cores` to provide the number of
CPU cores visible to the arangod process. This is the number of CPU cores
reported by the operating system to the process.
If the environment variable `ARANGODB_OVERRIDE_DETECTED_NUMBER_OF_CORES` is
set to a positive value at instance startup, this value will be returned
instead.
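The override logic can be pictured with the following Python sketch. It is an illustration only: the function name is hypothetical, `os.cpu_count()` merely approximates what the operating system reports to a process, and ignoring non-numeric values is an assumption.

```python
import os

ENV_NAME = "ARANGODB_OVERRIDE_DETECTED_NUMBER_OF_CORES"


def reported_cpu_cores(environ=None) -> int:
    """Number of cores to report, honoring a positive override value."""
    environ = os.environ if environ is None else environ
    try:
        override = int(environ.get(ENV_NAME, ""))
    except ValueError:
        override = 0  # unset or not a number: ignore the override
    if override > 0:
        return override
    return os.cpu_count() or 1
```

Passing a dictionary instead of the real environment makes the function easy to try out, e.g. `reported_cpu_cores({ENV_NAME: "4"})`.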
* `COLLECT WITH COUNT INTO x` and `COLLECT var = expr WITH COUNT INTO x` are now
internally transformed into `COLLECT AGGREGATE x = LENGTH()` and
`COLLECT var = expr AGGREGATE x = LENGTH()` respectively. In addition, any
arguments passed to the `COUNT`/`LENGTH` aggregator functions are now optimized
away. This not only simplifies the code, but also allows more query optimizations:
- If the variable in `COLLECT WITH COUNT INTO var` is not used, the implicit
aggregator is now removed.
- All queries of the form `COLLECT AGGREGATE x = LENGTH()` are now executed
using the count executor, which can result in significantly improved
performance.
* Minor and rare AQL performance improvement, in nested subqueries:
LET sq1 = ([..] FILTER false == true LET sq2 = (<X>) [..])
where sq1 produces no data (e.g. due to the above filter) for sq2,
the part <X> was previously asked for data twice (the second call returning
an empty result) instead of once, if and only if the main query executed sq1
exactly once. Now a single call suffices.
If sq1 had data, or sq1 was executed more than once, only one call was needed
even before (assuming the data fit into one batch).
* Updated OpenSSL to 1.1.1i and OpenLDAP to 2.4.56.
* Bug-Fix: In one-shard-database setups that were created in 3.6.* and then
upgraded to 3.7.5 the DOCUMENT method in AQL will now return documents again.
* Make internal ClusterInfo::getPlan() wait for initial plan load from agency.
* Added AQL timezone functions `DATE_TIMEZONE` and `DATE_TIMEZONES`.
* Make DB servers report storage engine health to the agency, via a new "health"
attribute in requests sent to Sync/ServerStates/<id>.
The supervision can in the future check this attribute if it is posted,
and mark servers as BAD or FAILED in case an unhealthy status is reported.
DB server health is currently determined by whether or not the storage engine
(RocksDB) has reported a background error, and by whether or not the free disk
space has reached a critical low amount. The current threshold for free disk
space is set at 1% of the disk capacity (only the disk is considered that
contains the RocksDB database directory).
The minimum required free disk space percentage can be configured using the new
startup option `--rocksdb.minimum-disk-free-percent`, which needs to be between
0 and 1 (inclusive). A value of 0 disables the check.
The minimum required free disk space can also be configured in bytes using the
new startup option `--rocksdb.minimum-disk-free-bytes`. A value of 0 disables
this check, too.
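The two free disk space thresholds can be combined as in the following Python sketch. It is not the arangod implementation: the function and parameter names are hypothetical, the strict "less than" comparison is an assumption, and the default of `0` for the byte threshold is chosen for the example only.

```python
import shutil


def disk_space_critical(total_bytes: int, free_bytes: int,
                        min_free_percent: float = 0.01,
                        min_free_bytes: int = 0) -> bool:
    """True if the free space is below one of the enabled thresholds.

    `min_free_percent` is a fraction between 0 and 1, mirroring
    `--rocksdb.minimum-disk-free-percent`. `min_free_bytes` mirrors
    `--rocksdb.minimum-disk-free-bytes`. A value of 0 disables a check.
    """
    if min_free_percent > 0 and free_bytes < total_bytes * min_free_percent:
        return True
    if min_free_bytes > 0 and free_bytes < min_free_bytes:
        return True
    return False


def check_directory(path: str) -> bool:
    """Apply the default thresholds to the disk that contains `path`."""
    usage = shutil.disk_usage(path)
    return disk_space_critical(usage.total, usage.free)
```

Only the disk that contains the given directory is inspected, which matches the restriction to the RocksDB database directory mentioned above.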
* Failed servers are now reported consistently in the web interface, at
approximately the same time in the navigation bar and in the nodes view.
Previously these two places had their own, independent poll mechanism for the
nodes' health, and they were updated independently, which could cause an
inconsistent view of the nodes' availability.
Using only one poll mechanism instead also saves some periodic background requests
for the second availability check.
* Updated arangosync to 1.2.1.
* Clean up callback bin and empty promises in single-host-agency.
* Stabilize a Foxx cleanup test.
* Drop a pair of braces {} in /_admin/metrics in case of empty labels, which
makes the API adhere better to the official Prometheus syntax.
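The effect on the output can be shown with a small formatter for one sample line in the Prometheus text format. This Python sketch uses a hypothetical function name, is not the code behind `/_admin/metrics`, and does not escape special characters in label values.

```python
def format_sample(name: str, labels: dict, value) -> str:
    """Render one metric sample line, omitting `{}` for empty labels."""
    if not labels:
        return f"{name} {value}"
    rendered = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{rendered}}} {value}"
```

A metric without labels is then rendered as `arangodb_aql_current_query 3` rather than `arangodb_aql_current_query{} 3`.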
* Add some more metrics to the ConnectionPool.
* Remove HTTP "Connection" header when forwarding requests in the cluster
from one coordinator to another, and let the internal network layer
handle closing of connections and keep-alive.
* Added new metric "arangodb_collection_lock_sequential_mode", which counts
how many times we need to do a sequential locking of collections. If this
metric increases, this indicates lock contention in transaction setup.
Most likely this is caused by exclusive locks used on collections with
more than one shard.
* Fix for BTS-213
Changed the transaction locking mechanism in the cluster case.
For all installations that do not use "exclusive" collection locks this change
will not be noticeable. In case of "exclusive" locks, and collections with more
than one shard, it is now less likely to get a LOCK_TIMEOUT (ErrorNum 18).
It is still possible to get into the LOCK_TIMEOUT case, especially if
the "exclusive" operation(s) are long-running.
* Reduce overhead of audit logging functionality if audit logging is turned
off.
* Add several more attributes to audit-logged queries, namely query execution
time and exit code (0 = no error, other values correspond to general ArangoDB
error codes).
* Added "startupTime", "computationTime" and "storageTime" to Pregel result