Commit
Merge remote-tracking branch 'origin/master'
AntGro committed Jan 8, 2025
2 parents 9116742 + e01c657 commit 033a768
Showing 6 changed files with 96 additions and 4 deletions.
2 changes: 1 addition & 1 deletion HEBO/hebo/__init__.py
@@ -14,4 +14,4 @@
from . import optimizers
from . import sklearn_tuner

__version__ = "0.3.5"
__version__ = "0.3.6"
2 changes: 1 addition & 1 deletion HEBO/setup.py
@@ -18,7 +18,7 @@

setuptools.setup(
name = 'HEBO',
version = '0.3.5', # also needs to be changed in hebo/__init__.py
version = '0.3.6', # also needs to be changed in hebo/__init__.py
packages = setuptools.find_packages(),
description = 'Heteroscedastic evolutionary bayesian optimisation',
long_description = long_description,
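The comment in the hunk above notes that the version string must be updated in two places (`setup.py` and `hebo/__init__.py`). A common pattern to avoid that duplication is to parse `__version__` out of the package's `__init__.py` at build time. The sketch below is hypothetical and is not how HEBO's `setup.py` is actually written; `read_version` is an invented helper:

```python
# Hypothetical sketch: single-source the version by reading it from the
# package __init__.py, so bumping it in one place is enough.
# (Not how HEBO's setup.py currently works.)
import re

def read_version(init_path="hebo/__init__.py"):
    """Extract the __version__ string from a package __init__ file."""
    with open(init_path) as f:
        match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', f.read())
    if match is None:
        raise RuntimeError(f"No __version__ found in {init_path}")
    return match.group(1)
```

With this, `setuptools.setup(version=read_version(), ...)` would stay in sync with the package automatically.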
8 changes: 8 additions & 0 deletions README.md
@@ -20,6 +20,7 @@ Huawei, Noah's Ark Lab.
- [SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks](./SparsePO)
- Generative Model Research
- [EM-LLM: Human-like Episodic Memory for Infinite Context LLMs](./EM-LLM)
- [Mixture of Attentions For Speculative Decoding](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/)

Further instructions are provided in the README files associated with each project.

@@ -340,3 +341,10 @@ Code associated with our EM-LLM paper: [[arXiv]](https://arxiv.org/abs/2407.0945
#### Abstract

Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs with no fine-tuning, enabling them to handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an online fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient and human-like access to relevant information. Experiments on the LongBench and $\infty$-Bench benchmarks demonstrate EM-LLM's superior performance, consistently outperforming the state-of-the-art retrieval model InfLLM across various baseline LLMs. In addition, EM-LLM outperforms its popular counterpart, RAG, in a wide range of tasks, while requiring similar resources. Notably, EM-LLM's performance even surpasses full-context models in most tasks, while successfully performing retrieval across 5 million tokens -- a scale computationally infeasible for such models. Finally, our analysis reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart, thereby offering a novel computational framework for exploring human memory mechanisms.
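The abstract above describes segmenting token streams into episodic events using Bayesian surprise. The following is a toy illustration of that idea only, not the paper's implementation: the function name `segment_by_surprise`, the running mean-plus-`gamma`-standard-deviations threshold, and the input format are all simplifying assumptions.

```python
# Toy sketch of surprise-based event segmentation (illustrative only).
# A token's surprise is -log p(token | context); a new event boundary is
# placed where surprise exceeds a running threshold of mean + gamma * std.
import math

def segment_by_surprise(token_logprobs, gamma=1.0):
    """Return indices where a new event starts, given per-token log-probs."""
    surprises = [-lp for lp in token_logprobs]
    boundaries = [0]  # the stream always starts a first event
    for i in range(1, len(surprises)):
        window = surprises[:i]
        mean = sum(window) / len(window)
        var = sum((s - mean) ** 2 for s in window) / len(window)
        if surprises[i] > mean + gamma * math.sqrt(var):
            boundaries.append(i)
    return boundaries
```

A sharp drop in token probability (a surprise spike) thus opens a new event, loosely mirroring how the paper detects event boundaries online before refining them with graph-theoretic criteria.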

## [Mixture of Attentions for Speculative Decoding](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/)

#### Abstract

The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest decoding speeds. However, we identify several limitations of SD models including the lack of on-policyness during training and partial observability. To address these shortcomings, we propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD. Our novel architecture can be applied in two scenarios: a conventional single device deployment and a novel client-server deployment where the small model is hosted on a consumer device and the LLM on a server. In a single-device scenario, we demonstrate state-of-the-art speedups improving EAGLE-2 by 9.5% and its acceptance length by 25%. In a client-server setting, our experiments demonstrate: 1) state-of-the-art latencies with minimal calls to the server for different network conditions, and 2) in the event of a complete disconnection, our approach can maintain higher accuracy compared to other SD methods and demonstrates advantages over API calls to LLMs, which would otherwise be unable to continue the generation process.
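The abstract describes the speculative decoding loop: a small draft model proposes several future tokens, which the target LLM then verifies. The sketch below is a greedy toy version of that loop, not the Mixture of Attentions architecture; `speculative_step` and both model callables are hypothetical, and production SD verifies all proposals in a single parallel forward pass with rejection sampling rather than greedy matching.

```python
# Greedy toy sketch of one speculative-decoding step (illustrative only).
def speculative_step(draft_propose, target_argmax, context, k=4):
    """Draft model proposes k tokens; keep the longest prefix the target
    model agrees with, plus one token chosen by the target itself."""
    proposal = draft_propose(context, k)
    accepted = []
    for tok in proposal:
        # Target's greedy prediction given the context plus accepted prefix.
        t = target_argmax(context + accepted)
        if t == tok:
            accepted.append(tok)
        else:
            accepted.append(t)  # first disagreement: take the target's token
            break
    else:
        # All k proposals accepted: the target contributes a bonus token.
        accepted.append(target_argmax(context + accepted))
    return accepted
```

The speedup comes from the target model validating several draft tokens per forward pass instead of generating one token at a time; the acceptance length mentioned in the abstract is the average size of the accepted prefix.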

22 changes: 22 additions & 0 deletions SIMMER/LICENSE
@@ -0,0 +1,22 @@
MIT License

Copyright (c) 2022. Huawei Technologies Co., Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

4 changes: 2 additions & 2 deletions SIMMER/README.md
@@ -1,4 +1,4 @@
# Saut\'e and Simmer {RL}: Safe Reinforcement Learning Using Safety State Augmentation
# Sauté and Simmer RL: Safe Reinforcement Learning Using Safety State Augmentation

### Sauté RL: Almost Surely Safe RL Using State Augmentation

@@ -39,7 +39,7 @@ conda env create -f sauterl.yml
conda activate sauterl
```

Our implementation is based on the OpenAI safety starter agents. To install the OpenAI libraries, run the following commands:
Our implementation is based on the OpenAI safety starter agents (distributed under the MIT license). To install the OpenAI libraries, run the following commands:

```console
mkdir safe-rl
62 changes: 62 additions & 0 deletions SIMMER/THIRD PARTY OPEN SOURCE SOFTWARE NOTICE.txt
@@ -0,0 +1,62 @@
THIRD PARTY OPEN SOURCE SOFTWARE NOTICE

Please note we provide an open source software notice for the third party open source software along with this software
and/or this software component contributed by Huawei (in the following just “this SOFTWARE”).
The open source software licenses are granted by the respective right holders.

Warranty Disclaimer
THE OPEN SOURCE SOFTWARE IN THIS SOFTWARE IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL,
BUT WITHOUT ANY WARRANTY, WITHOUT EVEN THE IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
SEE THE APPLICABLE LICENSES FOR MORE DETAILS.

------------------------------------------------------------------------------------------------------------------------

Copyright Notice and License Texts

Software: Safety starter agents (https://github.com/openai/safety-starter-agents)

MIT License

Copyright (c) 2019 OpenAI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Software: Safety gym (https://github.com/openai/safety-gym)

MIT License

Copyright (c) 2019 OpenAI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
