diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 000000000..1a7c4e4ac
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,65 @@
+cff-version: 1.2.0
+title: 'OpenLLM: Operating LLMs in production'
+message: >-
+  If you use this software, please cite it using these
+  metadata.
+type: software
+authors:
+  - given-names: Aaron
+    family-names: Pham
+    email: aarnphm@bentoml.com
+    orcid: 'https://orcid.org/0009-0008-3180-5115'
+  - given-names: Chaoyu
+    family-names: Yang
+    email: chaoyu@bentoml.com
+  - given-names: Sean
+    family-names: Sheng
+    email: ssheng@bentoml.com
+  - given-names: Shenyang
+    family-names: Zhao
+    email: larme@bentoml.com
+  - given-names: Sauyon
+    family-names: Lee
+    email: sauyon@bentoml.com
+  - given-names: Bo
+    family-names: Jiang
+    email: jiang@bentoml.com
+  - given-names: Fog
+    family-names: Dong
+    email: fog@bentoml.com
+  - given-names: Xipeng
+    family-names: Guan
+    email: xipeng@bentoml.com
+  - given-names: Frost
+    family-names: Ming
+    email: frost@bentoml.com
+repository-code: 'https://github.com/bentoml/OpenLLM'
+url: 'https://bentoml.com/'
+abstract: >-
+  OpenLLM is an open platform for operating large language
+  models (LLMs) in production. With OpenLLM, you can run
+  inference with any open-source large language model,
+  deploy to the cloud or on-premises, and build powerful AI
+  apps. It has built-in support for a wide range of
+  open-source LLMs and model runtimes, including StableLM,
+  Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.
+  OpenLLM helps serve LLMs over a RESTful API or gRPC with
+  one command, or query via the web UI, CLI, our
+  Python/JavaScript clients, or any HTTP client. It provides
+  first-class support for LangChain, BentoML, and Hugging
+  Face, allowing you to easily create your own AI apps by
+  composing LLMs with other models and services. Last but
+  not least, it automatically generates OCI-compatible
+  container images for your LLM server, or deploys it as a
+  serverless endpoint via BentoCloud.
+keywords:
+  - MLOps
+  - LLMOps
+  - LLM
+  - Infrastructure
+  - Transformers
+  - LLM Serving
+  - Model Serving
+  - Serverless Deployment
+license: Apache-2.0
+date-released: '2023-06-13'
diff --git a/README.md b/README.md
index 8fb2c0f85..81441262c 100644
--- a/README.md
+++ b/README.md
@@ -327,7 +327,8 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5

 ### Fine-tuning support (Experimental)

-One can serve OpenLLM models with any PEFT-compatible layers with `--adapter-id`:
+One can serve OpenLLM models with any PEFT-compatible layers via
+`--adapter-id`:

 ```bash
 openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6-7b-quotes
@@ -345,21 +346,26 @@ To use multiple adapters, use the following format:
 openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6.7b-lora --adapter-id aarnphm/opt-6.7b-lora:french_lora
 ```

-By default, the first adapter-id will be the default Lora layer, but optionally users can change what Lora layer to use for inference via `/v1/adapters`:
+By default, the first adapter-id will be the default LoRA layer, but users can
+optionally change which LoRA layer to use for inference via `/v1/adapters`:

 ```bash
 curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "vn_lora"}'
 ```

-Note that for multiple adapter-name and adapter-id, it is recommended to update to use the default adapter before sending the inference, to avoid any performance degradation
+Note that when using multiple adapter names and IDs, it is recommended to set
+the default adapter before sending inference requests, to avoid any
+performance degradation.

-To include this into the Bento, one can also provide a `--adapter-id` into `openllm build`:
+To include this in the Bento, one can also provide `--adapter-id` to
+`openllm build`:

 ```bash
 openllm build opt --model-id facebook/opt-6.7b --adapter-id ...
- ```
+```

-> **Note**: We will gradually roll out support for fine-tuning all models. Currently, only OPT has fully adapters support.
+> **Note**: We will gradually roll out support for fine-tuning all models.
+> Currently, only OPT has full adapter support.

 ### Integrating a New Model
@@ -582,3 +588,19 @@ capabilities or have any questions, don't hesitate to reach out in our
 Checkout our
 [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md)
 if you wish to contribute to OpenLLM's codebase.
+
+## 📔 Citation
+
+If you use OpenLLM in your research, please cite it using the metadata in
+[CITATION.cff](./CITATION.cff):
+
+```bibtex
+@software{Pham_OpenLLM_Operating_LLMs_2023,
+author = {Pham, Aaron and Yang, Chaoyu and Sheng, Sean and Zhao, Shenyang and Lee, Sauyon and Jiang, Bo and Dong, Fog and Guan, Xipeng and Ming, Frost},
+license = {Apache-2.0},
+month = jun,
+title = {{OpenLLM: Operating LLMs in production}},
+url = {https://github.com/bentoml/OpenLLM},
+year = {2023}
+}
+```
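For anyone reviewing the adapter workflow that the README hunks above document, here is a minimal end-to-end sketch. The `openllm start` command and the `/v1/adapters` call are taken from the diff itself; the final `/v1/generate` request and its payload shape are assumptions for illustration only and are not part of this change:

```bash
# Serve OPT with two LoRA adapters (command from the README hunk above);
# the first --adapter-id becomes the default LoRA layer.
openllm start opt --model-id facebook/opt-6.7b \
  --adapter-id aarnphm/opt-6.7b-lora \
  --adapter-id aarnphm/opt-6.7b-lora:french_lora

# Switch the active adapter before inference, as the README recommends.
# Endpoint and flag are verbatim from the diff; --json needs curl >= 7.82.
curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "french_lora"}'

# Hypothetical inference request: the /v1/generate path and payload shape
# are assumptions, not part of this change; check the server's API docs.
curl -X POST http://localhost:3000/v1/generate --json '{"prompt": "Tell me a quote."}'
```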