# Add book + fix some code (tracel-ai#671)

1 parent b60d931 · commit 00d3d20

Showing 20 changed files with 495 additions and 28 deletions.
```diff
@@ -1,9 +1,20 @@
 - [Overview](./overview.md)
 - [Why Burn?](./motivation.md)
-- [Guide](./guide/README.md)
-  - [Model](./guide/model.md)
-  - [Data](./guide/data.md)
-  - [Training](./guide/training.md)
-  - [Backend](./guide/backend.md)
-  - [Inference](./guide/inference.md)
-  - [Conclusion](./guide/conclusion.md)
+- [Basic Workflow: From Training to Inference](./basic-workflow/README.md)
+  - [Model](./basic-workflow/model.md)
+  - [Data](./basic-workflow/data.md)
+  - [Training](./basic-workflow/training.md)
+  - [Backend](./basic-workflow/backend.md)
+  - [Inference](./basic-workflow/inference.md)
+  - [Conclusion](./basic-workflow/conclusion.md)
+- [Building Blocks](./building-blocks/README.md)
+  - [Backend](./building-blocks/backend.md)
+  - [Tensor](./building-blocks/tensor.md)
+  - [Autodiff](./building-blocks/autodiff.md)
+  - [Module](./building-blocks/module.md)
+- [Import ONNX Model]()
+- [Advanced]()
+  - [Custom Training Loops]()
+  - [Custom Metric]()
+  - [Custom Kernels]()
+  - [WGPU]()
```
6 files renamed without changes.
# Building Blocks

In this section, we'll guide you through the core elements that make up Burn. We'll walk you through the key components that serve as the building blocks of the framework and your future projects.

As you explore Burn, you might notice that we occasionally draw comparisons to PyTorch. We believe this can provide a smoother learning curve and help you grasp the nuances more effectively.
# Autodiff

Burn's tensors also support autodifferentiation, which is an essential part of any deep learning framework. We introduced the `Backend` trait in the [previous section](./backend.md), but Burn also has another trait for autodiff: `ADBackend`.

However, not all tensors support auto-differentiation; you need a backend that implements both the `Backend` and `ADBackend` traits. Fortunately, you can add autodifferentiation capabilities to any backend using a backend decorator: `type MyAutodiffBackend = ADBackendDecorator<MyBackend>`. This decorator implements both the `ADBackend` and `Backend` traits by maintaining a dynamic computational graph and using the inner backend to execute tensor operations.
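For illustration, here is a minimal sketch of declaring such a decorated backend. It assumes the `burn-autodiff` and `burn-ndarray` crates; any other backend could take the place of `NdArrayBackend`:

```rust, ignore
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

// A plain backend without autodiff capabilities.
type MyBackend = NdArrayBackend<f32>;
// The decorator layers a dynamic computational graph on top of it.
type MyAutodiffBackend = ADBackendDecorator<MyBackend>;
```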
The `ADBackend` trait adds new operations on float tensors that can't be called otherwise. It also provides a new associated type, `B::Gradients`, where each calculated gradient resides.

```rust, ignore
fn calculate_gradients<B: ADBackend>(tensor: Tensor<B, 2>) -> B::Gradients {
    let mut gradients = tensor.clone().backward();

    let tensor_grad = tensor.grad(&gradients); // get
    let tensor_grad = tensor.grad_remove(&mut gradients); // pop

    gradients
}
```
Note that some functions will always be available even if the backend doesn't implement the `ADBackend` trait. In such cases, those functions will do nothing.

| Burn API                                 | PyTorch Equivalent                     |
|------------------------------------------|----------------------------------------|
| `tensor.detach()`                        | `tensor.detach()`                      |
| `tensor.require_grad()`                  | `tensor.requires_grad_()`              |
| `tensor.is_require_grad()`               | `tensor.requires_grad`                 |
| `tensor.set_require_grad(require_grad)`  | `tensor.requires_grad_(require_grad)`  |

However, you're unlikely to make any mistakes, since you can't call `backward` on a tensor whose backend doesn't implement `ADBackend`. Additionally, you can't retrieve the gradient of a tensor without an autodiff backend.
## Difference with PyTorch

The way Burn handles gradients is different from PyTorch. First, when calling `backward`, each parameter doesn't have its `grad` field updated. Instead, the backward pass returns all the calculated gradients in a container. This approach offers numerous benefits, such as the ability to easily send gradients to other threads.

You can also retrieve the gradient for a specific parameter using the `grad` method on a tensor. Since this method takes the gradients as input, it's hard to forget to call `backward` beforehand. Note that, sometimes, using `grad_remove` can improve performance by allowing in-place operations.
In PyTorch, when you don't need gradients for inference or validation, you typically need to scope your code inside a block.

```python
# Inference mode
with torch.inference_mode():
    # your code
    ...

# Or no grad
with torch.no_grad():
    # your code
    ...
```
With Burn, you don't need to wrap the backend with the `ADBackendDecorator` for inference, and you can call `inner()` to obtain the inner tensor, which is useful for validation.

```rust, ignore
/// Use `B: ADBackend`
fn example_validation<B: ADBackend>(tensor: Tensor<B, 2>) {
    let inner_tensor: Tensor<B::InnerBackend, 2> = tensor.inner();
    let _ = inner_tensor + 5;
}

/// Use `B: Backend`
fn example_inference<B: Backend>(tensor: Tensor<B, 2>) {
    let _ = tensor + 5;
    // ...
}
```
**Gradients with Optimizers**

We've seen how gradients can be used with tensors, but the process is a bit different when working with optimizers from `burn-core`. To work with the `Module` trait, a translation step is required to link tensor parameters with their gradients. This step is necessary to easily support gradient accumulation and training on multiple devices, where each module can be forked and run on different devices in parallel. We'll explore this topic in more depth in the [Module](./module.md) section.
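As a rough sketch of that translation step in a typical training loop (a hypothetical snippet, assuming the `GradientsParams` container and the `Optimizer` trait from `burn-core`; the variable names are ours):

```rust, ignore
// The backward pass returns the raw gradients container.
let gradients = loss.backward();
// Translate raw gradients into gradients linked to the module's parameters.
let grads_params = GradientsParams::from_grads(gradients, &model);
// The optimizer consumes the module together with its gradients.
model = optimizer.step(learning_rate, model, grads_params);
```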
# Backend

Nearly everything in Burn is based on the `Backend` trait, which enables you to run tensor operations using different implementations without having to modify your code. While a backend may not necessarily have autodiff capabilities, the `ADBackend` trait specifies when autodiff is needed. This trait abstracts not only operations but also tensor, device, and element types, giving each backend the flexibility it needs. It's worth noting that the trait assumes eager mode, since Burn fully supports dynamic graphs. However, we may create another API to assist with integrating graph-based backends, without requiring any changes to the user's code.

Users are not expected to use the backend trait methods directly, as the trait is primarily designed with backend developers in mind rather than Burn users. Therefore, most Burn userland APIs are generic across backends. This approach helps users discover the API more organically, with proper autocomplete and documentation.
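Because of this, user code is typically written against a generic `B: Backend` parameter. A minimal sketch of the pattern (the function and its body are illustrative, not from the book):

```rust, ignore
use burn::tensor::{backend::Backend, Tensor};

// The same code compiles and runs unchanged on any backend implementation.
fn double_then_shift<B: Backend>(x: Tensor<B, 2>) -> Tensor<B, 2> {
    x * 2 + 5
}
```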
# Module

The `Module` derive allows you to create your own neural network modules, similar to PyTorch. The derive macro only generates the necessary methods to essentially act as a parameter container for your type; it makes no assumptions about how the forward pass is declared.
```rust, ignore
use burn::module::Module;
use burn::nn;
use burn::tensor::{backend::Backend, Tensor};

#[derive(Module, Debug)]
pub struct PositionWiseFeedForward<B: Backend> {
    linear_inner: nn::Linear<B>,
    linear_outer: nn::Linear<B>,
    dropout: nn::Dropout,
    gelu: nn::GELU,
}

impl<B: Backend> PositionWiseFeedForward<B> {
    /// Normal method added to a struct.
    pub fn forward<const D: usize>(&self, input: Tensor<B, D>) -> Tensor<B, D> {
        let x = self.linear_inner.forward(input);
        let x = self.gelu.forward(x);
        let x = self.dropout.forward(x);

        self.linear_outer.forward(x)
    }
}
```
Note that all fields declared in the struct must also implement the `Module` trait.

## Tensor

If you want to create your own module that contains tensors, and not just other modules defined with the `Module` derive, you need to be careful to achieve the behavior you want. The options are listed here, with a short sketch of all three following the list.

- `Param<Tensor<B, D>>`: If you want the tensor to be included as a parameter of your module, you need to wrap it in a `Param` struct. This creates an ID that will be used to identify the parameter, which is essential when performing module optimization and when saving states such as optimizer and module checkpoints. Note that a module's record only contains parameters.

- `Param<Tensor<B, D>>` with `set_require_grad(false)`: If you want the tensor to be included as a parameter of your module, and therefore saved with the module's weights, but you don't want it to be updated by the optimizer.

- `Tensor<B, D>`: If you want the tensor to act as a constant that can be recreated when instantiating a module. This can be useful when generating sinusoidal embeddings, for example.
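Putting the three options together, a minimal sketch (the struct and field names are illustrative only):

```rust, ignore
use burn::module::{Module, Param};
use burn::tensor::{backend::Backend, Tensor};

#[derive(Module, Debug)]
pub struct MyModule<B: Backend> {
    // Trained parameter: identified by an ID, stored in the record,
    // and updated by the optimizer.
    weight: Param<Tensor<B, 2>>,
    // Stored in the record, but frozen via `set_require_grad(false)`,
    // so the optimizer leaves it untouched.
    running_stat: Param<Tensor<B, 1>>,
    // Plain constant: recreated at instantiation, not part of the record.
    positional_encoding: Tensor<B, 2>,
}
```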
## Methods

These methods are available for all modules.

| Burn API                                 | PyTorch Equivalent                        |
|------------------------------------------|-------------------------------------------|
| `module.devices()`                       | N/A                                        |
| `module.fork(device)`                    | Similar to `module.to(device).detach()`    |
| `module.to_device(device)`               | `module.to(device)`                        |
| `module.no_grad()`                       | `module.requires_grad_(False)`             |
| `module.num_params()`                    | N/A                                        |
| `module.visit(visitor)`                  | N/A                                        |
| `module.map(mapper)`                     | N/A                                        |
| `module.into_record()`                   | Similar to `state_dict`                    |
| `module.load_record(record)`             | Similar to `load_state_dict(state_dict)`   |
| `module.save_file(file_path, recorder)`  | N/A                                        |
| `module.load_file(file_path, recorder)`  | N/A                                        |
Similar to the backend trait, there is also the `ADModule` trait to signify a module with autodiff support.

| Burn API         | PyTorch Equivalent |
|------------------|--------------------|
| `module.valid()` | `module.eval()`    |
## Visitor & Mapper

As mentioned earlier, modules primarily function as parameter containers. Therefore, we naturally offer several ways to apply functions to each parameter. This is distinct from PyTorch, where extending module functionalities is not as straightforward.

The `map` and `visit` methods are quite similar but serve different purposes. Mapping is used for potentially mutable operations where each parameter of a module can be updated to a new value. In Burn, optimizers are essentially just sophisticated module mappers. Visitors, on the other hand, are used when you don't intend to modify the module but need to retrieve specific information from it, such as the number of parameters or a list of the devices in use.

You can implement your own mapper or visitor by implementing these simple traits:
```rust, ignore
/// Module visitor trait.
pub trait ModuleVisitor<B: Backend> {
    /// Visit a tensor in the module.
    fn visit<const D: usize>(&mut self, id: &ParamId, tensor: &Tensor<B, D>);
}

/// Module mapper trait.
pub trait ModuleMapper<B: Backend> {
    /// Map a tensor in the module.
    fn map<const D: usize>(&mut self, id: &ParamId, tensor: Tensor<B, D>) -> Tensor<B, D>;
}
```
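For example, a visitor that tallies the total number of parameter elements might look like this (a minimal sketch; `NumParams` is our own name, and `ParamId`, `Tensor`, and `Backend` are assumed to be in scope as in the trait definitions above):

```rust, ignore
/// Accumulates the element count of every parameter it visits.
struct NumParams {
    count: usize,
}

impl<B: Backend> ModuleVisitor<B> for NumParams {
    fn visit<const D: usize>(&mut self, _id: &ParamId, tensor: &Tensor<B, D>) {
        self.count += tensor.shape().num_elements();
    }
}
```

You would then pass a mutable reference to it via `module.visit(...)` to walk every parameter in the module.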