forked from jackyzha0/quartz
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 3 changed files with 206 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
--- | ||
tags: | ||
- "Type/Note" | ||
- "Topic/Data_Science" | ||
- "Class/DSC_80" | ||
date: | ||
- 2024-10-03 | ||
--- | ||
|
||
# Aggregating | ||
|
||
## Adding and Modifying Columns | ||
|
||
Adding a new column to a dataframe using `assign` | ||
```python | ||
from pathlib import Path | ||
import pandas as pd | ||
 | ||
dogs = pd.read_csv(Path('data') / 'dogs43.csv', index_col='breed') | ||
dogs.assign(cost_per_year=dogs['lifetime_cost'] / dogs['longevity']) | ||
``` | ||
|
||
Chain methods together instead of writing long, hard-to-read lines | ||
- Need to wrap expression in parentheses to add newlines before every method call | ||
|
||
```python | ||
(dogs | ||
.assign(cost_per_year=dogs['lifetime_cost'] / dogs['longevity']) | ||
.sort_values('cost_per_year') | ||
.iloc[:5] | ||
) | ||
``` | ||
|
||
Assign with special column names (spaces, special characters) | ||
```python | ||
dogs.assign(**{'cost per year 💵': dogs['lifetime_cost'] / dogs['longevity']}) | ||
``` | ||
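
The `**{...}` form works because `assign` receives column names as keyword arguments, and dictionary unpacking lets any string serve as a keyword. A minimal sketch, assuming a small made-up `dogs` table (the breeds and values are illustrative, not taken from `dogs43.csv`):

```python
import pandas as pd

# Hypothetical stand-in for the dogs dataframe; values are made up.
dogs = pd.DataFrame(
    {'lifetime_cost': [20000.0, 15000.0], 'longevity': [10.0, 12.0]},
    index=pd.Index(['Breed A', 'Breed B'], name='breed'),
)

# A name containing spaces cannot be written as a keyword argument,
# so it is passed as a dictionary key and unpacked with **.
new_col = 'cost per year'
with_cost = dogs.assign(**{new_col: dogs['lifetime_cost'] / dogs['longevity']})

print(with_cost.columns.tolist())
```

The original `dogs` is untouched; only `with_cost` carries the new column.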
|
||
`df.copy()` returns a new, independent copy of a dataframe; the original is left unchanged | ||
|
||
`df[column] = values` assigns a column in place, modifying `df` itself | ||
|
||
`df.assign()` returns a new dataframe with the column added, leaving `df` unchanged | ||
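
The difference between these three operations can be checked directly. A minimal sketch, assuming a tiny made-up dataframe `df` (the column names `x`, `y`, `z` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})  # made-up example data

# assign returns a new dataframe; df does not gain the column.
assigned = df.assign(y=df['x'] * 2)

# copy returns an independent dataframe.
backup = df.copy()

# Item assignment modifies df itself, but not the earlier copy.
df['z'] = df['x'] + 1

print(df.columns.tolist())        # columns of the modified original
print(backup.columns.tolist())    # columns of the copy
print(assigned.columns.tolist())  # columns of the assign result
```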
|
||
Avoid `inplace=True`: it is not considered good practice, and the pandas developers have proposed deprecating it in future releases | ||
|
||
`df[column].to_numpy()` returns the numpy array of a column | ||
|
||
`dogs.max(axis=1)` won't work when the columns mix datatypes (for example strings and numbers), because values of different types can't be compared | ||
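
One way around the mixed-datatype problem is to keep only the numeric columns before taking a row-wise maximum. A minimal sketch, assuming a made-up `dogs` table with one string column (`kind`) and two numeric columns; the values are illustrative:

```python
import pandas as pd

# Hypothetical stand-in for the dogs dataframe; values are made up.
dogs = pd.DataFrame(
    {
        'kind': ['sporting', 'toy'],
        'lifetime_cost': [20000.0, 15000.0],
        'longevity': [10.0, 12.0],
    },
    index=pd.Index(['Breed A', 'Breed B'], name='breed'),
)

# Keep only numeric columns so every row holds comparable values.
numeric = dogs.select_dtypes(include='number')

# Row-wise maximum across the remaining columns.
row_max = numeric.max(axis=1)
print(row_max)
```

Whether a row-wise maximum across columns with different units is meaningful is a separate question; the sketch only shows how to make the operation well-defined.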
|
||
## Data Granularity and the `groupby` method | ||
|
||
Fine granularity: small details | ||
Coarse: bigger picture | ||
|
||
You should opt for **finer granularity** for more detail if you have the resources to do so | ||
|
||
How to go from fine to coarse granularity: **Aggregating** | ||
|
||
> [!definition] Aggregation | ||
> Aggregating is the act of combining many values into a single value | ||
`penguins.groupby('species')['body_mass_g'].mean()` | ||
|
||
"Split-apply-combine" Paradigm | ||
|
||
[Split-apply-combine diagram](https://dsc80.com/resources/lectures/lec03/imgs/image_0.png) | ||
|
||
```python | ||
(penguins | ||
.assign(is_dream = penguins['island'] == 'Dream') | ||
.groupby('species') | ||
['is_dream'] | ||
.mean() | ||
) | ||
``` | ||
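
To make the split-apply-combine steps concrete, the sketch below performs them by hand and compares the result with `groupby`. It assumes a small made-up `penguins` table (the masses are illustrative, not from the real dataset):

```python
import pandas as pd

# Hypothetical miniature of the penguins dataframe; values are made up.
penguins = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Gentoo', 'Gentoo', 'Chinstrap'],
    'body_mass_g': [3700.0, 3800.0, 5000.0, 5200.0, 3600.0],
})

means = {}
for species in penguins['species'].unique():
    group = penguins[penguins['species'] == species]  # split
    means[species] = group['body_mass_g'].mean()      # apply
manual = pd.Series(means).sort_index()                # combine

# The same computation in one groupby expression.
grouped = penguins.groupby('species')['body_mass_g'].mean()

print(manual)
print(grouped)
```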
|
||
```python | ||
%%pt | ||
penguins_small.groupby('species') | ||
``` | ||
|
||
`%%pt` (the Pandas Tutor cell magic) allows us to visualize groupby objects | ||
|
||
Aggregation Methods | ||
- `count()` | ||
- `sum()` | ||
- `mean()` | ||
- `max()` | ||
- `last()` | ||
- `first()` | ||
|
||
```python | ||
(penguins | ||
.sort_values('body_mass_g') | ||
.groupby('sex') | ||
.last() | ||
) | ||
``` | ||
|
||
Generally, you should select column(s) directly after groupby | ||
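
Selecting the column first gives the same answer as aggregating everything and selecting afterwards, but it avoids aggregating columns that are never used. A minimal sketch, assuming a made-up `penguins` table (values are illustrative):

```python
import pandas as pd

# Hypothetical miniature of the penguins dataframe; values are made up.
penguins = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Gentoo', 'Gentoo'],
    'island': ['Dream', 'Biscoe', 'Biscoe', 'Biscoe'],
    'body_mass_g': [3700.0, 3800.0, 5000.0, 5200.0],
})

# Select the column, then aggregate: only body_mass_g is processed.
selected_first = penguins.groupby('species')['body_mass_g'].max()

# Aggregate every column, then select: same values, more work.
selected_after = penguins.groupby('species').max()['body_mass_g']

print(selected_first.equals(selected_after))
```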
|
||
## Beyond default aggregation methods | ||
|
||
```python | ||
(penguins | ||
.groupby('species') | ||
['body_mass_g'] | ||
.aggregate(['count', 'mean']) | ||
) | ||
``` | ||
|
||
```python | ||
(penguins | ||
.groupby('species') | ||
.aggregate({'bill_length_mm': 'max', 'island': 'unique'}) | ||
) | ||
``` | ||
|
||
```python | ||
def iqr(s): | ||
    return np.percentile(s, 75) - np.percentile(s, 25) | ||
|
||
(penguins | ||
.groupby('species') | ||
['body_mass_g'] | ||
.agg(iqr) # agg is short for .aggregate | ||
) | ||
``` |
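
The custom aggregator can be tried on data small enough to check by hand. A minimal sketch, assuming a made-up `penguins` table whose masses are chosen only to make the arithmetic easy:

```python
import numpy as np
import pandas as pd

def iqr(s):
    # Interquartile range: 75th percentile minus 25th percentile.
    return np.percentile(s, 75) - np.percentile(s, 25)

# Hypothetical data; values are made up for easy arithmetic.
penguins = pd.DataFrame({
    'species': ['Adelie'] * 4 + ['Gentoo'] * 4,
    'body_mass_g': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# The function is called once per group, receiving that group's Series.
result = penguins.groupby('species')['body_mass_g'].agg(iqr)
print(result)
```

Note that `np.percentile` returns NaN when its input contains missing values, so real data may need the missing entries dropped first.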
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
--- | ||
tags: | ||
- "Type/Note" | ||
- "Topic/Mathematics" | ||
- "Class/MATH_109" | ||
date: | ||
- 2024-10-03 | ||
--- | ||
|
||
# Necessary and Sufficient Conditions, Axiomatic Properties of the Real Numbers | ||
|
||
> [!proposition] 2.3.1: $+$ and $\cdot$ for real numbers | ||
> $a,b,c\in \mathbb{R}$ | ||
> 1. Commutativity: $a+b=b+a, a\cdot b = b\cdot a$ | ||
> 2. Associativity: $(a+b)+c=a+(b+c), (a\cdot b)\cdot c = a\cdot (b\cdot c)$ | ||
> 3. Distributivity: $a\cdot(b+c) = a\cdot b + a\cdot c, (a+b)\cdot c = a\cdot c + b \cdot c$ | ||
> 4. Zero Identity: $0\in \mathbb{R}, a+0=a\forall a \in \mathbb{R}$ | ||
> 5. Unity: $1\in \mathbb{R}, a\cdot 1 = a \forall a \in \mathbb{R}$ | ||
> 6. Subtractivity: $\forall a, \text{the equation } a+x=0\text{ has a unique solution, called }-a$ ($b-a$ is defined as $b+(-a)$) | ||
> 7. Division: $\forall a \neq 0,\text{ the equation }a \cdot x = 1 \text{ has a unique solution }\frac{1}{a}$ ($\forall b$, $\frac{b}{a}$ is defined as $b\cdot\left(\frac{1}{a}\right)$) | ||
> | ||
> In one word, 1-7 collectively say $\mathbb{R}$ is a **field**. | ||
> [!question] Which of the following conditions are necessary for the positive integer $n$ to be divisible by 6 (proofs are not necessary)? | ||
> 1. 3 divides $n$ | ||
> 2. 9 divides $n$ | ||
> 3. 12 divides $n$ | ||
> 4. $n=12$ | ||
> 5. 6 divides $n^2$ | ||
> 6. 2 divides $n$ and 3 divides $n$ | ||
> 7. 2 divides $n$ or 3 divides $n$ | ||
> [!answer]- | ||
> A condition is necessary when ($n$ is divisible by 6) $\implies$ (the condition) | ||
> Necessary conditions: 1, 5, 6, 7 | ||
> Strong condition (6) implies weak condition (7) | ||
> | ||
> A condition is sufficient when (the condition) $\implies$ ($n$ is divisible by 6) | ||
> 1. 3 divides $n$ - necessary | ||
> 2. 9 divides $n$ - neither | ||
> 3. 12 divides $n$ - sufficient | ||
> 4. $n=12$ - sufficient | ||
> 5. 6 divides $n^2$ - necessary, sufficient | ||
> 6. 2 divides $n$ and 3 divides $n$ - necessary, sufficient | ||
> 7. 2 divides $n$ or 3 divides $n$ - necessary | ||
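
The classification above can be sanity-checked by brute force. The sketch below tests each condition for every $n$ up to an arbitrary bound (`LIMIT` is an assumption of this example); a finite search is supporting evidence, not a proof.

```python
LIMIT = 1000  # arbitrary search bound; a finite check is not a proof

# The seven conditions from the question, as predicates on n.
conditions = {
    1: lambda n: n % 3 == 0,
    2: lambda n: n % 9 == 0,
    3: lambda n: n % 12 == 0,
    4: lambda n: n == 12,
    5: lambda n: (n * n) % 6 == 0,
    6: lambda n: n % 2 == 0 and n % 3 == 0,
    7: lambda n: n % 2 == 0 or n % 3 == 0,
}

def classify(cond):
    ns = range(1, LIMIT + 1)
    # Necessary: every multiple of 6 satisfies the condition.
    necessary = all(cond(n) for n in ns if n % 6 == 0)
    # Sufficient: every n satisfying the condition is a multiple of 6.
    sufficient = all(n % 6 == 0 for n in ns if cond(n))
    return necessary, sufficient

results = {k: classify(cond) for k, cond in conditions.items()}
for k, (nec, suf) in results.items():
    print(k, 'necessary' if nec else '-', 'sufficient' if suf else '-')
```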
> [!question] Use the properties of addition and multiplication of real numbers given in Properties 2.3.1 to deduce that, for all real numbers $a$ and $b$, | ||
> 1. $a\times 0 = 0 = 0 \times a$ | ||
> 2. $(-a)b=-ab=a(-b)$ | ||
> 3. $(-a)(-b)=ab$ | ||
> [!answer] | ||
> **Problem 1.** $a\times 0 = 0 = 0\times a$ | ||
> | ||
> $a\times (1+0) = a\times 1= a$ | ||
> $a\times (1+0) = a\times 1 + a\times 0= a + a\times 0$ | ||
> Subtracting $a$ from both sides: | ||
> $a\times 0 = 0$ | ||
> Commutative: $0\times a = a \times 0 = 0$ | ||
> | ||
> **Problem 2.** $(-a)b=-ab=a(-b)$ | ||
> | ||
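> $(a+(-a))\times b = 0 \times b = 0$ (by Problem 1) | ||
> $(a+(-a))\times b = a\times b + (-a) \times b$ (distributivity) | ||
> So $(-a)\times b$ solves $a\times b + x = 0$, and by uniqueness (property 6) $(-a)\times b = -ab$ | ||
> Swapping the roles of $a$ and $b$ and using commutativity: $a\times(-b) = (-b)\times a = -ba = -ab$ | ||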
> | ||
> **Problem 3.** $(-a)(-b)=ab$ | ||
> | ||
> $(-a)(-b) = -(a\cdot(-b)) = -(-(ab))$ | ||
> $-(-ab)$ and $ab$ both solve the equation $x+(-ab)=0$, so by uniqueness (property 6) $-(-ab)=ab$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
tags: | ||
- "Daily_Note" | ||
--- | ||
|
||
- [[DSC 80 Lecture 3]] | ||
- [[CSE 158 Lecture 3]] | ||
- [[DSC 40B Lecture 3]] | ||
- [[MATH 109 Discussion 1]] |