From e9ac770dd6784ebe57da7cdbccf51c52890a2e1a Mon Sep 17 00:00:00 2001
From: aaron <aaron@0x29a.ch>
Date: Thu, 30 Apr 2026 16:12:24 +0200
Subject: [PATCH] feature(notizen): add notes from l2 afternoon

---
 ML/notizen/L2_Notizen.md | 58 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/ML/notizen/L2_Notizen.md b/ML/notizen/L2_Notizen.md
index 4c3e786..c5e4a11 100644
--- a/ML/notizen/L2_Notizen.md
+++ b/ML/notizen/L2_Notizen.md
@@ -356,7 +356,65 @@ until termination condition
         - average distance of sample to all other points in the same cluster - average distance of sample to all other points in the next nearest cluster
     - between -1 and 1 where 1 means dense clusters
 
+## Naive Bayes
 
+- Supervised learning
 
+### Bayes' Theorem
 
+- mathematical law about conditional probabilities
+- given two events A and B, the conditional probability P(B|A) is the probability that event B occurs given that A has already occurred
+    - applies e.g. when we are blindly drawing samples from a bag containing red and black balls without putting the balls back
+    - assume we start with 2 red and 2 black balls
+        - then P(red) = 2/4 = P(black) for the first draw
+    - if we draw a red ball first, then for the 2nd draw (1 red and 2 black balls left) we know upfront
+        - P(red|red) = 1/3 and P(black|red) = 2/3
+- probability that two conditionally related events A and then B both occur: P(A^B) = P(A) * P(B|A)
+    - e.g. probability to first draw red and then red again: P(red^red) = P(red) * P(red|red) = 2/4 * 1/3 = 1/6
+- P(A|B), P(B|A) may be calculated as
+    - P(A|B) = P(A^B) / P(B) and P(B|A) = P(A^B) / P(A)
+    - in the drawing example: P(red|red) = P(red^red) / P(red) = (1/6) / (2/4) = 1/3
+- from this we can derive Bayes' theorem by substituting P(A^B) = P(B|A) * P(A) into P(A|B) = P(A^B) / P(B)
+    - P(A|B) = P(B|A) * P(A) / P(B)
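+
+The ball example can be checked by brute force. The sketch below assumes the bag from above (2 red, 2 black, two draws without putting the ball back), enumerates all equally likely ordered draws with `itertools.permutations`, computes the probabilities as exact fractions and then applies Bayes' theorem to get the reverse conditional P(first red | second red). The helper name `prob` is hypothetical.
+
+```python
+from fractions import Fraction
+from itertools import permutations
+
+# Bag from the example: 2 red ("R") and 2 black ("B") balls
+bag = ["R", "R", "B", "B"]
+
+# All ordered pairs (first draw, second draw) without putting the ball back.
+# permutations() works on positions, so both red balls count as different
+# balls and every pair is equally likely.
+draws = list(permutations(bag, 2))
+
+
+def prob(event):
+    """Share of all draws for which event(first, second) is true."""
+    hits = sum(1 for first, second in draws if event(first, second))
+    return Fraction(hits, len(draws))
+
+
+p_red = prob(lambda first, second: first == "R")                  # P(red) on draw 1
+p_red_2nd = prob(lambda first, second: second == "R")             # P(red) on draw 2
+p_red_red = prob(lambda first, second: first == second == "R")    # P(red^red)
+
+# P(B|A) = P(A^B) / P(A)
+p_red_given_red = p_red_red / p_red
+# Bayes: P(A|B) = P(B|A) * P(A) / P(B), here: first red given second red
+p_first_red_given_second_red = p_red_given_red * p_red / p_red_2nd
+
+print(p_red, p_red_red, p_red_given_red, p_first_red_given_second_red)
+# 1/2 1/6 1/3 1/3
+```
+
+`Fraction` reduces 2/4 to 1/2; the joint probability 1/6 and the conditional 1/3 match the hand calculation above.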
 
+### Using Bayes' Theorem for Classification
+
+- Our events are A = class c, B = sample x.
+- We calculate P(c|x)
+    - x -> we receive a sample
+    - P(c|x) -> how likely is it that sample x belongs to class c?
+- given x, our classifier thus
+    1. calculates P(c|x) for all c
+    2. selects the c with the highest P(c|x)
+- P(c|x) = P(x|c) * P(c) / P(x)
+    - P(c) -> prior probability of class c -> can be read off the training samples -> count how many samples belong to class c
+        - i.e. P(c) = n_samples in c / n_samples
+    - P(x) -> how often does a particular sample occur in the training data? -> usually only once, sometimes an identical sample occurs several times
+        - if it occurs only once: P(x) = 1 / n_samples
+        - P(x) is the same for every class c, so it does not change which c has the highest P(c|x) and can be ignored when comparing classes
+    - P(x|c) -> how often does a particular feature vector occur within class c? -> typically only once -> then it occurs in exactly one class and has probability 0 for all other classes
+        - so we look at the individual features instead and check which combinations of features hint at a class
+        - P(f1, f2, ..., fn|c) where fi are the sample's features
+        - we assume that the fi are conditionally independent of each other given the class c
+            - thus we can calculate P(f1,f2,...,fn|c) = P(f1|c) * P(f2|c) * ... * P(fn|c)
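+            - three common NB variants for estimating P(fi|c)
+                - Gaussian: fi are continuous and assumed to be normally distributed within each class; P(fi|c) uses the per-class mean and variance of fi
+                - multinomial: fi are counts (e.g. word counts); P(fi|c) = count of fi in c / total count of all features in c
+                - Bernoulli: binary features (present / absent); P(fi|c) = fraction of samples in c in which fi is present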
+        - in general this independence assumption does not hold
+            - within c, certain features are often correlated
+            - hence the name: naive Bayes classifier (NB)
+            - but the NB classifier performs well in practice despite this "naive" assumption
+- Advantages
+    - very fast learning and testing
+    - performs very well on large datasets
+    - low storage requirements
+    - optimal if the independence assumptions hold
+        - but typically the features are not independent
+    - very good in domains with many equally important features
+        - robust against poor feature engineering
+        - e.g.: to detect whether a document is a letter, I check whether it contains a date -> that is a bad feature
+            - fatal for a decision tree
+            - for naive Bayes it merely adds one more scaling factor P(fi|c) to the product
+        - very fault-tolerant
+- Disadvantages
+    - Often not the best classifier
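+
+The classification rule above (P(c) by counting, P(x|c) as a product of per-feature P(fi|c), then pick the c with the highest score) can be sketched as a minimal Gaussian NB in plain Python. Assumptions of this sketch, not from the lecture: the function names `fit_gaussian_nb`, `log_gaussian` and `predict` and the toy data are hypothetical; P(x) is dropped because it is the same for every class; log-probabilities are summed instead of multiplying probabilities (avoids numerical underflow, and since log is monotonic the winning class stays the same); a tiny constant is added to each variance so it never becomes 0.
+
+```python
+import math
+from collections import defaultdict
+
+
+def fit_gaussian_nb(X, y):
+    """Estimate P(c) and a per-class mean and variance for every feature."""
+    by_class = defaultdict(list)
+    for features, label in zip(X, y):
+        by_class[label].append(features)
+
+    model = {}
+    for c, rows in by_class.items():
+        prior = len(rows) / len(X)  # P(c) = n_samples in c / n_samples
+        n_features = len(rows[0])
+        means = [sum(r[i] for r in rows) / len(rows) for i in range(n_features)]
+        # 1e-9 keeps every variance > 0 (assumption, not from the notes)
+        variances = [
+            sum((r[i] - means[i]) ** 2 for r in rows) / len(rows) + 1e-9
+            for i in range(n_features)
+        ]
+        model[c] = (prior, means, variances)
+    return model
+
+
+def log_gaussian(v, mean, var):
+    """log N(v; mean, var), i.e. log P(fi|c) under the Gaussian assumption."""
+    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
+
+
+def predict(model, x):
+    """Return the c with the highest log P(c) + sum_i log P(fi|c); P(x) is skipped."""
+    scores = {}
+    for c, (prior, means, variances) in model.items():
+        # independence assumption: log P(x|c) = sum of log P(fi|c)
+        scores[c] = math.log(prior) + sum(
+            log_gaussian(v, m, s) for v, m, s in zip(x, means, variances)
+        )
+    return max(scores, key=scores.get)
+
+
+# Hypothetical toy data: two features, two classes "a" and "b"
+X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
+y = ["a", "a", "a", "b", "b", "b"]
+model = fit_gaussian_nb(X, y)
+print(predict(model, [1.1, 2.0]), predict(model, [3.1, 0.5]))  # a b
+```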
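+
+For the multinomial variant, P(fi|c) comes from counting how often a feature occurs in class c relative to all feature occurrences in c. A minimal sketch for text classification, assuming documents are lists of words; the toy corpus and the function names `fit_multinomial_nb` and `predict_multinomial` are hypothetical. It adds Laplace smoothing (`alpha=1`), which goes beyond the notes above, so that a word never seen in class c does not make the whole product 0.
+
+```python
+import math
+from collections import Counter
+
+
+def fit_multinomial_nb(docs, labels, alpha=1.0):
+    """docs: list of word lists. Returns P(c) and P(word|c) for every class."""
+    vocab = {w for doc in docs for w in doc}
+    priors = {c: labels.count(c) / len(labels) for c in set(labels)}
+    likelihoods = {}
+    for c in priors:
+        # how often each word occurs in all documents of class c
+        counts = Counter(w for doc, label in zip(docs, labels) if label == c for w in doc)
+        total = sum(counts.values())
+        # P(w|c) = (count of w in c + alpha) / (count of all words in c + alpha * |vocab|)
+        likelihoods[c] = {
+            w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab
+        }
+    return priors, likelihoods
+
+
+def predict_multinomial(priors, likelihoods, doc):
+    """Pick the c with the highest log P(c) + sum of log P(w|c) over the words in doc."""
+    scores = {}
+    for c in priors:
+        # words that never occurred in training are skipped (assumption)
+        scores[c] = math.log(priors[c]) + sum(
+            math.log(likelihoods[c][w]) for w in doc if w in likelihoods[c]
+        )
+    return max(scores, key=scores.get)
+
+
+# Hypothetical toy corpus
+docs = [["win", "money", "now"], ["win", "prize"],
+        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
+labels = ["spam", "spam", "ham", "ham"]
+priors, likelihoods = fit_multinomial_nb(docs, labels)
+print(predict_multinomial(priors, likelihoods, ["win", "money"]))      # spam
+print(predict_multinomial(priors, likelihoods, ["meeting", "notes"]))  # ham
+```
+
+With 8 distinct words and 5 word occurrences in the spam documents, P(win|spam) = (2 + 1) / (5 + 8) = 3/13.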
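+
+The "scaling factor" argument from the advantages list can be illustrated with invented numbers (hypothetical classes `letter` and `invoice`, all probabilities made up for illustration). If a bad feature such as "contains a date" is equally likely in every class, it multiplies every class score P(c) * P(f1|c) * ... by the same factor, so the predicted class does not change.
+
+```python
+# Hypothetical numbers for illustration only: P(c) and P(feature present | c)
+# for a sample in which all listed features are present.
+priors = {"letter": 0.4, "invoice": 0.6}
+likelihoods = {
+    "letter":  {"greeting": 0.8, "amount": 0.1, "date": 0.9},
+    "invoice": {"greeting": 0.3, "amount": 0.9, "date": 0.9},
+}
+
+
+def nb_scores(features):
+    """Unnormalized P(c) * product of P(fi|c) for every class c."""
+    scores = {}
+    for c, prior in priors.items():
+        score = prior
+        for f in features:
+            score *= likelihoods[c][f]
+        scores[c] = score
+    return scores
+
+
+without_date = nb_scores(["greeting", "amount"])
+with_date = nb_scores(["greeting", "amount", "date"])
+
+# The uninformative "date" feature multiplies every class score by 0.9,
+# so the winning class and the ratio between the scores stay the same.
+print(max(without_date, key=without_date.get), max(with_date, key=with_date.get))
+print(without_date["invoice"] / without_date["letter"],
+      with_date["invoice"] / with_date["letter"])
+```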