feature(notizen): add notes from l2 afternoon

2026-04-30 16:12:24 +02:00
parent 45b154fee6
commit e9ac770dd6
- average distance of sample to all other points in the same cluster
- average distance of sample to all other points in the next nearest cluster
- between -1 and 1, where 1 means dense, well-separated clusters
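The two averages above are commonly called a (same cluster) and b (next nearest cluster) and are combined into the silhouette coefficient s = (b - a) / max(a, b). A minimal sketch in plain Python, assuming Euclidean distance, at least two clusters, and a hypothetical helper name `silhouette`. Single-sample clusters get s = 0, the same convention scikit-learn uses:

```python
import math

def silhouette(i, points, labels):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) of sample i (needs >= 2 clusters)."""
    own = labels[i]
    if labels.count(own) == 1:
        return 0.0  # single-sample cluster: a is undefined, use s = 0

    def mean_dist(cluster):
        # mean Euclidean distance from sample i to the other points of `cluster`
        ds = [math.dist(points[i], p)
              for j, p in enumerate(points)
              if labels[j] == cluster and j != i]
        return sum(ds) / len(ds)

    a = mean_dist(own)                                      # same cluster
    b = min(mean_dist(c) for c in set(labels) if c != own)  # next nearest cluster
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(0, points, [0, 0, 1, 1]))  # well separated: close to 1
print(silhouette(0, points, [0, 1, 0, 1]))  # poor assignment: negative
```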
## Naive Bayes
- Supervised learning
### Bayes' Theorem
- mathematical law about conditional probabilities
- given two events A and B, the conditional probability P(B|A) is the probability that event B occurs given that A has occurred
- applies e.g. when we blindly draw balls from a bag containing red and black balls without putting them back
- assume we start with 2 red and 2 black balls
- then P(red) = 2/4 = P(black) for the first draw
- if we draw a red ball first, then for the 2nd draw we know upfront:
- P(red|red) = 1/3 and P(black|red) = 2/3
- probability that A occurs and then B occurs, where B depends on A: P(A^B) = P(A) * P(B|A)
- e.g. probability to first draw red and then red again: P(red^red) = P(red) * P(red|red) = 2/4 * 1/3 = 1/6
- P(A|B), P(B|A) may be calculated as
- P(A|B) = P(A^B) / P(B) and P(B|A) = P(A^B) / P(A)
- in the drawing example: P(red|red) = P(red^red) / P(red) = (2/4 * 1/3) / (2/4) = 1/3
- from this we can derive Bayes' theorem: since P(A^B) = P(B|A) * P(A), substituting into P(A|B) = P(A^B) / P(B) gives
- P(A|B) = P(B|A) * P(A) / P(B)
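The ball-drawing numbers can be checked by brute force: enumerate every equally likely ordered pair of draws without replacement and count. A small sketch using Python's `fractions` and `itertools` modules, assuming the events A = "first ball is red" and B = "second ball is red":

```python
from fractions import Fraction
from itertools import permutations

# bag with 2 red and 2 black balls; each ball gets its own index so that
# permutations() lists every equally likely ordered draw without replacement
bag = ["red", "red", "black", "black"]
draws = list(permutations(range(len(bag)), 2))  # (first, second) index pairs

def prob(event):
    """Fraction of all ordered draws for which `event(first, second)` is true."""
    hits = sum(1 for first, second in draws if event(bag[first], bag[second]))
    return Fraction(hits, len(draws))

p_a = prob(lambda first, second: first == "red")                       # P(A)
p_b = prob(lambda first, second: second == "red")                      # P(B)
p_a_and_b = prob(lambda first, second: first == second == "red")       # P(A^B)
p_b_given_a = p_a_and_b / p_a                                          # P(B|A)
p_a_given_b = p_b_given_a * p_a / p_b                                  # Bayes: P(A|B)

print(p_a_and_b, p_b_given_a, p_a_given_b)  # 1/6 1/3 1/3
```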
### Using Bayes' Theorem for Classification
- Our events are A = class c, B = sample x.
- We calculate P(c|x)
- x -> I receive a sample
- P(c|x) -> how likely is it that sample x belongs to class c?
- given x, our classifier thus
1. calculates P(c|x) for all c
2. selects the c with the highest P(c|x)
- P(c|x) = P(x|c) * P(c) / P(x)
- P(c) -> probability of class c -> can be read off the training samples -> count how many samples belong to the class
- i.e. P(c) = n_samples within c / n_samples
- P(x) -> how often does a particular sample occur in the training data? -> usually only once, sometimes an identical sample occurs several times
- if it occurs only once: P(x) = 1 / n_samples
- P(x) is the same for every class c, so it does not change which c has the highest P(c|x)
- P(x|c) -> how often does a particular feature vector occur within a given class? -> at best only once -> then the estimate is non-zero for that one class and 0 for all others
- so instead we look at the individual features and at which combinations of features indicate a class
- P(f1, f2, ..., fn|c) where fi are the sample's features
- we assume that the fi are conditionally independent of each other given the class c
- thus we can calculate P(f1,f2,...,fn|c) = P(f1|c) * P(f2|c) * ... * P(fn|c)
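Putting the counting estimates together gives a complete classifier for categorical features. A minimal sketch with hypothetical function names (`train`, `predict`) and a made-up toy dataset. P(x) is left out because it is the same for every class and cannot change which class wins. No smoothing is applied, so a feature value never seen with a class zeroes that class's score (Laplace smoothing is the usual remedy):

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Count-based estimates for P(c) and P(fi|c) (categorical features)."""
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}  # P(c)
    # counts[c][i][v] = number of class-c samples whose feature i has value v
    counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            counts[c][i][v] += 1
    return priors, counts, class_counts

def predict(x, priors, counts, class_counts):
    """Pick the c maximizing P(c) * P(f1|c) * ... * P(fn|c)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= counts[c][i][v] / class_counts[c]  # P(fi|c)
        scores[c] = score
    return max(scores, key=scores.get), scores

# hypothetical toy data: (contains link?, all caps?) -> spam / ham
samples = [("link", "caps"), ("link", "nocaps"), ("nolink", "caps"),
           ("nolink", "nocaps"), ("nolink", "nocaps"), ("link", "nocaps")]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = train(samples, labels)
print(predict(("link", "caps"), *model)[0])      # spam
print(predict(("nolink", "nocaps"), *model)[0])  # ham
```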
- 3 different NB variants for calculating P(fi|c)
- Gaussian: fi are continuous and P(fi|c) is modeled as a normal distribution per class
- multinomial: fi are counts; P(fi|c) = count of fi in class c / total count of all features in class c
- Bernoulli: binary features (present / absent)
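How P(fi|c) is estimated differs per variant. The functions below sketch the three estimates with hypothetical names. The multinomial one uses the common count-based form (count of fi in class c divided by the total count of all features in class c) with an optional Laplace smoothing term `alpha`; the Gaussian one assumes a non-zero variance (libraries such as scikit-learn add a small smoothing term for that case):

```python
import math

def gaussian_likelihood(value, class_values):
    """Gaussian NB: normal density of `value`, fitted to feature i's values in class c."""
    mu = sum(class_values) / len(class_values)
    var = sum((v - mu) ** 2 for v in class_values) / len(class_values)  # must be > 0
    return math.exp(-(value - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def multinomial_estimate(count_fi_in_c, total_count_in_c, n_features, alpha=1.0):
    """Multinomial NB: share of feature i among all feature counts in class c."""
    return (count_fi_in_c + alpha) / (total_count_in_c + alpha * n_features)

def bernoulli_likelihood(value, class_values):
    """Bernoulli NB: P(fi = value | c) for a 0/1 feature."""
    p = sum(class_values) / len(class_values)  # fraction of class-c samples with fi = 1
    return p if value == 1 else 1 - p

print(gaussian_likelihood(0.0, [-1.0, 1.0]))      # standard normal density at 0
print(multinomial_estimate(3, 10, 5, alpha=0.0))  # 0.3
print(bernoulli_likelihood(0, [1, 0, 1, 1]))      # 0.25
```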
- in general, the conditional independence assumption does not hold
- within c, certain features are often correlated
- hence the name: naive Bayes classifier (NB)
- but the NB classifier performs well in practice despite this "naive" assumption
- Advantages
- very fast learning and testing
- performs very well on large datasets
- low storage requirements
- optimal if the independence assumption holds
- but typically the features are not independent
- very good in domains with many equally important features
- robust against poor feature engineering
- e.g.: to recognize whether a document is a letter, I check whether it has a date on it -> that's a rubbish feature
- fatal for a decision tree
- for naive Bayes it just adds another scaling factor
- very error-tolerant
- Disadvantages
- Often not the best classifier
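The robustness argument above can be illustrated numerically: a feature such as "has a date on it" enters naive Bayes only as one more factor P(f|c). Assuming (hypothetically) that this feature is equally likely in every class, every class score is multiplied by the same number and the winning class stays the same. A sketch with made-up probabilities:

```python
import math

def class_scores(priors, likelihoods):
    """Unnormalized NB scores P(c) * prod_i P(fi|c) for each class c."""
    return {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# made-up P(c) and P(fi|c) for the observed feature values of one document
priors = {"letter": 0.4, "invoice": 0.6}
likelihoods = {"letter": [0.7, 0.5], "invoice": [0.2, 0.6]}
before = class_scores(priors, likelihoods)

# poor feature "contains a date": assumed equally likely (0.9) in both classes
with_date = {c: ps + [0.9] for c, ps in likelihoods.items()}
after = class_scores(priors, with_date)

print(max(before, key=before.get), max(after, key=after.get))  # letter letter
print({c: after[c] / before[c] for c in before})               # each ratio ~0.9
```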