feature(notizen): add notes from l2 afternoon
@@ -356,7 +356,65 @@ until termination condition
- average distance of sample to all other points in the same cluster - average distance of sample to all other points in the next nearest cluster
- between -1 and 1 where 1 means dense clusters

## Naive Bayes

- Supervised learning

### Bayes' Theorem

- mathematical law about conditional probabilities
- given two events A and B, the conditional probability P(B|A) is the probability that event B occurs given that A has occurred
- applies e.g. when we are blindly drawing samples from a bag containing red and black balls without returning the balls
    - assume we start with 2 red and 2 black balls
    - then P(red) = 2/4 = P(black) for the first draw
    - if we draw a red ball first, then we know up front for the 2nd draw
        - P(red|red) = 1/3 and P(black|red) = 2/3
- probability that two dependent events A and B occur one after another: P(A ∩ B) = P(A) * P(B|A)
    - e.g. probability to first draw red and then red again: P(red ∩ red) = P(red) * P(red|red) = 2/4 * 1/3 = 1/6
- P(A|B), P(B|A) may be calculated as
    - P(A|B) = P(A ∩ B) / P(B) and P(B|A) = P(A ∩ B) / P(A)
    - in the drawing example: P(red|red) = (2/4 * 1/3) / (2/4) = 1/3
- from this we can derive Bayes' theorem
    - P(A|B) = P(B|A) * P(A) / P(B)

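
The ball-drawing example can be checked numerically. The following is a minimal sketch using exact fractions; the function names (`p_first`, `p_second_given_first`, `p_joint`, `p_second`, `p_first_given_second`) are made up for illustration and are not part of the lecture material.

```python
from fractions import Fraction

# Bag from the notes: 2 red and 2 black balls, drawn without returning them.
bag = {"red": 2, "black": 2}

def p_first(color):
    """P(color) for the first draw."""
    return Fraction(bag[color], sum(bag.values()))

def p_second_given_first(second, first):
    """P(second | first): one ball of colour `first` is already gone."""
    remaining = dict(bag)
    remaining[first] -= 1
    return Fraction(remaining[second], sum(remaining.values()))

def p_joint(first, second):
    """P(first, then second) = P(first) * P(second | first)."""
    return p_first(first) * p_second_given_first(second, first)

def p_second(color):
    """Marginal P(second draw is `color`), summed over every possible first draw."""
    return sum(p_joint(first, color) for first in bag)

def p_first_given_second(first, second):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_second_given_first(second, first) * p_first(first) / p_second(second)

print(p_second_given_first("red", "red"))    # 1/3
print(p_joint("red", "red"))                 # 1/6
print(p_first_given_second("red", "black"))  # 2/3
```

The last line uses Bayes' theorem to reason backwards: if the second ball turned out to be black, the first one was red with probability 2/3.
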
### Using Bayes' Theorem for Classification

- Our events are A = class c, B = sample x.
- We calculate P(c|x)
    - x -> the sample we are given
    - P(c|x) -> how likely is it that sample x belongs to class c?
- given x, our classifier thus
    1. calculates P(c|x) for all c
    2. selects the c with the highest P(c|x)
- P(c|x) = P(x|c) * P(c) / P(x)
    - P(c) -> probability of a class c -> can be read off the training samples by counting how many samples fall into each class
        - P(c) = n_samples within c / n_samples
    - P(x) -> how often does a particular sample occur in the training data? -> usually only once, sometimes an identical sample occurs several times
        - if it occurs only once: P(x) = 1 / n_samples
        - P(x) is the same for every class, so it does not change which c has the highest P(c|x)
    - P(x|c) -> how often does a feature vector occur within a given class? -> ideally only once -> probability 1 for one class and 0 for all others
        - we look at the individual features and check which combinations of features hint at a class
        - P(f1, f2, ..., fn|c) where fi are the sample's features
        - we assume that the fi are independent of each other given a class c
        - thus we can calculate P(f1, f2, ..., fn|c) = P(f1|c) * P(f2|c) * ... * P(fn|c)
        - 3 different NB variants for calculating P(fi|c)
            - Gaussian: fi are continuous and assumed to be normally distributed within each class
            - multinomial: fi are counts; P(fi|c) = count of fi in c / total count of all features in c
            - Bernoulli: binary features
        - in general this independence assumption is wrong
            - within c certain features are often correlated
            - thus the name: naive Bayes classifier (NB)
            - but the NB classifier performs well in practice despite this "naive" assumption
- Advantages
    - very fast learning and testing
        - performs very well on large datasets
    - low storage requirements
    - optimal if the independence assumptions hold
        - typically, however, the features are not independent
    - very good in domains with many equally important features
    - robust against poor feature engineering
        - e.g. checking whether a date is present in order to detect whether a document is a letter -> that is a poor feature
        - fatal for a decision tree
        - for naive Bayes it merely adds another scaling factor
        - very fault-tolerant
- Disadvantages
    - Often not the best classifier
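
The classification rule from this section (estimate P(c) and each P(fi|c) by counting, multiply them, pick the class with the highest score) can be sketched in a few lines. This is a minimal illustration for categorical features, not the lecture's implementation: the helper names `train` and `predict` and the toy letter data are assumptions made up for this example, and no smoothing is applied.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def train(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][i][v] = number of samples of class c whose feature i equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for i, value in enumerate(x):
            feature_counts[c][i][value] += 1
    return class_counts, feature_counts

def predict(x, class_counts, feature_counts):
    """Return the class maximising P(c) * P(f1|c) * ... * P(fn|c), plus all scores."""
    n_samples = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = Fraction(n_c, n_samples)  # prior P(c)
        for i, value in enumerate(x):
            # P(fi|c), estimated by counting within class c (no smoothing)
            score *= Fraction(feature_counts[c][i][value], n_c)
        scores[c] = score
    # P(x) is identical for every class, so it is left out of the comparison.
    return max(scores, key=scores.get), scores

# Hypothetical toy data: (date present?, greeting present?) -> document type
samples = [
    ("date", "greeting"),
    ("date", "greeting"),
    ("no_date", "greeting"),
    ("date", "no_greeting"),
    ("date", "no_greeting"),
    ("no_date", "no_greeting"),
]
labels = ["letter", "letter", "letter", "other", "other", "other"]

class_counts, feature_counts = train(samples, labels)
prediction, scores = predict(("no_date", "greeting"), class_counts, feature_counts)
print(prediction)        # letter
print(scores["letter"])  # 1/6
print(scores["other"])   # 0
```

In this toy data the date feature is distributed identically in both classes, so it multiplies both scores by the same factor (1/3) and does not affect the decision; this is the "scaling factor" effect noted under Advantages. The zero score for `other` comes from a feature value never seen in that class; practical implementations usually add smoothing to avoid this, which the sketch omits.
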