feature(notizen): add notes from l2 afternoon

2026-04-30 16:12:24 +02:00
parent 45b154fee6
commit e9ac770dd6
1 changed files with 58 additions and 0 deletions
@@ -356,7 +356,65 @@ until termination condition
        - average distance of sample to all other points in the same cluster - average distance of sample to all other points in the next nearest cluster
    - between -1 and 1 where 1 means dense clusters
 ## Naive Bayes
 - Supervised learning
 ### Bayes' Theorem
 - mathematical law about conditional probabilities
 - given two events A and B, the conditional probability P(B|A) is the probability that event B occurs given that A has occurred
   - applies e.g. when we are blindly drawing samples from a bag containing red and black balls without returning the balls
   - assume we start with 2 red and 2 black balls
        - then P(red) = 2/4 = P(black) for the first draw
   - if we draw a red ball first, then for the 2nd draw we know in advance
        - P(red|red) = 1/3 and P(black|red) = 2/3
 - probability that two dependent events A and B occur one after another: P(A^B) = P(A) * P(B|A)
    - e.g. probability to first draw red and then red again: P(red^red) = P(red)*P(red|red) = 2/4 * 1/3 = 1/6
 - P(A|B), P(B|A) may be calculated as
    - P(A|B) = P(A^B) / P(B) and P(B|A) = P(A^B) / P(A)
    - in the drawing example: P(red|red) = P(red^red) / P(red) = (2/4 * 1/3) / (2/4) = 1/3
 - from this we can derive Bayes' theorem
    - P(A|B) = P(B|A) * P(A) / P(B)
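
The ball example can be checked numerically. The following is a minimal sketch using exact fractions; the variable names are chosen here for illustration and are not part of the notes. It assumes A = "first ball is red" and B = "second ball is red".

```python
from fractions import Fraction

# Bag with 2 red and 2 black balls, drawn without replacement.
p_red_first = Fraction(2, 4)        # P(A): first ball is red
p_black_first = Fraction(2, 4)      # first ball is black
p_red_given_red = Fraction(1, 3)    # P(B|A): second red, given first red
p_red_given_black = Fraction(2, 3)  # second red, given first black

# Product rule: P(A^B) = P(A) * P(B|A)
p_red_red = p_red_first * p_red_given_red

# P(B): second ball is red, summed over both outcomes of the first draw
p_red_second = (p_red_first * p_red_given_red
                + p_black_first * p_red_given_black)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_first_red_given_second_red = p_red_given_red * p_red_first / p_red_second

print(p_red_red)                     # 1/6
print(p_red_second)                  # 1/2
print(p_first_red_given_second_red)  # 1/3
```

Note that P(A|B) equals P(B|A) here only because P(A) = P(B) = 1/2 in this symmetric setup; in general the two differ.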
 ### Using Bayes' Theorem for Classification
 - Our events are A = class c, B = sample x.
 - We calculate P(c|x)
    - x -> we are given a sample
    - P(c|x) -> what is the probability that sample x belongs to class c?
 - given x, our classifier thus
    1. calculates P(c|x) for all c
    2. selects c with the highest P(c|x)
 - P(c|x) = P(x|c) * P(c) / P(x)
    - P(c) -> probability of a class c -> can be read off the training samples -> count how many samples belong to each class
        - P(c) = n_samples in c / n_samples
    - P(x) -> how often does a particular sample occur in the training data? -> usually only once, sometimes an identical sample occurs several times
        - if it occurs only once: P(x) = 1 / n_samples
    - P(x|c) -> how often does a given feature vector occur within a particular class? -> ideally only once -> it is then seen in exactly one class and has probability 0 in all others
        - since counting whole vectors does not generalize, we look at the individual features and check which feature combinations hint at a class
        - P(f1, f2, ..., fn|c) where fi are the sample's features
        - we assume that fi are independent from each other given a class c
            - thus we can calculate P(f1,f2,...,fn|c) = P(f1|c) * P(f2|c) * ... * P(fn|c)
            - 3 different NB variants for calculating P(fi|c)
                - Gaussian: fi are continuous and P(fi|c) is modeled as a normal distribution
                - multinomial: fi are counts (e.g. word frequencies), P(fi|c) = count of fi in c / total count of all features in c
                - Bernoulli: fi are binary (feature present or absent)
        - in general this independence assumption is wrong
            - within c certain features are often correlated
            - thus the name: naive bayes classifier (NB)
            - but the NB classifier performs well in practice despite this "naive" assumption
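
The classification steps above can be sketched for categorical features by simple counting. This is an illustrative sketch, not a reference implementation: the functions `fit` and `predict` and the toy data (features `has_date`, `has_greeting`) are hypothetical, and no smoothing is applied, so an unseen feature value gives a score of 0. P(x) is left out because it is the same for every class and does not change which class scores highest.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def fit(samples, labels):
    """Estimate P(c) and the counts needed for P(f_i = v | c)."""
    n_samples = len(labels)
    class_counts = Counter(labels)
    # P(c) = n_samples in c / n_samples
    priors = {c: Fraction(k, n_samples) for c, k in class_counts.items()}
    # value_counts[c][i][v] = samples of class c whose feature i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[c][i][v] += 1
    return priors, class_counts, value_counts


def predict(x, priors, class_counts, value_counts):
    """Return the class with the highest P(c) * prod_i P(f_i | c)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            # Naive assumption: features are independent given the class.
            score *= Fraction(value_counts[c][i][v], class_counts[c])
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (has_date, has_greeting) -> document type
samples = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (1, 0)]
labels = ["letter", "letter", "letter", "other", "other", "other"]

model = fit(samples, labels)
best, scores = predict((1, 1), *model)
print(best)              # letter
print(scores["letter"])  # 1/3
print(scores["other"])   # 1/18
```

The scores are proportional to P(c|x); dividing each by their sum would give the normalized posteriors (here 6/7 for `letter`).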
 - Advantages 
    - very fast learning and testing
    - performs very well on large datasets
    - low storage requirements
    - optimal if the independence assumptions hold
        - typically, however, the features are not independent
    - very good in domains with many equally important features
        - robust against poor feature engineering
        - e.g.: to detect whether a document is a letter, I check whether it has a date on it -> that is a poor feature
            - fatal for a decision tree
            - for naive Bayes it just adds another scaling factor
        - very fault-tolerant
 - Disadvantages
    - Often not the best classifier