From e9ac770dd6784ebe57da7cdbccf51c52890a2e1a Mon Sep 17 00:00:00 2001
From: aaron
Date: Thu, 30 Apr 2026 16:12:24 +0200
Subject: [PATCH] feature(notizen): add notes from l2 afternoon

---
 ML/notizen/L2_Notizen.md | 58 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/ML/notizen/L2_Notizen.md b/ML/notizen/L2_Notizen.md
index 4c3e786..c5e4a11 100644
--- a/ML/notizen/L2_Notizen.md
+++ b/ML/notizen/L2_Notizen.md
@@ -356,7 +356,65 @@ until termination condition
 - average distance of sample to all other points in the same cluster
 - average distance of sample to all other points in the next nearest cluster
 - between -1 and 1 where 1 means dense clusters
+## Naive Bayes
+- Supervised learning
+### Bayes' Theorem
+- mathematical law about conditional probabilities
+- given two events A and B, the conditional probability P(B|A) is the probability that event B occurs given that A has occurred
+  - applies e.g. when we blindly draw samples from a bag containing red and black balls without putting the balls back
+  - assume we start with 2 red and 2 black balls
+  - then P(red) = 2/4 = P(black) for the first draw
+  - if we draw a red ball first, then for the 2nd draw we know upfront that
+    - P(red|red) = 1/3 and P(black|red) = 2/3
+- probability that two dependent events A and B occur one after another: P(A^B) = P(A) * P(B|A)
+  - e.g. probability to first draw red and then red again: P(red^red) = P(red) * P(red|red) = 2/4 * 1/3 = 1/6
+- P(A|B) and P(B|A) may be calculated as
+  - P(A|B) = P(A^B) / P(B) and P(B|A) = P(A^B) / P(A)
+  - in the drawing example: P(red|red) = (2/4 * 1/3) / (2/4) = 1/3
+- since P(A^B) = P(B|A) * P(A), substituting it into P(A|B) = P(A^B) / P(B) gives Bayes' theorem
+  - P(A|B) = P(B|A) * P(A) / P(B)
+### Using Bayes' Theorem for Classification
+
+- Our events are A = class c, B = sample x.
+- We calculate P(c|x)
+  - x -> the sample we receive
+  - P(c|x) -> how likely is it that sample x belongs to class c?
+- given x, our classifier thus
+  1. calculates P(c|x) for all c
+  2. selects the c with the highest P(c|x)
+- P(c|x) = P(x|c) * P(c) / P(x)
+  - P(c) -> prior probability of class c -> can be read off the training samples by counting how many samples belong to class c
+    - i.e. P(c) = n_samples within c / n_samples
+  - P(x) -> how often does this exact sample occur in the training data? -> usually only once, sometimes an identical sample occurs several times
+    - if only once: P(x) = 1 / n_samples; P(x) is the same for every class c, so it can be ignored when selecting the c with the highest P(c|x)
+  - P(x|c) -> how often does this exact feature vector occur in class c? -> usually only once -> the count is non-zero for one class and 0 for all others, which does not generalize to unseen samples
+    - instead, look at the individual features and at which combinations of features indicate a class
+    - P(f1, f2, ..., fn|c) where fi are the sample's features
+    - we assume that the fi are independent of each other given a class c
+    - thus we can calculate P(f1, f2, ..., fn|c) = P(f1|c) * P(f2|c) * ... * P(fn|c)
+    - 3 common NB variants for calculating P(fi|c)
+      - Gaussian: fi are continuous and P(fi|c) is modeled as a normal distribution whose mean and variance are estimated per class
+      - multinomial: fi are counts (e.g. word counts) and P(fi|c) = count of fi in samples of c / total count of all features in samples of c
+      - Bernoulli: binary features (present/absent)
+    - in general this independence assumption is wrong
+      - within c, certain features are often correlated
+      - hence the name: naive Bayes classifier (NB)
+      - but the NB classifier performs well in practice despite this "naive" assumption
+- Advantages
+  - very fast training and prediction
+    - performs very well on large datasets
+  - low storage requirements
+  - optimal if the independence assumption holds
+    - but in practice features are typically not independent
+  - very good in domains with many equally important features
+  - robust against poor feature engineering
+    - e.g. to detect whether a document is a letter, I check whether it contains a date -> that is a poor feature
+      - fatal for a decision tree
+      - for naive Bayes it merely adds one more scaling factor to the product
+    - very fault-tolerant
+- Disadvantages
+  - often not the best classifier
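The ball-drawing example in the Bayes' theorem notes above can be checked by brute force. The sketch below uses only the Python standard library. It assumes the bag holds 2 red and 2 black balls, enumerates every ordered pair of draws, counts the favourable outcomes with exact fractions, and then applies the product rule, the conditional-probability formula and Bayes' theorem. The helper `prob` and all variable names are hypothetical. `Fraction` reduces 2/4 to 1/2 when it prints.

```python
from fractions import Fraction
from itertools import permutations

# Bag from the notes: 2 red and 2 black balls, drawn without putting them back
BAG = ["red", "red", "black", "black"]

# All ordered (first, second) draws of two distinct balls; each is equally likely
DRAWS = list(permutations(range(len(BAG)), 2))


def prob(event):
    """Probability of an event given as a predicate on (first colour, second colour)."""
    hits = sum(1 for i, j in DRAWS if event(BAG[i], BAG[j]))
    return Fraction(hits, len(DRAWS))


p_red_first = prob(lambda a, b: a == "red")                   # P(A): red on 1st draw
p_red_second = prob(lambda a, b: b == "red")                  # P(B): red on 2nd draw
p_red_red = prob(lambda a, b: a == "red" and b == "red")      # P(A^B)
p_red_black = prob(lambda a, b: a == "red" and b == "black")

# Conditional probability: P(B|A) = P(A^B) / P(A)
p_red_given_red = p_red_red / p_red_first
p_black_given_red = p_red_black / p_red_first

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_first_red_given_second_red = p_red_given_red * p_red_first / p_red_second

print(p_red_first, p_red_red, p_red_given_red, p_black_given_red)  # 1/2 1/6 1/3 2/3
print(p_first_red_given_second_red)                                # 1/3
```

By symmetry, the second draw is also red with probability 1/2. Bayes' theorem therefore gives P(first red | second red) = 1/3, the same value as P(red|red).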
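The classification rule from the Naive Bayes notes above can be sketched from scratch for the multinomial variant. This sketch assumes count features (e.g. word counts per document) and estimates P(fi|c) as the count of feature i in class c divided by the total feature count in class c. It also picks the class with the highest P(c) * P(f1|c) * ... * P(fn|c). Two choices go beyond the notes and are assumptions. First, Laplace smoothing (`alpha=1`) keeps a feature unseen in class c from forcing P(fi|c) = 0. Second, scores are computed in log space so that the product of many small probabilities does not underflow. P(x) is dropped because it is the same for every class. The function names and toy data are hypothetical. This is not scikit-learn's `MultinomialNB`, although that class uses the same smoothed estimate.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(samples, labels, alpha=1.0):
    """Estimate log P(c) and log P(f_i|c) from count vectors.

    samples: list of equal-length lists of non-negative feature counts
    labels:  class label of each sample
    alpha:   Laplace smoothing (assumption, not in the notes)
    """
    n_samples = len(samples)
    n_features = len(samples[0])
    class_counts = Counter(labels)

    # Summed count of each feature over all samples of a class
    feature_totals = defaultdict(lambda: [0.0] * n_features)
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            feature_totals[c][i] += v

    # P(c) = n_samples within c / n_samples
    log_prior = {c: math.log(n_c / n_samples) for c, n_c in class_counts.items()}

    # P(f_i|c) = (count of f_i in c + alpha) / (total count in c + alpha * n_features)
    log_likelihood = {}
    for c, totals in feature_totals.items():
        denom = sum(totals) + alpha * n_features
        log_likelihood[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_likelihood


def predict(log_prior, log_likelihood, x):
    """Pick the c with the highest log P(c) + sum_i x_i * log P(f_i|c).

    P(x) and the multinomial coefficient are the same for every class, so they are omitted.
    """
    def score(c):
        return log_prior[c] + sum(v * lp for v, lp in zip(x, log_likelihood[c]))

    return max(log_prior, key=score)


# Hypothetical toy data: counts of four words in four short documents
samples = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 1, 1, 2]]
labels = ["letter", "letter", "ad", "ad"]
log_prior, log_likelihood = train_multinomial_nb(samples, labels)
print(predict(log_prior, log_likelihood, [1, 1, 0, 0]))  # letter
print(predict(log_prior, log_likelihood, [0, 0, 1, 1]))  # ad
```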
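The Gaussian variant from the Naive Bayes notes above can be sketched in a similar way. For each class, the sketch stores P(c) together with the mean and variance of every feature. It then scores a sample with log P(c) plus the sum of the normal log-densities log P(fi|c) and picks the highest-scoring class. The names and toy data are hypothetical. The small variance floor `var_floor` is an assumption: it avoids division by zero for a feature that is constant within a class. scikit-learn's `GaussianNB` handles the same case with its `var_smoothing` parameter.

```python
import math


def fit_gaussian_nb(samples, labels, var_floor=1e-9):
    """Per class c: P(c) and the mean/variance of every feature within c."""
    model = {}
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(samples, labels) if y == c]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # Maximum-likelihood variance; var_floor (an assumption) avoids dividing by 0
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(rows), var_floor)
            for col, m in zip(columns, means)
        ]
        model[c] = (len(rows) / len(samples), means, variances)
    return model


def log_normal_pdf(v, mean, var):
    """Log of the normal density N(v; mean, var), used as log P(f_i|c)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, x):
    """Pick the c with the highest log P(c) + sum_i log P(f_i|c)."""
    def score(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            log_normal_pdf(v, m, s2) for v, m, s2 in zip(x, means, variances)
        )

    return max(model, key=score)


# Hypothetical toy data: two continuous features, two well-separated classes
samples = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(samples, labels)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # a
print(predict_gaussian_nb(model, [4.1, 4.9]))  # b
```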