feature(notizen): add notes from l2 morning

2026-04-30 10:26:07 +02:00
parent 1c8c4e5142
commit 5a7f4dfe38
2 changed files with 139 additions and 1 deletions
@@ -1,6 +1,6 @@
 # Notes Lesson 1
->Topic: Introduction to Practical Machine Learning
+>Topic: Introduction to Practical Machine Learning 1
 >Date: 22.04.2026
 >Lecturer: Jürgen Vogel
@@ -0,0 +1,138 @@
 # Notes Lesson 2
 >Topic: Introduction to Practical Machine Learning 2
 >Date: 22.04.2026
 >Lecturer: Jürgen Vogel
 ## Recap
 > [!NOTE] Definition
 An algorithm that learns from experience E to solve some tasks T with performance P, where P improves with E
 - Model
    - represents the solution to the tasks T
    - is learnt and adapted based on E
    - can be evaluated with respect to P
 - Features
    - are the relevant part of the data E for creating the model
    - may have to be designed explicitly depending on the ML algorithm
 - Categorization with respect to
    - experience E: supervised vs. unsupervised vs. reinforcement learning
    - tasks T: clustering vs. classification vs. regression
    - human-readable model: white box vs. black box
 - Project
    - agile/iterative development (CRISP-DM)
 - Key Challenges
    - definition of T that is both solvable and generates value
    - large amounts of high quality data E
    - feature engineering
    - dealing with 95% models
 ## Evaluation
 ### How good is the machine learning system?
 - returned result is good if it solves the problem at hand
    - may be qualitative or quantitative
    - may be subjective (user need, context, and preferences)
    - may change over time
    - also depends on factors such as credibility, specificity, exhaustivity, recency, clarity, interpretability... of the result
 - Example search engine: a set of keywords is entered into a search engine
    - When is the search engine's answer "good"?
        - Hard to answer, because it differs from user to user
    - Casual user: question from a general context -> a more general answer is fine
        - "Where is an open restaurant within walking distance?"
            - The goal is not to find the best possible option or to list every restaurant
        - A fast result that is good enough
    - Expert user: researches very detailed information
        - Wants to carry out an extensive analysis
        - What scientific literature exists on the topic?
        - What are the best methods?
        - Very high information need
 - thus, the ML system needs to be assessed in "real-life" situations
    - often with user involvement
    - similar methods as with user requirements research
        - usability tests, interviews, field studies, log analysis
    - but this takes time and is costly
 ### Metrics SR/ER
 - Important:
    - Success Rate
    - Error Rate
 - Success
    - Result is correct -> a single sample was classified correctly
    - success rate -> average over a larger set of samples
        - also called accuracy
 - Error
    - Result is incorrect -> a single sample was classified incorrectly
    - error rate -> average over a larger set of samples
 - Both are a 1/0 view -> a result is either wrong or right
 - Example: how many people are in the picture?
    - Model says 3 people
    - The picture shows 5 people
    - How should this be scored?
        - wrong? -> 100% error
        - partially right? 3/5 detected, 2/5 error
 - Generalizing the success rate gives:
    - our ML system takes some test data D as input and produces some results
        - $D \rightarrow \{r'_1, \dots, r'_n\}$
        - e.g. if the $r'_i$ are from a list of predefined labels, we call this classification
    - the test data also includes the expected results (the "gold standard")
        - $D \rightarrow \{r_1, \dots, r_n\}$
    - for the test setting, we define a comparison function
        - $c(r, r') = 1$ if $r = r'$, $0$ otherwise
    - then we can calculate the success rate SR as
        - $SR = \frac{1}{n} \sum_{i=1}^{n} c(r_i, r'_i)$
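
The success rate defined above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names `exact_match`, `count_overlap` and `success_rate` and the sample labels are made up for this note. `count_overlap` in particular is an assumed way to give partial credit in the people-counting example; the lecture only raises the question.

```python
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")


def exact_match(expected: R, predicted: R) -> float:
    """Comparison function c(r, r'): 1 if both results are equal, else 0."""
    return 1.0 if expected == predicted else 0.0


def count_overlap(expected: int, predicted: int) -> float:
    """Hypothetical partial-credit comparison for counting tasks.

    Assumption for illustration only: the score is the ratio of the
    smaller count to the larger one (non-negative counts expected).
    """
    if expected == predicted:
        return 1.0
    return min(expected, predicted) / max(expected, predicted)


def success_rate(
    gold: Sequence[R],
    predicted: Sequence[R],
    compare: Callable[[R, R], float] = exact_match,
) -> float:
    """SR = (1/n) * sum of c(r_i, r'_i) over all n test samples."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one sample is required")
    return sum(compare(r, rp) for r, rp in zip(gold, predicted)) / len(gold)


# Made-up classification results: 3 of 4 labels match the gold standard.
gold_labels = ["cat", "dog", "dog", "bird"]
predicted_labels = ["cat", "dog", "cat", "bird"]

sr = success_rate(gold_labels, predicted_labels)
error_rate = 1.0 - sr

# People-counting example from the notes: gold standard 5, model says 3.
strict_score = success_rate([5], [3])
partial_score = success_rate([5], [3], compare=count_overlap)
```

With these sample lists the success rate is 0.75 and the error rate 0.25. For the picture example, exact matching scores 3 vs. 5 people as 0, while the assumed partial-credit function gives 3/5 = 0.6.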
 ### Precision and Recall for Binary Classification
 - Example search engine -> we want to evaluate whether the model works well
    - A test set was compiled for one search query
    - Manually labeled (gold standard):
        - For every entry we know: the website matches or does not match
 Assessment:
 |                     | positive gold       | negative gold       |
 | ------------------- | ------------------- | ------------------- |
 | positive classified | true positive (TP)  | false positive (FP) |
 | negative classified | false negative (FN) | true negative (TN)  |
 - True Positives: classifier says positive, gold standard says positive
 - True Negatives: classifier says negative, and that is correct
 - False Negatives: classifier says negative, but the gold standard says positive
    - this is an error
    - Example search engine: the search engine does not return a search result although it would be relevant
 - False Positives: classifier says positive, but that is not correct
    - this is another kind of error
    - Example search engine: the search engine returns an irrelevant search result
 - Metrics derived from this:
    - **Precision**
        - Share of TP relative to all samples that the classifier marked as positive
        - If the algorithm makes no errors (more precisely: no false positives), precision is 100%
        - P = TP / (Class p Classified) = TP / (TP + FP)
        - Example: how many of the displayed websites are really relevant according to the gold standard?
    - **Recall**
        - Share of the positives according to the gold standard that the classifier finds; every false negative lowers recall
        - R = TP / (Class p Gold) = TP / (TP + FN)
        - Example: which of the pages that the human (gold standard) classified as relevant are actually displayed?
            - Perfect if all relevant pages are displayed
            - Bad if no relevant pages are found
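
The four outcomes of the table and the two derived metrics can be computed as in the sketch below. The names `ConfusionCounts`, `count_outcomes`, `precision` and `recall` and the sample data are assumptions made for this note. Returning 0.0 when the denominator is zero is likewise only a convention chosen here; the lecture does not cover that case.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # classified positive, gold positive
    fp: int  # classified positive, gold negative
    fn: int  # classified negative, gold positive
    tn: int  # classified negative, gold negative


def count_outcomes(gold: Sequence[bool], classified: Sequence[bool]) -> ConfusionCounts:
    """Count TP, FP, FN and TN by comparing each result with the gold standard."""
    if len(gold) != len(classified):
        raise ValueError("gold and classified must have the same length")
    tp = fp = fn = tn = 0
    for is_relevant, is_returned in zip(gold, classified):
        if is_returned and is_relevant:
            tp += 1
        elif is_returned and not is_relevant:
            fp += 1
        elif not is_returned and is_relevant:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fp=fp, fn=fn, tn=tn)


def precision(counts: ConfusionCounts) -> float:
    """P = TP / (TP + FP); 0.0 if nothing was classified positive (assumed convention)."""
    classified_positive = counts.tp + counts.fp
    return counts.tp / classified_positive if classified_positive else 0.0


def recall(counts: ConfusionCounts) -> float:
    """R = TP / (TP + FN); 0.0 if the gold standard has no positives (assumed convention)."""
    gold_positive = counts.tp + counts.fn
    return counts.tp / gold_positive if gold_positive else 0.0


# Made-up test set for one search query: True means "page is relevant"
# (gold standard) or "page was returned" (search engine).
gold = [True, True, True, False, False, False, False, True]
returned = [True, True, False, True, False, False, False, False]

counts = count_outcomes(gold, returned)
p = precision(counts)
r = recall(counts)
```

In this made-up test set the search engine returns 3 pages, 2 of which are relevant (precision 2/3), and it finds 2 of the 4 relevant pages (recall 0.5).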