Notes Lesson 2

Topic: Introduction to Practical Machine Learning 2 Date: 22.04.2026 Lecturer: Jürgen Vogel

Recap

[!NOTE] Definition: an algorithm that learns from experience E to solve some tasks T with performance P, where P improves with E

Model
- represents the solution to the tasks T
- is learnt and adapted based on E
- can be evaluated with respect to P
Features
- are the relevant part of the data E for creating the model
- may have to be designed explicitly depending on the ML algorithm
Categorization with respect to
- experience E: supervised vs. unsupervised vs. reinforcement learning
- tasks T: clustering vs. classification vs. regression
- human-readable model: white box vs. black box
Project
- agile/iterative development (CRISP-DM)
Key Challenges
- definition of T that is both solvable and generates value
- large amounts of high quality data E
- feature engineering
- dealing with 95% models

returned result is good if it solves the problem at hand
- may be qualitative or quantitative
- may be subjective (user need, context, and preferences)
- may change over time
- also depends on factors such as credibility, specificity, exhaustivity, recency, clarity, interpretability... of the result
Example search engine: a set of keywords is entered into a search engine
- When is the search engine's answer "good"?
  - Hard to answer, because it differs from user to user
- Casual user: question from a general context -> a more general answer is fine
  - "Where is a restaurant within walking distance that is open?"
    - The goal is not to find the best possible option or to list every restaurant
  - A fast result that is good enough
- Expert user: researches very detailed information
  - Wants to do an extensive analysis
  - What scientific literature exists on the topic?
  - What are the best methods?
  - Very high information need
thus, the ML system needs to be assessed in "real-life" situations
- often with user involvement
- similar methods as with user requirements research
  - usability tests, interviews, field studies, log analysis
- but this takes time and is costly

Important:
- Success rate
- Error rate
Success
- Result is correct -> a single sample was classified correctly
- success rate -> average over a larger set of samples
  - also called accuracy
Error
- Result is incorrect -> a single sample was classified incorrectly
- error rate -> average over a larger set of samples
Both are a 1/0 view -> a result is either right or wrong
Example: How many people are in the image?
- The model says 3 people
- The image shows 5 people
- How should this be scored?
  - wrong? -> 100% error
  - partly right? 3/5 detected, 2/5 error
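
The two ways of scoring the people-counting example can be sketched in Python. This is only an illustration: the function names `exact_match` and `partial_credit` are hypothetical, and the ratio `min / max` is one assumed way to express "3/5 detected". The lecture does not prescribe a specific partial-credit formula.

```python
def exact_match(predicted: int, gold: int) -> int:
    """1/0 view: the result counts only if it matches the gold value exactly."""
    return 1 if predicted == gold else 0


def partial_credit(predicted: int, gold: int) -> float:
    """Assumed graded score: the smaller count divided by the larger count."""
    if predicted == gold:
        return 1.0  # also covers the case where both counts are 0
    return min(predicted, gold) / max(predicted, gold)


predicted_count = 3  # the model says 3 people
gold_count = 5       # the image shows 5 people

print(exact_match(predicted_count, gold_count))     # 0   -> 100% error
print(partial_credit(predicted_count, gold_count))  # 0.6 -> 3/5 detected
```

Which of the two views is appropriate depends on the task T and on what counts as useful for the user.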
Generalizing the success rate, we get:
- our ML system takes some test data D as input and produces some results
  - D -> {r'1, ..., r'n}
  - e.g. if the r'i are from a list of predefined labels, we call this classification
- the test data also includes the expected results (the "gold standard")
  - D -> {r1, ..., rn}
- for the test setting, we define some comparison function
  - c(r, r') = 1 if r = r', 0 else # comparison function
- then we can calculate the success rate SR as
  - SR = (1/n)*sum(i=1, n, c(ri, r'i))
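
A minimal Python sketch of the success rate formula above, assuming the gold results and the system results are given as two equally long lists. The labels in the example are made up for illustration.

```python
def c(r, r_pred) -> int:
    """Comparison function: 1 if the result equals the gold result, else 0."""
    return 1 if r == r_pred else 0


def success_rate(gold: list, predicted: list) -> float:
    """SR = (1/n) * sum of c(r_i, r'_i) over all n test samples."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one test sample is required")
    return sum(c(r, rp) for r, rp in zip(gold, predicted)) / len(gold)


# Hypothetical test data: 4 samples, 3 of them classified correctly.
gold = ["cat", "dog", "cat", "bird"]
predicted = ["cat", "dog", "bird", "bird"]

sr = success_rate(gold, predicted)
print(sr)      # 0.75
print(1 - sr)  # 0.25, the error rate under the 1/0 view
```

Swapping in a different comparison function `c` (for example one that gives partial credit) changes the metric without changing the averaging step.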

Example search engine -> we want to evaluate whether the model works well
- A test set was compiled for a search query
- Manually rated (gold standard):
  - For each entry it is known whether the website is relevant or not

Evaluation:

| | positive gold | negative gold |
|---|---|---|
| positive classified | true positive (TP) | false positive (FP) |
| negative classified | false negative (FN) | true negative (TN) |

True Positives: the classifier says positive, and the gold standard says positive
True Negatives: the classifier says negative, and that is correct
False Negatives: the classifier says negative, but the gold standard says positive
- this is an error
- Example search engine: the search engine does not return a search result although it would be relevant
False Positives: the classifier says positive, but that is not correct
- this is another kind of error
- Example search engine: the search engine returns a search result that is not relevant
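
The four cases can be counted with a few lines of Python. This is a sketch under the assumption that both the gold standard and the classifier output are available as lists of booleans (`True` = relevant / positive). The function name `confusion_counts` and the example data are hypothetical.

```python
def confusion_counts(gold: list[bool], classified: list[bool]) -> dict[str, int]:
    """Count TP, FP, FN and TN by comparing classifier output with the gold standard."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for is_positive_gold, is_positive_classified in zip(gold, classified):
        if is_positive_classified and is_positive_gold:
            counts["TP"] += 1  # returned and relevant
        elif is_positive_classified and not is_positive_gold:
            counts["FP"] += 1  # returned but not relevant
        elif not is_positive_classified and is_positive_gold:
            counts["FN"] += 1  # relevant but not returned
        else:
            counts["TN"] += 1  # not relevant and not returned
    return counts


# Hypothetical test set of 6 websites for one search query.
gold = [True, True, True, False, False, False]
classified = [True, True, False, True, False, False]

print(confusion_counts(gold, classified))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```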
Metrics derived from this:
- Precision
  - Share of TP among all samples that were classified as positive
  - If the classifier produces no false positives, precision is 100%
  - P = TP / (TP + FP), i.e. TP divided by all samples classified as positive
  - Example: How many of the displayed websites are actually relevant according to the gold standard?
- Recall
  - Share of the gold-standard positives that the classifier found (many false negatives mean low recall)
  - R = TP / (TP + FN), i.e. TP divided by all samples that are positive in the gold standard
  - Example: Which of the pages that the human (gold standard) rated as relevant are actually displayed?
    - Perfect if all relevant pages were displayed
    - Bad if no relevant pages were found
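
Precision and recall follow directly from the TP, FP and FN counts. The sketch below uses made-up counts. Returning 0.0 when the denominator is zero is an assumed convention for this example, not something stated in the lecture.

```python
def precision(tp: int, fp: int) -> float:
    """Share of true positives among all samples classified as positive."""
    classified_positive = tp + fp
    # Assumption: report 0.0 if nothing was classified as positive.
    return tp / classified_positive if classified_positive else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of true positives among all gold-standard positives."""
    gold_positive = tp + fn
    # Assumption: report 0.0 if the gold standard contains no positives.
    return tp / gold_positive if gold_positive else 0.0


# Hypothetical search result: 8 pages displayed, 6 of them relevant,
# while the gold standard contains 10 relevant pages in total.
tp, fp, fn = 6, 2, 4

print(precision(tp, fp))  # 0.75 -> 6 of the 8 displayed pages are relevant
print(recall(tp, fn))     # 0.6  -> 6 of the 10 relevant pages were displayed
```

The example shows that the two metrics can differ: the result list is fairly clean (precision 0.75) but misses 4 of the 10 relevant pages (recall 0.6).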