Types Of Machine Learning

VARIOUS MODEL FAMILIES

Stanford CS221 - reflex-based, variable-based, state-based, and logic-based models

WEAKLY SUPERVISED

    1.
    Text classification with extremely small datasets - relies heavily on feature engineering, such as counting hashtags and punctuation marks, and on other insights that work well for this type of text.
    2.
    A great review paper on weak supervision, which discusses:
      1.
      Incomplete supervision
      2.
      Inaccurate supervision
      3.
      Inexact supervision
      4.
      Active learning
    3.
    Stanford on weak supervision
    6.
    Out-of-distribution generalization using test-time training - test-time training turns a single unlabeled test instance into a self-supervised learning problem, on which the model parameters are updated before making a prediction on that instance.
    7.
    Learning Deep Networks from Noisy Labels with Dropout Regularization - Large datasets often have unreliable labels, such as those obtained from Amazon's Mechanical Turk or social media platforms, and classifiers trained on mislabeled datasets often exhibit poor performance. We present a simple, effective technique for accounting for label noise when training deep neural networks. We augment a standard deep network with a softmax layer that models the label noise statistics. Then, we train the deep network and noise model jointly via end-to-end stochastic gradient descent on the (perhaps mislabeled) dataset. The augmented model is overdetermined, so in order to encourage the learning of a non-trivial noise model, we apply dropout regularization to the weights of the noise model during training. Numerical experiments on noisy versions of the CIFAR-10 and MNIST datasets show that the proposed dropout technique outperforms state-of-the-art methods.
    8.
    Distill-to-label: weakly supervised instance labeling using knowledge distillation - "Weakly supervised instance labeling using only image-level labels, in lieu of expensive fine-grained pixel annotations, is crucial in several applications including medical image analysis. In contrast to conventional instance segmentation scenarios in computer vision, the problems that we consider are characterized by a small number of training images and non-local patterns that lead to the diagnosis. In this paper, we explore the use of multiple instance learning (MIL) to design an instance label generator under this weakly supervised setting. Motivated by the observation that an MIL model can handle bags of varying sizes, we propose to repurpose an MIL model originally trained for bag-level classification to produce reliable predictions for single instances, i.e., bags of size 1. To this end, we introduce a novel regularization strategy based on virtual adversarial training for improving MIL training, and subsequently develop a knowledge distillation technique for repurposing the trained MIL model. Using empirical studies on colon cancer and breast cancer detection from histopathological images, we show that the proposed approach produces high-quality instance-level prediction and significantly outperforms state-of-the-art MIL methods."
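The noise-model idea from the noisy-labels paper in item 7 above can be illustrated with a toy forward pass. This is a minimal sketch, not the paper's implementation: it assumes a 3-class base model and a hypothetical, hand-written row-stochastic matrix `noise`, where `noise[i][j]` stands for P(observed label j | true label i). In the paper the noise layer is itself learned jointly with the network and regularized with dropout; neither step is shown here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def noisy_label_distribution(clean_probs, noise):
    """P(observed = j | x) = sum_i P(true = i | x) * noise[i][j]."""
    n_out = len(noise[0])
    return [
        sum(clean_probs[i] * noise[i][j] for i in range(len(clean_probs)))
        for j in range(n_out)
    ]

def cross_entropy(probs, label):
    """Negative log-likelihood of the observed (possibly wrong) label."""
    return -math.log(probs[label])

# Hypothetical base-model logits and noise matrix (illustrative values only).
clean_probs = softmax([2.0, 0.5, -1.0])
noise = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
noisy_probs = noisy_label_distribution(clean_probs, noise)

# The training loss is computed against the noisy label, through the noise layer.
loss = cross_entropy(noisy_probs, label=0)
```

At prediction time one would read `clean_probs` rather than `noisy_probs`, since the noise layer only exists to explain the mislabeled training data.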

SEMI SUPERVISED

    1.
    ​Paper review​
    5.
    fast.ai forums
    7.
    S4L (self-supervised semi-supervised learning)
    8.
    Google's UDA and MixMatch dissected - for text classification, the UDA authors used a combination of back translation and a new method called TF-IDF-based word replacement.
    Back translation consists of translating a sentence into some other intermediate language (e.g. French) and then translating it back to the original language (English in this case). The authors trained an English-to-French and a French-to-English system on the WMT 14 corpus.
    TF-IDF word replacement replaces words in a sentence at random based on each word's TF-IDF score (words with a lower TF-IDF score have a higher probability of being replaced).
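The TF-IDF word replacement described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the authors' code: the tiny `corpus`, the base rate `p`, the normalization in `replacement_probs`, and the uniform choice of replacement words are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def idf_table(corpus):
    """Inverse document frequency for every word in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}

def tfidf_scores(tokens, idf):
    """TF-IDF score of each token position in one sentence."""
    tf = Counter(tokens)
    return [tf[w] / len(tokens) * idf.get(w, 0.0) for w in tokens]

def replacement_probs(scores, p=0.3):
    """Lower TF-IDF -> higher replacement probability (assumed normalization)."""
    top = max(scores)
    gaps = [top - s for s in scores]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap == 0:
        return [0.0] * len(scores)  # all words equally informative: replace none
    return [min(p * g / mean_gap, 1.0) for g in gaps]

def tfidf_replace(tokens, idf, vocab, p=0.3, rng=None):
    """Randomly swap low-TF-IDF words for words drawn uniformly from vocab."""
    rng = rng or random.Random(0)
    probs = replacement_probs(tfidf_scores(tokens, idf), p)
    return [rng.choice(vocab) if rng.random() < q else w
            for w, q in zip(tokens, probs)]

# Hypothetical three-sentence corpus.
corpus = [
    "the movie was great".split(),
    "the plot was dull".split(),
    "the acting was superb".split(),
]
idf = idf_table(corpus)
vocab = sorted(idf)
probs = replacement_probs(tfidf_scores(corpus[0], idf))
augmented = tfidf_replace(corpus[0], idf, vocab)
```

In this toy corpus "the" and "was" appear in every sentence, so they get a TF-IDF of zero and are the only candidates for replacement, while the informative words "movie" and "great" are always kept.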
    1.
    MixMatch (Medium 1, 2, 3, 4) - an algorithm that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. The authors report that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts.
    2.
    ReMixMatch - the paper is really good. "We improve the recently proposed 'MixMatch' semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring."
    3.
    FixMatch - a semi-supervised approach by Sohn et al. from Google Brain that improved the state of the art in semi-supervised learning (SSL). It is a simpler combination of previous methods such as UDA and ReMixMatch.
    7.
    Fidelity-Weighted Learning - "fidelity-weighted learning" (FWL) is a semi-supervised student-teacher approach for training deep neural networks using weakly labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label quality, estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data.
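The two MixMatch ingredients mentioned in the list above, guessing low-entropy labels and mixing examples with MixUp, can be sketched as follows. This is a minimal illustration under stated assumptions: the model predictions are hard-coded stand-ins, and the temperature `0.5` and `alpha=0.75` are illustrative defaults rather than values taken from these notes.

```python
import random

def average(distributions):
    """Average class distributions predicted for several augmentations."""
    n = len(distributions)
    return [sum(col) / n for col in zip(*distributions)]

def sharpen(probs, temperature=0.5):
    """Lower the entropy of a distribution by raising it to 1/T and renormalizing."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convex combination of two examples; the result stays closer to (x1, y1)."""
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    mix = lambda a, b: [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]
    return mix(x1, x2), mix(y1, y2), lam

# Hypothetical predictions for two augmentations of one unlabeled example.
guessed = sharpen(average([[0.6, 0.4], [0.7, 0.3]]))

# Mix a labeled example with the unlabeled one and its guessed label.
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], guessed)
```

Sharpening pushes the averaged guess `[0.65, 0.35]` further toward the dominant class, which is what "low-entropy labels" refers to.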

REGRESSION

Metrics:
    1.
    R² (coefficient of determination)
    2.
    Medium 1, 2, 3, 4
    3.
    ​Tutorial​
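For reference, the R² metric listed above is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch in plain Python follows; the function name `r2_score` and the sample values are chosen for illustration only.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when y_true is constant.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
score = r2_score(y_true, y_pred)
```

A perfect fit gives 1.0, always predicting the mean of `y_true` gives 0.0, and a model worse than the mean gives a negative value.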

ACTIVE LEARNING

    1.
    If you need to start somewhere start here - types of AL, the methodology, examples, sample selection functions.
    2.
    A thorough review paper about AL
    3.
    ​The book on AL​
    4.
      1.
      The alternative is Query by Committee - importantly, the active learning method presented above is the most naive form of what is called "uncertainty sampling", where we choose which items to sample based on how uncertain our model is. An alternative approach, called Query by Committee, maintains a collection of models (the committee) and selects the most "controversial" data point to label next, that is, one on which the models disagree. Using such a committee may allow us to overcome the restricted hypothesis a single model can express, though at the outset of a task we still have no way of knowing which hypothesis we should be using.
      2.
      ​Paper: warning against transferring actively sampled datasets to other models
    7.
    Using weak and strong oracles in AL (paper).
    8.
    The pitfalls of AL:
      1.
      How to choose (cost-effectively) the active learning technique when one starts without the labeled data needed for methods like cross-validation.
      2.
      How to choose (cost-effectively) the base learning technique when one starts without the labeled data needed for methods like cross-validation, given that we know that learning curves cross, and given possible interactions between active learning technique and base learner.
      3.
      How to deal with highly skewed class distributions, where active learning strategies find few (or no) instances of rare classes.
      4.
      How to deal with concepts including very small subconcepts ("disjuncts"), which are hard enough to find with random sampling (because of their rarity), but which active learning strategies can actually avoid finding if they are misclassified strongly to begin with.
      5.
      How best to address the cold-start problem.
      6.
      Especially, whether and what alternatives exist for using human resources to improve learning that may be more cost efficient than using humans simply for labeling selected cases, such as guided learning [3], active dual supervision [2], guided feature labeling [1], etc.
    10.
    A great tutorial ​
    11.
    ​An ok video​
    17.
    ​Video 2​
    20.
    ​Medium on AL***
Basic Framework for HITL Supriya Ghosh
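Query by Committee, described in the active learning list above, can be sketched by scoring each unlabeled point by how much the committee members disagree on it. This is a minimal sketch under stated assumptions: vote entropy is used as the disagreement measure (one common choice, not the only one), and the committee is a set of hypothetical one-dimensional threshold classifiers.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's hard votes; 0 means full agreement."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def most_controversial(pool, committee):
    """Return the pool item on which the committee disagrees the most."""
    return max(pool, key=lambda x: vote_entropy([model(x) for model in committee]))

def threshold_model(threshold):
    """Hypothetical committee member: predicts 1 when x exceeds its threshold."""
    return lambda x: int(x > threshold)

committee = [threshold_model(t) for t in (0.3, 0.5, 0.7)]
pool = [0.1, 0.4, 0.9]
query = most_controversial(pool, committee)
```

Here the members agree on 0.1 and 0.9 but split on 0.4, so 0.4 is the point sent for labeling.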

Human-in-the-loop ML book by Robert Munro

    1.
    ​GIT​
    3.
      1.
      Least Confidence: difference between the most confident prediction and 100% confidence
      2.
      Margin of Confidence: difference between the top two most confident predictions
      3.
      Ratio of Confidence: ratio between the top two most confident predictions
      4.
      Entropy: difference between all predictions, as defined by information theory
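The four uncertainty scores listed above can be written down directly from a model's predicted class probabilities. This is a minimal sketch: scaling every score to the range 0 to 1, with 1 meaning most uncertain, is an assumption made here so the scores are comparable, not something stated in these notes.

```python
import math

def least_confidence(probs):
    """Gap between the top prediction and 100% confidence, scaled to [0, 1]."""
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)

def margin_of_confidence(probs):
    """One minus the difference between the two most confident predictions."""
    first, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (first - second)

def ratio_of_confidence(probs):
    """Ratio of the second most confident prediction to the most confident."""
    first, second = sorted(probs, reverse=True)[:2]
    return second / first

def entropy_score(probs):
    """Shannon entropy of the full distribution, divided by its maximum log(n)."""
    raw = -sum(p * math.log(p) for p in probs if p > 0)
    return raw / math.log(len(probs))

# Hypothetical predictions for three unlabeled items.
pool = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.60, 0.30, 0.10],
}
most_uncertain = max(pool, key=lambda item: entropy_score(pool[item]))
```

Any of the four functions can be passed as the ranking key; they often agree on the extremes and differ on how they rank the items in between.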
    4.
    Diversity sampling - you want to make sure that your training data covers as diverse a set of data and real-world demographics as possible.
      1.
      Model-based Outliers: sampling for low activation in your logits and hidden layers to find items that are confusing to your model because of lack of information
      2.
      Cluster-based Sampling: using Unsupervised Machine Learning to sample data from all the meaningful trends in your data's feature-space
      3.
      Representative Sampling: sampling items that are the most representative of the target domain for your model, relative to your current training data
      4.
      Real-world diversity: using sampling strategies that increase fairness when trying to support real-world diversity
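Cluster-based sampling from the list above can be sketched with a tiny k-means that picks, for each cluster, the item closest to its centroid. This is a minimal sketch under stated assumptions: the naive initialization from the first `k` points, the fixed iteration count, and the choice to sample only centroid-nearest items are all simplifications for illustration.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points (naive assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        centroids = updated
    return centroids, assign(points, centroids)

def sample_cluster_centers(points, k):
    """Pick the index of the item nearest to each cluster centroid."""
    centroids, labels = kmeans(points, k)
    picked = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return picked

# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
to_label = sample_cluster_centers(points, k=2)
```

The result contains one item per cluster, so even a small labeling budget touches every region of the feature space.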
    1.
      1.
      Least Confidence Sampling with Clustering-based Sampling: sample items that are confusing to your model and then cluster those items to ensure a diverse sample.
      2.
      Uncertainty Sampling with Model-based Outliers: sample items that are confusing to your model and within those find items with low activation in the model.
      3.
      Uncertainty Sampling with Model-based Outliers and Clustering: combine methods 1 and 2.
      4.
      Representative Cluster-based Sampling: cluster your data to capture multimodal distributions and sample items that are most like your target domain.
      5.
      Sampling from the Highest Entropy Cluster: cluster your unlabeled data and find the cluster with the highest average confusion for your model.
      6.
      Uncertainty Sampling and Representative Sampling: sample items that are both confusing to your current model and the most like your target domain.
      7.
      Model-based Outliers and Representative Sampling: sample items that have low activation in your model but are relatively common in your target domain.
      8.
      Clustering with itself for hierarchical clusters: recursively cluster to maximize the diversity.
      9.
      Sampling from the Highest Entropy Cluster with Margin of Confidence Sampling: find the cluster with the most confusion and then sample for the maximum pairwise label confusion within that cluster.
      10.
      Combining Ensemble Methods and Dropouts with individual strategies: aggregate results that come from multiple models, or from multiple predictions from one model via Monte Carlo dropout (an approximation to Bayesian deep learning).
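As one example of combining strategies from the list above, "Sampling from the Highest Entropy Cluster" can be sketched by averaging prediction entropy per cluster and returning the members of the most confused cluster. This is a minimal sketch: the cluster assignments and model predictions are hypothetical inputs assumed to come from earlier steps.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def highest_entropy_cluster(cluster_ids, predictions):
    """Return (cluster id, member indices) for the cluster with the highest mean entropy."""
    by_cluster = defaultdict(list)
    for idx, (cid, probs) in enumerate(zip(cluster_ids, predictions)):
        by_cluster[cid].append((idx, entropy(probs)))

    def mean_entropy(cid):
        scores = [score for _, score in by_cluster[cid]]
        return sum(scores) / len(scores)

    best = max(by_cluster, key=mean_entropy)
    return best, [idx for idx, _ in by_cluster[best]]

# Hypothetical cluster assignments and model predictions for four items.
cluster_ids = [0, 0, 1, 1]
predictions = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.6, 0.4]]
best_cluster, members = highest_entropy_cluster(cluster_ids, predictions)
```

A margin-of-confidence ranking could then be applied within `members`, which corresponds to item 9 in the list.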
    2.
    Active transfer learning.
Machine in the loop

ONLINE LEARNING

    1.
    If you want to start with OL - start here & here​
    3.
    Some answers about what OL is; the first one actually talks about S. Shalev's other paper.
    4.
    Online learning - Andrew Ng - Coursera

ONLINE DEEP LEARNING (ODL)

    1.
    Hedge backpropagation (HBP), Autonomous DL, Qactor - online AL for noisy labeled stream data.

N-SHOT LEARNING

    1.
    Zero-shot, one-shot, few-shot (Siamese networks are a one-shot approach)

ZERO SHOT LEARNING

    1.
    Instead of using class labels, we use some kind of vector representation for the classes, taken from a co-occurrence matrix reduced with SVD or from word2vec, which is quite clever. This enables us to figure out whether a new, unseen class is near one of the known supervised classes. KNN or some other distance-based classifier can be used. Can we use word2vec for similarity measurements of new classes?
    2.
    For classification, we can use nearest neighbour or manifold-based label propagation.
    3.
    Multiple category vectors? Multilabel zero-shot is also covered in the video.
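The nearest-neighbour idea in the list above can be sketched with cosine similarity between an input's projected embedding and a vector for each class. This is a minimal sketch under stated assumptions: the 3-dimensional `class_vectors` are made-up stand-ins for word2vec or SVD vectors, and `projected` stands for the output of a hypothetical model that maps inputs into the same space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_class(embedding, class_vectors):
    """Name of the class whose vector is most similar to the embedding."""
    return max(class_vectors, key=lambda name: cosine(embedding, class_vectors[name]))

# Hypothetical class vectors; "truck" had no labeled training examples.
class_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "truck": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new input, projected into the class-vector space.
projected = [0.1, 0.1, 0.8]
predicted = nearest_class(projected, class_vectors)
```

Because classification only needs a vector per class, an unseen class such as "truck" can be predicted as long as its vector exists.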

GPT-3 is ZERO, ONE, FEW SHOT
