Research directions
- My main background is in mathematics applied to statistical and machine learning.
- I design learning strategies and provide the theoretical analysis of their statistical performance, with a particular emphasis on scalable approaches in the context of massive data (Big Data).
- In this context, a crucial question is to understand the trade-off between the available computational resources and the statistical precision one can achieve.
- Applications from various fields such as biology, industry, or cyber-security are also welcome.
Estimator/model selection
Goal: choose the best one among several candidate estimators/models.
- Penalized (random) criteria: AIC- or BIC-like penalties
- Regularization (convex relaxation): L1 (Lasso), L2 (Ridge)
- Cross-validation: Leave-one-out, Leave-p-out, V-fold
Two possible perspectives:
- Identification: recover the "true model" (if any)
- Estimation/prediction: recover the model with the smallest generalization error
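As a concrete illustration of the cross-validation option listed above, here is a minimal sketch (assuming scikit-learn and purely synthetic toy data; the candidate set and all parameter values are illustrative) that picks, among a few candidate estimators, the one with the smallest V-fold cross-validated prediction error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy regression data (illustrative only): sparse linear signal plus noise.
n, d = 200, 50
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:5] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Candidate estimators/models to choose from.
candidates = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

# V-fold cross-validation (V = 5): select the candidate with the smallest
# estimated prediction error (scikit-learn returns the negative MSE).
scores = {
    name: -cross_val_score(est, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    for name, est in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, "-> selected:", best)
```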
Change-point detection, anomaly/outlier detection
Data: time series of "objects", which can be sequences of high-dimensional measurements from (dependent) sensors, or structured objects such as texts or audio/video streams.
Type of change: changes in any (prescribed or not) feature of the distribution over time
- Offline: Change-point detection, segmentation
- Online: Outlier detection, anomaly detection
Asset: Does not require any distributional assumption (no parametric model)
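As a toy illustration of the offline setting (my own minimal example, not the actual methods developed in these works), the sketch below scans all candidate split points of a univariate signal and keeps the one minimizing the within-segment squared error, i.e. it locates a single mean shift without any parametric model of the noise.

```python
import numpy as np

def best_single_split(signal, min_size=5):
    """Return the split index minimizing the within-segment squared error.

    Toy illustration: detects a single change in mean by exhaustive scan,
    without assuming any parametric noise model.
    """
    n = len(signal)
    best_tau, best_cost = None, np.inf
    for tau in range(min_size, n - min_size):
        left, right = signal[:tau], signal[tau:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau

# Synthetic signal with a mean shift at t = 120 (illustrative only).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(1.5, 1.0, 80)])
print(best_single_split(x))  # expected to be close to 120
```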
Reproducing kernels, mean embedding, Maximum Mean Discrepancy (MMD), neighboring graph between objects, combination of heterogeneous data of different nature
- Reproducing kernels can be thought of as a "similarity measure" between objects: the more similar a pair of objects, the larger the value of the kernel evaluated at this pair.
- Reproducing kernels can deal with objects which are not necessarily vectors (DNA sequences, graphs, video streams,...).
- As long as such a similarity measure between objects exists, these objects can be compared. For instance, a neighboring graph can be built from this pairwise proximity measure.
- Simple combinations of kernels can help combine descriptors (of an individual) even when they are of different kinds (see the sketch below).
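For instance, here is a minimal sketch of the (biased) MMD estimate between two samples computed from a Gaussian kernel; the bandwidth, weights, and data are placeholders, and the last function only illustrates that a weighted sum of kernels acting on different descriptors is again a kernel, hence one simple way of combining heterogeneous data.

```python
import numpy as np

def gaussian_gram(X, Y, bandwidth=1.0):
    """Gram matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(X, Y, kernel=gaussian_gram):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(100, 3))
Y = rng.normal(0.5, 1.0, size=(100, 3))
print(mmd2(X, Y))  # larger values suggest a larger discrepancy between distributions

# Combining heterogeneous descriptors: if one kernel acts on the first descriptor
# and another on the second, their weighted sum is again a reproducing kernel.
def combined_kernel(X1, Y1, X2, Y2, w1=0.5, w2=0.5):
    return w1 * gaussian_gram(X1, Y1) + w2 * gaussian_gram(X2, Y2)
```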
Parameter estimation/approximation techniques
Variational algorithm in the Stochastic Block Model (SBM), Low-rank matrix approximation, Random Fourier features, approximate cross-validation
Main interest and difficulty:
- Whereas an estimator can be costly to compute (or even intractable!), replacing it by an approximation can greatly reduce the computation time.
- Several approximation strategies often exist. Choosing among them is usually a difficult task, although many of them perform well in practice.
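As an example of such an approximation, here is a hedged sketch of Random Fourier features approximating a Gaussian kernel Gram matrix by a low-dimensional explicit feature map; the number of features, bandwidth, and data below are illustrative only.

```python
import numpy as np

def rff_features(X, n_features=200, bandwidth=1.0, seed=None):
    """Random Fourier features z(x) with z(x) . z(y) ~= exp(-||x - y||^2 / (2 bw^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 5))
Z = rff_features(X, n_features=500, bandwidth=1.0, seed=0)

# Exact Gaussian Gram matrix vs its low-rank approximation Z Z^T.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq_dists / 2.0)
K_approx = Z @ Z.T
print(np.abs(K_exact - K_approx).max())  # small for a large number of features
```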
Trade-off between computational resources and statistical precision
- Most estimators are defined as minimizers of an optimization problem.
- Optimization algorithms are mainly used to output an (approximate) evaluation of these minimizers.
- Numerous optimization algorithms are iterative ones (gradient descent, stochastic gradient descent, EM algorithm, coordinate descent,...)
- Reducing the computational burden (time and memory), while keeping a reliable statistical performance
- Designing an early stopping rule, that is, a data-driven rule indicating when to stop the iterative optimization process
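As a minimal and purely illustrative sketch, the code below runs gradient descent on a least-squares problem and stops it with one common data-driven heuristic, namely when the error on a held-out split has stopped improving; this is not necessarily the stopping rule studied in this line of work, and all parameter values are placeholders.

```python
import numpy as np

def gd_early_stopping(X_tr, y_tr, X_val, y_val, lr=1e-2, max_iter=5000, patience=20):
    """Gradient descent on least squares, stopped when held-out error stops improving."""
    w = np.zeros(X_tr.shape[1])
    best_w, best_err, since_best = w.copy(), np.inf, 0
    for t in range(max_iter):
        grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w -= lr * grad
        val_err = np.mean((X_val @ w - y_val) ** 2)
        if val_err < best_err:
            best_w, best_err, since_best = w.copy(), val_err, 0
        else:
            since_best += 1
        if since_best >= patience:  # data-driven early stopping
            break
    return best_w, t

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 40))
w_true = np.zeros(40)
w_true[:3] = 1.0
y = X @ w_true + rng.standard_normal(300)
w_hat, n_iter = gd_early_stopping(X[:200], y[:200], X[200:], y[200:])
print(n_iter, np.linalg.norm(w_hat - w_true))
```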
Stability of learning algorithms and concentration inequalities
- Introduce a new notion of stability for learning algorithms
- Exploit connections between this notion of stability and concentration inequalities
- Derive (tighter) concentration results for classical learning algorithms (Ridge regression, k-Nearest Neighbors, Nadaraya-Watson estimators,...)
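For context, a classical instance of the stability/concentration connection (Bousquet and Elisseeff's uniform-stability bound, obtained through McDiarmid's inequality) is recalled below; the notion of stability introduced in this work is different, and this display only illustrates the type of statement involved.

```latex
% Uniform beta-stability: removing one point of the training sample S changes the
% loss of the output hypothesis by at most beta, uniformly over samples and points:
%   sup_{S, i, z} | \ell(A_S, z) - \ell(A_{S^{\setminus i}}, z) | \le \beta .
% If in addition the loss is bounded by M, McDiarmid's inequality yields, with
% probability at least 1 - \delta over an i.i.d. sample of size n,
\[
  R(A_S) \;\le\; \widehat{R}_n(A_S) \;+\; 2\beta
  \;+\; \bigl(4 n \beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2n}},
\]
% where R denotes the risk (generalization error) and \widehat{R}_n the empirical risk.
```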
- Multiple testing: Identifying genes/SNPs that are differentially expressed between two experimental conditions.
- Change-point detection: Detecting copy number variations along the genome, including variations of the allelic ratio.
- Lasso-like strategies: Supervised selection of features (SNPs for instance) that are related to a disease (cancer) in a high-dimensional context by exploiting the existing between-features redundancy.
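As an illustration of this last point, here is a minimal sketch (synthetic data with redundant, correlated features loosely mimicking blocks of SNPs; scikit-learn's LassoCV; all names and dimensions are placeholders) where the non-zero Lasso coefficients define the selected features.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)

# Toy high-dimensional design with redundant (correlated) features (illustrative only).
n, d = 100, 500
latent = rng.standard_normal((n, 50))
X = np.repeat(latent, 10, axis=1) + 0.1 * rng.standard_normal((n, d))
beta = np.zeros(d)
beta[[0, 100, 200]] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Lasso with the regularization level chosen by cross-validation;
# the support of the estimated coefficients gives the selected features.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(len(selected), selected[:10])
```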
In a supervised framework:
- Identifying weak events related to the occurrence of some failures.
- Designing data-driven rules for detecting weak events online.
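One classical example of such an online rule, given here only as a hedged illustration and not as the method developed in these works, is a CUSUM-type statistic that accumulates small deviations until they cross a threshold, which makes it sensitive to weak but persistent events; the drift and threshold values below are arbitrary.

```python
import numpy as np

def cusum_online(stream, drift=0.25, threshold=8.0):
    """One-sided CUSUM: flag the first time the cumulated positive drift
    of the stream exceeds a threshold (illustrative parameter values)."""
    s = 0.0
    for t, x in enumerate(stream):
        s = max(0.0, s + x - drift)
        if s > threshold:
            return t  # alarm time
    return None

rng = np.random.default_rng(6)
# Weak upward shift (+0.5) appearing at time 300 in a noisy stream.
stream = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(0.5, 1.0, 300)])
print(cusum_online(stream))
```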