![](/fileadmin/_processed_/d/6/csm_alain_celisse_907fc14087.jpg)
Axes de recherche
- My main background is in Mathematics applied to Statistical/Machine learning.
- I design learning strategies and also provide the theoretical analysis of their statistical performance, with a particular emphasis on scalable approaches in the context of massive data (Big Data).
- Regarding this aspect, a crucial question is understanding the trade-off between the available computational resources and the statistical precision one can achieve.
- Applications from various fields such as biology, industry or cyber-security are also welcome.
Estimator/Model selection
![](/fileadmin/user_upload/pages_pros/alain_celisse/CrossVal.jpg)
Goal: Choose among several candidate estimators/models the best one.
- Penalized (random) criteria: AIC-or BIC-like penalties
- Regularization (convex relaxation): L1 (Lasso), L2 (Ridge)
- Cross-validation: Leave-one-out, Leave-p-out, V-fold
Two possible perspectives:
- Identification: recover the "true model" (if any)
- Estimation/prediction: recover the model with the smallest generalization error
Change-points detection, anomaly/outlier detection
![](/fileadmin/_processed_/9/2/csm_fig_exampleSignal_equalMean_equalVar_withRupt_aa0819d3fe.jpg)
Data: Time-series of "objects" which can be a sequence of high-dimensional measurements from (dependent) sensors, or structured objects such as texts or audio/video streams.
Type of change: Changes in any (prescribed or not) features of the distribution along the time
- Offline: Change-points detection, segmentation
- Online: Outlier detection, anomaly detection
Asset: Does not require any distributional assumption (no parametric model)
Reproducing kernels
![](/fileadmin/_processed_/d/0/csm_RKHSsmall_cf8edc650d.jpg)
Reproducing kernels, mean embedding, Minimum Mean Discrepancy (MMD), neighboring graph between objects, combination of heterogeneous data of different nature
Rough interpretation:
- Reproducing kernels can be thought of as a "similarily measure" between objects. The more similar a pair of objects, the larger the value of the kernel evaluated at this pair of objects.
Interest:
- Reproducing kernels can deal with objects which are not necessarily vetcors (DNA sequences, graphs, video streams,...).
- As long as such a similarity measure between objects does exist, then these objects can be compared. For instance, a neighboring graph can be built from this pairwise proximity measure.
- Simple combinations of kernels can help combining descriptors (of an individual) although they are of different kinds
Parameter estimation/approximation techniques
![](/fileadmin/_processed_/d/0/csm_RunTimeExactAndApprox_a0bf75abe5.png)
![](/fileadmin/_processed_/b/7/csm_Mem_KSeg_ECP_dc926038c3.png)
Variational algorithm in the Stochastic Block Model (SBM), Low-rank matrix approximation, Random Fourier features, approximate cross-validation
Main interest and difficulty:
- Whereas an estimator can be costly to compute (or even not achievable!), replacing such an estimator by an approximation can greatly reduce the computation time.
- Several approximating strategies often exist. Choosing one of them is usually a difficult task although a large number of them perform well in practice.
Trade-off between Computation resources and Statistical precision
![](/fileadmin/_processed_/e/5/csm_truerisk_5e9b48fc21.jpg)
Motivation:
- Most estimators are defined as minimizers of an optimization problem.
- Optimization algorithms are mainly used to output an (approximate) evaluation
- Numerous optimization algorithms are itertive ones (Gradient descent, Stocastic gradient descent, EM-algorithm, coordinate descent,...)
Goal:
- Reducing the computational burden (time and memory), while keeping a reliable statistical performance
- Designing an early stopping rule, that is a data-driven stopping rule indicating when to stop the iterative optimization process
Stability of learning algorithms and concentration inequalities
Strategy:
- Introduce a new notion of stability for learning algorithms
- Exploit connections between this notion of stability and concentration inequalities
- Derive (tighter) concentration results for classical learning algorithms (Ridge regression, k-Nearest Neighbors, Nadaraya-Watson estimators,...)
Applications
![](/fileadmin/user_upload/pages_pros/alain_celisse/snp_2.jpg)
- Biostatistics:
- Multiple testing: Identifying genes/SNPs that are differentially expressed between two experimental conditions.
- Change-points detection: Detecting copy number variations along the genome, including variations of the allelic ratio.
- Lasso-like strategies: Supervised selection of features (SNPs for instance) that are related to a disease (cancer) in a high-dimensional context by exploiting the existing between-features redundancy.
- Industry:
In a supervised framework:
- Identifying weak events related to some failures occurrences.
- Designing data-driven rules allowing for detecting weak events online.