Following the success of the first edition, SBio-SPE and SGAPEIO (Sociedade Galega para a Promoción da Estatística e da Investigación de Operacións) are organizing the 2nd Online Symposium on Galician-Portuguese Biometry (II Simpósio Online em Biometria Galaico-Portuguesa).
The event will take place on 6 March 2026, from 14:30 to 16:30 mainland Portugal time (15:30 to 17:30 Galicia time), with the participation of:
- Vera Pinto, Faculdade de Ciências da Universidade de Lisboa
- María Alonso Pena, Universidade de Santiago de Compostela
- Tiago Paixão, Gulbenkian Institute for Molecular Medicine
- Javier Roca Pardiñas, Universidade de Vigo
Information and Registration
The Symposium will be held online via Microsoft Teams on 6 March 2026, 14:30–16:30 (mainland Portugal; 15:30–17:30 Galicia). Participation is free and open to everyone, including non-members of SPE/SGAPEIO. However, registration is required in order to receive the access link for the session. Registration can be completed using the following form (Google Forms).
Speaker Abstracts [EN]
The titles and abstracts of the speakers at this 2nd Online Symposium on Galician-Portuguese Biometry are provided here in English.
Vera Pinto, Faculdade de Ciências, Universidade de Lisboa
Title: From Metrics to Choice: A Decision Guide for Variant Caller Selection
Abstract: The increasing diversity of variant calling methods, ranging from classical probabilistic models to haplotype-based approaches and deep learning, has improved performance but also made tool selection less straightforward. In practice, the “best” caller depends on the relative costs of false positives and false negatives, as well as available computational resources. This presentation frames variant caller selection as a decision problem supported by benchmarking evidence. Using NA12878 and a curated truth set, we compare several commonly used callers and summarize how their performance profiles differ when assessed with precision, recall, and F1-score. Finally, we propose a simple, user-oriented framework to guide choice under realistic constraints: defining the analysis objective, selecting the metric that best reflects that objective, and understanding the accuracy–runtime trade-off. The goal is to make benchmarking results directly usable for reproducible and informed pipeline design.
María Alonso Pena, Universidade de Santiago de Compostela
Title: Nonparametric Mode Estimation and Clustering for Circular and Axial Data
Abstract: Circular and axial data arise whenever observations are angular, such as orientations or directions, and require methods that respect their non-Euclidean geometry. In this talk, we present a nonparametric framework for mode estimation and clustering based on kernel density estimation and mean-shift algorithms. The approach enables data-driven identification of dominant modes without pre-specifying the number of clusters. Particular attention is given to bandwidth selection, as the smoothing parameter directly affects mode detection. We discuss the role of derivative-based selectors in this context. The methodology is illustrated with porphyroblast inclusion-trail orientations from structural geology, where the detected clusters can be linked to successive deformation phases and tectonic evolution.
Tiago Paixão, Gulbenkian Institute for Molecular Medicine
Title: A Bayesian Framework for Inferring Selection Coefficients in Evolve-and-Resequence Experiments
Abstract: Evolve-and-resequence experiments using pooled sequencing (PooledSeq) are widely used to identify genetic variants underlying adaptation to specific selection pressures. The standard analytical approach—the Cochran-Mantel-Haenszel (CMH) test—treats this as a stratified association problem, testing for consistent relationships between treatment and time across replicates. However, this method has critical limitations: it assumes arbitrary pairings between evolved and control populations, cannot accommodate multiple timepoints, and crucially, ignores the known stochastic processes generating the data. The CMH approach lacks information about how variance depends on generations of evolution, initial allele frequencies, and selection coefficients, potentially leading to spurious associations. We present a principled Bayesian framework that incorporates population genetic theory directly into the inference process. Our method models allele frequency dynamics using established evolutionary principles—including selection, dominance, and genetic drift—approximated through stochastic differential equations suitable for the large populations typical of Drosophila experiments. The framework explicitly accounts for multiple sources of variance: Wright-Fisher sampling during evolution, population subsampling for sequencing, and variable sequencing depth. Unlike the CMH test, our approach does not require control populations, can handle arbitrary numbers of timepoints, and simultaneously estimates selection coefficients and dominance parameters with associated uncertainty. We implement this Bayesian method using PyMC and compare its performance against the CMH procedure using Wright-Fisher simulations. 
By incorporating mechanistic knowledge about the data-generating process rather than applying generic statistical tests, our framework extracts more information from experimental data and provides biologically interpretable parameter estimates for understanding adaptive evolution.
Javier Roca Pardiñas, Universidade de Vigo
Title: Robust Estimation of Conditional Reference Regions: Application to Diabetes Research
Abstract: In clinical practice, the interpretation of continuous diagnostic markers requires reference intervals. For a single marker, these intervals define a range that captures 95% of the results from healthy individuals. However, for diseases involving two correlated markers, such as diabetes, a more powerful approach is to analyze them jointly using reference regions. These regions contain 95% of the results from healthy individuals, taking into account the joint distribution of both markers. In this work, we explore and propose advanced methods for the robust estimation of conditional reference regions. We propose a new bivariate regression model based on quantile regression that estimates conditional bivariate regions. This method offers several advantages over previous approaches, including flexibility in modeling covariate effects, robustness to outliers, the ability to capture nonlinear covariate effects using cyclic spline functions, and the property that the estimated region leaves the same percentage of data outside in all possible directions. We validate our model through simulations, demonstrating its accuracy and robustness in the presence of outliers, and we explore its behavior on both Gaussian and non-Gaussian data. Finally, we apply it to a real study related to diabetes, modeling the joint distribution of two blood sugar markers and analyzing how age affects their distribution.
The Organizing Committee,
Nuno Sepúlveda, João Malato, Clara Cordeiro, Marta Sestelo