Welcome

The Workshop on Probabilistic and Statistical Methods is a meeting organized by the Joint Graduate Program in Statistics UFSCar/USP (PIPGEs) with the aim of discussing new developments in statistics, probability, and their applications.

Main

Activities include invited speaker sessions, short talks, a poster session, and a short course aimed at graduate students. The topics of this edition include probability and stochastic processes, statistical inference, regression models, survival analysis, and related topics.

Confirmed Speakers

Alexandre Patriota - IME-USP
André Ponce de Leon F. de Carvalho - ICMC-USP
Artur Lemonte Jorge - UFRN
Dylan Molenaar - University of Amsterdam
Fabio Gagliardi Cozman - POLI-USP
Fábio Prates Machado - IME-USP
Fernanda de Bastiani - UFPE
Gabriela Cybis - UFRGS
Guilherme Ludwig - UNICAMP
Iddo Ben-Ari - University of Connecticut
João Ricardo Sato - UFABC
Larissa Avila Matos - UNICAMP
Luis Gustavo Nonato - ICMC-USP
Marcelo Andrade - USP/UFSCar
Mauricio Sadinle - University of Washington
Miguel Abadi - IME-USP

Committees

Organizing Committee

Jorge Luís Bazan - ICMC-USP
Mariana Curi - ICMC-USP (Chair)
Rafael Izbicki - DEs-UFSCar (Chair)
Renato Jacob Gava - DEs-UFSCar
Vera Tomazella - DEs-UFSCar

Scientific Committee

Adriano Polpo de Campos - University of Western Australia
Carlos Alberto de Bragança Pereira - IME-USP
Francisco Louzada Neto - ICMC-USP
Mario de Castro - ICMC-USP
Osvaldo Anacleto
Vera Tomazella - UFSCar

Support Committee
(students from PIPGEs)

Marco Inacio
Gustavo Sabillón

Program

The titles and abstracts of the conferences and talks are listed below.

Conferences

On some assumptions of the null hypothesis statistical testing

Alexandre Patriota, IME-USP

Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded in each statistical method. Ignoring these principles leads to the paradoxical conclusion that the hypothesis m1 = m2 could be less supported by the data than a more restrictive hypothesis such as m1 = m2 = 0, where m1 and m2 are two population means. This work discusses and makes explicit some important assumptions inherent to classical statistical models and null statistical hypotheses. Furthermore, the definition of the p-value and its limitations are analyzed. An alternative measure of evidence, the s-value, is discussed. The steps to compute s-values are presented and, in order to illustrate the methods, some standard examples are analyzed and compared with p-values. The examples show that p-values, as opposed to s-values, fail to satisfy some logical relations.
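
The nested-hypothesis paradox mentioned above can be reproduced numerically. The sketch below uses hypothetical sample sizes and sample means of our own choosing (not taken from the talk) and compares likelihood-ratio p-values for H_A: m1 = m2 and the more restrictive H_B: m1 = m2 = 0, for two normal samples with known unit variance.

from scipy.stats import chi2

n = 50                      # observations per group (illustrative)
xbar1, xbar2 = 0.3, -0.3    # sample means (illustrative)

# LR statistic for H_A: m1 = m2 (one restriction, chi-square with 1 df)
lr_A = n * (xbar1 - xbar2) ** 2 / 2
p_A = chi2.sf(lr_A, df=1)

# LR statistic for H_B: m1 = m2 = 0 (two restrictions, chi-square with 2 df)
lr_B = n * (xbar1 ** 2 + xbar2 ** 2)
p_B = chi2.sf(lr_B, df=2)

print(f"p-value for H_A (m1 = m2):     {p_A:.4f}")   # about 0.0027
print(f"p-value for H_B (m1 = m2 = 0): {p_B:.4f}")   # about 0.0111, larger than p_A

Even though H_B implies H_A, its p-value is larger, so the data appear to "support" the more restrictive hypothesis more, which is exactly the kind of logical inconsistency the s-value is designed to avoid.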

From Big Data to Data Science

André Ponce de Leon F. de Carvalho, ICMC-USP

With the recent expansion in data generation and the growing importance of exploiting the knowledge contained in these data, Data Science has become one of the fastest-growing areas of the Exact Sciences. Large companies such as Amazon, Apple, Disney, Facebook, Google, and Microsoft are hiring large numbers of scientists, engineers, and statisticians to work in this area. The ability to acquire, store, and transmit data from the most diverse human activities, in both the public and private sectors, has grown exponentially, generating massive volumes of data. These massive volumes of data, known as Big Data, come from a variety of sources and therefore have a wide variety of structures, ranging from traditional attribute-value tables to videos and messages on social networks. Analyzing these data can generate valuable information for decision making, enabling the extraction of new and useful knowledge. The difficulty of carrying out this analysis with traditional data analysis techniques has led to the development of new techniques, expanding the area of Data Science. This talk will present the main aspects, challenges, and applications of Big Data and Data Science.

Local power of the likelihood ratio, Wald, score, and gradient tests under orthogonality

Artur Lemonte Jorge, UFRN

This talk considers the local powers of the likelihood ratio, Wald, Rao score, and gradient tests in the presence of a parameter vector, omega, that is orthogonal to the remaining parameters. It will be shown that some of the coefficients defining the local powers of these tests remain unchanged regardless of whether omega is known or has to be estimated, while the other coefficients can be expressed as the sum of two terms: the first is the term obtained as if omega were known, and the second is an additional term produced by the fact that omega is unknown. This result will be applied to the class of nonlinear mixed-effects regression models, and the local powers of the tests will be compared.
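
For reference, and as a simplification not taken from the talk (which concerns composite hypotheses in the presence of an orthogonal nuisance parameter $\omega$), the four statistics under comparison have the familiar forms for testing a simple hypothesis $\theta = \theta_0$:

$$ LR = 2\{\ell(\hat\theta) - \ell(\theta_0)\}, \qquad W = (\hat\theta - \theta_0)^\top K(\hat\theta)\,(\hat\theta - \theta_0), $$
$$ SR = U(\theta_0)^\top K(\theta_0)^{-1}\, U(\theta_0), \qquad T = U(\theta_0)^\top (\hat\theta - \theta_0), $$

where $\ell$ is the log-likelihood, $U$ the score vector, $K$ the expected Fisher information, and $\hat\theta$ the maximum likelihood estimator. All four statistics share the same first-order asymptotic chi-square distribution under the null, so comparisons such as the one in this talk are made through their local powers at higher order.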

Studying Variability of Measurement Model Parameters across Continuous Background Variables

Dylan Molenaar (University of Amsterdam)

Multi-group latent variable modeling approaches to study variability of measurement model parameters across categorical background variables like gender, cohort, and experimental conditions have been well established (e.g., Jöreskog, 1971; Meredith, 1993; Mellenbergh, 1989; Millsap, 2012). However, in many applications the background variable is continuous, for instance age, socioeconomic status, or IQ. To study parameter variability in such cases, one can pragmatically choose to categorize the continuous background variable and apply a traditional multi-group model. However, this approach is suboptimal for various reasons. In the present talk, parametric and non-parametric alternatives for studying parameter variability across continuous background variables are presented, including moderated latent variable models, locally weighted latent variable models, and mixture factor models. The statistical properties of the models are studied, and the models are illustrated on real datasets. References: Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409-426. Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525-543. Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13(2), 127-143. Millsap, R. E. (2012). Statistical Approaches to Measurement Invariance. Routledge.
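
As an illustration of the moderated latent variable models mentioned in the abstract (a generic sketch, not necessarily the parameterization used in the talk), a single-factor model whose intercepts and loadings vary with a continuous background variable $z_i$ can be written as

$$ y_{ij} = \nu_j(z_i) + \lambda_j(z_i)\,\eta_i + \varepsilon_{ij}, \qquad \nu_j(z) = \nu_{0j} + \nu_{1j}\,z, \qquad \lambda_j(z) = \lambda_{0j} + \lambda_{1j}\,z, $$

so that testing $\nu_{1j} = 0$ and $\lambda_{1j} = 0$ amounts to testing whether item $j$ is measurement invariant across the continuous variable.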

A random walk with catastrophes

Iddo Ben-Ari (University of Connecticut)

Random population dynamics with catastrophes (events pertaining to possible elimination of a large portion of the population) has a long history in the mathematical literature. In this paper we study an ergodic model for random population dynamics with linear growth and binomial catastrophes: in a catastrophe, each individual survives with some fixed probability, independently of the rest. Through a coupling construction, we obtain sharp two-sided bounds for the rate of convergence to stationarity which are applied to show that the model exhibits a cutoff phenomenon.
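
A discrete-time caricature of such a process can be simulated in a few lines. The specific dynamics below (growth of one individual per step, catastrophes arriving with a fixed probability, binomial thinning of the population at each catastrophe) are hypothetical choices made only to illustrate the binomial-catastrophe mechanism, not the exact model of the talk.

import numpy as np

rng = np.random.default_rng(0)

def simulate(steps=100_000, p_cat=0.05, p_survive=0.5, x0=10):
    # At each step: with probability p_cat a catastrophe strikes and every
    # individual survives independently with probability p_survive (binomial
    # thinning); otherwise the population grows by one individual.
    x, path = x0, []
    for _ in range(steps):
        if rng.random() < p_cat:
            x = rng.binomial(x, p_survive)
        else:
            x += 1
        path.append(x)
    return np.array(path)

path = simulate()
print("long-run average population size:", path[path.size // 2:].mean())

Because catastrophes remove, on average, a fixed fraction of the current population while growth adds a constant amount, losses dominate for large populations, which is what makes an ergodic equilibrium plausible in this toy version.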

A semiparametric mixed-effects model for censored longitudinal data

Larissa Avila Matos, UNICAMP

In longitudinal studies involving laboratory-based outcomes, repeated measurements can be censored due to assay detection limits. Linear mixed-effects (LME) models are a powerful tool to model the relationship between a response variable and covariates in longitudinal studies. However, the linear parametric form of LME models is often too restrictive to characterize the complex relationship between a response variable and covariates. More general and robust modeling tools, such as nonparametric and semiparametric regression models, have become increasingly popular in the last decade. In this work, we use semiparametric mixed models to analyze censored longitudinal data with irregularly observed repeated measures. The proposed model extends the censored LME model and provides more flexible modeling schemes by allowing the time effect to vary nonparametrically over time. We develop an EM algorithm for maximum penalized likelihood (MPL) estimation of the model parameters and the nonparametric component. Further, as a byproduct of the EM algorithm, the smoothing parameter is estimated using a modified LME model, which is faster than alternative methods such as the restricted maximum likelihood (REML) approach. Finally, the performance of the proposed approaches is evaluated through extensive simulation studies as well as applications to datasets from AIDS studies.
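
In symbols, a model of the kind described above can be sketched as follows (a generic formulation, not necessarily the exact one used in the talk): for subject $i$ at time $t_{ij}$,

$$ y_{ij} = \mathbf{x}_{ij}^\top \boldsymbol\beta + f(t_{ij}) + \mathbf{z}_{ij}^\top \mathbf{b}_i + \varepsilon_{ij}, \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}), \qquad \varepsilon_{ij} \sim N(0, \sigma^2), $$

where $f$ is a smooth, nonparametric time effect estimated by maximizing a penalized likelihood, and $y_{ij}$ is recorded as left-censored at the detection limit whenever the assay cannot quantify it.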

Nonparametric Identified Methods to Handle Nonignorable Missing Data

Mauricio Sadinle, University of Washington

There has recently been a lot of interest in developing approaches to handle missing data that go beyond the traditional assumptions of the missing data being missing at random and the nonresponse mechanism being ignorable. Of particular interest are approaches that have the property of being nonparametric identified, because these approaches do not impose parametric restrictions on the observed-data distribution (what we can estimate from the observed data) while allowing estimation under a full-data distribution. When comparing inferences obtained from different nonparametric identified approaches, we can be sure that any discrepancies are the result of the different identifying assumptions imposed on the parts of the full-data distribution that cannot be estimated from the observed data, and consequently these approaches are especially useful for sensitivity analyses. In this talk I will present some recent developments in this area of research and discuss current challenges.

Mini-Conferences

Scalable modeling of nonstationary covariance functions with regularized B-spline deformations

Guilherme Ludwig, UNICAMP

We propose a semiparametric method for nonstationary covariance function modeling, based on the spatial deformation method of Sampson and Guttorp (1992), but using a low-rank, scalable, regularized deformation function. We show that fine tuning of the regularization parameters can ensure that the deformation does not fold in on itself, and therefore yields proper covariance function estimates. An application to rainfall data illustrates the method.
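
The construction behind the talk, the spatial deformation of Sampson and Guttorp (1992), models nonstationarity by mapping geographic space into a latent space where the process is isotropic. A sketch of the idea, with the low-rank B-spline parameterization written only schematically (our notation, not necessarily the authors'), is

$$ \mathrm{Cov}\{Z(\mathbf{s}), Z(\mathbf{s}')\} = \rho\big(\lVert f(\mathbf{s}) - f(\mathbf{s}') \rVert\big), \qquad f(\mathbf{s}) = \sum_{k=1}^{K} \mathbf{c}_k\, B_k(\mathbf{s}), $$

where $\rho$ is an isotropic correlation function, the $B_k$ form a low-rank B-spline basis, and the coefficients $\mathbf{c}_k$ are estimated under a regularization penalty that keeps the deformation $f$ from folding in on itself.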

A Q-matrix search method for multidimensional IRT models

Marcelo Andrade (ICMC-USP/UFSCar)

Recently, the Q-matrix, the element responsible for capturing the relationship between items and latent traits and present in the vast majority of cognitive diagnosis models (CDMs), was incorporated into the formulation of multidimensional item response theory (MIRT) models. In practice, to use a model with a Q-matrix, whether a CDM or a MIRT model, one must first establish the relationship between items and latent traits by building an appropriate Q-matrix. Although this construction is typically carried out by experts on the subject matter of the items, it is a subjective process and can therefore lead to mistakes, resulting in important practical problems such as affecting the estimation of the model parameters. In this work, we propose a Q-matrix search method for MIRT models based on criteria designed to provide maximum information about some property of interest. To this end, we use the exchange algorithm, an efficient and systematic matrix search method, aiming to find a Q-matrix that maximizes the chosen property. In addition, the method provides efficiency information that can be useful when reassessing a Q-matrix. A simulation study was carried out to analyze the performance of the proposed method in two scenarios built by varying the number of items and the number of latent trait dimensions. To illustrate the use of the Q-matrix search method on real data, we use a dataset with the responses of 1,111 students to a 21-item version of the BDI.
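
To fix ideas, in a multidimensional two-parameter logistic model the Q-matrix enters by switching latent dimensions on and off per item (a generic formulation, not necessarily the exact model used in the talk):

$$ P(X_{ij} = 1 \mid \boldsymbol\theta_i) = \frac{\exp\big(\sum_{k=1}^{K} q_{jk}\, a_{jk}\, \theta_{ik} + d_j\big)}{1 + \exp\big(\sum_{k=1}^{K} q_{jk}\, a_{jk}\, \theta_{ik} + d_j\big)}, \qquad q_{jk} \in \{0, 1\}, $$

where $\boldsymbol\theta_i$ is the latent trait vector of respondent $i$, $a_{jk}$ and $d_j$ are item parameters, and $q_{jk}$ indicates whether item $j$ measures dimension $k$. The search described above explores candidate Q-matrices with the exchange algorithm and keeps the one that maximizes the chosen information criterion.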

Special Sessions

Machine Learning

Machine Learning, coordinated by Rafael Stern, DEs - UFSCar

Explaining Machine Learning

Fabio Gagliardi Cozman (Poli-USP)

There is now wide interest in techniques that explain automatic decisions generated through machine learning. And there are several ways to generate such interpretations and to explain what happens inside a machine learning "black-box" such as a large random forest or a deep neural network. This talk will discuss various perspectives on how best to explain machine learning techniques, focusing both on textual explanations and on explanations for large-scale embeddings.

Applications of Computational Statistics and Machine Learning in Brain Imaging Data

João Ricardo Sato (UFABC)

In this presentation, we will show how computational statistics and machine learning methods can be useful for analyzing brain imaging data. This is interdisciplinary research that requires state-of-the-art knowledge from both neuroscience and machine learning to be successful. Brain imaging data will be introduced to the audience, with the main focus on data structure and quantitative features. Moreover, the mindset of how this combination may be approached in order to raise new questions will be discussed. The presentation will also focus on illustrations of these applications in neurodevelopment, brain disorders, brain-computer interfaces, and education. Finally, the main challenges and perspectives will be discussed.

Spatio-Temporal Data Analytics via Graph Signal Processing

Luís Gustavo Nonato (ICMC-USP)

Signal processing has long been a fundamental tool in fields such as image processing, computer vision, and computer graphics, leveraging the development of filtering mechanisms designed to tackle problems ranging from denoising to object registration. More recently, the signal processing machinery has been extended to unstructured domains such as graphs, fostering a multitude of new theoretical developments and applications. In this talk, we show how graph signal processing is being combined with machine learning techniques to assist in the analysis of spatiotemporal data, leveraging the development of a number of visual analytic tools. In particular, we present examples of applications involving taxi data analysis, identification of crime patterns and study of dynamic networks. The design of filters such as edge-detection and feature preserving smoothing in graphs will also be discussed.
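 
As a minimal example of the feature-preserving smoothing filters mentioned above, consider Tikhonov smoothing of a graph signal, which solves $(I + \lambda L)\,x_{\mathrm{smooth}} = x$ for the graph Laplacian $L$. The toy graph and signal below are hypothetical, and the snippet is a generic sketch, not code from the talk.

import numpy as np

# Toy 4-node graph given by its adjacency matrix
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A            # combinatorial graph Laplacian
x = np.array([1.0, 0.9, 1.1, 5.0])        # noisy signal on the nodes
lam = 1.0                                 # smoothing strength
x_smooth = np.linalg.solve(np.eye(4) + lam * L, x)
print(x_smooth)                           # node values pulled toward their neighbors

Larger values of lam enforce smoother signals across edges, which is the basic trade-off behind the graph filters used in the analyses above.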

Least Ambiguous Set-Valued Classifiers with Bounded Error Levels

Mauricio Sadinle, University of Washington

In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label. Set-valued classifiers output sets of plausible labels rather than a single label, thereby giving a more appropriate and informative treatment to the labeling of ambiguous instances. We introduce a framework for multiclass set-valued classification, where the classifiers guarantee user-defined levels of coverage or confidence (the probability that the true label is contained in the set) while minimizing the ambiguity (the expected size of the output). We first derive oracle classifiers assuming the true distribution to be known. We show that the oracle classifiers are obtained from level sets of the functions that define the conditional probability of each class. Then we develop estimators with good asymptotic and finite sample properties. The proposed estimators build on existing single-label classifiers. The optimal classifier can sometimes output the empty set, but we provide two solutions to fix this issue that are suitable for various practical needs.
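
A rough sketch of the idea, in the spirit of (but not identical to) the estimators in the talk: calibrate one probability threshold per class and output every class whose estimated conditional probability clears its threshold. The function name, the data, and the specific thresholding rule below are illustrative choices of ours.

import numpy as np

def set_valued_predict(probs_cal, y_cal, probs_test, alpha=0.1):
    # probs_cal, probs_test: rows = instances, columns = estimated class
    # probabilities; y_cal: true labels (0, ..., K-1) of the calibration set.
    classes = np.arange(probs_cal.shape[1])
    # Per-class threshold: the alpha-quantile of the true class's estimated
    # probability on calibration data, targeting roughly (1 - alpha) coverage.
    thr = np.array([np.quantile(probs_cal[y_cal == c, c], alpha) for c in classes])
    return [[int(c) for c in classes if p[c] >= thr[c]] for p in probs_test]

# Tiny demonstration with made-up probabilities for three classes
rng = np.random.default_rng(1)
probs_cal = rng.dirichlet(np.ones(3), size=200)
y_cal = probs_cal.argmax(axis=1)          # pretend these are the true labels
probs_test = rng.dirichlet(np.ones(3), size=3)
print(set_valued_predict(probs_cal, y_cal, probs_test))

Ambiguous test points receive several labels, easy ones a single label, and a point clearing no threshold receives the empty set, the situation the abstract's two corrective solutions address.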




Probability

Probability, coordinated by Renato Gava, DEs - UFSCar

Convergence to a transformation of the Brownian Web for a family of Poissonian trees

Cristian Favio Coletti, UFABC

We introduce a system of one-dimensional coalescing random paths starting at the space-time points of a homogeneous Poisson point process in $\mathbb{R} \times \{0\}$ which are constructed as a function of a family $(\Lambda_n)_{n \in \mathbb{N}}$ of Poisson point processes. We show that under diffusive scaling this system converges in distribution to a continuous mapping of the Brownian Web. Joint work with Leon A. Valencia Henao (UdeA, Colombia).

A stochastic model for evolution with mass extinction on Td+

Fábio Prates Machado, IME-USP

We propose a stochastic model for evolution through mutation and natural selection that evolves on the tree $\mathbb{T}_d^+$. We obtain sharp and distinct conditions on the set of parameters for extinction and survival, both on the whole tree $\mathbb{T}_d^+$ and on a fixed branch of it.

Variations on Kac's Theme

Miguel Abadi, IME-USP








Information

For additional information, please contact us here. We will get back to you as soon as possible.

Important Dates

Abstract submission deadline: Jan 18th
Notification of acceptance: Jan 28th
Early registration deadline: Feb 1st

Early Fees* (on or before Feb 1st)

Researchers/others: R$ 70,00
Graduate students: R$ 45,00
Undergraduate students: R$ 25,00

Regular Fees** (after Feb 1st)

Researchers/others: R$ 80,00
Graduate students: R$ 55,00
Undergraduate students: R$ 30,00

Where at UFSCar

TBD
See the campus map here for locations.

Hotel

For lodging and accommodation, we recommend Anacã São Carlos.


*Early fees must be paid by bank transfer to Lea da Silva Veras, CNPJ 25.139.662/000178: Banco Inter (Banco Intermedium S.A.) (077), Agência 0001, C/C 27015912, and the payment receipt must be sent to wpsm.pipges@gmail.com with the subject [pagamento 8WPSM].
**Regular fees may be paid in cash at the registration desk on the first day.

Past Editions

Information about previous editions of our meeting is listed below.

VII WPSM

13, 14, 15 February 2019, UFSCar
Book of Abstracts

VI WPSM

5, 6, 7 February 2018, UFSCar
Book of Abstracts

V WPSM

6, 7, 8 February 2017, ICMC-USP
Book of Abstracts
Poster

IV WPSM

1, 2, 3 February 2016, UFSCar
Book of Abstracts
Poster

III WPSM

9, 10, 11 February 2015, ICMC-USP
Book of Abstracts
Poster

II WPSM

5, 6, 7 February 2014, UFSCar
Book of Abstracts
Poster

WPSM

28, 29, 30 January 2013, ICMC-USP
Book of Abstracts



Organization


      


Support


                             

The meeting in numbers

Since the first edition of the Workshop on Probabilistic and Statistical Methods, held in 2013, we have received contributions from many colleagues and students.

Conferences and Talks
Oral Communications
Poster Presentations
Participants

Questions?

If you have any questions, please do not hesitate to contact us at wpsm.pipges@gmail.com! We look forward to seeing you in São Carlos.

© 2019 PIPGEs UFSCar/USP.