Abstract

Exploratory data analysis that narrows a large set of variables down to a few worth investigating in detail has become increasingly important in the era of big data. In this paper, we propose a methodology for screening, from a large number of distributions, a subset that contains the marginal distributions whose means rank in the top m, which we call the promising distributions. We refer to this screening process as correct screening to distinguish it from the so-called correct selection methods in the literature: our method sequentially guarantees the exclusion of distributions that are not promising. This property allows further confirmatory analysis to proceed with only the m screened promising distributions at any sample size n. Our methodology is practical, as it allows for arbitrary, possibly random, sampling rules and does not require any specific model or strong assumption. We show that the obtained subsets always include all m promising distributions with high probability and identify them exactly asymptotically.