Methodology
- [1] arXiv:2405.09797 [pdf, ps, html, other]
Title: Identification of Single-Treatment Effects in Factorial Experiments
Subjects: Methodology (stat.ME); Machine Learning (stat.ML); Other Statistics (stat.OT)
Despite their cost, randomized controlled trials (RCTs) are widely regarded as gold-standard evidence in disciplines ranging from social science to medicine. In recent decades, researchers have increasingly sought to reduce the resource burden of repeated RCTs with factorial designs that simultaneously test multiple hypotheses, e.g. experiments that evaluate the effects of many medications or products simultaneously. Here I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions, even if otherwise perfectly realistic conditions are achieved. This happens because single-treatment effects involve a counterfactual world with a single focal intervention, allowing other variables to take their natural values (which may be confounded or modified by the focal intervention). In contrast, observational studies and factorial experiments provide information about potential-outcome distributions with zero and multiple interventions, respectively. In this paper, I formalize sufficient conditions for the identifiability of those isolated quantities. I show that researchers who rely on this type of design have to justify either linearity of functional forms or -- in the nonparametric case -- specify with Directed Acyclic Graphs how variables are related in the real world. Finally, I develop nonparametric sharp bounds -- i.e., maximally informative best-/worst-case estimates consistent with limited RCT data -- that show when extrapolations about effect signs are empirically justified. These new results are illustrated with simulated data.
- [2] arXiv:2405.09810 [pdf, ps, other]
Title: Trajectory-Based Individualized Treatment Rules
Subjects: Methodology (stat.ME)
A core component of precision medicine research involves optimizing individualized treatment rules (ITRs) based on patient characteristics. Many studies used to estimate ITRs are longitudinal in nature, collecting outcomes over time. Yet, to date, methods developed to estimate ITRs often ignore the longitudinal structure of the data. Information available from the longitudinal nature of the data can be especially useful in mental health studies. Although treatment means might appear similar, understanding the trajectory of outcomes over time can reveal important differences between treatments and placebo effects. This longitudinal perspective is especially beneficial in mental health research, where subtle shifts in outcome patterns can hold significant implications. Despite numerous studies involving the collection of outcome data across various time points, most precision medicine methods used to develop ITRs overlook the information available from the longitudinal structure. The prevalence of missing data in such studies exacerbates the issue, as neglecting the longitudinal nature of the data can significantly impair the effectiveness of treatment rules. This paper develops a powerful longitudinal trajectory-based ITR construction method that incorporates baseline variables, via a single-index or biosignature, into the modeling of longitudinal outcomes. This trajectory-based ITR approach substantially minimizes the negative impact of missing data compared to more traditional ITR approaches. The approach is illustrated through simulation studies and a clinical trial for depression, contrasting it with more traditional ITRs that ignore longitudinal information.
- [3] arXiv:2405.09887 [pdf, ps, html, other]
Title: Quantization-based LHS for dependent inputs: application to sensitivity analysis of environmental models
Subjects: Methodology (stat.ME)
Numerical modeling is essential for comprehending intricate physical phenomena in different domains. To handle complexity, sensitivity analysis, particularly screening, is crucial for identifying influential input parameters. Kernel-based methods, such as the Hilbert Schmidt Independence Criterion (HSIC), are valuable for analyzing dependencies between inputs and outputs. Moreover, due to the computational expense of such models, metamodels (or surrogate models) are often unavoidable. Implementing metamodels and HSIC requires data from the original model, which leads to the need for space-filling designs. While existing methods like Latin Hypercube Sampling (LHS) are effective for independent variables, incorporating dependence is challenging. This paper introduces a novel LHS variant, Quantization-based LHS, which leverages Voronoi vector quantization to address correlated inputs. The method ensures comprehensive coverage of stratified variables, enhancing distribution across marginals. The paper outlines expectation estimators based on Quantization-based LHS in various dependency settings, demonstrating their unbiasedness. The method is applied on several models of growing complexities, first on simple examples to illustrate the theory, then on more complex environmental hydrological models, when the dependence is known or not, and with more and more interactive processes and factors. The last application is on the digital twin of a French vineyard catchment (Beaujolais region) to design a vegetative filter strip and reduce water, sediment and pesticide transfers from the fields to the river. Quantization-based LHS is used to compute HSIC measures and independence tests, demonstrating its usefulness, especially in the context of complex models.
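For reference, the classical Latin Hypercube Sampling that the abstract describes as effective for independent variables can be sketched as follows. This is a minimal illustration of standard LHS on the unit cube only; it is not the Quantization-based variant proposed in the paper, and the function name `latin_hypercube` is a hypothetical choice made here.

```python
import random


def latin_hypercube(n_samples, n_dims, rng=None):
    """Classical LHS on the unit cube [0, 1)^n_dims for independent inputs.

    Each dimension is cut into n_samples equal-width strata. Every stratum
    receives exactly one point per dimension, and strata are paired at
    random across dimensions.
    """
    if rng is None:
        rng = random.Random()
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples strata of [0, 1).
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffling the column randomizes how strata pair up across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into a list of n_samples points.
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


if __name__ == "__main__":
    for point in latin_hypercube(5, 2, random.Random(0)):
        print(point)
```

The stratification is per marginal, which is why dependence between inputs is not handled by this basic construction.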
- [4] arXiv:2405.09906 [pdf, ps, html, other]
Title: Process-based Inference for Spatial Energetics Using Bayesian Predictive Stacking
Comments: 38 pages, 13 figures
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
Rapid developments in streaming data technologies have enabled real-time monitoring of human activity that can deliver high-resolution data on health variables over trajectories or paths carved out by subjects as they conduct their daily physical activities. Wearable devices, such as wrist-worn sensors that monitor gross motor activity, have become prevalent and have kindled the emerging field of ``spatial energetics'' in environmental health sciences. We devise a Bayesian inferential framework for analyzing such data while accounting for information available on specific spatial coordinates comprising a trajectory or path using a Global Positioning System (GPS) device embedded within the wearable device. We offer full probabilistic inference with uncertainty quantification using spatial-temporal process models adapted for data generated from ``actigraph'' units as the subject traverses a path or trajectory in their daily routine. Anticipating the need for fast inference for mobile health data, we pursue exact inference using conjugate Bayesian models and employ predictive stacking to assimilate inference across these individual models. This circumvents issues with iterative estimation algorithms such as Markov chain Monte Carlo. We devise Bayesian predictive stacking in this context for models that treat time as discrete epochs and that treat time as continuous. We illustrate our methods with simulation experiments and analysis of data from the Physical Activity through Sustainable Transport Approaches (PASTA-LA) study conducted by the Fielding School of Public Health at the University of California, Los Angeles.
- [5] arXiv:2405.10026 [pdf, ps, other]
Title: The case for specifying the "ideal" target trial
Subjects: Methodology (stat.ME)
The target trial is an increasingly popular conceptual device for guiding the design and analysis of observational studies that seek to perform causal inference. As tends to occur with concepts like this, there is variability in how certain aspects of the approach are understood, which may lead to potentially consequential differences in how the approach is taught, implemented, and interpreted in practice. In this commentary, we provide a perspective on two of these aspects: how the target trial should be specified, and relatedly, how the target trial fits within a formal causal inference framework.
- [6] arXiv:2405.10036 [pdf, ps, html, other]
Title: Large-scale Data Integration using Matrix Denoising and Geometric Factor Matching
Subjects: Methodology (stat.ME)
Unsupervised integrative analysis of multiple data sources has become commonplace, and scalable algorithms are necessary to accommodate the ever-increasing availability of data. Only a few current methods have estimation speed as their focus, and those that do are only applicable to restricted data layouts such as different data types measured on the same observation units. We introduce a novel point of view on low-rank matrix integration, phrased as a graph estimation problem, which allows development of a method, large-scale Collective Matrix Factorization (lsCMF), that is able to integrate data in flexible layouts in a speedy fashion. It utilizes a matrix denoising framework for rank estimation and geometric properties of singular vectors to efficiently integrate data. The quick estimation speed of lsCMF, while retaining good estimation of data structure, is then demonstrated in simulation studies.
- [7] arXiv:2405.10067 [pdf, ps, html, other]
Title: Sparse and Orthogonal Low-rank Collective Matrix Factorization (solrCMF): Efficient data integration in flexible layouts
Subjects: Methodology (stat.ME)
Interest in unsupervised methods for joint analysis of heterogeneous data sources has risen in recent years. Low-rank latent factor models have proven to be an effective tool for data integration and have been extended to a large number of data source layouts. Of particular interest is the separation of variation present in data sources into shared and individual subspaces. In addition, interpretability of estimated latent factors is crucial to further understanding.
We present sparse and orthogonal low-rank Collective Matrix Factorization (solrCMF) to estimate low-rank latent factor models for flexible data layouts. These encompass traditional multi-view (one group, multiple data types) and multi-grid (multiple groups, multiple data types) layouts, as well as augmented layouts, which allow the inclusion of side information between data types or groups. In addition, solrCMF allows tensor-like layouts (repeated layers), estimates interpretable factors, and determines variation structure among factors and data sources.
Using a penalized optimization approach, we automatically separate variability into the globally and partially shared as well as individual components and estimate sparse representations of factors. To further increase interpretability of factors, we enforce orthogonality between them. Estimation is performed efficiently in a recent multi-block ADMM framework which we adapted to support embedded manifold constraints.
The performance of solrCMF is demonstrated in simulation studies and compares favorably to existing methods.
- [8] arXiv:2405.10302 [pdf, ps, html, other]
Title: Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by their underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, under which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement. Beyond illustrating the performance of our method through a real-world dataset, we also delve into the theoretical details. This includes establishing rigorous theoretical guarantees, coupled with finite sample bounds, regarding the coverage and width of our prediction intervals. Our approach excels in practical applications and is underpinned by a solid theoretical framework, ensuring its reliability and effectiveness across diverse contexts.
New submissions for Friday, 17 May 2024 (showing 8 of 8 entries)
- [9] arXiv:2405.09596 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)
Comments: 22 pages, 14 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on the use of LSTM, GRU networks, and even Transformer architectures for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions. It considers this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fréchet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours with 30 minutes of context. We demonstrate that this alternative works well enough to predict trajectories worldwide.
- [10] arXiv:2405.09989 (cross-list from stat.AP) [pdf, ps, html, other]
Title: A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics
Subjects: Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)
With the proliferation of screening tools for chemical testing, it is now possible to create vast databases of chemicals easily. However, rigorous statistical methodologies employed to analyse these databases are in their infancy, and further development to facilitate chemical discovery is imperative. In this paper, we present conditional Gaussian process models to predict ordinal outcomes from chemical experiments, where the inputs are chemical compounds. We implement the Tanimoto distance, a metric on the chemical space, within the covariance of the Gaussian processes to capture correlated effects in the chemical space. A novel aspect of our model is that the kernel contains a scaling parameter, a feature not previously examined in the literature, that controls the strength of the correlation between elements of the chemical space. Using molecular fingerprints, a numerical representation of a compound's location within the chemical space, we show that accounting for correlation amongst chemical compounds improves predictive performance over the uncorrelated model, where effects are assumed to be independent. Moreover, we present a genetic algorithm for the facilitation of chemical discovery and identification of important features to the compound's efficacy. A simulation study is conducted to demonstrate the suitability of the proposed methods. Our proposed methods are demonstrated on a hazard classification problem of organic solvents.
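To make the Tanimoto distance mentioned in the abstract concrete, the sketch below computes it for fingerprints represented as sets of "on" bit indices. The set representation is an assumption made here for simplicity. The function `scaled_tanimoto_kernel` is a hypothetical illustration of how a scaling parameter could enter a covariance; the abstract does not give the functional form of the paper's kernel, so this should not be read as the authors' model.

```python
import math


def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance, defined as one minus the Tanimoto similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


def scaled_tanimoto_kernel(fp_a, fp_b, scale=1.0):
    """Hypothetical covariance exp(-distance / scale).

    Illustrative only: larger `scale` means correlation decays more slowly
    with Tanimoto distance. The paper's actual kernel may differ.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return math.exp(-tanimoto_distance(fp_a, fp_b) / scale)


if __name__ == "__main__":
    x, y = {1, 2, 3}, {2, 3, 4}
    print(tanimoto_similarity(x, y), scaled_tanimoto_kernel(x, y, scale=0.5))
```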
Cross submissions for Friday, 17 May 2024 (showing 2 of 2 entries)
- [11] arXiv:2309.04047 (replaced) [pdf, ps, other]
Title: Fully Latent Principal Stratification With Measurement Models
Comments: In Submission
Subjects: Methodology (stat.ME)
There is wide agreement on the importance of implementation data from randomized effectiveness studies in behavioral science; however, there are few methods available to incorporate these data into causal models, especially when they are multivariate or longitudinal, and interest is in low-dimensional summaries. We introduce a framework for studying how treatment effects vary between subjects who implement an intervention differently, combining principal stratification with latent variable measurement models; since principal strata are latent in both treatment arms, we call it "fully-latent principal stratification" or FLPS. We describe FLPS models including item-response-theory measurement, show that they are feasible in a simulation study, and illustrate them in an analysis of hint usage from a randomized study of computerized mathematics tutors.
- [12] arXiv:2310.02278 (replaced) [pdf, ps, html, other]
Title: A Stable and Efficient Covariate-Balancing Estimator for Causal Survival Effects
Authors: Khiem Pham, David A. Hirshberg, Phuong-Mai Huynh-Pham, Michele Santacatterina, Ser-Nam Lim, Ramin Zabih
Comments: 32 pages, 5 figures
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
We propose an empirically stable and asymptotically efficient covariate-balancing approach to the problem of estimating survival causal effects in data with conditionally-independent censoring. This addresses a challenge often encountered in state-of-the-art nonparametric methods: the use of inverses of small estimated probabilities and the resulting amplification of estimation error. We validate our theoretical results in experiments on synthetic and semi-synthetic data.
- [13] arXiv:2310.11683 (replaced) [pdf, ps, html, other]
Title: Treatment bootstrapping: A new approach to quantify uncertainty of average treatment effect estimates
Subjects: Methodology (stat.ME); Applications (stat.AP)
This paper proposes a new non-parametric bootstrap method to quantify the uncertainty of the average treatment effect estimate for the treated from matching estimators. More specifically, it seeks to quantify the uncertainty associated with the average treatment effect estimate for the treated by bootstrapping the treatment group only and finding the counterpart control group by pair matching on the estimated propensity score without replacement. We demonstrate the validity of this approach and compare it with existing bootstrap approaches through Monte Carlo simulation and analysis of a real-world data set. The results indicate that the proposed approach constructs confidence intervals and standard errors that have a coverage rate of 95 percent or above and better precision compared with existing bootstrap approaches, while these measures also depend on the percentage treated in the sample data and the sample size.
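The procedure described in this abstract (resample the treated group only, then pair-match each replicate to controls on the propensity score without replacement) can be sketched roughly as below. Several choices are assumptions made here and are not taken from the paper: propensity scores are supplied already estimated, matching is greedy nearest-neighbour, and the interval is a simple percentile interval. All function names are hypothetical.

```python
import random


def att_pair_matching(treated, controls):
    """ATT from greedy 1:1 nearest-neighbour matching without replacement.

    Units are (propensity_score, outcome) pairs. Each treated unit takes the
    closest control that has not yet been used.
    """
    if len(controls) < len(treated):
        raise ValueError("need at least as many controls as treated units")
    available = list(controls)
    differences = []
    for ps_t, y_t in treated:
        # Index of the closest still-unused control on the propensity score.
        j = min(range(len(available)), key=lambda k: abs(available[k][0] - ps_t))
        _, y_c = available.pop(j)
        differences.append(y_t - y_c)
    return sum(differences) / len(differences)


def treatment_bootstrap(treated, controls, n_boot=200, rng=None):
    """Resample the treated group only and re-match against the full control pool."""
    if rng is None:
        rng = random.Random()
    estimates = []
    for _ in range(n_boot):
        resampled = [rng.choice(treated) for _ in treated]
        estimates.append(att_pair_matching(resampled, controls))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval from sorted bootstrap estimates."""
    ordered = sorted(estimates)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]
```

A call such as `percentile_interval(treatment_bootstrap(treated, controls))` would then give an interval for the ATT under these assumptions; how the paper forms its intervals and standard errors is not specified in the abstract.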
- [14] arXiv:2312.04077 (replaced) [pdf, ps, other]
Title: When is Plasmode simulation superior to parametric simulation when estimating the MSE of the least squares estimator in linear regression?
Authors: Marieke Stolte, Nicholas Schreck, Alla Slynko, Maral Saadati, Axel Benner, Jörg Rahnenführer, Andrea Bommert
Journal-ref: PLOS ONE (2024)
Subjects: Methodology (stat.ME); Computation (stat.CO)
Simulation is a crucial tool for the evaluation and comparison of statistical methods. How to design fair and neutral simulation studies is therefore of great interest for researchers developing new methods and practitioners confronted with the choice of the most suitable method. The term simulation usually refers to parametric simulation, that is, computer experiments using artificial data made up of pseudo-random numbers. Plasmode simulation, that is, computer experiments using the combination of resampling feature data from a real-life dataset and generating the target variable with a known user-selected outcome-generating model (OGM), is an alternative that is often claimed to produce more realistic data. We compare parametric and Plasmode simulation for the example of estimating the mean squared error (MSE) of the least squares estimator (LSE) in linear regression. If the true underlying data-generating process (DGP) and the OGM were known, parametric simulation would obviously be the best choice in terms of estimating the MSE well. However, in reality, both are usually unknown, so researchers have to make assumptions: in Plasmode simulation for the OGM, in parametric simulation for both DGP and OGM. Most likely, these assumptions do not exactly reflect the truth. Here, we aim to find out how assumptions deviating from the true DGP and the true OGM affect the performance of parametric and Plasmode simulations in the context of MSE estimation for the LSE and in which situations which simulation type is preferable. Our results suggest that the preferable simulation method depends on many factors, including the number of features, and on how and to what extent the assumptions of a parametric simulation differ from the true DGP. Also, the resampling strategy used for Plasmode influences the results. In particular, subsampling with a small sampling proportion can be recommended.
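The Plasmode idea as defined in this abstract (resample feature rows from a real dataset, then generate the target with a user-selected OGM) can be illustrated with a short sketch. The linear OGM with Gaussian noise, the function name `plasmode_dataset`, and its parameters are assumptions chosen here for illustration; the `replace` flag switches between bootstrap-style resampling and subsampling without replacement.

```python
import random


def plasmode_dataset(real_features, coefficients, noise_sd, n_rows,
                     replace=True, rng=None):
    """Build one Plasmode dataset.

    real_features: list of feature tuples taken from a real dataset.
    coefficients:  weights of an assumed linear outcome-generating model (OGM).
    noise_sd:      standard deviation of the Gaussian noise added by the OGM.
    replace:       True resamples rows with replacement; False draws a
                   subsample of distinct rows (n_rows <= len(real_features)).
    """
    if rng is None:
        rng = random.Random()
    if replace:
        rows = [rng.choice(real_features) for _ in range(n_rows)]
    else:
        rows = rng.sample(list(real_features), n_rows)
    # The features stay real; only the outcome is simulated from the known OGM.
    outcomes = [
        sum(b * x for b, x in zip(coefficients, row)) + rng.gauss(0.0, noise_sd)
        for row in rows
    ]
    return rows, outcomes
```

Because the OGM is known, the true coefficients are available as ground truth when evaluating an estimator on the generated data.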
- [15] arXiv:2312.12641 (replaced) [pdf, ps, html, other]
Title: Robust Point Matching with Distance Profiles
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
While matching procedures based on pairwise distances are conceptually appealing and thus favored in practice, theoretical guarantees for such procedures are rarely found in the literature. We propose and analyze matching procedures based on distance profiles that are easily implementable in practice, showing these procedures are robust to outliers and noise. We demonstrate the performance of the proposed method using a real data example and provide simulation studies to complement the theoretical findings.
- [16] arXiv:2404.04122 (replaced) [pdf, ps, html, other]
Title: Hidden Markov Models for Multivariate Panel Data
Subjects: Methodology (stat.ME)
While advances continue to be made in model-based clustering, challenges persist in modeling various data types such as panel data. Multivariate panel data present difficulties for clustering algorithms due to the unique correlation structure, a consequence of taking observations on several subjects over multiple time points. Additionally, panel data are often plagued by missing data and dropouts, presenting issues for estimation algorithms. This research presents a family of hidden Markov models that compensate for the unique correlation structures that arise in panel data. A modified expectation-maximization algorithm capable of handling missing not at random data and dropout is presented and used to perform model estimation.
- [17] arXiv:2405.05389 (replaced) [pdf, ps, html, other]
Title: On foundation of generative statistics with F-entropy: a gradient-based approach
Comments: 29 pages
Subjects: Methodology (stat.ME)
This paper explores the interplay between statistics and generative artificial intelligence. Generative statistics, an integral part of the latter, aims to construct models that can {\it generate} efficiently and meaningfully new data across the whole of the (usually high dimensional) sample space, e.g. a new photo. Within it, the gradient-based approach is a current favourite that exploits effectively, for the above purpose, the information contained in the observed sample, e.g. an old photo. However, often there are missing data in the observed sample, e.g. missing bits in the old photo. To handle this situation, we have proposed a gradient-based algorithm for generative modelling. More importantly, our paper underpins rigorously this powerful approach by introducing a new F-entropy that is related to Fisher's divergence. (The F-entropy is also of independent interest.) The underpinning has enabled the gradient-based approach to expand its scope. For example, it can now provide a tool for generative model selection. Possible future projects include discrete data and Bayesian variational inference.
- [18] arXiv:2405.09149 (replaced) [pdf, ps, html, other]
Title: Exploring uniformity and maximum entropy distribution on torus through intrinsic geometry: Application to protein-chemistry
Comments: arXiv admin note: text overlap with arXiv:2304.01599
Subjects: Methodology (stat.ME)
A generic family of distributions defined on the surface of a curved torus is introduced using its area element. The area uniformity and the maximum entropy distribution are identified using the trigonometric moments of the proposed family. A marginal distribution is obtained as a three-parameter modification of the von Mises distribution that encompasses the von Mises, Cardioid, and Uniform distributions as special cases. The proposed family of marginal distributions exhibits symmetric or asymmetric and unimodal or bimodal shapes, depending on the parameters. Furthermore, we scrutinize a two-parameter symmetric submodel, examining its moments, measure of variation, Kullback-Leibler divergence, and maximum likelihood estimation, among other properties. In addition, we introduce a modified acceptance-rejection sampling scheme with a thin envelope obtained from the upper Riemann sum of a circular density, achieving a high acceptance rate. This sampling scheme will accelerate large-scale simulation studies by reducing processing time. Furthermore, we extend the Uniform, Wrapped Cauchy, and Kato-Jones distributions to the surface of the curved torus and implement the proposed bivariate toroidal distribution for different groups of protein data, namely, $\alpha$-helix, $\beta$-sheet, and their mixture. A marginal of this proposed distribution is fitted to wind direction data.
- [19] arXiv:2203.15945 (replaced) [pdf, ps, html, other]
Title: A Framework for Improving the Reliability of Black-box Variational Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In this paper, we propose Robust and Automated Black-box VI (RABVI), a framework for improving the reliability of BBVI optimization. RABVI is based on rigorously justified automation techniques, includes just a small number of intuitive tuning parameters, and detects inaccurate estimates of the optimal variational approximation. RABVI adaptively decreases the learning rate by detecting convergence of the fixed--learning-rate iterates, then estimates the symmetrized Kullback--Leibler (KL) divergence between the current variational approximation and the optimal one. It also employs a novel optimization termination criterion that enables the user to balance desired accuracy against computational cost by comparing (i) the predicted relative decrease in the symmetrized KL divergence if a smaller learning rate were used and (ii) the predicted computation required to converge with the smaller learning rate. We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples.
- [20] arXiv:2209.09936 (replaced) [pdf, ps, html, other]
Title: Solving Fredholm Integral Equations of the First Kind via Wasserstein Gradient Flows
Comments: Accepted for publication in Stochastic Processes and their Applications. In the journal version we erroneously state that convergence to the unregularized functional requires stronger assumptions on the kernel $k$ than those considered here; in fact, this is not the case and one can apply [27, Theorem 4.1] or [82, Theorem 1] to obtain this result under A1 and A3
Subjects: Optimization and Control (math.OC); Functional Analysis (math.FA); Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)
Solving Fredholm equations of the first kind is crucial in many areas of the applied sciences. In this work we adopt a probabilistic and variational point of view by considering a minimization problem in the space of probability measures with an entropic regularization. Contrary to classical approaches which discretize the domain of the solutions, we introduce an algorithm to asymptotically sample from the unique solution of the regularized minimization problem. As a result our estimators do not depend on any underlying grid and have better scalability properties than most existing methods. Our algorithm is based on a particle approximation of the solution of a McKean--Vlasov stochastic differential equation associated with the Wasserstein gradient flow of our variational formulation. We prove the convergence towards a minimizer and provide practical guidelines for its numerical implementation. Finally, our method is compared with other approaches on several examples including density deconvolution and epidemiology.
- [21] arXiv:2305.04937 (replaced) [pdf, ps, html, other]
Title: Randomly sampling bipartite networks with fixed degree sequences
Subjects: Numerical Analysis (math.NA); Methodology (stat.ME)
Statistical analysis of bipartite networks frequently requires randomly sampling from the set of all bipartite networks with the same degree sequence as an observed network. Trade algorithms offer an efficient way to generate samples of bipartite networks by incrementally `trading' the positions of some of their edges. However, it is difficult to know how many such trades are required to ensure that the sample is random. I propose a stopping rule that focuses on the distance between sampled networks and the observed network, and stops performing trades when this distribution stabilizes. Analyses demonstrate that, for over 650 different degree sequences, using this stopping rule ensures a random sample with a high probability, and that it is practical for use in empirical applications.
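One simple kind of trade that preserves both degree sequences is the checkerboard swap, sketched below on a bipartite network stored as a set of (top, bottom) edges. This is an illustrative example of a degree-preserving trade, not necessarily the trade algorithm studied in the paper. The stopping rule proposed in the abstract is not implemented: the hypothetical `sample_network` helper simply runs a fixed, user-chosen number of attempted trades.

```python
import random
from collections import Counter


def checkerboard_trade(edges, rng):
    """Attempt one degree-preserving trade in place; return True if it was applied.

    Two edges (a, b) and (c, d) are replaced by (a, d) and (c, b), provided
    neither new edge already exists. Node labels must be sortable.
    """
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if (a, d) in edges or (c, b) in edges:
        # Also covers a == c or b == d, where the trade would change nothing.
        return False
    edges.difference_update({(a, b), (c, d)})
    edges.update({(a, d), (c, b)})
    return True


def degree_sequences(edges):
    """Degree counts for the top nodes and the bottom nodes."""
    top = Counter(u for u, _ in edges)
    bottom = Counter(v for _, v in edges)
    return top, bottom


def sample_network(edges, n_trades, rng=None):
    """Copy the observed network and apply n_trades attempted trades to the copy."""
    if rng is None:
        rng = random.Random()
    current = set(edges)
    for _ in range(n_trades):
        checkerboard_trade(current, rng)
    return current
```

Every attempted trade leaves each node's degree unchanged, so any returned network shares the observed degree sequences; how many trades are needed for the sample to be random is exactly the question the abstract addresses.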
- [22] arXiv:2404.17483 (replaced) [pdf, ps, html, other]
Title: Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation
Comments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI2024). 14 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
There is a growing interest in estimating heterogeneous treatment effects across individuals using their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because in this setup, it is usual that some features induce sample selection bias while others do not but are predictive of potential outcomes. To avoid losing such predictive feature information, existing methods learn separate feature representations using inverse probability weighting (IPW). However, due to their numerically unstable IPW weights, these methods suffer from estimation bias under a finite sample setup. To develop a numerically robust estimator by weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Our experimental results show that by effectively correcting the weight values, our proposed method outperforms the existing ones, including traditional weighting schemes. Our code is available at this https URL.
- [23] arXiv:2405.07910 (replaced) [pdf, ps, other]
Title: A Unification of Exchangeability and Continuous Exposure and Confounder Measurement Errors: Probabilistic Exchangeability
Comments: Submitted for peer-reviewed publication
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Exchangeability concerning a continuous exposure, X, implies no confounding bias when identifying average exposure effects of X, AEE(X). When X is measured with error (Xep), two challenges arise in identifying AEE(X). Firstly, exchangeability regarding Xep does not equal exchangeability regarding X. Secondly, the non-differential error assumption (NDEA) could be overly stringent in practice. To address them, this article proposes unifying exchangeability and exposure and confounder measurement errors with three novel concepts. The first, Probabilistic Exchangeability (PE), states that the outcomes of those with Xep=e are probabilistically exchangeable with the outcomes of those truly exposed to X=eT. The relationship between AEE(Xep) and AEE(X) in risk difference and ratio scales is mathematically expressed as a probabilistic certainty, termed exchangeability probability (Pe). Squared Pe ($Pe^2$) quantifies the extent to which AEE(Xep) differs from AEE(X) due to exposure measurement error through mechanisms not akin to confounding mechanisms. The coefficient of determination ($R^2$) in the regression of Xep against X may sometimes be sufficient to measure $Pe^2$. The second concept, Emergent Pseudo Confounding (EPC), describes the bias introduced by exposure measurement error through mechanisms akin to confounding mechanisms. PE requires controlling for EPC, which is weaker than NDEA. The third, Emergent Confounding, describes when bias due to confounder measurement error arises. Adjustment for E(P)C can be performed like confounding adjustment. This paper provides maximum insight into when AEE(Xep) is an appropriate surrogate of AEE(X) and how to measure the difference between these two. Differential errors could be addressed and may not compromise causal inference.