Title: | Estimate Sample Sizes for Group Comparisons with Skewed Distributions |
---|---|
Description: | Estimate necessary sample sizes for comparing the location of data from two groups or categories when the distribution of the data is skewed. The package offers a non-parametric method for a Wilcoxon Mann-Whitney test of location shift as well as methods for several generalized linear models, for instance, Gamma regression. |
Authors: | Johannes Brachem [cre, aut], Dominik Strache [aut] |
Maintainer: | Johannes Brachem <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2025-02-14 03:40:29 UTC |
Source: | https://github.com/jobrachem/skewsamp |
Empirical probability density function based on a sample of observations, as described by Chakraborti, Hong, & van de Wiel (2006).
demp(x, sample)
x | numeric vector of values to evaluate |
sample | numeric vector of sample values to base the EPDF on |
numeric vector of density values based on the EPDF
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193
x <- 1:5
demp(1, x)
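The page does not spell out how the EPDF is computed. One plausible reading, assumed here, is that it places a probability mass of 1/n on each of the n sample values, with tied values accumulating mass. The hypothetical `demp_sketch()` below illustrates that reading in base R. It is not the skewsamp source and may differ from `demp()`.

```r
# Hypothetical sketch (assumption, not skewsamp code): empirical probability
# mass at each value of x, with mass 1/n on every sample value (ties add up).
demp_sketch <- function(x, sample) {
  vapply(x, function(xi) mean(sample == xi), numeric(1))
}

demp_sketch(1, 1:5)               # one observation out of five
demp_sketch(c(1, 6), c(1, 1, 2))  # tied values accumulate mass; unseen values get 0
```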
Estimation of required sample size as given by Cundill & Alexander (2015).
n_binom( p0, effect, size = 1, alpha = 0.05, power = 0.9, q = 0.5, link = c("logit", "identity"), two_sided = TRUE )
p0 | probability of success in group 0 (control group) |
effect | Effect size, |
size | number of trials (greater than zero) |
alpha | Type I error rate |
power | 1 - Type II error rate |
q | Proportion of observations allocated to the control group |
link | Link function to use. Currently implemented: 'logit' and 'identity' |
two_sided | logical, if |
Returns an object of class "sample_size". It contains the following components:
N | the total sample size |
n0 | sample size in Group 0 (control group) |
n1 | sample size in Group 1 (treatment group) |
two_sided | logical, |
alpha | type I error rate used in sample size estimation |
power | target power used in sample size estimation |
effect | effect size used in sample size estimation |
effect_type | short description of the type of effect size |
comment | additional comment, if there is any |
call | the matched call. |
Cundill, B., & Alexander, N. D. E. (2015). Sample size calculations for skewed distributions. BMC Medical Research Methodology, 15(1), 1–9. https://doi.org/10.1186/s12874-015-0023-0
n_binom(p0 = 0.5, effect = 0.3)
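The sketch below shows the kind of calculation involved in a logit-link comparison of two binomial proportions: a two-sided Wald test on the log odds ratio, using delta-method variances. It is an illustration under stated assumptions, not the skewsamp implementation. The hypothetical `n_logit_sketch()` takes both success probabilities directly rather than interpreting `effect`, evaluates the variance under the alternative only, and returns an unrounded total N.

```r
# Hypothetical sketch, not skewsamp code: total sample size for comparing two
# binomial proportions on the logit scale with a two-sided Wald test.
n_logit_sketch <- function(p0, p1, size = 1, alpha = 0.05, power = 0.9, q = 0.5) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)   # critical value plus power quantile
  log_or <- qlogis(p1) - qlogis(p0)          # effect on the link (logit) scale
  # Delta method: the logit of a proportion based on m trials has a variance
  # of about 1 / (m * p * (1 - p)); group 0 receives the share q of N.
  v <- 1 / (q * size * p0 * (1 - p0)) + 1 / ((1 - q) * size * p1 * (1 - p1))
  z^2 * v / log_or^2                         # unrounded total N
}

n_logit_sketch(p0 = 0.5, p1 = 0.3)
```

In practice the group sizes `q * N` and `(1 - q) * N` would be rounded up to whole numbers.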
Estimation of required sample size as given by Cundill & Alexander (2015).
n_gamma( mean0, effect, shape0, shape1 = shape0, alpha = 0.05, power = 0.9, q = 0.5, link = c("log", "identity"), two_sided = TRUE )
mean0 | Mean in control group |
effect | Effect size, |
shape0 | Shape parameter in control group |
shape1 | Shape parameter in treatment group. Defaults to shape0 |
alpha | Type I error rate |
power | 1 - Type II error rate |
q | Proportion of observations allocated to the control group |
link | Link function to use. Currently implemented: 'log' and 'identity' |
two_sided | logical, if |
Returns an object of class "sample_size". It contains the following components:
N | the total sample size |
n0 | sample size in Group 0 (control group) |
n1 | sample size in Group 1 (treatment group) |
two_sided | logical, |
alpha | type I error rate used in sample size estimation |
power | target power used in sample size estimation |
effect | effect size used in sample size estimation |
effect_type | short description of the type of effect size |
comment | additional comment, if there is any |
call | the matched call. |
Cundill, B., & Alexander, N. D. E. (2015). Sample size calculations for skewed distributions. BMC Medical Research Methodology, 15(1), 1–9. https://doi.org/10.1186/s12874-015-0023-0
n_gamma(mean0 = 8.46, effect = 0.7, shape0 = 0.639, alpha = 0.05, power = 0.9)
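For a Gamma outcome with shape k, the variance is mu^2 / k. On the log scale, each observation therefore contributes a variance of roughly 1 / k, regardless of the mean. The hypothetical `n_gamma_log_sketch()` uses this fact in a Wald-type formula, assuming the effect is expressed as the ratio mean1 / mean0. It is a simplified illustration and may not reproduce the output of `n_gamma()`.

```r
# Hypothetical sketch: Gamma outcome, log link, effect given as the ratio
# mean1 / mean0. With Var(Y) = mu^2 / shape, the delta method gives a
# per-observation variance of 1 / shape for the log of a group mean.
n_gamma_log_sketch <- function(ratio, shape0, shape1 = shape0,
                               alpha = 0.05, power = 0.9, q = 0.5) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  v <- 1 / (q * shape0) + 1 / ((1 - q) * shape1)
  z^2 * v / log(ratio)^2                     # unrounded total N
}

n_gamma_log_sketch(ratio = 0.7, shape0 = 0.639)
```

Because the log-scale variance does not involve the mean, `mean0` drops out of this particular sketch. Under the identity link the variance would be mu^2 / k, so the mean would matter.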
Estimation of required sample size as given by Cundill & Alexander (2015).
n_glm( mean0, mean1, dispersion0, dispersion1, alpha, power, link_fun = function(mu) NULL, variance_fun = function(mu, dispersion) NULL, dmu_deta_fun = function(mu) NULL, q )
mean0 | Mean in control group |
mean1 | Mean in treatment group |
dispersion0 | Dispersion parameter in control group |
dispersion1 | Dispersion parameter in treatment group |
alpha | Type I error rate |
power | 1 - Type II error rate |
link_fun | function object, the link function, applied to the mean (mu) |
variance_fun | function object, function for computing the variance based on a mean and a dispersion parameter |
dmu_deta_fun | function object, derivative of the mean with respect to the linear predictor (dmu/deta), as a function of mu |
q | Number between 0 and 1, the proportion of observations allocated to the control group |
Total sample size (numeric)
Cundill, B., & Alexander, N. D. E. (2015). Sample size calculations for skewed distributions. BMC Medical Research Methodology, 15(1), 1–9. https://doi.org/10.1186/s12874-015-0023-0
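The argument list of `n_glm()` suggests a generic recipe. First, map both means to the link scale. Next, convert each group's variance to that scale using the derivative dmu/deta. Finally, plug both into a normal-approximation formula. The hypothetical `n_glm_sketch()` follows this recipe with a standard Wald/delta-method approximation. Its assumptions do not come from the package: the variance is evaluated under the alternative, the test is two-sided, and N is left unrounded. In this sketch, all three function arguments must be supplied.

```r
# Hypothetical sketch of a generic two-group GLM sample size calculation
# (Wald test, delta method, variance under the alternative). Argument names
# mirror n_glm(), but the internals are an assumption.
n_glm_sketch <- function(mean0, mean1, dispersion0, dispersion1,
                         alpha = 0.05, power = 0.9,
                         link_fun, variance_fun, dmu_deta_fun, q = 0.5) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  # Per-observation variance on the linear-predictor scale: V(mu) / (dmu/deta)^2
  v0 <- variance_fun(mean0, dispersion0) / dmu_deta_fun(mean0)^2
  v1 <- variance_fun(mean1, dispersion1) / dmu_deta_fun(mean1)^2
  diff_eta <- link_fun(mean1) - link_fun(mean0)   # effect on the link scale
  z^2 * (v0 / q + v1 / (1 - q)) / diff_eta^2       # unrounded total N
}

# Usage: Poisson outcome with log link (the dispersion is ignored by V(mu) = mu)
n_glm_sketch(mean0 = 5, mean1 = 6.5, dispersion0 = 1, dispersion1 = 1,
             link_fun = log,
             variance_fun = function(mu, dispersion) mu,
             dmu_deta_fun = function(mu) mu)
```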
Estimation as described by Chakraborti, Hong, & van de Wiel (2006).
n_locshift(s1, s2, delta, alpha = 0.05, power = 0.9, q = 0.5)
s1, s2 | pilot samples |
delta | numeric value, location shift parameter |
alpha | Type I error probability |
power | 1 - Type II error probability, the desired statistical power |
q | size of group 0 relative to the total sample size |
WARNING: The estimate has high variability because it depends on the pilot samples. The smaller the pilot samples, the more uncertain the estimate of the required sample size. In a simulation study, we found that the method may also be inaccurate on average, depending on the data investigated.
Returns an object of class "sample_size". It contains the following components:
N | the total sample size |
n0 | sample size in Group 0 (control group) |
n1 | sample size in Group 1 (treatment group) |
two_sided | logical, |
alpha | type I error rate used in sample size estimation |
power | target power used in sample size estimation |
effect | effect size used in sample size estimation |
effect_type | short description of the type of effect size |
comment | additional comment, if there is any |
call | the matched call. |
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193
n_locshift(s1 = rexp(10), s2 = rexp(10), alpha = 0.05, power = 0.9, delta = 0.35)
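To illustrate how a pilot sample feeds into a Wilcoxon-Mann-Whitney sample size, the sketch below uses a simpler, well-known technique instead: Noether's (1987) approximation N = (z_{1-alpha/2} + z_{1-beta})^2 / (12 q (1 - q) (p - 1/2)^2), where p = P(X < Y). The hypothetical `n_noether_sketch()` estimates p from a single pilot sample, assuming that group 1 is group 0 shifted by `delta`. This is not the Chakraborti, Hong, & van de Wiel (2006) procedure and will generally give different numbers.

```r
# Swapped-in technique (Noether 1987), not the Chakraborti et al. (2006) method.
# p = P(X < Y) is estimated from one pilot sample, assuming Y = X + delta.
n_noether_sketch <- function(pilot, delta, alpha = 0.05, power = 0.9, q = 0.5) {
  shifted <- pilot + delta
  # Mann-Whitney-type estimate over all pairs, ties counted as 1/2
  cmp <- outer(pilot, shifted, "<") + 0.5 * outer(pilot, shifted, "==")
  diag(cmp) <- NA                          # skip comparing a value with its own shift
  p <- mean(cmp, na.rm = TRUE)
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  z^2 / (12 * q * (1 - q) * (p - 0.5)^2)   # unrounded total N
}

set.seed(1)
pilot <- rexp(10)
n_noether_sketch(pilot, delta = 0.35)
```

With `delta = 0` the estimated p is exactly 1/2 and N is infinite, which matches the intuition that no sample size can detect a zero shift.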
Based on the procedure described by Chakraborti, Hong, & van de Wiel (2006).
n_locshift_bound( s1, s2, delta, alpha = 0.05, power = 0.9, quantile = 0.9, n_resamples = 500, q = 0.5 )
s1, s2 | Pilot samples |
delta | numeric value, location shift parameter |
alpha | Type I error probability |
power | 1 - Type II error probability, the desired statistical power |
quantile | Quantile to use as the upper bound. |
n_resamples | number of resamples to use in bootstrapping |
q | size of group 0 relative to the total sample size. |
WARNING: The underlying estimate has high variability because it depends on the pilot samples. The smaller the pilot samples, the more uncertain the estimate of the required sample size. In a simulation study, we found that the underlying method may also be inaccurate on average, depending on the data investigated.
Returns an object of class "sample_size". It contains the following components:
n | the total sample size |
n0 | sample size in Group 0 (control group) |
n1 | sample size in Group 1 (treatment group) |
two_sided | logical, |
alpha | type I error rate used in sample size estimation |
power | target power used in sample size estimation |
effect | effect size used in sample size estimation |
effect_type | short description of the type of effect size |
comment | additional comment, if there is any |
call | the matched call. |
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193
## Not run:
n_locshift_bound(s1 = rexp(10), s2 = rexp(10), delta = 0.35, alpha = 0.05, power = 0.9, n_resamples = 5)
## End(Not run)
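The arguments `quantile` and `n_resamples` suggest that the bound is taken from a bootstrap distribution of sample size estimates. The hypothetical `n_bound_sketch()` shows that idea, under this assumption. It resamples both pilot samples with replacement, re-estimates N each time with a user-supplied estimator, and reports an upper quantile. The estimator `noether_n()` used here is also a stand-in: Noether's approximation with P(X < X' + delta) estimated from the pooled pilot values. It is not the package's method.

```r
# Hypothetical sketch of a bootstrap upper bound for N (an assumption about
# what n_locshift_bound() does, not its source code).
n_bound_sketch <- function(s1, s2, estimate_n, prob = 0.9, n_resamples = 500) {
  estimates <- replicate(n_resamples, {
    b1 <- s1[sample.int(length(s1), replace = TRUE)]   # bootstrap copy of s1
    b2 <- s2[sample.int(length(s2), replace = TRUE)]   # bootstrap copy of s2
    estimate_n(b1, b2)
  })
  quantile(estimates, probs = prob, names = FALSE)      # upper quantile as the bound
}

# Stand-in estimator (hypothetical): Noether's approximation, with
# P(X < X' + delta) estimated from the pooled pilot values.
noether_n <- function(a, b, delta = 0.35, alpha = 0.05, power = 0.9, q = 0.5) {
  pooled <- c(a, b)
  cmp <- outer(pooled, pooled + delta, "<")
  diag(cmp) <- NA                                       # drop self-comparisons
  p <- mean(cmp, na.rm = TRUE)
  (qnorm(1 - alpha / 2) + qnorm(power))^2 / (12 * q * (1 - q) * (p - 0.5)^2)
}

set.seed(42)
n_bound_sketch(rexp(10), rexp(10), noether_n, n_resamples = 50)
```

The argument is named `prob` rather than `quantile` so that it does not visually clash with the base function `quantile()`.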
Estimation of required sample size as given by Cundill & Alexander (2015).
n_negbinom( mean0, effect, dispersion0, dispersion1 = dispersion0, alpha = 0.05, power = 0.9, q = 0.5, link = c("log", "identity"), two_sided = TRUE )
mean0 | Mean in control group |
effect | Effect size, |
dispersion0 | Dispersion parameter in control group |
dispersion1 | Dispersion parameter in treatment group. Defaults to dispersion0 |
alpha | Type I error rate |
power | 1 - Type II error rate |
q | Proportion of observations allocated to the control group |
link | Link function to use. Currently implemented: 'log' and 'identity' |
two_sided | logical, if |
Returns an object of class "sample_size". It contains the following components:
N | the total sample size |
n0 | sample size in Group 0 (control group) |
n1 | sample size in Group 1 (treatment group) |
two_sided | logical, |
alpha | type I error rate used in sample size estimation |
power | target power used in sample size estimation |
effect | effect size used in sample size estimation |
effect_type | short description of the type of effect size |
comment | additional comment, if there is any |
call | the matched call. |
Cundill, B., & Alexander, N. D. E. (2015). Sample size calculations for skewed distributions. BMC Medical Research Methodology, 15(1), 1–9. https://doi.org/10.1186/s12874-015-0023-0
n_negbinom(mean0 = 71.4, effect = 0.7, dispersion0 = 0.33, alpha = 0.05, power = 0.9)
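This sketch makes three assumptions: a log link, an effect expressed as the ratio mean1 / mean0, and the parametrisation Var(Y) = mu + dispersion * mu^2. The package may define the dispersion differently, for example as its reciprocal. Under these assumptions, the log-scale variance per observation is 1 / mu + dispersion. The hypothetical `n_negbinom_log_sketch()` plugs this into a Wald-type formula. Setting the dispersion to zero recovers the Poisson case.

```r
# Hypothetical sketch, assuming Var(Y) = mu + dispersion * mu^2, a log link,
# and an effect given as the ratio mean1 / mean0 (skewsamp may parametrise
# the dispersion differently).
n_negbinom_log_sketch <- function(mean0, ratio, dispersion0, dispersion1 = dispersion0,
                                  alpha = 0.05, power = 0.9, q = 0.5) {
  mean1 <- mean0 * ratio
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  # Delta method on the log scale: V(mu) / mu^2 = 1 / mu + dispersion
  v0 <- 1 / mean0 + dispersion0
  v1 <- 1 / mean1 + dispersion1
  z^2 * (v0 / q + v1 / (1 - q)) / log(ratio)^2   # unrounded total N
}

n_negbinom_log_sketch(mean0 = 71.4, ratio = 0.7, dispersion0 = 0.33)
```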
Estimation of required sample size as given by Cundill & Alexander (2015).
n_poisson( mean0, effect, alpha = 0.05, power = 0.9, q = 0.5, link = c("log", "identity"), two_sided = TRUE )
mean0 | Mean in control group |
effect | Effect size, |
alpha | Type I error rate |
power | 1 - Type II error rate |
q | Proportion of observations allocated to the control group |
link | Link function to use. Currently implemented: 'log' and 'identity' |
two_sided | logical, if |
Returns an object of class "sample_size". It contains the following components:
N | the total sample size |
n0 | sample size in Group 0 (control group) |
n1 | sample size in Group 1 (treatment group) |
two_sided | logical, |
alpha | type I error rate used in sample size estimation |
power | target power used in sample size estimation |
effect | effect size used in sample size estimation |
effect_type | short description of the type of effect size |
comment | additional comment, if there is any |
call | the matched call. |
Cundill, B., & Alexander, N. D. E. (2015). Sample size calculations for skewed distributions. BMC Medical Research Methodology, 15(1), 1–9. https://doi.org/10.1186/s12874-015-0023-0
n_poisson(mean0 = 5, effect = 0.3)
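The `link` argument changes what the effect measures. Under the identity link, the natural effect is a difference in means. Here V(mu) = mu and dmu/deta = 1, so each group's variance contribution is simply its mean. The hypothetical `n_poisson_identity_sketch()` illustrates this case with a Wald-type approximation, taking the difference mean1 - mean0 as its own argument. It is not the package's code.

```r
# Hypothetical sketch: Poisson outcome with the identity link, effect given as
# the difference mean1 - mean0. With V(mu) = mu and dmu/deta = 1, each group's
# variance contribution is its mean.
n_poisson_identity_sketch <- function(mean0, difference,
                                      alpha = 0.05, power = 0.9, q = 0.5) {
  mean1 <- mean0 + difference
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  z^2 * (mean0 / q + mean1 / (1 - q)) / difference^2   # unrounded total N
}

n_poisson_identity_sketch(mean0 = 5, difference = 1.5)
```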
Empirical cumulative distribution function based on a sample of observations, as described by Chakraborti, Hong, & van de Wiel (2006).
pemp(q, sample)
q | numeric vector of values to evaluate |
sample | numeric vector of sample values to base the ECDF on |
Returns the probabilities that a value drawn at random from the empirical cumulative distribution based on sample is smaller than or equal to the elements of q.
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193
x <- 1:5
pemp(1, x)
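An empirical CDF evaluated at q is simply the share of sample values that are smaller than or equal to q. The hypothetical `pemp_sketch()` below illustrates this in base R and compares the result with `stats::ecdf()`. It is not the skewsamp source, although it follows the return value described above.

```r
# Minimal sketch of an empirical CDF (an illustration, not skewsamp's code):
# the share of sample values that are smaller than or equal to each q.
pemp_sketch <- function(q, sample) {
  vapply(q, function(qi) mean(sample <= qi), numeric(1))
}

x <- 1:5
pemp_sketch(c(0, 1, 2.5, 5), x)   # 0.0 0.2 0.4 1.0
ecdf(x)(c(0, 1, 2.5, 5))          # base R's step function gives the same values
```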
Empirical quantile function, i.e. the inverse of the empirical cumulative distribution function pemp(). Based on the latter function as presented by Chakraborti, Hong, & van de Wiel (2006).
qemp(p, sample)
p | probability, can be a vector |
sample | numeric vector of sample values to base the ECDF on |
Returns the value x for which pemp(x, sample) = p, i.e. the value x such that the probability that a value drawn at random from the ECDF is smaller than or equal to x equals p.
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193
x <- 1:5
qemp(0.1, x)
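Because an ECDF is a step function, its inverse is usually defined as the smallest sample value x with ECDF(x) >= p. The hypothetical `qemp_sketch()` implements that definition and compares it with base R's `quantile(type = 1)`, which computes the same inverse. Whether `qemp()` uses exactly this convention is an assumption.

```r
# Minimal sketch of the inverse empirical CDF (an illustration, not skewsamp's
# code): the smallest sample value x with ECDF(x) >= p.
qemp_sketch <- function(p, sample) {
  s <- sort(sample)
  s[pmax(1, ceiling(length(s) * p))]   # p = 0 maps to the smallest value
}

qemp_sketch(c(0.1, 0.5, 1), 1:5)                         # 1 3 5
quantile(1:5, c(0.1, 0.5, 1), type = 1, names = FALSE)  # same values
```

The product `length(s) * p` is computed in floating point, so probabilities that lie exactly on a step (k / n) can be sensitive to rounding. R's `quantile()` adds a small tolerance for this case.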
Draws random values based on the empirical cumulative distribution function of a sample, as presented by Chakraborti, Hong, & van de Wiel (2006).
remp(n, sample)
n | integer, number of random values to draw |
sample | numeric vector of sample values to base the ECDF on |
numeric vector of random values drawn from the ECDF
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193
x <- 1:5
remp(10, x)
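Drawing from the ECDF of a sample amounts to picking sample values uniformly at random with replacement. This is the resampling step of the nonparametric bootstrap. The hypothetical `remp_sketch()` illustrates this in base R. Whether `remp()` works exactly this way is an assumption.

```r
# Minimal sketch (not skewsamp's code): draws from the ECDF of a sample are
# uniform draws, with replacement, from the sample values themselves.
remp_sketch <- function(n, sample) {
  # Index via sample.int() so that a length-one sample is not expanded to 1:value
  sample[sample.int(length(sample), n, replace = TRUE)]
}

set.seed(123)
remp_sketch(10, 1:5)
```

Indexing through `sample.int()` avoids a base R pitfall: `sample(v, n, replace = TRUE)` treats a length-one numeric `v` as the range `1:v`.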
Estimation of sample sizes based on pilot samples resampled from the empirical cumulative distribution function. Based on the work of Chakraborti, Hong, & van de Wiel (2006).
resample_n_locshift( s1, s2, delta, alpha = 0.05, power = 0.9, n_resamples = 500, q = 0.5 )
s1, s2 | Pilot samples |
delta | numeric value, location shift parameter |
alpha | Type I error probability |
power | 1 - Type II error probability, the desired statistical power |
n_resamples | number of resamples to use in bootstrapping |
q | size of group 0 relative to the total sample size. |
WARNING: The estimate has high variability because it depends on the pilot samples. The smaller the pilot samples, the more uncertain the estimate of the required sample size. In a simulation study, we found that the method may also be inaccurate on average, depending on the data investigated.
numeric vector of sample size estimates (total sample size)
Chakraborti, S., Hong, B., & van de Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193