class: left, middle, inverse
background-image: url(img/portada.jpg)
background-size: cover

<div style="left: 0; width: 60%; padding: 0">
<!-- <div style=""> -->
<img src="https://mounabelaid.netlify.app/post/r-ladies-tunis-meetups/featured.png" height="110px" style="padding-left: 20%"/>
<img src="https://github.com/easystats/report/blob/master/man/figures/logo.png?raw=true" height="100px"/>
<!-- <div/> -->

# Reporting statistical results with `report`

## **R-Ladies Cuernavaca**

### Carlos A. Torres Cubilla

### May 6, 2021

</div>

---
class: inverse, middle, center

<!-- <div class="my-logo-left"></div> -->

# About me

<img style="border-radius: 50%;" src="img/avatar.png" width="150px"/>

## Carlos Torres 🇵🇦

### Data Scientist at <a href="https://www.bgeneral.com/">Banco General</a>

[
carlostorrescubila.github.io](https://carlostorrescubila.github.io/)   [
@carlos_tc22](https://twitter.com/carlos_tc22)   [
@carlostorrescubila](https://github.com/carlostorrescubila)

---
class: inverse, center, middle

# Get Started

<img src="https://media4.giphy.com/media/PZcsDneMCLF65uxJuX/200w.webp?cid=ecf05e47vv92p8vn8z8sv26krwswe4jqpr2xfwo5126bey49&rid=200w.webp" width="50%">

---
class: center

# `report`: “From R to your manuscript”

<a href="https://easystats.github.io/report/">
<img src="https://github.com/easystats/report/blob/master/man/figures/logo.png?raw=true" width="50%">
</a>

---

# What is `report`? What is it for?

`report` is an R package that belongs to the [`easystats`](https://easystats.github.io/easystats/) family of packages. Its main goal is to bridge the gap between R's output and the formatted results that go into a manuscript. It automatically produces reports of models and data frames that follow best-practice guidelines (for example, [APA](https://apastyle.apa.org/) style), ensuring standardization and quality in results reporting.

--

This is a young package under continuous development. It was published on [GitHub](https://github.com/easystats/report) on October 29, 2020 (version 0.0.1) and is currently at version 0.2.0.
Its developers invite the community to contribute to the package, following the contribution guide available [HERE](https://github.com/easystats/report/blob/master/.github/CONTRIBUTING.md).

--

## Installation

```r
install.packages("remotes")
remotes::install_github("easystats/report") # Not on CRAN
```

---

# Workflow

![report image](https://easystats.github.io/report/reference/figures/workflow.png)

---

# Loading packages

```r
library(dplyr)
*library(report)
library(palmerpenguins)
```

--

<br><br>

.center[
<a href="https://dplyr.tidyverse.org/">
<img src="https://d33wubrfki0l68.cloudfront.net/621a9c8c5d7b47c4b6d72e8f01f28d14310e8370/193fc/css/images/hex/dplyr.png" width="25%">
</a>
<a href="https://easystats.github.io/report/">
<img src="https://github.com/easystats/report/blob/master/man/figures/logo.png?raw=true" width="25%">
</a>
<a href="https://allisonhorst.github.io/palmerpenguins/">
<img src="https://allisonhorst.github.io/palmerpenguins/man/figures/palmerpenguins.png" width="25%">
</a>
]

---

# Data used: `penguins`

The data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the [Palmer Station Antarctica LTER](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/).

--

.center[
<img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png" width="80%">
]

---

# Data used: `penguins`
---

# Data used: `penguins`

.center[
<img src="https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/man/figures/culmen_depth.png" width="80%">
]

---

# Specific reports

The main functions that generate specific reports unrelated to statistical methods are:

- `report_system()`
- `report_packages()`
- `cite_packages()`
- `report_date()`
- `report_participants()`
- `report_sample()`

---

## System report

### Default

```r
report_system()
```

```
Analyses were conducted using the R Statistical language (version 4.0.5; R Core Team, 2021) on Windows 10 x64 (build 19041)
```

### Summary

```r
report_system() %>%
  summary()
```

```
The analysis was done using the R Statistical language (v4.0.5; R Core Team, 2021) on Windows 10 x64
```

---

## Package report

### Default

```r
report_packages()
```

```
  - metathis (version 1.0.3; Garrick Aden-Buie, 2020)
  - dplyr (version 1.0.5; Hadley Wickham et al., 2021)
  - palmerpenguins (version 0.1.0; Horst AM et al., 2020)
  - report (version 0.3.0.9000; Makowski et al., 2020)
  - R (version 4.0.5; R Core Team, 2021)
```

### Summary

```r
report_packages() %>%
  summary()
```

```
  - metathis (v1.0.3)
  - dplyr (v1.0.5)
  - palmerpenguins (v0.1.0)
  - report (v0.3.0.9000)
  - R (v4.0.5)
```

---

## Citing packages

.panelset[
.panel[.panel-name[R base]

```r
citation("dplyr")
```

```

To cite package 'dplyr' in publications use:

  Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.5. https://CRAN.R-project.org/package=dplyr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {dplyr: A Grammar of Data Manipulation},
    author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
    year = {2021},
    note = {R package version 1.0.5},
    url = {https://CRAN.R-project.org/package=dplyr},
  }
```

]
.panel[.panel-name[report]

#### For the reference list

```r
cite_packages()
```

```
  - Garrick Aden-Buie (2020).
    metathis: HTML Metadata Tags for 'R Markdown' and 'Shiny'. R package version 1.0.3. https://CRAN.R-project.org/package=metathis
  - Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.5. https://CRAN.R-project.org/package=dplyr
  - Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/
  - Makowski, D., Ben-Shachar, M.S., Patil, I. & Lüdecke, D. (2020). Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption. CRAN. Available from https://github.com/easystats/report. doi: .
  - R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
```

#### For in-text citations

```r
cite_packages() %>%
  format_citation(authorsdate = TRUE, short = TRUE, intext = FALSE)
```

```
  - Garrick Aden-Buie (2020)
  - Hadley Wickham et al. (2021)
  - Horst AM et al. (2020)
  - Makowski et al. (2020)
  - R Core Team (2021)
```

]
.panel[.panel-name[report 2]

```r
report(sessionInfo())
```

```
Analyses were conducted using the R Statistical language (version 4.0.5; R Core Team, 2021) on Windows 10 x64 (build 19041), using the packages metathis (version 1.0.3; Garrick Aden-Buie, 2020), dplyr (version 1.0.5; Hadley Wickham et al., 2021), palmerpenguins (version 0.1.0; Horst AM et al., 2020) and report (version 0.3.0.9000; Makowski et al., 2020).

References
----------
  - Garrick Aden-Buie (2020). metathis: HTML Metadata Tags for 'R Markdown' and 'Shiny'. R package version 1.0.3. https://CRAN.R-project.org/package=metathis
  - Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.5. https://CRAN.R-project.org/package=dplyr
  - Horst AM, Hill AP, Gorman KB (2020).
    palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/
  - Makowski, D., Ben-Shachar, M.S., Patil, I. & Lüdecke, D. (2020). Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption. CRAN. Available from https://github.com/easystats/report. doi: .
  - R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
```

]
]

---

## Date report

### Default

```r
report_date()
```

```
It's jueves, mayo 06 of the year 2021, at 6p.m. 07 and 35 seconds
```

### Summary

```r
report_date() %>%
  summary()
```

```
06/05/21 - 18:07:35
```

---

## Participants report (data)

```r
# Toy data set: one row per participant
Participantes <- data.frame(
  "Edad" = c(22, 22, 54, 34, 18, 28, 42, 45),
  "Genero" = c("F", "M", "F", "M", "F", "M", "F", "M"),
  "Años_Experiencia" = c(1, 2, 24, 9, 0, 5, 12, 20),
  "Nivel_Educacion" = c("Highschool", "Bachelor", "PhD", "Bachelor",
                        "Highschool", "Bachelor", "Bachelor", "PhD"),
  "Grupo" = c("A", "A", "A", "A", "B", "B", "B", "B")
)
```
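
A participants report condenses columns like these into a handful of descriptive statistics. As a rough cross-check, the sketch below recomputes those quantities in base R. The object name `participantes_demo` is hypothetical, the data frame simply repeats the `Edad`, `Genero`, and `Grupo` values shown above, and nothing here calls `report`.

```r
# Hypothetical copy of three columns from the data frame above (same values).
participantes_demo <- data.frame(
  Edad   = c(22, 22, 54, 34, 18, 28, 42, 45),
  Genero = c("F", "M", "F", "M", "F", "M", "F", "M"),
  Grupo  = c("A", "A", "A", "A", "B", "B", "B", "B")
)

# Age summary: mean, sample standard deviation, and range.
edad_media <- mean(participantes_demo$Edad)
edad_sd    <- sd(participantes_demo$Edad)
edad_rango <- range(participantes_demo$Edad)

# Share of participants coded "F", as a percentage.
pct_mujeres <- 100 * mean(participantes_demo$Genero == "F")

# Mean age within each group.
media_por_grupo <- aggregate(Edad ~ Grupo, data = participantes_demo, FUN = mean)

round(c(media = edad_media, sd = edad_sd), 1)
media_por_grupo
```

Doing this by hand once makes it easier to see what the automated sentence saves: the arithmetic is simple, but the formatting and wording are where manual reports tend to drift.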
---

## Participants report

.panelset[
.panel[.panel-name[Age and gender]

```r
report_participants(
  Participantes,
  age = "Edad",
  sex = "Genero"
)
```

```
[1] "8 participants (Mean age = 33.1, SD = 12.9, range: [18, 54]; 50.0% females)"
```

]
.panel[.panel-name[Years of education]

```r
report_participants(
  Participantes,
  age = "Edad",
  sex = "Genero",
  education = "Años_Experiencia"
)
```

```
[1] "8 participants (Mean age = 33.1, SD = 12.9, range: [18, 54]; 50.0% females; Mean education = 9.1, SD = 9.0, range: [0, 24])"
```

]
.panel[.panel-name[Education level]

```r
report_participants(
  Participantes,
  age = "Edad",
  sex = "Genero",
  education = "Nivel_Educacion"
)
```

```
[1] "8 participants (Mean age = 33.1, SD = 12.9, range: [18, 54]; 50.0% females; Education: Bachelor, 50.00%; Highschool, 25.00%; PhD, 25.00%)"
```

]
.panel[.panel-name[By group]

```r
report_participants(
  Participantes,
  age = "Edad",
  sex = "Genero",
  group = "Grupo"
)
```

```
[1] "For the 'Grupo - A' group: 4 participants (Mean age = 33.0, SD = 15.1, range: [22, 54]; 50.0% females) and for the 'Grupo - B' group: 4 participants (Mean age = 33.2, SD = 12.6, range: [18, 45]; 50.0% females)"
```

]
]

---

## Sample report

.panelset[
.panel[.panel-name[Default]

```r
report_sample(penguins)
```

```
# Descriptive Statistics

Variable                    |          Summary
----------------------------------------------
species [Adelie], %         |             44.2
species [Chinstrap], %      |             19.8
species [Gentoo], %         |             36.0
island [Biscoe], %          |             48.8
island [Dream], %           |             36.0
island [Torgersen], %       |             15.1
Mean bill_length_mm (SD)    |     43.92 (5.46)
Mean bill_depth_mm (SD)     |     17.15 (1.97)
Mean flipper_length_mm (SD) |   200.92 (14.06)
Mean body_mass_g (SD)       | 4201.75 (801.95)
sex [male], %               |             50.5
Mean year (SD)              |   2008.03 (0.82)
```

]
.panel[.panel-name[By group]

```r
report_sample(penguins, group_by = "species")
```

```
# Descriptive Statistics

Variable                    |   Adelie (n=152) | Chinstrap (n=68) |   Gentoo (n=124) |            Total
-------------------------------------------------------------------------------------------------------
island [Biscoe], %          |             28.9 |              0.0 |            100.0 |             48.8
island [Dream], %           |             36.8 |            100.0 |              0.0 |             36.0
island [Torgersen], %       |             34.2 |              0.0 |              0.0 |             15.1
Mean bill_length_mm (SD)    |     38.79 (2.66) |     48.83 (3.34) |     47.50 (3.08) |     43.92 (5.46)
Mean bill_depth_mm (SD)     |     18.35 (1.22) |     18.42 (1.14) |     14.98 (0.98) |     17.15 (1.97)
Mean flipper_length_mm (SD) |    189.95 (6.54) |    195.82 (7.13) |    217.19 (6.48) |   200.92 (14.06)
Mean body_mass_g (SD)       | 3700.66 (458.57) | 3733.09 (384.34) | 5076.02 (504.12) | 4201.75 (801.95)
sex [male], %               |             50.0 |             50.0 |             51.3 |             50.5
Mean year (SD)              |   2008.01 (0.82) |   2007.97 (0.86) |   2008.08 (0.79) |   2008.03 (0.82)
```

]
.panel[.panel-name[Selecting columns]

```r
report_sample(
  penguins,
  group_by = "species",
  select = c("species", "sex", "bill_length_mm", "bill_depth_mm")
)
```

```
# Descriptive Statistics

Variable                 | Adelie (n=152) | Chinstrap (n=68) | Gentoo (n=124) |        Total
--------------------------------------------------------------------------------------------
sex [male], %            |           50.0 |             50.0 |           51.3 |         50.5
Mean bill_length_mm (SD) |   38.79 (2.66) |     48.83 (3.34) |   47.50 (3.08) | 43.92 (5.46)
Mean bill_depth_mm (SD)  |   18.35 (1.22) |     18.42 (1.14) |   14.98 (0.98) | 17.15 (1.97)
```

]
.panel[.panel-name[Excluding columns]

```r
report_sample(
  penguins,
  group_by = "species",
  exclude = c("island", "flipper_length_mm", "body_mass_g", "year")
)
```

```
# Descriptive Statistics

Variable                 | Adelie (n=152) | Chinstrap (n=68) | Gentoo (n=124) |        Total
--------------------------------------------------------------------------------------------
Mean bill_length_mm (SD) |   38.79 (2.66) |     48.83 (3.34) |   47.50 (3.08) | 43.92 (5.46)
Mean bill_depth_mm (SD)  |   18.35 (1.22) |     18.42 (1.14) |   14.98 (0.98) | 17.15 (1.97)
sex [male], %            |           50.0 |             50.0 |           51.3 |         50.5
```

]
]

---
class: inverse, center, middle

# Data frame report

---

# Data frame report

.panelset[
.panel[.panel-name[Default]

```r
report(penguins)
```

```
The data contains 344 observations of the following 8 variables:
  - species: 3 levels, namely Adelie (n = 152, 44.19%), Chinstrap (n = 68, 19.77%) and Gentoo (n = 124, 36.05%)
  - island: 3 levels, namely Biscoe (n = 168, 48.84%), Dream (n = 124, 36.05%) and Torgersen (n = 52, 15.12%)
  - bill_length_mm: n = 344, Mean = 43.92, SD = 5.46, Median = , MAD = 7.04, range: [32.10, 59.60], Skewness = 0.05, Kurtosis = -0.88, 0.58% missing
  - bill_depth_mm: n = 344, Mean = 17.15, SD = 1.97, Median = , MAD = 2.22, range: [13.10, 21.50], Skewness = -0.14, Kurtosis = -0.91, 0.58% missing
  - flipper_length_mm: n = 344, Mean = 200.92, SD = 14.06, Median = , MAD = 16.31, range: [172, 231], Skewness = 0.35, Kurtosis = -0.98, 0.58% missing
  - body_mass_g: n = 344, Mean = 4201.75, SD = 801.95, Median = , MAD = 889.56, range: [2700, 6300], Skewness = 0.47, Kurtosis = -0.72, 0.58% missing
  - sex: 2 levels, namely female (n = 165, 47.97%), male (n = 168, 48.84%) and missing (n = 11, 3.20%)
  - year: n = 344, Mean = 2008.03, SD = 0.82, Median = 2008.00, MAD = 1.48, range: [2007, 2009], Skewness = -0.05, Kurtosis = -1.50, 0% missing
```

]
.panel[.panel-name[Summary]

```r
report(penguins) %>%
  summary()
```

```
The data contains 344 observations of the following 8 variables:
  - species: 3 levels, namely Adelie (n = 152), Chinstrap (n = 68) and Gentoo (n = 124)
  - island: 3 levels, namely Biscoe (n = 168), Dream (n = 124) and Torgersen (n = 52)
  - bill_length_mm: Mean = 43.92, SD = 5.46, range: [32.10, 59.60], 0.58% missing
  - bill_depth_mm: Mean = 17.15, SD = 1.97, range: [13.10, 21.50], 0.58% missing
  - flipper_length_mm: Mean = 200.92, SD = 14.06, range: [172, 231], 0.58% missing
  - body_mass_g: Mean = 4201.75, SD = 801.95, range: [2700, 6300], 0.58% missing
  - sex: 2 levels, namely female (n = 165), male (n = 168) and missing (n = 11)
  - year: Mean = 2008.03, SD = 0.82, range: [2007, 2009]
```

]
.panel[.panel-name[Using dplyr]

```r
penguins %>%
  select(-ends_with("_mm")) %>%
  group_by(species) %>%
  report() %>%
  summary()
```

```
The data contains 344 observations, grouped by species, of the following 5 variables:
- Adelie (n = 152):
  - island: 3 levels, namely Biscoe (n = 44), Dream (n = 56) and Torgersen (n = 52)
  - body_mass_g: Mean = 3700.66, SD = 458.57, range: [2850, 4775], 0.66% missing
  - sex: 2 levels, namely female (n = 73), male (n = 73) and missing (n = 6)
  - year: Mean = 2008.01, SD = 0.82, range: [2007, 2009]
- Chinstrap (n = 68):
  - island: 3 levels, namely Biscoe (n = 0), Dream (n = 68) and Torgersen (n = 0)
  - body_mass_g: Mean = 3733.09, SD = 384.34, range: [2700, 4800]
  - sex: 2 levels, namely female (n = 34) and male (n = 34)
  - year: Mean = 2007.97, SD = 0.86, range: [2007, 2009]
- Gentoo (n = 124):
  - island: 3 levels, namely Biscoe (n = 124), Dream (n = 0) and Torgersen (n = 0)
  - body_mass_g: Mean = 5076.02, SD = 504.12, range: [3950, 6300], 0.81% missing
  - sex: 2 levels, namely female (n = 58), male (n = 61) and missing (n = 5)
  - year: Mean = 2008.08, SD = 0.79, range: [2007, 2009]
```

]
]

---
class: inverse, center, middle

# Statistical reports

<img src="https://media4.giphy.com/media/LPrAK9rEedDwjtL1J0/200.webp?cid=ecf05e47ugwngg89cftjjdg4xtuqv8ydamdgunpg7w7794s5&rid=200.webp" width="50%">

---

# Statistical reports

The statistical techniques that `report` can report are:

- <p style="color:#88398A">Association test</p>
- <p style="color:#88398A">t-test for a difference in means</p>
- <p style="color:#88398A">Linear models</p>
- <p style="color:#88398A">ANOVAs</p>
- Generalized linear models
- Mixed models
- Bayesian models

---

## Association test

The association (correlation) test checks whether the correlation between two variables, denoted `\(\rho\)`, is significantly different from 0 in the population.
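
For the Pearson coefficient specifically, "significantly different from 0" comes down to a t statistic with `\(n - 2\)` degrees of freedom. The sketch below computes it by hand and compares it with base R's `cor.test()`. It uses the built-in `mtcars` data only to stay self-contained; the variable names are illustrative and not part of the original slides.

```r
# Built-in data keeps the example self-contained; any two numeric vectors would do.
x <- mtcars$mpg
y <- mtcars$wt

n <- length(x)
r <- cor(x, y, method = "pearson")

# Under H0 (rho = 0) this statistic follows a t distribution with n - 2 df.
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value: probability of a statistic at least this extreme.
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Reference result from base R's own test.
prueba <- cor.test(x, y, method = "pearson")

c(manual = t_manual, cor.test = unname(prueba$statistic))
c(manual = p_manual, cor.test = prueba$p.value)
```

The two columns should agree up to floating-point error, which is a quick way to confirm what the `t(df)` and `p` entries in a correlation report refer to.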
There are several methods for correlation analysis, each with its own hypothesis test:

+ **Pearson:** a parametric test that measures the degree of linear relationship between quantitative variables. For the test to be valid, the observations must be independent and the variables approximately normally distributed.
+ **Spearman:** a non-parametric, rank-based test that makes no assumption about the distribution of the data. It suits variables measured on an ordinal, interval, or ratio scale.
+ **Kendall:** a non-parametric test computed from the number of concordant and discordant pairs. It is used as an alternative to Pearson's test when the data fail at least one of its assumptions, and as an alternative to Spearman's test when the sample is small and contains many tied ranks.

### Hypotheses

$$
`\begin{cases}
H_0: \rho = 0\\
H_1: \rho \neq 0
\end{cases}`
$$

---

## Association test

.panelset[
.panel[.panel-name[R base]

```r
cor.test(
  x = penguins$bill_length_mm,
  y = penguins$bill_depth_mm,
  method = "pearson"
)
```

```

    Pearson's product-moment correlation

data:  penguins$bill_length_mm and penguins$bill_depth_mm
t = -4.4591, df = 340, p-value = 1.12e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3328072 -0.1323004
sample estimates:
       cor 
-0.2350529 
```

]
.panel[.panel-name[report]

```r
cor.test(
  x = penguins$bill_length_mm,
  y = penguins$bill_depth_mm,
  method = "pearson"
) %>%
  report()
```

```
Effect sizes were labelled following Funder's (2019) recommendations.
The Pearson's product-moment correlation between penguins$bill_length_mm and penguins$bill_depth_mm is negative, statistically significant, and medium (r = -0.24, 95% CI [-0.33, -0.13], t(340) = -4.46, p < .001)
```

]
.panel[.panel-name[report as table]

```r
cor.test(
  x = penguins$bill_length_mm,
  y = penguins$bill_depth_mm,
  method = "pearson"
) %>%
  report() %>%
  as.data.frame()
```

```
Parameter1              |             Parameter2 |     r |         95% CI | t(340) |      p |                               Method
----------------------------------------------------------------------------------------------------------------------------------
penguins$bill_length_mm | penguins$bill_depth_mm | -0.24 | [-0.33, -0.13] |  -4.46 | < .001 | Pearson's product-moment correlation
```

]
]

---

## Difference in means (t-test)

The t-test for a difference in means is a parametric test that checks whether the mean of a sample differs from a reference value, or whether the means of two groups differ from each other. Two things must be decided when choosing a t-test:

1. **One-sample or two-sample (independent or paired)?** <br> If a sample is compared against a fixed value, use a one-sample t-test. If two groups are compared, use a two-sample t-test. The two groups may come from two independent populations (independent samples) or from the same population measured twice (paired samples).

2. **One-tailed or two-tailed?** <br> If all that matters is whether the two populations differ, use a two-tailed t-test. If you want to know whether the mean of one population is specifically greater (or specifically smaller) than the other, use a one-tailed t-test.
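
The two decisions above map directly onto arguments of base R's `t.test()`. The following sketch runs the four variants on the built-in `sleep` data set (extra hours of sleep for the same ten subjects under two drugs); the object names are illustrative, and the vector interface is used so the example does not depend on how a given R version handles `paired` in formulas.

```r
# Two measurements per subject, split into one vector per drug.
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]

# One sample: is the mean of x different from the reference value 0?
one_sample <- t.test(x, mu = 0)

# Two independent samples, two-tailed (Welch's test is the default).
welch <- t.test(x, y)

# Two paired samples: the same subjects measured under both conditions.
paired_test <- t.test(x, y, paired = TRUE)

# One-tailed: is the mean of x specifically smaller than the mean of y?
one_tailed <- t.test(x, y, alternative = "less")

# Which test was run, and its p-value.
sapply(
  list(one_sample, welch, paired_test, one_tailed),
  function(res) paste0(res$method, ": p = ", signif(res$p.value, 3))
)
```

Treating the same data as independent or paired changes the degrees of freedom and usually the conclusion, so the design of the study, not the convenience of the call, should drive the choice.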
### Hypotheses

.panelset[
.panel[.panel-name[One sample]

$$
`\begin{cases}
H_0: \mu = \mu_{0} \\
H_1: \mu \neq \mu_{0}
\end{cases}`
$$

]
.panel[.panel-name[Two samples]

$$
`\begin{cases}
H_0: \mu_{1} = \mu_{2} \\
H_1: \mu_{1} \neq \mu_{2}
\end{cases}`
$$

]
]

---

## Difference in means (t-test)

.panelset[
.panel[.panel-name[R base]

```r
# Welch test for independent samples (the default).
# `paired` is not passed: recent R versions reject it in the formula interface.
t.test(
  formula = body_mass_g ~ sex,
  data = penguins
)
```

```

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
```

]
.panel[.panel-name[report]

```r
t.test(
  formula = body_mass_g ~ sex,
  data = penguins
) %>%
  report()
```

```
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference of body_mass_g by sex (mean in group female = 3862.27, mean in group male = 4545.68) suggests that the effect is positive, statistically significant, and large (difference = 683.41, 95% CI [-840.58, -526.25], t(323.90) = -8.55, p < .001; Cohen's d = -0.95, 95% CI [-1.18, -0.72])
```

]
.panel[.panel-name[report as table]

```r
t.test(
  formula = body_mass_g ~ sex,
  data = penguins
) %>%
  report() %>%
  as.data.frame()
```

```
Parameter   | Group | Mean_Group1 | Mean_Group2 | Difference |             95% CI | t(323.90) |      p |                  Method |     d |           d CI
---------------------------------------------------------------------------------------------------------------------------------------------------------
body_mass_g |   sex |     3862.27 |     4545.68 |     683.41 | [-840.58, -526.25] |     -8.55 | < .001 | Welch Two Sample t-test | -0.95 | [-1.18, -0.72]
```

]
]

---

## Linear regression

Regression is a technique for building a linear model in which the value of a dependent variable `\((Y)\)` is determined from a set of `\(k\)` independent variables `\((X_{1}, X_{2}, \dots, X_{k})\)`. Linear regression models follow this equation:

$$
Y = \beta_{0} + \sum_{i=1}^{k} \beta_{i} X_{i} + \varepsilon
$$

Where:

+ The systematic (non-random) part is: `\(\beta_{0} + \sum_{i=1}^{k} \beta_{i} X_{i}\)`
+ The stochastic (random) part is: `\(\varepsilon\)`

In addition:

+ `\(Y\)` is the dependent or response variable.
+ `\(X_{i}\)` is the i-th independent or predictor variable, with `\(i = 1, 2, \dots, k\)`.
+ `\(\beta_{0}\)` is the intercept, that is, the expected value of `\(Y\)` when all predictors equal 0.
+ `\(\beta_{i}\)` is the change in the expected value of `\(Y\)` for each one-unit increase in `\(X_{i}\)`, holding the other variables constant. These are known as the regression coefficients.
+ `\(\varepsilon\)` is the error term, estimated by the residuals.

---

## Linear regression

.panelset[
.panel[.panel-name[R base]

```r
lm(
  data = penguins,
  formula = body_mass_g ~ bill_length_mm + bill_depth_mm
) %>%
  summary()
```

```

Call:
lm(formula = body_mass_g ~ bill_length_mm + bill_depth_mm, data = penguins)

Residuals:
     Min       1Q   Median       3Q      Max 
-1804.61  -454.83     8.15   463.53  1544.82 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3343.136    429.912   7.776 9.05e-14 ***
bill_length_mm   75.281      5.971  12.608  < 2e-16 ***
bill_depth_mm  -142.723     16.507  -8.646  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 585.1 on 339 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.4708,    Adjusted R-squared:  0.4677 
F-statistic: 150.8 on 2 and 339 DF,  p-value: < 2.2e-16
```

]
.panel[.panel-name[report]

```r
lm(
  data = penguins,
  formula = body_mass_g ~ bill_length_mm + bill_depth_mm
) %>%
  report()
```

```
We fitted a linear model (estimated using OLS) to predict body_mass_g with bill_length_mm and bill_depth_mm (formula: body_mass_g ~ bill_length_mm + bill_depth_mm).
The model explains a statistically significant and substantial proportion of variance (R2 = 0.47, F(2, 339) = 150.82, p < .001, adj. R2 = 0.47). The model's intercept, corresponding to bill_length_mm = 0 and bill_depth_mm = 0, is at 3343.14 (95% CI [2497.50, 4188.77], t(339) = 7.78, p < .001). Within this model:

  - The effect of bill_length_mm is statistically significant and positive (beta = 75.28, 95% CI [63.54, 87.03], t(339) = 12.61, p < .001; Std. beta = 0.51, 95% CI [0.43, 0.59])
  - The effect of bill_depth_mm is statistically significant and negative (beta = -142.72, 95% CI [-175.19, -110.25], t(339) = -8.65, p < .001; Std. beta = -0.35, 95% CI [-0.43, -0.27])

Standardized parameters were obtained by fitting the model on a standardized version of the dataset.
```

]
.panel[.panel-name[report as table]

```r
lm(
  data = penguins,
  formula = body_mass_g ~ bill_length_mm + bill_depth_mm
) %>%
  report() %>%
  as.data.frame()
```

```
Parameter      | Coefficient |             95% CI | t(339) |      p | Std. Coef. | Std. Coef. 95% CI |     Fit
--------------------------------------------------------------------------------------------------------------
(Intercept)    |     3343.14 | [2497.50, 4188.77] |   7.78 | < .001 |   6.15e-16 |    [-0.08,  0.08] |        
bill_length_mm |       75.28 | [  63.54,   87.03] |  12.61 | < .001 |       0.51 |    [ 0.43,  0.59] |        
bill_depth_mm  |     -142.72 | [-175.19, -110.25] |  -8.65 | < .001 |      -0.35 |    [-0.43, -0.27] |        
               |             |                    |        |        |            |                   |        
AIC            |             |                    |        |        |            |                   | 5333.82
BIC            |             |                    |        |        |            |                   | 5349.16
R2             |             |                    |        |        |            |                   |    0.47
R2 (adj.)      |             |                    |        |        |            |                   |    0.47
Sigma          |             |                    |        |        |            |                   |  585.08
```

]
]

---

## ANOVA

Analysis of Variance (ANOVA) is a parametric technique used when the data are not paired and the goal is to determine whether the mean of a continuous random variable differs significantly across the levels of a qualitative variable, or factor. The difference between means is detected by comparing the variance between groups with the variance within groups, as the following table shows, where `\(I\)` is the number of groups and `\(N\)` the total number of observations:

| Source of variation | Sum of squares | d.f. | Mean square | Test <br>statistic |
|:------------:|:------------:|:-----:|:-------------------------------------:|:-----------------------------------:|
| Between groups | `\(SC_{inter}\)` | `\(I-1\)` | `\(MC_{inter} = \frac{SC_{inter}}{I-1}\)` | `\(F = \frac{MC_{inter}}{MC_{intra}}\)` |
| Within groups | `\(SC_{intra}\)` | `\(N-I\)` | `\(MC_{intra} = \frac{SC_{intra}}{N-I}\)` | |
| Total | `\(SC_{total}\)` | `\(N-1\)` | | |

### Hypotheses

$$
`\begin{cases}
H_{0}: \mu_{i} = \mu_{j} \quad \forall\, i \neq j \\
H_{1}: \mu_{i} \neq \mu_{j} \quad \text{for at least one pair } i \neq j
\end{cases}`
$$

---

## ANOVA

.panelset[
.panel[.panel-name[R base]

```r
aov(data = penguins, formula = body_mass_g ~ species) %>%
  summary()
```

```
             Df    Sum Sq  Mean Sq F value Pr(>F)    
species       2 146864214 73432107   343.6 <2e-16 ***
Residuals   339  72443483   213698                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2 observations deleted due to missingness
```

]
.panel[.panel-name[report]

```r
aov(data = penguins, formula = body_mass_g ~ species) %>%
  report()
```

```
The ANOVA (formula: body_mass_g ~ species) suggests that:

  - The main effect of species is statistically significant and large (F(2, 339) = 343.63, p < .001; Eta2 = 0.67, 90% CI [0.63, 0.71])

Effect sizes were labelled following Field's (2013) recommendations.
```

]
.panel[.panel-name[report as table]

```r
aov(data = penguins, formula = body_mass_g ~ species) %>%
  report() %>%
  as.data.frame()
```

```
Parameter | Sum_Squares |  df | Mean_Square |      F |      p | Eta2 |  Eta2 90% CI
-----------------------------------------------------------------------------------
species   |    1.47e+08 |   2 |    7.34e+07 | 343.63 | < .001 | 0.67 | [0.63, 0.71]
Residuals |    7.24e+07 | 339 |    2.14e+05 |        |        |      |             
```

]
]

---
class: inverse, center, middle

# No `report`

<img src="https://media2.giphy.com/media/hyyV7pnbE0FqLNBAzs/200.webp?cid=ecf05e47tm6f3m9fpzck6mm3jh3c50u40hfkxjxnua0x962s&rid=200.webp" width="50%">

---
class: inverse, center, middle

# Using `report`

<img src="https://media2.giphy.com/media/l0amJzVHIAfl7jMDos/200.webp?cid=ecf05e47tm6f3m9fpzck6mm3jh3c50u40hfkxjxnua0x962s&rid=200.webp" width="50%">

---
class: inverse

# Thank you for your attention!

.pull-right[.pull-down[

###
carlos221296@gmail.com ###
[carlostorrescubila.github.io/](https://carlostorrescubila.github.io/) ###
[@carlos_tc22](https://twitter.com/carlos_tc22) ###
[@carlostorrescubila](https://github.com/carlostorrescubila) ]]