Thursday, March 5, 2020

Basic Statistical Methods and Concepts

Learn Everything from Probability to Wilcoxon Tests

Chapters
What is Probability?
How to Choose a Statistical Test
When to Use Tests of Association
Tests of Comparison between Means
Tests of Prediction using Linear Regression
Tests for Nonparametric Data
How to Perform Statistical Tests

Let’s face it: while data science was named the “sexiest job of the 21st century,” the majority of people still shudder at even the mention of statistics. The discipline has alienated people throughout its history largely because of its close relationship with mathematics.

Whether you believe you can’t learn statistical analysis or are simply curious to learn more about it, this guide will help get you started by laying out the core introductory concepts.

At the heart of statistics are five essential concepts, which form the basis for data analysis. The first four can be dealt with without going into much detail about their equations:

Mean: the average value, calculated as the sum of all observations divided by the number of observations
Median: the midpoint of the dataset, calculated by ordering all observations from least to greatest and taking the value directly in the middle (or the average of the two middle values when the number of observations is even)
Variance: the general spread of the data, calculated as the average of the squared differences from the mean
Standard Deviation: also a measure of spread, calculated by taking the square root of the variance

Compute statistical data easily | Photo by Jorge Franganillo

Much like witnesses in a detective novel, these four concepts start to tell you the story of a particular set of data because they are descriptive statistics.
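As a rough illustration of the four definitions above, here is a minimal sketch using Python's built-in `statistics` module. The list of ages is made-up data and the `describe` helper is a hypothetical name chosen for this example, not part of any library.

```python
import statistics

# Hypothetical ages of five people (made-up illustration data).
ages = [22, 25, 25, 28, 30]

def describe(values):
    """Return the four descriptive statistics defined above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # pvariance averages the squared differences from the mean over all
        # observations, matching the definition used in this guide.
        "variance": statistics.pvariance(values),
        # pstdev is the square root of pvariance.
        "standard_deviation": statistics.pstdev(values),
    }

summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` divide by one less than the number of observations, which is the usual choice when the data is a sample from a larger population. The sketch uses the population versions only because they match the wording of the definitions in this guide.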
Suppose you look around at the people in any restaurant you find yourself in. It can be very difficult to build a narrative, or interpretation, about the kind of crowd you’re surrounded by based solely on appearance.

Say, however, you are given information about their age, monthly income, level of education, gender, and taste in music. The first two concepts, the mean and the median, are both measures of central tendency that can tell you whether your crowd is mostly twenty-somethings making their way through college or wealthy, elderly people who invest in hedge funds.

Which of the two you use depends on the distribution of the variable that you’re measuring or, in this example, the amount of variability within the crowd. The more alike the crowd is, the more accurately the mean will tell your story; the more variation there is between the people, the more accurate the picture you draw will be if you take the median instead.

The variance and standard deviation are both measures of variability and can tell you how different the observations in your data are from the average with regard to a specific variable.

If you wanted to see how similar the crowd is in terms of age, you would start the computation by calculating the mean age. By subtracting it from every individual’s age, squaring each difference, and averaging the results, you find the variance, a number that tells you how far people are spread from the average. The standard deviation, on the other hand, tells you how closely your data is clustered around the mean; for data that follows a normal distribution, about 68% of observations fall within one standard deviation of the mean.

The standard deviation is exactly like the variance in terms of what it says about the spread of your data; in fact, the standard deviation is calculated by taking the square root of the variance.
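To see how the choice between mean and median, and the reading of spread, play out on actual numbers, here is a small sketch. The two income lists are invented for illustration and `summarise` is a hypothetical helper name; only the `statistics` functions come from Python's standard library.

```python
import statistics

# Hypothetical monthly incomes (made-up numbers, all in the same currency).
similar_crowd = [1800, 1900, 2000, 2100, 2200]
mixed_crowd = [1800, 1900, 2000, 2100, 50000]  # one very wealthy guest

def summarise(values):
    """Return the mean, median and standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation: square root of the average
        # squared difference from the mean.
        "sd": statistics.pstdev(values),
    }

for label, crowd in [("similar", similar_crowd), ("mixed", mixed_crowd)]:
    s = summarise(crowd)
    print(f"{label}: mean={s['mean']:.0f}, median={s['median']:.0f}, sd={s['sd']:.0f}")
```

In the similar crowd the mean and the median agree. In the mixed crowd a single large income pulls the mean far above what most guests earn, while the median stays put, and the standard deviation grows sharply to reflect the wider spread.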
The standard deviation differs from the variance in one practical respect: it is the descriptive measure that is easiest to report because it is in the same units as the original data, whereas the variance is in squared units.

You can test what you've learned in your statistics course so far by attempting some statistics practice problems online!

How to Choose a Statistical Test

Numerical variables are quantitative and fall under two distinct categories:

Continuous: can take on any value, like height
Discrete: are integers, like the number of children

Categorical variables are qualitative and also fall under two distinct categories:

Ordinal: has an obvious order, like a scale rating happiness from 1 to 10
Nominal: has no meaningful order, like gender

When to Use Tests of Association

These types of tests are meant for looking at the relationship between two variables. Among the tests in this guide, they are the closest you'll get to looking at causality between two variables, although an association on its own does not show that one variable causes the other. For example, you may want to discover whether there is an association between marital status and level of education. All of these test the strength of the association between two variables:

Type of Test | Type of Variables | Example
Pearson Correlation | Two continuous variables | Whether shoe size has an association with height
Spearman Correlation | Two ordinal variables | How strong an association there is between happiness and economic status
Chi-Square | Two categorical variables | Whether gender and favorite color have any association

Tests of Comparison between Means

Tests of comparison look at the differences between groups or conditions by comparing their means. For example, you may want to see whether where one goes to school makes a difference in standardized test scores.

Type of Test | Type of Variables | Example
Paired T-Test | Two related variables | The difference between weight before and after taking a new supplement
Independent T-Test | Two independent variables | The difference in spending on gas between people in Los Angeles and New York
One-Way Analysis of Variance (ANOVA) | One independent variable with distinct levels and one continuous variable | Comparing the means of test scores from three different levels of education
Two-Way ANOVA | Two independent variables with distinct levels and one continuous variable | Comparing the means of test scores across both three levels of education and twelve different zodiac signs

Tests of Prediction using Linear Regression

Prediction tests are used to determine whether a change in one or more variables predicts a change in another. For example, given data on gender, diet and income, you can investigate whether a change in these leads to a change in height.

Type of Test | Type of Variables | Example
Simple Linear Regression | One scale variable (dependent) with one scale variable (predictor) | You want to see if and how well height predicts weight
Multiple Linear Regression | One scale variable (dependent) with two or more scale variables (predictors) | You want to see if and how well age, height, and income predict weight

Tests for Nonparametric Data

These tests should be performed when the data does not meet the assumptions for the other tests.
This is the case, for example, when the data does not follow a normal distribution and is highly skewed.

Type of Test | Type of Variables | Example
Wilcoxon Rank-Sum Test | Two independent variables | Between two different drugs, which one offers the best relief in two random, distinct groups of a population
Wilcoxon Signed-Rank Test | Two related variables | Between two different drugs, which one offers the best relief in the same group of patients
Friedman Test | Three or more related variables (each has to be either metric or ordinal) | Ratings of three different ads given by the same group of individuals

How to Perform Statistical Tests

Each statistical test discussed here comes with several assumptions about the data you are using. For a test to run and to give accurate, meaningful results, these assumptions must hold. Because the assumptions differ from one type of test to another, it is imperative to check them before you start to model your data.

The most common programs used for statistical analysis are:

Excel
Stata
SAS
SPSS
Python
R

If you are running tests for parametric data, there are several main assumption checks that your data will have to pass. However, it should be noted that each test has its own set of assumptions that should be checked beforehand, and that this list simply covers the ones you will come across most often.

Assumption | Description
Independence | The groups that make up the sample are independent of each other.
Normality | The data in the set is normal, meaning that it follows a normal distribution.
Homogeneity of variance | If there are multiple groups in the data relating to your independent variable, they have the same variance.

If you're looking for some extra help on these introductory subjects, there are many online resources you can use to build your skills. Tutoring websites like Superprof, or online webinar courses from R-bloggers, can help you get started on crunching some numbers.
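Since Python is one of the programs listed above, the tables in this guide can be condensed into a small lookup that suggests a test from your goal and your variables. This is only a sketch: `TEST_GUIDE`, `NONPARAMETRIC_GUIDE`, `choose_test` and the wording of the keys are all hypothetical names invented for this example, and the function suggests a test name without running anything.

```python
# Hypothetical lookup built from the tables in this guide. The keys are
# (goal, variable description) pairs; their wording is an assumption made
# for this sketch.
TEST_GUIDE = {
    ("association", "two continuous"): "Pearson Correlation",
    ("association", "two ordinal"): "Spearman Correlation",
    ("association", "two categorical"): "Chi-Square",
    ("comparison", "two related"): "Paired T-Test",
    ("comparison", "two independent"): "Independent T-Test",
    ("comparison", "one factor"): "One-Way ANOVA",
    ("comparison", "two factors"): "Two-Way ANOVA",
    ("prediction", "one predictor"): "Simple Linear Regression",
    ("prediction", "several predictors"): "Multiple Linear Regression",
}

# Tests the guide lists for data that does not meet the assumptions
# of the tests above.
NONPARAMETRIC_GUIDE = {
    "two independent": "Wilcoxon Rank-Sum Test",
    "two related": "Wilcoxon Signed-Rank Test",
    "three or more related": "Friedman Test",
}

def choose_test(goal, variables, assumptions_met=True):
    """Suggest a test name from the guide's tables, or None if none is listed."""
    if not assumptions_met:
        return NONPARAMETRIC_GUIDE.get(variables)
    return TEST_GUIDE.get((goal, variables))

print(choose_test("association", "two continuous"))
print(choose_test("comparison", "two related", assumptions_met=False))
```

A lookup like this is no substitute for checking the assumptions of the specific test you end up using, but it can serve as a quick reminder of which table to consult.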
