12.4. Frequency distributions
Now let's take a more detailed look at the individual procedures that make up basic data analysis: calculating frequency distributions and cross-tabulation tables (crosstabulation). After that, we will show how these procedures are used to test statistical hypotheses about relationships and differences.
Let's start with frequency distributions. Calculating them lets you answer questions such as the following:
o How many of the brand's consumers are loyal to it, and what share of all its consumers do they make up?
o What are the number and proportion of members of the population under study who know the firm's new product well, moderately, slightly, or not at all?
o What are the shares of heavy, medium, and light users and of nonusers of the product in the market?
o Do the shares measured in the survey differ significantly from target values set by the company's management?
o What is the income distribution of consumers of a particular brand? Is it true that it is skewed toward relatively low incomes?
In SPSS, frequency distributions are calculated with the Frequencies command (menu Analyze → Descriptive Statistics → Frequencies).
Example 12.1
Distribution of answers from former clients of the fitness center
Consider how people who stopped attending a fitness center answered the question of how long they usually spent there per visit (Table 12.5).
Table 12.5. Distribution of respondents' answers to the question "How much time did you usually spend in the fitness center?", h

                 Response option, h   Frequency   Percent   Valid Percent   Cumulative Percent
Valid            0.50                      1          0.5         0.5               0.5
                 1.00                     15          7.0         7.1               7.5
                 1.50                     34         15.9        16.0              23.6
                 1.75                      4          1.9         1.9              25.5
                 2.00                     75         35.0        35.4              60.8
                 2.20                      1          0.5         0.5              61.3
                 2.25                      1          0.5         0.5              61.8
                 2.30                      1          0.5         0.5              62.3
                 2.50                     26         12.1        12.3              74.5
                 2.75                      1          0.5         0.5              75.0
                 3.00                     39         18.2        18.4              93.4
                 3.50                      5          2.3         2.4              95.8
                 4.00                      8          3.7         3.8              99.5
                 5.00                      1          0.5         0.5             100.0
                 Total                   212         99.1       100.0
Missing          System                    2          0.9
Total                                    214        100.0

Column meanings: Frequency is the number of times the value occurred; Percent is its share of all cases; Valid Percent is its share of valid (non-missing) answers; Cumulative Percent is the running total of Valid Percent. Valid and Missing label the valid and the missing values.
The table shows that 214 respondents were interviewed. Two of them did not estimate how long they typically stayed at the fitness center; these cases appear in the row labeled System (system-missing values). Seventy-five respondents usually spent two hours at the fitness center, which is 35.0% of all respondents, or 35.4% of those who answered the question.
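The four numeric columns of Table 12.5 follow from simple counting. The sketch below rebuilds them from raw answers. It is an illustration, not how SPSS computes them internally. Its assumptions are that a missing answer is stored as None and that the raw answers are regenerated from the frequencies in the table; `frequency_table` and `COUNTS` are hypothetical names.

```python
from collections import Counter

def frequency_table(answers):
    """Return (rows, missing, total); each row is
    (value, frequency, percent, valid_percent, cumulative_percent)."""
    total = len(answers)                          # all cases, including missing
    valid = [a for a in answers if a is not None]  # None marks a missing answer (assumption)
    n_valid = len(valid)
    counts = Counter(valid)
    rows, cumulative = [], 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value,
                     freq,
                     round(100 * freq / total, 1),          # Percent: share of all cases
                     round(100 * freq / n_valid, 1),        # Valid Percent: share of valid cases
                     round(100 * cumulative / n_valid, 1))) # Cumulative Percent of valid cases
    return rows, total - n_valid, total

# Frequencies copied from Table 12.5; raw answers regenerated from them (hypothetical data layout).
COUNTS = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
          2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}
answers = [v for v, c in COUNTS.items() for _ in range(c)] + [None, None]

rows, missing, total = frequency_table(answers)
for row in rows:
    print(row)
print("missing:", missing, "total:", total)
```

The row for 2.00 reproduces the values discussed above: 75 answers, 35.0% of all cases, 35.4% of valid cases, and a cumulative 60.8%.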
The data in the table are easier to grasp from a frequency chart (Fig. 12.7). The Frequencies command can also produce this chart (Charts button).
Fig. 12.7. Frequency distribution of respondents' answers to the question about their time at the fitness center, h
Once the frequency distribution is known, we can calculate statistical characteristics of the variable under study, i.e., of the answers to a particular questionnaire item. These characteristics are of three types:
o measures of central tendency: mode, median, mean;
o measures of variability: standard deviation, variance, etc.;
o measures of the shape of the distribution: skewness, kurtosis.
Measures of central tendency in the responses
Identifying the central tendency of the answers to a question means summarizing how respondents answered in general, i.e., which values the variable typically takes. Three measures can be used for this: the mode, the median, and the mean. SPSS will calculate any of them for any numeric variable. Which of them is actually meaningful depends on the type of data: nominal, ordinal, interval, or ratio (Table 12.6).
Table 12.6. Measures of central tendency applicable to each type of scale

Scale type    Mode   Median   Mean
Nominal       +
Ordinal       +      +
Interval      +      +        +
Ratio         +      +        +
Table 12.7 shows these measures as calculated in SPSS (Frequencies command, Statistics dialog, options Mean, Median, and Mode).
Table 12.7. Measures of central tendency in respondents' answers to the question about their time at the fitness center

Question                                                          Mode   Median   Mean
How much time did you usually spend in the fitness center? (h)    2.0    2.0      2.2
The mode is the response option chosen more often than any other, i.e., the value the variable takes most frequently. On the frequency chart it corresponds to the highest peak. In Fig. 12.7, for example, the mode is 2.00 h. The mode says nothing about how often the other response options were chosen, so it carries little information. It is therefore a good measure of central tendency only for nominal variables, and for these the other, more informative measures cannot be used anyway.
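Finding the mode only requires counting, so the same code works for numeric answers and for nominal labels alike. In this minimal sketch, `modes` is a hypothetical helper. It returns every value tied for the highest frequency, because a distribution can have more than one mode. The brand labels in the second example are invented.

```python
def modes(counts):
    """Return all values that share the highest frequency (ties give several modes)."""
    top = max(counts.values())
    return sorted(value for value, freq in counts.items() if freq == top)

# Frequencies from Table 12.5 (time in hours -> number of respondents).
COUNTS = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
          2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}
print(modes(COUNTS))

# A nominal variable (hypothetical brand labels) with two equally frequent answers.
print(modes({"brand A": 40, "brand B": 55, "brand C": 55}))
```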
The median is the value that splits the sample, ordered by increasing values of the variable, into two equal parts: half of the observations lie below the median and half lie above it. Suppose first that the number of observations is odd, say 101. Then the median is the 51st value in the ordered series. If the number of observations is even, say 100, the median is the average of the 50th and 51st values of the ordered series. In the first case the median equals the value of the "middle" respondent (the 51st). In the second case it equals the average of the values of the "middle" pair of respondents (the 50th and 51st).
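The odd/even rule can be written out directly. The sketch below follows the definition step by step. The function name `median_of` is hypothetical; Python's standard `statistics.median` applies the same rule.

```python
import statistics

def median_of(values):
    """Median by definition: sort, then take the middle value or the mean of the middle pair."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # n = 101 -> index 50, the 51st value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n = 100 -> the 50th and 51st values

odd_sample = list(range(1, 102))   # 101 observations: 1, 2, ..., 101
even_sample = list(range(1, 101))  # 100 observations: 1, 2, ..., 100
print(median_of(odd_sample), median_of(even_sample))
print(statistics.median(odd_sample), statistics.median(even_sample))
```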
In practice, you do not need to number every respondent to calculate the median. The frequency distribution alone shows where the "middle" respondent or "middle" pair of respondents falls. Look in the Cumulative Percent column (the cumulative percentage of valid answers, see Table 12.5) for the response option at which the cumulative total first reaches or passes 50%.
Let's illustrate this procedure with Table 12.5. The number of respondents who answered the question is even (212). According to the last column of the table, 25.5% of them (the closest value below 50%) gave the answers 0.50, 1.00, 1.50, or 1.75. The answers 0.50, 1.00, 1.50, 1.75, and 2.00 together account for 60.8% (the closest value above 50%). We do not need to know which two of the 212 respondents form the middle pair: both of them clearly chose 2.00. Half the sum of two "twos" is again two, so the median is 2.00.
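The same lookup can be done with cumulative counts instead of cumulative percentages. The sketch below locates the positions of the middle respondent or middle pair and returns the value at which the running total first reaches each position. `median_from_counts` is a hypothetical helper, and it assumes that the frequencies are keyed by numeric response values.

```python
def median_from_counts(counts):
    """Median from a frequency table, without listing respondents one by one."""
    n = sum(counts.values())
    # 1-based positions of the middle respondent (odd n) or middle pair (even n).
    positions = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]

    def value_at(position):
        cumulative = 0
        for value in sorted(counts):
            cumulative += counts[value]
            if cumulative >= position:   # running total first reaches this position
                return value
        raise ValueError("position is beyond the number of observations")

    picks = [value_at(p) for p in positions]
    return sum(picks) / len(picks), positions

# Frequencies from Table 12.5.
COUNTS = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
          2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}
print(median_from_counts(COUNTS))
```

With 212 valid answers the middle pair occupies positions 106 and 107. The running total is 54 after 1.75 and 129 after 2.00, so both positions fall on 2.00.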
One subtlety concerns the median. When many identical values sit in the middle of the ordered series, i.e., the data are concentrated, researchers sometimes prefer a so-called refined median to the ordinary one (in SPSS: Frequencies command, Statistics dialog, option Values are group midpoints; Fig. 12.8).
In our example, 75 respondents answered "two hours."
The idea behind this calculation is as follows. The 212 respondents who answered the question about how long they stayed at the club are distributed like this:
o 54 respondents usually stayed less than two hours;
o 75 respondents stayed exactly two hours;
o 83 respondents stayed more than two hours.
Fig. 12.8. Selecting options for calculating the refined median
Suppose all respondents are numbered in order of increasing length of stay. Then the "middle" pair, in the 106th and 107th places of the ordered series, lies closer to the end of the "two hours" group than to its beginning. Figure 12.9 illustrates this.
Fig. 12.9. A diagram illustrating the idea of calculating a refined median
Of the 105 respondents who precede the middle pair, 54 answered less than two hours. So within the "2 h" group of 75 respondents, 105 − 54 = 51 respondents come before the pair. Of the 105 respondents who follow the pair, 83 answered more than two hours, so 105 − 83 = 22 respondents of the group come after it. In other words, the refined median is pulled more strongly toward the larger values than toward the smaller ones, and it should therefore be somewhat greater than two hours. Here it equals 2.076 h. We do not give the calculation algorithm, since it is rather complicated [2].
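The idea can be sketched in two steps. The first step counts how many members of the median group fall before and after the middle pair. The second step applies a common linear-interpolation formula for a grouped median. Two assumptions apply here. First, the class boundaries of each value are placed halfway to its neighbouring values. Second, n is even. The formula is a generic textbook interpolation, not necessarily the algorithm cited in [2]. The helper names are hypothetical, and the result depends on the assumed boundaries, so it need not equal the 2.076 h reported above.

```python
def split_inside_group(counts, value):
    """Members of the `value` group before and after the middle pair (assumes even n)."""
    n = sum(counts.values())
    below = sum(c for v, c in counts.items() if v < value)
    above = sum(c for v, c in counts.items() if v > value)
    first = n // 2                                  # 106th place for n = 212
    before_pair = (first - 1) - below               # 105 - 54
    after_pair = (n - (first + 1)) - above          # 105 - 83
    return before_pair, after_pair

def interpolated_median(counts):
    """Grouped median by linear interpolation (assumed boundaries: halfway to neighbours)."""
    values = sorted(counts)
    half = sum(counts.values()) / 2
    cumulative = 0
    for i, v in enumerate(values):
        f = counts[v]
        if cumulative + f >= half:
            lower = (values[i - 1] + v) / 2 if i > 0 else v
            upper = (v + values[i + 1]) / 2 if i + 1 < len(values) else v
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f
    raise ValueError("empty frequency table")

# Frequencies from Table 12.5.
COUNTS = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
          2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}
print(split_inside_group(COUNTS, 2.00))
print(round(interpolated_median(COUNTS), 3))
```

The first function gives the split of 51 and 22 discussed above. Under the boundary assumption, the interpolated value is about 2.03 h. Like the refined median in the text, it lies above two hours.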
As already noted, the median is meaningless for a nominal variable. It is a good measure of central tendency when measurement is on an ordinal scale. On such a scale the difference between response options No. 1 and No. 2 may be quite different from the difference between options No. 2 and No. 3. Recall that on ordinal scales the magnitudes of the values have no substantive meaning. All that matters is whether one value is greater than, less than, or equal to another. Suppose, for example, that respondents rank brands of chocolates by preference. One respondent might be a "monogamist" who loves only one brand and puts it in first place. The brands he puts in second, third, and subsequent places he may dislike almost equally, never eat, and rank only because the interviewer asked him to. Therefore, for ordinal scales the median's advantage over the mean (discussed below) is undeniable. The median ignores the magnitudes of the values for respondents to the left and right of the "middle" respondent or "middle" pair of respondents; it takes into account only how many such values there are.
This property also makes the median a useful supplementary measure for interval and ratio scales. It is especially useful when the data contain answers that differ sharply from the bulk of the responses, so-called outliers, i.e., values of the variable that lie far from the main mass of values. (We discuss how outliers are identified in the next subsection.) For example, when the income distribution is measured, it is useful to know the income of the respondent in the middle of the income-ordered series. It then does not matter that a few very rich people got into the sample, whose incomes would inflate the arithmetic mean and create the illusion of greater prosperity in the whole population under study.
The mean is calculated by the formula

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad (12.1)$$

where n is the number of respondents who answered the question and X_i is the answer given by the i-th respondent.
In our example, the mean time respondents spent at the fitness center was 2.2 h.
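When only a frequency table is available, formula (12.1) becomes a frequency-weighted sum. Each value is multiplied by the number of respondents who gave it, and the total is divided by n. The sketch below applies this to the counts of Table 12.5. `mean_from_counts` is a hypothetical helper name.

```python
def mean_from_counts(counts):
    """Formula (12.1) evaluated from a frequency table: sum(value * frequency) / n."""
    n = sum(counts.values())
    return sum(value * freq for value, freq in counts.items()) / n

# Frequencies from Table 12.5.
COUNTS = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
          2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}
mean = mean_from_counts(COUNTS)
print(round(mean, 4), round(mean, 1))
```

The weighted sum is 469.5 h over 212 answers, i.e., about 2.2146 h, which rounds to the 2.2 h in Table 12.7.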
The mean makes sense as a measure of central tendency only for interval or ratio scales, i.e., when the difference between the values 1 and 2 is the same as between 2 and 3, and so on.
Even for such scales, the mean is sometimes supplemented by the median. In the income example, the mean equals the income each respondent would receive if all respondents pooled their incomes and divided them equally, which is a rather fanciful situation. Moreover, if an oligarch whose income is two or three orders of magnitude higher than everyone else's gets into the sample, the mean income of all respondents rises substantially. But this increase can hardly be said to reflect the central tendency of incomes in the population under study.
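A small numeric illustration of the oligarch effect follows. The incomes are invented for this sketch and are not survey data. They are in thousands of some currency unit, and the last respondent's income is roughly three orders of magnitude above the rest. Only the standard `statistics` module is used.

```python
import statistics

# Hypothetical monthly incomes (thousands) of seven ordinary respondents.
ordinary = [30, 35, 40, 45, 50, 55, 60]
# The same sample after one very rich respondent gets into it.
with_oligarch = ordinary + [50000]

for label, sample in (("ordinary", ordinary), ("with oligarch", with_oligarch)):
    print(label, "mean:", statistics.mean(sample), "median:", statistics.median(sample))
```

The single extreme income pushes the mean from 45 to about 6289, while the median moves only from 45 to 47.5. For this reason the median is worth reporting alongside the mean for skewed interval or ratio data.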