Models and technologies for solving statistical analysis...

Models and technologies for solving statistical analysis problems

Today, new information technologies are widely used in all fields of knowledge and in all spheres of human activity. The variety of functional possibilities makes use of information technologies and for the qualitative analysis of various economic and financial situations. Methods of classical statistics are widely used for data analysis: factor and cluster analysis, descriptive statistics, histograms, exponential smoothing, etc. Data analysis is accompanied by a large amount of various calculations: modal value and median, averages, variance, correlation, covariance, rank, percentsi , quartiles, quantiles, etc.

In addition, various types of assessments, groupings, comparisons and sorts of empirical data are performed during the analysis; finding the minimum, maximum, average values ​​and a number of other operations. Correct application of these methods makes it possible to identify functional patterns in various mass social processes, build mathematical models, etc.

Table processors allow you to implement all this variety of types of analytical work using built-in functions (category Statistical, Figure 9.40) and add-ins "Analysis package" (Figure 9.43).

The use of statistical functions facilitates the user's statistical analysis of data. The number of available statistical functions in MS Excel 2010 has increased, a number of functions have been optimized to improve accuracy, the names of some statistical functions are brought into line with the terms used by the scientific co-

Function Wizard, Category Statistical

Fig. 9.40. Function Wizard, Category Statistical

company, as well as with other names of MS Excel functions. For example, the algorithm of the BETAPASP function is replaced by a new, more accurate (the new name of the function BETA RASP). In addition, new function names better describe their purpose. For example, the CRITBIN function returns the inverse of the binomial distribution, so the name BINOM.OBR is more appropriate for it. The RANDIS function now uses a new algorithm for obtaining random numbers. In order to be able to use all statistical functions, you need to download the Analysis package add-on.

If you want to use the spreadsheet with other users who do not have MS Excel 2010 installed, you can use the compatibility functions instead of the renamed functions. Previous versions of MS Excel recognize the old function names and return the correct results. However, if there is no need to provide backward compatibility, it is recommended to use the renamed functions. To see all the compatibility functions, you must select the Compatibility category in the function wizard and specify the desired function.

In Table. 9.1 describes some of the most common statistical functions (alphabetically) used in the analysis of financial and economic data. Information about other functions can be found in the help subsystem.

Table 9.1

Description of some statistical functions

Title

Description

BINOM.OBR

Returns the smallest value for which the integral binomial distribution is less than or equal to the specified

BINOM.DIST

Returns a single value of the binomial distribution

DISP.

Estimates variance by sample

DISP.D

Calculates the variance for the population

DISPA

Evaluates variance but sample, including numbers, text, and logical values ​​

DISPERSE

Returns the variance for the population, including numbers, text, and logical values ​​

QUARTZLE.CL

Returns the quartile of the data set

QUARTR.SYS

Returns the quartile of the data set based on the percentile values ​​from 0 to 1, excluding these numbers

MAX

Returns the largest value in the argument list

MEDIAN

Returns the median of given numbers

MIN

Returns the smallest value in the argument list

MOD.NSC

Returns a vertical array of the most common (repeated) values ​​in an array or data range

MODAOD

Returns the value of the data set mode

NORM.OBR

Returns the inverse of a normal distribution

PORMSDIST

Returns the normal distribution function

PASS

Returns the number of permutations of a given number of objects that are selected from the total number of objects

PERCENTRILE

Returns k-th the percentile for range values ​​

PERCENTRATION.

Returns the percentage value of a value in a data set

RANK.RV

Returns the rank of a number in the list of numbers

AVERAGE

Returns the average of arithmetic arguments

STANDARD.

Estimates standard deviation but sample

STDEV

Calculates the standard deviation of the population

ACCOUNT

Counts the number of numbers in the argument list

COUNTS

Counts the number of cells in the range that satisfy the specified condition

COUNTRY LINK

Counts the number of cells within a range that satisfy several conditions

READ OUTLOOK

Counts the number of empty cells in the range

TRENDS

Returns values ​​according to a linear trend

FREQUENCY

Returns the frequency distribution as a vertical array

Let's take a look at some examples in which the above functions are applied.

Example 9.23. In the supermarket, there are three points for placing goods. At each point, goods of the same kind are placed. How many ways are there for placing goods of five types, if one kind of product is placed at no more than one point?

Solution

Placements are finite ordered sets of elements of a given set. To calculate the number of allocations in the library of functions of the table processor, there is a special function PEREST (number, number of selected), belonging to the category of statistical functions (Figure 9.41). Argument number in this task is the Number of Product Types & quot ;, the argument number of selected - Number of dots .

Using the Rest Function

Fig. 9.41. Using the Rest Function

Example 9.24. Given a set of random values ​​of a discrete random variable: 10, 14, 5, 6, 18, 6, 13. Calculate the mathematical expectation, variance, standard deviation, median, mode, upper quartile and a quantile with a value of 0.1.

Before proceeding to the solution of the example, we will give definitions of the calculated characteristics.

Math expectation is the mean, one of the most important characteristics of the probability distribution of a random variable. Determined by the function of AVERAGE (number1, number2, ...).

Dispersion (from Latin dispersio - scattering) - in mathematical statistics and probability theory the most common measure of dispersion, i.e. deviation from the mean. For a sample population, the variance is determined by the function of VAR. B (number1; number2; ...).

Standard deviation is the concept of probability theory and mathematical statistics. The measure of the spread of a random variable around its mean value. For a sample population, the standard deviation is determined by the function STANOTCLONE. (Number1; number2; ...).

Median is the median value of the Median tags splits the sample into two equal parts. Half of the values ​​of the variable lies below the median, half - above. The median will give a general idea of ​​where the values ​​of the variable are concentrated, in other words, where its center is located. The median is determined by the MEDIAN function (number1, number2, ...).

Fashion is the most common value of a variable. The functions MODAODH (number1, number2, ...) are defined. If the dataset does not contain duplicate data points, the MODAOD function returns the error value # N/D.

The apartments are values ​​that divide the two half of the sample (broken by the median) once again in half (from the word "quart" to the quarter).

The upper quartile is distinguished, which is larger than the median and divides the upper half of the sample (the value of the variable is greater than the median) and the lower quartile, which is less than the median and divides the lower part of the sample.

The bottom quartile is often denoted by 25%, which means that 25% of the values ​​of the variable are less than the lower quartile.

The upper quartile is often denoted by the symbol 75%, which means that 75% of the values ​​of the variable are less than the upper quartile.

Thus, the three points - the lower quartile, the median and the upper quartile - divide the sample into four equal parts.

The quartile is defined by the QUARTILE.CL function (array, part). The argument part can take only the following five values: 0 is the smallest value, 1 is the 25th percentile, 2 is the median (50th percentile value), 3 is the 75th percentile value, 4 is the largest value.

Percentile (percentile, percentile) is the hundredth part of the volume of the measured population, expressed in percentages, to which a certain value of the characteristic corresponds.

Instead of the QUARTZIL.VKL function (array, part), you can use MIN (number1, number2, ...) and MAX (number 1, number2, ...), respectively, to get the smallest and largest values, and to get the median - the function of the MEDIAN (number 1, number 2, ...). These functions are calculated faster than the QUARTILE.CL function, especially in the case of large data arrays.

Quantile is the point on the numerical axis of the measured characteristic that divides the entire set of ordered measurements into two groups with a known ratio of their number. Quantiles include the median (quantile with a value of 0.5), quartiles (for example, the upper quartile is quantile with a value of 75%, or 0.75), percentyl. The quantile is calculated by the function PROCENTIL.VKL (array; k ) , where the array is a collection of values; k - the value of the percentile in the range from 0 to 1 inclusive.

Solution of Example 9.24

It's about the sample set, i.e. randomly selected from the general population of some of its parts. Based on this, we will apply functions that are specific to the sample set (Figure 9.42).

Solution of Example 9.24

Fig. 9.42. Solution of Example 9.24

MS Excel 2010 includes components that provide access to additional functions and commands. These components are called the "add-ons". Analysis package is one of the most popular add-ins that implements advanced data analysis functions. To use add-ins, they must be installed and activated. To do this, you perform the File tab in the Settings / Add-ins / Go section. In the Add-ins dialog box that appears (Figure 9.43), select the Analysis package check box and click OK.

Add-Ins dialog box

Fig. 9.43. The Add-Ins dialog box

On the Data tab, the Analysis group will be added, which contains a button for the Data analysis tool. (Figure 9.44). Clicking this button will open the Data Analysis dialog box (Figure 9.45) to select the analysis tools.

Data tab/Analysis/button Data analysis group

Fig. 9.44. Data tab/Analysis group/Data analysis button

Data Analysis Window

Fig. 9.45. The Data Analysis window

Example 9.25. The sample salaries of the main groups of bank employees are given: administration (managers), personnel for work with clients, technical services. The data obtained are shown in the table in Fig. 9.46. It is required to calculate the basic statistical characteristics in the data groups.

Sampling employees' salaries for example 9.25

Fig. 9.46. Sampling of bank employee salaries for Example 9.25

To solve a problem, we'll use the add-on "Descriptive statistics" from data analysis. Descriptive statistics include such characteristics as mean, standard error, median, mode, standard deviation, sample variance, kurtosis, asymmetry, interval, minimum, maximum, amount, quantity. The goal of descriptive statistics is to generalize the primary results obtained as a result of observations and experiments.

Standard error characterizes average fluctuations. It should be noted that the larger the sample size, the smaller the variance of the mean values.

Kurtosis is the degree of abruptness of the empirical distribution but relative to the normal. Usually, if the excess is positive, then the peak is pointed, if negative, then the peak is rounded. The excess of the normal distribution is zero.

Asymmetry is the asymmetry coefficient in probability theory - a quantity characterizing the asymmetry of the distribution of a given random variable. It is generally accepted that asymmetry above 0.5 (regardless of the sign) is considered significant. If the asymmetry is less than 0.25, it is considered insignificant.

The solution of Example 9.25

Select the "Descriptive statistics" tool: the Data tab, the Analysis group, and the Data analysis button (Figure 9.47).

Data analysis (descriptive statistics)

Fig. 9.47. Data analysis (descriptive statistics)

We fill in the Descriptive statistics dialog box (Figure 9.48). The input interval is the Salary selection table. together with the cap; grouping by columns & quot ;; We take the first row of the table as the labels of the first row; set the output parameters - the upper left point of the output interval; summary statistics - to create a detailed output table; the level of reliability is 95%. You can place the result on an existing sheet, or a new worksheet, or a new workbook.

The result is a table of the following type (Figure 9.49).

Descriptive statistics dialog

Fig. 9.48. The "Descriptive Statistics" dialog box

Basic statistical characteristics>

Fig. 9.49. Basic statistical characteristics

Example 9.26. The store sells men's suits. The distribution of demand by size is normal with the mathematical expectation 48 and standard deviation σ = 2. Calculate the percentage of demand for the 52nd size.

Solution

We use the function NORM.DIST (x, average, standard_off, integral), which returns the normal distribution function for the specified mean and standard deviation. This function is very widely used in statistics, including when testing hypotheses.

Arguments of the NORMSDIST function:

x is the value for which the distribution is constructed. In our example, the size of x = 52.

• Mean - the arithmetic mean of the distribution. In our example, this is M = 48.

• Standard_off - the standard deviation of the distribution. In our example, σ = 2.

• Integral - a logical value that determines the form of the function. If the & quot; integral & quot; has the value TRUE, the function NORMS.DIST returns an integral distribution function; if this argument is FALSE, the weight distribution function is returned. In our example, the value is FALSE, since we are talking about the weight distribution function and we are looking for the result for a certain point x .

Create a model and fill in the data with the task (Figure 9.50).

Thus, the answer was that the demand for the 52nd size is 2.7%.

Fig. 9.50. NORMDIST function

">

thematic pictures

Also We Can Offer!

Other services that we offer

If you don’t see the necessary subject, paper type, or topic in our list of available services and examples, don’t worry! We have a number of other academic disciplines to suit the needs of anyone who visits this website looking for help.

How to ...

We made your life easier with putting together a big number of articles and guidelines on how to plan and write different types of assignments (Essay, Research Paper, Dissertation etc)