
Expression of Experimental Data: Graphing and Analysis



 

 

Introduction

Graphical and statistical analyses are two of the most common ways to express experimentally obtained data. Throughout this semester we will use both techniques to report our findings as well as our confidence in those findings. The first experiment this semester is therefore designed as a self-directed tutorial that will either introduce you to, or refresh your memory of, the graphical and statistical approaches you will encounter throughout this term.

Data Handling

In several laboratory experiments this semester you will be collecting large quantities of numerical data. In order to tabulate and express this data, you will need to become familiar with a "spreadsheet" program. There are several programs of this sort available, but the one we recommend is Excel. As a Microsoft program, Excel is available to students on most PCs and is accessible on most University computers. For this reason, this lab recommends Excel and will provide tutorials for it. A tutorial on simple data handling in Excel can be accessed here.

Graphing - The Calibration Graph

Beyond simple data handling, the first technique that will be introduced is how to successfully and concisely express a large quantity of data graphically. No one likes looking at a table full of hundreds of numbers and trying to figure out a particular trend or pattern. What we do like looking at are pictures! And in science that means graphs. If the same data is presented graphically, the results are much easier to observe as well as explain.

Types of graphs are as varied as the data they represent, but one of the most common types of graphs used for data analysis, at least in this lab, is the ‘calibration’ graph. Briefly, this is a graph of standard reference data that is used to determine the value of some variable in an unknown sample. For example, you will use this technique this semester to determine the concentration of phosphate in local water samples using spectrometric methods.

Looking at the data given in the table below, we see the concentration (mol/L) as well as the corresponding absorbance (AU) for a set of standard phosphate concentrations in water. If our overall experimental goal is to determine the concentration of phosphate in a water sample obtained from Lake Bradford, we need a method of analyzing our standard data that lets us establish a trend and then use that trend to determine the concentration of phosphate in our unknown sample. With the data already given, this is a rather elementary task; it will be far more laborious when you actually perform the procedure yourself. At this point, all we have to do is generate a scatter-plot, like the one shown below, of the dependent variable (absorbance) versus the independent variable (phosphate concentration).


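[Figure graph1: scatter-plot of absorbance (AU) versus phosphate concentration (mol/L) with best-fit line]

Conc. (mol/L)     Abs. (AU)
2.00 x 10^-2      0.333
1.00 x 10^-2      0.163
5.00 x 10^-3      0.084
2.50 x 10^-3      0.041
1.25 x 10^-3      0.019
6.25 x 10^-4      0.011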

 

 

 

 

 

 

 

 

 


Once the scatter-plot has been generated, a "best-fit line" can be added to see whether our data can be described as linear, logarithmic, polynomial, or exponential. This process is called regression (linear regression when the best fit is a straight line). Remember these different possibilities; if you don’t, it will come back to haunt you in later experiments. For the example above, plotting the data gives a linear graph, which can therefore be fit with a linear equation of the form y = mx + b. The actual linear fit equation is shown in the upper right-hand corner of the graph, along with the corresponding R^2 value. The R^2 value is a statistical measure of how well the "best-fit line" fits the data: the closer R^2 is to 1, the better the fit. Now that we have an equation that describes the relationship between phosphate concentration and spectrophotometric absorbance, we can use it to determine the concentration of phosphate in our unknown. After measuring the absorbance of our sample, we solve the fit equation for x, that is, concentration = (absorbance - b)/m. For example, if the absorbance of our unknown was reported to be 0.126, then what is the concentration of phosphate? Is this answer reasonable? [See Answer]
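For those who would like to check a spreadsheet fit by another route, the calibration procedure can be sketched in a few lines of Python. This is only an illustration, not part of the Excel tutorial: the function name linear_fit and the variable names are our own choices, and the fit is an ordinary least-squares line through the six standards in the table.

```python
# Standard phosphate data from the table: concentration (mol/L), absorbance (AU).
conc = [2.00e-2, 1.00e-2, 5.00e-3, 2.50e-3, 1.25e-3, 6.25e-4]
absorbance = [0.333, 0.163, 0.084, 0.041, 0.019, 0.011]


def linear_fit(x, y):
    """Return slope, intercept, and R^2 of the least-squares line y = m*x + b."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_mean - m * x_mean
    r_squared = sxy ** 2 / (sxx * syy)
    return m, b, r_squared


m, b, r_squared = linear_fit(conc, absorbance)

# Invert y = m*x + b to find the concentration that gives the measured absorbance.
unknown_abs = 0.126
unknown_conc = (unknown_abs - b) / m

print(f"Abs = {m:.3f} * Conc + ({b:.5f}),  R^2 = {r_squared:.4f}")
print(f"Unknown concentration: {unknown_conc:.2e} mol/L")
```

A quick sanity check on the result: an absorbance of 0.126 lies between the absorbances of the 5.00 x 10^-3 and 1.00 x 10^-2 mol/L standards, so a reasonable answer should lie between those two concentrations.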

Graphing - Using the Slope

Another type of graphical analysis you will face this semester is extracting information from the slope of a line. In particular, when we address the concept of kinetics, we can use slopes to determine the individual orders in a rate equation. Reactions are categorized as zero-order, first-order, second-order, or mixed-order (higher-order) reactions. These reaction orders are important because they tell us how strongly the concentration of each reactant affects the overall rate at which a reaction progresses. We obviously don't expect you to understand all of these concepts at this point; there will be plenty of time (ha ha) for that later. For now, simply by determining the slope of the line that results from plotting the logarithm of the reaction rate versus the logarithm of a reactant's initial concentration, we can determine the individual reaction order for each chemical in a reaction.

For example, let’s say we want to determine the individual reaction orders for the reaction between nitrogen monoxide (NO) and oxygen (O2) from the data presented below.

 

Trial       [NO]o (mol/L)     [O2]o (mol/L)     Instantaneous Rate (mol/L h)
Trial #1    0.020             0.010             0.028
Trial #2    0.020             0.040             0.114
Trial #3    0.020             0.020             0.057
Trial #4    0.040             0.020             0.227
Trial #5    0.010             0.020             0.014

 

What do we need to do? Well, first we need a balanced chemical reaction:

\[ 2\,\mathrm{NO} + \mathrm{O_2} \rightarrow 2\,\mathrm{NO_2} \]

Second, we need a rate equation, where m and n are the reaction orders for NO and O2:

\[ \mathrm{Rate} = k\,[\mathrm{NO}]^{m}\,[\mathrm{O_2}]^{n} \]

Third, we need this equation rearranged such that it is in a linear form:

\[ \log(\mathrm{Rate}) = m \log[\mathrm{NO}] + \log\left(k\,[\mathrm{O_2}]^{n}\right) \]

and

\[ \log(\mathrm{Rate}) = n \log[\mathrm{O_2}] + \log\left(k\,[\mathrm{NO}]^{m}\right) \]

Please notice that each of the above equations has the form of a straight line, y = (slope)x + b, where the slope is the reaction order in each case (m for NO, n for O2) and the intercept b is constant as long as the other reactant's concentration does not change. You can look forward (if you would like) to the lab on Kinetics to see a more complete breakdown of how these two equations were developed, but for our purposes here we just need to know that they are linear equations.

At this point, two graphs must be generated, both of the form log (Rate) versus log (Reactant). It is imperative that in each graph the concentration of the other reactant is constant. Looking at the original data, this means that we will plot Trials #3, #4, and #5 to generate our first graph ([O2] is constant), while Trials #1, #2, and #3 will be used in the second graph ([NO] is constant). Linear regression (i.e., adding a "best-fit" line) will allow us to determine the slope for each equation. The slope of each line is the individual reaction order: the first graph gives the order for NO and the second gives the order for O2.
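The two log-log fits described above can be sketched in Python as a cross-check on a spreadsheet trendline. This is a minimal illustration under our own naming (trials, loglog_slope, order_no, order_o2 are not from the lab manual); it uses base-10 logarithms and an ordinary least-squares slope.

```python
import math

# Trials from the table: ([NO]o in mol/L, [O2]o in mol/L, instantaneous rate).
trials = {
    1: (0.020, 0.010, 0.028),
    2: (0.020, 0.040, 0.114),
    3: (0.020, 0.020, 0.057),
    4: (0.040, 0.020, 0.227),
    5: (0.010, 0.020, 0.014),
}


def loglog_slope(points):
    """Least-squares slope of log10(rate) versus log10(concentration)."""
    xs = [math.log10(c) for c, _ in points]
    ys = [math.log10(r) for _, r in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# First graph: [O2] is constant in Trials 3, 4, and 5, so [NO] is the variable.
order_no = loglog_slope([(trials[t][0], trials[t][2]) for t in (3, 4, 5)])

# Second graph: [NO] is constant in Trials 1, 2, and 3, so [O2] is the variable.
order_o2 = loglog_slope([(trials[t][1], trials[t][2]) for t in (1, 2, 3)])

print(f"Slope for NO graph: {order_no:.2f} (order {round(order_no)})")
print(f"Slope for O2 graph: {order_o2:.2f} (order {round(order_o2)})")
```

Because the rates are experimental, the slopes will not be exact integers; rounding each to the nearest whole number gives the reaction order.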

As if this were not enough, more data can still be gathered from our graphs. Looking at the equations carefully, we notice the common factor k, the rate constant, which you will learn far more about in lecture than you will ever want to. Because k appears in the intercept of each line, we can extract a value of k from each graph, average the two, and report an overall rate equation. Assuming our calculations are correct (you should check them yourself), we finally arrive at an overall equation of

\[ \mathrm{Rate} = \left(7.4 \times 10^{3}\ \mathrm{M^{-2}\,s^{-1}}\right)[\mathrm{NO}]^{2}\,[\mathrm{O_2}]^{1} \]
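If you want to check the value of k yourself, one possible route is sketched below in Python. It assumes the reaction orders have already been rounded to 2 for NO and 1 for O2, takes the intercept of each log-log line, and divides out the concentration that was held constant (0.020 mol/L in both graphs). The helper name loglog_intercept and the variable names are ours, and the time unit of k is simply whatever time unit the rate data carry.

```python
import math


def loglog_intercept(concs, rates):
    """Least-squares intercept of log10(rate) versus log10(concentration)."""
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(r) for r in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean


# Graph 1 (Trials 5, 3, 4): [O2] fixed at 0.020 mol/L, intercept = log(k [O2]^1).
b_no = loglog_intercept([0.010, 0.020, 0.040], [0.014, 0.057, 0.227])
k_from_no_graph = 10 ** b_no / 0.020 ** 1

# Graph 2 (Trials 1, 3, 2): [NO] fixed at 0.020 mol/L, intercept = log(k [NO]^2).
b_o2 = loglog_intercept([0.010, 0.020, 0.040], [0.028, 0.057, 0.114])
k_from_o2_graph = 10 ** b_o2 / 0.020 ** 2

k_avg = (k_from_no_graph + k_from_o2_graph) / 2

print(f"k from NO graph: {k_from_no_graph:.3g}")
print(f"k from O2 graph: {k_from_o2_graph:.3g}")
print(f"Average k:       {k_avg:.2g}")
```

The two graphs give slightly different values of k because each best-fit line absorbs the experimental scatter differently, which is why the two values are averaged.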

 

[Figures graph2 and graph3: plots of log (Rate) versus log (Reactant) with best-fit lines]

 

Statistics

Analyzing our data graphically is only half of the battle, as this type of analysis rarely tells us how ‘accurate’ or ‘precise’ our data may be. For this aspect, we turn to statistical analysis. Statistics are a way for us to easily express the reliability of the data and analysis we present. With that being said, the rest of this portion of the experiment will be directed towards introducing several statistical methods that must be mastered in order to express our data in a manner that can easily be critiqued by us and the scientific community.

Overall, there are two areas of Statistics of great importance to a chemist: the analysis of error in repeated measurements and the analysis of distributions and central tendencies. Basically, the first application deals with trying to ascertain the ‘true’ value of a measurement, while the latter deals with observing patterns or trends in large collections of values or measurements.

Error Analysis

When you measure the same thing over and over again, you often end up with slightly different results no matter how hard you try. This is because every measurement has some error or uncertainty associated with it. Even advanced instruments such as radar guns have errors associated with them. To help understand what we mean, here are 5 measurements taken to determine the mass of carbohydrate in 50.0 g of a particular protein:

12.62 g, 11.91 g, 13.07 g, 12.73 g, and 12.59 g.

If we were asked to report this scientifically, we would never just list the 5 values; we would give a ‘true’ value and its associated error. Huh? How do we do that?

The Mean

The first step in correctly reporting our finding is to figure out the mean of our data. For those of you who have forgotten, the mean is the average of our set of data:

\[ \bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \]

For our example, the mean is found by adding up the individual data points (x_i) and dividing by the size of the sample (N) as shown:

\[ \bar{x} = \frac{12.62 + 11.91 + 13.07 + 12.73 + 12.59}{5} = 12.58\ \mathrm{g} \]

Standard Deviation

The next statistic we need to calculate is the standard deviation. Specifically, the standard deviation measures how closely data are clustered about the mean value, and is technically defined as:

\[ s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^{2}}{N - 1}} \]

In general, the smaller the standard deviation, the more tightly the data cluster about the mean and the more precise our measurement is. In particular, the numerator in this equation calculates the residual for each piece of data. In other words, the numerator calculates how much an individual measurement differs from the mean. As for the squares and the square root, they take care of the fact that some of our data points are larger than the mean while others are smaller than the mean.

NOTE:  Some statistical approaches require the variance of our data. This is simply the square of our standard deviation.

Again using our example data:

\[ s = \sqrt{\frac{(12.62 - 12.58)^{2} + (11.91 - 12.58)^{2} + (13.07 - 12.58)^{2} + (12.73 - 12.58)^{2} + (12.59 - 12.58)^{2}}{5 - 1}} = 0.42\ \mathrm{g} \]
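The mean, standard deviation, and variance of the five carbohydrate measurements can also be checked with Python's built-in statistics module. This is just a sketch for verifying hand or spreadsheet calculations; the variable names are ours. Note that statistics.stdev and statistics.variance use the N - 1 (sample) form.

```python
import statistics

# The five carbohydrate masses (g) measured in 50.0 g of protein.
masses = [12.62, 11.91, 13.07, 12.73, 12.59]

mean = statistics.mean(masses)          # sum of the values divided by N
std_dev = statistics.stdev(masses)      # sample standard deviation (N - 1)
variance = statistics.variance(masses)  # square of the standard deviation

# Residuals show how far each measurement lies from the mean.
residuals = [x - mean for x in masses]

print(f"Mean:               {mean:.2f} g")
print(f"Standard deviation: {std_dev:.2f} g")
print(f"Variance:           {variance:.3f} g^2")
print("Residuals:", [f"{r:+.2f}" for r in residuals])
```

Printing the residuals makes it easy to see why they are squared: some are positive and some are negative, and they would otherwise cancel when summed.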

Confidence and the Student's t

Finally, we can calculate the confidence interval for the ‘true’ value (μ) that our data represent. This statistical value incorporates several other values, including the mean, the standard deviation of the mean, and the Student's t. Simply put, it is calculated by taking the mean and adding and subtracting the corresponding confidence interval. The generalized form of the equation is shown below, where t is the value of the Student's t at a given number of degrees of freedom and confidence level:

\[ \mu = \bar{x} \pm \frac{t\,s}{\sqrt{N}} \]

For our example:

\[ \mu = 12.58 \pm \frac{t\,(0.42)}{\sqrt{5}} \]

The value of t here is vital, as scientific data is generally expressed at 95% or 99% confidence. Using the table shown below, and the fact that we have 4 degrees of freedom (N - 1), we see that our Student's t value will be either 2.78 or 4.60. Let’s report both just to see…

\[ \mu = 12.58 \pm 0.53\ \mathrm{g} \quad (95\%\ \text{confidence}) \]

or

\[ \mu = 12.58 \pm 0.87\ \mathrm{g} \quad (99\%\ \text{confidence}) \]


Degrees of Freedom     90%      95%      99%
1                      6.31     12.70    63.7
2                      2.92     4.30     9.92
3                      2.35     3.18     5.84
4                      2.13     2.78     4.60
5                      2.02     2.57     4.03
6                      1.94     2.45     3.71
7                      1.90     2.36     3.50
8                      1.86     2.31     3.36
9                      1.83     2.26     3.25
10                     1.81     2.23     3.17
11                     1.80     2.20     3.11
12                     1.78     2.18     3.06
13                     1.77     2.16     3.01
14                     1.76     2.14     2.98
Infinite               1.64     1.96     2.58

Q Table (90% Confidence)     Number of Obs. (N)
0.76                         4
0.64                         5
0.56                         6
0.51                         7
0.47                         8
0.44                         9
0.41                         10

Analytical Chemistry, An Introduction, 7th Edition: Table 7-2, p. 152.

Quantitative Chemical Analysis, 5th Edition: Table 4-5, p. 82.

What we have calculated is essentially a set of error bars. We can report with 95% certainty that the amount of carbohydrate in this particular protein is within +/- 0.53 grams of 12.58 grams, and with 99% certainty that it is within +/- 0.87 grams of 12.58 grams. In other words, we can conclude that 50.0 g of this particular protein will contain about 12.05 to 13.11 grams of carbohydrate at 95% confidence, and anywhere from about 11.71 to 13.45 grams at 99% confidence.
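The confidence calculation can be sketched in Python as well. The t values below are copied by hand from the Student's t table for 4 degrees of freedom; the dictionary T_VALUES_4_DOF and the other names are our own, and nothing here looks t up automatically.

```python
import math
import statistics

# The five carbohydrate masses (g) from the example.
masses = [12.62, 11.91, 13.07, 12.73, 12.59]

# Student's t for N - 1 = 4 degrees of freedom, read from the table.
T_VALUES_4_DOF = {"95%": 2.78, "99%": 4.60}

n = len(masses)
mean = statistics.mean(masses)
# Standard deviation of the mean: s divided by the square root of N.
std_error = statistics.stdev(masses) / math.sqrt(n)

# Half-width of the confidence interval at each confidence level: t * s / sqrt(N).
half_widths = {level: t * std_error for level, t in T_VALUES_4_DOF.items()}

for level, half_width in half_widths.items():
    low, high = mean - half_width, mean + half_width
    print(f"{level}: {mean:.2f} +/- {half_width:.2f} g  ({low:.2f} to {high:.2f} g)")
```

The half-widths should come out close to the +/- 0.53 g and +/- 0.87 g quoted above; small differences in the last digit can arise depending on where intermediate values are rounded.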

Outliers

Utilizing our same example, what if we took one more experimental measurement and found the carbohydrate content to be 17.64 grams? Immediately you notice that this value is well outside the confidence intervals just reported, but being a good scientist you include it in your calculations. Shockingly, you notice that its presence dramatically affects both your mean and standard deviation. Luckily, you are about to learn an approach that lets you statistically justify treating this data point as an outlier, a piece of data that is ‘far away’ from the rest of the data.

The Q-Test

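The tool that will help you with this situation is the Q-test, a method dedicated entirely to determining if one particular data point can be rejected from the others.  Shown below is the equation for Q Calculated, which can be compared to a value from the Q Table for the same number of observations. If the value of Q Calculated is found to be greater than Q Table, then the data point can be discarded.

\[ Q_{\text{calculated}} = \frac{\text{gap}}{\text{range}} = \frac{\left| x_{\text{suspect}} - x_{\text{nearest}} \right|}{x_{\text{largest}} - x_{\text{smallest}}} \]

For our example, with N = 6 observations:

\[ Q_{\text{calculated}} = \frac{17.64 - 13.07}{17.64 - 11.91} = \frac{4.57}{5.73} = 0.80 \]

In our example, Q Calculated (0.80) > Q Table (0.56 for N = 6), so we can reject this data point!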
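The Q-test is simple enough to sketch in Python. The Q Table values below are copied from the 90% confidence table; the function name q_test and its return format are our own choices for illustration, and the sketch assumes the suspect value appears in the data set and that N is between 4 and 10.

```python
# Q Table values at 90% confidence, keyed by number of observations (N).
Q_TABLE_90 = {4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(values, suspect):
    """Return (Q calculated, Q table, reject?) for one suspect data point."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]

    # The gap is the distance from the suspect point to its nearest neighbor.
    idx = ordered.index(suspect)
    neighbors = []
    if idx > 0:
        neighbors.append(ordered[idx - 1])
    if idx < len(ordered) - 1:
        neighbors.append(ordered[idx + 1])
    gap = min(abs(suspect - v) for v in neighbors)

    q_calc = gap / data_range
    q_table = Q_TABLE_90[len(ordered)]
    return q_calc, q_table, q_calc > q_table


# The original five measurements (g) plus the suspicious sixth one.
masses = [12.62, 11.91, 13.07, 12.73, 12.59, 17.64]
q_calc, q_table, reject = q_test(masses, 17.64)

print(f"Q calculated = {q_calc:.2f}, Q table = {q_table:.2f}, reject: {reject}")
```

Note that the Q Table value depends on the total number of observations including the suspect point, which is why N = 6 is used here rather than 5.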

Now you should be well on your way to successfully completing the problems given in the Lon-Capa Assignment that constitutes the majority of this first lab assignment. Good luck!

 

 

 

 

 

 

 

 

 

 

© 2006 FSU Chemistry and Biochemistry