S-PLUS for Windows - version 4.5
S-PLUS is a powerful computing tool that combines the usefulness of a statistical analysis package with that of a publication quality graphics package and a matrix-based programming language. It's easy enough to use for quick and simple tasks, yet powerful enough for the most demanding ones. The goal of this demonstration is to provide a basic introduction to using S-PLUS. An S-PLUS session differs from that of other statistical software. You will find it to be an interactive approach where the results from one step lead to the next. This introduction to S-PLUS is necessarily limited in scope to only a handful of analyses. Once you become familiar with S-PLUS and browse through some of the online help topics, you will discover tools for practically any type of analysis you need. The basic S-PLUS module allows for time series, survival, and multivariate analyses, among others.
Topics included in this tutorial:
1. Starting S-PLUS the first time
2. Some things to keep in mind
3. Beginning an analysis
4. Visualizing your data
5. Simple Linear Regression
6. Non-linear Regression
7. Polynomial Regression
8. Writing functions
9. What to do next
1. Starting S-PLUS the first time
(Back to Top)
The first time you run S-PLUS from the computer lab, you will be asked for a directory where your work will be saved. S-PLUS will use this for a working directory whenever you start a session, however, you can easily change to another directory at any time. So before you start S-PLUS, decide on your default working directory and, if necessary, create it. Then when you start S-PLUS and it asks you for a start-up directory, click on "Browse" and proceed to select the desired directory. When you click "OK", S-PLUS will create the necessary subdirectories that it needs (_Data and _Prefs) and start the application.
When S-PLUS opens, you will most likely see a dialog box to Select Data. Just click "Cancel" for now. After that, there may or may not be any open windows. You will want a Commands Window so the first thing to do is open one up if it isn't open already. From the Window menu, select Commands Window. Alternatively, you can click on the Commands Window button on the toolbar.
Another useful window to have open is the Object Browser. Open this by clicking on the Object Browser button on the toolbar.
You can set up your preferences so that both of these windows open when you start an S-PLUS session. From the Options menu, select General Settings, and then select the Startup tab. On the left hand side you will see a group of three items that you can Open at Startup. Check Command Line and Object Browser. You can also uncheck Select Data Dialog if you don't want that coming up every time.
2. Some things to keep in mind
(Back to Top)
3. Beginning an analysis
(Back to Top)
For the remainder of this tutorial, we will be analyzing the following dataset:
|
Concentration |
0.3330 |
0.1670 |
0.0833 |
0.0416 |
0.0208 |
0.0104 |
0.0052 |
|
Velocity |
3.636 |
3.636 |
3.236 |
2.666 |
2.114 |
1.466 |
0.866 |
These data are measurements of the rate or velocity of a chemical reaction for different concentrations of substrate. We are interested in fitting a model of the relationship between concentration and velocity. The first thing we need to do is enter the data. We can do this from the Commands Window, from the Object Browser, or by importing an existing file. Most of this tutorial will focus on command line input so let's begin there. At the prompt, create two vectors:
> conc <- c(0.3330, 0.1670, 0.0833, 0.0416, 0.0208, 0.0104, 0.0052) > vel <- c(3.636, 3.636, 3.236, 2.660, 2.114, 1.466, 0.866) |
We use the function
c(), which stands for concatenate, to create a vector. The individual elements can be numeric, as in this example, or character, or any other mode. However, all elements in a vector must be the same mode. Note that the elements are separated by commas. The spaces between elements are included for clarity and are not required by S-PLUS.Now, we will combine these two vectors into a single data frame:
> df_data.frame(conc, vel) |
Let's look at the data frame to be sure we entered the data correctly. To view any object in S-PLUS, simply type its name:
> df
conc vel
1 0.3330 3.636
2 0.1670 3.636
3 0.0833 3.236
4 0.0416 2.660
5 0.0208 2.114
6 0.0104 1.466
7 0.0052 0.866
|
Oops. It looks like there is an error in one of the velocity entries. The fourth one down, 2.660, should be 2.666. You could use the up-arrow to recall the command you used to create
vel, make the change, and run it again. Then use the up-arrow again to recall the command you used to create df and run that one again. Let's look at another way to do it. First, we'll change just that one element in the vector vel. Individual elements in a vector are referenced in square brackets. We want to change the fourth element, so we type:
|
Take a look at
vel to see that it has been changed. There are a couple ways to change elements in a data frame. Treating the data frame as a matrix, we can reference the element in the fourth row and second column like this:
|
Note the order that the row and column are referenced: first the row, then the column. Data frames also allow us to reference individual columns by their names. This is done with the name of the data frame, followed by a dollar sign, and the name of the column. So the
vel data can be referenced as df$vel. Now we can change the fourth element like this:
|
Note that
df$vel is a vector so we only need one number in the brackets. Let's take another look at df to be sure we have it right this time.
> df
conc vel
1 0.3330 3.636
2 0.1670 3.636
3 0.0833 3.236
4 0.0416 2.666
5 0.0208 2.114
6 0.0104 1.466
7 0.0052 0.866 |
Looks good! We could have created this data frame in other ways. If the data were already entered into a spreadsheet, database, or ASCII file, you could import it using the Import Data command under the File menu. Alternatively, you could create a data frame using a spreadsheet-like interface right in S-PLUS. From the Data menu, select New Data Object. You then have the choice of creating a data frame, matrix, or vector. Select data frame and you are presented with a data grid, much like your favorite spreadsheet. You can enter values in each cell and give names to each column. Right-click in a column header and select properties to give the column a name or set the characteristics of that column. Properties for the entire data frame can be set by right-clicking in the upper left hand corner of the grid. Changes to a data frame (edits, adding new variables, etc.) can be done in this spreadsheet format at any time. From the Object Browser, click on data.frame in the left hand window to get a list of all available data frames in your working directory. Then double-click on a data frame listed in the right hand window to open it in the spreadsheet window.
Now we're ready to do an analysis.
4. Visualizing your data
(Back to Top)
The first thing we want to do is plot the data. You can easily get a basic plot with the following:
|
The first vector given is the independent variable to be plotted on the X-axis, the second is the dependent variable for the Y-axis. If you execute a plotting function and there is no active graphsheet, a default graphsheet will be opened for you. If you then do another plot, the first one will be lost and the new plot will be draw in the same graphics window. You can keep your first plot by opening another graphsheet by typing:
|
The default is a landscape-oriented graphsheet. You can create portrait graphsheets, square graphsheets, any shape you want. Check graphsheet in the help files to see how.
Getting back to the graph we just created. It looks like this:

We're going to look at 3 different ways to analyze these data: simple linear regression, non-linear regression, and polynomial regression.
5. Simple Linear Regression
(Back to Top)
You're probably thinking that it would be a mistake to use simple linear regression considering the plot we just produced. Well, there are ways of transforming data so that we can use linear regression methods. We'll fit a Michaelis-Menten function to these data. Using our data, the form of this function is as follows:
![]()
Vm and K are the parameters we are interested in estimating from these data. With a little bit of algebraic gymnastics, we can get the above equation to look like this:
![]()
It may not look like this helped, but it did. If you look closely, you'll see that this has the form of a simple linear regression model. Making these substitutions, conc/vel=ytrans, K/Vm=a, and 1/Vm=b, the equation becomes:
![]()
Here's our game plan. We'll first create the new transformed variable ytrans, then fit the linear regression to estimate the parameter a and b. Then we can calculate Vm and K from there.
We can easily add another variable to our data frame:
|
Take a look at
df and see that there is a new column added. Plot the new variable against conc to check whether a linear regression model is appropriate.
|
Now we are ready to fit our regression model. We'll use the function
lm(), which stands for linear model.
|
By default, you will get an error if there are any missing values in your data when you run this function. If this happens, you may want to omit those cases that contain missing values and fit the model on the remaining cases. To do so, run the same function with an argument to specify the desired action.
|
This might make more sense if you know that S-PLUS designates missing values by
NA. Now let's look at this function call. The desired model is specified with ytrans~conc. Think of the tilde as an equal sign when specifying a model. ytrans is our response, so it goes on the left side. conc is our explanatory variable, so it goes on the right. An intercept term is assumed so you do not need to include it in the model definition. However, it is possible to eliminate the intercept if you wanted.Notice that there was not any output automatically generated when you fit the regression. The results have been saved in an object we called
lmfit. This object is actually a list that contains several objects, which you can see with the function names()
|
Any of these terms can be viewed or used with the same method you used to view a variable in a data frame: object name, followed by a dollar sign, then the element name. For example,
lmfit$call. If you want to view most of the standard regression output, use the summary() function:
> summary(lmfit)
Call: lm(formula = ytrans ~ conc, data = df)
Residuals:
1 2 3 4 5 6 7
0.0008333 -0.001727 -0.0001864 0.0005013 0.0001363 0.00009112 0.0003515
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 0.0043 0.0005 9.5415 0.0002
conc 0.2596 0.0031 83.7034 0.0000
Residual standard error: 0.0009071 on 5 degrees of freedom
Multiple R-Squared: 0.9993
F-statistic: 7006 on 1 and 5 degrees of freedom, the p-value is 4.612e-009
Correlation of Coefficients:
(Intercept)
conc -0.6497 |
You can also easily view diagnostic plots (residuals, observed vs fitted, Cook's distance, etc.):
> plot(lmfit) |
S-PLUS knows that this is the output from a linear model and will generate the appropriate plots. How does it know? Check to see what class it is:
> class(lmfit) [1] "lm" |
You can get the coefficients from your model fit with the coef() function:
> coef(lmfit) (Intercept) conc 0.004303145 0.2596026 |
These are estimates for the parameters we called a and b, respectively. The output from
coef() is just a vector of length 2. Let's use the coefficients to add a line to our graph of ytrans vs conc:
> plot(df$conc, df$ytrans) > abline(coef(lmfit)) |
Here's what it looks like:

The last thing we have to do is back-calculate to get our non-linear parameters, Vm and K:
> Vm_1/coef(lmfit)[2]
> K_Vm*coef(lmfit)[1]
> Vm
conc
3.852041
> K
conc
0.01657589 |
Remember that the output from the function is a vector of length 2 so we can access the desired coefficient using the brackets as shown. We now have our parameter estimates!
Now that you have gone through a linear regression at the command line, try it from the pull down menu. From the Statistics menu, select Regression > Linear... This opens a Linear Regression dialog box where you will enter the specifics of your analysis. Enter the name of the data frame or select it from the combo-box. Enter a model formula as we did at the command line or click on the Create Formula button for assistance. Lastly, name the object that you want to contain your results, or at least be aware of the default name so you know where to find it later.
6. Non-linear Regression
(Back to Top)
We can also directly fit the Michaelis-Menten function to our data using non-linear regression. Remember the term "sum-of-squares" from your old regression class? When you fit a regression model, you get a "fitted value" for every data point used to fit the model. If you take the difference between the fitted value and the observed value, you get what we call a residual. Then if you square all the residuals and add them up, you get the residual sum-of-squares. The smaller that is, the better the model fits your data. You may have heard this called the least-squares method. Well, non-linear regression works the same way. With non-linear regression, we specify the form of the model we want to fit and the parameters that need to be estimated. S-PLUS then searches for parameter values that will minimize the residual sum-of-squares.
The first thing we need to do is add our parameters to the data frame. We'll also give them initial values from the analysis we just ran.
> param(df,"K")_0.0166
> param(df,"Vm")_3.852
> df
Parameters:
$K:
[1] 0.0166
$Vm:
[1] 3.852
Variables:
conc vel ytrans
1 0.3330 3.636 0.091584158
2 0.1670 3.636 0.045929593
3 0.0833 3.236 0.025741656
4 0.0416 2.666 0.015603901
5 0.0208 2.114 0.009839167
6 0.0104 1.466 0.007094134
7 0.0052 0.866 0.006004619 |
The data frame
df now has the parameters K and Vm associated with it. To run the analysis, we use the function nls(), which stands for non-linear least squares. Use the summary() function to view the results.
> nlsfit_nls(vel~Vm*conc/(K+conc),data=df)
> summary(nlsfit)
Formula: vel ~ (Vm * conc)/(K + conc)
Parameters:
Value Std. Error t value
K 0.0178866 0.000992791 18.0165
Vm 3.9109300 0.055769900 70.1263
Residual standard error: 0.0671906 on 5 degrees of freedom
Correlation of Parameter Estimates:
K
Vm 0.754 |
You can view the estimates for
K and Vm from the summary output, or you can use the coef() function again. How do the estimates compare with those from the previous analysis? We want to plot our non-linear fit to see how well it matches the data. First, plot the original variables again. Remember to create a new graphsheet if you want to keep your previous graph.
> plot(df$conc, df$vel, xlim=c(0,0.4), ylim=c(0,4)) |
There's something new here. We used the xlim and ylim arguments to specify the limits for the x and y axes, respectively. By default, S-PLUS will set the limits just enough to plot all the data. Sometimes you may want to plot beyond the data if you're going to add other things later or just to make the plot look a little better.
To add the model fit to the plot is going to take a little more work than with simple linear regression. The x-axis on our plot goes from 0 to 0.4, so we're going to need to generate a vector that covers this range and then calculate a y-value for each x-value using the parameters we just estimated. The number of x-values you generate will determine how smooth the line is going to look. You will almost always get a smooth line with 100 x-values.
> x_seq(0, 0.4, length=100) |
This does just what you think it does. It generates a sequence of 100 numbers from 0 to 0.4. Now we calculate the associated y-values:
> y2_(coef(nlsfit)["Vm"]*x)/(coef(nlsfit)["K"]+x) |
This shows you another way that you can reference elements in a vector. If the elements are named, you can use that in the brackets instead of its position number. There's another way to get our y-values for the plot that's perhaps the simplest. We'll use the function predict() that will predict fitted response values for a given set of x-values. The function wants the x-values in a dataframe and with the same variable name(s) as the original data. Here's how we do it:
> y2_predict(nlsfit,data.frame(conc=x)) |
The function predict() can be used with results from linear models, non-linear models, and generalized linear models. Check the online help to see all it can do. Now to add the line to our plot:
> lines(x, y2) |
I'm sure you noticed I called the y-values
y2. This is the fit from our second model. Let's add a line from our first model to see how they compare. We can use the same vector of x-values to calculate a new set of y-values and add the line to our plot.
> y1_(Vm*x)/(K+x) > lines(x, y1, lty=2, col=3) |
For this to work, you must have created the objects
Vm and K as we did in the previous section. Also note that we used the line type argument (lty) for a dotted line and the color argument (col) to get a different color. Here is what the resulting plot should look like:
You can try non-linear regression from the pull down menus as well. From the Statistics menu, select Regression > Nonlinear... to open the dialog box. Having been through it at the command line, you should have no trouble knowing what to do in the dialog box.
7. Polynomial Regression
(Back to Top)
The last method we'll use to fit these data is polynomial regression, where the model takes on the form y = b0 + b1x + b2x2 + b3x3 + × × × , etc. We're going to fit a second order polynomial like this:
> polyfit2_lm(vel~poly(conc,2), data=df) |
We're using the same function
lm() as linear regression, but we're also using the function poly() to generate our polynomial formula. We could just as easily fit this model this way:
> polyfit2_lm(vel~conc+conc^2, data=df) |
This is fine when you have a second order polynomial, but what about when you want to fit a 10th order, or a 20th order polynomial? Life can be simpler sometimes using the function
poly(). But there is one distinct difference between these two approaches. If you look at the summaries for both, you'll find that all the significance tests result in the same P-values, but the coefficients themselves are different. The difference is in what the poly() function does. Instead of producing a polynomial with x and x2, you get an orthogonal and normalized form of the polynomial. Suffice it to say that you get appropriate results from the regression (i.e., the significance tests are correct), but the resulting coefficients are on a different scale. So before we look at the coefficients, we want to transform them back to the original scale. This is how we do that:
> coef2_poly.transform(poly(df$conc,2), coef(polyfit2))
> coef2
x^0 x^1 x^2
1.288439 25.65224 -56.50026 |
Now we want to draw this line on our graph. We could add it to the plot with the other two lines, but let's create a new graph and label the x and y axes:
> plot(df$conc, df$vel, xlim=c(0,0.4), ylim=c(0,5), + xlab="Substrate Concentration", ylab="Reaction Rate") |
There's something new with this line. You can enter the entire command on a single line if you want, but if you hit Enter before the command is complete, you get the "+" prompt on the second line where you finish the command. NOTE: the "+" on the second line IS NOT part of the command, it is the prompt to continue. So if you enter this all together on a single line, DO NOT include the "+".
There are a couple ways to generate the y-values for the line. Perhaps the most straightforward is the following:
> y3_coef2[1] + coef2[2]*x + coef2[3]*x^2 |
We just plug in the coefficients and the appropriate x-values and we're done. There's another way that doesn't involve so much typing, especially when dealing with higher order polynomials. It involves matrix multiplication so hopefully you remember something about linear algebra. We're going to create a matrix of x-values and then multiply that by our coefficient vector.
> y3_cbind(1,x,x^2) %*% coef2 |
The function
cbind() is used to bind columns together to form a matrix. The operator %*% is used for matrix multiplication. There is also a function called rbind() that binds vectors together as rows instead of columns. Add the line to the graph:
> lines(x,y3) |
And it should look like this:

The last thing we're going to do is increase the polynomial order to the maximum. There are seven data points so the maximum order is six. (Why?) First we fit the new regression, then transform the coefficients, generate new y-values, and add the new line to our graph.
> polyfit6_lm(vel~poly(conc,6),data=df) > coef6_poly.transform(poly(df$conc,6),coef(polyfit6)) > y4_cbind(1,x,x^2,x^3,x^4,x^5,x^6) %*% coef6 > lines(x,y4,lty=2) |
When you add the lines, you will see several warnings go by because some of the resulting y-values greatly exceed the range of the graph. The plot should now look like this:

It's a good fit (the line goes through every point) but how useful is it for predicting new points? Take a look at the summaries from each fit. Ever seen an R2 of 1 before?
8. Writing functions
(Back to Top)
The last thing we're going to look at is what really makes S-PLUS powerful: writing your own functions. If you do something once it's not a problem to type in each command as we've done here. But if you frequently do some particular task, it would be nice to automate it so that it could be done with a single command. In calculating the y-values for the polynomial lines for our graph, we used the
cbind() function with a term for each column. If you were going to do this a lot with large order polynomials, you might want to automate the task of creating this matrix. From the file menu, select New… or click on the New toolbar button.
polymat_function(x, deg) {
out_numeric(0)
for(i in 0:deg) out_cbind(out, x^i)
return(out)
} |
Type this into the script window exactly. Then with the script window active, press F10 or click the Run toolbar button.
This will source the script you just wrote and create the function. There is a split screen at the bottom of the script window where you will see the lines of the function being executed as it's sourced. Check for any errors here. If there are errors, it's most likely that you typed something in wrong. Double check the syntax and try again. When it sources OK, use the function
> polymat(1:5, 3)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 1 2 4 8
[3,] 1 3 9 27
[4,] 1 4 16 64
[5,] 1 5 25 125 |
Note the
1:5 used here. This is just a shorthand way of getting a vector of integers between two numbers. Now you can use this function when you want to create a vector of y-values to plot. This is how you would use it as an alternative to how you created y4 previously:
> y4_polymat(x, 6) %*% coef6 |
Now if you really want to automate the process, let's write a function that will fit whatever order polynomial you want, transform the coefficients, and plot the graph all with one command. Open another script window and type in this function.
poly.fit_function(deg=2, data=df) {
pfit_lm(data$vel~poly(data$conc,deg))
pcoef_poly.transform(poly(data$conc,deg), coef(pfit))
x_seq(0, 0.4, length=100)
y_polymat(x, deg) %*% pcoef
plot(data$conc, data$vel, xlim=c(0,0.4), ylim=c(0,5),
xlab="Substrate Concentration", ylab="Reaction Rate")
lines(x,y)
invisible()
} |
You should recognize most of the lines in this function The
invisible() function is used because nothing is being returned from poly.fit() other than a graph. Notice also that we're calling the polymat() function that we just created. You don't have to put everything in a single function. It's usually more efficient to build a library of smaller functions that do specific tasks and then pull them together as needed. Notice that the data frame is an argument to this function. This way the function is not specific to just one data set.
Remember to source the function (F10 or the Run button). Notice that both arguments have defaults so you could use the function with no arguments within the parentheses. To try a 4th order polynomial and see the fit, use the function like this:
> poly.fit(4) |
That's all there is to it. Hopefully you can see the value of being able to write your own functions this easily. Can you modify the function so that it returns the transformed coefficients?
9. What to do next
(Back to Top)
If you have any questions about using S-PLUS, check their online help from the Help menu. There is a lot of well presented information there. If you want to find functions that will do what you want, select Language Reference from the Help menu. From the Contents tab, you can select any general subject and get a listing of all relevant functions. All common subjects are available from this menu, like ANOVA models, regressions, survival analysis, time series, etc. From the Index tab you can request help on a specific function. You can also find help in using the pull-down menu interface from other selections under the Help menu. Try the S-PLUS Help menu selection, then select the Tutorial from the Contents tab.
There is also a tutorial available on the internet at http://www.mscs.dal.ca/splus/contents.html that may provide some tips and pointers.
A couple things you might be interested in: