A few questions related to R

Question

The Prostate Dataset

The prostate dataset comes from a study on 97 men with prostate cancer who were due to receive radical prostatectomy.

The data contain the following variables:

  • lcavol: log(cancer volume in cm³)
  • lweight: log(prostate weight in g)
  • age: age in years
  • lbph: log(benign prostatic hyperplasia amount)
  • svi: seminal vesicle invasion
  • lcp: log(capsular penetration)
  • gleason: Gleason score
  • pgg45: percentage Gleason scores 4 or 5
  • lpsa: log(prostate specific antigen in ng/mL)

Question 1

  • Validate that the prostate data frame contains 97 observations.
    Hint: First install the faraway package (if you haven’t already) as instructed in Lesson 1, Slide 49. The following R statement will load the prostate data frame:

data("prostate", package = "faraway")

Use the nrow() function to see how many observations (rows) the data frame has. For example, the following statement prints the number of observations in the cars data frame: nrow(cars).
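A minimal sketch of the check described above. It assumes the faraway package is already installed; the variable name nObs is only illustrative.

```r
# Assumption: faraway is installed, e.g. via install.packages("faraway")
data("prostate", package = "faraway")

# nrow() returns the number of rows (observations) in a data frame
nObs <- nrow(prostate)
print(nObs)

# TRUE if the data frame has the 97 observations described in the study
print(nObs == 97)
```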

Question 2

  • Calculate descriptive statistics for each of the variables.
    Hint: Use the summary() function. For example: summary(cars).
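As a sketch, the same summary() call from the hint can be applied to the full prostate data frame. This assumes the faraway package is installed; prostateSummary is a hypothetical name.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")

# summary() on a data frame reports min, quartiles, median, mean and max
# for every numeric column
prostateSummary <- summary(prostate)
print(prostateSummary)
```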

Question 3

  • Create a new data frame that includes the following variables: lcavol, lweight, age and lpsa.
    Use this new data frame for all questions below.

Hint: In the following example, we select two variables (agegp and alcgp) from the esoph data frame and name the new data frame esophSubDf:

esophSubDf <- esoph[c("agegp", "alcgp")]
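Following the same pattern as the esoph example, a possible subset of the prostate data might look like the sketch below. The name prostateSubDf is hypothetical, chosen to mirror esophSubDf, and the faraway package is assumed to be installed.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")

# Select the four columns by name; indexing with a character vector
# keeps the result a data frame
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# Inspect the structure and the first few rows of the new data frame
str(prostateSubDf)
head(prostateSubDf)
```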

Question 4

  • Calculate descriptive statistics for each of the variables using the new data frame.

Question 5

  • Create a scatter plot matrix for all the variables using the new data frame.

Hint: Use the pairs() function (see Lesson 2, Slide 50).
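A short sketch of the scatter plot matrix, assuming the faraway package is installed and using the hypothetical name prostateSubDf for the four-variable data frame. The main title is an illustrative choice.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# pairs() draws one scatter plot for every pair of columns
pairs(prostateSubDf, main = "Prostate data: scatter plot matrix")
```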

Question 6

  • Create a (Pearson) correlation matrix for all the variables.
    Hint: Use the cor() function (see Lesson 2, Slide 48).
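One way this could look, assuming the faraway package is installed; prostateSubDf and corMatrix are hypothetical names. Pearson is already the default method of cor(), so naming it only makes the choice explicit.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# cor() on a data frame returns the matrix of pairwise correlations
corMatrix <- cor(prostateSubDf, method = "pearson")
print(corMatrix)
```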

Question 7

  • Show the same matrix again, but round the correlations (use two decimal places).
    Hint: Use the round() function. The following example calculates the correlation matrix for the cars data frame and rounds the numbers:
    round(cor(cars), 2)
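Applied to the prostate subset, the same idea might be sketched as follows. The faraway package is assumed to be installed, and prostateSubDf and roundedCor are hypothetical names.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# round(x, 2) keeps two decimal places in every cell of the matrix
roundedCor <- round(cor(prostateSubDf), 2)
print(roundedCor)
```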

Question 8

Create a regression model:
The predictor variable (X) should be lpsa.
The outcome variable (Y) should be lcavol.
Show the summary of the model.

Hint: Use the lm() and summary() functions (see Lesson 2, Slide 51).
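A minimal sketch of such a model, assuming the faraway package is installed. The names prostateSubDf and simpleModel are hypothetical. In an R formula the outcome goes on the left of ~ and the predictor on the right.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# Outcome (Y) lcavol regressed on predictor (X) lpsa
simpleModel <- lm(lcavol ~ lpsa, data = prostateSubDf)

# summary() reports coefficients, standard errors, R-squared and the F-test
print(summary(simpleModel))
```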

Question 9

Visualize the two variables and the model you just created by doing the following:

Create a scatter plot. Put lcavol on the y-axis and lpsa on the x-axis. Include the regression line and label the axes.

Hint: See Lesson 2, Slide 52.
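The plot could be sketched with base R graphics as below; the course slide may use a different approach. The faraway package is assumed to be installed, and the names plotModel and prostateSubDf, the axis labels, and the line styling are illustrative choices.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]
plotModel <- lm(lcavol ~ lpsa, data = prostateSubDf)

# Formula interface: y ~ x puts lcavol on the y-axis and lpsa on the x-axis
plot(lcavol ~ lpsa, data = prostateSubDf,
     xlab = "lpsa: log(prostate specific antigen in ng/mL)",
     ylab = "lcavol: log(cancer volume in cm3)",
     main = "lcavol vs. lpsa")

# abline() accepts a simple-regression lm object and draws its fitted line
abline(plotModel, col = "red", lwd = 2)
```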

Question 10

Update the regression model by adding a second predictor: age.
Show the regression model summary.
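A sketch of the two-predictor model, assuming the faraway package is installed; prostateSubDf and multiModel are hypothetical names. Additional predictors are joined with + on the right-hand side of the formula.

```r
# Assumption: faraway is installed
data("prostate", package = "faraway")
prostateSubDf <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# lcavol is still the outcome; lpsa and age are the predictors
multiModel <- lm(lcavol ~ lpsa + age, data = prostateSubDf)
print(summary(multiModel))
```

An existing single-predictor fit can also be extended with update(model, . ~ . + age), which refits the model with the extra term.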
