R Tutorial: R objects for statistical modeling

Want to learn more? Take the full course at https://learn.datacamp.com/courses/statistical-modeling-in-r-part-1 at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.

---

In the first video, I said that a mathematical model is a model built from mathematical stuff. A statistical model is a mathematical model that's closely tied to data. In practice, statistical models are built from a special kind of mathematical stuff: the stuff that makes up computer languages. In this video, we'll examine some of the kinds of objects in R that you will encounter in your work with models.

Three of the most important R objects for modeling are functions, formulas and data frames.

Some people describe a data frame as a kind of spreadsheet or matrix, with rows and columns. I prefer to think of data frames more simply: as a collection of variables. Each of the variables is a column. It's good practice to give a name to each of the variables in a data frame. It's easiest to describe models in terms of the names of the variables involved. The rows of a data frame are called cases. Each case is one object in the real world. A case might be a person, or it might be a person at a particular time, or anything else. But always the case is the object from which values for the individual variables are measured.
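
To make the "collection of variables" view concrete, here is a minimal sketch in base R. The `workers` data frame and its values are hypothetical, made up for illustration; they are not taken from the CPS85 data.

```r
# Hypothetical toy data: each column is a named variable, each row is one case.
workers <- data.frame(
  wage   = c(9.0, 5.5, 12.0, 7.5),
  sector = c("clerical", "service", "prof", "service"),
  exper  = c(10, 3, 15, 8)
)

names(workers)   # the variable names: "wage" "sector" "exper"
nrow(workers)    # the number of cases: 4
workers$wage     # one variable: a value for every case
workers[2, ]     # one case: a value for every variable
```

Referring to variables by name, as in `workers$wage`, is what makes it possible to describe a model in terms of variable names later on.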

We're going to use functions for several purposes, both to build models and to evaluate those models, for instance, to calculate the output of models for new inputs. The functions that build models will generally take as inputs both a data frame and a formula that describes the relationship among the variables involved in the model.
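
As a rough sketch of that two-step pattern, the example below assumes base R's `lm()` as the model-building function and `predict()` as the model-evaluating function. The video does not name specific functions at this point, and the `workers` data are hypothetical.

```r
# Hypothetical toy data, made up for illustration.
workers <- data.frame(
  wage  = c(5.5, 7.5, 9.0, 12.0, 14.0),
  exper = c(3, 8, 10, 15, 20)
)

# Building: the function takes a formula and a data frame as inputs.
wage_model <- lm(wage ~ exper, data = workers)

# Evaluating: calculate the model's output for new inputs.
new_inputs <- data.frame(exper = c(5, 12))
predicted <- predict(wage_model, newdata = new_inputs)
print(predicted)
```

The new inputs are themselves a data frame whose variable names match the names used in the formula.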

Formulas are a way to describe how you want to relate variables to one another. In a formula, variable names are used but no calculation is done with the values in those variables. Instead, the formula sets up the structure of the relationship that the modeler wants to express or explore.

All formulas involve the little squiggle symbol called "tilde." There will always be something to the right of the tilde, typically one or more variables separated by punctuation that looks like arithmetic but isn't. To start, that punctuation will be the plus-sign, but later you will see other forms of punctuation.
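
A small sketch in base R shows that writing a formula performs no calculation. The variable names come from the video's example; `exper` is added here as an assumed second explanatory variable.

```r
# No data frame or variable named wage, sector, or exper has to exist:
# creating the formula only records the structure of the relationship.
f <- wage ~ sector + exper

class(f)      # "formula"
all.vars(f)   # "wage" "sector" "exper"
deparse(f)    # "wage ~ sector + exper": the plus-sign is kept as notation, not added up

# The left side is optional, but something always sits to the right of the tilde.
one_sided <- ~ sector
length(one_sided)   # 2: the tilde and its right-hand side
```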

As an example of how functions, formulas, and data frames are used together, let's use the CPS85 data to calculate the mean wage of workers in several different sectors of the economy. The mean() function is one of the first functions newcomers to R encounter, but it isn't set up to use formulas. The mosaic package upgrades mean() and other functions so that they work with formulas while continuing to work in their original way.

The formula wage ~ sector means to "break down" the wage by sector. Using that formula in the mean() function gives the average wage in each sector.
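
The same "break down" idea can be sketched without any add-on package. The example below is a stand-in, not the video's code: it uses base R's `aggregate()`, which also accepts a formula and a data frame, on a hypothetical `workers` data frame. With mosaic loaded, the call described in the video would presumably take the form `mean(wage ~ sector, data = CPS85)`.

```r
# Hypothetical toy data standing in for CPS85; the values are made up.
workers <- data.frame(
  wage   = c(9.0, 5.5, 12.0, 7.5, 11.0, 14.0),
  sector = c("clerical", "service", "prof", "service", "clerical", "prof")
)

# Break wage down by sector and take the mean within each sector.
sector_means <- aggregate(wage ~ sector, data = workers, FUN = mean)
print(sector_means)
# clerical: 10.0, prof: 13.0, service: 6.5
```

The formula and the data frame play the same roles in both versions; only the function that receives them differs.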

Statistical models are often built to predict or account for a single variable, which we will call the "response variable". The basic idea is to construct a function that produces values for the response variable as the function's output. The function's inputs are called the "explanatory variables."

In formulas for models, the response variable is always to the left of the tilde. The explanatory variables are to the right of the tilde.
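
The two sides of a formula can be pulled apart in base R, which makes the left and right roles visible. This is only an illustrative sketch; `exper` is an assumed extra explanatory variable, not one named in the video.

```r
model_formula <- wage ~ sector + exper

# Element 2 of a two-sided formula is the left side, element 3 the right side.
response_name     <- all.vars(model_formula[[2]])   # "wage"
explanatory_names <- all.vars(model_formula[[3]])   # "sector" "exper"

# terms() gives the same split: it labels the right-hand-side terms.
term_labels <- attr(terms(model_formula), "term.labels")   # "sector" "exper"
```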

You can think of a formula as a sentence that relates the response and the explanatory variables. There are several English equivalents to "tilde." For instance, wage ~ sector can be read as any of the following:

wage as a function of sector
wage accounted for by sector
wage modeled by sector
wage explained by sector
wage given sector
wage broken down by sector

[Dr. Null enters ...]

NULL: Sorry to be late.

DTK: And you are ...?

NULL: Dr. Null, of the Null hypothesis.

DTK: Meaning ...?

NULL: The Null hypothesis is that nothing is happening, that variation is nothing but randomness.

DTK: OK. But this is a course on statistical modeling. Our object is to account for and explain variation. Sure, there's some randomness, but it's what's left over after we account for the rest.

NULL: But every statistics course needs me!

DTK: I think you're looking for the t-test course. This is statistical modeling. Bye! [and a little wave]

[Null gets up and walks away.]

DTK: Let's get back to building models!
