spark.glm {SparkR}    R Documentation

Fits a generalized linear model against a SparkDataFrame. Users can call `summary` to print a summary of the fitted model, `predict` to make predictions on new data, and `write.ml`/`read.ml` to save/load fitted models.

    spark.glm(data, formula, ...)

    ## S4 method for signature 'SparkDataFrame,formula'
    spark.glm(data, formula, family = gaussian, tol = 1e-06, maxIter = 25,
      weightCol = NULL, regParam = 0)

    ## S4 method for signature 'GeneralizedLinearRegressionModel'
    summary(object)

    ## S3 method for class 'summary.GeneralizedLinearRegressionModel'
    print(x, ...)

    ## S4 method for signature 'GeneralizedLinearRegressionModel'
    predict(object, newData)

    ## S4 method for signature 'GeneralizedLinearRegressionModel,character'
    write.ml(object, path, overwrite = FALSE)

`data`: a SparkDataFrame for training.

`formula`: a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.

`...`: additional arguments passed to the method.

`family`: a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. Refer to the R family documentation at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html. Currently these families are supported: `binomial`, `gaussian`, `Gamma`, and `poisson`.
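The three accepted forms of `family` (a name, a function, or the result of calling a function) can all be normalized to a single family object. The helper `resolve_family` below is hypothetical and not a SparkR export; it is a sketch assuming SparkR follows the `stats::glm` convention of looking up a string as a function, calling a bare function with its default arguments, and otherwise using the object as-is.

```r
# Hypothetical helper (not part of SparkR): normalize a `family` argument the
# way stats::glm does, accepting a name, a function, or a family object.
resolve_family <- function(family) {
  if (is.character(family)) {
    # "gaussian" -> the gaussian() function found on the search path
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    # gaussian -> gaussian(), i.e. the family with its default link
    family <- family()
  }
  if (is.null(family$family)) {
    stop("'family' not recognized")
  }
  family
}

# All three forms describe the same distribution and link.
forms <- list("gaussian", gaussian, gaussian(link = "identity"))
resolved <- lapply(forms, resolve_family)
sapply(resolved, function(f) paste(f$family, f$link))
```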

`tol`: positive convergence tolerance of iterations.

`maxIter`: integer giving the maximum number of iteratively reweighted least squares (IRLS) iterations.
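`maxIter` caps the number of IRLS passes and `tol` decides when they have converged. The base-R sketch below fits a Poisson log-link model with plain IRLS so the role of both parameters is visible. It assumes convergence is declared when the largest relative change in the coefficients drops below `tol`, which may differ from the exact criterion Spark uses; `irls_poisson` is a hypothetical name.

```r
# Minimal IRLS for a Poisson GLM with log link (illustrative, not Spark's code).
irls_poisson <- function(X, y, tol = 1e-6, maxIter = 25) {
  beta <- rep(0, ncol(X))
  converged <- FALSE
  for (iter in seq_len(maxIter)) {
    eta <- drop(X %*% beta)        # linear predictor
    mu  <- exp(eta)                # inverse of the log link
    w   <- mu                      # IRLS weight: (dmu/deta)^2 / Var(mu) = mu
    z   <- eta + (y - mu) / mu     # working response
    # Weighted least-squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    # Assumed stopping rule: max relative coefficient change below tol
    converged <- max(abs(beta_new - beta) / (abs(beta) + 1e-8)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, iter = iter, converged = converged)
}

set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))
fit <- irls_poisson(cbind(1, x), y)
ref <- glm(y ~ x, family = poisson())   # stats::glm for comparison
rbind(irls = unname(fit$coefficients), glm = unname(coef(ref)))
```

Lowering `maxIter` below the number of passes the fit needs leaves `converged` as `FALSE`, which is the situation a too-small `maxIter` produces.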

`weightCol`: the weight column name. If this is not set or `NULL`, all instance weights are treated as 1.0.
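An integer instance weight of k makes a row count k times in the fit. The base-R sketch below checks that idea with the `weights` argument of `stats::glm`, which plays the role `weightCol` plays for a column of a SparkDataFrame. Treating Spark's instance weights as equivalent to these prior weights is an assumption here, and the small data set is made up for illustration.

```r
# Integer instance weights behave like duplicated rows in a gaussian GLM fit.
dat <- data.frame(x = c(1, 2, 3, 4, 5), y = c(1.1, 1.9, 3.2, 3.9, 5.3))
w   <- c(1, 2, 1, 3, 1)

weighted <- glm(y ~ x, family = gaussian(), data = dat, weights = w)

# Same data with each row repeated w times and every weight left at 1
expanded   <- dat[rep(seq_len(nrow(dat)), times = w), ]
unweighted <- glm(y ~ x, family = gaussian(), data = expanded)

rbind(weighted = coef(weighted), expanded = coef(unweighted))
```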

`regParam`: regularization parameter for L2 regularization.
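For the gaussian family with identity link, L2 regularization amounts to ridge regression. The sketch below solves a penalized least-squares problem in closed form with the intercept left unpenalized. The objective (residual sum of squares divided by 2n, plus `regParam`/2 times the squared coefficient norm) and the absence of feature standardization are assumptions for illustration, so the numbers will not necessarily match what Spark reports for the same `regParam`; `ridge_fit` is a hypothetical name.

```r
# Hypothetical ridge sketch: minimize
#   (1 / (2n)) * ||y - b0 - X b||^2 + (regParam / 2) * ||b||^2
# with the intercept b0 left unpenalized.
ridge_fit <- function(X, y, regParam = 0) {
  n  <- nrow(X)
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)              # center columns so b0 drops out
  yc <- y - ym
  # Normal equations of the penalized objective: (Xc'Xc + n*lambda*I) b = Xc'yc
  b <- drop(solve(crossprod(Xc) + n * regParam * diag(ncol(X)), crossprod(Xc, yc)))
  c(intercept = ym - sum(xm * b), b)
}

X <- as.matrix(mtcars[, c("wt", "hp")])
y <- mtcars$mpg
ridge_fit(X, y, regParam = 0)    # same coefficients as lm(mpg ~ wt + hp)
ridge_fit(X, y, regParam = 0.5)  # penalized slopes have a smaller L2 norm
```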

`object`: a fitted generalized linear model.

`x`: summary object of a fitted generalized linear model returned by the `summary` function.

`newData`: a SparkDataFrame for testing.

`path`: the directory where the model is saved.

`overwrite`: whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.

`spark.glm` returns a fitted generalized linear model.

`summary` returns summary information about the fitted model as a list. The list includes at least `coefficients` (the coefficients matrix, with the estimates, their standard errors, t values, and p values), `null.deviance` (the null deviance), `aic` (AIC), and `iter` (the number of iterations IRLS takes). If the data contain collinear columns, the coefficients matrix provides only the coefficient estimates.
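These component names appear to mirror the `summary.glm` convention from base R's stats package. As a hedged local illustration, the sketch below fits a `stats::glm` model and reads the same components; with SparkR you would call `summary(model)` on the fitted `spark.glm` model instead, and the set of additional components may differ.

```r
# Local stats::glm analogue of the summary components listed above.
fit <- glm(mpg ~ wt, family = gaussian(), data = mtcars)
s <- summary(fit)

s$coefficients                 # estimate, std. error, t value, p value per term
s$null.deviance                # deviance of the intercept-only model
c(s$df.null, s$df.residual)    # null / residual degrees of freedom
s$aic                          # Akaike information criterion
s$iter                         # number of IRLS iterations
```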

`predict` returns a SparkDataFrame containing the predicted values in a column named "prediction".
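For a GLM, the predicted value is the fitted mean on the response scale: the inverse link applied to the linear predictor, so for the `binomial` family it is a probability rather than a 0/1 label. The base-R sketch below shows that relationship with `stats::glm` on a binomial model; it is a local analogue under that assumption, not SparkR code.

```r
# Local analogue: binomial GLM predictions are inverse-link means.
fit <- glm(am ~ wt, family = binomial(), data = mtcars)

eta <- predict(fit, newdata = mtcars, type = "link")      # linear predictor
mu  <- predict(fit, newdata = mtcars, type = "response")  # fitted means

# Applying the inverse logit link to eta reproduces the response-scale values
all.equal(unname(binomial()$linkinv(eta)), unname(mu))
range(mu)  # probabilities strictly between 0 and 1, not 0/1 labels
```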

spark.glm since 2.0.0

summary(GeneralizedLinearRegressionModel) since 2.0.0

print.summary.GeneralizedLinearRegressionModel since 2.0.0

predict(GeneralizedLinearRegressionModel) since 1.5.0

write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0

```
## Not run:
library(SparkR)
sparkR.session()
data(iris)
# createDataFrame replaces "." in column names with "_" (Sepal.Length -> Sepal_Length)
df <- createDataFrame(iris)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "Sepal_Length", "prediction"))

# save the fitted model to the given path, replacing any existing output
path <- "path/to/model"
write.ml(model, path, overwrite = TRUE)

# read the saved model back and print its summary
savedModel <- read.ml(path)
summary(savedModel)

sparkR.session.stop()
## End(Not run)
```

Package *SparkR* version 2.1.2