spark.kstest {SparkR}  R Documentation 
spark.kstest
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a
continuous distribution.
By comparing the largest difference between the empirical cumulative distribution of the sample data and the theoretical distribution, we can provide a test for the null hypothesis that the sample data comes from that theoretical distribution.
Users can call summary to obtain a summary of the test, and print.summary.KSTest to print out a summary result.
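The comparison described above can be sketched in base R, without Spark. This is only an illustration of the two-sided KS statistic for a small sample against a normal distribution; the helper name ks_statistic is hypothetical and is not part of SparkR, and the sketch does not claim to mirror how Spark computes the statistic internally.

```r
# Hypothetical helper (not part of SparkR): two-sided KS statistic, i.e. the
# largest gap between the empirical CDF of x and a theoretical CDF.
ks_statistic <- function(x, cdf = pnorm, ...) {
  x <- sort(x)
  n <- length(x)
  theo <- cdf(x, ...)                           # theoretical CDF at each sorted point
  d_plus <- max(seq_len(n) / n - theo)          # empirical CDF above the theoretical CDF
  d_minus <- max(theo - (seq_len(n) - 1) / n)   # theoretical CDF above the empirical CDF
  max(d_plus, d_minus)
}

# Same sample as in the examples section, tested against N(0, 1)
sample_data <- c(0.1, 0.15, 0.2, 0.3, 0.25)
d <- ks_statistic(sample_data, pnorm, mean = 0, sd = 1)
print(d)
```

For a local numeric vector, the result should agree with the statistic reported by stats::ks.test(sample_data, "pnorm", 0, 1).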
spark.kstest(data, ...)

## S4 method for signature 'SparkDataFrame'
spark.kstest(data, testCol = "test", nullHypothesis = c("norm"), distParams = c(0, 1))

## S4 method for signature 'KSTest'
summary(object)

## S3 method for class 'summary.KSTest'
print(x, ...)
data 
a SparkDataFrame of user data. 
... 
additional argument(s) passed to the method. 
testCol 
column name where the test data is from. It should be a column of double type. 
nullHypothesis 
name of the theoretical distribution tested against. Currently only

distParams 
parameter(s) of the distribution. For 
object 
test result object of KSTest by 
x 
summary object of KSTest returned by 
spark.kstest returns a test result object.
summary returns summary information of the KSTest object, which is a list. The list includes the p.value (p-value), statistic (test statistic computed for the test), nullHypothesis (the null hypothesis with its parameters tested against) and degreesOfFreedom (degrees of freedom of the test).
spark.kstest since 2.1.0
summary(KSTest) since 2.1.0
print.summary.KSTest since 2.1.0
## Not run:
##D data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25))
##D df <- createDataFrame(data)
##D test <- spark.kstest(df, "test", "norm", c(0, 1))
##D
##D # get a summary of the test result
##D testSummary <- summary(test)
##D testSummary
##D
##D # print out the summary in an organized way
##D print.summary.KSTest(testSummary)
## End(Not run)
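Because summary returns a list, its components can also be read individually. The following is a minimal sketch, assuming SparkR is installed and a local Spark session can be started; the master setting "local[1]" is an assumption made for illustration, and the component names are the ones listed in the value description above.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR
sparkR.session(master = "local[1]")

data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25))
df <- createDataFrame(data)
test <- spark.kstest(df, "test", "norm", c(0, 1))
testSummary <- summary(test)

# Read individual components of the summary list
pValue <- testSummary$p.value
statistic <- testSummary$statistic
nullHypothesis <- testSummary$nullHypothesis
degreesOfFreedom <- testSummary$degreesOfFreedom

print(pValue)
print(statistic)

sparkR.session.stop()
```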