Summary | 11 Annotations
accumulators, which are variables that are only “added” to, such as counters and sums
2018/10/11 14:07
.
2018/10/11 14:13
broadcast variables, which can be used to cache a value in memory on all nodes
2018/10/11 14:07
In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work
2018/10/11 14:13
The appName parameter is a name for your application to show on the cluster UI
2018/10/11 14:15
master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode
2018/10/11 14:15
One important parameter for parallel collections is the number of partitions to cut the dataset into
2018/10/12 06:32
By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
2018/10/18 03:32