Spark using Python: how to resolve "Stage x contains a task of very large size (xxx KB). The maximum recommended task size is 100 KB"...
I've just created a Python list from range(1, 100000).
Using SparkContext, I did the following steps:
a = sc.parallelize([i for i in range(1, 100000)])
b = sc.parallelize([i for i in range(1, 100000)])
c = a.zip(b)
>>> [(1, 1), (2, 2), -----]
sum = sc.accumulator(0)
c.foreach(lambda xy: sum.add(xy[1] - xy[0]))
This gives the following warning:
WARN TaskSetManager: Stage 3 contains a task of very large size (4644 KB). The maximum recommended task size is 100 KB.
How can I resolve this warning? Is there any way to reduce the task size? And will it affect the time complexity on big data?
Solution
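Instead of building a Python list on the driver and passing it to sc.parallelize, create the RDD with sc.range (for example, sc.range(1, 100000)). That way you avoid transferring a huge list from the driver to the executors.
Of course, such RDDs are usually used for testing purposes only, so you do not want them to be broadcast.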
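The advice above (do not ship a large list from the driver) could look roughly like the following. This is only a sketch, assuming a live SparkContext is passed in as `sc`; the helper name `diff_sum` and its parameter `n` are made up for illustration and do not come from the question. It keeps the accumulator from the question and only changes how the two RDDs are created.

```python
def diff_sum(sc, n=100000):
    """Sum (y - x) over zipped pairs, building both RDDs with sc.range.

    `sc` is assumed to be a live pyspark.SparkContext; `diff_sum` and `n`
    are hypothetical names used only for this sketch.
    """
    # sc.range takes (start, end) like Python's range, so the driver does
    # not have to build a 100000-element list and serialize it into tasks.
    a = sc.range(1, n)
    b = sc.range(1, n)

    # Both RDDs are created the same way, so they have matching partitions
    # and element counts, which zip requires.
    c = a.zip(b)

    total = sc.accumulator(0)
    # Index into the pair instead of unpacking in the lambda signature,
    # which works on both Python 2 and Python 3.
    c.foreach(lambda xy: total.add(xy[1] - xy[0]))
    return total.value
```

In the pyspark shell, where `sc` already exists, `diff_sum(sc)` should return 0 for these two identical ranges, and the "task of very large size" warning should no longer be triggered by the input data.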