Hadoop TeraSort Benchmark
TeraSort is one of Hadoop's most widely used benchmarks. The Hadoop distribution ships both the input generator and the sorting implementation: TeraGen generates the input and TeraSort sorts it. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark.
TeraGen generates random data that can be used as input for a subsequent TeraSort run.
Generate input with TeraGen
The syntax for TeraGen:
$ hadoop jar hadoop-*examples*.jar teragen \
    <number of 100-byte rows> <output dir>
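Because TeraGen takes a row count rather than a data size, it can help to derive the count from the amount of data you want. The following is a minimal sketch that relies only on the 100-byte row size stated above; the helper name rows_for_bytes and the use of a decimal gigabyte are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Convert a target data size into the row count that TeraGen expects.
# Each TeraGen row is 100 bytes, as stated in the syntax above.
ROW_BYTES=100

# rows_for_bytes <size in bytes>: prints the number of 100-byte rows
# (hypothetical helper, not part of Hadoop).
rows_for_bytes() {
  echo $(( $1 / ROW_BYTES ))
}

ONE_GB=$(( 1000 * 1000 * 1000 ))   # decimal gigabyte, an assumption of this sketch
ROWS_1GB=$(rows_for_bytes "$ONE_GB")
echo "$ROWS_1GB"
```

The value in ROWS_1GB could then be passed as the first argument to teragen in place of <number of 100-byte rows>.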
To make TeraGen run on multiple nodes with multiple tasks, you may need to specify the number of map tasks (30 here as an example; the property name is the Hadoop 2 one):
$ hadoop jar hadoop-*examples*.jar teragen \
    -D mapreduce.job.maps=30 \
    <number of 100-byte rows> <output dir>
The right number of mappers depends on the number of rows you generate and the number of nodes you have. For more information on how to set the number of mappers and reducers, please check this post.
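As a rough illustration of how the row count and the node count interact, the sketch below derives a map count from a hypothetical cluster shape and reports how much data each map task would write. The values of NODES and MAPS_PER_NODE are assumptions, not recommendations; they are chosen so that the result matches the 30 map tasks used in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical cluster shape; adjust these to your own cluster.
NODES=10            # assumed number of worker nodes
MAPS_PER_NODE=3     # assumed number of map tasks to run on each node
ROWS=10000000000    # 10^10 rows of 100 bytes each, i.e. 1 TB of input

# One possible rule of thumb (an assumption): spread map tasks evenly over nodes.
MAPS=$(( NODES * MAPS_PER_NODE ))

# Integer division; any remainder rows are ignored in this estimate.
ROWS_PER_MAP=$(( ROWS / MAPS ))
BYTES_PER_MAP=$(( ROWS_PER_MAP * 100 ))

echo "maps=$MAPS rows_per_map=$ROWS_PER_MAP bytes_per_map=$BYTES_PER_MAP"
```

If the estimated bytes per map look too large or too small for your nodes, adjust MAPS_PER_NODE and pass the resulting MAPS value as mapreduce.job.maps.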
Run TeraSort
After the data is generated, sort it with TeraSort:
$ hadoop jar hadoop-*examples*.jar terasort \
    <input dir> <output dir>
You may also need to set the number of mappers and reducers for better performance.
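As one way to set the reducer count, the sketch below builds a TeraSort command line that passes mapreduce.job.reduces, the Hadoop 2 property for the number of reduce tasks, as a generic -D option. The value 30 and the HDFS paths are hypothetical placeholders. The number of map tasks for TeraSort generally follows from how the input is split, so only the reducer count is set here.

```shell
#!/usr/bin/env bash
# Build a TeraSort command line with an explicit reducer count.
REDUCES=30                        # assumed value, for illustration only
INPUT_DIR=/benchmarks/tera-in     # hypothetical HDFS path
OUTPUT_DIR=/benchmarks/tera-out   # hypothetical HDFS path

# The jar name keeps the wildcard used in this tutorial. It stays unexpanded
# inside the quoted string and is expanded by the shell when the command runs.
TERASORT_CMD="hadoop jar hadoop-*examples*.jar terasort -D mapreduce.job.reduces=${REDUCES} ${INPUT_DIR} ${OUTPUT_DIR}"

# Print the command for review before running it.
echo "$TERASORT_CMD"
```

After checking the printed command, it can be executed with eval "$TERASORT_CMD" from the directory that contains the examples jar.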
Validate the sorted output data of TeraSort
TeraValidate checks that the output data of TeraSort is globally sorted.
The syntax for TeraValidate:
$ hadoop jar hadoop-*examples*.jar teravalidate \
    <output dir> <terasort-validate dir>
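The three steps can be chained into one script. The following is a minimal sketch, not a definitive benchmark harness: the paths, the row count, the DRY_RUN switch, and the helper functions run_step and run_pipeline are all hypothetical. By default it only prints the commands so that they can be reviewed first.

```shell
#!/usr/bin/env bash
# End-to-end sketch: TeraGen, then TeraSort, then TeraValidate.
# All paths and the row count are hypothetical placeholders.
EXAMPLES_JAR="hadoop-*examples*.jar"   # wildcard as used in this tutorial
ROWS=10000000                          # 10^7 rows of 100 bytes = 1 GB
IN_DIR=/benchmarks/tera-in
OUT_DIR=/benchmarks/tera-out
REPORT_DIR=/benchmarks/tera-validate
DRY_RUN=${DRY_RUN:-1}                  # 1 = only print commands, 0 = execute them

# run_step <command string>: print the command, and execute it unless DRY_RUN=1.
run_step() {
  echo "+ $1"
  if [ "$DRY_RUN" -eq 0 ]; then
    eval "$1" || return 1
  fi
}

# Run the three stages in order and stop at the first failure.
run_pipeline() {
  run_step "hadoop jar $EXAMPLES_JAR teragen $ROWS $IN_DIR" &&
  run_step "hadoop jar $EXAMPLES_JAR terasort $IN_DIR $OUT_DIR" &&
  run_step "hadoop jar $EXAMPLES_JAR teravalidate $OUT_DIR $REPORT_DIR"
}

run_pipeline
```

Setting DRY_RUN=0 in the environment makes the script execute each command instead of only printing it.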
Translated from: https://www.systutorials.com/hadoop-terasort-benchmark/