Hadoop Benchmarking: The Hadoop TeraSort Benchmark

cuma2369 · published 2020-07-29 04:31:40

TeraSort is one of the most widely used Hadoop benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark.

TeraGen generates random data that can be used as the input for a subsequent TeraSort run.

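To make the record format concrete, here is a minimal Python sketch of what "random 100-byte rows" means. It assumes the layout commonly described for TeraSort input, a 10-byte key followed by a 90-byte value. The value contents below (a hex row id padded with filler) are a simplification and not TeraGen's exact byte layout, and the names make_record and generate are hypothetical.

```python
import os

KEY_LEN = 10      # TeraSort sorts on the first 10 bytes of each record
RECORD_LEN = 100  # every TeraGen row is 100 bytes long

def make_record(row_id: int) -> bytes:
    """Build one 100-byte record: a random 10-byte key plus a 90-byte value.

    Simplified value layout (an assumption, not TeraGen's real one):
    the row id as 32 uppercase hex digits, padded with b"F" filler.
    """
    key = os.urandom(KEY_LEN)
    value = f"{row_id:032X}".encode("ascii").ljust(RECORD_LEN - KEY_LEN, b"F")
    return key + value

def generate(num_rows: int) -> list[bytes]:
    """Generate num_rows records with consecutive row ids starting at 0."""
    return [make_record(i) for i in range(num_rows)]

if __name__ == "__main__":
    for record in generate(3):
        # Print the record length and its (random) key in hex
        print(len(record), record[:KEY_LEN].hex())
```

Only the key participates in the sort, which is why the keys are random while the value just carries identifying payload.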

Generate input with TeraGen

The syntax for TeraGen:

$ hadoop jar hadoop-*examples*.jar teragen \
    <number of 100-byte rows> <output dir>
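The first argument counts rows, not bytes, so a target data size has to be divided by the 100-byte row length. A small Python helper (rows_for_size is hypothetical, for illustration only) that does this arithmetic, assuming a decimal terabyte of 10^12 bytes:

```python
RECORD_LEN = 100  # TeraGen writes fixed-size 100-byte rows

def rows_for_size(total_bytes: int) -> int:
    """Return the number of 100-byte rows needed to reach at least total_bytes."""
    return -(-total_bytes // RECORD_LEN)  # ceiling division on integers

if __name__ == "__main__":
    one_tb = 10**12  # decimal terabyte (assumption: 10^12 bytes, not 2^40)
    print(rows_for_size(one_tb))  # 10000000000
```

For 1 TB this gives 10000000000, which is the value you would pass as <number of 100-byte rows>.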

To make TeraGen run on multiple nodes with multiple tasks, you may need to specify the number of map tasks (30 in this example; the property name below is for Hadoop 2):

$ hadoop jar hadoop-*examples*.jar teragen \
    -Dmapreduce.job.maps=30 \
    <number of 100-byte rows> <output dir>

The number of mappers depends on the number of rows you will generate and the number of nodes you have. For more information on how to set the number of mappers and reducers, please check this post.

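TeraGen is a map-only job: its total row range is divided among the map tasks, and each map task writes its own output file. The Python sketch below shows one way to split N rows into contiguous (start, count) ranges so that every map task gets a nearly equal share. It illustrates the general idea only; split_rows is a hypothetical helper, and TeraGen's exact split arithmetic may differ.

```python
def split_rows(total_rows: int, num_maps: int) -> list[tuple[int, int]]:
    """Divide rows [0, total_rows) into num_maps contiguous (start, count) ranges."""
    splits = []
    current = 0
    for i in range(num_maps):
        # End of range i is ceil(total_rows * (i + 1) / num_maps), using integer math
        goal = -(-(total_rows * (i + 1)) // num_maps)
        splits.append((current, goal - current))
        current = goal
    return splits

if __name__ == "__main__":
    print(split_rows(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

With 10000000000 rows and 30 map tasks, each task would generate roughly 333 million rows (about 33 GB).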

Run TeraSort

After the data is generated, sort it with TeraSort:

$ hadoop jar hadoop-*examples*.jar terasort \
    <input dir> <output dir>

You may also need to set the number of mappers and reducers for better performance.

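The number of reducers matters because TeraSort produces a global order by range partitioning: it samples input keys to choose split points, so reducer i only receives keys smaller than those sent to reducer i+1, and the sorted reducer outputs can simply be concatenated. A minimal Python sketch of that idea follows; pick_split_points and partition are hypothetical names, and the real implementation (which runs inside MapReduce over HDFS data) differs in detail.

```python
import bisect
import random

def pick_split_points(sample_keys: list[bytes], num_reducers: int) -> list[bytes]:
    """Choose num_reducers - 1 evenly spaced split points from a key sample."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]

def partition(key: bytes, split_points: list[bytes]) -> int:
    """Return the reducer index for key: the number of split points <= key."""
    return bisect.bisect_right(split_points, key)

if __name__ == "__main__":
    random.seed(0)
    keys = [random.randbytes(10) for _ in range(10_000)]  # 10-byte keys
    splits = pick_split_points(random.sample(keys, 1_000), num_reducers=4)
    buckets = [[] for _ in range(4)]
    for k in keys:
        buckets[partition(k, splits)].append(k)
    # Each "reducer" sorts its bucket; concatenating buckets gives a global order
    merged = [k for b in buckets for k in sorted(b)]
    print(merged == sorted(keys), [len(b) for b in buckets])
```

A representative sample keeps the buckets roughly equal in size, which is what lets more reducers translate into more parallelism.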

Validate the sorted output data of TeraSort

TeraValidate checks that the output data of TeraSort is globally sorted.

The syntax for TeraValidate, where <output dir> is the directory written by TeraSort:

$ hadoop jar hadoop-*examples*.jar teravalidate \
    <output dir> <terasort-validate dir>
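What TeraValidate checks can be sketched in a few lines of Python: keys must be non-decreasing within each TeraSort output file and across the boundary between consecutive files. This in-memory sketch treats each output file as a list of keys; validate is a hypothetical helper, whereas the real tool runs as a MapReduce job over the output directory and writes its report to <terasort-validate dir>.

```python
def validate(partitions: list[list[bytes]]) -> list[str]:
    """Check key order within each partition and across partition boundaries.

    Returns a list of error messages; an empty list means globally sorted.
    """
    errors = []
    prev_last = None  # last key of the most recent non-empty partition
    for idx, keys in enumerate(partitions):
        # Within a partition, every key must be >= the one before it
        for i in range(1, len(keys)):
            if keys[i] < keys[i - 1]:
                errors.append(f"partition {idx}: key {i} out of order")
        if keys:
            # Across files, the first key must be >= the previous file's last key
            if prev_last is not None and keys[0] < prev_last:
                errors.append(f"boundary before partition {idx} out of order")
            prev_last = keys[-1]
    return errors

if __name__ == "__main__":
    print(validate([[b"a", b"b"], [b"c", b"d"]]))  # []
```

Checking only adjacent pairs and file boundaries is enough: if each file is sorted and each boundary is in order, the concatenation of all files is sorted.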

Translated from: https://www.systutorials.com/hadoop-terasort-benchmark/
