An In-Depth Analysis of the DistCp Tool (4)

Published: 2014-06-25 | Source: 淘測試 (Taobao Test) | Author: 凡提 | Tags: software testing


{
  // open src file
  in = srcstat.getPath().getFileSystem(job).open(srcstat.getPath());
  reporter.incrCounter(Counter.BYTESEXPECTED, srcstat.getLen());
  // open tmp file
  out = create(tmpfile, reporter, srcstat);
  // copy file
  for(int cbread; (cbread = in.read(buffer)) >= 0; ) {
    out.write(buffer, 0, cbread);
    cbcopied += cbread;
    reporter.setStatus(
        String.format("%.2f ", cbcopied*100.0/srcstat.getLen())
        + absdst + " [ " +
        StringUtils.humanReadableInt(cbcopied) + " / " +
        StringUtils.humanReadableInt(srcstat.getLen()) + " ]");
  }
} finally {
  checkAndClose(in);
  checkAndClose(out);
}
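The excerpt above depends on the surrounding Hadoop classes, so it cannot be run in isolation. As a rough illustration of the same pattern (read into a buffer, write it out, count the bytes, report progress, always close both streams), here is a minimal sketch using only java.io. The class name CopyLoopSketch and the progress callback are hypothetical stand-ins for the mapper and reporter.setStatus, and try-with-resources is used in place of DistCp's checkAndClose helper, so error handling on close may differ from the real tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.LongConsumer;

// Hypothetical stand-alone sketch; not part of Hadoop or DistCp.
final class CopyLoopSketch {
    private CopyLoopSketch() {}

    /**
     * Copies everything from in to out and returns the number of bytes copied.
     * The progress callback receives the running byte count after each write,
     * playing the role that reporter.setStatus plays in the excerpt.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize,
                     LongConsumer progress) throws IOException {
        if (bufferSize <= 0) {
            // A zero-length buffer would make read() return 0 forever.
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long copied = 0;
        // Both streams are closed when the block exits, even on failure.
        try (InputStream src = in; OutputStream dst = out) {
            for (int n; (n = src.read(buffer)) >= 0; ) {
                dst.write(buffer, 0, n);
                copied += n;
                progress.accept(copied);
            }
        }
        return copied;
    }
}
```

A caller could pass a lambda that formats the running count as a percentage of the expected file length, in the same spirit as the status string built in the excerpt.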

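Once the mappers have finished, the server-side part of the DistCp run is complete. Control then returns to the client, which performs some cleanup work, such as synchronizing owner and permission settings. This step has some problems of its own, which we analyze together with the others below.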

Problem Analysis

DistCp has three major problems. We go through them one by one below:

1. The job fails and map tasks report "DFS Read: java.io.IOException: Could not obtain block"

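This happens because the "_distcp_src_files" file is created with the system default replication factor. For example, if hadoop-site.xml sets dfs.replication=3, then _distcp_src_files has 3 replicas once it is created. When the number of maps is very large, large enough to exceed what the datanodes holding those three replicas can serve, some map tasks fail to obtain the block.

With DistCp, the "-i" option should generally never be used: setting it masks this problem, and the result is that part of the data is missing after the copy completes. A better approach is to increase the replication factor of _distcp_src_files automatically once the total number of maps has been computed, so that the access capacity grows with it and the problem no longer occurs. The community already has a simple fix for this, which directly sets the replication factor to a fairly high value.

In general, on a cluster with limited compute resources, too many map tasks do not improve copy efficiency, so a reasonable number of maps can be set with the -m option. As a rule of thumb, watch ganglia: once bytes_in and bytes_out reach their upper limit, the map count is high enough.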
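The idea of scaling the replication factor of _distcp_src_files with the number of maps can be sketched as follows. This is only an illustration under stated assumptions, not the community fix and not DistCp's actual code: the class SrcListReplication, the method suggest, and the mapsPerReplica parameter (how many concurrent readers one replica is assumed to serve comfortably) are all hypothetical.

```java
// Hypothetical helper; not part of Hadoop or DistCp.
final class SrcListReplication {
    private SrcListReplication() {}

    /**
     * Suggests a replication factor for the source-list file.
     *
     * @param numMaps            total number of map tasks reading the file
     * @param mapsPerReplica     assumed readers one replica can serve (a tuning guess)
     * @param defaultReplication cluster default, e.g. the dfs.replication value
     * @param maxReplication     upper bound, e.g. the number of datanodes
     */
    static short suggest(int numMaps, int mapsPerReplica,
                         short defaultReplication, short maxReplication) {
        if (numMaps < 0 || mapsPerReplica <= 0 || defaultReplication <= 0
                || maxReplication < defaultReplication) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Ceiling division: replicas needed so each serves at most mapsPerReplica maps.
        long needed = ((long) numMaps + mapsPerReplica - 1) / mapsPerReplica;
        // Never go below the cluster default or above the configured maximum.
        long clamped = Math.max(defaultReplication, Math.min(needed, maxReplication));
        return (short) clamped;
    }
}
```

With Hadoop on the classpath, the resulting value could then be applied to the file with FileSystem.setReplication(Path, short). The right value for mapsPerReplica depends on the cluster and would need to be measured rather than assumed.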

Original article: http://www.taobaotest.com/blogs/2516
