Sunday, May 17, 2015

Cluster monitoring tools (continuously updated)

Sematext - SPM

I think its UI looks very nice, but it is a paid service




Ambari bundles Ganglia & Nagios

Installing a Hadoop Cluster with three Commands

Ambari (the graphical monitoring and management environment for Hadoop)


Sharing my Ambari installation experience


Rapidly deploying a Hadoop big-data environment with Ambari

http://www.cnblogs.com/scotoma/archive/2013/05/18/3085248.html

Introduction to Ganglia

http://www.ascc.sinica.edu.tw/iascc/articals.php?_section=2.4&_op=?articalID:5134

RPi-Monitor


A monitoring tool designed specifically for the Raspberry Pi; it tracks:

  • CPU Loads
  • Network
  • Disk Boot
  • Disk Root
  • Swap
  • Memory
  • Uptime
  • Temperature
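RPi-Monitor reads these figures from standard Linux interfaces such as /proc/loadavg, /proc/meminfo, /proc/uptime, and /sys/class/thermal/thermal_zone0/temp. As a rough illustration (not RPi-Monitor's actual code, and the sample values below are made up), parsing a few of those sources could look like:

```python
# Illustrative sketch: parsing the Linux files that metrics like the ones
# above come from. The sample strings mimic each file's real format.

def parse_loadavg(text):
    """CPU load: first three fields of /proc/loadavg (1/5/15-minute averages)."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_meminfo(text):
    """Memory: /proc/meminfo gives kB values keyed by field name."""
    info = {}
    for line in text.splitlines():
        key, value = line.split(":")
        info[key.strip()] = int(value.split()[0])  # value looks like "NNN kB"
    return info

def parse_temp(text):
    """Temperature: the SoC sensor reports millidegrees Celsius."""
    return int(text.strip()) / 1000.0

print(parse_loadavg("0.15 0.10 0.05 1/123 4567"))   # (0.15, 0.1, 0.05)
print(parse_meminfo("MemTotal: 448776 kB\nMemFree: 320000 kB")["MemTotal"])
print(parse_temp("47234\n"))                        # 47.234
```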




2015/05/17

I'm looking for tools that can monitor Hadoop and Spark performance as well as the cluster's power consumption.
I haven't found an ideal solution yet.

[Paper Note] Papers related to the Raspberry Pi


Heterogeneity: The Key to Achieve Power-Proportional Computing


da Costa, G. ; IRIT, Univ. de Toulouse, Toulouse, France

The Smart 2020 report on low carbon economy in the information age shows that 2% of the global CO2 footprint will come from ICT in 2020. Out of these, 18% will be caused by data-centers, while 45% will come from personal computers. Classical research to reduce this footprint usually focuses on new consolidation techniques for global data-centers. In reality, personal computers and private computing infrastructures are here to stay. They are subject to irregular workload, and are usually largely under-loaded. Most of these computers waste tremendous amount of energy as nearly half of their maximum power consumption comes from simply being switched on. The ideal situation would be to use proportional computers that use nearly 0W when lightly loaded. This article shows the gains of using a perfectly proportional hardware on different type of data-centers: 50% gains for the servers used during 98 World Cup, 20% to the already optimized Google servers. Gains would attain up to 80% for personal computers. As such perfect hardware still does not exist, a real platform composed of Intel I7, Intel Atom and Raspberry Pi is evaluated. Using this infrastructure, gains are of 20% for the World Cup data-center, 5% for Google data-centers and up to 60% for personal computers.
This paper compares the performance of Intel processors against the Pi, so it can serve as a reference for heterogeneous-environment comparisons.

Published in:

Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on

Date of Conference:

13-16 May 2013


Affordable and Energy-Efficient Cloud Computing Clusters: The Bolzano Raspberry Pi Cloud Cluster Experiment


Abrahamsson, P. ; Fac. of Comput. Sci., Free Univ. of Bozen-Bolzano, Bolzano, Italy ; Helmer, S. ; Phaphoom, N. ; Nicolodi, L. 

We present our ongoing work building a Raspberry Pi cluster consisting of 300 nodes. The unique characteristics of this single board computer pose several challenges, but also offer a number of interesting opportunities. On the one hand, a single Raspberry Pi can be purchased cheaply and has a low power consumption, which makes it possible to create an affordable and energy-efficient cluster. On the other hand, it lacks in computing power, which makes it difficult to run computationally intensive software on it. Nevertheless, by combining a large number of Raspberries into a cluster, this drawback can be (partially) offset. Here we report on the first important steps of creating our cluster: how to set up and configure the hardware and the system software, and how to monitor and maintain the system. We also discuss potential use cases for our cluster, the two most important being an inexpensive and green test bed for cloud computing research and a robust and mobile data center for operating in adverse environments.

Published in:

Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on (Volume:2 )

Date of Conference:

2-5 Dec. 2013


Technical development and socioeconomic implications of the Raspberry Pi as a learning tool in developing countries


Ali, M. ; Sch. of Eng., Univ. of Warwick, Coventry, UK ; Vlaskamp, J.H.A. ; Eddin, N.N. ; Falconer, B. 

The recent development of the Raspberry Pi mini computer has provided new opportunities to enhance tools for education. The low cost means that it could be a viable option to develop solutions for education sectors in developing countries. This study describes the design, development and manufacture of a prototype solution for educational use within schools in Uganda whilst considering the social implications of implementing such solutions. This study aims to show the potential for providing an educational tool capable of teaching science, engineering and computing in the developing world. During the design and manufacture of the prototype, software and hardware were developed as well as testing performed to define the performance and limitation of the technology. This study showed that it is possible to develop a viable modular based computer systems for educational and teaching purposes. In addition to science, engineering and computing; this study considers the socioeconomic implications of introducing the EPi within developing countries. From a sociological perspective, it is shown that the success of EPi is dependant on understanding the social context, therefore a next phase implementation strategy is proposed.

Published in:

Computer Science and Electronic Engineering Conference (CEEC), 2013 5th

Date of Conference:

17-18 Sept. 2013



Raspberry PI Hadoop Cluster


A blog with an installation tutorial~

Tuesday, May 5, 2015

[Hadoop] "Browse the filesystem" fails to connect

Clicking "Browse the filesystem" produces an error and fails to connect



[ Solution 1 ] Run on master01

cd /opt/hadoop/etc/hadoop
vi hdfs-site.xml

Added the following to hdfs-site.xml, but it had no effect

<property>
  <name>dfs.datanode.http.address</name>
  <value>10.0.0.234:50075</value>
</property>



[ Solution 2 ] Run on the local machine

cd /etc
sudo vi hosts

Append the following at the end of the file (remember to substitute your own IPs), and then browsing works

192.168.70.101 master01
192.168.70.102 slave01
192.168.70.103 slave02
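This fix works because the NameNode web UI redirects the browser to DataNodes by hostname (slave01, slave02), which the local machine can only resolve once /etc/hosts maps those names to IPs. A minimal sketch of how hosts-style lines map hostnames to addresses:

```python
# Sketch: how /etc/hosts-style lines map hostnames to IPs. Without these
# entries the browser cannot resolve the DataNode hostnames that the
# NameNode UI redirects to, so "Browse the filesystem" fails.

def parse_hosts(text):
    mapping = {}
    for line in text.splitlines():
        line = line.split("#")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:                  # one line may list several aliases
            mapping[name] = ip
    return mapping

hosts = parse_hosts("""
192.168.70.101 master01
192.168.70.102 slave01
192.168.70.103 slave02
""")
print(hosts["slave01"])   # 192.168.70.102
```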



[References]

http://www.cnblogs.com/hzmark/p/hadoop_browsethefilesystem.html

http://kurthung1224.pixnet.net/blog/post/170147913

Monday, May 4, 2015

[Spark] Word count practice

First, change to the Spark directory
cd /opt/spark/

Start the Spark standalone daemons
sbin/start-all.sh

Open the spark-shell
bin/spark-shell



Set the path to the file we want to read
val path = "/in/123.txt"

Read the file in; sc is the pre-created SparkContext
val file = sc.textFile(path)

file is now an RDD; use collect to inspect its contents
file.collect



val line1 = file.flatMap(_.split(" "))

line1.collect

val line2 = line1.filter(_ != "")

line2.collect

val line3 = line2.map(s=> (s,1))

line3.collect

val line4 = line3.reduceByKey(_ + _)

line4.collect

line4.take(10)

line4.take(10).foreach(println)
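To make each Spark step above concrete, the same pipeline can be mirrored on plain Python lists (logic only, with made-up input; this is not Spark's distributed execution):

```python
# Plain-Python mirror of the Spark steps above:
# flatMap(_.split(" ")) -> filter(_ != "") -> map(s => (s, 1)) -> reduceByKey(_ + _)

lines = ["hello spark hello", "hello  world"]      # stands in for the RDD `file`

# flatMap: split every line into words, flattening into one list
line1 = [w for line in lines for w in line.split(" ")]

# filter: the double space above produced an empty string; drop it
line2 = [w for w in line1 if w != ""]

# map: pair each word with a count of 1
line3 = [(w, 1) for w in line2]

# reduceByKey: sum the 1s per word
line4 = {}
for word, n in line3:
    line4[word] = line4.get(word, 0) + n

print(line4)   # {'hello': 3, 'spark': 1, 'world': 1}
```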


The one-line version from the official docs

val wordCounts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)

wordCounts.collect()


Check the execution status at
http://[node IP]:4040/jobs/




[References]

https://spark.apache.org/docs/latest/quick-start.html

http://kurthung1224.pixnet.net/blog/post/275207950


[Hadoop] Word count example walkthrough


Hadoop on cloudera quickstart vm test example 01 wordcount


mkdir temp

cd temp

ls -ltr

echo "this is huiming and you can call me juiming or killniu i am good at statistical modeling and data analysis" > wordcount.txt



hdfs dfs -mkdir /user/cloudera/input

hdfs dfs -ls /user/cloudera/input

hdfs dfs -put /home/cloudera/temp/wordcount.txt /user/cloudera/input

hdfs dfs -ls /user/cloudera/input

The wordcount.txt we just created should appear



ls -ltr /usr/lib/hadoop-mapreduce/

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/cloudera/input/wordcount.txt /user/cloudera/output



hdfs dfs -ls /user/cloudera/output

hdfs dfs -cat /user/cloudera/output/part-r-00000

Finally, the word counts are printed
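For the single sentence we echoed into wordcount.txt, the contents of part-r-00000 can be predicted by counting the words directly. A sketch (assuming the standard wordcount example, which emits one "word<TAB>count" line per word, sorted by word):

```python
# Sketch: predict what the wordcount job writes to part-r-00000 for the
# sentence echoed into wordcount.txt above.
from collections import Counter

sentence = ("this is huiming and you can call me juiming or killniu "
            "i am good at statistical modeling and data analysis")
counts = Counter(sentence.split())

for word in sorted(counts):
    print(f"{word}\t{counts[word]}")
# "and" appears twice; every other word appears once
```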

Saturday, May 2, 2015

[Mac] Installing wget on OS X



Download the latest wget
curl -O http://ftp.gnu.org/gnu/wget/wget-1.15.tar.gz

Extract the archive
tar -xzf wget-1.15.tar.gz

Enter the directory
cd wget-1.15

Configure the build (detect the compile environment)
./configure --with-ssl=openssl

If an error occurs here, open Xcode once (to finish setting up the command line tools),
then run the same command again and it should succeed


Compile the program
make

Install the program
sudo make install

Verify the installation succeeded
wget --help

Clean up
cd .. && rm -rf wget*