As you can see in the picture above, three machines were added for Ceph storage, and I also separated the Storage Controller node from the Cluster Controller node.
While deploying these, I encountered two problems:
- Ceph was in HEALTH_WARN - 192 pgs incomplete / 192 pgs stuck inactive / 192 pgs stuck unclean
- Eucalyptus Storage was in NOTREADY
I created just one cluster, and after that I validated the Ceph cluster. It showed HEALTH_WARN.
```
ceph@ceph-node1:~/cluster01$ ceph osd tree
# id	weight	type name	up/down	reweight
-1	0	root default
-2	0		host ceph-node1
0	0			osd.0	up	1
1	0			osd.1	up	1
2	0			osd.2	up	1
-3	0		host ceph-node2
3	0			osd.3	up	1
4	0			osd.4	up	1
5	0			osd.5	up	1
-4	0		host ceph-node3
6	0			osd.6	up	1
7	0			osd.7	up	1
8	0			osd.8	up	1

ceph@ceph-node1:~/cluster01$ ceph status
    cluster 565bb65e-775d-449d-8d57-f36c7cf4a1d5
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-node1=10.10.10.30:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e28: 9 osds: 9 up, 9 in
      pgmap v53: 192 pgs, 3 pools, 0 bytes data, 0 objects
            296 MB used, 45683 MB / 45980 MB avail
                 192 incomplete
```
Whenever I tried to run a command, I couldn't get a result.
```
ceph@ceph-node1:~/cluster01$ rados lspools
data
metadata
rbd
ceph@ceph-node1:~/cluster01$ rados -p metadata ls
```
Because of the incomplete PGs, subsequent requests turned into slow requests and query commands simply hung.
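For reference, the stuck PGs themselves can still be inspected while client I/O hangs; the following is just a diagnostic sketch using standard Ceph commands, and the PG id is a placeholder.
```
# Show which PGs are unhealthy and why
ceph health detail
# Dump the PGs that are stuck inactive / unclean
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
# Query a single PG in detail (0.3f is a placeholder; take a PG id from the dump above)
ceph pg 0.3f query
```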
Meanwhile, I got a hint from the blog post "Ceph, Small Disks and Pgs Stuck Incomplete". It said that if a drive is small enough, the OSD's CRUSH weight can end up as 0.00, and all of my OSD weights were indeed zero. According to the post, a weight only becomes non-zero when the OSD has at least 10 GB (10 GB = 0.01). Although each HDD was 10 GB, half of it was partitioned as the Ceph journal. As a result, each OSD had only 5 GB for storing data, which came out as a weight of 0.00.
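To make the rounding concrete: assuming the default CRUSH weight is simply the data partition's size in TiB rounded to two decimals (the convention the linked post describes), the arithmetic looks like this.
```
# 5 GB of usable data space per OSD, expressed in TiB
$ echo "scale=4; 5/1024" | bc
.0048        # rounds to 0.00, so CRUSH never maps PGs to the OSD
# a full 10 GB would have just cleared the threshold
$ echo "scale=4; 10/1024" | bc
.0097        # rounds to 0.01
```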
So, I manually updated the weights.
```
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.0 1
reweighted item id 0 name 'osd.0' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.1 1
reweighted item id 1 name 'osd.1' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.2 1
reweighted item id 2 name 'osd.2' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.3 1
reweighted item id 3 name 'osd.3' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.4 1
reweighted item id 4 name 'osd.4' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.5 1
reweighted item id 5 name 'osd.5' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.6 1
reweighted item id 6 name 'osd.6' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.7 1
reweighted item id 7 name 'osd.7' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.8 1
reweighted item id 8 name 'osd.8' to 1 in crush map

ceph@ceph-node1:~/cluster01$ ceph osd tree
# id	weight	type name	up/down	reweight
-1	9	root default
-2	3		host ceph-node1
0	1			osd.0	up	1
1	1			osd.1	up	1
2	1			osd.2	up	1
-3	3		host ceph-node2
3	1			osd.3	up	1
4	1			osd.4	up	1
5	1			osd.5	up	1
-4	3		host ceph-node3
6	1			osd.6	up	1
7	1			osd.7	up	1
8	1			osd.8	up	1

# Status is in HEALTH_OK
ceph@ceph-node1:~/cluster01$ ceph status
    cluster 565bb65e-775d-449d-8d57-f36c7cf4a1d5
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-node1=10.10.10.30:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e56: 9 osds: 9 up, 9 in
      pgmap v122: 192 pgs, 3 pools, 0 bytes data, 0 objects
            316 MB used, 45664 MB / 45980 MB avail
                 192 active+clean

# Create pools for volumes and snapshots
ceph@ceph-node1:~/cluster01$ ceph osd pool create euca-volumes 128 128
ceph@ceph-node1:~/cluster01$ ceph osd pool create euca-snapshots 128 128
ceph@ceph-node1:~/cluster01$ ceph osd pool set euca-volumes size 2
set pool 4 size to 2
ceph@ceph-node1:~/cluster01$ ceph osd pool set euca-snapshots size 2
set pool 5 size to 2
```
Ceph's status changed to HEALTH_OK, and commands no longer hung.
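Since the nine reweight calls differ only in the OSD id, they can be collapsed into a small shell loop; this is just an equivalent shorthand for what was done above.
```
# Reweight osd.0 through osd.8 to 1 in the CRUSH map
for i in $(seq 0 8); do
    ceph osd crush reweight "osd.$i" 1
done
```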
Next, I am going to explain how I solved the NOTREADY state of the Storage service. I had let this problem linger because, so far, I had focused on launching VMs and didn't need to attach volumes or snapshots.
However, it was time to make it work. Running euca-describe-services always gave the same result.
```
[root@euca-clc ~]# euca-describe-services --all -E
...
SERVICE	storage	cluster01	sc-euca-clc	NOTREADY	25	http://10.10.10.170:8773/services/Storage	arn:euca:eucalyptus:cluster01:storage:sc-euca-clc/
ERROR	storage	cluster01	sc-euca-clc	Failed to lookup host 10.10.10.170 for service arn:euca:eucalyptus:cluster01:storage:sc-euca-clc/. Current hosts are: [Host 192.168.1.169 #25 /192.168.1.169 coordinator=192.168.1.169 booted db:synched(synced) dbpool:ok started=1428187521637 [/10.10.10.169, /192.168.1.169], Host 192.168.1.170 #25 /192.168.1.170 coordinator=192.168.1.169 booted nodb started=1428187908203 [/10.10.10.170, /192.168.1.170]]
SERVICEEVENT	1ea068a2-83ea-4007-a8aa-33bd1befa68d	arn:euca:eucalyptus:cluster01:storage:sc-euca-clc/
SERVICEEVENT	1ea068a2-83ea-4007-a8aa-33bd1befa68d	ERROR
SERVICEEVENT	1ea068a2-83ea-4007-a8aa-33bd1befa68d	Sun Apr 05 07:53:47 KST 2015
```
I had registered the Storage Controller on the private network (10.10.10.0/24), but the coordinator (I wasn't sure exactly what it was) was running on a different network (192.168.1.0/24). I wondered what would happen if I registered a Storage Controller on the same network as the coordinator.
```
[root@euca-clc ~]# euca_conf --register-sc --partition cluster01 --host 192.168.1.171 --component sc-euca-sc
SERVICE	storage	cluster01	sc-euca-sc	BROKEN	29	http://192.168.1.171:8773/services/Storage	arn:euca:eucalyptus:cluster01:storage:sc-euca-sc/
```
After a while, I checked it again.
```
[root@euca-clc ~]# euca-describe-services --all
...
SERVICE	storage	cluster01	sc-euca-sc	ENABLED	62	http://192.168.1.171:8773/services/Storage	arn:euca:eucalyptus:cluster01:storage:sc-euca-sc/
SERVICE	cluster	cluster01	cc-euca-cc	ENABLED	62	http://10.10.10.170:8774/axis2/services/EucalyptusCC	arn:euca:eucalyptus:cluster01:cluster:cc-euca-cc/
SERVICE	node	cluster01	10.10.10.178	ENABLED	62	http://10.10.10.178:8775/axis2/services/EucalyptusNC	arn:euca:bootstrap:cluster01:node:10.10.10.178/
...
```
Finally, I got it working.
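With the new Storage Controller ENABLED, the old sc-euca-clc registration can presumably be cleaned up as well. The sketch below assumes euca_conf's deregister flag mirrors the register call used above; double-check that nothing still references the old SC before removing it.
```
# Remove the stale NOTREADY Storage Controller registration
# (assumed flag, mirroring --register-sc; verify against your Eucalyptus version first)
euca_conf --deregister-sc --partition cluster01 --host 10.10.10.170 --component sc-euca-clc
```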
The next step is configuring the Eucalyptus properties to use Ceph. The following post explains those steps well: https://johnpreston78.wordpress.com/2015/02/21/eucalyptus-and-ceph-for-elastic-block-storage
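As a rough preview of what that configuration looks like (a sketch only; the property names below are assumptions based on Eucalyptus 4.x and may differ in your version, so follow the post above and check the properties on your own cloud first):
```
# Inspect the current storage properties for the partition
euca-describe-properties cluster01.storage
# Switch the cluster01 Storage Controller backend to Ceph RBD (assumed property name/value)
euca-modify-property -p cluster01.storage.blockstoragemanager=ceph-rbd
# The SC will also need the Ceph config/keyring paths and the euca-volumes /
# euca-snapshots pools created earlier, set via the cluster01.storage.ceph*
# properties described in the linked post.
```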