Issue
This week I ran into a problem with the Elasticsearch service on Workspace ONE Access (formerly VMware Identity Manager, VIDM), part of a new VMware Cloud Foundation environment (VCF 4.x). The service fails during the startup phase on all the nodes that make up the cluster: 'elasticsearch start' exits with status 7.
Workspace One Access version is 3.3.2-15951611.
Opening the console, an error message was displayed: “Error: Error log is in /var/log/boot.msg.”
Part of that log is reported below:
No JSON object could be decoded
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.6/json/__init__.py", line 267, in load
    parse_constant=parse_constant, **kw)
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 338, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Number of nodes in cluster is :
Configuring /opt/vmware/elasticsearch/config/elasticsearch.yml file
Starting elasticsearch:
<notice -- Feb 15 15:05:17.122319000> 'elasticsearch start' exits with status 7
<notice -- Feb 15 15:05:17.130417000> hzn-dots start
Application Server already running.
<notice -- Feb 15 15:05:17.339108000> 'hzn-dots start' exits with status 0
Master Resource Control: runlevel 3 has been reached
Failed services in runlevel 3: elasticsearch
Skipped services in runlevel 3: splash
<notice -- Feb 15 15:05:17.340630000> killproc: kill(456,3)
Solution
Disclaimer: if you are not fully aware of what you are changing, perform the procedures described below with the help of VMware GSS, to prevent the environment from becoming unstable. Use them at your own risk.
Short Answer
Run the following commands on each Workspace ONE Access appliance to understand whether the nodes are communicating with each other.
-
Check how many nodes are part of the cluster:
curl -s -XGET http://localhost:9200/_cat/nodes
-
Check cluster health:
curl http://localhost:9200/_cluster/health?pretty=true
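When scripting this check, it can help to pull out just the status field. A minimal sketch (assuming the compact single-line JSON you get without `?pretty=true`; the sample string below is a hypothetical stand-in for live output):

```shell
# Sketch: extract the "status" field (green / yellow / red) from the
# cluster health output with a crude sed parse. Assumes compact,
# single-line JSON (i.e. no ?pretty=true).
health_json='{"cluster_name":"horizon","status":"red","timed_out":false}'
status=$(echo "$health_json" | sed 's/.*"status":"\([a-z]*\)".*/\1/')
echo "$status"
# against a live node you would pipe curl instead:
#   curl -s http://localhost:9200/_cluster/health | sed 's/.*"status":"\([a-z]*\)".*/\1/'
```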
-
Check the queue list of rabbitmq
rabbitmqctl list_queues | grep analytics
-
If the cluster health is red, run these commands:
-
to find UNASSIGNED SHARDS:
curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED
-
to DELETE SHARDS:
curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | xargs -i curl -XDELETE "http://localhost:9200/{}"
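A note on this one-liner: it issues one DELETE per UNASSIGNED shard line, so the same index gets deleted once per shard, and every call after the first returns an index_not_found_exception (visible in the output in the long answer). A sketch of a quieter variant, deduplicating index names with `sort -u` (the sample `_cat/shards` lines below are a stand-in for live output):

```shell
# Each UNASSIGNED line repeats the index name once per shard; piping
# through `sort -u` deletes every affected index only once.
# Hypothetical sample of _cat/shards output:
cat_shards='v4_2021-02-14 4 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-14 1 r UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 0 r UNASSIGNED CLUSTER_RECOVERED'
indices=$(echo "$cat_shards" | grep UNASSIGNED | awk '{print $1}' | sort -u)
echo "$indices"
# against a live node:
#   curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED \
#     | awk '{print $1}' | sort -u \
#     | xargs -i curl -XDELETE "http://localhost:9200/{}"
```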
-
re-run the UNASSIGNED shards check to verify that none remain
-
Recheck the cluster health to ensure it is green; once green ...
curl http://localhost:9200/_cluster/health?pretty=true
-
... then check whether Elasticsearch is working.
-
Nodes may need to be restarted. Proceed as follows:
-
turn off two nodes and leave one active
-
turn one node back on, wait for it to appear in the cluster and start correctly
-
do the same with the third node
-
When the third node is active and present in the cluster, perform a clean restart cycle for the first node as well.
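The wait step between reboots can be sketched as a small polling loop. This is only a sketch: `get_nodes` is a hypothetical stub standing in for `curl -s -XGET http://localhost:9200/_cat/nodes`, so the loop logic can be shown without a live cluster.

```shell
# Stub standing in for: curl -s -XGET http://localhost:9200/_cat/nodes
# (one line per cluster member, as in the transcripts below).
get_nodes() {
  printf '10.174.28.18 10.174.28.18 11 96 0.47 d * node1\n10.174.28.19 10.174.28.19 5 97 0.18 d m node2\n'
}

# Poll _cat/nodes until the expected number of members shows up,
# or give up after timeout_seconds.
wait_for_nodes() {  # usage: wait_for_nodes <expected_count> <timeout_seconds>
  expected=$1; timeout=$2; waited=0
  while [ "$(get_nodes | wc -l)" -lt "$expected" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 5
    waited=$((waited + 5))
  done
  return 0
}

wait_for_nodes 2 30 && echo "cluster reached 2 nodes"
```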
Long Answer (with command output)
The commands in the long answer are the same ones explained above, but here we also report their output (from one node only). Remember that the commands must be run on every node in the cluster.
-
Check how many nodes are part of the cluster:
custm-vrsidm1:~ # curl -s -XGET http://localhost:9200/_cat/nodes
10.174.28.18 10.174.28.18 6 98 0.31 d * Exploding Man
-
Check cluster health:
custm-vrsidm1:~ # curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "horizon",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 74,
  "active_shards" : 74,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 146,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 33.63636363636363
}
-
Check the queue list of rabbitmq
custm-vrsidm1:~ # rabbitmqctl list_queues | grep analytics
-.analytics.127.0.0.1 0
-
If the cluster health is red, run these commands:
-
to find UNASSIGNED SHARDS:
custm-vrsidm1:~ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11440  100 11440    0     0   270k      0 --:--:-- --:--:-- --:--:--  279k
v4_2021-02-14     4 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-14     1 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-14     2 p UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-14     2 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-14     3 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-14     0 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-03     4 p UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-03     4 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-03     3 p UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-03     3 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-28     4 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-28     3 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-28     2 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-28     1 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-28     0 r UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 4 p UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 4 r UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 1 r UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 2 r UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 3 r UNASSIGNED CLUSTER_RECOVERED
v2_searchentities 0 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-06     4 p UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-06     4 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-27     0 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-05     4 p UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-05     4 r UNASSIGNED CLUSTER_RECOVERED
.................................................
v4_2021-02-05     2 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-05     1 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-05     0 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-26     4 p UNASSIGNED CLUSTER_RECOVERED
v4_2021-01-26     4 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-04     1 r UNASSIGNED CLUSTER_RECOVERED
v4_2021-02-04     0 r UNASSIGNED CLUSTER_RECOVERED
-
to DELETE SHARDS:
custm-vrsidm1:~ # curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk {'print $1'} | xargs -i curl -XDELETE "http://localhost:9200/{}"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16060  100 16060    0     0   589k      0 --:--:-- --:--:-- --:--:--  627k
{"acknowledged":true}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-14","index":"v4_2021-02-14"},"status":404}
{"acknowledged":true}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-03","index":"v4_2021-02-03"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-03","index":"v4_2021-02-03"},"status":404}
..........................................................
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-01-28","index":"v4_2021-01-28"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-01-28","index":"v4_2021-01-28"},"status":404}
{"acknowledged":true}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v2_searchentities","index":"v2_searchentities"},"status":404}
{"acknowledged":true}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-06","index":"v4_2021-02-06"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-06","index":"v4_2021-02-06"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-06","index":"v4_2021-02-06"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-06","index":"v4_2021-02-06"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-06","index":"v4_2021-02-06"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-06","index":"v4_2021-02-06"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-04","index":"v4_2021-02-04"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-04","index":"v4_2021-02-04"},"status":404}
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-04","index":"v4_2021-02-04"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"v4_2021-02-04","index":"v4_2021-02-04"},"status":404}
-
re-run the UNASSIGNED shards check to verify that none remain
-
Recheck the cluster health to ensure it is green; once green ...
custm-vrsidm1:~ # curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "horizon",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
-
... then check whether Elasticsearch is working.
-
After rebooting all the nodes, number_of_nodes and number_of_data_nodes is now three (in my case), as it should be:
custm-vrsidm1:~ # curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "horizon",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
custm-vrsidm1:~ # curl -s -XGET http://localhost:9200/_cat/nodes
10.174.28.19 10.174.28.19 14 97 0.20 d * Orka
10.174.28.20 10.174.28.20  5 97 0.18 d m Mongoose
10.174.28.18 10.174.28.18 11 96 0.47 d m Urthona
So now VIDM seems to be up and running: checking NSX-T's load balancer, we can see that the pool is successfully contacting all nodes.
We are also able to log in and verify graphically that everything is fine.
A double check can be done by verifying the file /var/log/boot.msg:
<notice -- Feb 16 18:31:28.776900000> elasticsearch start
horizon-workspace service is running
Waiting for IDM: ..........
<notice -- Feb 16 18:33:44.203450000> checkproc: /opt/likewise/sbin/lwsmd 1419
<notice -- Feb 16 18:33:44.530367000> checkproc: /opt/likewise/sbin/lwsmd 1419
... Ok.
Number of nodes in cluster is : 3
Configuring /opt/vmware/elasticsearch/config/elasticsearch.yml file
Starting elasticsearch: done.
elasticsearch logs: /opt/vmware/elasticsearch/logs
elasticsearch data: /db/elasticsearch
<notice -- Feb 16 18:34:39.403558000> 'elasticsearch start' exits with status 0
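That final status line can also be pulled out with a one-liner. A sketch (the sample line below stands in for the real /var/log/boot.msg, and assumes the log line format shown above):

```shell
# Extract the exit status from the last 'elasticsearch start' log line.
# Hypothetical sample line standing in for /var/log/boot.msg content:
logline="<notice -- Feb 16 18:34:39.403558000> 'elasticsearch start' exits with status 0"
status=$(echo "$logline" | sed "s/.*exits with status //")
echo "elasticsearch start exit status: $status"
# on the appliance:
#   grep "'elasticsearch start' exits" /var/log/boot.msg | tail -1
```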
That's it.