On TryStack.org we have an automated script that cleans off instances after they’ve run for 24 hours.
We also allow people to attach volumes to their instances. In our script that deletes long-running instances, we naively thought we could just call delete() on an instance and everything would be cleaned up. Not so.
The instances get stuck in a “deleting” state, and neither the instances nor their volumes can be cleaned up. The compute node has actually released the iSCSI target that Cinder presented to it:
[root@host11 ~]# iscsiadm -m session
iscsiadm: No active sessions.
However, tgtd hasn’t released the LVM device for some reason, so the device can’t be deleted:
[root@host2 ~]# lvremove cinder-volumes/volume-b9869d42-418f-4d7c-b4bf-951b035d1817
Do you really want to remove active logical volume volume-b9869d42-418f-4d7c-b4bf-951b035d1817? [y/n]: y
device-mapper: remove ioctl on failed: Device or resource busy
Unable to deactivate cinder--volumes-volume--b9869d42--418f--4d7c--b4bf--951b035d1817 (253:59)
Unable to deactivate logical volume "volume-b9869d42-418f-4d7c-b4bf-951b035d1817"
[root@host2 ~]# lsof /dev/cinder-volumes/volume-b9869d42-418f-4d7c-b4bf-951b035d1817
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tgtd 10812 root 74u BLK 253,59 0t0 45476393 /dev/cinder-volumes/../dm-59
To fix this, use tgt-admin to delete the target, thereby releasing tgtd’s hold on the volume, and then clean up the Cinder entry in the database so that OpenStack thinks it can delete the volume (and now it actually can):
[root@host2 ~]# tgt-admin -s | grep b98
Target 61: iqn.2010-10.org.openstack:volume-b9869d42-418f-4d7c-b4bf-951b035d1817
Backing store path: /dev/cinder-volumes/volume-b9869d42-418f-4d7c-b4bf-951b035d1817
[root@host2 ~]# tgt-admin --delete iqn.2010-10.org.openstack:volume-b9869d42-418f-4d7c-b4bf-951b035d1817
mysql> use cinder;
mysql> update volumes set status = 'error', attach_status = 'detached' where id = 'b9869d42-418f-4d7c-b4bf-951b035d1817';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1 Changed: 1 Warnings: 0
[root@host2 ~]# cinder delete b9869d42-418f-4d7c-b4bf-951b035d1817
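The target lookup above can be scripted rather than done by eye with grep. This is a minimal sketch, assuming `tgt-admin -s` prints `Target N: <iqn>` lines as in the output shown earlier. The function name `target_iqn_for_volume` is hypothetical, and the function only parses text from stdin, so it can be tried without touching tgtd.

```shell
#!/bin/sh
# Hypothetical helper (not part of tgt or OpenStack): read `tgt-admin -s`
# output on stdin and print the IQN of the target whose name ends in
# "volume-<volume id>". Prints nothing if no target matches.
target_iqn_for_volume() {
    vol_id="$1"
    # Target lines look like:
    #   Target 61: iqn.2010-10.org.openstack:volume-<id>
    # so field 1 is "Target" and field 3 is the IQN.
    awk -v suffix="volume-$vol_id" '
        $1 == "Target" &&
        substr($3, length($3) - length(suffix) + 1) == suffix { print $3 }
    '
}
```

On the volume host it could be used as `iqn=$(tgt-admin -s | target_iqn_for_volume b9869d42-418f-4d7c-b4bf-951b035d1817)`, followed by `tgt-admin --delete "$iqn"` only if `$iqn` is non-empty. Matching on the full volume ID rather than a short prefix like `b98` avoids deleting the wrong target when two IDs share a prefix.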
Now that the volume is cleaned up, the instance needs to be massaged a bit too so that it can be torn down as well:
[root@host2 ~]# nova reset-state 4365e90f-b7cf-4253-9ded-1844df1c786b
[root@host2 ~]# nova delete 4365e90f-b7cf-4253-9ded-1844df1c786b
And if the instance still doesn’t want to delete, mark it as deleted in the nova database:
mysql> UPDATE instances SET vm_state='deleted',task_state=NULL,deleted=1,deleted_at=now() WHERE uuid='4365e90f-b7cf-4253-9ded-1844df1c786b';
If you had to edit the DB like this, make sure that the instance is actually undefined on the compute node. In my case, I looked at the dashboard’s admin panel to see how many instances were on the compute node and confirmed that virsh list reported the same number.
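That count comparison can be done without counting rows by hand. This is a minimal sketch, assuming the default tabular output of `virsh list` (a header line, a dashed separator, then one row per domain). The function name `count_virsh_domains` is hypothetical, and the expected number still has to come from the dashboard.

```shell
#!/bin/sh
# Hypothetical helper: read `virsh list` output on stdin and print the
# number of domain rows. Data rows start with a numeric Id, or with "-"
# for shut-off domains when `virsh list --all` is used; the header and
# the dashed separator line are skipped.
count_virsh_domains() {
    awk '$1 ~ /^([0-9]+|-)$/ && NF >= 3 { n++ } END { print n + 0 }'
}
```

On the compute node this would be run as `virsh list | count_virsh_domains`. Piping `virsh list --all` instead also counts domains that are defined but not running, which is the stricter check if the goal is to confirm the instance is really undefined.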