Moving to Nova Cells without Destroying the World

Mike Dorman · Operations · OpenStack Vancouver 2015 Summit Session
💫 Short Summary

The video covers transitioning a standard Nova deployment to Nova cells, explaining the cells structure, how cells operate, and the migration process. It emphasizes a smooth transition, proper setup of new servers, and clean communication paths for improved scalability and performance. Testing, caution, and workarounds for known issues are highlighted, along with challenges around objects that are not cells-aware and database consistency between cells. Cells v2 is a complete rewrite that integrates cells into the default compute driver; the speaker recommends waiting for the v2 release for a smoother transition and more efficient resource management.

✨ Highlights
📊 Transcript
Transitioning to Nova cells from a standard Nova deployment.
00:15
Nova cells manage a large number of compute instances and multiple message queuing systems.
Nova cells enable complex scheduling and geographic dispersal of sites.
Each Nova cell functions as an independent installation with its own database, message queue, scheduler, and compute service.
The hierarchy of Nova cells passes messages between them through message queues, with an API cell serving as the entry point for interactions with OpenStack.
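The API/compute split described above maps onto the v1-era `[cells]` section of nova.conf; a rough sketch (option names from cells v1, cell names hypothetical):

```ini
# API (top-level) cell -- nova.conf
[cells]
enable = true
name = api
cell_type = api

# Compute (child) cell -- nova.conf
[cells]
enable = true
name = cell01
cell_type = compute
```

The `cell_type` value determines which side of the hierarchy the `nova-cells` service on that host plays.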
Transition to Cells v2 in Liberty.
03:25
The segment explains the structure and operation of cells within the hierarchy, focusing on passing messages between cells and handling commands like nova boot.
Moving to cells was aimed at scaling quickly and efficiently to avoid scalability issues.
Keeping message queues and databases close to compute nodes was essential due to high communication demands.
Emphasizes the importance of a smooth transition to cells for improved scalability and performance in the system.
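The message-passing relationship between cells is registered with the v1-era `nova-manage cell create` command; a sketch with hostnames and credentials as placeholders:

```shell
# On the API cell: register the child (compute) cell's message queue.
nova-manage cell create --name cell01 --cell_type child \
    --username guest --password guest \
    --hostname rabbit-cell01.example.com --port 5672 --virtual_host /

# On the compute cell: register the parent (API) cell the same way.
nova-manage cell create --name api --cell_type parent \
    --username guest --password guest \
    --hostname rabbit-api.example.com --port 5672 --virtual_host /
```

Note `cell_type` here describes the routing relationship (parent/child), while the nova.conf `[cells]` option describes the cell's role (api/compute).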
Migration of Nova instances to new servers for API and compute cells.
07:26
Splitting the RabbitMQ cluster to accommodate new cells while keeping existing services running.
Importance of setting up new servers correctly, creating new databases, moving non-Nova services, and ensuring network configurations.
Emphasis on clean split to maintain localized traffic for the compute cell.
Aim of minimizing downtime and disruptions during the transition.
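Creating the new compute-cell database ahead of the cutover might look roughly like this (database and credential names hypothetical):

```shell
# Create the new compute-cell database and grant Nova access.
mysql -e "CREATE DATABASE nova_cell01;"
mysql -e "GRANT ALL ON nova_cell01.* TO 'nova'@'%' IDENTIFIED BY 'secret';"

# With nova.conf pointing at the new database, lay down the schema.
nova-manage db sync
```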
Process of expanding Rabbit cluster and setting up lower level compute cell.
09:32
New servers added to existing Rabbit cluster, non-Nova services moved to new servers.
Communication adjusted to split cluster into two independent parts for improved functionality.
Some downtime and issues experienced, but cluster eventually stabilized.
Focus shifts to setting up lower level compute cell, configuring communication paths, ensuring visibility and routing for RPC calls.
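Splitting one RabbitMQ cluster into two independent ones can be sketched with standard `rabbitmqctl` clustering commands (node names hypothetical; expect brief disruption, as the talk notes):

```shell
# Detach a node from the shared cluster.
rabbitmqctl -n rabbit@cell01-mq1 stop_app
rabbitmqctl -n rabbit@cell01-mq1 reset           # forget old cluster membership
rabbitmqctl -n rabbit@cell01-mq1 start_app       # now a standalone cluster

# Join further detached nodes to form the compute-cell cluster.
rabbitmqctl -n rabbit@cell01-mq2 stop_app
rabbitmqctl -n rabbit@cell01-mq2 reset
rabbitmqctl -n rabbit@cell01-mq2 join_cluster rabbit@cell01-mq1
rabbitmqctl -n rabbit@cell01-mq2 start_app

rabbitmqctl -n rabbit@cell01-mq1 cluster_status  # verify the split
```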
Setting up Nova cells involves enabling communication between different cell components, such as Rabbit servers and API cell.
14:07
It is crucial to keep the Nova API running during the transition to avoid routing issues.
Quotas should be disabled at the lower level cells to prevent users from exceeding their limits.
The process includes creating a new database, synchronizing it, and configuring the nova settings to connect to the Rabbit cluster at the API cell.
The API cell serves as the top-level entity for handling master routing, but services should not be started until data import and cell configuration are complete.
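The quota and messaging points above correspond roughly to nova.conf settings like these in the compute cell (a sketch; the hostname is hypothetical):

```ini
# Compute-cell nova.conf: talk to the local Rabbit cluster and
# disable quota enforcement so only the API cell counts quotas.
[DEFAULT]
rabbit_host = rabbit-cell01.example.com
quota_driver = nova.quota.NoopQuotaDriver
```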
Configuring communication path for routing messages and transferring data between cells.
17:28
Manually transferring database records from the API cell to the compute cell is crucial, and the data import should be tested thoroughly.
Different services run on API and compute cells, each with specific functionalities.
Having a nova-api instance in the compute cell is essential for interacting directly with running instances.
Setting up everything correctly is crucial for the smooth running of services.
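The manual data transfer could be sketched with a plain dump-and-load (table list abbreviated and illustrative; as the talk stresses, test this import thoroughly before production):

```shell
# Dump instance-related tables from the API cell's Nova database
# and load them into the new compute-cell database.
mysqldump nova instances instance_info_caches instance_system_metadata \
    > instances.sql
mysql -h db-cell01.example.com nova_cell01 < instances.sql
```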
Importance of testing before installing and running Nova cells in experimental environments.
19:03
Multiple environments like dev, test, and staging should be utilized before production to detect and resolve potential issues.
Challenges with notifications between Neutron and Nova due to a disconnect in the message queue structure.
Workarounds include assuming VIF plugging will work after a timeout period.
Circular reference bug in the Nova CLI's 'cells list' command has a fix available.
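The VIF-plugging timeout workaround mentioned above maps to Nova's VIF-plugging options, roughly:

```ini
# nova.conf on computes: don't fail the boot if Neutron's plug notification
# never arrives across the split message queues; proceed after the timeout.
[DEFAULT]
vif_plugging_is_fatal = False
vif_plugging_timeout = 30
```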
Challenges with Cells in Nova.
23:02
Some objects within Nova, such as flavors, server groups, key pairs, host aggregates, and availability zones, are not cells aware.
Security groups also pose a challenge in the context of cells.
NeCTAR has already addressed issues related to host aggregates and availability zones.
Database consistency between the API and compute cells is crucial to prevent communication interruptions and maintain RPC call integrity.
Transition from cells v1 to cells v2 in Nova involves a complete rewrite of the cells feature.
26:28
Cells will become the default mode for everything, eliminating the need for a special Nova cells service.
Cells v2 will integrate cells scheduling and routing into the default compute driver.
Knowledge about databases for compute cells will be included in cells v2, streamlining the database structure.
Synchronization between the master API cell and compute level databases will be ensured in cells v2.
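In later releases, cells v2 setup is driven by `nova-manage cell_v2` subcommands rather than a separate cells service; a hedged sketch (URLs and names hypothetical):

```shell
# Cells v2: map the special cell0, create a cell, and discover its hosts.
nova-manage cell_v2 map_cell0
nova-manage cell_v2 create_cell --name cell01 \
    --transport-url rabbit://guest:guest@rabbit-cell01.example.com:5672/ \
    --database_connection mysql+pymysql://nova:secret@db-cell01/nova_cell01
nova-manage cell_v2 discover_hosts
nova-manage cell_v2 list_cells
```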
Transition to Nova API and Rabbit queues for individual cells discussed, focusing on moving from v1 to v2 mode seamlessly.
28:10
Recommendation to wait for v2 release instead of migrating to cells immediately to avoid potential issues.
Speaker advises waiting for a smoother transition without major migrations after reflecting on decision to adopt cells.
Puppet used for deploying changes in the discussed process.
Managing intermediary stages and clusters manually due to limited resources.
32:06
Ad hoc ansible playbooks are used for consistency in the process.
Ceph is used across cells rather than creating extra regions, avoiding region sprawl.
Cells are invisible to users, with varying exposure levels for operators.
Availability zones are the abstraction users actually interact with; cells stay invisible to users while letting operators manage resources efficiently.
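The availability-zone abstraction is typically exposed through host aggregates; a sketch with the era's nova CLI (aggregate and host names hypothetical):

```shell
# Create a host aggregate exposed to users as an availability zone.
nova aggregate-create cell01-hosts cell01-az
nova aggregate-add-host cell01-hosts compute01.example.com
nova availability-zone-list
```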