Monday, December 5, 2016

Data Centre Fabric Design Considerations

Let's continue the SDN series.

Fabric-based data centre architectures are becoming more and more common these days. As I mentioned in my earlier articles, some of the terminology and ideas you hear under the SDN umbrella are not technically new, but they still give us the ability to solve some business and design problems that were hard to solve in the past, for a variety of reasons.

While some of the large players like Microsoft, Facebook, LinkedIn, Google and Amazon built their DC fabrics using open source ideas to meet their hyper-scale data centre requirements, most mid-size and large enterprises still seem some way from going down that path, considering it's not an easy task to begin with. People might be able to get things working to an extent, but in most cases the resulting solution doesn't look very clean.

So as an alternative, they have the option of using someone else's brainchild. For example, Cisco ACI, VMware NSX, Juniper Contrail and Nokia Nuage are some of the products designed and built to cater to the same set of requirements.

There is plenty of material available on the web where people give different reasons why one vendor's solution is better than another's. I am sure some of it is true, considering that someone with hands-on experience of these products might have encountered interesting issues along the way. Recently, though, a friend asked me to share my suggestions on how to figure out which solution they should pick over another, while putting aside my love for any given vendor ;)

So here is my quick list. It's not very detailed, but it should still give you some pointers to help with the thought process:

- Network-based overlay (ACI) vs. host-based overlay (understand the pros and cons of each approach)

- Would you need overlap between these two overlay models at some point? If so, would that require additional licensing, upgrades etc.?

- How availability domains will be supported in the solution

- What scale you would expect to hit down the line from a control plane and data plane perspective

- What sort of orchestration and automation your customer requires, and how those needs map to a particular solution's capabilities

- How controller-to-controller communication happens, what the prerequisites are, and whether overlay tunnels are allowed to go across different domains

- How a particular OEM is going to avoid bug disasters. In theory, if every controller is running the same code, hitting a bug could potentially bring the entire controller cluster down and defeat the purpose of running controllers in a cluster or in HA per se

- How open or closed the API support is (open doesn't mean completely open)

- How strong the partner ecosystem is in terms of integration, and how well it ties in with your orchestrator

- How well your overlay model integrates with the rest of the network, and what approaches the OEMs suggest for this

- Whether they allow the solution to be integrated with another OEM's solution down the line. For example, overlays like VXLAN don't specify the control plane. So while two OEMs may both support VXLAN as a possible common piece, they might take different approaches to building the control plane, or might have added some vendor-specific twist

- How well the solution works in an environment with hypervisors from multiple vendors

- Learning curve involved with the new solution

- Integration with current NMS deployment

- How your DCI connects and integrates between DC-DR or DC-DC (Active-Active)

- How well your fabric handles situations like a customer using DR in the cloud

- Support for containers (Corner case)

- How well the solution is able to tie together virtual and physical workloads

- How the fabric protects east-west traffic (micro-segmentation)

- How you move your virtual/physical workloads from the old architecture to the new fabric architecture
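
To make the VXLAN interoperability point from the list a bit more concrete, here is a minimal Python sketch of the 8-byte VXLAN header as laid out in RFC 7348. The function names (`pack_vxlan_header`, `unpack_vni`) are my own, made up for illustration, and are not taken from any vendor product. The thing to notice is that nothing in these 8 bytes says how a tunnel endpoint learns where a remote MAC address lives. That part is left to the control plane, which is where OEMs tend to differ. The 24-bit VNI also gives a rough feel for the data plane scale question.

```python
import struct

# Values from RFC 7348; everything else in this sketch is illustrative.
VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid ID
MAX_VNI = (1 << 24) - 1      # the VNI is a 24-bit field


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI (hypothetical helper)."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI must be in 0..{MAX_VNI}, got {vni}")
    # Word 1: flags in the top byte, followed by 24 reserved (zero) bits.
    # Word 2: VNI in the top 24 bits, followed by 8 reserved (zero) bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header (hypothetical helper)."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes long")
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set, VNI is not valid")
    return word2 >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(5000)
    print(header.hex())        # the wire format two OEMs would share
    print(unpack_vni(header))  # the segment ID recovered from the header
    print(MAX_VNI + 1)         # number of segment IDs the field can express
```

So two fabrics can agree on this encapsulation byte for byte and still not interoperate, because how the MAC-to-tunnel-endpoint mappings get distributed is decided outside the header.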

Deepak Arora