An Engineer by Heart !!!
A Dreamer, A Pioneer, A Blogger.
A Network Engineer Trying to overtake the world with his network engineering skills :)
Opinions expressed here are solely my own and do not express the views or opinions of my Present or Past employer.
Fabric-based Data Centre architectures are becoming more and more common these days. As I mentioned in my earlier articles, some of the terminology and other stuff that you hear under the SDN umbrella is not technically new, but it still gives us the ability to solve some business and design problems that were hard to solve in the past, probably for different reasons.
While some of the large players like Microsoft, Facebook, LinkedIn, Google and Amazon built their DC fabrics using open source ideas to meet their hyper-scale Data Centre requirements, most mid-size and large enterprises still seem to be some way from going down that road, considering it's not an easy task to begin with. People might be able to get things working to an extent, but in most cases the solution doesn't look very clean.
So as an alternative they have options like using someone else's brainchild. For example, Cisco ACI, VMware NSX, Juniper Contrail and Nokia Nuage are some of the products designed and built to cater to the same set of requirements.
There is plenty of material available on the Web where people give different reasons to claim why one vendor's solution is better than another's. I am sure some of it is true, considering that someone with hands-on experience with these products might have encountered interesting issues along the way. I was recently asked by a friend to share my suggestions on how to figure out which solution they should pick over another, while putting aside my love for any given vendor ;)
So here is my quick list. It's not very detailed, but it should still give you some pointers to help with the thought process:
- Network based overlay (ACI) vs. Host based overlay (understand pros and cons of each approach)
- Would you need overlap between these two overlay models at some point? If so, would that require additional licensing, upgrades etc.?
- How availability domains will be supported in the solution
- What scale you expect to hit down the line from a control plane and data plane perspective
- What sort of orchestration and automation your customer requires and how those needs map to the particular solution's capabilities
- How controller-to-controller communication happens, what the prerequisites are, and whether overlay tunnels are allowed to go across different domains
- How the particular OEM is going to avoid bug disasters. In theory, if every controller runs the same code, hitting a bug could potentially bring the entire controller cluster down and defeat the purpose of running controllers in a cluster or HA per se
- How open or closed the API support is (open doesn't mean completely open)
- How strong the partner ecosystem is in terms of integration and how well it ties in with your orchestrator
- How well your overlay model integrates with the rest of the network and what approaches the OEMs suggest for that
- Whether the solution can be integrated with another OEM's solution down the line. For example, overlays like VXLAN don't specify the control plane. So while two OEMs may both support VXLAN as a possible common piece, they might take different approaches to building the control plane or might have added some vendor-specific twist
- How well the solution works in a multi-vendor hypervisor environment
- Learning curve involved with the new solution
- Integration with the current NMS deployment
- How your DCI connects and integrates between DC-DR or DC-DC (Active-Active)
- How well your fabric handles situations like a customer using DR in the cloud
- Support for containers (corner case)
- How well the solution is able to tie together virtual and physical workloads
- How the fabric protects east-west traffic (micro-segmentation)
- How you move your virtual/physical workloads from the old architecture to the new fabric architecture
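To illustrate the point in the list above that VXLAN only defines the data-plane encapsulation, here is a minimal Python sketch of the 8-byte VXLAN header as laid out in RFC 7348. The function names are my own and purely illustrative. Notice that nothing in the header says how the remote tunnel endpoints or MAC addresses are learned, and that is exactly the part each OEM is free to solve differently.

```python
import struct

# "I" flag from RFC 7348: set when the VNI field carries a valid value.
VXLAN_FLAG_VNI_VALID = 0x08


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 bits of flags followed by 24 reserved bits.
    # Word 2: 24 bits of VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Read the VNI back out of a VXLAN header (hypothetical helper)."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(5000)
    print(header.hex(), unpack_vni(header))
```

Two vendors that both emit this header on the wire can still be incompatible if one learns endpoints through a controller and the other through BGP EVPN, which is why the interoperability question is worth asking up front.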
For quite some time I have been sort of irritated (for lack of a better word) by hearing and discussing the market buzzword called SDN. Let's first review some interesting facts:
- Some people seem to be afraid of losing their jobs, thinking they will be irrelevant in the near future. There are lots of posts on the web where people throw around opinions about what the value of the CCIE is now and whether the CCIE is relevant any longer. (Mostly these are people claiming to be SDN/NFV experts or evangelists)
- SDN means - STILL DON'T KNOW
- SDN is the only thing that next-generation networks will be built around or run on
- SDN & NFV are the future of networking
- Protocols like OpenFlow will take over the world soon and completely redefine networking
Now let's not try to define SDN here and keep that for the next post, since I want to clarify some of the misconceptions.
So coming back to the idea of whether there is fundamentally something new: well, it depends on your personal take on SDN to begin with, in terms of:
- What it is
- How it works
- What problems it solves
- What new models, frameworks and protocols work in the background (check under the hood)
Now if you take a closer look, most of these products are a good mix of:
- Overlay Networks (Host Based, Network Based or Hybrid - e.g.: VxLAN etc.)
- A good orchestrator that really works well for the most part (after numerous failed attempts by vendors in the past)
- Some better protocols/techniques to support workload mobility (e.g. LISP etc.)
- Fixes applied to some traditional techniques (e.g. poor programmability support with the CLI in the past, or missing shell access etc.)
- Some old fundamentals rediscovered (CAP theorem, game theory, Clos fabric, BGP Labelled Unicast etc.)
- Enhancements to existing protocols, chosen for their proven history and robustness (BGP LS, BGP EVPN, Segment Routing etc.), to meet new requirements in terms of scale and get rid of some old challenges
- Better API support
- Better abstraction
Now to me these are old ideas wrapped in a nice package (remember RFC 1925 Rule 11), but they are certainly needed to meet business demands and scale, now and in the near future, depending upon which business problems you are trying to solve with technology.
On the other hand, an important question is "Should I be afraid?"
Well, there are a couple of moving pieces which you need to consider:
- Most traditional network certifications, classes & courses don't cover these things, which really makes the situation tough
- There are very limited books and texts around these topics and some are even misleading
- Lack of new networking models to define and shape these protocols and other stuff well and standardize them across vendors
- Most of the companies in these segments present their products as real game changers
- Are you really bad at adopting change?
- Well, at the end of the day it's all about money... isn't it? :)
So while it's really good to have SDN tools around to solve different problems with technology and get rid of some challenges and limitations from the past, we are still far from having a T-800 from Judgment Day in real life :) (Another interesting question would be whether we really want to get to that stage.)
Let's start the series with discussion of CLOS Fabrics AKA Spine & Leaf Architectures.
Now the CLOS design is not fundamentally new, but most Network Engineers were not talking about it until recently (well... this is true to an extent). So as a Network Engineer, should you really care?
Well, you should start by asking: why CLOS in the first place? The major problem that a CLOS fabric solves is scalability. Scalability is a matter of context, though, and not everyone needs to go very far with it. A CLOS fabric also doesn't define your Layer 2 - Layer 3 boundaries by itself. So you are pretty much dependent upon what works best for you from a vendor implementation perspective, while keeping your overall goal in mind. Now in theory Layer 3 fabrics scale much better than Layer 2 fabrics. Here are some questions/things to figure out about CLOS if you decide to go for it:
- What is the scale that you have to deal with?
- What are the technical and business requirements?
- Is your DC traffic mostly east-west or north-south?
- How can you keep the state in the core (Spine) to a minimum?
- How does flooding work in your fabric?
- How is multicast handled in the fabric?
- Where do you define the Layer 2 - Layer 3 boundary?
- Is your network going to be multi-vendor, now or in future?
- How are you going to manage and monitor such a large network?
- How are you going to introduce security & services such as load balancers?
- How are you going to connect to the external world? (Border Spine vs. Border Leaf)
- Define your convergence requirements
- Are you going to need a single-stage or multi-stage CLOS?
- What is your oversubscription ratio? (Usually 3:1 is good for the most part)
- Understand your failure domains and the impact they may have
- Do you need Spine-to-Spine or Leaf-to-Leaf connections to mitigate some failure scenarios?
- If you are going with a Layer 3 fabric, is it going to be a good idea to use summarization?
- EBGP vs. IBGP (also RR placement) in a Layer 3 fabric?
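As a quick worked example for the oversubscription question in the list above, here is a small Python sketch that compares the server-facing bandwidth of a leaf with its uplink bandwidth towards the spines. The port counts and speeds are assumptions for a hypothetical leaf switch, not numbers from any particular product.

```python
from fractions import Fraction


def oversubscription_ratio(server_ports: int, server_gbps: int,
                           uplinks: int, uplink_gbps: int) -> Fraction:
    """Ratio of server-facing bandwidth to spine-facing bandwidth on one leaf."""
    downstream = server_ports * server_gbps   # bandwidth offered to servers
    upstream = uplinks * uplink_gbps          # bandwidth towards the spines
    if upstream <= 0:
        raise ValueError("a leaf needs at least one uplink")
    return Fraction(downstream, upstream)


def as_text(ratio: Fraction) -> str:
    """Format the ratio the way it is usually quoted, e.g. '3:1'."""
    return f"{ratio.numerator}:{ratio.denominator}"


if __name__ == "__main__":
    # Hypothetical leaf: 48 x 10G server ports and 4 x 40G uplinks.
    ratio = oversubscription_ratio(48, 10, 4, 40)
    print(as_text(ratio))  # 480G down / 160G up -> 3:1
```

Losing one of the four uplinks in this hypothetical leaf moves the ratio from 3:1 to 4:1, which is one reason to look at oversubscription together with your failure domains.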
As an example, Cisco's famous buzzword these days, ACI (Application Centric Infrastructure), also uses a Spine & Leaf design. It uses a BGP EVPN control plane (some secret sauce for now, but EVPN will soon be there too), on top of which it uses VXLAN as the data plane. So between Spine & Leaf (single stage) it uses a Layer 3 fabric. The entire fabric is managed with a centralized command and control system called the Cisco APIC controller. With ACI you can go as far as 6 Spines at the moment, and all services (e.g. load balancers), firewalls and external connectivity get terminated on Leaf switches. For server redundancy (bare metal or virtual) it uses our old friend Virtual Port Channel (vPC), but this time it doesn't require directly connected interfaces among leaf switches for the peer link and peer keepalive link functions. Cisco ACI is kind of built around another buzzword that you hear more often these days called SDN (Software Defined Networks). Now whether it fits the true SDN definition or not needs another discussion :). In the meanwhile, below is a list of URLs which you may find very handy to get started with CLOS:
http://packetpushers.net/podcast/podcasts/datanauts-011-understanding-leaf-spine-networks/
https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network
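To make the single-stage Spine & Leaf idea a bit more concrete, here is a rough Python sketch of a generic fabric (not ACI specific) where every leaf has one link to every spine. The device names and the leaf count are made up for illustration. It counts the fabric links and lists the equal-length leaf-to-leaf paths, which in such a topology is simply one path per spine.

```python
from itertools import product


def build_fabric(num_spines: int, num_leaves: int) -> set[tuple[str, str]]:
    """Return the (leaf, spine) links of a fabric where each leaf connects to each spine."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    return set(product(leaves, spines))


def leaf_to_leaf_paths(links: set[tuple[str, str]], src: str, dst: str):
    """List the two-hop paths src -> spine -> dst available over the given links."""
    spines_of_src = {spine for leaf, spine in links if leaf == src}
    spines_of_dst = {spine for leaf, spine in links if leaf == dst}
    return [(src, spine, dst) for spine in sorted(spines_of_src & spines_of_dst)]


def without_spine(links: set[tuple[str, str]], failed: str) -> set[tuple[str, str]]:
    """Simulate a spine failure by dropping all of its links."""
    return {(leaf, spine) for leaf, spine in links if spine != failed}


if __name__ == "__main__":
    links = build_fabric(num_spines=6, num_leaves=4)  # hypothetical sizes
    print(len(links))                                  # 6 spines x 4 leaves = 24 links
    print(len(leaf_to_leaf_paths(links, "leaf1", "leaf2")))
    degraded = without_spine(links, "spine3")
    print(len(leaf_to_leaf_paths(degraded, "leaf1", "leaf2")))
```

The useful takeaway is that a spine failure only removes one of the parallel paths instead of isolating anything, as long as your routing actually spreads traffic across all of them.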
I have been asked so many times by Network Engineers, from starter level right up to expert level, about how the network industry has been changing at a rapid pace in the last few years, along with questions like whether certification, and the CCIE in particular, holds any value any longer. I also spoke with a couple of friends whom I truly admire and who are working in the US, Europe & Australia to get feedback on how the network industry is evolving there. Now to start with, the following technologies are definitely picking up in some shape or form:
- SDx, like SDN (Software Defined Networks)
- CLOS / Spine & Leaf designs
- NFV (Network Function Virtualization)
- Virtualization (in areas like network, compute & storage)
- Automation (Chef, Puppet etc.)
- Scripting & programming (Python, Bash, Java etc.)
- Cloud computing
- Network visibility
- Overlay networks/tunneling technologies (VXLAN, NVGRE etc.)
- Network modeling methods
- APIs (Application Programming Interfaces, like REST)
- Understanding of Unix & Linux
- Big Data
- Active-Active Data Centres
- Machine learning
- Segment Routing
- Containers (Docker...)
- Deep understanding of application structure/components & life cycle
And last but not least, a deep understanding of protocols like TCP, HTTP etc. But again, the impact these may have on the current network industry (what they call traditional networking now) may vary by a large margin depending upon:
- Which part of the world you are living in
- How the IT industry is driven there, and the network industry in particular
- Which company you work for
- What your personal/political views are
- Your company's IT strategy & road maps
- Your current skill set & fears about these new technologies
- Whether there is any simulation tool to get familiar with these technologies
- What the maturity level of the given technology is (RFC? New model? etc.)
- How you want to manage these solutions (Afraid of open source? Multi-vendor blame game? etc.)
- Whether you really have a good business case
Now there are of course other factors, including budget/cost, ROI, how to get your operational staff ready etc. But I hope you get the idea. In the coming series of posts I will express my personal opinions around all of these, but those articles are going to be semi-technical rather than completely technical, since I am more of a Pre-Sales guy now.