Monthly Archives: April 2017

Veeam Cloud Connect

A few weeks ago at work we decided to install Veeam Cloud Connect. We had several use cases in mind, but with the arrival of the Windows Agent, currently in beta and soon generally available, we had more reason than ever to deploy Cloud Connect.

Recently at a local Veeam User Group (where I’m one of the leaders) I spoke to Luca Dell’Oca from Veeam about this, and following his reference architecture document the installation was mostly straightforward. I wanted to write a blog about the experience and include a diagram of the networking design. The installation went smoothly for the most part, but there were a few points that I had trouble with, and to save you the trouble I decided to write this blog.

The first step for this installation, as with any installation for that matter, is to design the project and create a drawing of the network design. The reference document recommends that you install the services in a 3-tier network where you separate the Cloud Gateway services in a DMZ, have your Veeam management services in an internal network, and finally have your WAN acceleration and repository services on a separate internal network. The main reason for having your repository server on a separate network from the Veeam management services is, in my view, to create as closed a network as possible where the customer data resides.

One thing to point out is that you most likely have a management Active Directory installation where your Veeam management services are, along with some other services and most likely other monitoring services. You don’t want to use those credentials on the repository server or have a direct connection between the management network and the storage network, so use only secure local accounts on the repository server. One major reason for this is cryptolocker viruses, which are getting more and more aggressive; keeping the repository services out of reach of “daily” admin operations helps in this regard.

Even though the Cloud Gateway services don’t contain any valuable data, it’s also good practice to limit the connections from the DMZ to the services network, and therefore the Cloud Gateway servers are not AD joined and only have secure local accounts.

(Click on the image for a larger version.) Please note that all DNS names, IPs, network names, etc. in this diagram are fictional.

Going from top to bottom through the diagram, we start with the users. They can either be using Veeam Backup & Replication installed at their location, or they can have the Windows Agent installed on their servers or laptops. After they have signed up and received a username and password for the service, they connect to the DNS name of the service, in this demo case “”. By using a dedicated service name instead of just the bare domain, you keep a logical DNS name available for your customer portal, product information, etc. on a separate webserver. The DNS record for the service points to one IP address on each of the Cloud Gateway servers, so it is load balanced via DNS round robin.
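A quick way to inspect a round-robin record from any client is to list every A record the service name resolves to. Here is a minimal Python sketch; `cloudconnect.example.com` is a made-up stand-in for the real service FQDN:

```python
import socket

def resolve_all(hostname):
    """Return every IPv4 address a hostname resolves to.

    With DNS round robin, a service name (the hypothetical
    cloudconnect.example.com below) returns one A record per
    Cloud Gateway, and clients rotate between them.
    """
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # getaddrinfo returns (family, type, proto, canonname, (ip, port)) tuples
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # Each address printed here would be one Cloud Gateway's public IP.
    for ip in resolve_all("cloudconnect.example.com"):
        print(ip)
```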

The Cloud Gateway servers are in a DMZ behind your external firewall. Only port TCP/UDP 6180 is open from the internet to those IPs. The Cloud Gateways communicate with the Veeam server on ports TCP 6160-6169, with the WAN accelerator on ports TCP 2500-5000 and TCP 6164-6165, and finally with the repository server on ports TCP 2500-5000.

The next part of the diagram is the services network, where the Veeam server resides, as well as Enterprise Manager if used. In this network you most likely have your management AD installation as well as the database for the Veeam installation. You can of course have the database installed on the Veeam server as well, but for larger installations it’s better to separate the roles and have a separate VM running the SQL services.

Communication from the Veeam server to the Cloud Gateway servers is on ports TCP 2500-5000, but to install the Veeam service in the first place, several standard Windows ports need to be open from the Veeam server to the Cloud Gateway servers: TCP/UDP 135, 137-139 and 445.

The Veeam server also needs to connect to the WAN accelerator on ports TCP 6150-6164, and again, to install the services, ports TCP/UDP 135, 137-139 and 445 are needed.

For the repository server, only the installation ports TCP/UDP 135, 137-139 and 445 are needed, as there is no data flowing from the Veeam management services to the repository server.

Then to the storage network. This network should be closed off as much as possible, and by only allowing the ports needed from the Cloud Gateways to the repository servers and the WAN acceleration service, you limit the attack surface. As suggested in Luca’s document, you can even close the TCP/UDP 135, 137-139 and 445 ports once the installation of the services has finished, to make the storage network even more secure.
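When verifying firewall openings like the ones above, a simple TCP reachability check from each server saves a lot of guesswork. This is a small sketch; the host names and the port selection are illustrative examples based on the diagram, not values Veeam defines:

```python
import socket

# Example targets and ports from the design above; adjust to your own diagram.
# The host names are hypothetical.
CHECKS = {
    "gateway01.example.local": [6160, 445],
    "repo01.example.local": [2500, 445],
}

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, ports in CHECKS.items():
        for port in ports:
            state = "open" if tcp_port_open(host, port) else "closed/filtered"
            print(f"{host}:{port} -> {state}")
```

Note that a TCP check like this only covers connect-level reachability; UDP ports (such as UDP 137-139) need a different test.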

So on to the installation process. In my case I already had the management AD service installed, along with my database server. The base installation of the Veeam server is straightforward, and there is no need for me to describe that process as many other blogs and documents already cover it. But when it is finished, you have 4 main tasks to complete.

  1. Define a Veeam repository. In my case I created a Scale-out repository, as my back-end storage has 3 data volumes.
  2. Install the certificate for the Cloud Connect service. Here I got into trouble; more on that later.
  3. Add the cloud gateways, where you connect the Veeam installation to the gateway servers in the DMZ. Here I also had a problem.
  4. And finally, add tenants, define their quotas, create usernames and passwords, etc. (the easy part).

And now to my problems.

My first problem was with the certificate. I planned to use an existing wildcard certificate we had for the primary domain. I was able to install the certificate, but when I tried to connect to the service with my Veeam Agent or the Veeam installation in my lab, I got a certificate error stating that the certificate was not valid and did not contain the subject name of my service. After some troubleshooting I remembered that I had seen a blog or a tweet about this somewhere, and after some research I found more people having the same issue with wildcard certificates.
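When debugging subject-name errors like this, it helps to remember how wildcard matching in TLS normally works: a wildcard covers exactly one left-most DNS label. A rough sketch of that common matching rule (this is not Veeam’s actual validation code, and the example.com names are hypothetical):

```python
def hostname_matches(pattern, hostname):
    """Simplified TLS hostname check: a '*' wildcard covers exactly
    one left-most DNS label, nothing more."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    for p, h in zip(p_labels, h_labels):
        if p != "*" and p != h:
            return False
    return True

# A wildcard for the primary domain matches one level of subdomain...
print(hostname_matches("*.example.com", "cloudconnect.example.com"))  # True
# ...but not the bare domain, and not deeper names.
print(hostname_matches("*.example.com", "example.com"))               # False
print(hostname_matches("*.example.com", "a.b.example.com"))           # False
```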

Then to my next problem. I got a new certificate with the service FQDN in its subject and tried to install it through the Veeam “Manage Certificate” wizard, but at the end I got an error stating: “Failed to assign certificate to the Cloud Connect Service”. At this point, as I had already spent a couple of hours on the wildcard certificate, I decided to create a support case with Veeam and get help from them. After a short time I got a phone call from support, and we went through the process several times without any luck. The engineer found that the certificate was installed in the Windows certificate store, but its subject name was empty. He then suggested that we install the certificate directly into the Windows certificate store, and that we did. Then, in the “Manage Certificate” wizard, I was able to select an existing certificate, and that went through. So if you hit this problem: install the certificate into Windows first, and then import it into the Veeam installation.

My third and final problem came when I tried to add the Cloud Gateway servers to the installation. As far as I could see from the reference architecture document, I had asked for the correct ports to be opened in the firewall, but after several retries, and after having my network guy look at the firewall logs, we saw that port TCP 445 was getting hits but was not open. The “Used Ports” page on the Veeam Help Center correctly lists ports TCP/UDP 135, 137-139 and 445. This was easily fixed, and after that I was able to continue creating tenants.

Overall the experience was pretty straightforward, but those 3 issues took several hours to work through. The good part, though, is that I now have a better understanding of the product and could write a meaningful blog post that might actually help someone having the same issues I had.

Share this blog or comment if you have questions.


RAID mania in my workstation

Recently I got hold of 6 x Intel DC S3610 SSD drives that I wanted to play around with, to see what performance I could get out of them in my workstation PC at home.

To give you a little background, I decided to write a bit about my setup and use case. My workstation is mainly used for graphics work and map making for a project I have. Creating those maps takes a lot of resources in all categories; I always need more RAM, more CPU power, and especially disk performance and space. For example, I was recently working on new 3D map data files for Iceland: the uncompressed dataset was 6TB of 3D files, and the final 3D dataset that I saved out of this raw data was approx. 400GB of Lidar-like data. My workstation is pretty beefy, but some projects I have to run on a VM on my server, which has dual 6-core CPUs, 288GB of RAM and around 20TB of storage.

My workstation is based on a Gigabyte GA-X99-UD4 motherboard with 64GB of RAM and an Intel i7 5930K CPU (3.5 GHz, overclocked to 4.4 GHz). I have a GeForce 1080 graphics card and an older GeForce 980 card in the system as well, as some of the workflows I use utilize the CUDA cores on both GPUs. For the OS and temp files I have been using a 512GB Samsung 950 Pro NVMe M.2 drive. I had a Mushkin Scorpion Deluxe 480GB PCIe drive for my working dataset, but that good old card recently died on me.

So back to what I have now: 6 x Intel DC S3610 SSDs! I wanted to find the best configuration for those drives, find out which RAID levels would give me the best performance, and also see whether I should try out Storage Spaces in Windows 10. I don’t have a dedicated hardware RAID card in my setup, and for a long time I have used the Intel chipset Rapid Storage Technology RAID (chipset/software RAID) to do 2-drive stripes or mirrors. So far that has worked great for me. But now, with 6 drives, I needed to see where my bottleneck would be. Would it be the drives, the chipset, or even my old (but pretty powerful when overclocked) CPU?

Initially I decided to test RAID 0 and see how linearly the performance would scale when adding more than 2 drives. I used FIO with multiple files and threads to make sure I wouldn’t cap my results on a single file or a single CPU thread.
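For reference, an FIO job file along these lines reproduces that kind of test. The specific values here (4 jobs, 4 files per job, drive letter, run time) are illustrative, not the exact parameters I used:

```ini
; random-read.fio - illustrative job file, not my exact test parameters
[global]
ioengine=windowsaio   ; Windows async IO engine; use libaio on Linux
direct=1              ; bypass the OS cache
rw=randread           ; switch to randwrite for the write tests
bs=4k                 ; repeat the run for 8k, 16k, 32k, 128k...
runtime=60
time_based
group_reporting

[randread]
directory=R\:\        ; the RAID volume under test (hypothetical drive letter)
numjobs=4             ; multiple threads so one CPU core is not the cap
iodepth=32
size=4g
nrfiles=4             ; multiple files per job
```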

(click on images for larger version)

The graph shows random read IOPS. Here I saw a clear benefit from having 2 drives in RAID 0, and a little more benefit from having 3 drives. 4 or more drives resulted in worse performance, except at 4k and 8k block sizes.

The same trend showed when doing random writes: 3 drives gave some performance boost over 2 drives, but 4-6 drives resulted in worse performance. I then went into the BIOS, changed the SATA ports to AHCI mode instead of RAID mode, and tested using Windows Disk Management software RAID. I tested the “Simple” Storage Spaces profile as well, as that is also based on striping, like RAID 0.

Here I got more performance and more linear growth. Scaling up to 6 drives showed at block sizes up to 16k, but little was added at 32k after 4 drives, and at 128k block size 3-6 drives gave the same performance. Storage Spaces with the “Simple” profile and 6 drives was a little behind software RAID at 4-16k block sizes, but there was minimal difference at larger block sizes.

The same goes for random writes: more linear performance at 4-8k, but after 3 drives at 32k there was almost no gain.

Looking at the MB/sec graphs, I noticed an obvious bottleneck in the system at approx. 1600MB/sec read and 1250MB/sec write. No matter the block size or number of drives, I could not get more throughput out of the system. My conclusion is that the X99 chipset is at its maximum there; with SSDs like the Intel S3610, more than 3 drives will saturate the maximum throughput of the chipset.

When looking at the average numbers below, it’s clear that software RAID in Windows outperforms the Intel chipset RAID, especially at the lower block sizes, where my system could deliver approx. 230,000 IOPS at 4k. At that rate my CPU was 100% busy doing those IOs, and a more powerful CPU would probably have gotten me some more IOPS.

Since I didn’t want to run RAID 0 in production, I tested different RAID options, as well as the options in Storage Spaces in Windows 10, to see what would give me the best performance along with a reasonable level of protection in case of a drive failure.

When using the X99 chipset RAID levels, I was able to do RAID 5 with 6 drives, but with RAID 10 I was only able to use 4 drives. I also tested a mirror with 2 drives, and 3 sets of mirrors striped together with RAID 0 in Windows Disk Management to emulate RAID 10 with 6 drives. With Windows Storage Spaces I created a 6-disk “Parity” drive and a 3-disk Parity drive, a 6-disk 2-way mirror and a 6-disk 3-way mirror, and a 2-disk 2-way mirror and a 5-disk 3-way mirror.

The usable space varies greatly between those options, and of course when using fewer drives I would have the option to create more volumes.

Storage Spaces options

  • Simple (6 drives): 2.61 TB
  • 3 x Parity: 888 GB
  • 6 x Parity: 1.73 TB
  • 6 x 2-way mirror: 1.29 TB
  • 6 x 3-way mirror: 886 GB
  • 2 x 2-way mirror: 447 GB
  • 5 x 3-way mirror: 741 GB

Chipset options

  • RAID 5 x 6: 2.2 TB
  • RAID 10 x 4: 894 GB
  • RAID 1 x 2: 447 GB
  • RAID 10 (3 x 2 mirrors): 1.3 TB
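The usable capacities follow from simple per-layout formulas. A rough calculator, assuming about 447 GB usable per drive as in my setup; the parity numbers are idealized and ignore the extra overhead Storage Spaces takes:

```python
def usable_gb(drives, level, per_drive_gb=447):
    """Idealized usable capacity for common layouts.

    per_drive_gb=447 roughly matches the Intel DC S3610 480GB drives
    here; real Storage Spaces volumes lose a bit more to metadata.
    """
    if level == "simple":        # striping, no redundancy (RAID 0)
        return drives * per_drive_gb
    if level == "2way-mirror":   # every block stored twice
        return drives * per_drive_gb // 2
    if level == "3way-mirror":   # every block stored three times
        return drives * per_drive_gb // 3
    if level == "parity":        # single parity, one drive's worth lost (RAID 5)
        return (drives - 1) * per_drive_gb
    raise ValueError(level)

print(usable_gb(6, "simple"))       # 2682 GB, close to the Simple figure
print(usable_gb(6, "2way-mirror"))  # 1341 GB, close to the 6 x 2-way figure
print(usable_gb(6, "parity"))       # 2235 GB, close to the RAID 5 x 6 figure
```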

Across those different RAID levels I got pretty good read performance overall, but as expected, the 2-drive mirror did not hold up against the RAID 5 or RAID 10 options.

When looking at writes, RAID 5 took a huge write penalty, and that option did not look very promising.
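That penalty is inherent to parity RAID: each small random write also costs reads and writes of data and parity blocks. A back-of-the-envelope model using the classic penalty factors; the per-drive IOPS figure is a made-up round number, not a measured value:

```python
# Classic random-write penalty factors: RAID 0 = 1, RAID 1/10 = 2, RAID 5 = 4
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4}

def effective_write_iops(drives, level, per_drive_iops=20_000):
    """Aggregate random-write IOPS after the RAID write penalty.

    per_drive_iops is illustrative; real numbers depend on the drive,
    block size and queue depth.
    """
    return drives * per_drive_iops // WRITE_PENALTY[level]

for level in ("raid0", "raid10", "raid5"):
    print(level, effective_write_iops(6, level))
```

With these toy numbers, 6 drives in RAID 5 deliver only a quarter of the RAID 0 write rate, which is the same shape as the graphs above.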

Turning to Windows Storage Spaces, I had more options to test.

Read performance was better than the Intel RAID options with the same number of drives as before.

On write performance, both Parity options were terrible, approx. 10 times worse than the Intel RAID 5 option. The 2-way mirror with 6 drives looked good; it had less than half the 4k write performance of the Simple profile with no redundancy, but was still better than the 6-drive RAID 0 option on the Intel chipset RAID.

Looking at the MB/sec values, most options capped at around 1550MB/sec, just as the different RAID 0 options did before.

Looking at write MB/sec, it’s obvious that the RAID 5 options had the worst performance, and the 6-drive 2-way mirror was just above the mixed RAID 1/0 mode where Intel RAID 1 mirrors and a Windows RAID 0 stripe were bundled together.

Looking again at the average numbers:

Windows Storage Spaces showed better performance when using all 6 drives.

Again, the 6-drive 2-way mirror in Storage Spaces gave me the best average write performance.

Overall, the performance of Windows Storage Spaces was a nice surprise. Previously I hadn’t given it much thought, as I had been using Intel chipset RAID for 2-disk configurations with almost a 2x performance boost over a single drive. Storage Spaces also allowed me to bundle all 6 disks into one pool and then carve out volumes with different protection, or no protection at all, and that is exactly what I ended up doing.

I kept my OS on the Samsung 950 Pro NVMe M.2 drive, then created a 1.3TB volume with 2-way mirror protection for my workflow data, where I work on the mapping data files. I created a 500GB volume with the “Simple” profile, where I placed all temp files, the Photoshop scratch disk file and my mapping software’s temp folder. I also have a 4TB Seagate SSHD “hybrid” drive (ST4000DX002) for archives and other stuff, like my drone flight videos and images from my DSLR camera.

I hope this blog was useful for you, especially if you had plans to do RAID on the Intel chipset with more than 2 SSD drives. My suggestion is to use Windows Storage Spaces instead.

Share if you liked this blog post, or write a comment below.

Cohesity and Veeam, better together.

At the end of last year, my Veeam repository storage was filling up; it was also a very old FC storage array that I wanted to get rid of. My CTO tasked me with finding a solution that would be cost effective and would scale, as our backup storage needs were growing at a considerable rate. The storage would be placed in our secondary site, where our Veeam copy jobs reside.

To give you an overview of our setup, here is a simplified diagram where our Kopavogur site is the primary site and the Akureyri site is the secondary site. We have more proxies, VMware hosts and some guest interaction proxies at customer locations that we back up to our system, but that’s out of scope for this blog, so those components are not shown.


In our primary site I do the daily backups to an on-site repository for quick recovery of VMs and data, but all copy jobs from the site go to our secondary site in Akureyri.

When looking for solutions for the Akureyri site, I evaluated the benefits of using a deduplication appliance, since copy jobs can greatly benefit from such storage: the same VM images are saved over and over again, which in our backup policies meant 10-15 times, depending on the level of protection the customer wants. My conclusion was that a deduplication appliance would be a great fit for the job, but my previous experience with deduplication appliances, where performance would not scale well and forklift upgrades were needed to add more performance, was a show stopper for that route.

Then I heard of Cohesity, as my friends Frank Brix Pedersen and Paul Schatteles left PernixData (RIP Pernix…) and started working for the startup Cohesity. I got an introduction to their product at a local VMUG event where I’m a leader, and a live demonstration of the product at VMworld Barcelona in 2016. There I met other Cohesity staff as well, so I got good insight into their plans and how they were building the solution out of hyperconverged nodes, where compute performance scales linearly with storage capacity. This got my attention, as it put my wishes for a deduplication appliance back on track.

My next step was to get a POC underway to evaluate the solution, and we agreed on predefined goals for the 30-day POC to be considered successful. I listed 9 different points that would qualify as a successful initial setup, and another 7 points regarding functionality and performance requirements.

Some of the initial points were (the points were more detailed in the actual POC document):

  • Successful base installation of the product in the Akureyri site
  • Alerts and call-home functionality
  • Initial setup of a View Box and Views to publish a CIFS share to the Veeam repository server
  • NFS share creation, connection to my ESXi hosts in the Akureyri site, and creation of NFS storage for archiving

Functionality and performance requirement points (again, more detailed in the POC document):

  • Maintain data availability throughout a simulated hardware failure (node reboot, drive failure, etc.)
  • Successfully function as a Veeam repository target for the POC timeframe
  • Successfully perform an Instant Recovery with Veeam at reasonable performance
  • Stable user experience throughout the POC timeframe
  • Achieve a dedup ratio of 1:5 by the end of the POC timeframe
  • Successful support case generation and acceptable response time from Cohesity support

I got the box in mid-December 2016, and with help from Frank Brix I installed and set up the solution remotely in the Akureyri site. After some initial testing I connected Veeam to the box, and everything was up and running the same day. As the holiday season had started, I was unable to begin the POC work right away, but at the end of the 30 days in early January, every point in the POC document was fulfilled except the 1:5 dedup ratio. That was due to the fact that only 3 weekly copy job runs had completed in the timeframe, but as my dedup ratio was already almost 1:3, I concluded that the theory worked and that I would reach a 1:5 dedup ratio after another 2-3 weeks.
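That reasoning is easy to sanity check with a toy model: each weekly copy run adds a full logical image, while the appliance only stores the changed blocks. The backup size and weekly change rate below are invented for illustration, not our real numbers:

```python
def dedup_ratio(weeks, full_tb=10.0, change_rate=0.1):
    """Toy dedup model: logical data grows by one full copy per week,
    physical data grows only by the changed blocks.

    full_tb and change_rate are made-up illustrative values.
    """
    logical = weeks * full_tb
    physical = full_tb + (weeks - 1) * full_tb * change_rate
    return logical / physical

for weeks in (3, 6, 10):
    print(f"week {weeks}: {dedup_ratio(weeks):.1f}:1")
```

Even this crude model shows the ratio climbing from roughly 2.5:1 after 3 runs toward 5:1 and beyond as more weekly copies accumulate, which matches what I saw.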

We decided that the POC was a success and went ahead with the purchase. After a few weeks, as projected, I got the 1:5 dedup ratio that was defined in the POC document, so my CTO got his cost-per-GB projection confirmed and we went on with our daily business.

Now, a few months later, I’m really happy with the product. I have created a few minor cases with Cohesity support and always got great response times and help with my issues. The dedup ratio is on the rise, and in the 4-node, 2U box I have more than 400TB of logical data stored at this time. I expect that number to double in the next 4-6 months without having to purchase another node. Nodes can be added later when needed, and the great thing about expansion is that I have a linear path on storage, performance and cost.

Last week I did a case study document with Paul, in which we wrote about the project; it can be downloaded here from the Cohesity website.

Share or comment if you found this article helpful.