Veeam Cloud Connect

A few weeks ago at work we decided to install Veeam Cloud Connect. We had several use cases in mind, and with the Windows Agent now in beta and soon to be generally available, we had more reason than ever to deploy Cloud Connect.

Recently at a local Veeam User Group (where I’m one of the leaders) I spoke to Luca Dell’Oca from Veeam about this, and by following his reference architecture document the installation was mostly straightforward. I wanted to write a blog post about the experience and include a diagram of the networking design. The installation went smoothly for the most part, but there were a few points that I had trouble with, and to save you the trouble I decided to write this post.

The first step for this installation, as with any installation for that matter, is to design the project and draw the network design. The reference document recommends that you install the services in a 3-tier network: the Cloud Gateway services in a DMZ, your Veeam management services in an internal network, and finally your WAN acceleration and repository services on a separate internal network. The main reason for keeping the repository server on a separate network from the Veeam management services is, in my view, to make the network where the customer data resides as closed as possible.

Another thing to point out is that you most likely have a management Active Directory installation in the network where your Veeam management services are, along with other services and most likely monitoring services. You don’t want to use those credentials on the repository server or have a direct connection between the management network and the storage network, so use only secure local accounts on the repository server. One major reason for this is crypto-locker viruses, which are getting more and more aggressive; keeping the repository services out of reach of the admins’ “daily” operations helps in this regard.

Even though the Cloud Gateway servers don’t contain any valuable data, it’s also good practice to limit the connections from the DMZ to the services network. Therefore the Cloud Gateway servers are not joined to AD and only have secure local accounts.


Please note that all DNS names, IPs, network names etc. in this diagram are fictional.

Going through the diagram from top to bottom, we start with the users. They can either be using Veeam Backup & Replication installed at their location, or have the Windows Agent installed on their servers or laptops. After they have signed up and received a username and password for the service, they connect to the DNS name of the service, in this demo case “cc.veeam.vblog.is”. By using cc.veeam.vblog.is instead of just veeam.vblog.is, you keep a logical DNS name free for your customer portal, product information etc. on a web server with the DNS name “veeam.vblog.is”. The DNS record cc.veeam.vblog.is points to one IP address on each of the Cloud Gateway servers, so the load is balanced by DNS round robin.
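To check that a round-robin record really returns one address per gateway, a small lookup helper is enough. This is only a sketch: the function name gateway_addresses and the injectable resolver parameter are my own choices for illustration, and since the names in this post are fictional you would pass your own service name.

```python
import socket

def gateway_addresses(hostname, port=6180, resolver=socket.getaddrinfo):
    """Return the distinct IPv4 addresses behind a round-robin DNS name.

    `resolver` defaults to the real DNS lookup; it is a parameter only so
    the function can be exercised without network access.
    """
    infos = resolver(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

With two Cloud Gateway servers behind the record, the returned list should contain two addresses.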

The Cloud Gateway servers are in a DMZ behind your external firewall. Only port TCP/UDP 6180 is open from the internet to those IPs. The Cloud Gateways communicate with the Veeam server on ports TCP 6160-6169, with the WAN accelerator on ports TCP 2500-5000 and TCP 6164-6165, and finally with the repository server on ports TCP 2500-5000.

The next part of the diagram is the services network, where the Veeam server resides, as well as Enterprise Manager if used. In this network you most likely have your management AD installation as well as the database for the Veeam installation. You can of course have the database installed on the Veeam server as well, but for larger installations it’s better to separate the roles and have a separate VM running the SQL services.

Communication from the Veeam server to the Cloud Gateway servers is on ports TCP 2500-5000, but to install the Veeam services in the first place, several standard Windows ports need to be open from the Veeam server to the Cloud Gateway servers. Those are TCP/UDP 135, 137-139 and 445.

The Veeam server also needs to connect to the WAN accelerator on ports TCP 6150-6164, and again, to install the services, ports TCP/UDP 135, 137-139 and 445 are needed.

For the repository server, only the installation ports TCP/UDP 135, 137-139 and 445 are needed, as there is no data flowing from the Veeam management server to the repository server.

Then to the storage network. This network should be closed off as much as possible; by only allowing the ports needed from the Cloud Gateways to the repository servers and WAN acceleration service, you limit the size of the attack surface. As suggested in Luca’s document, you can even close the TCP/UDP 135, 137-139 and 445 ports once the installation of the services has finished, to make the storage network even more secure.
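The port requirements described in this post can be written down as a small rule table, which makes it easy to compare against what the firewall team actually opened. This is a sketch of my own, not something from the reference architecture: the zone names (internet, gateway, veeam, wan, repo) and the after_install switch are illustrative, and the switch simply drops all installation ports at once.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    src: str            # source zone
    dst: str            # destination zone
    proto: str          # "tcp" or "udp"
    ports: range        # inclusive port range, stored as a Python range
    install_only: bool  # True for ports needed only while deploying services

def ports(first, last=None):
    """Inclusive port range helper."""
    return range(first, (last if last is not None else first) + 1)

RULES = [
    Rule("internet", "gateway", "tcp", ports(6180), False),
    Rule("internet", "gateway", "udp", ports(6180), False),
    Rule("gateway", "veeam", "tcp", ports(6160, 6169), False),
    Rule("gateway", "wan", "tcp", ports(2500, 5000), False),
    Rule("gateway", "wan", "tcp", ports(6164, 6165), False),
    Rule("gateway", "repo", "tcp", ports(2500, 5000), False),
    Rule("veeam", "gateway", "tcp", ports(2500, 5000), False),
    Rule("veeam", "wan", "tcp", ports(6150, 6164), False),
]

# Standard Windows ports used to push the services from the Veeam server.
for dst in ("gateway", "wan", "repo"):
    for proto in ("tcp", "udp"):
        for span in (ports(135), ports(137, 139), ports(445)):
            RULES.append(Rule("veeam", dst, proto, span, True))

def is_allowed(src, dst, proto, port, after_install=False):
    """True if a rule permits the flow; install-only rules are skipped
    when after_install is set."""
    return any(
        r.src == src and r.dst == dst and r.proto == proto and port in r.ports
        and not (after_install and r.install_only)
        for r in RULES
    )
```

For example, is_allowed("veeam", "gateway", "tcp", 445) is the flow that has to be open while the gateway services are being installed.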

So, on to the installation process. In my case I already had the management AD service and my database server installed. The base installation of the Veeam server is straightforward, and there is no need for me to describe it, as many other blogs and documents go through that process. But when it’s finished, you have 4 main tasks to complete.

  1. Define a Veeam repository. In my case I created a Scale-out repository, as my back-end storage has 3 data volumes.
  2. Install the certificate for the Cloud Connect service. Here I got into trouble, more on that later.
  3. Add the cloud gateways, where you connect the Veeam installation to the gateway servers in the DMZ. Here I also had a problem.
  4. And finally, add tenants, define their quotas, create usernames and passwords etc. (the easy part).

And now to my problems.

My first problem was with the certificate. I planned to use an existing wildcard certificate we had for the primary domain. I was able to install the certificate, but when I tried to connect to the service with my Veeam Agent or the Veeam installation in my lab, I got a certificate error stating that the certificate was not valid and did not contain the subject name of my service. After some troubleshooting I remembered having seen a blog post or a tweet about this, and after some research I found more people having the same issue with wildcard certificates.

Then to my next problem. I got a new certificate with the FQDN of the service in its subject and tried to install it through the Veeam “Manage Certificate” wizard, but I got an error at the end stating: “Failed to assign certificate to the Cloud Connect Service:”. At this point, as I had already spent a couple of hours on the wildcard certificate, I decided to open a support case with Veeam. After a short time I got a phone call from support, and we went through the process several times without any luck. The support engineer found that the certificate was installed in the Windows certificate store, but its subject name was empty. He then suggested installing the certificate directly into the Windows certificate store, and that is what we did. After that, I was able to select an existing certificate in the “Manage Certificate” wizard, and that went through. So if you run into this problem, install the certificate in Windows first, and then select it in the Veeam wizard.

My third and final problem came when I tried to add the Cloud Gateway servers to the installation. As far as I could see from the reference architecture document, I had asked for the correct ports to be opened in the firewall, but after several retries, and after having my network guy look at the firewall logs, we saw that port TCP 445 was getting hits but was not open. The “Used Ports” page in the Veeam Help Center correctly lists ports TCP/UDP 135, 137-139 and 445. This was easily fixed, and after that I was able to continue on to creating tenants.

Overall the experience was pretty straightforward, but those 3 issues took several hours to work through. The good part is that I now have a better understanding of the product and could write a meaningful blog post that might actually help someone who runs into the same issues.

Share this blog or comment if you have questions

Cheers.

RAID mania in my workstation

Recently I got hold of 6 x Intel DC S3610 SSD drives that I wanted to play around with, to see what performance I could get out of them in my workstation PC at home.

To give you a little background, here is a bit about my setup and use case. My workstation is mainly used for graphics work and map making for a project of mine, www.iskort.is. Creating those maps takes a lot of resources in all categories: I always need more RAM, more CPU power, and especially disk performance and space. For example, I was recently working on new 3D map data files for Iceland; the uncompressed dataset was 6TB of 3D files, and the final 3D dataset that I saved out of this raw data was approx. 400GB of Lidar-“like” data. My workstation is pretty beefy, but some projects I have to run in a VM on my server, which has dual 6-core CPUs and 288GB of RAM, and where I also have around 20TB of storage.

My workstation is based on a Gigabyte GA-X99-UD4 motherboard. There I have 64GB of RAM and an Intel i7 5930K CPU (3.5 GHz, overclocked to 4.4 GHz). I have a GeForce 1080 graphics card and an older GeForce 980 card in the system as well, as some of the workflows I use utilize the CUDA cores on both GPUs. For the OS and temp files I have been using a 512GB Samsung 950 Pro NVMe M.2 drive. I had a Mushkin Scorpion Deluxe 480GB PCIe drive for my working dataset, but recently that good old card died on me.

So back to what I have now, 6 x Intel DC S3610 SSD! I wanted to find the best configuration for those drives, find out which RAID levels would give me the best performance, and also see whether I should try out Storage Spaces in Windows 10. I don’t have a dedicated hardware RAID card in my setup, and for a long time I have used the Intel chipset Rapid Storage Technology RAID (chipset/software RAID) to do stripes or mirrors of 2 drives. So far that has worked great for me. But now with 6 drives I needed to see where my bottleneck would be. Would it be the drives, the chipset, or even my old (but pretty powerful when overclocked) CPU?

Initially I decided to test RAID 0 and see how linearly the performance would scale when adding more than 2 drives. I used FIO with multiple files and threads to make sure my results wouldn’t be capped by a single file or a single CPU thread.
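For anyone who wants to repeat this kind of test, here is a sketch of how such a FIO test matrix could be generated. The exact parameters I ran are not recorded in this post, so the job count, queue depth, file size, runtime and target directory below are assumptions; only the block sizes and the random read/write patterns come from the text. The flags themselves are standard fio options.

```python
BLOCK_SIZES = ["4k", "8k", "16k", "32k", "128k"]   # block sizes discussed in this post
PATTERNS = ["randread", "randwrite"]

def fio_command(rw, block_size, directory, jobs=8, iodepth=32, runtime_s=60, size="1g"):
    """Build one fio command line. With --directory and --numjobs, fio gives
    every job its own file, so no single file or thread caps the result."""
    return [
        "fio",
        f"--name={rw}-{block_size}",
        f"--rw={rw}",
        f"--bs={block_size}",
        f"--directory={directory}",
        f"--numjobs={jobs}",        # assumed value
        f"--iodepth={iodepth}",     # assumed value
        f"--size={size}",           # assumed per-job file size
        f"--runtime={runtime_s}",
        "--time_based",
        "--direct=1",               # bypass the OS cache
        "--ioengine=windowsaio",    # native async IO engine on Windows
        "--thread",
        "--group_reporting",        # one summary per test instead of per job
    ]

def build_matrix(directory):
    """One command per (pattern, block size) combination."""
    return [fio_command(rw, bs, directory) for rw in PATTERNS for bs in BLOCK_SIZES]
```

Each entry can be passed to subprocess.run on a machine where fio is installed, for example with a directory on the volume under test.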


The graph shows random read IOPS. Here I saw a clear benefit from having 2 drives in RAID 0, and a little more benefit from having 3 drives in RAID 0. 4 or more drives resulted in worse performance, except at 4k and 8k block sizes.

The same trend showed for random writes: 3 drives gave some performance boost over 2 drives, but 4-6 drives resulted in worse performance. I then went into the BIOS, changed the SATA ports from RAID mode to AHCI mode, and tested software RAID in Windows Disk Management. I also tested the “Simple” Storage Spaces profile, as it is based on striping, like RAID 0.

Here I got more performance and more linear growth. Performance kept scaling up to 6 drives at block sizes up to 16k, but at 32k little was gained beyond 4 drives, and at 128k 3-6 drives all gave the same performance. Storage Spaces with the “Simple” profile and 6 drives was a little behind the software RAID at 4-16k block sizes, but the difference was minimal at larger block sizes.

The same goes for random writes: more linear scaling at 4-8k, but at 32k almost no gain beyond 3 drives.

Looking at the MB/sec graphs, I noticed an obvious bottleneck in the system at approx. 1600MB/sec read and 1250MB/sec write. No matter the block size or number of drives, I could not get more throughput out of the system. My conclusion is that the X99 chipset is at its maximum there, and that 3 SSDs like the Intel S3610 are enough to saturate the maximum throughput of the chipset.

When looking at the average numbers, it’s clear that software RAID in Windows outperforms the Intel chipset RAID, especially at the smaller block sizes, where my system could deliver approx. 230,000 IOPS at 4K. At that rate my CPU was 100% busy doing those IOs, and a more powerful CPU would probably get me some more IOPS.
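The relation between IOPS, block size and MB/sec explains why different limits show up at different block sizes. A quick sketch of the arithmetic, assuming 1k = 1024 bytes and 1 MB = 1,000,000 bytes (the function names are just for illustration):

```python
def throughput_mb_s(iops, block_kib):
    """MB/sec moved at a given IOPS rate and block size."""
    return iops * block_kib * 1024 / 1_000_000

def iops_ceiling(cap_mb_s, block_kib):
    """Highest IOPS possible under a throughput cap at a given block size."""
    return cap_mb_s * 1_000_000 / (block_kib * 1024)

# 230,000 IOPS at 4K is roughly 942 MB/sec, still below the approx. 1600 MB/sec
# read cap, which fits with the CPU being the limit at small block sizes.
small_block = throughput_mb_s(230_000, 4)

# At 128k the same cap allows only about 12,200 IOPS, no matter how many drives.
large_block = iops_ceiling(1600, 128)
```

So at small block sizes the system runs out of CPU before it runs out of bandwidth, and at large block sizes it is the other way around.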

Since I didn’t want to run RAID 0 in production, I tested different RAID options, as well as the options in Storage Spaces in Windows 10, to see what would give me the best performance with a reasonable level of protection in case of a drive failure.

When using the X99 chipset RAID levels, I was able to do RAID 5 with 6 drives, but with RAID 10 I was only able to use 4 drives. I also tested a mirror with 2 drives, and 3 sets of mirrors striped with RAID 0 in Windows Disk Management, to emulate RAID 10 with 6 drives. With Windows Storage Spaces I created a 6-disk “Parity” drive and a 3-disk parity drive, a 6 x 2-way mirror and a 6 x 3-way mirror, and a 2 x 2-way mirror and a 5 x 3-way mirror.

The usable space varies greatly between those options, and of course when using fewer drives I would have the option to create more volumes.

Storage Spaces options
  Simple                 2.61 TB
  3 x Parity             888 GB
  6 x Parity             1.73 TB
  6 x 2-way mirror       1.29 TB
  6 x 3-way mirror       886 GB
  2 x 2-way mirror       447 GB
  5 x 3-way mirror       741 GB

Chipset options
  Raid 5 x 6             2.2 TB
  Raid 10 x 4            894 GB
  Raid 1 x 2             447 GB
  Raid 10 x 3x2 mirror   1.3 TB
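The usable-space figures follow a simple pattern, which the sketch below reproduces to within a few percent. Several things here are assumptions rather than measured facts: the 447 GB per drive is read off the 2-drive mirror figure, 1 TB is taken as 1024 GB, and the 2/3 efficiency for Storage Spaces parity is inferred from the numbers above, not taken from documentation.

```python
DRIVE_GB = 447      # per-drive size, read off the "Raid 1 x 2" figure
GB_PER_TB = 1024    # assumption: Windows-style binary units

# Fraction of raw capacity left for data. The parity value is inferred
# from the reported numbers, not from documentation.
EFFICIENCY = {
    "simple": 1.0,
    "two_way_mirror": 1 / 2,
    "three_way_mirror": 1 / 3,
    "parity": 2 / 3,
}

def storage_spaces_gb(drives, layout):
    return drives * DRIVE_GB * EFFICIENCY[layout]

def raid5_gb(drives):
    return (drives - 1) * DRIVE_GB      # one drive's worth of parity

def mirrored_gb(drives):
    return drives // 2 * DRIVE_GB       # RAID 1 and RAID 10: half the drives

ESTIMATES = [  # (option, estimated GB, reported GB)
    ("Simple", storage_spaces_gb(6, "simple"), 2.61 * GB_PER_TB),
    ("3 x Parity", storage_spaces_gb(3, "parity"), 888),
    ("6 x Parity", storage_spaces_gb(6, "parity"), 1.73 * GB_PER_TB),
    ("6 x 2-way mirror", storage_spaces_gb(6, "two_way_mirror"), 1.29 * GB_PER_TB),
    ("6 x 3-way mirror", storage_spaces_gb(6, "three_way_mirror"), 886),
    ("2 x 2-way mirror", storage_spaces_gb(2, "two_way_mirror"), 447),
    ("5 x 3-way mirror", storage_spaces_gb(5, "three_way_mirror"), 741),
    ("Raid 5 x 6", raid5_gb(6), 2.2 * GB_PER_TB),
    ("Raid 10 x 4", mirrored_gb(4), 894),
    ("Raid 1 x 2", mirrored_gb(2), 447),
    ("Raid 10 x 3x2 mirror", mirrored_gb(6), 1.3 * GB_PER_TB),
]

for option, estimated, reported in ESTIMATES:
    print(f"{option:22} estimated {estimated:7.0f} GB   reported {reported:7.0f} GB")
```

The interesting row is parity: with 6 drives it gives noticeably less space than chipset RAID 5 on the same drives, which is what the 2/3 factor captures.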

With those different RAID levels I got pretty good read performance overall, but as expected the 2-drive mirror did not hold up against the RAID 5 or RAID 10 options.

When looking at writes, RAID 5 took a huge write penalty, and that option did not look very promising.

Turning to Windows Storage Spaces, I got more options to test.

As before, read performance was better than with the Intel RAID options on the same number of drives.

On write performance, both parity options were terrible, approx. 10 times worse than the Intel RAID 5 option. The 2-way mirror with 6 drives looked good: less than half the 4k write performance of the Simple profile with no redundancy, but still better than the 6-drive RAID 0 option on the Intel chipset RAID.

Looking at the MB/sec values, most options were capping at approx. 1550MB/sec, just as the different RAID 0 options did before.

Looking at write MB/sec, it’s obvious that the RAID 5 options had the worst performance, and the 6 x 2-way mirror was just above the “mixed” RAID 1/0 mode, where Intel RAID 1 and Windows RAID 0 were bundled together.

Looking again at the average numbers

Windows Storage Spaces showed better performance when using all 6 drives.

Again, the 6 x 2-way Storage Spaces mirror gave me the best average write performance.

Overall, the performance of Windows Storage Spaces was a nice surprise. Previously I didn’t give it much thought, as I had been using Intel chipset RAID for 2-disk configurations with an almost 2x performance boost over 1 drive. What Storage Spaces also allowed me to do is bundle all 6 disks into one pool and then carve out volumes with different protection, or no protection at all, and that is exactly what I ended up doing.

I kept my OS on the Samsung 950 Pro NVMe M.2 drive. I then created a 1.3TB volume with 2-way mirror protection for my workflow data, where I work on the mapping data files, and a 500GB volume with the “Simple” profile, where I placed all temp files, the Photoshop scratch disk file and my mapping software’s temp folder. I also have a 4TB SSHD “hybrid” drive from Seagate (ST4000DX002) for archives and other stuff like my drone flight videos and images from my DSLR camera.

I hope this post was useful for you, especially if you were planning to do RAID on the Intel chipset with more than 2 SSD drives. My suggestion is to use Windows Storage Spaces instead.

Share if you like this blogpost or write a comment below.

Cohesity and Veeam, better together.

At the end of last year, my Veeam repository storage was filling up; it was also a very old FC storage system that I wanted to get rid of. My CTO tasked me with finding a solution that would be cost-effective and would scale, as our backup storage needs were growing at a considerable rate. The storage would be placed in our secondary site, where our Veeam copy jobs reside.

To give you an overview of our setup, here is a simplified diagram where Kopavogur is our primary site and Akureyri is our secondary site. We have more proxies, VMware hosts and some guest interaction proxies at customer locations that we back up to our system, but that’s out of scope for this post, so those components are not shown.

In our primary site I do the daily backups to an on-site repository for quick recovery of VMs and data, but all copy jobs from the site go to our secondary site in Akureyri.

When looking for solutions for the Akureyri site, I evaluated the benefits of using a deduplication appliance, since copy jobs can benefit greatly from such storage: the same VM images are saved over and over again, which in our backup policies means 10-15 times, depending on the level of protection the customer wants. My conclusion was that a deduplication appliance would be a great fit for the job, but my previous experience with deduplication appliances, where performance did not scale well and fork-lift upgrades were needed to add more performance, was a show stopper for that route.

Then I got news of Cohesity, as my friends Frank Brix Pedersen and Paul Schatteles had left PernixData (RIP Pernix… :( ) and started working for the startup. I got an introduction to their product at a local VMUG event where I’m a leader, and a live demonstration at VMworld Barcelona in 2016. There I met other Cohesity staff as well, so I got good insight into their plans and how they were building the solution out of hyper-converged nodes, where compute performance scales linearly with storage capacity. This got my attention, as this approach would put my wish for a deduplication appliance back on track.

My next step was to get a POC underway to evaluate the solution, and we agreed on predefined goals for the 30-day POC to be considered successful. I listed 9 points that would qualify as a successful initial setup, and another 7 points regarding functionality and performance requirements.

Some of the initial points were (the points were more detailed in the actual POC document):

  • Successful base installation of the product in Akureyri Site
  • Alerts and call home functionality
  • Initial setup of View box and views to publish CIFS share to the Veeam repository server
  • NFS share creation and connection to my ESXi hosts in Akureyri site and creation of NFS storage for archiving

Functionality and performance requirement points (again, the points were more detailed in the POC document):

  • Maintain data availability throughout a simulated hardware failure (node reboot, drive failure etc.)
  • Successfully function as a Veeam repository target for the POC timeframe.
  • Successfully perform Instant Recovery with Veeam at reasonable performance.
  • Stable user experience throughout the POC timeframe
  • Achieve a dedup ratio of 1:5 at the end of POC timeframe
  • Successful support case generation and acceptable response time from Cohesity support.

I got the box in mid-December 2016, and with help from Frank Brix I installed and set up the solution remotely in the Akureyri site. After some initial testing I connected Veeam to the box, and everything was up and running the same day. As the holiday season had started I was unable to begin the POC work right away, but at the end of the 30 days in early January every point in the POC document was fulfilled except the 1:5 dedup ratio. That was because only 3 weekly copy job runs had completed in the timeframe, but as my dedup ratio was already almost 1:3, I concluded that the theory worked and that I would reach 1:5 after another 2-3 weeks.

We decided that the POC was a success and went ahead with the purchase. After a few weeks, as projected, I reached the 1:5 dedup ratio defined in the POC document, so my CTO got his cost-per-GB projection confirmed and we went on with our daily business.

Now, a few months later, I’m really happy with the product. I have opened a few cases for minor issues with Cohesity support, and always got great response times and help. The dedup ratio is on the rise, and in the 4-node, 2U box I have more than 400TB of logical data stored at this time, and I expect this number to double in the next 4-6 months without having to purchase another node. Nodes can be added later when needed, and the great thing about expansion is that I have a linear path for storage, performance and cost.
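To show why the dedup ratio matters so much for the cost-per-GB projection, here is the arithmetic as a small sketch. The function names are mine, and the ratios below are the POC milestones (1:3 and 1:5), not the current ratio of the box, which I have not stated in this post.

```python
def physical_tb(logical_tb, dedup_ratio):
    """Physical capacity consumed by a given amount of logical backup data.
    A ratio written as 1:5 is passed as dedup_ratio=5."""
    return logical_tb / dedup_ratio

def cost_per_logical_gb(total_cost, usable_tb, dedup_ratio):
    """Effective cost per logical GB once dedup is taken into account
    (1 TB taken as 1000 GB; total_cost in any currency)."""
    return total_cost / (usable_tb * dedup_ratio * 1000)

at_target = physical_tb(400, 5)   # 400TB logical at 1:5 needs 80TB physical
at_poc_end = physical_tb(400, 3)  # the same data at 1:3 needs about 133TB
```

Going from 1:3 to 1:5 cuts the physical capacity needed for the same logical data by 40%, which is why that one POC point carried so much weight.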

Last week Paul and I wrote a case study document about the project, which can be downloaded from the Cohesity website.

Share or comment if you find this article helpful

Cheers,

Mateinn

Samsung 950 Pro M.2 512GB vs Mushkin Scorpion Deluxe 480GB

I got myself a new “disk drive” today for my home workstation, if you can call an M.2 NVMe chip a disk drive!

I wanted a new drive for my OS and programs, and I also use my workstation for GIS work, where I have many large files open in my GIS application. I also do graphics work on this PC using Lightroom, which uses a huge number of index files for my photo collection.
What I got was a brand-new Samsung V-NAND 512GB 950 Pro M.2 NVM Express drive, and I decided to test it against my older Mushkin Scorpion Deluxe 480GB PCIe drive.
I wanted to find out how the new drive would hold up against the older Mushkin that I have been using for several years now. I used FIO to run several tests, with 4K and 256k block sizes and random and sequential reads and writes, to get some different views of the drives’ performance.

Using a 4K block size during a 60-second test period, I get premium performance from the Samsung drive, and pretty consistent performance throughout the different tests. The Mushkin delivers great performance as well during sequential reads and writes, but suffers when doing random writes and reads.

Using a 256k block size, the Mushkin shines! The PCIe-based drive delivers more than twice the performance of the Samsung M.2 drive. As in the first test, though, the Samsung is more consistent.

My initial thought was to stop using the old Mushkin drive in my workstation and move it over to my VMware server as PernixData FVP cache, but now that I see the throughput difference for large IOs on the Mushkin drive, I believe I’ll use that one for my GIS application files, and the new Samsung drive for OS, programs and Lightroom catalog files. I guess my VMware server has to stick with a normal SATA SSD for now…

FreeNAS 10.1 as a VM in vSphere 6.0

I wanted to write a blog post about my FreeNAS installation. I’ve been testing out FreeNAS 9.3 and find it to be well suited for my home lab.

Having read through several posts saying not to run FreeNAS as a VM, other blogs saying “you can, but shouldn’t”, and some saying “yes you can, just make sure…”, I wanted to try it out for myself and find out if I could make a stable setup in my home lab.

To start with, I have been running FreeNAS 9.3 on one of my physical hosts, booting from a USB stick. The setup was pretty stable, but I believe the cheap USB stick I used for boot died, as after a reboot yesterday the BIOS did not find the USB drive to boot from.

That gave me a reason to make some changes. I wanted to test FreeNAS 10.1, which is available as a nightly build. I also didn’t want to dedicate one of my physical hosts to FreeNAS, as the host is a Dell R710 server with dual X5675 Xeon 3GHz 6-core CPUs, 288GB of RAM, 6 x 2TB disks and a Dell H700 controller. The machine was total overkill just to run FreeNAS for my other Dell R710 VMware host with the same specs.

So, the main issue I have with running FreeNAS, which uses ZFS, is that my server has the Dell H700 controller (LSI 2108 based), and that controller is unable to work in IT mode. (IT mode is a “non-RAID” mode, where each HDD is visible to the OS without creating disk volumes on the RAID controller.)

ZFS wants to see the raw disks without a RAID controller in between. Some controllers can be flashed with IT-mode firmware, or support this natively, like the Dell H200 controller. I didn’t want to experiment with cross-flashing my controller with original LSI firmware, and I’m not sure that would enable IT mode on the LSI 2108 chip anyway.

I decided to go ahead and find a solution I could use. The recommended approach I found was to present the SAS controller to the FreeNAS VM via PCI pass-through. But as I wanted to use all 6 of my 2TB drives for the ZFS system, and my R710 server has only 6 3.5″ HDD bays, I would have to find a way to create a datastore for the FreeNAS VM configuration files and boot disk. This option turned out to be a no-go for me, as I only have one controller in my server and I can’t pass it through to the VM and keep a datastore for the FreeNAS VM on it at the same time. I carried on, though, and went with the option of using RDM to present the disks to the FreeNAS VM.

What I did was carve a 50GB volume out of one of the 2TB disks when I created the virtual disk in the H700 controller.


I then created a secondary volume from the remaining space on the disk. Note that you have to create the other virtual disks with the same size as this one, as FreeNAS won’t be able to put different-sized disks into the same ZFS RAID volume. For each virtual disk, I set the read policy to “No Read Ahead” and the write policy to “Write Through”.

I still boot ESXi from a USB stick, so this 50GB volume is free for running the FreeNAS VM and hosting the ESXi system logs.

The next step is to install ESXi 6.0 on the new USB stick. That process is straightforward, and I don’t want to spend this post on the whole ESXi installation, but I wanted to share a screenshot showing how ESXi sees the volumes and the USB stick.

When the ESXi installation is finished and the basic settings have been set for the host, I create a datastore on the 50GB volume and name it “FreeNAS-Boot”. I install FreeNAS 10.1 on this datastore like a normal VM.

I give this VM 2 vCPUs, 64GB of RAM and a 20GB HDD. I choose to reserve all guest memory for this VM, as the datastore does not have space to hold the VM’s memory swap file.

At this stage, the FreeNAS VM has only the boot disk, and I install the system on this device.

When the initial installation is done and the network settings, DNS and such have been set, I shut down the FreeNAS VM and add the storage network adapters.

For the storage network, I have prepared 4 iSCSI-enabled VMkernel ports on the host.

Each switch has a VMkernel port and a standard VM port group.

In my hosts I have 4 x 1Gbit onboard Broadcom QLogic 5709 based NICs for iSCSI, and a PCI Express dual-port Intel I350 based adapter for management and VM traffic.

Now I add 4 network adapters to the VM, each on a separate iSCSI port group.

And on the FreeNAS side I set the IP address and subnet mask accordingly.

Now I have the network connections set up, and the next step is to get the virtual disks from the H700 controller up to the FreeNAS VM. Before I start, I shut down the FreeNAS VM.

There are 3 ways you can go about this:

  1. Create a datastore on each of the volumes, and present a virtual HDD from each to the FreeNAS VM
  2. Raw device map each volume up to the FreeNAS VM
  3. Use DirectPath I/O and present the H700 controller up to the FreeNAS VM

I would like to go with option 3, but as I have the 50GB datastore on the controller, it’s not free for DirectPath. To see what this option looks like, I went to the DirectPath I/O settings.

When I select OK, I get a warning:

And even though I select “Yes” and reboot, the setting reverts to not having the H700 controller in pass-through mode. If I want to experiment with this option, I need a separate controller for the FreeNAS VM, but my R710 server has a fixed backplane for 6 drives. I would have to add a SAS controller with external connections and use external SATA or SAS disks.

The next option I wanted to try was to add the volumes to the VM as raw device mapped disks. This is disabled by default for local disks.

VMware’s KB 1017530 shows how you prepare the disks so they can be added as RDM disks using vmkfstools.

In my case this was the list of devices and the commands to create the correct .vmdk pointer files.

The command “ls -l /vmfs/devices/disks/” gave me a list of devices:

That resulted in these commands to create the .vmdk pointer files:
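As an illustration of the pattern from the KB article, the sketch below builds one vmkfstools command per device. The device identifiers, the “FreeNAS” folder and the rdmN.vmdk file names are hypothetical placeholders, not my actual values; “FreeNAS-Boot” is the datastore created earlier. In vmkfstools, -z creates a physical compatibility mode RDM pointer and -r a virtual compatibility mode one; which of the two fits is your call.

```python
def rdm_commands(devices, datastore, folder, physical=True):
    """Build vmkfstools commands that create RDM pointer .vmdk files.

    devices:  device names as listed under /vmfs/devices/disks/
    physical: True -> -z (physical compatibility), False -> -r (virtual)
    """
    flag = "-z" if physical else "-r"
    return [
        f"vmkfstools {flag} /vmfs/devices/disks/{device} "
        f"/vmfs/volumes/{datastore}/{folder}/rdm{index}.vmdk"
        for index, device in enumerate(devices, start=1)
    ]

# Hypothetical device names, for illustration only.
example_devices = ["naa.EXAMPLE0000000001", "naa.EXAMPLE0000000002"]

for command in rdm_commands(example_devices, "FreeNAS-Boot", "FreeNAS"):
    print(command)
```

The generated lines are meant to be run in an SSH session on the ESXi host, after replacing the placeholders with the real device names from the listing.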

Now I could add the disks to the FreeNAS VM as “Existing Hard Disk”.

Now a big portion of my coworkers will yell at me, “you told us never to use RDMs!” And from what I read on the FreeNAS forum, I would be hanged for this. But remember, this is a lab installation in my home, and for the most part I want to try things here and see if they work or give me trouble…

Anyhow, the RDMs are set up and FreeNAS now has 6 drives to play with.

I decided to create a RAID 10 volume. The FreeNAS GUI is not so clear about how to create a RAID 10, but the process is to select “Mirror” and then select 3 stripes of mirrors.

This gives me the best performance and 5.3TB disk space for my VMs.
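As a rough sanity check on that number: a stripe of 3 two-way mirrors gives the capacity of 3 disks. The sketch below assumes each “2TB” disk holds 2 x 10^12 bytes, that every virtual disk is 50 GiB smaller than the full disk (because all of them had to match the one that shares a disk with the boot volume), and that the GUI reports binary units. It ignores ZFS overhead, so treat it as an estimate only.

```python
TIB = 2 ** 40
GIB = 2 ** 30

def striped_mirror_tib(disks, disk_bytes, reserved_bytes=0, mirror_width=2):
    """Usable capacity of a stripe of mirrors (RAID 10 style), in TiB."""
    mirror_sets = disks // mirror_width
    return mirror_sets * (disk_bytes - reserved_bytes) / TIB

# 6 disks of 2 x 10^12 bytes, each virtual disk shrunk by 50 GiB.
estimate = striped_mirror_tib(6, 2 * 10**12, reserved_bytes=50 * GIB)
print(f"{estimate:.2f} TiB")
```

The estimate lands close to the 5.3TB shown for the volume.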

The next thing is to set up iSCSI and have it listen on the 4 interfaces I have set up. I’ll write a part 2 of this entry for that at a later time.

Hope you find this post useful and if so, share.

VSAN 6.0 in a nested ESXi 6.0 lab

I wanted to test VSAN in my lab without having to go out and buy SSDs or invest in more hardware for my lab.

The obvious path was to spin up several ESXi VMs, configure the networking, and mark a normal HDD-based volume as SSD. To make things easy for you, I took screenshots and wrote down every setting and step in this blog post.

To prepare networking for the ESXi VMs you have to set Promiscuous mode to “Accept” in the security settings of the port group you place your ESXi VMs on. You should not do this on your whole vSwitch in a production installation. In my lab I created a “NestedESXi” port group, where I enabled promiscuous mode by overriding the vSwitch default setting of “Reject”. A VMware KB article explains this a bit more.

This allows packets to travel from the physical NIC of your ESXi host, up to the virtual NIC of your virtual ESXi host, and on to the virtual NICs of its VMs. Think Inception, plus communication between each level.

The next thing to do is to create the ESXi VMs. Select “Other” in “Guest OS Family”, and select “VMware ESXi 6.x” under “Guest OS Version”.

This is pretty straightforward, but there is one setting to note in the “Customize hardware” tab: the option “Expose hardware assisted virtualization to the guest OS” under the CPU section.

The other settings on the VM are 2 CPUs and 16GB of RAM. The VSAN 6.0 memory requirements state that each host should have a minimum of 32GB of memory to accommodate the maximum of 5 disk groups and 7 capacity devices per disk group, but in this lab test, where I will only present 1 SSD and 1 HDD per host to the VSAN cluster, 16GB for the ESXi VM should work fine.

For the disks, I add one 4GB disk for the ESXi installation, one 50GB disk to act as a simulated SSD, and one 150GB disk to act as a capacity device.

Also in this step, I select the “NestedESXi” network port group I prepared earlier.

I created 3 identical ESXi VMs like this, and one more that had no extra hard disks, to test remote storage access to the VSAN cluster. VSAN requires a minimum of 3 hosts, each with at least 1 flash device and 1 spinning disk.
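The minimum requirement stated above is easy to express as a check over a planned lab layout. This is just a sketch of that rule; the Host class and the host names are made up for the example and nothing here talks to vCenter.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    flash_devices: int
    capacity_disks: int

def contributes_storage(host):
    """A host can contribute to VSAN with at least 1 flash device and 1 capacity disk."""
    return host.flash_devices >= 1 and host.capacity_disks >= 1

def meets_vsan_minimum(hosts):
    """At least 3 hosts must contribute storage; extra hosts without disks are fine."""
    return sum(contributes_storage(h) for h in hosts) >= 3

# The lab from this post: 3 hosts with 1 "flash" + 1 capacity disk, 1 host without disks.
lab = [
    Host("nested-esxi-1", 1, 1),
    Host("nested-esxi-2", 1, 1),
    Host("nested-esxi-3", 1, 1),
    Host("nested-esxi-4", 0, 0),
]
```

Here meets_vsan_minimum(lab) returns True, while removing any one of the first three hosts makes it return False.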

Next thing is to add the VM’s to vCenter as ESXi hosts.

I had earlier assigned IP addresses and created DNS records for the ESXi vm’s and I added the new hosts into a folder just for housekeeping reasons.

Before I create the VSAN cluster, I have to prepare the 50GB hard disks and mark them as flash disks. In vSphere 6.0 this is really simple: just select the disk device and click the “F” button.

This gives me a confirmation dialog to mark the selected disk as a flash disk, and there you hit “Yes”.

This sets the drive type to Flash.

I also have to prepare a VMkernel port for VSAN traffic. In this lab I’ll use the default vmk0 adapter for management, vMotion and VSAN traffic. In production you should separate these, though.

I do this for the other 2 ESXi VM’s and now everything is set up to create a VSAN Cluster.

To enable VSAN, select the “Turn on” checkbox under “Virtual SAN”

And then add your nested ESXi Hosts to the cluster.

After a minute or two, all the disks for the nested ESXi hosts automatically joined the VSAN and created a vsanDatastore.

And that’s it! – Now I have a VSAN datastore in my nested ESXi cluster.

As this is nested, using “fake” flash devices, I don’t expect to get much performance out of this, but for testing the process of creating a VSAN cluster this setup works great.

I hope you like this post. Send me your thoughts in the comments or on Twitter.

StarWind V2V Converter

As part of my job, I regularly get requests to import virtual machines from the customer site into the infrastructure of the organization I work for.

The challenge

As different virtualization platforms use different file formats for their virtual disk images, I need an easy and reliable way to convert between the most commonly used disk formats.

It is common that vendor-specific tools can only convert disk images to their own supported format, but not the other way around, so you end up needing a whole set of tools for this task.

The solution

I got my hands on StarWind V2V Converter a few weeks ago, and I wanted to share my experience with the product, as it solves this problem in a few extremely easy steps.

StarWind Software is the maker of StarWind Virtual SAN – a virtual shared storage solution (iSCSI, SMB3 and NFS). It provides fault-tolerant Virtual SAN for a fraction of the cost of buying conventional SAN storage solutions. The StarWind Virtual SAN starts from two hosts and has literally infinite scale-out capabilities. This software is extremely easy to use and recently I tested out the solution and wrote a blog post about the software that can be found here.

But to carry on with the V2V Converter: it is a cool thing that StarWind Software gives away to the community.

The product is FREE! It can be downloaded from the StarWind Software website.

I got a brand new version of the converter program from StarWind and the main list of features is:

  • Added QCOW format (KVM)
  • Extended VMDK format support (a “streamOptimized” option was added):
    Monolithic sparse format compressed for streaming. The stream-optimized format does not support random reads or writes.
    Sparse disks employ the copy-on-write (COW) mechanism, in which the virtual disk contains no data in a given place until it is copied there by a write. This optimization saves storage space.

I’m really excited about the KVM format conversion option, as KVM has a growing footprint in the service provider space.

An earlier version, released in February 2015, had these product highlights:

  • Added support for MS VHDX container format. It requires running on Windows 8/2012 or a higher version of Windows.
  • Windows Repair Mode may be activated for converted image, allowing virtual machines to adapt to hardware environment of a new hypervisor automatically.
  • Extended the command-line utility to support VHDX format and repair mode option.
  • Introduced new style of GUI in V2V Converter Wizard.
  • Added help file to installation.

Key features

What I think is useful, is the fact that StarWind V2V converter can boot the newly-converted VHDX VM in Windows Repair Mode, automatically adapting it to the new environment. That’s something other conversion software, like Microsoft Virtual Machine Converter or VMware vCenter converter standalone, are unable to do. The feature will help you deal with driver issues, boot problems and the like.

Another key feature is the command line tool, which can save you a lot of time when you have to convert a lot of images and don’t want to spend your time clicking your way through the GUI over and over again.

Also, the possibility to select between thick and thin provisioned formats can save you both time and space.

Supported formats

The V2V converter can open disk image files that are

  • VMware’s .VMDK
  • Microsoft HyperV .VHD
  • Microsoft HyperV .VHDX
  • QCOW format (KVM)
  • As well as StarWind’s native .IMG format that is used by StarWind Virtual SAN software.

When you choose the output format, you have additional options for each supported file format.

  • Thin and thick provisioned VMware .VMDK
  • Thin and thick provisioned HyperV .VHD
  • New HyperV .VHDX
  • QCOW format (KVM)
  • StarWind Raw Image .IMG

You also get the option to have the converted VM boot up in Windows Repair Mode if you need to, due to a different hardware configuration or compatibility problems.

Installation

Installation is straightforward: you just run the installer like any other Windows installer, accept the license agreement, select the destination folder and so forth, so you can’t go wrong here.

Conversion process

When you open the program, you get a window where you have to point to the source image and when you point to the file, the converter detects the source format

Next you have to select the output format.

Next I get the option to activate Windows Repair Mode.

Next I select the destination file location and name.

And then the conversion process runs, giving you a progress bar as well as information about the files selected.

In this case, I converted a 40GB VMware .VMDK file, containing a fresh installation of Windows 2012R2, to a 14GB thin provisioned HyperV .VHDX file. The process took only 1 minute as I ran the process on my workstation’s PCI Flash card (Mushkin Scorpion Deluxe 480GB). Your conversion speed may vary, as the speed is mainly limited by the speed of your source and destination disks.

When you convert to the VMware .VMDK format, you get the option to choose between IDE or SCSI Virtual disk Type.

One thing I want to point out: when you are converting VM’s like this, you have to look at the source configuration (the new .VHDX format, EFI or classic BIOS firmware settings) to make sure the VM will boot from the converted file. This is not directly related to the converter program, but you might start to pull your hair out if your source VM was using EFI firmware and you didn’t set the destination VM to use EFI as well.

Bulk conversion

If you have to convert more than a few VM’s at a time, you can use the command line tool and script the process. It’s a simple process as well.

V2V.exe if=<InputFileName> of=<OutputFileName> ot=<OutputType> [vmdktype=<VMDKAdapterType>] [activate_rm]

Details for the parameters are included in the program’s help file.
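
To give an idea of what scripting this could look like, here is a minimal Python sketch that only builds the command lines for a batch of images, following the syntax shown above. The file paths are made up, and the value passed for ot= is a placeholder: I have not listed the valid output type names here, so take those from the program’s help file.

```python
from pathlib import PureWindowsPath


def build_v2v_command(source, output_type, output_ext,
                      vmdk_type=None, repair_mode=False, exe="V2V.exe"):
    """Build one V2V.exe command line as a list of arguments.

    output_type and vmdk_type are passed through untouched; valid values
    are documented in the program's help file, not here.
    """
    src = PureWindowsPath(source)
    dst = src.with_suffix(output_ext)  # same folder and name, new extension
    cmd = [exe, f"if={src}", f"of={dst}", f"ot={output_type}"]
    if vmdk_type is not None:
        cmd.append(f"vmdktype={vmdk_type}")
    if repair_mode:
        cmd.append("activate_rm")
    return cmd


def build_batch(sources, output_type, output_ext, **options):
    """Build the command lines for a whole list of source images."""
    return [build_v2v_command(s, output_type, output_ext, **options)
            for s in sources]


# Hypothetical source images and a placeholder output type.
images = [r"D:\images\web01.vmdk", r"D:\images\db01.vmdk"]
for command in build_batch(images, "OUTPUT_TYPE_FROM_HELP", ".vhdx",
                           repair_mode=True):
    print(" ".join(command))
```

To actually run the conversions on a Windows machine, each list could be handed to subprocess.run(command, check=True) instead of being printed. Printing first is a cheap way to review the batch before anything is written to disk.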

Conclusion

Overall my experience with StarWind V2V Converter has been great. The program takes you through the conversion process with its wizard, and it does what it does in a really simple and fast way. The upcoming KVM support is also great, and I’ll update this blog post when I get the new version and have tested the new options.

How to run nested Hyper-V in vSphere 6.0

A few days ago I was testing virtual-to-virtual converter software from StarWind, and as a by-product of an upcoming blog post on the matter, I took some screenshots and wrote down the settings that are relevant to running nested Hyper-V in an ESXi environment.

The problem

When installing the Hyper-V role on a Windows 2012R2 machine that is itself a virtual machine, you get an error message saying “Hyper-V cannot be installed: A hypervisor is already running”. To “fool” the Windows OS into believing it is running on a native x86 machine, you have to set several options for the VM.

The steps when installing Hyper-V on Windows 2012R2

You select the “Hyper-V” role and hit “Next”

You automatically get the option to install the management tools as well, and in most cases you would want that, so you click “Add Features” to continue.

As soon as you hit the “Add Features” button, you get a validation result popup stating that you can’t install Hyper-V on this Windows machine.

Sure, you have a hypervisor running on your ESXi host, but you want to have a hypervisor running on this virtual machine as well.

The solution

First you have to shut down the VM and remove it from vCenter inventory

Then enable SSH on the ESXi host if you have not already

Then you edit the .vmx file for the VM and add 2 lines at the bottom of the file.

vhv.enable = "TRUE"
hypervisor.cpuid.v0 = "FALSE"
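
If you prefer not to edit the file by hand, the same change can be sketched in a few lines of Python. This is only an illustration that works on the text of a .vmx file: the sample content below is made up, and you would still need to copy the file off the datastore and back yourself (and keep a backup). It writes the values with plain straight quotes, which is what the .vmx format expects.

```python
# The two settings from this post that let Hyper-V install in the VM.
NESTED_HV_SETTINGS = {
    "vhv.enable": "TRUE",
    "hypervisor.cpuid.v0": "FALSE",
}


def add_nested_hv_settings(vmx_text):
    """Return vmx_text with the nested Hyper-V settings set.

    Existing lines for the same keys are dropped first, so running the
    function twice gives the same result as running it once.
    """
    wanted = {key.lower() for key in NESTED_HV_SETTINGS}
    kept = [line for line in vmx_text.splitlines()
            if line.split("=", 1)[0].strip().lower() not in wanted]
    kept += [f'{key} = "{value}"' for key, value in NESTED_HV_SETTINGS.items()]
    return "\n".join(kept) + "\n"


# Made-up sample content standing in for a real .vmx file.
sample = 'displayName = "hv01"\nvhv.enable = "FALSE"\n'
print(add_nested_hv_settings(sample))
```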

Then, using the datastore browser, find the same .vmx file, right-click it and select “Add to Inventory”.

In the web client, edit the settings for the VM and expand the CPU settings. There, under the “CPU/MMU virtualization” (*) section, select “hardware CPU and MMU”.

Also notice that the checkbox is set for “hardware virtualization”; that is due to the vhv.enable = "TRUE" setting in the .vmx file.

When finished, boot up the VM and go back to the Add roles and Features Wizard on the VM

Now the wizard runs and installs Hyper-V on a nested VM.

When you have this set up, and you have created a Hyper-V virtual machine, and you need network connectivity to the VM, you need to set your ESXi vSwitch security policy to “Promiscuous mode: Accept”. – This is the same setting that you have to do when you run nested ESXi and you have vm’s that need network connectivity.

When this was ready, I tried to run the Hyper-V integration services tools on the nested hyper-v VM, (this sounds like something from inception…)

I got an error: “The Hyper-V Integration service can only be installed inside of a virtual machine running under Hyper-V”.

I did not find a workaround for this, but if you have one, please let me know in the comments or by sending me an e-mail and I’ll update the post.

Conclusion

You can run a nested Hyper-V installation under ESXi if you need to do some basic testing. In my case I was testing virtual-to-virtual converter software from StarWind Software, and needed to create a Hyper-V virtual machine, convert it to a .vmdk file and boot it up in my vSphere based lab (and vice versa). But don’t expect good performance from the nested VM 😉

Storage in the home lab.

Home Labs in general

When asking my colleagues what to run as a storage platform in my home-lab, I got an honest question from a fellow blogger and vExpert Rasmus Haslund (@haslund)

What are your requirements, challenges and constraints?
My answer: “Well, I want all the features and best performance, but I have limited or no budget!”

This could easily be applied to your production setup where you have the challenge of providing a stable service level, while having limited budget on external storage. So if you work for a small/medium company looking for a storage solution for your virtual workloads, read on and hopefully you can apply the solution described in this blogpost to your installation.

The challenge

As a vExpert, blogger and enthusiast for all sorts of storage and virtualization solutions, I find it necessary to have a lab at home to do tests and evaluate different solutions. I also run several vm’s for my home network that I have to take care of and have to answer to my son and wife if I screw up!

For quite some time I had limited flexibility in the lab, and to maintain some level of service for my home network I had to find a better solution.

My son has a Minecraft server running that needs to be up, especially in the evenings, and my wife’s idea of an SLA for her e-mail and picture library is that 100% uptime is “normal”!! So it’s tough ground to maintain while also having flexibility when it comes to testing and running some ad-hoc workloads.

In my basement there is a storage space, and after I got a networking cable down there from my apartment on the 2nd floor, I could start up more hardware without my family being disturbed by noise and cables running all over my desk. Down there I can maintain a stable setup for my home network and have some extra hardware to play around with when I need to try out something.

When I got the chance to repurpose some servers from work, I decided to redesign the home lab. It had been running on a single ESXi white-box host with 1 x Intel i7 3770K CPU and 32GB RAM, and surely could benefit from more CPU and RAM resources.

So I set out some requirements and figured out the challenges.

The goals

  1. Maintain reasonable level of uptime and performance of my home network.
  2. Have available disk space and resources to set up a nested ESXi environment for testing different setups and solutions without exposing the home network to risk.
  3. Have a storage solution to be accessible by my 2 ESXi hosts.
  4. Minimizing heat generation and electricity cost for running the home network, but still have the ability to spin up more workloads for testing in the lab when needed.

The hardware

The servers I got for the lab are pretty massive!

3 x Dell PowerEdge R710, each having dual X5675 3.0GHz CPU’s and 288GB of RAM. Each server has 4 x 1Gbit network cards onboard, and 1 x dual port 1Gbit NIC. Each server has the Dell H700 SAS controller (LSI based controller).

The solution

When looking for a storage solution, I decided to use one of the R710 machines as an iSCSI target device, as it had 6 x 3.5” drive bays. There I could place the 6 x 2TB SATA drives I previously had in my white-box server. This R710 server would become the shared storage for the 2 ESXi hosts, as well as being a proxy server for my Veeam Backup installation, a Minecraft server for my son and a PLEX media server for my home entertainment system. (All those workloads had been running on my wife’s desktop for some time, much to her enjoyment, as you can believe.) On one of the ESXi hosts I would run my home network workloads, with the option to turn on ESXi host 2 for lab testing.

I looked at several options, both Linux and Windows based, virtual and non-virtual, that would enable me to run not only the NAS iSCSI workload, but also the Veeam proxy, PLEX and Minecraft services. The setup I found most appealing for testing the different RAID levels was a non-virtual, Windows based Starwind Virtual SAN solution.

The main reason for running the workload in a non-virtualized Windows installation was that this enabled me to test different IO and cache policies on the physical volume used as an iSCSI target. On native Windows I could use the LSI MegaRaid Storage Manager to create and destroy volumes without having to reboot the server.

At a later stage I might run ESXi on this host, reducing the footprint down to 2 physical R710 machines using Starwind 2 node cluster setup.

Features of the Starwind SAN solution that I found interesting

Main Product page and Free Product Page

There are several features in the Starwind software that I found extremely cool. Also the simple setup and configuration process of the solution is truly remarkable. It makes testing the different configurations fast and easy.

Here are a few features that got my attention while testing the software, and that other users could benefit from in both lab testing and production workloads.

  • Use of a defined amount of RAM as cache for each defined iSCSI device.
    This allows me to define the amount of RAM assigned to the NAS storage role, keeping RAM available for other workloads on the server. It also allows me to define different devices and iSCSI targets with different amounts of RAM depending on workload type. Keep in mind that if you assign many GB’s of RAM for cache in a production setup, you need a UPS to be able to commit all cached writes to disk!
  • Create a RAM based disk device.
    Using this super-fast iSCSI target is great for testing and deploying temporary workloads in the lab. I plan to experiment with this feature more, but keep in mind that this is in memory, so data is not written to any persistent storage! Non-persistent VDI disks (linked clones) come to mind, or classroom VM’s could use this feature to give a great end-user experience.
  • Log-Structured File system while thin-provisioning the storage device.
    This feature turns the otherwise “all writes are random” situation of mixed virtual workloads into sequential writes on the underlying storage. A whitepaper (https://www.starwindsoftware.com/whitepapers/eliminating-the-io-blender-by-jon-toigo.pdf) by Jon Toigo explains this in great detail, but this feature boosts the benefits of thin-provisioning to a whole new level!
  • Publish a physical disk directly as an iSCSI target.
    This feature caught my eye, and I still have to investigate the pros and cons in this regard.

The Network design

To give out a clear picture of my setup, I made the following diagrams.

Layer 1 Diagram

Picture 1: Cabling layout

  • 2 x 1Gbit network interfaces are connected from each ESXi host to the iSCSI NAS host.
  • 2 x 1Gbit network interfaces are used for vMotion and replication.

Layer 2-3 Diagram

Picture 2: Layer 2-3 diagram

The diagram shows the networking layout of the 2 iSCSI networks. Different subnets are used for each physical adapter assigned to iSCSI to provide active-active paths to the iSCSI target machine.
Path selection Policy is set to “Round Robin” for link load balancing
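
When setting up the addressing for a layout like this, it is easy to put both iSCSI adapters on the same subnet by accident. A small Python check along these lines can catch that on paper before you touch the hosts. The addresses are made up for illustration and are not the ones from my diagram.

```python
import ipaddress


def on_distinct_subnets(interfaces):
    """True if every interface ('address/prefix') sits in its own subnet."""
    networks = [ipaddress.ip_interface(i).network for i in interfaces]
    return len(set(networks)) == len(networks)


# Hypothetical addressing for one ESXi host.
iscsi_adapters = ["10.10.1.11/24", "10.10.2.11/24"]    # one subnet per path
vmotion_adapters = ["10.10.3.11/24", "10.10.3.12/24"]  # shared subnet

print(on_distinct_subnets(iscsi_adapters))
print(on_distinct_subnets(vmotion_adapters))
```

For the iSCSI adapters the check should pass, while for a vMotion pair sharing a single subnet it is expected to fail, which matches the design described here.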

The vMotion network between the hosts is bound to 2 physical network adapters on a single subnet.

Storage design

For testing purposes, I decided to install Windows 2012 directly on a 2 disk mirror, and have the 4 extra drive slots to test different RAID levels and drive types. This allowed me to run the LSI MegaRaid Manager software and set different settings on the volumes and save me the reboot time when changing raid levels or drive types.

I had 4 x 2TB 7.2K SATA drives and 4 x 600GB 15K SAS drives to test.

Different Raid Levels and drive types.

First I tested different RAID levels on both types of drives, and ran FIO tests locally on the volume created.

Different Raid Levels

It caught my eye that when using the SATA drives, performance gain from Raid 10 to Raid 0 was minimal, while the SAS drives had huge performance gain while running Raid 0 vs Raid 10. Later I plan to do a 6 x 2TB SATA drive Raid 10, and that’s most likely the configuration I’ll end up using for my lab setup.

For the remainder of the performance tests, I ran the Raid 10 setup on the 4 x 15K drives. The main goal was to find out if the different deployment options in the Starwind SAN software made any measurable difference, and also to see how it performed against the native Windows 2012R2 iSCSI target.

CrystalDiskMark tests

First test was done by using CrystalDiskMark measuring MB/Sec

CrystalDiskMark MB/sec

CrystalDiskMark IOPS

The tests show that in any configuration, the Starwind SAN software outperforms the Windows 2012R2 Built in iSCSI target solution by far. The only tests where the Windows iSCSI target was close was while testing sequential reads or writes, and I believe the limiting factor was the single threaded process and use of one network connection between the 2 physical machines.

All the random read and write tests showed huge benefits when using the Starwind solution. CrystalDiskMark is a simple tool for testing disk performance, and it does not allow you to change from the fixed 4K block size or go beyond a queue depth of 32.
The H700 controller on the iSCSI target machine has a queue depth of 975, and to utilize the 2 x 1Gbit network connections I moved from CrystalDiskMark to a more customizable test tool, FIO.

To create a baseline and to get the maximum performance without the limitation of my 2 x 1Gbit network connections between hosts, I ran all tests both locally on the iSCSI target machine and on a remote VM. To test the performance running locally, I mapped a set of iSCSI targets as drives on the Windows iSCSI target machine, and an identical set of targets to my ESXi host.

The FIO test setup.

Each Starwind iSCSI target configured with 10GB Memory Cache

The VM runs on an ESXi 6.0 host, connected by 2 x 1Gbit network cards, each configured on a separate subnet, with the Round Robin PSP selected.

FIO Windows IO engine settings:
Random Read/Write: 33/66
Block Size: 64K
Queue Depth: 975
4 x 15GB Jobs, 4 files each
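
For anyone who wants to repeat the test, the settings above could be written out as a FIO job file roughly like this. This is a sketch, not the exact job file I ran: it assumes the windowsaio engine, reads “33/66” as a random mix with 33% reads (rwmixread=33), and uses a made-up job name. It also sets no target directory, so add one for the volume you want to test.

```python
# One possible mapping of the settings above onto FIO job options.
LAB_OPTIONS = {
    "ioengine": "windowsaio",  # assumption: the Windows async I/O engine
    "rw": "randrw",            # mixed random reads and writes
    "rwmixread": 33,           # assumption: "33/66" means 33% reads
    "bs": "64k",               # block size
    "iodepth": 975,            # queue depth
    "numjobs": 4,              # 4 jobs ...
    "size": "15g",             # ... of 15GB each
    "nrfiles": 4,              # 4 files per job
}


def render_fio_job(name, options):
    """Render a single-section FIO job file as text."""
    lines = [f"[{name}]"]
    lines += [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


# "lab-mixed" is a hypothetical job name.
print(render_fio_job("lab-mixed", LAB_OPTIONS))
```

Saving the output to a file and passing that file to fio would then start the run. Check the option names against the FIO documentation for your version before relying on them.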

FIO MB/sec

FIO IOPS

Direct = FIO Run directly on iSCSI target machine disk volume
Flat = Starwind iSCSI Target with Flat provisioned Image file
LSFS = Starwind iSCSI Target with Thin provisioned disk using LSFS
LSFS Dedup = Starwind iSCSI Target with Thin provisioned disk using LSFS and Deduplication enabled
Physical Disk = Starwind iSCSI Target from physical disk

The direct testing showed how much performance I could get from direct disk access. As I ran those tests, I got a clear picture of the different deployment options in the Starwind SAN software and my findings showed that the Thin Provisioned disk utilizing the LSFS was the fastest option.

While testing deduplication, IOPS dropped to some degree compared to the plain LSFS option. I also noticed a somewhat (5-7%) increased CPU load on the iSCSI target machine while I was running the tests. Also keep in mind that each 1TB of deduplicated storage requires 3.5GB of RAM. In my setup this was not an issue, but if you have a limited amount of RAM you should take note of this.
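
To put the RAM figure into numbers, here is a tiny Python helper. It simply multiplies by the 3.5GB per TB mentioned above and assumes the requirement scales linearly with the amount of deduplicated storage, which is my reading of that figure rather than something I measured.

```python
GB_RAM_PER_TB_DEDUP = 3.5  # figure quoted above


def dedup_ram_gb(dedup_tb, gb_per_tb=GB_RAM_PER_TB_DEDUP):
    """Estimate the RAM (in GB) needed for `dedup_tb` TB of deduplicated storage."""
    if dedup_tb < 0:
        raise ValueError("dedup_tb must not be negative")
    return dedup_tb * gb_per_tb


for tb in (1, 2, 3):
    print(f"{tb} TB deduplicated -> {dedup_ram_gb(tb):.1f} GB RAM")
```

On a host with plenty of memory this hardly matters, but on a small box with 16GB or 32GB of RAM a few TB of deduplicated storage would take a noticeable share of it.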

Future plans and a few points.

Later, when I have finished the performance tests, I plan to create a target device, for the system drives for my home network VM’s, using deduplication, and save space there, but I’ll leave that option disabled for the PLEX media library and also the photo library as those media files are unlikely to be good candidates for deduplication.

When rebooting the iSCSI target machine, I noticed that the FLAT file and Physical DISK targets were active soon after boot, but the thin provisioned LSFS and LSFS Dedup targets took some time to become active. After some investigation I saw that the LSFS files were all read through, most likely for file checking and verification. My test targets were all 100GB in size, and they took 5-10 minutes to become active. When weighing FLAT or Physical targets against LSFS, I guess that if you have large targets (3TB in my case, for the PLEX media library) you would prefer the FLAT file option, to have the targets online soon after a reboot.

Conclusion

For a 2-3 hosts setup like mine, or even 1 host installation, it is clearly beneficial to use the Starwind SAN iSCSI software rather than direct disk access or native Windows iSCSI target software.

My findings on different deployment options will hopefully help you decide on what to go with both in your lab or production installations.

A colleague of mine pointed out that my home lab had more performance than many of his clients’ production setups, and told me that if I was happy with the performance of the Starwind SAN software, he could recommend it to his clients for production!