DevOps testing sites

testing
cloud
devops
grid
12factor

(Sean Murray) #1

Hi

In the interests of proper DevOps, I want a testing site that can come up and test (eventually completely) the auto-deployment of my site.

All the way from my deployment server to full operations.
Once all the bits and pieces are installed, configured, and up, then execute a test set of jobs to make sure it's configured properly. So non-trivial jobs, depending on what a site is configured for, starting with trivial hostname-style jobs up to proper MPI-type tests.

I have 2 options for testing a site, as far as I can tell:
1. Configure and then textually verify my config files. :frowning: Not ideal.
2. Bring up an entire parallel site, submit jobs, and check they complete properly.

I want option 2, so that I know my site is good. It would be a small site; the compute nodes are of no consequence, it's the other stuff one is really testing: the configs of the BDII, CE, batch system, 1...n worker nodes, VOBOX, and SE (nominal transient storage).

The technical details of docker/VM/real hardware are, I don't think, relevant. Certificates and DNS would be required for this site.

I would suspect this would appear no different from the current concept of a WLCG Tier-3?

One would then bring it up nightly or bi-nightly to verify the contents of your version-controlled site.
Testing on every version-control update I think is not practical/possible.

This would then mean I am testing everything except the host certificates and the hostnames of the real site.

An alternative idea would be to keep the test site running, and only reinstall it when a change under version control requires it.

Thoughts, comments, welcome.

Cheers
Sean


(Bruce Becker) #2

Hey @SeanMurray_59b6

What you are describing sounds a lot like continuous delivery of infrastructure. Before I get into the details of how I suggest we could implement something like this, let's focus on the main point here: what are you testing? If you can write down a list of things that, when tested, would either pass or fail, then I think we'd know how to build such a service. This list should be something like a spec - à la Test Kitchen or Serverspec maybe. Then you can make assertions about the service(s) and test them one by one.
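To make that concrete, here's a rough sketch of what a couple of such assertions might look like, written with pytest-testinfra (a Python cousin of Serverspec); the service and file names are just guesses at what a typical site runs:

```python
# test_site_spec.py -- sketch of a site "spec", run with pytest + pytest-testinfra.
# Service and file names below are assumptions about a typical site; adjust as needed.

def test_site_bdii_is_up(host):
    bdii = host.service("bdii")
    assert bdii.is_running
    assert bdii.is_enabled

def test_vo_configuration_present(host):
    # Assumed path: wherever your configuration management drops the VO config.
    assert host.file("/etc/vomses").exists

def test_batch_controller_is_up(host):
    # Placeholder service name - depends on which batch system the CE fronts.
    assert host.service("slurmctld").is_running
```

Each test either passes or fails, so the file itself becomes the spec of what "the site works" means.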

In terms of your option 2 (bringing up an entire parallel site and submitting jobs), this is probably the way to go - however, it again depends on what you are testing, for example whether changes to your configuration or deployment code will mean changes to the production site. What you describe sounds like you are thinking about tests that get run before deployment, but I think that use case is so rare that it's almost uninteresting. You should test the impact of changes on current services, so that you can ensure availability. The point is to minimise the negative impact of changes, so that you can write the DevOps code with confidence and not get stuck between a "don't do anything" and a "break everything" situation.

Just thinking about this, I would say "first write the tests" - you can write them as a form of monitoring maybe.

As for the deployment of the thing, you should follow the 12factor.net suggestion of dev/prod parity. The only things that should change between the dev and prod sites are the backing services.
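To make the parity point concrete, here's a tiny sketch (variable and host names made up) of what that looks like in practice: the deployment and test code reads the backing-service endpoints from the environment, so dev and prod run identical code and differ only in those values.

```python
import os

# Sketch: the same deployment/test code runs against dev and prod;
# only the backing-service endpoints change. Names here are made up.
BDII_HOST = os.environ.get("SITE_BDII_HOST", "bdii.test.example.org")
CE_HOST   = os.environ.get("SITE_CE_HOST", "ce.test.example.org")
SE_HOST   = os.environ.get("SITE_SE_HOST", "se.test.example.org")

def backing_services():
    """Return the endpoints this run (dev or prod) should target."""
    return {"bdii": BDII_HOST, "ce": CE_HOST, "se": SE_HOST}
```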

Just as a challenge, could you describe in plain English how you would see this kind of thing being done - i.e. what would happen during a test?


(Sean Murray) #3

Ciao Bruce

Regarding unit testing, ja, of course.

I think I can put it more simply.

Sure, you could test each service separately I suppose, but it would seem simpler and more thorough to simply bring up a whole site: site BDII, CE, SE, cluster, (other stuff).

I want a site not in production that I can test my entire config on.
I can bring it up and pull it down at will.

This would then be brought up after all singular system testing is done.
When a change is made - say reconfiguring SSH, fiddling with iptables, fiddling with routing or who knows what, re-running YAIM, updating the OS - the cycle is:

The transient test site goes down and is rebuilt.
Grid jobs with known results are sent in and hopefully come back with the known results.
The site is validated, and the changes are migrated into the current production cluster.
For some changes that is simply a copy in via the config management system; some would require certain parts to go down, but that would not be automated.
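To be concrete about the automated part, I imagine something like the sketch below (the `rebuild-test-site` and `submit-test-job` commands are hypothetical placeholders for whatever the deployment server and submission tooling actually provide, and the expected outputs are made up):

```python
import subprocess

# Sketch of the transient-test-site cycle described above. The two commands
# invoked here are hypothetical placeholders, not real tools.
KNOWN_JOBS = {
    # job name -> expected stdout; None means "any non-empty output is fine".
    "hostname": None,                  # trivial smoke-test job
    "mpi-ring": "ring test passed\n",  # non-trivial job with a known result (assumed)
}

def rebuild_site():
    """Tear the transient test site down and rebuild it from version control."""
    subprocess.run(["rebuild-test-site"], check=True)

def run_known_jobs():
    failures = []
    for job, expected in KNOWN_JOBS.items():
        result = subprocess.run(
            ["submit-test-job", job], capture_output=True, text=True, check=True
        )
        ok = bool(result.stdout.strip()) if expected is None else result.stdout == expected
        if not ok:
            failures.append(job)
    return failures

if __name__ == "__main__":
    rebuild_site()
    failed = run_known_jobs()
    if failed:
        raise SystemExit(f"site validation failed for jobs: {failed}")
    print("site validated - safe to migrate changes into production")
```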

Does this make it more confusing ?

Adding to this from a brief discussion this morning.

OK, so I do Beaker tests for Puppet to check that things are working and responding on a particular port, not susceptible to a particular CVE, that known queries to services come back with expected results, etc., leading on to the above idea of site testing.

How does one do such testing in the Ansible world?
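(My own guess at the answer, for the record: Molecule seems to play the Beaker role in the Ansible world, with a verifier such as pytest-testinfra doing the actual checks - something like the sketch below, where the port, package, minimum version and LDAP query are all assumptions.)

```python
# molecule/default/tests/test_site.py -- sketch of Beaker-style checks written for
# pytest-testinfra, as typically run by a Molecule verify step.
# Ports, package names, versions and the LDAP query are assumptions.

def test_bdii_port_responding(host):
    # Site BDII assumed to publish over LDAP on port 2170.
    assert host.socket("tcp://0.0.0.0:2170").is_listening

def test_openssl_not_vulnerable_to_known_cve(host):
    # Approximate "not susceptible to CVE-XXXX-YYYY" by requiring a minimum
    # package version (placeholder value; naive string comparison - a real
    # check would parse and compare versions properly).
    pkg = host.package("openssl")
    assert pkg.is_installed
    assert pkg.version >= "1.0.2k"

def test_known_query_returns_expected_result(host):
    # A known query to the site BDII should come back with the expected entry.
    cmd = host.run("ldapsearch -x -H ldap://localhost:2170 -b o=grid -LLL dn")
    assert cmd.rc == 0
    assert "o=grid" in cmd.stdout
```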