Class: Grid5000::Campaign::Engine

Inherits:
Object
Defined in:
lib/grid5000/campaign/engine.rb

Constant Summary

USER_ATTRIBUTES =

User attributes

Usage

class MyEngine < Grid5000::Campaign::Engine
  set :attribute1, value
end

Values of other attributes can be accessed via defaults[attribute]

Semantics

environment

a name or URI of an environment to deploy [default=squeeze-x64-base].

public_key

a URI to the SSH public key to be used (file or http(s) scheme). Inferred from your ~/.ssh directory if not explicitly set [optional].

private_key

path to the private key matching your SSH public key. Inferred from your ~/.ssh directory if not explicitly set [optional].

resources

description of the resources to book in your job [default=nodes=1].

properties

a string of OAR properties (e.g. '-p ...').

walltime

duration in seconds of your job [default=3600].

user

Grid'5000 username [default=ENV].

notifications

array of notification URIs [optional]. Valid URI schemes include: HTTP, MAILTO, XMPP.

site

site id on which to launch the campaign [default=rennes].

no_submit

attempts to reuse an existing running job on the same site, with the same name and the same owner. Will launch a new job if none found [default=false].

no_deploy

attempts to reuse an existing deployment on the same site, on the same nodes, with the same owner. Will launch a new deployment if none found [default=false].

no_install

do not launch the install phase [default=false].

no_execute

do not launch the execute phase [default=false].

no_cleanup

do not automatically delete all your jobs and deployments at the end of your campaign (if everything went well) [default=false].

no_cancel

do not automatically delete all your jobs and deployments if an error occurs [default=false].

name

name of your campaign [default=class.name]

queue

name of the OAR queue to submit to (valid queues: admin, testing, besteffort, g5kschool, default) [default=default]

gateway

the hostname of a gateway to use when issuing SSH, SFTP, and SCP commands [default=none].

logger

an object that behaves like an instance of Ruby's standard Logger class.

submission_timeout

maximum duration (in seconds) to wait for a job to be running [default=5*60].

deployment_timeout

maximum duration (in seconds) to wait for a deployment to be terminated [default=15*60].

deployment_min_threshold

minimum fraction of nodes that must have been correctly deployed for the deployment to be considered successful [default=1 (100%)].

deployment_max_attempts

maximum number of times a deployment is submitted (the first attempt included) before giving up [default=1].

ssh_max_attempts

maximum number of times an SSH connection is retried when the host is unreachable [default=3].

chdir

the directory in which the engine code is executed [default=ENV, or the directory of the engine file when a custom engine is loaded with load].

polling_frequency

interval (in seconds) between two polls on a resource to check its state [default=5].

[
  :environment,
  :public_key,
  :private_key,
  :resources,
  :properties,
  :user,
  :walltime,
  :notifications,
  :site,
  :no_submit,
  :no_deploy,
  :no_cancel,
  :no_install,
  :no_execute,
  :no_cleanup,
  :name,
  :queue,
  :gateway,
  :logger,
  :submission_timeout,
  :deployment_max_attempts,
  :deployment_min_threshold,
  :deployment_timeout,
  :ssh_max_attempts,
  :chdir,
  :polling_frequency
]


Constructor Details

- (Engine) initialize(connection, options = {})

Note:

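the options hash can contain any of the USER_ATTRIBUTES, as symbol or string keys; the given values override the defaults set by the engine. Because values are combined with ||, an option set to false or nil does not override a default.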

Initialize the experiment engine.

Parameters:

  • connection (Restfully::Session)

    a Restfully::Session, correctly configured

  • options (Hash) (defaults to: {})

    a hash of options

# File 'lib/grid5000/campaign/engine.rb', line 284

def initialize(connection, options = {})
  USER_ATTRIBUTES.each do |uattr|
    self.class.defaults[uattr] = options[uattr] || options[uattr.to_s] || self.class.defaults[uattr]
  end

  # Do not allow direct modification of defaults after initialization.
  # Users should only change the <tt>env</tt> hash that is passed to every hook, if needed.
  self.class.defaults.freeze

  @connection = connection
  @mutex = Mutex.new
end
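The merge in initialize accepts symbol or string keys, and because it chains values with ||, an option explicitly set to false (or nil) falls through to the engine's default. The sketch below reproduces only that expression outside the class so the behaviour can be checked on its own; merge_user_attributes and the sample values are hypothetical.

```ruby
# Hypothetical stand-alone copy of the merge expression used in
# Engine#initialize: options[attr] || options[attr.to_s] || default.
def merge_user_attributes(defaults, options, attributes)
  merged = defaults.dup
  attributes.each do |attr|
    merged[attr] = options[attr] || options[attr.to_s] || defaults[attr]
  end
  merged.freeze # Engine#initialize freezes the class-level defaults the same way
end

# Sample values; an engine may have called `set :no_cleanup, true`.
defaults = { :site => "rennes", :walltime => 3600, :no_cleanup => true }
options  = { "site" => "nancy", :no_cleanup => false }

merged = merge_user_attributes(defaults, options, [:site, :walltime, :no_cleanup])
merged[:site]       # "nancy": string keys are accepted
merged[:walltime]   # 3600: not given, default kept
merged[:no_cleanup] # true: the explicit false is skipped by ||
```

Because the real initialize writes into self.class.defaults and then freezes it, creating a second instance of the same engine class in one process appears to raise an error (FrozenError on recent Rubies) as soon as it tries to write the merged values.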

Instance Attribute Details

- (Object) connection

The Restfully::Session object.



# File 'lib/grid5000/campaign/engine.rb', line 250

def connection
  @connection
end

Class Method Details

+ (Object) after(name, &block) {|Hash, *args| ... }

Registers a hook to be executed after the method name.

Parameters:

  • name (String)

    the name of the method.

  • block (Proc)

    the block that will be called.

Yields:

  • (Hash, *args)

    the environment hash, plus optional arguments.

Yield Returns:

  • (Hash)

    the block MUST return the environment hash.



# File 'lib/grid5000/campaign/engine.rb', line 156

def after(name, &block)
  after_hooks[name] ||= []
  after_hooks[name].push(block)
end

+ (Hash<Symbol, Array<Proc>>) after_hooks

The hash of registered after_* hooks, mapping each method name to the list of blocks registered for it.

Returns:

  • (Hash<Symbol, Array<Proc>>)

    the hash of registered after_* hooks.



# File 'lib/grid5000/campaign/engine.rb', line 211

def after_hooks
  @after_hooks ||= deep_copy(parent(:after_hooks) || {})
end

+ (Object) before(name, &block) {|Hash, *args| ... }

Registers a hook to be executed before the method name.

Parameters:

  • name (String)

    the name of the method.

  • block (Proc)

    the block that will be called.

Yields:

  • (Hash, *args)

    the environment hash, plus optional arguments.

Yield Returns:

  • (Hash)

    the block MUST return the environment hash.



# File 'lib/grid5000/campaign/engine.rb', line 135

def before(name, &block)
  before_hooks[name] ||= []
  before_hooks[name].push(block)
end

+ (Hash<Symbol, Array<Proc>>) before_hooks

The hash of registered before_* hooks, mapping each method name to the list of blocks registered for it.

Returns:

  • (Hash<Symbol, Array<Proc>>)

    the hash of registered before_* hooks.



# File 'lib/grid5000/campaign/engine.rb', line 206

def before_hooks
  @before_hooks ||= deep_copy(parent(:before_hooks) || {})
end
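Hook registration is class-level bookkeeping: each class lazily copies its parent's registry (via deep_copy) and then appends blocks to per-method arrays. The sketch below is a hypothetical miniature (HookedEngine is not part of the library, and it copies each Array directly instead of calling deep_copy) showing that a subclass inherits its parent's hooks without altering them.

```ruby
# Hypothetical miniature of Engine's class-level hook registry.
class HookedEngine
  def self.parent(method)
    superclass.respond_to?(method) ? superclass.send(method) : nil
  end

  def self.before_hooks
    # Seeded once from a copy of the parent's registry, then memoized.
    @before_hooks ||= (parent(:before_hooks) || {}).transform_values(&:dup)
  end

  def self.before(name, &block)
    before_hooks[name] ||= []
    before_hooks[name].push(block)
  end
end

class MyEngine < HookedEngine
  before(:deploy!) { |env| env.merge(:tagged => true) }
end

class TunedEngine < MyEngine
  before(:deploy!) { |env| env.merge(:tuned => true) }
end

MyEngine.before_hooks[:deploy!].size    # 1
TunedEngine.before_hooks[:deploy!].size # 2: the inherited hook plus its own
```

Because the copy is memoized on first access, a hook registered on a parent class after a subclass has already read its registry would not propagate to that subclass; the same holds for the memoized registries shown above.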

+ (Object) deep_copy(object)

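Internal helper (:nodoc:): returns a copy of object in which Hash values are copied recursively; any other object is dup'ed once, or returned as-is if it cannot be dup'ed.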



# File 'lib/grid5000/campaign/engine.rb', line 192

def deep_copy(object) #:nodoc:
  case object
  when Hash
    h = {}
    object.each{|k,v|
      h[k] = deep_copy(v)
    }
    h
  else
    object.dup rescue object
  end
end
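deep_copy recurses only into Hash values; anything else, including the Arrays that hold registered hooks, is copied with a single dup (or shared when dup raises). The sketch below is an equivalent top-level rewrite run on sample data: the copied Array is independent, but its elements are still the same objects.

```ruby
# Equivalent top-level rewrite of Engine.deep_copy (sketch).
def deep_copy(object)
  case object
  when Hash
    object.each_with_object({}) { |(k, v), h| h[k] = deep_copy(v) }
  else
    object.dup rescue object # objects that cannot be dup'ed are shared as-is
  end
end

original = { :hooks => { :deploy! => ["first"] }, :site => "rennes" }
copy = deep_copy(original)

copy[:hooks][:deploy!] << "second"
original[:hooks][:deploy!] # ["first"]: the Array itself was dup'ed
copy[:hooks][:deploy!][0].equal?(original[:hooks][:deploy!][0]) # true: its elements are shared
```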

+ (Hash) defaults

The hash of default options set.

Returns:

  • (Hash)

    the hash of default options set.



# File 'lib/grid5000/campaign/engine.rb', line 108

def defaults
  @defaults ||= deep_copy(parent(:defaults) || {}).merge(
    :name => self.name
  )
end

+ (Object) inherited(klass)



# File 'lib/grid5000/campaign/engine.rb', line 184

def inherited(klass)
  subclasses.push(klass)
end

+ (Array<String,String>, String) keychain(key_type = nil)

Finds the first SSH key that has both public and private parts in the ~/.ssh directory.

Returns:

  • (Array<String,String>)

    [public_key_path, private_key_path] if key_type is nil.

  • (String)

    the path to the public key if key_type is :public, or to the private key if key_type is :private.



# File 'lib/grid5000/campaign/engine.rb', line 223

def keychain(key_type = nil)
  public_key = nil
  private_key = nil
  Dir[File.expand_path("~/.ssh/*.pub")].each do |file|
    public_key = file
    private_key = File.join(
      File.dirname(public_key),
      File.basename(public_key, ".pub")
    )
    if File.exist?(private_key) && File.readable?(private_key)
      break
    else
      private_key = nil
    end
  end
  case key_type
  when :public
    public_key
  when :private
    private_key
  else
    [public_key, private_key]
  end
end
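keychain walks ~/.ssh/*.pub and stops at the first public key whose private half (same name without .pub) exists and is readable. The sketch below replays that search against a throw-away directory; the dir parameter, the find_key_pair name, and the sorted iteration order are additions made for a reproducible example. It also shows a quirk carried over from the original loop: when no complete pair exists, the last public key seen is still returned, paired with nil.

```ruby
require "tmpdir"

# Same search as Engine.keychain, with the directory passed in (hypothetical
# parameter). Sorting is added for determinism; the original iterates Dir[...]
# in whatever order the filesystem returns.
def find_key_pair(dir)
  public_key = private_key = nil
  Dir.glob(File.join(dir, "*.pub")).sort.each do |file|
    public_key = file
    private_key = File.join(File.dirname(file), File.basename(file, ".pub"))
    break if File.exist?(private_key) && File.readable?(private_key)
    private_key = nil
  end
  [public_key, private_key]
end

results = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a_orphan.pub"), "dummy public key") # no private half
  before = find_key_pair(dir)
  File.write(File.join(dir, "id_rsa.pub"), "dummy public key")
  File.write(File.join(dir, "id_rsa"), "dummy private key")
  after = find_key_pair(dir)
  [before, after].map { |pair| pair.map { |path| path && File.basename(path) } }
end

results[0] # ["a_orphan.pub", nil]: a public key is returned without its private half
results[1] # ["id_rsa.pub", "id_rsa"]
```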

+ (Object) load(uri)

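Loads a custom engine from a local file path or an HTTP(S) URI (remote files are first downloaded to a temporary file) and returns the most recently defined Engine subclass, normally the last one in the loaded file. Its chdir attribute is set to the directory of the loaded file.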

Parameters:

  • uri (String)

    the URI string of the file location.



# File 'lib/grid5000/campaign/engine.rb', line 165

def load(uri)
  logger.info "Loading #{uri.inspect}"
  case URI.parse(uri.to_s)
  when URI::HTTP, URI::HTTPS
    tempfile = Tempfile.new(["g5k-campaign-engine-", ".rb"])
    tempfile.puts RestClient.get(uri)
    tempfile.close
    klass = tempfile.path
  else
    klass = File.expand_path(uri)
  end

  require klass

  engine = subclasses.last
  engine.set :chdir, File.dirname(klass)
  engine
end
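The first step of load decides between downloading and reading a local file by parsing the argument with URI.parse. The helper below (engine_location is a hypothetical name) reproduces only that branch so its behaviour on different inputs can be checked without fetching anything.

```ruby
require "uri"

# Hypothetical helper reproducing the branch at the top of Engine.load:
# HTTP(S) URIs are downloaded, anything else is treated as a local path.
def engine_location(uri)
  case URI.parse(uri.to_s)
  when URI::HTTP, URI::HTTPS # URI::HTTPS is also a subclass of URI::HTTP
    [:download, uri.to_s]
  else
    [:local, File.expand_path(uri.to_s)]
  end
end

engine_location("https://example.com/my_engine.rb") # [:download, "https://example.com/my_engine.rb"]
engine_location("/tmp/my_engine.rb")                # [:local, "/tmp/my_engine.rb"]
```

A file:// URI is not unwrapped by this branch: it reaches File.expand_path unchanged and is treated as a path relative to the current directory, so plain paths are the safer way to reference a local engine. Paths containing spaces may also be rejected by URI.parse before the branch is reached.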

+ (Logger?) logger

Returns the logger object, i.e. the value of the :logger default.

Returns:

  • (Logger, nil)

    the logger object, or nil if none has been set.



# File 'lib/grid5000/campaign/engine.rb', line 126

def logger
  defaults[:logger]
end

+ (Object) on(name, &block) {|Hash, Proc| ... }

Registers a hook to be executed instead of the method name. The hook must explicitly call the original method name itself, otherwise nothing happens.

Parameters:

  • name (String)

    the name of the method.

  • block (Proc)

    the block that will be called.

Yields:

  • (Hash, Proc)

    the environment hash and original block to be called when the method yields.

Yield Returns:

  • (Hash)

    the block MUST return the environment hash.



# File 'lib/grid5000/campaign/engine.rb', line 146

def on(name, &block)
  on_hooks[name] ||= []
  on_hooks[name].push(block)
end
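The contract of an on hook is that it replaces the method call entirely and receives the continuation block, so the original method runs only if the hook calls it. The dispatcher that applies hooks (execute_with_hooks) is not shown on this page, so the sketch below uses a hypothetical HookDemo class and dispatch method that implement just that contract; the real engine may differ in details such as how several hooks for the same method are chained.

```ruby
class HookDemo
  attr_reader :calls

  def initialize
    @on_hooks = Hash.new { |hash, name| hash[name] = [] }
    @calls = []
  end

  # Instance-level here for brevity; Engine.on registers hooks on the class.
  def on(name, &hook)
    @on_hooks[name] << hook
  end

  # Stand-in for Engine#deploy!: records the call, then yields like the real one.
  def deploy!(env)
    @calls << :deploy!
    env[:deployment] = "fake deployment"
    yield env if block_given?
    env
  end

  # Hypothetical dispatcher: without an on hook the method runs normally;
  # with one, only the hook runs and it decides whether to call the method.
  def dispatch(name, env, &continuation)
    hook = @on_hooks[name].last
    return send(name, env, &continuation) if hook.nil?
    instance_exec(env, continuation, &hook)
  end
end

skipping = HookDemo.new
skipping.on(:deploy!) { |env, _block| env } # never calls deploy!
skipped = skipping.dispatch(:deploy!, {}) { |env| env[:continued] = true; env }

wrapping = HookDemo.new
wrapping.on(:deploy!) do |env, block|
  env[:prepared] = true
  deploy!(env, &block) # explicitly call the original method
end
wrapped = wrapping.dispatch(:deploy!, {}) { |env| env[:continued] = true; env }
```

With the skipping hook, deploy! is never called and the continuation never runs; with the wrapping hook, both run and the hook returns the environment hash, as the documentation requires.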

+ (Hash<Symbol, Array<Proc>>) on_hooks

The hash of registered on_* hooks, mapping each method name to the list of blocks registered for it.

Returns:

  • (Hash<Symbol, Array<Proc>>)

    the hash of registered on_* hooks.



# File 'lib/grid5000/campaign/engine.rb', line 216

def on_hooks
  @on_hooks ||= deep_copy(parent(:on_hooks) || {})
end

+ (Object) parent(method)



# File 'lib/grid5000/campaign/engine.rb', line 98

def parent(method)
  if superclass.respond_to?(method)
    superclass.send(method)
  else
    nil
  end
end

+ (Object) set(attribute, value)

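Sets a new value for a default attribute. If the attribute is not yet part of the defaults, an instance reader method of the same name is also defined. E.g.: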

set :site, "nancy"
set :walltime, 7200


# File 'lib/grid5000/campaign/engine.rb', line 118

def set(attribute, value)
  unless defaults.has_key?(attribute)
    define_method(attribute) { self.class.defaults[attribute] }
  end
  defaults[attribute.to_sym] = value
end
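set and defaults work together: each class starts from a copy of its parent's defaults (with :name replaced by its own class name), and the first set of a new attribute defines an instance reader that looks the value up on the instance's class. The sketch below is a hypothetical miniature (DefaultsEngine and NancyEngine are illustrative names, and a shallow dup stands in for deep_copy) showing that a subclass can override a value without touching its parent.

```ruby
# Hypothetical miniature of Engine.set / Engine.defaults.
class DefaultsEngine
  def self.defaults
    @defaults ||= begin
      inherited = superclass.respond_to?(:defaults) ? superclass.defaults : {}
      inherited.dup.merge(:name => name)
    end
  end

  def self.set(attribute, value)
    unless defaults.key?(attribute)
      define_method(attribute) { self.class.defaults[attribute] }
    end
    defaults[attribute.to_sym] = value
  end

  set :site, "rennes"
  set :walltime, 3600
end

class NancyEngine < DefaultsEngine
  set :site, "nancy"
end

DefaultsEngine.new.site     # "rennes"
NancyEngine.new.site        # "nancy"
NancyEngine.new.walltime    # 3600 (inherited)
NancyEngine.defaults[:name] # "NancyEngine"
```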

+ (Object) subclasses



# File 'lib/grid5000/campaign/engine.rb', line 188

def subclasses
  @@subclasses ||= [Engine]
end

Instance Method Details

- (Object) cancel!(env)

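Cleans up all jobs and deployments tracked by the engine, by calling cleanup! without a specific job or deployment. run! triggers it when an exception occurs, unless no_cancel is set.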

Parameters:

  • env (Hash)

    the environment hash.



# File 'lib/grid5000/campaign/engine.rb', line 567

def cancel!(env)
  logger.warn "Received cancellation signal."
  cleanup!(env)
  env
end

- (Object) cleanup!(env, job = nil, deployment = nil)



# File 'lib/grid5000/campaign/engine.rb', line 573

def cleanup!(env, job = nil, deployment = nil)
  synchronize {
    if job.nil? && deployment.nil?
      logger.info "Cleaning up all jobs and deployments..."
      @deployments.each{ |d| d.delete }.clear
      @jobs.each{ |j| j.delete }.clear
    else
      unless deployment.nil?
        logger.info "Cleaning up deployment##{deployment['uid']}..."
        @deployments.delete(deployment) && deployment.delete
      end
      unless job.nil?
        logger.info "Cleaning up job##{job['uid']}..."
        @jobs.delete(job) && job.delete
      end
    end
  }
  env
end

- (Hash?) deploy!(env = {}) {|Hash, Proc| ... }

Deploys env[:environment] on env[:nodes] or, if env[:no_deploy] is true, attempts to reuse an existing deployment made on the same nodes by the same user during the current job. Failed deployments are retried up to env[:deployment_max_attempts] submissions in total.

If you want to customize what is done in the deployment phase, register an on(:deploy!) hook as follows:

on :deploy! do |env, block|
  # Do whatever crazy things you want
  # ...
  # Reuse the original deploy! method when you want to submit a deployment (can be called multiple times).
  deploy!(env, &block)
  # some other things...
  env
end

This method is thread-safe.

Parameters:

  • env (Hash) (defaults to: {})

    the environment hash, and an optional block to call if the deployment succeeds.

Yields:

  • (Hash, Proc)

    the environment hash with the new deployment available in env[:deployment] if successful.

Returns:

  • (Hash, nil)

    the environment hash with the new deployment available in env[:deployment] if successful, otherwise nil.



# File 'lib/grid5000/campaign/engine.rb', line 481

def deploy!(env = {})
  env[:remaining_attempts] ||= env[:deployment_max_attempts]
  env[:nodes] = [env[:nodes]].flatten.sort
  logger.info "[#{env[:site]}] Launching deployment [no-deploy=#{env[:no_deploy].inspect}]..."
  if env[:no_deploy]
    # attempts to find the latest deployment on the same nodes
    deployment = connection.root.sites[env[:site].to_sym].deployments(
      :reload => true
    ).find{ |d|
      d['nodes'].sort == env[:nodes] &&
      d['user_uid'] == env[:user] &&
      d['created_at'] >= env[:job]['started_at'] &&
      d['created_at'] < env[:job]['started_at']+env[:walltime]
    }
  else
    if env[:remaining_attempts] > 0
      if env[:remaining_attempts] < env[:deployment_max_attempts]
        logger.info "Retrying deployment..."
      end
      env[:remaining_attempts] -= 1
      deployment = connection.root.sites[env[:site].to_sym].deployments.submit({
        :nodes => env[:nodes],
        :notifications => env[:notifications],
        :environment => env[:environment],
        :key => key_for_deployment(env)
      }.merge(env.reject{ |k,v| !valid_deployment_key?(k) }))
    else
      logger.info "[#{env[:site]}] Hit the maximum number of retries. Halting."
      deployment = nil
    end
  end

  if deployment.nil?
    # if no valid deployment can be found without deploying, go through the normal path
    if env[:no_deploy]
      env[:no_deploy] = false
      deploy!(env)
    else
      logger.error "[#{env[:site]}] Cannot submit the deployment."
      nil
    end
  else
    deployment.reload
    synchronize { @deployments.push(deployment) }

    logger.info "[#{env[:site]}] Got the following deployment: #{deployment.inspect}"
    logger.info "[#{env[:site]}] Waiting for termination of deployment ##{deployment['uid']} in #{deployment.parent['uid']}..."

    Timeout.timeout(env[:deployment_timeout]) do
      while deployment.reload['status'] == 'processing'
        sleep env[:polling_frequency]
      end
    end

    if deployment_ok?(deployment, env)
      logger.info "[#{env[:site]}] Deployment is terminated: #{deployment.inspect}"
      env[:deployment] = deployment
      yield env if block_given?
      env
    else
      # Retry
      synchronize { @deployments.delete(deployment) }
      logger.error "[#{env[:site]}] Deployment failed: #{deployment.inspect}"
      deploy!(env) unless env[:no_deploy]
    end
  end
end
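The retry bookkeeping in deploy! stores a counter in the environment hash: it starts at deployment_max_attempts and is decremented before each submission, so at most deployment_max_attempts deployments are submitted. The sketch below models only that counter; deploy_with_retries is a hypothetical name and the block stands in for submitting a deployment and checking it with deployment_ok?.

```ruby
# Hypothetical model of the retry counter in Engine#deploy!.
def deploy_with_retries(env, &attempt_succeeds)
  env[:remaining_attempts] ||= env[:deployment_max_attempts]
  return nil unless env[:remaining_attempts] > 0 # "Hit the maximum number of retries"

  env[:remaining_attempts] -= 1
  attempt = env[:deployment_max_attempts] - env[:remaining_attempts]
  if attempt_succeeds.call(attempt)
    env[:deployment] = "deployment after #{attempt} attempt(s)"
    env
  else
    deploy_with_retries(env, &attempt_succeeds) # the original also retries recursively
  end
end

tries = []
failed = deploy_with_retries({ :deployment_max_attempts => 3 }) { |n| tries << n; false }
failed # nil, after tries == [1, 2, 3]

env = deploy_with_retries({ :deployment_max_attempts => 3 }) { |n| n == 2 }
env[:deployment] # "deployment after 2 attempt(s)"
```

Because the counter lives in env, a second deploy! call on the same environment hash (for instance from an on(:deploy!) hook) appears to share the remaining budget rather than start a fresh one.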

- (Object) execute!(env)

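Override this method with the logic of the campaign; run! calls it once the nodes are reserved, deployed and installed (unless no_execute is set). The default implementation only logs a warning.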

Parameters:

  • env (Hash)

    the environment hash.



# File 'lib/grid5000/campaign/engine.rb', line 559

def execute!(env)
  logger.warn "Your engine does not overwrite the :execute! method. Nothing will be executed on #{env[:nodes].inspect}."
  env
end

- (Hash<Symbol, Integer>) how_many?(options = {})

Returns the number of nodes that correspond to the specified state criteria, for each site requested.

Examples:

How many nodes are alive && (free || besteffort) in rennes and nancy?

how_many?(:hard => :alive, :soft => [:free, :besteffort], :in => [:rennes, :nancy]) # => {:rennes => 40, :nancy => 23}

Parameters:

  • options (Hash) (defaults to: {})

    the options to filter the result with.

Options Hash (options):

  • :hard (String, Array) — default: :alive

    a symbol or array of symbols specifying the hardware status(es) that must be matched by the nodes to be counted.

  • :soft (String, Array) — default: [:free, :besteffort]

    a symbol or array of symbols specifying the system status(es) that must be matched by the nodes to be counted.

  • :in (String, Array) — default: all

    a symbol or array of symbols specifying the sites of interest.

Returns:

  • (Hash<Symbol, Integer>)

    the number of matching nodes, keyed by site uid.


# File 'lib/grid5000/campaign/engine.rb', line 689

def how_many?(options = {})
  options = {:hard => :alive, :soft => [:free, :besteffort]}.merge(options)
  count = {}

  sites = [options[:in]].flatten.compact.map(&:to_s)
  hard_state = [options[:hard]].flatten.compact.map(&:to_s)
  soft_state = [options[:soft]].flatten.compact.map(&:to_s)

  connection.root.sites.each do |site|
    next if !sites.empty? && !sites.include?(site['uid'])
    count[site['uid'].to_sym] = site.status.count do |ns|
      hard_state.include?(ns['hardware_state']) &&
      soft_state.include?(ns['system_state'])
    end
  end

  count
end
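how_many? only filters the status entries returned for each site; the counting itself is plain Ruby. The sketch below applies the same filtering to a hand-written sample (count_nodes and the data are made up), which also makes the return type visible: a Hash from site symbol to node count.

```ruby
# Counting logic of Engine#how_many?, applied to plain data instead of the
# Restfully resources. `sites` maps a site uid to a list of node status hashes.
def count_nodes(sites, options = {})
  options = { :hard => :alive, :soft => [:free, :besteffort] }.merge(options)
  wanted_sites = [options[:in]].flatten.compact.map(&:to_s)
  hard_state = [options[:hard]].flatten.compact.map(&:to_s)
  soft_state = [options[:soft]].flatten.compact.map(&:to_s)

  sites.each_with_object({}) do |(uid, statuses), count|
    next if !wanted_sites.empty? && !wanted_sites.include?(uid)
    count[uid.to_sym] = statuses.count do |ns|
      hard_state.include?(ns['hardware_state']) &&
        soft_state.include?(ns['system_state'])
    end
  end
end

sample = {
  "rennes" => [
    { 'hardware_state' => 'alive', 'system_state' => 'free' },
    { 'hardware_state' => 'alive', 'system_state' => 'busy' },
    { 'hardware_state' => 'dead',  'system_state' => 'free' }
  ],
  "nancy" => [{ 'hardware_state' => 'alive', 'system_state' => 'besteffort' }],
  "lyon"  => []
}

count_nodes(sample, :in => [:rennes, :nancy]) # {:rennes => 1, :nancy => 1}
count_nodes(sample, :soft => :busy)           # {:rennes => 1, :nancy => 0, :lyon => 0}
```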

- (Object) inspect



# File 'lib/grid5000/campaign/engine.rb', line 670

def inspect
  s = "#<#{self.class.name}:0x#{self.object_id.to_s(16)}"
  USER_ATTRIBUTES.sort_by{|u| u.to_s}.each {|uattr|
    next if [:logger].include?(uattr)
    s << " @#{uattr}=#{send(uattr).inspect}"
  }
  s << ">"
end

- (Object) install!(env)

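Override this method to run installation commands on the nodes; run! calls it after a successful deployment (unless no_install is set). The default implementation only logs a warning.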

Parameters:

  • env (Hash)

    the environment hash.



# File 'lib/grid5000/campaign/engine.rb', line 552

def install!(env)
  logger.warn "Your engine does not overwrite the :install! method. Nothing will be installed on #{env[:nodes].inspect}."
  env
end

- (Object) notify(message, to = nil)

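Sends a notification message. If to is nil, the URIs in the notifications attribute are used; nothing is sent (and true is returned) when there are no recipients. Errors are logged and make the method return false.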

Parameters:

  • message (String)

    the message to send.

  • to (Array) (defaults to: nil)

    a list of notification URIs.



# File 'lib/grid5000/campaign/engine.rb', line 713

def notify(message, to = nil)
  to ||= notifications
  return true if to.nil? || to.empty?
  connection.post("/sid/notifications", {:to => [to].flatten, :body => message})
rescue Exception => e
  logger.warn "Cannot send notification: #{e.class.name} - #{e.message}"
  false
end
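notify resolves its recipients before posting to /sid/notifications. The helper below (notification_recipients is hypothetical, and the URIs are placeholders) isolates that step: an explicit to list wins, the notifications attribute is the fallback, and a single URI string is accepted because it is wrapped with [to].flatten.

```ruby
# Hypothetical helper isolating how Engine#notify picks its recipients.
# An empty result corresponds to notify returning true without posting.
def notification_recipients(to, notifications)
  to ||= notifications
  return [] if to.nil? || to.empty?
  [to].flatten
end

notification_recipients(nil, nil)                              # []
notification_recipients(nil, ["xmpp:someone@example.com"])     # ["xmpp:someone@example.com"]
notification_recipients("mailto:me@example.com", ["xmpp:x@y"]) # ["mailto:me@example.com"]
```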

- (Object) parallel(options = {}) {|p| ... }

Primitive that returns a new Parallel object. Parallel#loop! must be explicitly called to wait for the threads within the Parallel object.

If option :ignore_thread_exceptions is given and true, then standard exceptions (including timeouts) that occur in one of the threads will be ignored (only an error log will be displayed). This is useful if you are doing multi-site campaigns.

Parameters:

  • options (Hash) (defaults to: {})

    a hash of additional options to pass.

Yields:

  • (p)

    the new Parallel object, before it is returned.


# File 'lib/grid5000/campaign/engine.rb', line 605

def parallel(options = {}, &block)
  p = Parallel.new({:logger => logger}.merge(options))
  yield p if block_given?
  p
end

- (Hash?) reserve!(env = {}) {|Hash, Proc| ... }

Reserves env[:resources] for env[:walltime] seconds or, if env[:no_submit] is true, attempts to reuse an existing running job (same name and user, or whose uid is env[:jobid]).

If you want to customize what is done in the reservation phase, register an on(:reserve!) hook as follows:

on :reserve! do |env, block|
  # Do whatever crazy things you want, change the environment options if needed.
  # ...
  # Reuse the original reserve! method when you want to submit a job (can be called multiple times).
  reserve!(env, &block)
  # some other things...
  env
end

This method is thread-safe.

Parameters:

  • env (Hash) (defaults to: {})

    the environment hash, and an optional block to call if the reservation succeeds.

Yields:

  • (Hash, Proc)

    the environment hash with the new job available in env[:job] if successful.

Returns:

  • (Hash, nil)

    the environment hash with the new job available in env[:job] if successful, otherwise nil.



# File 'lib/grid5000/campaign/engine.rb', line 393

def reserve!(env = {}, &block)
  logger.info "[#{env[:site]}] Launching job [no-submit=#{env[:no_submit].inspect}]..."
  if env[:no_submit]
    # Try to reuse the last job running with the same name
    # FIXME: It should test for the same :resources attribute,
    # but OAR api does not provide that at the moment...
    job = connection.root.sites[env[:site].to_sym].jobs(
      :reload => true,
      :name   => env[:name],
      :state  => 'running',
      :user   => env[:user]
    ).find{|j|
      j['name'] == env[:name] &&
      j['state'] == 'running' &&
      j['user_uid'] == env[:user] ||
      j['uid'] == env[:jobid] &&
      j['state'] == 'running'
    }
  else
    payload = {
      :command => "sleep #{env[:walltime]}",
      :name => env[:name],
      :types => ["deploy"],
      :queue => env[:queue],
      :properties => env[:properties],
    }.merge(env.reject{ |k,v| !valid_job_key?(k) })
    payload[:resources] = [
      env[:resources], "walltime=#{oar_walltime(env)}"
    ].join(",")
    job = connection.root.sites[env[:site].to_sym].jobs.submit(payload)
  end

  if job.nil?
    if env[:no_submit]
      env[:no_submit] = false
      # if a new job has to be submitted,
      # a new deployment must also be submitted
      env[:no_deploy] = false
      reserve!(env, &block)
    else
      logger.error "[#{env[:site]}] Cannot get a job"
      nil
    end
  else
    sleep 1
    job.reload
    synchronize { @jobs.push(job) }
    logger.info "[#{env[:site]}] Got the following job: #{job.inspect}"
    logger.info "[#{env[:site]}] Waiting for state=running for job ##{job['uid']} (expected start time=\"#{Time.at(job['scheduled_at']) rescue "unknown"}\")..."

    begin
      Timeout.timeout(env[:submission_timeout]) do
        while job.reload['state'] != 'running'
          sleep env[:polling_frequency]
        end
      end
    rescue Timeout::Error, StandardError => e
      logger.error "[#{env[:site]}] Received Timeout exception: there are no nodes available. Deleting submitted job."
      cleanup!(env, job)
      return
    end

    logger.info "[#{env[:site]}] Job is running: #{job.inspect}"
    env[:job] = job
    yield env if block
    env
  end
end
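The job-matching block in the no_submit branch mixes && and || without parentheses. Since && binds tighter than ||, it reads as (same name and running and same user) or (uid equal to env[:jobid] and running). The sketch below extracts it unchanged into a hypothetical reusable_job? method with made-up job data; note that env[:jobid] is read here although it is not listed in USER_ATTRIBUTES.

```ruby
# The matching block from Engine#reserve!, extracted verbatim:
#   (name && running && user) || (uid == env[:jobid] && running)
def reusable_job?(j, env)
  j['name'] == env[:name] &&
    j['state'] == 'running' &&
    j['user_uid'] == env[:user] ||
    j['uid'] == env[:jobid] &&
    j['state'] == 'running'
end

env = { :name => "MyEngine", :user => "jdoe", :jobid => 42 }

by_name  = { 'name' => "MyEngine", 'state' => 'running', 'user_uid' => "jdoe",  'uid' => 7 }
by_uid   = { 'name' => "Other",    'state' => 'running', 'user_uid' => "alice", 'uid' => 42 }
inactive = { 'name' => "MyEngine", 'state' => 'waiting', 'user_uid' => "jdoe",  'uid' => 42 }

reusable_job?(by_name, env)  # true
reusable_job?(by_uid, env)   # true: matched by uid alone, whatever the name
reusable_job?(inactive, env) # false: neither branch holds for a waiting job
```

In practice the jobs query already filters on :user, so the uid branch presumably only sees jobs of the same user.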

- (Object) reset!

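Resets the engine's internal state: a new Parallel object and empty lists of tracked jobs and deployments. Called at the start of run!.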



# File 'lib/grid5000/campaign/engine.rb', line 366

def reset!
  synchronize {
    @parallel = Parallel.new
    @jobs = []
    @deployments = []
  }
end

- (Array<String>) run!

Run the experiment.

Returns:

  • (Array<String>)

    the FQDNs of the nodes that were successfully deployed, or nil if an exception interrupted the run.



# File 'lib/grid5000/campaign/engine.rb', line 300

def run!
  reset!

  # set up environment hash
  env = self.class.defaults.dup

  nodes = []

  %w{INT TERM}.each do |signal|
    Signal.trap( signal ) do
      logger.fatal "Received #{signal.inspect} signal. Exiting..."
      exit(1)
    end
  end

  logger.debug self.inspect


  change_dir do

    env = execute_with_hooks(:reserve!, env) do |env|
      env[:nodes] = env[:job]['assigned_nodes']

      env = execute_with_hooks(:deploy!, env) do |env|
        env[:nodes] = env[:deployment]['result'].reject{ |k,v|
          v['state'] != 'OK'
        }.keys.sort

        synchronize {
          nodes.push(env[:nodes]).flatten!
        }

        env = execute_with_hooks(:install!, env) unless env[:no_install]
        env = execute_with_hooks(:execute!, env) unless env[:no_execute]

        unless env[:no_cleanup]
          # Only cleans up the deployment
          logger.info "Launching cleanup procedure (pass the --no-cleanup flag to avoid this)..."
          env = execute_with_hooks(:cleanup!, env, nil, env[:deployment])
        end

      end # :deploy!

      unless env[:no_cleanup]
        # Only cleans up the job
        logger.info "Launching cleanup procedure (pass the --no-cleanup flag to avoid this)..."
        env = execute_with_hooks(:cleanup!, env, env[:job], nil)
      end
    end # :reserve!

  end # change_dir

  # Return the valid nodes at the end of the run
  nodes
rescue Exception => e
  logger.error "Received exception: #{e.class.name} - #{e.message}"
  e.backtrace.each {|b| logger.debug b}
  unless env[:no_cancel]
    logger.info "Launching cancellation procedure (pass --no-cancel flag to avoid this)..."
    execute_with_hooks(:cancel!, env)
  end
  nil
end
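After deploy! succeeds, run! narrows env[:nodes] to the hosts whose entry in the deployment's result hash has state 'OK'. The sketch below isolates that filter in a hypothetical deployed_nodes method; the hostnames and the 'KO' state are invented sample data, only the shape of the hash follows the code above.

```ruby
# The node filter applied in Engine#run! once a deployment has terminated:
# keep only hosts whose per-node state is 'OK', sorted by name.
def deployed_nodes(deployment)
  deployment['result'].reject { |_host, v| v['state'] != 'OK' }.keys.sort
end

deployment = {
  'result' => {
    'node-3.site.example' => { 'state' => 'OK' },
    'node-1.site.example' => { 'state' => 'OK' },
    'node-2.site.example' => { 'state' => 'KO' } # any value other than 'OK'
  }
}

deployed_nodes(deployment) # ["node-1.site.example", "node-3.site.example"]
```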

- (Object) ssh(fqdn, username, options = {}) {|Net::SSH::Connection::Session| ... }

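Sets up an SSH connection to fqdn as username. If the host is unreachable (Errno::EHOSTUNREACH), the connection is retried up to ssh_max_attempts times, 5 seconds apart; pass an :ssh_max_attempts option to override that value for a single call. The :timeout (default: 10 seconds) and :keys (default: the engine's private_key) options can be overridden the same way. If a :password option is given, keyboard-interactive authentication is used instead of keys, and if a gateway is set, the connection goes through it.

If option :multi is given and true, fqdn must be an array of hostnames and an instance of Net::SSH::Multi::Session is yielded. See <net-ssh.github.com/multi/v1/api/index.html> for more information.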

Parameters:

  • fqdn (String)

    the fully qualified domain name of the host to connect to.

  • username (String)

    the login to use to connect to the host.

  • options (Hash) (defaults to: {})

    a hash of additional options to pass.

Yields:

  • (Net::SSH::Connection::Session)

    ssh, an SSH session (a Net::SSH::Multi::Session when :multi is true).

Raises:

  • (ArgumentError)

    if no block is given.


# File 'lib/grid5000/campaign/engine.rb', line 621

def ssh(fqdn, username, options = {}, &block)
  raise ArgumentError, "You MUST provide a block when calling #ssh" if block.nil?
  options[:timeout] ||= 10
  if options.has_key?(:password)
    options[:auth_methods] ||= ['keyboard-interactive']
  else
    options[:keys] ||= [private_key].compact
  end
  max_attempts = options[:ssh_max_attempts] || ssh_max_attempts
  logger.info "SSHing to #{username}@#{fqdn.inspect}..."
  attempts = 0
  begin
    attempts += 1
    if options[:multi]
      Net::SSH::Multi.start(
        :concurrent_connections => (
          options[:concurrent_connections] || 10
        )
      ) do |session|
        session.via gateway, user unless gateway.nil?
        if options.has_key?(:password)
          fqdn.each {|h| session.use "#{username}@#{h}", :password => options[:password]}
        else
          fqdn.each {|h| session.use "#{username}@#{h}"}
        end
        block.call(session)
      end
    else
      if gateway
        gateway_handler = Net::SSH::Gateway.new(gateway, user, options)
        gateway_handler.ssh(fqdn, username, options, &block)
        gateway_handler.shutdown!
      else
        Net::SSH.start(fqdn, username, options, &block)
      end
    end
  rescue Errno::EHOSTUNREACH => e
    if attempts <= max_attempts
      logger.info "No route to host #{fqdn}. Retrying in 5 secs..."
      sleep 5
      retry
    else
      logger.info "No route to host #{fqdn}. Won't retry."
      raise e
    end
  end
end
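The retry loop in ssh increments the attempt counter and then checks attempts <= max_attempts, so an unreachable host is contacted up to ssh_max_attempts + 1 times in total (one initial try plus ssh_max_attempts retries). The sketch below reproduces that loop with Ruby's retry keyword; with_host_retries is a hypothetical name, and the delay parameter is added so the example runs instantly.

```ruby
# Hypothetical stand-alone version of the retry loop in Engine#ssh.
def with_host_retries(max_attempts, delay: 5)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue Errno::EHOSTUNREACH
    raise unless attempts <= max_attempts # give up: re-raise the same error
    sleep delay
    retry
  end
end

tries = 0
begin
  with_host_retries(3, delay: 0) { tries += 1; raise Errno::EHOSTUNREACH }
rescue Errno::EHOSTUNREACH
  # the error propagates after the last attempt
end
tries # 4

connected = with_host_retries(3, delay: 0) { |n| n < 2 ? raise(Errno::EHOSTUNREACH) : :connected }
connected # :connected, on the second attempt
```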

- (Object) synchronize(&block)

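Runs the given block while holding the engine's mutex, so that shared state such as the lists of tracked jobs and deployments is updated by one thread at a time.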



# File 'lib/grid5000/campaign/engine.rb', line 594

def synchronize(&block)
  @mutex.synchronize(&block)
end
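synchronize wraps a single Mutex, the usual way to let several threads (for instance one per site inside a Parallel block) append to shared Arrays safely. The sketch below shows the same pattern in a hypothetical NodeCollector class, mirroring the nodes.push(...).flatten! call in run!. Ruby's Mutex is not reentrant: calling synchronize again from inside a synchronize block on the same thread raises ThreadError, so code running under the lock should not call methods such as cleanup! that take it themselves.

```ruby
# Minimal illustration of the locking pattern behind Engine#synchronize.
class NodeCollector
  def initialize
    @mutex = Mutex.new
    @nodes = []
  end

  def synchronize(&block)
    @mutex.synchronize(&block)
  end

  def add(batch)
    synchronize { @nodes.push(batch).flatten! }
  end

  def nodes
    synchronize { @nodes.dup }
  end
end

collector = NodeCollector.new
threads = %w[rennes nancy lyon].map do |site|
  Thread.new { collector.add((1..3).map { |i| "node-#{i}.#{site}" }) }
end
threads.each(&:join)
collector.nodes.size # 9
```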