CHIPP Computing Board Meeting
=============================
CERN: 25th March 2004

Present: Roland Bernet, Allan Clark, Derek Feichtinger, Szymon Gadomski,
Christoph Grab, Niko Neufeld, Frederik Orellana, Marie-Christine Sawley

- Welcome: (CG)

- Requirements for the CH Tier-2 cluster
----------------------------------------

 o Needs as specified by: CMS (DF), LHCb (DF), ATLAS (FO)

   The requirements of the three experiments are similar. At the moment
   LHCb requires less memory but needs an outbound network connection.
   The numbers come from the Data Challenge 04 jobs and may change in
   future.
   - Memory: 1 GByte/CPU (explicitly stated by CMS in particular)
   - Hard disk: at least 80 GByte (40 + 120 GByte preferred)
   - Network: 100 Mb/s is OK, 1000 Mb/s preferred
   - Storage: a 5 TByte disk server is needed

   It was noted that stability is much more important than large
   machines. The experiments will use small clusters if they run stably.

 o Needs for computing administration (DF)
   - Remote on/off
   - Console export would be ideal
   - Possibility of saving and loading different configurations

- Status of cluster in Manno (FO)
---------------------------------

 o Hardware status:
   - No change in hardware since the last meeting.
   - Problem with hardware stability: machines crash with corrupted
     file systems. The University of Zurich (RB) had similar problems
     with AMD machines; changing the CPUs helped. Overheating CPUs
     might be the cause.
   - 16 to 17 of the 20 machines are running.

 o Software status:
   - LCG-1 is installed but no longer used; not even test jobs are
     coming in.
   - LCG-2 is deployed at several Tier-1 sites, but LCG-1 has not been
     officially terminated yet. The GRID middleware software is the
     problem. Derek and Frederik are now working in ARDA (A Realisation
     of Distributed Analysis for LHC). Many political and technical
     questions are now dealt with by ARDA. ARDA is expected to provide
     four end-to-end prototypes with only 8 people!
     There was a general feeling that producing four prototypes (one
     per experiment) does not make sense and requires too many
     computing staff. The GRID middleware will come from EGEE, which
     is not a HEP-only community: many people from EDG as well as from
     AliEn and Condor are involved. The base of the software comes
     from AliEn, e.g. the file catalogue and interfaces. In addition,
     AliEn has attractive features, e.g. it is a pull system.

- Funding Request (CG)
----------------------

 o A request for funding was submitted to the Swiss National Science
   Foundation (SNF), on behalf of CHIPP, in Feb. 2004, asking for a
   total of
   - 128 kCHF to upgrade the present farm (a "Linux HP cluster by
     DALCO" was taken as an example configuration)
   - plus an additional 5 kCHF in travel money
   Even if the money is granted, we will not receive anything before
   October 2004. There is the possibility of getting a loan from ETH
   Zurich and the University of Geneva if we want to spend some money
   earlier. (We still need to be sure that we will get the money!)

- Status at CSCS (MCS)
----------------------

 o Personnel: The LHC-computing position has been offered, but the
   first choice seems unlikely to accept. The second and third choices
   are IT specialists, not physicists. We won't have a person before
   the end of May.

 o Computing: CSCS has two types of systems:
   - Massively Parallel Processing (MPP) (IBM)
   - Parallel-Vector Processing (PVP) (NEC)
   Both systems have to be upgraded or replaced in the near future.
   CSCS does not really favour yet another, third type of system; our
   CH-LCG system should somehow fit into their scheme. Manno is
   investigating the zBox option in a "period of exploration" together
   with PSI and UNIZH.
- Hardware issues for upgrading the CH-LCG cluster at CSCS:
-----------------------------------------------------------

 o Presently available offers for a configuration of
   ( 20+1 CPUs, 512 MByte RAM, 40+120 GB hard disk )
   ( 5 TByte RAID file server )
   1) DALCO:    kCHF 70 (system) + kCHF 24 (file server)
   2) Transtec: kCHF 70 (system) + kCHF 14 (file server)
      (less performance than DALCO)

 o 32-bit versus 64-bit architecture discussion:
   - There is no big difference in throughput.
   - Advantages of 64 bits: access to very large files; very large
     jobs (memory).
   - CERN Linux should run on 64-bit machines. (Note: not all of the
     experiments' software actually runs on 64 bits at present.)

 o zBox: The University of Zurich successfully operates a zBox system;
   see http://krone.physik.unizh.ch/~stadel/zBox/
   UNIZH is in the process of acquiring more capacity and wants to
   build a new zBox with >= 512 processors. PSI is also interested
   (Adelmann), and CSCS has expressed interest too, in joining efforts
   and setting up a common system that could possibly be located in
   Manno. Evaluations are taking place now and should turn into a
   definite proposal by the end of May 2004.

 o New zBox: Is the new zBox a valid candidate for us? If yes, we
   could combine the efforts of UNIZH, PSI, CSCS and CH-LHC.
   Requirements:
   - Manno: It is important that the system is reliable and goes into
     production quickly.
   - HEP: Would like Linux; no low latency is needed.
   The question was raised of what we would do with all the
   interconnects we do not need. Could we have Linux and a different
   UNIX flavour side by side in Manno? We have yet to see whether this
   architecture suits the LHC needs. We plan to do some benchmarks.

- Discussion (all)
------------------

 o Needs: There was general agreement that we do not need to buy any
   hardware in the next month or two. This might change if the
   existing cluster cannot be stabilised; in that case, a special
   meeting will be called. The experiments will use whatever runs
   stably.
   There is no preference of architecture.

 o Benchmark: Manno needs some input for the meeting on 31st May 2004.
   We should benchmark some of the machines:
   ATLAS SW: Orellana + Gadomski; in progress.
   LHCb    : Orellana and Bernet will try to run some software (ATLAS
             or LHCb) on the zBox in Zurich. This will also be a test
             of portability, as the zBox runs SuSE Linux.
   CMS     : Holzner + Feichtinger: run production CMS jobs on
             CH-CSCS.
   [ We might also be able to run some tests on the new ETH machine
     (a large-scale Opteron cluster, once it's there). ]

Copies of transparencies are available on: www.chipp.ch

R. Bernet + C. Grab
________________________________________________________________________