CHIPP Computing Board Meeting
=============================
CERN: 25th March 2004

Present: Roland Bernet, Allan Clark, Derek Feichtinger, Szymon Gadomski,
Christoph Grab, Niko Neufeld, Frederik Orellana, Marie-Christine Sawley

- Welcome: (CG)

- Requirements for the CH Tier-2 cluster
----------------------------------------

 o Needs as specified by: CMS (DF), LHCb (DF), ATLAS (FO)

   The requirements of the three experiments are similar. At the moment
   LHCb requires less memory but needs an outbound network connection.
   The numbers come from the Data Challenge 04 jobs and may change in
   future.
   - Memory: 1 GByte/CPU (explicitly stated by CMS in particular)
   - Hard disk: at least 80 GByte (40 + 120 GByte preferred)
   - Network: 100 Mb/s is OK, 1000 Mb/s preferred
   - Storage: a 5 TByte disk server is needed

   It was noted that stability is much more important than large
   machines. The experiments will use small clusters if they run stably.

 o Needs for computing administration (DF)
   - Remote on/off
   - Console export would be ideal
   - Possibility of saving and loading different configurations

- Status of cluster in Manno (FO)
---------------------------------

 o Hardware status:
   - No change in hardware since the last meeting.
   - Problem with hardware stability: machines crash with corrupted
     file systems. The University of Zurich (RB) had similar problems
     with AMD machines; changing the CPUs helped. Overheating CPUs
     might be the cause.
   - 16 to 17 of the 20 machines are running.

 o Software status:
   - LCG-1 is installed but no longer used; not even test jobs are
     coming in.
   - LCG-2 is deployed at several Tier-1 sites, but LCG-1 has not been
     officially terminated yet. The GRID middleware software is the
     problem. Derek and Frederik are now working in ARDA (A Realisation
     of Distributed Analysis for LHC). Many political and technical
     questions are now dealt with by ARDA. ARDA is expected to provide
     four end-to-end prototypes with only 8 people!
     There was a general feeling that producing four prototypes (one
     per experiment) does not make sense and requires too many
     computing staff. The GRID middleware will come from EGEE, which
     is not a HEP-only community: many people from EDG as well as from
     AliEn and Condor are involved. The base of the software comes
     from AliEn, e.g. the file catalogue and interfaces. In addition,
     AliEn has attractive features, e.g. it is a pull system.

- Funding Request (CG)
----------------------

 o A request for funding was submitted to the Swiss National Science
   Foundation (SNF), on behalf of CHIPP, in Feb. 2004, asking for a
   total of
   - 128 kCHF to upgrade the present farm (a "Linux HP cluster by
     DALCO" was taken as an example configuration)
   - plus an additional 5 kCHF in travel money
   Even if the money is granted, we will not receive anything before
   October 2004. There is the possibility of getting a loan from ETH
   Zurich and the University of Geneva if we want to spend some money
   earlier. (We still need to be sure that we will get the money!)

- Status at CSCS (MCS)
----------------------

 o Personnel: The LHC-computing position has been offered, but the
   first choice seems unlikely to accept. The second and third choices
   are IT specialists, not physicists. We won't have a person before
   the end of May.

 o Computing: CSCS has two types of systems:
   - Massively Parallel Processing (MPP) (IBM)
   - Parallel-Vector Processing (PVP) (NEC)
   Both systems have to be upgraded or replaced in the near future.
   CSCS does not really favour yet another, third type of system; our
   CH-LCG system should somehow fit into their scheme. Manno is
   investigating the zBox option in a "period of exploration" together
   with PSI and UNIZH.
- Hardware issues for upgrading the CH-LCG cluster at CSCS:
-----------------------------------------------------------

 o Presently available offers for a configuration of
   ( 20+1 CPUs, 512 MByte RAM, 40+120 GB hard disk )
   ( 5 TByte RAID file server )
   1) DALCO:    kCHF 70 (system) + kCHF 24 (file server)
   2) Transtec: kCHF 70 (system) + kCHF 14 (file server)
      (less performance than DALCO)

 o 32-bit versus 64-bit architecture discussion:
   - There is no big difference in throughput.
   - Advantages of 64 bits: access to very large files; very large
     jobs (memory).
   - CERN Linux should run on 64-bit machines. (Note: not all of the
     experiments' software actually runs on 64 bits at present.)

 o zBox: The University of Zurich successfully operates a zBox system;
   see http://krone.physik.unizh.ch/~stadel/zBox/
   UNIZH is in the process of acquiring more capacity and wants to
   build a new zBox with >= 512 processors. PSI is also interested
   (Adelmann), and CSCS has expressed interest too, in joining efforts
   and setting up a common system that could possibly be located in
   Manno. Evaluations are taking place now and should turn into a
   definite proposal by the end of May 2004.

 o New zBox: Is the new zBox a valid candidate for us? If yes, we
   could combine the efforts of UNIZH, PSI, CSCS and CH-LHC.
   Requirements:
   - Manno: It is important that the system is reliable and goes into
     production quickly.
   - HEP: Would like Linux; no low latency is needed.
   The question was raised of what we would do with all the
   interconnects we do not need. Could we have Linux and a different
   UNIX flavour side by side in Manno? We have yet to see whether this
   architecture suits the LHC needs. We plan to do some benchmarks.

- Discussion (all)
------------------

 o Needs: There was general agreement that we do not need to buy any
   hardware in the next month or two. This might change if the
   existing cluster cannot be stabilised; in that case, a special
   meeting will be called. The experiments will use whatever runs
   stably.
   There is no preference of architecture.

 o Benchmark: Manno needs some input for the meeting on 31st May 2004.
   We should benchmark some of the machines:
   ATLAS SW: Orellana + Gadomski; in progress.
   LHCb    : Orellana and Bernet will try to run some software (ATLAS
             or LHCb) on the zBox in Zurich. This will also be a test
             of portability, as the zBox runs SuSE Linux.
   CMS     : Holzner + Feichtinger: run production CMS jobs on
             CH-CSCS.
   [ We might also be able to run some tests on the new ETH machine
     (a large-scale Opteron cluster, once it's there). ]

Copies of transparencies are available on: www.chipp.ch

R. Bernet + C. Grab
________________________________________________________________________