CHIPP Computing Board (CCB) - Minutes of Meeting (April 21st 2006)
================================================================

Present: F. Orellana, D. Feichtinger, H. P. Beck, C. Grab, P. Kunszt, R. Bernet, U. Langenegger, S. Haug, J. van Hunen
Location: ETHZ - HPK E33; 10:30 - 16:00

----------------------------------------------------
1) Phoenix Status (CG)
----------------------------------------------------
- APEL accounting system currently defective; PK said they are looking at it.
- Phoenix ramp-up plan: we agree not to change the tables for the moment.
- Estimated cost: hardware acquisition ~3 Mio; operation ~0.5 Mio per year.
- Financing: some preliminary information was transmitted, but no letter received yet.
- Mention of the offer by SUN to establish a research collaboration in order to build up the cluster. We need to understand what this would mean for us in terms of additional research work.
- PK suggested that KTI (a federal organisation) could also help with financing and that CSCS has good contacts there.
- Long discussion about the CHIPP vote that led to the current stalemate --> need to wait for the letter from the CHIPP board.
- We decided to make a decision at the end of May.

----------------------------------------------------
2) CSCS status (PK)
----------------------------------------------------
- Experiment support personnel:
- - DF+FO have stopped at CERN and are now attached to CMS+ATLAS.
- - LHCb: does RB need help (currently uses ~15%)? Depends on what the Swiss LHCb physicists want to do. No user analysis for now; for production, just send an email. --> Conclusion: no extra help needed for the moment. No need for a VO box.
- CSCS support personnel:
- - PK is the single person for now; Vincenzo is also active.
- - System administration is done by the internal FUS group.
- - CSCS is hiring 1-2 more persons; a new person starts August 1.
- PK described user support through GGUS; Derek advised that he, FO and RB join the GGUS support group.
- PK: most tickets are handled by CSCS, since that is what they are paid for by EGEE. Experiment-specific tickets will be reassigned to DF, RB and FO.
- PK described Linux kernel 2.4 -> 2.6 upgrade issues.
- LCG 2.7 upgrade:
- - DPM is there now.
- - No LFC yet - expected in two weeks.
- SH: who will be responsible for VO box and services uptime?
- PK: the relevant persons here will have root access.
- Networking:
- - The Manno - Domodossola - Martigny - Geneva link is now in place.
- - 1 Gb/s now, 10 Gb/s possible.
- - Redundancy possible in the future by hooking on to the FZK - CNAF link.
- SC4:
- - Good collaboration with FZK.
- - We are the first tier-2 site to join, apart from DESY.
- DF asked about presence on monitoring sites. Seems OK.
- Bottlenecks:
- - Storage bandwidth and firewalls.
- - Dedicated links to CERN are NOT an issue.
- - Good connectivity to more than one tier-1 is important and should be guaranteed, since everything Swiss goes through the CERN hub.
- - SWITCH can help tune: http://www.switch.ch/network/pert/
- EGEE:
- - CSCS is part of EGEE-2: 184'000 Euros for 2 years: one extra person (perhaps 3 extra in total, depending on needs).
- - Suggestion: extend PHOENIX by 15-20% for other communities, with funding from CSCS/EGEE - a win-win situation.
- Portal:
- - alprose01.projects.cscs.ch.
- - Will implement NG support.
- CG on the tier-1 issue:
- - CMS: attached to the Northern tier-1 group, effectively FZK.
- - LHCb: FZK.
- - ATLAS: FZK/CERN, will be clear after SC4.
- PK: it is still completely unclear which services CERN will provide, so better to stay with FZK.
- UL: where will the money come from for a faster link at CSCS when the need arises?
- CG: 50'000 Euros per year should be enough; not too worried - the fibre is there.
- CG, again: peaks on tier-1s when reprocessing has finished; the bottleneck is not the network, but disk I/O on the tier-1s.

----------------------------------------------------
3) Usage of CSCS tier-2
----------------------------------------------------
CMS (DF):
- DF set up a full VO box in Manno at the beginning of the year: CMS file transfer service, CMS file catalogue.
- PK has bought 5 servers for all 3 experiments to use for VO boxes.
- DF described issues with file catalogue publishing: large effort, will be obsolete in 1-2 months; 3 datasets published in total so far.
- CMS software is in a process of change, both core and grid software.
- Main user is UL: running Monte Carlo; submits about a thousand jobs a day when running.
- UL is worried about ATLAS jobs running via 2 queues in Manno: we agreed to check the Maui configuration to ensure true fair-sharing.

ATLAS (FO+SH):
- Central production via LCG. CSCS is participating via LCG.
- A few small- to medium-scale user productions via NG.
- Will investigate setting up ATLAS DDM on the VO box.

LHCb (RB):
- Central production via DIRAC. CSCS is participating.
- No VO box needed.

----------------------------------------------------
4) Status of Service Challenge 4 (experiments)
----------------------------------------------------
- DF reminded us to look at the SC4 blog and follow what each experiment needs to do to participate.
- DF reminded us to look at and update the CSCS wiki.
- Tier-1 attachment from the experiments' point of view:
- - CMS: FZK.
- - ATLAS: FZK/CERN.
- Timeline:
- - CMS: set up software in the next few weeks, run tests in May.
- - ATLAS: get ready before June.

----------------------------------------------------
5) Status at home institutes
----------------------------------------------------
- ETHZ (UL):
- - Student + Andreas Holzner.
- - Old hardware.
- - PBS, no LCG, but interest in installing LCG after SC4.
- DF comments: currently tier-3 centres do not exist in LCG and cannot expect support.
- PSI (DF): no cluster as such, but an AFS installation of the whole CMS environment.
- ATLAS (FO+SH):
- - Presentation by SH.
- - NorduGrid setup in operation.
- - Planning to implement a GUI over the summer.
- - Users will start small productions during the summer.
- EPFL (JH):
- - 15 FTEs, half of whom are running physics analysis jobs.
- - Shift from hardware to software work.
- - Jobs are run by a central production team after a request per email.
- - Desktop project progressing - using Condor because of security problems with PBS Pro.
- Comment from DF: qemu can act as a producer of VMware disks.
- LHCb (RB):
- - T3 not running LCG, but standalone DIRAC.
- - An agent makes opportunistic use of the cluster.
- - Just running production jobs.
- - No big storage attached.

----------------------------------------------------
6) AOB
----------------------------------------------------
- Next meeting: June 2nd.
- Extended discussion on the call for tender.