Application Performance on the MIT Alewife Multiprocessor

Frederic T. Chong and John D. Kubiatowicz

ftchong@lcs.mit.edu, kubitron@lcs.mit.edu

MIT Laboratory for Computer Science

This paper reports on the performance of several applications on the Alewife machine, focusing on emerging applications and evolving architectural mechanisms. We show that low-latency miss-handling mechanisms for both local and remote accesses, such as those in Alewife, coupled with careful data placement in the application, make these emerging applications viable candidates for shared-memory parallel processing. In fact, we discover that efficient shared memory is an excellent communication mechanism for fine-grain applications that have long been considered message-passing applications, even in the absence of data re-use. Not surprisingly, we find that Alewife mechanisms perform well on traditional coarse-grain applications. We confirm that hardware support for limited sharing is adequate for a broad range of applications, even on large numbers of processors. We also observe that modeling local cache miss behavior is important for machines, such as Alewife, where remote misses have become more competitive. To account for the effect of local misses, we introduce two novel performance metrics, which provide more revealing results than previously proposed metrics. We conclude that fine-grained applications can take advantage of Alewife's high integration and efficiency to achieve a new level of performance on scalable shared-memory machines.

Table 1 lists the applications, a short description of the problem each of the programs solves, and the input parameters. MP3D, BARNES, LOCUS, CHOL, and WATER are from the SPLASH suite [SWG92]. APPBT and MG are part of the NAS parallel benchmarks [Bai94]. The rest of the applications are engineering-type kernels from the University of Rochester, MIT, and Berkeley.

Remote cache misses from a processor to another processor's cache or memory are an integral part of every multiprocessor application study. It turns out that local cache misses are also very important on the Alewife machine because a remote miss is only about five times as expensive as a local miss.

\[
\text{remote accesses} + \frac{\text{local accesses}}{5}
\]

In essence, this formula assumes that five local misses are equivalent to one remote miss, and accounts for the overhead of local misses in order to compute the overall effect of cache misses. This metric, which we call the weighted miss ratio, is more indicative of application performance than local and remote miss ratios in isolation or combined without weighting.
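The weighting can be illustrated with a small sketch. Only the 5:1 cost ratio and the weighted total of accesses come from the text; the exact form of the weighted miss ratio used here (weighted misses divided by weighted accesses), the `AccessCounts` type, and the sample counts are assumptions made for illustration.

```python
from dataclasses import dataclass

# From the text: a remote miss costs about as much as five local misses.
LOCAL_PER_REMOTE = 5.0


@dataclass
class AccessCounts:
    """Hypothetical per-application counters (not from the paper)."""
    local_accesses: int
    local_misses: int
    remote_accesses: int
    remote_misses: int


def weighted_accesses(c: AccessCounts) -> float:
    """Remote accesses plus local accesses scaled down by the cost ratio."""
    return c.remote_accesses + c.local_accesses / LOCAL_PER_REMOTE


def weighted_miss_ratio(c: AccessCounts) -> float:
    """Assumed form: weighted misses over weighted accesses."""
    total = weighted_accesses(c)
    if total == 0:
        raise ValueError("no accesses recorded")
    weighted_misses = c.remote_misses + c.local_misses / LOCAL_PER_REMOTE
    return weighted_misses / total


def unweighted_miss_ratio(c: AccessCounts) -> float:
    """Local and remote misses combined without weighting, for comparison."""
    total = c.local_accesses + c.remote_accesses
    if total == 0:
        raise ValueError("no accesses recorded")
    return (c.local_misses + c.remote_misses) / total


# Made-up counts, chosen only to show how the two ratios diverge.
counts = AccessCounts(local_accesses=1000, local_misses=100,
                      remote_accesses=200, remote_misses=40)
print(weighted_miss_ratio(counts), unweighted_miss_ratio(counts))
```

With these sample counts the weighted ratio is higher than the unweighted one, because the remote accesses, which miss more often here, carry five times the weight of the local ones.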

Overall, we find that EM3D, ICCG, and MP3D have lower hit ratios than the other applications. This partly explains the lower utilization on these applications. However, low hit ratios only lead to poor processor utilization when there is little actual computation between memory references. Our full paper analyzes the amount of computation in between cache misses. In combination with cache hit ratios, these two metrics allow us to determine the effective granularity of the applications.
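The interaction between hit ratio and computation per memory reference can be sketched with a simple blocking-processor model. This model is an assumption for illustration and is not the analysis used in the full paper; the function name and the cycle counts are hypothetical.

```python
def utilization(compute_cycles_per_ref: float,
                hit_ratio: float,
                miss_penalty_cycles: float) -> float:
    """Fraction of time spent computing, assuming the processor stalls
    for the full miss penalty on every miss (a simplified model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    stall_per_ref = (1.0 - hit_ratio) * miss_penalty_cycles
    return compute_cycles_per_ref / (compute_cycles_per_ref + stall_per_ref)


# Same hit ratio and miss penalty (both hypothetical), different granularity.
fine_grain = utilization(compute_cycles_per_ref=2, hit_ratio=0.8,
                         miss_penalty_cycles=50)
coarse_grain = utilization(compute_cycles_per_ref=100, hit_ratio=0.8,
                           miss_penalty_cycles=50)
print(fine_grain, coarse_grain)
```

Under this model the same 0.8 hit ratio leaves the fine-grain case mostly stalled while the coarse-grain case stays mostly busy, which is why the hit ratio alone does not determine utilization.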

The full paper and related documents are available from http://www.ai.mit.edu/people/ftchong/

References

[Bai94] D. Bailey et al. The NAS Parallel Benchmarks. Technical Report RNR-94-007, NASA Ames Research Center, March 1994.

[SWG92] Jaswinder Pal Singh, Wolf-Dietrich Weber, and Anoop Gupta. SPLASH: Stanford parallel applications for shared-memory. Computer Architecture News, 20(1):5–44, March 1992.


Program   Description                                               Input
MP3D      Simulates rarefied fluid flow                             18000 particles, 6 iterations
BARNES    Simulates movement of bodies under gravitational forces
CHOL      Cholesky factorization of a sparse matrix                 order 3948, 56934 floats
WATER     Simulates movement of water molecules
MG        3D Poisson solver using multi-grid techniques
GAUSS     Unblocked Gaussian elimination
CGRID     Straightforward 2D successive over-relaxation
ICCG      Preconditioned conjugate gradient sparse solver           order 11948, 149090 doubles

Input entries whose rows are not recoverable, in source order: 3817 wires; 20; 20000 nodes, 20% remote neighbors; 80000 floats; K integers; 20; 20 floats.

Table 1: Applications and Kernels

[Figure 1: four bar-chart panels titled Local Accesses, Remote Accesses, Ratio of Remote to Local Misses, and Weighted Total of Accesses. The hit-ratio panels have a vertical axis labeled Hit Ratio (0.0 to 1.0); the ratio panel has a vertical axis labeled Ratio. Applications shown: em3d, iccg, gauss, mp3d, mmp3d, cgrid, mg, locus, chol, appbt, fft, barnes, msort, water.]

Figure 1: Cache hit ratios of applications sorted by average. Bars are for 1, 2, 4, 8, 16, and 32 processors for each application. ICCG is missing a 1-processor bar because its data set does not fit on a single Alewife processor.
