Deep Learning in Medical Image Analysis

BE19CH09-Shen ARI 28 April 2017 10:37

Dinggang Shen (1,2), Guorong Wu (1), and Heung-Il Suk (2)

(1) Department of Radiology, University of North Carolina, Chapel Hill, North Carolina 27599; email: [email protected]
(2) Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea; email: [email protected]

Annu. Rev. Biomed. Eng. 2017. 19:221–48
First published as a Review in Advance on March 9, 2017
The Annual Review of Biomedical Engineering is online at bioeng.annualreviews.org
https://doi.org/10.1146/annurev-bioeng-071516-044442
Copyright © 2017 by Annual Reviews. All rights reserved.

Keywords: medical image analysis, deep learning, unsupervised feature learning

Abstract

This review covers computer-assisted analysis of images in the field of medical imaging. Recent advances in machine learning, especially with regard to deep learning, are helping to identify, classify, and quantify patterns in medical images. At the core of these advances is the ability to exploit hierarchical feature representations learned solely from data, instead of features designed by hand according to domain-specific knowledge. Deep learning is rapidly becoming the state of the art, leading to enhanced performance in various medical applications. We introduce the fundamentals of deep learning methods and review their successes in image registration, detection of anatomical and cellular structures, tissue segmentation, computer-aided disease diagnosis and prognosis, and so on. We conclude by discussing research issues and suggesting future directions for further improvement.

Downloaded from www.annualreviews.org. Access provided by Edith Cowan University on 09/13/17. For personal use only.
Contents

1. INTRODUCTION
2. DEEP LEARNING
   2.1. Feed-Forward Neural Networks
   2.2. Deep Models
   2.3. Unsupervised Feature Representation Learning
   2.4. Fine-Tuning Deep Models for Target Tasks
   2.5. Convolutional Neural Networks
   2.6. Reducing Overfitting
3. APPLICATIONS IN MEDICAL IMAGING
   3.1. Deep Feature Representation Learning in Medical Images
   3.2. Deep Learning for Detection of Anatomical Structures
   3.3. Deep Learning for Segmentation
   3.4. Deep Learning for Computer-Aided Detection
   3.5. Deep Learning for Computer-Aided Diagnosis
4. CONCLUSION

1. INTRODUCTION

Over the past few decades, medical imaging techniques, such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), mammography, ultrasound, and X-ray, have been used for the early detection, diagnosis, and treatment of diseases (1). In the clinic, medical image interpretation has been performed mostly by human experts such as radiologists and physicians. However, given wide variations in pathology and the potential fatigue of human experts, researchers and doctors have begun to benefit from computer-assisted interventions. Although the rate of progress in computational medical image analysis has not been as rapid as that in medical imaging technologies, the situation is improving with the introduction of machine learning techniques.
In applying machine learning, finding or learning informative features that well describe the regularities or patterns inherent in data plays a pivotal role in various tasks in medical image analysis. Conventionally, meaningful or task-related features were designed mostly by human experts on the basis of their knowledge about the target domains, making it challenging for nonexperts to exploit machine learning techniques for their own studies. In the meantime, there have been efforts to learn sparse representations based on predefined dictionaries, possibly learned from training samples. Sparse representation is motivated by the principle of parsimony in many areas of science; that is, the simplest explanation of a given observation should be preferred over more complicated ones. Sparsity-inducing penalization and dictionary learning have demonstrated the validity of this approach for feature representation and feature selection in medical image analysis (2–6). It should be noted that sparse representation or dictionary learning methods described in the literature still find informative patterns or regularities inherent in data with a shallow architecture, thus limiting their representational power. However, deep learning (7) has overcome this obstacle by incorporating the feature engineering step into a learning step. That is, instead of extracting features manually, deep learning requires only a set of data with minor preprocessing, if necessary, and then discovers the informative representations in a self-taught manner (8, 9). Therefore, the burden of feature engineering has shifted from humans to computers, allowing nonexperts in machine learning to effectively use deep learning for their own research and/or applications, especially in medical image analysis.
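The sparsity-inducing penalization mentioned above can be illustrated with a minimal lasso-style sparse coding step solved by ISTA (iterative shrinkage-thresholding); the toy dictionary, penalty weight, and iteration count below are illustrative assumptions, not details taken from this review.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_sparse_code(D, x, lam=0.05, n_iter=200):
    """Find a sparse code a minimizing 0.5*||x - D a||^2 + lam*||a||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the quadratic term
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the reconstruction error
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((30, 50))          # overcomplete toy dictionary
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
x = D @ a_true                             # observation generated by 3 atoms
a_hat = ista_sparse_code(D, x)
print(np.count_nonzero(np.abs(a_hat) > 1e-3))  # most coefficients are exactly zero
```

The L1 penalty is what enforces the "simplest explanation" preference: most coefficients of the recovered code are driven exactly to zero, so the observation is explained by a few dictionary atoms.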
The unprecedented success of deep learning is due mostly to the following factors: (a) advances in high-tech central processing units (CPUs) and graphics processing units (GPUs), (b) the availability of a huge amount of data (i.e., big data), and (c) developments in learning algorithms (10–14). Technically, deep learning can be regarded as an improvement over conventional artificial neural networks (15) in that it enables the construction of networks with multiple (more than two) layers. Deep neural networks can discover hierarchical feature representations such that higher-level features can be derived from lower-level features (9). Because these techniques enable hierarchical feature representations to be learned solely from data, deep learning has achieved record-breaking performance in a variety of artificial intelligence applications (16–23) and grand challenges (24, 25; see https://grand-challenge.org). In particular, improvements in computer vision prompted the use of deep learning in medical image analysis, such as image segmentation (26, 27), image registration (28), image fusion (29), image annotation (30), computer-aided diagnosis (CADx) and prognosis (31–33), lesion/landmark detection (34–36), and microscopic image analysis (37, 38).

Deep learning methods are highly effective when the number of available samples during the training stage is large. For example, in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), more than one million annotated images were available (24). However, in most medical applications there are far fewer images (i.e., <1,000). Therefore, a primary challenge in applying deep learning to medical images is the limited number of training samples available to build deep models without suffering from overfitting.
To overcome this challenge, research groups have devised various strategies, such as (a) taking either two-dimensional (2D) or three-dimensional (3D) image patches, rather than the full-sized images, as input (29, 39–45) in order to reduce input dimensionality and thus the number of model parameters; (b) expanding the data set by artificially generating samples via affine transformation (i.e., data augmentation), and then training their network from scratch with the augmented data set (39–42); (c) using deep models trained on a huge number of natural images in computer vision as "off-the-shelf" feature extractors, and then training the final classifier or output layer with the target-task samples (43, 45); (d) initializing model parameters with those of pretrained models from nonmedical or natural images, then fine-tuning the network parameters with the task-related samples (46, 47); and (e) using models trained with small-sized inputs for arbitrarily sized inputs by transforming weights in the fully connected layers into convolutional kernels (36, 48).

In terms of input types, we can categorize deep models as typical multilayer neural networks that take vector-format (i.e., nonstructured) values as input and convolutional networks that take 2D or 3D (i.e., structured) values as input. Because of the structural characteristics of images (the structural or configural information contained in neighboring pixels or voxels is another important source of information), convolutional neural networks (CNNs) have attracted great interest in the field of medical image analysis (26, 35–37, 48–50). However, networks with vectorized inputs have also been successfully used in different medical applications (28, 29, 31, 33, 51–54).
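Strategies (a) and (b) above, patch extraction and data augmentation, can be sketched in a few lines of NumPy; the patch size, patch count, and choice of flips and 90-degree rotations are illustrative assumptions, not the specific settings of the cited studies.

```python
import numpy as np

def extract_patches(image, patch_size=32, n_patches=8, rng=None):
    """Strategy (a): sample small 2D patches to reduce input dimensionality."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch_size + 1)   # top-left corner of the patch
        x = rng.integers(0, w - patch_size + 1)
        patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

def augment(patch):
    """Strategy (b): enlarge the data set with simple flips and rotations."""
    views = [patch, np.fliplr(patch), np.flipud(patch)]
    views += [np.rot90(patch, k) for k in (1, 2, 3)]
    return np.stack(views)

image = np.random.default_rng(0).random((128, 128))   # stand-in for one image slice
patches = extract_patches(image)
augmented = np.concatenate([augment(p) for p in patches])
print(patches.shape, augmented.shape)   # 8 patches become 48 training samples
```

Affine augmentation in practice also includes small rotations, scalings, and translations with interpolation; flips and axis-aligned rotations are used here only because they need no resampling.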
Along with deep neural networks, deep generative models (55), such as deep belief networks (DBNs) and deep Boltzmann machines (DBMs), which are probabilistic graphical models with multiple layers of hidden variables, have been successfully applied to brain disease diagnosis (29, 33, 47, 56), lesion segmentation (36, 49, 57, 58), cell segmentation (37, 38, 59, 60), image parsing (61–63), and tissue classification (26, 35, 48, 50).

This review is organized as follows. In Section 2, we explain the computational theories of neural networks and deep models [e.g., stacked auto-encoders (SAEs), DBNs, DBMs, CNNs] and discuss how they extract high-level representations from data. In Section 3, we introduce recent studies using deep models for different applications in medical imaging, including image registration, anatomy localization, lesion segmentation, detection of objects and cells, tissue segmentation, and computer-aided detection (CADe) and CADx. Finally, in Section 4 we conclude by summarizing research trends and suggesting directions for further improvements.

[Figure 1: Architectures of two feed-forward neural networks. (a) Single-layer neural network: input layer and output layer. (b) Multilayer neural network: input layer, hidden layer, and output layer.]

2. DEEP LEARNING

In this section, we explain the fundamental concepts of feed-forward neural networks and basic deep models in the literature. We focus on learning hierarchical feature representations from data. We also discuss how to efficiently learn parameters of deep architectures by reducing overfitting.

2.1. Feed-Forward Neural Networks

In machine learning, artificial neural networks are a family of models that mimic the structural elegance of the neural system and learn patterns inherent in observations.
The perceptron (64)¹ is the earliest trainable neural network with a single-layer architecture, composed of an input layer and an output layer. A perceptron, or a modified perceptron with multiple output units (Figure 1a), is regarded as a linear model, prohibiting its application in tasks involving complicated data patterns, despite the use of nonlinear activation functions in the output layer.

¹ In general, the input layer is not counted.

This limitation can be overcome by introducing a so-called hidden layer between the input layer and the output layer. Note that in neural networks the units of the neighboring layers are fully connected to one another, but there are no connections among units in the same layer. For a two-layer neural network (Figure 1b), also known as a multilayer perceptron, given an input vector $\mathbf{v} = [v_i] \in \mathbb{R}^D$, we can write the estimation function of an output unit $y_k$ as a composition function as follows:

$$y_k(\mathbf{v}; \Theta) = f^{(2)}\!\left(\sum_{j=1}^{M} W^{(2)}_{kj}\, f^{(1)}\!\left(\sum_{i=1}^{D} W^{(1)}_{ji} v_i + b^{(1)}_j\right) + b^{(2)}_k\right), \qquad (1)$$

where the superscript denotes a layer index, $f^{(1)}(\cdot)$ and $f^{(2)}(\cdot)$ denote nonlinear activation functions of units at the specified layers, $M$ is the number of hidden units, and $\Theta = \{W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}\}$ is a parameter set.² Conventionally, the hidden units' activation function $f^{(1)}(\cdot)$ is commonly defined with a sigmoidal function such as a logistic sigmoid function or a hyperbolic tangent function, whereas the output units' activation function $f^{(2)}(\cdot)$ is dependent on the target task. Because the estimation proceeds in a forward direction, this type of network is also referred to as a feed-forward neural network.

When the hidden layer in Equation 1 is regarded as a feature extractor, $\mathbf{f}(\mathbf{v}) = [f_j(\mathbf{v})] \in \mathbb{R}^M$ from an input $\mathbf{v}$, the output layer is only a simple linear model,

$$y_k(\mathbf{v}; \Theta) = f^{(2)}\!\left(\sum_{j=1}^{M} W^{(2)}_{kj}\, f_j(\mathbf{v}) + b^{(2)}_k\right), \qquad (2)$$

where $f_j(\mathbf{v}) = f^{(1)}\!\left(\sum_{i=1}^{D} W^{(1)}_{ji} v_i + b^{(1)}_j\right)$. The same interpretation holds when there is a higher number of hidden layers. Thus, it is intuitive that the role of hidden layers is to find features that are informative for the target task.

The practical use of neural networks requires that the model parameters $\Theta$ be learned from data. The problem of parameter learning can be formulated as the minimization of an error function. From an optimization perspective, the error function $E$ for neural networks is highly nonlinear and nonconvex. Thus, there is no analytic solution for the parameter set $\Theta$. Instead, one can use a gradient descent algorithm by updating the parameters iteratively. In order to utilize a gradient descent algorithm, there must be a way to compute the gradient $\nabla E(\Theta)$ evaluated at the parameter set $\Theta$.

For a feed-forward neural network, the gradient can be efficiently evaluated by means of error back-propagation (65). Once the gradient vector of all the layers is known, the parameters $\Theta \equiv \{W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}\}$ can be updated as follows:

$$\Theta^{(t+1)} = \Theta^{(t)} - \eta\, \nabla E\!\left(\Theta^{(t)}\right), \qquad (3)$$

where $\eta$ is a learning rate and $t$ denotes an iteration index. The update process is repeated until convergence or until the predefined number of iterations is reached. As for the parameter update in Equation 3, stochastic gradient descent with a small subset of training samples, termed a minibatch, is commonly used in the literature (66).

2.2. Deep Models

Under a mild assumption on the activation function, a two-layer neural network with a finite number of hidden units can approximate any continuous function (67); therefore, it is regarded as a universal approximator. However, it is also possible to approximate complex functions to the same accuracy by using a deep architecture (i.e., one with more than two layers) with far fewer units (8).
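The forward pass of Equations 1 and 2 and one back-propagation update in the style of Equation 3 can be sketched directly in NumPy; the layer sizes, the tanh/identity activation pair, and the squared-error loss are illustrative assumptions rather than choices made in this review.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 4, 8, 2                        # input, hidden, and output sizes (toy)
W1, b1 = rng.standard_normal((M, D)) * 0.1, np.zeros(M)
W2, b2 = rng.standard_normal((K, M)) * 0.1, np.zeros(K)

def forward(v):
    """Equation 1 with f1 = tanh and f2 = identity; h is the feature f(v) of Equation 2."""
    h = np.tanh(W1 @ v + b1)
    return W2 @ h + b2, h

# One back-propagation / gradient-descent step (Equation 3) on E = 0.5*||y - target||^2.
v = rng.standard_normal(D)
target = np.array([1.0, -1.0])
eta = 0.1                                # learning rate (eta in Equation 3)
y, h = forward(v)
delta2 = y - target                      # dE/dy at the output layer
delta1 = (W2.T @ delta2) * (1 - h**2)    # error propagated back through tanh
W2 -= eta * np.outer(delta2, h); b2 -= eta * delta2
W1 -= eta * np.outer(delta1, v); b1 -= eta * delta1
y_new, _ = forward(v)
print(0.5 * np.sum((y - target)**2), 0.5 * np.sum((y_new - target)**2))
```

In practice the gradient is averaged over a minibatch of samples rather than computed from a single input, but the chain-rule structure of the update is the same.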
Thus, it is possible to reduce the number of trainable parameters, enabling training with a relatively small data set (68).

² $W^{(1)} = [W^{(1)}_{ji}] \in \mathbb{R}^{M \times D}$; $W^{(2)} = [W^{(2)}_{kj}] \in \mathbb{R}^{K \times M}$; $b^{(1)} = [b^{(1)}_j] \in \mathbb{R}^{M}$; $b^{(2)} = [b^{(2)}_k] \in \mathbb{R}^{K}$.

[Figure 2: Three representative deep models with vectorized inputs for unsupervised feature learning: (a) stacked auto-encoder, (b) deep belief network, (c) deep Boltzmann machine. The red links, whether directed or undirected, denote the full connections of units in two consecutive layers but no connections among units in the same layer. Note the differences among models in directed/undirected connections and the directions of the connections that depict conditional relationships.]

2.3. Unsupervised Feature Representation Learning

Compared with shallow architectures that require a good feature extractor designed mostly by hand on the basis of expert knowledge, deep models are useful for discovering informative features from data in a hierarchical manner (i.e., from fine to abstract). Here, we introduce three deep models that are widely used in different applications for unsupervised feature representation learning.

2.3.1. Stacked auto-encoder. An auto-encoder or auto-associator (69) is a special type of two-layer neural network that learns a latent or compressed representation of the input by minimizing the reconstruction error between the input and output values of the network, namely the reconstruction of the input from the learned representations. Because of its simple, shallow structure, a single-layer auto-encoder's representational power is very limited.
But when multiple auto-encoders are stacked (Figure 2a) in a configuration called an SAE, one can significantly improve the representational power by using the activation values of the hidden units of one auto-encoder as the input to the next higher auto-encoder (70). One of the most important characteristics of SAEs is their ability to learn or discover highly nonlinear and complicated patterns, such as the relations among input values. When an input vector is presented to an SAE, the different layers of the network represent different levels of information. That is, the lower the layer in the network is, the simpler the patterns are, and the higher the layer is, the more complicated or abstract the patterns inherent in the input vector are.

With regard to training the parameters of the weight matrices and the biases in an SAE, a straightforward approach is to apply back-propagation with a gradient-based optimization technique, beginning from random initialization and using the SAE as a conventional feed-forward neural network. Unfortunately, deep networks trained in this manner perform worse than networks with a shallow architecture, as they fall into a poor local optimum (71). To circumvent this problem, one should consider greedy layer-wise learning (10, 72). The key idea of greedy layer-wise learning is to pretrain one layer at a time. That is, the user trains parameters of the first hidden layer with the training data as input, and then trains parameters of the second hidden layer with the output from the first hidden layer as input, and so on. In other words, the representation of the l-th hidden layer is used as input for the (l+1)-th hidden layer. An important advantage of such a pretraining technique is that it is conducted in an unsupervised manner with a standard back-propagation algorithm, enabling the user to increase the size of the data set by exploiting unlabeled samples for training.
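The greedy layer-wise pretraining of an SAE can be sketched as follows: each auto-encoder is trained on the codes produced by the one below it. The tied decoder weights, linear reconstruction, layer sizes, and learning schedule are illustrative simplifications assumed for the sketch, not the configuration of any cited study.

```python
import numpy as np

class AutoEncoder:
    """One auto-encoder layer with tied weights (decoder = W.T), trained by gradient descent."""
    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.standard_normal((n_hidden, n_in)) * 0.1
        self.b = np.zeros(n_hidden)        # encoder bias
        self.c = np.zeros(n_in)            # decoder bias

    def encode(self, X):
        return np.tanh(X @ self.W.T + self.b)

    def fit(self, X, eta=0.05, n_epochs=100):
        for _ in range(n_epochs):
            H = self.encode(X)                       # hidden activations
            E = (H @ self.W + self.c) - X            # reconstruction error
            dH = (E @ self.W.T) * (1 - H**2)         # back-prop through tanh
            self.W -= eta * (H.T @ E + dH.T @ X) / len(X)
            self.c -= eta * E.mean(axis=0)
            self.b -= eta * dH.mean(axis=0)
        return self

rng = np.random.default_rng(0)
X = rng.random((200, 20))                  # unlabeled toy training data
# Greedy layer-wise pretraining: train a layer, then feed its codes to the next layer.
sae, inp = [], X
for n_hidden in (16, 8):
    ae = AutoEncoder(inp.shape[1], n_hidden, rng).fit(inp)
    sae.append(ae)
    inp = ae.encode(inp)                   # input for the next higher auto-encoder
print(inp.shape)                           # top-level representation of the 200 samples
```

No labels are used anywhere in this loop, which is exactly why pretraining can exploit unlabeled samples; a supervised fine-tuning pass over the whole stack would typically follow.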
2.3.2. Deep belief network. A restricted Boltzmann machine (RBM) (73) is a single-layer undirected graphical model with a visible layer and a hidden layer. It assumes symmetric connectivities between the visible and hidden layers, but no connections among units within the same layer. Because of the symmetry of the connectivities, it can generate input observations from hidden representations. Therefore, an RBM naturally becomes an auto-encoder (10, 73), and its parameters are usually trained by use of a contrastive divergence algorithm (74) so as to maximize the log likelihood of observations. Like SAEs, RBMs can be stacked in order to construct a deep architecture, resulting in a single probabilistic model called a DBN. A DBN has one visible layer $\mathbf{v}$ and a series of hidden layers $\mathbf{h}^{(1)}, \ldots, \mathbf{h}^{(L)}$ (Figure 2b). Note that when multiple RBMs are stacked hierarchically, although the top two layers still form an undirected generative model (i.e., an RBM), the lower layers form directed generative models. Thus, the joint distribution of the observed units $\mathbf{v}$ and the $L$ hidden layers $\mathbf{h}^{(l)}$ ($l = 1, \ldots, L$), with $\mathbf{h}^{(0)} \equiv \mathbf{v}$, in a DBN is

$$P\!\left(\mathbf{v}, \mathbf{h}^{(1)}, \ldots, \mathbf{h}^{(L)}\right) = \left(\prod_{l=0}^{L-2} P\!\left(\mathbf{h}^{(l)} \mid \mathbf{h}^{(l+1)}\right)\right) P\!\left(\mathbf{h}^{(L-1)}, \mathbf{h}^{(L)}\right), \qquad (4)$$

where $P(\mathbf{h}^{(l)} \mid \mathbf{h}^{(l+1)})$ corresponds to a conditional distribution for the units of layer $l$ given the units of layer $l+1$, and $P(\mathbf{h}^{(L-1)}, \mathbf{h}^{(L)})$ denotes the joint distribution of the units in layers $L-1$ and $L$.

Regarding the learning of parameters, the greedy layer-wise pretraining scheme (10) can be applied in the following steps:

1. Train the first layer as an RBM with $\mathbf{v} = \mathbf{h}^{(0)}$.
2. Use the first hidden layer to obtain the representation of inputs with either the mean activations of $P(\mathbf{h}^{(1)} = 1 \mid \mathbf{h}^{(0)})$ or samples drawn according to $P(\mathbf{h}^{(1)} \mid \mathbf{h}^{(0)})$, which will be used as observations for the second hidden layer.
3. Train the second hidden layer as an RBM, taking the transformed data (mean activations or samples) as training examples (for the visible layer of the RBM).
4. Iterate steps 2 and 3 for the desired number of layers, each time propagating upward either mean activations $P(\mathbf{h}^{(l+1)} = 1 \mid \mathbf{h}^{(l)})$ or samples drawn according to the conditional probability $P(\mathbf{h}^{(l+1)} \mid \mathbf{h}^{(l)})$.

After the greedy layer-wise training procedure is complete, one can apply the wake–sleep algorithm (75) to further increase the log likelihood of the observations. Usually, however, no further procedure is conducted to train the whole DBN jointly in practice.

2.3.3. Deep Boltzmann machine. A DBM (55) is also constructed by stacking multiple RBMs in a hierarchical manner. However, in contrast to DBNs, all the layers in DBMs form an undirected generative model following the stacking of RBMs (Figure 2c). Thus, for hidden layer $l$, except in the case of $l = 1$ and $l = L$, the layer's probability distribution is conditioned by its two neighboring layers, $l+1$ and $l-1$ [i.e., $P(\mathbf{h}^{(l)} \mid \mathbf{h}^{(l+1)}, \mathbf{h}^{(l-1)})$]. The incorporation of information from both the upper and lower layers improves a DBM's representational power so that it is more robust to noisy observations.
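Steps 1 and 2 of the greedy layer-wise scheme can be sketched with a one-step contrastive divergence (CD-1) update for a binary RBM; the layer sizes, learning rate, iteration count, and Bernoulli toy data are illustrative assumptions for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, V, eta, rng):
    """One CD-1 update for a binary RBM.
    W: (n_hidden, n_visible) symmetric connections; b, c: hidden and visible biases."""
    # Positive phase: hidden probabilities given the data.
    ph = sigmoid(V @ W.T + b)                        # P(h = 1 | v)
    h = (rng.random(ph.shape) < ph).astype(float)    # sampled hidden states
    # Negative phase: one Gibbs step down (using the same W, by symmetry) and back up.
    pv = sigmoid(h @ W + c)                          # P(v = 1 | h)
    ph2 = sigmoid(pv @ W.T + b)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    n = len(V)
    W += eta * (ph.T @ V - ph2.T @ pv) / n
    b += eta * (ph - ph2).mean(axis=0)
    c += eta * (V - pv).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(0)
n_vis, n_hid = 12, 6
V = (rng.random((100, n_vis)) < 0.3).astype(float)   # toy binary observations
W = rng.standard_normal((n_hid, n_vis)) * 0.01
b, c = np.zeros(n_hid), np.zeros(n_vis)
for _ in range(50):                                  # step 1: train the first RBM
    W, b, c = cd1_step(W, b, c, V, eta=0.1, rng=rng)
H = sigmoid(V @ W.T + b)                             # step 2: mean activations P(h=1|v)
# H would now serve as the "visible" data for the next RBM in the DBN stack (step 3).
print(H.shape)
```

Repeating the loop with H in place of V trains the second RBM, and so on up the stack, which is exactly the iteration described in step 4.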
