For the United States, this application claims domestic benefit of, and for all other jurisdictions claims priority to, the following prior applications: 1) U.S. provisional application No. 62/872,347, filed on July 10, 2019; and 2) U.S. provisional application No. 62/878,464, filed on July 25, 2019. Where applicable, the entire contents of each of the prior applications are incorporated herein by reference.
Detailed Description
Acne vulgaris is a common skin condition that affects up to 85% of the population, especially adolescents, at some point in their lives. To assess the severity of acne, individuals need to visit dermatologists and clinicians and rely on their expertise in this area. The physician must physically and manually examine the patient and give a rough grade based on lesion count, affected area, and other relevant factors. This approach is often time consuming and laborious, and may also lead to unreliable and inaccurate results due to human error. When repeated examinations are required over a period of time, excessive effort is demanded of the doctor.
To minimize the manpower required for this task, many studies have explored computer-aided techniques to assess the severity of acne. Many works in this area, such as [1,7], require processing of high-standard medical images by algorithms that are difficult to deploy on mobile systems. Later work, including [8,2], introduced methods for processing images taken by mobile phones through a number of different steps. However, all of this work has focused on acne localization and lesion counting [8,1,2,7]. This involves long pipelines of conventional image processing techniques, such as blob detection and feature extraction, that output masks or lesion region localizations. The severity of the acne (i.e., the acne score) is then calculated by a formula based on the localization results and the number of lesions detected. One major limitation of such methods is that the accuracy of the acne score is coupled to the accuracy of the acne localization and lesion counts. In some cases, lighting conditions and skin color can increase the error rate at various stages of the process, thereby significantly affecting the final result.
In a recent work on acne assessment [9], the authors achieved significant results without lesion counting by using neural networks. They showed that neural networks can perform very accurately as long as suitable image data is presented.
However, their approach requires a special type of medical image, which forces the user to sit in front of the camera device and assume 5 specific poses during training and testing. This type of evaluation also limits use on mobile devices. Another work [16] is adapted to images taken by mobile phones, but requires multiple iterations of manual correction.
According to an example, a training technique for Convolutional Neural Networks (CNNs) is extended. According to an example, a method is derived to accommodate the nature of such a grading problem: a regression task with integer labels only. Thus, according to an example, the system is designed to have one regression target and another auxiliary classification target during training. According to an example, gender prediction and race prediction are added as two additional auxiliary tasks. Experiments on these tasks have shown that performance is improved after the tasks are introduced. Further, according to an example, unlike many other medical imaging efforts, the model is trained and tested on a selfie dataset made up of facial images taken by a mobile device, and it is demonstrated that the end-to-end model works accurately on in-the-wild selfies. According to an example, the model is used on a mobile device by uploading only one single image. According to an example, the model outperforms similar work [16] by 3% in terms of acne grading accuracy. Finally, according to an example, Grad-CAM [13] is used as a visualization tool to show the interpretability of the CNN model.
Data set
According to one embodiment, the raw dataset consists of 5971 images collected from 1051 subjects of five different ethnicities, wherein a mobile phone took three images of each subject: from the front and from both side views. Three dermatologists assigned each subject an integer score from 0 to 5 using the GEA criteria [3], according to their expert evaluation of the corresponding images. For the score model, a dataset of 1877 frontal images was used. The ground truth is defined as the majority score of the three dermatologists' scores. The dataset was randomly divided into training (80%), testing (10%), and validation (10%) subsets.
Model structure
In previous work [5,11], modern deep learning architectures such as ResNet [4] and MobileNetV2 [12] have demonstrated excellent ability to learn detailed skin features. A typical approach would be to classify or regress against a suitable objective function by adding several fully connected layers on top of a pre-trained feature network (e.g., ResNet) via transfer learning. However, since the acne score is represented by consecutive integers, an auxiliary classification loss is introduced alongside the regression loss. The inspiration for this idea comes from the results of [11,10] on age regression tasks, where a similar situation applies. FIG. 1 is a schematic diagram of a storage device 100 of a computing device providing a deep learning system, according to one embodiment. The storage device includes memory (RAM/ROM) or the like, for example, for providing instructions to a processing unit, which in the example is a graphics processing unit or a processing unit of a device such as a mobile device or server.
A face and landmark detector 102 is provided to receive an image comprising pixels of a face (not shown) for processing. A face normalization component 104 is also provided to output a normalized face image. Components 102 and 104 pre-process the images to provide a normalized face image to CNN 106. CNN 106 includes an encoder component 108 (which, according to this example, includes a residual network (e.g., ResNet) encoder), a global pooling operation component 110, and decoder or predictor components 112, 114, and 116. The fully connected layer 112 provides corresponding regressor and classifier outputs 118. Gender prediction component 114 generates gender output 120 and race prediction component 116 generates race output 122. Dashed lines 124 and 126 schematically illustrate the back-propagation of the regression operation's loss function (line 124 represents the mean squared error regressor loss function) and the classification operation's loss function (line 126 represents the cross entropy classifier loss function) to the CNN input layer of encoder 108, respectively, as described further below.
Therefore, at test time, the CNN model (sometimes referred to as the "model") takes as input a normalized face image and outputs a vector y = f_θ(x) that represents a probability distribution over all possible integer scores of the acne vulgaris diagnostic scale. The final score is then calculated as the softmax expected value (later rounded to an output integer):

score = round( Σ_{i=a}^{b} i · y_i )   (Eq. 1)

where a and b are the lower and upper bounds of the score range (e.g., the scale).
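As a minimal sketch of this expected-value scoring step (the probability vector below is illustrative, and the bounds a = 0, b = 5 follow the GEA scale used for the dataset):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def final_score(probs, a=0, b=5):
    """Softmax expected value over integer scores a..b, rounded to an integer."""
    expected = sum(score * p for score, p in zip(range(a, b + 1), probs))
    return round(expected)

# Illustrative probability distribution over scores 0..5
probs = [0.05, 0.20, 0.50, 0.15, 0.07, 0.03]
score = final_score(probs)  # expected value 2.08, rounds to 2
```

Taking the expected value rather than the argmax lets the regression loss act on a continuous quantity even though the labels themselves are integers.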
To construct and train the network, according to an example, a feature extractor is constructed by adapting an existing pre-trained image processing network. More specifically, a generic CNN defined using residual network techniques is employed. The feature extractor is defined by cutting ResNet50 at the average pooling layer, which defines CNN encoder 108. A global maximum pooling layer defining the CNN global pooling component 110 is added after the last convolution block of the CNN encoder 108. After extracting features using these components, the features are further processed using two additional fully connected layers (with a leaky ReLU added between them) defining fully connected layer 112. In addition, two branches are added to help the network learn better on this cross-ethnic and cross-gender dataset, namely gender prediction block 114 with output 120 and race prediction block 116 with output 122. Experimental results from adding the two branches are discussed further below. It will be appreciated that, according to an example not shown, a base CNN model other than ResNet50 is employed (e.g., for image processing); for example, MobileNet variants are adapted, etc. According to an example, it is desirable to employ a CNN model configured for mobile devices to suit business needs. It should be understood that the examples herein, including metrics, relate to an adapted ResNet50 network.
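As a sketch only of the head structure on top of the pooled features (the hidden layer size, bias-free weights, and random initialization here are hypothetical stand-ins, not the trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: a 2048-d feature vector (ResNet50 after global
# pooling), a 512-d hidden layer, 6 acne score classes (0..5), 2 gender
# classes, and 5 ethnicity classes.
D_FEAT, D_HID, N_SCORES, N_GENDERS, N_RACES = 2048, 512, 6, 2, 5

W1 = rng.standard_normal((D_HID, D_FEAT)) * 0.01     # first FC layer
W2 = rng.standard_normal((N_SCORES, D_HID)) * 0.01   # second FC layer
Wg = rng.standard_normal((N_GENDERS, D_FEAT)) * 0.01 # gender branch
Wr = rng.standard_normal((N_RACES, D_FEAT)) * 0.01   # race branch

def forward(feat):
    """Score head: two FC layers with a leaky ReLU in between (output 118);
    auxiliary gender (output 120) and race (output 122) branches share the
    same pooled features."""
    h = leaky_relu(W1 @ feat)
    return softmax(W2 @ h), softmax(Wg @ feat), softmax(Wr @ feat)

score_p, gender_p, race_p = forward(rng.standard_normal(D_FEAT))
```

The key design point is that the auxiliary branches branch off the shared features, so their gradients regularize the encoder without affecting the score head's capacity.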
Learning
According to an example, CNN 106 is trained with four tasks: acne score regression, acne score classification, gender prediction, and race prediction. CNN 106 (its framework) is trained by optimizing the following objective (defined by the combined loss function):

L = λ_mse · L_mse + λ_ce · L_ce + λ_gender · L_gender + λ_race · L_race   (Eq. 2)

More specifically, L_mse = (1/N) Σ_{i=1}^{N} (f_θ(x_i) − y_i)², where N is the training batch size and y_i is the ground-truth label.
According to an example, score classification loss helps the network learn a better probability distribution by minimizing cross entropy error on the score classification probability output. In particular, this loss encourages the model to output the correct score class before calculating the expected value. As shown in fig. 1, the regression loss is back-propagated from the final output (shown as line 124) to the input layer (not specifically shown) of the CNN encoder 108, while the classification loss is back-propagated from the probability distribution (shown as line 126) to the input layer.
According to an example, the gender prediction and race prediction losses for the cross-gender, cross-race dataset are calculated using cross entropy errors and act as regularization terms. The two losses are also back-propagated (not shown) from their respective prediction layers to the input layer, in a manner similar to line 126.
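A minimal pure-Python sketch of such a combined objective — an MSE regression loss plus cross-entropy terms for score classification and the two auxiliary tasks, with the λ weights from the Experiment section as defaults (the helper names are illustrative, not from the source):

```python
import math

def mse_loss(pred_scores, true_scores):
    """Mean squared error of the regression branch over a batch of size N."""
    n = len(pred_scores)
    return sum((p - t) ** 2 for p, t in zip(pred_scores, true_scores)) / n

def cross_entropy(prob_rows, true_labels, eps=1e-12):
    """Mean cross entropy given per-sample probability rows and label indices."""
    n = len(prob_rows)
    return -sum(math.log(row[t] + eps)
                for row, t in zip(prob_rows, true_labels)) / n

def combined_loss(pred_scores, score_probs, gender_probs, race_probs,
                  y_score, y_gender, y_race,
                  l_mse=1.0, l_ce=1.0, l_gender=0.001, l_race=0.001):
    """Weighted sum of the four task losses; because the score labels are
    integers, they serve directly as class indices for the classification term."""
    return (l_mse * mse_loss(pred_scores, y_score)
            + l_ce * cross_entropy(score_probs, y_score)
            + l_gender * cross_entropy(gender_probs, y_gender)
            + l_race * cross_entropy(race_probs, y_race))
```

With perfect predictions all four terms vanish; the small auxiliary weights keep gender/race learning from dominating the score objective.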
Experiment
Details of implementation
According to an example, for each image, landmarks are detected, for example by using a 60-point face tracker (no contour points), and a rectangle of the face region is cut out from the input image at training time with a certain randomness. For each face frame (face crop), randomness is applied over the training dataset of unique images to generate further images, for example by moving the outermost points (e.g., topmost, leftmost, rightmost, and bottommost) outward in their corresponding directions by random values in the ranges [0.08, 0.1) of the height, [0.2, 0.3) of the width, [0.08, 0.1) of the height, and [0.07, 0.08) for the bottom. Thus, the face frame is cropped after expansion (the example shown in FIG. 2 shows the input image and face frame 202). For purposes of this disclosure, a privacy mask 204 is provided to obscure the user's identity.
To further augment the source data, according to an example, random rescaling is performed in the range [0.8, 1.0], with a probability of 0.5 for random horizontal flipping. Each cropped image is resized to 334x448, and the augmented images are normalized on the RGB channels to be centered at [0.485, 0.456, 0.406] with standard deviations of [0.229, 0.224, 0.225]. According to an example, Adam [6] is used to optimize the CNN, with a learning rate of 0.0001. According to an example, best performance is achieved using the loss function of Eq. 2, with λ_mse = 1.0, λ_ce = 1.0, λ_gender = 0.001, λ_race = 0.001, and ResNet50 as the backbone feature network (i.e., the components defining CNN encoder 108).
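The per-channel normalization step can be sketched as follows (a minimal illustration using the stated mean and standard deviation values; the constant-gray input image is synthetic):

```python
import numpy as np

# Per-channel RGB statistics stated above (the common ImageNet values)
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """img: H x W x 3 float array with RGB values in [0, 1].
    Center each channel at its mean and scale by its standard deviation."""
    return (img - MEAN) / STD

# A 448 x 334 gray image (all 0.5), normalized channel-wise
img = np.full((448, 334, 3), 0.5)
out = normalize(img)
```

Broadcasting applies the 3-element mean and std vectors across every pixel of the H x W x 3 array in one expression.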
Evaluation of
As previously mentioned, clinically, acne assessment is a regression task with only integer labels. Thus, the mean absolute error and the percentage of test samples within certain error thresholds are reported. For example, the percentage of errors within 0.5 is also the classification accuracy. As a result, according to an example, the mean absolute error of the model was 0.35, and the classification accuracy was 71%. In Table 1, the results are compared with previous work on the same acne assessment dataset. Mean absolute error is reported to summarize the regression results, and the percentages of errors within the 0.5 and 1.0 ranges are reported to show the level of classification accuracy. As a result, according to an example, the proposed CNN outperforms the methods of previous work in overall classification accuracy. In [16], expert performance was reported to be 67% in terms of inter-expert agreement, establishing a baseline.
Table 1 overall results using only frontal images and all three photographs
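The reported metrics — mean absolute error and the percentage of samples within an error threshold — can be sketched as follows (the example predictions and labels are hypothetical):

```python
def mae(preds, labels):
    """Mean absolute error over the test samples."""
    return sum(abs(p - t) for p, t in zip(preds, labels)) / len(preds)

def within_threshold(preds, labels, thr):
    """Fraction of samples whose absolute error is at most thr.
    With integer labels, thr = 0.5 coincides with classification accuracy."""
    hits = sum(1 for p, t in zip(preds, labels) if abs(p - t) <= thr)
    return hits / len(preds)

# Hypothetical continuous predictions vs. integer ground-truth scores
preds = [1.2, 2.0, 3.6, 0.9]
labels = [1, 2, 3, 2]
```

Here mae(preds, labels) is 0.475, the within-0.5 fraction is 0.5, and the within-1.0 fraction is 0.75, mirroring the three columns reported in Table 1.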
One of the common challenges is to achieve the correct balance between the different categories. The overall accuracy is generally strongly correlated with the performance on the majority classes. In acne assessment using a common scale, integer scores 1 and 2 are typically the majority categories for such problems. On the other hand, given the size of the data and the original score definitions, score 0 (no acne) is also very difficult to distinguish from score 1 (almost no acne). FIG. 3 is a graph 300 showing accuracy by score (e.g., label vs. prediction) according to an example. In graph 300, the accuracy profile of the method is shown to be well balanced for scores 1 through 4, with category 0 sometimes misclassified.
Ablation study
As described, according to an example, a learning method that combines regression with classification learning is employed in an effort to improve the accuracy of regression tasks with integer labels in acne assessment. This section includes a discussion and comparison of the following methods: 1) training a regression branch that directly outputs the score with MSE loss (denoted REG); 2) training a classification branch with cross entropy loss (denoted CLS); 3) calculating the output from the probability output of the classification branch and training with MSE loss (denoted REG via CLS); 4) the proposed method (denoted REG+CLS) discussed in Section 2.
In Table 2, according to an example, the mean absolute error (MAE) and classification accuracy of the 4 different training targets are shown. It can be seen that treating the skin analysis problem as a pure regression task achieves a 68% result on score classification, which is higher than when the problem is expressed as a pure classification task. According to an example, the proposed training technique outperforms all other training methods, with the lowest MAE and the highest classification accuracy. All results in Table 2 were trained with the gender and race branches.
Table 2 results of four different methods
Adding help branches
According to an example, in a cross-ethnic and cross-gender dataset, skin characteristics vary with gender and ethnicity. According to an example, it is shown that by adding gender prediction and race prediction as auxiliary tasks, the overall performance is improved. In Table 3, the baseline approach refers to training with the classification and regression tasks but without the added gender- and race-prediction branches. The other three columns are the results of adding the corresponding branches, according to an example. The introduction of the auxiliary tasks notably improves the performance of the model, improving the classification accuracy by 7.2 percentage points and reducing the mean absolute error by 0.03.
TABLE 3 results when adding help branches
Visualization of
Despite the significant progress of CNNs in numerous visual tasks, in many cases such networks do not give a direct visual interpretation of their predictions. Recent work, such as Class Activation Maps (CAM) [14, incorporated herein by reference] and gradient-weighted class activation maps (Grad-CAM) [13, incorporated herein by reference], has proposed methods to visualize such an interpretation for each prediction. Interpretability, especially for research work deployed in industry, is one of the key factors in establishing trust between the system and the user.
Fig. 4 shows a result 400 of a visualization operation according to an example, with eight user images arranged in respective cells of two columns and four rows, one cell per user. Each input image is modified in response to the acne vulgaris diagnosis. FIG. 4 shows activation maps (heat maps) generated from CNN 106 (the model) using Grad-CAM [13]. FIG. 4 shows different class activation maps, from class 1 (light) in the first or top row to class 4 (severe) in the bottom or fourth row. Further, each cell displays an original face image (left image), a first modified face image (middle image) generated using Grad-CAM, and a second modified face image (right image) generated using Grad-CAM. Each of the first and second modified face images comprises the original face image overlaid with a heat map generated by the corresponding Grad-CAM, wherein the first modified face image displays a locally normalized class activation map (CAM), normalized within the image, and the second modified face image displays a globally normalized CAM, normalized across the dataset. It should be appreciated that the pixels of the original image analyzed by the CNN are modified (e.g., RGB channel values are adjusted) using the heat map to visualize or highlight areas of the original image in response to the severity of the detected acne. In one example, the severity is locally normalized within the original image. In another example, the severity is normalized across the images in the dataset. For example, the visualization is configured to generate heat map weights based on gradients defined/determined in the last convolutional layer of the trained network. The final heat map is then normalized within the image or across the dataset. Although FIG. 4 shows an image array for a plurality of different faces, in one embodiment, one or more images of a single (user) face are visually presented.
Grad-CAM is just one example of a visualization method. In various embodiments, other visualization methods, such as guided back propagation, are also applied to the model.
To accommodate this regression task, in one embodiment, the gradient of class 0 (no acne) with respect to the feature map A is negated to obtain a counterfactual explanation [13], as shown in Eq. 4:

α = −(1/(W·H)) Σ_{i=1}^{W} Σ_{j=1}^{H} ∂y⁰/∂A_{ij}   (Eq. 4)

where α is the weight used to form the heat map values of the corresponding pixels relative to class 0, W is the width, H is the height, and y⁰ is the output value for class 0. Based on this equation, decreasing the higher values in the activation map results in an increase in the probability of class 0 (no acne).
As shown in fig. 4, the acne-affected areas have higher activation values than healthy (unaffected) skin. Furthermore, when compared globally, the activation values for severe cases (e.g., category 4) tend to be much higher than those for light cases (e.g., category 1). In one example, the CNN thus produces an acne vulgaris diagnosis that includes a score and an area (e.g., of the face) associated with the score.
Fig. 5A and 5B are graphs 500 and 502 showing the relationship between activation value and prediction score for each image, according to an embodiment. Interestingly, there is a positive correlation between the "affected area" and the final output score. The resulting images provide a clear and interpretable visualization of the acne assessment model.
According to an example, visualization of the results using Grad-CAM is performed on each respective source image using an averaging technique. A method is performed, for example by a computing device, to receive a source image for analysis, perform k random data augmentations on the source image to produce k augmented images for analysis, analyze the k augmented images using a CNN compatible with the described Grad-CAM technique to produce k activation masks/maps, and then average the k activation maps to produce a final mask (e.g., by summation and division). In one embodiment, the final mask is subjected to a threshold. For example, areas with values less than a threshold (e.g., 0.4) are removed/eliminated from the heat map.
According to an example, the augmentations include affine transformations (e.g., rotation, scaling, translation) and random horizontal flipping. According to an example, the augmentations include color augmentation. According to experiment, improvement starts at k=2 and stabilizes at k=20. According to an example, the source image is analyzed without augmentation as one of the k augmented images (e.g., a null augmentation).
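The averaging and thresholding of the k activation maps described above can be sketched as follows (the 2x2 maps are toy values; 0.4 is the example threshold given above):

```python
import numpy as np

def average_masks(masks, threshold=0.4):
    """Average k activation maps (each H x W, values in [0, 1]) and zero
    out regions below the threshold, producing the final mask."""
    final = np.mean(np.stack(masks), axis=0)
    final[final < threshold] = 0.0
    return final

# Two toy activation maps from two augmented copies of the same image
m1 = np.array([[0.9, 0.1], [0.5, 0.3]])
m2 = np.array([[0.7, 0.3], [0.5, 0.1]])
out = average_masks([m1, m2])
```

Averaging across augmentations smooths out activations that appear under only one crop or flip, and the threshold then removes the weak residue.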
While the CNN model is also configured to output the score and the gender and race vectors for each of the k processed augmented images, according to an example, these outputs (score, gender, and race vectors) are obtained and used from the processing of only one of the k augmented images. Data augmentation and averaging help refine the mask, but no meaningful variation in the score output is expected across the augmentations. The scores (or other class outputs) from processing each of the k augmented images are therefore not averaged; a single value among them is used.
According to an example, the accuracy of the mask was tested by comparison with a mask calculated using the underlying truth coordinates of the acne lesions. For example, the mask is output by aggregating all circles centered on the acne lesion coordinates.
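A minimal sketch of building such a reference mask from ground-truth lesion coordinates (the image size, coordinates, and radius below are hypothetical):

```python
import numpy as np

def lesion_mask(shape, centers, radius=5):
    """Reference mask: the union of circles centered at ground-truth lesion
    coordinates (row, col), used to check the Grad-CAM mask against."""
    h, w = shape
    rr, cc = np.mgrid[0:h, 0:w]  # per-pixel row/column index grids
    mask = np.zeros(shape, dtype=bool)
    for (r, c) in centers:
        mask |= (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
    return mask

# Two hypothetical lesions on a 20 x 20 image
m = lesion_mask((20, 20), [(5, 5), (12, 14)], radius=3)
```

The resulting boolean mask can then be compared against the thresholded Grad-CAM mask, e.g., via overlap measures.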
FIG. 6 is a block diagram of an example computer network 600 in which a personal-use computing device 602 operated by a user 604 communicates with remotely located server computing devices (i.e., server 606 and server 608) via a communication network 604. According to an example, user 604 is a consumer and/or a patient of a dermatologist. Also shown are a second user 610 and a second computing device 612 configured for communication via the communication network 604. According to an example, the second user 610 is a dermatologist. The computing device 602 is for personal use by a user and is not available to the public. However, services from the servers are available to the public. Here, the public includes registered users and/or clients, and the like.
According to an example, the computing device 602 is configured to perform skin diagnostics as described herein, i.e., to evaluate acne severity, e.g., to provide an acne vulgaris diagnosis. According to an example, CNN 106 is stored and utilized on the computing device 602. According to an example, the server 606 provides CNN 106 as a service, processing images received from the computing device 602, e.g., via a cloud service, web service, or the like.
According to an example, the computing device 602 is configured to communicate with the server 608 to provide acne diagnosis information and receive product/treatment recommendations, for example, in response to skin diagnosis and/or other information about the user (e.g., age, gender, etc.). According to an example, the computing device 602 is configured to transmit skin diagnostic information (which may include image data) to either or both of the servers 606 and 608, for storage in a data store (not shown), for example. According to an example, server 608 (or another service not shown) provides an e-commerce service to sell recommended products.
In the example of fig. 6, computing device 602 is shown as a handheld mobile device (e.g., a smartphone or tablet). However, according to an example, the computing device 602 is another form or type of computing device, such as a notebook computer, desktop computer, workstation, etc. (e.g., with greater processing resources). According to an example, skin diagnostics as described herein are implemented on other computing device types. According to an example, the computing device 602 is configured using one or more native applications or browser-based applications, for example.
According to an example, the computing device 602 is a user device, for example, to obtain one or more images (e.g., pictures of skin, particularly faces) and process the images to provide skin diagnostics. According to an example, skin diagnostics are performed in association with (an execution activity of) a skin treatment plan, wherein images are periodically acquired and analyzed to determine skin scores, e.g., for acne as described. Scores are stored (locally, remotely, or both) and compared between sessions, e.g., to show trends, improvements, etc. According to an example, the user 604 of the computing device 602 may access the skin scores and/or skin images. According to an example, the skin scores and/or skin images may be made available (e.g., via server 606 or via communication network 604) to another user (e.g., second user 610, such as a dermatologist) of computer system 600. According to an example, the second computing device 612 is configured to perform the described skin diagnostics. The second computing device 612 receives images from a remote source (e.g., computing device 602, server 606, server 608, etc.) and/or captures images via an optical sensor (e.g., a camera) coupled thereto or in any other manner. As described, CNN 106 is stored and used on the second computing device 612 or from server 606.
According to an example, an application is provided to perform skin diagnostics, suggest one or more products, and monitor skin changes after one or more product applications (which define one or more treatment phases in a treatment plan) over a period of time. According to an example, a computer application provides a workflow such as a series of instructional Graphical User Interfaces (GUIs) and/or other user interfaces that are typically interactive and receive user input to perform any of the following activities:
skin diagnostics, such as acne;
product recommendations, such as treatment plans;
product procurement or other acquisition;
alert, instruct and/or record (e.g. log) the product application of the corresponding treatment phase;
subsequent (e.g., one or more follow-up) skin diagnosis; and
presenting results (e.g., comparison results);
for example, monitoring the progress of a skin treatment plan according to a treatment plan schedule. According to an example, any of these activities generate remotely stored data, such as for viewing by user 610, for viewing by another person, for aggregation with data of other users to measure treatment plan efficacy, and so forth.
According to an example, the comparison results (e.g., previous and subsequent results) are presented via the computing device 602, whether during the treatment plan and/or upon completion of the treatment plan, etc. As noted, according to an example, various aspects of skin diagnostics are performed on the computing device 602 or by a remotely coupled device (e.g., a server in the cloud or another arrangement).
Fig. 7 is a block diagram of a computing device 602 in accordance with one or more aspects of the present disclosure. The computing device 602 includes one or more processors 702, one or more input devices 704, gesture-based I/O devices 706, one or more communication units 708, and one or more output devices 710. The computing device 602 also includes one or more storage devices 712 that store one or more modules and/or data. According to an example, the modules include a deep neural network model 714 (e.g., from CNN 106), an application 716 with components for a graphical user interface (GUI 718) and/or workflow for therapy monitoring (e.g., therapy monitor 720), an image acquisition 722 (e.g., interface), and a therapy/product selector 730 (e.g., interface). According to an example, the data includes one or more images for processing (e.g., image 724), skin diagnostic data (e.g., corresponding score, race, gender, or other user data), treatment data 728 (e.g., log data related to a particular treatment), a treatment plan with a schedule (e.g., for reminder), and so forth.
According to an example, the application 716 provides functionality to acquire one or more images (e.g., video) and process the images to determine skin diagnostics of the deep neural network provided by the neural network model 714. According to an example, the neural network model 714 is configured as the model shown in fig. 1, as described above. In another example, the neural network model 714 is located remotely and the computing device 602 communicates the image via the application 716 for processing and return of the skin diagnostic data. According to an example, application 716 is configured to perform these previously described activities.
According to an example, storage 712 stores additional modules, such as an operating system 732 and other modules (not shown) including a communication module; a graphics processing module (e.g., a GPU for the processor 702); a map module; a contact module; a calendar module; a photo/gallery module; photo (image/media) editing; a media player and/or streaming media module; social media applications; a browser module, etc. A memory device is sometimes referred to herein as a memory cell.
According to an example, the communication channel 738 couples each of the components 702, 704, 706, 708, 710, 712 to any of the modules 714, 716, and 732 for inter-component communication, whether communicatively, physically, and/or operatively. In some examples, communication channel 738 includes a system bus, a network connection, an interprocess communication data structure, or any other method for communicating data.
According to an example, the one or more processors 702 implement functions and/or execute instructions within the computing device 602. For example, the processor 702 is configured to receive instructions and/or data from the storage device 712 to perform the functions of the modules shown in FIG. 7, etc. (e.g., operating system, application programs, etc.). The computing device 602 stores data/information to the storage device 712. Some functions are described further below. According to an example, it should be appreciated that operations may not fall entirely within modules 714, 716, and 732 of FIG. 7, such that one module assists the functionality of another module.
According to an example, computer program code for carrying out operations is written in any combination of one or more programming languages, such as an object oriented programming language, e.g., Java, Smalltalk, C++ or the like, or a conventional procedural programming language, such as the "C" programming language or similar programming languages.
According to an example, the computing device 602 generates output for display on a screen of the gesture-based I/O device 706, or in some examples, for display by a projector, monitor, or other display device. According to an example, it will be appreciated that gesture-based I/O device 706 is configured using various technologies (e.g., with respect to input capabilities: resistive touch screen, surface acoustic wave touch screen, capacitive touch screen, projected capacitive touch screen, pressure-sensitive screen, acoustic pulse recognition touch screen, or another presence-sensitive screen technology; and with respect to output capabilities: liquid crystal display (LCD), light emitting diode (LED) display, organic light emitting diode (OLED) display, dot matrix display, e-ink, or similar monochrome or color display).
In at least some examples described herein, gesture-based I/O device 706 includes a touch screen device capable of receiving a haptic interaction or gesture as input from a user interacting with the touch screen. According to an example, such gestures include a tap gesture, a drag or swipe gesture, a flick gesture, a pause gesture (e.g., a user touching the same location of the screen for at least a threshold period of time), wherein the user touches or points to one or more locations of the gesture-based I/O device 706. According to an example, gesture-based I/O device 706 also receives a non-click gesture. According to an example, gesture-based I/O device 706 outputs or displays information, such as a graphical user interface, to a user. Gesture-based I/O device 706 presents various applications, functions, and capabilities of computing device 602, including, for example, applications 716 for capturing images, viewing images, processing images, and displaying new images, messaging applications, telephony communications, contact and calendar applications, web browsing applications, gaming applications, electronic book applications, and financial, payment, and other applications or functions, among others.
Although the present disclosure primarily shows and discusses gesture-based I/O device 706 in the form of a display screen device (e.g., a touch screen) with I/O capabilities, other examples of gesture-based I/O devices that detect movement, and that do not comprise a screen per se, are contemplated. In such cases, the computing device 602 includes a display screen, or is coupled to a display device, to present new images and the GUI of applications 716. According to an example, the computing device 602 receives gesture-based input from a track pad/touch pad, one or more cameras, or another presence- or gesture-sensitive input device, where presence refers to a presence aspect of a user, including, for example, motion of all or part of the user.
According to an example, the one or more communication units 708 communicate with external devices (e.g., server 606, server 608, second computing device 612) by sending and/or receiving network signals over one or more networks, such as via communication network 604, e.g., for the purposes described and/or for other purposes (e.g., printing). According to an example, the communication unit includes various antennas and/or network interface cards, chips (e.g., Global Positioning System (GPS)), etc., for wireless and/or wired communication.
According to an example, the input device 704 and the output device 710 include any of one or more buttons, switches, pointing devices, cameras, keyboards, microphones, one or more sensors (e.g., biometric sensors), speakers, bells, one or more lights, haptic (vibration) devices, etc., one or more of which are coupled via a Universal Serial Bus (USB) or other communication channel (e.g., 738). According to an example, the camera (input device 704) is front-facing (i.e., on the same side as gesture-based I/O device 706) to allow a user to capture images using the camera for "self-photographing" while viewing gesture-based I/O device 706.
According to an example, the one or more storage devices 712 take different forms and/or configurations, for example, as short-term memory or long-term memory. According to an example, the storage device 712 is configured to store information as volatile memory for a short period of time, the volatile memory not retaining stored content when powered off. Examples of volatile memory include Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), and the like. In some examples, storage device 712 also includes one or more computer-readable storage media, e.g., for storing a greater amount of information than volatile memory and/or for long-term storage of such information, which retains the information when powered down. Examples of non-volatile memory include magnetic hard disks, optical disks, floppy disks, flash memory, and forms of Electrically Programmable Read-Only Memory (EPROM) or Electrically Erasable Programmable Read-Only Memory (EEPROM).
Although not shown, according to an example, the computing device is configured as a training environment to train the neural network model 714, for example, using the network as shown in fig. 4 along with appropriate training and/or testing data.
According to an example, CNN 106/neural network model 714 has a lightweight architecture suitable for a computing device that is a mobile device (e.g., a smartphone or tablet) having fewer processing resources than a "larger" device (e.g., a laptop, desktop, workstation, server, or other comparable computing device).
According to an example, the second computing device 612 is configured similarly to computing device 602. The second computing device 612 presents a GUI, for example, to request and display images and acne diagnoses for different users from data stored at the server 606.
Figs. 8A-8B are flowcharts of operations 800 and 810, respectively, such as for computing device 602 (or 610), according to an example. Operation 800 involves a user of computing device 602 capturing a self-photograph including an image of the user's face using an application, such as application 716, to perform skin diagnostics for acne severity. At 801, an image is received at a processor, such as via a camera or other means (e.g., from a message attachment).
At 802, the image is preprocessed to define a normalized image to be presented to the CNN. The image is centered and cropped to a particular size (resolution) in various ways to present a similarly sized image to the CNN, in accordance with its training. At 803, the normalized image is processed using the CNN 106 (neural network model 714) to generate an acne vulgaris diagnosis (e.g., an integer score). Gender and ethnicity outputs are also generated. At 804, the acne diagnosis and the gender and ethnicity vectors (or individual values thereof) are presented, for example, via a GUI. According to an example, the GUI presents the image and/or normalized image and an adapted image that visualizes the acne using the described heat map. According to an example, the GUI presents the image and then transitions to presenting the adapted image (once available) that visualizes the acne.
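The preprocessing at 802 can be sketched as follows. This is a minimal illustration only: the 224x224 target resolution, the nearest-neighbor resize, and the normalization constants are assumptions for the sketch, not values specified herein; a trained CNN would use whatever resolution and statistics it was trained with.

```python
# Sketch of step 802: center-crop the image to a square, resize it to the
# CNN's expected input resolution, and standardize pixel values.
# TARGET, MEAN, and STD are illustrative assumptions (ImageNet statistics).
import numpy as np

TARGET = 224                                     # assumed CNN input resolution
MEAN = np.array([0.485, 0.456, 0.406])           # assumed channel means
STD = np.array([0.229, 0.224, 0.225])            # assumed channel std devs

def preprocess(image: np.ndarray) -> np.ndarray:
    """image: HxWx3 uint8 array -> TARGETxTARGETx3 float array, normalized."""
    h, w, _ = image.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = image[top:top + side, left:left + side]   # center crop to square
    idx = np.arange(TARGET) * side // TARGET          # nearest-neighbor sampling
    resized = crop[idx][:, idx].astype(np.float64) / 255.0
    return (resized - MEAN) / STD                     # standardize per channel
```

For example, `preprocess(np.zeros((480, 640, 3), np.uint8))` yields a 224x224x3 array ready to be presented to the CNN.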
Fig. 8B shows operation 810. At 811, a GUI (according to an example, a GUI for either of operations 800, 810) is presented to initiate product and/or treatment recommendations, and an input invoking that function is received. At 812, a recommendation is obtained. To obtain the recommendation, according to an example, the operation of the device 602 includes transmitting acne diagnostic information (e.g., score, ethnicity vector, gender vector, image, user information, etc.) to a remote server, such as server 608, to receive the recommendation. According to an example, the recommendation includes one or more products and a regimen for applying them to the skin area, and is associated with a treatment plan having a schedule. At 813, the recommendation is presented, for example, via a GUI. According to an example, more than one recommendation is received and presented. At 814, a selection indicating acceptance of a recommendation is made. According to an example, the selection is stored (recorded) and, for example, a treatment monitoring feature or function (not shown) of the computing device 602 is initiated. According to an example, at 815, a product purchase is facilitated, for example, via server 608 or another server.
Although not shown, according to an example, monitoring is responsive to a treatment plan (e.g., described in data) received by or accessible to computing device 602, e.g., via a browser. According to an example, the treatment plan has a schedule (e.g., morning and evening applications of a first product, once-a-week application of a second product, and so on). According to an example, the user is reminded of the schedule, e.g., via a notification from a local application or via another means such as a calendar application. According to an example, a GUI is provided to facilitate a treatment activity, e.g., to record its occurrence and/or to provide instructions for performing the activity. An input is received, such as a confirmation that the activity was performed. According to an example, an image is included to record the activity. According to an example, the corresponding data is recorded (locally and/or remotely). According to an example, the degree to which the treatment plan is followed is measured and monitored. According to an example, product repurchase is facilitated, for example, when, in response to the monitoring, it is determined that the amount of product on hand is about to run out.
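A treatment plan with a schedule, as described above, could be represented by a simple data structure along the following lines. The field names, the twice-daily and weekly cadences, and the `due_today` helper are illustrative assumptions, not the application's actual data model.

```python
# Sketch of a treatment plan with a schedule (e.g., morning and evening
# applications of a first product, once-a-week application of a second).
from dataclasses import dataclass
from datetime import date

@dataclass
class ScheduledProduct:
    name: str
    applications_per_day: int = 1   # e.g., 2 for morning and evening
    interval_days: int = 1          # e.g., 7 for a once-a-week product

@dataclass
class TreatmentPlan:
    products: list
    start: date

    def due_today(self, today: date):
        """Return the names of products whose schedule calls for application today."""
        elapsed = (today - self.start).days
        return [p.name for p in self.products if elapsed % p.interval_days == 0]

plan = TreatmentPlan(
    products=[ScheduledProduct("cleanser", applications_per_day=2),
              ScheduledProduct("exfoliant", interval_days=7)],
    start=date(2019, 7, 1),
)
```

A reminder feature would then query `due_today` each day to drive notifications, and an adherence measure could compare confirmed activities against the products returned.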
Although not shown, according to an example, a comparison activity is performed (e.g., as a monitoring activity). A GUI for the comparison is provided to instruct the user, etc. A new image is received and (optionally) stored (e.g., for comparison with the initial image received at 801). A subsequent acne vulgaris diagnosis (with normalization, etc., similar to operation 800) is performed on the new image using CNN 106. Using the initial and subsequent acne diagnoses, the GUI presents a comparison of the treatment results, optionally together with the first image and the new image, optionally together with one or more of the images modified using the heat map.
Although not shown, according to an example, data received or generated for operations 800, 810 and monitoring and/or comparison activities is transmitted for remote storage (e.g., to server 606).
According to an example, acne diagnosis and subsequent diagnoses (optionally together with other monitoring), and providing the data for aggregation, enable investigation of product efficacy and/or of fraudulent claims about products and treatments. According to an example, data is collected, analyzed, and presented to dermatologists and/or other professionals and/or users. Thus, the techniques and/or methods of the various examples herein facilitate a distributed research model, such as for acne skin treatment.
Fig. 9 is a flow chart of operations 900 for a computing device 602 (or 610), according to an example. Operation 900 is similar to operation 800 and involves a user of computing device 602 capturing a self-photograph including an image of the user's face using an application, such as application 716, to perform skin diagnostics for the severity of acne. In operation 900, k data augmentations are performed and k images are analyzed to produce a visualization.
Because operation 900 is similar in this example, reference numerals from operation 800 are repeated in Fig. 9. At 801, an image is received at a processor, such as via a camera or other means (e.g., from a message attachment).
At 802, the image is preprocessed to define a normalized image to be presented to the CNN. The image is centered and cropped to a particular size (resolution) in various ways to present a similarly sized image to the CNN, in accordance with its training. At 902, data augmentation is performed on the normalized image, applying k random augmentations to define k augmented images for analysis. In an example, operations 802 and 902 are reversed in order: the source image is augmented and then normalized, although this may repeat some of the operations.
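The augmentation at 902 can be sketched as follows. The particular transforms (horizontal flip, small horizontal shift) and the default k=5 are illustrative assumptions; the description only requires that k random augmentations be applied to produce k images.

```python
# Sketch of step 902: apply k random augmentations to the normalized image
# to define k augmented images for analysis.
import numpy as np

def augment_once(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]               # random horizontal flip
    shift = int(rng.integers(-4, 5))     # small random horizontal translation
    return np.roll(out, shift, axis=1)

def augment_k(image: np.ndarray, k: int = 5, seed: int = 0):
    """Return a list of k randomly augmented copies of the input image."""
    rng = np.random.default_rng(seed)
    return [augment_once(image, rng) for _ in range(k)]
```

Each of the k outputs retains the input's shape, so all k can be presented to the same CNN.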
At 904, each of the k augmented images is processed using the CNN 106 (neural network model 714) to generate an acne vulgaris diagnosis (e.g., an integer score) and k activation masks (e.g., heat maps, as described, using Grad-CAM). Gender and ethnicity outputs are also generated. At 906, a final mask is defined from the k activation masks. In this example, the k activation masks are averaged and the described thresholds are applied to generate the final mask/heat map as a visualization of the skin analysis. At 908, the acne diagnosis is presented, for example, via a GUI, with the visualization relative to the original image. In an example, the heat map is overlaid on the original image to visualize the analysis/diagnosis. Optionally, gender and ethnicity vectors (or individual values thereof) are presented with the visualization.
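The mask-combining step at 906 can be sketched as follows. The 0.5 threshold is an assumption for illustration; the description only states that the k masks are averaged and the described thresholds are applied.

```python
# Sketch of step 906: average the k activation masks and threshold the
# mean to form the final mask/heat map.
import numpy as np

def final_mask(masks, threshold: float = 0.5) -> np.ndarray:
    """masks: list of k HxW activation maps in [0, 1] -> thresholded mean mask."""
    mean = np.mean(masks, axis=0)                  # average over the k masks
    return np.where(mean >= threshold, mean, 0.0)  # suppress weak activations
```

Averaging over augmented views smooths out activations that appear in only a few of the k masks, and the threshold then removes the remaining weak responses before visualization.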
Figs. 10A and 10B illustrate respective source images 1000, 1002 and visualizations 1004, 1006, 1008, and 1010 according to an example skin analysis. Each of the first modified face images 1004 and 1006 and each of the second modified face images 1008 and 1010 comprises the respective original face image 1000, 1002 overlaid (e.g., superimposed) with a respective Grad-CAM heat map generated using the averaging technique. The first modified face images 1004 and 1006 display a locally normalized Class Activation Map (CAM), normalized within the image, and the second modified face images 1008 and 1010 display a globally normalized CAM, normalized across all of the images. Referring to Fig. 10B, the skin condition in the original image 1002 appears most severe in the central forehead area, less severe around the mouth and upper chin, and less severe again above the nose. The visualizations 1006 and 1010 obscure the skin details of the original image 1002 to highlight the skin regions using the heat map. In one embodiment, a color overlay is used. Color and/or gradation, etc., are aligned with the severity result (e.g., the skin analysis integer value). The heat maps of visualizations 1006 and 1010 each highlight areas of the skin that exhibit the skin condition, and minimize other areas of the skin (e.g., cheeks, chin) in the original image 1002 that do not exhibit the skin condition. For example, cheek regions are darkened and regions presenting the skin condition are highlighted, relative to scores normalized using local or global data. Fig. 10A shows a similar visualization, with the skin condition most severe in the forehead area. Within the portions showing the skin condition, the highlighting may be at a different granularity. For example, in the forehead areas of 1004 and 1008 or 1006 and 1010, the portion of the area where the skin condition is detected is highlighted according to its severity.
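An overlay of the kind shown in Figs. 10A-10B can be sketched as follows: strongly activated regions are tinted while weakly activated regions are darkened. The red tint, the 0.6 blend factor, and the 0.4 darkening floor are illustrative assumptions; the actual colors and gradations are aligned with the severity result.

```python
# Sketch of overlaying a heat map on the original image: tint high-activation
# regions (red here) and darken low-activation regions (e.g., cheeks, chin).
import numpy as np

def overlay(image: np.ndarray, mask: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """image: HxWx3 floats in [0,1]; mask: HxW in [0,1] -> blended HxWx3."""
    heat = np.stack([mask, np.zeros_like(mask), np.zeros_like(mask)], axis=-1)
    dim = image * (0.4 + 0.6 * mask[..., None])   # darken low-activation areas
    m = mask[..., None]
    return np.clip((1 - alpha * m) * dim + alpha * m * heat, 0.0, 1.0)
```

Where the mask is zero, the original pixels are simply dimmed, which produces the effect described above of obscuring skin details outside the regions exhibiting the skin condition.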
In the present disclosure, a method of training CNNs on an acne severity assessment regression task using only integer labels is described in one or more examples. Whereas previous work typically involved complex procedures and specific image requirements, the described end-to-end acne assessment model can be applied to images captured by mobile devices and can be used in real time via mobile or web applications. With an appropriate loss function and training techniques, results 3% better than those of comparable prior work were obtained.
In addition to the computing device aspects shown in one or more examples, one of ordinary skill will appreciate that computer program product aspects are disclosed in which instructions are stored in a non-transitory storage device (e.g., memory, CD-ROM, DVD-ROM, optical disk, etc.) to configure the computing device to perform any of the method aspects described herein.
It will be appreciated that the computing device includes circuitry, such as a processing unit coupled to a memory unit. Such circuitry configures the computing device to provide various features and functions and/or to perform applicable methods. A circuit may be considered to define (at least logically) a corresponding functional unit. Examples of functional units are a skin analysis unit and/or a visualization unit, etc., having the features described herein. Others will be apparent. In one embodiment, a skin analysis unit is provided for classifying pixels of an image using a deep neural network comprising a regressor and a classifier for image classification to generate the skin diagnosis for a skin condition.
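The idea of a shared feature extractor feeding both a regressor and a classifier, as in the skin analysis unit above, can be sketched as follows. The weights here are random placeholders and the 5-grade output is an assumption; a real implementation would use a trained lightweight CNN backbone such as MobileNetV2 [12].

```python
# Sketch of a skin analysis unit: shared features feed a regression head
# (continuous severity) and a classification head (integer grade).
# All weights are random placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((16, 8))    # placeholder backbone projection
W_reg = rng.standard_normal((8, 1))      # regression head -> severity value
W_cls = rng.standard_normal((8, 5))      # classification head -> 5 grades

def analyze(x: np.ndarray):
    feat = np.tanh(x @ W_feat)               # shared features
    severity = (feat @ W_reg).item()         # continuous severity estimate
    logits = feat @ W_cls
    probs = np.exp(logits - logits.max())    # softmax over grade logits
    probs /= probs.sum()
    grade = int(np.argmax(probs))            # integer acne grade
    return severity, grade, probs
```

Sharing the features between the two heads mirrors the document's pairing of a regressor with a classifier on the same deep network, rather than training two independent models.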
Practical implementations may include any or all of the features described herein. These and other aspects, features, and various combinations may be expressed as methods, apparatus, systems, components, program products, and in other ways, for performing the functions. Many embodiments have been described. However, it will be appreciated that various modifications may be made without departing from the spirit and scope of the processes and techniques described herein. Further, other steps may be provided within the described processes, or steps may be eliminated from them, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.
Throughout the description and claims of this specification, the words "comprise" and "comprising" and variations thereof mean "including but not limited to" and are not intended to (nor do) exclude other elements, integers or steps. Throughout the specification, the singular encompasses the plural unless the context requires otherwise. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integer features, compounds, chemical moieties or groups described in connection with a particular aspect, embodiment or example of the invention are to be understood as applicable to any other aspect, embodiment or example unless incompatible therewith. All of the features disclosed herein (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not limited to the details of any of the foregoing examples or embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
References
1. Abas, F.S., Kaffenberger, B., Bikowski, J., Gurcan, M.N.: Acne image analysis: lesion localization and classification. In: Medical Imaging 2016: Computer-Aided Diagnosis. vol. 9785, p. 97850B. International Society for Optics and Photonics (2016)
2. Alamdari, N., Tavakolian, K., Alhashim, M., Fazel-Rezai, R.: Detection and classification of acne lesions in acne patients: A mobile application. In: 2016 IEEE International Conference on Electro Information Technology (EIT). pp. 0739-0743. IEEE (2016)
3. Dréno, B., Poli, F., Pawin, H., Beylot, C., Faure, M., Chivot, M., Auffret, N., Moyse, D., Ballanger, F., Revuz, J.: Development and evaluation of a Global Acne Severity scale (GEA scale) suitable for France and Europe: Global acne assessment scale. Journal of the European Academy of Dermatology and Venereology: JEADV 25, 43-48 (2010). URL doi.org/10.1111/j.1468-3083.2010.03685.x
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770-778 (2016)
5. Jiang, R., Kezele, I., Levinshtein, A., Flament, F., Zhang, J., Elmoznino, E., Ma, J., Ma, H., Coquide, J., Arcin, V., Omoyuri, E., Aarabi, P.: A new procedure, free from human assessment, that automatically grades some facial skin structural signs. Comparison with assessments by experts, using referential atlases of skin aging. International Journal of Cosmetic Science 41 (2019). URL doi.org/10.1111/ics.12512
6. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014). URL arxiv.org/abs/1412.6980
7. Malik, A.S., Ramli, R., Hani, A.F.M., Salih, Y., Yap, F.B.B., Nisar, H.: Digital assessment of facial acne vulgaris. In: 2014 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) Proceedings. pp. 546-550. IEEE (2014)
8. Maroni, G., Ermidoro, M., Previdi, F., Bigini, G.: Automated detection, extraction and counting of acne lesions for automatic evaluation and tracking of acne severity. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI). pp. 1-6. IEEE (2017)
9. Melina, A., Dinh, N.N., Tafuri, B., Schipani, G., Nisticò, S., Cosentino, C., Amato, F., Thiboutot, D., Cherubini, A.: Artificial intelligence for the objective evaluation of acne investigator global assessment. Journal of Drugs in Dermatology: JDD 17(9), 1006-1009 (2018)
10. Pan, H., Han, H., Shan, S., Chen, X.: Mean-variance loss for deep age estimation from a face. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5285-5294 (2018)
11. Rothe, R., Timofte, R., Van Gool, L.: DEX: Deep expectation of apparent age from a single image. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. pp. 10-15 (2015)
12. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted residuals and linear bottlenecks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018)
13. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 618-626 (2017)
14. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2921-2929 (2016)
15. J.P.-O., N.V.-A., A.R.-A., J.D.-C., J. Alfredo Padilla-Medina, Francisco León-Ordoñez: Assessment technique for acne treatments based on statistical parameters of skin thermal images. Journal of Biomedical Optics 19(4), 1-8 (2014). doi:10.1117/1.JBO.19.4.046019. URL doi.org/10.1117/1.JBO.19.4.046019
16. La Roche-Posay: What is Effaclar Spotscan? (2019). URL www.laroche-posay.co.uk/what-is-effaclar-spotscan, last accessed 2019-06-25