For the United States, this application claims domestic benefit of, and for all other jurisdictions claims priority to, the following prior applications: 1) U.S. provisional application No. 62/872,347, filed on July 10, 2019; and 2) U.S. provisional application No. 62/878,464, filed on July 25, 2019. Where applicable, the entire contents of each of the prior applications are incorporated herein by reference.
Detailed Description
Acne vulgaris is a common skin condition that affects up to 85% of the population, especially adolescents, at some point in their lives. To assess the severity of acne, individuals need to visit dermatologists and clinicians and rely on their expertise in this area. The physician must physically and manually examine the patient and give a rough grade based on lesion count, affected area, and other relevant factors. This approach is often time consuming and laborious, and may also lead to unreliable and inaccurate results due to human error. When repeated examinations are required over a period of time, excessive effort is demanded of the doctor.
To minimize the manpower required for this task, many studies have explored computer-aided techniques to assess the severity of acne. Many works in this area, such as [1,7], require processing of high-standard medical images by algorithms that are difficult to deploy on mobile systems. Later work, including [8,2], introduced methods for processing images taken by mobile phones through a number of different steps. However, all of this work has focused on acne localization and lesion counting [8,1,2,7]. This involves long pipelines of conventional image processing techniques, such as blob detection and feature extraction, that output masks or lesion region localizations. The severity of the acne (i.e., the acne score) is then calculated by a formula based on the localization results and the number of lesions detected. One major limitation of such methods is that the accuracy of the acne score is coupled to the accuracy of the acne localization and lesion counts. In some cases, lighting conditions and skin color can increase the error rate at various stages of the process, thereby significantly affecting the final result.
In a recent work on acne assessment [9], the authors achieved significant results without lesion counting by using neural networks. They showed that neural networks can perform very accurately as long as suitable image data is presented.
However, their approach requires a special type of medical image, which forces the user to sit in front of the camera device and assume 5 specific poses during training and testing. This type of evaluation also limits use on mobile devices. Another work [16] is adapted to images taken by mobile phones, but requires multiple iterations of manual correction.
According to an example, a training technique for Convolutional Neural Networks (CNNs) is extended. According to an example, a method is derived to accommodate the nature of such a grading problem: a regression task with integer labels only. Thus, according to an example, the system is designed to have one regression target and another auxiliary classification target during training. According to an example, gender prediction and race prediction are added as two additional auxiliary tasks. Experiments on these tasks have shown that performance is improved after the tasks are introduced. Further, according to an example, unlike many other medical imaging efforts, the model is trained and tested on a selfie dataset made up of facial images taken by a mobile device, and it is demonstrated that the end-to-end model works accurately on in-the-wild selfies. According to an example, the model is used on a mobile device by uploading only one single image. According to an example, the model outperforms similar work [16] by 3% in terms of acne grading accuracy. Finally, according to an example, Grad-CAM [13] is used as a visualization tool to show the interpretability of the CNN model.
Data set
According to one embodiment, the raw dataset consists of 5971 images collected from 1051 subjects of five different ethnicities, wherein a mobile phone took three images of each subject: from the front and from both side views. Three dermatologists assigned each subject an integer score from 0 to 5 using the GEA criteria [3], according to their expert evaluation of the corresponding images. For the score model, a dataset of 1877 frontal images was used. The ground truth is defined as the majority score of the three dermatologists' scores. The dataset was randomly divided into training (80%), testing (10%), and validation (10%) subsets.
Model structure
In previous work [5,11], modern deep learning architectures such as ResNet [4] and MobileNetV2 [12] have demonstrated excellent ability to learn detailed skin features. A typical approach would be to classify or regress against a suitable objective function by adding several fully connected layers on top of a pre-trained feature network (e.g., ResNet) via transfer learning. However, since the acne score is represented by consecutive integers, an auxiliary classification loss is introduced alongside the regression loss. The inspiration for this idea comes from the results of [11,10] on age regression tasks, where a similar situation applies. FIG. 1 is a schematic diagram of a storage device 100 of a computing device providing a deep learning system, according to one embodiment. The storage device includes memory (RAM/ROM) or the like, for example, for providing instructions to a processing unit, which in the example is a graphics processing unit or a processing unit of a device such as a mobile device or server.
A face and landmark detector 102 is provided to receive an image comprising pixels of a face (not shown) for processing. A face normalization component 104 is also provided to output a normalized face image. Components 102 and 104 pre-process the images to provide a normalized face image to CNN 106. CNN 106 includes an encoder component 108 (which, according to this example, includes a residual network (e.g., ResNet) encoder), a global pooling operation component 110, and decoder or predictor components 112, 114, and 116. The fully connected layer 112 provides corresponding regressor and classifier outputs 118. Gender prediction component 114 generates gender output 120 and race prediction component 116 generates race output 122. Dashed lines 124 and 126 schematically illustrate the back-propagation of the regression operation's loss function (line 124 represents the mean squared error regressor loss function) and the classification operation's loss function (line 126 represents the cross entropy classifier loss function) to the CNN input layer of encoder 108, respectively, as described further below.
Therefore, at test time, the CNN model (sometimes referred to as the "model") takes as input a normalized face image and outputs a vector y = f_θ(x) that represents a probability distribution over all possible integer scores of the acne vulgaris diagnostic scale. The final score is then calculated as the softmax expected value (later rounded to an output integer):

score = round( Σ_{i=a}^{b} i · y_i )   (Eq. 1)

where a and b are the lower and upper bounds of the score range (e.g., the scale).
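As a minimal sketch of this expected-value scoring step (the probability vector below is illustrative, and the bounds a = 0, b = 5 follow the GEA scale used for the dataset):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def final_score(probs, a=0, b=5):
    """Softmax expected value over integer scores a..b, rounded to an integer."""
    expected = sum(score * p for score, p in zip(range(a, b + 1), probs))
    return round(expected)

# Illustrative probability distribution over scores 0..5
probs = [0.05, 0.20, 0.50, 0.15, 0.07, 0.03]
score = final_score(probs)  # expected value 2.08, rounds to 2
```

Taking the expected value rather than the argmax lets the regression loss act on a continuous quantity even though the labels themselves are integers.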
To construct and train the network, according to an example, a feature extractor is constructed by adapting an existing pre-trained image processing network. More specifically, a generic CNN defined using residual network techniques is employed. The feature extractor is defined by cutting ResNet50 at the average pooling layer, which defines CNN encoder 108. A global maximum pooling layer defining the CNN global pooling component 110 is added after the last convolution block of the CNN encoder 108. After extracting features using these components, the features are further processed using two additional fully connected layers (with a leaky ReLU added between them) defining fully connected layer 112. In addition, two branches are added to help the network learn better on this cross-ethnic and cross-gender dataset, namely gender prediction block 114 with output 120 and race prediction block 116 with output 122. Experimental results from adding the two branches are discussed further below. It will be appreciated that, according to an example not shown, a base CNN model other than ResNet50 is employed (e.g., for image processing); for example, MobileNet variants are adapted, etc. According to an example, it is desirable to employ a CNN model configured for mobile devices to suit business needs. It should be understood that the examples herein, including metrics, relate to an adapted ResNet50 network.
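As a sketch only of the head structure on top of the pooled features (the hidden layer size, bias-free weights, and random initialization here are hypothetical stand-ins, not the trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: a 2048-d feature vector (ResNet50 after global
# pooling), a 512-d hidden layer, 6 acne score classes (0..5), 2 gender
# classes, and 5 ethnicity classes.
D_FEAT, D_HID, N_SCORES, N_GENDERS, N_RACES = 2048, 512, 6, 2, 5

W1 = rng.standard_normal((D_HID, D_FEAT)) * 0.01     # first FC layer
W2 = rng.standard_normal((N_SCORES, D_HID)) * 0.01   # second FC layer
Wg = rng.standard_normal((N_GENDERS, D_FEAT)) * 0.01 # gender branch
Wr = rng.standard_normal((N_RACES, D_FEAT)) * 0.01   # race branch

def forward(feat):
    """Score head: two FC layers with a leaky ReLU in between (output 118);
    auxiliary gender (output 120) and race (output 122) branches share the
    same pooled features."""
    h = leaky_relu(W1 @ feat)
    return softmax(W2 @ h), softmax(Wg @ feat), softmax(Wr @ feat)

score_p, gender_p, race_p = forward(rng.standard_normal(D_FEAT))
```

The key design point is that the auxiliary branches branch off the shared features, so their gradients regularize the encoder without affecting the score head's capacity.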
Learning
According to an example, CNN 106 is trained with four tasks: acne score regression, acne score classification, gender prediction, and race prediction. CNN 106 (its framework) is trained by optimizing the following objective (defined by the combined loss function):

L = λ_mse · L_mse + λ_ce · L_ce + λ_gender · L_gender + λ_race · L_race   (Eq. 2)

More specifically, L_mse = (1/N) Σ_{i=1}^{N} (f_θ(x_i) − y_i)², where N is the training batch size and y_i is the ground-truth label.
According to an example, score classification loss helps the network learn a better probability distribution by minimizing cross entropy error on the score classification probability output. In particular, this loss encourages the model to output the correct score class before calculating the expected value. As shown in fig. 1, the regression loss is back-propagated from the final output (shown as line 124) to the input layer (not specifically shown) of the CNN encoder 108, while the classification loss is back-propagated from the probability distribution (shown as line 126) to the input layer.
According to an example, the gender prediction and race prediction losses for the cross-gender, cross-race dataset are calculated using cross entropy errors and act as regularization terms. The two losses are also back-propagated (not shown) from their respective prediction layers to the input layer, in a manner similar to line 126.
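A minimal pure-Python sketch of such a combined objective — an MSE regression loss plus cross-entropy terms for score classification and the two auxiliary tasks, with the λ weights from the Experiment section as defaults (the helper names are illustrative, not from the source):

```python
import math

def mse_loss(pred_scores, true_scores):
    """Mean squared error of the regression branch over a batch of size N."""
    n = len(pred_scores)
    return sum((p - t) ** 2 for p, t in zip(pred_scores, true_scores)) / n

def cross_entropy(prob_rows, true_labels, eps=1e-12):
    """Mean cross entropy given per-sample probability rows and label indices."""
    n = len(prob_rows)
    return -sum(math.log(row[t] + eps)
                for row, t in zip(prob_rows, true_labels)) / n

def combined_loss(pred_scores, score_probs, gender_probs, race_probs,
                  y_score, y_gender, y_race,
                  l_mse=1.0, l_ce=1.0, l_gender=0.001, l_race=0.001):
    """Weighted sum of the four task losses; because the score labels are
    integers, they serve directly as class indices for the classification term."""
    return (l_mse * mse_loss(pred_scores, y_score)
            + l_ce * cross_entropy(score_probs, y_score)
            + l_gender * cross_entropy(gender_probs, y_gender)
            + l_race * cross_entropy(race_probs, y_race))
```

With perfect predictions all four terms vanish; the small auxiliary weights keep gender/race learning from dominating the score objective.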
Experiment
Details of implementation
According to an example, for each image, landmarks are detected, for example by using a 60-point face tracker (no contour points), and a rectangle of the face region is cut out from the input image at training time with a certain randomness. For each face frame (face crop), randomness is applied over the training dataset of unique images to generate further images, for example by moving the outermost points (e.g., topmost, leftmost, rightmost, and bottommost) outward in their corresponding directions by random values in the ranges [0.08, 0.1) of the height, [0.2, 0.3) of the width, [0.08, 0.1) of the height, and [0.07, 0.08) for the bottom. Thus, the face frame is cropped after expansion (the example shown in FIG. 2 shows the input image and face frame 202). For purposes of this disclosure, a privacy mask 204 is provided to obscure the user's identity.
To further augment the source data, according to an example, random rescaling is performed in the range [0.8, 1.0], with a probability of 0.5 for random horizontal flipping. Each cropped image is resized to 334x448, and the augmented images are normalized on the RGB channels to be centered at [0.485, 0.456, 0.406] with standard deviations of [0.229, 0.224, 0.225]. According to an example, Adam [6] is used to optimize the CNN, with a learning rate of 0.0001. According to an example, best performance is achieved using the loss function of Eq. 2, with λ_mse = 1.0, λ_ce = 1.0, λ_gender = 0.001, λ_race = 0.001, and ResNet50 as the backbone feature network (i.e., the components defining CNN encoder 108).
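The per-channel normalization step can be sketched as follows (a minimal illustration using the stated mean and standard deviation values; the constant-gray input image is synthetic):

```python
import numpy as np

# Per-channel RGB statistics stated above (the common ImageNet values)
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """img: H x W x 3 float array with RGB values in [0, 1].
    Center each channel at its mean and scale by its standard deviation."""
    return (img - MEAN) / STD

# A 448 x 334 gray image (all 0.5), normalized channel-wise
img = np.full((448, 334, 3), 0.5)
out = normalize(img)
```

Broadcasting applies the 3-element mean and std vectors across every pixel of the H x W x 3 array in one expression.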
Evaluation of
As previously mentioned, clinically, acne assessment is a regression task with only integer labels. Thus, the mean absolute error and the percentage of test samples within certain error thresholds are reported. For example, the percentage of errors within 0.5 is also the classification accuracy. As a result, according to an example, the mean absolute error of the model was 0.35, and the classification accuracy was 71%. In Table 1, the results are compared with previous work on the same acne assessment dataset. Mean absolute error is reported to summarize the regression results, and the percentages of errors within the 0.5 and 1.0 ranges are reported to show the level of classification accuracy. As a result, according to an example, the proposed CNN outperforms the methods of previous work in overall classification accuracy. In [16], expert performance was reported to be 67% in terms of inter-expert agreement, establishing a baseline.
Table 1 overall results using only frontal images and all three photographs
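The reported metrics — mean absolute error and the percentage of samples within an error threshold — can be sketched as follows (the example predictions and labels are hypothetical):

```python
def mae(preds, labels):
    """Mean absolute error over the test samples."""
    return sum(abs(p - t) for p, t in zip(preds, labels)) / len(preds)

def within_threshold(preds, labels, thr):
    """Fraction of samples whose absolute error is at most thr.
    With integer labels, thr = 0.5 coincides with classification accuracy."""
    hits = sum(1 for p, t in zip(preds, labels) if abs(p - t) <= thr)
    return hits / len(preds)

# Hypothetical continuous predictions vs. integer ground-truth scores
preds = [1.2, 2.0, 3.6, 0.9]
labels = [1, 2, 3, 2]
```

Here mae(preds, labels) is 0.475, the within-0.5 fraction is 0.5, and the within-1.0 fraction is 0.75, mirroring the three columns reported in Table 1.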
One of the common challenges is to achieve the correct balance between the different categories. The overall accuracy is generally strongly correlated with the performance on the majority classes. In acne assessment using a common scale, integer scores 1 and 2 are typically the majority categories for such problems. On the other hand, given the size of the data and the original score definitions, score 0 (no acne) is also very difficult to distinguish from score 1 (almost no acne). FIG. 3 is a graph 300 showing accuracy by score (e.g., label vs. prediction) according to an example. In graph 300, the accuracy profile of the method is shown to be well balanced for scores 1 through 4, with category 0 sometimes misclassified.
Ablation study
As described, according to an example, a learning method that combines regression with classification learning is employed in an effort to improve the accuracy of regression tasks with integer labels in acne assessment. This section includes a discussion and comparison of the following methods: 1) training a regression branch that directly outputs the score with MSE loss (denoted REG); 2) training a classification branch with cross entropy loss (denoted CLS); 3) calculating the output from the probability output of the classification branch and training with MSE loss (denoted REG via CLS); 4) the proposed method (denoted REG+CLS) discussed in Section 2.
In Table 2, according to an example, the mean absolute error (MAE) and classification accuracy of the 4 different training targets are shown. It can be seen that treating the skin analysis problem as a pure regression task achieves a 68% result on score classification, which is higher than when the problem is expressed as a pure classification task. According to an example, the proposed training technique outperforms all other training methods, with the lowest MAE and the highest classification accuracy. All results in Table 2 were trained with the gender and race branches.
Table 2 results of four different methods
Adding help branches
According to an example, in a cross-ethnic and cross-gender dataset, skin characteristics vary with gender and ethnicity. According to an example, it is shown that by adding gender prediction and race prediction as auxiliary tasks, the overall performance is improved. In Table 3, the baseline approach refers to training with the classification and regression tasks but without the added gender- and race-prediction branches. The other three columns are the results of adding the corresponding branches, according to an example. The introduction of the auxiliary tasks notably improves the performance of the model, improving the classification accuracy by 7.2 percentage points and reducing the mean absolute error by 0.03.
TABLE 3 results when adding help branches
Visualization of
Despite the significant progress of CNNs in numerous visual tasks, in many cases such networks do not give a direct visual interpretation of their predictions. Recent work, such as Class Activation Maps (CAM) [14, incorporated herein by reference] and gradient-weighted class activation maps (Grad-CAM) [13, incorporated herein by reference], has proposed methods to visualize such an interpretation for each prediction. Interpretability, especially for research work deployed in industry, is one of the key factors in establishing trust between the system and the user.
Fig. 4 shows a result 400 of a visualization operation according to an example, with eight user images arranged in respective cells of two columns and four rows, one cell per user. Each input image is modified in response to the acne vulgaris diagnosis. FIG. 4 shows activation maps (heat maps) generated from CNN 106 (the model) using Grad-CAM [13]. FIG. 4 shows different class activation maps, from class 1 (light) in the first or top row to class 4 (severe) in the bottom or fourth row. Further, each cell displays an original face image (left image), a first modified face image (middle image) generated using Grad-CAM, and a second modified face image (right image) generated using Grad-CAM. Each of the first and second modified face images comprises the original face image overlaid with a heat map generated by the corresponding Grad-CAM, wherein the first modified face image displays a locally normalized class activation map (CAM), normalized within the image, and the second modified face image displays a globally normalized CAM, normalized across the dataset. It should be appreciated that the pixels of the original image analyzed by the CNN are modified (e.g., RGB channel values are adjusted) using the heat map to visualize or highlight areas of the original image in response to the severity of the detected acne. In one example, the severity is locally normalized within the original image. In another example, the severity is normalized across the images in the dataset. For example, the visualization is configured to generate heat map weights based on gradients defined/determined in the last convolutional layer of the trained network. The final heat map is then normalized within the image or across the dataset. Although FIG. 4 shows an image array for a plurality of different faces, in one embodiment, one or more images of a single (user) face are visually presented.
Grad-CAM is just one example of a visualization method. In various embodiments, other visualization methods, such as guided back propagation, are also applied to the model.
To accommodate this regression task, in one embodiment, the gradient of class 0 (no acne) with respect to the feature map A is negated to obtain a counterfactual explanation [13], as shown in Eq. 4:

α = −(1/(W·H)) Σ_{i=1}^{W} Σ_{j=1}^{H} ∂y⁰/∂A_{ij}   (Eq. 4)

where α is the weight used to form the heat map values of the corresponding pixels relative to class 0, W is the width, H is the height, and y⁰ is the output value for class 0. Based on this equation, decreasing the higher values in the activation map results in an increase in the probability of class 0 (no acne).
As shown in fig. 4, the acne-affected areas have higher activation values than healthy (unaffected) skin. Furthermore, when compared globally, the activation values for severe cases (e.g., category 4) tend to be much higher than those for light cases (e.g., category 1). In one example, the CNN thus produces an acne vulgaris diagnosis that includes a score and an area (e.g., of the face) associated with the score.
Fig. 5A and 5B are graphs 500 and 502 showing the relationship between activation value and prediction score for each image, according to an embodiment. Interestingly, there is a positive correlation between the "affected area" and the final output score. The resulting images provide a clear and interpretable visualization of the acne assessment model.
According to an example, visualization of the results using Grad-CAM is performed on each respective source image using an averaging technique. A method is performed, for example by a computing device, to receive a source image for analysis, perform k random data augmentations on the source image to produce k augmented images for analysis, analyze the k augmented images using a CNN compatible with the described Grad-CAM technique to produce k activation masks/maps, and then average the k activation maps to produce a final mask (e.g., by summation and division). In one embodiment, the final mask is subjected to a threshold. For example, areas with values less than a threshold (e.g., 0.4) are removed/eliminated from the heat map.
According to an example, the augmentations include affine transformations (e.g., rotation, scaling, translation) and random horizontal flipping. According to an example, the augmentations include color augmentation. According to experiment, improvement starts at k=2 and stabilizes at k=20. According to an example, the source image is analyzed without augmentation as one of the k augmented images (e.g., a null augmentation).
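The averaging and thresholding of the k activation maps described above can be sketched as follows (the 2x2 maps are toy values; 0.4 is the example threshold given above):

```python
import numpy as np

def average_masks(masks, threshold=0.4):
    """Average k activation maps (each H x W, values in [0, 1]) and zero
    out regions below the threshold, producing the final mask."""
    final = np.mean(np.stack(masks), axis=0)
    final[final < threshold] = 0.0
    return final

# Two toy activation maps from two augmented copies of the same image
m1 = np.array([[0.9, 0.1], [0.5, 0.3]])
m2 = np.array([[0.7, 0.3], [0.5, 0.1]])
out = average_masks([m1, m2])
```

Averaging across augmentations smooths out activations that appear under only one crop or flip, and the threshold then removes the weak residue.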
While the CNN model is also configured to output the score and the gender and race vectors for each of the k processed augmented images, according to an example, these outputs (score, gender, and race vectors) are obtained and used from the processing of only one of the k augmented images. Data augmentation and averaging help refine the mask, but no meaningful variation in the score output is expected across the augmentations. The scores (or other class outputs) from processing each of the k augmented images are therefore not averaged; a single value among them is used.
According to an example, the accuracy of the mask was tested by comparison with a mask calculated using the underlying truth coordinates of the acne lesions. For example, the mask is output by aggregating all circles centered on the acne lesion coordinates.
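A minimal sketch of building such a reference mask from ground-truth lesion coordinates (the image size, coordinates, and radius below are hypothetical):

```python
import numpy as np

def lesion_mask(shape, centers, radius=5):
    """Reference mask: the union of circles centered at ground-truth lesion
    coordinates (row, col), used to check the Grad-CAM mask against."""
    h, w = shape
    rr, cc = np.mgrid[0:h, 0:w]  # per-pixel row/column index grids
    mask = np.zeros(shape, dtype=bool)
    for (r, c) in centers:
        mask |= (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
    return mask

# Two hypothetical lesions on a 20 x 20 image
m = lesion_mask((20, 20), [(5, 5), (12, 14)], radius=3)
```

The resulting boolean mask can then be compared against the thresholded Grad-CAM mask, e.g., via overlap measures.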
FIG. 6 is a block diagram of an example computer network 600 in which a personal-use computing device 602 operated by a user 604 communicates with remotely located server computing devices (i.e., server 606 and server 608) via a communication network 604. According to an example, user 604 is a consumer and/or a patient of a dermatologist. Also shown are a second user 610 and a second computing device 612 configured for communication via the communication network 604. According to an example, the second user 610 is a dermatologist. The computing device 602 is for personal use by a user and is not available to the public. However, services from the servers are available to the public. Here, the public includes registered users and/or clients, and the like.
According to an example, the computing device 602 is configured to perform skin diagnostics as described herein, i.e., to evaluate acne severity, e.g., to provide an acne vulgaris diagnosis. According to an example, CNN 106 is stored and utilized on the computing device 602. According to an example, the server 606 provides CNN 106 as a service, processing images received from the computing device 602, e.g., via a cloud service, web service, or the like.
According to an example, the computing device 602 is configured to communicate with the server 608 to provide acne diagnosis information and receive product/treatment recommendations, for example, in response to skin diagnosis and/or other information about the user (e.g., age, gender, etc.). According to an example, the computing device 602 is configured to transmit skin diagnostic information (which may include image data) to either or both of the servers 606 and 608, for storage in a data store (not shown), for example. According to an example, server 608 (or another service not shown) provides an e-commerce service to sell recommended products.
In the example of fig. 6, computing device 602 is shown as a handheld mobile device (e.g., a smartphone or tablet). However, according to an example, the computing device 602 is another form or type of computing device, such as a notebook computer, desktop computer, workstation, etc. (e.g., with greater processing resources). According to an example, skin diagnostics as described herein are implemented on other computing device types. According to an example, the computing device 602 is configured using one or more native applications or browser-based applications, for example.
According to an example, the computing device 602 is a user device, for example, to obtain one or more images (e.g., pictures of skin, particularly faces) and process the images to provide skin diagnostics. According to an example, skin diagnostics are performed in association with (an execution activity of) a skin treatment plan, wherein images are periodically acquired and analyzed to determine skin scores, e.g., for acne as described. Scores are stored (locally, remotely, or both) and compared between sessions, e.g., to show trends, improvements, etc. According to an example, the user 604 of the computing device 602 may access the skin scores and/or skin images. According to an example, the skin scores and/or skin images may be made available (e.g., via server 606 or via communication network 604) to another user (e.g., second user 610, such as a dermatologist) of computer system 600. According to an example, the second computing device 612 is configured to perform the described skin diagnostics. The second computing device 612 receives images from a remote source (e.g., computing device 602, server 606, server 608, etc.) and/or captures images via an optical sensor (e.g., a camera) coupled thereto or in any other manner. As described, CNN 106 is stored and used on the second computing device 612 or from server 606.
According to an example, an application is provided to perform skin diagnostics, suggest one or more products, and monitor skin changes after one or more product applications (which define one or more treatment phases in a treatment plan) over a period of time. According to an example, a computer application provides a workflow such as a series of instructional Graphical User Interfaces (GUIs) and/or other user interfaces that are typically interactive and receive user input to perform any of the following activities:
skin diagnostics, such as acne;
product recommendations, such as treatment plans;
product procurement or other acquisition;
alert, instruct and/or record (e.g. log) the product application of the corresponding treatment phase;
subsequent (e.g., one or more follow-up) skin diagnosis; and
presenting results (e.g., comparison results);
for example, monitoring the progress of a skin treatment plan according to a treatment plan schedule. According to an example, any of these activities generate remotely stored data, such as for viewing by user 610, for viewing by another person, for aggregation with data of other users to measure treatment plan efficacy, and so forth.
According to an example, the comparison results (e.g., previous and subsequent results) are presented via the computing device 602, whether during the treatment plan and/or upon completion of the treatment plan, etc. As noted, according to an example, various aspects of skin diagnostics are performed on the computing device 602 or by a remotely coupled device (e.g., a server in the cloud or another arrangement).
Fig. 7 is a block diagram of a computing device 602 in accordance with one or more aspects of the present disclosure. The computing device 602 includes one or more processors 702, one or more input devices 704, gesture-based I/O devices 706, one or more communication units 708, and one or more output devices 710. The computing device 602 also includes one or more storage devices 712 that store one or more modules and/or data. According to an example, the modules include a deep neural network model 714 (e.g., from CNN 106), an application 716 with components for a graphical user interface (GUI 718) and/or workflow for therapy monitoring (e.g., therapy monitor 720), an image acquisition 722 (e.g., interface), and a therapy/product selector 730 (e.g., interface). According to an example, the data includes one or more images for processing (e.g., image 724), skin diagnostic data (e.g., corresponding score, race, gender, or other user data), treatment data 728 (e.g., log data related to a particular treatment), a treatment plan with a schedule (e.g., for reminder), and so forth.
According to an example, the application 716 provides functionality to acquire one or more images (e.g., video) and process the images to determine skin diagnostics of the deep neural network provided by the neural network model 714. According to an example, the neural network model 714 is configured as the model shown in fig. 1, as described above. In another example, the neural network model 714 is located remotely and the computing device 602 communicates the image via the application 716 for processing and return of the skin diagnostic data. According to an example, application 716 is configured to perform these previously described activities.
According to an example, storage 712 stores additional modules, such as an operating system 732 and other modules (not shown) including a communication module; a graphics processing module (e.g., a GPU for the processor 702); a map module; a contact module; a calendar module; a photo/gallery module; photo (image/media) editing; a media player and/or streaming media module; social media applications; a browser module, etc. A memory device is sometimes referred to herein as a memory cell.
According to an example, the communication channel 738 couples each of the components 702, 704, 706, 708, 710, 712 to any of the modules 714, 716, and 732 for inter-component communication, whether communicatively, physically, and/or operatively. In some examples, communication channel 738 includes a system bus, a network connection, an interprocess communication data structure, or any other method for communicating data.
According to an example, the one or more processors 702 implement functions and/or execute instructions within the computing device 602. For example, the processor 702 is configured to receive instructions and/or data from the storage device 712 to perform the functions of the modules shown in FIG. 7, etc. (e.g., operating system, application programs, etc.). The computing device 602 stores data/information to the storage device 712. Some functions are described further below. According to an example, it should be appreciated that operations may not fall entirely within modules 714, 716, and 732 of FIG. 7, such that one module assists the functionality of another module.
According to an example, computer program code for carrying out operations is written in any combination of one or more programming languages, such as an object oriented programming language, e.g., Java, Smalltalk, C++ or the like, or a conventional procedural programming language, such as the "C" programming language or similar programming languages.
According to an example, the computing device 602 generates output for display on a screen of the gesture-based I/O device 706, or in some examples, for display by a projector, monitor, or other display device. According to an example, it will be appreciated that gesture-based I/O device 706 is configured using various technologies (e.g., with respect to input capabilities: resistive touch screen, surface acoustic wave touch screen, capacitive touch screen, projected capacitive touch screen, pressure-sensitive screen, acoustic pulse recognition touch screen, or another presence-sensitive screen technology; and with respect to output capabilities: liquid crystal display (LCD), light emitting diode (LED) display, organic light emitting diode (OLED) display, dot matrix display, e-ink, or similar monochrome or color display).
In at least some examples described herein, gesture-based I/O device 706 includes a touch screen device capable of receiving a haptic interaction or gesture as input from a user interacting with the touch screen. According to an example, such gestures include a tap gesture, a drag or swipe gesture, a flick gesture, a pause gesture (e.g., a user touching the same location of the screen for at least a threshold period of time), wherein the user touches or points to one or more locations of the gesture-based I/O device 706. According to an example, gesture-based I/O device 706 also receives a non-click gesture. According to an example, gesture-based I/O device 706 outputs or displays information, such as a graphical user interface, to a user. Gesture-based I/O device 706 presents various applications, functions, and capabilities of computing device 602, including, for example, applications 716 for capturing images, viewing images, processing images, and displaying new images, messaging applications, telephony communications, contact and calendar applications, web browsing applications, gaming applications, electronic book applications, and financial, payment, and other applications or functions, among others.
Although the present disclosure primarily shows and discusses gesture-based I/O device 706 in the form of a display screen device (e.g., a touch screen) with I/O capabilities, other examples of gesture-based I/O devices that detect movement, and that do not comprise a screen per se, are contemplated. In such cases, the computing device 602 includes a display screen, or is coupled to a display device, to present new images and the GUI of applications 716. According to an example, the computing device 602 receives gesture-based input from a track pad/touch pad, one or more cameras, or another presence- or gesture-sensitive input device, where presence refers to a presence aspect of a user, including, for example, motion of all or part of the user.
According to an example, the one or more communication units 708 communicate with external devices (e.g., server 606, server 608, second computing device 612) by sending and/or receiving network signals over one or more networks, such as via communication network 604, e.g., for the purposes described and/or for other purposes (e.g., printing). According to an example, the communication unit includes various antennas and/or network interface cards, chips (e.g., Global Positioning System (GPS)), etc., for wireless and/or wired communication.
According to an example, the input device 704 and the output device 710 include any of one or more buttons, switches, pointing devices, cameras, keyboards, microphones, one or more sensors (e.g., biometric sensors), speakers, bells, one or more lights, haptic (vibration) devices, etc., one or more of which are coupled via a Universal Serial Bus (USB) or other communication channel (e.g., 738). According to an example, the camera (input device 704) is front-facing (i.e., on the same side as gesture-based I/O device 706) to allow a user to capture images using the camera for "self-photographing" while viewing gesture-based I/O device 706.
According to an example, the one or more storage devices 712 take different forms and/or configurations, for example, as short-term memory or long-term memory. According to an example, the storage device 712 is configured to store information as volatile memory for a short period of time, the volatile memory not retaining stored content when powered off. Examples of volatile memory include Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), and the like. In some examples, storage device 712 also includes one or more computer-readable storage media, e.g., for storing a greater amount of information than volatile memory and/or for long-term storage of such information, which retains the information when powered down. Examples of non-volatile memory include magnetic hard disks, optical disks, floppy disks, flash memory, and forms of Electrically Programmable Read-Only Memory (EPROM) or Electrically Erasable Programmable Read-Only Memory (EEPROM).
Although not shown, according to an example, the computing device is configured as a training environment to train the neural network model 714, for example, using the network as shown in fig. 4 along with appropriate training and/or testing data.
According to an example, CNN 106/neural network model 714 has a lightweight architecture suitable for a computing device that is a mobile device (e.g., a smartphone or tablet) having fewer processing resources than a "larger" device (e.g., a laptop, desktop, workstation, server, or other comparable computing device).
According to an example, the second computing device 612 is configured similarly to computing device 602. The second computing device 612 presents a GUI, for example, to request and display images and acne diagnoses for different users from data stored at the server 606.
Figs. 8A-8B are flowcharts of operations 800 and 810, respectively, such as for computing device 602 (or 610), according to an example. Operation 800 involves a user of computing device 602 capturing a self-photograph including an image of the user's face using an application, such as application 716, to perform skin diagnostics for acne severity. At 801, an image is received at a processor, such as via a camera or other means (e.g., from a message attachment).
At 802, the image is preprocessed to define a normalized image to be presented to the CNN. The image is centered and cropped to a particular size (resolution) in various ways to present a similarly sized image to the CNN, in accordance with its training. At 803, the normalized image is processed using the CNN 106 (neural network model 714) to generate an acne vulgaris diagnosis (e.g., an integer score). Gender and ethnicity outputs are also generated. At 804, the acne diagnosis and the gender and ethnicity vectors (or individual values thereof) are presented, for example, via a GUI. According to an example, the GUI presents the image and/or normalized image and an adapted image that visualizes the acne using the described heat map. According to an example, the GUI presents the image and then transitions to presenting the adapted image (once available) that visualizes the acne.
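The preprocessing at 802 can be sketched as follows. This is a minimal illustration only: the 224x224 target resolution, the nearest-neighbor resize, and the normalization constants are assumptions for the sketch, not values specified herein; a trained CNN would use whatever resolution and statistics it was trained with.

```python
# Sketch of step 802: center-crop the image to a square, resize it to the
# CNN's expected input resolution, and standardize pixel values.
# TARGET, MEAN, and STD are illustrative assumptions (ImageNet statistics).
import numpy as np

TARGET = 224                                     # assumed CNN input resolution
MEAN = np.array([0.485, 0.456, 0.406])           # assumed channel means
STD = np.array([0.229, 0.224, 0.225])            # assumed channel std devs

def preprocess(image: np.ndarray) -> np.ndarray:
    """image: HxWx3 uint8 array -> TARGETxTARGETx3 float array, normalized."""
    h, w, _ = image.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = image[top:top + side, left:left + side]   # center crop to square
    idx = np.arange(TARGET) * side // TARGET          # nearest-neighbor sampling
    resized = crop[idx][:, idx].astype(np.float64) / 255.0
    return (resized - MEAN) / STD                     # standardize per channel
```

For example, `preprocess(np.zeros((480, 640, 3), np.uint8))` yields a 224x224x3 array ready to be presented to the CNN.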
Fig. 8B shows operation 810. At 811, a GUI (according to an example, a GUI for either of operations 800, 810) is presented to initiate product and/or treatment recommendations, and an input invoking that function is received. At 812, a recommendation is obtained. To obtain the recommendation, according to an example, the operation of the device 602 includes transmitting acne diagnostic information (e.g., score, ethnicity vector, gender vector, image, user information, etc.) to a remote server, such as server 608, to receive the recommendation. According to an example, the recommendation includes one or more products and a regimen for applying them to the skin area, and is associated with a treatment plan having a schedule. At 813, the recommendation is presented, for example, via a GUI. According to an example, more than one recommendation is received and presented. At 814, a selection indicating acceptance of a recommendation is made. According to an example, the selection is stored (recorded) and, for example, a treatment monitoring feature or function (not shown) of the computing device 602 is initiated. According to an example, at 815, a product purchase is facilitated, for example, via server 608 or another server.
Although not shown, according to an example, monitoring is responsive to a treatment plan (e.g., described in data) received by or accessible to computing device 602, e.g., via a browser. According to an example, the treatment plan has a schedule (e.g., morning and evening applications of a first product, once-a-week application of a second product, and so on). According to an example, the user is reminded of the schedule, e.g., via a notification from a local application or via another means such as a calendar application. According to an example, a GUI is provided to facilitate a treatment activity, e.g., to record its occurrence and/or to provide instructions for performing the activity. An input is received, such as a confirmation that the activity was performed. According to an example, an image is included to record the activity. According to an example, the corresponding data is recorded (locally and/or remotely). According to an example, the degree to which the treatment plan is followed is measured and monitored. According to an example, product repurchase is facilitated, for example, when, in response to the monitoring, it is determined that the amount of product on hand is about to run out.
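A treatment plan with a schedule, as described above, could be represented by a simple data structure along the following lines. The field names, the twice-daily and weekly cadences, and the `due_today` helper are illustrative assumptions, not the application's actual data model.

```python
# Sketch of a treatment plan with a schedule (e.g., morning and evening
# applications of a first product, once-a-week application of a second).
from dataclasses import dataclass
from datetime import date

@dataclass
class ScheduledProduct:
    name: str
    applications_per_day: int = 1   # e.g., 2 for morning and evening
    interval_days: int = 1          # e.g., 7 for a once-a-week product

@dataclass
class TreatmentPlan:
    products: list
    start: date

    def due_today(self, today: date):
        """Return the names of products whose schedule calls for application today."""
        elapsed = (today - self.start).days
        return [p.name for p in self.products if elapsed % p.interval_days == 0]

plan = TreatmentPlan(
    products=[ScheduledProduct("cleanser", applications_per_day=2),
              ScheduledProduct("exfoliant", interval_days=7)],
    start=date(2019, 7, 1),
)
```

A reminder feature would then query `due_today` each day to drive notifications, and an adherence measure could compare confirmed activities against the products returned.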
Although not shown, according to an example, a comparison activity is performed (e.g., as a monitoring activity). A GUI for the comparison is provided to instruct the user, etc. A new image is received and (optionally) stored (e.g., for comparison with the initial image received at 801). A subsequent acne vulgaris diagnosis (with normalization, etc., similar to operation 800) is performed on the new image using CNN 106. Using the initial and subsequent acne diagnoses, the GUI presents a comparison of the treatment results, optionally together with the first image and the new image, optionally together with one or more of the images modified using the heat map.
Although not shown, according to an example, data received or generated for operations 800, 810 and monitoring and/or comparison activities is transmitted for remote storage (e.g., to server 606).
According to an example, acne diagnosis and subsequent diagnoses (optionally together with other monitoring), and providing the data for aggregation, enable investigation of product efficacy and/or of fraudulent claims about products and treatments. According to an example, data is collected, analyzed, and presented to dermatologists and/or other professionals and/or users. Thus, the techniques and/or methods of the various examples herein facilitate a distributed research model, such as for acne skin treatment.
Fig. 9 is a flow chart of operations 900 for a computing device 602 (or 610), according to an example. Operation 900 is similar to operation 800 and involves a user of computing device 602 capturing a self-photograph including an image of the user's face using an application, such as application 716, to perform skin diagnostics for the severity of acne. In operation 900, k data augmentations are performed and k images are analyzed to produce a visualization.
Because operation 900 is similar in this example, reference numerals from operation 800 are repeated in Fig. 9. At 801, an image is received at a processor, such as via a camera or other means (e.g., from a message attachment).
At 802, the image is preprocessed to define a normalized image to be presented to the CNN. The image is centered and cropped to a particular size (resolution) in various ways to present a similarly sized image to the CNN, in accordance with its training. At 902, data augmentation is performed on the normalized image, applying k random augmentations to define k augmented images for analysis. In an example, operations 802 and 902 are reversed in order: the source image is augmented and then normalized, although this may repeat some of the operations.
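The augmentation at 902 can be sketched as follows. The particular transforms (horizontal flip, small horizontal shift) and the default k=5 are illustrative assumptions; the description only requires that k random augmentations be applied to produce k images.

```python
# Sketch of step 902: apply k random augmentations to the normalized image
# to define k augmented images for analysis.
import numpy as np

def augment_once(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]               # random horizontal flip
    shift = int(rng.integers(-4, 5))     # small random horizontal translation
    return np.roll(out, shift, axis=1)

def augment_k(image: np.ndarray, k: int = 5, seed: int = 0):
    """Return a list of k randomly augmented copies of the input image."""
    rng = np.random.default_rng(seed)
    return [augment_once(image, rng) for _ in range(k)]
```

Each of the k outputs retains the input's shape, so all k can be presented to the same CNN.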
At 904, each of the k augmented images is processed using the CNN 106 (neural network model 714) to generate an acne vulgaris diagnosis (e.g., an integer score) and k activation masks (e.g., heat maps, as described, using Grad-CAM). Gender and ethnicity outputs are also generated. At 906, a final mask is defined from the k activation masks. In this example, the k activation masks are averaged and the described thresholds are applied to generate the final mask/heat map as a visualization of the skin analysis. At 908, the acne diagnosis is presented, for example, via a GUI, with the visualization relative to the original image. In an example, the heat map is overlaid on the original image to visualize the analysis/diagnosis. Optionally, gender and ethnicity vectors (or individual values thereof) are presented with the visualization.
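The mask-combining step at 906 can be sketched as follows. The 0.5 threshold is an assumption for illustration; the description only states that the k masks are averaged and the described thresholds are applied.

```python
# Sketch of step 906: average the k activation masks and threshold the
# mean to form the final mask/heat map.
import numpy as np

def final_mask(masks, threshold: float = 0.5) -> np.ndarray:
    """masks: list of k HxW activation maps in [0, 1] -> thresholded mean mask."""
    mean = np.mean(masks, axis=0)                  # average over the k masks
    return np.where(mean >= threshold, mean, 0.0)  # suppress weak activations
```

Averaging over augmented views smooths out activations that appear in only a few of the k masks, and the threshold then removes the remaining weak responses before visualization.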
Figs. 10A and 10B illustrate respective source images 1000, 1002 and visualizations 1004, 1006, 1008, and 1010 according to an example skin analysis. Each of the first modified face images 1004 and 1006 and each of the second modified face images 1008 and 1010 comprises the respective original face image 1000, 1002 overlaid (e.g., superimposed) with a respective Grad-CAM heat map generated using the averaging technique. The first modified face images 1004 and 1006 display a locally normalized Class Activation Map (CAM), normalized within the image, and the second modified face images 1008 and 1010 display a globally normalized CAM, normalized across all of the images. Referring to Fig. 10B, the skin condition in the original image 1002 appears most severe in the central forehead area, less severe around the mouth and upper chin, and less severe again above the nose. The visualizations 1006 and 1010 obscure the skin details of the original image 1002 to highlight the skin regions using the heat map. In one embodiment, a color overlay is used. Color and/or gradation, etc., are aligned with the severity result (e.g., the skin analysis integer value). The heat maps of visualizations 1006 and 1010 each highlight areas of the skin that exhibit the skin condition, and minimize other areas of the skin (e.g., cheeks, chin) in the original image 1002 that do not exhibit the skin condition. For example, cheek regions are darkened and regions presenting the skin condition are highlighted, relative to scores normalized using local or global data. Fig. 10A shows a similar visualization, with the skin condition most severe in the forehead area. Within the portions showing the skin condition, the highlighting may be at a different granularity. For example, in the forehead areas of 1004 and 1008 or 1006 and 1010, the portion of the area where the skin condition is detected is highlighted according to its severity.
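An overlay of the kind shown in Figs. 10A-10B can be sketched as follows: strongly activated regions are tinted while weakly activated regions are darkened. The red tint, the 0.6 blend factor, and the 0.4 darkening floor are illustrative assumptions; the actual colors and gradations are aligned with the severity result.

```python
# Sketch of overlaying a heat map on the original image: tint high-activation
# regions (red here) and darken low-activation regions (e.g., cheeks, chin).
import numpy as np

def overlay(image: np.ndarray, mask: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """image: HxWx3 floats in [0,1]; mask: HxW in [0,1] -> blended HxWx3."""
    heat = np.stack([mask, np.zeros_like(mask), np.zeros_like(mask)], axis=-1)
    dim = image * (0.4 + 0.6 * mask[..., None])   # darken low-activation areas
    m = mask[..., None]
    return np.clip((1 - alpha * m) * dim + alpha * m * heat, 0.0, 1.0)
```

Where the mask is zero, the original pixels are simply dimmed, which produces the effect described above of obscuring skin details outside the regions exhibiting the skin condition.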
In the present disclosure, a method of training CNNs on an acne severity assessment regression task using only integer labels is described in one or more examples. Whereas previous work typically involved complex procedures and specific image requirements, the described end-to-end acne assessment model can be applied to images captured by mobile devices and can be used in real time via mobile or web applications. With an appropriate loss function and training techniques, results 3% better than those of comparable prior work were obtained.
In addition to the computing device aspects shown in one or more examples, one of ordinary skill will appreciate that computer program product aspects are disclosed in which instructions are stored in a non-transitory storage device (e.g., memory, CD-ROM, DVD-ROM, optical disk, etc.) to configure the computing device to perform any of the method aspects described herein.
It will be appreciated that the computing device includes circuitry, such as a processing unit coupled to a memory unit. Such circuitry configures the computing device to provide various features and functions and/or to perform applicable methods. A circuit may be considered to define (at least logically) a corresponding functional unit. Examples of functional units are a skin analysis unit and/or a visualization unit, etc., having the features described herein. Others will be apparent. In one embodiment, a skin analysis unit is provided for classifying pixels of an image using a deep neural network comprising a regressor and a classifier for image classification to generate the skin diagnosis for a skin condition.
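The idea of a shared feature extractor feeding both a regressor and a classifier, as in the skin analysis unit above, can be sketched as follows. The weights here are random placeholders and the 5-grade output is an assumption; a real implementation would use a trained lightweight CNN backbone such as MobileNetV2 [12].

```python
# Sketch of a skin analysis unit: shared features feed a regression head
# (continuous severity) and a classification head (integer grade).
# All weights are random placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((16, 8))    # placeholder backbone projection
W_reg = rng.standard_normal((8, 1))      # regression head -> severity value
W_cls = rng.standard_normal((8, 5))      # classification head -> 5 grades

def analyze(x: np.ndarray):
    feat = np.tanh(x @ W_feat)               # shared features
    severity = (feat @ W_reg).item()         # continuous severity estimate
    logits = feat @ W_cls
    probs = np.exp(logits - logits.max())    # softmax over grade logits
    probs /= probs.sum()
    grade = int(np.argmax(probs))            # integer acne grade
    return severity, grade, probs
```

Sharing the features between the two heads mirrors the document's pairing of a regressor with a classifier on the same deep network, rather than training two independent models.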
Practical implementations may include any or all of the features described herein. These and other aspects, features, and various combinations may be expressed as methods, apparatus, systems, components, program products, and in other ways, for performing the functions. Many embodiments have been described. However, it will be appreciated that various modifications may be made without departing from the spirit and scope of the processes and techniques described herein. Further, other steps may be provided within the described processes, or steps may be eliminated from them, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.
Throughout the description and claims of this specification, the words "comprise" and "comprising" and variations thereof mean "including but not limited to" and are not intended to (nor do) exclude other elements, integers or steps. Throughout the specification, the singular encompasses the plural unless the context requires otherwise. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integer features, compounds, chemical moieties or groups described in connection with a particular aspect, embodiment or example of the invention are to be understood as applicable to any other aspect, embodiment or example unless incompatible therewith. All of the features disclosed herein (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not limited to the details of any of the foregoing examples or embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
References
1. Abas, F.S., Kaffenberger, B., Bikowski, J., Gurcan, M.N.: Acne image analysis: lesion localization and classification. In: Medical Imaging 2016: Computer-Aided Diagnosis. vol. 9785, p. 97850B. International Society for Optics and Photonics (2016)
2. Alamdari, N., Tavakolian, K., Alhashim, M., Fazel-Rezai, R.: Detection and classification of acne lesions in acne patients: A mobile application. In: 2016 IEEE International Conference on Electro Information Technology (EIT). pp. 0739-0743. IEEE (2016)
3. Dréno, B., Poli, F., Pawin, H., Beylot, C., Faure, M., Chivot, M., Auffret, N., Moyse, D., Ballanger, F., Revuz, J.: Development and evaluation of a Global Acne Severity scale (GEA scale) suitable for France and Europe: Global acne assessment scale. Journal of the European Academy of Dermatology and Venereology: JEADV 25, 43-48 (2010). URL doi.org/10.1111/j.1468-3083.2010.03685.x
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770-778 (2016)
5. Jiang, R., Kezele, I., Levinshtein, A., Flament, F., Zhang, J., Elmoznino, E., Ma, J., Ma, H., Coquide, J., Arcin, V., Omoyuri, E., Aarabi, P.: A new procedure, free from human assessment, that automatically grades some facial skin structural signs. Comparison with assessments by experts, using referential atlases of skin aging. International Journal of Cosmetic Science 41 (2019). URL doi.org/10.1111/ics.12512
6. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014). URL arxiv.org/abs/1412.6980
7. Malik, A.S., Ramli, R., Hani, A.F.M., Salih, Y., Yap, F.B.B., Nisar, H.: Digital assessment of facial acne vulgaris. In: 2014 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) Proceedings. pp. 546-550. IEEE (2014)
8. Maroni, G., Ermidoro, M., Previdi, F., Bigini, G.: Automated detection, extraction and counting of acne lesions for automatic evaluation and tracking of acne severity. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI). pp. 1-6. IEEE (2017)
9. Melina, A., Dinh, N.N., Tafuri, B., Schipani, G., Nisticò, S., Cosentino, C., Amato, F., Thiboutot, D., Cherubini, A.: Artificial intelligence for the objective evaluation of acne investigator global assessment. Journal of Drugs in Dermatology: JDD 17(9), 1006-1009 (2018)
10. Pan, H., Han, H., Shan, S., Chen, X.: Mean-variance loss for deep age estimation from a face. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5285-5294 (2018)
11. Rothe, R., Timofte, R., Van Gool, L.: DEX: Deep expectation of apparent age from a single image. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. pp. 10-15 (2015)
12. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted residuals and linear bottlenecks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018)
13. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 618-626 (2017)
14. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2921-2929 (2016)
15. J.P.-O., N.V.-A., A.R.-A., J.D.-C., J. Alfredo Padilla-Medina, Francisco León-Ordoñez: Assessment technique for acne treatments based on statistical parameters of skin thermal images. Journal of Biomedical Optics 19(4), 1-8 (2014). doi:10.1117/1.JBO.19.4.046019. URL doi.org/10.1117/1.JBO.19.4.046019
16. La Roche-Posay: What is Effaclar Spotscan? (2019). URL www.laroche-posay.co.uk/what-is-effaclar-spotscan, last accessed 2019-06-25