SEPHYRES 1: A Physician Recommender System Based on Semantic Pain Descriptors and Multifaceted Reasoning

Ali Sanaeifar1*, Ahmad Faraahi2 and Mahmood Tara3

1Department of Computer Engineering and Information Technology, Payam Noor University of Tehran, Iran

2Department of Computer and Information Technology, Payam Noor University of Tehran, Tehran, Iran

3Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran

*Corresponding Author:
Ali Sanaeifar
Department of Computer Engineering and Information Technology, Payam Noor University of Tehran, Iran
Tel: +983142727121
E-mail: [email protected]


Abstract

Physician recommender systems aim to recommend the right physician in accordance with patient preferences. However, existing systems rely on techniques such as classification or syntactic, word-based search over previous patients' recommendations and conditions, which limits their capabilities. In this paper, we propose a new model, called SEPHYRES (semantic physician hybrid recommender expert system), which focuses on the patient's medical condition and pain description characteristics using an underlying evidence-based ontology. The ontology includes not only semantic descriptions of the symptoms but also machine-understandable perceptions of the pain location and the link weights. In the proposed model, we applied a pseudo-fuzzy weight-spreading method along with general semantic reasoners and a facet management module. To keep the domain manageable, we limited the scheme to diseases that cause abdominal pain. We used Harrison's Principles of Internal Medicine and UpToDate online as our base evidence references, along with the opinions of our local experts. We compared the results of our pseudo-diagnostic engine with twenty case studies from the MEDSCAPE and PubMed databases. The results showed that our model can improve machine awareness of the individual's disease and thus improve the accuracy of recommendations.


Keywords: Physician recommender, Medical diagnosis, Disease ontology, Semantic reasoning


Introduction

In recent years, conceptual networks and the semantic web have grown remarkably, with the aim of making the human information world machine-understandable. One of the biggest limitations on this route is the challenge of redefining available human-centered information as computer-processable knowledge bases. Such limitations are more prominent when the application domain is fuzzy and uncertain by nature, as in medicine. Here, semantic web engineers need to turn every facet of medical information and knowledge into valid and accurate information artefacts that can support further reasoning.

One recent type of semantic-based application in medicine is the physician recommender system. Such applications are built to provide a valid, consistent and reliable referral to the right doctor as a health care service provider and, if they perform seamlessly, could help citizens and society. However, choosing the right doctor is a rather complex task. People tend to consider several factors before choosing their doctors, and these factors usually vary from one person to another. In addition, most patients do not have enough information to make good decisions.[1] To make a good referral recommendation, we need not only a range of information about the patient's condition (e.g., symptoms and signs, type of illness, time and duration, medical history) but also a careful look at patient preferences such as proximity, level of expertise, physician degree, gender match and so on. In such a complex formulation, there is a crucial need for help from an expert in the field.[2] Nonetheless, there are times when all of the above information is available and it is still not possible to connect it to a particular specialty, even in the real world.

Several studies have sought appropriate solutions to this problem. For example, Dutt et al. used clustering of frequently used phrases to describe medical conditions and to provide a list of the medical specialties of every service provider. In their study, an expert connected every specialty to one or more clusters of medical conditions.[2] In the study of Bachus et al., patients used an online database containing recommendations from previous patients who had been referred to a doctor. In their method, new referral recommendations were made by parametric search based on location, physician's name, specialty, or even medical condition. However, the final decision rested with the patient, who chose a doctor by looking at the doctor's profile and the recommendations of previous patients. Another study implemented a physician search that included patients' expressed experience among the result parameters.[3] LeClair et al. provided a model for selecting physicians based on the observed experience level for a specific condition or procedure. Their system presents users with a set of selection criteria to search for doctors and requires them to complete a medical profile, which is then matched against cumulative physician profiles received from a third-party organization. The system then ranks the physicians based on the conditions stated in the patient's medical profile and on the physician's profile, and finally filters them by the patient's particular preferences.[4]

In another study, Rogers et al. presented a model that improved on previous work by combining profile matching with users' feedback. In their model, the ranking was based on the recorded interests of similar users/patients. They also applied data mining techniques to user activities and generated implicit feedback from the results. In their application, search results could be linked to certain medical conditions or specific treatments. Although the model of Rogers et al. did not ultimately recommend a specific physician, it came close to doing so.

Most studies reviewed by the research team were based on search and syntax-based matching. Such approaches ignore much of the semantic information and knowledge applied in real-world cases of physician selection. In addition, in most cases the final decision has to be made by the patient based on the information provided by the application. Even among the studies that implemented some form of medical ontology in their applications, none, to our knowledge, used symptom descriptors or weighting.[5-9]

In this study, we propose a model that allows multi-facet inference to automatically recommend a physician to patients with abdominal pain. To achieve this goal, medical knowledge from trusted sources of medical evidence was turned into an ontology and then used as the basis for semantic network analysis based on weight spreading. In the following sections, we describe how our initial ideas evolved over time into our primary model. We then explain how we evaluated the functionality of the developed application and offer our suggestions for further contributions.

Materials and Methods

The main hypothesis of this study was that adding an evidence module to physician recommender systems for patients with abdominal pain can improve the accuracy of the suggestions. Adding an evidence module allows us to create a specialized profile based on current medical knowledge and turns the pain-doctor procedure into a pain-pseudo-diagnosis-doctor procedure. We call the new model "SEPHYRES", which stands for semantic physician hybrid recommender expert system (Figure 1). "Hybrid recommender" means that the system is fed by both the patient profile and the evidence profile for enhanced quality. This study was conducted in four stages:


Figure 1: The composition of the expert system and recommender system in SEPHYRES.

1. Development of a primary database containing disease and body locations associated with abdominal pain, by studying sources of evidence.

2. Creating semantic knowledge base using pseudo-fuzzy concepts of areas presenting abdominal pain.

3. Implementation of reasoning algorithms based on weight-spreading in the network plus general semantic reasoners.

4. Evaluation of the final model using case studies from MEDSCAPE and PubMed databases.

Stage 1: Development of a primary database containing disease and body locations associated with abdominal pain, by studying sources of evidence: At this stage, we focused only on descriptions of pain, as the person's primary problem, and on pain-associated diseases. The included diseases were those with a clear association with abdominal pain. Two types of evidence were studied: the opinions of local experts, including a general practitioner, a medical fellowship student and two gastroenterologists; and two of the most widely referenced sources of internal medicine knowledge, Harrison's Principles of Internal Medicine[10] and UpToDate online (a widely used resource for evidence-based clinical decision support).[11] We compiled an initial list of 115 diseases associated with abdominal pain, all coded using ICD10 (the 10th revision of the International Classification of Diseases).[12] In consultation with the local experts, a weight was assigned to every association and inserted into the database. Two types of weight-labelled association were recorded: pain-disease and disease-doctor. Finally, the 90 diseases with higher prevalence and importance were selected. Every pain entry was linked to a list of pain descriptors recommended by our sources of evidence (Table 1).

Subject | Sample values
Location (visual access by user) | Upper Abdomen in Gastric Ulcer Disease
Focus Location (visual access by user) | Epigastrium in Gastric Ulcer Disease
Radiation (visual access by user) | to Groin, to Genitals for Kidney Stone Disease
Diffusion | Localized, Widespread
Frequency | Continuous, Intermittent
Chronic State | Acute, Chronic
Sharpness | Sharp, Dull
Activity Response | with Activity Increase, with Activity Decrease
Eating Relation | Related to Eating, Not Related to Eating
Start State | Suddenly, Progressive
Severity | Mild, Moderate, Intense, Severe
Pulsation | Pulselike, Pulseless
Other Sense | Burning, Colicky, Crampy, Crawling, Fullness, Heat, Icy Coldness, Numbing, Pressure, Tenderness, Tingling, Weakness, Vaguely Uncomfortable

Table 1: Pain descriptors.

Figure 2 shows how every instance of abdominal pain is associated with a pain cause and with pain descriptors. In this example, all types of pain description for kidney stone disease are linked and weighted.
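The pain-disease and disease-doctor associations described above can be pictured as two weighted lookup tables. The following is a minimal sketch of that idea, not the SEPHYRES implementation: all weights, the descriptor labels, and the rule of summing weights are illustrative assumptions, since the paper does not report the stored values or how they are combined.

```python
# Hypothetical weights for illustration only; not the values stored in SEPHYRES.
PAIN_TO_DISEASE = {
    "kidney stone": {"radiation:groin": 0.8, "other_sense:colicky": 0.7, "start:sudden": 0.6},
    "gastric ulcer": {"location:epigastrium": 0.8, "other_sense:burning": 0.7, "eating:related": 0.6},
}
DISEASE_TO_SPECIALTY = {
    "kidney stone": {"urology": 0.9, "internal medicine": 0.4},
    "gastric ulcer": {"gastroenterology": 0.9, "internal medicine": 0.6},
}


def score_diseases(descriptors):
    """Sum the weights of the reported descriptors linked to each disease (assumed rule)."""
    scores = {}
    for disease, links in PAIN_TO_DISEASE.items():
        total = sum(weight for descriptor, weight in links.items() if descriptor in descriptors)
        if total > 0:
            scores[disease] = total
    return scores


def score_specialties(disease_scores):
    """Propagate disease scores to specialties through the disease-doctor weights."""
    scores = {}
    for disease, score in disease_scores.items():
        for specialty, weight in DISEASE_TO_SPECIALTY[disease].items():
            scores[specialty] = scores.get(specialty, 0.0) + score * weight
    return scores


# A patient reporting colicky pain that radiates to the groin.
reported = {"radiation:groin", "other_sense:colicky"}
diseases = score_diseases(reported)
specialties = score_specialties(diseases)
best = max(specialties, key=specialties.get)
print(diseases, specialties, best)
```

The two-step lookup mirrors the pain-pseudo-diagnosis-doctor procedure: descriptors first point to candidate diseases, and only then to a specialty.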

Stage 2: Creating semantic knowledge base using pseudo-fuzzy concepts of areas presenting abdominal pain: We used the Protégé tools and the Pellet semantic reasoner to build the SEPHYRES ontology and engine. To ensure better performance, we used PHP with the RAP library, plus the rule inference of the Jena library for Java. A simplified version of the SEPHYRES ontology is shown in Figure 3. As illustrated, the concept Patient (Figure 3: relation 1) was expanded into four pain-determining categories based on age and gender. The class Physician (Figure 3: relation 2) is described by specialty, region and gender. The main basic concepts of the ontology are Medical Field, Location, and Disease. Some relations, such as the link between "appendicitis disease" and the pain area, are defined as weighted (Figure 3: relation 3), which reflects the importance of the relationship. The Disease concept is linked to specialized medical fields through the International Classification of Diseases, ICD10. Using a hierarchy instead of flat lists makes it possible to connect to more general or more detailed concepts. This enhances expressive power and enables reasoning based on the classification hierarchy. The hierarchy is implemented in the SEPHYRES ontology using the hasParent feature between instances of the desired class (Figure 3: relations 8 and 9).


Figure 2: Partial look at SEPHYRES pain description profile for kidney stone disease.


Figure 3: Part of the SEPHYRES ontology.

An important part of the ontology is devoted to the concept of Location. This concept requires a relatively complex network to convert a location into semantic relations; in other words, the semantic element of this concept helps the machine perceive location. For example, the Right Lower Quadrant (RLQ) has a hierarchical part-whole relationship with the Right Abdomen and Lower Abdomen areas, established in the ontology by the hasParent feature between the concept of RLQ and the concepts describing those areas (Figure 3: relation 4). For instance, the evidence says that severe pain located in the lower right area of the abdomen is related to appendicitis with a certainty of 80%. However, a patient with appendicitis may also report pain in the abdominal area without specifying the exact location, pointing generally to the middle or lower abdomen. In our engine, such instances are also related to appendicitis, but with less certainty. Therefore, the hierarchical relationships of body-part concepts should be modelled in the ontology so that the reasoning processes can discover new implicit facts in addition to the explicit facts mentioned.

To model the concept of Location for abdominal pain, we applied the two abdominal division standards known in medicine, the 4-part and the 9-part, described by a network of common expressions with semantic and spatial overlap. Figure 4 shows the areas of overlap. There are also other, less common location terms, such as Upper Abdomen, Left Abdomen, and Right Epigastria, which could make the result even more complex. While a human doctor seems to have minimal difficulty handling location, we had a hard time finding a way to differentiate the Pelvic area from the Right Lower Quadrant and Left Lower Quadrant (which have no syntactic similarity), especially since the correspondence should carry less weight. Note that, according to Figure 3: relation 5, Pelvic already has a whole relation with full weight to the concept of Lower Abdomen through the hasParent feature. To resolve the issue, in the SEPHYRES ontology we introduced the new features hasHalfParent and hasQuarterParent and, using the weight-spreading techniques of semantic networks in the preprocessing phase, scaled the weight of the overlap by a half or quarter factor to produce a pseudo-fuzzy association (see the corresponding portions in the overlapping areas of Figure 4). In Figure 3, this rule is illustrated by the connections between RLQ and the terms describing surrounding areas such as Right Inguinal, Umbilical, Pelvic, and Right Lateral (e.g., Figure 3: relations 6 and 7).


Figure 4: 4-part and 9-part standards of abdominal pain areas and some overlaps.

Stage 3: Implementation of reasoning algorithms based on weight-spreading in the network plus general semantic reasoners: General semantic reasoners can infer only some standard, predefined relations; they fail to cover domain-specific heuristics. One alternative is weight spreading in the graph.[13] Weight spreading in the graph is considered a kind of semantic reasoning in semantic networks, which generally leads to the discovery of new facts in the knowledge base. Some earlier semantic recommenders in the field of digital television have applied this method.[14,15] Weight spreading in the hierarchy can be used both in the relation of Medical Field to Disease and in the relation of Disease to Location. It can be carried out in the ontology hierarchy downward, upward, and toward siblings.

Downward reasoning, performed by spreading weights in the graph toward children, adds new triples to the knowledge base. Some previous studies use a reduction factor when spreading weight to the children and also claim that high-level and low-level concepts of the hierarchy differ.[16] In our approach, the parent's weight is passed to the children with a weight reduction coefficient K at each level, down to the low levels of the hierarchy. Figure 5 shows a simplified example of the process. The K factor applies only to hierarchical relationships established through the hasParent feature; for the hasHalfParent and hasQuarterParent features, the factors are K/2 and K/4, respectively (Figure 5: relations 1 and 2). As previously mentioned, these features provide a pseudo-fuzzy association in the machine perception of pain areas.
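Downward spreading with the K, K/2 and K/4 factors can be sketched as a short graph traversal. This is an illustration under stated assumptions rather than the SEPHYRES code: the value of K, the specific edges between regions, and the rule that an explicitly asserted weight is never overwritten are all hypothetical, because the paper does not specify them.

```python
K = 0.5  # assumed reduction coefficient; the paper does not report its value

# child -> list of (parent, edge factor). The edges below are illustrative assumptions.
# Factor 1.0 stands for hasParent, 0.5 for hasHalfParent, 0.25 for hasQuarterParent.
PARENT_EDGES = {
    "Right Lower Quadrant": [("Lower Abdomen", 1.0)],
    "Right Inguinal": [("Right Lower Quadrant", 1.0)],
    "Pelvic": [("Lower Abdomen", 1.0), ("Right Lower Quadrant", 0.5)],
    "Umbilical": [("Right Lower Quadrant", 0.25)],
}


def spread_down(explicit, parent_edges, k):
    """Return the explicit disease-location weights plus weights inferred for descendants."""
    # Invert the child -> parent edges so we can walk from a parent to its children.
    children = {}
    for child, edges in parent_edges.items():
        for parent, factor in edges:
            children.setdefault(parent, []).append((child, factor))

    weights = dict(explicit)
    stack = list(explicit.items())
    while stack:
        node, weight = stack.pop()
        for child, factor in children.get(node, []):
            if child in explicit:
                continue  # never overwrite a weight asserted by the experts
            inferred = weight * k * factor
            if inferred > weights.get(child, 0.0):  # keep the strongest path
                weights[child] = inferred
                stack.append((child, inferred))
    return weights


# Appendicitis linked explicitly to the Right Lower Quadrant with weight 0.8.
weights = spread_down({"Right Lower Quadrant": 0.8}, PARENT_EDGES, K)
print(weights)
```

With these assumed edges, a region that lies wholly inside RLQ inherits the weight reduced by K, while regions that only partly overlap it inherit half or a quarter of that amount.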


Figure 5: The downward reasoning process by weight spreading toward the children.

Upward reasoning means spreading weight from children toward parents in the ontology hierarchy. For example, suppose the specialty of Internal Medicine is associated with five of the six children of gastrointestinal diseases. It can then be inferred that the specialty is related, to some extent, to the parent concept as well. The question is what weight to give the relation to the parent. Some papers have used an averaging method,[16,17] for example the average of the children's weights, counting zero for children that are not linked. An example of this type of inference is shown in Figure 6. Suppose that an expert has linked abdominal pain to diseases of the gastrointestinal tract in general rather than to its children. If a patient has abdominal pain, the Internal Medicine specialty is not identified for that patient. After reasoning, however, the new inferred link makes this possible. In the current version of SEPHYRES, this type of reasoning is done with a reduction coefficient K from the children toward their parent. The same type of inference can also be used in the semantic relationship between a disease and the hierarchy of body parts, for example when the relation of pain in the Upper Abdomen spreads toward its parent concept, Abdomen.

To complete the picture, another type of reasoning should be noted, called sibling reasoning. It means spreading weight to sibling concepts in the hierarchy: conceptually, if a medical specialty is related to most of the children of a parent, it is taken to be related to the remaining children as well (Figure 6, sign x).[16]
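The upward and sibling variants can be compared on the Internal Medicine example. The sketch below is hedged throughout: the child disease names and their weights are made up, K is assumed, taking the strongest child in the K-reduction variant is our assumption (the paper does not say how several children are aggregated), and the sibling rule with its threshold is one hypothetical reading of the description above.

```python
K = 0.5  # assumed reduction coefficient


def upward_average(child_weights):
    """Averaging variant: mean over all children, unlinked children counted as 0."""
    return sum(child_weights.values()) / len(child_weights)


def upward_reduced(child_weights, k):
    """K-reduction variant (assumed aggregation): strongest child weight scaled by k."""
    return k * max(child_weights.values())


def sibling_spread(child_weights, k, min_linked_fraction=0.5):
    """If more than min_linked_fraction of the children are linked, give the
    unlinked siblings k times the mean weight of the linked ones (assumed rule)."""
    linked = [w for w in child_weights.values() if w > 0]
    if len(linked) / len(child_weights) <= min_linked_fraction:
        return dict(child_weights)
    fill = k * (sum(linked) / len(linked))
    return {child: (w if w > 0 else fill) for child, w in child_weights.items()}


# Internal Medicine linked to five of six children of "gastrointestinal diseases".
# Names and weights are illustrative only.
internal_medicine = {
    "gastritis": 0.6,
    "peptic ulcer": 0.6,
    "pancreatitis": 0.6,
    "cholecystitis": 0.6,
    "colitis": 0.6,
    "hemorrhoids": 0.0,  # not linked by the expert
}

parent_avg = upward_average(internal_medicine)
parent_red = upward_reduced(internal_medicine, K)
after_siblings = sibling_spread(internal_medicine, K)
print(parent_avg, parent_red, after_siblings["hemorrhoids"])
```

Either upward variant gives the parent concept a non-zero weight, which is what allows a query phrased at the level of the parent to reach the specialty.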


Figure 6: The upward reasoning process by weight spreading toward parent.

After the reasoning processes based on weight spreading and the general semantic reasoner were applied, the newly inferred facts were added in triple form to the knowledge base (both in the semantic relationship between specialties and the hierarchy of diseases and between diseases and the hierarchy of pain location areas). The semantically enhanced knowledge base is then ready for SPARQL queries (SPARQL is the query language for semantic data).

Stage 4: Evaluation of the final model using case studies from MEDSCAPE and PubMed databases: Since the added value of the proposed method is the addition of expert semantic modules to build a medical pseudo-diagnostic engine, the evaluation focused mainly on the power of that engine. The initial assessment of the pseudo-diagnostic engine was done by statistical averaging over 20 queries based on case studies listed in the MEDSCAPE and PubMed databases (Appendix 3). These medical case studies contain the patient's history together with the correct diagnosis, so the level of agreement between the correct diagnosis of each case study and the SEPHYRES diagnosis was calculated as an average over the 20 diagnostic queries. This averaging was performed in 14 steps, with a different number of system outputs in each step. For example, in step 1 the system is assumed to return only one diagnostic output per query and the comparison is made against that single output, whereas in step 14 the average agreement is calculated over 14 diagnostic outputs per query. The case studies were collected by a keyword search for "Abdominal Pain" in the two databases, and 20 cases were selected after eliminating those whose correct diagnosis was outside the covered set of diseases. To build each query, the concepts associated with pain descriptions were extracted from the case study and entered manually in the SEPHYRES user interface. Finally, the early diagnosis results from SEPHYRES were compared with the correct diagnosis of the case study. In this process, only one diagnosis was considered correct; a SEPHYRES diagnosis that was not the correct one but was close to it in the disease categories was still counted as a false diagnosis. In this way, the judgment power of the SEPHYRES pseudo-diagnostic engine can be tested based on pain-describing concepts.
We believe that we have improved the accuracy and general performance of the recommender system using our pseudo-diagnostic engine model. Figure 7 shows the process of extracting the concepts and receiving the SEPHYRES output.


Figure 7: The process of extracting the concepts and receiving the SEPHYRES output.


Results

To evaluate the proposed pseudo-diagnostic engine and its performance, we measured precision and recall as used for systems that return ranked results.[18] Because in medicine we usually have a list of differential diagnoses rather than a single one, we applied the recall and precision retrieval evaluation method to assess how well the engine picks the correct diagnosis from the list. Depending on the number of differential diagnoses reported by the system, precision and recall were calculated on average over the 20 case studies, in 14 steps with a progressively increasing number of output records at each step. Table 2 and Figure 8 present recall as a function of the number of retrieved items, together with the precision/recall graph.

Number of Results | Precision | Recall
1 | 0.3 | 0.3
2 | 0.175 | 0.35
3 | 0.15 | 0.45
4 | 0.1125 | 0.45
5 | 0.1 | 0.5
6 | 0.0833 | 0.5
7 | 0.0786 | 0.55
8 | 0.0813 | 0.65
9 | 0.0778 | 0.7
10 | 0.075 | 0.75
11 | 0.0682 | 0.75
12 | 0.0625 | 0.75
13 | 0.0577 | 0.75
14 | 0.0607 | 0.85

Table 2: Average precision and recall versus the number of retrieved outputs.
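Because each case study has exactly one correct diagnosis, average precision at n retrieved outputs is simply average recall at n divided by n. The short sketch below makes that relationship explicit. It is an illustration of the evaluation arithmetic, not the authors' evaluation script, and the ranks used in the example are made-up values.

```python
def precision_recall_at_n(correct_ranks, n):
    """Average precision and recall over queries when n outputs are retrieved.

    correct_ranks holds, per query, the 1-based rank at which the single
    correct diagnosis appears, or None if the engine never returns it.
    """
    queries = len(correct_ranks)
    hits = sum(1 for rank in correct_ranks if rank is not None and rank <= n)
    recall = hits / queries            # one relevant item per query
    precision = hits / (n * queries)   # n retrieved items per query
    return precision, recall


# Made-up ranks for four queries, for illustration only.
ranks = [1, 4, None, 2]
p, r = precision_recall_at_n(ranks, 3)
print(p, r)
```

This also explains why precision stays low even when recall is high: with one relevant diagnosis per query, precision at n can never exceed 1/n.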


Figure 8: Recall versus the number of retrieved results, and the precision/recall graph.


Discussion

This study showed that a semantic representation of the concepts related to an individual's medical condition, focused on pain descriptions, combined with reasoning based on weight spreading in a conceptual network within an expert module, raises the recommender system's awareness of the medical condition, which is the most important contextual factor in the decision. The recall graph indicates that if the system is set, for example, to retrieve 10 diseases as the primary diagnosis, the recall of the differential diagnosis will be about 75%. Although we focused only on semantic descriptions of the pain symptom, the initial results seem satisfactory. In addition, to allow a strict comparison with the definitive diagnoses of the case studies, we considered only one diagnosis as the correct result, so each query had only one right selection. However, similar diseases could reasonably be accepted as correct, because they may share the same relevant specialist, which is what matters for suggesting a physician. Therefore, actual performance could be better than the assessment results indicate. The full model should consider the general patient profile in addition to this specialized profile; in the recommendation strategy of Figure 1, the suggestion is based on both profiles. Factors other than pain, such as other associated signs and symptoms, could improve the quality of advice; however, since the goal is to suggest a doctor (not a diagnosis) based on the information described by the patient, we avoided adding more complexity to the primary profile.

Another component of the SEPHYRES engine is facet management. The facets include the importance of matching the physician's expertise to the patient's age group (w1), the importance of a subspecialist relative to a specialist (w2), and the importance of the proximity of the doctor's office (w3). They are applied as a linear combination with the base weight (the weight reflecting how strongly the individual's problem is associated with each specialty). In SEPHYRES, these weights are exposed to users as optional settings, which puts the user in the loop of the recommendation strategy. The linear combination formula is:

Final Weight=Base Weight+w1 (Base Weight)+w2 (Base Weight)+w3 (Base Weight)+...

We must emphasize that a linear combination with boosting factors can also be problematic. A linear combination of weights and the averaging of different facets should never end up recommending a paediatrician to an adult (because of other boosting factors such as proximity). Therefore, we should also apply other restrictions to this combination or use a nonlinear function instead of a simple factor. There are further challenges as well. Pain in some parts of the body may be related to a specific specialty because of gender or age, and not solely because of the location and characteristics of the pain. In SEPHYRES, these factors are taken into account at a later stage.
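The linear combination and the paediatrician problem can be shown in a few lines. The sketch below is illustrative only: the candidate records, the facet values, and the hard eligibility filter are assumptions, and the filter is just one possible form of the "other restrictions" mentioned above, not a mechanism the paper describes.

```python
def final_weight(base_weight, facet_weights):
    """Final Weight = Base Weight + w1*(Base Weight) + w2*(Base Weight) + ..."""
    return base_weight + sum(w * base_weight for w in facet_weights)


def rank_physicians(candidates, patient_is_adult):
    """Apply a hard age-group restriction first (assumed), then rank by final weight."""
    eligible = [c for c in candidates if not (patient_is_adult and c["paediatric"])]
    return sorted(
        eligible,
        key=lambda c: final_weight(c["base"], c["facets"]),
        reverse=True,
    )


# Hypothetical candidates; facets are [w1 age match, w2 subspecialty, w3 proximity].
candidates = [
    {"name": "paediatrician", "base": 0.5, "facets": [0.0, 0.0, 0.9], "paediatric": True},
    {"name": "gastroenterologist", "base": 0.6, "facets": [0.2, 0.1, 0.0], "paediatric": False},
]

# Without any restriction, proximity alone pushes the paediatrician to the top.
unfiltered = max(candidates, key=lambda c: final_weight(c["base"], c["facets"]))
ranked = rank_physicians(candidates, patient_is_adult=True)
print(unfiltered["name"], ranked[0]["name"])
```

Filtering before ranking keeps the linear formula unchanged while guaranteeing that no boosting factor can override a categorical mismatch.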

Another limitation of this study is that the patient's primary profile was built under laboratory conditions, so the results cannot demonstrate the model's applicability in the real world. For example, when describing pain, patients do not use the official medical divisions and terminology. An interactive graphical interface and a natural language-based engine could therefore improve both the usefulness of the results and the quality of the design model. In addition, supporting fuzzy concepts in the search, such as an experienced physician, an academic physician, or proximity, could help improve the system's impact and user satisfaction.


Conclusion

In this paper, we proposed a new model, called "SEPHYRES", to recommend the best-matching specialist to a patient with abdominal pain. This was achieved using recommender system techniques based on the semantic web, expert systems, and concepts from the practical knowledge of medical informatics. The model deploys semantic reasoning techniques based on weight spreading in conceptual networks to discover new facts that were ignored in previous developments, by focusing on the semantic representation of the domain concepts and their weighted relationships. By developing the patient's profile from semantic links, the SEPHYRES knowledge-based recommender engine makes it possible to provide more accurate advice to patients than traditional methods that rely only on syntax matching. The preliminary evaluation of SEPHYRES, based on its primary pseudo-diagnosis engine, showed fairly acceptable results.

To continue this investigation, we need additional information from our experts to feed more detailed, evidence-based descriptions into SEPHYRES. In the next steps, we aim to identify the semantic relationships between specialties, diseases, and estimated disease prevalence through interviews with the local experts. In addition, we plan to use proximity as an additional weight dimension in physician recommendation by adding geographic information and variables based on geographical information systems (GIS). We are also designing a new GUI to improve user interaction with the system.

We recommend that future researchers upgrade the SEPHYRES engine by turning it into a real diagnostic engine using a limited selection of disease profiles.

