Fotopedia – the rating system (2/2)

Last modification: 28-Nov-10

After all this background discussion in part 1, it’s time to get more concrete about actual problems and possible solutions…

Issue #1: Great photo – wrong article


Photo of Plaza de España (Seville) that was accidentally linked to Alcázar of Seville instead. (powered by Fotopedia)

A photo can accidentally be attached to an inappropriate article. Although the ranking system might ultimately fix this, it is more practical to inform the Fotopedia staff.

Nevertheless, “being” an encyclopedia requires a lot of focus on reliability. An extreme example is the Fotopedia article on Luxor Temple in Egypt: when I checked, at least 7 of the 14 Top pictures for Luxor Temple showed various other temples in the Luxor area. Luxor Temple is quite large, so the most convincing way to prove an image is not of Luxor Temple is to show that it depicts another well-known temple (e.g. Karnak). Although an error rate of 50% is unusually high, my estimate is that a few percent of travel photos are incorrectly classified, and that more are correct but imprecise.

What might have caused the errors in this case:

  • There are several ancient temples in/near Luxor (collectively known as “Ancient Thebes with its necropolis” in the Heritage project). People usually visit multiple temples on the same day, they look similar and the photographer needs to check closely (e.g. capture times) to determine which photo was taken where.
  • Only one temple at Luxor is known as Luxor Temple, while the others may have been correctly keyworded as “Luxor” and “temple”.
  • Many photos show just a fragment of the subject (here is an example which I suspect is actually from Medinet Habu).

The incorrect photos may actually all be fine photos and thus receive high ratings, like the above example from Seville (which was “right city, wrong building”). Many ratings will come from people who are not very familiar with the subject and will simply assume that the images are classified correctly. For some topics the risk of errors is lower (baseball, Venice, Rolls-Royce) and for others it is higher (animal species, buildings, mountains, saints). So here are some suggestions for how this might be avoided:

  1. Encourage providing captions and links. A photo with a filled in caption or linked to more than one article should generally be rewarded. The photo is better documented, and thus has more value to encyclopedia users (e.g. students, enthusiasts, maybe occasional scholars). I expect that more attention to documenting pictures will raise awareness about information accuracy – and make it easier to detect errors.
    The goal of documenting images better is to be able to distinguish between “Mount Baker is the mountain on the left” and “View from Mount Baker”. And to give both voters and users of the photo more reliable information than having to guess based on the article where they found the photo.
  2. Show captions and links during voting. How can you judge the suitability of a photo for an encyclopedia if you don’t know how well the image was documented? You may also miss information that is relevant to understanding what you are seeing. For example, take a typical National Geographic travel photo or World Press nature photo and judge it without access to the caption: chances are you will not appreciate what you are seeing.
  3. Vote per link. If a photo is linked to 3 articles, vote for all contexts at once. You should be aware of them anyway, to interpret what you are seeing (e.g. a photo of a neoclassic Excalibur car prominently parked in front of the Louis Vuitton shop on Union Square, San Francisco). This helps put more emphasis on the informational value.
  4. Separately rate Aesthetics & Information. It is helpful to separate ratings for aesthetics from ratings for information value. Although this means providing 2 numbers rather than one, it helps people vote more reliably: current voting is IMO mainly on aesthetics, and the encyclopedia side is undervalued during voting. This makes the images less useful for someone looking for more information, or for someone interested in visiting that location.
    Actually if an image is linked to 3 articles, only the Information rating needs to be repeated. The aesthetics rating can be reused across the articles.
  5. “Own photos” are safer. Photos that you took yourself could be given a small bonus compared to photos taken by someone else: captions or keywords from Flickr were not intended to have encyclopedia quality, 3rd-party photos are unlikely to be linked to all relevant articles, and the “uploader” cannot fill in the gaps in the available information.

As an example, let’s take the image above of the Plaza de España (Seville). Assume it is linked to Ibero-American Exposition of 1929 and Plaza de España (Seville). A voter would get the following 3 questions:

Visual quality: choose between 0/1/2/3/4/5/6/7
Context Plaza de España (Seville): 0/1/2/3/4/5/6/7/?
Context Ibero-American Exposition of 1929: 0/1/2/3/4/5/6/7/?

If you think it is an OK picture but it looks useful for the Plaza de España and you have no clue what “Ibero-American Exposition of 1929” is all about (which is OK), you might rate it “4” and “5” and “?” respectively. That is more work than answering a single question. But you actually rated the image for two different articles (which is currently very awkward to do). And you provided more precise information, allowing smart software to learn more about the photo than if a single scale had been used.
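As a sketch of how such a vote might be stored, here is a toy data model (names and structure are invented for illustration, not Fotopedia code) in which one vote carries a visual-quality score plus one optional information score per linked article, with “?” stored as `None`:

```python
# Toy model of a per-context vote: one visual score plus an optional
# information score per linked article ("?" is stored as None).
# All names here are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Vote:
    visual: int                                   # 0..7, always answered
    context: dict = field(default_factory=dict)   # article -> 0..7 or None

vote = Vote(
    visual=4,
    context={
        "Plaza de España (Seville)": 5,
        "Ibero-American Exposition of 1929": None,  # voter answered "?"
    },
)

# Only answered contexts contribute to per-article averages;
# "?" answers are simply skipped.
answered = {a: s for a, s in vote.context.items() if s is not None}
```

The point of keeping the unanswered contexts explicit is that “I don’t know” is distinguishable from “I rated this low”, which is exactly the extra information a smarter ranking algorithm could exploit.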

Fotopedia’s Adrian Measures pointed out that captions may be in a language that the reader can’t read fluently. Links might be handled more elegantly when Fotopedia becomes multi-lingual. I agree that manually translating captions into all possible languages is not worth the effort. But I still strongly prefer a caption in any language to no caption at all: for some languages I can guess what the caption means (e.g. by recognizing names and dates) – and if I really care, I can have software or a friend translate the caption. In particular, the presence of a good caption allows me to check the information or find additional information (e.g. to discover that this Roman statue was found in a site called Italica in Spain). Just a link to Italica is ambiguous.

Issue #2: Duplicate images

The Eiffel Tower article has roughly 60 images. Each of them is indeed of the Eiffel Tower, and I would have been proud if any one of them were mine. But many images are similar. Maybe 25 photographers submitted their best 1 or 2 images of the Eiffel Tower at night. There are multiple images of the Eiffel Tower with fireworks. There are multiple images looking straight up into the tower, etc. Almost none of the images is “unique” within the collection, because the Eiffel Tower can only be photographed in so many ways. Another example: two photos show the Colosseum in Rome reflected in a puddle, multiple show the Colosseum as background to an unrelated statue, many show the Colosseum at night, and many show the interior. If these photos had been taken by a single photographer, that photographer would have made a selection and would not have presented the same image or “trick” multiple times. However, because the photos come from different photographers, the rating process should help eliminate the overlap. Image overload is not ideal for the viewer – but you can argue that the viewer can stop browsing whenever they want, especially if the images are ranked from high to low rating. But this means that the first few great pictures will get viewed a lot, that the first few pictures may earn a +1, and that lower-rated pictures will largely get ignored because:

  • we don’t usually have patience for 50-100 Eiffel Tower pictures
  • you may award a +1 to the first Eiffel Tower picture you see with fireworks (cool!), but not give a +1 to the second or third image of Eiffel Tower with fireworks – even if the subsequent images are better than the first one.

So the current ranking system accidentally biases reviewers quite heavily towards images that were submitted early: older submissions have had more time to accumulate votes, and new images shown later in the list will be viewed less. And if they are viewed, they will likely be rated lower, because a new image is no longer new when you see it for the 2nd or 5th time. Some ideas how this can be improved:

  1. 1..7 scale
    Let users rank photos on a scale of 1 to 7, whereby 4 represents the “average” quality level of Fotopedia photos. This gives more information because you can assign an above-average photo a 5, 6 or 7. It also encourages people to use values below 4 without having to interpret this as “a really bad photo”: in fact, a rater should be encouraged to assign ratings below 4 as often as ratings above 4 (again a trick). This helps calibrate the rating scale across users, encourages users to use a wider range of values and encourages people to rate not only the best photos they look at.
  2. Use averages.
    If a photo gets ratings of 3, 4, 4, 4, 6 these ratings should be averaged to 4.2. A newer photo may only have ratings 4, 5 (thus with a higher average despite having fewer ratings). This solves the problem in the current system that old photos accumulate more points than new ones, and that newer photos may not even get seen because they are at the end of the list. This is unfair to new photos, and doesn’t really encourage photographers to submit photos for “older” articles.
  3. Avoid high-to-low presentation.
    If a viewer decides to rate images within an article, they should be presented in random order. This means that any image (old or new; good or bad; top or candidate) has an equal chance of getting rated. The viewer can view all the available images, but can also stop midway. Rating a single (e.g. featured) image is also OK, but rating multiple photos within an article is better from an accuracy and efficiency standpoint.
  4. Hide ratings
    When a viewer rates an image, don’t show its current rating before the user has given an opinion. Showing current rating influences the viewer and is considered bad practice in polling: either the voter follows the opinion of others, or votes extreme to “correct” the average opinion of others.
  5. Curator can cluster photos
    Have an “expert” (curator/editor/volunteer…) with knowledge of or interest in the topic indicate which photos within an article are similar. This can be used for generating smaller selections (e.g. above 4 or even 5) which don’t contain similar photos. The person doing the clustering doesn’t directly define which photo goes into a selection, but essentially says “only the best photo in this cluster will get into the selection”. To be decided: what to do with a cluster that doesn’t contain strong enough pictures? Does the best one still make it to the selection because it shows a specific aspect, or do none of the photos in the cluster make it because they are not good enough?
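The difference between summing votes (point 2 above) and averaging them can be made concrete with a toy calculation (illustration only, not Fotopedia code):

```python
# Toy illustration of averaged vs. summed ratings on a 1..7 scale.
# The vote lists are invented for illustration.
old_photo_votes = [3, 4, 4, 4, 6]   # accumulated over a long time
new_photo_votes = [4, 5]            # recently submitted

def average(votes):
    """Mean rating; avoids rewarding photos merely for being old."""
    return sum(votes) / len(votes)

# Under summation the old photo always wins (21 vs. 9 points);
# under averaging the newer photo ranks higher (4.5 vs. 4.2)
# despite having fewer votes.
assert sum(old_photo_votes) > sum(new_photo_votes)
assert average(new_photo_votes) > average(old_photo_votes)
```

This is the core fairness argument: a sum measures exposure time as much as quality, whereas an average (once a photo has a handful of votes) measures quality alone.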

As an example of the clustering, I took the 19 photos for Philae (an Egyptian temple near Aswan) and created some example clusters. Each horizontal row in the illustration is a cluster. The leftmost image in each row had the highest score (at the time). This means that the image in the left column would be used to represent its entire row.

Example of clustering (photos from Philae article)

Note that the clustering only impacts the generation of selections: all photos are still available for those who want to see them all. In my proposal, a combination of a threshold value and cluster-based filtering would generate a Selection (formerly Top) from the Collection (formerly All = Top + Candidates).
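That combination of a threshold and one-best-per-cluster filtering can be sketched in a few lines (cluster names, photo ids and scores below are invented for illustration):

```python
# Sketch of Selection generation: keep only photos above a rating
# threshold, then keep only the best photo per cluster.
# Photo ids, cluster names and scores are invented for illustration.
photos = [
    ("p1", "night",   5.8),
    ("p2", "night",   5.1),   # similar to p1, lower rated
    ("p3", "closeup", 4.6),
    ("p4", "closeup", 3.2),   # below threshold
    ("p5", "aerial",  3.9),   # below threshold
]

def selection(photos, threshold=4.0):
    best = {}  # cluster -> (photo id, score)
    for pid, cluster, score in photos:
        if score >= threshold:
            if cluster not in best or score > best[cluster][1]:
                best[cluster] = (pid, score)
    return sorted(pid for pid, _ in best.values())

# p2 is filtered out because p1 beats it within the same cluster;
# p4 and p5 fall below the threshold.
```

Raising the threshold per article (as suggested for large Collections) just means passing a different `threshold` value; the clustering data itself never has to change.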

As an example of an extreme need for captions, see the following photo:


This photo is attached to French Campaign in Egypt and Syria and as a candidate to Graffiti.

Here is a photo I pasted under Graffiti – so it is among lots of colorful wall paintings. The photo shows graffiti made in 1799 by a team of scientists sent by Napoleon to explore Egypt (they were incidentally protected by French soldiers, leading to the famous order “Scientists and donkeys in the center!”).

The photo currently rates +3 in this context – which is not bad given that it deviates significantly from its neighbors. The photo might be very valuable to a small set of viewers (e.g. if you want to write a book on graffiti), but it can be irrelevant or easily misinterpreted by others. For such images, I provide a caption explaining what the photo shows – but currently a reviewer normally doesn’t see the caption.

Issue #3: Ranking in context & article hierarchies

As explained above, images are ranked in the context of an article. The Information Quality ratings can differ per article. But in real examples the articles can be coupled through hierarchies such as geography (Notre Dame -> Île de la Cité -> Paris) or taxonomies (White-bellied Sea Eagle -> Sea Eagle -> Eagle). When an image occurs in 2 or more articles, it can be smart to use rating information from one context within the other. Some suggestions:

  1. Aesthetics independent of context. It sounds safe to assume that the aesthetics of an image are identical in all contexts. This can give free and accurate rating information: store the aesthetic rating with the photo itself rather than with the link between the photo and an article.
  2. Inheritance. It would be safe to inherit “relevance” ratings across contexts if there is a hierarchy relationship defined between them (as in Fotopedia Projects). Very relevant photos of the Eiffel Tower are also relevant at the level of Paris, and may even be somewhat relevant at the level of France. Software can find the best Paris pictures by finding the best pictures of things-in-Paris.
  3. Link to the lowest levels. A general guideline would be to link a photo to the most precise level that is known. Don’t link to Paris or France when you can be more specific. When you link to the Eiffel Tower, don’t also link to Paris > France > Europe. Something similar applies to Cattle Egret > Egret > Bird > Animal. Leave the propagation of good photos to higher levels to the software. Some pictures may have to be added to higher levels manually, but that is something for the Fotopedia staff.
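The "propagation" idea can be illustrated with a small recursive sketch (the article tree, photo ids and ratings are invented for illustration): photos are attached only at the most precise level, and higher levels are computed.

```python
# Sketch of propagating top photos up an article hierarchy.
# The hierarchy, photo ids and ratings are invented for illustration.
children = {"Paris": ["Eiffel Tower", "Notre Dame"]}
ratings = {
    "Eiffel Tower": [("e1", 6.5), ("e2", 5.9)],
    "Notre Dame":   [("n1", 6.1)],
}

def best_photos(article, k=2):
    """Top-k photos: directly attached ones plus the best of each child."""
    pool = list(ratings.get(article, []))
    for child in children.get(article, []):
        pool.extend(best_photos(child, k))
    return sorted(pool, key=lambda p: p[1], reverse=True)[:k]

# "Paris" has no directly attached photos; its best photos are
# computed automatically from the articles below it.
```

Nothing needs to be linked to Paris manually here: the software derives the Paris selection from the things-in-Paris ratings, which is exactly the division of labor suggested above.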

Putting it all together

Let me try to summarize what the above ingredients would look like if you combine them. Note that numerous variations are possible, so just interpret this as an example and a general direction:

  • Ratings are selected by the reviewers on a scale of 1 to 7
    • Negative values are avoided for psychological reasons
    • The value 4 should correspond to “average level of quality in Fotopedia” and a user should assign ratings that roughly average out to 4.
      • Show the user the average of his/her ratings on the profile page as feedback.
  • Assign separate ratings for (A)esthetics and for (I)nformation
    • The A-rating is attached to the photo (rather than to the photo in the context of one specific article).
    • The I-rating is attached to the photo in the context of one specific article
    • The user interface could make it easy to provide multiple I-ratings when a photo is linked to multiple articles. Linking a photo to multiple articles is encouraged because this reuses the photo, links the articles and provides documentation about the photo.
      • You don’t have to provide an I-rating (use “?” as default value).
        So you can provide I-ratings for only those subjects that you feel comfortable about
      • This increases the amount of information collected per minute that the user spends on rating.
    • Photos can be ranked based on their received A and I ratings.
      • The exact function can start out with Rating=(A+I)/2 when I is available (else Rating=A). Improved functions can be introduced later.
    • Ratings from multiple people are averaged rather than summed.
      • Scores for a photo on a less popular topic can thus be directly compared to scores for a photo on a more popular topic.
    • Whenever a user votes on a photo, the photo’s overall rating should not be shown before the user votes. The photo’s rating should be shown directly after voting – including how much the rating changed due to the vote (e.g. A: 4.14 -> 4.25, I: 3.52 -> 3.41). Showing the change confirms that voting has impact. The impact of a vote will obviously decrease as more people have voted on that photo.
  • Every article has 2 sets of photos (based on a formula that uses A and I values)
    • Collection: all photos linked to an article, regardless of A and I rating.
      • Photos are only detached from an article if the photo has been incorrectly classified. This means the information photo-belongs-to-article is saved regardless of the photo’s rating history. The current system has a design bug: info is lost when the rating drops to -1 and the photographer or curator subsequently removes the photo from that article.
    • Selection: a subset containing the best photos within the Collection
      • The Selection can be determined dynamically based on Ratings (already the case) and on manual filtering (new) to avoid comparable images within the Selection. The article curator can manually group similar images within the Collection into clusters: only the highest-rated image from each cluster is shown in the Selection. Curators can adjust the Selection threshold (old) or the Selection size per article (new): top-25 for the Eiffel Tower; top-100 for France; top-50 for Portrait; top-5 for Harley-Davidson.
      • The Selection is similar to Top, but photos in the Selection get the same treatment as photos outside the Selection. It is like asking “show me the top-5 per article” or “show me the top-20 per article”.
      • Clustering is optional: clustering data can be added at any time. The system would work without clustering. Clustering makes sense mainly for large Collections.
  • Every article has a curator (or whatever the name of the role is). Responsibilities:
    • Cluster similar images (to control the Selection somewhat)
    • Keep an eye on incorrect or inappropriate images & handle complaints
    • Manage projects in which the article occurs
  • Dealing with hierarchies
    • discourage attaching a photo at unnecessarily high hierarchy levels: don’t attach a Dove to Bird or Animal.
    • instead the rating system is used to compute what photos are pushed up
    • example: Pisa, Rome, Florence, Naples are part of Italy project. The highest ranked Selection photos from Pisa, Rome, Florence, Naples are pushed up to the Italy level.
    • The number of photos pushed up could use a similar criterion as the Selection
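The proposed rating function from the summary above (start with Rating = (A+I)/2 when I-ratings exist, else Rating = A, with averaging rather than summing) can be sketched as follows; the function name is mine, and the starting formula is explicitly meant to be replaced by something smarter later:

```python
# Sketch of the proposed combined rating: average the A-votes and the
# per-article I-votes, then combine as (A+I)/2; fall back to A alone
# when every voter answered "?" for this article. Names are illustrative.
def combined_rating(a_votes, i_votes):
    a = sum(a_votes) / len(a_votes)
    if not i_votes:            # all voters answered "?" in this context
        return a
    i = sum(i_votes) / len(i_votes)
    return (a + i) / 2

# A photo with no I-ratings yet is ranked on aesthetics alone;
# once I-ratings arrive, they pull the combined rating up or down.
```

Because ratings are averaged, `combined_rating` for a two-vote photo is directly comparable to one for a two-hundred-vote photo, which is what makes Selections for old and new articles fair.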

Comments would be great

Feel free to comment below. Note that the comments are hierarchical (“threaded”), so please press Reply on the comment you want to respond to. Your comment then ends up directly below that comment, with an extra level of indentation.

Fotopedia – the rating system (1/2)

Last modification: 2-Oct-10

Fotopedia’s system for ranking the quality and suitability of photos is based on counting votes. This results in cumulative ratings like +2 (few people have seen the image, or maybe people don’t like it), +22 (a more popular image), or -2 (some people care enough to vote it down). Fotopedia’s rating system has multiple purposes:

  1. It helps remove less interesting or less relevant photos.
  2. It results in a ranking among the stronger photos (e.g. to view the “best” ones).
  3. It helps motivate the photographers.

If the system for rating and ranking works well, users looking for information find great photos, and photographers and raters stay motivated and keep contributing their photos and energy. A good ranking system should thus help Fotopedia outperform alternative and more straightforward tools (e.g. Flickr, Google) when you are interested in high-quality photos that illustrate a particular subject.

Ratings (e.g. +1) are valid in the context of an article ("Black cat").

Purpose of this posting

Fotopedia plans to update their rating/ranking/voting system. This posting analyzes the “old” ranking system, presents some general ideas for improvements, and hopes to trigger some good discussion on the topic. The second part of this posting becomes a bit more concrete about what an improved rating and ranking system could look like.

Learning from the search engines

Search engines such as Google face a similar challenge: they need to present the most relevant Internet pages for a search query at the top of the list of results. This helps the user find results quickly.

Initially such page-ranking algorithms selected pages based on keyword matching: if you search for apple boot camp you would find pages which contain all 3 words. But Google in particular excels at guessing a page’s relevance by looking at the other pages that link to that page. Google’s proprietary PageRank algorithm uses the estimated ratings of linking pages to rank the linked page. The assumption is that if a page is interesting, other pages will link to it. And if a page is really interesting, other interesting pages will link to it.

So Google essentially uses information that is automatically harvested from the Internet to guess which pages fit your search best. Note that this means that there is no need for people to answer questions about the pages in order to find out which ones are good. The entire system is thus automated (albeit with the use of vast compute resources).
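To make the "links as votes" idea concrete, here is a highly simplified PageRank-style power iteration on a three-page toy web (real PageRank adds dangling-node handling, personalization and enormous scale; this sketch only shows the recursive principle):

```python
# Highly simplified PageRank-style power iteration on a 3-page toy web.
# links maps each page to the pages it links to.
links = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}

def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            # Each page distributes its rank evenly over its out-links.
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

r = pagerank(links)
# "b" is linked from both "a" and "c", so it ends up ranked highest.
```

The relevant lesson for Fotopedia is that the scores emerge from cheap, already-available signals (links) rather than from asking people explicit questions about every page.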

Fotopedia’s challenge is different: if you are looking for photos of Grapes, the software requires humans to link photos to the Grape article. And the software needs additional human help to judge the quality and relevance of the submission.

Despite differences between searching for text and searching for photos, we can still learn something from the search engines:

  1. Ranking algorithm accuracy is critical: Google kept competition (remember AltaVista?) at bay largely by using smarter ranking algorithms.
  2. You might be able to get hints about photo quality/relevance by using available data in a smarter way. It is OK for the ranking system to be a bit fancy, and the exact method of scoring doesn’t need to be visible to the users – it is only important that users see enough of the ranking system to know that their votes make a difference and to believe that the system is fair.

And obviously, the quality of the questions asked to users about photos is critical in determining how much you can learn about the suitability of the photos, and how long it takes the system to learn what it can learn. Both points (learning the right things, and learning them with as little user input as possible) are areas that could be improved.

The “old” Fotopedia rating system

A user can vote once on any photo (per article in which it is used). Giving it a “thumbs up” adds +1 to the score or rating of the photo; giving it a “thumbs down” decreases the rating by 1. Note that the rating only applies to that photo in the context of one specific article – even though the photo may be linked to multiple articles.

My photo of the violin of a famous 19th century Norwegian violinist scored +5 for the article on the violinist, it also rated +5 for the article about the violin’s builder, but it received a -1 rating as a general picture of a violin.

Scores of +5 or more (counting the original poster) currently promote an image from Candidate to Top, meaning that it becomes an official photo for that article. Enough subsequent thumbs-downs can kick the photo out of the list of Top photos. The threshold between Candidate and Top is sometimes manually set by Fotopedia staff to higher values like +10 when there are numerous  submissions (e.g. Flower has 300 submissions).
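The Candidate/Top mechanics described here reduce to a few lines (a toy model of the behavior described above, not Fotopedia's actual code):

```python
# Toy model of the "old" cumulative vote system: the score starts at
# +1 (the original poster counts), each thumbs up/down shifts it by
# one, and the per-article Candidate/Top threshold (default +5) can
# be raised by staff for crowded articles.
def status(votes, threshold=5, start=1):
    score = start + sum(votes)
    return score, ("Top" if score >= threshold else "Candidate")

# Four thumbs up promote a photo to Top; a later run of thumbs down
# can push the same photo back to Candidate.
```

Seen this way, the system's weakness is obvious: `score` grows with exposure time, so the model has no way to distinguish a mediocre old photo from an excellent new one.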

This photo was linked to the violinist, the violin maker and to Violin.

Moderators are empowered to give larger rating boosts – mainly to shorten the learning process (which can now take months or years). I am not aware that moderators (can) reduce the score of a photo by more than one. But they can undo the link between a photo and an article if the photo is unsuitable or has had a negative score for a long time.

A smarter ranking algorithm needed

Unlike Google, Fotopedia currently does its ranking using very simple algorithms on manually provided information. Currently a user only has two ways to influence a photo’s rating and ranking:

  1. increase the rating by one
  2. decrease the rating by one

That means everybody (except for moderators?) has the same impact – regardless of their track record or qualifications. Examples of information that is not used:

  • is the +1/-1 choice because of visual- or content reasons?
  • how knowledgeable is the photographer/submitter? (e.g. New Yorker supplying a New York photo)
  • how knowledgeable is the voter? (e.g. New Yorker voting on a New York photo)
  • say a photo is linked to both the “Gondola” and the “Venice” articles. If the photo is rated in one context, the rating in the second context is unchanged.
  • is the photo better/worse/similar to existing photos in the set?
  • is the photo exceptional, excellent, good, average, etc. according to the voter?

What kind of photos should rank high?

It helps to be explicit (and agree) on what we are trying to measure and what we are trying to achieve with the measurement.

The question currently asked to a voter is:

“Is this a great photo to illustrate this article?”

This helps, but doesn’t tell us whether we are looking for a great-looking photo that is somehow relevant to the article, or for a photo that adds significant information to the article but may not be visually great. So I believe it is important to answer whether Fotopedia mainly aims to be

  • a reliable source of information and show interesting aspects of the topic (goal is to be an “encyclopedia”),
  • a source of visually pleasing pictures showing the subject and showing it in an interesting way (goal is to be a series of “coffee table books”), or
  • both of these at the same time?

The Quality Chart currently says:

The world isn’t perfect. You might feel the need to represent it differently using Photoshop and your artistic talent. The encyclopedia isn’t the place to express such needs. We illustrate the world as it is, in all its beauty and ugliness. Artistic and overprocessed photos (including HDR) don’t belong.

Although this may suggest that Fotopedia images should be first and foremost informative, Fotopedia’s derivative iPad applications like “Heritage” rely heavily on the visual side. The current lack of emphasis on captions and the set of article links when rating a photo also suggest that the visual side is currently getting more attention.

Salon, library or both?

My assumption in the rest of this discussion piece is that an ideal Fotopedia photo should be both informative and visually pleasing – although I can’t define either precisely. This means that Fotopedia would target both the salon’s coffee table and the library’s bookshelf, so to speak.

The photo is linked to the Lighthouse article and is visually pleasing. It clearly illustrates what a lighthouse is and does, but information about location is missing.

Being “informative” and accurate is needed if Fotopedia aims to serve as a visual wrapper around, or companion to, Wikipedia. But I believe “attractive” is also needed because:

  1. Even newspapers select photos (“President holding speech”) on both criteria. Newspapers are commercial products and readers prefer newspapers with “nice” pictures. The same applies to Fotopedia.
  2. Fotopedia plans to use its image collection to publish more “coffee table books” (like Fotopedia Heritage for the iPad). By definition, coffee table books should attract casual browsing and depend heavily on picture and graphical design quality.
  3. The vast majority of Fotopedia photos are already visually attractive. In fact, I wouldn’t be surprised if many voters decide to vote +1 mainly when the picture is nice (e.g. “good enough to hang on the wall”) and on-topic – but in that order!
  4. The rating is critical to motivate photographers to submit images. The rating system shouldn’t be too different from how photographers and their customers (editors, clubs, relatives) rate images – especially for documentary images. A photographer (e.g. for National Geographic) will strive to support the article’s text with attractive pictures.

A final example

An example of emphasis on aesthetics without worrying about the encyclopedia goals is a series of photos of fruit being dropped into water. This makes for visually interesting photos, and one such photo actually earned the highest ranking within the Grape article. I would say it is clearly less suitable as an encyclopedia photo because the water and the splashing don’t convey anything relevant: the photo doesn’t tell me about grapes, how they are grown or how they are used. So the ranking suggests that people often rate pictures on their photographic merits while ignoring the informational merits.

A visually attractive and technically challenging photo, but unsuited to illustrate grapes.

[ see continuation in part 2 of this article ]