Last modification: 28-Nov-10
After all this background discussion in part 1, it’s time to get more concrete about actual problems and possible solutions…
Issue #1: Great photo – wrong article
Photo of Plaza de España (Seville) that was accidentally linked to Alcázar of Seville instead.
A photo can accidentally be attached to an inappropriate article. Although the ranking system might ultimately fix this, it is more practical to inform the Fotopedia staff.
Nevertheless, “being” an encyclopedia requires a lot of focus on reliability. An extreme example is the Fotopedia article on Luxor Temple in Egypt: when I checked, at least 7 of the 14 Top pictures for Luxor Temple showed various other temples in the Luxor area. Luxor Temple, however, is quite large, so the most convincing way to prove an image is not of Luxor Temple is to prove that it depicts another well-known temple (e.g. Karnak). Although an error rate of at least 50% is unusually high, my estimate is that a few percent of travel photos are incorrectly classified, and that more are correct but imprecise.
What might have caused the errors in this case:
- There are several ancient temples in/near Luxor (collectively known as “Ancient Thebes with its necropolis” in the Heritage project). People usually visit multiple temples on the same day and the temples look similar, so the photographer needs to check closely (e.g. capture times) to determine which photo was taken where.
- Only one temple at Luxor is known as Luxor Temple, while the others may have been correctly keyworded as “Luxor” and “temple”.
- Many photos show just a fragment of the subject (here is an example which I suspect is actually from Medinet Habu).
The incorrect photos may actually all be fine photos and thus receive high ratings, like the above example from Seville (that was “right city, wrong building”). Many ratings will be from people who are not very familiar with the subject and so will simply assume that the images are classified correctly. For some topics the risk of errors may be lower (baseball, Venice, Rolls Royce) and for others it is higher (animal species, buildings, mountains, saints). Some suggestions for how this might be avoided:
- Encourage providing captions and links. A photo with a filled in caption or linked to more than one article should generally be rewarded. The photo is better documented, and thus has more value to encyclopedia users (e.g. students, enthusiasts, maybe occasional scholars). I expect that more attention to documenting pictures will raise awareness about information accuracy – and make it easier to detect errors.
The goal of documenting images better is to be able to distinguish between “Mount Baker is the mountain on the left” and “View from Mount Baker”. And to give both voters and users of the photo more reliable information than having to guess based on the article where they found the photo.
- Show captions and links during voting. How can you judge the suitability of a photo for an encyclopedia if you don’t know how well the image was documented? You may also miss information that you need in order to know what you are seeing. For example, take a typical National Geographic travel photo or World Press nature photo and judge it without access to the caption: chances are you will not appreciate what you are seeing.
- Vote per link. If a photo is linked to 3 articles, vote for all contexts at once. You should be aware of them anyway, to interpret what you are seeing (e.g. photo of a neoclassic Excalibur car prominently parked in front of the Louis Vuitton shop on Union Square, San Francisco). This helps put more emphasis on the info value.
- Separately rate Aesthetic & Information. It is helpful to separate ratings for aesthetics and for information value. Although this means providing 2 numbers rather than one, it helps get people to vote more reliably: current voting IMO is mainly on aesthetics and the encyclopedia side is undervalued during voting. This means that the images are less useful for someone looking for more information, or for someone interested in visiting that location.
Actually if an image is linked to 3 articles, only the Information rating needs to be repeated. The aesthetics rating can be reused across the articles.
- “Own photos” are safer. Photos that you made yourself could be given a small bonus compared to photos made by someone else, essentially because captions or keywords from Flickr were not intended to have encyclopedia quality, because 3rd party photos are unlikely to be linked to all relevant articles and because the “uploader” cannot fill in the gaps in the available information.
As an example, let’s take the image above of the Plaza de España (Seville). Assume it is linked to Ibero-American Exposition of 1929 and Plaza de España (Seville). A voter would get the following 3 questions:
Visual quality: choose between 0/1/2/3/4/5/6/7
Context Plaza de España (Seville): 0/1/2/3/4/5/6/7/?
Context Ibero-American Exposition of 1929: 0/1/2/3/4/5/6/7/?
If you think it is an OK picture that looks useful for the Plaza de España, and you have no clue what “Ibero-American Exposition of 1929” is all about (which is OK), you might rate it “4”, “5” and “?” respectively. That is more work than answering a single question. But you actually rated the image for two different articles (which is currently very awkward to do). And you provided more precise information, allowing smart software to learn more about the photo than if a single scale had been used.
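To make the three-question ballot concrete, here is a minimal sketch of how one voter's answers could be recorded. It is only an illustration: the `Ballot` class, its field names and the use of `None` to represent “?” are assumptions of mine, not anything Fotopedia actually implements.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SCALE = range(0, 8)  # the example ballot above offers 0..7


@dataclass
class Ballot:
    """One voter's answers for one photo (hypothetical structure)."""
    visual: int  # visual quality, 0..7
    # article title -> 0..7, or None when the voter answered "?"
    context: Dict[str, Optional[int]] = field(default_factory=dict)

    def __post_init__(self):
        if self.visual not in SCALE:
            raise ValueError("visual rating must be 0..7")
        for article, value in self.context.items():
            if value is not None and value not in SCALE:
                raise ValueError(f"rating for {article!r} must be 0..7 or None")

    def answered_contexts(self):
        """Articles for which the voter gave a number rather than '?'."""
        return [a for a, v in self.context.items() if v is not None]


# The ballot described in the text: "4", "5" and "?".
ballot = Ballot(
    visual=4,
    context={
        "Plaza de España (Seville)": 5,
        "Ibero-American Exposition of 1929": None,  # "?"
    },
)
```

Keeping “?” as an explicit value means a skipped context is never mistaken for a low rating.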
Fotopedia’s Adrian Measures pointed out that captions may be in a language that the reader can’t read fluently. Links might be handled more elegantly when Fotopedia becomes multi-lingual. I agree that manually translating captions into all possible languages is not worth the effort. But I still strongly prefer a caption in any language to no caption at all: for some languages I can guess what the caption means (e.g. by recognizing names and dates) – and if I really care, I can have software or a friend translate the caption. In particular, the presence of a good caption allows me to check the information or find additional information (e.g. to discover that this Roman statue was found in a site called Italica in Spain). Just a link to Italica is ambiguous.
Issue #2: Duplicate images
The Eiffel Tower article has roughly 60 images. Each of them is indeed of the Eiffel Tower and I would have been proud if any one of these were mine. But many images are similar. Maybe 25 photographers submitted their best 1 or 2 images of the Eiffel Tower at night. There are multiple images of the Eiffel Tower with fireworks. There are multiple images looking straight up into the tower, etc. Almost none of the images is “unique” within the collection because the Eiffel Tower can only be photographed in so many ways.
Another example: two photos show the Colosseum in Rome reflected in a puddle, multiple show the Colosseum as background to an unrelated statue, many show the Colosseum at night, many show the interior. If these photos had been taken by a single photographer, the photographer would have made a selection, and would not have presented the same image or “trick” multiple times. However, because the photos come from different photographers, the rating process should help eliminate the overlap.
Image overload is not ideal for the viewer – but you can argue that the viewer can stop browsing whenever they want – especially if the images are ranked from high to low rating. But this means that the first few great pictures will get viewed a lot, that the first few pictures may earn a +1 and lower rated pictures will largely get ignored because
- we don’t usually have patience for 50-100 Eiffel Tower pictures
- you may award a +1 to the first Eiffel Tower picture you see with fireworks (cool!), but not give a +1 to the second or third image of Eiffel Tower with fireworks – even if the subsequent images are better than the first one.
So the current ranking system accidentally biases reviewers quite heavily towards images that were submitted early: older submissions have had more time to accumulate votes, and new images further down the list will be viewed less. And if they are viewed, they will likely be rated lower because a new idea is no longer new when you see it for the 2nd or 5th time. Some ideas for how this can be improved:
- 1..7 scale
Let users rate photos on a scale of 1 to 7 (whereby 4 represents the “average” quality level of Fotopedia photos). Photo.net uses this convention. This gives more information because you can assign an above-average photo a 5, 6 or 7. It also encourages people to use values below 4 without having to interpret this as “a really bad photo”: in fact, a rater should be encouraged to assign ratings below 4 as often as ratings above 4 (again a photo.net trick). This helps calibrate the rating scale across users, encourages users to use a wider range of values and encourages people to not only rate the best photos they look at.
- Use averages.
If a photo gets ratings of 3, 4, 4, 4, 6 these ratings should be averaged to 4.2. A newer photo may only have ratings 4, 5 (thus with a higher average despite having fewer ratings). This solves the problem in the current system that old photos accumulate more points than new ones, and that newer photos may not even get seen because they are at the end of the list. This is unfair to new photos, and doesn’t really encourage photographers to submit photos for “older” articles.
- Avoid high-to-low presentation.
If a viewer decides to rate images within an article, they should be presented in random order. This means that any image (old or new; good or bad; top or candidate) has an equal chance to get rated. The viewer can view all the available images, but can also stop midway. Rating of a single (e.g. featured) image is also OK, but rating of multiple photos within an article is better from an accuracy and efficiency standpoint.
- Hide ratings
When a viewer rates an image, don’t show its current rating before the user has given an opinion. Showing the current rating influences the viewer and is considered bad practice in polling: either the voter follows the opinion of others, or votes to an extreme to “correct” the average opinion of others.
- Curator can cluster photos
Have an “expert” (curator/editor/volunteer…) with knowledge or interest in the topic indicate which photos within an article are similar. This can be used for generating smaller selections (e.g. above 4 or even 5) which don’t contain similar photos. The person doing the clustering doesn’t directly define which photo goes into a selection, but essentially says “only the best photo in this cluster will get in the selection”. Still to be decided is what to do with a cluster that doesn’t contain strong enough pictures: does the best one still make it to the selection because it shows a specific aspect? Or do none of the photos in the cluster make it because they are not good enough?
As an example of the clustering, I took the 19 photos for Philae (an Egyptian temple near Aswan) and created some example clusters. Each horizontal row in the illustration is a cluster. The leftmost image in each row had the highest score (at the time). This means that an image from the left column would be used to represent the other columns.
Note that the cluster only impacts the generation of selections: all photos are still available for those who want to see them all. In my proposal, a combination of a threshold value and cluster based filtering would generate a Selection (formerly Top) from the Collection (formerly All = Top + Candidates).
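The combination of threshold and cluster filtering could look roughly like the sketch below. The function name, the data layout and the default threshold of 4.0 are all assumptions for illustration. The sketch also settles the open question about weak clusters in one particular way: a cluster whose best photo falls below the threshold contributes nothing. The other choice would be just as easy to implement.

```python
def build_selection(collection, clusters, threshold=4.0):
    """Return the Selection: photos at or above `threshold`, at most one per cluster.

    collection: dict mapping photo id -> average rating
    clusters:   list of sets of photo ids that a curator marked as similar
    """
    # Map each clustered photo to the index of its cluster.
    cluster_of = {}
    for index, members in enumerate(clusters):
        for photo in members:
            cluster_of[photo] = index

    selection = []
    seen_clusters = set()
    # Walk from highest to lowest rating, so the first photo we meet
    # from any cluster is that cluster's best photo.
    ranked = sorted(collection.items(), key=lambda kv: kv[1], reverse=True)
    for photo, rating in ranked:
        if rating < threshold:
            break  # everything after this is below the threshold too
        cluster = cluster_of.get(photo)
        if cluster is not None:
            if cluster in seen_clusters:
                continue  # a better photo already represents this cluster
            seen_clusters.add(cluster)
        selection.append(photo)
    return selection


# Hypothetical Eiffel Tower collection with two curator-defined clusters.
collection = {
    "night_1": 5.2, "night_2": 4.8,
    "fireworks_1": 4.5, "looking_up": 3.6, "portrait": 4.1,
}
clusters = [{"night_1", "night_2"}, {"fireworks_1"}]
selection = build_selection(collection, clusters)
```

Photos that belong to no cluster are only subject to the threshold, which matches the idea that clustering is optional.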
As an example of an extreme need for captions, see the following photo:
This photo is attached to French Campaign in Egypt and Syria and as a candidate to Graffiti.
Here is a photo I pasted under Graffiti – so it is among lots of colorful wall paintings. The photo shows graffiti made in 1799 by a team of scientists sent by Napoleon to explore Egypt (they were incidentally protected by French soldiers, leading to the famous order “Scientists and donkeys in the center!”).
The photo currently rates +3 in this context – which is not bad given that it deviates significantly from its neighbors. The photo might be very valuable to a small set of viewers (e.g. if you want to write a book on graffiti), but can be irrelevant or easily misinterpreted by others. For such images, I provide a caption explaining what the photo shows – but currently a reviewer normally doesn’t see the caption.
Issue #3: Ranking in context & article hierarchies
As explained above, images are ranked in the context of an article. The Information Quality ratings can be different per article. But in real examples they can be coupled through hierarchies such as geography (Notre Dame -> Île_de_la_Cité -> Paris) or taxonomies (White-bellied_Sea_Eagle -> Sea_Eagle -> Eagle). When an image occurs in 2 or more articles, it can be smart to use rating information from one context within the other one. Some suggestions:
- Aesthetics independent of context. It sounds safe to assume that the aesthetics of an image is identical in all contexts. This can give free and accurate rating information: store the aesthetic rating with the photo itself rather than with the linkage between the photo and an article.
- Inheritance. It would be safe to inherit “relevance” rating across contexts if there is a hierarchy relationship defined between them (as in Fotopedia Projects). Very relevant photos of the Eiffel Tower are also relevant at the level of Paris, and may even be somewhat relevant at the level of France. Software can find the best Paris pictures by finding the best pictures of things-in-Paris.
- Link to the lowest levels. A general guideline would be to link a photo to the most precise level that is known. Don’t link to Paris or France when you can be more specific. When you link to the Eiffel Tower, don’t also link to Paris > France > Europe. Something similar applies to Cattle Egret > Egret > Bird > Animal. Leave the propagation of good photos to higher levels to the software. Some pictures may have to be added to higher levels manually, but that is something for the Fotopedia staff.
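As a rough illustration of “finding the best pictures of things-in-Paris”, the sketch below collects photos from an article and everything beneath it and keeps the highest rated ones. The function names and the dictionary layout are assumptions, the hierarchy is assumed to be a tree without cycles, and no discount is applied for distance in the hierarchy (the “somewhat relevant at the level of France” effect is left out).

```python
def descendants(article, children):
    """All articles below `article`. `children` maps parent -> list of child articles."""
    found = []
    stack = list(children.get(article, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found


def best_for(article, children, ratings, limit=3):
    """Top `limit` photos for `article`, using photos linked to it or to anything below it.

    ratings: article -> {photo id: rating in that article's context}
    """
    candidates = {}
    for node in [article] + descendants(article, children):
        for photo, rating in ratings.get(node, {}).items():
            # A photo linked at several levels keeps its highest rating.
            candidates[photo] = max(rating, candidates.get(photo, rating))
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [photo for photo, _ in ranked[:limit]]


# Hypothetical hierarchy and ratings.
children = {
    "France": ["Paris"],
    "Paris": ["Eiffel Tower", "Île de la Cité"],
    "Île de la Cité": ["Notre Dame"],
}
ratings = {
    "Eiffel Tower": {"eiffel_night": 6.1, "eiffel_day": 4.0},
    "Notre Dame": {"rose_window": 5.5},
    "Paris": {"skyline": 4.7},
}
paris_best = best_for("Paris", children, ratings, limit=3)
```

With this kind of propagation, a photographer only needs to link to the most specific article; the higher levels fill themselves.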
Putting it all together
Let me try to summarize what the above ingredients would look like if you combine them. Note that numerous variations are possible, so just interpret this as an example and a general direction:
- Ratings are selected by the reviewers on a scale of 1 to 7 (as in www.photo.net).
- Negative values are avoided for psychological reasons
- The value 4 should correspond to “average level of quality in Fotopedia” and a user should assign ratings that roughly average out to 4.
- Show the user the average of his/her ratings on the profile page as feedback. This is done in photo.net as well.
- Assign separate ratings for (A)esthetics and for (I)nformation
- The A-rating is attached to the photo (rather than to the photo in the context of one specific article).
- The I-rating is attached to the photo in the context of one specific article
- The user interface could make it easy to provide multiple I-ratings when a photo is linked to multiple articles. Linking a photo to multiple articles is encouraged because this reuses the photo, links the articles and provides documentation about the photo.
- You don’t have to provide an I-rating (use “?” as default value).
So you can provide I-ratings for only those subjects that you feel comfortable about.
- This increases the amount of information collected per minute that the user spends on rating.
- Photos can be ranked based on their received A and I ratings.
- The exact function can start out with Rating=(A+I)/2 when I is available (else Rating=A). Improved functions can be introduced later.
- Ratings from multiple people are averaged rather than summed.
- A photo from a popular topic can thus be directly compared to a photo from a less popular topic.
- Whenever a user votes on a photo, the photo’s overall rating should not be shown before the user votes. The photo’s rating should be shown directly after voting – including how much the rating changed due to the vote (e.g. A: 4.14 -> 4.25, I: 3.52 -> 3.41). Showing the change confirms that voting has impact. The impact of a vote will obviously decrease as more people have voted on that photo.
- Every article has 2 sets of photos (based on a formula that uses A and I values)
- Collection: all photos linked to an article, regardless of A and I rating.
- Photos are only detached from an article if the photo has been incorrectly classified. This means the information photo-belongs-to-article is saved regardless of the photo’s rating history. The current system has a design bug: info is lost when the rating drops to -1 and the photographer or curator subsequently removes the photo from that article.
- Selection: a subset containing the best photos within the Collection
- The Selection can be determined dynamically based on Ratings (already the case) and manual filtering (new) to avoid comparable images within the Selection. The article curator can manually group similar images within the Collection into clusters: only the highest rated image from each cluster is shown in the Selection. Curators can adjust the Selection threshold (old) or the Selection size per article (new): top-25 for the Eiffel Tower; top-100 for France; top-50 for Portrait; top-5 for Harley-Davidson.
- The Selection is similar to Top, but photos in the Selection get the same treatment as photos outside the Selection. It is like asking “show me the top-5 per article” or “show me the top-20 per article”.
- Clustering is optional: clustering data can be added at any time. The system would work without clustering. Clustering makes sense mainly for large Collections.
- Every article has a curator (or whatever the name of the role is). Responsibilities:
- Cluster similar images (to control the Selection somewhat)
- Keep an eye on incorrect or inappropriate images & handle complaints
- Manage projects in which the article occurs
- Dealing with hierarchies
- discourage attaching a photo at unnecessarily high hierarchy levels: don’t attach a Dove to Bird or Animal.
- instead the rating system is used to compute what photos are pushed up
- example: Pisa, Rome, Florence, Naples are part of Italy project. The highest ranked Selection photos from Pisa, Rome, Florence, Naples are pushed up to the Italy level.
- The number of photos pushed up could use a similar criterion as the Selection
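The rating arithmetic in this summary can be sketched in a few lines. This is a minimal illustration under my own assumptions: the function names are made up, and “?” answers are represented as `None` and simply ignored when averaging.

```python
from statistics import mean


def average(ratings):
    """Mean of the numeric ratings, ignoring '?' (None). Returns None if nobody rated."""
    values = [r for r in ratings if r is not None]
    return mean(values) if values else None


def combined_rating(a_ratings, i_ratings):
    """Rating = (A + I) / 2 when an I average is available, else Rating = A."""
    a = average(a_ratings)
    i = average(i_ratings)
    if a is None:
        return None  # no aesthetic votes yet, so there is nothing to rank on
    return (a + i) / 2 if i is not None else a


# The example from the "Use averages" idea: an older photo with five
# ratings versus a newer photo with only two.
older = average([3, 4, 4, 4, 6])   # 4.2
newer = average([4, 5])            # 4.5

# A photo with A-ratings 4 and 5 and I-ratings 5 and "?".
rating = combined_rating([4, 5], [5, None])  # (4.5 + 5) / 2 = 4.75
```

Because averages are used instead of sums, the newer photo ranks above the older one despite having fewer votes.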
Comments would be great
Feel free to comment below. Note that the comments are hierarchical (“threaded”), so please press the Reply of the comment you want to respond to. Your comment then ends up directly below that comment, with an extra level of indentation.