Modern image recognition technology is getting really good at identifying objects. But engineers at MIT CSAIL show us how simply playing with their textures can confuse the AI into thinking an object is something completely different than what it actually is.
One of Snapchat’s best loved features is its photo filters, which use GPS data and augmented reality to add interactive “lenses” to your photos and videos. Now, the messaging startup wants to make that offering more powerful—and lucrative.
A patent application published on July 14, titled “Object Recognition Based Photo Filters,” describes lenses and filters that would be based on the picture you’re taking. For example, if you’re snapping a photo of the Empire State Building, you’d be given the option of a King Kong filter in which the ape climbs the building. The application also outlines how Snapchat could push you a free coffee offer after you post a photo of a hot cup of java.
Snapchat has 150 million users who send 10 billion videos a day, and they’ve shown no resistance to using sponsored filters. One by Gatorade during this year’s Super Bowl generated 160 million impressions.
But the deep image recognition software needed for the capabilities described in the patent goes further than what’s been offered to date and could make users uncomfortable. Based on the application, Snapchat would be looking at what you’re sending and where you are, and sending you advertisements based on that. Snapchat declined to comment on the application.
The tension between a user’s experience and building an advertising business has been a challenge faced by almost every social media company. Facebook and Twitter have had their ups and downs, and so will Snapchat. The company is internally projecting sales of $250-$350 million in 2016, and between $500 million and $1 billion in 2017. Snapchat brought in just $59 million in 2015, according to TechCrunch.
Companies file patent applications that go unused all the time, and this patent has not yet been granted. But the bet is on whether or not consumers (especially young ones, like Snapchat’s core demographic) are willing to sacrifice their privacy for fun and potentially useful products. And, for Snapchat, the answer is the difference between being a hip trendy app and the next Facebook.
Intel buys chip maker Movidius to help bring computer vision to drones
Intel’s RealSense computer vision platform has been lacking a low-powered way of recognizing what its depth-sensing cameras are seeing — until now. The chip giant is buying Movidius, the designer of a range of system-on-chip products for accelerating computer vision processing.
Movidius supplies chips to drone makers such as DJI and to thermal imaging company FLIR Systems, itself a supplier of DJI. Its chips help computers figure out what they are seeing through cameras like Intel’s RealSense by breaking down the processing into a set of smaller tasks that they can execute in parallel.
There are systems that already do this using GPUs, but those are relatively power-hungry, often consuming tens of watts. That’s not a problem in fixed applications with access to mains electricity, or in cars, which have huge batteries and a way to recharge them. But in drones or other lightweight IoT devices, power consumption needs to be much lower. Movidius aims for a design power of around one watt with its Myriad 2 vision processing units.
Having largely failed to get its Atom processors into smartphones, Intel is looking for ways to lever them into other devices, such as drones.
Josh Walden, senior vice president and general manager of Intel’s New Technology Group, sees potential for Movidius to help it create systems for drones, and also for augmented, virtual and merged reality devices, robots and security cameras, he said in a post to the company’s blog. It’s not just about the chips, he said: Intel is also buying algorithms developed by Movidius for deep learning, depth processing, navigation and mapping, and natural interactions.
This week Stephen Wolfram, founder and chief executive of Wolfram Research, announced a new component of the Wolfram Language for programming called ImageIdentify. Wolfram also introduced a new website, dubbed The Wolfram Language Image Identification Project, that demonstrates the language’s new capabilities.
The new site lets you upload images and get inferences and definitions in response. You can provide feedback, which should help it become more accurate. You can hit buttons like “Great!,” “Could be better,” “Missed the point,” and “What the heck?!” After you choose one, the service offers a few more guesses, and a text box where you can type in a tag. Then you can type in your email address, so it can tell you “when ImageIdentify learns more about your kind of image.”
The service uses a trendy type of artificial intelligence called deep learning. It draws on artificial neural networks, which train on a large quantity of information, like pictures, and then make inferences when you give them new information, like a new picture. Big web companies like Facebook, Google, and Microsoft use deep learning for various purposes, and increasingly smaller companies have been exposing deep learning tools for pretty much anyone to try out.
To get a rough sense of the power of the new Wolfram technology, I decided to put it up against other existing image-recognition systems you can test out on the Internet today, from CamFind, Clarifai, MetaMind, Orbeus, and IBM-owned AlchemyAPI. I chose images from Flickr that seemed to clearly fall into the 1,000 categories used for the 2014 ImageNet visual recognition competition. It was unscientific — just for the sake of curiosity.
What I found is that Wolfram’s new system doesn’t seem to be all that bad. It wasn’t overly conservative or vague, and it didn’t make many obvious mistakes — although it wasn’t as consistently accurate as MetaMind, for one. With time, Wolfram’s technology should improve — especially as people point out its flaws.
Here are 10 of the tests I ran to reach my conclusion.
Wolfram ImageIdentify: tea
CamFind: white ceramic mug
Clarifai: coffee cup nobody tea mug cafe hot ceramic coffee cup cutout
MetaMind: Coffee mug
Wolfram ImageIdentify: magic mushroom
CamFind: white mushroom
Clarifai: mushroom fungi fungus toadstool nature grass fall moss forest autumn
Wolfram ImageIdentify: spatula
CamFind: black kitchen turner
Clarifai: steel wood knife handle iron fork equipment nobody tool chrome
Wolfram ImageIdentify: scoreboard
CamFind: baseball scoreboard
Clarifai: scoreboard soccer stadium football game competition goal group north America match
5. German shepherd
Wolfram ImageIdentify: German shepherd
CamFind: black and brown German shepherd
Clarifai: dog canine cute puppy mammal loyalty grass sheepdog fur German shepherd
MetaMind: German Shepherd, German Shepherd Dog, German Police Dog, Alsatian
Wolfram ImageIdentify: tufted puffin
CamFind: toucan bird
Clarifai: bird one north America nobody animal people adult nature two outdoors
7. Indian cobra
Wolfram ImageIdentify: black-necked cobra
CamFind: brown and beige cobra snake
Clarifai: snake nobody reptile cobra wildlife daytime sand rattlesnake north America desert
MetaMind: Indian cobra, Naja Naja
Wolfram ImageIdentify: strawberry
CamFind: red strawberry fruit
Clarifai: fruit sweet food strawberry ripe juicy berry healthy isolated delicious
Wolfram ImageIdentify: cooking pan
CamFind: gray steel frying pan
Clarifai: ball nobody pan cutout kitchenware north America tableware competition bowl glass
Orbeus: frying pan
AlchemyAPI: (No tags)
10. Shoe store
Wolfram ImageIdentify: store
CamFind: black crocs
Clarifai: colour street people color car mall road fair architecture hotel
MetaMind: Shoe Shop, Shoe Store
Orbeus: shoe shop
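A comparison like this can be tallied mechanically. The sketch below is only an illustration: it uses the outputs from tests 5, 7 and 10 above, whose headings name the expected label, and the helper `is_hit` and its rule (a case-insensitive substring match) are assumptions made for this example, not the method used in the article. The rule is deliberately crude, so a near-synonym such as "shoe shop" counts as a miss for "Shoe store".

```python
# Outputs copied from tests 5, 7 and 10; the heading of each test is
# treated as the expected label.
results = {
    "German shepherd": {
        "Wolfram ImageIdentify": "German shepherd",
        "CamFind": "black and brown German shepherd",
        "Clarifai": "dog canine cute puppy mammal loyalty grass sheepdog fur German shepherd",
        "MetaMind": "German Shepherd, German Shepherd Dog, German Police Dog, Alsatian",
    },
    "Indian cobra": {
        "Wolfram ImageIdentify": "black-necked cobra",
        "CamFind": "brown and beige cobra snake",
        "Clarifai": "snake nobody reptile cobra wildlife daytime sand rattlesnake north America desert",
        "MetaMind": "Indian cobra, Naja Naja",
    },
    "Shoe store": {
        "Wolfram ImageIdentify": "store",
        "CamFind": "black crocs",
        "Clarifai": "colour street people color car mall road fair architecture hotel",
        "MetaMind": "Shoe Shop, Shoe Store",
        "Orbeus": "shoe shop",
    },
}

def is_hit(expected, output):
    """Hypothetical scoring rule: the expected label appears verbatim, ignoring case."""
    return expected.lower() in output.lower()

# Map each service to (hits, tests attempted).
tally = {}
for expected, outputs in results.items():
    for service, output in outputs.items():
        hits, attempts = tally.get(service, (0, 0))
        tally[service] = (hits + int(is_hit(expected, output)), attempts + 1)

for service, (hits, attempts) in tally.items():
    print(f"{service}: {hits}/{attempts} strict matches")
```

A looser rule, such as accepting synonyms or partial credit for "cobra", would change the counts, which is one reason the comparison is best read as a curiosity rather than a benchmark.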
This is an excerpt from TechCrunch, posted by Sarah Perez (@sarahintampa).
Not everyone was happy with last week’s major revamp of Yahoo-owned photo-sharing site Flickr. A small but very vocal portion of Flickr’s user base of 100 million members immediately took to the forums to lament the fact that the site’s new “auto-tagging” feature was enabled by default, and, worse, that there was no opt-out option provided. But that may now be changing, we understand.
Flickr recently introduced a series of upgrades to its service on the web and on mobile designed to make every aspect of photo editing, organization and sharing easier. A couple of the more notable changes were the addition of auto-tagging and new image-recognition capabilities. Combined, these features allow Flickr to identify what’s in a photo, and then automatically categorize it on users’ behalf by adding tags. This, in turn, makes images easier to surface by way of search.
Auto-tagging especially makes sense in today’s highly mobile age, where users take large numbers of photos and most no longer have the time or inclination to carefully group them or categorize them by manually adding tags. Tags, after all, are a holdover from an earlier time – the not-too-distant past before we all carried smartphones in our pockets capable of taking quality photos.
But for many Flickr users, tags are something they still feel strongly about, judging by the forum’s many comments. With over 1,370 replies to the official Flickr post (and growing), these users have been venting their frustration about the addition of auto-tagging. Many of those commenting have actually been fairly conscientious about their tags over the years, and don’t like that Flickr is now adding its own tags to their photos.
In addition, several also complain that Flickr’s auto-tags simply aren’t that accurate. In some cases, those mistakes are somewhat benign – a BMW gets tagged as a Ferrari, for example. But other times, they can be really terrible – as in the case of a user whose Auschwitz photos were incorrectly tagged as “sport,” for instance.
The problem lies with the fact that an algorithmic system of tagging is never going to be perfect – though it is capable of improving over time based on users’ corrections. But some are unwilling to wait for that training process to occur. They just want out. Period.
However, Flickr doesn’t offer an option to disable the auto-tagging at all, which is a rather bold stance to take. And while users can batch edit a group of tagged photos, they can’t edit auto-generated tags. So the only way to edit the auto-generated tags is to go into each photo individually. This is far too time-consuming for most people to manage, which is why so many are upset.
But Flickr tells us that it’s taking the community feedback on the matter seriously, and is evaluating an option that would allow an opt-out of the automated tagging. The option is not yet being built, but it is at least being actively discussed, from what we understand.
The company further explains that auto-tagging is actually a fairly crucial part of the upgraded service, as it is what powers a number of the new features, including the “Magic View,” which helps users organize and share their photos based on topic, as well as the new search tools and other “future features” still in the works. That could explain why Flickr felt strongly enough about auto-tagging to not make it an opt-in option in the first place, as well as why there’s no “off” switch for the time being.
While likely a large majority of consumers won’t care (or maybe even notice), for those power users and others who rely heavily on Flickr as their main online image repository, adding the “opt-out” option – even as a gesture to the community – would be appreciated.
Humans and software see some images differently, pointing out shortcomings of recent breakthroughs in machine learning.
By Caleb Garling on December 24, 2014. Read the full original article at TechnologyReview.
WHY IT MATTERS
Image recognition algorithms are becoming widely used in many products and services.
Images like these were created to trick machine learning algorithms. The software sees each pattern as one of the digits 1 to 5.
A technique called deep learning has enabled Google and other companies to make breakthroughs in getting computers to understand the content of photos. Now researchers at Cornell University and the University of Wyoming have shown how to make images that fool such software into seeing things that aren’t there.
The researchers can create images that appear to a human as scrambled nonsense or simple geometric patterns, but are identified by the software as an everyday object such as a school bus. The trick images offer new insight into the differences between how real brains and the simple simulated neurons used in deep learning process images.
Researchers typically train deep learning software to recognize something of interest—say, a guitar—by showing it millions of pictures of guitars, each time telling the computer “This is a guitar.” After a while, the software can identify guitars in images it has never seen before, assigning its answer a confidence rating. It might give a guitar displayed alone on a white background a high confidence rating, and a guitar seen in the background of a grainy cluttered picture a lower confidence rating (see “10 Breakthrough Technologies 2013: Deep Learning”).
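In many classifiers, the "confidence rating" described above is a probability obtained by passing the network's raw class scores through a softmax function. The sketch below shows only that final step. The labels and scores are invented for illustration and do not come from any real model.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three labels; the numbers are illustrative only.
labels = ["guitar", "banjo", "school bus"]
clean_photo = softmax([6.0, 2.0, 0.5])      # guitar alone on a white background
cluttered_photo = softmax([2.0, 1.5, 1.0])  # guitar in a grainy, cluttered scene

for name, probs in [("clean", clean_photo), ("cluttered", cluttered_photo)]:
    best = max(range(len(labels)), key=lambda i: probs[i])
    print(f"{name}: {labels[best]} with confidence {probs[best]:.2f}")
```

When one score stands well clear of the others, the top probability approaches 1. When the scores are close together, the confidence is spread across several labels.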
That approach has valuable applications such as facial recognition, or using software to process security or traffic camera footage, for example to measure traffic flows or spot suspicious activity.
But although the mathematical functions used to create an artificial neural network are understood individually, how they work together to decipher images is unknown. “We understand that they work, just not how they work,” says Jeff Clune, an assistant professor of computer science at the University of Wyoming. “They can learn to do things that we can’t even learn to do ourselves.”
These images look abstract to humans, but are seen by the image recognition algorithm they were designed to fool as the objects described in the labels.
To shed new light on how these networks operate, Clune’s group used a neural network called AlexNet that has achieved impressive results in image recognition. They operated it in reverse, asking a version of the software with no knowledge of guitars to create a picture of one, by generating random pixels across an image.
The researchers asked a second version of the network that had been trained to spot guitars to rate the images made by the first network. That confidence rating was used by the first network to refine its next attempt to create a guitar image. After thousands of rounds of this between the two pieces of software, the first network could make an image that the second network recognized as a guitar with 99 percent confidence.
However, to a human, those “guitar” images looked like colored TV static or simple patterns. Clune says this shows that the software is not interested in piecing together structural details like strings or a fretboard, as a human trying to identify something might be. Instead, the software seems to be looking at specific distance or color relationships between pixels, or overall color and texture.
That offers new insight into how artificial neural networks really work, says Clune, although more research is needed.
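The feedback loop described above (propose an image, have a trained network rate it, keep whatever raises the rating) can be sketched in a few lines. This is a toy under loud assumptions, not the researchers' setup: a fixed random linear scorer stands in for the trained network, simple hill climbing stands in for the image generator, and every function name here is hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_toy_classifier(num_pixels, num_classes, rng):
    """Hypothetical stand-in for a trained network: a fixed random linear scorer."""
    weights = [[rng.uniform(-1, 1) for _ in range(num_pixels)]
               for _ in range(num_classes)]

    def confidence(image):
        scores = [sum(w * p for w, p in zip(row, image)) for row in weights]
        return softmax(scores)

    return confidence

def evolve_fooling_image(confidence, target, num_pixels, rounds, rng):
    """Start from random pixels and keep mutations that raise the target confidence."""
    image = [rng.random() for _ in range(num_pixels)]
    start = best = confidence(image)[target]
    for _ in range(rounds):
        candidate = list(image)
        candidate[rng.randrange(num_pixels)] = rng.random()  # change one pixel
        score = confidence(candidate)[target]
        if score > best:  # the "rater" prefers this attempt, so keep it
            image, best = candidate, score
    return image, start, best

rng = random.Random(0)  # fixed seed so runs are repeatable
rate = make_toy_classifier(num_pixels=64, num_classes=5, rng=rng)
image, start, best = evolve_fooling_image(rate, target=2, num_pixels=64,
                                          rounds=5000, rng=rng)
print(f"confidence for class 2: {start:.3f} at the start, {best:.3f} after refinement")
```

Nothing in the loop asks whether the image looks like anything to a person. The only signal is the rater's score, which is why the result can be meaningless to a human while scoring highly with the software.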
Ryan Adams, an assistant computer science professor at Harvard, says the results aren’t completely surprising. The fact that large areas of the trick images look like seas of static probably stems from the way networks are fed training images. The object of interest is usually only a small part of the photo, and the rest is unimportant.
Adams also points out that Clune’s research shows humans and artificial neural networks do have some things in common. Humans have been thinking they see everyday objects in random patterns—such as the stars—for millennia.
Clune says it would be possible to use his technique to fool image recognition algorithms when they are put to work in Web services and other products. However, it would be very difficult to pull off. For instance, Google has algorithms that filter out pornography from the results of its image search service. But to create images that would trick it, a prankster would need to know significant details about how Google’s software was d
IN DEPTH Unlocking information from images By Mary Branscombe December 25th on TechRadar
How machine learning and image recognition could revolutionise search
A machine learning system is capable of writing an image caption as well as a person
Text in documents is easy to search, but there’s a lot of information in other formats. Voice recognition turns audio – and video soundtracks – into text you can index and search. But what about the video itself, or other images?
Searching for images on the web would be a lot more accurate if instead of just looking for text on the page or in the caption that suggests a picture is relevant, the search engine could actually recognise what was in the picture. Thanks to machine learning techniques using neural networks and deep learning, that’s becoming more achievable.
When a team of Microsoft and Facebook researchers created a massive data dump of over 300,000 images with 2.5 million objects labelled by people (called Common Objects in Context), they said all those objects are things a four-year-old child could recognise. So a team of Microsoft researchers working on machine learning decided to see how well their systems could do with the same images – not just recognising them, but breaking them up into different objects, putting a name to each object and writing a caption to describe the whole image.
To measure the results, they asked one set of people to write their own captions and another set to compare the two and say which they preferred.
“That’s what the true measure of quality is,” explains distinguished scientist John Platt from Microsoft Research. “How good do people think these captions are? 23% of the time they thought ours were at least as good as what people wrote for the caption. That means a quarter of the time that machine has reached as good a level as the human.”
Part of the problem was the visual recogniser. Sometimes it would mistake a cat for a dog, or think that long hair was a cat, or decide that there was a football in a photograph of people gesticulating at a sculpture. This is just what a small team was able to build in four months over the summer, and it’s the first time they had a labelled set of images this large to train and test against.
“We can do a better job,” Platt says confidently.
Machine learning already does much better on simple images that only have one thing in the frame. “The systems are getting to be as good as an untrained human,” Platt claims. That’s testing against a set of pictures called ImageNet, which are labelled to show how they fit into 22,000 different categories.
“That includes some very fine distinctions an untrained human wouldn’t know,” he explains. “Like Pembroke Welsh corgis and Cardigan Welsh corgis – one of which has a longer tail. A person can look at a series of corgis and learn to tell the difference, but a priori they wouldn’t know. If there are objects you’re familiar with you can recognise them very easily but if I show you 22,000 strange objects you might get them all mixed up.” Humans are wrong about 5% of the time with the ImageNet tests and machine learning systems are down to about 6%.
That means machine learning systems could do better at recognising things like dog breeds or poisonous plants than ordinary people. Another recognition system called Project Adam, that MSR head Peter Lee showed off earlier this year, tries to do that from your phone.
Project Adam was looking at whether you can make image recognition faster by distributing the system across multiple computers rather than running it on a single fast computer (so it can run in the cloud and work with your phone). However, it was trained on images with just one thing in them.
“They ask ‘what object is in this image?'” explains Platt. “We broke the image into boxes and we were evaluating different sub-pieces of the image, detecting common words. What are the objects in the scene? Those are the nouns. What are they doing? Those are verbs like flying or looking.
“Then there are the relationships like next to and on top of, and the attributes of the objects, adjectives like red or purple or beautiful. The natural next step after whole image recognition is to put together multiple objects in a scene and try to come up with a coherent explanation. It’s very interesting that you can look in the image and detect verbs and adjectives.”
Making images useful
There are plenty of ways in which having your images automatically captioned and labelled will be useful, especially if you’re a keen photographer trying to stay on top of your image library or a news site looking for the right photograph.
“Indexing your photos by who’s in them is a very natural way to think about organising photos,” Platt points out. With more powerful labelling, you can search for objects in images (a picture of a cat) or actions (a picture of a cat drinking) or the relation between different objects in an image. “If I remember that I had a picture of a boy and a horse, I’d like to be able to index that – both the objects of the boy and the horse, and the relation between them – and put them in an index so I can go and search for them later.”
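The kind of index Platt describes can be pictured as an inverted index: a lookup from each object, and from each relation between objects, to the photos that contain it. The sketch below is a minimal illustration. The class `PhotoIndex`, its methods and the photo IDs are all hypothetical, and it assumes the labels have already been produced by some recogniser.

```python
from collections import defaultdict

class PhotoIndex:
    """Toy inverted index mapping objects and relations to photo IDs."""

    def __init__(self):
        self._objects = defaultdict(set)
        self._relations = defaultdict(set)

    def add(self, photo_id, objects, relations=()):
        """Record the objects and (subject, relation, other) triples found in a photo."""
        for obj in objects:
            self._objects[obj].add(photo_id)
        for subject, relation, other in relations:
            self._relations[(subject, relation, other)].add(photo_id)

    def with_objects(self, *objects):
        """Return the photos that contain every listed object."""
        sets = [self._objects.get(obj, set()) for obj in objects]
        return set.intersection(*sets) if sets else set()

    def with_relation(self, subject, relation, other):
        """Return the photos in which the given relation was detected."""
        return set(self._relations.get((subject, relation, other), set()))

# Made-up photo IDs and labels, for illustration only.
index = PhotoIndex()
index.add("IMG_001", ["boy", "horse"], [("boy", "next to", "horse")])
index.add("IMG_002", ["cat", "bowl"], [("cat", "drinking from", "bowl")])
index.add("IMG_003", ["horse", "fence"])

print(index.with_objects("boy", "horse"))
print(index.with_relation("boy", "next to", "horse"))
```

Keeping relations in their own table means a search for "a boy next to a horse" does not also return every photo that merely contains both.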
If you’re putting together a catalogue of products, having an automatically generated caption might be useful, but Platt doesn’t see much demand for something that specific. There is a lot of interest from different product teams at Microsoft, he says, but instead of creating captions for you he expects that “the pieces will be used in various products; behind the scenes, these bits will be running.”
Dealing with videos will mean making the recognition faster, and working out how to spot what’s interesting (because not every frame will be). But what’s important here is not just the speed, but the way the kind of understanding that underlies captioning complex images could transform search.
The deep learning neural networks and machine learning systems this image recognition uses are the same technologies that have revolutionised speech recognition and translation in the last few years (powering Microsoft’s upcoming Skype Translator). “Every time you talk to the Bing search engine on your phone you’re talking to a deep network,” says Platt. Microsoft’s video search system, MAVIS, uses a deep network.
The next step is to do more than recognise, and actually understand what things mean.
“Even for text there’s a fair amount of work and that’s where there’s a lot of interesting value, if we can truly understand text as opposed to just doing keyword search. Just doing keyword search gets you a long way, that’s how all of our search engines work today. But imagine if you had a system that could truly understand what your documents were about and truly be an assistant to you.”
The goal, he says, is to “try to truly understand the semantics of objects like video or speech or image or text, as opposed to the surface forms like just the words or just the colours.”
Excerpt By DOUGLAS MACMILLAN and ELIZABETH DWOSKIN
Most users of popular photo-sharing sites like Instagram, Flickr and Pinterest know that anyone can view their vacation pictures if shared publicly.
But they may be surprised to learn that a new crop of digital marketing companies are searching, scanning, storing and repurposing these images to draw insights for big-brand advertisers.
Some companies, such as Ditto Labs Inc., use software to scan photos—the image of someone holding a Coca-Cola can, for example—to identify logos, whether the person in the image is smiling, and the scene’s context. The data allow marketers to send targeted ads or conduct market research.
Others, such as Piqora Inc., store images for months on their own servers to show marketers what is trending in popularity. Some have run afoul of the loose rules on image-storing that the services have in place.
The startups’ efforts are raising fresh privacy concerns about how photo-sharing sites convey the collection of personal data to users. The trove is startling: Instagram says 20 billion photos have already been shared on its service, and users are adding about 60 million a day.
The digital marketers gain access to photos publicly shared on services like Instagram or Pinterest through software code called an application programming interface, or API. The photo-sharing services, in turn, hope the brands will eventually spend money to advertise on their sites.
Privacy watchdogs contend these sites aren’t clearly communicating to users that their images could be scanned in bulk or downloaded for marketing purposes. Many users may not intend to promote, say, a pair of jeans they are wearing in a photo or a bottle of beer on the table next to them, the privacy experts say.
A screenshot of the Ditto Labs site shows the fire hose of photos that it scans for brands. The site filters photos by categories such as beer. DITTO LABS
“This is an area that could be ripe for commercial exploitation and predatory marketing,” said Joni Lupovitz, vice president at children’s privacy advocacy group Common Sense Media. “Just because you happen to be in a certain place or captured an image, you might not understand that could be used to build a profile of you online.”
In recent years, startups have begun mining text in tweets or social-media posts for keywords that indicate trends or sentiment toward brands. The market for image-mining is newer and potentially more invasive because photos inspire more emotions in people and are sometimes open to more interpretation than text.
Instagram, Flickr and Pinterest Inc.—among the largest photo-sharing sites—say they adequately inform users that publicly posted content might be shared with partners and take action when their rules are violated by outside developers. Photos that are marked as private by users or not shared wouldn’t be available to marketers.
There are no laws forbidding publicly available photos from being analyzed in bulk, because the images were posted by the user for anyone to see and download. The U.S. Federal Trade Commission does require that websites be transparent about how they share user data with third parties, but that rule is open to interpretation, particularly as new business models arise. Authorities have charged companies that omit the scope of their data-sharing from privacy policies with misleading consumers.
The FTC declined to comment.
The photo sites’ privacy policies—the legal document enforced by law as promises to consumers—vary in wording but none of them clearly convey how third-party services treat user-posted photos.
While Facebook is one of the largest photo-sharing sites, the fact that most of its users restrict their photos’ access with privacy controls has deterred outside developers from mining those images. Developers commonly use Facebook’s API to pull in profile photos of its members but not for marketing purposes.
An Instagram spokesman said its partnerships with developers don’t “change anything about who owns photos, or the protections we have in place to keep our community a safe place.” Flickr said it takes steps to prevent outside developers from scanning photos on its site in bulk.
Pinterest said “our API only provides public information to a handful of partners intended to help their clients understand the performance of their content on Pinterest.”
Spokeswomen for Tumblr and Twitter declined to comment.
Jules Polonetsky, the director of Future of Privacy Forum, an advocacy group funded by Facebook and other tech companies, said users should assume that companies are scanning sites for market research if their photos are publicly viewable.
But the boom in image-scanning technologies could lead to a world in which people’s offline behavior, caught in unsuspecting images, increasingly becomes fodder for more personalized forms of marketing, said Peter Eckersley, technology-projects director for the Electronic Frontier Foundation.
Moreover, the use of software to scan faces or objects in photos is so new that most sites don’t mention the technology in their privacy policies.
Advertisers such as Kraft Foods Group Inc. pay Ditto Labs to find their products’ logos in photos on Tumblr and Instagram. The Cambridge, Mass., company’s software can detect patterns in consumer behavior, such as which kinds of beverages people like to drink with macaroni and cheese, and whether or not they are smiling in those images. Ditto Labs places users into categories, such as “sports fans” and “foodies” based on the context of their images.
Kraft might use those insights to cross-promote certain products in stores or ads, or to better target customers online. David Rose, who founded Ditto Labs in 2012, said one day his image-recognition software will enable consumers to “shop” their friends’ selfies. Kraft didn’t respond to a request for comment.
Ditto Labs also offers advertisers a way to target specific users based on their photos posted on Twitter, though Mr. Rose said most advertisers are reluctant to do so because users might find it “creepy.”
Mr. Rose acknowledges that most people who upload photos don’t understand they could be scanned for marketing insights. He said photo-sharing services should do more to educate users and give them finer controls over how companies like his treat photos.
Beyond image recognition, some API partners employ a process called “caching,” meaning they download photos to their own servers. One of the more common uses of caching is to build a marketing campaign around photos uploaded by users and tagged with a specific hashtag.
The companies don’t mention caching in their privacy policies and they vary in how long developers can store photos on their servers. Tumblr, for example, restricts caching to three days while Instagram says “reasonable periods.”
Some developers have already overstepped the rules set forth by photo-sharing sites. Last month, Pinterest learned from a Wall Street Journal inquiry that Piqora, one of seven partners in its business API program, launched in May, was violating its image-use policy.
Piqora, a San Mateo, Calif., marketing analytics startup, collects photos into a graphical dashboard that helps companies such as clothing and accessories maker Fossil Inc. track which of its own products and those of competing brands are most popular. This violated Pinterest’s rules, which restrict partners from using images from the site that were posted by anyone except their own clients.
After Pinterest learned about the violation, the company asked Piqora to discontinue the practice and plans to begin performing regular audits of its business partners, a spokesman for Pinterest said. Fossil didn’t respond to a request for comment.
Piqora co-founder and Chief Executive Sharad Verma says he has removed the ability to view competitors’ images in the dashboard. He also clarified his company’s policy on photos cached from Instagram. Rather than keeping photos for an indefinite period of time, Mr. Verma said he will now delete photos from his servers within 120 days.
“We might be looking at doing away with caching and figuring out a new way to optimize our software,” Mr. Verma said.
— Lisa Fleisher contributed to this article.
How do you teach a computer how to see?