Thoughts on a Facebook search engine

Today, I spent the better part of an hour digging through various friend profiles to find a single video that one of my friends made with a bunch of graduation-related photos. I was on a nostalgia trip and wanted to see it again. However, there was one major problem: Facebook’s Timeline layout (at least as of now) is a mess when you are trying to find one specific photo or video. First, you cannot filter out everything but videos, nor can you filter photos and videos by the people tagged in them. That makes looking for old content a nightmare, and eventually I just gave up. This is also not the first time I have had to dig up something old buried on my timeline or a friend’s. Once, when I was looking for a post on my own timeline, the best I could do was scroll down and hope I would eventually find it. I knew who posted it, but I couldn’t remember the date of the post, which made it difficult. I did find it eventually, but it took about an hour and a half.

This got me thinking. Facebook’s layout is currently optimized to give the most recent items the highest visibility, as the chronological ordering of both the Timeline and the News Feed shows. (Note: the News Feed is not strictly chronological, since Facebook’s algorithms also rank posts by how important or popular they appear to be; an older post with a lot of comment activity can appear above newer ones. But for the most part, it is safe to treat it as chronological.) For 99% of Facebook’s use cases, this is the optimal way to lay things out. Most of the time, one logs onto Facebook to check the latest updates from friends, not to go digging through old content.

But there is a problem here. Facebook is still relatively young. In a few years, say four or five, people are going to start wanting to look at their old content: photos, links, status updates. Facebook in its current state is incapable of serving those use cases. The best approach would be something like a Google-style search engine: let people search posts, photos and videos by keyword, tagged people, location and date range. Of course, not all of Facebook needs to be searchable. Many users would probably be content with being able to search their own content and perhaps that of their friends. Facebook could add privacy controls so that people decide what content from their timelines shows up in search results. And, like most major features added to Facebook, you can bet there would be a huge privacy outcry, which is something Facebook would have to consider.
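To make the idea concrete, here is a minimal sketch of the kind of filtered query I have in mind. The post records and field names here are purely hypothetical; Facebook exposes nothing like this.

```python
from datetime import date

# Hypothetical post records; field names are illustrative, not Facebook's API.
posts = [
    {"type": "video", "text": "Graduation day montage",
     "tagged": ["Alice", "Bob"], "posted": date(2011, 6, 20)},
    {"type": "photo", "text": "Beach trip",
     "tagged": ["Carol"], "posted": date(2012, 8, 3)},
]

def search(posts, keyword=None, tagged=None, media_type=None,
           start=None, end=None):
    """Return posts matching every filter that was supplied."""
    results = []
    for p in posts:
        if keyword and keyword.lower() not in p["text"].lower():
            continue
        if tagged and tagged not in p["tagged"]:
            continue
        if media_type and p["type"] != media_type:
            continue
        if start and p["posted"] < start:
            continue
        if end and p["posted"] > end:
            continue
        results.append(p)
    return results

# Find the graduation video with Alice tagged in it.
hits = search(posts, keyword="graduation", tagged="Alice", media_type="video")
```

A real implementation would query an index rather than scan every post, but the interface (keyword, tagged person, media type, date range) is the point.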

I would implement this myself as a side project. However, there is a very real possibility that such a project would violate Facebook’s Terms of Service, because I would have to scrape and index content that is not accessible outside the Facebook platform. Not to mention that I would have to deal with the issue of storing the indexed data: Facebook most certainly would not be pleased if I stored it on my own servers, and neither would my users, because there are legitimate privacy concerns here. These are problems that would take a lot of time and effort to navigate, and honestly, it is not worth it for a side project. Perhaps I could just implement it to index only my own timeline data? That would mean building a dedicated search website hosted on my own servers. A Facebook app would probably dodge some of these concerns, but I would still be unable to cache indexed data without violating the Terms of Service. I would have to issue live queries to Facebook while returning search results, which would be slow and bandwidth-heavy. The third option, which skirts this issue though it is probably a legal grey area (see the FB Purity case), is to create a Google Chrome extension or Firefox plugin that stores the indexed data on the user’s local machine. Then no queries go to a third-party service. But now the data is kept on the client machine, which means I would have to encrypt it, and performance is bound by the client machine. Those two problems make building something that sits in the grey area of Facebook’s Terms of Service not a worthwhile endeavour.
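As a rough sketch of what the extension’s local index might look like, a simple inverted index maps each word to the posts that contain it. Everything here is hypothetical (a real extension would be written in JavaScript and would need to encrypt this structure at rest, as noted above); this just shows the data structure.

```python
from collections import defaultdict

def build_index(posts):
    """Map each lowercased word to the set of post ids whose text contains it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():
            index[word].add(post_id)
    return index

def query(index, *words):
    """Return ids of posts containing every query word (AND semantics)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

# Hypothetical scraped timeline text, keyed by post id.
posts = {
    1: "graduation video with photos",
    2: "beach trip photos",
}
index = build_index(posts)
matches = query(index, "graduation", "photos")  # only post 1 has both words
```

Lookups against a structure like this are fast even on modest client hardware, which is why the local-index approach is at least technically plausible, whatever its legal standing.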

So the remaining option is to wait for Facebook to do it. Why haven’t they done it by now? Building a search engine (crawling and indexing data, and extracting useful metadata from it) is a very challenging technical problem that requires a lot of resources. Google runs hundreds of thousands of servers to digest, store and serve the data that powers its search engine. It is a very expensive system to maintain and would require a lot of developer resources to build. And let’s be honest, this would serve about 10% of Facebook’s current use cases, so the cost-benefit trade-off does not look very appealing. Mind you, it could become a lot more appealing once deployed, because people might actually want to use it. But for now, it definitely does not make sense for Facebook to commit enormous resources to such an endeavour. Still, I remain optimistic that they will do it eventually. After all, I’m fairly sure much of the infrastructure and technology a search engine needs is already powering their business intelligence and advertising systems.