Today I opened Facebook and noticed something: my adblocker wasn’t working.
As a developer, I decided to investigate. First things first: I inspected the structure of these sponsored posts to see if there was a way to identify them, so I could remove them with a script.
The structure looks pretty simple: an element with role “article” contains a div with a class starting with “feed_subtitle”, and inside that div there are something like a bazillion spans with random words.
Dude, seriously, WTF?
They are using a trick to display the word “Sponsored”: some of the spans are visible, some aren’t.
And to make things simpler… sometimes the parent is visible but the child isn’t, and vice versa.
Time to start constructing a script to get rid of this useless stuff.
I left the subtitle div selected in the Chrome inspector and ran “$0.textContent” in the console. It returns the text of every span, hidden or not, so what comes back is not the word we see on the page.
Well, this was kind of expected. We need a function that finds the real text if we want to remove these ads.
To do this, we need a recursive function: it gets the list of child nodes of an element and removes the ones that are hidden.
The DOM is a big tree of nodes. The nodes that make up the page are usually of type “Element” and “Text”, and it is important to note that each node type allows a different set of child nodes. The document node (the root), for example, can have a DocumentType node as a child, while the other nodes can’t.
The elements that make up the post all sit under a node of type Element, which means we can only find these types of nodes: Element, Text, ProcessingInstruction, and Comment. [Specification].
We have no interest in Comment and ProcessingInstruction nodes, because the browser doesn’t render them. For this reason, we filter them out.
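Here is a minimal sketch of that filter. The function name “getRenderableChildNodes” is my own, hypothetical choice; “Node.ELEMENT_NODE” and “Node.TEXT_NODE” are the standard DOM constants.

```javascript
// Keep only the child nodes the browser can render: Element and Text.
// Comment and ProcessingInstruction nodes are dropped.
// (getRenderableChildNodes is a hypothetical name used for illustration.)
function getRenderableChildNodes(element) {
  return Array.from(element.childNodes).filter(
    (node) =>
      node.nodeType === Node.ELEMENT_NODE || node.nodeType === Node.TEXT_NODE
  );
}
```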
Back to removing the hidden nodes from the list. Only nodes of type Element can have a style, so they are the only ones that can be hidden (together with their children). The other node types don’t have a style, and we cannot use “getComputedStyle” on them. This means we need to check the style only on the Element nodes.
Then, for each of these visible nodes, we collect the visible nodes inside it.
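A sketch of that check could look like this. I am assuming here that “hidden” means a computed “display” of “none”; the real page may rely on other tricks (for example “visibility” or “opacity”), so treat the condition as a placeholder. The names “isHidden” and “getVisibleChildNodes” are hypothetical.

```javascript
// Assumption: a node is hidden when its computed "display" is "none".
// Other hiding techniques would need extra checks here.
function isHidden(node) {
  // Only Element nodes have a style; Text nodes are never hidden on their own.
  if (node.nodeType !== Node.ELEMENT_NODE) return false;
  return getComputedStyle(node).display === "none";
}

// Child nodes of the element, minus the hidden ones.
function getVisibleChildNodes(element) {
  return Array.from(element.childNodes).filter((node) => !isHidden(node));
}
```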
But we have a problem if we collect the nodes of the elements recursively: the recursion has to end somewhere. The elements at the end (the leaves) don’t contain any other elements.
We need to stop when we reach an element that contains only text, and return what we are interested in: the text content.
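One way to express that stop condition, as a sketch: an element whose “children” collection is empty holds only text, so we can hand back its “textContent”. The helper name “getTextIfLeaf” is made up for illustration.

```javascript
// Returns the text of a leaf element, or null when the element still has
// child elements and the recursion has to continue.
// (getTextIfLeaf is a hypothetical helper name.)
function getTextIfLeaf(element) {
  return element.children.length === 0 ? element.textContent : null;
}
```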
Perfect! Now we have everything for our recursive function: the recursive step and the stop condition. Let’s merge all the pieces.
Time to try the function on the selected element.
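Merged together, the function might look like the sketch below. It is not necessarily the exact code I ended up with: the name “getVisibleText” is hypothetical, and “display: none” is again my assumed definition of “hidden”.

```javascript
// Assumption: an element is hidden when its computed "display" is "none".
function isHidden(element) {
  return getComputedStyle(element).display === "none";
}

// Recursively collects the text a user can actually see under a node.
function getVisibleText(node) {
  // Text nodes have no style: return their content as-is.
  if (node.nodeType === Node.TEXT_NODE) return node.textContent;
  // Comments and processing instructions are never rendered.
  if (node.nodeType !== Node.ELEMENT_NODE) return "";
  // A hidden element hides all of its children too.
  if (isHidden(node)) return "";
  // Stop condition: an element that contains only text.
  if (node.children.length === 0) return node.textContent;
  // Recursive step: join the visible text of every child node.
  return Array.from(node.childNodes)
    .map((child) => getVisibleText(child))
    .join("");
}
```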
Perfect! It works!
Now we need a function that just says “yes” or “no” when we ask whether that is the subtitle of a sponsored post.
And, just to be safe, it should work even when the subtitle is missing.
Now that we have a way to know when a subtitle is sponsored, we just need to get all the sponsored posts on the page.
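As a sketch, the check can be as small as this. To keep the snippet self-contained, the visible-text function is passed in as the “getText” parameter, which is a choice made for this example. I am also assuming an English interface where the visible text is exactly “Sponsored”.

```javascript
// subtitle: the subtitle element, or null/undefined when the post has none.
// getText:  function returning the visible text of an element
//           (hypothetical parameter, standing in for the recursive function).
function isSponsoredSubtitle(subtitle, getText) {
  // A missing subtitle simply means "not sponsored".
  if (!subtitle) return false;
  // Assumption: English UI, exact match after trimming whitespace.
  return getText(subtitle).trim() === "Sponsored";
}
```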
Let’s start with the easy bit: get all the posts and keep only the ones that are sponsored.
Now we need a function to know whether a post is sponsored. We already have a function that tells us whether a subtitle belongs to a sponsored post, so all we need is to pass the post’s subtitle to it.
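A sketch of that step, using the role “article” from the structure we inspected. The function name and the “isSponsoredPost” parameter are hypothetical; the predicate is passed in so the snippet does not depend on anything else.

```javascript
// root:            where to search, typically `document` in the browser.
// isSponsoredPost: predicate that says whether a post element is sponsored
//                  (hypothetical parameter for this example).
function getSponsoredPosts(root, isSponsoredPost) {
  const posts = Array.from(root.querySelectorAll('[role="article"]'));
  return posts.filter((post) => isSponsoredPost(post));
}
```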
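In code, this could look roughly like the sketch below. The selector comes from the structure described earlier (a div whose class starts with “feed_subtitle”); the function name and the “isSponsoredSubtitle” parameter are hypothetical, with the subtitle check passed in to keep the example standalone.

```javascript
// post:                a post element (role "article").
// isSponsoredSubtitle: function that says whether a subtitle element belongs
//                      to a sponsored post; it must accept null.
function isSponsoredPost(post, isSponsoredSubtitle) {
  // querySelector returns null when the post has no subtitle.
  const subtitle = post.querySelector('div[class^="feed_subtitle"]');
  return isSponsoredSubtitle(subtitle);
}
```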
All the pieces are coming together!
Now we need a function to remove these articles.
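A minimal sketch of the removal, using the standard “remove()” method that every element has. The name “removePosts” and the returned count are my own additions for illustration.

```javascript
// Detaches every given post element from the page.
// Returns how many posts were removed, which is handy for logging.
function removePosts(posts) {
  let removed = 0;
  for (const post of posts) {
    post.remove();
    removed += 1;
  }
  return removed;
}
```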
Let’s try it!