[LinkedIn] Discover the Community-Focused Feed Optimization

We have extensive experience supporting organizations with their presence on B2B social media, and we offer a broad set of marketing services. Our capabilities are proven by the appreciation of many customers. Let’s talk!

On this subject, here is an article about posting on LinkedIn and its news feed:

LinkedIn’s feed stands at the center of building global professional knowledge-sharing communities for our members. Members talk about their career stories, job openings, and ideas in a variety of formats, including links, video, text, images, documents, and long-form articles. Members participate in conversations in two distinct roles: as content creators who share posts, and as feed viewers who read those posts and respond to them through reactions, comments, or reshares. By helping members actively participate in those professional conversations, we are fulfilling LinkedIn’s mission to connect the world’s professionals to make them more productive and successful.

This post focuses on one important aspect of the machine learning behind LinkedIn’s feed: candidate selection. Before the final fine feed ranking, a personalized candidate generation algorithm is applied to tens of thousands of feed updates to select a diverse and unique candidate pool. We describe the machine learning models applied in candidate generation and the infrastructure capabilities that support accurate and agile model iterations.

Overview of Feed at LinkedIn

At the heart of the feed sits a machine learning algorithm that works to identify the best conversations for our members. In a fraction of a second, the algorithm scores tens of thousands of posts and ranks the most relevant at the top of the feed. To operate at this scale and speed, LinkedIn’s feed has a two-pass architecture. The first pass rankers (FPRs) each create a preliminary candidate selection from their own inventory based on predicted relevance to the feed viewer. Examples of these inventories include updates from your network, job recommendations, and sponsored updates. A second pass ranker (SPR) then combines and scores the output from all first pass rankers, creating a single, personalized ranked list. FollowFeed is the dominant FPR and serves feed updates from your network. More than 80% of feed updates come from FollowFeed, and those updates contribute to more than 95% of members’ conversations. Through these conversations, active communities are formed and strengthened.

Two-pass ranking architecture for LinkedIn’s homepage feed
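The two-pass flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not LinkedIn’s implementation: the `Update` fields, the precomputed `fpr_score` and `spr_score` values, and the `top_k` cutoff are all hypothetical stand-ins for real model outputs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Update:
    update_id: str
    fpr_score: float  # hypothetical cheap relevance estimate (first pass)
    spr_score: float  # hypothetical richer relevance estimate (second pass)


def first_pass(inventory: List[Update], top_k: int) -> List[Update]:
    """Keep the top_k updates of one inventory by the cheap first-pass score."""
    return sorted(inventory, key=lambda u: u.fpr_score, reverse=True)[:top_k]


def second_pass(candidate_lists: List[List[Update]]) -> List[Update]:
    """Merge every first-pass output and rank the union by the second-pass score."""
    merged = [u for candidates in candidate_lists for u in candidates]
    return sorted(merged, key=lambda u: u.spr_score, reverse=True)


def build_feed(inventories: Dict[str, List[Update]], top_k: int) -> List[Update]:
    """Run one first pass per inventory, then a single second pass over the results."""
    return second_pass([first_pass(inv, top_k) for inv in inventories.values()])


# Toy inventories; the names and scores are made up for illustration.
INVENTORIES = {
    "network": [
        Update("n1", fpr_score=0.9, spr_score=0.70),
        Update("n2", fpr_score=0.2, spr_score=0.95),
        Update("n3", fpr_score=0.6, spr_score=0.80),
    ],
    "jobs": [
        Update("j1", fpr_score=0.5, spr_score=0.75),
        Update("j2", fpr_score=0.4, spr_score=0.10),
    ],
}

feed = build_feed(INVENTORIES, top_k=2)
print([u.update_id for u in feed])
```

In this toy data, `n2` would have ranked first in the second pass, but the first pass drops it. That is the reason candidate selection quality matters: the second pass can only rank what the first pass lets through.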

At LinkedIn’s scale, the main technical challenge is to find the right balance between infrastructure impact and multi-objective optimization using comprehensive machine learning algorithms. Those objectives include members’ likelihood to view updates, their likelihood to participate in conversations, and providing timely reactions to content creators. There are hundreds of machine learning features we use to compute these objectives. We want to optimize these objectives continuously and accurately while satisfying the low latency requirements of our infrastructure footprints.
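To make the idea of multi-objective optimization concrete, here is a minimal sketch. The post does not say how the objectives are combined, so the weighted sum, the weight values, and the field names below are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Predictions:
    """Hypothetical per-update model outputs for one viewer."""
    p_view: float          # likelihood that the viewer views the update
    p_contribute: float    # likelihood that the viewer joins the conversation
    creator_value: float   # value of giving the creator timely feedback


# Assumed weights; a real system would tune these against its own metrics.
WEIGHTS: Dict[str, float] = {"view": 1.0, "contribute": 2.0, "creator": 0.5}


def combined_score(p: Predictions, weights: Dict[str, float] = WEIGHTS) -> float:
    """Collapse several objectives into one ranking score via a weighted sum."""
    return (
        weights["view"] * p.p_view
        + weights["contribute"] * p.p_contribute
        + weights["creator"] * p.creator_value
    )


passive = Predictions(p_view=0.8, p_contribute=0.05, creator_value=0.1)
conversational = Predictions(p_view=0.5, p_contribute=0.25, creator_value=0.5)
print(combined_score(passive), combined_score(conversational))
```

With these assumed weights, the update that is more likely to start a conversation outranks the one that is merely more likely to be viewed.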

Our teams tackled this problem through a joint project amongst the Feed Artificial Intelligence (AI), Feed Infrastructure, and Ranking Infrastructure teams. As we ramp this project globally, we would like to share more details about our technical solutions.

  1. We refreshed the machine learning software stack in FollowFeed, leveraging the latest productive machine learning technologies. Through a ranking engine upgrade and model deployment technology, we enabled frequent and agile updating of machine learning models. We also added accurate tracking of machine learning features in FollowFeed, which helps us guarantee data and model consistency across training and serving. Moreover, we developed tools to inspect machine learning algorithm complexity at serving time.
  2. With minimal additional complexity, we have rebuilt our machine learning model for candidate selection from scratch with new objectives and different algorithms. As part of this, we introduced the prediction of professional conversation contribution into our model to capture the community-building aspect of each feed update. Instead of multiple logistic regressions with manual interactions, we’ve used a single XGBoost tree ensemble to trim down the complexity. Additionally, we’ve accounted for timely feedback to content creators in our model to make sure all members have a chance to feel heard. All of this was done with minimal additional infrastructure capacity.

In summary, this project builds the engineering capabilities for us to iterate comprehensive machine learning models at the candidate generation stage, and we’ve leveraged these capabilities to deliver performant, multi-objective optimization models. At LinkedIn’s scale, these solutions will help bring economic opportunity to the global workforce through more relevant conversations with their professional communities.

Performant AI infrastructure with agility

FollowFeed, LinkedIn’s powerful feed indexing and serving system, has now been equipped with advanced machine learning capabilities. The initial design had accommodated the ranking needs for feed, but the field of machine learning has advanced tremendously in the past five years since the original FollowFeed design. During this project, we boosted the agility and productivity of machine learning in FollowFeed by adopting the latest machine learning inference and model deployment technologies. Such infrastructure enhancements enable the modeling capability described later in this blog post.

FollowFeed architecture

We have updated FollowFeed’s ranking engine to Quasar. Quasar, part of LinkedIn’s Pro-ML technology, transforms machine learning features and runs model inference at query time. As a high-performance, multi-threaded ranking engine, Quasar optimizes not only for infrastructure efficiency but also for machine learning productivity. Such productivity improvements have enabled:

  1. Cross-system leverage: We can easily port the latest machine learning models and transformers from the second pass layer to FollowFeed.
  2. Training and serving consistency: At offline training time, the same codebase is used to represent the model as at serving time.

To reflect the rapid evolution of LinkedIn’s content ecosystem, machine learning models have to be constantly updated. We’ve built FollowFeed’s model deployment on top of LinkedIn’s Continuous Integration and Deployment (CICD) stack. Being a stateful system that indexes members’ past activities, FollowFeed presents a unique challenge in model deployment. We have to avoid calling external services to maintain high reliability and performance of the index nodes where ranking takes place. To work within these constraints, we previously bundled ranking models with code, which tightly coupled service deployment to model coefficient changes. To allow for easier model evolution, our solution is now a data pack-based deployment model, where we package the machine learning models in a separate code base, a.k.a. a multi-product. Those models are treated as a “data pack,” a deployable package consisting only of static files to be dropped into a specific location on production machines. Through such a design, model deployment can be easily managed by LinkedIn’s CICD system. Consequently, we’ve reduced model deployment time from 3 days to 30 minutes.
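The data pack idea can be illustrated with a small sketch. Everything here is an assumption made for the example: the `model.json` file name, the JSON layout, the explicit `reload()` call, and the linear model, which stands in for the real ranking model only to keep the code short.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

MODEL_FILE = "model.json"  # hypothetical file name inside the data pack


def write_data_pack(pack_dir: Path, version: str, coefficients: Dict[str, float]) -> None:
    """Simulate a deployment: drop a static model file into the pack directory."""
    pack_dir.mkdir(parents=True, exist_ok=True)
    tmp = pack_dir / (MODEL_FILE + ".tmp")
    tmp.write_text(json.dumps({"version": version, "coefficients": coefficients}))
    # Write to a temporary file, then rename it over the live file, so a
    # reader is unlikely to ever see a half-written model.
    tmp.replace(pack_dir / MODEL_FILE)


class Ranker:
    """Service code that never changes when the model changes."""

    def __init__(self, pack_dir: Path) -> None:
        self.pack_dir = pack_dir
        self.reload()

    def reload(self) -> None:
        """Read the model from local static files; no external service call."""
        payload = json.loads((self.pack_dir / MODEL_FILE).read_text())
        self.version = payload["version"]
        self.coefficients = payload["coefficients"]

    def score(self, features: Dict[str, float]) -> float:
        """Placeholder linear model; unknown features contribute nothing."""
        return sum(self.coefficients.get(name, 0.0) * value
                   for name, value in features.items())


with tempfile.TemporaryDirectory() as root:
    pack = Path(root) / "datapack"
    write_data_pack(pack, "v1", {"freshness": 1.0})
    ranker = Ranker(pack)
    print(ranker.version, ranker.score({"freshness": 2.0}))

    # A new model ships as new static files; the service code is untouched.
    write_data_pack(pack, "v2", {"freshness": 3.0})
    ranker.reload()
    print(ranker.version, ranker.score({"freshness": 2.0}))
```

The point of the design is the separation: the model release touches only files, so it can move through a deployment pipeline on its own schedule, independently of service releases.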

In addition to ranking and model deployment, we designed and implemented advanced feature access and tracking in FollowFeed. As it scores thousands of documents per session, FollowFeed optimizes access to machine learning features needed by scoring models. Viewer features are passed down as part of the request without requiring an external call to be made. Features for feed updates are ingested, stored, and updated alongside these updates so that they are accessible locally on the index nodes for scoring. Given the well-known data inconsistency challenges between offline training and online scoring, we also added accurate tracking of machine learning features in FollowFeed. This guarantees data consistency between offline training data and online inference data. Aggregating these tracking events across the distributed index nodes presents a challenge. Even though results from the index nodes are aggregated in the broker layer, we do not want to gather the tracking data synchronously due to scalability and serving latency concerns. Our design overcomes these concerns by having individual index nodes and the broker node stream tracking events asynchronously to Kafka streams. A Samza job joins these Kafka streams and emits a comprehensive tracking event for the request.
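The asynchronous tracking join can be sketched with an in-memory stand-in for the Kafka streams and the Samza job. The event fields, including the `expected_index_events` count that tells the joiner when a request is complete, are hypothetical; a production stream join would more likely rely on time windows and would need to handle late or missing events.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class TrackingJoiner:
    """Buffers partial tracking events and emits one joined event per request."""

    def __init__(self) -> None:
        self._broker: Dict[str, dict] = {}
        self._index: Dict[str, List[dict]] = defaultdict(list)

    def on_broker_event(self, event: dict) -> Optional[dict]:
        """Handle the broker's event, which lists what was served for a request."""
        self._broker[event["request_id"]] = event
        return self._try_emit(event["request_id"])

    def on_index_event(self, event: dict) -> Optional[dict]:
        """Handle one index node's event, which carries features it scored with."""
        self._index[event["request_id"]].append(event)
        return self._try_emit(event["request_id"])

    def _try_emit(self, request_id: str) -> Optional[dict]:
        broker = self._broker.get(request_id)
        if broker is None:
            return None  # broker event has not arrived yet
        parts = self._index.get(request_id, [])
        if len(parts) < broker["expected_index_events"]:
            return None  # still waiting on some index nodes
        # All parts are present: clear the buffers and build the joined event.
        del self._broker[request_id]
        self._index.pop(request_id, None)
        features: Dict[str, dict] = {}
        for part in parts:
            features.update(part["features_by_update"])
        return {
            "request_id": request_id,
            "served_updates": broker["served_updates"],
            "features_by_update": features,
        }


joiner = TrackingJoiner()
# Events may arrive in any order because every node streams independently.
joiner.on_index_event({"request_id": "r1",
                       "features_by_update": {"u1": {"age_hours": 2.0}}})
joiner.on_broker_event({"request_id": "r1", "served_updates": ["u1", "u2"],
                        "expected_index_events": 2})
joined = joiner.on_index_event({"request_id": "r1",
                                "features_by_update": {"u2": {"age_hours": 5.0}}})
print(joined)
```

Because no node waits for another while serving, the join cost is paid off the request path, which is the property the design above is after.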

With FollowFeed equipped with advanced machine learning capabilities, it is much easier to develop performant machine learning models with accurate features. Such agility will enable better candidate feed updates to be surfaced on LinkedIn’s homepage. By actively updating the machine learning model, we will be able to give more power to existing and newly minted content creators in LinkedIn’s ecosystem. We will also help professional community builders curate their audiences.

Optimizing for conversations at candidate generation

[to continue, click HERE]
