Illustration of Scraping Twitter with Python in 2025 showing tweets, code snippets, and data flow.

Scraping Twitter with Python in 2025: How to get started

Twitter remains an indispensable platform for real-time information, social sentiment, and trend analysis. Whether you’re a data scientist, marketer, researcher, or developer, knowing how to scrape Twitter effectively unlocks a wealth of publicly available data.

This comprehensive guide covers everything you need to build a robust Twitter scraper in Python, including scraping tweets, user profiles, and following lists, with practical examples and expert tips to avoid common pitfalls.


Why scrape Twitter? The value of Twitter data

Twitter (X) is a fast-moving social network where millions of users post opinions, news, and updates every second. With over 500 million tweets sent daily, it’s one of the richest real-time data sources available. Scraping Twitter data can help you:

  • Research markets and sentiment to track opinions on products, brands, or events
  • Monitor trends to identify hashtags and topics before they go viral
  • Analyze competitors to see how rivals engage their audiences
  • Support research to collect datasets for academic or social studies
  • Explore niche groups to understand conversations in specific communities

⚠️ Note: While the potential is huge, scraping comes with technical hurdles (rate limits, anti-bot systems) and legal considerations around data use.

What you need to know about scraping Twitter

Twitter’s modern web interface is deliberately complex and built to discourage automated scraping. Some of the main challenges include:

  • JavaScript-heavy content, where tweets and profiles load asynchronously in the background rather than in the initial HTML
  • Frequent UI and API changes that can break scrapers relying on fixed selectors
  • Anti-bot measures and IP blocking that quickly flag suspicious traffic patterns
  • Deeply nested JSON responses that require extra parsing before the data becomes usable

Understanding these challenges upfront helps you design a scraper that is both effective and resilient.

Essential tools for scraping Twitter with Python

Python’s ecosystem makes it one of the strongest choices for Twitter scraping in 2025. Let’s break down the key tools you’ll need, what they do, and where each one fits best.

Tool / Library          | Best For                               | Strengths
Multilogin              | Browser profile & proxy management     | Reduces detection risk; supports large-scale automation; team-ready, built-in proxies
Playwright              | Full-page scraping & dynamic content   | Renders JavaScript; intercepts network requests; simulates real user flow
Requests / httpx        | Lightweight API calls & static JSON    | Fast; simple; reliable for REST endpoints
JMESPath                | JSON parsing & restructuring           | Queries nested JSON cleanly; simplifies extraction
Scrapfly SDK (optional) | Scaling high-volume scraping projects  | Built-in proxy rotation; JavaScript rendering; anti-bot bypass features
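
For the lightweight end of that table, here is a minimal httpx sketch, assuming a hypothetical REST endpoint that serves static JSON. X.com itself serves almost nothing this way, so this pattern fits third-party or public JSON sources rather than tweets:

import httpx

# Hypothetical endpoint returning static JSON (placeholder URL)
url = "https://api.example.com/v1/items"

response = httpx.get(url, timeout=10.0)
response.raise_for_status()  # fail loudly on 4xx/5xx errors
data = response.json()
print(data)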

How Twitter’s web pages work

Twitter (now X.com) runs as a single-page JavaScript app. This means what you see in your browser isn’t coming straight from static HTML, but from a dynamic frontend powered by background API calls. Let’s break down what actually happens when you open a tweet or profile:

  1. Minimal shell loads first – The raw HTML contains almost nothing useful, just a framework for the app.
  2. JavaScript kicks in – The frontend application is launched, taking over rendering.
  3. Background requests fire – The app calls Twitter’s private GraphQL API endpoints (XHR requests) to pull tweets, timelines, and user data.
  4. Dynamic rendering – JSON responses are transformed into the timeline, profiles, and feeds you interact with.

Why this matters for scraping

Since almost no real data lives in the static HTML, a simple requests-based scraper won’t work. You need a headless browser (e.g., Playwright) to:

  • Execute Twitter’s JavaScript just like a real user would.
  • Intercept GraphQL requests in the background.
  • Extract JSON responses that hold the actual tweet and profile data.
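
You can see the empty shell for yourself with a plain requests call. This is a quick sketch, and depending on how X handles unauthenticated traffic you may get a redirect or block page instead of the app shell:

import requests

# Fetch a tweet URL without executing any JavaScript
url = "https://twitter.com/Scrapfly_dev/status/1664267318053179398"
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text

# Typically mostly <script> tags and placeholders; the tweet text
# itself only arrives via background GraphQL calls
print(html[:500])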

Verdict: Think of Twitter pages as a “mask.” The shell looks simple, but the real gold is hidden in background API calls. If you want reliable scraping, you need to capture those requests, not just scrape raw HTML.

Step-by-step guide: Scraping a tweet with Python + Playwright + Multilogin

Scraping Twitter (X.com) requires more than a simple HTTP request. Tweets are loaded dynamically through hidden API calls, so you need a browser automation tool like Playwright. To make your workflow smoother and safer, you’ll also connect it to Multilogin for browser profile and proxy management.

1. Install your tools

In your terminal:

pip install playwright jmespath
playwright install

2. Write the scraper

Here’s a simple script that scrapes a single tweet:

from playwright.sync_api import sync_playwright
import jmespath

# Proxy details from your Multilogin profile (placeholder values)
PROXY = {"server": "http://host:port", "username": "user", "password": "pass"}

def scrape_tweet(url: str):
    with sync_playwright() as pw:
        browser = pw.chromium.launch(
            headless=True,
            proxy=PROXY,  # Playwright expects separate server/username/password fields
        )
        page = browser.new_page()
        xhr_calls = []

        # Capture background requests
        def on_response(response):
            if "TweetResultByRestId" in response.url:
                xhr_calls.append(response)

        page.on("response", on_response)
        page.goto(url)
        page.wait_for_selector("[data-testid='tweet']")

        for call in xhr_calls:
            data = call.json()
            tweet_data = data.get("data", {}).get("tweetResult", {}).get("result")
            if tweet_data:
                return parse_tweet(tweet_data)
    return {}

def parse_tweet(data):
    query = """
    {
        text: legacy.full_text,
        user: core.user_results.result.legacy.screen_name,
        likes: legacy.favorite_count,
        retweets: legacy.retweet_count,
        replies: legacy.reply_count
    }
    """
    return jmespath.search(query, data)

if __name__ == "__main__":
    tweet_url = "https://twitter.com/Scrapfly_dev/status/1664267318053179398"
    print(scrape_tweet(tweet_url))

3. Connect to Multilogin

  • Open Multilogin and create a browser profile.
  • Assign a built-in proxy to that profile.
  • Copy the proxy details (host, port, username, password).
  • Paste them into the PROXY settings in your script.
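
Once copied, the PROXY settings at the top of the script might look like this (all values below are placeholders, not real credentials):

# Placeholder values copied from a Multilogin profile
PROXY = {
    "server": "http://proxy.example.com:8080",  # host and port
    "username": "your_proxy_user",
    "password": "your_proxy_pass",
}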

Now, whenever you run the scraper, it routes traffic through Multilogin, making your workflow more stable and scalable. Here’s what each piece does:

  • Playwright loads the tweet just like a normal browser.
  • The script listens for hidden API calls (TweetResultByRestId) that return tweet data.
  • JMESPath extracts the clean fields you want: text, user, likes, retweets, replies.
  • Multilogin integration keeps your scraping environment consistent with proxies and profiles.

Verdict: With Playwright handling automation and Multilogin managing your environment, you can scrape tweets reliably, safely, and at scale.

Scraping Twitter following lists guide

A user’s following list is more than just names — it’s a map of their network. By analyzing who a user follows, you can spot connections, identify influencers, or build datasets for research. The challenge? Twitter (X.com) loads these lists dynamically using infinite scrolling and paginated background requests.

How to scrape following lists

Step 1. Set up your tools

pip install playwright jmespath
playwright install

Also install the Multilogin antidetect browser, create a profile, assign a proxy, and copy the proxy details.

Step 2. Open the following page

Point Playwright at https://twitter.com/{username}/following and wait for the user cells ([data-testid='UserCell']) to render.

Step 3. Scroll automatically

Following lists load in chunks. Programmatically scroll the page so more users appear.

Step 4. Intercept background requests

Look for XHR calls containing UserBy or Following in the URL. These carry the actual user data.

Step 5. Extract and save usernames

Parse the JSON responses and collect usernames or IDs.

Example code:

from playwright.sync_api import sync_playwright
import jmespath

# Proxy details from your Multilogin profile (placeholder values)
PROXY = {"server": "http://host:port", "username": "user", "password": "pass"}

def scrape_following(username: str, max_scrolls: int = 5):
    results = []
    with sync_playwright() as pw:
        browser = pw.chromium.launch(
            headless=True,
            proxy=PROXY,  # server/username/password fields from Multilogin
        )
        page = browser.new_page()
        xhr_calls = []

        # Capture following requests
        def on_response(response):
            if "UserBy" in response.url or "Following" in response.url:
                xhr_calls.append(response)

        page.on("response", on_response)
        page.goto(f"https://twitter.com/{username}/following")
        page.wait_for_selector("[data-testid='UserCell']")

        # Scroll multiple times to load more users
        for _ in range(max_scrolls):
            page.mouse.wheel(0, 2000)
            page.wait_for_timeout(2000)

        # Extract usernames from captured responses
        for call in xhr_calls:
            try:
                data = call.json()
                users = jmespath.search(
                    "data.user.result.timeline.timeline.instructions[].entries[].content.itemContent.user_results.result.legacy.screen_name",
                    data
                )
                if users:
                    results.extend([u for u in users if u])
            except Exception:
                continue
    return list(set(results))  # return unique usernames

if __name__ == "__main__":
    following_list = scrape_following("jack", max_scrolls=10)
    print(f"Collected {len(following_list)} usernames:")
    print(following_list[:20])
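
To keep the collected usernames for later analysis, here is a small follow-up sketch using Python’s standard csv module (the output filename is arbitrary):

import csv

def save_usernames(usernames, path="following.csv"):
    # Write one username per row under a simple header
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["username"])
        for name in usernames:
            writer.writerow([name])

# Example usage with the scraper above:
# save_usernames(scrape_following("jack", max_scrolls=10))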

Other Twitter scraping methods

Beyond scraping tweets and following lists, there are several other practical approaches worth adding to your toolkit:

  • Advanced JSON parsing – Flatten Twitter’s nested GraphQL responses with tools like JMESPath to extract clean fields such as polls, hashtags, and engagement stats (see the sketch after this list).
  • User profile scraping – Capture UserBy calls to retrieve rich profile data such as bios, follower counts, and verification status.
  • Niche community scraping – Target hashtags or keywords (e.g., #ScrapMechanic) to analyze community sentiment, discover influencers, or track events.
  • Scaling at large – Connect your workflow to Multilogin, SDKs like Scrapfly, or cloud APIs for proxy rotation, browser session management, and anti-bot bypass.
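
To illustrate the first point, here is a minimal JMESPath sketch. It assumes the legacy.entities.hashtags layout that Twitter’s GraphQL responses have historically used, so treat the field names as an assumption that may shift as the API changes:

import jmespath

# Simplified stand-in for one tweet object from a GraphQL response
tweet = {
    "legacy": {
        "full_text": "Scraping #Twitter with #Python in 2025",
        "favorite_count": 42,
        "entities": {"hashtags": [{"text": "Twitter"}, {"text": "Python"}]},
    }
}

result = jmespath.search(
    "{text: legacy.full_text, likes: legacy.favorite_count,"
    " hashtags: legacy.entities.hashtags[].text}",
    tweet,
)
print(result)  # {'text': '...', 'likes': 42, 'hashtags': ['Twitter', 'Python']}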

Verdict: These methods expand your scraper from simple tweet capture into full-scale analysis of users, communities, and trends.

Legal and ethical considerations

When scraping Twitter, follow the platform’s Terms of Service, avoid private or protected accounts, and respect user privacy by not collecting personally identifiable information. Use rate limiting to prevent server strain, and never resell scraped datasets without permission. For commercial use, laws vary by country, so consult legal advice when in doubt. Public data scraping is often legal, but responsible practices protect both users and long-term access.

FAQ

Is it legal to scrape Twitter?

Scraping publicly available Twitter (X.com) data is generally legal in many jurisdictions, especially if you’re only collecting tweets, hashtags, or user interactions that anyone can see without logging in. However, there are important limits:

  • Private or protected accounts should never be scraped, as this violates both Twitter’s Terms of Service and privacy laws.
  • Commercial use of scraped data (such as reselling datasets, training AI models, or redistributing tweets) may require explicit permissions or licensing agreements.
  • Laws differ across countries, so if you plan to use Twitter data for research or business purposes, consult legal counsel to ensure compliance.

How can I scrape Twitter without getting blocked?

Twitter employs advanced anti-bot measures to detect suspicious activity. To minimize the risk of blocks or bans:

  • Use Multilogin or similar antidetect browsers to manage multiple accounts and maintain consistent browser fingerprints.
  • Use proxy rotation to change your IP address frequently and avoid detection.
  • Throttle your requests: don’t flood the servers with hundreds of calls in seconds. Instead, space them out to mimic natural user behavior (a minimal sketch follows this list).
  • Mimic human browsing patterns by adding random delays, scrolling, and interaction-like actions.
  • Leverage tools like Playwright or Selenium for headless browsing and realistic automation.
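
For the throttling point above, here is a minimal sketch using random delays between requests. The delay bounds are arbitrary starting points, not tested thresholds:

import random
import time

def polite_pause(min_s=2.0, max_s=6.0):
    # Sleep a random interval so request timing doesn't look machine-like
    time.sleep(random.uniform(min_s, max_s))

# Example: pause between scraping a batch of tweet URLs
# for url in tweet_urls:
#     scrape_tweet(url)
#     polite_pause()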

Can I scrape Twitter following lists?

Yes, scraping Twitter following lists is possible, but it requires extra effort because these lists:

  • Load dynamically with infinite scrolling, meaning you need to automate scrolling to reveal more data.
  • Use background XHR requests to fetch user data, which you can intercept with tools like Playwright.
  • Require parsing nested JSON to extract useful fields such as usernames, bios, and follower counts.

Pro Tip: Limit the number of scrolls or requests per session to avoid triggering rate limits or suspicion.

How do I handle Twitter’s dynamic content?

Twitter is a single-page JavaScript app, meaning most useful data isn’t visible in the raw HTML. To scrape effectively:

  • Use headless browsers like Playwright to execute JavaScript and load content as a real user would.
  • Intercept GraphQL or XHR requests in the background—this is where the real tweet, profile, and engagement data is delivered.
  • Parse the deeply nested JSON responses using libraries like jmespath to cleanly extract fields such as text, usernames, likes, and retweets.

What are the risks of scraping Twitter?

While scraping can unlock valuable insights, risks include:

  • Account suspension or bans if you scrape while logged in with your own Twitter account.
  • IP blocking if Twitter detects unusual traffic from your server or proxy.
  • Legal or compliance issues if you misuse or resell scraped datasets without proper authorization.

To minimize risks, always scrape responsibly: respect rate limits, avoid private data, and comply with Twitter’s Terms of Service.

What are the best tools for scraping Twitter in 2025?

Some of the most effective tools include:

  • Multilogin – for proxy management and browser fingerprint protection.
  • Playwright – for dynamic content rendering and background request interception.
  • JMESPath – for parsing and restructuring complex JSON data.
  • Scrapfly SDK – optional for scaling large projects with built-in anti-bot bypass and proxy rotation.

Conclusion

Scraping Twitter in 2025 is no longer about simple HTML parsing — it’s about understanding how X.com delivers data and building workflows that can adapt. With Playwright to render dynamic content, JMESPath to tame complex JSON, and Multilogin to manage proxies and browser profiles, you have everything you need to collect tweets, profiles, and following lists reliably.

From single-tweet extraction to large-scale scraping, success depends on strategy as much as code. With flexible tools and ethical practices, you can unlock Twitter’s full potential as a real-time data source.

Final takeaway: Start small, test your scripts carefully, and grow into larger projects with the right mix of tools and practices. Done responsibly, Twitter scraping can fuel research, marketing, and innovation for years to come.

I'm a Content Manager and Full-Stack SEO Specialist with over 7 years of hands-on experience building strategies that rank and convert. I graduated from Institut Montana Zugerberg College, and since then, I’ve been helping brands grow through smart content, technical SEO, and link building. When I'm not working, you'll likely find me lost in Dostoevsky's books.

Melika Ghasemifard

Author