Posts of varying effort on technology, cybersecurity, transhumanism, rationalism, self-improvement, DIY, and other stereotypical technologist stuff.

Static analysis of the DeepSeek Android App

Introduction

I conducted a static analysis of DeepSeek, a Chinese LLM chatbot, using version 1.8.0 from the Google Play Store. The goal was to identify potential security and privacy issues.

I’ve written about DeepSeek previously here.

Additional security and privacy concerns about DeepSeek have been raised.

See also this analysis by NowSecure of the iPhone version of DeepSeek.

The findings detailed in this report are based purely on static analysis. This means that while the code exists within the app, there is no definitive proof that all of it is executed in practice. Nonetheless, the presence of such code warrants scrutiny, especially given the growing concerns around data privacy, surveillance, the potential misuse of AI-driven applications, and cyber-espionage dynamics between global powers.

Key Findings

Suspicious Data Handling & Exfiltration

  • Hardcoded URLs direct data to external servers, including ByteDance “volce.com” endpoints, raising concerns about user activity monitoring. NowSecure reported the same endpoints in the iPhone app yesterday.
  • Bespoke encryption and data obfuscation methods are present, with indications that they could be used to exfiltrate user information.
  • The app contains hardcoded public keys rather than relying on the user device’s chain of trust.
  • UI interaction tracking captures detailed user behavior without clear consent.
  • WebView manipulation is present, which could allow the app to access private external browser data when links are opened. More information about WebView manipulations is here.
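
As a rough illustration of how hardcoded endpoints like these can be surfaced from decompiled source, here is a minimal Python sketch. It is not the tooling used for this analysis: the regular expression, the function name extract_hosts, and the sample hosts are all assumptions made for the example. It only groups URLs that appear as plain text, so obfuscated or runtime-assembled strings would be missed.

```python
import re
from typing import Dict, List

# Hypothetical pattern: matches http(s) URLs that appear as plain text.
# Strings that are encrypted or built at runtime will not be found.
URL_RE = re.compile(r"https?://[A-Za-z0-9.-]+(?::\d+)?(?:/[^\s\"'<>\\)]*)?")


def extract_hosts(source_text: str) -> Dict[str, List[str]]:
    """Group every plain-text URL found in source_text by its host name."""
    hosts: Dict[str, List[str]] = {}
    for match in URL_RE.finditer(source_text):
        url = match.group(0)
        # "scheme://host[:port]/path" -> take the host part, drop any port.
        host = url.split("/")[2].split(":")[0].lower()
        hosts.setdefault(host, []).append(url)
    return hosts


if __name__ == "__main__":
    # Made-up decompiled snippet; the hosts are placeholders.
    sample = (
        'String a = "https://api.example.com/v1/log"; '
        'String b = "http://tracker.example.net:8080/c";'
    )
    for host, urls in sorted(extract_hosts(sample).items()):
        print(host, len(urls))
```

Running something like this over every decompiled file and sorting the hosts by frequency gives a quick list of third-party domains to review by hand.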

Device Fingerprinting & Tracking

A significant portion of the analyzed code appears to focus on gathering device-specific information, which can be used for tracking and fingerprinting.

  • The app collects various unique device identifiers, including UDID, Android ID, IMEI, IMSI, and carrier information.
  • System properties, installed packages, and root detection mechanisms suggest potential anti-tampering measures. For example, the app probes for the existence of Magisk, a tool that privacy advocates and security researchers use to root their Android devices.
  • Geolocation and network profiling are present, indicating potential tracking capabilities and enabling or disabling of fingerprinting regimes by region.
  • Hardcoded device model lists suggest the application may behave differently depending on the detected hardware.
  • Multiple vendor-specific services are used to extract additional device information. For example, if the app cannot identify the device through the standard Android SIM lookup (because the permission was not granted), it attempts manufacturer-specific extensions to access the same information.
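
To give a sense of how this kind of fingerprinting code can be located during static review, the sketch below counts references to well-known Android identifier APIs and common root-probe markers in decompiled source. The indicator list is an illustrative assumption built from public Android API names (for example TelephonyManager.getDeviceId() and Settings.Secure.ANDROID_ID) and the Magisk package name; it is not a list of what the DeepSeek app contains, and the helper name count_indicators is hypothetical.

```python
import re
from collections import Counter

# Illustrative indicators only: public Android API names and common
# root-probe paths. This is an assumption for the example, not a list
# of what was found in the app.
INDICATORS = {
    "imei": re.compile(r"\bget(?:DeviceId|Imei)\s*\("),
    "imsi": re.compile(r"\bgetSubscriberId\s*\("),
    "sim_serial": re.compile(r"\bgetSimSerialNumber\s*\("),
    "android_id": re.compile(r"\bANDROID_ID\b|\"android_id\""),
    "installed_packages": re.compile(r"\bgetInstalledPackages\s*\("),
    "root_probe": re.compile(r"com\.topjohnwu\.magisk|/system/xbin/su|/sbin/su"),
}


def count_indicators(source_text: str) -> Counter:
    """Count how often each fingerprinting indicator appears in the text."""
    counts: Counter = Counter()
    for label, pattern in INDICATORS.items():
        hits = len(pattern.findall(source_text))
        if hits:
            counts[label] = hits
    return counts


if __name__ == "__main__":
    # Made-up decompiled snippet for demonstration.
    sample = (
        "String id = tm.getDeviceId(); "
        "String a = Settings.Secure.getString(cr, Settings.Secure.ANDROID_ID); "
        'boolean r = new File("/system/xbin/su").exists();'
    )
    print(dict(count_indicators(sample)))
```

A hit only shows that a reference exists in the code. As with the rest of this report, it says nothing about whether that code path runs.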

Potential Malware-Like Behavior

While no definitive conclusions can be drawn without dynamic analysis, several observed behaviors align with known spyware and malware patterns:

  • The app uses reflection and UI overlays, which could facilitate unauthorized screen capture or phishing attacks.
  • SIM card details, serial numbers, and other device-specific data are aggregated for unknown purposes.
  • The app implements country-based access restrictions and “risk-device” detection, suggesting possible surveillance mechanisms.
  • The app contains calls that load Dex modules at runtime, with the additional code read from files that carry a .so extension.
  • The .so files in turn make additional calls to dlopen(), which can be used to load further .so files. This facility is not normally checked by Google Play Protect and other static analysis services.
  • The .so files can be implemented in native code, such as C++. The use of native code adds a layer of complexity to the analysis process and obscures the full extent of the app’s capabilities. Moreover, native code can be leveraged to more easily escalate privileges, potentially exploiting vulnerabilities within the operating system or device hardware.
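
Because a file extension says little about what a file actually holds, one simple triage step is to classify each bundled payload by its leading magic bytes and flag native libraries that reference dlopen. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, and the dlopen check is a crude byte search for the null-terminated symbol name, not a proper parse of the ELF dynamic symbol table.

```python
# Leading magic bytes for the formats of interest.
ELF_MAGIC = b"\x7fELF"      # native shared library or executable
DEX_MAGIC = b"dex\n"        # Dalvik executable
ZIP_MAGIC = b"PK\x03\x04"   # ZIP container (APK and JAR files are ZIPs)


def classify_payload(data: bytes) -> str:
    """Return a coarse file type based only on the leading magic bytes."""
    if data.startswith(ELF_MAGIC):
        return "elf"
    if data.startswith(DEX_MAGIC):
        return "dex"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def mentions_dlopen(data: bytes) -> bool:
    """Crude check: is this an ELF file containing the string "dlopen"?

    Symbol names in ELF string tables are null-terminated, so searching
    for b"dlopen\\x00" avoids matching longer names that merely contain
    "dlopen". This is a heuristic, not a symbol-table parse.
    """
    return classify_payload(data) == "elf" and b"dlopen\x00" in data


if __name__ == "__main__":
    # Synthetic byte strings standing in for real files.
    fake_elf = ELF_MAGIC + b"\x00" * 8 + b"dlopen\x00"
    fake_dex = DEX_MAGIC + b"035\x00"
    print(classify_payload(fake_elf), mentions_dlopen(fake_elf))
    print(classify_payload(fake_dex), mentions_dlopen(fake_dex))
```

In practice the bytes would come from each file extracted from the APK, for example via pathlib.Path.read_bytes(). A file named with a .so extension that classifies as "dex" or "zip" is worth a closer look.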

Remarks

While data collection is common in modern applications for debugging and improving user experience, aggressive fingerprinting raises significant privacy concerns. The DeepSeek app requires users to log in with a valid email, which should already provide sufficient authentication. There is no valid reason for the app to aggressively gather and transmit unique device identifiers, IMEI numbers, SIM card details, and other non-resettable system properties.

The extent of tracking observed here exceeds typical analytics practices, potentially enabling persistent user tracking and re-identification across devices. These behaviors, combined with obfuscation techniques and network communication with third-party tracking services, warrant a higher level of scrutiny from security researchers and users alike.

The use of runtime code loading, together with the bundling of native code, suggests that the app could allow the deployment and execution of unreviewed, remotely delivered code. This is a serious potential attack vector. This report presents no evidence that remotely delivered code is actually being executed, only that the facility for it appears to be present.

Additionally, the app’s approach to detecting rooted devices appears excessive for an AI chatbot. Root detection is often justified in DRM-protected streaming services, where security and content protection are critical, or in competitive video games to prevent cheating. However, there is no clear rationale for such strict measures in an application of this nature, raising further questions about its intent.

Users and organizations considering installing DeepSeek should be aware of these potential risks. If this application is being used within an enterprise or government environment, additional vetting and security controls should be enforced before allowing its deployment on managed devices.


Disclaimer: The analysis presented in this report is based on static code review and does not imply that all detected functions are actively used. Further investigation is required for definitive conclusions.
