How this skill is triggered — by the user, by Claude, or both
Slash command
/ms-office-expert:office-sensitivity-labelsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
---
Problem: Need to read an Excel/Word/PowerPoint file programmatically, but getting errors?
Solutions (in order of ease):
Use the ready-made scripts (recommended):
# Detect if file is protected
python detect_protection.py "data.xlsx"
# Read any Excel file (auto-detects and uses best method)
python read_protected_excel.py "data.xlsx"
Use as a Python module:
from excel_reader import read_excel
data = read_excel('data.xlsx')
Manual workflow: Open in Excel → Save As → Remove sensitivity label → Read with pandas
📌 See Using the Provided Scripts below for installation, detailed examples, and all features.
This skill includes production-ready Python scripts that handle all edge cases automatically. Use these scripts instead of writing custom code to avoid common pitfalls.
# Install from requirements.txt
pip install -r requirements.txt
# Or install manually
pip install pandas openpyxl olefile
# Windows only (for protected files)
pip install pywin32
detect_protection.py - Analyze FilesChecks if a file is protected before attempting to read it.
python detect_protection.py "data.xlsx"
Output:
Use when: You need to diagnose why a file won't read with pandas.
read_protected_excel.py - Universal Excel ReaderCommand-line tool that automatically tries multiple methods (pandas → COM automation).
# Read all sheets
python read_protected_excel.py "data.xlsx"
# Read first 3 sheets only (for large files)
python read_protected_excel.py "data.xlsx" --max-sheets 3
# Read specific sheets
python read_protected_excel.py "data.xlsx" --sheets "Summary,Details"
# Skip data preview
python read_protected_excel.py "data.xlsx" --no-summary
Features:
Use when: You need to read any Excel file from command line.
excel_reader.py - Python ModuleClean API for integration into your Python code.
from excel_reader import read_excel, get_sheet_names
# Simple usage
data = read_excel('data.xlsx')
if data:
for sheet_name, df in data.items():
print(f"{sheet_name}: {len(df)} rows × {len(df.columns)} columns")
# Advanced usage
data = read_excel(
filepath='data.xlsx',
max_sheets=5, # Limit sheets
sheet_names=['Summary'], # Specific sheets
verbose=False # Quiet mode
)
# Get sheet names without reading data
sheets = get_sheet_names('data.xlsx')
print(f"Available sheets: {sheets}")
Features:
None on failure (easy error handling)Use when: You're writing Python code that needs to read Excel files.
| Issue | How Scripts Handle It |
|---|---|
| "OLE2 compound document" error | Auto-detects and uses COM automation |
| "Cannot set Calculation property" | Skips that property entirely |
| Unicode encoding errors | UTF-8 wrapper included |
| Stuck Excel processes | Proper cleanup in finally blocks |
| Timeouts on large files | --max-sheets parameter |
| Files with special characters | UTF-8 handling throughout |
┌─────────────────────────────────────────┐
│ Need to diagnose a file? │
│ → python detect_protection.py "file" │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Command-line reading? │
│ → python read_protected_excel.py "file" │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Integration in Python code? │
│ → from excel_reader import read_excel │
└─────────────────────────────────────────┘
Modern enterprises often protect sensitive documents using encryption and Digital Rights Management (DRM). This guide explains how to identify protected files and work with them programmatically while maintaining security compliance.
.xlsx, .docx)Protected files contain specific OLE streams that indicate encryption. Here's how to detect them:
import olefile
def check_aip_protection(filepath):
"""Check if file has AIP/DRM protection and extract metadata"""
try:
ole = olefile.OleFileIO(filepath)
# List all streams
streams = ole.listdir()
# Check for encryption markers
encrypted_streams = [s for s in streams if any(
x in str(s).lower() for x in ['encrypt', 'drm', 'rights', 'protection']
)]
if encrypted_streams:
print("⚠️ FILE IS ENCRYPTED/PROTECTED")
print("\nEncryption-related streams:")
for stream in encrypted_streams:
print(f" {stream}")
# Read protection label if present
if ['\x06DataSpaces', 'TransformInfo', 'LabelInfo'] in streams:
label_data = ole.openstream(['\x06DataSpaces', 'TransformInfo', 'LabelInfo']).read()
readable = ''.join(chr(b) if 32 <= b < 127 else ' ' for b in label_data)
print(f"Label info: {readable[:200]}")
ole.close()
return bool(encrypted_streams)
except Exception as e:
print(f"Error: {e}")
return False
Requirements:
pip install olefileYou can also check the file format by examining its header:
with open('protected_file.xlsx', 'rb') as f:
header = f.read(8)
# OLE2 format (often indicates protection): d0cf11e0a1b11ae1
# Standard XLSX format: 504b0304 (ZIP file signature)
print('File header (hex):', header.hex())
Quick Start: This folder includes ready-to-use Python scripts:
detect_protection.py - Check if a file is protectedread_protected_excel.py - Read any Excel file (tries multiple methods)excel_reader.py - Python module with clean APIPlatform: Windows only
Requires: Microsoft Office installed
This method leverages the installed Office application to handle decryption automatically if you have proper permissions.
Important - Unicode Handling: When reading files with special characters, you may encounter encoding errors. Add this at the start of your script:
import sys
import io
# Handle Unicode output properly
if hasattr(sys.stdout, 'buffer'):
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', errors='replace')
import win32com.client
import pandas as pd
import os
def read_protected_excel_via_com(filepath, max_sheets=None, sheet_names=None):
"""Read AIP-protected Excel file using Excel COM automation
Args:
filepath: Path to the Excel file
max_sheets: Maximum number of sheets to read (None = all sheets)
sheet_names: List of specific sheet names to read (None = all sheets)
Returns:
Dictionary of DataFrames, one per sheet
"""
abs_path = os.path.abspath(filepath)
excel = None
workbook = None
try:
# Start Excel application (invisible)
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False
excel.DisplayAlerts = False
# Performance optimizations (optional - may fail on some Excel versions)
# Note: Setting Calculation property can cause errors on some systems
# excel.Calculation = -4135 # xlCalculationManual - SKIP THIS
try:
excel.ScreenUpdating = False
excel.EnableEvents = False
except:
pass # Some Excel versions don't support these properties
# Open workbook - Excel handles AIP decryption
workbook = excel.Workbooks.Open(abs_path, ReadOnly=True)
# Determine which sheets to read
total_sheets = workbook.Worksheets.Count
sheets_to_read = range(1, total_sheets + 1)
if sheet_names:
# Read specific sheets by name
sheets_to_read = [i for i in sheets_to_read
if workbook.Worksheets(i).Name in sheet_names]
elif max_sheets:
# Limit number of sheets
sheets_to_read = range(1, min(max_sheets + 1, total_sheets + 1))
# Read data from sheets
all_data = {}
for i in sheets_to_read:
sheet = workbook.Worksheets(i)
sheet_name = sheet.Name
# Get all data at once
used_range = sheet.UsedRange
data = used_range.Value
if data:
# Convert to pandas DataFrame
df = pd.DataFrame(data)
if len(df) > 0:
# Use first row as headers
df.columns = df.iloc[0]
df = df[1:].reset_index(drop=True)
all_data[sheet_name] = df
return all_data
finally:
# Always clean up COM objects
try:
if workbook:
workbook.Close(SaveChanges=False)
if excel:
# Restore settings (if they were changed)
try:
excel.ScreenUpdating = True
excel.EnableEvents = True
except:
pass
excel.Quit()
except Exception as cleanup_error:
# If cleanup fails, Excel process may remain - kill manually if needed
pass
Requirements:
pip install pywin32Pros:
Cons:
Performance Tips:
excel.ScreenUpdating = Falseexcel.Calculation = -4135excel.EnableEvents = Falseworkbook = excel.Workbooks.Open(path, ReadOnly=True)Platform: All platforms (Windows, macOS, Linux)
Requires: Azure AD authentication
Use the official MIP SDK for cross-platform support:
from azure.identity import DefaultAzureCredential
# Note: MIP SDK requires additional setup and configuration
# This is conceptual - full implementation requires:
# 1. Azure AD app registration
# 2. Proper SDK initialization
# 3. Authentication flow
# 4. File decryption API calls
Requirements:
Pros:
Cons:
Platform: All platforms (wherever Office runs)
For one-time or occasional access:
Open file in Office application
Save unprotected copy
Read with standard libraries
import pandas as pd
# Now read the unprotected copy
df = pd.read_excel('unprotected_copy.xlsx')
Important: Only create unprotected copies if:
Platform: N/A (Process-based)
Sometimes the best solution is organizational:
Respect Protection Intent
Proper Authentication
Secure Storage
Check Before Processing
# Always verify protection status first
is_protected = check_aip_protection(filepath)
if is_protected:
print("File requires special handling")
Error Handling
try:
data = read_protected_file(filepath)
except PermissionError:
print("You don't have access to this file")
except Exception as e:
print(f"Unable to read file: {e}")
Resource Cleanup
| Scenario | Recommended Solution | Platform |
|---|---|---|
| Windows environment with Office | COM Automation | Windows only |
| Cross-platform enterprise app | MIP SDK | All platforms |
| One-time data extraction | Manual workflow | All platforms |
| Lack of technical permissions | Request access | N/A |
| Development/testing | Unprotected test files | All platforms |
Here's a complete example that detects protection and handles appropriately:
import os
import sys
import pandas as pd
import win32com.client
def process_confidential_file(filepath, max_sheets=5):
"""
Process a potentially protected file with proper handling
Args:
filepath: Path to Excel file
max_sheets: Maximum sheets to read (prevents timeouts on large files)
Returns:
Dictionary of DataFrames or None
"""
# Step 1: Check if file is protected
print(f"Checking: {filepath}")
# Try standard read first (fastest if file isn't protected)
try:
df = pd.read_excel(filepath)
print("✓ File read successfully with pandas")
return {'Sheet1': df}
except Exception as e:
print(f"Standard read failed: {e}")
print("File may be protected, trying COM automation...")
# Step 2: Handle protected file
if sys.platform != 'win32':
print("""
This file requires special handling. Options:
1. Open in Office app and save unprotected copy
2. Use MIP SDK (requires setup)
3. Request unprotected version from data owner
""")
return None
# Windows: Use COM automation with safety limits
return read_protected_excel_via_com(filepath, max_sheets=max_sheets)
# Usage - always create as a .py file, not terminal one-liner
if __name__ == "__main__":
data = process_confidential_file('sensitive_data.xlsx', max_sheets=3)
if data:
print(f"\n✓ Successfully read {len(data)} sheets!")
for sheet_name, df in data.items():
print(f" • {sheet_name}: {len(df)} rows × {len(df.columns)} columns")
def read_large_protected_file(filepath):
"""Read large protected Excel files in batches to avoid timeouts"""
excel = None
workbook = None
try:
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False
excel.DisplayAlerts = False
excel.ScreenUpdating = False
excel.Calculation = -4135 # Manual
workbook = excel.Workbooks.Open(os.path.abspath(filepath), ReadOnly=True)
total_sheets = workbook.Worksheets.Count
print(f"File has {total_sheets} sheets")
print("Reading first 3 sheets as preview...")
# Read in small batches
batch_size = 3
all_data = {}
for i in range(1, min(batch_size + 1, total_sheets + 1)):
sheet = workbook.Worksheets(i)
print(f" Reading sheet {i}/{total_sheets}: {sheet.Name}")
data = sheet.UsedRange.Value
if data:
df = pd.DataFrame(data)
if len(df) > 0:
df.columns = df.iloc[0]
df = df[1:].reset_index(drop=True)
all_data[sheet.Name] = df
return all_data, total_sheets
finally:
if workbook:
workbook.Close(SaveChanges=False)
if excel:
excel.Quit()
# Usage
data, total = read_large_protected_file('large_file.xlsx')
print(f"\nRead {len(data)}/{total} sheets (preview mode)")
Immediate fixes:
read_protected_excel.py script instead of custom codeEXCEL.EXEFor large files (many sheets or rows):
# Use max_sheets to limit reading
python read_protected_excel.py "large_file.xlsx" --max-sheets 5
# Or read specific sheets only
python read_protected_excel.py "file.xlsx" --sheets "Summary,Dashboard"
Common mistakes to avoid:
excel.Calculation property (causes errors on some Excel versions)Symptom: UnicodeEncodeError: 'charmap' codec can't encode character
Solution: Add UTF-8 encoding at the start of your script:
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', errors='replace')
This is already included in the provided scripts (read_protected_excel.py, excel_reader.py).
str(s).encode('utf-8', errors='ignore').decode('utf-8')Symptom: pywintypes.com_error: Unable to set the Calculation property of the Application class
Cause: Some Excel versions don't support changing the Calculation property via COM.
Solution: Don't set the Calculation property. The provided scripts avoid this issue.
# ❌ This causes errors on some systems:
excel.Calculation = -4135
# ✅ Skip it entirely - the performance difference is minimal for most use cases
Working with protected files requires understanding their protection mechanism and using appropriate tools. Always:
Remember: File protection exists to safeguard sensitive information. Always work within your organization's security policies and respect data handling guidelines.
Provides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub chkp-roniz/skills --plugin ms-office-expert