music-and-you

Music and You: Project Structure & Development Guide

📁 Project Structure

music-and-you/
├── 📄 README.md                    # Project overview and quick start
├── 📄 literature.MD                # Comprehensive literature review
├── 📄 pyproject.toml               # Python project configuration
├── 📄 requirements.txt             # Python dependencies
├── 📄 .gitignore                   # Git ignore patterns
├── 🛠️ setup_dev.sh                # Development environment setup
│
├── 📁 src/music_and_you/           # Main source code
│   ├── 📄 __init__.py              # Package initialization
│   ├── 📄 core.py                  # Core constants and configurations
│   ├── 📄 config.py                # Configuration management
│   ├── 📄 cli.py                   # Command-line interface
│   │
│   ├── 📁 data/                    # Data ingestion modules
│   │   ├── 📄 __init__.py
│   │   ├── 📄 base_client.py       # Abstract base for API clients
│   │   ├── 📄 spotify_client.py    # Spotify API integration
│   │   ├── 📄 lastfm_client.py     # Last.fm API integration
│   │   └── 📄 youtube_music_client.py  # YouTube Music integration
│   │
│   ├── 📁 features/                # Feature extraction
│   │   ├── 📄 __init__.py
│   │   ├── 📄 acoustic_features.py # Audio feature extraction
│   │   ├── 📄 behavioral_features.py # Listening behavior features
│   │   ├── 📄 temporal_features.py # Time-based patterns
│   │   ├── 📄 lyrical_features.py  # Lyric analysis features
│   │   └── 📄 feature_pipeline.py  # Complete feature pipeline
│   │
│   ├── 📁 models/                  # Machine learning models
│   │   ├── 📄 __init__.py
│   │   ├── 📄 personality_predictor.py # Base predictor class
│   │   ├── 📄 ridge_model.py       # Ridge regression implementation
│   │   ├── 📄 random_forest_model.py # Random Forest implementation
│   │   └── 📄 model_ensemble.py    # Ensemble methods
│   │
│   ├── 📁 api/                     # Web API (FastAPI)
│   │   ├── 📄 __init__.py
│   │   ├── 📄 main.py              # FastAPI application
│   │   ├── 📄 auth.py              # Authentication routes
│   │   ├── 📄 prediction.py        # Prediction endpoints
│   │   └── 📄 admin.py             # Admin/monitoring endpoints
│   │
│   └── 📁 utils/                   # Utility functions
│       ├── 📄 __init__.py
│       ├── 📄 logging.py           # Logging configuration
│       ├── 📄 database.py          # Database utilities
│       ├── 📄 validation.py        # Data validation
│       └── 📄 encryption.py        # Privacy/security utilities
│
├── 📄 .env.example                 # Environment variables template
│
├── 📁 data/                        # Data storage
│   ├── 📁 raw/                     # Raw data from APIs
│   ├── 📁 processed/               # Processed/cleaned data
│   ├── 📁 external/                # External datasets
│   └── 📁 features/                # Extracted features
│
├── 📁 models/                      # Trained models
│   ├── 📁 saved/                   # Production models
│   ├── 📁 checkpoints/             # Training checkpoints
│   └── 📁 experiments/             # Experimental models
│
├── 📁 notebooks/                   # Jupyter notebooks
│   ├── 📄 01_data_exploration.ipynb
│   ├── 📄 02_feature_engineering.ipynb
│   ├── 📄 03_model_development.ipynb
│   ├── 📄 04_evaluation.ipynb
│   └── 📄 05_privacy_analysis.ipynb
│
├── 📁 tests/                       # Test suite
│   ├── 📄 conftest.py              # Test configuration
│   ├── 📄 test_data_clients.py     # API client tests
│   ├── 📄 test_features.py         # Feature extraction tests
│   ├── 📄 test_models.py           # Model tests
│   └── 📄 test_api.py              # API tests
│
├── 📁 experiments/                 # Research experiments
│   ├── 📁 ablation_studies/        # Feature ablation experiments
│   ├── 📁 cross_cultural/          # Cross-cultural validation
│   ├── 📁 privacy_experiments/     # Privacy-preserving methods
│   └── 📁 baselines/               # Baseline comparisons
│
├── 📁 reports/                     # Analysis reports
│   ├── 📁 figures/                 # Generated plots
│   ├── 📁 tables/                  # Statistical tables
│   └── 📄 analysis_report.md       # Main analysis report
│
├── 📁 frontend/                    # Web interface (future)
│   ├── 📄 package.json
│   ├── 📁 src/
│   └── 📁 public/
│
└── 📁 docker/                      # Containerization
    ├── 📄 Dockerfile               # Main application container
    ├── 📄 docker-compose.yml       # Multi-service setup
    ├── 📄 Dockerfile.research      # Research environment
    └── 📄 nginx.conf               # Web server configuration
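
The tree lists `base_client.py` as an abstract base for the platform clients. A minimal sketch of what such a base class might look like is shown here. The class name `BaseMusicClient`, the method `fetch_listening_history`, and the record fields are assumptions made for illustration, not the project's actual interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseMusicClient(ABC):
    """Hypothetical common interface for streaming-platform clients."""

    platform: str = "unknown"

    @abstractmethod
    def fetch_listening_history(self, user_id: str, days: int) -> List[Dict]:
        """Return play records for the last `days` days."""

    def collect(self, user_id: str, days: int = 180) -> List[Dict]:
        # Tag every record with its source platform so later stages
        # can treat records from all services uniformly.
        records = self.fetch_listening_history(user_id, days)
        return [{**record, "platform": self.platform} for record in records]


class FakeClient(BaseMusicClient):
    """Stand-in client returning canned data (no network access)."""

    platform = "fake"

    def fetch_listening_history(self, user_id: str, days: int) -> List[Dict]:
        return [
            {
                "user_id": user_id,
                "track": "Example Track",
                "played_at": "2024-01-01T08:30:00",
            }
        ]
```

Under this design, each concrete client (Spotify, Last.fm, YouTube Music) would only implement the fetch method, and a stand-in such as `FakeClient` can back unit tests without touching a real API.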

🚀 Quick Start

1. Environment Setup

# Clone and setup
git clone https://github.com/tmarhguy/music-and-you.git
cd music-and-you
chmod +x setup_dev.sh
./setup_dev.sh

2. Configuration

# Copy and configure environment
cp .env.example .env

# Edit with your API credentials
nano .env
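
Once the credentials are in `.env`, application code has to read them. The sketch below shows one way to do that with a fail-early check. The variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` and the `SpotifySettings` class are assumptions (use the names from `.env.example`), and the sketch assumes the values have already been exported into the process environment, for example by the shell or by a loader such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpotifySettings:
    """Hypothetical container for Spotify credentials."""

    client_id: str
    client_secret: str


def load_spotify_settings(env: Optional[Mapping[str, str]] = None) -> SpotifySettings:
    """Read credentials from the environment and fail early if any are missing."""
    env = os.environ if env is None else env
    # Assumed variable names; replace with the ones in .env.example.
    required = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return SpotifySettings(env["SPOTIFY_CLIENT_ID"], env["SPOTIFY_CLIENT_SECRET"])
```

Passing a plain dictionary as `env` makes the function easy to test without modifying the real environment.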

3. Data Collection

# Authenticate with Spotify
music-and-you auth --platform spotify

# Collect listening data
music-and-you collect --user-id YOUR_USER_ID --days 180

4. Feature Extraction

# Extract features from listening data
music-and-you extract-features --input-file data/raw/listening_history.json

5. Model Training

# Train personality prediction model
music-and-you train --features-file data/features/features.csv --survey-file data/survey_responses.csv

6. Web API

# Start the web server
music-and-you serve --host 0.0.0.0 --port 8000

🧪 Development Workflow

Testing

# Run all tests
pytest

# Run specific test categories
pytest tests/test_features.py -v
pytest -m "not slow"  # Skip slow tests

# Coverage report
pytest --cov=src/music_and_you --cov-report=html

Code Quality

# Format code
black src/ tests/
isort src/ tests/

# Lint code
flake8 src/ tests/
mypy src/

# Pre-commit hooks
pre-commit install
pre-commit run --all-files

Docker Development

# Build and start the development containers
docker-compose -f docker/docker-compose.yml up --build

# Run in research mode
docker-compose -f docker/docker-compose.research.yml up

📊 Data Flow Architecture

┌─────────────────┐    ┌──────────────────┐    ┌───────────────────┐
│   Music APIs    │    │   Feature        │    │   ML Models       │
│                 │    │   Extraction     │    │                   │
│ • Spotify       │───▶│                  │───▶│ • Ridge Regression│
│ • Last.fm       │    │ • Acoustic       │    │ • Random Forest   │
│ • YouTube Music │    │ • Behavioral     │    │ • Ensemble        │
│ • Apple Music   │    │ • Temporal       │    │ • Neural Networks │
└─────────────────┘    │ • Lyrical        │    └───────────────────┘
                       └──────────────────┘              │
                                │                        │
                                ▼                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌───────────────────┐
│   Web API       │    │   Data Storage   │    │   Predictions     │
│                 │    │                  │    │                   │
│ • FastAPI       │◀───│ • PostgreSQL     │    │ • Personality     │
│ • Authentication│    │ • Redis Cache    │    │   Traits          │
│ • Rate Limiting │    │ • File Storage   │    │ • Confidence      │
│ • Documentation │    │ • Backups        │    │ • Explanations    │
└─────────────────┘    └──────────────────┘    └───────────────────┘
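
As one concrete illustration of the feature-extraction stage in the diagram, the sketch below derives simple time-of-day features from play timestamps. The input format (ISO 8601 strings), the bucket boundaries, and the function name `time_of_day_shares` are assumptions for illustration; the project's own temporal features may be defined differently.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Assumed bucket boundaries: start hour inclusive, end hour exclusive.
BUCKETS = {
    "night": (0, 6),
    "morning": (6, 12),
    "afternoon": (12, 18),
    "evening": (18, 24),
}


def time_of_day_shares(played_at: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of plays that fall in each time-of-day bucket."""
    counts: Counter = Counter()
    for stamp in played_at:
        hour = datetime.fromisoformat(stamp).hour
        for name, (start, end) in BUCKETS.items():
            if start <= hour < end:
                counts[name] += 1
                break
    total = sum(counts.values())
    # Users with no plays get all-zero shares instead of a division error.
    return {name: (counts[name] / total if total else 0.0) for name in BUCKETS}
```

Each user then maps to a fixed-length vector of shares, which is the kind of tabular input a ridge regression or random forest model can consume.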

🔬 Research Components

Literature Foundation

Key Innovations

  1. Multi-Platform Integration: Unified feature extraction across streaming services
  2. Concept Bottleneck Models: Interpretable intermediate psycho-musical concepts
  3. Privacy-Preserving ML: Federated learning and differential privacy
  4. Cultural Adaptation: Cross-cultural normalization and bias mitigation
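
Item 3 mentions differential privacy. One standard building block is the Laplace mechanism, which adds noise with scale sensitivity/epsilon to a numeric query result. The sketch below applies it to a bounded mean. Whether the project uses this particular mechanism is not stated here, and the function names, the clipping bounds, and the assumption that the number of records is public are all illustrative.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(
    values: Sequence[float],
    lower: float,
    upper: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Epsilon-DP estimate of the mean of values clipped to [lower, upper]."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record changes the clipped mean by at most
    # (upper - lower) / n, assuming n itself is public.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)
```

Smaller values of `epsilon` give stronger privacy at the cost of noisier estimates, and the noise shrinks as the number of records grows.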

Success Metrics

🛡️ Privacy & Ethics

Privacy-First Design

Ethical Considerations

📚 Dependencies

Core ML Stack

Music & Audio

Web & API

Development

🎯 Next Steps

Phase 1: MVP (Current)

Phase 2: Enhancement

Phase 3: Research


Ready to contribute? Check out our Contributing Guide and Code of Conduct.

Questions? Open an issue or contact the maintainers.