{"id":33698,"date":"2025-05-01T10:27:18","date_gmt":"2025-05-01T08:27:18","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/training-ai-models-with-high-dimensionality\/"},"modified":"2025-05-01T10:27:18","modified_gmt":"2025-05-01T08:27:18","slug":"training-ai-models-with-high-dimensionality","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/training-ai-models-with-high-dimensionality\/","title":{"rendered":"Training AI Models With High Dimensionality?"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>I&#8217;m working on a project predicting the outcome of 1v1 fights in League of Legends using data from the Riot API (MatchV5 timeline events). I scrape game state information around specific 1v1 kill events, including champion stats, damage dealt, and, especially, the items each player has in their inventory at that moment.<\/p>\n<p>Items give each player significant stat boosts (AD, AP, Health, Resistances, etc.) and unique passive\/active effects, making them highly influential in fight outcomes. However, I&#8217;m having trouble representing this item data effectively in my dataset.<\/p>\n<p><strong>My Current Implementations:<\/strong><\/p>\n<ol>\n<li><strong>Initial Approach: Slot-Based Features<\/strong>\n<ul>\n<li>I first created features like <code>player1_item_slot_1<\/code>, <code>player1_item_slot_2<\/code>, &#8230;, <code>player1_item_slot_7<\/code>, storing the <code>item_id<\/code> found in each of the player&#8217;s inventory slots.<\/li>\n<li><strong>Problem:<\/strong> This approach is fundamentally flawed because item slots in LoL are purely organizational; they have <em>no impact<\/em> on the item&#8217;s effectiveness. An item provides the same benefits whether it&#8217;s in slot 1 or slot 6. 
I&#8217;m concerned the model would learn spurious correlations based on slot position (e.g., erroneously learning that an item is &#8220;stronger&#8221; only when it appears in a specific slot) and fail to learn that an item ID has the same strength in every slot.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Alternative Considered: One-Feature-Per-Item (Multi-Hot Encoding)<\/strong>\n<ul>\n<li>My next idea was to create a binary feature for every single item in the game (e.g., <code>has_Rabadons=1<\/code>, <code>has_BlackCleaver=1<\/code>, <code>has_Zhonyas=0<\/code>, etc.) for each player.<\/li>\n<li><strong>Benefit:<\/strong> This accurately reflects <em>which<\/em> specific items a player has in their inventory, regardless of slot, allowing the model to potentially learn the value of individual items and their unique effects.<\/li>\n<li><strong>Drawback:<\/strong> League has hundreds of items. This leads to:\n<ul>\n<li><strong>Very High Dimensionality:<\/strong> Hundreds of new features per player instance.<\/li>\n<li><strong>Extreme Sparsity:<\/strong> Most of these item features will be 0 for any given fight (players hold at most 6-7 items).<\/li>\n<li><strong>Potential Issues:<\/strong> This could significantly increase training time, require more data, and heighten the risk of overfitting (curse of dimensionality).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>So now I wonder: is there anything else I could try, or do you think either my initial approach or the alternative would be better?<\/p>\n<p><strong>I&#8217;m using XGBoost and training on a dataset of roughly 8 million rows (300k games).<\/strong><\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/Revolutionary_Mine29\"> \/u\/Revolutionary_Mine29 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1kc3ea7\/training_ai_models_with_high_dimensionality\/\">[link]<\/a><\/span>   <span><a 
href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1kc3ea7\/training_ai_models_with_high_dimensionality\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I&#8217;m working on a project predicting the outcome of 1v1 fights in League of Legends using data&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-33698","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33698","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=33698"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33698\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=33698"}],"wp:term":[{"taxonomy":"category","embeddable":true,
"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=33698"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=33698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}