Reflections on an Out-of-Memory Error Caused by array_merge

This is day 1 of my participation in the November Writing Challenge. Event details: The Last Writing Challenge of 2021.

Background

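We read data from the user-likes table in batches, assembled it with array_merge(), and then bulk-inserted it into the data-analysis table.
In testing the data volume was small, so nothing went wrong. As the business grew, the query range came to cover more than 30,000 rows.
At 30,000 rows the job was already failing with an out-of-memory error on a single machine with 8 cores and 32 GB of RAM.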
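A note on where the ceiling comes from: if the error was PHP's "Allowed memory size of N bytes exhausted", the limit that was hit is the memory_limit setting (128M by default in php.ini), not the machine's 32 GB. Below is a minimal sketch for checking that setting against the script's peak usage. ini_get() and memory_get_peak_usage() are standard PHP functions; iniSizeToBytes() is a hypothetical helper that handles only the common K/M/G suffixes.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand size such as "128M" or "2G"
// into bytes. "-1" means "no limit" and is returned as -1.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $unit = strtolower(substr($value, -1));
    $number = (int) $value; // leading digits only, e.g. "128M" -> 128
    switch ($unit) {
        case 'g':
            return $number * 1024 * 1024 * 1024;
        case 'm':
            return $number * 1024 * 1024;
        case 'k':
            return $number * 1024;
        default:
            return $number; // plain byte count
    }
}

// Report the script's memory ceiling and how much it has used so far.
$limit = iniSizeToBytes((string) ini_get('memory_limit'));
$peak = memory_get_peak_usage(true);
echo 'memory_limit: ' . ($limit === -1 ? 'unlimited' : $limit . ' bytes') . "\n";
echo 'peak usage:   ' . $peak . " bytes\n";
```

Logging these two numbers next to the batch counter makes it easy to see whether usage grows with the data or stays flat.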

How I approached the problem

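The design originally had two goals:
1. Not to analyze the full data set, but only to take users who had a like action within the query range.
2. To keep DB operations to a minimum by handing the computation and data assembly to the program.
I had not taken into account that the program's computing capacity is also limited. So the fix keeps goals 1 and 2 above unchanged and looks for a balance point between DB work and program work.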

The optimized approach:

Read in batches and insert in batches at the same time, sleeping 10 milliseconds after each insert.

The core code is as follows:

```php
// Likes from the last 7 days: read a batch, then insert that batch right away
public static function likeBetweenDuration($begin, $end)
{
    $limit = 1000;
    $offset = 0;
    $users = []; // never filled; the method's effect is the batch inserts below
    do {
        // Users who gave a like in the time window
        $sponsorUserIds = self::query()
            ->selectRaw('userid,createtime')
            ->distinct()
            ->whereBetween('createtime', [$begin, $end])
            ->orderBy('createtime')
            ->offset($offset)
            ->limit($limit)
            ->get()
            ->toArray();
        // Users who received a like in the time window
        $beLikedUserIds = self::query()
            ->selectRaw('"otherUserid",createtime')
            ->distinct()
            ->whereBetween('createtime', [$begin, $end])
            ->orderBy('createtime')
            ->offset($offset)
            ->limit($limit)
            ->get()
            ->toArray();
        $sponsorUserIds = array_column($sponsorUserIds, 'userid');
        $beLikedUserIds = array_column($beLikedUserIds, 'otherUserid');
        // At most 2 x $limit IDs are merged here; array_unique() only dedupes within this batch
        $likesUserIds = array_unique(array_merge($sponsorUserIds, $beLikedUserIds));
        $userIds = array_map(function ($value) {
            return ['userid' => $value];
        }, $likesUserIds);
        if ($userIds) {
            UserActionRecord::recordBatch($userIds); // one bulk insert per batch
        }
        echo "Likes needed by the recommendation algorithm\n";
        echo 'arrayCount:' . count($userIds) . "\n";
        $offset = $offset + $limit;
        echo 'offset: ' . $offset . "\n";
        usleep(10000); // usleep() takes microseconds: 10000 = 10 ms
    } while ($userIds);
    return $users;
}
```
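The same read-a-batch, write-a-batch pattern can be tried outside the framework. Below is a minimal sketch. The $fetch and $write callbacks are hypothetical stand-ins for the ORM query and UserActionRecord::recordBatch(). Only one batch is held in memory at a time, and the loop stops before issuing an empty insert.

```php
<?php
/**
 * Read-a-batch / write-a-batch loop modelled on the optimized likeBetweenDuration().
 *
 * $fetch(int $offset, int $limit): array  returns up to $limit rows (hypothetical source)
 * $write(array $rows): void               persists one batch (hypothetical sink)
 */
function processInBatches(callable $fetch, callable $write, int $limit, int $pauseMicros): int
{
    $offset = 0;
    $written = 0;
    while (true) {
        $rows = $fetch($offset, $limit);
        if ($rows === []) {
            break; // no more data: stop without an empty insert
        }
        $write($rows);
        $written += count($rows);
        $offset += $limit;
        usleep($pauseMicros); // microseconds: 10000 = 10 ms
    }
    return $written;
}

// Usage with an in-memory stand-in for the likes table (hypothetical data).
$table = range(1, 2500);
$batchSizes = [];
$total = processInBatches(
    function (int $offset, int $limit) use ($table): array {
        return array_slice($table, $offset, $limit);
    },
    function (array $rows) use (&$batchSizes): void {
        $batchSizes[] = count($rows);
    },
    1000,
    10000
);
echo $total . "\n";                     // 2500
echo implode(',', $batchSizes) . "\n";  // 1000,1000,500
```

Checking for an empty batch before writing, rather than after, avoids a final insert call with no rows.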

The approach before optimization:

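Read from the DB in batches, concatenate all the data with array_merge(), and then insert everything into the database with a single bulk SQL statement.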

The core code is as follows:

```php
// Read the likes in batches
public static function likeBetweenDuration($begin, $end, $select = 'userid,"otherUserid"')
{
    $limit = 200;
    $offset = 0;
    $users = [];
    do {
        $thisUsers = self::query()
            ->selectRaw($select)
            ->whereBetween('createtime', [$begin, $end])
            ->orderBy('createtime')
            ->offset($offset)
            ->limit($limit)
            ->get()
            ->toArray();
        // $users keeps every row read so far, so it grows with the size of the time range
        $users = array_merge($users, $thisUsers);
        $offset = $offset + $limit;
    } while ($thisUsers);
    return $users;
}

// Fetch all rows, deduplicate, then insert them into the database in one statement
$likes = UserRelationSingle::likeBetweenDuration(Utility::recommendCalcTimestamp(), Utility::recommendCalcEndTimestamp());
$sponsorUserIds = array_column($likes, 'userid');
$beLikedUserIds = array_column($likes, 'otherUserid');
$likesUserIds = array_unique(array_merge($sponsorUserIds, $beLikedUserIds));
$userIds = array_map(function ($value) {
    return ['userid' => $value];
}, $likesUserIds);
UserActionRecord::recordBatch($userIds);
echo "UserActionRecord batch-recorded users with like activity: " .
    json_encode($userIds) . "\n";
```
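One alternative, not the approach this post took, keeps the batched read in its own function. Instead of array_merge()-ing every batch into one growing array, it hands batches to the caller one at a time through a PHP generator. Below is a minimal sketch. batches() and the $fetch callback are hypothetical stand-ins for likeBetweenDuration() and its offset/limit query, and the in-memory $likes array exists only for the demo.

```php
<?php
/**
 * Yield one batch at a time instead of accumulating all rows.
 * $fetch(int $offset, int $limit): array is a hypothetical offset/limit query.
 */
function batches(callable $fetch, int $limit): Generator
{
    $offset = 0;
    while (true) {
        $rows = $fetch($offset, $limit);
        if ($rows === []) {
            return; // data exhausted
        }
        yield $rows;
        $offset += $limit;
    }
}

// Hypothetical demo data: 450 like records (sponsor user, liked user).
$likes = [];
for ($i = 0; $i < 450; $i++) {
    $likes[] = ['userid' => $i % 100, 'otherUserid' => 100 + $i % 50];
}
$fetch = function (int $offset, int $limit) use ($likes): array {
    return array_slice($likes, $offset, $limit);
};

$batchCount = 0;
$largestBatch = 0;
foreach (batches($fetch, 200) as $batch) {
    $batchCount++;
    $largestBatch = max($largestBatch, count($batch));
    // The caller would extract, deduplicate and insert this batch here;
    // the previous batch is no longer referenced and can be freed.
}
echo $batchCount . ' batches, at most ' . $largestBatch . " rows held at once\n";
```

With 450 rows and a limit of 200, the loop sees three batches and never holds more than 200 rows from the query at once.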

Summary

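Before the optimization, the array built with array_merge() holds every row in the query range, so its memory use grows with the data and will inevitably run out as the data keeps growing. The optimized code does not have this problem because it keeps only one batch in memory at a time.
In the optimized approach, each call to array_merge() handles at most 2,000 IDs (two queries with a limit of 1,000 each).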
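One side effect of batching is that array_unique() now only deduplicates within a batch. A user who shows up in several batches can therefore be passed to recordBatch() more than once. If the target table does not already reject duplicate user IDs, a keyed array of IDs seen so far can filter them. Its size grows with the number of distinct users rather than with the number of like rows. Below is a minimal sketch; dedupeAcrossBatches() is hypothetical.

```php
<?php
/**
 * Merge one batch of sponsor IDs and liked-user IDs (at most 2 x limit values),
 * drop IDs already seen in earlier batches, and return the rows to insert.
 * $seen is keyed by user ID, so each membership check is a hash lookup.
 */
function dedupeAcrossBatches(array $sponsorUserIds, array $beLikedUserIds, array &$seen): array
{
    $rows = [];
    foreach (array_merge($sponsorUserIds, $beLikedUserIds) as $id) {
        if (!isset($seen[$id])) {
            $seen[$id] = true;
            $rows[] = ['userid' => $id];
        }
    }
    return $rows;
}

$seen = [];
$first = dedupeAcrossBatches([1, 2, 3], [3, 4], $seen); // users 1, 2, 3, 4 are new
$second = dedupeAcrossBatches([2, 5], [4, 6], $seen);   // only 5 and 6 are new
echo json_encode($first) . "\n";
echo json_encode($second) . "\n";
```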

Comparing the two approaches:

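The pre-optimization approach tried to use as few SQL statements as possible. It reduced DB operations by shifting the load onto the program (PHP functions), but it overlooked memory.
The optimized approach strikes a better balance between DB operations and the program. Reads are still done in batches, but the single write has become one write per batch, which avoids the memory problem. The job also sleeps 10 ms after each insert to ease the load on the DB.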
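The 10 ms pause is easy to get wrong by a factor of 1,000, because PHP's usleep() takes microseconds. A 10 ms pause is usleep(10000), while usleep(10) pauses for only about 10 µs. Below is a minimal sketch that times the call with hrtime(), PHP's monotonic clock (PHP 7.3+). PAUSE_MS is a hypothetical constant.

```php
<?php
// usleep() takes microseconds: 1 ms = 1,000 µs, so a 10 ms pause is usleep(10000).
const PAUSE_MS = 10;

$start = hrtime(true);                        // nanoseconds
usleep(PAUSE_MS * 1000);
$elapsedMs = (hrtime(true) - $start) / 1e6;   // nanoseconds -> milliseconds

echo 'slept for about ' . round($elapsedMs, 1) . " ms\n";
```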

Join the discussion

If you have a better approach, please share it in the comments.
